Looksmart's Ambitious Plans

  • Mar. 21, 2003

Back in January, Looksmart quietly bought a company called Grub.  The purchase didn’t make the news.  In fact, most search engine watchers wondered what the plan was.  Why buy a company that develops software allowing home computer users to donate their unused processor cycles?

 

Google is involved in this type of project.  They are working with Stanford University to study proteins.  By using volunteers’ computers when they aren’t being used for anything else, the researchers are able to increase their computing power immensely.  Rather than having 10 or 20 supercomputers running calculations, they have tens of thousands of regular home computers, which together can produce far more results at a fraction of the cost to the project.  Currently, the Folding@home project (as it is called) has just over 385,000 registered computers performing calculations.

 

In Looksmart’s case, they are hoping to use distributed computing for their own benefit.  They expect people will be willing to share their processor cycles with Looksmart’s spiders (spiders are the programs search engines use to crawl and index web pages).  The goal is to have enough distributed computing power to reindex their database of web pages every day.  That would be a massive undertaking.  Consider that Google currently has the largest database at just over 3 billion pages, and by using its server farm (estimated to be around 10,000 servers) it is able to index the web once per month.

 

Now consider that many web specialists believe only 5% of the web pages out there are currently indexed.  If Google’s 3 billion pages represent that 5%, the whole web is about 60 billion pages, which means there are roughly 57 billion more pages just waiting to be found.  The computational power needed to index that many pages every day would be massive.
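Here is a quick sketch of that math in Python.  It simply takes the 3 billion page figure quoted for Google and the 5% estimate at face value; the function name is made up for illustration.

```python
def estimate_unindexed_pages(indexed_pages, indexed_percent):
    """Estimate how many pages remain unindexed.

    indexed_pages:   pages already in the index
    indexed_percent: share of the whole web those pages are assumed to be
    """
    # Integer arithmetic keeps the result exact for round figures like these.
    total_pages = indexed_pages * 100 // indexed_percent
    return total_pages - indexed_pages


# Figures from the article: about 3 billion pages, assumed to be 5% of the web.
indexed = 3_000_000_000
remaining = estimate_unindexed_pages(indexed, 5)
print(f"Estimated pages still unindexed: {remaining:,}")
```

Change the percentage and the picture shifts quickly: if 10% of the web were already indexed, the same function would put the remainder at 27 billion pages.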

 

That being said, maybe distributed computing is the most efficient way to index these pages.  After all, if Google currently has 10,000 computers indexing 3 billion pages, what kind of investment would they need to increase their capacity 20 times over?  Of course, not all of Google’s servers perform the indexing task – many also rank pages, some perform PageRank calculations and others serve up search engine results.

 

So let’s make some assumptions to see if the Looksmart plan would work.  Let’s assume Google uses 8,000 of its 10,000 servers to index the web.  If we multiply that by 20, we would need 160,000 computers spidering the web and indexing content.  We should also assume these are high-end servers built for speed, and most home computers will not match that level of processing power.  Therefore, if we double this number (assuming the average home computer has half the processing power of a Google server), we’re looking at 320,000 home computers to index the web every day.  This seems high considering that the Folding@home project only has 285,000 computers.
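The same estimate can be written as a small Python sketch.  Every input here is one of the assumptions above (8,000 indexing servers, a 20-fold increase, two home computers per server), not a measured figure, and the function and variable names are only illustrative.

```python
def home_computers_needed(indexing_servers, capacity_multiplier, home_per_server):
    """Return how many home computers would match a scaled-up server fleet."""
    servers_needed = indexing_servers * capacity_multiplier
    return servers_needed * home_per_server


# All three inputs are assumptions taken from the text, not measurements.
google_indexing_servers = 8_000  # assumed share of Google's roughly 10,000 servers
capacity_multiplier = 20         # growth needed to cover the whole web
home_per_server = 2              # a home computer assumed to be half as fast

needed = home_computers_needed(
    google_indexing_servers, capacity_multiplier, home_per_server
)
folding_at_home = 285_000  # Folding@home computers, as quoted above

print(f"Home computers needed: {needed:,}")
print(f"Shortfall versus Folding@home: {needed - folding_at_home:,}")
```

With these inputs the script reports 320,000 home computers, or 35,000 more than the Folding@home figure.  The result is only as good as the guesses going in, so it is worth trying other values for the server share or the speed ratio.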

 

But consider another distributed computing project you may have heard of called SETI@home.  This project analyzes radio telescope data for signals that could be from other intelligent life.  They currently report over 4.3 million registered computers.

 

So, maybe Looksmart is onto something.  Maybe 320,000 isn’t an unreasonable number after all.  Maybe they can index the entire web every day.

 

Rob Sullivan

Searchengineposition.com

Search Engine Positioning Specialists


