opencog-dev team mailing list archive

blog bit: Powerset: All About the compute power?

 

Hi All,

Finally, some recent straight-talk about Powerset:

Microsoft-Powerset: All About the Data Centers?
<http://www.datacenterknowledge.com/archives/2008/Jun/30/microsoft-powerset_all_about_the_data_centers.html>

Venture Beat <http://venturebeat.com/2008/06/26/microsoft-to-buy-semantic-search-engine-powerset-for-100m-plus/> is
reporting that Microsoft may acquire the semantic search company
Powerset <http://www.powerset.com/> for about $100 million. Both companies
are mum about the reports, but a Microsoft-Powerset deal could solve the
primary roadblock to real-world deployment of semantic search - the
extraordinary amount of computing resources required to build a semantic
index of the entire web.

One of Powerset's challenges is data center resources, and whether it can
afford to buy or rent the computing power needed to apply its indexing
technology to the entire World Wide Web. From its inception, Powerset has
acknowledged that its approach to "natural language" contextual search
requires much more horsepower to compile than the keyword-driven indexes
built by Google, Yahoo and Microsoft. Here's an explanation from our
conversation last year with Powerset co-founder Steve Newcomb (who has since
left the company):

We capture each sentence rather than just the key words in that
sentence, so our index size is many times larger than a keyword index. That
has a couple of large impacts on the data center. To parse a single sentence
is much more computationally expensive for us than for Google. Because of
that, we have to create a massively parallel infrastructure on the
processing side. We need a lot more compute power than any keyword search
engine, so we need more powerful machines, and the cost of our data center
is potentially a very significant cost.

Powerset has its own data center infrastructure, but turned to Amazon and
its EC2 utility computing service to build its web index.

When Powerset began using Amazon EC2 in November 2006, founder and CEO
Barney Pell noted that building traditional data center infrastructure would
be "a significant barrier to seriously competing with companies like Google
and Yahoo." In mid-2007, Newcomb projected that using Amazon would allow
Powerset to complete building its index by early 2008, after which the
company would use its own data centers to maintain the index - which
requires less computational horsepower than building it.

But when Powerset launched in
May <http://www.techcrunch.com/2008/05/11/powerset-launches-showcase-for-user-search-experience/>,
it did so with a proof-of-concept index of Wikipedia, rather than the
web-wide search originally envisioned by its founders.

Was this all Powerset was able to come up with after more than 18 months of
indexing using Amazon EC2? The company says it still intends to eventually
offer a semantic index of the entire web, but hasn't indicated how much of
that task has been completed (beyond Wikipedia). But the progress to date
suggests that it will be difficult for Powerset to complete the task quickly
on its own - hence the rumors that the company is seeking suitors.

With its enormous data center infrastructure, Microsoft has the resources to
build a semantic index of the entire web using Powerset's technology, and to
do so faster than anyone else.
-dave