One of the next frontiers of search is taking all of the unstructured data spread helter-skelter across the Web and treating it as if it were sitting in a nice, structured database. It is easier to get answers out of a database where everything is neatly labeled, stamped, and categorized. As the sheer volume of stuff on the Web keeps growing, keyword search keeps getting closer to its breaking point. Adding structure to the Web is one way to make sense of all that data, and Google is starting to tackle the problem with a Google Labs project called Google Squared, which Marissa Mayer mentioned earlier today at the company’s Searchology briefing.
Google Squared extracts data from Web pages and presents it in search results as squares in an online spreadsheet. Michael was at the event and got a personal demo (see video below). From Michael’s Searchology notes:
Google Squared is launching later this month in labs. Google Squared returns search results in a spreadsheet format. It structures the unstructured data on web pages. So a search for Small Dogs returns results with names, description, size, weight, origin, etc., in columns and rows.
This type of technology has obvious applications for many types of targeted searches, including product search, health search, scientific search, you name it. There are dozens of semantic search startups trying to impose structure on the Web to perform similar tricks. Wolfram Alpha, another high-profile search startup, which is launching on Monday, takes a slightly different approach: it simply ingests massive amounts of information into its own databases, which it can then query to its heart’s content. Already there is a bit of a rivalry between Google and Wolfram because getting back structured results is a major new direction for search.
Google is looking for data structures on the Web that imply facts, and then grabbing them for Squared results. “It takes an incredible amount of compute power to create one of those squares,” she says.
Wolfram does a pretty good job parsing the information in its own databases, but those databases will never match what is available on the Web. Wolfram’s databases currently store only 10 terabytes of information, a tiny fraction of what is on the Web. (I will be posting my impressions of Wolfram’s search engine soon.) Google Squared is an early attempt to take the messy data that exists on the Web and place it into simple tables. It is still very experimental and isn’t always on target, but you can see where this is going. Turning the Web into a giant database will crush any attempt to segregate the “best” information into a separate database so that it can be processed and searched more deeply.
In the video demo below, a search for “camera” sorts the results into columns for image, description, manufacturer, resolution, and so on. You can refine results by clicking on a particular column such as manufacturer. A search for “rollercoasters” sorts results by name, image, description, height, length, and number of inversions. But sometimes it gets confused. A search for “spaceships” turns up a Corvette and a missile carrier. It is going to be a while before this makes it out of Google Labs.