Deep Web Search May Help Scientists

When you do a simple Web search on a topic, the results that pop up aren't the whole story. The Internet contains a vast trove of information, sometimes called the "Deep Web," that isn't indexed by search engines. That information could be useful for tracking criminals, terrorist activities, sex trafficking and the spread of diseases. Scientists could also use it to search for images and data from spacecraft.

The Defense Advanced Research Projects Agency (DARPA) has been developing tools as part of its Memex program that access and catalog this mysterious online world. Researchers at NASA's Jet Propulsion Laboratory in Pasadena, California, have joined the Memex effort to harness the benefits of deep Web searching for science. Memex could, for example, help catalog the vast amounts of data NASA spacecraft deliver on a daily basis.

"We're developing next-generation search technologies that understand people, places, things and the connections between them," said Chris Mattmann, principal investigator for JPL's work on Memex.

Memex examines not only standard text-based content online but also images, videos, pop-up ads, forms, scripts and the other ways information is stored, looking at how they are interrelated.

"We're augmenting Web crawlers to behave like browsers, in other words, executing scripts and reading ads in ways that you would when you usually go online. This information is normally not catalogued by search engines," Mattmann said.

Additionally, a standard Web search doesn't extract much information from images and videos, but Memex can recognize what's in this content and pair it with searches on the same subjects. The search tool could identify the same object across many frames of a video or even across different videos.
