As we embark on our new program (more on that next week :), we are pleased to welcome another team to our portfolio. The group consists of engineers who have developed web-scale infrastructure at Yahoo, eBay, and Bing; many hail from Stanford University’s PhD program in Computer Science.
In their own words:
“Diffbot looks at the Web with a human set of eyes. We’ve built a robot that examines the Web using artificial intelligence, computer vision, machine learning and natural language processing, and provides software developers with tools to find, extract and understand objects from any Web page for use in their applications.
Our goal is to make the Web as readable by machines as it currently is by humans. There is tremendous data — *all* data, increasingly — available across the Web, and we think our computer vision technology is key to helping computers identify and use all of it.
To date we’ve categorized the Web into approximately 20 page types that can be visually analyzed using layout and contextual cues, from product and review pages to social networking profiles and recipes. We’ve released developer APIs for two of the most commonly consumed page types: Front Pages and Articles. The Front Page API analyzes home and index pages using common layout markers (headlines, bylines, images, articles, ads and more), while the Article API extracts clean article text, related images, and videos, and generates unique cross-referenced tags from news and blog Web pages. You can play with our APIs here.
We’re delighted to be working with WebFWD. As we strive to better understand the entire Web, we can think of few partners more experienced than Mozilla to help inform our efforts, and we are excited for the opportunities this will present.”
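As a rough illustration of how a developer might use an article-extraction API like the one described above, here is a minimal Python sketch. The endpoint URL, parameter names, and response fields below are assumptions for illustration only, not Diffbot’s documented interface; check their developer docs for the real details.

```python
# Sketch: calling a hypothetical article-extraction API.
# Endpoint, parameters, and response shape are assumed, not documented.
import json
import urllib.parse
import urllib.request

API_ENDPOINT = "http://www.diffbot.com/api/article"  # hypothetical endpoint


def build_article_request(token: str, page_url: str) -> str:
    """Build a request URL asking the service to extract the article
    (clean text, images, tags) from the given page."""
    query = urllib.parse.urlencode({"token": token, "url": page_url})
    return f"{API_ENDPOINT}?{query}"


def fetch_article(token: str, page_url: str) -> dict:
    """Call the API and return its parsed JSON response
    (assumed fields: title, text, tags, media, ...)."""
    with urllib.request.urlopen(build_article_request(token, page_url)) as resp:
        return json.load(resp)
```

A caller would then do something like `fetch_article(my_token, "http://example.com/some-post")` and read the extracted text out of the returned dictionary.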