Paper Review: WebTables: Exploring the Power of Tables on the Web

Title and Author of Paper: WebTables: Exploring the Power of Tables on the Web. M.J. Cafarella et al.

Summary: WebTables is a project to extract and process HTML tables from Google’s search index. It attempts to answer two questions: what are some effective techniques for searching structured data at search-engine scale, and what can be derived from analyzing a large corpus of HTML tables? Web documents often contain structured, relational data embedded in HTML tables. The WebTables project extracted 14.1 billion English-language HTML tables and filtered those down to 154 million tables that contain structured data. This data makes it possible to determine semantic information embedded in the web, create visualizations, and integrate web documents into new applications. ...

March 29, 2017 · 4 min · Kevin Sookocheff