Journal Article

Snorkel: A System for Lightweight Extraction

We describe a vision and an initial prototype system, called Snorkel, for extracting structured data from unstructured or “dark” input sources such as text, embedded tables, and images. In Snorkel, users write traditional extraction scripts, which are then automatically enhanced by machine learning techniques. The key technical idea is to view the user’s actions with standard tools as implicitly defining a statistical model.

Project page

A system for rapidly creating, modeling, and managing training data, focused on accelerating the development of structured or “dark” data extraction applications for domains in which large labeled training sets are not available or easy to obtain.
Author(s)
Alexander Ratner
Stephen H. Bach
Henry R. Ehrenberg
Jason Alan Fries
Sen Wu
Christopher Ré
Journal Name
8th Biennial Conference on Innovative Data Systems Research (CIDR ’17)
Publication Date
January 2017