Journal Article
Snorkel: A System for Lightweight Extraction
We describe a vision and an initial prototype system, called Snorkel, for extracting structured data from unstructured or “dark” input sources such as text, embedded tables, and images. In Snorkel, users write traditional extraction scripts that are automatically enhanced by machine learning techniques. The key technical idea is to view the user’s actions with standard tools as implicitly defining a statistical model.
Project page
Snorkel
A system for rapidly creating, modeling, and managing training data, focused on accelerating the development of structured or “dark” data extraction applications for domains in which large labeled training sets are not available or easy to obtain.
Journal Name
8th Biennial Conference on Innovative Data Systems Research (CIDR ’17)
Publication Date
January 2017