Snorkel
Snorkel is a system for programmatically building and managing training datasets.
With Snorkel, users can develop training datasets in hours or days rather than hand-labeling them over weeks or months.
Snorkel currently exposes three key programmatic operations: labeling data, for example with heuristic rules or distant supervision techniques; transforming data, for example rotating or stretching images to perform data augmentation; and slicing data into critical subsets. Snorkel then automatically models, cleans, and integrates the resulting training data using novel, theoretically grounded techniques.
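The three operations can be illustrated with a minimal, library-free sketch in plain Python. It does not use Snorkel's actual API: every name below (the lf_, tf_, and sf_ functions, the ABSTAIN/HAM/SPAM constants, the sample messages) is hypothetical, and a simple majority vote stands in for Snorkel's modeling step, which is more sophisticated than this.

```python
from collections import Counter

# Hypothetical label values for a toy spam task (not Snorkel constants).
ABSTAIN, HAM, SPAM = -1, 0, 1


# --- Labeling: heuristic rules that vote on a label or abstain ---
def lf_contains_link(text):
    return SPAM if "http" in text.lower() else ABSTAIN


def lf_short_message(text):
    return HAM if len(text.split()) < 5 else ABSTAIN


def lf_mentions_prize(text):
    return SPAM if "prize" in text.lower() else ABSTAIN


LABELING_FUNCTIONS = [lf_contains_link, lf_short_message, lf_mentions_prize]


def majority_vote(votes):
    """Combine votes by simple majority; abstain on ties or no votes.

    This is a stand-in for the modeling step, not Snorkel's method.
    """
    counted = Counter(v for v in votes if v != ABSTAIN)
    if not counted:
        return ABSTAIN
    ranked = counted.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN
    return ranked[0][0]


def label(text):
    return majority_vote([lf(text) for lf in LABELING_FUNCTIONS])


# --- Transforming: produce an augmented variant of a data point ---
def tf_swap_greeting(text):
    return text.replace("hello", "hi")


# --- Slicing: flag a subset of the data that deserves separate attention ---
def sf_has_link(text):
    return "http" in text.lower()


docs = [
    "Claim your prize at http://example.com now",
    "see you soon",
    "hello, the quarterly report is attached for your review",
]

labels = [label(d) for d in docs]
augmented = [tf_swap_greeting(d) for d in docs]
link_slice = [d for d in docs if sf_has_link(d)]
```

Here each labeling function may abstain, so noisy and partial rules can still be combined into a single training label per example; the transformation and slicing functions are ordinary Python callables applied over the same data.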
Snorkel has been deployed in industry, medicine, science, and government to build new ML applications in a fraction of the time that hand-labeling requires; for more, see the tutorials and other resources.