A Five-Year Research Project to Democratize AI
Stanford DAWN (Data Analytics for What’s Next) · 2018–2022
Making a machine learning system is a complicated process — but with better tools, we believe any organization could do it.
Better tools are needed
With better data management tools, the process would become easier. The DAWN project set out to research and build these tools. Our vision is that anyone with expertise in their domain — such as a medical lab optimizing clinical procedures or a business group addressing its field-specific problems — can build their own production-quality data products without requiring a team of experts in machine learning.
“It’s hard in grad school to find a project that pulls together so many different collaborators. It was a really cool team, both from industry and grad students. It was really fun rather than the typical grad school solo-journey student experience. I feel grateful about that.”
— Firas Abuzaid
DAWN addresses every step of the ML production process
Today it is easier than ever to choose, adjust, and train machine learning models — the core algorithms that learn from data to produce the desired results. But a model can only do its job if people have gathered a lot of good data for it to learn from, and it can only be useful if people make it widely available and monitor its output for errors. DAWN aimed to make all these steps easier, streamlining the process from beginning to end.
Collecting and preparing data
One of the greatest challenges is to acquire or produce enough data in the first place. Many ML models require huge amounts of training data, and the data often have to be cleaned of errors and labeled with additional information. These tasks often need to be done by hand.
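DAWN's Snorkel project (listed under the flagship projects below) attacks this labeling bottleneck with weak supervision: instead of hand-labeling every example, users write small heuristic "labeling functions," and Snorkel's model denoises their noisy, conflicting votes into training labels. Here is a minimal sketch using the open-source snorkel package's labeling-function idiom; the heuristics and tiny dataset are invented for illustration.

```python
import pandas as pd
from snorkel.labeling import labeling_function, PandasLFApplier
from snorkel.labeling.model import LabelModel

ABSTAIN, NOT_SPAM, SPAM = -1, 0, 1

# Each labeling function encodes one noisy heuristic and may abstain.
@labeling_function()
def lf_mentions_prize(x):
    return SPAM if "prize" in x.text.lower() else ABSTAIN

@labeling_function()
def lf_known_sender(x):
    return NOT_SPAM if x.sender.endswith("@example.edu") else ABSTAIN

df_train = pd.DataFrame({
    "text": ["Claim your prize now!", "Lab meeting moved to 3pm"],
    "sender": ["win@promo.biz", "advisor@example.edu"],
})

# Apply all labeling functions to produce a noisy label matrix, then fit
# Snorkel's generative model to turn it into probabilistic training labels.
applier = PandasLFApplier(lfs=[lf_mentions_prize, lf_known_sender])
L_train = applier.apply(df=df_train)

label_model = LabelModel(cardinality=2, verbose=False)
label_model.fit(L_train, n_epochs=100, seed=0)
probs = label_model.predict_proba(L=L_train)  # labels to train any downstream model
```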
Training and running the model
Thanks to years of ML research, the models and algorithms themselves are often good enough out of the box. The main challenge here is one that affects every step in the process: running systems quickly and cost-effectively when many ML applications are constructed from disparate parts that weren't designed to work together efficiently.
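DAWN's Weld project (listed below) targets exactly this composition cost with a small shared intermediate representation, letting a runtime fuse work across library boundaries. As a rough, hand-written illustration of the effect (not Weld's actual API): chained library calls each make a separate pass over the data and materialize a full intermediate array, while the fused form does the same work in a single pass.

```python
import numpy as np

x = np.random.rand(1_000_000)

# Composition across library calls: every step is its own pass over the
# data and allocates a full-size intermediate array.
def pipeline_unfused(x):
    y = x * 2.0        # pass 1, materializes an intermediate
    z = y + 1.0        # pass 2, materializes another intermediate
    return z.sum()     # pass 3

# What a cross-library optimizer can emit instead: one fused pass with no
# intermediates (written by hand here; Weld derives this from its IR).
def pipeline_fused(x):
    total = 0.0
    for v in x:
        total += v * 2.0 + 1.0
    return total

assert np.isclose(pipeline_unfused(x), pipeline_fused(x))
```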
Flagship projects
MacroBase
MacroBase is a new analytic monitoring engine designed to prioritize human attention in large-scale datasets and data streams.

Snorkel
A system for rapidly creating, modeling, and managing training data, focused on accelerating the development of structured or “dark” data extraction applications for domains in which large labeled training sets are not available or easy to obtain.

Spatial
A new domain-specific language for programming reconfigurable hardware from a parameterized, high-level abstraction.

Weld
A runtime for improving the performance of data-intensive applications. It optimizes across libraries and functions by expressing the core computations in libraries using a small common intermediate representation, similar to CUDA and OpenCL.

NoScope
A system for querying videos at scale with neural networks, accelerating neural network inference by over 1000× by exploiting model specialization and dynamic cascades.

DAWNBench
A benchmark suite for end-to-end deep learning training and inference.

HyperMapper
A multi-objective black-box optimization tool based on Bayesian optimization.
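To make one of these ideas concrete, here is a minimal sketch of the dynamic-cascade idea behind NoScope; the model names, thresholds, and function shape are hypothetical, not NoScope's API. A cheap model specialized to the target video handles the frames it is confident about, and only uncertain frames fall through to the full reference network.

```python
from typing import Callable

def cascade_predict(frame,
                    cheap_model: Callable,      # fast, video-specialized scorer in [0, 1]
                    expensive_model: Callable,  # full reference network, returns 0 or 1
                    lo: float = 0.2, hi: float = 0.8) -> int:
    """Label one video frame, invoking the expensive network only when the
    specialized model is uncertain (lo/hi are illustrative thresholds; in
    practice they are tuned to meet a target accuracy)."""
    p = cheap_model(frame)
    if p <= lo:
        return 0                     # confidently absent: skip the big model
    if p >= hi:
        return 1                     # confidently present: skip the big model
    return expensive_model(frame)    # uncertain: pay the full inference cost
```

Because most frames in a fixed-camera video are easy for the specialized model, the expensive network runs on only a small fraction of them, which is where the large end-to-end speedups come from.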