Stanford DAWN

NoScope: 1000x Faster Deep Learning Queries over Video

Video data is exploding – the UK alone has over 4 million CCTVs, and users upload over 300 hours of video to YouTube every minute. Recent advances in deep learning enable automated analysis of this growing amount of video data, allowing us to query for objects of interest, detect unusual and abnormal events, and sift through lifetimes of video that no human would ever want to watch. However, these deep learning methods are extremely computationally expensive: state-of-the-art methods for object...

HoloClean - Weakly Supervised Data Repairing

Data cleaning and repairing account for about 60% of the work of data scientists. Noisy and erroneous data is a major bottleneck in analytics. To address this bottleneck, we recently introduced HoloClean, a semi-automated data repairing framework that relies on statistical learning and inference to repair errors in structured data. In HoloClean, we build upon the paradigm of weak supervision and demonstrate how to leverage diverse...

Snorkel and The Dawn of Weakly Supervised Machine Learning

In this post, we’ll discuss our approaches to weakly supervising complex machine learning models in the age of big data. Learn more about Snorkel, our system for rapidly creating training sets with weak supervision, at snorkel.stanford.edu. Labeled Training Data: The New New Oil Today’s state-of-the-art machine learning models are both more powerful and easier to spin up than ever before. Whereas practitioners used to spend the bulk of their time carefully engineering features for their models, we can now feed...

A retrospective on NSDI 2017

A group of us at DAWN went to NSDI last month. The program was quite diverse, spanning a wide variety of sub-areas in the networking and distributed systems space. We were excited to see some trends in the research presented that meshed well with the DAWN vision. Greater emphasis on systems for machine learning The machine learning community has spent a lot of time optimizing different machine learning algorithms to achieve better accuracies in different settings. Despite these advances, deploying...

Implementing Weld in Rust

Weld is a runtime and language for high-performance data analytics, developed in the Stanford Infolab. It is implemented in Rust, a modern take on a fast systems programming language. In this blog post we share our experience implementing a low-level systems research project in Rust (with no prior experience with the language). We hope this will help other developers evaluate Rust when choosing a language for their system. First, a bit of background on Weld. The Weld language includes...

A New DAWN for Data Analytics

We are in the golden age of machine learning and artificial intelligence. Sustained algorithmic advances coupled with the availability of massive datasets and fast parallel computing have led to breakthroughs in applications that would have been considered science fiction even a few years ago. Over the past five years, voice-driven personal assistants have become commonplace, image recognition systems have reached human quality, and autonomous vehicles are rapidly becoming broadly available. Given these successes, there is no doubt that machine learning...