Deep Learning Pitfalls Encountered while Developing DAWNBench

In December, we introduced DAWNBench, the first deep learning benchmark focused on end-to-end training and inference time at a state-of-the-art accuracy. Despite the successes of deep learning, achieving state-of-the-art accuracy remains surprisingly difficult, with pitfalls hidden behind inconsistent evaluation, underspecified metrics, complex tuning, and conflicting implementations. This blog post outlines several of the lessons we learned while building DAWNBench, which we hope will save researchers and practitioners time and illustrate the various issues associated with using deep learning in practice. Lesson...

Don't Throw Out Your Algorithms Book Just Yet: Classical Data Structures That Can Outperform Learned Indexes

There’s recently been a lot of excitement about a new proposal from authors at Google: to replace conventional indexing data structures like B-trees and hash maps by instead fitting a neural network to the dataset. The paper compares such learned indexes against several standard data structures and reports promising results: for range searches, up to 3.2x speedups over B-trees while using 9x less memory, and for point lookups, up to an 80% reduction of hash table...
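
As a rough illustration of the learned-index idea (not the paper's implementation), the sketch below fits a linear model to the key-to-position mapping of a sorted array, which is effectively its empirical CDF, and corrects the prediction with a bounded local search; the dataset, the model, and the error-bound handling are all simplifications.

```python
import bisect
import numpy as np

# Sorted keys, as a B-tree or sorted array would store them.
keys = np.sort(np.random.default_rng(0).integers(0, 10**9, size=100_000))
positions = np.arange(len(keys))

# "Learned index" sketch: a linear model mapping key -> position,
# i.e. an approximation of the empirical CDF of the key distribution.
slope, intercept = np.polyfit(keys.astype(float), positions, deg=1)
# The maximum prediction error bounds the local search window.
max_err = int(np.max(np.abs(slope * keys + intercept - positions))) + 1

def learned_lookup(key):
    guess = int(slope * key + intercept)
    lo, hi = max(0, guess - max_err), min(len(keys), guess + max_err + 1)
    # Bounded binary search around the model's guess.
    i = lo + bisect.bisect_left(keys[lo:hi].tolist(), key)
    return i if i < len(keys) and keys[i] == key else None

def btree_like_lookup(key):
    # Stand-in for a conventional index: full binary search over the keys.
    i = bisect.bisect_left(keys.tolist(), key)
    return i if i < len(keys) and keys[i] == key else None

q = int(keys[12345])
i = learned_lookup(q)
assert i is not None and keys[i] == q and i == btree_like_lookup(q)
```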

Programming Training Data: The New Interface Layer for ML

Machine learning today is both far more and far less accessible than ever before. On the one hand, without any manual feature engineering or custom algorithm development, a developer can have a deep learning model downloaded and running at near state-of-the-art accuracy within minutes. On the other hand, machine learning has never been so opaque and inaccessible. Modern deep learning models admit one primary input type—training data—and other than that, are largely black boxes. Given some knowledge of a new domain or...

Introducing DAWNBench: An End-to-end Deep Learning Benchmark and Competition

Deep learning has shown amazing results in tasks ranging from image classification to question answering to machine translation, but these models are extremely costly to train. The deep learning community has developed new software systems, training algorithms, and hardware to optimize deep learning performance. Unfortunately, it’s hard to compare these different optimizations due to the lack of a standard criterion for end-to-end deep learning performance. While there are several existing benchmarks, they measure only proxy metrics,...

Exploiting Building Blocks of Data to Efficiently Create Training Sets

The ability of deep learning models to achieve state-of-the-art performance is grounded in the availability of large, labeled training sets. However, gathering this magnitude of ground truth labels is expensive and time-consuming. While users can write rules that check for specific words or patterns in text data, developing such heuristics for image or video data is challenging since the raw pixels are difficult to interpret. To address this issue, we present Coral, a paradigm that allows users to write heuristics...
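
To make this concrete, here is a hypothetical example (not Coral's actual interface) of heuristics written over interpretable building blocks, in this case bounding boxes produced by an off-the-shelf object detector, rather than over raw pixels; the primitives, label values, and heuristic rules are all illustrative.

```python
from dataclasses import dataclass

# Hypothetical domain-specific primitives extracted from an image,
# e.g. by an off-the-shelf object detector.
@dataclass
class Box:
    label: str
    x: float  # box center x (relative image coordinates)
    y: float  # box center y (y grows downward)
    w: float  # width
    h: float  # height

ABSTAIN, FALSE, TRUE = 0, -1, 1

def hf_person_above_bike(person: Box, bike: Box) -> int:
    """Heuristic: a rider's box center sits above the bike's box center."""
    return TRUE if person.y < bike.y else FALSE

def hf_boxes_overlap(person: Box, bike: Box) -> int:
    """Heuristic: riding implies the two boxes overlap horizontally."""
    overlap = abs(person.x - bike.x) < (person.w + bike.w) / 2
    return TRUE if overlap else ABSTAIN

# Primitives for one unlabeled image.
person, bike = Box("person", 0.5, 0.3, 0.2, 0.5), Box("bike", 0.52, 0.6, 0.3, 0.3)
votes = [hf(person, bike) for hf in (hf_person_above_bike, hf_boxes_overlap)]
print(votes)  # noisy votes to be combined downstream, here [1, 1]
```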

Learning to Compose Domain-Specific Transformations for Data Augmentation

Data augmentation is a popular technique for increasing the size of labeled training sets by applying class-preserving transformations to create copies of labeled data points. In the image domain, it is a crucial factor in almost every state-of-the-art result today. However, the choice of types, parameterizations, and compositions of transformations applied can have a large effect on performance, and is tricky and time-consuming to tune by hand for a new dataset or task. In this blog post we describe our...
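
For readers unfamiliar with the baseline technique, the sketch below shows hand-specified, class-preserving transformations composed at random with NumPy. The specific transformations and selection probabilities are illustrative; this hand-tuned pipeline is exactly what the post discusses automating, not the learned approach itself.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simple class-preserving transformations on an HxWxC image array.
def horizontal_flip(img):
    return img[:, ::-1]

def random_crop(img, pad=4):
    # Pad, then crop back to the original size at a random offset.
    h, w = img.shape[:2]
    padded = np.pad(img, ((pad, pad), (pad, pad), (0, 0)), mode="reflect")
    top, left = rng.integers(0, 2 * pad + 1, size=2)
    return padded[top:top + h, left:left + w]

def augment(img, transforms):
    # Apply a random subset of transformations in a random order.
    chosen = [t for t in transforms if rng.random() < 0.5]
    rng.shuffle(chosen)
    for t in chosen:
        img = t(img)
    return img

image = rng.random((32, 32, 3))    # stand-in for a labeled CIFAR-style image
copy = augment(image, [horizontal_flip, random_crop])
assert copy.shape == image.shape   # the label is unchanged; only the input varies
```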

There and Back Again: A General Approach to Learning Sparse Models

Sparse models – models where only a small fraction of parameters are non-zero – arise frequently in machine learning. Sparsity is beneficial in several ways: sparse models are more easily interpretable by humans, and sparsity can yield statistical benefits, such as reducing the number of examples that must be observed to learn the model. In a sense, we can think of sparsity as an antidote to the oft-maligned curse of dimensionality. In a recent paper, we ask: can...
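
As a generic illustration of how sparse models arise in practice (this is standard L1-style shrinkage, not the method from our paper), the snippet below applies the soft-thresholding operator, which sets small coefficients exactly to zero.

```python
import numpy as np

def soft_threshold(w, lam):
    # Proximal operator of the L1 penalty: shrinks weights toward zero
    # and sets small ones exactly to zero, producing a sparse model.
    return np.sign(w) * np.maximum(np.abs(w) - lam, 0.0)

rng = np.random.default_rng(0)
w = rng.normal(scale=0.1, size=1000)
w[:10] += 2.0                       # a handful of genuinely large coefficients

w_sparse = soft_threshold(w, lam=0.3)
print(f"non-zeros: {np.count_nonzero(w_sparse)} / {w.size}")  # only a small fraction survive
```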

Accelerated Stochastic Power Iteration

Surprisingly, standard acceleration doesn’t always work for stochastic PCA. We provide a very simple stochastic PCA algorithm, based on adding a momentum term to the power iteration, that achieves the optimal sample complexity and an accelerated iteration complexity in terms of the eigengap. Importantly, it is embarrassingly parallel, allowing accelerated convergence in terms of wall-clock time. Our results hinge on a tight variance analysis of a stochastic two-term matrix recurrence, which implies acceleration for a wider class of non-convex problems....
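
A minimal, deterministic sketch of power iteration with a momentum term is below. The toy covariance matrix and the choice of the momentum coefficient from the exact second eigenvalue (which in practice would have to be estimated) are illustrative; the stochastic variant replaces the matrix with mini-batch estimates.

```python
import numpy as np

def power_iteration_momentum(A, beta, iters=200, seed=0):
    """Power iteration with momentum: w_{t+1} = A w_t - beta * w_{t-1}."""
    rng = np.random.default_rng(seed)
    w_prev = np.zeros(A.shape[0])
    w = rng.normal(size=A.shape[0])
    w /= np.linalg.norm(w)
    for _ in range(iters):
        w_next = A @ w - beta * w_prev
        scale = np.linalg.norm(w_next)   # rescale both iterates to avoid overflow
        w_prev, w = w / scale, w_next / scale
    return w

# Toy example: recover the top eigenvector of a sample covariance matrix.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 50))
A = X.T @ X / 500
eigvals = np.linalg.eigvalsh(A)
beta = eigvals[-2] ** 2 / 4              # illustrative choice tied to the second eigenvalue
v = power_iteration_momentum(A, beta)
top = np.linalg.eigh(A)[1][:, -1]
print(abs(v @ top))                      # close to 1
```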

Automatic Time Series Smoothing with ASAP

Dashboard-based visualization is critical in monitoring and diagnosing modern applications and services. However, most time-series dashboards simply plot raw data as it arrives. In a recent paper, we showed that a simple strategy can increase human accuracy in identifying anomalies in time-series visualizations by up to 38% while reducing response time by up to 44%: smooth your dashboards! Moreover, our ASAP.js library will smooth your plots automatically. As a motivating example, consider the two plots of...
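
As a toy illustration of what smoothing buys you, the sketch below applies a plain fixed-window moving average to a synthetic metric. This is not the ASAP algorithm, whose contribution is choosing the window automatically; the data and window size here are made up.

```python
import numpy as np

def moving_average(series, window):
    # Plain moving average; ASAP's contribution is picking `window`
    # automatically so trends stay visible while noise is removed.
    kernel = np.ones(window) / window
    return np.convolve(series, kernel, mode="valid")

# Noisy hourly metric with a weekly pattern (synthetic stand-in for dashboard data).
rng = np.random.default_rng(0)
t = np.arange(24 * 28)
series = 10 + 2 * np.sin(2 * np.pi * t / (24 * 7)) + rng.normal(scale=3, size=t.size)

smoothed = moving_average(series, window=24)
print(series.std(), smoothed.std())  # most of the raw variance is noise; smoothing exposes the weekly trend
```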

Weak Supervision: The New Programming Paradigm for Machine Learning

Getting labeled training data has become the key development bottleneck in supervised machine learning. We provide a broad, high-level overview of recent weak supervision approaches, in which noisier or higher-level supervision is used as a more expedient and flexible way to obtain supervision signal, in particular from subject matter experts (SMEs). We give a simple, broad definition of weak supervision as consisting of one or more noisy conditional distributions over unlabeled data, and focus on the key technical challenge of...
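
To make "noisy, higher-level supervision" concrete, here is an illustrative sketch (not any particular system's API) of a few heuristic label sources whose votes are combined by a simple majority vote; real weak supervision frameworks instead model and reweight each source's accuracy.

```python
import re
import numpy as np

SPAM, HAM, ABSTAIN = 1, 0, -1

# Noisy, heuristic label sources a subject matter expert might write.
def lf_contains_link(text):
    return SPAM if re.search(r"https?://", text) else ABSTAIN

def lf_shouty_words(text):
    return SPAM if re.search(r"\b[A-Z]{4,}\b", text) else ABSTAIN

def lf_short_greeting(text):
    return HAM if len(text.split()) < 6 and "hi" in text.lower() else ABSTAIN

def majority_vote(text, lfs):
    # Unweighted combination of the non-abstaining votes.
    votes = [v for v in (lf(text) for lf in lfs) if v != ABSTAIN]
    return int(np.round(np.mean(votes))) if votes else ABSTAIN

lfs = [lf_contains_link, lf_shouty_words, lf_short_greeting]
docs = ["hi there, lunch?", "Claim your FREE prize at http://spam.example"]
print([majority_vote(d, lfs) for d in docs])  # noisy training labels: [0, 1]
```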