Willump: A Statistically-Aware End-to-end Optimizer for Machine Learning Inference

In this blog post, we introduce Willump, a statistically-aware end-to-end optimizer for machine learning (ML) inference. Most existing ML inference systems, such as Clipper or AWS SageMaker, approach ML inference as an extension of conventional data serving workloads. In contrast, Willump leverages unique properties of ML inference to improve the performance of real-world workloads by up to 23x. Willump targets ML inference workloads whose computational bottleneck is the cost of computing features, especially workloads that use relatively inexpensive models such as...

Powerful Abstractions for Programming Your Training Data

Using standard models (e.g., pretrained BERT) and minimal tuning, we leverage key abstractions for programmatically building and managing training data to achieve a state-of-the-art result on SuperGLUE, a newly curated benchmark with six tasks for evaluating “general-purpose language understanding technologies.” We also give updates on Snorkel’s use in the real world with even more applications, from industrial scale at Google in Snorkel Drybell to scientific work in MRI classification and automated genome-wide association study (GWAS) curation (both accepted in Nature Comms)!...

Butterflies Are All You Need: A Universal Building Block for Structured Linear Maps

We use a type of structured matrix known as a butterfly matrix to learn fast algorithms for discrete linear transforms such as the Discrete Fourier Transform. We further introduce a hierarchy of matrix families based on composing butterfly matrices, which is capable of efficiently representing any structured matrix (any matrix with a fast matrix-vector multiplication algorithm, such as low rank or sparse matrices), with a nearly optimal number of parameters. We experiment with the usage of butterfly matrices for a...

Learning Dependency Structures in Weak Supervision

Recently, weak supervision has been used to efficiently label large-scale training sets without traditional hand-labeled data across applications in academia and industry. However, users cannot always specify which dependencies (i.e., correlations) exist among the weak supervision sources, which could potentially number in the hundreds. We discuss a method to learn the dependency structure of weak supervision sources without using traditional hand-labeled data. A few of the benefits: Improved sample complexity: sublinear, and in some cases logarithmic, in the number of sources,...

Rehashing Kernel Evaluation in High Dimensions

Kernel methods are a class of non-parametric methods used for a wide variety of tasks, including density estimation, regression, clustering, and distribution testing [1]. In MacroBase, for instance, we use Kernel Density Estimation to perform outlier detection for multimodal data. Despite their wide applications and clean theoretical foundation, kernel methods do not scale well to large-scale data: a larger training set improves the accuracy but incurs a quadratic increase in overall evaluation time. This is especially problematic in high...

Get LIT: A New Approach to Compress DNNs by up to 5.5x with no Loss in Accuracy

Check out our paper and our code on GitHub! Modern DNNs are becoming deeper, requiring large amounts of compute resources to deploy. In this blog post, we describe LIT, a compression method better suited to modern DNN architectures than prior work. LIT can provide compression up to 5.5x with no loss in accuracy. LIT improves model compression for modern DNN architectures by taking advantage of multiple intermediate representations of a teacher model to train a shallower, faster student model. LIT...

DeepFreak: Learning Crystallography Diffraction Patterns with Automated Machine Learning

A detailed description can be found in this paper. Crystallography is the science that studies the properties of crystals. It has been a central tool in many disciplines, including chemistry, geology, biology, materials science, metallurgy, and physics, and has led to substantial advances in, for instance, drug development for fighting diseases. In crystallography, a crystal is irradiated with an X-ray beam that strikes the crystal and produces an image with a diffraction pattern (Figure 1, see this video for more...

Massive Multi-Task Learning with Snorkel MeTaL: Bringing More Supervision to Bear

TL;DR: We use Snorkel MeTaL to construct a simple model (pretrained BERT + linear task heads) and incorporate a variety of supervision signals (traditional supervision, transfer learning, multi-task learning, weak supervision, and ensembling) in a Massive Multi-Task Learning (MMTL) setting, achieving a new state-of-the-art score on the GLUE Benchmark and four of its nine component tasks (CoLA, SST-2, MRPC, STS-B). Research is ongoing, with a code release of the MMTL package coming in Snorkel MeTaL v0.5 in April 2019. Designing...

Model Assertions as a Tool for Quality Assurance and Improving ML Models

Machine learning is increasingly being used in real-world domains, such as self-driving cars or healthcare. However, ML models can fail in confusing or complicated ways. For example, autonomous vehicles have suffered multiple incidents where they accelerated into one type of highway lane divider. We believe it is critical to develop tools to ensure model quality and to improve models over time, especially as ML is deployed in mission-critical domains. Prior work on quality assurance for machine learning has focused...

DAWN PI Delivers NeurIPS Keynote

We are excited to have DAWN Principal Investigator Kunle Olukotun presenting some of our latest research advancements in his keynote at the NeurIPS conference tomorrow. To accompany his keynote, we are providing a reading list for some of the topics that will be covered during his talk. HALP: [Updated Manuscript (12/5/18)][Manuscript (3/9/18)][Blog] Stay tuned (updated results and code coming soon)! [Initial Definition and Study of Hardware Versus Statistical Efficiency] Snorkel: [Paper][Website] Multi-Task Learning (Snorkel MeTaL): [Paper][Code] Software 2.0: [Paper][Blog] Spatial: [Paper][Code]...