by Cody Coleman, Daniel Kang, Deepak Narayanan, Peter Bailis, and Matei Zaharia
Building on our experience with DAWNBench, we helped create MLPerf as an industry standard for measuring machine learning system performance. Now that both the MLPerf Training and Inference benchmark suites have successfully launched, we have decided to end rolling submissions to DAWNBench on March 27, 2020, to consolidate benchmarking efforts. Until then, we will continue to accept new submissions via pull requests to dawn-bench-entries. Since the end of the first round of DAWNBench, we have continued to see impressive results from the community. ImageNet...
by Shoumik Palkar and Matei Zaharia
Over the past few years, developers have sought to improve the performance of data science and machine learning applications using JIT compilers such as Weld, TensorFlow XLA, and TorchScript. These compilers have been shown to enable major speedups (up to two orders of magnitude) in applications that use existing high-level APIs such as NumPy and pandas. Unfortunately, they are also difficult to implement, debug, and integrate into existing libraries. For example, the Weld compiler requires thousands of lines of...
by Fred Sala, Ines Chami, Adva Wolf, Albert Gu, Beliz Gunel and Chris Ré
Is our comfortable and familiar Euclidean space, with its linear structure, always the right place for machine learning? Recent research argues otherwise: Euclidean structure is not always necessary and can sometimes even be harmful, as a wave of exciting work demonstrates. Starting with hyperbolic representations for hierarchical data two years ago, a major push has produced new ideas for representations in non-Euclidean spaces, new algorithms and models for non-Euclidean data and operations, and new perspectives on the underlying functionality...
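The hyperbolic representations mentioned here are commonly formulated in the Poincaré ball model, where points live in the open unit ball and distances grow rapidly near the boundary, which is what makes tree-like (hierarchical) data embed well. As a concrete reference point, here is a minimal NumPy sketch of the Poincaré distance; the function name and setup are illustrative, not taken from any particular paper's code:

```python
import numpy as np

def poincare_distance(u, v, eps=1e-9):
    """Distance in the Poincare ball model of hyperbolic space.

    Both u and v must lie strictly inside the unit ball (||x|| < 1).
    d(u, v) = arccosh(1 + 2*||u - v||^2 / ((1 - ||u||^2) * (1 - ||v||^2)))
    """
    uu = np.sum(u * u)
    vv = np.sum(v * v)
    duv = np.sum((u - v) ** 2)
    # eps guards against division by zero for points at the boundary
    x = 1.0 + 2.0 * duv / ((1.0 - uu) * (1.0 - vv) + eps)
    return np.arccosh(x)

# Two points near the origin are close; pushing one toward the
# boundary inflates the hyperbolic distance much faster than the
# Euclidean one -- the property that suits hierarchies.
a = np.array([0.1, 0.2])
b = np.array([0.3, -0.1])
c = np.array([0.9, 0.4])  # close to the boundary
d_ab = poincare_distance(a, b)
d_ac = poincare_distance(a, c)
```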
by Dan Fu, Chris Ré, Kayvon Fatahalian
Many real-world video analysis applications require the ability to identify domain-specific events in video, such as interviews and commercials in TV news broadcasts, or action sequences in film. Pre-trained models to detect all events of interest often do not exist, and training new models from scratch can be costly and labor-intensive. In this blog post, we discuss an alternative approach to specifying new events in video: by writing queries that compose the outputs of existing, pre-trained models using a new...
by Peter Kraft, Daniel Kang, Deepak Narayanan, Shoumik Palkar, Peter Bailis, Matei Zaharia
This blog post has been updated. See the updated post here. In this blog post, we introduce Willump, a statistically-aware end-to-end optimizer for machine learning (ML) inference. Most existing ML inference systems, such as Clipper or AWS Sagemaker, approach ML inference as an extension of conventional data serving workloads. In contrast, Willump leverages unique properties of ML inference to improve the performance of real-world workloads by up to 23x. Willump targets ML inference workloads whose computational bottleneck is the cost of...
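One classic example of the "statistically-aware" optimizations this family of work exploits is a model cascade: serve confident predictions from a cheap approximate model and route only uncertain inputs to the full, expensive model. The sketch below illustrates that general idea only; the function names, threshold rule, and API are assumptions for illustration, not Willump's actual interface:

```python
import numpy as np

def cascade_predict(cheap_scores, threshold, expensive_fn, inputs):
    """Illustrative model cascade for binary classification.

    cheap_scores: probabilities in [0, 1] from an inexpensive model.
    threshold:    how far from 0.5 a score must be to be 'confident'.
    expensive_fn: fallback model, called only on the uncertain subset.
    """
    confident = np.abs(cheap_scores - 0.5) >= threshold
    preds = (cheap_scores >= 0.5).astype(int)
    # Only the uncertain inputs pay the cost of the expensive model.
    if (~confident).any():
        preds[~confident] = expensive_fn(inputs[~confident])
    return preds

scores = np.array([0.9, 0.1, 0.55])          # third input is uncertain
expensive = lambda xs: np.ones(len(xs), int) # stand-in for a slow model
preds = cascade_predict(scores, 0.2, expensive, np.arange(3))
```

The win is statistical: if most real-world inputs are easy, the expensive model runs on only a small fraction of traffic.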
by Sen Wu, Vincent S. Chen, Braden Hancock, Alex Ratner, Chris Ré, and other members of Hazy Lab
Using standard models (i.e. pretrained BERT) and minimal tuning, we leverage key abstractions for programmatically building and managing training data to achieve a state-of-the-art result on SuperGLUE—a newly curated benchmark with six tasks for evaluating “general-purpose language understanding technologies.”1 We also give updates on Snorkel’s use in the real world with even more applications—from industrial scale at Google in Snorkel Drybell to scientific work in MRI classification and automated Genome-wide association study (GWAS) curation (both accepted in Nature Comms)!...
by Tri Dao, Albert Gu, Matthew Eichhorn, Megan Leszczynski, Nimit Sohoni, Amit Blonder, Atri Rudra, and Chris Ré
We use a type of structured matrix known as a butterfly matrix to learn fast algorithms for discrete linear transforms such as the Discrete Fourier Transform. We further introduce a hierarchy of matrix families based on composing butterfly matrices, which can efficiently represent any structured matrix (any matrix with a fast matrix-vector multiplication algorithm, such as low-rank or sparse matrices) with a nearly optimal number of parameters. We experiment with using butterfly matrices for a...
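To make the structure concrete: a butterfly matrix factors into log2(n) sparse factors, each mixing pairs of entries at progressively smaller strides, exactly the access pattern of the FFT. The sketch below builds random real-valued factors just to exhibit that sparsity pattern (learned butterflies parameterize the 2x2 blocks; the helper names here are illustrative):

```python
import numpy as np

def butterfly_factor(n, block_size, rng):
    """One butterfly factor: block-diagonal with blocks of size
    block_size, where each block mixes entry pairs (i, i + block_size/2)
    through a dense 2x2 submatrix. Every row has exactly 2 nonzeros."""
    B = np.zeros((n, n))
    half = block_size // 2
    for start in range(0, n, block_size):
        for i in range(half):
            a, b = start + i, start + i + half
            B[a, a], B[a, b] = rng.standard_normal(2)
            B[b, a], B[b, b] = rng.standard_normal(2)
    return B

def random_butterfly(n, rng):
    """Product of log2(n) factors with block sizes n, n/2, ..., 2.
    Each factor has 2n nonzeros, so applying the factored form to a
    vector costs O(n log n) instead of O(n^2)."""
    M = np.eye(n)
    size = n
    while size >= 2:
        M = M @ butterfly_factor(n, size, rng)
        size //= 2
    return M

rng = np.random.default_rng(0)
F = butterfly_factor(8, 8, rng)   # coarsest factor: pairs (i, i+4)
M = random_butterfly(8, rng)
```

With the right (complex) 2x2 blocks, this same factorization recovers the FFT, which is why butterflies can learn such transforms.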
by Fred Sala, Paroma Varma, Chris Ré
Recently, weak supervision has been used to efficiently label large-scale training sets without traditional hand-labeled data across applications in academia and industry. However, users cannot always specify which dependencies (i.e., correlations) exist among the weak supervision sources, which could potentially number in the hundreds. We discuss a method to learn the dependency structure of weak supervision sources without using traditional hand-labeled data. A few of the benefits of our approach: improved sample complexity: sublinear, and in some cases logarithmic, in the number of sources,...
by Paris Siminelakis*, Kexin Rong*, Peter Bailis, Moses Charikar, Phillip Levis
Kernel methods are a class of non-parametric methods used for a wide variety of tasks including density estimation, regression, clustering, and distribution testing [1]. In MacroBase, for instance, we use Kernel Density Estimation to perform outlier detection on multimodal data. Despite their broad applicability and clean theoretical foundation, kernel methods do not scale well to large-scale data: a larger training set improves accuracy but incurs a quadratic increase in overall evaluation time. This is especially problematic in high...
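The quadratic cost is easy to see in a naive kernel density estimate: every query point touches every one of the n training points, so answering m queries takes O(n * m) kernel evaluations. A minimal NumPy sketch of that baseline (the function name and bandwidth choice are illustrative):

```python
import numpy as np

def gaussian_kde(queries, data, bandwidth):
    """Naive Gaussian kernel density estimate.

    For each of the m query points, average the kernel over all n data
    points -- the O(n * m) cost that motivates faster KDE algorithms.
    queries: (m, d) array; data: (n, d) array.
    """
    # Pairwise squared distances via broadcasting: shape (m, n).
    d2 = ((queries[:, None, :] - data[None, :, :]) ** 2).sum(-1)
    k = np.exp(-d2 / (2.0 * bandwidth ** 2))
    # Mean (unnormalized) kernel value per query point.
    return k.mean(axis=1)

rng = np.random.default_rng(0)
data = rng.standard_normal((1000, 2))
queries = rng.standard_normal((5, 2))
density = gaussian_kde(queries, data, bandwidth=0.5)
```

Doubling the training set doubles the work for every query, which is exactly the scaling problem fast-KDE methods attack.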
by Animesh Koratana*, Daniel Kang*, Peter Bailis, Matei Zaharia
Check out our paper and our code on GitHub! Modern DNNs are becoming deeper, requiring large amounts of compute resources to deploy. In this blog post, we describe LIT, a compression method better suited to modern DNN architectures than prior work. LIT can provide compression up to 5.5x with no loss in accuracy. LIT improves model compression for modern DNN architectures by taking advantage of multiple intermediate representations of a teacher model to train a shallower, faster student model. LIT...
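The core idea, matching a student against multiple intermediate representations (IRs) of a deeper teacher rather than only its final outputs, can be sketched as a loss function. The block below is an illustrative NumPy objective combining a per-block IR penalty with a soft-label KL term; the function name, weighting, and exact terms are assumptions for exposition, not LIT's precise formulation:

```python
import numpy as np

def ir_distillation_loss(teacher_irs, student_irs, logits_t, logits_s,
                         beta=0.5):
    """Illustrative intermediate-representation distillation loss.

    teacher_irs / student_irs: lists of aligned block outputs with
    matching shapes (the student is shallower, but its blocks are
    trained to reproduce the teacher's block outputs).
    logits_t / logits_s: final-layer logits, (batch, classes).
    """
    # Per-block mean-squared error between aligned representations.
    ir_loss = sum(np.mean((t - s) ** 2)
                  for t, s in zip(teacher_irs, student_irs))

    def softmax(x):
        e = np.exp(x - x.max(axis=-1, keepdims=True))
        return e / e.sum(axis=-1, keepdims=True)

    # KL(teacher || student) on the output distributions.
    p, q = softmax(logits_t), softmax(logits_s)
    kl = np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12)), axis=-1).mean()
    return beta * ir_loss + (1.0 - beta) * kl

teacher_irs = [np.ones((2, 3)), np.zeros((2, 4))]
logits = np.array([[1.0, 2.0], [0.5, -0.5]])
perfect = ir_distillation_loss(teacher_irs, teacher_irs, logits, logits)
```

Training deep sections of the student against the teacher's IRs gives it a richer signal than final logits alone, which is what lets the student be shallower and faster.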