DAWN at MLSys 2020

We are excited to present some of our latest research at the MLSys 2020 conference in Austin next week! DAWN researchers are involved in five conference papers and several workshop papers; on top of that, our PI Chris Ré is giving a keynote on Monday, and PI Matei Zaharia is co-organizing the MLOps workshop. Be sure to check out the following talks on our papers at MLSys next week:

Willump: A Statistically-Aware End-to-end Optimizer for Machine Learning Inference

by Peter Kraft · Daniel Kang · Deepak Narayanan · Shoumik Palkar · Peter Bailis · Matei Zaharia

Systems for ML inference are widely deployed today, but they typically optimize ML inference workloads using techniques designed for conventional data serving workloads and neglect the unique statistical properties of ML. In response, we developed Willump, a system for optimizing ML inference performance that introduces two statistically-motivated optimizations targeting ML applications whose performance bottleneck is feature computation. First, Willump automatically cascades feature computation for classification problems: Willump classifies most data inputs using only high-value, low-cost features, computing more expensive features only when needed. Second, Willump accurately approximates ML top-K queries, discarding low-scoring inputs with an automatically constructed approximate model and then ranking the remainder with a more powerful model. We benchmarked Willump on real-world ML inference applications curated from major data science competitions, showing that our optimizations improve performance by up to 10x with minimal loss of accuracy.
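To make the cascade idea concrete, here is a minimal sketch of a statistically-aware prediction cascade in Python. This is not Willump's implementation; the feature functions, models, and confidence threshold are illustrative placeholders.

```python
import numpy as np

def cascade_predict(inputs, cheap_features, costly_features,
                    approx_model, full_model, confidence_threshold=0.9):
    """Illustrative prediction cascade (not Willump's code): classify most
    inputs with low-cost features and an approximate model, and compute the
    expensive features only for inputs the approximate model is unsure about."""
    X_cheap = cheap_features(inputs)             # low-cost features for everything
    probs = approx_model.predict_proba(X_cheap)  # cheap approximate model
    confidence = probs.max(axis=1)
    easy = confidence >= confidence_threshold    # confidently classified inputs
    preds = probs.argmax(axis=1)

    if (~easy).any():
        # Fall back to the full model, paying for costly features only here.
        hard_inputs = [x for x, is_easy in zip(inputs, easy) if not is_easy]
        X_full = np.hstack([X_cheap[~easy], costly_features(hard_inputs)])
        preds[~easy] = full_model.predict(X_full)
    return preds
```

The design point this illustrates is that the expensive feature computation is skipped entirely for inputs the cheap model already classifies with high confidence.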

Understanding the Downstream Instability of Word Embeddings

by Megan Leszczynski · Avner May · Jian Zhang · Sen Wu · Christopher Aberger · Christopher Ré

Frequent retraining of ML models to ensure model freshness exacerbates a large challenge facing ML models: model training can be unstable, meaning small changes in data can result in large changes in predictions. In this work, we focus on the instability of a core building block of many NLP pipelines, word embeddings. First, we perform a study on the impact of embedding hyperparameters on downstream instability, exposing a stability-memory tradeoff: increasing the memory decreases the downstream instability. To theoretically understand this tradeoff, we introduce the eigenspace instability measure and prove it bounds the disagreement in downstream predictions introduced by the change in word embeddings. Finally, to practically select embedding hyperparameters to minimize instability, we evaluate various embedding distance measures as selection criteria for embedding hyperparameters. We demonstrate that the theoretically grounded eigenspace instability measure and a nearest neighbor-based measure outperform other methods of selecting hyperparameters to minimize instability without actually training downstream models.
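As a rough illustration of the kind of selection criterion evaluated in the paper, the sketch below computes a simple nearest-neighbor overlap between two embedding matrices trained on slightly different data. It is a simplified stand-in, not the eigenspace instability measure itself, and it assumes both embeddings share the same vocabulary ordering.

```python
import numpy as np

def knn_overlap(emb_a, emb_b, vocab_sample, k=10):
    """Simplified nearest-neighbor stability measure (illustrative only):
    for a sample of word indices, compare the k nearest neighbors of each
    word under two embedding matrices that share a vocabulary ordering."""
    def neighbors(emb, idx):
        unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)  # cosine similarity
        sims = unit @ unit[idx]
        return set(np.argsort(-sims)[1:k + 1])                   # drop the word itself

    overlaps = [len(neighbors(emb_a, i) & neighbors(emb_b, i)) / k
                for i in vocab_sample]
    return float(np.mean(overlaps))  # closer to 1.0 means more stable neighborhoods
```

A criterion like this can be computed directly on the embeddings, which is exactly what makes it useful for choosing hyperparameters without training downstream models.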

Model Assertions for Monitoring and Improving ML Models

by Daniel Kang · Deepti Raghavan · Peter Bailis · Matei Zaharia

Machine learning models are increasingly deployed in settings with real-world interactions, such as vehicles, but unfortunately these models can fail in systematic ways. To prevent errors, ML engineering teams monitor and continuously improve these models. We propose a new abstraction, model assertions, that adapts the classical use of program assertions as a way to monitor and improve ML models. We introduce various ways of using model assertions both at runtime and training time, as well as a consistency API to help developers write model assertions. At runtime, we show model assertions can find high confidence errors, where a model returns incorrect output with high confidence. We also propose two methods to use model assertions at training time, for active learning and weak supervision. We introduce a new bandit-based algorithm to use assertions for active learning and show that it can reduce labeling costs by up to 40% over traditional uncertainty-based methods.
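To give a flavor of the abstraction, here is an illustrative model assertion over per-frame object detections that flags identities disappearing for a single frame. The API and data layout here are hypothetical, not the paper's actual interface.

```python
def flag_flickering_detections(detections_by_frame):
    """Illustrative model assertion (hypothetical API, not the paper's code):
    flag frames where an object identity is detected before and after a frame
    but is missing in that frame, a likely systematic detector error."""
    flagged = []
    for t in range(1, len(detections_by_frame) - 1):
        prev_ids = {d["track_id"] for d in detections_by_frame[t - 1]}
        curr_ids = {d["track_id"] for d in detections_by_frame[t]}
        next_ids = {d["track_id"] for d in detections_by_frame[t + 1]}
        # Present before and after frame t, but not at t: a "flicker".
        missing = (prev_ids & next_ids) - curr_ids
        if missing:
            flagged.append((t, missing))  # candidates for review or relabeling
    return flagged
```

Inputs flagged this way could then be routed to labeling (active learning) or used to generate corrective labels (weak supervision), as described above.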

Improving the Accuracy, Scalability, and Performance of Graph Neural Networks with Roc

by Zhihao Jia · Sina Lin · Mingyu Gao · Matei Zaharia · Alex Aiken

Graph neural networks (GNNs) have been demonstrated to be an effective model for learning tasks related to graph-structured data. Unlike classical deep neural networks, which handle relatively small individual samples, GNNs process very large graphs, which must be partitioned and processed in a distributed manner. We present Roc, a distributed multi-GPU framework for fast GNN training and inference on graphs. Roc is up to 4x faster than existing GNN frameworks on a single machine, and can scale to multiple GPUs on multiple machines. This performance gain is mainly enabled by Roc's graph partitioning and memory management optimizations. Beyond performance acceleration, the better scalability of Roc also enables the exploration of more sophisticated GNN architectures on large, real-world graphs. We demonstrate that a class of GNN architectures significantly deeper and larger than the typical two-layer models can achieve new state-of-the-art classification accuracy on the widely used Reddit dataset.
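As a toy illustration of load-balanced graph partitioning, the sketch below splits nodes into contiguous, cost-weighted ranges, one per device. Roc's actual partitioner is considerably more sophisticated, so treat this purely as a conceptual stand-in with made-up inputs.

```python
import numpy as np

def partition_nodes(num_nodes, num_devices, costs=None):
    """Toy load-balanced node partitioning (conceptual stand-in only):
    split nodes into contiguous ranges so each device receives roughly
    equal total per-node cost."""
    costs = np.ones(num_nodes) if costs is None else np.asarray(costs, dtype=float)
    # Fraction of total cost accumulated up to each node, scaled by device count.
    buckets = np.cumsum(costs) / costs.sum() * num_devices
    assignment = np.minimum(buckets.astype(int), num_devices - 1)
    return [np.flatnonzero(assignment == d) for d in range(num_devices)]
```

The point of balancing by per-node cost rather than node count is that GNN work per node varies widely with degree, which is one reason partitioning quality matters so much for multi-GPU training.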

MLPerf Training Benchmark

by Peter Mattson · Christine Cheng · Gregory Diamos · Cody Coleman · Paulius Micikevicius · David Patterson · Hanlin Tang · Gu-Yeon Wei · Peter Bailis · Victor Bittorf · David Brooks · Dehao Chen · Debo Dutta · Udit Gupta · Kim Hazelwood · Andy Hock · Xinyuan Huang · Daniel Kang · David Kanter · Naveen Kumar · Jeffery Liao · Deepak Narayanan · Tayo Oguntebi · Gennady Pekhimenko · Lillian Pentecost · Vijay Janapa Reddi · Taylor Robie · Tom St John · Carole-Jean Wu · Lingjie Xu · Cliff Young · Matei Zaharia

Machine learning (ML) performance benchmarking is critical to the design and competitive evaluation of the many software and hardware solutions for ML that are becoming common today. Unlike other computational workloads, ML training presents many unique benchmarking challenges: (1) optimizations can impact both throughput and accuracy, (2) training is inherently stochastic, and (3) implementations vary significantly across software and hardware systems. As part of a large commercial and academic collaboration, we created MLPerf to address these issues. Building on DAWNBench, MLPerf continues our approach of end-to-end benchmarking for ML system performance, but extends this methodology to more tasks and introduces an additional “Closed” division to enable easier comparisons between hardware and software systems. The first two rounds of the MLPerf Training benchmark helped drive improvements to software-stack performance and scalability, showing a 1.3x speedup in the top 16-chip results despite higher quality targets and a 5.5x increase in system scale.
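For readers unfamiliar with end-to-end ML benchmarking, the sketch below shows the basic "time to target quality" measurement that DAWNBench popularized and MLPerf builds on. It is a minimal illustration, not the official MLPerf harness; the training and evaluation callbacks are assumed.

```python
import time

def time_to_quality(train_one_epoch, evaluate, target_quality, max_epochs=100):
    """Minimal sketch of end-to-end "time to target quality" measurement
    (not the official MLPerf harness): train until the validation metric
    reaches the target and report wall-clock time and epochs used."""
    start = time.time()
    for epoch in range(max_epochs):
        train_one_epoch()     # user-supplied training pass over one epoch
        quality = evaluate()  # user-supplied validation metric
        if quality >= target_quality:
            return {"epochs": epoch + 1,
                    "seconds": time.time() - start,
                    "quality": quality}
    raise RuntimeError("target quality not reached within max_epochs")
```

Measuring time to a fixed quality target, rather than raw throughput, is what lets end-to-end benchmarks account for optimizations that trade accuracy for speed.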

Workshop Presentations

We’re also presenting several pieces of work at MLSys workshops. PI Matei Zaharia is co-organizing the MLOps workshop, which will include a poster on Efficient Scheduling of DNN Training on Multitenant Clusters from DAWN students. We’re also presenting work on efficient GNN training and efficient sparse deep learning on GPUs in the Resource-constrained ML workshop and on the Taurus intelligent data plane in the ML for networking workshop. Drop by our workshop talks to see some of the latest research we are doing at DAWN.