End-to-End Optimization for Data Analytics with Weld

by Shoumik Palkar, James Thomas, Deepak Narayanan, Pratiksha Thaker, Parimarjan Negi, Rahul Palamuttam, and Matei Zaharia
30 Jul 2018

Weld is an open source project, with an initial prototype described in a CIDR 2017 paper. This blog post describes the adaptive optimizer in Weld, which we present in our VLDB 2018 paper. Analytics applications compose a diverse mix of software libraries and functions, such as Pandas to manipulate tables, NumPy for numerical processing, and TensorFlow for machine learning. These libraries allow developers to combine fast, state-of-the-art algorithms from a variety of domains into powerful processing pipelines. Unfortunately, even if...

Announcing Rolling Submissions for DAWNBench

by Cody Coleman, Deepak Narayanan, Daniel Kang, Peter Bailis, and Matei Zaharia
25 Jul 2018

Following the successful completion of the DAWNBench v1 competition, we are re-opening DAWNBench to allow rolling submissions. We’re eager to see the community continue to innovate and improve on optimizing for time-to-accuracy in deep learning, so starting today, we will accept new pull requests to dawn-bench-entries. The tasks, thresholds, metrics, and instructions are still the same as DAWNBench v1, but with two changes to the reviewing process: We will only review submissions that are in the top 5 results for...

Using Provenance to Debug Training Data for Software 2.0

by Paroma Varma, Braden Hancock, and Chris Ré
21 Jun 2018

Debugging training set labels is challenging since they are often generated via black-box processes. We describe our work aggregating labels from user-defined heuristics[1], [2], machine-generated heuristics[1], and natural language explanations[1] as a step towards systematic debugging. Training sets are often aggregated from multiple imperfect sources, which can lead to systematic errors in the training set. Opening the black-box of how training labels are generated can help debug training sets and improve end model predictions. We look at how our work...

An Analysis of DAWNBench v1, a Time-to-Accuracy Benchmark for Deep Learning

by Deepak Narayanan, Daniel Kang, Cody Coleman, Tian Zhao, Jian Zhang, Luigi Nardi, Peter Bailis, Kunle Olukotun, Chris Ré, and Matei Zaharia
19 Jun 2018

As the cost of training deep learning models has increased, the community has proposed a range of hardware, software, and statistical optimizations to decrease this cost. While some of these optimizations simply run the same operations faster (e.g., upgrading from a K80 to a P100), others (e.g., asynchronous SGD, reduced precision) trade off statistical performance (number of iterations needed to obtain a certain accuracy) for improved hardware performance (time needed for each iteration). To understand these trade-offs, we created DAWNBench...

DAWNBench v1 Deep Learning Benchmark Results

by Cody Coleman, Deepak Narayanan, Daniel Kang, Peter Bailis, and Matei Zaharia
30 Apr 2018

April 20th, 2018 marked the end of our first iteration of DAWNBench, the first deep learning benchmark and competition that measures end-to-end performance: the time/cost required to achieve a state-of-the-art accuracy level for common deep learning tasks, as well as the latency/cost of inference at this state-of-the-art accuracy level. Focusing on end-to-end performance provided an objective means of normalizing across differences in computation frameworks, hardware, optimization algorithms, hyperparameter settings, and other factors that affect real-world performance. Thanks to innovative submissions...

The last decade of database research and its blindingly bright future. or Database Research: A love song.

by Michael Cafarella and Chris Ré
11 Apr 2018

To go by Twitter and many hallway conversations, the database research community has been unsettled lately in a way that we have never seen before. Many people are unhappy with the review process, many types of useful work seem to be more difficult to pursue, and our relationship with adjacent fields such as machine learning is unclear. Turing Award winner – and giant of the field – Mike Stonebraker made some (though not all) of these points in a recent...

Weld v0.2.0 Released with New Features and Improved Performance

by The Weld Developers
22 Mar 2018

The Weld developers are happy to announce a new version of Weld, v0.2.0. Weld is a language and runtime for fast in-memory data analytics. It enables optimizations across operators within existing libraries as well as operators across Weld-enabled libraries. We have also released new versions of two Weld-enabled Python libraries: Grizzly v0.0.5 and weldnumpy v0.0.1. Grizzly is an accelerated subset of the Pandas data frame library, and weldnumpy accelerates the NumPy numerical computing library.

What’s New in Weld v0.2.0

The...

Hyperbolic Embeddings with a Hopefully Right Amount of Hyperbole

by Chris De Sa, Albert Gu, Chris Ré, and Fred Sala
19 Mar 2018

Check out our paper on arXiv, and our code on GitHub! Valuable knowledge is encoded in structured data such as carefully curated databases, graphs of disease interactions, and even low-level information like hierarchies of synonyms. Embedding these structured, discrete objects in a way that can be used with modern machine learning methods, including deep learning, is challenging. Fundamentally, the problem is that these objects are discrete and structured, while much of machine learning works on continuous and unstructured data. Recent...

HALP: High-Accuracy Low-Precision Training

by Chris De Sa, Megan Leszczynski, Jian Zhang, Alana Marzoev, Chris Aberger, Kunle Olukotun, and Chris Ré
09 Mar 2018

Using fewer bits of precision to train machine learning models limits training accuracy—or does it? This post describes cases in which we can get high-accuracy solutions using low-precision computation via a technique called bit recentering, and our theory to explain what's going on. Low-precision computation has been gaining a lot of traction in machine learning. Companies have even started developing new hardware architectures that natively support and accelerate low-precision operations, including Microsoft's Project Brainwave and Google's TPU. Even though using...

Stanford DAWN at SysML 2018

by Deepak Narayanan
08 Mar 2018

The DAWN PIs recently helped start a new research conference called SysML that targets research at the intersection of Systems and Machine Learning. The first conference was very well-attended, with over 200 poster submissions and sold-out registration, demonstrating the huge interest in this new and evolving research area from both academia and industry. At SysML, many members of DAWN presented posters about our latest research; in this post, we highlight the work we presented.

Accelerating Model Search with Model Batching...
