2018

A Formal Framework For Probabilistic Unclean Databases
Christopher De Sa, Ihab F Ilyas, Benny Kimelfeld, Christopher Ré, Theodoros Rekatsinas
arXiv Preprint, 2018

High-Accuracy Low-Precision Training
Christopher De Sa, Megan Leszczynski, Jian Zhang, Alana Marzoev, Christopher R Aberger, Kunle Olukotun, Christopher Ré
arXiv Preprint, 2018 [blog]

Evaluating End-to-end Optimization for Data Analytics Applications in Weld
Shoumik Palkar, James Thomas, Deepak Narayanan, Pratiksha Thaker, Rahul Palamuttam, Parimarjan Negi, Anil Shanbhag, Malte Schwarzkopf, Holger Pirk, Saman Amarasinghe, Samuel Madden, Matei Zaharia
VLDB, 2018 [blog]

Filter Before You Parse: Faster Analytics on Raw Data with Sparser
Shoumik Palkar, Firas Abuzaid, Peter Bailis, Matei Zaharia
VLDB, 2018 [blog]

Locality-sensitive Hashing for Earthquake Detection: A Case Study of Scaling Data-driven Science
Kexin Rong, Clara Yoon, Karianne Bergen, Hashem Elezabi, Peter Bailis, Philip Levis, Gregory Beroza
VLDB, 2018 [blog]

Moment-Based Quantile Sketches for Efficient High Cardinality Aggregation Queries
Edward Gan, Jialin Ding, Kai Sheng Tai, Vatsal Sharan, Peter Bailis
VLDB, 2018 [blog]

Snorkel: Rapid Training Data Creation with Weak Supervision
Alexander Ratner, Stephen Bach, Henry Ehrenberg, Jason Fries, Sen Wu, Christopher Ré
VLDB 2018 [blog]

Efficient Mergeable Quantile Sketches using Moments
Edward Gan, Jialin Ding, Peter Bailis
SysML 2018 Poster.

PipeDream: Pipeline Parallelism for DNN Training
Aaron Harlap, Deepak Narayanan, Amar Phanishayee, Vivek Seshadri, Gregory Ganger, Phillip Gibbons
SysML 2018 Poster.

Accelerating Model Search with Model Batching
Deepak Narayanan, Keshav Santhanam, Matei Zaharia
SysML 2018 Poster.

BlazeIt: An Optimizing Query Engine for Video at Scale
Daniel Kang, Peter Bailis, Matei Zaharia
SysML 2018 Poster.

A Two-pronged Progress in Structured Dense Matrix Vector Multiplication
Christopher De Sa, Albert Gu, Rohan Puttagunta, Christopher Ré, Atri Rudra
SODA 2018

Sketching Linear Classifiers on Data Streams
Kai Sheng Tai, Vatsal Sharan, Peter Bailis, Gregory Valiant
SIGMOD 2018 [blog]

Fonduer: Knowledge Base Construction from Richly Formatted Data
Sen Wu, Luke Hsiao, Xiao Cheng, Braden Hancock, Theodoros Rekatsinas, Philip Levis, Christopher Ré
SIGMOD 2018 [blog]

Snorkel MeTaL: Weak Supervision for Multi-Task Learning
Alex Ratner, Braden Hancock, Jared Dunnmon, Roger Goldman, Christopher Ré
Proceedings of the Second Workshop on Data Management for End-To-End Machine Learning, 2018

Spatial: A Language and Compiler for Application Accelerators
David Koeplinger, Matthew Feldman, Raghu Prabhakar, Yaqi Zhang, Stefan Hadjis, Ruben Fiszel, Tian Zhao, Luigi Nardi, Ardavan Pedram, Christos Kozyrakis, Kunle Olukotun
PLDI 2018

Systematic Protein Prioritization for Targeted Proteomics Studies through Literature Mining
Kun-Hsing Yu, Tsung-Lu Lee, Chi-Shiang Wang, Yu-Ju Chen, Christopher Ré, Samuel C Kou, Jung-Hsien Chiang, Isaac S Kohane, Michael Snyder
Journal of Proteome Research, 2018

Efficient Octree-Based Volumetric SLAM Supporting Signed-Distance and Occupancy Mapping
Emanuele Vespa, Nikolay Nikolov, Marius Grimm, Luigi Nardi, Paul HJ Kelly, Stefan Leutenegger
IEEE Robotics and Automation Letters, 2018

SLAMBench2: Multi-Objective Head-to-Head Benchmarking for Visual SLAM
Bruno Bodin, Harry Wagstaff, Sajad Saeedi, Luigi Nardi, Emanuele Vespa, John H Mayer, Andy Nisbet, Mikel Luján, Steve Furber, Andrew J Davison, Paul H.J. Kelly, Michael O’Boyle
ICRA 2018

Representation Tradeoffs for Hyperbolic Embeddings
Frederic Sala, Chris De Sa, Albert Gu, Christopher Ré
ICML, 2018

Learning Invariance with Compact Transforms
Anna Thomas, Albert Gu, Tri Dao, Atri Rudra, Christopher Ré
ICLR Workshop Track, 2018

LevelHeaded: A Unified Engine for Business Intelligence and Linear Algebra Querying
Christopher R. Aberger, Andrew Lamb, Kunle Olukotun, Christopher Ré
ICDE 2018

Exploring the Utility of Developer Exhaust
Jian Zhang, Max Lam, Stephanie Wang, Paroma Varma, Luigi Nardi, Kunle Olukotun, Christopher Ré
DEEM’18, 2018

Title Generation for Web Tables
Braden Hancock, Hongrae Lee, Cong Yu
arXiv Preprint, 2018

Accelerated stochastic power iteration
Christopher De Sa, Bryan He, Ioannis Mitliagkas, Christopher Ré, Peng Xu
AISTATS, 2018 [blog]

Training Classifiers with Natural Language Explanations
Braden Hancock, Paroma Varma, Stephanie Wang, Martin Bringmann, Percy Liang, Christopher Ré
ACL, 2018

2017

AMELIE accelerates Mendelian patient diagnosis directly from the primary literature
Johannes Birgmeier, Maximilian Haeussler, Cole A Deisseroth, Karthik A Jagadeesh, Alexander J Ratner, Harendra Guturu, Aaron M Wenger, Peter D Stenson, David N Cooper, Christopher Ré, others
bioRxiv, 2017

ASAP: Prioritizing Attention via Time Series Smoothing
Kexin Rong, Peter Bailis
VLDB 2017 [demo] [blog] [talk] [slides] [code]

NoScope: Optimizing Neural Network Queries over Video at Scale
Daniel Kang, John Emmons, Firas Abuzaid, Peter Bailis, Matei Zaharia
VLDB 2017 [blog] [slides] [code]

HoloClean: Holistic Data Repairs with Probabilistic Inference
Theodoros Rekatsinas, Xu Chu, Ihab F Ilyas, Christopher Ré
VLDB 2017 [blog]

Mind the Gap: Bridging Multi-Domain Workloads with EmptyHeaded
Christopher R. Aberger, Andrew Lamb, Kunle Olukotun, Christopher Ré
VLDB 2017 (Demo)

DAWNBench: An End-to-End Deep Learning Benchmark and Competition
Cody Coleman, Deepak Narayanan, Daniel Kang, Tian Zhao, Jian Zhang, Luigi Nardi, Peter Bailis, Kunle Olukotun, Chris Ré, Matei Zaharia
SOSP MLSys Workshop, 2017

Stadium: A Distributed Metadata-Private Messaging System
Nirvan Tyagi, Yossi Gilad, Derek Leung, Matei Zaharia, Nickolai Zeldovich
SOSP 2017

SLiMFast: Guaranteed Results for Data Fusion and Source Reliability
Manas Joglekar, Theodoros Rekatsinas, Hector Garcia-Molina, Aditya Parameswaran, Christopher Ré
SIGMOD, 2017 [blog]

Scalable Kernel Density Classification via Threshold-Based Pruning
Edward Gan, Peter Bailis
SIGMOD 2017 [talk] [slides] [code]

Demonstration: MacroBase, A Fast Data Analysis Engine
Peter Bailis, Edward Gan, Kexin Rong, Sahaana Suri
SIGMOD 2017 (Demo)

MacroBase: Prioritizing Attention in Fast Data
Peter Bailis, Edward Gan, Samuel Madden, Deepak Narayanan, Kexin Rong, Sahaana Suri
SIGMOD 2017 Awarded “Best of SIGMOD 2017”. [code]

Snorkel: Fast Training Set Generation for Information Extraction
Alexander J. Ratner, Stephen H. Bach, Henry R. Ehrenberg, Christopher Ré
SIGMOD 2017 (Demo) [code] [coverage]

Deep Learning at 15PF: Supervised and Semi-Supervised Classification for Scientific Data
Thorsten Kurth, Jian Zhang, Nadathur Satish, Ioannis Mitliagkas, Evan Racah, Mostofa Ali Patwary, Tareq Malas, Narayanan Sundaram, Wahid Bhimji, Mikhail Smorkalov, Jack Deslippe, Mikhail Shiryaev, Srinivas Sridharan, Prabhat, Pradeep Dubey
SC 2017

ShortFuse: Biomedical Time Series Representations in the Presence of Structured Information
Madalina Fiterau, Suvrat Bhooshan, Jason Fries, Charles Bournhonesque, Jennifer Hicks, Eni Halilaj, Christopher Ré, Scott Delp
Proceedings of the Machine Learning in Healthcare Conference 2017 [slides]

Flipper: A Systematic Approach to Debugging Training Sets
Paroma Varma, Dan Iter, Christopher De Sa, Christopher Ré
Proceedings of the 2nd Workshop on Human-In-the-Loop Data Analytics 2017

DIY Hosting for Online Privacy
Shoumik Palkar, Matei Zaharia
Proceedings of the 16th ACM Workshop on Hot Topics in Networks 2017

A Relational Framework for Classifier Engineering
Benny Kimelfeld, Christopher Ré
PODS 2017 Best of PODS 2017.

Splinter: Practical Private Queries on Public Data
Frank Wang, Catherine Yun, Shafi Goldwasser, Vinod Vaikuntanathan, Matei Zaharia
NSDI 2017 [coverage]

Cross-Modal Data Programming for Medical Images
Nishith Khandwala, Alexander Ratner, Jared Dunnmon, Roger Goldman, Matt Lungren, Daniel Rubin, Christopher Ré
NIPS ML4H Workshop 2017

Automatic Training Set Generation for Aortic Valve Classification
Vincent Chen, Paroma Varma, Madalina Fiterau, Seung-Pyo Lee, James Priest, Christopher Ré
NIPS ML4H Workshop 2017

Generating Training Labels for Cardiac Phase-Contrast MRI Images
Vincent Chen, Paroma Varma, Madalina Fiterau, Seung-Pyo Lee, James Priest, Christopher Ré
NIPS ML4H Workshop 2017

Babble Labble: Learning from Natural Language Explanations
Braden Hancock, Stephanie Wang, Paroma Varma, Percy Liang, Christopher Ré
NIPS Demonstration 2017 [blog]

Inferring Generative Model Structure with Static Analysis
Paroma Varma, Bryan He, Payal Bajaj, Imon Banerjee, Nishith Khandwala, Daniel L. Rubin, Christopher Ré
NIPS 2017

Learning to Compose Domain-Specific Transformations for Data Augmentation
Alexander J. Ratner, Henry R. Ehrenberg, Zeshan Hussain, Jared Dunnmon, Christopher Ré
NIPS 2017 [blog] [code]

Gaussian Quadrature for Kernel Features
Tri Dao, Chris De Sa, Christopher Ré
NIPS 2017 Spotlight.

DAWNBench: An End-to-End Deep Learning Benchmark and Competition
Cody Coleman, Deepak Narayanan, Daniel Kang, Tian Zhao, Jian Zhang, Luigi Nardi, Peter Bailis, Kunle Olukotun, Chris Ré, Matei Zaharia
ML System Workshops at NIPS 2017

Learning the structure of generative models without labeled data
Stephen H Bach, Bryan He, Alexander Ratner, Christopher Ré
International Conference on Machine Learning 2017 [blog]

Understanding and optimizing asynchronous low-precision stochastic gradient descent
Christopher De Sa, Matthew Feldman, Christopher Ré, Kunle Olukotun
ISCA 2017

Plasticine: A Reconfigurable Architecture For Parallel Patterns
David Koeplinger, Raghu Prabhakar, Yaqi Zhang, Matt Feldman, Tian Zhao, Stefan Hadjis, Christos Kozyrakis, Kunle Olukotun
ISCA 2017

Learning the Structure of Generative Models without Labeled Data
Bryan He, Christopher M De Sa, Ioannis Mitliagkas, Christopher Ré
ICML 2017

GYM: A multiround join algorithm in MapReduce
Foto Afrati, Manas Joglekar, Christopher Ré, Semih Salihoglu, Jeffrey D Ullman
ICDT, 2017

Association of Omics Features with Histopathology Patterns in Lung Adenocarcinoma
Kun-Hsing Yu, Gerald J Berry, Daniel L Rubin, Christopher Ré, Russ B Altman, Michael Snyder
Cell Systems, 2017

Snorkel: A System for Lightweight Extraction
Alexander Ratner, Stephen H Bach, Henry R Ehrenberg, Jason Alan Fries, Sen Wu, Christopher Ré
CIDR 2017

Prioritizing Attention in Fast Data: Principles and Promise
Peter Bailis, Edward Gan, Kexin Rong, Sahaana Suri
CIDR 2017

Weld: A Common Runtime for High Performance Data Analytics
Shoumik Palkar, James J Thomas, Anil Shanbhag, Deepak Narayanan, Holger Pirk, Malte Schwarzkopf, Saman Amarasinghe, Matei Zaharia
CIDR 2017 [code]

YellowFin and the Art of Momentum Tuning
Jian Zhang, Ioannis Mitliagkas, Christopher Ré
AutoML Workshop at ICML 2017 [blog] [code]

There and Back Again: A General Approach to Learning Sparse Models
Vatsal Sharan, Kai Sheng Tai, Peter Bailis, Gregory Valiant
arXiv Preprint, 2017

Report from the third workshop on Algorithms and Systems for MapReduce and Beyond (BeyondMR’16)
Foto N Afrati, Jan Hidders, Christopher Ré, Jacek Sroka, Jeffrey Ullman
ACM SIGMOD Record, 2017

2016

Emptyheaded: A relational engine for graph processing
Christopher R Aberger, Susan Tu, Kunle Olukotun, Christopher Ré
SIGMOD 2016 Awarded “Best of SIGMOD 2016”. [slides]

Predicting non-small cell lung cancer prognosis by fully automated microscopic pathology image features
Kun-Hsing Yu, Ce Zhang, Gerald J Berry, Russ B Altman, Christopher Ré, Daniel L Rubin, Michael Snyder
Nature Communications, 2016 [coverage]

Data Programming: Creating Large Training Sets, Quickly
Alexander J. Ratner, Christopher M. De Sa, Sen Wu, Daniel Selsam, Christopher Ré
NIPS 2016 [blog] [talk] [code] [coverage]

Scan order in Gibbs sampling: Models in which it matters and bounds on how much
Bryan He, Christopher M. De Sa, Ioannis Mitliagkas, Christopher Ré
NIPS 2016

Yggdrasil: An Optimized System for Training Deep Decision Trees at Scale
Firas Abuzaid, Joseph K Bradley, Feynman T Liang, Andrew Feng, Lee Yang, Matei Zaharia, Ameet S Talwalkar
NIPS 2016 [talk] [slides] [code]

Automatic generation of efficient accelerators for reconfigurable hardware
David Koeplinger, Christina Delimitrou, Raghu Prabhakar, Christos Kozyrakis, Yaqi Zhang, Kunle Olukotun
ISCA 2016 [slides]

Ensuring rapid mixing and low bias for asynchronous Gibbs sampling
Christopher De Sa, Kunle Olukotun, Christopher Ré
ICML 2016 Best Paper Award. [slides]

Old Techniques for New Join Algorithms: A Case Study in RDF Processing
Christopher R. Aberger, Susan Tu, Kunle Olukotun, Christopher Ré
ICDE Workshops 2016 [code]

Have abstraction and eat performance, too: Optimized heterogeneous computing with parallel patterns
Kevin J Brown, HyoukJoong Lee, Tiark Rompf, Arvind K Sujeeth, Christopher De Sa, Christopher Aberger, Kunle Olukotun
CGO 2016

Asynchrony begets momentum, with an application to deep learning
Ioannis Mitliagkas, Ce Zhang, Stefan Hadjis, Christopher Ré
Allerton 2016 [blog]

Generating configurable hardware from parallel patterns
Raghu Prabhakar, David Koeplinger, Kevin J Brown, HyoukJoong Lee, Christopher De Sa, Christos Kozyrakis, Kunle Olukotun
ASPLOS 2016