DAWNBench

An End-to-End Deep Learning Benchmark and Competition

What is DAWNBench?

DAWNBench is a benchmark suite for end-to-end deep learning training and inference. Computation time and cost are critical resources in building deep models, yet many existing benchmarks focus solely on model accuracy. DAWNBench provides a reference set of common deep learning workloads for quantifying training time, training cost, inference latency, and inference cost across different optimization strategies, model architectures, software frameworks, clouds, and hardware.
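The central idea behind these metrics is time-to-accuracy: rather than reporting accuracy alone, a submission trains until it first reaches a fixed quality threshold and reports the wall-clock time (and cost) spent getting there. Below is a minimal sketch of such a harness; the function names (`time_to_accuracy`, `train_epoch`, `evaluate`) and the toy training loop are hypothetical illustrations, not the official DAWNBench tooling.

```python
import time

def time_to_accuracy(train_epoch, evaluate, target_accuracy, max_epochs=100):
    """Run training epochs until validation accuracy first reaches the target.

    Returns (epochs_used, elapsed_seconds), or None if the target is never
    reached within max_epochs. `train_epoch` and `evaluate` are
    caller-supplied callables (hypothetical interface for illustration).
    """
    start = time.time()
    for epoch in range(1, max_epochs + 1):
        train_epoch()
        accuracy = evaluate()
        if accuracy >= target_accuracy:
            return epoch, time.time() - start
    return None  # target accuracy not reached within the epoch budget

# Toy stand-in for a real training loop: accuracy climbs by 0.2 per "epoch".
state = {"acc": 0.0}

def fake_train_epoch():
    state["acc"] += 0.2

def fake_evaluate():
    return state["acc"]

result = time_to_accuracy(fake_train_epoch, fake_evaluate, target_accuracy=0.93)
```

Multiplying the elapsed time by the hourly price of the hardware used yields the corresponding training-cost metric.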

About the Initial Release

In this initial release of DAWNBench (part of the Stanford DAWN Project), we are releasing benchmark specifications for image classification (ImageNet, CIFAR10) and question answering (SQuAD). To the best of our knowledge, this is the first benchmark to compare end-to-end training and inference across multiple deep learning frameworks and tasks. We have seeded the benchmark leaderboard with an initial set of results, and are currently accepting new benchmark results. The deadline for submissions to this release of the benchmark is April 20th, 2018 at 11:59 PM PST.

Next Steps

Future releases of the DAWNBench benchmark suite will include additional tasks (e.g., neural machine translation, object detection), datasets (e.g., WMT English-German Translation), and objectives (e.g., inference cost, sample complexity).

Join Us

DAWNBench is part of a larger community conversation about the future of machine learning infrastructure. Sound off on the DAWNBench Google Group.

Submit your results on GitHub

Citation

Please cite the following if you use results from the benchmark or competition in any way:

Cody A. Coleman, Deepak Narayanan, Daniel Kang, Tian Zhao, Jian Zhang, Luigi Nardi, Peter Bailis, Kunle Olukotun, Chris Ré, and Matei Zaharia.
DAWNBench: An End-to-End Deep Learning Benchmark and Competition.
NIPS ML Systems Workshop, 2017.