DAWNBench is a benchmark suite for end-to-end deep learning training and inference. Computation time and cost are critical resources in building deep models, yet many existing benchmarks focus solely on model accuracy. DAWNBench provides a reference set of common deep learning workloads for quantifying training time, training cost, inference latency, and inference cost across different optimization strategies, model architectures, software frameworks, clouds, and hardware.
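To make the metrics concrete, below is a minimal sketch of how time-to-accuracy (the training-time metric) and batch-size-1 inference latency might be measured. The `train_one_epoch`, `evaluate`, and `predict` hooks are hypothetical stand-ins for framework-specific code; this is an illustration, not DAWNBench's actual measurement harness.

```python
import time
import statistics

def time_to_accuracy(train_one_epoch, evaluate, target=0.93, max_epochs=100):
    """Wall-clock seconds of training until validation accuracy reaches `target`.

    `train_one_epoch` runs one pass over the training set; `evaluate`
    returns current validation accuracy. Both are hypothetical hooks.
    """
    start = time.perf_counter()
    for _ in range(max_epochs):
        train_one_epoch()
        if evaluate() >= target:
            return time.perf_counter() - start
    raise RuntimeError(f"target accuracy {target} not reached in {max_epochs} epochs")

def inference_latency(predict, examples, warmup=10):
    """Median per-example latency (seconds) at batch size 1."""
    for x in examples[:warmup]:  # warm up caches / lazy initialization before timing
        predict(x)
    timings = []
    for x in examples[warmup:]:  # assumes len(examples) > warmup
        t0 = time.perf_counter()
        predict(x)
        timings.append(time.perf_counter() - t0)
    return statistics.median(timings)
```

Training cost follows directly from training time: multiply wall-clock hours by the hourly price of the instance used (and analogously for inference cost).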
This initial release of DAWNBench (part of the Stanford DAWN Project) includes benchmark specifications for image classification (ImageNet, CIFAR10) and question answering (SQuAD). To the best of our knowledge, this is the first benchmark to compare end-to-end training and inference across multiple deep learning frameworks and tasks. We have seeded the leaderboard with an initial set of results and are currently accepting new submissions. The deadline for submissions to this release of the benchmark is April 20th, 2018 at 11:59 PM PDT.
Future releases of the DAWNBench benchmark suite will include additional tasks (e.g., neural machine translation, object detection), datasets (e.g., WMT English-German Translation), and objectives (e.g., sample complexity).
DAWNBench is part of a larger community conversation about the future of machine learning infrastructure. Sound off on the DAWNBench Google Group.
Submit your results on GitHub. Please cite the following if you use results from the benchmark or competition in any way: