DAWNBench is a benchmark suite for end-to-end deep learning training and inference. Computation time and cost are critical resources in building deep models, yet many existing benchmarks focus solely on model accuracy. DAWNBench provides a reference set of common deep learning workloads for quantifying training time, training cost, inference latency, and inference cost across different optimization strategies, model architectures, software frameworks, clouds, and hardware.
The first iteration of DAWNBench is over, and the competition results and key takeaways have been finalized. Check out MLPerf.org for our latest benchmarking efforts.
Objective: Time taken to train an image classification model to a top-5 validation accuracy of 93% or greater on ImageNet.
Rank | Submission Date | Time to 93% Accuracy | Model | Hardware | Framework |
---|---|---|---|---|---|
1 | Apr 2018 | 0:30:43 | ResNet50 | Half of a TPUv2 Pod | TensorFlow 1.8.0-rc1 |
2 | Apr 2018 | 1:06:32 | AmoebaNet-D N6F256 | 1/4 of a TPUv2 Pod | TensorFlow 1.8.0-rc1 |
3 | Apr 2018 | 1:58:24 | AmoebaNet-D N6F256 | 1/16 of a TPUv2 Pod | TensorFlow 1.8.0-rc1 |
4 | Apr 2018 | 2:57:28 | ResNet50 | 8 * V100 (AWS p3.16xlarge) | fastai / pytorch |
5 | Apr 2018 | 3:25:55 | ResNet50 | 128 nodes with Xeon Platinum 8124M / 144 GB / 36 Cores (Amazon EC2 [c5.18xlarge]) | Intel(R) Optimized Caffe |
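Time to accuracy is measured end to end: the clock starts when training begins and stops at the first evaluation that meets the threshold, with checkpointing and evaluation time included. A minimal sketch of that measurement loop, where `train_epoch` and `evaluate` are hypothetical stand-ins (here simulating a simple accuracy curve), not DAWNBench code:

```python
import time

def train_epoch(state):
    # Placeholder for one pass over the training data.
    state["epochs"] += 1
    return state

def evaluate(state):
    # Placeholder for top-5 validation accuracy; simulated as
    # improving by 5 points per epoch from a 50% starting point.
    return min(0.99, 0.50 + 0.05 * state["epochs"])

def time_to_accuracy(threshold=0.93, max_epochs=90):
    """Wall-clock seconds until validation accuracy first reaches `threshold`."""
    state = {"epochs": 0}
    start = time.perf_counter()
    for _ in range(max_epochs):
        state = train_epoch(state)
        # Evaluation happens inside the timed region, as in DAWNBench.
        if evaluate(state) >= threshold:
            return time.perf_counter() - start, state["epochs"]
    raise RuntimeError("accuracy threshold not reached")

elapsed, epochs = time_to_accuracy()
print(epochs)  # epochs for the simulated curve to cross 93%
```

Because the metric is wall-clock time rather than epochs or FLOPs, it rewards any combination of better hardware, larger batches, learning-rate schedules, or reduced precision that reaches the threshold sooner.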
Objective: Total cost of public cloud instances to train an image classification model to a top-5 validation accuracy of 93% or greater on ImageNet.
Rank | Submission Date | Cost (USD) | Model | Hardware | Framework |
---|---|---|---|---|---|
1 | Apr 2018 | $49.30 | AmoebaNet-D N6F256 | GCP n1-standard-2, Cloud TPU | TensorFlow 1.8.0-rc0 |
2 | Apr 2018 | $58.53 | ResNet50 | GCP n1-standard-2, Cloud TPU | TensorFlow v1.8rc1 |
3 | Apr 2018 | $72.40 | ResNet50 | 8 * V100 (AWS p3.16xlarge) | fastai / pytorch |
4 | Mar 2018 | $82.07 | ResNet50 | GCP n1-standard-2, Cloud TPU | TensorFlow v1.7rc1 |
5 | Jan 2018 | $358.22 | ResNet50 | p3.16xlarge | tensorflow 1.5, tensorpack 0.8.1 |
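Training cost is simply wall-clock training time multiplied by the instance's hourly on-demand price. As a sanity check against the fastai entry (the $24.48/hr p3.16xlarge on-demand rate is an assumption of this sketch, not stated in the table):

```python
# Cost = training hours * hourly instance price.
hours = 2 + 57 / 60 + 28 / 3600   # 2:57:28 from the time-to-accuracy results
hourly_rate = 24.48               # assumed AWS p3.16xlarge on-demand $/hr
cost = hours * hourly_rate
print(f"${cost:.2f}")             # prints $72.41, within a cent of the listed $72.40
```

This is why the cost and time leaderboards diverge: a half TPU pod trains fastest, but cheaper hardware running longer can still win on dollars.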
Objective: Latency required to classify one ImageNet image using a model with a top-5 validation accuracy of 93% or greater.
Rank | Submission Date | 1-example Latency (milliseconds) | Model | Hardware | Framework |
---|---|---|---|---|---|
1 | Apr 2018 | 9.9600 | ResNet50 | Amazon EC2 [c5.18xlarge] | Intel(R) Optimized Caffe |
2 | Apr 2018 | 12.4000 | ResNet50 | Amazon EC2 [c5.4xlarge] | Intel(R) Optimized Caffe |
3 | Apr 2018 | 17.3800 | ResNet50 | Amazon EC2 [c5.2xlarge] | Intel(R) Optimized Caffe |
4 | Nov 2017 | 22.2700 | ResNet 152 | 1 P100 / 30 GB / 8 CPU (Google Compute) | TensorFlow v1.2 |
5 | Nov 2017 | 26.8200 | ResNet 152 | 1 P100 / 30 GB / 8 CPU (Google Compute) | MXNet 0.11.0 |
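Single-example latency numbers like these are typically reported after warm-up runs, since the first few inferences pay one-time costs (graph compilation, memory allocation, cache warming). A framework-agnostic timing harness, where `predict` is a hypothetical stand-in for a real single-image forward pass:

```python
import time
import statistics

def predict(image):
    # Placeholder for a real model call on one image.
    return sum(image) % 1000

def measure_latency_ms(fn, example, warmup=10, runs=100):
    """Median single-example latency in milliseconds, after warm-up."""
    for _ in range(warmup):      # discard cold-start iterations
        fn(example)
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn(example)
        samples.append((time.perf_counter() - start) * 1000.0)
    # Median is more robust than mean to scheduler and GC jitter.
    return statistics.median(samples)

latency = measure_latency_ms(predict, list(range(3 * 224 * 224)))
```

Note that batch-1 latency favors different hardware than throughput does, which is one reason CPU instances lead this table while GPUs and TPUs dominate training.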
Objective: Average cost on public cloud instances to classify 10,000 validation images from ImageNet using an image classification model with a top-5 validation accuracy of 93% or greater.
Rank | Submission Date | Cost (USD) | Model | Framework | Hardware |
---|---|---|---|---|---|
1 | Apr 2018 | $0.02 | ResNet50 | Intel(R) Optimized Caffe | Amazon EC2 [c5.2xlarge] |
2 | Apr 2018 | $0.02 | ResNet50 | Intel(R) Optimized Caffe | Amazon EC2 [c5.4xlarge] |
3 | Nov 2017 | $0.07 | ResNet 152 | MXNet 0.11.0 | 1 K80 / 61 GB / 4 CPU (Amazon EC2 [p2.xlarge]) |
4 | Nov 2017 | $0.11 | ResNet 152 | TensorFlow v1.2 | 1 P100 / 30 GB / 8 CPU (Google Compute) |
5 | Nov 2017 | $0.12 | ResNet 152 | TensorFlow v1.2 | 1 K80 / 61 GB / 4 CPU (Amazon EC2 [p2.xlarge]) |
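Inference cost follows from throughput: the time to classify all 10,000 images, in hours, times the hourly instance price. A rough estimator that assumes images are processed back to back at the per-example latency (real submissions may batch, so treat this as a sketch, and the $1.75/hr P100-instance rate below is an assumption, not from the table):

```python
def inference_cost(latency_ms, n_examples, hourly_rate):
    """Estimated cost of classifying n_examples sequentially."""
    hours = latency_ms * n_examples / 1000.0 / 3600.0
    return hours * hourly_rate

# e.g. the 22.27 ms/image P100 entry from the latency table
cost = inference_cost(22.27, 10_000, 1.75)
print(f"${cost:.2f}")  # prints $0.11
```

At these latencies the whole 10,000-image pass takes only a few minutes, which is why every entry costs pennies.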
Objective: Time taken to train an image classification model to a test accuracy of 94% or greater on CIFAR10.
Rank | Submission Date | Time to 94% Accuracy | Model | Framework | Hardware |
---|---|---|---|---|---|
1 | Apr 2018 | 0:02:54 | Custom Wide Resnet | fastai / pytorch | 8 * V100 (AWS p3.16xlarge) |
2 | Apr 2018 | 0:05:41 | Resnet18 + minor modifications | pytorch 0.3.1.post2 | V100 (AWS p3.2xlarge) |
3 | Apr 2018 | 0:06:45 | Custom Wide Resnet | fastai / pytorch | Paperspace Volta (V100) |
4 | Apr 2018 | 0:35:37 | KervResNet34 | PyTorch 0.3.1 | 1 GPU (Nvidia GeForce GTX 1080 Ti) |
5 | Jan 2018 | 1:07:55 | ResNet50 | tensorflow 1.5, tensorpack 0.8.1 | p3.2xlarge |
Objective: Total cost for public cloud instances to train an image classification model to a test accuracy of 94% or greater on CIFAR10.
Rank | Submission Date | Cost (USD) | Model | Framework | Hardware |
---|---|---|---|---|---|
1 | Apr 2018 | $0.26 | Custom Wide Resnet | fastai / pytorch | Paperspace Volta (V100) |
2 | Apr 2018 | $0.29 | Resnet18 + minor modifications | pytorch 0.3.1.post2 | V100 (AWS p3.2xlarge) |
3 | Apr 2018 | $1.18 | Custom Wide Resnet | fastai / pytorch | 8 * V100 (AWS p3.16xlarge) |
4 | Jan 2018 | $3.46 | ResNet50 | tensorflow 1.5, tensorpack 0.8.1 | p3.2xlarge |
5 | Jan 2018 | $3.78 | ResNet50 | tensorflow 1.5, tensorpack 0.8.1 | g3.4xlarge |
Objective: Latency required to classify one CIFAR10 image using a model with a test accuracy of 94% or greater.
Rank | Submission Date | 1-example Latency (milliseconds) | Model | Framework | Hardware |
---|---|---|---|---|---|
1 | Oct 2017 | 9.7843 | ResNet 56 | PyTorch v0.1.12 | 1 K80 / 61 GB / 4 CPU (Amazon EC2 [p2.xlarge]) |
2 | Oct 2017 | 24.6291 | ResNet 164 (with bottleneck) | PyTorch v0.1.12 | 1 P100 / 512 GB / 56 CPU (DAWN Internal Cluster) |
3 | Oct 2017 | 24.9200 | ResNet 164 (without bottleneck) | TensorFlow v1.2 | 60 GB / 16 CPU (Google Cloud [n1-standard-16]) |
4 | Oct 2017 | 25.2188 | ResNet 164 (without bottleneck) | PyTorch v0.1.12 | 1 P100 / 512 GB / 56 CPU (DAWN Internal Cluster) |
5 | Oct 2017 | 28.1000 | ResNet 164 (with bottleneck) | TensorFlow v1.2 | 1 P100 / 512 GB / 56 CPU (DAWN Internal Cluster) |
Objective: Average cost on public cloud instances to classify 10,000 test images from CIFAR10 using an image classification model with a test accuracy of 94% or greater.
Rank | Submission Date | Cost (USD) | Model | Framework | Hardware |
---|---|---|---|---|---|
1 | Oct 2017 | $0.02 | ResNet 56 | PyTorch v0.1.12 | 1 K80 / 61 GB / 4 CPU (Amazon EC2 [p2.xlarge]) |
2 | Oct 2017 | $0.04 | ResNet 164 (without bottleneck) | TensorFlow v1.2 | 60 GB / 16 CPU (Google Cloud [n1-standard-16]) |
3 | Oct 2017 | $0.05 | ResNet 164 (with bottleneck) | TensorFlow v1.2 | 60 GB / 16 CPU (Google Cloud [n1-standard-16]) |
4 | Oct 2017 | $0.07 | ResNet 164 (without bottleneck) | PyTorch v0.1.12 | 1 K80 / 61 GB / 4 CPU (Amazon EC2 [p2.xlarge]) |
5 | Oct 2017 | $0.07 | ResNet 164 (with bottleneck) | PyTorch v0.1.12 | 1 K80 / 61 GB / 4 CPU (Amazon EC2 [p2.xlarge]) |
Objective: Time taken to train a question answering model to an F1 score of 0.75 or greater on the SQuAD development dataset.
Rank | Submission Date | Time to 0.75 F1 | Model | Framework | Hardware |
---|---|---|---|---|---|
1 | Apr 2018 | 0:45:56 | QANet | TensorFlow v1.8 | 1 TPUv2 |
2 | Oct 2017 | 7:38:10 | BiDAF | TensorFlow v1.2 | 1 K80 / 61 GB / 4 CPU (Amazon EC2 [p2.xlarge]) |
3 | Oct 2017 | 7:51:22 | BiDAF | TensorFlow v1.2 | 1 P100 / 512 GB / 56 CPU (DAWN Internal Cluster) |
4 | Oct 2017 | 8:43:40 | BiDAF | TensorFlow v1.2 | 1 K80 / 30 GB / 8 CPU (Google Cloud) |
5 | Oct 2017 | 10:50:22 | BiDAF | TensorFlow v1.2 | 60 GB / 16 CPU (Google Cloud [n1-standard-16]) |
Objective: Total cost for public cloud instances to train a question answering model to an F1 score of 0.75 or greater on the SQuAD development dataset.
Rank | Submission Date | Cost (USD) | Model | Framework | Hardware |
---|---|---|---|---|---|
1 | Oct 2017 | $5.78 | BiDAF | TensorFlow v1.2 | 60 GB / 16 CPU (Google Cloud [n1-standard-16]) |
2 | Oct 2017 | $6.87 | BiDAF | TensorFlow v1.2 | 1 K80 / 61 GB / 4 CPU (Amazon EC2 [p2.xlarge]) |
3 | Oct 2017 | $8.44 | BiDAF | TensorFlow v1.2 | 1 K80 / 30 GB / 8 CPU (Google Cloud) |
Objective: Latency required to answer one SQuAD question using a model with an F1 score of at least 0.75 on the development dataset.
Rank | Submission Date | 1-example Latency (milliseconds) | Model | Framework | Hardware |
---|---|---|---|---|---|
1 | Oct 2017 | 100.0000 | BiDAF | TensorFlow v1.2 | 60 GB / 16 CPU (Google Cloud [n1-standard-16]) |
2 | Oct 2017 | 590.0000 | BiDAF | TensorFlow v1.2 | 1 K80 / 30 GB / 8 CPU (Google Cloud) |
3 | Oct 2017 | 638.1000 | BiDAF | TensorFlow v1.2 | 1 P100 / 512 GB / 56 CPU (DAWN Internal Cluster) |
4 | Oct 2017 | 705.9000 | BiDAF | TensorFlow v1.2 | 1 K80 / 61 GB / 4 CPU (Amazon EC2 [p2.xlarge]) |
Objective: Average cost on public cloud instances to answer 10,000 questions from the SQuAD development dataset using a question answering model with an F1 score of 0.75 or greater on the development dataset.
Rank | Submission Date | Cost (USD) | Model | Framework | Hardware |
---|---|---|---|---|---|
1 | Oct 2017 | $0.15 | BiDAF | TensorFlow v1.2 | 60 GB / 16 CPU (Google Cloud [n1-standard-16]) |
2 | Oct 2017 | $1.58 | BiDAF | TensorFlow v1.2 | 1 K80 / 30 GB / 8 CPU (Google Cloud) |
3 | Oct 2017 | $1.76 | BiDAF | TensorFlow v1.2 | 1 K80 / 61 GB / 4 CPU (Amazon EC2 [p2.xlarge]) |
DAWNBench is part of a larger community conversation about the future of machine learning infrastructure. Sound off on the DAWNBench Google Group.