DAWNBench is a benchmark suite for end-to-end deep learning training and inference. Computation time and cost are critical resources in building deep models, yet many existing benchmarks focus solely on model accuracy. DAWNBench provides a reference set of common deep learning workloads for quantifying training time, training cost, inference latency, and inference cost across different optimization strategies, model architectures, software frameworks, clouds, and hardware.
The first iteration of DAWNBench is over, and the competition results and key takeaways have been finalized. Check out MLPerf.org for our latest benchmarking efforts.
Objective: Time taken to train an image classification model to a top-5 validation accuracy of 93% or greater on ImageNet.
Rank | Submission Date | Time to 93% Accuracy | Model | Hardware | Framework |
---|---|---|---|---|---|
1 | Apr 2018 | 0:30:43 | ResNet50 | Half of a TPUv2 Pod | TensorFlow 1.8.0-rc1 |
2 | Apr 2018 | 1:06:32 | AmoebaNet-D N6F256 | 1/4 of a TPUv2 Pod | TensorFlow 1.8.0-rc1 |
3 | Apr 2018 | 1:58:24 | AmoebaNet-D N6F256 | 1/16 of a TPUv2 Pod | TensorFlow 1.8.0-rc1 |
4 | Apr 2018 | 2:57:28 | ResNet50 | 8 * V100 (AWS p3.16xlarge) | fastai / pytorch |
5 | Apr 2018 | 3:25:55 | ResNet50 | 128 nodes with Xeon Platinum 8124M / 144 GB / 36 Cores (Amazon EC2 [c5.18xlarge]) | Intel(R) Optimized Caffe |
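Time to accuracy is measured end to end: the clock starts when training begins and stops at the first evaluation that meets the threshold, with checkpointing and evaluation time included. A minimal sketch of that measurement loop, where `train_epoch` and `evaluate` are hypothetical stand-ins (here simulating a simple accuracy curve), not DAWNBench code:

```python
import time

def train_epoch(state):
    # Placeholder for one pass over the training data.
    state["epochs"] += 1
    return state

def evaluate(state):
    # Placeholder for top-5 validation accuracy; simulated as
    # improving by 5 points per epoch from a 50% starting point.
    return min(0.99, 0.50 + 0.05 * state["epochs"])

def time_to_accuracy(threshold=0.93, max_epochs=90):
    """Wall-clock seconds until validation accuracy first reaches `threshold`."""
    state = {"epochs": 0}
    start = time.perf_counter()
    for _ in range(max_epochs):
        state = train_epoch(state)
        # Evaluation happens inside the timed region, as in DAWNBench.
        if evaluate(state) >= threshold:
            return time.perf_counter() - start, state["epochs"]
    raise RuntimeError("accuracy threshold not reached")

elapsed, epochs = time_to_accuracy()
print(epochs)  # epochs for the simulated curve to cross 93%
```

Because the metric is wall-clock time rather than epochs or FLOPs, it rewards any combination of better hardware, larger batches, learning-rate schedules, or reduced precision that reaches the threshold sooner.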
Objective: Total cost of public cloud instances to train an image classification model to a top-5 validation accuracy of 93% or greater on ImageNet.
Rank | Submission Date | Cost (USD) | Model | Hardware | Framework |
---|---|---|---|---|---|
1 | Apr 2018 | $49.30 | AmoebaNet-D N6F256 | GCP n1-standard-2, Cloud TPU | TensorFlow 1.8.0-rc0 |
2 | Apr 2018 | $58.53 | ResNet50 | GCP n1-standard-2, Cloud TPU | TensorFlow v1.8rc1 |
3 | Apr 2018 | $72.40 | ResNet50 | 8 * V100 (AWS p3.16xlarge) | fastai / pytorch |
4 | Mar 2018 | $82.07 | ResNet50 | GCP n1-standard-2, Cloud TPU | TensorFlow v1.7rc1 |
5 | Jan 2018 | $358.22 | ResNet50 | p3.16xlarge | tensorflow 1.5, tensorpack 0.8.1 |
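Training cost is simply wall-clock training time multiplied by the instance's hourly on-demand price. As a sanity check against the fastai entry (the $24.48/hr p3.16xlarge on-demand rate is an assumption of this sketch, not stated in the table):

```python
# Cost = training hours * hourly instance price.
hours = 2 + 57 / 60 + 28 / 3600   # 2:57:28 from the time-to-accuracy results
hourly_rate = 24.48               # assumed AWS p3.16xlarge on-demand $/hr
cost = hours * hourly_rate
print(f"${cost:.2f}")             # prints $72.41, within a cent of the listed $72.40
```

This is why the cost and time leaderboards diverge: a half TPU pod trains fastest, but cheaper hardware running longer can still win on dollars.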
Objective: Latency required to classify one ImageNet image using a model with a top-5 validation accuracy of 93% or greater.
Rank | Submission Date | 1-example Latency (milliseconds) | Model | Hardware | Framework |
---|---|---|---|---|---|
1 | Apr 2018 | 9.9600 | ResNet50 | Amazon EC2 [c5.18xlarge] | Intel(R) Optimized Caffe |
2 | Apr 2018 | 12.4000 | ResNet50 | Amazon EC2 [c5.4xlarge] | Intel(R) Optimized Caffe |
3 | Apr 2018 | 17.3800 | ResNet50 | Amazon EC2 [c5.2xlarge] | Intel(R) Optimized Caffe |
4 | Nov 2017 | 22.2700 | ResNet 152 | 1 P100 / 30 GB / 8 CPU (Google Compute) | TensorFlow v1.2 |
5 | Nov 2017 | 26.8200 | ResNet 152 | 1 P100 / 30 GB / 8 CPU (Google Compute) | MXNet 0.11.0 |
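Single-example latency numbers like these are typically reported after warm-up runs, since the first few inferences pay one-time costs (graph compilation, memory allocation, cache warming). A framework-agnostic timing harness, where `predict` is a hypothetical stand-in for a real single-image forward pass:

```python
import time
import statistics

def predict(image):
    # Placeholder for a real model call on one image.
    return sum(image) % 1000

def measure_latency_ms(fn, example, warmup=10, runs=100):
    """Median single-example latency in milliseconds, after warm-up."""
    for _ in range(warmup):      # discard cold-start iterations
        fn(example)
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn(example)
        samples.append((time.perf_counter() - start) * 1000.0)
    # Median is more robust than mean to scheduler and GC jitter.
    return statistics.median(samples)

latency = measure_latency_ms(predict, list(range(3 * 224 * 224)))
```

Note that batch-1 latency favors different hardware than throughput does, which is one reason CPU instances lead this table while GPUs and TPUs dominate training.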
Objective: Average cost on public cloud instances to classify 10,000 validation images from ImageNet using an image classification model with a top-5 validation accuracy of 93% or greater.
Rank | Submission Date | Cost (USD) | Model | Framework | Hardware |
---|---|---|---|---|---|
1 | Apr 2018 | $0.02 | ResNet50 | Intel(R) Optimized Caffe | Amazon EC2 [c5.2xlarge] |
2 | Apr 2018 | $0.02 | ResNet50 | Intel(R) Optimized Caffe | Amazon EC2 [c5.4xlarge] |
3 | Nov 2017 | $0.07 | ResNet 152 | MXNet 0.11.0 | 1 K80 / 61 GB / 4 CPU (Amazon EC2 [p2.xlarge]) |
4 | Nov 2017 | $0.11 | ResNet 152 | TensorFlow v1.2 | 1 P100 / 30 GB / 8 CPU (Google Compute) |
5 | Nov 2017 | $0.12 | ResNet 152 | TensorFlow v1.2 | 1 K80 / 61 GB / 4 CPU (Amazon EC2 [p2.xlarge]) |
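Inference cost follows from throughput: the time to classify all 10,000 images, in hours, times the hourly instance price. A rough estimator that assumes images are processed back to back at the per-example latency (real submissions may batch, so treat this as a sketch, and the $1.75/hr P100-instance rate below is an assumption, not from the table):

```python
def inference_cost(latency_ms, n_examples, hourly_rate):
    """Estimated cost of classifying n_examples sequentially."""
    hours = latency_ms * n_examples / 1000.0 / 3600.0
    return hours * hourly_rate

# e.g. the 22.27 ms/image P100 entry from the latency table
cost = inference_cost(22.27, 10_000, 1.75)
print(f"${cost:.2f}")  # prints $0.11
```

At these latencies the whole 10,000-image pass takes only a few minutes, which is why every entry costs pennies.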
Objective: Time taken to train an image classification model to a test accuracy of 94% or greater on CIFAR10.
Rank | Submission Date | Time to 94% Accuracy | Model | Framework | Hardware |
---|---|---|---|---|---|
1 | Apr 2018 | 0:02:54 | Custom Wide Resnet | fastai / pytorch | 8 * V100 (AWS p3.16xlarge) |
2 | Apr 2018 | 0:05:41 | Resnet18 + minor modifications | pytorch 0.3.1.post2 | V100 (AWS p3.2xlarge) |
3 | Apr 2018 | 0:06:45 | Custom Wide Resnet | fastai / pytorch | Paperspace Volta (V100) |
4 | Apr 2018 | 0:35:37 | KervResNet34 | PyTorch 0.3.1 | 1 GPU (Nvidia GeForce GTX 1080 Ti) |
5 | Jan 2018 | 1:07:55 | ResNet50 | tensorflow 1.5, tensorpack 0.8.1 | p3.2xlarge |
Objective: Total cost for public cloud instances to train an image classification model to a test accuracy of 94% or greater on CIFAR10.
Rank | Submission Date | Cost (USD) | Model | Framework | Hardware |
---|---|---|---|---|---|
1 | Apr 2018 | $0.26 | Custom Wide Resnet | fastai / pytorch | Paperspace Volta (V100) |
2 | Apr 2018 | $0.29 | Resnet18 + minor modifications | pytorch 0.3.1.post2 | V100 (AWS p3.2xlarge) |
3 | Apr 2018 | $1.18 | Custom Wide Resnet | fastai / pytorch | 8 * V100 (AWS p3.16xlarge) |
4 | Jan 2018 | $3.46 | ResNet50 | tensorflow 1.5, tensorpack 0.8.1 | p3.2xlarge |
5 | Jan 2018 | $3.78 | ResNet50 | tensorflow 1.5, tensorpack 0.8.1 | g3.4xlarge |
Objective: Latency required to classify one CIFAR10 image using a model with a test accuracy of 94% or greater.
Rank | Submission Date | 1-example Latency (milliseconds) | Model | Framework | Hardware |
---|---|---|---|---|---|
1 | Oct 2017 | 9.7843 | ResNet 56 | PyTorch v0.1.12 | 1 K80 / 61 GB / 4 CPU (Amazon EC2 [p2.xlarge]) |
2 | Oct 2017 | 24.6291 | ResNet 164 (with bottleneck) | PyTorch v0.1.12 | 1 P100 / 512 GB / 56 CPU (DAWN Internal Cluster) |
3 | Oct 2017 | 24.9200 | ResNet 164 (without bottleneck) | TensorFlow v1.2 | 60 GB / 16 CPU (Google Cloud [n1-standard-16]) |
4 | Oct 2017 | 25.2188 | ResNet 164 (without bottleneck) | PyTorch v0.1.12 | 1 P100 / 512 GB / 56 CPU (DAWN Internal Cluster) |
5 | Oct 2017 | 28.1000 | ResNet 164 (with bottleneck) | TensorFlow v1.2 | 1 P100 / 512 GB / 56 CPU (DAWN Internal Cluster) |
Objective: Average cost on public cloud instances to classify 10,000 test images from CIFAR10 using an image classification model with a test accuracy of 94% or greater.
Rank | Submission Date | Cost (USD) | Model | Framework | Hardware |
---|---|---|---|---|---|
1 | Oct 2017 | $0.02 | ResNet 56 | PyTorch v0.1.12 | 1 K80 / 61 GB / 4 CPU (Amazon EC2 [p2.xlarge]) |
2 | Oct 2017 | $0.04 | ResNet 164 (without bottleneck) | TensorFlow v1.2 | 60 GB / 16 CPU (Google Cloud [n1-standard-16]) |
3 | Oct 2017 | $0.05 | ResNet 164 (with bottleneck) | TensorFlow v1.2 | 60 GB / 16 CPU (Google Cloud [n1-standard-16]) |
4 | Oct 2017 | $0.07 | ResNet 164 (without bottleneck) | PyTorch v0.1.12 | 1 K80 / 61 GB / 4 CPU (Amazon EC2 [p2.xlarge]) |
5 | Oct 2017 | $0.07 | ResNet 164 (with bottleneck) | PyTorch v0.1.12 | 1 K80 / 61 GB / 4 CPU (Amazon EC2 [p2.xlarge]) |
Objective: Time taken to train a question answering model to an F1 score of 0.75 or greater on the SQuAD development dataset.
Rank | Submission Date | Time to 0.75 F1 | Model | Framework | Hardware |
---|---|---|---|---|---|
1 | Apr 2018 | 0:45:56 | QANet | TensorFlow v1.8 | 1 TPUv2 |
2 | Oct 2017 | 7:38:10 | BiDAF | TensorFlow v1.2 | 1 K80 / 61 GB / 4 CPU (Amazon EC2 [p2.xlarge]) |
3 | Oct 2017 | 7:51:22 | BiDAF | TensorFlow v1.2 | 1 P100 / 512 GB / 56 CPU (DAWN Internal Cluster) |
4 | Oct 2017 | 8:43:40 | BiDAF | TensorFlow v1.2 | 1 K80 / 30 GB / 8 CPU (Google Cloud) |
5 | Oct 2017 | 10:50:22 | BiDAF | TensorFlow v1.2 | 60 GB / 16 CPU (Google Cloud [n1-standard-16]) |
Objective: Total cost for public cloud instances to train a question answering model to an F1 score of 0.75 or greater on the SQuAD development dataset.
Rank | Submission Date | Cost (USD) | Model | Framework | Hardware |
---|---|---|---|---|---|
1 | Oct 2017 | $5.78 | BiDAF | TensorFlow v1.2 | 60 GB / 16 CPU (Google Cloud [n1-standard-16]) |
2 | Oct 2017 | $6.87 | BiDAF | TensorFlow v1.2 | 1 K80 / 61 GB / 4 CPU (Amazon EC2 [p2.xlarge]) |
3 | Oct 2017 | $8.44 | BiDAF | TensorFlow v1.2 | 1 K80 / 30 GB / 8 CPU (Google Cloud) |
Objective: Latency required to answer one SQuAD question using a model with an F1 score of at least 0.75 on the development dataset.
Rank | Submission Date | 1-example Latency (milliseconds) | Model | Framework | Hardware |
---|---|---|---|---|---|
1 | Oct 2017 | 100.0000 | BiDAF | TensorFlow v1.2 | 60 GB / 16 CPU (Google Cloud [n1-standard-16]) |
2 | Oct 2017 | 590.0000 | BiDAF | TensorFlow v1.2 | 1 K80 / 30 GB / 8 CPU (Google Cloud) |
3 | Oct 2017 | 638.1000 | BiDAF | TensorFlow v1.2 | 1 P100 / 512 GB / 56 CPU (DAWN Internal Cluster) |
4 | Oct 2017 | 705.9000 | BiDAF | TensorFlow v1.2 | 1 K80 / 61 GB / 4 CPU (Amazon EC2 [p2.xlarge]) |
Objective: Average cost on public cloud instances to answer 10,000 questions from the SQuAD development dataset using a question answering model with an F1 score of 0.75 or greater on the development dataset.
Rank | Submission Date | Cost (USD) | Model | Framework | Hardware |
---|---|---|---|---|---|
1 | Oct 2017 | $0.15 | BiDAF | TensorFlow v1.2 | 60 GB / 16 CPU (Google Cloud [n1-standard-16]) |
2 | Oct 2017 | $1.58 | BiDAF | TensorFlow v1.2 | 1 K80 / 30 GB / 8 CPU (Google Cloud) |
3 | Oct 2017 | $1.76 | BiDAF | TensorFlow v1.2 | 1 K80 / 61 GB / 4 CPU (Amazon EC2 [p2.xlarge]) |
DAWNBench is part of a larger community conversation about the future of machine learning infrastructure. Sound off on the DAWNBench Google Group.