DAWNBench is a benchmark suite for end-to-end deep learning training and inference. Computation time and cost are critical resources in building deep models, yet many existing benchmarks focus solely on model accuracy. DAWNBench provides a reference set of common deep learning workloads for quantifying training time, training cost, inference latency, and inference cost across different optimization strategies, model architectures, software frameworks, clouds, and hardware.
Building on our experience with DAWNBench, we helped create MLPerf as an industry standard for measuring machine learning system performance. Now that both the MLPerf Training and Inference benchmark suites have successfully launched, we ended rolling submissions to DAWNBench on March 27, 2020 to consolidate benchmarking efforts.
The original results submitted before the April 20, 2018 deadline are archived for reference. For the key takeaways, see our analysis of DAWNBench.
Objective: Time taken to train an image classification model to a top-5 validation accuracy of 93% or greater on ImageNet.
Rank | Date | Time to 93% Accuracy (h:mm:ss) | Model | Hardware | Framework |
---|---|---|---|---|---|
1 | Mar 2020 | 0:02:38 | ResNet50-v1.5 | 16 ecs.gn6e-c12g1.24xlarge (AlibabaCloud) | AIACC-Training 1.3 + Tensorflow 2.1 |
2 | May 2019 | 0:02:43 | ResNet-50 | 16 nodes with InfiniBand (8*V100 with NVLink for each node) | Moxing v1.13.0 + TensorFlow v1.13.1 |
3 | Dec 2018 | 0:09:22 | ResNet-50 | 16 * 8 * Tesla-V100 (ModelArts Service) | Huawei Optimized MXNet |
4 | Sep 2018 | 0:18:06 | ResNet-50 | 16 p3.16xlarge (AWS) | PyTorch 0.4.1 |
5 | Sep 2018 | 0:18:53 | Resnet 50 | 64 * V100 (8 machines - AWS p3.16xlarge) | ncluster / Pytorch 0.5.0a0+0e8088d |
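The time-to-accuracy objective above can be read as "elapsed wall-clock time at the first evaluation that meets the threshold". The following is a minimal sketch of that reading, not the benchmark's official tooling. The function name `time_to_accuracy`, the log format, and the sample numbers are all hypothetical.

```python
from datetime import timedelta
from typing import Iterable, Optional, Tuple


def time_to_accuracy(
    epochs: Iterable[Tuple[float, float]], threshold: float
) -> Optional[timedelta]:
    """Return the elapsed time at the first evaluation that meets the threshold.

    epochs: (elapsed_seconds_since_start, accuracy_percent) pairs in training order.
    Returns None if the threshold is never reached.
    """
    for elapsed_seconds, accuracy in epochs:
        if accuracy >= threshold:
            return timedelta(seconds=elapsed_seconds)
    return None


# Made-up training log, for illustration only (not taken from any submission).
log = [(40.0, 71.2), (81.0, 88.9), (122.5, 92.4), (150.0, 93.1), (199.0, 93.4)]
print(time_to_accuracy(log, 93.0))  # 0:02:30
```

Later evaluations that exceed the threshold are ignored; only the first crossing counts in this sketch.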
Objective: Total cost of public cloud instances to train an image classification model to a top-5 validation accuracy of 93% or greater on ImageNet.
Rank | Date | Cost (USD) | Model | Hardware | Framework |
---|---|---|---|---|---|
1 | Mar 2020 | $7.43 | ResNet50-v1.5 | 1 ecs.gn6e-c12g1.24xlarge (AlibabaCloud) | AIACC-Training 1.3 + Tensorflow 2.1 |
2 | Sep 2018 | $12.60 | ResNet50 | GCP n1-standard-2, Cloud TPU | TensorFlow v1.11.0 |
3 | Mar 2020 | $14.42 | ResNet50-v1.5 | 16 ecs.gn6e-c12g1.24xlarge (AlibabaCloud) | AIACC-Training 1.3 + Tensorflow 2.1 |
4 | Aug 2019 | $19.00 | Resnet 50 | Lambda GPU Cloud - 4x GTX 1080 Ti | ncluster / Pytorch 1.0.0 |
5 | Apr 2019 | $20.89 | ResNet50 | Azure ND40s_v2 | PyTorch 1.0 |
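One plausible way to turn a training run into a cost figure like those above is to multiply the training time by the instance's hourly price and the number of instances. This is a sketch under that assumption; the exact pricing rules used by the benchmark are not stated here, and the function name and the example price are hypothetical.

```python
def training_cost_usd(
    train_seconds: float, hourly_price_usd: float, num_instances: int = 1
) -> float:
    """Estimate training cost as time x hourly price x instance count.

    Assumes billing is proportional to elapsed time (no minimum billing
    increment, no storage or network charges).
    """
    hours = train_seconds / 3600.0
    return round(hours * hourly_price_usd * num_instances, 2)


# Hypothetical numbers: a 30-minute run on 4 instances at $3.00 per hour each.
print(training_cost_usd(1800, 3.00, num_instances=4))  # 6.0
```

Under this model, adding instances lowers cost only if training time falls more than proportionally, which would be consistent with a single-instance run ranking above a 16-instance run on cost.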
Objective: Latency required to classify one ImageNet image using a model with a top-5 validation accuracy of 93% or greater.
Rank | Date | 1-example Latency (milliseconds) | Model | Hardware | Framework |
---|---|---|---|---|---|
1 | Mar 2020 | 0.0739 | ResNet26d | Alibaba Cloud [ecs.ebman1.26xlarge] | Pytorch+AIACC-Inference+HGAI |
2 | Feb 2020 | 0.3880 | ResNet101 | Alibaba Cloud Npu | tensorflow+NpuInference |
3 | Mar 2020 | 0.3926 | MIVT-NET-v2 | Alibaba Cloud [ecs.gn6i-c8g1.2xlarge] | HIE |
4 | Feb 2020 | 0.4662 | ResNet26 | Alibaba Cloud [ecs.gn6i-c8g1.2xlarge] | PAI-Blade + TensorRT |
5 | Nov 2019 | 0.4945 | ResNet26 | Huawei Cloud [pi2.2xlarge.4] | ModelArts-AIBOX + TensorRT |
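A 1-example latency like the ones above can be approximated by timing individual single-input calls and averaging. The sketch below assumes a hypothetical `predict` callable that classifies one input at a time; the warm-up count and the averaging choice are illustrative assumptions, not the benchmark's prescribed procedure.

```python
import time
from statistics import mean
from typing import Any, Callable, Sequence


def mean_latency_ms(
    predict: Callable[[Any], Any], examples: Sequence[Any], warmup: int = 10
) -> float:
    """Average wall-clock time per single-example call, in milliseconds."""
    # Warm-up calls are not timed, so one-off setup cost is excluded.
    for example in examples[:warmup]:
        predict(example)

    timings_ms = []
    for example in examples:
        start = time.perf_counter()
        predict(example)  # one example per call, i.e. batch size 1
        timings_ms.append((time.perf_counter() - start) * 1000.0)
    return mean(timings_ms)


# Stand-in "model" for illustration: any callable taking one input works.
latency = mean_latency_ms(lambda x: x * 2, list(range(100)))
print(f"{latency:.4f} ms")
```

On an accelerator with asynchronous execution, the call would also need to wait for the device to finish before the timer stops; that step is framework-specific and omitted here.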
Objective: Average cost on public cloud instances to classify 10,000 validation images from ImageNet using an image classification model with a top-5 validation accuracy of 93% or greater.
Rank | Date | Cost (USD) | Model | Framework | Hardware |
---|---|---|---|---|---|
1 | Oct 2019 | $0.00 | ResNet26d | Pytorch+AIACC-Inference | Alibaba Cloud [ecs.gn6i-c8g1.2xlarge] |
2 | Jun 2019 | $0.00 | ResNet50 | ifx | Didi Cloud [1 P4 / 16 GB / 8 vCPU] |
3 | May 2018 | $0.01 | ResNet50 | TensorFlow 1.12.2 | Alibaba Cloud [ecs.gn5i-c8g1.2xlarge] |
4 | Dec 2018 | $0.02 | ResNet50 | TensorFlow 1.10.0 | Alibaba Cloud [ecs.gn5i-c8g1.2xlarge] |
5 | Apr 2018 | $0.02 | ResNet50 | Intel(R) Optimized Caffe | Amazon EC2 [c5.2xlarge] |
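To see how a per-example latency relates to a cost for 10,000 examples, the sketch below assumes the examples are processed one after another at the mean latency and billed at a flat hourly rate. Both the function name and the example price are hypothetical, and the benchmark's own accounting may differ.

```python
def inference_cost_usd(
    latency_ms: float, hourly_price_usd: float, num_examples: int = 10_000
) -> float:
    """Estimate the cost of classifying num_examples sequentially.

    Assumes total time = latency x number of examples, billed per hour
    with no minimum billing increment.
    """
    total_hours = (latency_ms / 1000.0) * num_examples / 3600.0
    return total_hours * hourly_price_usd


# Hypothetical: 0.5 ms per example on an instance priced at $1.80 per hour.
cost = inference_cost_usd(0.5, 1.80)
print(f"${cost:.2f}")  # $0.00
```

With sub-millisecond latencies the estimate falls well below one cent, which may be why some entries display as $0.00 once rounded to two decimal places.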
Objective: Time taken to train an image classification model to a test accuracy of 94% or greater on CIFAR10.
Rank | Date | Time to 94% Accuracy (h:mm:ss) | Model | Framework | Hardware |
---|---|---|---|---|---|
1 | Dec 2019 | 0:00:10 | Custom Resnet 9 | Pytorch 1.1.0 | Tesla V100 * 8 GPU / 32 GB / 40 CPU |
2 | Jan 2020 | 0:00:11 | Custom ResNet 9 | PyTorch 1.1.0 | IBM AC922 + 4 * Nvidia Tesla V100 (NCSA HAL) |
3 | Oct 2019 | 0:00:28 | Kakao Brain Custom ResNet9 | PyTorch 1.1.0 | Tesla V100 * 4 GPU / 488 GB / 56 CPU (Kakao Brain BrainCloud) |
4 | May 2019 | 0:00:45 | BaiduNet9P | PyTorch v1.0.1 and PaddlePaddle | Baidu Cloud Tesla 8*V100-16GB/448 GB/96 CPU |
5 | Oct 2019 | 0:00:58 | Kakao Brain Custom ResNet9 | PyTorch 1.1.0 | Tesla V100 * 1 GPU / 488 GB / 56 CPU (Kakao Brain BrainCloud) |
Objective: Total cost for public cloud instances to train an image classification model to a test accuracy of 94% or greater on CIFAR10.
Rank | Date | Cost (USD) | Model | Framework | Hardware |
---|---|---|---|---|---|
1 | May 2019 | $0.02 | BaiduNet9 | PyTorch v1.0.1 and PaddlePaddle | Baidu Cloud Tesla V100*1-16GB/56 GB/12 CPU |
2 | Aug 2019 | $0.04 | BaiduNet9 | fastai / Pytorch 1.0.0 | Lambda GPU Cloud - 4x GTX 1080 Ti |
3 | Nov 2018 | $0.06 | Custom ResNet 9 | pytorch 0.4.0 | V100 (AWS p3.2xlarge) |
4 | May 2019 | $0.11 | BaiduNet9P | PyTorch v1.0.1 and PaddlePaddle | Baidu Cloud Tesla 8*V100-16GB/448 GB/96 CPU |
5 | Apr 2018 | $0.26 | Custom Wide Resnet | fastai / pytorch | Paperspace Volta (V100) |
Objective: Latency required to classify one CIFAR10 image using a model with a test accuracy of 94% or greater.
Rank | Date | 1-example Latency (milliseconds) | Model | Framework | Hardware |
---|---|---|---|---|---|
1 | Nov 2019 | 0.1345 | ResNet8 | ModelArts-AIBOX + TensorRT | Huawei Cloud [pi2.2xlarge.4] |
2 | Apr 2019 | 0.6830 | BaiduNet8 using PyTorch JIT in C++ | PyTorch v1.0.1 and PaddlePaddle | Baidu Cloud Tesla V100*1/60 GB/12 CPU |
3 | Nov 2018 | 0.8280 | Custom ResNet 9 using PyTorch JIT in C++ | PyTorch v1.0.0.dev20181116 | 1 P100 / 128 GB / 16 CPU |
4 | Oct 2019 | 0.8570 | Kakao Brain Custom ResNet9 using PyTorch JIT in python | PyTorch 1.1.0 | Tesla V100 * 1 GPU / 488 GB / 56 CPU (Kakao Brain BrainCloud) |
5 | Oct 2017 | 9.7843 | ResNet 56 | PyTorch v0.1.12 | 1 K80 / 61 GB / 4 CPU (Amazon EC2 [p2.xlarge]) |
Objective: Average cost on public cloud instances to classify 10,000 test images from CIFAR10 using an image classification model with a test accuracy of 94% or greater.
Rank | Date | Cost (USD) | Model | Framework | Hardware |
---|---|---|---|---|---|
1 | Apr 2019 | $0.00 | BaiduNet8 using PyTorch JIT in C++ | PyTorch v1.0.1 and PaddlePaddle | Baidu Cloud Tesla V100*1/60 GB/12 CPU |
2 | Oct 2017 | $0.02 | ResNet 56 | PyTorch v0.1.12 | 1 K80 / 61 GB / 4 CPU (Amazon EC2 [p2.xlarge]) |
3 | Oct 2017 | $0.04 | ResNet 164 (without bottleneck) | TensorFlow v1.2 | 60 GB / 16 CPU (Google Cloud [n1-standard-16]) |
4 | Oct 2017 | $0.05 | ResNet 164 (with bottleneck) | TensorFlow v1.2 | 60 GB / 16 CPU (Google Cloud [n1-standard-16]) |
5 | Oct 2017 | $0.07 | ResNet 164 (without bottleneck) | PyTorch v0.1.12 | 1 K80 / 61 GB / 4 CPU (Amazon EC2 [p2.xlarge]) |
Objective: Time taken to train a question answering model to an F1 score of 0.75 or greater on the SQuAD development dataset.
Rank | Date | Time to 0.75 F1 (h:mm:ss) | Model | Framework | Hardware |
---|---|---|---|---|---|
1 | Mar 2019 | 0:18:46 | FastFusionNet | Pytorch v0.3.1 | 1 NVidia GTX-1080 Ti |
2 | Dec 2018 | 0:27:07 | DrQA | Pytorch 1.0.0 | 1 NVidia 2080 RTX (dev box) |
3 | Apr 2018 | 0:45:56 | QANet | TensorFlow v1.8 | 1 TPUv2 |
4 | Dec 2018 | 0:50:21 | DrQA | Pytorch 1.0.0 | 1 T4 / GCP |
5 | Dec 2018 | 0:56:43 | DrQA | Pytorch 1.0.0 | 1 P4 / GCP |
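The F1 threshold in the SQuAD objective refers to a token-overlap score between a predicted answer and a reference answer. The sketch below follows the commonly used definition (lowercase, strip punctuation and English articles, then compare token multisets); treating this as the exact scoring used for these results is an assumption, and the helper names are hypothetical.

```python
import re
import string
from collections import Counter
from typing import List


def normalize(text: str) -> List[str]:
    """Lowercase, drop punctuation and the articles a/an/the, split on whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def f1_score(prediction: str, truth: str) -> float:
    """Harmonic mean of token-level precision and recall for one answer pair."""
    pred_tokens = normalize(prediction)
    truth_tokens = normalize(truth)
    # Multiset intersection counts each shared token at most as often as it
    # appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(truth_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(truth_tokens)
    return 2 * precision * recall / (precision + recall)


# Precision 2/4, recall 2/2, so F1 = 2/3.
print(round(f1_score("the Eiffel Tower in Paris", "Eiffel Tower"), 4))  # 0.6667
```

A dataset-level score would typically average this per-question value, taking the best match when a question has several reference answers.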
Objective: Total cost for public cloud instances to train a question answering model to an F1 score of 0.75 or greater on the SQuAD development dataset.
Rank | Date | Cost (USD) | Model | Framework | Hardware |
---|---|---|---|---|---|
1 | Dec 2018 | $0.57 | DrQA | Pytorch 1.0.0 | 1 P4 / GCP |
2 | Dec 2018 | $0.76 | DrQA | Pytorch 1.0.0 | 1 T4 / GCP |
3 | Sep 2018 | $1.23 | DrQA | Pytorch 0.4.1 | 1 K80 / AWS p2.xlarge |
4 | Sep 2018 | $3.09 | DrQA | Pytorch 0.4.1 | 1 V100 / AWS p3.2xlarge |
5 | Oct 2017 | $5.78 | BiDAF | TensorFlow v1.2 | 60 GB / 16 CPU (Google Cloud [n1-standard-16]) |
Objective: Latency required to answer one SQuAD question using a model with an F1 score of at least 0.75 on the development dataset.
Rank | Date | 1-example Latency (milliseconds) | Model | Framework | Hardware |
---|---|---|---|---|---|
1 | Jul 2019 | 7.5790 | PA-Occam-Bert | Tensorflow 1.13.0 | 1 NVidia Tesla V100 |
2 | Feb 2019 | 7.9000 | FastFusionNet | Pytorch v0.3.1 | 1 NVidia GTX-1080 Ti |
3 | Oct 2017 | 100.0000 | BiDAF | TensorFlow v1.2 | 60 GB / 16 CPU (Google Cloud [n1-standard-16]) |
4 | Oct 2017 | 590.0000 | BiDAF | TensorFlow v1.2 | 1 K80 / 30 GB / 8 CPU (Google Cloud) |
5 | Oct 2017 | 638.1000 | BiDAF | TensorFlow v1.2 | 1 P100 / 512 GB / 56 CPU (DAWN Internal Cluster) |
Objective: Average cost on public cloud instances to answer 10,000 questions from the SQuAD development dataset using a question answering model with a dev F1 score of 0.75 or greater.
Rank | Date | Cost (USD) | Model | Framework | Hardware |
---|---|---|---|---|---|
1 | Oct 2017 | $0.15 | BiDAF | TensorFlow v1.2 | 60 GB / 16 CPU (Google Cloud [n1-standard-16]) |
2 | Oct 2017 | $1.58 | BiDAF | TensorFlow v1.2 | 1 K80 / 30 GB / 8 CPU (Google Cloud) |
3 | Oct 2017 | $1.76 | BiDAF | TensorFlow v1.2 | 1 K80 / 61 GB / 4 CPU (Amazon EC2 [p2.xlarge]) |
DAWNBench is part of a larger community conversation about the future of machine learning infrastructure. Sound off on the DAWNBench Google group.