DAWNBench

An End-to-End Deep Learning Benchmark and Competition

DAWNBench is a benchmark suite for end-to-end deep learning training and inference. Computation time and cost are critical resources in building deep models, yet many existing benchmarks focus solely on model accuracy. DAWNBench provides a reference set of common deep learning workloads for quantifying training time, training cost, inference latency, and inference cost across different optimization strategies, model architectures, software frameworks, clouds, and hardware.

The first iteration of DAWNBench is over, and the competition results and key takeaways have been finalized. However, we are still curious to see how well people can do on this benchmark and are now accepting rolling submissions. The original results before the April 20, 2018 deadline are archived for reference. For a more comprehensive benchmark, please consider submitting to the updated MLPerf benchmark.

Image Classification on ImageNet

Training Time

Objective: Time taken to train an image classification model to a top-5 validation accuracy of 93% or greater on ImageNet.

| Rank | Date | Time to 93% Accuracy | Model | Submitter | Hardware | Framework |
|------|------|----------------------|-------|-----------|----------|-----------|
| 1 | Nov 2018 | 0:10:28 | ResNet-50 | ModelArts Service of Huawei Cloud | 16 nodes with RDMA (8 × V100 per node) | TensorFlow v1.8.0 |
| 2 | Nov 2018 | 0:12:50 | ResNet-50 | ModelArts Service of Huawei Cloud | 16 × 8 Tesla V100 (ModelArts Service) | MXNet v1.2.1 |
| 3 | Sep 2018 | 0:18:06 | ResNet-50 | fast.ai/DIUx (Yaroslav Bulatov, Andrew Shaw, Jeremy Howard) | 16 p3.16xlarge (AWS) | PyTorch 0.4.1 |
| 4 | Sep 2018 | 0:18:53 | ResNet-50 | Andrew Shaw, Yaroslav Bulatov, Jeremy Howard | 64 × V100 (8 machines, AWS p3.16xlarge) | ncluster / PyTorch 0.5.0a0+0e8088d |
| 5 | Sep 2018 | 0:29:43 | ResNet-50 | Andrew Shaw, Yaroslav Bulatov, Jeremy Howard | 32 × V100 (4 machines, AWS p3.16xlarge) | ncluster / PyTorch 0.5.0a0+0e8088d |
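
All of these times are wall-clock measurements of a single run trained to the accuracy threshold. As an illustration of the protocol, here is a minimal sketch of a time-to-accuracy harness; `train_one_epoch` and `top5_accuracy` are hypothetical stand-ins for a real ImageNet training epoch and validation pass, and the official rules govern details such as whether evaluation time counts.

```python
import time

def time_to_accuracy(train_one_epoch, top5_accuracy, target=0.93, max_epochs=200):
    """Train until top-5 validation accuracy reaches `target`; return elapsed seconds."""
    start = time.perf_counter()
    for _ in range(max_epochs):
        train_one_epoch()                       # one full pass over the training set
        if top5_accuracy() >= target:           # check held-out validation accuracy
            return time.perf_counter() - start
    return None                                 # target accuracy never reached

# Toy stand-ins so the harness runs end to end; a real submission would plug in
# an actual training epoch and validation pass here.
accuracies = iter([0.71, 0.85, 0.91, 0.934])
elapsed = time_to_accuracy(lambda: time.sleep(0.01), lambda: next(accuracies))
print("did not converge" if elapsed is None else f"time to 93% top-5: {elapsed:.2f}s")
```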

Training Cost

Objective: Total cost for public cloud instances to train an image classification model to a top-5 validation accuracy of 93% or greater on ImageNet.

| Rank | Date | Cost (USD) | Model | Submitter | Hardware | Framework |
|------|------|------------|-------|-----------|----------|-----------|
| 1 | Sep 2018 | $12.60 | ResNet-50 | Google Cloud TPU | GCP n1-standard-2, Cloud TPU | TensorFlow v1.11.0 |
| 2 | Sep 2018 | $27.00 | ResNet-50 | Google Cloud TPU | GCP n1-standard-2, Cloud TPU | TensorFlow v1.11.0 |
| 3 | Sep 2018 | $48.48 | ResNet-50 | Andrew Shaw, Yaroslav Bulatov, Jeremy Howard | 32 × V100 (4 machines, AWS p3.16xlarge) | ncluster / PyTorch 0.5.0a0+0e8088d |
| 4 | Apr 2018 | $49.30 | AmoebaNet-D N6F256 | Google Cloud TPU | GCP n1-standard-2, Cloud TPU | TensorFlow 1.8.0-rc0 |
| 5 | Apr 2018 | $58.53 | ResNet-50 | Google Cloud TPU | GCP n1-standard-2, Cloud TPU | TensorFlow v1.8rc1 |
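
Training cost is the time-to-accuracy result converted to dollars at the provider's billing rate. A back-of-the-envelope sketch, assuming simple hourly on-demand billing and an illustrative p3.16xlarge rate of roughly $24.48/hour (actual rates and billing granularity vary by provider, region, and over time):

```python
def training_cost(wall_clock_hours, num_instances, usd_per_instance_hour):
    """Total spend for a training run billed by instance-hours."""
    return wall_clock_hours * num_instances * usd_per_instance_hour

# e.g. the 4-machine, 0:29:43 (~29.72-minute) run above at an assumed
# $24.48/hour per p3.16xlarge comes out near its listed $48.48:
print(f"${training_cost(29.72 / 60, 4, 24.48):.2f}")  # -> $48.50
```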

Inference Latency

Objective: Latency required to classify one ImageNet image using a model with a top-5 validation accuracy of 93% or greater.

| Rank | Date | 1-example Latency (milliseconds) | Model | Submitter | Hardware | Framework |
|------|------|----------------------------------|-------|-----------|----------|-----------|
| 1 | Apr 2018 | 9.9600 | ResNet-50 | Intel(R) Corporation | Amazon EC2 [c5.18xlarge] | Intel(R) Optimized Caffe |
| 2 | Apr 2018 | 12.4000 | ResNet-50 | Intel(R) Corporation | Amazon EC2 [c5.4xlarge] | Intel(R) Optimized Caffe |
| 3 | Apr 2018 | 17.3800 | ResNet-50 | Intel(R) Corporation | Amazon EC2 [c5.2xlarge] | Intel(R) Optimized Caffe |
| 4 | Nov 2017 | 22.2700 | ResNet-152 | Stanford DAWN | 1 P100 / 30 GB / 8 CPU (Google Compute) | TensorFlow v1.2 |
| 5 | Nov 2017 | 26.8200 | ResNet-152 | Stanford DAWN | 1 P100 / 30 GB / 8 CPU (Google Compute) | MXNet 0.11.0 |
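
Single-example latency is typically measured by timing many repeated predictions after a warm-up phase. A minimal sketch of one such protocol, with `predict` as a hypothetical stand-in for a real model's forward pass; whether the reported statistic is a mean, median, or percentile is up to the official rules:

```python
import time
import statistics

def single_example_latency_ms(predict, example, warmup=50, runs=500):
    """Median wall-clock time, in milliseconds, to classify one example at a time."""
    for _ in range(warmup):                  # warm up caches, JITs, GPU kernels, etc.
        predict(example)
    samples = []
    for _ in range(runs):
        t0 = time.perf_counter()
        predict(example)
        samples.append((time.perf_counter() - t0) * 1000.0)
    return statistics.median(samples)

# Toy stand-in for a model's forward pass:
print(single_example_latency_ms(lambda x: sum(v * v for v in x), list(range(10_000))))
```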

Inference Cost

Objective: Average cost on public cloud instances to classify 10,000 validation images from ImageNet using an image classification model with a top-5 validation accuracy of 93% or greater.

| Rank | Date | Cost (USD) | Model | Submitter | Framework | Hardware |
|------|------|------------|-------|-----------|-----------|----------|
| 1 | Apr 2018 | $0.02 | ResNet-50 | Intel(R) Corporation | Intel(R) Optimized Caffe | Amazon EC2 [c5.2xlarge] |
| 2 | Apr 2018 | $0.02 | ResNet-50 | Intel(R) Corporation | Intel(R) Optimized Caffe | Amazon EC2 [c5.4xlarge] |
| 3 | Nov 2017 | $0.07 | ResNet-152 | Stanford DAWN | MXNet 0.11.0 | 1 K80 / 61 GB / 4 CPU (Amazon EC2 [p2.xlarge]) |
| 4 | Nov 2017 | $0.11 | ResNet-152 | Stanford DAWN | TensorFlow v1.2 | 1 P100 / 30 GB / 8 CPU (Google Compute) |
| 5 | Nov 2017 | $0.12 | ResNet-152 | Stanford DAWN | TensorFlow v1.2 | 1 K80 / 61 GB / 4 CPU (Amazon EC2 [p2.xlarge]) |
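
Inference cost follows directly from per-example latency and the instance's billing rate. A rough sketch, assuming strictly sequential single-example inference and an illustrative c5.2xlarge rate of about $0.34/hour (real submissions may batch requests, and billing granularity varies):

```python
def inference_cost(n_examples, latency_ms_per_example, usd_per_hour):
    """Cost to classify `n_examples` one at a time on an hourly-billed instance."""
    hours = n_examples * latency_ms_per_example / 1000.0 / 3600.0
    return hours * usd_per_hour

# e.g. 10,000 images at 17.38 ms each on an instance assumed to cost $0.34/hour:
print(f"${inference_cost(10_000, 17.38, 0.34):.4f}")  # -> $0.0164, i.e. ~$0.02
```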

Image Classification on CIFAR10

Training Time

Objective: Time taken to train an image classification model to a test accuracy of 94% or greater on CIFAR10.

| Rank | Date | Time to 94% Accuracy | Model | Submitter | Framework | Hardware |
|------|------|----------------------|-------|-----------|-----------|----------|
| 1 | Nov 2018 | 0:01:15 | Custom ResNet-9 | David Page, myrtle.ai | PyTorch 0.4.0 | V100 (AWS p3.2xlarge) |
| 2 | Apr 2018 | 0:02:54 | Custom Wide ResNet | fast.ai + students team: Jeremy Howard, Andrew Shaw, Brett Koonce, Sylvain Gugger | fastai / PyTorch | 8 × V100 (AWS p3.16xlarge) |
| 3 | Apr 2018 | 0:05:41 | ResNet-18 + minor modifications | bkj | PyTorch 0.3.1.post2 | V100 (AWS p3.2xlarge) |
| 4 | Apr 2018 | 0:06:45 | Custom Wide ResNet | fast.ai + students team: Jeremy Howard, Andrew Shaw, Brett Koonce, Sylvain Gugger | fastai / PyTorch | Paperspace Volta (V100) |
| 5 | Apr 2018 | 0:35:37 | KervResNet34 | Chen Wang | PyTorch 0.3.1 | 1 GPU (NVIDIA GeForce GTX 1080 Ti) |

Training Cost

Objective: Total cost for public cloud instances to train an image classification model to a test accuracy of 94% or greater on CIFAR10.

| Rank | Date | Cost (USD) | Model | Submitter | Framework | Hardware |
|------|------|------------|-------|-----------|-----------|----------|
| 1 | Nov 2018 | $0.06 | Custom ResNet-9 | David Page, myrtle.ai | PyTorch 0.4.0 | V100 (AWS p3.2xlarge) |
| 2 | Apr 2018 | $0.26 | Custom Wide ResNet | fast.ai + students team: Jeremy Howard, Andrew Shaw, Brett Koonce, Sylvain Gugger | fastai / PyTorch | Paperspace Volta (V100) |
| 3 | Apr 2018 | $0.29 | ResNet-18 + minor modifications | bkj | PyTorch 0.3.1.post2 | V100 (AWS p3.2xlarge) |
| 4 | Apr 2018 | $1.18 | Custom Wide ResNet | fast.ai + students team: Jeremy Howard, Andrew Shaw, Brett Koonce, Sylvain Gugger | fastai / PyTorch | 8 × V100 (AWS p3.16xlarge) |
| 5 | Jan 2018 | $3.46 | ResNet-50 | DIUx | TensorFlow 1.5, tensorpack 0.8.1 | p3.2xlarge |

Inference Latency

Objective: Latency required to classify one CIFAR10 image using a model with a test accuracy of 94% or greater.

| Rank | Date | 1-example Latency (milliseconds) | Model | Submitter | Framework | Hardware |
|------|------|----------------------------------|-------|-----------|-----------|----------|
| 1 | Nov 2018 | 0.8280 | Custom ResNet-9 using PyTorch JIT in C++ | Laurent Mazare | PyTorch v1.0.0.dev20181116 | 1 P100 / 128 GB / 16 CPU |
| 2 | Oct 2017 | 9.7843 | ResNet-56 | Stanford DAWN | PyTorch v0.1.12 | 1 K80 / 61 GB / 4 CPU (Amazon EC2 [p2.xlarge]) |
| 3 | Oct 2017 | 24.6291 | ResNet-164 (with bottleneck) | Stanford DAWN | PyTorch v0.1.12 | 1 P100 / 512 GB / 56 CPU (DAWN Internal Cluster) |
| 4 | Oct 2017 | 24.9200 | ResNet-164 (without bottleneck) | Stanford DAWN | TensorFlow v1.2 | 60 GB / 16 CPU (Google Cloud [n1-standard-16]) |
| 5 | Oct 2017 | 25.2188 | ResNet-164 (without bottleneck) | Stanford DAWN | PyTorch v0.1.12 | 1 P100 / 512 GB / 56 CPU (DAWN Internal Cluster) |

Inference Cost

Objective: Average cost on public cloud instances to classify 10,000 test images from CIFAR10 using an image classification model with a test accuracy of 94% or greater.

| Rank | Date | Cost (USD) | Model | Submitter | Framework | Hardware |
|------|------|------------|-------|-----------|-----------|----------|
| 1 | Oct 2017 | $0.02 | ResNet-56 | Stanford DAWN | PyTorch v0.1.12 | 1 K80 / 61 GB / 4 CPU (Amazon EC2 [p2.xlarge]) |
| 2 | Oct 2017 | $0.04 | ResNet-164 (without bottleneck) | Stanford DAWN | TensorFlow v1.2 | 60 GB / 16 CPU (Google Cloud [n1-standard-16]) |
| 3 | Oct 2017 | $0.05 | ResNet-164 (with bottleneck) | Stanford DAWN | TensorFlow v1.2 | 60 GB / 16 CPU (Google Cloud [n1-standard-16]) |
| 4 | Oct 2017 | $0.07 | ResNet-164 (without bottleneck) | Stanford DAWN | PyTorch v0.1.12 | 1 K80 / 61 GB / 4 CPU (Amazon EC2 [p2.xlarge]) |
| 5 | Oct 2017 | $0.07 | ResNet-164 (with bottleneck) | Stanford DAWN | PyTorch v0.1.12 | 1 K80 / 61 GB / 4 CPU (Amazon EC2 [p2.xlarge]) |

Question Answering on SQuAD

Training Time

Objective: Time taken to train a question answering model to an F1 score of 0.75 or greater on the SQuAD development dataset.

| Rank | Date | Time to 0.75 F1 | Model | Submitter | Framework | Hardware |
|------|------|-----------------|-------|-----------|-----------|----------|
| 1 | Sep 2018 | 0:36:25 | DrQA | Runqi Yang, Facebook ParlAI, Brett Koonce | PyTorch 0.4.1 | 1 NVIDIA 1070 Ti (dev box) |
| 2 | Apr 2018 | 0:45:56 | QANet | Google | TensorFlow v1.8 | 1 TPUv2 |
| 3 | Oct 2018 | 0:59:40 | DrQA | Runqi Yang, Facebook ParlAI, Brett Koonce | PyTorch 0.4.1 | 1 P4 / GCP |
| 4 | Sep 2018 | 1:00:35 | DrQA | Runqi Yang, Facebook ParlAI, Brett Koonce | PyTorch 0.4.1 | 1 V100 / AWS p3.2xlarge |
| 5 | Sep 2018 | 1:21:55 | DrQA | Runqi Yang, Facebook ParlAI, Brett Koonce | PyTorch 0.4.1 | 1 K80 / AWS p2.xlarge |

Training Cost

Objective: Total cost for public cloud instances to train a question answering model to an F1 score of 0.75 or greater on the SQuAD development dataset.

| Rank | Date | Cost (USD) | Model | Submitter | Framework | Hardware |
|------|------|------------|-------|-----------|-----------|----------|
| 1 | Oct 2018 | $0.60 | DrQA | Runqi Yang, Facebook ParlAI, Brett Koonce | PyTorch 0.4.1 | 1 P4 / GCP |
| 2 | Sep 2018 | $1.23 | DrQA | Runqi Yang, Facebook ParlAI, Brett Koonce | PyTorch 0.4.1 | 1 K80 / AWS p2.xlarge |
| 3 | Sep 2018 | $3.09 | DrQA | Runqi Yang, Facebook ParlAI, Brett Koonce | PyTorch 0.4.1 | 1 V100 / AWS p3.2xlarge |
| 4 | Oct 2017 | $5.78 | BiDAF | Stanford DAWN | TensorFlow v1.2 | 60 GB / 16 CPU (Google Cloud [n1-standard-16]) |
| 5 | Oct 2017 | $6.87 | BiDAF | Stanford DAWN | TensorFlow v1.2 | 1 K80 / 61 GB / 4 CPU (Amazon EC2 [p2.xlarge]) |

Inference Latency

Objective: Latency required to answer one SQuAD question using a model with an F1 score of at least 0.75 on the development dataset.

| Rank | Date | 1-example Latency (milliseconds) | Model | Submitter | Framework | Hardware |
|------|------|----------------------------------|-------|-----------|-----------|----------|
| 1 | Oct 2017 | 100.0000 | BiDAF | Stanford DAWN | TensorFlow v1.2 | 60 GB / 16 CPU (Google Cloud [n1-standard-16]) |
| 2 | Oct 2017 | 590.0000 | BiDAF | Stanford DAWN | TensorFlow v1.2 | 1 K80 / 30 GB / 8 CPU (Google Cloud) |
| 3 | Oct 2017 | 638.1000 | BiDAF | Stanford DAWN | TensorFlow v1.2 | 1 P100 / 512 GB / 56 CPU (DAWN Internal Cluster) |
| 4 | Oct 2017 | 705.9000 | BiDAF | Stanford DAWN | TensorFlow v1.2 | 1 K80 / 61 GB / 4 CPU (Amazon EC2 [p2.xlarge]) |

Inference Cost

Objective: Average cost on public cloud instances to answer 10,000 questions from the SQuAD development dataset using a question answering model with a dev F1 score of 0.75 or greater.

| Rank | Date | Cost (USD) | Model | Submitter | Framework | Hardware |
|------|------|------------|-------|-----------|-----------|----------|
| 1 | Oct 2017 | $0.15 | BiDAF | Stanford DAWN | TensorFlow v1.2 | 60 GB / 16 CPU (Google Cloud [n1-standard-16]) |
| 2 | Oct 2017 | $1.58 | BiDAF | Stanford DAWN | TensorFlow v1.2 | 1 K80 / 30 GB / 8 CPU (Google Cloud) |
| 3 | Oct 2017 | $1.76 | BiDAF | Stanford DAWN | TensorFlow v1.2 | 1 K80 / 61 GB / 4 CPU (Amazon EC2 [p2.xlarge]) |

Join Us

DAWNBench is part of a larger community conversation about the future of machine learning infrastructure. Sound off on the DAWNBench Google Group.

Disclosure: The Stanford DAWN research project is a five-year industrial affiliates program at Stanford University and is financially supported in part by founding members including Intel, Microsoft, NEC, Teradata, VMware, and Google. For more information, including information regarding Stanford's policies on openness in research and policies affecting industrial affiliates program membership, please see DAWN's membership page.