DAWNBench

An End-to-End Deep Learning Benchmark and Competition

DAWNBench is a benchmark suite for end-to-end deep learning training and inference. Computation time and cost are critical resources in building deep models, yet many existing benchmarks focus solely on model accuracy. DAWNBench provides a reference set of common deep learning workloads for quantifying training time, training cost, inference latency, and inference cost across different optimization strategies, model architectures, software frameworks, clouds, and hardware.

Deadline: April 20, 2018 at 11:59 PM PDT. All pull requests submitted to the dawn-bench-entries repository by the deadline will be reviewed over the following two weeks. Final results will be announced in early May.

Image Classification on ImageNet

Training Time

Objective: Time taken to train an image classification model to a top-5 validation accuracy of 93% or greater on ImageNet.
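
The clock for this metric starts at the beginning of training and stops at the first evaluation where top-5 validation accuracy reaches 93%. As a rough illustration of the accuracy check (not the official evaluation harness; `logits` and `labels` are placeholders for a submission's validation outputs), top-5 accuracy can be computed as:

```python
import numpy as np

def top5_accuracy(logits: np.ndarray, labels: np.ndarray) -> float:
    """Fraction of examples whose true label is among the 5 highest-scoring classes.

    logits: (num_examples, num_classes) model scores on the validation set.
    labels: (num_examples,) integer class labels.
    """
    # Indices of the 5 largest logits per row; order within the top 5 is irrelevant.
    top5 = np.argpartition(logits, -5, axis=1)[:, -5:]
    return float((top5 == labels[:, None]).any(axis=1).mean())

# A run "finishes" at the first checkpoint where
# top5_accuracy(val_logits, val_labels) >= 0.93.
```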

Rank | Date | Time to 93% Accuracy | Model | Submitter | Hardware | Framework
1 | Apr 2018 | 0:30:43 | ResNet50 | Google | Half of a TPUv2 Pod | TensorFlow 1.8.0-rc1
2 | Apr 2018 | 1:06:32 | AmoebaNet-D N6F256 | Google | 1/4 of a TPUv2 Pod | TensorFlow 1.8.0-rc1
3 | Apr 2018 | 1:58:24 | AmoebaNet-D N6F256 | Google | 1/16 of a TPUv2 Pod | TensorFlow 1.8.0-rc1
4 | Apr 2018 | 2:57:49 | ResNet50 | fast.ai + students team: Jeremy Howard, Andrew Shaw, Brett Koonce, Sylvain Gugger | 8 * V100 (AWS p3.16xlarge) | fastai / PyTorch
5 | Apr 2018 | 3:25:55 | ResNet50 | Intel(R) Corporation | 128 nodes with Xeon Platinum 8124M / 144 GB / 36 Cores (Amazon EC2 [c5.18xlarge]) | Intel(R) Optimized Caffe

Training Cost

Objective: Total cost of public cloud instances to train an image classification model to a top-5 validation accuracy of 93% or greater on ImageNet.
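
Cost here is simply the measured wall-clock training time multiplied by the published on-demand price of the instances used. A minimal sketch, where the hourly rate is a placeholder taken from the provider's price list (the $24.48/hr figure below is a hypothetical p3.16xlarge on-demand rate, used only to show that 2:57:49 of training works out to roughly the $72.54 entry below):

```python
def training_cost_usd(train_seconds: float, hourly_rate_usd: float,
                      num_instances: int = 1) -> float:
    """Total public-cloud cost of a training run billed per instance-hour."""
    return train_seconds / 3600.0 * hourly_rate_usd * num_instances

# Hypothetical example: a single 8-GPU instance at $24.48/hr for 2:57:49.
print(training_cost_usd(2 * 3600 + 57 * 60 + 49, 24.48))  # ~72.55
```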

Rank | Date | Cost (USD) | Model | Submitter | Hardware | Framework
1 | Apr 2018 | $49.30 | AmoebaNet-D N6F256 | Google Cloud TPU | GCP n1-standard-2, Cloud TPU | TensorFlow 1.8.0-rc0
2 | Apr 2018 | $58.53 | ResNet50 | Google Cloud TPU | GCP n1-standard-2, Cloud TPU | TensorFlow v1.8rc1
3 | Apr 2018 | $72.54 | ResNet50 | fast.ai + students team: Jeremy Howard, Andrew Shaw, Brett Koonce, Sylvain Gugger | 8 * V100 (AWS p3.16xlarge) | fastai / PyTorch
4 | Mar 2018 | $82.07 | ResNet50 | Google Cloud TPU | GCP n1-standard-2, Cloud TPU | TensorFlow v1.7rc1
5 | Jan 2018 | $358.22 | ResNet50 | DIUX | p3.16xlarge | TensorFlow 1.5, tensorpack 0.8.1

Inference Latency

Objective: Latency required to classify one ImageNet image using a model with a top-5 validation accuracy of 93% or greater.
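
Latency is measured for a single example at a time (batch size 1), so large-batch throughput tricks do not help here. A framework-agnostic sketch of how such a measurement might look; `model` and `example` are placeholders, and a real harness would also fix preprocessing and warm-up details:

```python
import time

def one_example_latency_ms(model, example, warmup: int = 10, iters: int = 100) -> float:
    """Mean wall-clock time, in milliseconds, to classify a single input."""
    for _ in range(warmup):      # untimed warm-up: caches, JIT, lazy initialization
        model(example)
    start = time.perf_counter()
    for _ in range(iters):
        model(example)
    return (time.perf_counter() - start) / iters * 1000.0
```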

Rank | Date | 1-example Latency (ms) | Model | Submitter | Hardware | Framework
1 | Apr 2018 | 9.9600 | ResNet50 | Intel(R) Corporation | Amazon EC2 [c5.18xlarge] | Intel(R) Optimized Caffe
2 | Apr 2018 | 12.4000 | ResNet50 | Intel(R) Corporation | Amazon EC2 [c5.4xlarge] | Intel(R) Optimized Caffe
3 | Apr 2018 | 17.3800 | ResNet50 | Intel(R) Corporation | Amazon EC2 [c5.2xlarge] | Intel(R) Optimized Caffe
4 | Nov 2017 | 22.2700 | ResNet 152 | Stanford DAWN | 1 P100 / 30 GB / 8 CPU (Google Compute) | TensorFlow v1.2
5 | Nov 2017 | 26.8200 | ResNet 152 | Stanford DAWN | 1 P100 / 30 GB / 8 CPU (Google Compute) | MXNet 0.11.0

Inference Cost

Objective: Average cost on public cloud instances to classify 10,000 validation images from ImageNet using an image classification model with a top-5 validation accuracy of 93% or greater.
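
Inference cost follows directly from the latency and cost metrics above: the time to classify all 10,000 images one by one, converted to instance-hours at the on-demand rate. A sketch under the same assumptions (the $0.34/hr figure is a hypothetical c5.2xlarge rate, shown only to illustrate how a ~17 ms latency rounds to the $0.02 entry below):

```python
def inference_cost_usd(latency_ms: float, hourly_rate_usd: float,
                       num_examples: int = 10_000) -> float:
    """Cost of classifying num_examples images sequentially at a fixed per-image latency."""
    total_hours = latency_ms / 1000.0 * num_examples / 3600.0
    return total_hours * hourly_rate_usd

# Hypothetical example: 17.38 ms/image on a $0.34/hr instance.
print(inference_cost_usd(17.38, 0.34))  # ~0.0164, i.e. about $0.02
```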

Rank | Date | Cost (USD) | Model | Submitter | Hardware | Framework
1 | Apr 2018 | $0.02 | ResNet50 | Intel(R) Corporation | Amazon EC2 [c5.2xlarge] | Intel(R) Optimized Caffe
2 | Apr 2018 | $0.02 | ResNet50 | Intel(R) Corporation | Amazon EC2 [c5.4xlarge] | Intel(R) Optimized Caffe
3 | Nov 2017 | $0.07 | ResNet 152 | Stanford DAWN | 1 K80 / 61 GB / 4 CPU (Amazon EC2 [p2.xlarge]) | MXNet 0.11.0
4 | Nov 2017 | $0.11 | ResNet 152 | Stanford DAWN | 1 P100 / 30 GB / 8 CPU (Google Compute) | TensorFlow v1.2
5 | Nov 2017 | $0.12 | ResNet 152 | Stanford DAWN | 1 K80 / 61 GB / 4 CPU (Amazon EC2 [p2.xlarge]) | TensorFlow v1.2

Image Classification on CIFAR10

Training Time

Objective: Time taken to train an image classification model to a test accuracy of 94% or greater on CIFAR10.

Rank | Date | Time to 94% Accuracy | Model | Submitter | Hardware | Framework
1 | Apr 2018 | 0:02:54 | Custom Wide ResNet | fast.ai + students team: Jeremy Howard, Andrew Shaw, Brett Koonce, Sylvain Gugger | 8 * V100 (AWS p3.16xlarge) | fastai / PyTorch
2 | Apr 2018 | 0:05:41 | ResNet18 + minor modifications | bkj | V100 (AWS p3.2xlarge) | PyTorch 0.3.1.post2
3 | Apr 2018 | 0:06:45 | Custom Wide ResNet | fast.ai + students team: Jeremy Howard, Andrew Shaw, Brett Koonce, Sylvain Gugger | Paperspace Volta (V100) | fastai / PyTorch
4 | Apr 2018 | 0:35:37 | KervResNet34 | Chen Wang | 1 GPU (Nvidia GeForce GTX 1080 Ti) | PyTorch 0.3.1
5 | Jan 2018 | 1:07:55 | ResNet50 | DIUX | p3.2xlarge | TensorFlow 1.5, tensorpack 0.8.1

Training Cost

Objective: Total cost for public cloud instances to train an image classification model to a test accuracy of 94% or greater on CIFAR10.

Rank | Date | Cost (USD) | Model | Submitter | Hardware | Framework
1 | Apr 2018 | $0.26 | Custom Wide ResNet | fast.ai + students team: Jeremy Howard, Andrew Shaw, Brett Koonce, Sylvain Gugger | Paperspace Volta (V100) | fastai / PyTorch
2 | Apr 2018 | $0.29 | ResNet18 + minor modifications | bkj | V100 (AWS p3.2xlarge) | PyTorch 0.3.1.post2
3 | Apr 2018 | $1.18 | Custom Wide ResNet | fast.ai + students team: Jeremy Howard, Andrew Shaw, Brett Koonce, Sylvain Gugger | 8 * V100 (AWS p3.16xlarge) | fastai / PyTorch
4 | Jan 2018 | $3.46 | ResNet50 | DIUX | p3.2xlarge | TensorFlow 1.5, tensorpack 0.8.1
5 | Jan 2018 | $3.78 | ResNet50 | DIUX | g3.4xlarge | TensorFlow 1.5, tensorpack 0.8.1

Inference Latency

Objective: Latency required to classify one CIFAR10 image using a model with a test accuracy of 94% or greater.

Rank | Date | 1-example Latency (ms) | Model | Submitter | Hardware | Framework
1 | Oct 2017 | 9.7843 | ResNet 56 | Stanford DAWN | 1 K80 / 61 GB / 4 CPU (Amazon EC2 [p2.xlarge]) | PyTorch v0.1.12
2 | Oct 2017 | 24.6291 | ResNet 164 (with bottleneck) | Stanford DAWN | 1 P100 / 512 GB / 56 CPU (DAWN Internal Cluster) | PyTorch v0.1.12
3 | Oct 2017 | 24.9200 | ResNet 164 (without bottleneck) | Stanford DAWN | 60 GB / 16 CPU (Google Cloud [n1-standard-16]) | TensorFlow v1.2
4 | Oct 2017 | 25.2188 | ResNet 164 (without bottleneck) | Stanford DAWN | 1 P100 / 512 GB / 56 CPU (DAWN Internal Cluster) | PyTorch v0.1.12
5 | Oct 2017 | 28.1000 | ResNet 164 (with bottleneck) | Stanford DAWN | 1 P100 / 512 GB / 56 CPU (DAWN Internal Cluster) | TensorFlow v1.2

Inference Cost

Objective: Average cost on public cloud instances to classify 10,000 test images from CIFAR10 using an image classification model with a test accuracy of 94% or greater.

Rank | Date | Cost (USD) | Model | Submitter | Hardware | Framework
1 | Oct 2017 | $0.02 | ResNet 56 | Stanford DAWN | 1 K80 / 61 GB / 4 CPU (Amazon EC2 [p2.xlarge]) | PyTorch v0.1.12
2 | Oct 2017 | $0.04 | ResNet 164 (without bottleneck) | Stanford DAWN | 60 GB / 16 CPU (Google Cloud [n1-standard-16]) | TensorFlow v1.2
3 | Oct 2017 | $0.05 | ResNet 164 (with bottleneck) | Stanford DAWN | 60 GB / 16 CPU (Google Cloud [n1-standard-16]) | TensorFlow v1.2
4 | Oct 2017 | $0.07 | ResNet 164 (without bottleneck) | Stanford DAWN | 1 K80 / 61 GB / 4 CPU (Amazon EC2 [p2.xlarge]) | PyTorch v0.1.12
5 | Oct 2017 | $0.07 | ResNet 164 (with bottleneck) | Stanford DAWN | 1 K80 / 61 GB / 4 CPU (Amazon EC2 [p2.xlarge]) | PyTorch v0.1.12

Question Answering on SQuAD

Training Time

Objective: Time taken to train a question answering model to an F1 score of 0.75 or greater on the SQuAD development dataset.
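
SQuAD's F1 metric measures token-level overlap between the predicted answer span and a reference answer: precision is the fraction of predicted tokens found in the reference, recall is the reverse, and F1 is their harmonic mean. A simplified sketch of the core computation (the official evaluation script additionally lowercases, strips punctuation and articles, and takes the maximum over multiple reference answers):

```python
from collections import Counter

def squad_f1(prediction: str, ground_truth: str) -> float:
    """Token-overlap F1 between a predicted and a reference answer span."""
    pred_tokens = prediction.split()
    true_tokens = ground_truth.split()
    common = Counter(pred_tokens) & Counter(true_tokens)  # multiset intersection
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(true_tokens)
    return 2 * precision * recall / (precision + recall)

# squad_f1("Denver Broncos", "the Denver Broncos")  ->  0.8
```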

Rank | Date | Time to 0.75 F1 | Model | Submitter | Hardware | Framework
1 | Oct 2017 | 7:38:10 | BiDAF | Stanford DAWN | 1 K80 / 61 GB / 4 CPU (Amazon EC2 [p2.xlarge]) | TensorFlow v1.2
2 | Oct 2017 | 7:51:22 | BiDAF | Stanford DAWN | 1 P100 / 512 GB / 56 CPU (DAWN Internal Cluster) | TensorFlow v1.2
3 | Oct 2017 | 8:43:40 | BiDAF | Stanford DAWN | 1 K80 / 30 GB / 8 CPU (Google Cloud) | TensorFlow v1.2
4 | Oct 2017 | 10:50:22 | BiDAF | Stanford DAWN | 60 GB / 16 CPU (Google Cloud [n1-standard-16]) | TensorFlow v1.2

Training Cost

Objective: Total cost for public cloud instances to train a question answering model to an F1 score of 0.75 or greater on the SQuAD development dataset.

Rank | Date | Cost (USD) | Model | Submitter | Hardware | Framework
1 | Oct 2017 | $5.78 | BiDAF | Stanford DAWN | 60 GB / 16 CPU (Google Cloud [n1-standard-16]) | TensorFlow v1.2
2 | Oct 2017 | $6.87 | BiDAF | Stanford DAWN | 1 K80 / 61 GB / 4 CPU (Amazon EC2 [p2.xlarge]) | TensorFlow v1.2
3 | Oct 2017 | $8.44 | BiDAF | Stanford DAWN | 1 K80 / 30 GB / 8 CPU (Google Cloud) | TensorFlow v1.2

Inference Latency

Objective: Latency required to answer one SQuAD question using a model with an F1 score of at least 0.75 on the development dataset.

Rank | Date | 1-example Latency (ms) | Model | Submitter | Hardware | Framework
1 | Oct 2017 | 100.0000 | BiDAF | Stanford DAWN | 60 GB / 16 CPU (Google Cloud [n1-standard-16]) | TensorFlow v1.2
2 | Oct 2017 | 590.0000 | BiDAF | Stanford DAWN | 1 K80 / 30 GB / 8 CPU (Google Cloud) | TensorFlow v1.2
3 | Oct 2017 | 638.1000 | BiDAF | Stanford DAWN | 1 P100 / 512 GB / 56 CPU (DAWN Internal Cluster) | TensorFlow v1.2
4 | Oct 2017 | 705.9000 | BiDAF | Stanford DAWN | 1 K80 / 61 GB / 4 CPU (Amazon EC2 [p2.xlarge]) | TensorFlow v1.2

Inference Cost

Objective: Average cost on public cloud instances to answer 10,000 questions from the SQuAD development dataset using a question answering model with a dev F1 score of 0.75 or greater.

Rank | Date | Cost (USD) | Model | Submitter | Hardware | Framework
1 | Oct 2017 | $0.15 | BiDAF | Stanford DAWN | 60 GB / 16 CPU (Google Cloud [n1-standard-16]) | TensorFlow v1.2
2 | Oct 2017 | $1.58 | BiDAF | Stanford DAWN | 1 K80 / 30 GB / 8 CPU (Google Cloud) | TensorFlow v1.2
3 | Oct 2017 | $1.76 | BiDAF | Stanford DAWN | 1 K80 / 61 GB / 4 CPU (Amazon EC2 [p2.xlarge]) | TensorFlow v1.2

Join Us

DAWNBench is part of a larger community conversation about the future of machine learning infrastructure. Sound off on the DAWNBench Google Group.