Why is dd so slow with a bs of 100M?

I just tried to overwrite a fast SSD using dd. From the Ubuntu boot image I typed in:

dd if=/dev/zero of=/dev/sda bs=100M
error writing '/dev/sda': No space left on device
blah blah
256 GB copied, 1195.81 s, 214 MB/s

Isn’t that quite slow? And where is the bottleneck? What about the choice of block size?

Asked By: Nils


Optimal blocksizes for dd are around 64k-256k; humans usually prefer 1M.


A benchmark without real I/O:

$ for bs in 512 4k 16k 64k 128k 256k 512k 1M 4M 16M 64M 128M 256M 512M
> do
>     echo ---- $bs: ----
>     dd bs=$bs if=/dev/zero of=/dev/null iflag=count_bytes count=10000M
> done
---- 512: ----
20480000+0 records in
20480000+0 records out
10485760000 bytes (10 GB) copied, 4.2422 s, 2.5 GB/s
---- 4k: ----
2560000+0 records in
2560000+0 records out
10485760000 bytes (10 GB) copied, 0.843686 s, 12.4 GB/s
---- 16k: ----
640000+0 records in
640000+0 records out
10485760000 bytes (10 GB) copied, 0.533373 s, 19.7 GB/s
---- 64k: ----
160000+0 records in
160000+0 records out
10485760000 bytes (10 GB) copied, 0.480879 s, 21.8 GB/s
---- 128k: ----
80000+0 records in
80000+0 records out
10485760000 bytes (10 GB) copied, 0.464556 s, 22.6 GB/s
---- 256k: ----
40000+0 records in
40000+0 records out
10485760000 bytes (10 GB) copied, 0.48516 s, 21.6 GB/s
---- 512k: ----
20000+0 records in
20000+0 records out
10485760000 bytes (10 GB) copied, 0.495087 s, 21.2 GB/s
---- 1M: ----
10000+0 records in
10000+0 records out
10485760000 bytes (10 GB) copied, 0.494201 s, 21.2 GB/s
---- 4M: ----
2500+0 records in
2500+0 records out
10485760000 bytes (10 GB) copied, 0.496309 s, 21.1 GB/s
---- 16M: ----
625+0 records in
625+0 records out
10485760000 bytes (10 GB) copied, 0.972703 s, 10.8 GB/s
---- 64M: ----
156+1 records in
156+1 records out
10485760000 bytes (10 GB) copied, 1.0409 s, 10.1 GB/s
---- 128M: ----
78+1 records in
78+1 records out
10485760000 bytes (10 GB) copied, 1.04533 s, 10.0 GB/s
---- 256M: ----
39+1 records in
39+1 records out
10485760000 bytes (10 GB) copied, 1.04685 s, 10.0 GB/s
---- 512M: ----
19+1 records in
19+1 records out
10485760000 bytes (10 GB) copied, 1.0436 s, 10.0 GB/s
  • The default 512 bytes is slow as hell (two syscalls per 512 bytes is just too much for the CPU)
  • 4k is considerably better than 512
  • 16k is considerably better than 4k
  • 64k-256k is about as good as it gets
  • 512k-4M is slightly slower
  • 16M-512M cuts the speed in half, worse than 4k

My guess is that starting with a certain size, you start losing speed due to lack of concurrency. dd is a single process; concurrency is largely provided by the kernel (readahead, cached writes, …). If it has to read 100M before it can write 100M, there will be moments when one device sits idle, waiting for the other to finish reading or writing. With too small a block size you suffer from sheer syscall overhead instead, but that goes away completely at 64k or so.
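You can see that syscall overhead directly with strace, which counts the read()/write() calls dd makes. A minimal sketch (illustrative block counts; exact syscall totals depend on your dd build):

$ # ~1 GB in 512-byte blocks: roughly 2 million reads plus 2 million writes
$ strace -c -e trace=read,write dd if=/dev/zero of=/dev/null bs=512 count=2000000
$ # the same ~1 GB in 64k blocks: only about 16000 reads and 16000 writes
$ strace -c -e trace=read,write dd if=/dev/zero of=/dev/null bs=64k count=16000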

100M or larger block sizes might help when copying from and to the same device. At least for hard drives, that should reduce the time wasted on seeking, since the drive can't be in two places at once.
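For instance, cloning one partition to another on the same spinning disk is a case where a huge block size can pay off (hypothetical device names, adjust to your setup):

$ # both partitions sit on the same HDD, so big blocks mean long sequential
$ # runs between the unavoidable seeks; status=progress needs coreutils 8.24+
$ dd if=/dev/sdb1 of=/dev/sdb2 bs=100M status=progress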


Why are you overwriting your SSD like this in the first place? Normally, you try to avoid unnecessary writes on SSDs; if it considers all of its space used, it will likely also lose some of its performance until you TRIM it free again.

You could use this command instead to TRIM/discard your entire SSD:

blkdiscard /dev/sda

If your SSD has deterministic read zeroes after TRIM (a property you can check with hdparm -I), it will look like it's full of zeroes, but the SSD actually considers all of its blocks free, which should give you the best possible performance.
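A quick way to check is to grep the hdparm identify output for the TRIM feature lines. On a drive that supports both features you should see something like this (sample output; the block limit varies per drive):

$ hdparm -I /dev/sda | grep -i trim
           *    Data Set Management TRIM supported (limit 8 blocks)
           *    Deterministic read ZEROs after TRIM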

The downside of TRIM is that you lose all chances at data recovery if the deleted file has already been discarded…

Answered By: frostschutz