Why do these duplicated SD cards have different sha1sums for their content?

I have a bunch of Class 10 UHS-1 SDHC SD cards from different manufacturers. They are all partitioned as follows:

 $ sudo fdisk -l /dev/sdj
Disk /dev/sdj: 14.9 GiB, 15931539456 bytes, 31116288 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x0000de21

Device     Boot   Start      End  Sectors  Size Id Type
/dev/sdj1          2048  1050623  1048576  512M  c W95 FAT32 (LBA)
/dev/sdj2       1050624  2099199  1048576  512M 83 Linux
/dev/sdj3       2099200  3147775  1048576  512M 83 Linux
/dev/sdj4       3147776 31116287 27968512 13.3G 83 Linux

I used a memory card duplicator to copy the images. All cards have the same content.

When I mount the second partition of any two SD cards and compare the content, they are exactly the same.

 $ sudo mount -o ro /dev/sdg2 /mnt/system-a/
 $ sudo mount -o ro /dev/sdj2 /mnt/system-b/
 $ diff -r --no-dereference /mnt/system-a /mnt/system-b/
 $ # prints nothing

However, if I compare the sha1sums of the partitions, they sometimes differ:

 $ sudo dd if=/dev/sdg2 | sha1sum
1048576+0 records in
1048576+0 records out
536870912 bytes (537 MB) copied, 12.3448 s, 43.5 MB/s
ee7a16a8d7262ccc6a2e6974e8026f78df445e72  -

 $ sudo dd if=/dev/sdj2 | sha1sum
1048576+0 records in
1048576+0 records out
536870912 bytes (537 MB) copied, 12.6412 s, 42.5 MB/s
4bb6e3e5f3e47dc6cedc6cf8ed327ca2ca7cd7c4  -

Stranger still, when I compare the two partitions with a binary diffing tool such as radiff2, I see the following:

 $ sudo dd if=/dev/sdg2 of=sdg2.img
1048576+0 records in
1048576+0 records out
536870912 bytes (537 MB) copied, 12.2378 s, 43.9 MB/s

 $ sudo dd if=/dev/sdj2 of=sdj2.img
1048576+0 records in
1048576+0 records out
536870912 bytes (537 MB) copied, 12.2315 s, 43.9 MB/s

 $ radiff2 -c sdg2.img sdj2.img
767368

767368 changes, even though diff didn’t see any differences in the content!
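To see *where* two images differ, rather than only how many changes radiff2 counts, a small script can list the differing 512-byte sectors. This is a minimal Python sketch that is not part of the original post; `differing_sectors` is a hypothetical helper, and it assumes both inputs are image files (or block devices you can read). If the differing sectors fall in blocks the filesystem considers free, the difference is in unallocated space rather than in file content; for ext2/3/4, `debugfs -R "icheck BLOCK"` can report which inode, if any, owns a given filesystem block.

```python
def differing_sectors(path_a, path_b, sector_size=512):
    """Return the indices of sector_size-byte chunks that differ between two files.

    A chunk present in only one file (because the lengths differ) also
    counts as differing.
    """
    differing = []
    index = 0
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(sector_size)
            b = fb.read(sector_size)
            if not a and not b:
                break  # both inputs exhausted
            if a != b:
                differing.append(index)
            index += 1
    return differing
```

For example, `differing_sectors("sdg2.img", "sdj2.img")[:20]` lists the first twenty differing sectors; multiply an index by 512 to get its byte offset within the partition.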

As a sanity check, comparing two partitions that had the same sha1sum gives the following:

 $ radiff2 -c sdj2.img sdf2.img
0

0 changes!

Here is a breakdown of the different sha1sums I see from different cards. It seems like the manufacturer of the card has a large effect on which sha1sum I get when I use dd to read the drive.

[Image: sha1sums observed for the cards of each manufacturer]
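A breakdown like this can be regenerated with a short script that hashes each partition (or saved image) and groups identical digests. This is a minimal Python sketch, assuming you pass it paths such as /dev/sdg2 or sdg2.img and have permission to read them (for example by running it with sudo); `sha1_of` and `group_by_sha1` are hypothetical names. Reading the device directly yields the same bytes, and therefore the same digest, as `dd if=/dev/sdg2 | sha1sum`.

```python
import hashlib
from collections import defaultdict


def sha1_of(path, chunk_size=1 << 20):
    """Stream a file or block device through SHA-1, like `sha1sum path`."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def group_by_sha1(paths):
    """Map each SHA-1 digest to the list of paths that produced it."""
    groups = defaultdict(list)
    for path in paths:
        groups[sha1_of(path)].append(path)
    return dict(groups)
```

For example, `group_by_sha1(["/dev/sdf2", "/dev/sdg2", "/dev/sdj2"])` would put sdf2 and sdj2 under one digest and sdg2 under another, matching the radiff2 results above.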

Despite the differences in sha1sums, all these cards work for my purposes. However, it makes integrity checking difficult because I cannot compare sha1sums.
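One way around this is to checksum the mounted content instead of the raw partition, so that two cards with identical files give identical digests regardless of how the duplicator laid out the blocks. This is a minimal Python sketch, assuming the partitions are mounted read-only as in the diff above and that it runs with enough privileges to read every file (unreadable directories are silently skipped by os.walk); `tree_sha1` is a hypothetical helper. Like `diff -r --no-dereference`, it looks at names, file contents and symlink targets, and it deliberately ignores ownership, permissions and timestamps.

```python
import hashlib
import os


def tree_sha1(root, chunk_size=1 << 20):
    """Hash the names, file contents and symlink targets under root.

    Entries are visited in sorted order so the digest does not depend on
    on-disk directory order. Metadata such as mtime, owner and mode is ignored.
    """
    h = hashlib.sha1()
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # in-place sort makes os.walk descend deterministically
        for name in sorted(dirnames + filenames):
            full = os.path.join(dirpath, name)
            rel = os.fsencode(os.path.relpath(full, root))
            if os.path.islink(full):
                # Do not follow symlinks; hash the link target instead.
                h.update(b"L\0" + rel + b"\0" + os.fsencode(os.readlink(full)) + b"\0")
            elif os.path.isdir(full):
                h.update(b"D\0" + rel + b"\0")
            elif os.path.isfile(full):
                size = str(os.path.getsize(full)).encode()
                h.update(b"F\0" + rel + b"\0" + size + b"\0")
                with open(full, "rb") as f:
                    for chunk in iter(lambda: f.read(chunk_size), b""):
                        h.update(chunk)
            else:
                # FIFOs, sockets, device nodes: record only that they exist.
                h.update(b"O\0" + rel + b"\0")
    return h.hexdigest()
```

With both cards mounted, `tree_sha1("/mnt/system-a") == tree_sha1("/mnt/system-b")` should hold whenever the recursive diff prints nothing, even if the raw partition hashes differ.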

How is it possible two SD card partitions could have different sha1sums, yet have the exact same content when mounted?


Answer: So now it works as expected. To clear things up, the inconsistency was caused by the SySTOR duplicator I was using. The copy mode I had selected copied the partition information and the files, but it did not necessarily copy the raw bits (as dd would) to ensure a one-to-one match.

Asked By: peskal

||

Did you compare their contents immediately after writing the duplicated contents? If yes, they should come out exactly the same. For example,

# Duplicate
dd bs=16M if=/dev/sdg of=/dev/sdk

# Comparing should produce no output
cmp /dev/sdg /dev/sdk
# Compare, listing each byte difference; also no output
cmp -l /dev/sdg /dev/sdk

This is only true if the cards have exactly the same size. Sometimes, even different batches of cards that are the same manufacturer and model come out with slightly different sizes. Use blockdev --getsize64 to get the exact size of the device.
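If blockdev is not at hand, the size can also be read by seeking to the end of the device; on Linux this works for block devices as well as for ordinary image files. This is a minimal Python sketch; `byte_size` is a hypothetical helper, and for a block device its result should match what `blockdev --getsize64` reports.

```python
import os


def byte_size(path):
    """Return the size in bytes of a file or block device.

    Seeking to the end returns the new absolute offset, which for a block
    device on Linux is its capacity.
    """
    with open(path, "rb") as f:
        return f.seek(0, os.SEEK_END)
```

For example, `byte_size("/dev/sdg") == byte_size("/dev/sdk")` needs to be True before you can expect cmp to report no differences between the whole cards.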

Also, if both cards have exactly identical sizes but you wrote an image to both cards that was smaller than the capacity of the cards, then the garbage that comes after the end of the image may cause differences to be reported.
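In that case, compare only the bytes that the image actually covers. This is a minimal Python sketch, assuming you still have the image file that was written to the cards; `sha1_prefix` is a hypothetical helper. With GNU coreutils, `head -c "$(stat -c %s image.img)" /dev/sdk | sha1sum` does the same job on the command line.

```python
import hashlib


def sha1_prefix(path, length, chunk_size=1 << 20):
    """SHA-1 of the first `length` bytes of a file or device.

    Raises ValueError if the source is shorter than `length`.
    """
    h = hashlib.sha1()
    remaining = length
    with open(path, "rb") as f:
        while remaining > 0:
            chunk = f.read(min(chunk_size, remaining))
            if not chunk:
                raise ValueError(f"{path} is shorter than {length} bytes")
            h.update(chunk)
            remaining -= len(chunk)
    return h.hexdigest()
```

For example, with `n = os.path.getsize("image.img")`, the check `sha1_prefix("/dev/sdk", n) == sha1_prefix("image.img", n)` ignores whatever the card held beyond the end of the image.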

Once you mount any filesystem on the device, you will start to see differences. The filesystem implementation will write various things to the filesystem, such as an empty journal, or a flag/timestamp to mark the filesystem as clean, and then you won’t see identical content anymore. I believe this can be the case under some circumstances even if you mount the filesystem read-only.

Answered By: Celada

To build upon Celada’s answer: 
On the one hand, you’re doing a diff (recursive)
between two mounted filesystems. 
On the other hand, you’re doing a binary compare
between devices that have filesystems on them
apparently after you have mounted the filesystems.
That’s apples and pomegranates.

The operation at the mounted filesystem level can see only the data content of the files in the filesystems. 
The binary compare between the devices looks at the data, the metadata, and even the contents of unallocated blocks.
I’m a little surprised by the 767368 differences, but I can guess at a few:

  • When you mount a filesystem,
    the kernel writes the current time into the filesystem superblock
    as the “mount time”. 
    If you have mounted both devices (and not at the exact same time),
    the “mount times” in the superblocks will be different.
  • If you do the device-level binary compare
    after the recursive filesystem diff,
    every file on each device will have had its access time
    (in the inode) updated.

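Both kinds of metadata are easy to inspect. For an ext2/ext3/ext4 filesystem (an assumption: the question only says the partition type is 83 Linux), the superblock starts 1024 bytes into the partition and stores the last mount time, last write time and mount count at fixed offsets; `dumpe2fs -h` prints the same fields. This is a minimal Python sketch that reads them from a partition or image; `ext_mount_info` is a hypothetical helper, and it ignores the extra high-order time bits that newer ext4 versions keep elsewhere in the superblock.

```python
import datetime
import struct

SUPERBLOCK_OFFSET = 1024  # ext2/3/4 superblock starts 1 KiB into the partition
EXT_MAGIC = 0xEF53


def ext_mount_info(path):
    """Read last mount time, last write time and mount count from an ext superblock."""
    with open(path, "rb") as f:
        f.seek(SUPERBLOCK_OFFSET)
        sb = f.read(64)
    if len(sb) < 58:
        raise ValueError(f"{path} is too small to hold an ext superblock")
    (magic,) = struct.unpack_from("<H", sb, 56)  # s_magic
    if magic != EXT_MAGIC:
        raise ValueError(f"{path} does not look like ext2/3/4 (magic {magic:#06x})")
    mtime, wtime = struct.unpack_from("<II", sb, 44)  # s_mtime, s_wtime
    (mnt_count,) = struct.unpack_from("<H", sb, 52)  # s_mnt_count
    utc = datetime.timezone.utc
    return {
        "last_mount": datetime.datetime.fromtimestamp(mtime, tz=utc),
        "last_write": datetime.datetime.fromtimestamp(wtime, tz=utc),
        "mount_count": mnt_count,
    }
```

Running it on sdg2.img and sdj2.img would show whether the two superblocks carry different mount timestamps or counts.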
P.S. Do you need to use dd so much? 
What happens if you do radiff2 -c /dev/sdg2 /dev/sdj2
or sha1sum /dev/sdg2?
