Recover files from an SSD with a hardware failure
I have an SSD that has had a hardware failure of some kind. I can RMA it as it is still under warranty, but their service does not include data recovery. I have collected my work files from it, but my personal files and plugins are still there, and I would like to recover them.
The catch…
Bandwidth and total copy size seem to be an issue. Read too much and the drive crashes:
it reports bad sectors everywhere (falsely) or shuts down the whole OS.
So I have since put the drive in an external enclosure so I can hot-swap it, as I suspect the memory buffer is the issue.
I thought of using rsync with a bandwidth limit, as per another question here, but I believe I would need to stagger the copy process to let the buffer clear itself or the drive cool down.
I am in need of a script or tool to recover my lost data.
Reports bad sectors everywhere (false)
"Sectors" are not a thing that exists on SSDs; "blocks" do. If your drive reports them as bad, that means the controller has (because there’s nothing that can fail mechanically):
- Asserted the address lines to select these bits of the block
- Read out the memory cells, which means it has obtained a vector of voltage readouts
- Tried to convert these to bits by applying a rather complicated soft-input error-correcting code to them
- Returned (success and data) or (error):
- Success when iteratively trying to decode them yielded an error term ("syndrome" in decoder speak) that was zero at some point
- Error when the soft values can never be massaged and error-corrected into a word that has no error
So, you get an error. That means the block cannot be read out. There’s no "the SSD is wrong about the data being unrecoverable": it always reads some voltages, no matter how broken everything is, checks whether they pass the check, and corrects them, if necessary and possible.
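To make the "zero syndrome" idea concrete, here is a toy hard-decision example using the classic (7,4) Hamming code. Real SSD controllers use far stronger soft-input codes (LDPC and the like) and iterate on voltage readouts, but the success criterion is the same: decoding succeeds exactly when the syndrome becomes zero. All concrete numbers here are purely illustrative, not anything your drive actually does.

```python
# Toy hard-decision syndrome decoder for the (7,4) Hamming code.
# Parity-check matrix H: column j (1-based) is the binary representation
# of j, so a single-bit error at position j yields j as its syndrome.
H = [
    [1, 0, 1, 0, 1, 0, 1],  # least significant syndrome bit
    [0, 1, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],  # most significant syndrome bit
]

def syndrome(word):
    """Return the syndrome of a 7-bit word as an integer (0 = valid codeword)."""
    s = 0
    for k, row in enumerate(H):
        bit = sum(r * w for r, w in zip(row, word)) % 2
        s |= bit << k
    return s

def correct(word):
    """Flip the single bit position the syndrome points at, if any."""
    s = syndrome(word)
    fixed = list(word)
    if s:
        fixed[s - 1] ^= 1
    return fixed

valid = [1, 0, 1, 1, 0, 1, 0]          # a valid codeword: syndrome is 0
flipped = list(valid)
flipped[4] ^= 1                        # corrupt bit 5: syndrome points at it
```

A nonzero syndrome that points at a correctable position is the "success" branch above; when no amount of correction produces a zero syndrome, the controller has no choice but to report the block as bad.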
Therefore:
Reports bad sectors everywhere
You’ll have to trust your SSD on that – it literally cannot read the data. The only thing that could "globally" make reading fail although the memory cells (which, by the way, are small capacitors charged to some voltage) are intact is a "shift" in the reference voltage of the ADC that converts the analog voltages to digital values. Then your decoder gets incorrect soft input, even if the actual memory was OK.
But that voltage is generated on the very same die as the ADC (so, within your flash memory chip) and should be quite resilient to changes in, e.g., supply voltage.
So maybe it’s a thermal thing, really, or some silicon failure mode.
Either way, it sounds to me like you do not want to use rsync, or any file-system-level tool, to get data off the drive. That requires the operating system to access the same data spots very frequently, just to understand which data is in which file.
What you’d need to do is make a block-device-level copy, and do it in small steps: read (for example) 16 MB into an image file, wait a while, read the next 16 MB, and so on.
This could be done in a zsh/bash shell script with a loop that reads these blocks sequentially with dd, then calls sleep to wait, then reads the next block, and so on – or in a few lines of Python. Don’t forget to check for read errors along the way, and abort the procedure when they happen, so you can restart it at the same point later.
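As a sketch of what those "few lines of Python" could look like (untested, and the chunk size, pause, and all paths are placeholders to adapt): it reads the source in fixed-size chunks, sleeps between reads, and appends the offsets of failed chunks to a log file so a later run can retry just those.

```python
import os
import time

def staggered_copy(src, dst, logfile, chunk=16 * 2**20, pause=0.5):
    """Copy src (a block device or a file) to the image dst in fixed-size
    chunks, sleeping between reads. Offsets of failed reads are appended
    to logfile for a later retry pass. Returns (size, failed_offsets)."""
    # Determine the size by seeking to the end -- this also works for
    # block devices, where os.path.getsize() reports 0.
    with open(src, "rb") as f:
        size = f.seek(0, os.SEEK_END)
    bad = []
    mode = "r+b" if os.path.exists(dst) else "w+b"
    with open(src, "rb", buffering=0) as fin, \
         open(dst, mode) as fout, \
         open(logfile, "a") as log:
        offset = 0
        while offset < size:
            want = min(chunk, size - offset)
            try:
                fin.seek(offset)
                data = fin.read(want)
                fout.seek(offset)
                fout.write(data)
            except OSError:
                bad.append(offset)
                log.write(f"{offset}\n")
            offset += chunk
            time.sleep(pause)  # let the drive cool down / clear its buffer
    return size, bad
```

The unbuffered open (`buffering=0`) matters here: you want each read to hit the device once, in exactly the chunk size you chose, not whatever Python’s buffering layer decides.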
Because an actual prepared solution was asked for:
#!/usr/bin/zsh
# Copyright 2022 Marcus Müller
# SPDX-License-Identifier: BSD-3-Clause
# Find the license text under https://spdx.org/licenses/BSD-3-Clause.html
# THIS SCRIPT IS UNTESTED AND COMES WITH NO WARRANTIES, FOLKS.
IN_DEVICE=/dev/yoursource_ssd
BACKUP_IMG=myimage
LOGFILE=broken_mbs.txt
# get size, round up to full MB
size_in_bytes=$(blockdev --getsize64 "${IN_DEVICE}")
size_in_MB=$(( ( ${size_in_bytes} + 2**20 - 1) / 2**20 ))
#check whether size > 0
if [[ ! ${size_in_MB} -gt 0 ]]; then
logger -p user.crit "Nope, can't determine size of ${IN_DEVICE}. I'm outta here."
echo "Failure on input" >&2
exit 1
else
logger -p user.info "Trying to back up ${IN_DEVICE}, size ${size_in_MB} MB"
fi
if fallocate -l "${size_in_MB}MiB" "${BACKUP_IMG}" ; then
logger -p user.info "preallocated ${BACKUP_IMG}"
else
logger -p user.crit "failed to preallocate ${BACKUP_IMG}"
echo "failure on output" >&2
exit 2
fi
failcounter=0
MB=$((2**20))
for i in {0..$((${size_in_MB}-1))}; do
if dd "if=${IN_DEVICE}" "of=${BACKUP_IMG}" \
   "bs=${MB}" count=1 \
   "skip=${i}" "seek=${i}" \
   conv=notrunc status=none
then
echo "backed up MB nr. ${i}"
else
failcounter=$(( ${failcounter} + 1 ))
echo "${failcounter}. error: couldn't backup MB nr. $i" >&2
echo "${i}" >> "${LOGFILE}"
logger -p user.err "couldn't backup MB nr. $i"
fi
sleep 0.5
done
echo "Got ${failcounter} failures"
exit ${failcounter}
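The MB indices collected in broken_mbs.txt lend themselves to a later retry pass, perhaps after letting the drive rest or cool. Here is a hedged Python sketch of such a pass (paths, block size, and pause are placeholders): it re-reads only the listed 1-MiB blocks from the source and patches them into the existing image in place.

```python
import time

def retry_bad_blocks(src, img, logfile, mb=2**20, pause=2.0):
    """Re-read the block indices listed in logfile (one per line) from src
    and patch them into the existing image img in place.
    Returns the indices that still could not be read."""
    with open(logfile) as f:
        indices = sorted({int(line) for line in f if line.strip()})
    still_bad = []
    with open(src, "rb", buffering=0) as fin, open(img, "r+b") as fout:
        for i in indices:
            try:
                fin.seek(i * mb)
                data = fin.read(mb)
                fout.seek(i * mb)
                fout.write(data)
            except OSError:
                still_bad.append(i)
            time.sleep(pause)  # give the drive a rest between attempts
    return still_bad
```

Rewriting the log file with the returned indices lets you repeat this pass until the list stops shrinking; whatever remains at that point is what the drive genuinely cannot deliver anymore.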