Server hardening of Ubuntu 22.04 by analyzing and removing unnecessary packages
I have read Jay Lacroix’s book "Mastering Ubuntu Server", and he recommends removing all unnecessary packages to reduce the attack surface. Specifically, he advises running apt-cache rdepends <package> to find out whether other packages depend on a package we are considering removing.
I wrote a bash script that lists dependent packages for all installed packages, but it takes a very long time (>30 mins on a Raspberry Pi 4, 8GB), and I was wondering if there is a better, faster solution to this.
#!/bin/bash
# Collect the names of all packages known to dpkg
readarray -t packages < <(dpkg --get-selections | cut -f1)
for package in "${packages[@]}"
do
    # Skip the two header lines and strip leading whitespace from each dependent
    readarray -t dependents < <(apt-cache rdepends "$package" | sed -n '3,$s/^\s*//p')
    echo "-----------------------------------------------------------------------" | tee -a packages_and_depents.txt
    echo "${package} has these dependents on the system of max ${#dependents[@]}:" | tee -a packages_and_depents.txt
    echo "-----------------------------------------------------------------------" | tee -a packages_and_depents.txt
    for dependent in "${dependents[@]}"
    do
        # Print the dependent only if dpkg has a selection entry for it
        dpkg --get-selections "$dependent" 2>/dev/null | tee -a packages_and_depents.txt
    done
done
The dpkg package system has a field for each package indicating its Priority. You could use this as an initial filter and only run your script on packages that are categorized as optional or extra (leaving out required, important and standard).
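To get a feel for how much this filter would cut down the list, here is a minimal sketch that counts packages per Priority value. The function name count_priorities is made up for illustration; it only assumes one Priority value per line on stdin, which dpkg-query can produce with the ${Priority} field.

```bash
#!/bin/bash
# Hypothetical helper (not part of dpkg): reads one Priority value per line
# on stdin and prints "priority: count", with "(none)" for an empty value.
count_priorities() {
    sort | uniq -c | awk '{ print ($2 == "" ? "(none)" : $2) ": " $1 }'
}

# Possible usage on a Debian/Ubuntu system. Note that this counts every
# package dpkg knows about, including removed ones with leftover config files:
#   dpkg-query -Wf '${Priority}\n' | count_priorities
```

If the optional and extra counts are much larger than the rest, the filter will not save much time by itself, but it does keep core packages out of the report.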
Also, creating an extra array and running an extra for loop for each package seems unnecessary and costs additional time. So I’ve removed the second for loop and instead added --installed directly to the apt-cache rdepends command.
The modified script looks like this:
#!/bin/bash
# The command for this line is changed: one line per package in the form
# "name want flag status priority", then keep installed packages whose
# priority is not required, important or standard
readarray -t packages < <(dpkg-query -Wf '${Package} ${Status} ${Priority}\n' | sort -b -k5,5 -k1,1 | grep -Ev ' (required|important|standard)$' | grep ' installed ' | awk '{ print $1 }')
for package in "${packages[@]}"
do
    echo "--------------------------------------------------------" | tee -a packages_and_depents.txt
    echo "${package} has these dependents installed on the system:" | tee -a packages_and_depents.txt
    echo "--------------------------------------------------------" | tee -a packages_and_depents.txt
    # 2nd for loop removed and replaced with `--installed` option
    apt-cache --installed rdepends "$package" | tail -n +3 | tee -a packages_and_depents.txt
done
Another option is to change the entire script so that, instead of showing all reverse dependencies, it shows only the names of packages that have NO reverse dependencies (those that are candidates for removal).
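As a rough illustration of that idea, and a different technique from calling apt-cache once per package, the sketch below works from a single dpkg-query listing. The function name list_unneeded is hypothetical. It assumes input lines of the form "package<TAB>dependency list" and only looks at the names in that list, so it ignores Recommends, Suggests and virtual packages (Provides). Treat its output as candidates to double-check with apt-cache rdepends, not as a safe removal list.

```bash
#!/bin/bash
# Hypothetical helper: reads "package<TAB>dependency list" lines on stdin and
# prints, sorted, the packages that no other line names as a dependency.
list_unneeded() {
    awk -F'\t' '
    {
        installed[$1] = 1
        # Dependencies are separated by "," and alternatives by "|"
        n = split($2, deps, /[,|]/)
        for (i = 1; i <= n; i++) {
            dep = deps[i]
            sub(/\(.*/, "", dep)     # drop a version constraint such as (>= 1.0)
            sub(/:.*/, "", dep)      # drop an architecture qualifier such as :any
            gsub(/[ \t]/, "", dep)   # drop surrounding whitespace
            if (dep != "" && dep != $1) needed[dep] = 1
        }
    }
    END {
        for (pkg in installed)
            if (!(pkg in needed)) print pkg
    }' | sort
}

# Possible usage (assumption: Depends and Pre-Depends are enough for a first pass):
#   dpkg-query -Wf '${Status}\t${Package}\t${Depends}, ${Pre-Depends}\n' \
#     | awk -F'\t' '$1 ~ /ok installed$/ { print $2 "\t" $3 }' | list_unneeded
```

Because this reads the dependency fields in one pass instead of starting apt-cache hundreds of times, it should be much quicker on a small machine, at the cost of being less complete than apt-cache rdepends.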
Also, I think you could add another grep exclusion for all packages whose names start with lib (adding grep -v '^lib').
Finally, the presentation can be improved so that the script gives visual feedback while it’s running, but the final report is written only to the output file.
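One way to separate the two outputs, shown here as a sketch and not as the exact code of the script, is to print a progress counter to the terminal via stderr and append only the results to the report file. The names report_with_progress and no_installed_rdepends are invented for this example.

```bash
#!/bin/bash
# Hypothetical helper: runs a check command for every name passed as an
# argument, shows a running counter on stderr and writes the names for which
# the check succeeds to the report file.
report_with_progress() {
    local report=$1 check=$2
    shift 2
    local total=$# checked=0 name
    : > "$report"                      # start with an empty report
    for name in "$@"; do
        checked=$((checked + 1))
        # \r returns to the start of the line so the counter updates in place
        printf '\rChecked %d of %d' "$checked" "$total" >&2
        if "$check" "$name"; then
            printf '%s\n' "$name" >> "$report"
        fi
    done
    printf '\n' >&2
}

# Possible usage with a check built on apt-cache (assumed to be available):
#   no_installed_rdepends() { [ -z "$(apt-cache --installed rdepends "$1" | tail -n +3)" ]; }
#   report_with_progress packages_no_depends.txt no_installed_rdepends "${packages[@]}"
```

Sending the counter to stderr means the screen output cannot end up in the report, even if stdout is redirected later.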
Here is my final version of the script:
#!/bin/bash
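# The command for this line is changed: keep installed packages whose priority
# is not required, important or standard and whose name does not start with lib
readarray -t packages < <(dpkg-query -Wf '${Package} ${Status} ${Priority}\n' | sort -b -k5,5 -k1,1 | grep -Ev ' (required|important|standard)$' | grep -v '^lib' | grep ' installed ' | awk '{ print $1 }')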
# Write to file
echo "The following packages are not a dependency to any installed package:" > packages_no_depends.txt
# Write to screen
echo "Number of packages: [ ${#packages[@]} ] (priority optional/extra)"
echo ""
i=0
j=0
# Loop that only prints package names with NO reverse dependencies
for package in "${packages[@]}";
do
(( j++ ))
echo -e "