Server hardening of Ubuntu 22.04 by analyzing and removing unnecessary packages

I have read Jay LaCroix's book "Mastering Ubuntu Server", in which he recommends removing all unnecessary packages to reduce the attack surface. Specifically, he advises running apt-cache rdepends <package> to check whether other packages depend on a package you are considering removing.

I wrote a bash script that lists dependent packages for all installed packages, but it takes a very long time (>30 mins on a Raspberry Pi 4, 8GB), and I was wondering if there is a better, faster solution to this.

#!/bin/bash

readarray -t packages < <(dpkg --get-selections | cut -f1)

for package in ${packages[@]};
do
    readarray -t dependents < <(apt-cache rdepends "$package" | sed -n '3,$s/^\s*//p')
    echo "-----------------------------------------------------------------------" | tee -a packages_and_depents.txt
    echo "${package} has these dependents on the system of max ${#dependents[@]}:" | tee -a packages_and_depents.txt
    echo "-----------------------------------------------------------------------" | tee -a packages_and_depents.txt
    for dependent in ${dependents[@]};
    do
        dpkg --get-selections $dependent 2>/dev/null | tee -a packages_and_depents.txt
    done
done
Asked By: Thomas Grusz

||

The dpkg package system has a field for each package indicating its Priority.

You could use this as an initial filter, and only run your script on packages that are categorized as optional and extra (and leaving out required, important and standard).
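A minimal sketch of that priority filter, assuming the dpkg-query fields are printed separated by spaces so that the install state is the fourth field and the priority the fifth. The helper name keep_optional and the sample rows are made up for illustration; on a real system you would pipe dpkg-query -Wf '${Package} ${Status} ${Priority}\n' into it instead.

```shell
#!/bin/bash

# keep_optional (hypothetical helper): reads lines of the form
#   <package> <want> <flag> <state> <priority>
# and prints only installed packages whose priority is optional or extra.
keep_optional() {
    awk '$4 == "installed" && ($5 == "optional" || $5 == "extra") { print $1 }'
}

# Made-up sample rows standing in for real dpkg-query output.
sample='bash install ok installed required
htop install ok installed optional
nano install ok installed standard
oldpkg deinstall ok config-files optional
cowsay install ok installed extra'

printf '%s\n' "$sample" | keep_optional
```

Comparing the priority field exactly, rather than grepping the whole line, avoids accidentally dropping a package whose name happens to contain a word like "standard".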

Also, creating an extra array and running an extra for loop for each package seems unnecessary and costs additional processing time.

So I've removed the second for loop and instead added --installed directly to the apt-cache rdepends command.

The modified script looks like this:

#!/bin/bash

# The command for this line is changed
readarray -t packages < <(dpkg-query -Wf '${Package} ${Status} ${Priority}\n' | sort -b -k5,5 -k1,1 | grep -v 'required\|important\|standard' | grep 'installed' | awk '{ print $1 }')

for package in ${packages[@]};
do
    echo "--------------------------------------------------------" | tee -a packages_and_depents.txt
    echo "${package} has these dependents installed on the system:" | tee -a packages_and_depents.txt
    echo "--------------------------------------------------------" | tee -a packages_and_depents.txt

    # 2nd for loop removed and replaced with `--installed` option
    apt-cache --installed rdepends "$package" | tail -n +3 | tee -a packages_and_depents.txt
done

Another option is to change the entire script so that, instead of showing all reverse dependencies, it only lists the packages that have NO reverse dependencies (the candidates for removal).

Also, I think you could add a further exclusion for all packages whose names start with lib (adding grep -v '^lib').

Finally, the presentation can be improved, so the script gives a visual feedback while it’s running, but the final report is only written to the output file.

Here is my final version of the script:

#!/bin/bash

# The command for this line is changed
readarray -t packages < <(dpkg-query -Wf '${Package} ${Status} ${Priority}\n' | sort -b -k5,5 -k1,1 | grep -v 'required\|important\|standard' | grep -v '^lib' | grep 'installed' | awk '{ print $1 }')

# Write to file
echo "The following packages are not a dependency to any installed package:" > packages_no_depends.txt
# Write to screen
echo "Number of packages: [ ${#packages[@]} ] (priority optional/extra)"
echo ""

i=0
j=0

# Loop that only prints package names with NO reverse dependencies
for package in ${packages[@]};
do

    (( j++ ))
    echo -e "\033[1AProcessed packages: [ $i/$j ]"
    if [[ $(apt-cache --installed rdepends "$package" | tail -n +3 | wc -l) -eq 0 ]]
    then
        # Write to file
        echo "  $package" >> packages_no_depends.txt
        # Write to screen
        echo -e "\033[K  Package $package added to the list of non-dependencies\033[1A"
        (( i++ ))
    fi

done

# Final overview
echo -e "\033[K"
echo "STATUS"
echo "======"
echo "  Total packages scanned : $j"
echo "  Candidates for removal : $i"
echo "  Script execution time  : $SECONDS seconds"

Reference to escape codes for cursor movement.
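As a small sketch of the two escape sequences used above: ESC[1A moves the cursor up one line and ESC[K clears from the cursor to the end of the line. The helper name progress_line is made up for illustration, and printf is used here instead of echo -e, which is my own substitution.

```shell
#!/bin/bash

# progress_line (hypothetical helper): moves the cursor up one line (ESC[1A),
# clears that line (ESC[K), then prints the two counters on it.
progress_line() {
    printf '\033[1A\033[KProcessed packages: [ %d/%d ]\n' "$1" "$2"
}

echo ""                 # reserve the line that will be overwritten
for j in 1 2 3; do
    progress_line 0 "$j"
done
```

Each call rewrites the same terminal line, so the counter appears to update in place instead of scrolling.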

EDIT: This solution should mostly be considered for layout and presentation – Raffa’s solution is much more effective, so do your own combination of the two according to preference.

With input from Raffa, this is my final version of the script:

#!/bin/bash

# Change /path/to
dpkg_file="/path/to/packages_no_depends.txt"

# Write to file
echo "The following packages are not a dependency to any installed package:" > "$dpkg_file"
# Write to screen
echo "Scanning packages ..."

# Pipeline that writes packages without installed reverse dependencies to file
dpkg-query -Wf '${Package} ${Status}${Priority}\n' |
  grep -v 'required\|important\|standard' |
  grep 'installed' |
  awk '{ print $1 }' |
xargs apt-cache rdepends --installed |
  awk '! /Reverse Depends:/ {
    tp = $0
    n++
  }
  /Reverse Depends:/ {
    if (n == 1 && NR != 2) {
        print "  " p
    }
    n = 0
    p = tp
  }
  END {
    if (n == 0) {
        print "  " p
    }
  }' >> "$dpkg_file"

# Final overview
echo -e "\nSTATUS\n======"
echo "  Total packages scanned : $(dpkg-query -Wf '${Package} ${Status}${Priority}\n' | grep -v 'required\|important\|standard' | grep 'installed' | wc -l)"
echo "  Candidates for removal : $(tail -n +2 "$dpkg_file" | wc -l)"
echo "  Script execution time  : $SECONDS seconds"
Answered By: Artur Meinild

Sticking with the tools you already use in your script, this should be as fast as it gets:

dpkg --get-selections |
cut -f1 |
xargs apt-cache rdepends --installed |
awk '! /Reverse Depends:/ {
    tp = $0
    n++
}

/Reverse Depends:/ {
    if (n == 1 && NR != 2) {
        print p
    }
    n = 0
    p = tp
}

END {
    if (n == 0) {
        print p
    }
}
'

It should list the packages in the output of dpkg --get-selections | cut -f1 that have no installed reverse dependencies on the system.
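To see how the awk program decides, here is a small sketch that runs the same logic on made-up text shaped like apt-cache rdepends output. The function name no_rdepends and the package names are invented for illustration. The idea: when a "Reverse Depends:" header is reached and only one other line (the next package name) was seen since the previous header, the previous package had no dependents.

```shell
#!/bin/bash

# no_rdepends (hypothetical wrapper around the awk program above): reads
# apt-cache-rdepends-style text on stdin and prints packages with no dependents.
no_rdepends() {
    awk '
    # Any line that is not a header: remember it and count it.
    ! /Reverse Depends:/ { tp = $0; n++ }
    # A header: if exactly one line was seen since the previous header,
    # that line was the current package name, so the previous package
    # (stored in p) had no dependents. NR != 2 skips the very first header.
    /Reverse Depends:/ {
        if (n == 1 && NR != 2) print p
        n = 0
        p = tp
    }
    # Nothing after the final header: the last package has no dependents.
    END { if (n == 0) print p }'
}

# Made-up input: pkg-a has one dependent, pkg-b and pkg-c have none.
printf '%s\n' \
    'pkg-a' 'Reverse Depends:' '  other-pkg' \
    'pkg-b' 'Reverse Depends:' \
    'pkg-c' 'Reverse Depends:' | no_rdepends
```

With this sample the function prints pkg-b and pkg-c, one per line.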

!!WARNING!!

Never feed the output of the above command to a package removal tool … Inspect the output and handle it manually in all cases.
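One cautious way to inspect a candidate by hand is to let apt simulate the removal, which changes nothing on the system. The sketch below only prints such dry-run commands for review and does not run them; the helper name print_simulations and the package names are made up, and this is my own suggestion rather than part of the command above.

```shell
#!/bin/bash

# print_simulations (hypothetical helper): for each package name on stdin,
# print a dry-run command. Nothing is executed and nothing is removed.
print_simulations() {
    while IFS= read -r pkg; do
        [ -n "$pkg" ] || continue                   # skip blank lines
        printf 'apt-get --simulate remove %s\n' "$pkg"
    done
}

# Made-up candidate names for illustration.
printf 'cowsay\nhtop\n' | print_simulations
```

Running one of the printed commands yourself shows what apt would remove along with the package, so you can decide case by case.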

Answered By: Raffa

This is my final implementation with input from @Raffa, @artur-meinild, and @Dan.

I tested the script on a Raspberry Pi 4 (8GB) and a VM on my iMac (8GB, i5, SSD, 2015), and the runtime was about 1 second on both systems. That is a big improvement over my initial script, which took more than 30 minutes.

Thanks to everyone!

#!/bin/bash

# Define output filename
filename=packages_no_dependents.txt

# Write heading to file
echo "The following packages are not a dependency to any installed package:" > $filename

# Get all installed packages that do not have a 'Priority' of 'required', 'important' or 'standard' and do not have packages that depend on them
dpkg-query -Wf '${Package} ${Status;-26}${Priority}\n' | grep -v 'required\|important\|standard' | grep 'installed' | awk '{ print $1 }' |
xargs apt-cache rdepends --installed |
awk '! /Reverse Depends:/ {
    tp = $0
    n++
}

/Reverse Depends:/ {
    if (n == 1 && NR != 2) {
        print p
    }
    n = 0
    p = tp
}

END {
    if (n == 0) {
        print p
    }
}
' | tee -a $filename

# Remove leading white-space in file
sed -i '3,$s/^\s*//' $filename
Answered By: Thomas Grusz

I generally go the other way: In aptitude I mark everything as "automatically installed", which will mark all but essential packages for deletion. Then I go through the list of packages to be uninstalled, and manually add those that I know I will directly need.

If something in the list looks useful, I can press r to get the list of reverse dependencies, and if something in that list looks useful, that gets marked manually installed and the package I was looking at earlier remains installed as a dependency.
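For those who prefer the command line over the aptitude interface, a rough analogue (my assumption, not the workflow described above) is to compare the output of apt-mark showmanual against a list of packages you know you need. The helper name auto_candidates, the keep list, and the package names below are made up for illustration.

```shell
#!/bin/bash

# auto_candidates (hypothetical helper): reads package names on stdin (for
# example from `apt-mark showmanual`) and prints those NOT listed in the
# keep file given as $1 (one exact package name per line).
auto_candidates() {
    grep -vxFf "$1"
}

keep_file=$(mktemp)
printf 'openssh-server\nvim\n' > "$keep_file"     # made-up keep list

# Made-up stand-in for the output of `apt-mark showmanual`.
printf 'cowsay\nopenssh-server\nvim\nhtop\n' | auto_candidates "$keep_file"
rm -f "$keep_file"
```

The names it prints are the ones you would review before marking them with apt-mark auto and checking the result with apt-get --simulate autoremove.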

Answered By: Simon Richter