Using fdupes to clean up my file server
The overall problem:
Like many of us, I am guilty of copying files haphazardly, promising myself that I'll organize them later. This has built up to a significant problem over the years, particularly with old smartphone backups. I had a bad habit of dumping photo folder backups onto my server, with each dump containing even more dumps of old photos, resulting in multiple levels of duplication. Using the command-line tool called fdupes, I've only just managed to get some of it under control.
fdupes is a command-line application that finds duplicate files within a directory or a set of directories. According to its man page, it compares file sizes and MD5 signatures and then confirms each match with a byte-by-byte comparison, so files are matched on content rather than on name. That makes it a solid basis for cleaning up and reclaiming storage space.
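To make the idea of content-based comparison concrete, here is a minimal sketch that groups files by SHA-256 checksum using GNU findutils and coreutils. It is not how fdupes is implemented and is far slower on large trees; the function name list_dupes_by_hash is made up for illustration, and the sketch assumes file names without newlines or backslashes.

```shell
# Illustrative only: print groups of files under a directory that share a
# SHA-256 checksum, one path per line, with a blank line between groups.
# Assumes GNU find, xargs and sha256sum are available.
list_dupes_by_hash() {
    find "$1" -type f -print0 \
        | xargs -0 -r sha256sum \
        | sort \
        | awk '{
            hash = $1
            # sha256sum prints 64 hex characters, two separator characters,
            # then the path, so the path starts at column 67
            path = substr($0, 67)
            if (hash == prev) {
                if (!printed) { print prev_path; printed = 1 }
                print path
            } else {
                if (printed) print ""
                printed = 0
            }
            prev = hash
            prev_path = path
        }'
}
```

Calling list_dupes_by_hash on a photo directory prints only the files whose content appears more than once; files with unique content are left out.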
To streamline the review process and make sure I know what's about to happen before deleting any files, I created a simple bash wrapper script. This script acts as a nice safety belt, preventing accidental fat finger deletions.
Avoiding the rm -rf Pitfall:
As many of us have learned the hard way, the rm -rf command can have disastrous consequences if misused (goodbye email server with 2000 emails). A simple typo or a wrong path can result in irreversible data loss. To mitigate this risk, the bash wrapper script avoids rm -rf altogether. Instead, it moves duplicate files to a temporary trash directory for review, leaving the final deletion as a separate manual step.
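#!/bin/bash

TRASH="/tmp/trash"
file="dupes.txt"   # default list file, used when -f is not given

# Write the duplicate sets that fdupes finds to the list file.
find_files() {
    # -r recurses into subdirectories, -n ignores zero-length files
    fdupes -rn "$@" > "${file}"
    echo "Duplicate files have been listed in ${file}"
}

# Move every path named in the list file to the trash directory.
remove_files() {
    echo "Reading from ${file}"
    echo ""
    read -r -p "Type yes to continue: " choice
    case "$choice" in
        yes )
            mkdir -p "${TRASH}"
            while IFS= read -r line; do
                # fdupes separates sets with blank lines; skip them
                [ -n "${line}" ] || continue
                mv -- "${line}" "${TRASH}"
            done < "${file}"
            echo "Duplicate files have been moved to ${TRASH}"
            exit
            ;;
        * )
            echo "Exiting"
            exit
            ;;
    esac
}

while getopts "rf:" option; do
    case "${option}" in
        r) remove=true ;;
        f) file="${OPTARG}" ;;
    esac
done
shift $((OPTIND - 1))

# Without -r, search the given directories; with -r, act on the list file.
case "$remove" in
    true ) remove_files ;;
    * ) find_files "$@" ;;
esac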
Understanding the Script:
The script uses the fdupes command-line tool to identify duplicate files within a given directory or set of directories. Here's how it works:
Finding Duplicate Files:
- The find_files function invokes the fdupes command with the -rn flags: -r searches the directories recursively and -n skips zero-length files. The output is redirected into the list file.
- If no file name is provided with -f, the script uses the default file name dupes.txt to store the duplicate file list.
- After the duplicates are found, the script reports which file they have been listed in (dupes.txt by default).
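For reference, fdupes prints each set of identical files as a block of paths, one per line, with a blank line between sets. The sketch below, assuming that format, prints every path except the first of each set, which is one way to prepare a list that keeps a single copy of everything. The helper name drop_first_of_each_set is hypothetical and not part of the script above.

```shell
# Hypothetical helper: read an fdupes-style list (sets of paths separated by
# blank lines) and print every path except the first one of each set.
drop_first_of_each_set() {
    awk 'BEGIN { first = 1 }
         /^$/  { first = 1; next }   # a blank line starts a new set
         first { first = 0; next }   # skip the first path of the set
               { print }' "$1"
}
```

For example, drop_first_of_each_set dupes.txt > to-move.txt writes a trimmed list to to-move.txt (an arbitrary name), which can then be reviewed by hand. fdupes also has an --omitfirst option that should have a similar effect at search time.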
Removing Duplicate Files:
- The remove_files function moves the files named in the list. Make sure to review the dupes.txt file before running it: fdupes lists every copy in a set, including the one you want to keep, so delete the lines for any files that should stay where they are.
- If no file name is provided with -f, the script reads the duplicate file list from the default dupes.txt file.
- After printing the name of the list file, the script prompts us to confirm our decision by typing "yes."
- If confirmed, the script creates a temporary trash directory and moves the listed files to it.
- Finally, it prints a message confirming that the files have been moved to the trash directory.
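One thing to watch for: duplicates often share a file name, and moving them all into one flat directory lets a later file overwrite an earlier one with the same name. A possible variant of the move step, sketched below, recreates each file's original path under the trash directory instead. The function name move_to_trash and its two arguments are assumptions for illustration, and relative paths containing .. are not handled.

```shell
# Hypothetical variant of the move step: keep each file's directory structure
# under the trash directory so that same-named files do not collide.
# Usage: move_to_trash <trash-dir> <list-file>
move_to_trash() {
    local trash="$1" list="$2" line dest
    while IFS= read -r line; do
        [ -n "$line" ] || continue      # skip blank separators between sets
        [ -e "$line" ] || continue      # skip paths that no longer exist
        dest="${trash}/${line#/}"       # drop a leading slash, keep the rest
        mkdir -p "$(dirname "$dest")"
        mv -- "$line" "$dest"
    done < "$list"
}
```

With this layout, a file that turns out to be needed can be put back by hand, because its original location is still visible in the trash path.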
Using the Script:
To utilize the script effectively, follow these steps:
- Copy the script into a text editor and save it as ddup.sh.
- Open a terminal and navigate to the directory containing the script.
- Make the script executable by running the command: chmod +x ddup.sh
Execute the script with appropriate options:
- To find duplicate files: ./ddup.sh <directory>
- To remove duplicate files (by moving them to the trash directory): ./ddup.sh -r
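Note: If you don't specify a file name using the -f argument, the script defaults to dupes.txt for the duplicate file list. Options are parsed before the directory arguments, so -f has to come first, for example ./ddup.sh -f photos.txt <directory>.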
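Before running with -r, it can help to see exactly which paths the list would affect. The following is a small dry-run sketch and not part of ddup.sh; preview_list is a hypothetical helper that only prints and never moves anything.

```shell
# Hypothetical dry run: report what a list file would move, without touching
# any files. Defaults to dupes.txt when no list file is given.
preview_list() {
    local list="${1:-dupes.txt}" line
    while IFS= read -r line; do
        [ -n "$line" ] || continue      # skip blank separators between sets
        if [ -e "$line" ]; then
            printf 'would move: %s\n' "$line"
        else
            printf 'missing:    %s\n' "$line"
        fi
    done < "$list"
}
```

Any line reported as missing points at a path that has changed since the list was generated, which is a good hint to regenerate the list before moving anything.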