Using fdupes to clean up my file server

The Overall Problem:

Like many of us, I am guilty of copying files haphazardly, promising myself that I'll organize them later. This has built up into a significant problem over the years, particularly with old smartphone backups. I had a bad habit of dumping photo folder backups onto my server, with each dump containing still more dumps of old photos, resulting in multiple levels of duplication. Using the command-line tool fdupes, I've only just managed to get some of it under control.

fdupes is a command-line application designed to find and identify duplicate files within a directory or a set of directories. It compares files first by size, then by MD5 signature, and finally byte-by-byte to confirm true duplicates, enabling efficient cleanup and reclamation of storage space.
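For example, a plain recursive run prints each set of identical files as a group, one path per line, with blank lines separating the sets (the paths here are invented for illustration):

$ fdupes -r /srv/photos
/srv/photos/backup-2019/IMG_0042.jpg
/srv/photos/old-phone/camera/IMG_0042.jpg

/srv/photos/holiday/beach.jpg
/srv/photos/backup-2019/beach.jpg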

To streamline the review process and make sure I know what's about to happen before deleting any files, I created a simple bash wrapper script. This script acts as a nice safety belt, preventing accidental fat-finger deletions.

Avoiding the rm -rf Pitfall:

As many of us have learned the hard way, the rm -rf command can have disastrous consequences if misused (goodbye, email server with 2000 emails). A simple typo or a wrong path can result in irreversible data loss. To mitigate this risk, the bash wrapper script avoids using rm -rf altogether. Instead, it takes the safer approach of moving duplicate files to a temporary trash directory for review before manual deletion.

#!/bin/bash

TRASH="/tmp/trash"

find_files() {
  # -r recurses into subdirectories, -n skips empty files, and
  # --omitfirst leaves the first file of each duplicate set out of the
  # list so that one copy always survives the cleanup.
  fdupes -rn --omitfirst "$@" > "${file}"
  echo "Duplicate files have been listed in ${file}"
}

remove_files() {
  echo "Reading from ${file}"
  echo ""
  read -p "Type yes to continue: " choice
  case "$choice" in
    yes )
      mkdir -p "${TRASH}"
      while IFS= read -r line; do
        # fdupes separates duplicate sets with blank lines; skip them.
        [ -z "${line}" ] && continue
        # --backup=numbered (GNU mv) keeps same-named files from
        # overwriting each other inside the trash directory.
        mv --backup=numbered "${line}" "${TRASH}"
      done < "${file}"
      echo "Duplicate files have been moved to ${TRASH}"
      exit ;;
    * )
      echo "Exiting"
      exit ;;
  esac
}

while getopts "rf:" option; do
  case "${option}" in
    r)
      remove=true ;;
    f)
      file="${OPTARG}" ;;
  esac
done
shift $((OPTIND - 1))

# Fall back to dupes.txt when no -f option is given.
file="${file:-dupes.txt}"

case "$remove" in
  true )
    remove_files ;;
  * )
    find_files "$@" ;;
esac

Understanding the Script:

The script utilizes the fdupes command-line tool to identify duplicate files within a given directory or set of directories. Here's how it works:

Finding Duplicate Files:
  • The find_files function invokes the fdupes command with the -r and -n flags, telling it to search recursively while ignoring empty files, plus --omitfirst so the first file in each duplicate set is kept off the list; the output is redirected into the list file.
  • If no file name is provided with the -f option, the script will use the default file name dupes.txt to store the duplicate file list.
  • After the duplicates are found, the script informs us that the duplicate files have been listed in the dupes.txt file, as demonstrated below.
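For instance, a find run against a photo directory (the path is just an example) looks like this:

$ ./ddup.sh ~/photos
Duplicate files have been listed in dupes.txt

At this point dupes.txt can be opened in any editor; deleting a line from it keeps that file out of the later removal pass.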
Removing Duplicate Files:
  • The remove_files function allows us to decide whether to remove the duplicates. Make sure to review the dupes.txt file before running.
  • If no file name is provided as an argument, the script will still refer to the default dupes.txt file to read the duplicate file list.
  • After printing the name of the file it will read from, the script prompts us to confirm our decision by typing "yes."
  • If confirmed, the script creates a temporary trash directory and proceeds to move the duplicate files to it.
  • Finally, it provides a message confirming that the duplicate files have been successfully moved to the trash directory.
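Putting it together, a removal run with the default list might look like this (output abridged to the script's own messages):

$ ./ddup.sh -r
Reading from dupes.txt

Type yes to continue: yes
Duplicate files have been moved to /tmp/trash

Once you've confirmed nothing in /tmp/trash is needed, that directory is the one place a deliberate rm -rf is appropriate.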

Using the Script:

To utilize the script effectively, follow these steps:

  • Copy the script into a text editor and save it as ddup.sh.
  • Open a terminal and navigate to the directory containing the script.
  • Make the script executable by running the command: chmod +x ddup.sh.
Execute the script with appropriate options:
  • To find duplicate files: ./ddup.sh <directory>
  • To remove duplicate files: ./ddup.sh -r

Note: If you don't specify a file name using the -f argument, it will default to using the dupes.txt file for listing duplicate files.
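For example, to keep a separate list for each cleanup job, pass the same file name to both modes (photo-dupes.txt is just an example name):

$ ./ddup.sh -f photo-dupes.txt ~/photos/backups
Duplicate files have been listed in photo-dupes.txt
$ ./ddup.sh -r -f photo-dupes.txt

The second command then prompts for confirmation exactly as it does with the default list.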
