<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type="text/xsl" href="../assets/xml/rss.xsl" media="all"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Peek Read Info (Posts about fdupes)</title><link>https://peekread.info/</link><description></description><atom:link href="https://peekread.info/tags/fdupes.xml" rel="self" type="application/rss+xml"></atom:link><language>en</language><copyright>Contents © 2024 &lt;a href="mailto:dugite-code@peekread.info"&gt;Dugite-Code&lt;/a&gt; 
&lt;a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/"&gt;
&lt;img alt="Creative Commons License BY-SA"
width="88px" height="31px" style="border-width:0; margin-bottom:12px;"
src="https://i.creativecommons.org/l/by-sa/4.0/88x31.png"&gt;&lt;/a&gt;</copyright><lastBuildDate>Wed, 14 Feb 2024 06:33:09 GMT</lastBuildDate><generator>Nikola (getnikola.com)</generator><docs>http://blogs.law.harvard.edu/tech/rss</docs><item><title>Using fdupes to cleanup my file server</title><link>https://peekread.info/tech/20230706-using-fdupes-to-cleanup-my-file-server/</link><dc:creator>Dugite-Code</dc:creator><description>&lt;h4&gt;The overall problem:&lt;/h4&gt;
&lt;p&gt;Like many of us, I am guilty of copying files haphazardly, promising myself that I'll organize them later. This has built up to a significant problem over the years, particularly with old smartphone backups. I had a bad habit of dumping photo folder backups onto my server, with each dump containing even more dumps of old photos, resulting in multiple levels of duplication. Using the command-line tool called &lt;code&gt;fdupes&lt;/code&gt; I've only just managed to get some of it under control.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;fdupes&lt;/code&gt; is a command-line application designed to find and identify duplicate files within a directory or a set of directories. It employs various techniques to compare file contents and determine duplicates, enabling efficient cleanup and reclamation of storage space.&lt;/p&gt;
&lt;p&gt;To streamline the review process and make sure I know what's about to happen before deleting any files, I created a simple bash wrapper script. This script acts as a nice safety belt, preventing accidental fat finger deletions.&lt;/p&gt;
&lt;h4&gt;Avoiding the &lt;code&gt;rm -rf&lt;/code&gt; Pitfall:&lt;/h4&gt;
&lt;p&gt;As many of us have learned the hard way, the &lt;code&gt;rm -rf&lt;/code&gt; command can have disastrous consequences if misused (goodbye email server with 2000 emails). A simple typo or a wrong path can result in irreversible data loss. To mitigate this risk, the bash wrapper script avoids using &lt;code&gt;rm -rf&lt;/code&gt; altogether. Instead, it leverages the safer alternative of moving duplicate files to a temporary trash directory for review and then subsequent manual deletion.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="ch"&gt;#!/bin/bash&lt;/span&gt;

&lt;span class="nv"&gt;TRASH&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"/tmp/trash"&lt;/span&gt;

find_files&lt;span class="o"&gt;()&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;fdupes&lt;span class="w"&gt; &lt;/span&gt;-rn&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$*&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&amp;gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="si"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;file&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Duplicate files have listed in &lt;/span&gt;&lt;span class="si"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;file&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;

remove_files&lt;span class="o"&gt;()&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Reading from &lt;/span&gt;&lt;span class="si"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;file&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;""&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nb"&gt;read&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;-p&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Type yes to continue"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;choice
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="k"&gt;case&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$choice&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;in&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;yes&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;mkdir&lt;span class="w"&gt; &lt;/span&gt;-p&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;TRASH&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="k"&gt;while&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;IFS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;read&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;-r&lt;span class="w"&gt; &lt;/span&gt;line&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;do&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;mv&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;line&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;TRASH&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="k"&gt;done&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&amp;lt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="si"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;file&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Duplicate files have been moved to &lt;/span&gt;&lt;span class="si"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;TRASH&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nb"&gt;exit&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;;;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;*&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Exiting"&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nb"&gt;exit&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;;;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="k"&gt;esac&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;while&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;getopts&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"rf:"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;option&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;do&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="k"&gt;case&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;option&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;in&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;r&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nv"&gt;remove&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;true&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="p"&gt;;;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;f&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nv"&gt;file&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;OPTARG&lt;/span&gt;&lt;span class="k"&gt;:-&lt;/span&gt;&lt;span class="nv"&gt;dupes&lt;/span&gt;&lt;span class="p"&gt;.txt&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="p"&gt;;;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="k"&gt;esac&lt;/span&gt;
&lt;span class="k"&gt;done&lt;/span&gt;
&lt;span class="nb"&gt;shift&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;$((&lt;/span&gt;&lt;span class="nv"&gt;OPTIND&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="k"&gt;))&lt;/span&gt;

&lt;span class="k"&gt;case&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$remove&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;in&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nb"&gt;true&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;remove_files
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nb"&gt;false&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;find_files&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;$*&lt;/span&gt;
&lt;span class="k"&gt;esac&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;h4&gt;Understanding the Script:&lt;/h4&gt;
&lt;p&gt;The script utilizes the &lt;code&gt;fdupes&lt;/code&gt; command-line tool to identify duplicate files within a given directory or set of directories. Here's how it works:&lt;/p&gt;
&lt;h5&gt;Finding Duplicate Files:&lt;/h5&gt;
&lt;ul&gt;
&lt;li&gt;The &lt;code&gt;find_files&lt;/code&gt; function invokes the &lt;code&gt;fdupes&lt;/code&gt; command with the &lt;code&gt;-rn&lt;/code&gt; flags, instructing it to recursively search for duplicates and list the results in a specified file.&lt;/li&gt;
&lt;li&gt;If no file name is provided as an argument, the script will use the default file name &lt;code&gt;dupes.txt&lt;/code&gt; to store the duplicate file list.&lt;/li&gt;
&lt;li&gt;After the duplicates are found, the script informs us that the duplicate files have been listed in the &lt;code&gt;dupes.txt&lt;/code&gt; file.&lt;/li&gt;
&lt;/ul&gt;
&lt;h5&gt;Removing Duplicate Files:&lt;/h5&gt;
&lt;ul&gt;
&lt;li&gt;The &lt;code&gt;remove_files&lt;/code&gt; function allows us to decide whether to remove the duplicates. Make sure to review the &lt;code&gt;dupes.txt&lt;/code&gt; file before running.&lt;/li&gt;
&lt;li&gt;If no file name is provided as an argument, the script will still refer to the default &lt;code&gt;dupes.txt&lt;/code&gt; file to read the duplicate file list.&lt;/li&gt;
&lt;li&gt;After printing the file listing the duplicates, the script prompts us to confirm our decision by typing "yes."&lt;/li&gt;
&lt;li&gt;If confirmed, the script creates a temporary trash directory and proceeds to move the duplicate files to it.&lt;/li&gt;
&lt;li&gt;Finally, it provides a message confirming that the duplicate files have been successfully moved to the trash directory.&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;Using the Script:&lt;/h4&gt;
&lt;p&gt;To utilize the script effectively, follow these steps:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Copy the script into a text editor and save it as &lt;code&gt;ddup.sh&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Open a terminal and navigate to the directory containing the script.&lt;/li&gt;
&lt;li&gt;Make the script executable by running the command: &lt;code&gt;chmod +x ddup.sh&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;h5&gt;Execute the script with appropriate options:&lt;/h5&gt;
&lt;ul&gt;
&lt;li&gt;To find duplicate files: &lt;code&gt;./ddup.sh &amp;lt;directory&amp;gt;&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;To remove duplicate files: &lt;code&gt;./ddup.sh -r&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Note: If you don't specify a file name using the &lt;code&gt;-f&lt;/code&gt; argument, it will default to using the &lt;code&gt;dupes.txt&lt;/code&gt; file for listing duplicate files.&lt;/p&gt;</description><category>fdupes</category><category>nextcloud</category><category>quick post</category><guid>https://peekread.info/tech/20230706-using-fdupes-to-cleanup-my-file-server/</guid><pubDate>Wed, 05 Jul 2023 16:00:00 GMT</pubDate></item></channel></rss>