Script Ahoy

Community Resource for Archivists and Librarians Scripting

The command line is powerful but more than a little intimidating. Our community resource is intended to provide helpful one-liners and script code specifically drawn from real-life examples in archives and libraries. Take these, use them in your workflows, change them up, contribute back! Remember: you have to crawl before you walk. (See what we did there with the acronym?)

Each button displays helpful information about how to perform a wide variety of tasks on the command line. To use this site, click on the task you would like to perform. A new window will open up with a sample command and a description of how that command works. You can copy the command, and the breakdown of each flag explains what it does.

Practice more command line wizardry with this Command Line Crash Course from Learn Code the Hard Way.

What do you want to do?

Select from the following.

Copy, rename, and create files

Copy JPGs

cp -a -i -v *.jpg destination

Copy all jpg files into another directory and preserve timestamps.

cp: starts the command, which copies files and directories.
-a: copies in archive mode, which preserves timestamps, permissions, and other file attributes (and copies directories recursively).
-i: will prompt when a file with the same name already exists in the destination directory. (You can ignore this if you don't need or want prompting; alternatively, you can use -n if you want cp to never overwrite a file with the same name in the destination.)
-v: prints out all files successfully copied. (You can ignore this if you don't need the feedback.)
destination: is the directory where you want the files copied to.

Remove spaces from file names

rename -n -v "s/\s+/_/g" *

This command will test a batch replacement of all whitespace characters in a filename with underscores. You must have the rename command installed to use it; if you encounter an error, it is possible you do not have rename installed.
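
If rename is not available, a similar dry run can be sketched with a plain loop and sed. This is a swapped-in technique, not what rename does internally, and the scratch directory, sample filenames, and the helper name underscore_name are all made up for illustration. Nothing is renamed unless you uncomment the mv line.

```shell
# Dry run of a whitespace-to-underscore rename without the rename command.
# The scratch directory and file names below are hypothetical.
workdir=$(mktemp -d)
touch "$workdir/annual report 1998.doc" "$workdir/finding  aid.txt" "$workdir/notes.txt"

# Hypothetical helper: replace each run of whitespace with one underscore.
underscore_name() {
  printf '%s\n' "$1" | sed 's/[[:space:]][[:space:]]*/_/g'
}

for f in "$workdir"/*; do
  old=${f##*/}                       # filename without the directory part
  new=$(underscore_name "$old")
  if [ "$old" != "$new" ]; then
    echo "would rename: $old -> $new"
    # mv -i "$f" "$workdir/$new"     # uncomment to rename for real
  fi
done
```

Files whose names contain no whitespace, such as notes.txt here, are left out of the report.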

rename: starts the command, which renames multiple files.
-n: tells rename to only print the names of files to be renamed, but doesn't actually rename anything. (When you’re ready to batch rename, reissue the command without -n.)
-v: tells rename to print out the names of the files that will be successfully renamed. (You can leave this out if you don’t need confirmation.)
"s/\s+/_/g": is a regular expression that says to replace every run of whitespace characters (e.g., space, tab, newline) with an underscore.

Create multiple directories

mkdir -p -v disk{001..100}/{rawFiles,normalizedFiles,reports}

This command makes one hundred sequentially named directories with a common prefix, and creates three subdirectories (named rawFiles, normalizedFiles, and reports) underneath each parent directory.

mkdir: starts the command, which makes directories.
-p: tells mkdir to create parent directories when they do not already exist.
-v: tells mkdir to print a message for each new directory created.
disk: prepends the literal string "disk".
{001..100}: prints out every number in the range from 1 to 100, left-padding with zeros so each number has three digits.
/: is the path delimiter and separates a parent and child directory.
{rawFiles,normalizedFiles,reports}: prints out each string -- rawFiles, normalizedFiles, reports -- one at a time.
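
Before creating a hundred directories, it can help to preview what the braces expand to. The sketch below swaps mkdir for echo and uses a shorter range and only two subdirectory names to keep the output readable. It assumes bash 4 or later; older versions of bash (such as the one shipped with macOS) do not zero-pad the numbers.

```shell
# Preview the brace expansion with echo; nothing is created on disk.
preview=$(echo disk{001..003}/{rawFiles,reports})
echo "$preview"
# prints: disk001/rawFiles disk001/reports disk002/rawFiles disk002/reports disk003/rawFiles disk003/reports
```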

Use rsync

rsync -a -h --progress --stats --exclude=".*" --exclude="~*" --log-file=log_file source destination

This command will recursively synchronize select files from one directory to another. (Note: rsync is generally used when you are copying data off of one storage medium to another, such as from your workstation to an external hard drive.)

rsync: starts the command, which is a file-copying tool.
-a: turns on archive mode, which copies directories recursively and preserves timestamps, permissions, and other file attributes.
-h: prints out file size information in a human readable format. (You can omit this if you’re a robot.)
--progress: prints transfer progress for each file while rsync is copying.
--stats: prints a summary of the transfer when rsync finishes.
--exclude=".*": tells rsync not to copy any hidden files (i.e., files that start with a dot).
--exclude="~*": tells rsync not to copy any temporary files (i.e., files that start with a tilde).
--log-file=log_file: tells rsync to write a log of what it did to the file named log_file.
source: is where you want rsync to copy from.
destination: is the directory where you want rsync to copy files into.

Append file extensions

for files in *; do mv "$files" "$files.doc"; done

Append the extension .doc to all files in a directory.

This script uses a loop and the move command (mv) to rename every file in a directory to its current filename plus the extension .doc. This is helpful when working with a folder of files that are all the same type but, for one reason or another, lack file extensions. Missing file extensions are very common in documents from the mid to late 1990s. The script can be run directly from your Terminal window.

for files in *;: starts the loop
do mv "$files" "$files.doc";: for each matched file in the directory, move it to a new file with the same name and .doc appended to it.
done: ends the loop.
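
The loop above renames everything that * matches, including subdirectories and files that already have an extension. A more cautious variant might look like the sketch below; the scratch directory and filenames are hypothetical, and "already contains a dot" is only a rough test for "already has an extension".

```shell
# Hypothetical scratch data: one file without an extension, one with, one folder.
workdir=$(mktemp -d)
touch "$workdir/memo 1996" "$workdir/budget.xls"
mkdir "$workdir/drafts"

for f in "$workdir"/*; do
  [ -f "$f" ] || continue        # skip directories and anything that is not a file
  case "${f##*/}" in
    *.*) continue ;;             # skip names that already contain a dot
  esac
  mv -v "$f" "$f.doc"
done
```

Here only "memo 1996" is renamed; budget.xls and the drafts folder are left alone.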

Rename WAV files by substitution

for i in *.wav; do mv "$i" "$(echo "$i" | sed 's/MAC/MN/g')"; done

Rename WAV files in a directory by replacing the pattern MAC with MN.

This works anywhere within the filename, not only at the beginning or end. It uses sed to do the substitution.

for i in *.wav;: starts the loop, which runs once for each file ending in .wav.
do mv "$i" "$(echo "$i" | sed 's/MAC/MN/g')";: performs the renaming. The double quotes keep filenames that contain spaces intact.
$(echo "$i" | sed 's/MAC/MN/g'): You can nest commands by placing a line of bash inside the parentheses preceded by a dollar sign: $( ... )
echo "$i": says to print out the value of the variable i.
|: (pipe) indicates the output of the first command will be passed as input for the next command.
sed 's/MAC/MN/g': sed is known as the "stream editor." The text in single quotes tells sed what to do. The 's' tells sed that we want it to perform a substitution. The '/' (slash) characters serve as separators between the old and new values that we want to find and replace. First we define the old value we want to replace. In this case, it is "MAC". Then we define the new value we want to substitute for the old value. In this case, it is "MN". The 'g' at the end of the command says we want to find/replace all occurrences of this pattern, not just the first one.
done: ends the loop.
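
To check a substitution before renaming anything, you can feed sed a sample name with echo. The filename below is made up for illustration.

```shell
# With the trailing g, every occurrence of MAC is replaced.
echo "MAC_0001_MAC.wav" | sed 's/MAC/MN/g'    # prints MN_0001_MN.wav

# Without the g, only the first occurrence is replaced.
echo "MAC_0001_MAC.wav" | sed 's/MAC/MN/'     # prints MN_0001_MAC.wav
```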

Find and rsync all Excel files

find source -iname "*.xlsx" -exec rsync {} destination \;

Find and rsync all Excel files in any subdirectory from source to a single directory at destination.

This command can be used to quickly aggregate multiple files to a single place for analysis, for example to copy metadata files from the data/ directory of a large number of bags.

If files do not have unique names, they will be overwritten at destination.

find: starts the command, which is a file finding tool
source: is the parent directory where you want to look
-iname: match a case-insensitive filename pattern
"*.xlsx": match any filename that is followed by a dot and xlsx
-exec: execute a command on each file found
rsync: is the command to run on each file found; here, the file-copying tool.
{}: tells rsync to use the file found by find as the source.
destination: is the directory where you want rsync to copy files into.
\;: end the exec command
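
Because files with the same name overwrite each other at destination, it can be worth listing duplicate names first. The following is a minimal sketch using a made-up directory layout; it compares names exactly, so it will not flag names that differ only by upper or lower case.

```shell
# Hypothetical layout: two bags that each contain a metadata.xlsx.
workdir=$(mktemp -d)
mkdir -p "$workdir/source/bag1/data" "$workdir/source/bag2/data"
touch "$workdir/source/bag1/data/metadata.xlsx" \
      "$workdir/source/bag2/data/metadata.xlsx" \
      "$workdir/source/bag2/data/inventory.xlsx"

# Strip the folders from each match, sort the names, and keep only repeats.
duplicates=$(find "$workdir/source" -iname "*.xlsx" -exec basename {} \; | sort | uniq -d)
echo "$duplicates"    # prints metadata.xlsx
```

If this prints nothing, every matching file has a unique name.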

Find and interpret files

Find all ZIP files in a directory

ls *.zip

This will find all of the .zip files in a particular directory. Note that it will not find files in subdirectories.

ls: starts the command, which lists directory contents.
*.zip: tells ls to match every file (represented by the asterisk) that is followed by a dot and the three letter string "zip".

Count MP3s in a directory

ls *.mp3 | wc -l

This command will count the number of MP3 files in a directory. Note that it will not count files in subdirectories.

ls: starts the command, which lists directory contents.
*.mp3: says to print out every file that ends in .mp3
|: (pipe) indicates the output of the first command will be passed as input for the next command.
wc: starts the next command, which prints line, word, and byte counts of input.
-l: tells wc to only print the number of lines
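
When no MP3s are present, ls *.mp3 prints an error, and it also misses files named with an uppercase .MP3 extension. A find-based count is one possible alternative. This sketch assumes a version of find that supports -maxdepth (GNU and BSD find both do) and uses made-up filenames.

```shell
# Hypothetical scratch data: two MP3s (one uppercase) and one text file.
workdir=$(mktemp -d)
touch "$workdir/track01.mp3" "$workdir/track02.MP3" "$workdir/notes.txt"

# -maxdepth 1 keeps find out of subdirectories; -iname ignores case.
count=$(find "$workdir" -maxdepth 1 -type f -iname "*.mp3" | wc -l)
echo "MP3 files: $count"
```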

Print columns from a tab-delimited file

cut -f2 input_file

This will print the second column of a tab-delimited file.

cut: starts the command, which removes sections from each line of files.
-f2: tells cut to select the second field (column) for printing.
input_file: is the name of the file you want to print from.

cut -f2 -d',' input_file

If your file uses a different delimiter, such as a comma, you can adjust the command as shown above.

-d',': specifies the character that is used as a delimiter; here, a comma.
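
To see both forms side by side, you can build a tiny sample with printf and pipe it to cut. The data is made up for illustration. Note that cut splits on every delimiter it sees, so it is not suitable for CSV files whose quoted fields contain commas.

```shell
# Tab-delimited sample: cut uses tabs by default.
tab_column=$(printf 'id\ttitle\n001\tLetters\n002\tPhotographs\n' | cut -f2)
echo "$tab_column"

# Comma-delimited sample: the same column needs -d','.
comma_column=$(printf 'id,title\n001,Letters\n002,Photographs\n' | cut -f2 -d',')
echo "$comma_column"
```

Both commands print the same three lines: title, Letters, Photographs.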

Find oldest file and print its modification date in a friendly format

find -type f -printf '%T+\t%TY-%TB-%Td\t%p\n' | sort -n | head -n 1 | cut -f2,3

Find and print the oldest file and display its modified date in a friendly format.

find: starts the command, which searches for files in a directory hierarchy.
-type f: tells find to look for files (and not directories).
-printf '%T+\t%TY-%TB-%Td\t%p\n': tells find to print the exact modification time, a nicely-formatted version of the date, and the filename, separated by tabs.
|: (pipe) indicates the output of the first command will be passed as input for the next command.
sort -n: sorts the output of the previous command numerically.
|: (pipe) indicates the output of the first command will be passed as input for the next command.
head -n 1: says to select just the first line of the previous output. Here it is the “smallest” (i.e., earliest) timestamp. You can replace this with tail -n 1 to find the “largest” (i.e., latest) timestamp.
|: (pipe) indicates the output of the first command will be passed as input for the next command.
cut -f2,3: says to print the second and third columns only (with the nicely formatted date and filename).

Count files and directories

echo "Digital Folders: $(find -type d | wc -l); Digital Files: $(find -type f | wc -l)"

The script queries a directory (using find), counts the number of new lines (using wc) to determine the number of directories (-type d) and files (-type f), then prints the output to the screen.

echo: starts the command, which displays a line of text.
"Digital Folders:: prints a literal string
$(find -type d | wc -l);: is a subquery that uses find -type d to find all directories and then pipes that to wc -l to count the number of lines of output.
Digital Files:: prints a literal string
$(find -type f | wc -l): is a subquery that uses find -type f to find all files and then pipes that to wc -l to count the number of lines of output.
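
One detail to be aware of: find -type d also lists the directory it starts in, so the folder count is one higher than the number of subfolders. The sketch below shows the difference on a made-up directory tree; -mindepth is assumed to be available (it is in GNU and BSD find).

```shell
# Hypothetical tree: two subfolders and three files.
workdir=$(mktemp -d)
mkdir "$workdir/box1" "$workdir/box2"
touch "$workdir/box1/a.txt" "$workdir/box2/b.txt" "$workdir/c.txt"

with_top=$(find "$workdir" -type d | wc -l)                  # counts workdir itself too
without_top=$(find "$workdir" -mindepth 1 -type d | wc -l)   # subfolders only
echo "Including the starting folder: $with_top; excluding it: $without_top"
```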

Compare two textfiles

diff input_file_1 input_file_2

This will compare two textfiles and show the differences, if any.

diff: starts the command, which compares your two text files
input_file_1: is the name of your first text file.
input_file_2: is the name of the file that you want to compare it to.

Compare two binary streams

cmp -l input_file_1 input_file_2

This will compare two binary streams and show the address and bytes of each difference, if any.

cmp: starts the command, which compares your two files
-l: tells cmp to list all the differences.
input_file_1: is the name of your first file.
input_file_2: is the name of the file that you want to compare it to.

Calculate the percentage of difference between two bytestreams

echo "100 * $(cmp -l input_file_1 input_file_2 | wc -l)/$(wc -c < input_file_2)" | bc -l

This will calculate the percentage of bytes that differ between two bytestreams.

For example, if two disk images of the same item have different checksums this will indicate how different they are.

Or, if you suspect that an operation has corrupted a bitstream, this will indicate the amount of corruption.

There's a lot going on here, but if you'd like to adapt the command, the gist of it is:

echo "100 * $(commands to find number of different bytes)/$(commands to find total number of bytes)" | bc -l

echo: in order to eventually pipe the fraction to bc, we need to wrap everything in an echo command.
"...": tell echo what information to echo
100 *: multiply by 100 to format the percentage correctly
$(...): evaluate everything inside of the parentheses
cmp: starts the command, which compares your two files
-l: tell cmp to list all the differences.
input_file_1: is the name of your first file.
input_file_2: is the name of the file that you want to compare it to.
wc: start the word count command, to summarize the differences found with cmp.
-l: tell wc to count by line, since cmp listed differences by line.
/: the division sign. bc performs the division once the whole expression is piped to it.
wc: start the word count command, to find the byte size of the comparison file.
-c: tell wc to count by byte.
< input_file_2: feed the file to wc on standard input, so that wc prints only the byte count and not the filename.
bc: start the precision math command, in order to get the decimal result
-l: tell bc to use the standard math library, which also makes bc return decimal places instead of a whole number.
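
The same arithmetic can be tried on two tiny files to see what each piece contributes. This sketch uses made-up ten-byte files that differ in two positions, and it swaps bc for the shell's built-in integer arithmetic, so the result is a whole number rather than a decimal.

```shell
# Hypothetical sample files: ten bytes each, differing at positions 3 and 8.
workdir=$(mktemp -d)
printf 'abcdefghij' > "$workdir/original.bin"
printf 'abXdefgYij' > "$workdir/copy.bin"

different=$(cmp -l "$workdir/original.bin" "$workdir/copy.bin" | wc -l)   # one line per differing byte
total=$(wc -c < "$workdir/copy.bin")                                      # size of the comparison file
echo "$different of $total bytes differ: $((100 * different / total))%"
```

Here two of ten bytes differ, so the result is 20%.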

List files/folders and write out to external text file

ls -alR /set/your/path/here > /set/your/output/path/inventory.txt

Write a list of files and directories (along with filesystem data about each one) to a space-separated text file that you can parse outside of the command line.

ls: starts the command, which lists files and folders.
-alR: tells ls to include "hidden" files and folders in the list (-a), to include permissions, owners, size, and date modified of each file and folder (-l), and to recursively go into nested folders to list files and folders (-R).
/set/your/path/here: tells ls where to start its list of files and folders.
/set/your/output/path/inventory.txt: tells bash where to put the resulting text file. Do note that if a file already exists in that location, it may be automatically overwritten! (See Don't accidentally overwrite existing files to see how to prevent automatic overwriting of files in your scripts.)

Check disk usage for a folder

du -sch /set/your/path/here/*

This will give you disk usage information for each file and folder in a chosen directory, plus a grand total.

du: starts the command, which estimates disk usage.
-sch: -sch is a series of flags. This could also be written out as -s -c -h instead. Each letter represents a different, chosen option for the command.
-s indicates that you want a single summarized total for each file or folder, rather than a line for every nested subfolder.
-c indicates that you want to calculate the grand total.
-h indicates that you want these results in human-readable units.
/set/your/path/here/*: Defines the file path you want to search. The asterisk at the end of this file path indicates that you want information about everything in the chosen directory.

Add helpful flags into bash shell scripts

Don't accidentally overwrite existing files

set -o noclobber

Set a flag so that you don't accidentally overwrite existing files when redirecting output from another command. This can be helpful if you are trying to create log files from the output of your script, but want to ensure that you don't write over any file that already exists.

Add this line after #!/bin/bash and before the rest of your code starts.

You can also issue this command in a Terminal window, but know that its effects are not permanent. (Meaning that, if you open a new Terminal window, you'll have to issue the command again to get the same effect.)
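
A short session shows the effect. The log file below is a hypothetical path in a scratch directory. The >| operator is bash's way of deliberately overriding noclobber for a single redirect.

```shell
log=$(mktemp -d)/run.log      # hypothetical log file
set -o noclobber

echo "first run" > "$log"     # the file does not exist yet, so this works

# The second redirect is refused because the file already exists.
if ! { echo "second run" > "$log"; } 2>/dev/null; then
  echo "refused to overwrite $log"
fi
after_refusal=$(cat "$log")   # still contains "first run"

echo "forced" >| "$log"       # >| overrides noclobber on purpose

set +o noclobber              # turn the protection back off
```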

Exit a script immediately if any command returns an error

set -o errexit

Set a flag so that if any one part of your shell script returns an error, the entire script stops. This can be super helpful if you have a loop in your script: the script will not issue multiple consecutive errors but instead quit after the first one. This also can be helpful if you are debugging someone else's code and are not sure why it is not working on your system.

Add this line after #!/bin/bash and before the rest of your code starts.
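
To see the behavior without risking your terminal session, you can try it inside a command substitution, which runs in its own subshell. In this sketch, false stands in for any command that fails.

```shell
# The three steps run in a subshell; errexit stops it at the first failure.
output=$(
  set -o errexit
  echo "step 1"
  false              # stands in for any command that returns an error
  echo "step 2"      # never reached
)
status=$?
echo "$output (exit status $status)"    # prints: step 1 (exit status 1)
```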

One technique for handling whitespace in filenames

OIFS=$IFS

IFS=$'\n'

[script text]

IFS=$OIFS

Change the internal field separator (IFS) so that bash splits text only on new lines, not on spaces or tabs. Use this if you find that, in your code, you are getting weird errors because certain filenames have spaces in them, and there's no other way around that (i.e., if you're not allowed or don't want to rename files that have spaces in them).

Add the first two lines after #!/bin/bash and before the rest of your code starts.

At the end of the script, use the last line to re-set $IFS to its original value.
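
As an illustration, the sketch below loops over the output of find for two made-up filenames that contain spaces. It assumes bash (for the $'\n' syntax) and a scratch directory path without spaces. With the default IFS, each name would be split into three pieces at the spaces.

```shell
# Hypothetical scratch data: two filenames containing spaces.
workdir=$(mktemp -d)
touch "$workdir/oral history 01.wav" "$workdir/oral history 02.wav"

OIFS=$IFS
IFS=$'\n'
count=0
for f in $(find "$workdir" -type f); do
  count=$((count + 1))     # each full path arrives as a single item
done
IFS=$OIFS

echo "Files seen by the loop: $count"
```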

Increase control on the command line

Moving the cursor to the beginning of the line

ctrl+a

Sometimes when you're editing a command, the part you want to edit is buried at the beginning of the line. In that case, it can be faster to jump to the start of the line and move the cursor to the part you want to edit, instead of moving backwards from the end of the line.

On the keyboard, press the control key and the a key at the same time, and the cursor will move to the start of the line.

Your console might have other shortcuts for moving the cursor forward or backward one word at a time.

Moving the cursor using the mouse

On Mac or Linux, hold alt/option and click with the mouse

Sometimes when you're editing a command, it can be frustrating to move the cursor to the right position with the arrow keys. In that case, it can be faster to jump to the spot you want to edit directly.

On the keyboard, press the alt/option key and then click on the location where you want to place the cursor on the command line.

Stop a command while it's running

ctrl+c

When:

a command is running, and it won't stop,
the wrong command is running, and you need to stop it,
the command has a bug, and you need to stop it,

press ctrl+c on the keyboard.

If one press doesn't work, try holding ctrl and tapping c.

Don't just stop, kill a command

ctrl+z

kill %%

Use this when you try to stop a command but ctrl+c doesn't work.

ctrl+z: suspends the current command.
kill %%: stops the most recently suspended command, metaphorically yanking the power.

Save more commands in your history

The terminal saves the commands that you enter into a history file. When you press up, the terminal retrieves lines from this history file one-by-one. It can be helpful to save more than the default number of commands. To do that you need to increase the storage of the history cache and history file.

echo "export HISTSIZE=5000" >> ~/.bash_profile

echo "": starts the command, which prints whatever is inside the quotes.
export: is a command that makes a variable available to all programs started from your terminal.
HISTSIZE=: is the variable used by bash to control how many lines are stored in the history cache. (Do not put a $ in front of the name here; inside double quotes, $HISTSIZE would be replaced by the variable's current value before the line is written.)
5000: how many lines you would like to store. Using -1 sets the storage to unlimited.
>>: appends the output of the previous command to the end of a file.
~/.bash_profile: is the file used to store configuration options.

echo "export HISTFILESIZE=5000" >> ~/.bash_profile

HISTFILESIZE=: is the variable used by bash to control how many lines are stored in the history file.
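
Because >> always appends, running the echo line more than once leaves duplicate entries in the file. One way to guard against that is to check first, as in this sketch. A temporary file stands in for ~/.bash_profile so the example can be tried safely.

```shell
profile=$(mktemp)     # stand-in for ~/.bash_profile

# Single quotes stop the shell from expanding anything inside the string.
line='export HISTSIZE=5000'

# Append only if the file does not already set HISTSIZE; run twice to show the guard.
grep -q '^export HISTSIZE=' "$profile" || echo "$line" >> "$profile"
grep -q '^export HISTSIZE=' "$profile" || echo "$line" >> "$profile"

cat "$profile"        # prints export HISTSIZE=5000 once
```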

Find an old command

Remember that super-long, super-complicated, super-great bash command you used once? Want to use it again but don't want to search your commands one up press at a time? This command searches your saved command history using grep.

history | grep "search_term"

history: prints a log of your bash commands, up to the limit set by HISTSIZE.
grep: searches text using patterns known as regular expressions.
"search_term": the pattern grep looks for. Using numbers, letters, and spaces will do a regular search. Some punctuation characters kick off special functions. For example, . is a wildcard character.