Read the Manual: du

Disk utility. Not that Mac app you might be thinking of, the older one.

People interested in learning how the command lines tools on their computer work.

Today’s Read the Manual entry is about du, for disk usage! A positively ancient utility:

du… first appeared in Version 1 AT&T UNIX.

And:

This version of du was written by Chris Newcomb for 4.3BSD-Reno in 1989.

Till now, I have only ever du -sh. Time to change that!

The first interesting flag is the first flag listed in the manual: -A displays the apparent size in distinction to the disk usage, for compressed volumes or sparse files”. Neither of which I was familiar with before just now (I have heard the terms, but that’s it). So, just now I learned that a sparse file is one which is mostly empty, and a file system can use metadata to store it as a much smaller file — presumably using some heuristics/tradeoff where the metadata is enough smaller than the empty space that would otherwise be used. Neat!

A compressed volume — which I don’t think is that common among file systems, but ZFS has it! — is a volume where files are compressed and decompressed on the fly. Obviously costs CPU to de/compress, but I could see many situations in which that tradeoff would be worth it. Also neat!

Going back to du itself, the most common flags I have actually used are worth explaining.

  • -s displays an entry for each specified file instead of showing the tree of files in a directory if you name a directory. I basically never want to see a whole tree like that!

  • -h formats the output to be human-readable” (thus the h), using unit suffixes for (kilo|mega|giga|tera|peta)?bytes, all using powers of 1024. This rounds aggressively but usefully. A directory I have that is 146436096 bytes is printed as 140M. Just today I learned that --si will use powers of 1000 instead of 1024, so it’s human-readable”… but different than -h human-readable”.

You can also specify the size of block you want to count with: kilobytes -k, megabytes -m, gigabytes -g. No flags for terabytes or petabytes. Note that this is block size not output formatting. The block size is literally report as if the disk blocks were this big”.

Unsurprisingly given its job, du also has flags for whether to follow symlinks:

  • -H for yes for the files you tell me but no in the hierarchy”
  • -L for follow alllll the symlinks”
  • -P for nothing at all”, which is the default

And for depth to recurse with -d.

Bonus: the tool I am apt to reach for just as much as du these days is the very nice (and very fast!) dust tool, so named because it is like du and written in Rust (because of course it is!).