Assumed Audience: 90% myself in the future, when I (inevitably) ask this question again — but also anyone else who hits this particular question about command-line invocations.
In my previous post, I used the tr utility to transform newlines into null characters. However, as I hoped when I asked for a better way to do it in my Epistemic Status qualifier, a reader emailed me with a better solution!
If you’re using the GNU version of grep, it has a --null-data flag (shortened as -z) which makes grep treat its input and output as null-character-separated rather than newline-separated. You can combine that with the -print0 flag to find to get the same results as I got with tr (presumably with better performance, because it doesn’t require doing the replacement in another tool):
$ find notes -name "*.md" -print0 |\
    grep --null-data "notes/2020" |\
    xargs -0 wc -w
The same approach works with fd, ripgrep (rg), and cw:
$ fd --print0 ".md" notes |\
    rg --null-data 'notes/2020' |\
    xargs -0 cw -w
Huzzah for versions of tools that understand these things and make this simpler than the solution I posted yesterday (and thanks to my reader for sending in that note)!
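For future me, here is a minimal sketch of why the null separators matter. It is not part of the reader's suggestion: the scratch directory and the file names are made up for the demonstration, and it assumes GNU grep plus a find and xargs that support -print0 and -0. It builds a few notes, including one whose name contains a space and one whose name contains a newline, and then counts the words in everything under notes/2020.

```shell
#!/bin/sh
# Sketch only: the scratch directory and file names below are hypothetical.
# Assumes GNU grep (for --null-data) and a find/xargs that support -print0 / -0.
set -eu

demo_dir="$(mktemp -d)"
mkdir -p "$demo_dir/notes/2020" "$demo_dir/notes/2019"

# A literal newline, so we can build a file name that contains one.
newline='
'

printf 'one two three\n' > "$demo_dir/notes/2020/plain.md"
printf 'four five\n'     > "$demo_dir/notes/2020/with space.md"
printf 'six\n'           > "$demo_dir/notes/2020/with${newline}newline.md"
printf 'not counted\n'   > "$demo_dir/notes/2019/old.md"

# Every stage passes file names as NUL-terminated records, so the space and
# the embedded newline never get mistaken for separators between names.
total_words="$(
  cd "$demo_dir" &&
    find notes -name '*.md' -print0 |
    grep --null-data 'notes/2020' |
    xargs -0 cat |
    wc -w |
    tr -d '[:space:]'
)"

rm -rf "$demo_dir"
echo "$total_words"
```

The sketch pipes the matched files through cat before wc so that it produces a single total instead of one line per file. With newline-separated names, the file with the embedded newline would be split into two names that do not exist.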
cw is nice because, with especially large sets of data, being able to run it across multiple threads becomes very handy. If I word-count all of my notes with it (currently 667 files and just shy of 150,000 words), using 4 threads instead of 1 (the default, and all you get with wc) takes about 6–8 milliseconds off the run time. Not important at this scale… but if you’re dealing with very large amounts of data, it might be.