Shell showcase: `cut`

There are a bunch of utilities that I use daily but learned embarrassingly late in my programming journey. I plan to write a sequence of short tutorials highlighting these tools, in the hope that others can skip the long, frustrating discovery process that I went through before I knew them.

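cut is part of coreutils, the set of utilities that *nix systems expect to be in place at all times. (Strictly speaking, coreutils is the GNU package found on Linux; the BSDs and macOS ship their own implementation of cut in the base system.) You should already have it if you’re running Linux, any BSD, macOS, or Windows Subsystem for Linux. It reads lines of input from stdin, or from files named on the command line, and cuts them into pieces according to the flags that you provide. It writes the resulting pieces to stdout. A simple example: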

Let us say that we have this excellent poem by Dylan Thomas in a file called thomas.txt:

Do not go gentle into that good night,
Old age should burn and rave at close of day;
Rage, rage against the dying of the light.

Though wise men at their end know dark is right,
Because their words had forked no lightning they
Do not go gentle into that good night.

Good men, the last wave by, crying how bright
Their frail deeds might have danced in a green bay,
Rage, rage against the dying of the light.

Wild men who caught and sang the sun in flight,
And learn, too late, they grieved it on its way,
Do not go gentle into that good night.

Grave men, near death, who see with blinding sight
Blind eyes could blaze like meteors and be gay,
Rage, rage against the dying of the light.

And you, my father, there on the sad height,
Curse, bless, me now with your fierce tears, I pray.
Do not go gentle into that good night.
Rage, rage against the dying of the light.

We could run:

$ cut -d ' ' -f 1,3,5,6,8 < thomas.txt
Do go into that night,
Old should and rave close
Rage, against dying of light.

Though men their end dark
Because words forked no they
Do go into that night.

Good the wave by, how
Their deeds have danced a
Rage, against dying of light.

Wild who and sang sun
And too they grieved on
Do go into that night.

Grave near who see blinding
Blind could like meteors be
Rage, against dying of light.

And my there on sad
Curse, me with your tears,
Do go into that night.
Rage, against dying of light.

The -d flag specifies a “delimiter”, which is the single character that cut uses to break each line of input into pieces. In this case, I’ve specified that the delimiter should be a space. It has to be quoted so that my shell passes a literal space character to cut instead of treating it as ordinary whitespace between arguments.
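One consequence of the delimiter being a single character is worth seeing in a small sketch: cut treats every space as a boundary, so two spaces in a row produce an empty field between them. The sample string below is made up for illustration, and squeezing the spaces with tr first is just one possible workaround.

```shell
# Hypothetical sample line with TWO spaces between "a" and "b".
line='a  b c'

# Field 2 is the empty string between the two spaces, so this prints an empty line.
printf '%s\n' "$line" | cut -d ' ' -f 2

# The "b" has been pushed to field 3.
printf '%s\n' "$line" | cut -d ' ' -f 3

# One workaround: squeeze runs of spaces into a single space with tr -s,
# after which "b" is field 2 again.
printf '%s\n' "$line" | tr -s ' ' | cut -d ' ' -f 2
```

The poem works cleanly with a space delimiter only because its words are separated by exactly one space each.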

The -f flag specifies the “fields” that we want to keep. These “fields” are the pieces that cut broke the input into. We refer to them like elements of an array, though the first element is at position 1 instead of 0. The example above asks for the first, third, fifth, sixth, and eighth fields, which produced the output shown. When the delimiter is a space, the fields are essentially words (plus surrounding punctuation).
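Two details of field selection are easy to miss, so here is a minimal sketch of them. The input strings are invented for illustration. First, cut always emits the selected fields in input order, whatever order they are listed in. Second, a line that does not contain the delimiter at all is passed through unchanged (which is why the blank lines of the poem survive), unless the -s flag is given.

```shell
# Hypothetical one-line input, used only for illustration.
words='one two three four'

# Listing fields out of order does not reorder them: this prints "one three".
printf '%s\n' "$words" | cut -d ' ' -f 3,1

# A line with no delimiter in it is printed whole by default: prints "nodelimiter".
printf '%s\n' 'nodelimiter' | cut -d ' ' -f 2

# With -s, lines that lack the delimiter are suppressed: prints nothing.
printf '%s\n' 'nodelimiter' | cut -d ' ' -s -f 2
```

If the fields really need to be reordered, a different tool such as awk is the usual choice.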

We don’t always need to be so verbose though. The fields can be specified using several different syntaxes:

  • -f N- takes fields from the Nth field onward
  • -f -N takes fields through the Nth field (inclusive)
  • -f N-M takes fields from the Nth through the Mth

These can all be combined with comma separation, so you can do:

$ cut -d ' ' -f -3,5,7- < thomas.txt
Do not go into good night,
Old age should and at close of day;
Rage, rage against dying the light.

Though wise men their know dark is right,
Because their words forked lightning they
Do not go into good night.

Good men, the wave crying how bright
Their frail deeds have in a green bay,
Rage, rage against dying the light.

Wild men who and the sun in flight,
And learn, too they it on its way,
Do not go into good night.

Grave men, near who with blinding sight
Blind eyes could like and be gay,
Rage, rage against dying the light.

And you, my there the sad height,
Curse, bless, me with fierce tears, I pray.
Do not go into good night.
Rage, rage against dying the light.

Okay, so when would I actually use this?

Well, one thing I end up doing is manipulating CSV data in my terminal. Let’s say that we have this example CSV file called example.csv:

tool,source,language
cut,coreutils,c
tr,coreutils,c
sponge,moreutils,c
rg,ripgrep,rust
fzf,fzf,go

What if we want to extract only the tool name and language from this file into a new file? We could open it in some kind of spreadsheet editor and delete the center column, but what if this file is huge or we don’t have a spreadsheet editor available? Cut makes this simple:

$ cut -d ',' -f 1,3 < example.csv
tool,language
cut,c
tr,c
sponge,c
rg,rust
fzf,go

Note that the delimiter (-d flag) is now a comma instead of a space. We could use this technique to take any number of columns from the file. It’s also quite fast, even on huge files, as cut streams the data one line at a time rather than loading the whole file first.
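To actually land the result in a new file, redirect stdout. The sketch below recreates example.csv in a scratch directory so that it runs on its own; the output name tools.csv and the quoted record at the end are hypothetical. It also shows one caveat: cut knows nothing about CSV quoting, so it splits on commas inside quoted fields too.

```shell
# Work in a scratch directory so no existing file is overwritten.
workdir=$(mktemp -d)

# Recreate example.csv from above.
cat > "$workdir/example.csv" <<'EOF'
tool,source,language
cut,coreutils,c
tr,coreutils,c
sponge,moreutils,c
rg,ripgrep,rust
fzf,fzf,go
EOF

# cut also accepts a file name as an argument; redirect stdout to keep the result.
# tools.csv is a hypothetical output name.
cut -d ',' -f 1,3 "$workdir/example.csv" > "$workdir/tools.csv"
cat "$workdir/tools.csv"

# Caveat: cut splits on every comma, even inside a quoted CSV field.
# The record below is made up for illustration.
printf '%s\n' '"Smith, Jane",dev,rust' | cut -d ',' -f 1,3
```

The last command prints `"Smith,dev` rather than the name and the language, so for CSV files with quoted fields a real CSV parser is the safer choice.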

Cut can also take specific characters or bytes instead of delimited fields. I use these features less often, so I won’t cover them now. Perhaps in a future post.

