Intermediate Sed Usage

If you’ve been using Linux for any length of time, you’ve likely used sed before. Most of the time, you’ve seen it in the form of sed "s/find/replace/g", so you reach for it whenever there’s a replacement you want to do.

But sed stands for stream editor, and as a tool it can do more than just find and replace.

Note: I highly recommend pulling up man sed to follow along.

Addresses

Let’s take a look at a particular section of man sed:

Addresses: Sed commands can be given with no addresses, in which case the command will be executed for all input lines; with one address, in which case the command will only be executed for input lines which match that address; or with two addresses, in which case the command will be executed for all input lines which match the inclusive range of lines starting from the first address and continuing to the second address. Three things to note about address ranges: the syntax is addr1,addr2 (i.e., the addresses are separated by a comma); the line which addr1 matched will always be accepted, even if addr2 selects an earlier line; and if addr2 is a regexp, it will not be tested against the line that addr1 matched.

We have two main takeaways from this:

  • Sed commands operate only on the lines which are matched by the addresses given.
  • If no address is given, sed commands will operate on all lines.
    • (This is what your typical sed 's/foo/bar/g' does!)

There are a few ways to specify addresses – for example, the addr1,addr2 method above. The others are later in the manual.
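As a rough sketch of the address forms the manual describes (a line number, $ for the last line, a /regex/, a two-address range, and ! to negate an address), here is each one run against a small made-up input. The five words are purely illustrative and not part of the original example.

```shell
# Made-up five-line input, used only for illustration.
input=$(printf '%s\n' alpha beta gamma delta epsilon)

# Line number: print only line 2.
by_number=$(printf '%s\n' "$input" | sed -n '2p')

# $ addresses the last line of input.
last=$(printf '%s\n' "$input" | sed -n '$p')

# /regex/: print every line containing "ta".
by_regex=$(printf '%s\n' "$input" | sed -n '/ta/p')

# Range: from the first line matching /beta/ through line 4.
range=$(printf '%s\n' "$input" | sed -n '/beta/,4p')

# ! negates the address: delete everything except line 1.
negated=$(printf '%s\n' "$input" | sed '1!d')

printf '%s\n' "$by_number" "$last" "$by_regex" "$range" "$negated"
```

Mixing a regex and a line number in one range, as in /beta/,4, is allowed; the two addresses do not have to be of the same kind.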

Take a look at this file:

$ cat file
# List of directories to remember

/home/me/Documents/
/home/me/Downloads/
/tmp
~other/Documents
~/Repos/snippets
~
/usr/share/nvim/runtime
# TODO: Add more
  • sed '7s:~:/home/me:': Here, we do your typical replacement, but only on the seventh line. That line will be changed to /home/me/Repos/snippets, but the following line will remain untouched.

The delete and print commands go hand-in-hand with addresses.

  • sed '1,2d;$d': Here, we strip the leading comment, the empty line, and the last-line comment, leaving us with just the list:
$ sed '1,2d;$d' file
/home/me/Documents/
/home/me/Downloads/
/tmp
~other/Documents
~/Repos/snippets
~
/usr/share/nvim/runtime
  • sed -n '3,9p': This produces the exact same output as the previous command, but uses -n to suppress normal printing and then prints only lines 3 through 9.
  • sed -n '3,9{/^~/d; p}': Now things are getting interesting. We specify an address range 3,9, but then run a compound command that uses another address spec! This will take lines 3 through 9, delete any which have a leading tilde, then print the rest.
$ sed -n '3,9{/^~/d; p}' file
/home/me/Documents/
/home/me/Downloads/
/tmp
/usr/share/nvim/runtime
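If you want to check these claims yourself, here is a small sketch that recreates the example file in a temporary location and runs the three commands from the bullets above. It assumes GNU sed, since the third command puts } directly after p; the variable names are my own.

```shell
# Recreate the example file in a temporary location.
file=$(mktemp)
cat > "$file" <<'EOF'
# List of directories to remember

/home/me/Documents/
/home/me/Downloads/
/tmp
~other/Documents
~/Repos/snippets
~
/usr/share/nvim/runtime
# TODO: Add more
EOF

deleted=$(sed '1,2d;$d' "$file")            # drop lines 1-2 and the last line
printed=$(sed -n '3,9p' "$file")            # print only lines 3-9
no_tilde=$(sed -n '3,9{/^~/d; p}' "$file")  # lines 3-9 minus the tilde lines

# The first two approaches should produce identical output.
[ "$deleted" = "$printed" ] && echo "both commands agree"
printf '%s\n' "$no_tilde"
rm -f "$file"
```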

If you’re familiar with formal grammars, the overall shape of a sed script can be sketched with the following production rules:

  • [script] -> [address][command]
  • [script] -> [command]
  • [command] -> {[script]}
  • [address] and [command] have production rules according to their respective sections in the manpage.

Script files

As our sed commands start becoming complex, it may be beneficial to use a script file, as is common with bash, awk, or other languages. Here is the shebang I typically use:

#!/usr/bin/env -S sed -E -f
# Note: I use -E to enable extended regex
${d}    # delete the last line
/foo/{  # replace o with a on lines with 'foo'
   s/o/a/g
}

The form most script files take is similar to awk script files. You’ll have match rules (or “addresses”, as sed calls them) in the top level, and then commands within a pair of {braces} under them.

There is something worth noting here: sed will concatenate all script files (-f) and script fragments (-e) together in the order given on the command line. So if you run sed -e p -f file, then for each line sed reads, it will print the line unaltered before running any commands from file. Now you can see exactly which line sed exits on while debugging.
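Here is a minimal sketch of that debugging trick. The one-line script file and the two input words are made up for the demonstration.

```shell
# A throwaway script file containing a single substitution.
script=$(mktemp)
printf '%s\n' 's/o/0/g' > "$script"

# -e p comes first on the command line, so each line is printed untouched,
# then the commands from the file run and the result is auto-printed.
out=$(printf '%s\n' foo bar | sed -e p -f "$script")
printf '%s\n' "$out"
rm -f "$script"
```

Each input line shows up twice: once as read (foo, bar) and once after the script has run (f00, bar).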

Commands

Now that we’ve learned about addresses, let’s address (heh) some of the other commands we can use.

t,T,b,:: Labels and branches!

If you aren’t familiar with labels from C or assembly, labels and branches are the low-level control flow constructs which higher-level constructs like if, else, and while compile to.

To run the equivalent of “while the line matches /regex/, remove the first instance of ‘foo’”, we would do something like this:

:top
/regex/{
	s/foo//  # remove foo
	b top    # branch to top to check again
}
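To see a loop like this do something a single s///g cannot, here is a small sketch that strips empty parenthesis pairs until none are left. The input string is made up, and the label sits on its own line so the script does not depend on how a particular sed ends a label name.

```shell
# In a basic regex, ( and ) are literal characters, so /()/ matches "()".
# Each pass removes one pair; the loop repeats while the address still matches.
looped=$(printf '%s\n' 'a(()())b' | sed '
:top
/()/{
    s/()//
    b top
}')

# For contrast: one global substitution only removes the pairs it sees
# on a single left-to-right scan.
single_pass=$(printf '%s\n' 'a(()())b' | sed 's/()//g')

printf '%s\n' "$looped" "$single_pass"
```

The looped version ends up with ab, while the single pass leaves a()b behind, because the outer pair only becomes empty after the inner ones are gone.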

Here’s an example of doing one operation on multiple addresses using labels:

sed '1bl; $bl; b; :l s/^/At beginning or end of input: /'

Or as a script:

#!/usr/bin/env -S sed -E -f
1{ b label }  # first line: branch to :label
${ b label }  # last line: branch to :label
b             # branch to end of script (i.e.: move on to next line)
:label {      # :label (the space keeps the brace out of the label name)
    s/^/At beginning or end of input: /
}

The t and T commands branch like b does, but conditionally: t branches only if a substitute command has succeeded since the current line was read (or since the last t/T branch was taken), while T, a GNU extension, branches only if none has.

#!/usr/bin/env -S sed -E -f
s:^~([^/][^/]*):/home/\1:; t label # ~user
s:^~:/home/me:; t label            # ~
b
:label {
    iNOTE: a tilde replacement was done here
}
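Here is a sketch of the same idea run against three of the lines from the example file, so you can see where the note lands. To keep it portable I've assumed basic regular expressions instead of -E, put each label on its own line, and used the i\ form with the text on the following line; the label name note is my own.

```shell
out=$(printf '%s\n' '~other/Documents' '~/Repos/snippets' '/tmp' | sed '
s:^~\([^/][^/]*\):/home/\1:
t note
s:^~:/home/me:
t note
b
:note
i\
NOTE: a tilde replacement was done here
')
printf '%s\n' "$out"
```

The inserted text is written out immediately, so the note appears before each rewritten line, and /tmp passes through with no note at all.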

h, H, g, G: These commands operate with a new object: the hold space. The hold space is the only way sed has to keep state between lines.

  • h H: Copy/append the pattern space (the current line) into the hold space
  • g G: Copy/append the hold space into the pattern space
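As a small warm-up, here is a sketch that remembers the most recent section header of a made-up INI-style input and attaches it to every key that follows. The input and the " in " separator are invented for the example.

```shell
out=$(printf '%s\n' '[alpha]' 'x=1' '[beta]' 'y=2' | sed -n '
/^\[/{
    h
    d
}
/=/{
    G
    s/\n/ in /
    p
}')
printf '%s\n' "$out"
```

On a header line, h copies it into the hold space and d moves on to the next line. On a key line, G appends a newline plus the held header to the pattern space, and the substitution turns that newline into " in ", giving x=1 in [alpha] and y=2 in [beta].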

Instead of working on some Lorem ipsum file, let’s try and solve a problem with these commands. Maybe we want to mute a certain application in PulseAudio.

Before we break down the commands, let’s take a snippet of pacmd list-sink-inputs to see what we’re after.

$ pacmd list-sink-inputs
2 sink input(s) available.
    index: 43
    driver: <protocol-native.c>
[...]
    properties:
        media.role = "music"
        media.name = "Spotify"
        application.name = "Spotify"
[...]
    index: 500
    driver: <protocol-native.c>
[...]
    properties:
        media.name = "AudioStream"
        application.name = "AudioIPC Server"
        native-protocol.peer = "UNIX socket client"

We can find a sink-input’s index number under the index: field easily enough: sed -n '/index:/{s/.*: //p}', but this will give us all sink indices. Maybe we just want one of them… maybe just the one which Spotify is using. Well, we can hold the sink index with h, and then print it when we find Spotify elsewhere.

So let’s do that:

# using GNU extensions
pacmd list-sink-inputs | sed -n -e 's/^[[:space:]]*index: //; T; h'
# using POSIX sed commands only:
pacmd list-sink-inputs | sed -n -e 's/^[[:space:]]*index: //; tl; b; :l h'

Here, we use the T command (or the t and b commands, in the POSIX case) to skip the hold command if no replacement was done. Our replacement simply removes the text before the index number.

Now we have the sink index in the hold space, but we need to print it out at some point. This is where the g (get) command comes into play:

pacmd list-sink-inputs | sed -n '
    s/^[[:space:]]*index: //  # remove the "index:", leaving just the index number in the pattern space
    Tl            # jump to label if we did not find a new index
    h             # the replacement was successful, copy the index number into the hold space
    b             # branch to the end of the script (i.e.: start parsing next line)
    :l /"Spotify"/{
        g   # copy held space into pattern space
        p   # print the pattern space
        q   # quit
        q   # quit
    }'

Now, instead of branching to the end of the script if no match was found, we keep going. This way we can try and find our application name.

Finally, we capture sed’s output and pass it to pacmd: pacmd set-sink-input-mute "$(pacmd list-sink-inputs | sed -n [...] )" 1
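If you don't have PulseAudio running, the lookup can still be tried against canned text. In this sketch, list_sink_inputs is a hypothetical stand-in for pacmd list-sink-inputs that replays a trimmed copy of the snippet above, and the sed script is a variant of the one above that uses only t and b (no T), with labels on their own lines.

```shell
# Hypothetical stand-in for `pacmd list-sink-inputs`, replaying sample text.
list_sink_inputs() {
    cat <<'EOF'
2 sink input(s) available.
    index: 43
    driver: <protocol-native.c>
    properties:
        media.role = "music"
        media.name = "Spotify"
        application.name = "Spotify"
    index: 500
    driver: <protocol-native.c>
    properties:
        media.name = "AudioStream"
        application.name = "AudioIPC Server"
EOF
}

# On an index line: strip the prefix, then branch to :found and hold the number.
# On any other line: if it mentions "Spotify", fetch the held index, print, quit.
index=$(list_sink_inputs | sed -n '
    s/^[[:space:]]*index: //
    t found
    /"Spotify"/{
        g
        p
        q
    }
    b
    :found
    h
')
printf '%s\n' "$index"
```

With the sample text this prints 43, the index that appears before the first line mentioning Spotify.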


i, a: Insert and append text. These commands will insert lines before or after the current line. They can be difficult to use at first, because the text they insert extends to the next newline, including characters like ;, {, and }. So sed -n '/^~/{i/home/me; p}' will error, since sed never sees a closing brace.

/^~/{i/home/me; p}
/^~/                 # address: leading tilde
    {                # opening {
     i               # insert command
      /home/me; p}   # text to insert

Instead, you have to use a newline or use a new script fragment to indicate where to end the insert command: sed -e '/^~/{i/home/me' -e 'p}'.

The same change must be made with append commands.
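Here is a sketch of both commands inside one block, using the i\ and a\ forms where the text goes on the following line. That is the form POSIX specifies; writing the text on the same line as the command is a GNU extension. The two input lines and the appended text are made up.

```shell
out=$(printf '%s\n' '~/Repos' '/tmp' | sed '
/^~/{
    i\
/home/me
    a\
# end of tilde entry
}')
printf '%s\n' "$out"
```

The inserted text comes out before the matching line and the appended text after it, and because each text ends at its own newline, the closing brace is parsed as a brace again.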

Conclusion

Sed is often relegated to a global replacement tool in most users’ toolboxes, and for good reason: The s/// command is the most powerful line-wise command sed has to offer. But sed is in fact a limited scripting language unto itself, with control flow primitives and tools to insert and hold text.

However, these advanced features are often overlooked in favor of other languages, most notably awk (whose scripts take on a similar shape of [address]{ [commands] }).

Still, sed is much more versatile than most users give it credit for, and its incredible efficiency and speed can make it a strong candidate for any kind of stream filtering. It is the stream editor, after all.
