If you’ve been using Linux for any length of time, you’ve likely used sed before.
Most of the time, you’ve seen it in the form of sed "s/find/replace/g", so you simply reach for it whenever there’s a replacement you want to do.
But sed stands for stream editor, and as a tool it can do more than just find and replace.
Note: I highly recommend pulling up man sed to follow along.
Addresses
Let’s take a look at a particular section of man sed:
Addresses: Sed commands can be given with no addresses, in which case the command will be executed for all input lines; with one address, in which case the command will only be executed for input lines which match that address; or with two addresses, in which case the command will be executed for all input lines which match the inclusive range of lines starting from the first address and continuing to the second address. Three things to note about address ranges: the syntax is addr1,addr2 (i.e., the addresses are separated by a comma); the line which addr1 matched will always be accepted, even if addr2 selects an earlier line; and if addr2 is a regexp, it will not be tested against the line that addr1 matched.
We have two main takeaways from this:
- Sed commands operate only on the lines which are matched by the addresses given.
- If no address is given, sed commands will operate on all lines.
  - (This is what your typical sed 's/foo/bar/g' does!)
There are a few ways to specify addresses – for example, the addr1,addr2 method above. The others are described later in the manual.
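As a quick illustration of the other address forms (a line number, a regular expression, and $ for the last line), here is a minimal sketch. The four-line input and the variable names are made up for this example.

```shell
# Hypothetical four-line input, generated inline
input=$(printf '%s\n' alpha beta gamma delta)

# A line number as the address: print only line 2
by_number=$(printf '%s\n' "$input" | sed -n '2p')

# A regular expression as the address: print lines starting with "g"
by_regex=$(printf '%s\n' "$input" | sed -n '/^g/p')

# A range from the first line matching /beta/ through the last line ($)
by_range=$(printf '%s\n' "$input" | sed -n '/beta/,$p')

printf '%s\n' "$by_number" "$by_regex" "$by_range"
```

Each command uses -n together with p, so only the addressed lines are printed.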
Take a look at this file:
$ cat file
# List of directories to remember

/home/me/Documents/
/home/me/Downloads/
/tmp
~other/Documents
~/Repos/snippets
~
/usr/share/nvim/runtime
# TODO: Add more
sed '7s:~:/home/me:': Here, we do your typical replacement, but only on the seventh line. That line will be changed to /home/me/Repos/snippets, but the following line will remain untouched.
The d (delete) and p (print) commands go hand-in-hand with addresses.
sed '1,2d;$d': Here, we strip the leading comment, the empty line, and the last-line comment, leaving us with just the list:
$ sed '1,2d;$d' file
/home/me/Documents/
/home/me/Downloads/
/tmp
~other/Documents
~/Repos/snippets
~
/usr/share/nvim/runtime
sed -n '3,9p': This does the exact same thing as the previous command, but uses -n to suppress normal printing and instead prints only the lines in the range.
sed -n '3,9{/^~/d; p}': Now things are getting interesting. We specify an address range 3,9, but then use a compound command containing another address spec! This takes the lines in the range 3,9, deletes any which have a leading tilde, and prints the rest.
$ sed -n '3,9{/^~/d; p}' file
/home/me/Documents/
/home/me/Downloads/
/tmp
/usr/share/nvim/runtime
If you’re familiar with formal grammars, sed scripts can be roughly represented with the following production rules:
[script] -> [address][command]
[script] -> [command]
[command] -> {[script]}
[address] and [command] have production rules according to their respective sections in the manpage.
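Because a command can itself be a braced script, address blocks can nest. The following is a small sketch of that recursion; the fruit list and the variable name are invented for the example.

```shell
# Outer address: the line range 2,4. Inner address: the regex /^b/.
# Only lines that satisfy both get the substitution and are printed.
nested=$(printf '%s\n' apple banana cherry blueberry |
  sed -n '2,4{/^b/{s/^b/B/;p;};}')

printf '%s\n' "$nested"
```

Here the inner block runs only for lines 2 through 4 that also start with "b", so "banana" and "blueberry" are capitalized and printed while "apple" and "cherry" are dropped.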
Script files
As our sed commands become more complex, it may be beneficial to use a script file, as is common with bash, awk, or other languages. Here is the shebang I typically use:
#!/usr/bin/env -S sed -E -f
# Note: I use -E to enable extended regex
${d} # delete the last line
/foo/{ # replace o with a on lines with 'foo'
s/o/a/g
}
The form most script files take is similar to awk script files.
You’ll have match rules (or “addresses”, as sed calls them) at the top level, and then commands within a pair of {braces} under them.
There is something worth noting here: sed will concatenate all script files (-f) and script fragments (-e) together in the order given on the command line.
So if you run sed -e p -f file, then for each line sed reads, it will print the line unaltered before running any commands from file.
This lets you see exactly which line sed exits on while debugging.
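To see the concatenation order in action, here is a minimal sketch. The one-line script file is a throwaway created with mktemp, and its contents are an assumption made purely for the demonstration.

```shell
# Create a temporary script file containing a single substitution
script=$(mktemp)
printf '%s\n' 's/o/0/g' > "$script"

# -e p comes first, so each line is printed unaltered
# before the commands from the script file run on it
traced=$(printf '%s\n' foo boo | sed -e p -f "$script")
rm -f "$script"

printf '%s\n' "$traced"
```

Every input line shows up twice: once as read, and once after the script file has modified it.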
Commands
Now that we’ve learned about addresses, let’s address (heh) some of the other commands we can use.
t, T, b, :: Labels and branches!
If you aren’t familiar with labels from C or assembly, labels and branches are the low-level control flow constructs which higher-level constructs like if, else, and while compile to.
To run the equivalent of “while the line matches /regex/, remove the first instance of ‘foo’”, we would do something like this:
:top
/regex/{
s/foo// # remove foo
b top # branch to top to check again
}
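A concrete instance of that loop shape, as a sketch: collapse runs of spaces into a single space by repeating a substitution for as long as the line still contains two spaces in a row. The input string and the variable name are made up for the example.

```shell
# While the line matches /  / (two spaces), replace the first
# double space with a single one and branch back to :top
squeezed=$(printf '%s\n' 'a    b  c' | sed '
:top
/  /{
s/  / /
b top
}')

printf '%s\n' "$squeezed"
```

The loop ends on its own once the address no longer matches, at which point sed prints the line and moves on.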
Here’s an example of doing one operation on multiple addresses using labels:
sed '1bl; $bl; b; :l s/^/At beginning or end of input: /'
Or as a script:
#!/usr/bin/env -S sed -E -f
1{ b label } # first line: branch to :label
${ b label } # last line: branch to :label
b # branch to end of script (i.e.: move on to next line)
:label { # :label
s/^/At beginning or end of input: /
}
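The t/T commands are the same as b (branch), but instead of branching based on an address, they branch based on the result of a previous s (substitute) command: t branches if a substitution has succeeded since the last input line was read (or since the last t), and T, a GNU extension, branches if none has.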
#!/usr/bin/env -S sed -E -f
s:^~([^/][^/]*):/home/\1:; t label # ~user
s:^~:/home/me:; t label # ~
b
:label {
iNOTE: a tilde replacement was done here
}
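The t command is also handy for “repeat while the substitution keeps succeeding”. The sketch below inserts thousands separators into a number; the regex and the sample numbers are my own illustration, not part of the script above, and it assumes a sed that supports -E.

```shell
# Each pass inserts one comma before the last ungrouped run of three digits;
# t jumps back to :again only if the substitution changed something
grouped=$(printf '%s\n' 1234567 42 | sed -E '
:again
s/([0-9])([0-9]{3})($|,)/\1,\2\3/
t again
')

printf '%s\n' "$grouped"
```

Lines that never match, like 42, fall straight through because t does not branch when no substitution was made.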
h, H, g, G: These commands operate with a new object: the hold space.
The hold space is the only way sed has to keep state between lines.
- h, H: Copy/append the pattern space (the current line) into the hold space
- g, G: Copy/append the hold space into the pattern space
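A classic way to get a feel for the hold space is reversing the order of lines. This is a minimal sketch with made-up input:

```shell
# 1!G  on every line but the first, append the hold space to the pattern space
# h    copy the result back into the hold space
# $p   on the last line, print what has accumulated
reversed=$(printf '%s\n' a b c | sed -n '1!G;h;$p')

printf '%s\n' "$reversed"
```

Each new line is placed in front of everything held so far, so the hold space carries the reversed text from one line to the next.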
Instead of working on some Lorem ipsum file, let’s try and solve a problem with these commands. Maybe we want to mute a certain application in PulseAudio.
Before we break down the commands, let’s take a snippet of pacmd list-sink-inputs to see what we’re after.
$ pacmd list-sink-inputs
2 sink input(s) available.
index: 43
driver: <protocol-native.c>
[...]
properties:
media.role = "music"
media.name = "Spotify"
application.name = "Spotify"
[...]
index: 500
driver: <protocol-native.c>
[...]
properties:
media.name = "AudioStream"
application.name = "AudioIPC Server"
native-protocol.peer = "UNIX socket client"
We can find a sink input’s index number under the index: field easily enough with sed -n '/index:/{s/.*: //p}', but this will give us all sink-input indices.
Maybe we just want one of them… maybe just the one which Spotify is using.
Well, we can hold the sink index with h, and then print it when we find Spotify elsewhere.
So let’s do that:
# using GNU extensions
pacmd list-sink-inputs | sed -n -e 's/^[[:space:]]*index: //; T; h'
# using POSIX sed commands only:
pacmd list-sink-inputs | sed -n -e 's/^[[:space:]]*index: //; tl; b; :l h'
Here, we use the T command (or the t and b commands, in the POSIX case) to skip the h (hold) command if no replacement was done. Our replacement simply removes the text before the index number.
Now we have the sink index in the hold space, but we need to print it out at some point.
This is where the g (get-held) command comes into play:
pacmd list-sink-inputs | sed -n '
s/^[[:space:]]*index: // # remove the "index:", leaving just the index number in the pattern space
Tl # jump to label if we didn't find a new index
h # the replacement was successful, move the index number into the hold space
b # branch to the end of the script (i.e.: start parsing the next line)
:l /"Spotify"/{
g # copy the hold space into the pattern space
p # print the pattern space
q # quit
}'
Now, instead of branching to the end of the script if no match was found, we keep going.
This way we can try to find our application name.
Finally, we capture sed’s output and pass it to pacmd: pacmd set-sink-input-mute $(pacmd list-sink-inputs | sed -n [...] ) 1
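If you don’t have PulseAudio at hand, the same idea can be tried against canned text. The sketch below is a POSIX-only variant (t and b instead of T); the sample_sink_inputs function is a hypothetical stand-in for pacmd list-sink-inputs, with its output abbreviated from the listing above.

```shell
# Hypothetical stand-in for `pacmd list-sink-inputs` (abbreviated sample output)
sample_sink_inputs() {
  cat <<'EOF'
2 sink input(s) available.
    index: 43
        driver: <protocol-native.c>
        properties:
                media.name = "Spotify"
                application.name = "Spotify"
    index: 500
        driver: <protocol-native.c>
        properties:
                media.name = "AudioStream"
                application.name = "AudioIPC Server"
EOF
}

# If the line is an index line, hold its number; otherwise, when "Spotify"
# shows up, fetch the held index, print it, and quit
spotify_index=$(sample_sink_inputs | sed -n '
s/^[[:space:]]*index: //
t found
/"Spotify"/{
g
p
q
}
b
:found
h
')

printf '%s\n' "$spotify_index"
```

The q command stops at the first match, so only the index of the sink input that mentions Spotify is printed.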
i, a: Insert and append text. These commands will insert lines before or after the current line.
They can be difficult to use at first, because the text they insert extends to the next newline, including characters like ;, {, and }.
So sed -n '/^~/{i/home/me; p}' will error, since there is no closing brace.
/^~/{i/home/me; p}
/^~/ # address: leading tilde
{ # opening {
i # insert command
/home/me; p} # text to insert
Instead, you have to use a newline or a new script fragment to indicate where the insert command ends: sed -e '/^~/{i/home/me' -e 'p}'.
The same change must be made with a (append) commands.
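Here is that fix as a runnable sketch. It relies on the one-line form of i, which is a GNU sed extension, and the two input lines are made up for the example.

```shell
# The first -e fragment ends right after the inserted text, so the
# p and the closing brace in the second fragment are parsed as commands
inserted=$(printf '%s\n' '~/Repos/snippets' /tmp |
  sed -n -e '/^~/{i/home/me' -e 'p;}')

printf '%s\n' "$inserted"
```

Only the line with the leading tilde is printed, preceded by the inserted /home/me line.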
Conclusion
Sed is often relegated to a global replacement tool in most users’ toolboxes, and for good reason: the s/// command is the most powerful line-wise command sed has to offer.
But sed is in fact a limited scripting language unto itself, with control flow primitives and tools to insert and hold text.
However, these advanced features are often overlooked in favor of other languages, most notably awk (whose scripts take on a similar shape of [address]{ [commands] }).
Still, sed is much more versatile than most users give it credit for, and its efficiency and speed can make it a strong candidate for any kind of stream filtering.
It is the stream editor, after all.