Awk book review: Awk One-Liners Explained by Peteris Krumins
In addition to being a thorough, fair review, it’s a great explanation of why and how Awk is so helpful.
A history lesson
As a target for writing code, the UNIX software ecosystem is strangely fascinating. Among its features, one standout is its reliance on text as a basic information storage unit. This sentence looks dumb out of context, because of course. But in the age of rich media and omnipresent design, the simplicity of carrying information on text alone is refreshing, comforting in its minimalism (doh! I’ve managed to bring up design; must resist). UNIX was also designed on the principle that software designed for it should be made to manipulate text streams in such a way that programs could be connected like stations on a train track. The text train would go out from the first station, stop at each station for some processing and arrive at the last station rich with the solution to some problem. It will brand me as stuck in the past, but I am utterly seduced by this paradigm of powerfully plain, point-free, simple-step-by-simple-step data processing.
This is why I jumped on the occasion to learn about the awk utility when I heard of Peteris Krumins’ e-book Awk One-Liners Explained. As curious as I am about text processing, I mostly do it with small Ruby programs and some simple uses of Sed. I know that Ruby, a language that I both love and hate (but mostly love), borrows from Perl, which borrows from Awk, so I took it as a history lesson, at least.
A surprisingly useful tool
The book is essentially a cookbook, as the author teaches Awk by example. He starts with simple examples that show the basics and progressively ramps up the complexity, showing clever uses of various Awk tools. The focus of the cookbook is on one-liners, Awk programs that fit easily and readably on a single line of code. This is highly relevant, as I’ve never seen or heard of a very complex Awk program: the few invocations of Awk I’ve seen in shell scripts indeed carried all the editing code inline. What the author suggests is that more complex tasks are better accomplished by stringing Awk one-liners in a stream processing sequence, an idiom of UNIX programming that can integrate other tools (e.g. sort, cut) to properly and elegantly solve the problem.
It is an excellent Awk tutorial. The author pushes no introductory theory or generality and dives right into the first example. As he carries on into the text, the general principles behind Awk programs are spelled out for the reader. This teaching approach works wonders and, compared to a classical exposition, is well suited to showing Awk idiosyncrasies. The style and structure of the book encourage the reader to try the examples as he reads, which facilitates the assimilation of the material (in my opinion, as far as coding goes, practical knowledge is competence and theoretical knowledge is warm wind) and allows the reader to better feel when he grows tired and stops learning effectively.
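To make the idea of stringing one-liners together with tools like sort and cut a bit more concrete, here is a small sketch of my own (it is not taken from the book). The sample data and its name:team:score layout are made up for the illustration.

```shell
# Hypothetical sample data: name:team:score
data='alice:eng:120
bob:ops:90
carol:eng:150
dave:ops:70'

# Station 1: awk keeps the "eng" rows and puts the score first.
# Station 2: sort orders the rows by score, highest first.
# Station 3: cut keeps only the name.
result=$(printf '%s\n' "$data" |
  awk -F: '$2 == "eng" { print $3 ":" $1 }' |
  sort -t: -k1,1nr |
  cut -d: -f2)

printf '%s\n' "$result"
```

Each station does one small thing, and the Awk part stays a readable one-liner.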
The book argues strongly in favor of the usefulness of the Awk text processor, as well as of the simplicity and readability of Awk programs. While the corresponding Sed one-liners would likely be more terse and compact, the C-like syntax has a flow and visual structure that makes Awk one-liners easy to understand, however concise they remain. Given that Awk is a standard POSIX utility, it is deployed on all the UNIX systems that I care about (Ruby is not installed by default in even the most recent Ubuntu releases, which has frustrated me a few times), and so it is a very welcome addition to my common scripting tools.
But the man page says there’s more!
The main criticism I address to Krumins’ otherwise excellent book concerns aspects that I would have liked to see covered in the book, or at least covered completely. One of the staples of modern programming languages is iteration through lists or arrays using specific statements (e.g. for elem in array...). Awk has such an iterative form, which is relevant for one-liner programs and helps position Awk among the inspirations for modern languages. Other features of Awk that could have made for interesting one-liners are the next and nextfile statements, as well as the getline statement. The latter deserves treatment, given that it looks useful for combining two text streams into one and that it is as confusing as Canadian tax law.
In addition, while the book makes for an excellent Awk tutorial, it is difficult to use as a reference for the Awk language. Maybe it would serve this purpose better if the author added more page links to the index of the book. In his defense, though, Krumins publishes a free Awk cheat sheet, which does make for a good quick reference. And that’s not counting the multiple other references that can be found on the web with just a bit of searching, many of which are of much poorer quality than the author’s material, might I add.
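For readers who have not met the iterative form mentioned above, here is a minimal sketch of my own (again, not from the book). The word list is invented, and the trailing sort is only there because Awk does not specify the order in which for (key in array) visits the keys.

```shell
# Hypothetical input: one word per line.
words='apple
pear
apple
fig
pear
apple'

# Count each word in an associative array, then walk the array
# with for (key in array) once all input has been read.
counts=$(printf '%s\n' "$words" |
  awk '{ seen[$0]++ } END { for (w in seen) print w, seen[w] }' |
  sort)

printf '%s\n' "$counts"
```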
However, Peteris Krumins set out to explain a bunch of one-liners, and he did so superbly. I have had the history lesson I had settled for, and came away with an exciting new tool as a bonus. Moreover, the book is quite short, so it may be read to satisfaction in one little afternoon. I recommend it.
Update #1
Peteris Krumins responded to my criticisms through Twitter, and I must retract one of my points. In my defense, it has been two or three weeks since I read the book, so hey. But the author does cover the next and getline statements. As for the latter, though, the alternative forms of the statement, which read from files other than the current input file or even from pipes, were not covered. These alternate forms are confusing: some change the $N fields, others don’t; some forms advance NR, others don’t. The documentation of GNU Awk, for instance, goes to great lengths to delineate the cases, and a cheat sheet would come in handy. And it is these forms that could allow the combination of two data streams in Awk. Here’s a one-liner that alternates the lines from files named file1 and file2:
    awk '{ print; if(getline < "file2") print; } END { while(getline < "file2") print; }' file1
The first block prints all lines from file1; then, calling getline < "file2" fills $0 with the next line from file2, which then gets printed. As a function, getline returns 1 on successful reading of a record, 0 when reaching the end of the file and -1 on error (meaning my one-liner does not handle errors correctly, but it’s not the point of one-liners, is it?). Thus, on the one hand, if file2 has fewer lines than file1, only the lines of the latter keep being printed. On the other hand, if file2 is longer, the printing of its remainder is handled by the END block.
Long digression, but in the end, I mean to insist on one thing: this Awk book is authoritative, concise and highly readable. I enjoyed it to the end.
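For the record, the error handling that the one-liner above skips can be sketched as follows. This is a variant of my own, not something from the book: it compares the result of getline with "> 0", so that both the end of file (0) and an error (-1) stop the reading, and it uses the getline var < file form, which fills a variable (here called line) instead of $0. The scratch files and their contents are made up.

```shell
# Scratch files standing in for file1 and file2 (contents are made up).
tmp=$(mktemp -d)
printf 'a1\na2\na3\n' > "$tmp/file1"
printf 'b1\n' > "$tmp/file2"

# Interleave as before, but only print when getline really read a record.
interleaved=$(cd "$tmp" && awk '
  { print; if ((getline line < "file2") > 0) print line }
  END { while ((getline line < "file2") > 0) print line }
' file1)

printf '%s\n' "$interleaved"
rm -rf "$tmp"
```

With this form, a missing file2 simply leaves the lines of file1 printed on their own, whereas a bare while(getline < "file2") would treat the -1 as true and never stop.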
Update #2
In yet another conversation with Mr. Krumins, he reminded me that the premise of his book was to explain the bunch of one-liners originally published by Eric Pement. Thus, it is clear that from the beginning, he had no intention of covering the full extent of Awk features, which puts my no. 1 criticism somewhat beside the point. Indeed. But if this misguided comment put one more nice one-liner about getline out there (see update #1), I guess we can all agree that some good came out of it.
