AWK: wrap lines to 72 characters
$ awk 'length > 72' {HOW TO PRINT THE LINEs IN PCS?} msg
i.e. I want it to add \n after 72 chars and continue, so initially you may need to remove all single \n characters and then add them back. It may be easier with another tool, but let’s give awk a try.
[Update]
Williamson provided the right answer, but some help is needed to read it. I break the problem into parts with simpler examples below.
- Why does the code below print \t in both cases? Shouldn’t gsub substitute things? x is a dummy file. There is also an odd 0 at the end.
- Attacking the lines line = $0 \n more = getline \n gsub("\t"," ") in Williamson’s reply: line apparently gets the whole stdout while more gets the popped value of $0, right?
Code to part 1
$ gawk '{ hallo="tjena\t tjena2"; gsub("\t"," "); }; END {print hallo; gsub("\t", ""); hallo=hallo gsub("\t",""); print hallo }' x
tjena tjena2
tjena tjena20
Not using awk
I understand this may just be one part of a larger problem you are trying to solve using awk, or simply an attempt to understand awk better, but if you really just want to keep your line length to 72 columns, there is a much better tool.
The fmt tool was designed specifically with this in mind:
fmt --width=72 filename
fmt will also try hard to break the lines in reasonable places, making the output nicer to read. See the info page for more details about what fmt considers “reasonable places.”
Awk is a Turing-complete language, and not a particularly obfuscated one, so it’s easy enough to truncate lines. Here’s a straightforward imperative version.
awk -v WIDTH=72 '
{
    while (length>WIDTH) {
        print substr($0,1,WIDTH);
        $0=substr($0,WIDTH+1);
    }
    print;
}
'
If you want to break lines between words, you can code it up in awk, but recognizing words is non-trivial (for reasons having more to do with natural languages than algorithmic difficulty). Many systems have a utility called fmt that does just that.
Here is an AWK script that wraps long lines and re-wraps the remainders as well as short lines:
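awk -v WIDTH=72 '
{
    gsub("\t"," ")                  # turn tabs into spaces
    $0 = line $0                    # prepend the remainder carried over so far
    while (length <= WIDTH) {       # too short: pull in more input
        line = $0
        more = getline              # 1 if a record was read, 0 at end of input
        gsub("\t"," ")
        if (more)
            $0 = line " " $0
        else {
            $0 = line
            break
        }
    }
    while (length >= WIDTH) {       # emit full-width pieces
        print substr($0,1,WIDTH)
        $0 = substr($0,WIDTH+1)
    }
    line = $0 " "                   # carry the remainder to the next record
}
END {
    print
}
'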
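On the second question in the update: line does not receive the whole input. line = $0 saves only the current record; the plain getline then reads the next record into $0 and returns 1 on success, 0 at end of input and -1 on error, and that return value is what more holds. A minimal sketch, where the function name show_getline and the two-line input are made up for this illustration:

```shell
# show_getline is a hypothetical helper name used only for this illustration.
show_getline() {
    awk '
    NR == 1 {
        line = $0          # save the current record
        more = getline     # read the next record into $0; returns 1, 0 (EOF) or -1 (error)
        print "line=" line, "more=" more, "$0=" $0
    }'
}

printf 'first\nsecond\n' | show_getline
# prints: line=first more=1 $0=second
```

With a one-line input, getline hits end of input, returns 0 and leaves $0 untouched, which is the case the script above handles in its else branch.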
There is a Perl script available on CPAN which does a very nice job of reformatting text. It’s called paradj (individual files). In order to do hyphenation, you will also need TeX::Hyphen.
SWITCHES
--------
The available switches are:
--width=n (or -w=n or -w n)     Line width is n chars long
--left (or -l)                  Output is left-justified (default)
--right (or -r)                 Output is right-justified
--centered (or -c)              Output is centered
--both (or -b)                  Output is both left- and right-justified
--indent=n (or -i=n or -i n)    Leave n spaces for initial indention (defaults to 0)
--newline (or -n)               Insert blank lines between paragraphs
--hyphenate (or -h)             Hyphenate word that doesn't fit on a line
Here is a diff of some changes I made to support a left-margin option:
12c12
< my ($indent, $newline);
---
> my ($indent, $margin, $newline);
15a16
> "margin:i" => \$margin,
21a23
> $margin = 0 if (!$margin);
149a152
> print " " x $margin;
187a191,193
> print "--margin=n (or -m=n or -m n) Add a left margin of n ";
> print "spaces\n";
> print " (defaults to 0)\n";
Here is an Awk function that breaks on spaces:
function wrap(text,   q, y, z) {
    while (text) {
        q = match(text, / |$/); y += q
        if (y > 72) {
            z = z RS; y = q - 1
        }
        else if (z) z = z FS
        z = z substr(text, 1, q - 1)
        text = substr(text, q + 1)
    }
    return z
}
Surprisingly this is more performant than fold or fmt.
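One possible way to call the function, as a sketch: apply it to every input line and print the result. The shell wrapper name wrap72 and the sample input (twenty copies of "word" on one line) are made up for this example; the awk function is the one shown above, with comments added.

```shell
# wrap72 is a hypothetical wrapper name; the awk function is the one from the answer.
wrap72() {
    awk '
    function wrap(text,   q, y, z) {
        while (text) {
            q = match(text, / |$/); y += q   # q: position of the next space, or end of text
            if (y > 72) {
                z = z RS; y = q - 1          # word does not fit: start a new output line
            }
            else if (z) z = z FS             # otherwise separate words with a space
            z = z substr(text, 1, q - 1)     # append the word
            text = substr(text, q + 1)       # drop the word and the space after it
        }
        return z
    }
    { print wrap($0) }' "$@"
}

# One 99-character line made of 20 words; it comes back as two lines.
awk 'BEGIN { for (i = 1; i <= 20; i++) printf "word%s", (i < 20 ? " " : "\n") }' | wrap72
```

Note that the function only breaks at single spaces, so a word longer than 72 characters is left on a line of its own rather than being split.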
You asked why the awk code emitted tabs and where the zero came from.
- The code does not modify the hallo string with the gsub() calls. With two arguments, gsub() acts on $0. To actually modify the hallo variable, use gsub(..., ..., hallo).
- You get the zero at the end of the string because gsub() returns the number of substitutions made, and at one point you append this number to the value of hallo.
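Both points can be checked with a small sketch. The function name gsub_demo is made up; the string is the one from the question.

```shell
# gsub_demo is a hypothetical name; the string mirrors the one in the question.
gsub_demo() {
    awk 'BEGIN {
        hallo = "tjena\t tjena2"
        n = gsub("\t", " ", hallo)   # third argument: operate on hallo instead of $0
        print hallo                  # the tab has been replaced by a space
        print n                      # gsub() returned the number of substitutions
    }'
}

gsub_demo
```

Keeping the return value in its own variable (n here) avoids accidentally concatenating the count onto the string.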
I’m aware of at least three utilities that are specifically for wrapping and formatting text paragraphs:
- fold, “filter for folding lines”, which is a standard POSIX utility. It simply inserts newlines and does not reflow text.
- fmt, “simple text formatter”, which is also often installed on Unix systems by default and is a fair bit smarter than fold when it comes to reflowing paragraphs.
- par, “filter for reformatting paragraphs”, which has additional capabilities for detecting paragraph prefixes and suffixes (such as a text with an ASCII box around it, or comments in a bit of source code), and handles indentation and hanging indents a fair bit better than fmt.
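To illustrate what "simply inserts newlines" means for fold, here is a small sketch that uses only its two POSIX options, -w and -s. The sample strings and the width of 10 are made up to keep the output short.

```shell
# Hard fold: break exactly every 10 columns, even in the middle of a word.
printf 'abcdefghijklmnopqrstuvwxyz\n' | fold -w 10
# abcdefghij
# klmnopqrst
# uvwxyz

# -s: break after the last blank that still fits within the width instead.
printf 'aaa bbb ccc ddd\n' | fold -s -w 10
```

Neither form joins short lines together, which is the reflowing that fmt and par add.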
Using gensub (a gawk extension), in order to get fold semantics, you could run something along the lines of
awk '{printf "%s", gensub("(.{0,72})","\\1\n","g")}'
It’s hard to say without specific requirements and sample input/output but this simplistic approach might be what is wanted (using 5 instead of 72 as the default line width to make test results clearer):
$ cat tst.awk
BEGIN {
    wid = (wid ? wid : 5)
}
{
    rec = rec $0
    while ( length(rec) > wid ) {
        print substr(rec,1,wid)
        rec = substr(rec,wid+1)
    }
}
END {
    if ( rec != "" ) {
        print rec
    }
}
$ seq 9 | awk -f tst.awk
12345
6789
$ seq 9 | awk -v wid=4 -f tst.awk
1234
5678
9
If your input can contain tabs, then I recommend running it through pr -e -t first to replace them with the corresponding number of blanks; otherwise just add gsub(/\t/," ") or whatever substitution you think is appropriate immediately above the rec = rec $0 line.