AWK: wrap lines to 72 characters

$ awk 'length > 72' {HOW TO PRINT THE LINEs IN PCS?} msg

That is, I want it to insert a \n after every 72 characters and continue, so you may first need to remove all single \ns and then add them back. It may be easier with another tool, but let’s give awk a try.

[Update]

Williamson provided the right answer, but I need some help reading it. I have broken the problem into parts with simpler examples below.

  1. Why does the code below print \t in both cases? Shouldn’t gsub substitute it? x is a dummy file; note the odd 0 at the end.

  2. Looking at the lines line = $0 \n more = getline \n gsub("\t"," ") in Williamson’s reply: line apparently gets the whole stdout while more gets the popped value of $0, right?

Code to part 1

$ gawk '{ hallo="tjena\t tjena2"; gsub("\t"," "); }; END {print hallo; gsub("\t", ""); hallo=hallo gsub("\t",""); print hallo }' x
tjena  tjena2
tjena  tjena20
Asked By: user2362

||

Not using awk

I understand this may just be one part of a larger problem you are trying to solve using awk or simply an attempt to understand awk better, but if you really just want to keep your line length to 72 columns, there is a much better tool.

The fmt tool was designed specifically with this in mind:

fmt --width=72 filename

fmt will also try hard to break the lines in reasonable places, making the output nicer to read. See the info page for more details about what fmt considers “reasonable places.”

Answered By: Steven D

Awk is a Turing-complete language, and not a particularly obfuscated one, so it’s easy enough to split lines at a fixed width. Here’s a straightforward imperative version.

awk -v WIDTH=72 '
{
    while (length>WIDTH) {
        print substr($0,1,WIDTH);
        $0=substr($0,WIDTH+1);
    }
    print;
}
'

If you want to break lines between words, you can code it up in awk, but recognizing words is non-trivial (for reasons having more to do with natural languages than algorithmic difficulty). Many systems have a utility called fmt that does just that.
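As a rough illustration of breaking between words in awk, here is a minimal sketch of greedy wrapping. It assumes that "words" are simply runs of non-blank characters, which sidesteps the natural-language issues mentioned above. The function name wrap_words and the variable names are made up for this example.

```shell
# Hypothetical helper: greedy word wrap, one input line at a time.
# Usage: wrap_words [width] < file   (width defaults to 72)
wrap_words() {
  awk -v width="${1:-72}" '
  {
    n = split($0, word, " ")      # a " " separator splits on runs of blanks
    line = ""
    for (i = 1; i <= n; i++) {
      if (line == "")
        line = word[i]            # first word on a fresh output line
      else if (length(line) + 1 + length(word[i]) <= width)
        line = line " " word[i]   # the word still fits after one space
      else {
        print line                # flush the full line, start a new one
        line = word[i]
      }
    }
    print line                    # remainder (an empty input line stays empty)
  }'
}

printf 'aaa bbb ccc\n' | wrap_words 7
```

A word longer than the width is printed on a line of its own rather than being split, and each input line is wrapped independently; joining short lines into paragraphs would need extra logic.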

Here is an AWK script that wraps long lines and re-wraps the remainders as well as short lines:

awk -v WIDTH=72 '
{
    gsub("\t"," ")
    $0 = line $0
    while (length <= WIDTH) {
        line = $0
        more = getline
        gsub("\t"," ")
        if (more)
            $0 = line " " $0
        else {
            # braces added so that break only runs once input is exhausted
            $0 = line
            break
        }
    }
    while (length >= WIDTH) {
        print substr($0,1,WIDTH)
        $0 = substr($0,WIDTH+1)
    }
    line = $0 " "
}

END {
    print
}
'
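To make the getline step in the script above easier to follow, here is a small sketch of what a plain getline does: it replaces $0 with the next input record and returns 1 on success, 0 at end of input, and -1 on error. The function name demo_getline and the sample input are only for illustration.

```shell
# Hypothetical demo: save the current record, then pull in the next one.
demo_getline() {
  printf 'one\ntwo\nthree\n' | awk '
  {
    line = $0          # copy of the record awk just read
    more = getline     # on success: returns 1 and overwrites $0
    if (more > 0)
      print line " + " $0
    else
      print line " (no more input)"   # returned 0: $0 was left untouched
  }'
}

demo_getline
```

So line holds only the one record that was current before the call, not the whole input, and more holds the return status of getline rather than any text.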

There is a Perl script available on CPAN which does a very nice job of reformatting text. It’s called paradj (individual files). In order to do hyphenation, you will also need TeX::Hyphen.

SWITCHES
--------
The available switches are:

--width=n (or -w=n or -w n)
    Line width is n chars long

--left (or -l)
    Output is left-justified (default)

--right (or -r)
    Output is right-justified

--centered (or -c)
    Output is centered

--both (or -b)
    Output is both left- and right-justified

--indent=n (or -i=n or -i n)
    Leave n spaces for initial indention (defaults to 0)

--newline (or -n)
    Insert blank lines between paragraphs

--hyphenate (or -h)
    Hyphenate word that doesn't fit on a line

Here is a diff of some changes I made to support a left-margin option:

12c12
< my ($indent, $newline);
---
> my ($indent, $margin, $newline);
15a16
>   "margin:i" => \$margin,
21a23
> $margin = 0 if (!$margin);
149a152
>     print " " x $margin;
187a191,193
>   print "--margin=n (or -m=n or -m n)  Add a left margin of n ";
>   print "spaces\n";
>   print "                                (defaults to 0)\n";
Answered By: Dennis Williamson

Here is an Awk function that breaks on spaces:

function wrap(text,   q, y, z) {
  while (text) {
    q = match(text, / |$/); y += q
    if (y > 72) {
      z = z RS; y = q - 1
    }
    else if (z) z = z FS
    z = z substr(text, 1, q - 1)
    text = substr(text, q + 1)
  }
  return z
}

Surprisingly this is more performant than fold or fmt.

Source

Answered By: Zombo

You asked why the awk code emitted tabs and where the zero came from.

  1. The code does not modify the hallo string with the gsub() calls. With two arguments, gsub() acts on $0. To actually modify the hallo variable, use gsub(..., ..., hallo).

  2. You get the zero at the end of the string because gsub() returns the number of substitutions made, and at one point you append this number to the value of hallo.
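Both points can be seen in a short sketch like the following. The function name demo_gsub is made up for this example, and the string is the one from the question.

```shell
# Hypothetical demo: the third argument to gsub() names the variable to edit.
demo_gsub() {
  awk 'BEGIN {
    hallo = "tjena\t tjena2"
    n = gsub("\t", " ", hallo)   # operate on hallo instead of $0
    print hallo                  # the tab is now a space
    print n                      # gsub() returns the number of substitutions
  }'
}

demo_gsub
```

Here the count is kept in a separate variable n instead of being appended to hallo, which is what produced the trailing 0 in the question.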

I’m aware of at least three utilities that are specifically for wrapping and formatting text paragraphs:

  1. fold, “filter for folding lines”, which is a standard POSIX utility. It simply inserts newlines and does not reflow text.

  2. fmt, “simple text formatter”, which is also often installed on Unix systems by default and a fair bit smarter than fold when it comes to reflowing paragraphs.

  3. par, “filter for reformatting paragraphs”, which has additional capabilities for detecting paragraph prefixes and suffixes (such as a text with an ASCII box around it, or comments in a bit of source code), and handles indentation and hanging indents a fair bit better than fmt.
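A quick way to see the difference between the first two is to feed them the same short line. This is only a sketch with a made-up sample string; par is left out because it is often not installed, and the exact line breaks chosen by fmt can vary between implementations.

```shell
sample='aaa bbb ccc'

# fold: hard break every 5 columns, even in the middle of a word
printf '%s\n' "$sample" | fold -w 5

# fold -s: break after the last blank that still fits in the width
printf '%s\n' "$sample" | fold -s -w 5

# fmt: reflow whole words to fit the requested width
printf '%s\n' "$sample" | fmt -w 8
```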

Answered By: Kusalananda

Using gensub (a gawk extension), in order to get fold semantics, you could run something along the lines of

awk '{printf "%s", gensub("(.{0,72})","\\1\n","g")}'
Answered By: JJoao

It’s hard to say without specific requirements and sample input/output but this simplistic approach might be what is wanted (using 5 instead of 72 as the default line width to make test results clearer):

$ cat tst.awk
BEGIN {
    wid = (wid ? wid : 5)
}
{
    rec = rec $0
    while ( length(rec) > wid ) {
        print substr(rec,1,wid)
        rec = substr(rec,wid+1)
    }
}
END {
    if ( rec != "" ) {
        print rec
    }
}

$ seq 9 | awk -f tst.awk
12345
6789

$ seq 9 | awk -v wid=4 -f tst.awk
1234
5678
9

If your input can contain tabs, then I recommend running it through pr -e -t first to replace them with the corresponding number of blanks; otherwise just add gsub(/\t/," "), or whatever substitution you think is appropriate, immediately above the rec = rec $0 line.

Answered By: Ed Morton