How can I wrap text at a certain column size?

I know that I can use something like cat test.txt | pr -w 80 to wrap lines to 80 characters wide, but that puts a lot of space on the top and bottom of the printed lines, and it does not work right on some systems.

What’s the best way to force a text file with long lines to be wrapped at a certain width?

Bonus points if you can keep it from breaking words.

Asked By: cwd

||

You are looking for

fold -w 80 -s text.txt
  • -w sets the width of the text; 80 is the standard width.
  • -s tells fold to break at spaces rather than in the middle of words.

This is the standard way, but some other systems need “-c” instead of “-w”.
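
To make the -s rule concrete, here is a minimal Python sketch of how a POSIX-style fold -s treats a single input line. It is an illustration, not fold's source code, and it assumes the line contains no tabs or backspaces (real fold gives those special column handling) and that every character occupies one column. When the line is too long, it breaks after the last blank within the first width columns and falls back to a hard break only when that segment has no blank:

```python
def fold_spaces(line: str, width: int) -> list[str]:
    """Mimic `fold -s -w WIDTH` on one input line (no tabs assumed)."""
    pieces = []
    while len(line) > width:
        segment = line[:width]
        cut = segment.rfind(" ")
        if cut == -1:
            cut = width      # no blank in the segment: break inside the word
        else:
            cut += 1         # keep the blank at the end of the piece, as fold -s does
        pieces.append(line[:cut])
        line = line[cut:]
    pieces.append(line)
    return pieces


if __name__ == "__main__":
    text = "O, they have lived long on the alms-basket of words"
    for piece in fold_spaces(text, 20):
        print(repr(piece))
```

Running it prints 'O, they have lived ', 'long on the ' and 'alms-basket of words'; the blank stays at the end of each broken piece, so trailing spaces may need trimming afterwards.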

Answered By: Rainer Bendig

In addition to fold, take a look at fmt. fmt tries to choose line breaks intelligently to make text look good. It doesn’t break long words; it wraps only at spaces. It will also join adjacent lines, which is good for prose but bad for log files or other formatted text.
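
The "join adjacent lines, wrap only at spaces" behavior can be sketched in Python as a simple greedy fill. Treat it as an approximation of the idea under stated assumptions, not as what fmt actually does: real fmt also balances line lengths, preserves indentation and handles sentence spacing, none of which is modelled here. Paragraphs are assumed to be separated by blank lines:

```python
import re


def fill_paragraphs(text: str, width: int) -> str:
    """Greedy, fmt-like fill: join the lines of each blank-line-separated
    paragraph, then wrap at spaces. Words longer than `width` stay whole."""
    filled = []
    for para in re.split(r"\n[ \t]*\n", text):
        words = para.split()              # also removes the old line breaks
        if not words:
            continue
        lines, current = [], words[0]
        for word in words[1:]:
            if len(current) + 1 + len(word) <= width:
                current += " " + word
            else:
                lines.append(current)
                current = word            # an overlong word gets its own line
        lines.append(current)
        filled.append("\n".join(lines))
    return "\n\n".join(filled)


if __name__ == "__main__":
    sample = "a long line that\nwas broken\nat odd places\n\nsecond para"
    print(fill_paragraphs(sample, 12))
```

With a width of 12, the first paragraph is refilled as "a long line", "that was", "broken at", "odd places", while the second paragraph stays separate. This joining is exactly why the answer warns against using fmt on log files.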

Answered By: Jonathan

And for more formatting options, look at par: http://www.nicemice.net/par/

Answered By: sendmoreinfo

Another (less known) tool that does what you want is wrap from GNU Talkfilters:

wrap -w 80 < textfile

Also (off topic):

but that puts a lot of space on the top and bottom of the printed lines

Add -t when invoking pr to omit headers/trailers:

   -t, --omit-header
          omit page headers and trailers

Answered By: don_crissti

$ cat shxp.txt

O, they have lived long on the alms-basket of words, I marvel thy
master hath not eaten thee for a word; for thou art not so long by the
head as honorificabilitudinitatibus: thou art easier swallowed than a
flap-dragon.

1) Guaranteed fixed line width, breaking words wherever the limit falls:

fold -w 20 <shxp.txt

O, they have lived l
ong on the alms-bask
et of words, I marve
l thy master hath no
t eaten thee for a w
ord; for thou art no
t so long by the hea
d as honorificabilit
udinitatibus: thou a
rt easier swallowed
than a flap-dragon.
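
Plain fold -w (without -s) just cuts each input line every width columns. Here is a minimal Python sketch of that rule, assuming one column per character and no tabs. Judging by the output above (for example "l thy master hath no"), shxp.txt seems to hold the whole quotation as a single long line, so fold sees it as one string:

```python
def fold_hard(line: str, width: int) -> list[str]:
    """Mimic `fold -w WIDTH` on one input line: cut every `width`
    characters, ignoring word boundaries (no tabs assumed)."""
    if not line:
        return [""]                       # an empty line stays an empty line
    return [line[i:i + width] for i in range(0, len(line), width)]


if __name__ == "__main__":
    quote = ("O, they have lived long on the alms-basket of words, I marvel thy "
             "master hath not eaten thee for a word; for thou art not so long by "
             "the head as honorificabilitudinitatibus: thou art easier swallowed "
             "than a flap-dragon.")
    print("\n".join(fold_hard(quote, 20)))
```

This prints the same eleven lines as the fold -w 20 output above (the tenth one carries an invisible trailing space).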

2) Guaranteed fixed line width, with words broken only when unavoidable. A word gets broken only if it is too long to fit on a line:

fold -sw 20 <shxp.txt

O, they have lived
long on the
alms-basket of
words, I marvel thy
master hath not
eaten thee for a
word; for thou art
not so long by the
head as
honorificabilitudini
tatibus: thou art
easier swallowed
than a flap-dragon.

3) Target line width without any word breaking. If a word is too long to fit on a line, it is left intact, so some lines may end up longer than requested:

fmt -w 20 <shxp.txt

O, they have
lived long on the
alms-basket of
words, I marvel
thy master hath
not eaten thee
for a word; for
thou art not so
long by the head as
honorificabilitudinitatibus:
thou art easier
swallowed than
a flap-dragon.

Note that, unlike fold -s, fmt also tries to balance the lengths of ragged paragraph lines.
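
The balancing idea can be illustrated with the classic minimum-raggedness dynamic program: pick the line breaks that minimise the sum of squared unused columns over every line except the last. This Python sketch shows that general technique only. It is not GNU fmt's exact cost function (fmt uses its own, more elaborate scoring), it assumes single spaces between words, and, like fmt, it never splits a word:

```python
def balanced_wrap(words: list[str], width: int) -> list[str]:
    """Minimum-raggedness line breaking. Overlong words get a line of
    their own (at no cost) instead of being split."""
    n = len(words)
    best = [float("inf")] * (n + 1)  # best[i]: minimal cost of laying out words[i:]
    nxt = [n] * (n + 1)              # nxt[i]: first word of the line after the one starting at i
    best[n] = 0.0
    for i in range(n - 1, -1, -1):
        length = -1
        for j in range(i, n):
            length += len(words[j]) + 1      # width of words[i..j] joined by spaces
            if length > width and j > i:
                break                        # adding more words only makes it longer
            if j == n - 1 or length > width:
                cost = 0.0                   # last line is free; so is a lone overlong word
            else:
                cost = (width - length) ** 2
            if cost + best[j + 1] < best[i]:
                best[i] = cost + best[j + 1]
                nxt[i] = j + 1
    lines, i = [], 0
    while i < n:
        lines.append(" ".join(words[i:nxt[i]]))
        i = nxt[i]
    return lines


if __name__ == "__main__":
    # A greedy fill would give "aaa bb" / "cc" / "ddddd" here.
    print(balanced_wrap(["aaa", "bb", "cc", "ddddd"], 6))  # -> ['aaa', 'bb cc', 'ddddd']
```

Squaring the gaps is what makes one very short line cost more than two slightly short ones: the greedy layout above scores 0 + 16 = 16, the balanced one 9 + 1 = 10.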

4) Perhaps the most typographically sophisticated way of solving the problem: nroff, the formatting utility (with its own markup language, roff) that the man program uses under the hood. It offers great possibilities for additional customization:

2>/dev/null nroff <(echo .pl 1 ; echo .ll 20) shxp.txt

O,  they  have lived
long  on  the  alms‐
basket  of  words, I
marvel  thy   master
hath  not eaten thee
for a word; for thou
art  not  so long by
the head as  honori‐
ficabilitudinitati‐
bus: thou art easier
swallowed   than   a
flap‐dragon.

The .pl 1 roff request sets the page length to a single line, effectively disabling pagination.

.ll 20 sets the line length to 20 characters.

Putting the markup in a separate file will simplify the command:

$ cat markup.roff
.pl 1
.ll 20
$ 2>/dev/null nroff markup.roff shxp.txt
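
The nroff output above is also justified: extra spaces are inserted between words so that both margins line up, and long words are hyphenated (honori‐ficabilitudinitati‐bus). Below is a minimal Python sketch of the justification part alone: a greedy fill, then spreading the spare columns over the gaps, leftmost gaps first. It assumes no hyphenation and is not nroff's actual algorithm:

```python
def justify(words: list[str], width: int) -> list[str]:
    """Greedy fill, then pad the gaps of every line except the last so that
    it is exactly `width` columns wide. No hyphenation is attempted."""
    lines, current = [], []
    for word in words:
        if current and len(" ".join(current + [word])) > width:
            lines.append(current)
            current = []
        current.append(word)
    if current:
        lines.append(current)

    out = []
    for k, line in enumerate(lines):
        if k == len(lines) - 1 or len(line) == 1:
            out.append(" ".join(line))    # last line or lone word: left ragged
            continue
        gaps = len(line) - 1
        spare = width - sum(len(w) for w in line)   # spaces to distribute
        base, extra = divmod(spare, gaps)
        padded = [
            word + " " * (base + (1 if g < extra else 0))  # leftmost gaps get extras
            for g, word in enumerate(line[:-1])
        ]
        out.append("".join(padded) + line[-1])
    return out


if __name__ == "__main__":
    text = "O, they have lived long on the alms-basket of words"
    for row in justify(text.split(), 20):
        print(f"|{row}|")
```

Here the first row comes out as "O,  they  have lived", which happens to match nroff's first line; later rows will generally differ because nroff also hyphenates and applies its own spacing rules.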

Answered By: user2683246

Using Raku (formerly known as Perl 6)

[ Posting this because a number of U&L users have commented that some previous answers don't work with Unicode ].

Raku is a programming language in the Perl family that features high-level support for Unicode. Raku normalizes all non-filename/non-filepath text to Normalization Form C (NFC) by default. Thus "graphemes, which are user-visible forms of the characters, will use a normalized representation" (i.e. normalized codepoints/width; see the Unicode links at the bottom for details).

Immediately below is an approach to the easier of the OP's requests (i.e. breaking text exactly at a desired column width, irrespective of words/whitespace). The code is based on Raku's comb routine and is written so that paragraphs (separated by \n\n or more) are kept separate, with a single blank line in between. (Thanks to @user2683246 for the example text.)

1. Break text/words at a desired column-width:

Sample Input:

~$ cat shxp_X2.txt
O, they have lived long on the alms-basket of words, I marvel thy
master hath not eaten thee for a word; for thou art not so long by the
head as honorificabilitudinitatibus: thou art easier swallowed than a
flap-dragon.

O, they have lived long on the alms-basket of words, I marvel thy
master hath not eaten thee for a word; for thou art not so long by the
head as honorificabilitudinitatibus: thou art easier swallowed than a
flap-dragon.

Code with Sample Output (wrapped to <= 40 characters wide):

~$ raku -e 'my $wrap = 40; for slurp.split(/ \n**2..* /) { .subst(:global, / \n /, " ") andthen .put for $_.comb($wrap); put ""; };'   shxp_X2.txt
O, they have lived long on the alms-bask
et of words, I marvel thy master hath no
t eaten thee for a word; for thou art no
t so long by the head as honorificabilit
udinitatibus: thou art easier swallowed 
than a flap-dragon.

O, they have lived long on the alms-bask
et of words, I marvel thy master hath no
t eaten thee for a word; for thou art no
t so long by the head as honorificabilit
udinitatibus: thou art easier swallowed 
than a flap-dragon.
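
For comparison, the same idea as a Python sketch: split into paragraphs on runs of two or more newlines, turn the remaining newlines into spaces, and cut each paragraph into fixed-size chunks. This is a translation for illustration, not part of the original answer. It strips leading and trailing newlines from the input, and Python slicing counts code points, whereas Raku counts graphemes, so results can differ for text with combining sequences:

```python
import re


def chunk_paragraphs(text: str, width: int) -> str:
    """Cut each blank-line-separated paragraph into `width`-character
    chunks, ignoring word boundaries; paragraphs stay one blank line apart."""
    out = []
    for para in re.split(r"\n{2,}", text.strip("\n")):
        flat = para.replace("\n", " ")           # single newlines become spaces
        chunks = [flat[i:i + width] for i in range(0, len(flat), width)]
        out.append("\n".join(chunks))
    return "\n\n".join(out)
```

Applied with a width of 40 to the two-paragraph sample file above, this should reproduce the same chunks as the Raku one-liner.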




2. Break between words (i.e. on whitespace) at desired column-width:

The code immediately below uses Raku's words routine, which breaks on whitespace. Below are over 30 example lines in a variety of languages and Unicode scripts, wrapped to <= 72 characters wide:

~$ raku -e 'my  $wrap = 72; my   $tmp = 0; 
            for lines() {   my $ln-ch = $_.chars;  
                if  $ln-ch == 0 { "\n".say; $tmp = 0; next };
                for $_.words -> $w {   my  $w-ch = $w.chars;
                    $wrap >=  ($tmp + $w-ch)
                    ?? (   "$w".print andthen $tmp += $w-ch )
                    !! ( "\n$w".print andthen $tmp  = $w-ch );
                    if ($wrap > $tmp) { " ".print andthen ++$tmp };  
                }   
            };'   file

Sample Input (from The Kermit Project):

English: The quick brown fox jumps over the lazy dog.
Jamaican: Chruu, a kwik di kwik brong fox a jomp huova di liezi daag de, yu no siit?
Irish: "An ḃfuil do ċroí ag bualaḋ ó ḟaitíos an ġrá a ṁeall lena ṗóg éada ó ṡlí do leasa ṫú?" "D'ḟuascail Íosa Úrṁac na hÓiġe Beannaiṫe pór Éava agus Áḋaiṁ."
Dutch: Pa's wijze lynx bezag vroom het fikse aquaduct.
German: Falsches Üben von Xylophonmusik quält jeden größeren Zwerg. (1)
German: Im finſteren Jagdſchloß am offenen Felsquellwaſſer patzte der affig-flatterhafte kauzig-höf‌liche Bäcker über ſeinem verſifften kniffligen C-Xylophon. (2)
Norwegian: Blåbærsyltetøy ("blueberry jam", includes every extra letter used in Norwegian).
Swedish: Flygande bäckasiner söka strax hwila på mjuka tuvor.
Icelandic: Sævör grét áðan því úlpan var ónýt.
Finnish: (5) Törkylempijävongahdus (This is a perfect pangram, every letter appears only once. Translating it is an art on its own, but I'll say "rude lover's yelp". :-D)
Finnish: (5) Albert osti fagotin ja töräytti puhkuvan melodian. (Albert bought a bassoon and hooted an impressive melody.)
Finnish: (5) On sangen hauskaa, että polkupyörä on maanteiden jokapäiväinen ilmiö. (It's pleasantly amusing, that the bicycle is an everyday sight on the roads.)
Polish: Pchnąć w tę łódź jeża lub osiem skrzyń fig.
Czech: Příliš žluťoučký kůň úpěl ďábelské ódy.
Slovak: Starý kôň na hŕbe kníh žuje tíško povädnuté ruže, na stĺpe sa ďateľ učí kvákať novú ódu o živote.
Slovenian: Šerif bo za domačo vajo spet kuhal žgance.
Greek (monotonic): ξεσκεπάζω την ψυχοφθόρα βδελυγμία
Greek (polytonic): ξεσκεπάζω τὴν ψυχοφθόρα βδελυγμία
Russian: Съешь же ещё этих мягких французских булок да выпей чаю.
Russian: В чащах юга жил-был цитрус? Да, но фальшивый экземпляр! ёъ.
Bulgarian: Жълтата дюля беше щастлива, че пухът, който цъфна, замръзна като гьон.
Sami (Northern): Vuol Ruoŧa geđggiid leat máŋga luosa ja čuovžža.
Hungarian: Árvíztűrő tükörfúrógép.
Spanish: El pingüino Wenceslao hizo kilómetros bajo exhaustiva lluvia y frío, añoraba a su querido cachorro.
Spanish: Volé cigüeña que jamás cruzó París, exhibe flor de kiwi y atún.
Portuguese: O próximo vôo à noite sobre o Atlântico, põe freqüentemente o único médico. (3)
French: Les naïfs ægithales hâtifs pondant à Noël où il gèle sont sûrs d'être déçus en voyant leurs drôles d'œufs abîmés.
Esperanto: Eĥoŝanĝo ĉiuĵaŭde
Esperanto: Laŭ Ludoviko Zamenhof bongustas freŝa ĉeĥa manĝaĵo kun spicoj.
Hebrew: זה כיף סתם לשמוע איך תנצח קרפד עץ טוב בגן.
Japanese (Hiragana):
いろはにほへど ちりぬるを
わがよたれぞ つねならむ
うゐのおくやま けふこえて
あさきゆめみじ ゑひもせず (4)
Japanese (Kanji):
色は匂へど 散りぬるを
我が世誰ぞ 常ならむ
有為の奥山 今日越えて
浅き夢見じ 酔ひもせず

Sample Output (wrapped to 72 characters):

English: The quick brown fox jumps over the lazy dog. Jamaican: Chruu, a
kwik di kwik brong fox a jomp huova di liezi daag de, yu no siit? Irish:
"An ḃfuil do ċroí ag bualaḋ ó ḟaitíos an ġrá a ṁeall lena ṗóg éada ó ṡlí
do leasa ṫú?" "D'ḟuascail Íosa Úrṁac na hÓiġe Beannaiṫe pór Éava agus
Áḋaiṁ." Dutch: Pa's wijze lynx bezag vroom het fikse aquaduct. German:
Falsches Üben von Xylophonmusik quält jeden größeren Zwerg. (1) German:
Im finſteren Jagdſchloß am offenen Felsquellwaſſer patzte der
affig-flatterhafte kauzig-höf‌liche Bäcker über ſeinem verſifften
kniffligen C-Xylophon. (2) Norwegian: Blåbærsyltetøy ("blueberry jam",
includes every extra letter used in Norwegian). Swedish: Flygande
bäckasiner söka strax hwila på mjuka tuvor. Icelandic: Sævör grét áðan
því úlpan var ónýt. Finnish: (5) Törkylempijävongahdus (This is a
perfect pangram, every letter appears only once. Translating it is an
art on its own, but I'll say "rude lover's yelp". :-D) Finnish: (5)
Albert osti fagotin ja töräytti puhkuvan melodian. (Albert bought a
bassoon and hooted an impressive melody.) Finnish: (5) On sangen
hauskaa, että polkupyörä on maanteiden jokapäiväinen ilmiö. (It's
pleasantly amusing, that the bicycle is an everyday sight on the roads.)
Polish: Pchnąć w tę łódź jeża lub osiem skrzyń fig. Czech: Příliš
žluťoučký kůň úpěl ďábelské ódy. Slovak: Starý kôň na hŕbe kníh žuje
tíško povädnuté ruže, na stĺpe sa ďateľ učí kvákať novú ódu o živote.
Slovenian: Šerif bo za domačo vajo spet kuhal žgance. Greek (monotonic):
ξεσκεπάζω την ψυχοφθόρα βδελυγμία Greek (polytonic): ξεσκεπάζω τὴν
ψυχοφθόρα βδελυγμία Russian: Съешь же ещё этих мягких французских булок
да выпей чаю. Russian: В чащах юга жил-был цитрус? Да, но фальшивый
экземпляр! ёъ. Bulgarian: Жълтата дюля беше щастлива, че пухът, който
цъфна, замръзна като гьон. Sami (Northern): Vuol Ruoŧa geđggiid leat
máŋga luosa ja čuovžža. Hungarian: Árvíztűrő tükörfúrógép. Spanish: El
pingüino Wenceslao hizo kilómetros bajo exhaustiva lluvia y frío,
añoraba a su querido cachorro. Spanish: Volé cigüeña que jamás cruzó
París, exhibe flor de kiwi y atún. Portuguese: O próximo vôo à noite
sobre o Atlântico, põe freqüentemente o único médico. (3) French: Les
naïfs ægithales hâtifs pondant à Noël où il gèle sont sûrs d'être déçus
en voyant leurs drôles d'œufs abîmés. Esperanto: Eĥoŝanĝo ĉiuĵaŭde
Esperanto: Laŭ Ludoviko Zamenhof bongustas freŝa ĉeĥa manĝaĵo kun
spicoj. Hebrew: זה כיף סתם לשמוע איך תנצח קרפד עץ טוב בגן. Japanese
(Hiragana): いろはにほへど ちりぬるを わがよたれぞ つねならむ うゐのおくやま けふこえて あさきゆめみじ ゑひもせず (4)
Japanese (Kanji): 色は匂へど 散りぬるを 我が世誰ぞ 常ならむ 有為の奥山 今日越えて 浅き夢見じ 酔ひもせず
  • Paragraphs (separated by \n\n or more) are kept separate, with a single blank line in between. All lines in the Sample Output wrap to 72 characters or less. The only visual problem is with Japanese Hiragana/Kanji, but in fact the last two lines of the "wrapped" output contain 71 and 65 characters, respectively.

  • Custom words can be defined based on Unicode properties. For example, the .words routine can be replaced by .comb(/ <-:Zs>+ /) to split on the Unicode 'Space_Separator' (Zs) general category as defined in Unicode® Standard Annex #44.

  • Right now the code doesn't hyphenate or otherwise break individual words that are longer than the desired $wrap column width. (This may be the desired behavior; otherwise you might indeed see issues with excessively long words and/or narrow column widths.)

  • A single trailing space is left at the end of lines shorter than $wrap. This can be corrected by running ~$ raku -ne '.trim-trailing.put;' over the wrapped output.
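
The Japanese lines look too long because terminals typically render East Asian Wide and Fullwidth characters in two columns, so a line of 71 characters can occupy well over 72 columns. Below is a Python sketch of a greedy word wrap that measures display columns instead, using only the standard unicodedata module. It NFC-normalizes the input, splits on Zs (Space_Separator) characters in the spirit of the .comb(/ <-:Zs>+ /) suggestion, counts W/F characters as 2 columns and combining marks and format characters as 0. It is an approximation, not a full implementation of terminal width rules:

```python
import unicodedata


def display_width(s: str) -> int:
    """Approximate terminal columns: East Asian Wide/Fullwidth characters
    count 2, combining marks and format characters (e.g. ZWNJ) count 0."""
    cols = 0
    for ch in s:
        if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
            continue
        cols += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return cols


def split_on_zs(s: str) -> list[str]:
    """Split on Unicode Space_Separator (Zs) characters, like
    .comb(/ <-:Zs>+ /) in Raku; empty pieces are dropped."""
    words, current = [], []
    for ch in s:
        if unicodedata.category(ch) == "Zs":
            if current:
                words.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        words.append("".join(current))
    return words


def wrap_columns(text: str, width: int) -> list[str]:
    """Greedy word wrap measured in display columns. Line breaks in `text`
    are treated as spaces; words are re-joined with single ASCII spaces,
    and a word wider than `width` is kept whole on its own line."""
    flat = unicodedata.normalize("NFC", " ".join(text.splitlines()))
    lines, current, used = [], [], 0
    for word in split_on_zs(flat):
        w = display_width(word)
        if current and used + 1 + w > width:
            lines.append(" ".join(current))
            current, used = [], 0
        used += w if not current else 1 + w
        current.append(word)
    if current:
        lines.append(" ".join(current))
    return lines


if __name__ == "__main__":
    for row in wrap_columns("いろはにほへど ちりぬるを わがよたれぞ つねならむ", 30):
        print(row, display_width(row))
```

Tabs are not Zs characters, so this sketch does not treat them as word separators; that, like the width table, is a simplifying assumption.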


https://unicode.org/reports/tr15/#Canon_Compat_Equivalence
https://docs.raku.org/language/unicode
https://docs.raku.org/type/Str#routine_words
https://docs.raku.org/type/Str#routine_comb
https://raku.org

Answered By: jubilatious1

pandoc can wrap Unicode text:

pandoc -f plain.lua -t plain \
  --wrap=auto --columns=78 input.txt

You only need a plain-text reader in plain.lua, because by default pandoc cannot parse plain text:

-- A sample custom reader that just parses text into blankline-separated
-- paragraphs with space-separated words.

-- For better performance we put these functions in local variables:
local P, S, R, Cf, Cc, Ct, V, Cs, Cg, Cb, B, C, Cmt =
  lpeg.P, lpeg.S, lpeg.R, lpeg.Cf, lpeg.Cc, lpeg.Ct, lpeg.V,
  lpeg.Cs, lpeg.Cg, lpeg.Cb, lpeg.B, lpeg.C, lpeg.Cmt

local whitespacechar = S(" \t\r\n")
local wordchar = (1 - whitespacechar)
local spacechar = S(" \t")
local newline = P"\r"^-1 * P"\n"
local blanklines = newline * (spacechar^0 * newline)^1
local endline = newline - blanklines

-- Grammar
G = P{ "Pandoc",
  Pandoc = Ct(V"Block"^0) / pandoc.Pandoc;
  Block = blanklines^0 * V"Para" ;
  Para = Ct(V"Inline"^1) / pandoc.Para;
  Inline = V"Str" + V"Space" + V"SoftBreak" ;
  Str = wordchar^1 / pandoc.Str;
  Space = spacechar^1 / pandoc.Space;
  SoftBreak = endline / pandoc.SoftBreak;
}

function Reader(input)
  return lpeg.match(G, tostring(input))
end

Answered By: milahu