How to "grep" for line length in a given range?

NOTE: This question is the complement of this Q&A: How to "grep" for line length *not* in a given range?


I need to extract only those lines from a text file (a word list, one word per line) that are at least 3 characters long but no longer than 10 characters.

Example:

INPUT:

egyezményét
megkíván
ki
alma
kevesen
meghatározó

OUTPUT:

megkíván
alma
kevesen

Question: How can I do this in bash?

Asked By: user90825

||
grep -x '.\{3,10\}'

where

  • -x (also --line-regexp with GNU grep) matches the pattern against the whole line
  • . matches any single character
  • \{3,10\} repeats the preceding item (here, any character) between 3 and 10 times
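
As a quick check, the sketch below feeds the question's sample words to the command through a pipe instead of a file. The variable names `words` and `result` are only illustrative and are not part of the answer.

```shell
# Sample words from the question, one per line (held in a variable here
# purely for illustration; a real word list would normally be a file).
words='egyezményét
megkíván
ki
alma
kevesen
meghatározó'

# -x anchors the pattern to the whole line; \{3,10\} is the BRE interval.
result=$(printf '%s\n' "$words" | grep -x '.\{3,10\}')

printf '%s\n' "$result"
```

With these six words, this should print megkíván, alma and kevesen, matching the OUTPUT in the question.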
Answered By: Costas

Using grep -E:

grep -E '^.{3,10}$'

This matches lines consisting of between three and 10 characters.

Answered By: repzero

Using sed:

sed '/^.\{3,10\}$/!d'

Or, with GNU sed or compatible:

sed -r '/^.{3,10}$/!d'

Though nowadays, you’d use -E instead of -r as that’s the one that is going to be specified by POSIX (and already made its way to most sed implementations including GNU sed).

Answered By: agc

Using awk (and assuming that it is an implementation that is locale-aware, such as GNU awk, so that lines with multi-byte characters that are shorter than three characters, like "Ők", are not matched):

LC_ALL=hu_HU.UTF-8 awk 'length >= 3 && length <= 10' file

The length function returns the length of $0 (the current record/line) by default, and the code uses it to test whether the line's length is within the given range. If a test like this has no corresponding action block, the default action is to print the record.

Testing on the given data:

$ LC_ALL=hu_HU.UTF-8 awk 'length >= 3 && length <= 10' file
megkíván
alma
kevesen
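
If the bounds need to change often, the same test can take them from shell variables with awk's -v option. This is a minimal sketch: the names `min`, `max` and `filtered` are hypothetical, and the input is ASCII-only so that the result does not depend on the locale.

```shell
# min and max are hypothetical variable names chosen for this sketch.
min=3
max=10

# Four ASCII test lines of length 2, 3, 10 and 11.
filtered=$(printf '%s\n' ab abc abcdefghij abcdefghijk |
  awk -v min="$min" -v max="$max" 'length($0) >= min && length($0) <= max')

printf '%s\n' "$filtered"
```

Only the lines of length 3 and 10 (abc and abcdefghij) should be printed, since both bounds are inclusive.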

Similarly with Perl:

$ LC_ALL=hu_HU.UTF-8 perl -C -lne '$l=length($_); print if ($l >= 3 && $l <= 10)' file
megkíván
alma
kevesen
Answered By: Kusalananda

I think this will be useful to someone. By extension, if you want to match a specific string within a line that is no longer than say 255 characters, this would be a solution.

Usage: looking for a string but wanting to exclude long lines like minified JS files which you didn’t write or don’t need

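grep -x '.\{1,255\}theStringIWant.\{1,255\}'

This is a bit of a hack: the limit applies to each side of the string separately, not to the line as a whole (the two sides could be 1 and 255 characters, 255 and 1, or 255 and 255), and at least one character is required on each side. Still, it works in most cases to exclude long minified lines.

BASH NEWBIE TIP: the backslashes escape the { } braces. grep's default basic regular expressions need \{ and \} to express a repetition count; with grep -E the braces are written without backslashes.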

Example/Proof:

echo "aaaalocalStoragebbbbccccdd" | grep -x '.\{3,10\}localStorage.\{3,10\}' # matches
echo "aaaalocalStoragebbbbccccdddd" | grep -x '.\{3,10\}localStorage.\{3,10\}' # no match: the trailing part is 12 characters long
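
If the aim is to cap the total line length rather than the text on each side of the string, one alternative is to test the length and the fixed string separately in awk. This is offered only as a sketch and is not part of the approach above; `max_len`, `needle`, `short_line`, `long_line` and `matches` are hypothetical names.

```shell
# Hypothetical limit and search string for this sketch.
max_len=40
needle=localStorage

# One short line and one long line (60 zeros followed by the string).
short_line='var x = localStorage.getItem("k");'
long_line="$(printf '%060d' 0)localStorage"

# Keep a line only if it is at most max_len characters long AND contains
# the needle as a fixed string (index returns 0 when it is absent).
matches=$(printf '%s\n' "$short_line" "$long_line" |
  awk -v max="$max_len" -v s="$needle" 'length($0) <= max && index($0, s)')

printf '%s\n' "$matches"
```

Here only the short line should be printed. Unlike the grep pattern, this does not require any characters before or after the string.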
Answered By: Oliver Williams