How do I trim leading and trailing whitespace from each line of some output?
I would like to remove all leading and trailing spaces and tabs from each line in an output.
Is there a simple tool like trim that I could pipe my output into?
Example file:
test space at back
test space at front
TAB at end
TAB at front
sequence of some space in the middle
some empty lines with differing TABS and spaces:
test space at both ends
sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//'
If you’re reading a line into a shell variable, read does that already unless instructed otherwise.
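To illustrate the read behaviour, here is a minimal sketch of a pipeable filter built on it. The function name trim_read is made up for this example, and IFS is set explicitly to space and tab so the result does not depend on the caller's IFS:

```shell
# Hypothetical helper: trims each line by relying on read's IFS stripping.
blank=$(printf ' \t')          # a space followed by a tab
trim_read() {
  # -r keeps backslashes literal; the || clause handles a final line
  # that has no terminating newline.
  while IFS=$blank read -r line || [ -n "$line" ]; do
    printf '%s\n' "$line"
  done
}

printf '  \tsome text \t\n' | trim_read   # prints: some text
```

Whitespace between words is left alone here because everything is read into a single variable.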
sed is a great tool for that:
# substitute ("s/")
sed 's/^[[:blank:]]*//; # parts of lines that start ("^") with a space/tab
s/[[:blank:]]*$//' # or end ("$") with a space/tab
# with nothing (/)
You can use it for your case by either piping in the text, e.g.
<file sed -e 's/^[[...
or by acting on it ‘inline’ if your sed is the GNU one:
sed -i 's/...' file
but changing the source this way is “dangerous”, as it may be unrecoverable when it doesn’t work right (or even when it does!), so back up first (or use -i.bak, which also has the benefit of being portable to some BSD seds)!
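As a small sketch of the backup variant, the following works on a throwaway temporary file (created here with mktemp purely for the demonstration) so that nothing real is modified:

```shell
# Work on a throwaway copy so the sketch is safe to run anywhere.
tmp=$(mktemp)
printf '  \tpadded line \t \n' > "$tmp"

# -i.bak edits in place and keeps the original as "$tmp.bak".
sed -i.bak -e 's/^[[:blank:]]*//' -e 's/[[:blank:]]*$//' "$tmp"

cat "$tmp"        # the trimmed file
ls "$tmp.bak"     # the untouched backup
```

If the result is not what you expected, the .bak file can simply be moved back over the edited one.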
The command can be condensed like so if you’re using GNU sed:
$ sed 's/^[ \t]*//;s/[ \t]*$//' < file
Example
Here’s the above command in action.
$ echo -e " \t blahblah \t " | sed 's/^[ \t]*//;s/[ \t]*$//'
blahblah
You can use hexdump to confirm that the sed command is stripping the desired characters correctly.
$ echo -e " \t blahblah \t " | sed 's/^[ \t]*//;s/[ \t]*$//' | hexdump -C
00000000 62 6c 61 68 62 6c 61 68 0a |blahblah.|
00000009
Character classes
You can also use character class names instead of literally listing the sets like this, [ \t]:
$ sed 's/^[[:blank:]]*//;s/[[:blank:]]*$//' < file
Example
$ echo -e " \t blahblah \t " | sed 's/^[[:blank:]]*//;s/[[:blank:]]*$//'
Most of the GNU tools that make use of regular expressions (regex) support these classes (here with their equivalent in the typical C locale of an ASCII-based system (and there only)).
[[:alnum:]] - [A-Za-z0-9] Alphanumeric characters
[[:alpha:]] - [A-Za-z] Alphabetic characters
[[:blank:]] - [ \t] Space or tab characters only
[[:cntrl:]] - [\x00-\x1F\x7F] Control characters
[[:digit:]] - [0-9] Numeric characters
[[:graph:]] - [!-~] Printable and visible characters
[[:lower:]] - [a-z] Lower-case alphabetic characters
[[:print:]] - [ -~] Printable (non-Control) characters
[[:punct:]] - [!-/:-@[-`{-~] Punctuation characters
[[:space:]] - [ \t\v\f\n\r] All whitespace chars
[[:upper:]] - [A-Z] Upper-case alphabetic characters
[[:xdigit:]] - [0-9a-fA-F] Hexadecimal digit characters
Using these instead of literal sets always seems like a waste of space, but if you’re concerned with your code being portable, or having to deal with alternative character sets (think international), then you’ll likely want to use the class names instead.
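One practical difference between two of the classes listed: [[:blank:]] covers only space and tab, while [[:space:]] also covers carriage returns. A small sketch on a line with a DOS-style ending (the function names are invented for this example):

```shell
# Hypothetical helpers that differ only in the character class used.
rtrim_blank() { sed 's/[[:blank:]]*$//'; }
rtrim_space() { sed 's/[[:space:]]*$//'; }

# od -c makes the otherwise invisible characters visible.
printf 'word \r\n' | rtrim_blank | od -c   # the \r and the space before it survive
printf 'word \r\n' | rtrim_space | od -c   # both are removed
```

So if the input may come from Windows tools, [[:space:]] is probably the class you want for the trailing side.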
As suggested by Stéphane Chazelas in the accepted answer, you can now create a script /usr/local/bin/trim:
#!/bin/bash
awk '{$1=$1};1'
and give that file executable rights:
chmod +x /usr/local/bin/trim
Now you can pass every output to trim, for example:
cat file | trim
(For the comments below: I used this before: while read i; do echo "$i"; done, which also works fine but is less performant.)
awk '{$1=$1;print}'
or shorter:
awk '{$1=$1};1'
Would trim leading and trailing space or tab characters¹ and also squeeze sequences of tabs and spaces into a single space.
That works because when you assign something to one of the fields, awk rebuilds the whole record (as printed by print) by joining all fields ($1, …, $NF) with OFS (space by default).
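The rebuild is easier to see with a non-default OFS. In this sketch the underscore is an arbitrary choice made only for the demonstration:

```shell
# Default OFS: fields are rejoined with a single space.
printf '  a \t b   c  \n' | awk '{$1=$1};1'              # prints: a b c

# With OFS set to "_", the rebuilt record shows where the joins happen.
printf '  a \t b   c  \n' | awk -v OFS='_' '{$1=$1};1'   # prints: a_b_c
```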
To also remove blank lines, change it to awk 'NF{$1=$1;print}' (where NF tells awk to only process the records for which the Number of Fields is non-zero). Do not do awk '$1=$1' as sometimes suggested, as that would also remove lines whose first field is any representation of 0 supported by awk (0, 00, -0e+12…).
¹ and possibly other blank characters depending on the locale and the awk implementation
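To see the pitfall with a first field of 0, here is a small sketch on three made-up input lines (the sample function exists only for this example):

```shell
# Three padded lines; the first two start with a representation of zero.
sample() { printf '0 items\n  00  \n  x  \n'; }

# Assignment inside an action, then an always-true pattern: all lines kept.
sample | awk '{$1=$1};1'

# Assignment used as the pattern: only the line starting with "x" survives,
# because the value of the assignment is a numeric zero for the other two.
sample | awk '$1=$1'
```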
xargs without arguments does that.
Example:
trimmed_string=$(echo "no_trimmed_string" | xargs)
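A short sketch of what that looks like in practice. Keep in mind that xargs with no command runs echo on the words it reads, so it also squeezes inner whitespace, joins lines, and treats quotes and backslashes in the input specially:

```shell
# One padded line: outer whitespace is dropped, inner runs are squeezed.
printf '  no   trimmed   string  \n' | xargs   # prints: no trimmed string

# Two padded lines: xargs joins them into a single output line.
printf ' first \n second \n' | xargs           # prints: first second
```

That makes it handy for a single short string, but less suitable for trimming a file line by line.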
To remove all leading and trailing spaces from a given line thanks to a ‘piped’ tool, I can identify 3 different ways which are not completely equivalent. These differences concern the spaces between words of the input line. Depending on the expected behaviour, you’ll make your choice.
Examples
To explain the differences, let's consider this dummy input line:
" \t A \tB\tC \t "
tr
$ echo -e " \t A \tB\tC \t " | tr -d "[:blank:]"
ABC
tr is really a simple command. In this case, it deletes every space and tab character.
awk
$ echo -e " \t A \tB\tC \t " | awk '{$1=$1};1'
A B C
awk deletes leading and trailing spaces and squeezes every run of spaces between words into a single space.
sed
$ echo -e " \t A \tB\tC \t " | sed 's/^[ \t]*//;s/[ \t]*$//'
A B C
In this case, sed deletes leading and trailing spaces without touching any spaces between words.
Remark:
In the case of one word per line, tr does the job.
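Since tabs and spaces look alike on a terminal, the three results are easier to compare when piped through sed -n l, which prints tabs as \t and marks each end of line with $. A sketch using the same dummy input (the sample function is just a convenience for this example):

```shell
# printf builds the dummy line from above: " \t A \tB\tC \t "
sample() { printf ' \t A \tB\tC \t \n'; }

# sed -n l shows each line unambiguously: tabs as \t, end of line as $
sample | tr -d '[:blank:]' | sed -n l                            # ABC$
sample | awk '{$1=$1};1' | sed -n l                              # A B C$
sample | sed 's/^[[:blank:]]*//;s/[[:blank:]]*$//' | sed -n l    # A \tB\tC$
```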
If you store lines as variables, you can use bash to do the job:
remove leading whitespace from a string:
shopt -s extglob
printf '%s\n' "${text##+([[:space:]])}"
remove trailing whitespace from a string:
shopt -s extglob
printf '%s\n' "${text%%+([[:space:]])}"
remove all whitespace from a string:
printf '%s\n' "${text//[[:space:]]}"
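If you would rather not switch on extglob, both ends can also be trimmed with plain parameter expansions. This is only a sketch of that alternative, not the method above; the function name trim_var and the variable _s are made up for the example:

```shell
# Hypothetical helper: trims both ends of its first argument without extglob.
trim_var() {
  _s=$1
  # ${_s%%[![:space:]]*} is the leading whitespace; strip it as a prefix.
  _s=${_s#"${_s%%[![:space:]]*}"}
  # ${_s##*[![:space:]]} is the trailing whitespace; strip it as a suffix.
  _s=${_s%"${_s##*[![:space:]]}"}
  printf '%s\n' "$_s"
}

text=$(printf ' \t some  text \t ')
trim_var "$text"    # prints: some  text
```

Whitespace between the words is preserved, and a string made only of whitespace comes out empty.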
The translate command (tr) would work, although it deletes every space and tab, not just the leading and trailing ones:
cat file | tr -d '[:blank:]'
If the string one is trying to trim is short and continuous/contiguous, one can simply pass it as a parameter to any bash function:
trim(){
echo $@
}
a=" some random string "
echo ">>`trim $a`<<"
Output
>>some random string<<
I wrote this shell function using awk:
awkcliptor(){
awk -e 'BEGIN{ RS="^$" } {gsub(/^[\n\t ]*|[\n\t ]*$/,"");print ;exit}' "$1" ; }
BEGIN{ RS="^$" }: in the beginning, before parsing starts, set the record separator to none, i.e. treat the whole input as a single record
gsub(this,that): substitute this regexp with that string
/^[\n\t ]*|[\n\t ]*$/: in that record, match any run of newlines, tabs and spaces at the start or at the end, and replace it with the empty string
print;exit: then print and exit
"$1": pass the first argument of the function to awk as the file to process
How to use: copy the code above, paste it into your shell, and press Enter to define the function. You can then use awkcliptor as a command, with the input file as its first argument.
sample usage:
echo '
ggggg
' > a_file
awkcliptor a_file
output:
ggggg
or
echo -e "\n ggggg \n\n "|awkcliptor
output:
ggggg
An answer you can understand in a glance:
#!/usr/bin/env python3
import sys
for line in sys.stdin: print(line.strip())
Bonus: pass characters to str.strip([chars]) to trim arbitrary characters, or use .lstrip() or .rstrip() as needed.
Like rubo77’s answer, save as script /usr/local/bin/trim and give permissions with chmod +x.
trimpy () {
python3 -c 'import sys
for line in sys.stdin: print(line.strip())'
}
trimsed () {
gsed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//'
}
trimzsh () {
local out="$(</dev/stdin)"
[[ "$out" =~ '^\s*(.*\S)\s*$' ]] && out="$match[1]" || out=''
print -nr -- "$out"
}
# example usage
echo " hi " | trimpy
For those of us without enough space in the brain to remember obscure sed syntax, just reverse the string, cut the 1st field with a delimiter of space, and reverse it back again.
cat file | rev | cut -d' ' -f1 | rev
for bash example:
alias trim="awk '{\$1=\$1};1'"
usage:
echo -e " hello\t\tkitty " | trim | hexdump -C
result:
00000000 68 65 6c 6c 6f 20 6b 69 74 74 79 0a |hello kitty.|
0000000c
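A shell function is a possible alternative to the alias: the awk program can sit in single quotes with no extra escaping, and functions also work in scripts, where bash does not expand aliases by default. A minimal sketch:

```shell
# Function equivalent of the alias; single quotes keep $1 for awk.
trim() {
  awk '{$1=$1};1'
}

printf ' hello\t\tkitty \n' | trim    # prints: hello kitty
```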
Using Raku (formerly known as Perl_6):
raku -ne '.trim.put;'
Or more simply:
raku -pe '.=trim;'
As a previous answer suggests (thanks, @Jeff_Clayton!), you can create a trim alias in your bash environment:
alias trim="raku -pe '.=trim;'"
Finally, to only remove leading or trailing whitespace (e.g. unwanted indentation), you can use the appropriate trim-leading or trim-trailing command instead.
You will be adding this to your little Bash library. I can almost bet on it!
This has the benefit of not adding a newline character to the end of your output, as happens with echo, which throws off your expected output. Moreover, these solutions are reusable, do not require modifying the shell options, can be called in-line with your pipelines, and are POSIX compliant apart from the function keyword (drop it for strict POSIX sh). This is the best answer, by far. Modify to your liking.
Output tested with od -cb, something some of the other solutions might want to do with their output.
BTW: The correct quantifier is the +, not the *, as you want the replacement to be triggered upon 1 or more whitespace characters!
ltrim (that you can pipe input into)
function ltrim ()
{
sed -E 's/^[[:space:]]+//'
}
rtrim (that you can pipe input into)
function rtrim ()
{
sed -E 's/[[:space:]]+$//'
}
trim (the best of both worlds and yes, you can pipe to it)
function trim ()
{
ltrim | rtrim
}
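For a quick check along the lines of the od test mentioned earlier, here is a sketch that restates the three functions in the name() form (so it also runs in plain sh) and dumps the result byte by byte. The sample string is made up:

```shell
# Same functions as above, written without the function keyword.
ltrim() { sed -E 's/^[[:space:]]+//'; }
rtrim() { sed -E 's/[[:space:]]+$//'; }
trim()  { ltrim | rtrim; }

# od -c prints every byte, so leftover spaces or tabs would be visible.
printf '  \tpadded text \t \n' | trim | od -c
```

The space between the two words is kept, since neither expression is anchored there.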
Remove leading and trailing spaces and tabs:
alias strip='python3 -c "from sys import argv; print(argv[1].strip(\" \t\"))"'
Remove every space and tab:
alias strip='python3 -c "from sys import argv; print(argv[1].replace(\"\t\", \"\").replace(\" \", \"\"))"'
Pass the string as the argument to strip. Use sys.stdin.read() instead of argv to make it pipeable.
simple enough for my purposes was this:
_text_=" one two three "
echo "$_text_" | { read __ ; echo ."$__". ; }
… giving …
.one two three.
… if you want to squeeze the spaces then …
echo .$( echo $_text_ ).
… gives …
.one two three.
Using the Rust sd command:
sd '^\s*(.*)\s*' '$1'
My favorite is using perl: perl -n -e'/[\s]*(.*)?[\s]*/ms && print $1'
Take for example:
MY_SPACED_STRING="\n\n my\nmulti-line\nstring \n\n"
echo $MY_SPACED_STRING
Would output:
my
multi-line
string
Then:
echo $MY_SPACED_STRING | perl -n -e'/[\s]*(.*)?[\s]*/ms && print $1'
Would output:
my
multi-line
string