What's a good way to filter a text file to remove empty lines?

I have a .csv file (on a mac) that has a bunch of empty lines, e.g.:

"1", "2", "lorem ipsum lorem ipsum lorem ipsum lorem ipsum lorem ipsum lorem ipsum lorem ipsum lorem ipsum lorem ipsum lorem ipsum lorem ipsum 

lorem ipsum ","2","3","4"
"1", "2", "lorem ipsum lorem ipsum lorem ipsum lorem ipsum lorem ipsum lorem ipsum lorem ipsum lorem ipsum lorem ipsum lorem ipsum lorem ipsum 

lorem ipsum ","2","3","4"

Which I want to convert to:

"1", "2", "lorem ipsum lorem ipsum lorem ipsum lorem ipsum lorem ipsum lorem ipsum lorem ipsum lorem ipsum lorem ipsum lorem ipsum lorem ipsum lorem ipsum ","2","3","4"
"1", "2", "lorem ipsum lorem ipsum lorem ipsum lorem ipsum lorem ipsum lorem ipsum lorem ipsum lorem ipsum lorem ipsum lorem ipsum lorem ipsum  lorem ipsum ","2","3","4"

I know there must be a one liner but I don’t know awk or sed. Any tips greatly appreciated!

Asked By: pitosalas

||

Here’s a Perl one-liner for it:

perl -pi -e 's/^\s*\n//' yourfile

EDIT: Improved code based on ruakh’s comments below.

Answered By: Joseph R.

You can use grep’s -v (invert match) mode to do this:

grep -v '^$' old-file.csv > new-file.csv

Note that those need to be different files, because of how shell redirects work. The output file is opened (and emptied) before the input file is read. If you have moreutils (not by default on Mac OS X), you can use sponge to work around this:

grep -v '^$' file.csv | sponge file.csv

But of course, then you have a harder time going back if something goes wrong.
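A minimal sketch of the same idea without sponge, assuming mktemp is available (it is on macOS and most Linux systems): filter into a temporary file, and overwrite the original only after grep has finished, keeping a backup copy first. The function name strip_empty_lines and the .bak suffix are illustrative choices, not something the commands above provide.

```shell
# Remove empty lines from a file "in place" by going through a temporary file.
# strip_empty_lines and the .bak suffix are hypothetical names for this sketch.
strip_empty_lines() {
    file=$1
    tmp=$(mktemp) || return 1
    cp -- "$file" "$file.bak" || return 1    # keep a copy to go back to
    grep -v '^$' -- "$file" > "$tmp"
    status=$?
    # grep exits with 1 when it selects no lines; only a status above 1 is an error
    if [ "$status" -gt 1 ]; then
        rm -f -- "$tmp"
        return "$status"
    fi
    # Overwrite the original only now that all of the input has been read
    cat -- "$tmp" > "$file" && rm -f -- "$tmp"
}
```

Usage would be strip_empty_lines file.csv; if the result looks wrong, file.csv.bak still holds the original.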

If your “blank lines” may actually contain spaces (it sounds like they do), you can use this instead:

egrep -v '^[[:space:]]*$' old-file.csv > new-file.csv

That will ignore blank lines as well as lines containing only whitespace. You can of course do the same sponge transformation on it.
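To see the difference between the two patterns, here is a small sketch on made-up input (the sample text is an assumption, not the asker's data). Note that [[:space:]] also matches a carriage return, so lines holding only a stray CR from Windows line endings are dropped as well.

```shell
# Six lines: "a", an empty line, "b", a line holding a space and a tab,
# a line holding only a carriage return, and "c".
sample=$(printf 'a\n\nb\n \t\n\r\nc')

# '^$' drops only the truly empty line: 5 lines remain.
printf '%s\n' "$sample" | grep -v '^$' | wc -l

# '^[[:space:]]*$' also drops the whitespace-only and CR-only lines: 3 remain.
printf '%s\n' "$sample" | grep -v '^[[:space:]]*$' | wc -l
```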

Answered By: derobert

To remove empty lines, in place, with ksh93:

sed '/./!d' file 1<>; file

The <>; redirection operator is specific to ksh93 and is the same as the standard <> operator except that ksh truncates the file after the command has terminated.

sed '/./!d' is a convoluted way to write grep ., but unfortunately GNU grep at least complains if its stdout points to the same file as its stdin. You’d think one could write:

grep . file | cat 1<>; file

But unfortunately, there’s a bug in ksh93 (at least my version (93u+)), in that the file seems to be truncated to zero length in that case.

grep . file | { cat; } 1<>; file

Seems to work around that bug, but now, it’s far more convoluted than the sed command.

Answered By: Stéphane Chazelas

I found an idea for a possible solution on stackoverflow.

sed -i ':a;N;$!ba;s/[^"]\n\s*\n/ /g' file.csv

You should probably backup your csv file before testing it, but at least for the example you provided it works flawlessly.

A good explanation of the inner workings of this expression is offered in that answer; I just edited it to look for lines that do not end with a " ([^"]\n).

Answered By: tongpu

awk '
    length == 0 {next} 
    /^[^"]/ && /"$/ {print; next} 
    {printf("%s", $0)}
' filename

produces

"1", "2", "lorem ipsum lorem ipsum lorem ipsum lorem ipsum lorem ipsum lorem ipsum lorem ipsum lorem ipsum lorem ipsum lorem ipsum lorem ipsum lorem ipsum ","2","3","4"
"1", "2", "lorem ipsum lorem ipsum lorem ipsum lorem ipsum lorem ipsum lorem ipsum lorem ipsum lorem ipsum lorem ipsum lorem ipsum lorem ipsum lorem ipsum ","2","3","4"

Answered By: glenn jackman

Based on the clarification in the comments to your question, something like:

awk -v RS= -v ORS= 1

may do what you want.

An empty record separator is a special case that tells awk that records are to be paragraphs (separated by sequences of empty lines). Setting the output record separator to the empty string as well means that the content of those paragraphs (without the separators) are to be concatenated. 1 is just a true condition to print every record.
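As a quick illustration of paragraph mode, here is a sketch on made-up two-record input; the wrapper name join_paragraphs is hypothetical and exists only for this example.

```shell
# join_paragraphs is a hypothetical wrapper name used only for this sketch.
join_paragraphs() {
    # RS=  : records are paragraphs, separated by runs of empty lines
    # ORS= : records are printed back to back with nothing between them
    awk -v RS= -v ORS= 1 "$@"
}

# Two CSV records, each broken in two by an empty line inside a quoted field.
# The newline between the records sits inside a paragraph, so it is kept.
printf '"1","lorem \n\nipsum","4"\n"2","dolor \n\nsit","5"\n' | join_paragraphs
```

Each record comes out on a single line, because only the empty-line separators are discarded.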

That would however omit the trailing newline, so you could do:

awk -v RS= -v ORS= '1;END{if (NR) printf "\n"}'

Answered By: Stéphane Chazelas

I know this would have been easier if I gave the file, but unfortunately it contained confidential info that I couldn’t share. In the meantime I wrote a Ruby script that seemed to do the trick:

require 'csv'
c = CSV.open("outfile1.csv", "w")
CSV.foreach("data.csv", :encoding => 'windows-1251:utf-8') do |row|
  row = row.map { |a| a.class == String ? a.gsub(/\r/, '') : a}
  c << row
end
c.close

Thanks everyone for helping!

Answered By: pitosalas

The easiest option is just grep . (with a dot as the pattern). Here, the dot means “match any character”, so an empty line does not match and is dropped. Otherwise the whole line is printed as is.

Answered By: Pythonist

It looks like you actually want more than removing empty lines: you want to remove every sequence of 2 or more newline characters.

Which you could do with perl:

perl -0777 -pe 's/\n{2,}//gs' file

You could also use perl’s -i flag to edit the files in place.

perl -0777 -pi -e 's/\n{2,}//gs' file1 file2...

Answered By: Stéphane Chazelas

If, from your own response, you want to remove newline characters contained inside quoted strings, you could do:

 perl -0777 -pe 's/".*?"/$_=$&;s:\n::g;$_/gse'

You could also use perl’s -i flag to edit the files in place.

 perl -0777 -pi -e 's/".*?"/$_=$&;s:\n::g;$_/gse' file1 file2...

Or with GNU awk:

 awk -v RS=\" 'NR%2==0 {gsub("\n","")}; {printf "%s", $0 RT}'

or:

 awk -vRS=\" '1-NR%2{gsub("\n","")}{ORS=RT}1'

(if you’re competing for the shortest one)

Note that those assume that there are no escaped double quote characters in the input.
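Where GNU awk's RT variable is not available (the awk shipped with macOS does not have it), a rough portable sketch is to count the double quotes on each line and keep joining lines while a quoted field is still open. This is not the method used above, just one way to get a similar effect under the same assumption about escaped quotes; the function name join_quoted_lines is hypothetical.

```shell
# Join physical lines that fall inside a double-quoted field.
# join_quoted_lines is a hypothetical name; the sketch assumes there are no
# backslash-escaped double quotes in the input.
join_quoted_lines() {
    awk '
        {
            n = gsub(/"/, "&")           # count the quotes without changing the line
            if (n % 2) inq = !inq        # an odd count opens or closes a quoted field
            if (inq) printf "%s", $0     # still inside quotes: do not end the line yet
            else print                   # quotes are balanced: finish the record
        }
        END { if (inq) printf "\n" }     # unterminated quote: still end with a newline
    ' "$@"
}

printf '"1","lorem \n\nipsum","4"\n"2","x","5"\n' | join_quoted_lines
```

Unlike the empty-line filters, this removes every newline inside a quoted field, whether or not the line it ends is empty.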

Answered By: Stéphane Chazelas

There is an even shorter way of removing empty lines in AWK:

awk 'NF' file

But to get the output you want, all that is needed is a simple one-liner:

awk 'NF {printf("%s ", $0); i++;} !(i % 2) {printf("\n");}' file

Explanation

In AWK, an empty line means the row/record has no fields, that is, the NF (Number of Fields) variable is zero. The first action in the one-liner above runs only when NF > 0, so it prints every line except the empty ones.

The i++ is the non-empty lines counter.

The !(i % 2) pattern prints two consecutive non-empty lines as one line of your desired output: every time i reaches a multiple of 2, !(i % 2) yields 1, which ends the concatenation of the two non-empty lines with a newline.
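A small sketch of both commands on made-up input. The wrapper name pair_lines is hypothetical, and the pairing assumes that every record is split into exactly two non-empty lines and that empty lines occur only inside a record; an empty line between records would also satisfy !(i % 2) and print an extra newline.

```shell
# NF is 0 for the empty line and for the spaces-only line, so both are dropped
# and only a, b and c are printed.
printf 'a\n\nb\n   \nc\n' | awk 'NF'

# pair_lines is a hypothetical wrapper around the one-liner above.
pair_lines() {
    awk 'NF {printf("%s ", $0); i++} !(i % 2) {printf("\n")}' "$@"
}

# Two records, each split into two non-empty lines by one empty line.
# Every joined line ends with a trailing space because of the "%s " format.
printf 'a\n\nb\nc\n\nd\n' | pair_lines
```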

Answered By: Marcelo Augusto

You can use Vim in Ex mode:

ex -sc v/./d -cx b.csv

  1. v/./ find empty lines

  2. d delete

  3. x save and close

Answered By: Zombo