cut with 2 character delimiter

I wanted to use cut with a 2-character delimiter to process a file with many lines like this:

1F3C6..1F3CA
1F3CF..1F3D3
1F3E0..1F3F0

But cut only allows a single character.

Instead of cut -d'..' I’m trying awk -F'..' "{echo $1}" but it’s not working.

My script:

wget -O output.txt http://www.unicode.org/Public/emoji/6.0/emoji-data.txt
sed -i '/^#/ d' output.txt                        # Remove comments
cat output.txt | cut -d' ' -f1 | while read line ;
  do echo $line | awk -F'..' "{echo $1}"
done
Asked By: Philip Kirkbride


Sample test script that works for me:

#!/bin/sh

raw="1F3C6..1F3CA
1F3CF..1F3D3
1F3E0..1F3F0"

for r in $raw
do
    f1=$(echo "${r}" | cut -d'.' -f1)
    f2=$(echo "${r}" | cut -d'.' -f2)
    f3=$(echo "${r}" | cut -d'.' -f3)
    echo "field 1:[${f1}] field 2:[${f2}] field 3:[${f3}]"
done

exit

And the output is:

field 1:[1F3C6] field 2:[] field 3:[1F3CA]
field 1:[1F3CF] field 2:[] field 3:[1F3D3]
field 1:[1F3E0] field 2:[] field 3:[1F3F0]

Edit

After reading Stéphane Chazelas’s comment and the linked Q&A, I rewrote the above to remove the loop.

I could not work out a way to remove the loop and still keep the parts as variables that could be passed around (for example, $f1, $f2 and $f3 in my original answer). Also, I still don’t know what output the original question required.

First, still using cut:

#!/bin/sh
raw="1F3C6..1F3CA
1F3CF..1F3D3
1F3E0..1F3F0"

printf '%s\n' "${raw}" | cut -d'.' -f1,3

Which will output:

1F3C6.1F3CA
1F3CF.1F3D3
1F3E0.1F3F0

With GNU cut, you could replace the displayed . with any string using --output-delimiter=STRING (this option is a GNU extension, not POSIX).
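A minimal sketch of that variant, assuming GNU coreutils cut (other cut implementations may not have --output-delimiter). The function name split_range is hypothetical, used only for illustration:

```shell
#!/bin/sh
# Hypothetical helper: print the first and last code points of each
# "XXXX..YYYY" range separated by a space instead of a dot.
# Assumes GNU cut, which provides --output-delimiter.
split_range() {
    cut -d'.' -f1,3 --output-delimiter=' '
}

printf '%s\n' '1F3C6..1F3CA' '1F3CF..1F3D3' | split_range
```

Field 2 is still the empty string between the two dots; it is simply not selected.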

Next, using sed instead of cut to get more control over the output:

#!/bin/sh
raw="1F3C6..1F3CA
1F3CF..1F3D3
1F3E0..1F3F0"

printf '%s\n' "${raw}" | sed 's/^\(.*\)\.\.\(.*\)$/field 1 [\1] field 2 [\2]/'

And this will render:

field 1 [1F3C6] field 2 [1F3CA]
field 1 [1F3CF] field 2 [1F3D3]
field 1 [1F3E0] field 2 [1F3F0]
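The same substitution can also be written as an extended regexp, which avoids escaping the capture groups. This is a sketch, not part of the answer above, and it assumes a sed that supports -E (GNU and BSD sed both do). The literal dots still need escaping, because an unescaped . matches any character:

```shell
#!/bin/sh
# ERE version of the sed above: groups are written ( ) instead of \( \),
# while \.\. still matches two literal dots.
raw="1F3C6..1F3CA
1F3CF..1F3D3
1F3E0..1F3F0"

printf '%s\n' "${raw}" | sed -E 's/^(.*)\.\.(.*)$/field 1 [\1] field 2 [\2]/'
```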
Answered By: Tigger

You could use IFS to split each line, discarding the empty field between the two dots:

#!/bin/sh
while IFS=. read -r a _ b
do
     printf 'field one=[%s] field two=[%s]\n' "$a" "$b"
done < "file"

Execute:

$ ./script
field one=[1F3C6] field two=[1F3CA]
field one=[1F3CF] field two=[1F3D3]
field one=[1F3E0] field two=[1F3F0]

Assuming that file is:

$ cat file
1F3C6..1F3CA
1F3CF..1F3D3
1F3E0..1F3F0
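To check what that read does with IFS=., here is a small sketch (not part of the answer above) that splits a single line and keeps the middle field in a variable named mid instead of _ so it can be inspected. With a non-whitespace IFS character, the two adjacent dots delimit an empty field:

```shell
#!/bin/sh
# Split one line on "." the same way the loop above does.
# The prefix assignment IFS=. only applies to this read.
line='1F3C6..1F3CA'
IFS=. read -r a mid b <<EOF
$line
EOF
# The middle field is empty because the two dots are adjacent.
printf 'a=[%s] mid=[%s] b=[%s]\n' "$a" "$mid" "$b"
```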
Answered By: user232326

awk’s field separator is treated as a regexp as long as it’s more than one character, and .. as a regexp means any two characters. You’d need to escape each . either with [.] or with \.:

awk -F'[.][.]' ...
awk -F'\\.\\.' ...

(The backslash itself also needs to be escaped, at least with some awks such as gawk, because the argument to -F undergoes \n/\b-style escape-sequence expansion.)
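A quick way to see the difference is to run both separators on one sample line. This sketch just contrasts -F'..' with -F'[.][.]'; it is not taken from the answer itself:

```shell
#!/bin/sh
# -F'..' is an ERE matching any two characters, so the whole line is
# consumed as separators and $1 is empty.
printf '%s\n' '1F3C6..1F3CA' | awk -F'..' '{print "[" $1 "]"}'    # prints []

# [.][.] matches two literal dots, so $1 and $2 are the code points.
printf '%s\n' '1F3C6..1F3CA' | awk -F'[.][.]' '{print $1, $2}'    # prints 1F3C6 1F3CA
```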

In your case:

awk -F' +|[.][.]' '/^[^#]/{print $1}' < output.txt
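To try that command without downloading the file, you can feed it a few made-up lines. The sample below is hypothetical: it only mimics the shape implied by the question’s cut -d' ' -f1 (a range or a single code point, then a space and more text) and is not copied from emoji-data.txt:

```shell
#!/bin/sh
# Hypothetical input: a comment, two ranges and a single code point,
# each followed by a space and some trailing text.
sample='# a comment line
1F3C6..1F3CA ; trailing text
1F3CF..1F3D3 ; trailing text
231A ; trailing text'

# FS is an ERE: one or more spaces, or two literal dots.
# /^[^#]/ skips comment (and empty) lines, so no separate sed is needed.
printf '%s\n' "$sample" | awk -F' +|[.][.]' '/^[^#]/{print $1}'
```

Both ranges and single code points come out as one value per line, since $1 is whatever precedes the first separator.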

In any case, avoid using shell loops to process text. Note also that read is not meant to be used like that, that echo should not be used for arbitrary data, and remember to quote your variables.
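Following that advice, the per-line shell loops in the other answers can be replaced by a single awk call. A minimal sketch; the function name label_ranges and the file name data.txt are hypothetical, and the output format simply mirrors the sed example in another answer:

```shell
#!/bin/sh
# Print both ends of each "XXXX..YYYY" range in one awk pass:
# no shell loop, no read, no echo. Lines without ".." get an empty field 2.
label_ranges() {
    awk -F'[.][.]' '{ printf "field 1 [%s] field 2 [%s]\n", $1, $2 }'
}

# Usage, assuming the ranges are in a file called data.txt:
#   label_ranges < data.txt
printf '%s\n' '1F3C6..1F3CA' '1F3E0..1F3F0' | label_ranges
```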

Answered By: Stéphane Chazelas

You can also use rev to reverse each line:

cat data.txt
1F3C6..1F3CA
1F3CF..1F3D3
1F3E0..1F3F0

First column:

cat data.txt | cut -d. -f1
1F3C6
1F3CF
1F3E0

Last column:

cat data.txt | rev | cut -d. -f1 | rev
1F3CA
1F3D3
1F3F0

Unfortunately, this method only works for your particular case (first or last column) and is not suitable for more general input data.
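For comparison, here is a sketch that gets both the first and the last column without rev, by swapping in awk for cut (this is not part of the answer above). awk’s $NF is the last field whatever the number of fields, and a single-character separator like . is taken literally rather than as a regexp:

```shell
#!/bin/sh
# $1 is the first field and $NF the last one, so the empty field
# between the two dots is skipped without reversing anything.
printf '%s\n' '1F3C6..1F3CA' '1F3CF..1F3D3' | awk -F. '{print $1, $NF}'
```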

Answered By: saver

I’ve created a patch that adds a new -m command-line option to cut, which works in field mode and treats multiple consecutive delimiters as a single delimiter. This solves the OP’s problem quite efficiently. I also submitted the patch upstream a couple of days ago; let’s hope it will be merged into the coreutils project.

I also have some further ideas for adding more whitespace-related features to cut, and feedback on all of that would be great. I’m willing to implement more patches for cut and submit them upstream, to make this utility more versatile and more usable in real-world scenarios.
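Until such an option is available in a released cut, a similar effect can be approximated with standard tools: tr -s squeezes each run of the delimiter into a single character before cut sees it. This is a workaround sketch, not the proposed -m option, and it squeezes every run of dots in the input, not just the ones you care about:

```shell
#!/bin/sh
# tr -s '.' collapses ".." into "." so that cut's single-character
# delimiter sees exactly two fields per range.
printf '%s\n' '1F3C6..1F3CA' '1F3CF..1F3D3' | tr -s '.' | cut -d'.' -f2
```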

Answered By: dsimic