Making single digit numbers between 1 and 9 into double digits, inside CSV file

I have a CSV file with thousands of lines like these

1664;4;5;35;37;43;5;6
1663;21;23;32;40;49;8;11
1662;16;17;34;35;44;5;10
1661;2;9;23;32;40;6;7
1660;23;25;30;44;47;9;12
1659;3;5;9;32;43;6;10
1658;4;6;10;13;34;3;5
1657;8;9;33;35;40;3;6
1656;15;20;31;44;48;1;3
1655;25;27;35;40;45;7;11
1654;7;32;33;34;38;6;9
1653;5;7;11;27;37;6;12
1652;7;31;33;35;36;7;10
1651;4;12;34;35;45;1;9
1650;5;8;29;35;48;5;6
1649;2;11;28;42;48;4;9
1648;2;11;12;19;38;4;8

You can see that the numbers between 1 and 9 are single digit.

How can I use sed or something to covert these number into double digit, by preceding a zero to them, such as

01 02 03 04 05 06 07 08 09

instead of

1 2 3 4 5 6 7 8 9

Thanks in advance.

Asked By: Duck

||

Given your input sample, it looks like

perl -pi -e 's/bdb/0$&/g' file.csv

Should do it.

That prepends a 0 to all digits that are both preceded and followed by a word boundary. Replace d with [1-9] if you want 0 to be left alone as the wording of your question suggests.

The same with the GNU implementation of sed

sed -i 's/b[[:digit:]]b/0&/g' file.csv

With recent versions of mlr (miller), you can also do:

mlr --csv -N --fs ';' put '$* = fmtifnum($*, "%02d")'

To format all numeric fields as integers 0-padded to a length of 2.

That will leave numbers in non-numeric fields (like in "foo 1 and 2";x-2) untouched, but note that it will also reformat non-integer numbers (1e0, 10e-1, 1.123, 0x1 would all be changed to 01 for instance).

To only pad the fields that consist of only one decimal digit (quoted or not):

mlr --csv -N --fs ';' put 'for (k in $*) {$[k] = sub(string($[k]),"^d$","0")}'
Answered By: Stéphane Chazelas

You can use sed to pick out a "word" that’s a single digit, and prefix it with zero:

sed -E 's/<([0-9])>/01/g' {file}

Overall:

  • The -E flag ensures that sed will use an Extended Regular Expression (ERE) rather than a standard/ordinary Regular Expression.
  • s/ …pattern… / …replacement… /g defines a substitution that applies an ERE match on …pattern… and corresponding replacement with …replacement…. Because of the g ("global") flag the match/replace is applied as many times as possible for a line.
  • There is no line selection so implicitly it applies this rule to every line in the input.

In the ERE match pattern:

  • < and > represent a word boundary
  • ( and ) identify a grouped expression that can be reused later
  • [0-9] is any single digit

In the ERE substitution:

  • 0 is a literal 0 character
  • 1 interpolates the text captured by the first grouped expression, i.e. an instance of a single digit

Output:

1664;04;05;35;37;43;05;06
1663;21;23;32;40;49;08;11
1662;16;17;34;35;44;05;10
1661;02;09;23;32;40;06;07
1660;23;25;30;44;47;09;12
1659;03;05;09;32;43;06;10
1658;04;06;10;13;34;03;05
1657;08;09;33;35;40;03;06
1656;15;20;31;44;48;01;03
1655;25;27;35;40;45;07;11
1654;07;32;33;34;38;06;09
1653;05;07;11;27;37;06;12
1652;07;31;33;35;36;07;10
1651;04;12;34;35;45;01;09
1650;05;08;29;35;48;05;06
1649;02;11;28;42;48;04;09
1648;02;11;12;19;38;04;08

In this case it doesn’t matter whether the file really is CSV (fields separated by comma, ","), or if it has fields separated by semicolon (";"), or even fields separated by whitespace.

If your sed supports -i you can edit the file "in place", but note that in common with most other utilities the "in place" effect is actually implemented behind the scenes as "write to a temporary file, and then replace the original file with that temporary one":

sed -E 's/<([0-9])>/01/g' file >file.$$.tmp &&
    mv -f file.$$.tmp file
rm -f file.$$.tmp
Answered By: roaima

Here is a solution using awk:

awk 'BEGIN{FS=OFS=";"} {for (i=1;i<=NF;i++) $i=sprintf("%02d",$i)} 1' file.csv

This will first set the input and output field separator to ;. It will then iterate over all fields and normalize the field width to at least two digits with left zero-padding by using sprintf() (which effectively won’t change the field content if it is already a two-digit number).

The seemingly stray 1 outside of the action blocks instructs awk to print the current line including all changes.

Note that awk will not by default modify the file, but only print to stdout, so you will either

  • have to redirect output to a temporary file and rename, as in
    awk ' ... ' file.csv > output.csv && mv output.csv file.csv
    
  • or use an awk version that supports the -i inplace extension:
    awk -i inplace ' ... ' file.csv
    

Also note that this particular case is tailored to your example input which only comprises single- or double-digit numbers. For the more general case of padding to n digits, you would use

awk -v n=3 'BEGIN{FS=OFS=";"} {for (i=1;i<=NF;i++) $i=sprintf("%0*d",n,$i)} 1' file.csv
Answered By: AdminBee

Using Miller with type inference (-A) and outputing integer values with zero padding and minimum field width of 2:

mlr -A --nidx --fs ';' --ofmt '%02d' cat file

Note that based on your input sample I’ve assumed the data is actually index-numbered rather than CSV.

Answered By: steeldriver

Using Raku (formerly known as Perl_6)

~$ raku -MText::CSV -e 'my @a = csv(in => $*IN, sep => ";");        
                        @a = @a>>.map( {sprintf "%.2d", $_ } );    
                        csv(in => @a, out => $*OUT, sep => ";");'  file 

The code above uses Raku’s Text::CSV module. In the first statement the CSV file is read-in, setting the sep parameter to accept ; semicolons as column separators. In the second statement the sprintf directive "%.2d" sets each column to a minimum width of 2, zero-padded. In the output ; semicolons are restored as column separators, otherwise the default will be used (comma).

The above answer holds for CSV files with every column being integer. So far unsigned integers are NYI (not-yet-implemented). It might be preferable to use the string form of sprintf, taking care to add 0 to the formatting code, which directs padding with zeros insted of blank spaces. Alternate second statement below:

@a = @a>>.map( {sprintf "%02s", $_ } );

Sample Input:

1664;4;5;35;37;43;5;6
1663;21;23;32;40;49;8;11
1662;16;17;34;35;44;5;10
1661;2;9;23;32;40;6;7
1660;23;25;30;44;47;9;12
1659;3;5;9;32;43;6;10
1658;4;6;10;13;34;3;5
1657;8;9;33;35;40;3;6
1656;15;20;31;44;48;1;3
1655;25;27;35;40;45;7;11
1654;7;32;33;34;38;6;9
1653;5;7;11;27;37;6;12
1652;7;31;33;35;36;7;10
1651;4;12;34;35;45;1;9
1650;5;8;29;35;48;5;6
1649;2;11;28;42;48;4;9
1648;2;11;12;19;38;4;8

Sample Output (either example above):

1664;04;05;35;37;43;05;06
1663;21;23;32;40;49;08;11
1662;16;17;34;35;44;05;10
1661;02;09;23;32;40;06;07
1660;23;25;30;44;47;09;12
1659;03;05;09;32;43;06;10
1658;04;06;10;13;34;03;05
1657;08;09;33;35;40;03;06
1656;15;20;31;44;48;01;03
1655;25;27;35;40;45;07;11
1654;07;32;33;34;38;06;09
1653;05;07;11;27;37;06;12
1652;07;31;33;35;36;07;10
1651;04;12;34;35;45;01;09
1650;05;08;29;35;48;05;06
1649;02;11;28;42;48;04;09
1648;02;11;12;19;38;04;08

Input/Output: You can redirect output to a new file taking the excellent code from either @AdminBee or @roiama (or both). If you want to direct Input/Output from within-the-code itself, the $*IN and $*OUT dynamic values can be replaced by PATHs (see second URL below for details).

https://docs.raku.org/routine/sprintf
https://github.com/Tux/CSV
https://raku.org

Answered By: jubilatious1

Super crude.
Super slow.
May be useful.

Using any text editor –
replace all occurrences of ",1," with ",01," etc

Single digits at the end of line need to eg
replace all occurrences of ",3[EOL identifier]" with ",03[EOL identifier]"

Answered By: Russell McMahon
Categories: Answers Tags: ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.