Parsing CSV with AWK, returning fields to bash var with line breaks

I have to migrate a password database (Keepass) by using a csv file of it, to a new application by using its API. The API is updated with post requests, those needs JSON data format.
What I need to do is to use the KeePass CSV to export the passwords and other pieces of information linked to it to the API. I decided todo a script using bash and awk.

The columns of the csv file are arranged like this :

"Group","Title","Username","Password","URL","Notes","TOTP","Icon","Last Modified","Created"

The field "notes" is multiline because some of the comments have line breaks into them.

"That's an important note, <br/>
some extra infos <br/>
concerning a password"

Here’s an example of the API request to post the data, the data field is in JSON format :

I didn’t put all the needed fields on this request but you can already see how it would work. Some of the field names are different because the KeePass and the API fields name are differently made


curl -s --request PUT -u username123:password123 -H 'Content-Type: application/json; charset=utf-8' --data-binary @- <<DATA
"name": "$var1",
"username": "$var2",
"password": "$var3"

I have planned to parse my CSV file field by field and then when I finish to parse the row, I do my API request to post the password in the database. Then I do this for every remaining rows.

To process the CSV I find the AWK language, it seems very handy and quite useful for my situation. I’ve come with multiple testing on my file with the gsub command, helping me to replace the line breaks (n). I don’t really know how to go further. Here’s some of them (only the first work :

cat keepass.csv | awk NF=NF RS=/n/ OFS=n
cat keepass.csv |awk 'BEGIN {RS=","}{gsub("/n/","",$0); print $0}'
cat keepass.csv | awk 'BEGIN {RS=""}{gsub(/n/,"",$6); print $0}'

I also know that you can share bash var by adding -v after awk. Here’s the closest code I could have.

awk -v RS='"n' -v FPAT='"[^"]*"|[^,]*' '{
print "Row n°", NR, ""
for (i=1; i<=NF; i++) {
sub(/^"/, "", $i)
printf "Field %d, value=[%s]n", i, $i
}} keepass.csv

What I am looking for, would be a command to parse any column of my csv by taking in account the multilines notes and input them into the global var of bash in JSON format.

I think you need to structure it by doing something like :

awk -v 'BEGIN{parsing and replacing keeping 'n' of notes}
if end of row,
return boolean to bash for processing the API requests, wait,
restart the loop}''

I’m new to scripting, I think it can be done in only a few lines but I’m unsure on how to proceed. I can change the language to python if needed and I can add some tools to my code.

Asked By: Plourfi


The field "notes" is multiline

No! CSV does not support multi-line. Absolutely not.

There are two common approaches if you need to store a multi-line string in CSV.

  1. Change the n character in the original string to something other. Usually it is just a two-letter "n" string.
  2. Change the record delimiter from n into some other character which will not appear in the string. Often it is something like x01.

The first approach would require some additional pre- and post-conversion, but pretty reliable.

The second – works very fine and easy, but not all CSV-able application can change record delimiter, and there is always a possibility that the character you used as record delimiter would appear inside a field.

If by some reason you cannot recreate the file and make it a proper CSV, I would suggest to go with a loop:

while read row
 if row has 6 fields ("Notes" is a 6th field)
     append "n" to it
     append next row to it
   repeat until the working row has 10 fields
   export the work row to output file
 end if
end while

Any language can do it (even bash).

Answered By: White Owl

The multi line is a feature of a CSV cell, and you can use an utility that is CSV aware as Miller.

As examples, if you have this CSV you can run

  • mlr --csv cut -f fieldA acr.csv to cut the first column
  • mlr --icsv --ojson cut -f fieldA acr.csv to cut the first column and convert all to JSON
    "fieldA": "That's an important note,nsome extra infosnConcerning a passwordnIpsum"
    "fieldA": "hello"

As you can see Miller is aware of cell carriage return (RFC4180 compliant).

Below an image of the sample input file.

enter image description here

Answered By: aborruso
Categories: Answers Tags: , , , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.