Convert json mapping object to managed csv row using bash script

I have a JSON mapping under the properties key in a file Customer.json, as below:

{
    "customer": {
        "properties": {
            "customerId": {
                "type": "string",
                "index": "not_analyzed"
            },
            "name": {
                "type": "string",
                "index": "not_analyzed"
            }
        }
    }
}

I want to convert this into the following, where key and displayName both repeat the property name and type comes from the mapping above:

field(key: 'customerId',     displayName: 'customerId', type: 'String')
field(key: 'name',           displayName: 'name',       type: 'String')

I tried bash + Python as below, assuming it would fetch the customer key first and then loop inside properties:

$ cat Customer.json | python -c 'import sys; import simplejson as json; 
print "\n".join( [i["properties"] for i in json.loads( sys.stdin.read() )["customer"]] )'

Traceback (most recent call last):
File "<string>", line 2, in <module>
TypeError: string indices must be integers, not str

I’m open to other solutions as well.

Asked By: prayagupa


The error seems clear to me: the variable i is a string, because the for loop iterates over the value of the "customer" dict/mapping. That value is itself a dict/mapping, and iterating over it gives you the succession of its keys (i.e. ["properties"]).

cat Customer.json |  python -c 'import sys; import simplejson as json; 
print "\n".join( [i for i in json.loads( sys.stdin.read() )["customer"]["properties"] ] )'

will give you

 customerId
 name

and the following should get you closer to your goal:

cat Customer.json |  python -c 'import sys; import simplejson as json; 
print "\n".join( ["{} {}".format(k, v) for k,v in json.loads( sys.stdin.read() )["customer"]["properties"].iteritems() ] )'

which gives:

customerId {'index': 'not_analyzed', 'type': 'string'}
name {'index': 'not_analyzed', 'type': 'string'}

From there I would recommend turning the Python into an actual script. You will have to decide how to get from string to String and do the additional formatting. A multi-line script is always easier to debug (a point proven by your question) and to maintain, and gives more meaningful error messages (with line numbers).
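Such a script might look like the following sketch (the filename convert_mapping.py, the capitalization rule string → String, and the exact column alignment are assumptions based on the desired output shown in the question):

```python
#!/usr/bin/env python3
# convert_mapping.py -- a sketch of the multi-line script suggested above.
import json
import sys

def to_fields(mapping):
    """Turn the customer.properties sub-object into field(...) lines,
    capitalizing the JSON type (e.g. 'string' -> 'String')."""
    props = mapping["customer"]["properties"]
    # Pad columns so displayName and type line up; +3 covers quotes and comma.
    width = max(len(k) for k in props) + 3
    lines = []
    for key, value in props.items():
        quoted = "'{}',".format(key)
        lines.append("field(key: {0:<{w}} displayName: {1:<{w}} type: '{2}')".format(
            quoted, quoted, value["type"].capitalize(), w=width))
    return lines

if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as f:
        print("\n".join(to_fields(json.load(f))))
```

Run as `python3 convert_mapping.py Customer.json`; a traceback from this script will point at a real line number instead of `File "<string>", line 2`.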

Answered By: Anthon

Parsing such structured data is best done using a dedicated parser, just as you are doing. However, in this particular case, it is simple enough that you could do:

$ grep -B 1 '"type":' Customer.json | tr '"' "'" | sed 's/[:,]//g' | 
    awk '{print "field(key: "$1",\tdisplayName: "$1",\ttype: "$NF")"}' RS="--" 

Which returns:

field(key: 'customerId',    displayName: 'customerId',   type: 'string')
field(key: 'name',  displayName: 'name',     type: 'string')
Answered By: terdon

With jq:

jq -r '.customer.properties | to_entries[] | 
    "field(key: \u0027\(.key)\u0027, displayName: \u0027\(.key)\u0027, type: \u0027\(.value.type)\u0027)"' Customer.json

This turns the properties object into a list of objects with the keys key and value. Each key holds one of the property names (i.e., customerId and name), and each value holds the corresponding properties sub-object.

Given one such entry, the key value and the type value of the value entry are extracted and inserted into the string template via jq's \(...) interpolation. Each \u0027 that you see in the string is an escaped single-quote character.

The output of this, given your example input file, would be

field(key: 'customerId', displayName: 'customerId', type: 'string')
field(key: 'name', displayName: 'name', type: 'string')
Answered By: Kusalananda
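For completeness, if jq is unavailable, the same to_entries-style transformation can be mirrored in plain Python (a sketch; like the jq answer, it keeps the lowercase type from the mapping):

```python
import json

# The example document from the question, inlined for illustration.
doc = json.loads("""
{ "customer": { "properties": {
    "customerId": {"type": "string", "index": "not_analyzed"},
    "name":       {"type": "string", "index": "not_analyzed"}
} } }
""")

# Mirror jq's to_entries[]: each property becomes a (key, value) pair.
lines = ["field(key: '{0}', displayName: '{0}', type: '{1}')".format(k, v["type"])
         for k, v in doc["customer"]["properties"].items()]
print("\n".join(lines))
```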