XML is complicated, and parsing it natively is fairly hard, even in scripting languages. Luckily, there is a utility that can convert XML to JSON, which is easier to work with, both in scripts and on the command line.
Use the xq utility
You will want to use a dedicated utility for this rather than trying to parse XML with something like regex, which is a bad idea. There is a utility called xq which is perfect for this task. It is installed together with yq, which works for YAML. You can install yq from pip:
pip install yq
Under the hood, this utility uses jq to handle working with JSON, so you will also need to download the jq binary and move it somewhere on your PATH (/usr/local/bin/ should work fine).
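Before going further, it can help to confirm that both tools are actually reachable from your shell. The following is a minimal sketch; check_tool is a hypothetical helper name made up for this example, not part of jq or yq.

```shell
# check_tool is a hypothetical helper: it reports whether a command
# can be found through PATH, and returns non-zero when it cannot.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: found at $(command -v "$1")"
  else
    echo "$1: not found on PATH" >&2
    return 1
  fi
}

# "|| true" keeps the script going even if a tool is missing.
check_tool jq || true
check_tool xq || true
```

If either line reports "not found", revisit the install step above before piping any XML into xq.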
Now, you will be able to parse XML input by piping it into xq:
cat xml | xq .
The . operator means you want to convert all of the XML to JSON. You can use full jq syntax here to select sub-elements, which you can read about in our guide.
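As a rough illustration of selecting a sub-element, the sketch below pulls one value out of a small document. The book document and its element names are invented for this example, and the -r option is jq's raw-output flag, which xq passes through to jq.

```shell
# A made-up sample document, inlined so the snippet needs no input file.
xml='<book><title>XML Basics</title><year>2020</year></book>'

if command -v xq >/dev/null 2>&1; then
  # .book.title walks from the root element down to <title>;
  # -r prints the string without the surrounding JSON quotes.
  title=$(printf '%s' "$xml" | xq -r '.book.title')
  echo "$title"
else
  echo "xq is not installed; skipping" >&2
fi
```

The filter is ordinary jq syntax, so anything that works on the JSON form of the document should work here too.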
xq can also output the result as XML with the -x flag:
xq -x
This lets you use jq's selection syntax to parse XML while keeping the result in XML form. It can't seem to convert in the other direction, though, as xq still wants XML as its input.
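A small sketch of the -x flag in practice is shown below, again using an invented book document. The filter rebuilds an object with a single top-level key, on the assumption that the XML output needs exactly one root element to be well-formed.

```shell
# A made-up sample document, inlined so the snippet needs no input file.
xml='<book><title>XML Basics</title><year>2020</year></book>'

if command -v xq >/dev/null 2>&1; then
  # Keep only <title>, wrapped in a single <book> root, and emit XML.
  trimmed=$(printf '%s' "$xml" | xq -x '{book: {title: .book.title}}')
  echo "$trimmed"
else
  echo "xq is not installed; skipping" >&2
fi
```

The exact whitespace and indentation of the result may vary, but the year element should be gone while the title is kept.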
A roadblock with converting XML to JSON
XML to JSON is not a perfect conversion: in XML, the order of the elements can matter, and keys can be duplicated. Take a document like:
<e> <a>some</a> <b>textual</b> <a>content</a> </e>
This would cause an error if translated directly to JSON, because the a key exists twice. So it is converted to an array, which breaks the order. xq returns the following output for that bit of XML:
{
  "e": {
    "a": [
      "some",
      "content"
    ],
    "b": "textual"
  }
}
This is technically correct, but not ideal in every situation. Double-check that your XML converts cleanly before relying on the output.
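One consequence is that the same element comes back as a string when it appears once and as an array when it repeats, which can trip up scripts. A possible workaround, sketched here with plain jq syntax, is to normalize the value to an array yourself. The variable names are made up for this example, and -c is jq's compact-output option passed through by xq.

```shell
# Two inputs: <a> appears once in the first and twice in the second.
single='<e><a>only</a></e>'
repeated='<e><a>some</a><b>textual</b><a>content</a></e>'

# jq filter: wrap a lone value in an array, leave real arrays alone.
as_list='.e.a | if type == "array" then . else [.] end'

if command -v xq >/dev/null 2>&1; then
  one=$(printf '%s' "$single" | xq -c "$as_list")
  many=$(printf '%s' "$repeated" | xq -c "$as_list")
  echo "$one"    # ["only"]
  echo "$many"   # ["some","content"]
else
  echo "xq is not installed; skipping" >&2
fi
```

This keeps downstream code on a single shape, though it does nothing to recover the original ordering between the a and b elements.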