Formatting XML Files with xmllint
I am currently working on a project where I need to parse Open XML files coming from the Office suite. As you may already know, this involves opening what is essentially a zip file full of XML files.
When writing a parser, though, it is very useful to have source files for reference, and the files saved in these zip archives are unformatted, i.e. they have no newlines or indentation, which makes navigating them very painful.
After opening a file, I usually ran this vim command:
%!xmllint --format %
which runs the file through xmllint, formats it, and then replaces the buffer's content with the formatted version of the file.
This worked well. However, I kept opening new files and kept forgetting to save after formatting them… so that command got run a lot!
First try to solve it
I decided to just format all the XML files in one go right after unpacking the zip. I thought this would be as easy as:
find . -name "*.xml*" -exec xmllint --format {} \;
However, this just prints each formatted file to stdout, and I found no xmllint option for formatting a file in place.
Second try
Well, maybe we could just redirect the output back into the file.
find . -name "*.xml*" -exec xmllint --format {} > {} \;
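But, alas, this did not work. The command now appeared to do nothing. Maybe because you cannot reference the filename multiple times (turns out, you can…). The actual cause is that the shell handles the > {} redirection itself, before find ever runs, so all the output lands in a single file literally named {} and the XML files are left untouched.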
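To check where the output of this attempt really goes, here is a small self-contained sketch. It uses cat as a stand-in for xmllint --format (an assumption, so it runs without xmllint installed), and the file names are made up for the demo.

```shell
#!/bin/bash
# Demo only: cat stands in for "xmllint --format"; one.xml and two.xml are made-up files.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
printf '<a><b/></a>' > one.xml
printf '<c><d/></c>' > two.xml

# The shell opens a file literally named "{}" before find even starts.
# find only sees "-exec cat {} ;", and everything it prints goes into that file.
find . -name "*.xml*" -exec cat {} > {} \;

ls          # a file named {} now sits next to the two XML files
cat '{}'    # it holds the output for both files; the originals are unchanged
```

So the command did do something, just not in the files it was meant to change.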
Third try
OK, so I created a script to wrap the command in instead.
#!/bin/bash
xmllint --format "$1" > "$1"
and called it like this:
find . -name "*.xml*" -exec format.sh {} \;
But now every file gives us this error from xmllint:
parser error : Document is empty
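So we are destroying the file before xmllint gets to parse it: the shell empties the output file while setting up the redirection, and only then starts xmllint on what is now an empty file.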
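The truncation can be reproduced without xmllint. In this minimal sketch, wc -c stands in for xmllint and doc.xml is a made-up file, both assumptions for the demo.

```shell
#!/bin/bash
# Demo only: wc -c stands in for xmllint; doc.xml is a made-up file.
cd "$(mktemp -d)" || exit 1
printf '<a><b/></a>' > doc.xml    # 11 bytes of XML

# The shell truncates doc.xml while setting up the redirection,
# and only then starts wc, which finds an empty file.
wc -c doc.xml > doc.xml

cat doc.xml    # wc reported a size of 0 for doc.xml
```

Any command that reads a file and has its output redirected to that same file runs into this, which is why the temporary file is needed.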
Final try
Instead of finding an elegant way to do this, I thought: well, if I had to do this by hand, I would just format to a new file, then remove the original file, and rename the new file to the old name…
So that’s what I did:
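#!/bin/bash
# Write the formatted output to a temporary file first
xmllint --format "$1" > "$1.tmp"
# Then swap it in place of the original (quotes keep odd file names intact)
rm "$1"
mv "$1.tmp" "$1"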
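One thing to be aware of: the script above removes the original even when xmllint fails, for example on a malformed file. As a sketch of a more defensive variant, the function below only replaces the original when xmllint exits successfully. The function name format_in_place and the mktemp-based temporary file are illustrative assumptions, not part of the script above.

```shell
#!/bin/bash
# Hypothetical helper: format one XML file in place,
# leaving the original untouched if xmllint fails.
format_in_place() {
    local file=$1 tmp
    # Create the temp file next to the original so the final mv is a plain rename.
    tmp=$(mktemp "$file.XXXXXX") || return 1
    if xmllint --format "$file" > "$tmp"; then
        mv "$tmp" "$file"
    else
        rm -f "$tmp"    # discard the partial output, keep the original
        return 1
    fi
}
```

Once sourced into a shell it could be called as, say, format_in_place word/document.xml (an example path).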
I then extended it a bit and placed it in my scripts folder:
#!/bin/bash
if [[ $1 == "all" ]]
then
    # Re-run this script once for every XML file below the current directory
    find . -name "*.xml*" -exec "$0" {} \;
else
    xmllint --format "$1" > "$1.tmp"
    rm "$1"
    mv "$1.tmp" "$1"
fi
and now, from inside an unzipped folder of XML files, I can just run
formatxml all
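As an aside, the xmllint man page lists an --output option for writing the result to a file instead of stdout. Assuming it is safe to point that option at the very file being read (not verified here, so try it on a copy first), the whole job might shrink to a single find call. The function name format_tree is hypothetical.

```shell
#!/bin/bash
# Hypothetical wrapper: format every XML file below a directory (default: .).
# Assumption: xmllint finishes parsing the input before it writes --output,
# so naming the same file twice does not clobber it. Test on a copy first.
format_tree() {
    find "${1:-.}" -name "*.xml*" -exec xmllint --format --output {} {} \;
}
```

Here find substitutes the file name for both occurrences of {}, and no shell redirection is involved.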