xmlfuzzer — an XML fuzzing tool
xmlfuzzer takes an XML Schema as input and returns a valid XML document filled with random data.
Features
It supports:
- XML Schemas;
- various restrictions for generated tree — by elements, by tags, etc;
- fast batch mode.
It doesn't (yet) support:
- XML Document Type Definitions (DTD);
- some features from XML Schema — wildcards, pattern (regexp) restrictions;
- generating invalid XML documents;
- generating mutated XML from samples (but untidy does);
- running client-side applications, watching for crashes and memory leaks, etc. It only generates samples for fuzzing.
In general, this project is developed only as needed, because we have more interesting things to work on.
Download
- The latest sources from the darcs repository:
darcs get http://komar.in/darcs/xmlfuzzer/
- Release sources in tarballs. The latest version of xmlfuzzer is 0.04.
- Compiling an OCamlduce application is not much fun, so you can just download pre-compiled binaries.
Building
First you need to get:
- an OCamlduce compiler >= 3.11;
- GNU Make;
- xml-light;
- extlib for OCaml.
Then simply run make to build the executable files.
Runtime dependencies
The following runtime dependencies are optional:
- tidy — just for pretty-printing;
- GNU parallel — a tool for executing jobs in parallel.
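Because tidy is optional, a wrapper script can fall back to unformatted output when it is not installed. The following is a minimal sketch of that idea. The function name pretty_xml is hypothetical and not part of xmlfuzzer; the tidy flags are the ones used in the examples in this document.

```shell
# Hypothetical helper, not part of xmlfuzzer: pretty-print XML from stdin
# with tidy when it is installed, otherwise pass the input through unchanged.
pretty_xml() {
    if command -v tidy >/dev/null 2>&1; then
        # Same tidy flags as in the examples in this document.
        # tidy exits non-zero on warnings, so its status is ignored here.
        tidy -utf8 -xml -i 2>/dev/null || true
    else
        cat
    fi
}
```

It would be used at the end of a pipeline, for example: ./xmlfuzzer -xsd path-to-schema.xsd -root-elem document 2> /dev/null | pretty_xml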
How to use
The core tool, which generates a random XML document from an XML Schema, is called xmlfuzzer (or xmlfuzzer.byte for the bytecode version).
$ ./xmlfuzzer -xsd ../ooxml/OfficeOpenXML-XMLSchema/wml.xsd -root-elem document -max-elem 10 2> /dev/null | tidy -utf8 -xml -i 2> /dev/null
<w:document xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships"
xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
<w:body>
<w:sdt>
<w:sdtContent>
<w:sdt>
<w:sdtPr />
</w:sdt>
</w:sdtContent>
</w:sdt>
<w:sectPr w:rsidDel="Be0N" w:rsidRPr="-4R^">
<w:footerReference w:type="default" r:id="" />
<w:footnotePr>
<w:pos w:val="pageBottom" />
<w:numStart w:val="1584397285253833485" />
<w:numRestart w:val="continuous" />
</w:footnotePr>
<w:endnotePr>
<w:numStart w:val="-4241193316720752020" />
</w:endnotePr>
</w:sectPr>
</w:body>
</w:document>
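To collect several samples instead of printing one to the terminal, the invocation above can be wrapped in a plain loop. This is a naive sketch, not xmlfuzzer's own batch mode (whose options are not described here). The function name generate_samples, the sample-N.xml naming, and the fixed -max-elem 10 are assumptions made for illustration; only the -xsd, -root-elem and -max-elem flags come from the example above.

```shell
# Hypothetical helper, not part of xmlfuzzer.
# usage: generate_samples FUZZER XSD ROOT_ELEM COUNT OUTDIR
generate_samples() {
    fuzzer=$1 xsd=$2 root=$3 count=$4 outdir=$5
    mkdir -p "$outdir" || return 1
    i=1
    while [ "$i" -le "$count" ]; do
        # Flags are the ones shown in the example above;
        # diagnostics on stderr are discarded, as in that example.
        "$fuzzer" -xsd "$xsd" -root-elem "$root" -max-elem 10 \
            > "$outdir/sample-$i.xml" 2>/dev/null || return 1
        i=$((i + 1))
    done
}
```

For example, generate_samples ./xmlfuzzer ../ooxml/OfficeOpenXML-XMLSchema/wml.xsd document 100 samples would write samples/sample-1.xml through samples/sample-100.xml. Each iteration starts a new process and re-reads the schema, so this is only suitable for small runs.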
Example: generating random OOXML documents
Quickstart:
wget http://komar.in/src/xmlfuzzer/schemes/OfficeOpenXML-XMLSchema.tar.gz
tar -xf OfficeOpenXML-XMLSchema.tar.gz
./ooxml/ooxml.bash OfficeOpenXML-XMLSchema
First, you need to download the schema of this crappy format, then extract it into any directory.
An OOXML document is not just an XML file but a zip archive that contains several files. I've written everything necessary for generating valid OOXML documents and stored it in the ooxml/ directory of the xmlfuzzer distribution.
Then run ooxml/ooxml.bash to generate 1000000 OOXML samples:
$ ./ooxml/ooxml.bash path-to-OOXML-schema
But it will run very slowly, because all operations are executed sequentially. If you don't want to fall asleep at your workplace, use the parallel version of this script (which requires GNU parallel):
$ ./ooxml/ooxml-parallel.bash path-to-OOXML-schema
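The general idea of running independent jobs through GNU parallel, with a sequential fallback when it is missing, can be sketched as follows. This is not taken from ooxml-parallel.bash; the function name run_jobs and the convention that the worker receives a job number as its only argument are assumptions made for illustration.

```shell
# Hypothetical helper, not taken from ooxml-parallel.bash.
# usage: run_jobs WORKER COUNT
# Runs "WORKER n" for n = 1..COUNT, concurrently when GNU parallel is available.
# WORKER must be a path without spaces: GNU parallel joins its arguments
# into one shell command line.
run_jobs() {
    worker=$1 count=$2
    if parallel --version 2>/dev/null | grep -q 'GNU parallel'; then
        # GNU parallel replaces {} with each input line and runs the jobs
        # concurrently (one job per CPU core by default).
        seq 1 "$count" | parallel "$worker" {}
    else
        # Sequential fallback: same results, just slower.
        for n in $(seq 1 "$count"); do
            "$worker" "$n" || return 1
        done
    fi
}
```

The --version check is there because some systems ship a different program under the name parallel, with an incompatible command line.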
If you want to make the XML files more readable, install tidy, open ooxml-iter.bash, and replace these lines:
cp $batch/$file $base/word/document.xml
# replace with this line if you want pretty-printing
#tidy -xml -i -utf8 < $batch/$file 2> /dev/null > $base/word/document.xml;
with these:
#cp $batch/$file $base/word/document.xml
# replace with this line if you want pretty-printing
tidy -xml -i -utf8 < $batch/$file 2> /dev/null > $base/word/document.xml;
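The lines above place each generated file at $base/word/document.xml. Since an OOXML document is a zip archive, the remaining step is to pack that directory. The sketch below shows one way this could be done with Info-ZIP's zip; it is not taken from the ooxml/ scripts, and the function name pack_ooxml is hypothetical. It assumes the base directory already holds the other parts of the package, such as [Content_Types].xml and _rels/.rels.

```shell
# Hypothetical helper, not taken from the ooxml/ scripts.
# usage: pack_ooxml BASE_DIR OUTPUT_FILE
# BASE_DIR is assumed to already hold a complete package layout
# ([Content_Types].xml, _rels/.rels, word/document.xml, ...).
pack_ooxml() {
    base=$1 out=$2
    # zip runs inside BASE_DIR, so make the output path absolute first.
    case $out in
        /*) ;;
        *) out=$PWD/$out ;;
    esac
    rm -f "$out"    # zip would otherwise update an existing archive
    # Archive the directory contents so entry names start at the package
    # root (word/document.xml, not BASE_DIR/word/document.xml).
    (cd "$base" && zip -q -r "$out" .)
}
```

The output file should live outside BASE_DIR, otherwise zip would try to add the archive to itself.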
Here you can see some interesting screenshots of generated OOXML documents:
Example: generating random XHTML
For this example you need the XML Schema for XHTML.
I set -no-cdata in this example just to keep the listing short and readable.
$ ./xmlfuzzer -xsd ../xhtml/xhtml1-transitional.xsd -root-elem html -max-elem 10 -skip-namespaces -no-cdata | tidy -xml -i -utf8
Import ../xhtml/xml.xsd
<html lang="" dir="">
<head lang="" profile="" dir="">
<meta lang="" content="" />
<base id="" target="" />
<isindex lang="" id="" />
<title id="" dir="" />
<isindex lang="" dir="" title="" />
</head>
<body lang="" link="" class="" onmouseup="" bgcolor=""
onkeypress="" dir="" style="" onmouseover="" title="" onunload=""
alink="" background="" vlink="" onkeydown="">
<hr class="" onclick="" size="" onkeyup="" onkeypress="" id=""
onmousemove="" onmouseover="" title="" onmouseout=""
noshade="noshade" ondblclick="" />
</body>
</html>
Example of results in Firefox:
Any questions?
If you have any, I think you should ask, because generating samples for file fuzz testing is very specific to each task. You will probably have to modify the code of the fuzzer's core, and it's no fun to analyze my code. Of course, I may have neither the time nor the desire to help with your task, but it can't hurt to tell me what you want to do, and I'll tell you how I can help.
Credits
- written by Alexander Markov
- logo by Voker57
- beer by Center of Innovative Security Solutions
- and many thanks to OCamlduce guys