xmlfuzzer — an XML fuzzing tool

Xmlfuzzer takes an XML Schema as input and returns a valid XML document filled with random data.

Features

It supports:

XML Schemas;
various restrictions on the generated tree: by elements, by tags, etc.;
fast batch mode.

It doesn't (yet) support:

XML Document Type Definitions (DTD);
some features from XML Schema — wildcards, pattern (regexp) restrictions;
generating invalid XML documents;
generating mutated XML from samples (but untidy does);
running client-side applications, watching for crashes and memory leaks, etc. (it only generates samples for fuzzing).

In general, this project is developed only as needed, because we have more interesting things to work on.

Download

The latest sources are in the darcs repository: darcs get http://komar.in/darcs/xmlfuzzer/
Release sources are available as tarballs. The latest version of xmlfuzzer is 0.04.
Compiling an OCamlduce application is not much fun, so you can just download pre-compiled binaries.

Building

First you need to get:

Then simply run make to build executable files.

Runtime dependencies

These runtime dependencies are optional:

tidy — just for pretty-printing;
GNU parallel — a tool for executing jobs in parallel.

How to use

There is a core tool that generates a random XML document from an XML Schema. It's called xmlfuzzer (or xmlfuzzer.byte for the bytecode version).


$ ./xmlfuzzer -xsd ../ooxml/OfficeOpenXML-XMLSchema/wml.xsd -root-elem document -max-elem 10 2> /dev/null | tidy -utf8 -xml -i 2> /dev/null
<w:document xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships"
xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:body>
    <w:sdt>
      <w:sdtContent>
        <w:sdt>
          <w:sdtPr />
        </w:sdt>
      </w:sdtContent>
    </w:sdt>
    <w:sectPr w:rsidDel="Be0N" w:rsidRPr="-4R^">
      <w:footerReference w:type="default" r:id="" />
      <w:footnotePr>
        <w:pos w:val="pageBottom" />
        <w:numStart w:val="1584397285253833485" />
        <w:numRestart w:val="continuous" />
      </w:footnotePr>
      <w:endnotePr>
        <w:numStart w:val="-4241193316720752020" />
      </w:endnotePr>
    </w:sectPr>
  </w:body>
</w:document>
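
The invocation above can be wrapped in a loop to produce several samples at once. This is only a minimal sketch and not part of the distribution: the function name generate_samples, the XMLFUZZER variable and the sample-N.xml file naming are made up for illustration, and only the -xsd, -root-elem and -max-elem options are taken from the example above. It does not use xmlfuzzer's own batch mode, and it assumes that every run yields a fresh random document.

```shell
# Hypothetical helper: write COUNT random documents into OUTDIR.
# XMLFUZZER may point at xmlfuzzer.byte or another build (assumed variable);
# it defaults to ./xmlfuzzer.
# The body is a subshell, so its variables do not leak into the caller.
generate_samples() (
    xsd=$1
    root=$2
    count=$3
    outdir=$4
    fuzzer=${XMLFUZZER:-./xmlfuzzer}

    mkdir -p "$outdir" || exit 1
    i=1
    while [ "$i" -le "$count" ]; do
        # Same options as in the example above; diagnostics are discarded.
        "$fuzzer" -xsd "$xsd" -root-elem "$root" -max-elem 10 \
            > "$outdir/sample-$i.xml" 2> /dev/null || exit 1
        i=$((i + 1))
    done
)
```

For example, generate_samples ../ooxml/OfficeOpenXML-XMLSchema/wml.xsd document 100 samples would fill samples/ with sample-1.xml to sample-100.xml.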

Example: generating random OOXML documents

Quickstart:


wget http://komar.in/src/xmlfuzzer/schemes/OfficeOpenXML-XMLSchema.tar.gz
tar -xf OfficeOpenXML-XMLSchema.tar.gz
./ooxml/ooxml.bash OfficeOpenXML-XMLSchema

First, download the schema of this crappy format, then extract it into any directory.

An OOXML document is not just an XML file but a zip archive containing several files. I've written everything necessary for generating valid OOXML documents and put it in the ooxml/ directory of the xmlfuzzer distribution.

Then run ooxml/ooxml.bash to generate 1000000 OOXML samples:

$ ./ooxml/ooxml.bash path-to-OOXML-schema

But this is very slow, because all operations run sequentially. If you don't want to fall asleep at your desk, use the parallel version of this script (which requires GNU parallel):

$ ./ooxml/ooxml-parallel.bash path-to-OOXML-schema
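
The general idea of running the generator through GNU parallel can be sketched as follows. This is not the contents of ooxml-parallel.bash: the function name generate_samples_parallel, the XMLFUZZER variable, the default of 4 jobs and the sample-N.xml naming are assumptions for illustration. It relies on GNU parallel replacing {} with each input line and running the command through a POSIX-like shell.

```shell
# Hypothetical helper: like a sequential loop over xmlfuzzer,
# but with up to JOBS invocations running at the same time.
generate_samples_parallel() (
    xsd=$1
    root=$2
    count=$3
    outdir=$4
    jobs=${5:-4}
    fuzzer=${XMLFUZZER:-./xmlfuzzer}

    mkdir -p "$outdir" || exit 1
    # Exported so the shells started by parallel can see them.
    export fuzzer xsd root outdir
    # seq prints 1..COUNT, one number per line; each number becomes {}.
    seq 1 "$count" | parallel -j "$jobs" \
        '"$fuzzer" -xsd "$xsd" -root-elem "$root" -max-elem 10 > "$outdir"/sample-{}.xml 2> /dev/null'
)
```

For example, generate_samples_parallel ../ooxml/OfficeOpenXML-XMLSchema/wml.xsd document 1000 samples 8 would run eight generators at a time.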

If you want to make XML files more readable, install tidy, open ooxml-iter.bash and replace these lines:


cp $batch/$file $base/word/document.xml
# replace with this line if you want pretty-printing
#tidy -xml -i -utf8 < $batch/$file 2> /dev/null > $base/word/document.xml;

with these:


#cp $batch/$file $base/word/document.xml
# replace with this line if you want pretty-printing
tidy -xml -i -utf8 < $batch/$file 2> /dev/null > $base/word/document.xml;
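
Instead of editing the script by hand, the choice between cp and tidy could also be made at run time. The following is only a sketch of that idea, not code from ooxml-iter.bash: the function name install_document is hypothetical, while the tidy options are the ones shown above.

```shell
# Hypothetical helper: put SRC at DST, pretty-printed when tidy is installed.
install_document() (
    src=$1
    dst=$2

    if command -v tidy > /dev/null 2>&1; then
        # tidy exits non-zero for warnings as well as errors,
        # so its exit status is ignored and the output is checked instead.
        tidy -xml -i -utf8 < "$src" 2> /dev/null > "$dst" || :
        [ -s "$dst" ] && exit 0
    fi
    # No tidy, or tidy wrote nothing: fall back to the raw file.
    cp "$src" "$dst"
)
```

With the variables used in the lines above, the call would be install_document "$batch/$file" "$base/word/document.xml".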

Here you can see some interesting screenshots of generated OOXML documents:

Example: generating random XHTML

Now you need to get the XML Schema for XHTML.

I set -no-cdata in this example just to keep the listing short and readable.


$ ./xmlfuzzer -xsd ../xhtml/xhtml1-transitional.xsd -root-elem html -max-elem 10 -skip-namespaces -no-cdata | tidy -xml -i -utf8
Import ../xhtml/xml.xsd
<html lang="" dir="">
  <head lang="" profile="" dir="">
    <meta lang="" content="" />
    <base id="" target="" />
    <isindex lang="" id="" />
    <title id="" dir="" />
    <isindex lang="" dir="" title="" />
  </head>
  <body lang="" link="" class="" onmouseup="" bgcolor=""
  onkeypress="" dir="" style="" onmouseover="" title="" onunload=""
  alink="" background="" vlink="" onkeydown="">
    <hr class="" onclick="" size="" onkeyup="" onkeypress="" id=""
    onmousemove="" onmouseover="" title="" onmouseout=""
    noshade="noshade" ondblclick="" />
  </body>
</html>

Example of the results in Firefox:

Any questions?

Mail me.

I think you should, because generating samples for file fuzz testing is very specific to each task. You will probably have to modify the fuzzer's core, and analyzing my code is no fun. Of course, I may have no time or no desire to help with your task, but it won't hurt to tell me what you want to do, and I'll tell you how I can help.

Credits

written by Alexander Markov
logo by Voker57
beer by Center of Innovative Security Solutions
and many thanks to OCamlduce guys