xmlfuzzer — an XML fuzzing tool


xmlfuzzer takes an XML Schema as input and returns a valid XML document filled with random data.

Features

It supports:

It doesn't (yet) support:

In general, this project is developed as needed, because we have more interesting things to work on.

Download

Building

First you need to get:

Then simply run make to build the executables.

Runtime dependencies

Optionally, you may need these runtime dependencies:

How to use

There is a core tool which generates a random XML document from an XML schema. It's called xmlfuzzer (or xmlfuzzer.byte for the bytecode version).

$ ./xmlfuzzer -xsd ../ooxml/OfficeOpenXML-XMLSchema/wml.xsd -root-elem document -max-elem 10 2> /dev/null | tidy -utf8 -xml -i 2> /dev/null
<w:document xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships"
xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:body>
    <w:sdt>
      <w:sdtContent>
        <w:sdt>
          <w:sdtPr />
        </w:sdt>
      </w:sdtContent>
    </w:sdt>
    <w:sectPr w:rsidDel="Be0N" w:rsidRPr="-4R^">
      <w:footerReference w:type="default" r:id="" />
      <w:footnotePr>
        <w:pos w:val="pageBottom" />
        <w:numStart w:val="1584397285253833485" />
        <w:numRestart w:val="continuous" />
      </w:footnotePr>
      <w:endnotePr>
        <w:numStart w:val="-4241193316720752020" />
      </w:endnotePr>
    </w:sectPr>
  </w:body>
</w:document>
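To build a corpus rather than a single document, the core tool can simply be run in a loop. The sketch below is a minimal wrapper, assuming only what the example above shows: xmlfuzzer writes the document to stdout and diagnostics to stderr. The function name gen_batch and the sample-N.xml naming are hypothetical and not part of xmlfuzzer.

```shell
#!/bin/sh
# Hypothetical helper, not shipped with xmlfuzzer: run a generator
# command N times and store each run's stdout as OUTDIR/sample-<i>.xml.
# Usage: gen_batch N OUTDIR command [args...]
gen_batch() {
  local n="$1" outdir="$2" i=1
  shift 2
  mkdir -p "$outdir" || return 1
  while [ "$i" -le "$n" ]; do
    # Diagnostics are discarded, as in the example above;
    # stop at the first run that exits with an error.
    "$@" > "$outdir/sample-$i.xml" 2> /dev/null || return 1
    i=$((i + 1))
  done
}
```

After sourcing the function, a call might look like: gen_batch 100 samples ./xmlfuzzer -xsd ../ooxml/OfficeOpenXML-XMLSchema/wml.xsd -root-elem document -max-elem 10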

Example: generating random OOXML documents

Quickstart:

wget http://komar.in/src/xmlfuzzer/schemes/OfficeOpenXML-XMLSchema.tar.gz
tar -xf OfficeOpenXML-XMLSchema.tar.gz
./ooxml/ooxml.bash OfficeOpenXML-XMLSchema

First, you need to download the schema of this crappy format. Then extract it into any directory.

An OOXML document is not just an XML file but a zip archive that contains several files. I've written everything necessary for generating valid OOXML documents and stored it in the ooxml/ directory of the xmlfuzzer distribution.
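To illustrate what "several files" means, here is a minimal sketch of the smallest package Word's main document part needs: [Content_Types].xml, _rels/.rels and word/document.xml. It is not the code in ooxml/, which may well include more parts; the helpers make_docx_tree and pack_docx are hypothetical, and pack_docx assumes Info-ZIP's zip is installed.

```shell
#!/bin/sh
# Hypothetical sketch, not the code in ooxml/: lay out a minimal OPC
# package around a generated WordprocessingML main part.
# Usage: make_docx_tree BASEDIR DOCUMENT_XML
make_docx_tree() {
  local base="$1" doc="$2"
  mkdir -p "$base/_rels" "$base/word" || return 1
  # Content types: defaults for .rels and .xml plus the main document part
  cat > "$base/[Content_Types].xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">
  <Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>
  <Default Extension="xml" ContentType="application/xml"/>
  <Override PartName="/word/document.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>
</Types>
EOF
  # Package-level relationship pointing at the main document part
  cat > "$base/_rels/.rels" <<'EOF'
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
  <Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument" Target="word/document.xml"/>
</Relationships>
EOF
  cp "$doc" "$base/word/document.xml"
}

# Zip BASEDIR into OUT. OUT is resolved to an absolute path first
# because zip runs from inside BASEDIR. zip updates an existing
# archive, so start from a fresh OUT.
# Usage: pack_docx BASEDIR OUT.docx
pack_docx() {
  local dir
  dir=$(cd "$(dirname "$2")" && pwd) || return 1
  (cd "$1" && zip -q -X -r "$dir/$(basename "$2")" '[Content_Types].xml' _rels word)
}
```

For example: make_docx_tree pkg samples/sample-1.xml && pack_docx pkg sample-1.docx. This skeleton has no footers, styles or other parts, so a generated document that references them (like the footerReference in the example above) may be reported as damaged; use the files in ooxml/ for real runs.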

Then run ooxml/ooxml.bash to generate 1000000 OOXML samples:

$ ./ooxml/ooxml.bash path-to-OOXML-schema

But it will be very slow, because all operations run sequentially. If you don't want to fall asleep at your workplace, use the parallel version of this script (it requires GNU parallel):

$ ./ooxml/ooxml-parallel.bash path-to-OOXML-schema
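If GNU parallel is not available, a similar effect can be had with xargs -P, which GNU and BSD xargs support (POSIX does not require it). This is a swapped-in technique, not what ooxml-parallel.bash does; the function gen_parallel and its arguments are hypothetical, and it assumes seq is installed.

```shell
#!/bin/sh
# Hypothetical helper: like a sequential loop, but keeps up to JOBS
# generator processes running at once via xargs -P.
# Usage: gen_parallel N JOBS OUTDIR command [args...]
gen_parallel() {
  local n="$1" jobs="$2" outdir="$3"
  shift 3
  mkdir -p "$outdir" || return 1
  # One input line per sample index; -I {} starts one sh per line.
  # Inside sh -c: $1 = index, $2 = OUTDIR, the rest is the command.
  seq 1 "$n" | xargs -P "$jobs" -I {} \
    sh -c 'i=$1; out=$2; shift 2; "$@" > "$out/sample-$i.xml" 2> /dev/null' \
    sh {} "$outdir" "$@"
}
```

A call might look like: gen_parallel 1000 8 samples ./xmlfuzzer -xsd path-to-OOXML-schema/wml.xsd -root-elem document -max-elem 10. Passing the index and directory as positional parameters of sh -c, rather than splicing them into the script string, keeps unusual paths from being interpreted by the shell.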

If you want to make the XML files more readable, install tidy, open ooxml-iter.bash and replace these lines:

cp $batch/$file $base/word/document.xml
# replace with this line if you want pretty-printing
#tidy -xml -i -utf8 < $batch/$file 2> /dev/null > $base/word/document.xml;

with these:

#cp $batch/$file $base/word/document.xml
# replace with this line if you want pretty-printing
tidy -xml -i -utf8 < $batch/$file 2> /dev/null > $base/word/document.xml;
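The same edit can be scripted with sed instead of done by hand. This is a minimal sketch, assuming ooxml-iter.bash contains the lines exactly as quoted above (leading indentation is tolerated); the function name enable_tidy is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: comment out the plain cp line and uncomment the
# tidy line in ooxml-iter.bash. Running it twice changes nothing more.
# Usage: enable_tidy path/to/ooxml-iter.bash
enable_tidy() {
  local f="$1" tmp
  tmp=$(mktemp) || return 1
  sed -e 's|^\([[:space:]]*\)cp \$batch/\$file |\1#cp $batch/$file |' \
      -e 's|^\([[:space:]]*\)#tidy -xml |\1tidy -xml |' \
      "$f" > "$tmp" && cat "$tmp" > "$f"   # cat keeps the file's permissions
  rm -f "$tmp"
}
```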

Here you can see some interesting screenshots of generated OOXML documents:

randomly generated OOXML documents in MS Word

Example: generating random XHTML

First, you need to get the XHTML XML schema.

I set -no-cdata in this example just to keep the listing short and readable.

$ ./xmlfuzzer -xsd ../xhtml/xhtml1-transitional.xsd -root-elem html -max-elem 10 -skip-namespaces -no-cdata | tidy -xml -i -utf8
Import ../xhtml/xml.xsd
<html lang="" dir="">
  <head lang="" profile="" dir="">
    <meta lang="" content="" />
    <base id="" target="" />
    <isindex lang="" id="" />
    <title id="" dir="" />
    <isindex lang="" dir="" title="" />
  </head>
  <body lang="" link="" class="" onmouseup="" bgcolor=""
  onkeypress="" dir="" style="" onmouseover="" title="" onunload=""
  alink="" background="" vlink="" onkeydown="">
    <hr class="" onclick="" size="" onkeyup="" onkeypress="" id=""
    onmousemove="" onmouseover="" title="" onmouseout=""
    noshade="noshade" ondblclick="" />
  </body>
</html>

Example of the result in Firefox:

randomly generated XHTML page in Firefox

Any questions?

Mail me.

I think you should do that, because generating samples for file fuzz testing is very specific to every task. You will probably have to modify the fuzzer's core, and analyzing my code is no fun. Of course, I may not have the time or the desire to help you with your task, but it won't hurt to tell me what you want to do, and I'll let you know how I can help.

Credits