<< Previouse | Up | Next >>
Creating Wave Data Models
Now we are going to dive right in.
Lets start by making a copy of template.xml (found in your Peach folder) to wav.xml.
This will hold all of the information about our WAV fuzzer.
You will also want a sample WAV file, grab this one.
Go ahead and load up wav.xml into your XML editor.
Now, you will want to check out the following two specifications to get an idea for the format of WAV:
-
RIFF File Specification (Microsoft)
If you glance through the wave file format you will notice that the wave file is composed of a file header followed by a number of chunks. This is fairly common for file formats and also network packets. Typically each chunk has the same format that follows some form of T-L-V or Type-Length-Value, in fact wave file chunks are just that, a type followed by length followed by data. Each chunk type will define what there data looks like.
Based on this basic information we can plan out our fuzzer. We will have several top level "DataModel" elements that will be called:
-
Chunk
-
ChunkFmt
-
ChunkData
-
ChunkFact
-
ChunkCue
-
ChunkPlst
-
ChunkList
-
ChunkLabl
-
ChunkLtxt
-
ChunkNote
-
ChunkSmpl
-
ChunkInst
-
Wav
Setting Defaults for Number element
The majority of numbers used in WAV are unsigned. We can make that the default by adding this XML to our PIT:
<Defaults>
<Number signed="false" />
</Defaults>
Creating the Wav DataModel
Okay, head over to your wav.xml file and lets start writing some XML!
Locate the DataModel called TheDataModel is should look something like this:
<!-- TODO: Create data model -->
<DataModel name="TheDataModel">
</DataModel>
Read more about: DataModel
We are going to rename this Wav and define the header for our wave file. Looking at the specification I notice that the format for the wave header is:
-
File magic: 4 character string, always "RIFF"
-
Length of file: 32 bit unsigned integer
-
Riff type: 4 character string, always "WAVE"
We can define that in Peach using the following XML:
<!-- Defines the format of a WAV file -->
<DataModel name="Wav">
<!-- wave header -->
<String value="RIFF" token="true" />
<Number size="32" />
<String value="WAVE" token="true"/>
</DataModel>
Otherwise we will continue on and define the Chunk DataModel.
Chunk DataModel
The Chunk data model should be the first data model in the Peach Pit file, so lets add it in above the Wav data model as follows:
<!-- Defines the common wave chunk -->
<DataModel name="Chunk">
</DataModel>
<!-- Defines the format of a WAV file -->
<DataModel name="Wav">
<!-- wave header -->
<String value="RIFF" token="true" />
<Number size="32" />
<String value="WAVE" token="true"/>
</DataModel>
Notice that the Chunk data model occurs before the Wav data model. This is important, we will later reference this data model and it must be defined before we use it.
Looking at the specification we know that the wave chunk format is as follows:
-
ID: 4 character string padded with spaces
-
Size: 4 byte unsigned integer
-
Data: bytes of data the size of Size
We can model that in Peach using the following XML:
<!-- Defines the common wave chunk -->
<DataModel name="Chunk">
<String name="ID" length="4" padCharacter=" " />
<Number name="Size" size="32" >
<Relation type="size" of="Data" />
</Number>
<Blob name="Data" />
<Padding alignment="16" />
</DataModel>
Notice that we have created a size relationship between Size and Data. By doing this Size will automatically get updated with the size of Data when we produce data. When we parse in a sample file to use as default values this will instruct the parser that it can find the size of Data by looking at Size.
Now we can use a Padding type to pad out our DataModel correctly. Notice that the alignment attribute is set to 16. This tells the Padding element to automatically size itself so that the Chunk DataModel ends on a 16-bit (2-byte) boundary.
Format Chunk
Now we are going to define the details of the format chunk. We will use the generic chunk we already defined as a template for this chunk. That will allow us to only specify the specifics of this chunk and save on some typing.
Looking at the wave specification we can tell that the format chunk is as follows:
-
ID: Always 'fmt '
-
Data:
-
Compression code: 16 bit unsigned int
-
Number of channels: 16 bit unsigned int
-
Sample rate: 32bit unsigned int
-
Average bytes per second: 32bit unsigned int
-
Block align: 16 bit unsigned int
-
Significant bits per sample: 16 bit unsigned int
-
Extra format bytes: 16 bit unsigned int
-
The ChunkFmt data model will be defined after Chunk but before Wav:
<DataModel name="ChunkFmt" ref="Chunk">
<String name="ID" value="fmt " token="true"/>
<Block name="Data">
<Number name="CompressionCode" size="16" />
<Number name="NumberOfChannels" size="16" />
<Number name="SampleRate" size="32" />
<Number name="AverageBytesPerSecond" size="32" />
<Number name="BlockAlign" size="16" />
<Number name="SignificantBitsPerSample" size="16" />
<Number name="ExtraFormatBytes" size="16" />
<Blob name="ExtraData" />
</Block>
</DataModel>
Now, if you look at this you will notice a number of cool things. First off if you check out the DataModel element you can see an attribute called ref which has a value of Chunk. This tells Peach to copy the Chunk data model and make it the basis for our new data model called ChunkFmt. This means that all the elements defined in Chunk are in our new ChunkFmt by default! This is way cool and our first look at re-use in Peach. Next you will notice we have two elements in our data model that have the same name as elements in the Chunk model (ID and Data). This will cause the old elements to be replaced with our new ones. This allows us to override the old elements based on the needs of our format chunk type.
Now, you might be asking why we needed to override ID? This is a good question, we override ID here to specify the static string that it will always be when for this format chunk. Later we will specify a sample wave file to use and the parser will need hints on how to select the correct chunk. More on that later when we introduce the Choice element :)
Otherwise I think things should largely make sense.
Data Chunk
Fact Chunk
Okay, now we have the fact chunk. This chunk is defined as follows:
-
ID: "fact", string 4 chars
-
Data:
-
Number of samples: 32 bit unsigned int
-
Unknown? Unknown trailing bytes
-
Another easy one to define in XML:
<DataModel name="ChunkFact" ref="Chunk">
<String name="ID" value="fact" token="true"/>
<Block name="Data">
<Number size="32" />
<Blob/>
</Block>
</DataModel>
Wave List Chunk
This chunk it a bit different. The wave list chunk is comprised of silent chunk and data chunks alternating in a list. So, before we can complete the wave list chunk we will need to define the silent chunk. Lets do that now.
The silent chunk of easy, it’s just a 4 byte unsigned integer, the data model looks like this:
<DataModel name="ChunkSint" ref="Chunk">
<String name="ID" value="sInt" token="true"/>
<Block name="Data">
<Number size="32" />
</Block>
</DataModel>
Now that that’s out of the way we can get on with our wave list chunk. The data portion is an array of silent and data chunks. Here is how we do that:
<DataModel name="ChunkWavl" ref="Chunk">
<String name="ID" value="wavl" token="true"/>
<Block name="Data">
<Block name="ArrayOfChunks" maxOccurs="3000">
<Block ref="ChunkSint"/>
<Block ref="ChunkData" />
</Block>
</Block>
</DataModel>
This definition introduces the concept of arrays, or repeating elements. Notice that we have a block element that has an attribute maxOccurs. This will tell Peach that this block may occur more then one, upto 3,000 times. Also you will notice we are using the ref attribute with the Block element. This is just like using it with the data model, but allows us to get re-use inside of the data model as well.
That wasn’t so hard!
Cue Chunk
Now onto the cue chunk. This chunk should be easy now that we know about the maxOccurs attribute. This chunk is also an array. The array is comprised of the following:
-
ID: 4 bytes
-
Position: 4 byte unsigned integer
-
Data Chunk ID: 4 byte RIFF ID
-
Chunk start: 4 byte unsigned integer offset of data chunk
-
Block start: 4 byte unsigned integer offset to sample of first channel
-
Sample offset: 4 byte unsigned integer offset to sample byte of first channel
We don’t have to worry about the fact the last 3 numbers are offset’s. The data is already parsed in the wave list chunk, we just need to read them in. So lets build the XML!
<DataModel name="ChunkCue" ref="Chunk">
<String name="ID" value="cue " token="true"/>
<Block name="Data">
<Block name="ArrayOfCues" maxOccurs="3000">
<String length="4" />
<Number size="32" />
<String length="4" />
<Number size="32" />
<Number size="32" />
<Number size="32" />
</Block>
</Block>
</DataModel>
There shouldn’t be any surprises here, we are just re-using the same stuff as before. Once again I’m being a bit lazy and not giving everything a name. This is okay, but it can be nice sometimes to use the names as documentation :)
Playlist Chunk
Looking at this chunk I notice that Data will be comprised of an array (again) but this time the count will be included before the array. We will use a count-of relationship to model this.
<DataModel name="ChunkPlst" ref="Chunk">
<String name="ID" value="plst" token="true"/>
<Block name="Data">
<Number name="NumberOfSegments" size="32" >
<Relation type="count" of="ArrayOfSegments"/>
</Number>
<Block name="ArrayOfSegments" maxOccurs="3000">
<String length="4" />
<Number size="32" />
<Number size="32" />
</Block>
</Block>
</DataModel>
Notice in this XML that we setup a relationship between NumberOfSegments and ArrayOfSegments that will indicate the count.
Associated Data List Chunk
This chunk is an array of label chunks, name chunks, and text chunks. We will not know in what order they will appear so we will need to support them in any order. This will actually be fairly easy, but first we need to define each of the tree chunks before we define our data list chunk. Lets do that now.
Label Chunk
First up is the label chunk, in this the data portion contains a null terminated string an possible a single pad byte.
<DataModel name="ChunkLabl" ref="Chunk">
<String name="ID" value="labl" token="true"/>
<Block name="Data">
<Number size="32" />
<String nullTerminated="true" />
</Block>
</DataModel>
We will automatically get the pad byte from the Chunk.
Note Chunk
Now onto the note chunk, it turns out this chunk is exactly the same as the label chunk! So, we will just create an alias for it like this:
<DataModel name="ChunkNote" ref="ChunkLabl">
<String name="ID" value="note" token="true"/>
</DataModel>
Yup, that’s it! Nice and easy :)
Labeled Text Chunk
This one is also very similar to the note and label chunks but has several more numbers included in it. I’ll copy the data model for label and expand it like this:
<DataModel name="ChunkLtxt" ref="Chunk">
<String name="ID" value="ltxt" token="true"/>
<Block name="Data">
<Number size="32" />
<Number size="32" />
<Number size="32" />
<Number size="16" />
<Number size="16" />
<Number size="16" />
<Number size="16" />
<String nullTerminated="true" />
</Block>
</DataModel>
As we can see it’s very similar to the label chunk.
Back to Associated Data List Chunk
Okay, we are ready to combine all those chunks into an array. It will end up looking like this:
<DataModel name="ChunkList" ref="Chunk">
<String name="ID" value="list" token="true"/>
<Block name="Data">
<String value="adtl" token="true" />
<Choice maxOccurs="3000">
<Block ref="ChunkLabl"/>
<Block ref="ChunkNote"/>
<Block ref="ChunkLtxt"/>
<Block ref="Chunk"/>
</Choice>
</Block>
</DataModel>
Sampler Chunk
The sampler chunk is similar to what we have already seem, it contains several numbers and then an array of some values. We will define it as follows:
<DataModel name="ChunkSmpl" ref="Chunk">
<String name="ID" value="smpl" token="true"/>
<Block name="Data">
<Number size="32" />
<Number size="32" />
<Number size="32" />
<Number size="32" />
<Number size="32" />
<Number size="32" />
<Number size="32" />
<Number size="32" />
<Number size="32" />
<Block maxOccurs="3000">
<Number size="32" />
<Number size="32" />
<Number size="32" />
<Number size="32" />
<Number size="32" />
<Number size="32" />
</Block>
</Block>
</DataModel>
Again, that was straight forward :)
Instrument Chunk
Few, this is our last chunk to define and it’s an easy one. It’s comprised of just seven (7) 8 bit numbers. This will be super easy.
<DataModel name="ChunkInst" ref="Chunk">
<String name="ID" value="inst" token="true"/>
<Block name="Data">
<Number size="8"/>
<Number size="8"/>
<Number size="8"/>
<Number size="8"/>
<Number size="8"/>
<Number size="8"/>
<Number size="8"/>
</Block>
</DataModel>
Notice that the numbers in this case are not unsigned. The values they can have range from negative to positive.
Finishing the Wav Model
Time to wrap this modeling up! Lets head down to the Wav chunk which last we touched it looked like this:
<!-- Defines the format of a WAV file -->
<DataModel name="Wav">
<!-- wave header -->
<String value="RIFF" token="true" />
<Number size="32" />
<String value="WAVE" token="true"/>
</DataModel>
We are going to add in an array of chunks, however we don’t know in what order all these chunks will occur in, so we will use our friend the Choice element to have Peach choose for us based on the input file.
<!-- Defines the format of a WAV file -->
<DataModel name="Wav">
<!-- wave header -->
<String value="RIFF" token="true" />
<Number size="32" />
<String value="WAVE" token="true"/>
<Choice maxOccurs="30000">
<Block ref="ChunkFmt"/>
<Block ref="ChunkData"/>
<Block ref="ChunkFact"/>
<Block ref="ChunkSint"/>
<Block ref="ChunkWavl"/>
<Block ref="ChunkCue"/>
<Block ref="ChunkPlst"/>
<Block ref="ChunkLtxt"/>
<Block ref="ChunkSmpl"/>
<Block ref="ChunkInst"/>
<Block ref="Chunk"/>
</Choice>
</DataModel>
That wasn’t so hard was it!
Next Steps
All the hard work is over, but there is still stuff we need to do before we can run our fuzzer!
<< Previouse | Up | Next >>


