Stupid Compiler

Notes on things about stuff

I'm not dead yet…

This site has never been updated all that regularly, I admit. This time, though, I have an excuse; I’ll be speaking at Blackhat DC and ShmooCon. Preparing for two talks in the same week on mostly disjoint topics has been taking up all my free time (and more). So, I apologize for no good technical posts recently, but I hope to change that post-cons. I have some ideas up my sleeve and I’ll be releasing some of the tools developed for the talks.

The Blackhat DC talk (4:45pm on Wednesday, February 3rd) is an elaboration of the last post on information leakage and the types of things we can do with it. The focus is the circumvention of Windows memory corruption mitigations (ASLR and DEP) using the addresses of leaked heap objects and predictable behaviors within the JIT compiler (examples using Adobe Flash).

The ShmooCon talk (4:00pm on Saturday, February 6th) focuses on a simple dynamic flow analysis (taint tracking) tool and the machinery needed to make it useful for auditing/reversing. Even if the taint tracking stuff isn’t your bag, I’ll be releasing a Pin tool that does full tracing of an execution, and I hope the analysis engine will be abstract enough to allow others to write other analyses on top or to export the trace. This was submitted as a work-in-progress and it really is. I have a bunch of code-in-motion and a ton of Python glue right now. I hope to clean everything up in time, but the glue will be the first thing neglected. My plan is to move the development of the tracing tool and analysis framework to a public hg repository. Oh, there is also an IDA plug-in involved (for interacting with the taint information). You can see the old test version of it in a screenshot from the DiffCov blog. Evidently ShmooCon will be streaming the talks, so if your timezone permits, you can heckle me live even if you don’t have a golden Shmoo ticket.

Lastly, I’ve never been to any industry hacker cons, so I’ll be trying to meet lots of people. Send me an e-mail and let me know where you’ll be if you want to meet up for a chat.

Written by dionthegod

January 18, 2010 at 11:00 pm

Posted in Uncategorized

Getting Pointers from Leaky Interpreters

Note: I haven’t seen this anywhere before but I wouldn’t be surprised if it had been done, so let me know if I should credit someone. It was inspired in some really abstract way by a USENIX Security paper from 2003.


Goal: create an ActionScript function that takes an Object and computes its address when run in Tamarin.


Despite the growing adoption of SDLs and the proliferation of code analysis tools, many commercial applications are still sprinkled with memory corruption bugs. Microsoft has implemented address space layout randomization (ASLR) and data execution prevention (DEP) to make the exploitation of these vulnerabilities much more difficult. Researchers and attackers have been forced to develop application specific strategies to circumvent the mitigations. Mark Dowd and Alex Sotirov wrote a really useful Blackhat USA 2008 paper explaining the implementation decisions made for each of the mitigations (for each version of Windows) and some attacks they’ve developed to take advantage of those decisions. I waited way too long to read this paper — don’t make my mistake. These techniques are not universal and for each “class” of application new exploit techniques must be developed. Thinking about application specific techniques has been fun and produced a few cool gadgets. This note talks about one of them.

Knowing the address of attacker controllable data before triggering an exploit is useful (if not necessary). Many times, a heap spray is enough to place some shellcode or data structure at a known address. Despite their effectiveness, heap sprays just feel dirty. So, out of a desire to pimp my exploits, I set out in search of a way to leak addresses of attacker controlled structures.

Tagged Pointers

Many interpreters represent atomic objects (we’ll call them atoms) as tagged pointers. Each base type is given a tag, and an atom is represented by placing this tag in the lower bits while the atom’s value is encoded in the upper bits. The Tamarin virtual machine uses 3-bit tags. The Tamarin integer tag is 6, so, for example, the atom for 42 is encoded as:

>>> '0x%08x' % (42 << 3 | 6)
'0x00000156'

Similarly, a Tamarin Object tag is 2, so an Object at 0xaabbccd0 is encoded as:

>>> '0x%08x' % (0xaabbccd0 | 2)
'0xaabbccd2'

Integers that don’t fit into 29 bits are interned as strings. This technique is quite old and was used on the Lisp Machines and was discussed in both SICP (footnote 8) and at least one [PDF] of the early ‘Lambda Papers’.
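To make the encoding concrete, here is a minimal Python sketch of the tagging scheme described above. The helper names (make_int_atom, make_object_atom, atom_tag, atom_payload) are made up for illustration and are not Tamarin identifiers; only the 3-bit tag width and the tag values 6 and 2 come from the text.

```python
TAG_BITS = 3
TAG_MASK = (1 << TAG_BITS) - 1
INT_TAG = 6      # integer tag, per the text
OBJECT_TAG = 2   # Object tag, per the text


def make_int_atom(value):
    """Pack a small integer into the upper bits and tag it as an integer."""
    return (value << TAG_BITS) | INT_TAG


def make_object_atom(address):
    """Tag an 8-byte-aligned pointer as an Object (the low 3 bits must be free)."""
    assert address & TAG_MASK == 0, "pointer must be 8-byte aligned"
    return address | OBJECT_TAG


def atom_tag(atom):
    """Return the 3-bit type tag stored in the low bits."""
    return atom & TAG_MASK


def atom_payload(atom):
    """Strip the tag: an integer's value, or an Object's address >> 3."""
    return atom >> TAG_BITS


print('0x%08x' % make_int_atom(42))
print('0x%08x' % make_object_atom(0xaabbccd0))
```

Note that atom_payload of an Object atom is the pointer shifted right by 3, which is exactly the quantity the hashtable trick later in this post recovers.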

Tamarin Dictionaries

Tamarin Dictionary objects store Object to Object mappings and are implemented internally as hashtables. The hashtable implementation maintains a table that is the smallest power of 2 greater than the number of entries * 5/4. In other words, the table is never more than 80% full (see the source). When an insert causes the table to become more full, the table is grown and all entries are rehashed (see the source). The hash function operates on atoms; it shifts off the lower 3 tag bits and masks off enough top bits to fit the table (see the source).
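A rough Python model of the sizing rule and hash described above may help. This is a sketch written from the prose, not a transcription of the Tamarin source; table_size and hash_atom are names I made up.

```python
def table_size(entries):
    """Smallest power of two strictly greater than entries * 5/4."""
    size = 1
    while size * 4 <= entries * 5:   # integer form of: size <= entries * 5/4
        size *= 2
    return size


def hash_atom(atom, size):
    """Shift off the 3 tag bits, then mask down to the table size."""
    return (atom >> 3) & (size - 1)


# Two million integers plus one victim object need a table of 4M slots.
print(table_size(2 * 1024 * 1024 + 1))

# The integer atom for 42 (tag 6) hashes to slot 42 in a 64-slot table.
print(hash_atom((42 << 3) | 6, 64))
```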

The “for x in y” looping construct and the AVM2 instructions “hasnext2”, “nextname”, and “nextvalue” allow the interpreted program to iterate over a Dictionary (or generic Object — the difference is not made clear in the AVM2 documentation, but the Tamarin implementation makes a distinction). This iteration is accomplished by walking the hashtable from start of table to end. For example, if all integers inserted into the Dictionary are less than the size of the hashtable, the integers will come back out in ascending order. We will leverage this to determine the value of any atom (well, most of it, anyway).
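The ordering claim is easy to see in a toy model. The sketch below assumes every integer is smaller than the table, so each one lands in the slot equal to its value and nothing collides; tags and probing are ignored, and the function names are hypothetical.

```python
def build_table(keys, size):
    """Place each small integer in the slot given by its (masked) value."""
    table = [None] * size
    for key in keys:
        table[key & (size - 1)] = key
    return table


def iterate(table):
    """Walk the table from the first slot to the last, skipping empty slots."""
    return [key for key in table if key is not None]


# Insertion order is 5, 1, 3 but the walk returns the keys in ascending order.
print(iterate(build_table([5, 1, 3], 8)))
```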

Integer Sieves

Now that all of the background is out of the way, I can explain the general idea. Since integers are placed into the hashtable by using their value as the key (of course, any top bits will be masked off), we can determine the value of some other atom by comparing the integers before and after it. Since Object atoms are basically just pointers, we can disclose as many bits of a pointer as we can grow the hashtable.

To avoid the problem of a hash collision, we create two Dictionaries, one with all the even integers and one with all the odd integers (up to some power of two; the larger, the more bits we discover). After creating the Dictionaries, we insert the victim object into both Dictionaries (the value you map it to does not matter for this trick; in fact, the values are of no use at all). Next, search each Dictionary using the for-in construct, recording the last key visited and breaking when the current key is the victim object.

We now have two values, and they should differ by 17. This is due to the quadratic probe: when a hashtable insert collides, it first tries hash(atom) + 8 (collides again: even + even = even, odd + even = odd), then tries hash(atom) + 17 (success: even + odd = odd, odd + odd = even). So, when the two values differ by 17, the lower value is the one from the Dictionary that didn’t have the collision. When the difference isn’t 17 (the probe wrapped around the end of the table), the larger value is from the Dictionary that didn’t have the collision. Either way, we now have the integer that, when turned into an atom, is 8 (aka 1 << 3) smaller than the victim atom. Finally, to get the victim atom from the integer x: ((x << 3) + 8), or more to the point, ((x + 1) << 3).
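The whole trick can be simulated outside of Tamarin. The Python sketch below models a hashtable with the probe sequence described above (hash, hash + 8, hash + 17, ...) and recovers the low bits of a made-up pointer. It is a toy model under my own assumptions: the table is a plain list, the victim is a sentinel object, and edge cases such as a victim hashing to slot 0 are not handled.

```python
def probe_insert(table, start, key):
    """Insert key at start, probing start+8, start+17, ... on collision."""
    mask = len(table) - 1
    slot, step = start & mask, 8
    while table[slot] is not None:
        slot = (slot + step) & mask
        step += 1
    table[slot] = key


def previous_key(table, victim):
    """Walk the table in slot order; return the key seen just before victim."""
    prev = 0
    for key in table:
        if key is None:
            continue
        if key is victim:
            return prev
        prev = key
    raise KeyError("victim not found")


def leak_low_bits(pointer, bits=12):
    """Recover the low (bits + 3) bits of an 8-byte-aligned pointer."""
    size = 1 << bits
    victim = object()
    seen = []
    for parity in (0, 1):                       # even table, then odd table
        table = [None] * size
        for value in range(parity, size, 2):
            table[value] = value                # small ints sit at slot == value
        probe_insert(table, (pointer >> 3) & (size - 1), victim)
        seen.append(previous_key(table, victim))
    low, high = sorted(seen)
    x = low if high - low == 17 else high       # pick the non-colliding table
    return (x + 1) << 3


print(hex(leak_low_bits(0xaabbccd0)))
```

Growing bits recovers more of the pointer, at the cost of a larger table, which mirrors the trade-off in the real Dictionaries.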

Sample Code

package getptr {
  import avmplus.System;
  import flash.utils.Dictionary;

  print("getptr -");

  function objToPtr(obj) {
    var i;
    var even = new Dictionary();
    var odd = new Dictionary();

    // Fill one table with the even integers and one with the odd integers.
    print("[+] Creating hashtables");
    for (i = 0; i < (1024 * 1024 * 2); i += 1) {
      even[i * 2] = i;
      odd[i * 2 + 1] = i;
    }

    // The victim collides in exactly one of the two tables.
    print("[+] Triggering hashtable insert");
    even[obj] = false;
    odd[obj] = false;

    var evenPrev = 0;
    var oddPrev = 0;
    var curr;

    // Walk each table and remember the key visited just before the victim.
    print("[+] Searching even hashtable");
    for (curr in even) {
      if (curr == obj) { break; }
      evenPrev = curr;
    }

    print("[+] Searching odd hashtable");
    for (curr in odd) {
      if (curr == obj) { break; }
      oddPrev = curr;
    }

    // The lower value comes from the table without the collision;
    // the two values should differ by 8 + 9 = 17.
    var ptr;
    if (evenPrev < oddPrev) {
      ptr = evenPrev;
      if (evenPrev + 8 + 9 != oddPrev) {
        print("[-] Something went wrong " + evenPrev + ", " + oddPrev);
      }
    } else {
      ptr = oddPrev;
      if (oddPrev + 8 + 9 != evenPrev) {
        print("[-] Something went wrong " + oddPrev + ", " + evenPrev);
      }
    }
    ptr = (ptr + 1) * 8;

    return ptr;
  }

  var victim = "VICTIM";
  var ptr = objToPtr(victim);

  print("[+] ptr = " + ptr);
}


Notes: All tests were performed with Tamarin pulled from tamarin-central @ dab354bc047c

To test this code, compile up a Tamarin avmshell, place a breakpoint on SystemClass::debugger(), and run the sample script. Once the debugger hits, check the address spit out. On an XP system, the heap is gonna start pretty low, so the output address will probably be correct (it was for me). For a string, the address + 0xC will be a pointer to the bytes of the string (to help you verify that it works).

I compiled the above ActionScript with asc.jar as suggested by the Tamarin build documentation.

It is worth knowing that Flash Player uses a version of the Tamarin virtual machine, but the sample code will not work directly with Flash. Feel free to reverse it yourself and see the modifications you need to make. I have not spent the time to make my script work with it, but I think it shouldn’t be hard.


This is, I think, a cute trick. Who cares? I care because this kind of leak is really hard to automatically check for. Bits are leaked via comparisons. How do you track this kind of information leakage? Maybe someone from academia can pipe up — maybe no one cares. Knowing the address of some attacker controllable bytes is always good. Regardless of the usefulness, I hope it was interesting.

Happy hunting!

EDIT: Changed “Prologue” to “Epilogue”… wow, I wrote this post too quickly.
EDIT: Fix the typo reported by Jordan.

Written by dionthegod

October 29, 2009 at 6:31 pm

Posted in Uncategorized

Differential Reversing (or some better name)

Note: As a preface, I want to say I can’t decide on what to call this simple technique. Everyone seems to call it something different: filtering, differential debugging, coverage diffs, or delta traces. Either way it’s a simple idea, so I’m sticking with the first name that popped in my head. Whatever it’s called, it is important to know it’s been done many times and called a few different things. Carry on…


Close your eyes and imagine…

In a moment of irrational team spirit, you, a vocal DC native (you actually live in Montgomery County — poser), bet $5,000 on your beloved Redskins. On Sunday, the Skins lose to the worst team in the league. You spend a night trying to destroy the part of your brain responsible for this lapse of judgment by consuming many many shots of tequila and making many many amazingly bad choices (attempting a bad idea overflow?). It’s 9am and you wake up with a hole where the team spirit functions of your brain used to be (I’m sure there was some collateral damage but I doubt anyone will notice). After a glass of orange juice (which you manage to keep down!), you remember you don’t have $5,000 (“It’s a lock!”… right). You reflect that Willie “Wet Work” Wollinski, your bookie, doesn’t actually seem like such a nice guy and would probably not “be so nice as to forget the whole thing.” Your brainstorming session helps you realize that you have no marketable skills… besides vulnerability discovery and exploit development!

You decide to try and find a bug in some large widely installed software and sell your new found bug to ZDI or iDefense. Having already read this blog post, you use differential reversing to pinpoint the implementation of an interesting feature in your target application, use the IDA plug-in to audit, find an exploitable bug, pound out a PoC and write a fairly weak advisory (but it should be worth $5,000). Hooray! You can keep your kneecaps (for this week at least). Thank $DEITY you read this blog post and didn’t waste time auditing extra sections of the target binary.


Differential reversing (as I am deeming it) is a really simple method to select starting points in a binary for dynamic auditing. This isn’t a new idea, I didn’t invent this technique. People have been doing this for-evah. I’m just documenting a useful set of tools I’ve developed to make my life easier. Pedram Amini’s Process Stalker can do this (calls it “filtering”). It does a bunch of other awesome stuff too. I’m also told Zynamics BinNavi can do this (they call it “differential debugging”), but I don’t have a rich uncle to buy me a copy so I cannot give a first hand account of how it works. It looks pretty nice — check out the BinNavi page for details.

This post is organized as follows. First, I’ll describe the method and the steps in implementing it. Next, I’ll describe the three small tools I’ve written and their implementation. At the end, I’ll show a little example of the tools in action on a nice proprietary application. All the tools are written for use on Windows and have been tested on XP. The technique is generic and could be used on any platform.

Differential Reversing

When I’m reversing, I’m always trying to find a place to add a good breakpoint; in other words, I’m not yet a great reverse engineer. I still spend more time in the debugger than IDA. I’ve seen suggestions to count cross references and get the frequently called functions reversed first. This makes great sense. After doing so, you get the memory primitives out of the way. I have a problem with the next step: where do you go next?

My solution to this problem is to use dynamic information to find areas I am interested in. In large binaries, unless you can find some good data cross-references (strings or unique constants), it is very hard to statically find the areas of interest. On the other hand, it is usually easy to exercise the code you want dynamically. For example, you can exercise the de-frobbing code by passing your application frobbed data. Record a trace of execution while the application is processing your input and you will have a bound on where the interesting code is located.

Next, the problem is how to search your large basic block run trace for the de-frobbing code. The next logical step is to create a baseline trace of the code hit by other inputs that do not exercise the de-frobbing code. By removing the blocks hit by the baseline trace from the trigger trace, you narrow the search greatly. That is differential reversing (or, at least, that is what I’m calling it).



There are two obvious tools needed: a tool to capture the set of basic blocks hit during a run and a tool to produce a set of basic blocks given a baseline set and a trigger set. For the first tool (BlockCov), I’ve written a Pin tool to capture the basic blocks hit during a run. The Pin tool takes as arguments a set of modules (executables or libraries) of interest. This allows the GUI and system stuff to be ignored at the trace level (in other words, we aren’t even going to record hits in modules outside the whitelist of modules). The output is a simple list of basic blocks hit for each module. It also records the module load address in case multiple runs load the module at different virtual addresses.

The second tool is a small Python script. The script creates a stable set of blocks by loading multiple runs using the same input and discarding any blocks that don’t exist in all runs with that input. Once a stable set of blocks has been created for both the baseline and the trigger, those blocks appearing only in the trigger set are recorded in an output set of blocks. Finally, this output is provided to a small IDA plug-in (IDACov) that colors the blocks that were hit and builds a list of covered functions for quickly navigating to the areas of interest. (Actually, since I started this blog post, I rewrote this plug-in as an IDAPython script; both are included in the archive.)

Tool #1: BlockCov

BlockCov is a Pin tool that monitors a running process and records each basic block executed. Pin is a dynamic binary instrumentation (DBI) framework made available by Intel. It allows us to monitor the execution while adding very little overhead and maintaining a reasonable runtime. Pin publishes an easy to use API and extensive documentation. The mailing list is active and the replies are quick. The downside of using a DBI framework is the difficulty of debugging your tool. Most of the time, you end up using printf debugging techniques. Despite this part of the process, Pin allows you to do some things that would otherwise be too slow to do with a normal debugger. The tradeoff is lack of flexibility, but with the right tools that can be mitigated. But we’re off on a tangent…

BlockCov reduces the overhead by using an address range filter. A set of interesting images is given using command line switches to exclude GUI and system code at the trace level (of course it can still be included if that is what you are interested in). This filter is created by hooking image loads (PE files — executables and DLLs). When an image is loaded, the filename of the loaded image is checked against the whitelist. If a match is found, the image address range is stored along with the image name in a loaded module list. Pin works by dynamically rewriting IA32 (or x64 or IA64) instructions just before execution. The rewrite accomplishes two things: first, it ensures the process under execution does not escape control of the Pin driver and, second, it allows a Pin tool to insert instrumentation hooks at any point in the process. We want to record every access to a basic block within the loaded whitelist modules. We ask Pin to call us every time it does this translation. When BlockCov gets this callback, it looks at the addresses being translated. If the translation falls within an interesting module, then a function call is inserted to record that this block has been hit. Effectively, this is like adding a “CALL RecordBlockHit” at the start of every interesting block before running the process. When the process exits, the recorded set of block addresses are dumped for each interesting module. BlockCov is fairly straightforward — it doesn’t do much.
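The filter logic itself is independent of Pin and can be sketched in a few lines of Python. To be clear, this is not the Pin API (Pin tools are written in C++) and not the BlockCov source; ModuleFilter and its method names are invented here to illustrate the whitelist check and the conversion of absolute addresses to module-relative offsets.

```python
import ntpath


class ModuleFilter:
    """Track whitelisted images and map addresses to (module, offset)."""

    def __init__(self, whitelist):
        self.whitelist = {name.lower() for name in whitelist}
        self.ranges = []          # (low, high, module name) per loaded image
        self.hits = set()         # recorded (module name, offset) pairs

    def on_image_load(self, path, low, high):
        """Called for every image load; keep only whitelisted file names."""
        name = ntpath.basename(path).lower()
        if name in self.whitelist:
            self.ranges.append((low, high, name))

    def lookup(self, address):
        """Return (module, offset) if the address is inside a tracked image."""
        for low, high, name in self.ranges:
            if low <= address <= high:
                return name, address - low
        return None

    def record_block(self, address):
        """Stand-in for the inserted 'CALL RecordBlockHit'."""
        found = self.lookup(address)
        if found is not None:
            self.hits.add(found)
```

Storing offsets rather than absolute addresses is what makes runs comparable when the module loads at a different base.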

Tool #2: the Python script

The script has two functions. To avoid spurious differences in a run caused by processes not dependent on the inputs we control, multiple runs are recorded using BlockCov before processing with the script. The script will then take all runs with the same input and filter out any blocks which are not present in all traces. You can come up with instances when this wouldn’t be useful, or even when it might be counterproductive, but it has been more useful this way (YMMV). We will call the resulting set of blocks the stable set. Once that has been computed for both the baseline input runs and the trigger input runs, these two sets are compared and a set difference gives the blocks that are unique to the trigger input. This set is output to a file for the IDA plug-in (or anything else you want to do with it).
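The two steps reduce to an intersection followed by a set difference. Here is a minimal Python sketch of that idea; it is an illustration rather than the script from the archive, and the function names are my own.

```python
def stable_set(runs):
    """Blocks present in every run recorded with the same input."""
    runs = [set(run) for run in runs]
    return set.intersection(*runs) if runs else set()


def trigger_only(baseline_runs, trigger_runs):
    """Stable trigger blocks that never appear in the stable baseline."""
    return stable_set(trigger_runs) - stable_set(baseline_runs)


baseline = [{0x1000, 0x1010, 0x2000}, {0x1000, 0x1010}]
trigger = [{0x1000, 0x1010, 0x3000, 0x3040}, {0x1000, 0x1010, 0x3000, 0x9999}]
print(sorted(hex(block) for block in trigger_only(baseline, trigger)))
```

In the example, 0x2000 and 0x9999 are dropped as unstable noise, 0x3040 is dropped because it was not hit in every trigger run, and only 0x3000 survives as unique to the trigger input.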

Tool #3: IDACov

IDACov is a really simple plug-in that takes a list of basic block starting addresses as input. It colors the instructions in each listed basic block blue and the function containing a colored block light blue. It also makes a list of functions with highlighted blocks for quick navigation. I’m guessing there are plug-ins/IDAPython/IDC that do almost the exact same thing, but I’m learning the SDK and this was a good simple exercise. I’ll be re-implementing this in IDAPython soon to see how much cleaner that is. Oh, look, I did it already. IDAPython is great.

Building the Tools

First, grab a current snapshot.

To use the tools, you’ll need Pin 29972 and a recent Visual Studio (the free Express version will work fine). When you unpack Pin, you’ll get a directory named something like pin-2.7-29972-blah; we’ll call this $PINROOT. Unpack the DiffCov tools into $PINROOT\source\tools\. This should place all the tools under $PINROOT\source\tools\DiffCov. Open the DiffCov.sln solution file and build both the pintool and the IDA plug-in. The solution assumes you have IDA at C:\Program Files\IDA and that you want to build the plug-in in the \plugins directory under IDA. If you don’t want it there, modify the properties of the IDACov project. The sample SWF files used for input are included, but if you want to compile them from the HaXe source, you will need HaXe installed. Oh, also, the IDA plug-in expects the SDK to be at C:\Program Files\IDA\idasdk55, which is another thing you can fix in the project properties if you need to. Alternatively, the package includes a compiled version of the plug-in. The Pin tool is not distributed in compiled form; you’ll have to build that yourself.

Use Case: Adobe Flash and AMF

The Adobe Flash Player uses some incarnation of the Tamarin framework. This means much of the front-side of Flash is open-sourced. The back-side, the ActionScript API, is not open-source. Flash has a built-in serialization protocol called Action Message Format (or AMF). The ByteArray class in flash.utils supports serialization and de-serialization of byte streams using this format. The format is described in an open document from Adobe’s wiki. We will be focusing on AMF3 because that is what the latest ActionScript API uses by default, although it would be pretty simple to modify the two inputs to find the processing of an AMF0 message. Our goal is to find the parsing of an AMF message in the Flash Player plug-in. I tend to use Firefox for this, so my examples will be using Firefox to launch Flash Player.

Our first step is creating two different inputs that are as similar as possible yet only one will exercise the AMF object parsing codepath. Below are the two HaXe programs to do just that:


Baseline

class Test {
  static function main() {
    var ba = new flash.utils.ByteArray();
    ba.position = 0;
  }
}

AMF Integer Parse

class Test {
  static function main() {
    var ba = new flash.utils.ByteArray();
    ba.position = 0;
    var v = ba.readObject();
  }
}
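For reference when building trigger inputs by hand: on the wire, an AMF3 integer is a marker byte (0x04) followed by a variable-length U29 value. The Python sketch below is written from my reading of the AMF3 specification, not taken from the sample SWFs; the function names are mine, and it only handles non-negative values.

```python
AMF3_INTEGER_MARKER = 0x04   # integer marker from the AMF3 specification


def encode_u29(value):
    """Encode a 29-bit unsigned value using AMF3's 1 to 4 byte U29 format."""
    if not 0 <= value <= 0x1FFFFFFF:
        raise ValueError("U29 holds at most 29 bits")
    if value < 0x80:
        return bytes([value])
    if value < 0x4000:
        return bytes([(value >> 7) | 0x80, value & 0x7F])
    if value < 0x200000:
        return bytes([(value >> 14) | 0x80,
                      ((value >> 7) & 0x7F) | 0x80,
                      value & 0x7F])
    # Four-byte form: the last byte carries a full 8 bits.
    return bytes([(value >> 22) | 0x80,
                  ((value >> 15) & 0x7F) | 0x80,
                  ((value >> 8) & 0x7F) | 0x80,
                  value & 0xFF])


def encode_amf3_integer(value):
    """Marker byte followed by the U29 payload."""
    return bytes([AMF3_INTEGER_MARKER]) + encode_u29(value)


print(encode_amf3_integer(42).hex())
```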

Now that we have our inputs, let’s run Firefox under the BlockCov tool to capture some coverage sets. We will pass a single whitelisted image to BlockCov: NPSWF32.dll. This is the Flash Player plug-in used by Firefox. Since we are only whitelisting the Flash DLL, none of the Firefox code will be captured; this will keep the overhead low and the block trace smaller. Below is a transcript of 4 runs of BlockCov. Note that BlockCov takes an id and a run parameter; the id parameter is a name for the input used in this run (it shouldn’t change when doing multiple runs with the same input) and the run parameter is a number to give this run (it differentiates between multiple runs with the same input). Keep in mind I’m using a Firefox profile called “fuzz” to run this under, so you’ll have to modify the command line to get rid of the -no-remote and -P fuzz switches if you want to run under the default profile.

pin.exe -t BlockCov.dll -mw NPSWF32.dll -id base -run 0 -- "c:/program files/mozilla firefox/firefox.exe" -no-remote -P fuzz "E:\tools\PinTools\pin-2.6-27887\so

pin.exe -t BlockCov.dll -mw NPSWF32.dll -id base -run 1 -- "c:/program files/mozilla firefox/firefox.exe" -no-remote -P fuzz "E:\tools\PinTools\pin-2.6-27887\so

pin.exe -t BlockCov.dll -mw NPSWF32.dll -id amfint -run 0 -- "c:/program files/mozilla firefox/firefox.exe" -no-remote -P fuzz "E:\tools\PinTools\pin-2.6-27887\

pin.exe -t BlockCov.dll -mw NPSWF32.dll -id amfint -run 1 -- "c:/program files/mozilla firefox/firefox.exe" -no-remote -P fuzz "E:\tools\PinTools\pin-2.6-27887\

These four runs have generated four block sets: base-0-NPSWF32.dll.blocks, base-1-NPSWF32.dll.blocks, amfint-0-NPSWF32.dll.blocks, and amfint-1-NPSWF32.dll.blocks. Next up, run the Python script from within the directory containing these four block sets. This should output two files: amfint-results.blocks and base-results.blocks. These are human readable and list the addresses of the blocks of interest. The addresses are offsets from the loaded image base (often 0x10000000 in IDA for DLLs).
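The file naming pattern above (id, run number, module) is enough to group runs by input, and the offsets can be turned into IDA addresses by adding the image base. The sketch below assumes only that naming pattern and the 0x10000000 base mentioned in the text; the regular expression and function names are mine, and the contents of the .blocks files are not modeled.

```python
import re
from collections import defaultdict

# <id>-<run>-<module>.blocks, e.g. base-0-NPSWF32.dll.blocks
BLOCKS_NAME = re.compile(r'^(?P<id>.+?)-(?P<run>\d+)-(?P<module>.+)\.blocks$')


def group_runs(filenames):
    """Map (input id, module) to the sorted list of run numbers found."""
    groups = defaultdict(list)
    for name in filenames:
        match = BLOCKS_NAME.match(name)
        if match:   # result files such as amfint-results.blocks are skipped
            key = (match.group('id'), match.group('module'))
            groups[key].append(int(match.group('run')))
    return {key: sorted(runs) for key, runs in groups.items()}


def to_ida_address(offset, image_base=0x10000000):
    """Convert a module-relative block offset to an address in the IDB."""
    return image_base + offset


names = ["base-0-NPSWF32.dll.blocks", "base-1-NPSWF32.dll.blocks",
         "amfint-0-NPSWF32.dll.blocks", "amfint-1-NPSWF32.dll.blocks",
         "amfint-results.blocks"]
print(group_runs(names))
print(hex(to_ida_address(0x00175903)))
```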

If you own IDA, fire it up and load NPSWF32.dll (C:\WINDOWS\system32\Macromed\Flash\NPSWF32.dll). When the analysis is complete, load the IDACov plug-in. A file dialog should pop up asking for a results file to load. Point it to the amfint-results.blocks file produced by the script and voilà. Here’s another screenshot:

About 20 functions to inspect. Those go by pretty quick and the most interesting one (offset 0x00175903) is what appears to be the readObject implementation. See the switch statement covering all the AMF markers listed in the AMF3 specification (oh, look, 2 don’t appear in the specification).

Future Posts

I’ve recently written a Pin tool to gather a detailed run trace. This records instructions executed, memory read or written, and register value changes. It was inspired by MSR’s Nirvana project. On top of that, I have some simple analyses; one tracks tainted data and hooks up to an IDA plug-in shown in the screenshot:

The tainted data source is translated into a parse tree node to quickly identify how various fields in a file format are processed within an executable (note the tree on the right). Eventually, I’d like to hook this up to hex-rays to get some nice auto-commenting (but first, I have to convince my boss to spend the money on it). All of that is for another day and another post (hopefully with less than 6 months in between this one and the next). There is also some static analysis I’ve written to do control dependency calculations — useful for determining the span of a tainted conditional jump. Another future project is implementing some smart fuzzing tools using the trace collection engine and some SMT solver. Basically, all the cool stuff the academics get to do.

I hope this was useful to some people. Much of this has been done before in tools like PaiMei, but this is a slightly different way to go about it. Thanks for reading this far. Feel free to contact me with any questions or comments.

Happy hunting!

Written by dionthegod

September 29, 2009 at 9:05 pm

Posted in Uncategorized

A Quine in PDF

A quine is a self-reproducing program: one whose output is the source of the program itself. The Wikipedia link above goes into more depth.

I’m full of mucus and my head is floating, so instead of working on stuff that matters I spent a few hours working out a quine in PDF. Since I wasted so much time making it, I’ve decided to waste some more trying to explain it. I think to understand this, you’ll need to either already be familiar with the PDF format or you will need to grab the specification and follow along with me.

I’ll go object by object and try to explain how it works at each object.

  • Object 1 – Catalog dictionary
  • Object 2 – Pages dictionary
  • Object 3 – Page dictionary
    • Contains references to XObjects defined in objects 9 and 10
    • Font is defined in object 4
    • Content stream for this page is defined in object 5
    • The page is extra long (/MediaBox [0 0 612 1600]) to fit the whole quine in a single page
  • Object 4 – Font dictionary
  • Object 5 – Content stream for page defined by object 3
    • Draws the entire page — this is “main”
    • Overall flow:
      • /X4 Do – Call XObject to draw the raw file up to the start of object 9.
      • /X3 Do – Call XObject to draw the commands that draw the raw file up to the start of object 9.
      • Draw the bits in between the prologue of object 9’s content stream and the start of object 10.
      • /X3 Do – Call XObject to draw the commands that draw the raw file up to the start of object 9.
      • Draw the epilogue of the file. This includes the end of object 10 and the xref table.
  • Object 6 – A NOP XObject stream
  • Object 7 – An XObject stream that draws the first half of the drawing commands to draw a line
    • To avoid putting the left parenthesis (0x28) in a normal string where it would have to be escaped, a hex string (<28>) is used.
  • Object 8 – An XObject stream that draws the second half of the drawing commands to draw a line
    • Again, to avoid putting the right parenthesis (0x29) in a normal string where it would have to be escaped, a hex string (<29>) is used.
  • Object 9 and 10 – XObject streams that print the first part of the PDF file and the drawing commands that print the first part of the PDF file
    • The streams for these two objects are intentionally identical. Their dictionaries define different objects for the X1 and X2 objects used in the streams. To print the raw pdf lines, a NOP XObject (object 6) is used for both. To print the drawing commands used to print the raw pdf, the prefix and suffix are defined by objects 7 and 8.

That still seems horribly unclear. Hrm. The main idea is to use the Form XObject object as a macro to easily draw the pdf “escaped” or “unescaped” (to draw the commands to draw a string or just the string). The extra wrench is that the XObjects do not inherit any values from the calling stream. This means we can’t call the same XObject twice to draw the escaped and unescaped versions. Instead, we create two versions with identical streams but different Resource dictionaries (defining escaped or unescaped).
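The same "escaped versus unescaped" idea shows up in quines in most languages. As an analogy only (this is Python, not PDF, and not how the PDF quine is built), the sketch below uses one string twice: once re-quoted as a literal via %r, which plays the role of the escaped drawing, and once as plain program text, which plays the role of the unescaped drawing.

```python
# The data: a template of the whole program with one hole for itself.
template = 'template = %r\nprint(template %% template)'

# "Escaped" use: %r re-quotes the template as a string literal.
# "Unescaped" use: the template text itself supplies the surrounding code.
quine_source = template % template

# quine_source is a two-line Python program that prints its own source.
print(quine_source)
```

Saving the printed two lines to a file and running that file reproduces the file exactly.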

Well, I think that’s the best my medicated ass is going to do. Please leave comments with the bits that are unclear.

EDIT: For quines in many languages, check out The Quine Page.

Written by dionthegod

March 17, 2009 at 11:03 pm

Posted in Uncategorized

Patch for libdasm-1.5

While working on DynaTrex, I found a small but problematic bug in libdasm-1.5 when parsing some floating point instructions. One of the floating point opcode tables was missing 4 null entries in the middle. This resulted in some incorrect parsing for those instructions following the omission (about 32 opcode encodings). I generated a patch and sent it off to the maintainer, but in case this library isn’t maintained any longer I’m posting the patch here. For verification, try disassembling FRNDINT (0xd9 0xfc).

Written by dionthegod

March 16, 2009 at 2:44 pm

Posted in Uncategorized

Fuzzing Adobe Reader 9

As I mentioned in a previous post, the PDF specification seems bloated. Additionally, the Adobe Reader makes a really good effort to display something even when the PDF document is ill-formed. These observations led me to implement a fuzzing framework with a PDF file format fuzzer as a guinea pig application.

Looking around, I only found one fuzzing framework in Haskell, FileH. It seems very fast, but it is targeted at “dumb” binary fuzzing. Flip some bits here and remove some stuff there… increment these 4 bytes as if they were an integer. It also assumes you have a test harness executing the program in question and returning a non-zero exit code when an interesting execution occurs. My original plan was to create a generative file fuzzer creating new PDFs using the QuickCheck module and to integrate the debugger/execution monitor with the fuzzing framework. With these goals in mind, I set out to write FuzzyWuzzy, the Haskell file fuzzing framework. (Note: I have restrained myself from getting into the gory details, but the link to bitbucket gives you the source should it interest you. Also, keep in mind this is the 5th Haskell project I’ve developed [and the largest]… I’m learning.)

My first step was writing the launcher and monitor bits that interface with the operating system. The first generation would be Windows specific with half an eye towards eventually supporting a ptrace interface. The Win32 modules provided with the GHC installation were almost all I needed. I ended up writing a foreign interface to CreateProcess to support an extra flag (DEBUG_PROCESS) not exposed by the unified System.Process module. I also implemented the foreign interface to TerminateProcess, but that was trivial. With those two extra functions available, I was able to use the functions from System.Win32.DebugApi to create the launcher and monitor to detect crashing programs. I have not yet investigated ways to detect large memory usage or CPU load, but those are on the list for the next version. Currently, the monitor will end an execution and flag it as interesting if the application would have crashed had it not been attached to a debugger.
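The launch, watch, and flag loop can be illustrated with a much cruder mechanism than the Win32 debug API. The Python sketch below is not FuzzyWuzzy (which is Haskell) and swaps in a different technique: on POSIX systems a child killed by a signal reports a negative return code, so no debugger is attached at all. The function names are hypothetical.

```python
import subprocess
import sys


def crashed(returncode):
    """On POSIX, a negative return code means the child died from a signal."""
    return returncode is not None and returncode < 0


def run_once(command, timeout=30):
    """Run the target once; return (interesting, returncode)."""
    try:
        proc = subprocess.run(command,
                              stdout=subprocess.DEVNULL,
                              stderr=subprocess.DEVNULL,
                              timeout=timeout)
    except subprocess.TimeoutExpired:
        return False, None          # a hang, which this sketch does not flag
    return crashed(proc.returncode), proc.returncode


# A well-behaved child is not flagged as interesting.
print(run_once([sys.executable, "-c", "pass"]))
```

A debugger-based monitor like the one described above sees the faulting exception itself, which this return-code check cannot do.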

With the OS stuff out of the way, I turned my attention to file generation. I created an abstract representation of the PDF format and implemented a serialization function to turn a PDF type into a file. After some more serious thought about generating almost valid PDFs from QuickCheck generators, I decided to take another direction.

Instead of generating a PDF from nothing, I wrote a PDF parser to turn a PDF into the abstract representation. The next step was to write mutations on the PDF abstract tree — operations like enlarging Name or String objects, adding long chains of escape sequences to Name or String objects, and deleting entries in Dictionary objects. I also wrote some mutations on the raw character stream going back to disk. These were similar to the mutations done by FileH. At this point, the fuzzer was a complete program. I let it run for a bit and watched Acrobat throw lots of nice message boxes complaining about ill-formed PDFs.
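The tree mutations described above can be sketched roughly as follows. This is Python rather than the Haskell the fuzzer is written in, and the `Name`, `PdfString`, and `PdfDict` types are a hypothetical, heavily simplified stand-in for the real abstract representation. The escape sequences used are the standard PDF ones: `#xx` hex escapes in names and `\ddd` octal escapes in strings.

```python
import random
from dataclasses import dataclass, field

@dataclass
class Name:
    value: str

@dataclass
class PdfString:
    value: str

@dataclass
class PdfDict:
    entries: dict = field(default_factory=dict)  # key (str) -> object

def enlarge(obj, factor: int = 1000):
    """Return a copy of a Name or String with its contents repeated."""
    return type(obj)((obj.value or "A") * factor)

def add_escapes(obj, count: int = 500):
    """Return a copy with a long chain of escape sequences appended."""
    # "#41" (hex) and "\101" (octal) both encode the letter 'A'.
    escape = "#41" if isinstance(obj, Name) else "\\101"
    return type(obj)(obj.value + escape * count)

def delete_entry(d: PdfDict, rng: random.Random) -> PdfDict:
    """Return a copy of a Dictionary with one random entry removed."""
    if not d.entries:
        return d
    victim = rng.choice(sorted(d.entries))
    return PdfDict({k: v for k, v in d.entries.items() if k != victim})
```

Each mutation returns a new object instead of editing in place, so the parsed original can be reused as the starting point for the next test case.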

In the course of writing the higher level PDF type mutations, I realized the hierarchical PDF structure made it difficult to pick, say, a random String object (String objects are usually referenced as values in a Dictionary object and would rarely if ever be found as top level [indirect] objects themselves). It would be easier to filter if the PDF was a flat list of objects with each node able to reference the id of another object if needed. After adding this as a transformation from the hierarchical representation, I came up with another bunch of mutations that were much easier to formulate with this representation. With this modification I started finding some crash bugs! My little fuzzer actually works.
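A rough Python sketch of the flattening idea, under loud assumptions: plain `dict`, `list`, and `str` values stand in for Dictionary, Array, and String objects, and `Ref` and `pick_random` are names invented for this example. The actual transformation in the fuzzer is Haskell and certainly looks different.

```python
import random
from dataclasses import dataclass

@dataclass(frozen=True)
class Ref:
    """Reference to another object by its id (index) in the flat list."""
    target: int

def flatten(obj, table=None):
    """Flatten a nested object into a list where children are replaced by Refs.

    Returns (ref_to_obj, table)."""
    if table is None:
        table = []
    slot = len(table)
    table.append(None)  # reserve this object's id before visiting children
    if isinstance(obj, dict):
        table[slot] = {k: flatten(v, table)[0] for k, v in obj.items()}
    elif isinstance(obj, list):
        table[slot] = [flatten(v, table)[0] for v in obj]
    else:
        table[slot] = obj  # leaf: string, number, ...
    return Ref(slot), table

def pick_random(table, kind, rng: random.Random):
    """Pick the id of a random object of the given Python type, or None."""
    candidates = [i for i, o in enumerate(table) if isinstance(o, kind)]
    return rng.choice(candidates) if candidates else None
```

With the flat table, "pick a random String object" becomes a one-line filter, no matter how deeply the string was nested in the original tree.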

Now what? Finding the offending mutation wasn’t difficult and now I have a minimal case to play with. Of course, I’ve been coming up with new ideas for mutations each day.

Ideas for the future:

  • Implement a system for distributed fuzzing – Break up the fuzzing process so the work is easy to distribute. In other words, have a few computers generating new files to test and a pool of tester computers doing the runs.
  • Decompose the PDF format further to fuzz the stream contents – Stream objects are usually compressed with the DEFLATE algorithm. This makes for boring fuzzing. Uncompress the Stream objects and decompose them further (Embedded Fonts have known formats, Graphics commands are not difficult to parse, movies, pictures, and music are all stored as Streams as well).
  • Notification system – Email notification of newly found crashes with unique stack/EIP backtraces. Who wouldn’t want to know *immediately*?
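For the stream idea in the list above, the basic round trip is short. This is only a sketch: it assumes the stream uses the Flate filter with no predictor or chained filters, and `mutate_stream` is a made-up name. Python's `zlib` module handles the zlib-wrapped DEFLATE data that Flate streams contain.

```python
import zlib

def mutate_stream(compressed: bytes, mutate):
    """Inflate a Flate-compressed stream body, mutate it, and deflate it again.

    Returns (new_body, new_length). The stream dictionary's /Length entry
    would need to be set to new_length by the caller (or deliberately left
    wrong, which is a mutation in its own right)."""
    raw = zlib.decompress(compressed)
    mutated = mutate(raw)
    packed = zlib.compress(mutated)
    return packed, len(packed)
```

For example, `mutate_stream(body, lambda raw: raw.replace(b"12", b"99999999"))` edits the graphics operators themselves, so the fuzzed bytes reach the content parser instead of being rejected by the decompressor.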

After doing all this fuzzing work, it’s become apparent why many people have moved towards developing hybrid fuzzers that use dynamic information to control the future inputs. That is probably where I’ll be heading next. Simple fuzzers are hard to measure (as everyone has already said many many times).

UPDATE: I’ve implemented the e-mail notification. I’ve begun the stream mutation code. I’ve also run into some weirdness with the Haskell CreateProcess interface — I’ve gotten a few rare segfaults. Since I haven’t been successful getting a Haskell monitor written, I wrote a quick one in C, but I haven’t ported the PDF mutation stuff to use it. I’m thinking about writing some Python to manage a simple distributed fuzzing system. Of course, I have been spending all my time lately on other stuff completely, including DynaTrex, an open source binary rewriting tool for Windows. It is still very young.

Written by dionthegod

March 12, 2009 at 10:51 am

Posted in Uncategorized

A Tricky Bit of SNES Code

I’ve been working to disassemble and comment the source to a certain SNES cartridge. I’ve decided to write all the tools from the ground up — as a learning experience. I’m making good progress and, at the current rate, I should have something reasonable in 4 or 5 months. I’m happy with that. I just hope I can keep it going long enough to produce something worth posting. Today, I wanted to explain why I’ve had a harder time than I expected writing the disassembler and show a piece of particularly tricky (for someone, like myself, who has never worked with the 6502 or 65816) code.

Writing the disassembler has been tough. I’ve never written one before, but I believe the quirks of the 65816 make it a little more of a pain than usual. Disassembling a flat binary isn’t too bad if you have a few entry points (typically interrupt vector locations); you follow the control flow graph visiting every connected instruction and mark the rest as data. Of course there are a few stumbling blocks:

  • Indirect or computed jumps are not so simple to deal with. For now, my method is the human method — taking a look and adding a map from indirect jump addresses to a list of possible destinations. My brain fantasizes that using abstract interpretation may give enough information to reason this out occasionally (especially with the use of a constraint solver like STP). Of course, I don’t have much confidence in the idea. I doubt I will try it, but I do continue to think about it.
  • Not playing nice with CALL or JSR instructions. A call instruction pushes the return address on the stack. Some pieces of code (including the one I talk about below) will either modify this stack variable or remove it from the stack and use it to compute the next jump. This makes it very difficult to follow control flow and breaks the call/return semantics we usually assume.

Those are the usual problems encountered when writing a disassembler. The 65816 adds another layer of pain: a particular opcode can have a variable-sized argument depending on a runtime flag state. Huh? Let me say that again: the interpretation of an instruction can differ between two dynamic executions. Example: The bytes A9 00 60 have two very different interpretations based on the value of the M flag in the P (Processor Status) register. If M is set (8-bit accumulator/memory access):

008000:    lda #$00
008002:    rts

If M is unset (16-bit accumulator/memory access):

008000:    lda #$6000

Can you see how this might make things difficult to disassemble statically? My solution is to associate a flag state with each address I encounter. An instruction isn’t decoded until the states of the M and X flags are known (the X flag sizes the X and Y index registers the same way M sizes the accumulator). I assume that once one path hits the instruction, the flag values it carries are valid. In other words, no two paths result in conflicting flag settings. This assumption could be invalid. Take the case of a path that can never execute because of constraints on external data. If that path is evaluated first, the flag settings could be wrong, and so would all further processing that depends on them. In practice, I don’t think this would happen outside of intentional obfuscation. In other words, my method is a simple data flow analysis without the fix-point iteration. (In case you were wondering, this is why I didn’t just add the (human generated) indirect jump destination addresses to the set of entry points: they need to be associated with a source jump to have a flags value.)
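Here is a toy Python sketch of that scheme: a worklist of (address, flags) pairs, where the first flag state to reach an address wins and nothing is revisited. It is not my actual tool. It knows only six opcodes (LDA immediate, REP, SEP, RTS, JMP absolute, JSR absolute), stays inside a single bank, and assumes a subroutine returns with the same flags it was called with. The names `load`, `decode`, and `disassemble` exist only for this example.

```python
M_FLAG, X_FLAG = 0x20, 0x10  # bits in the P register

def load(base, data):
    """Map raw bytes to addresses starting at base."""
    return {base + i: b for i, b in enumerate(data)}

def decode(mem, addr, p):
    """Decode one instruction under flag byte p. Returns (text, size)."""
    op = mem[addr]
    if op == 0xA9:  # LDA immediate: operand width depends on M
        if p & M_FLAG:
            return f"lda #${mem[addr + 1]:02x}", 2
        return f"lda #${mem[addr + 1] | (mem[addr + 2] << 8):04x}", 3
    if op in (0xC2, 0xE2):  # REP / SEP
        name = "rep" if op == 0xC2 else "sep"
        return f"{name} #${mem[addr + 1]:02x}", 2
    if op == 0x60:
        return "rts", 1
    if op in (0x4C, 0x20):  # JMP abs / JSR abs
        name = "jmp" if op == 0x4C else "jsr"
        return f"{name} ${mem[addr + 1] | (mem[addr + 2] << 8):04x}", 3
    raise ValueError(f"opcode {op:02x} is not in this toy subset")

def disassemble(mem, entries):
    """entries: iterable of (address, p). Returns {address: text}."""
    listing = {}
    work = list(entries)
    while work:
        addr, p = work.pop()
        if addr in listing or addr not in mem:
            continue  # first flag state to arrive wins; no fix-point iteration
        text, size = decode(mem, addr, p)
        listing[addr] = text
        op = mem[addr]
        if op == 0xC2:
            p &= ~mem[addr + 1] & 0xFF  # REP clears the named flag bits
        elif op == 0xE2:
            p |= mem[addr + 1]          # SEP sets them
        if op == 0x60:
            continue                    # RTS: this path ends here
        if op in (0x4C, 0x20):
            target = mem[addr + 1] | (mem[addr + 2] << 8)
            work.append((target, p))
            if op == 0x4C:
                continue                # JMP has no fall-through
        work.append((addr + size, p))   # assumes flags survive a JSR
    return listing
```

Running it on the bytes A9 00 60 at $8000 with P = $30 gives `lda #$00` followed by `rts`; with P = $00 the same bytes decode as a single `lda #$6000`.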

I now have my disassembler chugging along and have been working from both a reset and a VSYNC interrupt. I’ve commented about 16 pages of assembly and have a feel for the main game loop. I’m just getting to the real logic — so far it has just been bookkeeping for the GFX hardware and sound co-processor. In my analysis of the main game loop, I’ve encountered some super awesome code I had to share with someone:

0cc135: jsr $00879c
00879c: sty $05
00879e: ply
00879f: sty $02
0087a1: rep #$30
0087a3: and #$00ff
0087a6: sta $03
0087a8: asl A
0087a9: adc $03
0087ab: tay
0087ac: pla
0087ad: sta $03
0087af: iny
0087b0: lda [$02],Y
0087b2: sta $00
0087b4: iny
0087b5: lda [$02],Y
0087b7: sta $01
0087b9: sep #$30
0087bb: ldy $05
0087bd: jmp [$0000]

Below the fold is the translated version of this gem.
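As a quick, hedged reading of the listing: it looks like a jump-table dispatch, where a table of 24-bit pointers sits immediately after the call and A selects the entry. The Python sketch below models only the address arithmetic, and it leans on several assumptions that the listing does not state outright: the call is a long call that pushes a 3-byte return address (bank, high, low), the index registers are 8 bits wide on entry (so `ply` pulls one byte), and the direct page is at $0000 (so `$00` and `jmp [$0000]` name the same bytes). The function name and the `mem` dictionary are invented for the example.

```python
def dispatch_target(mem, pushed_return, a):
    """Compute where `jmp [$0000]` lands.

    mem:           dict mapping 24-bit addresses to bytes
    pushed_return: the 24-bit address the call pushed (on the 65816 this is
                   the address of the last byte of the call instruction)
    a:             accumulator value on entry
    """
    dp = {}                                  # direct-page scratch bytes $00-$02
    index = a & 0x00FF                       # and #$00ff
    y = (index << 1) + index                 # asl A ; adc $03  -> 3 * index
    # (asl leaves carry clear here because index fits in 8 bits)

    y += 1                                   # iny
    word = mem[pushed_return + y] | (mem[pushed_return + y + 1] << 8)
    dp[0], dp[1] = word & 0xFF, word >> 8    # sta $00 (16-bit store)

    y += 1                                   # iny
    word = mem[pushed_return + y] | (mem[pushed_return + y + 1] << 8)
    dp[1], dp[2] = word & 0xFF, word >> 8    # sta $01 (overlaps the first store)

    return dp[0] | (dp[1] << 8) | (dp[2] << 16)  # jmp [$0000]
```

Under those assumptions the target is simply the 3-byte little-endian pointer at `pushed_return + 1 + 3 * (a & 0xff)`, and the pulled return address is never pushed back, so the routine never returns to its caller.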

Written by dionthegod

November 23, 2008 at 7:12 am

Posted in Uncategorized