September | 2009 | Stupid Compiler

Differential Reversing (or some better name)

Note: As a preface, I want to say I can’t decide on what to call this simple technique. Everyone seems to call it something different: filtering, differential debugging, coverage diffs, or delta traces. Either way it’s a simple idea, so I’m sticking with the first name that popped into my head. Whatever it’s called, it is important to know it’s been done many times and called a few different things. Carry on…

Motivation

Close your eyes and imagine…

In a moment of irrational team spirit, you, a vocal DC native (you actually live in Montgomery County — poser), bet $5,000 on your beloved Redskins. On Sunday, the Skins lose to the worst team in the league. You spend a night trying to destroy the part of your brain responsible for this lapse of judgment by consuming many many shots of tequila and making many many amazingly bad choices (attempting a bad idea overflow?). It’s 9am and you wake up with a hole where the team spirit functions of your brain used to be (I’m sure there was some collateral damage but I doubt anyone will notice). After a glass of orange juice (which you manage to keep down!), you remember you don’t have $5,000 (“It’s a lock!”… right). You reflect that Willie “Wet Work” Wollinski, your bookie, doesn’t actually seem like such a nice guy and would probably not “be so nice as to forget the whole thing.” Your brainstorming session helps you realize that you have no marketable skills… besides vulnerability discovery and exploit development!

You decide to try to find a bug in some large, widely installed software and sell your newfound bug to ZDI or iDefense. Having already read this blog post, you use differential reversing to pinpoint the implementation of an interesting feature in your target application, use the IDA plug-in to audit, find an exploitable bug, pound out a PoC and write a fairly weak advisory (but it should be worth $5,000). Hooray! You can keep your kneecaps (for this week at least). Thank $DEITY you read this blog post and didn’t waste time auditing extra sections of the target binary.

Overview

Differential reversing (as I am deeming it) is a really simple method to select starting points in a binary for dynamic auditing. This isn’t a new idea and I didn’t invent this technique. People have been doing this for-evah. I’m just documenting a useful set of tools I’ve developed to make my life easier. Pedram Amini’s Process Stalker can do this (it calls it “filtering”). It does a bunch of other awesome stuff too. I’m also told Zynamics BinNavi can do this (they call it “differential debugging”), but I don’t have a rich uncle to buy me a copy so I cannot give a firsthand account of how it works. It looks pretty nice; check out the BinNavi page for details.

This post is organized as follows. First, I’ll describe the method and the steps in implementing it. Next, I’ll describe the three small tools I’ve written and their implementation. At the end, I’ll show a little example of the tools in action on a nice proprietary application. All the tools are written for use on Windows and have been tested on XP. The technique is generic and could be used on any platform.

Differential Reversing

When I’m reversing, I’m always trying to find a place to add a good breakpoint; in other words, I’m not yet a great reverse engineer. I still spend more time in the debugger than IDA. I’ve seen suggestions to count cross references and get the frequently called functions reversed first. This makes great sense. After doing so, you get the memory primitives out of the way. I have a problem with the next step: where do you go next? My solution to this problem is to use dynamic information to find areas I am interested in. In large binaries, unless you can find some good data cross-references (strings or unique constants), it is very hard to statically find the areas of interest. On the other hand, it is usually easy to exercise the code you want dynamically. For example, you can exercise the de-frobbing code by passing your application frobbed data. Record a trace of execution while the application is processing your input and you will have a bound on where the interesting code is located. Next, the problem is how to search your large basic block run trace for the de-frobbing code. The next logical step is to create a baseline trace of the code hit by other inputs that do not exercise the de-frobbing code. By removing the blocks hit by the baseline trace from the de-frob trace, you have narrowed the search greatly. That is differential reversing (or, at least, that is what I’m calling it).

Screenshot

Tools

There are two obvious tools needed: a tool to capture the set of basic blocks hit during a run and a tool to produce a set of basic blocks given a baseline set and a trigger set. For the first tool (BlockCov), I’ve written a Pin tool to capture the basic blocks hit during a run. The Pin tool takes as arguments a set of modules (executables or libraries) of interest. This allows the GUI and system stuff to be ignored at the trace level (in other words, we aren’t even going to record hits in modules outside the whitelist of modules). The output is a simple list of basic blocks hit for each module. It also records the module load address in case multiple runs load the module at different virtual addresses.

The second tool is a small Python script (diffre.py). The script creates a stable set of blocks by loading multiple runs using the same input and discarding any blocks that don’t exist in all runs with that input. Once a stable set of blocks has been created for both the baseline and the trigger, those blocks appearing only in the trigger set are recorded in an output set of blocks. Finally, this output is provided to a small IDA plug-in (IDACov) that colors the blocks that are hit and builds a list of covered functions to quickly navigate to the areas of interest. (Actually, since I started this blog post, I rewrote this plug-in as an IDAPython script; both are included in the archive.)

Tool #1: BlockCov

BlockCov is a Pin tool that monitors a running process and records each basic block executed. Pin is a dynamic binary instrumentation (DBI) framework made available by Intel. It allows us to monitor the execution while adding very little overhead and maintaining a reasonable runtime. Pin publishes an easy to use API and extensive documentation. The mailing list is active and the replies are quick. The downside of using a DBI framework is the difficulty of debugging your tool. Most of the time, you end up using printf debugging techniques. Despite this part of the process, Pin allows you to do some things that would otherwise be too slow to do with a normal debugger. The tradeoff is lack of flexibility, but with the right tools that can be mitigated. But we’re off on a tangent…

BlockCov reduces the overhead by using an address range filter. A set of interesting images is given using command line switches to exclude GUI and system code at the trace level (of course it can still be included if that is what you are interested in). This filter is created by hooking image loads (PE files: executables and DLLs). When an image is loaded, the filename of the loaded image is checked against the whitelist. If a match is found, the image address range is stored along with the image name in a loaded module list.

Pin works by dynamically rewriting IA32 (or x64 or IA64) instructions just before execution. The rewrite accomplishes two things: first, it ensures the process under execution does not escape control of the Pin driver and, second, it allows a Pin tool to insert instrumentation hooks at any point in the process. We want to record every execution of a basic block within the loaded whitelist modules. We ask Pin to call us every time it does this translation. When BlockCov gets this callback, it looks at the addresses being translated. If the translation falls within an interesting module, then a function call is inserted to record that this block has been hit. Effectively, this is like adding a “CALL RecordBlockHit” at the start of every interesting block before running the process. When the process exits, the recorded set of block addresses is dumped for each interesting module. BlockCov is fairly straightforward; it doesn’t do much.
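The filter logic described above can be modeled in a few lines of Python. This is only a sketch of the idea, not BlockCov's code and not the Pin API: the class name `ModuleFilter`, its method names, the case-insensitive name match, the choice to store offsets from the image's low address, and the address ranges in the demo are all assumptions made for illustration.

```python
import ntpath


class ModuleFilter:
    """Model of a whitelist-based address range filter (not Pin API code)."""

    def __init__(self, whitelist):
        # Assumption: image names are compared case-insensitively.
        self.whitelist = {name.lower() for name in whitelist}
        self.loaded = []   # (low, high, image name) for whitelisted images
        self.hits = {}     # image name -> set of block offsets

    def on_image_load(self, path, low, high):
        """Called for every image load; keep ranges of whitelisted images only."""
        name = ntpath.basename(path)
        if name.lower() in self.whitelist:
            self.loaded.append((low, high, name))
            self.hits.setdefault(name, set())

    def find_module(self, address):
        """Return (low, name) of the whitelisted image containing address."""
        for low, high, name in self.loaded:
            if low <= address <= high:
                return low, name
        return None

    def on_block(self, address):
        """Stand-in for translation plus the inserted recording call."""
        module = self.find_module(address)
        if module is None:
            return False           # outside the whitelist: not instrumented
        low, name = module
        self.hits[name].add(address - low)
        return True


# Made-up load addresses, purely for the demo.
flt = ModuleFilter(["NPSWF32.dll"])
flt.on_image_load(r"C:\WINDOWS\system32\kernel32.dll", 0x7C800000, 0x7C8F5FFF)
flt.on_image_load(r"C:\WINDOWS\system32\Macromed\Flash\NPSWF32.dll",
                  0x10000000, 0x103FFFFF)
print(flt.on_block(0x7C801000))    # block in a non-whitelisted image
print(flt.on_block(0x10175903))    # block in the whitelisted image
```

In the real tool the range check happens once, when Pin translates a block, and only the recording call runs on every execution; the model collapses both steps into `on_block` to keep it short.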

Tool #2: diffre.py

diffre.py is a script that has two functions. To avoid spurious differences in a run caused by processes not dependent on the inputs we control, multiple runs are recorded using BlockCov before processing with diffre.py. The script will then take all runs with the same input and filter out any blocks which are not present in all traces. You can come up with instances when this wouldn’t be useful, or even when it might be counterproductive, but it has been more useful this way (YMMV). We will call the resulting set of blocks the stable set. Once that has been computed for both the baseline input runs and the trigger input runs, these two sets are compared and a set difference gives the blocks that are unique to the trigger input. This set is output to a file for the IDA plug-in (or anything else you want to do with it).
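The two steps, an intersection across runs followed by a set difference, are small enough to sketch directly. This is not the code from diffre.py; the function names `stable_set` and `unique_to_trigger` and the toy block addresses are assumptions for illustration.

```python
def stable_set(runs):
    """Keep only the blocks that appear in every run of the same input."""
    runs = [set(r) for r in runs]
    if not runs:
        return set()
    return set.intersection(*runs)


def unique_to_trigger(baseline_runs, trigger_runs):
    """Blocks in the trigger's stable set that the baseline's stable set lacks."""
    return stable_set(trigger_runs) - stable_set(baseline_runs)


# Toy traces: 0x99 and 0x77 each show up in only one run (noise).
baseline_runs = [{0x10, 0x20, 0x30, 0x99}, {0x10, 0x20, 0x30}]
trigger_runs = [{0x10, 0x20, 0x30, 0x40, 0x50, 0x77},
                {0x10, 0x20, 0x30, 0x40, 0x50}]

result = unique_to_trigger(baseline_runs, trigger_runs)
print(sorted(hex(b) for b in result))
```

One way the stable-set filter can work against you: a block that is flaky in the baseline runs but hit in every trigger run is dropped from the baseline's stable set and therefore reported as unique to the trigger, even though the baseline input can reach it.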

Tool #3: IDACov

IDACov is a really simple plug-in that takes a list of basic block starting addresses as input. It colors the instructions in each listed basic block blue and any function containing a colored block light blue. It also makes a list of functions with highlighted blocks for quick navigation. I’m guessing there are plug-ins/IDAPython/IDC that do almost the exact same thing, but I’m learning the SDK and this was a good simple exercise. I’ll be re-implementing this in IDAPython soon to see how much cleaner that is. Oh, look, I did it already. IDAPython is great.
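The navigation list boils down to grouping block addresses by the function that contains them. The sketch below does that grouping in plain Python, outside IDA, so it uses no IDA SDK or IDAPython calls; inside IDA the function boundaries would come from the database. The function table, its `sub_...` names, and the helper name `group_blocks_by_function` are hypothetical.

```python
import bisect
from collections import defaultdict


def group_blocks_by_function(block_addrs, functions):
    """Map each block to its containing function.

    functions: list of (start, end, name) sorted by start, end exclusive.
    Blocks that fall outside every function are skipped.
    """
    starts = [start for start, _end, _name in functions]
    grouped = defaultdict(list)
    for addr in sorted(block_addrs):
        # Index of the last function whose start is <= addr.
        i = bisect.bisect_right(starts, addr) - 1
        if i < 0:
            continue
        _start, end, name = functions[i]
        if addr < end:
            grouped[name].append(addr)
    return dict(grouped)


# Hypothetical function table and block list.
functions = [
    (0x1000, 0x1080, "sub_1000"),
    (0x1080, 0x1200, "sub_1080"),
    (0x2000, 0x2040, "sub_2000"),
]
blocks = {0x1010, 0x1090, 0x10C4, 0x1F00}

for name, addrs in group_blocks_by_function(blocks, functions).items():
    print(name, [hex(a) for a in addrs])
```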

Building the Tools

First, grab a current snapshot.

To use the tools, you’ll need Pin 29972 and a recent Visual Studio (the free Express version will work fine). When you unpack Pin, you’ll get a directory with a name like pin-2.7-29972-blah; we’ll call this $PINROOT. Unpack the DiffCov tools into $PINROOT\source\tools\. This should place all the tools under $PINROOT\source\tools\DiffCov. Open the DiffCov.sln solution file and build both the pintool and the IDA plug-in. The solution assumes you have IDA at C:\Program Files\IDA and that you want to build the plugin in the \plugins directory under IDA. If you don’t want it there, modify the properties of the IDACov project. The sample SWF files used for input are included, but if you want to compile them from the HaXe source, you will need HaXe installed. Oh, also, the IDA plug-in expects the SDK to be at C:\Program Files\IDA\idasdk55, which is another thing you can fix in the project properties if you need to. Alternatively, the package includes a compiled version of the plug-in. The Pin tool is not distributed in compiled form; you’ll have to build that yourself.

Use Case: Adobe Flash and AMF

The Adobe Flash Player uses some incarnation of the Tamarin framework. This means much of the front-side of Flash is open-sourced. The back-side, the ActionScript API, is not open-source. Flash has a built-in serialization protocol called Action Message Format (or AMF). The ByteArray class in flash.utils supports serialization and de-serialization of byte streams using this format. The format is described in an open document from Adobe’s wiki. We will be focusing on AMF3 because that is what the latest ActionScript API uses by default, although it would be pretty simple to modify the two inputs to find the processing of an AMF0 message. Our goal is to find the parsing of an AMF message in the Flash Player plug-in. I tend to use Firefox for this, so my examples will be using Firefox to launch Flash Player.

Our first step is creating two different inputs that are as similar as possible yet only one will exercise the AMF object parsing codepath. Below are the two HaXe programs to do just that:

Baseline

class Test {
  static function main() {
    var ba = new flash.utils.ByteArray();
    ba.writeByte(0x04);
    ba.writeByte(0x01);
    ba.position = 0;
  }
}

AMF Integer Parse

class Test {
  static function main() {
    var ba = new flash.utils.ByteArray();
    ba.writeByte(0x04);
    ba.writeByte(0x01);
    ba.position = 0;
    var v = ba.readObject();
  }
}
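The two bytes written by both programs are a complete AMF3 message: 0x04 is the AMF3 integer marker and 0x01 is the value, encoded as a variable-length U29. To make clear what readObject has to do with them, here is a small Python decoder for that one case. It is a sketch based on the AMF3 specification, not on anything recovered from Flash Player; the function names are made up, and the sign handling the specification defines for 29-bit integers is left out for brevity.

```python
AMF3_INTEGER_MARKER = 0x04


def read_u29(data, pos=0):
    """Decode an AMF3 U29 value; return (value, position after it).

    The first three bytes carry 7 bits each, with the high bit set when
    another byte follows. A fourth byte, if present, carries 8 bits.
    """
    value = 0
    for i in range(4):
        byte = data[pos]
        pos += 1
        if i < 3:
            value = (value << 7) | (byte & 0x7F)
            if not byte & 0x80:
                break
        else:
            value = (value << 8) | byte
    return value, pos


def read_amf3_integer(data):
    """Decode a message consisting of an integer marker and a U29 value."""
    if data[0] != AMF3_INTEGER_MARKER:
        raise ValueError("not an AMF3 integer message")
    value, _ = read_u29(data, 1)
    return value


# The same two bytes the HaXe samples write into the ByteArray.
print(read_amf3_integer(bytes([0x04, 0x01])))
```

Since the baseline writes the same bytes and only skips the readObject call, any block that is unique to the second program should belong to this kind of parsing logic or to code it calls.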

Now that we have our inputs, let’s run Firefox under the BlockCov tool to capture some coverage sets. We will pass a single whitelisted image to BlockCov: NPSWF32.dll. This is the Flash Player plug-in used by Firefox. Since we are only whitelisting the Flash DLL, none of the Firefox code will be captured, which keeps the overhead low and the block trace smaller. Below is a transcript of 4 runs of BlockCov. Note that BlockCov takes an id and a run parameter; the id parameter is a name for the input used in this run (it shouldn’t change when doing multiple runs with the same input) and the run parameter is a number to give this run (it differentiates between multiple runs with the same input). Keep in mind I’m using a Firefox profile called “fuzz” to run this under, so you’ll have to modify the command line to get rid of the -no-remote and -P fuzz switches if you want to run under the default profile.

E:\tools\PinTools\pin-2.6-27887\source\tools\DiffCov\Debug>..\..\..\..\ia32\bin\pin.exe -t BlockCov.dll -mw NPSWF32.dll -id base -run 0 -- "c:/program files/mozilla firefox/firefox.exe" -no-remote -P fuzz "E:\tools\PinTools\pin-2.6-27887\source\tools\DiffCov\Samples\AMFInt-Baseline\Test.swf"

E:\tools\PinTools\pin-2.6-27887\source\tools\DiffCov\Debug>..\..\..\..\ia32\bin\pin.exe -t BlockCov.dll -mw NPSWF32.dll -id base -run 1 -- "c:/program files/mozilla firefox/firefox.exe" -no-remote -P fuzz "E:\tools\PinTools\pin-2.6-27887\source\tools\DiffCov\Samples\AMFInt-Baseline\Test.swf"

E:\tools\PinTools\pin-2.6-27887\source\tools\DiffCov\Debug>..\..\..\..\ia32\bin\pin.exe -t BlockCov.dll -mw NPSWF32.dll -id amfint -run 0 -- "c:/program files/mozilla firefox/firefox.exe" -no-remote -P fuzz "E:\tools\PinTools\pin-2.6-27887\source\tools\DiffCov\Samples\AMFInt\Test.swf"

E:\tools\PinTools\pin-2.6-27887\source\tools\DiffCov\Debug>..\..\..\..\ia32\bin\pin.exe -t BlockCov.dll -mw NPSWF32.dll -id amfint -run 1 -- "c:/program files/mozilla firefox/firefox.exe" -no-remote -P fuzz "E:\tools\PinTools\pin-2.6-27887\source\tools\DiffCov\Samples\AMFInt\Test.swf"

These four runs have generated four block sets: base-0-NPSWF32.dll.blocks, base-1-NPSWF32.dll.blocks, amfint-0-NPSWF32.dll.blocks, and amfint-1-NPSWF32.dll.blocks. Next up, run diffre.py from within the directory containing these four block sets. This should output two files: amfint-results.blocks and base-results.blocks. These are human readable and list the addresses of the blocks of interest. The addresses are offsets from the loaded image base (often 0x10000000 in IDA for DLLs).
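The file names above follow an id-run-module.blocks pattern, which is all a script needs to pair up runs of the same input. As an illustration, the sketch below groups such names by input id and module. It is not taken from diffre.py; the regular expression assumes the id itself contains no hyphen, and the helper name `group_runs` is made up.

```python
import re
from collections import defaultdict

# Assumed pattern: <id>-<run>-<module>.blocks, with no hyphen inside <id>.
BLOCK_FILE = re.compile(r"^(?P<id>[^-]+)-(?P<run>\d+)-(?P<module>.+)\.blocks$")


def group_runs(filenames):
    """Return {(id, module): {run number: filename}} for matching names."""
    groups = defaultdict(dict)
    for name in filenames:
        match = BLOCK_FILE.match(name)
        if match:
            key = (match["id"], match["module"])
            groups[key][int(match["run"])] = name
    return dict(groups)


names = [
    "base-0-NPSWF32.dll.blocks",
    "base-1-NPSWF32.dll.blocks",
    "amfint-0-NPSWF32.dll.blocks",
    "amfint-1-NPSWF32.dll.blocks",
    "amfint-results.blocks",      # output file, not a run: ignored
]

for key, runs in group_runs(names).items():
    print(key, sorted(runs))
```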

If you own IDA, fire it up and load NPSWF32.dll (C:\WINDOWS\system32\Macromed\Flash\NPSWF32.dll). When the analysis is complete, load the IDACov plug-in. A file dialog should pop up asking for a results file to load. Point it to the amfint-results.blocks produced by diffre.py and voila. Here’s another screenshot:

About 20 functions to inspect. Those go by pretty quick and the most interesting one (offset 0x00175903) is what appears to be the readObject implementation. See the switch statement covering all the AMF markers listed in the AMF3 specification (oh, look, 2 don’t appear in the specification).
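The offset quoted above, like every address in the results file, is relative to the image base, so it has to be rebased before you can jump to it. A minimal sketch of the arithmetic, assuming IDA loaded the DLL at 0x10000000 (the helper names and the runtime load address in the second example are invented):

```python
IDA_IMAGE_BASE = 0x10000000   # assumption: the base IDA chose for this DLL


def offset_to_address(offset, image_base=IDA_IMAGE_BASE):
    """Turn an image-relative offset into an address in the IDA database."""
    return image_base + offset


def address_to_offset(address, load_base):
    """Turn a runtime address into an offset from the module's load address."""
    return address - load_base


# The readObject candidate from the results file.
print(hex(offset_to_address(0x00175903)))

# A run that loaded the DLL somewhere else yields the same offset.
print(hex(address_to_offset(0x02BA5903, load_base=0x02A30000)))
```

Working in offsets is what lets traces from runs with different load addresses be intersected and diffed as plain sets.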

Future Posts

I’ve recently written a Pin tool to gather a detailed run trace. This records instructions executed, memory read or written, and register value changes. It was inspired by MSR’s Nirvana project. On top of that, I have some simple analyses; one tracks tainted data and hooks up to an IDA plug-in shown in the screenshot:

The tainted data source is translated into a parse tree node to quickly identify how various fields in a file format are processed within an executable (note the tree on the right). Eventually, I’d like to hook this up to hex-rays to get some nice auto-commenting (but first, I have to convince my boss to spend the money on it). All of that is for another day and another post (hopefully with less than 6 months in between this one and the next). There is also some static analysis I’ve written to do control dependency calculations — useful for determining the span of a tainted conditional jump. Another future project is implementing some smart fuzzing tools using the trace collection engine and some SMT solver. Basically, all the cool stuff the academics get to do.

I hope this was useful to some people — much of this has been repeated before in tools like PaiMei, but this is a slightly different way to go about it. Thanks for reading this far. I can be contacted at dion@semantiscope.com with any questions or comments.

Happy hunting!

Written by dionthegod

September 29, 2009 at 9:05 pm