Triggering vulnerabilities and design flaws found through static analysis and research is a difficult process, and it gets harder when the vulnerability lies in a less accessible part of the code. We’ve developed a Python-based technique for effective, fairly quick prototyping and testing of such vulnerabilities.
When the vulnerability is in an entry point, verifying it is trivial: you can just load the vulnerable file in the application. Things get more complicated if the vulnerability is in a part of the application that is not directly reachable from an external file, such as the broker. One option in that case would be to exploit the entry point and then trigger the second vulnerability from shellcode. Although this is the end goal, it does not have to be where you start. Debugging the trigger for a second vulnerability can become complicated if your original assumptions were wrong and you have to modify your shellcode, so why not make things easier?
Using Python, we’ve developed a way to prototype and verify our assumptions about a potential vulnerability. Since our approach is not dependent on any particular entry vulnerability, it allows for more general tests. It also has the benefit of being more immediately gratifying while still allowing us to be fairly confident that we are testing things in a legitimate way. Specifically, we can send messages to window handles, make RPC calls, and access shared sections just as the program normally would.
We used two Python modules to pull this off: ctypes, from the standard library, and the RPyC library. Using Python, and more specifically the ctypes module, to instrument and inspect a binary is by no means a new concept. In fact, previous ZDI researchers documented various approaches in 2009 and 2011. What might be new is the way we used it.
The ctypes module is a foreign function library that allows us to make calls within the target process. We could use Cython or our own Python module written in pure C just as effectively, but ctypes was the simplest of the available options. RPyC is a remote procedure call module for Python that was chosen because of its ease of use. PyRO is another viable option, though it can be a little more complicated to set up.
We use the ctypes module to allocate a block of executable memory within the target process and then create a remote thread there. More specifically, we obtain a handle to the target process with OpenProcess, allocate an executable region with VirtualAllocEx, and then call CreateRemoteThread to start a thread at the beginning of the allocated region.
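The sequence above can be sketched with ctypes roughly as follows. This is a minimal illustration, not the script itself: the function names load_kernel32 and inject, the access mask, and the error handling are our own assumptions, and the WriteProcessMemory call that copies the payload into the allocated region is a step the description above does not name but that this sketch assumes is needed. The kernel32 object is passed in as a parameter so that the call sequence can be exercised without a live target.

```python
import ctypes

# Access rights CreateRemoteThread needs on the process handle:
# CREATE_THREAD | VM_OPERATION | VM_READ | VM_WRITE | QUERY_INFORMATION
PROCESS_ACCESS = 0x0002 | 0x0008 | 0x0010 | 0x0020 | 0x0400
MEM_COMMIT_RESERVE = 0x1000 | 0x2000
PAGE_EXECUTE_READWRITE = 0x40


def load_kernel32():
    """Windows only: load kernel32 and declare pointer-sized prototypes."""
    k32 = ctypes.WinDLL("kernel32", use_last_error=True)
    vp, dw, sz = ctypes.c_void_p, ctypes.c_ulong, ctypes.c_size_t
    k32.OpenProcess.restype = vp
    k32.OpenProcess.argtypes = [dw, ctypes.c_int, dw]
    k32.VirtualAllocEx.restype = vp
    k32.VirtualAllocEx.argtypes = [vp, vp, sz, dw, dw]
    k32.WriteProcessMemory.restype = ctypes.c_int
    k32.WriteProcessMemory.argtypes = [vp, vp, ctypes.c_char_p, sz,
                                       ctypes.POINTER(sz)]
    k32.CreateRemoteThread.restype = vp
    k32.CreateRemoteThread.argtypes = [vp, vp, sz, vp, vp, dw,
                                       ctypes.POINTER(dw)]
    return k32


def inject(k32, pid, payload):
    """Copy `payload` (bytes of machine code) into process `pid` and run it.

    Hypothetical helper: returns (remote_address, thread_handle).
    A real tool would also release the handles with CloseHandle.
    """
    process = k32.OpenProcess(PROCESS_ACCESS, False, pid)
    if not process:
        raise OSError("OpenProcess failed for pid {}".format(pid))
    remote = k32.VirtualAllocEx(process, None, len(payload),
                                MEM_COMMIT_RESERVE, PAGE_EXECUTE_READWRITE)
    if not remote:
        raise OSError("VirtualAllocEx failed")
    # Assumed step: copy the code into the region before starting the thread.
    if not k32.WriteProcessMemory(process, remote, payload, len(payload), None):
        raise OSError("WriteProcessMemory failed")
    # The new thread begins executing at the start of the allocated region.
    thread = k32.CreateRemoteThread(process, None, 0, remote, None, 0, None)
    if not thread:
        raise OSError("CreateRemoteThread failed")
    return remote, thread
```

On Windows this would be called as inject(load_kernel32(), pid, payload). Declaring restype and argtypes matters mostly for 64-bit targets, where ctypes would otherwise truncate handles and addresses to a C int.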
The remote thread will be responsible for loading the python27.dll module into memory and running through Python initialization as well as RPyC initialization. It then executes a Python stub to start the RPyC server. Since python27.dll was designed to be imported into other processes, it exports all the functions we need to accomplish this. The thread we create runs code that looks similar to the following C:
That code initializes the Python interpreter, imports relevant pieces of the RPyC module, and starts a server that will spawn a new Python thread for each connection. At this point we can remote into the process and start inspecting and executing code within the target process. Since RPyC allows you to transparently execute commands, we effectively have a full Python interpreter running within a sandboxed process and can now execute system calls as the target process. One minor design limitation is that the architecture of the Python installation must match that of the target process: 32-bit Python is required to inject into a 32-bit process, and 64-bit Python is required to inject into a 64-bit process.
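The Python stub that starts the server might look something like the sketch below. It is an assumption on our part rather than the stub from the script: build_stub and STUB_TEMPLATE are hypothetical names, the loopback bind address is our choice, and we assume the generated source is handed to the embedded interpreter as a string (for example through PyRun_SimpleString). RPyC's ThreadedServer handles each connection in its own thread, and SlaveService is the "classic" service that exposes the whole interpreter to the client.

```python
# Source text to be executed inside the target process once the
# interpreter has been initialized. It needs RPyC to be importable there.
STUB_TEMPLATE = """
import rpyc
from rpyc.utils.server import ThreadedServer

# SlaveService gives the client full access to this interpreter;
# ThreadedServer spawns one thread per incoming connection.
server = ThreadedServer(rpyc.SlaveService, hostname="127.0.0.1", port={port})
server.start()  # blocks the injected thread and serves requests
"""


def build_stub(port=18812):
    """Return the stub source for the given TCP port.

    18812 is RPyC's default port for classic-mode servers.
    """
    port = int(port)
    if not 0 < port < 65536:
        raise ValueError("port out of range: {}".format(port))
    return STUB_TEMPLATE.format(port=port)
```

Building the stub as text keeps the port configurable from the injecting side, and the result can be checked with compile() before it is ever sent to the target.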
You can download the script from here. Running the script is a matter of specifying the PID of the target process and optionally the port to start the RPyC service on. As a contrived example, here is a screenshot of calling kernel32!CreateProcessA within calc.
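Once the service is listening, a client session could look like the following sketch. It uses RPyC's classic-mode client, and it calls kernel32!GetCurrentProcessId as a simpler stand-in for the CreateProcessA call, since it takes no arguments and immediately confirms which process is executing the call. The helper names connect and remote_pid are hypothetical, and the host and port defaults are assumptions.

```python
def connect(port=18812, host="127.0.0.1"):
    """Open a classic-mode RPyC connection to the injected server."""
    import rpyc  # third-party; imported lazily so the helpers load without it
    return rpyc.classic.connect(host, port)


def remote_pid(conn):
    """Ask the target process for its own PID through the remote ctypes.

    conn.modules.ctypes is the ctypes module *inside the target*, so the
    call below is made by the target process, not by the client.
    """
    kernel32 = conn.modules.ctypes.windll.kernel32
    return kernel32.GetCurrentProcessId()


# Example session (requires a process that is already running the server):
#   conn = connect(18812)
#   print(remote_pid(conn))   # should match the PID passed to the script
```

Any other export can be reached the same way. Calls that take structures, such as CreateProcessA, also need their arguments built with the remote ctypes module so that the buffers live in the target's address space.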
This approach has a few drawbacks, the most obvious being that we have to load python27.dll into the target process. Using the ctypes and RPyC modules also loads a few other C modules into the target process’s address space. It is also much slower than other options, since each command has to be serialized by RPyC, sent over the network, deserialized, and then executed. This approach could also introduce issues by allowing you to bypass normal program flow, for example by avoiding any shims that were loaded. Despite these issues, the technique still ended up being effective for (fairly) rapid prototyping and testing of our assumptions.