Tracing assembler code with minimally defined state

Given the following assembler code, which we have never seen before:

push 34h
call check_stacksize
push ebx
push ecx
push esi
push edi
push ebp
mov ebp, esp
sub esp, 1Ch

mov esi, eax
mov [ebp+dt], edx
cmp [eax+person.index], 0
jl short 1BBA5
cmp [eax+person.name], 0
jz short 1BB6B
Figure 1: a complicated routine

What does it do, and how do we find out? The code snippet is quite simple, so we could simply trace it ourselves, following the flow of control through the jumps and calls, and deduce from that what its purpose is and how it works in detail. As you can see, we have some semantic information ("person.index", "person.name") which is crucial to understanding what the code does.

But what do we do if our program gets more and more complicated, something like in Figure 1? It will be much harder to deduce the purpose and function of the code there. Why?

Let’s look at how we normally approach the question "what does this code do?" (independent of whether it is written in Assembler, Basic or any other language).

The first aspect is the purpose of a piece of code. Because we can assume that it was written by a human being, we can also assume that the structure of the program – even if the compiler did heavy optimization – still reflects the author’s approach to the problem. There will probably be a modularization, a separation of code by task and timing. Even in assembler code, statements following each other will – most of the time – have something to do with each other and work on the same task. Therefore we can identify chunks of code which can be understood as actions and tasks (open a file, read a specific object from file, etc.). And, most importantly, there will be a sense of purpose. Code doesn’t just sit there for the sake of being; it has a purpose, and this purpose is connected to the purpose of the whole program. If this code looks into the properties of a "person object", it will probably do something relevant with that data. By examining which properties the code reads and writes, what other actions are performed in its vicinity and which kinds of objects it combines, we can narrow down the set of possible purposes of that code. To sum this up: even for a piece of assembler code it is possible to split it into chunks and understand the whole piece in a more abstract sense, i.e. assign a purpose to it.

The second aspect of the question "what does it do?" is connected to the specific behaviour of the code for a given state of the system. For example, if we have an initialized person and we run our code on it, how exactly will the state of the person change? This second aspect of the question is often much harder to answer than the first, because it normally requires a line-by-line tracing of the program flow. Why? Because in assembler code, the state of the program is largely represented by the state of the CPU, and the CPU owns only a limited number of state variables. It is like reading a C listing in which the number of variables has been limited to six. There will be a lot of value-switching and temporary parking, and all this state we have to keep in mind during tracing. The longer and more complicated our code gets, the more state we have to keep in mind. A possible solution might be to break down the problem into smaller chunks by identifying partitions of the code which work independently of each other. But the very task of ruling out the dependency between two pieces of code often requires a deep understanding of the state of the CPU and the memory at a specific point of execution.

It seems as if we cannot avoid tracing code. Debuggers were invented for exactly this case. They have the great advantage that they keep track of the state (CPU and memory) for us. We can concentrate on the semantically relevant points in the code and forget the rest.

But what if we don’t have a full, executable program at hand? Even if we had one, we would have to worry about whether our piece of code is executed at all. We would have to provide a lot of state (hardware state, memory, operating system, execution environment) just to execute our little snippet! What if we just want to understand this little, isolated piece of code? A classical debugger is always restricted to following a single path of execution, given its state. But we might want to know what happens if we take the other road at a specific junction. Does the result of the execution even depend on a specific state variable?

Those questions cannot be answered by classical debugging techniques. What we require is

  1. Minimal state. We want to provide only those state variables which are important for the behaviour of our code. If the state of the graphics card is never queried, we don’t want to provide it.
  2. Alternative paths. We want to be able to overview all the possible alternative routes of execution through our code and their dependency on the "input variables".

What we want is symbolic execution, a technique developed to (automatically) prove the correctness of programs. And this is the idea I am currently working on. There are already frameworks to do symbolic execution of binary programs (https://github.com/feliam/pysymemu). But I don’t want to do it with a complete program, but simply with an isolated piece of assembler code.
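The core idea can be sketched in a few lines of Python. This is a toy interpreter I made up purely for illustration (it is not pysymemu's API): it runs a micro-assembly in which one register holds an unknown symbolic value, forks the state at a conditional jump, and collects the path condition for every route – exactly the "minimal state" and "alternative paths" asked for above.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Sym:
    """An unknown input value, identified only by its name."""
    name: str
    def __repr__(self):
        return self.name

@dataclass
class State:
    regs: dict = field(default_factory=dict)
    constraints: list = field(default_factory=list)  # the path condition

def execute(program, state, pc=0, paths=None):
    """Interpret the program; fork the state at every conditional jump."""
    if paths is None:
        paths = []
    while pc < len(program):
        op, *args = program[pc]
        if op == "mov":
            dst, src = args
            # a known register source is looked up; anything else is a literal
            state.regs[dst] = state.regs.get(src, src)
        elif op == "jl":
            target, reg = args
            # explore the taken branch with a copied, further-constrained state
            taken = State(dict(state.regs),
                          state.constraints + [f"{state.regs[reg]} < 0"])
            execute(program, taken, target, paths)
            # ... then continue on the fall-through branch
            state.constraints.append(f"{state.regs[reg]} >= 0")
        pc += 1
    paths.append(state)
    return paths

# mirrors the flavour of the snippet above: esi receives the unknown eax,
# then execution branches on its sign
program = [
    ("mov", "esi", Sym("eax")),
    ("jl", 3, "esi"),
    ("mov", "ecx", 0),
]

for p in execute(program, State()):
    print(p.constraints, p.regs)
```

Each resulting state carries only the registers that were actually touched (minimal state) and the list of constraints that led to it (one entry per alternative path). A real implementation would of course use an SMT solver instead of constraint strings.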

Reconstructing Ecstatica’s Level Geometry MK2

Amazing! As it turned out, what I expected to be the navigation and collision mesh is actually almost the complete level geometry, including walls, stairs, windows, etc. What’s missing is triangle-based geometry like roofs and windows, as well as all the static ellipse-based models (flowers, grass, etc.). Nonetheless, this is great news, because the whole game world can now be explored and looked at in 3D!

Extracted Level Geometry from Ecstatica

Reconstructing Ecstatica’s Level Geometry

Alone in the Dark 3

Ecstatica 1 takes place in a small village upon a rock, surrounded by a deep valley. The walkable area is restricted to this very rock, but it is still quite big, and it is open – not just a bunch of rooms but an exterior area. If you take a closer look at Ecstatica’s backgrounds, you will be puzzled by the really high quality of the level geometry. Yes, there is real geometry behind the backgrounds. It even looks like not only the characters but also the walls, the floor and the plants are all made of ellipses. Those stone floors and walls look just like something you would use displacement mapping or metaballs for today. Compare that to titles like "Alone in the Dark" (see screenshot below), which were using painted environments. Alone in the Dark is quite a visually appealing title, but having actual geometry to render the backgrounds from is something different. Remember: when Ecstatica was shipped, it was 1994! How the hell did those Britons do this? This was not a multi-million dollar production. On what hardware were they able to create and render this vast amount of geometric data?

Screenshot demonstrating the high quality of Ecstatica's level geometry

I recently worked on some code doing a reconstruction of the level geometry based on the set of about 240 background images. I managed to mix this reconstructed geometry with the collision and navigation mesh. The current state already looks interesting. I probably won’t be able to explicitly reconstruct the complete geometry…

  • the available camera views are too sparsely distributed
  • the texture information I reconstruct quickly diminishes in quality as you get farther away from the camera, due to the projection effect
  • it would require a LOT of post-processing in order to merge and clean up the generated mesh data
  • it’s actually too much geometry. Given my 8 gigs of RAM, I’m unable to represent all 340 backgrounds in one Blender session – even though I threw away around 99.5% of all generated polygons (using the "Decimate" modifier).

How did they do it in 1994? My current assumptions are:

  • They chopped the game world into chunks swallowable for 1994 hardware.
  • For individual chunks they used some CAD software to create coarse geometry (walls, floors, stairs, etc.).
  • They used a custom raytracer to shoot rays through the world, querying the relevant geometry chunk and then, as a final step, generating the ellipses on-the-fly using some form of texture mapping. I base this assumption on the fact that the stone walls show certain patterns. The basic principle of this last step is actually rather simple: instead of calculating the ray intersection with a plane from the coarse geometry, you put a set of ellipses there, each described only by its center and its radii. Because you’re systematically scanlining over the sensor area of your virtual camera, you should be able to keep only those ellipses in memory which are actually hit by rays.
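The geometric core of that last step can be sketched directly. The following is a minimal illustration I wrote for this post (not Ecstatica's actual code): intersecting a ray with an axis-aligned ellipsoid by scaling the ray into the ellipsoid's unit-sphere space and solving the resulting quadratic.

```python
import math

def ray_ellipsoid(origin, direction, center, radii):
    """Return the nearest positive hit distance t along the ray, or None.

    Works by transforming into the space where the ellipsoid becomes a
    unit sphere, then solving |o + t*d|^2 = 1 for t. For simplicity this
    sketch ignores the case of a ray origin inside the ellipsoid.
    """
    o = [(origin[i] - center[i]) / radii[i] for i in range(3)]
    d = [direction[i] / radii[i] for i in range(3)]
    a = sum(x * x for x in d)
    b = 2 * sum(o[i] * d[i] for i in range(3))
    c = sum(x * x for x in o) - 1.0
    disc = b * b - 4 * a * c
    if disc < 0:
        return None          # the ray misses the ellipsoid entirely
    t = (-b - math.sqrt(disc)) / (2 * a)
    return t if t > 0 else None
```

Each ellipsoid really is just a center and three radii, so keeping only the ones hit by the current scanline's rays in memory is cheap.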
Current state of my attempt to reconstruct Ecstatica's level geometry

Approaching a breakthrough

Hi, it’s been a long time again since my last update. For some months I was lacking the motivation to continue work on my Ecstatica project. But during the last week of my Christmas vacation, I started to work on it again, and finally all the frustrating research into the data structures employed to describe characters in Ecstatica is starting to show signs of success. I think I now have a pretty good picture of how Ecstatica does it, and therefore I was able to put this knowledge into a little Python script run inside Blender to generate Ecstatica characters.

As you can see, the main character is missing its nose. I currently assume that it is described by triangles, which are not generated yet. I’m working on that. After that, the next milestone will be to translate character animations into Blender animations.

Update on Research on Ecstatica

Things move slowly forward. This weekend I have continued to implement some Python routines to

  • load an Ecstatica FANT file
  • write an Ecstatica FANT file
  • write an XML representation of the FANT file
  • read an XML representation of the FANT file

Those four building blocks are in an early stage right now. At the moment I have verified that I can load a FANT file and write it back without introducing any relevant differences. I’m still having some precision issues with converting the fixed-point numbers to floats and back.
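For illustration, here is what such a conversion round trip looks like, assuming a hypothetical 16.16 fixed-point layout (the actual FANT encoding is not documented here and may differ). With round-to-nearest instead of plain truncation, the round trip is exact:

```python
FRACTION_BITS = 16              # assumed 16.16 layout, not a documented fact
SCALE = 1 << FRACTION_BITS

def fixed_to_float(raw):
    """Interpret a raw integer as a 16.16 fixed-point number."""
    return raw / SCALE

def float_to_fixed(value):
    """Convert back; rounding avoids the drift that truncation would cause."""
    return int(round(value * SCALE))

raw = 0x00018000                # 1.5 in 16.16
assert fixed_to_float(raw) == 1.5
assert float_to_fixed(fixed_to_float(raw)) == raw
```

Because the scale factor is a power of two, every 32-bit raw value is representable exactly in a double, so precision problems usually point at truncation (or a different scale factor) somewhere in the pipeline.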

Using these tools I will be able to modify the game data files and see what happens. I’m thinking about giving the main character green hair or something like that 😀

Blockheads invade the village

Ecstatica fortunately still contains a lot of debug code, part of which is not even called anymore (strangely, it was not stripped from the binary).

One of those tiny little debug features is to draw a box around every ellipsoid. The result looks kinda funny ;D