In 1999 I had a course in Computer Architecture for my ComputerScience studies. Those of us who wrote a disassembler for Intel 8086/8088(using Intel 8086/8088 assember for that) were released from the exam andgot highest grades automatically. Needles to say, I did it.
8086 disassembler download 4 0
In essence, a disassembler is the exact opposite of an assembler. Where an assembler converts code written in an assembly language into binary machine code, a disassembler reverses the process and attempts to recreate the assembly code from the binary machine code.
Since most assembly languages have a one-to-one correspondence with underlying machine instructions, the process of disassembly is relatively straight-forward, and a basic disassembler can often be implemented simply by reading in bytes, and performing a table lookup. Of course, disassembly has its own problems and pitfalls, and they are covered later in this chapter.
Many disassemblers have the option to output assembly language instructions in Intel, AT&T, or (occasionally) HLA syntax. Examples in this book will use Intel and AT&T syntax interchangeably. We will typically not use HLA syntax for code examples, but that may change in the future.
Here we are going to list some commonly available disassembler tools. Notice that there are professional disassemblers (which cost money for a license) and there are freeware/shareware disassemblers. Each disassembler will have different features, so it is up to you as the reader to determine which tools you prefer to use.
Many of the Unix disassemblers, especially the open source ones, have been ported to other platforms, like Windows (mostly using MinGW or Cygwin). Some Disassemblers like otool ([OS X) are distro-specific.
Since data and instructions are all stored in an executable as binary data, the obvious question arises: how can a disassembler tell code from data? Is any given byte a variable, or part of an instruction?
Many interactive disassemblers will give the user the option to render segments of code as either code or data, but non-interactive disassemblers will make the separation automatically. Disassemblers often will provide the instruction AND the corresponding hex data on the same line, shifting the burden for decisions about the nature of the code to the user. Some disassemblers (e.g. ciasdis) will allow you to specify rules about whether to disassemble as data or code and invent label names, based on the content of the object under scrutiny. Scripting your own "crawler" in this way is more efficient; for large programs interactive disassembling may be impractical to the point of being unfeasible.
The general problem of separating code from data in arbitrary executable programs is equivalent to the halting problem. As a consequence, it is not possible to write a disassembler that will correctly separate code and data for all possible input programs. Reverse engineering is full of such theoretical limitations, although by Rice's theorem all interesting questions about program properties are undecidable (so compilers and many other tools that deal with programs in any form run into such limits as well). In practice a combination of interactive and automatic analysis and perseverance can handle all but programs specifically designed to thwart reverse engineering, like using encryption and decrypting code just prior to use, and moving code around in memory.
User defined textual identifiers, such as variable names, label names, and macros are removed by the assembly process. They may still be present in generated object files, for use by tools like debuggers and relocating linkers, but the direct connection is lost and re-establishing that connection requires more than a mere disassembler. Especially small constants may have more than one possible name. Operating system calls (like DLLs in MS-Windows, or syscalls in Unices) may be reconstructed, as their names appear in a separate segment or are known beforehand. Many disassemblers allow the user to attach a name to a label or constant based on his understanding of the code. These identifiers, in addition to comments in the source file, help to make the code more readable to a human, and can also shed some clues on the purpose of the code. Without these comments and identifiers, it is harder to understand the purpose of the source code, and it can be difficult to determine the algorithm being used by that code. When you combine this problem with the possibility that the code you are trying to read may, in reality, be data (as outlined above), then it can be even harder to determine what is going on. Another challenge is posed by modern optimising compilers; they inline small subroutines, then combine instructions over call and return boundaries. This loses valuable information about the way the program is structured.
Akin to Disassembly, Decompilers take the process a step further and actually try to reproduce the code in a high level language. Frequently, this high level language is C, because C is simple and primitive enough to facilitate the decompilation process. Decompilation does have its drawbacks, because lots of data and readability constructs are lost during the original compilation process, and they cannot be reproduced. Since the science of decompilation is still young, and results are "good" but not "great", this page will limit itself to a listing of decompilers, and a general (but brief) discussion of the possibilities of decompilation. Compared to disassemblers a decompiler generates code that doesnot require that one is familiar at the processor at hand. It may even be that the decompiled code can be compiled on a different processor, or give a reasonable starting point to reproduce the program on a different processor.
From a human disassembler's point of view, this is a nightmare, although this is straightforward to read in the original Assembly source code, as there is no way to decide if the db should be interpreted or not from the binary form, and this may contain various jumps to real executable code area, triggering analysis of code that should never be analysed, and interfering with the analysis of the real code (e.g. disassembling the above code from 0000h or 0001h won't give the same results at all).
IDA Pro as a disassembler is capable of creating maps of their execution to show the binary instructions that are actually executed by the processor in a symbolic representation (assembly language). Advanced techniques have been implemented into IDA Pro so that it can generate assembly language source code from machine-executable code and make this complex code more human-readable.
DisC was originally written in Borland C++ 3.0 (running on DOS), but now i dont have the compiler. Also i find that not many people are using it these days, so i have "ported" DisC to Win32! I have compiled and successfully executed DisC using Microsoft Visual C++ 6.0, though there shouldnt be any problems with other Win32 compilers. But once small quirk - you must have a front-end program called "PrepDisC" (it is also included with the downloads listed below!) which is compiled as a DOS executable using a DOS C compiler like TurboC, and DisC will use this program to convert the input program which you want to decompile, into its own internal format and then do the actual decompilation.
Please note that this code was written when i was trying to learn C++, so it is not a very well commented code, but dont hesitate to download and have a look - i have added quite a few comments at very important places and that should help you. If you already have a good knowledge of decompilation and the 8086 assembly language, then it would be a breeze for you!
Please take a look at the README.TXT files in the source distribution for instructions. Make sure you also download the TurboC compiler shown above, because you need that to test DisC - after all, this is a decompiler for TurboC only! Also make sure you have compiled the file "prepdisc.c" using TurboC on DOS, and store the file "prepdisc.exe" in the same directory as the DisC executable.
Pass target specific information to the disassembler. Only supported onsome targets. If it is necessary to specify more than onedisassembler option then multiple -M options can be used orcan be placed together into a comma separated list.
cpu=... allows one to enforce a particular ISA when disassemblinginstructions, overriding the -m value or whatever is in the ELF file.This might be useful to select ARC EM or HS ISA, because architecture is samefor those and disassembler relies on private ELF header data to decide if codeis for EM or HS. This option might be specified multiple times - only thelatest value will be used. Valid values are same as for the assembler-mcpu=... option.
This option can also be used for ARM architectures to force thedisassembler to interpret all instructions as Thumb instructions byusing the switch --disassembler-options=force-thumb. This can beuseful when attempting to disassemble thumb code produced by othercompilers.
If you're just looking to use a disassembler, then objdump is one choice. The disassembler that comes with the nasm assembler is ndisasm. You can also run "debug.exe" in DOS Box on Linux, provided you get a hold of a copy of the program. It also does disassembly, as well as controlled execution; i.e. simulation of the CPU, itself - which is also important, even when doing disassembly, for reasons I'm about to describe.
This gets to the other sense of your query: "I want to make a disassembler". The source for ndisasm is available, and it handles many of the descendants of 8086, not just 8086, itself (which seriously clutters it, if all you want is an 8086 or even 80386 disassembler), but it is not self-contained and has a heavy dependency on the rest of the distribution.
Its main talking point is that it uses octal digits for the opcodes - which better fits the 80x86 - as I pointed out on the USENET in 1995 in comp.lang.asm ... and (in fact) nasm's creation was a direct response to that. So, it's potentially more transparent and you may want to keep the source handy as a check and comparison, if you're making your own disassembler. 2ff7e9595c
Comments