Introduction
The decompiler attempts to translate from low-level representations of computer programs into high-level representations. Thus it needs to model concepts from both the low-level machine hardware domain and from the high-level software programming domain.
Understanding the classes within the source code that implement these models provides the quickest inroad into obtaining an overall understanding of the code.
We list all these fundamental classes here, loosely grouped as follows. One set of classes describes the Syntax Trees, which are built up from the original p-code and transformed during the decompiler's simplification process. The Translation classes do the actual building of the syntax trees from binary executables, and the Transformation classes do the actual work of transforming them. Finally, there are the High-level classes, which for the decompiler represent recovered information describing familiar software development concepts, such as datatypes, prototypes, symbols, and variables.
Syntax Trees
- AddrSpace
  - A place within the reverse engineering model where data can be stored. The typical address spaces are ram, modeling the main databus of a processor, and register, modeling a processor's on-board registers. Data is stored a byte at a time at offsets within the AddrSpace.
- Address
  - An AddrSpace and an offset within that space together form the Address of the byte at that offset.
- Varnode
  - A contiguous set of bytes, given by an Address and a size, encoding a single value in the model. In terms of the SSA syntax tree, a Varnode is also a node in the tree.
- SeqNum
  - A sequence number: an extended address that distinguishes multiple PcodeOps associated with the same Address.
    - See Overview of SeqNum.
- PcodeOp
  - A single p-code operation. A single machine instruction is translated into (possibly several) operations in this Register Transfer Language.
    - See Overview of PcodeOp.
- BlockBasic
  - A basic block: a sequence of PcodeOps with a single path of execution.
    - See Overview of BlockBasic.
- Funcdata
  - The root object holding all information about a function, including the p-code syntax tree, prototype, and local symbol information.
    - See Overview of Funcdata.
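The following minimal C++ sketch makes the first three entries concrete. It uses simplified stand-ins, not the decompiler's real AddrSpace, Address and Varnode classes. It shows an Address as a (space, offset) pair and a Varnode as an Address plus a size. The helper sharesStorage is hypothetical. It only illustrates that two Varnodes in the same space can cover overlapping bytes, while Varnodes in different spaces never do.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical, stripped-down stand-ins for AddrSpace, Address and Varnode.
struct AddrSpace {
  std::string name;  // e.g. "ram" or "register"
};

struct Address {
  const AddrSpace *space;  // which space the byte lives in
  uint64_t offset;         // byte offset within that space
};

struct Varnode {
  Address addr;  // address of the first byte
  int size;      // number of contiguous bytes

  // Hypothetical helper: true if the two Varnodes share at least one byte.
  bool sharesStorage(const Varnode &other) const {
    if (addr.space != other.addr.space) return false;  // distinct spaces never overlap
    uint64_t end = addr.offset + size;
    uint64_t otherEnd = other.addr.offset + other.size;
    return addr.offset < otherEnd && other.addr.offset < end;
  }
};
```

For example, a 4-byte Varnode at ram:0x1000 shares storage with a 2-byte Varnode at ram:0x1002. It does not share storage with one at register:0x1000.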
Translation
Transformation
High-level Representation
Overview of SeqNum
A sequence number is a form of extended address for multiple p-code operations that may be associated with the same address. There is a normal Address field. There is a time field which is a static value, determined when an operation is created, that guarantees the uniqueness of the SeqNum. There is also an order field which preserves order information about operations within a basic block. This value may change if the syntax tree is manipulated.
uintm getTime();
uintm getOrder();
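Here is a minimal sketch of the three SeqNum fields. It assumes a plain integer in place of a real Address, a uint32_t stand-in for uintm, and a hypothetical setOrder mutator. Identity and ordering here use only the address and the time field. That is the point of the design: the order field can be renumbered when the syntax tree changes without altering which operation a SeqNum names. The exact comparison rules of the real class are not given in this section, so treat them as an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <tuple>

using uintm = uint32_t;  // assumption: stand-in for the decompiler's uintm typedef

class SeqNum {
  uint64_t addr;  // address of the originating instruction (simplified to an integer)
  uintm uniq;     // "time": fixed at creation, unique across all operations
  uintm order;    // position within a basic block; may be renumbered later
public:
  SeqNum(uint64_t a, uintm t, uintm o) : addr(a), uniq(t), order(o) {}
  uint64_t getAddr() const { return addr; }
  uintm getTime() const { return uniq; }
  uintm getOrder() const { return order; }
  void setOrder(uintm o) { order = o; }  // hypothetical: e.g. after ops move within a block

  // Identity uses address and time, never the mutable order field.
  bool operator==(const SeqNum &o) const { return addr == o.addr && uniq == o.uniq; }
  bool operator<(const SeqNum &o) const {
    return std::tie(addr, uniq) < std::tie(o.addr, o.uniq);
  }
};
```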
Overview of PcodeOp
A single operation in the p-code language. It has, at most, one Varnode output, and some number of Varnode inputs. The inputs are operated on depending on the opcode of the instruction, producing the output.
int4 numInput();
bool isDead();
bool isCall();
bool isBranch();
bool isBoolOutput();
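The sketch below shows the shape described above: one optional output, any number of inputs, and predicates derived from the opcode. It assumes a hypothetical six-value OpCode subset and a forward-declared Varnode. In the real class these properties may be stored as flags rather than computed on demand, so this is illustrative only.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical subset of p-code opcodes; the real OpCode enum is much larger.
enum class OpCode { COPY, INT_ADD, INT_EQUAL, BRANCH, CBRANCH, CALL };

struct Varnode;  // storage details omitted for this sketch

class PcodeOp {
  OpCode opc;
  Varnode *output;                // at most one output (nullptr if none)
  std::vector<Varnode *> inputs;  // any number of inputs
  bool dead = false;              // set once the op is removed from the tree
public:
  PcodeOp(OpCode c, Varnode *out, std::vector<Varnode *> in)
      : opc(c), output(out), inputs(std::move(in)) {}
  int numInput() const { return static_cast<int>(inputs.size()); }
  Varnode *getOut() const { return output; }
  bool isDead() const { return dead; }
  void markDead() { dead = true; }
  bool isCall() const { return opc == OpCode::CALL; }
  bool isBranch() const { return opc == OpCode::BRANCH || opc == OpCode::CBRANCH; }
  // Comparison ops (only INT_EQUAL in this subset) produce a boolean result.
  bool isBoolOutput() const { return opc == OpCode::INT_EQUAL; }
};
```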
OpCode: The op-code defining a specific p-code operation (PcodeOp).
Overview of BlockBasic
A sequence of PcodeOps with a single path of execution.
int4 sizeOut();
int4 sizeIn();
iterator beginOp();
iterator endOp();
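Here is a minimal sketch of a basic block. It assumes a placeholder PcodeOp and a hypothetical addOut helper for wiring edges. The ops sit in a list traversed with beginOp/endOp. sizeIn and sizeOut count the predecessor and successor blocks, since control flow only branches between blocks, never into the middle of one.

```cpp
#include <cassert>
#include <list>
#include <string>
#include <vector>

struct PcodeOp { std::string text; };  // placeholder op for the sketch

class BlockBasic {
  std::list<PcodeOp *> ops;            // straight-line ops, executed in order
  std::vector<BlockBasic *> inEdges;   // predecessor blocks
  std::vector<BlockBasic *> outEdges;  // successor blocks
public:
  using iterator = std::list<PcodeOp *>::iterator;
  void append(PcodeOp *op) { ops.push_back(op); }
  iterator beginOp() { return ops.begin(); }
  iterator endOp() { return ops.end(); }
  int sizeIn() const { return static_cast<int>(inEdges.size()); }
  int sizeOut() const { return static_cast<int>(outEdges.size()); }
  // Hypothetical helper: add a control-flow edge from this block to 'succ'.
  void addOut(BlockBasic *succ) {
    outEdges.push_back(succ);
    succ->inEdges.push_back(this);
  }
};
```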
Overview of Funcdata
This is a container for the syntax tree associated with a single function and all other function-specific data. It has an associated start address, function prototype, and local scope.
string & getName();
int4 numCalls();
iterator beginDef(uint4, Address &);
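Below is a hypothetical, much-reduced Funcdata. It assumes simplified PcodeOp and Varnode structs and keeps a name, an entry address, the p-code ops, and the Varnodes sorted by storage address. numCalls simply counts call ops. The helper defsAt is only loosely analogous to the beginDef iterator, and the flag argument of the real method is not modeled.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Simplified, hypothetical stand-ins for the real classes.
struct Varnode { uint64_t addr; int size; };
struct PcodeOp { bool call; };

class Funcdata {
  std::string name;
  uint64_t entry;                             // start address of the function
  std::vector<PcodeOp> ops;                   // the function's p-code
  std::multimap<uint64_t, Varnode> varnodes;  // Varnodes sorted by storage address
public:
  Funcdata(std::string n, uint64_t e) : name(std::move(n)), entry(e) {}
  const std::string &getName() const { return name; }
  uint64_t getEntry() const { return entry; }
  void addOp(PcodeOp op) { ops.push_back(op); }
  void addVarnode(Varnode vn) { varnodes.emplace(vn.addr, vn); }
  // Count the call operations in the function body.
  int numCalls() const {
    int n = 0;
    for (const PcodeOp &op : ops)
      if (op.call) ++n;
    return n;
  }
  // Hypothetical: all Varnodes stored at 'addr', as a [first, second) iterator range.
  auto defsAt(uint64_t addr) const { return varnodes.equal_range(addr); }
};
```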
LoadImage
Action
Rule
Translate
Decodes machine instructions and can produce p-code.
void printAssembly(ostream &,int4,Address &) const;
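To show what "decodes machine instructions and can produce p-code" means, here is a toy translator for an invented two-byte instruction set. Everything in it is hypothetical: the opcodes, PcodeText, decode, and this printAssembly signature. A real Translate is driven by a processor specification rather than a hard-coded switch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Invented instruction set, 2 bytes per instruction: opcode byte, register byte.
//   0x01 r  INC r  ->  r = INT_ADD r, #1
//   0x02 r  CLR r  ->  r = COPY #0
struct PcodeText {
  std::string op, out, in0, in1;  // textual p-code: out = op in0, in1
};

class ToyTranslate {
public:
  // Decode the instruction at 'offset' into p-code; 'length' receives its size.
  std::vector<PcodeText> decode(const std::vector<uint8_t> &bytes, std::size_t offset,
                                int &length) const {
    if (offset + 1 >= bytes.size()) throw std::out_of_range("truncated instruction");
    std::string reg = "r" + std::to_string(bytes[offset + 1]);
    length = 2;  // every instruction in this toy set is two bytes long
    switch (bytes[offset]) {
      case 0x01: return {{"INT_ADD", reg, reg, "#1"}};
      case 0x02: return {{"COPY", reg, "#0", ""}};
    }
    throw std::runtime_error("unknown opcode");
  }

  // Write the assembly form of the instruction at 'offset' to 's'.
  void printAssembly(std::ostream &s, const std::vector<uint8_t> &bytes,
                     std::size_t offset) const {
    if (offset + 1 >= bytes.size()) throw std::out_of_range("truncated instruction");
    const char *mnem = bytes[offset] == 0x01 ? "INC" : bytes[offset] == 0x02 ? "CLR" : "???";
    s << mnem << " r" << static_cast<int>(bytes[offset + 1]);
  }
};
```

For the bytes 01 03 02 05, decoding at offset 0 yields one INT_ADD op writing r3. Printing at offset 2 writes "CLR r5".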
Datatype
Many objects have an associated Datatype, including Varnodes, Symbols, and FuncProtos. A Datatype is built to resemble the type systems of common high-level languages like C or Java.
string & getName();
int4 getSize();
There are base types (in varying sizes), as returned by getMetatype. They are the values of the type_metatype enumeration:
- TYPE_VOID: Standard "void" type, absence of type.
- TYPE_UNKNOWN: An unknown low-level type. Treated as an unsigned integer.
- TYPE_INT: Signed integer. Signed is considered less specific than unsigned in C.
- TYPE_UINT: Unsigned integer.
- TYPE_BOOL: Boolean.
- TYPE_CODE: Data is actual executable code.
- TYPE_FLOAT: Floating-point.
Then these can be used to build compound types, with pointer, array, and structure qualifiers:
- TypePointer: Datatype object representing a pointer.
- TypeArray: Datatype object representing an array of elements. getBase() returns the element data-type.
- TypeStruct: A composite Datatype object, a "structure" with component "fields". getField(int4 off, int4 sz, int4 *newoff) gets the field based on offset.
- TypeField: Specifies subfields of a structure or what a pointer points to.
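Here is a minimal sketch of this type system. It uses simplified versions of Datatype, TypePointer, TypeArray, TypeStruct and TypeField. Base types carry a metatype and a size, and compound types wrap another Datatype. The enum values and their order are an assumption for illustration. getField does a linear scan over fields assumed sorted by offset. It returns the field that fully contains the requested byte range and writes the offset within that field to newoff. The real lookup strategy may differ.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Assumed subset of metatypes, including the three compound kinds.
enum type_metatype { TYPE_VOID, TYPE_UNKNOWN, TYPE_INT, TYPE_UINT, TYPE_BOOL, TYPE_CODE,
                     TYPE_FLOAT, TYPE_PTR, TYPE_ARRAY, TYPE_STRUCT };

struct Datatype {
  std::string name;
  int size;  // in bytes
  type_metatype meta;
  Datatype(std::string n, int s, type_metatype m) : name(std::move(n)), size(s), meta(m) {}
  virtual ~Datatype() = default;
};

struct TypePointer : Datatype {
  const Datatype *ptrto;  // what the pointer points to
  TypePointer(int s, const Datatype *to) : Datatype(to->name + " *", s, TYPE_PTR), ptrto(to) {}
};

struct TypeArray : Datatype {
  const Datatype *arrayof;  // element data-type
  TypeArray(int n, const Datatype *elem)
      : Datatype(elem->name + "[" + std::to_string(n) + "]", n * elem->size, TYPE_ARRAY),
        arrayof(elem) {}
  const Datatype *getBase() const { return arrayof; }
};

struct TypeField { int offset; std::string name; const Datatype *type; };

struct TypeStruct : Datatype {
  std::vector<TypeField> fields;  // assumed sorted by offset
  TypeStruct(std::string n, int s, std::vector<TypeField> f)
      : Datatype(std::move(n), s, TYPE_STRUCT), fields(std::move(f)) {}
  // Return the field containing bytes [off, off+sz), or nullptr if none does.
  const TypeField *getField(int off, int sz, int *newoff) const {
    for (const TypeField &f : fields) {
      if (off >= f.offset && off + sz <= f.offset + f.type->size) {
        *newoff = off - f.offset;  // offset relative to the start of the field
        return &f;
      }
    }
    return nullptr;
  }
};
```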
TypeFactory
This is a container for Datatypes.
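The text only says that TypeFactory is a container. This sketch assumes one common design for such a container: it owns every Datatype and hands back the same object for identical requests, so types can be compared by pointer. baseType and pointerTo are hypothetical names. Keying on (name, size) assumes type names are unique per size.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

struct Datatype {
  std::string name;
  int size;
  const Datatype *ptrto;  // non-null only for pointer types
};

class TypeFactory {
  // Owns every Datatype; keyed on (name, size), assuming names are unique per size.
  std::map<std::pair<std::string, int>, std::unique_ptr<Datatype>> cache;

  Datatype *intern(const std::string &name, int size, const Datatype *to) {
    auto key = std::make_pair(name, size);
    auto it = cache.find(key);
    if (it == cache.end())  // first request: create and store the type
      it = cache.emplace(key, std::make_unique<Datatype>(Datatype{name, size, to})).first;
    return it->second.get();  // later requests reuse the stored object
  }

public:
  Datatype *baseType(const std::string &name, int size) { return intern(name, size, nullptr); }
  Datatype *pointerTo(int ptrSize, const Datatype *to) {
    return intern(to->name + " *", ptrSize, to);
  }
};
```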
HighVariable
A single high-level variable can move in and out of various memory locations and registers during the course of its lifetime. A HighVariable encapsulates this concept. It is a collection of (low-level) Varnodes, all of which are used to store data for one high-level variable.
int4 numInstances();
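Here is a minimal sketch with hypothetical Varnode fields and helper names. One HighVariable collects every Varnode that stores the same source-level variable. In the usage below, a counter starts in a register, is spilled to the stack, and comes back in a different register.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct Varnode {
  std::string space;  // e.g. "register" or "stack"
  long offset;
  int size;
};

class HighVariable {
  std::vector<const Varnode *> inst;  // every Varnode that stores this variable
public:
  void addInstance(const Varnode *vn) { inst.push_back(vn); }
  int numInstances() const { return static_cast<int>(inst.size()); }
  const Varnode *getInstance(int i) const { return inst.at(static_cast<std::size_t>(i)); }
  // Hypothetical helper: true if the variable ever lives in the given space.
  bool usesSpace(const std::string &sp) const {
    for (const Varnode *vn : inst)
      if (vn->space == sp) return true;
    return false;
  }
};
```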
FuncProto
FuncCallSpecs
A class for analyzing parameters to a sub-function call.
Symbol
A particular symbol used for describing memory in the model. This behaves like a normal (high-level language) symbol. It lives in a scope, has a name, and has a Datatype.
string & getName();
SymbolEntry
This associates a memory location with a particular symbol, i.e. it maps the symbol to memory. It is, in theory, possible to have more than one SymbolEntry associated with a Symbol.
int4 getSize();
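The following hypothetical sketch shows a Symbol with a name, a Datatype, and a list of storage entries. In the usage below, a global has both a ram location and a register copy. addEntry and findEntry are illustrative names, not the decompiler's API.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

struct Datatype { std::string name; int size; };

// Hypothetical: one storage location (a byte range in a space) for a Symbol.
struct SymbolEntry {
  std::string space;
  uint64_t offset;
  int size;
  int getSize() const { return size; }
};

class Symbol {
  std::string name;
  const Datatype *type;
  std::vector<SymbolEntry> entries;  // usually one, but more are possible
public:
  Symbol(std::string n, const Datatype *t) : name(std::move(n)), type(t) {}
  const std::string &getName() const { return name; }
  const Datatype *getType() const { return type; }
  void addEntry(const SymbolEntry &e) { entries.push_back(e); }
  int numEntries() const { return static_cast<int>(entries.size()); }
  // Return the entry whose storage contains the given byte, or nullptr.
  const SymbolEntry *findEntry(const std::string &sp, uint64_t off) const {
    for (const SymbolEntry &e : entries)
      if (e.space == sp && off >= e.offset && off < e.offset + e.size) return &e;
    return nullptr;
  }
};
```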
Scope
This is a container for symbols.
Symbol * findByName(string &);
string & getName();
Database
This is the container for Scopes.
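Here is a hypothetical sketch of the Scope and Database relationship. The database owns every scope, each scope owns its symbols, and a function scope points at its enclosing global scope. findByName searches only the local scope, while the extra resolve helper walks outward through parent scopes. Both the helper and the parent-link design are assumptions for illustration, and scope names are assumed unique.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

struct Symbol { std::string name; };

class Scope {
  std::string name;
  Scope *parent;  // enclosing scope, or nullptr for the global scope
  std::map<std::string, std::unique_ptr<Symbol>> symbols;
public:
  Scope(std::string n, Scope *p) : name(std::move(n)), parent(p) {}
  const std::string &getName() const { return name; }
  Symbol *addSymbol(const std::string &nm) {
    auto &slot = symbols[nm];
    if (!slot) slot = std::make_unique<Symbol>(Symbol{nm});
    return slot.get();
  }
  // Look only in this scope.
  Symbol *findByName(const std::string &nm) const {
    auto it = symbols.find(nm);
    return it == symbols.end() ? nullptr : it->second.get();
  }
  // Hypothetical: look here first, then in each enclosing scope.
  Symbol *resolve(const std::string &nm) const {
    for (const Scope *s = this; s != nullptr; s = s->parent)
      if (Symbol *sym = s->findByName(nm)) return sym;
    return nullptr;
  }
};

class Database {
  std::map<std::string, std::unique_ptr<Scope>> scopes;  // owns every scope
  Scope *global;
public:
  Database() {
    auto g = std::make_unique<Scope>("global", nullptr);
    global = g.get();
    scopes["global"] = std::move(g);
  }
  Scope *getGlobalScope() const { return global; }
  Scope *addScope(const std::string &nm, Scope *parent) {
    auto s = std::make_unique<Scope>(nm, parent);
    Scope *raw = s.get();
    scopes[nm] = std::move(s);
    return raw;
  }
};
```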
Architecture
This is the repository for all information about a particular processor and executable. It holds the symbol table, the processor translator, the load image, the type database, and the transform engine.
Its main fields, one per subsystem named above:
- Database * symboltab: Memory map of global variables and functions.
- TypeFactory * types: List of types for this binary.
- const Translate * translate: Translation method for this binary.
- LoadImage * loader: Method for loading portions of binary.
- ActionDatabase allacts: Actions that can be applied in this architecture.