Version: 4.1.4
7 Decompiling Bytecode
The --decompile mode for mzc takes a bytecode file (which
usually has the file extension ".zo") and converts it back
to an approximation of Scheme code. Decompiled bytecode is mostly
useful for checking the compiler’s transformation and optimization of
the source program.
Many forms in the decompiled code, such as module,
define, and lambda, have the same meanings as
always. Other forms and transformations are specific to the rendering
of bytecode, and they reflect a specific execution model:
Top-level variables, variables defined within the module, and
variables imported from other modules are prefixed with _,
which helps expose the difference between uses of local variables
versus other variables. Variables imported from other modules,
moreover, have a suffix that indicates the source module.
Non-local variables are always accessed indirectly through an implicit
#%globals or #%modvars variable that
resides on the value stack (which otherwise contains local
variables). Variable accesses are further wrapped with
#%checked when the compiler cannot prove that the
variable will be defined before the access.
Uses of core primitives are shown without a leading _, and
they are never wrapped with #%checked.
Local-variable access may be wrapped with
#%sfs-clear, which indicates that the variable-stack
location holding the variable will be cleared to prevent the
variable’s value from being retained by the garbage collector.
Mutable variables are converted to explicitly boxed values using
#%box, #%unbox, and
#%set-boxes! (which works on multiple boxes at once).
A set!-rec-values operation constructs
mutually-recursive closures and simultaneously updates the
corresponding variable-stack locations that bind the closures. A
set!, set!-values, or
set!-rec-values form is always used on a local
variable before it is captured by a closure; that ordering reflects
how closures capture values in variable-stack locations, as opposed
to stack locations.
In a lambda form, if the procedure produced by the
lambda has a name (accessible via object-name)
and/or source-location information, then it is shown as a quoted
constant at the start of the procedure’s body. Afterward, if the
lambda form captures any bindings from its context, those
bindings are also shown in a quoted constant. Neither constant
corresponds to a computation when the closure is called, though the
list of captured bindings corresponds to a closure allocation when
the lambda form itself is evaluated.
A lambda form that closes over no bindings is wrapped with
#%closed plus an identifier that is bound to the
closure. The binding’s scope covers the entire decompiled output, and
it may be referenced directly in other parts of the program; the
binding corresponds to a constant closure value that is shared, and
it may even contain cyclic references to itself or other constant
closures.
Some applications of core primitives are annotated with
#%in, which indicates that the JIT compiler will
inline the operation. (Inlining information is not part of the
bytecode, but is instead based on an enumeration of primitives that
the JIT is known to handle specially.)
A form (#%apply-values proc expr) is equivalent to
(call-with-values (lambda () expr) proc), but the run-time
system avoids allocating a closure for expr.
A #%decode-syntax form corresponds to a syntax
object. Future improvements to the decompiler will convert such
syntax objects to a readable form.
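As an illustration of the conventions listed above, the following small module is a sketch of source code whose decompiled form could be expected to exhibit several of them. The names make-counter, pair-up, and counter are made up for this example. The exact decompiled output depends on the compiler version and on the optimizations it applies, so none is reproduced here.

```scheme
#lang scheme/base

;; `n` is mutated with set! and captured by a closure, so the decompiled
;; code would be expected to box it explicitly (#%box, #%unbox, ...).
(define (make-counter)
  (let ([n 0])
    (lambda ()
      (set! n (+ n 1))
      n)))

;; A call-with-values whose producer is an immediate thunk; the text above
;; describes #%apply-values as the equivalent form at the bytecode level.
(define (pair-up a b)
  (call-with-values (lambda () (values a b)) cons))

;; Module-level definitions such as these are the variables that would be
;; shown with a `_` prefix; `+`, `cons`, and `values` are core primitives.
(define counter (make-counter))
(counter)
(pair-up 1 2)
```

Compiling this module and running mzc in --decompile mode on the resulting ".zo" file is one way to compare the rendering against the descriptions above.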
7.1 Scheme API for Decompiling
Consumes the result of parsing bytecode and returns an S-expression
(as described above) that represents the compiled code.
7.2 Scheme API for Parsing Bytecode
Parses a port (typically the result of opening a ".zo" file)
containing bytecode. The parsed bytecode is returned in a
compilation-top structure.
Beware that the structure types used to represent the bytecode are
subject to frequent changes across PLT Scheme versions.
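The two APIs can be combined to go from a ".zo" file to an S-expression. The sketch below assumes that the parser and decompiler are exported as zo-parse and decompile from the compiler/zo-parse and compiler/decompile libraries; those names are not spelled out in the text above, so check them against the installed version. The helper decompile-file and the example path are hypothetical.

```scheme
#lang scheme/base
(require compiler/zo-parse
         compiler/decompile)

;; Hypothetical helper: parse the bytecode stored in `path` and
;; return its decompiled S-expression.
(define (decompile-file path)
  (call-with-input-file path
    (lambda (in)
      ;; zo-parse is assumed to return a compilation-top structure,
      ;; which is what decompile is assumed to consume.
      (decompile (zo-parse in)))))

;; Example use (the path is made up; point it at a real ".zo" file):
;; (decompile-file "compiled/example_ss.zo")
```

Because the structure types change between versions, code that inspects the parsed result directly, rather than passing it straight to the decompiler, is likely to need updating on each upgrade.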
7.2.1 Prefix
Wraps compiled code. The max-let-depth field indicates the
maximum stack depth that code creates (not counting the
prefix array). The prefix field describes top-level
variables, module-level variables, and quoted syntax-objects accessed
by code. The code field contains executable code; it
is normally a form, but a literal value is represented as
itself.
Represents a “prefix” that is pushed onto the stack to initiate
evaluation. The prefix is an array, where buckets holding the values
for toplevels are first, then a bucket for another array if
stxs is non-empty, then num-lifts extra buckets for
lifted local procedures.
In toplevels, each element is one of the following:
a #f, which indicates a dummy variable that is used to
access the enclosing module/namespace at run time;
a symbol, which is a reference to a variable defined in the
enclosing module;
a global-bucket, which is a top-level variable
(appears only outside of modules); or
a module-variable, which indicates a variable imported
from another module.
The variable buckets and syntax objects that are recorded in a prefix
are accessed by toplevel and topsyntax expression
forms.
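The prefix layout described above can be sketched as a small calculation. This is only a model for reading the description, not the run-time system's code; prefix-length and first-lift-index are hypothetical helper names, and toplevels and stxs are represented as plain lists.

```scheme
#lang scheme/base

;; Total number of buckets in the prefix array.
(define (prefix-length toplevels stxs num-lifts)
  (+ (length toplevels)
     (if (null? stxs) 0 1)   ; one bucket holding the syntax-object array
     num-lifts))

;; Index of the first bucket reserved for lifted local procedures.
(define (first-lift-index toplevels stxs)
  (+ (length toplevels)
     (if (null? stxs) 0 1)))

(prefix-length '(#f x y) '() 0)        ; 3 buckets: toplevels only
(prefix-length '(#f x y) '(stx-a) 2)   ; 3 + 1 + 2 = 6 buckets
```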
Represents a top-level variable (a global-bucket), and is used only in a prefix.
Represents a variable imported from another module (a module-variable), and is used only in a prefix.
The pos may record the variable’s offset within its module,
or it can be -1 if the variable is always located by name.
The phase indicates the phase level of the definition within
its module.
(struct stx (encoded) #:transparent)
  encoded : wrapped?
Wraps a syntax object in a prefix.
7.2.2 Forms
(struct form () #:transparent)
A supertype for all forms that can appear in compiled code (including
exprs), except for literals that are represented as
themselves and indirect structures to create cycles.
Represents a define-values form. Each element of ids
will reference via the prefix either a top-level variable or a local
module variable.
After rhs is evaluated, the stack is restored to its depth
from before evaluating rhs.
Represents a define-syntaxes or
define-values-for-syntax form. The rhs expression
has its own prefix, which is pushed before evaluating
rhs; the stack is restored after obtaining the result
values. The max-let-depth field indicates the maximum size of
the stack that will be created by rhs (not counting
prefix).
Represents a top-level require form (but not one in a
module form). The dummy variable is used to access
the top-level namespace.
Represents a module declaration. The body forms use
prefix, rather than any prefix in place for the module
declaration itself (and each syntax-body has its own
prefix). The body field contains the module’s run-time code,
and syntax-body contains the module’s compile-time code. The
max-let-depth field indicates the maximum stack depth created
by body forms (not counting the prefix array).
After each form in body is evaluated, the stack is restored
to its depth from before evaluating the form.
Represents a begin form, either as an expression or at the
top level (though the latter is more commonly a splice form).
When a seq appears in an expression position, its
forms are expressions.
After each form in forms is evaluated, the stack is restored
to its depth from before evaluating the form.
Represents a top-level begin form where each evaluation is
wrapped with a continuation prompt.
After each form in forms is evaluated, the stack is restored
to its depth from before evaluating the form.
7.2.3 Expressions
A supertype for all expression forms that can appear in compiled code,
except for literals that are represented as themselves,
indirect structures to create cycles, and some seq
structures (which can appear as expressions as long as they contain
only other things that can be expressions).
Represents a lambda form. The name field is a name
for debugging purposes. The num-params field indicates the
number of arguments accepted by the procedure, not counting a rest
argument; the rest? field indicates whether extra arguments
are accepted and collected into a “rest” variable. The
closure-map field is a vector of stack positions that are
captured when evaluating the lambda form to create a closure.
When the function is called, the rest-argument list (if any) is pushed
onto the stack, then the normal arguments in reverse order, then the
closure-captured values in reverse order. Thus, when body is
run, the first value on the stack is the first value captured by the
closure-map array, and so on.
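The push order can be modeled with a list whose first element is the top of the stack. This is a sketch for reading the description only; push-in-reverse and frame-at-body-entry are hypothetical names, and passing #f as rest-list stands for a procedure without a rest argument.

```scheme
#lang scheme/base

;; Push `vals` onto `stack` in reverse order, so that the first
;; element of `vals` ends up on top.
(define (push-in-reverse vals stack)
  (let loop ([vs (reverse vals)] [stack stack])
    (if (null? vs)
        stack
        (loop (cdr vs) (cons (car vs) stack)))))

;; Stack contents (top first) at the point where the body starts running.
(define (frame-at-body-entry captured args rest-list)
  (let* ([s '()]
         [s (if rest-list (cons rest-list s) s)]   ; rest list is pushed first
         [s (push-in-reverse args s)]              ; then the normal arguments
         [s (push-in-reverse captured s)])         ; then the captured values
    s))

(frame-at-body-entry '(c0 c1) '(a0 a1) '(r0 r1))
;; the resulting list is: c0 c1 a0 a1 (r0 r1)
```

In this model the first captured value sits at position 0, which matches the statement that it is the first value on the stack when the body runs.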
The max-let-depth field indicates the maximum stack depth
created by body (not including arguments and closure-captured
values pushed onto the stack). The body field is the
expression for the closure’s body.
A lambda form with an empty closure, which is a procedure
constant. The procedure constant can appear multiple times in the
graph of expressions for bytecode, and the code field can
refer back to the same closure through an indirect
for a recursive constant procedure; the gen-id is different
for each such constant.
An indirection used in expression positions to form cycles.
Represents a case-lambda form as a combination of
lambda forms that are tried (in order) based on the number of
arguments given.
Pushes an uninitialized slot onto the stack, evaluates rhs
and puts its value into the slot, and then runs body.
After rhs is evaluated, the stack is restored to its depth
from before evaluating rhs. Note that the new slot is created
before evaluating rhs.
Pushes count uninitialized slots onto the stack and then runs
body. If boxes? is #t, then the slots are
filled with boxes that contain #<undefined>.
Runs rhs to obtain count results, and installs them
into existing slots on the stack in order, skipping the first
pos stack positions. If boxes? is #t, then
the values are put into existing boxes in the stack slots.
After rhs is evaluated, the stack is restored to its depth
from before evaluating rhs.
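The installation step can be sketched with the stack modeled as a mutable vector in which index 0 is the top of the stack. The helper install-values! is a hypothetical name, and vals stands for the list of count results produced by rhs.

```scheme
#lang scheme/base

;; Store `vals` into consecutive stack slots, starting after the
;; first `pos` positions. With boxes? set, the slots are assumed to
;; already hold boxes, and the values go into those boxes instead.
(define (install-values! stack pos vals boxes?)
  (let loop ([i pos] [vals vals])
    (unless (null? vals)
      (if boxes?
          (set-box! (vector-ref stack i) (car vals))
          (vector-set! stack i (car vals)))
      (loop (+ i 1) (cdr vals)))))

(define plain (vector 'top #f #f 'other))
(install-values! plain 1 '(10 20) #f)   ; slots 1 and 2 now hold 10 and 20

(define boxed (vector (box #f) (box #f)))
(install-values! boxed 0 '(a b) #t)     ; the two boxes now hold a and b
```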
Represents a letrec form with lambda bindings. It
allocates a closure shell for each lambda form in
procs, pushes them onto the stack in reverse order, fills out
each shell’s closure using the created shells, and then evaluates
body.
Skips pos elements of the stack, setting the slot afterward
to a new box containing the slot’s old value, and then runs
body. This form appears when a lambda argument is
mutated using set! within its body; calling the function
initially pushes the value directly on the stack, and this form boxes
the value so that it can be mutated later.
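The boxing step itself is small. The following sketch models it with the stack as a mutable vector whose index 0 is the top; box-slot! is a hypothetical name used only for this illustration.

```scheme
#lang scheme/base

;; Replace the value in the slot after the first `pos` slots with a
;; new box that contains the slot's old value.
(define (box-slot! stack pos)
  (vector-set! stack pos (box (vector-ref stack pos))))

(define stack (vector 'a 5 'b))
(box-slot! stack 1)
(unbox (vector-ref stack 1))       ; the old value, 5, is now inside a box
(set-box! (vector-ref stack 1) 6)  ; later mutation goes through the box
```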
Represents a local-variable reference; it accesses the value in the
stack slot after the first pos slots. If unbox? is
#t, the stack slot contains a box, and a value is extracted
from the box. If clear? is #t, then after the value
is obtained, the stack slot is cleared (to avoid retaining a reference
that can prevent reclamation of the value as garbage).
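A local-variable reference as described above can be modeled as follows, with the stack as a mutable vector whose index 0 is the top. The name local-ref is hypothetical, and using #f as the cleared value is an assumption of this sketch, not a statement about what the run-time system stores.

```scheme
#lang scheme/base

;; Read the slot after the first `pos` slots, unboxing if requested,
;; and clear the slot afterward if requested.
(define (local-ref stack pos unbox? clear?)
  (let* ([slot (vector-ref stack pos)]
         [val (if unbox? (unbox slot) slot)])
    (when clear?
      (vector-set! stack pos #f))   ; assumed placeholder for "cleared"
    val))

(define stack (vector 'x (box 7) 'y))
(local-ref stack 1 #t #f)   ; reads 7 out of the box in slot 1
(local-ref stack 0 #f #t)   ; reads x, then clears slot 0
```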
(struct (toplevel expr) (depth pos const? ready?) #:transparent)
  depth : nonnegative-exact-integer?
  pos : nonnegative-exact-integer?
  const? : boolean?
  ready? : boolean?
Represents a reference to a top-level or imported variable via the
prefix array. The depth field indicates the number
of stack slots to skip to reach the prefix array, and pos is
the offset into the array.
If const? is #t, then the variable definitely will
be defined, and its value stays constant. If ready? is
#t, then the variable definitely will be defined (but its
value might change in the future). If const? and
ready? are both #f, then a check is needed to
determine whether the variable is defined.
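The addressing and the check rule can be sketched together. In this model the stack is a vector with index 0 at the top and the prefix array is itself a vector stored in one stack slot; toplevel-ref and needs-defined-check? are hypothetical names for illustration.

```scheme
#lang scheme/base

;; Skip `depth` stack slots to reach the prefix array, then take the
;; bucket at offset `pos` within that array.
(define (toplevel-ref stack depth pos)
  (vector-ref (vector-ref stack depth) pos))

;; A definedness check is needed only when neither flag is set.
(define (needs-defined-check? const? ready?)
  (not (or const? ready?)))

(define stack (vector 'local0 'local1 (vector 'bucket0 'bucket1)))
(toplevel-ref stack 2 1)          ; reaches bucket1
(needs-defined-check? #f #f)      ; #t: the access must be checked
(needs-defined-check? #f #t)      ; #f: defined, though the value may change
```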
(struct (topsyntax expr) (depth pos midpt) #:transparent)
  depth : nonnegative-exact-integer?
  pos : nonnegative-exact-integer?
  midpt : nonnegative-exact-integer?
Represents a reference to a quoted syntax object via the
prefix array. The depth field indicates the number
of stack slots to skip to reach the prefix array, and pos is
the offset into the array. The midpt value is used internally
for lazy calculation of syntax information.
Represents a function call. The rator field is the expression
for the function, and rands are the argument
expressions. Before any of the expressions are evaluated,
(length rands) uninitialized stack slots are created (to be
used as temporary space).
Represents an if form.
After test is evaluated, the stack is restored to its depth
from before evaluating test.
Represents a with-continuation-mark expression.
After each of key and val is evaluated, the stack is
restored to its depth from before evaluating key or
val.
Represents a begin0 expression.
After each expression in seq is evaluated, the stack is
restored to its depth from before evaluating the expression.
Represents a #%variable-reference form.
Represents a set! expression that assigns to a top-level or
module-level variable. (Assignments to local variables are represented
by install-value expressions.)
After rhs is evaluated, the stack is restored to its depth
from before evaluating rhs.
Represents (call-with-values (lambda () args-expr) proc),
which is handled specially by the run-time system.
Represents a direct reference to a variable imported from the run-time
kernel.
7.2.4 Syntax Objects
(struct wrapped (datum wraps certs) #:transparent)
  datum : any/c
  wraps : (listof wrap?)
  certs : list?
Represents a syntax object, where wraps contains the lexical
information and certs is certificate information. When the
datum part is itself compound, its pieces are wrapped, too.
(struct wrap () #:transparent)
A supertype for lexical-information elements.
A local-binding mapping from symbols to binding-set names.
Shifts module bindings later in the wrap set.
(struct (module-rename wrap)
        (phase kind set-id unmarshals renames mark-renames plus-kern?)
        #:transparent)
  phase : exact-integer?
  kind : (or/c 'marked 'normal)
  set-id : any/c
  unmarshals : (listof make-all-from-module?)
  renames : (listof module-binding?)
  mark-renames : any/c
  plus-kern? : boolean?
Represents a set of module and import bindings.
Represents a set of simple imports from one module within a
module-rename.
Represents a single identifier import (i.e., the general case) within
a module-rename.