7 Decompiling Bytecode

Version: 4.1.4

The --decompile mode for mzc takes a bytecode file (which usually has the file extension ".zo") and converts it back to an approximation of Scheme code. Decompiled bytecode is mostly useful for checking the compiler’s transformation and optimization of the source program.

Many forms in the decompiled code, such as module, define, and lambda, have the same meanings as always. Other forms and transformations are specific to the rendering of bytecode, and they reflect a specific execution model:

Top-level variables, variables defined within the module, and variables imported from other modules are prefixed with _, which helps expose the difference between uses of local variables versus other variables. Variables imported from other modules, moreover, have a suffix that indicates the source module.
Non-local variables are always accessed indirectly through an implicit #%globals or #%modvars variable that resides on the value stack (which otherwise contains local variables). Variable accesses are further wrapped with #%checked when the compiler cannot prove that the variable will be defined before the access.
Uses of core primitives are shown without a leading _, and they are never wrapped with #%checked.
Local-variable access may be wrapped with #%sfs-clear, which indicates that the variable-stack location holding the variable will be cleared to prevent the variable’s value from being retained by the garbage collector.
Mutable variables are converted to explicitly boxed values using #%box, #%unbox, and #%set-boxes! (which works on multiple boxes at once). A set!-rec-values operation constructs mutually-recursive closures and simultaneously updates the corresponding variable-stack locations that bind the closures. A set!, set!-values, or set!-rec-values form is always used on a local variable before it is captured by a closure; that ordering reflects how closures capture values in variable-stack locations, as opposed to stack locations.
In a lambda form, if the procedure produced by the lambda has a name (accessible via object-name) and/or source-location information, then it is shown as a quoted constant at the start of the procedure’s body. Afterward, if the lambda form captures any bindings from its context, those bindings are also shown in a quoted constant. Neither constant corresponds to a computation when the closure is called, though the list of captured bindings corresponds to a closure allocation when the lambda form itself is evaluated.
A lambda form that closes over no bindings is wrapped with #%closed plus an identifier that is bound to the closure. The binding’s scope covers the entire decompiled output, and it may be referenced directly in other parts of the program; the binding corresponds to a constant closure value that is shared, and it may even contain cyclic references to itself or other constant closures.
Some applications of core primitives are annotated with #%in, which indicates that the JIT compiler will inline the operation. (Inlining information is not part of the bytecode, but is instead based on an enumeration of primitives that the JIT is known to handle specially.)
A form (#%apply-values proc expr) is equivalent to (call-with-values (lambda () expr) proc), but the run-time system avoids allocating a closure for expr.
A #%decode-syntax form corresponds to a syntax object. Future improvements to the decompiler will convert such syntax objects to a readable form.
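
The equivalence stated above for #%apply-values can be checked at the source level. The following is a minimal sketch in plain Scheme; the names two-values and sum-pair are hypothetical, and the example shows only the call-with-values side of the equivalence, not actual decompiler output.

```scheme
#lang scheme

;; Hypothetical helpers, for illustration only.
(define (two-values) (values 1 2))   ; plays the role of `expr`
(define (sum-pair a b) (+ a b))      ; plays the role of `proc`

;; The source-level expression that (#%apply-values sum-pair (two-values))
;; is described as being equivalent to.
(define result
  (call-with-values (lambda () (two-values)) sum-pair))

result
```

Here result is 3, since both values produced by two-values are passed as arguments to sum-pair.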

7.1 Scheme API for Decompiling

(require compiler/decompile)

(decompile top) → any/c
top : compilation-top?

Consumes the result of parsing bytecode and returns an S-expression (as described above) that represents the compiled code.
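
As a minimal sketch of how decompile combines with zo-parse (described in the next section), the following helper parses a bytecode file and decompiles the result. The name decompile-file and the example path are assumptions made for illustration; they are not part of either library.

```scheme
#lang scheme
(require compiler/zo-parse
         compiler/decompile)

;; Hypothetical helper: parse the bytecode file at `path`, then
;; convert the parsed structure to an S-expression.
(define (decompile-file path)
  (let ([top (call-with-input-file path zo-parse)])  ; top : compilation-top?
    (decompile top)))

;; Example use (the path is an assumption):
;; (decompile-file "compiled/example_ss.zo")
```

Because the parsed structure types can change between versions, a sketch like this should be expected to work only with the matching version of the libraries.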

7.2 Scheme API for Parsing Bytecode

(require compiler/zo-parse)

(zo-parse in) → compilation-top?
in : input-port?

Parses a port (typically the result of opening a ".zo" file) containing bytecode. The parsed bytecode is returned in a compilation-top structure.

Beware that the structure types used to represent the bytecode are subject to frequent changes across PLT Scheme versions.

7.2.1 Prefix

(struct compilation-top (max-let-depth prefix code)
#:transparent)
  max-let-depth : exact-nonnegative-integer?
  prefix : prefix?
  code : (or/c form? indirect? any/c)

Wraps compiled code. The max-let-depth field indicates the maximum stack depth that code creates (not counting the prefix array). The prefix field describes top-level variables, module-level variables, and quoted syntax-objects accessed by code. The code field contains executable code; it is normally a form, but a literal value is represented as itself.

(struct prefix (num-lifts toplevels stxs)
#:transparent)
  num-lifts : exact-nonnegative-integer?
  toplevels : (listof (or/c #f symbol? global-bucket? module-variable?))
  stxs : (listof stx?)

Represents a “prefix” that is pushed onto the stack to initiate evaluation. The prefix is an array, where buckets holding the values for toplevels are first, then a bucket for another array if stxs is non-empty, then num-lifts extra buckets for lifted local procedures.

In toplevels, each element is one of the following:

a #f, which indicates a dummy variable that is used to access the enclosing module/namespace at run time;
a symbol, which is a reference to a variable defined in the enclosing module;
a global-bucket, which is a top-level variable (appears only outside of modules); or
a module-variable, which indicates a variable imported from another module.

The variable buckets and syntax objects that are recorded in a prefix are accessed by toplevel and topsyntax expression forms.

(struct global-bucket (name)
#:transparent)
name : symbol?

Represents a top-level variable; used only in a prefix.

(struct module-variable (modidx sym pos phase)
#:transparent)
  modidx : module-path-index?
  sym : symbol?
  pos : exact-integer?
  phase : (or/c 0 1)

Represents a variable imported from another module; used only in a prefix. The pos field may record the variable’s offset within its module, or it can be -1 if the variable is always located by name. The phase field indicates the phase level of the definition within its module.

(struct stx (encoded)
#:transparent)
encoded : wrapped?

Wraps a syntax object in a prefix.
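
The prefix layout and the four kinds of toplevels elements described above can be sketched as follows. The helper names (describe-toplevel, prefix-bucket-count, describe-prefix) are hypothetical; only the struct predicates and accessors come from the struct definitions in this section, and the code assumes the struct shapes documented for this version.

```scheme
#lang scheme
(require compiler/zo-parse)

;; Hypothetical helper: classify one element of a prefix's `toplevels`
;; list, following the four cases listed above.
(define (describe-toplevel t)
  (cond
    [(not t) 'dummy]
    [(symbol? t) (list 'module-defined t)]
    [(global-bucket? t) (list 'top-level (global-bucket-name t))]
    [(module-variable? t)
     (list 'imported (module-variable-sym t) (module-variable-phase t))]
    [else (list 'unknown t)]))

;; Hypothetical helper: number of buckets in the prefix array, counting
;; the toplevels first, one extra bucket when stxs is non-empty, and
;; then the buckets for lifted local procedures.
(define (prefix-bucket-count pfx)
  (+ (length (prefix-toplevels pfx))
     (if (null? (prefix-stxs pfx)) 0 1)
     (prefix-num-lifts pfx)))

;; Hypothetical helper: describe every toplevel of a parsed file.
(define (describe-prefix top)
  (map describe-toplevel
       (prefix-toplevels (compilation-top-prefix top))))
```

Applying describe-prefix to the result of zo-parse gives a quick overview of which variables a compiled file defines and which it imports.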

7.2.2 Forms

(struct form ()
#:transparent)

A supertype for all forms that can appear in compiled code (including exprs), except for literals that are represented as themselves and indirect structures to create cycles.

(struct (def-values form) (ids rhs)
#:transparent)
ids : (listof toplevel?)
rhs : (or/c expr? seq? indirect? any/c)

Represents a define-values form. Each element of ids will reference via the prefix either a top-level variable or a local module variable.

After rhs is evaluated, the stack is restored to its depth from before evaluating rhs.

(struct (def-syntaxes form) (ids rhs prefix max-let-depth)
#:transparent)
  ids : (listof toplevel?)
  rhs : (or/c expr? seq? indirect? any/c)
  prefix : prefix?
  max-let-depth : exact-nonnegative-integer?
(struct (def-for-syntax form) (ids rhs prefix max-let-depth)
#:transparent)
  ids : (listof toplevel?)
  rhs : (or/c expr? seq? indirect? any/c)
  prefix : prefix?
  max-let-depth : exact-nonnegative-integer?

Represents a define-syntaxes or define-values-for-syntax form. The rhs expression has its own prefix, which is pushed before evaluating rhs; the stack is restored after obtaining the result values. The max-let-depth field indicates the maximum size of the stack that will be created by rhs (not counting prefix).

(struct (req form) (reqs dummy)
#:transparent)
reqs : (listof module-path?)
dummy : toplevel?

Represents a top-level require form (but not one in a module form). The dummy variable is used to access the top-level namespace.

(struct (mod form) (name self-modidx prefix provides requires body syntax-body max-let-depth)
#:transparent)
  name : symbol?
  self-modidx : module-path-index?
  prefix : prefix?
  provides : (listof symbol?)
  requires : (listof (cons/c (or/c exact-integer? #f) (listof module-path-index?)))
  body : (listof (or/c form? indirect? any/c))
  syntax-body : (listof (or/c def-syntaxes? def-for-syntax?))
  max-let-depth : exact-nonnegative-integer?

Represents a module declaration. The body forms use prefix, rather than any prefix in place for the module declaration itself (and each syntax-body has its own prefix). The body field contains the module’s run-time code, and syntax-body contains the module’s compile-time code. The max-let-depth field indicates the maximum stack depth created by body forms (not counting the prefix array).

After each form in body is evaluated, the stack is restored to its depth from before evaluating the form.
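
To illustrate how body forms refer to the module's own prefix, the following sketch lists the prefix entries targeted by each def-values form in a module body. The helper names are hypothetical, and the code assumes that a toplevel position smaller than the length of toplevels indexes directly into that list, as the prefix layout described earlier suggests.

```scheme
#lang scheme
(require compiler/zo-parse)

;; Hypothetical helper: find the prefix entry that a `toplevel`
;; reference denotes. Positions past the toplevels buckets (the
;; syntax-object array or lifted procedures) are reported as
;; (list 'other pos).
(define (toplevel->entry pfx tl)
  (let ([tops (prefix-toplevels pfx)]
        [pos (toplevel-pos tl)])
    (if (< pos (length tops))
        (list-ref tops pos)
        (list 'other pos))))

;; Hypothetical helper: for a parsed module `m`, collect the prefix
;; entries for the ids of each def-values form in its body.
(define (module-definitions m)
  (for/list ([f (in-list (mod-body m))]
             #:when (def-values? f))
    (map (lambda (id) (toplevel->entry (mod-prefix m) id))
         (def-values-ids f))))
```

For a module-level definition, each entry is expected to be a symbol naming a variable defined in the module, per the description of toplevels.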

(struct (seq form) (forms)
#:transparent)
forms : (listof (or/c form? indirect? any/c))

Represents a begin form, either as an expression or at the top level (though the latter is more commonly a splice form). When a seq appears in an expression position, its forms are expressions.

After each form in forms is evaluated, the stack is restored to its depth from before evaluating the form.

(struct (splice form) (forms)
#:transparent)
forms : (listof (or/c form? indirect? any/c))

Represents a top-level begin form where each evaluation is wrapped with a continuation prompt.

After each form in forms is evaluated, the stack is restored to its depth from before evaluating the form.

7.2.3 Expressions

(struct (expr form) ()
#:transparent)

A supertype for all expression forms that can appear in compiled code, except for literals that are represented as themselves, indirect structures to create cycles, and some seq structures (which can appear as expressions as long as they contain only other things that can be expressions).

(struct (lam expr) (name flags num-params rest? closure-map max-let-depth body)
#:transparent)
  name : (or/c symbol? vector?)
  flags : exact-integer?
  num-params : exact-nonnegative-integer?
  rest? : boolean?
  closure-map : (vectorof exact-nonnegative-integer?)
  max-let-depth : exact-nonnegative-integer?
  body : (or/c expr? seq? indirect? any/c)

Represents a lambda form. The name field is a name for debugging purposes. The num-params field indicates the number of arguments accepted by the procedure, not counting a rest argument; the rest? field indicates whether extra arguments are accepted and collected into a “rest” variable. The closure-map field is a vector of stack positions that are captured when evaluating the lambda form to create a closure.

When the function is called, the rest-argument list (if any) is pushed onto the stack, then the normal arguments in reverse order, then the closure-captured values in reverse order. Thus, when body is run, the first value on the stack is the first value captured by the closure-map array, and so on.

The max-let-depth field indicates the maximum stack depth created by body (not including arguments and closure-captured values pushed onto the stack). The body field is the expression for the closure’s body.
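
The push order described above can be made concrete with a toy model in plain Scheme. This sketch does not use the run-time system; it represents the stack as a list whose first element is the top, and the helper names are hypothetical.

```scheme
#lang scheme

;; Toy model: the stack is a list whose first element is the top.

;; Pushing `vals` in reverse order leaves (car vals) on top.
(define (push-all-reversed vals stack)
  (append vals stack))

;; Hypothetical helper: stack contents seen by a lam body.
;;   captured : values named by closure-map, in order
;;   args     : the normal arguments, in order
;;   rest     : the rest-argument list, or #f when rest? is #f
(define (entry-stack captured args rest stack)
  (let* ([s (if rest (cons rest stack) stack)]  ; rest list pushed first
         [s (push-all-reversed args s)]         ; then arguments, reversed
         [s (push-all-reversed captured s)])    ; then captured values, reversed
    s))

(entry-stack '(c0 c1) '(a0 a1) '(r0 r1) '())
```

The result is (c0 c1 a0 a1 (r0 r1)): the first captured value is on top, followed by the remaining captured values, the arguments in order, and finally the rest-argument list.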

(struct (closure expr) (code gen-id)
#:transparent)
code : lam?
gen-id : symbol?

A lambda form with an empty closure, which is a procedure constant. The procedure constant can appear multiple times in the graph of expressions for bytecode, and the code field can refer back to the same closure through an indirect for a recursive constant procedure; the gen-id is different for each such constant.

(struct indirect (v)
#:mutable
#:prefab)
v : closure?

An indirection used in expression positions to form cycles.

(struct (case-lam expr) (name clauses)
#:transparent)
name : (or/c symbol? vector?)
clauses : (listof lam?)

Represents a case-lambda form as a combination of lambda forms that are tried (in order) based on the number of arguments given.
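
As a sketch of the in-order selection described above, the following helpers pick the first clause of a case-lam that accepts a given number of arguments, using the num-params and rest? fields of each lam. The names clause-accepts? and select-clause are hypothetical and only model the selection rule; they are not how the run-time system is implemented.

```scheme
#lang scheme
(require compiler/zo-parse)

;; Hypothetical helper: does the lam `l` accept `n` arguments?
(define (clause-accepts? l n)
  (if (lam-rest? l)
      (>= n (lam-num-params l))   ; extra arguments go to the rest list
      (= n (lam-num-params l))))

;; Hypothetical helper: the first clause of `cl` that accepts `n`
;; arguments, or #f when no clause matches.
(define (select-clause cl n)
  (ormap (lambda (l) (and (clause-accepts? l n) l))
         (case-lam-clauses cl)))
```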

(struct (let-one expr) (rhs body)
#:transparent)
rhs : (or/c expr? seq? indirect? any/c)
body : (or/c expr? seq? indirect? any/c)

Pushes an uninitialized slot onto the stack, evaluates rhs and puts its value into the slot, and then runs body.

After rhs is evaluated, the stack is restored to its depth from before evaluating rhs. Note that the new slot is created before evaluating rhs.

(struct (let-void expr) (count boxes? body)
#:transparent)
  count : exact-nonnegative-integer?
  boxes? : boolean?
  body : (or/c expr? seq? indirect? any/c)

Pushes count uninitialized slots onto the stack and then runs body. If boxes? is #t, then the slots are filled with boxes that contain #<undefined>.

(struct (install-value expr) (count pos boxes? rhs body)
#:transparent)
  count : exact-nonnegative-integer?
  pos : exact-nonnegative-integer?
  boxes? : boolean?
  rhs : (or/c expr? seq? indirect? any/c)
  body : (or/c expr? seq? indirect? any/c)

Runs rhs to obtain count results, and installs them into existing slots on the stack in order, skipping the first pos stack positions. If boxes? is #t, then the values are put into existing boxes in the stack slots.

After rhs is evaluated, the stack is restored to its depth from before evaluating rhs.

(struct (let-rec expr) (procs body)
#:transparent)
procs : (listof lam?)
body : (or/c expr? seq? indirect? any/c)

Represents a letrec form with lambda bindings. It allocates a closure shell for each lambda form in procs, pushes them onto the stack in reverse order, fills out each shell’s closure using the created shells, and then evaluates body.

(struct (boxenv expr) (pos body)
#:transparent)
pos : exact-nonnegative-integer?
body : (or/c expr? seq? indirect? any/c)

Skips pos elements of the stack, setting the slot afterward to a new box containing the slot’s old value, and then runs body. This form appears when a lambda argument is mutated using set! within its body; calling the function initially pushes the value directly on the stack, and this form boxes the value so that it can be mutated later.

(struct (localref expr) (unbox? pos clear?)
#:transparent)
  unbox? : boolean?
  pos : exact-nonnegative-integer?
  clear? : boolean?

Represents a local-variable reference; it accesses the value in the stack slot after the first pos slots. If unbox? is #t, the stack slot contains a box, and a value is extracted from the box. If clear? is #t, then after the value is obtained, the stack slot is cleared (to avoid retaining a reference that can prevent reclamation of the value as garbage).
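
The interaction between boxenv and a localref with unbox? set can be illustrated with a toy model in plain Scheme. The stack is modeled as an immutable list whose first element is the top, the helper names are hypothetical, and the clear? behavior is not modeled.

```scheme
#lang scheme

;; Toy model: the stack is an immutable list whose first element is the top.

;; Models boxenv: skip `pos` slots, then replace the next slot's value
;; with a new box holding the old value.
(define (boxenv-stack stack pos)
  (if (zero? pos)
      (cons (box (car stack)) (cdr stack))
      (cons (car stack) (boxenv-stack (cdr stack) (sub1 pos)))))

;; Models localref (without clear?): read the slot after the first
;; `pos` slots, unboxing when unbox? is true.
(define (localref-value stack unbox? pos)
  (let ([v (list-ref stack pos)])
    (if unbox? (unbox v) v)))

(define s (boxenv-stack '(10 20 30) 1))
(localref-value s #t 1)   ; reads through the box
(localref-value s #f 0)   ; plain slot read
```

The two reads produce 20 and 10: the slot at position 1 now holds a box containing its old value, while the slot at position 0 is untouched.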

(struct (toplevel expr) (depth pos const? ready?)
#:transparent)
  depth : exact-nonnegative-integer?
  pos : exact-nonnegative-integer?
  const? : boolean?
  ready? : boolean?

Represents a reference to a top-level or imported variable via the prefix array. The depth field indicates the number of stack slots to skip to reach the prefix array, and pos is the offset into the array.

If const? is #t, then the variable definitely will be defined, and its value stays constant. If ready? is #t, then the variable definitely will be defined (but its value might change in the future). If const? and ready? are both #f, then a check is needed to determine whether the variable is defined.
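
The rule above reduces to a small predicate. The following sketch uses a hypothetical helper name, needs-defined-check?, and relies only on the accessors implied by the toplevel struct definition.

```scheme
#lang scheme
(require compiler/zo-parse)

;; Hypothetical helper: a definedness check is needed only when
;; neither const? nor ready? is set on the toplevel reference.
(define (needs-defined-check? tl)
  (not (or (toplevel-const? tl)
           (toplevel-ready? tl))))
```

A reference for which this predicate holds plausibly corresponds to a variable access that the decompiler wraps with #%checked, though that correspondence is an inference from the descriptions here rather than something stated directly.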

(struct (topsyntax expr) (depth pos midpt)
#:transparent)
  depth : exact-nonnegative-integer?
  pos : exact-nonnegative-integer?
  midpt : exact-nonnegative-integer?

Represents a reference to a quoted syntax object via the prefix array. The depth field indicates the number of stack slots to skip to reach the prefix array, and pos is the offset into the array. The midpt value is used internally for lazy calculation of syntax information.

(struct (application expr) (rator rands)
#:transparent)
rator : (or/c expr? seq? indirect? any/c)
rands : (listof (or/c expr? seq? indirect? any/c))

Represents a function call. The rator field is the expression for the function, and rands are the argument expressions. Before any of the expressions are evaluated, (length rands) uninitialized stack slots are created (to be used as temporary space).

(struct (branch expr) (test then else)
#:transparent)
  test : (or/c expr? seq? indirect? any/c)
  then : (or/c expr? seq? indirect? any/c)
  else : (or/c expr? seq? indirect? any/c)

Represents an if form.

After test is evaluated, the stack is restored to its depth from before evaluating test.

(struct (with-cont-mark expr) (key val body)
#:transparent)
  key : (or/c expr? seq? indirect? any/c)
  val : (or/c expr? seq? indirect? any/c)
  body : (or/c expr? seq? indirect? any/c)

Represents a with-continuation-mark expression.

After each of key and val is evaluated, the stack is restored to its depth from before evaluating key or val.

(struct (beg0 expr) (seq)
#:transparent)
seq : (listof (or/c expr? seq? indirect? any/c))

Represents a begin0 expression.

After each expression in seq is evaluated, the stack is restored to its depth from before evaluating the expression.

(struct (varref expr) (toplevel)
#:transparent)
toplevel : toplevel?

Represents a #%variable-reference form.

(struct (assign expr) (id rhs undef-ok?)
#:transparent)
  id : toplevel?
  rhs : (or/c expr? seq? indirect? any/c)
  undef-ok? : boolean?

Represents a set! expression that assigns to a top-level or module-level variable. (Assignments to local variables are represented by install-value expressions.)

After rhs is evaluated, the stack is restored to its depth from before evaluating rhs.

(struct (apply-values expr) (proc args-expr)
#:transparent)
proc : (or/c expr? seq? indirect? any/c)
args-expr : (or/c expr? seq? indirect? any/c)

Represents (call-with-values (lambda () args-expr) proc), which is handled specially by the run-time system.

(struct (primval expr) (id)
#:transparent)
id : symbol?

Represents a direct reference to a variable imported from the run-time kernel.

7.2.4 Syntax Objects

(struct wrapped (datum wraps certs)
#:transparent)
  datum : any/c
  wraps : (listof wrap?)
  certs : list?

Represents a syntax object, where wraps contain the lexical information and certs is certificate information. When the datum part is itself compound, its pieces are wrapped, too.

(struct wrap ()
#:transparent)

A supertype for lexical-information elements.

(struct (lexical-rename wrap) (alist)
#:transparent)
alist : (listof (cons/c identifier? identifier?))

A local-binding mapping from symbols to binding-set names.

(struct (phase-shift wrap) (amt src dest)
#:transparent)
  amt : exact-integer?
  src : module-path-index?
  dest : module-path-index?

Shifts module bindings later in the wrap set.

(struct (module-rename wrap) (phase kind set-id unmarshals renames mark-renames plus-kern?)
#:transparent)
  phase : exact-integer?
  kind : (or/c 'marked 'normal)
  set-id : any/c
  unmarshals : (listof all-from-module?)
  renames : (listof module-binding?)
  mark-renames : any/c
  plus-kern? : boolean?

Represents a set of module and import bindings.

(struct all-from-module (path phase src-phase exceptions prefix)
#:transparent)
  path : module-path-index?
  phase : (or/c exact-integer? #f)
  src-phase : (or/c exact-integer? #f)
  exceptions : (listof symbol?)
  prefix : symbol?

Represents a set of simple imports from one module within a module-rename.

(struct module-binding (path mod-phase import-phase id nominal-path nominal-phase nominal-id)
#:transparent)
  path : module-path-index?
  mod-phase : (or/c exact-integer? #f)
  import-phase : (or/c exact-integer? #f)
  id : symbol?
  nominal-path : module-path-index?
  nominal-phase : (or/c exact-integer? #f)
  nominal-id : (or/c exact-integer? #f)

Represents a single identifier import (i.e., the general case) within a module-rename.
