About mzc
1.1 mzc Is...
The mzc compiler takes MzScheme (or MrEd) source code and produces either platform-independent byte-code compiled files (.zo files, which are just-in-time compiled to native code when loaded on x86, x86_64, and PowerPC platforms) or platform-specific native-code libraries (.so or .dll files) to be loaded into MzScheme (or MrEd). In the latter mode, mzc provides limited support for interfacing directly to C libraries.
mzc works on either individual files or on collections. (A collection is a group of files that conform to MzScheme's library collection system; see section 16 in PLT MzScheme: Language Manual). In general, mzc works best with code using the module form.
As a convenience for programmers writing low-level MzScheme extensions, mzc can compile and link plain C files that use MzScheme's escheme.h header. This facility is described in Inside PLT MzScheme.
Finally, mzc can perform miscellaneous tasks, such as embedding Scheme code in a copy of the MzScheme (or MrEd) binary to produce a stand-alone executable, or creating .plt distribution archives.
1.1.1 Byte-Code Compilation
A byte-code file typically uses the file extension .zo. The file starts with #~ followed by the byte-code data.
Byte-code files are loaded into MzScheme in the same way as regular Scheme source files (e.g., with load). The #~ marker causes MzScheme's reader to load byte codes instead of normal Scheme expressions. When a .zo file exists in a compiled subdirectory, it is sometimes loaded in place of a source file; see section 3.3 for details.
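Because the reader recognizes byte code by its leading #~ marker, a file can be classified by inspecting its first two bytes. The helper below is a minimal sketch; byte-code-file? is a hypothetical name, not part of MzScheme or mzc, and it checks only the marker, not whether the rest of the file is valid byte code.

```scheme
;; Hypothetical helper (not part of MzScheme or mzc): reports whether
;; the file at `path` starts with the #~ byte-code marker.
(define (byte-code-file? path)
  (call-with-input-file path
    (lambda (in)
      ;; read-bytes returns eof for an empty file, so equal? yields #f
      (equal? (read-bytes 2 in) #"#~"))))
```

For an ordinary source file, which starts with a parenthesis, comment, or whitespace, the helper returns #f.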
Byte-code programs produced by mzc run exactly the same as source code compiled by MzScheme directly (assuming the same set of bindings is in place at compile time and load time). In other words, byte-code compilation does not optimize the code any more than MzScheme's normal evaluator. However, a byte-code file can be loaded into MzScheme much faster than a source-code file.
Whether loading from source or byte code, MzScheme compiles as needed to native code on x86, x86_64, and PowerPC platforms. Setting the environment variable PLTNOMZJIT disables just-in-time compilation on all platforms. (In addition, the stand-alone MzScheme executable also accepts a -j or --no-jit flag to disable just-in-time compilation.) See section 1.4 for information on obtaining the best possible performance.
1.1.2 Native-Code Compilation
A native-code file is a platform-specific shared library. Under Windows, native-code files use the extension .dll. Under Mac OS X, native-code files use the extension .dylib. Under Unix, native-code files use the extension .so.
Native-code files are loaded into MzScheme with the load-extension procedure (see section 14.4 in PLT MzScheme: Language Manual). When a native-code file exists in a compiled subdirectory, it is sometimes loaded in place of a source file; see section 3.3 for details.
The native-code ahead-of-time compiler uses C as an intermediate language, instead of byte code, and it works on all platforms (when a C compiler is available). The ahead-of-time native compiler can sometimes produce better performance than the just-in-time compiler (where available), but the difference is small compared to the difference between direct byte-code interpretation and just-in-time compilation. See section 1.4 for information on obtaining the best possible performance from mzc-compiled programs.
The cffi.ss library of the compiler collection defines Scheme forms, such as c-lambda, for accessing C functions from Scheme. The forms produce run-time errors when interpreted directly or compiled to byte code. See section 2 for further information.
Since native-code compilation produces C source code in an intermediate stage, your system must provide an external C compiler for ahead-of-time native code.
Under Unix and Mac OS X, gcc is used as the C compiler if it can be found in any of the directories listed in the PATH environment variable. If gcc is not found, cc is used if it can be found.
Under Windows, cl.exe (Microsoft Visual C) is used as the C compiler if it can be found in any of the directories listed in the PATH environment variable. If cl.exe is not found, then gcc.exe is used if it can be found. If neither cl.exe nor gcc.exe is found, then bcc32.exe (Borland) is used if it can be found.
In either case, if the MZSCHEME_DYNEXT_COMPILER or CC environment variable is defined, it overrides the above search paths (and MZSCHEME_DYNEXT_COMPILER takes precedence over CC).
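The search order above can be modeled in a few lines of Scheme. This is a hypothetical sketch of the documented precedence only, not mzc's actual implementation; pick-c-compiler is an invented name, and the real tool may apply additional rules when it invokes the compiler.

```scheme
;; Hypothetical sketch (not mzc's actual code) of the documented
;; C-compiler search order. Returns a string, a path, or #f.
(define (pick-c-compiler windows?)
  (or (getenv "MZSCHEME_DYNEXT_COMPILER") ; checked first
      (getenv "CC")                       ; checked second
      ;; Otherwise search PATH for each candidate, in order.
      (let loop ([names (if windows?
                            '("cl.exe" "gcc.exe" "bcc32.exe")
                            '("gcc" "cc"))])
        (cond [(null? names) #f]
              ;; a clause with only a test returns the found path
              [(find-executable-path (car names) #f)]
              [else (loop (cdr names))]))))
```

Because the environment variables are consulted before PATH, setting MZSCHEME_DYNEXT_COMPILER makes the result independent of which compilers are installed.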
The C compiler and compiler flags used by mzc can be adjusted via command line flags.
1.2 mzc Is Not...
mzc does not generally produce stand-alone executables from Scheme source code. The compiler's output is intended to be loaded into MzScheme (or MrEd or DrScheme). However, see also section 5 for information about embedding code into a copy of the MzScheme (or MrEd) executable.
mzc does not translate Scheme code into similar C code. Native-code compilation produces C code that relies on MzScheme to provide run-time support, which includes memory management, closure creation, procedure application, and primitive operations.
1.3 Running mzc
Run mzc from a shell, passing flags and arguments on the command line.
In this manual, each example command line is shown as follows:
mzc --extension --prefix macros.ss file.ss
To run this example, type the command line into a shell (replacing mzc with the path to mzc on your system, if necessary).
Simple on-line help is available for mzc's command-line arguments by running mzc with the -h or --help flag.
1.4 Native Code Optimization
Native code compilation (either just-in-time or ahead-of-time) can provide significant speedups compared to interpreting byte code or running directly from source code (when just-in-time compilation is unavailable or disabled).
Significant speedup from native-code compilation is typically due to two optimizations:
Direct function calls -- When the compiler detects a function call to an immediately visible function, it generates more efficient code than for a generic call, especially for tail calls. For example, given the program

    (letrec ([odd (lambda (x)
                    (if (zero? x)
                        #f
                        (even (sub1 x))))]
             [even (lambda (x)
                     (if (zero? x)
                         #t
                         (odd (sub1 x))))])
      (odd 400000))

the compiler can detect the odd-even loop and produce native code that runs twice as fast as byte-code interpretation. In contrast, given a similar program using top-level definitions,

    (define (odd x) ...)
    (define (even x) ...)

the compiler cannot assume an odd-even loop, because the global variables odd and even can be redefined at any time. Within a module, defined variables are lexically scoped like letrec variables, and module definitions therefore permit call optimizations (see footnote 1).

Primitive inlining -- When mzc encounters the application of certain primitives, it inlines the primitive procedure. However, the compiler must be certain that a variable reference will resolve to a primitive procedure when the code is loaded into MzScheme. In the preceding example, the compiler cannot inline the application of sub1 because the global variable sub1 might be redefined. To encourage the inlining of primitives -- which produces native code that runs about 30 times faster than byte-code interpretation for the preceding example -- the programmer has three options:

  - Use module -- If the original example is encapsulated in a module that imports mzscheme, then each primitive name, such as sub1, is guaranteed to access the primitive procedure (assuming that the name is not lexically bound). The "modulized" version of the preceding program follows:

        (module oe mzscheme
          (letrec ([odd (lambda (x)
                          (if (zero? x)
                              #f
                              (even (sub1 x))))]
                   [even (lambda (x)
                           (if (zero? x)
                               #t
                               (odd (sub1 x))))])
            (odd 400000)))

    To run this program, the oe module must be required at the top level.

  - Use a (require mzscheme) prefix -- If the preceding example is prefixed with (require mzscheme), then sub1 refers not to the global variable, but to the sub1 export of the mzscheme module. See section 3.2 for more information about prefixing compilation.

  - Use the --prim flag -- The --prim flag for mzc effectively prefixes the program with (require mzscheme).
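As a concrete instance of the (require mzscheme) option, the complete prefixed program might look like the following sketch. Binding the result to oe-result is an addition for illustration only (the name is hypothetical, not from the original example); the letrec itself is unchanged. Since 400000 is even, (odd 400000) produces #f.

```scheme
;; Prefix: zero? and sub1 now refer to the mzscheme module's exports,
;; not to redefinable global variables, so mzc may inline them.
(require mzscheme)

;; The original odd/even loop, unchanged; oe-result is illustrative.
(define oe-result
  (letrec ([odd (lambda (x)
                  (if (zero? x)
                      #f
                      (even (sub1 x))))]
           [even (lambda (x)
                   (if (zero? x)
                       #t
                       (odd (sub1 x))))])
    (odd 400000)))
```

Passing the --prim flag to mzc would have the same effect without editing the source.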
Programs that permit these optimizations also encourage a host of other optimizations, such as procedure inlining (for programmer-defined procedures) and static closure detection. In general, module-based programs provide the most opportunities for optimization.
Native-code compilation rarely produces significant speedup for programs that are not loop-intensive, programs that are heavily object-oriented, programs that are allocation-intensive, or programs that exploit built-in procedures (e.g., list operations, regular expression matching, or file manipulations) to perform most of the program's work.
1 The compiler cannot always prove that module definitions have been evaluated before the corresponding variable is used in an expression. With ahead-of-time compilation via mzc, use the -v or --verbose flag to check whether mzc reports a "last known module binding" warning when compiling a module expression; such a warning indicates that definitions after a particular line in the source file might be referenced before they are defined.