Overview
This manual describes MzScheme's C interface, which allows the interpreter to be extended by a dynamically-loaded library, or embedded within an arbitrary C/C++ program. The manual assumes familiarity with MzScheme, as described in PLT MzScheme: Language Manual.
For an alternative way of dealing with foreign code, see PLT Foreign Interface Manual; it describes the (lib "foreign.ss") module for manipulating low-level libraries and structures through Scheme code instead of C code.
1.1 Writing MzScheme Extensions
To write a C/C++-based extension for MzScheme, follow these steps:
For each C/C++ file that uses MzScheme library functions, #include the file escheme.h.
This file is distributed with the PLT software in an include directory, but if mzc is used to compile, this path is found automatically.
Define the C function scheme_initialize, which takes a Scheme_Env * namespace (see section 4) and returns a Scheme_Object * Scheme value.
This initialization function can install new global primitive procedures or other values into the namespace, or it can simply return a Scheme value. The initialization function is called when the extension is loaded with load-extension (the first time); the return value from scheme_initialize is used as the return value for load-extension. The namespace provided to scheme_initialize is the current namespace when load-extension is called.
Define the C function scheme_reload, which has the same arguments and return type as scheme_initialize.
This function is called if load-extension is called a second time (or more times) for an extension. Like scheme_initialize, the return value from this function is the return value for load-extension.
Define the C function scheme_module_name, which takes no arguments and returns a Scheme_Object * value, either a symbol or scheme_false.
The function should return a symbol when the effect of calling scheme_initialize and scheme_reload is only to declare a module with the returned name. This function is called when the extension is loaded to satisfy a require declaration.
The scheme_module_name function may be called before scheme_initialize and scheme_reload, after those functions, or both before and after, depending on how the extension is loaded and re-loaded.
Compile the extension C/C++ files to create platform-specific object files.
The mzc compiler, distributed with MzScheme, compiles plain C files when the --cc flag is specified. More precisely, mzc does not compile the files itself, but it locates a C compiler on the system and launches it with the appropriate compilation flags. If the platform is a relatively standard Unix system, a Windows system with either Microsoft's C compiler or gcc in the path, or a Mac OS X system with Apple's developer tools installed, then using mzc is typically easier than working with the C compiler directly.
Link the extension C/C++ files with mzdyn.o (Unix, Mac OS X) or mzdyn.obj (Windows) to create a shared object. The resulting shared object should use the extension .so (Unix), .dll (Windows), or .dylib (Mac OS X).
The mzdyn object file is distributed in the installation's lib directory. For Windows, the object file is in a compiler-specific sub-directory of plt\lib.
The mzc compiler links object files into an extension when the --ld flag is specified, automatically locating mzdyn.
Load the shared object within Scheme using (load-extension path), where path is the name of the extension file generated in the previous step.
IMPORTANT: Scheme values are garbage collected using a conservative garbage collector, so pointers to MzScheme objects can be kept in registers, stack variables, or structures allocated with scheme_malloc. However, static variables that contain pointers to collectable memory must be registered using scheme_register_extension_global (see section 3).
As an example, the following C code defines an extension that returns "hello world" when it is loaded:
#include "escheme.h"

Scheme_Object *scheme_initialize(Scheme_Env *env) {
  return scheme_make_string("hello world");
}

Scheme_Object *scheme_reload(Scheme_Env *env) {
  return scheme_initialize(env); /* Nothing special for reload */
}

Scheme_Object *scheme_module_name() {
  return scheme_false;
}
Assuming that this code is in the file hw.c, the extension is compiled under Unix with the following two commands:
mzc --cc hw.c
mzc --ld hw.so hw.o
(Note that the --cc and --ld flags are each prefixed by two dashes, not one.)
The collects/mzscheme/examples directory in the PLT distribution contains additional examples.
MzScheme3m is a variant of MzScheme that uses precise garbage collection instead of conservative garbage collection, and it may move objects in memory during a collection. To build an extension to work with MzScheme3m, the above instructions must be extended as follows:
Adjust code to cooperate with the garbage collector as described in section 3.1. Using mzc with the --xform flag may perform part of this conversion automatically, as described in section 3.1.3.
Either in your source or on the compiler command line, #define MZ_PRECISE_GC before including escheme.h. When mzc is used with the --cc and --3m flags, MZ_PRECISE_GC is defined automatically.
Link with mzdyn3m.o (Unix, Mac OS X) or mzdyn3m.obj (Windows) to create a shared object. The resulting extension will work only with MzScheme3m and MrEd3m. When mzc is used with the --ld and --3m flags, it links to these libraries automatically.
For a relatively simple extension hw.c, the extension is compiled under Unix for 3m with the following three commands:
mzc --xform hw.c
mzc --3m --cc hw.3m.c
mzc --3m --ld hw.so hw.o
Some examples in collects/mzscheme/examples work with MzScheme3m in this way. A few examples are manually instrumented, in which case the --xform step should be skipped.
1.2 Embedding MzScheme into a Program
To embed MzScheme in a program, follow these steps:
Locate or build the MzScheme libraries. For some Unix platforms, you must first download the MzScheme source code and compile the libraries, because they are not included with a binary distribution. Under Windows and Mac OS X, the standard binary distribution includes the libraries.
Under Unix, the libraries are libmzscheme.a and libgc.a (or libmzscheme.so and libgc.so for a dynamic-library build, with libmzscheme.la and libgc.la files for use with libtool). Building from source and installing places the libraries into the installation's lib directory.
Under Windows, stub libraries for use with Microsoft tools are libmzschx.lib and libmzgcx.lib (where x represents the version number), located in a compiler-specific directory in plt\lib. These libraries identify the bindings that are provided by libmzschx.dll and libmzgcx.dll, which are typically installed in plt\lib. When linking with Cygwin, link to libmzschx.dll and libmzgcx.dll directly. At run time, either libmzschx.dll and libmzgcx.dll must be moved to a location in the standard DLL search path, or your embedding application must "delayload" link the DLLs and explicitly load them before use. (MzScheme.exe and MrEd.exe use the latter strategy.)
Under Mac OS X, dynamic libraries are provided by the PLT_MzScheme framework, which is typically installed in the lib sub-directory of the installation. Supply -framework PLT_MzScheme to gcc when linking, along with -F and a path to the lib directory. At run time, either PLT_MzScheme.framework must be moved to a location in the standard framework search path, or your embedding executable must provide a specific path to the framework (possibly an executable-relative path using the Mach-O @executable_path prefix).
For each C/C++ file that uses MzScheme library functions, #include the file scheme.h (see footnote 1).
This file is distributed with the PLT software in the installation's include directory.
In your main program, obtain a global MzScheme environment Scheme_Env * by calling scheme_basic_env. This function must be called before any other function in the MzScheme library (except scheme_make_param).
Access MzScheme through scheme_load, scheme_eval, and/or other top-level MzScheme functions described in this manual.
Compile the program and link it with the MzScheme libraries.
Scheme values are garbage collected using a conservative garbage collector, so pointers to MzScheme objects can be kept in registers, stack variables, or structures allocated with scheme_malloc. In an embedding application on some platforms, static variables are also automatically registered as roots for garbage collection (but see notes below specific to Mac OS X and Windows).
For example, the following is a simple embedding program which evaluates all expressions provided on the command line and displays the results, then runs a read-eval-print loop:
#include "scheme.h"

int main(int argc, char *argv[])
{
  Scheme_Env *e;
  Scheme_Object *curout;
  int i;
  mz_jmp_buf * volatile save, fresh;

  scheme_set_stack_base(NULL, 1); /* required for OS X, only */

  e = scheme_basic_env();

  curout = scheme_get_param(scheme_current_config(),
                            MZCONFIG_OUTPUT_PORT);

  for (i = 1; i < argc; i++) {
    save = scheme_current_thread->error_buf;
    scheme_current_thread->error_buf = &fresh;
    if (scheme_setjmp(scheme_error_buf)) {
      scheme_current_thread->error_buf = save;
      return -1; /* There was an error */
    } else {
      Scheme_Object *v = scheme_eval_string(argv[i], e);
      scheme_display(v, curout);
      scheme_display(scheme_make_character('\n'), curout);
      /* read-eval-print loop, implicitly uses the initial Scheme_Env: */
      scheme_apply(scheme_builtin_value("read-eval-print-loop"), 0, NULL);
      scheme_current_thread->error_buf = save;
    }
  }
  return 0;
}
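The handling of error_buf in the program above (save the old escape target, install a fresh one, and restore the old one on both the normal and the error path) is the usual setjmp/longjmp escape idiom. The following is a minimal sketch of that idiom in standard C only, without MzScheme. The names current_error_buf, raise_error, risky_step, and run_protected are hypothetical stand-ins introduced for illustration; they are not part of the MzScheme API.

```c
#include <setjmp.h>
#include <stddef.h>

/* Hypothetical stand-in for scheme_current_thread->error_buf: points at
   the jump buffer that the next error should escape to. */
static jmp_buf *current_error_buf = NULL;

/* Hypothetical stand-in for an error raised inside the evaluator. */
static void raise_error(void)
{
  longjmp(*current_error_buf, 1);
}

/* Hypothetical unit of work that fails when should_fail is non-zero. */
static int risky_step(int should_fail)
{
  if (should_fail)
    raise_error();
  return 42;
}

/* Runs risky_step with a fresh escape target installed. Returns 0 and
   stores the result on success, or -1 if an error escaped. The previous
   escape target is restored on both paths. */
int run_protected(int should_fail, int *result)
{
  jmp_buf * volatile save;
  jmp_buf fresh;

  save = current_error_buf;
  current_error_buf = &fresh;
  if (setjmp(fresh)) {
    current_error_buf = save;   /* error path: restore, then report */
    return -1;
  } else {
    *result = risky_step(should_fail);
    current_error_buf = save;   /* normal path: restore */
    return 0;
  }
}
```

Restoring the saved target on both paths matters because an enclosing caller may have installed its own escape target; leaving a pointer to fresh behind would make a later error jump into a stack frame that no longer exists.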
Under Mac OS X, or under Windows when MzScheme is compiled to a DLL using Cygwin, the garbage collector cannot find static variables automatically. In that case, scheme_set_stack_base must be called with a non-zero second argument before calling any scheme_ function.
Under Windows (for any other build mode), the garbage collector finds static variables in an embedding program by examining all memory pages. This strategy fails if a program contains multiple Windows threads; a page may get unmapped by a thread while the collector is examining the page, causing the collector to crash. To avoid this problem, call scheme_set_stack_base with a non-zero second argument before calling any scheme_ function.
When an embedding application calls scheme_set_stack_base with a non-zero second argument, it must register each of its static variables with MZ_REGISTER_STATIC if the variable can contain a GCable pointer. For example, if e above is made static, then MZ_REGISTER_STATIC(e) should be inserted before the call to scheme_basic_env.
When building an embedded MzScheme to use SenoraGC (SGC) instead of the default collector, scheme_set_stack_base must be called both with a non-zero second argument and with a stack-base pointer in the first argument. See section 3 for more information.
MzScheme3m can be embedded the same as MzScheme, as long as the embedding program cooperates with the precise garbage collector as described in section 3.1.
1.3 MzScheme and Threads
MzScheme implements threads for Scheme programs without aid from the operating system, so that MzScheme threads are cooperative from the perspective of C code. Under Unix, stand-alone MzScheme uses a single OS-implemented thread. Under Windows and Mac OS X, stand-alone MzScheme uses a few private OS-implemented threads for background tasks, but these OS-implemented threads are never exposed by the MzScheme API.
In an embedding application, MzScheme can co-exist with additional OS-implemented threads, but the additional OS threads must not call any scheme_ function. Only the OS thread that originally calls scheme_basic_env can call scheme_ functions (see footnote 2). When scheme_basic_env is called a second time to reset the interpreter, it can be called in an OS thread that is different from the thread of the original call to scheme_basic_env. Thereafter, all calls to scheme_ functions must originate from the new thread.
See section 8 for more information about threads, including the possible effects of MzScheme's thread implementation on extension and embedding C code.
1.4 MzScheme, Unicode, Characters, and Strings
A character in MzScheme is a Unicode code point. In C, a character value has type mzchar, which is an alias for unsigned -- which is, in turn, 4 bytes for a properly compiled MzScheme. Thus, a mzchar* string is effectively a UCS-4 string.
Only a few MzScheme functions use mzchar*. Instead, most functions accept char* strings. When such byte strings are to be used as character strings, they are interpreted as UTF-8 encodings. A plain ASCII string is always acceptable in such cases, since the UTF-8 encoding of an ASCII string is itself.
See also section 2.3 and section 11.
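To make the relationship between char* (UTF-8) strings and mzchar* (UCS-4) strings concrete, the following is a simplified sketch of a UTF-8 decoder in standard C. It is an illustration only, not MzScheme's own decoder: the type ucs4_char is a hypothetical stand-in for mzchar (assuming a 4-byte unsigned), the function utf8_to_ucs4 is likewise hypothetical, and the sketch does not reject overlong encodings or surrogate code points as a strict decoder would.

```c
#include <stddef.h>

/* Hypothetical stand-in for mzchar, assuming unsigned int is 4 bytes. */
typedef unsigned int ucs4_char;

/* Decodes the NUL-terminated UTF-8 string s into out (capacity out_len).
   Returns the number of code points written, or -1 on malformed input
   or insufficient space. Simplified: overlong forms are not rejected. */
long utf8_to_ucs4(const char *s, ucs4_char *out, size_t out_len)
{
  const unsigned char *p = (const unsigned char *)s;
  size_t n = 0;

  while (*p) {
    ucs4_char c;
    int extra, i;

    if (*p < 0x80)                { c = *p;        extra = 0; }
    else if ((*p & 0xE0) == 0xC0) { c = *p & 0x1F; extra = 1; }
    else if ((*p & 0xF0) == 0xE0) { c = *p & 0x0F; extra = 2; }
    else if ((*p & 0xF8) == 0xF0) { c = *p & 0x07; extra = 3; }
    else return -1;   /* stray continuation byte or invalid lead byte */
    p++;

    for (i = 0; i < extra; i++, p++) {
      if ((*p & 0xC0) != 0x80)
        return -1;    /* truncated or malformed sequence */
      c = (c << 6) | (*p & 0x3F);
    }

    if (n >= out_len)
      return -1;      /* output buffer too small */
    out[n++] = c;
  }
  return (long)n;
}
```

For a plain ASCII input such as "abc", each byte takes the first branch and is copied unchanged, which is the property the text above relies on when it says that an ASCII string is always acceptable.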
1.5 Integers
MzScheme expects to be compiled in a mode where short is a 16-bit integer, int is a 32-bit integer, and long has the same number of bits as void*. The mzlonglong type has 64 bits for compilers that support a 64-bit integer type, otherwise it is the same as long; thus, mzlonglong tends to match long long. The umzlonglong type is the unsigned version of mzlonglong.
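The size assumptions above can be checked for a particular compiler and compilation mode with a few lines of standard C. The following is a minimal sketch; the function name integer_sizes_ok is hypothetical and is not part of the MzScheme API.

```c
#include <limits.h>
#include <stddef.h>

/* Returns 1 if the current compilation mode matches the integer sizes
   that the text above says MzScheme expects, 0 otherwise. */
int integer_sizes_ok(void)
{
  return (sizeof(short) * CHAR_BIT == 16)     /* short is 16 bits */
      && (sizeof(int) * CHAR_BIT == 32)       /* int is 32 bits */
      && (sizeof(long) == sizeof(void *));    /* long matches void* */
}
```

A typical 32-bit target and a typical 64-bit Unix target (where long and void* are both 64 bits) pass this check. A target on which long is 32 bits while pointers are 64 bits, as with the usual 64-bit Windows data model, does not.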
1 The C preprocessor symbol SCHEME_DIRECT_EMBEDDED is defined as 1 when scheme.h is #included, or as 0 when escheme.h is #included.
2 This restriction is stronger than saying all calls must be serialized across threads. MzScheme relies on properties of specific threads to avoid stack overflow and garbage collection.