Overview

This manual describes MzScheme's C interface, which allows the interpreter to be extended by a dynamically-loaded library, or embedded within an arbitrary C/C++ program. The manual assumes familiarity with MzScheme, as described in PLT MzScheme: Language Manual.

For an alternative way of dealing with foreign code, see PLT Foreign Interface Manual; it describes the (lib "foreign.ss") module for manipulating low-level libraries and structures through Scheme code instead of C code.

1.1  CGC versus 3m

Before mixing any C code with MzScheme, first decide whether to use/support the CGC variant or 3m variant of MzScheme:

At the C level, working with MzSchemeCGC can be much easier than working with MzScheme3m, but overall system performance is typically better with MzScheme3m. Most users now run MzScheme3m.

1.2  Writing MzScheme Extensions

The process of creating an extension for MzSchemeCGC or MzScheme3m is essentially the same, but the process for MzScheme3m is most easily understood as a variant of the process for MzSchemeCGC.

1.2.1  CGC Extensions

To write a C/C++-based extension for MzSchemeCGC, follow these steps:

IMPORTANT: With MzSchemeCGC, Scheme values are garbage collected using a conservative garbage collector, so pointers to MzScheme objects can be kept in registers, stack variables, or structures allocated with scheme_malloc. However, static variables that contain pointers to collectable memory must be registered using scheme_register_extension_global (see section 3).

As an example, the following C code defines an extension that returns "hello world" when it is loaded:

 #include "escheme.h"
 Scheme_Object *scheme_initialize(Scheme_Env *env) {
   return scheme_make_utf8_string("hello world");
 }
 Scheme_Object *scheme_reload(Scheme_Env *env) {
   return scheme_initialize(env); /* Nothing special for reload */
 }
 Scheme_Object *scheme_module_name() {
   return scheme_false;
 }

Assuming that this code is in the file hw.c, the extension is compiled under Unix with the following two commands:

mzc --cgc --cc hw.c
mzc --cgc --ld hw.so hw.o

(Note that the --cgc, --cc, and --ld flags are each prefixed by two dashes, not one.)

The collects/mzscheme/examples directory in the PLT distribution contains additional examples.

1.2.2  3m Extensions

To build an extension to work with MzScheme3m, the CGC instructions must be extended as follows:

For a relatively simple extension hw.c, the extension is compiled under Unix for 3m with the following three commands:

mzc --xform hw.c
mzc --3m --cc hw.3m.c
mzc --3m --ld hw.so hw.o

Some examples in collects/mzscheme/examples work with MzScheme3m in this way. A few examples are manually instrumented, in which case the --xform step should be skipped.

1.3  Embedding MzScheme into a Program

Like creating extensions, the embedding process for MzSchemeCGC or MzScheme3m is essentially the same, but the process for MzScheme3m is most easily understood as a variant of the process for MzSchemeCGC.

1.3.1  CGC Embedding

To embed MzSchemeCGC in a program, follow these steps:

With MzSchemeCGC, Scheme values are garbage collected using a conservative garbage collector, so pointers to MzScheme objects can be kept in registers, stack variables, or structures allocated with scheme_malloc. In an embedding application on some platforms, static variables are also automatically registered as roots for garbage collection (but see notes below specific to Mac OS X and Windows).

For example, the following simple embedding program processes each expression provided on the command line in turn. It evaluates the expression, displays the result, and then runs a read-eval-print loop before moving on to the next expression:

#include "scheme.h"

int main(int argc, char *argv[])
{
  Scheme_Env *e;
  Scheme_Object *curout;
  int i;
  mz_jmp_buf * volatile save, fresh;

  scheme_set_stack_base(NULL, 1); /* required for OS X, only */

  e = scheme_basic_env();

  curout = scheme_get_param(scheme_current_config(), MZCONFIG_OUTPUT_PORT);

  for (i = 1; i < argc; i++) {
    save = scheme_current_thread->error_buf;
    scheme_current_thread->error_buf = &fresh;
    if (scheme_setjmp(scheme_error_buf)) {
      scheme_current_thread->error_buf = save;
      return -1; /* There was an error */
    } else {
      Scheme_Object *v = scheme_eval_string(argv[i], e);
      scheme_display(v, curout);
      scheme_display(scheme_make_character('\n'), curout);
      /* read-eval-print loop, implicitly uses the initial Scheme_Env: */
      scheme_apply(scheme_builtin_value("read-eval-print-loop"), 0, NULL);
      scheme_current_thread->error_buf = save;
    }
  }
  return 0;
}
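The error-handling sequence in the loop above is an ordinary setjmp/longjmp handler chain. It saves the current error_buf, installs a fresh one, calls scheme_setjmp, and restores the saved pointer on both the error path and the normal path. The following sketch shows the same idiom in standard C without MzScheme. Every demo_ name is hypothetical: demo_error_buf stands in for scheme_current_thread->error_buf, and demo_raise stands in for a Scheme error escape.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

/* Hypothetical stand-in for scheme_current_thread->error_buf:
   the innermost active error handler, or NULL if none. */
static jmp_buf *demo_error_buf = NULL;

/* Hypothetical stand-in for a Scheme error: escape to the innermost handler. */
static void demo_raise(void)
{
  assert(demo_error_buf != NULL);
  longjmp(*demo_error_buf, 1);
}

/* Hypothetical work that may fail, like scheme_eval_string on bad input. */
static int demo_work(int fail)
{
  if (fail)
    demo_raise();
  return 42;
}

/* Runs demo_work under a fresh handler; returns -1 if an error escaped. */
int demo_protected_call(int fail)
{
  jmp_buf fresh;
  /* Declared volatile as in the manual's example; it is not modified after
     setjmp, so its value is well defined after longjmp either way. */
  jmp_buf * volatile save = demo_error_buf;
  int result;

  demo_error_buf = &fresh;            /* install the fresh handler */
  if (setjmp(fresh)) {
    demo_error_buf = save;            /* error path: restore outer handler */
    return -1;
  }
  result = demo_work(fail);
  demo_error_buf = save;              /* normal path: restore outer handler */
  return result;
}
```

Restoring the saved pointer on both paths is the important part. If the error path returned without reinstalling save, the handler pointer would be left aimed at a jmp_buf in a stack frame that no longer exists, and the next error would jump into it.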

Under Mac OS X, or under Windows when MzScheme is compiled to a DLL using Cygwin, the garbage collector cannot find static variables automatically. In that case, scheme_set_stack_base must be called with a non-zero second argument before calling any scheme_ function.

Under Windows (for any other build mode), the garbage collector finds static variables in an embedding program by examining all memory pages. This strategy fails if a program contains multiple Windows threads; a page may get unmapped by a thread while the collector is examining the page, causing the collector to crash. To avoid this problem, call scheme_set_stack_base with a non-zero second argument before calling any scheme_ function.

When an embedding application calls scheme_set_stack_base with a non-zero second argument, it must register each of its static variables with MZ_REGISTER_STATIC if the variable can contain a GCable pointer. For example, if e above is made static, then MZ_REGISTER_STATIC(e) should be inserted before the call to scheme_basic_env.

When building an embedded MzSchemeCGC to use SenoraGC (SGC) instead of the default collector, scheme_set_stack_base must be called both with a non-zero second argument and with a stack-base pointer in the first argument. See section 3 for more information.

1.3.2  3m Embedding

MzScheme3m can be embedded in mostly the same way as MzSchemeCGC, as long as the embedding program cooperates with the precise garbage collector as described in section 3.1.

Define MZ_PRECISE_GC before including scheme.h, either with #define in your source or on the compiler command line. When using mzc with the --cc and --3m flags, MZ_PRECISE_GC is automatically defined.

In addition, some library details are different:

For MzScheme3m, an embedding application must call scheme_set_stack_base with non-zero arguments. Furthermore, the first argument must be &__gc_var_stack__, where __gc_var_stack__ is bound by a MZ_GC_DECL_REG.

The simple embedding program from the previous section can be extended to work with either CGC or 3m, depending on whether MZ_PRECISE_GC is specified on the compiler's command line:

#include "scheme.h"

int main(int argc, char *argv[])
{
  Scheme_Env *e = NULL;
  Scheme_Object *curout = NULL, *v = NULL;
  Scheme_Config *config = NULL;
  int i;
  mz_jmp_buf * volatile save = NULL, fresh;

  MZ_GC_DECL_REG(5);
  MZ_GC_VAR_IN_REG(0, e);
  MZ_GC_VAR_IN_REG(1, curout);
  MZ_GC_VAR_IN_REG(2, save);
  MZ_GC_VAR_IN_REG(3, config);
  MZ_GC_VAR_IN_REG(4, v);

# ifdef MZ_PRECISE_GC
#  define STACK_BASE &__gc_var_stack__
# else
#  define STACK_BASE NULL
# endif

  scheme_set_stack_base(STACK_BASE, 1);

  MZ_GC_REG();

  e = scheme_basic_env();

  config = scheme_current_config();
  curout = scheme_get_param(config, MZCONFIG_OUTPUT_PORT);

  for (i = 1; i < argc; i++) {
    save = scheme_current_thread->error_buf;
    scheme_current_thread->error_buf = &fresh;
    if (scheme_setjmp(scheme_error_buf)) {
      scheme_current_thread->error_buf = save;
      MZ_GC_UNREG(); /* unregister before every return */
      return -1; /* There was an error */
    } else {
      v = scheme_eval_string(argv[i], e);
      scheme_display(v, curout);
      v = scheme_make_character('\n');
      scheme_display(v, curout);
      /* read-eval-print loop, implicitly uses the initial Scheme_Env: */
      v = scheme_builtin_value("read-eval-print-loop");
      scheme_apply(v, 0, NULL);
      scheme_current_thread->error_buf = save;
    }
  }

  MZ_GC_UNREG();

  return 0;
}

Strictly speaking, the config and v variables above need not be registered with the garbage collector, since their values are not needed across function calls that allocate. That is, the original example could have been left alone starting with the scheme_basic_env call, except for the addition of MZ_GC_UNREG. The code is much easier to maintain, however, when all local variables are registered and when all temporary values are put into variables.

1.4  MzScheme and Threads

MzScheme implements threads for Scheme programs without aid from the operating system, so that MzScheme threads are cooperative from the perspective of C code. Under Unix, stand-alone MzScheme uses a single OS-implemented thread. Under Windows and Mac OS X, stand-alone MzScheme uses a few private OS-implemented threads for background tasks, but these OS-implemented threads are never exposed by the MzScheme API.

In an embedding application, MzScheme can co-exist with additional OS-implemented threads, but the additional OS threads must not call any scheme_ function. Only the OS thread that originally calls scheme_basic_env can call scheme_ functions.2 When scheme_basic_env is called a second time to reset the interpreter, it can be called in an OS thread that is different from the original call to scheme_basic_env. Thereafter, all calls to scheme_ functions must originate from the new thread.
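An embedding application that runs its own OS threads might enforce this rule in debug builds. It could record which OS thread called scheme_basic_env and check that identity before each scheme_ call. The sketch below is one such approach using POSIX threads. The demo_ functions are hypothetical helpers, not part of the MzScheme API, and the actual scheme_ calls are omitted so the sketch stands alone.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical bookkeeping (not MzScheme API): the OS thread currently
   allowed to call scheme_ functions. */
static pthread_t demo_scheme_owner;
static int demo_owner_valid = 0;

/* Call right after scheme_basic_env, including after a second, resetting
   call made from a different OS thread: ownership moves to the caller. */
void demo_claim_scheme_thread(void)
{
  demo_scheme_owner = pthread_self();
  demo_owner_valid = 1;
}

/* Returns 1 only on the thread that may call scheme_ functions, else 0. */
int demo_may_call_scheme(void)
{
  return demo_owner_valid && pthread_equal(demo_scheme_owner, pthread_self());
}

/* Worker body for illustration: records what the guard says on this thread. */
static void *demo_worker(void *result)
{
  *(int *)result = demo_may_call_scheme();
  return NULL;
}

/* Runs the guard on a fresh OS thread; returns its answer, or -1 on failure. */
int demo_check_from_worker(void)
{
  pthread_t t;
  int result = -1;
  if (pthread_create(&t, NULL, demo_worker, &result) != 0)
    return -1;
  pthread_join(t, NULL);
  return result;
}
```

In an embedding program, demo_claim_scheme_thread() would be called right after scheme_basic_env, and again after a resetting call from another thread. A worker thread for which demo_may_call_scheme() returns 0 would have to hand its request to the owning thread instead of calling a scheme_ function itself.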

See section 8 for more information about threads, including the possible effects of MzScheme's thread implementation on extension and embedding C code.

1.5  MzScheme, Unicode, Characters, and Strings

A character in MzScheme is a Unicode code point. In C, a character value has type mzchar, which is an alias for unsigned; in a properly compiled MzScheme, unsigned is 4 bytes. Thus, a mzchar* string is effectively a UCS-4 string.

Only a few MzScheme functions use mzchar*. Instead, most functions accept char* strings. When such byte strings are to be used as character strings, they are interpreted as UTF-8 encodings. A plain ASCII string is always acceptable in such cases, since the UTF-8 encoding of an ASCII string is the string itself.
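A char* argument is read as UTF-8, while a mzchar* string holds one 4-byte code point per element, so converting from the first form to the second is a UTF-8 decode. The sketch below is a minimal decoder. It is not MzScheme's own conversion routine; demo_mzchar and demo_utf8_to_ucs4 are hypothetical names. It assumes unsigned is 4 bytes, as the text above requires, and for brevity it does not reject overlong encodings or surrogate code points.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for mzchar: an alias for unsigned (4 bytes). */
typedef unsigned int demo_mzchar;

/* Decodes a NUL-terminated UTF-8 string into at most cap code points.
   Returns the number of code points written, or -1 on a malformed
   sequence or if out is too small. */
long demo_utf8_to_ucs4(const char *in, demo_mzchar *out, size_t cap)
{
  const unsigned char *s = (const unsigned char *)in;
  size_t n = 0;

  while (*s) {
    demo_mzchar c;
    int extra;                                   /* continuation bytes */

    if (*s < 0x80)                { c = *s;        extra = 0; }  /* ASCII */
    else if ((*s & 0xE0) == 0xC0) { c = *s & 0x1F; extra = 1; }
    else if ((*s & 0xF0) == 0xE0) { c = *s & 0x0F; extra = 2; }
    else if ((*s & 0xF8) == 0xF0) { c = *s & 0x07; extra = 3; }
    else return -1;                              /* invalid lead byte */
    s++;

    while (extra-- > 0) {
      if ((*s & 0xC0) != 0x80)                   /* also catches early NUL */
        return -1;
      c = (c << 6) | (*s & 0x3F);
      s++;
    }
    if (n == cap)
      return -1;
    out[n++] = c;
  }
  return (long)n;
}
```

An ASCII string decodes to exactly its own byte values, one code point per byte, which is why plain ASCII is always acceptable where UTF-8 is expected.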

See also section 2.3 and section 11.

1.6  Integers

MzScheme expects to be compiled in a mode where short is a 16-bit integer, int is a 32-bit integer, and long has the same number of bits as void*. The mzlonglong type has 64 bits for compilers that support a 64-bit integer type, otherwise it is the same as long; thus, mzlonglong tends to match long long. The umzlonglong type is the unsigned version of mzlonglong.
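As a sketch of these expectations, the checks below restate them as C11 compile-time assertions. They also mirror the mzlonglong fallback rule with hypothetical demo_ typedefs; the real types come from scheme.h. Testing for a 64-bit type via LLONG_MAX is an assumption of this sketch, not necessarily how scheme.h decides. C99 guarantees that long long, when available, has at least 64 bits.

```c
#include <assert.h>   /* static_assert (C11) */
#include <limits.h>   /* CHAR_BIT, LLONG_MAX */

/* The integer model described above, checked at compile time. On an LLP64
   platform such as 64-bit Windows, long is 32 bits but void* is 64 bits,
   so the third assertion fails there. */
static_assert(sizeof(short) * CHAR_BIT == 16, "short must be 16 bits");
static_assert(sizeof(int) * CHAR_BIT == 32, "int must be 32 bits");
static_assert(sizeof(long) == sizeof(void *), "long must be as wide as void*");

/* Hypothetical mirror of mzlonglong/umzlonglong: long long when the
   compiler provides it, otherwise the same as long. */
#ifdef LLONG_MAX
typedef long long demo_mzlonglong;
typedef unsigned long long demo_umzlonglong;
#else
typedef long demo_mzlonglong;
typedef unsigned long demo_umzlonglong;
#endif

/* Runtime restatement of the same rules, e.g. for a startup self-check. */
int demo_integer_model_ok(void)
{
  return sizeof(short) * CHAR_BIT == 16
      && sizeof(int) * CHAR_BIT == 32
      && sizeof(long) == sizeof(void *)
      && sizeof(demo_umzlonglong) == sizeof(demo_mzlonglong);
}
```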


1 The C preprocessor symbol SCHEME_DIRECT_EMBEDDED is defined as 1 when scheme.h is #included, or as 0 when escheme.h is #included.

2 This restriction is stronger than saying all calls must be serialized across threads. MzScheme relies on properties of specific threads to avoid stack overflow and garbage collection.