Memory Allocation

MzScheme uses both malloc and allocation functions provided by a garbage collector. Embedding/extension C/C++ code may use either allocation method, keeping in mind that pointers to garbage-collectable blocks in malloced memory are invisible (i.e., such pointers will not prevent the block from being garbage-collected).

MzSchemeCGC uses a conservative garbage collector. This garbage collector normally only recognizes pointers to the beginning of allocated objects. Thus, a pointer into the middle of a GC-allocated string will normally not keep the string from being collected. The exception to this rule is that pointers saved on the stack or in registers may point to the middle of a collectable object. Thus, it is safe to loop over an array by incrementing a local pointer variable.
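The local-pointer exception can be illustrated with a plain C loop. This is a minimal sketch: sketch_count_spaces is a hypothetical helper, and buf merely stands in for a block obtained from one of the collector's allocation functions.

```c
#include <stddef.h>

/* Hypothetical helper, not part of MzScheme. buf stands in for a
   collector-allocated block of len bytes. */
size_t sketch_count_spaces(const char *buf, size_t len)
{
  const char *p;               /* local: lives on the stack or in a register */
  const char *end = buf + len;
  size_t n = 0;

  /* p points into the middle of the block for most of the loop. Under the
     conservative collector that is acceptable, because p is a local. */
  for (p = buf; p < end; p++) {
    if (*p == ' ')
      n++;
  }
  return n;
}
```

By contrast, copying p into a heap-allocated record and dropping every pointer to the start of the block would, per the rule above, normally leave the block eligible for collection.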

MzScheme3m uses a precise garbage collector that moves objects during collection, in which case the C code must be instrumented to expose local pointer bindings to the collector, and to provide tracing procedures for (tagged) records containing pointers. This instrumentation is described further in section 3.1.

The basic collector allocation functions are:

If a MzScheme extension stores Scheme pointers in a global or static variable, then that variable must be registered with scheme_register_extension_global; this makes the pointer visible to the garbage collector. Registered variables need not contain a collectable pointer at all times (even with 3m, but the variable must contain some pointer, possibly uncollectable, at all times).

With conservative collection, no registration is needed for the global or static variables of an embedding program, unless it calls scheme_set_stack_base with a non-zero second argument (see footnote 3). In that case, global and static variables containing collectable pointers must be registered with scheme_register_static. The MZ_REGISTER_STATIC macro takes any variable name and registers it with scheme_register_static. The scheme_register_static function can be safely called even when it's not needed, but it must not be called multiple times for a single memory address.

Collectable memory can be temporarily locked from collection by using the reference-counting function scheme_dont_gc_ptr. Under 3m, such locking does not prevent the object from being moved.

Garbage collection can occur during any call into MzScheme or its allocator, or at any time that MzScheme has control, except during functions that are documented otherwise. The predicate and accessor macros listed in section 2.1 never trigger a collection.

3.1  Cooperating with 3m

To allow 3m's precise collector to detect and update pointers during garbage collection, all pointer values must be registered with the collector, at least during the times that a collection may occur. The content of a word registered as a pointer must contain either NULL, a pointer to the start of a collectable object, a pointer into an object allocated by scheme_malloc_allow_interior, a pointer to an object currently allocated by another memory manager (and therefore not into a block that is currently managed by the collector), or a pointer to an odd-numbered address (e.g., a MzScheme fixnum).
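Two of the permitted cases, NULL and an odd-numbered address, can be recognized from the word alone. The following is a minimal sketch of such a check; both helpers are hypothetical and are not part of the MzScheme API. The remaining cases can only be decided by the memory manager that owns the address.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: 1 when the word holds an odd-numbered address
   (the fixnum-like case in the rule above), 0 otherwise. */
int sketch_is_odd_address(const void *p)
{
  return (int)((uintptr_t)p & 1);
}

/* Hypothetical helper: 1 when the word is acceptable in a registered
   slot without consulting any allocator, i.e. NULL or an odd address. */
int sketch_is_trivially_valid(const void *p)
{
  return p == NULL || sketch_is_odd_address(p);
}
```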

Pointers are registered in three different ways:

A pointer must never refer to the interior of an allocated object (when a garbage collection is possible), unless the object was allocated with scheme_malloc_allow_interior. For this reason, pointer arithmetic must usually be avoided, unless the variable holding the generated pointer is NULLed before a collection.

IMPORTANT: The SCHEME_SYM_VAL, SCHEME_KEYWORD_VAL, SCHEME_VEC_ELS, and SCHEME_PRIM_CLOSURE_ELS macros produce pointers into the middle of their respective objects, so the results of these macros must not be held during the time that a collection can occur. Incorrectly retaining such a pointer can lead to a crash.

3.1.1  Tagged Objects

As explained in section 2, the scheme_make_type function can be used to obtain a new tag for a new type of object. These new types are in relatively short supply for 3m; the maximum tag is 255, and MzScheme itself uses nearly 200.

After allocating a new tag in 3m (and before creating instances of the tag), a size procedure, a mark procedure, and a fixup procedure must be installed for the tag using GC_register_traversers.

A size procedure simply takes a pointer to an object with the tag and returns its size in words (not bytes). The gcBYTES_TO_WORDS macro converts a byte count to a word count.

A mark procedure is used to trace references among objects without moving any objects. The procedure takes a pointer to an object, and it should apply the gcMARK macro to every pointer within the object. The mark procedure should return the same result as the size procedure.

A fixup procedure is used to update references to objects after or while they are moved. The procedure takes a pointer to an object, and it should apply the gcFIXUP macro to every pointer within the object; the expansion of this macro takes the address of its argument. The fixup procedure should return the same result as the size procedure.

Depending on the collector's implementation, the mark or fixup procedure might not be used. For example, the collector may only use the mark procedure and not actually move the object. Or it may use the fixup procedure to mark and move objects at the same time. To dereference an object pointer during a fixup procedure, use GC_fixup_self to convert the address passed to the procedure to refer to the potentially moved object, and use GC_resolve to convert an address that is not yet fixed up to determine the object's current location.
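The three procedures for a small tagged record might look like the following sketch. The record type and the sketch_ names are hypothetical. The gcMARK, gcFIXUP, and gcBYTES_TO_WORDS definitions here are placeholders so that the sketch compiles without the MzScheme headers; real 3m code uses the collector's own definitions, and the round-up behavior of the byte-to-word conversion is an assumption.

```c
#include <stddef.h>

/* Placeholders only: each records a visit instead of talking to a
   collector. The fixup placeholder takes the address of its argument,
   as the text above describes for the real macro. */
static int sketch_visits = 0;
#define gcMARK(x)  (sketch_visits++, (void)(x))
#define gcFIXUP(x) (sketch_visits++, (void)&(x))
#define gcBYTES_TO_WORDS(n) (((n) + sizeof(void *) - 1) / sizeof(void *))

/* Hypothetical tagged record: two pointer fields and one non-pointer. */
typedef struct {
  short type_tag;
  void *name;   /* pointer: must be marked and fixed up */
  int count;    /* not a pointer: skipped by both procedures */
  void *next;   /* pointer: must be marked and fixed up */
} Sketch_Record;

/* Size procedure: the size in words, not bytes. */
int sketch_size(void *obj)
{
  (void)obj;
  return (int)gcBYTES_TO_WORDS(sizeof(Sketch_Record));
}

/* Mark procedure: visits every pointer, returns the size. */
int sketch_mark(void *obj)
{
  Sketch_Record *r = (Sketch_Record *)obj;
  gcMARK(r->name);
  gcMARK(r->next);
  return sketch_size(obj);
}

/* Fixup procedure: same traversal with gcFIXUP, returns the size. */
int sketch_fixup(void *obj)
{
  Sketch_Record *r = (Sketch_Record *)obj;
  gcFIXUP(r->name);
  gcFIXUP(r->next);
  return sketch_size(obj);
}
```

In an actual extension these would be installed with GC_register_traversers, passing a non-zero is_const_size (the size never varies) and a zero is_atomic (the record contains pointers).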

When allocating a tagged object in 3m, the tag must be installed immediately after the object is allocated -- or, at least, before the next possible collection.

3.1.2  Local Pointers

The 3m collector needs to know the address of every local or temporary pointer within a function call at any point when a collection can be triggered. Beware that nested function calls can hide temporary pointers; for example, in

  scheme_make_pair(scheme_make_pair(scheme_true, scheme_false),
                   scheme_make_pair(scheme_false, scheme_true))

the result from one scheme_make_pair call is on the stack or in a register during the other call to scheme_make_pair; this pointer must be exposed to the garbage collector and made subject to update. Simply changing the code to

  tmp = scheme_make_pair(scheme_true, scheme_false);
  scheme_make_pair(tmp,
                   scheme_make_pair(scheme_false, scheme_true))

does not expose all pointers, since tmp must be evaluated before the second call to scheme_make_pair. In general, the above code must be converted to the form

  tmp1 = scheme_make_pair(scheme_true, scheme_false);
  tmp2 = scheme_make_pair(scheme_false, scheme_true);
  scheme_make_pair(tmp1, tmp2);

and this converted form must be instrumented to register tmp1 and tmp2. The final result might be

  {
    Scheme_Object *tmp1 = NULL, *tmp2 = NULL, *result;
    MZ_GC_DECL_REG(2);

    MZ_GC_VAR_IN_REG(0, tmp1);
    MZ_GC_VAR_IN_REG(1, tmp2);
    MZ_GC_REG();

    tmp1 = scheme_make_pair(scheme_true, scheme_false);
    tmp2 = scheme_make_pair(scheme_false, scheme_true);
    result = scheme_make_pair(tmp1, tmp2);

    MZ_GC_UNREG();

    return result;
  }

Notice that result is not registered above. The MZ_GC_UNREG macro cannot trigger a garbage collection, so the result variable is never live during a potential collection. Note also that tmp1 and tmp2 are initialized with NULL, so that they always contain a pointer whenever a collection is possible.

The MZ_GC_DECL_REG macro expands to a local-variable declaration to hold information for the garbage collector. The argument is the number of slots to provide for registration. Registering a simple pointer requires a single slot, whereas registering an array of pointers requires three slots. For example, to register a pointer tmp1 and an array a of 10 char *s:

  {
    Scheme_Object *tmp1 = NULL;
    char *a[10];
    int i;
    MZ_GC_DECL_REG(4);

    MZ_GC_ARRAY_VAR_IN_REG(0, a, 10);
    MZ_GC_VAR_IN_REG(3, tmp1);
    /* Clear a before a potential GC: */
    for (i = 0; i < 10; i++) a[i] = NULL;
    ...
    f(a);
    ...
  }

The MZ_GC_ARRAY_VAR_IN_REG macro registers a local array given a starting slot, the array variable, and an array size. The MZ_GC_VAR_IN_REG macro takes a slot and a simple pointer variable. A local variable or array must not be registered multiple times.

In the above example, the first argument to MZ_GC_VAR_IN_REG is 3 because the information for a uses the first three slots. Even if a is not used after the call to f, a must be registered with the collector during the entire call to f, because f presumably uses a until it returns.

The name used for a variable need not be immediate. Structure members can be supplied as well:

  {
    struct { void *s; int v; void *t; } x = {NULL, 0, NULL};
    MZ_GC_DECL_REG(2);

    MZ_GC_VAR_IN_REG(0, x.s);
    MZ_GC_VAR_IN_REG(1, x.t);
    ...
  }

In general, the only constraint on the second argument to MZ_GC_VAR_IN_REG or MZ_GC_ARRAY_VAR_IN_REG is that & must produce the relevant address.

Pointer information is not actually registered with the collector until the MZ_GC_REG macro is used. The MZ_GC_UNREG macro de-registers the information. Each call to MZ_GC_REG must be balanced by one call to MZ_GC_UNREG.
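The balancing requirement is easier to see with a toy model. The sketch below is not MzScheme's implementation; it assumes, purely for illustration, that each declaration creates a frame of slots and that registering pushes the frame onto a global chain which a collector could walk. All SKETCH_ and sketch_ names are hypothetical.

```c
#include <stddef.h>

/* Toy model only: head of the chain of registered frames. */
void **sketch_frame_chain = NULL;

/* Frame layout in this model: [0] previous frame, [1] slot count,
   [2..] addresses of the registered variables. */
#define SKETCH_DECL_REG(n) \
  void *sketch_frame[(n) + 2] = { NULL, (void *)(size_t)(n) }
#define SKETCH_VAR_IN_REG(i, v) (sketch_frame[(i) + 2] = (void *)&(v))
#define SKETCH_REG() \
  (sketch_frame[0] = (void *)sketch_frame_chain, \
   sketch_frame_chain = sketch_frame)
#define SKETCH_UNREG() (sketch_frame_chain = (void **)sketch_frame[0])

/* Counts the frames currently on the chain. */
int sketch_chain_depth(void)
{
  int depth = 0;
  void **f;
  for (f = sketch_frame_chain; f != NULL; f = (void **)f[0])
    depth++;
  return depth;
}

/* Registers two locals, observes the chain, then unregisters. */
int sketch_demo(void)
{
  void *tmp1 = NULL, *tmp2 = NULL;
  int depth_inside;
  SKETCH_DECL_REG(2);

  SKETCH_VAR_IN_REG(0, tmp1);   /* filling slots does not register them */
  SKETCH_VAR_IN_REG(1, tmp2);
  SKETCH_REG();                 /* the frame becomes visible here */
  depth_inside = sketch_chain_depth();
  SKETCH_UNREG();               /* the chain must be restored before return */
  return depth_inside;
}
```

In this model, omitting the final unregistration would leave the chain pointing at a frame on a dead stack, which is one way to picture why every registration needs a matching de-registration.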

Pointer information need not be initialized with MZ_GC_VAR_IN_REG and MZ_GC_ARRAY_VAR_IN_REG before calling MZ_GC_REG, and the set of registered pointers can change at any time -- as long as all relevant pointers are registered when a collection might occur. The following example recycles slots and completely de-registers information when no pointers are relevant. The example also illustrates how MZ_GC_UNREG is not needed when control escapes from the function, such as when scheme_signal_error escapes.

  {
    Scheme_Object *tmp1 = NULL, *tmp2 = NULL;
    mzchar *a, *b;
    MZ_GC_DECL_REG(2);

    MZ_GC_VAR_IN_REG(0, tmp1);
    MZ_GC_VAR_IN_REG(1, tmp2);
    
    tmp1 = scheme_make_utf8_string("foo");
    MZ_GC_REG();
    tmp2 = scheme_make_utf8_string("bar");
    tmp1 = scheme_append_char_string(tmp1, tmp2);

    if (SCHEME_FALSEP(tmp1))
      scheme_signal_error("shouldn't happen!");

    a = SCHEME_CHAR_VAL(tmp1);

    MZ_GC_VAR_IN_REG(0, a);

    tmp2 = scheme_make_pair(scheme_read_bignum(a, 0, 10), tmp2);

    MZ_GC_UNREG();

    if (SCHEME_INTP(tmp2)) {
      return 0;
    }

    MZ_GC_REG();
    tmp1 = scheme_make_pair(scheme_read_bignum(a, 0, 8), tmp2);
    MZ_GC_UNREG();

    return tmp1;
  }

An MZ_GC_DECL_REG can be used in a nested block to hold declarations for the block's variables. In that case, the nested MZ_GC_DECL_REG must have its own MZ_GC_REG and MZ_GC_UNREG calls.

  {
    Scheme_Object *accum = NULL;
    MZ_GC_DECL_REG(1);
    MZ_GC_VAR_IN_REG(0, accum);
    MZ_GC_REG();

    accum = scheme_make_pair(scheme_true, scheme_null);
    {
      Scheme_Object *tmp = NULL;
      MZ_GC_DECL_REG(1);
      MZ_GC_VAR_IN_REG(0, tmp);
      MZ_GC_REG();

      tmp = scheme_make_pair(scheme_true, scheme_false);
      accum = scheme_make_pair(tmp, accum);

      MZ_GC_UNREG();
    }
    accum = scheme_make_pair(scheme_true, accum);

    MZ_GC_UNREG();
    return accum;
  }

Variables declared in a local block can also be registered together with variables from an enclosing block, but the local-block variable must be unregistered before it goes out of scope. The MZ_GC_NO_VAR_IN_REG macro can be used to unregister a variable or to initialize a slot as having no variable.

  {
    Scheme_Object *accum = NULL;
    MZ_GC_DECL_REG(2);
    MZ_GC_VAR_IN_REG(0, accum);
    MZ_GC_NO_VAR_IN_REG(1);
    MZ_GC_REG();

    accum = scheme_make_pair(scheme_true, scheme_null);
    {
      Scheme_Object *tmp = NULL;
      MZ_GC_VAR_IN_REG(1, tmp);

      tmp = scheme_make_pair(scheme_true, scheme_false);
      accum = scheme_make_pair(tmp, accum);

      MZ_GC_NO_VAR_IN_REG(1);
    }
    accum = scheme_make_pair(scheme_true, accum);

    MZ_GC_UNREG();
    return accum;
  }

The MZ_GC_ macros all expand to nothing when MZ_PRECISE_GC is not defined, so the macros can be placed into code to be compiled for both conservative and precise collection.

The MZ_GC_REG and MZ_GC_UNREG macros must never be used in an OS thread other than MzScheme's thread.

3.1.3  Local Pointers and mzc --xform

When mzc is run with the --xform flag and a source C program, it produces a C program that is instrumented in the way described in the previous section (but with a slightly different set of macros). For each input file name.c, the transformed output is name.3m.c.

The --xform mode for mzc does not change allocation calls, nor does it generate size, mark, or fixup procedures. It merely converts the code to register local pointers.

Furthermore, the --xform mode for mzc does not handle all of C. Its ability to rearrange compound expressions is particularly limited, because --xform merely converts expression text heuristically instead of parsing C. A future version of the tool will correct such problems. For now, mzc in --xform mode attempts to provide reasonable error messages when it is unable to convert a program, but beware that it can miss cases. To an even more limited degree, --xform can work on C++ code. Inspect the output of --xform mode to ensure that your code is correctly instrumented.

Some specific limitations:

3.1.4  Guiding mzc --xform

The following macros can be used (with care!) to navigate --xform around code that it cannot handle:

3.2  Library Functions

¤ void *scheme_malloc(size_t n)

Allocates n bytes of collectable memory, initially filled with zeros. In 3m, the allocated object is treated as an array of pointers.

¤ void *scheme_malloc_atomic(size_t n)

Allocates n bytes of collectable memory containing no pointers visible to the garbage collector. The object is not initialized to zeros.

¤ void *scheme_malloc_uncollectable(size_t n)

Non-3m, only. Allocates n bytes of uncollectable memory.

¤ void *scheme_malloc_eternal(size_t n)

Allocates uncollectable atomic memory. This function is equivalent to malloc, except that the memory cannot be freed.

¤ void *scheme_calloc(size_t num, size_t size)

Allocates num * size bytes of memory using scheme_malloc.

¤ void *scheme_malloc_tagged(size_t n)

Like scheme_malloc, but in 3m, the type tag determines how the garbage collector traverses the object; see section 3.1.1.

¤ void *scheme_malloc_allow_interior(size_t n)

Like scheme_malloc, but in 3m, pointers are allowed to reference the middle of the object; see section 3.1.

¤ char *scheme_strdup(char *str)

Copies the null-terminated string str; the copy is collectable.

¤ char *scheme_strdup_eternal(char *str)

Copies the null-terminated string str; the copy will never be freed.

¤ void *scheme_malloc_fail_ok(void *(*mallocf)(size_t size), size_t size)

Attempts to allocate size bytes using mallocf. If the allocation fails, the exn:misc:out-of-memory exception is raised.

¤ void **scheme_malloc_immobile_box(void *p)

Allocates memory that is not garbage-collected and that does not move (even with 3m), but whose first word contains a pointer to a collectable object. The box is initialized with p, but the value can be changed at any time. An immobile box must be explicitly freed using scheme_free_immobile_box.

¤ void scheme_free_immobile_box(void **b)

Frees an immobile box allocated with scheme_malloc_immobile_box.

¤ void scheme_register_extension_global(void *ptr, long size)

Registers an extension's global variable that can contain Scheme pointers. The address of the global is given in ptr, and its size in bytes in size. In addition to global variables, this function can be used to register any permanent memory that the collector would otherwise treat as atomic. A garbage collection can occur during the registration.

¤ void scheme_set_stack_base(void *stack_addr, int no_auto_statics)

Overrides the GC's auto-determined stack base, and/or disables the GC's automatic traversal of global and static variables. If stack_addr is NULL, the stack base determined by the GC is used. Otherwise, it should be the "deepest" memory address on the stack where a collectable pointer might be stored. This function should be called only once, and before any other scheme_ function is called. It never triggers a garbage collection.

The following example shows a typical use for setting the stack base:

    int main(int argc, char **argv) {
       int dummy;
       scheme_set_stack_base(&dummy, 0);
       real_main(argc, argv); /* calls scheme_basic_env(), etc. */
    }

¤ void scheme_set_stack_bounds(void *stack_addr, void *stack_end, int no_auto_statics)

Like scheme_set_stack_base, except for the extra stack_end argument. If stack_end is non-NULL, then it corresponds to a point of C-stack growth after which MzScheme should attempt to handle stack overflow. The stack_end argument should not correspond to the actual stack end, since detecting stack overflow may take a few frames, and since handling stack overflow requires a few frames.

If stack_end is NULL, then the stack end is computed automatically: the stack size is assumed to be the limit reported by getrlimit under Unix and Mac OS X, or it is assumed to be 1 MB under Windows; if this size is greater than 8 MB, then 8 MB is assumed instead; the size is decremented by 50000 bytes to cover a large margin of error; finally, the size is subtracted from (for stacks that grow down) or added to (for stacks that grow up) the stack base in stack_addr or the automatically computed stack base. Note that the 50000-byte margin of error is assumed to cover the difference between the actual stack start and the reported stack base, in addition to the margin needed for detecting and handling stack overflow.
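The arithmetic of the automatic computation can be sketched as follows. The function and macro names are hypothetical, the reported size is passed in rather than queried from the operating system, and 8 MB is taken to mean 8 * 1024 * 1024 bytes.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_STACK_CAP    (8UL * 1024UL * 1024UL)  /* 8 MB cap */
#define SKETCH_STACK_MARGIN 50000UL                  /* margin of error */

/* Hypothetical helper, not a MzScheme function. reported_size is the
   limit from getrlimit, or 1 MB under Windows; grows_down is non-zero
   for stacks that grow toward lower addresses. */
uintptr_t sketch_stack_end(uintptr_t stack_base,
                           unsigned long reported_size,
                           int grows_down)
{
  unsigned long size = reported_size;

  if (size > SKETCH_STACK_CAP)
    size = SKETCH_STACK_CAP;

  /* Assumption: the limit is larger than the margin, so the
     subtraction below cannot wrap around. */
  assert(size > SKETCH_STACK_MARGIN);
  size -= SKETCH_STACK_MARGIN;

  return grows_down ? stack_base - size : stack_base + size;
}
```

For a 1 MB limit on a downward-growing stack, the computed end lies 1048576 - 50000 = 998576 bytes below the base; any limit above 8 MB yields 8388608 - 50000 = 8338608 bytes.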

¤ void scheme_register_static(void *ptr, long size)

Like scheme_register_extension_global, for use in embedding applications in situations where the collector does not automatically find static variables (i.e., when scheme_set_stack_base has been called with a non-zero second argument).

The macro MZ_REGISTER_STATIC can be used directly on a static variable. It expands to a comment if statics need not be registered, and a call to scheme_register_static (with the address of the static variable) otherwise.

¤ void scheme_weak_reference(void **p)

Registers the pointer *p as a weak pointer; when no other (non-weak) pointers reference the same memory as *p references, then *p will be set to NULL by the garbage collector. The value in *p may change, but the pointer remains weak with respect to the value of *p at the time p was registered.

¤ void scheme_weak_reference_indirect(void **p, void *v)

Like scheme_weak_reference, but *p is cleared (regardless of its value) when there are no references to v.

¤ void scheme_register_finalizer(void *p, void (*f)(void *p, void *data), void *data,
   void (**oldf)(void *p, void *data), void **olddata)

Registers a callback function to be invoked when the memory p would otherwise be garbage-collected, and when no "will"-like finalizers are registered for p.

The f argument is the callback function; when it is called, it will be passed the value p and the data pointer data; data can be anything -- it is only passed on to the callback function. If oldf and olddata are not NULL, then *oldf and *olddata are filled with the old callback information (f and data will override this old callback).

To remove a registered finalizer, pass NULL for f and data.

Note: registering a callback not only keeps p from collection until the callback is invoked, but it also keeps data reachable until the callback is invoked.

¤ void scheme_add_finalizer(void *p, void (*f)(void *p, void *data), void *data)

Adds a finalizer to a chain of primitive finalizers. This chain is separate from the single finalizer installed with scheme_register_finalizer; all finalizers in the chain are called immediately after a finalizer that is installed with scheme_register_finalizer.

See scheme_register_finalizer, above, for information about the arguments.

To remove an added finalizer, use scheme_subtract_finalizer.

¤ void scheme_add_scheme_finalizer(void *p, void (*f)(void *p, void *data), void *data)

Installs a "will"-like finalizer, similar to will-register. Scheme finalizers are called one at a time, requiring the collector to prove that a value has become inaccessible again before calling the next Scheme finalizer. Finalizers registered with scheme_register_finalizer or scheme_add_finalizer are not called until all Scheme finalizers have been exhausted.

See scheme_register_finalizer, above, for information about the arguments.

There is currently no facility to remove a "will"-like finalizer.

¤ void scheme_add_finalizer_once(void *p, void (*f)(void *p, void *data), void *data)

Like scheme_add_finalizer, but if the combination of f and data is already registered as a (non-"will"-like) finalizer for p, it is not added a second time.

¤ void scheme_add_scheme_finalizer_once(void *p, void (*f)(void *p, void *data), void *data)

Like scheme_add_scheme_finalizer, but if the combination of f and data is already registered as a "will"-like finalizer for p, it is not added a second time.

¤ void scheme_subtract_finalizer(void *p, void (*f)(void *p, void *data), void *data)

Removes a finalizer that was installed with scheme_add_finalizer.

¤ void scheme_remove_all_finalization(void *p)

Removes all finalization ("will"-like or not) for p, including wills added in Scheme with will-register and finalizers used by custodians.

¤ void scheme_dont_gc_ptr(void *p)

Keeps the collectable block p from garbage collection. Use this procedure when a reference to p is stored somewhere inaccessible to the collector. Once the reference is no longer used from the inaccessible region, de-register the lock with scheme_gc_ptr_ok. A garbage collection can occur during the registration.

This function keeps a reference count on the pointers it registers, so two calls to scheme_dont_gc_ptr for the same p should be balanced with two calls to scheme_gc_ptr_ok.
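The balancing rule can be pictured with a toy lock table. This sketch does not show how MzScheme stores its counts; the table, its fixed capacity, and the sketch_ functions are all hypothetical stand-ins for scheme_dont_gc_ptr and scheme_gc_ptr_ok.

```c
#include <stddef.h>

#define SKETCH_MAX_LOCKS 16

/* Toy table: one entry per locked pointer, with its lock count. */
static struct { void *p; int count; } sketch_locks[SKETCH_MAX_LOCKS];

/* Stand-in for locking: returns the count after the call,
   or -1 if the toy table is full. */
int sketch_lock(void *p)
{
  int i, free_slot = -1;
  for (i = 0; i < SKETCH_MAX_LOCKS; i++) {
    if (sketch_locks[i].count > 0 && sketch_locks[i].p == p)
      return ++sketch_locks[i].count;
    if (sketch_locks[i].count == 0 && free_slot < 0)
      free_slot = i;
  }
  if (free_slot < 0)
    return -1;
  sketch_locks[free_slot].p = p;
  sketch_locks[free_slot].count = 1;
  return 1;
}

/* Stand-in for unlocking: returns the remaining count,
   or -1 if p was not locked. */
int sketch_unlock(void *p)
{
  int i;
  for (i = 0; i < SKETCH_MAX_LOCKS; i++) {
    if (sketch_locks[i].count > 0 && sketch_locks[i].p == p)
      return --sketch_locks[i].count;
  }
  return -1;
}

/* 1 while at least one lock on p is outstanding. */
int sketch_is_locked(void *p)
{
  int i;
  for (i = 0; i < SKETCH_MAX_LOCKS; i++) {
    if (sketch_locks[i].count > 0 && sketch_locks[i].p == p)
      return 1;
  }
  return 0;
}
```

After two locks on the same pointer, a single unlock leaves the pointer locked; only the second unlock releases it.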

¤ void scheme_gc_ptr_ok(void *p)

See scheme_dont_gc_ptr.

¤ void scheme_collect_garbage()

Forces an immediate garbage collection.

¤ void GC_register_traversers(short tag,
   Size_Proc s, Mark_Proc m, Fixup_Proc f,
   int is_const_size, int is_atomic)

3m only. Registers a size, mark, and fixup procedure for a given type tag; see section 3.1.1 for more information.

Each of the three procedures takes a pointer and returns an integer:

  typedef int (*Size_Proc)(void *obj);
  typedef int (*Mark_Proc)(void *obj);
  typedef int (*Fixup_Proc)(void *obj);

If the result of the size procedure is a constant, then pass a non-zero value for is_const_size. If the mark and fixup procedures are no-ops, then pass a non-zero value for is_atomic.


3 Under Mac OS X or with MzScheme3m, scheme_set_stack_base must always be called.