Memory Allocation
MzScheme uses both malloc and allocation functions provided by a garbage collector. Embedding/extension C/C++ code may use either allocation method, keeping in mind that pointers to garbage-collectable blocks in malloced memory are invisible (i.e., such pointers will not prevent the block from being garbage-collected).
By default MzScheme uses a conservative garbage collector. This garbage collector normally only recognizes pointers to the beginning of allocated objects. Thus, a pointer into the middle of a GC-allocated string will normally not keep the string from being collected. The exception to this rule is that pointers saved on the stack or in registers may point to the middle of a collectable object. Thus, it is safe to loop over an array by incrementing a local pointer variable.
MzScheme3m uses a precise garbage collector that moves objects during collection, in which case the C code must be instrumented to expose local pointer bindings to the collector, and to provide tracing procedures for (tagged) records containing pointers. This instrumentation is described further in section 3.1.
The basic collector allocation functions are:
scheme_malloc -- Allocates collectable memory that may contain pointers to collectable objects; for 3m, the memory must be an array of pointers (though not necessarily to collectable objects). The newly allocated memory is initially zeroed.
scheme_malloc_atomic -- Allocates collectable memory that does not contain pointers to collectable objects. If the memory does contain pointers, they are invisible to the collector and will not prevent an object from being collected. Newly allocated atomic memory is not necessarily zeroed.
Atomic memory is used for strings or other blocks of memory which do not contain pointers. Atomic memory can also be used to store intentionally-hidden pointers.
scheme_malloc_tagged -- Allocates collectable memory that contains a mixture of pointers and atomic data. With the conservative collector, this function is the same as scheme_malloc, but under 3m, the type tag stored at the start of the block is used to determine the size and shape of the object for future garbage collection (as described in section 3.1).
scheme_malloc_allow_interior -- Allocates a large array of pointers such that references are allowed into the middle of the block under 3m, and such pointers prevent the block from being collected. This procedure is the same as scheme_malloc with the conservative collector, but in that case, having only a pointer into the interior will not prevent the array from being collected.
scheme_malloc_uncollectable -- Allocates uncollectable memory that may contain pointers to collectable objects. There is no way to free the memory. The newly allocated memory is initially zeroed. This function is not available in 3m.
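As a rough illustration of choosing between the allocators listed above, the following sketch copies a C string into atomic memory and allocates a zeroed array of pointer slots. The helper names copy_name and make_slots are hypothetical, and the two stand-in definitions exist only so that the fragment compiles without the MzScheme headers; a real extension would drop them and include the MzScheme header instead. The sketch assumes that src points to memory that the collector does not manage.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-ins so that this sketch compiles on its own. In a real extension,
   remove them and include the MzScheme header instead. */
void *scheme_malloc(size_t n) { return calloc(1, n); }     /* zeroed */
void *scheme_malloc_atomic(size_t n) { return malloc(n); } /* not zeroed */

/* Hypothetical helper: copy a C string into atomic memory.
   Assumes src is not managed by the collector. */
char *copy_name(const char *src)
{
  size_t len;
  char *s;
  assert(src != NULL);
  len = strlen(src) + 1;
  s = (char *)scheme_malloc_atomic(len);
  memcpy(s, src, len); /* atomic memory is not zeroed, so fill all of it */
  return s;
}

/* Hypothetical helper: allocate count pointer slots, all initially NULL. */
void **make_slots(size_t count)
{
  return (void **)scheme_malloc(count * sizeof(void *));
}
```

The string goes into atomic memory because it contains no pointers; the slot array uses scheme_malloc because the collector must see whatever pointers are later stored in it.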
If a MzScheme extension stores Scheme pointers in a global or static variable, then that variable must be registered with scheme_register_extension_global; this makes the pointer visible to the garbage collector. Registered variables need not contain a collectable pointer at all times. This holds even with 3m, although under 3m the variable must contain some pointer, possibly to uncollectable memory, at all times.
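A minimal sketch of such a registration follows. The variable cached_value and the function register_extension_state are hypothetical names, and the stand-in definitions (including the recording variables registered_addr and registered_size) exist only so that the fragment compiles without the MzScheme headers.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins so that this sketch compiles on its own. In a real extension,
   remove them and include the MzScheme header instead. */
typedef struct Scheme_Object Scheme_Object;
void *registered_addr; /* stand-in only: records the last registration */
long registered_size;
void scheme_register_extension_global(void *ptr, long size)
{
  registered_addr = ptr;
  registered_size = size;
}

/* Hypothetical extension state: a Scheme value cached across calls.
   It starts as NULL, so it holds some pointer at all times. */
static Scheme_Object *cached_value = NULL;

/* Call once, before storing a collectable pointer in cached_value. */
void register_extension_state(void)
{
  scheme_register_extension_global(&cached_value, sizeof(cached_value));
}
```

The function receives the address of the variable, not its current value, so the variable can be assigned freely after registration.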
With conservative collection, no registration is needed for the global or static variables of an embedding program, unless it calls scheme_set_stack_base with a non-zero second argument (see footnote 3). In that case, global and static variables containing collectable pointers must be registered with scheme_register_static. The MZ_REGISTER_STATIC macro takes any variable name and registers it with scheme_register_static. The scheme_register_static function can be safely called even when it's not needed, but it must not be called multiple times for a single memory address.
Collectable memory can be temporarily locked from collection by using the reference-counting function scheme_dont_gc_ptr. Under 3m, such locking does not prevent the object from being moved.
Garbage collection can occur during any call into MzScheme or its allocator, or at any time that MzScheme has control, except during functions that are documented otherwise. The predicate and accessor macros listed in section 2.1 never trigger a collection.
3.1 Cooperating with 3m
To allow 3m's precise collector to detect and update pointers during garbage collection, all pointer values must be registered with the collector, at least during the times that a collection may occur. The content of a word registered as a pointer must contain either NULL, a pointer to the start of a collectable object, a pointer into an object allocated by scheme_malloc_allow_interior, a pointer to an object currently allocated by another memory manager (and therefore not into a block that is currently managed by the collector), or a pointer to an odd-numbered address (e.g., a MzScheme fixnum).
Pointers are registered in three different ways:
Pointers in static variables should be registered with scheme_register_static or MZ_REGISTER_STATIC.
Pointers in allocated memory are registered automatically when they are in an array allocated with scheme_malloc, etc. When a pointer resides in an object allocated with scheme_malloc_tagged, etc., the tag at the start of the object identifies the object's size and shape. Handling of tags is described in section 3.1.1.
Local pointers (i.e., pointers on the stack or in registers) must be registered through the MZ_GC_DECL_REG, etc. macros that are described in section 3.1.2.
A pointer must never refer to the interior of an allocated object (when a garbage collection is possible), unless the object was allocated with scheme_malloc_allow_interior. For this reason, pointer arithmetic must usually be avoided, unless the variable holding the generated pointer is NULLed before a collection.
IMPORTANT: The SCHEME_SYM_VAL, SCHEME_KEYWORD_VAL, SCHEME_VEC_ELS macros produce pointers into the middle of their respective objects, so the results of these macros must not be held during the time that a collection can occur. Incorrectly retaining such a pointer can lead to a crash.
3.1.1 Tagged Objects
As explained in section 2, the scheme_make_type function can be used to obtain a new tag for a new type of object. These new types are in relatively short supply for 3m; the maximum tag is 255, and MzScheme itself uses nearly 200.
After allocating a new tag in 3m (and before creating instances of the tag), a size procedure, a mark procedure, and a fixup procedure must be installed for the tag using GC_register_traversers.
A size procedure simply takes a pointer to an object with the tag and returns its size in words (not bytes). The gcBYTES_TO_WORDS macro converts a byte count to a word count.
A mark procedure is used to trace references among objects without moving any objects. The procedure takes a pointer to an object, and it should apply the gcMARK macro to every pointer within the object. The mark procedure should return the same result as the size procedure.
A fixup procedure is used to update references to objects after or while they are moved. The procedure takes a pointer to an object, and it should apply the gcFIXUP macro to every pointer within the object; the expansion of this macro takes the address of its argument. The fixup procedure should return the same result as the size procedure.
Depending on the collector's implementation, the mark or fixup procedure might not be used. For example, the collector may only use the mark procedure and not actually move the object. Or it may use the fixup procedure to mark and move objects at the same time. To dereference an object pointer during a fixup procedure, use GC_fixup_self to convert the address passed to the procedure into one that refers to the potentially moved object, and use GC_resolve on an address that is not yet fixed up to determine the object's current location.
When allocating a tagged object in 3m, the tag must be installed immediately after the object is allocated -- or, at least, before the next possible collection.
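The three procedures described in this section might look like the following sketch for a record that mixes pointers and atomic data. The Counter type and the counter_ functions are hypothetical. The definitions at the top are mock stand-ins so that the fragment compiles on its own: the mock gcMARK and gcFIXUP merely count how many pointers they were given, and the mock gcBYTES_TO_WORDS assumes that a word has the size of a pointer. When building for 3m, the real definitions come from the MzScheme headers. The tag field is declared as short only to match the tag argument of GC_register_traversers; that choice is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Mock stand-ins so that this sketch compiles on its own. The real
   definitions come from the MzScheme headers when building for 3m. */
typedef int (*Size_Proc)(void *obj);
typedef int (*Mark_Proc)(void *obj);
typedef int (*Fixup_Proc)(void *obj);
#define gcBYTES_TO_WORDS(n) (((n) + sizeof(void *) - 1) / sizeof(void *))
int visited; /* mock only: counts the pointers handed to gcMARK/gcFIXUP */
#define gcMARK(x)  do { (void)(x); visited++; } while (0)
#define gcFIXUP(x) do { (void)&(x); visited++; } while (0)
short registered_tag = -1; /* mock only: remembers the last tag */
void GC_register_traversers(short tag, Size_Proc s, Mark_Proc m, Fixup_Proc f,
                            int is_const_size, int is_atomic)
{
  (void)s; (void)m; (void)f; (void)is_const_size; (void)is_atomic;
  registered_tag = tag;
}

/* Hypothetical tagged record: two pointers plus atomic data. */
typedef struct Counter {
  short type;           /* tag, stored at the start of the object */
  void *name;           /* pointer that the collector must see */
  long count;           /* atomic data, never marked */
  struct Counter *next; /* pointer that the collector must see */
} Counter;

/* Size procedure: the size in words, not bytes. */
int counter_size(void *p)
{
  (void)p; /* every Counter has the same size */
  return (int)gcBYTES_TO_WORDS(sizeof(Counter));
}

/* Mark procedure: apply gcMARK to every pointer field. */
int counter_mark(void *p)
{
  Counter *c = (Counter *)p;
  gcMARK(c->name);
  gcMARK(c->next);
  return counter_size(p);
}

/* Fixup procedure: apply gcFIXUP to every pointer field. */
int counter_fixup(void *p)
{
  Counter *c = (Counter *)p;
  gcFIXUP(c->name);
  gcFIXUP(c->next);
  return counter_size(p);
}

/* The tag is assumed to have been obtained from scheme_make_type. */
void register_counter_type(short tag)
{
  /* constant size (1), not atomic (0) */
  GC_register_traversers(tag, counter_size, counter_mark, counter_fixup, 1, 0);
}
```

The mark and fixup procedures return the result of the size procedure, as required above, and the atomic count field is deliberately skipped by both.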
3.1.2 Local Pointers
The 3m collector needs to know the address of every local or temporary pointer within a function call at any point when a collection can be triggered. Beware that nested function calls can hide temporary pointers; for example, in
scheme_make_pair(scheme_make_pair(scheme_true, scheme_false),
scheme_make_pair(scheme_false, scheme_true))
the result from one scheme_make_pair call is on the stack or in a register during the other call to scheme_make_pair; this pointer must be exposed to the garbage collector and made subject to update. Simply changing the code to
tmp = scheme_make_pair(scheme_true, scheme_false);
scheme_make_pair(tmp,
scheme_make_pair(scheme_false, scheme_true))
does not expose all pointers, since tmp must be evaluated before the second call to scheme_make_pair. In general, the above code must be converted to the form
tmp1 = scheme_make_pair(scheme_true, scheme_false);
tmp2 = scheme_make_pair(scheme_false, scheme_true);
scheme_make_pair(tmp1, tmp2);
and this converted form must be instrumented to register tmp1 and tmp2. The final result might be
{
Scheme_Object *tmp1 = NULL, *tmp2 = NULL, *result;
MZ_GC_DECL_REG(2);
MZ_GC_VAR_IN_REG(0, tmp1);
MZ_GC_VAR_IN_REG(1, tmp2);
MZ_GC_REG();
tmp1 = scheme_make_pair(scheme_true, scheme_false);
tmp2 = scheme_make_pair(scheme_false, scheme_true);
result = scheme_make_pair(tmp1, tmp2);
MZ_GC_UNREG();
return result;
}
Notice that result is not registered above. The MZ_GC_UNREG macro cannot trigger a garbage collection, so the result variable is never live during a potential collection. Note also that tmp1 and tmp2 are initialized with NULL, so that they always contain a pointer whenever a collection is possible.
The MZ_GC_DECL_REG macro expands to a local-variable declaration to hold information for the garbage collector. The argument is the number of slots to provide for registration. Registering a simple pointer requires a single slot, whereas registering an array of pointers requires three slots. For example, to register a pointer tmp1 and an array of 10 char *s:
{
Scheme_Object *tmp1 = NULL;
char *a[10];
int i;
MZ_GC_DECL_REG(4);
MZ_GC_ARRAY_VAR_IN_REG(0, a, 10);
MZ_GC_VAR_IN_REG(3, tmp1);
/* Clear a before a potential GC: */
for (i = 0; i < 10; i++) a[i] = NULL;
...
f(a);
...
}
The MZ_GC_ARRAY_VAR_IN_REG macro registers a local array given a starting slot, the array variable, and an array size. The MZ_GC_VAR_IN_REG macro takes a slot and a simple pointer variable. A local variable or array must not be registered multiple times.
In the above example, the first argument to MZ_GC_VAR_IN_REG is 3 because the information for a uses the first three slots. Even if a is not used after the call to f, a must be registered with the collector during the entire call to f, because f presumably uses a until it returns.
The name used for a variable need not be immediate. Structure members can be supplied as well:
{
struct { void *s; int v; void *t; } x = {NULL, 0, NULL};
MZ_GC_DECL_REG(2);
MZ_GC_VAR_IN_REG(0, x.s);
MZ_GC_VAR_IN_REG(1, x.t);
...
}
In general, the only constraint on the second argument to MZ_GC_VAR_IN_REG or MZ_GC_ARRAY_VAR_IN_REG is that & must produce the relevant address.
Pointer information is not actually registered with the collector until the MZ_GC_REG macro is used. The MZ_GC_UNREG macro de-registers the information. Each call to MZ_GC_REG must be balanced by one call to MZ_GC_UNREG.
Pointer information need not be initialized with MZ_GC_VAR_IN_REG and MZ_GC_ARRAY_VAR_IN_REG before calling MZ_GC_REG, and the set of registered pointers can change at any time -- as long as all relevant pointers are registered when a collection might occur. The following example recycles slots and completely de-registers information when no pointers are relevant. The example also illustrates how MZ_GC_UNREG is not needed when control escapes from the function, such as when scheme_signal_error escapes.
{
Scheme_Object *tmp1 = NULL, *tmp2 = NULL;
mzchar *a, *b;
MZ_GC_DECL_REG(2);
MZ_GC_VAR_IN_REG(0, tmp1);
MZ_GC_VAR_IN_REG(1, tmp2);
tmp1 = scheme_make_utf8_string("foo");
MZ_GC_REG();
tmp2 = scheme_make_utf8_string("bar");
tmp1 = scheme_append_char_string(tmp1, tmp2);
if (SCHEME_FALSEP(tmp1))
scheme_signal_error("shouldn't happen!");
a = SCHEME_CHAR_VAL(tmp1);
MZ_GC_VAR_IN_REG(0, a);
tmp2 = scheme_make_pair(scheme_read_bignum(a, 0, 10), tmp2);
MZ_GC_UNREG();
if (SCHEME_INTP(tmp2)) {
return 0;
}
MZ_GC_REG();
tmp1 = scheme_make_pair(scheme_read_bignum(a, 0, 8), tmp2);
MZ_GC_UNREG();
return tmp1;
}
A MZ_GC_DECL_REG can be used in a nested block to hold declarations for the block's variables. In that case, the nested MZ_GC_DECL_REG must have its own MZ_GC_REG and MZ_GC_UNREG calls.
{
Scheme_Object *accum = NULL;
MZ_GC_DECL_REG(1);
MZ_GC_VAR_IN_REG(0, accum);
MZ_GC_REG();
accum = scheme_make_pair(scheme_true, scheme_null);
{
Scheme_Object *tmp = NULL;
MZ_GC_DECL_REG(1);
MZ_GC_VAR_IN_REG(0, tmp);
MZ_GC_REG();
tmp = scheme_make_pair(scheme_true, scheme_false);
accum = scheme_make_pair(tmp, accum);
MZ_GC_UNREG();
}
accum = scheme_make_pair(scheme_true, accum);
MZ_GC_UNREG();
return accum;
}
Variables declared in a local block can also be registered together with variables from an enclosing block, but the local-block variable must be unregistered before it goes out of scope. The MZ_GC_NO_VAR_IN_REG macro can be used to unregister a variable or to initialize a slot as having no variable.
{
Scheme_Object *accum = NULL;
MZ_GC_DECL_REG(2);
MZ_GC_VAR_IN_REG(0, accum);
MZ_GC_NO_VAR_IN_REG(1);
MZ_GC_REG();
accum = scheme_make_pair(scheme_true, scheme_null);
{
Scheme_Object *tmp = NULL;
MZ_GC_VAR_IN_REG(1, tmp);
tmp = scheme_make_pair(scheme_true, scheme_false);
accum = scheme_make_pair(tmp, accum);
MZ_GC_NO_VAR_IN_REG(1);
}
accum = scheme_make_pair(scheme_true, accum);
MZ_GC_UNREG();
return accum;
}
The MZ_GC_ macros all expand to nothing when MZ_PRECISE_GC is not defined, so the macros can be placed into code to be compiled for both conservative and precise collection.
The MZ_GC_REG and MZ_GC_UNREG macros must never be used in an OS thread other than MzScheme's thread.
3.1.3 Local Pointers and mzc
When mzc is run with the --xform flag and a source C program, it produces a C program that is instrumented in the way described in the previous section (but with a slightly different set of macros). For each input file name.c, the transformed output is name.3m.c.
The --xform mode for mzc does not change allocation calls, nor does it generate size, mark, or fixup procedures. It merely converts the code to register local pointers.
Furthermore, the --xform mode for mzc does not handle all of C. Its ability to rearrange compound expressions is particularly limited, because --xform merely converts expression text heuristically instead of parsing C. A future version of the tool will correct such problems. For now, mzc in --xform mode attempts to provide reasonable error messages when it is unable to convert a program, but beware that it can miss cases. To an even more limited degree, --xform can work on C++ code. Inspect the output of --xform mode to ensure that your code is correctly instrumented.
Some specific limitations:
The body of a for, while, or do loop must be surrounded with curly braces. (A conversion error is normally reported, otherwise.)
Function calls may not appear on the right-hand side of an assignment within a declaration block. (A conversion error is normally reported if such an assignment is discovered.)
Multiple function calls in ... ? ... : ... cannot be lifted. (A conversion error is normally reported, otherwise.)
In an assignment, the left-hand side must be a local or static variable, not a field selection, pointer dereference, etc. (A conversion error is normally reported, otherwise.)
The conversion assumes that all function calls use an immediate name for a function, as opposed to a compound expression as in s->f(). The function name need not be a top-level function name, but it must be bound either as an argument or local variable with the form type id; the syntax ret_type (*id)(...) is not recognized, so bind the function type to a simple name with typedef first: typedef ret_type (*type)(...); .... type id.
Arrays and structs must be passed by address, only.
GC-triggering code must not appear in system headers.
Pointer-comparison expressions are not handled correctly when either of the compared expressions includes a function call. For example, a() == b() is not converted correctly when a and b produce pointer values.
Passing the address of a local pointer to a function works only when the pointer variable remains live after the function call.
A return; form can get converted to { stmt; return; };, which can break an if (...) return; else ... pattern.
Local instances of union types are generally not supported.
Pointer arithmetic cannot be converted away, and is instead reported as an error.
3.2 Library Functions
¤ void *scheme_malloc(size_t n)
Allocates n bytes of collectable memory, initially filled with
zeros. In 3m, the allocated object is treated as an array of
pointers.
¤ void *scheme_malloc_atomic(size_t n)
Allocates n bytes of collectable memory containing no pointers
visible to the garbage collector. The object is not
initialized to zeros.
¤ void *scheme_malloc_uncollectable(size_t n)
Non-3m, only. Allocates n bytes of uncollectable memory.
¤ void *scheme_malloc_eternal(size_t n)
Allocates uncollectable atomic memory. This function is equivalent to malloc, except that the memory cannot be freed.
¤ void *scheme_calloc(size_t num, size_t size)
Allocates num * size bytes of memory using scheme_malloc.
¤ void *scheme_malloc_tagged(size_t n)
Like scheme_malloc, but in 3m, the type tag determines how the garbage collector traverses the object; see section 3.
¤ void *scheme_malloc_allow_interior(size_t n)
Like scheme_malloc, but in 3m, pointers are allowed to reference the middle of the object; see section 3.
¤ char *scheme_strdup(char *str)
Copies the null-terminated string str; the copy is collectable.
¤ char *scheme_strdup_eternal(char *str)
Copies the null-terminated string str; the copy will never be freed.
¤ void *scheme_malloc_fail_ok(void *(*mallocf)(size_t size), size_t size)
Attempts to allocate size bytes using mallocf. If the
allocation fails, the exn:misc:out-of-memory exception is
raised.
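The first argument is itself an allocation function, as in this small sketch. The helper name make_scratch_buffer is hypothetical, and the two stand-in definitions exist only so that the fragment compiles without the MzScheme headers; in particular, the stand-in for scheme_malloc_fail_ok simply calls mallocf and does not model the failure case.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-ins so that this sketch compiles on its own. In a real extension,
   remove them and include the MzScheme header instead. */
void *scheme_malloc_atomic(size_t n) { return malloc(n); }
void *scheme_malloc_fail_ok(void *(*mallocf)(size_t size), size_t size)
{
  return mallocf(size); /* stand-in: failure handling is not modeled */
}

/* Hypothetical helper: allocate an n-byte scratch buffer of atomic
   memory and clear it, since atomic memory is not zeroed. */
char *make_scratch_buffer(size_t n)
{
  char *buf;
  assert(n > 0);
  buf = (char *)scheme_malloc_fail_ok(scheme_malloc_atomic, n);
  memset(buf, 0, n);
  return buf;
}
```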
¤ void scheme_register_extension_global(void *ptr, long size)
Registers an extension's global variable that can contain Scheme
pointers. The address of the global is given in ptr, and its
size in bytes in size. In addition to global variables, this
function can be used to register any permanent memory that the
collector would otherwise treat as atomic. A garbage collection can
occur during the registration.
¤ void scheme_set_stack_base(void *stack_addr, int no_auto_statics)
Overrides the GC's auto-determined stack base, and/or disables the
GC's automatic traversal of global and static variables. If
stack_addr is NULL, the stack base determined by the GC is
used. Otherwise, it should be the ``deepest'' memory address on the
stack where a collectable pointer might be stored. This function
should be called only once, and before any other scheme_
function is called. It never triggers a garbage collection.
The following example shows a typical use for setting the stack base:
int main(int argc, char **argv) {
int dummy;
scheme_set_stack_base(&dummy, 0);
real_main(argc, argv); /* calls scheme_basic_env(), etc. */
}
¤ void scheme_register_static(void *ptr, long size)
Like scheme_register_extension_global, for use in embedding applications in situations where the collector does not automatically find static variables (i.e., when scheme_set_stack_base has been called with a non-zero second argument).
The macro MZ_REGISTER_STATIC can be used directly on a static variable. It expands to a comment if statics need not be registered, and a call to scheme_register_static (with the address of the static variable) otherwise.
¤ void scheme_weak_reference(void **p)
Registers the pointer *p as a weak pointer; when no other
(non-weak) pointers reference the same memory as *p references,
then *p will be set to NULL by the garbage collector. The
value in *p may change, but the pointer remains weak with
respect to the value of *p at the time p was registered.
¤ void scheme_weak_reference_indirect(void **p,
void *v)
Like scheme_weak_reference, but *p is cleared
(regardless of its value) when there are no references to v.
¤ void scheme_register_finalizer(void *p,
void (*f)(void *p, void *data), void *data,
void (**oldf)(void *p, void *data), void **olddata)
Registers a callback function to be invoked when the memory p
would otherwise be garbage-collected, and when no ``will''-like
finalizers are registered for p.
The f argument is the callback function; when it is called, it
will be passed the value p and the data pointer data;
data can be anything -- it is only passed on to the callback
function. If oldf and olddata are not NULL, then
*oldf and *olddata are filled with the old callback
information (f and data will override this old callback).
To remove a registered finalizer, pass NULL for f and
data.
Note: registering a callback not only keeps p from collection
until the callback is invoked, but it also keeps data reachable
until the callback is invoked.
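The following sketch shows one plausible use: releasing an external resource when a collectable wrapper object is about to be collected. The Resource type and the functions close_resource and attach_resource are hypothetical. The stand-in for scheme_register_finalizer only remembers a single callback (in the variables fin_obj, fin_proc, and fin_data, which are also inventions of this sketch) so that the fragment compiles and can be exercised without MzScheme. The sketch ignores the local-pointer registration that 3m requires.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in so that this sketch compiles on its own: it remembers one
   finalizer. In a real extension, remove it and include the MzScheme
   header instead. */
void *fin_obj;
void (*fin_proc)(void *p, void *data);
void *fin_data;
void scheme_register_finalizer(void *p,
                               void (*f)(void *p, void *data), void *data,
                               void (**oldf)(void *p, void *data),
                               void **olddata)
{
  if (oldf) *oldf = fin_proc;
  if (olddata) *olddata = fin_data;
  fin_obj = p;
  fin_proc = f;
  fin_data = data;
}

/* Hypothetical external resource owned by a collectable wrapper. */
typedef struct { int is_open; } Resource;

/* Finalizer callback: release the resource passed as data. */
void close_resource(void *p, void *data)
{
  (void)p;
  ((Resource *)data)->is_open = 0;
}

/* Arrange for r to be closed when wrapper would otherwise be collected. */
void attach_resource(void *wrapper, Resource *r)
{
  assert(wrapper != NULL && r != NULL);
  r->is_open = 1;
  /* NULL for oldf and olddata: the previous callback is not needed */
  scheme_register_finalizer(wrapper, close_resource, r, NULL, NULL);
}
```

Passing the resource as data keeps it reachable until the callback runs, which matches the note above.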
¤ void scheme_add_finalizer(void *p,
void (*f)(void *p, void *data), void *data)
Adds a finalizer to a chain of primitive finalizers. This chain is separate from the single finalizer installed with scheme_register_finalizer; all finalizers in the chain are called immediately after a finalizer that is installed with scheme_register_finalizer.
See scheme_register_finalizer, above, for information about the arguments.
To remove an added finalizer, use scheme_subtract_finalizer.
¤ void scheme_add_scheme_finalizer(void *p,
void (*f)(void *p, void *data), void *data)
Installs a ``will''-like finalizer, similar to will-register.
Scheme finalizers are called one at a time, requiring the collector
to prove that a value has become inaccessible again before calling
the next Scheme finalizer. Finalizers registered with
scheme_register_finalizer or scheme_add_finalizer are
not called until all Scheme finalizers have been exhausted.
See scheme_register_finalizer, above, for information about the arguments.
There is currently no facility to remove a ``will''-like finalizer.
¤ void scheme_add_finalizer_once(void *p,
void (*f)(void *p, void *data), void *data)
Like scheme_add_finalizer, but if the combination of f and
data is already registered as a (non-``will''-like) finalizer
for p, it is not added a second time.
¤ void scheme_add_scheme_finalizer_once(void *p,
void (*f)(void *p, void *data), void *data)
Like scheme_add_scheme_finalizer, but if the combination of
f and data is already registered as a ``will''-like
finalizer for p, it is not added a second time.
¤ void scheme_subtract_finalizer(void *p,
void (*f)(void *p, void *data), void *data)
Removes a finalizer that was installed with scheme_add_finalizer.
¤ void scheme_remove_all_finalization(void *p)
Removes all finalization (``will''-like or not) for p, including
wills added in Scheme with will-register and finalizers used
by custodians.
¤ void scheme_dont_gc_ptr(void *p)
Keeps the collectable block p from garbage collection. Use this
procedure when a reference to p is to be stored somewhere
inaccessible to the collector. Once the reference is no longer used
from the inaccessible region, de-register the lock with
scheme_gc_ptr_ok. A garbage collection can occur during the
registration.
This function keeps a reference count on the pointers it registers, so
two calls to scheme_dont_gc_ptr for the same p should
be balanced with two calls to scheme_gc_ptr_ok.
¤ void scheme_gc_ptr_ok(void *p)
See scheme_dont_gc_ptr.
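A sketch of the balanced locking pattern follows, for the conservative collector only (under 3m, a locked object can still move, so a hidden copy of its address can become stale). The Handler type and the handler_ functions are hypothetical, and the stand-ins merely count outstanding locks in the invented variable lock_count so that the fragment compiles without the MzScheme headers.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-ins so that this sketch compiles on its own: they only count
   outstanding locks. In a real extension, remove them and include the
   MzScheme header instead. */
int lock_count;
void scheme_dont_gc_ptr(void *p) { (void)p; lock_count++; }
void scheme_gc_ptr_ok(void *p) { (void)p; lock_count--; }

/* Hypothetical record kept in malloc()ed memory, which the collector
   does not scan. */
typedef struct { void *callback; } Handler;

Handler *handler_create(void *callback)
{
  Handler *h = (Handler *)malloc(sizeof(Handler));
  if (!h) return NULL;
  scheme_dont_gc_ptr(callback); /* lock before hiding the reference */
  h->callback = callback;
  return h;
}

void handler_destroy(Handler *h)
{
  assert(h != NULL);
  scheme_gc_ptr_ok(h->callback); /* balance the lock from handler_create */
  free(h);
}
```

Each handler_create is paired with exactly one handler_destroy, so the reference count kept by scheme_dont_gc_ptr returns to its previous value.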
¤ void scheme_collect_garbage()
Forces an immediate garbage-collection.
¤ void GC_register_traversers(short tag,
Size_Proc s, Mark_Proc m, Fixup_Proc f,
int is_const_size, int is_atomic)
3m only. Registers a size, mark, and fixup procedure for a given type tag; see section 3.1.1 for more information.
Each of the three procedures takes a pointer and returns an integer:
typedef int (*Size_Proc)(void *obj);
typedef int (*Mark_Proc)(void *obj);
typedef int (*Fixup_Proc)(void *obj);
If the result of the size procedure is a constant, then pass a
non-zero value for is_const_size. If the mark and fixup
procedures are no-ops, then pass a non-zero value
for is_atomic.
3 Under Mac OS X, scheme_set_stack_base must always be called.