C Types

C types are the main concept of the FFI, either primitive types or user-defined types. The FFI deals with primitive types internally, converting them to and from C types. A user type is defined in terms of existing primitive and user types, along with conversion functions to and from the existing types.

(make-ctype ctype scheme-to-C-proc C-to-scheme-proc)      PROCEDURE

Creates a new C type object with the given conversion functions. Either conversion function can be #f, meaning that there is no conversion for the corresponding direction. If both functions are #f, ctype is returned.
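As a minimal sketch of a user-defined type, the following presents a C int as a Scheme character. The name _char-code is hypothetical. The example assumes that the library is loaded with (require (lib "foreign.ss")) and that (unsafe!) enables malloc, ptr-ref, and ptr-set!, which are documented elsewhere in this manual.

```scheme
(require (lib "foreign.ss"))
(unsafe!) ; assumed: enables the unsafe bindings used below

;; _char-code is a hypothetical user type built on the primitive _int:
;; characters become integers on the way to C, and integers become
;; characters on the way back to Scheme.
(define _char-code
  (make-ctype _int char->integer integer->char))

(define p (malloc _int))      ; room for one C int
(ptr-set! p _char-code #\A)   ; stored as the integer 65
(ptr-ref p _int)              ; => 65
(ptr-ref p _char-code)        ; => #\A

;; With no conversions in either direction, the base type is returned.
(eq? (make-ctype _int #f #f) _int) ; => #t
```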

(ctype? v)      PROCEDURE

Returns #t if v is a C type (primitive or user-defined), #f otherwise.

(ctype-sizeof ctype)      PROCEDURE


(ctype-alignof ctype)      PROCEDURE

Return the size and alignment of a given ctype for the current platform.

(compiler-sizeof symbol)      PROCEDURE

Possible values for symbol are 'int, 'char, 'short, 'long, '*, 'void, 'float, 'double. The result is the size of the corresponding type according to the C sizeof operator for the current platform. The compiler-sizeof operation should be used to gather information about the current platform, for example to define an alias type such as _int in terms of a known type such as _int32.
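The size queries can be sketched as follows. The alias name _my-int is hypothetical, and the example assumes the same (require (lib "foreign.ss")) and (unsafe!) setup as other uses of the library.

```scheme
(require (lib "foreign.ss"))
(unsafe!) ; assumed: enables the unsafe bindings used below

(ctype-sizeof _int32)    ; => 4
(ctype-sizeof _int64)    ; => 8
(ctype-alignof _int32)   ; platform-dependent
(compiler-sizeof 'int)   ; platform-dependent, commonly 4

;; _my-int is a hypothetical alias chosen to match the platform's C int.
(define _my-int
  (case (compiler-sizeof 'int)
    [(2) _int16]
    [(4) _int32]
    [(8) _int64]
    [else (error '_my-int "unsupported int size")]))

;; The alias has the same size as the C int it stands for.
(= (ctype-sizeof _my-int) (compiler-sizeof 'int)) ; => #t
```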

3.1  Numeric Types

There are basic integer types at various sizes. These are: _int8, _sint8, _uint8, _int16, _sint16, _uint16, _int32, _sint32, _uint32, _int64, _sint64, and _uint64. The `s' or `u' prefix specifies a signed or an unsigned integer respectively; the ones with no prefix are signed.

In addition, there are several type `aliases' (extra bindings for some of the above types):

In cases where speed matters, and you know that the integer is small enough, use the types _fixnum and _ufixnum, which are similar to _long and _ulong but assume that the quantities fit in MzScheme's immediate integers (not bignums). If you need this capability, but you want to be sure that the C level integer is a 32-bit size (as opposed to a long integer on some platforms), then use _fixint and _ufixint.

Finally, there are two floating point types, _float and _double for the corresponding C types, and the type _double* that implicitly coerces any non-complex number to a C double.
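The coercion performed by _double* can be seen in a small sketch that writes exact numbers into a C double and reads them back. It assumes the (unsafe!) bindings malloc, ptr-set!, and ptr-ref, which are documented elsewhere in this manual.

```scheme
(require (lib "foreign.ss"))
(unsafe!) ; assumed: enables the unsafe bindings used below

(define p (malloc _double))  ; room for one C double

;; _double* accepts exact rationals and integers, coercing them to
;; a C double when they are stored.
(ptr-set! p _double* 1/2)
(ptr-ref p _double)          ; => 0.5

(ptr-set! p _double* 3)
(ptr-ref p _double)          ; => 3.0
```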

3.2  Other Atomic Types

_bool      C TYPE

Translates #f to a 0 _int, and any other value to 1.

_void      C TYPE

This type indicates a Scheme void return value, and it cannot be used to translate values to C (i.e., this type cannot be used for function inputs).

3.3  String Types

3.3.1  Primitive String Types

The five primitive string types correspond to cases where a C representation matches MzScheme's representation without encodings.

_bytes      C TYPE

A type for Scheme byte strings, which corresponds to C's char* type. In addition to translating byte strings, #f corresponds to the NULL pointer. Note that this is also a custom-type macro; see section 3.5.1 below.

_string/ucs-4      C TYPE

A type for MzScheme's native Unicode strings, which are in UCS-4 format. These correspond to the C mzchar* type used by MzScheme.

_string/utf-16      C TYPE

Unicode strings in UTF-16 format.

_path      C TYPE

Simple char* strings, corresponding to MzScheme's paths.

_symbol      C TYPE

Simple char* strings as Scheme symbols (encoded in UTF-8). Return values using this type are interned as symbols.

3.3.2  Fixed Auto-Converting String Types

Several additional string types correspond to encoding conversions. The _string/utf-8, _string/locale, and _string/latin-1 types all correspond to (character) strings on the Scheme side and char* strings on the C side. The bridge between the two requires a transformation on the content of the string. As usual, the types treat #f as NULL and vice-versa.

The _string*/utf-8, _string*/locale, and _string*/latin-1 types are similar, but they accept a wider range of values: Scheme byte strings are passed as is, and Scheme paths are converted using path->bytes.

3.3.3  Variable Auto-Converting String Type

The _string/ucs-4 type is rarely useful when interacting with foreign code. Using _bytes is somewhat unnatural, since it forces Scheme programmers to use byte strings. Using _string/utf-8, etc. may prematurely commit to a particular encoding of strings as bytes. The _string type supports conversion between Scheme strings and char* strings using a parameter-determined conversion.

_string      C TYPE

Expands to a use of the default-_string-type parameter. The parameter's value is consulted when _string is evaluated, so the parameter should be set before any interface definition that uses _string.

(default-_string-type [ctype])      PROCEDURE

A parameter that determines the current meaning of _string. It is initially set to _string*/utf-8. If you change it, do so before interfaces are defined.
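A minimal sketch of changing the parameter before an interface is defined follows. The C function count_words and the name _count-words-type are hypothetical; only the ordering matters here.

```scheme
(require (lib "foreign.ss"))
(unsafe!) ; assumed: enables the unsafe bindings used below

;; Choose the encoding first ...
(default-_string-type _string/latin-1)

;; ... and define interface types afterwards, so that _string is
;; evaluated with the new setting.
;; Hypothetical C function: int count_words(char *text);
(define _count-words-type (_fun _string -> _int))

(eq? (default-_string-type) _string/latin-1) ; => #t
```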

3.3.4  Other String Types

_file      C TYPE

Like _path, but when values go from Scheme to C, expand-path is used on the given value. As an output value, it is identical to _path.

_bytes/eof      C TYPE

Similar to the _bytes type, except that a foreign return value of NULL is translated to a Scheme eof value.

_string/eof      C TYPE

Similar to the _string type, except that a foreign return value of NULL is translated to a Scheme eof value.

3.4  Pointer Types

_pointer      C TYPE

Corresponds to Scheme ``C pointer'' objects. These pointers can have an arbitrary Scheme object attached as a type tag. The tag is ignored by built-in functionality; it is intended to be used by interfaces. See section 5.1 for creating pointer types that use these tags for safety.

_scheme      C TYPE

This type can be used with any Scheme object; it corresponds to the Scheme_Object* type of MzScheme's C API (see Inside PLT MzScheme). It is useful only for libraries that are aware of MzScheme's C API.

_fpointer      C TYPE

Similar to _pointer, except that it should be used with function pointers. Using these pointers avoids one level of dereferencing, which is the proper way of dealing with function pointers. Use this type only in the rare situations where you need to pass a foreign function pointer to a foreign function; a _cprocedure type also works in such situations, but it is inefficient, since every call goes through Scheme unnecessarily. In all other cases, use _cprocedure (which is based on _fpointer).

3.5  Function Types

(_cprocedure input-types output-type [wrapper-proc])      PROCEDURE

A type constructor that creates a new function type, which is specified by the given input-types list and output-type. Usually, the _fun syntax (described below) should be used instead, since it manages a wide range of complicated cases.

The resulting type can be used to reference foreign functions (usually ffi-objs, but any pointer object can be referenced with this type), generating a matching foreign callout object. Such objects are new primitive procedure objects that can be used like any other Scheme procedure.

A type created with _cprocedure can also be used for passing Scheme procedures to foreign functions, which will generate a foreign function pointer that calls the given Scheme procedure when it is used. There are no restrictions on the Scheme procedure; in particular, its lexical context is properly preserved.

The optional wrapper-proc, if provided, is expected to be a function that can change a callout procedure: when a callout is generated, the wrapper is applied on the newly created primitive procedure, and its result is used as the new function. Thus, wrapper-proc is a hook that can perform various argument manipulations before the foreign function is invoked, and return different results (for example, grabbing a value stored in an `output' pointer and returning multiple values). It can also be used for callbacks, as an additional layer that tweaks arguments from the foreign code before they reach the Scheme procedure, and possibly changes the result values too.

(_fun [args ::] input-type ···  -> output-type [-> output-expr])      SYNTAX

Creates a new function type. This is a convenient syntax for the _cprocedure type constructor that can handle complicated cases of argument handling. In its simplest form, only the input-types and the output-type are specified and each one is a simple expression, which creates a straightforward function type.

In its full form, the _fun syntax provides an IDL-like language that can be used to create a wrapper function around the primitive foreign function. These wrappers can implement complex foreign interfaces given simple specifications. First, the full form of each of the types can include an optional label and an expression:

type-spec is one of
  type
  (label : type)
  (type = expr)
  (label : type = expr)

If an expression is provided, then the resulting function will be a wrapper that calculates the argument for that position itself, meaning that it does not expect an argument for that position. The expression can use previous arguments if they were labeled. In addition, the result of a function call need not be the value returned from the foreign call: if the optional output-expr is specified, or if an expression is provided for the output type, then this specifies an expression that will be used as a return value. This expression can use any of the previous labels, including a label given for the output which can be used to access the actual foreign return value.

In rare cases where complete control over the input arguments is needed, the wrapper's argument list can be specified as args, in any form (including a `rest' argument). Identifiers in this place are related to type labels, so if an argument is listed there, there is no need to use an expression for it. For example:

(_fun (n s) :: (s : _string) (n : _int) -> _int)

specifies a function that receives an integer and a string, but the foreign function will get the string first.
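The labeled and computed type-specs described above can be sketched as a type for a hypothetical C function int fill_buffer(char *buf, int len). The function, the library "foo.so", and the name _fill-buffer-type are assumptions made for illustration. Only the type is constructed here, so no foreign library is needed to evaluate the definition.

```scheme
(require (lib "foreign.ss"))
(unsafe!) ; assumed: enables the unsafe bindings used below

;; Hypothetical C function: int fill_buffer(char *buf, int len);
;; The wrapper takes a single argument, the byte string. The length
;; is computed from the labeled argument buf, and the labeled result
;; n is checked before it is returned.
(define _fill-buffer-type
  (_fun (buf : _bytes)
        (len : _int = (bytes-length buf))
        -> (n : _int)
        -> (if (< n 0)
               (error 'fill-buffer "foreign call failed")
               n)))

;; With a real library, the type would be used like this:
;; (define fill-buffer
;;   (get-ffi-obj 'fill_buffer "foo.so" _fill-buffer-type))
;; (fill-buffer (make-bytes 16))
```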

3.5.1  Custom Function Types

The behavior of the _fun type can be customized via custom function types. These are pieces of syntax that can behave as C types and C type constructors, but they can interact with function calls in several ways that are not possible otherwise. When the _fun form is expanded, it tries to expand each of the given type expressions, and ones that expand to certain keyword-value lists interact with the generation of the foreign function wrapper. This makes it possible to construct a single wrapper function, avoiding the costs involved in compositions of higher-order functions.

Custom function types are macros that expand to a list that looks like: `(key: val ...)', where all of the `key:'s are from a short list of known keys. Each key interacts with generated wrapper functions in a different way, which affects how its corresponding argument is treated:

The pre: and post: bindings can be of the form (id => expr) to use the existing value. Note that if the pre: expression is not (id => expr), then it means that there is no input for this argument to the _fun-generated procedure. Also note that if a custom type is used as an output type of a function, then only the post: code is used.

All of the special custom types that are described here are defined this way.

Most custom types are meaningful only in a _fun context, and will raise a syntax error if used elsewhere. A few such types can be used in non-_fun contexts: types which use only type:, pre:, post:, and no others. Such custom types can be used outside a _fun by expanding them into a usage of make-ctype; using other keywords makes this impossible, because it means that the type has a specific interaction with a function call.

(define-fun-syntax identifier transformer)      SYNTAX

The result of expanding a custom type macro is taken apart by the _fun macro, which leads to code certificate problems. To solve this, use define-fun-syntax instead of define-syntax. It is used in the same way, but avoids such problems.

_?      CUSTOM C TYPE

Not a conventional C type, but a marker for expressions that should not be sent to the ffi function. Use this to bind local values in a computation that is part of an ffi wrapper interface, or to specify wrapper arguments that are not sent to the foreign function (e.g., an argument that is used for processing the foreign output).

(_ptr mode type)      CUSTOM C TYPE

For C pointers, where mode indicates input or output pointers (or both). mode can be one of i (an input pointer), o (an output pointer), or io (both).

For example, the _ptr type can be used in output mode to create a foreign function wrapper that returns more than a single value. The following type:

(_fun (i : (_ptr o _int))
      -> (d : _double)
      -> (values d i))

will create a function that calls the foreign function with a fresh integer pointer, and use the value that is placed there as a second return value.

(_box type)      CUSTOM C TYPE

This is similar to a (_ptr io type) argument, where the input is expected to be a box holding an appropriate value, which is unboxed on entry and modified accordingly on exit.

(_list mode type [len])      CUSTOM C TYPE

Similar to _ptr, except that it is used for converting lists to/from C vectors. The optional len argument is needed for output values where it is used in the post code, and in the pre code of an output mode to allocate the block. In any case it can refer to a previous binding for the length of the list which the C function will most likely require.
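The two common uses of _list can be sketched as types for two hypothetical C functions, one that reads an array and one that fills an array. The C signatures, the library "foo.so", and the names _sum-type and _squares-type are assumptions made for illustration; only the types are constructed.

```scheme
(require (lib "foreign.ss"))
(unsafe!) ; assumed: enables the unsafe bindings used below

;; Hypothetical C function: int sum(int *xs, int n);
;; Input mode: the wrapper takes a Scheme list, and the length
;; argument is computed from the labeled list.
(define _sum-type
  (_fun (xs : (_list i _int))
        (_int = (length xs))
        -> _int))

;; Hypothetical C function: void squares(int n, int *out);
;; Output mode: len refers to the earlier label n, a block of n ints
;; is allocated for the call, and the result is returned as a list.
(define _squares-type
  (_fun (n : _int)
        (out : (_list o _int n))
        -> _void
        -> out))

;; With a real library, the types would be used like this:
;; (define sum (get-ffi-obj 'sum "foo.so" _sum-type))
;; (sum '(1 2 3))
```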

_vector      CUSTOM C TYPE

Same as _list, except that it uses Scheme vectors instead of lists.

_bytes      CUSTOM C TYPE

_bytes can be used by itself as a simple type that uses a byte string as a C pointer. Alternatively, the `(_bytes o len)' form is for a pointer return value, where the size should be explicitly specified. There is no need for other modes: input or input/output would be just like _bytes since the string carries its size information (there is no real need for the `o' part of the syntax, but it's there for consistency with the above macros).

_cvector      CUSTOM C TYPE

Like _bytes, _cvector can be used as a simple type that corresponds to a pointer that is managed as a safe C vector on the Scheme side; this is described in section 5.2. The syntax specified here is an alternative that makes it behave similarly to the _list and _vector custom types, except that this is more efficient since no Scheme list or vector is needed. (It can be used with all three modes.)

3.6  C Struct Types

(make-cstruct-type ctypes)      PROCEDURE

The primitive type constructor for creating new C struct types. These types are actually new primitive types -- they don't have any conversion functions associated. The corresponding Scheme objects that are used for structs are pointers, but when these types are used, the value that the pointer refers to is used rather than the pointer itself. This value is basically made of a number of bytes that is known according to the given list of ctypes.

(_list-struct ctypes ···1)      PROCEDURE

A type constructor that builds a struct type using the above make-cstruct-type function and wraps it in a type that marshals a struct as a list of its components. Note that space for structs needs to be allocated; the converter for a _list-struct type immediately allocates and uses a list from the allocated space, so it is inefficient. Use define-cstruct below for a more efficient approach.

(define-cstruct _id ((field-id ctype) ···))      SYNTAX

Defines a new C struct type, but unlike _list-struct, the resulting type deals with C structs in binary form rather than marshaling them to Scheme values. It uses a define-struct-like approach, providing accessor functions for raw struct values (which are pointer objects). The new type uses pointer tags to guarantee that only proper struct objects are used. The name must have a form of _id. The form and the generated bindings are intentionally similar to define-struct only with type specification for the fields -- the identifiers that will be bound as a result are:

Objects of this new type are actually cpointers, with a type tag that is a list that contains "id". Since structs are implemented as pointers, they can be used for a _pointer input to a foreign function: their address will be used. To make this a little safer, the corresponding cpointer type is defined as _id-pointer. The _id type should not be used when a pointer is expected, since it will cause the struct to be copied rather than use the pointer value, leading to memory corruption.

If the first field is itself a cstruct type, its tag will be used in addition to the new tag. This feature supports common cases of object inheritance, where a sub-struct is made by having a first field that is its super-struct. Instances of the sub-struct can be considered as instances of the super-struct, since they share the same initial layout. Using the tag of an initial cstruct field means that the same behavior is implemented in Scheme; for example, accessors and mutators of the super-cstruct can be used with the new sub-cstruct. See Section 3.6.1 for an example.

Note that structs are allocated as atomic blocks, which means that the garbage collector ignores their content. Currently, there is no safe way to store pointers to GC-managed objects in structs (even if you keep a reference to avoid collecting the referenced objects, the 3m variant's GC will invalidate the pointer's value). Thus, only non-pointer values and pointers to memory that is outside the GC's control can be placed into struct fields.

(define-cstruct (_id _super-id) ((field-id ctype) ···))      SYNTAX

This alternative form of define-cstruct is shorthand for using an initial field named super-id using _super-id as its type. Remember that the new struct will use _super-id's tag in addition to its own tag, meaning that instances of _id can be used as instances of _super-id. Aside from the syntactic sugar, the constructor function will be different when this syntax is used: instead of expecting a first argument which is an instance of _super-id, it will expect arguments for each of _super-id's fields in addition to the new fields. This is, again, in analogy to using a super-struct with define-struct.

3.6.1  C Struct Examples

A few examples will help in understanding how to use structs. Assuming the following C code:

 #include <stdlib.h>  /* for malloc */

 typedef struct { int x; char y; } A;
 typedef struct { A a; int z; } B;

 A* makeA() {
   A *p = malloc(sizeof(A));
   p->x = 1;
   p->y = 2;
   return p;
 }

 B* makeB() {
   B *p = malloc(sizeof(B));
   p->a.x = 1;
   p->a.y = 2;
   p->z   = 3;
   return p;
 }

 char gety(A* a) {
   return a->y;
 }

First, using the simple _list-struct, you might expect this code to work:

 (define makeB
   (get-ffi-obj 'makeB "foo.so"
     (_fun -> (_list-struct (_list-struct _int _byte) _int))))
 (makeB) ; should return ((1 2) 3)

The problem here is that makeB returns a pointer to the struct rather than the struct itself. The following works as expected:

 (define makeB
   (get-ffi-obj 'makeB "foo.so" (_fun -> _pointer)))
 (ptr-ref (makeB) (_list-struct (_list-struct _int _byte) _int))

As described above, _list-structs should be used in cases where efficiency is not an issue. We continue with define-cstruct, first defining a type for `A', which makes it possible to use `makeA':

 (define-cstruct _A ([x _int] [y _byte]))
 (define makeA
   (get-ffi-obj 'makeA "foo.so"
     (_fun -> _A-pointer))) ; using _A is a memory-corrupting bug!
 (define a (makeA))
 (list a (A-x a) (A-y a))
   ; -> (#<cpointer:A> 1 2)

`gety' is also simple to use from Scheme:

 (define gety
   (get-ffi-obj 'gety "foo.so"
     (_fun _A-pointer -> _byte)))
 (gety a)
   ; -> 2

We now define another C struct for `B', and expose `makeB' using it:

 (define-cstruct _B ([a _A] [z _int]))
 (define makeB
   (get-ffi-obj 'makeB "foo.so"
     (_fun -> _B-pointer)))
 (define b (makeB))

We can access all values of b using a naive approach:

 (list (A-x (B-a b)) (A-y (B-a b)) (B-z b))

but this is inefficient as it allocates and copies an instance of `A' on every access. Inspecting the tags (cpointer-tag b) we can see that A's tag is included, so we can simply use its accessors and mutators, as well as any function that is defined to take an A pointer:

 (list (A-x b) (A-y b) (B-z b))
 (gety b)

Constructing a B instance in Scheme requires allocating a temporary A struct:

 (define b (make-B (make-A 1 2) 3))

To make this more efficient, we switch to the alternative define-cstruct syntax, which creates a constructor that expects arguments for both the super fields and the new ones:

 (define-cstruct (_B _A) ([z _int]))
 (define b (make-B 1 2 3))

3.7  Enumerations and Masks

Although the constructors below are described as procedures, they are implemented as syntax, so that error messages can report a type name where the syntactic context implies one.

(_enum symbols [basetype])      PROCEDURE

Takes a list of symbols and generates an enumeration type. The enumeration maps between the given symbols and integers, counting from 0. The list of symbols can also set the values of symbols, if a symbol is followed by '= and an integer. For example, the list '(x y = 10 z) maps 'x to 0, 'y to 10, and 'z to 11.

The optional basetype argument specifies the base type to use, defaulting to _ufixint.
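The mapping from the example list can be checked with a small sketch that stores symbols through the enumeration type and reads back the underlying integers. The name _xyz is hypothetical, and _int is passed explicitly as the basetype so that the raw value can be read back with the same type. The (unsafe!) bindings malloc, ptr-set!, and ptr-ref are assumed.

```scheme
(require (lib "foreign.ss"))
(unsafe!) ; assumed: enables the unsafe bindings used below

;; _xyz is a hypothetical enumeration using the list from the text:
;; x is 0, y is set to 10, and z continues from there with 11.
(define _xyz (_enum '(x y = 10 z) _int))

(define p (malloc _int))  ; room for one C int
(ptr-set! p _xyz 'z)      ; stored as the integer 11
(ptr-ref p _int)          ; => 11
(ptr-set! p _int 10)      ; store a raw integer ...
(ptr-ref p _xyz)          ; => y  (... and read it back as a symbol)
```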

(_bitmask symbols [basetype])      PROCEDURE

Similar to _enum, but the resulting mapping translates a list of symbols to a number and back, using a logical or. A single symbol can be given as an input to make things a little more convenient. The default basetype is _uint, since high bits are often used for flags.
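A minimal sketch of a bitmask type follows. The name _perms and its flag names are hypothetical. Every flag is given an explicit value with '=, since this section does not state how default values are assigned, and _int is passed as the basetype so that the raw value can be read back with the same type.

```scheme
(require (lib "foreign.ss"))
(unsafe!) ; assumed: enables the unsafe bindings used below

;; _perms is a hypothetical bitmask with explicitly assigned bits.
(define _perms (_bitmask '(read = 1 write = 2 exec = 4) _int))

(define p (malloc _int))           ; room for one C int
(ptr-set! p _perms '(read exec))   ; stored as 1 | 4
(ptr-ref p _int)                   ; => 5

(ptr-set! p _perms 'write)         ; a single symbol is accepted too
(ptr-ref p _int)                   ; => 2
(ptr-ref p _perms)                 ; a list of the symbols whose bits are set
```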