# Chapter 3  Basic Data Extensions

## 3.1  Void and Undefined

MzScheme returns the unique void value -- printed as `#<void>` -- for expressions that have unspecified results in R5RS. The procedure `void` takes any number of arguments and returns void:

• `(void` `v` ···`)` returns void.

• `(void?` `v``)` returns `#t` if `v` is void, `#f` otherwise.

Variables bound by `letrec-values` that are accessible but not yet initialized are bound to the unique undefined value, printed as `#<undefined>`.
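As a small illustration of `void` and `void?`, the following sketch collects a few checks in a list. The names `x` and `checks` are illustrative only; the `set!` case relies on the statement above that expressions with unspecified R5RS results return void.

```scheme
(define x 0)

(define checks
  (list (void? (void))          ; void called with no arguments
        (void? (void 1 2 3))    ; arguments are accepted and ignored
        (void? (set! x 1))      ; set! has an unspecified result in R5RS
        (void? #f)))            ; an ordinary value is not void
;; checks => (#t #t #t #f)
```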

## 3.2  Booleans

Unless otherwise specified, two instances of a particular MzScheme data type are `equal?` only when they are `eq?`. Two values are `eqv?` only when they are either `eq?`, both `+nan.0`, or both `=` and have the same exactness and sign. (The inexact numbers `0.0` and `-0.0` are not `eqv?`, although they are `=`.)

The `andmap` and `ormap` procedures apply a test procedure to the elements of a list, returning immediately when the result for testing the entire list is determined. The arguments to `andmap` and `ormap` are the same as for `map`, but a single boolean value is returned as the result, rather than a list:

• `(andmap` `proc list` ···¹`)` applies `proc` to elements of the `list`s from the first elements to the last, returning `#f` as soon as any application returns `#f`. If no application of `proc` returns `#f`, then the result of the last application of `proc` is returned. If the `list`s are empty, then `#t` is returned.

• `(ormap` `proc list` ···¹`)` applies `proc` to elements of the `list`s from the first elements to the last. If any application returns a value other than `#f`, that value is immediately returned as the result of the `ormap` application. If all applications of `proc` return `#f`, then the result is `#f`. If the `list`s are empty, then `#f` is returned.

Examples:

```
(andmap positive? '(1 2 3)) ; => #t
(ormap eq? '(a b c) '(a b c)) ; => #t
(andmap positive? '(1 2 a)) ; => raises exn:fail:contract
(ormap positive? '(1 2 a)) ; => #t
(andmap positive? '(1 -2 a)) ; => #f
(andmap + '(1 2 3) '(4 5 6)) ; => 9
(ormap + '(1 2 3) '(4 5 6)) ; => 5
```

## 3.3  Numbers

A number in MzScheme is one of the following:

• a fixnum exact integer (30 bits plus a sign bit)

• a bignum exact integer (cannot be represented in a fixnum)

• a fraction exact rational (represented by two exact integers)

• a flonum inexact rational (double-precision floating-point number)

• a complex number; either the real and imaginary parts are both exact or inexact, or the number has an exact zero real part and an inexact imaginary part; a complex number with an inexact zero imaginary part is a real number

MzScheme extends the number syntax of R5RS in three ways:

• All input radixes (`#b`, `#o`, `#d`, and `#x`) allow "decimal" numbers that contain a period or exponent marker. For example, `#b1.1` is equivalent to `1.5`. In hexadecimal numbers, `e` and `d` always stand for a hexadecimal digit, not an exponent marker.

• The mantissa of a number with an exponent marker can be expressed as a fraction. For example, `1/2e3` is equivalent to `500.0`, and `1/2e2+1/2e4i` is equivalent to `50.0+5000.0i`.

• The following are inexact numerical constants: `+inf.0` (infinity), `-inf.0` (negative infinity), `+nan.0` (not a number), and `-nan.0` (same as `+nan.0`). These names can also be used within complex constants, as in `-inf.0+inf.0i`. These names are case-insensitive.

The special inexact numbers `+inf.0`, `-inf.0`, and `+nan.0` have no exact form. Dividing by an inexact zero returns `+inf.0` or `-inf.0`, depending on the sign of the dividend. The infinities are integers, and they answer `#t` for both `even?` and `odd?`. The `+nan.0` value is not an integer and is not `=` to itself, but `+nan.0` is `eqv?` to itself. Similarly, `(= 0.0 -0.0)` is `#t`, but `(eqv? 0.0 -0.0)` is `#f`.
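The syntax extensions and the behavior of the special inexact values can be checked directly. The sketch below only restates claims made in the text above; the name `number-checks` is illustrative.

```scheme
(define number-checks
  (list (= #b1.1 1.5)            ; binary number with a period
        (= 1/2e3 500.0)          ; fractional mantissa with an exponent
        (= 0.0 -0.0)             ; numerically equal ...
        (eqv? 0.0 -0.0)          ; ... but distinguished by eqv?
        (= +nan.0 +nan.0)        ; +nan.0 is not = to itself ...
        (eqv? +nan.0 +nan.0)     ; ... but it is eqv? to itself
        (= (/ 1.0 0.0) +inf.0)   ; positive dividend, inexact zero divisor
        (= (/ -1.0 0.0) -inf.0))) ; negative dividend
;; number-checks => (#t #t #t #f #f #t #t #t)
```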

All multi-argument arithmetic procedures operate pairwise on arguments from left to right.

The `string->number` procedure works on all number representations and exact integer radix values in the range `2` to `16` (inclusive). The `number->string` procedure accepts all number types and the radix values `2`, `8`, `10`, and `16`; however, if an inexact number is provided with a radix other than `10`, the `exn:fail:contract` exception is raised.

The `add1` and `sub1` procedures work on any number:

• `(add1` `z``)` returns `z` + 1.

• `(sub1` `z``)` returns `z` - 1.

The following procedures work on integers:

• `(quotient/remainder` `n1 n2``)` returns two values: `(quotient n1 n2)` and `(remainder n1 n2)`.

• `(integer-sqrt` `n``)` returns the integer square-root of `n`. For positive `n`, the result is the largest positive integer bounded by `(sqrt n)`. For negative `n`, the result is `(* (integer-sqrt (- n)) 0+i)`.

• `(integer-sqrt/remainder` `n``)` returns two values: `(integer-sqrt n)` and `(- n (expt (integer-sqrt n) 2))`.
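A brief sketch of the integer procedures above. It uses R5RS `call-with-values` to gather multiple results into a list; the names `qr` and `sr` are illustrative.

```scheme
;; quotient/remainder returns (quotient 17 5) and (remainder 17 5).
(define qr
  (call-with-values (lambda () (quotient/remainder 17 5)) list))
;; qr => (3 2)

;; integer-sqrt/remainder returns the root and what is left over:
;; 17 = 4*4 + 1.
(define sr
  (call-with-values (lambda () (integer-sqrt/remainder 17)) list))
;; sr => (4 1)

(integer-sqrt 16)   ; => 4
(integer-sqrt -16)  ; => (* 4 0+i), an exact complex number
```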

The following procedures work on exact integers in their (semi-infinite) two's complement representation:

• `(bitwise-ior` `n` ···¹`)` returns the bitwise "inclusive or" of the `n`s.

• `(bitwise-and` `n` ···¹`)` returns the bitwise "and" of the `n`s.

• `(bitwise-xor` `n` ···¹`)` returns the bitwise "exclusive or" of the `n`s.

• `(bitwise-not` `n``)` returns the bitwise "not" of `n`.

• `(arithmetic-shift` `n m``)` returns the bitwise "shift" of `n`. The integer `n` is shifted left by `m` bits; i.e., `m` new zeros are introduced as rightmost digits. If `m` is negative, `n` is shifted right by `(- m)` bits; i.e., the rightmost `(- m)` digits are dropped.
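The following sketch works through the bitwise procedures on small values; 12 is `#b1100` and 10 is `#b1010`. The name `bit-results` is illustrative.

```scheme
(define bit-results
  (list (bitwise-ior 12 10)        ; #b1110
        (bitwise-and 12 10)        ; #b1000
        (bitwise-xor 12 10)        ; #b0110
        (bitwise-not 5)            ; two's complement: -(5 + 1)
        (bitwise-and -1 255)       ; -1 is an infinite run of one bits
        (arithmetic-shift 5 3)     ; shift left: 5 * 2^3
        (arithmetic-shift 40 -3)   ; shift right: drop three bits
        (arithmetic-shift -8 -1))) ; right shift preserves the sign
;; bit-results => (14 8 6 -6 255 40 5 -4)
```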

The `random` procedure generates pseudo-random numbers:

• `(random` `k``)` returns a random exact integer in the range `0` to `k` - 1 where `k` is an exact integer between 1 and 2³¹ - 1, inclusive. The number is provided by the current pseudo-random number generator, which maintains an internal state for generating numbers.

• `(random``)` returns a random inexact number between `0` and `1`, exclusive, using the current pseudo-random number generator.

• `(random-seed` `k``)` seeds the current pseudo-random number generator with `k`, an exact integer between 0 and 2³¹ - 1, inclusive. Seeding a generator sets its internal state deterministically; seeding a generator with a particular number forces it to produce a sequence of pseudo-random numbers that is the same across runs and across platforms.

• `(pseudo-random-generator->vector` `generator``)` produces a vector that represents the complete internal state of `generator`. The vector is suitable as an argument to `vector->pseudo-random-generator` to recreate the generator in its current state (across runs and across platforms).

• `(vector->pseudo-random-generator` `vec``)` produces a pseudo-random number generator whose internal state corresponds to `vec`. The vector `vec` must contain six exact integers; the first three integers must be in the range `0` to `4294967086`, inclusive; the last three integers must be in the range `0` to `4294944442`, inclusive; at least one of the first three integers must be non-zero; and at least one of the last three integers must be non-zero.

• `(current-pseudo-random-generator``)` returns the current pseudo-random number generator, and `(current-pseudo-random-generator` `generator``)` sets the current generator to `generator`. See also section 7.9.1.10.

• `(make-pseudo-random-generator``)` returns a new pseudo-random number generator. The new generator is seeded with a number derived from `(current-milliseconds)`.

• `(pseudo-random-generator?` `v``)` returns `#t` if `v` is a pseudo-random number generator, `#f` otherwise.
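The reproducibility guarantees above can be sketched as follows. The helper names `three-draws` and `draw-from` are hypothetical, and the second half assumes that `current-pseudo-random-generator` can be used with `parameterize`, as the cross-reference to the parameters section suggests.

```scheme
;; Seeding with the same number yields the same sequence.
(define (three-draws seed)
  (random-seed seed)
  (let* ((a (random 100))
         (b (random 100))
         (c (random 100)))
    (list a b c)))

(define same-after-seed?
  (equal? (three-draws 42) (three-draws 42)))
;; same-after-seed? => #t

;; A generator's state can be captured and recreated.
(define g1 (make-pseudo-random-generator))
(define g2 (vector->pseudo-random-generator
            (pseudo-random-generator->vector g1)))

(define (draw-from gen)
  (parameterize ((current-pseudo-random-generator gen))
    (random 1000)))

(define same-after-copy?
  (= (draw-from g1) (draw-from g2)))
;; same-after-copy? => #t
```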

The following procedures convert between Scheme numbers and common machine byte representations:

• `(integer-bytes->integer` `string signed?` [`big-endian?`]`)` converts the machine-format number encoded in `string` to an exact integer. The `string` must contain either 2, 4, or 8 characters. If `signed?` is true, then the string is decoded as a two's-complement number, otherwise it is decoded as an unsigned integer. If `big-endian?` is true, then the first character's ASCII value provides the most significant eight bits of the number, otherwise the first character provides the least-significant eight bits, and so on. The default value of `big-endian?` is the result of `system-big-endian?`.

• `(integer->integer-bytes` `n size-n signed?` [`big-endian? to-string`]`)` converts the exact integer `n` to a machine-format number encoded in a string of length `size-n`, which must be 2, 4, or 8. If `signed?` is true, then the number is encoded with two's complement, otherwise it is encoded as an unsigned bit stream. If `big-endian?` is true, then the most significant eight bits of the number are encoded in the first character of the resulting string, otherwise the least-significant bits are encoded in the first character, and so on. The default value of `big-endian?` is the result of `system-big-endian?`.

If `to-string` is provided, it must be a mutable string of length `size-n`; in that case, the encoding of `n` is written into `to-string`, and `to-string` is returned as the result. If `to-string` is not provided, the result is a newly allocated string.

If `n` cannot be encoded in a string of the requested size and format, the `exn:fail:contract` exception is raised. If `to-string` is provided and it is not of length `size-n`, the `exn:fail:contract` exception is raised.

• `(floating-point-bytes->real` `string` [`big-endian?`]`)` converts the IEEE floating-point number encoded in `string` to an inexact real number. The `string` must contain either 4 or 8 characters. If `big-endian?` is true, then the first character's ASCII value provides the most significant eight bits of the IEEE representation, otherwise the first character provides the least-significant eight bits, and so on. The default value of `big-endian?` is the result of `system-big-endian?`.

• `(real->floating-point-bytes` `x size-n` [`big-endian? to-string`]`)` converts the real number `x` to its IEEE representation in a string of length `size-n`, which must be 4 or 8. If `big-endian?` is true, then the most significant eight bits of the number are encoded in the first character of the resulting string, otherwise the least-significant bits are encoded in the first character, and so on. The default value of `big-endian?` is the result of `system-big-endian?`.

If `to-string` is provided, it must be a mutable string of length `size-n`; in that case, the encoding of `x` is written into `to-string`, and `to-string` is returned as the result. If `to-string` is not provided, the result is a newly allocated string.

If `to-string` is provided and it is not of length `size-n`, the `exn:fail:contract` exception is raised.

• `(system-big-endian?``)` returns `#t` if the native encoding of numbers is big-endian for the machine running MzScheme, `#f` if the native encoding is little-endian.
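Because each encoder has a matching decoder, a round trip is an easy way to see how the `signed?` and `big-endian?` arguments pair up. The sketch below passes `big-endian?` explicitly so that the result does not depend on the machine; the name `round-trips` is illustrative.

```scheme
(define round-trips
  (list
   ;; 258 is #x0102: unsigned, 2 bytes, big-endian
   (integer-bytes->integer (integer->integer-bytes 258 2 #f #t) #f #t)
   ;; -2 as a signed 4-byte little-endian number
   (integer-bytes->integer (integer->integer-bytes -2 4 #t #f) #t #f)
   ;; 1.5 is exactly representable as an 8-byte IEEE number
   (floating-point-bytes->real (real->floating-point-bytes 1.5 8 #t) #t)))
;; round-trips => (258 -2 1.5)
```

Decoding with a different `signed?` or `big-endian?` value than was used for encoding generally produces a different number, so both sides of an exchange need to agree on these flags.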

## 3.4  Characters

MzScheme characters range over Unicode scalar values (see section 1.2.1), which includes characters whose values range from `#x0` to `#x10FFFF`, but not including `#xD800` to `#xDFFF`. The procedure `char->integer` returns a character's code-point number, and `integer->char` converts a code-point number to a character. If `integer->char` is given an integer that is either outside `#x0` to `#x10FFFF` or in the excluded range `#xD800` to `#xDFFF`, the `exn:fail:contract` exception is raised.

Character constants include special named characters, such as `#\newline`, plus octal representations (e.g., `#\251`), and Unicode-style hexadecimal representations (e.g., `#\u03BB`). See section 11.2.4 for more information on character constants.

The character comparison procedures `char=?`, `char<?`, `char-ci=?`, etc. take two or more character arguments and check the arguments pairwise (like the numerical comparison procedures). Two characters are `eq?` whenever they are `char=?`. The expression `(char<? char1 char2)` produces the same result as `(< (char->integer char1) (char->integer char2))`, etc. The case-independent `-ci` procedures compare characters after case-folding with `char-foldcase` (described below).

The character predicates produce results consistent with the Unicode database and (usually) SRFI-14. These procedures are fully portable; their results do not depend on the current platform or locale.

• `(char-alphabetic?` `char``)` -- returns `#t` if `char`'s Unicode general category is Lu, Ll, Lt, Lm, or Lo, `#f` otherwise.

• `(char-lower-case?` `char``)` -- returns `#t` if `char` has the Unicode "Lowercase" property.

• `(char-upper-case?` `char``)` -- returns `#t` if `char` has the Unicode "Uppercase" property.

• `(char-title-case?` `char``)` -- returns `#t` if `char`'s Unicode general category is Lt, `#f` otherwise.

• `(char-numeric?` `char``)` -- returns `#t` if `char`'s Unicode general category is Nd, `#f` otherwise.

• `(char-symbolic?` `char``)` -- returns `#t` if `char`'s Unicode general category is Sm, Sc, Sk, or So, `#f` otherwise.

• `(char-punctuation?` `char``)` -- returns `#t` if `char`'s Unicode general category is Pc, Pd, Ps, Pe, Pi, Pf, or Po, `#f` otherwise.

• `(char-graphic?` `char``)` -- returns `#t` if `char`'s Unicode general category is Mn, Mc, Me, or if one of the following produces `#t` when applied to `char`: `char-alphabetic?`, `char-numeric?`, `char-symbolic?`, or `char-punctuation?`.

• `(char-whitespace?` `char``)` -- returns `#t` if `char`'s Unicode general category is Zs, Zl, or Zp, or if `char` is one of the following: `#\tab`, `#\newline`, `#\vtab`, `#\page`, or `#\return`.

• `(char-blank?` `char``)` -- returns `#t` if `char`'s Unicode general category is Zs or if `char` is `#\tab`. (These correspond to horizontal whitespace.)

• `(char-iso-control?` `char``)` -- returns `#t` if `char` is between `#\u0000` and `#\u001F` inclusive or `#\u007F` and `#\u009F` inclusive.

Character conversions are also consistent with the 1-to-1 code point mapping defined by Unicode. String procedures (see section 3.5) handle the case where Unicode defines a locale-independent mapping from the code point to a code-point sequence (in addition to the 1-1 mapping on scalar values).

• `(char-upcase` `char``)` produces a character according to the upcase mapping provided by the Unicode database for `char`; if `char` has no upcase mapping, `char-upcase` produces `char`.

• `(char-downcase` `char``)` produces a character according to the downcase mapping provided by the Unicode database for `char`; if `char` has no downcase mapping, `char-downcase` produces `char`.

• `(char-titlecase` `char``)` produces a character according to the titlecase mapping provided by the Unicode database for `char`; if `char` has no titlecase mapping, `char-titlecase` produces `char`.

• `(char-foldcase` `char``)` produces a character according to the case-folding mapping provided by the Unicode database for `char`.

`(make-known-char-range-list``)` produces a list of three-element lists, where each three-element list represents a set of consecutive code points for which the Unicode standard specifies character properties. Each three-element list contains two integers and a boolean; the first integer is a starting code-point value (inclusive), the second integer is an ending code-point value (inclusive), and the boolean is `#t` when all characters in the code-point range have identical results for all of the character predicates above. The three-element lists are ordered in the overall result list such that later lists represent larger code-point values, and all three-element lists are separated from every other by at least one code-point value that is not specified by Unicode.

`(char-utf-8-length` `char``)` produces the same result as `(bytes-length (string->bytes/utf-8 (string char)))`.
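A few of the character procedures from this section, sketched on ASCII characters and on the Greek small letter lambda (`#\u03BB`, code point 955, general category Ll). The name `char-checks` is illustrative.

```scheme
(define char-checks
  (list (char->integer #\A)                  ; code point of A
        (char=? (integer->char 955) #\u03BB) ; #x3BB is 955
        (char<? #\a #\b #\c)                 ; pairwise comparison
        (char-ci=? #\a #\A)                  ; compared after case-folding
        (char=? (char-upcase #\a) #\A)
        (char=? (char-upcase #\1) #\1)       ; no upcase mapping: unchanged
        (char-alphabetic? #\u03BB)           ; category Ll
        (char-numeric? #\5)                  ; category Nd
        (char-whitespace? #\tab)
        (char-blank? #\newline)              ; newline is not horizontal
        (char-utf-8-length #\u03BB)))        ; encoded as two bytes
;; char-checks => (65 #t #t #t #t #t #t #t #t #f 2)
```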

## 3.5  Strings

Since a string consists of a sequence of characters, a string in MzScheme is a Unicode code-point sequence. MzScheme also provides byte strings, as well as functions to convert between byte strings and strings with respect to various encodings, including UTF-8 and the current locale's encoding. See section 1.2 for an overview of Unicode, locales, and encodings, and see section 3.6 for more specific information on byte-string conversions.

A string can be mutable or immutable. When an immutable string is provided to a procedure like `string-set!`, the `exn:fail:contract` exception is raised. String constants generated by `read` are immutable. `(string->immutable-string` `string``)` returns an immutable string with the same content as `string`, and it returns `string` itself if `string` is immutable. (See also `immutable?` in section 3.10.)

`(substring` `string start-k` [`end-k`]`)` returns a mutable string, even if the `string` argument is immutable. The `end-k` argument defaults to `(string-length string)`.

`(string-copy!` `dest-string dest-start-k src-string` [`src-start-k src-end-k`]`)` changes the characters of `dest-string` from positions `dest-start-k` (inclusive) to `dest-end-k` (exclusive) to match the characters in `src-string` from `src-start-k` (inclusive) to `src-end-k` (exclusive), where `dest-end-k` is `(+ dest-start-k (- src-end-k src-start-k))`. If `src-start-k` is not provided, it defaults to `0`. If `src-end-k` is not provided, it defaults to `(string-length src-string)`. The strings `dest-string` and `src-string` can be the same string, and in that case the destination region can overlap with the source region; the destination characters after the copy match the source characters from before the copy. If any of `dest-start-k`, `src-start-k`, or `src-end-k` are out of range (taking into account the sizes of the strings and the source and destination regions), the `exn:fail:contract` exception is raised.

When a string is created with `make-string` without a fill value, it is initialized with the null character (`#\nul`) in all positions.
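The index arithmetic of `string-copy!`, including the overlapping case, is easier to follow on a concrete string. In this sketch the names `s` and `t` are illustrative, and `string-copy` (from R5RS) is used to obtain mutable strings.

```scheme
;; Copy "cde" (source positions 2 to 5, exclusive) into positions 1 to 4.
(define s (make-string 5 #\-))
(string-copy! s 1 "abcdef" 2 5)
;; s => "-cde-"

;; Source and destination may be the same string and may overlap:
;; the destination receives the source characters from before the copy.
(define t (string-copy "abcdef"))
(string-copy! t 2 t 0 4)
;; t => "ababcd"

(substring "hello" 1 3)                       ; => "el"
(char->integer (string-ref (make-string 2) 0)) ; => 0, the null character
```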

The string comparison procedures `string=?`, `string<?`, `string-ci=?`, etc. take two or more string arguments and check the arguments pairwise (like the numerical comparison procedures). String comparisons are performed through pairwise comparison of characters; for the `-ci` operations, the two strings are first case-folded using `string-foldcase` (described below). Comparisons using all of these functions are fully portable; the results do not depend on the current platform or locale.

Four string-conversion procedures take into account Unicode's locale-independent conversion rules that map code-point sequences to code-point sequences (instead of simply mapping a 1-to-1 function on code points over the string). In each case, the string produced by the conversion can be longer than the input string.

• `(string-upcase` `string``)` returns a string whose characters are the upcase conversion of the characters in `string`.

• `(string-downcase` `string``)` returns a string whose characters are the downcase conversion of the characters in `string`.

• `(string-titlecase` `string``)` returns a string where the first character in each sequence of cased characters in `string` (ignoring case-ignorable characters) is converted to titlecase, and all other cased characters are downcased.

• `(string-foldcase` `string``)` returns a string whose characters are the case-fold conversion of the characters in `string`.

Examples:

```
(string-upcase "abc!") ; => "ABC!"
(string-upcase "Stra\xDFe") ; => "STRASSE"

(string-downcase "aBC!") ; => "abc!"
(string-downcase "Stra\xDFe") ; => "stra\xDFe"
(string-downcase "\u039A\u0391\u039F\u03A3") ; => "\u03BA\u03b1\u03BF\u03C2"
(string-downcase "\u03A3") ; => "\u03C3"

(string-titlecase "aBC  twO") ; => "Abc  Two"
(string-titlecase "y2k") ; => "Y2K"
(string-titlecase "main stra\xDFe") ; => "Main Stra\xDFe"
(string-titlecase "stra \xDFe") ; => "Stra Sse"

(string-foldcase "aBC!") ; => "abc!"
(string-foldcase "Stra\xDFe") ; => "strasse"
(string-foldcase "\u039A\u0391\u039F\u03A3") ; => "\u03BA\u03b1\u03BF\u03C3"
```

In addition to the character-based string procedures, MzScheme provides the following locale-sensitive procedures (see also section 1.2.2 and section 7.9.1.11):

• `(string-locale=?` `string1 string2` ···¹`)`

• `(string-locale<?` `string1 string2` ···¹`)`

• `(string-locale>?` `string1 string2` ···¹`)`

• `(string-locale-ci=?` `string1 string2` ···¹`)`

• `(string-locale-ci<?` `string1 string2` ···¹`)`

• `(string-locale-ci>?` `string1 string2` ···¹`)`

• `(string-locale-upcase` `string``)` -- may produce a string that is longer or shorter than `string` if the current locale has complex case-folding rules.

• `(string-locale-downcase` `string``)` -- like `string-locale-upcase`, may produce a string that is longer or shorter than `string`.

These procedures depend only on the current locale's case-conversion and collation rules, and not on its encoding rules.

## 3.6  Byte Strings

A byte string is like a string, but it is a sequence of bytes instead of characters. A byte is an exact integer between `0` and `255` inclusive; `(byte?` `v``)` produces `#t` if `v` is such an exact integer, `#f` otherwise. Two byte strings are `equal?` if they are bytewise equal, and two byte strings are `eqv?` only if they are `eq?`.

MzScheme provides byte-string operations in parallel to the character-string operations:

• `(bytes?` `v``)`

• `(bytes` `byte` ···¹`)`

• `(make-bytes` `k` [`byte`]`)`

• `(bytes-length` `bytes``)`

• `(bytes-ref` `bytes k``)`

• `(bytes-set!` `bytes k byte``)`

• `(bytes-fill!` `bytes byte``)`

• `(subbytes` `bytes start-k` [`end-k`]`)`

• `(bytes-append` `bytes` ···¹`)`

• `(bytes-copy` `bytes``)`

• `(bytes-copy!` `dest-bytes dest-start-k src-bytes` [`src-start-k src-end-k`]`)`

• `(bytes->list` `bytes``)`

• `(list->bytes` `byte-list``)`

• `(bytes->immutable-bytes` `bytes``)`

• `(bytes=?` `bytes1 bytes2` ···¹`)`

• `(bytes<?` `bytes1 bytes2` ···¹`)`

• `(bytes>?` `bytes1 bytes2` ···¹`)`

A byte-string constant is written like a string, but prefixed with `#` (with no space between `#` and the opening double-quote). A byte-string constant can contain escape sequences, as in `#"\n"`, just like strings; an `exn:fail:read` exception is raised if a `\u` sequence appears within a byte string and the given hexadecimal value is larger than 255.

Like character strings, byte strings generated by `read` are immutable, and when an immutable byte string is provided to a procedure like `bytes-set!`, the `exn:fail:contract` exception is raised.
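Since the byte-string operations mirror the string operations, a short sketch is enough to show the correspondence. The names `b` and `byte-checks` are illustrative; 65, 66, and 67 are the ASCII codes of A, B, and C.

```scheme
(define b (bytes 65 66 67))   ; a mutable byte string, same content as #"ABC"

(define byte-checks
  (list (bytes-length b)
        (bytes-ref b 0)                            ; elements are exact integers
        (equal? b #"ABC")                          ; bytewise equality
        (equal? (bytes-append b #"DE") #"ABCDE")
        (equal? (subbytes #"hello" 1 3) #"el")
        (bytes->list b)
        (bytes<? #"abc" #"abd")
        (byte? 255)
        (byte? 256)))                              ; out of the 0 to 255 range
;; byte-checks => (3 65 #t #t #t (65 66 67) #t #t #f)
```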

The following procedures convert between byte strings and character strings:

• `(bytes->string/utf-8` `bytes` [`err-char start-k end-k`]`)` -- produces a string by decoding the `start-k` to `end-k` substring of `bytes` as a UTF-8 encoding of Unicode code points. If `err-char` is provided and not `#f`, then it is used for bytes that fall in the range `#o200` to `#o377` but are not part of a valid encoding sequence. (This is consistent with reading characters from a port; see section 11.1 for more details.) If `err-char` is `#f` or not provided, and if the `start-k` to `end-k` substring of `bytes` is not a valid UTF-8 encoding overall, then the `exn:fail:contract` exception is raised. If `start-k` or `end-k` are not provided, they default to `0` and `(bytes-length bytes)`, respectively.

• `(bytes->string/locale` `bytes` [`err-char start-k end-k`]`)` -- produces a string by decoding the `start-k` to `end-k` substring of `bytes` using the current locale's encoding (see also section 1.2.2). If `err-char` is provided and not `#f`, it is used for each byte in `bytes` that is not part of a valid encoding; if `err-char` is `#f` or not provided, and if the `start-k` to `end-k` substring of `bytes` is not a valid encoding overall, then the `exn:fail:contract` exception is raised. If `start-k` or `end-k` are not provided, they default to `0` and `(bytes-length bytes)`, respectively.

• `(bytes->string/latin-1` `bytes` [`err-char start-k end-k`]`)` -- produces a string by decoding the `start-k` to `end-k` substring of `bytes` as a Latin-1 encoding of Unicode code points; i.e., each byte is translated directly to a character using `integer->char`, so the decoding always succeeds. The `err-char` argument is ignored, but for consistency with the other operations, it must be a character or `#f` if provided. If `start-k` or `end-k` are not provided, they default to `0` and `(bytes-length bytes)`, respectively.

• `(string->bytes/utf-8` `string` [`err-byte start-k end-k`]`)` -- produces a byte string by encoding the `start-k` to `end-k` substring of `string` via UTF-8 (always succeeding). The `err-byte` argument is ignored, but for consistency with the other operations, it must be a byte or `#f` if provided. If `start-k` or `end-k` are not provided, they default to `0` and `(string-length string)`, respectively.

• `(string->bytes/locale` `string` [`err-byte start-k end-k`]`)` -- produces a byte string by encoding the `start-k` to `end-k` substring of `string` using the current locale's encoding (see also section 1.2.2). If `err-byte` is provided and not `#f`, it is used for each character in `string` that cannot be encoded for the current locale; if `err-byte` is `#f` or not provided, and if the `start-k` to `end-k` substring of `string` cannot be encoded, then the `exn:fail:contract` exception is raised. If `start-k` or `end-k` are not provided, they default to `0` and `(string-length string)`, respectively.

• `(string->bytes/latin-1` `string` [`err-byte start-k end-k`]`)` -- produces a byte string by encoding the `start-k` to `end-k` substring of `string` using Latin-1; i.e., each character is translated directly to a byte using `char->integer`. If `err-byte` is provided and not `#f`, it is used for each character in `string` whose value is greater than `255`; if `err-byte` is `#f` or not provided, and if the `start-k` to `end-k` substring of `string` has a character with a value greater than `255`, then the `exn:fail:contract` exception is raised. If `start-k` or `end-k` are not provided, they default to `0` and `(string-length string)`, respectively.

• `(string-utf-8-length` `string` [`start-k end-k`]`)` returns the length in bytes of the UTF-8 encoding of `string`'s substring from `start-k` to `end-k`, but without actually generating the encoded bytes. If `start-k` is not provided, it defaults to `0`, and `end-k` defaults to `(string-length string)`.

• `(bytes-utf-8-length` `bytes` [`err-char start-k end-k`]`)` returns the length in characters of the UTF-8 decoding of `bytes`'s substring from `start-k` to `end-k`, but without actually generating the decoded characters. If `start-k` is not provided, it defaults to `0`, and `end-k` defaults to `(bytes-length bytes)`. If `err-char` is `#f` and the substring is not a UTF-8 encoding overall, the result is `#f`. Otherwise, `err-char` is used to resolve decoding errors as in `bytes->string/utf-8`.

• `(bytes-utf-8-ref` `bytes` [`skip-k err-char start-k end-k`]`)` returns the `skip-k`th character in the UTF-8 decoding of `bytes`'s substring from `start-k` to `end-k`, but without actually generating the other decoded characters. If `start-k` is not provided, it defaults to `0`, and `end-k` defaults to `(bytes-length bytes)`. If the substring is not a UTF-8 encoding up to the `skip-k`th character (when `err-char` is `#f`), or if the substring decoding produces fewer than `skip-k` characters, the result is `#f`. If `err-char` is not `#f`, it is used to resolve decoding errors as in `bytes->string/utf-8`.

• `(bytes-utf-8-index` `bytes` [`skip-k err-char start-k end-k`]`)` returns the offset in bytes into `bytes` at which the `skip-k`th character's encoding starts in the UTF-8 decoding of `bytes`'s substring from `start-k` to `end-k` (but without actually generating the other decoded characters). If `start-k` is not provided, it defaults to `0`, and `end-k` defaults to `(bytes-length bytes)`. The result is relative to the start of `bytes`, not to `start-k`. If the substring is not a UTF-8 encoding up to the `skip-k`th character (when `err-char` is `#f`), or if the substring decoding produces fewer than `skip-k` characters, the result is `#f`. If `err-char` is not `#f`, it is used to resolve decoding errors as in `bytes->string/utf-8`.
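The conversion procedures above can be sketched on a two-character string: lambda (`\u03BB`, encoded in UTF-8 as the two bytes 206 and 187) followed by `x` (the single byte 120). The names `str`, `enc`, and `utf-8-checks` are illustrative, and the sketch assumes that `skip-k` counts characters from zero.

```scheme
(define str "\u03BBx")
(define enc (string->bytes/utf-8 str))

(define utf-8-checks
  (list (bytes->list enc)                     ; the raw encoding
        (string-utf-8-length str)             ; bytes needed, without encoding
        (bytes-utf-8-length enc)              ; characters, without decoding
        (string=? (bytes->string/utf-8 enc) str)
        (bytes-utf-8-ref enc 1)               ; skip one character
        (bytes-utf-8-index enc 1)             ; byte offset where it starts
        (bytes-utf-8-length (bytes 255))      ; invalid, and err-char is #f
        (bytes->string/utf-8 (bytes 255) #\?) ; err-char replaces the bad byte
        (equal? (string->bytes/latin-1 "\u03BB" 63) #"?"))) ; 63 is ASCII ?
;; utf-8-checks => ((206 187 120) 3 2 #t #\x 2 #f "?" #t)
```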

A string converter can be used to convert directly from one byte-string encoding of characters to another byte-string encoding.

• `(bytes-open-converter` `from-name-string to-name-string``)` -- produces a string converter to go from the encoding named by `from-name-string` to the encoding named by `to-name-string`. If the requested conversion pair is not available, `#f` is returned instead of a converter.

Three encodings are always available in certain positions:

• `"UTF-8"` as either `from` or `to` -- the UTF-8 encoding.

• `"UTF-8-permissive"` as `from` with `"UTF-8"` as `to` -- the UTF-8 encoding where encoding errors are tolerated, producing the same result as `(char->integer #\?)` for bytes that are not part of a valid encoding sequence. (This handling of invalid sequences is consistent with the interpretation of port byte streams into characters; see section 11.1.)

• `""` as either `from` or `to` -- the current locale's default encoding (see section 1.2.2).

A newly opened byte converter is registered with the current custodian (see section 9.2), so that the converter is closed when the custodian is shut down. A converter is not registered with a custodian (and does not need to be closed) if it is one of the guaranteed combinations involving only `"UTF-8"` and `"UTF-8-permissive"` under Unix, or if it is any of the guaranteed combinations (including `""`) under Windows and Mac OS X.

The set of available encodings and combinations varies by platform, depending on the iconv library that is installed. Under Windows, iconv.dll or libiconv.dll must be in the user's path or the current executable's directory at run time, and iconv.dll or libiconv.dll must link to msvcrt.dll for _errno; otherwise, only the guaranteed combinations are available.

• `(bytes-close-converter` `bytes-converter``)` -- closes the given converter, so that it can no longer be used with `bytes-convert` or `bytes-convert-end`.

• `(bytes-convert` `bytes-converter src-bytes` [`src-start-k src-end-k dest-bytes dest-start-k dest-end-k`]`)` converts the bytes from `src-start-k` to `src-end-k` in `src-bytes`. If `dest-bytes` is supplied and not `#f`, the converted bytes are written into `dest-bytes` from `dest-start-k` to `dest-end-k`. If `dest-bytes` is not supplied or is `#f`, then a newly allocated byte string holds the conversion results, and the size of the result byte string is no more than `(- dest-end-k dest-start-k)`.

If `src-start-k` or `dest-start-k` is not provided, it defaults to `0`. If `src-end-k` is not provided, it defaults to `(bytes-length src-bytes)`. If `dest-end-k` is not provided or is `#f`, then it defaults to `(bytes-length dest-bytes)` when `dest-bytes` is a byte string or to an arbitrarily large integer otherwise.

The result of `bytes-convert` is three values:

• `result-bytes` or `dest-wrote-k` -- a byte string if `dest-bytes` is `#f` or not provided, or the number of bytes written into `dest-bytes` otherwise.

• `src-read-k` -- the number of bytes successfully converted from `src-bytes`.

• `'complete`, `'continues`, `'aborts`, or `'error` -- indicates how conversion terminated.

• `'complete`: The entire input was processed, and `src-read-k` will be equal to `(- src-end-k src-start-k)`.

• `'continues`: Conversion stopped due to the limit on the result size or the space in `dest-bytes`; in this case, fewer than `(- dest-end-k dest-start-k)` bytes may be returned if more space is needed to process the next complete encoding sequence in `src-bytes`.

• `'aborts`: The input stopped part-way through an encoding sequence, and more input bytes are necessary to continue. For example, if the last byte of input is `#o303` for a `"UTF-8-permissive"` decoding, the result is `'aborts`, because another byte is needed to determine how to use the `#o303` byte.

• `'error`: The bytes starting at position `(+ src-start-k src-read-k)` in `src-bytes` do not form a legal encoding sequence. This result is never produced for some encodings, where all byte sequences are valid encodings. For example, since `"UTF-8-permissive"` handles an invalid UTF-8 sequence by dropping characters or generating "?", every byte sequence is effectively valid.

Applying a converter accumulates state in the converter (even when the third result of `bytes-convert` is `'complete`). This state can affect both further processing of input and further generation of output, but only for conversions that involve "shift sequences" to change modes within a stream. To terminate an input sequence and reset the converter, use `bytes-convert-end`.

• `(bytes-convert-end` `bytes-converter` [`dest-bytes dest-start-k dest-end-k`]`)` -- like `bytes-convert`, but instead of converting bytes, this procedure generates an ending sequence for the conversion (sometimes called a "shift sequence"), if any. Few encodings use shift sequences, so this function will succeed with no output for most encodings. In any case, successful output of a (possibly empty) shift sequence resets the converter to its initial state.

The result of `bytes-convert-end` is two values:

• `result-bytes` or `dest-wrote-k` -- a byte string if `dest-bytes` is `#f` or not provided, or the number of bytes written into `dest-bytes` otherwise.

• `'complete` or `'continues` -- indicates whether conversion completed. If `'complete`, then an entire ending sequence was produced. If `'continues`, then the conversion could not complete due to the limit on the result size or the space in `dest-bytes`, and the first result is either an empty byte string or `0`.

• `(bytes-converter?` `v``)` returns `#t` if `v` is a byte converter produced by `bytes-open-converter`, `#f` otherwise.

• `(locale-string-encoding``)` returns a string for the current locale's encoding (i.e., the encoding normally identified by `""`). See also `system-language+country` in section 15.5.
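The converter procedures above can be combined as in the following minimal sketch. It assumes that the `"UTF-8-permissive"` to `"UTF-8"` conversion is available on the current platform, and all variable names are illustrative only.

```scheme
;; Assumption: this conversion pair is supported on the current platform.
(define conv (bytes-open-converter "UTF-8-permissive" "UTF-8"))

;; A complete input: all three source bytes are consumed.
(define-values (out-bytes src-read-k status)
  (bytes-convert conv #"abc"))
;; out-bytes is #"abc", src-read-k is 3, status is 'complete

;; #o303 begins a multi-byte UTF-8 sequence, so a lone #o303 is an
;; incomplete input and conversion reports 'aborts.
(define-values (partial-bytes partial-read-k partial-status)
  (bytes-convert conv (bytes #o303)))

;; Terminate the input sequence and reset the converter.
(define-values (end-bytes end-status)
  (bytes-convert-end conv))

;; The converter can no longer be used after this call.
(bytes-close-converter conv)
```

Checking the status symbol before using the first result is the important habit here: only `'complete` guarantees that every source byte was consumed.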

## 3.7  Symbols

For information about symbol parsing and printing, see section 11.2.4 and section 11.2.5, respectively.

MzScheme provides two ways of generating an uninterned symbol, i.e., a symbol that is not `eq?`, `eqv?`, or `equal?` to any other symbol, although it may print the same as another symbol:

• `(string->uninterned-symbol` `string``)` is like `(string->symbol string)`, but the resulting symbol is a new uninterned symbol. Calling `string->uninterned-symbol` twice with the same `string` returns two distinct symbols.

• `(gensym` [`symbol/string`]`)` creates an uninterned symbol with an automatically-generated name. The optional `symbol/string` argument is a prefix symbol or string.
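A short sketch of how uninterned symbols behave; the names `s1`, `s2`, and `g` are illustrative only:

```scheme
;; Two uninterned symbols made from the same string are distinct.
(define s1 (string->uninterned-symbol "apple"))
(define s2 (string->uninterned-symbol "apple"))

(eq? s1 s2)        ; => #f
(eq? s1 'apple)    ; => #f, the interned symbol is a different value
(string=? (symbol->string s1) (symbol->string s2)) ; => #t, same name

;; gensym accepts an optional prefix; the generated name is automatic.
(define g (gensym "tmp"))
(symbol? g)                                  ; => #t
(eq? g (string->symbol (symbol->string g)))  ; => #f, g is uninterned
```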

Regular (interned) symbols are only weakly held by the internal symbol table. This weakness can never affect the result of an `eq?`, `eqv?`, or `equal?` test, but a symbol may disappear when placed into a weak box (see section 13.1), used as the key in a weak hash table (see section 3.14), or used as an ephemeron key (see section 13.2).

## 3.8  Keywords

A symbol-like datum that starts with a hash and colon ("#:") is parsed as a keyword constant. Keywords behave like symbols -- two keywords are `eq?` if and only if they print the same -- but they are a distinct set of values.

• `(keyword?` `v``)` returns `#t` if `v` is a keyword, `#f` otherwise.

• `(keyword->string` `keyword``)` returns a string for the `display`ed form of `keyword`, not including the leading `#:`.

• `(string->keyword` `string``)` returns a keyword whose `display`ed form is the same as that of `string`, but with a leading `#:`.

Like symbols, keywords are only weakly held by the internal keyword table; see section 3.7 for more information.
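The keyword procedures can be exercised as in the following sketch. The keywords are written quoted here as a precaution, so that the example does not depend on how an unquoted keyword is treated in expression position.

```scheme
(keyword? '#:size)          ; => #t
(keyword? 'size)            ; => #f, a symbol is not a keyword

;; The string form omits the leading #:
(keyword->string '#:size)   ; => "size"

;; Keywords that print the same are eq?
(eq? (string->keyword "size") '#:size)  ; => #t
```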

## 3.9  Vectors

When a vector is created with `make-vector` without a fill value, it is initialized with `0` in all positions. A vector can be immutable, such as a vector returned by `syntax-e`, but vectors generated by `read` are mutable. (See also `immutable?` in section 3.10.)

`(vector->immutable-vector` `vec``)` returns an immutable vector with the same content as `vec`, and it returns `vec` itself if `vec` is immutable. (See also `immutable?` in section 3.10.)

`(vector-immutable` `v` ···1`)` is like `(vector v ···1)` except that the resulting vector is immutable. (See also `immutable?` in section 3.10.)
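The following sketch illustrates the default fill value and the immutability predicates described above; the name `v` is illustrative only:

```scheme
;; Without a fill value, every position holds 0.
(make-vector 3)                       ; => #(0 0 0)

(define v (vector-immutable 1 2 3))
(immutable? v)                        ; => #t
(immutable? (vector 1 2 3))           ; => #f

;; An already-immutable vector is returned as-is.
(eq? v (vector->immutable-vector v))  ; => #t

;; A mutable vector yields an immutable vector with the same content.
(equal? (vector->immutable-vector (vector 1 2 3)) v)  ; => #t
```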

## 3.10  Lists

A cons cell can be mutable or immutable. When an immutable cons cell is provided to a procedure like `set-cdr!`, the `exn:fail:contract` exception is raised. Cons cells generated by `read` are always mutable.

The global variable `null` is bound to the empty list.

`(reverse!` `list``)` is the same as `(reverse list)`, but `list` is destructively reversed using `set-cdr!` (i.e., each cons cell in `list` is mutated).

`(append!` `list` ···1`)` is like `(append list ···1)`, but it destructively appends the `list`s (i.e., except for the last `list`, the last cons cell of each `list` is mutated to append the lists; empty lists are essentially dropped).

`(list*` `v` ···1`)` is similar to `(list v ···1)` but the last argument is used directly as the `cdr` of the last pair constructed for the list:

```
(list* 1 2 3 4) ; => '(1 2 3 . 4)
```

`(cons-immutable` `v1 v2``)` returns an immutable pair whose `car` is `v1` and `cdr` is `v2`.

`(list-immutable` `v` ···1`)` is like `(list v ···1)`, but using immutable pairs.

`(list*-immutable` `v` ···1`)` is like `(list* v ···1)`, but using immutable pairs.

`(immutable?` `v``)` returns `#t` if `v` is an immutable cons cell, string, vector, box, or hash table, `#f` otherwise.
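As a sketch of the mutable/immutable distinction and of destructive appending, consider the following. The handler result `'rejected` and the variable names are illustrative choices only.

```scheme
(define p (cons-immutable 1 2))
(immutable? p)            ; => #t
(immutable? (cons 1 2))   ; => #f

;; Mutating an immutable pair raises exn:fail:contract;
;; the handler turns the exception into a symbol.
(define set-cdr-result
  (with-handlers ([exn:fail:contract? (lambda (e) 'rejected)])
    (set-cdr! p 3)
    'mutated))
;; set-cdr-result is 'rejected

;; append! mutates the last pair of every list but the last one.
(define a (list 1 2))
(define b (list 3 4))
(define joined (append! a b))
;; joined is (1 2 3 4), and so is a, because the last pair of a
;; now points at b
```

Because `append!` reuses the pairs of its arguments, it should only be applied to lists that no other part of the program expects to remain unchanged.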

The `list-ref` and `list-tail` procedures accept an improper list as a first argument. If either procedure is applied to an improper list and an index that would require taking the `car` or `cdr` of a non-cons-cell, the `exn:fail:contract` exception is raised.

The `member`, `memv`, and `memq` procedures accept an improper list as a second argument. If the membership search reaches the improper tail, the `exn:fail:contract` exception is raised.

The `assoc`, `assv`, and `assq` procedures accept an improperly formed association list as a second argument. If the association search reaches an improper list tail or a list element that is not a pair, the `exn:fail:contract` exception is raised.

## 3.11  Boxes

MzScheme provides boxes, which are records that have a single field:

• `(box` `v``)` returns a new mutable box that contains `v`.

• `(box-immutable` `v``)` returns a new immutable box that contains `v`.

• `(unbox` `box``)` returns the content of `box`. For any `v`, `(unbox (box v))` returns `v`.

• `(set-box!` `mutable-box v``)` sets the content of `mutable-box` to `v`.

• `(box?` `v``)` returns `#t` if `v` is a box, `#f` otherwise.

Two boxes are `equal?` if the contents of the boxes are `equal?`.

A box returned by `syntax-e` (see section 12.2.2) is immutable; if `set-box!` is applied to such a box, the `exn:fail:contract` exception is raised. A box produced by `read` (via `#&`) is mutable. (See also `immutable?` in section 3.10.)
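The box operations can be sketched as follows; the names and the handler result `'rejected` are illustrative only:

```scheme
(define b (box 1))
(set-box! b 2)
(unbox b)                        ; => 2

;; Boxes are compared by content under equal?, by identity under eq?.
(equal? (box "a") (box "a"))     ; => #t
(eq? (box "a") (box "a"))        ; => #f

;; set-box! requires a mutable box.
(define ib (box-immutable 'x))
(immutable? ib)                  ; => #t
(define set-box-result
  (with-handlers ([exn:fail:contract? (lambda (e) 'rejected)])
    (set-box! ib 'y)
    'mutated))
;; set-box-result is 'rejected, and ib still contains 'x
```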

## 3.12  Procedures

See section 4.6 for information on defining new procedure types.

### 3.12.1  Arity

MzScheme's `procedure-arity` procedure returns the input arity of a procedure:

• `(procedure-arity` `proc``)` returns information about the number of arguments accepted by the procedure `proc`. The result `a` is either:

• an exact non-negative integer ==> the procedure always takes exactly `a` arguments;

• an `arity-at-least` instance (see footnote 10) ==> the procedure takes `(arity-at-least-value a)` or more arguments; or

• a list containing integers and `arity-at-least` instances ==> the procedure takes any number of arguments that can match one of the arities in the list.

• `(procedure-arity-includes?` `proc k``)` returns `#t` if the procedure can accept `k` arguments (where `k` is an exact, non-negative integer), `#f` otherwise.

Examples:

```
(procedure-arity cons) ; => 2
(procedure-arity list) ; => #<struct:arity-at-least>
(arity-at-least? (procedure-arity list)) ; => #t
(arity-at-least-value (procedure-arity list)) ; => 0
(arity-at-least-value (procedure-arity (lambda (x . y) x))) ; => 1
(procedure-arity (case-lambda [(x) 0] [(x y) 1])) ; => '(1 2)
(procedure-arity-includes? cons 2) ; => #t
(procedure-arity-includes? display 3) ; => #f
```

When compiling a `lambda` or `case-lambda` expression, MzScheme looks for a `'method-arity-error` property attached to the expression (see section 12.6.2). If it is present with a true value, and if no case of the procedure accepts zero arguments, then the procedure is marked so that an `exn:fail:contract:arity` exception involving the procedure will hide the first argument, if one was provided. (Hiding the first argument is useful when the procedure implements a method, where the first argument is implicit in the original source.) The property affects only the format of `exn:fail:contract:arity` exceptions, not the result of `procedure-arity`.

### 3.12.2  Primitives

A primitive procedure is a built-in procedure that is implemented in a low-level language. Not all built-in procedures are primitives, but almost all R5RS procedures are primitives, as are most of the procedures described in this manual.

• `(primitive?` `v``)` returns `#t` if `v` is a primitive procedure or `#f` otherwise.

• `(primitive-result-arity` `prim-proc``)` returns the arity of the result of the primitive procedure `prim-proc` (as opposed to the procedure's input arity as returned by `procedure-arity`). For most primitives, this procedure returns `1`, since most primitives return a single value when applied. For information about arity values, see section 3.12.1.

• `(primitive-closure?` `v``)` returns `#t` if `v` is internally implemented as a primitive closure rather than a simple primitive procedure, `#f` otherwise. This information is intended for use by the mzc compiler.

### 3.12.3  Procedure Names

See section 6.2.4 for information about the names of primitives, and the names inferred for `lambda` and `case-lambda` procedures.

## 3.13  Promises

The `force` procedure can only be applied to values returned by `delay`, and promises are never implicitly `force`d.

`(promise?` `v``)` returns `#t` if `v` is a promise created by `delay`, `#f` otherwise.

## 3.14  Hash Tables

`(make-hash-table` [`flag-symbol flag-symbol`]`)` creates and returns a new hash table. If provided, each `flag-symbol` must be one of the following:

• `'weak` -- creates a hash table with weakly-held keys (see section 13.1).

• `'equal` -- creates a hash table that compares keys using `equal?` instead of `eq?` (needed, for example, when using strings as keys).

By default, key comparisons use `eq?`. If the second `flag-symbol` is redundant, the `exn:fail:contract` exception is raised.

Two hash tables are `equal?` if they are created with the same flags, and if they map the same keys to `equal?` values (where "same key" means either `eq?` or `equal?`, depending on the way the hash table compares keys).

`(make-immutable-hash-table` `assoc-list` [`flag-symbol`]`)` creates an immutable hash table. (See also `immutable?` in section 3.10.) The `assoc-list` must be a list of pairs, where the `car` of each pair is a key, and the `cdr` is the corresponding value. The mappings are added to the table in the order that they appear in `assoc-list`, so later mappings can hide earlier mappings. If the optional `flag-symbol` argument is provided, it must be `'equal`, and the created hash table compares keys with `equal?`; otherwise, the created table compares keys with `eq?`.

`(hash-table?` `v` [`flag-symbol flag-symbol`]`)` returns `#t` if `v` was created by `make-hash-table` or `make-immutable-hash-table` with the given `flag-symbol`s (or more), `#f` otherwise. Each provided `flag-symbol` must be a distinct flag supported by `make-hash-table`; if the second `flag-symbol` is redundant, the `exn:fail:contract` exception is raised.

`(hash-table-put!` `hash-table key-v v``)` maps `key-v` to `v` in `hash-table`, overwriting any existing mapping for `key-v`. If `hash-table` is immutable, the `exn:fail:contract` exception is raised.

`(hash-table-get` `hash-table key-v` [`failure-thunk`]`)` returns the value for `key-v` in `hash-table`. If no value is found for `key-v`, then the result of invoking `failure-thunk` (a procedure of no arguments) is returned. If `failure-thunk` is not provided, the `exn:fail:contract` exception is raised when no value is found for `key-v`.

`(hash-table-remove!` `hash-table key-v``)` removes the value mapping for `key-v` if it exists in `hash-table`. If `hash-table` is immutable, the `exn:fail:contract` exception is raised.

`(hash-table-map` `hash-table proc``)` applies the procedure `proc` to each element in `hash-table`, accumulating the results into a list. The procedure `proc` must take two arguments: a key and its value. See the caveat below about concurrent modification.

`(hash-table-for-each` `hash-table proc``)` applies the procedure `proc` to each element in `hash-table` (for the side-effects of `proc`) and returns void. The procedure `proc` must take two arguments: a key and its value. See the caveat below about concurrent modification.

`(hash-table-count` `hash-table``)` returns the number of keys mapped by `hash-table`. If `hash-table` is not created with `'weak`, then the result is computed in constant time and atomically. If `hash-table` is created with `'weak`, see the caveat below about concurrent modification.

`(hash-table-copy` `hash-table``)` returns a mutable hash table with the same mappings, same key-comparison mode, and same key-holding strength as `hash-table`.
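The basic hash table operations can be sketched as follows. The table contents and variable names are illustrative only; an `'equal` table is used because the keys are strings.

```scheme
(define ht (make-hash-table 'equal))
(hash-table-put! ht "apple" 1)
(hash-table-put! ht "banana" 2)
(hash-table-put! ht "apple" 10)     ; overwrites the earlier mapping

(hash-table-get ht "apple")                         ; => 10
(hash-table-get ht "cherry" (lambda () 'missing))   ; => missing
(hash-table-count ht)                               ; => 2

(hash-table-remove! ht "banana")
(hash-table-count ht)                               ; => 1

(hash-table? ht 'equal)   ; => #t
(hash-table? ht 'weak)    ; => #f, ht was not created with 'weak

;; In an immutable table, a later mapping hides an earlier one.
(define frozen (make-immutable-hash-table '((a . 1) (b . 2) (a . 3))))
(hash-table-get frozen 'a)   ; => 3
(immutable? frozen)          ; => #t
```

Supplying a `failure-thunk` is usually preferable to catching `exn:fail:contract` when a missing key is an expected case.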

`(eq-hash-code` `v``)` returns an exact integer; for any two `eq?` values, the returned integer is the same. Furthermore, for the result integer `k` and any other exact integer `j`, `(= k j)` implies `(eq? k j)`.

`(equal-hash-code` `v``)` returns an exact integer; for any two `equal?` values, the returned integer is the same. Furthermore, for the result integer `k` and any other exact integer `j`, `(= k j)` implies `(eq? k j)`. If `v` contains a cycle through pairs, vectors, boxes, and inspectable structure fields, then `equal-hash-code` applied to `v` will loop indefinitely.

Caveat concerning concurrent modification: A hash table can be manipulated with `hash-table-get`, `hash-table-put!`, and `hash-table-remove!` concurrently by multiple threads, and the operations are protected by a table-specific semaphore as needed. A few caveats apply, however:

• If a thread is terminated while applying `hash-table-get`, `hash-table-put!`, or `hash-table-remove!` to a hash table that uses `equal?` comparisons, all current and future operations on the hash table block indefinitely.

• The `hash-table-map`, `hash-table-for-each`, and `hash-table-count` procedures do not use the table's semaphore. Consequently, if a hash table is extended with new keys by another thread while a map, for-each, or count is in progress, arbitrary key-value pairs can be dropped or duplicated in the map or for-each. Similarly, if a map or for-each procedure itself extends the table, arbitrary key-value pairs can be dropped or duplicated. However, key mappings can be deleted or remapped by any thread with no adverse effects (i.e., the change does not affect a traversal if the key has been seen already, otherwise the traversal skips a deleted key or uses the remapped key's new value).

Caveat concerning mutable keys: If a key into an `equal?`-based hash table is mutated (e.g., a key string is modified with `string-set!`), then the hash table's behavior for put and get operations becomes unpredictable.
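One way to avoid extending a table from inside its own traversal is to snapshot the mappings first and add new keys afterwards. The following is only a single-threaded sketch of that idea, with illustrative names; it does not protect against other threads that extend the table concurrently.

```scheme
(define table (make-hash-table))
(hash-table-put! table 'a 1)
(hash-table-put! table 'b 2)

;; Step 1: traverse without modifying the table, collecting the
;; mappings into an ordinary list of pairs.
(define snapshot
  (hash-table-map table (lambda (key value) (cons key value))))

;; Step 2: extend the table while iterating over the list, not the table.
(for-each
 (lambda (pair)
   (hash-table-put! table
                    (string->symbol
                     (string-append (symbol->string (car pair)) "-copy"))
                    (cdr pair)))
 snapshot)

(hash-table-count table)          ; => 4
(hash-table-get table 'a-copy)    ; => 1
```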

4 30 bits for a 32-bit architecture, 62 bits for a 64-bit architecture.

5 This definition of `eqv?` technically contradicts R5RS, but R5RS does not address strange "numbers" like `+nan.0`.

6 The random number generator uses a 54-bit version of L'Ecuyer's MRG32k3a algorithm.

7 The current version of MzScheme uses Unicode version 4.1.

10 The `arity-at-least` structure type is transparent to all inspectors (see section 4.5).