string.ss: String Utilities

To load: (require (lib "string.ss"))

(eval-string str [err-handler])      PROCEDURE

Reads and evaluates S-expressions from the string str, returning results for all of the expressions in the string. Note that if str contains only whitespace and comments, zero values are returned, and if str contains multiple expressions, the result will contain multiple values from all of the expressions. str can also be a byte string.
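
The multiple-value behavior can be sketched as follows. This is a minimal illustration, assuming the expressions are entered at the MzScheme top level, so that the namespace used for evaluation has the usual bindings such as + and *:

```scheme
(require (lib "string.ss"))

;; A single expression produces a single value.
(eval-string "(+ 1 2)")                      ; => 3

;; Several expressions produce several values; collect them into a
;; list with call-with-values.
(call-with-values
  (lambda () (eval-string "(+ 1 2) (* 3 4)"))
  list)                                      ; => (3 12)

;; Whitespace and comments only: zero values, so the list is empty.
(call-with-values
  (lambda () (eval-string "   ; just a comment"))
  list)                                      ; => ()
```

Because the number of values depends on the content of str, a caller that does not control str is probably safest receiving the results through call-with-values, as above.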

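err-handler can be:

  - #f (the default), which means that errors are not caught;
  - a one-argument procedure, which is applied to the exception when an error occurs, and whose result is returned; or
  - a thunk, which is called to produce a result when an error occurs.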

(expr->string expr)      PROCEDURE

Prints expr into a string and returns the string.

(real->decimal-string n [digits-after-decimal-k])      PROCEDURE

Prints n into a string and returns the string. The printed form of n shows exactly digits-after-decimal-k digits after the decimal point, where digits-after-decimal-k defaults to 2.

Before printing, n is converted to an exact number, multiplied by (expt 10 digits-after-decimal-k), rounded, and then divided again by (expt 10 digits-after-decimal-k). The result of this process is an exact number whose decimal representation has no more than digits-after-decimal-k digits after the decimal point (and it is padded with trailing zeros if necessary). The printed form uses a minus sign if n is negative, and it does not use a plus sign if n is positive.
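
A few sample calls, followed by the rounding step written out by hand, may make the process concrete. The name scale is introduced only for this illustration:

```scheme
(require (lib "string.ss"))

(real->decimal-string 3.14159)      ; => "3.14"  (two digits by default)
(real->decimal-string 2 3)          ; => "2.000" (padded with zeros)
(real->decimal-string -1/3 4)       ; => "-0.3333"

;; The conversion described above, done by hand for n = 3.14159 and
;; two digits after the decimal point:
(define scale (expt 10 2))
(/ (round (* (inexact->exact 3.14159) scale)) scale)   ; => 157/50, i.e. 3.14
```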

(read-from-string str [err-handler])      PROCEDURE

Reads the first S-expression from the string (or byte string) str and returns it. The err-handler is as in eval-string.

(read-from-string-all str [err-handler])      PROCEDURE

Reads all S-expressions from the string (or byte string) str and returns them in a list. The err-handler is as in eval-string.
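
As a short sketch of how the two reading procedures differ, and of how they pair with expr->string (the variable v is only an example value):

```scheme
(require (lib "string.ss"))

;; Only the first S-expression is read; the rest of the string is ignored.
(read-from-string "(a b) c 42")          ; => (a b)

;; All S-expressions are read and returned in a list.
(read-from-string-all "(a b) c 42")      ; => ((a b) c 42)

;; A byte string is accepted as well.
(read-from-string-all #"1 2 3")          ; => (1 2 3)

;; expr->string and read-from-string round-trip for simple data.
(define v '(1 "two" (three 4.5)))
(equal? v (read-from-string (expr->string v)))   ; => #t
```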

(regexp-match* pattern string [start-k end-k])      PROCEDURE

(regexp-match* pattern bytes [start-k end-k])      PROCEDURE

(regexp-match* pattern input-port [start-k end-k])      PROCEDURE

Like regexp-match (see section 10 in PLT MzScheme: Language Manual), but the result is a list of strings or byte strings corresponding to a sequence of matches of pattern in string, bytes, or input-port. (Unlike regexp-match, results for parenthesized sub-patterns in pattern are not returned.) If pattern matches a zero-length string or byte sequence along the way, the exn:fail exception is raised.

If string, bytes, or input-port contains no matches (in the range start-k to end-k), null is returned. Otherwise, each item in the resulting list is a distinct substring or byte sequence from string, bytes, or input-port that matches pattern. The end-k argument can be #f to match to the end of string or bytes or to an end-of-file in input-port.
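
The following calls are a small sketch of the cases described above; the patterns and inputs are arbitrary examples:

```scheme
(require (lib "string.ss"))

(regexp-match* "[0-9]+" "a1b22c333")       ; => ("1" "22" "333")

;; Sub-pattern results are not included, only the whole matches.
(regexp-match* "([a-z])([0-9])" "a1 b2")   ; => ("a1" "b2")

;; Start matching at index 3 and continue to the end of the string.
(regexp-match* "[0-9]+" "a1b22c333" 3)     ; => ("22" "333")

;; No matches: the result is the empty list.
(regexp-match* "x" "abc")                  ; => ()

;; A byte pattern on a byte string yields byte-string results.
(regexp-match* (byte-regexp #"[0-9]+") #"a1b22")   ; => (#"1" #"22")
```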

(regexp-match/fail-without-reading pattern input-port [start-k end-k output-port])      PROCEDURE

Like regexp-match on input ports (see section 10 in PLT MzScheme: Language Manual), except that if the match fails, no characters are read and discarded from input-port.

This procedure is especially useful with a pattern that begins with a start-of-string caret (``^'') or with a non-#f end-k, since each limits the amount of peeking into the port.
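
A minimal sketch of the difference a failed match makes, using a string port named in purely for illustration:

```scheme
(require (lib "string.ss"))

(define in (open-input-string "hello world"))

;; The match fails, so no characters are consumed from the port.
(regexp-match/fail-without-reading "^goodbye" in)   ; => #f
(read-char in)                                      ; => #\h

;; The match succeeds, so the matched text is consumed as usual.
;; Results from a port are byte strings.
(regexp-match/fail-without-reading "^ello" in)      ; => (#"ello")
(read-char in)                                      ; => #\space
```

Both patterns begin with a caret, so each attempt only needs to peek at the start of the remaining input.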

(regexp-match-exact? pattern string)      PROCEDURE

(regexp-match-exact? pattern bytes)      PROCEDURE

(regexp-match-exact? pattern input-port)      PROCEDURE

This procedure is like MzScheme's built-in regexp-match (see section 10 in PLT MzScheme: Language Manual), but the result is always #t or #f; #t is only returned when the entire content of string, bytes, or input-port matches pattern.
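
For instance, a pattern that matches only a prefix of the input is accepted by regexp-match but rejected by regexp-match-exact?. A short sketch with string inputs:

```scheme
(require (lib "string.ss"))

(regexp-match-exact? "[a-z]+" "hello")     ; => #t
(regexp-match-exact? "[a-z]+" "hello42")   ; => #f (only a prefix matches)

;; regexp-match, by contrast, reports the partial match.
(regexp-match "[a-z]+" "hello42")          ; => ("hello")
```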

(regexp-match-peek-positions* pattern input-port [start-k end-k])      PROCEDURE

Like regexp-match-positions*, but it works only on input ports, and the port is peeked instead of read for matches.

(regexp-match-positions* pattern string [start-k end-k])      PROCEDURE

(regexp-match-positions* pattern bytes [start-k end-k])      PROCEDURE

(regexp-match-positions* pattern input-port [start-k end-k])      PROCEDURE

Like regexp-match-positions (see section 10 in PLT MzScheme: Language Manual), but the result is a list of integer pairs corresponding to a sequence of matches of pattern in string, bytes, or input-port. (Unlike regexp-match-positions, results for parenthesized sub-patterns in pattern are not returned.) If pattern matches a zero-length string or byte sequence along the way, the exn:fail exception is raised.

If string, bytes, or input-port contains no matches (in the range start-k to end-k), null is returned. Otherwise, each position pair in the resulting list corresponds to a distinct substring of string that matches pattern, or to a distinct matching byte sequence in bytes, in input-port, or in string (taken as UTF-8 encoded when pattern is a byte pattern). The end-k argument can be #f to match to the end of string or bytes or to an end-of-file in input-port.
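
As a sketch of how the position pairs can be used, the example below extracts each match with substring and then applies the peeking variant to a port. The names s, positions, and in are illustrative only:

```scheme
(require (lib "string.ss"))

(define s "a1b22c333")
(define positions (regexp-match-positions* "[0-9]+" s))
positions                                   ; => ((1 . 2) (3 . 5) (6 . 9))

;; Each pair is a (start . end) range that can be passed to substring.
(map (lambda (p) (substring s (car p) (cdr p))) positions)
                                            ; => ("1" "22" "333")

;; The peeking variant reports positions without consuming the port,
;; so the next character read is still the first one.
(define in (open-input-string s))
(regexp-match-peek-positions* "[0-9]+" in)
(read-char in)                              ; => #\a
```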

(regexp-quote str [case-sensitive?])      PROCEDURE

(regexp-quote bytes [case-sensitive?])      PROCEDURE

Produces a string or byte string suitable for use with regexp (see section 10 in PLT MzScheme: Language Manual) to match the literal sequence of characters in str or sequence of bytes in bytes. If case-sensitive? is true (the default), the resulting regexp matches letters in str or bytes case-sensitively; otherwise, it matches case-insensitively.
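
The effect of quoting can be sketched as follows. The helper contains-literal? is hypothetical, written only for this example, and is not part of string.ss:

```scheme
(require (lib "string.ss"))

;; Unquoted, "+" is a regexp operator: "1+1" means one or more 1s
;; followed by a 1, which does not occur in "1+1=2".
(regexp-match "1+1" "1+1=2")                  ; => #f

;; Quoted, every character is matched literally.
(regexp-match (regexp-quote "1+1") "1+1=2")   ; => ("1+1")

;; Hypothetical helper: search for user-supplied text literally.
(define (contains-literal? text needle)
  (and (regexp-match (regexp-quote needle) text) #t))

(contains-literal? "price: $5.00 (approx.)" "$5.00")   ; => #t
(contains-literal? "price: $5x00" "$5.00")             ; => #f
```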

(regexp-replace-quote str)      PROCEDURE

(regexp-replace-quote bytes)      PROCEDURE

Produces a string suitable for use as the third argument to regexp-replace (see section 10 in PLT MzScheme: Language Manual) to insert the literal sequence of characters in str or bytes in bytes as a replacement. Concretely, every backslash and ampersand in str or bytes is protected by a quoting backslash.
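
A brief sketch of why the quoting matters, using an arbitrary replacement string that contains an ampersand:

```scheme
(require (lib "string.ss"))

;; Unquoted, "&" in the insert string stands for the matched text.
(regexp-replace "NAME" "Hello NAME" "Tom & Jerry")
                                        ; => "Hello Tom NAME Jerry"

;; Quoted, the replacement is inserted literally.
(regexp-replace "NAME" "Hello NAME" (regexp-replace-quote "Tom & Jerry"))
                                        ; => "Hello Tom & Jerry"

;; The quoted form itself: the ampersand gains a protecting backslash.
(regexp-replace-quote "Tom & Jerry")    ; => "Tom \\& Jerry"
```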

(glob->regexp str [hide-dots? case-sensitive? simple?])      PROCEDURE

Produces a regexp for an input ``glob pattern'' in str. A glob pattern is one that matches ``*'' with any string, ``?'' with a single character, and character ranges are the same as in regexps. In addition, the resulting regexp does not match strings that begin with a period, unless the glob string begins with a literal period. The resulting regexp can be used with string file names to check the glob pattern. If the glob pattern is provided as a byte string, the result is a byte regexp.

If hide-dots? is true (the default), the resulting regexp will not match names that begin with a dot.

If case-sensitive? is given, it determines whether the resulting regexp is case-sensitive; otherwise the default case sensitivity depends on the system-type.

Finally, if simple? is provided as #t, then the glob is not expected to contain ranges (if it does, they will be regexp-quoted).
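
As a rough sketch of the default behavior and of the hide-dots? argument, the following checks a few made-up file names against the glob ``*.txt''. The names txt-rx and all-txt-rx are illustrative only:

```scheme
(require (lib "string.ss"))

(define txt-rx (glob->regexp "*.txt"))

(and (regexp-match txt-rx "notes.txt") #t)        ; => #t
(and (regexp-match txt-rx "notes.md") #t)         ; => #f
(and (regexp-match txt-rx ".hidden.txt") #t)      ; => #f (leading dot is hidden)

;; Passing #f for hide-dots? lets the glob match a leading period.
(define all-txt-rx (glob->regexp "*.txt" #f))
(and (regexp-match all-txt-rx ".hidden.txt") #t)  ; => #t
```

The examples avoid names that differ only in letter case, since the default case sensitivity depends on the platform.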

(regexp-split pattern string [start-k end-k])      PROCEDURE

(regexp-split pattern bytes [start-k end-k])      PROCEDURE

(regexp-split pattern input-port [start-k end-k])      PROCEDURE

The complement of regexp-match* (see above): the result is a list of strings or byte strings from string, bytes, or input-port that are separated by matches to pattern; adjacent matches are separated with "" or #"". If pattern matches a zero-length string or byte sequence along the way, the exn:fail exception is raised.

If string, bytes, or input-port contains no matches (in the range start-k to end-k), the result is a list containing string (UTF-8 encoded if pattern is a byte pattern), bytes, or the content of input-port -- from start-k to end-k. If a match occurs at the beginning of string, bytes, or input-port (at start-k), the resulting list will start with an empty string or empty byte string, and if a match occurs at the end (at end-k), the list will end with an empty string or empty byte string. The end-k argument can be #f, in which case splitting goes to the end of string or bytes or to an end-of-file in input-port.
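
The edge cases described above can be sketched with a few arbitrary inputs:

```scheme
(require (lib "string.ss"))

;; Ordinary splitting on runs of spaces.
(regexp-split " +" "split  these   words")   ; => ("split" "these" "words")

;; Matches at the start and end, and adjacent matches, all produce
;; empty strings in the result.
(regexp-split "," ",a,,b,")                  ; => ("" "a" "" "b" "")

;; No match: the result is a list containing the whole input.
(regexp-split "," "no commas here")          ; => ("no commas here")
```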

(string-lowercase! str)      PROCEDURE

Destructively changes str to contain only lowercase characters.

(string-uppercase! str)      PROCEDURE

Destructively changes str to contain only uppercase characters.
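
A minimal sketch of both destructive procedures. It assumes that string literals may be immutable, so the example starts from a fresh mutable copy made with string-copy:

```scheme
(require (lib "string.ss"))

;; Start from a mutable copy, since the procedures modify str in place.
(define s (string-copy "Hello, World"))

(string-lowercase! s)
s                       ; => "hello, world"

(string-uppercase! s)
s                       ; => "HELLO, WORLD"
```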