Common Lisp the Language, 2nd Edition

22.1.1. What the Read Function Accepts

The purpose of the Lisp reader is to accept characters, interpret them as the printed representation of a Lisp object, and construct and return such an object. The reader cannot accept everything that the printer produces; for example, the printed representations of compiled code objects cannot be read in. However, the reader has many features that are not used by the output of the printer at all, such as comments, alternative representations, and convenient abbreviations for frequently used but unwieldy constructs. The reader is also parameterized in such a way that it can be used as a lexical analyzer for a more general user-written parser.
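A few of the reader-only conveniences mentioned above can be observed with read-from-string. The following is a minimal sketch, assuming standard syntax (bound here with with-standard-io-syntax); the helper name read-standard is hypothetical and exists only for this illustration.

```lisp
;; Hypothetical helper: read one object from STRING using standard syntax.
(defun read-standard (string)
  (with-standard-io-syntax
    (values (read-from-string string))))

;; A comment is skipped; only the object after it is returned.
(read-standard (format nil "; a comment~%42"))   ; the integer 42

;; 'x is an abbreviation for (quote x).
(read-standard "'x")                             ; the list (QUOTE X)

;; #x1F is an alternative (hexadecimal) representation of the integer 31.
(read-standard "#x1F")                           ; the integer 31
```

None of these three input forms would be produced by the printer for the objects they denote.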

The reader is organized as a recursive-descent parser. Broadly speaking, the reader operates by reading a character from the input stream and treating it in one of three ways. Whitespace characters serve as separators but are otherwise ignored. Constituent and escape characters are accumulated to make a token, which is then interpreted as a number or symbol. Macro characters trigger the invocation of functions (possibly user-supplied) that can perform arbitrary parsing actions, including recursive invocation of the reader.

More precisely, when the reader is invoked, it reads a single character from the input stream and dispatches according to the syntactic type of that character. Every character that can appear in the input stream must be of exactly one of the following kinds: illegal, whitespace, constituent, single escape, multiple escape, or macro. Macro characters are further divided into the types terminating and non-terminating (of tokens). (Note that macro characters have nothing whatever to do with macros in their operation. There is a superficial similarity in that macros allow the user to extend the syntax of Common Lisp at the level of forms, while macro characters allow the user to extend the syntax at the level of characters.) Constituents additionally have one or more attributes, the most important of which is alphabetic; these attributes are discussed further in section 22.1.2.

The parsing of Common Lisp expressions is discussed in terms of these syntactic character types because the types of individual characters are not fixed but may be altered by the user (see set-syntax-from-char and set-macro-character). The characters of the standard character set initially have the syntactic types shown in table 22-1. Note that the brackets, braces, question mark, and exclamation point (that is, [, ], {, }, ?, and !) are normally defined to be constituents, but they are not used for any purpose in standard Common Lisp syntax and do not occur in the names of built-in Common Lisp functions or variables. These characters are explicitly reserved to the user. The primary intent is that they be used as macro characters; but a user might choose, for example, to make ! be a single escape character (as it is in Portable Standard Lisp).
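As an illustration of altering a character's syntactic type, the sketch below makes ! a single escape character by copying the syntax of \ with set-syntax-from-char. It works on a copy of the standard readtable so that the global readtable is unaffected. The names *bang-escape-readtable* and token-name are hypothetical, introduced only for this example.

```lisp
;; (copy-readtable nil) returns a fresh copy of the standard readtable.
(defparameter *bang-escape-readtable* (copy-readtable nil))

;; Give ! the syntax that \ has in the standard readtable (single escape).
(set-syntax-from-char #\! #\\ *bang-escape-readtable*)

;; Hypothetical helper: name of the symbol read from STRING under READTABLE.
(defun token-name (string readtable)
  (let ((*readtable* readtable)
        (*package* (find-package "COMMON-LISP-USER")))
    (symbol-name (read-from-string string))))

(token-name "a!bc" *bang-escape-readtable*)   ; "AbC": the b was escaped
(token-name "a!bc" (copy-readtable nil))      ; "A!BC": ! is a constituent
```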

----------------------------------------------------------------
Table 22-1: Standard Character Syntax Types

<tab> whitespace          <page> whitespace <newline> whitespace 
<space> whitespace        @ constituent     ` terminating macro 
! constituent *           A constituent     a constituent 
" terminating macro       B constituent     b constituent 
# non-terminating macro   C constituent     c constituent 
$ constituent             D constituent     d constituent 
% constituent             E constituent     e constituent 
& constituent             F constituent     f constituent 
' terminating macro       G constituent     g constituent 
( terminating macro       H constituent     h constituent 
) terminating macro       I constituent     i constituent 
* constituent             J constituent     j constituent 
+ constituent             K constituent     k constituent 
, terminating macro       L constituent     l constituent 
- constituent             M constituent     m constituent 
. constituent             N constituent     n constituent 
/ constituent             O constituent     o constituent 
0 constituent             P constituent     p constituent 
1 constituent             Q constituent     q constituent 
2 constituent             R constituent     r constituent 
3 constituent             S constituent     s constituent 
4 constituent             T constituent     t constituent 
5 constituent             U constituent     u constituent 
6 constituent             V constituent     v constituent 
7 constituent             W constituent     w constituent 
8 constituent             X constituent     x constituent 
9 constituent             Y constituent     y constituent 
: constituent             Z constituent     z constituent 
; terminating macro       [ constituent *   { constituent * 
< constituent             \ single escape   | multiple escape 
= constituent             ] constituent *   } constituent * 
> constituent             ^ constituent     ~ constituent 
? constituent *           _ constituent     <rubout> constituent 
<bkspace> constituent     <return> whitespace <linefeed> whitespace

The characters marked with an asterisk are initially constituents
but are reserved to the user for use as macro characters or for
any other desired purpose.
----------------------------------------------------------------
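Part of the table can be checked from within Lisp. The sketch below uses get-macro-character, which returns the macro function (or nil) and a second value indicating whether the character is non-terminating. It can only tell macro characters from everything else; it does not distinguish whitespace, constituent, and escape characters. The helper name macro-character-kind is hypothetical.

```lisp
;; Hypothetical helper: classify CHAR in a copy of the standard readtable,
;; using only the two values returned by get-macro-character.
(defun macro-character-kind (char)
  (multiple-value-bind (function non-terminating-p)
      (get-macro-character char (copy-readtable nil))
    (cond ((null function) :not-a-macro-character)
          (non-terminating-p :non-terminating-macro)
          (t :terminating-macro))))

(macro-character-kind #\()   ; :TERMINATING-MACRO
(macro-character-kind #\#)   ; :NON-TERMINATING-MACRO
(macro-character-kind #\A)   ; :NOT-A-MACRO-CHARACTER
```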

The algorithm performed by the Common Lisp reader is roughly as follows:

1. If at end of file, perform end-of-file processing (as specified by the caller of the read function). Otherwise, read one character from the input stream, call it x, and dispatch according to the syntactic type of x to one of steps 2 to 7.
2. If x is an illegal character, signal an error.
3. If x is a whitespace character, then discard it and go back to step 1.
4. If x is a macro character (at this point the distinction between terminating and non-terminating macro characters does not matter), then execute the function associated with that character. The function may return zero values or one value (see values).

   The macro-character function may of course read characters from the input stream; if it does, it will see those characters following the macro character. The function may even invoke the reader recursively. This is how the macro character ( constructs a list: by invoking the reader recursively to read the elements of the list.

   If one value is returned, then return that value as the result of the read operation; the algorithm is done. If zero values are returned, then go back to step 1.
5. If x is a single escape character (normally \), then read the next character and call it y (but if at end of file, signal an error instead). Ignore the usual syntax of y and pretend it is a constituent whose only attribute is alphabetic.

   (If y is a lowercase character, leave it alone; do not replace it with the corresponding uppercase character.)

   For the purposes of readtable-case, y is not replaceable.

   Use y to begin a token, and go to step 8.
6. If x is a multiple escape character (normally |), then begin a token (initially containing no characters) and go to step 9.
7. If x is a constituent character, then it begins an extended token. After the entire token is read in, it will be interpreted either as representing a Lisp object such as a symbol or number (in which case that object is returned as the result of the read operation), or as being of illegal syntax (in which case an error is signaled).

   If x is a lowercase character, replace it with the corresponding uppercase character.

   X3J13 voted in June 1989 (READ-CASE-SENSITIVITY) to introduce readtable-case. Consequently, the preceding sentence should be ignored. The case of x should not be altered; instead, x should be regarded as replaceable.

   Use x to begin a token, and go on to step 8.
8. (At this point a token is being accumulated, and an even number of multiple escape characters have been encountered.) If at end of file, go to step 10. Otherwise, read a character (call it y), and perform one of the following actions according to its syntactic type:
   - If y is a constituent or non-terminating macro, then do the following.

     If y is a lowercase character, replace it with the corresponding uppercase character.

     X3J13 voted in June 1989 (READ-CASE-SENSITIVITY) to introduce readtable-case. Consequently, the preceding sentence should be ignored. The case of y should not be altered; instead, y should be regarded as replaceable.

     Append y to the token being built, and repeat step 8.
   - If y is a single escape character, then read the next character and call it z (but if at end of file, signal an error instead). Ignore the usual syntax of z and pretend it is a constituent whose only attribute is alphabetic.

     (If z is a lowercase character, leave it alone; do not replace it with the corresponding uppercase character.)

     For the purposes of readtable-case, z is not replaceable.

     Append z to the token being built, and repeat step 8.
   - If y is a multiple escape character, then go to step 9.
   - If y is an illegal character, signal an error.
   - If y is a terminating macro character, it terminates the token. First "unread" the character y (see unread-char), then go to step 10.
   - If y is a whitespace character, it terminates the token. First "unread" y if appropriate (see read-preserving-whitespace), then go to step 10.
9. (At this point a token is being accumulated, and an odd number of multiple escape characters have been encountered.) If at end of file, signal an error. Otherwise, read a character (call it y), and perform one of the following actions according to its syntactic type:
   - If y is a constituent, macro, or whitespace character, then ignore the usual syntax of that character and pretend it is a constituent whose only attribute is alphabetic.

     (If y is a lowercase character, leave it alone; do not replace it with the corresponding uppercase character.)

     For the purposes of readtable-case, y is not replaceable.

     Append y to the token being built, and repeat step 9.
   - If y is a single escape character, then read the next character and call it z (but if at end of file, signal an error instead). Ignore the usual syntax of z and pretend it is a constituent whose only attribute is alphabetic.

     (If z is a lowercase character, leave it alone; do not replace it with the corresponding uppercase character.)

     For the purposes of readtable-case, z is not replaceable.

     Append z to the token being built, and repeat step 9.
   - If y is a multiple escape character, then go to step 8.
   - If y is an illegal character, signal an error.
10. An entire token has been accumulated.

    X3J13 voted in June 1989 (READ-CASE-SENSITIVITY) to introduce readtable-case. If the accumulated token is to be interpreted as a symbol, any case conversion of replaceable characters should be performed at this point according to the value of the readtable-case slot of the current readtable (the value of *readtable*).

    Interpret the token as representing a Lisp object and return that object as the result of the read operation, or signal an error if the token is not of legal syntax.

    X3J13 voted in March 1989 (CHARACTER-PROPOSAL) to specify that implementation-defined attributes may be removed from the characters of a symbol token when constructing the print name. It is implementation-dependent which attributes are removed.
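Two consequences of the algorithm above can be observed directly: a macro-character function that returns zero values causes the reader to start over, and case conversion under readtable-case applies only to replaceable (unescaped) characters. The following is a sketch under stated assumptions: the names skip-line, read-with, *bang-comment-readtable*, and *downcase-readtable* are hypothetical, and the choice of ! as a comment character is purely illustrative.

```lisp
;; Hypothetical helper: read one object from STRING under READTABLE.
(defun read-with (readtable string)
  (let ((*readtable* readtable))
    (values (read-from-string string))))

;; A macro-character function that consumes the rest of the line and
;; returns zero values, so the reader goes back to the start of the algorithm.
(defun skip-line (stream char)
  (declare (ignore char))
  (read-line stream nil nil)
  (values))

(defparameter *bang-comment-readtable* (copy-readtable nil))
(set-macro-character #\! #'skip-line nil *bang-comment-readtable*)

(read-with *bang-comment-readtable*
           (format nil "! ignored text~%42"))     ; 42

;; With readtable-case :downcase, unescaped characters are replaceable
;; and are downcased; the escaped B is not replaceable and is kept.
(defparameter *downcase-readtable* (copy-readtable nil))
(setf (readtable-case *downcase-readtable*) :downcase)

(symbol-name (read-with *downcase-readtable* "FOO\\BAR"))   ; "fooBar"
```

Note that the Lisp string "FOO\\BAR" contains the seven characters FOO\BAR; the doubled backslash is string syntax, not reader input.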

As a rule, a single escape character never stands for itself but always serves to cause the following character to be treated as a simple alphabetic character. A single escape character can be included in a token only if preceded by another single escape character.

A multiple escape character also never stands for itself. The characters between a pair of multiple escape characters are all treated as simple alphabetic characters, except that single escape and multiple escape characters must nevertheless be preceded by a single escape character to be included.
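The two escape rules can be tried out as follows. This is a minimal sketch assuming standard syntax; the helper name standard-token-name is hypothetical. Each Lisp string literal doubles its backslashes, so the comment on each line shows the characters the reader actually sees.

```lisp
;; Hypothetical helper: name of the symbol read from STRING
;; using standard syntax.
(defun standard-token-name (string)
  (with-standard-io-syntax
    (symbol-name (read-from-string string))))

;; Reader sees a\bc : the escaped b keeps its case.
(standard-token-name "a\\bc")      ; "AbC"

;; Reader sees a\\b : a single escape included by escaping it.
(standard-token-name "a\\\\b")     ; the three characters A\B

;; Reader sees |a b(c| : space and ( are treated as alphabetic.
(standard-token-name "|a b(c|")    ; "a b(c"

;; Reader sees |a\|b| : a multiple escape must itself be escaped.
(standard-token-name "|a\\|b|")    ; "a|b"
```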

Compatibility note: In MacLisp, the | character is implemented as a macro character that reads characters up to the next unescaped | and then makes a token; no characters are ever read beyond the second | of a matching pair. In Common Lisp, the second | does not terminate the token being read but merely reverts to the ordinary (rather than multiple-escape) mode of token accumulation. This results in some differences in the way certain character sequences are interpreted. For example, the sequence |foo||bar| would be read in MacLisp as two distinct tokens, |foo| and |bar|, whereas in Common Lisp it would be treated as a single token equivalent to |foobar|. The sequence |foo|bar|baz| would be read in MacLisp as three distinct tokens, |foo|, bar, and |baz|, whereas in Common Lisp it would be treated as a single token equivalent to |fooBARbaz|; note that the middle three lowercase letters are converted to uppercase letters as they do not fall within a matching pair of vertical bars.

One reason for the different treatment of | in Common Lisp lies in the syntax for package-qualified symbol names. A sequence such as |foo:bar| ought to be interpreted as a symbol whose name is foo:bar; the colon should be treated as a simple alphabetic character because it lies within a pair of vertical bars. The symbol |bar| within the package |foo| can be notated not as |foo:bar| but as |foo|:|bar|; the colon can serve as a package marker because it falls outside the vertical bars, and yet the notation is treated as a single token thanks to the new rules adopted in Common Lisp.
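The Common Lisp behavior described in this note can be confirmed with a short sketch, assuming standard syntax. The helper name read-symbol is hypothetical, and the COMMON-LISP package is used for the package-qualified case only because it is guaranteed to exist.

```lisp
;; Hypothetical helper: read one object from STRING using standard syntax.
(defun read-symbol (string)
  (with-standard-io-syntax
    (values (read-from-string string))))

;; Adjacent vertical-bar pairs form one token, not two.
(symbol-name (read-symbol "|foo||bar|"))        ; "foobar"

;; Characters between the pairs are ordinary constituents and are upcased.
(symbol-name (read-symbol "|foo|bar|baz|"))     ; "fooBARbaz"

;; A colon inside vertical bars is just an alphabetic character.
(symbol-name (read-symbol "|foo:bar|"))         ; "foo:bar"

;; A colon outside vertical bars is a package marker.
(eq (read-symbol "|COMMON-LISP|:|CAR|") 'car)   ; T
```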

In MacLisp, the parentheses are treated as additional character types. In Common Lisp they are simply macro characters, as described in section 22.1.3.

What MacLisp calls "single character objects" (tokens of type single) are not provided for explicitly in Common Lisp. They can be viewed as simply a kind of macro character. That is, the effect of

(setsyntax '$ 'single nil) 
(setsyntax '% 'single nil)

in MacLisp can be achieved in Common Lisp by

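(defun single-macro-character (stream char) 
  (declare (ignore stream)) 
  ;; Return the symbol whose name is the macro character itself.
  (intern (string char))) 
;; set-macro-character takes a character object, hence #\$ and #\%.
(set-macro-character #\$ #'single-macro-character) 
(set-macro-character #\% #'single-macro-character)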
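The effect of such a definition can be seen by reading every object from a short string. The sketch below repeats the definition of single-macro-character so that it is self-contained, and installs it in a copy of the standard readtable rather than the global one. The names *single-char-readtable* and read-all-from-string are hypothetical. Note that set-macro-character expects a character object such as #\$.

```lisp
(defun single-macro-character (stream char)
  (declare (ignore stream))
  (intern (string char)))

(defparameter *single-char-readtable* (copy-readtable nil))
(set-macro-character #\$ #'single-macro-character nil *single-char-readtable*)

;; Hypothetical helper: list of all objects read from STRING under READTABLE.
;; The stream itself serves as a unique end-of-file marker.
(defun read-all-from-string (string readtable)
  (let ((*readtable* readtable))
    (with-input-from-string (in string)
      (loop for object = (read in nil in)
            until (eq object in)
            collect object))))

;; As a terminating macro character, $ ends the token a and stands alone.
(mapcar #'symbol-name
        (read-all-from-string "a$b" *single-char-readtable*))   ; ("A" "$" "B")

;; In the standard readtable, $ is an ordinary constituent.
(mapcar #'symbol-name
        (read-all-from-string "a$b" (copy-readtable nil)))      ; ("A$B")
```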
