Common Lisp the Language, 2nd Edition
The purpose of the Lisp reader is to accept characters, interpret them as the printed representation of a Lisp object, and construct and return such an object. The reader cannot accept everything that the printer produces; for example, the printed representations of compiled code objects cannot be read in. However, the reader has many features that are not used by the output of the printer at all, such as comments, alternative representations, and convenient abbreviations for frequently used but unwieldy constructs. The reader is also parameterized in such a way that it can be used as a lexical analyzer for a more general user-written parser.
The reader is organized as a recursive-descent parser. Broadly speaking, the reader operates by reading a character from the input stream and treating it in one of three ways. Whitespace characters serve as separators but are otherwise ignored. Constituent and escape characters are accumulated to make a token, which is then interpreted as a number or symbol. Macro characters trigger the invocation of functions (possibly user-supplied) that can perform arbitrary parsing actions, including recursive invocation of the reader.
More precisely, when the reader is invoked, it reads a single character from the input stream and dispatches according to the syntactic type of that character. Every character that can appear in the input stream must be of exactly one of the following kinds: illegal, whitespace, constituent, single escape, multiple escape, or macro. Macro characters are further divided into the types terminating and non-terminating (of tokens). (Note that macro characters have nothing whatever to do with macros in their operation. There is a superficial similarity in that macros allow the user to extend the syntax of Common Lisp at the level of forms, while macro characters allow the user to extend the syntax at the level of characters.) Constituents additionally have one or more attributes, the most important of which is alphabetic; these attributes are discussed further in section 22.1.2.
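The distinction between terminating and non-terminating macro characters can be observed directly. The following is a minimal sketch that queries the standard readtable with get-macro-character; the helper name macro-character-kind is hypothetical and introduced only for this illustration.

```lisp
;; Hypothetical helper: classify CHAR as seen by the standard readtable.
(defun macro-character-kind (char)
  (multiple-value-bind (function non-terminating-p)
      (get-macro-character char nil)   ; NIL designates the standard readtable
    (cond ((null function) :not-a-macro-character)
          (non-terminating-p :non-terminating-macro)
          (t :terminating-macro))))

(macro-character-kind #\()   ; => :TERMINATING-MACRO
(macro-character-kind #\#)   ; => :NON-TERMINATING-MACRO
(macro-character-kind #\a)   ; => :NOT-A-MACRO-CHARACTER

;; A non-terminating macro character may appear inside a token;
;; a terminating macro character ends the token.
(with-standard-io-syntax
  (list (symbol-name (read-from-string "a#b"))    ; "A#B"
        (symbol-name (read-from-string "a'b"))))  ; "A"
```

In the second read, the reader stops at the quote character and leaves it unread, so only the token a is consumed.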
The parsing of Common Lisp expressions is discussed in terms of these syntactic character types because the types of individual characters are not fixed but may be altered by the user (see set-syntax-from-char and set-macro-character).
The characters of the standard character set initially have the syntactic types shown in table 22-1. Note that the brackets, braces, question mark, and exclamation point (that is, [, ], {, }, ?, and !) are normally defined to be constituents, but they are not used for any purpose in standard Common Lisp syntax and do not occur in the names of built-in Common Lisp functions or variables. These characters are explicitly reserved to the user. The primary intent is that they be used as macro characters; but a user might choose, for example, to make ! be a single escape character (as it is in Portable Standard Lisp).
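Both uses of the reserved characters can be sketched on a private copy of the standard readtable, so that the global readtable is left untouched. The names *user-readtable* and read-with-user-readtable are invented for this example, and reading [ ... ] as a list is only one possible choice of notation.

```lisp
;; A private copy of the standard readtable.
(defparameter *user-readtable* (copy-readtable nil))

;; Make [ a terminating macro character that reads a list up to ].
(set-macro-character
 #\[
 (lambda (stream char)
   (declare (ignore char))
   (read-delimited-list #\] stream t))
 nil
 *user-readtable*)

;; Give ] the syntax of ) so that it terminates tokens such as c in [a b c].
(set-syntax-from-char #\] #\) *user-readtable*)

;; Make ! a single escape character by copying the syntax of \.
(set-syntax-from-char #\! #\\ *user-readtable*)

(defun read-with-user-readtable (string)
  (let ((*readtable* *user-readtable*))
    (read-from-string string)))

(read-with-user-readtable "[a b c]")              ; a three-element list (A B C)
(symbol-name (read-with-user-readtable "a!bc"))   ; "AbC"
```

In the last form the b is preceded by the new single escape character and therefore keeps its case, while a and c are converted as usual.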
----------------------------------------------------------------
Table 22-1: Standard Character Syntax Types

<tab>        whitespace              <page>    whitespace       <newline>   whitespace
<space>      whitespace              @         constituent      `           terminating macro
!            constituent *           A         constituent      a           constituent
"            terminating macro       B         constituent      b           constituent
#            non-terminating macro   C         constituent      c           constituent
$            constituent             D         constituent      d           constituent
%            constituent             E         constituent      e           constituent
&            constituent             F         constituent      f           constituent
'            terminating macro       G         constituent      g           constituent
(            terminating macro       H         constituent      h           constituent
)            terminating macro       I         constituent      i           constituent
*            constituent             J         constituent      j           constituent
+            constituent             K         constituent      k           constituent
,            terminating macro       L         constituent      l           constituent
-            constituent             M         constituent      m           constituent
.            constituent             N         constituent      n           constituent
/            constituent             O         constituent      o           constituent
0            constituent             P         constituent      p           constituent
1            constituent             Q         constituent      q           constituent
2            constituent             R         constituent      r           constituent
3            constituent             S         constituent      s           constituent
4            constituent             T         constituent      t           constituent
5            constituent             U         constituent      u           constituent
6            constituent             V         constituent      v           constituent
7            constituent             W         constituent      w           constituent
8            constituent             X         constituent      x           constituent
9            constituent             Y         constituent      y           constituent
:            constituent             Z         constituent      z           constituent
;            terminating macro       [         constituent *    {           constituent *
<            constituent             \         single escape    |           multiple escape
=            constituent             ]         constituent *    }           constituent *
>            constituent             ^         constituent      ~           constituent
?            constituent *           _         constituent      <rubout>    constituent
<backspace>  constituent             <return>  whitespace       <linefeed>  whitespace

The characters marked with an asterisk are initially constituents but are reserved to the user for use as macro characters or for any other desired purpose.
----------------------------------------------------------------
The algorithm performed by the Common Lisp reader is roughly as follows:
1. If at end of file, perform end-of-file processing (as specified by the caller of the read function). Otherwise, read one character from the input stream, call it x, and dispatch according to the syntactic type of x to one of steps 2 to 7.
2. If x is an illegal character, signal an error.
3. If x is a whitespace character, then discard it and go back to step 1.
4. If x is a macro character (at this point the distinction between terminating and non-terminating macro characters does not matter), then execute the function associated with that character. The function may return zero values or one value (see values).
   The macro-character function may of course read characters from the input stream; if it does, it will see those characters following the macro character. The function may even invoke the reader recursively. This is how the macro character ( constructs a list: by invoking the reader recursively to read the elements of the list.
   If one value is returned, then return that value as the result of the read operation; the algorithm is done. If zero values are returned, then go back to step 1.
5. If x is a single escape character (normally \), then read the next character and call it y (but if at end of file, signal an error instead). Ignore the usual syntax of y and pretend it is a constituent whose only attribute is alphabetic.
   (If y is a lowercase character, leave it alone; do not replace it with the corresponding uppercase character.)
   For the purposes of readtable-case, y is not replaceable.
   Use y to begin a token, and go to step 8.
6. If x is a multiple escape character (normally |), then begin a token (initially containing no characters) and go to step 9.
7. If x is a constituent character, then it begins an extended token. After the entire token is read in, it will be interpreted either as representing a Lisp object such as a symbol or number (in which case that object is returned as the result of the read operation), or as being of illegal syntax (in which case an error is signaled).
   If x is a lowercase character, replace it with the corresponding uppercase character.
   X3J13 voted in June 1989 (READ-CASE-SENSITIVITY) to introduce readtable-case. Consequently, the preceding sentence should be ignored. The case of x should not be altered; instead, x should be regarded as replaceable.
   Use x to begin a token, and go on to step 8.
8. (At this point a token is being accumulated, and an even number of multiple escape characters have been encountered.) If at end of file, go to step 10. Otherwise, read a character (call it y), and perform one of the following actions according to its syntactic type:
   - If y is a constituent or non-terminating macro, then do the following. If y is a lowercase character, replace it with the corresponding uppercase character.
     X3J13 voted in June 1989 (READ-CASE-SENSITIVITY) to introduce readtable-case. Consequently, the preceding sentence should be ignored. The case of y should not be altered; instead, y should be regarded as replaceable.
     Append y to the token being built, and repeat step 8.
   - If y is a single escape character, then read the next character and call it z (but if at end of file, signal an error instead). Ignore the usual syntax of z and pretend it is a constituent whose only attribute is alphabetic.
     (If z is a lowercase character, leave it alone; do not replace it with the corresponding uppercase character.)
     For the purposes of readtable-case, z is not replaceable.
     Append z to the token being built, and repeat step 8.
   - If y is a multiple escape character, then go to step 9.
   - If y is an illegal character, signal an error.
   - If y is a terminating macro character, it terminates the token. First "unread" the character y (see unread-char), then go to step 10.
   - If y is a whitespace character, it terminates the token. First "unread" y if appropriate (see read-preserving-whitespace), then go to step 10.
9. (At this point a token is being accumulated, and an odd number of multiple escape characters have been encountered.) If at end of file, signal an error. Otherwise, read a character (call it y), and perform one of the following actions according to its syntactic type:
   - If y is a constituent, macro, or whitespace character, then ignore the usual syntax of that character and pretend it is a constituent whose only attribute is alphabetic.
     (If y is a lowercase character, leave it alone; do not replace it with the corresponding uppercase character.)
     For the purposes of readtable-case, y is not replaceable.
     Append y to the token being built, and repeat step 9.
   - If y is a single escape character, then read the next character and call it z (but if at end of file, signal an error instead). Ignore the usual syntax of z and pretend it is a constituent whose only attribute is alphabetic.
     (If z is a lowercase character, leave it alone; do not replace it with the corresponding uppercase character.)
     For the purposes of readtable-case, z is not replaceable.
     Append z to the token being built, and repeat step 9.
   - If y is a multiple escape character, then go to step 8.
   - If y is an illegal character, signal an error.
10. An entire token has been accumulated.
    X3J13 voted in June 1989 (READ-CASE-SENSITIVITY) to introduce readtable-case. If the accumulated token is to be interpreted as a symbol, any case conversion of replaceable characters should be performed at this point according to the value of the readtable-case slot of the current readtable (the value of *readtable*).
    Interpret the token as representing a Lisp object and return that object as the result of the read operation, or signal an error if the token is not of legal syntax.
    X3J13 voted in March 1989 (CHARACTER-PROPOSAL) to specify that implementation-defined attributes may be removed from the characters of a symbol token when constructing the print name. It is implementation-dependent which attributes are removed.
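The token-accumulation part of the algorithm (the two modes described in steps 8 and 9) can be sketched as a small state machine over a string. This is an illustration under simplifying assumptions, not the reader itself: the functions syntax-type and accumulate-token are hypothetical names, only the standard syntax types of table 22-1 are modeled, illegal characters are ignored, macro-character functions are never run, replaceable characters are upcased eagerly (as if readtable-case were :upcase) rather than at step 10, and the token is returned as text instead of being interpreted as a number or symbol.

```lisp
;; Simplified syntactic types for the standard character set (table 22-1).
(defun syntax-type (char)
  (cond ((member char '(#\Space #\Tab #\Newline #\Page #\Return #\Linefeed))
         :whitespace)
        ((char= char #\\) :single-escape)
        ((char= char #\|) :multiple-escape)
        ((find char "\"'(),;`") :terminating-macro)
        ((char= char #\#) :non-terminating-macro)
        (t :constituent)))

(defun accumulate-token (string &optional (start 0))
  "Return the token text beginning at START and the index where reading stopped."
  (let ((token (make-string-output-stream))
        (i start)
        (escaped nil))                  ; NIL: step 8 mode, T: step 9 mode
    (loop
      (when (>= i (length string))      ; end of file
        (when escaped
          (error "End of file inside a multiple escape."))
        (return))                       ; from step 8, go on to step 10
      (let* ((y (char string i))
             (type (syntax-type y)))
        (cond ((eq type :single-escape)
               (when (>= (+ i 1) (length string))
                 (error "End of file after a single escape character."))
               ;; The escaped character z is taken literally (not replaceable).
               (write-char (char string (+ i 1)) token)
               (incf i 2))
              ((eq type :multiple-escape)
               ;; Switch between step 8 and step 9.
               (setf escaped (not escaped))
               (incf i))
              (escaped
               ;; Step 9: any other character is taken literally.
               (write-char y token)
               (incf i))
              ((member type '(:constituent :non-terminating-macro))
               ;; Step 8: replaceable character; :upcase is assumed here.
               (write-char (char-upcase y) token)
               (incf i))
              (t
               ;; Whitespace or terminating macro: the token ends, y stays unread.
               (return)))))
    (values (get-output-stream-string token) i)))

(accumulate-token "a\\bc(")   ; => "AbC", 4  (the input text is a\bc followed by a parenthesis)
```

The second value plays the role of the "unread" character position: the terminating character at that index has not been consumed.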
As a rule, a single escape character never stands for itself but always serves to cause the following character to be treated as a simple alphabetic character. A single escape character can be included in a token only if preceded by another single escape character.
A multiple escape character also never stands for itself. The characters between a pair of multiple escape characters are all treated as simple alphabetic characters, except that single escape and multiple escape characters must nevertheless be preceded by a single escape character to be included.
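These rules can be checked against the standard reader. In the sketch below, note that each backslash of the input text is doubled inside the Lisp string literal; the comments show the text the reader actually sees.

```lisp
(with-standard-io-syntax
  (list
   ;; Input text \a: the escaped a keeps its case.
   (symbol-name (read-from-string "\\a"))        ; "a"
   ;; Input text a\\b: a single escape preceded by a single escape.
   (symbol-name (read-from-string "a\\\\b"))     ; "A\\B", i.e. A, backslash, B
   ;; Input text |a b|: the space is treated as alphabetic.
   (symbol-name (read-from-string "|a b|"))      ; "a b"
   ;; Input text |a\|b|: a multiple escape included by escaping it.
   (symbol-name (read-from-string "|a\\|b|"))))  ; "a|b"
```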
Compatibility note: In MacLisp, the | character is implemented as a macro character that reads characters up to the next unescaped | and then makes a token; no characters are ever read beyond the second | of a matching pair. In Common Lisp, the second | does not terminate the token being read but merely reverts to the ordinary (rather than multiple-escape) mode of token accumulation. This results in some differences in the way certain character sequences are interpreted. For example, the sequence |foo||bar| would be read in MacLisp as two distinct tokens, |foo| and |bar|, whereas in Common Lisp it would be treated as a single token equivalent to |foobar|. The sequence |foo|bar|baz| would be read in MacLisp as three distinct tokens, |foo|, bar, and |baz|, whereas in Common Lisp it would be treated as a single token equivalent to |fooBARbaz|; note that the middle three lowercase letters are converted to uppercase letters as they do not fall within a matching pair of vertical bars.
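The Common Lisp side of this comparison can be confirmed with read-from-string, whose second value is the index of the first character not read. A brief sketch using the standard readtable:

```lisp
(with-standard-io-syntax
  (list (symbol-name (read-from-string "|foo||bar|"))      ; "foobar"
        (nth-value 1 (read-from-string "|foo||bar|"))      ; 10: the whole string is one token
        (symbol-name (read-from-string "|foo|bar|baz|")))) ; "fooBARbaz"
```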
One reason for the different treatment of | in Common Lisp lies in the syntax for package-qualified symbol names. A sequence such as |foo:bar| ought to be interpreted as a symbol whose name is foo:bar; the colon should be treated as a simple alphabetic character because it lies within a pair of vertical bars. The symbol |bar| within the package |foo| can be notated not as |foo:bar| but as |foo|:|bar|; the colon can serve as a package marker because it falls outside the vertical bars, and yet the notation is treated as a single token thanks to the new rules adopted in Common Lisp.
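The two notations can be compared in a short sketch. It assumes a package whose name is the lowercase string "foo", created here purely for illustration, with an exported symbol whose name is the lowercase string "bar"; the variable *foo-package* is likewise a name made up for this example.

```lisp
;; A package named by the lowercase string "foo", created only for this example.
(defparameter *foo-package*
  (or (find-package "foo") (make-package "foo" :use '())))

;; Intern and export a symbol whose name is the lowercase string "bar",
;; so that the single-colon notation can refer to it.
(export (intern "bar" *foo-package*) *foo-package*)

(with-standard-io-syntax
  (let ((plain (read-from-string "|foo:bar|"))
        (qualified (read-from-string "|foo|:|bar|")))
    (list (symbol-name plain)                           ; "foo:bar"
          (package-name (symbol-package plain))         ; "COMMON-LISP-USER"
          (symbol-name qualified)                       ; "bar"
          (package-name (symbol-package qualified)))))  ; "foo"
```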
In MacLisp, the parentheses are treated as additional character types. In Common Lisp they are simply macro characters, as described in section 22.1.3.
What MacLisp calls "single character objects" (tokens of type single) are not provided for explicitly in Common Lisp. They can be viewed as simply a kind of macro character. That is, the effect of
    (setsyntax '$ 'single nil)
    (setsyntax '% 'single nil)

in MacLisp can be achieved in Common Lisp by

    (defun single-macro-character (stream char)
      (declare (ignore stream))
      (intern (string char)))

    (set-macro-character #\$ #'single-macro-character)
    (set-macro-character #\% #'single-macro-character)
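To see the effect without altering the global readtable, the definition can be installed in a copy. The following sketch repeats single-macro-character so that it is self-contained; *single-readtable* and read-all are hypothetical names used only here.

```lisp
(defun single-macro-character (stream char)
  (declare (ignore stream))
  (intern (string char)))

;; Install the macro character in a private copy of the standard readtable.
(defparameter *single-readtable* (copy-readtable nil))
(set-macro-character #\$ #'single-macro-character nil *single-readtable*)

(defun read-all (string)
  "Read every object in STRING using *single-readtable*."
  (let ((*readtable* *single-readtable*))
    (with-input-from-string (in string)
      ;; The stream itself serves as a unique end-of-file marker.
      (loop for object = (read in nil in)
            until (eq object in)
            collect object))))

(mapcar #'symbol-name (read-all "a$b"))   ; ("A" "$" "B")
```

Because $ is installed as a terminating macro character, it ends the token a and is then read as an object in its own right, so the three characters yield three symbols.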