Common Lisp the Language, 2nd Edition
Previous sections describe the standard syntax accepted by the read
function. This section discusses the advanced topic of altering the
standard syntax either to provide extended syntax for Lisp objects or
to aid the writing of other parsers.
There is a data structure called the readtable that is used
to control the reader. It contains information about the syntax of each
character equivalent to that in table 22-1. It is set up
exactly as in table 22-1 to give the
standard Common Lisp meanings to all the characters, but the user can
change the meanings of characters to alter and customize the syntax.
It is also possible to have several readtables describing
different syntaxes and to switch from one to another by binding the
variable *readtable*.
Even if an implementation supports characters with non-zero bits and
font attributes, it need not (but may) allow for such characters to have
syntax descriptions in the readtable. However, every character of type
string-char must be represented in the readtable.
X3J13 voted in March 1989 (CHARACTER-PROPOSAL) to remove the type
string-char and to replace the bits and font attributes
with the notion of implementation-defined attributes. If any
implementation-defined attributes are supported, an implementation may
(but need not) allow for such characters to have syntax descriptions in
the readtable. Characters that do not have non-standard values for any
implementation-defined attribute must be represented in the
readtable.
[Variable]
*readtable*
The value of *readtable* is the current readtable. The
initial value of this is a readtable set up for standard Common Lisp
syntax. You can bind this variable to temporarily change the readtable
being used.
To program the reader for a different syntax, a set of functions is provided for manipulating readtables. Normally, you should begin with a copy of the standard Common Lisp readtable and then customize the individual characters within that copy.
[Function]
copy-readtable &optional from-readtable to-readtable
A copy is made of from-readtable, which defaults to the
current readtable (the value of the global variable
*readtable*). If from-readtable is nil, then a copy of a
standard Common Lisp readtable is made. For example,
(setq *readtable* (copy-readtable nil))
will restore the input syntax to standard Common Lisp syntax, even if the original readtable has been clobbered (assuming it is not so badly clobbered that you cannot type in the above expression!). On the other hand,
(setq *readtable* (copy-readtable))
will merely replace the current readtable with a copy of itself.
If to-readtable is unsupplied or nil, a fresh
copy is made. Otherwise, to-readtable must be a readtable,
which is destructively copied into.
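The following is a minimal sketch of the usual workflow, not part of the original text: customize a private copy, make it current only by binding *readtable*, and later restore standard syntax into the same object by passing it as to-readtable. The names *scratch-readtable*, read-with-scratch, and reset-scratch are hypothetical, as is the choice of ! as the customized character.

```lisp
;; Hypothetical scratch readtable; the global *readtable* is never modified.
(defparameter *scratch-readtable* (copy-readtable nil))

;; In the scratch copy only, make ! a macro character that reads as :BANG.
(set-macro-character #\!
                     (lambda (stream char)
                       (declare (ignore stream char))
                       :bang)
                     nil
                     *scratch-readtable*)

(defun read-with-scratch (string)
  "Read one object from STRING with *SCRATCH-READTABLE* temporarily current."
  (let ((*readtable* *scratch-readtable*))
    (read-from-string string)))

(defun reset-scratch ()
  "Destructively copy standard syntax back into *SCRATCH-READTABLE*."
  (copy-readtable nil *scratch-readtable*))
```

Because the customized readtable is made current only by binding, code outside read-with-scratch continues to see whatever readtable was current before.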
[Function]
readtablep object
readtablep is true if its argument is a readtable, and
otherwise is false.
(readtablep x) == (typep x 'readtable)
[Function]
set-syntax-from-char to-char from-char &optional to-readtable from-readtable
This makes the syntax of to-char in to-readtable be
the same as the syntax of from-char in from-readtable.
The to-readtable defaults to the current readtable (the value
of the global variable *readtable*), and
from-readtable defaults to nil, meaning to use the
syntaxes from the standard Lisp readtable.
X3J13 voted in January 1989 (ARGUMENTS-UNDERSPECIFIED) to clarify that
the to-char and from-char must each be a
character.
Only attributes as shown in table 22-1 are copied;
moreover, if a macro character is copied, the macro definition
function is copied also. However, attributes as shown in table 22-3 are not
copied; they are "hard-wired" into the extended-token parser. For
example, if the definition of S is copied to *, then * will become a
constituent that is alphabetic but cannot be used as an exponent
indicator for short-format floating-point number syntax.
It works to copy a macro definition from a character such as " to
another character; the standard definition for " looks for another
character that is the same as the character that invoked it. It
doesn’t work to copy the definition of ( to {, for example; it can be
done, but it lets one write lists in the form {a b c), not {a b c},
because the definition always looks for a closing parenthesis, not a
closing brace. See the function read-delimited-list, which is useful
in this connection.
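As a sketch of how read-delimited-list helps here (an illustration, not code from the original text), a brace-delimited list syntax can be defined with a macro function of its own rather than by copying the definition of the open parenthesis. The names brace-list-reader and read-brace-list are hypothetical; the work is done on a fresh copy of the standard readtable so that nothing global changes.

```lisp
(defun brace-list-reader (stream char)
  "Read objects up to the matching } and return them as a list."
  (declare (ignore char))
  (read-delimited-list #\} stream t))

(defun read-brace-list (string)
  "Read one object from STRING with {...} accepted as list syntax."
  (let ((*readtable* (copy-readtable nil)))
    (set-macro-character #\{ #'brace-list-reader)
    ;; Give } the definition of ) so that it terminates tokens such as c}.
    (set-macro-character #\} (get-macro-character #\)))
    (read-from-string string)))
```

With these definitions, (read-brace-list "{a b c}") should yield the list (A B C), and braces nest as parentheses do.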
X3J13 voted in January 1989 (RETURN-VALUES-UNSPECIFIED) to specify
that the set-syntax-from-char function returns t.
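A small sketch of set-syntax-from-char in use follows; it is an illustration under stated assumptions rather than code from the original text. The helper name read-with-copied-syntax is hypothetical. It copies the standard syntax of from-char onto to-char in a fresh copy of the standard readtable and then reads one object from a string.

```lisp
(defun read-with-copied-syntax (string to-char from-char)
  "Read one object from STRING after giving TO-CHAR the standard
syntax of FROM-CHAR in a private copy of the standard readtable."
  (let ((*readtable* (copy-readtable nil)))
    ;; from-readtable is left to default to nil (the standard readtable).
    (set-syntax-from-char to-char from-char)
    (read-from-string string)))
```

For instance, giving 7 the syntax of the semicolon makes the digit start a comment, so (read-with-copied-syntax "123579" #\7 #\;) should read the integer 1235. Giving ! the syntax of the double quote should make "!abc!" read as the string "abc", because the copied definition looks for the character that invoked it.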
[Function]
set-macro-character char function &optional non-terminating-p readtable
get-macro-character char &optional readtable
set-macro-character causes char to be a macro
character that when seen by read causes function
to be called. If non-terminating-p is not nil (it
defaults to nil), then it will be a non-terminating macro
character: it may be embedded within extended tokens.
set-macro-character returns t.
get-macro-character returns the function associated with
char and, as a second value, returns the
non-terminating-p flag; it returns nil if
char does not have macro-character syntax. In each case,
readtable defaults to the current readtable.
X3J13 voted in January 1989 (GET-MACRO-CHARACTER-READTABLE) to specify
that if nil is explicitly passed as the second argument to
get-macro-character, then the standard readtable is used.
This is consistent with the behavior of copy-readtable.
The function is called with two arguments, stream and char. The stream is the input stream, and char is the macro character itself. In the simplest case, function may return a Lisp object. This object is taken to be that whose printed representation was the macro character and any following characters read by the function. As an example, a plausible definition of the standard single quote character is:
(defun single-quote-reader (stream char)
  (declare (ignore char))
  (list 'quote (read stream t nil t)))
(set-macro-character #\' #'single-quote-reader)
(Note that t is specified for the recursive-p
argument to read; see section 22.2.1.) The function
reads an object following the single-quote and returns a list of the
symbol quote and that object. The char argument is
ignored.
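The same pattern can be tried without disturbing the current readtable. The sketch below is an illustration, not from the original text: bang-reader and read-with-bang are hypothetical names, and ! is chosen only because it is one of the characters reserved to the user. The second return value shows the non-terminating-p flag as reported by get-macro-character.

```lisp
(defun bang-reader (stream char)
  "Read the object following ! and wrap it as (NOT object)."
  (declare (ignore char))
  (list 'not (read stream t nil t)))

(defun read-with-bang (string &optional non-terminating-p)
  "Read one object from STRING with ! defined as a macro character.
Returns the object and the non-terminating-p flag recorded for !."
  (let ((*readtable* (copy-readtable nil)))
    (set-macro-character #\! #'bang-reader non-terminating-p)
    (values (read-from-string string)
            (nth-value 1 (get-macro-character #\!)))))
```

With the default (terminating) setting, reading "a!b" should stop after the token A, whereas with non-terminating-p true the whole of a!b is read as a single symbol, since the macro character is then allowed inside an extended token.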
The function may choose instead to return zero values (for
example, by using (values) as the return expression). In
this case, the macro character and whatever it may have read contribute
nothing to the object being read. As an example, here is a plausible
definition for the standard semicolon (comment) character:
(defun semicolon-reader (stream char)
  (declare (ignore char))
  ;; First swallow the rest of the current input line.
  ;; End-of-file is acceptable for terminating the comment.
  (do () ((char= (read-char stream nil #\Newline t) #\Newline)))
  ;; Return zero values.
  (values))
(set-macro-character #\; #'semicolon-reader)
(Note that t is specified for the recursive-p
argument to read-char; see section 22.2.1.)
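Returning zero values is also a convenient way to discard a whole form rather than a line. The following sketch is an illustration only: discard-next-form and read-skipping are hypothetical names, and the use of % as the macro character is an arbitrary assumption.

```lisp
(defun discard-next-form (stream char)
  "Read and throw away the next object, contributing nothing to the result."
  (declare (ignore char))
  ;; Binding *read-suppress* keeps the discarded text from being
  ;; fully interpreted (for example, no symbols are interned).
  (let ((*read-suppress* t))
    (read stream t nil t))
  (values))

(defun read-skipping (string)
  "Read one object from STRING, where %form means `ignore form'."
  (let ((*readtable* (copy-readtable nil)))
    (set-macro-character #\% #'discard-next-form)
    (read-from-string string)))
```

Reading "(a %b c)" this way should produce the list (A C): the macro function consumes b and yields no value, so the list reader simply continues with c.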
The function should not have any side effects other than on
the stream. Because of backtracking and restarting of the
read operation, front ends (such as editors and rubout
handlers) to the reader may cause function to be called
repeatedly during the reading of a single expression in which the macro
character appears only once.
Compatibility note: The ability to return either
zero or one value is the closest Common Lisp macro characters come to
the splicing macro characters of MacLisp or the splice
macro characters of Interlisp. The Common Lisp definition does not allow
the splicing of arbitrarily many values, but it does allow a
macro-character function to decide after it is invoked whether or not to
yield a value, an option not possible in MacLisp or Interlisp.
MacLisp has nothing equivalent to non-terminating macro characters.
The Interlisp equivalents of terminating and non-terminating macro
characters are macro characters with the ALWAYS or
FIRST option, respectively. Common Lisp has nothing
equivalent to the Interlisp ALONE macro-character
option.
Here is an example of a more elaborate set of read-macro characters that
I used in the implementation of the original simulator for Connection
Machine Lisp [44,57], a parallel dialect of
Common Lisp. This simulator was used to gain experience with the
language before freezing its design for full-scale implementation on a
Connection Machine computer system. This example illustrates the typical
manner in which a language designer can embed a new language within the
syntactic and semantic framework of Lisp, saving the effort of designing
an implementation from scratch.
Connection Machine Lisp introduces a new data type called a
xapping, which is simply an unordered set of ordered pairs of
Lisp objects. The first element of each pair is called the
index and the second element the value. We say that
the xapping maps each index to its corresponding value. No two pairs of
the same xapping may have the same (that is, eql) index.
Xappings may be finite or infinite sets of pairs; only certain kinds of
infinite xappings are required, and special representations are used for
them.
A finite xapping is notated by writing the pairs between braces, separated by whitespace. A pair is notated by writing the index and the value, separated by a right arrow (or an exclamation point if the host Common Lisp has no right-arrow character).
Remark: The original language design used the right
arrow; the exclamation point was chosen to replace it on ASCII-only
terminals because it is one of the six characters
[ ] { } ! ?
reserved by Common Lisp to the user.
While preparing the TeX manuscript for this book I made a mistake in font selection and discovered that by an absolutely incredible coincidence the right arrow has the same numerical code (octal 41) within TeX fonts as the ASCII exclamation point. The result was that although the manuscript called for right arrows, exclamation points came out in the printed copy. Imagine my astonishment!
Here is an example of a xapping that maps three symbols to strings:
{moe->"Oh, a wise guy, eh?" larry->"Hey, what's the idea?"
curly->"Nyuk, nyuk, nyuk!"}
For convenience there are certain abbreviated notations. If the index
and value for a pair are the same object x, then instead of
having to write "x->x" (or, worse yet, "#43=x->#43#") we
may write simply x for the pair. If all pairs of a xapping are
of this form, we call the xapping a xet. For example, the
notation
{baseball chess cricket curling bocce 43-man-squamish}
is entirely equivalent in meaning to
{baseball->baseball curling->curling cricket->cricket
chess->chess bocce->bocce 43-man-squamish->43-man-squamish}
namely a xet of symbols naming six sports.
Another useful abbreviation covers the situation where the indices of the n pairs of a finite xapping are integers, collectively covering a range from zero to n-1. This kind of xapping is called a xector and may be notated by writing the values between brackets in ascending order of their indices. Thus
[tinker evers chance]
is merely an abbreviation for
{0->tinker 1->evers 2->chance}
There are two kinds of infinite xapping: constant and universal. A
constant xapping {->z} maps every
object to the same value z. The universal xapping
{->} maps every object to itself and is therefore the
xet of all Lisp objects, sometimes called simply the universe. Both
kinds of infinite xapping may be modified by explicitly writing exceptions.
One kind of exception is simply a pair, which specifies the value for a
particular index; the other kind of exception is simply
k-> followed by whitespace, indicating that the xapping does not have
a pair with index k after all. Thus the notation
{sky->blue grass->green idea-> glass-> ->red}
indicates a xapping that maps sky to blue,
grass to green, and every other object except
idea and glass to red. Note well
that the presence or absence of whitespace on either side of an arrow is
crucial to the correct interpretation of the notation.
Here is the representation of a xapping as a structure:
(defstruct
  (xapping (:print-function print-xapping)
           (:constructor xap
             (domain range &optional
              (default ':unknown defaultp)
              (infinite (and defaultp :constant))
              (exceptions '()))))
  domain
  range
  default
  (infinite nil :type (member nil :constant :universal))
  exceptions)
The explicit pairs are represented as two parallel lists, one of
indexes (domain) and one of values (range).
The default slot is the default value, relevant only if the
infinite slot is :constant. The
exceptions slot is a list of indices for which there are no
values. (See the end of section 22.3.3 for the definition of
print-xapping.)
Here, then, is the code for reading xectors in bracket notation:
(defun open-bracket-macro-char (stream macro-char)
  (declare (ignore macro-char))
  (let ((range (read-delimited-list #\] stream t)))
    (xap (iota-list (length range)) range)))
(set-macro-character #\[ #'open-bracket-macro-char)
(set-macro-character #\] (get-macro-character #\) ))
(defun iota-list (n)        ;Return list of integers from 0 to n-1
  (do ((j (- n 1) (- j 1))
       (z '() (cons j z)))
      ((< j 0) z)))
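The bracket reader above depends on xap and print-xapping, which are defined elsewhere. To experiment with it in isolation, one possible stand-in is sketched below. It is an assumption-laden illustration, not the simulator's code: mini-xapping, mini-xap, index-list, mini-bracket-reader, and read-xector are all hypothetical names, and the structure keeps only the domain and range slots.

```lisp
;; Hypothetical stand-in for the xapping structure: domain and range only.
(defstruct (mini-xapping (:constructor mini-xap (domain range)))
  domain
  range)

(defun index-list (n)
  "Return the list of integers from 0 to N-1."
  (loop for i from 0 below n collect i))

(defun mini-bracket-reader (stream char)
  "Read values up to ] and pair them with the indices 0, 1, 2, ..."
  (declare (ignore char))
  (let ((range (read-delimited-list #\] stream t)))
    (mini-xap (index-list (length range)) range)))

(defun read-xector (string)
  "Read one object from STRING with [...] accepted as xector syntax."
  (let ((*readtable* (copy-readtable nil)))
    (set-macro-character #\[ #'mini-bracket-reader)
    (set-macro-character #\] (get-macro-character #\)))
    (read-from-string string)))
```

Reading "[tinker evers chance]" with read-xector should give a structure whose domain is (0 1 2) and whose range is (TINKER EVERS CHANCE), that is, the integers serve as indices and the bracketed objects as values.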
The code for reading xappings in the more general brace notation, with all the possibilities for xets (or individual xet pairs), infinite xappings, and exceptions, is a bit more complicated; it is shown in table 22-5. That code is used in conjunction with the initializations
(set-macro-character #\{ #'open-brace-macro-char)
(set-macro-character #\} (get-macro-character #\) ))
----------------------------------------------------------------
Table 22-5: Macro Character Definition for Xapping Syntax
;;; Here #\-> stands for the right-arrow character (or the exclamation
;;; point if the host Common Lisp has no right-arrow character).
(defun open-brace-macro-char (s macro-char)
  (declare (ignore macro-char))
  (do ((ch (peek-char t s t nil t) (peek-char t s t nil t))
       (domain '()) (range '()) (exceptions '()))
      ((char= ch #\})
       (read-char s t nil t)
       (xap (reverse domain) (reverse range)))
    (cond ((char= ch #\->)
           (read-char s t nil t)
           (let ((nextch (peek-char nil s t nil t)))
             (cond ((char= nextch #\})
                    (read-char s t nil t)
                    (return (xap (reverse domain)
                                 (reverse range)
                                 nil :universal exceptions)))
                   (t (let ((item (read s t nil t)))
                        (cond ((char= (peek-char t s t nil t) #\})
                               (read-char s t nil t)
                               (return (xap (reverse domain)
                                            (reverse range)
                                            item :constant
                                            exceptions)))
                              (t (reader-error s
                                   "Default -> item must be last"))))))))
          (t (let ((item (read-preserving-whitespace s t nil t))
                   (nextch (peek-char nil s t nil t)))
               (cond ((char= nextch #\->)
                      (read-char s t nil t)
                      (cond ((member (peek-char nil s t nil t)
                                     '(#\Space #\Tab #\Newline))
                             (push item exceptions))
                            (t (push item domain)
                               (push (read s t nil t) range))))
                     ((char= nextch #\})
                      (read-char s t nil t)
                      (push item domain)
                      (push item range)
                      (return (xap (reverse domain) (reverse range))))
                     (t (push item domain)
                        (push item range))))))))
----------------------------------------------------------------
[Function]
make-dispatch-macro-character char &optional non-terminating-p readtable
This causes the character char to be a dispatching macro
character in readtable (which defaults to the current
readtable). If non-terminating-p is not nil (it
defaults to nil), then it will be a non-terminating macro
character: it may be embedded within extended tokens.
make-dispatch-macro-character returns t.
Initially every character in the dispatch table has a character-macro
function that signals an error. Use set-dispatch-macro-character
to define entries in the dispatch table.
X3J13 voted in January 1989 (ARGUMENTS-UNDERSPECIFIED) to clarify that
char must be a character.
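As a sketch of the two-step setup (an illustration, not from the original text), the code below makes ! a dispatching macro character in a private copy of the standard readtable and then defines one sub-character for it. The name read-with-bang-dispatch and the choice of !k as "read the next symbol as a keyword" are hypothetical.

```lisp
(defun read-with-bang-dispatch (string)
  "Read one object from STRING with ! as a dispatching macro character
whose sub-character K turns the following symbol into a keyword."
  (let ((*readtable* (copy-readtable nil)))
    ;; Step 1: make ! a dispatch character (its table starts out empty).
    (make-dispatch-macro-character #\!)
    ;; Step 2: define the entry for sub-character k.
    (set-dispatch-macro-character
     #\! #\k
     (lambda (stream sub-char arg)
       (declare (ignore sub-char arg))
       (intern (symbol-name (read stream t nil t)) :keyword)))
    (read-from-string string)))
```

Under these assumptions "!kfoo" should read as the keyword :FOO. Because a lowercase sub-character is replaced by its uppercase equivalent, "!Kfoo" should read the same way.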
[Function]
set-dispatch-macro-character disp-char sub-char function &optional readtable
get-dispatch-macro-character disp-char sub-char &optional readtable
set-dispatch-macro-character causes function to
be called when the disp-char followed by sub-char is
read. The readtable defaults to the current readtable. The
arguments and return values for function are the same as for
normal macro characters except that function gets
sub-char, not disp-char, as its second argument and
also receives a third argument that is the non-negative integer whose
decimal representation appeared between disp-char and
sub-char, or nil if no decimal integer appeared
there.
The sub-char may not be one of the ten decimal digits; they
are always reserved for specifying an infix integer argument. Moreover,
if sub-char is a lowercase character (see lower-case-p), its
uppercase equivalent is used instead.
(This is how the rule is enforced that the case of a dispatch
sub-character doesn’t matter.)
set-dispatch-macro-character returns t.
get-dispatch-macro-character returns the macro-character
function for sub-char under disp-char, or
nil if there is no function associated with
sub-char.
If the sub-char is one of the ten decimal digits
0 1 2 3 4 5 6 7 8 9, get-dispatch-macro-character always returns
nil. If sub-char is a lowercase character, its
uppercase equivalent is used instead.
X3J13 voted in January 1989 (GET-MACRO-CHARACTER-READTABLE) to specify
that if nil is explicitly passed as the readtable argument (the third
argument) to get-dispatch-macro-character, then the standard readtable
is used. This is consistent with the behavior of copy-readtable.
For either function, an error is signaled if the specified
disp-char is not in fact a dispatch character in the specified
readtable. It is necessary to use make-dispatch-macro-character
to set up the dispatch character before specifying its sub-characters.
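One way to make use of this error behavior is a small predicate that asks whether a character is currently a dispatch character. This is only a sketch: dispatch-char-p is a hypothetical helper, not a standard function, and it assumes the implementation signals an ordinary error condition in the case described above.

```lisp
(defun dispatch-char-p (char &optional (readtable *readtable*))
  "Return true if CHAR is a dispatching macro character in READTABLE.
Relies on get-dispatch-macro-character signaling an error otherwise."
  (handler-case
      (progn (get-dispatch-macro-character char #\A readtable)
             t)
    (error () nil)))
```

In the standard readtable, (dispatch-char-p #\#) should be true and (dispatch-char-p #\a) false.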
As an example, suppose one would like #$foo to be read as if it were
(dollars foo). One might say:
(defun |#$-reader| (stream subchar arg)
  (declare (ignore subchar arg))
  (list 'dollars (read stream t nil t)))
(set-dispatch-macro-character #\# #\$ #'|#$-reader|)
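The example above ignores the infix argument. A variant that uses it might look like the following sketch, which is an illustration only: the syntax #n?form, the symbol repeat, and the names sharp-question-reader and read-with-count are all hypothetical, and the work is done in a private copy of the standard readtable.

```lisp
(defun sharp-question-reader (stream sub-char arg)
  "Read #n?form as (REPEAT n form); a missing n is treated as 1."
  (declare (ignore sub-char))
  ;; ARG is the integer written between # and ?, or NIL if none was given.
  (list 'repeat (or arg 1) (read stream t nil t)))

(defun read-with-count (string)
  "Read one object from STRING with #n? defined as above."
  (let ((*readtable* (copy-readtable nil)))
    (set-dispatch-macro-character #\# #\? #'sharp-question-reader)
    (read-from-string string)))
```

Under these assumptions "#3?foo" should read as (REPEAT 3 FOO) and "#?foo" as (REPEAT 1 FOO).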
Compatibility note: This macro-character mechanism is different from those in MacLisp, Interlisp, and Lisp Machine Lisp. Recently Lisp systems have implemented very general readers, even readers so programmable that they can parse arbitrary compiled BNF grammars. Unfortunately, these readers can be complicated to use. This design is an attempt to make the reader as simple as possible to understand, use, and implement. Splicing macros have been eliminated; a recent informal poll indicates that no one uses them to produce other than zero or one value. The ability to access parts of the object preceding the macro character has been eliminated. The MacLisp single-character-object feature has been eliminated because it is seldom used and trivially obtainable by defining a macro.
The user is encouraged to turn off most macro characters, turn others
into single-character-object macros, and then use read
purely as a lexical analyzer on top of which to build a parser. It is
unnecessary, however, to cater to more complex lexical analysis or
parsing than that needed for Common Lisp.
[Function]
readtable-case readtable
X3J13 voted in June 1989 (READ-CASE-SENSITIVITY) to introduce the
function readtable-case to control the reader’s
interpretation of case. It provides access to a slot in a readtable, and
may be used with setf to alter the state of that slot. The
possible values for the slot are :upcase, :downcase, :preserve, and
:invert; the readtable-case for the standard
readtable is :upcase. Note that copy-readtable
is required to copy the readtable-case slot along with all
other readtable information.
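A brief sketch of the accessor follows; it is an illustration rather than text from the original. The helper copied-case is a hypothetical name. It sets the slot on one readtable with setf and then reads it back from a copy, showing that copy-readtable carries the setting along.

```lisp
(defun copied-case (mode)
  "Set the readtable-case of a fresh readtable to MODE, copy that
readtable, and return the readtable-case of the copy."
  (let ((original (copy-readtable nil)))
    (setf (readtable-case original) mode)
    (readtable-case (copy-readtable original))))
```

For example, (copied-case :invert) should return :INVERT, while (readtable-case (copy-readtable nil)) should return :UPCASE, the setting of the standard readtable.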
Once the reader has accumulated a token as described in section 22.1.1, if the token is a symbol,
"replaceable" characters (unescaped uppercase or lowercase constituent
characters) may be modified under the control of the
readtable-case of the current readtable:
For :upcase, replaceable characters are converted to
uppercase. (This was the behavior specified by the first edition.)
For :downcase, replaceable characters are converted to
lowercase.
For :preserve, the cases of all characters remain
unchanged.
For :invert, if all of the replaceable letters in the
extended token are of the same case, they are all converted to the
opposite case; otherwise the cases of all characters in that token
remain unchanged.
As an illustration, consider the following code.
(let ((*readtable* (copy-readtable nil)))
  (format t "READTABLE-CASE  Input   Symbol-name~
           ~%-----------------------------------~
           ~%")
  (dolist (readtable-case '(:upcase :downcase :preserve :invert))
    (setf (readtable-case *readtable*) readtable-case)
    (dolist (input '("ZEBRA" "Zebra" "zebra"))
      (format t ":~A~16T~A~24T~A~%"
              (string-upcase readtable-case)
              input
              (symbol-name (read-from-string input))))))
The output from this test code should be
READTABLE-CASE  Input   Symbol-name
-----------------------------------
:UPCASE         ZEBRA   ZEBRA
:UPCASE         Zebra   ZEBRA
:UPCASE         zebra   ZEBRA
:DOWNCASE       ZEBRA   zebra
:DOWNCASE       Zebra   zebra
:DOWNCASE       zebra   zebra
:PRESERVE       ZEBRA   ZEBRA
:PRESERVE       Zebra   Zebra
:PRESERVE       zebra   zebra
:INVERT         ZEBRA   zebra
:INVERT         Zebra   Zebra
:INVERT         zebra   ZEBRA
The readtable-case of the current readtable also affects
the printing of symbols (see *print-case* and *print-escape*).