Common Lisp the Language, 2nd Edition
When an extended token is read, it is interpreted as a number or symbol. In general, the token is interpreted as a number if it satisfies the syntax for numbers specified in table 22-2; this is discussed in more detail below.
The characters of the extended token may serve various syntactic functions as shown in table 22-3, but it must be remembered that any character included in a token under the control of an escape character is treated as alphabetic rather than according to the attributes shown in the table. One consequence of this rule is that a whitespace, macro, or escape character will always be treated as alphabetic within an extended token because such a character cannot be included in an extended token except under the control of an escape character.
To allow for extensions to the syntax of numbers, a syntax for potential numbers is defined in Common Lisp that is more general than the actual syntax for numbers. Any token that is not a potential number and does not consist entirely of dots will always be taken to be a symbol, now and in the future; programs may rely on this fact. Any token that is a potential number but does not fit the actual number syntax defined below is a reserved token and has an implementation-dependent interpretation; an implementation may signal an error, quietly treat the token as a symbol, or take some other action. Programmers should avoid the use of such reserved tokens. (A symbol whose name looks like a reserved token can always be written using one or more escape characters.)
Just as bignum is the standard term used by Lisp implementors
for very large integers, and flonum (rhymes with "low hum")
refers to a floating-point number, the term potnum has been
used widely as an abbreviation for "potential number." "Potnum"
rhymes with "hot rum."
----------------------------------------------------------------
Table 22-2: Actual Syntax of Numbers
number ::= integer | ratio | floating-point-number
integer ::= [sign] {digit}+ [decimal-point]
ratio ::= [sign] {digit}+ / {digit}+
floating-point-number ::= [sign] {digit}* decimal-point {digit}+ [exponent]
| [sign] {digit}+ [decimal-point {digit}*] exponent
sign ::= + | -
decimal-point ::= .
digit ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
exponent ::= exponent-marker [sign] {digit}+
exponent-marker ::= e | s | f | d | l | E | S | F | D | L
----------------------------------------------------------------
----------------------------------------------------------------
Table 22-3: Standard Constituent Character Attributes
!  alphabetic      <page>    illegal      <backspace>  illegal
"  alphabetic *    <return>  illegal *    <tab>        illegal *
#  alphabetic *    <space>   illegal *    <newline>    illegal *
$  alphabetic      <rubout>  illegal      <linefeed>   illegal *
%  alphabetic      .         alphabetic, dot, decimal point
&  alphabetic      +         alphabetic, plus sign
'  alphabetic *    -         alphabetic, minus sign
(  alphabetic *    *         alphabetic
)  alphabetic *    /         alphabetic, ratio marker
,  alphabetic *    @         alphabetic
0  alphadigit      A, a      alphadigit
1  alphadigit      B, b      alphadigit
2  alphadigit      C, c      alphadigit
3  alphadigit      D, d      alphadigit, double-float exponent marker
4  alphadigit      E, e      alphadigit, float exponent marker
5  alphadigit      F, f      alphadigit, single-float exponent marker
6  alphadigit      G, g      alphadigit
7  alphadigit      H, h      alphadigit
8  alphadigit      I, i      alphadigit
9  alphadigit      J, j      alphadigit
:  package marker  K, k      alphadigit
;  alphabetic *    L, l      alphadigit, long-float exponent marker
<  alphabetic      M, m      alphadigit
=  alphabetic      N, n      alphadigit
>  alphabetic      O, o      alphadigit
?  alphabetic      P, p      alphadigit
[  alphabetic      Q, q      alphadigit
\  alphabetic *    R, r      alphadigit
]  alphabetic      S, s      alphadigit, short-float exponent marker
^  alphabetic      T, t      alphadigit
_  alphabetic      U, u      alphadigit
`  alphabetic *    V, v      alphadigit
{  alphabetic      W, w      alphadigit
|  alphabetic *    X, x      alphadigit
}  alphabetic      Y, y      alphadigit
~  alphabetic      Z, z      alphadigit
----------------------------------------------------------------
A token is a potential number if it satisfies the following requirements:
- It consists entirely of digits, signs (+ or -), ratio markers (/), decimal points (.), extension characters (^ or _), and number markers. (A number marker is a letter. Whether a letter may be treated as a number marker depends on context, but no letter that is adjacent to another letter may ever be treated as a number marker. Floating-point exponent markers are instances of number markers.)
- It contains at least one digit. (Letters may be considered to be digits, depending on the value of *read-base*, but only in tokens containing no decimal points.)
- It begins with a digit, sign, decimal point, or extension character.
- It does not end with a sign.
As examples, the following tokens are potential numbers, but they are not actually numbers as defined below, and so are reserved tokens. (They do indicate some interesting possibilities for future extensions.)
1b5000 777777q 1.7J -3/4+6.7J 12/25/83
27^19 3^4/5 6//7 3.1.2.6 ^-43^
3.141_592_653_589_793_238_4 -3.7+2.6i-6.17j+19.6k
The following tokens are not potential numbers but are always treated as symbols:
/ /5 + 1+ 1-
foo+ ab.cd _ ^ ^/-
The following tokens are potential numbers if the value of *read-base* is 16 (an abnormal situation), but they are always treated as symbols if the value of *read-base* is 10 (the usual value):
bad-face 25-dec-83 a/b fad_cafe f^
It is possible for there to be an ambiguity as to whether a letter should be treated as a digit or as a number marker. In such a case, the letter is always treated as a digit rather than as a number marker.
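The requirements for a potential number can be sketched as a predicate on strings. This is only an illustration under a simplifying assumption: it handles the usual case where *read-base* is 10, so letters are never digits. The name potential-number-p is hypothetical and is not part of Common Lisp.

```lisp
(defun potential-number-p (token)
  "Return true if the string TOKEN is a potential number when *READ-BASE* is 10.
Hypothetical helper for illustration; it is not part of Common Lisp."
  (let ((n (length token)))
    (flet ((letter-at-p (i)
             ;; True if index I is inside TOKEN and holds a letter.
             (and (< -1 i n) (alpha-char-p (char token i)))))
      (and (plusp n)
           ;; Only digits, signs, ratio markers, decimal points, extension
           ;; characters, and letters with no letter on either side.
           (loop for i from 0 below n
                 for ch = (char token i)
                 always (or (digit-char-p ch)
                            (find ch "+-/.^_")
                            (and (alpha-char-p ch)
                                 (not (letter-at-p (1- i)))
                                 (not (letter-at-p (1+ i))))))
           ;; At least one digit.
           (some #'digit-char-p token)
           ;; Begins with a digit, sign, decimal point, or extension character.
           (let ((head (char token 0)))
             (or (digit-char-p head) (find head "+-.^_")))
           ;; Does not end with a sign.
           (not (find (char token (1- n)) "+-"))))))
```

Under these assumptions the predicate accepts reserved tokens such as 1b5000 and ^-43^ and rejects symbol tokens such as /5, 1+, and foo+. It says nothing about whether a token is an actual number.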
Note that the printed representation for a potential number may not contain any escape characters. An escape character robs the following character of all syntactic qualities, forcing it to be strictly alphabetic and therefore unsuitable for use in a potential number. For example, all of the following representations are interpreted as symbols, not numbers:
\256 25\64 1.0\E6 |100| 3\.14159 |3/4| 3\/4 5||
In each case, removing the escape character(s) would allow the token to be treated as a number.
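The effect of escape characters can be observed with read-from-string. The following is a minimal sketch; read-token and classify are hypothetical helper names, and with-standard-io-syntax is used only to fix *read-base* at 10 and the standard readtable. Note that each backslash is doubled because the tokens appear inside Lisp string literals.

```lisp
(defun read-token (text)
  "Read one object from TEXT under the standard reader settings."
  (with-standard-io-syntax
    (values (read-from-string text))))

(defun classify (text)
  "Report whether TEXT reads as a number or as a symbol."
  (let ((object (read-token text)))
    (cond ((numberp object) (list :number object))
          ((symbolp object) (list :symbol (symbol-name object))))))

(classify "256")       ; => (:NUMBER 256)
(classify "\\256")     ; => (:SYMBOL "256")
(classify "1.0\\E6")   ; => (:SYMBOL "1.0E6")
(classify "|3/4|")     ; => (:SYMBOL "3/4")
(classify "5||")       ; => (:SYMBOL "5")
```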
If a potential number can in fact be interpreted as a number
according to the BNF syntax in table 22-2, then a number object of
the appropriate type is constructed and returned. It should be noted
that in a given implementation it may be that not all tokens conforming
to the actual syntax for numbers can actually be converted into number
objects. For example, specifying too large or too small an exponent for
a floating-point number may make the number impossible to represent in
the implementation. Similarly, a ratio with denominator zero (such as
-35/000) cannot be represented in any implementation. In any such
circumstance where a token with the syntax
of a number cannot be converted to an internal number object, an error
is signaled. (On the other hand, an error must not be signaled for
specifying too many significant digits for a floating-point number; an
appropriately truncated or rounded value should be produced.)
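A short sketch of these cases follows. The helper read-number-token is hypothetical; it catches any error, because the precise condition type signaled for an unrepresentable number is not stated here and is treated as an assumption.

```lisp
(defun read-number-token (text)
  "Read TEXT under standard reader settings; return :ERROR if reading fails."
  (handler-case (with-standard-io-syntax
                  (values (read-from-string text)))
    (error () :error)))

(read-number-token "17.")       ; the integer 17; a trailing point means decimal
(read-number-token "6/8")       ; the ratio 3/4
(read-number-token "+.5")       ; a floating-point number equal to 1/2
(read-number-token "1.0d0")     ; a double-float
(read-number-token "-35/000")   ; :ERROR, the denominator is zero
;; Excess significant digits are rounded or truncated, not rejected.
(read-number-token "3.14159265358979323846264338327950288")
```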
There is an omission in the syntax of numbers as described in table
22-2, in that the syntax does not account for the possible use of
letters as digits. The radix used for reading integers and ratios is
normally decimal. However, this radix is actually determined by the
value of the variable *read-base*, whose initial value is 10.
*read-base* may take on any integral value between 2 and 36; let this
value be n. Then a token x is interpreted as an integer or ratio in
base n if it could be properly so interpreted in the syntax #nRx (see
section 22.1.4). So, for example, if the value of *read-base* is 16,
then the printed representation
(a small face in a bad place)
would be interpreted as if the following representation had been read
with *read-base* set to 10:
(10 small 64206 in 10 2989 place)
because four of the seven tokens in the list can be interpreted as
hexadecimal numbers. This facility is intended to be used in reading
files of data that for some reason contain numbers not in decimal radix;
it may also be used for reading programs written in Lisp dialects (such
as MacLisp) whose default number radix is not decimal. Non-decimal
constants in Common Lisp programs or portable Common Lisp data files
should be written using #O, #X, #B, or #nR syntax.
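The hexadecimal example can be reproduced as follows. This is a sketch; read-with-base is a hypothetical helper, and the symbols are shown as they would print with the standard readtable case.

```lisp
(defun read-with-base (base text)
  "Read one object from TEXT with *READ-BASE* bound to BASE."
  (let ((*read-base* base))
    (values (read-from-string text))))

(read-with-base 16 "(a small face in a bad place)")
;; => (10 SMALL 64206 IN 10 2989 PLACE)

;; The portable spelling states the radix explicitly, so it reads the
;; same way whatever the value of *read-base*.
(read-with-base 10 "(#xa small #xface in #xa #xbad place)")
;; => (10 SMALL 64206 IN 10 2989 PLACE)
```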
When *read-base* has a value greater than 10, an ambiguity is
introduced into the actual syntax for numbers because a letter can
serve as either a digit or an exponent marker; a simple example is 1E0
when the value of *read-base* is 16. The ambiguity is resolved
in accordance with the general principle that interpretation as a digit
is preferred to interpretation as a number marker. The consequence in
this case is that if a token can be interpreted as either an integer or
a floating-point number, then it is taken to be an integer.
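The 1E0 case can be checked directly. In this sketch the two variable names are hypothetical; the token is the one discussed above.

```lisp
(defparameter *hex-reading*
  ;; E is a hexadecimal digit here, so the token is the integer #x1E0.
  (let ((*read-base* 16))
    (values (read-from-string "1E0"))))

(defparameter *decimal-reading*
  ;; E can only be an exponent marker here, so the token is a float.
  (let ((*read-base* 10))
    (values (read-from-string "1E0"))))

*hex-reading*       ; => 480
*decimal-reading*   ; a floating-point number equal to 1
```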
If a token consists solely of dots (with no escape characters), then
an error is signaled, except in one circumstance: if the token is a
single dot and occurs in a situation appropriate to "dotted list"
syntax, then it is accepted as a part of such syntax. Signaling an error
catches not only misplaced dots in dotted list syntax but also lists
that were truncated by *print-length* cutoff, because such lists end
with a three-dot sequence (...). Examples:
(a . b)       ;A dotted pair of a and b
(a.b)         ;A list of one element, the symbol named a.b
(a. b)        ;A list of two elements a. and b
(a .b)        ;A list of two elements a and .b
(a \. b)      ;A list of three elements a, ., and b
(a |.| b)     ;A list of three elements a, ., and b
(a \... b)    ;A list of three elements a, ..., and b
(a |...| b)   ;A list of three elements a, ..., and b
(a b . c)     ;A dotted list of a and b with c at the end
.iot          ;The symbol whose name is .iot
(. b)         ;Illegal; an error is signaled
(a .)         ;Illegal; an error is signaled
(a .. b)      ;Illegal; an error is signaled
(a . . b)     ;Illegal; an error is signaled
(a b c ...)   ;Illegal; an error is signaled
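Several of these cases can be tried with read-from-string. The sketch below uses a hypothetical helper, try-read, that returns :error when the reader signals any error; backslashes are doubled because the text appears inside string literals.

```lisp
(defun try-read (text)
  "Read one object from TEXT; return :ERROR if the reader signals an error."
  (handler-case (values (read-from-string text))
    (error () :error)))

(try-read "(a . b)")       ; a single cons whose cdr is the symbol B
(try-read "(a.b)")         ; a list of one symbol
(try-read "(a \\. b)")     ; a list of three symbols; the middle one is named "."
(try-read "(a |...| b)")   ; a list of three symbols; the middle one is named "..."
(try-read "(a .. b)")      ; => :ERROR
(try-read "(a b c ...)")   ; => :ERROR
```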
In all other cases, the token is construed to be the name of a symbol. If there are any package markers (colons) in the token, they divide the token into pieces used to control the lookup and creation of the symbol.
If there is a single package marker, and it occurs at the beginning of
the token, then the token is interpreted as a keyword, that is, a symbol
in the keyword package. The part of the token after the package marker
must not have the syntax of a number.
If there is a single package marker not at the beginning or end of the token, then it divides the token into two parts. The first part specifies a package; the second part is the name of an external symbol available in that package. Neither of the two parts may have the syntax of a number.
If there are two adjacent package markers not at the beginning or end
of the token, then they divide the token into two parts. The first part
specifies a package; the second part is the name of a symbol within that
package (possibly an internal symbol). Neither of the two parts may have
the syntax of a number.
X3J13 voted in March 1988 (COLON-NUMBER) to clarify that, in the
situations described in the preceding three paragraphs, the restriction
on the syntax of the parts should be strengthened: none of the parts may
have the syntax of even a potential number. Tokens such as :3600, :1/2,
and editor:3.14159 were already ruled out; this clarification further
declares that such tokens as :2^3, compiler:1.7J, and
Christmas:12/25/83 are also in error and therefore should not be used
in portable programs. Implementations may differ in their treatment of
such package-marked potential numbers.
If a symbol token contains no package markers, then the entire token
is the name of the symbol. The symbol is looked up in the default
package, which is the value of the variable *package*.
All other patterns of package markers, including the cases where there are more than two package markers or where a package marker appears at the end of the token, at present do not mean anything in Common Lisp (see chapter 11). It is therefore currently an error to use such patterns in a Common Lisp program. The valid patterns for tokens may be summarized as follows:
nnnnn           a number
xxxxx           a symbol in the current package
:xxxxx          a symbol in the keyword package
ppppp:xxxxx     an external symbol in the ppppp package
ppppp::xxxxx    a (possibly internal) symbol in the ppppp package
where nnnnn has the syntax of a number, and xxxxx and ppppp do not have the syntax of a number.
In accordance with the X3J13 decision noted above (COLON-NUMBER),
xxxxx and ppppp may not have the syntax of even a potential number.
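The valid symbol patterns can be illustrated with a small sketch. The function name package-marker-demo is hypothetical, and the example assumes the standard readtable, in which unescaped letters are converted to upper case.

```lisp
(defun package-marker-demo ()
  "Return a list of four booleans, one per valid symbol pattern."
  (list
   ;; :xxxxx names a symbol in the keyword package.
   (keywordp (read-from-string ":xxxxx"))
   ;; ppppp:xxxxx names an external symbol of package ppppp.
   (eq (read-from-string "common-lisp:car") 'common-lisp:car)
   ;; ppppp::xxxxx also works for symbols that happen to be external.
   (eq (read-from-string "common-lisp::car") 'common-lisp:car)
   ;; xxxxx is looked up (and created if necessary) in *package*.
   (eq (read-from-string "xxxxx") (find-symbol "XXXXX" *package*))))

(package-marker-demo)   ; => (T T T T)
```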
[Variable]
*read-base*
The value of *read-base* controls the interpretation of tokens by read
as being integers or ratios. Its value is the radix in which integers
and ratios are to be read; the value may be any integer from 2 to 36
(inclusive) and is normally 10 (decimal radix). Its value affects only
the reading of integers and ratios. In particular, floating-point
numbers are always read in decimal radix. The value of *read-base*
does not affect the radix for rational numbers whose radix is explicitly
indicated by #O, #X, #B, or #nR syntax or by a trailing decimal point.
Care should be taken when setting *read-base* to a value larger than
10, because tokens that would normally be interpreted as symbols may be
interpreted as numbers instead. For example, with *read-base* set to 16
(hexadecimal radix), variables with names such as a, b, f, bad, and
face will be treated by the reader as numbers (with decimal values 10,
11, 15, 2989, and 64206, respectively). The ability to alter the input
radix is provided in Common Lisp primarily for the purpose of reading
data files in special formats, rather than for the purpose of altering
the default radix in which to read programs. The user is strongly
encouraged to use #O, #X, #B, or #nR syntax when notating non-decimal
constants in programs.
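As a brief illustration of which tokens *read-base* does and does not affect, consider the following sketch. The variable name *hex-base-readings* is hypothetical.

```lisp
(defparameter *hex-base-readings*
  ;; Every token below is read while *READ-BASE* is 16.
  (let ((*read-base* 16))
    (mapcar (lambda (text) (values (read-from-string text)))
            '("10"      ; integer in the current radix
              "10."     ; trailing decimal point forces decimal
              "#o10"    ; explicit octal
              "#b10"    ; explicit binary
              "#3r10"   ; explicit radix 3
              "1.5"     ; floating-point numbers are always decimal
              "bad")))) ; would be a symbol if *READ-BASE* were 10

*hex-base-readings*   ; => (16 10 8 2 3 1.5 2989)
```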
Compatibility note: This variable corresponds to the variable called
ibase in MacLisp and to the function called radix in Interlisp.
[Variable]
*read-suppress*
When the value of *read-suppress* is nil, the Lisp reader operates
normally. When it is not nil, then most of the interesting operations
of the reader are suppressed; input characters are parsed, but much of
what is read is not interpreted.
The primary purpose of *read-suppress* is to support the operation of
the read-time conditional constructs #+ and #- (see section 22.1.4). It
is important for these constructs to be able to skip over the printed
representation of a Lisp expression despite the possibility that the
syntax of the skipped expression may not be entirely legal for the
current implementation; this is because a primary application of #+ and
#- is to allow the same program to be shared among several Lisp
implementations despite small incompatibilities of syntax.
A non-nil value of *read-suppress* has the following specific effects
on the Common Lisp reader:
- All extended tokens are completely uninterpreted. An extended token
  is simply discarded and treated as if it were nil; that is, reading
  an extended token when *read-suppress* is non-nil simply returns nil.
  (One consequence of this is that the error concerning improper
  dotted-list syntax will not be signaled.)
- Any standard # macro-character construction that requires, permits,
  or disallows an infix numerical argument, such as #nR, will not
  enforce any constraint on the presence, absence, or value of such an
  argument.
- The #\ construction always produces the value nil. It will not signal
  an error even if an unknown character name is seen.
- Each of the #B, #O, #X, and #R constructions always scans over a
  following token and produces the value nil. It will not signal an
  error even if the token does not have the syntax of a rational number.
- The #* construction always scans over a following token and produces
  the value nil. It will not signal an error even if the token does not
  consist solely of the characters 0 and 1.
- Each of the #. and #, constructions reads the following form (in
  suppressed mode, of course) but does not evaluate it. The form is
  discarded and nil is produced.
  X3J13 voted (SHARP-COMMA-CONFUSION) to remove #, from the language.
- Each of the #A, #S, and #: constructions reads the following form (in
  suppressed mode, of course) but does not interpret it in any way; it
  need not even be a list in the case of #S, or a symbol in the case of
  #:. The form is discarded and nil is produced.
- The #= construction is totally ignored. It does not read a following
  form. It produces no object, but is treated as whitespace.
- The ## construction always produces nil.
Note that, no matter what the value of *read-suppress*, parentheses
still continue to delimit (and construct) lists; the #( construction
continues to delimit vectors; and comments, strings, and the quote and
backquote constructions continue to be interpreted properly.
Furthermore, such situations as '), #<, #), and #<space> continue to
signal errors.
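The difference that *read-suppress* makes can be sketched as follows. The helper readable-p and the feature name hypothetical-absent-feature are inventions for this example; the feature is assumed not to be present in *features*. The sketch deliberately does not inspect the object returned in suppressed mode, only whether an error occurs.

```lisp
(defun readable-p (text &key suppress)
  "Return :NO-ERROR if TEXT can be read, or :ERROR if the reader signals one.
SUPPRESS is used as the value of *READ-SUPPRESS* during the read."
  (handler-case
      (let ((*read-suppress* suppress))
        (read-from-string text)
        :no-error)
    (error () :error)))

;; A zero denominator and an unknown character name are normally errors.
(readable-p "(a 1/0 #\\NoSuchCharacterName)")               ; => :ERROR
(readable-p "(a 1/0 #\\NoSuchCharacterName)" :suppress t)   ; => :NO-ERROR

;; #+ relies on this to skip a form it cannot interpret; the reader
;; then returns the next object, here the symbol OK.
(read-from-string
 "#+hypothetical-absent-feature (1/0 #\\NoSuchCharacterName) ok")
```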
In some cases, it may be appropriate for a user-written
macro-character definition to check the value of *read-suppress* and to
avoid certain computations or side effects if its value is not nil.
[Variable]
*read-eval*
X3J13 voted in June 1989 (DATA-IO) to add a new reader control
variable, *read-eval*, whose default value is t. If *read-eval* is
false, the #. reader macro signals an error.
Printing is also affected. If *read-eval* is false and *print-readably*
is true, any print-object method that would otherwise output a #.
reader macro must either output something different or signal an error
of type print-not-readable.
Binding *read-eval* to nil is useful when reading data that came from
an untrusted source, such as a network or a user-supplied data file; it
prevents the #. reader macro from being exploited as a "Trojan horse"
to cause arbitrary forms to be evaluated.
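A minimal sketch of reading untrusted text follows. The name read-untrusted is hypothetical, and the sketch assumes that the error signaled by #. is of type reader-error. It guards only against read-time evaluation, not against other concerns such as the interning of new symbols.

```lisp
(defun read-untrusted (text)
  "Read one object from TEXT with read-time evaluation disabled.
Returns :REJECTED if the reader signals a READER-ERROR."
  (handler-case
      (with-standard-io-syntax
        ;; WITH-STANDARD-IO-SYNTAX binds *READ-EVAL* to T, so it must be
        ;; rebound inside that form, not outside it.
        (let ((*read-eval* nil))
          (values (read-from-string text))))
    (reader-error () :rejected)))

(read-untrusted "(1 2 3)")      ; => (1 2 3)
(read-untrusted "#.(+ 1 2)")    ; => :REJECTED
```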