Common Lisp the Language, 2nd Edition
Common Lisp provides a character data type; objects of this type represent printed symbols such as letters.
In general, characters in Common Lisp are not true objects; eq cannot
be counted upon to operate on them reliably. In particular, it is
possible that the expression
(let ((x z) (y z)) (eq x y))
may be false rather than true, if the value of z is a character.
Rationale: This odd breakdown of eq in the case of characters allows
the implementor enough design freedom to produce exceptionally
efficient code on conventional architectures. In this respect the
treatment of characters exactly parallels that of numbers, as described
in chapter 12.
Table 13-1: Standard Character Labels, Glyphs, and Descriptions

                               SM05  @  commercial at          SD13  `  grave accent
SP02  !  exclamation mark      LA02  A  capital A              LA01  a  small a
SP04  "  quotation mark        LB02  B  capital B              LB01  b  small b
SM01  #  number sign           LC02  C  capital C              LC01  c  small c
SC03  $  dollar sign           LD02  D  capital D              LD01  d  small d
SM02  %  percent sign          LE02  E  capital E              LE01  e  small e
SM03  &  ampersand             LF02  F  capital F              LF01  f  small f
SP05  '  apostrophe            LG02  G  capital G              LG01  g  small g
SP06  (  left parenthesis      LH02  H  capital H              LH01  h  small h
SP07  )  right parenthesis     LI02  I  capital I              LI01  i  small i
SM04  *  asterisk              LJ02  J  capital J              LJ01  j  small j
SA01  +  plus sign             LK02  K  capital K              LK01  k  small k
SP08  ,  comma                 LL02  L  capital L              LL01  l  small l
SP10  -  hyphen or minus sign  LM02  M  capital M              LM01  m  small m
SP11  .  period or full stop   LN02  N  capital N              LN01  n  small n
SP12  /  solidus               LO02  O  capital O              LO01  o  small o
ND10  0  digit 0               LP02  P  capital P              LP01  p  small p
ND01  1  digit 1               LQ02  Q  capital Q              LQ01  q  small q
ND02  2  digit 2               LR02  R  capital R              LR01  r  small r
ND03  3  digit 3               LS02  S  capital S              LS01  s  small s
ND04  4  digit 4               LT02  T  capital T              LT01  t  small t
ND05  5  digit 5               LU02  U  capital U              LU01  u  small u
ND06  6  digit 6               LV02  V  capital V              LV01  v  small v
ND07  7  digit 7               LW02  W  capital W              LW01  w  small w
ND08  8  digit 8               LX02  X  capital X              LX01  x  small x
ND09  9  digit 9               LY02  Y  capital Y              LY01  y  small y
SP13  :  colon                 LZ02  Z  capital Z              LZ01  z  small z
SP14  ;  semicolon             SM06  [  left square bracket    SM11  {  left curly bracket
SA03  <  less-than sign        SM07  \  reverse solidus        SM13  |  vertical bar
SA04  =  equals sign           SM08  ]  right square bracket   SM14  }  right curly bracket
SA05  >  greater-than sign     SD15  ^  circumflex accent      SD19  ~  tilde
SP15  ?  question mark         SP09  _  low line
If two objects are to be compared for "identity," but either might be a
character, then the predicate eql is probably appropriate.
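The advice about eq and eql can be illustrated with a minimal sketch. The function name same-object-p is hypothetical and introduced only for this example; it simply wraps eql, which compares characters and numbers by value and other objects by identity.

```lisp
;; SAME-OBJECT-P is a hypothetical helper, not part of Common Lisp.
;; It uses EQL, so it stays reliable when either argument is a
;; character (or a number).
(defun same-object-p (a b)
  (eql a b))

(let* ((z #\a)
       (x z)
       (y z))
  ;; (eq x y) is permitted to return either true or false here,
  ;; so portable code should not depend on its result.
  (same-object-p x y))   ; true in every implementation
```

Code that only ever compares symbols or conses can keep using eq; the wrapper matters only when a character or number might appear.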
X3J13 voted in March 1989 (CHARACTER-PROPOSAL) to approve the
following definitions and terminology for use in discussing character
facilities in Common Lisp.
A character repertoire defines a collection of characters
independent of their specific rendered image or font. (This corresponds
to the mathematical notion of a set, but the term character set
is avoided here because it has been used in the past to mean both what
is here called a repertoire and what is here called a coded character
set.) Character repertoires are specified independent of coding and
their characters are identified only with a unique character
label, a graphic symbol, and a character description. As an
example, table 13-1 shows the character labels, graphic symbols, and
character descriptions for all of the characters in the repertoire
standard-char except for #\Space and #\Newline.
Every Common Lisp implementation must support the standard character
repertoire as well as repertoires named base-character,
extended-character, and character. Other repertoires may be supported
as well. X3J13 voted in June 1989 (MORE-CHARACTER-PROPOSAL) to specify
that names of repertoires may be used as type specifiers. Such types
must be subtypes of character; that is, in a given implementation the
repertoire named character must encompass all the character objects
supported by that implementation.
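As a rough illustration of repertoire names serving as type specifiers, the sketch below uses only standard-char and character. It avoids base-character and extended-character because the final ANSI standard spells those types base-char and extended-char, so the names in the text may not be recognized by a current implementation. The helper character-in-repertoire-p is hypothetical.

```lisp
;; Repertoire names used directly as type specifiers.
(typep #\a 'standard-char)             ; true: a appears in table 13-1
(typep #\a 'character)                 ; true for every character object
(subtypep 'standard-char 'character)   ; true: repertoire types are
                                       ; subtypes of CHARACTER

;; CHARACTER-IN-REPERTOIRE-P is a hypothetical helper, not a standard
;; function. REPERTOIRE is assumed to be a type specifier that names a
;; character repertoire.
(defun character-in-repertoire-p (object repertoire)
  (and (characterp object)
       (typep object repertoire)))
```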
A coded character set is a character repertoire plus an encoding that provides a bijective mapping between each character in the set and a number (typically a non-negative integer) that serves as the character representation. There are numerous internationally standardized coded character sets.
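The one-to-one nature of such an encoding can be observed, at least for the implementation's own internal coding, with char-code and code-char. This is only a sketch: the helper round-trips-p is hypothetical, and the particular integers returned by char-code are implementation dependent and need not match any standardized coded character set.

```lisp
;; ROUND-TRIPS-P is a hypothetical helper, not a standard function.
;; It checks that mapping a character to its code and back yields the
;; same character, as expected of a one-to-one encoding.
(defun round-trips-p (char)
  (let ((code (char-code char)))
    (and (integerp code)
         (<= 0 code)
         (< code char-code-limit)
         (eql char (code-char code)))))

;; Check a handful of standard characters.
(every #'round-trips-p "Lisp 1989!")
```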
A character may be included in one or more character repertoires. Similarly, a character may be included in one or more coded character sets.
To ensure that each character is uniquely defined, we may use a universal registry of characters that incorporates a collection of distinguished repertoires called character scripts that form an exhaustive partition of all characters. That is, each character is included in exactly one character script. (Draft ISO 10646 Coded Character Set Standard, if eventually approved as a standard, may become the practical realization of this universal registry.)
(X3J13 voted in June 1989 (MORE-CHARACTER-PROPOSAL) to specify that
an implementation must document the character scripts it supports. For
each script the documentation should discuss character labels, glyphs,
and descriptions; any canonicalization processes performed by the reader
that result in treating distinct characters as equivalent; any
canonicalization performed by format in processing directives; the
behavior of char-upcase, char-downcase, and the predicates
alpha-char-p, upper-case-p, lower-case-p, both-case-p, graphic-char-p,
alphanumericp, char-equal, char-not-equal, char-lessp, char-greaterp,
char-not-greaterp, and char-not-lessp for characters in the script; and
behavior with respect to input and output, including coded character
sets and external coding schemes.)
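For the standard characters, the behavior of the functions listed above is fixed by the language, so it can serve as a reference point when reading an implementation's documentation for other scripts. The following sketch shows only standard characters; results for characters outside that repertoire are exactly what the implementation is asked to document.

```lisp
;; Case conversion
(char-upcase #\a)            ; => #\A
(char-downcase #\A)          ; => #\a
(char-upcase #\1)            ; => #\1 (no case, returned unchanged)

;; Classification predicates
(alpha-char-p #\a)           ; true
(upper-case-p #\A)           ; true
(lower-case-p #\A)           ; false
(both-case-p #\a)            ; true
(alphanumericp #\7)          ; true
(graphic-char-p #\a)         ; true

;; Comparisons that ignore case
(char-equal #\a #\A)         ; true
(char-not-equal #\a #\b)     ; true
(char-lessp #\a #\B)         ; true
(char-greaterp #\B #\a)      ; true
(char-not-greaterp #\a #\A)  ; true
(char-not-lessp #\A #\a)     ; true
```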
In Common Lisp a character data object is identified by its character code, a unique numerical code. Each character code is composed from a character script and a character label. The convention by which a character script and character label compose a character code is implementation dependent. [X3J13 did not approve all parts of the proposal from its Subcommittee on Characters. As a result, some features that were approved appear to have no purpose. X3J13 wished to support the standardization by ISO of character scripts and coded character sets but declined to design facilities for use in Common Lisp until there has been more progress by ISO in this area. The approval of the terminology for scripts and labels gives a hint to implementors of likely directions for Common Lisp in the future.]
A character object that is classified as graphic, or displayable, has an associated glyph. The glyph is the visual representation of the character. All other character data objects are classified as non-graphic.
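The graphic/non-graphic distinction can be tested with graphic-char-p. The sketch below uses standard characters only; printable-part is a hypothetical helper introduced for illustration.

```lisp
(graphic-char-p #\A)         ; true: A has a glyph
(graphic-char-p #\Space)     ; true: space is treated as graphic
(graphic-char-p #\Newline)   ; false: newline is non-graphic

;; PRINTABLE-PART is a hypothetical helper, not a standard function.
;; It returns a copy of STRING containing only its graphic characters.
(defun printable-part (string)
  (remove-if-not #'graphic-char-p string))

(printable-part (format nil "ab~%cd"))   ; => "abcd"
```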
This terminology assigns names to Common Lisp concepts in a manner
consistent with related concepts discussed in various ISO standards for
coded character sets and provides a demarcation between standardization
activities. For example, facilities for manipulating characters,
character scripts, and coded character sets are properly defined by a
Common Lisp standard, but Common Lisp should not define standard
character sets or standard character scripts.