Operations in Lisp, Scheme, and other dynamically-typed languages typically dispatch on representational type information rather than intentional type information. Several broad classes of bugs and confusions can be traced to improper attempts to recover intentional type information from representation types.
I’ve chosen here to discuss some Common Lisp built-in operators that highlight the various issues. However, the problems cited here are quite general, and occur routinely in other dynamically-typed languages as well as user programs. Fortunately, the solutions to these problems are also conveniently available to designers, implementors, and programmers–without throwing dynamic typing out the window. I’ve provided code to illustrate how these ideas can be translated into practice without requiring any fundamental change to the underlying technology.
“Why is there no generic COPY
function?” Common Lisp
programmers often ask this question.
This glossary entry from dpANS Common Lisp [CL93]
provides some useful background information and a brief rationale for
the absence of a generic COPY
function:
copy n.
- (of a cons C) a fresh cons with the same car and cdr as C.
- (of a list L) a fresh list with the same elements as L.. (Only the list structure is fresh; the elements are the same.) See the function
copy-list
.- (of an association list A with elements A_(i)) a fresh list B with elements B_(i), each of which is nil if A_(i) is nil, or else a copy of the cons A_(i). See the function
copy-alist
.- (of a tree T) a fresh tree with the same leaves as T. See the function
copy-tree
.- (of a random state R) a fresh random state that, if used as an argument to the function
random
would produce the same series of “random” values as R would produce.- (of a structure S) a fresh structure that has the same type as S, and that has slot values, each of which is the same as the corresponding slot value of S.
(Note that since the difference between a cons, a list, and a tree is a matter of “view” or “intention,” there can be no general-purpose function which, based solely on the type of an object, can determine which of these distinct meanings is intended. The distinction rests solely on the basis of the text description within this document. For example, phrases like “a copy of the given list” or “copy of the list x” imply the second definition.)
Programmers want a generic COPY
function that somehow
``does the right thing,’’ but relying solely on the representation type
is insufficient to resolve intentional distinctions between conses,
lists, alists, and trees, so we wouldn’t know whether to call
COPY-CONS
(if there were such a thing), COPY-LIST
,
COPY-ALIST
,
or COPY-TREE
.
Of course, we could just define that COPY-LIST
is what’s used, but that could (and probably would) give the misleading
impression that such a choice was ``uniquely determined’’ or even
``morally correct,’’ instead of just arbitrarily chosen. Also, an
arbitrary choice would leave users feeling funny about the presence of
an operator that can copy many–but not all–kinds of objects.
Where does this leave us? Well, it doesn’t mean the problem cannot be solved. It only means that additional data flow must be provided to the running program in order for it to be able to divine the programmer’s intent. For example:
(DEFUN COPY (OBJECT &OPTIONAL (INTENTIONAL-TYPE (TYPE-OF OBJECT)))
(COPY-OBJECT OBJECT INTENTIONAL-TYPE))
(DEFGENERIC COPY-OBJECT (OBJECT INTENTIONAL-TYPE))
(DEFMETHOD COPY-OBJECT ((OBJECT CONS) (TYPE (EQL 'CONS)))
(CONS (CAR OBJECT) (CDR OBJECT)))
(DEFMETHOD COPY-OBJECT ((OBJECT LIST) (TYPE (EQL 'LIST)))
(COPY-LIST OBJECT))
(DEFMETHOD COPY-OBJECT ((OBJECT LIST) (TYPE (EQL 'ALIST)))
(COPY-ALIST OBJECT))
(DEFMETHOD COPY-OBJECT (OBJECT (TYPE (EQL 'CONS-TREE)))
(COPY-TREE OBJECT))
``If I can’t have COPY
, why can I have a generic (ok,
just polymorphic) EQUAL
function?’’ Common Lisp programmers should ask this question, but rarely
do. I suppose most programmers are just happy we’ve given them a tool of
`reasonable engineering quality’ so they can get work done, and are not
philosophically inclined to look such a proverbial gift horse in the
mouth.
The design issues here are pretty much the same as they are for
COPY
. If COPY
can’t be done properly, then
neither can EQUAL
.
And, in fact, that’s the case. There is no uniquely determined equality
function for complex structures–there are only arbitrary ones.
EQUAL
and EQUALP
are just two of an arbitrary number of possible equality operations that
could have been provided by the language. Indeed, many of the dialects
which contributed to the design of Common Lisp had functions called EQUAL
which had slightly varying semantics. No particular definition was
definitively better than another. Arbitrary choices were made to resolve
the differences.
To avoid an arbitrary choice, it would be necessary to express some intentional information about how to descend the tree and how to compare the leaves. The following code illustrates one possibility.
;;; The name EQUIVALENT is used here instead of EQUAL to avoid a need
;;; to shadow the Common Lisp built-in function EQUAL.
(DEFUN EQUIVALENT (X Y &OPTIONAL (INTENTIONAL-TYPE (TYPE-OF Y)))
(EQUIVALENT-OBJECTS X Y INTENTIONAL-TYPE))
(DEFGENERIC EQUIVALENT-OBJECTS (X Y INTENTIONAL-TYPE))
(DEFMETHOD EQUIVALENT-OBJECTS ((X CONS) (Y CONS) (TYPE (EQL 'CONS)))
(AND (EQ (CAR X) (CAR Y)) (EQ (CDR X) (CDR Y))))
(DEFMETHOD EQUIVALENT-OBJECTS ((X LIST) (Y LIST) (TYPE (EQL 'LIST)))
(AND (= (LENGTH X) (LENGTH Y))
(EVERY #'EQL X Y)))
(DEFMETHOD EQUIVALENT-OBJECTS ((X LIST) (Y LIST) (TYPE (EQL 'ALIST)))
(AND (= (LENGTH X) (LENGTH Y))
(EVERY #'(LAMBDA (X Y) (EQUIVALENT X Y 'CONS)) X Y)))
(DEFMETHOD EQUIVALENT-OBJECTS (X Y (TYPE (EQL 'CONS-TREE)))
(EQUAL X Y))
This example shows truth values under our new
EQUIVALENT
predicate after various kinds of copying
using our new COPY
function:
(LET* ((A (GENSYM "A")) (B (GENSYM "B"))
(CONS (LIST A B))
(LIST (LIST CONS CONS))
(TYPES '(CONS LIST ALIST CONS-TREE))
(LISTS (CONS LIST (MAPCAR #'(LAMBDA (TYPE) (COPY LIST TYPE))
TYPES)))
(PAIRS (LOOP FOR P1 ON LISTS
APPEND (LOOP FOR P2 ON (CDR P1)
COLLECT (LIST (CAR P1) (CAR P2))))))
(LOOP FOR TYPE IN TYPES
COLLECT (CONS TYPE (LOOP FOR (X Y) IN PAIRS
COLLECT (EQUIVALENT X Y TYPE)))))
-> ((CONS T NIL NIL NIL NIL NIL NIL NIL NIL NIL)
(LIST T T NIL NIL T NIL NIL NIL NIL NIL)
(ALIST T T T NIL T T NIL T NIL NIL)
(CONS-TREE T T T T T T T T T T))
An interesting sidelight on the equality issue is that the functions provided by Common Lisp are not chosen in a completely arbitrary way. The following implications hold:
(EQ X Y) => (EQL X Y) => (EQUAL X Y) => (EQUALP X Y)
Similarly, in the replacement functionality we’ve proposed here:
(EQUIVALENT X Y 'CONS)
=> (EQUIVALENT X Y 'LIST)
=> (EQUIVALENT X Y 'ALIST)
=> (EQUIVALENT X Y 'CONS-TREE)
The fact that these functions have been chosen to have a sort of inclusion relationship does not imply that equality is a one-dimensional quantity, with predicates varying in no more interesting way than being more or less conservative. For example, the following predicates are not comparable:
(DEFUN EQUIV1 (X Y)
(COND ((AND (STRINGP X) (STRINGP Y)) (STRING-EQUAL X Y)) ;Liberal
((AND (NUMBERP X) (NUMBERP Y)) (EQL X Y)) ;Conservative
(T (EQL X Y))))
(DEFUN EQUIV2 (X Y)
(COND ((AND (STRINGP X) (STRINGP Y)) (STRING= X Y)) ;Conservative
((AND (NUMBERP X) (NUMBERP Y)) (= X Y)) ;Liberal
(T (EQL X Y))))
(EQUIV1 "Foo" "FOO") -> T (EQUIV2 "Foo" "FOO") -> NIL
(EQUIV1 1 1.0) -> NIL (EQUIV2 1 1.0) -> T
While it’s useful that there is an inclusion relationship among the
particular equality predicates offered by Common Lisp, the orderliness
of this relationship contributes to a mistaken impression among some
programmers that the equality testing done by EQUAL
and EQUALP
is somehow more special than many similar predicates we could have
provided but did not. This is evidenced in occasional bug reports that
vendors receive, arguing that an incorrect choice has ``clearly’’ been
made for how objects of a given type are compared, rather than
acknowledging that the choice is really quite arbitrary. When urged to
write their own equality predicate to suit their particular needs, they
sometimes react as if we are putting them off, rather than realizing
that any function they could write is just as valid as any one the
language provides. Were the language changed to accommodate such bug
reports, different users would probably complain. EQUAL
and EQUALP
are not in Common Lisp because they are uniquely dictated by
science–rather, these functions are present out of a sense of tradition
and conceptual (although in some cases not functional) compatibility
with older dialects.
Another place where intentional type information is quite valuable is
in coercion. The Common Lisp functions STRING
and COERCE
blatantly illustrate the problem:
(STRING 'NIL) -> "NIL" (COERCE 'NIL 'STRING) -> ""
The issue here is that COERCE
coerces sequence types to other sequence types, but does not coerce
symbols to strings; STRING
,
by contrast, accepts only symbols, strings, and characters as arguments.
These somewhat arbitrary type restrictions on the arguments provide
enough intentional type information to dictate the behavior in the NIL
case.
(STRING 'FOO) -> "FOO" (COERCE '(#\F #\O #\O) 'STRING) -> "FOO"
(STRING 'NIL) -> "NIL" (COERCE '() 'STRING) -> ""
An arbitrary design decision was made that COERCE
would view NIL
as an empty
list, but STRING
would view it as a symbol. The problem could be solved by a single
function in a general way if COERCE
took arguments specifying not only the target type but the intentional
type of the argument, as in the following example:
;;; We use the name CONVERT here to avoid name conflict with CL's COERCE.
(DEFGENERIC CONVERT (X FROM-INTENTIONAL-TYPE TO-INTENTIONAL-TYPE))
(DEFMETHOD CONVERT ((OBJECT SYMBOL) (FROM (EQL 'SYMBOL)) (TO (EQL 'STRING)))
(STRING OBJECT))
(DEFMETHOD CONVERT ((OBJECT LIST) (FROM (EQL 'LIST)) (TO (EQL 'STRING)))
(COERCE OBJECT 'STRING))
(CONVERT 'NIL 'SYMBOL 'STRING) -> "NIL"
(CONVERT 'NIL 'LIST 'STRING) -> ""
A related problem arose with the function INT-CHAR
, which
was removed between the publication of Common Lisp: The Language [CL84] and dpANS Common Lisp [CL93], primarily because the mapping from integers to
characters is not uniquely determined and it was felt that typical uses
of INT-CHAR
were suspect in portable code because the
integer-to-character mapping performed by INT-CHAR
is
implementation-defined, and the intentional type information of the
integer was not manifest. Explicit representation of intentional types
would have solved this problem as well, as illustrated here:
;;; For brevity, these examples do not bounds-check their argument integers.
(DEFMETHOD CONVERT ((I INTEGER) (FROM (EQL 'ASCII-CODE)) (TO (EQL 'CHARACTER)))
(AREF *ASCII-CODE-TABLE* I))
(DEFMETHOD CONVERT ((I INTEGER) (FROM (EQL 'EBCDIC-CODE)) (TO (EQL 'CHARACTER)))
(AREF *EBCDIC-CODE-TABLE* I))
(DEFMETHOD CONVERT ((I INTEGER) (FROM (EQL 'SAIL-CODE)) (TO (EQL 'CHARACTER)))
;; This character encoding originated long ago at the Stanford AI Lab.
(AREF *SAIL-CODE-TABLE* I))
(CONVERT #x41 'ASCII-CODE 'CHARACTER) -> #\A
(CONVERT #xC1 'EBCDIC-CODE 'CHARACTER) -> #\A
(CONVERT #x08 'ASCII-CODE 'CHARACTER) -> #\BS
(CONVERT #x08 'SAIL-CODE 'CHARACTER) -> #\l
An example of a programming system that does make good use of intentional type information is the Common Lisp Interface Manager [CLIM92].
CLIM has a model of both input and output which is built around the idea of intentional types. Input and output requests can be accompanied by ``presentation types’’ that contains the intentional type information necessary to accept (i.e., parse) or present (i.e., unparse or display) an object in a manner more refined than representational type information would permit. This is especially important for input since there are a wide variety of possible representations into which the same string could be mapped.
(ACCEPT-FROM-STRING 'STRING "3.2") -> "3.2"
(ACCEPT-FROM-STRING 'NUMBER "3.2") -> 3.2
(ACCEPT-FROM-STRING 'KEYWORD "3.2") -> :|3.2|
(ACCEPT-FROM-STRING 'PATHNAME "3.2") -> #P"MY-HOST:3.2"
Note that the presence of this CLIM functionality does not compromise
access to pre-existing functionality involving the standard Lisp parser
(i.e., READ
),
as in:
(READ-FROM-STRING "3.2") -> 3.2
Instead, CLIM provides such functionality through one of the many explicitly provided ``presentation type’’ options, as in:
(ACCEPT-FROM-STRING 'EXPRESSION "3.2") -> 3.2
In fact, input and output are just a specific instance of the more general issue of translating between internal and external representations of any kind. Other instances might occur in binary file I/O, in network protocols, in foreign function interfaces, and even in some cross-module exchanges within the same address space. Simplified, stylized, compacted, or otherwise specialized representations may be quite useful, but are really only fully powerful if the information that they convey can be inverted to produce an object of the same quality as the original.
In a language with strong static typing, the intentional type of the object would be evident at compile time, and the same representational type could be used for multiple intentional types. The main problem with this approach is that it gives up dynamic typing, which Lisp users have come to expect and enjoy.
If static type information is optional, it is difficult for language designers to reliably express how operators behave in the hybrid environment that results.
To avoid the need to throw dynamic typing out the window, we have proposed that the information which in some languages would be reliably available at compile time be passed as explicit data. Where such data is provided as a literal constant, the same compilation techniques as used by static languages would still apply. Where such data is not available to the compiler, the information would still be reliably and explainably available at runtime.
We have seen how this technique would lead to a more intuitive feel in a number of commonly used operators, as well as a possible reduction in the overall number of operators required.
These improvements are achieved by avoiding the complicated and messy business of trying to guess the user’s intentions about data from its chosen representation, and instead asking the programmer to express this information explicitly.
``If you have two bits of information to represent,
use two bits to represent it.
Neither coincidence nor convenience
justifies overloading a single bit.''
-- Pitman's Two-Bit Rule
Innumerable legions of my co-workers at Harlequin rushed to help me search out the EBCDIC character code for capital A; I shall be forever indebted to these fine and dedicated individuals. John Aspinall and Scott McKay provided advice about the CLIM examples. Chris Stacy and Rebecca Spainhower read early drafts of this document and provided useful feedback.
[CL84] Guy L. Steele Jr.,
Common Lisp: The Language,
Digital Press (Burlington, MA), 1984.
[CL93] Kent Pitman and Kathy Chapman (editors),
draft proposed American National Standard for Information
Systems--Programming Language--Common Lisp,
X3J13 document 93-102, 1993.
Available by anonymous FTP from "/pub/cl/dpANS2" on PARCFTP.Xerox.COM.
[CLIM92] Scott McKay and Bill York,
Common Lisp Interface Manager Release 2.0 Specification.
[No longer] Available by anonymous FTP from "/pub/clim" on Cambridge.Apple.COM.
Published in hardcopy in Lisp Pointers, Volume VI, Number 4, October-December 1993.
Hardcopy version: Copyright © 1994, Kent M. Pitman. All rights reserved.
[The copyright date appears to contradict the publication date, but in fact the printed date on the magazine was much earlier than the distribution date, which was in 1994.]
Web version: Copyright © 1997, Kent M. Pitman. All rights reserved.