Symbolics'
Genera system represents the accumulation of nearly three and a half decades of
software engineering tools, written in several dialects of the Lisp language
and several object-oriented extensions to Lisp. Traditionally, Genera has run
on a "Lisp Machine", a line of custom hardware workstations designed
specifically to execute the Lisp language. Because of the limited market, this
custom hardware has never been able to take advantage of cutting-edge
technology such as is available in commodity, RISC-based workstations. At the
same time, non-standard hardwares such as the Lisp Machine have fallen into
economic disfavor. Nonetheless, Lisp's (and the Lisp Machine's) capability in
prototyping, evolutionary problem solving, and super-complex problems has
retained a small but dedicated market.
In response to market pressure to provide Genera's capabilities on commodity
hardware, Symbolics chose to implement an "emulator" that would execute the
Lisp Machine instruction set on a standard platform, rather than to port the
approximately 1.5 million lines of Genera source code to a single Lisp dialect,
as would be necessary to take advantage of a native Lisp compiler on the same
platform. It was felt that this approach would result in a shorter
time-to-market while still preserving the robustness and flexibility of Genera.
At the same time, the emulator approach has meant that the most important
features of the Lisp machine in support of Lisp have also been preserved. In
particular, the tagged memory architecture supporting dynamic typing, the fast
exception handling for complete instruction semantics, and read and write
barriers in support of automatic storage management have been preserved.
The DEC Alpha RISC architecture offers a full 64-bit memory architecture and
multiple-issue instruction execution. Using the 64-bit (byte-based) address
space we were able to emulate within a single OSF/1 process both the 40-bit
tagged data and 32-bit (word-based) address space of the Lisp Machine. With
the multiple-issue instruction execution we were able to take the approach of
using the Alpha as a programmable micro-engine and write the emulation of the
Lisp Machine instruction set as if we were writing micro-code. We were
successfully able to utilize the OSF/1 operating system facilities to replace
the I/O subsystem of the traditional Lisp Machine at a high level, thus gaining
significant I/O performance improvement. The user-program accessible memory
protection facilities of OSF/1 allowed us to successfully emulate most of the
hardware features of the Lisp Machine in support of "ephemeral" garbage
collection.
The performance of the first version of the Virtual Lisp Machine running on a
DEC Alpha AXP 500 is very competitive with the performance of Symbolics' most
advanced custom hardware. (This performance can be expected to improve
proportionally as DEC introduces more powerful versions of the Alpha.) We
believe that a large part of this performance is due to the "micro-program"
approach of the emulator, which utilizes the instruction and data caches of the
Alpha in a quite different fashion than a traditional compiled version of the
same Lisp program might. The full paper discusses the details of our design
decisions, implementation, benchmarks, and some of the surprising results we
observed in tuning the emulator to the Alpha architecture.
The
sole purpose of a Lisp Machine is to support the execution of the Lisp language
in hardware. To that end, all architectural features find their roots in Lisp.
The memory architecture is "object-oriented"- every memory word contains an
object in the form of a data-type (tag) and representation, either immediate
(data) or as a reference (pointer) to the representation of the object. In
addition to objects, there are a small set of "special markers" used for
storage-management bookkeeping (mutating objects), monitoring (data and call
tracing), and debugging (uninitialized data). The hardware understands
"primitive" type representations.
One of the features of Lisp Machines (one of the technologies that allowed its
creation) is the use of microcode to implement complex instructions closely
tuned to the language-level concepts of Lisp. In the '70s, when the Lisp
Machine was born, compiler technology was poor and microcoding simplified the
compiler writer's task while maintaining performance. Today, advanced compiler
technology appears to obviate the need for microcoding and complex instruction
sets, but we discovered that the continual climb in processor clock rates (and
resulting increasing mismatch between memory speeds and instruction execution
rate) may mark the return of micro-programming as a valid implementation
technology. In this case, by micro-programming, we simply mean another layer
of abstraction between the underlying hardware execution unit and the
instruction execution model the compiler has as a target, what has also been
termed an interpreter.
A second enabling technology of the Lisp Machine, is the use of "tagged
memory". While modern compiler technology (in particular, flow-analysis, type
propagation, and block compilation) combined with careful type declarations on
the part of the programmer can provide very competitive Lisp implementations on
standard architectures, it is not in the tradition of Lisp to require type
declarations. The use of tagged memory to support dynamic typing and generic
operations allowed the Lisp machine to give competitive performance in the
absence of carefully declared types, a key feature in support of its rapid
prototyping capability. Supporting tagged memory was an important goal of our
emulator.
In the full paper, we give an overview of Lisp Machine architecture, its origin
and current state. We discuss our earlier studies that had determined
emulation on 32-bit address and data machines would have unacceptable
performance. The advent of true 64-bit workstations allowed us to revisit that
study and determine that a competitive product was possible.
We
knew there were a number of challenges to be overcome in developing an
emulator, but the performance bottleneck in all our studies was the
sophisticated memory model of the Lisp Machine. The "object-ness" of memory is
built in to the hardware at the lowest level. All memory accesses are
data-sensitive, in support of dynamic typing, monitoring, and garbage
collection. We knew from other work in the field that comparable garbage
collection models were being supported using the simple page-protection
features underlying most operating systems. The challenge was to convert the
complex but venerable Genera garbage-collector to such a scheme with minimal
changes. The full paper discusses the details of the conversion of the
garbage-collector, including moving some of its supporting routines to
"microcode" (i.e., implementing them as new instructions in the software
emulator). The emulation of the data-sensitive memory operation is examined in
detail and several generations of refinement show how we were able to take
better and better advantage of our understanding of the Alpha instruction
set.
The full paper also discusses other aspects of the emulator design and some of
the lesser goals such as sharing data with other processes running under OSF.
We
built a prototype of the emulator in C, but it quickly became obvious that we
could not achieve the level of performance desired in C. Examination of code
emitted by the C compiler showed it took very poor advantage of the Alpha's
dual-issue capabilities. A second implementation was done in Alpha assembly
language and is the basis for the current product.
We built a number of tools (in Lisp, running on Genera) that supported the
level of complexity of the assembly language program we were attempting. A
translator was built that allowed us to use Lisp as a macro language. One of
the primary benefits of using Lisp was that we could use all our normal
development tools, including incremental patching, even though we were working
in another machine's assembly language. Even more beneficial, however, was
that early on in the project, we were able to easily graft on to the translator
a cycle-counting tool that allowed one to easily and automatically "preview"
any code fragment and see its total cycle cost, dual-issues that were taken or
missed, and any free stall slots. Because this tool was integrated directly
with the Genera editor, we were able to pro-actively optimize our code, right
from the start. The full paper describes this tool in more detail, with
examples, and compares it with tools that have recently become available from
DEC that attempt to automatically re-organize executable files. It is our
claim that our tool, because of its interactive nature, offers many more
opportunities for optimization.
The architecture of the emulator as designed and eventually implemented is
examined in detail in the full paper. Among the interesting details is the
substitution for the hardware-supported display of an X-window interface. This
came nearly for free, as earlier projects had already developed a compatibility
package that allowed Genera to use any X-server as a display. Similarly,
earlier hardware projects to create co-processor systems, where our custom
hardware would run inside another workstation, had already defined an
architecture that allowed us to easily substitute OSF disk and network services
for real hardware. One interesting aspect of this substitution, however, is
that the emulator process acts as an independent host on the network (it is
emulating a complete workstation) despite sharing underlying network services
with the Alpha workstation.
In
order to compare the performance of the emulator against our custom hardware,
we used the standard "Gabriel" Lisp benchmarks, plus our own "large program"
and "user interface" benchmarks. For the most part, the benchmarks gave
consistent results, but we were puzzled a number of times when changes that
clearly had a global effect of reducing overall cycle counts showed
inexplicably poorer benchmark results. After much puzzling and studying, we
eventually realized that the Alpha's direct-mapped cache had to be taken into
consideration. This resulted in an effort to organize our code, effectively
into micro-code "overlays", by ensuring that commonly called routines and their
siblings were placed contiguously, and that they would not collide with
themselves in the cache. Similarly, the data structures of the emulator were
reorganized and placed carefully in memory.
Version 1.0 of the emulator exceeded our initial performance goals, achieving
nearly the performance of our high-end custom workstation. It is currently in
use at a number of customer sites and being used internally for software
development work. A second version is underway, where we are attempting to
further exploit some of the lessons we learned in the first implementation.
It is our contention that although our Lisp code is interpreted, our performance may approach or even surpass that of compiled Lisp code on the Alpha because our interpreter is acting almost like a micro-program when it remains resident in the primary caches. The full paper examines some anecdotal evidence that supports this theory, gives some empirical data we have gathered that also supports this theory, and compares the idea with studies performed elsewhere showing the effect of cache-misses on high clock-rate CPU's. We speculate that as execution clock rates continue to increase (exceeding the speed of even first-level cache technology) we may come full-circle in computer architectures and return to a situation where micro-programming and tagged data are again valuable implementation techniques.
[1]Symbolics® and Genera® are
registered trademarks and Virtual Lisp Machine(TM)
and VLM(TM) are trademarks of Symbolics, Inc.
DEC® is a registered trademark and Alpha(TM) and
AXP(TM) are trademarks of Digital Equipment
Corporation.
[2]Author's electronic mail addresses are
ptw@Riverside. SCRC.Symbolics.COM, SWM@Stony-Brook.SCRC. Symbolics.COM,
Palter@Stony-Brook.SCRC.Symbolics. COM, and PRobertson@ACM.ORG, respectively.
Last Updated: 21-Nov-98 | Problems, questions to Webmaster | Copyright © 1998 Callitropetm |