Tagged pointers and immediate fixnums, characters, and single-floats in Clasp

In preparation for making Clasp generate performant code, I’ve switched to using tagged pointers and immediate fixnums, characters and single-floats.  This is a common technique in dynamic language implementations where you represent small values directly within pointers rather than storing them on the heap. The tagging scheme in Clasp is as follows for 64-bit systems (32-bit is not supported at the moment).

A pointer provides 64-bits (63 … 0) to work with.

FIXNUM #b00 The other 62 bits are used to store a signed FIXNUM. The 0b0 bit allows certain arithmetic operations (addition, subtraction, comparison) to be very fast.
GENERAL #b001 In the lower three bits represents a pointer to an object on the heap.  The pointer is 8-byte aligned in memory.
CONS #b011 A pointer to a CONS cell. Robert Strandh suggested this so that Common Lisp code can easily distinguish CONS cells from non CONS cells and speed up list traversal.
VA-LIST #b101 A pointer to a wrapped va_list structure pointing to a list of arguments on the stack.
CHARACTER #b010 A character in a 32-bit value (>> 5 to unbox).
SINGLE-FLOAT #b110 Represents a single precision C/C++ float, a 32-bit value (>>5 to unbox).
GCTAG #b111 Headerless objects like CONS cells use this tag to indicate that the tagged word is used by the garbage collector
UNUSED #b010 Suggestions?

I’m especially interested in peoples thoughts on embedding something like a double-precision float within 61 bits. I’m 99.9(repeating)% sure it can’t be done but I’d love something like that. Arithmetic, Arrr! she be a harsh mistress.

All pointer access in Clasp is performed using a C++ template class called smart_ptr and it was relatively easy to modify this class to manage the tagged pointers. The untagging/tagging of OTHER-PTR and CONS-PTR pointers is carried out by overloading the C++ dereferencing operators: operator-> and operator*, of the smart_ptr template class.

It was a bit tricky to get these immediate values to play nice with C++ types and inheritance because the Clasp C++ code does not treat all Common Lisp types as a single undifferentiated pointer type. Clasp uses C++ types like T_sp, List_sp, Cons_sp, Float_sp, Symbol_sp, HashTable_sp etc. to distinguish different Common Lisp pointer types from each other within the C++ code. An assignment like:

Symbol_sp y(...);
T_sp x = y;

is valid and very common within the Clasp C++ code. But:

T_sp a(...);
...
Symbol_sp s = gc::As(a);

will signal a Common Lisp error if a doesn’t point to a SYMBOL at the time of the assignment to s.

To deal with this I defined a template class: TaggedCast<ToPtr,FromPtr> that provides an “isA” function and a “castOrNULL” function that can be specialized to simulate inheritance between classes that represent immediate values to the regular C++ class hierarchy.

Advertisements

How C++ interoperation works in Clasp “under the hood”

A “core::Functoid” in Clasp is a base class for an object that can be called from Common Lisp. It implements a method that is the calling convention from Common Lisp. It can wrap an environment and whatever it needs to call some code.
It is the base class of core::Closure which also wraps an environment.
I have core::InterpretedClosure that also wraps a List_sp which is an S-expression that is walked to interpret it.
I have CompiledClosure that wraps an LLVM function pointer that can be called.
I have BuiltInClosure that wraps a pointer to a C++ function that can be called.
There are also VariadicMethoid objects that call C++ class instance methods with different numbers of arguments.
In many cases these classes are generated at C++ compile time using template code that generates everything needed to convert Common Lisp types to C++ types and back again.  All that needs to be specified to expose a C++ function to Clasp Common Lisp is a string that becomes the name of the symbol that binds the C++ function and a pointer to the C++ function.  Since Common Lisp types in Clasp are all implemented as C++ types the overhead of these conversions should be pretty low. You can write from_object and to_object (template classes) translators that do the translation back and forth.

An update on Clasp

I’ve been really busy incorporating a new compiler into Clasp to eventually allow it to generate performant code. My goal is code that runs as fast as the best Common Lisp compilers out there (I’m looking at you Steel Bank Common Lisp! :-)).

I’m also incorporating a pointer tagging scheme within Clasp so that FIXNUM, CHARACTER, SHORT-FLOAT (32bit IEEE float) and SINGLE-FLOAT (a hacked 59bit pseudo double-precision representation) are all immediate values.  CONS cells will have their own tag as well to make iteration over Common Lisp LISTs as fast as possible.

Stay tuned.

Release 0.2 of clasp is available

I uploaded a new release of Clasp today (January 25th, 2015) that brings a lot of stability improvements, improved performance, improved compatibility with the Common Lisp standard and ASDF, SLIME and Quicklisp support.

It requires a reinstallation of externals-clasp.  We are working on eliminating the need for externals-clasp.

Features include:

  1. ASDF support has been added – it can be accessed using (require :asdf)
  2. SLIME support has been added and the changes to SLIME have been uploaded to the standard SLIME repository.  You should be able to access it the standard way you install and upgrade slime.
  3. Quicklisp support has been added.  I’ve submitted the changes to quicklisp to Zach Bean (the developer of quicklisp) and they should be available soon.
  4. Improvements in performance of the compiled code (Linux code is at least 2x faster).
  5. Improvements in stability.
  6. Almost all Common Lisp symbols have been implemented (18 left, you can see them using (core:calculate-missing-common-lisp-symbols).
  7. Example code for Clasp/C++ interoperation is available.
  8. Weak-key-hash-tables, weak-pointer, weak-key-mapping types have been implemented in both garbage collectors.

This release focused on getting SLIME working because I want to incorporate a new compiler front end for Clasp and I want to do it in this extremely powerful programming environment.

Debugging with the Clasp debugger

Clasp provides two ways of debugging code. In interactive sessions Clasp invokes a built in Common Lisp debugger when errors or other exceptional situations arise. The Clasp compiler also generates DWARF debugging information that can be used by the GDB debugger (and hopefully soon the LLDB debugger) to display Clasp Common Lisp source information interleaved with C++ source information.

To see this start up clasp and type what follows the > prompt:

Top level.
> (defun c () (break "In c"))

C
> (defun b () (c))

B
> (defun a () (b))

A
> (a)

Condition of type: SIMPLE-CONDITION
In c

Available restarts:
(use :r1 to invoke restart 1)

1. (CONTINUE) Return from BREAK.
2. (RESTART-TOPLEVEL) Go back to Top-Level REPL.

Broken at frame[14] CORE::REP.
 File: #<CORE:SOURCE-FILE-INFO #P"/Users/meister/Development/clasp/src/lisp/kernel/lsp/top.lsp"> (Position #573)
>> 

The double prompt >> indicates that we are now in the Clasp Common Lisp debugger. This debugger is inherited from ECL (because Clasp uses the excellent Common Lisp source code from ECL). To get a list of commands that are available in the debugger type:


>> :h

Top level commands:
:cf		Compile file.
:exit or ^D	Exit Lisp.
:ld		Load file.
:step		Single step form.
:tr(ace)	Trace function.
:untr(ace)	Untrace function.
:pwd	Print the current value of *default-pathname-defaults*.
:cd	Change the current value of *default-pathname-defaults*.

Help commands:
:apropos	Apropos.
:doc(ument)	Document.
:h(elp) or ?	Help.  Type ":help help" for more information.

Break commands:
:q(uit)		Return to some previous break level.
:pop		Pop to previous break level.
:c(ontinue)	Continue execution.
:b(acktrace)	Print backtrace.
:f(unction)	Show current function.
:p(revious)	Go to previous function.
:d(own)         Alias to :previous.
:n(ext)		Go to next function.
:u(p)           Alias to :next.
:g(o)		Go to next function.
:fs             Search forward for function.
:bs             Search backward for function.
:disassemble	Disassemble current function.
:l(ambda-)e(expression)	Show lisp code for current function.
:v(ariables)	Show local variables, functions, blocks, and tags.
:hide		Hide function.
:unhide		Unhide function.
:hp		Hide package.
:unhp		Unhide package.
:unhide-all     Unhide all variables and packages.
:bds            Show binding stack.
:frs            Show frame stack.
:m(essage)      Show error message.
:hs		Help stack.
:i(nspect)      Inspect value of local variable.

Restart commands:
:r1             Return from BREAK. (CONTINUE).
:r2             Go back to Top-Level REPL. (RESTART-TOPLEVEL).
>> 


Clasp/ECL use Common Lisp keywords to activate debugger functionality.

To generate a backtrace type:


>> :b

--------STACK TRACE--------
   frame#  0toplevel         epilogueForm     0/0   REPL
   frame#  1/c              top.lsp   419/2   CORE::TOP-LEVEL
   frame#  2/c              top.lsp   615/21  CORE::TPL
   frame#  3/c              top.lsp   605/32  CORE::REP
   frame#  4/b         evaluator.cc  2353/0   CORE:TOP-LEVEL-EVAL-WITH-ENV
   frame#  5/b         evaluator.cc  2351/0   CORE:COMPILE-FORM-AND-EVAL-WITH-ENV
   frame#  6/c            -no-file-     1/0   nil
   frame#  7/c            -no-file-     1/0   A
   frame#  8/c            -no-file-     1/0   B
   frame#  9/c            -no-file-     1/0   C
   frame# 10/c       conditions.lsp   457/8   COMMON-LISP:BREAK
   frame# 11/c              top.lsp  1507/9   COMMON-LISP:INVOKE-DEBUGGER
   frame# 12/c              top.lsp  1489/5   CORE::DEFAULT-DEBUGGER
   frame# 13/c              top.lsp   618/7   CORE::TPL
-->frame# 14/c              top.lsp   605/32  CORE::REP
   frame# 15/b         evaluator.cc  2353/0   CORE:TOP-LEVEL-EVAL-WITH-ENV
   frame# 16/b         evaluator.cc  2351/0   CORE:COMPILE-FORM-AND-EVAL-WITH-ENV
   frame# 17/c            -no-file-     0/0   nil
   frame# 18/c              top.lsp  1088/3   CORE::TPL-BACKTRACE
   frame# 19/b            stacks.cc   712/0   CORE:IHS-BACKTRACE

NIL
>> 

The —-> indicates the current frame that the debugger has stopped on. Since the error handling code and the debugger functions are all written in Common Lisp, those functions also appear on the backtrace. The functions we entered are in frames 7, 8, and 9.

At this point we could go to a specific frame using :g and view the environment of that frame using :v or we can print variables by just typing their names.

For now we will just leave the debugger and return to the top level REPL by invoking a restart.


>> :r2

> 

Now we are back in the top level REPL and can continue working.

Next I’ll show you how to use the DWARF generated debugging information embedded in compiled Common Lisp code to debug Clasp using GDB or LLDB.

Things you can do with Clasp #1

Now that people are starting to build Clasp I thought I would say a few things about what you can do with it.

Clasp is an implementation of Common Lisp (80-90% complete) and I plan to make it an ANSI compliant Common Lisp as soon as possible. I also plan to add a faster compiler and expose some powerful C++ libraries to extend its capabilities.

But here are a few things you can do right away to get a sense of what Clasp does and to test some of its capabilities. Code that you type will (look-like-this). Don’t worry about how what you see isn’t colored or highlighted as shown here, I’m exploring different ways of rendering program output in this blog.

(defun add (x y) (+ x y))

ADD

(add 3 5)

8

(disassemble 'add)

Disassembling function: #<COMMON-LISP:COMPILED-FUNCTION CORE:ADD>

"#<LLVM-SYS::FUNCTION 
define internal void @ADD({ {}*, i32 }* %result-ptr, { {}* }* %closed-af-ptr, i32 %num-varargs, {}* %farg0, {}* %farg1, {}* %farg2, i8* %va-list) {
\"(TRY-0).entry\":
  %lambda-args-42- = alloca { {}* }
  call void @newAFsp({ {}* }* %lambda-args-42-)
  %temp = alloca { {}* }
  call void @newTsp({ {}* }* %temp)
  %0 = alloca { {}* }
  call void @newTsp({ {}* }* %0)
  call void @makeValueFrame({ {}* }* %lambda-args-42-, i32 2, i32 2000058)
  call void @setParentOfActivationFrame({ {}* }* %lambda-args-42-, { {}* }* %closed-af-ptr)
  %correct-num-args = icmp eq i32 %num-varargs, 2
  br i1 %correct-num-args, label %\"(TRY-0).continue3\", label %\"(TRY-0).error\"

\"(TRY-0).error\":                                  ; preds = %\"(TRY-0).entry\"
  %enough-args = icmp slt i32 %num-varargs, 2
  br i1 %enough-args, label %\"(TRY-0).error1\", label %\"(TRY-0).continue\"
...

That gobbledy-gook is pure LLVM-IR code generated by the Clasp Common Lisp compiler and using the fantastic LLVM library. It is exactly the same intermediate language that the Clang compiler converts C++/C/Objective-C into before it lowers it to machine code. Clasp shares the same LLVM backend library with the Clang compiler and once Clasp generates LLVM-IR, that is as efficient as that generated by Clang, then Clasp will generate code that runs as fast as C++ and C!

For some more information on Common Lisp you can check out the following links or use Google. I haven’t tested the tutorial against Clasp but almost everything should work.

For a tutorial:
http://en.wikibooks.org/wiki/Common_Lisp/First_steps/Beginner_tutorial

For beginners:
http://www.amazon.com/Land-Lisp-Learn-Program-Game/dp/1593272812/ref=sr_1_1?s=books&ie=UTF8&qid=1412084665&sr=1-1&keywords=land+of+lisp

I’ll post more later.

How Clasp compiles itself

This is the process that Clasp uses to compile itself from the repository. I wrote it stream-of-consciousness in response to a question on IRC and I thought I should document this for later.

Clasp starts up running with a slow S-expression walking interpreter.
It loads clasp/src/lisp/kernel/init.lsp. Within init.lsp is a list of modules that it loads to build everything. The list is in the variable *init-files* which contains a list of symbols that are translated to file pathnames by the system. The list also includes keywords like :base and :cmp – these denote way-points in the build process. Clasp then loads files from :start to :cmp into the interpreter. That loads in the Common Lisp code for the Clasp compiler. Then Clasp COMPILE-FILE(s) each source file from :start to :cmp and after compiling each file it loads the new bitcode file that is generated.
It’s slow to start but once the compiler starts compiling the files between :pre-cmp and :cmp it gets faster and faster as interpreted functions are replaced by compiled functions. Then it loads cmp/cmprepl (these files are all in src/lisp/kernel/lsp/* and src/lisp/kernel/cmp/*)
cmp/cmprepl installs a function that automatically compiles every form passed to EVAL before it evaluates it. Then it loads the files from :cmp to :min and then COMPILE-FILE(s) them. Then Clasp links all the bitcode files together with a bitcode file generated from src/llvmo/intrinsics.cc. So C++ code gets inlined into the compiled Lisp code. Then it generates min-boehm:image.bundle this is a minimal common lisp without CLOS. Then Clasp starts compiling the full-boehm version. It boots with the min-boehm:image.bundle and LOADs everything from :base to :all. Then it COMPILE-FILE(s) everything from :base to :all.
Then it links all of that together with intrinsics.cc and some epilogue code to start up the REPL.

Whew – that’s the build system.