Linking LLVM bitcode files for a dynamic language

Clasp Common Lisp is a dynamic language in which every top-level form needs to be evaluated in the top level environment. Clasp compiles Common Lisp code to bitcode files and then links them together into a shared library or an executable. When the library or the executable are loaded, each top-level form needs to be evaluated. This is until Clasp gains the ability to save a running environment to a file, which it doesn’t have yet. Even then, the ability to play back the top-level forms will be needed to create the environment to write to a file.

So the compiled bitcode files need to keep track of the top-level forms so that they can be played back at startup. Clasp does this by defining a “main” function with internal linkage (called “run-all”) for each llvm::Module. “run-all” evaluates every top-level form that was compiled into the llvm:Module. The tricky part is how does this main function get exposed to the outside world so that it can be called if a single bitcode file is loaded into clasp or linked together into a library or executable and invoked with other “run-all” functions from other modules.

Clasp creates a global variable in each module called “global-run-all-array” that stores an array of initially one function pointer that points to the module’s “run-all” function.  The “global-run-all-array” global variable is defined with “appending” linkage. What this does is when bitcode files get linked together by the system linker, the “global-run-all-array” will have all of the “run-all” functions appended together and put back into the “global-run-all-array” global variable.

Then there is the problem to determine the number of entries in the “global-run-all-array”.  Clasp solves that by ensuring that the last module linked in a list of modules has a two element “global-run-all-array” where the second element is NULL and by adding a second global variable called “global-epilogue”.

When a bitcode file or a shared library is loaded into Clasp, it checks for the “global-epilogue” symbol, if it finds it then it knows that the “global-run-all-array” contains a NULL terminated array of function pointers to call.   If “global-epilogue” is not present, then it knows that “global-run-all-array” contains a single function pointer.

Clasp then invokes each of the “global-run-all-array” functions one after the other and each one of them invokes the compiled functions for the top-level forms for each of the bitcode files.

This only takes a few seconds when starting up Clasp.

Note: I haven’t been actively blogging – because I’m very, very actively programming.  If you want to say hi, I’m on IRC, freenode.org #clasp almost every day.

Improving the Clasp programming experience

I’ve been working quietly to improve the Clasp programming experience by implementing a C++ scraper for Clasp that extracts all information from the C++ code required to expose that code to Common Lisp.

This means functions, classes, methods, symbols, enums and initialization functions are identified by the scraper and code is automatically generated to expose these entities to the Common Lisp programming environment.  This replaces thousands of calls that were hand coded by me and were called at startup to expose C++ functionality to Common Lisp with automatically generated code.  To expose a C++ function now all you do is some variation on this:

namespace ext {
CL_LAMBDA(x y &optional (z 0));
CL_DOCSTRING(R”doc(* Arguments
– x :: A fixnum
– y :: A fixnum
– z :: A fixnum
* Description
Add up to three fixnums together as long as they fit into an int.
If they don’t, an error will be signaled.)doc”);
CL_DEFUN int add_numbers(int x, int y, int z) {
return x + y;
}

The CL_LAMBDA(…), CL_DOCSTRING(…), and CL_DEFUN tags are read from the C++ source code by the scraper and code is generated to enable things like (ext:add-numbers 4 5) to be called from Common Lisp.  Above is a complicated example using the optional CL_LAMBDA(…) and CL_DOCSTRING(…) macros – something as simple as below is all that is necessary to bind the C++ function to the ext:add-two-numbers symbol:

CL_DEFUN int ext__add_two_numbers(int x, int y) {return x+y;}

The C-preprocessor is run on all of the source code with macros defined for tags like CL_DEFUN, CL_DEFMETHOD, CL_DOCSTRING(…), LISP_CLASS(…) etc. prior to the scraper searching for the tags. The scraper then generates code that is run at startup to expose everything. The scraper is run every time Clasp is built but it’s smart about only rebuilding generated code that needs to be rebuilt.

There are several benefits to this: (1) adding functions, classes, methods etc. to Clasp Common Lisp is easier now because you just add the entity and there are no more worries about where to add the call at startup to expose the entity. (2) The scraper builds a database of source location for all of these entities. So now, from Slime you can hit M-. on a Clasp symbol and emacs will jump to the source in the C++ source or Common Lisp source – wherever it is defined.

At the same time I’ve added code to extract and generate lambda-lists and docstrings for C++ functions and methods and they are also available now within the Common Lisp environment.

The scraper is written in standard Common Lisp and sbcl has been added as a build dependency for Clasp. The python scrapers Clasp used up to this point have all been retired and python is no longer a dependency for Clasp.

Why Common Lisp for Scientific Programming?

What makes any language suitable for scientific computing?

The most important goal is translating ideas into fast code and building on other people’s work. We are working on improving Clasp’s ability to generate fast code based on the excellent LLVM library and Clasp can expose C++/C/Fortran libraries to build on the work of others. I’ve programmed in many languages and Common Lisp is the best at expressing ideas. Every language gets translated into an abstract syntax tree on its way to native code, Lisp code is an abstract syntax tree.  There is no programming concept that can’t be expressed compactly in Lisp, this is not true in other languages.  You cannot yet express multiple dispatch functions, dynamic variables or Common Lisp style macros (a few CL features) compactly in R, Fortran, C++, or Python.

Why are R, Fortran, C++, or Python considered suitable for scientific computing?

It is the wrong question – those languages are just the languages that people started using and so they keep using them.  It is not a choice most people make – it is inertia.

I choose Common Lisp.

Clasp 0.4 – Joining Common Lisp and C++

Announcing Clasp 0.4 – a new release that incorporates a brand new compiler – capable of generating 200x faster code than previously, many bug fixes, a more complete Common Lisp implementation, and C++ interoperation.

Get Clasp 0.4 here

New features:

  • Clasp has a completely new, optimizing/inlining compiler called cclasp!
  • Fixnum, character and single-float types are immediate values.
  • General object pointers and cons pointers are tagged for speed.
  • Clbind library allows programmers to expose external C++ libraries.
  • Lots of bug fixes and stability improvements.

What is Clasp?

Clasp is a new Common Lisp implementation that seamlessly interoperates with C++ libraries using LLVM for compilation to native code. This allows Clasp to take advantage of a vast array of preexisting libraries and programs, such as out of the scientific computing ecosystem. Embedding them in a Common Lisp environment allows you to make use of rapid prototyping, incremental development, and other capabilities that make Common Lisp such a powerful language.

Getting Clasp

Get Clasp 0.4 here

Precompiled and prepackaged versions of Clasp will be available for a limited number of distributions. Check the releases to see if there is something available for you.

At the moment, Clasp is supported on Linux and Mac OS X. On these systems, you should be able to build it from source if a pre-made package is not available or workable for you. In case you cannot get it to compile even with the instructions below, the quickest way to get help is to either file an issue, or to chat with us directly.

Building on most systems will take around 4GB of RAM and 2-4 hours with a relatively modern processor.

Acknowledgements (#clasp IRC channel nicknames)

Robert Strandh (beach) – the Cleavir compiler and guidance.

Shinmera – build testing and organization.

stassats – guidance, bug finding and speeding up the compiler.

flash- – feedback and debugging of the clbind library.

SAL9000 – designing list iterator and feedback on ASTMatcher library.

faheem – guidance in setting up the build system.

Timing data comparing CClasp to C++, SBCL and Python

Work on CClasp (Clasp using Robert Strandh’s Cleavir compiler) is moving forward, here is some timing data that I generated comparing CClasp performance to C++, SBCL and Python.

NOTE: this test is a specific test of an algorithm that uses FIXNUM arithmetic. I have inlined simple FIXNUM arithmetic (+, -, <, =, >, and fixnump) and so these operations are fast. Code that uses other functions will run a lot slower until inlining is implemented more broadly.

 

timing

I’m calculating the 78th Fibonacci number 10,000,000 times in each case. For these integer arithmetic heavy functions, CClasp performs pretty well (~4x slower than C++). Once type inference is added as well as a few other optimizations CClasp should be generating performant code.

Note: There are compiler settings (loop unrolling) where the C code runs even faster than SBCL, it’s just for this specific test, with the compiler settings below that SBCL comes out a little faster than C++. I don’t want to start an argument about the speed of SBCL vs C++ here, my point is that CClasp has come a long way from being hundreds of times slower than C++ to within a factor of 4.

Here is the C++ code, it converts the numbers back and forth from Common Lisp representations:


Integer_sp core_cxxFibn(Fixnum_sp reps, Fixnum_sp num) {
  long int freps = clasp_to_fixnum(reps);
  long int fnum = clasp_to_fixnum(num);
  long int p1, p2, z;
  for ( long int r = 0; r<freps; ++r ) {
    p1 = 1;
    p2 = 1;
    long int rnum = fnum - 2;
    for ( long int i=0; i<rnum; ++i ) {
      z = p1 + p2;
      p2 = p1;
      p1 = z;
    }
  }
  return Integer_O::create(z);
}

Here is the Common Lisp code:


(defun fibn (reps num)
  (declare (optimize speed (safety 0) (debug 0)))
  (let ((z 0))
    (declare (type (unsigned-byte 53) reps num z))
    (dotimes (r reps)
      (let* ((p1 1)
             (p2 1))
        (dotimes (i (- num 2))
          (setf z (+ p1 p2)
                p2 p1
                p1 z))))
    z))

Here is the Python code:


import time
def fibn(reps,num):
    for r in range(0,reps):
        p1 = 1
        p2 = 1
        rnum = num - 2
        for i in range(0,rnum):
            z = p1 + p2
            p2 = p1
            p1 = z
    return z
start = time.time()
res = fibn(10000000, 78)
end = time.time()
print( "Result = %f\n", res)
print( "elapsed time: %f seconds\n" % (end-start))

More details.

CClasp version is 0.3-test-10

It was compiled using settings:
“clang++” -x c++ -O3 -gdwarf-4 -g -Wgnu-array-member-paren-init -Wno-attributes -Wno-deprecated-register -Wno-unused-variable -ferror-limit=999 -fvisibility=default -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.10.sdk -mmacosx-version-min=10.7 -std=c++11 -stdlib=libc++ -O3 -O3 -gdwarf-4 -g -O3 -Wno-inline -DBUILDING_CLASP -DCLASP_GIT_COMMIT=\”cf99526\” -DCLASP_VERSION=\”0.3-test-10\” -DCLBIND_DYNAMIC_LINK -DDEBUG_CL_SYMBOLS -DDEBUG_FLOW_CONTROL -DEXPAT -DINCLUDED_FROM_CLASP -DINHERITED_FROM_SRC -DNDEBUG -DPROGRAM_CANDO -DREADLINE -DTRACK_ALLOCATIONS -DUSE_BOEHM -DUSE_CLASP_DYNAMIC_CAST -DUSE_STATIC_ANALYZER_GLOBAL_SYMBOLS -D_ADDRESS_MODEL_64 -D_RELEASE_BUILD -D_TARGET_OS_DARWIN -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I”../../../../include” -I”../../../../projects/cando/include” -I”../../../../src” -I”../../include” -I”../../src/cffi” -I”../../src/core” -I”../../src/gctools” -I”../../src/llvmo” -I”../../src/main” -I”../../src/serveEvent” -I”../../src/sockets” -I”/Users/meister/Development/externals-clasp/build/common/include” -I”/Users/meister/Development/externals-clasp/build/release/include” -c -o “../../src/main/bin/boehm/cando/clang-darwin-4.2.1/release/link-static/main.o” “../../src/main/main.cc”

The C-code is embedded within Clasp and is thus compiled with the same settings.

The Clang version is 3.6.1

The Python version is 2.7.6. I ran the python code using: python fib.py

The SBCL version is: SBCL 1.2.11

These were run on a MacBook Pro (Retina, 15-inch, Early 2013)

I gave a talk on Clasp and my Chemistry at Google in Cambridge Mass last week

Here is the link for the talk.

This talk describes our unique approach to constructing large, atomically precise molecules (called “Molecular Lego” or “spiroligomers”) that could act as new therapeutics, new catalysts (molecules that make new chemical reactions happen faster) and ultimately to construct atomically precise molecular devices. Then I describe Clasp and CANDO, a new implementation of the powerful language Common Lisp. Clasp is a Common Lisp compiler that uses LLVM to generate fast machine code and it interoperates with C++. CANDO is a molecular design tool that uses Clasp as its programming language. Together I believe that these are the hardware (molecules) and the software (the CANDO/Clasp compiler) that will enable the development of sophisticated molecular nanotechnology.

For more info see: https://chem.cst.temple.edu/directory/faculty/schafmeister/

What a great place Google was! My host, Martin Cracauer was fantastic, he made me feel really, really welcome and made sure that the talk would be recorded and put up on the web. He arranged it so that I could spend the afternoon talking with him and Doug and James, two Lisp/compiler gurus at Google. He also gave me a tour of Google, it was great.

Tagged pointers and immediate fixnums, characters, and single-floats in Clasp

In preparation for making Clasp generate performant code, I’ve switched to using tagged pointers and immediate fixnums, characters and single-floats.  This is a common technique in dynamic language implementations where you represent small values directly within pointers rather than storing them on the heap. The tagging scheme in Clasp is as follows for 64-bit systems (32-bit is not supported at the moment).

A pointer provides 64-bits (63 … 0) to work with.

FIXNUM #b00 The other 62 bits are used to store a signed FIXNUM. The 0b0 bit allows certain arithmetic operations (addition, subtraction, comparison) to be very fast.
GENERAL #b001 In the lower three bits represents a pointer to an object on the heap.  The pointer is 8-byte aligned in memory.
CONS #b011 A pointer to a CONS cell. Robert Strandh suggested this so that Common Lisp code can easily distinguish CONS cells from non CONS cells and speed up list traversal.
VA-LIST #b101 A pointer to a wrapped va_list structure pointing to a list of arguments on the stack.
CHARACTER #b010 A character in a 32-bit value (>> 5 to unbox).
SINGLE-FLOAT #b110 Represents a single precision C/C++ float, a 32-bit value (>>5 to unbox).
GCTAG #b111 Headerless objects like CONS cells use this tag to indicate that the tagged word is used by the garbage collector
UNUSED #b010 Suggestions?

I’m especially interested in peoples thoughts on embedding something like a double-precision float within 61 bits. I’m 99.9(repeating)% sure it can’t be done but I’d love something like that. Arithmetic, Arrr! she be a harsh mistress.

All pointer access in Clasp is performed using a C++ template class called smart_ptr and it was relatively easy to modify this class to manage the tagged pointers. The untagging/tagging of OTHER-PTR and CONS-PTR pointers is carried out by overloading the C++ dereferencing operators: operator-> and operator*, of the smart_ptr template class.

It was a bit tricky to get these immediate values to play nice with C++ types and inheritance because the Clasp C++ code does not treat all Common Lisp types as a single undifferentiated pointer type. Clasp uses C++ types like T_sp, List_sp, Cons_sp, Float_sp, Symbol_sp, HashTable_sp etc. to distinguish different Common Lisp pointer types from each other within the C++ code. An assignment like:

Symbol_sp y(...);
T_sp x = y;

is valid and very common within the Clasp C++ code. But:

T_sp a(...);
...
Symbol_sp s = gc::As(a);

will signal a Common Lisp error if a doesn’t point to a SYMBOL at the time of the assignment to s.

To deal with this I defined a template class: TaggedCast<ToPtr,FromPtr> that provides an “isA” function and a “castOrNULL” function that can be specialized to simulate inheritance between classes that represent immediate values to the regular C++ class hierarchy.

How C++ interoperation works in Clasp “under the hood”

A “core::Functoid” in Clasp is a base class for an object that can be called from Common Lisp. It implements a method that is the calling convention from Common Lisp. It can wrap an environment and whatever it needs to call some code.
It is the base class of core::Closure which also wraps an environment.
I have core::InterpretedClosure that also wraps a List_sp which is an S-expression that is walked to interpret it.
I have CompiledClosure that wraps an LLVM function pointer that can be called.
I have BuiltInClosure that wraps a pointer to a C++ function that can be called.
There are also VariadicMethoid objects that call C++ class instance methods with different numbers of arguments.
In many cases these classes are generated at C++ compile time using template code that generates everything needed to convert Common Lisp types to C++ types and back again.  All that needs to be specified to expose a C++ function to Clasp Common Lisp is a string that becomes the name of the symbol that binds the C++ function and a pointer to the C++ function.  Since Common Lisp types in Clasp are all implemented as C++ types the overhead of these conversions should be pretty low. You can write from_object and to_object (template classes) translators that do the translation back and forth.

An update on Clasp

I’ve been really busy incorporating a new compiler into Clasp to eventually allow it to generate performant code. My goal is code that runs as fast as the best Common Lisp compilers out there (I’m looking at you Steel Bank Common Lisp! :-)).

I’m also incorporating a pointer tagging scheme within Clasp so that FIXNUM, CHARACTER, SHORT-FLOAT (32bit IEEE float) and SINGLE-FLOAT (a hacked 59bit pseudo double-precision representation) are all immediate values.  CONS cells will have their own tag as well to make iteration over Common Lisp LISTs as fast as possible.

Stay tuned.