Why Common Lisp for Scientific Programming?

What makes any language suitable for scientific computing?

The most important goal is translating ideas into fast code and building on other people’s work. We are working on improving Clasp’s ability to generate fast code based on the excellent LLVM library and Clasp can expose C++/C/Fortran libraries to build on the work of others. I’ve programmed in many languages and Common Lisp is the best at expressing ideas. Every language gets translated into an abstract syntax tree on its way to native code, Lisp code is an abstract syntax tree.  There is no programming concept that can’t be expressed compactly in Lisp, this is not true in other languages.  You cannot yet express multiple dispatch functions, dynamic variables or Common Lisp style macros (a few CL features) compactly in R, Fortran, C++, or Python.

Why are R, Fortran, C++, or Python considered suitable for scientific computing?

It is the wrong question – those languages are just the languages that people started using and so they keep using them.  It is not a choice most people make – it is inertia.

I choose Common Lisp.

Advertisements

Clasp 0.4 – Joining Common Lisp and C++

Announcing Clasp 0.4 – a new release that incorporates a brand new compiler – capable of generating 200x faster code than previously, many bug fixes, a more complete Common Lisp implementation, and C++ interoperation.

Get Clasp 0.4 here

New features:

  • Clasp has a completely new, optimizing/inlining compiler called cclasp!
  • Fixnum, character and single-float types are immediate values.
  • General object pointers and cons pointers are tagged for speed.
  • Clbind library allows programmers to expose external C++ libraries.
  • Lots of bug fixes and stability improvements.

What is Clasp?

Clasp is a new Common Lisp implementation that seamlessly interoperates with C++ libraries using LLVM for compilation to native code. This allows Clasp to take advantage of a vast array of preexisting libraries and programs, such as out of the scientific computing ecosystem. Embedding them in a Common Lisp environment allows you to make use of rapid prototyping, incremental development, and other capabilities that make Common Lisp such a powerful language.

Getting Clasp

Get Clasp 0.4 here

Precompiled and prepackaged versions of Clasp will be available for a limited number of distributions. Check the releases to see if there is something available for you.

At the moment, Clasp is supported on Linux and Mac OS X. On these systems, you should be able to build it from source if a pre-made package is not available or workable for you. In case you cannot get it to compile even with the instructions below, the quickest way to get help is to either file an issue, or to chat with us directly.

Building on most systems will take around 4GB of RAM and 2-4 hours with a relatively modern processor.

Acknowledgements (#clasp IRC channel nicknames)

Robert Strandh (beach) – the Cleavir compiler and guidance.

Shinmera – build testing and organization.

stassats – guidance, bug finding and speeding up the compiler.

flash- – feedback and debugging of the clbind library.

SAL9000 – designing list iterator and feedback on ASTMatcher library.

faheem – guidance in setting up the build system.

Timing data comparing CClasp to C++, SBCL and Python

Work on CClasp (Clasp using Robert Strandh’s Cleavir compiler) is moving forward, here is some timing data that I generated comparing CClasp performance to C++, SBCL and Python.

NOTE: this test is a specific test of an algorithm that uses FIXNUM arithmetic. I have inlined simple FIXNUM arithmetic (+, -, <, =, >, and fixnump) and so these operations are fast. Code that uses other functions will run a lot slower until inlining is implemented more broadly.

 

timing

I’m calculating the 78th Fibonacci number 10,000,000 times in each case. For these integer arithmetic heavy functions, CClasp performs pretty well (~4x slower than C++). Once type inference is added as well as a few other optimizations CClasp should be generating performant code.

Note: There are compiler settings (loop unrolling) where the C code runs even faster than SBCL, it’s just for this specific test, with the compiler settings below that SBCL comes out a little faster than C++. I don’t want to start an argument about the speed of SBCL vs C++ here, my point is that CClasp has come a long way from being hundreds of times slower than C++ to within a factor of 4.

Here is the C++ code, it converts the numbers back and forth from Common Lisp representations:


Integer_sp core_cxxFibn(Fixnum_sp reps, Fixnum_sp num) {
  long int freps = clasp_to_fixnum(reps);
  long int fnum = clasp_to_fixnum(num);
  long int p1, p2, z;
  for ( long int r = 0; r<freps; ++r ) {
    p1 = 1;
    p2 = 1;
    long int rnum = fnum - 2;
    for ( long int i=0; i<rnum; ++i ) {
      z = p1 + p2;
      p2 = p1;
      p1 = z;
    }
  }
  return Integer_O::create(z);
}

Here is the Common Lisp code:


(defun fibn (reps num)
  (declare (optimize speed (safety 0) (debug 0)))
  (let ((z 0))
    (declare (type (unsigned-byte 53) reps num z))
    (dotimes (r reps)
      (let* ((p1 1)
             (p2 1))
        (dotimes (i (- num 2))
          (setf z (+ p1 p2)
                p2 p1
                p1 z))))
    z))

Here is the Python code:


import time
def fibn(reps,num):
    for r in range(0,reps):
        p1 = 1
        p2 = 1
        rnum = num - 2
        for i in range(0,rnum):
            z = p1 + p2
            p2 = p1
            p1 = z
    return z
start = time.time()
res = fibn(10000000, 78)
end = time.time()
print( "Result = %f\n", res)
print( "elapsed time: %f seconds\n" % (end-start))

More details.

CClasp version is 0.3-test-10

It was compiled using settings:
“clang++” -x c++ -O3 -gdwarf-4 -g -Wgnu-array-member-paren-init -Wno-attributes -Wno-deprecated-register -Wno-unused-variable -ferror-limit=999 -fvisibility=default -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.10.sdk -mmacosx-version-min=10.7 -std=c++11 -stdlib=libc++ -O3 -O3 -gdwarf-4 -g -O3 -Wno-inline -DBUILDING_CLASP -DCLASP_GIT_COMMIT=\”cf99526\” -DCLASP_VERSION=\”0.3-test-10\” -DCLBIND_DYNAMIC_LINK -DDEBUG_CL_SYMBOLS -DDEBUG_FLOW_CONTROL -DEXPAT -DINCLUDED_FROM_CLASP -DINHERITED_FROM_SRC -DNDEBUG -DPROGRAM_CANDO -DREADLINE -DTRACK_ALLOCATIONS -DUSE_BOEHM -DUSE_CLASP_DYNAMIC_CAST -DUSE_STATIC_ANALYZER_GLOBAL_SYMBOLS -D_ADDRESS_MODEL_64 -D_RELEASE_BUILD -D_TARGET_OS_DARWIN -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I”../../../../include” -I”../../../../projects/cando/include” -I”../../../../src” -I”../../include” -I”../../src/cffi” -I”../../src/core” -I”../../src/gctools” -I”../../src/llvmo” -I”../../src/main” -I”../../src/serveEvent” -I”../../src/sockets” -I”/Users/meister/Development/externals-clasp/build/common/include” -I”/Users/meister/Development/externals-clasp/build/release/include” -c -o “../../src/main/bin/boehm/cando/clang-darwin-4.2.1/release/link-static/main.o” “../../src/main/main.cc”

The C-code is embedded within Clasp and is thus compiled with the same settings.

The Clang version is 3.6.1

The Python version is 2.7.6. I ran the python code using: python fib.py

The SBCL version is: SBCL 1.2.11

These were run on a MacBook Pro (Retina, 15-inch, Early 2013)

I gave a talk on Clasp and my Chemistry at Google in Cambridge Mass last week

Here is the link for the talk.

This talk describes our unique approach to constructing large, atomically precise molecules (called “Molecular Lego” or “spiroligomers”) that could act as new therapeutics, new catalysts (molecules that make new chemical reactions happen faster) and ultimately to construct atomically precise molecular devices. Then I describe Clasp and CANDO, a new implementation of the powerful language Common Lisp. Clasp is a Common Lisp compiler that uses LLVM to generate fast machine code and it interoperates with C++. CANDO is a molecular design tool that uses Clasp as its programming language. Together I believe that these are the hardware (molecules) and the software (the CANDO/Clasp compiler) that will enable the development of sophisticated molecular nanotechnology.

For more info see: https://chem.cst.temple.edu/directory/faculty/schafmeister/

What a great place Google was! My host, Martin Cracauer was fantastic, he made me feel really, really welcome and made sure that the talk would be recorded and put up on the web. He arranged it so that I could spend the afternoon talking with him and Doug and James, two Lisp/compiler gurus at Google. He also gave me a tour of Google, it was great.

Tagged pointers and immediate fixnums, characters, and single-floats in Clasp

In preparation for making Clasp generate performant code, I’ve switched to using tagged pointers and immediate fixnums, characters and single-floats.  This is a common technique in dynamic language implementations where you represent small values directly within pointers rather than storing them on the heap. The tagging scheme in Clasp is as follows for 64-bit systems (32-bit is not supported at the moment).

A pointer provides 64-bits (63 … 0) to work with.

FIXNUM #b00 The other 62 bits are used to store a signed FIXNUM. The 0b0 bit allows certain arithmetic operations (addition, subtraction, comparison) to be very fast.
GENERAL #b001 In the lower three bits represents a pointer to an object on the heap.  The pointer is 8-byte aligned in memory.
CONS #b011 A pointer to a CONS cell. Robert Strandh suggested this so that Common Lisp code can easily distinguish CONS cells from non CONS cells and speed up list traversal.
VA-LIST #b101 A pointer to a wrapped va_list structure pointing to a list of arguments on the stack.
CHARACTER #b010 A character in a 32-bit value (>> 5 to unbox).
SINGLE-FLOAT #b110 Represents a single precision C/C++ float, a 32-bit value (>>5 to unbox).
GCTAG #b111 Headerless objects like CONS cells use this tag to indicate that the tagged word is used by the garbage collector
UNUSED #b010 Suggestions?

I’m especially interested in peoples thoughts on embedding something like a double-precision float within 61 bits. I’m 99.9(repeating)% sure it can’t be done but I’d love something like that. Arithmetic, Arrr! she be a harsh mistress.

All pointer access in Clasp is performed using a C++ template class called smart_ptr and it was relatively easy to modify this class to manage the tagged pointers. The untagging/tagging of OTHER-PTR and CONS-PTR pointers is carried out by overloading the C++ dereferencing operators: operator-> and operator*, of the smart_ptr template class.

It was a bit tricky to get these immediate values to play nice with C++ types and inheritance because the Clasp C++ code does not treat all Common Lisp types as a single undifferentiated pointer type. Clasp uses C++ types like T_sp, List_sp, Cons_sp, Float_sp, Symbol_sp, HashTable_sp etc. to distinguish different Common Lisp pointer types from each other within the C++ code. An assignment like:

Symbol_sp y(...);
T_sp x = y;

is valid and very common within the Clasp C++ code. But:

T_sp a(...);
...
Symbol_sp s = gc::As(a);

will signal a Common Lisp error if a doesn’t point to a SYMBOL at the time of the assignment to s.

To deal with this I defined a template class: TaggedCast<ToPtr,FromPtr> that provides an “isA” function and a “castOrNULL” function that can be specialized to simulate inheritance between classes that represent immediate values to the regular C++ class hierarchy.

How C++ interoperation works in Clasp “under the hood”

A “core::Functoid” in Clasp is a base class for an object that can be called from Common Lisp. It implements a method that is the calling convention from Common Lisp. It can wrap an environment and whatever it needs to call some code.
It is the base class of core::Closure which also wraps an environment.
I have core::InterpretedClosure that also wraps a List_sp which is an S-expression that is walked to interpret it.
I have CompiledClosure that wraps an LLVM function pointer that can be called.
I have BuiltInClosure that wraps a pointer to a C++ function that can be called.
There are also VariadicMethoid objects that call C++ class instance methods with different numbers of arguments.
In many cases these classes are generated at C++ compile time using template code that generates everything needed to convert Common Lisp types to C++ types and back again.  All that needs to be specified to expose a C++ function to Clasp Common Lisp is a string that becomes the name of the symbol that binds the C++ function and a pointer to the C++ function.  Since Common Lisp types in Clasp are all implemented as C++ types the overhead of these conversions should be pretty low. You can write from_object and to_object (template classes) translators that do the translation back and forth.