Timing data comparing CClasp to C++, SBCL and Python

Work on CClasp (Clasp using Robert Strandh’s Cleavir compiler) is moving forward. Here is some timing data that I generated comparing CClasp performance to C++, SBCL and Python.

NOTE: This test exercises one specific algorithm that uses FIXNUM arithmetic. I have inlined the simple FIXNUM operations (+, -, <, =, >, and fixnump), so these operations are fast. Code that uses other functions will run a lot slower until inlining is implemented more broadly.



I’m calculating the 78th Fibonacci number 10,000,000 times in each case. For these integer-arithmetic-heavy functions, CClasp performs pretty well (~4x slower than C++). Once type inference and a few other optimizations are added, CClasp should generate performant code.

Note: There are compiler settings (loop unrolling) where the C++ code runs even faster than SBCL; it is only for this specific test, with the compiler settings below, that SBCL comes out a little faster than C++. I don’t want to start an argument about the speed of SBCL vs C++ here. My point is that CClasp has come a long way, from being hundreds of times slower than C++ to within a factor of 4.

Here is the C++ code. It converts the numbers back and forth from Common Lisp representations:

Integer_sp core_cxxFibn(Fixnum_sp reps, Fixnum_sp num) {
  long int freps = clasp_to_fixnum(reps);
  long int fnum = clasp_to_fixnum(num);
  long int p1, p2, z;
  for ( long int r = 0; r<freps; ++r ) {
    p1 = 1;
    p2 = 1;
    long int rnum = fnum - 2;
    for ( long int i=0; i<rnum; ++i ) {
      z = p1 + p2;
      p2 = p1;
      p1 = z;
    }
  }
  return Integer_O::create(z);
}

Here is the Common Lisp code:

(defun fibn (reps num)
  (declare (optimize speed (safety 0) (debug 0)))
  (let ((z 0))
    (declare (type (unsigned-byte 53) reps num z))
    (dotimes (r reps)
      (let* ((p1 1)
             (p2 1))
        (dotimes (i (- num 2))
          (setf z (+ p1 p2)
                p2 p1
                p1 z))))
    z))
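
The (unsigned-byte 53) declaration is only safe if every value stored in z stays below 2^53. As a quick sanity check, here is a small Python sketch that computes the same sequence and compares it against that bound. It is not part of the original benchmark, and the names fib and LIMIT are introduced here purely for illustration.

```python
def fib(n):
    """Return the n-th Fibonacci number, with fib(1) == fib(2) == 1."""
    p1, p2 = 1, 1
    z = 1
    # Same update order as the benchmark loops: num - 2 iterations.
    for _ in range(n - 2):
        z = p1 + p2
        p2 = p1
        p1 = z
    return z

# Exclusive upper bound of the Common Lisp type (unsigned-byte 53).
LIMIT = 2 ** 53

f78 = fib(78)
f79 = fib(79)
print(f78, f78 < LIMIT)  # 8944394323791464 True
print(f79, f79 < LIMIT)  # 14472334024676221 False
```

So the 78th Fibonacci number fits in 53 bits and the 79th does not, which makes num = 78 the largest input for which the declaration holds.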

Here is the Python code:

import time
def fibn(reps,num):
    for r in range(0,reps):
        p1 = 1
        p2 = 1
        rnum = num - 2
        for i in range(0,rnum):
            z = p1 + p2
            p2 = p1
            p1 = z
    return z
start = time.time()
res = fibn(10000000, 78)
end = time.time()
print( "Result = %d\n" % res)
print( "elapsed time: %f seconds\n" % (end-start))
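
A single wall-clock measurement like the one above can be noisy. As a sketch only (this is not how the numbers in this post were produced), one could repeat the measurement and keep the fastest run. The helper best_of and the reduced repetition count are assumptions made for illustration, and time.perf_counter needs Python 3.3 or later, whereas the post used Python 2.7.6.

```python
import time

def fibn(reps, num):
    """Compute the num-th Fibonacci number reps times, as in the benchmark."""
    z = 1
    for r in range(reps):
        p1 = 1
        p2 = 1
        for i in range(num - 2):
            z = p1 + p2
            p2 = p1
            p1 = z
    return z

def best_of(func, args, runs=3):
    """Hypothetical helper: call func(*args) `runs` times and
    return (last result, fastest elapsed time in seconds)."""
    best = float("inf")
    result = None
    for _ in range(runs):
        start = time.perf_counter()
        result = func(*args)
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    return result, best

# A much smaller repetition count than the post's 10,000,000,
# chosen only so that the sketch finishes quickly.
res, secs = best_of(fibn, (1000, 78))
print("Result = %d" % res)
print("best of 3 runs: %f seconds" % secs)
```

Taking the minimum rather than the mean is a common choice for micro-benchmarks, since interference from other processes can only make a run slower.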

More details.

CClasp version is 0.3-test-10

It was compiled using settings:
"clang++" -x c++ -O3 -gdwarf-4 -g -Wgnu-array-member-paren-init -Wno-attributes -Wno-deprecated-register -Wno-unused-variable -ferror-limit=999 -fvisibility=default -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.10.sdk -mmacosx-version-min=10.7 -std=c++11 -stdlib=libc++ -O3 -O3 -gdwarf-4 -g -O3 -Wno-inline -DBUILDING_CLASP -DCLASP_GIT_COMMIT=\"cf99526\" -DCLASP_VERSION=\"0.3-test-10\" -DCLBIND_DYNAMIC_LINK -DDEBUG_CL_SYMBOLS -DDEBUG_FLOW_CONTROL -DEXPAT -DINCLUDED_FROM_CLASP -DINHERITED_FROM_SRC -DNDEBUG -DPROGRAM_CANDO -DREADLINE -DTRACK_ALLOCATIONS -DUSE_BOEHM -DUSE_CLASP_DYNAMIC_CAST -DUSE_STATIC_ANALYZER_GLOBAL_SYMBOLS -D_ADDRESS_MODEL_64 -D_RELEASE_BUILD -D_TARGET_OS_DARWIN -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I"../../../../include" -I"../../../../projects/cando/include" -I"../../../../src" -I"../../include" -I"../../src/cffi" -I"../../src/core" -I"../../src/gctools" -I"../../src/llvmo" -I"../../src/main" -I"../../src/serveEvent" -I"../../src/sockets" -I"/Users/meister/Development/externals-clasp/build/common/include" -I"/Users/meister/Development/externals-clasp/build/release/include" -c -o "../../src/main/bin/boehm/cando/clang-darwin-4.2.1/release/link-static/main.o" "../../src/main/main.cc"

The C++ code is embedded within Clasp and is thus compiled with the same settings.

The Clang version is 3.6.1

The Python version is 2.7.6. I ran the Python code using: python fib.py

The SBCL version is: SBCL 1.2.11

These were run on a MacBook Pro (Retina, 15-inch, Early 2013)