Tagged pointers and immediate fixnums, characters, and single-floats in Clasp

In preparation for making Clasp generate performant code, I’ve switched to using tagged pointers and immediate fixnums, characters and single-floats.  This is a common technique in dynamic language implementations where you represent small values directly within pointers rather than storing them on the heap. The tagging scheme in Clasp is as follows for 64-bit systems (32-bit is not supported at the moment).

A pointer provides 64-bits (63 … 0) to work with.

FIXNUM #b00 The other 62 bits are used to store a signed FIXNUM. The 0b0 bit allows certain arithmetic operations (addition, subtraction, comparison) to be very fast.
GENERAL #b001 In the lower three bits represents a pointer to an object on the heap.  The pointer is 8-byte aligned in memory.
CONS #b011 A pointer to a CONS cell. Robert Strandh suggested this so that Common Lisp code can easily distinguish CONS cells from non CONS cells and speed up list traversal.
VA-LIST #b101 A pointer to a wrapped va_list structure pointing to a list of arguments on the stack.
CHARACTER #b010 A character in a 32-bit value (>> 5 to unbox).
SINGLE-FLOAT #b110 Represents a single precision C/C++ float, a 32-bit value (>>5 to unbox).
GCTAG #b111 Headerless objects like CONS cells use this tag to indicate that the tagged word is used by the garbage collector
UNUSED #b010 Suggestions?

I’m especially interested in peoples thoughts on embedding something like a double-precision float within 61 bits. I’m 99.9(repeating)% sure it can’t be done but I’d love something like that. Arithmetic, Arrr! she be a harsh mistress.

All pointer access in Clasp is performed using a C++ template class called smart_ptr and it was relatively easy to modify this class to manage the tagged pointers. The untagging/tagging of OTHER-PTR and CONS-PTR pointers is carried out by overloading the C++ dereferencing operators: operator-> and operator*, of the smart_ptr template class.

It was a bit tricky to get these immediate values to play nice with C++ types and inheritance because the Clasp C++ code does not treat all Common Lisp types as a single undifferentiated pointer type. Clasp uses C++ types like T_sp, List_sp, Cons_sp, Float_sp, Symbol_sp, HashTable_sp etc. to distinguish different Common Lisp pointer types from each other within the C++ code. An assignment like:

Symbol_sp y(...);
T_sp x = y;

is valid and very common within the Clasp C++ code. But:

T_sp a(...);
...
Symbol_sp s = gc::As(a);

will signal a Common Lisp error if a doesn’t point to a SYMBOL at the time of the assignment to s.

To deal with this I defined a template class: TaggedCast<ToPtr,FromPtr> that provides an “isA” function and a “castOrNULL” function that can be specialized to simulate inheritance between classes that represent immediate values to the regular C++ class hierarchy.

Advertisements

13 thoughts on “Tagged pointers and immediate fixnums, characters, and single-floats in Clasp

  1. Of course you cannot fit IEEE doubles in 61 bits, but you could allow some subset to be immediate values. The easiest subset would be floats whose exponent can fit in 8 bits. So if you changed your 101 tag to be “smallish double float”, you could store doubles that fit the pattern:

    s000eeeeeeeefffff…42 more…fffff

    as:

    seeeeeeeefffff…42 more…fffff101

    (where s is the sign bit, e is an exponent bit, and f is a fraction bit)

    That would make converting between representations pretty terrible, though. You could store those same floats as

    eeeeeeeefffff…42 more…fffffs101

    which would make conversion back to double floats just a 3-bit shift and 1-bit rotate.

    Whether this immediate representation would be of any use depends of course why you chose doubles over singles: if it’s for the extra precision, then it should help keep things as immediate values; if it’s for the extra range, then it won’t help one bit (pun intended).

  2. We talked briefly about nanboxing at Google NY. Here’s what I had in mind:

    It’s true that if you insist on 63-bit fixnums, you can’t also nanbox. But CL doesn’t really require such large fixnums: the constraint is that a fixnum must be able to index the largest possible array, and arrays can’t in practice exceed 2^47 bytes on 64-bit systems. So if we trim back fixnums to that range, we see that all your types above fit in 48 bits with no problem.

    Now the range of NaNs consists of the 64-bit values where the most significant 13 bits are all 1, and the remaining bits are not all zero. (Technically, these are just the signaling NaNs with negative signs, but no matter.) In your tagged pointers above, the most significant 13 bits are all 0. In order to align these, you have to transform all double values so they avoid this space. The simplest, though not necessarily the fastest, approach is to complement all the bits of a double. Alternatively, you could add or subtract a well-chosen constant to a double value to transform it. Then you simply need to add a test for being a transformed double, namely that the top 13 bits are not all 0, before doing the low-bit type tests.

    And voila: doubles no longer need to be boxed, and all your above low tagging is preserved intact.

    • Thank you very much! It will take me a little while to work through your analysis. I could drop FIXNUMs down to 48 (or so) bits. I’m still missing how to (or if it’s possible) to combine this with my other pointer tagging scheme that uses the lower two bits. It sounds like I have to check the upper bits first to determine if something is a double (or not) and then if not check the lower two bits.

      • I’ve been thinking about your NAN-boxing suggestion. A couple of other thoughts/questions came up. I’m thinking about implementing it as an option because for scientific programming it would be very valuable to have immediate doubles. Thoughts/questions: (1) Would NAN boxing would increase the cost of untagging pointers? Currently it’s free using instructions with offsets. Clasp would have to mask out the NAN bits before dereferencing every pointer. (2) Would NAN boxing increase the cost of untagging fixnums? Currently some fixnum operations don’t require untagging (addition, comparison). Those are the tradeoffs I think I see, cheaper double precision values would mean more expensive pointers and fixnums. That may be a good trade if double precision performance is important.

  3. You arrange the doubles to that the NAN bits are all zero. That means all your pointers are exactly the same as they are now. No changes at all. Pointers remain pointers, and you don’t have to mask anything, the same with fixnums.

    What you do have to do is make sure those bits are zero before you do a type test. If they aren’t zero, you have a double. And before you do arithmetic on doubles, you have to change the top few bits around, but that’s still faster than keeping doubles boxed.

  4. Symbols, conses and NIL are all problematic in Common Lisp. For historical compatibility, Common Lisp requires that (CDR NIL) == (CAR NIL) == NIL and also that NIL is a symbol whose value is NIL and also that NIL is boolean falsehood. This makes it difficult to optimize certain common operations, especially CAR and CDR. In ancient times, Lisp’s NIL value was typically represented using machine zero (all bits off), making NULL and NOT tests very fast. In many cases, zero was also stored at the first two locations in memory, thus making CAR and CDR a single indirection without a test. In Lucid Common Lisp NIL was tagged immediately as a symbol, but we placed it in memory at the beginning of an array of “systemic quantities” then stored that value in a dedicated register. Obviously you can’t do that in a mixed C++/Lisp world where you don’t have complete control of the runtime. In such an environment it probably makes the most sense to create an immediate value for NIL. One possibility is to use -1 (all bits on), which might enable a fast compare and branch. In your chart UNUSED2 could be used for NIL as well as other constant values, although it’s hard to think of anything other than numbers and characters and NIL that could potentially have an immediate representation.

    Dedicating one or more tags to directly indexable objects, such as simple vectors and defstructs and even symbols can help performance, since it allows testing and indirection to be done inline without a large number of instructions.

    This is bringing back some old memories of going through this design process 36 years ago, when we already had a 25 year history of Lisp implementations.

    • Thank you, that is interesting insight. Based on many discussions with people in the #lisp IRC channel (Robert Strandh gets special mention) I decided on having a dedicated tag for pointers to cons cells and another tag for pointers to general objects. The tagging scheme is described here: https://drmeister.wordpress.com/2015/05/16/tagged-pointers-and-immediate-fixnums-characters-and-single-floats-in-clasp/. The idea being that traversal of a list could then just check the tag to determine if the next element was a cons cell. If it isn’t, only then is a test for NIL (a symbol with no special pointer value) evaluated. The tagging scheme is implemented using C++ template programming and can be easily modified.

      • Having a special immediate value for NIL would make a NULL test faster, saving a load, at the expense of making SYMBOLP and symbol access functions a little slower. It all depends on the benchmark to know whether this is the correct tradeoff. My instinct is that it is worthwhile because of the ubiquity of NIL and especially because of its role as boolean false. Almost all symbol access is to constant, non-null symbols, thus no need for a null test in that case.

  5. A few years ago I spent a little time thinking about how to do 60-bit floats in a 64-bit CL implementation. I’m not at all sure it’s impossible. My approach was to use bits from the significand, not the exponent (so, losing precision, not range). This just requires stripping the tag bits before every operation — trivial — and rounding afterwards (unless you get a NaN). What I concluded at the time — and you should definitely double-check this very carefully if you want to go this way — was that correct round-to-even behavior could be implemented as follows: if the low-order 5 bits of the result are #x18, add (as integer) 8, else 7; then replace the low-order 4 bits with the correct tag.

    If I’m correct that this works, it’s almost astonishing. The only possible conclusion is that the format was carefully designed to make rounding this easy. The really fun part is that if the number is normalized and the high-order bits of the significand are all 1, the overflow will reach the exponent, which is exactly what you want! And it works on denormals too! — That is, again, IF I understand correctly. I might be wrong, but I encourage you to work through the various cases and see.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s