(λ [blog] KING)   KING Is Not Genera, but it will be

defλ Part 1

Introduction

Let’s define some primitives for a preliminary lisp (and associate some symbols while we’re at it). We need:

In order to get there, we’re going to have to define some preliminaries:

Preliminaries: Types and Tags

We’re going to use tagged pointers in order to know what kind of thing (or type) our pointer is pointing to. In AArch64 architectures, bits 56-63 of a virtual address (a pointer) may be used for program specific data, and that is where we will put our tags. Right now we’re only going to define a few general categories:

Pointer Type        Pointer tag
general data        #b000xxxxx 
numerical values    #b001xxxxx 
displayable values  #b010xxxxx 
procedural data     #b100xxxxx 
symbolic data       #b101xxxxx 
structured data     #b110xxxxx 

We’ll really only deal with two of them today: structured data and displayable values.

Displayable Values: Booleans

What are displayable values? They’re values that aren’t numbers, evaluate to themselves, and can be displayed. This includes things like characters, strings, keywords, and booleans. It also includes things like images, videos, diagrams, and lots more.

We’ll define the pointer tag for booleans as follows:

Pointer Type     Pointer tag
boolean          #b01010001 

The actual value of the boolean will be encoded in the pointers least significant bit. Keeping with convention, we’ll define the values with 1 being true and 0 being false.

Structured Data: The Cons Pair

We’re going to start with one data structure today: the cons pair.

What’s a Cons Pair?

In any lisp, the core method of constructing data, the core primitive of all data structures, is the cons pair. See SICP Ch. 2.

(cons 'a 'b) ;;=> ('a . 'b)

It’s typically denoted with a period in between the two objects like so (<object-a> . <object-b>)

Typically, a cons pair can depicted in a box and pointer diagram like this1:

img

All a cons pair really is is just two pointers. So in memory we just have to put two things:

  1. The pointer to the first object.
  2. The pointer to the second object

Now it’s kind of a magical thing that you can use these pairs to generate lists, and trees, and numerical representations, and all sorts of crazy things! For instance, here’s how you represent lists:

img

It’s just:

(cons 1 (cons 2 (cons 3 (cons 4 nil))))

And that becomes:

'(1 2 3 4)

Amazing, right!?

Tagging a cons cell

Now, we’re going to tag the cons pair as such:

Pointer Type     Pointer tag
cons cell        #b11000000 

Procedural Data

Now why do we have to have a tag for procedural data in our pointers? Because we’re leaving the door open for an (optional) complication to lisp evaluation. Here’s an example:

This is what happens in most lisps when you do this:

(eval '(1 (+ 1 1) (+ 1 2))) => ERROR: SOMETHING SOMETHING '1' is not a function

And here is what I personally think SHOULD happen:

(eval '(1 (+ 1 1) (+ 1 2))) => '(1 2 3)

Now, a pointer to a general object, or to something that you treat as a literal, or can’t otherwise evaluate, will contain a 0 as the first significant bit (bit 0). As such, the proposed evaluation model would know not to attempt to apply it to a set of arguments. This is subject to change, and may not work, or may be useless. But we’ll stick with it for now.

Preliminaries: Functional Arguments and Procedure Calls

Let’s look at an actual example of the kind of thing we’d like to write:

(defλ some-fn
  [x0 x1 x2 x3 & x4]
  (some-stuff x1 x3)
  (compute-return-value x0 x4))

Becomes something like:

    .text
    .global some-fn
some-fn:
    /* <evaluation of arguments goes here> */

    /* store all the arguments on the stack*/
    stp x0, x1 [sp, #-16]!
    stp x2, x3 [sp, #-16]!
    str x4 [sp, #-16]!

    /* getting the arguments for some-stuff and calling it */
    ldr x0, [sp], #42         // get x1 (second arg)
    ldr x1, [sp], #24         // get x3 (fourth arg)
    bl some-stuff             // call some-stuff

    /* getting the arguments for compute-return-value and calling it */
    ldr x0, [sp], #48         // get x0 (first arg)
    ldr x1, [sp], #8         // get x4 (& args)
    bl compute-return-value   // call compute-return-value

    /* pointer to return value is placed in x0 by compute-return-value */

    ret /* return to point in x30 */

This resource was very helpful in understanding how one might handle procedure calls.

Caveat Hacker

I haven’t run any of the code in this post. We won’t really be able to until we get a REPL going. So, if you’re reading this, and see an error, please feel free to mention it in the github repo.

nil (∅) and true (⊤)

Boolean values, with values encoded in the pointer as described above. Defined as such:

 value      encoded  
 nil (∅)    0        
 true (⊤)   1        

cons (ζ)

Let’s start simple, the absolute simplest we can get: A cons pair.

A cons pair is a pair of pointers. That’s it. It takes 128 bits of memory, and we allocate it on the stack2. The procedure cons just returns the tagged pointer to this 128 bit span of memory. The tagged pointer itself is just the address of the pointers on the stack, with the appropriate memory tag.

So this:

(cons 'a 'b) ;;=> ('a . 'b)

    .text
    .global cons
cons:
 /* store the arguments on the stack*/
    stp x0, x1 [sp, #-16]!

 /* <evaluation of arguments goes here>
  We'll put the pointers to evaluated arguments
  in the same registers (x0, x1) as they came to us */

 /* Get the tagged pointer to our cons pair */
    add x0, [sp], #16

 /* Add our pointer tag. */
    mov x1, #b11000000
    lsl x1, #56
    add x0, x1

 /* set x1 to nil */
    mov x1, #b01010001
    lsl x1, #56

 /* return the pointer to our cons pair */
    ret

car (α)

The car is the first object in your cons-pair. Like so:

(car ('a . 'b)) ;;=> 'a

The procedure for car should return the pointer stored in the first 64 bits of the cons-pair’s memory address. Remember, this address is just the value of the pointer with a different tag.

But there are a couple edge cases here. The first is (car nil) which should return nil. The second is trying to call car on something that isn’t a cons pair, in which case we should throw an error. But since we don’t have an exception handling method yet, we’ll just return nil for now.

    .text
    .global car
car:
  /* <evaluation of arguments goes here>
    We'll put the pointers to evaluated arguments
    in the same register x0 as they came to us.
    This is also where we would check the tag of
    the pointer given to us in x0, to make sure it
    points to a cons pair.*/

 /* check the tag */
    lsr x1, x0 #56
    cmp x1, #b11000000

 /* set x1 to nil */
    mov x1, #b01010001
    lsl x1, #56

 /* load our prospective car from memory */
    ldr x0, x0

 /* if you tried to cons something that wasn't a pair,
    set x0 to nil */
    mov x0, x1

 /* return the pointer to the first value in the pair */
    ret

cdr is similar:

cdr (ω)

The cdr of a cons-pair is the second object stored in a cons-pair. Hence, cdr returns the second pointer stored in the 128 bits of a cons-pair in memory. So this:

(cdr ('a . 'b)) ;;=> 'b

Becomes this:

    .text
    .global cdr
cdr:
  /* <evaluation of arguments goes here>
    We'll put the pointers to evaluated arguments
    in the same register x0 as they came to us. */

 /* check the tag */
    lsr x1, x0 #56
    cmp x1, #b11000000

 /* set x1 to nil */
    mov x1, #b01010001
    lsl x1, #56

 /* load our prospective cdr from memory */
    ldr x0, x0, #8

 /* if you tried to cons something that wasn't a pair,
    set x0 to nil */
    mov x0, x1

 /* return the pointer to the second value in the pair */
    ret

As I said, similar.

cond (χ)

Cond is a procedure that takes a list of those functions that takes an unlimited number of unevaluated pairs of forms (just groups of two, not cons pairs), comprising of a conditional and a result. In our little, clojure-like dialect of lisp, it looks something like this:

(cond
 (condition1 args) (result1 args)
 (condition2 args) (result2 args)
 :else (default-result))

Now, we haven’t had a procedure that takes potentially more arguments than there are registers before, so we need to decide how we’re going to handle that. We’ll define it properly when we define apply (Λ). But for now we’ll state it like this:

Let’s try to implement it:

    .text
    .global cond
cond:
 /* store the stack pointer and return address */
    stp x29, x30, [sp, #-16]!

 /* Check if the argument list in x7 is nil. */
 /* set x6 to nil and compare with x7 */
    mov x6, #b01010001
    lsl x6, #56
    cmp x7, x6

 /* Set our return value to zero if the arglist is nil */
    moveq x2, #0  /* put a zero value in x2 for the
                     conditional to follow if args are nil */
    moveq x0, x6  /* put nil in x0 if the args are nil */

 /* Get the first conditional argument if the args are not nil */
    movnq x2, #1  /* put a nonzero value in x2 for the
                     conditional to follow if args not nil*/
    str x0, x7    /* put the pointer to our list in x0
                     so car knows the location of our argument
                     list.*/
    blnz x2, car  /* call car. the pointer to our first
                     conditional will come back in x0 */

 /* evaluation of the conditional in x0 goes here */

 /* Check if the evaluated conditional in x0 is truthy */
    cmp x0, x6    /* remember #4 is #b0010 which is nil */

 /* If truthy, set x2 to non-zero for following conditional branch,
    otherwise set x2 to zero.*/
    movnq x2, #1
    moveq x2, #0

 /* We need the cdr for the arglist regardless of whether our
    conditional was truthy, so let's get that */
    movnq x0, x7  /* store the pointer to our list in x0 */
    bl cdr    /* get the cdr to our argument list, returns in x0*/

 /* If conditional was truthy get the resultant form, which is the second
    item in our argument list. Returns in x0. */
    blnz x2, car

 /* If the conditional was not truthy, get the arglist minus the
    initial conditional and result form */
    blz x2, cdr

 /* If the conditional was not truthy, set the register for the
    arglist to the new arglist returned above*/
    movnq x7, x0

 /* Get back the stack pointer and return address */
    ldp x29, x30, [sp], #16

 /* if the conditional was not truthy, recur */
    bnz x2, cond

 /* evaluation of the return value goes here. It'll return in x0*/

 /* return to calling function */
    ret

Conclusion

Well, this concludes Part 1 of our definition of a preliminary lisp. In Part 2, we’ll define the rest of our primitives. In Part 3 we’ll actually try and get a REPL going and evaluate some code3!

Footnotes

1 (Thanks to Andres Raba for his version of SICP and for these figures!). Now, if we look at that, it becomes pretty clear what exactly that diagram actually represents:

2 We’re not going to worry about memory allocation right now, or heaps and stacks, (although Henry Baker has some interesting things to say on the matter). Why? Because it’s a big topic that I’m not ready to approach. I do have some ideas, and it has ramifications to what we’re going to do in this post, but suffice to say: where we’re going, we don’t need heaps!

3 And by evaluate some code, I mean find out just how glaringly wrong the code in Part 1 and 2 actually is!