Type of literals and constructing hash-tables at compile-time.

So what is the type of a constant literal such as 42? Common wisdom says it should be an integer.

Well the thing is that there is no reason for 42 not to have its own unique type which is a subtype of an integer. This kind of approach provides you with a richer type system, but without any significant additional complexity. I simply assign a new type name to every constant literal used in a program. Extending the idea to strings and collections, you can imagine a compiler that constructs hash-tables at compile-time. This is in fact how I am building the Cat object system.

Where is all of this going? Well its up to other people to take this in different directions. I’m quite busy working on the language implementation and on a book project. One interesting research direction would be to write a type-checked and optimized JavaScript by converting JavaScript to Cat. Let me know if this interests you, and you’d like my help.

3 Responses to “Type of literals and constructing hash-tables at compile-time.”

  1. Tom Cahalan Says:

    I had this same idea for the language that I am designing! I didn’t want modifiers mucking up my syntax and so I ended up with the easiest way to make a constant would be to make them subtypes. This also leads to a prototype approach to subclasses. You see if you have your traditional Person class with a Student subtype:

    Person p;
    Student s;

    p=s; //legal
    s=p; //not legal

    int i;
    42 n;
    42 m;
    i=n; //legal
    n=1; //not legal
    n=m; //legal

    However while the idea is cool it didn’t really do much for my language so I cut quickly.

  2. cdiggins Says:

    Hi Tom,

    You didn’t find that the approach made your type system more powerful? In general it should enable certain kinds of metaprogramming. The programmer can use constant values as compile time flags.

    Tell me more about your language. I am always interested in new language designs.

  3. Tom Cahalan Says:

    Well you see it would all have to be done at run time, rather than compiletime, unless

    the compiler uses what it knows and figures stuff out. You see I wasn’t going to

    actually have constants, just variables that act like constants. For example you have

    type A and you declare B:

    A B; //B is a variable of type A and/or a subtype of A

    WHich is it? Subtype or variable? Depends on perspective. So then if you wanted to

    treat B as a type you could do this:

    B b1; B b2; B b3

    You can assign bx to by because they are all on the same level. You can assign any of

    the b’s to B, and it would slice off anything extra. You would be unable to assign B to

    the b’s, because the b’s might have extra members that you would be unable to fill.

    So that would lead to some crazy stuff. Remember type/variable A? Well what if you

    assign C to A? Now B has changed type/supertype!

    How would this be implemented? I’m thinking that every object would have a pointer to a

    parent and a hashtable for its local members. There would be an add_member function.

    Object Person;
    Person.add_member(string name);
    Person Joe;
    Joe.name=”Joe”;
    Person Teacher;
    Teacher.add_member(int grade_level);
    Teacher.add_member(string subject_taught);
    Teacher Bob;
    Bob const_bob;

    So now you can see that you can assign const_bob to Bob, but not vice-versa. You could

    assign Teacher to Joe and vice-versa but that would be bad, very bad. Something like

    this would be interesting but is not at all the direction that I wanted to go.

    So anyway my language is designed to be a lot like C++ but with simpler syntax and some new features: tuples, first class functions, and multiple dispatch.

    so I came up with my syntax first (it used to be with the identifier first, but that was

    harder to parse)

    int i; //int
    int@ ip; //int pointer
    int->int f; //function taking an int and returning an int
    (int,int)->(int,int) g; //function taking a tuple of ints and returing two ints

    I am still not sure how I want to do templates or macros but I know that I will add them

    soon and that I want them to be tuple friendly. Thus I needed a way to package a tuple

    into a singular thing (object, type, etc…). So if you make an array (array will be a container in the library, not a built in metatype) of int:

    array(int) a;

    you need to be able to do something like this:

    array(int,float,int) a;

    but then you aren’t providing the same number of elements to the template. That made me

    introduce boxing:

    array([int,float,int) b;

    oh and the parenthesis are uneccessary, because they only affect precedence. They have

    no other meaning. So if all templates assume boxed arguments you would write:

    array[int] a;
    array[int,float,int] b;

    You see when you call functions like this:

    f(x)

    it is no different than:

    f x

    because there is an implied adjacency operator. If two things are next to each other

    there is assumed to be an invisible operator between them. So in the syntax tree there

    would be an adj node with f as its left node and x as its right.

    You see everything is an expression. There are no statements like in C++. The

    semicolon is used as an operator. Unlike most operators which can evaluate their

    arguments in either order, it always evaluates its left and then its right, and then it

    returns its right argument.

    So this is legal (but ill-advised code):

    sin(DoSomething();pi)

    first it does something, then it passes pi to the sine function. The semicolon is kind

    of a complement to the comma (tuple) operator. The comma evaluates its arguments in

    either order and then returns both as a tuple. So you could write code like this:

    f(); g(); h()

    so call them in order, or like this:

    f(), g(), h()

    and allow the compiler to call them in any order. I may introduce the ,, operator to

    allow for simulataneous execution:

    f(),, g(),, h()

    but that is in the future if ever. By combining , ; and ( ) to control grouping you can

    create a lot of different execution graphs. The notion of which statement comes before

    another gets tricky. For any given statement there will be some which come before it,

    some which come after, and some which don’t come before or after. I figured out some

    rules for it to work. So you can’t use a value unless it comes before, but you can make a pointer or declare a variable.

    The main reason that ( ) is used only for precedence is that every operator has one meaning. You don’t need anywhere near as much context to figure things out as you would in C++. This also meant that I had to give up prefix operators. Only infix, outfix (the () [] and {} operators) and postfix.

    Making functions first class objects has a lot of implications. For example you decide to make a global variable as follows:

    (int numerator, int denominator) -> (int quotient, int remainder) DivMod
    {
    quotient=numerator/denomination,
    remainder=numerator%denominator
    }

    Ok so far so good, you have a function of type (int,int)->(int,int)
    Now what if you want a member of a class C to have a member of type (int,int)->(int,int), what then?
    Maybe you want that member to be a method, with access to a “this” pointer. Maybe you don’t, maybe you want to be able to assign your DivMod function to it. What if you want the function to be a method, and then you want to be able to assign it to a global variable. Should it be able to access its object? Should it be able to reseat itself into another object of the same type? Too many issues. I probably could have solved them well enough, but I just decided to eliminate the “this” pointer and so no more methods. If a class has functions as members those members have no special status. Afterall, if you have a Person type and he has four Legs, those Legs don’t automatically know about the Person that they are a part of. They might have a reference or a pointer, but only if the programmer provides it. So I got rid of the “this” pointer. With multiple dispatch you aren’t even treating the “this” special anyway.

    Now I wanted to be able to return functions, have them access local variables, etc… As I’m sure that you are aware this causes problems when your variables are on the stack. I didn’t want to throw everything on the heap and go with a garbage collected language either. That was departing from C++ too much. So all variables are on the stack, but scopes are on the heap. Whenever a function is called a secret scope node is created. It has three members: a parent pointer, a frame pointer, and a reference count. Like so:

    ()->() A
    {
    int x;
    x=4;

    ()->( (int)->(int) c ) B
    {
    c=(int y)->(int z){z=y*x};

    };

    int->int b;
    C = B();

    print C(5); //prints 20
    x=6;
    print C(5); //prints 30
    }

    if X is an function object, and X’ is a call to X, and X” is the node created by the prologue of X’

    when A is called it has a node object A”. A” contains a frame pointer to the locals of A. B is a local variable accessable by an offset from that pointer. B has two private members: a pointer to some x86 code and a pointer to A”. When B is called the invocation is B’. It creates a node B”. B” gets its parent pointer from object B. Thus B’ can use A’ variables. B’ creates variable C, a function object. One of the members of C is the pointer to B”. When B’ returns and assigns c to b, the stack for B’ is cleaned up but B’ remains because C has a reference to it. When C is called it too has a node, and it can access x by following the chain of nodes. It does not capture the x, it can get to it. This is kind of like static chains but not on the stack.

    Oh and you can leave the type off of functions, at least if they take no arguments. For example: {5} is a function of type: ()->(int) that always returns 5. {doThis();thenThat()} is of type ()->(), assuming that thenThat returns void. A block with no return type specified returns the value of the internal expression. I am thinking of adding in arguments too, like this: {$1(int)+$2(int)} which would be the same as: (int x, int y)->(int z){z=x+y}. However that is a lot of work so it is likely to be in a later release.

    Lately I have been thinking about making boxes of variable declarations into class literals, so you could do this:

    [int x, int y] p;
    p.x=5;

    So [int,int] would be a type literal but not a class literal, and there would be some sort of implicit conversion

    so:

    p=[6,7] would work, but you couldn’t do this:

    [int,int] q;
    q=p; // [int x, int y] is not converatble to [int,int]

    But you could throw the unboxing operator in there, and then rebox it:

    q=[p`];

    As for progress, first I wrote the tokenizer, then the parser. The parser was only 650 lines of C++. I found a great program called graphviz so I can see the parse trees too. Then I wrote something to order things by when they are temporarlly (remember the ; and , operators). Then I started writting x86 so that I would know what code I had to generate but I had some trouble. I was getting frustrated so I am taking a year off. This will give me time to think over how I want to do templates, Also that class literal idea was in my head but I hadn’t thought of it yet so it was bothering me. So right now I am making a video game with C++. I will start back on my language next father’s day (since that is when I decided to take a year off).

Leave a Reply

You must be logged in to post a comment.