I Officially Like C# Extension Methods
Thursday, May 15th, 2008I officially like C# extension methods. They can be used to create traits. That’s it, just thought I’d share this with you.
I officially like C# extension methods. They can be used to create traits. That’s it, just thought I’d share this with you.
Here are some of the things I have been up to:
I have been thinking a lot about how to extend the Cat type system so that I can handle objects. The big challenge is to do this in a way that doesn’t add any significant complexity to the language. This is particularly hard to do for a language as small and tight as Cat.
Previously I was thinking about starting with tuples. Not only are tuples nice on their own, but the type of an object is essentially a tuple. There is one simple way to add tuples, which is to represent them as dynamically constructed functions. E.g. ( -> ( -> int string)) which would be a tuple containing an int and a string. The challenge occurs when I try to get values in and out of that function? I want something like: “get0″, “get1″, etc. which gets the value from the appropriate slot. This is trivial to code, but the problem is that I can’t express the type of “getN” in a general enough manner. Consider
get0 : (( -> ‘A ‘b) -> ‘b)
get1 : (( -> ‘A ‘b ‘c) -> ‘b)
get2 : (( -> ‘A ‘b ‘c ‘d) -> ‘b)
…
This works fine, but we can only write so many of these (perhaps a dozen or so) before they start to become unmanageable. The problem then becomes that we would have to limit the number of field accessors, and hence the number of fields to a relatively small and arbitrary number. That is extremely limited, and nearly useless in practice when we can easily imagine objects with hundreds of fields or more. It will also put a horrible strain on the type reconstruction and checking algorithms.
Sidebar: I will probably limit the number of fields to 255 anyway for several reasons:
- there has to be some stated upper limit
- too many fields represents a horrible flaw in the design of OO software
- it is is trivial to extend the 255 limit using aggregation (putting objects in objects)
- Cat is designed to run on small memory systems, so being able to represent the index of a field with a single byte is appealing.
There is another subtle problem (or benefit depending on your point of view) and that is that this approach automatically provides structural subtyping. In other words two types that are structured the same will always be treated as exactly the same type. While this is useful in some circumstances, it is not good to paint ourselves in to such a corner. It is very important for software modularity and correctness to be able to construct two different types with similar structure when needed.Â
The following is an entirely different approach that I am now seriously considering using. First I will approach the problem of objects and use those to build tuples. Â
object IntString { x : int; y : string; }
6 “the answer is ” IntString.new x.get 7 * x.set
y.get write x.get writeln
So what is the type of the constructors, setters, and getters? Well I propose that the types are:
IntString.new : (int string -> IntString)
x.get : (’o -> ‘o ‘o.x)
x.set : (’o ‘o.x -> ‘o)
y.get : (’o -> ‘o ‘o.y)
y.set : (’o ‘o.y -> ‘o)
So what has happened is that the type of ‘o.x is treated as an unknown variable, that is expanded once ‘o is known. Internally the type reconstruction algorithm simply has to create an object to represent the IntString, and track it.
So far I am very pleased with this approach, because the overall changes to the type system are very minimal, and the everything looks like it will work fine.
The next step is to allow fields to be added incrementally. I am thinking that the syntax may be something like:
30 “answer” IntString.newÂ
12 z.add
x.get z.get +
So the question is what the type will be off “z.add”? Perhaps something like:
z.add : (’o ‘x -> ‘o+z)
This ‘o+z just indicates that the result is a new type with a field named “z”. The type checker will have to manage the type internally. So to make this a fully compatible duck-typing system, such as the kind found in prototype based object-oriented languages like JavaScript, the requirement is simply that any object can be used where any other object is used, and that if a field is requested whose type is not known at compile-time, we return a value of type “any”, with the special value “undefined” used to indicate failure to find the field.Â
Cat version 1.0 beta 4 was just released, download available here. New features inlude:
Other recent news includes:
About CatÂ
Cat is a stack-based programming language that is distinct in that it allows functions to be constructed and pushed on to the stack in a type-safe manner. Cat has potential applications in computer science education, and as an intermediate language for optimization, verificatino, program transformation, and automated translation.
This is to announce that I have released version 1.2 of the online Cat interpreter. There are some bug fixes (e.g. “dup” now properly copies lists) and I have introduced primitive graphics capabilities, along with a new tool-tip help text to the various predefined instructions. Have fun, and feel free to reuse the source code, it is public domain!
I finally just finished integrating type checking back into the Cat compiler. This is only available in the version of Cat most recently checked into the SVN repository so feel free to pull it out: http://code.google.com/p/cat-language/source/checkout.
So Cat supports “self” types again which are references to enclosing functions. They are relatively rare in functions, but one example is the m combinator, defined as “dup apply” which has the type:
m : (’A (’A self -> ‘B) -> ‘B).
Anyway, I can’t sum up how happy I am to have this working with a minimum amount of pain.
This is an announcement of a new release of the Cat programming language. Recent changes include:
Cat is a functional stack-based language based on Joy. Cat is designed for several different purposes, but one of my key intentions was for education. Stack based languages can be very interesting for teaching computer science concepts because a student can see each step of a computation, and better understand how a computer operates.
In an effort to make Cat a suitable language for teaching programming I wanted to recapture the spirit of the Logo language in Cat. I found that when learning to program it was not only fun to write drawing programs, but that seeing the results of any mistakes I made helped me to understand what went wrong, and learn faster.
To do this I have included a turtle graphics library. The following screenshot is a demonstration of Cat in action:
The program that generated this is:
>> ow [20 tr pu 40 fd pd 40 draw_triangle] 20 repeat
These instructions are:
If you are interested in using Cat to teach programming, please let me know if I can help in any way. I’m very interested in trying to make it easier to teach programming to people.
I have been talking lately on my blog and at Lambda-the-Ultimate.org about how a while loop that modifies local values can be expressed as the composition of a sequence of pure functions (e.g. without side-effects) by writing it in a stack-based language. To demonstrate this I have been using the following Java example:
int process(int[] xs) { int zeros = 0; int sum = 0; int i=0; while (i < xs.Size) { if (xs[i] == 0) zeros++; sum += xs[i]; ++i; } return sum - zeros; }
Now I have heard frequently that the above code is considered to have side-effects because of the local assignments to variables. However, the mapping to pure-functional stack-based code (here shown in Cat) is very straightforward:
define process { // top argument = xs 0 // zeros 0 // sum 0 // i [ dig3 // bring xs to top of local stack dupd swap get_at eqz // xs[i] == 0 [[inc] dip3] // inc zeros [] if dupd swap get_at // tmp = xs[i] [bury3] dip // put xs to bottom of local stack swap [add_int] dip // sum += tmp inc // ++i ] [ dig3 count [bury3] dip // tmp = xs.Size dupd lt_int // i < tmp ] while pop // remove i swap sub_int // sum-zeros }
What I think is much more interesting is that we can rewrite the above code very succinctly using a fold instruction as follows:
define process {
0 swap 0 [[[inc] dip2] ifeqz +] fold swap sub_int
}
What happens in the code example is we place a zero below the list argument (representing the xs variable) and the fold function happily accesses it (increments it to be specific) without violating type-safety and remaining entirely free of side-effects.
The Cat version of fold is different than that of foldl in Haskell because it accepts a tail-polymorphic argument (as opposed to simply a binary function). In other words the function argument can accept any number of arguments (say N + 1), as long as it returns N results whose types match those of the first N arguments. In Cat the type is expressed as:
fold : ('A list 'b ('A 'b 'c -> 'A 'b) -> 'A 'b)
Type variable names starting with capital letters (e.g. ‘A) are type-vector variables and refer to “the rest of the stack”. These kinds of type variables are sometimes called “row variables”, and this kind of polymorphism is called “row polymorphism”. I view the ability to return multiple results as the primary advantage of a stack language.
What happens in the code example above is we place a zero below the list argument that represents the xs variable and the fold function happily accesses it (increments it to be specific) during each iteration without violating type-safety, and all the while remaining entirely side-effect free. The reason it is not a side-effect is because the type of the fold function is automatically widened to encompass the fact that it accesses an additional value on the stack.
Now I have been looking for a way to do this in Haskell, and I can’t find a way to do it in a general manner unless I package all local variables in a list, and have the fold function return all local arguments as a list. This is a rather laborious step, and is not practical for a compiler. Cat seems to be a win here, because it closely resembles the final assembly code that is to be generated.
I am hoping that maybe some of the Haskell hackers out there may rise to the challenge of providing a simple implementation in Haskell in the very possible case that I am mistaken.Â
In a functional stack-based langauge, it is quite natural to handle local assignment with destroying referential transparency. The following is from a recent post I made to Lambda-the-Ultimate.org in order to ascertain the wider thinking on the problem.
Okay, this might seem like an odd question, but is there contention on whether or not a while loop that performs assignment on local function variables is pure functional? For example the following Java code:
int process(int[] xs) {int zeros = 0;int sum = 0;int i=0;while (i < xs.Size) {if (xs[i] == 0)zeros++;sum += xs[i];++i;}return sum - zeros;}Now I have heard frequently that the above code is considered to have side-effects because of the local assignments to variables. However, the mapping to pure-functional stack-based code (here shown in Cat) is very straightforward:
define process {// top argument = xs0 // zeros0 // sum0 // i[dig3 // bring xs to top of local stackdupd swap get_at eqz // xs[i] == 0[[inc] dip3] // inc zeros[]ifdupd swap get_at // tmp = xs[i][bury3] dip // put xs to bottom of local stackswap [add_int] dip // sum += tmpinc // ++i][dig3 count [bury3] dip // tmp = xs.Sizedupd lt_int // i < tmp]whilepop // remove iswap sub_int // sum-zeros}So, my problem is that I am not sure whether this is old news or new news. Any feedback would be appreciated.
For those interested, the type of “while” in Cat is (’A (’A -> ‘A) (’A -> ‘A bool) -> ‘A). In other words, while takes a stack with two functions on the top, the first one is a predicate and the next one is the loop body. If the predicate returns true, the loop body is executed, and the process is repeated. The loop body takes the rest of the stack as an argument, and returns it in the same state. The predicate is the same, but in addition pushes a boolean value.
If you really feel like geeking out then we can define while from core primitives using the “m” recursive combinator (m == dup apply) as follows:
define m_while : ('A ('A -> 'A) ('A -> 'A bool) -> 'A)
{{
 desc:
 A while function written using the m (a.k.a. U) combinator
 test:
 in: 1 5 [[2 mul_int] dip dec] [is_neqz] m_while pop
 out: 32
}}
{
 // [$A] [$B]
 [dip swap] papply // [$A] [[$B] dip swap]
 swap // [[$B] dip swap] [$A]
 [dip m] papply // [[$B] dip swap] [[$A] swap m]
 quote compose // [[$B] dip swap [[$A] dip m]]
 [[pop] if] compose // [[$B] dip swap [[$A] dip m] [pop] if]
 m
}
To be completely forthright there is a small bug in the “m_while” code. The actual inferred type is “(’A (’B -> ‘C) (’A -> ‘B bool) -> ‘B)”. This is because my type-inference engine can not currently handle recursive types properly (it used to, honest!). This is just a question of finding the time to sit down and fix it, at least theoretically there are no hurdles preventing one from performing inference of equi-recursive types.
Cat is my functional stack-based language based on Joy. One of the interesting characteristics Cat shares with Joy is an incredible succinctness, which rivals that of array-oriented languages such as K or Q. A demonstration of this is how list reversal can be achieved with just three instructions:
nil [cons]Â foldÂ
You would use it like this:
>>Â [1 2 3 4] list nil [cons] foldÂ
stack: (4Â 3 2 1)
One of the advantages of stack-based languages is that they are executed left to right. This means we can enter each instruction on separate lines which is useful for pedgogical purposes because we can see the stack after each instruction.
The first thing that happens is that we push a function literal (called a quotation) on to the stack:
>> [1 2 3 4]
stack: [1 2 3 4]
This pushes a function on the stack that if executed would push the values 1,2,3 and 4. Next we call the “list” instruction
>> list
stack: (1 2 3 4)
This converts a quotation into a list. We can convert any function into a list as long as it doesn’t pop any values off of the stack. The resulting list contains the values that would be generated if we executed the function on any empty stack.
The next thing we do is push an empty list on the stack using the “nil” instruction.
>>Â nil
stack: (1 2 3 4) ()
Now we push another quotation containing the “cons” instruction on the stack.
>> [cons]
stack: (1 2 3 4) () [cons]
The “cons” instruction is similar to the Lisp/Scheme instruction that appends a value to a list. For example “[1 2 3] list 4 cons” is the same as “[1 2 3 4] list”. You may notice then that the front of a list is the rightmost value. This is in keeping with the postfix notation of Cat.
The final instruction is the “fold” instruction.
>> fold
stack: (4 3 2 1)
The Cat “fold” instruction, which is similar to the “foldl” function in Haskell, takes a binary function off of the stack, an initial accumulator value, and a list. The binary function is applied successively to each element in the list and the accumulated value. On each iteration the accumulated value is updated with the result of applying the function. Another example of using the fold function is when summing the values in a list (e.g. “[1 2 3 4] list 0 [+] fold”).
In future posts I will go into more detail about the Cat “fold” function, how it is constructed, and how it differs from the Haskell “foldl” function. For now I hope you will download the interpreter.