Expressing Local Assignment without Side-Effects using Fold (can you do this in Haskell?)
I have been talking lately on my blog and at Lambda-the-Ultimate.org about how a while loop that modifies local values can be expressed as the composition of a sequence of pure functions (i.e. functions without side-effects) by writing it in a stack-based language. To demonstrate this I have been using the following Java example:
int process(int[] xs) {
    int zeros = 0;
    int sum = 0;
    int i = 0;
    while (i < xs.length) {
        if (xs[i] == 0) zeros++;
        sum += xs[i];
        ++i;
    }
    return sum - zeros;
}
Now I have heard frequently that the above code is considered to have side-effects because of the local assignments to variables. However, the mapping to pure-functional stack-based code (here shown in Cat) is very straightforward:
define process {
    // top argument = xs
    0 // zeros
    0 // sum
    0 // i
    [
        dig3 // bring xs to top of local stack
        dupd swap get_at eqz // xs[i] == 0
        [[inc] dip3] // inc zeros
        [] if
        dupd swap get_at // tmp = xs[i]
        [bury3] dip // put xs to bottom of local stack
        swap [add_int] dip // sum += tmp
        inc // ++i
    ]
    [
        dig3 count [bury3] dip // tmp = xs.Size
        dupd lt_int // i < tmp
    ]
    while
    pop // remove i
    swap sub_int // sum-zeros
}
What I think is much more interesting is that we can rewrite the above code very succinctly using a fold instruction as follows:
define process {
0 swap 0 [[[inc] dip2] ifeqz +] fold swap sub_int
}
What happens in the code example is that we place a zero (the zeros counter) below the list argument (which represents the xs variable). The function passed to fold happily accesses that zero, incrementing it whenever an element equals zero, without violating type-safety and while remaining entirely free of side-effects.
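For readers more at home in Java than in Cat, here is a rough sketch of the same idea: the two "local variables" are threaded through a left fold as an immutable value instead of being assigned to. The names FoldProcess, Acc, foldLeft and foldFrom are made up for this illustration and are not part of Cat or of any library. Unlike Cat, Java has no implicit stack, so the sketch has to package both values in a small class, and the recursive helper is only suitable for short arrays because Java does not eliminate tail calls.

```java
import java.util.function.BiFunction;

class FoldProcess {
    // Immutable stand-in for the two locals (zeros, sum); hypothetical helper type.
    static final class Acc {
        final int zeros;
        final int sum;
        Acc(int zeros, int sum) { this.zeros = zeros; this.sum = sum; }
    }

    // Hypothetical left fold over an int array: no variable is ever reassigned.
    static <A> A foldLeft(int[] xs, A init, BiFunction<A, Integer, A> step) {
        return foldFrom(xs, 0, init, step);
    }

    // Recursive worker: each call receives the next index and the next accumulator.
    private static <A> A foldFrom(int[] xs, int i, A acc, BiFunction<A, Integer, A> step) {
        return i == xs.length
            ? acc
            : foldFrom(xs, i + 1, step.apply(acc, xs[i]), step);
    }

    // Same result as the while-loop version: sum of the elements minus the number of zeros.
    static int process(int[] xs) {
        Acc r = foldLeft(xs, new Acc(0, 0),
            (acc, x) -> new Acc(x == 0 ? acc.zeros + 1 : acc.zeros, acc.sum + x));
        return r.sum - r.zeros;
    }

    public static void main(String[] args) {
        System.out.println(process(new int[] {3, 0, 4, 0, 5})); // prints 10
    }
}
```

The step function builds a new Acc on every element, which is the explicit packaging that the Cat version avoids by leaving the extra value on the stack.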
The Cat version of fold differs from Haskell's foldl because it accepts a tail-polymorphic argument (as opposed to simply a binary function). In other words, the function argument can accept any number of arguments (say N + 1), as long as it returns N results whose types match those of the first N arguments. In Cat the type is expressed as:
fold : ('A list 'b ('A 'b 'c -> 'A 'b) -> 'A 'b)
Type variable names starting with capital letters (e.g. 'A) are type-vector variables and refer to "the rest of the stack". These kinds of type variables are sometimes called "row variables", and this kind of polymorphism is called "row polymorphism". I view the ability to return multiple results as the primary advantage of a stack language.
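Java has no row variables, but the shape of the Cat type can be loosely approximated by giving the fold one extra type parameter that stands in for "the rest of the stack". The sketch below is only an approximation under that assumption: S plays the role of 'A (here just a single value rather than an arbitrary stack), B is 'b, and C is 'c. RowFold, Stack2, Step, fold and foldFrom are hypothetical names invented for this example.

```java
import java.util.Arrays;
import java.util.List;

class RowFold {
    // Immutable two-slot "stack": rest approximates 'A, acc is 'b.
    static final class Stack2<S, B> {
        final S rest;
        final B acc;
        Stack2(S rest, B acc) { this.rest = rest; this.acc = acc; }
    }

    // Approximates the Cat function type ('A 'b 'c -> 'A 'b).
    interface Step<S, B, C> {
        Stack2<S, B> apply(S rest, B acc, C item);
    }

    // Approximates fold : ('A list 'b ('A 'b 'c -> 'A 'b) -> 'A 'b).
    static <S, B, C> Stack2<S, B> fold(S rest, List<C> xs, B init, Step<S, B, C> step) {
        return foldFrom(new Stack2<>(rest, init), xs, 0, step);
    }

    // Recursive worker; fine for short lists, since Java has no tail-call elimination.
    private static <S, B, C> Stack2<S, B> foldFrom(Stack2<S, B> state, List<C> xs, int i,
                                                   Step<S, B, C> step) {
        return i == xs.size()
            ? state
            : foldFrom(step.apply(state.rest, state.acc, xs.get(i)), xs, i + 1, step);
    }

    // The zeros counter sits "below" the fold's own accumulator (the sum).
    static int process(List<Integer> xs) {
        Stack2<Integer, Integer> r = fold(0, xs, 0,
            (zeros, sum, x) -> new Stack2<>(x == 0 ? zeros + 1 : zeros, sum + x));
        return r.acc - r.rest;
    }

    public static void main(String[] args) {
        System.out.println(process(Arrays.asList(3, 0, 4, 0, 5))); // prints 10
    }
}
```

The difference from Cat is that the extra parameter has to be declared and passed explicitly, whereas in Cat the type of the function passed to fold is inferred to cover whatever else it touches on the stack.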
The reason the increment of the zeros counter in the fold example is not a side-effect is that the type of the fold function is automatically widened to encompass the additional value it accesses on the stack. The counter is passed in and returned on every iteration rather than mutated in place, so the code stays type-safe and entirely side-effect free.
Now I have been looking for a way to do this in Haskell, and I can’t find a way to do it in a general manner unless I package all local variables in a list, and have the fold function return all local arguments as a list. This is a rather laborious step, and is not practical for a compiler. Cat seems to be a win here, because it closely resembles the final assembly code that is to be generated.
I am hoping that maybe some of the Haskell hackers out there may rise to the challenge of providing a simple implementation in Haskell in the very possible case that I am mistaken.