Expressing Local Assignment without Side-Effects using Fold (can you do this in Haskell?)

I have been talking lately on my blog and at Lambda-the-Ultimate.org about how a while loop that modifies local variables can be expressed as the composition of a sequence of pure functions (i.e. functions without side effects) by writing it in a stack-based language. To demonstrate this I have been using the following Java example:

int process(int[] xs) {
    int zeros = 0;
    int sum = 0;
    int i = 0;
    while (i < xs.length) {
        if (xs[i] == 0)
            zeros++;
        sum += xs[i];
        ++i;
    }
    return sum - zeros;
}

I have frequently heard that the above code is considered to have side effects because of the assignments to local variables. However, the mapping to pure-functional stack-based code (here shown in Cat) is very straightforward:

define process {
    // top argument = xs
    0 // zeros
    0 // sum
    0 // i
    [
        dig3 // bring xs to top of local stack
        dupd swap get_at eqz // xs[i] == 0
            [[inc] dip3] // inc zeros
            []
        if
        dupd swap get_at // tmp = xs[i]
        [bury3] dip // put xs to bottom of local stack
        swap [add_int] dip // sum += tmp
        inc // ++i
    ]
    [
        dig3 count [bury3] dip // tmp = size of xs
        dupd lt_int // i < tmp
    ]
    while
    pop // remove i
    swap sub_int // sum-zeros
}
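For readers who do not read Cat, the same idea can be roughly sketched in Java (16 or later, for records). This is only an illustration, not a translation of how Cat works internally: the names PureLoop, State and whileLoop are made up for this sketch. The three locals become an immutable State value, and the loop body and the loop condition become pure functions from State, much like the two quotations passed to while in the Cat code.

```java
import java.util.function.Predicate;
import java.util.function.UnaryOperator;

final class PureLoop {
    // Hypothetical immutable snapshot of the three locals of the loop.
    record State(int zeros, int sum, int i) {}

    // A "while" combinator in the spirit of Cat's while: it takes a pure body
    // and a pure condition and threads the state through them.
    // Recursion keeps it free of assignment; Java does not eliminate tail
    // calls, so this sketch is only suitable for short inputs.
    static <S> S whileLoop(S state, UnaryOperator<S> body, Predicate<S> cond) {
        return cond.test(state)
            ? whileLoop(body.apply(state), body, cond)
            : state;
    }

    static int process(int[] xs) {
        State end = whileLoop(
            new State(0, 0, 0),
            s -> new State(
                xs[s.i()] == 0 ? s.zeros() + 1 : s.zeros(), // zeros++ when xs[i] == 0
                s.sum() + xs[s.i()],                        // sum += xs[i]
                s.i() + 1),                                 // ++i
            s -> s.i() < xs.length);                        // i < xs.length
        return end.sum() - end.zeros();
    }
}
```

Calling PureLoop.process(new int[] {3, 0, 4, 0, 5}) gives 10 (a sum of 12 minus 2 zeros), the same result as the imperative version, yet no variable is ever reassigned.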

What I think is much more interesting is that we can rewrite the above code very succinctly using a fold instruction as follows:

define process {
    0 swap 0 [[[inc] dip2] ifeqz +] fold swap sub_int
}

What happens in this example is that we place a zero (the zeros counter) on the stack below the list argument xs, and the function passed to fold happily accesses it (increments it, to be specific) without violating type safety, while remaining entirely free of side effects.

The Cat version of fold differs from Haskell's foldl because it accepts a tail-polymorphic function argument (as opposed to simply a binary function). In other words, the function argument can take any number of arguments (say N + 1), as long as it returns N results whose types match those of the first N arguments. In Cat the type is expressed as:

fold : ('A list 'b ('A 'b 'c -> 'A 'b) -> 'A 'b)

Type variable names starting with capital letters (e.g. 'A) are type-vector variables and refer to “the rest of the stack”. These kinds of type variables are sometimes called “row variables”, and this kind of polymorphism is called “row polymorphism”. I view the ability to return multiple results as the primary advantage of a stack language.

To restate what happens in the Cat fold example: the zero placed below the list argument is read and incremented by the function passed to fold on each iteration, yet nothing is mutated and type safety is preserved. The reason this is not a side effect is that the type of the fold function is automatically widened to account for the additional value it accesses on the stack: the row variable 'A is simply instantiated to include the extra integer.
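By contrast, when fold only accepts a binary step function, that widening has to be done by hand: the extra value must be bundled into the accumulator. Here is a minimal Java sketch of that situation (Java 16 or later, for records). The names FoldLocals, Acc and foldLeft are invented for this illustration and are not part of any library.

```java
import java.util.List;
import java.util.function.BiFunction;

final class FoldLocals {
    // Hypothetical pair standing in for the two stack slots (zeros, sum).
    record Acc(int zeros, int sum) {}

    // A plain left fold with a binary step function. S has to bundle every
    // value the step function wants to thread through the iteration.
    // Written recursively to stay assignment-free; fine for short lists only.
    static <S, T> S foldLeft(List<T> xs, S acc, BiFunction<S, T, S> step) {
        return xs.isEmpty()
            ? acc
            : foldLeft(xs.subList(1, xs.size()), step.apply(acc, xs.get(0)), step);
    }

    static int process(List<Integer> xs) {
        Acc end = foldLeft(xs, new Acc(0, 0),
            (a, x) -> new Acc(
                x == 0 ? a.zeros() + 1 : a.zeros(), // count zeros
                a.sum() + x));                      // running sum
        return end.sum() - end.zeros();
    }
}
```

FoldLocals.process(List.of(3, 0, 4, 0, 5)) gives 10, as before. The explicit Acc wrapper is exactly the packaging step that the Cat version avoids, because there the extra counter just sits on the stack underneath the fold's own arguments.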

I have been looking for a way to do this in Haskell, and I can't find a way to do it in a general manner unless I package all the local variables in a list and have the function passed to fold return all of them as a list again. This is a rather laborious step, and is not practical for a compiler. Cat seems to be a win here, because it closely resembles the final assembly code that is to be generated.

I am hoping that maybe some of the Haskell hackers out there may rise to the challenge of providing a simple implementation in Haskell in the very possible case that I am mistaken. 
