Why local arrays in functions seem to prevent TCO?

Question

Looks like having a local array in your function prevents tail-call optimization on it on all compilers I've checked it on:

int foo(int*);

int tco_test() {
    // int arr[5]={1, 2, 3, 4, 5}; // <-- variant 1
    // int* arr = new int[5]; // <-- variant 2
    int x = foo(arr);
    return x > 0 ? tco_test() : x;
}

When variant 1 is active, there is a true call to tco_test() in the end (gcc tries to do some unrolling before, but it still calls the function in the end). Variant 2 does TCO as expected.

Is there something in local arrays which make it impossible to optimize tail calls?

This is simply not a tail call. GCC may still transform the call and optimise it *like* a tail-call under the as-if rule. — Konrad Rudolph, Aug 10 '16 at 16:19
Which version of GCC? What flags are you passing to the compiler? — Some programmer dude, Aug 10 '16 at 16:19
Besides, the recursive call is part of another expression so it's not really a tail call anyway. — Some programmer dude, Aug 10 '16 at 16:20
@JoachimPileborg,variant2 is is optmized as a tail call. On any rate, switching to `return tco_test()` doesn't make a difference, it was there simply to stop compiler thinking I have an undefined behavior there. — SergeyA, Aug 10 '16 at 16:21
Does the memory leak introduced by variant 2 have anything to do with it? — NathanOliver, Aug 10 '16 at 16:33
@NathanOliver, nope. I have tested with delete's there as well, I was just trying to give as concise example as possible. — SergeyA, Aug 10 '16 at 16:40
To test things more properly you should do the (trivial) rewrite to have the recursive call directly as a `return` expression at the very end. But this can only tell you something about your compiler's optimization abilities. THere's nothing about the local array that absolutely prevents tail call optimization. — Cheers and hth. - Alf, Aug 10 '16 at 18:41

Kaz · Answer 1 · 2016-08-10T18:09:36.380

4

If the compiler sill performed TCO, then all of the external foo(arr) calls would receive the same pointer. That's a visible semantics change, and thus no longer a pure optimization.

The fact that the local variable in question is an array is probably a red herring here; it is its visibility to the outside via a pointer that is important.

Consider this program:

#include <stdio.h>

int *valptr[7], **curptr = valptr, **endptr = valptr + 7;

void reset(void)
{
  curptr = valptr;

}

int record(int *ptr)
{
  if (curptr >= endptr)
    return 1;
  *curptr++ = ptr;
  return 0;
}

int tally(void)
{
  int **pp;
  int count = 0;

  for (pp = valptr; pp < curptr; pp++)
    count += **pp;

  return count;
}

int tail_function(int x)
{
  return record(&x) ? tally() : tail_function(x + 1);
}

int main(void)
{
  printf("tail_function(0) = %d\n", tail_function(0));
  return 0;
}

As the tail_function recurses, which it does via a tail call, the record function records the addresses of different instances of the local variable x. When it runs out of room, it returns 1, and that triggers tail_function to call tally and return. tally sweeps through the recorded memory locations and adds their values.

If tally were subject to TCO, then there would just be one instance of x. Effectively, it would be this:

int tail_function(int x)
{
tail:
  if (record(&x))
    return tally();
  x = x + 1;
  goto tail;
}

And so now, record is recording the same location over and over again, causing tally to calculate an incorrect value instead of the expected 21.

The logic of record and tally depends on x being actually instantiated on each activation of the scope, and that outer activations of the scope have a lifetime which endures until the inner ones terminate. That requirement precludes tail_function from recursing in constant space; it must allocate separate x instances.

edited Aug 10 '16 at 18:09

answered Aug 10 '16 at 16:43

Kaz

55,781
9
100
149

1

Not sure if I follow. What do you mean by 'external foo' in this case? May be some illustrating code? – SergeyA Aug 10 '16 at 16:55
Calls to `foo` can't be inlined because there is no definition in scope. – Kaz Aug 10 '16 at 17:48
@KonradRudolph I have added an illustrative example how how exporting the address of a local variable can create program semantics which breaks under constant space recursion. – Kaz Aug 10 '16 at 18:12
I am not following this example. How is an array is different from a single pointer in terms of external visibility? My variant2 could as well be in a form of `int* arr = &static_int_variable` and it would still be TCO-optimized (ill-formed as it is, of course). May be you could reword your example, make it slightly? shorter and emphasize how array is different from the pointer here – SergeyA Aug 10 '16 at 18:31
@Kaz Ah, I was confusing `foo` and `tco_test`. The answer now makes sense. But just to be pedantic: [`foo` can be inlined at link time](http://stackoverflow.com/q/5987020/1968). – Konrad Rudolph Aug 10 '16 at 18:34
@KonradRudolph However, the example holds up even if `record` and `tally` are completely inlined into `tail_funtion`. Of course, on the other hand, the compiler could see very deeply and just reduce the whole thing to `return 21`. – Kaz Aug 10 '16 at 18:36
@Kaz I know. As I said: I simply mistook `foo` for the recursive function in the explanation. – Konrad Rudolph Aug 10 '16 at 18:37

Why local arrays in functions seem to prevent TCO?

1 Answers1