Personal site and blog. Please feel free to contact me via the social networks below.
please visit the new revision of this article georgeplotnikov.com/tale-about-the-tail-call-in-dotnet
In the programming as the recursion we call the function which is directly, or indirectly calls itself: A->B / A->B->A. The recursive calls obviously might be a reason of stack overflowing [StackOverflowException]. On the other hand, the recursive styled functions allow us clearly write our ideas, as Fibonacci, Fractal, Tree-walk function etc.
private static long Factorial(int n)
{
return n== 0 ? 1 : n* Factorial(n - 1);
}
Because of being aware of the possible failure, there are several well-known approaches to resolve stack overflow issue, they are: looping and inlining the recursive function. From the programmer’s perspective the suitable approach might be re-write the function with the loop, inlining is rarely applicable, but with code, readability sacrificing. The inlining and another method of reducing stack calls for the recursive function is the “tail call”. Both methods are resolved via the compiler. This is tremendously important both for C# dotnet developers and, moreover, for the functional languages programmers - aka F#. For functional languages, it is natural to express the intention with the recursion.
let rec factorial n =
match n with
| 0 | 1 -> 1
| _ -> n * factorial(n-1)
Thanks to Wiki: “In computer science, a tail call is a subroutine call performed as the final act of a procedure. If a tail call might lead to the same subroutine being called again later in the call chain, the subroutine is said to be tail-recursive, which is a special case of recursion. Tail recursion (or tail-end recursion) is particularly useful, and often easy to handle in implementations.”
of the factorial function:
private static long Factorial(int n)
{
return n== 0 ? 1 : n* Factorial(n - 1);
}
if we consider the IL code we’ll discover the truly recursive self-call:
.method private hidebysig static
int64 FactorialWithoutTailing (
int32 n
) cil managed
{
// Method begins at RVA 0x2f78
// Code size 18 (0x12)
.maxstack 8
IL_0000: ldarg.0
IL_0001: brfalse.s IL_000f
IL_0003: ldarg.0
IL_0004: conv.i8
IL_0005: ldarg.0
IL_0006: ldc.i4.1
IL_0007: sub
IL_0008: call int64 Jit_TailCalling::Factorial(int32)
IL_000d: mul
IL_000e: ret
IL_000f: ldc.i4.1
IL_0010: conv.i8
IL_0011: ret
} // end of method Jit_TailCalling::Factorial
consider the code: the recursive call is not in the tail position, although it is last in the function text. However, automatic optimization does not occur, because, in fact, the last statement is multiplication. Here is the instructions order:
To allow the compiler to generate the tail call we have to modify the function to shift the recursive call in the last position.
private static long FactorialWithTailing(int i, int acc)
{
return i == 0 ? acc : FactorialWithTailing(i - 1, acc * i);
}
and the entrypoint function:
private static long FactorialWithTailing(int i)
{
return FactorialWithTailing(i, 1);
}
The entrypoint function serves the call with constant 1. Consider the new IL code:
.method private hidebysig static
int64 FactorialWithTailing (
int32 i
) cil managed
{
// Method begins at RVA 0x2c70
// Code size 8 (0x8)
.maxstack 8
IL_0000: ldarg.0
IL_0001: ldc.i4.1
IL_0002: call int64 BenchmarkDotNet.Samples.JIT.Jit_Inlining::FactorialWithTailing(int32, int32)
IL_0007: ret
} // end of method Jit_Inlining::FactorialWithTailing
for the entrypoint:
.method private hidebysig static
int64 FactorialWithTailing (
int32 i,
int32 acc
) cil managed
{
// Method begins at RVA 0x2c5d
// Code size 18 (0x12)
.maxstack 8
IL_0000: ldarg.0
IL_0001: brfalse.s IL_000f
IL_0003: ldarg.0
IL_0004: ldc.i4.1
IL_0005: sub
IL_0006: ldarg.1
IL_0007: ldarg.0
IL_0008: mul
IL_0009: call int64 BenchmarkDotNet.Samples.JIT.Jit_Inlining::FactorialWithTailing(int32, int32)
IL_000e: ret
IL_000f: ldarg.1
IL_0010: conv.i8
IL_0011: ret
} // end of method Jit_Inlining::FactorialWithTailing
Pay attention to recursive call instruction
for the non-modified function:
IL_0008: call int64 Jit_TailCalling::Factorial(int32)
IL_000d: mul
IL_000e: ret
for the modified function:
IL_0008: mul
IL_0009: call int64 Jit_TailCalling::FactorialWithTailing(int32, int32)
IL_000e: ret
the key difference is the second case - the multiplication call is above the function calling that’s why it is performed at the stage of computing arguments for the function call. That modification allows CLR automatically within JITting generate non-recursive calls and perform the computation on the single stack.
For the recursive function
let rec factorial n =
match n with
| 0 | 1 -> 1
| _ -> n * factorial(n-1)
the IL code:
.method assembly hidebysig
instance int32 factorial (
int32 n
) cil managed
{
.custom instance void [mscorlib]System.Runtime.CompilerServices.CompilerGeneratedAttribute::.ctor() = (
01 00 00 00
)
// Method begins at RVA 0x2338
// Code size 28 (0x1c)
.maxstack 8
IL_0000: ldarg.1
IL_0001: switch (IL_001a, IL_001a)
IL_000e: ldarg.1
IL_000f: ldarg.0
IL_0010: ldarg.1
IL_0011: ldc.i4.1
IL_0012: sub
IL_0013: callvirt instance int32 Program/TailCallDetector::factorial(int32)
IL_0018: mul
IL_0019: ret
IL_001a: ldc.i4.1
IL_001b: ret
} // end of method TailCallDetector::factorial
looks the same as for the previous example. The function modification will reveal the difference how C# and F# compilers work.
let factorial1 n =
let rec loop i acc =
match i with
| 0 | 1 -> acc
| _ -> loop (i-1) (acc * i)
loop n 1
the IL code:
.method assembly hidebysig
instance int32 factorial1 (
int32 n
) cil managed
{
.custom instance void [mscorlib]System.Runtime.CompilerServices.CompilerGeneratedAttribute::.ctor() = (
01 00 00 00
)
// Method begins at RVA 0x2358
// Code size 8 (0x8)
.maxstack 8
IL_0000: ldarg.1
IL_0001: ldc.i4.1
IL_0002: call int32 Program::loop@47(int32, int32)
IL_0007: ret
} // end of method TailCallDetector::factorial1
the difference is compiler has generated the loop for this function at the compile time: IL_0002: call int32 Program::loop@47(int32, int32)
.method assembly static
int32 loop@47 (
int32 i,
int32 acc
) cil managed
{
// Method begins at RVA 0x20fc
// Code size 28 (0x1c)
.maxstack 8
// loop start
IL_0000: ldarg.0
IL_0001: switch (IL_001a, IL_001a)
IL_000e: ldarg.0
IL_000f: ldc.i4.1
IL_0010: sub
IL_0011: ldarg.1
IL_0012: ldarg.0
IL_0013: mul
IL_0014: starg.s acc
IL_0016: starg.s i
IL_0018: br.s IL_0000
// end loop
IL_001a: ldarg.1
IL_001b: ret
} // end of method Program::loop@47
This solution solves the problem but with another option: to generate new method to calculate the recursion into a loop. But to gain the tail call at JITting one has to change the function and extract the tail call to separate one.
let factorial2 n =
let rec tailCall n f =
if n <= 1 then
f()
else
tailCall (n - 1) (fun () -> n * f())
tailCall n (fun () -> 1)
as the result:
.method assembly hidebysig
instance int32 factorial2 (
int32 n
) cil managed
{
.custom instance void [mscorlib]System.Runtime.CompilerServices.CompilerGeneratedAttribute::.ctor() = (
01 00 00 00
)
// Method begins at RVA 0x2364
// Code size 14 (0xe)
.maxstack 8
IL_0000: ldarg.1
IL_0001: newobj instance void Program/clo@60::.ctor()
IL_0006: tail.
IL_0008: call int32 Program::tailCall@54(int32, class [FSharp.Core]Microsoft.FSharp.Core.FSharpFunc`2<class [FSharp.Core]Microsoft.FSharp.Core.Unit, int32>)
IL_000d: ret
} // end of method TailCallDetector::factorial2
here is the need call just above the return from the method, and for the entrypoint function:
IL_0008: call int32 Program::tailCall@54(int32, class [FSharp.Core]Microsoft.FSharp.Core.FSharpFunc`2<class [FSharp.Core]Microsoft.FSharp.Core.Unit, int32>)
which is obviously with the tail call: tailCall n (fun () -> 1)
.method assembly static
int32 tailCall@54 (
int32 n,
class [FSharp.Core]Microsoft.FSharp.Core.FSharpFunc`2<class [FSharp.Core]Microsoft.FSharp.Core.Unit, int32> f
) cil managed
{
// Method begins at RVA 0x211c
// Code size 30 (0x1e)
.maxstack 8
// loop start
IL_0000: ldarg.0
IL_0001: ldc.i4.1
IL_0002: bgt.s IL_000e
IL_0004: ldarg.1
IL_0005: ldnull
IL_0006: tail.
IL_0008: callvirt instance !1 class [FSharp.Core]Microsoft.FSharp.Core.FSharpFunc`2<class [FSharp.Core]Microsoft.FSharp.Core.Unit, int32>::Invoke(!0)
IL_000d: ret
IL_000e: ldarg.0
IL_000f: ldc.i4.1
IL_0010: sub
IL_0011: ldarg.0
IL_0012: ldarg.1
IL_0013: newobj instance void Program/'tailCall@58-1'::.ctor(int32, class [FSharp.Core]Microsoft.FSharp.Core.FSharpFunc`2<class [FSharp.Core]Microsoft.FSharp.Core.Unit, int32>)
IL_0018: starg.s f
IL_001a: starg.s n
IL_001c: br.s IL_0000
// end loop
} // end of method Program::tailCall@54
here is the call we are interested in:
IL_0006: tail.
IL_0008: callvirt instance !1 class [FSharp.Core]Microsoft.FSharp.Core.FSharpFunc`2<class [FSharp.Core]Microsoft.FSharp.Core.Unit, int32>::Invoke(!0)
IL_000d: ret
To detect JIT tail calls and analyze functions there are two ETW events:
MethodJitInliningSucceeded Event MethodJITTailCallFailed Event
These events are used in the TailCallDiagnoser that I’ve recently contributed to the BanchmarkDotNet project.
Here is the examples of the diagnoser demo and output:
The C# demo:
[Diagnostics.Windows.Configs.TailCallDiagnoser(false)]
[LegacyJitX86Job, LegacyJitX64Job, RyuJitX64Job]
public class Jit_TailCalling
{
[Benchmark]
public long Calc()
{
return FactorialWithoutTailing(7) - FactorialWithTailing(7);
}
private static long FactorialWithoutTailing(int depth)
{
return depth == 0 ? 1 : depth * FactorialWithoutTailing(depth - 1);
}
private static long FactorialWithTailing(int i, int acc)
{
return i == 0 ? acc : FactorialWithTailing(i - 1, acc * i);
}
private static long FactorialWithTailing(int i)
{
return FactorialWithTailing(i, 1);
}
}
output:
--------------------
Jit_TailCalling.Calc: LegacyJitX64(Jit=LegacyJit, Platform=X64, Runtime=Clr)
--------------------
--------------------
Jit_TailCalling.Calc: LegacyJitX86(Jit=LegacyJit, Platform=X86, Runtime=Clr)
--------------------
--------------------
Jit_TailCalling.Calc: RyuJitX64(Jit=RyuJit, Platform=X64)
--------------------
Caller: <null>.<null> - <null>
Callee: BenchmarkDotNet.Samples.JIT.Jit_TailCalling.FactorialWithTailing - int64 (int32,int32)
Tail prefix: False
Tail call type: RecursiveLoop
--------------------
the F# demo:
[<TailCallDiagnoser>]
type TailCallDetector () =
let rec factorial n =
match n with
| 0 | 1 -> 1
| _ -> n * factorial(n-1)
let factorial1 n =
let rec loop i acc =
match i with
| 0 | 1 -> acc
| _ -> loop (i-1) (acc * i)
loop n 1
let factorial2 n =
let rec tailCall n f =
if n <= 1 then
f()
else
tailCall (n - 1) (fun () -> n * f())
tailCall n (fun () -> 1)
[<Params (7)>]
member val public facRank = 0 with get, set
[<Benchmark>]
member self.test () =
factorial self.facRank
[<Benchmark>]
member self.test1 () =
factorial1 self.facRank
[<Benchmark>]
member self.test2 () =
factorial2 self.facRank
output:
--------------------
TailCallDetector.test: DefaultJob [facRank=7]
--------------------
--------------------
TailCallDetector.test1: DefaultJob [facRank=7]
--------------------
--------------------
TailCallDetector.test2: DefaultJob [facRank=7]
--------------------
Caller: <null>.<null> - <null>
Callee: Program+TailCallDetector.factorial2 - instance int32 (int32)
Tail prefix: True
Tail call type: HelperAssistedTailCall
--------------------
Limitations:
Useful links: