# C# value type boxing under the hood

Talks about different situations in which value type boxing happens and how to avoid it

I recently had a really interesting discussion with a .NET type system expert on the team, and during the conversation he pointed out an interesting aspect of .NET value type boxing when using generic constraints. Intrigued by that discussion, I decided to take a closer look.

## The basics

Before we dig into the details, let’s review some basics and see how boxing can come into play when calling value type methods.

Suppose we have the following code:

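```csharp
using System;

interface IAdd
{
    void Add(int val);
}

struct Foo : IAdd
{
    public int value;

    // Explicit (private) implementation of the interface method
    void IAdd.Add(int val)
    {
        value += val;
    }

    // Regular public method that does the same thing
    public void AddValue(int val)
    {
        value += val;
    }

    public void Print(string msg)
    {
        Console.WriteLine(msg + ":" + value);
    }
}
```
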
Nothing fancy here. Foo is a struct with an integer field named value. It explicitly (privately) implements an interface method that attempts to mutate its value, and it also has a regular method that does the same thing.

Now if we have the following code:

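```csharp
Foo f = new Foo();
f.value = 10;
f.Print("Initial Value");
f.AddValue(10);
f.Print("After calling AddValue");
((IAdd)f).Add(10);
f.Print("After calling IAdd.Add");
```
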
What is the value after the AddValue call, and after the IAdd.Add call?

```
Initial Value:10
After calling AddValue:20
After calling IAdd.Add:20
```

If you are familiar with the language, this is perhaps not surprising to you at all.

But let’s dig a bit deeper and see how the JIT does it.

Let’s take a look at the AddValue call first.

```
lea     rcx,[rbp-18h]
mov     edx,0Ah
call    00007ff8cfa700e0
```


Note that I’m showing x64 assembly code, which I find much easier to read than x86. Under the Windows x64 calling convention, the first 4 integer or pointer arguments are passed in registers rcx, rdx, r8, and r9 (the rest are passed on the stack), and the return value comes back in rax. All of these are 64-bit registers. In the code above, the JIT passes the ‘this’ pointer in rcx (pointing to the portion of the stack starting at rbp-18h) and the integer 10 (0x0a) in rdx/edx (edx is simply the lower 32 bits of rdx).

Now let’s look at the actual code for Foo.AddValue:

```
0:000> !u 00007ff8cfa70670
Normal JIT generated code
Begin 00007ff8cfa70670, size 36
>>> 00007ff8cfa70670 55              push    rbp
sub     rsp,20h
lea     rbp,[rsp+20h]
mov     qword ptr [rbp+10h],rcx        ; this pointer getting saved
mov     dword ptr [rbp+18h],edx        ; this is integer 10
mov     rax,7FF8CF964560h
cmp     dword ptr [rax],0
je      00007ff8cfa70695
call    clr!JIT_DbgIsJustMyCode (00007ff92f534eb0)
nop
mov     eax,dword ptr [rbp+18h]        ; integer 10
mov     rdx,qword ptr [rbp+10h]        ; this pointer getting restored
add     dword ptr [rdx],eax            ; adding 10 to the first 4 bytes at 'this'
nop
lea     rsp,[rbp]
pop     rbp
ret
```


Feel free to ignore some of the debugging gibberish (clr!JIT_DbgIsJustMyCode). If you follow my comments in the assembly (starting with ;), you can see 10 is being added to the first 4-byte memory location at ‘this’, which is exactly what value += val is supposed to do.

And you get the following:

```
Initial Value:10
After calling AddValue:20
```


## Interface call into the value type instance method

Now let’s take a look at the interface call, which is a bit more complicated:

```
mov     rcx,7FF8CF965BB0h                         ; first arg to allocation routine - the Foo struct type
call    clr!JIT_TrialAllocSFastMP_InlineGetThread ; this is the allocation
mov     qword ptr [rbp-20h],rax                   ; rax is the created boxed 'Foo' struct
mov     ecx,dword ptr [rbp-18h]                   ; foo.value
mov     rdx,qword ptr [rbp-20h]                   ; boxed foo
mov     dword ptr [rdx+8],ecx                     ; copy foo to boxed foo
mov     rcx,qword ptr [rbp-20h]
mov     qword ptr [rbp-28h],rcx
mov     rcx,qword ptr [rbp-28h]                   ; rcx points to the new boxed 'Foo' struct on the heap
mov     edx,0Ah                                   ; = 10
mov     r11,7FF8CF970020h                         ; r11 is the target
cmp     dword ptr [rcx],ecx                       ; this does the 'null' check and triggers a NullReferenceException if needed
call    qword ptr [r11]                           ; interface dispatch code
```


Again, I’ve put comments on the right side of the assembly code. It basically creates a boxed Foo, copies the value into the newly created box, and then makes the interface call with the box as ‘this’. Note that the offset of 8 is for the MethodTable pointer at the beginning of the object; only objects and boxed value types (which are objects, naturally) have one. An unboxed value type doesn’t.

Ignoring all the interface dispatch code for now (it’s not relevant to our discussion), you’ll eventually arrive at these interesting instructions:

```
add     rcx,8                          ; skip the MethodTable pointer to get to the first field
mov     rax,7FF8CF965B78h
mov     rax,qword ptr [rax]            ; retrieve the address of Foo.IAdd.Add
jmp     rax
```


This code doesn’t do much, but it gives us a lot of insight into how the pieces of the system work together. Looking back at the code we showed earlier for the AddValue method, a value type instance method expects the ‘this’ pointer to point directly at the first field. However, to support type operations (such as reflection and casting), every object has a type pointer as its first pointer-sized field, which is called the MethodTable in CLR jargon. Therefore, the CLR needs to generate an unboxing stub that unboxes the boxed value and then calls the underlying JITted method, which expects an unboxed ‘this’ pointer. Note that the unboxing doesn’t involve a copy; it simply adds an offset to the pointer. This means the += operation takes effect on the boxed copy. However, since the boxed Foo is a temporary that only the compiler-generated code refers to, the updated value is lost forever. And that’s why you see:

```
After calling IAdd.Add:20
```
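
To confirm that the update really lands on the boxed copy rather than vanishing, you can keep a reference to the box yourself. The following is a minimal sketch of my own, not code from the gist; it repeats a trimmed-down IAdd/Foo so it compiles on its own, and BoxDemo and AddThroughKeptBox are hypothetical names:

```csharp
using System;

interface IAdd
{
    void Add(int val);
}

struct Foo : IAdd
{
    public int value;
    void IAdd.Add(int val) { value += val; }
}

static class BoxDemo
{
    // Boxes a Foo, calls IAdd.Add through the box, and returns the value
    // seen in the original local and in the box.
    public static (int original, int boxed) AddThroughKeptBox(int start, int delta)
    {
        Foo f = new Foo { value = start };
        IAdd box = f;            // boxing: copies f into a new heap object
        box.Add(delta);          // unboxing stub points 'this' into the box and mutates it in place
        Foo fromBox = (Foo)box;  // unboxing cast copies the box contents back out
        return (f.value, fromBox.value);
    }

    static void Main()
    {
        var (original, boxed) = AddThroughKeptBox(10, 10);
        Console.WriteLine($"f.value={original}, box value={boxed}");
    }
}
```

Here the local f stays at 10 while the box holds 20: the interface call mutated the box in place, and nothing ever copies that result back into f.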


## A case with generics

Now let’s add some generics into the mix:

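```csharp
static void Add_WithoutConstraints<T>(ref T foo, int val)
{
    // T is unconstrained, so foo has to be cast (boxed) to IAdd first
    ((IAdd)foo).Add(val);
}

// Calling it with the same f as before:
Add_WithoutConstraints<Foo>(ref f, 10);
f.Print("After Add_WithoutConstraints");
```
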
Even though it’s a fancy generic method, the call itself and the underlying code are nothing surprising. As you might expect, even though the caller passes Foo by reference, Add_WithoutConstraints boxes a copy of it before calling into IAdd, and the modification is, again, lost forever.

```
After Add_WithoutConstraints:20
```


Now for the interesting case I mentioned at the beginning of the article (thanks for staying with me so far!). Let’s create a generic method with a generic constraint requiring T to implement the IAdd interface:

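```csharp
static void Add_WithConstraints<T>(ref T foo, int val) where T : IAdd
{
    // No cast needed: the constraint lets us call Add on foo directly
    foo.Add(val);
}

// Calling it with the same f as before:
Add_WithConstraints<Foo>(ref f, 10);
f.Print("After Add_WithConstraints");
```
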
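It may not be obvious to everyone, but foo.Add(val) is still an interface call using the callvirt instruction, `callvirt instance void IAdd::Add(int32)`, because that’s the only way the C# compiler knows how to make the call when it doesn’t know what T is. The compiler does, however, put a `constrained. !!T` prefix in front of it, which tells the runtime the actual type that foo refers to.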

The interesting part is that when we call Add_WithConstraints, the call happens in exactly the same manner, but the code we are calling into looks drastically different:

```
0:000> !u 00007ff8cfa707d0
Normal JIT generated code
Begin 00007ff8cfa707d0, size 3a
>>> push    rbp
sub     rsp,20h
lea     rbp,[rsp+20h]
mov     qword ptr [rbp+10h],rcx           ; this pointer
mov     dword ptr [rbp+18h],edx           ; val
mov     rax,7FF8CF964560h                 ; debugger gibberish
                                          ; but you probably guessed it's for Just My Code
cmp     dword ptr [rax],0
je      00007ff8cfa707f5
call    clr!JIT_DbgIsJustMyCode (00007ff92f534eb0)
nop
mov     rcx,qword ptr [rbp+10h]           ; this pointer
mov     edx,dword ptr [rbp+18h]           ; val
call    00007ff8cfa706c0 (Foo.IAdd.Add(Int32))   ; calls the method without boxing!
nop
nop
lea     rsp,[rbp]
pop     rbp
ret
```


As you can see, the code is surprisingly simple: no boxing, no interface cast, just a direct call to the Foo.IAdd.Add method on the caller’s Foo. No update is lost, and you can observe the side effect:

```
After Add_WithConstraints:30
```


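The reason is that the JIT compiles a separate, specialized copy of Add_WithConstraints for T = Foo (generic code is not shared across value type instantiations), so it has enough information to figure out that the code is for Foo and that the interface call will land exactly on Foo.IAdd.Add. It therefore skips the formality of boxing and interface dispatch and calls the method directly. This is a performance optimization, but it also comes with an observable side effect.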
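
To get a rough feel for the performance side, the sketch below (my own, not from the gist) counts the bytes allocated on the current thread with GC.GetAllocatedBytesForCurrentThread around repeated calls to each helper. AllocDemo, MeasureUnconstrained, and MeasureConstrained are hypothetical names. Both helpers are marked NoOptimization so they are compiled without optimizations, similar to the debug listings above; an optimizing JIT may be able to remove some boxes on its own, so numbers from optimized code can differ.

```csharp
using System;
using System.Runtime.CompilerServices;

interface IAdd
{
    void Add(int val);
}

struct Foo : IAdd
{
    public int value;
    void IAdd.Add(int val) { value += val; }
}

static class AllocDemo
{
    // Unoptimized, like the debug listings in the article, so the box stays.
    [MethodImpl(MethodImplOptions.NoInlining | MethodImplOptions.NoOptimization)]
    static void Add_WithoutConstraints<T>(ref T foo, int val)
    {
        ((IAdd)foo).Add(val); // boxes a copy of foo on every call
    }

    [MethodImpl(MethodImplOptions.NoInlining | MethodImplOptions.NoOptimization)]
    static void Add_WithConstraints<T>(ref T foo, int val) where T : IAdd
    {
        foo.Add(val); // constrained call: operates on the caller's Foo directly
    }

    // Bytes allocated on this thread across 'iterations' unconstrained calls.
    public static long MeasureUnconstrained(int iterations)
    {
        Foo f = new Foo();
        Add_WithoutConstraints(ref f, 1); // warm-up call, outside the measured window
        long before = GC.GetAllocatedBytesForCurrentThread();
        for (int i = 0; i < iterations; i++)
            Add_WithoutConstraints(ref f, 1);
        return GC.GetAllocatedBytesForCurrentThread() - before;
    }

    // Bytes allocated on this thread across 'iterations' constrained calls.
    public static long MeasureConstrained(int iterations)
    {
        Foo f = new Foo();
        Add_WithConstraints(ref f, 1); // warm-up call, outside the measured window
        long before = GC.GetAllocatedBytesForCurrentThread();
        for (int i = 0; i < iterations; i++)
            Add_WithConstraints(ref f, 1);
        return GC.GetAllocatedBytesForCurrentThread() - before;
    }

    static void Main()
    {
        Console.WriteLine($"Without constraints: {MeasureUnconstrained(100)} bytes");
        Console.WriteLine($"With constraints:    {MeasureConstrained(100)} bytes");
    }
}
```

The exact byte counts depend on the runtime and platform, so treat them as illustrative; the point is that the unconstrained version allocates on every call while the constrained version should not allocate at all.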

## Conclusion

When you are working with interfaces on value types, be aware of the potential performance cost of boxing and the correctness problem of not observing the callee’s changes. If you’d like to avoid that, you can use generic constraints to constrain the interface call so that the JIT can optimize away the boxing and the interface dispatch altogether and go straight to the right method.

You can find the full code in this gist.