Value Types and Reference Types

11. June 2007 15:44 by Eric in .net

It would be hard to overstate how important the subject of this section is. Everything you do in .NET will deal with either a value type or a reference type—and yet it’s curiously possible to develop for a long time with only a vague idea of what the difference is. Worse yet, there are plenty of myths around to confuse things further. The unfortunate fact is that it’s quite easy to make a short but incorrect statement that is close enough to the truth to be plausible but inaccurate enough to be misleading—but it’s relatively tricky to come up with a concise but accurate description.

Suppose you’re reading something really fantastic, and want a friend to read it too. Let’s further suppose that it’s a document in the public domain, just to avoid any accusations of supporting copyright violation. What do you need to give your friend so that he can read it too? It entirely depends on just what you’re reading.

First we’ll deal with the case where you’ve got real paper in your hands. To give your friend a copy, you’d need to photocopy all the pages and then give them to him. At that point, he has his own complete copy of the document. In this situation, we are dealing with value type behavior. All the information is directly in your hands; you don’t need to go anywhere else to get it. Your copy of the information is also independent of your friend’s after you’ve made the copy. You could add some notes to your pages, and his pages wouldn’t be changed at all.

Compare that with the situation where you’re actually reading a web page. This time, all you have to give your friend is the URL of the web page. This is reference type behavior, with the URL taking the place of the reference. In order to actually read the document, you have to navigate the reference by putting the URL in your browser and asking it to load the page. On the other hand, if the web page changes for some reason (imagine it’s a wiki page and you’ve added your notes to the page) both you and your friend will see that change the next time each of you loads the page.

The differences we’ve seen in the real world form the heart of the distinction between value types and reference types in C# and .NET. Most types in .NET are reference types, and you’re likely to create far more reference types than value types. Aside from a few special cases, classes (declared using class) are reference types, and structures (declared using struct) are value types.
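The photocopy and URL analogies can be sketched in C#. This is a minimal illustration rather than a recommended design: PointValue, PointReference, and CopyDemo are hypothetical names invented for the example, and the public mutable fields are there only to keep it short.

```csharp
using System;

// Hypothetical types for illustration: same field, different kinds of type.
struct PointValue            // value type
{
    public int X;
}

class PointReference         // reference type
{
    public int X;
}

static class CopyDemo
{
    // Copying a value type variable copies the data itself (the photocopy).
    public static int CopyStruct()
    {
        PointValue original = new PointValue();
        original.X = 10;
        PointValue copy = original;   // independent copy of the data
        copy.X = 99;                  // "adding notes" to the copy only
        return original.X;            // still 10
    }

    // Copying a reference type variable copies only the reference (the URL).
    public static int CopyClass()
    {
        PointReference original = new PointReference();
        original.X = 10;
        PointReference alias = original;  // both variables refer to one object
        alias.X = 99;                     // visible through either variable
        return original.X;                // now 99
    }

    static void Main()
    {
        Console.WriteLine(CopyStruct()); // 10
        Console.WriteLine(CopyClass());  // 99
    }
}
```

The only difference between the two methods is whether the type was declared with struct or class; the assignment syntax is identical, which is part of why the distinction is easy to overlook.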

The key concept to grasp when it comes to value types and reference types is what the value of a particular expression is. To keep things concrete, I’ll use variables as the most common examples of expressions—but the same thing applies to properties, method calls, indexers, and other expressions.

As you may know, most expressions have types associated with them. The value of a value type expression is the value, plain and simple. For instance, the value of the expression “2+3” is just 5. The value of a reference type expression, however, is a reference. It’s not the object that the reference refers to. So, the value of the expression String.Empty is not an empty string; it’s a reference to an empty string. In everyday discussions and even in documentation we tend to blur this distinction. For instance, we might describe String.Concat as returning “a string that is the concatenation of all the parameters.” Using very precise terminology here would be time-consuming and distracting, and there’s no problem so long as everyone involved understands that it’s only a reference that is returned.
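To make the "value is a reference" idea concrete, here is a small sketch using StringBuilder from the framework. ReferenceValueDemo and Run are names made up for this example. It tries to separate two things that are easy to conflate: changing the object a reference points to, and changing the value of the variable itself.

```csharp
using System;
using System.Text;

static class ReferenceValueDemo
{
    public static string Run()
    {
        StringBuilder first = new StringBuilder("hello");
        StringBuilder second = first;        // copies the reference, not the object

        second.Append(" world");             // modifies the one shared object
        bool sameObject = object.ReferenceEquals(first, second);

        second = new StringBuilder("other"); // changes the value of 'second' only;
                                             // 'first' still refers to the original object
        return first.ToString() + "|" + sameObject;
    }

    static void Main()
    {
        Console.WriteLine(Run()); // hello world|True
    }
}
```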

The values of variables are stored wherever they are declared. Local variable values are stored on the stack (with some exceptions in later versions of C#, such as local variables captured by anonymous methods), and instance variable values are always stored wherever the instance itself is stored. Reference type instances (objects) are always stored on the heap, as are static variables.
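The following sketch annotates a few declarations with where their values would live according to the description above. Counter and StorageDemo are hypothetical names, and the comments describe the usual behavior for simple, uncaptured locals rather than something the code itself can observe.

```csharp
using System;

class Counter                        // reference type: instances live on the heap
{
    public static int Created;       // static variable: stored on the heap
    public int Count;                // instance variable: stored with the Counter object

    public Counter() { Created++; }
}

static class StorageDemo
{
    public static int Run()
    {
        int local = 5;                   // local variable: the value 5 is on the stack
        Counter counter = new Counter(); // the local holds a reference (on the stack);
                                         // the Counter object itself is on the heap
        counter.Count = local;           // copies the value 5 into the heap object
        local = 6;                       // changes only the local variable
        return counter.Count;            // still 5
    }

    static void Main()
    {
        Console.WriteLine(Run()); // 5
    }
}
```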

Another difference between the two kinds of type is that value types cannot be derived from. One consequence of this is that the value doesn’t need any extra information about what type that value actually is. Compare that with reference types, where each object contains a block of data at the start of it identifying the actual type of the object, along with some other information. You can never change the type of an object—when you perform a simple cast, the runtime just takes a reference, checks whether the object it refers to is a valid object of the desired type, and returns the original reference if it’s valid or throws an exception otherwise. The reference itself doesn’t know the type of the object—so the same reference value can be used for multiple variables of different types.
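A short sketch of the casting behavior just described, with hypothetical names (CastDemo and its two methods). It assumes nothing beyond the standard string, object, and StringBuilder types.

```csharp
using System;
using System.Text;

static class CastDemo
{
    // A successful cast returns the original reference; the object is unchanged.
    public static bool SameReferenceAfterCast()
    {
        string text = "hello";
        object asObject = text;           // same reference, variable of type object
        string back = (string)asObject;   // runtime checks the object's type
        return object.ReferenceEquals(text, back)
            && asObject.GetType() == typeof(string); // the object knows its actual type
    }

    // An invalid cast throws instead of changing the object's type.
    public static bool InvalidCastThrows()
    {
        object value = "hello";
        try
        {
            StringBuilder builder = (StringBuilder)value; // a string is not a StringBuilder
            return builder == null;                       // not reached for this input
        }
        catch (InvalidCastException)
        {
            return true;
        }
    }

    static void Main()
    {
        Console.WriteLine(SameReferenceAfterCast()); // True
        Console.WriteLine(InvalidCastThrows());      // True
    }
}
```

Note that asObject.GetType() reports String even though the variable is declared as object: the type information travels with the object on the heap, not with the reference.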

In this section we’ve looked at the differences between value types and reference types, as well as some of the myths surrounding them. Here are the key points:

* The value of a reference type expression (a variable for example) is a reference, not an object.

* References are like URLs—they are small pieces of data that let you access the real information.

* The value of a value type expression is the actual data.

* There are times when value types are more efficient than reference types, and vice versa.

* Reference type objects are always on the heap, but value type values can be on either the stack or the heap, depending on context.

* When a reference type is used as a method parameter, by default the parameter is passed by value—but the value itself is a reference.

* Value type values are boxed when reference type behavior is needed; unboxing is the reverse process.
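The last two points are easier to see in code. The sketch below is illustrative only; ParameterDemo and its methods are hypothetical names. The first part shows a reference being passed by value, and the second shows a value being boxed and unboxed.

```csharp
using System;
using System.Text;

static class ParameterDemo
{
    // 'builder' is a copy of the caller's reference.
    static void AppendAndReassign(StringBuilder builder)
    {
        builder.Append("!");                     // affects the caller's object
        builder = new StringBuilder("replaced"); // affects only the local copy of the reference
    }

    public static string PassReference()
    {
        StringBuilder text = new StringBuilder("hi");
        AppendAndReassign(text);
        return text.ToString();                  // "hi!", not "replaced"
    }

    public static int BoxAndUnbox()
    {
        int number = 42;
        object boxed = number;     // boxing: the value is copied into an object
        number = 7;                // does not affect the boxed copy
        int unboxed = (int)boxed;  // unboxing: the value is copied back out
        return unboxed;            // 42
    }

    static void Main()
    {
        Console.WriteLine(PassReference()); // hi!
        Console.WriteLine(BoxAndUnbox());   // 42
    }
}
```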