C# is D flat: Pointers, it was once...

Over the last 10 years, working as a developer and teaching an Introduction to Programming course, I've met many people who struggle to understand pointers properly. Many find it a very abstract subject, but the truth is that it is not all that difficult to grasp.

I'll try to remember as many of the doubts I have heard as I can, so that I can write about them clearly.

I think it is worth first taking another look at how variables work (in a more illustrative and informal way). First, a variable has a type, which determines how the bytes stored in that variable will be interpreted. The type also determines the amount of memory the variable needs.

Let's say we want to declare an integer variable (in C):

int x;

This variable (assuming a 32-bit int) will use 4 bytes of memory, which means it can hold any one of 2³² possible values (only one at a time, obviously). And what about the memory, how is it being used by the variable x? Let's think of main memory as a huge array (once again I want to be informal) divided into blocks of 8 bits, or 1 byte. As in an array, each byte can be accessed by an index: each byte in that 'array' has a unique index, called its memory address, through which we can access the contents stored in that piece of memory.

Let's imagine that the memory address 0x100 has been reserved for the variable x. It then occupies the bytes at addresses 0x100, 0x101, 0x102 and 0x103.

As said before, the variable x, being a 32-bit integer, uses 4 bytes starting at address 0x100, which we call the address of that variable. So, knowing the start address and the type, we can 'visualize' the memory space used by that variable and verify that it uses four consecutive bytes starting at 0x100.

If we assign a value to x, say 27 (0x1B in hex), with the statement below, the byte at 0x100 holds 0x1B and the bytes at 0x101 to 0x103 hold 0x00:

x = 27;

The LSB (Least Significant Byte) is at the smallest address of the four bytes used by the variable x. This is because I'm assuming a little-endian architecture in this example - the byte order used by the x86 processors in most PCs nowadays.

Now that we have a basic idea about variables, let's talk about pointers. To some extent, pointers are like normal variables. They have a type and a name, and they need memory space to store the data assigned to them.

But things start to change a little when we look at what kind of information a pointer variable stores. Basically, a pointer stores a memory address, so we can partly think of it as an index variable (remember that we can think of memory as a big array). This memory address can be the address of dynamically allocated memory, of a variable, or even of a function - which we can talk about later in another post.

In a 32-bit architecture a memory address is a 32-bit number, so a pointer uses 4 bytes (on a 64-bit system it is typically 8 bytes). Let's go to a simple example. Considering our x variable - whose address is 0x100 - let's assign its address to a pointer:

int *p = &x;

Before going any further, let's briefly talk about the symbols being used here. When declaring a pointer variable in C or C++, what distinguishes it from a normal variable is the asterisk (*) placed to the left of the variable's name. The ampersand (&), when placed to the left of a variable, gets the address of that variable. So in this single line we are declaring an integer pointer called p and assigning to it the address of the variable x.

In our picture now, we have an int variable x, whose address is 0x100, and the int pointer p holding that same address, so p points to x. Since the pointer 'knows' the address where the variable x stores its data, we can manipulate that data indirectly through the pointer. For instance, consider the line below:

*p = 30;

What happens is that the variable x no longer has the value 27; we have changed its contents to 30 through the pointer.

Before ending this very basic discussion about pointers, a little bit of C and C++ notation.

int *p;

Here the * symbol says that p is a pointer (a pointer to int).

p = 30;

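A pointer always interprets the data assigned to it as a memory address. So in this case we are telling the pointer to point to the address 30 - which is likely to cause problems, as we have no idea what is stored there. In fact, C compilers warn about or reject this assignment unless you add an explicit cast such as (int *)30, and C++ compilers reject it outright.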

p = &x;

Now we are assigning the address (&) of x to p. Note that p is written without the *: as said above, everything we assign to the pointer itself is treated as a memory address.

*p = 20;

Now look at the asterisk! It tells the program to store 20 in the memory p points to; we are not telling the pointer to point to the address 20.

So the asterisk, when used in a variable declaration, tells the compiler that the variable is a pointer. When used in an expression in the middle of the program, however, it says to the pointer: put this value in the memory you are pointing to (or read the value stored there)!

I think that is enough for now. More coming soon...

Sunday, 20 February 2011
