C++CharacterArrays

From HerzbubeWiki

This page illustrates the use in C++ of character arrays, character pointers and arrays of character pointers.

I transcribed the information on this page from an old, old piece of paper with handwritten notes, jotted down to clarify my thoughts when I was still relatively new to C++ and had to deal with the confusing topic for the umpteenth time. There is probably a lot more to be said about the topic, but I think this page should cover the basics, and that's sufficient for now.


General notes

  • The asterisk (*) operator dereferences something, i.e. it "gets" the thing that the pointer references
  • The ampersand (&) operator obtains the address of something, i.e. the result is a "pointer to something"
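  • In most expressions an array variable is converted by the compiler into a pointer to the first element of the array (the element at index position zero). The exceptions that matter on this page are sizeof and the ampersand (&) operator, which both operate on the array as a whole.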


Character Arrays

This is how a character array variable is declared and initialized:

char x[7] = "foobar";

Notes:

  • Although the string represented by the character array consists of only 6 characters, the array must be declared with size 7 to hold the implicit string terminating null byte


This is how the variable and the character array look in memory. The numbers on the second line express the memory addresses where all the items are stored.

Variable x        Character array
                  2000
has no memory     +---+---+---+---+---+---+----+
representation    | f | o | o | b | a | r | \0 |
                  +---+---+---+---+---+---+----+


And here are a number of expressions and what they evaluate to. Very important: According to the C and C++ language standards, an array variable (in this case x) is converted in most expressions into a pointer to the first element of the array. This can be expressed like this: &x[0]. Only the characters themselves are stored in memory; there is no separate pointer variable that holds the address of the array.

&x[0]            is equal to the memory address 2000
x                is equal to the memory address 2000, because by definition x resolves to &x[0]
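&x               is equal to the memory address 2000 as well, but for a different reason: &x is a pointer to the whole array (type char (*)[7]), and the array begins at the same address as its first element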

x[0]             is equal to the character 'f'
*x               is equal to the character 'f', because by definition x resolves to &x[0]

sizeof(x)        is equal to 7, because sizeof operates on the whole array, not on a pointer to its first element
sizeof(x[0])     is equal to 1

printf("%s", x)  results in "foobar" being printed to the console


Character Pointers

This is how a character pointer variable is declared and initialized:

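const char* x = "foobar";

Notes:

  • The const qualifier is required since C++11: a string literal is an array of const char, and the old implicit conversion to a plain "char*" was removed from the language. The characters of a string literal must not be modified.
  • Many people write the declaration like this: "const char *x". Both forms mean exactly the same to the compiler, although the asterisk formally belongs to the variable name, which matters when several variables are declared in one statement. It is my personal preference, however, to add the asterisk to the type, not the variable name - in my mind it expresses far more clearly that the variable has the type "char pointer".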


This is how the variable and the character array look in memory. The numbers on the second line express the memory addresses where all the items are stored.

Variable x        Character array
1000              2000
+-----+           +---+---+---+---+---+---+----+
| *-- | --------> | f | o | o | b | a | r | \0 |
+-----+           +---+---+---+---+---+---+----+


And here are a number of expressions and what they evaluate to. Very important: Unlike an array variable (see the first section of this page), the pointer variable has a representation in memory.

x                is equal to the memory address 2000, because that is the content of the pointer variable
&x               is equal to the memory address 1000, because pointer variables have a representation in memory

*x               is equal to the character 'f'
x[0]             is equal to the character 'f'

&x[0]            is equal to the memory address 2000

sizeof(x)        is equal to 8; the result here is architecture dependent, it shows how much memory is required to store a pointer
sizeof(x[0])     is equal to 1

printf("%s", x)  results in "foobar" being printed to the console


Arrays of Character Pointers

This is how an "array of character pointers" variable is declared and initialized:

const char* x[6] = { "abc", "def", "ghi", "jkl", "mno", "pqr" };

Notes:

  • Many people write the declaration like this: "const char *x[6]". Both forms mean exactly the same to the compiler. It is my personal preference, however, to add the asterisk to the type, not the variable name - in my mind it expresses far more clearly that each array element has the type "char pointer".
  • My handwritten notes declared the array without const, i.e. as "char* x[6]". Compilers used to accept this, but the implicit conversion from a string literal to "char*" was deprecated in C++98 and removed in C++11, so current compilers warn about it or reject it. Adding const fixes the declaration.


This is how the variable and the character arrays look in memory. The numbers on the second line of each array element express the memory addresses where all the items are stored.

x[0]              Character array
1000              2000
+-----+           +---+---+---+----+
| *-- | --------> | a | b | c | \0 |
+-----+           +---+---+---+----+
x[1]
1008              3000
+-----+           +---+---+---+----+
| *-- | --------> | d | e | f | \0 |
+-----+           +---+---+---+----+

[...]


And here are a number of expressions and what they evaluate to. Very important: As explained in the first section on this page (the one about character arrays), an array variable is converted in most expressions into a pointer to its first element, and no separate pointer variable for the array exists in memory. Unlike in that first section, the elements of the array here are character pointers. The addresses shown assume that a pointer occupies 8 bytes.

&x[0]            is equal to the memory address 1000
&x[1]            is equal to the memory address 1008; the result here is architecture dependent, it shows how much memory is required to store a pointer

x                is equal to the memory address 1000, because by definition x resolves to &x[0]
&x               is equal to the memory address 1000 as well; &x is a pointer to the whole array, which begins at the same address as its first element
x+1              is equal to the memory address 1008; equivalent to &x[1], but uses pointer arithmetic

x[0]             is equal to the memory address 2000
x[1]             is equal to the memory address 3000

*(x[0])          is equal to the character 'a'
*(x[1])          is equal to the character 'd'

*x               is equal to the memory address 2000; equivalent to x[0]
*(&x[0])         is equal to the memory address 2000; equivalent to x[0]