Strings in C
In C, a string is an array of characters that ends with a null character ('\0'). This array can be created
statically or dynamically (for example with malloc) and is accessed via an array name or a pointer variable
that contains its address.
#include <stdlib.h>
#include <stdio.h>

#define N 10

int main(int argc, char *argv[])
{
    char fName[ N ];
    /*
       The memory allocated for the string fName looks like:

       fName: [?][?][?][?][?][?][?][?][?][?]
               0  1  2  3  4  5  6  7  8  9

       The ? means the value in the cell is undefined.
       Whatever was last there is still there.
    */

    printf("Enter your first name: ");
    scanf( "%s", fName );
    /*
       Assume the user types: Timothy
       The memory allocated for the string fName now looks like:

       fName: ['T']['i']['m']['o']['t']['h']['y']['\0'][?][?]
                0    1    2    3    4    5    6    7    8  9

       scanf copies the chars from the keyboard (up to the first whitespace)
       into their respective elements of the array and then adds a null
       character as a terminator. '\0' is really a char with the value 0
       (zero); in an array of chars, null is a byte whose bits are all zero.
       The null terminator must be present in order for printf and the
       string functions to work properly. It follows that a character array
       of length N can safely store at most N-1 characters.
    */
    printf("Your first name is %s\n", fName);
    /* printf stops printing chars when it encounters the null character */

    return 0;
} /* END OF MAIN */
The printf function starts at the first char of the array and continues printing chars until it sees
the null character. The null character is not printed.
A couple of questions arise:
Question #1: What if you enter a string with more than N-1 characters?
In this case scanf keeps copying characters into memory beyond the end of the array and then writes
the null terminator after them.
If the user types BillyJoeBob, then memory now looks like:
fName: ['B']['i']['l']['l']['y']['J']['o']['e']['B']['o']['b']['\0']
         0    1    2    3    4    5    6    7    8    9  ^^^^^^^^^^^ OOPS! We don't (necessarily) own this memory
We have just used the fName variable to write to memory that does not belong to fName! If the memory
occupied by those last 2 bytes was being used by some other variable in our program, we just trashed
its value!
Unfortunately, C is not required to detect such a mistake for us: writing past the end of an array is
undefined behaviour. Your program may crash before the data is finished being copied or, worse yet, it may
continue to run with corrupted memory, which can produce unexpected and unpredictable behaviour later. These
kinds of errors can be very difficult to find since the behaviour may be inconsistent from run to run. If the
program does crash on a Unix-like system, you will probably see an error message such as "Segmentation fault"
(a segfault).
Question #2: What if printf is fed a string that does not have a null terminator?
int i = 0;
char c = 'a';
while (i < N)   /* we hardcode the contents of the string but no null terminator */
    fName[i++] = c++;
And now memory looks like:
fName: ['a']['b']['c']['d']['e']['f']['g']['h']['i']['j']
         0    1    2    3    4    5    6    7    8    9
Our printf function now does not know when to stop: it continues to print characters beyond the end of
the array until it happens to encounter a null byte, OR it crashes somewhere after the end of the array.
Like the overflow in Question #1, this is undefined behaviour.
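Common String Functions
(Formal descriptions are found in the man pages. These functions are declared in <string.h>, so a program
that uses them needs #include <string.h>.)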
size_t strlen(const char *s);
The strlen function returns the number of characters in the string, not including the terminating null
character. The result has type size_t, an unsigned integer type (not necessarily unsigned int).
char* strcpy(char *dest, const char *src);
The strcpy function copies
the contents of one string into another and tacks on the terminating null
character. It returns a pointer to the dest string.
char foo[10];
char bar[10];
strcpy(foo, "Hello");
produces:
foo: ['H']['e']['l']['l']['o']['\0'][?][?][?][?]
       0    1    2    3    4    5    6  7  8  9
Note that strcpy accepts a string literal as its source. The const in the prototype does not mean that src
must be a string literal. It just means that the code inside strcpy promises not to modify the source string
through src.
If code assigns through a const pointer directly (for example src[0] = 'x';), current C compilers such as
gcc and clang reject it with an error. C is still looser than C++, though: copying src into a plain char *
(char *p = src;) usually draws only a warning in C about discarding the const qualifier, after which the
compiler lets you modify the source through p. Those of you familiar with C++ may recall that C++ refuses to
compile that conversion. C, as usual, lets you do something that is inconsistent (and possibly very bad!).
The strcpy function added a null character to the dest string. What really happened here is that the
compiler stored a null character at the end of the literal "Hello" (so the literal occupies 6 bytes), and
strcpy copied it onto dest just like any other null-terminated source.
Note also that strcpy does not require dynamic (malloc'd) memory for the destination. The char *dest
parameter merely specifies that the address of a character (a pointer to char) must be passed in for this
argument. When we pass the array foo declared in main, its name is converted to the address of its first
element, &foo[0]. The array name behaves like a const pointer to char: we can never assign a new array's
address to foo, because in main the name foo is bound for life to the same chunk of memory. That is why we
call foo the name of an array variable rather than a pointer variable.
strcpy(bar, foo);
produces:
bar: ['H']['e']['l']['l']['o']['\0'][?][?][?][?]
       0    1    2    3    4    5    6  7  8  9
It is important to remember that strcpy copies from src into the dest string starting at the address in dest.
strcpy(bar, "Tim");
produces:
                          |<---- unchanged ---->|
bar: ['T']['i']['m']['\0']['o']['\0'][?][?][?][?]
       0    1    2    3     4    5    6  7  8  9
Note that strcpy does not alter the portion of the dest string after the null character deposited by the
copy operation.
strcpy does not do any error checking on its arguments. If the src string is not null-terminated, then
strcpy will keep reading further into memory until it crashes or happens to encounter a null character. If
the dest string is not big enough to hold src, then strcpy will copy chars beyond the end of dest until it
crashes or completes the copy.
Either of these 2 cases is an error condition (undefined behaviour), even though your program might not
actually crash on any particular run.
int strcmp(const char *s1, const char *s2);
The strcmp function behaves much like the compareTo method of Java Strings (or vice versa, since C came
first).
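strcmp compares both strings one char at a time, starting at the addresses passed in. At each position i it
computes the difference s1[i] - s2[i] (comparing the chars as unsigned char values). If the difference is
non-zero, or if a null character is reached in either string, the difference is returned. Many
implementations return exactly that difference, but the C standard only guarantees the sign of the result:
negative if s1 sorts before s2, zero if they are equal, positive if s1 sorts after s2. See the man pages for
a formal description.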
As with all of these string functions, no error checking is done: bad arguments (a NULL pointer, a string
with no null terminator, a destination that is too small) produce crashes or unpredictable behavior.