C Tutorial, Part VI

Structs

The structure (or struct) in C is analogous to a degenerate class (a class with no methods) in Java. Structs have no mechanism for data hiding: all fields (data members, to you Java disciples) defined in a struct are public and accessible to client code via the dot operator. Further, there is no bundling of functions (methods) with the data, although function pointers can be stored as fields. Lastly, since C has no built-in support for OOP, a struct cannot be used to derive other structs. There is no inheritance, only composition.

One other significant difference between a Java class and a C struct: in Java, when you declare a class variable you are creating a reference to an object and, usually, instantiating an object of that type. In C, when you define a struct variable you get the storage for the struct itself, just as with any other ordinary variable. The effect of this is as follows: when you pass a struct into a function as a parameter you are passing a copy of the struct, not a pointer to it. Unlike an array's name (which, in most expressions, is converted to the address of the array's first element), a struct's name is NOT an address. A struct's name behaves like the name of a variable of a primitive type such as int, char, or double. You can, of course, explicitly declare a pointer to a struct, or use the & operator on a struct's name, in order to pass the address of the struct into a function. We will now look at code that declares automatic and dynamic struct variables and passes the structs or their addresses into functions.

structs_1.c

Note that in the sample above we declare a local struct variable and use dot notation to access its fields. The string pointed to by the name field is created at run time, but the struct itself is an automatic (stack-allocated) local variable whose size is fixed at compile time.

You'll notice the use of typedef to simplify the declaration of struct variables. There are LOTS of ways to declare the person struct (with name (char *) and age (int) members):

	struct {
	    char *name;
	    int age;
	} p, p2, p3;  /* storage allocated for each of p, p2, p3: e.g., 8 bytes on a typical 32-bit system, 16 on a typical 64-bit one */
	/* but since this struct has no tag or typedef name, no other variables of this type can be declared anywhere else */

	
	struct Person {
	    char *name;
	    int  age;
	};  /* don't forget the semi-colon! */
	/* no storage allocated */

	struct Person p, p2, p3;  /* storage allocated for p, p2, p3 */

	/* could add */
	typedef struct Person Person;

	/* and then declare the variable as */
	Person p, p2, p3;

	
	struct {
	    char *name;
	    int age;
	} typedef Person;  /* legal but unusual: typedef normally comes first. Don't forget the semicolon! */
	/* no storage allocated */

	Person p, p2, p3;  /* storage allocated for p, p2, p3 */

	
	typedef struct {
	    char * name;
	    int age;
	} Person;  /* don't forget the semi-colon! */
	/* no storage allocated */

	Person p, p2, p3;  /* storage allocated for p, p2, p3 */

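Regardless of how you declare it, the only legal operations on a struct as a whole are the following (note that comparing two structs with == is not one of them):

copying (assigning) it as a single entity (e.g., p = p2;): all the bits of p2 are (shallowly) copied to p, so a pointer field such as name is copied but the string it points to is not. Passing a struct to a function, or returning one, is also a copy.
taking its address (e.g., &p3)
accessing its members (with . or ->, as we'll see below)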

HandsOn: Play around with this code, writing functions that would modify the age field of an arbitrary Person. Here is a final version that has that: structs_1-rev.c.

Now, let's take a look at pointers to structs using

structs_2.c

Note that in demo 2 we introduce the -> operator. This operator is a shortcut that combines dereferencing with field access. Thus, p->field is synonymous with (*p).field. The parentheses around *p are important: since . binds more tightly than *, writing *p.field would mean *(p.field). Clearly the -> notation is simpler, and it is preferred. The rule concerning whether to use . or -> is simple: if you are accessing the struct via its name, use the . (dot) operator to access the fields; if you are accessing the struct via a pointer, use the -> operator.

Caution about structs and pointer arithmetic!

One other warning about structs before we go on. While the fields of a struct are allocated in the same order in which they are declared, alignment rules may force gaps ("padding") into the memory actually allocated, so you can never assume that the sizeof a struct is equal to the sum of the sizes of its individual fields. More specifically, never try to use pointer arithmetic to calculate the address (offset) at which some field starts inside the struct. Of course, if one of the fields is an array you may use pointer arithmetic on (within) the array itself, but you must never use pointer arithmetic to calculate the offset from the start of the struct to where any field starts. Pointer arithmetic is for elements that are guaranteed to be contiguous in storage and homogeneous in size. The fields within a struct are guaranteed to be neither.

Looking at the Person struct, the member name is a char *, i.e., a pointer to char. You should not assume that name occupies the first 4 bytes of the struct (a pointer is typically 8 bytes on a 64-bit system), nor that age begins immediately after name. The name field is guaranteed to start at the very beginning of the struct (C allows no padding before the first member), and it will indeed be at a lower address than the age field, but the compiler reserves the right to insert padding between or after fields as alignment requires. As a result, you cannot assume that the sizeof a struct is the sum of the sizeof's of its constituent fields (although, clearly, the sizeof the struct must be at least this large!). Sometimes the compiler has to pad to ensure that certain types start on a properly aligned boundary (for example, a word boundary). We will cover alignment in machine language in a few classes.

Let's now look at

structs_3.c

And finally, let's look at

structs_4.c and input.txt

Linked Lists

Now that we are familiar with the struct type, we can use it to implement a linked list. A linked list can be thought of as an array that's been blown apart: since the elements are no longer contiguous in memory, we must explicitly chain (link) them together with pointers to the individually malloc'd elements.

Notice in our code sample below that we use a slightly different form of typedef to declare our list element struct. This is needed because in our list element struct declaration we make a (self-)reference to our own struct, and the typedef name is not yet available inside the struct's own body. As you can see, C allows one struct to contain instances of other structs (or pointers to other structs, or to itself). But a struct cannot contain a variable of its own type, only a pointer to its own type. Why? A struct that contained a complete copy of itself would have to be infinitely large, so the compiler could never determine its size. A pointer, on the other hand, has a known size even while the struct it points to is still being defined. Similarly, two different struct types cannot contain instances of each other, as resolving their sizes would require an infinite mutual recursion (they can, however, contain pointers to each other).

Nested Structs

Here is an example of struct composition in the context of a Linked List:

linkedList.c and input.txt

Let's look at the code and make sure we understand what's going on in all those functions. We'll write a few of our own as well. Here's where we ended up:

linkedList-rev1.c

Linked Lists (continued)

We continued to develop our linked list code, adding a recursive append function as well as iterative and recursive delete functions. Our final version of the code can be found in

linkedList-final.c

We also worked with a hash table example. Here is the starter code that we used:

hash-table.c

I augmented this code with a recursive delete function, deleteRecurReturn, that performs the delete through the return value instead of through the parameter. It is thus passed a ListNode * instead of a ListNode **. Make sure you read and understand this function!

Some more information about source code decomposition

Please look at the new file on makefiles and source code decomposition. Here is another example of source code decomposition.

We now step back a minute from arrays and pointers and look at the larger picture and issues of source code decomposition into separate files.

Let's look at arrDemo2.c next. It's the exact same program as one of our old array programs EXCEPT:

The function definitions that operate on arrays are no longer in the same file as main; they've been moved to their own source file: arraylib.c
All the corresponding declarations (prototypes) are also moved to their own file: arraylib.h
That leaves only the main function in the arrDemo2.c file, plus a new line above main: #include "arraylib.h". Note the use of " " instead of < > around arraylib.h. " " denotes that the header file is "local" or programmer-defined (gcc looks for it first in the directory of the file doing the including), and < > denotes that it's a system-defined header (found in the system include directories, such as /usr/include).
We also #include "arraylib.h" at the top of arraylib.c. Why?

There's some tricky-looking stuff in arraylib.h: the #ifndef preprocessor directive. It prevents the contents of the same header from being included more than once in a single compilation (recall that #include amounts to copy and paste). We'll discuss this in lecture.

To successfully compile the arrDemo2.c program, we need to compile both arrDemo2.c and arraylib.c together, i.e., gcc -ansi -pedantic -Wall arrDemo2.c arraylib.c

Why don't we need to list the .h files on our gcc compilation command line?