C Tutorial, Installment I
Nael Abu-Ghazaleh (based almost entirely on notes and code by Mark Stehlik)
Your first C program
Copy/Download demo1.c from ~nael/public_html/cs220/C into your home directory, and open it in your favorite text editor (emacs :-)).
The #include lines at the top are similar to an import in Java, but they differ significantly. In C, #include is a preprocessor directive: before compilation proper begins, the #include line is replaced (cut and paste) with the text of the indicated file. Thus, a #include increases the size of your compilation source since it causes (much) more code to be compiled. In Java, however, the import statement merely adds the file to the list of places to search when attempting to resolve a symbol.
The main function has some similarity to Java since, like a Java program, a C program must have exactly one main function (now that we are in C, we refer to what you know as methods as functions from now on). The difference is that while each class in a multi-class Java program can have its own main method, and you indicate at run-time which class has the main method that you wish to start from (and all the others are never executed), in C there can be only one main function anywhere in the set of files that are compiled and linked into a program. main returns an int, but this return value is mostly moot for our purposes, since we do no script programming that might examine and use the value returned by main. We will defer our explanation of argc and argv until we study pointers, strings and arrays in a few lectures. Suffice it to say, they need to be there, just like the signature for main in Java (public static void main (String[] args)).
Variable declarations
Variable declarations are similar to Java. However, in C the compiler does not guarantee initialization. Newly declared local variables contain garbage values and must be explicitly initialized before use! The C language also does not standardize the storage size of the types the way Java does. One C compiler might store ints in 32 bits while another uses 16 bits per int - or even 64 bits. Currently, most C compilers you're likely to run into use 32 bits for an int. Thus, an int is (usually) a 32-bit signed value having a range of -2,147,483,648 to +2,147,483,647. An unsigned int is then a non-negative 32-bit value ranging from 0 to 4,294,967,295 (about 4 billion). C has floating point types float and double. It is common practice in C (as in Java) to just use the double type with its increased precision for floating point numbers to minimize accumulated rounding errors. Since the representations are not guaranteed across platforms, C provides the sizeof operator, which operates on a data type (or expression) such as int, char, float, etc. and produces the number of bytes used by that type (or expression). C also provides a header, limits.h, that defines range values for the integer types on the platform you're on.
Lastly, look at lims.c, a program that reports the size and range of several data types in C.
The char type is (usually) an 8-bit quantity that stores the ASCII code for a character. Thus a char is just a small int; whether a plain char is signed or unsigned is compiler-dependent. Sometimes a char or array of chars is used to store plain numbers when the range of values is certain not to exceed 8 bits. I would not write such code!
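To make sizeof and limits.h concrete, here is a minimal sketch. It is not the course's lims.c; the function names bits_in_int and report_limits are made up for illustration, and the numbers printed depend on your platform.

```c
#include <limits.h>
#include <stdio.h>

/* Number of bits in an int on this platform: bytes times bits per byte. */
int bits_in_int(void)
{
    return (int)(sizeof(int) * CHAR_BIT);
}

/* Print the size (in bytes) and range of a few integer types.
   sizeof yields a size_t; ANSI C has no printf format for size_t,
   so we cast it to unsigned long and print it with %lu. */
void report_limits(void)
{
    printf("char:         %lu byte,  range %d to %d\n",
           (unsigned long)sizeof(char), CHAR_MIN, CHAR_MAX);
    printf("int:          %lu bytes, range %d to %d\n",
           (unsigned long)sizeof(int), INT_MIN, INT_MAX);
    printf("unsigned int: %lu bytes, range 0 to %u\n",
           (unsigned long)sizeof(unsigned int), UINT_MAX);
    printf("long:         %lu bytes, range %ld to %ld\n",
           (unsigned long)sizeof(long), LONG_MIN, LONG_MAX);
}
```

Calling report_limits() from main prints one line per type. If CHAR_MIN comes out as 0, plain char is unsigned on your compiler.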
ASCII TABLE
As we discussed in class, acceptable casting between types occurs in a manner similar to Java (but with no ClassCastExceptions!). But assignment between dissimilar types is permitted and is not caught by the compiler. In particular, you can assign a double to an int and no one will care (the fractional part will just be truncated). You should, however, make any casts explicit, i.e.,
double d = 3.99; int i = (int)d; /* i is now 3 */
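A tiny sketch of what the explicit cast does. The helper name truncate_to_int is hypothetical. The fractional part is simply discarded (truncation toward zero, not rounding), and the result is only defined when the truncated value fits in an int.

```c
/* Convert a double to an int with an explicit cast.
   The fractional part is discarded: 3.99 becomes 3, -3.99 becomes -3. */
int truncate_to_int(double d)
{
    return (int)d;
}
```

So truncate_to_int(3.99) yields 3, not 4.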
There is NO boolean type in ANSI C (C99 later added _Bool, but we compile with -ansi). Anywhere an expression is expecting a boolean value, such as a conditional test, the value 0 is interpreted as false and ALL other values are interpreted as true.
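A short sketch of how ints stand in for booleans. The name is_truthy is made up for illustration.

```c
/* Return 1 if value would count as true in a conditional test, else 0.
   Only 0 is false; negative numbers are true as well. */
int is_truthy(int value)
{
    if (value) {
        return 1;
    }
    return 0;
}
```

Going the other way, the relational and logical operators themselves produce an int: (5 > 3) has the value 1 and (5 < 3) has the value 0.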
Console I/O
Console I/O can be done via the many print, scan and get functions (see the man pages - RTFM). Our first C program uses:
printf("Please enter an integer: ");
and
printf("x= %d y= %d\n", x, y);
In the first printf, we supply a string literal. What you see is what you get. In the second call to printf, we pass in a format string, followed by two actual arguments (the names of our variables, x and y). The format string contains the literal "x= " followed by a placeholder for an int, %d, followed by literal " y= " and another placeholder for an int. At the end of the format string is an escape sequence, \n, which is the newline char. After the format string come 2 arguments, the value of x and the value of y. For each % placeholder in the format string there must be a corresponding argument supplied after the format string. Your code may compile and run if you omit those arguments but the behavior is unpredictable (welcome to C!).
To read data from the keyboard our first C program uses:
scanf("%d", &x);
And this is where things get interesting. Note the ampersand character (&) before the x variable identifier in our call to scanf. This ampersand character is the address-of operator. This operator, when placed before a variable identifier, produces the address of that variable, rather than the value stored in the variable. This address value is a positive memory address between 0 and however much memory you have in your computer. scanf needs the address of x and not the value of x because scanf wants to know where to store the numeric conversion of the string you typed into the keyboard.
Recall that when we just want the value of x we just use the name x in a statement such as:
x = 15; /* assign a new value into x */
or
y = x + 5; /* look up the value of x */
or
printf("value of x= %d", x); /* look up the value of x */
This is the first time we have ever been concerned with the address of where a variable is stored in memory. Java intentionally shields us from any such concerns. C does not. Thus it is important to understand the distinction between the value of a variable, and the address of a variable. They are not the same! If we want the address we must put the & operator immediately to the left of the variable name.
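The value/address distinction can be sketched with two small functions. Both names (change_copy and store_at) are hypothetical, and the int * parameter uses pointer syntax that we cover properly in a later lecture; treat it here as "a place to hold an address".

```c
/* Receives a COPY of the caller's value.
   Assigning to it changes only the copy, never the caller's variable. */
int change_copy(int value)
{
    value = 15;
    return value;
}

/* Receives the ADDRESS of the caller's variable, so it can store
   a new value there. This is why scanf is handed &x rather than x. */
void store_at(int *address, int new_value)
{
    *address = new_value;
}
```

After int x = 1; the call change_copy(x) leaves x at 1, while store_at(&x, 15) changes x to 15.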
scanf is a value-returning function. It returns the number of successful conversions it makes. If your format string has three % format placeholders then scanf will return a number between 0 and 3 inclusive, depending on how many good values the user provides (or EOF if the input ends before the first conversion). Unlike Java, scanf will not crash or throw any kind of exception if you enter your firstName where a number was expected. It will simply fail to convert that string to a number and fail to store any new value at the address specified! But the program will continue merrily along!
Later on, when you run lims.c, notice the different formats used in the various printfs, such as %d, %li, %u, %e. These are format specifiers that serve as placeholders for the values of the actual arguments of type decimal integer, long int, unsigned int, float (and double) respectively. Look in the K&R textbook on page 154 for a good summary of the data types and format strings used in printf (and its sibling functions such as fprintf, sprintf, etc.).
For now, let's modify demo1.c to prompt for x and y on the same line. Also, let's add some code to report the number of successful conversions. Then we'll run it a few times entering good and bad values.
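Here is one way the conversion count can be observed. This sketch uses sscanf, scanf's sibling that reads from a string instead of the keyboard, so that the behavior can be checked without typing anything; scanf reports its count in the same way. The function name parse_two_ints is made up, and this is not the actual modified demo1.c.

```c
#include <stdio.h>

/* Try to convert two ints from the text in line.
   Returns the number of successful conversions: 2, 1 or 0
   (or EOF if the text runs out before the first conversion).
   A variable whose conversion failed is left unchanged. */
int parse_two_ints(const char *line, int *x, int *y)
{
    return sscanf(line, "%d %d", x, y);
}
```

In demo1.c itself the equivalent would be along the lines of n = scanf("%d %d", &x, &y); followed by a printf that reports n.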
Compilation & Execution
Compilation: gcc -ansi -pedantic -Wall -Wextra demo1.c
(if successful, produces a.out, an executable binary file)
To do this, we'll pop out of emacs by using ctrl-z to pause that process. Once we compile our code, if there are no compilation errors, we can run the binary that's produced. If there are errors, we need to go back into emacs (fg), fix the errors and try it again. We can also compile from within emacs by using ctrl-x ctrl-e.
What's with those compilation options (flags)?
- The -ansi flag tells the compiler to only accept code in
accordance with the rules of ANSI C (89). This disallows, for
example, the use of //-style comments.
- The -pedantic flag tells the compiler to issue all the
warnings demanded by strict ANSI C and reject all programs that use
forbidden extensions.
- The -Wall flag tells the compiler to turn on a large set of
commonly useful warnings (despite its name, not literally all of them).
- The -Wextra flag (same as the older -W flag) tells the compiler to report a few things that -Wall doesn't catch.
Execution: ./a.out (executes the binary file)
The ./ before a.out is needed because it stands
for the current directory (otherwise the shell would search your execution path for
a file called a.out and would likely not find it). You can modify
your .login file to append . to your path.
C (unlike Java) is platform-dependent. Thus we should test code by compiling and executing on several platforms. This practice is a heuristic for maximizing our chances of discovering bugs in our code - particularly those dealing with dynamic memory and pointers. It is very common for your program to appear to run perfectly on one platform but crash (when I test it) on another. Students typically respond with "But it ran fine under Linux!", or, "It ran fine under Cygwin!" Please understand that differing platforms (or even the same platform under different runtime conditions) simply reveal flaws in your code - not flaws in the compiler or the OS. Simply put - you got lucky on Linux but a different OS caught you.
The gcc compiler is usually installed on all UNIX platforms. These gcc compilers are more modern versions which support C and C++ code. In this course we will emphasize coding standards that are specific to strict C and forbid C++ features as described above.
Conditionals and Loops
We began the lecture with a discussion of the conditional and looping statements that C has available (descriptions are available in K&R Chapter 3):
- if ( ) { }
- if ( ) { } else { }
- switch (int-valued expression) { case int-valued constant : stmt; stmt; break; case next-int-valued constant : stmt; stmt; break; ... default : stmt; stmt; }
- for (initialization; termination; update) { }
- while (condition) { }
- do stmt; while (condition);
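A small sketch of these statements in use. All five function names are invented for illustration; each one exercises one statement form.

```c
/* if / else: absolute value. */
int abs_value(int n)
{
    if (n < 0) {
        return -n;
    } else {
        return n;
    }
}

/* switch: days in a month of a non-leap year, 0 for an invalid month.
   Several case labels can share one group of statements; each group
   ends with break so control does not fall into the next group. */
int days_in_month(int month)
{
    int days;
    switch (month) {
    case 2:
        days = 28;
        break;
    case 4: case 6: case 9: case 11:
        days = 30;
        break;
    case 1: case 3: case 5: case 7: case 8: case 10: case 12:
        days = 31;
        break;
    default:
        days = 0;
    }
    return days;
}

/* for: sum of the integers 1 through n. */
int sum_to(int n)
{
    int i, total = 0;
    for (i = 1; i <= n; i++) {
        total += i;
    }
    return total;
}

/* while: the test comes first, so the body may run zero times. */
int halvings(unsigned int n)
{
    int count = 0;
    while (n > 0) {
        n /= 2;
        count++;
    }
    return count;
}

/* do-while: the body runs at least once, so 0 has one digit. */
int count_digits(unsigned int n)
{
    int digits = 0;
    do {
        digits++;
        n /= 10;
    } while (n > 0);
    return digits;
}
```

Note that the loop variable i is declared at the top of sum_to rather than inside the for statement; declaring it inside the for is not allowed under -ansi.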
Command-Line Arguments
The last program we looked at examined the parameters of main, argc and argv. The variable argc keeps track of how many command-line arguments are provided (counting the program name itself); argv stores those arguments as strings.
Look at args.c to see how we can examine the command-line arguments. Notice the use of atoi to convert the strings to ints so we can sum them.
Function prototypes
As we look again at args.c, note the use of a function prototype above main. Without this "function declaration", if you will, every function would have to be defined before use as C is a one-pass compilation process. This would place a lot of burden on the programmer (to remember who calls who) and prevents a possibly more natural grouping of functions. Using function prototypes frees the programmer to place the function definitions anywhere in the file and in any order.
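The idea can be sketched as follows. This is not the course's args.c; the names sum_args and sum_is_even are hypothetical. The prototype at the top is what allows sum_is_even to call sum_args even though the definition of sum_args comes later in the file.

```c
#include <stdlib.h>

/* Function prototype: tells the compiler the return type and
   parameter types of sum_args before its definition appears. */
int sum_args(int argc, char *argv[]);

/* Defined BEFORE sum_args, yet it can call it thanks to the prototype. */
int sum_is_even(int argc, char *argv[])
{
    return sum_args(argc, argv) % 2 == 0;
}

/* Sum the command-line arguments as ints.
   argv[0] is the program name, so we start at index 1. */
int sum_args(int argc, char *argv[])
{
    int i, total = 0;
    for (i = 1; i < argc; i++) {
        total += atoi(argv[i]);
    }
    return total;
}
```

From main you would simply pass along main's own argc and argv. Keep in mind that atoi returns 0 for a string that does not start with a number, so it cannot tell "0" apart from "hello".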
Function prototypes will be revisited later when we discuss function pointers.
Bit-wise Operators
There are a number of operators in C that operate on the bit representation of your data. Many have names that are similar to logical operators, but remember that bit operators operate on bits, not booleans-implemented-as-integers! In particular, the bit operators we will examine are:
- bit-wise NOT (~, not !)
- bit-wise OR (|, not ||)
- bit-wise AND (&, not &&)
- bit-wise XOR (^)
- << and >>, left-shift and right-shift respectively
NOT
A bit-wise NOT or complement is a unary operation which performs logical negation on each bit. 0 digits become 1, and vice-versa. For example:
NOT 0111 = 1000
In C, the NOT operator is "~" (tilde). For example:
x = ~y;
assigns x the result of "NOT y". This is different from the C logical "not" operator,
"!" (exclamation point), which treats the entire numeral as a single Boolean value.
For example:
x = !y;
assigns x a Boolean value of "true" if y is "false", or "false" if y is "true".
In C, a numerical value is interpreted as "true" if it is non-zero.
The logical "not" is not considered a bit-wise operation, since it
does not operate at the bit level.
Bit-wise NOT is useful in finding the one's complement of a binary numeral.
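A quick sketch contrasting the two operators. The names not4 and logical_not are made up. Since a real unsigned int has more than four bits, not4 masks the result down to the low four bits so it can be compared with the 4-bit example.

```c
/* Bit-wise NOT of y, keeping only the low four bits.
   ~ flips every bit of the unsigned int; & 0xFu discards all but four. */
unsigned int not4(unsigned int y)
{
    return ~y & 0xFu;
}

/* Logical not: treats y as one Boolean value.
   Yields 1 if y is zero and 0 for any non-zero y. */
int logical_not(unsigned int y)
{
    return !y;
}
```

With y = 7 (0111), not4(y) is 8 (1000) while logical_not(y) is 0.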
OR
A bit-wise OR takes two bit patterns of equal length, and produces another one of the same length by matching up corresponding bits (the first of each, the second of each, and so on) and performing the logical OR operation on each pair of corresponding bits. In each pair, the result is 1 if either (or both) of the corresponding bits is 1. Otherwise, the result is zero. For example:
0101 OR 0011 = 0111
In C, the bit-wise OR operator is "|" (vertical bar). For example:
x = y | z;
assigns x the result of "y OR z". This is different from the C logical
"or" operator, "||" (two vertical bars), which treats its operands as Boolean values,
and returns "true" (non-zero) or "false" (zero).
The bit-wise OR may be used in situations where a set of bits are used as flags. The bits in a single binary numeral may each represent a distinct Boolean flag. Applying the bit-wise OR operation to the numeral along with a bit pattern containing 1 in some positions will result in a new numeral with those bits set. For example:
0010
can be considered as a set of four flags. The first, second, and fourth flags are not set (0); the third flag is set (1). The first flag may be set by applying the bit-wise OR to this value, along with another value in which only the first flag is set:
0010 OR 1000 = 1010
This technique may be employed by programmers who are working under restrictions of memory; one bit pattern can represent the states of several independent variables at once.
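The flag-setting example can be sketched like this. The FLAG_ names and both function names are invented for illustration; "first flag" means the leftmost of the four bits, matching the picture above.

```c
/* One mask per flag, numbered left to right as in the 4-bit picture. */
#define FLAG_FIRST  0x8u  /* 1000 */
#define FLAG_SECOND 0x4u  /* 0100 */
#define FLAG_THIRD  0x2u  /* 0010 */
#define FLAG_FOURTH 0x1u  /* 0001 */

/* Bit-wise OR: turn on every flag that is 1 in mask, leave the rest alone. */
unsigned int set_flags(unsigned int flags, unsigned int mask)
{
    return flags | mask;
}

/* Logical OR, for contrast: the answer is always just 0 or 1. */
int logical_or(int a, int b)
{
    return a || b;
}
```

set_flags(FLAG_THIRD, FLAG_FIRST) reproduces 0010 OR 1000 = 1010. Setting a flag that is already set changes nothing.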
AND
A bit-wise AND takes two bit patterns of equal length and performs the logical AND on each pair of corresponding bits. In each pair, the result is 1 if the first bit is 1 AND the second bit is 1. Otherwise, the result is zero. For example:
0101 AND 0011 = 0001
In C, the bit-wise AND operator is "&" (ampersand). For example:
x = y & z;
assigns x the result of "y AND z". This is different from the C logical "and"
operator, "&&", which takes two logical operands as input and produces a result of "true" (non-zero)
or "false" (zero).
The bit-wise AND may be used to clear particular bits in a pattern or perform a bit mask operation. A masking operation may be used to isolate part of a string of bits, or to determine whether a particular bit is 1 or 0. For example, given a bit pattern:
0101
To determine whether the third bit is 1, a bit-wise AND is applied to it along with another bit pattern containing 1 in the third bit, and 0 in all other bits:
0101 AND 0010 = 0000
Since the result is zero, the third bit in the original pattern was 0. Using bit-wise AND in this manner is called bit masking, by analogy to the use of masking tape to cover, or mask, portions that should not be altered, or that are not of interest. In this case, the 0 values in the masking operand mask the bits that are not of concern (all but the third bit).
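A sketch of masking with AND, using hypothetical helper names. Testing a bit uses the mask directly; clearing a bit uses the complement of the mask, so that the bit of interest is the only one forced to 0.

```c
/* Return 1 if any bit selected by mask is set in flags, else 0. */
int is_flag_set(unsigned int flags, unsigned int mask)
{
    return (flags & mask) != 0;
}

/* Clear the bits selected by mask and keep all other bits.
   ~mask has 0 exactly where mask has 1. */
unsigned int clear_flags(unsigned int flags, unsigned int mask)
{
    return flags & ~mask;
}
```

With the pattern 0101 (5), is_flag_set(5, 2) is 0 because 0101 AND 0010 = 0000, which is the third-bit test from the text.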
XOR
A bit-wise XOR takes two bit patterns of equal length and performs the logical exclusive OR operation on each pair of corresponding bits. The result in each position is 1 if the corresponding bits are different, and 0 if they are the same. For example:
0101 XOR 0011 = 0110
In C, the bit-wise XOR operator is "^" (circumflex). For example:
x = y ^ z;
assigns x the result of "y XOR z".
Assembly language programmers sometimes use the XOR operation as a short-cut to set the value of a register to zero. On many architectures, the XOR operation requires fewer CPU clock cycles than the sequence of operations that may be required to load a zero value and save it to the register. Using a given value as input to both sides of the bit-wise XOR operation always results in an output of zero; by XORing a register with itself, that register can be easily set to zero.
The bit-wise XOR may also be used to toggle (invert) flags in a set of bits. Given a bit pattern:
0010
The first and third bits may be toggled simultaneously by a bit-wise XOR with another bit pattern containing 1 in the first and third positions:
0010 XOR 1010 = 1000
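In C the toggle looks like this; toggle_flags is a made-up name. Applying the same mask twice restores the original pattern.

```c
/* Bit-wise XOR: invert every bit that is 1 in mask, keep the others. */
unsigned int toggle_flags(unsigned int flags, unsigned int mask)
{
    return flags ^ mask;
}
```

toggle_flags(0x2, 0xA) reproduces 0010 XOR 1010 = 1000, and XORing any value with itself gives 0, which is the register-clearing trick mentioned above.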
XOR swap algorithm
Standard swapping algorithms require the use of temporary storage. Here is one such algorithm to swap x and y:
Copy the value of y to temporary storage: temp = y
Assign y to get the value of x: y = x
Assign x to get the temporary storage value: x = temp
If the two variables x and y are of type integer, an arithmetic algorithm to swap them is as follows:
x = x + y; y = x - y; x = x - y;
The above algorithm breaks down on systems that trap integer overflow.
Also, when x and y are aliased to the same storage location the result is
to zero out that location. Using the XOR swap algorithm, however,
neither temporary storage nor overflow detection are needed. However,
the problem still remains that if x and y use the same storage location,
the values will be zeroed out. The algorithm is as follows:
x = x XOR y; y = x XOR y; x = x XOR y;
This algorithm typically corresponds to three machine code instructions and thus is particularly attractive to assembly language programmers due to its performance and efficiency. It eliminates the use of an intermediate register, which is a limited resource in assembly language programming. It also eliminates two memory access cycles, which are expensive compared to a register operation.
Explanation of the algorithm
For example, let's say we have two values X = 12 and Y = 10. In binary, we have
X = 1 1 0 0
Y = 1 0 1 0
Now, we XOR X and Y to get 0 1 1 0 and store in X. We now have
X = 0 1 1 0
Y = 1 0 1 0
XOR X and Y again to get 1 1 0 0 - store in Y. We now have
X = 0 1 1 0
Y = 1 1 0 0
XOR X and Y again to get 1 0 1 0 - store in X. We ultimately have
X = 1 0 1 0
Y = 1 1 0 0
The values are swapped, and the algorithm has worked (at least this time)!
In general, if we call the initial value of X = x and the initial value of Y = y, then performing the above steps (and remembering that a XOR a == 0 and b XOR 0 == b), yields:
x = x XOR y;    X == x XOR y                 Y == y
y = x XOR y;    X == x XOR y                 Y == x XOR y XOR y == x
x = x XOR y;    X == x XOR y XOR x == y      Y == x
Code example
/* C code to implement an xor swap: */
void xorSwap(int *x, int *y)
{
    if (x != y) {
        *x ^= *y;
        *y ^= *x;
        *x ^= *y;
    }
}
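To see why the x != y test matters, the sketch below repeats xorSwap next to a version without the guard. The unguarded function is hypothetical and exists only to show the aliasing problem; do not use it.

```c
/* Guarded XOR swap, as above: does nothing if both pointers
   refer to the same int. */
void xorSwap(int *x, int *y)
{
    if (x != y) {
        *x ^= *y;
        *y ^= *x;
        *x ^= *y;
    }
}

/* Unguarded version, for illustration only. If x and y point at the
   same int, the first step computes value XOR value, which is 0,
   and the original value is lost. */
void xorSwapUnguarded(int *x, int *y)
{
    *x ^= *y;
    *y ^= *x;
    *x ^= *y;
}
```

With a = 12 and b = 10, xorSwap(&a, &b) leaves a = 10 and b = 12. xorSwap(&a, &a) leaves a alone, while xorSwapUnguarded(&a, &a) sets a to 0.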
Bit Shift (Arithmetic shift)
The bit shift is sometimes considered a bit-wise operation, since it operates on a set of bits. Unlike the above, the bit shift operates on the entire bit-string, rather than on the individual bits. In this operation, the digits are moved, or shifted, to the left or right. Registers in a computer processor have a fixed number of available bits for storing numerals, so some bits may be shifted past the "end" of the register; the different kinds of shift typically differ in what they do with the bits that are shifted past the end.
Left shift "<<" and right shift ">>"
For example:
0111 LEFT-SHIFT = 1110
0111 RIGHT-SHIFT = 0011
In the first case, the left-most 0 was shifted past the end of the register, and a 0 was put into the right-most position. In the second case, the rightmost 1 was shifted past the end (and is often in the carry flag though that can't usually be accessed in high level languages), and the sign bit, 0, was copied into the leftmost position.
You should ALWAYS use unsigned values as arguments to the shift operators. Consider what happens when you RIGHT shift an unsigned value - you are guaranteed to put a zero in the left most bit(s). No problem. But if you RIGHT shift a signed value you cannot be sure whether the sign bit will be replicated or whether a zero will be replicated. The behavior is platform-dependent and thus should be avoided by always shifting unsigned values.
In C, the left and right shift operators are "<<" and ">>", respectively. The number of places to shift is given as an argument to the shift operators. For example:
0111 LEFT-SHIFT-BY-TWO = 1100
x = y << 2;
assigns x the result of shifting y to the left by two digits.
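NOTE: For unsigned values, a left shift by one position is equivalent to multiplying by two (provided the value does not overflow), while a right shift by one position is equivalent to dividing by two and rounding down (i.e., x / 2). Shifting by n positions multiplies or divides by 2^n.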
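A short sketch of the shift operators on unsigned values; the two function names are made up. The shift count must be smaller than the number of bits in an unsigned int, otherwise the behavior is undefined.

```c
/* Shift value left by places bits; zeros enter on the right.
   Assumes places is less than the width of unsigned int. */
unsigned int shift_left(unsigned int value, unsigned int places)
{
    return value << places;
}

/* Shift value right by places bits. Because value is unsigned,
   zeros are guaranteed to enter on the left. */
unsigned int shift_right(unsigned int value, unsigned int places)
{
    return value >> places;
}
```

Because a real unsigned int is wider than four bits, shift_left(7, 1) is 14 (the leading bits are not lost as in the 4-bit picture); masking with & 0xF recovers the 4-bit result 1110.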
Power Sets and Bit Maps
There are a few ways to generate the power set (the set of all subsets) of a set: writing nested loops (awkward), using recursion (more elegant but still not the simplest), or using bitmaps (easier). Remember, the cardinality of the power set of a set with N elements is 2^N. This should give you a bit of a hint as to how to generate the power set.
What is a bitmap?
A bit map is an association we make between a bit (1/0 or TRUE/FALSE) and an element of a set. The association is simple: for every element in the set we assign one bit to correspond to one element of the set. The correspondence is defined as a 1 indicating inclusion into the subset, and a 0 meaning exclusion from the subset. If our set has N elements then we need N bits to map each set element. An example:
original set:         { 12 21 13 31 14 41 15 51 16 61 17 71 18 81 77 34 }
our bitmap:              1  0  1  1  0  1  1  1  1  0  1  1  0  0  1  0
the subset indicated: { 12    13 31    41 15 51 16    17 71       77    }
Notice that only where the associated bit is a 1, do we include that element in our subset. Bits that are 0 cause the exclusion of their associated set element.
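The mapping for a single bitmap can be sketched as follows. The function name select_subset and its parameters are invented for illustration, and it assumes n is at most 16 so that the bitmap fits in any unsigned int. To match the picture, the leftmost of the n bits corresponds to the first element of the set. The example bitmap 1011011110110010 is 0xB7B2 in hex.

```c
/* Copy into subset[] every element of set[] whose bit is 1 in bitmap.
   Element i (counting from 0 on the left) is paired with bit n-1-i,
   so the leftmost of the n bits belongs to set[0].
   Returns the number of elements copied.
   Assumes 0 <= n <= 16 and that subset[] has room for n ints. */
int select_subset(const int set[], int n, unsigned int bitmap, int subset[])
{
    int i, count = 0;
    for (i = 0; i < n; i++) {
        /* Slide bit n-1-i down to the lowest position and isolate it. */
        unsigned int bit = (bitmap >> (n - 1 - i)) & 1u;
        if (bit) {
            subset[count++] = set[i];
        }
    }
    return count;
}
```

Called with the 16-element set above and the bitmap 0xB7B2, it fills subset[] with the ten elements shown and returns 10.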
The remaining question is: what tools does C give us to set up this mapping and generate all the possible subsets? C gives us the unsigned integer type (which we can think of as an array of bits), the bit-wise AND operator (the single ampersand & used as a binary operator - recall that the single ampersand & used as a unary operator is address-of). C also provides us with some bit-shifting operators: << and >>. We can use these tools to determine if any specific bit in an integer is a 1 or a 0.
As an example, if you wanted to take an integer and print out its binary bit pattern from left to right you could use a loop as illustrated by the following: