Computer Science 400
Parallel Processing and High Performance Computing
Fall 2017, Siena College
You will work individually on this lab.
Getting Set Up
You will receive an email with the link to follow to set up your GitHub repository c-lab-yourgitname for this lab.
Clone your repository onto blizzard or some other Unix system (a Mac terminal or any Linux/Unix would work) where you plan to develop your code for this lab.
Please answer the lab questions in the README.md file.
Unix Commands
The Unix command line is a confusing and frustrating place to work unless you know the commands. But once you get to know them, it's an incredibly efficient way to work. We will work with many of them throughout the course, but for now, your task is to familiarize yourself with some of the most common.
Identify the function of and experiment with these Unix commands (a few of which you have already used):
ls cd cp mv rm mkdir pwd man chmod cat more grep head tail ln find rmdir wc diff scp touch
Using the Unix manual, your favorite search engine, or in discussion with your classmates, determine the answers to these questions:
The C Programming Language
C is a widely-used, general purpose language, well-suited to low-level systems programming and scientific computation. Few languages have maintained popularity for as long as C has.
We will initially study it assuming you have Java experience, focusing on the features that make C significantly different from Java. Fortunately, Java borrowed much of its syntax from C, so it is not difficult for a Java programmer to read most C programs.
C++ is very nearly a superset of C (that is, almost any valid C program is also a valid C++ program, just one that doesn't take advantage of the additional features of C++). C++ adds object-oriented features. In this course, we will look only at C, not C++. That said, the parallel processing tools we will use mostly work with C++ as well, so you are welcome to write in C++ if you wish.
We saw in our setup lab how to compile and run a "Hello, World" program. You used the gcc command to produce an executable file a.out. Your repository for this lab includes a similar program, hello.c. Even in this simple program, there are several things worth noting as a beginning C programmer.
The command
gcc hello.c
is essentially just another program that you can run at the command prompt. We run a program named gcc, which is a free C compiler, part of the GNU Compiler Collection.
This example uses the gcc command in its simplest form, where it is used to compile a complete C program that is contained in a single file. In this case, we're asking gcc to compile a C program (the source code) found in the file hello.c. Since we didn't specify what to call the executable program produced, gcc produces a file a.out. The name is a.out for historical reasons, and stands for "assembler output".
This is analogous to a Java program consisting of one class (let's say it's the public class Hello in Hello.java in the same example directory as our hello.c) that has nothing but a main method. There is an important difference, however. In Java, when you compile, either by pressing a button in your IDE or at the command line with
javac Hello.java
the file produced is Hello.class, which needs to be run inside a Java Virtual Machine (JVM):
java Hello
It cannot run directly on the computer's hardware. The program java, the implementation of the JVM, runs directly on the hardware, but that program runs the Java program on our behalf.
Executables and Search Paths
But... when we compile the hello.c program, the a.out file produced is an actual executable program that runs on the hardware.
To understand how we run the program and why it's done that way, we need to understand how Unix shells run any program. Basically, to run a program we type its name. But the names it recognizes are only those programs that exist in a set of directories on the system called the search path.
The search path is simply a list of directory names, which are searched in the order they're specified for an executable program with the name that was typed at the shell prompt.
The search path is specified using an environment variable. Environment variables are used in Unix to provide information to a variety of programs. We can see the set of environment variables assigned to our shell with the env command. Run the command and redirect its output to a file env.out.
In the file env.out, find the line that specifies the PATH environment variable. This is the list of directories where your shell will look for programs when you type a name at the prompt.
Using ls, look at the contents of some of the directories in your path. Can you find some of the commands you learned earlier in this lab?
So, if we want to figure out which actual executable file will run when we type a name, we can (as the shell would do) search each directory in our search path. The first match we encounter is the one that will execute. That's a lot of work. If we want to know which program will execute when we issue a particular command, we can use the which command to find out.
So when we run one of our own programs, such as the a.out we generated from hello.c, we type its name. But if you do that on blizzard, you will likely get an error message, even though a.out is in your working directory:
[jcool@blizzard ~]$ a.out
bash: a.out: command not found
The problem is that your working directory is not part of your search path! That's why when we ran the program above, we ran it with a slightly different command:
[jcool@blizzard hello]$ ./a.out
Hello, C World!
The "./" before the name tells our shell that we want to run the program in ".", which is the Unix shortcut for specifying our current working directory. We could just as well give an entire absolute path to our program:
[jcool@blizzard hello]$ /home/jcool/parallel/c-lab-jcool/a.out
Hello, C World!
We could have programs in our current directory execute without the "./" or absolute path, but having "." in a search path is generally considered a bad idea.
We'll be writing lots of C programs, and we probably don't want all of our executables to be named a.out. We could certainly rename the ones we want to keep using the mv command. But let's just have gcc produce an executable with the name we want right away:
gcc -o hello hello.c
Here, the executable file produced is called hello because the -o command-line parameter is specified, which tells gcc that the next command-line parameter following the -o should be used as the output file name.
Details of our Simple Program
Finally, we examine the source code for our hello.c program.
At the top of the file, we have a big comment (the equivalent of the class comment in Java) describing what the program does, who wrote it, and when. Your programs should have something similar in each C file.
As with Java, we need to tell C if there are libraries or other code that we will be using within this file. In Java, this is done with import statements, but nothing needs to be imported to use parts of some of Java's core API that fall under the java.lang package, like System and Math. In C, we need to inform the compiler for even things like basic input/output. In this case, our program uses a C library function called printf to print a message to the screen. For C library functions, the needed information is provided in header files, which usually end in .h. In this case, we need to include stdio.h. How do we know? Well, in this case, it's a header file included by nearly every C program, so you'll just get to know it. But in general, we can check the Unix manual with "man 3 printf" and see which header files are listed. We'll learn more about using the Unix manual to find out about C library functions and think more about the actual mechanism employed here later this semester.
Every C program starts its execution by calling the function main. The line
int main(int argc, char *argv[])
is the function header for main. It corresponds very nicely to the typical main method header in a Java application
public static void main(String args[])
and plays the same role. The keyword public is not needed in C, as it has no notion of data protection like Java or C++. The static is not needed because all functions in C are essentially like static methods: they have a global scope and anyone can call them. C's main has an int return instead of void, since C uses the return value of the main function as a return code that the whole program provides to the operating system. The two command-line parameters are provided to main, traditionally declared as argc, the number of command-line parameters (including the name of the program itself), and argv, an array of pointers to character strings, each of which represents one of the command-line parameters. In this case, we don't use them, but they are often listed anyway as here (though they can be omitted if not used). These provide the same information as Java's array of Strings. As we will see soon, C arrays do not come equipped with a length attribute, so argc is needed to tell how many entries exist in the array argv, and string data is represented by a pointer to an array of char, hence the char *.
printf plays the role of Java's System.out.print and prints the string passed as its parameter to the screen. The \n produces a new line. We will see soon that the mechanism for constructing strings to print is quite different from that in Java.
A value of 0 returned from main generally indicates a successful execution, while a non-zero return indicates an error condition. So we return a 0. Many C compilers will also allow main to have a return type of void and no return statement, but the int return type is normally used.
In general, there is a lot of good news for Java programmers learning C. Much of the syntax of Java was borrowed from C, so a lot of things will look familiar. This includes the basics like ;-terminated statements and code blocks enclosed in {} pairs, most of the arithmetic, boolean, and logical operators, the names and syntax of control structures (loops and conditionals), and more. Much of our focus this semester will be on those places where important differences exist.
The biggest difference that is evident in this simple program is that there are no classes and methods, just functions, which can be called at any time. Any information a function needs to do its job must be provided by its parameters or exist in global variables - variables declared outside of every function and which are accessible from all functions.
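The C for loop is much like Java's for loop, except that in C89, the oldest C standard and the default for older versions of gcc, the loop index variable needs to be declared before the loop (C99 and later standards also accept the Java-style declaration). That is, a Java loop that looks like this: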
for (int i=0; i<10; i++) { ... }
would need to have the declaration of i outside of the loop:
int i;
// any other code that happens before the loop
for (i=0; i<10; i++) { ... }
More C Basics
There are many C programming references and tutorials online and you are welcome to look at them. We will refer to some pages on http://www.cprogramming.com/ and elsewhere to help get you up to speed on some C topics.
The printf Function
C's printf function is the primary mechanism for printing to the standard output (terminal). While you are most likely familiar with Java's print and println methods, Java also has a printf method that is very similar to C's. Check out Wikipedia's printf article for some information about C's printf.
-100F = -73.333C
-99F = -72.778C
-98F = -72.222C
...
-10F = -23.333C
-9F = -22.778C
-8F = -22.222C
...
-1F = -18.333C
0F = -17.778C
1F = -17.222C
...
31F = -0.556C
32F = 0.000C
33F = 0.556C
...
998F = 536.667C
999F = 537.222C
1000F = 537.778C
Command-line Parameters
You have likely seen Java applications that take command-line parameters (the String args[] parameter to the main method of a class). A C program that wishes to make use of command-line parameters must declare two parameters to the main function, traditionally named argc and argv.
The parameter argc to the main function is a count of how many command-line strings are included in argv, which is an array of strings.
These are demonstrated in the printargs.c program included in your repository.
Note: argv[0] is not the first parameter, it is the program name itself, and this array entry for the program name is included in the value of argc.
Even when we enter numbers for command-line parameters, the operating system will provide them to your program as strings. So we need to be able to convert strings to a numeric equivalent.
This is demonstrated in the repeat.c program.
Note that the string to integer conversion uses the overly complicated strtol function, which we call and then check for error conditions. There's a lot here we have not yet seen.
One fprintf format string contains %s, which means to expect an additional parameter which is a character string (well, really a pointer to a null-terminated array of char). Here, the string is argv[0], the first entry in argv, which is always the name of the program. This labels the error message with the program name.
Another format string contains two %s's, so we have two additional parameters to fprintf, both pointers to strings.
Formatted Keyboard Input
We have seen how to use the getchar function to get input from the keyboard or redirected from a file, one character at a time. But often, we'd like to read input as words or numbers.
C's standard mechanism for this is the scanf function, as shown in the scanf-example.c program.
The first scanf call can be read as: "read an integer (the %d in the format string), and put it into the place pointed at by the address of x, then return the number of values that matched the input with the correct format." Similarly for the double value using a %lf in the format string.
To pass the address of a variable such as x, we use the & operator. Don't worry, it will make better sense when you see more examples.
When reading into a character string, we do not use the & operator. This is because the name of a C string already is a pointer to the first element in the array. Again, much more on this when we study C pointers in more detail.
Submitting
Your submission requires that all required deliverables are committed and pushed to the master branch of your repository on GitHub.
Grading
This assignment is worth 50 points, which are distributed as follows:
Feature | Value | Score |
Q1: Unix command descriptions | 4 | |
Q2: cd up | 1 | |
Q3: cd home | 1 | |
env.out | 1 | |
Q4: gcc file? | 1 | |
helloloop.c | 10 | |
Q5: no include? | 1 | |
temps.c | 10 | |
argadder.c | 10 | |
inputadder.c | 10 | |
Total | 50 | |