Lab 101: Real MIPS Programming

This week, you will learn some more about MIPS programming, in part by using a real MIPS processor rather than the SPIM or MARS simulators. There is not a lot of programming to do. You should be able to finish, or at least come very close, before the end of the lab meeting. This is by design, because you have an exam to worry about for next week.

Parts of this lab will require you to answer questions and generate several files. Create a directory where you will accumulate these files and open a plain text document lab101.txt where you will answer the questions. Start out by putting your name(s) and the lab number at the top of that file.

Thanks to our School of Science network administrators, an old computer with a genuine MIPS processor has been resurrected and made available on the network for our use. The machine is called indy.cs.siena.edu. It is a Silicon Graphics (SGI) Indy, a low-end graphics workstation from the mid-1990s. It runs IRIX, SGI's proprietary flavor of Unix.

The MIPS system is on the School of Science (SoS) network, but it is in no way integrated with our regular systems. Your login, password, and file space there are completely local to the machine.

To log in, we will need to use the old telnet protocol, an insecure predecessor to ssh. From a computer on the SoS network, run telnet indy.cs.siena.edu.

Your username is your SoS username, and your initial password is a capital P followed by your first and last initials, followed by your 9-digit Siena ID.

This will allow the system to understand the Linux terminal windows we use to connect via telnet.

Once you are logged in, you can change your password, if you like, with the command passwd.

Run the command uname -a to get some information about the system. Paste the output of this command into your file lab101.txt.

indy.cs.siena.edu has the standard selection of Unix command-line utilities, but has very little other software installed. Therefore, we will generally edit files on the lab machines and transfer files between there and indy.cs.siena.edu as needed.

Again, these file transfers will use an older protocol: ftp. From a computer on the SoS network, run ftp indy.cs.siena.edu.

After you provide your username and password, you will be issued a prompt ftp>. The command ? will list all of the commands that ftp understands, but a few you should be aware of include:

A collection of files you will use in the remainder of this lab is located in the directory /home/jteresco/public/lab101 on indy.cs.siena.edu. Copy these to your directory on indy.cs.siena.edu. A regular old cp command will do this; ftp is not needed here.

One of the files you copied from /home/jteresco/public/lab101 is hellomips.c, a simple "Hello, World" C program.

First, let's make sure we can compile and run the C program. As on our Linux systems, we have the gcc compiler on indy.cs.siena.edu, so we can compile hellomips.c into an executable program a.out with the command gcc hellomips.c.

You've certainly seen similar programs before: we looked at a MIPS assembly "Hello, World" program and ran it in SPIM a few weeks ago. A slightly updated version of that program is in hellospim.s. This version can be assembled by the MIPS assembler (called as), but the result does not run correctly. We can at least give it a try:

This will produce an a.out file that we can run, but the real MIPS processor does not understand our syscall. The syscall mechanism exists in both of the simulators and on real MIPS processors, but the codes and usage are different.

Next, let's see what MIPS assembly code the compiler generates when it compiles the hellomips.c program. Normally, we do not pay any attention to the fact that our C compiler first generates assembly code (the compilation step), then assembles that code into object code. That object code is then used as input for linking with other object code to produce an executable file.

We can ask gcc to stop after the compilation to assembly code by providing the -S flag, as in gcc -S hellomips.c.

This will produce a hellomips.s that we can then pass to the assembler to generate an object file:

and then link it to an executable (linking with printf from the standard C library, which gcc will take care of for us) with the command:

We can also take a look at the assembly code itself. First, transfer (with ftp) a copy of hellomips.s to your directory back on the Linux system. Then you can use better tools like emacs or whatever else you'd like to look at it.

Next, let's see if we can pare down this assembly file to its essentials. The compiler is designed to generate assembly code for situations much more complicated than a simple "Hello, World" program, so there's a good chance a lot of what it generated is irrelevant here.

Copy the file hellomips.s to helloshort.s. First, make sure you can still assemble, link, and run the unmodified version. If so, then you can go about removing (better: commenting out) lines that you suspect might be unimportant. Only do one or two lines at a time and then retry it. You may wish to do this by editing helloshort.s on the Linux system and transferring the file to indy.cs.siena.edu repeatedly for testing.

When you believe you have the shortest assembly file that still works (and it should be very short), save a copy (you'll be submitting it).

First, compile and link the two C programs print17.c and seventeen.c on indy.cs.siena.edu into an executable a.out with the single command gcc print17.c seventeen.c.

Now, compile each of these to MIPS assembly source using gcc with the -S flag, obtaining print17.s and seventeen.s.
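For orientation, the two C files are tiny. The actual print17.c and seventeen.c are not reproduced in this handout, so treat the following as a rough sketch of what they might contain; the helper name print17 and the exact output format are assumptions made for illustration.

```c
#include <stdio.h>

/* Assumed shape of seventeen.c: a function that only returns a constant. */
int seventeen(void) {
    return 17;
}

/* Assumed shape of the caller in print17.c (the real file would do this
   in main). Prints the value returned by seventeen() and hands it back. */
int print17(void) {
    int value = seventeen();
    printf("%d\n", value);
    return value;
}
```

Because seventeen() lives in its own file, the linker is what connects the call in the main program to the subroutine. That is why a hand-written replacement only needs to export the same name.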

Again, we can see that the compiler makes things quite a bit more complex than necessary. Instead of paring down the seventeen.s that the compiler generated, let's just write our own very simple version. You should be able to do it with a handful of assembler directives and only 2 instructions; after all, it does nothing but return a constant value! Call it simple17.s, but be sure to name the subroutine seventeen so we can still call it from the main program.

This is an example of where assembly programming "by hand" can improve upon what a compiler can generate. An assembly programmer can write the simplest and most efficient code possible, while a compiler is going to follow rules that need to apply to all situations and may not be able to detect a very simple case (like this one).

Once you've created your version, try it out. To do so, assemble the compiler-generated print17.s and your simple17.s, then link them together into an executable:

Once it works, place your version of simple17.s into your Linux directory for inclusion in your submission.

Finally, let's see how much faster your version of simple17.s is than the compiler-generated seventeen.s. There is a C program print17s.c that will call seventeen() in a loop ten million times. Compile this and link it first with the compiler-generated code for seventeen():

Run the program several times, using the time command to see how long it takes:

The third number in the output tells you how much time the program spent executing on the CPU. Paste the output for the fastest run into your lab101.txt file. Also state why it makes more sense to take the fastest and not the average.
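The timing driver itself is not shown in this handout. A minimal sketch of the idea, assuming a helper named call_many (hypothetical) and a local stand-in for seventeen() so that the example compiles on its own, might be:

```c
/* Stand-in so this sketch is self-contained; in the lab, seventeen()
   comes from seventeen.s or from your simple17.s at link time. */
int seventeen(void) {
    return 17;
}

/* Hypothetical helper: call seventeen() n times and add up the results,
   so the loop has an observable effect that can be checked. */
long call_many(long n) {
    long total = 0;
    long i;
    for (i = 0; i < n; i++) {
        total += seventeen();
    }
    return total;
}
```

Calling call_many(10000000L) from main gives the ten million calls described above. With optimization turned on, a compiler that can see the body of seventeen() may collapse a loop like this; keeping the subroutine in a separate file, as the lab does, avoids that.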

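Next, relink print17s.c with your simple17.s in place of the compiler-generated code for seventeen(). Again run it several times with time and choose the fastest time. Also paste that output into your lab101.txt file.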

For the next task, start with the C file f.c. There's no main function here, so we'll just look at the assembly code. Generate the f.s MIPS assembly code using gcc -S.

One of the things we notice here (and in previous examples) is the use of the frame pointer, register $fp. In many ways, $fp is used similarly to the stack pointer, $sp. In fact, it is always a pointer into the memory that houses the stack. The difference is that while the stack pointer may change during the execution of a subroutine, the frame pointer should remain fixed. This allows it to be used to refer to specific places on the stack that are used for things like local variables. If space is allocated on the stack for variables, we can determine a fixed offset from the frame pointer where those variables are located.

In this example, the compiler has assigned space on the stack to hold the local variables ga and gb at 24($fp) and 28($fp), respectively. Additionally, it has assigned space on the stack to hold copies of our parameters a and b at 48($fp) and 52($fp). By using this stack space, the compiler is certain that those values (unlike those in registers) cannot be modified unexpectedly by subroutine g.

So far, we have assumed that all local variables are simply assigned registers and we're done. This is only possible when we have enough registers to hold all of the variables that need to exist simultaneously in our subroutine. Moreover, we cannot assume that values in t-registers will be preserved across calls to other subroutines. So sometimes, we need space on the stack for local variables.
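To make this concrete, here is a guess at the general shape of a function like the one in f.c. The real f.c is not reproduced here; only the names g, a, b, ga, and gb come from the discussion above. The function name f is assumed from the file name, and the bodies are invented for illustration.

```c
/* Invented stand-in for the subroutine g; the real one is not shown. */
int g(int x) {
    return x + 1;
}

/* Assumed shape of f: two locals that each hold a result from g. */
int f(int a, int b) {
    int ga = g(a);   /* b must survive this first call to g */
    int gb = g(b);   /* ga must survive this second call to g */
    return ga + gb;
}
```

Each value that has to outlive a call to g is a candidate for a stack slot at a fixed offset from $fp, since the unoptimized compiler cannot count on a t-register keeping it.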

shortf.c is a modification of f.c that implements the function in a single line of code:

Generate shortf.s for this modified version, and answer these questions in lab101.txt:

It should be pointed out here that we are using gcc to generate MIPS assembly code without any compiler-driven optimizations. Any modern compiler, including gcc, is capable of generating better code than we have seen here. We can see what it does on the original f.c file (note the "O" in -O2 is the letter "O" not the number "0"):

Answer the following questions about the optimized version of f.s in lab101.txt:

Our last example can be found in qpr.c. This contains a simple C function that takes two parameters and computes the sum of their quotient and their remainder when the first is divided by the second.
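A function matching that description could be as short as the sketch below. The actual qpr.c is not reproduced here, so the parameter names and types are assumptions.

```c
/* Sum of the quotient and the remainder when a is divided by b.
   The caller must make sure b is nonzero. */
int qpr(int a, int b) {
    return a / b + a % b;
}
```

For example, qpr(17, 5) is 3 + 2, or 5.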

Generate the MIPS assembly for this function using gcc -S. Note that the function uses two pseudoinstructions you have not seen: div and rem, which compute the quotient and remainder, respectively. However, each of these pseudoinstructions involves using the real instruction div, which computes both the quotient and the remainder, but places them into special registers called LO and HI. The pseudoinstructions as seen in the code will each result in two real instructions being executed:

As assembly programmers, however, we can notice that the first div instruction has both pieces of information available. In lab101.txt, describe how you can rewrite the main part of qpr to use only one div instruction and still get the right answer.

This is an example where a compiler is severely limited in its ability to perform an optimization that is readily apparent to an assembly programmer who can take a look at the meaning of a statement rather than just the individual operations that make it up. Our high-level language (in this case, C) does not have the ability to express "divide this number by that and put the quotient in this place and the remainder in that place" so the code we can write by hand would not be generated.
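As an aside, the C standard library does provide div() in <stdlib.h>, which returns the quotient and remainder together in a div_t structure. It is an ordinary library function rather than an operator, and whether a particular compiler turns a call to it into a single divide instruction is not guaranteed. A sketch, using the hypothetical name qpr_div:

```c
#include <stdlib.h>

/* Hypothetical variant of qpr built on the standard div() function.
   The caller must make sure b is nonzero. */
int qpr_div(int a, int b) {
    div_t d = div(a, b);   /* d.quot and d.rem come from one call */
    return d.quot + d.rem;
}
```

This states the "quotient here, remainder there" intent at the source level, but it does not by itself tell us what instructions end up in the generated assembly.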

By 10:00 AM, Tuesday, October 18, 2011, submit the required files packaged into a single tar file lab101.tar, or as separate attachments, by email to jteresco AT siena.edu.

Grading Breakdown
`lab101.txt`: `uname -a` output	1 point
`hellomips.s` file	2 points
`lab101.txt`: questions about `hellomips.s`	4 points
`helloshort.s` file	4 points
`simple17.s` file	2 points
`seventeen()` timing comparisons	2 points
`lab101.txt`: `f.s` questions	5 points
`lab101.txt`: `shortf.s` questions	4 points
`lab101.txt`: optimized `f.s` questions	2 points
`lab101.txt`: `qpr` rewrite	4 points