Lab 1010-- ISA Comparisons

Computer Science 220
Assembly Language & Computer Architecture

Fall 2011, Siena College

Lab 1010: ISA Comparisons
Due: 4:00 PM, Monday, December 12, 2011

During this week's lab meeting, you will learn about instruction set architectures other than MIPS. Be sure you understand everything here - all topics in this lab handout are potential final exam question topics.

You may work individually or with a partner on all parts of this lab.

There are a series of questions you need to answer and turn in by email. You may wish to open an email draft or start editing a document to record your answers as you go. Start by placing your name and the name of your partner (if you have one) in this document.

Addressing Modes

Before we consider specific instruction set architectures, we review and give examples of addressing modes. The addressing mode of of each operand of a machine instruction determines how the instruction finds the data specified by that operand (either source or destination).

Some very common addressing modes include:

register direct addressing: the operand is in a register
immediate addressing: the operand is specified as part of the instruction (usually a constant value)
register indirect addressing: the operand is in memory at an address contained in a register
register indirect with offset addressing: the operand is in memory at an address specified by the contents of a register plus a constant offset that is included as part of the instruction

Question 1: Identify the addressing mode used for each operand in the following MIPS instructions (2 points):

add $4, $5, $6
ori $7, $4, 16
lw $8, 0($12)
sw $9, -14($8)

Question 2: MIPS does not have a register indirect addressing mode (without offset). But we have been doing the equivalent in many of our programs. How do we achieve the equivalent using the modes it does have? (1 point)

We used register indirect with offset addressing for accessing array elements where we had a constant subscript:

lw $5, 16($6)   ; load a[4] if $6 is a pointer to a

We also saw this when saving and restoring registers on the stack:

sw $9, -4($sp)  ; store $9 on stack at offset -4

Now, suppose we have a C structure (think Java class if C structures upset you too much):

struct ratio {
  int numerator;
  int denominator;
};

and the following C code, where r is a pointer to one of these structures:

  int n;
  int d;
  n = r->numerator;
  d = r->denominator;

Question 3: If the variables r, n, and d are assigned to registers $9, $10, and $11, respectively, give MIPS code that would implement the above statements. (2 points)

There are also some less-common addressing modes worth noting:

register indirect with displacement and indexing addressing: the operand is in memory at an address computed by taking the sum of the value in a base register, a displacement value contained in a second register, possibly multiplied by a data size, and a constant offset
For example, an operand for the Motorola 68000 might be specified as:
4(%a4,%d1.w)
%a4 and %d1 are register names. %a4 is the base register, and the %d1.w specifies that the displacement value is the number of word-sized values (which are 2 bytes on the M68K) so it should be multiplied by 2. So the effective address is %a4+2*%d1+4.

Question 4: MIPS certainly does not have this addressing mode, but if it did, we would be able to write an instruction like this:

add $3, $2, 4($7,$8.w)

What sequence of real MIPS instructions would peform the same function? (1 point)

register indirect with predecrement addressing: the given register is decremented, then the operand is in memory at the address specified by that register.
This is also available in the Motorola 68000, where is takes the form:
-(%a7)
register indirect with postincrement addressing the operand is in memory at the address specified by the register, then the register is incremented.
Again, on the M68K, this looks like:
(%a7)+

Question 5: How could these addressing modes help when compiling a loop that computes the sum of all elements of an array? (1 point)

program counter relative addressing the operand is computed as a function (usually the sum) of the program counter and a constant value contained within the instruction
These are typically used only with control flow instructions (branches and jumps).

More Complex Instuctions

All MIPS machine instructions are, by design, very limited in what they can do. This is not true of all architectures. Consider some examples from other architectures:

The SOBGTR instruction - "Subtract One and Branch on GreaTeR than" - from VAX MACRO (the assembly language your instructor first learned as an undergraduate) subtracts 1 from the specified register and branches to the target address if the result is greater than 0. So an entire loop could be written:
```
        MOVAW   DATA,R6         ; load array ptr into R6
        MOVL    NUM,R9          ; initialize R9 with the number of elts
DOUBLE: ADDW2   (R6),(R6)+      ; double entry, increment ptr
        SOBGTR  R9,DOUBLE       ; loop control
```

Question 6: Give C or Java code equivalent to the code above. You may assume that there are variables data and num that contain a pointer to the start of an array and the number of elements in that array, respectively. (2 points)

The CMPC3 and CMPC5 string comparison instructions, also from VAX MACRO:
```
       CMPC5   R5,STRING1,#^A/ /,R7,STRING2
```
This compares the character string whose starting address is specified by a label STRING1 and whose length is in register R5 to the string labelled STRING2, length R7, using the space character to pad the shorter string for comparison purposes, using registers R0, R1, R2 and R3 to store information about the result of the comparison.

This instruction not only takes a long time, the amount of time it takes depends on the length of the strings being compared!
Motorola 68000 movem.l instruction pushes or pops a specified subset of the 68000's 16 user-visible registers onto or from the call stack in a single instruction.
```
swap: link  %a6,#0
     movem.l #0x010f,-(%sp)
     ... code for swap subroutine that ...
     ... uses the saved registers ...
     movem.l (%sp)+,#0xf080
     unlk   %a6
     rts
```
The hexadecimal constants are bit fields specifying which registers should be pushed or popped. If there is a 1 in a given position, the corresponding register is pushed or popped.

Note that the constants each contain 5 bits set to 1, and that the second is the reverse of the first. This is because popping is done in the opposite order as pushing (it is a stack, after all).

The code above would replace the longer code:
```
swap: link  %a6,#0
     move.l %d0,-(%sp)
     move.l %d1,-(%sp)
     move.l %d2,-(%sp)
     move.l %d3,-(%sp)
     move.l %a0,-(%sp)
     ... code for swap subroutine that ...
     ... uses the saved registers ...
     move.l (%sp)+,%a0
     move.l (%sp)+,%d3
     move.l (%sp)+,%d2
     move.l (%sp)+,%d1
     move.l (%sp)+,%d0
     unlk   %a6
     rts   
```

Question 7: Which MIPS design principles would a similar instruction in MIPS violate? Could you come up with an instruction format that would work for it? (1 point)

RISC vs. CISC

In class, we are studying how to implement the MIPS ISA directly in hardware, including a pipelined implementation that handles complexities like data forwarding and hazard detection.

This is possible in a course such as this because of the simplicity of its design. MIPS instructions each have a single, simple function, all instructions are the same size (one 32-bit word), and there are very few addressing modes.

These are some of the features common to Reduced Instruction Set Computer (RISC) architectures, of which MIPS is an excellent example.

CISC Architectures

Other architectures are Complex Instruction Set Computer (CISC) architectures. Examples include IBM mainframe architectures, the Digital Equipment Corporation PDP-11 and VAX architectures, the Motorola 68000 series, and the Intel x86 family of architectures.

As the name indicates, and as you saw in the previous section, instructions in a CISC architecture can be quite complex.

CISC architectures are too complex to be implemented directly in hardware. In these cases, an implementation might involve a simpler hardware (a microarchitecture) and an interpreter that is used to execute instructions, guided by a microprogram.

Decoding and executing an instruction can require several steps (microinstructions), and the more complex the instruction (more operands, etc) the longer it will take.

Typical CISC architecture characteristics:

They include large sets of complex instructions: many operations, even complex ones, are supported as single machine instructions.
Many addressing modes are included and can be used by operands for many instructions.
Memory-based data for one or more operands is permitted in most instructions, meaning that a single instruction may make several (slow) memory accesses.
Instruction lengths (number of bytes of memory per instruction) vary by the opcode and the addressing modes of the operands.
Only a small number of user-visible registers (a dozen or two) are provided.
Implementations are usually heavily pipelined.
Programs occupy a relatively compact code footprint (each instruction accomplishes a lot).

If you have to program directly in assembly language, a CISC architecture can make this task easer. However, compilers may make use of only a subset of available the available instructions.

RISC Architectures

Researchers (including our text authors) in the early 1980's advocated for the RISC approach.

Characteristics of RISC architectures:

Most instructions can be executed in one "cycle".
- SPARC supports a single step of multiply, not a full multiply in one instruction. (keep the common case fast!)
Instruction sets may be large, but are easily decoded.
- small number of opcodes: MIPS has 14 arithmetic, 4 load/store, 4 (×14) conditionals, 4 misc, 3 multiple cycle macro instructions.
Instructions support a small number of addressing modes, with each opcode requiring specific addressing modes for its operand.
Memory access is only through explicit "load" and "store" operations.
Instructions are fixed-length with simple instruction formats: 3-operand operations, memory reference, conditional branch, jumps.
Implementations are pipelined, but the pipes are shorter.
- typical pipeline: instruction fetch, instruction decode, operand decode, execution, data write
A larger number (maybe thousands) of user-visible registers are provided, frequently supplanting the use of the stack.
Programs have a relatively large code footprint (each instruction accomplishes only a single, simple task).

If you have to program by hand, RISC architectures will be more difficult. The task is easier for, and intended to be done by, a compiler.

Question 8: One of the above items states that "instructions support a small number of addressing modes, with each opcode requiring specific addressing modes for its operand". But in MIPS, we can add a register to a register or a register to a constant. Does MIPS fit the statement? (1 point)

Question 9: Why do the pipelines in a RISC architecture need to be short? (1 point)

Sun SPARC

Sun Microsystems' SPARC architectures enjoyed a long period of success from the early 1990's to the early 2000's.

SPARC is a RISC architecture, and contains many of the same features as MIPS.

A RISC system may have hundreds of registers in its register file.

One way to organize these registers is to treat the register file as circular and give each routine a "view" of a limited subset of these registers at any given time.

Read the first SPARC has many, many registers but only 32 visible at any given time.

Question 10: Give a few reasons why might one not wish to expose hundreds of registers to all subroutines. (1 point)

One subset of registers is designated as "globals" and are visible to all routines.

Then, each routine has three subsets of registers

"ins" - the actual parameters
"locals" - local variables, temporaries
"outs" - parameters to any subroutines called by this routine

The "outs" of a routine become the "ins" of a subroutine that it calls.

Much of what we could do with the stack for parameter passing and local variable storage can be replaced with this.

We eliminate some of the problems of registers getting clobbered by subroutines and we speed function calls by reducing memory accesses needed to pass parameters on the stack.

But what if we have a deep call stack and we run out of registers? We can't just clobber things, so we'd need to keep a stack also for those cases.

The actual implementation involves "pretending" to write things to the stack, but it only actually does this ("spill" to the stack, and "fill" from the stack to restore) if necessary given the current state of the register file.

Problem: what about functions with more parameters than can be passed in the register window? We still need the stack for those.

Question 11: Compare the SPARC with register windows where there are 8 globals, 8 ins, 8 locals, and 8 outs with the MIPS register file we know so well. How many parameters can be passed to a function without worrying about using the stack in each? How many values can be returned in each? What differences are there between a programmer's usage of the MIPS t-registers and the SPARC "locals"? (3 points)

Other SPARC Features

A few final notes about SPARC:

Compare and branch are done in separate instructions - the conditional branches look at the status codes set by a previous instruction.
There is a 1 cycle branch delay slot.

x86/IA-32 ISA

This section presents some highlights of the text's discussion of the Intel x86 architecture in Section 2.17. Now that you know something about the MIPS ISA and its simplicity, that section will make for interesting reading.

A Brief History of the x86

1978: The Intel 8086 is announced (16 bit architecture)
1980: The 8087 floating point coprocessor is added
1982: The 80286 increases address space to 24 bits, +instructions
1985: The 80386 extends to 32 bits, new addressing modes
1989-1995: The 80486, Pentium, Pentium Pro add a few instructions (mostly designed for higher performance)
1997: 57 new "MMX" instructions are added, Pentium II
1999: The Pentium III added another 70 instructions (SSE)
2001: Another 144 instructions (SSE2)
2003: AMD extends the architecture to increase address space to 64 bits, widens all registers to 64 bits and other changes (AMD64)
2004: Intel capitulates and embraces AMD64 (calls it EM64T) and adds more media extensions
2006: SSE4: 54 new instructions from Intel
2007: SSE5: 170 new instructions from AMD
2008: Advanced Vector Extension from Intel to expand SSE register width to 256 (from 128), redefining 250 instructions, adding 128 new

x86/IA-32 Overview

Complexity:
- Instructions from 1 to 17 bytes long
- one operand must act as both a source and destination
- one operand can come from memory
- complex addressing modes e.g., "base or scaled index with 8 or 32 bit displacement"

Question 12: Based on these restrictions, give an example of a single MIPS instruction that peforms an operation that could not be done in a single x86 instruction. (1 point)

Why does it work with all that complexity?
- the most frequently used instructions are not too difficult to build
- compilers avoid the portions of the architecture that are slow
- much of the complexity comes from backwards compatibility and may safely be ignored by modern programmers/compilers
There are 8 general purpose registers plus 8 special purpose visible to programmers.
Lots of restrictions and caveats.

x86 Basics: Registers, Data Types, and Memory

A diagram of the main registers can be found in Figure 2.36.

modern x86/IA-32 has 8 32-bit integer registers
- These are not entirely general-purpose: some instructions limit the choice of register operands to fewer than 8.
special-purpose 32-bit registers:
- instruction pointer (program counter)
- flags (condition codes)
- floating-point and multimedia registers

Registers names come from their historical special purposes:

`%eax`		accumulator (for arithmetic ops)
`%ebx`		base (address of array in memory)
`%ecx`		count (of loop iterations)
`%edx`		data (e.g., second operand for binary operations)
`%esi`		source index (for string copy or array access)
`%edi`		destination index (for string copy or array access)
`%ebp`		base pointer (base of current stack frame)
`%esp`		stack pointer (top of stack)
`%eip`		instruction pointer (program counter)
`%eflags`		flags (condition codes and other things)

The letter "E" in the name indicates that the "extended" (32-bit) version.
Registers can also be used to store 16- and 8-bit values
- useful when writing smaller values to memory
- the low 16 bits of a register are denoted by dropping the "E" from the register name, e.g., %si
- the two 8-bit halves of the low 16 bits of the first four registers can be used as 8-bit registers by replacing "X" with "H" (high) or "L" (low)
Many instructions are independent of data type, but some require that you select the proper instruction for the data types of the operands.
Memory: many x86 operations accept memory locations as operands.
- e.g., increment value in memory in one instruction
- Figure 2.37 shows the valid combinations (both from memory is not allowed).

Question 13: How would you increment a value in memory in MIPS? (1 point)

x86 ISA

Arithmetic operations: ADD, SUB, NEG (negate), INC (increment), DEC (decrement)
Logical operations: AND, OR, XOR, NOT
Shift operations: SHL (left), SAR (arithmetic right), SHR (logical right)
Rotate operations (shift with wraparound): to the left (ROL) or to the right (ROR)
typically specify one register as both the destination and one of the sources:
```
addl %eax,%ebx       # EBX <= EBX + EAX
```
- we cannot use a single ADD instruction to put the sum of EAX and EBX into ECX
- instruction name is extended with a label for the type of data, an "L" in the case above to indicate long, or 32-bit, operands (others are "W" for 16-bit (word) and "B" for 8-bit (byte) operands)
- note destination appears last (though this is assembler-specific)
immediate values of up to 32-bits are usually allowed (since we can have longer instructions), and values that fit into fewer bits are encoded as shorter instructions.

Immediate (constant) values are preceded with a dollar sign:
```
addl $20,%esp     # ESP <= ESP + 20
```
Beware: removing the dollar sign
```
addl 20,%esp      # ESP <= ESP + M[20]
```
which specifies the contents of memory location 20 to be added to ESP rather than the value 20

A few more examples:

movl $10,%esi   # ESI <= 10
movl %eax,%ecx  # ECX <= EAX
xorl %edx,%edx  # EDX <= 0

Data movement: the MOV instruction takes the place of both load and store.
Most x86 addressing modes are specific cases of the general mode:
```
displacement(SR1,SR2,scale)
```
which multiplies SR2 by scale, then adds both SR1 and displacement.

This complex addressing supports array accesses generated by high-level programs.

For example, to access the i^th element of an array of 32-bit integers, one could put a pointer to the base of the array into EBX and the index i into ESI, and execute
```
 movw (%ebx,%esi,4),%eax # EAX <= M[EBX + ESI * 4]
```
If the array started at the 28^th byte of a structure, and EBX instead held a pointer to the structure, one could still use this form by adding a displacement:
```
movw 28(%ebx,%esi,4),%eax   # EAX <= M[EBX + ESI * 4 + 28]
```
scale can be 1, 2, 4, or 8, defaults to 1.

Examples of simpler cases of this addressing mode:

movb (%ebp),%al           # AL <= M[EBP]
movb -4(%esp),%al         # AL <= M[ESP - 4]
movb (%ebx,%edx),%al      # AL <= M[EBX + EDX]
movb 13(%ecx,%ebp),%al    # AL <= M[ECX + EBP + 13]
movb (,%ecx,4),%al        # AL <= M[ECX * 4]
movb -6(,%edx,2),%al      # AL <= M[EDX * 2 - 6]
movb (%esi,%eax,2),%al    # AL <= M[ESI + EAX * 2]
movb 24(%eax,%esi,8),%al  # AL <= M[EAX + ESI * 8 + 24]

Figure 2.38 also shows addressing modes with MIPS equivalents.

direct addressing mode: the address to be used is specified as an immediate value in the instruction

movb 100,%al              # AL <= M[100]
movb label,%al            # AL <= M[label]
movb label+10,%al         # AL <= M[label+10]
movb 10(label),%al        # NOT LEGAL!
movb label(%eax),%al      # AL <= M[EAX + label]
movb 13+8*8-35+label(%edx),%al # AL <= M[EDX + label + 42]
movw $label,%eax          # EAX <= label
movw $label+10,%eax       # EAX <= label+10
movw $label(%eax),%eax    # NOT LEGAL!

Condition codes: there are many, but we will look at 5 condition codes
- sign flag (SF): was last result negative
- zero flag (ZF): was last result 0
- carry flag (CF): did last result generate a carry (among other uses)
- overflow flag (OF): did last operation overflow
- parity flag (PF): even parity of last operation
Note: most, but not all instructions affect all flags
Conditional branches: 8 basic branch conditions and their inverses are available plus the unconditional branch JMP
- jo: overflow - OF is set
- jp: parity - PF is set (even parity)
- js: sign - SF is set (negative)
- je: equal - ZF is set
- jb: below - CF is set
- jbe: below or equal - CF or ZF is set
- jl: less - SF != OF
- jle: less or equal - (SF != OF) or ZF is set
All except can be inverted by inserting an "N" after the initial "J": JNB jumps if the carry flag is clear.

Subroutine control: the CALL and RET instructions

CALL pushes EIP, and its target can come from one of many addressing modes:

call printf        # (push EIP), EIP <= printf
call *%eax         # (push EIP), EIP <= EAX
call *(%eax)       # (push EIP), EIP <= M[EAX]
call *fptr         # (push EIP), EIP <= M[fptr]
call *10(%eax,%edx,2) # (push EIP), EIP <= M[EAX + EDX*2 + 10]

RET (return) pops the return address off the stack and into EIP.

Some typical instruction formats are shown in Figure 2.41.

x86 Wrapup

Just how complex is the x86 ISA today?

See Figure 2.43 for a summary.

Question 14: COD Exercise 2.38, parts 1b, 2b, and 3b, only. (3 points)

Comparing Assembly Language Programs

We will consider the assembly language code generated from a few simple C programs for some of the ISAs we consider:

See Example:
~jteresco/shared/cs220/examples/assembly

Copy these to your own account or computer.

Here, you will find two C files, each of which contains a simple C function. You will also find compiler-generated assembly language code for 5 architectures: x86, M68K, MIPS, PPC, and SPARC.

You will notice that while each architecture has a very unique set of instructions, register names, and addressing modes, each is unmistakeably an assembly language program.

Consider the files multiply-mips-gcc.s, multiply-i386-gcc295.s, multiply-sparc-cc.s, multiply-sparc-gcc.s, multiply-m68k-gcc.s, and multiply-ppc-gcc.s when answering the following questions.

Question 15: Identify the register or memory location being used to store the variable product in each program. (1 point)

Question 16: Identify the instruction that performs the right shift in each program. (1 point)

Question 17: Identify the instruction that performs the conditional branch for the if (y & 1) in each program. (1 point)

Question 18: Find an example of a branch delay slot in each "sparc" program. Which one makes effective use of the branch delay slots? (1 point)

Submission and Grading

This lab is graded out of 25 points. The points per question are specified above.

By 4:00 PM, Monday, December 12, 2011, please submit your responses to the questions by email to jteresco AT siena.edu. You may supplement your submission with a handwritten response to some or all questions.