Computer Science 237
Computer Organization

Williams College
Fall 2005

Lecture 26: MIC1 Microcode
Date: November 11, 2005

Announcements

Be a Computer Science teaching assistant for the spring semester! See Lorraine Robinson in the CS office or fill out the online form to apply. The deadline is November 15.
Computer Science Alumni Panel this afternoon, 2:35 PM, TCL 202. Come hear what some of our recent alumni are doing with their Computer Science degrees. Pizza with the alumni at 1 PM.
Exam due back tomorrow morning. I plan to be in my office to get them at 11.

Agenda

Tanenbaum's horizontally microcoded MIC-1 - implementing the MAC-1 ISA on Tanenbaum's microarchitecture
Another layer of abstraction - we want to present an ISA, given a microarchitecture
We can specify microcode instructions using Pascal-like statements.
Tanenbaum 1990 Fig. 4-15 shows the kinds of statements we can use to specify microcode instructions
- Mostly straightforward, numbers indicate values presented on various control lines.
- ADDR is specified as 2 hex digits
- Note that the first statement tells us that the PC is in scratchpad 0.
- other statements tell us that IR is scratchpad 3, scratchpad 6 contains a 1, etc.
- some of these statements could be combined to happen at the same time - such as the first (mar:=pc; rd) and fourth (pc:=pc+1)
An microinstruction sets values for all control lines. If a control line is not determined by the current operation group, other operations may be specified. For example:
```
mar := pc; pc := 1 + pc; mbr := 1 + pc; rd;
```
This is a single instruction; it takes one line of store (32 bits) and one (major) cycle to execute. Removing parts of the instruction only reduces the work accomplished.
Here, we need to
1. put 1 onto A
2. put the PC onto B
3. tell the ALU to add
4. tell the shifter to do nothing
5. copy the PC (which is on B) into the MAR
6. store the result of the ALU/shifter operation in the MBR
7. enable the C bus
8. put the value on C into the PC
9. read from memory
So our microinstruction:
```
0 00 00 00 1 1 1 0 1 000 000 111 00000000
```
The operations rd and wr take two cycles to execute (because memory is slow). These cycles must be consecutive. From memory's point-of-view, when the rd line is held high, it must control the bus, etc. Interruptions in this protocol will cause problems.
ALU-based operations appear in the microcode source as binary operations (e.g. pc+1), or functions (e.g. band(ir,mask)).
Shifter operations appear as functions lshift and rshift and are composed with ALU operations.
The setting of the N and Z bits (for condition testing) happens during the ALU computation; the shifter has no effect on N or Z.
An optional if or unconditional goto sets the COND and ADDR bits. In Tanenbaum, the values of ADDR are simply the address of the target instruction. Branching occurs at no cost.
The MIC1 microprogram - Tanenbaum 1990 Fig. 4-16.
- An interesting implementation, but there are many improvements possible.
- General approach: Test bits from msb to lsb.
  Each test falls to next instruction or takes conditional branch.
  Approach may be thought of as a binary search tree for the appropriate microcode, traversed by looking at bits of instruction.
  Decoding instructions in this manner is not practicable.
- Scratchpad: pc, ac, ir, tir, amask (for successful branches), smask (for INSP and DESP), 1, and -1. 0 and A-F are unused. We should think about a better use of registers.
- Greater parallelism is possible. How much compression can you accomplish?
  There is not under a great amount of pressure because of the relatively large control store. Our store will only be 4 times larger for a 68000 style architecture.
  Think about how we might add more power to the machine with little overhead.
- Instructions 0-2. Instruction fetch.
  ir is a general purpose scratchpad register used to hold the current instruction.
  Notice that the PC points to the next instruction to be executed.
  Note that the value is "negative" if it starts with a 1 (i.e. has a 1 in bit 15). The n bit is set if it is. So from 2, we continue to 3 if the instruction starts with a 0, jump to 28 if it starts with a 1.
- Instruction 3.
  Note that ir+ir causes a left shift (a multiply by 2). So lshift(ir+ir) represents two left-shifts. The condition code bits are set based on the value of ir+ir.
  Thus the conditional branch is taken if bit 14 is high. That bit will have disappeared (as part of tir; it will be in ir as bit 14) by the time we get to either instruction 4 or 19. But we never need to look at it again, so that's fine.
- Instruction 4-5.
  The double-shifting trick cannot be used again since only one bit can be tested at a time.
  The form alu:=tir simply drops the tir register into the alu to get status bits; the scratchpad and mbr are left unchanged.
- Instructions 6-8.
  If the instruction is LODD (opcode 0), four failed N tests (one for each opcode bit of zero) cause us to arrive here.
  This instruction takes the address (found in the low 12 bits of the instruction, still in ir) and loads that memory value into the accumulator (ac).
  Notice that the opcode bits do not need to be stripped off; the mar is only 12 bits wide, so the stripping is done by the hardware (see also instruction 9 for a non-trivial example).
  Instruction 7 is painful to see. Only one bit is set. Think: Perhaps we can do something else here?
  goto 0; restarts the instruction fetch loop.
- Instructions 9-10.
  STOD - write ac to memory.
- Instruction 11.
  Split on bit 12 to decode instruction starting with 001.
- Instructions 12-14.
  ADDD - no surprises here.
- Instructions 15-18.
  No subtraction in ALU, so we make use of the 1's complement feature (inv(a)) and add 1.
  Can this be optimized? (Hint: goto 1, instead - "prefetching")
  Also, a is being used as a temp storage area.
- Instruction 22.
  What's happening here? Is it necessary? We need a philosophy to deal with it.
  We "need" to pull out the bottom 12 bits of the ir to store in the pc because the pc should contain a valid address.
  But does this actually matter? When the pc is loaded into the mar for a subsequent fetch, it will be stripped to 12 bits anyway.
  One possible danger: if the pc is put onto the stack, the 16-bit version with junk bits in the top could end up in the hands of a user program and two equivalent pointers might not be equal.
  Possible solution: do the band(pc, amask) only when pushing a pc value onto the stack. Then we could write:
```
pc := ir; mar := ir; rd; goto 1
```
- Instruction 24.
  This is a total waste of time. This instruction can almost always be removed. Seems impossible. How?
- Instruction 27.
  Is there an optimization to be performed here?
- Instructions 31-32.
  Why can't these instructions be merged? Why are we branching to 7?
- Instructions 31 & 33.
  Is it possible to remove one of these instructions (they're duplicates)? How about 36-37 & 38-39?
- Instructions 47-49.
  Nice code. Would it be useful to store sp internally as off-by-one? Look at the number of sp := sp - 1; operations.
- Instructions 62-64.
  The last two instructions can be removed by putting goto 7 at the end of instruction 62.
- Instructions 70-72.
  Can these be optimized? Perhaps we can use the mbr and jump to code that already loads ac from the mbr?
- Instructions 74-78.
  Compare INSP and DESP. Something is horribly wrong here. Can it be fixed?

Computer Science 237 Computer Organization

Williams College Fall 2005

Announcements

Agenda

Computer Science 237
Computer Organization

Williams College
Fall 2005