Computer Science 237
Computer Organization

Williams College
Fall 2006

Lecture 27: MIC1 Microprogramming, Microcode Project Overview
Date: November 13, 2006

Agenda

Announcements
No lab this week
Clean up Lab 8 circuits
Submit Bonus Lab - grading by demonstration only, come by when you have a chance to show your circuit in action
Microprogramming the MIC1
- Tanenbaum 1990 Fig. 4-16: an interesting implementation, but there are many improvements possible
- We need to implement the main loop we discussed a while back: Fetch, Update, Decode, Execute
- General approach to decoding: Test bits from most significant bit (msb) to least significant bit (lsb)
  - Each test falls to next instruction or takes conditional branch
  - Approach may be thought of as a binary search tree for the appropriate microcode, traversed by looking at bits of instruction
  - Decoding instructions in this manner is not practicable
- Scratchpad:
  - Special-purpose registers: pc, ac, ir, tir
  - Constants: amask (for successful branches), smask (for INSP and DESP), 1, and -1, 0 (unused) - note these are assumed to be initialized here, for your projects, you will have to generate any constants you wish to have in registers
  - Scratch: a-f, only a is ever used
  - We should think about a better use of registers
- Greater parallelism is possible
  - How much compression can you accomplish?
  - There is not under a great amount of pressure here because of the relatively large control store (Note that your store will only be 4 times larger for a 68000-style architecture)
  - Think about how we might add more power to the machine with little overhead
- Instructions 0-2: Instruction fetch, pc update, start decode
  - ir is a general purpose scratchpad register used to hold the current instruction
  - Notice that the pc points to the next instruction to be executed
  - Note that the value is "negative" if it starts with a 1 (i.e. has a 1 in bit 15)
    - The n bit is set if it is
    - So from 2, we continue to 3 if the instruction starts with a 0, jump to 28 if it starts with a 1
- Instruction 3: decode some more
  - Note that ir+ir causes a left shift (a multiply by 2)
  - So lshift(ir+ir) represents two left-shifts
  - The condition code bits are set based on the value of ir+ir only - not the shifter output
  - Thus the conditional branch is taken if bit 14 is high
  - Bit 14 will have disappeared (from tir; it will of course still be in ir as bit 14) by the time we get to either instruction 4 or 19 - but we never need to look at it again, so that's fine
- Instruction 4-5: more decoding (this is painful)
  - The double-shifting trick cannot be used again since only one bit can be tested at a time
  - The form alu:=tir simply drops the tir register into the alu to get status bits; the scratchpad and mbr are left unchanged
- Instructions 6-8: finally, we execute an instruction
  - If the instruction is LODD (opcode 0), four failed N tests (one for each opcode bit of zero) cause us to arrive here
  - This instruction takes the address (found in the low 12 bits of the instruction, still in ir) and loads that memory value into the accumulator (ac)
  - Notice that the opcode bits do not need to be stripped off; the mar is only 12 bits wide, so the stripping is done by the hardware (see also instruction 9 for a non-trivial example)
  - Instruction 7 is painful to see - only one bit is set, so only one small part of our microarchitecture is doing anything useful - Think: Perhaps we can do something else here?
  - goto 0; restarts the instruction fetch loop
- Instructions 9-10:
  - STOD - write ac to memory
- Instruction 11:
  - Split on bit 12 to decode instruction starting with 001
- Instructions 12-14:
  - ADDD - no surprises here
- Instructions 15-18: subtraction operation
  - No subtraction in ALU, so we make use of the 1's complement feature (inv(a)) and add 1
  - Can this be optimized? (Hint: goto 1, instead - "prefetching")
  - Also, note that a is being used as a temp storage area
- Instruction 22:
  - What's happening here? Is it necessary? We need a philosophy to deal with it
  - We "need" to pull out the bottom 12 bits of the ir to store in the pc because the pc should contain a valid address
  - But does this actually matter? When the pc is loaded into the mar for a subsequent fetch, it will be stripped to 12 bits anyway
  - One possible danger: if the pc is put onto the stack, the 16-bit version with junk bits in the top could end up in the hands of a user program and two equivalent pointers might not be equal
  - Possible solution: do the band(pc, amask) only when pushing a pc value onto the stack - Then we could write:
```
pc := ir; mar := ir; rd; goto 1
```
- Instruction 24: a lone goto
  - This is a total waste of time - no useful work happening
  - It may seem impossible, but this kind of instruction can almost always be removed. How?
- Instruction 27:
  - Is there an optimization to be performed here?
  - Can we start the next fetch?
- Instructions 31-32:
  - Why can't these instructions be merged?
  - Why are we branching to 7?
- Instructions 31 & 33:
  - Is it possible to remove one of these instructions (they're duplicates)?
  - How about 36-37 & 38-39?
- Instructions 47-49:
  - Nice code - lots of things happening at once
  - Would it be useful to store sp internally as off-by-one? Look at the number of sp := sp - 1; operations
- Instructions 62-64:
  - The last two instructions can be removed by putting goto 7 at the end of instruction 62
- Instructions 70-72:
  - Can these be optimized? Perhaps we can use the mbr and jump to code that already loads ac from the mbr?
- Instructions 74-78:
  - Compare INSP and DESP
  - What can we optimize?
Microcode Design Project

Computer Science 237 Computer Organization

Williams College Fall 2006

Agenda

Computer Science 237
Computer Organization

Williams College
Fall 2006