Submit Bonus Lab - grading by demonstration only, come by when
you have a chance to show your circuit in action
Microprogramming the MIC1
Tanenbaum 1990 Fig. 4-16: an interesting implementation, but
there are many improvements possible
We need to implement the main loop we discussed a while back:
Fetch, Update, Decode, Execute
General approach to decoding: Test bits from most significant
bit (msb) to least significant bit (lsb)
Each test falls to next instruction or takes conditional branch
Approach may be thought of as a binary search tree for the
appropriate microcode, traversed by looking at bits of
instruction
Decoding instructions in this manner is not practicable
Scratchpad:
Special-purpose registers: pc, ac, ir,
tir
Constants: amask (for successful branches), smask (for INSP and DESP), 1, and -1,
0 (unused) - note these are assumed to be initialized
here, for your projects, you will have to generate any constants
you wish to have in registers
Scratch: a-f, only a is ever used
We should think about a better use of registers
Greater parallelism is possible
How much compression can you accomplish?
There is not under a great amount of pressure here because
of the relatively large control store (Note that your store will
only be 4 times larger for a 68000-style architecture)
Think about how we might add more power to the machine with
little overhead
Instructions 0-2: Instruction fetch, pc update, start decode
ir is a general purpose scratchpad register used to
hold the current instruction
Notice that the pc points to the next
instruction to be executed
Note that the value is "negative" if it starts with a 1
(i.e. has a 1 in bit 15)
The n bit is set if it is
So from 2, we continue to 3 if the instruction starts
with a 0, jump to 28 if it starts with a 1
Instruction 3: decode some more
Note that ir+ir causes a left shift (a multiply by 2)
So lshift(ir+ir) represents two left-shifts
The condition code bits are set based on the value of ir+ir only - not the shifter output
Thus the conditional branch is taken if bit 14 is high
Bit 14 will have disappeared (from tir; it will of
course still be in ir as bit 14) by the time we get to
either instruction 4 or 19 - but we never need to look at it
again, so that's fine
Instruction 4-5: more decoding (this is painful)
The double-shifting trick cannot be used again since only
one bit can be tested at a time
The form alu:=tir simply drops the tir register
into the alu to get status bits; the scratchpad and
mbr are left unchanged
Instructions 6-8: finally, we execute an instruction
If the instruction is LODD (opcode 0), four failed
N tests (one for each opcode bit of zero) cause us to arrive
here
This instruction takes the address (found in the low 12 bits
of the instruction, still in ir) and loads that memory
value into the accumulator (ac)
Notice that the opcode bits do not need to be stripped off;
the mar is only 12 bits wide, so the stripping is done by
the hardware (see also instruction 9 for a non-trivial example)
Instruction 7 is painful to see - only one bit is set, so
only one small part of our microarchitecture is doing anything
useful - Think: Perhaps we can do something else here?
goto 0; restarts the instruction fetch loop
Instructions 9-10:
STOD - write ac to memory
Instruction 11:
Split on bit 12 to decode instruction starting with 001
Instructions 12-14:
ADDD - no surprises here
Instructions 15-18: subtraction operation
No subtraction in ALU, so we make use of the 1's complement
feature (inv(a)) and add 1
Can this be optimized? (Hint: goto 1, instead -
"prefetching")
Also, note that a is being used as a temp storage area
Instruction 22:
What's happening here? Is it necessary? We need a
philosophy to deal with it
We "need" to pull out the bottom 12 bits of the ir
to store in the pc because the pc should contain a
valid address
But does this actually matter? When the pc is loaded
into the mar for a subsequent fetch, it will be stripped
to 12 bits anyway
One possible danger: if the pc is put onto the stack,
the 16-bit version with junk bits in the top could end up in the
hands of a user program and two equivalent pointers might not be
equal
Possible solution: do the band(pc, amask) only when
pushing a pc value onto the stack - Then we could write:
pc := ir; mar := ir; rd; goto 1
Instruction 24: a lone goto
This is a total waste of time - no useful work happening
It may seem impossible, but this kind of instruction can
almost always be removed. How?
Instruction 27:
Is there an optimization to be performed here?
Can we start the next fetch?
Instructions 31-32:
Why can't these instructions be merged?
Why are we branching to 7?
Instructions 31 & 33:
Is it possible to remove one of these instructions (they're
duplicates)?
How about 36-37 & 38-39?
Instructions 47-49:
Nice code - lots of things happening at once
Would it be useful to store sp internally as
off-by-one? Look at the number of sp := sp - 1;
operations
Instructions 62-64:
The last two instructions can be removed by putting goto 7 at the end of instruction 62
Instructions 70-72:
Can these be optimized? Perhaps we can use the mbr
and jump to code that already loads ac from the mbr?