Computer Science 237
Lecture 26: MIC1 Microcode
Date: November 11, 2005
mar := pc; pc := 1 + pc; mbr := 1 + pc; rd;
This is a single microinstruction; it takes one word of control store (32 bits) and one (major) cycle to execute. Removing parts of the instruction does not make it faster; it only reduces the work accomplished.
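As a rough illustration of "one line, one cycle", the sketch below models the effect of that microinstruction on a register file. The dictionary of registers, the function name `cycle`, and the 16-bit and 12-bit widths are assumptions made for the example; the point is only that every source is read before any destination is written, so mar sees the old pc while pc and mbr both receive the ALU result.

```python
MASK16 = 0xFFFF   # assumed 16-bit data path
MASK12 = 0x0FFF   # assumed 12-bit mar

def cycle(regs):
    """Model of: mar := pc; pc := 1 + pc; mbr := 1 + pc; rd;

    All reads use the register values from the start of the cycle.
    """
    alu_out = (1 + regs["pc"]) & MASK16   # single ALU operation
    new = dict(regs)
    new["mar"] = regs["pc"] & MASK12      # mar latches the OLD pc
    new["pc"] = alu_out                   # ALU result written back to pc
    new["mbr"] = alu_out                  # same ALU result also goes to mbr
    return new

before = {"pc": 5, "mar": 0, "mbr": 0}
after = cycle(before)
```

Dropping any one of the three assignments would leave the other two unchanged and would still cost a full cycle.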
Here, we need to
So our microinstruction:
0 00 00 00 1 1 1 0 1 000 000 111 00000000
Each test either falls through to the next instruction or takes a conditional branch.
The approach may be thought of as a binary search tree for the appropriate microcode, traversed by examining the bits of the instruction one at a time.
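The tree-walk view can be sketched in a few lines. This is a hypothetical model, not the microcode itself: `n_tests`, `decode`, and `TREE` are names invented here, and the opcode assignments for the four leaves (LODD 0000, STOD 0001, ADDD 0010, SUBD 0011) are assumed from the textbook Mac-1 encoding. Each tuple is a test node whose first element is the fall-through (bit is 0) and whose second is the branch (bit is 1).

```python
def n_tests(instruction, bits=4):
    """Return the outcome of testing the top `bits` bits, one per step."""
    word = instruction & 0xFFFF
    results = []
    for _ in range(bits):
        results.append(bool(word & 0x8000))  # N is set when bit 15 is 1
        word = (word << 1) & 0xFFFF          # shift the next bit into bit 15
    return results

def decode(instruction, tree):
    """Follow (if_zero, if_one) nodes until a leaf (a string) is reached."""
    node = tree
    for taken in n_tests(instruction):
        if not isinstance(node, tuple):
            break
        node = node[1] if taken else node[0]
    return node

# Only the 00xx corner of the tree is filled in; everything else is "other".
TREE = (((("LODD", "STOD"), ("ADDD", "SUBD")), "other"), "other")

name = decode(0x2005, TREE)   # opcode bits 0010 -> "ADDD"
```

A leaf at depth four costs four tests, which is why LODD is reached only after four failed N tests.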
Decoding instructions in this manner is not practicable.
We are not under a great amount of pressure because of the relatively large control store. Our store will only be 4 times larger for a 68000-style architecture.
Think about how we might add more power to the machine with little overhead.
ir is a general-purpose scratchpad register, used here to hold the current instruction.
Notice that the PC points to the next instruction to be executed.
Note that the value is "negative" if it starts with a 1 (i.e. has a 1 in bit 15). The n bit is set in that case. So from instruction 2, we continue to 3 if the instruction starts with a 0, and jump to 28 if it starts with a 1.
Note that ir+ir causes a left shift (a multiply by 2). So lshift(ir+ir) represents two left-shifts. The condition code bits are set based on the value of ir+ir, that is, on the ALU output before the shifter is applied.
Thus the conditional branch is taken if bit 14 of ir is high. That bit will have been shifted out of tir (it is still in ir as bit 14) by the time we get to either instruction 4 or 19. But we never need to look at it again, so that's fine.
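A small sketch of the arithmetic behind tir := lshift(ir + ir) may help. The helper names (`alu_add`, `lshift`, `step3`) are invented for this example, and 16-bit registers are assumed. The n flag is taken from the ALU output, and the shifter is applied afterwards.

```python
MASK16 = 0xFFFF

def alu_add(a, b):
    """16-bit add; n mirrors bit 15 of the ALU result."""
    result = (a + b) & MASK16
    return result, bool(result & 0x8000)

def lshift(x):
    """One-bit left shift, discarding whatever leaves bit 15."""
    return (x << 1) & MASK16

def step3(ir):
    """Model of: tir := lshift(ir + ir), with n set from ir + ir."""
    alu_out, n = alu_add(ir, ir)   # first shift: bit 14 of ir lands in bit 15
    tir = lshift(alu_out)          # second shift: that bit is now gone from tir
    return tir, n

tir_a, n_a = step3(0x4123)   # bit 14 set
tir_b, n_b = step3(0x2123)   # bit 14 clear, bit 13 set
```

After the step, bit 15 of tir holds what was bit 13 of ir, ready for the next test.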
The double-shifting trick cannot be used again since only one bit can be tested at a time.
The form alu:=tir simply drops the tir register into the alu to get status bits; the scratchpad and mbr are left unchanged.
If the instruction is LODD (opcode 0), four failed N tests (one for each opcode bit of zero) cause us to arrive here.
This instruction takes the address (found in the low 12 bits of the instruction, still in ir) and loads that memory value into the accumulator (ac).
Notice that the opcode bits do not need to be stripped off; the mar is only 12 bits wide, so the stripping is done by the hardware (see also instruction 9 for a non-trivial example).
Instruction 7 is painful to see. Only one bit is set. Think: Perhaps we can do something else here?
goto 0; restarts the instruction fetch loop.
STOD - write ac to memory.
Split on bit 12 to decode instruction starting with 001.
ADDD - no surprises here.
No subtraction in ALU, so we make use of the 1's complement feature (inv(a)) and add 1.
Can this be optimized? (Hint: goto 1, instead - "prefetching")
Also, a is being used as a temp storage area.
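The subtraction identity can be checked with a short sketch. It assumes 16-bit registers and a particular order of steps (increment ac, complement the memory value into a, then add); the function names are illustrative only.

```python
MASK16 = 0xFFFF

def inv(x):
    """1's complement of a 16-bit value."""
    return ~x & MASK16

def sub_via_complement(ac, m):
    """Compute ac - m using only +1, inv and add (2's complement)."""
    ac = (ac + 1) & MASK16     # ac := ac + 1
    a = inv(m)                 # a := inv(mbr), a is the temporary
    return (ac + a) & MASK16   # ac := ac + a

diff = sub_via_complement(10, 3)    # 7
wrap = sub_via_complement(3, 10)    # 0xFFF9, i.e. -7 in 16-bit 2's complement
```

Since inv(m) equals -m - 1 modulo 2^16, adding it to ac + 1 gives ac - m.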
What's happening here? Is it necessary? We need a philosophy to deal with it.
We "need" to pull out the bottom 12 bits of the ir to store in the pc because the pc should contain a valid address.
But does this actually matter? When the pc is loaded into the mar for a subsequent fetch, it will be stripped to 12 bits anyway.
One possible danger: if the pc is put onto the stack, the 16-bit version with junk bits in the top could end up in the hands of a user program and two equivalent pointers might not be equal.
Possible solution: do the band(pc, amask) only when pushing a pc value onto the stack. Then we could write:
pc := ir; mar := ir; rd; goto 1
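The danger, and the proposed fix, can be seen in a small model. Everything named here is an assumption for illustration: `AMASK` is taken to be 0x0FFF, `load_mar` stands in for the 12-bit mar, and `push_pc` is a hypothetical helper that optionally applies band(pc, amask) when the pc is pushed.

```python
AMASK = 0x0FFF  # assumed value of the 12-bit address mask (amask)

def load_mar(value):
    """The mar keeps only the low 12 bits, so the hardware strips the rest."""
    return value & AMASK

def push_pc(stack, pc, mask_on_push):
    """Push pc, applying band(pc, amask) first if mask_on_push is true."""
    stack.append(pc & AMASK if mask_on_push else pc)

# Two 16-bit values naming the same address but differing in the top bits.
pc_a = 0x6123
pc_b = 0x0123

raw, masked = [], []
for pc in (pc_a, pc_b):
    push_pc(raw, pc, mask_on_push=False)
    push_pc(masked, pc, mask_on_push=True)

same_fetch = load_mar(pc_a) == load_mar(pc_b)   # fetches agree
raw_equal = raw[0] == raw[1]                    # user-visible values differ
masked_equal = masked[0] == masked[1]           # masking on push restores equality
```

Fetching never notices the junk bits; only code that compares saved pc values does, which is why masking at push time would be enough.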
This is a total waste of time. This instruction can almost always be removed. Seems impossible. How?
Is there an optimization to be performed here?
Why can't these instructions be merged? Why are we branching to 7?
Is it possible to remove one of these instructions (they're duplicates)? How about 36-37 & 38-39?
Nice code. Would it be useful to store sp internally as off-by-one? Look at the number of sp := sp - 1; operations.
The last two instructions can be removed by putting goto 7 at the end of instruction 62.
Can these be optimized? Perhaps we can use the mbr and jump to code that already loads ac from the mbr?
Compare INSP and DESP. Something is horribly wrong here. Can it be fixed?