Superscalar processors are uniprocessor organizations capable of increasing machine performance by executing multiple scalar instructions in parallel. Since the amount of instruction-level parallelism within a basic block is small, superscalar processors must look across basic block boundaries to increase performance. In numerical code, the do-loop branches, which are a large percentage of the total branches executed, can be resolved early to expose the parallelism between many iterations. Unfortunately, many of the branches in non-numerical code are data dependent and cannot be resolved early. Thus, speculative execution, the execution of operations before previous branches, is an important source of parallelism in this type of code.
Instruction-level parallelism can be extracted statically (at compile-time) or dynamically (at run-time). Statically-scheduled superscalar processors and Very Long Instruction Word (VLIW) machines exploit instruction-level parallelism with a modest amount of hardware by exposing the machine's parallel architecture in the instruction set. For numerical applications, where branches can be determined early, compilers harness the parallelism across basic blocks by techniques such as software pipelining or trace scheduling. However, the overhead and complexity of speculative computation in compilers has prevented efficient parallelization of non-numerical code.
Dynamically-scheduled superscalar processors, on the other hand, effectively support speculative execution in hardware. By using simple buffers, these processors can efficiently commit or discard the side effects of speculative computations. Unfortunately, the additional hardware necessary to look far ahead in the dynamic instruction stream, find independent operations, and schedule these independent operations out of order is costly and complex.
We are interested in using superscalar techniques to increase the performance of non-numerical code at a reasonable cost. To accomplish this goal, we propose a superscalar architecture, which we call TORCH, that combines the strengths of static and dynamic instruction scheduling. The strength of static scheduling is the compiler's ability to efficiently schedule operations across many basic blocks; consequently, TORCH performs all instruction scheduling in the compiler. The strength of dynamic scheduling is in its ability to efficiently support speculative execution; consequently, TORCH provides hardware that allows the compiler to schedule any instruction before preceding branches, an operation we term boosting. Boosted instructions are conditionally committed upon the result of later branch instructions. Boosting, therefore, removes the scheduling constraints that result from the dependences caused by conditional branches and makes aggressive instruction scheduling in the compiler simple.
To make the conditional evaluation of boosted instructions efficient at run-time, the TORCH hardware includes two shadow structures: the shadow register file and shadow store buffer. These structures buffer the side effects of a boosted instruction until its dependent branch conditions are determined. On a correctly predicted branch, the hardware commits the appropriate values in the shadow structures. On a mispredicted branch, the hardware guarantees correct program operation by squashing all shadow values.

The machine supports the standard MIPS 32-register register file. To support boosting the hardware includes a second 32-word register file called the shadow register file. Memory is byte-addressable and TORCH is a big-endian machine: byte 0 is the most-significant byte of word 0.
The integer execution units have a five-cycle pipeline similar to the MIPS R2000 pipeline:
The two execution units operate in lock-step; if one side stalls, so does the other.
Branch conditions are evaluated by the end of the first half of the ALU cycle so branches have a single-cycle delay slot, a topic which is discussed in detail in subsection "Branches" on page 6 . The machine has some hardware interlocks, including a load interlock (there is no load delay slot, unlike the MIPS R2000 architecture). Exceptions occur in the second half of the MEM cycle.
As an example, here are two legal TORCH packets written in assembly language:
The first instruction in each pair is the A-side instruction. These same two packets can be abbreviated by the following schematic notation:
In this example the addition and subtraction operations execute in parallel, followed by the shift (sll) and store (sw).
The architecture does not define the results if an instruction is on the wrong side of a packet. For instance, the following packet is illegal since a store must be on the B side:
However, there is an exception: subsection "Dynamic NOPs" on page 5 describes a special case in which an A-side instruction with a dynamic NOP can appear on the B side of a packet, and similarly for a B-side instruction on the A side.
Instructions which execute in parallel must not depend on each other in any way. For example, if the two instructions in a packet write to the same register in the same cycle (a zero-cycle write-write dependency) then the result is undefined. However, the two instructions in a packet might be issued in different cycles if dynamic NOPs are present; again, see the Dynamic NOP subsection.
The numbers at the left are byte addresses, each of the small boxes represents one byte, and a wide box represents a four-byte word. E0 through E7 are the extension bytes for instructions 0 through 7, respectively, and I0 through I7 are the MIPS-compatible parts of the instructions. The first packet consists of E0, I0, E1 and I1. A group of eight instructions packed in this way is called an instruction block.
The instruction blocks are repacked as individual 40-bit instructions as they are loaded into the primary cache, which is organized to store 40-bit words. The bus between the primary and secondary caches is 64 bits wide, so an instruction block is loaded into the primary cache as follows: First, the primary cache controller reads the eight extension bytes and stores them in a shift register. Next, two instructions are read and the first two extension bytes are shifted out of the shift register and stored with the corresponding instructions. Then the next two instructions are loaded, and so on. This scheme requires a minimal amount of hardware (a simple, fast unidirectional shift register for the extension bytes), in contrast with the complicated shift register/aligner which would be required if the extension bytes were stored adjacent to the corresponding instructions in memory (since an instruction could begin at any byte address).
Now since the extension byte and the MIPS portion of each instruction are not stored adjacent to each other, what is the address of an instruction? The convention used in TORCH is to address an instruction by its instruction number rather than by a memory address. In the example above the first instruction (E0-I0) has instruction number 0, the second instruction (E1-I1) has instruction number 1, and so on. The first (A-side) instruction of a packet always has an even instruction number, and the second (B-side) instruction has an odd instruction number. Branch and jump effective addresses and the address in the program counter are all instruction numbers.
This method of addressing instructions maintains compatibility with the word addresses used in a standard MIPS binary. If all of the extension bytes were removed and the remaining MIPS instructions were packed to fill in the empty spaces, then the instruction number would be the same as the word address of the instruction (assuming a 32-bit word). In the above example I0 would be at address 0, I1 would be at byte address 4 or word address 1, and so on.
From now on the phrases "instruction number" and "instruction address" will be used interchangeably, but remember that the addresses in branch instructions are not the virtual addresses of the targets.
TORCH solves this problem by allowing a NOP to be encoded by a single bit in the extension byte of some other (hopefully useful) instruction. If the dynamic NOP bit of an instruction is set then the instruction is delayed by one machine cycle before it is issued. Both instructions in a packet have independent dynamic NOP bits. In assembly language a dynamic NOP is indicated by preceding an instruction with "d.":
The schematic notation is:
The extra vertical bar indicates an "empty" instruction preceding the add. In this example the subtract instruction is issued immediately since its dynamic NOP bit is not set. In the following machine cycle the add instruction is issued, and in the third cycle the next packet can be issued. Here is a functionally equivalent code sequence with no dynamic NOPs which takes the same amount of time to run (assuming no additional I-cache misses), but which has one extra packet:
A packet with dynamic NOPs takes two cycles to issue. The instruction (or instructions) issued in the first cycle is called the first issue of the packet, and the instruction issued in the second cycle is called the second issue. In the example above the subtract instruction is the first issue of the packet and the add is the second issue. A packet with no dynamic NOPs has only one issue.
Table 1 is a list of all legal combinations of dynamic NOPs in a packet and the equivalent packets with explicit NOPs.
---------------------------------------------------
Source Code Schematic Notation Execution Sequence
---------------------------------------------------
---------------------------------------------------
A branch which is predicted not taken is extended with ".n":
Branch prediction is done by the compiler.
In the MIPS architecture all branches have a single-instruction delay slot, which means that the instruction following the branch is always executed regardless of whether the branch is taken or not. TORCH branch instructions also have a delay slot, but only the issue following the branch issue is considered the delay slot. In other words, a branch takes effect after the issue following the branch and not necessarily after the first packet following the packet containing the branch. If the packet containing the branch and the following packet have no dynamic NOPs then the entire second packet is in the delay slot and will be executed regardless of the branch outcome. If, however, an instruction in the second packet has a dynamic NOP then that instruction is not in the delay slot. For example, the sequence
is equivalent to
and in both cases only the store instruction (sw) is in the delay slot of the branch (beq). As a second example, in the following sequence the delay slot has two NOPs and the subtract instructions are not in the delay slot:
Finally, in this last example only the add instruction is in the delay slot:
One of the instructions in the delay slot can be a branch or a jump:
Assuming that there are no dynamic NOPs the execution sequence of this program is:
The second jump is in the delay slot of the first jump. The first jump takes effect in the third cycle, but then the second jump takes effect in the fourth cycle and "cancels" the first jump. As a result, A5 and B5 are not executed. The exception handlers use this feature to return to the user's program (see subsection 2.8, Returning from an Exception).
Recall that the first instruction of a packet always has an even address. A branch to an odd address is allowed, but it has a special meaning. If there are no dynamic NOPs in the target packet then the second instruction of the packet (which must be a B-side instruction) is executed by itself. If, however, either instruction in the target packet has a dynamic NOP then the machine executes the second issue of the packet, which is not necessarily the second instruction. These semantics provide a way for exception handlers to return to the second issue of a packet without reexecuting the first issue (see subsection 2.8, Returning from an Exception for more details).
Here are some examples of odd-target branches:
Example 1: The target has no dynamic NOPs, so the second instruction is executed.
Example 2: The first instruction of the target has a dynamic NOP.
Example 3: Both instructions of the target have dynamic NOPs.
If the target packet of a branch or jump to an odd address has a single dynamic NOP (e.g. Example 2 above) then the machine stalls for one cycle before executing the target instruction. The stall cycle is necessary because the instruction might have to be moved to the opposite side of the machine. For instance instruction A3 of Example 2 could be a B-side instruction, but then the machine would need an extra cycle after fetching the target packet to move the instruction from the A side to the B side. There would be no stall cycle if the jump target address were t1, because in that case instruction B3 could execute immediately and instruction A3 could be moved to the correct side in time for the following cycle.
One of the central features of TORCH is the ability to execute an instruction before a branch it depends on, but to delay updating the sequential machine state with the results of the instruction until the branch result is known. Instructions which have been moved across a branch are called "boosted" instructions, and non-boosted instructions are called "sequential" instructions.
Boosting is implemented by keeping the results of boosted instructions in a shadow register file. When a branch occurs the contents of the shadow registers are either discarded, to squash the boosted instructions, or moved to the "sequential" (main) registers, to commit the boosted instructions. In the case of a store instruction the address and data are held in a shadow store buffer until the branch path is determined. The store buffer holds values which are waiting to be written to the data cache (or to main memory in the case of uncacheable references). Currently, up to four outstanding boosted stores can be held in the store buffer. The compiler enforces this limit, but if it is exceeded then an exception occurs(1).
Now recall that the TORCH compiler annotates each branch with the path which is most likely to be taken. Instructions are only allowed to be boosted from the branch path which is predicted to occur. Furthermore, in the current implementation an instruction can be boosted across only one branch. Boosted instructions can therefore be executed as follows. The machine executes a boosted instruction as if it were a normal instruction, but all results are saved in the shadow registers and shadow store buffer. Subsequent boosted instructions can read the results of previous boosted instructions from the shadow registers. When the next branch occurs, if it does not follow the predicted path then the machine squashes all of the boosted instructions by simply ignoring all of the shadow state. If the branch does follow the predicted path then all of the shadow registers that were written to by boosted instructions are exchanged with the corresponding sequential registers, so that the registers which formerly contained the shadow state now become sequential registers with the correct copy of the machine state (this operation is called "ponging"). Boosted stores which have been waiting in the store buffer are also allowed to begin writing to the data cache or main memory after the branch is taken.
Boosted ALU instructions always write to shadow registers, but they may read operands from either shadow registers or sequential registers depending on whether the operands are produced by a boosted instruction or a sequential instruction. In assembly language a shadow register is specified by appending ".b" to the register name:
In this case the add instruction reads its operands from the sequential register file and writes its result to the shadow register file. If the following branch is not taken, which in this case is the predicted path, then registers $2 and $2.b are ponged so that A4 and B4 (the instructions following the delay slot of the branch) see the result of the add in register $2. The example above is faster than but functionally equivalent to the following code:
The architecture does not prevent sequential instructions from reading shadow registers, but the result is unpredictable (problems can occur if any instruction causes an exception; see subsection 2.7 on Exception Handling).
Note that boosted instructions do not commit at jumps (which have no conditional). An instruction can be moved across a jump just by duplicating it (one copy of the instruction must appear in each path which leads to the target code).
The current implementation of TORCH only allows instructions to be boosted across one branch, but that is not a limitation of the architecture. Boosting across multiple branches can be implemented by associating a count with each boosted instruction. A boosted instruction with a count of two can commit only if the next two branches in the dynamic instruction stream both follow their predicted paths. However, because there is only one shadow register file only one level of boosting can be handled at a time per register. If, for example, an instruction boosted across one branch and an instruction boosted across two branches both write to the same shadow register before the next branch then the result is undefined. This is a minor limitation since the two instructions can write to different shadow registers.
Jump and branch instructions may not be boosted, except into the delay slot of another branch; in that case the boosted branch only takes effect if the first branch takes the predicted path. System calls and break instructions should not be boosted because they always cause exceptions, and as will become clear from the subsection on exception handling there is no performance gain from boosting an instruction which excepts. System coprocessor instructions which modify registers in the system coprocessor cannot be boosted since there are no shadow structures in the system coprocessor. The multiply and divide instructions use two special registers, "hi" and "lo", which do not have corresponding shadow registers. As a result, multiply and divide instructions can be boosted only if there is no sequential instruction using the hi and lo registers at the same time. Since the hi and lo registers are generally used for temporary results this constraint is not severe.
Suppose a boosted instruction cannot complete due to an exceptional condition, for instance a load instruction attempts to dereference a NULL pointer. If the next branch does not follow the predicted path then the boosted instruction should never have been executed and the "boosted exception" must be ignored (in fact, this case occurs frequently during the last iteration of a tightly-coded loop which traverses a linked list). At the time the exception is detected the machine sets a bit in the status register of the system coprocessor (the "boosted exception pending" or "BEP" bit), and then continues fetching and executing instructions normally. The instruction which excepted stores garbage as its result. Of course, any boosted instructions which depend on the instruction which excepted may not compute correct results.
When the machine executes the next branch instruction it must decide how to process the boosted exception(2). If the branch does not follow the predicted path then the results of the pending boosted instructions can be discarded, and the machine ignores the exception. But if the branch does follow the predicted path then the exception must be processed, and assuming that the exception handler does not terminate the program, all of the boosted instructions executed since the last branch must be reexecuted (since some of those instructions may depend on the excepting instruction). TORCH causes these actions to be performed by clearing the boosted registers and the store buffer and then calling the boosted exception handler. The exception handler is called during the MEM cycle of the delay slot instruction, before the delay slot instruction completes.
The boosted exception handler is actually quite simple. It records the PC at the time of the exception (the delay slot PC), and then returns---not to the branch which caused the exception, but to a special boosted exception routine at a fixed location in the user's program. The delay slot PC is passed to this routine.
The boosted exception routine is generated by the TORCH compiler and executes in user mode, unlike the boosted exception handler which executes in system mode and is part of the operating system. Based on the delay slot PC, the routine jumps to a piece of code which contains a copy of the delay slot instruction and all of the boosted instructions since the last branch, but none of the other sequential instructions. However, when the boosted instructions are copied (at compile time) they are converted to sequential instructions---since the true sequential instructions have already completed, the boosted instructions can operate directly on the sequential register set. The block of copied code ends with a jump to the predicted path of the branch which caused the boosted exception handler to be called.
Now whatever happened to the exception the machine was supposed to handle? When the machine executes the sequential copy of the boosted instruction which excepted, if the exception occurs again then the machine can handle it as a normal sequential exception. The appropriate exception handler can be called immediately since the branch condition on which the boosted instruction depended has already been resolved. It is also possible that the exception may not reoccur. In either case, once the instructions in the copied code have all successfully completed, the jump at the end of the block causes a return to the correct instruction in the main program. This mechanism allows the boosted exceptions to be handled after the completion of the sequential instructions which logically precede them---and that is one of the requirements of a precise interrupt mechanism. Furthermore, the exception handlers only have to work on sequential instructions; there are no special cases for boosted instructions.
Here is an example to illustrate how the boosted exception handler works. Consider the following code:
Remember that the ".b" notation means an instruction is boosted, and the ".n" notation means the compiler predicted a branch will not be taken. The numbers at the left are instruction numbers (PCs). Now suppose the add instruction causes an overflow and the branch is not taken. Here is the sequence of events:

So far the discussion has focused on boosted exceptions, but there are some complications for sequential exceptions too. During the MEM pipeline stage of a sequential instruction which excepts, the appropriate exception handler is called. However, before the handler gets control the machine discards all of the values in the boosted registers and the store buffer (by clearing bits which indicate which boosted registers have valid data). The boosted state must be cleared to prevent that state from being committed if the exception handler executes a branch (and most exception handlers do!) Alternatively the boosted state could be saved and restored, but significant extra hardware would be required. The bottom line is that in our implementation all boosted state is lost when a sequential instruction excepts.
When the exception handler returns by executing an RFE instruction, the machine checks state bits in the status register to see if any boosted instructions were pending before the exception. If so, the machine sets the BEP bit and continues execution. At the next branch the same sequence of events as for a boosted exception occurs---if the branch follows the predicted path then the boosted exception handler is called and the copied code is executed. In this case the copied code must be executed only to recompute the results of the boosted instructions, not to handle a true boosted exception.
Figure 2 summarizes the exception handling mechanism, including a special case if the excepting instruction is in the delay slot. The problem with a sequential exception in the delay slot is that any boosted instructions which were committed by the branch instruction must be squashed before the exception handler is called (since the delay slot instruction logically precedes the boosted instructions), and then when the exception handler returns the results of the boosted instructions must be recomputed. But this case is different then other sequential exceptions because the branch has already completed, so it would be incorrect to simply set the BEP bit. It is also necessary to keep a reecord of whether the branch is predicted incorrectly. This is because there could be a boosted instruction in the delay slot that needs to be squash when it is re-executed.
While the copied code technique does allow TORCH to support precise interrupts, it also increases the size of binaries substantially. The dispatch table used to find the address of the copied code must contain one entry for every even PC in the program code, and each entry is a 32-bit address, so the table is 2/5 the size of the program(3). A hash table could also be used, but most programs have a relatively high proportion of branches so a hash table would likely have nearly the same size as a dispatch table. On top of the space requirements of the dispatch table, all boosted instructions and some delay slot instructions must be duplicated. Although the increase in program size does increase the amount of disk space required to store binaries, we expect it to have little impact on cache and machine performance since the copied code and dispatch table are only accessed when exceptions occur.
The RFE instruction restores the previous status register context bits (including the kernel/user mode bit), and the two jumps cause a return to the user's program. Note that the second jump is in the delay slot of the first; this sequence is necessary since the instruction at EPC might be in a delay slot, in which case the instruction at EPCN might not be the following instruction in the program text.
EPC and EPCN may contain odd addresses if the excepting instruction is in a packet which contains a dynamic NOP. The semantics of odd addresses allow the second issue of a packet to be executed without reexecuting the first issue, which may have successfully completed before the exception occurred. For example, suppose instruction A1 excepted in the following sequence:
Then the machine would set EPC to t1+1 and EPCN to t1+2. When the exception handler returned to t1+1 the machine would only execute instruction A1, since t1+1 is an odd address and the second issue of the packet is A1; B1 would not be executed again. It would be incorrect to set EPC to t1 because instruction B1 would execute and complete both before and after the exception. As a second example, if instruction B1 excepted instead of A1 then EPC would be set to t1 and EPCN to t1+1. Since A1 follows B1 in the pipeline, EPCN should point to A1 and not to A2 (which is two cycles behind B1 in the pipeline). Finally, if A2 or B2 excepted then EPC would be set to t1+2 and EPCN would be set to t1+4. No odd addresses would be necessary since A2 and B2 execute simultaneously; if one of the two excepted, then the other would not complete either. In summary, the general rule is that EPC points to the issue containing the excepting instruction and EPCN points to the next issue in the pipeline.
The break and syscall instructions are special because they always cause an exception, but they must not be reexecuted when the exception handler returns. So, the exception handler returns to EPCN instead of EPC by using the following code sequence:
However, this return sequence has the side-effect that the A-side instruction in the packet with the syscall or break will never complete unless one of the two sides has a dynamic NOP. For example, in the following code sequence instruction A1 never completes since the syscall handler returns to instruction A2:
This code sequence, however, is correct:
Since A1 is one pipeline cycle behind the syscall, EPCN points to A1 and it gets executed after the syscall returns.
Note that instructions should not be boosted into a basic block containing a system call. There is no harm in doing so, but there is no performance gain either. Since the system call causes an exception, any boosted instructions in the same basic block must be reexecuted after the next branch (if the branch follows the predicted path).

The operating system runs in TORCH mode. A user program can contain instructions of only one type, MIPS or TORCH. To execute a MIPS binary the operating system sets the top bit of a three-bit "mode" stack in the status register and branches to the user program. The machine returns to TORCH mode when the program causes an exception (e.g. a system call), in which case a context switch occurs and the previous bit on the mode stack determines the new execution mode.
The instruction cache is an 10 kByte cache with virtual tags and virtual indexes. The cache accesses the TLB only in the case of a miss. Since the tags are virtual page coloring is not necessary.
The TORCH page size is 4 kBytes. However, main memory for program text (instructions) is allocated in units of five pages, or 20 kBytes. This allocation scheme ensures that the instructions in a 40-byte instruction block are always contiguous in main memory. Since 40 is an even divisor of 20k (but not of 4k), an integral number of instruction blocks fit into a 20 kByte chunk of memory. As a result, the hardware which fills the instruction cache does not need to handle a special case if the instruction block crosses a page boundary. There are no restrictions on the placement of program data in memory, other than the page coloring requirement.

-----------------------------------------------------------------------------------
Class and Side Opcode Description
-----------------------------------------------------------------------------------
ALU (both sides) ADDI, ADDIU, ADD, ADDU, SUB, SUBU add and subtract
SLTI, SLTIU, SLT, SLTU set on less than
ANDI, ORI, XORI, AND, OR, XOR, NOR bitwise boolean operations
LUI load upper immediate
Shift (A side) SLL, SRL, SLLV, SRLV logical shifts
SRA, SRAV arithmetic shifts
Multiply/Divide MULT, MULTU integer multiply
(A side) DIV, DIVU integer divide
MFHI, MFLO, MTHI, MTLO move from/to HI and LO regis
ters
Jump/Branch J, JR jump
(A side) JAL, JALR jump and link
BEQ, BNE, BLEZ, BGTZ, BLTZ, BGEZ branch
BLTZAL, BGEZAL branch and link
Load/Store LB, LBU, LH, LHU, LW, LWL, LWR load
(B side) SB, SH, SW, SWL, SWR store
Special SYSCALL system call
(B side) BREAK breakpoint
Coprocessor RFE return from exception
(B side) TLBR, TLBWI, TLBWR, TLBP TLB operations
LWCz, SWCz load/store word to/from copro
MTCz, MFCz, CTCz, CFCz cessor
COPz move to/from coprocessor
BCzT, BCzF coprocessor operation
branch on coprocessor
-----------------------------------------------------------------------------------
---------------------------------------------- A side B side ---------------------------------------------- ALU operations ALU operations multiply and divide load and store shift operations system call and break jump and branch system coprocessor calls ----------------------------------------------

Table 4 shows how the branch prediction information is encoded. Note that TORCH branch instructions which are predicted taken have encodings identical to those of the correpsonding MIPS branch instructions, and TORCH branches which are predicted not taken are encoded with unused MIPS opcodes.
---------------------------------------- Branch Opcode Bits 31-26 Bits 20-16 ---------------------------------------- BEQ.T 000100 rt BEQ.N 010100 rt BNE.T 000101 rt BNE.N 010101 rt BLEZ.T 000110 00000 BLEZ.N 010110 00000 BGTZ.T 000111 00000 BGTZ.N 010111 00000 BLTZ.T 000001 00000 BLTZ.N 000001 01000 BGEZ.T 000001 00001 BGEZ.N 000001 01001 BLTZAL.T 000001 10000 BLTZAL.N 000001 11000 BGEZAL.T 000001 10001 BGEZAL.N 000001 11001 BCz.T bit 22 is 0 BCz.N bit 22 is 1 ----------------------------------------
--------------------------------------------------------------------------
Register Name Description
--------------------------------------------------------------------------
$0 and $0.b Read-only registers containing zero
$1-$31 General-purpose sequential registers
$1.b-$31.b General-purpose shadow registers
HI and LO Result registers for integer multiply and divide
Status Status and mode bits (see Figure 5)
Cause Description of last exception (identical to the MIPS R2000
Cause Register)
EPC PC of instruction which excepted
EPCN PC of instruction following the instruction which excepted
BadVAddr Virtual address which caused last addressing exception
Context For use by TLB miss handler
EntryHi High word of TLB entry
EntryLo Low word of TLB entry
Index TLB entry index
Random random TLB entry index
--------------------------------------------------------------------------
The registers associated with the TLB and address translation are identical to the corresponding registers in the R2000 architecture.
------------------------------------------------------- Bit Number Description ------------------------------------------------------- 0-31 MIPS instruction encoding 32-33 rd (destination register) boosting level 34-35 rt (second source register) boosting level 36-37 rs (first source register) boosting level 38 dynamic NOP bit 39 reserved for future use -------------------------------------------------------
-------------------------------------------------------------------------------
Bit Number Name Description
-------------------------------------------------------------------------------
0 IEc Interrupt enable for current context (set to 1 to enable)
1 KUc Kernel/user mode for current context (set to 1 for user
mode)
2 IEp Interrupt enable for previous context
3 KUp Kernel/user mode for previous context
4 IEo Interrupt enable for old context (prior to previous)
5 KUo Kernel/user mode for old context
6-7 not used
8-9 Sw0-1 Software interrupt mask bits
10-15 Hw0-5 Hardware interrupt mask bits
16 BSc Boosted code executed in current context (the hardware
sets this bit when a boosted instruction commits and
clears it when a branch commits)
17 BSp Boosted code executed in previous context
18 BSo Boosted code executed in old context
19 SBp Squash bit in old context
20 SBo Squash bit in old context
21 TS TLB shutdown (set to 1 by processor if TLB failure is
detected)
22 BEV Bootstrap exception vector (if set to 1, the machine uses
an alternate set of exception vectors)
23 MPc MIPS mode for current context (set to 1 for MIPS mode)
24 MPp MIPS mode for previous context
25 MPo MIPS mode for old context
26 BEP Boosted exception pending (the hardware sets this bit
when a boosted instruction excepts and clears it when a
branch commits)
27 not used
28-31 Cu0-3 Cux is set to 1 by the hardware if coprocessor x is usable
-------------------------------------------------------------------------------
