3rd Seminar on Science and Technology, 10-11 August 2004, Kota Kinabalu, Sabah
A NEW 16-BITS RISC PROCESSOR ARCHITECTURE:
CONTROLLER STATE MACHINES AND FUNCTIONAL VERIFICATION
USING VERILOG™ HDL
Ismail Saad, Pukhraj Vaya, Abu Bakar A.R, Wan Hoong Wai
School of Engineering and Information Technology University Malaysia Sabah, Locked Bag 2073, 88999 Kota Kinabalu, Sabah, Malaysia
Tel: +60-8-832-0000 x 3147/3066, Fax: +60-8-832-0348
ABSTRACT
This paper presents the design and simulation of a new 16-bit RISC microprocessor architecture, with an
emphasis on the Controller State Machine (CSM) model and on verification of the processor's functionality
using the Verilog Hardware Description Language (HDL). The processor system consists of ROM, RAM, I/O
and a CPU. The CPU module is merely a shell that instantiates the real processor definition in the
cpu_core.v, control.v, datapath.v and alu.v modules. The design and verification of the CSM, which is the
core mechanism of the control unit, are described in detail. The processor offers 36 instruction types.
Functional verification of the processor is carried out with the VCS™ (Verilog Compiler Simulator)
simulator by executing all 36 instructions, four of which are discussed in this paper.
Key words: Verilog HDL, RISC, Datapath, Behavioural Model, VCS Simulator
1.0 Introduction
Microprocessors are not limited to personal computers; they are also used in specific fields such as
robotics, communications and control systems [1-5]. However, the conventional process of designing very
large scale integrated circuits, such as a new microprocessor for a specific application, is complicated,
time-consuming and prone to human error. We have therefore adopted a design methodology based on the
Verilog Hardware Description Language (HDL) for our new 16-bit RISC microprocessor architecture.
Verilog simplifies the design process by allowing the designer to describe the design at a high level of
abstraction (behavioral and register-transfer level) [8-10]. The design can be tested by simulation before
it is sent off for fabrication, which saves cost and time [6-7]. However, the success of such a new
processor depends mainly on the accurate design of the state controller in the system control unit
[12-14]. In this paper, we present the design and simulation of a new 16-bit processor architecture based
on an HDL design methodology using Verilog, with an emphasis on the design and verification of the
controller state machine and of the processor functionality.
2.0 Processor Architecture
The new 16-bit RISC processor has a multiplexed 16-bit data and address path. Instructions have variable
length: instructions that operate on registers only take one word, while register/memory and
register/immediate instructions take two words. The 16-bit instruction word consists of a 2-bit mode
field, 1 bit each for the set condition (set_bit) and test condition (test_bit), a 3-bit ALU function
(ALU_func), and 3 bits each for the destination register (Rd), source register 1 (Rs1) and source
register 2 (Rs2), giving 2 + 1 + 1 + 3 + 3 + 3 + 3 = 16 bits. The processor can execute 36 instructions,
which are grouped into two types: arithmetic/logical and load/store. The processor has six registers:
three general-purpose registers (R1, R2, R3) and three dedicated registers, namely the PC (Program
Counter), IR (Instruction Register) and DR (Direct Register). In addition, a dummy register R0, which
always reads zero, is included in the register file, following the convention of RISC architectures [15].
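As a sketch, the instruction word can be split into its fields with continuous assignments. The exact bit positions are an assumption: Fig. 2 places ModeBit in the most significant bits, followed by the opcode (set_bit, test_bit, ALU_func), Rd, Rs1 and Rs2, and with this ordering the instruction 4040 (hex) from Section 4.2 decodes as mode 01 (register-immediate) with Rd = R1 and Rs1 = R0. The order of set_bit and test_bit inside the opcode, and the module and port names, are hypothetical.

```verilog
// Hypothetical field decoder for the 16-bit instruction word.
// Assumed layout: [15:14] ModeBit, [13] set_bit, [12] test_bit,
//                 [11:9] ALU_func, [8:6] Rd, [5:3] Rs1, [2:0] Rs2.
module ir_decode (
    input  wire [15:0] ir,        // instruction register contents
    output wire [1:0]  mode_bit,  // 00: reg-reg, 01: reg-imm, 10/11: load/store
    output wire        set_bit,   // set-condition flag
    output wire        test_bit,  // test-condition flag
    output wire [2:0]  alu_func,  // 3-bit ALU function
    output wire [2:0]  rd,        // destination register (only R0-R3 exist)
    output wire [2:0]  rs1,       // source register 1
    output wire [2:0]  rs2        // source register 2
);
    assign mode_bit = ir[15:14];
    assign set_bit  = ir[13];
    assign test_bit = ir[12];
    assign alu_func = ir[11:9];
    assign rd       = ir[8:6];
    assign rs1      = ir[5:3];
    assign rs2      = ir[2:0];
endmodule
```

Under the same assumption, c00b (hex) from Fig. 4.2 decodes with mode 11, which matches the load/store path through Fetch2.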
3. Processor Verilog Module Systems
The top module of the processor system is defined in system.v. It consists of the CPU, 256 words of ROM
(addresses 0-255), 256 words of RAM (addresses 256-511), an I/O module consisting of a bank of 16
switches (mapped at address 512) and a bank of 16 LEDs (mapped at address 513), a transparent address
latch that stores the address, and a decoder module that selects the ROM, RAM or I/O module. The next
module in the hierarchy, cpu.v, is merely a shell that simulates the pad ring and instantiates the real
processor definition in the cpu_core.v module. The cpu_core.v module instantiates the control.v and
datapath.v definitions. Finally, the datapath.v module instantiates the ALU definition in alu.v, as shown
in Fig. 1. A monitor.v module is also written to monitor the activity of the processor design. The
control.v, datapath.v, monitor.v and alu.v modules include an opcodes.v file, which contains the
definitions of the operation codes and operands of the processor architecture.
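The memory map above can be decoded combinationally from the latched address. The sketch below assumes a 16-bit latched address and one active-high select line per device; the module and signal names are hypothetical and are not taken from the authors' decoder module.

```verilog
// Hypothetical address decoder for the system memory map:
// ROM 0-255, RAM 256-511, switches at 512, LEDs at 513.
module addr_decode (
    input  wire [15:0] addr,     // address held by the transparent address latch
    output wire        rom_sel,  // 0x0000-0x00FF
    output wire        ram_sel,  // 0x0100-0x01FF
    output wire        sw_sel,   // 0x0200: bank of 16 switches
    output wire        led_sel   // 0x0201: bank of 16 LEDs
);
    // The upper byte identifies each 256-word block.
    assign rom_sel = (addr[15:8] == 8'h00);
    assign ram_sel = (addr[15:8] == 8'h01);
    // The I/O devices occupy single addresses.
    assign sw_sel  = (addr == 16'h0200);
    assign led_sel = (addr == 16'h0201);
endmodule
```

With this map, address 259, used by the store and load tests in Section 5, selects the RAM.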
Fig.1: Verilog Module Structure for Processor (cpu.v → cpu_core.v → control.v and datapath.v; datapath.v → alu.v)
4. Processor Control Unit Design
The control unit is the core of the microprocessor. It accepts as input the signals needed to operate the
controller and provides as output all the control signals necessary to carry out each operation. The two
main functions of the control unit are therefore to execute operations in the proper sequence by means of
the CSM, and to interpret the instruction words and generate the control signals that cause each
instruction to be executed. Our control unit consists of a 16-bit Instruction Register (IR), a 1-bit Zero
Flag register, the Controller State Machine with its memory-cycle sub states, and the generated control
signals, as illustrated in Fig.2.
Fig.2: Processor Control Module Architecture [block diagram: IR fields (ModeBit, Opcode, Rd, Rs1, Rs2); zero_flag_reg; state (0: Fetch1, 1: Execute, 3: Fetch2); sub_state (0: address_setup, 1: address_hold, 3: data_setup, 2: data_hold); control outputs including TrisPC, TrisALU, TrisRs2, TrisRd, nTrisRd, PC_inc, Rs2_sel, WriteR1-WriteR3, ReadPC_1, ReadR0_1-ReadR3_1, ReadPC_2, ReadR0_2-ReadR3_2, LoadDR, WritePC, LoadPC, ALUfunc, setbit, testbit and Zero]
4.1 Controller State Machine
The CSM has three states, Fetch1 (00), Fetch2 (11) and Execute (01), which are encoded in Gray code.
The CSM is based on the Mealy machine model, as described in [1, 14]. The state transitions are shown in
the state diagram in Fig.3.
Fig.3: Controller State Machine State Diagram [states Fetch1 (00), Fetch2 (11) and Execute (01); edges labelled (data_hold, mode bits): Fetch1→Fetch1 (TRUE, 00 or FALSE, 00/01); Fetch1→Execute (TRUE, 01); Fetch1→Fetch2 (TRUE, 10/11); Fetch2→Fetch2 (FALSE, 10/11); Fetch2→Execute (TRUE, 10/11); Execute→Execute (FALSE, XX); Execute→Fetch1 (TRUE, XX)]
In addition, the CSM has four memory-cycle sub states: address_setup (00), address_hold (01),
data_setup (11) and data_hold (10). Whether the memory cycle is in the data_hold sub state, together with
the 2-bit mode field of the instruction, determines the transition from one state to another.
Referring to Fig. 3, TRUE or FALSE indicates whether the sub state is data_hold, the 2-bit value
(00, 01, 11, 10) is the value of the mode bits, and XX denotes a don't-care condition. The state
transitions are explained in detail below.
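The four sub states form a 2-bit Gray-code sequence (00, 01, 11, 10), so advancing one sub state per clock gives a four-clock memory cycle. The sketch below assumes that sub_state advances on every rising clock edge in the order listed above and is cleared by a synchronous reset; the module name and the clock and reset ports are hypothetical.

```verilog
// Sub-state encodings, matching the `define values in the control module.
`define address_setup 2'd0
`define address_hold  2'd1
`define data_setup    2'd3
`define data_hold     2'd2

// Hypothetical memory-cycle sequencer: one sub state per clock, in Gray-code
// order address_setup -> address_hold -> data_setup -> data_hold.
module sub_state_seq (
    input  wire       clock,
    input  wire       reset,      // synchronous, active high (assumed)
    output reg  [1:0] sub_state
);
    always @(posedge clock)
        if (reset)
            sub_state <= `address_setup;
        else
            case (sub_state)
                `address_setup: sub_state <= `address_hold;
                `address_hold:  sub_state <= `data_setup;
                `data_setup:    sub_state <= `data_hold;
                default:        sub_state <= `address_setup; // data_hold ends the cycle
            endcase
endmodule
```

Each step changes exactly one bit of sub_state, including the wrap from data_hold (10) back to address_setup (00).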
Fetch1 state case:
The Fetch1 state remains in its current state when data_hold is TRUE and the mode bits are 00. Fetch1
jumps to the Execute state when data_hold is TRUE and the mode bits are 01. For Fetch1 to jump to
Fetch2, data_hold must be TRUE and the mode bits must be 10 or 11. When data_hold is FALSE and the
mode bits are 00 or 01, the Fetch1 state remains in its current state.
Execute state case:
The Execute state remains in its current state while data_hold is FALSE, regardless of the mode bits
(XX). When data_hold is TRUE, the next state is Fetch1, again regardless of the mode bits.
Fetch2 state case:
Load and store operations use both the Fetch2 and Execute states. If data_hold is TRUE and the mode bits
are 10 or 11, the next state is Execute; if data_hold is FALSE and the mode bits are 10 or 11, the state
remains Fetch2.
In general, register-register instructions use only the Fetch1 state and take 4 clock cycles (one memory
cycle). Register-immediate instructions pass through Fetch1 and Execute and take 8 clock cycles (two
memory cycles). Load and store instructions, the longest, pass through Fetch1, Fetch2 and Execute and
take 12 clock cycles (three memory cycles). Hence, every instruction completes within at most 12 clock
cycles. A Gray-code style is used for the state assignment so that most transitions change only one
state bit: Fetch1 (00) ↔ Execute (01) and Fetch2 (11) → Execute (01). Only the Fetch1 → Fetch2
transition (00 → 11) changes both bits. Limiting the number of changing bits is intended to reduce
glitches while the state register switches.
The controller state machine is coded in Verilog with a case statement, as follows:
    // state and sub_state use the `define encodings listed in Section 4.2;
    // this case statement sits inside the controller's clocked always block.
    case (state)
      `Fetch1:
        if (sub_state == `data_hold && ModeBit == 2'b00)
          state <= `Fetch1;                  // register-register: fetch next instruction
        else if (sub_state == `data_hold && ModeBit == 2'b01)
          state <= `Execute;                 // register-immediate
        else if (sub_state == `data_hold && (ModeBit == 2'b10 || ModeBit == 2'b11))
          state <= `Fetch2;                  // load/store
        else if (ModeBit == 2'b01 || ModeBit == 2'b00)
          state <= `Fetch1;                  // memory cycle still in progress
      `Fetch2:
        if (sub_state == `data_hold && (ModeBit == 2'b10 || ModeBit == 2'b11))
          state <= `Execute;
        else if (ModeBit == 2'b10 || ModeBit == 2'b11)
          state <= `Fetch2;
      `Execute:
        if (sub_state == `data_hold)
          state <= `Fetch1;
        else
          state <= `Execute;
      default:
        state <= `Fetch1;                    // unused encoding 2'b10: recover to Fetch1
    endcase
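To check the cycle counts quoted above (4, 8 and 12 clocks), the case statement can be wrapped in a small self-contained module together with a sub-state counter. This is a sketch, not the authors' control.v: the module name and the clock and reset ports are hypothetical, ModeBit is driven directly as an input (in the processor it comes from IR), and sub_state is assumed to step through the four sub states once per clock. Because every branch of the case statement leaves the state unchanged outside data_hold, the sketch evaluates the transitions only when sub_state equals data_hold.

```verilog
// Sized versions of the state and sub-state encodings from the control module.
`define Fetch1        2'd0
`define Execute       2'd1
`define Fetch2        2'd3
`define address_setup 2'd0
`define address_hold  2'd1
`define data_setup    2'd3
`define data_hold     2'd2

// Hypothetical stand-alone CSM: sub-state counter plus the state transitions.
module csm (
    input  wire       clock,
    input  wire       reset,      // synchronous, active high (assumed)
    input  wire [1:0] ModeBit,    // mode field of the current instruction
    output reg  [1:0] state,
    output reg  [1:0] sub_state
);
    // One sub state per clock, so each memory cycle lasts 4 clocks.
    always @(posedge clock)
        if (reset)
            sub_state <= `address_setup;
        else
            case (sub_state)
                `address_setup: sub_state <= `address_hold;
                `address_hold:  sub_state <= `data_setup;
                `data_setup:    sub_state <= `data_hold;
                default:        sub_state <= `address_setup; // after data_hold
            endcase

    // State changes only at the end of a memory cycle.
    always @(posedge clock)
        if (reset)
            state <= `Fetch1;
        else if (sub_state == `data_hold)
            case (state)
                `Fetch1:  if (ModeBit == 2'b01) state <= `Execute; // register-immediate
                          else if (ModeBit[1])  state <= `Fetch2;  // load/store (10, 11)
                          // ModeBit 00: register-register, stay in Fetch1
                `Fetch2:  if (ModeBit[1])       state <= `Execute;
                `Execute:                       state <= `Fetch1;
                default:                        state <= `Fetch1;  // unused 2'b10
            endcase
endmodule
```

With ModeBit held at 11 the state returns to Fetch1 after 12 clocks, with 01 after 8 clocks, and with 00 it stays in Fetch1 with a new instruction every 4 clocks.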
4.2 Verification of the Controller State Machine
The CSM is verified by simulating the whole control unit together with the instructions stored in the
ROM. State transitions take place only when data_hold is TRUE. Excluding the self-loops, in which the
state does not change, three of the state transitions are discussed below.
The states and sub states of the processor are defined in the Verilog control module as follows:
`define Fetch1 0
`define Execute 1
`define Fetch2 3
`define address_setup 0
`define address_hold 1
`define data_setup 3
`define data_hold 2
Fig. 4.0: Fetch1 (0) to Execute (1) state transition
Fig. 4.0 shows that the transition from Fetch1 (0) to Execute (1) occurs when data_hold is TRUE
(sub_state = 2) and the mode bits equal 01. It also shows that the Execute operation (register and
immediate) takes 8 clock cycles, as denoted by the c1 to c2 range.
Fig. 4.1: Execute (1) to Fetch1 (0) state transition
Fig. 4.1 shows that the transition from Execute (1) to Fetch1 (0) occurs when data_hold is TRUE
(sub_state = 2); here the mode bits equal 00. At the first data_hold, a state transition occurs because the
previous instruction (4040) is an Execute operation, which takes 8 clock cycles, as denoted by the red
line in Fig. 4.1. The Fetch1 operation (register and register) is the shortest instruction type, taking
only 4 clock cycles, as denoted by the c1 to c2 range.
Fig. 4.2: Fetch1 (0) to Fetch2 (3) state transition
Fig. 4.2 shows that the transition from Fetch1 (0) to Fetch2 (3) occurs when data_hold is TRUE
(sub_state = 2) and the mode bits equal 11. The executed instruction (c00b) belongs to the longest
instruction type, used for store and load operations; it requires the Fetch2 (3) and Execute (1) states
and a total of 12 clock cycles, as denoted by the c1 to c2 range.
5. Processor Functionality Verification
The processor's functionality is verified for the basic operations, including arithmetic, logic and shift
operations. The architecture offers 36 instruction types. At the simulation level, the functionality is
verified through the timing diagrams of each module generated in the VCS simulator window. As examples,
four instructions are shown here: register + immediate, register + register, store and load.
5.1 Register + Immediate value operation test
Rd ← Rs1 Addi Imm // R1 ← R0 + 259 (103 hex);
This instruction verifies the add operation between the dummy register R0 (always zero) and the
immediate value 259; the result, 259, is stored into Register1 (R1). Details of the process are shown in
Fig.5.0.
Fig.5.0: Timing Diagram of Register + Immediate Operation
5.2 Register + Register operation test
Rd ← Rs1 Addr Rs2 // R3 ← R1 + R2;
This instruction verifies the add operation between registers. In this example, Register1 and Register2
are added and the result is stored into Register3: the value 259 (0103 hex) in Register1 is added to the
value 93 (005d hex) initially stored in Register2, and the result, 352 (0160 hex), is stored into
Register3. Details of the process are shown in Fig.5.1.
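The arithmetic in tests 5.1 and 5.2 can be reproduced with a plain 16-bit add: 0 + 259 = 0103 hex, and 0103 hex + 005d hex = 0160 hex (259 + 93 = 352). The sketch below models only the add operation and a zero output; it is not the authors' alu.v, the paper does not list the ALU function encodings, and the module and port names are hypothetical.

```verilog
// Hypothetical add-only slice of a 16-bit ALU with a zero output.
module alu_add16 (
    input  wire [15:0] a,     // e.g. contents of Rs1 (R0 reads as zero)
    input  wire [15:0] b,     // e.g. contents of Rs2 or the immediate word
    output wire [15:0] sum,   // result to be written to Rd
    output wire        zero   // 1 when the result is zero
);
    assign sum  = a + b;               // result wraps modulo 2^16
    assign zero = (sum == 16'h0000);
endmodule
```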
Fig.5.1: Timing Diagram of Register + Register Operation
5.3 Store operation test
mem[Rs1 + Imm] ← Rd // mem[R0 + 259] ← R1;
This instruction verifies the store operation. In this example, the instruction stores the contents of
Register2 into memory location [259]. Register0 (R0) is a dummy register that is always 0, so the
effective address is 0 + 259 = 259. When the Write signal is enabled, the contents of Register2 are stored
into memory address [259]. Details of the process are shown in Fig.5.3.
After the store operation, the content of memory location [259] equals 10, as shown in Fig.5.4.
Fig.5.3: Timing Diagram of Store Operation
Fig.5.4: Interactive Display of Memory Content
5.4 Load operation test
Rd ← mem[Rs1 + Imm] // R2 ← mem[R0 + 259];
This instruction verifies the load operation. In this example, the content of memory location [259] is
loaded into the destination register, Register2. Details of the process are shown in Fig.5.5.
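Together, the store and load tests form a round trip through RAM at the effective address Rs1 + Imm = R0 + 259 = 259, which lies in the RAM range 256-511. The sketch below is a hypothetical behavioural RAM, not the authors' RAM module: it assumes a synchronous write, an asynchronous read, indexing by the low eight address bits, and a tri-stated output when the RAM is not selected. The test stores the value 10 reported in Fig. 5.4.

```verilog
// Hypothetical 256-word RAM mapped at addresses 256-511.
module ram256 (
    input  wire        clock,
    input  wire        write,   // store: write wdata on the rising edge
    input  wire [15:0] addr,    // effective address Rs1 + Imm
    input  wire [15:0] wdata,   // store data from the source register
    output wire [15:0] rdata    // load data for the destination register
);
    reg  [15:0] mem [0:255];
    wire        sel = (addr[15:8] == 8'h01);   // 256-511 selects this RAM

    always @(posedge clock)
        if (write && sel)
            mem[addr[7:0]] <= wdata;

    // Asynchronous read; high impedance when another device is addressed.
    assign rdata = sel ? mem[addr[7:0]] : 16'hzzzz;
endmodule
```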
Fig.5.5: Timing Diagram of Load Operation
6. Conclusions
A new 16-bit processor architecture has been designed with an HDL-based methodology and simulated
completely with the Synopsys VCS simulator to verify the processor's functionality. The success of the
processor depends on the state controller of the system. As presented in this paper, the controller state
machine is designed to control the state transitions, and its functionality was verified by executing all
36 instructions, four of which were explained in detail as test-bench cases.
7. References
[1] D. D. Gajski, Principles of Digital Design, Prentice Hall, 1997.
[2] M. Zwolinski, Digital System Design with VHDL, Prentice Hall, 2000.
[3] D. A. Patterson and J. L. Hennessy, Computer Organization and Design: The Hardware/Software Interface, Morgan Kaufmann, 1999.
[4] M. Morris Mano, Digital Logic and Computer Design, Prentice Hall, 1997.
[5] G. H. Miller, Microcomputer Engineering, 2nd edition, Prentice Hall, 1998.
[6] W. J. Dally and A. Chang, The Role of Custom Design in ASIC Chips, Proceedings of the 37th Conference on Design Automation, ACM Press, pp. 643-647, 2000.
[7] M. J. Flynn and R. I. Winner, ASIC Microprocessors, Proceedings of the 22nd Annual International Workshop on Microprogramming and Microarchitecture, ACM Press, pp. 237-243, 1989.
[8] S. Palnitkar, Verilog HDL: A Guide to Digital Design and Synthesis, Prentice Hall, 1995.
[9] D. Lioupis, A. Papagiannis and D. Psihogiou, A Systematic Approach to Software Peripherals for Embedded System, Proceedings of the Ninth International Symposium on Hardware/Software Codesign, ACM Press, pp. 14-145, 2001.
[10] J. C. Diaz, P. Plaza, L. A. Merayo, P. Scarfone and M. Zamboni, Design and Validation with HDL of a Complex Input/Output Processor for an ATM Switch: the CMC, Verilog HDL Conference, Proceedings, pp. 67-71, 1995.
[11] A. E. Mahdi and I. A. Grout, PLL Based ASIC System for DSP Real-Time Analogue Interface, www.ece.ul.ie/hompage/ian_grout/publications.html, 2002.
[12] M. G. Arnold, T. A. Bailey, J. R. Cowles, J. J. Cupal and A. W. Wallace, A Purely Data Structure for Accurate High Level Timing Simulation of Synchronous Designs, Verilog HDL Conference, pp. 101-107, 1994.
[13] O. Hebert, I. C. Kraljic and Y. Savaria, A Method to Derive Application-Specific Embedded Processing Cores, International Conference on Hardware Software Codesign, San Diego, California, United States, ACM Press, pp. 88-92, 2000.
[14] S. Golson, State Machine Design Techniques for Verilog and VHDL, Synopsys Journal of High-Level Design, pp. 1-20, September 1994.
[15] D. A. Patterson and C. H. Sequin, RISC I: A Reduced Instruction Set VLSI Computer, International Symposium on Computer Architecture (selected paper), Spain, pp. 216-230, 1998.