We are going to EMULATE a processor where all 4 components (fetch, decode, execute and store) are running at the same time.

computer science

Description

Assignment 4 - pipelines

 

Introduction

We are going to EMULATE a processor where all 4 components (fetch, decode, execute and store) are running at the same time. This is called pipelining. In order to do so, we need to add some additional hardware.

Each of the four components needs to interact with the previous component. For example – Decode needs to get its data from Fetch. In assignment 3, Decode could only run after Fetch, so we knew that there was something to Decode. In assignment 4, though, they are running at the same time. It is possible that Fetch isn’t done with its work and Decode is ready for more. So we need to put some signaling mechanism between the components so that we can ensure that they are only doing work when the previous phase is complete.

Additionally, you must remember that once a component signals that it is done and the next one can take over, the producing component (Fetch, from our example above) doesn’t overwrite the value between itself and the next component until that component is done.

To solve these problems we will use a technique that is used in graphics – double buffering. We will have 2 data areas between our components – one Component can be writing to one while the other Component is reading from the other. https://en.wikipedia.org/wiki/Multiple_buffering#Double_buffering_in_computer_graphics

Instead of having a single data item that we use to transfer data between two Components, we will have two data items. See diagram below:


 

 

With two different currentInstruction registers, Fetch can write to currentInstruction1 while Decode reads from currentInstruction2 and vice versa. We use 2 boolean values on each currentInstruction to indicate who can access them.

This pattern should be used all the way through the system. For example – between Decode and Execute, in non-pipelined, you had OP1 and OP2. Now you need 2 OP1s and 2 OP2s. Since they always work in pairs, you can reduce the Booleans by treating them as pairs:  {OP1, OP2} and {OP1, OP2}.

You need to make the Components only do work when called if they have input (Fetch will always have input) and they have a place to put their work (an output); Store doesn’t require an output.

Also note – previously, we used CurrentInstruction as a shared variable – Fetch set it and every other Component used it. Now, we could have 4 CurrentInstructions at a time. So we need to consider those to be input to every Component.

Once this work is complete, you should now be able to run the main loop (Fetch, Decode, Execute, Store) in different orders. In some orders of running, little or no work may happen at first (filling the pipeline).

There are a few remaining things to consider. With the ability to have multiple instructions running at the same time, we introduce the ability to have errors based on hazards. For the sake of this assignment, we can ignore memory related hazards and only focus on the register related hazards.

Consider this program:

A: MOVE 10 R1

B: ADD R1 R1 R2

C: ADD R2 R2 R3

When instruction C is DECODED, instruction B may well be executing (or even waiting to execute).

As a result, C will set OP1 and OP2 to 0 because R2 hasn’t been calculated or stored yet.

There are two ways to fix this:

1) Stall – when decoding, if the output register from the PREVIOUS instruction is the INPUT instruction to the next instruction, wait until the instruction has finished executing AND storing. This has some trickiness to it – we have to look 2 stages (Decode -> Store) into the “future” to see when the store has completed.

2) Register Forwarding – when executing, if the previous instruction OUTPUTTED a register that is used as INPUT to this instruction, replace OP1 or OP2 with the output value from the previous instruction. In the case of the little program above – when C starts to execute, it looks at the current instruction and notes that it is using R2. R2 was the output from the previous instruction (B), so it replaces OP1 and OP2 with the value from the last execute.

For this assignment, Register Forwarding makes the most sense, so please implement that.

The final thing to consider is branching. We don’t know if the branch will be taken or not until we get to Execute. So if the branch IS taken, we need to go back and invalidate everything in the pipeline.


Related Questions in computer science category