Picking up from where I left off last time, the next concern is that we would be increasing the number of adders in an effort to improve the multiplication instruction. Given the “serialized” adder described in the last log, this is not really a problem from a transistor-count standpoint.
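The serialized adder itself is described in the previous log rather than here, but the basic idea can be sketched in software: a single full-adder stage is reused once per bit position over successive cycles, so the transistor cost stays roughly constant regardless of word width. This is a minimal sketch of that behavior, not the actual circuit; the function name and the 8-bit default width are my own assumptions.

```python
def serial_add(a: int, b: int, width: int = 8) -> int:
    """Bit-serial addition: one full-adder stage reused for each bit
    position, trading cycles for transistor count."""
    carry = 0
    result = 0
    for i in range(width):
        bit_a = (a >> i) & 1
        bit_b = (b >> i) & 1
        total = bit_a + bit_b + carry   # one full-adder step per cycle
        result |= (total & 1) << i      # latch the sum bit
        carry = total >> 1              # carry feeds the next cycle
    return result & ((1 << width) - 1)  # wrap at the word width
```

The `width` loop is the part that would be clock cycles in hardware; a parallel (ripple-carry) adder spends transistors on one stage per bit instead.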

At this point, it is time to ask a new question: What do those adders do when not doing multiplication?

Before answering that question, I want to address an expected criticism. This is in fact a hobby undertaking, and the efficiency of the design really is a low-priority consideration. The processor that I am attempting to build is not intended to compete with commercial offerings. That said, if I am going to solder up however many thousand transistors this thing ends up requiring, I want to get as much performance out of it as I can. The easiest way to do that is to make it as efficient[1] as possible.

When I think of having multiple simultaneously usable instructions I immediately think of concurrent processing. There are several forms of concurrent processing:

I remembered reading about Intel’s Hyper-threading technology and how it could simulate multiple processors. I now know that Hyper-threading is actually an implementation of simultaneous multi-threading (SMT).

Assuming I stuck with these “serial” adders that I described in my previous post, unused adders could be used to perform other operations simultaneously. The only drawback is that SMT makes heavy use of microcode to keep the various instructions in the processor working simultaneously. In order for concurrency to be realized, one of a few things has to happen:

  1. The code being executed has to be written in such a way as to maximize concurrency. This is necessary when using a concurrent flag on the instructions themselves[2].
  2. The microcode has to be sufficiently advanced to keep the processor’s instructions busy. I can imagine a couple of different ways to do this, but I do not believe this to be a trivial undertaking.
  3. There is a third way that is essentially a combination of the first two that implements some concurrency logic in microcode but still has the same software requirements as the first option.

If the first option is chosen the hardware needs some modifications to support this functionality. The complexity of these changes is dependent on other aspects of the design and implementation of the processor. The other drawback to this option is that software that has not been optimized for this design will not realize any advantage. The second option will drastically impact the complexity of the processor’s microcode (so far as I can tell anyway), but will likely provide the greatest realized advantage. The third option is, again, a combination of the first two.
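To make the first option concrete, here is a sketch of what a per-instruction concurrency flag might look like. The 16-bit word layout, field widths, and function names are all invented for illustration; nothing here is from an actual design.

```python
# Hypothetical 16-bit instruction word: top bit is the concurrency flag,
# then a 5-bit opcode (bits 10-14) and a 10-bit operand (bits 0-9).
CONCURRENT_BIT = 15

def encode(opcode: int, operand: int, concurrent: bool) -> int:
    """Pack an instruction word, setting the flag if the compiler (or
    programmer) has determined this instruction can pair with the next."""
    word = ((opcode & 0x1F) << 10) | (operand & 0x3FF)
    if concurrent:
        word |= 1 << CONCURRENT_BIT
    return word

def can_pair(word: int) -> bool:
    """Hardware-side check: may this instruction issue alongside the next?"""
    return bool((word >> CONCURRENT_BIT) & 1)
```

The point of this scheme is that the pairing decision is made in software ahead of time, so the hardware only needs to test a single bit; the cost, as noted above, is that unoptimized code never sets the flag and gains nothing.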

At this point you may have noticed that microcode has become a recurrent theme. Keep that in mind as I move to the next topic: instruction pipelines. The last time I tried to explain an instruction pipeline to somebody I am pretty sure that I somehow knew less about it when I was done (which is bad, because I know incredibly little about it to start with). For both our sakes, I will simply refer you to the following resources:

Pay particular attention to that last source, specifically this: “Unlike earlier microcoded machines, the first RISC machines had no microcode.” I did not know that at the time, but I was familiar with the concept of pipelines as well as the fetch-decode-execute cycle.
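For anyone who wants the fetch-decode-execute cycle in a form they can poke at, here is a toy version of the loop. The instruction set (tuples of opcode, destination, source) and the register file are invented for this sketch and do not reflect any particular machine.

```python
def run(memory, registers, pc=0):
    """Minimal fetch-decode-execute loop for a toy two-operand machine.
    Instructions are (opcode, dest, src) tuples; the opcodes are made up."""
    while pc < len(memory):
        instr = memory[pc]        # fetch the next instruction word
        op, dest, src = instr     # decode it into its fields
        if op == "ADD":           # execute
            registers[dest] += registers[src]
        elif op == "MOV":
            registers[dest] = registers[src]
        elif op == "HLT":
            break
        pc += 1
    return registers
```

A pipelined machine overlaps these three phases, fetching instruction N+1 while decoding N and executing N-1; this loop does them strictly one at a time, which is the non-pipelined baseline.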

Based on that understanding and the issues identified in this and the previous post, I decided to just eliminate microcode. In my mind, this is an obvious solution. After all, why incur the overhead of microcode if it is not truly necessary? As I mentioned previously, there are a number of reasons to use microcode, but can the overall design of a processor be simplified by omitting it?

I cannot say definitively that the answer is yes; however, I can say that the complexity of the design is shifted. Specifically, complexity is shifted from the control logic to the individual instructions. In fact, the control logic can be greatly simplified, to the point of being a few multiplexers. This is important because it ended up being a factor in a later design consideration.
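The “few multiplexers” idea can be sketched as follows: every functional unit computes its result unconditionally, and the opcode bits simply select which result reaches the output, with no microcode sequencer in between. The opcodes and operation set here are assumptions for illustration only.

```python
def mux(select: int, inputs):
    """An n-way multiplexer: the select lines pick exactly one input."""
    return inputs[select]

def alu(opcode: int, a: int, b: int) -> int:
    """Direct decoding: all results are computed in parallel and the
    opcode drives the output multiplexer (opcode assignments invented)."""
    results = [
        a + b,   # 0: ADD
        a - b,   # 1: SUB
        a & b,   # 2: AND
        a | b,   # 3: OR
    ]
    return mux(opcode, results)
```

This is the sense in which control logic becomes trivial: decoding is wiring plus selection, and all the real complexity lives inside the individual functional units.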

This decision does result in considerably more complexity for certain instructions (specifically, those dealing with arithmetic operations). This is because those operations have to be performed in dedicated hardware instead of making use of other instructions (such as the shift and add instructions).
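For reference, this is the shift-and-add decomposition that a microcoded multiply could step through using the existing shift and add instructions; without microcode, dedicated hardware has to perform an equivalent sequence on its own.

```python
def shift_add_multiply(a: int, b: int) -> int:
    """Multiplication decomposed into shifts and adds -- the sequence a
    microcode routine could express with existing instructions."""
    product = 0
    while b:
        if b & 1:          # low multiplier bit set: add in the multiplicand
            product += a
        a <<= 1            # shift the multiplicand left
        b >>= 1            # shift the multiplier right
    return product
```

One pass through the loop runs per multiplier bit, which is exactly why the earlier posts kept coming back to how many adders (and how many cycles) a multiply really needs.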



[1] In this particular context, efficiency is defined as the number of instructions utilized per cycle.

[2] In theory at least, it should be possible for an operating system to analyze code on-the-fly and change instruction context between single and concurrent execution, thereby enabling multi-threading. I do not know whether this has ever been done, but I can think of no reason that it would not be possible. It would probably not be trivial to implement, and would require some analysis to determine whether the performance gain outweighed the increase in overhead (in my mind, this would be done in the scheduler).
