Microcode

Microcode is a computer hardware technique that interposes a layer of organization between the CPU hardware and the programmer-visible instruction set architecture of the computer. As such, microcode is a layer of hardware-level instructions that implements higher-level machine code instructions or internal state machine sequencing in many digital processing elements. Microcode is used in general-purpose central processing units, although in current desktop CPUs it is only a fallback path for cases that the faster hardwired control unit cannot handle.

Microcode typically resides in special high-speed memory and translates machine instructions, state machine data or other input into sequences of detailed circuit-level operations. It separates the machine instructions from the underlying electronics so that instructions can be designed and altered more freely. It also facilitates the building of complex multi-step instructions, while reducing the complexity of computer circuits. Writing microcode is often called microprogramming, and the microcode in a particular processor implementation is sometimes called a microprogram.

More extensive microcoding allows small and simple microarchitectures to emulate more powerful architectures with wider word lengths, more execution units and so on, which is a relatively simple way to achieve software compatibility between different products in a processor family.

Some hardware vendors, especially IBM, use the term microcode as a synonym for firmware. In that usage, all code within a device is called microcode regardless of whether it is microcode or machine code; for example, hard disk drives are said to have their microcode updated, although they typically contain both microcode and firmware.

Overview

The lowest layer in a computer's software stack is traditionally the raw binary machine code instructions for the processor. Microcode sits one level below this. To avoid confusion, each element related to the microprogram is distinguished by the micro prefix: microinstruction, microassembler, microprogrammer, microarchitecture, etc.

Engineers normally write the microcode during the design phase of a processor, storing it in a read-only memory (ROM) or programmable logic array (PLA) structure, or in a combination of both. However, there are also machines that have some or all of their microcode stored in SRAM or flash memory. This is traditionally called a writable control store in the context of computers, and it can be either read-only or read-write memory. In the latter case, the CPU initialization process loads microcode into the control store from another storage medium, with the possibility of altering the microcode to correct bugs in the instruction set or to implement new machine instructions.

Complex digital processors may also use more than one (possibly microcode-based) control unit to delegate sub-tasks that must be performed essentially asynchronously in parallel. A high-level programmer, or even an assembly language programmer, does not normally see or change microcode. Unlike machine code, which often retains some backward compatibility among different processors in a family, microcode runs only on the exact electronic circuitry for which it is designed, because it is an inherent part of the particular processor design itself.

Microprograms consist of a series of microinstructions, which control the CPU at a very fundamental level of hardware circuitry. For example, a single typical horizontal microinstruction might specify the following operations:

Connect register 1 to the A side of the ALU

Connect register 7 to the B side of the ALU

Set the ALU to perform two's-complement addition

Set the ALU's carry input to zero

Store the result value in register 8

Update the condition codes from the ALU status flags (negative, zero, overflow, and carry)

Microjump to microPC nnn for the next microinstruction
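The operations in that list all happen in the same cycle, so one way to picture a horizontal microinstruction is as a record whose fields each drive a separate group of control lines. The following is a minimal Python sketch under stated assumptions: a 32-bit data path, only the ADD function implemented, and hypothetical field names (`src_a`, `src_b`, `alu_op`, `carry_in`, `dest`, `update_flags`, `next_upc`) that do not come from any real machine.

```python
from dataclasses import dataclass

MASK32 = 0xFFFF_FFFF  # assumption: a 32-bit data path


@dataclass(frozen=True)
class HorizontalMicroinstruction:
    # Each field drives one group of control lines directly, with no decoding.
    src_a: int          # register gated onto the ALU A side
    src_b: int          # register gated onto the ALU B side
    alu_op: str         # ALU function select
    carry_in: int       # ALU carry input
    dest: int           # register that latches the ALU result
    update_flags: bool  # latch N/Z/V/C from the ALU
    next_upc: int       # micro-jump target for the next microinstruction


def execute(mi: HorizontalMicroinstruction, regs: list[int], flags: dict[str, bool]) -> int:
    """Apply every field of one microinstruction in a single cycle; return the next microPC."""
    a, b = regs[mi.src_a], regs[mi.src_b]
    if mi.alu_op != "ADD":
        raise NotImplementedError(mi.alu_op)  # this sketch models only ADD
    full = a + b + mi.carry_in
    result = full & MASK32
    regs[mi.dest] = result
    if mi.update_flags:
        sign = 1 << 31
        flags["N"] = bool(result & sign)
        flags["Z"] = result == 0
        # Signed overflow: both operands share a sign that differs from the result's.
        flags["V"] = bool(~(a ^ b) & (a ^ result) & sign)
        flags["C"] = full > MASK32
    return mi.next_upc


# Usage: R8 := R1 + R7 with carry-in 0, updating flags, then micro-jump to 0x40.
regs = [0] * 16
regs[1], regs[7] = 0x7FFF_FFFF, 1
flags: dict[str, bool] = {}
mi = HorizontalMicroinstruction(src_a=1, src_b=7, alu_op="ADD", carry_in=0,
                                dest=8, update_flags=True, next_upc=0x40)
upc = execute(mi, regs, flags)
# regs[8] == 0x8000_0000; flags == {"N": True, "Z": False, "V": True, "C": False}
```

In real hardware these fields would be bit groups in one wide control word rather than Python attributes, but the point is the same: nothing has to be decoded before the register, ALU, flag and sequencer controls take effect.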

To control all of the processor's features simultaneously in one cycle, the microinstruction is often wider than 50 bits; e.g., 128 bits on a 360/85 with an emulator feature. Microprograms are carefully designed and optimized for the fastest possible execution, because a slow microprogram results in slow machine instructions and degraded performance for the application programs that use those instructions.

Justification
Microcode was originally developed as a simpler method of developing the control logic for a computer. Initially, CPU instruction sets were hardwired. Each step needed to fetch, decode, and execute machine instructions (including any operand address calculations, reads, and writes) was controlled directly by combinational logic and rather minimal sequential state machine circuitry. While very efficient, the need for powerful instruction sets with multi-step addressing and complex operations (see below) made such hardwired processors difficult to design and debug; highly encoded and variable-length instructions can contribute to this as well, especially when very irregular encodings are used.
Microcode simplified the job by allowing much of the processor's behavior and programming model to be defined via microprogram routines rather than by dedicated circuitry. Even late in the design process, microcode could easily be changed, whereas hardwired CPU designs were very cumbersome to change. This greatly facilitated CPU design.
From the 1940s to the late 1970s, a large portion of programming was done in assembly language; higher-level instructions mean greater programmer productivity, so an important advantage of microcode was the relative ease with which powerful machine instructions could be defined. The ultimate extension of this is the "Directly Executable High Level Language" design, in which each statement of a high-level language such as PL/I is entirely and directly executed by microcode, without compilation. The IBM Future Systems project and the Data General Fountainhead Processor are examples of this. During the 1970s, CPU speeds grew more quickly than memory speeds, and numerous techniques such as memory block transfer, memory pre-fetch and multi-level caches were used to alleviate this. High-level machine instructions, made possible by microcode, helped further, because fewer, more complex machine instructions require less memory bandwidth. For example, an operation on a character string can be done as a single machine instruction, thus avoiding multiple instruction fetches.
Architectures with instruction sets implemented by complex microprograms included the IBM System/360 and the Digital Equipment Corporation VAX. The approach of increasingly complex microcode-implemented instruction sets was later called CISC. An alternative approach, used in many microprocessors, is to use PLAs or ROMs (instead of combinational logic) mainly for instruction decoding, and to let a simple state machine (without much, or any, microcode) do most of the sequencing. The MOS Technology 6502 is an example of a microprocessor that uses a PLA for instruction decoding and sequencing. The PLA is visible in photomicrographs of the chip, and its operation can be seen in the transistor-level simulation.
Microprogramming is still used in modern CPU designs. In some cases, after the microcode has been debugged in simulation, logic functions are substituted for the control store. Logic functions are often faster and less expensive than the equivalent microprogram memory.
Benefits
A processor's microprograms operate on a more primitive, totally different, and much more hardware-oriented architecture than the assembly instructions visible to ordinary programmers. In coordination with the hardware, the microcode implements the programmer-visible architecture. The underlying hardware need not have a fixed relationship to the visible architecture. This makes it easier to implement a given instruction set architecture on a wide variety of underlying hardware microarchitectures.
The IBM System/360 has a 32-bit architecture with 16 general-purpose registers, but most System/360 implementations actually use hardware that implements a much simpler underlying microarchitecture. For example, the System/360 Model 30 has 8-bit data paths to the arithmetic logic unit (ALU) and main memory and implements the general-purpose registers in a special unit of higher-speed core memory, and the System/360 Model 40 has 8-bit data paths to the ALU and 16-bit data paths to main memory and also implements the general-purpose registers in a special unit of higher-speed core memory. The Model 50 has full 32-bit data paths and implements the general-purpose registers in a special unit of higher-speed core memory. The Model 65 through the Model 195 have larger data paths and implement the general-purpose registers in faster transistor circuits. In this way, microprogramming enabled IBM to design many System/360 models with substantially different hardware, spanning a wide range of cost and performance, while making them all architecturally compatible. This dramatically reduces the number of unique system software programs that must be written for each model.
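As an illustration of how a narrow data path can present a wider architecture, the hypothetical Python sketch below adds two 32-bit words using only an 8-bit adder, one byte per micro-step, rippling the carry between steps. It shows the general principle only; it is not the actual Model 30 microprogram, and the function name is invented.

```python
def add32_on_8bit_datapath(x: int, y: int) -> tuple[int, int]:
    """Add two 32-bit words with an 8-bit ALU, one byte per micro-step.

    Returns (sum modulo 2**32, number of micro-steps used).
    """
    result, carry, steps = 0, 0, 0
    for byte_index in range(4):           # least significant byte first
        shift = 8 * byte_index
        a = (x >> shift) & 0xFF           # one byte of each operand
        b = (y >> shift) & 0xFF
        total = a + b + carry             # the only arithmetic: an 8-bit add
        result |= (total & 0xFF) << shift
        carry = total >> 8                # carry feeds the next micro-step
        steps += 1
    return result, steps


# Usage
print(add32_on_8bit_datapath(0x0000_00FF, 0x0000_0001))  # (256, 4)
print(add32_on_8bit_datapath(0xFFFF_FFFF, 0x0000_0001))  # (0, 4)
```

The programmer sees a single 32-bit add; the narrow hardware spends four ALU passes on it, trading speed for cheaper hardware.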
A similar approach was used by Digital Equipment Corporation (DEC) in its VAX family of computers. As a result, different VAX processors use different microarchitectures, yet the programmer-visible architecture does not change.
Microprogramming also reduces the cost of field changes to correct defects (bugs) in the processor; a bug can often be fixed by replacing a portion of the microprogram rather than by making changes to hardware logic and wiring.
History
In 1947, the design of the MIT Whirlwind introduced the concept of a control store as a way to simplify computer design and move beyond ad hoc methods. The control store is a diode matrix: a two-dimensional lattice, where one dimension accepts "control time pulses" from the CPU's internal clock, and the other connects to control signals on gates and other circuits. A "pulse distributor" takes the pulses generated by the CPU clock and breaks them up into eight separate time pulses, each of which activates a different row of the lattice. When a row is activated, it activates the control signals connected to it.
Described another way, the signals transmitted by the control store are played much like a player piano roll. That is, they are controlled by a sequence of very wide words constructed of bits, and they are "played" sequentially. In a control store, however, the "song" is short and is repeated continuously.
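The diode matrix and pulse distributor can be mimicked in a few lines: each row of the matrix is the set of control signals its diodes connect to, and the distributor steps through the rows in a fixed, endlessly repeating order. This is a minimal Python sketch; the eight rows match the description above, but the signal names and row contents are invented for illustration and do not describe Whirlwind's actual wiring.

```python
from itertools import cycle, islice

# Hypothetical control matrix: row i lists the control signals wired to row i.
CONTROL_MATRIX: list[frozenset[str]] = [
    frozenset({"pc_to_address"}),
    frozenset({"read_memory"}),
    frozenset({"memory_to_instruction_register"}),
    frozenset({"increment_pc"}),
    frozenset({"address_field_to_address"}),
    frozenset({"read_memory"}),
    frozenset({"memory_to_alu", "alu_add"}),
    frozenset({"alu_to_accumulator"}),
]


def pulse_distributor(n_rows: int):
    """Split the clock into an endless sequence of time pulses, one row at a time."""
    return cycle(range(n_rows))


def run(clock_ticks: int) -> list[frozenset[str]]:
    """Return the set of control signals asserted on each clock tick."""
    rows = pulse_distributor(len(CONTROL_MATRIX))
    return [CONTROL_MATRIX[row] for row in islice(rows, clock_ticks)]


# Usage: after eight ticks the "song" starts over from the first row.
trace = run(16)
print(trace[0] == trace[8])  # True
```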
Each microinstruction in a microprogram provides the bits that control the functional elements that internally make up a CPU. The advantage over a hardwired CPU is that internal CPU control becomes a specialized form of computer program. Microcode thus transforms a complex electronic design challenge (the control of a CPU) into a less complex programming challenge. To take advantage of this, a CPU is divided into several parts:

A microsequencer picks the next word of the control store. A sequencer is mostly a counter, but usually also has some way to jump to a different part of the control store depending on some data, usually data from the instruction register and always some part of the control store. The simplest sequencer is just a register loaded from a few bits of the control store.

A register set is a fast memory containing the data of the central processing unit. It may include the program counter, the stack pointer, and other numbers that are not easily accessible to the application programmer. Often the register set is a triple-ported register file; that is, two registers can be read and a third written at the same time.

An arithmetic and logic unit performs calculations, usually addition, logical negation, a right shift, and logical AND. It often performs other functions as well.

There may also be a memory address register and a memory data register, used to access the computer's main storage. Together, these elements form an "execution unit". Most modern CPUs have several execution units. Even simple computers usually have one unit to read and write memory, and another to execute user code. These elements could often be brought together as a single chip. Such a chip comes in a fixed width that forms a "slice" through the execution unit; these are known as "bit slice" chips. The AMD Am2900 family is one of the best-known examples of bit slice elements. The parts of the execution units, and the execution units themselves, are interconnected by a bundle of wires called a bus.
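The microsequencer in the first item of that list can be sketched as a function that computes the next control-store address: usually it counts, a field of the current control word can force a jump, and a dispatch can use the opcode in the instruction register to select the routine for the current machine instruction. This is a minimal Python sketch; the operation names `NEXT`, `JMP` and `DISPATCH`, the dispatch table and the opcode value are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SequencerControl:
    kind: str          # "NEXT", "JMP" or "DISPATCH" (hypothetical encodings)
    address: int = 0   # jump target, used only by "JMP"


def next_micro_address(upc: int, ctl: SequencerControl, opcode: int,
                       dispatch_table: dict[int, int]) -> int:
    """Compute the address of the next control-store word.

    NEXT     -> behave like a counter (upc + 1)
    JMP      -> take the address field of the current control word
    DISPATCH -> jump to the routine for the opcode in the instruction register
    """
    if ctl.kind == "NEXT":
        return upc + 1
    if ctl.kind == "JMP":
        return ctl.address
    if ctl.kind == "DISPATCH":
        return dispatch_table[opcode]
    raise ValueError(f"unknown sequencer operation: {ctl.kind}")


# Usage with a hypothetical opcode 0x4C whose microroutine starts at 0x20.
dispatch = {0x4C: 0x20}
print(next_micro_address(5, SequencerControl("NEXT"), 0x4C, dispatch))       # 6
print(next_micro_address(5, SequencerControl("JMP", 0x10), 0x4C, dispatch))  # 16
print(next_micro_address(5, SequencerControl("DISPATCH"), 0x4C, dispatch))   # 32
```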
Programmers develop microprograms using basic software tools. A microassembler allows a programmer to define the table of bits symbolically. Because of its close relationship to the underlying architecture, "microcode has several properties that make it difficult to generate using a compiler." A simulator program is intended to execute the bits in the same way as the electronics, and allows much more freedom to debug the microprogram. After the microprogram is finalized and extensively tested, it is sometimes used as the input to a computer program that constructs logic to produce the same data. This program is similar to those used to optimize a programmable logic array. Even without fully optimal logic, heuristically optimized logic can vastly reduce the number of transistors compared with the number required for a ROM control store. This reduces the cost of producing, and the electricity consumed by, a CPU.
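A microassembler's core job is to turn a symbolic line into a binary control word. The Python sketch below assumes a hypothetical six-field word (register source A, register source B, destination register, ALU operation, type of jump, jump address), using the same field order as the JUMP listing in the horizontal microcode section; the symbol codes and field widths are invented for illustration.

```python
# Hypothetical symbol encodings. "1" is a constant-one source for the B side.
REGISTERS = {"NONE": 0, "PC": 1, "MAR": 2, "MDR": 3, "1": 4}
ALU_OPS = {"NONE": 0, "COPY": 1, "ADD": 2}
JUMPS = {"NEXT": 0, "JMP": 1}

# Field widths in bits: src A, src B, destination, ALU op, jump type, jump address.
FIELD_WIDTHS = (3, 3, 3, 2, 1, 8)


def assemble(line: str, labels: dict[str, int]) -> int:
    """Pack one symbolic microinstruction into an integer control word."""
    src_a, src_b, dest, alu, jump, target = (f.strip() for f in line.split(","))
    codes = (
        REGISTERS[src_a],
        REGISTERS[src_b],
        REGISTERS[dest],
        ALU_OPS[alu],
        JUMPS[jump],
        0 if target == "NONE" else labels[target],  # resolve symbolic labels
    )
    word = 0
    for code, width in zip(codes, FIELD_WIDTHS):
        if code >= 1 << width:
            raise ValueError(f"value {code} does not fit in {width} bits")
        word = (word << width) | code  # append this field after the previous ones
    return word


# Usage: a 20-bit word, printed in binary.
labels = {"InstructionDecode": 0x10}
print(f"{assemble('MAR, 1, PC, ADD, JMP, InstructionDecode', labels):020b}")
```

A real microassembler would also manage label definitions, default no-op values for unused fields, and the machine's actual field layout.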
Microcode can be characterized as horizontal or vertical, referring primarily to whether each microinstruction controls CPU elements with little or no decoding (horizontal microcode) or requires extensive decoding by combinatorial logic before doing so (vertical microcode). Consequently, each horizontal microinstruction is wider (contains more bits) and occupies more storage space than a vertical microinstruction.
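The width difference can be made concrete with a little arithmetic. In the fully decoded extreme, a field that chooses one of n mutually exclusive options needs n bits (one control line each); binary-encoded, it needs only ceil(log2 n) bits, plus decoding logic. A short Python sketch with hypothetical field sizes (real horizontal formats are often partly encoded rather than purely one-hot):

```python
def one_hot_width(option_counts: list[int]) -> int:
    """Fully decoded fields: one control bit per option."""
    return sum(option_counts)


def encoded_width(option_counts: list[int]) -> int:
    """Binary-encoded fields: ceil(log2(n)) bits for n options."""
    return sum((n - 1).bit_length() for n in option_counts)


# Hypothetical word: three 16-way register selects and an 8-way ALU function.
fields = [16, 16, 16, 8]
print(one_hot_width(fields), encoded_width(fields))  # 56 15
```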
Horizontal microcode
"Horizontal microcode has several discrete micro-operations that are combined in a single microinstruction for simultaneous operation." Horizontal microcode is typically contained in a fairly wide control store; it is not uncommon for each word to be 108 bits or more. On each tick of the sequencer clock, a microcode word is read, decoded, and used to control the functional elements that make up the CPU.
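In a typical implementation, a horizontal microprogram word comprises fairly tightly defined groups of bits. For example, one simple arrangement might be:

Register source A | Register source B | Destination register | Arithmetic and logic unit operation | Type of jump | Jump address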
For this type of micromachine to implement a JUMP instruction with the address following the opcode, the microcode might require two clock ticks. The engineer designing it would write microassembler source code looking something like this:
    # Any line starting with a number sign is a comment.
    # This is just a label, the ordinary way assemblers symbolically represent a
    # memory address.
    InstructionJUMP:
          # To prepare for the next instruction, the instruction-decode microcode has already
          # moved the program counter to the memory address register. This instruction fetches
          # the target address of the jump instruction from the memory word following the
          # jump opcode, by copying from the memory data register to the memory address register.
          # This gives the memory system two clock ticks to fetch the next
          # instruction to the memory data register for use by the instruction decode.
          # The sequencer instruction "next" means just add 1 to the control word address.
       MDR, NONE, MAR, COPY, NEXT, NONE
          # This places the address of the next instruction into the PC.
          # This gives the memory system a clock tick to finish the fetch started on the
          # previous microinstruction.
          # The sequencer instruction is to jump to the start of the instruction decode.
       MAR, 1, PC, ADD, JMP, InstructionDecode
          # The instruction decode is not shown, because it is usually messy and very specific
          # to the exact processor being emulated. Even this example is simplified.
          # Many CPUs have several ways to calculate the address, rather than just fetching
          # it from the word following the opcode. Therefore, rather than just one
          # jump instruction, those CPUs have a family of related jump instructions.
On each tick it is common to find that only some parts of the CPU are used, with the remaining groups of bits in the microinstruction being no-ops. With careful design of hardware and microcode, this property can be exploited to run in parallel operations that use different areas of the CPU; for example, in the case above, the ALU is not required during the first tick, so it could potentially be used to complete an earlier arithmetic instruction.
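In the spirit of the microcode simulators mentioned earlier, the two JUMP microinstructions can be executed on a toy data path. This Python sketch assumes that the (unshown) decode microcode has left the address of the word after the opcode in MAR and that word, the jump target, in MDR. It models only the COPY and ADD functions, and it simplifies memory timing so that a read completes by the end of the tick in which MAR is set.

```python
# The two microinstructions from the listing, as six-field tuples:
# (source A, source B, destination, ALU operation, jump type, jump target)
JUMP_MICROCODE = [
    ("MDR", "NONE", "MAR", "COPY", "NEXT", None),
    ("MAR", "1", "PC", "ADD", "JMP", "InstructionDecode"),
]


def read_source(regs: dict[str, int], name: str) -> int:
    """Gate a register, or a constant source, onto an ALU input."""
    if name == "NONE":
        return 0
    if name == "1":
        return 1
    return regs[name]


def run_jump(memory: list[int], operand_addr: int) -> tuple[dict[str, int], str]:
    """Run the JUMP microprogram; return final registers and the micro-jump target."""
    # Assumed state left by the decode microcode (see the lead-in).
    regs = {"PC": operand_addr, "MAR": operand_addr, "MDR": memory[operand_addr]}
    upc = 0
    while True:
        src_a, src_b, dest, alu_op, jump, target = JUMP_MICROCODE[upc]
        a, b = read_source(regs, src_a), read_source(regs, src_b)
        regs[dest] = a if alu_op == "COPY" else a + b  # only COPY and ADD modeled
        # Simplified memory: the word at MAR arrives in MDR by the end of the tick.
        regs["MDR"] = memory[regs["MAR"]]
        if jump == "NEXT":
            upc += 1  # sequencer "next": add 1 to the control word address
        else:
            return regs, target


# Usage: the word after the JUMP opcode (address 5) holds the target, 20.
memory = [0] * 32
memory[5] = 20
memory[20] = 0xAB  # the instruction found at the jump target
regs, next_routine = run_jump(memory, 5)
# regs == {"PC": 21, "MAR": 20, "MDR": 0xAB}; next_routine == "InstructionDecode"
```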
Vertical microcode
In vertical microcode, each microinstruction is significantly encoded; that is, the bit fields generally pass through intermediate combinatory logic that, in turn, generates the actual control and sequencing signals for internal CPU elements (ALU, registers, etc.). This contrasts with horizontal microcode, in which the bit fields themselves either directly produce the control and sequencing signals or are only minimally encoded. Consequently, vertical microcode requires shorter instructions and less storage, but takes more time to decode, resulting in a slower CPU clock.
Some vertical microcode is just the assembly language of a simple conventional computer that emulates a more complex computer. Some processors, such as DEC Alpha processors and the CMOS microprocessors in later IBM System/390 and z/Architecture mainframes, have PALcode (the term used on Alpha processors) or millicode (the term used on IBM mainframe microprocessors). This is a form of machine code, with access to special registers and other hardware resources not available to regular machine code, used to implement some instructions and other functions, such as page table walks on Alpha processors.
Another form of vertical microcode has two fields:
The select field chooses which part of the CPU this word of the control store will control. The value field actually controls that part of the CPU. With this type of microcode, a designer explicitly chooses to make a slower CPU to save money by reducing the unused bits in the control store; however, the reduced complexity may increase the CPU's clock frequency, which lessens the effect of the increased number of cycles per instruction.
As transistors became cheaper, horizontal microcode came to dominate the design of microcoded CPUs, with vertical microcode being used less often.
When both vertical and horizontal microcode are used, the horizontal microcode may be referred to as nanocode or picocode.
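One common way to combine the two is indirection: narrow microwords hold little more than an index into a small nanostore of wide horizontal words, so a horizontal pattern that many microwords need is stored only once. The Python sketch below is hypothetical (the store contents and the 70-bit nanoword width are invented) and ignores any extra fields, such as next-address bits, that a real microword would also carry.

```python
# Hypothetical nanostore: a few distinct wide horizontal words (sets of control lines).
NANOSTORE: list[frozenset[str]] = [
    frozenset({"pc_to_mar", "start_read"}),
    frozenset({"mdr_to_ir", "pc_increment"}),
    frozenset({"reg_a_to_alu", "reg_b_to_alu", "alu_add", "alu_to_reg"}),
]

# Hypothetical microstore: narrow words that just index into the nanostore.
MICROSTORE: list[int] = [0, 1, 0, 1, 2, 0, 1, 2]


def control_lines(micro_address: int) -> frozenset[str]:
    """Two-level lookup: microword -> nanoword -> control lines."""
    return NANOSTORE[MICROSTORE[micro_address]]


def storage_bits(micro_words: int, nano_words: int, nano_width: int) -> tuple[int, int]:
    """Return (two-level bits, single-level horizontal bits) for the same program."""
    index_bits = (nano_words - 1).bit_length()
    two_level = micro_words * index_bits + nano_words * nano_width
    single_level = micro_words * nano_width
    return two_level, single_level


# Usage, assuming 70-bit nanowords.
print(control_lines(4) == NANOSTORE[2])  # True
print(storage_bits(len(MICROSTORE), len(NANOSTORE), 70))  # (226, 560)
```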
Writable control store
Some computers were built using "writable microcode". In this design, rather than storing the microcode in ROM or hardwired logic, the microcode is stored in a RAM called a writable control store, or WCS. Such a computer is sometimes called a writable instruction set computer, or WISC.
Many experimental prototype computers use writable control stores; there are also commercial machines that use writable microcode, such as the Burroughs Small Systems, early Xerox workstations, the DEC VAX 8800 ("Nautilus") family, the Symbolics L- and G-machines, a number of IBM System/360 and System/370 implementations, some DEC PDP-10 machines, and the Data General Eclipse MV/8000.
Many more machines offer user-programmable writable control stores as an option, including the HP 2100, DEC PDP-11/60 and Varian Data Machines V-70 series minicomputers. The IBM System/370 includes a facility called Initial Microprogram Load (IML or IMPL) that can be invoked from the console, as part of power-on reset (POR), or from another processor in a tightly coupled multiprocessor complex.
Some commercial machines, such as the IBM 360/85, have both read-only storage and a writable control store for microcode.
WCS offers several advantages, including easy patching of the microprogram and, for certain hardware generations, faster access than ROMs could provide. User-programmable WCS allows the user to optimize the machine for a specific purpose.
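The practical difference between ROM and a writable control store can be sketched in a few lines: a WCS is filled from an image at startup (the role an initial microprogram load plays) and individual words can later be overwritten, while a ROM's contents are fixed. This is a hypothetical Python model; the class and method names are invented and the microwords are plain integers.

```python
class WritableControlStore:
    """Toy WCS: microcode lives in RAM and is loaded at startup."""

    def __init__(self, size: int):
        self._words = [0] * size

    def initial_load(self, image: list[int]) -> None:
        """Copy a microcode image into the store, as an initial microprogram load would."""
        if len(image) > len(self._words):
            raise ValueError("microcode image larger than control store")
        self._words[: len(image)] = image

    def patch(self, address: int, word: int) -> None:
        """Overwrite one microinstruction, e.g. to fix a bug without new hardware."""
        self._words[address] = word

    def read(self, address: int) -> int:
        return self._words[address]


class ReadOnlyControlStore:
    """Toy ROM: contents fixed at manufacture; there is deliberately no patch()."""

    def __init__(self, image: list[int]):
        self._words = tuple(image)

    def read(self, address: int) -> int:
        return self._words[address]


# Usage
wcs = WritableControlStore(16)
wcs.initial_load([0x01, 0x02, 0x03])
wcs.patch(1, 0x2A)          # corrected microinstruction
print(hex(wcs.read(1)))     # 0x2a
```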
Starting with the Pentium Pro in 1995, several Intel x86 CPUs have had writable microcode. This has, for example, allowed bugs in the Intel Core 2 and Intel Xeon microcode to be fixed by patching their microprograms, rather than requiring the entire chips to be replaced. A second prominent example is the set of microcode patches that Intel offered for some of its processor architectures up to about 10 years old, to counter the Spectre and Meltdown security vulnerabilities that became public at the start of 2018. Microcode updates can be installed by Linux, FreeBSD, Microsoft Windows, or the motherboard BIOS.
Comparison with VLIW and RISC
The design trend toward heavily microcoded processors with complex instructions began in the early 1960s and continued until roughly the mid-1980s. At that point the RISC design philosophy started to become more prominent.
A CPU that uses microcode generally takes several clock cycles to execute a single instruction, one clock cycle for each step in the microprogram for that instruction. Some CISC processors include instructions that can take a very long time to execute. Such variation interferes with both interrupt latency and, what is far more important in modern systems, pipelining.
When designing a new processor, a hardwired-control RISC has the following advantages over a microcoded CISC:

Programming has largely moved away from the assembly level, so it is no longer worthwhile to provide complex instructions for productivity reasons.

A simpler set of instructions enables direct execution by hardware, avoiding the performance penalty of microcoded execution.

Analysis shows that complex instructions are rarely used, so the machine resources devoted to them are largely wasted.

Machine resources devoted to rarely used complex instructions are better used to speed up the performance of simpler and more commonly used instructions.

Complex microcoded instructions may take many clock cycles that vary from instruction to instruction, and are difficult to pipeline for increased performance.

There are counterpoints as well:

The complex instructions in heavily microcoded implementations may not take much extra machine resources, except for microcode space. For example, the same ALU is often used to calculate an effective address as well as to compute the result from the actual operands (e.g., the original Z80, 8086, and others).

The simpler non-RISC instructions (i.e., those involving direct memory operands) are frequently used by modern compilers. Even immediate-to-stack (i.e., memory result) arithmetic operations are commonly employed. Although such memory operations, often with variable-length encodings, are more difficult to pipeline, it is still fully feasible to do so, as clearly exemplified by the i486, AMD K5, Cyrix 6x86, Motorola 68040, etc.

Non-RISC instructions inherently perform more work per instruction (on average), and are also normally highly encoded, so they allow a smaller overall size for the same program, and thus better use of limited cache memories.

Many RISC and VLIW processors are designed to execute every instruction (as long as it is in the cache) in a single cycle. This is very similar to the way CPUs with microcode execute one microinstruction per cycle. VLIW processors have instructions that behave similarly to very wide horizontal microcode, although typically without such fine-grained control over the hardware as microcode provides. RISC instructions are sometimes similar to narrow vertical microcode.
Microcoding has been popular in application-specific processors such as network processors, microcontrollers, digital signal processors, channel controllers, disk controllers, network interface controllers, graphics processing units, and other hardware.
Micro-operations
Modern CISC implementations, such as the x86 family, decode instructions into dynamically buffered micro-operations ("μops") with an instruction encoding similar to RISC or traditional microcode. A hardwired instruction decode unit directly emits μops for common x86 instructions, but falls back to a more traditional microcode ROM for more complex or rarely used instructions.
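The split between a hardwired decoder and a microcode ROM fallback can be sketched as follows. This Python model is hypothetical: the μop tuple format and the `tmp` register are invented, only a few two-operand ALU instructions are cracked directly, and real decoders differ in many details that are not publicly documented. The one fact it relies on is that x86 ALU instructions such as ADD take at most one memory operand.

```python
from typing import NamedTuple


class Uop(NamedTuple):
    op: str
    dst: str
    src: str


SIMPLE_ALU_OPS = {"ADD", "SUB", "AND", "OR"}


def is_memory(operand: str) -> bool:
    return operand.startswith("[") and operand.endswith("]")


def decode(mnemonic: str, dst: str = "", src: str = "") -> list[Uop]:
    """Crack one instruction into uops, or hand it off to the microcode ROM."""
    if mnemonic in SIMPLE_ALU_OPS:
        op = mnemonic.lower()
        if is_memory(dst) and is_memory(src):
            raise ValueError("x86 ALU instructions allow at most one memory operand")
        if is_memory(dst):   # read-modify-write: load, compute, store
            return [Uop("load", "tmp", dst), Uop(op, "tmp", src), Uop("store", dst, "tmp")]
        if is_memory(src):   # load, then compute into the register
            return [Uop("load", "tmp", src), Uop(op, dst, "tmp")]
        return [Uop(op, dst, src)]  # register-to-register: a single uop
    # Everything else is sequenced from the (hypothetical) microcode ROM.
    return [Uop("microcode_rom", mnemonic, "")]


# Usage
for uop in decode("ADD", "[rbx]", "eax"):
    print(uop)
```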
Source of the article: Wikipedia