Search for notes by fellow students, in your own course and all over the country.
Browse our notes for titles which look like what you need. You can preview any of the notes via a sample of the contents. Once you're happy these are the notes you're after, simply pop them into your shopping cart.
Title: CIT 309 COMPUTER ARCHITECTURE
Description: Computer Architecture is a prerequisite course that all 300 Level or HND 1 students take in their first year. The course is the work of a professor and is in accordance with the requirements for the award of a First Degree or Higher National Diploma in the Computer Science/Mathematics/Statistics department, presented in an easily understandable way. It gives you a complete guide to how the computer system was developed, manufactured and pre-programmed using binary digits, i.e. 0 and 1. I have searched the internet repeatedly and noticed that courses of this kind are difficult to get, so I recommend this eBook and course note to all Computer Science/Mathematics/Statistics students who come across it. For complaints or inquiries, please send a mail to joysesam212@gmail.com. Enjoy your write-up!
Document Preview
Extracts from the notes are below; to see the PDF you'll receive, please use the links above.
NATIONAL OPEN UNIVERSITY OF NIGERIA
SCHOOL OF SCIENCE AND TECHNOLOGY
COURSE CODE: CIT 309
COURSE TITLE: Computer Architecture
COURSE GUIDE
CIT 309 COMPUTER ARCHITECTURE
COURSE TEAM:
Developer/Writer: Greg Onwodi
National Open University of Nigeria
Course Coordinator: Rele Afolorunsho
National Open University of Nigeria
Course Editor: Engr. ... Obi
Programme Leader: Prof. ...
5 Dar es Salaam Street, Off Aminu Kano Crescent, Wuse II, Abuja
Nigeria
e-mail: centralinfo@nou...ng
URL: www...edu...
1.0 INTRODUCTION
3.0 MAIN CONTENTS
3.2 STRUCTURE AND FUNCTION
4.0 CONCLUSION
6.0 TUTOR MARKED ASSIGNMENT
1.0 INTRODUCTION
In spite of the variety and pace of change in the computer field, certain fundamental concepts apply consistently throughout. Many computer manufacturers offer a family of computer models, all with the same architecture but with differences in organization. Changes in technology not only influence organization but also result in the introduction of more powerful and more complex architectures.
2.0 OBJECTIVES
• Outline the types of operands and operations specified by machine instructions.
3.0 MAIN CONTENT
Computer organization refers to the operational units and their interconnections that realize the architectural specification. Examples of architectural attributes include the instruction set, the number of bits used to represent various data types (e.g. numbers, characters), I/O mechanisms, and techniques for addressing memory.
• Structure: The way in which the components are interrelated.
In terms of description, there are two choices: starting at the bottom and building up to a complete description, or beginning with a top view and decomposing the system into its subparts. The approach taken here is to describe the computer from the top down.
Figure 1 depicts the basic functions that a computer can perform. In general terms, there are only four:
- Data processing
- Data storage
- Data movement
- Control
The computer, of course, must be able to process data. It is also essential that a computer store data. Even if the computer is processing data on the fly (i.e. data come in and get processed and the results go out immediately), the computer must temporarily store at least those pieces of data that are being worked on at any given moment. Files of data are stored on the computer for subsequent retrieval and update. The computer's operating environment consists of devices that serve as either sources or destinations of data. When data are moved over longer distances, to or from a remote device, the process is known as data communications. Ultimately, this control is exercised by the individuals who provide the computer with instructions.
There are four main structural components:
- Central processing unit (CPU): Controls the operations of the computer and performs its data processing functions; often simply referred to as the processor.
- System interconnections: Some mechanism that provides for communication among CPU, main memory and I/O.
However, the most interesting and complex component is the CPU. Its major structural components include:
- Arithmetic and logic unit (ALU): Performs the computer's data processing functions.
- CPU interconnection: Some mechanism that provides for communication among the control unit, ALU and registers.
Such a design is referred to as the von Neumann architecture and is based on three key concepts:
• Data and instructions are stored in a single read-write memory.
• Execution occurs in a sequential fashion (unless explicitly modified) from one instruction to the next.
There is a small set of basic logic components that can be combined in various ways to store binary data and to perform arithmetic and logical operations on that data. We can think of the process of connecting the various components in the desired configuration as a form of programming.
Now consider this alternative. This set of hardware will perform various functions on data depending on control signals applied to the hardware (Figure 1a). Thus, instead of rewiring the hardware for each new program, the programmer merely needs to supply a new set of control signals. The entire program is actually a sequence of steps. For each step, a new set of control signals is needed (Figure 1b). Instead of rewiring the hardware for each new program, all we need to do is provide a new sequence of codes. To distinguish this new method of programming, a sequence of codes or instructions is called software.
Figure 1b indicates two major components of the system: an instruction interpreter and a module of general-purpose arithmetic and logic functions. Several other components are needed to yield a functioning computer. Data must be put into the system; for this we need some sort of input module. A means of reporting results is needed, and this is in the form of an output module. One more component is needed. But a program is not invariably executed sequentially; it may jump around (e.g. via a jump instruction). Similarly, operations on data may require access to more than just one element at a time in a predetermined sequence. Thus, there must be a place to store temporarily both instructions and data. Von Neumann pointed out that the same memory could be used to store both instructions and data.
Figure 2 illustrates these top-level components and suggests the interactions among them. For this purpose, the CPU typically makes use of two internal (to the CPU) registers: a memory address register (MAR), which specifies the address in memory for the next read or write, and a memory buffer register (MBR), which contains the data to be written into memory or receives the data read from memory. An I/O buffer register (I/OBR) is used for the exchange of data between an I/O module and the CPU. Each memory location contains a binary number that can be interpreted as either an instruction or data. An I/O module contains internal buffers for temporarily holding data until they can be sent on.
... the key elements of program execution. Program execution consists of repeating the process of instruction fetch and instruction execution (Figure 4). Using the simplified two-step description given previously, the instruction cycle is depicted in Figure 3. The two steps are referred to as the fetch cycle and the execute cycle.
INSTRUCTION FETCH AND EXECUTE
At the beginning of each instruction cycle, the processor fetches an instruction from memory (see the Basic Instruction Cycle figure). Unless told otherwise, the processor always increments the PC after each instruction fetch so that it will fetch the next instruction in sequence (i.e. the instruction located at the next higher memory address).
...
Assume that the program
counter is set to location 300
...
On succeeding instruction cycles, it will fetch instructions from
locations 301, 302, 303, and so on
...
The fetched instruction is loaded into a register in the processor known as
the instruction register (IR)
...
The processor interprets the instruction and performs
the required action
...
Processor-I/O: Data may be transferred to or from a peripheral device be
transferring between the processor and an I/O module
...
Control: An instruction may specify that the sequence of execution be altered
For example, the processor may fetch an instruction from location 149, which
specifies that the next instruction be from location 182
...
Thus, on the next fetch
cycle, the instruction will be fetched from location 182 rather than 150
...
The processor contains a single data register called an accumulator (AC)
...
Thus, it convenient to organize
memory using 16-bit words
...
address 941 and stores the result in the latter location
...
The PC contains 300, the address of the first instruction
...
Note that this process involves the use
of a memory dress register (MAR) and a memory buffer register
2
...
For simply these intermediate registers are ignored
...
The remaining 12 bits (three hexadecimal digits) specify the
ac (940) from which data are to be loaded
...
The next instruction (5941) is fetched from location 301 and the
incremented
...
The old contents of the AC and the contents of location 941 are added
5
...
The next instruction (2941) is fetched from location 302 and the F
incremented
...
The contents of the AC are stored in location 941
...
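The walkthrough above can be sketched as a small simulator. This is a minimal illustrative sketch, not the course's actual machine: the 16-bit word with a 4-bit opcode and 12-bit address, and the opcode numbers 1 (load AC), 2 (store AC) and 5 (add to AC), are taken from the worked example, while the `run` helper and the dictionary-based memory are assumptions.

```python
def run(memory, pc):
    """Repeat the fetch/execute cycle until an unknown opcode halts the loop."""
    ac = 0                               # accumulator (AC)
    while True:
        instr = memory.get(pc, 0)        # fetch: load the word addressed by the PC
        pc += 1                          # increment the PC for the next fetch
        opcode, address = instr >> 12, instr & 0xFFF
        if opcode == 1:                  # 1: load AC from memory
            ac = memory[address]
        elif opcode == 5:                # 5: add memory word to AC
            ac = (ac + memory[address]) & 0xFFFF
        elif opcode == 2:                # 2: store AC to memory
            memory[address] = ac
        else:                            # anything else: stop and return the AC
            return ac

memory = {0x300: 0x1940, 0x301: 0x5941, 0x302: 0x2941,   # program at location 300
          0x940: 0x0003, 0x941: 0x0002}                   # data words
run(memory, 0x300)                       # leaves the sum in location 941
```

Running this leaves 0005 in location 941, matching steps 1 to 6 above.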
Some processors, for example, include instructions that contain more than one address. Also, instead of memory references, an instruction may specify an I/O operation. An addition could then be accomplished with a single instruction cycle with the following steps:
- Fetch the ADD instruction.
- Read the contents of memory location A into the processor.
- Read the contents of memory location B into the processor.
- Add the two values.
- Write the result from the processor to memory location A.
For any given instruction cycle, some states may be null and others may be visited more than once.
Instruction address calculation (iac): Determine the address of the next instruction. Usually, this involves adding a fixed number to the address of the previous instruction. For example, if each instruction is 16 bits long and memory is organized into 16-bit words, then add 1 to the previous address. If, instead, memory is organized as individually addressable 8-bit bytes, then add 2 to the previous address.
Instruction operation decoding (iod): Analyze the instruction to determine the type of operation to be performed and the operand(s) to be used.
Operand fetch (of): Fetch the operand from memory or read it in from I/O.
Operand store (os): Write the result into memory or out to I/O.
The states in the upper part of Figure 6 involve an exchange between the processor and either memory or an I/O module. The oac state appears twice, because an instruction may involve a read, a write, or both. Also note that the diagram allows for multiple operands and multiple results, because some instructions on some machines require this. Finally, on some machines, a single instruction can specify an operation to be performed on a vector (one-dimensional array) of numbers or a string (one-dimensional array) of characters. As Figure 6 indicates, this would involve repetitive operand fetch and/or store operations.
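The state sequence described above can be sketched as follows. The state names follow the text (iac, if, iod, oac, of, os, plus do for the data operation itself), while the `instruction_cycle_states` helper and the way repetition counts are parameterized are illustrative assumptions.

```python
def instruction_cycle_states(n_operands, n_results):
    """List the states visited for one instruction with the given operand
    and result counts; oac/of repeat per operand, oac/os per result."""
    states = ["iac", "if", "iod"]        # address calculation, fetch, decode
    for _ in range(n_operands):          # one address calc + fetch per operand
        states += ["oac", "of"]
    states.append("do")                  # perform the data operation
    for _ in range(n_results):           # one address calc + store per result
        states += ["oac", "os"]
    return states

instruction_cycle_states(2, 1)           # e.g. a two-operand, one-result ADD
```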
1.0 INTRODUCTION
One boundary where the computer designer and the computer programmer can view the same machine is the machine instruction set. Implementing the processor is a task that in large part involves implementing the machine instruction set.
2.0 OBJECTIVES
At the end of this unit, you should be able to:
- Explain the instruction format
- Understand the instruction length and characteristics
3.1 INSTRUCTION FORMATS
An instruction format defines the layout of the bits of an instruction, in terms of its constituent fields. The format must, implicitly or explicitly, indicate the addressing mode for each operand.
3.1.1 INSTRUCTION LENGTH
The most basic design issue to be faced is the instruction format length. This decision determines the richness and flexibility of the machine.
3.2 INSTRUCTION SET CHARACTERISTICS
The operation of the processor is determined by the instructions it executes, referred to as machine instructions or computer instructions.
3.2.1 ELEMENTS OF A MACHINE INSTRUCTION
These elements are as follows:
- Operation code: Specifies the operation to be performed (e.g. ADD, I/O). The operation is specified by a binary code, known as the operation code or opcode.
The address of the next instruction to be fetched could be either a real address or a virtual address, depending on the architecture. In most cases, the next instruction to be fetched immediately follows the current instruction.
Source and result operands can be in one of four areas:
- Processor register: With rare exceptions, a processor contains one or more registers that may be referenced by machine instructions. If more than one register exists, then each register is assigned a unique name or number, and the instruction must contain the number of the desired register.
- Immediate: The value of the operand is contained in a field in the instruction being executed.
- I/O device: If memory-mapped I/O is used, this is just another main or virtual memory address.
3.2.2 INSTRUCTION REPRESENTATION
Within the computer, each instruction is represented by a sequence of bits. Opcodes are represented by abbreviations, called mnemonics, that indicate the operation. Common examples include:
ADD   Add
SUB   Subtract
MUL   Multiply
DIV   Divide
LOAD  Load data from memory
STOR  Store data to memory
Operands are also represented symbolically. In this example Y refers to the address of a location in memory, and R refers to a particular register.
X = 413
Y = 414
A simple program would accept this symbolic input, convert opcodes and operand references to binary form, and construct binary machine instructions. Let us assume that the variables X and Y correspond to locations 413 and 414. The program:
1. ...
2. Add the contents of memory location 414 to the register.
3. ...
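The "simple program" described above can be sketched as a tiny assembler that converts symbolic opcodes and operand references into binary machine words. The opcode numbers reuse the hypothetical machine of the earlier fetch/execute example (1 = LOAD, 5 = ADD, 2 = STOR) and hexadecimal addresses are assumed; `assemble`, `OPCODES` and `SYMBOLS` are illustrative names, not the course's actual encoding.

```python
OPCODES = {"LOAD": 0x1, "STOR": 0x2, "ADD": 0x5}   # assumed opcode numbers
SYMBOLS = {"X": 0x413, "Y": 0x414}                 # X = 413, Y = 414 (hex assumed)

def assemble(lines):
    """Convert 'MNEMONIC OPERAND' lines into 16-bit machine words:
    a 4-bit opcode followed by a 12-bit address."""
    words = []
    for line in lines:
        mnemonic, operand = line.split()
        words.append((OPCODES[mnemonic] << 12) | SYMBOLS[operand])
    return words

assemble(["LOAD X", "ADD Y", "STOR X"])
```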
The instruction set defines many of the functions performed by the processor and thus has a significant effect on the implementation of the processor. Thus, programmer requirements must be considered in designing the instruction set.
- Data types: The various types of data upon which operations are performed.
- Registers: Number of processor registers that can be referenced by instructions, and their use.
These issues are highly interrelated and must be considered together in designing an instruction set.
4.0 CONCLUSION
In spite of the variety and pace of change in the computer field, certain fundamental concepts apply consistently throughout.
5.0 SUMMARY
Computer architecture refers to those attributes of a system visible to a programmer, or those attributes that have a direct impact on the logical execution of a program.
6.0 TUTOR MARKED ASSIGNMENT
1. What, in general terms, is the distinction between computer organization and computer architecture?
2. List and briefly explain five important instruction set design issues.
7.0 REFERENCES AND FURTHER READING
ARM System Developer's Guide. San Francisco: Morgan Kaufmann, 2004.
MODULE 2: Computer Arithmetic
UNIT 1: The arithmetic and logic unit
UNIT 2: Control unit design/Implementation
1.0 Introduction
3.1 The arithmetic and logic unit
3.2 Integer representation
3.3 Integer arithmetic
3.5 Floating point arithmetic
4.0 Conclusion
5.0 Summary
6.0 T.M.A
7.0 Reference and Further reading
1.0 INTRODUCTION
Computer arithmetic is commonly performed on two very different types of numbers: integer and floating point.
2.0 OBJECTIVES
...
3.1 THE ARITHMETIC AND LOGIC UNIT
All of the other elements of the computer system (control unit, registers, memory, I/O) are there mainly to bring data into the ALU for it to process and then to take the results back out. Figure 3.1 indicates, in general terms, how the ALU is interconnected with the rest of the processor. These registers are temporary storage locations within the processor that are connected by signal paths to the ALU. For example, an overflow flag is set to 1 if the result of a computation exceeds the length of the register into which it is to be stored. The control unit provides signals that control the operation of the ALU and the movement of the data into and out of the ALU.
3.2 INTEGER REPRESENTATION
In the binary number system, arbitrary numbers can be represented with just the digits zero and one, the minus sign, and the period, or radix point. For computer storage and processing, however, only binary digits (0 and 1) may be used to represent numbers. An 8-bit word can represent the numbers from 0 to 255, including:
00000000 = 0
00000001 = 1
00101001 = 41
10000000 = 128
11111111 = 255
In general, if an n-bit sequence of binary digits a(n-1) a(n-2) ... a1 a0 is interpreted as an unsigned integer A, its value is
A = sum (i = 0 to n-1) of 2^i * ai
In going from the first to the second equation, we require that the least significant n-1 bits do not change between the two representations... the next-to-last equation, which is only true if all of the bits in positions 0 through n-2 are 1.
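Written out directly, the unsigned-integer formula above says that the value of the n-bit sequence a(n-1) ... a1 a0 is the sum of 2^i * ai. A minimal sketch (`unsigned_value` is an illustrative name):

```python
def unsigned_value(bits):
    """Value of an n-bit unsigned integer given as a string, MSB first."""
    n = len(bits)
    return sum((2 ** i) * int(bits[n - 1 - i]) for i in range(n))

unsigned_value("00101001")   # the 8-bit pattern listed above, value 41
```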
Fixed-point representation
Finally, we mention that the representations discussed in this section are sometimes referred to as fixed point. The programmer can use the same representation for binary fractions by scaling the numbers so that the binary point is implicitly positioned at some other location.
In twos complement notation, the negation of an integer can be formed with the following rules:
Take the Boolean complement of each bit of the integer (including the sign bit).
Treating the result as an unsigned binary integer, add 1.
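The two negation rules above can be checked directly; `twos_complement_negate` is an illustrative helper and the 8-bit word size is an assumption.

```python
def twos_complement_negate(value, n_bits=8):
    """Negate an n-bit twos complement value: complement every bit, add 1."""
    mask = (1 << n_bits) - 1
    complemented = value ^ mask          # Boolean complement of each bit
    return (complemented + 1) & mask     # add 1, treating it as unsigned

twos_complement_negate(0b00010010)       # the bit pattern of -18 in 8 bits
```

As the text notes, the negative of the negative of a number is the number itself: applying the helper twice returns the original bit pattern.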
As expected, the negative of the negative of a number is the number itself. In general, interpret an n-bit sequence of binary digits a(n-1) a(n-2) ... a1 a0 as a twos complement integer A, so that its value is
A = -2^(n-1) * a(n-1) + sum (i = 0 to n-2) of 2^i * ai
Some such anomaly is unavoidable. We wish to represent positive and negative integers and 0. If there is only one representation of 0 (twos complement), then there must be an unequal number of negative and positive numbers represented.
Addition in twos complement is illustrated in Figure 9. Addition proceeds as if the two numbers were unsigned integers. If the result of the operation is positive, we get a positive number in twos complement form. If the result of the operation is negative, we get a negative number in twos complement form. On any addition, the result may be larger than can be held in the word size being used. When overflow occurs, the ALU must signal this fact so that no attempt is made to use the result. The following rule is observed: if two numbers are added, and they are both positive or both negative, then overflow occurs if and only if the result has the opposite sign.
Some insight into twos complement addition and subtraction can be gained by looking at a geometric depiction [BENH92], as shown in Figure 9. The circle in the upper half of each part of the figure is formed by selecting the appropriate segment of the number line and joining the endpoints. Starting at any number on the circle, we can add positive k (or subtract negative k) to that number by moving k positions clockwise, and we can subtract positive k (or add negative k) from that number by moving k positions counterclockwise.
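The standard overflow rule can be checked mechanically: when two operands of the same sign are added and the result has the opposite sign, overflow has occurred. A minimal sketch, assuming an 8-bit word; `add_with_overflow` is an illustrative name.

```python
def add_with_overflow(a, b, n_bits=8):
    """Add two n-bit twos complement values as if unsigned and report
    whether signed overflow occurred."""
    mask = (1 << n_bits) - 1
    sign = 1 << (n_bits - 1)
    result = (a + b) & mask              # addition proceeds as if unsigned
    same_sign = (a & sign) == (b & sign)
    overflow = same_sign and ((result & sign) != (a & sign))
    return result, overflow

add_with_overflow(0b01111111, 0b00000001)   # 127 + 1 overflows in 8 bits
```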
The central element is a binary adder, which is presented two numbers for addition and produces a sum and an overflow indication. For addition, the two numbers are presented to the adder from two registers, designated in this case as the A and B registers. The overflow indication is stored in a 1-bit overflow flag (0 = no overflow; 1 = overflow).
4.0 CONCLUSION
Numbers are represented in binary form, and the algorithms used for the basic arithmetic operations are add, subtract, multiply and divide.
5.0 SUMMARY
- To subtract one number (the subtrahend) from another (the minuend), take the twos complement (negation) of the subtrahend and add it to the minuend.
- Floating point representation can be used to represent very large and very small numbers.
6.0 TUTOR-MARKED ASSIGNMENT
1. Find the following differences using twos complement arithmetic:
a. ...
b. 10101110
c. ...
7.0 REFERENCE AND FURTHER READING
Swartzlander, E. ... Los Alamitos, CA: IEEE Computer Society Press, 1990.
1.0 Introduction
2.0 Objectives
3.0 Main Content
3.2 Control of the processor
3.4 Micro programmed control
4.0 Conclusion
5.0 Summary
6.0 T.M.A
7.0 Reference and further reading
1.0 INTRODUCTION
For example, an instruction cycle may consist of fetch, indirect, execute and interrupt cycles. A single micro-operation generally involves a transfer between registers, a transfer between a register and an external bus, or a simple ALU operation.
2.0 OBJECTIVES
At the end of this unit you should be able to:
- Understand that each cycle is in turn made up of a sequence of more fundamental operations called micro-operations.
3.1 MICRO-OPERATIONS
To design a control unit, note that each of the smaller cycles involves a series of steps, each of which involves the processor registers. Micro-operations are the functional, or atomic, operations of a processor. Now, we turn to the question of how these functions are performed or, more specifically, how the various elements of the processor are controlled to provide these functions.
We have seen that the operation of a computer, in executing a program, consists of a sequence of instruction cycles, with one machine instruction per cycle. What we are referring to here is the execution time sequence of instructions. One subdivision that we found convenient is fetch, indirect, execute, and interrupt, with only fetch and execute cycles always occurring. In our discussion of pipelining in Chapter 12, we began to see that a further decomposition is possible. We will refer to these steps as micro-operations. Figure 15... To summarize, the execution of a program consists of the sequential execution of instructions. Each instruction is executed during an instruction cycle made up of shorter subcycles (e.g. fetch, indirect, execute, interrupt). The execution of each subcycle involves one or more shorter operations, that is, micro-operations. In this section, we will examine micro-operations. A simple example will be used. We then show how the concept of micro-operations serves as a guide to the design of the control unit.
For purposes of discussion, we assume the organization depicted in Figure 12. Four registers are involved:
Memory address register (MAR): Is connected to the address lines of the system bus. It specifies the address in memory for a read or write operation.
Memory buffer register (MBR): Is connected to the data lines of the system bus. It contains the value to be stored in memory or the last value read from memory.
Program counter (PC): Holds the address of the next instruction to be fetched.
Instruction register (IR): Holds the last instruction fetched.
An example appears in Figure 15... At the beginning of the fetch cycle, the address of the next instruction to be executed is in the program counter (PC); in this case, the address is 1100100. The first step is to move that address to the memory address register (MAR), because this is the only register connected to the address lines of the system bus. The second step brings in the instruction. Of course, we must remember that this sequence of instruction cycles is not necessarily the same as the written sequence of instructions that make up the program, because of the existence of branching instructions.
The desired address (in the MAR) is placed on the address bus, the control unit issues a READ command on the control bus, and the result appears on the data bus and is copied into the memory buffer register (MBR). Because these two actions (read word from memory, increment PC) do not interfere with each other, we can do them simultaneously to save time. This frees up the MBR for use during a possible indirect cycle. Each micro-operation involves the movement of data into or out of a register.
Symbolically, we can write this sequence of events as follows:
t1: MAR <- (PC)
t2: MBR <- Memory
    PC <- (PC) + I
t3: IR <- (MBR)
where I is the instruction length. We assume that a clock is available for timing purposes and that it emits regularly spaced clock pulses. Thus, all time units are of equal duration. The notation (t1, t2, t3) represents successive time units:
First time unit: Move contents of PC to MAR.
Second time unit: Move contents of memory location specified by MAR to MBR. Increment by I the contents of the PC.
Third time unit: Move contents of MBR to IR.
Note that the third micro-operation could have been grouped with the fourth without affecting the fetch operation:
t1: MAR <- (PC)
t2: MBR <- Memory
t3: PC <- (PC) + I
    IR <- (MBR)
The groupings of micro-operations must follow two simple rules:
1. The proper sequence of events must be followed. Thus, (MAR <- (PC)) must precede (MBR <- Memory), because the memory read operation makes use of the address in the MAR.
2. Conflicts must be avoided. One should not attempt to read to and write from the same register in one time unit, because the results would be unpredictable. For example, the micro-operations (MBR <- Memory) and (IR <- MBR) should not occur during the same time unit.
A final point worth noting is that one of the micro-operations involves an addition. To avoid duplication of circuitry, this addition could be performed by the ALU. We defer a discussion of this point until later in this chapter.
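The fetch-cycle micro-operations can be mimicked as updates to a register dictionary. This is an illustrative sketch: the instruction length I is assumed to be 1 word, and `fetch_cycle` and the dictionary memory are assumed names, not the course's notation.

```python
def fetch_cycle(regs, memory, I=1):
    """Perform the three time units of the fetch cycle on a register dict."""
    regs["MAR"] = regs["PC"]             # t1: MAR <- (PC)
    regs["MBR"] = memory[regs["MAR"]]    # t2: MBR <- Memory
    regs["PC"] = regs["PC"] + I          #     PC  <- (PC) + I (simultaneous)
    regs["IR"] = regs["MBR"]             # t3: IR  <- (MBR)
    return regs

fetch_cycle({"PC": 0x64, "MAR": 0, "MBR": 0, "IR": 0}, {0x64: 0x1940})
```

Note that the simultaneous t2 operations do not conflict, exactly as rule 2 above requires.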
THE INDIRECT CYCLE
Once an instruction is fetched, the next step is to fetch source operands. If the instruction specifies an indirect address, then an indirect cycle must precede the execute cycle. It includes the following micro-operations:
t1: MAR <- (IR(Address))
t2: MBR <- Memory
t3: IR(Address) <- (MBR(Address))
The address field of the instruction is transferred to the MAR. It is then used to fetch the address of the operand. Finally, the address field of the IR is updated from the MBR, so that it now contains a direct rather than an indirect address.
We skip that cycle for a moment, to consider the interrupt cycle. At the completion of the execute cycle, a test is made to determine whether any enabled interrupts have occurred. If so, the interrupt cycle occurs. We present a very simple sequence of events, as illustrated in Figure 12. We have:
t1: MBR <- (PC)
t2: MAR <- Save_Address
    PC <- Routine_Address
t3: Memory <- (MBR)
In the first step, the contents of the PC are transferred to the MBR, so that they can be saved for return from the interrupt. Then the MAR is loaded with the address at which the contents of the PC are to be saved, and the PC is loaded with the address of the start of the interrupt routine; these values go to the MAR and PC, respectively. The processor is now ready to begin the next instruction cycle.
THE EXECUTE CYCLE
The fetch, indirect, and interrupt cycles are simple and predictable. Each involves a small, fixed sequence of micro-operations and, in each case, the same micro-operations are repeated each time around. Because of the variety of opcodes, there are a number of different sequences of micro-operations that can occur. First, consider an add instruction:
ADD R1, X
which adds the contents of the location X to register R1. The following sequence might occur:
t1: MAR <- (IR(address))
t2: MBR <- Memory
t3: R1 <- (R1) + (MBR)
In the first step, the address portion of the IR is loaded into the MAR. Then the referenced memory location is read. Finally, the contents of R1 and the MBR are added by the ALU. Again, this is a simplified example. Additional micro-operations may be required to extract the register reference from the IR and perhaps to stage the ALU inputs or outputs in some intermediate registers.
A common instruction is increment and skip if zero:
ISZ X
The content of location X is incremented by 1. If the result is 0, the next instruction is skipped. A possible sequence of micro-operations is:
t1: MAR <- (IR(address))
t2: MBR <- Memory
t3: MBR <- (MBR) + 1
t4: Memory <- (MBR)
    If ((MBR) = 0) then (PC <- (PC) + I)
The new feature introduced here is the conditional action. This test and action can be implemented as one micro-operation.
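The ISZ micro-operation sequence above can be mimicked directly; `isz` is an illustrative helper, memory is an assumed dictionary, and an 8-bit memory word (so that 0xFF + 1 wraps to 0) is an assumption.

```python
def isz(regs, memory, x, word_mask=0xFF):
    """Increment location x and skip the next instruction if the result is 0."""
    regs["MAR"] = x                              # t1: MAR <- (IR(address))
    regs["MBR"] = memory[regs["MAR"]]            # t2: MBR <- Memory
    regs["MBR"] = (regs["MBR"] + 1) & word_mask  # t3: MBR <- (MBR) + 1
    memory[regs["MAR"]] = regs["MBR"]            # t4: Memory <- (MBR)
    if regs["MBR"] == 0:                         #     the conditional action
        regs["PC"] += 1                          #     PC <- (PC) + I, with I = 1
    return regs, memory

isz({"PC": 0x300, "MAR": 0, "MBR": 0}, {0x40: 0xFF}, 0x40)
```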
It is worth pondering the minimal nature of the control unit. It does this based only on knowing the instructions to be executed and the nature of the results of arithmetic and logical operations (e.g. positive, overflow, etc.). And it controls everything with a few control signals to points within the processor and a few control signals to the system bus.
Figure 15.5 indicates the use of a variety of data paths. More typically, some sort of internal bus arrangement, as was suggested in Figure 12, is used.
Using an internal processor bus, Figure 15.6 (CPU with Internal Bus) shows how the organization might be rearranged, with data movement gated into and out of each register. Two new registers, labeled Y and Z, have been added; these are needed for the proper operation of the ALU. The AC could be used for this purpose, but this limits the flexibility of the system and would not work with a processor with multiple general-purpose registers. The ALU is a combinatorial circuit (see Chapter 20) with no internal storage. Thus, the output of the ALU cannot be directly connected to the bus, because this output would feed back to the input. With this arrangement, an operation to add a value from memory to the AC would have the following steps:
t1: MAR <- (IR(address))
t2: MBR <- Memory
t3: Y <- (MBR)
t4: Z <- (AC) + (Y)
t5: AC <- (Z)
Other organizations are possible, but, in general, some sort of internal bus or set of internal buses is used. Another practical reason for the use of an internal bus is to save space.
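The five time steps above can be mimicked as register transfers; `add_memory_to_ac` is an illustrative helper, the 12-bit address field follows the earlier example machine, and the dictionary memory is an assumption.

```python
def add_memory_to_ac(regs, memory):
    """Add a value from memory to the AC via the staging registers Y and Z."""
    regs["MAR"] = regs["IR"] & 0xFFF     # t1: MAR <- (IR(address))
    regs["MBR"] = memory[regs["MAR"]]    # t2: MBR <- Memory
    regs["Y"] = regs["MBR"]              # t3: Y <- (MBR), staging the ALU input
    regs["Z"] = regs["AC"] + regs["Y"]   # t4: Z <- (AC) + (Y), ALU output
    regs["AC"] = regs["Z"]               # t5: AC <- (Z)
    return regs

add_memory_to_ac({"IR": 0x5941, "AC": 3, "MAR": 0, "MBR": 0, "Y": 0, "Z": 0},
                 {0x941: 2})
```

Staging through Y and Z is what keeps the ALU output from feeding back onto the bus, as described above.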
Its organization is shown in Figure 15... Several key components that may not be self-explanatory are:
Incrementer/decrementer address latch: Logic that can add 1 to or subtract 1 from the contents of the stack pointer or program counter.
Interrupt control: This module handles multiple levels of interrupt signals.
Table 15... These are linked to the external system bus (Figure 15.8). A discussion of the first component is deferred until the next section.
This module includes a clock and accepts as inputs the current instruction and some external control signals. The timing of processor operations is synchronized by the clock and controlled by the control unit with control signals. Each state lasts one clock cycle. During each state, the processor performs one or a set of simultaneous micro-operations as determined by the control signals. Machine cycles are defined to be equivalent to bus accesses. For example, if an instruction consists of two 8-bit portions, then two machine cycles are needed to fetch the instruction.
Figure 15... Of course, at the same time, the control unit generates internal control signals that control internal data transfers. Three machine cycles (M1, M2, M3) are needed. The second machine cycle fetches the second half of the instruction, which contains the number of the I/O device selected for output. The Address Latch Enabled (ALE) pulse signals the start of each machine cycle from the control unit. During timing state T1 of machine cycle M1, the control unit sets the IO/M signal to indicate that this is a memory operation. The control unit sets the Read Control (RD) signal to indicate a read, but it waits until T3 to copy the data from the bus. The final state, T4, is a bus idle state during which the processor decodes the instruction.
Finally, consider a subroutine call instruction. The saved address will later be used for return. The following micro-operations suffice:
t1: MAR <- (IR(address))
    MBR <- (PC)
t2: PC <- (IR(address))
    Memory <- (MBR)
t3: PC <- (PC) + I
The address in the PC at the start of the instruction is the address of the next instruction in sequence. This is saved at the address designated in the IR. The latter address is also incremented to provide the address of the instruction for the next instruction cycle. In our example, there is one sequence each for the fetch, indirect, and interrupt cycles and, for the execute cycle, there is one sequence of micro-operations for each opcode.
3.2 CONTROL OF THE PROCESSOR
The ICC (instruction cycle code) designates the state of the processor in terms of which portion of the cycle it is in:
00: Fetch
01: Indirect
10: Execute
11: Interrupt
At the end of each of the four cycles, the ICC is set appropriately. The interrupt cycle is always followed by the fetch cycle (see Figure 12). For both the fetch and execute cycles, the next cycle depends on the state of the system. The flowchart defines the complete sequence of micro-operations, depending only on the instruction sequence and the interrupt pattern. The flowchart for an actual processor would be more complex. We can now consider how the control unit causes this sequence to occur. These two actions may each be implemented as a single micro-operation. A simple example will be used.
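The cycle transitions described above can be sketched as a small function: 00 = fetch, 01 = indirect, 10 = execute, 11 = interrupt; the interrupt cycle always returns to fetch, while fetch and execute branch on the state of the system. The helper `next_icc` and its flag arguments are illustrative names.

```python
FETCH, INDIRECT, EXECUTE, INTERRUPT = "00", "01", "10", "11"

def next_icc(icc, indirect_addressing=False, interrupt_pending=False):
    """Return the ICC value for the next cycle, given the system state."""
    if icc == FETCH:
        return INDIRECT if indirect_addressing else EXECUTE
    if icc == INDIRECT:
        return EXECUTE                   # an indirect cycle always precedes execute
    if icc == EXECUTE:
        return INTERRUPT if interrupt_pending else FETCH
    return FETCH                         # interrupt cycle -> fetch cycle

next_icc(FETCH, indirect_addressing=True)
```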
THE FETCH CYCLE
We begin by looking at the fetch cycle, which occurs at the beginning of
each instruction cycle and causes an instruction to be fetched from
memory
...
It specifies the address in memory for a read or write
operation
...
It contains the value to be stored in memory or the last
value read from memory
...
• Instruction register (IR): Holds the last instruction fetched
...
An example appears
in Figure 3.2
...
The first step is to move that address to the
memory address register (MAR) because this is the only register
connected to the address lines of the system bus
...
The desired address (in the MAR) is placed on
the address
bus, the control unit issues a READ command on the control bus, and the result
appears on the data bus and is copied into the memory buffer register (MBR)
...
Because these two actions (read word from memory, increment PC) do
not interfere with each other, we can do them simultaneously to save time
...
This
frees up the MBR for use during a possible indirect cycle
...
Each micro-operation involves the movement of data into or out of a
register
...
Symbolically, we can write this
sequence of events as follows:
where I is the instruction length
...
We assume that a clock is available for timing purposes and that it emits
regularly spaced clock pulses
...
Thus, all time
units are of equal duration
...
The notation (t1, t2, t3) represents successive time units
...
•
Second time unit: Move contents of memory location specified by MAR to
MBR
...
•
Third time unit: Move contents of MBR to IR
...
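The three time units of the fetch cycle can be mimicked with register-transfer assignments. The register names (MAR, MBR, PC, IR) and I match the text; the memory dictionary and its contents are an illustrative stand-in, not real hardware behavior.

```python
# Register-transfer sketch of the fetch cycle described above.
# MAR, MBR, PC, IR match the text; memory contents are illustrative.
I = 1                                  # instruction length (one word here)
memory = {0x100: 0x1940}               # address -> instruction word (example values)
regs = {"PC": 0x100, "MAR": 0, "MBR": 0, "IR": 0}

# t1: MAR <-- (PC)
regs["MAR"] = regs["PC"]
# t2: MBR <-- Memory; PC <-- (PC) + I   (done simultaneously -- no conflict)
regs["MBR"] = memory[regs["MAR"]]
regs["PC"] = regs["PC"] + I
# t3: IR <-- (MBR)
regs["IR"] = regs["MBR"]
```

Note that the memory read and the PC increment share time unit t2 precisely because neither uses the other's result, which is the grouping rule the text goes on to state.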
The third micro-operation could have been grouped with the fourth without affecting the fetch operation.
The groupings of micro-operations must follow two simple rules:
The proper sequence of events must be followed
...
Conflicts must be avoided
...
For
example, the micro-operations (MBR <-- Memory) and (IR <-- MBR) should not occur
during the same time unit
...
To avoid duplication of circuitry, this addition could be performed by the
ALU
...
Whereas micro-operations are ignored in that figure, this discussion shows
the micro-operations needed to perform the subcycles of the instruction cycle
...
Continuing
our simple example, let us assume a one-address instruction format, with direct and
indirect addressing allowed
...
The address field of the instruction is transferred to the MAR
...
Finally, the address field of the IR is
updated from the MBR, so that it now contains a direct rather than an indirect
address
...
We skip that cycle for a moment, to consider the
interrupt cycle
...
If so, the interrupt cycle occurs
...
We have
t1: MBR <-- (PC)
t2: MAR <-- Save_Address
    PC  <-- Routine_Address
t3: Memory <-- (MBR)
In the first step, the contents of the PC are transferred to the MBR, so that they
can be saved for return from the interrupt
...
These two actions may
each be a single micro-operation
...
In any case, once this is done, the final
step is to store the MBR, which contains the old value of the PC, into memory
...
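The interrupt-cycle micro-operations above can be sketched the same way as the fetch cycle. Save_Address and Routine_Address are the fixed locations the text assumes for the simple single-interrupt case; the numeric values here are illustrative.

```python
# Register-transfer sketch of the interrupt cycle described above.
# Save_Address and Routine_Address are the fixed locations assumed in the text;
# the concrete numeric values are illustrative.
SAVE_ADDRESS, ROUTINE_ADDRESS = 0x0000, 0x0010
memory = {}
regs = {"PC": 0x0123, "MAR": 0, "MBR": 0}

regs["MBR"] = regs["PC"]            # t1: MBR <-- (PC)
regs["MAR"] = SAVE_ADDRESS          # t2: MAR <-- Save_Address
regs["PC"] = ROUTINE_ADDRESS        #     PC  <-- Routine_Address
memory[regs["MAR"]] = regs["MBR"]   # t3: Memory <-- (MBR)
```

The old PC ends up saved in memory and the PC points at the interrupt routine, ready for the fetch cycle that always follows.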
The fetch, indirect, and interrupt cycles are simple and predictable
...
This is not true of the execute cycle
...
Let us consider
several hypothetical examples
...
The following sequence of
micro-operations might occur:
t1: MAR <-- (IR(address))
t2: MBR <-- Memory
t3: R1  <-- (R1) + (MBR)
We begin with the IR containing the ADD instruction
...
Then the referenced memory location
is read
...
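The execute-cycle micro-operations for this ADD can be traced in the same register-transfer style. The register and memory contents below are invented example values, purely to show the data flow.

```python
# Register-transfer sketch of the ADD R1, X execute cycle above.
# Register and memory contents are illustrative example values.
memory = {0x940: 7}                     # location X = 0x940 holds the operand
regs = {"R1": 5, "MAR": 0, "MBR": 0}
ir_address = 0x940                      # IR(address) field left by the fetch

regs["MAR"] = ir_address                # t1: MAR <-- (IR(address))
regs["MBR"] = memory[regs["MAR"]]       # t2: MBR <-- Memory
regs["R1"] = regs["R1"] + regs["MBR"]   # t3: R1  <-- (R1) + (MBR)
```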
Again, this is a
simplified example
...
Let us look at two more complex examples
...
If the result is 0, the next instruction
is skipped
...
The PC is incremented
if (MBR) = 0
...
Note
also that this micro-operation can be performed during the same time unit during
which the updated value in MBR is stored back to memory
...
As an example, consider a
branch and-save-address instruction:
BSA X
The address of the instruction that follows the BSA instruction is saved in location X,
and execution continues at location X + I
...
This is a straightforward technique for providing subroutine calls
...
This is saved at the address designated in the IR
...
THE INSTRUCTION CYCLE
We have seen that each phase of the instruction cycle can be decomposed into a sequence of elementary micro-operations
...
We assume a new 2-bit register called the instruction cycle code (ICC)
...
The indirect
cycle is always followed by the execute cycle
...
For both the fetch and execute cycles, the next cycle depends on
the state of the system
...
1
...
Of
course, this is a simplified example
...
In any case, we have reached the point in our discussion in which the
operation of the processor is defined as the performance of a sequence of micro
operations
...
3
...
By
reducing the operation of the processor to its most fundamental level, we are able to define
exactly what it is that the control unit must cause to happen
...
A definition of these functional requirements is the basis for the design and
implementation of the control unit
...
Define the basic elements of the processor
...
Describe the micro-operations that the processor performs
...
Determine the functions that the control unit must perform to cause the microoperations to be performed
...
Let us summarize the results
...
The ALU is the
functional essence of the computer
...
Some registers contain status information needed to manage instruction
sequencing (e.g., a program status word)
...
Internal data paths are used to move data between
registers and between register and ALU
...
The control unit causes operations to
happen within the processor
...
As we have seen, these operations consist of a sequence of micro-operations
...
Transfer data from a register to an external interface (e.g., system bus)
...
Perform an arithmetic or logic operation, using registers for input and output
...
We can now be somewhat more explicit about the way in which the control unit
functions
...
Execution: The control unit causes each micro-operation to be performed
...
The key to
how the control unit operates is the use of control signals
...
For the control unit to perform its function,
it must have inputs that allow it to determine the state of the system and outputs that
allow it to control the behavior of the system
...
Internally, the control unit must have the logic required to perform its
sequencing and execution functions
...
Figure 3.1 is a general model of the control unit, showing all of its inputs and
outputs
...
The control unit causes one
micro-operation (or a set of simultaneous micro-operations) to be performed for
each clock pulse
...
Instruction register: The opcode and addressing mode of the current instruction
are used to determine which micro-operations to perform during the execute cycle
...
For example, for the increment-and-skip-if-zero (ISZ) instruction, the control unit will
increment the PC if the zero flag is set
...
The outputs are as follows:
Control signals within the processor: These are of two types: those that cause data to
be moved from one register to another, and those that activate specific ALU
functions
...
Three types of control signals are used: those that activate an ALU function, those that
activate a data path, and those that are signals on the external system bus or other external
interface
...
Let us consider again the fetch cycle to see how the control unit maintains control
...
At a given point, it knows
that the fetch cycle is to be performed next
...
The control unit does this by activating the control signal that opens the
gates between the bits of the PC and the bits of the MAR
...
The control unit does this by sending
the following control signals simultaneously:
A control signal that opens gates, allowing the contents of the MAR onto the address bus
A memory read control signal on the control bus
A control signal that opens the gates, allowing the contents of the data bus to be stored in
the MBR
Control signals to logic that add 1 to the contents of the PC and store the result back to
the PC
Following this, the control unit sends a control signal that opens gates between the MBR
and the IR
...
To decide this, it examines
the IR to see if an indirect memory reference is made
...
For the execute cycle, the control
unit begins by examining the opcode and, on the basis of that, decides which sequence of
micro-operations to perform for the execute cycle
...
Figure 3.3 illustrates the example
...
The data paths between elements are indicated
...
The control unit receives inputs from the clock, the
instruction register, and flags
...
Control signals go to three separate destinations:
Data paths: The control unit controls the internal flow of data
...
For each path to be controlled, there is a switch (indicated by a circle
in the figure)
...
ALU: The control unit controls the operation of the ALU by a set of control signals
...
System bus: The control unit sends control signals out onto the control lines of the
system bus (e.g., memory READ)
...
Using
this knowledge, and by reading all of its inputs, the control unit emits a sequence of
control signals that causes micro-operations to occur
...
For
simplicity, the data and control paths for incrementing the PC and for loading the fixed
addresses into the PC and MAR are not shown
...
The control unit is the
engine that runs the entire computer
...
It never gets to see the data being processed or the actual
results produced
...
Figure 15
...
The complexity of this type of
organization should be clear
...
2
...
2
...
Gates and control signals are provided for movement of data onto and off the bus from
each register
...
Two new registers, labeled Y and Z, have been added to the organization
...
When an operation involving two
operands is performed, one can be obtained from the internal bus, but the other must
be obtained from another source
...
Register Y provides temporary storage for the other input
...
Thus, when control
signals activate an ALU function, the input to the ALU is transformed to the output
...
Register Z provides temporary output storage
...
The use of common data paths simplifies the interconnection
layout and the control of the processor
...
To illustrate some of the concepts introduced thus far in this unit, let us consider the
Intel 8085
...
2
...
Several key components that may
not be self-explanatory are:
Incrementer/decrementer address latch: Logic that can add 1 to or subtract 1
from the contents of the stack pointer or program counter
...
Interrupt control: This module handles multiple levels of interrupt signals
...
Table 15
...
These are linked
to the external system bus
...
...
A discussion of the first
component is deferred until the next section
...
This module includes a clock and accepts as inputs the current
instruction and some external control signals
...
The timing of processor operations is synchronized by the clock and controlled by the
control unit with control signals
...
Each
state lasts one clock cycle
...
The number of machine cycles is fixed for a given instruction but varies from one
instruction type to another
...
For example, if
an instruction consists of two 8-bit portions, then two machine cycles are required to fetch
the instruction
...
Figure 3.7 gives an example of 8085 timing, showing the value of external
control signals
...
The diagram shows the instruction cycle for an OUT instruction
...
During the first, the OUT instruction is fetched
...
During the third cycle, the contents of the AC are written out to
the selected device over the data bus
...
The ALE pulse alerts
external circuits
...
Also, the control unit causes the
contents of the PC to be placed on the
address bus (A15 through A8) and the address/data bus (AD7 through AD0)
...
During timing state
T2, the addressed memory module places the contents of the addressed memory location on
the address/data bus
...
This gives the memory module time to put
the data on the bus and for the signal levels to stabilize
...
The remaining machine cycles
proceed in a similar fashion
...
3 HARDWIRED CONTROL IMPLEMENTATION
In a hardwired implementation, the control unit is essentially a state machine circuit
...
3.1 CONTROL UNIT INPUT
The key inputs are the instruction register, the clock, flags, and control bus signals
...
The other two inputs, however, are not directly useful to the control unit
...
The control unit makes use of the opcode and will
perform different actions (issue a different combination of control signals) for different
instructions
...
This function can be performed by a decoder, which takes an encoded input
and produces a single output
...
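A decoder with n encoded inputs asserts exactly one of its 2^n outputs. A minimal sketch of that behavior (the function name and list representation are illustrative, not part of the text):

```python
# Sketch of an n-to-2^n decoder: an encoded opcode selects exactly one output line.
def decode(opcode, n_bits):
    """Return a list of 2**n_bits output lines with a single 1."""
    lines = [0] * (2 ** n_bits)
    lines[opcode] = 1       # only the line matching the encoded input is asserted
    return lines

# A 2-bit opcode drives one of four instruction lines:
assert decode(0b10, 2) == [0, 0, 1, 0]
```

This is why a decoder lets the control unit react to each distinct opcode with a different combination of control signals.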
This is
useful for measuring the duration of micro-operations
...
However, the control unit emits different control signals at
different time units within a single instruction cycle
...
At the
end of an instruction cycle, the control unit must feed back to the counter to reinitialize it
at T1
...
10
...
Essentially, what must be done is, for each control signal, to derive a Boolean
expression of that signal as a function of the inputs
...
Let us consider again our simple example illustrated in Figure 15
...
We saw in Table
15
...
Let us consider a single control signal, C5
...
We can see that it is used twice in Table 15
...
Let us
define two new control signals, P and Q, that have the following interpretation:
PQ = 00: Fetch Cycle
PQ = 01: Indirect Cycle
PQ = 10: Execute Cycle
PQ = 11: Interrupt Cycle
Then the following Boolean expression defines C5:
C5 = P'·Q'·T2 + P'·Q·T2
(where P' denotes NOT P)
That is, the control signal C5 will be asserted during the second time unit of both the
fetch and indirect cycles
...
C5 is also needed during the execute cycle
...
Now we can define C5 as
C5 = P'·Q'·T2 + P'·Q·T2 + P·Q'·(LDA + ADD + AND)·T2
This same process could be repeated for every control signal generated by the
processor
...
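The Boolean expression for C5 can be checked by direct evaluation. P, Q, T2 and the opcode lines follow the text; casting the expression as a Python function is purely illustrative.

```python
# Direct evaluation of the Boolean expression for C5 derived above:
# C5 = P'.Q'.T2 + P'.Q.T2 + P.Q'.(LDA + ADD + AND).T2
def c5(p, q, t2, lda=False, add=False, and_=False):
    return (((not p) and (not q) and t2) or                   # fetch cycle, time unit 2
            ((not p) and q and t2) or                         # indirect cycle, time unit 2
            (p and (not q) and (lda or add or and_) and t2))  # execute cycle, for LDA/ADD/AND

assert c5(p=False, q=False, t2=True)       # asserted in the fetch cycle at t2
assert not c5(p=True, q=True, t2=True)     # never asserted in the interrupt cycle
```

In a hardwired control unit, each such expression becomes one combinatorial circuit; the difficulty the text mentions comes from needing one of these for every control signal.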
To tie everything together, the control unit must control the state of the instruction
cycle
...
The control unit must also set the appropriate values of P and
Q to define the next sub cycle to be performed
...
The task
of implementing a combinatorial circuit that satisfies all of these equations becomes
extremely difficult
...
3
...
A microprogram consists of a
sequence of instructions in a microprogramming language
...
3
...
1 MICRO INSTRUCTIONS
To implement a control unit as an interconnection of basic logic elements is no easy task
...
relatively inflexible
...
An alternative, which has been used in many CISC processors, is to implement a
microprogrammed control unit
...
1
...
This notation looks suspiciously like a programming
language
...
Each line
describes a set of micro-operations occurring at one time and is known as a
microinstruction
...
This latter term reflects the fact that a microprogram is midway between hardware and
software
...
How can we use the concept of microprogramming to implement a control unit? Consider
that for each micro-operation, all that the control unit is allowed to do is generate a set of
control signals
...
This condition can, of course, be represented by a binary digit for
each control line
...
Then each micro-operation would be represented by a different pattern of 1s
and 0s in the control word
...
Next, we must recognize that the sequence
of micro-operations is not fixed
...
So let us put our control words in a memory, with each word having a unique address
...
Also, add a few bits to specify
the condition
...
4
...
There is one
bit for each internal processor control line and one bit for each system bus control line.
There is a condition field indicating the condition under which there should be a branch,
and there is a field with the address of the microinstruction to be executed next when a
branch is taken
...
The resulting control signals will cause
one or more micro-operations to be performed
...
•
If the condition indicated by the condition bits is true, the next microinstruction to be
executed is indicated in the address field
...
4
...
The microinstructions in each routine are to be executed sequentially
...
There is a
special execute cycle routine whose only purpose is to signify that one of the machine
instruction routines (AND, ADD, and so on) is to be executed next, depending on the
current opcode
...
2 is a concise description of the complete operation of the
control unit
...
If nothing
else, this notation would be a useful device for documenting the functioning of a control unit
for a particular computer
...
It is also a way of implementing the
control unit
...
4
...
It follows that we could implement the control unit by simply executing that
program
...
4
...
The set of microinstructions is stored in the control memory
...
When a microinstruction is read from the
control memory, it is transferred to a control buffer register. The left-hand portion of that
register (see Figure 3.1b) connects to the control lines emanating
from the control unit
...
The third element shown in the figure is a
sequencing unit that loads the control address register and issues a read command
...
4
...
4
...
The control unit functions as follows:
1
...
2
...
the control buffer register
...
4
...
All this happens during one clock pulse
...
At the conclusion of each microinstruction, the sequencing logic unit loads a new address into the control address
register
...
•
Jump to a new routine based on a jump microinstruction: Load the address field of the
control buffer register into the control address register
...
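The read-and-sequence loop described above can be sketched in a few lines. The microinstruction fields (control bits, condition selector, branch address) follow the format described in the text; the three-word control memory and its contents are purely illustrative.

```python
# Sketch of the microprogrammed control unit loop described above.
# Each control-memory word holds control bits, a condition selector, and a
# branch address; this tiny three-word control memory is illustrative.
control_memory = [
    ("signals_A", None,  None),   # 0: emit control signals, fall through
    ("signals_B", "zero", 0),     # 1: emit signals, branch to word 0 if zero flag set
    ("signals_C", None,  None),   # 2: emit signals, fall through
]

def run(flags, steps):
    car = 0                                   # control address register
    emitted = []
    for _ in range(steps):
        control_bits, cond, target = control_memory[car]    # read control memory
        emitted.append(control_bits)          # drive the control lines for one pulse
        if cond is not None and flags.get(cond):
            car = target                      # branch taken: load branch address
        else:
            car = (car + 1) % len(control_memory)  # next microinstruction in sequence
    return emitted
```

Note that the "sequencing logic" here is nothing more than an incrementer plus a conditional load, which is why the text calls these units very simple pieces of logic.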
Figure 3.1d shows two modules labeled decoder
...
The lower decoder is not used for horizontal
microinstructions but is used for vertical microinstructions (Figure 16
...
As was
mentioned, in a horizontal microinstruction every bit in the control field attaches to a control
line
...
The advantage
of vertical microinstructions is that they are more compact (fewer bits) than horizontal
microinstructions, at the expense of a small additional amount of logic and time delay
...
4
...
Thus it is both cheaper and less error prone
to implement
...
On the other hand the
decoders and sequencing logic unit of a micro programmed control unit are very simple
pieces of logic
...
Despite this, microprogramming is the
dominant technique for implementing control units in pure CISC architecture due to its
ease of implementation
...
In designing a control unit, these tasks must be considered together
because both affect the format of the microinstruction and the timing of the control unit
...
0 CONCLUSION
Micro-operations are the functional or atomic operations of a processor
...
5
...
Execution is accomplished by the effect of these control signals, emanating from the
control unit to the ALU, registers, and system interconnection structure
...
Furthermore, the concept of micro-operations leads to an elegant and powerful approach to
control unit implementation, known as microprogramming
...
6
...
What is the relationship between instructions and micro-operations?
2
...
Briefly, what is meant by a hardwired implementation of a control unit?
...
What is the difference between a hardwired implementation and a
microprogrammed implementation of a control unit?
7
...
Microprocessor Architecture and Microprogramming. Upper Saddle River, NJ
...
1.0 Introduction
2.0 Objectives
3.0 Main content
3.2 Parallel Organization
4.0 Conclusion
5.0 Summary
6.0 Tutor-marked assignment
7.0 References/further reading
1
...
Instruction pipelining, at least to the extent of overlapping fetch and execute operations,
has been around for a long time
...
This approach is taken further with superscalar organization, which exploits
instruction-level parallelism
...
As computer technology has evolved and as the cost of
computer hardware has dropped, computer designers have sought more and more
opportunities for parallelism, usually to enhance performance and in some cases to increase
availability
...
2.0 OBJECTIVES
At the end of the unit you should be able to
-
Understand the traditional way to increase system performance
...
3
...
He proposed the following categories of
computer systems:
-
Single instruction, single data (SISD) stream: A single processor executes a
single instruction stream to operate on data stored in a single memory
...
-
Single instruction, multiple data (SIMD) stream: A single machine instruction
controls the simultaneous execution of a number of processing elements on a
lockstep basis
...
Vector
and array processors fall into this category
-
Multiple instruction, single data (MISD) stream: A sequence of data is
transmitted to a set of processors, each of which executes a different instruction
sequence
...
-
Multiple instructions, multiple data (MIMD) stream: A set of processors
simultaneously execute different instruction sequences on different data sets
...
-
With the MIMD organization, the processors are general-purpose; each is able to
process all of the instructions necessary to perform the appropriate data
transformation
...
1
...
4
...
The two most common multiple-processor
organizations are symmetric multiprocessors (SMPs) and clusters
...
5
...
0 TUTOR- MARKED ASSIGNMENT
1
...
2
...
0 REFERENCES/FURTHER READING
Catanzaro B
...
1.0 Introduction
2.0 Objectives
3.2 Multiprocessor operating system design considerations
3.3 A mainframe SMP
4.0 Conclusion
5.0 Summary
6.0 Tutor-marked assignment
7.0 References/further reading
1.0 Introduction
Virtually all single-user personal computers and most workstations contain a single
general-purpose microprocessor
...
Vendors have introduced
organization
...
2.0 Objectives
At the end of this unit you should be able to
system with an SMP
-
Understand the organization of a multiprocessor system
-
Explain the character of an SMP as a standalone computer system
...
1 Organization
This depicts in general terms the organization of a multiprocessor system
...
Each processor is self-contained, including a control unit, ALU, registers,
and, typically, one or more levels of cache
...
The processors can communicate with each other through
memory (messages and status information left in common data areas)
...
The memory is often organized so that
multiple simultaneous accesses to separate blocks of memory are possible
...
The most common organization for personal computers, workstations, and servers is the
time-shared bus
...
1
...
The structure and interfaces are basically the same as
for a single-processor system that uses a bus interconnection
...
To facilitate DMA transfers from I/O processors, the following
features are provided:
-
Addressing: It must be possible to distinguish modules on the bus to determine the
source and destination of data
...
-
Time-sharing: When one module is controlling the bus, other modules are locked
out and must, if necessary, suspend operation until bus access is achieved
...
In this latter case,
there are now multiple processors as well as multiple I/O processors all attempting
to gain access to one or more memory modules via the bus
...
The
physical interface and the addressing, arbitration, and time-sharing logic of
each processor remain the same as in a single-processor system
...
-
Reliability: The bus is essentially a passive medium, and the failure of any attached
device should not cause failure of the whole system
...
All memory references pass
through the common bus
...
To
improve performance, it is desirable to equip each processor with a cache memory
...
Typically, workstation
and PC SMPs have two levels of cache, with the L1 cache internal (same chip as the
processor) and the L2 cache either internal or external; some processors now employ
an L3 cache as well
...
Because each local cache contains an image of a portion of memory,
if a word is altered in one cache, it could conceivably invalidate a word in another
cache
...
This problem is known as the cache coherence problem and is typically
addressed in hardware rather than by the operating system
...
3.2 MULTIPROCESSOR OPERATING SYSTEM DESIGN CONSIDERATIONS
An SMP operating system manages processor and other computer resources so that the user
perceives a single operating system controlling system resources
...
In
both the SMP and uniprocessor cases, multiple jobs or processes may be active at one
time, and it is the responsibility of the operating system to schedule their execution and to
allocate resources
...
A user may construct applications that use multiple
processes or multiple threads within processes without regard to whether a single
processor or multiple processors will be available
...
Among the key design issues:
Simultaneous concurrent processes: Operating system routines need to be
reentrant to allow several processors to execute the same code simultaneously
...
-
Scheduling: Any processor may perform scheduling, so conflicts must be avoided
...
-
Synchronization: With multiple active processes having potential accesses to
shared address spaces or shared I/O resources, care must be taken to provide effective
...
-
Memory management: Memory management on a multiprocessor must deal with
all of the issues found on uniprocessor machines
...
The paging mechanism on different processors shares a page
replacement
...
The scheduler and other portions of the
operating system must recognize the loss of a processor and restructure management tables
accordingly
...
3.3 A MAINFRAME SMP
Most PC and workstation SMPs use a bus interconnection strategy, as depicted
in figure 3.1
...
The key components of the
configuration are shown in figure 3.2
Dual-core processor chip: Each processor chip includes two identical central
processors, in which most of the instructions are hardwired and the rest are executed by
vertical microcode
...
-
L2 cache: The L2 caches are arranged in clusters of five, with each cluster
supporting eight processor chips and providing access to the entire main memory space
...
-
Main store control (MSC): The MSCs interconnect the L2 caches and the main
memory
...
The maximum configurable
memory consists of 8 memory cards for a total of 256 GB
...
Traffic to/from the channels goes directly to the L2 cache
...
It is superscalar; it executes instructions in strict
architectural order
4
...
It can be defined as a standalone
computer system with the following characteristics
...
2
...
These processors share the same main memory and I/O facilities and are
interconnected by a bus or other internal connection scheme, such that memory access time is
approximately the same for each processor
...
All processors share access to I/O devices either through the same channels or
through different channels that provide paths to the same device
...
5
...
The system is controlled by an integrated operating system that provides interaction
between processors and their programs at the job, task, file, and data element levels
...
Point 5 illustrates one of the contrasts with a
loosely coupled multiprocessing system, such as a cluster
...
In an SMP, individual data elements can
constitute the level of interaction, and there can be a high degree of cooperation between
processes
...
An SMP organization has a number of potential advantages over a uniprocessor
organization
...
5.0 SUMMARY
- Availability: In a symmetric multiprocessor, because all processors can perform the same
functions, the failure of a single processor does not halt the machine
...
- Incremental growth: A user can enhance the performance of a system by adding an
additional processor
...
It is important to note that these are potential, rather than guaranteed, benefits
...
An attractive feature of an SMP is that the existence of multiple processors is transparent to
the user
...
6
...
What are some of the potential advantages of an SMP compared with a uniprocessor?
2
...
What are some of key operating system design issues for an SMP?
7
...
“Achieving High Performance in Bus-Based Shared-Memory
Multiprocessors.” IEEE Concurrency, July-September 2000
UNIT 3: MULTITHREADING AND CHIP MULTIPROCESSORS
CONTENT
1
...
0
OBJECTIVES
3
...
1
MAIN CONTENT
IMPLICIT AND EXPLICIT MULTITHREADING
3
...
3
4
...
0
SUMMARY
6
...
0
TUTOR- MARKED ASSIGNMENT
REFERENCES/FURTHER READING
1
...
This can be expressed as MIPS rate = f x IPC where f is the processor clock
frequency, in MHz, and IPC (instructions per cycle) is the average number of instructions
executed per cycle
...
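Plugging numbers into the relation MIPS rate = f x IPC makes it concrete; the clock frequency and IPC values below are illustrative, not taken from any particular processor.

```python
# Worked example of MIPS rate = f x IPC from the text (values illustrative).
f_mhz = 2000          # processor clock frequency f, in MHz
ipc = 1.5             # average instructions executed per cycle
mips_rate = f_mhz * ipc
print(mips_rate)      # prints 3000.0 (i.e. 3000 MIPS)
```

The point the text goes on to make is that raising either factor (higher f, or more instruction-level parallelism for higher IPC) increases complexity and power, which motivates multithreading.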
An alternative approach, which allows for a high degree of instruction-level parallelism
without increasing circuit complexity or power consumption, is called multithreading
...
The variety of specific multithreading designs, realized in both commercial systems and
experimental systems, is vast
...
0 OBJECTIVES
At the end of this unit you should be able to:
• Explain multithreading and chip multiprocessors
...
• Understand the four principal approaches to multithreading
...
3.1 IMPLICIT AND EXPLICIT MULTITHREADING
The concept of thread used in discussing multithreaded processors may not be the
same as the concept of software threads in a multiprogrammed operating system
...
A process embodies two
key characteristics
...
From time to time, a process may be allocated control or ownership of resources, such
as main memory, I/O channels, I/O devices, and files
...
This execution may be interleaved with that of other
processes
...
) and a
dispatching priority and is the entity that is scheduled and dispatched by the operating
system
...
Thread: A dispatchable unit of work within a process
...
A thread executes sequentially and is interruptible so that the
processor can turn to another thread
...
Typically, this type of switch is much less costly than a process switch
...
The multiple threads within a
process share the same resources
...
Traditional operating systems, such as earlier versions of UNIX, did
not support threads
...
A distinction is made between user-level threads,
which are visible to the application program, and kernel-level threads, which are visible
only to the operating system
...
All of the commercial processors and most of the experimental processors so far have used
explicit multithreading
...
Implicit multithreading refers to the
concurrent execution of multiple threads extracted from a single sequential program
...
3
...
The designs differ in the amount and type
of additional hardware used to support concurrent thread execution
...
The processor treats each thread separately and may
use a number of techniques for optimizing single-thread execution, including branch
prediction, register renaming, and superscalar techniques
...
Broadly speaking, there are four principal approaches to
multithreading:
Interleaved multithreading: This is also known as fine-grained multithreading
...
If a thread is blocked because of data dependencies or memory
latencies
...
Blocked multithreading: This is also known as coarse-grained multithreading
...
This event induces a switch to another thread
...
Simultaneous multithreading (SMT): Instructions are simultaneously issued from
multiple threads to the execution units of a superscalar processor
...
Chip multiprocessing: In this case, the entire processor is replicated on a single chip
and each processor handles separate threads
...
This is referred to as multicore.
For the first two approaches, instructions from different threads are not executed
simultaneously
...
This results in a better
utilization of the processor's execution resources and avoids a large penalty due to cache
misses and other latency events
...
Chip
multiprocessing also enables simultaneous execution of instructions from different threads
...
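The difference between interleaved (switch every cycle) and blocked (switch only on a stall) multithreading can be sketched with a toy issue schedule. The thread names and the stall pattern below are invented for illustration.

```python
# Toy issue schedules contrasting interleaved and blocked multithreading.
# Thread names and the stall set are illustrative.
def interleaved(threads, cycles):
    """Fine-grained: switch to the next thread on every cycle."""
    return [threads[c % len(threads)] for c in range(cycles)]

def blocked(threads, stalls, cycles):
    """Coarse-grained: run one thread until it stalls, then switch."""
    schedule, current = [], 0
    for c in range(cycles):
        schedule.append(threads[current])
        if (threads[current], c) in stalls:        # e.g. a cache miss this cycle
            current = (current + 1) % len(threads)
    return schedule

assert interleaved(["T0", "T1"], 4) == ["T0", "T1", "T0", "T1"]
assert blocked(["T0", "T1"], {("T0", 1)}, 4) == ["T0", "T0", "T1", "T1"]
```

SMT and chip multiprocessing differ from both of these in that instructions from different threads really do execute in the same cycle, rather than taking turns.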
•
Interleaved multithreaded scalar: This is the easiest multithreading approach to
implement
...
The pipeline
stages can be kept fully occupied, or close to fully occupied
...
3
...
In this case, the processor has a superscalar architecture and
can issue instructions from one or both threads in parallel
...
4
...
Another organizational
design decision in a multicore is whether the cores will be superscalar or will implement
simultaneous multithreading (SMT)
5
...
When more than one processor is implemented on a single chip, the configuration
is referred to as chip multiprocessing
...
0 TUTOR MARKED ASSIGNMENT
1
...
List and briefly explain four principal approaches to multithreading
7
...
Robic, B. and Silc, J. "Multithreaded Processors." The Computer Journal, No. 3, 2002
UNIT 4: VECTOR COMPUTATION
1
...
0 OBJECTIVES
3
...
1 APPROACHES TO VECTOR COMPUTATION
3
...
0 CONCLUSION
5
...
0 TUTOR MARKED ASSIGNMENT
7
...
0 INTRODUCTION
Although the performance of mainframe general-purpose computers continues to improve
relentlessly, there continue to be applications that are beyond the reach of the contemporary
mainframe
...
2
...
- Explain the contrast between mainframes and supercomputers as it relates to vector
computation
...
0 CONCLUSION
Supercomputers were developed to handle these types of problems
...
In contrast to
mainframes, which are designed for multiprogramming and intensive I/O
...
The supercomputer has
limited use and, because of its price tag, a limited market
...
0 SUMMARY
Comparatively few of these machines are operational, mostly at research centers and some
government agencies with scientific or engineering functions
...
Thus, the technology and performance of the supercomputer continues to
evolve
...
Although a supercomputer is optimized for
vector computation, it is a general-purpose computer, capable of handling scalar processing
and general data processing tasks
...
3
...
This facility is an optional add-on to the basic system but is highly
integrated with it
...
The IBM facility makes use of a number of vector registers
...
To compute the vector sum C = A + B, the vectors A and B are
loaded into two vector registers
...
The overlap of computation,
and the loading of the input data into the registers in a block, results in a significant speedup
over an ordinary ALU operation
...
provides
increased performance over loops of scalar arithmetic instructions in three ways:
The fixed and predetermined structure of vector data permits housekeeping instructions
inside the loop to be replaced by faster internal (hardware or microcode) machine operations
...
The use of vector registers for intermediate results avoids additional storage reference
...
2
...
Although the vector
facility is seen to be a physically separate add-on to the processor, its architecture is an
extension of the System/370 architecture and is compatible with it
...
- Arithmetic operations on individual vector elements produce exactly the same
result as do corresponding System/370 scalar instructions
...
Should the result be exact, as it is for scalar floating-point division, or should an approximation be allowed that would permit
higher-speed implementation but could sometimes introduce an error in one
or more low-order bit positions? The decision was made to uphold complete
compatibility with the System/370 architecture at the expense of a minor
performance degradation
...
- Arithmetic exceptions are the same as, or extensions of, exceptions for the scalar
arithmetic instructions of the System/370, and similar fix-up routines can be used
...
...
Thus,
when execution of the vector instruction resumes, the proper place in a vector
register is accessed
...
- This level of integration provides a number of benefits
...
Existing
application programs, language compilers, and other software can be run
unchanged
...
A key issue in the design of a vector facility is whether operands are located in
registers or in memory
...
This approach is also used on the Cray supercomputer
...
The main
disadvantage of the use of vector registers is that the programmer or compiler must take
them into account for good performance
...
In this case, a vector loop must be performed, in which the operation is performed on K
elements at a time and the loop is repeated N/K times
...
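The vector loop described above can be sketched in Python; the register length K = 4 is an illustrative assumption.

```python
# Sketch of the vector loop described above: with vector registers of
# length K and vectors of length N > K, the operation is performed on K
# elements at a time, and the loop repeats ceil(N/K) times. K = 4 is an
# illustrative register length.

def vector_add(a, b, K=4):
    c = [0] * len(a)
    for start in range(0, len(a), K):          # one pass per strip of K
        end = min(start + K, len(a))
        c[start:end] = [x + y for x, y in zip(a[start:end], b[start:end])]
    return c

a = list(range(10))
b = [10 * x for x in a]
print(vector_add(a, b))   # [0, 11, 22, 33, 44, 55, 66, 77, 88, 99]
```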
The speedup that can be achieved using registers is demonstrated in Figure 17
...
The
FORTRAN routine multiplies vector A by vector B to produce C, where each vector
has a real part (AR, BR, CR) and an imaginary part (AI, BI, CI)
...
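The complex vector multiply described above can be sketched in Python, with separate real and imaginary arrays as in the FORTRAN routine; the data values are invented.

```python
# Element-wise complex vector multiply C = A * B, with separate real
# (AR, BR, CR) and imaginary (AI, BI, CI) arrays. Data values invented.

def complex_vector_multiply(AR, AI, BR, BI):
    CR = [ar * br - ai * bi for ar, ai, br, bi in zip(AR, AI, BR, BI)]
    CI = [ar * bi + ai * br for ar, ai, br, bi in zip(AR, AI, BR, BI)]
    return CR, CI

CR, CI = complex_vector_multiply([1.0, 2.0], [0.0, 1.0],
                                 [3.0, 0.0], [1.0, 2.0])
print(CR, CI)   # [3.0, -2.0] [1.0, 4.0]
```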
The 3090 can perform
one main-storage access per processor, or clock, cycle
...
Let us assume the use of instructions that can specify
two source operands and a result
...
register architecture (part b), this time is reduced to 12 cycles
...
For large vectors, this fixed penalty is
relatively small
...
20c shows that the ability to specify both storage and register
operands in one instruction further reduces the time to 10 cycles per iteration
...
5
Figure 17
...
There are sixteen 32-bit vector registers
...
Any register element can hold an integer or floating-point value
...
6
...
What are the chief characteristics of SMP
2
...
0 REFERENCES/ FURTHER READING
Tomasevic, M. and Milutinovic, V
...
Hardware Solutions. Los Alamitos, CA
...
MODULE 4 REDUCED INSTRUCTION SET COMPUTERS
UNIT 1: INSTRUCTION EXECUTION CHARACTERISTICS
UNIT 2: REDUCED INSTRUCTION SET ARCHITECTURE
CONTENT
1
...
0 Objectives
3
...
1 Operations
3
...
3 Procedure calls
3
...
0 Conclusion
5
...
0 Tutor marked assignment
7
...
0 INTRODUCTION
Studies of the execution behavior of high-level language programs have provided
guidance in designing a new type of processor architecture
...
0 OBJECTIVES
At the end of this unit you should be able to
- Understand the operation of RISC machines
- Understand the number of parameters and variables a procedure deals with and
the depth of nesting
...
1 OPERATIONS
There is quite good agreement in the results of this mixture of languages and applications
...
suggesting that the simple movement of data is of high importance
...
LOOP)
...
This suggests that the sequence control
mechanism of the instruction set is important
...
However these results do not reveal which statements use the most time in
the execution of a typical program
...
described
in Appendix 4A
...
The second and third columns in Table 13
...
3
...
There are several aspects that are significant
...
3)
...
Further, more than 80% of the scalars were local (to the procedure)
variables
...
Thus, there is a preponderance of
references to scalars, and these are highly localized
...
As discussed before, it is necessary to deal
with actual architectures to examine program behavior more deeply
...
5 operand in memory and 1
...
Similar results are reported in [HUCK83] for C, Pascal, and FORTRAN programs on
S/370, PDP-11, and VAX
...
These latter studies suggest the importance of an architecture that lends itself to
fast operand accessing, because this operation is performed so frequently
...
We have seen that procedure calls and returns are an important aspect of HLL programs
...
2) suggests that these are the most time-consuming operations in
compiled HLL programs
...
Two aspects are significant: the number of parameters and
variables that a procedure deals with, and the depth of nesting
...
Similar results were reported by the Berkeley RISC team [KATE83],
as shown in Table 13
...
These results show that the number of words required per
procedure activation is not large
...
These studies show that
those references are in fact confined to relatively few variables
...
3 IMPLICATIONS
A number of groups have looked at results such as those just reported and have
concluded that the attempt to make the instruction set architecture close to HLLs is not
the most effective design strategy
...
Generalizing from the work of a number of researchers, three elements emerge that, by
and large, characterize RISC architectures
...
This is intended to optimize operand referencing
...
This, coupled with
the locality and predominance of scalar references, suggests that performance can be
improved by reducing memory references at the expense of more register references
...
Second, careful attention needs to be paid to the design of instruction pipelines
...
This manifests itself as a high
proportion of instructions that are prefetched but never executed
...
This point is not as obvious
as the others, but should become clearer in the ensuing discussion
...
0 CONCLUSION
Assignment statements predominate, suggesting that the simple movement of data
should be optimized
...
Studies of operand reference patterns suggest that it should be
possible to enhance performance by keeping a moderate number of operands in
registers
...
0 SUMMARY
The simple instruction set of a RISC lends itself to efficient pipelining because there
are fewer and more predictable operations performed per instruction
...
6
...
What is a delayed branch?
7
...
UNIT 2: REDUCED INSTRUCTION SET ARCHITECTURE
CONTENT
1
...
0 OBJECTIVES
3
...
1 WHY CISC
3
...
3 CISC VERSUS RISC CHARACTERISTICS
4
...
0 SUMMARY
6
...
0 REFERENCES/ FURTHER READING
1
...
The RISC architecture is a dramatic departure from
the historical trend in processor architecture
...
2
...
3
...
Specific examples will be seen later in this chapter
...
We have noted the trend to richer instruction sets, which include a larger number of
instructions and more complex instructions
...
Underlying both of these
reasons was the shift to HLLs on the part of programmers; architects attempted to design
machines that provided better support for HLLs
...
Indeed, because technology continues to evolve and because architectures exist along a
spectrum rather than in two neat categories, a black-and-white assessment is unlikely ever to
emerge
...
The first of the reasons cited, compiler simplification, seems obvious
...
If
there are machine instructions that resemble HLL statements, this task is simplified
...
[RADI83], [PATT82b])
...
The task of optimizing the
generated code to minimize code size, reduce instruction execution count, and enhance
pipelining is much more difficult with a complex instruction set
...
The other major reason cited is the expectation that a CISC will yield smaller
...
Let us examine both aspects of this assertion: that programs will be smaller and
that they will execute faster
...
First, because the program takes up less
memory, there is a savings in that resource
...
More important, smaller programs should improve performance, and this will happen in two ways
...
Second, in a paging
environment, smaller programs occupy fewer pages, reducing page faults
...
In many cases, the CISC program, expressed
in symbolic machine language, may be shorter (i.e., fewer instructions), but the number of
bits of memory occupied may not be noticeably smaller
...
6 shows results from three
studies that compared the size of compiled C programs on a variety of machines, including
RISC I, which has a reduced instruction set architecture
...
It is also interesting to note that the VAX, which has a much
more complex instruction set than the PDP-11, achieves very little savings over the latter
...
9 times the size of code on an IBM S/370
...
There are several reasons for these rather surprising results
...
Also, because there are more instructions on a
CISC, longer opcodes are required, producing longer instructions
...
and the former require fewer bits
...
So the expectation that a CISC will produce smaller programs, with the attendant
advantages, may not be realized
...
It seems to make sense that a
complex HLL operation will execute more quickly as a single machine instruction rather
than as a series of more primitive instructions
...
The entire control unit must be made more
complex, and/or the microprogram control store must be made larger, to accommodate a
richer instruction set
...
In fact, some researchers have found that the speedup in the execution of complex functions
is due not so much to the power of the complex machine instructions as to their residence in
high-speed control store [RADI83]
...
Thus, the hardware architect is in the position of trying to determine which subroutines or
functions will be used most frequently and assigning those to the control store by
implementing them in microcode
...
On S/390
systems, instructions such as Translate and Extended-Precision-Floating-Point-Divide
reside in high-speed storage, while the sequence involved in setting up procedure calls or
initiating an interrupt handler are in slower main memory
...
This has led a number of groups to pursue the opposite path
...
2 CHARACTERISTICS OF REDUCED INSTRUCTION SET ARCHITECTURES
Although a variety of different approaches to reduced instruction set architecture have
been taken, certain characteristics are common to all of them:
One instruction per cycle
Register-to-register operations
Simple addressing modes
Simple instruction formats
Here, we provide a brief discussion of these characteristics
...
The first characteristic listed is that there is one machine instruction per machine
cycle
...
Thus, RISC
machine instructions should be no more complicated than, and execute about as fast as,
microinstructions on CISC machines (discussed in Part Four)
...
Such instructions should execute faster than comparable machine instructions
on other machines, because it is not necessary to access a microprogram control store
during instruction execution
...
with
only simple LOAD and STORE operations accessing memory
...
For example, a RISC
instruction set may include only one or two ADD instructions (e.g., integer add, add with
carry); the VAX has 25 different ADD instructions
...
This emphasis on register-to-register operations is notable for RISC designs.
Contemporary CISC machines provide such instructions but also include memory-to-memory
and mixed register/memory operations
...
illustrates the approach taken
...
Results such as this one led researchers to suggest that future architectures
should contain no registers at all [MYER78]
...
Thus, Figure 13
...
A third characteristic is the use of simple addressing modes
...
Several additional modes, such as
displacement
and PC-relative, may be included
...
Again, this design feature simplifies the instruction set and
the control unit
...
Generally,
only one or a few formats are used
...
Field locations, especially the opcode, are fixed
...
With fixed fields, opcode decoding and register operand accessing
can occur simultaneously
...
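The benefit of fixed fields can be illustrated with a short sketch; the 32-bit layout below (6-bit opcode, three 5-bit register fields) is an illustrative assumption, not a specific ISA.

```python
# Sketch of fixed-field decode: with the opcode and register specifiers
# at fixed bit positions, every field can be extracted independently
# (in hardware, simultaneously) with simple masks. The layout here
# (6-bit opcode, three 5-bit register fields) is illustrative.

def decode(word):
    return {
        "opcode": (word >> 26) & 0x3F,
        "rs": (word >> 21) & 0x1F,
        "rt": (word >> 16) & 0x1F,
        "rd": (word >> 11) & 0x1F,
    }

instr = (0x08 << 26) | (1 << 21) | (2 << 16) | (3 << 11)
print(decode(instr))   # {'opcode': 8, 'rs': 1, 'rt': 2, 'rd': 3}
```

Because no field's position depends on another field's value, the hardware can latch all of them in the same clock phase.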
Instruction
fetching is optimized because word-length units are fetched
...
Taken together, these characteristics can be assessed to determine the potential
performance benefits of the RISC approach
...
First, more effective optimizing compilers can be developed
...
It
is even possible to compute parts of complex instructions at compile time
...
Each time it is executed, the move will depend on the length of the
string, whether and in which direction the locations overlap, and what the alignment
characteristics are
...
Thus, the
compiler could produce an optimized sequence of primitive instructions for this function
...
It would seem reasonable that a control unit built specifically for those
instructions and using little or no microcode could execute them faster than a comparable
CISC
...
RISC researchers feel that the
instruction pipelining technique can be applied much more effectively with a reduced
instruction set
...
A final, and somewhat less significant, point is that RISC processors are more
responsive to interrupts because interrupts are checked between rather elementary operations
...
The case for improved performance for a reduced instruction set architecture is strong,
but one could perhaps still make an argument for CISC
...
Further, most studies have not
attempted to separate the effects of a reduced instruction set and the effects of a large register
file
...
3
...
The result is that the more
recent RISC designs, notably the PowerPC, are no longer "pure" RISC and the more recent
CISC designs, notably the Pentium II and later Pentium models, do incorporate some RISC
characteristics
...
Table
13
...
For
purposes of this comparison, the following are considered typical of a classic RISC:
1
...
2
...
3
...
This parameter is
difficult to pin down
...
4
...
5
...
...
6
...
No more than one memory-addressed operand per instruction
...
8
...
in an instruction
...
This means that at
least 32 integer registers can be explicitly referenced at a time
...
Number of bits for floating-point register specifier equal to four or more
...
Items 1 through 3 are an indication of instruction decode
...
Items 9 and 10
are related to the advantage of compilers
...
UNIT 3 : RISC PIPELINING
1
...
0 objectives
3
...
1 pipelining with regular instructions
3
...
0 conclusion
5
...
0 tutor marked assignment
7
...
0 introduction
Instruction pipelining is often used to enhance performance and can be improved
further by permitting two memory accesses per stage
...
0 objectives
At the end of this unit, you should be able to
- Explain the stages involved in an instruction cycle
- Explain the stages required for LOAD and STORE operations
3
...
Let us reconsider this in the context
of a RISC architecture
...
• I: Instruction fetch
...
Performs an ALU operation with register input and output
For load and store operations, three stages are required:
I: Instruction fetch
...
Calculates memory address
D: Memory
...
Figure 3
...
A depicts the timing of a sequence of instructions using
...
Even very simple pipelining can improve performance
...
4
...
Thus we see that
the instruction fetch stage of the second instruction can be performed in parallel with the first part of
the execute stage
...
However, the execute/memory stage of the second instruction must be
delayed until the first instruction clears the second stage of the pipeline
...
Two problems prevent the
maximum speed up from being achieved
...
This requires the
insertion of a wait state in some instructions
...
To accommodate this with minimum circuitry, a NOOP instruction
can be inserted into the instruction stream by the compiler or assembler
...
This
yields the sequence shown in Figure 3
...
1 Now, up to three instructions can be
overlapped; and the improvement is as much as a factor of 3
...
Also data dependencies have an
effect
...
1
...
Again
...
Because the E stage usually involves an ALU operation, it may be longer
...
2 OPTIMIZATION OF PIPELINING
Because of the simplicity and regularity of a RISC instruction set, the design of the phasing
into three or four stages is easily accomplished
...
1
...
Up to four instructions at a time can be under way, and the maximum
potential speedup is a factor of 4
...
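A back-of-the-envelope model of this speedup, assuming an ideal stall-free pipeline, shows why the gain approaches the number of stages:

```python
# Rough model of pipeline speedup: with k stages and n instructions, an
# ideal stall-free pipeline takes k + (n - 1) cycles versus n * k cycles
# unpipelined, so the speedup approaches k (here, 4) for large n.

def speedup(k, n):
    return (n * k) / (k + n - 1)

print(round(speedup(4, 4), 2))     # 2.29: only four instructions in flight
print(round(speedup(4, 1000), 2))  # 3.99: approaches the factor of 4
```

As the text notes, data and branch dependencies insert stalls, so real pipelines fall short of this ideal.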
Because of the simple and regular nature of RISC instructions, pipelining schemes can be
efficiently employed
...
However, we have seen that data and branch
dependencies reduce the overall execution rate
...
First, let us consider branching instructions
...
The instruction location
immediately following the branch is referred to as the delay slot
...
1
...
At time 4, the JUMP instruction is executed at
the same time that instruction 103 (ADD instruction) is fetched
...
Figure 3
...
1 shows the same
pipeline handled by a typical RISC organization
...
However,
because of the insertion of the NOOP instruction, we do not need special circuitry to clear
the pipeline; the NOOP simply executes with no effect
...
7c shows the use of the
delayed branch
...
Note, however, that the ADD instruction is fetched before the
execution of the JUMP instruction has a chance to alter the program counter
...
Thus, the original semantics of the program are retained but one less clock cycle is
required for execution
...
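Delayed-branch semantics can be simulated with a toy interpreter; the instruction set, register names, and program below are invented for illustration.

```python
# Toy interpreter for delayed-branch semantics: the instruction in the
# delay slot (immediately after the branch) executes before the jump
# takes effect. The instruction set and program are invented.

def run(program):
    regs = {"r1": 0, "r2": 0}
    pc = 0
    while pc < len(program):
        op = program[pc]
        if op[0] == "ADD":
            regs[op[1]] += op[2]
            pc += 1
        elif op[0] == "JUMP":
            slot = program[pc + 1]       # delay slot executes first
            if slot[0] == "ADD":
                regs[slot[1]] += slot[2]
            pc = op[1]                   # now the branch takes effect
        elif op[0] == "HALT":
            break
    return regs

program = [("ADD", "r1", 1),   # 0
           ("JUMP", 4),        # 1: branch to HALT
           ("ADD", "r2", 5),   # 2: delay slot -- still executes
           ("ADD", "r2", 99),  # 3: skipped by the branch
           ("HALT",)]          # 4
print(run(program))   # {'r1': 1, 'r2': 5}
```

The instruction at index 2 runs even though it sits "after" the jump, which is exactly what lets the compiler fill the slot with useful work.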
For conditional branches, this procedure cannot be blindly applied
...
The
experience with both the Berkeley RISC and IBM 801 systems is that the majority of
conditional branch instructions can be optimized in this fashion ([PATT82a], [RADI83])
...
On
LOAD instructions, the register that is to be the target of the load is locked by the
processor
...
If the
compiler can rearrange instructions so that useful work can be done while the load is in the
pipeline, efficiency is increased
...
Unrolling replicates the body of a loop some number of times called the unrolling factor (u)
and iterates by step u instead of step 1
...
8 illustrates all three of these improvements in an example
...
Instruction parallelism is increased because the second
assignment can be performed while the results of the first are being stored and the loop
variables are being updated
...
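Loop unrolling with u = 2 can be sketched as follows; the loop body (scaling an array) is an invented example.

```python
# Sketch of loop unrolling with unrolling factor u = 2: the loop body is
# replicated and the index advances by 2 instead of 1, exposing two
# independent statements per iteration. The array computation is invented.

def scale_rolled(a, k):
    out = list(a)
    for i in range(len(a)):          # step 1
        out[i] = a[i] * k
    return out

def scale_unrolled(a, k):
    out = list(a)
    i, n = 0, len(a)
    while i + 1 < n:                 # step 2: body replicated twice
        out[i] = a[i] * k
        out[i + 1] = a[i + 1] * k
        i += 2
    if i < n:                        # cleanup when n is odd
        out[i] = a[i] * k
    return out

a = [1, 2, 3, 4, 5]
print(scale_unrolled(a, 10))         # [10, 20, 30, 40, 50]
```

Halving the number of iterations reduces loop overhead, and the two statements per iteration are independent, so they can be overlapped in the pipeline.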
As a final note, we should point out that the design of the instruction pipeline should
not be carried out in isolation from other optimization techniques applied to the system
...
UNIT 4: MIPS R4000
One of the first commercially available RISC chip sets was developed by MIPS
Technology Inc
...
In this section we look at the MIPS R4000
...
The most significant difference is that the R4000 uses 64 rather than 32
bits for all internal and external data paths and for addresses, registers, and the ALU
...
It allows a
bigger address space-large enough for an operating system to map more than a terabyte of
files directly into virtual memory for easy access
...
Also,
the 64-bit capacity allows the R4000 to process data such as IEEE double-precision
floating-point numbers and character strings, up to eight characters in a single action
...
The processor has a very
simple architecture
...
...
The processor supports thirty-two 64-bit registers
...
The relatively large cache
(the IBM 3090 provides 128 to 256 Kbytes of cache) enables the system to keep large sets
of program code and data local to the processor, off-loading the main memory bus and
avoiding the need for a large register file with the accompanying windowing logic
...
1 INSTRUCTION SET
Table 13
...
All processor
instructions are encoded in a single 32-bit word format
...
The R4000 makes no use of condition codes
...
This avoids the need for
special logic to deal with condition codes as they affect the pipelining mechanism and the
reordering of instructions by the compiler
...
Further, conditions mapped onto the
register files are subject to the same compile-time optimizations in allocation and reuse as
other values stored in registers
...
This single instruction length simplifies instruction fetch and decode, and it also simplifies
the interaction of instruction fetch with the virtual memory management unit
(i.e., instructions do not cross word or page boundaries)
...
9) share common formatting of opcodes and register references, simplifying
instruction decode
...
Only the simplest and most frequently used memory-addressing mode is implemented
in hardware
...
For
example, the "load word" instruction is of the form
lw r2, 128(r3)    # load word at address 128 offset from register 3 into register 2
Each of the 32 general-purpose registers can be used as the base register
...
The compiler makes use of multiple machine instructions to synthesize typical addressing
modes in conventional machines
...
loads the upper half of a register with a 16-bit immediate
value, setting the lower
half to zero
...
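The base-plus-displacement load and the upper-half-immediate technique can be sketched in Python; the bit arithmetic mirrors the description above, though the helper names are our own.

```python
# Sketch of the addressing described above: a load such as
# "lw r2, 128(r3)" adds a 16-bit displacement to a base register, and a
# full 32-bit constant is synthesized by loading the upper half of a
# register (lower half zeroed) and then OR-ing in the low 16 bits.
# Helper names are our own, not MIPS mnemonics.

def effective_address(base, displacement):
    """Base register plus displacement, as in lw r2, 128(r3)."""
    return (base + displacement) & 0xFFFFFFFF

def load_upper(imm16):
    """Upper half set from a 16-bit immediate, lower half zero."""
    return (imm16 & 0xFFFF) << 16

def or_immediate(reg, imm16):
    return reg | (imm16 & 0xFFFF)

print(hex(effective_address(0x1000, 128)))        # 0x1080
r = or_immediate(load_upper(0x1234), 0x5678)
print(hex(r))                                     # 0x12345678
```

Two simple instructions thus stand in for a 32-bit immediate mode, which is the pattern the compiler uses in place of hardware addressing modes.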
2 INSTRUCTION PIPELINE
With its simplified instruction architecture, the MIPS can achieve very efficient pipelining
...
The initial experimental RISC systems and the first generation of commercial RISC
processors achieve execution speeds that approach one instruction per system clock cycle
...
In essence, a superscalar architecture replicates each of the pipeline stages so
that two or more instructions at the same stage of the pipeline can be processed
simultaneously
...
With more stages, more instructions can be in the pipeline at the
same time, increasing parallelism
...
With superscalar pipelining, dependencies between
instructions in different pipelines can slow down the system
...
With superpipelining, there is overhead
associated with transferring instructions from one stage to the next
...
The MIPS R4000 is a
good example of a RISC-based superpipeline architecture
...
10a shows the instruction pipeline of the R3000
...
The MIPS compiler is able to reorder instructions to fill
delay slots with code 70 to 90% of the time
...
•
ALU operation or data operand address generation
•
Data memory reference
•
Write back into register file
As illustrated in Figure 3
...
5a, there is not only parallelism due to pipelining but also
parallelism within the execution of a single instruction
...
The external instruction and data access operations to the cache each
require 60 ns, as do the major internal operations (OP, DA, IA)
...
Calculation of an address for a branch instruction also overlaps
instruction decode and register fetch, so that a branch at instruction i can address the
ICACHE access of instruction i + 2
...
This tight coupling between instructions makes for
a highly efficient pipeline
...
The
functions performed in each stage are summarized in Table 3
...
5a
...
The use of more
advanced technology allows the clock cycle time to be cut in half, to 30 ns, and for the
access time to the register file to be cut in half
...
Before
looking at the final R4000 pipeline, let us consider how the R3000 pipeline can be modified
to improve performance using R4000 technology
...
5
...
Remember that the cycles in this figure are half as long as
those in Figure 3
...
b
...
Again, because of the speedup of the register file access, register read and write still
occupy only half of a clock cycle
...
This delay is reduced by implementing virtually indexed caches
and going to a parallel cache access and address translation
...
1
...
Because of the compression of events,
the data cache tag check is performed separately on the next cycle after cache access
...
In a super pipelined system, existing hardware is used several times per cycle by
inserting pipeline registers to split up each pipe stage
...
The R4000 technology has the speed and density to permit super
pipelining of degree 2
...
11 a shows the optimized R3000 pipeline using this super
pipelining
...
1
...
For the R4000, a much larger and specialized
adder was designed
...
Other improvements allow the execution of loads and stores at twice the rate
...
The pipeline advances at the rate of two stages per
clock cycle
...
IF = Instruction fetch first half
IS = Instruction fetch second half
RF = Fetch operands from register file
EX = Instruction execute
DF = Data cache first half
DS = Data cache second half
TC = Tag check
IC = Instruction cache
DC = Data cache
Figure: Theoretical R3000 and Actual R4000 Superpipelines
• Instruction fetch second half: The instruction cache
outputs the instruction and the TLB generates the physical address
...
...
• Instruction cache tag check is made
...
• Instruction execute: One of three activities can occur:
• If the instruction is a register-to-register operation, the ALU performs the arithmetic
or logical operation
...
• If the instruction is a branch, the branch target virtual address is calculated and
branch conditions are checked
...
• Data cache second: The TLB generates the physical address, and the data cache
outputs the data
...
• Write back: Instruction result is written back to register file
...
0 CONCLUSION
This unit has motivated the key characteristics of RISC machines:
1
...
A limited instruction set with a fixed format
2. A large number of registers, or the use of a compiler that optimizes register usage, and
3
...
0 SUMMARY
A RISC instruction set architecture also lends itself to the delayed branch technique, in
which branch instructions are rearranged with other instructions to improve pipeline
efficiency
...
0 TUTOR MARKED ASSIGNMENT
1
...
2
...
7
...
MIPS RISC Architecture. Englewood Cliffs, NJ: Prentice Hall,
1992
...
0 Introduction
2
...
0 Main content
3
...
2 Types of operating system
4
...
0 Summary
6
...
0 References and further reading
1
...
2
...
1 Operating System Objectives And Function
An OS is a program that controls the execution of application programs and acts as an
interface between the user of a computer and the computer hardware
...
- Efficiency: An OS allows the computer system resources to be used in an efficient
manner
...
The operating system as a user/computer interface
The hardware and software used in providing applications to a user can be viewed in a
layered or hierarchical fashion, as depicted in Figure 8
...
The user of those applications,
the end user, generally is not concerned with the computer's architecture
...
That application can be expressed in a
programming language and is developed by an application programmer
...
To ease this
task, a set of systems programs is provided
...
These implement frequently used functions that assist in program creation, the
management of files, and the control of I/O devices
...
The most important system program is the OS
...
It acts as a mediator, making it easier for
the programmer and for application programs to access and use those facilities and
services
...
the OS typically provides services in the following areas:
Program creation: The OS provides a variety of facilities and services such as
editors and debuggers to assist the programmer in creating programs
...
Program execution: A number of steps need to be
performed to execute a program
...
Access to I/O devices: Each I/O device requires its own specifications or control
signals for operation
...
Controlled access to files: In the case of files, control must include an
understanding of not only the nature of the I/O device (disk drive, tape drive) but also the file
format on the storage medium
...
System access: In the case of a shared or public system, the OS controls access to the
system as a whole and to specific system resources
...
Error detection and response: A variety of errors can occur while system is
running
...
In each case the OS must make the response that
clears the error condition with the least impact on running applications
...
Accounting: A good OS collects usage statistics for various resources and monitors
performance parameters such as response time
...
On a multiuser system the information can be used for billing
purposes
...
The OS is responsible for managing these resources
...
But this control is exercised in a curious way
...
(For
example, a residential heating system is controlled by a thermostat, which is completely distinct
from the heat-generation and heat-distribution apparatus
...
The OS frequently relinquishes control and must depend on the processor to allow it to
regain control
...
Like other computer
programs, it provides instructions for the processor
...
The OS directs the processor in the use of the other system
resources and in the timing of its execution of other programs
...
Thus, the OS relinquishes control for the processor to do some "useful"
work and then resumes control long enough to prepare the processor to do the next piece
of work
...
Figure 3
...
2 suggests the main resources that are managed by the OS
...
This includes the kernel, or nucleus, which contains the most
frequently used functions in the OS and, at a given time, other portions of the OS
currently in use
...
The
allocation of this resource (main memory) is controlled jointly by the OS and memory
management hardware in the processor, as we shall see
...
2 Types of operating system
Certain key characteristics serve to differentiate various types of operating systems
...
The first dimension specifies whether
the system is batch or interactive
...
Furthermore, the user may, depending on the
nature of the application, communicate with the computer during the execution of the job
...
The user's program is batched together with
programs from other users and submitted by a computer operator
...
Pure batch systems are rare today
...
An independent dimension specifies whether the system employs multiprogramming
or not
...
Several programs are loaded
into memory, and the processor switches rapidly among them
...
With the earliest computers, from the late 1940s to the mid-1950s, the programmer
interacted directly with the computer hardware; there was no OS
...
Programs in machine code were loaded via the input device (e
...
, a card reader)
...
The programmer
could proceed to examine registers and main memory to determine the cause of the error
...
These early systems presented two main problems
...
Typically, a
user could sign up for a block of time in multiples of a half hour or so
...
On the
other hand, the user might run into problems, not finish in the allotted time, and be forced to
stop before resolving the problem
...
Each of these steps could involve mounting or dismounting tapes, or setting up card
decks
...
Thus a considerable amount of time was spent just in setting of program to run
...
Over time, various system software tools were
developed to attempt to make serial processing more efficient
...
Early processors were very expensive, and therefore it was important to maximize
processor utilization
...
To improve utilization, simple batch operating systems were developed
...
Rather,
the user submits the job on cards or tape to a computer operator, who batches the jobs
together sequentially and places the entire batch on an input device, for use by the monitor
...
From the point of view of the monitor, the monitor
controls the sequence of events
...
1
...
That portion is referred to as the
resident monitor
...
The
monitor reads in jobs one at a time from the input device (typically a card reader or magnetic
tape drive)
...
When the job is completed, it
returns control to the monitor, which immediately reads in the next job
...
Now consider this sequence from the point of view of the processor
...
These instructions cause the next job to be read in to another
portion of main memory
...
The processor will then execute the instructions in the user's program
until it encounters an ending or error condition
...
Thus the phrase "control is passed to a job"
simply means that the processor is now fetching and executing instructions in a user
program, and "control is returned to the monitor" means that the processor is now
fetching and executing instructions from the monitor program
...
A batch of jobs
is queued up, and jobs are executed as rapidly as possible, with no intervening idle time
...
With each job,
instructions are included in a job control language (JCL)
...
A simple example is
that of a user submitting a program written in FORTRAN plus some data to be used by
the program
...
In addition to FORTRAN and data lines, the job
includes job control instructions, which are denoted by the beginning "$"
...
The compiler translates the user's program
into object code, which is stored in memory or mass storage
...
If it is stored on tape, then the $LOAD
instruction is required
...
The monitor invokes the loader, which loads the object program
into memory in place of the compiler and transfers control to it
...
We see that the monitor, or batch OS, is simply a computer program
...
Certain other hardware features are
also desirable:
Memory protection: While the user program is executing it must not alter the
memory area containing the monitor
...
The monitor would then abort the job
...
and load the next job
...
The timer is set at the beginning of each job
...
If the timer expires, the user program is stopped, and control returns to the monitor
...
If the processor encounters such an
instruction while executing a user program an error interrupt occurs
...
This prevents
...
a user program from accidentally
reading job control instructions from the next job
...
If a
privileged instruction is encountered by the processor while it is executing a
user program, the processor hardware considers this an error and transfers
control to the monitor
...
This feature
gives the OS more flexibility in relinquishing control to and regaining control
from user programs
...
There have been two sacrifices: Some main memory is now given over to
the monitor and some processor time is consumed by the monitor
...
Even with this overhead, the simple batch system improves
utilization of the computer
...
The processor
is often idle
...
The
calculation concerns a program that processes a file of records and performs, on
average
...
In this example the computer spends
over 96% of its time waiting for I/O devices to finish transferring data! The processor
spends a certain amount of time executing, until it reaches an I/O instruction
...
This inefficiency is not necessary
...
Suppose that there is room
for the OS and two user programs
...
the processor can switch to the other job, which is likely not waiting for I/O (Figure 8
...
Furthermore, we might expand memory to hold three, four, or more
programs and switch among all of them (Figure 8
...
This technique is known as
multiprogramming, or multitasking. It is the central theme of modern operating
systems
...
When the I/O operation is complete, the processor is interrupted and control is passed to an
interrupt-handling program in the OS
...
Multiprogramming operating systems are fairly sophisticated compared to single-program,
or uniprogramming, systems
...
In addition, if several jobs are ready to
run, the processor must decide which one to run, which requires some algorithm for
scheduling
...
With the use of multiprogramming, batch processing can
be quite efficient
...
Indeed, for some jobs, such as transaction processing, an
interactive mode is essential
...
That option was not available in the 1960s, when most
computers were big and costly
...
Just as multiprogramming allows the processor to handle multiple batch jobs at a time,
multiprogramming can be used to handle multiple interactive jobs
...
In a time-sharing system, multiple users simultaneously
access the system through terminals, with the OS interleaving the execution of each user
program in a short burst or quantum of computation
...
However, given the relatively slow human
reaction time, the response time on a properly designed system should be comparable to that
on a dedicated computer
...
The key
differences are listed in Table 3.3
...
Typically, the
hardware will interrupt a running process from time to time to enable the operating system
to make a new scheduling decision so as to share processor time fairly among a number of
processes
...
6.0 Tutor marked assignment
1. List and briefly define the key services provided by an operating system
7.0 References/Further reading
Stallings, W. Operating Systems: Internals and Design Principles, Sixth Edition
UNIT 2: SCHEDULING
1.0 INTRODUCTION
2.0 OBJECTIVES
3.0 MAIN CONTENT
3.1 LONG TERM SCHEDULING
3.2 MEDIUM TERM SCHEDULING
3.3 SHORT TERM SCHEDULING
4.0 CONCLUSION
5.0 SUMMARY
6.0 TUTOR MARKED ASSIGNMENT
7.0 REFERENCES/ FURTHER READING
1
...
It is the key to multiprogramming
...
2.0 OBJECTIVES
At the end of this unit you should be able to
- Explain scheduling
- List and discuss types of scheduling
3
...
Thus, it controls the degree of multiprogramming (number of processes in
memory)
...
In some systems, a newly created process begins in a swapped-out condition, in which case it is added to a queue for the medium-term scheduler
...
The long-term scheduler creates
processes from the queue when it can
...
First, the
scheduler must decide that the OS can take on one or more additional processes
...
The criteria
used may include priority, expected execution time, and I/O requirements
...
Time-sharing users are not simply queued up and
kept waiting until the system can accept them
...
At that
point, a connection request is met with a message indicating that the system is full and the
user should try again later
...
3.2 Medium-term scheduling
Medium-term scheduling is part of the swapping function, described in Section 8
...
Typically, the swapping-in decision is based on the need to manage the degree of
multiprogramming
...
Thus, the swapping-in decision will consider the memory requirements of the
swapped-out processes
...
3.3 Short-term scheduling
The long-term scheduler executes relatively infrequently and makes the coarse-grained
decision of whether or not to take on a new process, and which one to take
...
To understand the operation of the short-term scheduler, we need to
consider the concept of a process state
...
Its status at any point in time is referred to as a state
...
At minimum, there are five defined states for a process (Figure 3.1):
New: A program is admitted by the high-level scheduler but is not yet ready to
execute
...
moving it to the ready state
...
Running: The process is being executed by the processor
...
Halted: The process has terminated and will be destroyed by the OS
...
For this purpose each
process is represented in the OS by a process control block (Figure 3
...
State: The current state of the process (new, ready, and so on)
...
Program counter: The address of the next instruction in the program to be executed
...
Context data: These are data that are present in registers in the processor while the process
is executing
...
The context data plus the
program counter are saved when the process leaves the running state
...
I/O status information: Includes outstanding I/O requests, I/O devices (e.g., tape drives)
assigned to this process, a list of files assigned to the process, and so on
...
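As an illustration, the fields just described can be collected into a single structure. The following is a minimal sketch in Python; the field names are illustrative, not those of any particular operating system:

```python
from dataclasses import dataclass, field

# A minimal sketch of a process control block (PCB).
# Field names here are illustrative, not from a real OS.
@dataclass
class ProcessControlBlock:
    pid: int
    state: str = "new"            # new, ready, running, waiting, halted
    priority: int = 0
    program_counter: int = 0      # address of the next instruction to execute
    context_data: dict = field(default_factory=dict)  # saved register values
    io_status: list = field(default_factory=list)     # outstanding I/O requests

# The scheduler creates a blank PCB and later moves the process to ready.
pcb = ProcessControlBlock(pid=1)
pcb.state = "ready"
```

When the process leaves the running state, the scheduler would copy the processor registers into `context_data` and the program counter into `program_counter`, exactly as the text describes.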
When the scheduler accepts a new job or user request for execution, it creates a blank
process control block and places the associated process in the new state
...
To understand how the OS manages the scheduling of the various jobs in memory, let
us begin by considering the simple example in Figure 8
...
The figure shows how main
memory is partitioned at a given point in time
...
In addition, there are a number of active
processes, including A and B, each of which is allocated a portion of memory
...
The processor is executing
instructions from the program contained in A's memory partition
...
This will happen for one of three reasons:
Process A issues a service call (e.g., an I/O request) to the OS
...
Process A causes an interrupt
...
When this signal is detected, the processor ceases to execute A and transfers to
the interrupt handler in the OS
...
One example is an error, such as attempting to execute a privileged instruction
...
Some event unrelated to process A that requires attention causes an interrupt
...
In any case, the result is the following
...
The OS may perform some work, such as initiating an I/O operation
...
In this
example, B is chosen
...
This simple example highlights the basic functioning of the short-term scheduler
...
Figure 8.10 shows the major elements of the OS involved in the multiprogramming
and scheduling of processes
...
Once the
interrupt or service call is handled, the short-term scheduler is invoked to select a process for
execution
...
Each queue is simply a waiting list
of processes waiting for some resource
...
As conditions permit, the high-level scheduler will allocate memory and create a
process for one of the waiting items
...
Any one of these processes could use the processor next
...
Generally, this is done with a round-robin algorithm, giving each
process some time in turn
...
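The round-robin idea can be sketched in a few lines. This is a simplified model (fixed quantum, no I/O blocking), not a production scheduler:

```python
from collections import deque

def round_robin(jobs, quantum):
    """Give each process at most `quantum` time units per turn.
    `jobs` maps a process name to its remaining service time.
    Returns the order in which processes complete."""
    queue = deque(jobs.items())      # the short-term (ready) queue
    finished = []
    while queue:
        name, remaining = queue.popleft()
        remaining -= quantum         # run this process for one time slice
        if remaining > 0:
            queue.append((name, remaining))  # back of the ready queue
        else:
            finished.append(name)    # process has completed
    return finished

# A needs 3 units, B needs 1, C needs 2; quantum of 2:
order = round_robin({"A": 3, "B": 1, "C": 2}, quantum=2)  # -> ['B', 'C', 'A']
```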
Finally, there is an I/O queue for
each I/O device
...
All
processes waiting to use each device are lined up in that device's queue
...
Figure 8.11 suggests how processes progress through the computer under the control of
the OS
...
As resources become available, a process request becomes a process and is then
placed in the ready state and put in the short-term queue
...
While the OS is in control, it
decides which process in the short-term queue should be executed next
...
As was mentioned earlier, a process being executed may be suspended for a variety of
reasons
...
If it is suspended because of a timeout or because the OS must attend
to pressing business
...
Finally, we mention that the OS also manages the I/O queues
...
It then selects another waiting process (if any) and signals for the I/O
device to satisfy that process's request
...
1.0 Introduction
2.0 Objectives
3.0 Main content
3.1 Characteristics of memory systems
3.2 The memory hierarchy
...
1.0 Introduction
Computer memory is organized into a hierarchy
...
Next comes one or more levels of cache. When
multiple levels are used, they are denoted L1, L2, and so on
...
2
...
It would be nice to use only the fastest memory,
but because that is the most expensive memory, we trade off access time for cost by
using more of the slower memory
...
In general, it is likely that most future accesses to main memory by the processor
will be to locations recently accessed
...
If the cache is designed
properly, then most of the time the processor will request memory words that are
already in the cache
...
No one technology is optimal in satisfying the memory requirements for a
computer system
...
This chapter and the next focus on internal memory elements, while Chapter 6 is
devoted to external memory
...
The remainder of the chapter examines an essential element of all
modern computer systems: cache memory
...
The most important of these are
listed in Table 4
...
The term location refers to whether memory is internal or external to the
computer
...
But there are other forms
of internal memory
...
...
We will defer discussion of these
latter two types of internal memory to later chapters
...
External memory consists of peripheral storage devices, such as disk and tape, that
are accessible to the processor via I/O controllers
...
For internal memory, this is
typically expressed in terms of bytes (1 byte = 8 bits) or words
...
External memory capacity is typically expressed in terms of bytes
...
For internal memory, the unit of transfer is equal
to the number of electrical lines into and out of the memory module
...
To clarify this point, consider
three related concepts for internal memory:
Word: The "natural" unit of organization of memory
...
Unfortunately, there are many exceptions
...
The Intel x86 architecture has a wide variety of instruction lengths,
expressed as multiples of bytes, and a word size of 32 bits
...
In any case, the
relationship between the length in bits A of an address and the number N of
addressable units is 2^A = N
...
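For example, the relationship 2^A = N can be used to compute the number of address bits required for a memory of a given size:

```python
import math

def address_bits(n_units):
    """Smallest A such that 2**A >= N addressable units."""
    return math.ceil(math.log2(n_units))

# A 64 KB byte-addressable memory: 2**16 = 65536, so 16 address bits.
bits = address_bits(64 * 1024)   # -> 16
```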
The unit of transfer need not equal a word or an addressable
unit
...
Another distinction among memory types is the method of accessing units of data
...
Access
must be made in a specific linear sequence
...
A shared read-write mechanism is
used, and this must be moved from its current location to the desired location, passing
and rejecting each intermediate record
...
Tape units, discussed in Chapter 6, are sequential access
...
However, individual blocks or records have a unique address based on
physical location
...
Again,
access time is variable
...
Random access: Each addressable location in memory has a unique, physically
wired-in addressing mechanism
...
Thus, any location can be selected at
random and directly addressed and accessed
...
Associative: This is a random access type of memory that enables one to make a
comparison of desired bit locations within a word for a specified match, and to do this
for all words simultaneously
...
As with ordinary random-access memory, each
location has its own addressing mechanism, and retrieval time is constant independent
of location or prior access patterns
...
From a user's point of view, the two most important characteristics of memory are capacity
and performance
...
For non-random-access memory, access time is the time it takes to position
the read-write mechanism at the desired location
...
This additional time may be required for transients to die out on
signal lines or to regenerate data if they are read destructively
...
Transfer rate: This is the rate at which data can be transferred into or out of a
memory unit
...
For non-random-access memory, the following relationship holds:
T_N = T_A + N/R    (4.1)
where T_N is the average time to read or write N bits, T_A is the average access time, and R is the transfer rate in bits per second
The most common today are
semiconductor memory; magnetic surface memory, used for disk and tape; and optical and
magneto-optical memory
...
In a volatile memory,
information decays naturally or is lost when electrical power is switched off
...
Magnetic-surface
memories are nonvolatile
...
Nonerasable memory cannot be altered, except by destroying
the storage unit
...
Of necessity, a
practical nonerasable memory must also be nonvolatile
...
By organization is
meant the physical arrangement of bits to form words
...
The design constraints on a computer's memory can be summed up by three questions:
How much? How fast? How expensive?
The question of how much is somewhat open ended
...
The question of how fast is, in a sense, easier to
answer
...
That is, as the processor is executing instructions, we do not want it to have to
pause waiting for instructions or operands
...
For
a practical system, the cost of memory must be reasonable in relationship to other components
...
A variety of technologies are used to implement
memory systems, and across this spectrum of technologies the following relationships hold:
Faster access time, greater cost per bit
Greater capacity, smaller cost per bit
Greater capacity, slower access time
The dilemma facing the designer is clear
...
However, to meet performance requirements,
the designer needs to use expensive, relatively lower-capacity memories with short access
times
...
A typical hierarchy is illustrated in Figure
4
...
As one goes down the hierarchy, the following occur:
a. Decreasing cost per bit
b. Increasing capacity
c. Increasing access time
d. Decreasing frequency of access of the memory by the processor
Thus, smaller, more expensive, faster memories are supplemented by larger, cheaper,
slower memories
...
We examine this concept in greater detail when
we discuss the cache, later in this chapter, and virtual memory in
Chapter 8
...
The use of two levels of memory to reduce average access time works in principle, but only if conditions (a) through (d) apply
...
Fortunately, condition (d) is also generally valid
...
During the course of execution of a program, memory references by the processor, for both instructions and data, tend to cluster
...
Once a loop or
subroutine is entered, there are repeated references to a small set of instructions
...
Over a long period of time, the clusters in use change, but over a short
period of time, the processor is primarily working with fixed clusters of memory
references
...
Consider the two-level example already presented
...
The current clusters can be
temporarily placed in level 1
...
On average, however, most references will be to instructions and data
contained in level 1
...
...
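The benefit of this locality can be quantified. A common sketch of the two-level average access time, assuming a miss pays the level-1 check plus the level-2 access, is:

```python
def avg_access_time(t1, t2, hit_ratio):
    """Average access time for a two-level memory.
    Hits are served from level 1 in time t1; misses pay t1 + t2."""
    return hit_ratio * t1 + (1 - hit_ratio) * (t1 + t2)

# 95% of references found in a 0.01 us level 1 backed by a 0.1 us level 2:
t = avg_access_time(0.01, 0.1, 0.95)   # 0.015 us, close to the level-1 time
```

The higher the hit ratio, the closer the average access time is to the level-1 time, which is exactly the effect locality of reference provides.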
Typically, a processor will contain a few dozen such registers, although some machines contain
hundreds of registers
...
Each location in main memory has a
unique address
...
The cache is not usually visible to the programmer or, indeed, to the
processor
...
It is a device for staging the movement of data between main memory
and processor registers to improve performance
...
The use of three levels exploits the fact that semiconductor
memory comes in a variety of types, which differ in speed and cost
...
External, nonvolatile memory is also referred to as secondary memory or
auxiliary memory
...
Disk is also used to provide an extension to main memory
known as virtual memory, which is discussed in Chapter 8
...
For example, large IBM
mainframes include a form of internal memory known as expanded storage. This uses a
semiconductor technology that is slower and less expensive than that of main memory
...
Other forms of secondary memory include optical and
magneto-optical disks
...
A portion of main memory can be used as a buffer to hold
data temporarily that is to be read out to disk
...
Instead of many small transfers of data, we have a
few large transfers of data
...
Some data destined for write-out may be referenced by a program before the
next dump to disk
...
ERROR CORRECTION
A semiconductor memory system is subject to errors
...
A hard failure is a permanent physical defect so that the
memory cell or cells affected cannot reliably store data but become stuck at 0 or 1 or
switch erratically between 0 and 1
...
A soft error is a random, nondestructive
event that alters the contents of one or more memory cells without damaging the
memory
...
These particles result from radioactive decay and are distressingly common because
radioactive nuclei are found in small quantities in nearly all materials
...
Figure 5
...
When data are
to be written into memory, a calculation, depicted as a function f, is performed on the
data to produce a code
...
Thus, if an M-bit
word of data is to be stored and the code is of length K bits, then the actual size of the
stored word is M + K bits
...
A new set of K code bits is generated from the M data bits and
compared with the fetched code bits
...
The fetched data bits are sent out
...
The data bits plus error
correction bits are fed into a corrector, which produces a corrected set of M bits to be
sent out
...
This condition is reported
...
A code
is characterized by the number of bit errors in a word that it can correct and detect
...
Figure 5
...
With three intersecting circles, there
are seven compartments
...
8a)
...
that the total number of 1s in its circle is even (Figure 5
...
Thus, because circle A includes three data 1s, the parity bit in that circle is set to 1
...
8c), it is easily found
By checking the parity bits, discrepancies are found in circle A and circle C but not in
circle B
...
The error can
therefore be corrected by changing that bit
...
To start, let us determine how long the code must be
...
A bit-by-bit comparison is
done by taking the exclusive-OR of the two inputs
...
Thus, each bit of the syndrome is 0 or 1 according to whether there is or is not a match in that bit position for the two inputs
...
The value
0 indicates that no error was detected, leaving 2^K - 1 values to indicate, if there is an
error, which bit was in error
...
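The syndrome idea can be made concrete with the classic (7,4) Hamming single-error-correcting code, a specific instance of the general scheme with M = 4 data bits and K = 3 check bits:

```python
def hamming74_encode(d):
    """Encode 4 data bits as a (7,4) Hamming codeword.
    Layout by 1-based position: [p1, p2, d1, p3, d2, d3, d4]."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4      # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4      # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4      # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming74_syndrome(c):
    """Syndrome of a 7-bit codeword: 0 means no error detected,
    otherwise it is the 1-based position of the single flipped bit."""
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    return s1 + 2 * s2 + 4 * s3

code = hamming74_encode([1, 0, 1, 1])
code[4] ^= 1                      # inject a single-bit error at position 5
pos = hamming74_syndrome(code)    # the syndrome points at the flipped bit
code[pos - 1] ^= 1                # correct it
```

Here the syndrome directly names the erroneous bit position, which is the property the text describes: 0 means no error, and the remaining 2^K - 1 values identify the bit in error.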
0 Conclusion
As one goes down the memory hierarchy, one finds decreasing cost per bit,
increasing capacity, and slower access time
...
2
...
If a bit error occurs,
the code will detect and usually correct the error
...
0 Tutor Marked Assignment
1
...
What is a parity bit?
3
...
0 References/Further reading
1
...
Adamek, J. Foundations of Coding. New York: Wiley, 1991
2
...
1.0 INTRODUCTION
2.0 OBJECTIVES
3.0 MAIN CONTENT
3.1 CACHE MEMORY PRINCIPLES
3.2 ELEMENTS OF CACHE DESIGN
3.3 PENTIUM 4 CACHE ORGANIZATION
3.4 ARM CACHE ORGANIZATION
4.0 CONCLUSION
5.0 SUMMARY
6.0 TUTOR MARKED ASSIGNMENT
7.0 REFERENCES AND FURTHER READING
1
...
So the cache memory
automatically retains a copy of some of the recently used words from the
dynamic random-access memory (DRAM)
2
...
3.1 Cache memory principles
3
...
3.3 Pentium 4 cache organization
3
...
0 Conclusion
If the cache is designed properly then most of the time the processor will request
memory words that are already in the cache
...
0 SUMMARY
Cache memories are intended to give memory speeds approaching those of the fastest available memories, together with a large memory size
...
Moreover, the larger the cache, the larger the number of gates involved in
addressing the cache
...
0 Tutor marked assignment
For a direct-mapped cache, a main memory address is viewed as consisting of three
fields. List and define the three fields
...
0 Reference and further reading
Agarwal, A
...
Boston: Kluwer Academic Publishers, 1989
...
0 Objectives
2
...
1 Digital circuitry
2
...
3 Basic identities of Boolean Algebra
3
...
0 Summary
5
...
0 References/ Further reading
Construct a truth table for the following Boolean expression:
a
...
ABC + ABC + ABC
c
...
(A +B) (A+C) (A+B)
7
...
Ones and Zeros: Understanding Boolean Algebra, Digital
1
...
In this unit we suggest how storage elements and circuits can be
implemented in digital logic specifically with combinational and sequential
circuits
...
0 OBJECTIVES
At the end of this unit you should be able to
- Understand Boolean operators
- Explain the Boolean operators
- Discuss the identities of Boolean Algebra
3
...
2 Explain the Boolean operators
3
...
1 DIGITAL CIRCUITRY
The digital circuitry in digital computers and other digital systems is designed,
and its behavior is analyzed, with the use of a mathematical discipline known as
Boolean algebra
...
In 1938, Claude Shannon, a research
assistant in the Electrical Engineering Department at M.I.T., suggested that
Boolean algebra could be used to solve problems in relay-switching circuit
design [SHAN38]
...
Boolean algebra turns out to be
a convenient tool in two areas:
Analysis: It is an economical way of describing the function of digital circuitry
...
As with any algebra, Boolean algebra makes use of variables and
operations
...
Thus, a variable may take on the value 1 (TRUE) or 0 (FALSE)
...
The operation OR yields true if either or both of its operands
are true
...
For
example, consider the equation
D = A + (B̄ · C)
D is equal to 1 if A is 1 or if both B = 0 and C = 1
...
The paper is available at this book's web site
...
2
3
...
In the absence of parentheses, the AND operation takes precedence over the OR operation
...
Thus,
A + B · C = A + (B · C) = A + BC
all mean: Take the AND of B and C; then take the OR of the result and A
...
Table 20.1a defines the basic logical operations in a form known as a truth
table, which lists the value of an operation for every possible combination of values
of operands
...
The exclusive-or (XOR) of two logical operands is 1 if and only if exactly
one of the operands has the value 1
...
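These operations are easy to tabulate mechanically. A small sketch that models each operator on 0/1 values and builds a truth table:

```python
# Boolean operations on 0/1 values; a truth table lists each
# operation's result for every combination of operand values.
AND  = lambda a, b: a & b
OR   = lambda a, b: a | b
NOT  = lambda a: 1 - a
NAND = lambda a, b: NOT(AND(a, b))
NOR  = lambda a, b: NOT(OR(a, b))
XOR  = lambda a, b: a ^ b          # 1 iff exactly one operand is 1

# One row per combination of A and B, with AND, OR, and XOR columns:
table = [(a, b, AND(a, b), OR(a, b), XOR(a, b))
         for a in (0, 1) for b in (0, 1)]
```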
3
...
2
Basic Identities of Boolean Algebra

Basic Postulates
A · B = B · A                          A + B = B + A                          Commutative Laws
A · (B + C) = (A · B) + (A · C)        A + (B · C) = (A + B) · (A + C)        Distributive Laws
1 · A = A                              0 + A = A                              Identity Elements
A · Ā = 0                              A + Ā = 1                              Inverse Elements

Other Identities
0 · A = 0                              1 + A = 1
A · A = A                              A + A = A
A · (B · C) = (A · B) · C              A + (B + C) = (A + B) + C              Associative Laws
NOT(A · B) = Ā + B̄                     NOT(A + B) = Ā · B̄                     DeMorgan's Theorem
The logical operations, with the exception of NOT, can be generalized to
more than two variables, as shown in Table 20
...
Table 20
...
The equations have been
arranged in two columns to show the complementary, or dual, nature of the AND
and OR operations
...
The postulates define the way in which Boolean expressions are
interpreted
...
2 by substituting
actual values (1s and 0s) for the variables A, B, and C
...
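That substitution check can be automated. For example, DeMorgan's theorem can be verified exhaustively over all 0/1 assignments:

```python
def check_demorgan():
    """Verify both DeMorgan identities by substituting every
    combination of 0/1 values, as the text suggests."""
    for a in (0, 1):
        for b in (0, 1):
            # NOT(A AND B) = (NOT A) OR (NOT B)
            if (1 - (a & b)) != ((1 - a) | (1 - b)):
                return False
            # NOT(A OR B) = (NOT A) AND (NOT B)
            if (1 - (a | b)) != ((1 - a) & (1 - b)):
                return False
    return True
```

Because each variable can only be 0 or 1, four substitutions are enough to prove an identity in two variables.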
0 Introduction
2
...
0 Main content
3
...
2 Basic logic gates
4
...
0 Summary
6
...
0 References/ further reading
1
...
Logical
functions are implemented by the interconnection of gates
...
0 OBJECTIVES
At the end of this unit, you should be able to
- Explain the basic logic gates
- Draw the line diagram for each gate/logical circuit
- Construct truth tables of logical circuits
3
...
The basic gates used in digital logic are
AND, OR, NOT, NAND, NOR, and XOR
...
Figure 20.1 depicts these six gates
...
The symbology used here and throughout this unit is the IEEE standard,
IEEE Std 91
Each gate shown in Figure 20
...
However, as indicated in Table 20
...
Thus, (X + Y + Z) can be implemented with a single OR gate with
three inputs
...
The significance of this delay is
discussed in Section 20
...
In some cases, a gate is implemented with two outputs,
one output being the negation of the other output
...
The true (1) state is either a high or low voltage state, depending on the type of
electronic circuitry
...
2 BASIC LOGIC GATES
Typically, not all gate types are used in implementation
...
Thus, it is important to identify
functionally complete sets of gates
...
The following are functionally complete
sets:
• AND, OR, NOT
• AND, NOT
• OR, NOT
• NAND
• NOR
It should be clear that AND, OR, and NOT gates constitute a functionally complete set,
because they represent the three operations of Boolean algebra
...
This can be done by applying DeMorgan's theorem:
A + B = NOT(Ā · B̄)
A OR B = NOT ((NOT A) AND (NOT B))
...
Figure 20.2 shows the implementation using NAND gates, and Figure 20.3 shows the same thing for NOR gates
...
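The functional completeness of NAND can be demonstrated directly, building NOT, AND, and OR from a single NAND primitive:

```python
def nand(a, b):
    """The NAND gate: NOT(A AND B), on 0/1 values."""
    return 1 - (a & b)

def not_(a):
    return nand(a, a)                       # NOT A = A NAND A

def and_(a, b):
    return nand(nand(a, b), nand(a, b))     # AND = NOT(A NAND B)

def or_(a, b):
    return nand(nand(a, a), nand(b, b))     # OR A,B = (NOT A) NAND (NOT B)
```

The `or_` construction is exactly the DeMorgan rewriting of A + B given in the text, with each NOT itself realized as a NAND.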
With gates, we have reached the most primitive circuit level of computer hardware
...
For our purposes, however, we are content to
describe how gates can be used as building blocks to implement the essential logical circuits
of a digital computer
4
...
The design and fabrication of circuits
is simpler and cheaper if only one or two types of gates are used
...
0 SUMMARY
In summary, we have six basic logic gates, each of which is defined in three ways,
namely graphic symbol, algebraic notation, and truth table
...
0 TUTOR MARKED ASSIGNMENT
1
...
2
...
0 REFERENCES/ FURTHER READING
Farahat, H
...
0 Introduction
2
...
0 Main content
3
...
0
Introduction
A combinational circuit is an interconnected set of gates whose output at any
time is a function only of the input at that time
...
the output, with only gate delays
...
• Explain and discuss algebraic simplification, Karnaugh maps, and Quine-McCluskey tables
3
...
As with a gate, a combinational circuit can be defined in three ways:
• Truth table: For each of the 2^n possible combinations of input signals, the
binary value of each of the m output signals is listed
...
• Boolean equations: Each output signal is expressed as a Boolean function of
its input signals
...
For any given function, there are a number of alternative realizations
...
3
...
1)
There are three combinations of input values that cause F to be 1, and if any
one of these combinations occurs, the result is 1
...
3
reasons, is known as the sum of products (SOP) form: Figure 20
...
Another form can also be derived from the truth table
...
We can
also say that the output is 1 if none of the input combinations that produce 0 is
true
...
Z) = X̄ + Ȳ + Z̄
Thus, F is the AND of five sum terms, one for each row of the truth table with output 0; in each sum term, a variable is complemented where it has the value 1 in that row
This is in the product of sums (POS) form, which is illustrated in Figure 20
...
For
clarity, NOT gates are not shown
...
This simplifies the logic diagram and makes the
inputs to the gates more readily apparent
...
At this point,
it would seem that the choice would depend on whether the truth table contains
more 1s or 0s for the output function: The SOP has one term for each 1, and the
POS has one term for each 0
...
Implementation of Table 20
...
The significance of the first point is that, with a simpler Boolean expression,
fewer gates will be needed to implement the function
...
3
...
2 to
reduce the Boolean expression to one with fewer elements
...
1)
...
3)
F=B(A+~)
This expression can be implemented as shown in Figure 20
...
The simplification
of Equation (20
...
For more complex
expressions, some more systematic approach is needed
...
The map is an array of 2^n squares, representing all possible combinations of values of
n binary variables
...
7a shows the map of four squares for a function of two
variables
...
In the case
of three variables, the representation is an arrangement of eight squares (Figure
20
...
For four variables, 16 squares are needed, with the
arrangement indicated in Figure 20
...
The map can be used to represent any Boolean function in the following way
...
Thus, the product AB corresponds to the fourth square in Figure 20
...
For each such product in the function, 1 is placed in the corresponding square
...
Given the
truth table of a Boolean function, it is an easy matter to construct the map: for each
combination of values of variables that produce a result of 1 in the truth table, fill
in the corresponding square of the map with 1
...
7b shows the result for
the truth table of Table 20
...
To convert from a Boolean expression to a map, it is
first necessary to put the expression into what is referred to as canonical form:
each term in the expression must contain each variable
...
3), we must first expand it into the full form of Equation (20
...
The labeling used in Figure 20
...
Here the two rows embraced by the symbol
A are those in which the variable A has the value 1; the rows not embraced by the
symbol A are those in which A is 0; similarly for B, C, and D
...
The principle is
as follows
...
If
two adjacent squares both have an entry of one, then the corresponding product
terms differ in only one variable
...
For_ example, in Figure 20
...
Thus, the function expressed is
ABC̄D + ABCD = ABD
This process can be extended in several ways
...
Thus, the top
square of a column is adjacent to the bottom square, and the leftmost square of a
row is adjacent to the rightmost square
...
8b and c
...
The next three examples in Figure 20
...
Note that in this case, two of the variables can be eliminated
...
We can summarize the rules for simplification as follows:
Among the marked squares (squares with a 1), find those that belong to a unique
largest block of 1, 2, 4, or 8 and circle those blocks
...
The results
may not be unique in some cases
...
When you are circling groups, you are allowed to use the same 1 value
more than once
...
Figure 20
...
3, illustrates the simplification process
...
Finally, before going from the map to a simplified Boolean expression, any
group of 1s that is completely overlapped by other groups can be eliminated
...
9b
...
One additional feature of Karnaugh maps needs to be mentioned
...
These are referred to as "don't care"
conditions
...
In doing the grouping and simplification, each “d” can be
treated as 1 or 0, whichever leads to the simplest expression
...
We would like to develop the Boolean expressions for a circuit that adds 1 to a packed decimal digit. Recall that with packed decimal, each decimal digit is represented by a 4-bit code in the obvious way: 0 = 0000, 1 = 0001, ..., 8 = 1000, and 9 = 1001. This code is also referred to as Binary Coded Decimal (BCD). Table 20.4 shows the truth table for producing a 4-bit result that is one more than a 4-bit BCD input. The addition is modulo 10; thus, 9 + 1 = 0. The six input codes that do not represent valid decimal digits are "don't care" conditions. Figure 20 shows the resulting Karnaugh maps; the d squares are used to achieve the best possible groupings.
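The required behavior can be tabulated in a few lines of Python. This is a model of the truth table, not a gate-level design; `None` marks the don't care rows:

```python
# Model of the BCD-increment truth table: the output is (digit + 1) mod 10;
# the six non-BCD input codes are don't care conditions, marked None here.

def bcd_increment(bits):
    """bits: (b3, b2, b1, b0), most significant bit first."""
    value = bits[0] * 8 + bits[1] * 4 + bits[2] * 2 + bits[3]
    if value > 9:
        return None                       # invalid BCD: don't care
    result = (value + 1) % 10             # 9 + 1 wraps around to 0
    return ((result >> 3) & 1, (result >> 2) & 1, (result >> 1) & 1, result & 1)

for n in range(10):
    bits = ((n >> 3) & 1, (n >> 2) & 1, (n >> 1) & 1, n & 1)
    print(bits, "->", bcd_increment(bits))
```

Running the loop reproduces the ten valid rows of the truth table, including the wraparound row 1001 -> 0000.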
With five variables, two 4 x 4 maps are needed, with one map considered to be on top of the other in three dimensions to achieve adjacency. For larger numbers of variables, a tabular approach such as the Quine–McCluskey method is preferable. The method is suitable for programming on a computer to give an automatic tool for producing minimized Boolean expressions.
...
Consider the following expression, a sum of eight four-variable product terms:

ABCD + ABCD + ABCD + ABCD + ABCD + ABCD + ABCD + ABCD

Let us assume that this expression was derived from a truth table. The first step is to construct a table in which each row corresponds to one of the product terms of the expression, with the rows grouped by the number of complemented variables. That is, we start with the term with no complements, if it exists, then all terms with one complement, and so on. Table 20.5 shows the list for our example expression, with horizontal lines used to indicate the grouping. Thus, we group terms according to the number of 1s they contain.
The next step is to find all pairs of terms that differ in only one variable; that is, all pairs of terms that are the same except that one variable is 0 in one of the terms and 1 in the other. Because of the grouping, this can be done by comparing each term of the first group with every term of the second group, then each term of the second group with all of the terms of the third group, and so on. Whenever a match is found, the pair is combined by eliminating the variable in which the two terms differ, each of the two original terms is checked off, and the combined term is placed in a new list. This process continues until the entire original table has been examined. The second table is then processed in the same manner as the first. In this example, the third table that is produced contains only one term: BD. The process continues until no further combinations can be made; in this case, it has involved three tables.
Those terms that have not been eliminated are used to construct a matrix, as illustrated in Table 20. Each row of the matrix corresponds to one of the terms that have not been eliminated (has no check) in any of the tables used so far, and each column corresponds to one of the terms of the original expression. An X is placed at each intersection of a row and a column such that the row element is "compatible" with the column element. Next, circle each X that is alone in a column, and place a square around each X in any row that contains a circled X. If every column now has either a squared or a circled X, then we are done, and those row elements whose Xs have been marked constitute the minimal expression. If not, additional row elements must be selected; essentially, we keep adding row elements until all columns are covered.
The first phase of the operation is reasonably straightforward: the process eliminates unneeded variables in product terms. However, there may be redundant terms in this expression, just as we found redundant groupings in Karnaugh maps; the second, matrix phase removes these redundant terms.
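The first phase just described can be sketched in Python. This is only the combining phase; the matrix (prime implicant chart) phase is omitted, and terms are represented as strings over '0', '1', and '-' (the '-' marking an eliminated variable). The function names are illustrative:

```python
# Sketch of the combining phase of the tabular method: pairs of terms that
# differ in exactly one fixed position are merged; terms that are never
# combined (never checked) survive as prime implicants.

def combine_round(terms):
    """One pass over a table: returns (new_terms, uncombined_terms)."""
    checked, new_terms = set(), set()
    for i, a in enumerate(terms):
        for b in terms[i + 1:]:
            diff = [k for k in range(len(a)) if a[k] != b[k]]
            if len(diff) == 1 and a[diff[0]] != '-' and b[diff[0]] != '-':
                checked.update((a, b))
                new_terms.add(a[:diff[0]] + '-' + a[diff[0] + 1:])
    return sorted(new_terms), [t for t in terms if t not in checked]

def prime_implicants(minterms):
    """Repeat until a table with no matches is produced."""
    terms, primes = sorted(minterms), []
    while terms:
        terms, unchecked = combine_round(terms)
        primes.extend(unchecked)
    return primes

# The four minterms of BD (patterns x1x1) collapse to the single term '-1-1':
print(prime_implicants(['0101', '0111', '1101', '1111']))
```

Three tables are produced for this small input, just as in the worked example: the four minterms, the four two-square terms, and the single term '-1-1' (that is, BD).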
Another consideration in the implementation of Boolean functions concerns the types of gates used. It is sometimes desirable to implement a function using only a single gate type, such as NAND. Although this may not be the minimum-gate implementation, it has the advantage of regularity, which can simplify the manufacturing process. Consider again Equation (20.3): F = B(A' + C), where X' denotes NOT X. Because the complement of the complement of a value is just the original value,

F = B(A' + C) = A'B + BC = ((A'B + BC)')'

Applying DeMorgan's theorem,

F = ((A'B)' (BC)')'

which has three NAND forms, as illustrated in Figure 20.
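Assuming the reconstructed form F = B(A' + C), the all-NAND version can be checked against the original over all inputs; the helper names below are illustrative:

```python
# Check of the all-NAND form against F = B(A' + C): a NAND gate computes
# (xy)', and an inverter is a NAND with both inputs tied together.

def nand(x, y):
    return 0 if (x and y) else 1

def f_original(a, b, c):
    return int(bool(b) and (not a or bool(c)))

def f_nand_only(a, b, c):
    not_a = nand(a, a)        # A'
    t1 = nand(not_a, b)       # (A'B)'
    t2 = nand(b, c)           # (BC)'
    return nand(t1, t2)       # ((A'B)'(BC)')' = A'B + BC = B(A' + C)

for a in (0, 1):
    for b in (0, 1):
        for c in (0, 1):
            assert f_original(a, b, c) == f_nand_only(a, b, c)
print("NAND-only implementation matches F = B(A' + C)")
```

The three NAND forms of the derivation appear as `t1`, `t2`, and the final gate; the extra `nand(a, a)` supplies the inverted input.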
The multiplexer connects multiple inputs to a single output; at any time, one of the inputs is selected to be passed to the output. A general block diagram representation is shown in Figure 20. This represents a 4-to-1 multiplexer: there are four input lines, labeled D0, D1, D2, and D3, and one of these lines is selected to provide the output signal F. To select one of the four inputs, a 2-bit code is needed, carried on two select lines, S1 and S2. An example 4-to-1 multiplexer is defined by the truth table in Table 20. This is a simplified form of a truth table: instead of showing all possible combinations of input variables, it shows the output as a function of the select lines alone.
Figure 20 shows an implementation using AND, OR, and NOT gates. S1 and S2 are connected to the AND gates in such a way that, for any combination of S1 and S2, three of the AND gates will output 0, while the fourth passes the value of the selected data line. Thus, three of the inputs to the OR gate are always 0, and the output of the OR gate will equal the value of the selected input gate.
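A behavioral sketch of this gate arrangement in Python. The select encoding chosen here (S1 S2 = 0 0 selecting D0, and so on) is an assumption, since the truth table is not reproduced above:

```python
# Behavioral model of the 4-to-1 multiplexer: one AND gate per data line,
# each enabled by a unique combination of the select lines, feeding one OR gate.

def mux4(d0, d1, d2, d3, s1, s2):
    and0 = (not s1) and (not s2) and d0   # enabled when S1 S2 = 0 0
    and1 = (not s1) and s2 and d1         # enabled when S1 S2 = 0 1
    and2 = s1 and (not s2) and d2         # enabled when S1 S2 = 1 0
    and3 = s1 and s2 and d3               # enabled when S1 S2 = 1 1
    return int(and0 or and1 or and2 or and3)

print(mux4(0, 0, 1, 0, 1, 0))   # S1 S2 = 1 0 routes D2 (here 1) to the output
```

For any select setting, three of the four AND terms are 0, so the OR simply reproduces the selected data line, exactly as described above.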
Multiplexers are used in digital circuits to control signal and data routing. An example is the loading of the program counter (PC). The value to be loaded into the program counter may come from one of several different sources:
• A binary counter, if the PC is to be incremented for the next instruction
• The instruction register, if a branch instruction using a direct address has just been executed
• The output of the ALU, if the branch instruction specifies the address using a displacement mode
These various inputs could be connected to the input lines of a multiplexer, with the PC connected to the output line. Because the PC contains multiple bits, multiple multiplexers are used, one per bit. Figure 20.14 illustrates this for 16-bit addresses.
A decoder is a combinational circuit with a number of output lines, only one of which is asserted at any time, depending on the pattern of the input lines. In general, a decoder has n inputs and 2^n outputs. Figure 20.15 shows a decoder with three inputs and eight outputs. Decoders find many uses in digital computers; one example is address decoding.
We want a single unified address space, which can be broken down as follows:

Address      Chip
0000-00FF      0
0100-01FF      1
0200-02FF      2
0300-03FF      3

Each chip requires 8 address lines, and these are supplied by the lower-order 8 bits of the address. The remaining 2 high-order bits of the 10-bit address select among the four chips. For this purpose, a 2-to-4 decoder is used whose output enables one of the four chips, as shown in Figure 20.
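A software sketch of this decoding, assuming the 1K address space above (addresses 0x000 through 0x3FF); the function name is illustrative:

```python
# Software sketch of the address decoder: the 2 high-order bits of a
# 10-bit address select the chip; the 8 low-order bits address a byte
# within the selected 256-byte chip.

def decode_address(addr):
    assert 0 <= addr <= 0x3FF, "10-bit address"
    chip = (addr >> 8) & 0b11    # the inputs to the 2-to-4 decoder
    offset = addr & 0xFF         # the 8 address lines shared by all chips
    return chip, offset

print(decode_address(0x01FF))    # last byte of chip 1 -> (1, 255)
```

The shift-and-mask pair mirrors the hardware split: the decoder sees only the top two bits, while all four chips share the same low-order address lines.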
With an additional input line, a decoder can be used as a demultiplexer, which performs the inverse function of a multiplexer: it connects a single input to one of several outputs. This is shown in Figure 20. As before, n inputs are decoded to produce a single one of 2^n outputs. Thus, the n inputs act as an address to select a particular output line, and the value on the data input line (0 or 1) is routed to that output line. The configuration of Figure 20.17 can be viewed in another way: if the data input line is treated as an enable line, this allows for the control of the timing of the decoder, so that the decoded output appears only when the enable input is 1.
However, there is one sort of memory that is implemented with combinational circuits, namely read-only memory (ROM). Recall that a ROM is a memory unit that performs only the read operation. This implies that the binary information stored in a ROM is permanent and was created during the fabrication process. Thus, a given input to the ROM (the address lines) always produces the same output (the data lines). Because the outputs are a function only of the present inputs, the ROM is in fact a combinational circuit. As an example, consider Table 20. This can be viewed as a truth table with four inputs and four outputs. It can also be viewed as defining the contents of a 64-bit ROM consisting of 16 words of 4 bits each. Figure 20 shows a possible implementation. As with the PLA, a regular organization is used, and the interconnections are made to reflect the desired result.
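The view of a ROM as a pure function of its inputs can be sketched as a lookup. The stored contents below are made up for illustration; they are not the table from the text:

```python
# A ROM modeled as a combinational lookup: 4 address inputs select one of
# 16 stored 4-bit words.  The contents are hypothetical example data.

ROM = [format((15 - n) % 16, "04b") for n in range(16)]   # illustrative contents

def rom_read(a3, a2, a1, a0):
    address = a3 * 8 + a2 * 4 + a1 * 2 + a0   # the 4-to-16 decoding step
    return ROM[address]                        # fixed at "fabrication" time

print(rom_read(0, 0, 0, 1))   # address 1 -> '1110'
```

Because `ROM` never changes, the same address always yields the same word, which is exactly the combinational property described above.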
One essential area not yet addressed is that of arithmetic. Binary addition differs from Boolean algebra in that the result includes a carry term; the addition of two single bits is shown in Table 20. This truth table could easily be implemented in digital logic. However, we are not interested in performing addition on just a single pair of bits; rather, we wish to add two n-bit numbers. This can be done by putting together a set of adders so that the carry from one adder is provided as input to the next, as shown in Figure 20.14. Because each single-bit adder must now handle a carry input as well as two operand bits, the truth table must be revised to have three inputs. The revised truth table appears in Table 20.
The two outputs can be expressed (where X' denotes NOT X):

Sum = A'B'C + A'BC' + AB'C' + ABC
Carry = AB + AC + BC
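The Sum and Carry equations can be modeled directly in Python and chained into a multiple-bit adder; the function names are illustrative:

```python
# The Sum and Carry equations implemented directly, then chained into a
# ripple-carry adder in which each carry out feeds the next carry in.

def full_adder(a, b, c):
    s = int((not a and not b and c) or (not a and b and not c) or
            (a and not b and not c) or (a and b and c))       # Sum
    carry = int((a and b) or (a and c) or (b and c))          # Carry
    return s, carry

def ripple_add(x_bits, y_bits):
    """Add two equal-length bit lists, least significant bit first."""
    carry, result = 0, []
    for a, b in zip(x_bits, y_bits):
        s, carry = full_adder(a, b, carry)
        result.append(s)
    return result, carry

print(ripple_add([0, 1, 1, 0], [1, 1, 0, 0]))   # 6 + 3 = 9 -> ([1, 0, 0, 1], 0)
```

The loop makes the ripple structure explicit: stage i cannot produce its sum until stage i-1 has produced its carry.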
Thus we have the necessary logic to implement a multiple-bit adder, such as that shown in Figure 20. Note that because the output from each adder depends on the carry from the previous adder, there is an increasing delay from the least significant to the most significant bit. For larger adders, the accumulated delay can become unacceptably high. One way to speed up addition is to compute the carries in advance rather than letting them ripple through the stages; this can be achieved with an approach known as carry lookahead. We would like to come up with an expression that specifies the carry input to any stage of the adder without reference to previous carry values.
Starting with the least significant bits:

C0 = A0B0                            (20.4)
C1 = A1B1 + A1A0B0 + B1A0B0          (20.5)

Each carry term can be expressed in SOP form as a function only of the original inputs, with no dependence on the carries.
For long numbers, this approach becomes excessively complicated, because each carry expression is longer than the one before it. Accordingly, full carry lookahead is typically done only 4 to 8 bits at a time. Figure 20.21 shows how a 32-bit adder can be constructed out of four 8-bit adders.
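The lookahead carries can be sketched in Python using the conventional generate and propagate signals (these names are standard usage, not from the text). The loop below builds the same values that the expanded SOP expressions give; in hardware, each expression would be evaluated in parallel:

```python
# Carry lookahead sketch with generate (g = A AND B) and propagate
# (p = A OR B) signals.  Expanding the recurrence c_i = g_i + p_i*c_(i-1)
# yields each carry as a sum of products over the original inputs only,
# e.g. C0 = A0B0 and C1 = A1B1 + A1A0B0 + B1A0B0.

def lookahead_carries(a_bits, b_bits, carry_in=0):
    """Bit lists are least significant bit first; returns each stage's carry out."""
    carries, expr = [], carry_in
    for a, b in zip(a_bits, b_bits):
        g, p = a & b, a | b        # generate, propagate
        expr = g | (p & expr)      # carry out of this stage
        carries.append(expr)
    return carries
```

For example, adding the 2-bit values with bits A = [1, 1] and B = [0, 1] (LSB first) yields carries [0, 1]: the low stage generates nothing, while the high stage generates a carry on its own.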
4.0 CONCLUSION
The operators make use of signs in symbolic form, and logical functions are implemented by the interconnection of gates. This unit has covered gates, the implementation of Boolean functions, the reading of memory, and Quine–McCluskey tables.
5.0 SUMMARY
In summary, the Karnaugh map method of simplifying Boolean functions appears to become more cumbersome as the number of variables grows.
6.0 TUTOR-MARKED ASSIGNMENT
1. Write the Boolean expression for a four-input NAND gate.
2. Design an 8-to-1 multiplexer.
7.0 REFERENCES/FURTHER READING
Mano, M. Logic and Computer Design Fundamentals.
Title: CIT 309 COMPUTER ARCHITECTURE