Title: plsql
Description: My notes are about the PL/SQL programs that we write in the database management systems course.


Computer Science

Volume 1
Silberschatz−Korth−Sudarshan • Database System Concepts, Fourth Edition

Front Matter
Preface
...
Data Models
Introduction
...
Relational Model
...
SQL
...
Integrity and Security
...
Object-Based Databases and XML
Introduction
...
Object-Relational Databases
...
Data Storage and Querying
Introduction
...
Indexing and Hashing
...
Query Optimization
...
Transactions
...
Recovery System
...
Database System Architecture
...
Parallel Databases
...
Application Development and Administration
...
Advanced Data Types and New Applications
...
In this text, we present the fundamental concepts of database management
...

This text is intended for a ﬁrst course in databases at the junior or senior undergraduate, or ﬁrst-year graduate, level
...

We assume only a familiarity with basic data structures, computer organization,
and a high-level programming language such as Java, C, or Pascal
...
Important theoretical results are covered, but formal proofs are
omitted
...
In place of proofs, ﬁgures and examples are used to suggest why a result is
true
...
Our aim is
to present these concepts and algorithms in a general setting that is not tied to one
particular database system
...
In this fourth edition of Database System Concepts, we have retained the overall style
of the ﬁrst three editions, while addressing the evolution of database management
...
Every chapter has
been edited, and most have been modiﬁed extensively
...


Silberschatz−Korth−Sudarshan:
Database System
Concepts, Fourth Edition

xvi

Front Matter

Preface

© The McGraw−Hill
Companies, 2001

Preface

Organization
The text is organized in eight major parts, plus three appendices:
• Overview (Chapter 1)
...
We explain how the concept of a database
system has developed, what the common features of database systems are,
what a database system does for the user, and how a database system interfaces with operating systems
...
This example
is used as a running example throughout the book
...

• Data models (Chapters 2 and 3)
...
This model provides a high-level view of the issues in database design,
and of the problems that we encounter in capturing the semantics of realistic
applications within the constraints of a data model
...

• Relational databases (Chapters 4 through 7)
...
Chapter 5 covers
two other relational languages, QBE and Datalog
...
Algorithms
and design issues are deferred to later chapters
...

Chapter 6 presents constraints from the standpoint of database integrity
and security; Chapter 7 shows how constraints can be used in the design of
a relational database
...
The theme of this chapter is the protection of the database
from accidental and intentional damage
...
The theory
of functional dependencies and normalization is covered, with emphasis on
the motivation and intuitive understanding of each normal form
...

• Object-based databases and XML (Chapters 8 through 10)
...
It introduces the concepts of object-oriented programming, and shows how these concepts form the basis for a data model
...
Chapter 9 covers object-relational databases, and shows how the SQL:1999 standard extends
the relational data model to include object-oriented features, such as inheritance, complex types, and object identity
...
The chapter also describes query languages for XML
...
Chapter 11 deals with
disk, ﬁle, and ﬁle-system structure, and with the mapping of relational and
object data to a ﬁle system
...
Chapters 13 and 14 address query-evaluation algorithms, and query optimization
based on equivalence-preserving query transformations
...

• Transaction management (Chapters 15 through 17)
...

Chapter 16 focuses on concurrency control and presents several techniques
for ensuring serializability, including locking, timestamping, and optimistic
(validation) techniques
...
Chapter 17
covers the primary techniques for ensuring correct transaction execution despite system crashes and disk failures
...

• Database system architecture (Chapters 18 through 20)
...
We discuss centralized systems,
client-server systems, parallel and distributed architectures, and network
types in this chapter
...
The chapter also covers issues of system availability during failures and describes the
LDAP directory system
...
The chapter also describes
parallel-system design
...
Chapter 21 covers database application development and administration
...
Chapter 22 covers querying techniques, including decision support systems, and information retrieval
...
The chapter also describes information retrieval techniques for
querying textual data, including hyperlink-based techniques used in Web
search engines
...
Finally, Chapter 24 deals with
advanced transaction processing
...

• Case studies (Chapters 25 through 27)
...
These chapters outline unique features of each of these
products, and describe their internal structure
...
They also cover several interesting practical aspects in the design of
real systems
...
Although most new database applications use either the
relational model or the object-oriented model, the network and hierarchical
data models are still in use
...
bell-labs
...

Appendix C describes advanced relational database design, including the
theory of multivalued dependencies, join dependencies, and the project-join
and domain-key normal forms
...
This appendix, too, is available
only online, on the Web page of the book
...

Our basic procedure was to rewrite the material in each chapter, bringing the older
material up to date, adding discussions on recent developments in database technology, and improving descriptions of topics that students found difﬁcult to understand
...
We have also added a tools section at the end of most chapters, which provides information on software tools related to the topic of the chapter
...

We have added a new chapter covering XML, and three case study chapters covering the leading commercial database systems, including Oracle, IBM DB2, and Microsoft SQL Server
...
For the beneﬁt of those readers familiar with the third edition,
we explain the main changes here:
• Entity-relationship model
...
More examples have been added, and some changed,
to give better intuition to the reader
...

• Relational databases
...

SQL coverage has been significantly expanded to include the with clause, more material on embedded SQL, and coverage of ODBC and JDBC, whose
usage has increased greatly in the past few years
...
Coverage of QBE
has been revised to remove some ambiguities and to add coverage of the QBE
version used in the Microsoft Access database
...
Coverage of security has been moved to Chapter 6 from its third-edition position of Chapter 19
...
Chapter 7 covers relational-database
design and normal forms
...
Chapter
7 has been signiﬁcantly rewritten, providing several short-cut algorithms for
dealing with functional dependencies and extended coverage of the overall
database design process
...

• Object-based databases
...
Object-relational coverage in
Chapter 9 has been updated, and in particular the SQL:1999 standard replaces
the extended SQL used in the third edition
...
Chapter 10, covering XML, is a new chapter in the fourth edition
...
Coverage of storage and ﬁle structures, in Chapter 11, has been updated; this chapter was Chapter 10 in the
third edition
...
Coverage of RAID has been updated to reﬂect technology trends
...

Chapter 12, on indexing, now includes coverage of bitmap indices; this
chapter was Chapter 11 in the third edition
...
Partitioned hashing has been dropped, since it is not in signiﬁcant use
...
All
details regarding cost estimation and query optimization have been moved
to Chapter 14, allowing Chapter 13 to concentrate on query processing algorithms
...
Chapter 14
now has pseudocode for optimization algorithms, and new sections on optimization of nested subqueries and on materialized views
...
Chapter 15, which provides an introduction to transactions, has been updated; this chapter was numbered Chapter 13 in the third
edition
...

Chapter 16, on concurrency control, includes a new section on implementation of lock managers, and a section on weak levels of consistency, which
was in Chapter 20 of the third edition
...
Chapter 17, on recovery, now includes coverage of the ARIES
recovery algorithm
...

As in the third edition, instructors can choose between just introducing
transaction-processing concepts (by covering only Chapter 15), or offering detailed coverage (based on Chapters 15 through 17)
...
Chapter 18, which provides an overview of
database system architectures, has been updated to cover current technology;
this was Chapter 16 in the third edition
...
While the coverage of parallel database query processing techniques in Chapter 20
(which was Chapter 16 in the third edition) is mainly of interest to those who
wish to learn about database internals, distributed databases, now covered in
Chapter 19, is a topic that is more fundamental; it is one that anyone dealing
with databases should be familiar with
...
Coverage of three-phase commit protocol has been abbreviated, as has distributed detection of global deadlocks, since neither is
used much in practice
...
There is
a new section on directory systems, in particular LDAP, since these are quite
widely used as a mechanism for making information available in a distributed
setting
...
Although we have modiﬁed and updated the entire text, we
concentrated our presentation of material pertaining to ongoing database research and new database applications in four new chapters, from Chapter 21
to Chapter 24
...
The description of how to build Web interfaces to
databases, including servlets and other mechanisms for server-side scripting,
is new
...
Coverage of materialized view selection is also new
...
There is a new section on e-commerce, focusing on database issues in e-commerce, and a new
section on dealing with legacy systems
...
Coverage of data warehousing and data mining has also been extended greatly
...
Earlier versions of this material were in Chapter 21 of the third edition
...
This material is an updated version of material that was in Chapter 21
of the third edition
...

• Case studies
...
These chapters outline unique features
of each of these products, and describe their internal structure
...
We have marked several sections as advanced, using the symbol
“∗∗”
...

It is possible to design courses by using various subsets of the chapters
...

• If object orientation is to be covered in a separate advanced course, Chapters
8 and 9, and Section 11
...
Alternatively, they could constitute
the foundation of an advanced course in object databases
...

• Both our coverage of transaction processing (Chapters 15 through 17) and our
coverage of database-system architecture (Chapters 18 through 20) consist of
an overview chapter (Chapters 15 and 18, respectively), followed by chapters with details
...

• Chapters 21 through 24 are suitable for an advanced course or for self-study
by students, although Section 21
...

Model course syllabi, based on the text, can be found on the Web home page of the
book (see the following section)
...
bell-labs
...
For more information about how to get a copy of the solution manual, please send electronic mail to
customer
...
com
...

The McGraw-Hill Web page for this book is
http://www
...
com/silberschatz

Contacting Us and Other Users
We provide a mailing list through which users of our book can communicate among
themselves and with us
...
bell-labs
...

We have endeavored to eliminate typos, bugs, and the like from the text
...
We would appreciate it if you would notify us of any
errors or omissions in the book that are not on the current list of errata
...
We also
welcome any contributions to the book Web page that could be of use to other readers, such as programming exercises, project suggestions, online labs and tutorials,
and teaching tips
...
bell-labs
...
Any other correspondence should be sent to Avi Silberschatz, Bell Laboratories, Room 2T-310, 600
Mountain Avenue, Murray Hill, NJ 07974, USA
...
In addition, many people have
written or spoken to us about the book, and have offered suggestions and comments
...
Gurari, The Ohio State
University; Irwin Levinstein, Old Dominion University; Ling Liu, Georgia Institute of Technology; Ami Motro, George Mason University; Bhagirath Narahari, Meral Ozsoyoglu, Case Western Reserve University; and Odinaldo Rodriguez, King’s College London; who served as reviewers of the book and
whose comments helped us greatly in formulating this fourth edition
...
L
...

• Phil Bohannon, for writing the ﬁrst draft of Chapter 10 describing XML
...
Blakeley, Kalen Delaney, Michael Rys, Michael
Zwilling, Sameet Agarwal, Thomas Casey (all of Microsoft) for writing the
appendices describing the Oracle, IBM DB2, and Microsoft SQL Server database
systems
...

• Marilyn Turnamian and Nandprasad Joshi, whose excellent secretarial assistance was essential for timely completion of this fourth edition
...
The senior developmental editor was Kelley
Butcher
...
The executive marketing manager was
John Wannemacher
...
The freelance copyeditor was George Watson
...
The supplement producer was Jodi Banowetz
...
The freelance indexer was Tobiah Waldron
...
B
...
Edwards,
Christos Faloutsos, Homma Farian, Alan Fekete, Shashi Gadia, Jim Gray, Le Gruenwald, Ron Hitchens, Yannis Ioannidis, Hyoung-Joo Kim, Won Kim, Henry Korth (father of Henry F
...
V
...
Seshadri, Shashi Shekhar, Amit Sheth, Nandit Soparkar, Greg Speegle, and Marianne Winslett
...
Greg Speegle, Dawn Bezviner, and K
...
Raghavan helped
us to prepare the instructor’s manual for earlier editions
...
The idea of using ships as part of the cover
concept was originally suggested to us by Bruce Stephan
...
Hank
would like to acknowledge his wife, Joan, and his children, Abby and Joe, for their
love and understanding
...

A
...

H
...
K
...
S
...
Chapter 1

Introduction

A database-management system (DBMS) is a collection of interrelated data and a
set of programs to access those data
...
The primary goal of a DBMS
is to provide a way to store and retrieve database information that is both convenient
and efﬁcient
...
Management of data involves both deﬁning structures for storage of information and providing mechanisms for the manipulation of information
...
If data are to be shared among several users, the
system must avoid possible anomalous results
...
These
concepts and techniques form the focus of this book
...

1
...
Here are some representative applications:
• Banking: For customer information, accounts, and loans, and banking transactions
...
Airlines were among the
ﬁrst to use databases in a geographically distributed manner — terminals situated around the world accessed the central database system through phone
lines and other data networks
...


• Telecommunication: For keeping records of calls made, generating monthly bills,
maintaining balances on prepaid calling cards, and storing information about
the communication networks
...

• Sales: For customer, product, and purchase information
...

• Human resources: For information about employees, salaries, payroll taxes and
beneﬁts, and for generation of paychecks
...

Over the course of the last four decades of the twentieth century, use of databases
grew in all enterprises
...
Then automated teller machines
came along and let users interact directly with databases
...

The internet revolution of the late 1990s sharply increased direct user access to
databases
...
For
instance, when you access an online bookstore and browse a book or music collection, you are accessing data stored in a database
...
When you access a bank Web site and retrieve
your bank balance and transaction information, the information is retrieved from the
bank’s database system
...

Furthermore, data about your Web accesses may be stored in a database
...

The importance of database systems can be judged in another way — today, database system vendors like Oracle are among the largest software companies in the
world, and database systems form an important part of the product line of more
diversiﬁed companies like Microsoft and IBM
...
1
...
2 Database Systems versus File Systems
Consider part of a savings-bank enterprise that keeps information about all customers and savings accounts
...
To allow users to manipulate the information, the
system has a number of application programs that manipulate the ﬁles, including
• A program to debit or credit an account
• A program to add a new account
• A program to ﬁnd the balance of an account
• A program to generate monthly statements
System programmers wrote these application programs to meet the needs of the
bank
...
For example, suppose that the savings bank decides to offer checking accounts
...
Thus,
as time goes by, the system acquires more ﬁles and more application programs
...
The system stores permanent records in various ﬁles, and it needs different
application programs to extract records from, and add records to, the appropriate
ﬁles
...

Keeping organizational information in a ﬁle-processing system has a number of
major disadvantages:
• Data redundancy and inconsistency
...
Moreover, the same information may be duplicated in
several places (ﬁles)
...
This redundancy leads
to higher storage and access cost
...
For
example, a changed customer address may be reﬂected in savings-account
records but not elsewhere in the system
...
Suppose that one of the bank ofﬁcers needs to
ﬁnd out the names of all customers who live within a particular postal-code
area
...

Because the designers of the original system did not anticipate this request,
there is no application program on hand to meet it
...
The bank ofﬁcer has

14

Silberschatz−Korth−Sudarshan:
Database System
Concepts, Fourth Edition

4

Chapter 1

1
...
Both alternatives are obviously unsatisfactory
...
As expected, a program to generate such a list does
not exist
...

The point here is that conventional ﬁle-processing environments do not allow needed data to be retrieved in a convenient and efﬁcient manner
...

• Data isolation
...

• Integrity problems
...
For example, the balance of a bank account may never fall below a prescribed amount (say, $25)
...
However, when new constraints are added, it is difﬁcult
to change the programs to enforce them
...

• Atomicity problems
...
In many applications, it is crucial that, if a
failure occurs, the data be restored to the consistent state that existed prior to
the failure
...

If a system failure occurs during the execution of the program, it is possible
that the $50 was removed from account A but was not credited to account B,
resulting in an inconsistent database state
...

That is, the funds transfer must be atomic — it must happen in its entirety or
not at all
...

• Concurrent-access anomalies
...
In such an environment, interaction of concurrent updates may result in inconsistent data
...
If two customers withdraw funds (say $50 and $100 respectively) from
account A at about the same time, the result of the concurrent executions may
leave the account in an incorrect (or inconsistent) state
...
If the
two programs run concurrently, they may both read the value $500, and write
back $450 and $400, respectively
...
To guard against this possibility, the system must maintain some form
of supervision
...
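The interleaving described in this item can be replayed step by step. The sketch below is plain sequential Python, not a real concurrent program, and the function names are illustrative. It uses the figures from the text: a starting balance of $500 and withdrawals of $50 and $100.

```python
def withdraw_interleaved(balance, first, second):
    """Replay the schedule where both programs read before either writes."""
    read_1 = balance           # program 1 reads the balance
    read_2 = balance           # program 2 reads the same, still unchanged, value
    balance = read_1 - first   # program 1 writes back its result
    balance = read_2 - second  # program 2 overwrites it; program 1's update is lost
    return balance


def withdraw_serial(balance, first, second):
    """Run the two withdrawals one after the other."""
    balance = balance - first
    balance = balance - second
    return balance


lost_update = withdraw_interleaved(500, 50, 100)
correct = withdraw_serial(500, 50, 100)
```

Here the interleaved run leaves $400 because program 2 writes last; had program 1 written last, it would leave $450. Neither matches the $350 that serial execution produces.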

• Security problems
...
For example, in a banking system, payroll personnel need
to see only that part of the database that has information about the various
bank employees
...
But, since application programs are added to the system in an ad hoc
manner, enforcing such security constraints is difﬁcult
...

In what follows, we shall see the concepts and algorithms that enable database systems to solve the problems with ﬁle-processing systems
...

1
...
A major purpose of a database system is to
provide users with an abstract view of the data
...

1
...
1 Data Abstraction
For the system to be usable, it must retrieve data efﬁciently
...

Since many database-systems users are not computer trained, developers hide the
complexity from users through several levels of abstraction, to simplify users’ interactions with the system:
• Physical level
...
The physical level describes complex low-level data structures in
detail
...
The next-higher level of abstraction describes what data are
stored in the database, and what relationships exist among those data
...
Although implementation of the simple structures at the logical level may involve complex physical-level structures, the
user of the logical level does not need to be aware of this complexity
...

The highest level of abstraction describes only part of the entire
database
...
Many
users of the database system do not need all this information; instead, they
need to access only a part of the database
...
The system may provide many
views for the same database
...
1 shows the relationship among the three levels of abstraction
...
Most high-level programming languages
support the notion of a record type
...
Each ﬁeld has
a name and a type associated with it
...
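As a rough illustration of a record type, here is a sketch using a Python dataclass. The field names are borrowed from the customer attributes that appear later in the text (with underscores in place of hyphens), and the choice of string types is an assumption.

```python
from dataclasses import dataclass, fields


@dataclass
class Customer:
    # Field names follow the customer attributes used later in the text;
    # the str types are an assumption made for this sketch.
    customer_id: str
    customer_name: str
    customer_street: str
    customer_city: str


# Each field carries a name and a declared type.
field_types = {f.name: f.type for f in fields(Customer)}
```

Nothing in the declaration says how a Customer is laid out in memory or on disk, which is the kind of detail the physical level deals with.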
[Figure: the three levels of data abstraction. From top to bottom: view level (view 1, view 2, …, view n), logical level, physical level.]

The language
compiler hides this level of detail from programmers
...
Database
administrators, on the other hand, may be aware of certain details of the physical
organization of the data
...
Programmers using a programming language work at this level of abstraction
...

Finally, at the view level, computer users see a set of application programs that
hide details of the data types
...
In addition to hiding details of the
logical level of the database, the views also provide a security mechanism to prevent
users from accessing certain parts of the database
...
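A small sketch of a view that exposes only part of a table, using Python's sqlite3 module. The view name account_list and the sample row are hypothetical. SQLite has no per-user permissions, so this shows only the restricted projection; in a system with access control, a user could be granted access to the view and not to the underlying table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (account_number TEXT, balance INTEGER)")
conn.execute("INSERT INTO account VALUES ('A-101', 500)")

# Hypothetical view: it exposes account numbers only; balances are not part of it.
conn.execute("CREATE VIEW account_list AS SELECT account_number FROM account")

cursor = conn.execute("SELECT * FROM account_list")
view_columns = [column[0] for column in cursor.description]
view_rows = cursor.fetchall()
```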

1
...
2 Instances and Schemas
Databases change over time as information is inserted and deleted
...
The overall design of the database is called the database schema
...

The concept of database schemas and instances can be understood by analogy to
a program written in a programming language
...
Each
variable has a particular value at a given instant
...
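The distinction can be sketched with Python's sqlite3 module, assuming a simplified account table. The table declaration plays the role of the schema and stays fixed, while the rows returned at a given moment are the instance at that moment.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Schema: declared once and changed rarely (simplified account table, an assumption).
conn.execute("CREATE TABLE account (account_number TEXT, balance INTEGER)")


def instance(conn):
    """Return the current contents of the account table."""
    query = "SELECT account_number, balance FROM account ORDER BY account_number"
    return conn.execute(query).fetchall()


before = instance(conn)  # instance right after the schema is created
conn.execute("INSERT INTO account VALUES ('A-101', 500)")
after = instance(conn)   # same schema, different instance
```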

Database systems have several schemas, partitioned according to the levels of abstraction
...
A database
may also have several schemas at the view level, sometimes called subschemas, that
describe different views of the database
...
The physical schema is hidden beneath the logical schema, and can usually
be changed easily without affecting application programs
...

We study languages for describing schemas, after introducing the notion of data
models in the next section
...
4 Data Models
Underlying the structure of a database is the data model: a collection of conceptual
tools for describing data, data relationships, data semantics, and consistency constraints
...
section: the entity-relationship model and the relational model
...

1
...
1 The Entity-Relationship Model
The entity-relationship (E-R) data model is based on a perception of a real world that
consists of a collection of basic objects, called entities, and of relationships among these
objects
...
For example, each person is an entity, and bank accounts can be
considered as entities
...
For example, the attributes account-number and balance may describe one particular account in a bank,
and they form attributes of the account entity set
...

An extra attribute customer-id is used to uniquely identify customers (since it may
be possible to have two customers with the same name, street address, and city)
...
In the United States,
many enterprises use the social-security number of a person (a unique number the
U
...
government assigns to every person in the United States) as a customer
identiﬁer
...
For example, a depositor
relationship associates a customer with each account that she has
...

The overall logical structure (schema) of a database can be expressed graphically
by an E-R diagram, which is built up from the following components:
• Rectangles, which represent entity sets
• Ellipses, which represent attributes
• Diamonds, which represent relationships among entity sets
• Lines, which link attributes to entity sets and entity sets to relationships
Each component is labeled with the entity or relationship that it represents
...
Figure 1
...
The E-R diagram indicates that there are two entity sets,
customer and account, with attributes as outlined earlier
...

In addition to entities and relationships, the E-R model represents certain constraints to which the contents of a database must conform
...
For example, if each account must belong
to only one customer, the E-R model can express that constraint
...
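One way to picture entity sets, a relationship set, and a constraint is the sketch below, in plain Python. The sample customers and accounts are the ones used in the relational example later in the text; the dictionaries, the set of pairs, and the function name are illustrative choices, not part of the E-R model itself.

```python
from collections import defaultdict

# Entity sets, keyed by their identifying attribute.
customers = {"192-83-7465": "Johnson", "019-28-3746": "Smith"}
accounts = {"A-101": 500, "A-201": 900}

# The depositor relationship set: (customer-id, account-number) pairs.
depositor = {
    ("192-83-7465", "A-101"),
    ("192-83-7465", "A-201"),
    ("019-28-3746", "A-201"),
}


def accounts_with_several_owners(relationship):
    """Return account numbers that break 'each account belongs to only one customer'."""
    owners = defaultdict(set)
    for customer_id, account_number in relationship:
        owners[account_number].add(customer_id)
    return sorted(a for a, ids in owners.items() if len(ids) > 1)


violations = accounts_with_several_owners(depositor)
```

With this sample data the shared account A-201 is reported, so the data would be rejected under that constraint and accepted without it.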

[Figure: E-R diagram. Labels recovered from the original: entity set customer; attributes customer-id, customer-name, customer-street, customer-city, account-number, balance.]

1
...
2 Relational Model
The relational model uses a collection of tables to represent both data and the relationships among those data
...
Figure 1
...

The ﬁrst table, the customer table, shows, for example, that the customer identiﬁed
by customer-id 192-83-7465 is named Johnson and lives at 12 Alma St
...

The second table, account, shows, for example, that account A-101 has a balance of
$500, and A-201 has a balance of $900
...
For example,
account number A-101 belongs to the customer whose customer-id is 192-83-7465,
namely Johnson, and customers 192-83-7465 (Johnson) and 019-28-3746 (Smith) share
account number A-201 (they may share a business venture)
...
Record-based models are so named because the database is structured in ﬁxed-format records of several
types
...
Each record type deﬁnes a
ﬁxed number of ﬁelds, or attributes
...

It is not hard to see how tables may be stored in ﬁles
...
The relational model hides such low-level implementation details
from database developers and users
...
Chapters 3 through 7
cover the relational model in detail
...
Database
designs are often carried out in the E-R model, and then translated to the relational
model; Chapter 2 describes the translation process
...

We also note that it is possible to create schemas in the relational model that have
problems such as unnecessarily duplicated information
...
customer-id    customer-name
192-83-7465    Johnson
019-28-3746    Smith
677-89-9011    Hayes
182-73-6091    Turner
321-12-3123    Jones
336-66-9999    Lindsay
019-28-3746    Smith

customer-street (partial): 12 Alma St ..., 3 Main St ..., 100 Main St ..., 72 North St ...

Figure: A sample relational database
...
Then, to represent the fact
that accounts A-101 and A-201 both belong to customer Johnson (with customer-id
192-83-7465), we would need to store two rows in the customer table
...
In Chapter 7, we shall study how to distinguish
good schema designs from bad schema designs
...
4
...
The object-oriented model can be seen as extending the E-R model with notions
of encapsulation, methods (functions), and object identity
...

The object-relational data model combines features of the object-oriented data
model and relational data model
...

Semistructured data models permit the speciﬁcation of data where individual data
items of the same type may have different sets of attributes
...
The extensible markup language (XML) is widely
used to represent semistructured data
...

Historically, two other data models, the network data model and the hierarchical
data model, preceded the relational data model
...
As a
result they are little used now, except in old database code that is still in service in
some places
...

1
...
In
practice, the data deﬁnition and data manipulation languages are not two separate
languages; instead they simply form parts of a single database language, such as the
widely used SQL language
...
5
...

For instance, the following statement in the SQL language deﬁnes the account table:
create table account
(account-number char(10),
balance integer)
Execution of the above DDL statement creates the account table
...

A data dictionary contains metadata — that is, data about data
...
A database system consults the data dictionary before
reading or modifying actual data
...
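A runnable variant of the DDL statement above, sketched with Python's sqlite3 module. Underscores replace the hyphens in the column names, since SQL would read a hyphen as a minus sign. The sqlite_master table and PRAGMA table_info are SQLite's own way of exposing metadata; other systems organize their data dictionary differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Same table as in the text, with underscores instead of hyphens.
conn.execute("CREATE TABLE account (account_number CHAR(10), balance INTEGER)")

# SQLite records what was created in its built-in sqlite_master table.
dictionary = conn.execute("SELECT type, name FROM sqlite_master").fetchall()

# PRAGMA table_info lists one row per column; index 1 is the column name.
columns = [row[1] for row in conn.execute("PRAGMA table_info(account)")]
```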
These statements deﬁne the implementation details of the database schemas,
which are usually hidden from the users
...

For example, suppose the balance on an account should not fall below $100
...
The database systems check these constraints every time the database is updated
...
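The $100 rule can be sketched as a CHECK constraint, here with Python's sqlite3 module. The starting balance of $500 and the attempted withdrawal of $450 are made-up values for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " account_number CHAR(10),"
    " balance INTEGER CHECK (balance >= 100))"
)
conn.execute("INSERT INTO account VALUES ('A-101', 500)")  # made-up sample row

try:
    conn.execute(
        "UPDATE account SET balance = balance - 450 WHERE account_number = 'A-101'"
    )
    rejected = False
except sqlite3.IntegrityError:
    # The update would leave a balance of 50, so the system refuses it.
    rejected = True

balance = conn.execute("SELECT balance FROM account").fetchone()[0]
```

Because the rule is declared once with the table, every program that updates the table is subject to it, without each program having to repeat the check.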
1
...
2 Data-Manipulation Language
Data manipulation is
• The retrieval of information stored in the database
• The insertion of new information into the database
• The deletion of information from the database
• The modiﬁcation of information stored in the database
A data-manipulation language (DML) is a language that enables users to access
or manipulate data as organized by the appropriate data model
...

• Declarative DMLs (also referred to as nonprocedural DMLs) require a user to
specify what data are needed without specifying how to get those data
...

However, since a user does not have to specify how to get the data, the database
system has to ﬁgure out an efﬁcient means of accessing data
...

A query is a statement requesting the retrieval of information
...
Although technically incorrect, it is common practice to use the terms query language and data-manipulation language synonymously
...
customer-name
from customer
where customer
...
If the query were run on the table in Figure 1
...

Queries may involve information from more than one table
...

select account
...
customer-id = 192-83-7465 and
depositor
...
account-number


If the above query were run on the tables in Figure 1
...
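The two-table query above is partly elided in this extract, so the following sketch assumes its usual form: join depositor and account on the account number and select the balance for one customer. The sample rows are made up for the example and are not the contents of the book's figure; column names use underscores because hyphens are not legal in unquoted SQL identifiers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE account (account_number TEXT, balance INTEGER);
    CREATE TABLE depositor (customer_id TEXT, account_number TEXT);
    -- Made-up sample rows, not the contents of the book's figure.
    INSERT INTO account VALUES ('A-101', 500), ('A-201', 900), ('A-305', 350);
    INSERT INTO depositor VALUES
        ('192-83-7465', 'A-101'),
        ('192-83-7465', 'A-201'),
        ('019-28-3746', 'A-305');
    """
)

# Declarative query: it states which data are wanted, not how to find them.
rows = conn.execute(
    """
    SELECT account.balance
    FROM depositor, account
    WHERE depositor.customer_id = '192-83-7465'
      AND depositor.account_number = account.account_number
    ORDER BY account.balance
    """
).fetchall()
balances = [balance for (balance,) in rows]
```

With these sample rows the customer owns two accounts, so the query returns two balances.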

There are a number of database query languages in use, either commercially or
experimentally
...

We also study some other query languages in Chapter 5
...
3 apply not only to deﬁning
or structuring data, but also to manipulating data
...
At higher levels of abstraction,
we emphasize ease of use
...
The query processor component of the database system (which we study in
Chapters 13 and 14) translates DML queries into sequences of actions at the physical
level of the database system
...
5
...
Application programs are usually written in a host language, such as Cobol, C, C++, or
Java
...

To access the database, DML statements need to be executed from the host language
...

The Open Database Connectivity (ODBC) standard deﬁned by Microsoft
for use with the C language is a commonly used application program interface standard
...

• By extending the host language syntax to embed DML calls within the host
language program
...
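A small sketch of the first approach, an application program interface that sends DML statements to the database and returns the results, is given below. It uses Python as the host language and the standard sqlite3 module in place of ODBC; the function name balance_of and the account numbers are hypothetical.

```python
import sqlite3


def balance_of(conn, account_number):
    """Send a DML statement to the database and hand one value back to the host program."""
    row = conn.execute(
        "SELECT balance FROM account WHERE account_number = ?",
        (account_number,),  # the host-language value is bound to the ? placeholder
    ).fetchone()
    return None if row is None else row[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (account_number TEXT, balance INTEGER)")
conn.execute("INSERT INTO account VALUES (?, ?)", ("A-101", 500))

known = balance_of(conn, "A-101")
missing = balance_of(conn, "A-999")
```

The host program never parses the stored data itself; it passes a statement and parameters through the interface and receives rows back.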

1
...
People who work with a database can be categorized as
database users or database administrators
...
6
...
Different types of user interfaces have been
designed for the different types of users
...

• Naive users are unsophisticated users who interact with the system by invoking one of the application programs that have been written previously
...
This program asks the teller for the amount
of money to be transferred, the account from which the money is to be transferred, and the account to which the money is to be transferred
...
Such a user may access a form, where she
enters her account number
...

The typical user interface for naive users is a forms interface, where the
user can ﬁll in appropriate ﬁelds of the form
...

• Application programmers are computer professionals who write application
programs
...
Rapid application development (RAD) tools are tools that enable an application programmer to construct forms and reports without writing a program
...
These languages, sometimes called fourth-generation languages, often
include special features to facilitate the generation of forms and the display of
data on the screen
...

• Sophisticated users interact with the system without writing programs
...
They submit
each such query to a query processor, whose function is to break down DML
statements into instructions that the storage manager understands
...

Online analytical processing (OLAP) tools simplify analysts’ tasks by letting them view summaries of data in different ways
...
The tools also permit the analyst to select speciﬁc regions, look at data in more detail (for example, sales by city within a region)
or look at the data in less detail (for example, aggregate products together by
category)
...

We study OLAP tools and data mining in Chapter 22
...

Among these applications are computer-aided design systems, knowledge-base and expert systems, systems that store data with complex data types (for
example, graphics data and audio data), and environment-modeling systems
...

1
...
2 Database Administrator
One of the main reasons for using DBMSs is to have central control of both the data
and the programs that access those data
...
The functions of a DBA include:
• Schema deﬁnition
...

• Storage structure and access-method deﬁnition
...
The DBA carries out changes to the schema and physical organization to reﬂect the changing needs of the
organization, or to alter the physical organization to improve performance
...
By granting different types of
authorization, the database administrator can regulate which parts of the database various users can access
...

• Routine maintenance
...

Ensuring that enough free disk space is available for normal operations,
and upgrading disk space as required
...

1
...
An example is a funds transfer, as in Section 1
...
Clearly, it is essential that either both the credit
and debit occur, or that neither occur
...
This all-or-none requirement is called atomicity
...
That is, the value of the sum A + B must be preserved
...
Finally, after the successful execution of a funds
transfer, the new values of accounts A and B must persist, despite the possibility of
system failure
...

A transaction is a collection of operations that performs a single logical function
in a database application
...

tency
...
That is, if the database was consistent when a transaction started, the
database must be consistent when the transaction successfully terminates
...

This temporary inconsistency, although necessary, may lead to difﬁculty if a failure
occurs
...
For example, the transaction to
transfer funds from account A to account B could be deﬁned to be composed of two
separate programs: one that debits account A, and another that credits account B
...

However, each program by itself does not transform the database from a consistent
state to a new consistent state
...

Ensuring the atomicity and durability properties is the responsibility of the database system itself — speciﬁcally, of the transaction-management component
...
However, because of various types of failure, a transaction may not always
complete its execution successfully
...
Thus, the database must
be restored to the state in which it was before the transaction in question started executing
...
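The all-or-none behavior of the funds transfer can be sketched as follows. This is an illustration under stated assumptions, not the transaction-management component the text describes: it relies on SQLite's own transactions through Python's sqlite3 module, the transfer function and the rule that a balance may not go negative are hypothetical, and the credit is deliberately applied before the debit so that a failure leaves a partial change to undo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 300), ("B", 200)])
conn.commit()


def transfer(conn, source, target, amount):
    """Credit target and debit source as one all-or-none unit."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, target),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, source),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    """Return the current balances as a {name: balance} dictionary."""
    return dict(conn.execute("SELECT name, balance FROM account"))


ok = transfer(conn, "A", "B", 100)       # both updates succeed
failed = transfer(conn, "A", "B", 1000)  # debit fails, so the credit is undone
after = balances(conn)
```

After the failed transfer the database is back in the state it had before that transfer started, and the sum A + B is unchanged.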

Finally, when several transactions update the database concurrently, the consistency of data may no longer be preserved, even though each individual transaction is correct
...

Database systems designed for use on small personal computers may not have
all these features
...
Others do not offer backup and recovery, leaving that to the
user
...
Although such a low-cost, low-feature
approach is adequate for small personal databases, it is inadequate for a medium- to
large-scale enterprise
...
8 Database System Structure
A database system is partitioned into modules that deal with each of the responsibilities of the overall system
...

The storage manager is important because databases typically require a large
amount of storage space
...
A gigabyte is 1000 megabytes (1 billion bytes), and a terabyte is 1 million megabytes (1 trillion bytes)
...
Data are moved between disk storage and main memory as needed
...

The query processor is important because it helps the database system simplify
and facilitate access to data
...
However, quick processing of updates and queries
is important
...

1
...
1 Storage Manager
A storage manager is a program module that provides the interface between the low-level data stored in the database and the application programs and queries submitted to the system
...
The raw data are stored on the disk using the ﬁle system, which is usually provided by a conventional operating system
...
Thus, the storage
manager is responsible for storing, retrieving, and updating data in the database
...

• Transaction manager, which ensures that the database remains in a consistent
(correct) state despite system failures, and that concurrent transaction executions proceed without conﬂicting
...

• Buffer manager, which is responsible for fetching data from disk storage into
main memory, and deciding what data to cache in main memory
...

The storage manager implements several data structures as part of the physical
system implementation:
• Data ﬁles, which store the database itself
...

• Indices, which provide fast access to data items that hold particular values
...

1
...
2 The Query Processor
The query processor components include
• DDL interpreter, which interprets DDL statements and records the deﬁnitions
in the data dictionary
...

A query can usually be translated into any of a number of alternative evaluation plans that all give the same result
...

• Query evaluation engine, which executes low-level instructions generated by
the DML compiler
...
4 shows these components and the connections among them
...
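The remark that a query can be translated into alternative evaluation plans can be observed in a small experiment. The sketch below assumes SQLite, whose EXPLAIN QUERY PLAN statement reports the plan chosen for a query; the table, the index name account_number_idx, and the helper plan_for are made up for the example, and the exact wording of the plan text differs between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (account_number TEXT, balance INTEGER)")

query = "SELECT balance FROM account WHERE account_number = 'A-101'"


def plan_for(conn, sql):
    """Return SQLite's textual description of the plan it chose for sql."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The last column of each row holds the human-readable plan step.
    return " ".join(str(row[-1]) for row in rows)


# Without an index, the only available plan is to scan the table.
plan_without_index = plan_for(conn, query)

# Adding an index gives the system a second, usually cheaper, plan.
conn.execute("CREATE INDEX account_number_idx ON account (account_number)")
plan_with_index = plan_for(conn, query)
```

Both plans return the same result for the query; only the way the data are reached changes.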
9 Application Architectures
Most users of a database system today are not present at the site of the database
system, but connect to it through a network
...

Database applications are usually partitioned into two or three parts, as in Figure 1
...
In a two-tier architecture, the application is partitioned into a component
that resides at the client machine, which invokes database system functionality at the
server machine through query language statements
...

In contrast, in a three-tier architecture, the client machine acts as merely a front
end and does not contain any direct database calls
...
The application
server in turn communicates with a database system to access data
...
Three-tier applications are more appropriate for large applications, and for
applications that run on the World Wide Web
...
10 History of Database Systems
Data processing drives the growth of computers, as it has from the earliest days of
commercial computers
...
Punched cards, invented by Hollerith, were used at the very beginning of the
twentieth century to record U
...
census data, and mechanical systems were used to process the cards and tabulate results

Figure 1 ... (diagram; labels recovered from the figure)
Users: naive users (tellers, agents, web-users) use application interfaces; application programmers write application programs; sophisticated users (analysts) use query tools; the database administrator uses administration tools
Query processor: compiler and linker, application program object code, DML queries, DDL interpreter, DML compiler and organizer, query evaluation engine
Storage manager: buffer manager, file manager, authorization and integrity manager, transaction manager
Disk storage: data, indices, data dictionary, statistical data
...

Techniques for data storage and processing have evolved over the years:
• 1950s and early 1960s: Magnetic tapes were developed for data storage
...

Processing of data consisted of reading data from one or more tapes and

Figure 1
...
Two-tier and three-tier architectures (panels: two-tier architecture, three-tier architecture)
...
Data could also be input from punched card decks,
and output to printers
...
The records had to
be in the same sorted order
...

Tapes (and card decks) could be read only sequentially, and data sizes were
much larger than main memory; thus, data processing programs were forced
to process data in a particular order, by reading and merging data from tapes
and card decks
...
The position of data on disk was immaterial, since any location on disk
could be accessed in just tens of milliseconds
...
With disks, network and hierarchical databases could
be created that allowed data structures such as lists and trees to be stored on
disk
...

A landmark paper by Codd [1970] deﬁned the relational model, and nonprocedural ways of querying data in the relational model, and relational
databases were born
...
Codd later won the prestigious Association for Computing
Machinery Turing Award for his work
...

1
...
That changed with System R, a groundbreaking project
at IBM Research that developed techniques for the construction of an efﬁcient
relational database system
...
[1976] and Chamberlin et al
...
The fully functional System R prototype led to IBM’s ﬁrst relational database product, SQL/DS
...
By the early 1980s, relational databases had become
competitive with network and hierarchical database systems even in the area
of performance
...
Most importantly, they had to keep
efﬁciency in mind when designing their programs, which involved a lot of
effort
...
Since attaining dominance in the 1980s, the relational
model has reigned supreme among data models
...

• Early 1990s: The SQL language was designed primarily for decision support
applications, which are query intensive, yet the mainstay of databases in the
1980s was transaction processing applications, which are update intensive
...
Tools for analyzing large amounts of data saw large growths in
usage
...
Database vendors also began to add object-relational support to their
databases
...

Databases were deployed much more extensively than ever before
...
Database
systems also had to support Web interfaces to data
...
11 Summary
• A database-management system (DBMS) consists of a collection of interrelated data and a collection of programs to access that data
...


1
...

• Database systems are ubiquitous today, and most people interact, either directly or indirectly, with databases many times every day
...
The management of data involves both the deﬁnition of structures for the storage of
information and the provision of mechanisms for the manipulation of information
...
If data are to be shared among several users, the system must avoid
possible anomalous results
...
That is, the system hides certain details of how the data are
stored and maintained
...
The entity-relationship (E-R) data model is a widely used data
model, and it provides a convenient graphical representation to view data, relationships and constraints
...
Other data models are the object-oriented model, the object-relational model, and semistructured data models
...
A database
schema is specified by a set of definitions that are expressed using a data-definition language (DDL)
...
Nonprocedural DMLs, which require a user to specify
only what data are needed, without specifying exactly how to get those data,
are widely used today
...

• A database system has several subsystems
...

The transaction manager also ensures that concurrent transaction executions proceed without conﬂicting
...

The storage manager subsystem provides the interface between the low-level data stored in the database and the application programs and queries
submitted to the system
...

• Database applications are typically broken up into a front-end part that runs at
client machines and a part that runs at the back end
...

In three-tier architectures, the back end part is itself broken up into an application server and a database server
...
1 List four signiﬁcant differences between a ﬁle-processing system and a DBMS
...
2 This chapter has described several major advantages of a database system
...
3 Explain the difference between physical and logical data independence
...
4 List ﬁve responsibilities of a database management system
...

1
...
6 List seven programming languages that are procedural and two that are nonprocedural
...

1
...


1
...
8 Consider a two-dimensional integer array of size n × m that is to be used in
your favorite programming language
...

Bibliographical Notes
We list below general purpose books, research paper collections, and Web sites on
databases
...

Textbooks covering database systems include Abiteboul et al
...
Textbook coverage of transaction processing is provided
by Bernstein and Newcomer [1997] and Gray and Reuter [1993]
...

Among these are Bancilhon and Buneman [1990], Date [1986], Date [1990], Kim [1995],
Zaniolo et al
...

A review of accomplishments in database management and an assessment of future research challenges appears in Silberschatz et al
...
[1996]
and Bernstein et al
...
The home page of the ACM Special Interest Group on
Management of Data (see www
...
org/sigmod) provides a wealth of information
about database research
...

Codd [1970] is the landmark paper that introduced the relational model
...

Tools
There are a large number of commercial database systems in use today
...
ibm
...
oracle
...
microsoft
...
informix
...
sybase
...
Some of these systems are available free for personal or
noncommercial use, or for development, but are not free for actual deployment
...
mysql
...
postgressql
...

A more complete list of links to vendor Web sites and other information is available from the home page of this book, at www
...
bell-labs
...

PART I
...
In this part, we study two data
models: the entity-relationship model and the relational model
...
It is based on a
perception of a real world that consists of a collection of basic objects, called entities,
and of relationships among these objects
...
It uses a collection of tables to represent both data and the relationships among those data
...
Designers often formulate database schema design by ﬁrst
modeling data at a high level, using the E-R model, and then translating it into the
relational model
...
The object-oriented data model,
for example, extends the representation of entities by adding notions of encapsulation, methods (functions), and object identity
...
Chapters 8 and 9, respectively, cover these two data models
...
CHAPTER 2
...
It was developed
to facilitate database design by allowing speciﬁcation of an enterprise schema, which
represents the overall logical structure of a database
...
The E-R model is very useful in mapping the meanings
and interactions of real-world enterprises onto a conceptual schema
...

2
...

2
...
1 Entity Sets
An entity is a “thing” or “object” in the real world that is distinguishable from all
other objects
...
An entity has a
set of properties, and the values for some set of properties may uniquely identify an
entity
...
Thus, the value 677-89-9011 for person-id would uniquely identify one particular person in the enterprise
...

An entity may be concrete, such as a person or a book, or it may be abstract, such as
a loan, or a holiday, or a concept
...
The set of all persons who are customers at a given bank, for example, can
be deﬁned as the entity set customer
...
2
...
The individual entities that constitute a
set are said to be the extension of the entity set
...

Entity sets do not need to be disjoint
...
A person entity may be an employee entity, a customer entity, both, or neither
...
Attributes are descriptive properties possessed by each member of an entity set
...

Possible attributes of the customer entity set are customer-id, customer-name, customer-street, and customer-city
...
Possible attributes of the loan entity set are loan-number
and amount
...
For instance, a particular customer
entity may have the value 321-12-3123 for customer-id, the value Jones for customer-name, the value Main for customer-street, and the value Harrison for customer-city
...
In the United States,
many enterprises ﬁnd it convenient to use the social-security number of a person1
as an attribute whose value uniquely identiﬁes the person
...

For each attribute, there is a set of permitted values, called the domain, or value
set, of that attribute
...
Similarly, the domain of attribute loan-number might
be the set of all strings of the form “L-n” where n is a positive integer
...
Figure 2
...

Formally, an attribute of an entity set is a function that maps from the entity set into
a domain
...
For
example, a particular customer entity may be described by the set {(customer-id, 677-89-9011), (customer-name, Hayes), (customer-street, Main), (customer-city, Harrison)},
meaning that the entity describes a person named Hayes whose customer identiﬁer
is 677-89-9011 and who resides at Main Street in Harrison
...
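The formal view above, an attribute as a function from the entity set into a domain and an entity as a set of (attribute, value) pairs, can be written down directly. In this minimal sketch the helper named attribute is hypothetical; the values are those of the Hayes entity in the text.

```python
# An entity described as a set of (attribute, value) pairs, as in the text.
hayes = {
    ("customer-id", "677-89-9011"),
    ("customer-name", "Hayes"),
    ("customer-street", "Main"),
    ("customer-city", "Harrison"),
}


def attribute(name):
    """Return the attribute as a function from an entity to a value in its domain."""
    def lookup(entity):
        return dict(entity)[name]
    return lookup


customer_name = attribute("customer-name")
customer_city = attribute("customer-city")
print(customer_name(hayes), customer_city(hayes))
```

Each attribute maps the entity to exactly one value, which is why the pairs can be read as a dictionary.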
The
attribute values describing an entity will constitute a signiﬁcant portion of the data
stored in the database
...

1
...
Each person is supposed to have only one social-security number, and no two people are supposed to have the same social-security number
...
customer-id    customer-name   customer-street   customer-city
321-12-3123    Jones           Main              Harrison
019-28-3746    Smith           North             Rye
677-89-9011    Hayes           Main              Harrison
555-55-5555    Jackson         Dupont            Woodside
244-66-8800    Curry           North             Rye
963-96-3963    Williams        Nassau            Princeton
335-57-7991    Adams           Spring            Pittsfield
customer

loan-number    amount
L-17           1000
L-23           2000
L-15           1500
L-14           1500
L-19           500
L-11           900
L-16           1300
loan
Figure 2
...

• Simple and composite attributes
...
Composite attributes,
on the other hand, can be divided into subparts (that is, other attributes)
...
Using composite attributes in
a design schema is a good choice if a user will wish to refer to an entire attribute on some occasions, and to only a component of the attribute on other
occasions
...
2 Composite attributes help us to group
together related attributes, making the modeling cleaner
...
In the composite attribute address, its component attribute street can be further divided
into street-number, street-name, and apartment-number
...
2 depicts these
examples of composite attributes for the customer entity set
...
The attributes in our examples all
have a single value for a particular entity
...
Such attributes
are said to be single valued
...
Consider an employee entity set with the
attribute phone-number
...

This type of attribute is said to be multivalued
...
We assume the address format used in the United States, which includes a numeric postal code called
a zip code
...

2
...
2

Composite attributes customer-name and customer-address
...

Where appropriate, upper and lower bounds may be placed on the number
of values in a multivalued attribute
...
Placing bounds
in this case expresses that the phone-number attribute of the customer entity set
may have between zero and two values
...
The value for this type of attribute can be derived from
the values of other related attributes or entities
...
We can derive the value for this attribute
by counting the number of loan entities associated with that customer
...
If the customer entity set also has an
attribute date-of-birth, we can calculate age from date-of-birth and the current
date
...
In this case, date-of-birth may be referred
to as a base attribute, or a stored attribute
...
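Both derived attributes mentioned above can be computed on demand rather than stored. The following sketch is only an illustration: the function names age and loans_held follow the attribute names in the text, while the representation of borrower as a set of (customer-id, loan-number) pairs and the second sample pair are assumptions.

```python
from datetime import date


def age(date_of_birth, today):
    """Derive age in whole years from the stored attribute date_of_birth."""
    had_birthday = (today.month, today.day) >= (date_of_birth.month, date_of_birth.day)
    return today.year - date_of_birth.year - (0 if had_birthday else 1)


def loans_held(customer_id, borrower):
    """Derive loans-held by counting the loans associated with the customer."""
    return sum(1 for cid, _loan in borrower if cid == customer_id)


# Hypothetical relationship instances: (customer-id, loan-number) pairs.
borrower = {("677-89-9011", "L-15"), ("321-12-3123", "L-17")}

age_before_birthday = age(date(1980, 5, 23), date(2001, 5, 22))
age_on_birthday = age(date(1980, 5, 23), date(2001, 5, 23))
hayes_loans = loans_held("677-89-9011", borrower)
```

Because the value is recomputed whenever it is needed, it stays consistent with the base attributes it depends on.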

An attribute takes a null value when an entity does not have a value for it
...
For example, one may have no middle name
...
An unknown value may be either missing (the value does
exist, but we do not have that information) or not known (we do not know whether or
not the value actually exists)
...
A null value for the
apartment-number attribute could mean that the address does not include an apartment number (not applicable), that an apartment number exists but we do not know
what it is (missing), or that we do not know whether or not an apartment number is
part of the customer’s address (unknown)
...

For example, in addition to keeping track of customers and loans, the bank also


2
...
Also, if the bank has a number of different branches, then
we may keep information about all the branches of the bank
...

2
...
2 Relationship Sets
A relationship is an association among several entities
...
This relationship speciﬁes that Hayes is a customer with loan number L-15
...
Formally, it is a mathematical relation on n ≥ 2 (possibly nondistinct) entity sets
If E1, E2, ..., En are
entity sets, then a relationship set R is a subset of
{(e1, e2, ..., en) | e1 ∈ E1, e2 ∈ E2, ..., en ∈ En}
where (e1 , e2 ,
...
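The definition of a relationship set as a subset of the tuples that can be formed from the participating entity sets can be checked mechanically. In this sketch entities are represented by short strings; the pair (Hayes, L-15) comes from the text, while the other two pairs and the helper is_relationship_set are made up for the example.

```python
from itertools import product

customer = {"Hayes", "Jones", "Smith"}
loan = {"L-15", "L-17", "L-23"}

# Every possible (customer, loan) pairing: the set the definition starts from.
all_pairs = set(product(customer, loan))

# A relationship set picks out the pairs that actually hold.
borrower = {("Hayes", "L-15"), ("Jones", "L-17"), ("Smith", "L-23")}


def is_relationship_set(r, *entity_sets):
    """True if every tuple in r takes its i-th component from the i-th entity set."""
    return all(
        len(t) == len(entity_sets) and all(e in s for e, s in zip(t, entity_sets))
        for t in r
    )


valid = is_relationship_set(borrower, customer, loan)
invalid = is_relationship_set({("Hayes", "L-99")}, customer, loan)
```

The same check works for n > 2 entity sets, since the tuple length is compared with the number of sets passed in.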

Consider the two entity sets customer and loan in Figure 2
...
We deﬁne the relationship set borrower to denote the association between customers and the bank loans
that the customers have
...
3 depicts this association
...
We can deﬁne
the relationship set loan-branch to denote the association between a bank loan and the
branch in which that loan is maintained
...
3

loan
Relationship set borrower
...
2
...
, En participate in relationship set R
...
As an illustration, the individual customer entity
Hayes, who has customer identiﬁer 677-89-9011, and the loan entity L-15 participate
in a relationship instance of borrower
...

The function that an entity plays in a relationship is called that entity’s role
...
However, they are useful when the meaning of a relationship needs clariﬁcation
...
In this type of relationship set, sometimes called a recursive relationship set, explicit role names are necessary to specify how an entity
participates in a relationship instance
...
We may have a relationship set works-for that is modeled by ordered pairs of employee entities
...
In this way, all relationships of works-for are characterized by (worker, manager)
pairs; (manager, worker) pairs are excluded
...
Consider a
relationship set depositor with entity sets customer and account
...
The depositor relationship among the entities corresponding to customer Jones and account A-217 has the value “23 May 2001” for attribute access-date, which means that the most recent date that Jones accessed account
A-217 was 23 May 2001
...
We
may wish to store a descriptive attribute for-credit with the relationship, to record
whether a student has taken the course for credit, or is auditing (or sitting in on) the
course
...
To understand
this point, suppose we want to model all the dates when a customer accessed an
account
...
We
cannot represent multiple access dates by multiple relationship instances between the
same customer and account, since the relationship instances would not be uniquely
identiﬁable using only the participating entities
...

However, there can be more than one relationship set involving the same entity
sets
...
Additionally, suppose each loan must have another customer who serves
as a guarantor for the loan
...


2
...
Most of the relationship sets in
a database system are binary
...

As an example, consider the entity sets employee, branch, and job
...
Job entities may have the attributes title and level
...
A ternary relationship among Jones, Perryridge,
and manager indicates that Jones acts as a manager at the Perryridge branch
...
Yet another relationship could be between Smith, Downtown,
and teller, indicating Smith acts as a teller at the Downtown branch
...
A binary relationship set is of degree 2; a ternary relationship set
is of degree 3
...
2 Constraints
An E-R enterprise schema may deﬁne certain constraints to which the contents of a
database must conform
...

2
...
1 Mapping Cardinalities
Mapping cardinalities, or cardinality ratios, express the number of entities to which
another entity can be associated via a relationship set
...
In this section, we shall concentrate on only binary relationship
sets
...
An entity in A is associated with at most one entity in B, and an
entity in B is associated with at most one entity in A
...
4a
...
An entity in A is associated with any number (zero or more) of
entities in B
...
(See Figure 2
...
)
• Many to one
...
An
entity in B, however, can be associated with any number (zero or more) of
entities in A
...
5a
...
An entity in A is associated with any number (zero or more) of
entities in B, and an entity in B is associated with any number (zero or more)
of entities in A
...
5b
...
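The four cases above can be told apart by counting, for each entity, how many partners it has. The sketch below classifies one instance of a binary relationship set, given as a set of (a, b) pairs. A mapping cardinality is really a constraint on every legal instance, so the function (a hypothetical helper named mapping_cardinality) only reports the most restrictive case that the given instance satisfies.

```python
from collections import Counter


def mapping_cardinality(r):
    """Classify a binary relationship set r, given as a set of (a, b) pairs."""
    a_counts = Counter(a for a, _b in r)  # number of B entities each a relates to
    b_counts = Counter(b for _a, b in r)  # number of A entities each b relates to
    a_at_most_one = all(n <= 1 for n in a_counts.values())
    b_at_most_one = all(n <= 1 for n in b_counts.values())
    if a_at_most_one and b_at_most_one:
        return "one to one"
    if b_at_most_one:
        return "one to many"   # an a may have many b's; each b has at most one a
    if a_at_most_one:
        return "many to one"   # each a has at most one b; a b may have many a's
    return "many to many"


one_to_one = mapping_cardinality({("a1", "b1"), ("a2", "b2")})
one_to_many = mapping_cardinality({("a1", "b1"), ("a1", "b2")})
many_to_one = mapping_cardinality({("a1", "b1"), ("a2", "b1")})
many_to_many = mapping_cardinality({("a1", "b1"), ("a1", "b2"), ("a2", "b1")})
```

Entities that take part in no relationship do not appear in r at all, which is consistent with the "zero or more" wording in the definitions.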
2
...
4 Mapping cardinalities
...
(b) One to many
...

As an illustration, consider the borrower relationship set
...
If a loan can belong to several
customers (as can loans taken jointly by several business partners), the relationship
set is many to many
...
3 depicts this type of relationship
...
2
...
If only some entities in E
participate in relationships in R, the participation of entity set E in relationship R is
said to be partial
...
Therefore the participation of loan in

Figure 2
...
(a) Many to one
...


2
...
In contrast, an individual can be a bank customer
whether or not she has a loan with the bank
...
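Total and partial participation can be checked on an instance by comparing an entity set with the entities that actually appear in the relationship set. The sketch below uses made-up sample data and a hypothetical helper named participation; as with mapping cardinalities, the real constraint applies to every legal instance, not just the one examined.

```python
def participation(entity_set, r, position):
    """Return 'total' if every entity appears in r at the given tuple position, else 'partial'."""
    participating = {t[position] for t in r}
    return "total" if entity_set <= participating else "partial"


# Hypothetical instance: Curry is a customer without a loan.
customer = {"Hayes", "Jones", "Curry"}
loan = {"L-15", "L-17"}
borrower = {("Hayes", "L-15"), ("Jones", "L-17")}

loan_participation = participation(loan, borrower, 1)
customer_participation = participation(customer, borrower, 0)
```

Here every loan has a borrower, while one customer has no loan, matching the contrast drawn in the text.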

2
...
Conceptually, individual entities are distinct; from a database perspective,
however, the difference among them must be expressed in terms of their attributes
...
In other words, no two entities in an entity set are allowed
to have exactly the same value for all attributes
...
Keys also help uniquely identify relationships, and thus distinguish
relationships from each other
...
3
...
For example, the customer-id attribute of the
entity set customer is sufﬁcient to distinguish one customer entity from another
...
Similarly, the combination of customer-name and customer-id
is a superkey for the entity set customer
...

The concept of a superkey is not sufﬁcient for our purposes, since, as we saw, a
superkey may contain extraneous attributes
...
We are often interested in superkeys for which no proper subset is a superkey
...

It is possible that several distinct sets of attributes could serve as a candidate key
...
Then, both {customer-id} and
{customer-name, customer-street} are candidate keys
...
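The difference between a superkey and a candidate key can be made concrete by testing attribute sets against a sample of entities. The sketch below is an illustration only: keys are constraints on all legal instances, so a check on one instance can refute a key but never prove it. The first three rows come from the customer data shown earlier; the fourth row is made up so that neither customer-name nor customer-street is unique on its own.

```python
from itertools import combinations

customers = [
    {"customer-id": "321-12-3123", "customer-name": "Jones", "customer-street": "Main"},
    {"customer-id": "019-28-3746", "customer-name": "Smith", "customer-street": "North"},
    {"customer-id": "677-89-9011", "customer-name": "Hayes", "customer-street": "Main"},
    # Hypothetical extra customer, added for the example.
    {"customer-id": "000-00-0001", "customer-name": "Smith", "customer-street": "Main"},
]


def is_superkey(attrs, entities):
    """True if no two entities agree on every attribute in attrs."""
    seen = set()
    for e in entities:
        key = tuple(e[a] for a in attrs)
        if key in seen:
            return False
        seen.add(key)
    return True


def is_candidate_key(attrs, entities):
    """A superkey with no proper subset that is also a superkey."""
    if not is_superkey(attrs, entities):
        return False
    return not any(
        is_superkey(subset, entities)
        for size in range(len(attrs))
        for subset in combinations(attrs, size)
    )


id_is_candidate = is_candidate_key(("customer-id",), customers)
name_id_is_superkey = is_superkey(("customer-name", "customer-id"), customers)
name_id_is_candidate = is_candidate_key(("customer-name", "customer-id"), customers)
name_street_is_candidate = is_candidate_key(("customer-name", "customer-street"), customers)
```

The pair {customer-name, customer-id} is a superkey but not a candidate key, because dropping customer-name still leaves a superkey.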

We shall use the term primary key to denote a candidate key that is chosen by
the database designer as the principal means of identifying entities within an entity
set
...
Any two individual entities in the set are prohibited from
having the same value on the key attributes at the same time
...

Candidate keys must be chosen with care
...

In the United States, the social-security number attribute of a person would be a candidate key
...
S
...
An alternative
is to use some unique combination of other attributes as a key
...
For instance, the address ﬁeld of a person should not be part of the primary
key, since it is likely to change
...
Unique identiﬁers generated by enterprises generally do not
change, except if two enterprises merge; in such a case the same identiﬁer may have
been issued by both enterprises, and a reallocation of identiﬁers may be required to
make sure they are unique
...
3
...
We need a similar mechanism to distinguish among the various relationships
of a relationship set
...
, En
...
Assume
for now that the attribute names of all primary keys are unique, and each entity set
participates only once in the relationship
...

If the relationship set R has no attributes associated with it, then the set of attributes
primary-key(E1 ) ∪ primary-key(E2 ) ∪ · · · ∪ primary-key(En )
describes an individual relationship in set R
...
, am }
describes an individual relationship in set R
...

In case the attribute names of primary keys are not unique across entity sets, the
attributes are renamed to distinguish them; the name of the entity set combined with
the name of the attribute would form a unique name
...
1
...


2
...
As an illustration, consider the entity sets
customer and account, and the relationship set depositor, with attribute access-date, in
Section 2
...
2
...
Then the primary
key of depositor consists of the union of the primary keys of customer and account
...
Similarly, if the relationship is many to one from
account to customer — that is, each account is owned by at most one customer — then
the primary key of depositor is simply the primary key of account
...
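How the mapping cardinality changes the primary key of depositor can be sketched with two table declarations. Representing a relationship set as an SQL table is an assumption made for this illustration, as are the table names, the customer identifiers C1 and C2, and the helper can_insert; the account number A-217 is the one used in the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Many to many: the key is the union of both primary keys.
    CREATE TABLE depositor_many_to_many (
        customer_id TEXT,
        account_number TEXT,
        access_date TEXT,
        PRIMARY KEY (customer_id, account_number)
    );
    -- Many to one from account to customer: each account has at most
    -- one owner, so account_number alone identifies a relationship.
    CREATE TABLE depositor_many_to_one (
        customer_id TEXT,
        account_number TEXT PRIMARY KEY,
        access_date TEXT
    );
    """
)


def can_insert(table, customer_id, account_number):
    """Try to record a depositor relationship; False if the key forbids it."""
    try:
        with conn:
            # The table name comes from the fixed names above, never from user input.
            conn.execute(
                f"INSERT INTO {table} VALUES (?, ?, ?)",
                (customer_id, account_number, "2001-05-23"),
            )
        return True
    except sqlite3.IntegrityError:
        return False


shared_first = can_insert("depositor_many_to_many", "C1", "A-217")
shared_second = can_insert("depositor_many_to_many", "C2", "A-217")
single_first = can_insert("depositor_many_to_one", "C1", "A-217")
single_second = can_insert("depositor_many_to_one", "C2", "A-217")
```

Under the many-to-many key two customers may share account A-217, while under the many-to-one key the second owner is rejected.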

For nonbinary relationships, if no cardinality constraints are present then the superkey formed as described earlier in this section is the only candidate key, and it
is chosen as the primary key
...
Since we have not discussed how to specify cardinality constraints on nonbinary relations, we do not discuss this issue further in this
chapter
...
3
...
4 Design Issues
The notions of an entity set and a relationship set are not precise, and it is possible
to deﬁne a set of entities and the relationships among them in a number of different ways
...
Section 2
...
4 covers the design process in further detail
...
4
...

It can easily be argued that a telephone is an entity in its own right with attributes
telephone-number and location (the ofﬁce where the telephone is located)
...
Treating a telephone as an entity telephone permits employees to have several telephone numbers (including zero) associated with
them
...

The main difference then is that treating a telephone as an entity better models a
situation where one may want to keep extra information about a telephone, such as


© The McGraw−Hill
Companies, 2001


its location, or its type (mobile, video phone, or plain old telephone), or who shares
the telephone
...

In contrast, it would not be appropriate to treat the attribute employee-name as an
entity; it is difﬁcult to argue that employee-name is an entity in its own right (in contrast
to the telephone)
...

Two natural questions thus arise: What constitutes an attribute, and what constitutes an entity set? Unfortunately, there are no simple answers
...

A common mistake is to use the primary key of an entity set as an attribute of another entity set, instead of using a relationship
...
The relationship borrower is the correct way to represent the connection between loans and
customers, since it makes their connection explicit, rather than implicit via an attribute
...
This should
not be done, since the primary key attributes are already implicit in the relationship
...
4
...
In Section 2
...
1, we assumed that a bank loan is modeled as an entity
...
Each
loan is represented by a relationship between a customer and a branch
...
However, with this design, we cannot conveniently represent a situation in
which several customers hold a loan jointly
...
Then, we must replicate
the values for the descriptive attributes loan-number and amount in each such relationship
...

Two problems arise as a result of the replication: (1) the data are stored multiple
times, wasting storage space, and (2) updates potentially leave the data in an inconsistent state, where the values differ in two relationships for attributes that are supposed to have the same value
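
The two problems can be made concrete with a small sketch. The customer names, branch name, and amounts below are illustrative values only, and the dictionary layout is an assumption rather than a prescribed representation.

```python
# Design 1: loan modeled as a relationship set between customer and branch.
# The descriptive attributes loan-number and amount are copied into every
# relationship, once per customer who holds the loan jointly.
loan_relationships = [
    {"customer": "Jones", "branch": "Downtown", "loan-number": "L-17", "amount": 1000},
    {"customer": "Smith", "branch": "Downtown", "loan-number": "L-17", "amount": 1000},
]

# Problem (2): an update that reaches only one copy leaves two different
# amounts recorded for the same loan.
loan_relationships[0]["amount"] = 900
amounts = {r["amount"] for r in loan_relationships if r["loan-number"] == "L-17"}
inconsistent = len(amounts) > 1

# Design 2: loan modeled as an entity set; the amount is stored once and
# the relationship set borrower only links customers to loans.
loans = {"L-17": {"amount": 1000}}
borrower = [("Jones", "L-17"), ("Smith", "L-17")]
loans["L-17"]["amount"] = 900  # a single update, no second copy to miss
```

In the second design there is only one place where the amount of L-17 is stored, so the inconsistent state cannot arise.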
...

The problem of replication of the attributes loan-number and amount is absent in
the original design of Section 2
...
1, because there loan is an entity set
...

entities
...

2
...
3 Binary versus n-ary Relationship Sets
Relationships in databases are often binary
...
For
instance, one could create a ternary relationship parent, relating a child to his/her
mother and father
...
Using the two relationships mother and father allows us to record a child’s
mother, even if we are not aware of the father’s identity; a null value would be
required if the ternary relationship parent is used
...

In fact, it is always possible to replace a nonbinary (n-ary, for n > 2) relationship
set by a number of distinct binary relationship sets
...
We replace
the relationship set R by an entity set E, and create three relationship sets:
• RA , relating E and A
• RB , relating E and B
• RC , relating E and C
If the relationship set R had any attributes, these are assigned to entity set E; further,
a special identifying attribute is created for E (since it must be possible to distinguish
different entities in an entity set on the basis of their attribute values)
...
Then, in each of the three new relationship sets, we insert a relationship as follows:
• (ei , ai ) in RA
• (ei , bi ) in RB
• (ei , ci ) in RC
We can generalize this process in a straightforward manner to n-ary relationship
sets
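
The replacement procedure can be sketched as follows, assuming the ternary relationship set is given as a list of (a, b, c) triples. The generated identifiers `e1`, `e2`, and so on stand in for the special identifying attribute of E; their format, the function names, and the sample triples are assumptions of this sketch.

```python
def replace_ternary(R):
    """Replace a ternary relationship set R by E, RA, RB, and RC."""
    E, RA, RB, RC = [], [], [], []
    for i, (a, b, c) in enumerate(R, start=1):
        e = f"e{i}"        # value of the special identifying attribute
        E.append(e)
        RA.append((e, a))  # (ei, ai) in RA
        RB.append((e, b))  # (ei, bi) in RB
        RC.append((e, c))  # (ei, ci) in RC
    return E, RA, RB, RC


def rebuild(E, RA, RB, RC):
    """Recover the original triples from the binary relationship sets."""
    a, b, c = dict(RA), dict(RB), dict(RC)
    return [(a[e], b[e], c[e]) for e in E]


R = [("Jones", "Perryridge", "manager"), ("Jones", "Downtown", "auditor")]
E, RA, RB, RC = replace_ternary(R)
round_trip = rebuild(E, RA, RB, RC)
```

Because every binary relationship is tied to the new entity `ei`, the original triples can be rebuilt exactly, at the cost of the extra entity set and identifying attribute.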
...
However, this restriction is not always desirable
...
This attribute, along with the extra relationship
sets required, increases the complexity of the design and (as we shall see in
Section 2
...

• An n-ary relationship set shows more clearly that several entities participate in
a single relationship
...

2
...
For example, consider a constraint
that says that R is many-to-one from A, B to C; that is, each pair of entities
from A and B is associated with at most one C entity
...

Consider the relationship set works-on in Section 2
...
2, relating employee, branch,
and job
...
If we did so, we would be able to record
that Jones is a manager and an auditor and that Jones works at Perryridge and Downtown; however, we would not be able to record that Jones is a manager at Perryridge
and an auditor at Downtown, but is not an auditor at Perryridge or a manager at
Downtown
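
The loss of information can be checked with a short sketch. It assumes the split in question is into two binary relationship sets, one between employee and branch and one between employee and job; the variable names are hypothetical.

```python
# The ternary relationship set works-on for the Jones example.
works_on = {
    ("Jones", "Perryridge", "manager"),
    ("Jones", "Downtown", "auditor"),
}

# Split into two binary relationship sets (an assumed decomposition).
emp_branch = {(e, b) for e, b, j in works_on}
emp_job = {(e, j) for e, b, j in works_on}

# Recombining the binary sets pairs every branch of an employee with
# every job of that employee.
recombined = {
    (e, b, j) for e, b in emp_branch for e2, j in emp_job if e == e2
}
spurious = recombined - works_on
```

The two extra combinations in `spurious` are exactly the facts that the binary design cannot rule out: Jones as an auditor at Perryridge and as a manager at Downtown.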
...
However, doing so would not be very natural
...
4
...
Thus, attributes of one-to-one or one-to-many relationship sets can be associated with one of the participating entity sets, rather than with the relationship
set
...
In this case, the attribute access-date, which speciﬁes when the customer last
accessed that account, could be associated with the account entity set, as Figure 2
...
Since each account entity participates in a relationship with at most one instance of customer, making this attribute designation would have the same meaning
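[Figure: the account entity set (account-number, access-date), the customer entity set (customer-name), and the relationship set depositor]
account (account-number, access-date): A-101, 24 May 1996; A-215, 3 June 1996; A-102, 10 June 1996; A-305, 28 May 1996; A-201, 17 June 1996; A-222, 24 June 1996; A-217, 23 May 1996
customer (customer-name): Johnson, Smith, Hayes, Turner, Jones, Lindsay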

Figure 2
...


2
...
Attributes of a one-to-many relationship set can be repositioned to only the entity set on the “many” side of
the relationship
...

The design decision of where to place descriptive attributes in such cases, as a
relationship or entity attribute, should reflect the characteristics of the enterprise
being modeled
...

The choice of attribute placement is more clear-cut for many-to-many relationship
sets
...

If we are to express the date on which a speciﬁc customer last accessed a speciﬁc
account, access-date must be an attribute of the depositor relationship set, rather than
either one of the participating entities
...
When an attribute is determined by the combination of participating
entity sets, rather than by either entity separately, that attribute must be associated
with the many-to-many relationship set
...
7 depicts the placement of access-date as a relationship attribute; again, to keep the figure simple, only some of the
attributes of the two entity sets are shown
...
7

Access-date as attribute of the depositor relationship set
...

2
...
5 Entity-Relationship Diagram
As we saw brieﬂy in Section 1
...
E-R diagrams are simple and clear — qualities that may
well account in large part for the widespread use of the E-R model
...
6
...
8, which consists of two entity sets, customer and loan, related through a binary relationship set borrower
...
The attributes associated with loan are loan-number and amount
...
8, attributes of an entity set that are members of the primary key are underlined
...
To distinguish among these types, we draw either a directed line (→)
or an undirected line (— ) between the relationship set and the entity set in question
...

[Figure: E-R diagram with entity set customer and the attribute labels customer-id, customer-name, customer-street, customer-city, loan-number, and amount]

Figure 2
...


2
...

Returning to the E-R diagram of Figure 2
...
If the relationship set borrower were one-to-many, from customer to
loan, then the line from borrower to customer would be directed, with an arrow pointing to the customer entity set (Figure 2
...
Similarly, if the relationship set borrower
were many-to-one from customer to loan, then the line from borrower to loan would
have an arrow pointing to the loan entity set (Figure 2
...
Finally, if the relationship set borrower were one-to-one, then both lines from borrower would have arrows:

[Figure, three panels (a), (b), and (c): in each panel, the entity sets customer and loan are connected through the relationship set borrower; the attribute labels are customer-id, customer-name, customer-street, customer-city, loan-number, and amount]
Figure 2
...
(a) one to many
...
(c) one-to-one
...

2
...
10

account

E-R diagram with an attribute attached to a relationship set
...
9c)
...
For example, in Figure 2
...

Figure 2
...

Here, a composite attribute name, with component attributes ﬁrst-name, middle-initial,
and last-name replaces the simple attribute customer-name of customer
...
The attribute street is
itself a composite attribute whose component attributes are street-number, street-name,
and apartment-number
...
11 also illustrates a multivalued attribute phone-number, depicted by a
double ellipse, and a derived attribute age, depicted by a dashed ellipse
...
11

apartment-number

date-of-birth

zip-code

age

E-R diagram with composite, multivalued, and derived attributes
...

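[Figure: E-R diagram with entity set employee (attribute labels employee-id, employee-name, telephone-number) and relationship set works-for, with the role labels manager and worker]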
Figure 2
...

We indicate roles in E-R diagrams by labeling the lines that connect diamonds
to rectangles
...
12 shows the role indicators manager and worker between the
employee entity set and the works-for relationship set
...
Figure 2
...

We can specify some types of many-to-one relationships in the case of nonbinary
relationship sets
...
This constraint can be speciﬁed by an arrow pointing to job on the edge from works-on
...
Suppose there is a relationship set R between entity sets A1 , A2 ,
...
, An
...
A particular combination of entities from A1 , A2 ,
...
, An
...
, Ai
...
13

branch-name
works-on

branch

E-R diagram with a ternary relationship
...

2
...
14

borrower

loan-number
amount
loan

Total participation of an entity set in a relationship set
...
For each entity set Ak , i < k ≤ n, each combination of the entities from the
other entity sets can be associated with at most one entity from Ak
...
, Ak−1 , Ak+1 ,
...

Each of these interpretations has been used in different books and systems
...
In Chapter 7 (Section 7
...

Double lines are used in an E-R diagram to indicate that the participation of an
entity set in a relationship set is total; that is, each entity in the entity set occurs in at
least one relationship in that relationship set
...
A double line from loan to borrower, as in
Figure 2
...

E-R diagrams also provide a way to indicate more complex constraints on the number of times each entity participates in relationships in a relationship set
...
h, where l is the minimum and h
the maximum cardinality
...
A maximum value of 1 indicates that the entity participates in at most one relationship, while a maximum value ∗ indicates no limit
...
∗ on an edge is equivalent to a double line
...
15
...
1, meaning the minimum and the maximum cardinality are
both 1
...
The limit 0
...
Thus, the relationship borrower is one to many from customer to loan, and
further the participation of loan in borrower is total
...
∗ on the edge between customer and borrower, and
think that the relationship borrower is many to one from customer to loan — this is
exactly the reverse of the correct interpretation
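
A cardinality limit can be read as a bound on how many relationships each entity takes part in, and a small check makes the reading concrete. The sketch below assumes relationships are stored as tuples and that `None` stands for the ∗ upper limit; the function name and the sample data are hypothetical.

```python
from collections import Counter


def satisfies_limits(entities, relationships, position, low, high=None):
    """Check that each entity occurs in between low and high relationships.

    position is the index of the entity set within each relationship tuple;
    high=None stands for the unbounded limit written as *.
    """
    counts = Counter(rel[position] for rel in relationships)
    for entity in entities:
        n = counts[entity]  # 0 for entities that never participate
        if n < low or (high is not None and n > high):
            return False
    return True


customers = ["Jones", "Smith", "Hayes"]
loans = ["L-17", "L-23"]
borrower = [("Jones", "L-17"), ("Jones", "L-23")]

# Limit 0..* on the customer edge: a customer may have any number of loans.
customer_ok = satisfies_limits(customers, borrower, 0, low=0, high=None)
# Limit 1..1 on the loan edge: each loan has exactly one customer.
loan_ok = satisfies_limits(loans, borrower, 1, low=1, high=1)

# Adding a second customer to L-17 violates the 1..1 limit on loan.
loan_bad = satisfies_limits(
    loans, borrower + [("Smith", "L-17")], 1, low=1, high=1
)
```

The limits sit on the edge of the entity set they constrain, which is why 0..∗ on the customer side together with 1..1 on the loan side gives a one-to-many relationship from customer to loan.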
...
If we had speciﬁed a cardinality limit of 1
...


2
...
15

0
...
1

loan

Cardinality limits on relationship sets
...
6 Weak Entity Sets
An entity set may not have sufﬁcient attributes to form a primary key
...
An entity set that has a primary key is termed a strong
entity set
...
Payment numbers are typically
sequential numbers, starting from 1, generated separately for each loan
...
Thus, this entity set does not have a primary key; it is a weak
entity set
...
Every weak entity must be associated
with an identifying entity; that is, the weak entity set is said to be existence dependent on the identifying entity set
...
The relationship associating the weak entity set with the
identifying entity set is called the identifying relationship
...

In our example, the identifying entity set for payment is loan, and a relationship
loan-payment that associates payment entities with their corresponding loan entities is
the identifying relationship
...
The discriminator of a weak entity set is a set of attributes that allows this distinction to be made
...
The discriminator
of a weak entity set is also called the partial key of the entity set
...
In the case of the entity
set payment, its primary key is {loan-number, payment-number}, where loan-number is
the primary key of the identifying entity set, namely loan, and payment-number distinguishes payment entities within the same loan
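
As a rough illustration, the composite key of payment can be declared in SQL. The sketch below uses Python's built-in sqlite3 module with an in-memory database; the column names with underscores, the column types, and all sample rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Strong entity set: loan has its own primary key.
conn.execute("CREATE TABLE loan (loan_number TEXT PRIMARY KEY, amount INTEGER)")

# Weak entity set: the primary key combines the key of the identifying
# entity set (loan_number) with the discriminator (payment_number).
conn.execute(
    """CREATE TABLE payment (
           loan_number    TEXT    NOT NULL REFERENCES loan(loan_number),
           payment_number INTEGER NOT NULL,
           payment_date   TEXT,
           payment_amount INTEGER,
           PRIMARY KEY (loan_number, payment_number)
       )"""
)

conn.execute("INSERT INTO loan VALUES ('L-17', 1000)")
conn.execute("INSERT INTO loan VALUES ('L-23', 2000)")

# Payment number 1 may be reused, because it belongs to different loans.
conn.execute("INSERT INTO payment VALUES ('L-17', 1, '2001-01-10', 100)")
conn.execute("INSERT INTO payment VALUES ('L-23', 1, '2001-01-12', 150)")

# A second payment number 1 for the same loan is rejected.
try:
    conn.execute("INSERT INTO payment VALUES ('L-17', 1, '2001-02-10', 100)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

payment_count = conn.execute("SELECT COUNT(*) FROM payment").fetchone()[0]
```

The foreign key on `loan_number` also reflects existence dependence: with foreign keys enabled, a payment cannot be inserted for a loan that does not exist.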
...

2
...
2
...

A weak entity set can participate in relationships other than the identifying relationship
...
A
weak entity set may participate as owner in an identifying relationship with another
weak entity set
...
A particular weak entity would then be identiﬁed by a combination
of entities, one from each identifying entity set
...

In E-R diagrams, a doubly outlined box indicates a weak entity set, and a doubly outlined diamond indicates the corresponding identifying relationship
...
16, the weak entity set payment depends on the strong entity set loan via the
relationship set loan-payment
...
Finally,
the arrow from loan-payment to loan indicates that each payment is for a single loan
...

In some cases, the database designer may choose to express a weak entity set as
a multivalued composite attribute of the owner entity set
...
A weak
entity set may be more appropriately modeled as an attribute if it participates in only
the identifying relationship, and if it has few attributes
...

[Figure: E-R diagram with entity set loan and the attribute labels loan-number, amount, payment-number, and payment-date]

Figure 2
...


2
...
The same course may be offered in
different semesters, and within a semester there may be several sections for the same
course
...

2
...

In this section, we discuss the extended E-R features of specialization, generalization,
higher- and lower-level entity sets, attribute inheritance, and aggregation
...
7
...
For instance, a subset of entities within an entity set
may have attributes that are not shared by all the entities in the entity set
...

Consider an entity set person, with attributes name, street, and city
...
For example, customer
entities may be described further by the attribute customer-id, whereas employee entities may be described further by the attributes employee-id and salary
...
The specialization of person allows us to distinguish among persons according to whether they
are employees or customers
...
Savings accounts need a minimum
balance, but the bank may set interest rates differently for different customers, offering better rates to favored customers
...

The bank could then create two specializations of account, namely savings-account
and checking-account
...
The entity set savings-account would have all the
attributes of account and an additional attribute interest-rate
...


We can apply specialization repeatedly to reﬁne a design scheme
...
For example, officer entities
may be described further by the attribute office-number, teller entities by the attributes
station-number and hours-per-week, and secretary entities by the attribute hours-per-week
...

An entity set may be specialized by more than one distinguishing feature
...
Another, coexistent, specialization could be based on whether the person
is a temporary (limited-term) employee or a permanent employee, resulting in the
entity sets temporary-employee and permanent-employee
...
For instance, a given employee may be a temporary employee who is a
secretary
...
17 shows
...
The ISA relationship may also be referred to as
a superclass-subclass relationship
...

2
...
2 Generalization
The reﬁnement from an initial entity set into successive levels of entity subgroupings
represents a top-down design process in which distinctions are made explicit
...
The
database designer may have ﬁrst identiﬁed a customer entity set with the attributes
name, street, city, and customer-id, and an employee entity set with the attributes name,
street, city, employee-id, and salary
...
This commonality can be
expressed by generalization, which is a containment relationship that exists between
a higher-level entity set and one or more lower-level entity sets
...

Higher- and lower-level entity sets also may be designated by the terms superclass
and subclass, respectively
...

For all practical purposes, generalization is a simple inversion of specialization
...
[Figure: E-R diagram with two ISA levels. Entity-set labels: person, employee, customer, officer, teller, secretary. Attribute labels: name, street, city, credit-rating, salary, office-number, station-number, hours-worked]
Figure 2
...

schema for an enterprise
...
New levels of entity representation will be
distinguished (specialization) or synthesized (generalization) as the design schema
comes to express fully the database application and the user requirements of the
database
...

Specialization stems from a single entity set; it emphasizes differences among entities within the set by creating distinct lower-level entity sets
...
Indeed, the reason a designer applies specialization is to represent such distinctive features
...

Generalization proceeds from the recognition that a number of entity sets share
some common features (namely, they are described by the same attributes and participate in the same relationship sets)
...

2
...
Generalization
is used to emphasize the similarities among lower-level entity sets and to hide the
differences; it also permits an economy of representation in that shared attributes are
not repeated
...
7
...
The attributes of the higher-level entity
sets are said to be inherited by the lower-level entity sets
...
Thus, customer is described by its name, street,
and city attributes, and additionally a customer-id attribute; employee is described by
its name, street, and city attributes, and additionally employee-id and salary attributes
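
Attribute inheritance has a close analogue in class inheritance, which the following sketch uses for illustration. The class and field names mirror the entity sets and attributes above, with underscores replacing hyphens; the field types are assumptions.

```python
from dataclasses import dataclass, fields


@dataclass
class Person:
    """Higher-level entity set."""
    name: str
    street: str
    city: str


@dataclass
class Customer(Person):
    """Lower-level entity set: inherits name, street, and city."""
    customer_id: str


@dataclass
class Employee(Person):
    """Lower-level entity set: inherits name, street, and city."""
    employee_id: str
    salary: int


customer_attrs = [f.name for f in fields(Customer)]
employee_attrs = [f.name for f in fields(Employee)]
```

Listing the fields shows the inherited attributes first, followed by the attributes that each lower-level entity set adds.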
...
The ofﬁcer, teller, and
secretary entity sets can participate in the works-for relationship set, since the superclass employee participates in the works-for relationship
...
The above entity sets can participate in any
relationships in which the person entity set participates
...

Figure 2
...
In the ﬁgure, employee is a lower-level
entity set of person and a higher-level entity set of the ofﬁcer, teller, and secretary entity
sets
...
If an entity set is a lower-level entity set in more than one ISA relationship,
then the entity set has multiple inheritance, and the resulting structure is said to be
a lattice
...
7
...
One type of constraint involves
determining which entities can be members of a given lower-level entity set
...
In condition-deﬁned lower-level entity sets, membership
is evaluated on the basis of whether or not an entity satisﬁes an explicit condition or predicate
...

2
...
7

Extended E-R Features

53

count has the attribute account-type
...
Only those entities that satisfy the condition
account-type = “savings account” are allowed to belong to the lower-level entity set savings-account
...
Since all the lower-level entities are
evaluated on the basis of the same attribute (in this case, on account-type), this
type of generalization is said to be attribute-deﬁned
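
Condition-defined membership amounts to filtering the higher-level entity set by a predicate on one attribute. In the sketch below, the account types assigned to each account number are made up for illustration, and the helper name `members` is hypothetical.

```python
# Higher-level entity set account, with the defining attribute account-type.
accounts = [
    {"account-number": "A-101", "account-type": "savings account"},
    {"account-number": "A-215", "account-type": "checking account"},
    {"account-number": "A-102", "account-type": "savings account"},
]


def members(higher_level, attribute, value):
    """Return the entities that satisfy the condition attribute = value."""
    return [e for e in higher_level if e[attribute] == value]


# Both lower-level entity sets are defined on the same attribute,
# so the generalization is attribute-defined.
savings_account = members(accounts, "account-type", "savings account")
checking_account = members(accounts, "account-type", "checking account")
```

No user decision is involved: membership follows entirely from the value of account-type.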
...
User-deﬁned lower-level entity sets are not constrained by a
membership condition; rather, the database user assigns entities to a given entity set
...
We therefore represent the
teams as four lower-level entity sets of the higher-level employee entity set
...
Instead, the user in charge of this decision makes the team assignment on an individual basis
...

A second type of constraint relates to whether or not entities may belong to more
than one lower-level entity set within a single generalization
...
A disjointness constraint requires that an entity belong to no more
than one lower-level entity set
...

• Overlapping
...
For an
illustration, consider the employee work team example, and assume that certain managers participate in more than one work team
...
Thus, the generalization is overlapping
...
The generalization is
overlapping if an employee can also be a customer
...
We can note a disjointness constraint in an E-R diagram by adding the word disjoint next to the triangle symbol
...
This
constraint may be one of the following:
• Total generalization or specialization
...


• Partial generalization or specialization
...

Partial generalization is the default
...
(This notation is similar to the notation for total participation
in a relationship
...
Because the higher-level entity set arrived at through
generalization is generally composed of only those entities in the lower-level entity
sets, the completeness constraint for a generalized higher-level entity set is usually
total
...
The work team entity sets illustrate a partial specialization
...

We may characterize the team entity sets more fully as a partial, overlapping specialization of employee
...
The completeness and disjointness constraints, however, do not depend on each other
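
Since the two constraints are independent, each can be checked on its own. The sketch below represents every lower-level entity set as a Python set of entity identifiers; the function names, team names, and members are assumptions made for illustration.

```python
def is_disjoint(lower_level_sets):
    """True if no entity belongs to more than one lower-level entity set."""
    seen = set()
    for members in lower_level_sets.values():
        if seen & members:
            return False
        seen |= members
    return True


def is_total(higher_level, lower_level_sets):
    """True if every higher-level entity belongs to some lower-level set."""
    covered = set().union(*lower_level_sets.values())
    return set(higher_level) <= covered


employees = {"Jones", "Smith", "Hayes", "Turner"}

# Work teams: Smith is on two teams, Hayes and Turner are on none.
teams = {"team-1": {"Jones", "Smith"}, "team-2": {"Smith"}}
teams_disjoint = is_disjoint(teams)
teams_total = is_total(employees, teams)

# A hypothetical disjoint, total split of a small account entity set.
accounts = {"A-101", "A-215"}
kinds = {"savings-account": {"A-101"}, "checking-account": {"A-215"}}
kinds_disjoint = is_disjoint(kinds)
kinds_total = is_total(accounts, kinds)
```

The team example comes out overlapping and partial, while the account split is disjoint and total; the other two combinations are equally possible.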
...

We can see that certain insertion and deletion requirements follow from the constraints that apply to a given generalization or specialization
...
With a
condition-deﬁned constraint, all higher-level entities that satisfy the condition must
be inserted into that lower-level entity set
...

2
...
5 Aggregation
One limitation of the E-R model is that it cannot express relationships among relationships
...
13)
...

Let us assume that there is an entity set manager
...
(A quaternary relationship is
required — a binary relationship between manager and employee would not permit us
to represent which (branch, job) combinations of an employee are managed by which
manager
...
18
...
)
It appears that the relationship sets works-on and manages can be combined into
one single relationship set
...


2
...
18

E-R diagram with redundant relationships
...
If the manager were a
value rather than a manager entity, we could instead make manager a multivalued attribute of the relationship works-on
...
Since the manager is a manager entity, this alternative is
ruled out in any case
...
Aggregation is an abstraction through which relationships are treated as higher-level entities

Such an entity set is treated in the same manner as is any other entity set
...
Figure 2
...

2
...
6 Alternative E-R Notations
Figure 2
...
There is
no universal standard for E-R diagram notation, and different books and E-R diagram
software use different notations; Figure 2
...
An entity set may be represented as a box with the name
outside, and the attributes listed one below the other within the box
...


[Figure: E-R diagram with entity sets job, employee, branch, and manager and relationship sets works-on and manages]
Figure 2
...

Cardinality constraints can be indicated in several different ways, as Figure 2
...
The labels ∗ and 1 on the edges out of the relationship are sometimes used for
depicting many-to-many, one-to-one, and many-to-one relationships, as the ﬁgure
shows
...
In
another alternative notation in the ﬁgure, relationship sets are represented by lines
between entity sets, without diamonds; only binary relationships can be modeled
thus
...

2
...
In this section, we consider how a database designer may
select from the wide range of alternatives
...
2
...
2
...
2
...

2
...
8

Design of an E-R Database Schema

[Figure: legend of E-R symbols, with the labels E (entity set), E (weak entity set), R (relationship set), R (identifying relationship set for weak entity set), A (attribute), A (multivalued attribute), A (derived attribute), primary key, discriminating attribute of weak entity set, total participation of entity set in relationship, many-to-many relationship, many-to-one relationship, one-to-one relationship, rolename, and ISA]

l
...
20

disjoint

Symbols used in the E-R notation
...
6); a strong entity set
and its dependent weak entity sets may be regarded as a single “object” in the
database, since weak entities are existence dependent on a strong entity
• Whether using generalization (Section 2
...
2) is appropriate; generalization, or
a hierarchy of ISA relationships, contributes to modularity by allowing common attributes of similar entity sets to be represented in one place in an E-R
diagram
• Whether using aggregation (Section 2
...
5) is appropriate; aggregation groups
a part of an E-R diagram into a single entity set, allowing us to treat the aggregate entity set as a single unit without concern for the details of its internal
structure

[Figure: entity set E with attributes A1, A2, A3 and primary key A1; many-to-many, one-to-one, and many-to-one relationships R, drawn with the edge labels * and 1]
Figure 2
...
...

2
...
1 Design Phases
A high-level data model serves the database designer by providing a conceptual
framework in which to specify, in a systematic fashion, what the data requirements
of the database users are, and how the database will be structured to fulﬁll these
requirements
...
The database designer needs to interact
extensively with domain experts and users to carry out this task
...

Next, the designer chooses a data model, and by applying the concepts of the
chosen data model, translates these requirements into a conceptual schema of the
database
...
Since we have studied only the E-R model so far, we shall


2
...
Stated in terms of the E-R model, the schema
speciﬁes all entity sets, relationship sets, attributes, and mapping constraints
...
She can also examine the design to remove
any redundant features
...

A fully developed conceptual schema will also indicate the functional requirements of the enterprise
...
Example
operations include modifying or updating data, searching for and retrieving speciﬁc
data, and deleting data
...

The process of moving from an abstract data model to the implementation of the
database proceeds in two ﬁnal design phases
...
The designer uses the resulting system-specific database schema in the subsequent physical-design phase, in which the
physical features of the database are specified
...

In this chapter, we cover only the concepts of the E-R model as used in the conceptual-schema-design phase
...
Database design
receives a full treatment in Chapter 7
...
8
...
We employ the E-R data model to translate user requirements
into a conceptual design schema that is depicted as an E-R diagram
...
8
...
However, we do not attempt to model every
aspect of the database design for a bank; we consider only a few aspects, in order to
illustrate the process of database design
...
8
...
1 Data Requirements
The initial speciﬁcation of user requirements may be based on interviews with the
database users, and on the designer’s own analysis of the enterprise
...
Here are the major characteristics of the banking enterprise
...
Each branch is located in a particular
city and is identiﬁed by a unique name
...


• Bank customers are identiﬁed by their customer-id values
...
Customers
may have accounts and can take out loans
...

• Bank employees are identiﬁed by their employee-id values
...
The bank also keeps track of the employee’s start date and, thus,
length of employment
...
Accounts can be held by more than one customer, and a customer can have more
than one account
...
The
bank maintains a record of each account’s balance, and the most recent date on
which the account was accessed by each customer holding the account
...

• A loan originates at a particular branch and can be held by one or more customers
...
For each loan, the bank
keeps track of the loan amount and the loan payments
...
The date and amount are recorded for each payment
...
Since the modeling requirements for that tracking are similar, and we
would like to keep our example application small, we do not keep track of such deposits and withdrawals in our model
...
8
...
2 Entity Sets Designation
Our speciﬁcation of data requirements serves as the starting point for constructing a
conceptual schema for the database
...
8
...
1,
we begin to identify entity sets and their attributes:
• The branch entity set, with attributes branch-name, branch-city, and assets
...
A possible additional attribute is banker-name
...
Additional descriptive features are the multivalued attribute dependent-name, the base attribute start-date, and the derived attribute employment-length
...

• Two account entity sets — savings-account and checking-account — with the common attributes of account-number and balance; in addition, savings-account has
the attribute interest-rate and checking-account has the attribute overdraft-amount
...

• The weak entity set loan-payment, with attributes payment-number, payment-date, and payment-amount
...
8
...
3 Relationship Sets Designation
We now return to the rudimentary design scheme of Section 2
...
2
...
In the process, we also reﬁne
some of the decisions we made earlier regarding attributes of entity sets
...

• loan-branch, a many-to-one relationship set that indicates in which branch a
loan originated
...

• loan-payment, a one-to-many relationship from loan to payment, which documents that a payment is made on a loan
...

• cust-banker, with relationship attribute type, a many-to-one relationship set expressing that a customer can be advised by a bank employee, and that a bank
employee can advise one or more customers
...

• works-for, a relationship set between employee entities with role indicators manager and worker; the mapping cardinalities express that an employee works
for only one manager and that a manager supervises one or more employees
...

2
...
2
...
8
...
3, we now present the completed E-R diagram for our example banking enterprise
...
22 depicts the full representation
of a conceptual model of a bank, expressed in terms of E-R concepts
...
8
...
1 and 2
...
2
...
8
...
3
...
Figure 2.22 E-R diagram for a banking enterprise
...
9 Reduction of an E-R Schema to Tables
We can represent a database that conforms to an E-R database schema by a collection
of tables
...
Each table has multiple columns, each of which has a unique name
...
Because the two models employ similar design principles, we can convert an E-R design into a relational design
...
Although important differences

...

In this section, we describe how an E-R schema can be represented by tables; and
in Chapter 3, we show how to generate a relational-database schema from an E-R
schema
...

We provide more details about this mapping in Chapter 6 after describing how to
specify constraints on tables
...
9
...
, an
...
Each row in this table corresponds to one entity of the entity
set E
...
8
...
We represent this entity set by
a table called loan, with two columns, as in Figure 2
...
The row
(L-17, 1000)
in the loan table means that loan number L-17 has a loan amount of $1000
...
We can also delete or
modify rows
...

Any row of the loan table must consist of a 2-tuple (v1, v2), where v1 is a loan number (that
is, v1 is in set D1) and v2 is an amount (that is, v2 is in set D2)
...
We refer to the set of all
possible rows of loan as the Cartesian product of D1 and D2 , denoted by
D1 × D2
In general, if we have a table of n columns, we denote the Cartesian product of
D1 , D2 , · · · , Dn by
D1 × D2 × · · · × Dn−1 × Dn
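As a small illustration of the Cartesian product above, the following Python sketch enumerates D1 × D2 for two deliberately tiny domains and checks that a loan table is a subset of it. The contents of D1 and D2 are assumptions chosen for brevity; only the row (L-17, 1000) comes from the text.

```python
from itertools import product

# Hypothetical, deliberately tiny domains; the real D1 and D2 are far larger.
D1 = {"L-11", "L-17", "L-23"}   # loan numbers
D2 = {900, 1000, 2000}          # loan amounts

# The set of all possible rows of loan is the Cartesian product D1 x D2.
all_possible_rows = set(product(D1, D2))

# An actual loan table contains only some of those rows.
loan = {("L-11", 900), ("L-17", 1000), ("L-23", 2000)}

# Every row of the table must be drawn from D1 x D2.
is_valid_table = loan <= all_possible_rows
```

With three values in each domain there are nine possible rows, of which this loan table uses three.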
loan-number
L-11
L-14
L-15
L-16
L-17
L-23
L-93
Figure 2
...


customer-id    customer-name
019-28-3746    Smith
182-73-6091    Turner
192-83-7465    Johnson
244-66-8800    Curry
321-12-3123    Jones
335-57-7991    Adams
336-66-9999    Lindsay
677-89-9011    Hayes
963-96-3963    Williams
Figure 2
...

As another example, consider the entity set customer of the E-R diagram in Figure 2
...
This entity set has the attributes customer-id, customer-name, customer-street,
and customer-city
...
24
...
9
...
, am
...
Let the primary key of B consist of attributes b1 , b2 ,
...
We
represent the entity set A by a table called A with one column for each attribute of
the set:
{a1 , a2 ,
...
, bn }
As an illustration, consider the entity set payment in the E-R diagram of Figure 2
...

This entity set has three attributes: payment-number, payment-date, and payment-amount
...

Thus, we represent payment by a table with four columns labeled loan-number, payment-number, payment-date, and payment-amount, as in Figure 2
...
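The column rule for a weak entity set can be sketched in a few lines of Python. This is only an illustration: the helper name weak_entity_columns is hypothetical, while the attribute names are the ones used in the text.

```python
# Attributes of the weak entity set payment and the primary key of the
# strong entity set loan, as named in the text.
weak_attributes = ["payment-number", "payment-date", "payment-amount"]
strong_primary_key = ["loan-number"]

def weak_entity_columns(strong_pk, weak_attrs):
    """Columns of the table for a weak entity set: the primary key of the
    identifying strong entity set, followed by the weak set's own attributes."""
    return list(strong_pk) + list(weak_attrs)

payment_columns = weak_entity_columns(strong_primary_key, weak_attributes)
```

The result has four columns, matching the payment table described above.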

2
...
3 Tabular Representation of Relationship Sets
Let R be a relationship set, let a1 , a2 ,
...
, bn
...
, am } ∪ {b1 , b2 ,
...
8
...
loan-number    payment-number    payment-date
L-11           53                7 June 2001
L-14           69                28 May 2001
L-15           22                23 May 2001
L-16           58                18 June 2001
L-17           5                 10 May 2001
L-17           6                 7 June 2001
L-17           7                 17 June 2001
L-23           11                17 May 2001
L-93           103               3 June 2001
L-93           104               13 June 2001

Figure 2
...

Since the relationship set has no attributes, the borrower table has two columns, labeled customer-id and loan-number, as shown in Figure 2
...

2
...
3
...
As we noted in Section 2
...
Furthermore, the primary key of a weak entity set includes the primary key of the strong entity set
...
16, the
weak entity set payment is dependent on the strong entity set loan via the relationship set loan-payment
...
Since loan-payment has no descriptive
attributes, the loan-payment table would have two columns, loan-number and payment-number
...
Every (loan-number, payment-number) combination in loan-payment would also be present in the payment table, and vice versa
...
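The redundancy can be checked mechanically. The sketch below, in Python, uses a few rows in the style of the payment table (payment-amount is left out) and an explicitly written loan-payment table; both are small illustrative samples rather than the full tables.

```python
# Sample rows of payment as (loan-number, payment-number, payment-date).
payment = {
    ("L-17", 5, "10 May 2001"),
    ("L-17", 6, "7 June 2001"),
    ("L-17", 7, "17 June 2001"),
    ("L-93", 103, "3 June 2001"),
    ("L-93", 104, "13 June 2001"),
}

# What a separate loan-payment table would hold: one
# (loan-number, payment-number) pair per payment.
loan_payment = {
    ("L-17", 5),
    ("L-17", 6),
    ("L-17", 7),
    ("L-93", 103),
    ("L-93", 104),
}

# Keep only the first two columns of every payment row.
payment_keys = {(loan_number, payment_number)
                for loan_number, payment_number, _date in payment}

# The two sets coincide, so loan-payment stores nothing that payment lacks.
is_redundant = payment_keys == loan_payment
```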
customer-id
019-28-3746
019-28-3746
244-66-8800
321-12-3123
335-57-7991
555-55-5555
677-89-9011
963-96-3963
Figure 2
...

In general, the table for the relationship set linking a weak entity set to its corresponding strong entity set is redundant and does
not need to be present in a tabular representation of an E-R diagram
...
9
...
2 Combination of Tables
Consider a many-to-one relationship set AB from entity set A to entity set B
...

Suppose further that the participation of A in the relationship is total; that is, every
entity a in the entity set A must participate in the relationship AB
...

As an illustration, consider the E-R diagram of Figure 2
...
The double line in the
E-R diagram indicates that the participation of account in the account-branch is total
...

Further, the relationship set account-branch is many to one from account to branch
...
9
...
Suppose address is a composite attribute of entity set customer, and the components of address are street and city
...

2
...
5 Multivalued Attributes
We have seen that attributes in an E-R diagram generally map directly into columns
for the appropriate tables
...

E-R diagram: entity set account (account-number, balance) and entity set branch (branch-name, branch-city, assets), connected by the relationship set account-branch
Figure 2
...
As an illustration, consider the E-R diagram
in Figure 2
...
The diagram includes the multivalued attribute dependent-name
...
Each dependent of an employee is represented
as a unique row in the table
...
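A minimal Python sketch of this mapping follows. The employee identifiers and dependent names are invented for illustration, and the helper name multivalued_to_rows is hypothetical; the point is only that each value of the multivalued attribute becomes its own row, paired with the key of the entity it belongs to.

```python
# Hypothetical employees, keyed by an assumed identifier, each with a
# multivalued dependent-name attribute (possibly empty).
employees = {
    "E-1": ["Ann", "Bob"],
    "E-2": [],
    "E-3": ["Cy"],
}

def multivalued_to_rows(entities):
    """Return one (key, value) row per value of the multivalued attribute."""
    return {(key, value) for key, values in entities.items() for value in values}

dependent_name = multivalued_to_rows(employees)
```

An employee with no dependents, such as the assumed E-2, simply contributes no rows.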
9
...
Although we refer to the generalization in Figure 2
...

1
...
For each lower-level entity set,
create a table that includes a column for each of the attributes of that entity set
plus a column for each attribute of the primary key of the higher-level entity
set
...
17, we have three tables:
• account, with attributes account-number and balance
• savings-account, with attributes account-number and interest-rate
• checking-account, with attributes account-number and overdraft-amount
2
...
Here, do not
create a table for the higher-level entity set
...
Then,
for the E-R diagram of Figure 2
...

• savings-account, with attributes account-number, balance, and interest-rate
• checking-account, with attributes account-number, balance, and overdraft-amount
The savings-account and checking-account relations corresponding to these
tables both have account-number as the primary key
...
Similarly, if the generalization
were not complete — that is, if some accounts were neither savings nor checking
accounts — then such accounts could not be represented with the second method
...
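The two methods can be compared side by side in Python. The account data below is invented, and the "kind" field is an assumption used only to say which lower-level entity set, if any, an account belongs to. The sketch shows that an account that is neither savings nor checking survives the first method but disappears under the second.

```python
# Hypothetical account entities; "kind" is None for an account that is
# neither a savings nor a checking account.
accounts = [
    {"account-number": "A-1", "balance": 500, "kind": "savings", "interest-rate": 0.02},
    {"account-number": "A-2", "balance": 700, "kind": "checking", "overdraft-amount": 100},
    {"account-number": "A-3", "balance": 900, "kind": None},
]

def method_one(entities):
    """A table for the higher-level set plus one table per lower-level set."""
    account = {(e["account-number"], e["balance"]) for e in entities}
    savings = {(e["account-number"], e["interest-rate"])
               for e in entities if e["kind"] == "savings"}
    checking = {(e["account-number"], e["overdraft-amount"])
                for e in entities if e["kind"] == "checking"}
    return account, savings, checking

def method_two(entities):
    """Only lower-level tables, each repeating the inherited attributes."""
    savings = {(e["account-number"], e["balance"], e["interest-rate"])
               for e in entities if e["kind"] == "savings"}
    checking = {(e["account-number"], e["balance"], e["overdraft-amount"])
                for e in entities if e["kind"] == "checking"}
    return savings, checking

account, savings, checking = method_one(accounts)
savings_2, checking_2 = method_two(accounts)

# Account numbers that the second method is able to represent at all.
numbers_in_method_two = {row[0] for row in savings_2 | checking_2}
```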
9
...
Consider the diagram of Figure 2
...
The table for the relationship set manages between the aggregation of works-on and the entity set manager includes a
column for each attribute in the primary keys of the entity set manager and the relationship set works-on
...
We then transform the relationship sets
and entity sets within the aggregated entity
...
10 The Uniﬁed Modeling Language UML∗∗
Entity-relationship diagrams help model the data representation component of a software system
...
Other components include models of user interactions with the system, speciﬁcation of functional modules of the system and their interaction, etc
...
Some of the parts of UML are:
• Class diagram
...
Later in this
section we illustrate a few features of class diagrams and how they relate to
E-R diagrams
...
Use case diagrams show the interaction between users and
the system, in particular the steps of tasks that users perform (such as withdrawing money or registering for a course)
...
Activity diagrams depict the ﬂow of tasks between various
components of a system
...
Implementation diagrams show the system components and their interconnections, both at the software component level and
the hardware component level
...

See the bibliographic notes for references on UML
...

Figure 2
...
We describe these constructs below
...
UML actually models objects, whereas E-R models entities
...
Class diagrams can depict methods in addition to attributes
...

We represent binary relationship sets in UML by just drawing a line connecting
the entity sets
...
We may also
specify the role played by an entity set in a relationship set by writing the role name
on the line, adjacent to the entity set
...
This box can then be treated as

Figure 2
...
Symbols used in the UML class diagram notation (E-R diagram constructs beside their class diagram equivalents: entity sets and attributes, cardinality constraints, disjoint generalization, overlapping generalization)
...

Nonbinary relationships cannot be directly represented in UML — they have to
be converted to binary relationships by the technique we have seen earlier in Section 2
...
3
...
2
...
h, where l denotes the minimum and h the maximum number of relationships an entity can participate in
...
28
...
∗ on the E2 side and 0
...

Single values such as 1 or ∗ may be written on edges; the single value 1 on an edge
is treated as equivalent to 1
...

We represent generalization and specialization in UML by connecting entity sets
by a line with a triangle at the end corresponding to the more general entity set
...
UML
diagrams can also represent explicitly the constraints of disjoint/overlapping on generalizations
...
28 shows disjoint and overlapping generalizations of customer
and employee to person
...
An overlapping
generalization allows a person to be both a customer and an employee
...
11 Summary
• The entity-relationship (E-R) data model is based on a perception of a real
world that consists of a set of basic objects called entities, and of relationships
among these objects
...
It was developed to facilitate database design by allowing the speciﬁcation of an enterprise schema
...
This overall structure can be expressed graphically by an E-R diagram
...
We express the distinction by associating with each entity a set
of attributes that describes the object
...
The collection of all
entities of the same type is an entity set, and the collection of all relationships
of the same type is a relationship set
...

• A superkey of an entity set is a set of one or more attributes that, taken collectively, allows us to identify uniquely an entity in the entity set
...
Similarly, a superkey of a relationship set
is a set of one or more attributes that, taken collectively, allows us to identify
uniquely a relationship in the relationship set
...
minimal superkey for each relationship set from among its superkeys; this is the
relationship set’s primary key
...
An entity set that has a primary key is termed a
strong entity set
...
Specialization
is the result of taking a subset of a higher-level entity set to form a lower-level entity set
...
The attributes of higher-level entity sets are inherited by lower-level entity sets
...

• The various features of the E-R model offer the database designer numerous
choices in how to best represent the enterprise being modeled
...
Aspects of the overall structure of the enterprise may be best described by using weak entity sets, generalization, specialization, or aggregation
...

• A database that conforms to an E-R diagram can be represented by a collection
of tables
...
Each table has a number of columns, each of which has a
unique name
...

• The uniﬁed modeling language (UML) provides a graphical means of modeling various components of a software system
...
However, there are some differences between the two that one must beware of
...
2
...
1 Explain the distinctions among the terms primary key, candidate key, and superkey
...
2 Construct an E-R diagram for a car-insurance company whose customers own
one or more cars each
...

2
...
Associate with each patient a log of the various tests and examinations conducted
...
4 A university registrar’s ofﬁce maintains data about the following entities: (a)
courses, including number, title, credits, syllabus, and prerequisites; (b) course
offerings, including course number, year, semester, section number, instructor(s),
timings, and classroom; (c) students, including student-id, name, and program;
and (d) instructors, including identiﬁcation number, name, department, and title
...

Construct an E-R diagram for the registrar’s ofﬁce
...

2
...

a
...


b
...
Make sure that only one relationship
exists between a particular student and course-offering pair, yet you can
represent the marks that a student gets in different exams of a course offering
...
6 Construct appropriate tables for each of the E-R diagrams in Exercises 2
...
4
...
7 Design an E-R diagram for keeping track of the exploits of your favourite sports
team
...
Summary statistics should be modeled as derived attributes
2
...

2
...

2
...
Why, then, do we have weak entity sets?
2
...
Give two examples of where this concept is
useful
...
12 Consider the E-R diagram in Figure 2
...

a
...

b
...
The same music item may be present in cassette or compact disk
format, with differing prices
...

c
...

2
...

Why is allowing this redundancy a bad practice that one should avoid whenever
possible?
2
...

This database could be modeled as the single entity set exam, with attributes
course-name, section-number, room-number, and time
...
Show an E-R diagram illustrating the use of all three additional entity sets
listed
...
Figure 2.29 E-R diagram for Exercise 2
...

b
...

2
...

a
...
Design three alternative E-R diagrams to represent the university registrar’s
ofﬁce of Exercise 2
...
List the merits of each
...

2
...
What do the following mean in terms
of the structure of an enterprise schema?
a
...

b
...

2
...
4
...
30a) using binary relationships, as shown in Figure 2
...
Consider the alternative shown in


Figure 2
...
17 (attributes not shown)
...
30c
...

2
...
4
...
30b
...
Show a simple instance of E, A, B, C, RA , RB , and RC that cannot correspond to any instance of A, B, C, and R
...
Modify the E-R diagram of Figure 2
...

c
...

d
...
Show how to treat E as a weak entity set so that a primary key attribute
is not required
...
19 A weak entity set can always be made into a strong entity set by adding to its
attributes the primary key attributes of its identifying entity set
...

2
...
The company sells motorcycles, passenger cars, vans, and buses
...
Explain why they
should not be placed at a higher or lower level
...
2
...
21 Explain the distinction between condition-deﬁned and user-deﬁned constraints
...

2
...

2
...

2
...
31 shows a lattice structure of generalization and specialization
...
Discuss how to handle a case where an attribute of X
has the same name as some attribute of Y
...
25 Draw the UML equivalents of the E-R diagrams of Figures 2
...
10, 2
...
13
and 2
...

2
...
Assume that both banks
use exactly the same E-R database schema — the one in Figure 2
...
(This assumption is, of course, highly unrealistic; we consider the more realistic case in
Section 19
...
) If the merged bank is to have a single database, there are several
potential problems:
• The possibility that the two original banks have branches with the same
name
• The possibility that some customers are customers of both original banks
• The possibility that some loan or account numbers were used at both original banks (for different loans or accounts, of course)
For each of these potential problems, describe why there is indeed a potential
for difﬁculties
...
For your solution, explain any
changes that would have to be made and describe what their effect would be on
the schema and the data
...
27 Reconsider the situation described for Exercise 2
...
As before, the
banks use the schema of Figure 2
...
S
...
What problems (be-

Figure 2
...
24 (attributes not shown)
...

2
...
24) might occur in this multinational case?
How would you resolve them? Be sure to consider both the scheme and the
actual data values in constructing your answer
...
A logical design methodology for
relational databases using the extended E-R model is presented by Teorey et al
...

Mapping from extended E-R models to the relational model is discussed by Lyngbaek
and Vianu [1987] and Markowitz and Shoshani [1992]
...
[1981]),
GORDAS (Elmasri and Wiederhold [1981]), and ERROL (Markowitz and Raz [1983])
...

Smith and Smith [1977] introduced the concepts of generalization, specialization,
and aggregation and Hammer and McLeod [1980] expanded them
...

Thalheim [2000] provides a detailed textbook coverage of research in E-R modeling
...
[1992] and Elmasri and
Navathe [2000]
...
[1983] provide a collection of papers on the E-R model
...

These tools help a designer create E-R diagrams, and they can automatically create corresponding tables in a database
...
There are also some database-independent data modeling tools that support E-R diagrams and UML class diagrams
...
rational
...
visio
...
cai
...

Chapter 3

Relational Model

The relational model is today the primary data model for commercial data-processing
applications
...

In this chapter, we ﬁrst study the fundamentals of the relational model, which provides a very simple yet powerful way of representing data
...
The three we cover in this chapter are not user-friendly, but instead serve as
the formal basis for user-friendly query languages that we study later
...
The relational algebra forms
the basis of the widely used SQL query language
...
The
domain relational calculus is the basis of the QBE query language
...
We study the part of this theory
dealing with queries in this chapter
...

3
...
Each table has a structure similar to that presented in Chapter 2, where
we represented E-R databases by tables
...
Since a table is a collection of such relationships, there is a
close correspondence between the concept of table and the mathematical concept of
relation, from which the relational data model takes its name
...

In this chapter, we shall be using a number of different relations to illustrate the
various concepts underlying the relational data model
...
They differ slightly from the tables that were used in Chapter 2, so that we can simplify our presentation
...

3
...
1 Basic Structure
Consider the account table of Figure 3
...
It has three column headers: account-number,
branch-name, and balance
...
For each
attribute, there is a set of permitted values, called the domain of that attribute
...
Let
D1 denote the set of all account numbers, D2 the set of all branch names, and D3
the set of all balances
...
In general, account will contain only a subset of the set of all possible
rows
...
This deﬁnition corresponds almost exactly with our deﬁnition of table
...
Because tables are essentially relations, we shall use the mathematical
account-number
A-101
A-102
A-201
A-215
A-217
A-222
A-305
Figure 3
...

account-number    branch-name    balance
A-101             Downtown       500
A-215             Mianus         700
A-102             Perryridge     400
A-305             Round Hill     350
A-201             Brighton       900
A-222             Redwood        700
A-217             Brighton       750

Figure 3
...
The account relation with unordered tuples
...
A tuple variable is a
variable that stands for a tuple; in other words, a tuple variable is a variable whose
domain is the set of all tuples
...
1, there are seven tuples
...
We use the notation t[account-number] to denote
the value of t on the account-number attribute
...
Alternatively, we may write t[1] to denote the value
of tuple t on the ﬁrst attribute (account-number), t[2] to denote branch-name, and so on
...
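Both notations can be mimicked in Python. In this sketch the helper names value and value_at are hypothetical; the tuple is one row of the account relation, and positions are counted from 1 as in the text.

```python
# Attribute names of account, in the order of the column headers.
attributes = ("account-number", "branch-name", "balance")

# One tuple of the account relation.
t = ("A-101", "Downtown", 500)

def value(t, attribute):
    """t[attribute]: the value of tuple t on the named attribute."""
    return t[attributes.index(attribute)]

def value_at(t, position):
    """t[i]: the value of tuple t on the i-th attribute, counting from 1."""
    return t[position - 1]

account_number = value(t, "account-number")
branch_name = value_at(t, 2)
```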

The order in which tuples appear in a relation is irrelevant, since a relation is a
set of tuples
...
1, or are unsorted, as in Figure 3
...

We require that, for all relations r, the domains of all attributes of r be atomic
...

For example, the set of integers is an atomic domain, but the set of all sets of integers
is a nonatomic domain
...
The important issue is not what the domain itself is,
but rather how we use domain elements in our database
...
In
all our examples, we shall assume atomic domains
...

It is possible for several attributes to have the same domain
...
It is possible that the attributes customer-name and employee-name will
have the same domain: the set of all person names, which at the physical level is
the set of all character strings
...
It is perhaps less clear whether customer-name
and branch-name should have the same domain
...
However, at the logical level, we may
want customer-name and branch-name to have distinct domains
...
3
...
For example, suppose
that we include the attribute telephone-number in the customer relation
...
We would then have to resort to null values to signify that the value is unknown or does not exist
...
We shall assume null values are absent initially, and in Section 3
...
4, we
describe the effect of nulls on different operations
...
1
...

The concept of a relation corresponds to the programming-language notion of a
variable
...

It is convenient to give a name to a relation schema, just as we give names to type
deﬁnitions in programming languages
...
Following this notation, we use Account-schema to denote the relation
schema for relation account
...
We shall not be concerned about the precise deﬁnition of the domain of
each attribute until we discuss the SQL language in Chapter 4
...
The value of a given variable may change with time;
similarly the contents of a relation instance may change with time as the relation is
updated
...
”
As an example of a relation instance, consider the branch relation of Figure 3
...
The
schema for that relation is
Branch-schema = (branch-name, branch-city, assets)
Note that the attribute branch-name appears in both Branch-schema and Account-schema
...
Rather, using common attributes in
relation schemas is one way of relating tuples of distinct relations
...
3
...

located in Brooklyn
...
Then, for each such branch, we would look in the account relation to ﬁnd the information about the accounts maintained at that branch
...

Let us continue our banking example
...
The relation schema is
Customer-schema = (customer-name, customer-street, customer-city)
Figure 3
...
Note that we have
omitted the customer-id attribute, which we used in Chapter 2, because now we want to
have smaller relation schemas in our running example of a bank database
...

customer-name
Adams
Brooks
Curry
Glenn
Green
Hayes
Johnson
Jones
Lindsay
Smith
Turner
Williams
Figure 3
...


In a real-world database, the customer-id (which could be a social-security number, or
an identiﬁer generated by the bank) would serve to uniquely identify customers
...
The relation schema to describe this association is
Depositor-schema = (customer-name, account-number)
Figure 3
...

It would appear that, for our banking example, we could have just one relation
schema, rather than several
...
Suppose that we used only one
relation for our example, with schema
(branch-name, branch-city, assets, customer-name, customer-street,
customer-city, account-number, balance)
Observe that, if a customer has several accounts, we must list her address once for
each account
...
This repetition is wasteful and is avoided by the use of several relations, as in our example
...
To represent
incomplete tuples, we must use null values that signify that the value is unknown or
does not exist
...
By using several relations, we can represent the branch information for a bank with no customers without using null values
...

In Chapter 7, we shall study criteria to help us decide when one set of relation
schemas is more appropriate than another, in terms of information repetition and
the existence of null values
...

We include two additional relations to describe data about loans maintained in the
various branches in the bank:
customer-name
Hayes
Johnson
Johnson
Jones
Lindsay
Smith
Turner
Figure 3
...

loan-number    branch-name    amount
L-11           Round Hill     900
L-14           Downtown       1500
L-15           Perryridge     1500
L-16           Perryridge     1300
L-17           Downtown       1000
L-23           Redwood        2000
L-93           Mianus         500

Figure 3
...
The loan relation
...
6 and 3
...

The E-R diagram in Figure 3
...
The relation schemas correspond to the set of tables that we might generate by the method outlined in Section 2
...
Note that the tables for account-branch and
loan-branch have been combined into the tables for account and loan respectively
...
Finally, we
note that the customer relation may contain information about customers who have
neither an account nor a loan at the bank
...
On occasion, we shall need to introduce additional
relation schemas to illustrate particular points
...
1
...
customer-name
Adams
Curry
Hayes
Jackson
Jones
Smith
Smith
Williams
Figure 3
...

E-R diagram: entity sets account, branch, customer, and loan; relationship sets account-branch, depositor, loan-branch, and borrower
Figure 3
...

For example, in Branch-schema, {branch-name} and {branch-name, branch-city} are both superkeys
...
However, {branch-name} is a candidate
key, and for our purpose also will serve as a primary key
...

Let R be a relation schema
...
That is, if t1 and t2 are in r and t1 ≠ t2, then
t1[K] ≠ t2[K]
...
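The condition can be tested against a single relation instance in Python. This is a sketch under a stated limitation: a superkey is a property of every legal relation on the schema, so a check on one instance can refute a candidate set K but cannot prove it. The function name is_superkey is hypothetical, and the rows are those of the account relation.

```python
account = [
    {"account-number": "A-101", "branch-name": "Downtown", "balance": 500},
    {"account-number": "A-102", "branch-name": "Perryridge", "balance": 400},
    {"account-number": "A-201", "branch-name": "Brighton", "balance": 900},
    {"account-number": "A-215", "branch-name": "Mianus", "balance": 700},
    {"account-number": "A-217", "branch-name": "Brighton", "balance": 750},
    {"account-number": "A-222", "branch-name": "Redwood", "balance": 700},
    {"account-number": "A-305", "branch-name": "Round Hill", "balance": 350},
]

def is_superkey(r, K):
    """True if no two distinct tuples of r agree on every attribute in K."""
    restricted = [tuple(t[a] for a in K) for t in r]
    return len(set(restricted)) == len(restricted)

by_number = is_superkey(account, ["account-number"])
by_branch = is_superkey(account, ["branch-name"])
by_both = is_superkey(account, ["account-number", "branch-name"])
```

Here branch-name fails because two accounts are held at Brighton, whereas any set containing account-number passes on this instance.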
The primary key of the entity set becomes the primary key
of the relation
...
The table, and thus the relation, corresponding to a weak
entity set includes
The attributes of the weak entity set
The primary key of the strong entity set on which the weak entity set
depends


• Relationship set
...
If the relationship is many-to-many, this superkey is also the primary key
...
4
...
Recall from Section 2
...
3 that no table is generated for relationship sets linking a weak entity set to the corresponding strong
entity set
...
Recall from Section 2
...
3 that a binary many-to-one relationship set from A to B can be represented by a table consisting of the attributes of A and attributes (if any exist) of the relationship set
...
For one-to-one relationship sets, the relation
is constructed like that for a many-to-one relationship set
...

• Multivalued attributes
...
9
...
The primary key of the entity or relationship set, together
with the attribute C, becomes the primary key for the relation
...
This attribute is called a foreign key from r1 , referencing r2
...
For example, the attribute branch-name in
Account-schema is a foreign key from Account-schema referencing Branch-schema, since
branch-name is the primary key of Branch-schema
...
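A foreign key can be checked by looking for values that match no primary-key value in the referenced relation. In the Python sketch below, branch_names stands in for the branch-name values of the branch relation; that relation is not reproduced here, so the set is an assumption, and the function name dangling_references is hypothetical.

```python
# Assumed primary-key values of the referenced relation branch.
branch_names = {"Brighton", "Downtown", "Mianus", "Perryridge", "Redwood", "Round Hill"}

# Tuples of the referencing relation account.
account = [
    {"account-number": "A-101", "branch-name": "Downtown", "balance": 500},
    {"account-number": "A-102", "branch-name": "Perryridge", "balance": 400},
    {"account-number": "A-201", "branch-name": "Brighton", "balance": 900},
]

def dangling_references(referencing, attribute, referenced_keys):
    """Foreign-key values that match no primary-key value."""
    return {t[attribute] for t in referencing} - set(referenced_keys)

missing = dangling_references(account, "branch-name", branch_names)

# A made-up tuple naming a branch that does not exist is reported.
bad_account = account + [{"account-number": "A-999", "branch-name": "Nowhere", "balance": 0}]
missing_in_bad = dangling_references(bad_account, "branch-name", branch_names)
```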

It is customary to list the primary key attributes of a relation schema before the
other attributes; for example, the branch-name attribute of Branch-schema is listed ﬁrst,
since it is the primary key
...
1
...
Figure 3
...
Each relation appears as a box, with the attributes listed inside it and the relation name above it
...
branch (branch-name, branch-city, assets)
account (account-number, branch-name, balance)
depositor (customer-name, account-number)
customer (customer-name, customer-street, customer-city)
loan (loan-number, branch-name, amount)

Figure 3
...

Foreign key dependencies appear as arrows from the foreign key attributes of the referencing
relation to the primary key of the referenced relation
...
In particular, E-R diagrams
do not show foreign key attributes explicitly, whereas schema diagrams show them
explicitly
...

3
...
5 Query Languages
A query language is a language in which a user requests information from the database
...
Query languages can be categorized as either procedural or nonprocedural
...
In a nonprocedural language, the user describes the desired information without giving a speciﬁc
procedure for obtaining that information
...
We shall study
the very widely used query language SQL in Chapter 4
...

In this chapter, we examine “pure” languages: The relational algebra is procedural, whereas the tuple relational calculus and domain relational calculus are nonprocedural
...

Although we shall be concerned with only queries initially, a complete data-manipulation language includes not only a query language, but also a language for
database modiﬁcation
...

as well as commands to modify parts of existing tuples
...

3
...
It consists of a set of operations
that take one or two relations as input and produce a new relation as their result
...
In addition to the fundamental operations, there are
several other operations— namely, set intersection, natural join, division, and assignment
...

3
...
1 Fundamental Operations
The select, project, and rename operations are called unary operations, because they
operate on one relation
...

3
...
1
...
We use the lowercase
Greek letter sigma (σ) to denote selection
...

The argument relation is in parentheses after the σ
...
6, then the relation that results from the
preceding query is as shown in Figure 3
...

We can find all tuples in which the amount lent is more than $1200 by writing
σ_{amount > 1200}(loan)
In general, we allow comparisons using =, ≠, <, ≤, >, ≥ in the selection predicate
...
Thus, to ﬁnd those tuples pertaining to loans
of more than $1200 made by the Perryridge branch, we write
σ_{branch-name = “Perryridge” ∧ amount > 1200}(loan)
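The two selections above can be sketched in Python by treating a relation as a list of dictionaries and the predicate as a function. The helper name select is hypothetical; the rows are those of the loan relation.

```python
loan = [
    {"loan-number": "L-11", "branch-name": "Round Hill", "amount": 900},
    {"loan-number": "L-14", "branch-name": "Downtown", "amount": 1500},
    {"loan-number": "L-15", "branch-name": "Perryridge", "amount": 1500},
    {"loan-number": "L-16", "branch-name": "Perryridge", "amount": 1300},
    {"loan-number": "L-17", "branch-name": "Downtown", "amount": 1000},
    {"loan-number": "L-23", "branch-name": "Redwood", "amount": 2000},
    {"loan-number": "L-93", "branch-name": "Mianus", "amount": 500},
]

def select(r, predicate):
    """Selection: keep exactly the tuples of r that satisfy the predicate."""
    return [t for t in r if predicate(t)]

# Loans of more than $1200.
over_1200 = select(loan, lambda t: t["amount"] > 1200)

# Loans of more than $1200 made by the Perryridge branch.
perryridge_over_1200 = select(
    loan,
    lambda t: t["branch-name"] == "Perryridge" and t["amount"] > 1200,
)
```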
loan-number
L-15
L-16
Figure 3
...


The selection predicate may include comparisons between two attributes
...
To ﬁnd all customers who have the
same name as their loan ofﬁcer, we can write
σ_{customer-name = banker-name}(loan-officer)

3
...
1
...
The project operation allows us to produce this relation
...
Since a relation is a set, any duplicate rows are eliminated
...
We list those attributes that
we wish to appear in the result as a subscript to Π
...
Thus, we write the query to list all loan numbers and the amount of the
loan as
Πloan-number , amount (loan)
Figure 3
...
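A minimal sketch of projection, again assuming that a relation is a Python set of tuples with a separate schema tuple that names the positions. The helper `project` and the schema constant are illustrative names, not taken from the text. Building the result as a set is what eliminates duplicate rows.

```python
LOAN_SCHEMA = ("loan-number", "branch-name", "amount")
LOAN = {
    ("L-11", "Round Hill", 900),
    ("L-14", "Downtown", 1500),
    ("L-15", "Perryridge", 1500),
    ("L-16", "Perryridge", 1300),
    ("L-17", "Downtown", 1000),
    ("L-23", "Redwood", 2000),
    ("L-93", "Mianus", 500),
}


def project(schema, relation, attrs):
    """Pi: keep only the listed attributes; the set drops duplicate rows."""
    positions = [schema.index(a) for a in attrs]
    return {tuple(t[i] for i in positions) for t in relation}


numbers_and_amounts = project(LOAN_SCHEMA, LOAN, ["loan-number", "amount"])

# Projecting on branch-name alone shows duplicate elimination:
# seven loans collapse to five distinct branch names.
branches = project(LOAN_SCHEMA, LOAN, ["branch-name"])
print(len(LOAN), len(branches))
```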

3
...
1
...
Consider the more complicated query “Find those customers who live in Harrison
...

In general, since the result of a relational-algebra operation is of the same type
(relation) as its inputs, relational-algebra operations can be composed together into
loan-number
L-11
L-14
L-15
L-16
L-17
L-23
L-93
Figure 3
...


3
...
Composing relational-algebra operations into relational-algebra expressions is just like composing arithmetic operations (such as +, −,
∗, and ÷) into arithmetic expressions
...
2
...

3
...
1
...
Note that the customer relation does not contain the information,
since a customer does not need to have either an account or a loan at the bank
...
5) and
in the borrower relation (Figure 3
...
We know how to ﬁnd the names of all customers
with a loan in the bank:
Πcustomer -name (borrower )
We also know how to ﬁnd the names of all customers with an account in the bank:
Πcustomer -name (depositor )
To answer the query, we need the union of these two sets; that is, we need all customer names that appear in either or both of the two relations
...
So the expression needed
is
Πcustomer -name (borrower ) ∪ Πcustomer -name (depositor )
The result relation for this query appears in Figure 3
...
Notice that there are 10 tuples
in the result, even though there are seven distinct borrowers and six depositors
...
Since relations are sets, duplicate values are eliminated
...
12

Names of all customers who have either a loan or an account
...
Data Models

3
...
In general, we must ensure that unions are taken between compatible relations
...
The former is a relation of three attributes;
the latter is a relation of two
...
Such a union would not make sense in most situations
...
The relations r and s must be of the same arity
...

2
...

Note that r and s can be, in general, temporary relations that are the result of relational-algebra expressions
...
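The union example can be checked with a short Python sketch. The customer names are the ones that appear in the bank example; modeling each projected relation as a set of 1-tuples and the helper name `union` are assumptions made for illustration. Only the arity condition is checked here; the condition on attribute domains is left out.

```python
# Pi_{customer-name}(borrower) and Pi_{customer-name}(depositor) as sets of 1-tuples.
borrower_names = {(n,) for n in
                  ["Adams", "Curry", "Hayes", "Jackson", "Jones", "Smith", "Williams"]}
depositor_names = {(n,) for n in
                   ["Hayes", "Johnson", "Jones", "Lindsay", "Smith", "Turner"]}


def union(r, s):
    """r ∪ s, refusing relations whose tuples differ in arity."""
    arities = {len(t) for t in r | s}
    if len(arities) > 1:
        raise ValueError("union requires relations of the same arity")
    return r | s


either = union(borrower_names, depositor_names)
print(len(borrower_names), len(depositor_names), len(either))
```

Seven borrowers and six depositors yield ten names, because the three customers who appear in both relations are kept only once.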
2
...
5 The Set Difference Operation
The set-difference operation, denoted by −, allows us to ﬁnd tuples that are in one
relation but are not in another
...

We can ﬁnd all customers of the bank who have an account but not a loan by
writing
Πcustomer -name (depositor ) − Πcustomer -name (borrower )
The result relation for this query appears in Figure 3
...

As with the union operation, we must ensure that set differences are taken between compatible relations
...
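As a small sketch under the same assumptions as before (relations as Python sets of 1-tuples, names taken from the bank example), set difference maps directly onto Python's `-` operator on sets:

```python
depositor_names = {(n,) for n in
                   ["Hayes", "Johnson", "Jones", "Lindsay", "Smith", "Turner"]}
borrower_names = {(n,) for n in
                  ["Adams", "Curry", "Hayes", "Jackson", "Jones", "Smith", "Williams"]}

# Customers with an account but no loan.
account_only = depositor_names - borrower_names
print(sorted(n for (n,) in account_only))
```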

3
...
1
...
We write the Cartesian product of relations r1 and
r2 as r1 × r2
...
13

Customers with an account but no loan
...

Recall that a relation is by deﬁnition a subset of a Cartesian product of a set of
domains
...
However, since the same attribute name
may appear in both r1 and r2 , we need to devise a naming schema to distinguish
between these attributes
...
For example, the relation schema
for r = borrower × loan is
(borrower
...
loan-number, loan
...
branch-name, loan
...
loan-number from loan
...
For
those attributes that appear in only one of the two schemas, we shall usually drop
the relation-name preﬁx
...
We can
then write the relation schema for r as
(customer-name, borrower
...
loan-number,
branch-name, amount)
This naming convention requires that the relations that are the arguments of the
Cartesian-product operation have distinct names
...

A similar problem arises if we use the result of a relational-algebra expression in a
Cartesian product, since we shall need a name for the relation so that we can refer
to the relation’s attributes
...
2
...
7, we see how to avoid these problems by
using a rename operation
...
Thus, r is a large
relation, as you can see from Figure 3
...

Assume that we have n1 tuples in borrower and n2 tuples in loan
...
In particular, note that for some tuples t in r, it may be that
t[borrower
...
loan-number]
...
Relation R contains all tuples t for which
there is a tuple t1 in r1 and a tuple t2 in r2 for which t[R1 ] = t1 [R1 ] and t[R2 ] =
t2 [R2 ]
...
We need the information in both the loan relation and the borrower
relation to do so
...
15
...
However, the customer-name column may contain customers


customer-name  borrower.loan-number  loan.loan-number  branch-name  amount
Adams          L-16                  L-11              Round Hill   900
Adams          L-16                  L-14              Downtown     1500
Adams          L-16                  L-15              Perryridge   1500
Adams          L-16                  L-16              Perryridge   1300
Adams          L-16                  L-17              Downtown     1000
Adams          L-16                  L-23              Redwood      2000
Adams          L-16                  L-93              Mianus       500
Curry          L-93                  L-11              Round Hill   900
Curry          L-93                  L-14              Downtown     1500
Curry          L-93                  L-15              Perryridge   1500
Curry          L-93                  L-16              Perryridge   1300
Curry          L-93                  L-17              Downtown     1000
Curry          L-93                  L-23              Redwood      2000
Curry          L-93                  L-93              Mianus       500
Hayes          L-15                  L-11              Round Hill   900
Hayes          L-15                  L-14              Downtown     1500
Hayes          L-15                  L-15              Perryridge   1500
Hayes          L-15                  L-16              Perryridge   1300
Hayes          L-15                  L-17              Downtown     1000
Hayes          L-15                  L-23              Redwood      2000
Hayes          L-15                  L-93              Mianus       500
...            ...                   ...               ...          ...

Figure 3.14 Result of borrower × loan

customer-name  borrower.loan-number  loan.loan-number  branch-name  amount
Adams          L-16                  L-15              Perryridge   1500
Adams          L-16                  L-16              Perryridge   1300
Curry          L-93                  L-15              Perryridge   1500
Curry          L-93                  L-16              Perryridge   1300
Hayes          L-15                  L-15              Perryridge   1500
Hayes          L-15                  L-16              Perryridge   1300
Jackson        L-14                  L-15              Perryridge   1500
Jackson        L-14                  L-16              Perryridge   1300
Jones          L-17                  L-15              Perryridge   1500
Jones          L-17                  L-16              Perryridge   1300
Smith          L-11                  L-15              Perryridge   1500
Smith          L-11                  L-16              Perryridge   1300
Smith          L-23                  L-15              Perryridge   1500
Smith          L-23                  L-16              Perryridge   1300
Williams       L-17                  L-15              Perryridge   1500
Williams       L-17                  L-16              Perryridge   1300

Figure: Result of σ_{branch-name = “Perryridge”} (borrower × loan)
...
(If you do not see why that is true,
recall that the Cartesian product takes all possible pairings of one tuple from borrower
with one tuple of loan
...
loan-number
= loan
...
So, if we write
σ_{borrower.loan-number = loan.loan-number} (σ_{branch-name = “Perryridge”} (borrower × loan))
we get only those tuples of borrower × loan that pertain to customers who have a
loan at the Perryridge branch
...
loan-number = loan
...
16, is the correct answer to our query
...
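The three steps of this query (Cartesian product, the two selections, and the final projection on customer-name) can be traced with a short Python sketch. The rows follow the borrower and loan relations of the bank example; representing relations as sets of tuples and concatenating tuples to form the product are our modeling assumptions.

```python
from itertools import product

# borrower: (customer-name, loan-number)
BORROWER = {
    ("Adams", "L-16"), ("Curry", "L-93"), ("Hayes", "L-15"), ("Jackson", "L-14"),
    ("Jones", "L-17"), ("Smith", "L-11"), ("Smith", "L-23"), ("Williams", "L-17"),
}
# loan: (loan-number, branch-name, amount)
LOAN = {
    ("L-11", "Round Hill", 900), ("L-14", "Downtown", 1500),
    ("L-15", "Perryridge", 1500), ("L-16", "Perryridge", 1300),
    ("L-17", "Downtown", 1000), ("L-23", "Redwood", 2000),
    ("L-93", "Mianus", 500),
}

# borrower × loan: every borrower tuple paired with every loan tuple. Schema:
# (customer-name, borrower.loan-number, loan.loan-number, branch-name, amount)
pairs = {b + l for b, l in product(BORROWER, LOAN)}

# sigma_{branch-name = "Perryridge"}
at_perryridge = {t for t in pairs if t[3] == "Perryridge"}

# sigma_{borrower.loan-number = loan.loan-number}
matching = {t for t in at_perryridge if t[1] == t[2]}

# Pi_{customer-name}
customers = {(t[0],) for t in matching}

print(len(pairs), len(at_perryridge), sorted(customers))
```

With n1 = 8 borrower tuples and n2 = 7 loan tuples, the product holds 8 × 7 = 56 tuples, of which 16 survive the branch selection and only 2 survive the loan-number match.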
2
...
7 The Rename Operation
Unlike relations in the database, the results of relational-algebra expressions do not
have a name that we can use to refer to them
...
Data Models

3
...
16 Result of Π_{customer-name} (σ_{borrower.loan-number = loan.loan-number} (σ_{branch-name = “Perryridge”} (borrower × loan)))
...
Given a relational-algebra expression E, the expression
ρx (E)
returns the result of expression E under the name x
...
Thus,
we can also apply the rename operation to a relation r to get the same relation under
a new name
...
Assume that a relational-algebra expression E has arity n
...
,An ) (E)
returns the result of expression E under the name x, and with the attributes renamed
to A1 , A2 ,
...

To illustrate renaming a relation, we consider the query “Find the largest account
balance in the bank
...

Step 1: To compute the temporary relation, we need to compare the values of
all account balances
...
First, we need to devise a mechanism to distinguish between
the two balance attributes
...

balance
500
400
700
750
350
Figure 3
...
balance (σaccount
...
balance (account × ρd (account)))
...

balance
900
Figure 3
...

We can now write the temporary relation that consists of the balances that are not
the largest:
Πaccount
...
balance

< d
...
The result contains all
balances except the largest one
...
17 shows this relation
...
balance (σaccount
...
balance

(account × ρd (account)))

Figure 3
...
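The two-step strategy for the largest balance can be sketched as follows. The balance values are the ones visible in the figures of this example; reducing the account relation to its balance column and the variable names are simplifying assumptions. The second copy of the relation plays the role of ρ_d (account).

```python
from itertools import product

# Pi_{balance}(account), as a set of 1-tuples.
BALANCES = {(500,), (400,), (700,), (750,), (350,), (900,)}

# account × rho_d(account): pair every balance with every balance of the copy d.
paired = set(product(BALANCES, BALANCES))

# Step 1: balances that are smaller than some other balance (not the largest).
not_largest = {a for a, d in paired if a[0] < d[0]}

# Step 2: remove them from the set of all balances.
largest = BALANCES - not_largest

print(sorted(not_largest), largest)
```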

As one more example of the rename operation, consider the query “Find the names
of all customers who live on the same street and in the same city as Smith
...
In the following query, we use the rename
operation on the preceding expression to give its result the name smith-addr, and to
rename its attributes to street and city, instead of customer-street and customer-city:
Πcustomer
...
customer -street =smith-addr
...
customer -city=smith-addr
...
4, appears in Figure 3
...

The rename operation is not strictly required, since it is possible to use a positional
notation for attributes
...
refer to the ﬁrst attribute, the second attribute, and
so on
...

customer-name
Curry
Smith
Figure 3
...


The following relational-algebra expression illustrates the use of positional notation
with the unary operator σ:
σ$2=$3 (R × R)
If a binary operation needs to distinguish between its two operand relations, a similar
positional notation can be used for relation names as well
...
However, the
positional notation is inconvenient for humans, since the position of the attribute is a
number, rather than an easy-to-remember attribute name
...

3
...
2 Formal Deﬁnition of the Relational Algebra
The operations in Section 3
...
1 allow us to give a complete deﬁnition of an expression
in the relational algebra
...

A general expression in relational algebra is constructed out of smaller subexpressions
...
Then, these are all relational-algebra expressions:
• E1 ∪ E2
• E1 − E2
• E1 × E2
• σP (E1 ), where P is a predicate on attributes in E1
• ΠS (E1 ), where S is a list consisting of some of the attributes in E1
• ρx (E1 ), where x is the new name for the result of E1

3
...
3 Additional Operations
The fundamental operations of the relational algebra are sufﬁcient to express any
relational-algebra query
...
Therefore, we deﬁne additional operations that do not add any power to the algebra, but simplify common
queries
...

1
...
3, we introduce operations that extend the power of the relational algebra, to handle null
and aggregate values
...

3
...
3
...
Suppose that we wish to ﬁnd all customers who have both a loan and an
account
...
20
...
It is simply more convenient to write r ∩ s than to write
r − (r − s)
...
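That set intersection adds no power can be checked directly. In this sketch (relations as Python sets of 1-tuples, names from the bank example) the expression r − (r − s) is compared against Python's built-in intersection:

```python
borrower_names = {(n,) for n in
                  ["Adams", "Curry", "Hayes", "Jackson", "Jones", "Smith", "Williams"]}
depositor_names = {(n,) for n in
                   ["Hayes", "Johnson", "Jones", "Lindsay", "Smith", "Turner"]}

r, s = borrower_names, depositor_names

# r ∩ s rewritten with two set differences.
both = r - (r - s)

print(sorted(n for (n,) in both), both == (r & s))
```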
2
...
2 The Natural-Join Operation
It is often desirable to simplify certain queries that require a Cartesian product
...
Consider the query “Find the names of all customers
who have a loan at the bank, along with the loan number and the loan amount
...
Then, we select
those tuples that pertain to only the same loan-number, followed by the projection of
the resulting customer-name, loan-number, and amount:
Πcustomer -name, loan
...
loan-number = loan
...
It is denoted by the “join” symbol ⋈
...

Although the deﬁnition of natural join is complicated, the operation is easy to
apply
...
” We express this query
customer-name
Hayes
Jones
Smith
Figure 3
...


customer-name
Adams
Curry
Hayes
Jackson
Jones
Smith
Smith
Williams
Figure 3
...

by using the natural join as follows:
Π_{customer-name, loan-number, amount} (borrower ⋈ loan)

Since the schemas for borrower and loan (that is, Borrower-schema and Loan-schema)
have the attribute loan-number in common, the natural-join operation considers only
pairs of tuples that have the same value on loan-number
...
After performing the projection, we obtain the relation in Figure 3
...

Consider two relation schemas R and S — which are, of course, lists of attribute
names
...
Similarly, those attribute names that
appear in R but not S are denoted by R − S, whereas S − R denotes those attribute
names that appear in S but not in R
...

We are now ready for a formal deﬁnition of the natural join
...
The natural join of r and s, denoted by r ⋈ s, is a relation on schema
R ∪ S formally defined as follows:
r ⋈ s = Π_{R ∪ S} (σ_{r.A1 = s.A1 ∧ r.A2 = s.A2 ∧ ... ∧ r.An = s.An} (r × s))
where R ∩ S = {A1, A2, ..., An}
...
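The definition can be turned into a small Python sketch. Here a tuple is modeled as a dict from attribute name to value, so that the common attributes R ∩ S can be found by intersecting the key sets; this representation, the helper name `natural_join`, and the reduced sample data are assumptions made for illustration.

```python
def natural_join(r, s):
    """Natural join of two relations given as lists of dicts keyed by attribute name."""
    result = []
    for tr in r:
        for ts in s:
            common = tr.keys() & ts.keys()          # attributes in R ∩ S
            if all(tr[a] == ts[a] for a in common):  # equality on every common attribute
                row = {**tr, **ts}                   # tuple on schema R ∪ S
                if row not in result:                # relations are sets: no duplicates
                    result.append(row)
    return result


borrower = [
    {"customer-name": "Adams", "loan-number": "L-16"},
    {"customer-name": "Hayes", "loan-number": "L-15"},
    {"customer-name": "Smith", "loan-number": "L-23"},
]
loan = [
    {"loan-number": "L-15", "branch-name": "Perryridge", "amount": 1500},
    {"loan-number": "L-16", "branch-name": "Perryridge", "amount": 1300},
    {"loan-number": "L-23", "branch-name": "Redwood", "amount": 2000},
    {"loan-number": "L-93", "branch-name": "Mianus", "amount": 500},
]

joined = natural_join(borrower, loan)
for row in joined:
    print(row["customer-name"], row["loan-number"], row["amount"])
```

When the two schemas share no attribute, the list of common attributes is empty, the equality test is vacuously true, and the function returns the Cartesian product.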

branch-name
Brighton
Perryridge
Figure 3
...


3
...

Π_{branch-name} (σ_{customer-city = “Harrison”} (customer ⋈ account ⋈ depositor))

The result relation for this query appears in Figure 3
...

Notice that we wrote customer ⋈ account ⋈ depositor without inserting
parentheses to specify the order in which the natural-join operations on the
three relations should be executed
...
That is, the natural join is associative
...

Π_{customer-name} (borrower ⋈ depositor)

Note that in Section 3
...
3
...
We repeat this expression here
...
20
...

• Let r(R) and s(S) be relations without any attributes in common; that is,
R ∩ S = ∅
...
) Then, r ⋈ s = r × s
...
Consider
relations r(R) and s(S), and let θ be a predicate on attributes in the schema R ∪ S
...
2
...
3 The Division Operation
The division operation, denoted by ÷, is suited to queries that include the phrase
“for all
...
We can obtain all branches in Brooklyn by the expression
r1 = Πbranch-name (σbranch-city = “Brooklyn” (branch))
The result relation for this expression appears in Figure 3
...


branch-name
Brighton
Downtown
Figure 3
...

We can ﬁnd all (customer-name, branch-name) pairs for which the customer has an
account at a branch by writing
r2 = Π_{customer-name, branch-name} (depositor ⋈ account)

Figure 3
...

Now, we need to ﬁnd customers who appear in r2 with every branch name in
r1
...
We
formulate the query by writing
Π_{customer-name, branch-name} (depositor ⋈ account)
÷ Π_{branch-name} (σ_{branch-city = “Brooklyn”} (branch))
The result of this expression is a relation that has the schema (customer-name) and that
contains the tuple (Johnson)
...
The relation r ÷ s is a relation on schema R − S (that
is, on the schema containing all attributes of schema R that are not in schema S)
...
t is in ΠR−S (r)
2
...
tr [S] = ts [S]
b
...
Let r(R) and s(S) be given, with S ⊆ R:
r ÷ s = ΠR−S (r) − ΠR−S ((ΠR−S (r) × s) − ΠR−S,S (r))
customer-name
Hayes
Johnson
Johnson
Jones
Lindsay
Smith
Turner
Figure 3
...


3
...
The expression on the right
side of the set difference operator
ΠR−S ((ΠR−S (r) × s) − ΠR−S,S (r))
serves to eliminate those tuples that fail to satisfy the second condition of the deﬁnition of division
...
Consider ΠR−S (r) × s
...
The expression
ΠR−S,S (r) merely reorders the attributes of r
...
If a tuple tj is in
ΠR−S ((ΠR−S (r) × s) − ΠR−S,S (r))
then there is some tuple ts in s that does not combine with tuple tj to form a tuple in
r
...
It is these
values that we eliminate from ΠR−S (r)
...
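The division formula can be followed step by step in Python. This sketch fixes R = (customer-name, branch-name) and S = (branch-name) and models relations as sets of tuples. The customer names and the two Brooklyn branches come from the example; the branch assigned to each customer is an assumption, chosen so that only Johnson has an account at both branches, as the text states.

```python
def divide(r, s):
    """r ÷ s for r on schema (X, Y) and s on schema (Y)."""
    candidates = {(x,) for x, _ in r}                          # Pi_{R-S}(r)
    all_pairs = {(x, y) for (x,) in candidates for (y,) in s}  # Pi_{R-S}(r) × s
    missing = all_pairs - r                                    # pairs that r lacks
    disqualified = {(x,) for x, _ in missing}                  # Pi_{R-S} of those pairs
    return candidates - disqualified


# r1: branches in Brooklyn.
r1 = {("Brighton",), ("Downtown",)}

# r2: (customer-name, branch-name) pairs; branch assignments are assumed.
r2 = {
    ("Hayes", "Perryridge"), ("Johnson", "Downtown"), ("Johnson", "Brighton"),
    ("Jones", "Brighton"), ("Lindsay", "Redwood"), ("Smith", "Mianus"),
    ("Turner", "Round Hill"),
}

print(divide(r2, r1))
```

The intermediate variables correspond to temp1 and temp2 in the assignment form of the same expression.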
2
...
4 The Assignment Operation
It is convenient at times to write a relational-algebra expression by assigning parts of
it to temporary relation variables
...
To illustrate this operation, consider the
deﬁnition of division in Section 3
...
3
...
We could write r ÷ s as
temp1 ← ΠR−S (r)
temp2 ← ΠR−S ((temp1 × s) − ΠR−S,S (r))
result = temp1 − temp2
The evaluation of an assignment does not result in any relation being displayed to
the user
...
This relation variable may be used in subsequent
expressions
...
For relational-algebra queries, assignment must
always be made to a temporary relation variable
...
We discuss this issue in Section 3
...
Note
that the assignment operation does not provide any additional power to the algebra
...

3
...
A simple
extension is to allow arithmetic operations as part of projection
...

3
...
25

limit  credit-balance
2000   1750
1500   1500
6000   700
2000   400

The credit-info relation
...
Another important extension is the outer-join operation, which
allows relational-algebra expressions to deal with null values, which model missing
information
...
3
...
The generalized projection operation has the form
ΠF1 ,F2 ,
...
, Fn is an arithmetic expression involving constants and attributes in the schema of E
...

For example, suppose we have a relation credit-info, as in Figure 3
...
If we want to
ﬁnd how much more each person can spend, we can write the following expression:
Π_{customer-name, limit − credit-balance} (credit-info)

The attribute resulting from the expression limit − credit -balance does not have a
name
...
As a notational convenience, renaming of attributes can be
combined with generalized projection as illustrated below:
Π_{customer-name, (limit − credit-balance) as credit-available} (credit-info)

The second attribute of this generalized projection has been given the name credit-available
...
26 shows the result of applying this expression to the relation in
Figure 3
...
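A minimal sketch of generalized projection with renaming, assuming tuples are dicts and each output attribute is described by a name and a function of the input tuple. The helper `generalized_project` is an illustrative name; the pairing of customer names with limit and credit-balance rows is inferred from the stated results.

```python
CREDIT_INFO = [
    {"customer-name": "Curry", "limit": 2000, "credit-balance": 1750},
    {"customer-name": "Hayes", "limit": 1500, "credit-balance": 1500},
    {"customer-name": "Jones", "limit": 6000, "credit-balance": 700},
    {"customer-name": "Smith", "limit": 2000, "credit-balance": 400},
]


def generalized_project(relation, expressions):
    """Evaluate each named expression on every tuple; drop duplicate result rows."""
    result = []
    for row in relation:
        out = {name: f(row) for name, f in expressions.items()}
        if out not in result:
            result.append(out)
    return result


available = generalized_project(CREDIT_INFO, {
    "customer-name": lambda t: t["customer-name"],
    # (limit − credit-balance) as credit-available
    "credit-available": lambda t: t["limit"] - t["credit-balance"],
})

for row in available:
    print(row["customer-name"], row["credit-available"])
```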

3
...
2 Aggregate Functions
Aggregate functions take a collection of values and return a single value as a result
...
Thus, the function sum applied on the collection
{1, 1, 3, 4, 4, 11}

customer-name  credit-available
Curry          250
Jones          5300
Smith          1600
Hayes          0

Figure: The result of Π_{customer-name, (limit − credit-balance) as credit-available} (credit-info)
...
The aggregate function avg returns the average of the values
...
The aggregate function count returns the number of the elements in the collection, and returns 6 on
the preceding collection
...

The collections on which aggregate functions operate can have multiple occurrences of a value; the order in which the values appear is not relevant
...
Sets are a special case of multisets where there is only one
copy of each element
...
27, for part-time employees
...
The relational-algebra expression
for this query is:
𝒢_{sum(salary)} (pt-works)
The symbol 𝒢 is the letter G in calligraphic font; read it as “calligraphic G
...
The result of the expression
above is a relation with a single attribute, containing a single row with a numerical
value corresponding to the sum of all the salaries of all employees working part-time
in the bank
...
27

branch-name  salary
Perryridge   1500
Perryridge   1300
Perryridge   5300
Downtown     1500
Downtown     1300
Downtown     2500
Austin       1500
Austin       1600
The pt-works relation
...
Data Models

3
...
If we do want to eliminate duplicates, we use the
same function names as before, with the addition of the hyphenated string “distinct”
appended to the end of the function name (for example, count-distinct)
...
”
In this case, a branch name counts only once, regardless of the number of employees
working that branch
...
27, the result of this query is a single row containing the
value 3
...
To do so, we
need to partition the relation pt-works into groups based on the branch, and to apply
the aggregate function on each group
...
Figure 3
...
The expression sum(salary) in
the right-hand subscript of G indicates that for each group of tuples (that is, each
branch), the aggregation function sum must be applied on the collection of values of
the salary attribute
...
29
...
,Gn GF1 (A1 ), F2 (A2 ),
...
, Gn constitute a list of attributes on which to group; each Fi is an aggregate function; and each Ai is an attribute name

employee-name
Rao
Sato
Johnson
Loreena
Peterson
Adams
Brown
Gopal
Figure 3
...


3
...
29

Result of _{branch-name} 𝒢_{sum(salary)} (pt-works)
...
The meaning of the operation is as follows
...
All tuples in a group have the same values for G1 , G2 ,
...

2
...
, Gn
...
, Gn
...
, gn ), the result has a tuple (g1 , g2 ,
...
, am ) where, for
each i, ai is the result of applying the aggregate function Fi on the multiset of values
for attribute Ai in the group
...
, Gn can
be empty, in which case there is a single group containing all tuples in the relation
...

Going back to our earlier example, if we want to ﬁnd the maximum salary for
part-time employees at each branch, in addition to the sum of the salaries, we write
the expression
_{branch-name} 𝒢_{sum(salary), max(salary)} (pt-works)

As in generalized projection, the result of an aggregation operation does not have a
name
...
As
a notational convenience, attributes of an aggregation operation can be renamed as
illustrated below:
_{branch-name} 𝒢_{sum(salary) as sum-salary, max(salary) as max-salary} (pt-works)

Figure 3
...
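Grouping and aggregation can be sketched in Python as follows. The (branch-name, salary) pairs are those of the pt-works relation with the employee-name column left out; keeping them in a list (a multiset) rather than a set and the helper name `group_aggregate` are assumptions made for illustration.

```python
from collections import defaultdict

# pt-works reduced to (branch-name, salary); a list, so equal values may repeat.
PT_WORKS = [
    ("Perryridge", 1500), ("Perryridge", 1300), ("Perryridge", 5300),
    ("Downtown", 1500), ("Downtown", 1300), ("Downtown", 2500),
    ("Austin", 1500), ("Austin", 1600),
]


def group_aggregate(relation, functions):
    """Partition tuples by branch-name, then apply each function to a group's salaries."""
    groups = defaultdict(list)
    for branch, salary in relation:
        groups[branch].append(salary)
    return {branch: tuple(f(vals) for f in functions) for branch, vals in groups.items()}


# branch-name G sum(salary), max(salary) (pt-works)
per_branch = group_aggregate(PT_WORKS, [sum, max])

# G sum(salary) (pt-works): empty grouping list, a single group with all tuples.
total = sum(salary for _, salary in PT_WORKS)

# G count-distinct(branch-name) (pt-works)
branch_count = len({branch for branch, _ in PT_WORKS})

print(per_branch, total, branch_count)
```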

branch-name  sum-salary  max-salary
Austin       3100        1600
Downtown     5300        2500
Perryridge   8100        5300
Figure 3
...


employee-name
Coyote
Rabbit
Smith
Williams
employee-name
Coyote
Rabbit
Gates
Williams
Figure 3
...

3
...
3 Outer Join
The outer-join operation is an extension of the join operation to deal with missing
information
...
31
...
A possible approach would be to use the natural-join operation as follows:
employee ⋈ ft-works

The result of this expression appears in Figure 3
...
Notice that we have lost the street
and city information about Smith, since the tuple describing Smith is absent from
the ft-works relation; similarly, we have lost the branch name and salary information
about Gates, since the tuple describing Gates is absent from the employee relation
...
There are
actually three forms of the operation: left outer join, denoted ⟕; right outer join, denoted ⟖; and full outer join, denoted ⟗
...
The results of the expressions
employee-name
Coyote
Rabbit
Williams

street
Toon
Tunnel
Seaview

Figure 3
...

salary
1500
1300
1500

employee-name  street    city          branch-name  salary
Coyote         Toon      Hollywood     Mesa         1500
Rabbit         Tunnel    Carrotville   Mesa         1300
Williams       Seaview   Seattle       Redmond      1500
Smith          Revolver  Death Valley  null         null

Figure: Result of employee ⟕ ft-works
...
33, 3
...
35, respectively
...
In Figure 3
...
All information from
the left relation is present in the result of the left outer join
...
In Figure 3
...
Thus, all information from the right relation is present in the
result of the right outer join
...
Figure 3
...

Since outer join operations may generate results containing null values, we need
to specify how the different relational-algebra operations deal with null values
...
3
...

It is interesting to note that the outer join operations can be expressed by the basic
relational-algebra operations
...
, null)}

where the constant relation {(null,
...
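The construction of the left outer join from basic operations can be sketched as follows: take the natural join, find the left-hand tuples that did not take part in it, and pad them with nulls. Relations are sets of tuples whose first attribute is employee-name, and Python's `None` stands in for null; both are modeling assumptions. The branch shown for Gates is also assumed, since it is not visible in this excerpt.

```python
# employee: (employee-name, street, city)
EMPLOYEE = {
    ("Coyote", "Toon", "Hollywood"), ("Rabbit", "Tunnel", "Carrotville"),
    ("Smith", "Revolver", "Death Valley"), ("Williams", "Seaview", "Seattle"),
}
# ft-works: (employee-name, branch-name, salary)
FT_WORKS = {
    ("Coyote", "Mesa", 1500), ("Rabbit", "Mesa", 1300),
    ("Gates", "Redmond", 5300), ("Williams", "Redmond", 1500),
}


def natural_join(r, s):
    """Join on the shared first attribute (employee-name)."""
    return {tr + ts[1:] for tr in r for ts in s if tr[0] == ts[0]}


def left_outer_join(r, s):
    joined = natural_join(r, s)
    r_arity = len(next(iter(r)))
    s_extra = len(next(iter(s))) - 1
    matched = {t[:r_arity] for t in joined}               # Pi_R(r join s)
    padded = {t + (None,) * s_extra for t in r - matched}  # (r − matched) × {(null, ..., null)}
    return joined | padded


result = left_outer_join(EMPLOYEE, FT_WORKS)
for row in sorted(result, key=lambda t: t[0]):
    print(row)
```

The right outer join is symmetric, padding unmatched tuples of the right relation instead, and the full outer join is the union of the two.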

employee-name
Coyote
Rabbit
Williams
Gates

street
Toon
Tunnel
Seaview
null

Figure 3
...

salary
1500
1300
1500
5300


employee-name
Coyote
Rabbit
Williams
Smith
Gates

street
Toon
Tunnel
Seaview
Revolver
null

Figure 3
...

3
...
4 Null Values∗∗
In this section, we deﬁne how the various relational algebra operations deal with null
values and complications that arise when a null value participates in an arithmetic
operation or in a comparison
...
Operations and comparisons on null values should therefore be avoided,
where possible
...

Similarly, any comparison (such as <, <=, >, >=, =) involving a null value evaluates to the special value unknown; we cannot say for sure whether the result of the
comparison is true or false, so we say that the result is the new truth value unknown
...
We must therefore deﬁne how the three Boolean operations deal with the truth value unknown
...

• or: (true or unknown) = true; (false or unknown) = unknown; (unknown or unknown) = unknown
...
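The three-valued logic can be sketched in Python with `None` standing in for unknown (and for null in data). The or-rules are the ones listed above; the and-rules and not-rule follow the standard three-valued convention. The function names and the hypothetical loan L-40 are illustrative assumptions.

```python
UNKNOWN = None  # third truth value; None also models a null data value


def and3(a, b):
    if a is False or b is False:
        return False
    if a is UNKNOWN or b is UNKNOWN:
        return UNKNOWN
    return True


def or3(a, b):
    if a is True or b is True:
        return True
    if a is UNKNOWN or b is UNKNOWN:
        return UNKNOWN
    return False


def not3(a):
    return UNKNOWN if a is UNKNOWN else (not a)


def greater(x, y):
    """A comparison involving null evaluates to unknown."""
    return UNKNOWN if x is None or y is None else x > y


def select(relation, predicate):
    """Keep a tuple only when the predicate is true; unknown and false both reject."""
    return [t for t in relation if predicate(t) is True]


loans = [("L-15", 1500), ("L-40", None), ("L-11", 900)]  # L-40 is hypothetical
kept = select(loans, lambda t: greater(t[1], 1200))
print(kept)
```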

We are now in a position to outline how the different relational operations deal
with null values
...

• select: The selection operation evaluates predicate P in σP (E) on each tuple t
in E
...
Otherwise,
if the predicate returns unknown or false, t is not added to the result
...
Thus,
the deﬁnition of how selection handles nulls also deﬁnes how join operations
handle nulls
...


3
...
Thus, if two tuples in the projection result are exactly
the same, and both have nulls in the same ﬁelds, they are treated as duplicates
...

• union, intersection, difference: These operations treat nulls just as the projection operation does; they treat tuples that have the same values on all ﬁelds as
duplicates even if some of the ﬁelds have null values in both tuples
...

• generalized projection: We outlined how nulls are handled in expressions
at the beginning of Section 3
...
4
...

• aggregate: When nulls occur in grouping attributes, the aggregate operation
treats them just as in projection: If two tuples are the same on all grouping
attributes, the operation places them in the same group, even if some of their
attribute values are null
...
If the resultant multiset is empty,
the aggregate result is null
...
However, this would mean that
a single unknown value in a large group could make the aggregate result on
the group null, and we would lose a lot of useful information
...
Such tuples may be added to the
result (depending on whether the operation is ⟕, ⟖, or ⟗), padded with
nulls
...
4 Modiﬁcation of the Database
We have limited our attention until now to the extraction of information from the
database
...

We express database modiﬁcations by using the assignment operation
...
2
...

3
...
1 Deletion
We express a delete request in much the same way as a query
...
We


can delete only whole tuples; we cannot delete values on only particular attributes
...

Here are several examples of relational-algebra delete requests:
• Delete all of Smith’s account records
...

loan ← loan − σ_{amount ≥ 0 and amount ≤ 50} (loan)
• Delete all accounts at branches located in Needham
...

3
...
2 Insertion
To insert data into a relation, we either specify a tuple to be inserted or write a query
whose result is a set of tuples to be inserted
...
Similarly, tuples inserted
must be of the correct arity
...
We express the insertion
of a single tuple by letting E be a constant relation containing one tuple
...
We write
account ← account ∪ {(A-973, “Perryridge”, 1200)}
depositor ← depositor ∪ {(“Smith”, A-973)}
More generally, we might want to insert tuples on the basis of the result of a query
...
Let the loan number serve as the account number
for this savings account
...

Instead of specifying a tuple as we did earlier, we specify a set of tuples that is inserted into both the account and depositor relation
...
Each tuple in the depositor
relation has as customer-name the name of the loan customer who is being given the
new account and the same account number as the corresponding account tuple
...
4
...
We can use the generalized-projection operator to do this task:
r ← ΠF1 ,F2 ,
...

If we want to select some tuples from r and to update only them, we can use
the following expression; here, P denotes the selection condition that chooses which
tuples to update:
r ← ΠF1 ,F2 ,
...
We write
account ← Πaccount-number, branch-name, balance

∗1
...
We write
account ← ΠAN,BN, balance ∗1
...
05 (σbalance≤10000 (account))
where the abbreviations AN and BN stand for account-number and branch-name, respectively
...
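Deletion, insertion, and updating all reduce to assigning a new value to a relation variable, which a short Python sketch makes concrete. Relations are sets of tuples; loan L-99 and account A-500 are hypothetical rows added so that each step has a visible effect. Carrying the unselected accounts over unchanged is an assumption of this sketch, since the text's treatment of larger balances is not shown in this excerpt.

```python
# loan: (loan-number, branch-name, amount); L-99 is hypothetical.
loan = {("L-11", "Round Hill", 900), ("L-99", "Downtown", 40)}

# Deletion: loan <- loan − sigma_{amount >= 0 and amount <= 50}(loan)
loan = loan - {t for t in loan if 0 <= t[2] <= 50}

# account: (account-number, branch-name, balance); A-500 is hypothetical.
account = {("A-101", "Downtown", 500), ("A-500", "Perryridge", 20000)}

# Insertion: account <- account ∪ {(A-973, "Perryridge", 1200)}
account = account | {("A-973", "Perryridge", 1200)}

# Update: 5 percent interest on balances of at most 10000, via generalized
# projection over the selected tuples, unioned with the untouched remainder.
selected = {t for t in account if t[2] <= 10000}
updated = {(an, bn, bal * 105 / 100) for an, bn, bal in selected}
account = updated | (account - selected)

print(loan)
print(sorted(account))
```

The factor is written as 105 / 100 so that the sample balances produce exact results in floating point.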
5 Views
In our examples up to this point, we have operated at the logical-model level
...

It is not desirable for all users to see the entire logical model
...
Consider a person who
needs to know a customer’s loan number and branch name, but has no need to see
the loan amount
...


An employee in the advertising department, for example, might like to see a relation
consisting of the customers who have either an account or a loan at the bank, and
the branches with which they do business
...
It is possible to support a large number of views on
top of any given set of actual relations
...
5
...
To deﬁne a view, we must give
the view a name, and must state the query that computes the view
...
The view
name is represented by v
...
We
wish this view to be called all-customer
...
Using the view all-customer, we can ﬁnd all customers
of the Perryridge branch by writing
Π_{customer-name} (σ_{branch-name = “Perryridge”} (all-customer))
Recall that we wrote the same query in Section 3
...
1 without using views
...
We study the issue of update
operations on views in Section 3
...
2
...
Suppose
that we deﬁne relation r1 as follows:
r1 ← Π_{branch-name, customer-name} (depositor ⋈ account)
∪ Π_{branch-name, customer-name} (borrower ⋈ loan)
We evaluate the assignment operation once, and r1 does not change when we update the relations depositor, account, loan, or borrower
...

Intuitively, at any given time, the set of tuples in the view relation is the result of
evaluation of the query expression that deﬁnes the view at that time
...

Thus, if a view relation is computed and stored, it may become out of date if the
relations used to deﬁne it are modiﬁed
...
When we deﬁne a view, the database system stores the deﬁnition of the
view itself, rather than the result of evaluation of the relational-algebra expression
that deﬁnes the view
...
Thus, whenever we evaluate the query, the view relation
gets recomputed
...
Such views are called materialized views
...
5
...
Of course, the beneﬁts to queries
from the materialization of a view must be weighed against the storage costs and the
added overhead for updates
...
5
...
The difﬁculty is that a modiﬁcation
to the database expressed in terms of a view must be translated to a modiﬁcation to
the actual relations in the logical model of the database
...
Let loan-branch be the view given to the clerk
...

However, to insert a tuple into loan, we must have some value for amount
...

• Insert a tuple (L-37, “Perryridge”, null) into the loan relation
...

3
...
36

amount
900
1500
1500
1300
1000
2000
500
1900

loan-number
L-16
L-93
L-15
L-14
L-17
L-11
L-23
L-17
null

Tuples inserted into loan and borrower
...

Consider the following insertion through this view:
loan-info ← loan-info ∪ {(“Johnson”, 1900)}
The only possible method of inserting tuples into the borrower and loan relations is to
insert (“Johnson”, null) into borrower and (null, null, 1900) into loan
...
36
...
Thus, there is no way to update the relations borrower and loan by using nulls
to get the desired update on loan-info
...
Different database systems specify different
conditions under which they permit updates on view relations; see the database
system manuals for details
...

3
...
3 Views Deﬁned by Using Other Views
In Section 3
...
1 we mentioned that view relations may appear in any place that a
relation name may appear, except for restrictions on the use of views in update expressions
...
Thus, one view may be used in the expression deﬁning another view
...

View expansion is one way to deﬁne the meaning of views deﬁned in terms of
other views
...
For example, if v1 is used in the deﬁnition of v2, v2 is used in the
deﬁnition of v3, and v3 is used in the deﬁnition of v1, then each of v1, v2, and v3
is recursive
...
2
...
A view relation stands for the expression deﬁning the view, and therefore
a view relation can be replaced by the expression that deﬁnes it
...
Hence, view expansion of an expression
repeats the replacement step as follows:
repeat
Find any view relation vi in e1
Replace the view relation vi by the expression deﬁning vi
until no more view relations are present in e1
As long as the view deﬁnitions are not recursive, this loop will terminate
...

As an illustration of view expansion, consider the following expression:
σ_{customer-name = “John”} (perryridge-customer)
The view-expansion procedure initially generates
σ_{customer-name = “John”} (Π_{customer-name} (σ_{branch-name = “Perryridge”} (all-customer)))
It then generates
σ_{customer-name = “John”} (Π_{customer-name} (σ_{branch-name = “Perryridge”}
(Π_{branch-name, customer-name} (depositor ⋈ account)
∪ Π_{branch-name, customer-name} (borrower ⋈ loan))))
There are no more uses of view relations, and view expansion terminates
...
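The replacement loop can be sketched in Python. This is only an illustration under stated assumptions: expressions are modeled as nested tuples of the form (operator, parameters, subexpressions...), and the names VIEWS, expand, and relation_names are hypothetical helpers introduced here, not part of any database system. The two view definitions mirror the expansion steps shown above.

```python
# An expression is either a string (a relation or view name) or a tuple
# (operator, parameters, child, child, ...). All names are illustrative.
VIEWS = {
    "perryridge-customer": (
        "project", ("customer-name",),
        ("select", 'branch-name = "Perryridge"', "all-customer"),
    ),
    "all-customer": (
        "union", None,
        ("project", ("branch-name", "customer-name"),
         ("join", None, "depositor", "account")),
        ("project", ("branch-name", "customer-name"),
         ("join", None, "borrower", "loan")),
    ),
}


def expand(expr, views=VIEWS, active=()):
    """Replace every view name in expr by the expression defining it."""
    if isinstance(expr, str):
        if expr not in views:
            return expr                      # base relation: nothing to do
        if expr in active:                   # v1 uses v2 uses ... uses v1
            raise ValueError(f"recursive view definition: {expr}")
        return expand(views[expr], views, active + (expr,))
    op, params, *children = expr
    return (op, params, *(expand(c, views, active) for c in children))


def relation_names(expr):
    """Collect the relation names that remain in an expression."""
    if isinstance(expr, str):
        return {expr}
    _op, _params, *children = expr
    return set().union(*(relation_names(c) for c in children))


query = ("select", 'customer-name = "John"', "perryridge-customer")
expanded = expand(query)
print(sorted(relation_names(expanded)))
```

After expansion only base relations remain. The sketch rejects a recursive definition with an error instead of looping forever, which is one possible way to handle the non-terminating case.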
3
...
6 The Tuple Relational Calculus
When we write a relational-algebra expression, we provide a sequence of procedures
that generates the answer to our query
...
It describes the desired information without giving
a speciﬁc procedure for obtaining that information
...
Following our
earlier notation, we use t[A] to denote the value of tuple t on attribute A, and we use
t ∈ r to denote that tuple t is in relation r
...
2
...
6
...
To write this query in the tuple relational calculus, we need to write
an expression for a relation on the schema (loan-number)
...
To
express this request, we need the construct “there exists” from mathematical logic
...
”
Using this notation, we can write the query “Find the loan number for each loan
of an amount greater than $1200” as
{t | ∃ s ∈ loan (t[loan-number ] = s[loan-number ]
∧ s[amount] > 1200)}
In English, we read the preceding expression as “The set of all tuples t such that there
exists a tuple s in relation loan for which the values of t and s for the loan-number
attribute are equal, and the value of s for the amount attribute is greater than $1200
...
Thus, the result is a relation on (loan-number)
...
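The query above maps naturally onto a set comprehension. The following Python sketch assumes a small loan relation whose rows are sample values chosen for illustration; each tuple s is represented as a dictionary keyed by attribute name.

```python
# Illustrative loan relation (sample rows are assumptions).
loan = [
    {"loan-number": "L-11", "branch-name": "Round Hill", "amount": 900},
    {"loan-number": "L-14", "branch-name": "Downtown", "amount": 1500},
    {"loan-number": "L-15", "branch-name": "Perryridge", "amount": 1500},
    {"loan-number": "L-16", "branch-name": "Perryridge", "amount": 1300},
    {"loan-number": "L-17", "branch-name": "Downtown", "amount": 1000},
    {"loan-number": "L-23", "branch-name": "Redwood", "amount": 2000},
    {"loan-number": "L-93", "branch-name": "Mianus", "amount": 500},
]

# {t | ∃ s ∈ loan (t[loan-number] = s[loan-number] ∧ s[amount] > 1200)}
# Each result tuple t carries the single attribute loan-number.
result = {(s["loan-number"],) for s in loan if s["amount"] > 1200}
print(sorted(result))
```

The comprehension states which tuples belong to the result without prescribing an evaluation order, which is the sense in which the calculus is nonprocedural.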
” This query is slightly more complex than the previous queries,
since it involves two relations: borrower and loan
...
We write the query as follows:

...
” Tuple variable u ensures that the
customer is a borrower at the Perryridge branch
...
Figure 3
...

To ﬁnd all customers who have a loan, an account, or both at the bank, we used
the union operation in the relational algebra
...

• The customer-name appears in some tuple of the depositor relation as a depositor of the bank
...
The result of this query appeared earlier in Figure 3
...

If we now want only those customers who have both an account and a loan at the
bank, all we need to do is to change the or (∨) to and (∧) in the preceding expression
...
20
...
” The tuple-relational-calculus expression for this
query is similar to the expressions that we have just seen, except for the use of the not
(¬) symbol:
{t | ∃ u ∈ depositor (t[customer -name] = u[customer -name])
∧ ¬ ∃ s ∈ borrower (t[customer -name] = s[customer -name])}
customer-name
Adams
Hayes
Figure 3
...

This tuple-relational-calculus expression uses the ∃ u ∈ depositor (
...
) clause to eliminate those customers who appear in some tuple of the
borrower relation as having a loan from the bank
...
13
...
The formula
P ⇒ Q means “P implies Q”; that is, “if P is true, then Q must be true
...
The use of implication rather than not and
or often suggests a more intuitive interpretation of a query in English
...
2
...
” To
write this query in the tuple relational calculus, we introduce the “for all” construct,
denoted by ∀
...
”
We write the expression for our query as follows:
{t | ∃ r ∈ customer (r[customer -name] = t[customer -name]) ∧
( ∀ u ∈ branch (u[branch-city] = “Brooklyn” ⇒
∃ s ∈ depositor (t[customer -name] = s[customer -name]
∧ ∃ w ∈ account (w[account-number ] = s[account-number ]
∧ w[branch-name] = u[branch-name]))))}
In English, we interpret this expression as “The set of all customers (that is, (customer-name) tuples t) such that, for all tuples u in the branch relation, if the value of u on attribute branch-city is Brooklyn, then the customer has an account at the branch whose
name appears in the branch-name attribute of u
...
The ﬁrst line of the query expression is critical in this case — without the condition
∃ r ∈ customer (r[customer -name] = t[customer -name])
if there is no branch in Brooklyn, any value of t (including values that are not customer names in the depositor relation) would qualify
...
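The role of the first condition can be seen in a small Python sketch. The relations below are sample data chosen for illustration, and the function names are hypothetical. The "for all ... implies" construct becomes all() over the branches that satisfy the premise, and "there exists" becomes any().

```python
# Sample relations (assumptions for illustration only).
customer = {"Hayes", "Johnson", "Smith"}
branch = {("Brighton", "Brooklyn"), ("Downtown", "Brooklyn"),
          ("Perryridge", "Horseneck")}            # (branch-name, branch-city)
account = {("A-101", "Downtown"), ("A-201", "Brighton"),
           ("A-102", "Perryridge"), ("A-215", "Brighton")}
depositor = {("Johnson", "A-101"), ("Johnson", "A-201"),
             ("Hayes", "A-102"), ("Smith", "A-215")}


def has_account_at(name, branch_name):
    """∃ s ∈ depositor, ∃ w ∈ account linking the customer to the branch."""
    return any(d_name == name and (acct, branch_name) in account
               for d_name, acct in depositor)


def customers_with_account_at_all_branches(city):
    # Ranging over customer plays the role of the condition
    # ∃ r ∈ customer (r[customer-name] = t[customer-name]).
    return {c for c in customer
            if all(has_account_at(c, b_name)
                   for b_name, b_city in branch
                   if b_city == city)}             # premise of the implication


print(sorted(customers_with_account_at_all_branches("Brooklyn")))
print(sorted(customers_with_account_at_all_branches("Springfield")))
```

For a city with no branches the implication holds vacuously, so every name in customer qualifies. Without the restriction to customer, any value at all would qualify, which is exactly the problem the first line of the query prevents.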
6
...
A tuple-relational-calculus expression is of
the form
{t | P(t)}
where P is a formula
...
A tuple variable is said to be a free variable unless it is quantiﬁed by a ∃ or ∀
...
Tuple variable s is said to be a bound variable
...
A tuple-relational-calculus formula is built up out of atoms
...

• If P1 is a formula, then so are ¬P1 and (P1 )
...

• If P1 (s) is a formula containing a free tuple variable s, and r is a relation, then
∃ s ∈ r (P1 (s)) and ∀ s ∈ r (P1 (s))
are also formulae
...
In the tuple relational calculus, these equivalences
include the following three rules:
1
...

2
...

3
...

3
...
3 Safety of Expressions
There is one ﬁnal issue to be addressed
...
Suppose that we write the expression
{t |¬ (t ∈ loan)}
There are inﬁnitely many tuples that are not in loan
...

To help us deﬁne a restriction of the tuple relational calculus, we introduce the
concept of the domain of a tuple relational formula, P
...
They include values
mentioned in P itself, as well as values that appear in a tuple of a relation mentioned in P
...
For example,
dom(t ∈ loan ∧ t[amount] > 1200) is the set containing 1200 as well as the set of all
values appearing in loan
...

We say that an expression {t | P (t)} is safe if all values that appear in the result
are values from dom(P )
...
Note that
dom(¬ (t ∈ loan)) is the set of all values appearing in loan
...
The other
examples of tuple-relational-calculus expressions that we have written in this section
are safe
...
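As a rough illustration of the safety test, the following Python sketch computes dom(P) for a formula that mentions the constant 1200 and the relation loan, and then checks that every value in a result comes from that domain. The helper dom and the sample rows are assumptions introduced for this example.

```python
# Sample loan relation: (loan-number, branch-name, amount).
loan = {("L-15", "Perryridge", 1500), ("L-17", "Downtown", 1000)}


def dom(constants, *relations):
    """Values mentioned in P itself plus all values in relations named in P."""
    values = set(constants)
    for relation in relations:
        for row in relation:
            values.update(row)
    return values


# dom(t ∈ loan ∧ t[amount] > 1200)
dom_p = dom({1200}, loan)
result = {t for t in loan if t[2] > 1200}
is_safe = all(value in dom_p for t in result for value in t)

# A tuple that is not in loan may contain values outside dom(¬(t ∈ loan)).
outside = ("L-99", "Nowhere", 1)
outside_in_domain = all(value in dom(set(), loan) for value in outside)
print(is_safe, outside_in_domain)
```

The second check shows why {t | ¬ (t ∈ loan)} is not safe: tuples that satisfy the formula can be built from values that appear nowhere in dom(P).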
6
...
We will not prove this assertion here; the bibliographic notes contain references
to the proof
...
We note that the
tuple relational calculus does not have any equivalent of the aggregate operation, but
it can be extended to support aggregation
...

3
...
The domain relational calculus, however, is closely related to the tuple
relational calculus
...

3
...
1 Formal Deﬁnition
An expression in the domain relational calculus is of the form
{< x1 , x2 , . . . , xn > | P (x1 , x2 , . . . , xn )}
where x1 , x2 ,
...
P represents a formula composed
of atoms, as was the case in the tuple relational calculus
...
, xn > ∈ r, where r is a relation on n attributes and x1 , x2 ,
...

We require that attributes x and y have domains that can
be compared by Θ
...

We build up formulae from atoms by using the following rules:
• An atom is a formula
...

• If P1 and P2 are formulae, then so are P1 ∨ P2 , P1 ∧ P2 , and P1 ⇒ P2
...

As a notational shorthand, we write
∃ a, b, c (P (a, b, c))
for
∃ a (∃ b (∃ c (P (a, b, c))))

3
...
2 Example Queries
We now give domain-relational-calculus queries for the examples that we considered earlier
...

• Find the loan number, branch name, and amount for loans of over $1200:
{< l, b, a > | < l, b, a > ∈ loan ∧ a > 1200}
• Find all loan numbers for loans with an amount greater than $1200:
{< l > | ∃ b, a (< l, b, a > ∈ loan ∧ a > 1200)}
Although the second query appears similar to the one that we wrote for the tuple
relational calculus, there is an important difference
...
However, when we write ∃ b in the domain calculus, b refers not to a tuple,
but rather to a domain value
...
For example,
• Find the names of all customers who have a loan from the Perryridge branch
and ﬁnd the loan amount:

{< c, a > | ∃ l (< c, l > ∈ borrower
∧ ∃ b (< l, b, a > ∈ loan ∧ b = “Perryridge”))}
• Find the names of all customers who have a loan, an account, or both at the
Perryridge branch:
{< c > | ∃ l (< c, l > ∈ borrower
∧ ∃ b, a (< l, b, a > ∈ loan ∧ b = “Perryridge”))
∨ ∃ a (< c, a > ∈ depositor
∧ ∃ b, n (< a, b, n > ∈ account ∧ b = “Perryridge”))}
• Find the names of all customers who have an account at all the branches located in Brooklyn:
{< c > | ∃ n (< c, n > ∈ customer ) ∧
∀ x, y, z (< x, y, z > ∈ branch ∧ y = “Brooklyn” ⇒
∃ a, b (< a, x, b > ∈ account ∧ < c, a > ∈ depositor ))}
In English, we interpret this expression as “The set of all (customer-name) tuples c such that, for all (branch-name, branch-city, assets) tuples, x, y, z, if the
branch city is Brooklyn, then the following is true”:
There exists a tuple in the relation account with account number a and
branch name x
...
”
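The difference between the two calculi shows up clearly in code: in the domain calculus the variables range over individual attribute values, so a tuple is unpacked into its components. The Python sketch below uses sample relations that are assumptions for illustration.

```python
# Sample relations (illustrative values only).
loan = {("L-14", "Downtown", 1500), ("L-15", "Perryridge", 1500),
        ("L-16", "Perryridge", 1300), ("L-17", "Downtown", 1000)}
borrower = {("Jackson", "L-14"), ("Hayes", "L-15"),
            ("Adams", "L-16"), ("Jones", "L-17")}

# {< l > | ∃ b, a (< l, b, a > ∈ loan ∧ a > 1200)}
# l, b, and a are domain variables bound to the components of each tuple.
large_loans = {(l,) for (l, b, a) in loan if a > 1200}

# {< c, a > | ∃ l (< c, l > ∈ borrower
#            ∧ ∃ b (< l, b, a > ∈ loan ∧ b = "Perryridge"))}
perryridge_loans = {(c, a)
                    for (c, l) in borrower
                    for (l2, b, a) in loan
                    if l2 == l and b == "Perryridge"}

print(sorted(large_loans))
print(sorted(perryridge_loans))
```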

3
...
3 Safety of Expressions
We noted that, in the tuple relational calculus (Section 3
...
That led us to deﬁne safety for tuple-relational-calculus expressions
...
An expression such as
{< l, b, a > | ¬(< l, b, a > ∈ loan)}
is unsafe, because it allows values in the result that are not in the domain of the
expression
...
Consider the expression
{< x > | ∃ y (< x, y >∈ r) ∧ ∃ z (¬(< x, z >∈ r) ∧ P (x, z))}
where P is some formula involving x and z
...
However, to test the second
part of the formula, ∃ z (¬ (< x, z > ∈ r) ∧ P (x, z)), we must consider values for
z that do not appear in r
...
Thus, it is not possible, in general, to test the second part of the

...
Instead,
we add restrictions to prohibit expressions such as the preceding one
...
Since we did not do so in the domain calculus, we
add rules to the deﬁnition of safety to deal with cases like our example
...
, xn > | P (x1 , x2 ,
...
All values that appear in tuples of the expression are values from dom(P)
...
For every “there exists” subformula of the form ∃ x (P1 (x)), the subformula is
true if and only if there is a value x in dom(P1 ) such that P1 (x) is true
...
For every “for all” subformula of the form ∀x (P1 (x)), the subformula is true
if and only if P1 (x) is true for all values x from dom(P1 )
...
Consider the
second rule in the deﬁnition of safety
...
In general, there would be inﬁnitely many values to
test
...
This restriction reduces to a ﬁnite number the tuples we must
consider
...
To assert that
∀x (P1 (x)) is true, we must, in general, test all possible values, so we must examine inﬁnitely many values
...

All the domain-relational-calculus expressions that we have written in the example queries of this section are safe
...
7
...

Since we noted earlier that the restricted tuple relational calculus is equivalent to the
relational algebra, all three of the following are equivalent:
• The basic relational algebra (without the extended relational algebra operations)
• The tuple relational calculus restricted to safe expressions
• The domain relational calculus restricted to safe expressions

We note that the domain relational calculus also does not have any equivalent of the
aggregate operation, but it can be extended to support aggregation, and extending it
to handle arithmetic expressions is straightforward
...
8 Summary
• The relational data model is based on a collection of tables
...
There are several languages for expressing these operations
...
These operations can be combined
to get expressions that express desired queries
...

• The operations in relational algebra can be divided into
Basic operations
Additional operations that can be expressed in terms of the basic operations
Extended operations, some of which add further expressive power to relational algebra
• Databases can be modiﬁed by insertion, deletion, or update of tuples
...

• Different users of a shared database may beneﬁt from individualized views of
the database
...
We
evaluate queries involving views by replacing the view with the expression
that deﬁnes the view
...
Therefore, database
systems severely restrict updates through views
...
When database relations are updated, the materialized view must be correspondingly updated
...
The basic relational algebra is a procedural language that is
equivalent in power to both forms of the relational calculus when they are
restricted to safe expressions
...
Commercial database systems, therefore, use languages with more “syntactic sugar
...

Review Terms
• Table
• Relation
• Tuple variable
• Atomic domain
• Null value
• Database schema
• Database instance
• Relation schema
• Relation instance
• Keys
• Foreign key
Referencing relation
Referenced relation
• Schema diagram
• Query language
• Procedural language
• Nonprocedural language
• Relational algebra
• Relational algebra operations
Select σ
Project Π
Union ∪
Set difference −
Cartesian product ×
Rename ρ
• Additional operations
Set-intersection ∩
Natural-join ⋈
Division ÷
• Assignment operation
• Extended relational-algebra operations
Generalized projection Π
Outer join
–– Left outer join ⟕
–– Right outer join ⟖
–– Full outer join ⟗
Aggregation G
• Multisets
• Grouping
• Null values
• Modiﬁcation of the database
Deletion
Insertion
Updating
• Views
• View deﬁnition
• Materialized views
• View update
• View expansion
• Recursive views
• Tuple relational calculus
• Domain relational calculus
• Safety of expressions
• Expressive power of languages

Exercises
3
...
The ofﬁce maintains data about each class, including the instructor, the number of students
enrolled, and the time and place of the class meetings
...

[Figure labels: model, address, driver-id, license, name, person, location, car, owns, driver, year, report-number, participated, date, accident, damage-amount]

Figure 3
...

3
...

Illustrate your answer by referring to your solution to Exercise 3
...

3
...
38
...
4 In Chapter 2, we saw how to represent many-to-many, many-to-one, one-to-many, and one-to-one relationship sets
...

3
...
39, where the primary keys are underlined
...
Find the names of all employees who work for First Bank Corporation
...
Find the names and cities of residence of all employees who work for First
Bank Corporation
...
Find the names, street address, and cities of residence of all employees who
work for First Bank Corporation and earn more than $10,000 per annum
...
Find the names of all employees in this database who live in the same city
as the company for which they work
...
Find the names of all employees who live in the same city and on the same
street as do their managers
...
Find the names of all employees in this database who do not work for First
Bank Corporation
...
Find the names of all employees who earn more than every employee of
Small Bank Corporation
...
Assume the companies may be located in several cities
...

3
...
21, which shows the result of the query “Find
the names of all customers who have a loan at the bank
...

Observe that now customer Jackson no longer appears in the result, even though
Jackson does in fact have a loan from the bank
...
3
...
39

Relational database for Exercises 3
...
8 and 3
...

a
...

b
...
How would you
modify the database to achieve this effect?
c
...
Write a query
using an outer join that accomplishes this desire without your having to
modify the database
...
7 The outer-join operations extend the natural-join operation so that tuples from
the participating relations are not lost in the result of the join
...

3
...
39
...

Give all employees of First Bank Corporation a 10 percent salary raise
...

Give all managers in this database a 10 percent salary raise, unless the salary
would be greater than $100,000
...

e
...

a
...

c
...

3
...
Using an aggregate function
...
Without using any aggregate functions
...
10 Consider the relational database of Figure 3
...
Give a relational-algebra expression for each of the following queries:
a
...

b
...

c
...

3
...

3
...

3
...
3
...
Give an expression in the tuple relational
calculus that is equivalent to each of the following:
a. ΠA (r)
b. σB = 17 (r)
c. r × s
d. ΠA,F (σC = D (r × s))

3
...
Give
an expression in the domain relational calculus that is equivalent to each of the
following:
a. ΠA (r1 )
b. σB = 17 (r1 )
c. r1 ∪ r2
d. r1 ∩ r2
e. r1 − r2
f. ΠA,B (r1 ) ⋈ ΠB,C (r2 )

3
...
5 using the tuple relational calculus and the domain relational
calculus
...
16 Let R = (A, B) and S = (A, C), and let r(R) and s(S) be relations
...

b
...

d
...
17 Let R = (A, B) and S = (A, C), and let r(R) and s(S) be relations
...
r
b
...
r

1s
1s
1s

3
...

3
...
A marked null ⊥i is equal to itself, but if
i ≠ j, then ⊥i ≠ ⊥j
...
Consider the view loan-info (Section 3
...
Show how you can use
marked nulls to allow the insertion of the tuple (“Johnson”, 1900) through loan-info
...
3
...
F
...
This work led to the prestigious ACM Turing Award to
Codd in 1981; Codd [1982]
...
J
...
System R is discussed in Astrahan et al
...
[1979],
and Chamberlin et al
...
Ingres is discussed in Stonebraker [1980], Stonebraker
[1986b], and Stonebraker et al
...
Query-by-example is described in Zloof [1977]
...

Many relational-database products are now commercially available
...
Database
products for personal computers include Microsoft Access, dBase, and FoxPro
...

General discussion of the relational data model appears in most database texts
...
The original deﬁnition of relational algebra is in Codd [1970];
that of tuple relational calculus is in Codd [1972]
...

Several extensions to the relational calculus have been proposed
...
[1993] describe extensions to scalar aggregate functions
...

Codd [1990] is a compendium of E
...
Codd’s papers on the relational model
...
The problem of updating relational databases
through views is addressed by Bancilhon and Spyratos [1981], Cosmadakis and Papadimitriou [1984], Dayal and Bernstein [1978], and Langerak [1990]
...
5
covers materialized view maintenance, and references to literature on view maintenance can be found at the end of that chapter
...
PART 2

Relational Databases

A relational database is a shared repository of data
...
One is how users specify requests for data: Which of the various query languages do they use? Chapter 4
covers the SQL language, which is the most widely used query language today
...

Another issue is data integrity and security; databases need to protect data from
damage by user actions, whether unintentional or intentional
...
The security component of a database
includes authentication of users, and access control, to restrict the permissible actions
for each user
...
Security and integrity
issues are present regardless of the data model, but for concreteness we study them
in the context of the relational model
...

Relational database design — the design of the relational schema — is the ﬁrst step
in building a database application
...
There are, however, principles that can be used to distinguish good
database designs from bad ones
...
Chapter 7 describes the formal design of relational
schemas
...
CHAPTER 4
...
However, commercial database systems require a query language
that is more user friendly
...
SQL uses a combination of relational-algebra
and relational-calculus constructs
...
It can deﬁne the structure of the data, modify data
in the database, and specify security constraints
...
Rather, we
present SQL’s fundamental constructs and concepts
...

4
...
IBM implemented the language, originally called Sequel, as part of the System R project in the early 1970s
...

Many products now support the SQL language
...

In 1986, the American National Standards Institute (ANSI) and the International
Organization for Standardization (ISO) published an SQL standard, called SQL-86
...
ANSI published an extended standard for
SQL, SQL-89, in 1989
...
The bibliographic notes provide references to these
standards
...
The SQL:1999 standard is a superset of the SQL-92 standard;
we cover some features of SQL:1999 in this chapter, and provide more detailed coverage in Chapter 9
...
You
should also be aware that some database systems do not even support all the features of SQL-92, and that many databases provide nonstandard features that we do
not cover here
...
The SQL DDL provides commands for deﬁning relation schemas, deleting relations, and modifying relation schemas
...
The SQL DML includes a
query language based on both the relational algebra and the tuple relational
calculus
...

• View deﬁnition
...

• Transaction control
...

• Embedded SQL and dynamic SQL
...

• Integrity
...
Updates that violate integrity
constraints are disallowed
...
The SQL DDL includes commands for specifying access rights
to relations and views
...
We also
brieﬂy outline embedded and dynamic SQL, including the ODBC and JDBC standards
for interacting with a database from programs written in the C and Java languages
...

The enterprise that we use in the examples in this chapter, and later chapters, is a
banking enterprise with the following relation schemas:
Branch-schema = (branch-name, branch-city, assets)
Customer-schema = (customer-name, customer-street, customer-city)
Loan-schema = (loan-number, branch-name, amount)
Borrower-schema = (customer-name, loan-number)
Account-schema = (account-number, branch-name, balance)
Depositor-schema = (customer-name, account-number)

...
In actual SQL systems, however,
hyphens are not valid parts of a name (they are treated as the minus operator)
...
For example, we use branch_name in place of
branch-name
...
2 Basic Structure
A relational database consists of a collection of relations, each of which is assigned
a unique name
...

SQL allows the use of null values to indicate that the value either is unknown or does
not exist
...
11
...

• The select clause corresponds to the projection operation of the relational algebra
...

• The from clause corresponds to the Cartesian-product operation of the relational algebra
...

• The where clause corresponds to the selection predicate of the relational algebra
...

That the term select has different meaning in SQL than in the relational algebra is an
unfortunate historical fact
...

A typical SQL query has the form
select A1 , A2 ,
...
, rm
where P
Each Ai represents an attribute, and each ri a relation
...
The query is
equivalent to the relational-algebra expression
ΠA1 , A2 ,
...
However, unlike the result of a
relational-algebra expression, the result of the SQL query may contain multiple copies
of some tuples; we shall return to this issue in Section 4
...
8
...
In practice, SQL may convert the expression into an equivalent form that can be processed more efﬁciently
...

4
...
1 The select Clause
The result of an SQL query is, of course, a relation
...

Formal query languages are based on the mathematical notion of a relation being
a set
...
In practice, duplicate elimination is time-consuming
...
Thus, the
preceding query will list each branch-name once for every tuple in which it appears in
the loan relation
...
We can rewrite the preceding query as
select distinct branch-name
from loan
if we want duplicates removed
...
To ensure
the elimination of duplicates in the results of our example queries, we will use distinct whenever it is necessary
...
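The effect of distinct can be checked with a small experiment. The sketch below uses Python's standard sqlite3 module with an in-memory database; the rows are sample values, and attribute names use underscores because hyphens are not valid in SQL names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table loan (loan_number text, branch_name text, amount integer)")
conn.executemany("insert into loan values (?, ?, ?)", [
    ("L-15", "Perryridge", 1500),   # sample rows, assumptions for illustration
    ("L-16", "Perryridge", 1300),
    ("L-17", "Downtown", 1000),
])

# Without distinct, one row is returned per loan tuple.
with_duplicates = conn.execute(
    "select branch_name from loan").fetchall()

# With distinct, each branch name appears once.
without_duplicates = conn.execute(
    "select distinct branch_name from loan").fetchall()

print(sorted(with_duplicates))
print(sorted(without_duplicates))
```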

However, the number is important in certain applications; we return to this issue in
Section 4
...
8
...
” Thus, the use of
loan
...
A select clause of the form select * indicates that all attributes of all relations
appearing in the from clause are selected
...
For example, the query
select loan-number, branch-name, amount * 100
from loan

...

SQL also provides special data types, such as various forms of the date type, and
allows several arithmetic functions to operate on these types
...
2
...
Consider the query “Find all loan
numbers for loans made at the Perryridge branch with loan amounts greater than
$1200
...
The operands of the logical connectives
can be expressions involving the comparison operators <, <=, >, >=, =, and <>
...

SQL includes a between comparison operator to simplify where clauses that specify that a value be less than or equal to some value and greater than or equal to some
other value
...

4
...
3 The from Clause
Finally, let us discuss the use of the from clause
...
Since the natural join is deﬁned in
terms of a Cartesian product, a selection, and a projection, it is a relatively simple
matter to write an SQL expression for the natural join
...
” In SQL, this query can be written as

select customer-name, borrower
...
loan-number = loan
...
attribute-name, as does the relational
algebra, to avoid ambiguity in cases where an attribute appears in the schema of more
than one relation
...
customer-name instead of customer-name in the select clause
...

We can extend the preceding query and consider a more complicated case in which
we require also that the loan be from the Perryridge branch: “Find the customer
names, loan numbers, and loan amounts for all loans at the Perryridge branch
...
loan-number, amount
from borrower, loan
where borrower
...
loan-number and
branch-name = ’Perryridge’
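A runnable version of this join can be sketched with Python's standard sqlite3 module. The sample rows are assumptions, and attribute names use underscores in place of hyphens so that they are valid SQL names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table borrower (customer_name text, loan_number text);
    create table loan (loan_number text, branch_name text, amount integer);
    insert into borrower values
        ('Hayes', 'L-15'), ('Adams', 'L-16'), ('Jones', 'L-17');
    insert into loan values
        ('L-15', 'Perryridge', 1500),
        ('L-16', 'Perryridge', 1300),
        ('L-17', 'Downtown', 1000);
""")

# The from clause forms the Cartesian product; the where clause keeps only
# matching loan numbers at the Perryridge branch.
rows = conn.execute("""
    select customer_name, borrower.loan_number, amount
    from borrower, loan
    where borrower.loan_number = loan.loan_number
      and branch_name = 'Perryridge'
    order by customer_name
""").fetchall()
print(rows)
```

The qualified name borrower.loan_number is required because loan_number appears in both relations; customer_name and amount are unambiguous and need no qualifier.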
SQL includes extensions to perform natural joins and outer joins in the from clause
...
10
...
2
...
It uses the as

clause, taking the form:
old-name as new-name
The as clause can appear in both the select and from clauses
...
loan-number, amount
from borrower, loan
where borrower
...
loan-number
The result of this query is a relation with the following attributes:
customer-name, loan-number, amount
...

We cannot, however, always derive names in this way, for several reasons: First,
two relations in the from clause may have attributes with the same name, in which
case an attribute name is duplicated in the result
...
Third,

...
Hence, SQL
provides a way of renaming the attributes of a result relation
...
loan-number as loan-id, amount
from borrower, loan
where borrower
...
loan-number

4
...
5 Tuple Variables
The as clause is particularly useful in deﬁning the notion of tuple variables, as is
done in the tuple relational calculus
...
Tuple variables are deﬁned in the from clause by way of the as
clause
...
loan-number, S
...
loan-number = S
...
When we write expressions of the form relation-name
...

Tuple variables are most useful for comparing two tuples in the same relation
...

Suppose that we want the query “Find the names of all branches that have assets
greater than at least one branch located in Brooklyn
...
branch-name
from branch as T, branch as S
where T
...
assets and S
...
asset, since it would not be clear
which reference to branch is intended
...
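The self-comparison can be tried out with Python's standard sqlite3 module. The branch rows below are sample values chosen for illustration, and names use underscores in place of hyphens.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table branch (branch_name text, branch_city text, assets integer);
    insert into branch values
        ('Brighton', 'Brooklyn', 7100000),
        ('Downtown', 'Brooklyn', 9000000),
        ('Perryridge', 'Horseneck', 1700000),
        ('Round Hill', 'Horseneck', 8000000);
""")

# T and S are two tuple variables ranging over the same relation.
rows = conn.execute("""
    select distinct T.branch_name
    from branch as T, branch as S
    where T.assets > S.assets and S.branch_city = 'Brooklyn'
    order by T.branch_name
""").fetchall()
print(rows)
```

Brighton is not in the result because its assets exceed those of no Brooklyn branch, including itself; a branch qualifies as soon as one Brooklyn branch has smaller assets.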
, vn ) to denote a tuple of arity n containing values v1 , v2 ,
...
The comparison operators can be used on tuples, and
the ordering is deﬁned lexicographically
...

4
...
6 String Operations
SQL speciﬁes strings by enclosing them in single quotes, for example, ’Perryridge’,
as we saw earlier
...

The most commonly used operation on strings is pattern matching using the operator like
...

• Underscore (_): The _ character matches any character
...
To illustrate pattern matching, we consider the following examples:
• ’Perry%’ matches any string beginning with “Perry”
...

• ’_ _ _’ matches any string of exactly three characters
...

SQL expresses patterns by using the like comparison operator
...
”
This query can be written as

select customer-name
from customer
where customer-street like ’%Main%’
For patterns to include the special pattern characters (that is, % and _), SQL allows
the speciﬁcation of an escape character
...
We deﬁne the escape character for a like comparison
using the escape keyword
...

• like ’ab\\cd%’ escape ’\’ matches all strings beginning with “ab\cd”
...
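Pattern matching and the escape keyword can be exercised with Python's standard sqlite3 module. The customer rows are sample values chosen for illustration. One engine-specific caveat: SQLite's like ignores ASCII case by default, which other systems may not do.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table customer (customer_name text, customer_street text)")
conn.executemany("insert into customer values (?, ?)", [
    ("Jones", "Main"),                 # sample rows, assumptions only
    ("Smith", "North Main Street"),
    ("Hayes", "Park"),
    ("Curry", "100% Road"),
])

# % matches any substring, so this finds streets containing "Main".
main_street = conn.execute(
    "select customer_name from customer "
    "where customer_street like '%Main%' order by customer_name"
).fetchall()

# Backslash is declared as the escape character, so \% is a literal percent.
literal_percent = conn.execute(
    r"select customer_name from customer "
    r"where customer_street like '%\%%' escape '\'"
).fetchall()

print(main_street, literal_percent)
```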

SQL also permits a variety of functions on character strings, such as concatenating (using “||”), extracting substrings, ﬁnding the length of strings, converting between uppercase and lowercase, and so on
...

...
2
...
The order by clause causes the tuples in the result of a query to appear in
sorted order
...
loan-number = loan
...
To specify the sort order,
we may specify desc for descending order or asc for ascending order
...
Suppose that we wish to list the
entire loan relation in descending order of amount
...
We express this query in
SQL as follows:
select *
from loan
order by amount desc, loan-number asc
To fulﬁll an order by request, SQL must perform a sort
...
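The two-level ordering can be observed with Python's standard sqlite3 module. The rows are sample values inserted deliberately out of order, and names use underscores in place of hyphens.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table loan (loan_number text, branch_name text, amount integer)")
conn.executemany("insert into loan values (?, ?, ?)", [
    ("L-15", "Perryridge", 1500),   # sample rows, assumptions only
    ("L-17", "Downtown", 1000),
    ("L-23", "Redwood", 2000),
    ("L-14", "Downtown", 1500),
])

# Sort by amount, largest first; ties are broken by ascending loan number.
rows = conn.execute(
    "select * from loan order by amount desc, loan_number asc"
).fetchall()
for row in rows:
    print(row)
```

The two loans of 1500 appear in ascending loan-number order, which is the role of the second sort key.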

4
...
8 Duplicates
Using relations with duplicates offers advantages in several situations
...
We can deﬁne the duplicate
semantics of an SQL query using multiset versions of the relational operators
...
Given
multiset relations r1 and r2 ,
1
...

2
...

3
...
t2 in r1 × r2
...
An SQL query of the form
select A1 , A2 ,
...
, rm
where P
is equivalent to the relational-algebra expression
ΠA1 , A2 ,
...

4
...
Like union, intersection, and set
difference in relational algebra, the relations participating in the operations must be
compatible; that is, they must have the same set of attributes
...
We shall now construct queries involving the union,
intersect, and except operations of two sets: the set of all customers who have an
account at the bank, which can be derived by
select customer-name
from depositor
and the set of customers who have a loan at the bank, which can be derived by
select customer-name
from borrower
We shall refer to the relations obtained as the result of the preceding queries as
d and b, respectively
...
3
...
The union operation automatically eliminates duplicates, unlike the select clause
...

If we want to retain all duplicates, we must write union all in place of union:
(select customer-name
from depositor)
union all
(select customer-name
from borrower)
The number of duplicate tuples in the result is equal to the total number of duplicates
that appear in both d and b
...

4
...
2 The Intersect Operation
To ﬁnd all customers who have both a loan and an account at the bank, we write
(select distinct customer-name
from depositor)
intersect
(select distinct customer-name
from borrower)
The intersect operation automatically eliminates duplicates
...

If we want to retain all duplicates, we must write intersect all in place of intersect:
(select customer-name
from depositor)
intersect all
(select customer-name
from borrower)
The number of duplicate tuples that appear in the result is equal to the minimum
number of duplicates in both d and b
...

4
...
3 The Except Operation
To ﬁnd all customers who have an account but no loan at the bank, we write

(select distinct customer-name
from depositor)
except
(select customer-name
from borrower)
The except operation automatically eliminates duplicates
...

If we want to retain all duplicates, we must write except all in place of except:
(select customer-name
from depositor)
except all
(select customer-name
from borrower)
The number of duplicate copies of a tuple in the result is equal to the number of
duplicate copies of the tuple in d minus the number of duplicate copies of the tuple
in b, provided that the difference is positive
...

If, instead, this customer has two accounts and three loans at the bank, there will be
no tuple with the name Jones in the result
...
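The duplicate counts for union all, intersect all, and except all follow simple arithmetic on multiplicities, which can be modelled outside SQL. The sketch below uses Python's collections.Counter as a stand-in for the multisets d and b; it is a model of the counting rules, not of any SQL engine, and the names other than Jones are made up.

```python
from collections import Counter

# d: names from depositor, b: names from borrower (multiplicity = number of tuples).
# Jones has two accounts and three loans, as in the case discussed above.
d = Counter({"Jones": 2, "Smith": 1})
b = Counter({"Jones": 3, "Hayes": 1})

union_all = d + b       # multiplicities add:           Jones 2 + 3 = 5
intersect_all = d & b   # minimum of the two counts:    Jones min(2, 3) = 2
except_all = d - b      # difference, kept if positive: Jones 2 - 3 < 0, so dropped

print(union_all)
print(intersect_all)
print(except_all)
```

Swapping the operands of the difference (b - d) leaves one copy of Jones, since 3 - 2 = 1 is positive.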
4 Aggregate Functions
Aggregate functions are functions that take a collection (a set or multiset) of values as
input and return a single value
...

As an illustration, consider the query “Find the average account balance at the
Perryridge branch
...
The result of this query is a relation with a single attribute, containing a single tuple with a numerical value corresponding to the average balance at the Perryridge
branch
...

There are circumstances where we would like to apply the aggregate function not
only to a single set of tuples, but also to a group of sets of tuples; we specify this wish
in SQL using the group by clause
...
Tuples with the same value on all attributes in the
group by clause are placed in one group
...
” We write this query as follows:
select branch-name, avg (balance)
from account
group by branch-name
Retaining duplicates is important in computing an average
...
The
average balance is $7000/4 = $1750
...
If duplicates were eliminated, we would obtain the wrong answer ($6000/3 = $2000)
...
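The two figures above can be reproduced with a small experiment. This is a minimal sketch using Python's sqlite3 module; the four balances and the branch name are assumptions chosen so that the totals match ($7000 over four accounts, $6000 over three distinct values), and underscores replace hyphens in attribute names.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table account (account_number text, branch_name text, balance integer)")
# Hypothetical data: four accounts, two of which share the balance 1000.
con.executemany(
    "insert into account values (?, ?, ?)",
    [("A-1", "Brighton", 1000), ("A-2", "Brighton", 3000),
     ("A-3", "Brighton", 2000), ("A-4", "Brighton", 1000)],
)

# avg keeps duplicates; avg(distinct ...) removes them before averaging.
avg_all, avg_distinct = con.execute(
    "select avg(balance), avg(distinct balance) "
    "from account where branch_name = 'Brighton'"
).fetchone()

print(avg_all, avg_distinct)
```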
If we do want to eliminate duplicates, we use the keyword distinct in
the aggregate expression
...
” In this case, a depositor counts only once, regardless of the
number of accounts that depositor may have
...
account-number = account
...
For example, we might be interested in only those branches where the average
account balance is more than $1200
...
To express such a
query, we use the having clause of SQL
...
We express this
query in SQL as follows:
select branch-name, avg (balance)
from account
group by branch-name
having avg (balance) > 1200
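To see the having clause filter whole groups rather than individual tuples, the query can be run on a toy relation. The sketch below uses Python's sqlite3 module with made-up account data and underscores in place of hyphens; only the query itself comes from the text.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table account (account_number text, branch_name text, balance integer)")
# Hypothetical data: branch averages are 1500, 700, and 1300.
con.executemany(
    "insert into account values (?, ?, ?)",
    [("A-1", "Perryridge", 1000), ("A-2", "Perryridge", 2000),
     ("A-3", "Brighton", 500), ("A-4", "Brighton", 900),
     ("A-5", "Downtown", 1300)],
)

# Groups are formed first; having then discards groups whose average is too low.
rows = con.execute("""
    select branch_name, avg(balance)
    from account
    group by branch_name
    having avg(balance) > 1200
    order by branch_name
""").fetchall()

print(rows)
```

Note that the Perryridge group is kept even though one of its accounts has a balance below 1200: the predicate applies to the group's average, not to each tuple.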
At times, we wish to treat the entire relation as a single group
...
Consider the query “Find the average balance for all
accounts
...
4
...
The notation for this function in SQL is count (*)
...
It is legal to use distinct with
max and min, even though the result does not change
...

If a where clause and a having clause appear in the same query, SQL applies the
predicate in the where clause ﬁrst
...
SQL then applies the having clause, if it
is present, to each group; it removes the groups that do not satisfy the having clause
predicate
...

To illustrate the use of both a having clause and a where clause in the same query,
we consider the query “Find the average balance for each customer who lives in
Harrison and has at least three accounts
...
customer-name, avg (balance)
from depositor, account, customer
where depositor
...
account-number and
depositor
...
customer-name and
customer-city = ’Harrison’
group by depositor
...
account-number) >= 3

4
...

We can use the special keyword null in a predicate to test for a null value
...

The use of a null value in arithmetic and comparison operations causes several
complications
...
3
...
We now outline how SQL handles null values
...
The result of an arithmetic expression (involving, for example +, −, ∗ or /) is null
if any of the input values is null
...

Since the predicate in a where clause can involve Boolean operations such as and,
or, and not on the results of comparisons, the deﬁnitions of the Boolean operations
are extended to deal with the value unknown, as outlined in Section 3
...
4
...

• or: The result of true or unknown is true, false or unknown is unknown, while
unknown or unknown is unknown
...

SQL deﬁnes the result of an SQL statement of the form

select
...
If the predicate evaluates to either false or unknown for a tuple in R1 × · · · × Rn
(the projection of) the tuple is not added to the result
...
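The three-valued logic described above can be observed directly. The following sketch assumes SQLite (through Python's sqlite3 module), which represents true as 1, false as 0, and unknown as null (None in Python); the loan amounts are made up.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Truth tables with unknown (null) as one operand.
or_row = con.execute("select (1 or null), (0 or null), (null or null)").fetchone()
and_row = con.execute("select (1 and null), (0 and null), (null and null)").fetchone()
not_row = con.execute("select not null").fetchone()

# A where clause keeps a tuple only if the predicate is true,
# so a tuple whose amount is null fails both a test and its negation.
con.execute("create table loan (loan_number text, amount integer)")
con.executemany("insert into loan values (?, ?)",
                [("L-1", 900), ("L-2", 1500), ("L-3", None)])
over = con.execute("select count(*) from loan where amount > 1000").fetchone()[0]
not_over = con.execute("select count(*) from loan where not (amount > 1000)").fetchone()[0]
is_null = con.execute("select count(*) from loan where amount is null").fetchone()[0]

print(or_row, and_row, not_row)
print(over, not_over, is_null)
```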

Null values, when they exist, also complicate the processing of aggregate operators
...
Consider the following query to total all loan amounts:
select sum (amount)
from loan
The values to be summed in the preceding query include null values, since some
tuples have a null value for amount
...

In general, aggregate functions treat nulls according to the following rule: All aggregate functions except count(*) ignore null values in their input collection
...
The count
of an empty collection is deﬁned to be 0, and all other aggregate operations return a
value of null when applied on an empty collection
...
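These rules are easy to confirm on a relation that contains a null. The sketch below uses Python's sqlite3 module with a made-up loan relation; the predicate 1 = 0 is only a device for producing an empty input collection.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table loan (loan_number text, amount integer)")
# Hypothetical data: one of the three loans has a null amount.
con.executemany("insert into loan values (?, ?)",
                [("L-1", 1000), ("L-2", 2000), ("L-3", None)])

# count(*) counts tuples; the other aggregates skip the null amount.
with_nulls = con.execute(
    "select sum(amount), count(*), count(amount), avg(amount) from loan"
).fetchone()

# On an empty collection, count is 0 and the other aggregates are null.
on_empty = con.execute(
    "select count(amount), sum(amount), max(amount) from loan where 1 = 0"
).fetchone()

print(with_nulls)
print(on_empty)
```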

A boolean data type, which can take the values true, false, and unknown, was introduced in SQL:1999
...

4
...
A subquery is a select-from-where expression that is nested within another query
...
4
...
We shall study these uses in subsequent sections
...
6
...
The in connective tests for set membership, where the set is a
collection of values produced by a select clause
...
As an illustration, reconsider the query “Find all customers who have both a loan and an account at the bank
...
We can take the alternative approach of ﬁnding all account
holders at the bank who are members of the set of borrowers from the bank
...
We begin by ﬁnding all account
holders, and we write the subquery
(select customer-name
from depositor)
We then need to ﬁnd those customers who are borrowers from the bank and who
appear in the list of account holders obtained in the subquery
...
The resulting query is
select distinct customer-name
from borrower
where customer-name in (select customer-name
from depositor)
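The membership test can be tried on a few tuples. This is a minimal sketch using Python's sqlite3 module; the customers are made up and underscores replace hyphens in attribute names. It also runs the complementary not in form for comparison.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    create table depositor (customer_name text, account_number text);
    create table borrower  (customer_name text, loan_number text);
    -- Hypothetical data: only Jones has both an account and a loan.
    insert into depositor values ('Jones', 'A-1'), ('Johnson', 'A-2');
    insert into borrower  values ('Jones', 'L-1'), ('Jones', 'L-2'), ('Smith', 'L-3');
""")

# Borrowers whose name is a member of the set of depositor names.
both = con.execute("""
    select distinct customer_name
    from borrower
    where customer_name in (select customer_name from depositor)
""").fetchall()

# Borrowers whose name is absent from that set.
loan_only = con.execute("""
    select distinct customer_name
    from borrower
    where customer_name not in (select customer_name from depositor)
""").fetchall()

print(both, loan_only)
```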
This example shows that it is possible to write the same query several ways in
SQL
...
We shall see that there is a substantial amount of
redundancy in SQL
...
It is
also possible to test for membership in an arbitrary relation in SQL
...
loan-number = loan
...
account-number = account
...
We use the not in construct in a similar way
...
The following
query selects the names of customers who have a loan at the bank, and whose names
are neither Smith nor Jones
...
6
...
” In Section 4
...
5, we wrote this query as follows:
select distinct T
...
assets > S
...
branch-city = ’Brooklyn’
SQL does, however, offer an alternative style for writing the preceding query
...
This construct
allows us to rewrite the query in a form that resembles closely our formulation of the
query in English
...
The > some
comparison in the where clause of the outer select is true if the assets value of the
tuple is greater than at least one member of the set of all asset values for branches in
Brooklyn
...
4
...

As an exercise, verify that = some is identical to in, whereas <> some is not the same
as not in
...
Early versions of SQL
allowed only any
...

Now we modify our query slightly
...
The construct > all
corresponds to the phrase “greater than all
...
As an exercise, verify that <> all is identical to not in
...
” Aggregate functions cannot be composed in SQL
...
Instead, we can follow this strategy: We begin
by writing a query to ﬁnd all average balances, and then nest it as a subquery of a
larger query that ﬁnds those branches for which the average balance is greater than
or equal to all average balances:
select branch-name
from account
group by branch-name
having avg (balance) >= all (select avg (balance)
from account
group by branch-name)

4
...
3 Test for Empty Relations
SQL includes a feature for testing whether a subquery has any tuples in its result
...
Using
the exists construct, we can write the query “Find all customers who have both an
account and a loan at the bank” in still another way:

select customer-name
from borrower
where exists (select *
from depositor
where depositor
...
customer-name)
We can test for the nonexistence of tuples in a subquery by using the not exists construct
...
(that is, superset) operation: We can write “relation A contains relation B” as “not
exists (B except A)
...
) To illustrate the
not exists operator, consider again the query “Find all customers who have an account at all the branches located in Brooklyn
...
Using the except construct, we can write the query as
follows:
select distinct S
...
branch-name
from depositor as T, account as R
where T
...
account-number and
S
...
customer-name))
Here, the subquery
(select branch-name
from branch
where branch-city = ’Brooklyn’)
ﬁnds all the branches in Brooklyn
...
branch-name
from depositor as T, account as R
where T
...
account-number and
S
...
customer-name)
ﬁnds all the branches at which customer S
...
Thus, the
outer select takes each customer and tests whether the set of all branches at which
that customer has an account contains the set of all branches located in Brooklyn
...
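The "not exists (B except A)" pattern can be exercised on a small instance. The sketch below runs the query through Python's sqlite3 module under several assumptions: the sample branches, accounts, and customers are made up, underscores replace hyphens, and the parentheses around the two operands of except are omitted because SQLite does not accept them.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    create table branch    (branch_name text, branch_city text);
    create table account   (account_number text, branch_name text);
    create table depositor (customer_name text, account_number text);
    -- Hypothetical data: two Brooklyn branches; only Jones has accounts at both.
    insert into branch values ('Downtown', 'Brooklyn'), ('Brighton', 'Brooklyn'),
                              ('Perryridge', 'Horseneck');
    insert into account values ('A-1', 'Downtown'), ('A-2', 'Brighton'),
                               ('A-3', 'Downtown'), ('A-4', 'Perryridge');
    insert into depositor values ('Jones', 'A-1'), ('Jones', 'A-2'),
                                 ('Smith', 'A-3'), ('Smith', 'A-4');
""")

# For each customer S: (Brooklyn branches) except (branches where S has an
# account) must be empty, that is, S's branches contain all Brooklyn branches.
rows = con.execute("""
    select distinct S.customer_name
    from depositor as S
    where not exists (
        select branch_name from branch where branch_city = 'Brooklyn'
        except
        select R.branch_name
        from depositor as T, account as R
        where T.account_number = R.account_number
          and S.customer_name = T.customer_name
    )
""").fetchall()

print(rows)
```

Smith is excluded because the difference for Smith still contains Brighton.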
In
a subquery, according to the rule, it is legal to use only tuple variables deﬁned in
the subquery itself or in any query that contains the subquery
...
This rule is analogous to the usual scoping rules used for variables
in programming languages
...
6
...
The unique construct returns the value true if the argument subquery contains
no duplicate tuples
...
customer-name
from depositor as T
where unique (select R
...
customer-name = R
...
account-number = account
...
branch-name = ’Perryridge’)
We can test for the existence of duplicate tuples in a subquery by using the not
unique construct
...
customer-name
from depositor T
where not unique (select R
...
customer-name = R
...
account-number = account
...
branch-name = ’Perryridge’)
Formally, the unique test on a relation is deﬁned to fail if and only if the relation
contains two tuples t1 and t2 such that t1 = t2
...

4
...
To deﬁne a view, we
must give the view a name and must state the query that computes the view
...
The view name is represented by v
...

As an example, consider the view consisting of branch names and the names of
customers who have either an account or a loan at that branch
...
We deﬁne this view as follows:

4
...
account-number = account
...
loan-number = loan
...
Since the expression sum(amount) does not have a name, the attribute
name is speciﬁed explicitly in the view deﬁnition
...
Using the
view all-customer, we can ﬁnd all customers of the Perryridge branch by writing
select customer-name
from all-customer
where branch-name = ’Perryridge’

4
...
(An SQL block consists of a single select-from-where statement, possibly with group by and having clauses
...

4
...
1 Derived Relations
SQL allows a subquery expression to be used in the from clause
...
We do this renaming by using the as clause
...
4
...
The subquery result is named result, with
the attributes branch-name and avg-balance
...
” We wrote this query in Section 4
...
We can now rewrite this query, without using the having clause, as
follows:
select branch-name, avg-balance
from (select branch-name, avg (balance)
from account
group by branch-name)
as branch-avg (branch-name, avg-balance)
where avg-balance > 1200
Note that we do not need to use the having clause, since the subquery in the from
clause computes the average balance, and its result is named as branch-avg; we can
use the attributes of branch-avg directly in the where clause
...
The having clause does not help us in this task, but
we can write this query easily by using a subquery in the from clause, as follows:
select max(tot-balance)
from (select branch-name, sum(balance)
from account
group by branch-name) as branch-total (branch-name, tot-balance)
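A derived relation of this kind can be tested on toy data. The sketch below uses Python's sqlite3 module; the balances are made up and underscores replace hyphens. One deviation is an assumption about the engine rather than part of the text: SQLite does not accept a column list after the alias of a subquery in the from clause, so the sum is named inside the subquery instead.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table account (account_number text, branch_name text, balance integer)")
# Hypothetical data: branch totals are 3000, 1400, and 1300.
con.executemany(
    "insert into account values (?, ?, ?)",
    [("A-1", "Perryridge", 1000), ("A-2", "Perryridge", 2000),
     ("A-3", "Brighton", 500), ("A-4", "Brighton", 900),
     ("A-5", "Downtown", 1300)],
)

# Inner query: one total per branch. Outer query: maximum over those totals.
max_total = con.execute("""
    select max(tot_balance)
    from (select branch_name, sum(balance) as tot_balance
          from account
          group by branch_name) as branch_total
""").fetchone()[0]

print(max_total)
```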

4
...
2 The with Clause
Complex queries are much easier to write and to understand if we structure them
by breaking them into smaller views that we then combine, just as we structure programs by breaking their task into procedures
...

The with clause provides a way of deﬁning a temporary view whose deﬁnition is
available only to the query in which the with clause occurs
...

with max-balance (value) as
select max(balance)
from account
select account-number
from account, max-balance
where account
...
value

4
...

We could have written the above query by using a nested subquery in either the
from clause or the where clause
...
The with clause makes the query
logic clearer; it also permits a view deﬁnition to be used in multiple places within a
query
...
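A runnable form of a with query is sketched below using Python's sqlite3 module. It finds the accounts with the maximum balance through a temporary view named max_balance; the balances are made up, underscores replace hyphens, and the parentheses around the view's defining query are added because SQLite requires them.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table account (account_number text, balance integer)")
# Hypothetical data: two accounts share the maximum balance.
con.executemany("insert into account values (?, ?)",
                [("A-1", 500), ("A-2", 900), ("A-3", 900)])

# max_balance exists only for the duration of this one query.
rows = con.execute("""
    with max_balance (value) as (
        select max(balance) from account
    )
    select account_number
    from account, max_balance
    where account.balance = max_balance.value
    order by account_number
""").fetchall()

print(rows)
```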
We can write the
query using the with clause as follows
...
value >= branch-total-avg
...
You can write the equivalent query
as an exercise
...
9 Modiﬁcation of the Database
We have restricted our attention until now to the extraction of information from the
database
...

4
...
1 Deletion
A delete request is expressed in much the same way as a query
...
SQL expresses a
deletion by
delete from r
where P
where P represents a predicate and r represents a relation
...
The where
clause can be omitted, in which case all tuples in r are deleted
...
If we want to delete
tuples from several relations, we must use one delete command for each relation
...
4
...
At the other extreme, the where clause may be empty
...
(Well-designed systems will seek conﬁrmation from the user before executing such a devastating request
...

delete from account
where branch-name = ’Perryridge’
• Delete all loans with loan amounts between $1300 and $1500
...

delete from account
where branch-name in (select branch-name
from branch
where branch-city = ’Needham’)
This delete request ﬁrst ﬁnds all branches in Needham, and then deletes all
account tuples pertaining to those branches
...
The delete request can contain a nested select that references the relation
from which tuples are to be deleted
...
We could write
delete from account
where balance < (select avg (balance)
from account)
The delete statement ﬁrst tests each tuple in the relation account to check whether the
account has a balance less than the average at the bank
...

Performing all the tests before performing any deletion is important — if some tuples
are deleted before other tuples have been tested, the average balance may change,
and the ﬁnal result of the delete would depend on the order in which the tuples were
processed!
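The effect of testing all tuples before deleting any can be illustrated by comparison with a deliberately naive procedure. In the sketch below, the delete statement runs through Python's sqlite3 module, while naive_delete is a hypothetical Python function that recomputes the average after every deletion; the three balances are made up so that the two approaches disagree.

```python
import sqlite3

balances = [100, 900, 1000]   # hypothetical data; the average is about 666.67

con = sqlite3.connect(":memory:")
con.execute("create table account (balance integer)")
con.executemany("insert into account values (?)", [(b,) for b in balances])

# The average is computed once, over the original relation.
con.execute("delete from account where balance < (select avg(balance) from account)")
remaining = [b for (b,) in con.execute("select balance from account order by balance")]

def naive_delete(values):
    """Incorrect strategy: delete as we go, recomputing the average each time."""
    kept = list(values)
    for b in values:
        if b < sum(kept) / len(kept):
            kept.remove(b)
    return kept

naive_remaining = naive_delete(balances)

print(remaining)        # only 100 is below the original average
print(naive_remaining)  # 900 is also removed once the average has risen to 950
```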

4
...
9
...
Obviously, the attribute values for inserted tuples must be members of the attribute’s domain
...

The simplest insert statement is a request to insert one tuple
...
We write
insert into account
values (’A-9732’, ’Perryridge’, 1200)
In this example, the values are speciﬁed in the order in which the corresponding
attributes are listed in the relation schema
...
For example, the following SQL insert statements are identical
in function to the preceding one:
insert into account (account-number, branch-name, balance)
values (’A-9732’, ’Perryridge’, 1200)
insert into account (branch-name, account-number, balance)
values (’Perryridge’, ’A-9732’, 1200)
More generally, we might want to insert tuples on the basis of the result of a query
...
Let the loan number
serve as the account number for the savings account
...
SQL evaluates the select statement ﬁrst, giving a set of tuples that is
then inserted into the account relation
...

We also need to add tuples to the depositor relation; we do so by writing
insert into depositor
select customer-name, loan-number
from borrower, loan
where borrower
...
loan-number and
branch-name = ’Perryridge’

This query inserts a tuple (customer-name, loan-number) into the depositor relation for
each customer-name who has a loan in the Perryridge branch with loan number loan-number
...
If we carry out some insertions even as the select statement is being
evaluated, a request such as
insert into account
select *
from account
might insert an inﬁnite number of tuples! The request would insert the ﬁrst tuple in
account again, creating a second copy of the tuple
...
The select statement may then ﬁnd this third copy and insert a fourth copy,
and so on, forever
...

Our discussion of the insert statement considered only examples in which a value
is given for every attribute in inserted tuples
...
The
remaining attributes are assigned a null value denoted by null
...
Consider
the query
select account-number
from account
where branch-name = ’Perryridge’
Since the branch at which account A-401 is maintained is not known, we cannot determine whether it is equal to “Perryridge”
...
11
...
9
...
For this purpose, the update statement can be used
...

Suppose that annual interest payments are being made, and all balances are to be
increased by 5 percent
...
05

4
...

If interest is to be paid only to accounts with a balance of $1000 or more, we can
write
update account
set balance = balance * 1
...
As with
insert and delete, a nested select within an update statement may reference the relation that is being updated
...
For example, we can write the request “Pay 5 percent interest on accounts whose balance is
greater than average” as follows:
update account
set balance = balance * 1
...
We could write two update statements:
update account
set balance = balance * 1
...
05
where balance <= 10000
Note that, as we saw in Chapter 3, the order of the two update statements is important
...
3 percent interest
...

update account
set balance = case
when balance <= 10000 then balance * 1
...
06
end
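The sensitivity to statement order, and the way a single case update avoids it, can be checked numerically. The sketch below uses Python's sqlite3 module with two made-up accounts; balances_after is a hypothetical helper that applies a list of statements to a fresh copy of the data, and results are rounded to cents to hide floating-point noise.

```python
import sqlite3

def balances_after(statements):
    """Apply the statements in order to a fresh account relation."""
    con = sqlite3.connect(":memory:")
    con.execute("create table account (account_number text, balance real)")
    con.executemany("insert into account values (?, ?)",
                    [("A-1", 9900.0), ("A-2", 20000.0)])   # hypothetical data
    for statement in statements:
        con.execute(statement)
    query = "select balance from account order by account_number"
    return [round(b, 2) for (b,) in con.execute(query)]

six = "update account set balance = balance * 1.06 where balance > 10000"
five = "update account set balance = balance * 1.05 where balance <= 10000"
case_update = """
    update account
    set balance = case
                      when balance <= 10000 then balance * 1.05
                      else balance * 1.06
                  end
"""

right_order = balances_after([six, five])   # each account receives one rate
wrong_order = balances_after([five, six])   # A-1 crosses 10000 and is paid twice
with_case = balances_after([case_update])   # one pass, no ordering issue

print(right_order, wrong_order, with_case)
```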
The general form of the case statement is as follows
...
4
...

when pred n then result n
else result 0
end

The operation returns result i , where i is the ﬁrst of pred 1 , pred 2 ,
...
Case statements can be used in any place where a value is expected
...
9
...
As an
illustration, consider the following view deﬁnition:
create view loan-branch as
select branch-name, loan-number
from loan
Since SQL allows a view name to appear wherever a relation name is allowed, we can
write
insert into loan-branch
values (’Perryridge’, ’L-307’)
SQL represents this insertion by an insertion into the relation loan, since loan is the
actual relation from which the view loan-branch is constructed
...
This value is a null value
...

As we saw in Chapter 3, the view-update anomaly becomes more difﬁcult to handle when a view is deﬁned in terms of several relations
...

Under this constraint, the update, insert, and delete operations would be forbidden
on the example view all-customer that we deﬁned previously
...
4
...
5 Transactions
A transaction consists of a sequence of query and/or update statements
...
One of the following SQL statements must end the transaction:
• Commit work commits the current transaction; that is, it makes the updates
performed by the transaction become permanent in the database
...

• Rollback work causes the current transaction to be rolled back; that is, it undoes all the updates performed by the SQL statements in the transaction
...

The keyword work is optional in both statements
...
Commit is similar, in a sense, to saving changes to a document that
is being edited, while rollback is similar to quitting the edit session without saving
changes
...
The database system guarantees that in the event of some
failure, such as an error in one of the SQL statements, a power outage, or a system
crash, a transaction’s effects will be rolled back if it has not yet executed commit
work
...

For instance, to transfer money from one account to another we need to update
two account balances
...
An error
while a transaction executes one of its statements undoes the
effects of the transaction's earlier statements, so that the database is not left in a
partially updated state
...
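The funds-transfer scenario can be sketched with Python's sqlite3 module, whose connection methods commit() and rollback() play the roles of commit work and rollback work. Everything specific here is an assumption made for illustration: the account numbers, the check constraint that forbids negative balances, and the helper function transfer.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table account (account_number text primary key, "
            "balance integer check (balance >= 0))")
con.execute("insert into account values ('A-101', 500), ('A-215', 700)")
con.commit()

def transfer(con, source, target, amount):
    """Move amount between accounts; undo both updates if either fails."""
    try:
        con.execute("update account set balance = balance + ? "
                    "where account_number = ?", (amount, target))
        con.execute("update account set balance = balance - ? "
                    "where account_number = ?", (amount, source))
        con.commit()
        return True
    except sqlite3.IntegrityError:
        # The debit violated the check constraint: roll back the credit too.
        con.rollback()
        return False

def balances(con):
    return con.execute("select account_number, balance from account "
                       "order by account_number").fetchall()

failed = transfer(con, "A-101", "A-215", 800)   # would overdraw A-101
after_failure = balances(con)
succeeded = transfer(con, "A-101", "A-215", 200)
after_success = balances(con)

print(failed, after_failure)
print(succeeded, after_success)
```

After the failed transfer, A-215 does not keep the credit that had already been applied, which is the all-or-nothing behavior described above.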

If a program terminates without executing either of these commands, the updates
are either committed or rolled back
...
In many SQL implementations, by default each SQL statement is taken to be a transaction on its own, and gets
committed as soon as it is executed
...
How to turn off automatic commit depends on the speciﬁc SQL implementation
...
end
...

4
...
4
...
1

amount
3000
4000
1700

customer-name   loan-number
Jones           L-170
Smith           L-230
Hayes           L-155
borrower

The loan and borrower relations
...
These additional operations are typically used as subquery
expressions in the from clause
...
10
...
1
...
Figure 4
...
loan-number = borrower
...
loan-number = borrower
...
The attributes of the
result consist of the attributes of the left-hand-side relation followed by the attributes
of the right-hand-side relation
...
The SQL standard does not require
attribute names in such results to be unique
...

We rename the result relation of a join and the attributes of the result relation by
using an as clause, as illustrated here:
loan inner join borrower on loan
...
loan-number
as lb(loan-number, branch, amount, cust, cust-loan-num)
We rename the second occurrence of loan-number to cust-loan-num
...

Next, we consider an example of the left outer join operation:
loan left outer join borrower on loan
...
loan-number
loan-number   branch-name   amount   customer-name   loan-number
L-170         Downtown      3000     Jones           L-170
L-230         Redwood       4000     Smith           L-230

Figure 4
...
loan-number = borrower
...

loan-number
L-170
L-230
L-260

branch-name
Downtown
Redwood
Perryridge

4
...
3 The result of loan left outer join borrower on
loan
...
loan-number
...
First, compute the
result of the inner join as before
...
Figure 4
...
The tuples (L-170, Downtown, 3000) and (L-230, Redwood, 4000) join with
tuples from borrower and appear in the result of the inner join, and hence in the result
of the left outer join
...

Finally, we consider an example of the natural join operation:
loan natural inner join borrower
This expression computes the natural join of the two relations
...
Figure 4
...
The result is similar to the result of the inner join with the on condition in
Figure 4
...
However, the attribute
loan-number appears only once in the result of the natural join, whereas it appears
twice in the result of the join with the on condition
...
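The three join expressions discussed so far can be compared side by side. The sketch below loads the loan and borrower tuples shown in this section into SQLite through Python's sqlite3 module; underscores replace hyphens in attribute names, and the select * wrapper and order by clause are additions needed to run the join expressions as complete queries.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    create table loan (loan_number text, branch_name text, amount integer);
    create table borrower (customer_name text, loan_number text);
    insert into loan values ('L-170', 'Downtown', 3000),
                            ('L-230', 'Redwood', 4000),
                            ('L-260', 'Perryridge', 1700);
    insert into borrower values ('Jones', 'L-170'), ('Smith', 'L-230'),
                                ('Hayes', 'L-155');
""")

# Inner join with an on condition: loan_number appears twice in the result.
inner_rows = con.execute("""
    select * from loan inner join borrower
        on loan.loan_number = borrower.loan_number
    order by loan.loan_number
""").fetchall()

# Left outer join: the unmatched loan L-260 is kept, padded with nulls.
left_rows = con.execute("""
    select * from loan left outer join borrower
        on loan.loan_number = borrower.loan_number
    order by loan.loan_number
""").fetchall()

# Natural join: the common attribute loan_number appears only once.
natural_rows = con.execute("""
    select * from loan natural inner join borrower
    order by loan.loan_number
""").fetchall()

for name, rows in [("inner", inner_rows), ("left outer", left_rows),
                   ("natural", natural_rows)]:
    print(name, rows)
```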
10
...
10
...
Join operations take two relations and return another relation as the result
...

Each of the variants of the join operations in SQL consists of a join type and a join
condition
...
The join type deﬁnes how tuples in each
loan-number
L-170
L-230
Figure 4
...

Join types
inner join
left outer join
right outer join
full outer join
Figure 4
...
, An)

Join types and join conditions
...
Figure 4
...
The
ﬁrst join type is the inner join, and the other three are the outer joins
...

The use of a join condition is mandatory for outer joins, but is optional for inner
joins (if it is omitted, a Cartesian product results)
...
The keywords inner and outer are
optional, since the rest of the join type enables us to deduce whether the join is an
inner join or an outer join
...
The ordering of the attributes in the result of a
natural join is as follows
...

Next come all nonjoin attributes of the left-hand-side relation, and ﬁnally all nonjoin
attributes of the right-hand-side relation
...
Tuples from the right-hand-side relation that do not match any tuple in the left-hand-side relation are padded
with nulls and are added to the result of the right outer join
...
6 shows the result of this expression
...
The
ﬁrst two tuples in the result are from the inner natural join of loan and borrower
...
Hence, the tuple (L-155, null,
null, Hayes) appears in the join result
...
, An ) is similar to the natural join condition, except that the join attributes are the attributes A1 , A2 ,
...
The attributes A1 , A2 ,
...

The full outer join is a combination of the left and right outer-join types
...
4
...
6

branch-name
Downtown
Redwood
null

4
...

the left-hand-side relation that did not match with any from the right-hand-side, and
adds them to the result
...

For example, Figure 4
...
customer-name = borrower
...
The
ﬁrst is equivalent to an inner join without a join condition; the second is equivalent
to a full outer join on the “false” condition — that is, where the inner join is empty
...
7

branch-name   amount   customer-name
Downtown      3000     Jones
Redwood       4000     Smith
Perryridge    1700     null
null          null     Hayes

The result of loan full outer join borrower using(loan-number)
...
4
...
11 Data-Deﬁnition Language
In most of our discussions of SQL and relational databases, we have accepted a set of
relations as given
...

The SQL DDL allows speciﬁcation of not only a set of relations, but also information
about each relation, including
• The schema for each relation
• The domain of values associated with each attribute
• The integrity constraints
• The set of indices to be maintained for each relation
• The security and authorization information for each relation
• The physical storage structure of each relation on disk
We discuss here schema deﬁnition and domain values; we defer discussion of the
other SQL DDL features to Chapter 6
...
11
...
The full
form, character, can be used instead
...
The full form, character varying, is equivalent
...
The
full form, integer, is equivalent
...

• numeric(p, d): A ﬁxed-point number with user-speciﬁed precision
...
Thus, numeric(3,1) allows 44
...
5 or 0
...

• real, double precision: Floating-point and double-precision ﬂoating-point
numbers with machine-dependent precision
...

• date: A calendar date containing a (four-digit) year, month, and day of the
month
...
• time: The time of day, in hours, minutes, and seconds
...
It is also possible to store time zone information along with the time
...
A variant, timestamp(p), can be
used to specify the number of fractional digits for seconds (the default here
being 6)
...
45’
Dates must be speciﬁed in the format year followed by month followed by day, as
shown
...
We can use an expression of the form cast e as t to convert a character string (or string-valued expression) e to the type t, where t is one of date, time,
or timestamp
...

To extract individual ﬁelds of a date or time value d, we can use extract (ﬁeld from
d), where ﬁeld can be one of year, month, day, hour, minute, or second
...
SQL
also provides a data type called interval, and it allows computations based on dates
and times and on intervals
...
Similarly, adding an interval to, or subtracting an interval from,
a date or time gives back a date or time, respectively
...
For example, since
every small integer is an integer, a comparison x < y, where x is a small integer and
y is an integer (or vice versa), makes sense
...
A transformation of this sort is called a type coercion
...

As an illustration, suppose that the domain of customer-name is a character string
of length 20, and the domain of branch-name is a character string of length 15
...

As we discussed in Chapter 3, the null value is a member of all domains
...
Consider a tuple in the
customer relation where customer-name is null
...
In cases such
as this, we wish to forbid null values, and we do so by restricting the domain of
customer-name to exclude null values
...
Any database
modification that would cause a null to be inserted in a not null domain generates
an error diagnostic
...

In particular, it is essential to prohibit null values in the primary key of a relation
schema
...

4
...
2 Schema Deﬁnition in SQL
We deﬁne an SQL relation by using the create table command:
create table r(A1 D1 , A2 D2 ,
...
,
integrity-constraintk )
where r is the name of the relation, each Ai is the name of an attribute in the schema
of relation r, and Di is the domain type of values in the domain of attribute Ai
...
, Ajm ): The primary key speciﬁcation says that attributes Aj1 , Aj2 ,
...
The primary
key attributes are required to be non-null and unique; that is, no tuple can have
a null value for a primary key attribute, and no two tuples in the relation can
be equal on all the primary-key attributes
...

• check(P): The check clause speciﬁes a predicate P that must be satisﬁed by
every tuple in the relation
...
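The primary key and check constraints can be seen in action on a small table. The sketch below uses Python's sqlite3 module and is an assumption-laden illustration: the attribute names use underscores, the sample branches and the helper try_insert are made up, and branch_name is declared not null explicitly because SQLite, unlike the rule stated above, does not reject nulls in a primary key of this kind on its own.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    create table branch (
        branch_name char(15) not null,
        branch_city char(30),
        assets      integer,
        primary key (branch_name),
        check (assets >= 0)
    )
""")
con.execute("insert into branch values ('Perryridge', 'Horseneck', 1700000)")

def try_insert(row):
    """Attempt an insertion; report 'ok' or the constraint that was violated."""
    try:
        con.execute("insert into branch values (?, ?, ?)", row)
        return "ok"
    except sqlite3.IntegrityError as err:
        return str(err)

negative_assets = try_insert(("Downtown", "Brooklyn", -1))    # violates the check clause
duplicate_key = try_insert(("Perryridge", "Horseneck", 5))    # repeats a primary-key value
null_key = try_insert((None, "Rye", 10))                      # null in a not null attribute
valid_row = try_insert(("Redwood", "Palo Alto", 2100000))

row_count = con.execute("select count(*) from branch").fetchone()[0]
print(negative_assets, duplicate_key, null_key, valid_row, row_count, sep="\n")
```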

Figure 4
...
Note that,
as in earlier chapters, we do not attempt to model precisely the real world in the
bank-database example
...
We use customer-name as a primary key to keep our
database schema simple and short
...
Similarly, it
ﬂags an error and prevents the update if the check condition on the tuple fails
...
An attribute can be declared to be not null in the
following way:
account-number char(10) not null
1
...

4
...
8

SQL data deﬁnition for part of the bank database
...
, Ajm )
The unique speciﬁcation says that attributes Aj1 , Aj2 ,
...

However, candidate key attributes are permitted to be null unless they have explicitly
been declared to be not null
...

The treatment of nulls here is the same as that of the unique construct deﬁned in
Section 4
...
4
...
For instance, the check
clause in the create table command for relation branch checks that the value of assets
is nonnegative
...
4
...
We consider more
general forms of check conditions, as well as a class of constraints called referential
integrity constraints, in Chapter 6
...
We can use the insert command to load
data into the relation
...

To remove a relation from an SQL database, we use the drop table command
...
The command
drop table r
is a more drastic action than
delete from r
The latter retains relation r, but deletes all tuples in r
...
After r is dropped, no tuples can be inserted
into r unless it is re-created with the create table command
...
All tuples
in the relation are assigned null as the value for the new attribute
...
We can drop attributes from a relation by
the command
alter table r drop A
where r is the name of an existing relation, and A is the name of an attribute of the
relation
...

4
...
Writing queries in SQL is usually much easier than coding the same queries in a general-purpose programming
language
...
Not all queries can be expressed in SQL, since SQL does not provide the full
expressive power of a general-purpose language
...
To write such queries, we can embed SQL within a more
powerful language
...

SQL is designed so that queries written in it can be optimized automatically
and executed efﬁciently — and providing the full power of a programming
language makes automatic optimization exceedingly difﬁcult
...
Nondeclarative actions, such as printing a report, interacting with a user, or
sending the results of a query to a graphical user interface, cannot be done
from within SQL
...
For an integrated application, the
programs written in the programming language must be able to access the
database
...
A language in which SQL
queries are embedded is referred to as a host language, and the SQL structures permitted in the host language constitute embedded SQL
...
This embedded form of SQL extends the
programmer’s ability to manipulate the database even further
...

An embedded SQL program must be processed by a special preprocessor prior to
compilation
...
Then, the resulting program is compiled by the host-language compiler
...
For instance, a semicolon is used instead of END-EXEC when SQL
is embedded in C
...
Variables of the host language can be used
within embedded SQL statements, but they must be preceded by a colon (:) to distinguish them from SQL variables
...
There are, however, several important differences, as we note
here
...
The result of the
query is not yet computed
...


Consider the banking schema that we have used in this chapter
...
We can
write this query as follows:
EXEC SQL

declare c cursor for
select customer-name, customer-city
from depositor, customer, account
where depositor
...
customer-name and
account
...
account-number and
account
...
We use
this variable to identify the query in the open statement, which causes the query to
be evaluated, and in the fetch statement, which causes the values of one tuple to be
placed in host-language variables
...
The query has a host-language variable (:amount); the
query uses the value of the variable at the time the open statement was executed
...

An embedded SQL program executes a series of fetch statements to retrieve tuples
of the result
...
For our example query, we need one variable to hold the
customer-name value and another to hold the customer-city value
...
Then the statement:
EXEC SQL fetch c into :cn, :cc END-EXEC

produces a tuple of the result relation
...

A single fetch request returns only one tuple
...
Embedded SQL assists the
programmer in managing this iteration
...
When the program
executes an open statement on a cursor, the cursor is set to point to the ﬁrst tuple
of the result
...
When no further tuples remain to be processed, the
variable SQLSTATE in the SQLCA is set to ’02000’ (meaning “no data”)
...
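The declare, open, fetch, and close steps have close counterparts in most host-language database interfaces. The sketch below uses Python's sqlite3 module rather than a preprocessor-based embedded SQL, so it is an analogy, not the EXEC SQL syntax itself. The join conditions, the sample tuples, and the underscore spelling of attribute names are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table customer  (customer_name text, customer_city text);
    create table account   (account_number text, branch_name text, balance integer);
    create table depositor (customer_name text, account_number text);
    insert into customer  values ('Hayes', 'Harrison'), ('Jones', 'Harrison');
    insert into account   values ('A-102', 'Perryridge', 400), ('A-217', 'Perryridge', 750);
    insert into depositor values ('Hayes', 'A-102'), ('Jones', 'A-217');
    """
)

amount = 500                      # plays the role of the host variable :amount
c = conn.cursor()                 # counterpart of declaring the cursor c
c.execute(                        # counterpart of open: evaluated with the current amount
    """
    select customer.customer_name, customer_city
    from depositor, customer, account
    where depositor.customer_name = customer.customer_name
      and depositor.account_number = account.account_number
      and account.balance > :amount
    """,
    {"amount": amount},
)

result = []
while True:
    row = c.fetchone()            # counterpart of fetch c into :cn, :cc
    if row is None:               # this interface signals "no data" with None
        break
    cn, cc = row
    result.append((cn, cc))
c.close()                         # counterpart of close c
```

Changing amount after the call to execute has no effect on the tuples returned, which mirrors the rule that the query uses the value the variable had when the cursor was opened.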

For our example, this statement takes
the form
EXEC SQL close c END-EXEC
SQLJ, the Java embedding of SQL, provides a variation of the above scheme, where
Java iterators are used in place of cursors
...

Embedded SQL expressions for database modiﬁcation (update, insert, and delete)
do not return a result
...
A database-modification request takes the form
EXEC SQL < any valid update, insert, or delete> END-EXEC

Host-language variables, preceded by a colon, may appear in the SQL database-modification expression
...

Database relations can also be updated through cursors
...

declare c cursor for
select *
from account
where branch-name = 'Perryridge'
for update
We then iterate through the tuples by performing fetch operations on the cursor (as
illustrated earlier), and after fetching each tuple we execute the following code
update account
set balance = balance + 100
where current of c
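SQLite, reached here through Python's sqlite3 module, has no where current of clause, so the following sketch only approximates an update through a cursor. It uses each fetched tuple's rowid in place of the cursor position; that substitution, the sample tuples, and the underscore attribute names are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table account (account_number text, branch_name text, balance integer)")
conn.executemany(
    "insert into account values (?, ?, ?)",
    [("A-102", "Perryridge", 400), ("A-101", "Downtown", 500), ("A-201", "Perryridge", 900)],
)

# The rowid of each fetched tuple stands in for the cursor position.
# The ids are read first so the scan is finished before any tuple is modified.
c = conn.cursor()
c.execute("select rowid from account where branch_name = 'Perryridge'")
for (rid,) in c.fetchall():
    conn.execute("update account set balance = balance + 100 where rowid = ?", (rid,))

balances = dict(conn.execute("select account_number, balance from account"))
```

Only the Perryridge tuples are changed; the Downtown account keeps its balance.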
Embedded SQL allows a host-language program to access the database, but it provides no assistance in presenting results to the user or in generating reports
...
We discuss such tools in Chapter 5
(Section 5
...

4
...
In contrast, embedded SQL statements must be completely present
at compile time; they are compiled by the embedded SQL preprocessor
...
Preparing a dynamic SQL statement compiles it, and
subsequent uses of the prepared statement use the compiled version
...

char * sqlprog = "update account set balance = balance * 1
...

However, the syntax above requires extensions to the language or a preprocessor
for the extended language
...

In the rest of this section, we look at two standards for connecting to an SQL
database and performing queries and updates
...

To understand these standards, we need to understand the concept of SQL sessions
...
Thus, all activities of
the user or application are in the context of an SQL session
...

4
...
1 ODBC∗∗
The Open DataBase Connectivity (ODBC) standard deﬁnes a way for an application
program to communicate with a database server
...
Applications such as graphical user
interfaces, statistics packages, and spreadsheets can make use of the same ODBC API
to connect to any database server that supports ODBC
...
When the client program makes an ODBC API call, the code
in the library communicates with the server to carry out the requested action, and
fetch results
...
9 shows an example of C code using the ODBC API
...
To do
so, the program ﬁrst allocates an SQL environment, then a database connection handle
...
The program then opens
the database connection by using SQLConnect
...

int ODBCexample()
{
    RETCODE error;
    HENV env;   /* environment */
    HDBC conn;  /* database connection */
    SQLAllocEnv(&env);
    SQLAllocConnect(env, &conn);
    SQLConnect(conn, "aura
...
com", SQL_NTS, "avi", SQL_NTS,
               "avipasswd", SQL_NTS);

    {
        char branchname[80];
        float balance;
        int lenOut1, lenOut2;
        HSTMT stmt;
        SQLAllocStmt(conn, &stmt);

        /* adjacent string literals are concatenated by the C compiler */
        char * sqlquery = "select branch_name, sum (balance) "
                          "from account "
                          "group by branch_name";
        error = SQLExecDirect(stmt, sqlquery, SQL_NTS);
        if (error == SQL_SUCCESS) {
            /* bind result columns 1 and 2 to C variables */
            SQLBindCol(stmt, 1, SQL_C_CHAR, branchname, 80, &lenOut1);
            SQLBindCol(stmt, 2, SQL_C_FLOAT, &balance, 0, &lenOut2);
            /* stop at the first code other than SQL_SUCCESS, which includes the no-data code */
            while (SQLFetch(stmt) == SQL_SUCCESS) {
                printf(" %s %g\n", branchname, balance);
            }
        }
        SQLFreeStmt(stmt, SQL_DROP);
    }
    SQLDisconnect(conn);
    SQLFreeConnect(conn);
    SQLFreeEnv(env);
    return 0;
}
Figure 4
...

cluding the connection handle, the server to which to connect, the user identiﬁer,
and the password for the database
...

Once the connection is set up, the program can send SQL commands to the database
by using SQLExecDirect. C language variables can be bound to attributes of the query
result, so that when a result tuple is fetched using SQLFetch, its attribute values are
stored in corresponding C variables
...
The next argument gives the address of the variable
...
A negative value returned
for the length ﬁeld indicates that the value is null
...
On each fetch, the program stores the values
in C variables as speciﬁed by the calls on SQLBindCol and prints out these values
...
Good
programming style requires that the result of every function call must be checked to
make sure there are no errors; we have omitted most of these checks for brevity
...
The question marks are placeholders
for values which will be supplied later
...

ODBC deﬁnes functions for a variety of tasks, such as ﬁnding all the relations in the
database and ﬁnding the names and types of columns of a query result or a relation
in the database
...
The call SQLSetConnectOption(conn, SQL_AUTOCOMMIT, 0) turns
off automatic commit on connection conn, and transactions must then be committed
explicitly by SQLTransact(conn, SQL_COMMIT) or rolled back by SQLTransact(conn,
SQL_ROLLBACK)
...
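The effect of explicit commit and rollback can be sketched outside ODBC as well. The example below assumes Python's sqlite3 module in its default mode, where a transaction is opened implicitly before a modification; the calls commit() and rollback() belong to that module and are not ODBC functions, and the sample tuples are invented.

```python
import sqlite3

# Python's sqlite3 module opens a transaction implicitly before an insert,
# update, or delete, so changes stay pending until commit() or rollback().
conn = sqlite3.connect(":memory:")
conn.execute("create table account (account_number text, balance integer)")
conn.commit()

conn.execute("insert into account values ('A-101', 500)")
conn.rollback()                                   # undo the pending insert
after_rollback = conn.execute("select count(*) from account").fetchone()[0]

conn.execute("insert into account values ('A-102', 400)")
conn.commit()                                     # make the insert permanent
conn.rollback()                                   # nothing left to undo
after_commit = conn.execute("select count(*) from account").fetchone()[0]
```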
Each version deﬁnes conformance levels, which specify subsets of the functionality deﬁned by
the standard
...
Level 1 requires support
for fetching information about the catalog, such as information about what relations
are present and the types of their attributes
...

The more recent SQL standards (SQL-92 and SQL:1999) deﬁne a call level interface
(CLI) that is similar to the ODBC interface, but with some minor differences
...
13
...
(The word JDBC was originally an abbreviation for “Java Database Connectivity”, but the full form is no longer used
...
10 shows an example Java program that uses the JDBC interface
...
forName
...

public static void JDBCexample(String dbid, String userid, String passwd)
{
try
{
Class
...
jdbc
...
OracleDriver”);
Connection conn = DriverManager
...
bell-labs
...
createStatement();
try {
stmt
...
out
...
” + sqle);
}
ResultSet rset = stmt
...
next()) {
System
...
println(rset
...
getFloat(2));
}
stmt
...
close();
}
catch (SQLException sqle)
{
System
...
println(”SQLException : ” + sqle);
}
}
Figure 4
...

runs (in our example, aura
...
com), the port number it uses for communication (in our example, 2000)
...
The ﬁrst parameter also speciﬁes the protocol to be used to communicate
with the database (in our example, jdbc:oracle:thin:)
...
A JDBC driver may support multiple protocols, and we must specify one supported by both the database and the driver
...

The program then creates a statement handle on the connection and uses it to
execute an SQL statement and get back results
...
executeUpdate
executes an update statement
...
} catch {
...
prepareStatement(
”insert into account values(?,?,?)”);
pStmt
...
setString(2, ”Perryridge”);
pStmt
...
executeUpdate();
pStmt
...
executeUpdate();
Figure 4
...

catch any exceptions (error conditions) that arise when JDBC calls are made, and print
an appropriate message to the user
...
executeQuery
...
Figure 4
...

We can also create a prepared statement in which some values are replaced by “?”,
thereby specifying that actual values will be provided later
...
The database can compile the query when it is prepared,
and each time it is executed (with new values), the database can reuse the previously
compiled form of the query
...
11 shows how prepared
statements can be used
...
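The same idea of "?" placeholders can be tried without a JDBC driver. The sketch below assumes Python's sqlite3 module, which accepts the statement text and the values separately; the account numbers and the branch name containing a quote are invented values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table account (account_number text, branch_name text, balance integer)")

# One statement text with three "?" placeholders, executed with different values.
insert_account = "insert into account values (?, ?, ?)"
conn.execute(insert_account, ("A-9732", "Perryridge", 1200))
conn.execute(insert_account, ("A-9733", "Perryridge", 1200))

# Values are passed separately from the SQL text, so a quote inside a value
# is stored as data rather than parsed as SQL.
conn.execute(insert_account, ("A-9734", "O'Hare", 50))

rows = conn.execute(
    "select account_number, branch_name from account order by account_number"
).fetchall()
```

Supplying values through placeholders, rather than pasting them into the query string, also avoids having to escape special characters by hand.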
It can
create an updatable result set from a query that performs a selection and/or a projection on a database relation
...
JDBC also provides an
API to examine database schemas and to ﬁnd the types of attributes of a result set
...
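Most database interfaces offer comparable metadata facilities. As a rough illustration, assuming Python's sqlite3 module instead of JDBC, the column names of a query result are available from the cursor, and the attributes of a relation can be read from SQLite's table_info pragma. The relation and the column alias total are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table account (account_number text, branch_name text, balance integer)")

# Column names of a query result, read from the cursor after execution.
cur = conn.execute(
    "select branch_name, sum(balance) as total from account group by branch_name"
)
result_columns = [d[0] for d in cur.description]

# Names and declared types of a relation's attributes, read from SQLite's catalog pragma.
# Each row is (cid, name, type, notnull, default, pk).
relation_columns = [
    (row[1], row[2]) for row in conn.execute("pragma table_info(account)")
]
```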

4
...
We covered the basics of SQL earlier in this chapter
...

4
...
1 Schemas, Catalogs, and Environments
To understand the motivation for schemas and catalogs, consider how ﬁles are named
in a ﬁle system
...
Current generation ﬁle systems of course have a directory structure, with

...
To name a ﬁle uniquely, we must specify the full
path name of the ﬁle, for example, /users/avi/db-book/chapter4
...

Like early ﬁle systems, early database systems also had a single name space for all
relations
...
Contemporary database systems provide a three-level hierarchy for naming relations
...
SQL objects such as relations and views are contained
within a schema
...
The user must provide the user name and, usually, a secret
password for verifying the identity of the user, as we saw in the ODBC and JDBC
examples in Sections 4
...
1 and 4
...
2
...
When a user connects to a database system,
the default catalog and schema are set up for the connection; this corresponds to
the current directory being set to the user’s home directory when the user logs into
an operating system
bank-schema
...
Thus if catalog5 is the default
catalog, we can use bank-schema
...
Further, we may also omit the schema name, and the schema part of the name is again
considered to be the default schema for the connection
...

With multiple catalogs and schemas available, different applications and different users can work independently without worrying about name clashes
...
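Qualified names can be tried on a small scale. SQLite, used here through Python's sqlite3 module, offers only a two-level name (schema and relation), so the sketch approximates the three-level hierarchy rather than reproducing it. The schema name bank_schema and the stored strings are assumptions.

```python
import sqlite3

# SQLite offers only schema.relation names, so this approximates the
# catalog.schema.relation hierarchy rather than reproducing it.
conn = sqlite3.connect(":memory:")
conn.execute("attach database ':memory:' as bank_schema")

conn.execute("create table main.account (owner text)")
conn.execute("create table bank_schema.account (owner text)")   # same name, no clash
conn.execute("insert into main.account values ('default schema')")
conn.execute("insert into bank_schema.account values ('bank schema')")

qualified = conn.execute("select owner from bank_schema.account").fetchone()[0]
unqualified = conn.execute("select owner from account").fetchone()[0]   # resolves to main
```

The unqualified name is looked up in the default schema, much as a relative file name is looked up in the current directory.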

The default catalog and schema are part of an SQL environment that is set up
for each connection
...
All the usual SQL statements, including the
DDL and DML statements, operate in the context of a schema
...
Creation and
dropping of catalogs is implementation dependent and not part of the SQL standard
...
14
...

A module typically contains multiple SQL procedures
...
An extension of the SQL-92 standard language also permits procedural constructs, such as for, while, and if-then-else, and
compound SQL statements (multiple SQL statements between a begin and an end)
...
Such procedures are also called stored procedures
...

Chapter 9 covers procedural extensions of SQL as well as many other new features
of SQL:1999
...
15 Summary
• Commercial database systems do not use the terse, formal query languages
covered in Chapter 3
...
”
• SQL includes a variety of language constructs for queries on the database
...
SQL also allows ordering of query results by sorting on speciﬁed attributes
...

Views are useful for hiding unneeded information, and for collecting together
information from more than one relation into a single view
...

• SQL provides constructs for updating, inserting, and deleting information
...
That is, all the operations are carried out successfully, or none is carried out
...

• Modiﬁcations to the database may lead to the generation of null values in
tuples
...

• The SQL data deﬁnition language is used to create relations with speciﬁed
schemas
...
Further details on the SQL DDL, in particular its support for integrity
constraints, appear in Chapter 6
...
The ODBC and JDBC standards deﬁne application program interfaces to
access SQL databases from C and Java language programs
...

• We also saw a brief overview of some advanced features of SQL, such as procedural extensions, catalogs, schemas and stored procedures
...
1 Consider the insurance database of Figure 4
...
Construct the following SQL queries for this relational database
...
Find the total number of people who owned cars that were involved in accidents in 1989
...
Find the number of accidents in which the cars belonging to “John Smith”
were involved
...
Add a new accident to the database; assume any values for required attributes
...
Delete the Mazda belonging to “John Smith”
...
Update the damage amount for the car with license number “AABB2000” in
the accident with report number “AR2197” to $3000
...
2 Consider the employee database of Figure 4
...
Give an expression in SQL for each of the following queries
...
Find the names of all employees who work for First Bank Corporation
...
4
...
12

Insurance database
...
13

Employee database
...
Find the names and cities of residence of all employees who work for First
Bank Corporation
...
Find the names, street addresses, and cities of residence of all employees
who work for First Bank Corporation and earn more than $10,000
...
Find all employees in the database who live in the same cities as the companies for which they work
...
Find all employees in the database who live in the same cities and on the
same streets as do their managers
...
Find all employees in the database who do not work for First Bank Corporation
...
Find all employees in the database who earn more than each employee of
Small Bank Corporation
...
Assume that the companies may be located in several cities
...

i
...

j
...

k
...

l
...

4
...
13
...

a
...

b
...

c
...

d
...

e
...

Give an expression in SQL that is equivalent
to each of the following queries
...
ΠA (r)
b
...
r × s
d
...
5 Let R = (A, B, C), and let r1 and r2 both be relations on schema R
...

a
...

c
...

r1 ∪ r2
r1 ∩ r2
r1 − r2
ΠAB (r1 )

1

ΠBC (r2 )

4
...
Write an
expression in SQL for each of the queries below:
a
...
{< a, b, c > | < a, b > ∈ r ∧ < a, c > ∈ s}
c
...
7 Show that, in SQL, <> all is identical to not in
...
8 Consider the relational database of Figure 4
...
Using SQL, deﬁne a view consisting of manager-name and the average salary of all employees who work for
that manager
...

4
...
a1
from p, r1, r2
where p
...
a1 or p
...
a1
Under what conditions does the preceding query select values of p
...

4
...
Using a nested query in the from clause
...
Using a nested query in a having clause
...
11 Suppose that we have a relation marks(student-id, score) and we wish to assign
grades to students based on the score as follows: grade F if score < 40, grade C
if 40 ≤ score < 60, grade B if 60 ≤ score < 80, and grade A if 80 ≤ score
...
Display the grade for each student, based on the marks relation
...
Find the number of students with each grade
...
12 SQL-92 provides an n-ary operation called coalesce, which is deﬁned as follows:
coalesce(A1 , A2 ,
...
, An ,
and returns null if all of A1 , A2 ,
...
Show how to express the coalesce operation using the case operation
...
13 Let a and b be relations with the schemas A(name, address, title) and B(name, address, salary), respectively
...

Make sure that the result relation does not contain two copies of the attributes
name and address, and that the solution is correct even if some tuples in a and b
have null values for attributes name or address
...
14 Give an SQL schema deﬁnition for the employee database of Figure 4
...
Choose
an appropriate domain for each attribute and an appropriate primary key for
each relation schema
...
15 Write check conditions for the schema you deﬁned in Exercise 4
...
Every employee works for a company located in the same city as the city in
which the employee lives
...
No employee earns a salary higher than that of his manager
...
16 Describe the circumstances in which you would choose to use embedded SQL
rather than SQL alone or only a general-purpose programming language
...
[1976]
...
[1975] and Chamberlin and Boyce [1974]
...
The IBM Systems Application Architecture deﬁnition of SQL is deﬁned by IBM
[1987]
...

Textbook descriptions of the SQL-92 language include Date and Darwen [1997],
Melton and Simon [1993], and Cannan and Otten [1993]
...
More information on SQLJ
and SQLJ software can be obtained from http://www
...
org
...

Eisenberg and Melton [1999] provide an overview of SQL:1999
...
Part 1 (SQL/Framework),
gives an overview of the other parts
...
Part 3 (SQL/CLI) describes the Call-Level Interface
...
The standard is useful to database implementers but is very hard
to read
...
ansi
...

Many database products support SQL features beyond those speciﬁed in the standards, and may not support some features of the standard
...

http://java
...
com/docs/books/tutorial is an excellent source for more (and up-to-date) information on JDBC, and on Java in general
...
The ODBC API is described in Microsoft
[1997] and Sanders [1998]
...
Bibliographic references on these matters appear in
that chapter
...
Chapter 5

Other Relational Languages

...
In this chapter, we study two more languages: QBE and Datalog
...
QBE and its variants
are widely used in database systems on personal computers
...
Although not used commercially at present, Datalog has been used in several research database systems
...
Keep in mind that individual implementations of a
language may differ in details, or may support only a subset of the full language
...
While these are not strictly speaking languages, they form the main
interface to a database for many users
...

5
...
The QBE database system was
developed at IBM’s T
...
Watson Research Center in the early 1970s
...

Today, many database systems for personal computers support variants of the QBE language
...
It has two
distinctive features:
1
...
A query in a one-dimensional
language (for example, SQL) can be written in one (possibly long) line
...
(There is a
one-dimensional version of QBE, but we shall not consider it in our discussion)
...
QBE queries are expressed “by example
...

The system generalizes this example to compute the answer to the query
...

We express queries in QBE by skeleton tables
...
1
...
An example row consists of constants and example elements, which are domain
variables
...

[Figure: QBE skeleton tables for the bank relations branch, customer, loan, borrower, account, and depositor; only part of the attribute columns survives in this extract.]

Figure 5
...


5
...
1 Queries on One Relation
Returning to our ongoing bank example, to ﬁnd all loan numbers at the Perryridge
branch, we bring up the skeleton for the loan relation, and ﬁll it in as follows:
loan

loan-number
P
...
For each such tuple, the system assigns the value
of the loan-number attribute to the variable x
...
appears in the loan-number column next to
the variable x
...
As a result,
if a variable does not appear more than once in a query, it may be omitted
...

branch-name
Perryridge

amount

QBE (unlike SQL) performs duplicate elimination automatically
...
after the P
...
ALL
...
in
every ﬁeld
...
in
the column headed by the relation name:
loan
P
...

branch-name

amount
>700

Comparisons can involve only one arithmetic expression on the right-hand side of
the comparison operation (for example, > ( x + y − 20))
...
The space on the left-hand side of the comparison operation must be blank
...

Note that requiring the left-hand side to be blank implies that we cannot compare
two distinct named variables
...

As yet another example, consider the query “Find the names of all branches that
are not located in Brooklyn
...

branch-city
¬ Brooklyn

assets

The primary purpose of variables in QBE is to force values of certain tuples to have
the same value on certain attributes
...
x
x

To execute this query, the system ﬁnds all pairs of tuples in borrower that agree on
the loan-number attribute, where the value for the customer-name attribute is “Smith”
for one tuple and “Jones” for the other
...

In the domain relational calculus, the query would be written as
{< l > | ∃ x (< x, l > ∈ borrower ∧ x = “Smith”)
∧ ∃ x (< x, l > ∈ borrower ∧ x = “Jones”)}
As another example, consider the query “Find all customers who live in the same
city as Jones”:
customer

customer-name
P
...
1
...
The connections among the various relations are achieved through variables that force certain tuples to have the same value
on certain attributes
...
This query can be written as

...
y

amount

loan-number
x

To evaluate the preceding query, the system ﬁnds tuples in loan with “Perryridge”
as the value for the branch-name attribute
...
It
displays the values for the customer-name attribute
...
x
customer-name
x

account-number

loan-number

Now consider the query “Find the names of all customers who have an account
at the bank, but who do not have a loan from the bank
...
x
customer-name
x

account-number

loan-number

Compare the preceding query with our earlier query “Find the names of all customers who have both an account and a loan at the bank
...
This difference, however,
has a major effect on the processing of the query
...
There is a tuple in the depositor relation whose customer-name is the domain
variable x
...
There is no tuple in the borrower relation whose customer-name is the same as
in the domain variable x
...
”
The fact that we placed the ¬ under the relation name, rather than under an attribute name, is important
...
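For comparison, the query "customers who have an account but no loan" can be phrased in SQL with not exists, which mirrors the two conditions read off the QBE tables. The sketch below runs it through Python's sqlite3 module; the sample tuples and underscore attribute names are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table depositor (customer_name text, account_number text);
    create table borrower  (customer_name text, loan_number text);
    insert into depositor values ('Hayes', 'A-102'), ('Johnson', 'A-101'), ('Jones', 'A-217');
    insert into borrower  values ('Hayes', 'L-15'), ('Jones', 'L-17'), ('Adams', 'L-16');
    """
)

# "There is a depositor tuple for x, and there is no borrower tuple for x."
only_account = [
    name
    for (name,) in conn.execute(
        """
        select distinct customer_name
        from depositor
        where not exists (select * from borrower
                          where borrower.customer_name = depositor.customer_name)
        order by customer_name
        """
    )
]
```

The negation applies to the existence of a whole borrower tuple, not to the value of a single attribute.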
Thus, to
ﬁnd all customers who have at least two accounts, we write

depositor

customer-name
P
...
”

5
...
3 The Condition Box
At times, it is either inconvenient or impossible to express all the constraints on the
domain variables within the skeleton tables
...
QBE allows logical expressions to appear in a condition box
...

For example, the query “Find the loan numbers of all loans made to Smith, to Jones
(or to both jointly)” can be written as
borrower

customer-name
n

loan-number
P
...
in multiple rows
...
in multiple rows are sometimes hard to
understand, and are best avoided
...
1
...
” We want to include an “x = Jones” constraint in this query
...

conditions
x ≥ 1300
x ≤ 1500

branch-name

balance
x

...
” This query can be written as
branch

branch-name
P
...
We can
write the query “Find all branches that have assets that are at least twice as large as
the assets of one of the branches located in Brooklyn” much as we did in the preceding query, by modifying the condition box to

conditions
y ≥ 2* z
To find all account numbers with a balance between $1300 and $2000,
but not exactly $1500, we write
account

account-number
P
...
To ﬁnd all branches that are located in either Brooklyn or Queens,
we write

branch

branch-name
P
...
1
...
If the result of a query
includes attributes from several relation schemas, we need a mechanism to display
the desired result in a single table
...
We print the desired
result by including the command P
...

As an illustration, consider the query “Find the customer-name, account-number, and
balance for all accounts at the Perryridge branch
...
Join depositor and account
...
Project customer-name, account-number, and balance
...
Create a skeleton table, called result, with attributes customer-name, account-number, and balance
...

2
...

The resulting query is
account

account-number
y

depositor

result
P
...
1
...

We gain this control by inserting either the command AO
...
(descending order) in the appropriate column
...
AO
...

We specify the order in which the sorting should be carried out by including, with
each sort operator (AO or DO), an integer surrounded by parentheses
...
AO(1)
...
DO(2)
...
The command P
...
speciﬁes that the account number should be sorted ﬁrst;
the command P
...
speciﬁes that the balances for each account should then be
sorted
...
1
...
We must postﬁx these operators with ALL
...
The ALL
...
Thus, to ﬁnd

the total balance of all the accounts maintained at the Perryridge branch, we write
account

account-number

branch-name
Perryridge

balance
P
...
ALL
...
Thus, to
ﬁnd the total number of customers who have an account at the bank, we write
depositor

customer-name

account-number

P
...
UNQ
...

operator, which is analogous to SQL’s group by construct
...
G
...
AVG
...
x

The average balance is computed on a branch-by-branch basis
...
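The QBE aggregate operators have direct SQL counterparts. The sketch below, which assumes Python's sqlite3 module and invented sample tuples, pairs a sum that retains duplicates, a count over distinct values, and a grouped average with the QBE forms they correspond to. The helper scalar is a hypothetical convenience function.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table account   (account_number text, branch_name text, balance integer);
    create table depositor (customer_name text, account_number text);
    insert into account   values ('A-102', 'Perryridge', 400), ('A-201', 'Perryridge', 900),
                                 ('A-101', 'Downtown', 500);
    insert into depositor values ('Hayes', 'A-102'), ('Hayes', 'A-201'), ('Johnson', 'A-101');
    """
)

def scalar(sql):
    """Run a query that returns a single value and return that value."""
    return conn.execute(sql).fetchone()[0]

# Sum with ALL: every Perryridge balance is added, duplicates included.
total_perryridge = scalar("select sum(balance) from account where branch_name = 'Perryridge'")

# Count with UNQ: each customer name is counted once.
customer_count = scalar("select count(distinct customer_name) from depositor")

# G. on branch-name with an average on balance: one average per branch.
average_by_branch = conn.execute(
    "select branch_name, avg(balance) from account group by branch_name order by branch_name"
).fetchall()
```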

in the P
...
ALL
...
If we wish to display the branch names in ascending order, we replace P
...
by
P
...
G
...
ALL
...
G
...
UNQ
...
UNQ
...
Thus, CNT
...
w is the number of distinct branches in Brooklyn
...

• The customer whose name is x has an account at the branch
...
UNQ
...
If CNT
...
z = CNT
...
w, then customer x must have an account
at all of the branches located in Brooklyn
...
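The counting argument above carries over to SQL: a customer qualifies when the number of distinct Brooklyn branches at which the customer has an account equals the number of distinct Brooklyn branches overall. The following sketch assumes Python's sqlite3 module; the branches, accounts, and customers are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table branch    (branch_name text, branch_city text);
    create table account   (account_number text, branch_name text);
    create table depositor (customer_name text, account_number text);
    insert into branch    values ('Brighton', 'Brooklyn'), ('Downtown', 'Brooklyn'),
                                 ('Perryridge', 'Horseneck');
    insert into account   values ('A-1', 'Brighton'), ('A-2', 'Downtown'),
                                 ('A-3', 'Brighton'), ('A-4', 'Perryridge');
    insert into depositor values ('Johnson', 'A-1'), ('Johnson', 'A-2'),
                                 ('Smith', 'A-3'), ('Smith', 'A-4');
    """
)

# w: distinct Brooklyn branches overall; z: distinct Brooklyn branches at which
# customer x has an account. x qualifies when the two counts are equal.
customers = [
    name
    for (name,) in conn.execute(
        """
        select d.customer_name
        from depositor d
        join account a on a.account_number = d.account_number
        join branch  b on b.branch_name = a.branch_name
        where b.branch_city = 'Brooklyn'
        group by d.customer_name
        having count(distinct b.branch_name) =
               (select count(distinct branch_name) from branch
                where branch_city = 'Brooklyn')
        order by d.customer_name
        """
    )
]
```

Smith has two accounts, but only one of them is at a Brooklyn branch, so only Johnson is returned.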

5
...
7 Modiﬁcation of the Database
In this section, we show how to add, remove, or change information in QBE
...
1
...
1 Deletion
Deletion of tuples from a relation is expressed in much the same way as a query
...
in place of P
...
When we delete information in only
some of the columns, null values, speciﬁed by −, are inserted
...
command operates on only one relation
...
operator for each relation
...

customer
D
...
• Delete the branch-city value of the branch whose name is “Perryridge
...

assets

Thus, if before the delete operation the branch relation contains the tuple
(Perryridge, Brooklyn, 50000), the delete results in the replacement of the preceding tuple with the tuple (Perryridge, −, 50000)
...

loan
D
...

branch-name

customer-name

amount
x

loan-number
y

conditions
x = (≥ 1300 and ≤ 1500)
Note that to delete loans we must delete tuples from both the loan and borrower relations
...

account
D
...

branch

account-number
y
customer-name

branch-name
x

branch-name
x

balance

account-number
y
branch-city
Brooklyn

assets

Note that, in expressing a deletion, we can reference relations other than those from
which we are deleting information
...
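A deletion that consults a second relation looks as follows in SQL. This is a sketch under assumptions: it uses Python's sqlite3 module, invented sample tuples, and a request (delete all accounts at branches located in Brooklyn) chosen to match the skeleton tables shown above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table branch  (branch_name text, branch_city text, assets integer);
    create table account (account_number text, branch_name text, balance integer);
    insert into branch  values ('Brighton', 'Brooklyn', 7100000),
                               ('Perryridge', 'Horseneck', 1700000);
    insert into account values ('A-201', 'Brighton', 900), ('A-102', 'Perryridge', 400);
    """
)

# The delete targets account only; branch is consulted but not modified.
conn.execute(
    """
    delete from account
    where branch_name in (select branch_name from branch
                          where branch_city = 'Brooklyn')
    """
)

remaining_accounts = conn.execute("select account_number from account").fetchall()
branch_count = conn.execute("select count(*) from branch").fetchone()[0]
```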
1
...
2 Insertion
To insert data into a relation, we either specify a tuple to be inserted or write a query
whose result is a set of tuples to be inserted
...

operator in the query expression
...

The simplest insert is a request to insert one tuple
...
We write

account
I
...
To insert information into the branch relation about a new branch with name “Capital” and city
“Queens,” but with a null asset value, we write
branch
I
...

Consider again the situation where we want to provide as a gift, for all loan customers of the Perryridge branch, a new $200 savings account for every loan account
that they have, with the loan number serving as the account number for the savings
account
...

account-number
x

depositor
I
...
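The gift-account request inserts the result of a query rather than a single tuple. In SQL this is an insert with a select, sketched here with Python's sqlite3 module; the sample loans and borrowers are invented, and attribute names use underscores.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table loan      (loan_number text, branch_name text, amount integer);
    create table borrower  (customer_name text, loan_number text);
    create table account   (account_number text, branch_name text, balance integer);
    create table depositor (customer_name text, account_number text);
    insert into loan     values ('L-15', 'Perryridge', 1500), ('L-11', 'Round Hill', 900);
    insert into borrower values ('Hayes', 'L-15'), ('Smith', 'L-11');
    """
)

# One new account per Perryridge loan, numbered after the loan, with balance 200.
conn.execute(
    """
    insert into account
    select loan_number, branch_name, 200
    from loan
    where branch_name = 'Perryridge'
    """
)
# The borrower of each such loan becomes the depositor of the new account.
conn.execute(
    """
    insert into depositor
    select customer_name, borrower.loan_number
    from borrower, loan
    where borrower.loan_number = loan.loan_number
      and loan.branch_name = 'Perryridge'
    """
)

new_accounts = conn.execute("select * from account").fetchall()
new_depositors = conn.execute("select * from depositor").fetchall()
```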

5
...
7
...
For this purpose, we use the U
...
As we could
for insert and delete, we can choose the tuples to be updated by using a query
...

Suppose that we want to update the asset value of the Perryridge branch to
$10,000,000
...
10000000


The preceding query updates the assets of the Perryridge branch to $10,000,000,
regardless of the old value
...
Suppose that interest payments are being
made, and all balances are to be increased by 5 percent
...
x * 1
...
05
...
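An update whose new value depends on the old one, such as the 5 percent interest payment, can be sketched in SQL as follows. The example assumes Python's sqlite3 module, a balance attribute declared as real, and two invented accounts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table account (account_number text, balance real)")
conn.executemany("insert into account values (?, ?)", [("A-101", 500.0), ("A-102", 400.0)])

# The new balance is computed from the old one, tuple by tuple.
conn.execute("update account set balance = balance * 1.05")

balances = dict(conn.execute("select account_number, balance from account"))
```

Because the balances are floating-point values, comparisons against the expected amounts (525 and 420) are best made with a small tolerance.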
1
...
While
the original QBE was designed for a text-based display environment, Access QBE is
designed for a graphical display environment, and accordingly is called graphical
query-by-example (GQBE)
...
2

An example query in Microsoft Access QBE
...
2 shows a sample GQBE query
...
” Section 5
...
4 showed how it is expressed in QBE
...
A more signiﬁcant difference is that
the graphical version of QBE uses a line linking attributes of two tables, instead of a
shared variable, to specify a join condition
...
In the example in Figure 5
...
The attribute account-number is
shared between the two selected tables, and the system automatically inserts a link
between the two tables
...
The link can also be
speciﬁed to denote a natural outer-join, instead of a natural join
...
in the table
...

Queries involving group by and aggregation can be created in Access as shown in
Figure 5
...
The query in the ﬁgure ﬁnds the name, street, and city of all customers
who have more than one account at the bank; we saw the QBE version of the query
earlier in Section 5
...
6
...
3

An aggregation query in Microsoft Access QBE
...
are noted in the design grid
...
SQL has a similar requirement
...

Queries are created through a graphical user interface, by ﬁrst selecting tables
...
Selection conditions, grouping and aggregation can then be speciﬁed
on the attributes in the design grid
...

5
...
As in the relational calculus, a user describes the information desired
without giving a speciﬁc procedure for obtaining that information
...
However, the meaning of Datalog programs is deﬁned
in a purely declarative manner, unlike the more procedural semantics of Prolog, so
Datalog simpliﬁes writing simple queries and makes query optimization easier
...
2
...
Before presenting a formal deﬁnition
of Datalog rules and their formal meaning, we consider examples
...
The symbol :– is read as “if,” and the comma separating
the “account(A, “Perryridge”, B)” from “B > 700” is read as “and
...
4
...
5
...
5
...
4

balance
500
700
400
350
900
700
750

The account relation
...
Each rule deﬁnes
a set of tuples that the view relation must contain
...
The following Datalog
program speciﬁes the interest rates for accounts:
interest-rate(A, 5) :– account(A, N , B), B < 10000
interest-rate(A, 6) :– account(A, N , B), B >= 10000
The program has two rules deﬁning a view relation interest-rate, whose attributes are
the account number and the interest rate
...
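To make the two rules concrete, here is a minimal Python sketch that evaluates them over a small set of account facts. The account numbers and balances are illustrative assumptions, not values from the text. Each rule is written as a set comprehension, and the view relation is taken to be the union of what the individual rules derive.

```python
# Facts for account(account_number, branch_name, balance).
# These tuples are made up for illustration.
account = {
    ("A-1", "Perryridge", 400),
    ("A-2", "Perryridge", 10000),
    ("A-3", "Downtown", 25000),
}

# interest-rate(A, 5) :- account(A, N, B), B < 10000
rule_low = {(a, 5) for (a, n, b) in account if b < 10000}

# interest-rate(A, 6) :- account(A, N, B), B >= 10000
rule_high = {(a, 6) for (a, n, b) in account if b >= 10000}

# A view defined by several rules contains every tuple derived by any of them.
interest_rate = rule_low | rule_high

for fact in sorted(interest_rate):
    print(fact)
```

Because the two conditions are complementary, every account receives exactly one interest rate in this sketch.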

Datalog rules can also use negation
...
Thus, Datalog rules are compact, compared to SQL
account-number
A-201
A-217
Figure 5
...

However, when relations have a large number of attributes, or the order or
number of attributes of relations may change, the positional notation can be cumbersome and error prone
...
In such a system, the Datalog rule
deﬁning v1 can be written as
v1(account-number A, balance B) :–
account(account-number A, branch-name “Perryridge”, balance B),
B > 700
Translation between the two forms can be done without signiﬁcant effort, given the
relation schema
...
2
...
2
...
We use the same conventions
as in the relational algebra for denoting relation names, attribute names, and constants (such as numbers or quoted strings)
...
Examples of constants are 4, which is a number, and “John,” which is a string;
X and Name are variables
...
, tn )
where p is the name of a relation with n attributes, and t1 , t2 ,
...
A negative literal has the form
not p(t1, t2, ..., tn)
...
Here is an example of a literal:
account(A, “Perryridge”, B)
Literals involving arithmetic operations are treated specially
...

But what does this notation mean for arithmetic operations such as “>”? The relation > (conceptually) contains tuples of the form (x, y) for every possible pair of
values x, y such that x > y
...
Clearly,
the (conceptual) relation > is inﬁnite
...
For example, A = B + C stands conceptually for +(B, C, A), where the relation + contains every tuple (x, y, z) such that
z = x + y
...
5
...
, vn )
and denotes that the tuple (v1 , v2 ,
...
A set of facts for a relation
can also be written in the usual tabular notation
...
Rules are built
out of literals and have the form
p(t1 , t2 ,
...
, Ln
where each Li is a (positive or negative) literal
...
, tn ) is referred
to as the head of the rule, and the rest of the literals in the rule constitute the body of
the rule
...
As mentioned earlier, there may be several rules deﬁning a
relation
...
6 shows a Datalog program that deﬁnes the interest on each account in
the Perryridge branch
...
It
uses the relation account and the view relation interest-rate
...

A view relation v1 is said to depend directly on a view relation v2 if v2 is used
in the expression deﬁning v1
...
Relation interest-rate in turn depends
directly on account
...
, in , for some n, such that v1 depends directly on i1 , i1 depends directly on i2 , and so on till in−1 depends on in
...
6, since we have a chain of dependencies from interest
to interest-rate to account, relation interest also depends indirectly on account
...

A view relation v is said to be recursive if it depends on itself
...

Consider the program in Figure 5
...
Here, the view relation empl depends on itself
(because of the second rule), and is therefore recursive
...
6 is nonrecursive
...
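The dependency definitions above can be checked mechanically. The following Python sketch records, for each rule, the head relation and the relations used in its body, and then follows direct dependencies transitively. The body lists for the interest program are an assumption based on the description in the text (interest uses perryridge-account and interest-rate, which both use account); the empl rules are the ones shown in the text.

```python
def depends_directly(rules):
    """Map each view relation to the set of relations used in its rule bodies."""
    graph = {}
    for head, body in rules:
        graph.setdefault(head, set()).update(body)
    return graph


def depends_on(graph, start):
    """All relations reachable from `start` through one or more direct dependencies."""
    seen, stack = set(), list(graph.get(start, ()))
    while stack:
        rel = stack.pop()
        if rel not in seen:
            seen.add(rel)
            stack.extend(graph.get(rel, ()))
    return seen


def is_recursive(graph, view):
    """A view is recursive if it depends (directly or indirectly) on itself."""
    return view in depends_on(graph, view)


# Assumed shape of the nonrecursive interest program.
interest_program = depends_directly([
    ("interest", ["perryridge-account", "interest-rate"]),
    ("perryridge-account", ["account"]),
    ("interest-rate", ["account"]),
    ("interest-rate", ["account"]),
])

# The recursive empl program from the text.
empl_program = depends_directly([
    ("empl", ["manager"]),
    ("empl", ["manager", "empl"]),
])

print(depends_on(interest_program, "interest"))
print(is_recursive(interest_program, "interest"), is_recursive(empl_program, "empl"))
```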

interest-rate(A, 5) :– account(A, N , B), B < 10000
...

Figure 5
...


empl(X, Y ) :– manager(X, Z), empl(Z, Y )
...
7

Recursive Datalog program
...
2
...
For now, we consider only
programs that are nonrecursive
...
2
...
We deﬁne the semantics of a program by starting with the semantics of a single rule
...
2
...
1 Semantics of a Rule
A ground instantiation of a rule is the result of replacing each variable in the rule
by some constant
...
Ground instantiations are often
simply called instantiations
...

A rule usually has many possible instantiations
...

Suppose that we are given a rule R,
p(t1 , t2 ,
...
, Ln
and a set of facts I for the relations used in the rule (I can also be thought of as a
database instance)
...
, vn ) :– l1 , l2 ,
...
, vi,ni ) or of the form not qi (vi,1 ,
vi,2 ,
...

We say that the body of rule instantiation R is satisﬁed in I if
1
...
, vi,ni ) in the body of R , the set of facts I
contains the fact q(vi,1 ,
...

2
...
, vj,nj ) in the body of R , the set of facts
I does not contain the fact qj (vj,1 ,
...

account-number
A-201
A-217
Figure 5
...

We deﬁne the set of facts that can be inferred from a given set of facts I using rule
R as
infer(R, I) = {p(t1 ,
...
, tni ) is the head of R , and
the body of R is satisﬁed in I}
...
, Rn }, we deﬁne
infer(R, I) = infer(R1 , I) ∪ infer (R2 , I) ∪
...
4
...

The fact account(“A-217”, “Perryridge”, 750) is in the set of facts I
...
Hence, the
body of the rule instantiation is satisﬁed in I
...
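As a rough illustration of infer(R, I) for a single rule, the Python sketch below uses the rule for v1 given earlier, v1(A, B) :– account(A, “Perryridge”, B), B > 700. Facts are represented as tuples whose first component is the relation name, which is a representation chosen for this sketch. Only the fact account(“A-217”, “Perryridge”, 750) comes from the text; the other account facts are illustrative assumptions.

```python
def infer_v1(facts):
    """Heads of all instantiations of the v1 rule whose body is satisfied in `facts`."""
    inferred = set()
    for rel, *args in facts:
        if rel != "account":
            continue
        a, branch, b = args
        # Body: the positive literal account(A, "Perryridge", B) and the
        # arithmetic literal B > 700 must both hold for the chosen constants.
        if branch == "Perryridge" and b > 700:
            inferred.add(("v1", a, b))
    return inferred


I = {
    ("account", "A-217", "Perryridge", 750),   # fact mentioned in the text
    ("account", "A-201", "Perryridge", 900),   # assumed for illustration
    ("account", "A-102", "Perryridge", 400),   # assumed for illustration
    ("account", "A-101", "Downtown", 500),     # assumed for illustration
}

print(sorted(infer_v1(I)))
```

The sketch enumerates only instantiations that match a stored account fact, which is sufficient here because any other instantiation cannot satisfy the positive literal in the body.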
8
...
2
...
2 Semantics of a Program
When a view relation is deﬁned in terms of another view relation, the set of facts in
the ﬁrst view depends on the set of facts in the second one
...
Hence, we can layer the view relations in the following way,
and can use the layering to deﬁne the semantics of the program:
• A relation is in layer 1 if all relations used in the bodies of rules deﬁning it are
stored in the database
...

• In general, a relation p is in layer i + 1 if (1) it is not in layers 1, 2,
...
, i
...
6
...
9
...
Relation interest-rate is

5
...
9

account
Layering of view relations
...

Relation perryridge-account is similarly in layer 1
...

We can now deﬁne the semantics of a Datalog program in terms of the layering of
view relations
...
, n
...

• We deﬁne I0 to be the set of facts stored in the database, and deﬁne I1 as
I1 = I0 ∪ infer (R1 , I0 )
• We proceed in a similar fashion, deﬁning I2 in terms of I1 and R2 , and so on,
using the following deﬁnition:
Ii+1 = Ii ∪ infer (Ri+1 , Ii )
• Finally, the set of facts in the view relations deﬁned by the program (also called
the semantics of the program) is given by the set of facts In corresponding to
the highest layer n
...
6, I0 is the set of facts in the database, and I1 is the set
of facts in the database along with all facts that we can infer from I0 using the rules for
relations interest-rate and perryridge-account
...
The semantics of the program — that is, the set of those facts that are
in each of the view relations— is deﬁned as the set of facts I2
...
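The layer-by-layer definition can be sketched in Python as follows. Facts are tuples tagged with a relation name, and each layer has its own inference function. The rule used for interest, interest(A, I) :– perryridge-account(A, B), interest-rate(A, R), I = B * R / 100, is an assumption consistent with the description in the text, and the account facts are illustrative.

```python
def infer_layer1(facts):
    """Layer 1: interest-rate and perryridge-account, defined only from account."""
    out = set()
    for f in facts:
        if f[0] == "account":
            _, a, branch, bal = f
            out.add(("interest-rate", a, 5 if bal < 10000 else 6))
            if branch == "Perryridge":
                out.add(("perryridge-account", a, bal))
    return out


def infer_layer2(facts):
    """Layer 2: interest, defined from the two layer-1 views (assumed rule)."""
    out = set()
    for p in facts:
        for r in facts:
            if (p[0] == "perryridge-account" and r[0] == "interest-rate"
                    and p[1] == r[1]):
                out.add(("interest", p[1], p[2] * r[2] / 100))
    return out


# I0: facts stored in the database (illustrative values).
I0 = {
    ("account", "A-217", "Perryridge", 750),
    ("account", "A-101", "Downtown", 500),
}
I1 = I0 | infer_layer1(I0)   # I1 = I0 ∪ infer(R1, I0)
I2 = I1 | infer_layer2(I1)   # I2 = I1 ∪ infer(R2, I1)

print(sorted(f for f in I2 if f[0] == "interest"))
```

Evaluating layer 2 against I0 instead of I1 would derive nothing, since the layer-1 views would still be empty; this is why the layers are processed in order.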
5
...
View expansion
can be used with nonrecursive Datalog views as well; conversely, the layering technique described here can also be used with relational-algebra views
...
5
...
2
...
Consider the
rule
gt(X, Y ) :– X > Y
Since the relation defining > is infinite, this rule would generate an infinite number
of facts for the relation gt, and computing them would, correspondingly, take an infinite
amount of time and space
...
Consider the rule:
not-in-loan(L, B, A) :– not loan(L, B, A)
The idea is that a tuple (loan-number, branch-name, amount) is in view relation not-in-loan if the tuple is not present in the loan relation
...

Finally, if we have a variable in the head that does not appear in the body, we may
get an inﬁnite number of facts where the variable is instantiated to different values
...
Every variable that appears in the head of the rule also appears in a nonarithmetic positive literal in the body of the rule
...
Every variable appearing in a negative literal in the body of the rule also appears in some positive literal in the body of the rule
...
The conditions can be weakened somewhat to allow variables in the head to appear only in an arithmetic literal in the body
in some cases
...
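The two safety conditions stated above lend themselves to a simple syntactic check. The Python sketch below represents a rule by the variables in its head and a list of body literals, each tagged as "pos" (nonarithmetic positive), "neg" (negative), or "arith" (arithmetic). This encoding is an assumption made for the sketch. It also takes the conservative reading that only nonarithmetic positive literals bind variables, so it does not implement the weakened conditions mentioned in the text.

```python
def is_safe(head_vars, body):
    """Check the two safety conditions for one rule.

    body is a list of (kind, variables) pairs, kind in {"pos", "neg", "arith"}.
    """
    # Variables bound by nonarithmetic positive literals in the body.
    bound = set()
    for kind, variables in body:
        if kind == "pos":
            bound |= set(variables)

    # Condition 1: every head variable appears in such a literal.
    if not set(head_vars) <= bound:
        return False

    # Condition 2: every variable of a negative literal is also bound.
    for kind, variables in body:
        if kind == "neg" and not set(variables) <= bound:
            return False
    return True


# gt(X, Y) :- X > Y
print(is_safe(["X", "Y"], [("arith", ["X", "Y"])]))
# not-in-loan(L, B, A) :- not loan(L, B, A)
print(is_safe(["L", "B", "A"], [("neg", ["L", "B", "A"])]))
# v1(A, B) :- account(A, "Perryridge", B), B > 700
print(is_safe(["A", "B"], [("pos", ["A", "B"]), ("arith", ["B"])]))
```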

5
...
5 Relational Operations in Datalog
Nonrecursive Datalog expressions without arithmetic operations are equivalent in
expressive power to expressions using the basic operations in relational algebra (∪, −,
×, σ, Π and ρ)
...
Rather, we shall show
through examples how the various relational-algebra operations can be expressed in
Datalog
...

We perform
projections simply by using only the required attributes in the head of the rule
...
, Xn , Y1 , Y2 ,
...
, Xn ), r2 (Y1 , Y2 ,
...
, Xn , Y1 , Y2 ,
...

We form the union of two relations r1 and r2 (both of arity n) in this way:
query(X1, X2, ..., Xn) :– r1(X1, X2, ..., Xn)
query(X1, X2, ..., Xn) :– r2(X1, X2, ..., Xn)
We form the set difference of two relations r1 and r2 in this way:
query(X1, X2, ..., Xn) :– r1(X1, X2, ..., Xn), not r2(X1, X2, ..., Xn)
...
A relation can occur more than once in the rule body, but instead
of renaming to give distinct names to the relation occurrences, we can use different
variable names in the different occurrences
...
We leave this demonstration
as an exercise for you to carry out
...
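As an informal cross-check of these correspondences, the Python sketch below evaluates Datalog-style rules for projection, selection, union, and set difference over two small binary relations and compares each result with the corresponding set operation. The relations r1 and r2 and their contents are made up for illustration, and the rules in the comments are written for arity 2.

```python
r1 = {(1, "a"), (2, "b"), (3, "c")}
r2 = {(2, "b"), (4, "d")}

# Projection:  query(X1) :- r1(X1, X2)
projection = {(x1,) for (x1, x2) in r1}

# Selection:   query(X1, X2) :- r1(X1, X2), X1 > 1
selection = {(x1, x2) for (x1, x2) in r1 if x1 > 1}

# Union:       query(X1, X2) :- r1(X1, X2)
#              query(X1, X2) :- r2(X1, X2)
union = {t for t in r1} | {t for t in r2}

# Difference:  query(X1, X2) :- r1(X1, X2), not r2(X1, X2)
difference = {t for t in r1 if t not in r2}

print(sorted(projection))
print(sorted(selection))
print(sorted(union))
print(sorted(difference))
```

Note that the difference rule is safe: both variables of the negative literal also appear in the positive literal r1(X1, X2).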

Certain extensions to Datalog support the extended relational update operations
of insertion, deletion, and update
...
Some systems allow the use of + or − in rule heads to
denote relational insertion and deletion
...
Again, there is no standard syntax for this operation
...
2
...
For example, consider employees in an organization
...
Each manager manages a set of people who report to him or her
...
5
...
10

Datalog-Fixpoint procedure
...
Thus employees may be organized in a structure similar to a
tree
...

Suppose now that we want to ﬁnd out which employees are supervised, directly
or indirectly by a given manager — say, Jones
...
People often write programs to manipulate tree data structures by recursion
...
The
people supervised by Jones are (1) people whose manager is Jones and (2) people
whose manager is supervised by Jones
...

We can encode the preceding recursive deﬁnition as a recursive Datalog view,
called empl-jones:
empl-jones(X) :– manager(X, “Jones” )
empl-jones(X) :– manager(X, Y ), empl-jones(Y )
The ﬁrst rule corresponds to case (1); the second rule corresponds to case (2)
...
We assume that recursive Datalog programs contain no
rules with negative literals
...
The bibliographical
employee-name
Alon
Barinsky
Corbin
Duarte
Estovar
Jones
Rensal
Figure 5
...

5
...
12

Tuples in empl-jones
(Duarte), (Estovar)
(Duarte), (Estovar), (Barinsky), (Corbin)
(Duarte), (Estovar), (Barinsky), (Corbin), (Alon)
(Duarte), (Estovar), (Barinsky), (Corbin), (Alon)

Employees of Jones in iterations of procedure Datalog-Fixpoint
...

The view relations of a recursive program that contains a set of rules R are deﬁned
to contain exactly the set of facts I computed by the iterative procedure Datalog-Fixpoint in Figure 5
...
The recursion in the Datalog program has been turned into
an iteration in the procedure
...

Consider the program deﬁning empl-jones, with the relation manager, as in Figure 5
...
The set of facts computed for the view relation empl-jones in each iteration
appears in Figure 5
...
In each iteration, the program computes one more level of
employees under Jones and adds it to the set empl-jones
...
Such a termination point must be reached, since the set of managers and
employees is ﬁnite
...

You should verify that, at the end of the iteration, the view relation empl-jones
contains exactly those employees who work under Jones
...
Iteration starts with a set of facts I set to the facts in the
database
...
1 Next, the set of rules R in the given Datalog program is used to infer
what facts are true, given that facts in I are true
...
This process is repeated until
no new facts can be inferred
...
At this point, then, we
have the ﬁnal set of true facts
...
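The iteration just described can be sketched in Python for the empl-jones program. The manager facts below are an assumption: they are chosen to be consistent with the employee names and the per-iteration results listed in the text, since the full manager relation is not reproduced here. The function infer applies both rules once to the current set of empl-jones facts, and the loop stops when an iteration adds nothing new.

```python
# manager(employee, manager); assumed facts consistent with the text's iterations.
manager = {
    ("Alon", "Barinsky"), ("Barinsky", "Estovar"), ("Corbin", "Duarte"),
    ("Duarte", "Jones"), ("Estovar", "Jones"),
    ("Jones", "Klinger"), ("Rensal", "Klinger"),
}


def infer(empl_jones):
    """Apply both empl-jones rules once to the current set of facts."""
    # empl-jones(X) :- manager(X, "Jones")
    derived = {x for (x, m) in manager if m == "Jones"}
    # empl-jones(X) :- manager(X, Y), empl-jones(Y)
    derived |= {x for (x, y) in manager if y in empl_jones}
    return derived


def datalog_fixpoint():
    """Repeat I = I ∪ infer(R, I) until no new facts appear."""
    facts, history = set(), []
    while True:
        old = set(facts)
        facts |= infer(facts)
        if facts == old:          # fixed point reached
            return facts, history
        history.append(set(facts))


empl_jones, history = datalog_fixpoint()
for i, snapshot in enumerate(history, start=1):
    print(i, sorted(snapshot))
```

With these facts, three iterations add new employees and a fourth detects that nothing changes, matching the pattern of the iteration table in the text.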

1
...
Thus, in the
Datalog sense of “fact,” a fact may be true (the tuple is indeed in the relation) or false (the tuple is not in
the relation)
...
5
...
Recall that when we make an inference by using a ground instantiation
of a rule, for each negative literal not q in the rule body we check that q is not present
in the set of facts I
...
However, in
the ﬁxed-point iteration, the set of facts I grows in each iteration, and even if q is
not present in I at one iteration, it may appear in I later
...
We require that a recursive program should not contain
negative literals, in order to avoid such problems
...
7):
empl(X, Y ) :– manager(X, Y )
empl(X, Y ) :– manager(X, Z), empl(Z, Y )
To ﬁnd the direct and indirect subordinates of Jones, we simply use the query
? empl(X, “Jones”)
which gives the same set of values for X as the view empl-jones
...

The view empl deﬁned previously is called the transitive closure of the relation
manager
...

5
...
7 The Power of Recursion
Datalog with recursion has more expressive power than Datalog without recursion
...
For example, we cannot express transitive
closure in Datalog without using recursion (or for that matter, in SQL or QBE without
recursion)
...
Intuitively, a ﬁxed
number of joins can ﬁnd only those employees that are some (other) ﬁxed number of
levels down from any manager (we will not attempt to prove this result here)
...
If the number of levels of employees
in the manager relation is more than the limit of the query, the query will miss some
levels of employees
...

An alternative to recursion is to use an external mechanism, such as embedded
SQL, to iterate on a nonrecursive query
...
10
...
However, writing such queries by iter-

...

The expressive power provided by recursion must be used with care
...
The second rule of the program does not satisfy the safety
condition in Section 5
...
4
...
For such
programs, tuples in view relations can contain only constants from the database, and
hence the view relations must be ﬁnite
...

5
...
8 Recursion in Other Languages
The SQL:1999 standard supports a limited form of recursion, using the with recursive
clause
...
We can ﬁnd every
pair (X, Y ) such that X is directly or indirectly managed by Y , using this SQL:1999
query:
with recursive empl(emp, mgr) as (
select emp, mgr
from manager
union
select emp, empl
...
mgr = empl
...
The additional keyword recursive
speciﬁes that the view is recursive
...
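For readers who want to try a recursive view, the following Python sketch runs a complete with recursive query using the sqlite3 module from the standard library. SQLite is used here only as a convenient engine that accepts this clause; it is not discussed in the text. The query mirrors the Datalog rules for empl, with explicitly qualified column names, and the manager rows are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE manager (emp TEXT, mgr TEXT)")
conn.executemany(
    "INSERT INTO manager VALUES (?, ?)",
    [
        ("Alon", "Barinsky"), ("Barinsky", "Estovar"), ("Corbin", "Duarte"),
        ("Duarte", "Jones"), ("Estovar", "Jones"),
        ("Jones", "Klinger"), ("Rensal", "Klinger"),
    ],
)

rows = conn.execute(
    """
    WITH RECURSIVE empl(emp, mgr) AS (
        SELECT emp, mgr
        FROM manager
        UNION
        SELECT manager.emp, empl.mgr
        FROM manager, empl
        WHERE manager.mgr = empl.emp
    )
    SELECT emp FROM empl WHERE mgr = 'Jones' ORDER BY emp
    """
).fetchall()

under_jones = [row[0] for row in rows]
print(under_jones)
```

Using union rather than union all removes duplicate pairs, which also guarantees that the recursion terminates once no new pairs are produced.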
2
...

The procedure Datalog-Fixpoint iteratively uses the function infer(R, I) to compute what facts are true, given a recursive Datalog program
...
Regardless of the language used to deﬁne a view V, the view can be thought of as being deﬁned by an
expression EV that, given a set of facts I, returns a set of facts EV (I) for the view relation V
...
5
...
The preceding function has the same form
as the infer function for Datalog
...

Similarly, the function infer is said to be monotonic if
I1 ⊆ I2 ⇒ infer(R, I1) ⊆ infer(R, I2)
Thus, if infer is monotonic, given a set of facts I0 that is a subset of the true facts, we
can be sure that all facts in infer(R, I0 ) are also true
...
2
...

Relational-algebra expressions that use only the operators Π, σ, ×, ⋈, ∪, ∩, or ρ are
monotonic
...

However, relational expressions that use the operator − are not monotonic
...
Let
I1 = { manager1(“Alon”, “Barinsky”), manager1(“Barinsky”, “Estovar”),
manager2(“Alon”, “Barinsky”) }
and let
I2 = { manager1(“Alon”, “Barinsky”), manager1(“Barinsky”, “Estovar”),
manager2(“Alon”, “Barinsky”), manager2(“Barinsky”, “Estovar”) }
Consider the expression manager1 − manager2
...
But I1 ⊆ I2 ; hence, the expression is not monotonic
...
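The counterexample can be verified directly. The Python sketch below encodes the two fact sets from the text as tagged tuples (a representation chosen for this sketch) and evaluates the set-difference expression on each.

```python
I1 = {
    ("manager1", "Alon", "Barinsky"),
    ("manager1", "Barinsky", "Estovar"),
    ("manager2", "Alon", "Barinsky"),
}
I2 = I1 | {("manager2", "Barinsky", "Estovar")}


def difference(facts):
    """Evaluate manager1 - manager2 over a set of tagged facts."""
    m1 = {f[1:] for f in facts if f[0] == "manager1"}
    m2 = {f[1:] for f in facts if f[0] == "manager2"}
    return m1 - m2


print(I1 <= I2)                          # the input grows ...
print(difference(I1), difference(I2))    # ... but the result shrinks
print(difference(I1) <= difference(I2))
```

Adding a fact to manager2 removes a tuple from the result, so a larger input does not yield a superset of the earlier output.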

The ﬁxed-point technique does not work on recursive views deﬁned with nonmonotonic expressions
...
Such relationships deﬁne what subparts make up each part
...
An example of an aggregate query on such a structure
would be to compute the total number of subparts of each part
...
The bibliographic notes provide references
to research on deﬁning such views
...
For
example, extended relational operations have been proposed to deﬁne transitive closure, and extensions to the SQL syntax to specify (generalized) transitive closure have
been proposed
...

5
...
3 User Interfaces and Tools
Although many people interact with databases, few people use a query language to
directly interact with a database system
...
Forms and graphical user interfaces allow users to enter values that complete predeﬁned queries
...
Graphical user interfaces provide
an easy-to-use way to interact with the database system
...
Report generators permit predeﬁned reports to be generated on the current
database contents
...

3
...

It is worth noting that such interfaces use query languages to communicate with
database systems
...
Chapter 22 covers data analysis tools in more detail
...
In this section, we describe the basic concepts, without going
into the details of any particular user interface product
...
3
...
For example, World Wide Web search
engines provide forms that are used to enter key words
...

As a more database-oriented example, you may connect to a university registration system, where you are asked to ﬁll in your roll number and password into a
form
...
There may be further links on the Web page that let you
search for courses and ﬁnd further information about courses such as the syllabus
and the instructor
...
Most database system vendors also provide proprietary
forms interfaces that offer facilities beyond those present in HTML forms
...
Most database system vendors also provide tools that simplify the creation of graphical user interfaces and forms
...
Users can deﬁne the type, size, and format of each ﬁeld in
a form by using the form editor
...
5
...

For instance, the execution of a query to ﬁll in name and address ﬁelds may be associated with ﬁlling in a roll number ﬁeld, and execution of an update statement may
be associated with submitting a form
...
2 For example, a constraint on the course number ﬁeld may check that the
course number typed in by the user corresponds to an actual course
...
Menus that indicate the valid values that can
be entered in a ﬁeld can help eliminate the possibility of many types of errors
...

5
...
2 Report Generators
Report generators are tools to generate human-readable summary reports from a
database
...
For example, a report may show the
total sales in each of the past two months for each sales region
...
Variables can be used to store parameters such as the
month and the year and to deﬁne ﬁelds in the report
...
The query deﬁnitions can
make use of the parameter values stored in the variables
...
Report-generator systems
provide a variety of facilities for structuring tabular output, such as deﬁning table
and column headers, displaying subtotals for each group in a table, automatically
splitting long tables into multiple pages, and displaying subtotals at the end of each
page
...
13 is an example of a formatted report
...

The Microsoft Ofﬁce suite provides a convenient way of embedding formatted
query results from a database, such as MS Access, into a document created with a
text editor, such as MS Word
...
A feature
called OLE (Object Linking and Embedding) links the resulting structure into a text
document
...
The name emphasizes that these tools offer a programming paradigm that is different from the imperative programming paradigm offered by third2
...


Quarterly Sales Report
Period: Jan
...
13

2,100,000

A formatted report
...
However, this term is less
relevant today, since forms and report generators are typically created with graphical
tools, rather than with programming languages
...
4 Summary
• We have considered two query languages: QBE and Datalog
...

• QBE and its variants have become popular with nonexpert database users because of the intuitive simplicity of the visual paradigm
...

• Datalog is derived from Prolog, but unlike Prolog, it has a declarative semantics, making simple queries easier to write and query evaluation easier to optimize
...
However, no
accepted standards exist for important features, such as grouping and aggregation, in Datalog
...

• Most users interact with databases via forms and graphical user interfaces,
and there are numerous tools to simplify the construction of such interfaces
...

Review Terms
• Query-by-Example (QBE)
• Two-dimensional syntax
• Skeleton tables
• Example rows
• Condition box
• Result relation
• Microsoft Access
• Graphical Query-By-Example (GQBE)
• Design grid
• Datalog
• Rules
• Uses
• Defines
• Positive literal
• Negative literal
• Fact
• Rule
    Head
    Body
• Datalog program
• Depend on
    Directly
    Indirectly
• Recursive view
• Nonrecursive view
• Instantiation
    Ground instantiation
    Satisfied
• Infer
• Semantics
    Of a rule
    Of a program
• Safety
• Fixed point
• Transitive closure
• Monotonic view definition
• Forms
• Graphical user interfaces
• Report generators

Exercises
5
...
14, where the primary keys are underlined
...

a
...

b
...

c
...

d
...
”
e
...

5
...
15
...
Find the names of all employees who work for First Bank Corporation
...
Find the names and cities of residence of all employees who work for First
Bank Corporation
...
5
...
14

Insurance database
...
Find the names, street addresses, and cities of residence of all employees
who work for First Bank Corporation and earn more than $10,000 per annum
...
Find all employees who live in the city in which the company for which they
work is located
...
Find all employees who live in the same city and on the same street as their
managers
...
Find all employees in the database who do not work for First Bank Corporation
...
Find all employees who earn more than every employee of Small Bank Corporation
...
Assume that the companies may be located in several cities
...

5
...
15
...
Give expressions in QBE for each of the following queries:
a
...

b
...

c
...

d
...

5
...
15
...

b
...

d
...

Give all employees of First Bank Corporation a 10 percent raise
...

Give all managers in the database a 10 percent raise, unless the salary would
be greater than $100,000
...

employee (person-name, street, city)
works (person-name, company-name, salary)
company (company-name, city)
manages (person-name, manager-name)
Figure 5
...

e
...

5
...
Give expressions in QBE and Datalog equivalent to each of the following queries:
a
...

c
...

ΠA (r)
σB = 17 (r)
r × s
ΠA,F (σC = D (r × s))

5
...
Give expressions in QBE and Datalog equivalent to each of the following queries:
a
...

c
...

r1 ∪ r2
r1 ∩ r2
r1 − r2
ΠAB(r1) ⋈ ΠBC(r2)

5
...
Write expressions in QBE and Datalog for each of the following queries:
a
...
{< a, b, c > | < a, b > ∈ r ∧ < a, c > ∈ s}
c
...
8 Consider the relational database of Figure 5
...
Write a Datalog program for
each of the following queries:
a
...

b
...

c
...

d
...

5
...

5
...

Bibliographical Notes
The experimental version of Query-by-Example is described in Zloof [1977]; the commercial version is described in IBM [1978]
...

Examples are Microsoft Access and Borland Paradox
...
[1993]), and Coral
(described in Ramakrishnan et al
...
[1993])
...
[1984]
...
Ramakrishnan and Ullman [1995] provides a more recent survey on deductive databases
...
Chandra and Harel [1982] and Apt and Pugin [1987] discuss stratiﬁed negation
...
[1992a]
...

IBM DB2 QMF and Borland Paradox also support QBE
...
cs
...
edu/coral)
...
sourceforge
...

Chapter 6

Integrity and Security

Integrity constraints ensure that changes made to the database by authorized users
do not result in a loss of data consistency
...

We have already seen two forms of integrity constraints for the E-R model in Chapter 2:
• Key declarations — the stipulation that certain attributes form a candidate key
for a given entity set
...

In general, an integrity constraint can be an arbitrary predicate pertaining to the
database
...
Thus, we concentrate
on integrity constraints that can be tested with minimal overhead
...
1 and 6
...
3
...

In Section 6
...
Triggers are used
to ensure some types of integrity
...
In Sections 6
...
7, we examine ways in which data
may be misused or intentionally made inconsistent, and present security mechanisms
to guard against such occurrences
...
1 Domain Constraints
We have seen that a domain of possible values must be associated with every attribute
...
6
...
Declaring an attribute to
be of a particular domain acts as a constraint on the values that it can take
...
They are tested easily by the system whenever a new data item is entered into the database
...
For example, the attributes customer-name and employee-name might have the same domain: the set of all
person names
...
It is perhaps less clear whether customer-name and branch-name should have
the same domain
...
However, we would normally not consider the query
“Find all customers who have the same name as a branch” to be a meaningful query
...

From the above discussion, we can see that a proper deﬁnition of domain constraints not only allows us to test values inserted in the database, but also permits
us to test queries to ensure that the comparisons made make sense
...
Strongly typed programming languages allow the compiler to check the
program in greater detail
...
For example, the
statements:
create domain Dollars numeric(12,2)
create domain Pounds numeric(12,2)
deﬁne the domains Dollars and Pounds to be decimal numbers with a total of 12 digits,
two of which are placed after the decimal point
...
Such an assignment is likely to be due to a programmer error,
where the programmer forgot about the differences in currency
...

Values of one domain can be cast (that is, converted) to another domain
...
A as Pounds
In a real application we would of course multiply r
...
SQL also provides drop domain and alter domain clauses
to drop or modify domains that have been created earlier
...
Speciﬁcally, the check
clause permits the schema designer to specify a predicate that must be satisﬁed by
any value assigned to a variable whose type is the domain
...
create domain HourlyWage numeric(5,2)
constraint wage-value-test check(value >= 4.00)
...
The name is used to indicate which constraint
an update violated
...
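A named check constraint of this kind can be tried out with Python's sqlite3 module. SQLite does not provide create domain, so this sketch attaches the constraint to a column instead; that substitution, the table name work, and the sample rows are assumptions made for illustration. The constraint name uses underscores because hyphens are not valid in unquoted SQL identifiers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE work (
        employee    TEXT,
        hourly_wage NUMERIC
            CONSTRAINT wage_value_test CHECK (hourly_wage >= 4.00)
    )
    """
)

# A value that satisfies the predicate is accepted.
conn.execute("INSERT INTO work VALUES ('Smith', 12.50)")

# A value that violates the predicate is rejected by the system.
try:
    conn.execute("INSERT INTO work VALUES ('Jones', 3.25)")
    rejected = False
except sqlite3.IntegrityError as exc:
    rejected = True
    print("insert rejected:", exc)

row_count = conn.execute("SELECT COUNT(*) FROM work").fetchone()[0]
print(row_count)
```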
However, in general, the check conditions can be more complex (and
harder to check), since subqueries that refer to other relations are permitted in the
check condition
...
Thus, the condition has to be
checked not only when a tuple is inserted or modiﬁed in deposit, but also when the
relation branch changes (in this case, when a tuple is deleted or modiﬁed in relation
branch)
...
We discuss such constraints, along with a simpler way
of specifying them in SQL, in Section 6
...

Complex check conditions can be useful when we want to ensure integrity of data,
but we should use them with care, since they may be costly to test
...
2 Referential Integrity
Often, we wish to ensure that a value that appears in one relation for a given set of
attributes also appears for a certain set of attributes in another relation
...

6
...
1 Basic Concepts
Consider a pair of relations r(R) and s(S), and the natural join r ⋈ s
...
That is, there is no ts in s such that
tr [R ∩ S] = ts [R ∩ S]
...
Depending on the entity
set or relationship set being modeled, dangling tuples may or may not be acceptable
...
3
...
Here, our concern is not with queries, but
rather with when we should permit dangling tuples to exist in the database
...
This
situation would be undesirable
...

Therefore, tuple t1 would refer to an account at a branch that does not exist
...

Not all instances of dangling tuples are undesirable, however
...
In this case, a branch exists that
has no accounts
...
Thus, we do not want to prohibit this situation
...

• The attribute branch-name in Branch-schema is not a foreign key
...
1
...
)
In the Lunartown example, tuple t1 in account has a value on the foreign key
branch-name that does not appear in branch
...
Thus, the distinction between our two examples of dangling tuples
is the presence of a foreign key
...
We
say that a subset α of R2 is a foreign key referencing K1 in relation r1 if it is required
that, for every t2 in r2 , there must be a tuple t1 in r1 such that t1 [K1 ] = t2 [α]
...
The latter term arises because the preceding referential-integrity constraint
can be written as Πα (r2 ) ⊆ ΠK1 (r1 )
...
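The subset form of the constraint translates directly into a set comparison. In the Python sketch below, relations are lists of dictionaries and project plays the role of Π; both are representations chosen for this sketch. The Lunartown account follows the example in the text, while the account numbers and the other branch names are illustrative.

```python
def project(relation, attrs):
    """Π_attrs(relation): the set of value tuples for the given attributes."""
    return {tuple(t[a] for a in attrs) for t in relation}


branch = [
    {"branch-name": "Perryridge"},
    {"branch-name": "Downtown"},
]
account = [
    {"account-number": "A-102", "branch-name": "Perryridge"},
    {"account-number": "A-999", "branch-name": "Lunartown"},  # dangling reference
]

alpha = ["branch-name"]   # foreign key in account
key = ["branch-name"]     # primary key of branch

# Referential integrity holds iff Π_alpha(account) ⊆ Π_key(branch).
constraint_holds = project(account, alpha) <= project(branch, key)
dangling = project(account, alpha) - project(branch, key)

print(constraint_holds, dangling)
```

The reverse inclusion is not required: a branch with no accounts, such as Downtown here, does not violate the constraint.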

6
...
2 Referential Integrity and the E-R Model
Referential-integrity constraints arise frequently
...
E1
E2

...

...
1

An n-ary relationship set
...
Figure 6
...
, En
...
The attributes of the relation schema for relationship set R
include K1 ∪ K2 ∪ · · · ∪ Kn
...
Recall from
Chapter 2 that the relation schema for a weak entity set must include the primary
key of the entity set on which the weak entity set depends
...

6
...
3 Database Modiﬁcation
Database modiﬁcations can cause violations of referential integrity
...
If a tuple t2 is inserted into r2 , the system must ensure that there is a
tuple t1 in r1 such that t1 [K] = t2 [α]
...
If a tuple t1 is deleted from r1 , the system must compute the set of
tuples in r2 that reference t1 :
σα = t1 [K] (r2 )
If this set is not empty, either the delete command is rejected as an error, or the
tuples that reference t1 must themselves be deleted
...

• Update
...

If a tuple t2 is updated in relation r2, and the update modifies values for
the foreign key α, then a test similar to the insert case is made
...
The system must ensure that
t2[α] ∈ ΠK(r1)
If a tuple t1 is updated in r1, and the update modifies values for the primary key (K), then a test similar to the delete case is made
...
If this set
is not empty, the update is rejected as an error, or the update is cascaded
in a manner similar to delete
...
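The insert, delete, and update tests described above can be observed in a running system. The sketch below uses Python's sqlite3 module purely as an illustration; the underscore-style table and column names and the sample rows are assumptions, and SQLite enforces foreign keys only after the pragma shown is issued.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
    CREATE TABLE branch (branch_name TEXT PRIMARY KEY);
    CREATE TABLE account (
        account_number TEXT PRIMARY KEY,
        branch_name    TEXT NOT NULL REFERENCES branch(branch_name)
    );
    INSERT INTO branch VALUES ('Perryridge');
    INSERT INTO account VALUES ('A-101', 'Perryridge');
""")

def rejected(sql):
    """Run one statement; return True if it is rejected as an integrity violation."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

results = {
    # Insert into the referencing relation a value that has no match in branch.
    "insert": rejected("INSERT INTO account VALUES ('A-999', 'Lunartown')"),
    # Delete a referenced tuple while an account still refers to it.
    "delete": rejected("DELETE FROM branch WHERE branch_name = 'Perryridge'"),
    # Change the referenced key while an account still refers to the old value.
    "update": rejected(
        "UPDATE branch SET branch_name = 'Downtown' WHERE branch_name = 'Perryridge'"
    ),
}
print(results)
```

With no cascading action declared, all three statements are rejected, which corresponds to the "rejected as an error" alternative in each case.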
2
...
We illustrate foreign-key declarations by using the SQL DDL definition of part of our bank database, shown in Figure 6
...

By default, a foreign key references the primary key attributes of the referenced
table
...
The specified list of attributes must
be declared as a candidate key of the referenced relation
...
However, a foreign key clause can specify that
if a delete or update action on the referenced relation violates the constraint, then,
instead of rejecting the action, the system must take steps to change the tuple in the
referencing relation to restore the constraint
...

foreign key (branch-name) references branch
on delete cascade
on update cascade,

...
Instead, the delete “cascades” to the


6
...
2

SQL data definition for part of the bank database
...
Similarly, the system does not reject an update to a field referenced by the constraint if it
violates the constraint; instead, the system updates the field branch-name in the referencing tuples in account to the new value as well
...
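The effect of the on delete cascade and on update cascade clauses can be tried out as follows. This is a sketch using Python's sqlite3 module; the identifiers use underscores instead of hyphens, and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # required for SQLite to enforce foreign keys
conn.executescript("""
    CREATE TABLE branch (branch_name TEXT PRIMARY KEY);
    CREATE TABLE account (
        account_number TEXT PRIMARY KEY,
        branch_name    TEXT,
        FOREIGN KEY (branch_name) REFERENCES branch
            ON DELETE CASCADE
            ON UPDATE CASCADE
    );
    INSERT INTO branch VALUES ('Perryridge'), ('Mianus');
    INSERT INTO account VALUES ('A-101', 'Perryridge'), ('A-215', 'Mianus');
""")

def accounts():
    """Return all account tuples in a fixed order."""
    query = "SELECT account_number, branch_name FROM account ORDER BY account_number"
    return conn.execute(query).fetchall()

# Renaming a branch propagates the new value to the referencing tuples.
conn.execute("UPDATE branch SET branch_name = 'Downtown' WHERE branch_name = 'Perryridge'")
after_update = accounts()

# Deleting a branch deletes the accounts that refer to it.
conn.execute("DELETE FROM branch WHERE branch_name = 'Mianus'")
after_delete = accounts()

print(after_update)
print(after_delete)
```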

If there is a chain of foreign-key dependencies across multiple relations, a deletion
or update at one end of the chain can propagate across the entire chain
...
4
...
As a result, all the changes caused by the transaction and its cascading actions
are undone
...

Attributes of foreign keys are allowed to be null, provided that they have not otherwise been declared to be non-null
...
If
any of the foreign-key columns is null, the tuple is defined automatically to satisfy
the constraint
...
To avoid such complexity, it is best to ensure that all columns of a foreign
key specification are declared to be non-null
...
For instance, suppose we have a relation marriedperson with primary key name, and an attribute spouse, and suppose that spouse is a foreign key on marriedperson
...
Suppose we wish to note the fact that John and Mary are married to each
other by inserting two tuples, one for John and one for Mary, in the above relation
...
After the second tuple is inserted the foreign
key constraint would hold again
...
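The marriedperson situation can be reproduced with a constraint whose check is postponed to the end of the transaction. The sketch below relies on SQLite's DEFERRABLE INITIALLY DEFERRED option through Python's sqlite3 module; the helper name insert_couple is hypothetical, and other systems may spell deferred checking differently.

```python
import sqlite3

def insert_couple(deferred):
    """Insert John and Mary in one transaction; return True if it commits."""
    conn = sqlite3.connect(":memory:", isolation_level=None)  # explicit BEGIN/COMMIT
    conn.execute("PRAGMA foreign_keys = ON")
    timing = "DEFERRABLE INITIALLY DEFERRED" if deferred else ""
    conn.execute(f"""
        CREATE TABLE marriedperson (
            name   TEXT PRIMARY KEY,
            spouse TEXT NOT NULL REFERENCES marriedperson(name) {timing}
        )""")
    try:
        conn.execute("BEGIN")
        # Mary is not yet present, so the constraint is violated at this point.
        conn.execute("INSERT INTO marriedperson VALUES ('John', 'Mary')")
        # After the second insert the constraint holds again.
        conn.execute("INSERT INTO marriedperson VALUES ('Mary', 'John')")
        conn.execute("COMMIT")  # deferred constraints are checked here
        return True
    except sqlite3.IntegrityError:
        conn.execute("ROLLBACK")
        return False
    finally:
        conn.close()

print(insert_couple(deferred=True))
print(insert_couple(deferred=False))
```

With an immediately checked constraint the first insert is rejected; with the deferred constraint both tuples can be inserted and the transaction commits.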
1

6
...
Domain constraints and referential-integrity constraints are special forms
of assertions
...
However,
there are many constraints that we cannot express by using only these special forms
...

• Every loan has at least one customer who maintains an account with a minimum balance of $1000
...

An assertion in SQL takes the form
create assertion <assertion-name> check <predicate>
Here is how the two examples of constraints can be written
...
We can work around the problem in the above example in another way, if the spouse attribute can be
set to null: We set the spouse attributes to null when inserting the tuples for John and Mary, and we update
them later
...


6
...
We write
create assertion sum-constraint check
(not exists (select * from branch
where (select sum(amount) from loan
where loan
...
branch-name)
>= (select sum(balance) from account
where account
...
branch-name)))
create assertion balance-constraint check
(not exists (select * from loan
where not exists ( select *
from borrower, depositor, account
where loan
...
loan-number
and borrower
...
customer-name
and depositor
...
account-number
and account
...
If the assertion is valid,
then any future modification to the database is allowed only if it does not cause that
assertion to be violated
...
Hence, assertions should be used with great
care
...
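SQLite, which is used for the sketches here, has no create assertion statement, but the predicate of an assertion can still be evaluated as an ordinary query. The following Python sketch tests the constraint that every loan has at least one customer who maintains an account with a balance of at least $1000; the query is written from that English statement, and the schema and sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE loan (loan_number TEXT PRIMARY KEY, amount INTEGER);
    CREATE TABLE borrower (customer_name TEXT, loan_number TEXT);
    CREATE TABLE depositor (customer_name TEXT, account_number TEXT);
    CREATE TABLE account (account_number TEXT PRIMARY KEY, balance INTEGER);
    INSERT INTO loan VALUES ('L-17', 1000);
    INSERT INTO borrower VALUES ('Jones', 'L-17');
    INSERT INTO depositor VALUES ('Jones', 'A-217');
    INSERT INTO account VALUES ('A-217', 1500);
""")

# Evaluates to 1 when no loan lacks a qualifying customer, and to 0 otherwise.
BALANCE_CONSTRAINT = """
    SELECT NOT EXISTS (
        SELECT * FROM loan
        WHERE NOT EXISTS (
            SELECT * FROM borrower, depositor, account
            WHERE loan.loan_number = borrower.loan_number
              AND borrower.customer_name = depositor.customer_name
              AND depositor.account_number = account.account_number
              AND account.balance >= 1000))
"""

def assertion_holds():
    """Return True if the current database state satisfies the predicate."""
    return bool(conn.execute(BALANCE_CONSTRAINT).fetchone()[0])

holds_before = assertion_holds()
conn.execute("UPDATE account SET balance = 500 WHERE account_number = 'A-217'")
holds_after = assertion_holds()
print(holds_before, holds_after)
```

Re-running such a query after every modification is the kind of work that makes assertions costly to test and maintain.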

6
...
To design a trigger mechanism, we must meet two
requirements:
1
...
This is broken up into an event that
causes the trigger to be checked and a condition that must be satisfied for trigger execution to proceed
...
Specify the actions to be taken when the trigger executes
...

The database stores triggers just as if they were regular data, so that they are persistent and are accessible to all database operations
...


6
...
1 Need for Triggers
Triggers are useful mechanisms for alerting humans or for starting certain tasks automatically when certain conditions are met
...
The bank
gives this loan a loan number identical to the account number of the overdrawn account
...
Suppose that Jones’ withdrawal
of some money from an account made the account balance negative
...
The actions to be taken are:
• Insert a new tuple s in the loan relation with
s[loan-number] = t[account-number]
s[branch-name] = t[branch-name]
s[amount] = −t[balance]
(Note that, since t[balance] is negative, we negate t[balance] to get the loan
amount — a positive number
...

As another example of the use of triggers, suppose a warehouse wishes to maintain a minimum inventory of each item; when the inventory level of an item falls
below the minimum level, an order should be placed automatically
...

Note that trigger systems cannot usually perform updates outside the database,
and hence in the inventory replenishment example, we cannot use a trigger to directly place an order in the external world
...
We must create a separate permanently running
system process that periodically scans the orders relation and places orders
...
The process would also track deliveries of orders,
and alert managers in case of exceptional conditions such as delays in deliveries
...
4
...
Unfortunately, each database system implemented its


6
...
balance < 0
begin atomic
insert into borrower
(select customer-name, account-number
from depositor
where nrow
...
account-number);
insert into loan values
(nrow
...
branch-name, − nrow
...
account-number = nrow
...
3

Example of SQL:1999 syntax for triggers
...
We outline in Figure 6
...

This trigger definition specifies that the trigger is initiated after any update of the
relation account is executed
...
The referencing new row as clause creates a variable
nrow (called a transition variable), which stores the value of an updated row after
the update
...
balance < 0
...
The
begin atomic
...
The two insert statements with the begin
...
The update statement serves to set the account balance back
to 0 from its earlier negative value
...
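The overdraft trigger can be approximated in a system that is easy to run. The sketch below restates it in SQLite's trigger dialect (NEW plays the role of the transition variable nrow, and there is no begin atomic keyword) and drives it from Python; the schema, the underscore-style identifiers, and the sample account are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE account (account_number TEXT PRIMARY KEY, branch_name TEXT, balance INTEGER);
    CREATE TABLE depositor (customer_name TEXT, account_number TEXT);
    CREATE TABLE loan (loan_number TEXT PRIMARY KEY, branch_name TEXT, amount INTEGER);
    CREATE TABLE borrower (customer_name TEXT, loan_number TEXT);

    CREATE TRIGGER overdraft_trigger AFTER UPDATE ON account
    FOR EACH ROW WHEN NEW.balance < 0
    BEGIN
        -- Every holder of the account becomes a borrower of the new loan.
        INSERT INTO borrower
            SELECT customer_name, account_number FROM depositor
            WHERE depositor.account_number = NEW.account_number;
        -- The loan number equals the account number; the amount is the overdraft.
        INSERT INTO loan VALUES (NEW.account_number, NEW.branch_name, -NEW.balance);
        -- Set the balance back to 0 from its negative value.
        UPDATE account SET balance = 0 WHERE account_number = NEW.account_number;
    END;

    INSERT INTO account VALUES ('A-102', 'Perryridge', 400);
    INSERT INTO depositor VALUES ('Jones', 'A-102');
""")

# A withdrawal of 500 from a balance of 400 overdraws the account by 100.
conn.execute("UPDATE account SET balance = balance - 500 WHERE account_number = 'A-102'")

loans = conn.execute("SELECT * FROM loan").fetchall()
borrowers = conn.execute("SELECT * FROM borrower").fetchall()
balance = conn.execute(
    "SELECT balance FROM account WHERE account_number = 'A-102'"
).fetchone()[0]
print(loans, borrowers, balance)
```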

For example, the action on delete of an account could be to check if the
holders of the account have any remaining accounts, and if they do not, to
delete them from the depositor relation
...
7)
...
Obviously a trigger cannot directly cause such an action outside the database, but could instead add a tuple to a relation storing addresses to which welcome letters need to be sent
...


• For updates, the trigger can specify columns whose update causes the trigger
to execute
...

• The referencing old row as clause can be used to create a variable storing the
old value of an updated or deleted row
...

• Triggers can be activated before the event (insert/delete/update) instead of
after the event
...
For instance, if we wish not to permit overdrafts, we can create a before
trigger that rolls back the transaction if the new balance is negative
...
We can
define a trigger that replaces the value by the null value
...

create trigger setnull-trigger before update on r
referencing new row as nrow
for each row
when nrow
...
phone-number = null
• Instead of carrying out an action for each affected row, we can carry out a single action for the entire SQL statement that caused the insert/delete/update
...

The clauses referencing old table as or referencing new table as can then
be used to refer to temporary tables (called transition tables) containing all the
affected rows
...

A single SQL statement can then be used to carry out multiple actions on
the basis of the transition tables
...

create trigger reorder-trigger after update of amount on inventory
referencing old row as orow, new row as nrow
for each row
when nrow
...
item = orow
...
level > (select level
from minlevel
where minlevel
...
item)
begin
insert into orders
(select item, amount
from reorder
where reorder
...
item)
end
Figure 6
...

• minlevel(item, level), which notes the minimum amount of the item to be maintained
• reorder(item, amount), which notes the amount of the item to be ordered when
its level falls below the minimum
• orders(item, amount), which notes the amount of the item to be ordered
...
4 for reordering the item
...
If we only check that the
new value after an update is below the minimum level, we may place an order erroneously when the item has already been reordered
...
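The reason for comparing both the old and the new level can be seen by running a reorder trigger twice on the same item. The sketch below uses SQLite's trigger dialect from Python (OLD and NEW stand in for orow and nrow); the inventory schema, the use of <= for the new value, and the sample quantities are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE inventory (item TEXT PRIMARY KEY, level INTEGER);
    CREATE TABLE minlevel (item TEXT PRIMARY KEY, level INTEGER);
    CREATE TABLE reorder (item TEXT PRIMARY KEY, amount INTEGER);
    CREATE TABLE orders (item TEXT, amount INTEGER);

    CREATE TRIGGER reorder_trigger AFTER UPDATE OF level ON inventory
    FOR EACH ROW
    WHEN NEW.level <= (SELECT level FROM minlevel WHERE minlevel.item = OLD.item)
     AND OLD.level >  (SELECT level FROM minlevel WHERE minlevel.item = OLD.item)
    BEGIN
        INSERT INTO orders
            SELECT item, amount FROM reorder WHERE reorder.item = OLD.item;
    END;

    INSERT INTO inventory VALUES ('widget', 12);
    INSERT INTO minlevel VALUES ('widget', 10);
    INSERT INTO reorder VALUES ('widget', 50);
""")

# The level crosses the minimum: the old value is above it, the new value is not.
conn.execute("UPDATE inventory SET level = 9 WHERE item = 'widget'")
# The level was already below the minimum, so no second order is placed.
conn.execute("UPDATE inventory SET level = 7 WHERE item = 'widget'")

orders = conn.execute("SELECT item, amount FROM orders").fetchall()
print(orders)
```

Dropping the condition on OLD.level would place a second order on the second update, which is the erroneous reordering described above.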
For instance, many database systems do not
implement the before clause, and the keyword on is used instead of after
...
Instead, they may specify transition tables by
using the keywords inserted or deleted
...
5 illustrates how the overdraft trigger would be written in MS-SQLServer
...

6
...
3 When Not to Use Triggers
There are many good uses for triggers, such as those we have just seen in Section 6
...
2,
but some uses are best handled by alternative techniques
...
For instance, they used
triggers on insert/delete/update of an employee relation containing salary and dept attributes to maintain the total salary of each department
...
5
...
6
...
balance < 0
begin
insert into borrower
(select customer-name, account-number
from depositor, inserted
where inserted
...
account-number)
insert into loan values
(inserted
...
branch-name, − inserted
...
account-number = inserted
...
5

Example of trigger in MS-SQL server syntax

easier way to maintain summary data
...
A separate process
copied over the changes to the replica (copy) of the database, and the system executed
the changes on the replica
...

In fact, many trigger applications, including our example overdraft trigger, can be
substituted by “encapsulation” features being introduced in SQL:1999
...
That procedure would in turn check for negative balance, and carry out the actions of the overdraft trigger
...

Triggers should be written with great care, since a trigger error detected at run
time causes the failure of the insert/delete/update statement that set off the trigger
...
In the worst case,
this could even lead to an infinite chain of triggering
...
The insert action then triggers yet another insert action, and so on ad infinitum
...

Triggers are occasionally called rules, or active rules, but should not be confused
with Datalog rules (see Section 5
...

6
...

duction of inconsistency that integrity constraints provide
...
We
then present mechanisms to guard against such occurrences
...
5
...
Absolute protection
of the database from malicious abuse is not possible, but the cost to the perpetrator
can be made high enough to deter most if not all attempts to access the database
without proper authority
...
Some database-system users may be authorized to access
only a limited portion of the database
...
It is the responsibility of
the database system to ensure that these authorization restrictions are not violated
...
No matter how secure the database system is, weakness in
operating-system security may serve as a means of unauthorized access to the
database
...
Since almost all database systems allow remote access through terminals or networks, software-level security within the network software is as
important as physical security, both on the Internet and in private networks
...
Sites with computer systems must be physically secured against
armed or surreptitious entry by intruders
...
Users must be authorized carefully to reduce the chance of any user
giving access to an intruder in exchange for a bribe or other favors
...

A weakness at a low level of security (physical or human) allows circumvention of
strict high-level (database) security measures
...
Security at the physical and human levels, although important, is beyond the
scope of this text
...
The file system also provides some degree of protection
...
6
...

Finally, network-level security has gained widespread recognition as the Internet
has evolved from an academic research platform to the basis of international electronic commerce
...
We shall present our discussion of security in terms of the
relational-data model, although the concepts of this chapter are equally applicable to
all data models
...
5
...
For
example,
• Read authorization allows reading, but not modification, of data
...

• Update authorization allows modification, but not deletion, of data
...

We may assign the user all, none, or a combination of these types of authorization
...

• Resource authorization allows the creation of new relations
...

• Drop authorization allows the deletion of relations
...
If a user deletes all tuples of a relation, the relation still exists, but
it is empty
...

We regulate the ability to create new relations through resource authorization
...

Index authorization may appear unnecessary, since the creation or deletion of an
index does not alter data in relations
...
However, indices also consume space, and all database modifications
are required to update indices
...
To allow the database
administrator to regulate the use of system resources, it is necessary to treat index
creation as a privilege
...

The ultimate form of authority is that given to the database administrator
...
This form of authorization is analogous to that of a superuser or operator for an
operating system
...
5
...
A view can hide data that a user does
not need to see
...
Views simplify system usage because they restrict
the user’s attention to the data of interest
...
Thus, a combination of relational-level security and view-level security limits a
user’s access to precisely the data that the user needs
...
This clerk is not authorized to see information regarding specific loans that the customer may have
...
But, if she is to have access to the information
needed, the clerk must be granted access to the view cust-loan, which consists of only
the names of customers and the branches at which they have a loan
...
loan-number = loan
...
However, when the
query processor translates it into a query on the actual relations in the database, it
produces a query on borrower and loan
...
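What the clerk sees through cust-loan can be sketched with a concrete view. The Python example below builds the view in SQLite; the view definition is written from the description above (customer names and the branches at which they have a loan), and the identifiers and sample rows are assumptions. SQLite has no grant statement, so the sketch shows only which columns the view exposes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE loan (loan_number TEXT PRIMARY KEY, branch_name TEXT, amount INTEGER);
    CREATE TABLE borrower (customer_name TEXT, loan_number TEXT);
    INSERT INTO loan VALUES ('L-17', 'Downtown', 1000), ('L-23', 'Redwood', 2000);
    INSERT INTO borrower VALUES ('Jones', 'L-17'), ('Smith', 'L-23');

    CREATE VIEW cust_loan AS
        SELECT branch_name, customer_name
        FROM borrower, loan
        WHERE borrower.loan_number = loan.loan_number;
""")

# The query names only the view; the system expands it into a query on
# borrower and loan.
cursor = conn.execute("SELECT * FROM cust_loan ORDER BY customer_name")
columns = [description[0] for description in cursor.description]
rows = cursor.fetchall()
print(columns)
print(rows)
```

Loan numbers and amounts never appear in the result, even though the underlying relations contain them.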

Creation of a view does not require resource authorization
...
She receives only those
privileges that provide no additional authorization beyond those that she already
had
...
If a user creates
a view on which no authorization can be granted, the system will deny the view
creation request
...


6
...
4 Granting of Privileges
A user who has been granted some form of authorization may be allowed to pass
on this authorization to other users
...

Consider, as an example, the granting of update authorization on the loan relation of the bank database
...
The passing of authorization from one user to another
can be represented by an authorization graph
...

The graph includes an edge Ui → Uj if user Ui grants update authorization on loan
to Uj
...
In the sample graph in
Figure 6
...

A user has an authorization if and only if there is a path from the root of the authorization graph (namely, the node representing the database administrator) down to
the node representing the user
...
Since U4 has authorization from U1 , that authorization should be revoked as
well
...
Since the database
administrator did not revoke update authorization on loan from U2 , U5 retains update
authorization on loan
...

A pair of devious users might attempt to defeat the rules for revocation of
authorization by granting authorization to each other, as shown in Figure 6
...
If
the database administrator revokes authorization from U2 , U2 retains authorization
through U3 , as in Figure 6
...
If authorization is revoked subsequently from U3 , U3
appears to retain authorization through U2 , as in Figure 6
...
However, when the
database administrator revokes authorization from U3 , the edges from U3 to U2 and
from U2 to U3 are no longer part of a path starting with the database administrator
...
6

Authorization-grant graph
...

[Figure: three authorization graphs, panels (a), (b), and (c), over the nodes DBA, U1, U2, and U3]
Figure 6
...

We require that all edges in an authorization graph be part of some path originating
with the database administrator
...
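The path rule and the treatment of the two devious users can be expressed in a few lines. In this Python sketch an authorization graph is a set of (grantor, grantee) edges; the function names and the sample graph (the DBA grants to U1, U2, and U3, while U2 and U3 grant to each other) are assumptions chosen to match the situation described above.

```python
from collections import deque

def authorized(edges, root="DBA"):
    """Return the users reachable from root along grant edges."""
    seen, queue = {root}, deque([root])
    while queue:
        grantor = queue.popleft()
        for g, grantee in edges:
            if g == grantor and grantee not in seen:
                seen.add(grantee)
                queue.append(grantee)
    return seen - {root}

def revoke(edges, grantor, grantee, root="DBA"):
    """Remove one grant, then drop edges that are no longer on a path from root."""
    remaining = edges - {(grantor, grantee)}
    reachable = authorized(remaining, root) | {root}
    return {(g, u) for (g, u) in remaining if g in reachable}

graph = {("DBA", "U1"), ("DBA", "U2"), ("DBA", "U3"), ("U2", "U3"), ("U3", "U2")}

# Revoking from U2 alone: U2 retains authorization through U3.
step1 = revoke(graph, "DBA", "U2")
# Revoking from U3 as well: the edges between U2 and U3 no longer lie on a
# path that starts at the DBA, so they are removed too.
step2 = revoke(step1, "DBA", "U3")

print(sorted(authorized(step1)))
print(sorted(authorized(step2)))
```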
8
...
5
...
Each teller must have the same types
of authorizations to the same set of relations
...

A better scheme would be to specify the authorizations that every teller is to be
given, and to separately identify which database users are tellers
...
When a new person is hired as a teller, a user identifier must be allocated
to him, and he must be identified as a teller
...

The notion of roles captures this scheme
...

Authorizations can be granted to roles, in exactly the same fashion as they are granted
to individual users
...

DBA

U1
Figure 6
...


In our bank database, examples of roles could include teller, branch-manager, auditor, and system-administrator
...
The problem with this scheme
is that it would not be possible to identify exactly which teller carried out a transaction, leading to security risks
...

Any authorization that can be granted to a user can be granted to a role
...
And like other authorizations, a user
may also be granted authorization to grant a particular role to others
...

6
...
6 Audit Trails
Many secure database applications require that an audit trail be maintained
...

The audit trail aids security in several ways
...
The bank could then also use the audit trail to trace all
the updates performed by these persons, in order to find other incorrect or fraudulent
updates
...
However, many database systems provide built-in mechanisms to create audit trails, which
are much more convenient to use
...

6
...

We describe these mechanisms, as well as their limitations, in this section
...
6
...
The select
privilege corresponds to the read privilege
...
If the relation
to be created includes a foreign key that references attributes of another relation,
the user/role must have been granted references privilege on those attributes
...


6
...
The grant statement is used to confer authorization
...

The following grant statement grants users U1 , U2 , and U3 select authorization on
the account relation:
grant select on account to U1 , U2 , U3
The update authorization may be given either on all attributes of the relation or
on only some
...
If the list of attributes is omitted, the
update privilege will be granted on all attributes of the relation
...

The SQL references privilege is granted on specific attributes in a manner like
that for the update privilege
...
However, recall from Section 6
...

In the preceding example, if U1 creates a foreign key in a relation r referencing the
branch-name attribute of the branch relation, and then inserts a tuple into r pertaining
to the Perryridge branch, it is no longer possible to delete the Perryridge branch from
the branch relation without also modifying relation r
...

The privilege all privileges can be used as a short form for all the allowable privileges
...
SQL also includes a usage privilege that authorizes a user to use a specified
domain (recall that a domain corresponds to the programming-language notion of a
type, and may be user defined)
...
6
...
6
...

grant teller to john
create role manager
grant teller to manager
grant manager to mary
Thus the privileges of a user or a role consist of
• All privileges directly granted to the user/role
• All privileges granted to roles that have been granted to the user/role
Note that there can be a chain of roles; for example, the role employee may be granted
to all tellers
...
Thus, the manager role inherits all privileges granted to the roles employee and to teller in addition to privileges
granted directly to manager
...
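The rule that a user or role holds both its direct privileges and those of every role granted to it, along a chain of any length, can be sketched as a small traversal. In this Python example the role grants follow the statements above (teller to john, teller to manager, manager to mary, employee to tellers), while the function name and the individual privileges attached to each role are hypothetical.

```python
def effective_privileges(name, direct, granted_roles):
    """Collect privileges granted to name directly or through a chain of roles."""
    privileges, pending, seen = set(), [name], set()
    while pending:
        current = pending.pop()
        if current in seen:
            continue  # guards against cycles in role grants
        seen.add(current)
        privileges |= direct.get(current, set())
        pending.extend(granted_roles.get(current, ()))
    return privileges

# Role grants taken from the text.
granted_roles = {
    "john": ["teller"],
    "manager": ["teller"],
    "mary": ["manager"],
    "teller": ["employee"],
}
# Hypothetical privilege assignments, for illustration only.
direct = {
    "employee": {("select", "branch")},
    "teller": {("select", "account")},
    "manager": {("update", "account")},
}

print(sorted(effective_privileges("john", direct, granted_roles)))
print(sorted(effective_privileges("mary", direct, granted_roles)))
```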
6
...
If we wish to grant a privilege and to allow the recipient
to pass the privilege on to other users, we append the with grant option clause to the
appropriate grant command
...
It takes a form almost
identical to that of grant:
revoke <privilege list> on <relation name or view name>
from <user/role list> [restrict | cascade]
Thus, to revoke the privileges that we granted previously, we write


6
...
5
...
This behavior is called cascading of the
revoke
...
The revoke
statement may alternatively specify restrict:
revoke select on branch from U1 , U2 , U3 restrict
In this case, the system returns an error if there are any cascading revokes, and does
not carry out the revoke action
...
6
...

The SQL standard specifies a primitive authorization mechanism for the database
schema: Only the owner of the schema can carry out any modification to the schema
...
Several database implementations have more powerful authorization mechanisms for database schemas, similar to those discussed earlier, but these mechanisms are nonstandard
...
6
...
For instance,
suppose you want all students to be able to see their own grades, but not the grades
of anyone else
...

Furthermore, with the growth in the Web, database accesses come primarily from
Web application servers
...

The task of authorization then falls on the application server; the entire authorization scheme of SQL is bypassed
...
The problems
are these:
• The code for checking authorization becomes intermixed with the rest of the
application code
...
6
...
Because of an oversight, one of the application programs may not check for authorization, allowing unauthorized users access to confidential data
...

6
...
In such cases, data may
be stored in encrypted form
...
Encryption also forms the basis of
good schemes for authenticating users to a database
...
7
...
Simple encryption
techniques may not provide adequate security, since it may be easy for an unauthorized user to break the code
...
Thus,
Perryridge
becomes
Qfsszsjehf
If an unauthorized user sees only “Qfsszsjehf,” she probably has insufficient information to break the code
...
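The Perryridge example is consistent with shifting each letter one place forward in the alphabet. The Python sketch below implements such a shift and shows how little work is needed to undo it; the function names are illustrative, and the wrap-around from z to a is an assumption.

```python
def shift_encrypt(text, shift=1):
    """Replace each letter by the one `shift` places later, wrapping around."""
    out = []
    for ch in text:
        if ch.isascii() and ch.isalpha():
            base = ord("A") if ch.isupper() else ord("a")
            out.append(chr((ord(ch) - base + shift) % 26 + base))
        else:
            out.append(ch)  # leave digits, spaces, and punctuation unchanged
    return "".join(out)

def shift_decrypt(text, shift=1):
    """Undo shift_encrypt by shifting in the opposite direction."""
    return shift_encrypt(text, -shift)

ciphertext = shift_encrypt("Perryridge")
print(ciphertext)  # Qfsszsjehf

# An intruder needs to try at most 26 shifts to recover the plaintext.
candidates = [shift_decrypt(ciphertext, s) for s in range(26)]
print("Perryridge" in candidates)
```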

A good encryption technique has the following properties:
• It is relatively simple for authorized users to encrypt and decrypt data
...

• Its encryption key is extremely difficult for an intruder to determine
...
For this scheme to work, the authorized users must be provided with
the encryption key via a secure mechanism
...
The DES standard was reaffirmed in 1983, 1987,
...
However, weakness in DES was recognized in 1993 as reaching a
point where a new standard, to be called the Advanced Encryption Standard (AES),
needed to be selected
...
Rijmen and J
...
The Rijndael algorithm was
chosen for its significantly stronger level of security and its relative ease of implementation on current computer systems as well as such devices as smart cards
...

Public-key encryption is an alternative scheme that avoids some of the problems
that we face with the DES
...
Each
user Ui has a public key Ei and a private key Di
...
Each private key is known to only the one user to whom the
key belongs
...
Decryption requires the private key D1
...
If user U1 wants to share data with U2 , U1 encrypts
the data using E2 , the public key of U2
...

For public-key encryption to work, there must be a scheme for encryption that
can be made public without making it easy for people to figure out the scheme for
decryption
...
Such a scheme does exist and is based on these conditions:
• There is an efficient algorithm for testing whether or not a number is prime
...

For purposes of this scheme, data are treated as a collection of integers
...
The
private key consists of the pair (P1 , P2 )
...
Since all that is published is the product P1 P2 , an unauthorized user would need
to be able to factor P1 P2 to steal data
...
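The role of the published product P1 P2 can be illustrated with a toy version of RSA. The Python sketch below is an assumption-laden illustration rather than the scheme as specified in the text: the primes are tiny, the public exponent 17 is an arbitrary conventional choice, and real keys use primes hundreds of digits long.

```python
p1, p2 = 61, 53              # the private primes (toy-sized for illustration)
n = p1 * p2                  # the published product P1 * P2
phi = (p1 - 1) * (p2 - 1)    # computable only by someone who knows p1 and p2
e = 17                       # public exponent, chosen coprime to phi
d = pow(e, -1, phi)          # private exponent (modular inverse, Python 3.8+)

def encrypt(message):
    """Encrypt an integer smaller than n with the public key (e, n)."""
    return pow(message, e, n)

def decrypt(cipher):
    """Decrypt with the private exponent d."""
    return pow(cipher, d, n)

def factor(product):
    """Recover the primes by trial division; feasible only for tiny products."""
    f = 2
    while f * f <= product:
        if product % f == 0:
            return f, product // f
        f += 1
    return None

cipher = encrypt(65)
print(cipher, decrypt(cipher))
print(factor(n))
```

Trial division succeeds instantly here because n is small; the security of the scheme rests on factoring being impractical when the primes are large.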

The details of public-key encryption and the mathematical justification of this technique’s properties are referenced in the bibliographic notes
...
A hybrid scheme used for secure communication is as follows: DES
keys are exchanged via a public-key – encryption scheme, and DES encryption is used
on the data transmitted subsequently
...
7
...
The simplest form of authentication consists of a secret password which must be presented when a connection is opened to a database
...
6
...
However, the use of passwords has some drawbacks, especially over a
network
...
Once
the eavesdropper has a user name and password, she can connect to the database,
pretending to be the legitimate user
...
The database system sends a challenge string to the user
...
The database system
can verify the authenticity of the user by decrypting the string with the same secret
password, and checking the result with the original challenge string
...
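A challenge-response exchange based on a shared secret can be sketched as follows. Note the substitution: instead of encrypting and decrypting the challenge string as described above, this Python sketch has both sides compute a keyed hash (HMAC-SHA-256) of the challenge, which serves the same purpose of proving knowledge of the password without transmitting it. The function names are illustrative assumptions.

```python
import hashlib
import hmac
import secrets

def make_challenge():
    """Database side: generate a fresh random challenge for each login attempt."""
    return secrets.token_bytes(16)

def respond(password, challenge):
    """User side: combine the secret password with the challenge."""
    return hmac.new(password.encode(), challenge, hashlib.sha256).digest()

def verify(password, challenge, response):
    """Database side: recompute the expected response and compare the two."""
    expected = hmac.new(password.encode(), challenge, hashlib.sha256).digest()
    return hmac.compare_digest(expected, response)  # constant-time comparison

challenge = make_challenge()
accepted = verify("s3cret", challenge, respond("s3cret", challenge))
refused = verify("s3cret", challenge, respond("guess", challenge))
print(accepted, refused)
```

Because each challenge is fresh, an eavesdropper who records one response cannot reuse it for a later connection.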

Public-key systems can be used for encryption in challenge – response systems
...
The user decrypts the string using her private key, and returns
the result to the database system
...

This scheme has the added benefit of not storing the secret password in the database,
where it could potentially be seen by system administrators
...
The private key is used to sign data, and the signed data
can be made public
...
Thus, we can authenticate
the data; that is, we can verify that the data were indeed created by the person who
claims to have created them
...
That is, in
case the person who created the data later claims she did not create it (the electronic
equivalent of claiming not to have signed the check), we can prove that that person
must have created the data (unless her private key was leaked to others)
...
8 Summary
• Integrity constraints ensure that changes made to the database by authorized
users do not result in a loss of data consistency
...
In this chapter, we considered several additional
forms of constraints, and discussed mechanisms for ensuring the maintenance
of these constraints
...
Such constraints may also prohibit the use of null values for
particular attributes
...

• Referential-integrity constraints ensure that a value that appears in one relation for a given set of attributes also appears for a certain set of attributes in
another relation
...
Use of more complex constraints may lead to substantial overhead
...
Assertions are declarative
expressions that state predicates that we require always to be true
...
Triggers have many uses, such
as implementing business rules, audit logging, and even carrying out actions
outside the database system
...

• The data stored in the database need to be protected from unauthorized access, malicious destruction or alteration, and accidental introduction of inconsistency
...
Absolute protection of the database
from malicious abuse is not possible, but the cost to the perpetrator can be
made sufficiently high to deter most, if not all, attempts to access the database
without proper authority
...
Authorization is a means by which the database system can be protected against
malicious or unauthorized access
...
However, we must be careful about how authorization can be passed among users if we are to ensure that such authorization can be revoked at some future time
...

• The various authorization provisions in a database system may not provide
sufficient protection for highly sensitive data
...
Only a user who knows how to decipher (decrypt) the encrypted data
can read them
...

Review Terms
• Domain constraints

• Primary key constraint

• Check clause

• Unique constraint

• Referential integrity

• Foreign key constraint


• Cascade
• Assertion
• Trigger
• Event-condition-action model
• Before and after triggers
• Transition variables and tables
• Database security
• Levels of security
• Authorization
• Privileges
  Read
  Insert
  Update
  Delete
  Index
  Resource
  Alteration
  Drop
  Grant
  All privileges
• Authorization graph
• Granting of privileges
• Roles
• Encryption
• Secret-key encryption
• Public-key encryption
• Authentication
• Challenge – response system
• Digital signature
• Nonrepudiation

Exercises
6
...
2 to include
the relations loan and borrower
...
2 Consider the following relational database:
employee (employee-name, street, city)
works (employee-name, company-name, salary)
company (company-name, city)
manages (employee-name, manager-name)
Give an SQL DDL definition of this database
...

6
...
Consider a database that includes the following relations:
salaried-worker (name, office, phone, salary)
hourly-worker (name, hourly-wage)
address (name, street, city)
Suppose that we wish to require that every name that appears in address appear
in either salaried-worker or hourly-worker, but not necessarily in both
...
Propose a syntax for expressing such constraints
...
Discuss the actions that the system must take to enforce a constraint of this
form
...
6
...
4 SQL allows a foreign-key dependency to refer to the same relation, as in the
following example:
create table manager
(employee-name char(20) not null,
manager-name char(20) not null,
primary key (employee-name),
foreign key (manager-name) references manager
on delete cascade )
Here, employee-name is a key to the table manager, meaning that each employee
has at most one manager
...
Explain exactly what happens when a tuple in the relation
manager is deleted
...
5 Suppose there are two relations r and s, such that the foreign key B of r references the primary key A of s
...

6
...

6
...

6
...
account-number = account
...

Write active rules to maintain the view, that is, to keep it up to date on insertions
to and deletions from depositor or account
...

6
...
For each item on your list, state
whether this concern relates to physical security, human security, operating-system security, or database security
...
10 Using the relations of our sample bank database, write an SQL expression to
define the following views:
a
...


b
...

c
...

6
...
10, explain how updates
would be performed (if they should be allowed at all)
...

6
...
In this chapter, we described
the use of views as a security mechanism
...

6
...
14 Database systems that store each relation in a separate operating-system ﬁle
may use the operating system’s security and authorization scheme, instead of
deﬁning a special scheme themselves
...

6
...
16 Perhaps the most important data items in any database system are the passwords that control access to the database
...
Be sure that your scheme allows the system to test passwords
supplied by users who are attempting to log into the system
...
The original SQL proposals for assertions and triggers are discussed in Astrahan et al
...
[1976], and Chamberlin
et al
...
See the bibliographic notes of Chapter 4 for references to SQL standards
and books on SQL
...

[1980a], Hsu and Imielinski [1985], McCune and Henschen [1989], and Chomicki
[1992]
...
Sheard and Stemple [1989] discuss this
approach
...
McCarthy and Dayal
[1989] discuss the architecture of an active database system based on the event–
condition–action formalism
...
Relational Databases

6
...
[1991]
...
A rule system is said to be conﬂuent if, regardless of the rule chosen,
the ﬁnal state is the same
...
[1995]
...
of Defense [1985]
...

Stonebraker and Wong [1974] discuss the Ingres approach to security, which involves modification of users' queries to ensure that users do not access data for which
authorization has not been granted
...

Database systems that can produce incorrect answers when necessary for security
maintenance are discussed in Winslett et al
...

Work on security in relational databases includes that of Stachour and Thuraisingham [1990], Jajodia and Sandhu [1990], and Qian and Lunt [1996]
...

Stallings [1998] provides a textbook description of cryptography
...
The Data Encryption Standard is presented by US Dept
...
Public-key encryption is discussed by Rivest
et al
...
Other discussions on cryptography include Difﬁe and Hellman [1979],
Simmons [1979], Fernandez et al
...

Chapter 7

Relational-Database Design

This chapter continues our discussion of design issues in relational databases
...
One approach is to design schemas that are in an
appropriate normal form
...
In this chapter, we introduce the notion
of functional dependencies
...

7
...

A domain is atomic if elements of the domain are considered to be indivisible
units
...

A set of names is an example of a nonatomic value
...

Composite attributes, such as an attribute address with component attributes street
and city, also have nonatomic domains
...
The distinction is that we do not
normally consider integers to have subparts, but we consider sets of integers to have
subparts— namely, the integers making up the set
...

The domain of all integers would be nonatomic if we considered each integer to be
an ordered list of digits
...
Examples of such numbers would be CS0012 and
EE1127
...
If a relation schema had an attribute whose domain consists of identiﬁcation numbers encoded as above, the schema would not be in ﬁrst normal form
...

Doing so requires extra programming, and information gets encoded in the application program rather than in the database
...

The use of set valued attributes can lead to designs with redundant storage of data,
which in turn can result in inconsistencies
...
Whenever an account is created, or the set of
owners of an account is updated, the update has to be performed at two places; failure to perform both updates can leave the database in an inconsistent state
...
Set valued attributes are also more complicated to write queries with, and
more complicated to reason about
...
Although we have not mentioned ﬁrst normal form earlier, when
we introduced the relational model in Chapter 3 we stated that attribute values must
be atomic
...
For example, composite valued attributes are often useful, and set valued attributes are also useful in many cases, which is why both are supported in the E-R
model
...
There is also a runtime overhead of converting data back and forth from the atomic form
...
In fact, modern database
systems do support many types of nonatomic values, as we will see in Chapters 8
and 9
...

7
...
Among the undesirable properties that a bad design may
have are:

7
...
1 shows an instance of the relation lending (Lending-schema)
...

• t[branch-city] is the city in which the branch named t[branch-name] is located
...

• t[amount] is the amount of the loan whose number is t[loan-number]
...
Say that the loan is made
by the Perryridge branch to Adams in the amount of $1500
...
In our design, we need a tuple with values on all the attributes of Lending-schema
...
1

assets     customer-name   loan-number
9000000    Jones           L-17
2100000    Smith           L-23
1700000    Hayes           L-15
9000000    Jackson         L-14
400000     Jones           L-93
8000000    Turner          L-11
300000     Williams        L-29
3700000    Hayes           L-16
9000000    Johnson         L-18
1700000    Glenn           L-25
7100000    Brooks          L-10

Sample lending relation
...

7
...
In general, the asset and city data for a branch must appear
once for each loan made by that branch
...
Repeating
information wastes space
...
Suppose, for example, that the assets of the Perryridge branch change from 1700000
to 1900000
...
Under our alternative design, many tuples of the lending relation need to be
changed
...
When we perform the update in the alternative database, we must
ensure that every tuple pertaining to the Perryridge branch is updated, or else our
database will show two different asset values for the Perryridge branch
...
We
know that a bank branch has a unique value of assets, so given a branch name we can
uniquely identify the assets value
...
In other words, we say that the functional dependency
branch-name → assets
holds on Lending-schema, but we do not expect the functional dependency branch-name → loan-number to hold
...
We shall see that we can use functional
dependencies to specify formally when a database design is good
...
This is because tuples in the lending relation require values for loan-number, amount, and customer-name
...
Recall, however, that null values are difﬁcult to handle, as we
saw in Section 3
...
4
...

Worse, we would have to delete this information when all the loans have been paid
...

7
...
A functional dependency is a type of constraint that is a
generalization of the notion of key, as discussed in Chapters 2 and 3
...
3
...
They allow us
to express facts about the enterprise that we are modeling with our database
...
In Chapter 2, we deﬁned the notion of a superkey as follows
...
A subset K of R is a superkey of R if, in any legal relation r(R), for all pairs
t1 and t2 of tuples in r such that t1 ≠ t2, then t1[K] ≠ t2[K]
...

The notion of functional dependency generalizes the notion of superkey
...
The functional dependency
α→β
holds on schema R if, in any legal relation r(R), for all pairs of tuples t1 and t2 in r
such that t1 [α] = t2 [α], it is also the case that t1 [β] = t2 [β]
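
The definition can be checked mechanically against a single relation instance. The following is a minimal sketch, assuming a relation is represented as a list of Python dictionaries; the function name `satisfies` is chosen here for illustration and is not from the text. It tests only whether one instance satisfies a dependency, which is weaker than the dependency holding on the schema.

```python
from itertools import combinations

def satisfies(rows, lhs, rhs):
    """True if any two rows that agree on lhs also agree on rhs."""
    for t1, t2 in combinations(rows, 2):
        agree_lhs = all(t1[a] == t2[a] for a in lhs)
        agree_rhs = all(t1[b] == t2[b] for b in rhs)
        if agree_lhs and not agree_rhs:
            return False
    return True

# Four tuples taken from the loan relation used in this chapter.
loan = [
    {"loan-number": "L-17", "branch-name": "Downtown", "amount": 1000},
    {"loan-number": "L-23", "branch-name": "Redwood", "amount": 2000},
    {"loan-number": "L-15", "branch-name": "Perryridge", "amount": 1500},
    {"loan-number": "L-14", "branch-name": "Downtown", "amount": 1500},
]

print(satisfies(loan, ["loan-number"], ["amount"]))  # True
print(satisfies(loan, ["branch-name"], ["amount"]))  # False: Downtown has two amounts
```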
...
That is, K is a superkey if, whenever t1 [K] = t2 [K], it is also the case that
t1 [R] = t2 [R] (that is, t1 = t2 )
...
Consider the schema
Loan-info-schema = (loan-number, branch-name, customer-name, amount)
which is a simplification of the Lending-schema that we saw earlier
...

We shall use functional dependencies in two ways:
1
...
If a relation r is legal under a set F of functional dependencies,
we say that r satisﬁes F
...
To specify constraints on the set of legal relations
...
If we wish to constrain ourselves to relations on schema R that satisfy a
set F of functional dependencies, we say that F holds on R
...
2, to see which functional dependencies
are satisﬁed
...
There are two tuples that have an A
value of a1
...
Similarly, the two tuples with an A value of a2 have the same C value, c2
...
The functional dependency C → A is not
satisﬁed, however
...
7
...
2

B    C    D
b1   c1   d1
b2   c1   d2
b2   c2   d2
b2   c2   d3
b3   c2   d4

Sample relation r
...
These two tuples have the same C values, c2 , but they have different A values, a2 and a3 , respectively
...

Many other functional dependencies are satisﬁed by r, including, for example, the
functional dependency AB → D
...
Observe that there is no pair of distinct tuples t1 and
t2 such that t1 [AB] = t2 [AB]
...
So, r satisﬁes AB → D
...
For example, A → A is satisﬁed by all relations involving attribute A
...
Similarly, AB → A
is satisﬁed by all relations involving attribute A
...

To distinguish between the concepts of a relation satisfying a dependency and a
dependency holding on a schema, we return to the banking example
...
3, we see that customer-street
→ customer-city is satisﬁed
...
3

customer-street   customer-city
Main              Harrison
North             Rye
Main              Harrison
North             Rye
Park              Pittsfield
Putnam            Stamford
Nassau            Princeton
Spring            Pittsfield
Alma              Palo Alto
Sand Hill         Woodside
Senator           Brooklyn
Walnut            Stamford

The customer relation
...
loan-number   branch-name   amount
L-17          Downtown      1000
L-23          Redwood       2000
L-15          Perryridge    1500
L-14          Downtown      1500
L-93          Mianus        500
L-11          Round Hill    900
L-29          Pownal        1200
L-16          North Town    1300
L-18          Downtown      2000
L-25          Perryridge    2500
L-10          Brighton      2200

Figure 7
...

can have streets with the same name
...
So, we would not include customer-street → customer-city in the set of functional
dependencies that hold on Customer-schema
...
4, we see that the dependency loan-number → amount is satisfied
...
Therefore, we want to require
that loan-number → amount be satisﬁed by the loan relation at all times
...

In the branch relation of Figure 7
...
We want to require that branch-name → assets hold on
Branch-schema
...

In what follows, we assume that, when we design a relational database, we ﬁrst
list those functional dependencies that must always hold
...
5

branch-city   assets
Brooklyn      9000000
Palo Alto     2100000
Horseneck     1700000
Horseneck     400000
Horseneck     8000000
Bennington    300000
Rye           3700000
Brooklyn      7100000

The branch relation
...

7
...
3
...
Rather, we
need to consider all functional dependencies that hold
...
We say that such functional dependencies are “logically implied” by F
...

Suppose we are given a relation schema R = (A, B, C, G, H, I) and the set of
functional dependencies
A→B
A→C
CG → H
CG → I
B→H
The functional dependency
A→H

7
...
That is, we can show that, whenever our given set of functional
dependencies holds on a relation, A → H must also hold on the relation
...
But that is exactly the deﬁnition of A → H
...
The closure of F, denoted by F + , is the
set of all functional dependencies logically implied by F
...
If F were large, this
process would be lengthy and difﬁcult
...

Axioms, or rules of inference, provide a simpler technique for reasoning about
functional dependencies
...
)
for sets of attributes, and uppercase Roman letters from the beginning of the alphabet
for individual attributes
...

We can use the following three rules to ﬁnd logically implied functional dependencies
...
This collection
of rules is called Armstrong’s axioms in honor of the person who ﬁrst proposed it
...
If α is a set of attributes and β ⊆ α, then α → β holds
...
If α → β holds and γ is a set of attributes, then γα → γβ
holds
...
If α → β holds and β → γ holds, then α → γ holds
...
They are complete, because, for a given set F of functional dependencies, they allow us to generate all F +
...

Although Armstrong’s axioms are complete, it is tiresome to use them directly for
the computation of F +
...
It is possible to use Armstrong’s axioms to prove that these rules are correct (see Exercises 7
...
9, and 7
...

• Union rule
...

• Decomposition rule
...

• Pseudotransitivity rule
...

Let us apply our rules to the example of schema R = (A, B, C, G, H, I) and the
set F of functional dependencies {A → B, A → C, CG → H, CG → I, B → H}
...
Since A → B and B → H hold, we apply the transitivity rule
...

• CG → HI
...

• AG → I
...

Another way of ﬁnding that AG → I holds is as follows
...
Applying the transitivity rule to
this dependency and CG → I, we infer AG → I
...
6 shows a procedure that demonstrates formally how to use Armstrong’s
axioms to compute F +
...
We will also
see an alternative way of computing F + in Section 7
...
3
...
Since a set of size n has 2^n subsets, there are a total of 2^n × 2^n = 2^(2n) possible
functional dependencies, where n is the number of attributes in R
...
Thus, the procedure is guaranteed to terminate
...
3
...
One way of doing this is to compute
F + , take all functional dependencies with α as the left-hand side, and take the union
of the right-hand sides of all such dependencies
...

F+ = F
repeat
    for each functional dependency f in F+
        apply reflexivity and augmentation rules on f
        add the resulting functional dependencies to F+
    for each pair of functional dependencies f1 and f2 in F+
        if f1 and f2 can be combined using transitivity
            add the resulting functional dependency to F+
until F+ does not change any further
Figure 7
...

7
...

Let α be a set of attributes
...
Figure 7
...
The
input is a set F of functional dependencies and the set α of attributes
...

To illustrate how the algorithm works, we shall use it to compute (AG)+ with the
functional dependencies deﬁned in Section 7
...
2
...
The ﬁrst
time that we execute the while loop to test each functional dependency, we ﬁnd that
• A → B causes us to include B in result
...

• A → C causes result to become ABCG
...

• CG → I causes result to become ABCGHI
...

Let us see why the algorithm of Figure 7
...
The ﬁrst step is correct, since
α → α always holds (by the reﬂexivity rule)
...
Since we start the while loop with α → result being true, we can add γ to result
only if β ⊆ result and β → γ
...
Another application of transitivity shows that α → γ (using α → β and
β → γ)
...
Thus, any attribute returned by the algorithm
is in α+
...
If there is an attribute in α+ that
is not yet in result, then there must be a functional dependency β → γ for which β ⊆
result, and at least one attribute in γ is not in result
...
There is a faster (although slightly more complex) algorithm that runs in time linear in the size of F; that algorithm is presented as part of
Exercise 7
...

result := α;
while (changes to result) do
    for each functional dependency β → γ in F do
        begin
            if β ⊆ result then result := result ∪ γ;
        end
Figure 7
...
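
The attribute-closure procedure translates almost line for line into code. The sketch below assumes that attributes are single characters and that each functional dependency is a pair of strings (left side, right side); this representation and the name `attribute_closure` are choices made for this illustration. It recomputes (AG)+ for the set F used in the running example.

```python
def attribute_closure(attrs, fds):
    """Compute the closure of attrs under the functional dependencies fds."""
    result = set(attrs)
    changed = True
    while changed:  # repeat until a full pass adds nothing
        changed = False
        for lhs, rhs in fds:
            # If the left side is already in result, add the right side.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

# F = {A -> B, A -> C, CG -> H, CG -> I, B -> H}
F = [("A", "B"), ("A", "C"), ("CG", "H"), ("CG", "I"), ("B", "H")]

closure = attribute_closure("AG", F)
print("".join(sorted(closure)))  # ABCGHI
```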

There are several uses of the attribute closure algorithm:
• To test if α is a superkey, we compute α+ , and check if α+ contains all attributes of R
...
That is, we compute α+ by using attribute
closure, and then check if it contains β
...

• It gives us an alternative way to compute F + : For each γ ⊆ R, we ﬁnd the
closure γ + , and for each S ⊆ γ + , we output a functional dependency γ → S
...
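
The first two uses listed above reduce to one subset test each. The following sketch assumes single-character attributes and dependencies written as pairs of strings; the helper names `is_superkey` and `is_implied` are hypothetical and introduced only for this illustration.

```python
def attribute_closure(attrs, fds):
    """Closure of attrs under fds (pairs of attribute strings)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_superkey(attrs, schema, fds):
    """attrs is a superkey if its closure contains every attribute of schema."""
    return set(schema) <= attribute_closure(attrs, fds)

def is_implied(lhs, rhs, fds):
    """lhs -> rhs is in F+ exactly when rhs is contained in the closure of lhs."""
    return set(rhs) <= attribute_closure(lhs, fds)

R = "ABCGHI"
F = [("A", "B"), ("A", "C"), ("CG", "H"), ("CG", "I"), ("B", "H")]

print(is_superkey("AG", R, F))  # True
print(is_superkey("A", R, F))   # False: the closure of A is ABCH
print(is_implied("A", "H", F))  # True, by transitivity through B
```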
3
...
Whenever a user performs an update on the relation, the database system must ensure that
the update does not violate any functional dependencies, that is, all the functional
dependencies in F are satisﬁed in the new database state
...

We can reduce the effort spent in checking for violations by testing a simpliﬁed set
of functional dependencies that has the same closure as the given set
...
However, the simpliﬁed set is easier to test
...
First, we need some deﬁnitions
...
The formal
deﬁnition of extraneous attributes is as follows
...

• Attribute A is extraneous in α if A ∈ α, and F logically implies (F − {α →
β}) ∪ {(α − A) → β}
...

For example, suppose we have the functional dependencies AB → C and A → C
in F
...
As another example, suppose we have the
functional dependencies AB → CD and A → C in F
...

Beware of the direction of the implications when using the deﬁnition of extraneous
attributes: If you exchange the left-hand side with right-hand side, the implication
will always hold
...
Here is how we can test efﬁciently if an attribute is extraneous
...

Consider an attribute A in a dependency α → β
...
To do so, compute α+ (the closure
of α) under F′ = (F − {α → β}) ∪ {α → (β − A)}; if α+ includes A, then A is extraneous in β
...
To do so, compute γ + (the closure of γ) under F ; if γ +
includes all attributes in β, then A is extraneous in α
...
To check
if C is extraneous in AB → CD, we compute the attribute closure of AB under
F′ = {AB → D, A → E, E → C}, the set obtained by removing C from the right-hand side of AB → CD
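
Both extraneity tests can be sketched with attribute closure. The code below assumes dependencies are stored as pairs of frozensets; the helper names `fd`, `extraneous_in_rhs`, and `extraneous_in_lhs` are hypothetical. The first example assumes the original set is F = {AB → CD, A → E, E → C}, and the second uses the pair AB → C, A → C mentioned earlier.

```python
def attribute_closure(attrs, fds):
    """Closure of attrs under fds (pairs of frozensets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def fd(lhs, rhs):
    """Build a dependency from two strings of single-character attributes."""
    return (frozenset(lhs), frozenset(rhs))

def extraneous_in_rhs(attr, dep, fds):
    """Remove attr from the right side of dep, then test whether the left
    side still determines attr under the reduced set."""
    lhs, rhs = dep
    reduced = [d for d in fds if d != dep] + [(lhs, rhs - {attr})]
    return attr in attribute_closure(lhs, reduced)

def extraneous_in_lhs(attr, dep, fds):
    """Test whether the left side without attr still determines the right
    side under the original set."""
    lhs, rhs = dep
    return rhs <= attribute_closure(lhs - {attr}, fds)

F = [fd("AB", "CD"), fd("A", "E"), fd("E", "C")]
print(extraneous_in_rhs("C", F[0], F))  # True: AB -> E -> C still yields C
print(extraneous_in_rhs("D", F[0], F))  # False

G = [fd("AB", "C"), fd("A", "C")]
print(extraneous_in_lhs("B", G[0], G))  # True: A alone determines C
```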
...

A canonical cover Fc for F is a set of dependencies such that F logically implies all
dependencies in Fc , and Fc logically implies all dependencies in F
...

• Each left side of a functional dependency in Fc is unique
...

A canonical cover for a set of functional dependencies F can be computed as depicted in Figure 7
...
It is important to note that when checking if an attribute is extraneous, the check uses the dependencies in the current value of Fc , and not the dependencies in F
...
Such functional dependencies
should be deleted
...
However,
Fc is minimal in a certain sense — it does not contain extraneous attributes, and it
Fc = F
repeat
Use the union rule to replace any dependencies in Fc of the form
α1 → β1 and α1 → β2 with α1 → β1 β2
...

/* Note: the test for extraneous attributes is done using Fc , not F */
If an extraneous attribute is found, delete it from α → β
...

Figure 7
...

7
...
It is cheaper to test Fc
than it is to test F itself
...

• There are two functional dependencies with the same set of attributes on the
left side of the arrow:
A → BC
A→B
We combine these functional dependencies into A → BC
...
This assertion is true because B → C is already in our set of functional dependencies
...

Thus, our canonical cover is
A→B
B→C
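
The computation can be sketched in code as well. This is one possible implementation, not the book's: it merges dependencies with equal left sides, removes one extraneous attribute at a time, and repeats until nothing changes. The starting set F = {A → BC, B → C, A → B, AB → C} is an assumption chosen to be consistent with the steps and the result stated above, and the function names are hypothetical.

```python
def attribute_closure(attrs, fds):
    """Closure of attrs under fds (pairs of frozensets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def find_extraneous(fc):
    """Return (index, simplified dependency) for one extraneous attribute,
    or None if the current set has none."""
    for i, (lhs, rhs) in enumerate(fc):
        for a in sorted(lhs):  # test attributes on the left side
            if len(lhs) > 1 and rhs <= attribute_closure(lhs - {a}, fc):
                return i, (lhs - {a}, rhs)
        for a in sorted(rhs):  # test attributes on the right side
            reduced = fc[:i] + [(lhs, rhs - {a})] + fc[i + 1:]
            if a in attribute_closure(lhs, reduced):
                return i, (lhs, rhs - {a})
    return None

def canonical_cover(fds):
    fc = [(frozenset(l), frozenset(r)) for l, r in fds]
    while True:
        merged = {}  # union rule: one dependency per left side
        for lhs, rhs in fc:
            merged[lhs] = merged.get(lhs, frozenset()) | rhs
        fc = [(l, r) for l, r in merged.items() if r]  # drop empty right sides
        change = find_extraneous(fc)  # always tested against the current Fc
        if change is None:
            return fc
        index, simplified = change
        fc[index] = simplified

F = [("A", "BC"), ("B", "C"), ("A", "B"), ("AB", "C")]
for lhs, rhs in canonical_cover(F):
    print("".join(sorted(lhs)), "->", "".join(sorted(rhs)))
# A -> B
# B -> C
```

Because a canonical cover need not be unique, a different order of testing attributes could produce a different but equally valid result for other inputs.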
Given a set F of functional dependencies, it may be that an entire functional dependency in the set is extraneous, in the sense that dropping it does not change the
closure of F
...
Suppose that, to the contrary, there were such an extraneous
functional dependency in Fc
...

A canonical cover might not be unique
...
If we apply the extraneity
test to A → BC, we ﬁnd that both B and C are extraneous under F
...
Then,
1
...
Now,
B is not extraneous in the right-hand side of A → B under F
...

7
...
If B is deleted, we get the set {A → C, B → AC, and C → AB}
...

As an exercise, can you ﬁnd one more canonical cover for F ?

7
...
2 suggests that we should decompose a relation schema
that has many attributes into several schemas with fewer attributes
...

Consider an alternative design in which we decompose Lending-schema into the
following two schemas:
Branch-customer-schema = (branch-name, branch-city, assets, customer-name)
Customer-loan-schema = (customer-name, loan-number, amount)
Using the lending relation of Figure 7
...
9 and 7
...

Of course, there are cases in which we need to reconstruct the loan relation
...
No relation in our alternative database contains these data
...
It appears that we can do so by writing
branch-customer
branch-name   branch-city
Downtown      Brooklyn
Redwood       Palo Alto
Perryridge    Horseneck
Downtown      Brooklyn
Mianus        Horseneck
Round Hill    Horseneck
Pownal        Bennington
North Town    Rye
Downtown      Brooklyn
Perryridge    Horseneck
Brighton      Brooklyn

Figure 7
...

customer-name
Jones
Smith
Hayes
Jackson
Jones
Turner
Williams
Hayes
Johnson
Glenn
Brooks
Figure 7
...

Figure 7
...
When
we compare this relation and the lending relation with which we started (Figure 7
...
In our example, branch-customer ⋈ customer-loan
has the following additional tuples:
(Downtown, Brooklyn, 9000000, Jones, L-93, 500)
(Perryridge, Horseneck, 1700000, Hayes, L-16, 1300)
(Mianus, Horseneck, 400000, Jones, L-17, 1000)
(North Town, Rye, 3700000, Hayes, L-15, 1500)
Consider the query, “Find all bank branches that have made a loan in an amount less
than $1000
...
1, we see that the only branches with loan
amounts less than $1000 are Mianus and Round Hill
...

A closer examination of this example shows why
...
Thus, when we join branch-customer and customer-loan, we obtain not only
the tuples we had originally in lending, but also several additional tuples
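
The effect is easy to reproduce on a small scale. The sketch below uses only the two lending tuples for customer Jones, represents relations as lists of dictionaries, and defines `project` and `natural_join` as illustrative helpers (they are not from the text). Joining the two projections on customer-name produces two tuples that were never in lending.

```python
def project(rows, attrs):
    """Projection onto attrs with duplicate elimination."""
    seen = {tuple(row[a] for a in attrs) for row in rows}
    return [dict(zip(attrs, values)) for values in sorted(seen)]

def natural_join(r, s):
    """Natural join of two non-empty relations on their common attributes."""
    common = [a for a in r[0] if a in s[0]]
    return [{**t, **u} for t in r for u in s
            if all(t[a] == u[a] for a in common)]

# The two lending tuples for customer Jones.
lending = [
    {"branch-name": "Downtown", "branch-city": "Brooklyn", "assets": 9000000,
     "customer-name": "Jones", "loan-number": "L-17", "amount": 1000},
    {"branch-name": "Mianus", "branch-city": "Horseneck", "assets": 400000,
     "customer-name": "Jones", "loan-number": "L-93", "amount": 500},
]

branch_customer = project(
    lending, ["branch-name", "branch-city", "assets", "customer-name"])
customer_loan = project(lending, ["customer-name", "loan-number", "amount"])

joined = natural_join(branch_customer, customer_loan)
spurious = [t for t in joined if t not in lending]

print(len(joined))  # 4
for t in spurious:
    print(t["branch-name"], t["loan-number"], t["amount"])
```

The two spurious tuples pair Downtown with L-93 and Mianus with L-17, which matches two of the additional tuples listed above.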
...
We are no longer able, in general, to represent in the database information
about which customers are borrowers from which branch
...
A decomposition that is not a lossy-join decomposition is a lossless-join decomposition.

7
...
11

assets     customer-name   loan-number   amount
9000000    Jones           L-17          1000
9000000    Jones           L-93          500
2100000    Smith           L-23          2000
1700000    Hayes           L-15          1500
1700000    Hayes           L-16          1300
9000000    Jackson         L-14          1500
400000     Jones           L-17          1000
400000     Jones           L-93          500
8000000    Turner          L-11          900
300000     Williams        L-29          1200
3700000    Hayes           L-15          1500
3700000    Hayes           L-16          1300
9000000    Johnson         L-18          2000
1700000    Glenn           L-25          2500
7100000    Brooks          L-10          2200

The relation branch-customer ⋈ customer-loan
...
It should be clear from our example that a lossy-join decomposition is, in general, a bad database design
...
This representation is not adequate because a customer may have several loans, yet these loans are not necessarily obtained
from the same branch
...
The difference between this example and the preceding one is that the assets of a branch are the same, regardless
of the customer to which we are referring, whereas the lending branch associated
with a certain loan amount does depend on the customer to which we are referring
...

7
...
That is, the functional
dependency
branch-name → assets branch-city
holds, but customer-name does not functionally determine loan-number
...
Therefore, we restate the preceding examples more concisely and more formally
...
A set of relation schemas {R1 , R2 ,
...
, Rn } is a decomposition of R if, for i = 1, 2,
...

Let r be a relation on schema R, and let ri = ΠRi (r) for i = 1, 2,
...
That is,
{r1 , r2 ,
...
, Rn }
...
When we compute the
relations r1 , r2 ,
...
, n
...
The
details are left for you to complete as an exercise
...

In general, r ≠ r1 ⋈ r2 ⋈ · · · ⋈ rn
...

• R = Lending-schema
...

• R2 = Customer-loan-schema
...
1
...
9
...
10
...
11
...
1 and 7
...

To have a lossless-join decomposition, we need to impose constraints on the set of
possible relations
...

7
...
We say that a relation is legal if it satisﬁes all rules, or constraints, that we
impose on our database
...

A decomposition {R1 , R2 ,
...
A major part of this chapter deals with the questions of
how to specify constraints on the database, and how to obtain lossless-join decompositions that avoid the pitfalls represented by the examples of bad database designs
that we have seen in this section
...
5 Desirable Properties of Decomposition
We can use a given set of functional dependencies in designing a relational database
in which most of the undesirable properties discussed in Section 7
...

When we design such systems, it may become necessary to decompose a relation
into several smaller relations
...
In later sections, we outline speciﬁc ways of decomposing a relational
schema to get the properties we desire
...
2:
Lending-schema = (branch-name, branch-city, assets, customer-name,
loan-number, amount)
The set F of functional dependencies that we require to hold on Lending-schema is
branch-name → branch-city assets
loan-number → amount branch-name
As we discussed in Section 7
...
Assume that we decompose it to the following three relations:
Branch-schema = (branch-name, branch-city, assets)
Loan-schema = (loan-number, branch-name, amount)
Borrower-schema = (customer-name, loan-number)
We claim that this decomposition has several desirable properties, which we discuss
next
...

7
...
1 Lossless-Join Decomposition
In Section 7
...
We claim that the

decomposition in Section 7
...
To demonstrate our claim, we must
ﬁrst present a criterion for determining whether a decomposition is lossy
...
Let
R1 and R2 form a decomposition of R
...
We can use attribute closure to efﬁciently test for
superkeys, as we have seen earlier
...
We
begin by decomposing Lending-schema into two schemas:
Branch-schema = (branch-name, branch-city, assets)
Loan-info-schema = (branch-name, customer-name, loan-number, amount)
Since branch-name → branch-city assets, the augmentation rule for functional dependencies (Section 7
...
2) implies that
branch-name → branch-name branch-city assets
Since Branch-schema ∩ Loan-info-schema = {branch-name}, it follows that our initial
decomposition is a lossless-join decomposition
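
The binary test can be sketched with attribute closure: a decomposition of R into R1 and R2 is lossless if the closure of R1 ∩ R2 contains all of R1 or all of R2. The code assumes schemas are sets of attribute names and dependencies are pairs of sets; the function name `is_lossless_binary` is hypothetical.

```python
def attribute_closure(attrs, fds):
    """Closure of attrs under fds (pairs of sets of attribute names)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_lossless_binary(r1, r2, fds):
    """True if the common attributes functionally determine all of r1 or r2."""
    closure = attribute_closure(r1 & r2, fds)
    return r1 <= closure or r2 <= closure

F = [
    ({"branch-name"}, {"branch-city", "assets"}),
    ({"loan-number"}, {"amount", "branch-name"}),
]

branch = {"branch-name", "branch-city", "assets"}
loan_info = {"branch-name", "customer-name", "loan-number", "amount"}
print(is_lossless_binary(branch, loan_info, F))  # True

branch_customer = {"branch-name", "branch-city", "assets", "customer-name"}
customer_loan = {"customer-name", "loan-number", "amount"}
print(is_lossless_binary(branch_customer, customer_loan, F))  # False
```

The second call fails because the only common attribute, customer-name, determines nothing beyond itself under F.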
...

For the general case of decomposition of a relation into multiple parts at once, the
test for lossless join decomposition is more complicated
...

While the test for binary decomposition is clearly a sufﬁcient condition for lossless
join, it is a necessary condition only if all constraints are functional dependencies
...

7
...
2 Dependency Preservation
There is another goal in relational-database design: dependency preservation
...
functional dependencies
...

To decide whether joins must be computed to check an update, we need to determine what functional dependencies can be tested by checking each relation individually
...
, Rn
be a decomposition of R
...
Since all functional dependencies in
a restriction involve attributes of only one relation schema, it is possible to test such
a dependency for satisfaction by checking only one relation
...
For instance, suppose F = {A → B, B → C}, and we have a decomposition into
AC and AB
...

The set of restrictions F1 , F2 ,
...
We now must ask whether testing only the restrictions is sufﬁcient
...
F′, the union of the restrictions, is a set of functional dependencies on schema R, but,
in general, F′ ≠ F
...
If the latter is
true, then every dependency in F is logically implied by F′, and, if we verify that F′
is satisfied, we have verified that F is satisfied
...

Figure 7
...
The input
is a set D = {R1 , R2 ,
...
This algorithm is expensive since it requires computation of F + ;
we will describe another algorithm that is more efﬁcient after giving an example of
testing for dependency preservation
...
Instead of applying the algorithm of Figure 7
...
12

Testing for dependency preservation
...

7
...

• We can test the functional dependency: branch-name → branch-city assets using
Branch-schema = (branch-name, branch-city, assets)
...

If each member of F can be tested on one of the relations of the decomposition, then
the decomposition is dependency preserving
...
The alternative test can
therefore be used as a sufﬁcient condition that is easy to check; if it fails we cannot
conclude that the decomposition is not dependency preserving, instead we will have
to apply the general test
...
The idea is to test each functional dependency α → β in F by using a
modiﬁed form of attribute closure to see if it is preserved by the decomposition
...

result = α
while (changes to result) do
    for each Ri in the decomposition
        t = (result ∩ Ri)+ ∩ Ri
        result = result ∪ t
The attribute closure is with respect to the functional dependencies in F
...
The
decomposition is dependency preserving if and only if all the dependencies in F are
preserved
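
A sketch of this test, assuming single-character attributes with schemas and dependency sides written as strings; the names `is_preserved` and `preserves_dependencies` are hypothetical. It reuses the earlier example F = {A → B, B → C} with the decomposition into AC and AB, and contrasts it with a decomposition into AB and BC.

```python
def attribute_closure(attrs, fds):
    """Closure of attrs under fds (pairs of attribute strings)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_preserved(lhs, rhs, decomposition, fds):
    """Grow result from lhs using only what each schema Ri can see."""
    result = set(lhs)
    changed = True
    while changed:
        changed = False
        for ri in decomposition:
            ri = set(ri)
            t = attribute_closure(result & ri, fds) & ri
            if not t <= result:
                result |= t
                changed = True
    return set(rhs) <= result

def preserves_dependencies(decomposition, fds):
    return all(is_preserved(l, r, decomposition, fds) for l, r in fds)

F = [("A", "B"), ("B", "C")]
print(preserves_dependencies(["AC", "AB"], F))  # False: B -> C cannot be checked
print(preserves_dependencies(["AB", "BC"], F))  # True
```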
...
This procedure
takes polynomial time, instead of the exponential time required to compute F +
...
5
...
2
...
The decomposition separates
branch and loan data into distinct relations, thereby eliminating this redundancy
...
In the decomposition, the relation on schema Borrower-schema contains the loan-number, customer-name relationship, and no other schema
does
...
on Borrower-schema
...

Clearly, the lack of redundancy in our decomposition is desirable
...

7
...
In this section we cover BCNF (deﬁned below), and later, in
Section 7
...

7
...
1 Deﬁnition
One of the more desirable normal forms that we can obtain is Boyce – Codd normal
form (BCNF)
...

• α is a superkey for schema R
...

As an illustration, consider the following relation schemas and their respective
functional dependencies:
• Customer-schema = (customer-name, customer-street, customer-city)
customer-name → customer-street customer-city
• Branch-schema = (branch-name, assets, branch-city)
branch-name → assets branch-city
• Loan-info-schema = (branch-name, customer-name, loan-number, amount)
loan-number → amount branch-name
We claim that Customer-schema is in BCNF
...
The only nontrivial functional dependencies that hold on
Customer-schema have customer-name on the left side of the arrow
...
Similarly, it can be shown easily that the relation
schema Branch-schema is in BCNF
...
First, note that loan-number
is not a superkey for Loan-info-schema, since we could have a pair of tuples representing a single loan made to two people — for example,
(Downtown, John Bell, L-44, 1000)
(Downtown, Jane Bell, L-44, 1000)

Because we did not list functional dependencies that rule out the preceding case, loan-number is not a candidate key
...
Therefore, Loan-info-schema does not satisfy the deﬁnition of
BCNF
...
2
...
We can eliminate this redundancy by redesigning our database such that
all schemas are in BCNF
...
Consider the decomposition of Loan-info-schema into two schemas:
Loan-schema = (loan-number, branch-name, amount)
Borrower-schema = (customer-name, loan-number)
This decomposition is a lossless-join decomposition
...
In this example, it is easy to see that
loan-number → amount branch-name
applies to the Loan-schema, and that only trivial functional dependencies apply to
Borrower-schema
...
Thus, both schemas of our decomposition are in BCNF
...
There is exactly one tuple for each loan in the relation on Loan-schema, and one tuple for each customer of each loan in the relation on
Borrower-schema
...

Often testing of a relation to see if it satisﬁes BCNF can be simpliﬁed:
• To check if a nontrivial dependency α → β causes a violation of BCNF, compute α+ (the attribute closure of α), and verify that it includes all attributes of
R; that is, it is a superkey of R
...

We can show that if none of the dependencies in F causes a violation of
BCNF, then none of the dependencies in F + will cause a violation of BCNF
either
...
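The closure-based test just described can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: the function names closure and violates_bcnf and the representation of a dependency as a pair of sets are our own assumptions. The schema and dependency are those of Loan-info-schema.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def violates_bcnf(lhs, rhs, schema, fds):
    """True if the dependency lhs -> rhs causes a BCNF violation on schema."""
    if rhs <= lhs:
        return False  # trivial dependencies never violate BCNF
    # Violation exactly when the closure of lhs misses some attribute of schema
    return not closure(lhs, fds) >= schema


# Loan-info-schema and its single nontrivial dependency
loan_info = {"branch-name", "customer-name", "loan-number", "amount"}
F = [({"loan-number"}, {"amount", "branch-name"})]

loan_number_plus = closure({"loan-number"}, F)
is_violation = violates_bcnf({"loan-number"}, {"amount", "branch-name"}, loan_info, F)
```

Here the closure of loan-number does not contain customer-name, so loan-number is not a superkey and the dependency is reported as a violation.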

That is, it does not sufﬁce to use F when we test a relation Ri , in a decomposition
of R, for violation of BCNF
...
Suppose this were


7
...
Now, neither of the dependencies in
F contains only attributes from (A, C, D, E) so we might be misled into thinking R2
satisﬁes BCNF
...
Thus, we may need a dependency that is in F + , but is not in F , to
show that a decomposed relation is not in BCNF
...
To check if a relation Ri in a decomposition of R is in BCNF, we apply this test:
• For every subset α of attributes in Ri , check that α+ (the attribute closure of α
under F ) either includes no attribute of Ri − α, or includes all attributes of Ri
...

The above dependency shows that Ri violates BCNF, and is a “witness” for the violation
...
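The subset test above can be sketched as follows. The schema R = (A, B, C, D, E) with F = {A → B, BC → D} and the decomposition into (A, B) and (A, C, D, E) are hypothetical values chosen only for illustration, and the name bcnf_witness is our own. The sketch enumerates every proper subset α of Ri, so it takes exponential time in the size of Ri.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) frozensets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_witness(ri, fds):
    """Return (alpha, beta) witnessing a BCNF violation on ri, or None."""
    ri = frozenset(ri)
    for size in range(1, len(ri)):
        for combo in combinations(sorted(ri), size):
            alpha = frozenset(combo)
            plus = closure(alpha, fds)          # closure under the original F
            determined = (plus - alpha) & ri    # attributes of Ri - alpha in alpha+
            if determined and not plus >= ri:
                return alpha, frozenset(determined)
    return None


# Hypothetical example: no dependency of F lies wholly inside (A, C, D, E)
F = [(frozenset("A"), frozenset("B")), (frozenset("BC"), frozenset("D"))]
witness = bcnf_witness("ACDE", F)
```

For this input the sketch reports AC → D, a dependency that is in F+ but not in F, as the witness.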
6
...

7
...
2 Decomposition Algorithm
We are now able to state a general method to decompose a relation schema so as to
satisfy BCNF
...
13 shows an algorithm for this task
...
, Rn by the algorithm
...

The decomposition that the algorithm generates is not only in BCNF, but is also
a lossless-join decomposition
...

result := {R};
done := false;
compute F+;
while (not done) do
    if (there is a schema Ri in result that is not in BCNF)
        then begin
            let α → β be a nontrivial functional dependency that holds
                on Ri such that α → Ri is not in F+, and α ∩ β = ∅;
            result := (result − Ri) ∪ (Ri − β) ∪ (α, β);
        end
        else done := true;
Figure 7
...
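The loop in the figure can be sketched in Python as follows. This is an illustrative sketch under stated assumptions: the helper names are ours, the violating dependency is found by brute-force enumeration of attribute subsets rather than by computing F+, and the attribute list used for Lending-schema is assumed to be the union of the attributes of Branch-schema and Loan-info-schema.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) frozensets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_witness(ri, fds):
    """Return (alpha, beta) with alpha -> beta violating BCNF on ri, or None."""
    for size in range(1, len(ri)):
        for combo in combinations(sorted(ri), size):
            alpha = frozenset(combo)
            plus = closure(alpha, fds)
            beta = frozenset((plus - alpha) & ri)   # alpha and beta are disjoint
            if beta and not plus >= ri:
                return alpha, beta
    return None


def bcnf_decompose(schema, fds):
    """Repeatedly split a violating schema Ri into (Ri - beta) and (alpha, beta)."""
    result = [frozenset(schema)]
    done = False
    while not done:
        done = True
        for ri in result:
            found = bcnf_witness(ri, fds)
            if found:
                alpha, beta = found
                result.remove(ri)
                result += [ri - beta, alpha | beta]
                done = False
                break
    return set(result)


# Assumed attribute list for Lending-schema (see the lead-in)
lending = {"branch-name", "branch-city", "assets",
           "customer-name", "loan-number", "amount"}
F = [(frozenset({"branch-name"}), frozenset({"assets", "branch-city"})),
     (frozenset({"loan-number"}), frozenset({"amount", "branch-name"}))]
schemas = bcnf_decompose(lending, F)
```

Under these assumptions the sketch produces three schemas whose attributes match Branch-schema, Loan-schema, and Borrower-schema.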

Silberschatz−Korth−Sudarshan:
Database System
Concepts, Fourth Edition

282

Chapter 7

II
...
Relational−Database
Design

© The McGraw−Hill
Companies, 2001

Relational-Database Design

We apply the BCNF decomposition algorithm to the Lending-schema schema that
we used in Section 7
...

We can apply the algorithm of Figure 7
...
Thus, Lending-schema is not in BCNF
...
Since branch-name is a key for
Branch-schema, the relation Branch-schema is in BCNF
...

We replace Loan-info-schema by
Loan-schema = (loan-number, branch-name, amount)
Borrower-schema = (customer-name, loan-number)
• Loan-schema and Borrower-schema are in BCNF
...
These relation
schemas are the same as those in Section 7
...

The BCNF decomposition algorithm takes time exponential in the size of the initial
schema, since the algorithm for checking if a relation in the decomposition satisﬁes
BCNF can take exponential time
...

algorithm that can compute a BCNF decomposition in polynomial time
...

7
...
3 Dependency Preservation
Not every BCNF decomposition is dependency preserving
...
The
set F of functional dependencies that we require to hold on the Banker-schema is
banker-name → branch-name
branch-name customer-name → banker-name
Clearly, Banker-schema is not in BCNF since banker-name is not a superkey
...
13, we obtain the following BCNF decomposition:
Banker-branch-schema = (banker-name, branch-name)
Customer-banker-schema = (customer-name, banker-name)
The decomposed schemas preserve only banker-name → branch-name (and trivial
dependencies), but the closure of {banker-name → branch-name} does not include
customer-name branch-name → banker-name
...

To see why the decomposition of Banker-schema into the schemas Banker-branch-schema and Customer-banker-schema is not dependency preserving, we apply the algorithm of Figure 7
...
We ﬁnd that the restrictions F1 and F2 of F to each schema
are:
F1 = {banker-name → branch-name}
F2 = ∅ (only trivial dependencies hold on Customer-banker-schema)
(For brevity, we do not show trivial functional dependencies
...
Therefore, (F1 ∪ F2)+ ≠ F+, and the decomposition is not dependency preserving
...
Moreover, it is easy to see that any BCNF decomposition of Banker-schema
must fail to preserve customer-name branch-name → banker-name
...
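The failure to preserve customer-name branch-name → banker-name can be checked mechanically. The sketch below uses one standard test that avoids computing F+: for each dependency α → β in F, it grows a set starting from α using only closures restricted to the individual schemas, and then checks whether β was reached. The function names are our own assumptions.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_dependency_preserving(fds, decomposition):
    """True if every dependency in fds is implied by the restrictions to each Ri."""
    for lhs, rhs in fds:
        result = set(lhs)
        changed = True
        while changed:
            changed = False
            for ri in decomposition:
                gained = closure(result & ri, fds) & ri
                if not gained <= result:
                    result |= gained
                    changed = True
        if not rhs <= result:
            return False
    return True


F = [({"banker-name"}, {"branch-name"}),
     ({"branch-name", "customer-name"}, {"banker-name"})]
banker_branch = {"banker-name", "branch-name"}
customer_banker = {"customer-name", "banker-name"}
preserved = is_dependency_preserving(F, [banker_branch, customer_banker])
```

For the decomposition into Banker-branch-schema and Customer-banker-schema the check fails, in agreement with the argument above.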
Lossless join
2
...
Dependency preservation


Recall that lossless join is an essential condition for a decomposition, to avoid loss
of information
...
In Section 7
...

There are situations where there is more than one way to decompose a schema
into BCNF
...
For instance, suppose we have a relation schema R(A, B, C) with the
functional dependencies A → B and B → C
...
If we used the dependency A → B (or equivalently, A → C)
to decompose R, we would end up with two relations R1(A, B) and R2(A, C); the
dependency B → C would not be preserved
...
Clearly the decomposition into R1(A, B) and R2(B, C)
is preferable
...

7
...
For such schemas, we have two alternatives if we wish to
check if an update violates any functional dependencies:
• Pay the extra cost of computing joins to test for violations
...
Unlike BCNF, 3NF decompositions may contain some redundancy in the decomposed schema
...
Which of the two alternatives to choose is a design
decision to be made by the database designer on the basis of the application requirements
...
7
...
3NF relaxes this constraint slightly by allowing nontrivial functional dependencies whose left side is not a superkey
...

• α is a superkey for R
...

• Each attribute A in β − α is contained in a candidate key for R
...

The ﬁrst two alternatives are the same as the two alternatives in the deﬁnition of
BCNF
...
It represents, in some sense, a minimal relaxation of the
BCNF conditions that helps ensure that every schema has a dependency-preserving
decomposition into 3NF
...

Observe that any schema that satisﬁes BCNF also satisﬁes 3NF, since each of its
functional dependencies would satisfy one of the ﬁrst two alternatives
...

The deﬁnition of 3NF allows certain functional dependencies that are not allowed
in BCNF
...
1
Let us return to our Banker-schema example (Section 7
...
We have shown that this
relation schema does not have a dependency-preserving, lossless-join decomposition
into BCNF
...
To see that it is, we note
that {customer-name, branch-name} is a candidate key for Banker-schema, so the only
attribute not contained in a candidate key for Banker-schema is banker-name
...
Since {customer-name, branch-name}
is a candidate key, these dependencies do not violate the deﬁnition of 3NF
...
Also, we can decompose the dependencies in F so that their right-hand side consists of only single attributes, and use the
resultant set in place of F
...
If α is not a superkey, we
have to verify whether each attribute in β is contained in a candidate key of R; this
test is rather more expensive, since it involves ﬁnding candidate keys
...

7
...
2 Decomposition Algorithm
Figure 7
...
The set of dependencies Fc used in the algorithm is a canoni1
...
25)
...
The deﬁnition we use is equivalent but easier to
understand
...
Relational Databases

7
...
, i contains α β
then begin
i := i + 1;
Ri := α β;
end
if none of the schemas Rj , j = 1, 2,
...
, Ri )
Figure 7
...

cal cover for F
...
, i;
initially i = 0, and in this case the set is empty
...
14, consider the following extension to the
Banker-schema in Section 7
...
The functional dependencies for this relation schema are
banker-name → branch-name ofﬁce-number
customer-name branch-name → banker-name
The for loop in the algorithm causes us to include the following schemas in our
decomposition:
Banker-ofﬁce-schema = (banker-name, branch-name, ofﬁce-number)
Banker-schema = (customer-name, branch-name, banker-name)
Since Banker-schema contains a candidate key for Banker-info-schema, we are ﬁnished
with the decomposition process
...
It ensures that the decomposition
is a lossless-join decomposition by guaranteeing that at least one schema contains a
candidate key for the schema being decomposed
...
19 provides some insight
into the proof that this sufﬁces to guarantee a lossless join
...
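The synthesis steps for this example can be sketched in Python. The sketch assumes that the dependency set it receives is already a canonical cover, and the names synthesize_3nf and candidate_key are our own. A schema contains a candidate key for R exactly when its closure covers all of R, which is the test used in the second step.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) frozensets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_key(schema, fds):
    """Shrink the full attribute set to one candidate key (greedy removal)."""
    key = set(schema)
    for attr in sorted(schema):
        if closure(key - {attr}, fds) >= schema:
            key.discard(attr)
    return frozenset(key)


def synthesize_3nf(schema, fc):
    """3NF synthesis; fc is assumed to be a canonical cover."""
    schema = frozenset(schema)
    result = []
    for lhs, rhs in fc:
        if not any(lhs | rhs <= rj for rj in result):
            result.append(lhs | rhs)
    # Add a candidate key only if no schema already contains one
    if not any(closure(rj, fc) >= schema for rj in result):
        result.append(candidate_key(schema, fc))
    return result


banker_info = {"branch-name", "customer-name", "banker-name", "office-number"}
Fc = [(frozenset({"banker-name"}), frozenset({"branch-name", "office-number"})),
      (frozenset({"customer-name", "branch-name"}), frozenset({"banker-name"}))]
schemas = synthesize_3nf(banker_info, Fc)
```

For the two dependencies above the sketch yields the two schemas of the example and adds no further schema, since the second one already contains a candidate key.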
The result is not uniquely deﬁned, since a set of functional dependencies


7
...

If a relation Ri is in the decomposition generated by the synthesis algorithm, then
Ri is in 3NF
...
Therefore, to see that Ri is in
3NF, you must convince yourself that any functional dependency γ → B that holds
on Ri satisﬁes the deﬁnition of 3NF
...
Now, B must be in α or β, since B is in Ri and
α → β generated Ri
...
In this case, the dependency α → β would not have been
in Fc since B would be extraneous in β
...

• B is in β but not α
...
The second condition of 3NF is satisﬁed
...
Then α must contain some attribute not in γ
...
The derivation could not have used α → β —
if it had been used, α must be contained in the attribute closure of γ,
which is not possible, since we assumed γ is not a superkey
...
This would imply that B
is extraneous in the right-hand side of α → β, which is not possible since
α → β is in the canonical cover Fc
...

• B is in α but not β
...

Interestingly, the algorithm we described for decomposition into 3NF can be implemented in polynomial time, even though testing a given relation to see if it satisﬁes
3NF is NP-hard
...
7
...
Nevertheless, there are
disadvantages to 3NF: If we do not eliminate all transitive dependencies, we may have to use null values to represent some of the possible meaningful
relationships among data items, and there is the problem of repetition of information
...
Since banker-name → branch-name, we may
want to represent relationships between values for banker-name and values for branch-name in our database
...


[Figure: an instance of Banker-schema. Only the customer-name column survives in this extract: Jones, Smith, Hayes, Jackson, Curry, Turner.]
Figure 7
...

As an illustration of the repetition of information problem, consider the instance
of Banker-schema in Figure 7
...
Notice that the information indicating that Johnson
is working at the Perryridge branch is repeated
...
BCNF
2
...
Dependency preservation
Since it is not always possible to satisfy all three, we may be forced to choose between
BCNF and dependency preservation with 3NF
...
It is possible, although a little complicated, to write assertions
that enforce a functional dependency (see Exercise 7
...
Thus even if we had
a dependency-preserving decomposition, if we use standard SQL we would not be
able to efﬁciently test a functional dependency whose left-hand side is not a key
...
Given a BCNF decomposition that is not dependency preserving, we consider each dependency in a minimum cover Fc that is
not preserved in the decomposition
...
The functional dependency can be easily tested on the materialized view, by means of a constraint unique (α)
...
(Later in the
book, in Section 14
...
)
Thus, in case we are not able to get a dependency-preserving BCNF decomposition,
it is generally preferable to opt for BCNF, and use techniques such as materialized
views to reduce the cost of checking functional dependencies
...

7
...
Consider again our banking example
...
However, assume that our bank is attracting wealthy customers who have several addresses (say, a winter home and a summer home)
...
If we
remove this functional dependency, we ﬁnd BC-schema to be in BCNF with respect to
our modiﬁed set of functional dependencies
...

To deal with this problem, we must deﬁne a new form of constraint, called a multivalued dependency
...
This normal form, called
fourth normal form (4NF), is more restrictive than BCNF
...

7
...
1 Multivalued Dependencies
Functional dependencies rule out certain tuples from being in a relation
...
Multivalued dependencies, on the other hand, do not rule out the existence of certain
tuples
...
For this reason, functional dependencies sometimes are referred to as equality-generating dependencies, and multivalued dependencies are referred to as tuple-generating dependencies
...
The multivalued dependency
α →→ β
holds on R if, in any legal relation r(R), for all pairs of tuples t1 and t2 in r such that
t1 [α] = t2 [α], there exist tuples t3 and t4 in r such that
t1 [α] = t2 [α] = t3 [α] = t4 [α]
t3 [β] = t1 [β]
t3 [R − β] = t2 [R − β]
t4 [β] = t2 [β]
t4 [R − β] = t1 [R − β]
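The definition can be checked directly on a relation instance. The following Python sketch is illustrative only: the function name satisfies_mvd is ours, and the sample tuples (including the loan numbers) are hypothetical values for a relation on (loan-number, customer-name, customer-street, customer-city). Because the loop visits every ordered pair (t1, t2), checking for t3 alone suffices; the tuple t4 for a pair is the t3 of the reversed pair.

```python
def satisfies_mvd(rows, attrs, alpha, beta):
    """Check alpha ->-> beta on rows, a list of dicts keyed by attrs."""
    rest = [a for a in attrs if a not in beta]          # R - beta
    present = {tuple(r[a] for a in attrs) for r in rows}
    for t1 in rows:
        for t2 in rows:
            if all(t1[a] == t2[a] for a in alpha):
                # t3 agrees with t1 on beta and with t2 on R - beta
                t3 = {a: t2[a] for a in rest}
                t3.update({a: t1[a] for a in beta})
                if tuple(t3[a] for a in attrs) not in present:
                    return False
    return True


attrs = ["loan-number", "customer-name", "customer-street", "customer-city"]


def row(*values):
    return dict(zip(attrs, values))


# Hypothetical instances
illegal = [row("L-23", "Smith", "North", "Rye"),
           row("L-27", "Smith", "Main", "Manchester")]
legal = illegal + [row("L-23", "Smith", "Main", "Manchester"),
                   row("L-27", "Smith", "North", "Rye")]

beta = ["customer-street", "customer-city"]
illegal_ok = satisfies_mvd(illegal, attrs, ["customer-name"], beta)
legal_ok = satisfies_mvd(legal, attrs, ["customer-name"], beta)
```

The two-tuple instance fails the check, and adding the two missing tuples makes it pass, which illustrates why multivalued dependencies are called tuple-generating.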


        α            β                R − α − β
t1      a1 ... ai    ai+1 ... aj      aj+1 ... an
t2      a1 ... ai    bi+1 ... bj      bj+1 ... bn
t3      a1 ... ai    ai+1 ... aj      bj+1 ... bn
t4      a1 ... ai    bi+1 ... bj      aj+1 ... an

Figure 7
...

This deﬁnition is less complicated than it appears to be
...
16 gives a tabular picture of t1, t2, t3, and t4
...
If the multivalued dependency α →→ β is satisfied by all relations on schema
R, then α →→ β is a trivial multivalued dependency on schema R
...

To illustrate the difference between functional and multivalued dependencies, we
consider the BC-schema again, and the relation bc (BC-schema) of Figure 7
...
We must
repeat the loan number once for each address a customer has, and we must repeat
the address for each loan a customer has
...
If a customer (say, Smith) has a loan (say, loan
number L-23), we want that loan to be associated with all Smith’s addresses
...
18 is illegal
...
18
...
(The multivalued dependency customer-name →→ loan-number will do as well
...
)
As with functional dependencies, we shall use multivalued dependencies in two
ways:
1
...
To specify constraints on the set of legal relations; we shall thus concern ourselves with only those relations that satisfy a given set of functional and multivalued dependencies
loan-number
L-23
L-23
L-93
Figure 7
...


7
...
18

Fourth Normal Form

customer-street
North
Main

291

customer-city
Rye
Manchester

An illegal bc relation
...

Let D denote a set of functional and multivalued dependencies
...

As we did for functional dependencies, we can compute D+ from D, using the formal
deﬁnitions of functional dependencies and multivalued dependencies
...
Luckily, multivalued dependencies that occur in practice appear to be quite simple
...
(Section C
...
1 of the appendix outlines a system of inference rules for
multivalued dependencies
...

→
In other words, every functional dependency is also a multivalued dependency
...
8
...
We saw in the opening paragraphs of Section 7
...
We shall see that we can use the given multivalued dependency to improve the database design, by decomposing BC-schema into a fourth
normal form decomposition
...

→
• α is a superkey for schema R
...

Note that the deﬁnition of 4NF differs from the deﬁnition of BCNF in only the use
of multivalued dependencies instead of functional dependencies
...
To see this fact, we note that, if a schema R is not in BCNF, then there is


result := {R};
done := false;
compute D+ ; Given schema Ri , let Di denote the restriction of D+ to Ri
while (not done) do
if (there is a schema Ri in result that is not in 4NF w
...
t
...
19

4NF decomposition algorithm
...

Since α → β implies α →→ β, R cannot be in 4NF
...
, Rn be a decomposition of R
...
Recall that, for a set F of functional
dependencies, the restriction Fi of F to Ri is all functional dependencies in F + that
include only attributes of Ri
...
The restriction of D to Ri is the set Di consisting of
1
...
All multivalued dependencies of the form
α →→ β ∩ Ri
where α ⊆ Ri and α →→ β is in D+
...
8
...
Figure 7
...
It is identical
to the BCNF decomposition algorithm of Figure 7
...

If we apply the algorithm of Figure 7
...
Following the algorithm, we replace BC-schema by two
schemas:
Borrower-schema = (customer-name, loan-number)
Customer-schema = (customer-name, customer-street, customer-city)
...
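To see that replacing BC-schema by these two schemas loses no information when the multivalued dependency holds, we can project a sample relation onto both schemas and rejoin the projections. The Python sketch below is illustrative: the helper names are ours and the sample tuples are hypothetical values.

```python
def project(rows, attrs):
    """Projection onto attrs, with duplicate elimination."""
    seen = {tuple(r[a] for a in attrs) for r in rows}
    return [dict(zip(attrs, t)) for t in sorted(seen)]


def natural_join(r, s):
    """Natural join of two relations given as lists of dicts."""
    out = []
    for x in r:
        for y in s:
            common = x.keys() & y.keys()
            if all(x[a] == y[a] for a in common):
                out.append({**x, **y})
    return out


def as_set(rows):
    """Order-independent form of a relation, for comparison."""
    return {tuple(sorted(r.items())) for r in rows}


attrs = ["loan-number", "customer-name", "customer-street", "customer-city"]
borrower_attrs = ["customer-name", "loan-number"]
customer_attrs = ["customer-name", "customer-street", "customer-city"]

# Hypothetical instance that satisfies the multivalued dependency
bc = [dict(zip(attrs, t)) for t in [
    ("L-23", "Smith", "North", "Rye"),
    ("L-23", "Smith", "Main", "Manchester"),
    ("L-93", "Curry", "Lake", "Horseneck"),
]]
rejoined = natural_join(project(bc, borrower_attrs), project(bc, customer_attrs))
lossless = as_set(rejoined) == as_set(bc)
```

If the instance violated the multivalued dependency, the rejoined relation would contain tuples that were not in the original, so the same decomposition would not reproduce it.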


7
...
The following fact about multivalued dependencies and lossless
joins shows that the algorithm of Figure 7
...
Let R1 and R2 form a decomposition of R
...
5
...
The preceding fact about multivalued dependencies is a more general statement about lossless joins
...

→
→
The issue of dependency preservation when we decompose a relation becomes
more complicated in the presence of multivalued dependencies
...
1
...

7
...
As we saw earlier, multivalued dependencies help us understand and tackle some forms of repetition of information that cannot be understood in terms of functional dependencies
...
There is a class of even
more general constraints, which leads to a normal form called domain-key normal
form
...
Hence PJNF and domain-key normal form
are used quite rarely
...

Conspicuous by its absence from our discussion of normal forms is second normal form (2NF)
...
We
simply deﬁne it, and let you experiment with it in Exercise 7
...

7
...
In
this section we study how normalization ﬁts into the overall database design process
...
4, we assumed that a relation schema
R is given, and proceeded to normalize it
...
Relational Databases

7
...
R could have been generated when converting an E-R diagram to a set of tables
...
R could have been a single relation containing all attributes that are of interest
...

3
...

In the rest of this section we examine the implications of these approaches
...

7
...
1 E-R Model and Normalization
When we carefully deﬁne an E-R diagram, identifying all entities correctly, the tables
generated from the E-R diagram should not need further normalization
...
For instance,
suppose an employee entity had attributes department-number and department-address,
and there is a functional dependency department-number → department-address
...

Most examples of such dependencies arise out of poor E-R diagram design
...
Similarly, a relationship involving more than two entities may not be in a
desirable normal form
...
(In fact, some E-R diagram variants actually make it difﬁcult or impossible to
specify nonbinary relations
...
If the generated relations are not in desired normal form, the problem can be ﬁxed in the E-R diagram
...
Alternatively,
normalization can be left to the designer’s intuition during E-R modeling, and can be
done formally on the relations generated from the E-R model
...
10
...
One of our goals in choosing a
decomposition was that it be a lossless-join decomposition
...

Consider the database of Figure 7
...
The ﬁgure depicts a situation in which we have not yet determined the amount
of loan L-58, but wish to record the remainder of the data on the loan
...
In other words, there is no loan-info relation corresponding to the relations
of Figure 7
...
Tuples that disappear when we compute the join are dangling tuples
(see Section 6
...
1)
...
, rn (Rn ) be a set of relations
...

branch-name
Round Hill

loan-number
L-58

loan-number


amount

loan-number
L-58
Figure 7
...

tuple t of relation ri is a dangling tuple if t is not in the relation
ΠRi (r1 ⋈ r2 ⋈ ··· ⋈ rn)
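The definition translates directly into a small Python sketch. The helper names are ours, and the sample data are illustrative: a Round Hill branch with loan L-58 whose amount is not yet recorded, plus a hypothetical customer name.

```python
from functools import reduce


def natural_join(r, s):
    """Natural join of two relations given as lists of dicts."""
    out = []
    for x in r:
        for y in s:
            common = x.keys() & y.keys()
            if all(x[a] == y[a] for a in common):
                out.append({**x, **y})
    return out


def dangling_tuples(relations):
    """Tuples of any ri that do not appear in the projection of the full join."""
    joined = reduce(natural_join, relations)
    result = []
    for rel in relations:
        for t in rel:
            # t survives if some joined tuple agrees with it on all of t's attributes
            if not any(all(j[a] == v for a, v in t.items()) for j in joined):
                result.append(t)
    return result


branch_loan = [{"branch-name": "Round Hill", "loan-number": "L-58"}]
loan_amount = []                                   # amount of L-58 not yet known
loan_customer = [{"loan-number": "L-58", "customer-name": "Johnson"}]  # hypothetical name

dangling = dangling_tuples([branch_loan, loan_amount, loan_customer])
```

Because the relation holding amounts is empty, the join is empty and both stored tuples are dangling; once an amount for L-58 is recorded, neither is.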

Dangling tuples may occur in practical database applications
...
The relation r1 ⋈ r2 ⋈ · · · ⋈ rn is
called a universal relation, since it involves all the attributes in the universe deﬁned
by R1 ∪ R2 ∪ · · · ∪ Rn
...
20
is to include null values in the universal relation
...
Because of them, it may be better to view the relations
of the decomposed design as representing the database, rather than as the universal relation whose schema we decomposed during the normalization process
...
)
Note that we cannot enter all incomplete information into the database of Figure 7
...
For example, we cannot enter a loan number
unless we know at least one of the following:
• The customer name
• The branch name
• The amount of the loan
Thus, a particular decomposition deﬁnes a restricted form of incomplete information
that is acceptable in our database
...
Returning again to the
example of Figure 7
...
” This is
because
loan-number → customer-name amount
and therefore the only way that we can relate customer-name and amount is through
loan-number
...


In other words, we do not want to store data for which the key attributes are unknown
...
Thus, our normal forms allow
representation of acceptable incomplete information via dangling tuples, while prohibiting the storage of undesirable incomplete information
...
We cannot use name to refer
to both customer-name and to branch-name
...
Nevertheless, if we deﬁned our relation schemas directly,
rather than in terms of a universal relation, we could obtain relations on schemas
such as the following for our banking example:
branch-loan (name, number)
loan-customer (number, name)
amt (number, amount)
Observe that, with the preceding relations, expressions such as branch-loan ⋈ loan-customer are meaningless
...

In a language such as SQL, however, a query involving branch-loan and loan-customer must remove ambiguity in references to name by preﬁxing the relation name
...

We believe that using the unique-role assumption (that each attribute name has
a unique meaning in the database) is generally preferable to reusing the same
name in multiple roles
...

7
...
3 Denormalization for Performance
Occasionally database designers choose a schema that has redundant information;
that is, it is not normalized
...
The penalty paid for not using a normalized schema is the extra
work (in terms of coding time and execution time) to keep redundant data consistent
...
In our
normalized schema, this requires a join of account with depositor
...
This makes displaying the account information
faster
...
The process of taking a normalized schema and
making it non-normalized is called denormalization, and designers use it to tune
performance of systems to support time-critical operations
...

A better alternative, supported by many database systems today, is to use the normalized schema, and additionally store the join of account and depositor as a materialized view
...
)
Like denormalization, using a materialized view does have space and time overheads;
however, it has the advantage that keeping the view up to date is the job of the
database system, not the application programmer
...
10
...
We give examples here; obviously, such
designs should be avoided
...
A relation earnings(company-id, year, amount) could be used to store the
earnings information
...

An alternative design is to use multiple relations, each storing the earnings for a
different year
...
The only functional dependency here on each
relation would be company-id → earnings, so these relations are also in BCNF
...
Queries would also be more complicated since
they may have to refer to many relations
...
Here the only functional
dependencies are from company-id to the other attributes, and again the relation is
in BCNF
...
Queries would also be more complicated, since they may have to refer to
many attributes
...
While such representations are useful for display to users, for the reasons just given, they are not desirable in a database design
...

7
...
The pitfalls included repeated information and inability to represent some information
...

7
...
We laid special emphasis on what dependencies are logically implied by a set of dependencies
...

• We introduced the concept of decomposition, and showed that decompositions must be lossless-join decompositions, and should preferably be dependency preserving
...

• We then presented Boyce – Codd Normal Form (BCNF); relations in BCNF are
free from the pitfalls outlined earlier
...
There are relations for which there is no dependency-preserving BCNF decomposition
...
Relations in 3NF may have some redundancy, but there is always a dependency-preserving decomposition into
3NF
...
We deﬁned fourth normal form (4NF) with multivalued dependencies
...
1
...

• Other normal forms, such as PJNF and DKNF, eliminate more subtle forms
of redundancy
...

Appendix C gives details on these normal forms
...
That is one of the primary
advantages of the relational model compared with the other data models that
we have studied
...

7
...
1 Explain what is meant by repetition of information and inability to represent information
...

7
...
3 Why are certain functional dependencies called trivial functional dependencies?
7
...
21
...
21

B
b1
b1
b1
b1

C
c1
c2
c1
c3

Relation of Exercise 7
...


7
...

7
...

• A many-to-one relationship set exists between entity sets account and customer
...
7 Consider the following proposed rule for functional dependencies: If α → β and
γ → β, then α → γ
...

7
...
(Hint: Use the
augmentation rule to show that, if α → β, then α → αβ
...
)
7
...

7
...

7
...

A → BC
CD → E
B→D
E→A
List the candidate keys for R
...
12 Using the functional dependencies of Exercise 7
...

7
...
11, compute the canonical
cover Fc
...
14 Consider the algorithm in Figure 7
...
Show that this algorithm
is more efﬁcient than the one presented in Figure 7
...
3
...

7
...
Also write an SQL assertion that enforces the functional dependency
...

7
...
2 is not a
lossless-join decomposition:
(A, B, C)
(C, D, E)
Hint: Give an example of a relation r on schema R such that
ΠA, B, C (r) ⋈ ΠC, D, E (r) ≠ r


result := ∅;
/* fdcount is an array whose ith element contains the number
   of attributes on the left side of the ith FD that are
   not yet known to be in α+ */
for i := 1 to |F| do
    begin
        let β → γ denote the ith FD;
        fdcount [i] := |β|;
    end
/* appears is an array with one entry for each attribute
...
Each integer
   i on the list indicates that A appears on the left side
   of the ith FD */
for each attribute A do
    begin
        appears [A] := NIL;
        for i := 1 to |F| do
            begin
                let β → γ denote the ith FD;
                if A ∈ β then add i to appears [A];
            end
    end
addin (α);
return (result);
procedure addin (α);
for each attribute A in α do
    begin
        if A ∉ result then
            begin
                result := result ∪ {A};
                for each element i of appears [A] do
                    begin
                        fdcount [i] := fdcount [i] − 1;
                        if fdcount [i] = 0 then
                            begin
                                let β → γ denote the ith FD;
                                addin (γ);
                            end
                    end
            end
    end
Figure 7
...
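The algorithm in the figure can be sketched in Python as follows. The function name closure_linear is ours, and the sketch replaces the recursive procedure addin with an explicit stack of pending attributes, which leaves the result unchanged. Dependencies are written as pairs of strings, one character per attribute, and the sample set is A → BC, CD → E, B → D, E → A.

```python
def closure_linear(alpha, fds):
    """Attribute closure using per-dependency counters, as in the figure."""
    result = set()
    # fdcount[i]: left-side attributes of the ith FD not yet known to be in the closure
    fdcount = [len(lhs) for lhs, _ in fds]
    # appears[A]: indices of the FDs whose left side contains A
    appears = {}
    for i, (lhs, _) in enumerate(fds):
        for attr in lhs:
            appears.setdefault(attr, []).append(i)
    pending = list(alpha)              # explicit stack in place of recursion
    while pending:
        attr = pending.pop()
        if attr in result:
            continue
        result.add(attr)
        for i in appears.get(attr, []):
            fdcount[i] -= 1
            if fdcount[i] == 0:        # whole left side is now in the closure
                pending.extend(fds[i][1])
    return result


F = [("A", "BC"), ("CD", "E"), ("B", "D"), ("E", "A")]
a_plus = closure_linear("A", F)
b_plus = closure_linear("B", F)
```

Each attribute is added to the result at most once, and each counter is decremented at most once per left-side attribute, so the work is proportional to the total size of the dependency set.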


7
...
, Rn be a decomposition of schema U
...
Show that
u ⊆ r1 ⋈ r2 ⋈ ··· ⋈ rn

7
...
2 is not a dependency-preserving
decomposition
...
19 Show that it is possible to ensure that a dependency-preserving decomposition into 3NF is a lossless-join decomposition by guaranteeing that at least one
schema contains a candidate key for the schema being decomposed
...
)
7
...

7
...
2
...
22 Give an example of a relation schema R and set F of functional dependencies
such that there are at least three distinct lossless-join decompositions of R into
BCNF
...
23 In designing a relational database, why might we choose a non-BCNF design?
7
...
2
...
25 Let a prime attribute be one that appears in at least one candidate key
...
Let A be
an attribute that is not in α, is not in β, and for which β → A holds
...
We can restate our deﬁnition of 3NF as follows:
A relation schema R is in 3NF with respect to a set F of functional dependencies
if there are no nonprime attributes A in R for which A is transitively dependent
on a key for R
...

7
...
We say that β is partially dependent on α
...

• It is not partially dependent on a candidate key
...
(Hint: Show that every partial dependency is a transitive dependency
...
27 Given the three goals of relational-database design, is there any reason to design
a database schema that is in 2NF, but is in no higher-order normal form? (See
Exercise 7
...
)


Bibliographical Notes

303

7
...

7
...

7
...
Explain problems that they may cause
...
In that paper, Codd also introduced functional dependencies, and
ﬁrst, second, and third normal forms
...
Ullman [1988] is an
easily accessible source of proofs of soundness and completeness of Armstrong’s axioms
...
Maier [1983] discusses the theory of functional dependencies
...
[1986] discusses formal aspects of the concept of a
legal relation
...
The desirability of BCNF is discussed in
Bernstein et al
...
A polynomial-time algorithm for BCNF decomposition appears in Tsou and Fischer [1982], and can also be found in Ullman [1988]
...

[1979] gives the algorithm we used to ﬁnd a lossless-join dependency-preserving decomposition into 3NF
...
[1979a]
...
Beeri et al
...
Our axiomatization is based on theirs
...

Maier [1983] presents the design theory of relational databases in detail
...
[1995] present a more theoretic coverage of many of the
dependencies and normal forms presented here
...


P A

III
...
As a result, researchers have developed several data models to
deal with these application domains
...
In addition, we study XML, a language
that can represent data that is less structured than that of the other data models
...
Inheritance,
object-identity, and encapsulation (information hiding), with methods to provide an
interface to objects, are among the key concepts of object-oriented programming that
have found applications in data modeling
...
While inheritance
and, to some extent, complex types are also present in the E-R model, encapsulation
and object-identity distinguish the object-oriented data model from the E-R model
...
This model provides the rich type system of
object-oriented databases, combined with relations as the basis for storage of data
...
The object-relational data model
provides a smooth migration path from relational databases, which is attractive to
relational database vendors
...

The XML language was initially designed as a way of adding markup information to text documents, but has become important because of its applications in data
exchange
...
Chapter 10 describes the XML language, and
then presents different ways of expressing queries on data represented in XML, and
transforming XML data from one form to another
...
Object−Based
Databases and XML

R T

8
...
Speciﬁcally, three widely used database systems— IBM
DB2, Oracle, and Microsoft SQL Server — are covered in Chapters 25, 26, and 27
...

Each of these chapters highlights unique features of each database system: tools,
SQL variations and extensions, and system architecture, including storage organization, query processing, concurrency control and recovery, and replication
...
Furthermore, since products are enhanced regularly, details of the product may change
...

Keep in mind that the chapters in this part use industrial rather than academic
terminology
...


C H A P T E R

© The McGraw−Hill
Companies, 2001

2 5

Oracle
Hakan Jakobsson
Oracle Corporation

When Oracle was founded in 1977 as Software Development Laboratories by Larry
Ellison, Bob Miner, and Ed Oates, there were no commercial relational database products
...
Since then, Oracle has held a leading position in the relational database market, but over the years its product and service offerings have grown beyond the relational database server
...

In addition to database-related servers and tools, the company also offers application software for enterprise resource planning and customer-relationship management, including areas such as ﬁnancials, human resources, manufacturing, marketing, sales, and supply chain management
...

This chapter surveys a subset of the features, options, and functionality of Oracle
products
...
The feature set described here is based on the
ﬁrst release of Oracle9i
...
1 Database Design and Querying Tools
Oracle provides a variety of tools for database design, querying, report generation
and data analysis, including OLAP
...
1
...

This is a suite of tools for various aspects of application development, including tools
for forms development, data modeling, reporting, and querying
...
10) for development modeling
...
The suite also supports XML for data exchange with other UML tools
...
It supports such modeling techniques as E-R diagrams, information
engineering, and object analysis and design
...

The metadata can then be used to generate forms and reports
...

The suite also contains application development tools for generating forms, reports, and tools for various aspects of Java and XML-based development
...

Oracle also has an application development tool for data warehousing, Oracle
Warehouse Builder
...
Oracle Warehouse Builder
supports both 3NF and star schemas and can also import designs from Oracle Designer
...
1
...

Oracle Discoverer is a Web-based, ad hoc query, reporting, analysis and Web publishing tool for end users and data analysts
...
Discoverer has wizards to help end
users visualize data as graphs
...
Discoverer’s ad hoc query
interface can generate SQL that takes advantage of this functionality and can provide end users with rich analytical functionality
...

Oracle Express Server is a multidimensional database server
...

With the introduction of OLAP services in Oracle9i, Oracle is moving away from
supporting a separate storage engine and moving most of the calculations into SQL
...
The model also provides
a Java OLAP application programmer interface
...

• A common security model can be used for the analytical applications and the
data warehouse
...

• The relational database management system has a larger set of features and
functionality in many areas such as high availability, backup and recovery,
and third-party tool support
...

The main challenge with moving away from a separate multidimensional database
engine is to provide the same performance
...
Oracle has approached this problem in two
ways
...

• Oracle has extended materialized views to permit analytical functions, in particular grouping sets
...

25
...
In addition, Oracle supports a large number of
other language constructs, some of which conform with SQL:1999, while others are
Oracle-speciﬁc in syntax or functionality
...
2, including ranking, moving aggregation, cube,
and rollup
...
It is an Oracle-speciﬁc syntax for
a feature that Oracle has had since the 1980s
...
The upsert operation combines update and insert, and is useful for merging new data with old data in data warehousing
applications
...
Multitable inserts allow multiple
tables to be updated based on a single scan of new data
...
8
...

25
...
1 Object-Relational Features
Oracle has extensive support for object-relational constructs, including:
• Object types
...

• Collection types
...

• Object tables
...

• Table functions
...
Table functions in Oracle can be
nested
...

• Object views
...
They allow data to be accessed or viewed in an object-oriented style even if the data are really stored in a traditional relational format
...
These can be written in PL/SQL, Java, or C
...
These can be used in SQL statements in the
same way as built-in functions such as sum and count
...
These can be used to store and index XML documents
...
PL/SQL was Oracle’s
original language for stored procedures and it has syntax similar to that used in the
Ada language
...
Oracle provides a package to encapsulate related procedures, functions, and

...
Oracle supports SQLJ (SQL embedded in Java) and JDBC,
and provides a tool to generate Java class deﬁnitions corresponding to user-deﬁned
database types
...
2
...
(See Section 6
...
) Triggers can be
written in PL/SQL or Java or as C callouts
...
Row triggers execute once for
every row that is affected (updated or deleted, for example) by the DML operation
...
In each case, the trigger can
be deﬁned as either a before or after trigger, depending on whether it is to be invoked
before or after the DML operation is carried out
...
Depending on the view deﬁnition, it may not be possible for Oracle to translate a DML statement on a view to modiﬁcations of the underlying base
tables unambiguously
...
A user can create an instead of trigger on a view to specify manually what
operations on the base tables are to occur in response to the DML operation on the
view
...

Oracle also has triggers that execute on a variety of other events, like database
startup or shutdown, server error messages, user logon or logoff, and DDL statements
such as create, alter and drop statements
...
3 Storage and Indexing
In Oracle parlance, a database consists of information stored in ﬁles and is accessed
through an instance, which is a shared memory area and a set of processes that interact with the data in the ﬁles
...
3
...
Each
table space, in turn, consists of one or more physical structures called data ﬁles
...

Usually, an Oracle database will have the following table spaces:
• The system table space, which is always created
...

• Table spaces created to store user data
...
Usually, the decision about what other table spaces should be created is based on performance, availability, maintainability, and ease of administration
...

• Temporary table spaces
...
Temporary table spaces are allocated for sorting,
to make the space management operations involved in spilling to disk more
efﬁcient
...
For
example, it is common to move data from a transactional system to a data warehouse
at regular intervals
...
These operations can be much faster than unloading the data from one database and then using a loader to insert it into the other
...

25
...
2 Segments
The space in a table space is divided into units, called segments, that each contain
data for a speciﬁc data structure
...

• Data segments
...
(Partitioning in Oracle is described in Section 25
...
10
...
Each index in a table space has its own index segment, except
for partitioned indices, which have one index segment per partition
...
These are segments used when a sort operation needs
to write data to disk or when data are inserted into a temporary table
...
These segments contain undo information so that an uncommitted transaction can be rolled back
...
5
...
5
...

Below the level of segment, space is allocated at a level of granularity called extent
...
A database block is the
lowest level of granularity at which Oracle performs disk I/O
...

Oracle provides storage parameters that allow for detailed control of how space is
allocated and managed, parameters such as:
• The size of a new extent that is to be allocated to provide room for rows that
are inserted into a table
...
• The percentage of space utilization at which a database block is considered full
and at which no more rows will be inserted into that block
...
)

25
...
3 Tables
A standard table in Oracle is heap organized; that is, the storage location of a row in
a table is not based on the values contained in the row, and is ﬁxed when the row
is inserted
...
There are several features and variations
...
The nested table is not stored in line in the parent table, but is stored
in a separate table
...
The data are private to the
session and are automatically removed at the end of its duration
...
7)
...
In a cluster, rows from different tables are stored together in the same block on the basis of some common
columns
...
The primary key/foreign
key values are used to determine the storage location
...
As a tradeoff, a query involving only the department table may have
to involve a substantially larger number of blocks than if that table had been stored
on its own
...

Therefore, an index on the clustering column is mandatory
...
Here, Oracle computes the location of a row by applying a hash
function to the value for the cluster column
...
Since no index traversal is needed to access a row
according to its cluster column value, this organization can save signiﬁcant amounts
of disk I/O
...

Both the hash cluster and regular cluster organization can be applied to a single
table
...

25
...
4 Index-Organized Tables
In an index-organized table, records are stored in an Oracle B-tree index instead of in a
heap
...
While an entry in a regular index contains the key value and row-id of the
indexed row, an index-organized table replaces the row-id with the column values
for the remaining columns of the row
...
Consider looking up all the column values
of a row, given its primary key value
...
For an index-organized table, only the
index probe is necessary
...
In a heap table, each row has a ﬁxed row-id
that does not change
...
Hence, a secondary index on an index-organized table contains not normal row-ids, but logical row-ids instead
...
The physical row-id is referred to as a “guess” since it could be incorrect if the row has been
moved
...
However, if a table is
highly volatile and a large percentage of the guesses are likely to be wrong, it can be
better to create the secondary index with only key values, since using an incorrect
guess may result in a wasted disk I/O
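
The role of the "guess" can be illustrated with a small Python sketch. It is a toy model under stated assumptions, not Oracle's implementation: the class IndexOrganizedTable, its block dictionary, and the method names are all hypothetical, and a plain dictionary stands in for the primary-key B-tree probe. A lookup first tries the guessed block; if the row has moved, it falls back to the primary key.

```python
class IndexOrganizedTable:
    """Toy model (hypothetical): rows live in numbered blocks and can move."""

    def __init__(self):
        self.blocks = {}    # block number -> {primary key: row}
        self.location = {}  # primary key -> block number (stands in for the B-tree probe)

    def insert(self, pk, row, block):
        self.blocks.setdefault(block, {})[pk] = row
        self.location[pk] = block

    def move(self, pk, new_block):
        # A row in an index-organized table may move, e.g. after a leaf split.
        row = self.blocks[self.location[pk]].pop(pk)
        self.insert(pk, row, new_block)

    def fetch_by_logical_rowid(self, pk, guess):
        """Return (row, guess_was_correct)."""
        if guess is not None and pk in self.blocks.get(guess, {}):
            return self.blocks[guess][pk], True
        # Wrong or missing guess: fall back to a lookup by primary key.
        return self.blocks[self.location[pk]][pk], False


table = IndexOrganizedTable()
table.insert(7, {"name": "Ann"}, block=1)

# Secondary index entry: key value -> logical row-id (primary key, guessed block).
secondary = {"Ann": (7, 1)}

pk, guess = secondary["Ann"]
row_before, hit_before = table.fetch_by_logical_rowid(pk, guess)

table.move(7, new_block=2)  # the stored guess is now stale
row_after, hit_after = table.fetch_by_logical_rowid(pk, guess)
```

In this sketch the stale guess costs one wasted block lookup before the primary-key fallback, which is the trade-off the text describes for highly volatile tables.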
...
3
...
The most commonly used type is a
B-tree index, created on one or multiple columns
...
) Index entries have the following format: For an index
on columns col1, col2, and col3, each row in the table where at least one of the columns
has a nonnull value would result in the index entry
<col1><col2><col3><row-id>
where <coli> denotes the value for column i and <row-id> is the row-id for
the row
...
For
example, if there are many repeated combinations of <col1><col2> values, the
representation of each distinct <col1><col2> prefix can be shared between the
entries that have that combination of values, rather than stored explicitly for each
such entry
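
The idea of sharing a prefix can be sketched in Python as follows. This is only an illustration of the principle; the function compress_prefix and the sample entries are hypothetical, and the on-disk layout Oracle actually uses is not modeled.

```python
from itertools import groupby


def compress_prefix(entries, prefix_len):
    """Store each distinct prefix once, followed by the suffixes that share it."""
    compressed = []
    for prefix, group in groupby(sorted(entries), key=lambda e: e[:prefix_len]):
        suffixes = [entry[prefix_len:] for entry in group]
        compressed.append((prefix, suffixes))
    return compressed


# Hypothetical index entries of the form (col1, col2, col3, row-id).
entries = [
    ("US", "CA", 1, "r1"),
    ("US", "CA", 2, "r2"),
    ("US", "NY", 1, "r3"),
    ("US", "CA", 3, "r4"),
]

compressed = compress_prefix(entries, prefix_len=2)
stored_prefixes = len(compressed)  # prefixes stored after compression
```

Four uncompressed entries repeat the (col1, col2) pair four times; the compressed form stores each distinct pair once.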
...

25
...
3
...
9
...
Bitmap indices
in Oracle use the same kind of B-tree structure to store the entries as a regular index
...
The number of such possible rows in a block depends
on how many rows can ﬁt into a block, which is a function of the number of columns
in the table and their data types
...
If the column value of that row is that of the index entry, the bit is set
to 1
...
(It is possible that the row does not actually exist because a table
block may well have a smaller number of rows than the number that was calculated
as the maximum possible
...

The compression algorithm is a variation of a compression technique called Byte-Aligned Bitmap Compression (BBC)
...

If the distance between two ones is sufﬁciently large — that is, there is a sufﬁcient
number of adjacent zeros between them — a runlength of zeros, that is the number of
zeros, is stored
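
A much simplified Python sketch of storing run lengths of zeros is given below. It is not the BBC algorithm and not Oracle's format: it always stores run lengths, ignores byte alignment, and the function names are hypothetical. It only shows why long runs of zeros compress well.

```python
def encode_zero_runs(bits):
    """Encode a 0/1 list as the zero-run length before each 1, plus trailing zeros."""
    gaps = []
    run = 0
    for bit in bits:
        if bit:
            gaps.append(run)
            run = 0
        else:
            run += 1
    return gaps, run


def decode_zero_runs(gaps, trailing_zeros):
    """Inverse of encode_zero_runs."""
    bits = []
    for gap in gaps:
        bits.extend([0] * gap)
        bits.append(1)
    bits.extend([0] * trailing_zeros)
    return bits


# A sparse bitmap: 150 bits, only three of them set.
sparse = [0] * 40 + [1] + [0] * 100 + [1, 1] + [0] * 7
gaps, tail = encode_zero_runs(sparse)
restored = decode_zero_runs(gaps, tail)
```

Here 150 bits are represented by four small integers, and decoding restores the original bitmap exactly.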
...
For example, for the condition
(col1 = 1 or col1 = 2) and col2 > 5 and col3 <> 10
Oracle would be able to calculate which rows match the condition by performing
Boolean operations on bitmaps from indices on the three columns
...

• For the index on col2 , all the bitmaps for key values > 5 would be merged in
an operation that corresponds to a logical or
...
Then, a Boolean and would be performed on the results from the ﬁrst
two indices, followed by two Boolean minuses of the bitmaps for values 10
and null for col3
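
The sequence of Boolean operations for this condition can be traced with a small Python sketch in which an integer serves as a bitmap (bit i stands for row i). The table contents and the helper build_bitmap_index are hypothetical, and the sketch ignores compression and the B-tree that holds the bitmaps.

```python
# Hypothetical table; row i corresponds to bit i of every bitmap.
rows = [
    {"col1": 1, "col2": 7, "col3": 10},
    {"col1": 2, "col2": 9, "col3": 3},
    {"col1": 3, "col2": 8, "col3": 4},
    {"col1": 1, "col2": 2, "col3": 5},
    {"col1": 2, "col2": 6, "col3": None},
    {"col1": 1, "col2": 6, "col3": 8},
]


def build_bitmap_index(rows, column):
    """Map each key value (including None for null) to an int used as a bitmap."""
    index = {}
    for position, row in enumerate(rows):
        key = row[column]
        index[key] = index.get(key, 0) | (1 << position)
    return index


idx1 = build_bitmap_index(rows, "col1")
idx2 = build_bitmap_index(rows, "col2")
idx3 = build_bitmap_index(rows, "col3")

# col1 = 1 or col1 = 2: a Boolean or of two bitmaps.
b1 = idx1.get(1, 0) | idx1.get(2, 0)

# col2 > 5: merge (or) the bitmaps of all key values greater than 5.
b2 = 0
for key, bitmap in idx2.items():
    if key is not None and key > 5:
        b2 |= bitmap

# and of the first two results, then two minuses: values 10 and null for col3.
result = b1 & b2
result &= ~idx3.get(10, 0)
result &= ~idx3.get(None, 0)

matching_rows = [i for i in range(len(rows)) if (result >> i) & 1]
```

Only the final bitmap is translated back into rows; all intermediate steps work on bitmaps alone.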
...

The ability to use the Boolean operations to combine multiple indices is not limited to bitmap indices
...

As a rule of thumb, bitmap indices tend to be more space efﬁcient than regular
B-tree indices if the number of distinct key values is less than half the number of
rows in the table
...
For columns with a very small number of distinct values— for example, columns referring to properties such as country, state, gender, marital status,
and various status ﬂags— a bitmap index might require only a small fraction of the
space of a regular B-tree index
...

25
...
7 Function-Based Indices
In addition to creating indices on one or multiple columns of a table, Oracle allows
indices to be created on expressions that involve one or more columns, such as col1 +
col2 ∗ 5
...
In order to ﬁnd all rows
with name “van Gogh” efﬁciently, the condition
upper(name) = ’VAN GOGH’
would be used in the where clause of the query
...
A function-based index can be created as either a bitmap or a
B-tree index
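
The effect of indexing an expression rather than a column can be sketched in Python. The helper build_function_index and the sample rows are hypothetical; a dictionary stands in for the index structure, and the sketch says nothing about how Oracle stores or maintains such an index.

```python
from collections import defaultdict


def build_function_index(rows, expression):
    """Index rows by the value of an expression computed from each row."""
    index = defaultdict(list)
    for row_id, row in enumerate(rows):
        index[expression(row)].append(row_id)
    return index


artists = [
    {"name": "van Gogh"},
    {"name": "Van Gogh"},
    {"name": "Monet"},
    {"name": "VAN GOGH"},
]

# Index on the expression upper(name) instead of on name itself.
upper_name_index = build_function_index(artists, lambda row: row["name"].upper())

# The condition upper(name) = 'VAN GOGH' becomes a single index lookup.
hits = upper_name_index.get("VAN GOGH", [])
```

Because the expression is evaluated once per row when the index is built, the query finds every spelling variant without scanning the table.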
...
3
...
Oracle supports bitmap join indices primarily for use
with star schemas (see Section 22
...
2)
...
How the rows in
the fact and dimension tables correspond is based on a join condition that is speciﬁed
when the index is created, and becomes part of the index metadata
...
processed, the optimizer will look for the same join condition in the where clause of
the query in order to determine if the join index is applicable
...
In all cases, the join conditions between the fact
table on which the index is built and the dimension tables must refer to unique keys
in the dimension tables; that is, an indexed row in the fact table must correspond to
a unique row in each of the dimension tables
...
For example, consider a schema with a fact table for sales, and dimension
tables for customers, products, and time
...
If a multicolumn bitmap join index exists where
the key columns are the constrained dimension table columns (zip code, product category and time), Oracle can use the join index to ﬁnd rows in the fact table that match
the constraining conditions
...
If the query contains
conditions on some columns of the fact table, indices on those columns could be included in the same access path, even if they were regular B-tree indices or domain
indices (domain indices are described below in Section 25
...
9)
...
3
...

This extensibility feature of the Oracle server allows software vendors to develop
so-called cartridges with functionality for speciﬁc application domains, such as text,
spatial data, and images, with indexing functionality beyond that provided by the
standard Oracle index types
...

A domain index must be registered in the data dictionary, together with the operators it supports
...
Oracle allows cost functions to be registered with the operators so that the optimizer can compare the cost of using the domain index to those of
other access paths
...
Once this operator has been registered, the domain index will be considered
as an access path for a query like
select *
from employees
where contains(resume, ’LINUX’)

where resume is a text column in the employees table
...

A domain index can be combined with other (bitmap or B-tree) indices in the same
access path by converting between the row-id and bitmap representation and using
Boolean bitmap operations
...
3
...
The
ability to partition a table or index has advantages in many areas
...

• Loading operations in a data warehousing environment are less intrusive:
data can be added to a partition, and then the partition added to a table, which
is an instantaneous operation
...

• Query performance beneﬁts substantially, since the optimizer can recognize
that only a subset of the partitions of a table need to be accessed in order to
resolve a query (partition pruning)
...

Each row in a partitioned table is associated with a speciﬁc partition
...
There are several ways to map column values to partitions, giving
rise to several types of partitioning, each with different characteristics: range, hash,
composite, and list partitioning
...
3
...
1 Range Partitioning
In range partitioning, the partitioning criteria are ranges of values
...
In a data warehouse
where data are loaded from the transactional systems at regular intervals, range partitioning can be used to implement a rolling window of historical data efﬁciently
...
The system actually loads the data into a separate table with the same
column deﬁnition as the partitioned table
...
After that, the system can make the separate table a
new partition of the partitioned table, by a simple change to the metadata in the data
dictionary — a nearly instantaneous operation
...
Up until the metadata change, the loading process does not affect the existing
data in the partitioned table in any way
...
Old data can be removed from a table by
simply dropping its partition; this operation does not affect the other partitions
...
If date range
partitioning is used, the query optimizer can restrict the data access to those partitions that are relevant to the query, and avoid a scan of the entire table
...
3
...
2 Hash Partitioning
In hash partitioning, a hash function maps rows to partitions according to the values
in the partitioning columns
...

25
...
10
...
This type of partitioning combines the advantages of range partitioning and hash partitioning
...
3
...
4 List Partitioning
In list partitioning, the values associated with a particular partition are stated in a
list
...
For instance, a table with a state column can be
implicitly partitioned by geographical region if each partition list has the states that
belong in the same region
...
3
...
5
...
In addition, Oracle maintains
the materialized result, updating it when the tables that were referenced in the query
are updated
...

In data warehousing, a common usage for materialized views is to summarize
data
...
” Precomputing the result, or some partial result, of such a
query can speed up query processing dramatically compared to computing it from
scratch by aggregating all detail-level sales records
...
The rewrite consists of changing the query to
use the materialized view instead of the original tables in the query
...
For example, if a query needs sales by quarter, the rewrite can take
advantage of a view that materializes sales by month, by adding additional aggregation to roll up the months to quarters
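
The month-to-quarter rewrite can be sketched in Python. The data and function names are hypothetical, and the sketch relies on the aggregate being a sum, which can be rolled up from partial sums; it does not model how Oracle matches queries to materialized views.

```python
from collections import defaultdict

# Hypothetical detail-level sales records: (date, amount).
detail_sales = [
    ("2001-01-15", 100),
    ("2001-02-03", 50),
    ("2001-03-20", 25),
    ("2001-04-02", 10),
    ("2001-06-30", 40),
]


def materialize_by_month(detail):
    """Build the 'materialized view': total sales per (year, month)."""
    view = defaultdict(int)
    for day, amount in detail:
        year, month, _ = day.split("-")
        view[(int(year), int(month))] += amount
    return dict(view)


def quarterly_from_monthly(monthly_view):
    """Answer a sales-by-quarter query from the monthly view by further aggregation."""
    result = defaultdict(int)
    for (year, month), total in monthly_view.items():
        quarter = (month - 1) // 3 + 1
        result[(year, quarter)] += total
    return dict(result)


monthly = materialize_by_month(detail_sales)
quarterly = quarterly_from_monthly(monthly)
```

The quarterly result is computed from five monthly totals rather than from the detail rows, which is where the saving comes from when the detail table is large.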
...
For example,
for a time dimension table in a star schema, Oracle can deﬁne a dimension metadata
object to specify how days roll up to months, months to quarters, quarters to years,
and so forth
...
The query rewrite logic looks at
these relationships since they allow a materialized view to be used for wider classes
of queries
...

When there are changes to the data in the tables referenced in the query that deﬁnes a materialized view, the materialized view must be refreshed to reﬂect those
changes
...
In a full refresh, Oracle recomputes the materialized view from scratch,
which may be the best option if the underlying tables have had signiﬁcant changes,
for example, changes due to a bulk load
...
Incremental refresh may be better if the number of rows that
were changed is low
...

A materialized view is similar to an index in the sense that, while it can improve
query performance, it uses up space, and creating and maintaining it consumes resources
...

25
...
Some of the more important ones are described here brieﬂy
...
4
...
The query processor scans the entire table by getting information about the blocks that make up the table from the extent map, and
scanning those blocks
...
The processor creates a start and/or stop key from conditions
in the query and uses it to scan to a relevant part of the index
...
scan would be followed by a table access by index row-id
...

• Index fast full scan
...
If the index contains all the columns that are needed
by the query, and there are no good start/stop keys that would significantly
reduce that portion of the index that would be scanned in a regular index scan,
this method may be the fastest way to access the data
...
However, unlike a
regular full scan, which traverses the index leaf blocks in order, a fast full scan
does not guarantee that the output preserves the sort order of the index
...
If a query needs only a small subset of the columns of a wide
table, but no single index contains all those columns, the processor can use an
index join to generate the relevant information without accessing the table, by
joining several indices that together contain the needed columns
...

• Cluster and hash cluster access
...

Oracle has several ways to combine information from multiple indices in a single
access path
...
The functionality includes the
ability to perform Boolean operations and, or, and minus on bitmaps representing
row-ids
...
In addition, for many queries involving count(*) on selections
on a table, the result can be computed by just counting the bits that are set in the
bitmap generated by applying the where clause conditions, without accessing the
table
...
(An antijoin in Oracle returns rows from the left-hand
side input that do not match any row in the right-hand side input; this operation is
called anti-semijoin in other literature
...

25
...
2 Optimization
In Chapter 14, we discussed the general topic of query optimization
...

25
...
2
...
Most of the techniques relating to
query transformations and rewrites take place before access path selection, but Oracle also supports several types of cost-based query transformations that generate a
complete plan and return a cost estimate for both a standard version of the query and
one that has been subjected to advanced transformations
...

Some of the major types of transformations and rewrites supported by Oracle are
as follows:
• View merging
...

This transformation is not applicable to all views
...
Oracle offers this feature for certain classes of views
that are not subject to regular view merging because they have a group by or
select distinct in the view deﬁnition
...

• Subquery ﬂattening
...

• Materialized view rewrite
...
If some part of the query can be
matched up with an existing materialized view, Oracle can replace that part
of the query with a reference to the table in which the view is materialized
...
If multiple materialized views are applicable, Oracle picks the one that gives the greatest advantage in reducing the amount of
data that has to be processed
...
Oracle then decides
whether to execute the rewritten or the original version of the query on the
basis of the cost estimates
...
Oracle supports a technique for evaluating queries against
star schemas, known as the star transformation
...
fk_i in
(select pk from dimension_table_i
where <constraining predicates>)
One such subquery is generated for each dimension that has some constraining predicate
...
4), the
subquery will contain a join of the applicable tables that make up the dimension
...
Oracle uses the values that are returned from each subquery to probe an
index on the corresponding fact table column, getting a bitmap as a result
...
The resultant bitmap can be used to access matching fact table
rows
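
The probing and combining steps can be sketched in Python, again using integers as bitmaps. The tables, column names, and helper functions are hypothetical, and the cost-based decisions described in the text are not modeled; the sketch only shows one subquery per constrained dimension, one bitmap per subquery, and a final Boolean and.

```python
def bitmap_index(rows, column):
    """Bitmap index on a fact table column: key value -> int bitmap."""
    index = {}
    for pos, row in enumerate(rows):
        index[row[column]] = index.get(row[column], 0) | (1 << pos)
    return index


# Hypothetical dimension tables: primary key -> constrained attribute.
customers = {1: "94065", 2: "10001", 3: "94065"}    # pk -> zip code
products = {10: "books", 11: "music", 12: "books"}  # pk -> category

# Hypothetical fact table with one foreign key per dimension.
sales = [
    {"cust": 1, "prod": 10, "amount": 5},
    {"cust": 2, "prod": 10, "amount": 7},
    {"cust": 3, "prod": 11, "amount": 2},
    {"cust": 3, "prod": 12, "amount": 9},
    {"cust": 2, "prod": 11, "amount": 4},
]


def dimension_bitmap(fact_index, dimension, predicate):
    """Run the dimension 'subquery', probe the fact index, and or the bitmaps."""
    bitmap = 0
    for pk, attribute in dimension.items():
        if predicate(attribute):
            bitmap |= fact_index.get(pk, 0)
    return bitmap


cust_bits = dimension_bitmap(bitmap_index(sales, "cust"), customers,
                             lambda zip_code: zip_code == "94065")
prod_bits = dimension_bitmap(bitmap_index(sales, "prod"), products,
                             lambda category: category == "books")

combined = cust_bits & prod_bits  # Boolean and across dimensions
matching = [sales[i] for i in range(len(sales)) if (combined >> i) & 1]
```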
...

Both the decision on whether the use of a subquery for a particular dimension is cost-effective, and the decision on whether the rewritten query is better
than the original, are based on the optimizer’s cost estimates
...
4
...
2 Access Path Selection
Oracle has a cost-based optimizer that determines join order, join methods, and access paths
...

In estimating the cost of an operation, the optimizer relies on statistics that have
been computed for schema objects such as tables and indices
...
For column statistics, Oracle supports height-balanced and
frequency histograms
...
Oracle
also tracks what columns are used in where clauses of queries, which make them potential candidates for histogram creation
...

Oracle uses sampling to speed up the process of gathering the new statistics and
automatically chooses the smallest adequate sample percentage
...

Oracle uses both CPU cost and disk I/Os in the optimizer cost model
...
Oracle’s package for gathering optimizer statistics
computes these measures
...
Oracle addresses this issue in several ways
...
It then changes the order of the tables and determines the best join
methods and access paths for the new join order and so forth, while keeping the best
plan that has been found so far
...
Since this cutoff depends on the cost estimate for the best
plan found so far, ﬁnding a good plan early is important so that the optimization can
be stopped after a smaller number of join orders, resulting in better response time
...

For each join order that is considered, the optimizer may make additional passes
over the tables to decide join methods and access paths
...
For instance, a speciﬁc
combination of join methods and access paths may eliminate the need to perform an
order by sort
...

25
...
2
...
For example, if a table is partitioned
by date range and the query is constrained to data between two speciﬁc dates, the
optimizer determines which partitions contain data between the speciﬁed dates and
ensures that only those partitions are accessed
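
Partition pruning for a date-range partitioned table can be sketched in Python with a binary search over the partition bounds. The bounds, partition names, and the function prune are hypothetical; each partition is assumed to hold values strictly below its upper bound.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical partitions, each defined by an exclusive upper bound.
bounds = [date(2001, 4, 1), date(2001, 7, 1), date(2001, 10, 1), date(2002, 1, 1)]
names = ["q1", "q2", "q3", "q4"]


def prune(bounds, names, low, high):
    """Return the partitions that can contain values in [low, high]."""
    first = bisect_right(bounds, low)                       # partition holding `low`
    last = min(bisect_right(bounds, high), len(names) - 1)  # partition holding `high`
    return names[first:last + 1]


needed = prune(bounds, names, date(2001, 5, 15), date(2001, 8, 1))
```

Only the partitions returned by prune would be scanned; the others are never touched.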
...

25
...
3 Parallel Execution
Oracle allows the execution of a single SQL statement to be parallelized by dividing
the work between multiple processes on a multiprocessor computer
...
Representative examples are decision support
queries that need to process large amounts of data, data loads in a data warehouse,
and index creation or rebuild
...
Depending on the type of
operation, Oracle has several ways to split up the work
...
For some operations, such as a full table scan,
each such slice can be a range of blocks— each parallel query process scans the table
from the block at the start of the range to the block at the end
...
For inserts
into a nonpartitioned table, the data to be inserted are randomly divided across the
parallel processes
...
One way is to divide one of the
inputs to the join between parallel processes and let each process join its slice with
the other input to the join; this is the asymmetric fragment-and-replicate method
of Section 20
...
2
...
For example, if a large table is joined to a small one by a hash
join, Oracle divides the large table among the processes and broadcasts a copy of the
small table to each process, which then joins its slice with the smaller table
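
The broadcast strategy can be sketched in Python. The processes are simulated by a sequential loop, and the function names and sample tables are hypothetical; the sketch shows only how the work is divided, not how Oracle schedules it.

```python
from collections import defaultdict


def hash_join(probe_rows, build_rows, key):
    """In-memory hash join: build a hash table on one input, probe with the other."""
    table = defaultdict(list)
    for row in build_rows:
        table[row[key]].append(row)
    joined = []
    for row in probe_rows:
        for match in table.get(row[key], []):
            joined.append({**row, **match})
    return joined


def broadcast_join(large, small, key, n_processes):
    """Slice the large table; every simulated process receives all of the small one."""
    slices = [large[i::n_processes] for i in range(n_processes)]
    results = []
    for slice_rows in slices:  # each iteration stands in for one parallel process
        results.extend(hash_join(slice_rows, small, key))
    return results


orders = [{"dept": n % 3, "order_id": n} for n in range(10)]
depts = [{"dept": 0, "name": "toys"}, {"dept": 1, "name": "books"}]

parallel = broadcast_join(orders, depts, "dept", n_processes=4)
serial = hash_join(orders, depts, "dept")
```

The parallel and serial results contain the same rows, possibly in a different order.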
...
tables are large, it would be prohibitively expensive to broadcast one of them to all
processes
...
5
...
1)
...
Which one of these processes gets the row is determined by a hash function
on the values of the join column
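
Distributing rows by a hash of the join column can be sketched in Python. The function names and tables are hypothetical, Python's built-in hash stands in for the hash function, and the processes are simulated sequentially. Rows with equal join values always land in the same partition, so each process can join its pair of partitions independently.

```python
def partition_by_hash(rows, key, n_processes):
    """Send each row to the process chosen by a hash of its join column value."""
    parts = [[] for _ in range(n_processes)]
    for row in rows:
        parts[hash(row[key]) % n_processes].append(row)
    return parts


def partitioned_hash_join(left, right, key, n_processes):
    """Join matching partitions pairwise; no table is broadcast."""
    left_parts = partition_by_hash(left, key, n_processes)
    right_parts = partition_by_hash(right, key, n_processes)
    results = []
    for l_part, r_part in zip(left_parts, right_parts):  # one pair per process
        lookup = {}
        for row in r_part:
            lookup.setdefault(row[key], []).append(row)
        for row in l_part:
            for match in lookup.get(row[key], []):
                results.append({**row, **match})
    return results


left = [{"k": i % 5, "a": i} for i in range(20)]
right = [{"k": j, "b": j * 10} for j in range(7)]

joined = partitioned_hash_join(left, right, "k", n_processes=3)
```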
...

Oracle parallelizes sort operations by value ranges of the column on which the
sort is performed (that is, using the range-partitioning sort of Section 20
...
1)
...
To maximize the beneﬁts of parallelism, the rows need to be divided
as evenly as possible among the parallel processes, and the problem of determining
range boundaries that generates a good distribution then arises
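
One way to choose range boundaries, assumed here purely for illustration, is to sample the input and take evenly spaced values from the sorted sample. The Python sketch below simulates the parallel processes sequentially; the function name, the sample size, and the boundary rule are all assumptions rather than a description of Oracle's algorithm.

```python
import random
from bisect import bisect_right


def range_partition_sort(values, n_processes, sample_size=100, seed=0):
    """Sort by splitting values into ranges, sorting each range, and concatenating."""
    if not values:
        return [], [0] * n_processes
    rng = random.Random(seed)
    sample = sorted(rng.sample(values, min(sample_size, len(values))))
    # n_processes - 1 boundaries taken at evenly spaced positions of the sample.
    boundaries = [sample[len(sample) * i // n_processes]
                  for i in range(1, n_processes)]
    buckets = [[] for _ in range(n_processes)]
    for value in values:
        buckets[bisect_right(boundaries, value)].append(value)
    # Each bucket would be sorted by a different process.
    sorted_buckets = [sorted(bucket) for bucket in buckets]
    merged = [value for bucket in sorted_buckets for value in bucket]
    return merged, [len(bucket) for bucket in buckets]


generator = random.Random(1)
data = [generator.randint(0, 10_000) for _ in range(1000)]
result, bucket_sizes = range_partition_sort(data, n_processes=4)
```

Because the buckets cover disjoint, increasing value ranges, concatenating the sorted buckets yields a fully sorted result; the bucket sizes show how evenly the sampled boundaries divided the work.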
...

25
...
3
...
The coordinator is
responsible for assigning work to the parallel servers and for collecting and returning
data to the user process that issued the statement
...
The degree of parallelism is determined by the optimizer, but
can be throttled back dynamically if the load on the system increases
...
When a sequence of
operations is needed to process a statement, the producer set of servers performs the
ﬁrst operation and passes the resulting data to the consumer set
...
If a subsequent operation is needed, like another sort,
the roles of the two sets of servers switch
...
Hence, a sequence of operations proceeds by
passing data back and forth between two sets of servers that alternate in their roles as
producers and consumers
...

For shared nothing systems, the cost of accessing data on disk is not uniform
among processes
...
Oracle uses knowledge about device-to-node and device-to-process affinity (that is, the ability to access devices directly) when distributing
work among parallel execution servers
...
5 Concurrency Control and Recovery
Oracle supports concurrency control and recovery techniques that provide a number
of useful features
...
5
...
Read-only queries are given a read-consistent
snapshot, which is a view of the database as it existed at a speciﬁc point in time,
containing all updates that were committed by that point in time, and not containing
any updates that were not committed at that point in time
...
(This is basically the multiversion two-phase locking protocol described in
Section 16
...
2
...
The SCN essentially acts as a timestamp, where the time is measured in terms
of transaction commits instead of wall-clock time
...
Hence, the data in the block cannot be included in a consistent
view of the database as it existed at the time of the query’s SCN
...
Oracle retrieves that version of the
data from the rollback segment (rollback segments are described in Section 25
...
2)
...
Should the block with the desired SCN no
longer exist in the rollback segment, the query will return an error
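
The way a query uses its SCN to pick a version of a block can be sketched in Python. The class VersionedBlock, its undo list, and the exception name are hypothetical; the sketch keeps whole older versions rather than undo records and is not a model of Oracle's block or rollback segment format.

```python
class SnapshotTooOld(Exception):
    """Raised when the version a query needs is no longer available."""


class VersionedBlock:
    """Toy block: a current version plus older versions kept as undo information."""

    def __init__(self, value, scn):
        self.current = (scn, value)
        self.undo = []  # older (scn, value) versions, newest first

    def write(self, value, scn):
        self.undo.insert(0, self.current)
        self.current = (scn, value)

    def read(self, query_scn):
        """Return the newest version whose SCN does not exceed the query's SCN."""
        for scn, value in [self.current] + self.undo:
            if scn <= query_scn:
                return value
        raise SnapshotTooOld(f"no version at or before SCN {query_scn}")

    def trim_undo(self, keep):
        """Discard all but the `keep` newest undo versions (space being reused)."""
        del self.undo[keep:]


block = VersionedBlock("a", scn=10)
block.write("b", scn=20)
block.write("c", scn=30)

seen_at_25 = block.read(25)  # query started before the write at SCN 30
seen_at_35 = block.read(35)

block.trim_undo(keep=1)      # the version with SCN 10 is lost
try:
    block.read(15)
    too_old = False
except SnapshotTooOld:
    too_old = True
```

Readers never wait for writers in this sketch; they simply read an older version, and fail only when that version has been discarded.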
...

In the Oracle concurrency model, read operations do not block write operations
and write operations do not block read operations, a property that allows a high
degree of concurrency
...
This kind of scenario is often problematic for database systems where queries
use read locks, since the query may either fail to acquire them or lock large amounts
of data for a long time, thereby preventing transactional activity against that data
and reducing concurrency
...
)
Oracle’s concurrency model is used as a basis for the Flashback Query feature
...
perform queries on the data that existed at that point in time (provided that the data
still exist in the rollback segment)
...
However, recovery of a
very large database can be very costly, especially if the goal is just to retrieve some
data item that had been inadvertently deleted by a user
...

Oracle supports two ANSI/ISO isolation levels, “read committed” and “serializable”
...
The two isolation
levels correspond to whether statement-level or transaction-level read consistency is
used
...
Statement-level
read consistency is the default
...
Updates to different rows do not conﬂict
...
Locks are held for the duration of a transaction
...
These locks
prevent one user from, say, dropping a table while another user has an uncommitted
transaction that is accessing that table
...

Oracle detects deadlocks automatically and resolves them by rolling back one of
the transactions involved in the deadlock
...
When Oracle invokes an autonomous transaction, it generates a new transaction in a separate context
...

Oracle supports multiple levels of nesting of autonomous transactions
...
5
...
In addition to the
data ﬁles that contain tables and indices, there are control ﬁles, redo logs, archived
redo logs, and rollback segments
...

Oracle records any transactional modiﬁcation of a database buffer in the redo log,
which consists of two or more ﬁles
...
It logs
changes to indices and rollback segments as well as changes to table data
...

The rollback segment contains information about older versions of the data (that
is, undo information)
...

To be able to recover from a storage failure, the data ﬁles and control ﬁles should be
backed up regularly
...
Oracle supports hot backups
— backups performed on an online database that is subject to transactional activity
...
First, Oracle rolls forward
by applying the (archived) redo logs to the backup
...
Second, Oracle rolls back uncommitted transactions by using the rollback segment
...

Recovery on a database that has been subject to heavy transactional activity since
the last backup can be time consuming
...
Oracle provides
a GUI tool, Recovery Manager, which automates most tasks associated with backup
and recovery
...
5
...

(This feature is the same as remote backups, described in Section 17
...
) A standby
database is a copy of the regular database that is installed on a separate system
...
Oracle
keeps the standby database up to date by constantly applying archived redo logs
that are shipped from the primary database
...

25
...
Oracle can be conﬁgured
so that the operating system process is dedicated exclusively to the statement it is
processing or so that the process can be shared among multiple statements
...
We shall discuss the dedicated
server architecture ﬁrst and the multithreaded server architecture later
...
6
...

The system code areas are the parts of the memory where the Oracle server code
resides
...
tion
...
It also contains memory for sorting and
hashing operations that may occur during the evaluation of the statement
...
It is made
up of several major structures, including:
• The buffer cache
...
A least recently used replacement policy is used except for blocks accessed during a full table scan
...
Some Oracle operations bypass the buffer cache and read data directly from disk
...
This buffer contains the part of the redo log that has not
yet been written to disk
...
Oracle seeks to maximize the number of users that can
use the database concurrently by minimizing the amount of memory that is
needed for each user
...
When multiple users execute the same SQL statement, they can
share most data structures that represent the execution plan for the statement
...

The sharable parts of the data structures representing the SQL statement are
stored in the shared pool, including the text of the statement
...
The determination of whether an SQL statement is the same as one existing in the shared pool is based on exact text
matching and the setting of certain session parameters
...
The shared pool also contains caches for dictionary
information and various control structures
...
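The least-recently-used replacement policy mentioned for the buffer cache can be sketched as follows. This is a minimal illustration, not Oracle's code: `LRUBufferCache` and `fake_disk_read` are hypothetical names, and the sketch ignores the full-table-scan exception and the write-back of modified buffers.

```python
from collections import OrderedDict


class LRUBufferCache:
    """Toy buffer cache with least-recently-used replacement (illustrative only)."""

    def __init__(self, capacity, read_block):
        self.capacity = capacity
        self.read_block = read_block   # callable that fetches a block from "disk"
        self.buffers = OrderedDict()   # block id -> contents, least recent first

    def get(self, block_id):
        if block_id in self.buffers:
            self.buffers.move_to_end(block_id)   # mark as most recently used
            return self.buffers[block_id]
        contents = self.read_block(block_id)
        self.buffers[block_id] = contents
        if len(self.buffers) > self.capacity:
            self.buffers.popitem(last=False)     # evict the least recently used block
        return contents


disk_reads = []


def fake_disk_read(block_id):
    """Stand-in for a disk read; records which blocks were fetched."""
    disk_reads.append(block_id)
    return f"contents of block {block_id}"


cache = LRUBufferCache(capacity=2, read_block=fake_disk_read)
for block_id in [1, 2, 1, 3, 2]:
    cache.get(block_id)
```

With a capacity of two, the access to block 3 evicts block 2 (block 1 was touched more recently), so the final access to block 2 has to go back to disk.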
6
...
Some of these processes are optional, and in
some cases, multiple processes of the same type can be used for performance reasons
...
When a buffer is removed from the buffer cache, it must be
written back to disk if it has been modiﬁed since it entered the cache
...

• Log writer
...
It also writes a commit record to disk whenever a transaction commits
...
The checkpoint process updates the headers of the data ﬁle when
a checkpoint occurs
...
This process performs crash recovery if needed
...

• Process monitor
...

• Recoverer
...

• Archiver
...

25
...
3 Multithreaded Server
The multithreaded server conﬁguration increases the number of users that a given
number of server processes can support by sharing server processes among statements
...
In doing so, it uses a request queue and a response queue in the
SGA
...
As a server process completes a request, it
puts the result in the response queue to be picked up by the dispatcher and
returned to the user
...
Instead, it stores the session-speciﬁc data in
the SGA
...
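The flow through the request and response queues can be sketched in a single-threaded Python simulation. The function names and the `execute` callback are hypothetical; a real shared-server setup runs dispatchers and server processes concurrently, which this sketch deliberately leaves out.

```python
from collections import deque

request_queue = deque()    # filled by the dispatcher, drained by server processes
response_queue = deque()   # filled by server processes, drained by the dispatcher


def dispatcher_submit(session_id, statement):
    """Dispatcher side: queue a statement on behalf of a user session."""
    request_queue.append((session_id, statement))


def server_process_step(execute):
    """Let one shared server process handle the next queued request, if any."""
    if not request_queue:
        return False
    session_id, statement = request_queue.popleft()
    response_queue.append((session_id, execute(statement)))
    return True


def dispatcher_collect():
    """Dispatcher side: pick up finished results and group them by session."""
    results = {}
    while response_queue:
        session_id, result = response_queue.popleft()
        results.setdefault(session_id, []).append(result)
    return results


dispatcher_submit("s1", "select 1")
dispatcher_submit("s2", "select 2")
dispatcher_submit("s1", "select 3")
while server_process_step(execute=str.upper):   # str.upper stands in for real execution
    pass
results = dispatcher_collect()
```

Because no server process is tied to a session, the two statements of session `s1` could be handled by different processes; only the queues connect a request to its result.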
6
...
(Recall that, in Oracle terminology, an instance
is the combination of background processes and memory areas
...
This feature was called Oracle Parallel Server in earlier versions of Oracle
...

25
...
Oracle further optimizes the use of the hardware through features
such as afﬁnity and partitionwise joins
...
If
one node fails, the remaining ones are still available to the application accessing the
database
...

Having multiple instances run against the same database gives rise to some technical issues that do not exist on a single instance
...

To address this, Oracle supports a distributed lock manager and the cache fusion feature, which allows data blocks to ﬂow directly among caches on different instances
using the interconnect, without being written to disk
...
7 Replication, Distribution, and External Data
Oracle provides support for replication and distributed transactions with two-phase
commit
...
7
...
(See Section 19
...
1 for an introduction
to replication
...
(The term “snapshot” in this context should not be confused with the concept of a read-consistent snapshot in the context of the concurrency
model
...
Oracle supports two types
of snapshots: read-only and updatable
...
However, read-only
snapshots allow for a wider range of snapshot deﬁnitions
...

Oracle also supports multiple master sites for the same data, where all master
sites act as peers
...
The updates can be propagated either
asynchronously or synchronously
...
Since the same data could be subject to conﬂicting modiﬁcations at different sites, conﬂict resolution based on some business rules might be
needed
...

With synchronous replication, an update to one master site is propagated immediately to all other sites
...

25
...
2 Distributed Databases
Oracle supports queries and transactions spanning multiple databases on different
systems
...
Oracle has built-in capability to optimize a query that includes tables at different sites, retrieve the relevant data, and return the result as if it had been a normal,
local query
...

25
...
3 External Data Sources
Oracle has several mechanisms for supporting external data sources
...

25
...
3
...
It supports a variety of data formats and it can
perform various ﬁltering operations on the data being loaded
...
7
...
2 External Tables
Oracle allows external data sources, such as ﬂat ﬁles, to be referenced in the from
clause of a query as if they were regular tables
...
An access driver is also needed to access the external data
...

The external table feature is primarily intended for extraction, transformation, and
loading (ETL) operations in a data warehousing environment
...
from < external table >
where
...
Since these
operations can be expressed either in native SQL or in functions written in PL/SQL or
Java, the external table feature provides a very powerful mechanism for expressing
all kinds of data transformation and ﬁltering operations
...

25
...

25
...
1 Oracle Enterprise Manager
Oracle Enterprise Manager is Oracle’s main tool for database systems management
...
It also provides performance monitoring and tools to help
an administrator tune application SQL, access paths, and instance and data storage
parameters
...

25
...
2 Database Resource Management
A database administrator needs to be able to control how the processing power of
the hardware is divided among users or groups of users
...

It is also important to be able to prevent a user from inadvertently submitting an
extremely expensive ad hoc query that will unduly delay other users
...
For example, a group of high-priority, interactive users may be guaranteed at
least 60 percent of the CPU
...
A really low-priority group could get assigned 0 percent, which
would mean that queries issued by this group would run only when there are spare
CPU cycles available
...
The database administrator can also set time limits for how
long an SQL statement is allowed to run for each group
...
The resource manager can also
limit the number of user sessions that can be active concurrently for each resource
consumer group
...
oracle
...
oracle
...

Extensible indexing in Oracle8i is described by Srinivasan et al
...
[2000a] describe index organized tables in Oracle8i
...

[2000] describe XML support in Oracle8i
...
[1998] describe materialized
views in Oracle
...

The Oracle Parallel Server is described by Bamford et al
...
Recovery in Oracle
is described by Joshi et al
...
[2001]
...

Chapter 9

Object-Relational Databases

Persistent programming languages add persistence and other database features to existing programming languages by using an existing object-oriented type system
...
Relational
query languages, in particular SQL, need to be correspondingly extended to deal
with the richer type system
...
Object-relational database systems (that is, database systems based on
the object-relational model) provide a convenient migration path for users of relational
databases who wish to use object-oriented features
...
We then show how to extend SQL by adding a variety of object-relational
features
...

Finally, we discuss differences between persistent programming languages and
object-relational systems, and mention criteria for choosing between them
...
1 Nested Relations
In Chapter 7, we deﬁned ﬁrst normal form (1NF), which requires that all attributes
have atomic domains
...

The assumption of 1NF is a natural one in the bank examples we have considered
...
For example, rather
than view a database as a set of records, users of certain applications view it as a set of
objects (or entities)
...

title      author-set      publisher (name, branch)
Compilers  {Smith, Jones}  (McGraw-Hill, New York)
Networks   {Jones, Frick}  (Oxford, London)
Figure 9
...

We shall see that a simple, easy-to-use interface requires a one-to-one correspondence
between the user’s intuitive notion of an object and the database system’s notion of
a data item
...
Thus, the value of a tuple on an attribute may be a relation, and relations may be contained within relations
...
If we view a tuple of a nested relation as a data item, we have a one-to-one correspondence between
data items and objects in the user’s view of the database
...
Suppose we store for
each book the following information:
• Book title
• Set of authors
• Publisher
• Set of keywords
We can see that, if we deﬁne a relation for the preceding information, several domains
will be nonatomic
...
A book may have a set of authors
...
Thus, we are interested
in a subpart of the domain element “set of authors
...
If we store a set of keywords for a book, we expect to be able to
retrieve all books whose keywords include one or more keywords
...

• Publisher
...
However, we may view publisher as consisting of the subﬁelds name
and branch
...

Figure 9
...
The books relation can be represented
in 1NF, as in Figure 9
...
Since we must have atomic domains in 1NF, yet want access to individual authors and to individual keywords, we need one tuple for each
(keyword, author) pair
...
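The flattening into one tuple per (keyword, author) pair can be sketched in Python. The two books and their values come from the figures in this section; the dictionary layout and the function name `flatten` are assumptions made for illustration.

```python
from itertools import product

# Nested relation: one entry per book, with collection-valued fields.
books = [
    {"title": "Compilers", "author_set": ["Smith", "Jones"],
     "publisher": ("McGraw-Hill", "New York"),
     "keyword_set": ["parsing", "analysis"]},
    {"title": "Networks", "author_set": ["Jones", "Frick"],
     "publisher": ("Oxford", "London"),
     "keyword_set": ["Internet", "Web"]},
]


def flatten(nested):
    """Produce 1NF tuples (title, author, pub_name, pub_branch, keyword)."""
    rows = []
    for book in nested:
        pub_name, pub_branch = book["publisher"]
        # One output tuple for every combination of author and keyword.
        for author, keyword in product(book["author_set"], book["keyword_set"]):
            rows.append((book["title"], author, pub_name, pub_branch, keyword))
    return rows


flat_books = flatten(books)
```

Two books with two authors and two keywords each produce eight flat tuples, which is where the repetition in the 1NF version comes from.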

title      pub-branch  keyword
Compilers  New York    parsing
Compilers  New York    parsing
Compilers  New York    analysis
Compilers  New York    analysis
Networks   London      Internet
Networks   London      Internet
Networks   London      Web
Networks   London      Web
Figure 9
...
flat-books, a 1NF version of non-1NF relation books
...
2 disappears if we
assume that the following multivalued dependencies hold:
• title →→ author
• title →→ keyword
• title → pub-name, pub-branch
Then, we can decompose the relation into 4NF using the schemas:
• authors(title, author)
• keywords(title, keyword)
• books4(title, pub-name, pub-branch)
Figure 9
...
2 onto the preceding decomposition
...
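The projection onto the three 4NF schemas can be sketched in Python, together with a check that joining the projections on title gives back the flat relation. The data values come from the figures in this section; the tuple layout and the variable names are assumptions made for illustration.

```python
# Flat 1NF relation: (title, author, pub_name, pub_branch, keyword).
flat_books = [
    (title, author, pub_name, pub_branch, keyword)
    for title, authors, (pub_name, pub_branch), keywords in [
        ("Compilers", ["Smith", "Jones"], ("McGraw-Hill", "New York"),
         ["parsing", "analysis"]),
        ("Networks", ["Jones", "Frick"], ("Oxford", "London"),
         ["Internet", "Web"]),
    ]
    for author in authors
    for keyword in keywords
]

# Project onto the three schemas; sets remove the duplicates.
authors = {(t, a) for t, a, _, _, _ in flat_books}
keywords = {(t, k) for t, _, _, _, k in flat_books}
books4 = {(t, pn, pb) for t, _, pn, pb, _ in flat_books}

# Natural join of the three projections on title.
rejoined = {
    (t, a, pn, pb, k)
    for (t, a) in authors
    for (t2, k) in keywords if t2 == t
    for (t3, pn, pb) in books4 if t3 == t
}
```

The eight flat tuples shrink to four author tuples, four keyword tuples, and two publisher tuples, and the join reproduces the original relation exactly.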
The 4NF design would
require users to include joins in their queries, thereby complicating interaction with
the system
...
In such a view,
however, we lose the one-to-one correspondence between tuples and books
...
2 Complex Types
Nested relations are just one example of extensions to the basic relational model;
other nonatomic data types, such as nested records, have also proved useful
...
With complex type systems and object orientation, we can represent E-R model concepts, such as identity of entities, multivalued attributes, and
generalization and specialization directly, without a complex translation to the relational model
...
4NF version of the relation ﬂat-books of Figure 9
...

In this section, we describe extensions to SQL to allow complex types, including nested relations, and object-oriented features
...

9
...
1 Collection and Large Object Types
Consider this fragment of code
...

keyword-set setof(varchar(20))

...

Sets are an instance of collection types
...
The following attribute deﬁnitions illustrate the declaration of an
array:
author-array varchar(20) array [10]

...
We can access elements of an
array by specifying the array index, for example author-array[1]
...
SQL:1999 does not support unordered sets or multisets,
although they may appear in future versions of SQL
...
SQL:1999 therefore provides new large-object data types
for character data (clob) and binary data (blob)
...
For example, we may declare attributes
book-review clob(10KB)
image blob(10MB)
movie blob(2GB)
Large objects are typically used in external applications, and it makes little sense to
retrieve them in their entirety by SQL
...
For instance, JDBC permits the programmer to fetch a large object
in small pieces, rather than all at once, much like fetching data from an operating
system ﬁle
...
2
...
The second statement deﬁnes a structured type Book, which contains
a title, an author-array, which is an array of authors, a publication date, a publisher
(of type Publisher), and a set of keywords
...
) The types
illustrated above are called structured types in SQL:1999
...
The Oracle 8 database system supports nested relations, but uses a syntax different from that in this
chapter
...
The table is similar
to the nested relation books in Figure 9
...
The array permits us to record the
order of author names
...
Unnamed row types can also be used in SQL:1999 to deﬁne composite attributes
...

We can of course create tables without creating an intermediate type for the table
...
2
A structured type can have methods deﬁned on it
...
salary = self
...
salary * percent) / 100;
end
The variable self refers to the structured type instance on which the method is invoked
...
6
...
In Oracle PL/SQL, given a table t, t%rowtype denotes the type of the rows of the table
...
a%type denotes the type of attribute a of table t
...
9
...
3 Creation of Values of Complex Types
In SQL:1999 constructor functions are used to create values of structured types
...
For instance, we could declare a constructor for the type Publisher
like this:
create function Publisher (n varchar(20), b varchar(20))
returns Publisher
begin
set name = n;
set branch = b;
end
We can then use Publisher('McGraw-Hill', 'New York') to create a value of the type
Publisher
...
6; the names of such functions must be different from the name of any structured type
...
That is, the value the constructor creates
has no object identity
...

By default every structured type has a constructor with no arguments, which sets
the attributes to their default values
...
There can be more than one constructor for the same structured type; although
they have the same name, they must be distinguishable by the number of arguments
and types of their arguments
...
For instance,
if we declare an attribute publisher1 as a row type (as in Section 9
...
2), we can construct this value for it:
('McGraw-Hill', 'New York')
without using a constructor
...
We can create multiset values just like
set values, by replacing set by multiset
...
Although sets and multisets are not part of the SQL:1999 standard, the other constructs shown in this
section are part of the standard
...

Here we have created a value for the attribute Publisher by invoking a constructor
function for Publisher with appropriate arguments
...
3 Inheritance
Inheritance can be at the level of types, or at the level of tables
...

9
...
1 Type Inheritance
Suppose that we have the following type deﬁnition for people:
create type Person
(name varchar(20),
address varchar(20))
We may want to store extra information in the database about people who are students, and about people who are teachers
...

Student and Teacher are said to be subtypes of Person, and Person is a supertype of
Student, as well as of Teacher
...

However, a subtype can redeﬁne the effect of a method by declaring the method
again, using overriding method in place of method in the method declaration
...

We can do this by using multiple inheritance, which we studied in Chapter 8
...
However, draft versions
of the SQL:1999 standard provided for multiple inheritance, and although the ﬁnal

...
We base
our discussion on the draft versions of the SQL:1999 standard
...
There is a
problem, however, since the attributes name, address, and department are present in
Student, as well as in Teacher
...
So there is no conﬂict caused by inheriting them from Student as well as Teacher
...
In fact,
a teaching assistant may be a student of one department and a teacher in another
department
...
Multiple inheritance as in the TeachingAssistant example is not supported in SQL:1999
...
The keyword ﬁnal says that subtypes may not be created
from the given type, while not ﬁnal says that subtypes may be created
...
” That is, each value must be associated with one speciﬁc
type, called its most-speciﬁc type, when it is created
...
For example,
suppose that an entity has the type Person, as well as the type Student
...
However, an
entity cannot have the type Student, as well as the type Teacher, unless it has a type,
such as TeachingAssistant, that is a subtype of Teacher, as well as of Student
...
3
...

For instance, suppose we deﬁne the people table as follows:
create table people of Person
We can then deﬁne tables students and teachers as subtables of people, as follows:

create table students of Student
under people
create table teachers of Teacher
under people
The types of the subtables must be subtypes of the type of the parent table
...

Further, when we declare students and teachers as subtables of people, every tuple
present in students or teachers becomes also implicitly present in people
...
However,
only those attributes that are present in people can be accessed
...
(We
note, however, that multiple inheritance of tables is not supported by SQL:1999
...

SQL:1999 permits us to ﬁnd tuples that are in people but not in its subtables by using
“only people” in place of people in a query
...
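The difference between querying people and querying "only people" can be sketched in Python. All rows and names below are made up for illustration, and the use of department as the extra subtype attribute is an assumption based on the attributes mentioned in this section.

```python
PERSON_ATTRIBUTES = ("name", "address")

# Rows stored directly in each table; every value here is hypothetical.
people_rows = [{"name": "Ann", "address": "1 Main St"}]
students_rows = [{"name": "Bob", "address": "2 Oak Ave", "department": "CS"}]
teachers_rows = [{"name": "Cho", "address": "3 Elm Rd", "department": "Math"}]


def project(row):
    """Keep only the attributes declared by Person."""
    return {attr: row[attr] for attr in PERSON_ATTRIBUTES}


def select_from_people():
    """Like 'from people': tuples of the subtables are implicitly present too."""
    return [project(r) for r in people_rows + students_rows + teachers_rows]


def select_from_only_people():
    """Like 'from only people': tuples that are not in any subtable."""
    return [project(r) for r in people_rows]


all_names = [r["name"] for r in select_from_people()]
only_names = [r["name"] for r in select_from_only_people()]
```

Note that the projection drops department: a query on people sees subtable tuples, but only through the attributes that people itself declares.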
Before we state the constraints, we need a definition: We say that tuples in a subtable correspond to tuples
in a parent table if they have the same values for all inherited attributes
...

The consistency requirements for subtables are:
1
...

2
...

For example, without the ﬁrst condition, we could have two tuples in students (or
teachers) that correspond to the same person
...

Since SQL:1999 does not support multiple inheritance, the second condition actually prevents a person from being both a teacher and a student
...
Obviously it would be useful to model a situation where a person

...
Thus, it can be useful to remove the second consistency constraint
...
3
...

Subtables can be stored in an efﬁcient manner without replication of all inherited
ﬁelds, in one of two ways:
• Each table stores the primary key (which may be inherited from a parent table)
and the attributes deﬁned locally
...

• Each table stores all inherited and locally deﬁned attributes
...
Access to all attributes of a tuple is faster,
since a join is not required
...

9
...
3 Overlapping Subtables
Inheritance of types should be used with care
...

Student may itself have subtypes such as UndergraduateStudent, GraduateStudent, and
PartTimeStudent
...

As Chapter 8 mentions, each of these categories is sometimes called a role
...
In the preceding example, we would have subtypes such as ForeignUndergraduateStudent, ForeignGraduateStudentFootballPlayer, and so on
...

A better approach in the context of database systems is to allow an object to have
multiple types, without having a most-speciﬁc type
...

For example, suppose we again have the type Person, with subtypes Student and
Teacher, and the corresponding table people, with subtables teachers and students
...

There is no need to have a type TeachingAssistant that is a subtype of both Student
and Teacher
...

We note, however, that SQL:1999 prohibits such a situation, because of consistency
requirement 2 from Section 9
...
2
...
We can of course create separate tables to represent the
information without using inheritance
...

9
...
An attribute of a type
can be a reference to an object of a speciﬁed type
...
The restriction of the
scope of a reference to tuples of a table is mandatory in SQL:1999, and it makes references behave like foreign keys
...
We can get the identiﬁer value of a tuple by means of a query
...

SQL:1999 adopts a different approach, one where the referenced table must have an
attribute that stores the identifier of the tuple
...
Here, oid is an attribute name, not a keyword
...
oid
instead of select ref(p)
...
The type of the self-referential attribute must be speciﬁed as part of the type
deﬁnition of the referenced table, and the table deﬁnition must specify that the reference is user generated:
create type Person
(name varchar(20),
address varchar(20))
ref using varchar(20)
create table people of Person
ref is oid user generated
When inserting a tuple in people, we must provide a value for the identiﬁer:
insert into people values
('01284567', 'John', '23 Coyote Run')
No other tuple for people or its supertables or subtables can have the same identiﬁer
...
When inserting a tuple for departments, we
can then use
insert into departments
values ('CS', 'John')

...
Let us start with a simple example: Find the title and the name of the publisher
of each book
...
name
from books
Notice that the ﬁeld name of the composite attribute publisher is referred to by a dot
notation
...
5
...
Consider the departments
table deﬁned earlier
...

Since head is a reference to a tuple in the people table, the attribute name in the
preceding query is the name attribute of the tuple from the people table
...
To ﬁnd
the name and address of the head of a department, we would require an explicit
join of the relations departments and people
...

9
...
2 Collection-Valued Attributes
We now consider how to handle collection-valued attributes
...
An expression evaluating to a collection can appear anywhere that a
relation name may appear, such as in a from clause, as the following paragraphs
illustrate
...

If we want to ﬁnd all books that have the word “database” as one of their keywords, we can use this query:
select title
from books
where 'database' in (unnest(keyword-set))
Note that we have used unnest(keyword-set) in a position where SQL without nested
relations would have required a select-from-where subexpression
...
If we know that a particular book has three authors, we could write:
select author-array[1], author-array[2], author-array[3]
from books
where title = 'Database System Concepts'
Now, suppose that we want a relation containing pairs of the form “title, author-name” for each book and each author of the book
...
title, A
...
author-array) as A
Since the author-array attribute of books is a collection-valued ﬁeld, it can be used in a
from clause, where a relation is expected
...
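The two kinds of query over collection-valued attributes, a membership test on a keyword set and the unnesting of an author array into (title, author) pairs, can be sketched in Python. The dictionary layout and function names are assumptions for illustration, and the author order within each array is assumed from the order shown in the books figure.

```python
books = [
    {"title": "Compilers", "author_array": ["Smith", "Jones"],
     "keyword_set": {"parsing", "analysis"}},
    {"title": "Networks", "author_array": ["Jones", "Frick"],
     "keyword_set": {"Internet", "Web"}},
]


def titles_with_keyword(relation, word):
    """Titles of books whose keyword set contains the given word."""
    return [b["title"] for b in relation if word in b["keyword_set"]]


def title_author_pairs(relation):
    """One (title, author) pair per book and per element of its author array."""
    return [(b["title"], author)
            for b in relation
            for author in b["author_array"]]


parsing_titles = titles_with_keyword(books, "parsing")
pairs = title_author_pairs(books)
```

The second comprehension iterates over the author array exactly where a relation would appear in a from clause, which is the idea behind using a collection-valued field there.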
5
...
The books relation has two attributes, author-array and
keyword-set, that are collections, and two attributes, title and publisher, that are not
...
We can use the following query to carry out
the task:
select title, A as author, publisher
...
branch
as pub-branch, K as keyword
from books as B, unnest(B
...
keyword-set) as K
The variable B in the from clause is declared to range over books
...
Figure 9
...
1)
shows an instance of the books relation, and Figure 9
...

The reverse process of transforming a 1NF relation into a nested relation is called
nesting
...
In the normal
use of grouping in SQL, a temporary multiset relation is (logically) created for each
group, and an aggregate function is applied on the temporary relation
...
Suppose that we are given a 1NF relation ﬂat-books, as in Figure 9
...
The
following query nests the relation on the attribute keyword:
select title, author, Publisher(pub-name, pub-branch) as publisher,
set(keyword) as keyword-set
from flat-books
group by title, author, publisher
The result of the query on the books relation from Figure 9
...
4
...
publisher (pub-name, pub-branch)  keyword-set
(McGraw-Hill, New York)           {parsing, analysis}
(McGraw-Hill, New York)           {parsing, analysis}
(Oxford, London)                  {Internet, Web}
(Oxford, London)                  {Internet, Web}

A partially nested version of the flat-books relation
...
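Nesting on the keyword attribute by grouping can be sketched in Python. The flat tuples are built from the values shown in the figures of this section; the function name `nest_on_keyword` and the tuple layout are assumptions made for illustration.

```python
# Flat 1NF relation: (title, author, pub_name, pub_branch, keyword).
flat_books = [
    (title, author, pub_name, pub_branch, keyword)
    for title, authors, (pub_name, pub_branch), keywords in [
        ("Compilers", ["Smith", "Jones"], ("McGraw-Hill", "New York"),
         ["parsing", "analysis"]),
        ("Networks", ["Jones", "Frick"], ("Oxford", "London"),
         ["Internet", "Web"]),
    ]
    for author in authors
    for keyword in keywords
]


def nest_on_keyword(flat_rows):
    """Group by (title, author, publisher) and collect the keywords into a set."""
    groups = {}
    for title, author, pub_name, pub_branch, keyword in flat_rows:
        key = (title, author, (pub_name, pub_branch))
        groups.setdefault(key, set()).add(keyword)
    return [(title, author, publisher, keyword_set)
            for (title, author, publisher), keyword_set in groups.items()]


partially_nested = nest_on_keyword(flat_books)
```

The eight flat tuples collapse into four, one per (title, author) combination, each carrying the full keyword set; the author attribute is still flat, so the result is only partially nested.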
2 back to the nested table books in Figure 9
...
The following query, which performs the same task as the previous query,
illustrates this approach
...
title = O
...
title = O
...
Observe that the attribute
O
...
An advantage of
this approach is that an order by clause can be used in the nested query, to generate
results in a desired order
...
Without such an ordering, arrays and lists would not be uniquely
determined
...

The extensions we have shown for nesting illustrate features from some proposals
for extending SQL, but are not part of any standard currently
...
9
...
These can be
deﬁned either by the procedural component of SQL:1999, or by an external programming language such as Java, C, or C++
...
Several database systems support their own procedural languages, such as PL/SQL in Oracle and Transact-SQL in
Microsoft SQL Server
...

9
...
1 SQL Functions and Procedures
Suppose that we want a function that, given the title of a book, returns the count of
the number of authors, using the 4NF schema
...
title = title
return a-count;
end
This function can be used in a query that returns the titles of all books that have
more than one author:
select title
from books4
where author-count(title) > 1
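The behavior of the author-count function and of the query that uses it can be approximated in Python with the standard `sqlite3` module. This is a sketch under stated assumptions: SQLite has no SQL:1999 procedural functions, so the function lives in Python, the query reads titles from the authors table rather than books4, and the sample rows (including the title "Sample Solo Book") are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table authors (title text, author text);
    insert into authors values
        ('Compilers', 'Smith'),
        ('Compilers', 'Jones'),
        ('Sample Solo Book', 'Frick');
""")


def author_count(title):
    """Return the number of authors recorded for the given title."""
    (a_count,) = conn.execute(
        "select count(author) from authors where authors.title = ?", (title,)
    ).fetchone()
    return a_count


# Materialize the titles first, then apply the function to each one.
titles = [t for (t,) in
          conn.execute("select distinct title from authors order by title")]
multi_author_titles = [t for t in titles if author_count(t) > 1]
```

The list comprehension plays the role of the where clause: it keeps only the titles for which the function returns a count greater than one.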
Functions are particularly useful with specialized data types such as images and
geometric objects
...
Functions
may be written in an external language such as C, as we see in Section 9
...
2
...

Methods, which we saw in Section 9
...
2, can be viewed as functions associated
with structured types
...
Thus, the body of the
method can refer to an attribute a of the value by using self
...
These attributes can
also be updated by the method
...
The author-count function could instead be written as a procedure:

create procedure author-count-proc(in title varchar(20), out a-count integer)
begin
select count(author) into a-count
from authors
where authors
...
The name,
along with the number of arguments, is used to identify the procedure
...

9
...
2 External Language Routines
SQL:1999 allows us to deﬁne functions in a programming language such as C or C++
...
An example of the use of such functions would be to perform a complex arithmetic computation on the data in a tuple
...

They must therefore have several extra parameters: an sqlstate value to indicate failure/success status, a parameter to store the return value of the function, and indicator variables for each parameter/function result to indicate if the value is null
...

Functions deﬁned in a programming language and compiled outside the database
system may be loaded and executed with the database system code
...
Doing so carries the risk that a bug in the program can corrupt the database internal
structures, and can bypass the access-control functionality of the database system
...

Database systems that are concerned about security would typically execute such
code as part of a separate process, communicate the parameter values to it, and fetch
results back, via interprocess communication
...
The sandbox prevents
the Java code from carrying out any reads or updates directly on the database
...
6
...
The part of the SQL:1999 standard that deals with these constructs is called the Persistent Storage Module
(PSM)
...
end, and it may contain multiple SQL statements between the begin and the end
...
6
...

SQL:1999 supports while statements and repeat statements by this syntax:
declare n integer default 0;
while n < 10 do
set n = n + 1;
end while
repeat
set n = n - 1;
until n = 0
end repeat
This code does not do anything useful; it is simply meant to show the syntax of while
and repeat loops
...

There is also a for loop, which permits iteration over all results of a query:
declare n integer default 0;
for r as
select balance from account
where branch-name = 'Perryridge'
do
set n = n+ r
...
It is possible to give a name to the cursor, by inserting the text cn cursor for just
after the keyword as, where cn is the name we wish to give to the cursor
...
The statement leave can be used to exit the loop, while iterate starts on
the next tuple, from the beginning of the loop, skipping the remaining statements
...
balance < 1000
then set l = l+ r
...
balance < 5000
then set m = m+ r
...
balance
end if
This code assumes that l, m, and h are integer variables, and r is a row variable
...
balance” in the for loop of the preceding paragraph by
the if-then-else code, the loop would compute the total balances of accounts that fall
under the low, medium, and high balance categories respectively
...
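
The for loop and the if-then-else statement can be mimicked in a host language. The sketch below is a minimal Python analogue, assuming an account table with hypothetical sample rows and column names written with underscores; the variables low, medium, and high play the roles of l, m, and h.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table account (account_number text, branch_name text, balance integer)"
)
conn.executemany(
    "insert into account values (?, ?, ?)",
    [   # hypothetical sample data
        ("A-102", "Perryridge", 400),
        ("A-201", "Perryridge", 900),
        ("A-218", "Perryridge", 3000),
        ("A-222", "Perryridge", 7000),
        ("A-101", "Downtown", 500),
    ],
)

low = medium = high = 0
# Iterate over all results of the query, one row at a time.
cursor = conn.execute(
    "select balance from account where branch_name = 'Perryridge'"
)
for (balance,) in cursor:
    if balance < 1000:
        low += balance
    elif balance < 5000:
        medium += balance
    else:
        high += balance

print(low, medium, high)
```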

Finally, SQL:1999 includes the concept of signaling exception conditions, and declaring handlers that can handle the exception, as in this code:
declare out-of-stock condition
declare exit handler for out-of-stock
begin

...
The handler says that if the condition arises, the action to be taken
is to exit the enclosing begin end statement
...
In addition to explicitly deﬁned conditions, there are also predeﬁned conditions such as sqlexception, sqlwarning, and not found
...
5 provides a larger example of the use of SQL:1999 procedural constructs
...
The relation manager(empname, mgrname), specifying who works directly for which manager, is assumed to be available
...
We saw how to express such a query by recursion
in Chapter 5 (Section 5
...
6)
...
The procedure inserts
all employees who directly work for mgr into newemp before the repeat loop
...
Next, it computes employees
who work for those in newemp, except those who have already been found to be
employees of mgr, and stores them in the temporary table temp
...

-- The relation manager(empname, mgrname) specifies who directly
-- works for whom
...
empname
from newemp, manager
where newemp
...
mgrname;
)
except (
select empname
from empl
);
delete from newemp;
insert into newemp
select *
from temp;
delete from temp;
until not exists (select * from newemp)
end repeat;
end
Figure 9
...

The repeat loop terminates when it
ﬁnds no new (indirect) employees
...
For
example, if a works for b, b works for c, and c works for a, there is a cycle
...
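
The iteration that findEmpl performs can be sketched outside SQL as well. The Python function below is an analogue under stated assumptions, not the procedure itself: sets stand in for the relations manager, newemp, temp, and empl, and the function name find_empl is ours.

```python
def find_empl(manager, mgr):
    """Return everyone who works directly or indirectly for mgr.

    manager is a set of (empname, mgrname) pairs.
    """
    empl = set()
    # Before the loop: employees who work directly for mgr.
    newemp = {emp for (emp, boss) in manager if boss == mgr}
    while True:                      # repeat ... until: body runs at least once
        empl |= newemp               # insert into empl
        # Employees of those in newemp, except those already found.
        temp = {emp for (emp, boss) in manager if boss in newemp} - empl
        newemp = temp                # replace newemp by temp
        if not newemp:               # until not exists (select * from newemp)
            break
    return empl


cyclic = {("a", "b"), ("b", "c"), ("c", "a")}
print(sorted(find_empl(cyclic, "a")))
```

Because employees already in empl are subtracted before the next round, each employee enters newemp at most once, so the loop stops even on the cyclic relation above.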
For instance, suppose we have a relation flights(to, from) that says which
cities can be reached from which other cities by a direct flight
...
All we have to do is to replace manager by ﬂight and replace
attribute names correspondingly
...

9
...
Database systems of both types are on the
market, and a database designer needs to choose the kind of system that is appropriate to the needs of the application
...
The declarative nature and limited power (compared to a
programming language) of the SQL language provide good protection of data from
programming errors, and make high-level optimizations, such as reducing I/O, relatively easy
...
) Object-relational systems aim at making data modeling and querying easier by using complex data types
...

A declarative language such as SQL, however, imposes a signiﬁcant performance
penalty for certain kinds of applications that run primarily in main memory, and
that perform a large number of accesses to the database
...
They
provide low-overhead access to persistent data, and eliminate the need for data translation if the data are to be manipulated by a programming language
...
Typical applications include CAD databases
...
For example, some object-oriented database systems built around a
persistent programming language are implemented on top of a relational database
system
...

To do so, the complex data types supported by object-relational
systems need to be translated to the simpler type system of relational databases
...
For instance, multivalued attributes in the E-R model correspond to set-valued attributes in the object-relational
model
...
ISA hierarchies
in the E-R model correspond to table inheritance in the object-relational model
...
9,
can be used, with some extensions, to translate object-relational data to relational
data
...
8 Summary
• The object-relational data model extends the relational data model by providing a richer type system including collection types, and object orientation
...

• Collection types include nested relations, sets, multisets, and arrays, and the
object-relational model permits attributes of a table to be collections
...

• We saw a variety of features of the extended data-deﬁnition language, as
well as the query language, and in particular support for collection-valued
attributes, inheritance, and tuple references
...

• Object-relational database systems (that is, database systems based on the
object-relational model) provide a convenient migration path for users of relational databases who wish to use object-oriented features
...

• We discussed differences between persistent programming languages and
object-relational systems, and mentioned criteria for choosing between them
...

9
...
1 Consider the database schema
Emp = (ename, setof(Children), setof(Skills))
Children = (name, Birthday)
Birthday = (day, month, year)
Skills = (type, setof(Exams))
Exams = (year, city)
Assume that attributes of type setof(Children), setof(Skills), and setof(Exams),
have attribute names ChildrenSet, SkillsSet, and ExamsSet, respectively
...
Write the following queries in
SQL:1999 (with the extensions described in this chapter)
...
Find the names of all employees who have a child who has a birthday in
March
...
Find those employees who took an examination for the skill type “typing”
in the city “Dayton”
...
List all skill types in the relation emp
...
2 Redesign the database of Exercise 9
...
List any functional or multivalued dependencies that you assume
...

9
...
3
...
Recall the constraints on subtables, and give all constraints that must be imposed on the relational schema
so that every database instance of the relational schema can also be represented
by an instance of the schema with inheritance
...

9
...
4 A car-rental company maintains a vehicle database for all vehicles in its current
ﬂeet
...
Special data are included
for certain types of vehicles:
• Trucks: cargo capacity
• Sports cars: horsepower, renter age requirement
• Vans: number of passengers
• Off-road vehicles: ground clearance, drivetrain (four- or two-wheel drive)

Construct an SQL:1999 schema deﬁnition for this database
...

9
...
Under what
circumstances would you choose to use a reference type?
9
...
11, which contains composite, multivalued
and derived attributes
...
Give an SQL:1999 schema deﬁnition corresponding to the E-R diagram
...

b
...

9
...
17, which
contains specializations
...
8 Consider the relational schema shown in Figure 3
...

a
...

b
...
10 on the above schema, using
SQL:1999
...
9 Consider an employee database with two relations
employee (employee-name, street, city)
works (employee-name, company-name, salary)
where the primary keys are underlined
...

a
...

b
...

9
...
6
...

9
...
Under what circumstances would
you use each of these features?


9
...
For each of the following applications, state what
type of database system (relational, persistent-programming-language-based
OODB, or object-relational; do not specify a commercial product) you would recommend
...

a
...
A system to track contributions made to candidates for public ofﬁce
c
...
Various algebraic query languages are presented in Fischer and Thomas
[1983], Zaniolo [1983], Ozsoyoglu et al
...
[1988]
...
[1989]
...
[1996]
...
POSTGRES (Stonebraker and Rowe [1986] and Stonebraker [1986a]) was an early implementation of an
object-relational system
...
The Iris database system from Hewlett-Packard (Fishman
et al
...
[1990]) provides object-oriented extensions on top
of a relational database system
...

[1989] is an object-oriented extension of SQL implemented in the O2 object-oriented
database system (Deux [1991])
...
XSQL is an
object-oriented extension of SQL proposed by Kifer et al
...

SQL:1999 was the product of an extensive (and long-delayed) standardization effort, which originally started off as adding object-oriented features to SQL and ended
up adding many more features, such as control ﬂow, as we have seen
...
ansi
...
However,
standards documents are very hard to read, and are best left to SQL:1999 implementers
...

Tools
The Informix database system provides support for many object-relational features
...
0
...
IBM DB2 supports many of the
SQL:1999 features
...
CHAPTER 10
...
In
fact, like the Hyper-Text Markup Language (HTML) on which the World Wide Web is
based, XML has its roots in document management, and is derived from a language
for structuring large documents known as the Standard Generalized Markup Language
(SGML)
...
It is particularly
useful as a data format when an application must communicate with another application, or integrate information from several other applications
...
In this chapter, we introduce XML and discuss both the management of XML data with database techniques and the exchange of data formatted
as XML documents
...
1 Background
To understand XML, it is important to understand its roots as a document markup
language
...
For example, a writer creating text that will eventually
be typeset in a magazine may want to make notes about how the typesetting should
be done
...
In electronic document processing,
a markup language is a formal description of what part of the document is content,
what part is markup, and what the markup means
...
For instance, with
functional markup, text representing section headings (for this section, the word
“Background”) would be marked up as being a section heading, instead of being
marked up as text to be printed in large size, bold font
...
It also helps
different parts of a large document, or different pages in a large Web site to be formatted in a uniform manner
...

For the family of markup languages that includes HTML, SGML, and XML, the
markup takes the form of tags enclosed in angle-brackets, <>
...
For example, the title of a document might be
marked up as follows
...
This feature is the key to XML’s major role in data representation and exchange, whereas HTML is used primarily for document formatting
...
1
...
These tags provide context for each
value and allow the semantics of the value to be identiﬁed
...
However, in spite of
this disadvantage, an XML representation has signiﬁcant advantages when it is used
to exchange data, for example, as part of a message:
• First, the presence of the tags makes the message self-documenting; that is, a
schema need not be consulted to understand the meaning of the text
...

• Second, the format of the document is not rigid
...
The ability to recognize and ignore unexpected tags allows the
format of the data to evolve over time, without invalidating existing applications
...

Just as SQL is the dominant language for querying relational data, XML is becoming
the dominant format for data exchange
...

10
...
An element is simply
a pair of matching start- and end-tags, and all the text that appears between them
...
In the example in Figure 10
...
Further, elements in an XML document must nest properly
...

...

is properly nested, whereas

...

...

While proper nesting is an intuitive property, we may deﬁne it more formally
...
Tags are properly nested if every start-tag has a unique
matching end-tag that is in the context of the same parent element
...
2
...
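
The nesting rule can be checked mechanically with a stack: each start-tag is pushed, and each end-tag must match the most recently opened element. The sketch below is a simplified checker under stated assumptions; it recognizes only plain start-, end-, and empty-element tags (no comments, CDATA sections, or processing instructions), and the function name properly_nested is ours.

```python
import re

# Matches <name ...>, </name>, and <name ... />; captures the optional
# leading slash, the tag name, and the optional trailing slash.
TAG = re.compile(r"<(/?)([A-Za-z][\w.-]*)[^<>]*?(/?)>")


def properly_nested(text):
    """Return True if every start-tag is closed inside its parent element."""
    open_tags = []
    for closing, name, empty in TAG.findall(text):
        if empty:                      # <name/> opens and closes at once
            continue
        if not closing:
            open_tags.append(name)     # start-tag: remember it
        elif not open_tags or open_tags.pop() != name:
            return False               # end-tag does not match innermost start-tag
    return not open_tags               # nothing may remain open


print(properly_nested("<account><balance>400</balance></account>"))
print(properly_nested("<account><balance>400</account></balance>"))
```

The first call reports a properly nested fragment; the second reports a violation, because balance is closed outside the account element in which it was opened.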

The ability to nest elements within other elements provides an alternative way to
represent information
...
3 shows a representation of the bank information
from Figure 10
...
The
nested representation makes it easy to ﬁnd all accounts of a customer, although it
would store account elements redundantly if they are owned by multiple customers
...
For instance, a shipping application would store the full address of sender
and receiver redundantly on a shipping document associated with each shipment,
whereas a normalized representation may require a join of shipping records with a
company-address relation to get address information
...
For instance, the
type of an account can be represented as an attribute, as in Figure 10
...
[Figure residue; surviving element values: A-102, Perryridge, 400]
...
2 Mixture of text with subelements
...

[Figure residue; surviving element values: Johnson, Alma, Palo Alto; A-101, Downtown, 500; A-201, Brighton, 900; Hayes, Main, Harrison; A-102, Perryridge, 400]
Figure 10
...

The attributes of an element appear as name=value pairs before the closing “>” of a tag
...
Furthermore, attributes can appear only once in
a given tag, unlike subelements, which may be repeated
...
However, in database and data exchange applications of XML, this distinction is less relevant, and the choice of representing data as
an attribute or a subelement is frequently arbitrary
...

Since XML documents are designed to be exchanged between applications, a namespace mechanism has been introduced to allow organizations to specify globally
unique names to be used as element tags in documents
...
[Figure residue; surviving element values: A-102, Perryridge, 400]
...
4 Use of attributes
...
The bank may use
a Web URL such as
http://www
...
com
as a unique identiﬁer
...

In Figure 10
...
The abbreviation can
then be used in various element tags, as illustrated in the ﬁgure
...
Different elements can then be associated with different namespaces
...
Elements without an explicit namespace preﬁx would then belong to
the default namespace
...
So that we can do so, XML allows this construct:
<![CDATA[ · · · ]]>
Because it is enclosed within CDATA, the text is treated as normal text
data, not as a tag
...
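
A small experiment shows the effect of CDATA. The sketch below parses a hypothetical document with Python's xml.etree.ElementTree; the element name note and its contents are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# The text inside the CDATA section looks like markup but is ordinary text.
doc = "<note><![CDATA[<account> A-102 </account>]]></note>"
note = ET.fromstring(doc)

print(repr(note.text))     # the literal string, angle brackets included
print(len(list(note)))     # number of subelements of <note>
```

The parser hands back the bracketed string as the text of note and creates no account subelement.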

...
com”>

...

Figure 10
...


10
...
3 XML Document Schema
Databases have schemas, which are used to constrain what information can be stored
in the database and to constrain the data types of the stored information
...
While such freedom may occasionally be acceptable given the self-describing nature of the data format, it is not
generally useful when XML documents must be processed automatically as part of
an application, or even when large amounts of related data are to be formatted in
XML
...

10
...
1 Document Type Deﬁnition
The document type deﬁnition (DTD) is an optional part of an XML document
...
However, the DTD does not in fact constrain types
in the sense of basic types like integer or string
...
The DTD is primarily a list of
rules for what pattern of subelements appear within an element
...
6 shows
a part of an example DTD for a bank information document; the XML document in
Figure 10
...

Each declaration is in the form of a regular expression for the subelements of an
element
...
6, a bank element consists of one or more
account, customer, or depositor elements; the | operator speciﬁes “or” while the +
operator speciﬁes “one or more
...

]>

Figure 10
...


The account element is defined to contain subelements account-number, branch-name and balance (in that order)
...

Finally, the elements account-number, branch-name, balance, customer-name, customer-street, and customer-city are all declared to be of type #PCDATA
...
” Two other special type declarations are empty, which says that the element has
no contents, and any, which says that there is no constraint on the subelements of the
element; that is, any elements, even those not mentioned in the DTD, can occur as
subelements of the element
...

The allowable attributes for each element are also declared in the DTD
...
Attributes may be specified to be of
type CDATA, ID, IDREF, or IDREFS; the type CDATA simply says that the attribute contains character data, while the other three are not so simple; they are explained in
more detail shortly
...

Attributes must have a type declaration and a default declaration
...
If an attribute has a default value, for every
element that does not specify a value for the attribute, the default value is filled in
automatically when the XML document is read.

An attribute of type ID provides a unique identifier for the element; a value that
occurs in an ID attribute of an element must not occur in any other element in the
same document
...

<!ATTLIST account
account-number ID #REQUIRED
owners IDREFS #REQUIRED >

<!ATTLIST customer
customer-id ID #REQUIRED
accounts IDREFS #REQUIRED >
· · · declarations for branch, balance, customer-name,
customer-street and customer-city · · ·
]>
Figure 10
...

The type
IDREFS allows a list of references, separated by spaces
...
7 shows an example DTD in which customer account relationships are
represented by ID and IDREFS attributes, instead of depositor records
...
The customer elements have a new identiﬁer attribute called customer-id
...
Each account element has an attribute
owners, of type IDREFS, which is a list of owners of the account
...
8 shows an example XML document based on the DTD in Figure 10
...

Note that we use a different set of accounts and customers from our earlier example,
in order to illustrate the IDREFS feature better
...
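
To make the role of ID and IDREFS attributes concrete, the sketch below resolves references by hand in Python. The document is a hypothetical instance that uses the attribute names from the DTD (account-number, owners, customer-id, accounts); the identifier values, and the helper names by_id, owners_of, and accounts_of, are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

DOC = """<bank-2>
  <account account-number="A-401" owners="C100 C102">
    <branch-name>Downtown</branch-name><balance>500</balance>
  </account>
  <account account-number="A-402" owners="C102 C101">
    <branch-name>Perryridge</branch-name><balance>900</balance>
  </account>
  <customer customer-id="C100" accounts="A-401"><customer-name>Joe</customer-name></customer>
  <customer customer-id="C101" accounts="A-402"><customer-name>Lisa</customer-name></customer>
  <customer customer-id="C102" accounts="A-401 A-402"><customer-name>Mary</customer-name></customer>
</bank-2>"""

root = ET.fromstring(DOC)

# Index every element by the value of its ID-typed attribute.
by_id = {}
for elem in root:
    ident = elem.get("account-number") or elem.get("customer-id")
    if ident in by_id:
        raise ValueError("ID value used twice: " + ident)
    by_id[ident] = elem


def owners_of(account):
    """Follow the space-separated IDREFS list in the owners attribute."""
    return [by_id[ref] for ref in account.get("owners").split()]


def accounts_of(customer):
    """Follow the space-separated IDREFS list in the accounts attribute."""
    return [by_id[ref] for ref in customer.get("accounts").split()]


print([c.findtext("customer-name") for c in owners_of(by_id["A-401"])])
print([a.get("account-number") for a in accounts_of(by_id["C102"])])
```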

[Figure residue; surviving element values: Downtown, 500; Perryridge, 900; Joe, Monroe, Madison; Lisa, Mountain, Murray Hill; Mary, Erin, Newark]

Figure 10
...


Document type deﬁnitions are strongly connected to the document formatting heritage of XML
...
Nevertheless, a tremendous number of data exchange formats are being deﬁned in terms of DTDs, since they were
part of the original standard
...

• Individual text elements and attributes cannot be further typed
...
The lack of
such constraints is problematic for data processing and exchange applications,
which must then contain code to verify the types of elements and attributes
...
Order is seldom important for data exchange (unlike document layout,
where it is crucial)
...
6 permits the speciﬁcation of unordered collections of tags, it is much more difﬁcult to specify that each tag may only appear
once
...
Thus, there is no way to specify
the type of element to which an IDREF or IDREFS attribute should refer
...
7 does not prevent the “owners” attribute of an
account element from referring to other accounts, even though this makes no
sense
...
3
...
We present here an example of XMLSchema, and list
some areas in which it improves DTDs, without giving full details of XMLSchema’s
syntax
...
9 shows how the DTD in Figure 10
...

The ﬁrst element is the root element bank, whose type is declared later
...
Observe the use
of types xsd:string and xsd:decimal to constrain the types of data elements
...
XMLSchema can deﬁne the minimum and
maximum number of occurrences of subelements by using minOccurs and maxOccurs
...

Among the beneﬁts that XMLSchema offers over DTDs are these:
• It allows user-deﬁned types to be created
...

9

XMLSchema version of DTD from Figure 10
...

• It allows types to be restricted to create specialized types, for instance by specifying minimum and maximum values
...

• It is a superset of DTDs
...

• It is integrated with namespaces to allow different parts of a document to
conform to different schemas
...
9 shows
...

10
...
In particular, tools for querying and transformation of XML data are essential
to extract information from large bodies of XML data, and to convert data between
different representations (schemas) in XML
...
As a result, querying
and transformation can be combined into a single tool
...

• XSLT was designed to be a transformation language, as part of the XSL style
sheet system, which is used to control the formatting of XML data into HTML
or other print or display languages
...
Furthermore, it is currently the most widely available language for manipulating
XML data
...
XQuery
combines features from many of the earlier proposals for querying XML, in
particular the language Quilt
...
An XML document is modeled as a tree, with nodes corresponding to elements and attributes
...
Correspondingly, each node (whether attribute or element), other than the root element,
has a parent node, which is an element
...
The terms
parent, child, ancestor, descendant, and siblings are interpreted in the tree model of
XML data
...

Elements containing text broken up by intervening subelements can have multiple
text node children
...
Since such structures are not commonly used in database data, we shall assume that elements do not
contain both text and subelements
...

10
...
1 XPath
XPath addresses parts of an XML document by means of path expressions
...
5
...

A path expression in XPath is a sequence of location steps separated by “/” (instead of the “
...
The result of a path expression is a set of values
...
8, the XPath
expression

/bank-2/customer/name
would return these elements:
<name>Joe</name>
<name>Lisa</name>
<name>Mary</name>
The expression
/bank-2/customer/name/text()
would return the same names, but without the enclosing tags
...
(Note
that this is an abstract root “above” <bank-2>, which is the document tag
...
As a path expression is evaluated, the result of
the path at any point consists of a set of nodes from the document
...
Since multiple children can have the same name, the number of nodes in the node
set can increase or decrease with each step
...
For instance, /bank-2/account/@account-number returns a set
of all values of account-number attributes of account elements
...
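
These path expressions can be tried out with Python's xml.etree.ElementTree, which implements a limited subset of XPath. In the sketch below the document is a hypothetical one whose customers have name subelements, matching the path used above; paths are written relative to the bank-2 element because ElementTree evaluates them from a given element rather than from an abstract root.

```python
import xml.etree.ElementTree as ET

DOC = """<bank-2>
  <account account-number="A-401"><balance>500</balance></account>
  <account account-number="A-402"><balance>900</balance></account>
  <customer><name>Joe</name></customer>
  <customer><name>Lisa</name></customer>
  <customer><name>Mary</name></customer>
</bank-2>"""

root = ET.fromstring(DOC)                      # the bank-2 element

# Analogue of /bank-2/customer/name: a list of name elements.
name_elements = root.findall("customer/name")

# Analogue of /bank-2/customer/name/text(): the values without the tags.
names = [elem.text for elem in name_elements]

# Analogue of /bank-2/account/@account-number: attribute values.
account_numbers = [acct.get("account-number") for acct in root.findall("account")]

print(ET.tostring(name_elements[0], encoding="unicode"))
print(names)
print(account_numbers)
```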

XPath supports a number of other features:
• Selection predicates may follow any step in a path, and are contained in square
brackets
...

We can test the existence of a subelement by listing it without any comparison operation; for instance, if we removed just “> 400” from the above, the
expression would return account numbers of all accounts that have a balance
subelement, regardless of its value
...
For example, the path expression
/bank-2/account/[customer/count()> 2]
returns accounts with more than 2 customers
...
) can be used for negation
...
The function id can even be applied on sets of references, or even
strings containing multiple references separated by blanks, such as IDREFS
...

• The | operator allows expression results to be unioned
...
However, the | operator cannot
be nested inside other operators
...
For instance, the expression /bank-2//name ﬁnds any name element anywhere under
the /bank-2 element, regardless of the element in which it is contained
...

• Each step in the path need not select from the children of the nodes in the
current node set
...
We omit details, but note that “//”, described above, is a short form for
specifying “all descendants,” while “
...
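
Some of these features can be imitated with Python's xml.etree.ElementTree. Its XPath subset supports existence predicates and the descendant step, but not numeric comparisons, so the sketch below performs the “> 400” test in Python. The document, including the branch element and the account without a balance, is hypothetical.

```python
import xml.etree.ElementTree as ET

DOC = """<bank-2>
  <account account-number="A-401"><balance>500</balance></account>
  <account account-number="A-402"><balance>300</balance></account>
  <account account-number="A-403"/>
  <customer><name>Joe</name></customer>
  <branch><name>Downtown</name></branch>
</bank-2>"""

root = ET.fromstring(DOC)

# Existence predicate: accounts that have a balance subelement at all.
with_balance = [a.get("account-number") for a in root.findall("account[balance]")]

# Selection predicate balance > 400, evaluated in Python because
# ElementTree's XPath subset has no numeric comparison.
over_400 = [
    a.get("account-number")
    for a in root.findall("account[balance]")
    if float(a.findtext("balance")) > 400
]

# Descendant step: every name element anywhere below bank-2.
all_names = [elem.text for elem in root.findall(".//name")]

print(with_balance, over_400, all_names)
```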

10
...
2 XSLT
A style sheet is a representation of formatting options for a document, usually stored
outside the document itself, so that formatting is separate from content
...

...
10

Using XSLT to wrap results in new XML elements
...
The XML Stylesheet
Language (XSL) was originally designed for generating HTML from XML, and is thus
a logical extension of HTML style sheets
...
1 XSLT transformations are quite powerful, and in fact XSLT can even
act as a query language
...

In their basic form, templates allow selection of nodes in an XML tree by an XPath
expression
...
While XSLT can
be used as a query language, its syntax and semantics are quite dissimilar from those
of SQL
...
Consider
this XSLT code:

...
The ﬁrst template matches customer elements that occur as children of
the bank-2 root element
...
The ﬁrst template
outputs the value of the customer-name subelement; note that the value does not
contain the element tag
...
This is required because the default behavior of XSLT on subtrees of the input document that do not match any
template is to copy the subtrees to the output document
...
Figure 10
...

1
...
Formatting is not relevant from a database perspective, so we do not
cover it here
...
Figure 10
...

Structural recursion is a key part of XSLT
...
The idea of structural recursion is this: When a template matches an element in the tree structure, XSLT can use structural recursion to
apply template rules recursively on subtrees, instead of just outputting a value
...

For example, the results of our previous query can be placed in a surrounding
element by the addition of a rule using xsl:apply-templates, as in Figure 10
...
Without recursion forced by the clause, the template
would output , and then apply the other templates on
the subelements
...

XSLT provides a feature called keys, which permits lookup of elements by using
values of subelements or attributes; the goal is similar to that of the id() function in
XPath, but keys permit attributes other than the ID attributes to be used
...
The match attribute speciﬁes
which nodes the key applies to
...
Note that the expression need not be unique to
an element; that is, more than one element may have the same expression value
...
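
Conceptually, a key is an index from expression values to lists of nodes. The sketch below imitates that idea in Python; it is not XSLT, and the function name build_key, its parameters, and the sample document are assumptions made for the example.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

DOC = """<bank-2>
  <account account-number="A-401"><branch-name>Downtown</branch-name></account>
  <account account-number="A-402"><branch-name>Perryridge</branch-name></account>
  <account account-number="A-403"><branch-name>Downtown</branch-name></account>
</bank-2>"""


def build_key(root, match, use):
    """Index the nodes with tag `match` by the value that `use` computes."""
    index = defaultdict(list)
    for node in root.iter(match):
        index[use(node)].append(node)   # several nodes may share one value
    return index


root = ET.fromstring(DOC)
acctno = build_key(root, "account", lambda n: n.get("account-number"))
by_branch = build_key(root, "account", lambda n: n.findtext("branch-name"))

print(len(acctno["A-401"]))
print([n.get("account-number") for n in by_branch["Downtown"]])
```

The second index shows why a key lookup returns a set of nodes: two accounts share the value Downtown.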

Keys can be subsequently used in templates as part of any pattern through the
key function
...

...
12

Joins in XSLT
...
Thus, the XML node for account “A-401” can be
referenced as key(“acctno”, “A-401”)
...
12
...
1
...

The result of the query consists of pairs of customer and account elements enclosed
within cust-acct elements
...
A simple example shows how xsl:sort would be
used in our style sheet to return customer elements sorted by name:

...
The xsl:sort directive within the xsl:apply-templates element causes nodes to be sorted before they are processed by the next set of templates
...

10
...
3 XQuery
The World Wide Web Consortium (W3C) is developing XQuery, a query language
for XML
...
The XQuery language derives from an XML query language
called Quilt; most of the XQuery features we outline here are part of Quilt
...
4
...

Unlike XSLT, XQuery does not represent queries in XML
...
The for section gives a series
of variables that range over the results of XPath expressions
...
The let clause simply allows complicated expressions to be assigned
to variable names for simplicity of representation
...

Finally, the return section allows the construction of results in XML
...
8, which uses ID and IDREFS:
for $x in /bank-2/account
let $acctno := $x/@account-number
where $x/balance > 400
return $acctno
Since this query is simple, the let clause is not essential, and the variable $acctno
in the return clause could be replaced with $x/@account-number
...
Thus, an equivalent query may have only for and return clauses:
for $x in /bank-2/account[balance > 400]
return $x/@account-number
However, the let clause simpliﬁes complex queries
...
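
The clauses of this query map naturally onto an ordinary loop. The sketch below is a Python analogue over a hypothetical document, with comments marking which line plays the role of each clause; it illustrates the evaluation order only and is not an XQuery implementation.

```python
import xml.etree.ElementTree as ET

DOC = """<bank-2>
  <account account-number="A-401"><balance>500</balance></account>
  <account account-number="A-402"><balance>900</balance></account>
  <account account-number="A-403"><balance>300</balance></account>
</bank-2>"""

root = ET.fromstring(DOC)

result = []
for x in root.findall("account"):               # for $x in /bank-2/account
    acctno = x.get("account-number")            # let $acctno := $x/@account-number
    if float(x.findtext("balance")) > 400:      # where $x/balance > 400
        result.append(acctno)                   # return $acctno

print(result)
```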
The function distinct, applied on a multiset, returns a set without duplicates
...
XQuery also provides aggregate functions
such as sum and count that can be applied on collections such as sets and multisets
...
Note also that variables assigned by let clauses may be set- or
multiset-valued, if the path expression on the right-hand side returns a set or multiset
value
...
The join of depositor, account and customer elements in Figure 10
...
4
...

for $a in /bank/account,
$c in /bank/customer,
$d in /bank/depositor
where $a/account-number = $d/account-number
and $c/customer-name = $d/customer-name
return $c $a
The same query can be expressed with the selections speciﬁed as XPath selections:
for $a in /bank/account,
$c in /bank/customer,
$d in /bank/depositor[account-number = $a/account-number
and customer-name = $c/customer-name]
return $c $a
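
Either form of the join amounts to nested iteration over the three kinds of elements. The sketch below spells this out in Python over a hypothetical bank document; the data values and the choice to return (customer-name, account-number) pairs instead of XML elements are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

DOC = """<bank>
  <account><account-number>A-101</account-number></account>
  <account><account-number>A-102</account-number></account>
  <account><account-number>A-201</account-number></account>
  <customer><customer-name>Johnson</customer-name></customer>
  <customer><customer-name>Hayes</customer-name></customer>
  <depositor><account-number>A-101</account-number><customer-name>Johnson</customer-name></depositor>
  <depositor><account-number>A-201</account-number><customer-name>Johnson</customer-name></depositor>
  <depositor><account-number>A-102</account-number><customer-name>Hayes</customer-name></depositor>
</bank>"""

bank = ET.fromstring(DOC)

pairs = [
    (c.findtext("customer-name"), a.findtext("account-number"))
    for a in bank.findall("account")            # $a in /bank/account
    for c in bank.findall("customer")           # $c in /bank/customer
    for d in bank.findall("depositor")          # $d in /bank/depositor
    if a.findtext("account-number") == d.findtext("account-number")
    and c.findtext("customer-name") == d.findtext("customer-name")
]

print(sorted(pairs))
```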
XQuery FLWR expressions can be nested in the return clause, in order to generate
element nestings that do not appear in the source document
...
5
...

For instance, the XML structure shown in Figure 10
...
1 by this
query:

for $c in /bank/customer
return

$c/*
for $d in /bank/depositor[customer-name = $c/customer-name],
$a in /bank/account[account-number=$d/account-number]
return $a

The query also introduces the syntax $c/*, which refers to all the children of the node,
which is bound to the variable $c
...

Path expressions in XQuery are based on path expressions in XPath, but XQuery
provides some extensions (which may eventually be added to XPath itself)
...
The operator can be applied on a value of type
IDREFS to get a set of elements
...

We leave details to the reader
...

For instance, this query outputs all customer elements sorted by the name subelement:


for $c in /bank/customer
return $c/* sortby(name)
To sort in descending order, we can use sortby(name descending)
...
For instance, we can get a nested
representation of bank information sorted in customer name order, with accounts of
each customer sorted by account number, as follows
...
For instance, the built-in function document(name) returns the root of a named
document; the root can then be used in a path expression to access the contents of the
document
...
XQuery also provides functions to convert between types
...

XQuery offers a variety of other features, such as if-then-else clauses, which can be
used within return clauses, and existential and universal quantiﬁcation, which can
be used in predicates in where clauses
...
Universal quantiﬁcation can be expressed by using
every in place of some
...
5 The Application Program Interface
With the wide acceptance of XML as a data representation and exchange format, software tools are widely available for manipulation of XML data
...


10
...
Programs may access parts of the document in a navigational fashion,
beginning with the root
...
We outline here some of the interfaces and methods in the Java
API for DOM, to give a ﬂavor of DOM
...

The Node interface provides methods such as getParentNode(), getFirstChild(), and
getNextSibling(), to navigate the DOM tree, starting with the root node
...
Attribute values of an element can be accessed by name, using the method getAttribute(name)
...
The method getData() on the Text node returns the text contents
...
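
As a rough illustration of this navigational style, the sketch below uses Python's standard xml.dom.minidom module rather than the Java API. minidom exposes the same DOM ideas as attributes (firstChild, nextSibling, parentNode, data) where the Java interface has getFirstChild(), getNextSibling(), getParentNode(), and getData(). The sample document and its branch attribute are hypothetical and are not taken from the figures in this chapter.

```python
from xml.dom.minidom import parseString

# Hypothetical sample document, loosely modelled on the bank example.
SAMPLE = (
    '<bank>'
    '<account branch="Perryridge"><account-number>A-401</account-number></account>'
    '<account branch="Brighton"><account-number>A-402</account-number></account>'
    '</bank>'
)

def account_numbers(xml_text):
    """Walk the children of the root and collect (branch, account-number) pairs."""
    root = parseString(xml_text).documentElement
    found = []
    node = root.firstChild                      # Java DOM: getFirstChild()
    while node is not None:
        if node.nodeType == node.ELEMENT_NODE and node.tagName == "account":
            branch = node.getAttribute("branch")            # attribute by name
            number = node.getElementsByTagName("account-number")[0]
            found.append((branch, number.firstChild.data))  # Java DOM: getData()
        node = node.nextSibling                 # Java DOM: getNextSibling()
    return found

print(account_numbers(SAMPLE))
# [('Perryridge', 'A-401'), ('Brighton', 'A-402')]
```

Note that every step is explicit navigation written by the programmer; nothing here resembles a declarative query.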

Many more details are required for writing an actual DOM program; see the bibliographical notes for references to further information
...

However, the DOM interface does not support any form of declarative querying
...
This API is built on the notion of event handlers, which consist of user-specified
functions associated with parsing events
...
The
pieces of a document are always encountered in order from start to ﬁnish
...
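
To give a flavor of the event-handler style, here is a minimal sketch that uses Python's standard xml.sax module as one example of such an interface. The class name EventRecorder and the helper parse_events are hypothetical names chosen for this illustration. The handler simply records each parsing event in the order in which the parser reports it.

```python
import xml.sax

class EventRecorder(xml.sax.ContentHandler):
    """User-specified functions that the parser calls as it meets each piece."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def characters(self, content):
        # The parser may deliver text in several chunks; skip pure whitespace.
        if content.strip():
            self.events.append(("text", content.strip()))

    def endElement(self, name):
        self.events.append(("end", name))

def parse_events(xml_text):
    """Parse xml_text once, front to back, and return the recorded events."""
    handler = EventRecorder()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.events

for event in parse_events("<account><balance>500</balance></account>"):
    print(event)
```

Because the handler sees the document only once and in order, any state it needs later (for example, the name of the enclosing element) must be remembered by the handler itself.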

10
...
One way to store XML data is to
convert it to relational representation, and store it in a relational database
...

10
...
1 Relational Databases
Since relational databases are widely used in existing applications, there is a great
beneﬁt to be had in storing XML data in relational databases, so that the data can be
accessed from existing applications
...

10
...
However, there are many applications
where the XML data is not generated from a relational schema, and translating the
data to relational form for storage may not be straightforward
...
Several alternative approaches are available:
• Store as string
...
For instance, the XML data in Figure 10
...

While the above representation is easy to use, the database system does
not know the schema of the stored elements
...
In fact, it is not even possible to implement simple
selections such as ﬁnding all account elements, or ﬁnding the account element
with account number A-401, without scanning all tuples of the relation and
examining the contents of the string stored in the tuple
...
For instance, in our example, the
relations would be account-elements, customer-elements, and depositor-elements,
each with an attribute data
...
Thus, a
query that requires account elements with a speciﬁed account number can be
answered efﬁciently with this representation
...

Some database systems, such as Oracle 9, support function indices, which
can help avoid replication of attributes between the XML string and relation
attributes
...

For instance, a function index can be built on a user-deﬁned function that returns the value of the account-number subelement of the XML string in a tuple
...

The above approaches have the drawback that a large part of the XML information is stored within strings
...

• Tree representation
...
Each element and attribute in the XML data is given a unique identiﬁer
...
The relation child
is used to record the parent element of each element and attribute
...
As an exercise, you can represent
the XML data of Figure 10
...

This representation has the advantage that all XML information can be represented directly in relational form, and many XML queries can be translated
into relational queries and executed inside the database system
...

• Map to relations
...
Elements whose schema is unknown are
stored as strings, or as a tree representation
...
All
attributes of these elements are stored as attributes of the relation
...
Otherwise, the relation corresponding to the subelement stores the contents of the subelement, along with
an identiﬁer for the parent type and the attribute stores the identiﬁer of the
subelement
...

If a subelement can occur multiple times in an element, the map-to-relations
approach stores the contents of the subelements in the relation corresponding
to the subelement
...

Note that when we apply this approach to the DTD of the data in Figure 10
...
The bibliographical notes provide references to such hybrid approaches
...
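
The tree representation described above can be sketched with Python's standard sqlite3 and xml.etree.ElementTree modules. This is only an illustration under stated assumptions: the column names of the nodes and child relations (id, type, label, value, child_id, parent_id) are choices made for this sketch, and the sample document is hypothetical. Every element and attribute receives a unique identifier, and the child relation records the parent of each one.

```python
import sqlite3
import xml.etree.ElementTree as ET

def shred(xml_text):
    """Store each element and attribute as a row in nodes, with parent links in child."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE nodes (id INTEGER PRIMARY KEY, type TEXT, label TEXT, value TEXT)")
    db.execute("CREATE TABLE child (child_id INTEGER, parent_id INTEGER)")
    next_id = 0

    def add(kind, label, value, parent):
        nonlocal next_id
        next_id += 1
        db.execute("INSERT INTO nodes VALUES (?, ?, ?, ?)", (next_id, kind, label, value))
        if parent is not None:
            db.execute("INSERT INTO child VALUES (?, ?)", (next_id, parent))
        return next_id

    def walk(elem, parent):
        text = (elem.text or "").strip() or None
        me = add("element", elem.tag, text, parent)
        for name, val in elem.attrib.items():
            add("attribute", name, val, me)
        for sub in elem:
            walk(sub, me)

    walk(ET.fromstring(xml_text), None)
    return db

db = shred('<bank><account><account-number>A-401</account-number>'
           '<balance>500</balance></account></bank>')

# Find the balance of account A-401 with an ordinary relational join:
# locate the account-number node, go up to its parent, come down to balance.
rows = db.execute("""
    SELECT bal.value
    FROM nodes num
    JOIN child c1 ON c1.child_id = num.id
    JOIN child c2 ON c2.parent_id = c1.parent_id
    JOIN nodes bal ON bal.id = c2.child_id
    WHERE num.label = 'account-number' AND num.value = 'A-401'
      AND bal.label = 'balance'
""").fetchall()
print(rows)   # [('500',)]
```

Even this small lookup needs two joins on child, which hints at why queries over the tree representation can require many joins.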
6
...
Since XML is primarily a ﬁle format, a natural storage mechanism is simply a ﬂat ﬁle
...
In
particular, it lacks data isolation, integrity checks, atomicity, concurrent access, and security
...

10
...

Thus, this storage format may be sufﬁcient for some applications
...
XML databases are databases that use XML as
their basic data model
...
This allows much of the
object-oriented database infrastructure to be reused, while using a standard
XML interface
...
It is also possible to build XML databases as a layer on top of relational databases
...
7 XML Applications
A central design goal for XML is to make it easier to communicate information, on the
Web and between applications, by allowing the semantics of the data to be described
with the data itself
...
Two applications of XML for communication
— exchange of data, and mediation of Web information resources— illustrate how
XML achieves its goal of supporting data exchange and demonstrate how database
technology and interaction are key in supporting exchange-based applications
...
7
...
Some examples:
• The chemical industry needs information about chemicals, such as their molecular structure, and a variety of important properties such as boiling and melting points, caloriﬁc values, solubility in various solvents, and so on
...

• In shipping, carriers of goods and customs and tax ofﬁcials need shipment
records containing detailed information about the goods being shipped, from
whom and to where they were sent, to whom and to where they are being
shipped, the monetary value of the goods, and so on
...

Using normalized relational schemas to model such complex data requirements
results in a large number of relations, which is often hard for users to manage
...

Nested element representations help reduce the number of relations that must be


10
...
For instance, in our bank example, listing customers
with account elements nested within customer elements, as in Figure 10
...
1
...
Data in relational databases must be published,
that is, converted to XML form, for export to other applications
...
While application code can perform the publishing and
shredding operations, the operations are so common that the conversions should
be done automatically, without writing application code, where possible
...

An XML-enabled database supports an automatic mapping from its internal model
(relational, object-relational or object-oriented) to XML
...
A simple mapping might assign an element to every row of a table,
and make each column in that row either an attribute or a subelement of the row’s
element
...
A more complicated mapping would allow nested structures to be created
...
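
The simple mapping just described can be sketched in a few lines of Python with the standard xml.etree.ElementTree module. The function name publish, its parameters, and the "-table" suffix for the enclosing element are hypothetical choices for this illustration and do not describe the interface of any particular database product.

```python
import xml.etree.ElementTree as ET

def publish(table_name, columns, rows, as_attributes=()):
    """Map each row to an element; each column becomes an attribute or a subelement."""
    root = ET.Element(table_name + "-table")
    for row in rows:
        elem = ET.SubElement(root, table_name)
        for col, value in zip(columns, row):
            if col in as_attributes:
                elem.set(col, str(value))                  # column as attribute
            else:
                ET.SubElement(elem, col).text = str(value)  # column as subelement
    return ET.tostring(root, encoding="unicode")

xml_text = publish("account", ["account-number", "balance"],
                   [("A-401", 500), ("A-402", 900)],
                   as_attributes={"account-number"})
print(xml_text)
```

A mapping that produces nested structures would need additional information, such as which foreign key determines the nesting, and is not attempted here.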
Some database products also allow XML queries to access relational data by treating the XML form of relational data as a virtual XML document
...
7
...
1 Data Mediation
Comparison shopping is an example of a mediation application, in which data about
items, inventory, pricing, and shipping costs are extracted from a variety of Web sites
offering a particular item for sale
...

A personal ﬁnancial manager is a similar application in the context of banking
...
Suppose that these accounts may be held
at different institutions
...
XML-based mediation addresses the problem by extracting an XML representation of account information from the respective Web sites of
the ﬁnancial institutions where the individual holds accounts
...
For those that do not, wrapper software is used to generate XML
data from HTML Web pages returned by the Web site
...
Nevertheless, the value provided by mediation often justiﬁes the effort
required to develop and maintain wrappers
...

This may require further transformation of the XML data from each site, since different sites may structure the same information differently
...

10
...
1, while another may use the
nested format in Figure 10
...
They may also use different names for the same information (for instance, acct-number and account-id), or may even use the same name for
different information
...
Such issues are discussed in more detail in Section 19
...
XML query languages such as XSLT and XQuery play an
important role in the task of transformation between different XML representations
...
8 Summary
• Like the Hyper-Text Markup Language, HTML, on which the Web is based, the
Extensible Markup Language, XML, is a descendant of the Standard Generalized Markup Language (SGML)
...

• XML documents contain elements, with matching starting and ending tags
indicating the beginning and end of an element
...
Elements may also have
attributes
...

• Elements may have an attribute of type ID that stores a unique identiﬁer for the
element
...
Attributes of type IDREFS can store a list of references
...
The DTD of a document speciﬁes what elements may occur,
how they may be nested, and what attributes each element may have
...
For instance,
they do not provide a type system
...
While it provides more expressive power,
including a powerful type system, it is also more complicated
...
Nesting of elements is reﬂected by the parent-child
structure of the tree representation
...
XPath is a standard language for path expressions, and allows
required elements to be speciﬁed by a ﬁle-system-like path, and additionally
allows selections and other features
...

• The XSLT language was originally designed as the transformation language
for a style sheet facility, in other words, to apply formatting information to


10
...
However, XSLT offers quite powerful querying and transformation features and is widely available, so it is used for querying XML data
...
Each element in the input XML data is matched against available
templates, and the select part of the ﬁrst matching template is applied to the
element
...
XSLT supports keys, which
can be used to implement some types of joins
...

• The XQuery language, which is currently being standardized, is based on the
Quilt query language
...

However, it supports many extensions to deal with the tree nature of XML
and to allow for the transformation of XML documents into other documents
with a signiﬁcantly different structure
...
For example, XML
data can be stored as strings in a relational database
...
As another alternative, XML data can be
mapped to relations in the same way that E-R schemas are mapped to relational schemas
...

• The ability to transform documents in languages such as XSLT and XQuery
is a key to the use of XML in mediation applications, such as electronic business exchanges and the extraction and combination of Web data for use by a
personal ﬁnance manager or comparison shopper
...

10
...
1 Give an alternative representation of bank information containing the same
data as in Figure 10
...
Also give
the DTD for this representation
...
2 Show, by giving a DTD, how to represent the books nested-relation from Section 9
...

10
...
4 Write the following queries in XQuery, assuming the DTD from Exercise 10
...

a
...

b
...

c
...


· · · similar PCDATA declarations for year, publisher, place, journal, year,
number, volume, pages, last-name and ﬁrst-name
]>
Figure 10
...

10
...
3 to list all skill
types in Emp
...
6 Write a query in XQuery on the XML representation in Figure 10
...
(Hint: Use a nested query to
get the effect of an SQL group by
...
7 Write a query in XQuery on the XML representation in Figure 10
...
(Hint: Use universal quantiﬁcation
...
8 Give a query in XQuery to ﬂip the nesting of data from Exercise 10
...
That is, at
the outermost level of nesting the output must have elements corresponding to
authors, and each such element must have nested within it items corresponding to all the books written by the author
...
9 Give the DTD for an XML representation of the information in Figure 2
...
Create a separate element type to represent each relationship, but use ID and IDREF
to implement primary and foreign keys
...
10 Write queries in XSLT and XQuery to output customer elements with associated account elements nested within the customer elements, given the bank
information representation using ID and IDREFS in Figure 10
...

10
...
13
...
You can assume that only books and articles
appear as top level elements in XML documents
...
12 Consider Exercise 10
...
What change would have to be made to the relational schema
...
13 Write queries in XQuery on the bibliography DTD fragment in Figure 10
...

a
...

b
...

c
...


10
...
1, and the representation of the tree using nodes and child relations described in Section 10
...
1
...
15 Consider the following recursive DTD
...
Give a small example of data corresponding to the above DTD
...
Show how to map this DTD to a relational schema
...

Bibliographical Notes
The XML Cover Pages site (www
...
org/cover/) contains a wealth of XML
information, including tutorial introductions to XML, standards, publications, and
software
...
A large number of technical reports deﬁning the XML
related standards are available at www
...
org
...
[2000] gives an algebra for XML
...
[2000]
...
Deutsch et al
...
Integration of
keyword querying into XML is outlined by Florescu et al
...
Query optimization for XML is described in McHugh and Widom [1999]
...
Other
work on querying and manipulating XML data includes Chawathe [1999], Deutsch
et al
...
[2000]
...
[1999] describe storage of XML data
...
XML support in commercial databases is described in Banerjee
et al
...
See Chapters 25 through 27 for
more information on XML support in commercial databases
...
[2000], Draper et al
...
[1999], and
Carey et al
...

Tools
A number of tools to deal with XML are available in the public domain
...
oasis-open
...
Kweelt (available at http://db
...
upenn
...


P A

IV
...
A vast majority of databases today
store data on magnetic disk and fetch data into main memory for processing,
or copy data onto tapes and other backup devices for archival storage
...

Chapter 11 begins with an overview of physical storage media, including mechanisms to minimize the chance of data loss due to failures
...
Storage and retrieval of objects is also covered in Chapter 11
...
An index
is a structure that helps locate desired records of a relation quickly, without examining all records
...
Chapter 12 describes several types of indices used
in database systems
...
It is usually convenient to break up queries into smaller operations, roughly
corresponding to the relational algebra operations
...

There are many alternative ways of processing a query, which can have widely
varying costs
...
Chapter 14 describes the process of query optimization
...
Chapter 11

Storage and File Structure
...

For example, at the conceptual or logical level, we viewed the database, in the relational
model, as a collection of tables
...
This is because the goal of a database system is
to simplify and facilitate access to data; users of the system should not be burdened
unnecessarily with the physical details of the implementation of the system
...
We start with characteristics of the
underlying storage media, such as disk and tape systems
...
We consider several alternative
structures, each best suited to a different kind of access to data
...

11
...
These storage media
are classiﬁed by the speed with which data can be accessed, by the cost per unit of
data to buy the medium, and by the medium’s reliability
...
The cache is the fastest and most costly form of storage
...
We shall not
be concerned about managing cache storage in the database system
...
The storage medium used for data that are available to be operated on is main memory
...
Although main memory may contain many megabytes of
data, or even gigabytes of data in large server systems, it is generally too small
(or too expensive) for storing the entire database
...

• Flash memory
...
Reading data from ﬂash memory takes less than 100 nanoseconds (a nanosecond is 1/1000 of a microsecond), which is roughly as fast as
reading data from main memory
...
To overwrite memory that has
been written already, we have to erase an entire bank of memory at once; it
is then ready to be written again
...
Flash memory has found popularity as a replacement for magnetic disks
for storing small volumes of data (5 to 10 megabytes) in low-cost computer
systems, such as computer systems that are embedded in other devices, in
hand-held computers, and in other digital electronic devices such as digital
cameras
...
The primary medium for the long-term on-line storage of data is the magnetic disk
...
The system must move the data from disk to main memory so that
they can be accessed
...

The size of magnetic disks currently ranges from a few gigabytes to 80 gigabytes
...

Disk storage survives power failures and system crashes
...

• Optical storage
...
7 or 8
...
Data are stored optically on a disk,
and are read by a laser
...

There are “record-once” versions of compact disk (called CD-R) and digital
video disk (called DVD-R), which can be written only once; such disks are also
called write-once, read-many (WORM) disks
...
Recordable compact disks
are magnetic – optical storage devices that use optical means to read magnetically encoded data
...


11
...

• Tape storage
...

Although magnetic tape is much cheaper than disks, access to data is much
slower, because the tape must be accessed sequentially from the beginning
...
In contrast, disk storage is referred to as direct-access storage because it is possible
to read data from any location on disk
...
Tape jukeboxes are used to hold exceptionally large
collections of data, such as remote-sensing data from satellites, which could
include as much as hundreds of terabytes (1 terabyte = 10^12 bytes), or even a
petabyte (1 petabyte = 10^15 bytes) of data
...
1) according
to their speed and their cost
...
As we move
down the hierarchy, the cost per bit decreases, whereas the access time increases
...
In fact, many early storage devices, including paper tape and core memories, are relegated to museums now that magnetic tape
and semiconductor memory have become faster and cheaper
...
1

Storage-device hierarchy
...

11
...
Today, almost all active data are stored on disks, except in rare cases
where they are stored on tape or in optical jukeboxes
...
The media in the next level in the hierarchy — for example,
magnetic disks — are referred to as secondary storage, or online storage
...

In addition to the speed and cost of the various storage systems, there is also the
issue of storage volatility
...
In the hierarchy shown in Figure 11
...
In the absence of expensive battery and generator backup systems, data
must be written to nonvolatile storage for safekeeping
...

11
...

Disk capacities have been growing at over 50 percent per year, but the storage requirements of large applications have also been growing very fast, in some cases even
faster than the growth rate of disk capacities
...

11
...
1 Physical Characteristics of Disks
Physically, disks are relatively simple (Figure 11
...
Each disk platter has a ﬂat circular shape
...
Platters are made from rigid metal or glass and are covered (usually on both sides) with magnetic recording material
...

When the disk is in use, a drive motor spins it at a constant high speed (usually 60,
90, or 120 revolutions per second, but disks running at 250 revolutions per second are
available)
...

The disk surface is logically divided into tracks, which are subdivided into sectors
...
In currently available disks, sector sizes are typically 512 bytes; there are over
16,000 tracks on each platter, and 2 to 4 platters per disk
...
The
numbers above vary among different models; higher-capacity models usually have
more sectors per track and more tracks on each platter
...
There may be hundreds of
concentric tracks on a disk surface, containing thousands of sectors
...

[Figure labels: spindle, track t, sector s, cylinder c, platter, arm, arm assembly, read-write head, rotation]

Figure 11
...

Each side of a platter of a disk has a read-write head, which moves across the
platter to access different tracks
...
The disk platters mounted on a spindle and the heads mounted
on a disk arm are together known as head-disk assemblies
...
Hence, the
ith tracks of all the platters together are called the ith cylinder
...
They have
a lower cost and faster seek times (due to smaller seek distances) than do the larger-diameter disks (up to 14 inches) that were common earlier, yet they provide high
storage capacity
...

The read-write heads are kept as close as possible to the disk surface to increase
the recording density
...
Because
the head ﬂoats so close to the surface, platters must be machined carefully to be ﬂat
...
If the head contacts the disk surface, the head can
scrape the recording medium off the disk, destroying the data that had been there
...

Under normal circumstances, a head crash results in failure of the entire disk, which
must then be replaced
...

11
...
They are much less susceptible to failure by head crashes
than the older oxide-coated disks
...
This arrangement allows the
computer to switch from track to track quickly, without having to move the head assembly, but because of the large number of heads, the device is extremely expensive
...
Fixed-head disks and multiple-arm disks were
used in high-performance mainframe systems, but are no longer in production
...
It accepts high-level commands to read or write a sector, and
initiates actions, such as moving the disk arm to the right track and actually reading
or writing the data
...
When the sector is
read back, the controller computes the checksum again from the retrieved data and
compares it with the stored checksum; if the data are corrupted, with a high probability the newly computed checksum will not match the stored checksum
...
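
The checksum test performed on each read can be sketched as follows. This is only an illustration: CRC-32 from Python's standard zlib module stands in for whatever code a real controller uses, and the function names write_sector and read_sector are hypothetical.

```python
import zlib

def write_sector(data: bytes):
    """Return what is stored for a sector: the data plus a checksum computed from it."""
    return data, zlib.crc32(data)

def read_sector(stored_data: bytes, stored_checksum: int) -> bytes:
    """Recompute the checksum from the retrieved data and compare with the stored one."""
    if zlib.crc32(stored_data) != stored_checksum:
        raise IOError("checksum mismatch: the sector is probably corrupted")
    return stored_data

data, checksum = write_sector(b"x" * 512)        # one 512-byte sector
assert read_sector(data, checksum) == data       # clean read succeeds

corrupted = b"y" + data[1:]                      # damage the first byte
try:
    read_sector(corrupted, checksum)
except IOError as err:
    print("detected:", err)
```

A controller that detects a mismatch would typically retry the read several times before reporting a failure.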

Another interesting task that disk controllers perform is remapping of bad sectors
...
The remapping is noted on disk or in nonvolatile memory, and the write is
carried out on the new location
...
3 shows how disks are connected to a computer system
...
In modern disk systems, lower-level functions of the disk controller, such as control of the disk arm, computing and veriﬁcation of checksums, and
remapping of bad sectors, are implemented within the disk drive unit
...
3

Disk subsystem
...
disks to personal computers and workstations
...

While disks are usually connected directly by cables to the disk controller, they can
be situated remotely and connected by a high-speed network to the disk controller
...
The disks are usually
organized locally using redundant arrays of independent disks (RAID) storage organizations, but the RAID organization may be hidden from the server computers:
the disk subsystems pretend each RAID system is a very large and very reliable disk
...
Remote access to
disks across a storage area network means that disks can be shared by multiple computers, which could run different parts of an application in parallel
...

11
...
2 Performance Measures of Disks
The main measures of the qualities of a disk are capacity, access time, data-transfer
rate, and reliability
...
To access (that is, to read or write) data on a given sector of a disk,
the arm ﬁrst must move so that it is positioned over the correct track, and then must
wait for the sector to appear under it as the disk rotates
...
Typical seek times range from 2 to 30 milliseconds, depending on how far the
track is from the initial arm position
...

The average seek time is the average of the seek times, measured over a sequence
of (uniformly distributed) random requests
...
Taking these factors into account, the average seek time is around one-half of
the maximum seek time
...

Once the seek has completed, the time spent waiting for the sector to be accessed
to appear under the head is called the rotational latency time
...
1 milliseconds per rotation
...
Thus, the
average latency time of the disk is one-half the time for a full rotation of the disk
...
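
As a small worked example of this rule, the sketch below computes the time for one rotation and the resulting average latency for the rotational speeds quoted earlier (60, 90, 120, and 250 revolutions per second). The function names are hypothetical.

```python
def rotation_time_ms(revolutions_per_second):
    """Time for one full rotation, in milliseconds."""
    return 1000.0 / revolutions_per_second

def average_latency_ms(revolutions_per_second):
    """On average the desired sector is half a rotation away from the head."""
    return rotation_time_ms(revolutions_per_second) / 2

for rps in (60, 90, 120, 250):
    print(rps, round(rotation_time_ms(rps), 2), round(average_latency_ms(rps), 2))
# 60 16.67 8.33
# 90 11.11 5.56
# 120 8.33 4.17
# 250 4.0 2.0
```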
Once the first sector of the data to be accessed has come under
the head, data transfer begins
...
Current disk systems claim to support maximum
transfer rates of about 25 to 40 megabytes per second, although actual transfer rates
may be signiﬁcantly less, at about 4 to 8 megabytes per second
...
The mean time to failure of a disk (or
of any other system) is the amount of time that, on average, we can expect the system
to run continuously without any failure
...
4 to 136
years
...
A
mean time to failure of 1,200,000 hours does not imply that the disk can be expected
to function for 136 years! Most disks have an expected life span of about 5 years, and
have signiﬁcantly higher rates of failure once they become more than a few years old
...
The widely used ATA-4 interface standard (also called Ultra-DMA) supports 33 megabytes per second transfer
rates, while ATA-5 supports 66 megabytes per second
...
The transfer rate of the interface is
shared between all disks attached to the interface
...
2
...
Each request speciﬁes the address on the
disk to be referenced; that address is in the form of a block number
...
Block sizes range from
512 bytes to several kilobytes
...
The lower levels of the ﬁle-system manager convert block addresses
into the hardware-level cylinder, surface, and sector number
...
One such technique, buffering of blocks
in memory to satisfy future requests, is discussed in Section 11
...
Here, we discuss
several other techniques
...
If several blocks from a cylinder need to be transferred from disk
to main memory, we may be able to save access time by requesting the blocks
in the order in which they will pass under the heads
...
Disk-arm-scheduling algorithms
attempt to order accesses to tracks in a fashion that increases the number of
accesses that can be processed
...
Suppose that,
initially, the arm is moving from the innermost track toward the outside of
the disk
...
is an access request, the arm stops at that track, services requests for the track,
and then continues moving outward until there are no waiting requests for
tracks farther out
...

Now, it reverses direction and starts a new cycle
...

• File organization
...

For example, if we expect a ﬁle to be accessed sequentially, then we should
ideally keep all the blocks of the ﬁle sequentially on adjacent cylinders
...
However, this control places a burden on the programmer or system administrator to decide, for example, how
many cylinders to allocate for a ﬁle, and may require costly reorganization if
data are inserted into or deleted from the file
...
However, over time, a sequential ﬁle may become fragmented;
that is, its blocks become scattered all over the disk
...
The restore operation writes back the blocks of each ﬁle contiguously (or
nearly so)
...
The performance increases realized from these techniques can
be large, but the system is generally unusable while these utilities operate
...
Since the contents of main memory are lost in
a power failure, information about database updates has to be recorded on
disk to survive possible system crashes
...

We can use nonvolatile random-access memory (NV-RAM) to speed up
disk writes drastically
...
A common way to implement nonvolatile RAM is to use battery-backed-up RAM
...
The controller writes the data to
their destination on disk whenever the disk does not have any other requests,
or when the nonvolatile RAM buffer becomes full
...

11
...
On recovery from a system crash, any pending buffered writes in the
nonvolatile RAM are written back to the disk
...

Assume that write requests are received in a random fashion, with the disk
being busy on average 90 percent of the time
...
Doubling the buffer to 100 blocks results in approximately only one write per hour
ﬁnding the buffer to be full
...

• Log disk
...
All access to the log disk is sequential, essentially
eliminating seek time, and several consecutive blocks can be written at once,
making writes to the log disk several times faster than random writes
...
Furthermore, the log disk can reorder the writes to
minimize disk arm movement
...

File systems that support log disks as above are called journaling ﬁle systems
...
Doing so reduces the monetary cost, at the expense of lower performance
...

Data are not written back to their original destination on disk; instead, the
ﬁle system keeps track of where in the log disk the blocks were written most
recently, and retrieves them from that location
...
This approach improves write performance, but generates a high
degree of fragmentation for ﬁles that are updated often
...
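
Returning to disk-arm scheduling, the elevator behavior described earlier in this section can be sketched as follows. The sketch makes simplifying assumptions that are not in the text: tracks are numbered from the innermost track (0) outward, the set of pending requests is fixed (no new requests arrive during the sweep), and the function name elevator_order is hypothetical.

```python
def elevator_order(requests, start, direction="out"):
    """Return the order in which pending track requests are serviced.

    The arm sweeps in `direction`, stopping at every requested track it
    passes; when no requests remain farther along, it reverses direction.
    """
    if direction == "out":
        ahead = sorted(t for t in requests if t >= start)
        behind = sorted((t for t in requests if t < start), reverse=True)
    else:
        ahead = sorted((t for t in requests if t <= start), reverse=True)
        behind = sorted(t for t in requests if t > start)
    return ahead + behind

pending = [98, 183, 37, 122, 14, 124, 65, 67]
print(elevator_order(pending, start=53, direction="out"))
# [65, 67, 98, 122, 124, 183, 37, 14]
```

Servicing the same requests in arrival order would send the arm back and forth across the disk; the sweep visits them with a single reversal.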

11
...

1
...
The exact arrival rate
and rate of service are not needed since the disk utilization provides enough information for our calculations
...

11
...
3

RAID

403

Having a large number of disks in a system presents opportunities for improving
the rate at which data can be read or written, if the disks are operated in parallel
...

Furthermore, this setup offers the potential for improving the reliability of data storage, because redundant information can be stored on multiple disks
...

A variety of disk-organization techniques, collectively called redundant arrays of
independent disks (RAID), have been proposed to achieve improved performance
and reliability
...
In fact, the I in RAID,
which now stands for independent, originally stood for inexpensive
...
RAID systems are used for their higher reliability and higher performance
rate, rather than for economic reasons
...
3
...
The chance that some disk out of a set of N disks will
fail is much higher than the chance that a speciﬁc single disk will fail
...
Then,
the mean time to failure of some disk in an array of 100 disks will be 100,000 / 100 =
1000 hours, or around 42 days, which is not long at all! If we store only one copy of
the data, then each disk failure will result in loss of a signiﬁcant amount of data (as
discussed in Section 11
...
1)
...
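
The arithmetic in this example is easy to reproduce. The sketch below assumes, as the example does, that disk failures are independent, so that the mean time until some disk in the array fails is the single-disk figure divided by the number of disks. The function name is hypothetical.

```python
HOURS_PER_DAY = 24

def array_mttf_hours(single_disk_mttf_hours, n_disks):
    """Mean time until *some* disk in the array fails, assuming independent failures."""
    return single_disk_mttf_hours / n_disks

hours = array_mttf_hours(100_000, 100)
print(hours, round(hours / HOURS_PER_DAY, 1))   # 1000.0 41.7
```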

The solution to the problem of reliability is to introduce redundancy; that is, we
store extra information that is not needed normally, but that can be used in the event
of failure of a disk to rebuild the lost information
...

The simplest (but most expensive) approach to introducing redundancy is to duplicate every disk
...
A
logical disk then consists of two physical disks, and every write is carried out on both
disks
...
Data will be lost
only if the second disk fails before the ﬁrst failed disk is repaired
...
Suppose that the failures of the two disks are independent;
that is, there is no connection between the failure of one disk and the failure of the
other
...
)
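
As a rough illustration of the independence assumption, the sketch below uses a commonly quoted estimate for a mirrored pair, (disk mean time to failure)^2 / (2 × mean time to repair). The estimate itself, the function name, and the 10-hour repair time are assumptions made for this example; only the 100,000-hour disk figure appears in the text above.

```python
def mirrored_mttdl_hours(disk_mttf_hours: float, repair_hours: float) -> float:
    """Estimated mean time to data loss for a mirrored pair of disks,
    assuming the two disks fail independently of each other."""
    return disk_mttf_hours ** 2 / (2 * repair_hours)


hours = mirrored_mttdl_hours(100_000, 10)   # 10-hour repair time is assumed
print(hours, "hours, or about", round(hours / (24 * 365)), "years")
```

An estimate of this kind is optimistic, because real disk failures are correlated.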

Silberschatz−Korth−Sudarshan:
Database System
Concepts, Fourth Edition

404

Chapter 11

IV
...
Storage and File
Structure

© The McGraw−Hill
Companies, 2001

Storage and File Structure

You should be aware that the assumption of independence of disk failures is not
valid
...
As disks age, the probability of
failure increases, increasing the chance that a second disk will fail while the ﬁrst is
being repaired
...
Mirrored-disk systems with
mean time to data loss of about 500,000 to 1,000,000 hours, or 55 to 110 years, are
available today
...
Power failures are not a concern if there is no data
transfer to disk in progress when they occur
...
The solution
to this problem is to write one copy ﬁrst, then the next, so that one of the two copies
is always consistent
...
This matter is examined in Exercise 11
...

11
...
2 Improvement in Performance via Parallelism
Now let us consider the beneﬁt of parallel access to multiple disks
...
The transfer rate of each read is the same as in a single-disk system,
but the number of reads per unit time has doubled
...
In its simplest form, data striping consists of splitting
the bits of each byte across multiple disks; such striping is called bit-level striping
...
The array of eight disks can be treated as a single disk with sectors that are eight
times the normal size, and, more important, that has eight times the transfer rate
...
Bit-level striping can be generalized to a number of disks that either is a
multiple of 8 or a factor of 8
...

Block-level striping stripes blocks across multiple disks
...
With an array of n disks, block-level striping assigns logical block i
of the disk array to disk (i mod n) + 1; it uses the ⌊i/n⌋th physical block of the disk
to store logical block i
...
When
reading a large ﬁle, block-level striping fetches n blocks at a time in parallel from the
n disks, giving a high data transfer rate for large reads
...
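
The block-to-disk mapping just described can be sketched as follows. The function name is hypothetical; disks are numbered from 1 and physical blocks from 0, following the formula in the text.

```python
def locate_block(i: int, n: int) -> tuple[int, int]:
    """Map logical block i of an n-disk array to (disk number, physical block)."""
    disk = (i % n) + 1          # disks are numbered 1..n
    physical_block = i // n     # floor of i / n
    return disk, physical_block


# With 4 disks, logical blocks 0..7 are spread round-robin across the array.
for i in range(8):
    print(i, locate_block(i, 4))
```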

Other levels
of striping, such as bytes of a sector or sectors of a block also are possible
...
Load-balance multiple small accesses (block accesses), so that the throughput
of such accesses increases
...
Parallelize large accesses so that the response time of large accesses is reduced
...
3
...
Striping provides high data-transfer rates, but does not improve reliability
...
These schemes have different cost – performance trade-offs
...
4
...
) For all levels,
the figure depicts four disks’ worth of data, and the extra disks depicted are used to
store redundant information for failure recovery
...
Figure 11
...

• RAID level 1 refers to disk mirroring with block striping
...
4b shows
a mirrored organization that holds four disks’ worth of data
...
Memory systems have long used parity bits for error
detection and correction
...
If one of the bits in the byte
gets damaged (either a 1 becomes a 0, or a 0 becomes a 1), the parity of the
byte changes and thus will not match the stored parity
...
Thus, all 1-bit
errors will be detected by the memory system
...

The idea of error-correcting codes can be used directly in disk arrays by
striping bytes across disks
...

Figure 11
...
The disks labeled P store the error-correction bits
...
Figure 11
...

(a) RAID 0: nonredundant striping
(b) RAID 1: mirrored disks
(c) RAID 2: memory-style error-correcting codes
(d) RAID 3: bit-interleaved parity
(e) RAID 4: block-interleaved parity
(f) RAID 5: block-interleaved distributed parity
(g) RAID 6: P + Q redundancy

Figure 11
...

• RAID level 3, bit-interleaved parity organization, improves on level 2 by
exploiting the fact that disk controllers, unlike memory systems, can detect
whether a sector has been read correctly, so a single parity bit can be used for
error correction, as well as for detection
...
If one of the
sectors gets damaged, the system knows exactly which sector it is, and, for
each bit in the sector, the system can ﬁgure out whether it is a 1 or a 0 by computing the parity of the corresponding bits from sectors in the other disks
...


Figure 11
...

RAID level 3 has two beneﬁts over level 1
...
Since reads and writes of a byte are
spread out over multiple disks, with N -way striping of data, the transfer rate
for reading or writing a single block is N times faster than a RAID level 1 organization using N -way striping
...

• RAID level 4, block-interleaved parity organization, uses block level striping,
like RAID 0, and in addition keeps a parity block on a separate disk for corresponding blocks from N other disks
...
4e
...

A block read accesses only one disk, allowing other requests to be processed by the other disks
...
The transfer rate for large reads is high, since all the disks can be
read in parallel; large writes also have high transfer rates, since the data and
parity can be written in parallel
...
A write of a block has to access the disk on which the block is stored,
as well as the parity disk, since the parity block has to be updated
...
Thus, a single
write requires four disk accesses: two to read the two old blocks, and two to
write the two blocks
...
In level 5, all disks can participate in satisfying
read requests, unlike RAID level 4, where the parity disk cannot participate,
so level 5 increases the total number of requests that can be met in a given
amount of time
...

Figure 11
...
The P ’s are distributed across all the disks
...
The
following table indicates how the ﬁrst 20 blocks, numbered 0 to 19, and their
parity blocks are laid out
...

P0    0     1     2     3
4     P1    5     6     7
8     9     P2    10    11
12    13    14    P3    15
16    17    18    19    P4

Note that a parity block cannot store parity for blocks in the same disk,
since then a disk failure would result in loss of data as well as of parity, and
hence would not be recoverable
...

• RAID level 6, the P + Q redundancy scheme, is much like RAID level 5, but
stores extra redundant information to guard against multiple disk failures
...
In the scheme in Figure 11
...
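
The parity-based levels described above rest on two operations: rebuilding the contents of a failed disk from the surviving disks and the parity, and updating the parity when a single block is written. A minimal sketch follows, assuming parity is computed as a bytewise exclusive-or; the function names and sample bytes are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bytewise XOR of equal-length blocks; applied to data blocks it gives the parity."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild(surviving_blocks, parity):
    """Recover the single missing data block from the survivors and the parity block."""
    return xor_blocks(list(surviving_blocks) + [parity])


def parity_after_small_write(old_data, new_data, old_parity):
    """New parity computed from the two old blocks and the new data block only."""
    return xor_blocks([old_parity, old_data, new_data])


data = [b"\x0f\x10", b"\xf0\x01", b"\xaa\x55", b"\x33\xcc"]   # four data disks
parity = xor_blocks(data)

# The disk at index 2 fails; the controller knows which one, so XOR recovers it.
assert rebuild(data[:2] + data[3:], parity) == data[2]

# Small write to the disk at index 1: two reads (old data, old parity)
# and two writes (new data, new parity).
new_block = b"\x12\x34"
new_parity = parity_after_small_write(data[1], new_block, parity)
data[1] = new_block
assert new_parity == xor_blocks(data)
```

The second half shows why a single block write costs four disk accesses: the other data disks never need to be read.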

Finally, we note that several variations have been proposed to the basic RAID schemes
described here
...
2
However, the terminology we have presented is the most widely used
...
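
For readers who want to experiment with the level 5 layout tabulated earlier (blocks 0 to 19 and parity blocks P0 to P4 on five disks), the sketch below regenerates it. It assumes the rotation visible in that table: the parity block of row k sits on disk k mod 5 (counting disks from 0), and data blocks fill the remaining disks in order. The function name is hypothetical.

```python
def raid5_layout(num_disks: int, num_rows: int) -> list[list[str]]:
    """Return one list per row of blocks; entry d is what disk d stores in that row."""
    rows = []
    next_block = 0
    for k in range(num_rows):
        parity_disk = k % num_disks        # parity rotates across the disks
        row = []
        for disk in range(num_disks):
            if disk == parity_disk:
                row.append(f"P{k}")
            else:
                row.append(str(next_block))
                next_block += 1
        rows.append(row)
    return rows


for row in raid5_layout(5, 5):
    print("  ".join(cell.ljust(3) for cell in row))
```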
3
...
Rebuilding is easiest for RAID level 1, since data can
be copied from another disk; for the other levels, we need to access all the other
disks in the array to rebuild data of a failed disk
...
Furthermore, since rebuild time can form a
signiﬁcant part of the repair time, rebuild performance also inﬂuences the mean time
to data loss
...
For example, some products use RAID level 1 to refer to mirroring without striping, and level 1+0 or
level 10 to refer to mirroring with striping
...

Since RAID levels 2 and 4 are subsumed by RAID levels 3 and 5, the choice
of RAID levels is restricted to the remaining levels
...
For small transfers, the disk access time dominates anyway, so the beneﬁt of parallel reads diminishes
...
Level 6 is not supported currently by many RAID
implementations, but it offers better reliability than level 5 and can be used in applications where data safety is very important
...
RAID level 1 is
popular for applications such as storage of log ﬁles in a database system, since it
offers the best write performance
...
For applications where data are
read frequently, and written rarely, level 5 is the preferred choice
...
As a result, for
many existing database applications with moderate storage requirements, the monetary cost of the extra disk storage needed for mirroring has become relatively small
(the extra monetary cost, however, remains a signiﬁcant issue for storage-intensive
applications such as video data storage)
...

RAID level 5, which increases the number of I/O operations needed to write a
single logical block, pays a signiﬁcant time penalty in terms of write performance
...

RAID system designers have to make several other decisions as well
...
If there are more bits protected by a parity bit,
the space overhead due to parity bits is lower, but there is an increased chance that a
second disk will fail before the ﬁrst failed disk is repaired, and that will result in data
loss
...
3
...

RAID can be implemented with no change at the hardware level, using only software
modiﬁcation
...
However, there
are signiﬁcant beneﬁts to be had by building special-purpose hardware to support
RAID, which we outline below; systems with special hardware support are called
hardware RAID systems
...
Without such hardware support, extra
work needs to be done to detect blocks that may have been partially written before
power failure (see Exercise 11
...

Some hardware RAID implementations permit hot swapping; that is, faulty disks
can be removed and replaced by new ones without turning power off
...
In fact many critical systems today
run on a 24 × 7 schedule; that is, they run 24 hours a day, 7 days a week, providing
no time for shutting down and replacing a failed disk
...
If a disk
fails, the spare disk is immediately used as a replacement
...
The failed disk
can be replaced at leisure
...
To avoid this possibility, good RAID implementations have multiple
redundant power supplies (with battery backups so they continue to function even
if power fails)
...
Thus, failure of any single component will not stop the functioning of the
RAID system
...
3
...
When applied
to arrays of tapes, the RAID structures are able to recover data even if one of the tapes
in an array of tapes is damaged
...

11
...

The two most common tertiary storage media are optical disks and magnetic tapes
...
4
...
They have
a fairly large capacity (640 megabytes), and they are cheap to mass-produce
...
Disks in the DVD-5 format can store 4
...
(in one recording layer), while disks in the DVD-9 format can store 8
...
Recording on both sides of a disk yields even larger
capacities; DVD-10 and DVD-18 formats, which are the two-sided versions of DVD-5
and DVD-9, can store 9
...

CD and DVD drives have much longer seek times (100 milliseconds is common)
than do magnetic-disk drives, since the head assembly is heavier
...
Rotational speeds of CD drives originally corresponded to the audio CD standards, and the speeds of DVD drives originally corresponded to the DVD video standards, but current-generation drives rotate
at many times the standard rate
...
Current CD drives
read at around 3 to 6 megabytes per second, and current DVD drives read at 8 to 15
megabytes per second
...
The transfer rate of optical drives is characterized as n×, which means the drive supports transfers at n times the standard
rate; rates of around 50× for CD and 12× for DVD are now common
...
Since they cannot be overwritten, they can be used
to store information that should not be modiﬁed, such as audit trails
...

Jukeboxes are devices that store a large number of optical disks (up to several hundred) and load them automatically on demand to one of a small number (usually, 1 to
10) of drives
...

When a disk is accessed, it is loaded by a mechanical arm from a rack onto a drive
(any disk that was already in the drive must ﬁrst be placed back on the rack)
...

11
...
2 Magnetic Tapes
Although magnetic tapes are relatively permanent, and can hold large volumes of
data, they are slow in comparison to magnetic and optical disks
...
Thus, they cannot provide random access for secondary-storage requirements, although historically, prior to the
use of magnetic disks, tapes were used as a secondary-storage medium
...

Tapes are also used for storing large volumes of data, such as video or image data,
that either do not need to be accessible quickly or are so voluminous that magnetic-disk storage would be too expensive
...

Moving to the correct spot on a tape can take seconds or even minutes, rather than
milliseconds; once positioned, however, tape drives can write data at densities and
speeds approaching those of disk drives
...

The market is currently fragmented among a wide variety of tape formats
...
Data transfer rates are of the order of a few to tens of megabytes
per second
...
Tapes, however, have
limits on the number of times that they can be read or written reliably
...
Most other tape formats provide larger capacities, at the cost of slower access;
such formats are ideal for data backup, where fast seeks are not important
...
Applications that need such enormous data
storage include imaging systems that gather data by remote-sensing satellites, and
large video libraries for television broadcasters
...
5 Storage Access
A database is mapped into a number of different ﬁles, which are maintained by the
underlying operating system
...
Each ﬁle is partitioned into ﬁxed-length storage units called blocks, which
are the units of both storage allocation and data transfer
...
6 various ways to organize the data logically in ﬁles
...
The exact set of data items that a block
contains is determined by the form of physical data organization being used (see
Section 11
...
We shall assume that no data item spans two or more blocks
...

A major goal of the database system is to minimize the number of block transfers
between the disk and memory
...
The goal is to maximize the chance
that, when a block is accessed, it is already in main memory, and, thus, no disk access
is required
...
The buffer
is that part of main memory available for storage of copies of disk blocks
...
of the block older than the version in the buffer
...

11
...
1 Buffer Manager
Programs in a database system make requests (that is, calls) on the buffer manager
when they need a block from disk
...
If the block is
not in the buffer, the buffer manager ﬁrst allocates space in the buffer for the block,
throwing out some other block, if necessary, to make space for the new block
...
Then, the buffer manager reads in the requested block from the disk to the buffer, and passes the address of the block in main
memory to the requester
...

If you are familiar with operating-system concepts, you will note that the buffer
manager appears to be nothing more than a virtual-memory manager, like those
found in most operating systems
...
Further, to serve the database system
well, the buffer manager must use techniques more sophisticated than typical virtual-memory management schemes:
• Buffer replacement strategy
...
Most operating systems use a least recently used (LRU) scheme, in which the block that
was referenced least recently is written back to disk and is removed from the
buffer
...

• Pinned blocks
...
For instance, most recovery systems require that a block should
not be written to disk while an update on the block is in progress
...
Although many
operating systems do not support pinned blocks, such a feature is essential for
a database system that is resilient to crashes
...
There are situations in which it is necessary to write
back the block to disk, even though the buffer space that it occupies is not
needed
...
We shall see the
reason for forced output in Chapter 17; brieﬂy, main-memory contents and
thus buffer contents are lost in a crash, whereas data on disk usually survive
a crash
...
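
The three techniques listed above (a replacement strategy, pinned blocks, and forced output) can be combined in a small sketch. This is a toy model under stated assumptions, not the design of any particular system: the class and method names are hypothetical, a dictionary stands in for the disk, and LRU is used as the replacement strategy.

```python
from collections import OrderedDict


class BufferManager:
    """Toy buffer pool with LRU replacement, pinned blocks, and forced output."""

    def __init__(self, capacity, disk):
        self.capacity = capacity
        self.disk = disk                 # dict: block id -> contents (stands in for the disk)
        self.frames = OrderedDict()      # buffered blocks, least recently used first
        self.dirty = set()               # blocks modified since they were last written
        self.pinned = set()              # blocks that must not be written out or evicted

    def get(self, block_id):
        if block_id in self.frames:
            self.frames.move_to_end(block_id)        # mark as most recently used
        else:
            if len(self.frames) >= self.capacity:
                self._evict()
            self.frames[block_id] = self.disk[block_id]
        return self.frames[block_id]

    def write(self, block_id, contents):
        self.get(block_id)                           # make sure the block is buffered
        self.frames[block_id] = contents
        self.dirty.add(block_id)

    def pin(self, block_id):
        self.pinned.add(block_id)

    def unpin(self, block_id):
        self.pinned.discard(block_id)

    def force_output(self, block_id):
        """Write the block back to disk even though its buffer space is not needed."""
        self.disk[block_id] = self.frames[block_id]
        self.dirty.discard(block_id)

    def _evict(self):
        victim = next((b for b in self.frames if b not in self.pinned), None)
        if victim is None:
            raise RuntimeError("every buffered block is pinned")
        if victim in self.dirty:
            self.force_output(victim)                # write back before throwing out
        del self.frames[victim]


disk = {1: "a", 2: "b", 3: "c"}
bm = BufferManager(capacity=2, disk=disk)
bm.get(1)
bm.pin(1)
bm.write(2, "B")
bm.get(3)              # block 1 is pinned, so block 2 is written back and evicted
print(disk[2], list(bm.frames))
```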
5
...
For general-purpose programs, it is not possible to predict accurately
which blocks will be referenced
...

for each tuple b of borrower do
    for each tuple c of customer do
        if b[customer-name] = c[customer-name]
        then begin
            let x be a tuple defined as follows:
            x[customer-name] := b[customer-name]
            x[loan-number] := b[loan-number]
            x[customer-street] := c[customer-street]
            x[customer-city] := c[customer-city]
            include tuple x as part of result of borrower ⋈ customer
        end
    end
end
Figure 11
...
The assumption generally made
is that blocks that have been referenced recently are likely to be referenced again
...

This approach is called the least recently used (LRU) block-replacement scheme
...
However, a database system is able to predict the pattern of future references more accurately than an
operating system
...
The
database system is often able to determine in advance which blocks will be needed by
looking at each of the steps required to perform the user-requested operation
...

To illustrate how information about future block access allows us to improve the
LRU strategy, consider the processing of the relational-algebra expression
    borrower ⋈ customer

Assume that the strategy chosen to process this request is given by the pseudocode
program shown in Figure 11
...
(We shall study other strategies in Chapter 13
...
In this
example, we can see that, once a tuple of borrower has been processed, that tuple is not
needed again
...
The buffer manager should be instructed to free the space occupied by a
borrower block as soon as the ﬁnal tuple has been processed
...

Now consider blocks containing customer tuples
...
When processing of
a customer block is completed, we know that that block will not be accessed again
until all other customer blocks have been processed
...
customer block is the block that will be referenced next
...
Indeed, the optimal
strategy for block replacement is the most recently used (MRU) strategy
...

For the MRU strategy to work correctly for our example, the system must pin the
customer block currently being processed
...
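
The advantage of MRU for this access pattern can be seen in a small simulation. The sketch below is a simplification: it works at the level of whole blocks, assumes the customer relation occupies four blocks that are scanned three times, and gives the buffer room for three of them. The function name and these sizes are hypothetical.

```python
def count_hits(references, capacity, policy):
    """Count buffer hits for a sequence of block references.

    policy is "LRU" (evict the least recently used block)
    or "MRU" (evict the most recently used block).
    """
    buffer = []                  # ordered from least to most recently used
    hits = 0
    for block in references:
        if block in buffer:
            hits += 1
            buffer.remove(block)
        elif len(buffer) == capacity:
            buffer.pop(0 if policy == "LRU" else -1)
        buffer.append(block)     # the referenced block is now the most recent
    return hits


references = list(range(4)) * 3          # scan customer blocks 0..3 three times
print("LRU hits:", count_hits(references, 3, "LRU"))
print("MRU hits:", count_hits(references, 3, "MRU"))
```

Under these assumptions LRU always evicts exactly the block that the scan will need soonest, so it never gets a hit, whereas MRU keeps most of the relation resident across scans.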

In addition to using knowledge that the system may have about the request being
processed, the buffer manager can use statistical information about the probability
that a request will reference a particular relation
...
8) keeps track of the logical schema of the
relations as well as their physical storage information is one of the most frequently
accessed parts of the database
...

In Chapter 12, we discuss indices for ﬁles
...

The ideal database block-replacement strategy needs knowledge of the database
operations— both those being performed and those that will be performed in the
future
...
Indeed, a surprisingly large number of database systems use LRU, despite that strategy’s faults
...

The strategy that the buffer manager uses for block replacement is inﬂuenced by
factors other than the time at which the block will be referenced again
...
If the buffer manager is given information from the concurrency-control subsystem indicating which requests are being delayed, it can use this information to alter its block-replacement strategy
...

The crash-recovery subsystem (Chapter 17) imposes stringent constraints on block
replacement
...
Instead, the buffer manager must seek permission from the crash-recovery subsystem before writing out a block
...
In Chapter 17, we deﬁne precisely the
interaction between the buffer manager and the crash-recovery subsystem
...
6 File Organization
A ﬁle is organized logically as a sequence of records
...
record 0    A-102    Perryridge    400
record 1    A-305    Round Hill    350
record 2    A-215    Mianus        700
record 3    A-101    Downtown      500
record 4    A-222    Redwood       700
record 5    A-201    Perryridge    900
record 6    A-217    Brighton      750
record 7    A-110    Downtown      600
record 8    A-218    Perryridge    700

Figure 11
...

Files are provided as a basic construct in operating systems, so we shall
assume the existence of an underlying file system
...

Although blocks are of a ﬁxed size determined by the physical properties of the
disk and by the operating system, record sizes vary
...

One approach to mapping the database to ﬁles is to use several ﬁles, and to store
records of only one ﬁxed length in any given ﬁle
...
Many
of the techniques used for the former can be applied to the variable-length case
...

11
...
1 Fixed-Length Records
As an example, let us consider a ﬁle of account records for our bank database
...
A simple approach is to use the ﬁrst 40 bytes
for the ﬁrst record, the next 40 bytes for the second record, and so on (Figure 11
...

However, there are two problems with this simple approach:
1
...
The space occupied by the
record to be deleted must be ﬁlled with some other record of the ﬁle, or we
must have a way of marking deleted records so that they can be ignored
...
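record 0    A-102    Perryridge    400
record 1    A-305    Round Hill    350
record 3    A-101    Downtown      500
record 4    A-222    Redwood       700
record 5    A-201    Perryridge    900
record 6    A-217    Brighton      750
record 7    A-110    Downtown      600
record 8    A-218    Perryridge    700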

Figure 11
...
6, with record 2 deleted and all records moved
...
Unless the block size happens to be a multiple of 40 (which is unlikely), some
records will cross block boundaries
...
It would thus require two block accesses to
read or write such a record
...
7)
...
It might be easier simply to move the
ﬁnal record of the ﬁle into the space occupied by the deleted record (Figure 11
...

It is undesirable to move records to occupy the space freed by a deleted record,
since doing so requires additional block accesses
...
A simple
marker on a deleted record is not sufﬁcient, since it is hard to ﬁnd this available space
when an insertion is being done
...

At the beginning of the ﬁle, we allocate a certain number of bytes as a ﬁle header
...
For now, all we need
to store there is the address of the ﬁrst record whose contents are deleted
...
8

A-102
A-305
A-217    Brighton    750
A-110    Downtown    600

File of Figure 11
...

header
record 0    A-102    Perryridge    400
record 1    (deleted)
record 2    A-215    Mianus        700
record 3    A-101    Downtown      500
record 4    (deleted)
record 5    A-201    Perryridge    900
record 6    (deleted)
record 7    A-110    Downtown      600
record 8    A-218    Perryridge    700

Figure 11
...
6, with free list after deletion of records 1, 4, and 6
...
Intuitively,
we can think of these stored addresses as pointers, since they point to the location of
a record
...
Figure 11
...
6, with the free list, after records 1, 4,
and 6 have been deleted
...
We
change the header pointer to point to the next available record
...

Insertion and deletion for ﬁles of ﬁxed-length records are simple to implement,
because the space made available by a deleted record is exactly the space needed to
insert a record
...
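
The free-list scheme described above can be sketched in a few lines. This is an in-memory illustration under stated assumptions: slot numbers stand in for record addresses, the class and method names are hypothetical, and a deleted slot simply stores the number of the next free slot.

```python
class FixedLengthFile:
    """Toy file of fixed-length records with a free list kept in the header."""

    def __init__(self, records=()):
        self.slots = list(records)   # slot i holds a record, or the next free slot number
        self.free_head = None        # file header: first deleted slot, or None

    def delete(self, slot):
        # The deleted slot now points to the previous head of the free list.
        self.slots[slot] = self.free_head
        self.free_head = slot

    def insert(self, record):
        if self.free_head is None:
            self.slots.append(record)            # no free slot: add at the end of the file
            return len(self.slots) - 1
        slot = self.free_head
        self.free_head = self.slots[slot]        # header now points to the next free slot
        self.slots[slot] = record
        return slot


f = FixedLengthFile(f"record {i}" for i in range(9))
for slot in (6, 4, 1):           # this order yields the chain header -> 1 -> 4 -> 6
    f.delete(slot)
print(f.free_head, f.insert("new record"), f.free_head)
```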
An inserted record may not ﬁt in the space left free by a deleted record, or it
may ﬁll only part of that space
...
6
...
For purposes of
illustration, we shall use one example to demonstrate the various implementation
techniques
...
6, in which we use one variable-length record for each
branch name and for all the account information for that branch
...
type account-list = record
branch-name : char (22);
account-info : array [1
...
That is,
the type deﬁnition does not limit the number of elements in the array, although any
actual record will have a speciﬁc number of elements in its array
...

11
...
2
...
We can then store each record as a
string of consecutive bytes
...
10 shows such an organization to represent the
ﬁle of ﬁxed-length records of Figure 11
...
An alternative
version of the byte-string representation stores the record length at the beginning of
each record, instead of using end-of-record symbols
...
10 has some disadvantages:
• It is not easy to reuse space occupied formerly by a deleted record
...

• There is no space, in general, for records to grow longer
...
g
...

Thus, the basic byte-string representation described here is not usually used for implementing variable-length records
...
10

Byte-string representation of variable-length records
...
11
...
11

Slotted-page structure
...

The slotted-page structure appears in Figure 11
...
There is a header at the beginning of each block, containing the following information:
1
...
The end of free space in the block
3
...
The free space in the block is contiguous, between the ﬁnal entry in the
header array, and the ﬁrst record
...

If a record is deleted, the space that it occupies is freed, and its entry is set to
deleted (its size is set to −1, for example)
...
The end-of-free-space pointer in the header is appropriately updated as well
...
The cost of moving the records is not too high, since the size of a block is
limited: A typical value is 4 kilobytes
...
Instead, pointers must point to the entry in the header that contains the
actual location of the record
...
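
A minimal sketch of the slotted-page idea follows. It is a simplification under stated assumptions: the header array is kept as a Python list rather than inside the block, so the space taken by the header is not accounted for, and the class and method names are hypothetical. Records are placed contiguously from the end of the block, a deleted entry has its size set to −1, and the remaining records are moved so that the free space stays contiguous.

```python
class SlottedPage:
    """Toy slotted page: records grow downward from the end of the block."""

    def __init__(self, size=4096):
        self.data = bytearray(size)
        self.free_end = size         # end of free space; records occupy data[free_end:]
        self.slots = []              # header array of [offset, length]; length -1 = deleted

    def insert(self, record: bytes) -> int:
        if len(record) > self.free_end:          # header size is ignored in this sketch
            raise ValueError("not enough free space in the block")
        self.free_end -= len(record)
        self.data[self.free_end:self.free_end + len(record)] = record
        self.slots.append([self.free_end, len(record)])
        return len(self.slots) - 1               # external pointers use this slot number

    def read(self, slot: int) -> bytes:
        offset, length = self.slots[slot]
        if length < 0:
            raise KeyError("record has been deleted")
        return bytes(self.data[offset:offset + length])

    def delete(self, slot: int) -> None:
        offset, length = self.slots[slot]
        if length < 0:
            return
        # Shift every record stored below the deleted one up by its length.
        self.data[self.free_end + length:offset + length] = self.data[self.free_end:offset]
        for entry in self.slots:
            if entry[1] >= 0 and entry[0] < offset:
                entry[0] += length
        self.free_end += length
        self.slots[slot] = [0, -1]


page = SlottedPage(64)
a = page.insert(b"Perryridge")
b = page.insert(b"Mianus")
c = page.insert(b"Downtown")
page.delete(b)
print(page.read(a), page.read(c), page.free_end)
```

Because callers hold slot numbers rather than byte offsets, records can be moved within the block without invalidating any pointer to them.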

11
...
2
...

There are two ways of doing this:
1
...
If there is a maximum record length that is never exceeded,
we can use ﬁxed-length records of that length
...
0    Perryridge    A-102    400    A-201    900    A-218    700
1    Round Hill    A-305    350    ⊥        ⊥      ⊥        ⊥
2    Mianus        A-215    700    ⊥        ⊥      ⊥        ⊥
3    Downtown      A-101    500    A-110    600    ⊥        ⊥
4    Redwood       A-222    700    ⊥        ⊥      ⊥        ⊥
5    Brighton      A-217    750    ⊥        ⊥      ⊥        ⊥

Figure 11
...
10, using the reserved-space method
...

2
...
We can represent variable-length records by lists of fixed-length records, chained together by pointers
...
Figure 11
...
10
would be represented if we allowed a maximum of three accounts per branch
...
Those branches with fewer than three accounts (for example, Round
Hill) have records with null ﬁelds
...
12
...

The reserved-space method is useful when most records have a length close to
the maximum
...
In our bank
example, some branches may have many more accounts than others
...
To represent the ﬁle by the linked list
method, we add a pointer ﬁeld as we did in Figure 11
...
The resulting structure appears in Figure 11
...

0    Perryridge
1    Round Hill
2    Mianus
3    Downtown
4    Redwood
5
6
7
8

Figure 11
...
10 using linked lists
...
11
...
14

A-102    400
A-305    350
A-215    700
A-101    500
A-222    700
A-217    750

A-201    900
A-218    700
A-110    600

Anchor-block and overﬂow-block structures
...
9 and 11
...
9, we use pointers to chain together only deleted records, whereas
in Figure 11
...

A disadvantage to the structure of Figure 11
...
The ﬁrst record needs to have the branch-name value, but
subsequent records do not
...
This wasted space is signiﬁcant,
since we expect, in practice, that each branch has a large number of accounts
...
Anchor block, which contains the ﬁrst record of a chain
2
...
Figure 11
...

11
...
An instance
of a relation is a set of records
...
Several of the possible ways of organizing records in ﬁles are:
• Heap ﬁle organization
...
There is no ordering of records
...
Records are stored in sequential order, according to the value of a “search key” of each record
...
7
...

A hash function is computed on some attribute of
each record
...
Chapter 12 describes this organization; it is
closely related to the indexing structures described in that chapter
...
However,
in a clustering ﬁle organization, records of several different relations are stored in
the same ﬁle; further, related records of the different relations are stored on the same
block, so that one I/O operation fetches related records from all the relations
...
Section 11
...
2 describes this organization
...
7
...
A search key is any attribute or set of attributes; it need not be
the primary key, or even a superkey
...
The pointer in each record points to
the next record in search-key order
...

Figure 11
...
In that example, the records are stored in search-key order, using branch-name as the search key
...

It is difﬁcult, however, to maintain physical sequential order as records are inserted and deleted, since it is costly to move many records as a result of a single

A-217    Brighton      750
A-101    Downtown      500
A-110    Downtown      600
A-215    Mianus        700
A-102    Perryridge    400
A-201    Perryridge    900
A-218    Perryridge    700
A-222    Redwood       700
A-305    Round Hill    350

Figure 11
...

A-217    Brighton      750
A-101    Downtown      500
A-110    Downtown      600
A-215    Mianus        700
A-102    Perryridge    400
A-201    Perryridge    900
A-218    Perryridge    700
A-222    Redwood       700
A-305    Round Hill    350

A-888    North Town    800

Figure 11
...

insertion or deletion
...
For insertion, we apply the following rules:
1
...

2
...
Otherwise, insert the new record in
an overﬂow block
...

Figure 11
...
15 after the insertion of the record (North
Town, A-888, 800)
...
16 allows fast insertion of new records,
but forces sequential ﬁle-processing applications to process records in an order that
does not match the physical order of the records
...
Eventually, however, the correspondence between search-key order and physical order may be totally lost, in which case sequential processing will become much
less efﬁcient
...
Such reorganizations are costly, and must be done during
times when the system load is low
...
In the extreme case in
which insertions rarely occur, it is possible always to keep the ﬁle in physically sorted
order
...
15 is not needed
...
7
...
Usually, tuples of a relation can be represented as ﬁxed-length records
...
can be mapped to a simple ﬁle structure
...
In such systems, the size of the database
is small, so little is gained from a sophisticated ﬁle structure
...
A simple ﬁle structure reduces the amount of code needed to implement the system
...
We have seen that there are performance
advantages to be gained from careful assignment of records to blocks, and from careful organization of the blocks themselves
...

However, many large-scale database systems do not rely directly on the underlying operating system for ﬁle management
...
The database system stores all relations in this one
ﬁle, and manages the ﬁle itself
...
customer-name = customer
...
Thus, for each
tuple of depositor, the system must locate the customer tuples with the same value for
customer-name
...
Regardless of how these records are located, however,
they need to be transferred from disk into main memory
...

As a concrete example, consider the depositor and customer relations of Figures
11
...
18, respectively
...
19, we show a file structure designed for efficient execution of queries involving depositor ⋈ customer
...
This structure mixes together tuples of two relations, but allows for efﬁcient
processing of the join
...
customer-name
Hayes
Hayes
Hayes
Turner
Figure 11
...

customer-name    customer-street    customer-city
Hayes            Main               Brooklyn
Turner           Putnam             Stamford

Figure 11
...

Since the corresponding
depositor tuples are stored on the disk near the customer tuple, the block containing the
customer tuple contains tuples of the depositor relation needed to process the query
...

A clustering ﬁle organization is a ﬁle organization, such as that illustrated in Figure 11
...
Such a ﬁle
organization allows us to read records that would satisfy the join condition by using
one block read
...
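
The idea of a clustering file organization can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample tuples, the dictionary standing in for disk blocks, and the function names build_clustered_blocks and join_for_customer are all hypothetical, and a real system packs blocks by size rather than by key.

```python
from collections import defaultdict

# Hypothetical sample data in the spirit of the Hayes/Turner example.
customers = [("Hayes", "Main", "Brooklyn"), ("Turner", "Putnam", "Stamford")]
depositors = [("Hayes", "A-102"), ("Hayes", "A-220"),
              ("Hayes", "A-503"), ("Turner", "A-305")]

def build_clustered_blocks(customers, depositors):
    """Place each customer tuple in a block together with its depositor tuples."""
    accounts_by_name = defaultdict(list)
    for name, account in depositors:
        accounts_by_name[name].append(account)
    # Simplification: one simulated block per customer, keyed by customer-name.
    return {
        name: {"customer": (name, street, city),
               "depositors": list(accounts_by_name[name])}
        for name, street, city in customers
    }

def join_for_customer(blocks, name):
    """Produce the join rows for one customer from a single simulated block read."""
    block = blocks[name]  # the one block read
    cname, street, city = block["customer"]
    return [(cname, account, street, city) for account in block["depositors"]]

blocks = build_clustered_blocks(customers, depositors)
rows = join_for_customer(blocks, "Hayes")
```

The same layout makes a scan of the customer relation alone more expensive, since customer tuples are now spread over many blocks.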

Our use of clustering has enhanced processing of a particular join (depositor ⋈ customer), but it results in slowing processing of other types of query
...
Instead of several customer records appearing in one block,
each record is located in a distinct block
...
To locate all tuples of the
customer relation in the structure of Figure 11
...
20
...
Careful use of clustering can produce signiﬁcant
performance gains in query processing
...
8 Data-Dictionary Storage
So far, we have considered only the representation of the relations themselves
...
Hayes     Main     Brooklyn
Hayes     A-102
Hayes     A-220
Hayes     A-503
Turner    Putnam   Stamford
Turner    A-305

Figure 11.19 Clustering file structure

Figure 11
...

schema of the relations
...
Among the types of information that the system must store are these:
• Names of the relations
• Names of the attributes of each relation
• Domains and lengths of attributes
• Names of views deﬁned on the database, and deﬁnitions of those views
• Integrity constraints (for example, key constraints)
In addition, many systems keep the following data on users of the system:
• Names of authorized users
• Accounting information about users
• Passwords or other information used to authenticate users
Further, the database may store statistical and descriptive data about the relations,
such as:
• Number of tuples in each relation
• Method of storage for each relation (for example, clustered or nonclustered)
The data dictionary may also note the storage organization (sequential, hash or heap)
of relations, and the location where each relation is stored:
• If relations are stored in operating system ﬁles, the dictionary would note the
names of the ﬁle (or ﬁles) containing each relation
...

In Chapter 12, in which we study indices, we shall see a need to store information
about each index on each of the relations:

• Name of the index
• Name of the relation being indexed
• Attributes on which the index is deﬁned
• Type of index formed
All this information constitutes, in effect, a miniature database
...

It is generally preferable to store the data about the database in the database itself
...

The exact choice of how to represent system data by relations must be made by
the system designers
...
The Index-metadata relation is thus
not in ﬁrst normal form; it can be normalized, but the above representation is likely
to be more efﬁcient to access
...
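
To make the normalization remark concrete, here is a minimal Python sketch. The attribute names (index_name, relation_name, index_type, index_attributes), the sample index, and the helper functions are assumptions chosen for illustration; the source does not give the exact schema in this extract.

```python
# Hypothetical catalog row; index_attributes is list-valued, so the
# relation is not in first normal form.
index_metadata = [
    {"index_name": "acct_branch_idx", "relation_name": "account",
     "index_type": "B+-tree",
     "index_attributes": ["branch-name", "account-number"]},
]

def normalize(index_rows):
    """Split the list-valued attribute into one row per (index, position)."""
    header = [{k: v for k, v in row.items() if k != "index_attributes"}
              for row in index_rows]
    attrs = [{"index_name": row["index_name"], "position": pos, "attribute": a}
             for row in index_rows
             for pos, a in enumerate(row["index_attributes"], start=1)]
    return header, attrs

def attributes_of(index_name, attrs):
    """In the normalized form, the attribute list needs an extra lookup and sort."""
    rows = sorted((r for r in attrs if r["index_name"] == index_name),
                  key=lambda r: r["position"])
    return [r["attribute"] for r in rows]

header, attrs = normalize(index_metadata)
```

The unnormalized row answers "which attributes does this index cover?" with one access, whereas the normalized form touches a second relation, which is the efficiency trade-off the text alludes to.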

The storage organization and location of the Relation-metadata itself must be recorded elsewhere (for example, in the database code itself), since we need this information to ﬁnd the contents of Relation-metadata
...
9 Storage for Object-Oriented Databases∗∗
The ﬁle-organization techniques described in Section 11
...
However, some extra features are needed to support object-oriented database features, such as set-valued fields and persistent pointers
...
9
...
At the lowest level of data representation, both tuples and the data
parts of objects are simply sequences of bytes
...

Objects in object-oriented databases may lack the uniformity of tuples in relational
databases
...
trast, data are typically required to be (at least) in ﬁrst normal form
...
Such objects have to be managed differently from
records in a relational system
...
Set-valued ﬁelds that have a larger number of
elements can be implemented as relations in the database
...
Each tuple also
contains the object identiﬁer of the object
...
The storage system gives the upper levels
of the database system the view of a set-valued ﬁeld, even though the set-valued ﬁeld
has actually been normalized by creating a new relation
...
Such large objects may each be stored in a separate ﬁle
...
9
...

11
...
2 Implementation of Object Identiﬁers
Since objects are identiﬁed by object identiﬁers (OIDs), an object-storage system needs
a mechanism to locate an object, given an OID
...
If the OIDs are physical
OIDs — that is, they encode the location of the object — then the object can be found
directly
...
A volume or ﬁle identiﬁer
2
...
An offset within the block
A volume is a logical unit of storage that usually corresponds to a disk
...
The unique
identiﬁer is also stored with the object, and the identiﬁers in an OID and the corresponding object should match
...
(A dangling pointer is a
pointer that does not point to a valid object
...
21 illustrates this scheme
...
If the space occupied by the object had been
reallocated, there may be a new object in the location, and it may get incorrectly
addressed by the identiﬁer of the old object
...
The unique
identiﬁer helps to detect such errors, since the unique identiﬁers of the old physical
OID and the new object will not match
...
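
The unique-identifier check can be sketched as follows. This is a simplified model, not the book's implementation: the class names, the in-memory dictionary standing in for disk storage, and the separate block field (inferred from "an offset within the block") are assumptions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PhysicalOID:
    volume: int     # volume or file identifier
    block: int      # block within the volume (assumed field)
    offset: int     # offset within the block
    unique_id: int  # must match the unique identifier stored with the object

@dataclass
class StoredObject:
    unique_id: int
    data: bytes

class DanglingPointerError(Exception):
    """Raised when an OID no longer refers to the object it was created for."""

# Simulated storage: (volume, block, offset) -> object currently stored there.
storage = {}

def dereference(oid):
    """Return the object at the OID's location if the unique identifiers match."""
    obj = storage.get((oid.volume, oid.block, oid.offset))
    if obj is None or obj.unique_id != oid.unique_id:
        raise DanglingPointerError(f"stale OID: {oid}")
    return obj
```

If the space of a deleted object is reallocated to a new object with a different unique identifier, dereferencing the old OID raises an error instead of silently returning the new object.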
[Figure: unique identifiers in an OID. Labels: Object, Unique-Id, Location, Data, Good OID; panel (a): General structure]

Suppose that an object has to be moved to a new block, perhaps because the size of
the object has increased, and the old block has no extra space
...
Rather than change
the OID of the object (which involves changing every object that points to this one),
we leave behind a forwarding address at the old location
...
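
A forwarding address can be modeled as a tagged entry left at the old location. The sketch below is illustrative only; the tuple encoding and the names move_object and fetch are assumptions.

```python
# Simulated storage: location -> ("object", payload) or ("forward", new_location).

def move_object(storage, old_loc, new_loc):
    """Move an object and leave a forwarding address at its old location."""
    storage[new_loc] = storage[old_loc]
    storage[old_loc] = ("forward", new_loc)

def fetch(storage, loc):
    """Follow forwarding addresses; return the payload and the number of accesses."""
    accesses = 0
    while True:
        kind, value = storage[loc]
        accesses += 1
        if kind == "object":
            return value, accesses
        loc = value  # follow the forwarding address
```

Existing OIDs stay valid after the move, at the price of one extra access per forwarding hop.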

11
...
3 Management of Persistent Pointers
We implement persistent pointers in a persistent programming language by using
OIDs
...
An important difference between persistent pointers and in-memory
pointers is the size of the pointer
...
On most current computers, in-memory pointers are
usually 4 bytes long, which is sufﬁcient to address 4 gigabytes of memory
...
Since database systems are often bigger than 4 gigabytes, persistent pointers are usually at least 8 bytes
long
...
This feature further increases the size of persistent pointers
...

The action of looking up an object, given its identiﬁer, is called dereferencing
...
Given a persistent pointer, dereferencing an object has an extra step — ﬁnding the actual location of the object in memory by looking up the persistent pointer
in a table
...
We
can implement the table lookup fairly efﬁciently by using a hash table data structure,
but the lookup is still slow compared to a pointer dereference, even if the object is
already in memory
...
Pointer swizzling is a way to cut down the cost of locating persistent objects that
are already present in memory
...
Now the system carries out an extra step — it stores an in-memory
pointer to the object in place of the persistent pointer
...
(When persistent objects have to be
moved from memory back to disk to make space for other persistent objects, the
system must carry out an extra step to ensure that the object is still in memory
...
Pointer swizzling on pointer dereference, as described here, is
called software swizzling
...
One way to ensure that it will not change is to pin pages containing swizzled objects in the buffer pool, so that they are never replaced until the program
that performed the swizzling has ﬁnished execution
...
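
Swizzling on dereference can be sketched as a pointer object that caches an in-memory reference after its first lookup. This is a minimal model under assumptions: the class names are hypothetical, a dictionary stands in for the disk, and deswizzling and buffer replacement are not modeled.

```python
class PersistentPointer:
    """Holds an OID until first dereference, then an in-memory reference."""
    def __init__(self, oid):
        self.oid = oid
        self.target = None  # set once the pointer has been swizzled

class ObjectStore:
    def __init__(self, disk):
        self.disk = disk          # oid -> object (simulated disk)
        self.resident = {}        # oid -> in-memory object (the lookup table)
        self.table_lookups = 0    # counts the slow path

    def deref(self, ptr):
        if ptr.target is not None:        # swizzled: plain in-memory access
            return ptr.target
        self.table_lookups += 1           # table lookup by OID
        obj = self.resident.get(ptr.oid)
        if obj is None:                   # not in memory: fetch from disk
            obj = self.disk[ptr.oid]
            self.resident[ptr.oid] = obj
        ptr.target = obj                  # swizzle the pointer
        return obj
```

Repeated dereferences of the same pointer then skip the table lookup entirely, which is the saving swizzling is meant to provide.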

11
...
4 Hardware Swizzling
Having two types of pointers, persistent and transient (in-memory), is inconvenient
...
It
would be simpler if both persistent and in-memory pointers were of the same type
...
However, the storage cost of longer persistent pointers will have to be borne by in-memory
pointers as well; understandably, this scheme is unpopular
...
When data in a virtual memory page are accessed, and the operating system detects that the page does not have real storage allocated for it, or has
been access protected, then a segmentation violation is said to occur
...
In most Unix systems, the
mmap system call provides this latter functionality
...

3
...

Hardware swizzling has two major advantages over software swizzling:
1
...

2
...
Software written to deal with in-memory pointers
can thereby deal with persistent pointers as well, without any changes
...
9
...
1 Pointer Representation
Hardware swizzling uses the following representation of persistent pointers contained in objects that are on disk
...
4 The page
identiﬁer in a persistent pointer is actually a small indirect pointer, which we call
the short page identiﬁer
...
The system has to look up the short page identiﬁer in a persistent pointer in the
translation table to ﬁnd the full page identiﬁer
...
In practice, the
translation table is likely to contain much less than the maximum number of elements
(1024 in our example) and will not consume excessive space
...
Hence, a small number of bits is enough to
store the short page identiﬁer
...
Even though only a few
bits are needed for the short page identiﬁer, all the bits of an in-memory pointer,
other than the page-offset bits, are used as the short page identiﬁer
...

The persistent-pointer representation scheme appears in Figure 11
...
The translation table gives the mapping between short page identiﬁers and the full database page identiﬁers for each of the short page identiﬁers in these persistent pointers
...
page
...

Each page maintains extra information so that all persistent pointers in the page
can be found
...
The need to locate all the persistent pointers in a page will become clear
later
...
The term page is generally used to refer to a real-memory or virtual-memory page, and the term
block is used to refer to disk blocks in the database
...
We shall use the terms page and block
interchangeably in this section
...
Figure 11.22 Page image before swizzling
...
9
...
2 Swizzling Pointers on a Page
Initially no page of the database has been allocated a page in virtual memory
...
Database pages get loaded into virtual memory when
the database system needs to access data on the page
...
The system then loads the database page into the virtual-memory page it has allocated to it
...
It takes the following actions for
each persistent pointer in the page
...
Let Pi be the
full page identifier of pi, found in the translation table in page P
...
If page Pi does not already have a virtual-memory page allocated to it, the
system now allocates a free page in virtual memory to it
...
At this
point, the page in virtual address space does not have any storage allocated
for it, either in memory or on disk; it is merely a range of addresses reserved
for the database page
...

2
...
The system updates the persistent pointer being considered, whose
value is ⟨pi, oi⟩, by replacing pi with vi
...
[Figure: page image after swizzling, showing objects 1, 2, and 3 with their PageId and Off fields, and the translation table with columns PageID and FullPageID]

Figure 11
...
22 after the system has
brought that page into memory and swizzled the pointers in it
...
34278 has been mapped to page
5001 in memory, whereas the page whose identiﬁer is 519
...
All the pointers in objects
have been updated to reﬂect the new mapping, and can now be used as in-memory
pointers
...
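
The per-page swizzling step can be sketched as below. This is an assumption-laden model rather than the actual mechanism: virtual-memory pages are plain integers, the function name and parameters are hypothetical, the sketch walks the translation table instead of each pointer, and it first tries the virtual page whose number equals the short page identifier. The sample values follow the numbers that survive from the example figures.

```python
def swizzle_page(pointers, translation, vm_of, in_use, next_free):
    """Swizzle every (short_page_id, offset) pointer on one page.

    pointers:    list of (short_page_id, offset) pairs found on the page
    translation: short_page_id -> full page identifier (the page's table)
    vm_of:       full page identifier -> virtual-memory page already reserved
    in_use:      set of virtual-memory page numbers already reserved
    next_free:   first page number to try when the short identifier is taken
    """
    remap = {}
    new_translation = {}
    for short, full in translation.items():
        vpage = vm_of.get(full)
        if vpage is None:
            # Prefer the page numbered like the short identifier, so that
            # pointers to it need no change; otherwise take the next free page.
            vpage = short if short not in in_use else next_free
            while vpage in in_use:
                vpage += 1
            in_use.add(vpage)
            vm_of[full] = vpage
        remap[short] = vpage
        new_translation[vpage] = full
    swizzled = [(remap[p], off) for p, off in pointers]
    return swizzled, new_translation

pointers = [(2395, 255), (4867, 20), (2395, 170)]
translation = {2395: "679.34278", 4867: "519.56850"}
swizzled, table = swizzle_page(pointers, translation, {}, {2395}, 5001)
```

Here virtual page 2395 is assumed to be taken already, so page 679.34278 lands on 5001, while page 519.56850 can be placed at 4867 and its pointer is left unchanged.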
Thus, objects in in-memory pages contain only in-memory pointers
...
That is indeed an important
advantage!

11
...
4
...
As we described,
a segmentation violation will occur, and will result in a function call on the database
system
...
It ﬁrst determines what database page was allocated to virtual-memory page
vi ; let the full page identiﬁer of the database page be Pi
...
)
2
...

It carries out pointer swizzling on page Pi, as described earlier in “Swizzling Pointers on a Page”
...
After swizzling all persistent pointers in P, the system allows the pointer
dereference that resulted in the segmentation violation to continue
...

If any swizzled pointer that points to an object in page vi is dereferenced later,
the dereference proceeds just like any other virtual-memory access, with no extra
overheads
...
This overhead
has to be incurred on every access to objects in the page, whereas when swizzling is
performed, the overhead is incurred only on the ﬁrst access to an object in the page
...
Hardware swizzling
thus gives excellent performance beneﬁts to applications that repeatedly dereference
pointers
...
9
...
4 Optimizations
Software swizzling performs a deswizzling operation when a page in memory has
to be written back to the database, to convert in-memory pointers back to persistent
pointers
...
For example, as shown in Figure 11
...
34278 (with short
identiﬁer 2395 in the page shown) is mapped to virtual-memory page 5001
...
Thus, the short identiﬁer 5001 in
object 1 and in the table match each other again
...

Several optimizations can be carried out on the basic scheme described here
...
If the system can allocate the page in this attempt, pointers
to it do not need to be updated
...
56850 with short
page identiﬁer 4867 was mapped to virtual-memory page 4867, which is the same as
its short page identiﬁer
...
If every page can be allocated to its appropriate
location in virtual address space, none of the pointers need to be translated, and the
cost of swizzling is reduced signiﬁcantly
...
If they do not, a page that has been brought into virtual
memory will have to be replaced, and that replacement is hard to do, since there may
be in-memory pointers to objects in that page
...
For set-level swizzling, the system uses a
single translation table for all pages in the segment
...

11
...
5 Disk Versus Memory Structure of Objects
The format in which objects are stored in memory may be different from the format in which they are stored on disk in the database
...
Another reason may be that we want to have the database accessible from
different machines, possibly based on different architectures, and from different languages, and from programs compiled under different compilers, all of which result
in differences in the in-memory representation
...
The physical structure (such as sizes and representation of integers)
in the object depends on the machine on which the program is run
...

The solution to this problem is to make the physical representation of objects in the
database independent of the machine and of the compiler
...
It can do
this conversion transparently at the same time that it swizzles pointers in the object,
so the programmer does not need to worry about the conversion
...
One such
language is the Object Deﬁnition Language (ODL) developed by the Object Database
Management Group (ODMG)
...

The definition of the structure of each class in the database is stored (logically) in
the database
...
We can generate this code
automatically, using the stored deﬁnition of the class of the object
...
Hidden pointers are transient pointers
5
...
However, they differ in how the bits of an integer are laid out within a word
...

These pointers point (indirectly) to tables
used to implement certain methods of the object
...
Therefore, when a process
accesses an object, the hidden pointers must be ﬁxed to point to the correct location
...

11
...
6 Large Objects
Objects may also be extremely large; for instance, multimedia objects may occupy
several megabytes of space
...
Large objects containing binary data are called
binary large objects (blobs), while large objects containing character data are called
character large objects (clobs), as we saw in Section 9
...
1
...
Large objects
and long ﬁelds are often stored in a special ﬁle (or collection of ﬁles) reserved for
long-ﬁeld storage
...
Large
objects may need to be stored in a contiguous sequence of bytes when they are
brought into memory; in that case, if an object is bigger than a page, contiguous pages
of the buffer pool must be allocated to store it, which makes buffer management more
difﬁcult
...
If inserts and
deletes need to be supported, we can handle large objects by using B-tree structures
(which we study in Chapter 12)
...

For practical reasons, we may manipulate large objects by using application programs, instead of doing so within the database:
• Text data
...

• Image/Graphical data
...
Although some graphical
data are often managed within the database system itself, special application
software is used for many cases, such as integrated circuit design
...
Audio and video data are typically a digitized, compressed representation created and displayed by separate application software
...

The most widely used method for updating such data is the checkout/checkin
method
...
A checkout and a checkin correspond roughly to a read and a write
...

11
...
They are classiﬁed by the speed with which they can access data, by their cost per unit
of data to buy the memory, and by their reliability
...

• Two factors determine the reliability of storage media: whether a power failure or system crash causes data to be lost, and what the likelihood is of physical failure of the storage device
...
For disks, we can use mirroring
...
By striping
data across disks, these methods offer high throughput rates on large accesses;
by introducing redundancy across disks, they improve reliability greatly
...
RAID level 1 (mirroring) and RAID level
5 are the most commonly used
...
One approach to mapping the database to ﬁles is to use several ﬁles,
and to store records of only one ﬁxed length in any given ﬁle
...

There are different techniques for implementing variable-length records, including the slotted-page method, the pointer method, and the reserved-space
method
...
If we can access several of the records
we want with only one block access, we save disk accesses
...

• One way to reduce the number of disk accesses is to keep as many blocks as
possible in main memory
...
The buffer is that part of main memory available
...
The subsystem responsible for the
allocation of buffer space is called the buffer manager
...
There are schemes to detect
dangling persistent pointers
...
The hardware-based schemes use the virtual-memory-management support implemented in hardware, and made accessible to user programs by many current-generation operating systems
...
11
...
1 List the physical storage media available on the computers you use routinely
...

11
...
3 Consider the following data and parity-block arrangement on four disks:
Disk 1
B1
P1
B8

...

...

...

Disk 3
B3
B6
B9

...

...

...

The Bi’s represent data blocks; the Pi’s represent parity blocks
...
What, if any, problem might
this arrangement present?

11
...
Assume that partially written blocks can
be detected
...
e
...
Suggest schemes
for getting the effect of atomic block writes with the following RAID schemes
...

a
...
RAID level 5 (block interleaved, distributed parity)
11
...
Thus, the data in the failed disk must be rebuilt and written
to the replacement disk while the system is in operation
...

11
...
MRU is preferable to LRU
...
LRU is preferable to MRU
...
7 Consider the deletion of record 5 from the ﬁle of Figure 11
...
Compare the
relative merits of the following techniques for implementing the deletion:
a
...

b
...

c
...

11
...
9 after each of the following steps:
a
...

b
...

c
...

11
...
Explain your answer
...
10 Give an example of a database application in which the pointer method of representing variable-length records is preferable to the reserved-space method
...

11
...
12 after each of the following steps:
a
...

b
...

c
...

11
...
12?
11
...
13 after each of the following steps:
a
...

b
...

c
...

11
...

11
...
Discuss how the control on replacement
that it provides would be useful for the implementation of database systems
...
16 In the sequential ﬁle organization, why is an overﬂow block used even if there
is, at the moment, only one overﬂow record?
11
...
Store each relation in one ﬁle
...
Store multiple relations (perhaps even the entire database) in one ﬁle
...
18 Consider a relational database with two relations:
course (course-name, room, instructor)
enrollment (course-name, student-name, grade)
Deﬁne instances of these relations for three courses, each of which enrolls ﬁve
students
...

11
...
For
each block in the ﬁle, two bits are maintained in the bitmap
...
Such bitmaps can be kept in memory even for quite large ﬁles
...
Describe how to keep the bitmap up-to-date on record insertions and deletions
...
Outline the beneﬁt of the bitmap technique over free lists when searching
for free space and when updating free space information
...
20 Give a normalized version of the Index-metadata relation, and explain why using the normalized version would result in worse performance
...
21 Explain why a physical OID must contain more information than a pointer to a
physical storage location
...
11
...
22 If physical OIDs are used, an object can be relocated by keeping a forwarding
pointer to its new location
...

11
...
Describe how the unique-id scheme helps in
detecting dangling pointers in an object-oriented database
...
24 Consider the example on page 435, which shows that there is no need for
deswizzling if hardware swizzling is used
...
34278 from 2395 to 5001
...
Rosch and Wethington [1999]
presents an excellent overview of computer hardware, including extensive coverage of all types of storage technology such as ﬂoppy disks, magnetic disks, optical
disks, tapes, and storage interfaces
...
Flash memory is discussed by Dippert and Levy [1993]
...

Alternative disk organizations that provide a high degree of fault tolerance include those described by Gray et al
...
Disk striping
is described by Salem and Garcia-Molina [1986]
...
[1988] and Chen and Patterson [1990]
...
[1994] presents an excellent survey of RAID principles and
implementation
...
The log-based ﬁle
system, which makes disk access sequential, is described in Rosenblum and Ousterhout [1991]
...
The
broadcast medium can be viewed as a level of the storage hierarchy — as a broadcast
disk with high latency
...
[1995]
...
Further discussion of storage issues in mobile computing appears in Douglis
et al
...

Basic data structures are discussed in Cormen et al
...
There are several papers
describing the storage structure of speciﬁc database systems
...
[1976]
discusses System R
...
[1981] reviews System R in retrospect
...
The structure of the Wisconsin Storage System (WiSS) is
described in Chou et al
...
A software tool for the physical design of relational
databases is described by Finkelstein et al
...

Buffer management is discussed in most operating system texts, including in Silberschatz and Galvin [1998]
...
Chou and
DeWitt [1985] presents algorithms for buffer management in database systems, and
describes a performance evaluation
...
[1997] describes techniques used in
the buffer manager of the Oracle database system
...
White and DeWitt [1994] describes the virtual-memory-mapped buffer-management scheme used
in the ObjectStore OODB system and in the QuickStore storage manager
...
The Exodus object storage manager is described in Carey
et al
...
Biliris and Orenstein [1994] provides a survey of storage systems for
object-oriented databases
...
[1994] describes a storage manager for main-memory databases
...
CHAPTER 12  Indexing and Hashing
...
For example, a query like “Find all accounts at the Perryridge branch” or “Find the balance
of account number A-101” references only a fraction of the account records
...
Ideally, the
system should be able to locate these records directly
...

12
...
If we want to learn about a particular topic (speciﬁed by a word or
a phrase) in this textbook, we can search for the topic in the index at the back of the
book, ﬁnd the pages where it occurs, and then read the pages to ﬁnd the information
we are looking for
...
Moreover, the index is much smaller than the book,
further reducing the effort needed to ﬁnd the words we are looking for
...
To ﬁnd a book by a particular author, we would search in the author
catalog, and a card in the catalog tells us where to ﬁnd the book
...

Database system indices play the same role as book indices or card catalogs in
libraries
...

Keeping a sorted list of account numbers would not work well on very large
databases with millions of accounts, since the index would itself be very big; further,
even though keeping the index sorted reduces the search time, ﬁnding an account
can still be rather time-consuming
...
We shall discuss several of these techniques in this chapter
...
Based on a sorted ordering of the values
...
Based on a uniform distribution of values across a range of
buckets
...

We shall consider several techniques for both ordered indexing and hashing
...
Rather, each technique is best suited to particular database
applications
...
Access types
can include ﬁnding records with a speciﬁed attribute value and ﬁnding records
whose attribute values fall in a speciﬁed range
...

• Insertion time: The time it takes to insert a new data item
...

• Deletion time: The time it takes to delete a data item
...

• Space overhead: The additional space occupied by an index structure
...

We often want to have more than one index for a ﬁle
...

An attribute or set of attributes used to look up records in a ﬁle is called a search
key
...
This duplicate meaning for key is (unfortunately) well established
in practice
...

12
...
Each
index structure is associated with a particular search key
...

The records in the indexed ﬁle may themselves be stored in some sorted order, just
as books in a library are stored according to some attribute such as the Dewey decimal
...

Figure 12.1 Sequential file for account records
...
A ﬁle may have several indices, on different search keys
...
(The term primary index is sometimes
used to mean an index on a primary key
...
) Primary indices are also called clustering indices
...

Indices whose search key speciﬁes an order different from the sequential order of the
ﬁle are called secondary indices, or nonclustering indices
...
2
...

Such ﬁles, with a primary index on the search key, are called index-sequential ﬁles
...
They are
designed for applications that require both sequential processing of the entire ﬁle and
random access to individual records
...
1 shows a sequential ﬁle of account records taken from our banking example
...
1, the records are stored in search-key order, with
branch-name used as the search key
...
2
...
1 Dense and Sparse Indices
An index record, or index entry, consists of a search-key value, and pointers to one
or more records with that value as their search-key value
...

There are two types of ordered indices that we can use:
• Dense index: An index record appears for every search-key value in the ﬁle
...
The rest of the
records with the same search-key value would be stored sequentially after the
ﬁrst record, since, because the index is a primary one, records are sorted on
the same search key
...

• Sparse index: An index record appears for only some of the search-key values
...
To locate a record,
we ﬁnd the index entry with the largest search-key value that is less than or
equal to the search-key value for which we are looking
...

Figures 12
...
3 show dense and sparse indices, respectively, for the account
ﬁle
...
Using the
dense index of Figure 12
...
We process this record, and follow the pointer in that record to locate the
next record in search-key (branch-name) order
...
If we are using the sparse
index (Figure 12
...
” Since the last entry (in alphabetic order) before “Perryridge” is “Mianus,” we follow that pointer
...
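
The sparse-index lookup just described can be sketched with Python's bisect module. The record list, the sparse index entries (Brighton, Mianus, Redwood), and the function name are assumptions for illustration; record positions in a list stand in for disk pointers.

```python
from bisect import bisect_right

# Illustrative account file, sorted on branch-name (the search key).
records = [
    ("Brighton", "A-217", 750), ("Downtown", "A-101", 500),
    ("Downtown", "A-110", 600), ("Mianus", "A-215", 700),
    ("Perryridge", "A-102", 400), ("Perryridge", "A-201", 900),
    ("Perryridge", "A-218", 700), ("Redwood", "A-222", 700),
    ("Round Hill", "A-305", 350),
]
# Assumed sparse index: (search-key value, position of first such record).
sparse_index = [("Brighton", 0), ("Mianus", 3), ("Redwood", 7)]

def lookup(records, sparse_index, key):
    """Find all records with the given search-key value via the sparse index."""
    keys = [k for k, _ in sparse_index]
    pos = bisect_right(keys, key) - 1   # largest index entry <= key
    if pos < 0:
        return []                       # key sorts before every index entry
    i = sparse_index[pos][1]
    result = []
    # Scan the file sequentially from the indexed record onward.
    while i < len(records) and records[i][0] <= key:
        if records[i][0] == key:
            result.append(records[i])
        i += 1
    return result
```

For "Perryridge" the search starts at the "Mianus" entry, skips one record, and then collects the three Perryridge records.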

As we have seen, it is generally faster to locate a record if we have a dense index
rather than a sparse index
...

There is a trade-off that the system designer must make between access time and
space overhead
...
The reason this design is a good trade-off is that the dominant cost in processing
...

[Figure 12 ...: branch-name values Brighton, Downtown, Mianus, Perryridge, Redwood, Round Hill; account numbers, in file order: A-217, A-101, A-110, A-215, A-102, A-201, A-218, A-222, A-305; balances, in file order: 750, 500, 600, 700, 400, 900, 700, 700, 350]

Figure 12.3 Sparse index
...
Once we have brought in the block, the time to scan the entire block
is negligible
...
Thus, unless the record is on an overﬂow block (see Section 11
...
1),
we minimize block accesses while keeping the size of the index (and thus, our space
overhead) as small as possible
...
It is easy to modify our
scheme to handle this situation
...
2
...
2 Multilevel Indices
Even if we use a sparse index, the index itself may become too large for efﬁcient
processing
...
If we have one index record per block, the index has
10,000 records
...
Thus, our index occupies 100 blocks
...

If an index is sufﬁciently small to be kept in main memory, the search time to ﬁnd
an entry is low
...
Binary search can be used on the index
ﬁle to locate an entry, but the search still has a large cost
...
(⌈x⌉ denotes
the least integer that is greater than or equal to x; that is, we round upward
On a disk system where a
block read takes 30 milliseconds, the search will take 210 milliseconds, which is long
...
In
that case, a sequential search is typically used, and that requires b block reads, which
will take even longer
...
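
The figures quoted above can be checked with a short calculation. The sketch assumes 100 index records per block (implied by an index of 10,000 records occupying 100 blocks) and a worst-case binary-search cost of ⌈log2(b)⌉ block reads for an index of b blocks; the variable names are illustrative.

```python
import math

def binary_search_block_reads(index_blocks):
    """Worst-case block reads for binary search over an index of b blocks."""
    return math.ceil(math.log2(index_blocks))

data_blocks = 10_000                 # one index record per data block
index_records = data_blocks
entries_per_block = 100              # assumption implied by "100 blocks"
index_blocks = math.ceil(index_records / entries_per_block)

reads = binary_search_block_reads(index_blocks)
search_ms = reads * 30               # 30 milliseconds per block read
```

With these numbers the index occupies 100 blocks, binary search reads 7 of them, and the search takes 210 milliseconds, matching the text.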

To deal with this problem, we treat the index just as we would treat any other
sequential ﬁle, and construct a sparse index on the primary index, as in Figure 12
...

To locate a record, we ﬁrst use binary search on the outer index to ﬁnd the record for
the largest search-key value less than or equal to the one that we desire
...
We scan this block until we ﬁnd the record that
has the largest search-key value less than or equal to the one that we desire
...

Using the two levels of indexing, we have read only one index block, rather than
the seven we read with binary search, if we assume that the outer index is already in
main memory
...
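The two-level lookup just described can be sketched as follows. This is a minimal illustration, not the book's implementation: blocks are modeled as Python lists, the block contents and the helper names (floor_entry, lookup) are assumptions of ours, and we assume that records with equal search-key values never straddle a block boundary.

```python
from bisect import bisect_right


def floor_entry(entries, key):
    """Return the pointer stored with the largest search key <= key, or None."""
    keys = [k for k, _ in entries]
    pos = bisect_right(keys, key)          # binary search within one index block
    return entries[pos - 1][1] if pos > 0 else None


# Data blocks of the account file, ordered by branch-name (assumed block layout).
data_blocks = [
    [("Brighton", "A-217"), ("Downtown", "A-101"), ("Downtown", "A-110")],
    [("Mianus", "A-215"), ("Perryridge", "A-102"),
     ("Perryridge", "A-201"), ("Perryridge", "A-218")],
    [("Redwood", "A-222"), ("Round Hill", "A-305")],
]
# Inner index: one sparse entry per data block, split over two index blocks.
inner_index = [
    [("Brighton", 0), ("Mianus", 1)],
    [("Redwood", 2)],
]
# Outer index: one sparse entry per inner index block.
outer_index = [("Brighton", 0), ("Redwood", 1)]


def lookup(key):
    inner_no = floor_entry(outer_index, key)            # search the outer index
    if inner_no is None:
        return []
    data_no = floor_entry(inner_index[inner_no], key)   # scan one inner index block
    if data_no is None:
        return []
    return [acct for k, acct in data_blocks[data_no] if k == key]


print(lookup("Perryridge"))   # follows the "Mianus" entry, then scans that data block
```

With the outer index held in memory, only one inner index block and one data block are touched per lookup.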
In such a case, we can create yet another level of index
...
Indices with two or more
levels are called multilevel indices
...
Each level of index could correspond to a unit of physical storage
...

A typical dictionary is an example of a multilevel index in the nondatabase world
...
Such a book

[Figure: outer index, inner index (index block 0, index block 1), and data blocks (data block 0, data block 1)]

Figure 12
...

12
...

Multilevel indices are closely related to tree structures, such as the binary trees
used for in-memory indexing
...
3
...
2
...
3 Index Update
Regardless of what form of index is used, every index must be updated whenever a
record is either inserted into or deleted from the ﬁle
...

• Insertion
...
Again, the actions the system takes next
depend on whether the index is dense or sparse:
Dense indices:
1
...

2
...
If the index record stores pointers to all records with the same
search-key value, the system adds a pointer to the new record to
the index record
...
Otherwise, the index record stores a pointer to only the ﬁrst record
with the search-key value
...

Sparse indices: We assume that the index stores an entry for each block
...
On the other
hand, if the new record has the least search-key value in its block, the
system updates the index entry pointing to the block; if not, the system
makes no change to the index
...
To delete a record, the system ﬁrst looks up the record to be deleted
...
If the deleted record was the only record with its particular search-key
value, then the system deletes the corresponding index record from
the index
...
Otherwise the following actions are taken:
a
...

© The McGraw−Hill
Companies, 2001

b
...
In this case, if the deleted record was
the ﬁrst record with the search-key value, the system updates the
index record to point to the next record
...
If the index does not contain an index record with the search-key value
of the deleted record, nothing needs to be done to the index
...
Otherwise the system takes the following actions:
a
...
If the next
search-key value already has an index entry, the entry is deleted
instead of being replaced
...
Otherwise, if the index record for the search-key value points to the
record being deleted, the system updates the index record to point
to the next record with the same search-key value
...
On deletion or insertion, the system updates the lowest-level index as described
...
The same technique
applies to further levels of the index, if there are any
...
2
...
A primary index may be sparse, storing only some
of the search-key values, since it is always possible to ﬁnd records with intermediate
search-key values by a sequential access to a part of the ﬁle, as described earlier
...

A secondary index on a candidate key looks just like a dense primary index, except
that the records pointed to by successive values in the index are not stored sequentially
...
If the search key of a primary index is not a candidate key, it sufﬁces
if the index points to the ﬁrst record with a particular value for the search key, since
the other records can be fetched by a sequential scan of the ﬁle
...
The remaining records with the same search-key value could be anywhere in the ﬁle, since the
records are ordered by the search key of the primary index, rather than by the search
key of the secondary index
...

12
...
5

[Figure: the account file with a secondary index whose entries are the balances 350, 400, 500, 600, 700, 750, and 900]

Secondary index on account file, on noncandidate key balance
...
The pointers in such a secondary index do not point
directly to the ﬁle
...
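A secondary index on a noncandidate key can be sketched with one bucket of record pointers per search-key value. The sketch below is only an illustration under our own assumptions: the file is a Python list, a "pointer" is a position in that list, and the function names are hypothetical.

```python
from collections import defaultdict

# The sample account file, stored in branch-name order (the primary-index order).
account_file = [
    ("A-217", "Brighton", 750), ("A-101", "Downtown", 500),
    ("A-110", "Downtown", 600), ("A-215", "Mianus", 700),
    ("A-102", "Perryridge", 400), ("A-201", "Perryridge", 900),
    ("A-218", "Perryridge", 700), ("A-222", "Redwood", 700),
    ("A-305", "Round Hill", 350),
]


def build_secondary_index(records, field):
    """Map each value of records[*][field] to a bucket of record positions."""
    buckets = defaultdict(list)
    for position, record in enumerate(records):
        buckets[record[field]].append(position)
    return dict(sorted(buckets.items()))   # keep index entries in search-key order


balance_index = build_secondary_index(account_file, field=2)


def lookup_by_balance(balance):
    # Each pointer in the bucket may lead to a different part of the file.
    return [account_file[p] for p in balance_index.get(balance, [])]


print(list(balance_index))        # the index entries, in balance order
print(lookup_by_balance(700))     # three records, scattered through the file
```

The three records with balance 700 sit at positions 3, 6, and 7 of the file, which is why the index must be dense and cannot rely on a sequential scan.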

Figure 12
...

A sequential scan in primary index order is efﬁcient because records in the ﬁle are
stored physically in the same order as the index order
...
Because secondary-key order and
physical-key order differ, if we attempt to scan the ﬁle sequentially in secondary-key
order, the reading of each record is likely to require the reading of a new block from
disk, which is very slow
...
If a ﬁle has multiple indices, whenever the ﬁle is
modiﬁed, every index must be updated
...
However, they impose a signiﬁcant overhead
on modiﬁcation of the database
...

12
...
Although this degradation can be remedied by reorganization of the ﬁle,
frequent reorganizations are undesirable
...
12
...
6

...

The B+ -tree index structure is the most widely used of several index structures
that maintain their efﬁciency despite insertion and deletion of data
...
Each nonleaf node in the tree has between ⌈n/2⌉
and n children, where n is fixed for a particular tree
...
The overhead is acceptable even for frequently modiﬁed ﬁles, since the cost of ﬁle reorganization is avoided
...
This space overhead, too, is acceptable given
the performance beneﬁts of the B+ -tree structure
...
3
...
Figure 12
...
It contains up to n − 1 search-key values K1 , K2 ,
...
, Pn
...

We consider ﬁrst the structure of the leaf nodes
...
, n − 1, pointer
Pi points to either a ﬁle record with search-key value Ki or to a bucket of pointers,
each of which points to a ﬁle record with search-key value Ki
...
Pointer Pn has a special purpose that we shall discuss
shortly
...
7 shows one leaf node of a B+ -tree for the account ﬁle, in which we have
chosen n to be 3, and the search key is branch-name
...

Now that we have seen the structure of a leaf node, let us consider how search-key
values are assigned to particular nodes
...
We
allow leaf nodes to contain as few as ⌈(n − 1)/2⌉ values
...
Thus, if Li and Lj are leaf nodes and i < j, then every search-key value in Li is less than every search-key value in Lj
...

Now we can explain the use of the pointer Pn
...
This ordering allows for efﬁcient sequential processing of the ﬁle
...

The structure of nonleaf nodes is the same as that for leaf nodes, except that all pointers are pointers to tree nodes
...
12
...
3

[Figure: a leaf node holding the search keys Brighton and Downtown, with pointers into the account file records A-212 Brighton 750, A-101 Downtown 500, and A-110 Downtown 600]

...
7

A leaf node for account B+ -tree index (n = 3)
...
The number of pointers in a node is called the fanout of
the node
...
For i = 2, 3,
...
Pointer Pm points to the part of the subtree that contains those key
values greater than or equal to Km − 1 , and pointer P1 points to the part of the subtree
that contains those search-key values less than K1
...

It is always possible to construct a B+ -tree, for any n, that satisﬁes the preceding
requirements
...
8 shows a complete B+ -tree for the account ﬁle (n = 3)
...

As an example of a B+-tree for which the root must have fewer than ⌈n/2⌉ values,
Figure 12
...

These examples of B+ -trees are all balanced
...
This property is a requirement for a B+ -tree
...
” It is the balance property of B+ -trees
that ensures good performance for lookup, insertion, and deletion
...
8

Redwood

Perryridge

Redwood

B+ -tree for account ﬁle (n = 3)
...
12
...
9

Perryridge Redwood

Round Hill

B+ -tree for account ﬁle with n = 5
...
3
...
Suppose that we wish to ﬁnd
all records with a search-key value of V
...
10 presents pseudocode for doing
so
...
First, we examine the root node, looking for the smallest search-key value greater than V
...
We then follow pointer Pi to another node
...
In this case
we follow Pm to another node
...
Eventually, we reach a leaf node
...
If
the value V is not found in the leaf node, no record with key value V exists
...
If there are K search-key values in the file, the path is no longer than
⌈log_⌈n/2⌉(K)⌉
...
With a search-key
size of 12 bytes, and a disk-pointer size of 8 bytes, n is around 200
...
With
n = 100, if we have 1 million search-key values in the ﬁle, a lookup requires only
procedure find(value V)
    set C = root node
    while C is not a leaf node begin
        Let Ki = smallest search-key value, if any, greater than V
        if there is no such value then begin
            Let m = the number of pointers in the node
            set C = node pointed to by Pm
        end
        else set C = the node pointed to by Pi
    end
    if there is a key value Ki in C such that Ki = V
        then pointer Pi directs us to the desired record or bucket
        else no record with key value V exists
end procedure
Figure 12
...
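The pseudocode for find translates almost line for line into a runnable sketch. The Node class, the hand-built tree (n = 3, search key branch-name), and the use of account-number lists as buckets are assumptions made for illustration; the leaf pointer Pn to the next leaf is left out because lookup does not need it.

```python
from bisect import bisect_right


class Node:
    def __init__(self, keys, pointers, leaf=False):
        self.keys = keys            # sorted search-key values K1 .. K(m-1)
        self.pointers = pointers    # children (nonleaf) or buckets (leaf)
        self.leaf = leaf


def find(root, v):
    c = root
    while not c.leaf:
        # Index of the smallest key greater than v. If there is none, this is
        # len(c.keys), which selects the last pointer Pm of the node.
        i = bisect_right(c.keys, v)
        c = c.pointers[i]
    for k, p in zip(c.keys, c.pointers):
        if k == v:
            return p                # pointer to the desired record or bucket
    return None                     # no record with key value v exists


# Hand-built tree over the sample account file (assumed shape, n = 3).
l1 = Node(["Brighton", "Downtown"], [["A-217"], ["A-101", "A-110"]], leaf=True)
l2 = Node(["Mianus"], [["A-215"]], leaf=True)
l3 = Node(["Perryridge"], [["A-102", "A-201", "A-218"]], leaf=True)
l4 = Node(["Redwood", "Round Hill"], [["A-222"], ["A-305"]], leaf=True)
root = Node(["Perryridge"], [Node(["Mianus"], [l1, l2]), Node(["Redwood"], [l3, l4])])

print(find(root, "Perryridge"))
print(find(root, "Clearview"))
```

Because keys equal to a nonleaf key are found in the subtree to its right, bisect_right (rather than bisect_left) matches the rule "smallest search-key value greater than V".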

12
...
Thus, at most four blocks need to be
read from disk for the lookup
...

An important difference between B+ -tree structures and in-memory tree structures, such as binary trees, is the size of a node, and as a result, the height of the
tree
...
In a B+ -tree,
each node is large—typically a disk block—and a node can have a large number of
pointers
...
In
a balanced binary tree, the path for a lookup can be of length ⌈log₂(K)⌉, where K is
the number of search-key values
...
If each node were on a different disk block, 20 block reads would be required to process a lookup, in contrast to
the four block reads for the B+ -tree
...
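The two path lengths compared here follow from the bound on lookup paths. A small sketch, using the values given in the text (n = 100 and 1 million search-key values) and an integer-only helper of our own naming, reproduces both numbers.

```python
def ceil_log(base: int, k: int) -> int:
    """Smallest h such that base**h >= k, i.e. ceil(log_base(k)), using integers only."""
    if base < 2:
        raise ValueError("base must be at least 2")
    h, reach = 0, 1
    while reach < k:
        reach *= base
        h += 1
    return h


n = 100                 # maximum number of pointers in a B+-tree node
K = 1_000_000           # number of search-key values in the file
min_fanout = (n + 1) // 2               # ceil(n/2) pointers per nonleaf node

print(ceil_log(min_fanout, K))          # B+-tree: at most 4 nodes on a lookup path
print(ceil_log(2, K))                   # balanced binary tree: about 20 nodes
```

The difference comes entirely from the node size: a fanout of at least 50 shrinks the height from about 20 to 4.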
3
...

Furthermore, when a node is split or a pair of nodes is combined, we must ensure
that balance is preserved
...

Under this assumption, insertion and deletion are performed as deﬁned next
...
Using the same technique as for lookup, we ﬁnd the leaf node in
which the search-key value would appear
...
If the search-key value does not appear,
we insert the value in the leaf node, and position it such that the search keys
are still in order
...

• Deletion
...
We remove the search-key value from the
leaf node if there is no bucket associated with that search-key value or if the
bucket becomes empty as a result of the deletion
...
Assume that we wish
to insert a record with a branch-name value of “Clearview” into the B+ -tree of Figure 12
...
Using the algorithm for lookup, we ﬁnd that “Clearview” should appear
in the node containing “Brighton” and “Downtown
...
” Therefore, the node is split into two nodes
...
11
shows the two leaf nodes that result from inserting “Clearview” and splitting the
node containing “Brighton” and “Downtown
...
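The leaf split in this example can be sketched in a few lines. We assume the rule suggested by the insertion pseudocode, namely that the first ⌈n/2⌉ of the n values stay in the existing leaf and the rest move to a new leaf; the function name and return shape are our own.

```python
from bisect import insort


def insert_into_leaf(keys, new_key, n):
    """Insert new_key into a leaf that may hold at most n - 1 search-key values.

    Returns (left, right, key_for_parent); right and key_for_parent are None
    when no split is needed.
    """
    keys = list(keys)
    insort(keys, new_key)              # keep the search keys in order
    if len(keys) <= n - 1:
        return keys, None, None
    cut = (n + 1) // 2                 # ceil(n/2) values stay in the old leaf
    left, right = keys[:cut], keys[cut:]
    return left, right, right[0]       # smallest key of the new leaf goes to the parent


print(insert_into_leaf(["Brighton", "Downtown"], "Clearview", n=3))
```

For n = 3 the full leaf splits into one node with "Brighton" and "Clearview" and a new node with "Downtown", and "Downtown" is the value that must then be inserted into the parent.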
12
...
11

Clearview

Downtown

Split of leaf node on insertion of “Clearview
...

Having split a leaf node, we must insert the new leaf node into the B+ -tree structure
...

We need to insert this search-key value into the parent of the leaf node that was split
...
12 shows the result of the insertion
...
It was possible to perform this insertion
because there was room for an added search-key value
...
In the worst case, all nodes along the path to the
root must be split
...

The general technique for insertion into a B+ -tree is to determine the leaf node l
into which insertion must occur
...
If this insertion causes a split, proceed recursively up the tree until either
an insertion does not cause a split or a new root is created
...
13 outlines the insertion algorithm in pseudocode
...
Ki and L
...
The
pseudocode also makes use of the function parent(L) to ﬁnd the parent of a node L
...
The pseudocode refers to inserting an entry (V, P ) into a node
...
For internal nodes, P is stored just after V
...
First,
let us delete “Downtown” from the B+ -tree of Figure 12
...
We locate the entry for
“Downtown” by using our lookup algorithm
...
Since, in our example, n = 3 and
0 < ⌈(n − 1)/2⌉, this node must be eliminated from the B+-tree
...
12

Downtown

Mianus

Redwood

Mianus

Perryridge

Redwood Round Hill

Insertion of “Clearview” into the B+ -tree of Figure 12
...

12
...
K1 ,
...
Kn−1 , V such that exactly
n/2 of the values L
...
, L
...
Km ≥ V
/* Note: V must be either L
...
Pm , L
...
, L
...
Kn−1 to L′
if (V < V′) then insert (P, V) in L
else insert (P, V) in L′
end
else begin
if (V = V′) /* V is smallest value to go to L′ */
then add P, L
...
, L
...
Kn−1 , L
...
Pm ,
...
Pn−1 , L
...
Pn to L
delete L
...
, L
...
Kn−1 , L
...
Pn = L
...
Pn = L
end
end
end procedure
Figure 12
...

Perryridge

Mianus

Brighton

Clearview

Figure 12
...
12
...
In our example, this deletion leaves
the parent node, which formerly contained three pointers, with only two pointers
...
The resulting B+ -tree appears in Figure 12
...

When we make a deletion from a parent of a leaf node, the parent node itself may
become too small
...
14
...
When we delete the pointer to this node in the latter’s parent, the parent is left
with only one pointer
...

However, since the parent node contains useful information, we cannot simply delete
it
...
This sibling node has room to accommodate the information contained
in our now-too-small node, so we coalesce these nodes, such that the sibling node
now contains the keys “Mianus” and “Redwood
...
Figure 12
...
Notice that the root has only one child pointer after the deletion, so
it is deleted and its sole child becomes the root
...

It is not always possible to coalesce nodes
...
12
...
Once again, the leaf node containing “Perryridge” becomes empty
...
However, in this example, the sibling node already contains the maximum number of pointers: three
...
The solution in this case is to redistribute the pointers such that each sibling has two pointers
...
15

Mianus

Redwood

Redwood

Round Hill

Deletion of “Perryridge” from the B+ -tree of Figure 12
...

12
...
16

Downtown

Redwood

Mianus

Redwood

Round Hill

Deletion of “Perryridge” from the B+ -tree of Figure 12
...

Figure 12
...
Note that the redistribution of values necessitates a change of a search-key value in the parent of the two siblings
...
If the node is too small, we delete it from its parent
...

Figure 12
...
The procedure
swap_variables(L, L′) merely swaps the values of the (pointer) variables L and L′;
this swap has no effect on the tree itself
...
” For nonleaf nodes, this criterion means fewer than ⌈n/2⌉ pointers;
for leaf nodes, it means fewer than ⌈(n − 1)/2⌉ values
...
We can also redistribute
entries by repartitioning entries equally between the two nodes
...
In the case of leaf nodes, the pointer to
an entry actually precedes the key value, so the pointer P precedes the key value V
...

It is worth noting that, as a result of deletion, a key value that is present in an
internal node of the B+ -tree may not be present at any leaf of the tree
...
It can be shown that the number of I/O operations needed for a
worst-case insertion or deletion is proportional to log_⌈n/2⌉(K), where n is the maximum number of pointers in a node, and K is the number of search-key values
...
It is the speed of operation on B+ -trees that makes
them a frequently used index structure in database implementations
...
3
...
3, the main drawback of index-sequential ﬁle organization is the degradation of performance as the ﬁle grows: With growth, an increasing
percentage of index records and actual records become out of order, and are stored in
overﬂow blocks
...
We solve the degradation problem for storing the actual records by using
the leaf level of the B+ -tree to organize the blocks containing the actual records
...
12
...
Pn = L
...
Pm is the last pointer in L
remove (L
...
Pm ) from L
insert (L
...
Km−1
end
else begin
let m be such that (L
...
Km ) is the last pointer/value
pair in L
remove (L
...
Km ) from L
insert (L
...
Km ) as the ﬁrst pointer and value in L,
by shifting other pointers and values right
replace V in parent(L) by L
...
symmetric to the then case
...
17

Deletion of entry from a B+ -tree
...
use the B+ -tree structure not only as an index, but also as an organizer for records in
a ﬁle
...
Figure 12
...
Since records are usually larger than pointers, the maximum number of records
that can be stored in a leaf node is less than the number of pointers in a nonleaf node
...

Insertion and deletion of records from a B+ -tree ﬁle organization are handled in
the same way as insertion and deletion of entries in a B+ -tree index
...
If the
block located has enough free space for the record, the system stores the record in the
block
...

The split propagates up the B+ -tree in the normal fashion
...
If a block B becomes less
than half full as a result, the records in B are redistributed with the records in an adjacent block B
...
The system updates the nonleaf
nodes of the B+ -tree in the usual fashion
...
We can improve the utilization of space in a B+-tree by involving more sibling nodes in redistribution during splits and merges
...

During insertion, if a node is full the system attempts to redistribute some of its
entries to one of the adjacent nodes, to make space for a new entry
...
Since the three nodes together contain one more record
than can ﬁt in two nodes, each node will be about two-thirds full
...
(⌊x⌋ denotes the greatest integer that is less than or equal to x; that
is, we drop the fractional part, if any
...
18

M

(F,7) (G,3) (H,3)

(K,1) (L,6)

B+ -tree ﬁle organization
...
12
...
If both sibling
nodes have ⌊2n/3⌋ records, instead of borrowing an entry, the system redistributes
the entries in the node and in the two siblings evenly between two of the nodes, and
deletes the third node
...
With three adjacent nodes used for redistribution,
each node can be guaranteed to have ⌊3n/4⌋ entries
...
However, the cost of update becomes higher as more
sibling nodes are involved in the redistribution
...
4 B-Tree Index Files
B-tree indices are similar to B+ -tree indices
...

In the B+ -tree of Figure 12
...
Every search-key value appears in some leaf node;
several are repeated in nonleaf nodes
...
Figure 12
...
12
...
However, since search keys that appear in
nonleaf nodes appear nowhere else in the B-tree, we are forced to include an additional pointer ﬁeld for each search key in a nonleaf node
...

A generalized B-tree leaf node appears in Figure 12
...
20b
...
In nonleaf nodes, the pointers Pi are the tree pointers that we used also for B+ -trees, while the pointers Bi are
bucket or ﬁle-record pointers
...
This discrepancy
occurs because nonleaf nodes must include pointers Bi , thus reducing the number of

Downtown

Downtown
bucket

Brighton

Brighton
bucket

Clearview

Clearview
bucket

Figure 12
...
12
...
12
...
5

P1

K1

P2

Pn−1

...

Pm−1

Bm−1

Km−1

Pm

(b)

Figure 12
...
(a) Leaf node
...

search keys that can be held in these nodes
...

The number of nodes accessed in a lookup in a B-tree depends on where the search
key is located
...
In contrast, it is sometimes possible to ﬁnd the desired value
in a B-tree before reaching a leaf node
...
Moreover, the fact
that fewer search keys appear in a nonleaf B-tree node, compared to B+ -trees, implies
that a B-tree has a smaller fanout and therefore may have depth greater than that of
the corresponding B+ -tree
...

Deletion in a B-tree is more complicated
...
In a B-tree, the deleted entry may appear in a nonleaf node
...
Speciﬁcally, if search key Ki is deleted, the smallest search key
appearing in the subtree of pointer Pi + 1 must be moved to the ﬁeld formerly occupied by Ki
...

In contrast, insertion in a B-tree is only slightly more complicated than is insertion in
a B+ -tree
...
Thus, many database system implementers prefer the structural simplicity of a B+ -tree
...

12
...
File organizations based on the technique of hashing allow us to avoid accessing an index structure
...
We
study ﬁle organizations and indices based on hashing in the following sections
...
12
...
5
...

In our description of hashing, we shall use the term bucket to denote a unit of storage
that can store one or more records
...

Formally, let K denote the set of all search-key values, and let B denote the set of
all bucket addresses
...
Let h denote a hash
function
...
Assume for now that there is space in the bucket to store
the record
...

To perform a lookup on a search-key value Ki , we simply compute h(Ki ), then
search the bucket with that address
...
If we perform a lookup on K5 , the
bucket h(K5 ) contains records with search-key values K5 and records with search-key values K7
...

Deletion is equally straightforward
...
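Insertion, lookup, and deletion in a hash file organization can be sketched as follows. The class, its method names, and the character-sum hash function are illustrative assumptions, and, as in the text so far, we assume every bucket has room for the records mapped to it.

```python
class StaticHashFile:
    def __init__(self, n_buckets):
        self.buckets = [[] for _ in range(n_buckets)]

    def h(self, key):
        # Assumed toy hash function: sum of character codes modulo the bucket count.
        return sum(ord(c) for c in key) % len(self.buckets)

    def insert(self, key, record):
        self.buckets[self.h(key)].append((key, record))

    def lookup(self, key):
        # Different keys may share a bucket, so every record's key must be checked.
        return [rec for k, rec in self.buckets[self.h(key)] if k == key]

    def delete(self, key, record):
        # Search the bucket h(key) for the record and remove it.
        self.buckets[self.h(key)].remove((key, record))


accounts = StaticHashFile(n_buckets=10)
for acct, branch in [("A-217", "Brighton"), ("A-101", "Downtown"),
                     ("A-110", "Downtown"), ("A-215", "Mianus"),
                     ("A-102", "Perryridge"), ("A-201", "Perryridge"),
                     ("A-218", "Perryridge"), ("A-222", "Redwood"),
                     ("A-305", "Round Hill")]:
    accounts.insert(branch, acct)

print(accounts.lookup("Perryridge"))
accounts.delete("Perryridge", "A-201")
print(accounts.lookup("Perryridge"))
```

Each operation computes h once and touches a single bucket, which is the source of the constant average-case lookup time claimed for a well-designed hash function.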

12
...
1
...
Such
a function is undesirable because all the records have to be kept in the same bucket
...
An ideal hash
function distributes the stored keys uniformly across all the buckets, so that every
bucket has the same number of records
...
That is, the hash function assigns each bucket the
same number of search-key values from the set of all possible search-key values
...
That is, in the average case, each bucket will have
nearly the same number of values assigned to it, regardless of the actual distribution of search-key values
...

As an illustration of these principles, let us choose a hash function for the account
ﬁle using the search key branch-name
...
the desirable properties not only on the example account ﬁle that we have been using,
but also on an account ﬁle of realistic size for a large bank with many branches
...
This hash
function has the virtue of simplicity, but it fails to provide a uniform distribution,
since we expect more branch names to begin with such letters as B and R than Q and
X, for example
...
Suppose that
the minimum balance is 1 and the maximum balance is 100,000, and we use a hash
function that divides the values into 10 ranges, 1–10,000, 10,001–20,000 and so on
...
But records with balances between 1
and 10,000 are far more common than are records with balances between 90,001 and
100,000
...
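The range-based hash function on balance can be written down directly. The sketch below assumes the ten ranges described in the text and applies the function to the balances of the sample account file; the function name is ours.

```python
def range_hash(balance: int, width: int = 10_000) -> int:
    """Map balances 1..100,000 to ranges 0..9: 1-10,000 -> 0, 10,001-20,000 -> 1, ..."""
    return (balance - 1) // width


sample_balances = [750, 500, 600, 700, 400, 900, 700, 700, 350]

counts = [0] * 10
for b in sample_balances:
    counts[range_hash(b)] += 1

print(counts)   # every sample balance lands in the first range
```

Each range covers the same number of possible balance values, yet all of the sample records fall into one bucket, which illustrates why a uniform assignment of possible values does not by itself give an even distribution of actual records.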
If the function has a random distribution, even if there
are such correlations in the search keys, the randomness of the distribution will make
it very likely that all buckets will have roughly the same number of records, as long
as each search key occurs in only a small fraction of the records
...
)
Typical hash functions perform computation on the internal binary machine representation of characters in the search key
...
Figure 12
...

Hash functions require careful design
...
A well-designed
function gives an average-case lookup time that is a (small) constant, independent of
the number of search keys in the ﬁle
...
5
...
2 Handling of Bucket Overﬂows
So far, we have assumed that, when a record is inserted, the bucket to which it is
mapped has space to store the record
...
Bucket overﬂow can occur for several reasons:
• Insufﬁcient buckets
...
This designation, of course, assumes that the total number of records
is known when the hash function is chosen
...
Some buckets are assigned more records than are others, so a bucket
may overﬂow even when other buckets still have space
...
Skew can occur for two reasons:

[Figure: hash organization of the account file over buckets 0 through 8. Account numbers listed directly under a bucket label: bucket 3: A-217, A-305; bucket 4: A-222; bucket 5: A-102, A-201, A-218; bucket 7: A-215.]

Figure 12
...

1
...

2
...

So that the probability of bucket overﬂow is reduced, the number of buckets is
chosen to be (nr /fr ) ∗ (1 + d), where d is a fudge factor, typically around 0
...
Some
space is wasted: About 20 percent of the space in the buckets will be empty
...
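The bucket-count formula is easy to evaluate. In the sketch below, nr (number of records) and fr (records per bucket) take made-up values, and d = 0.2 is an assumed fudge factor; exact fractions are used so that rounding up is not disturbed by floating-point error.

```python
from fractions import Fraction
from math import ceil


def bucket_count(nr: int, fr: int, d: Fraction = Fraction(1, 5)) -> int:
    """Number of buckets (nr / fr) * (1 + d), rounded up to a whole bucket."""
    return ceil(Fraction(nr, fr) * (1 + d))


# Hypothetical file: 10,000 records, 20 records per bucket.
print(bucket_count(10_000, 20))                 # with the fudge factor
print(bucket_count(10_000, 20, Fraction(0)))    # the bare minimum nr / fr
```

The extra buckets trade some empty space for a lower probability of bucket overflow.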

Despite allocation of a few more buckets than required, bucket overﬂow can still
occur
...
If a record must be
inserted into a bucket b, and b is already full, the system provides an overﬂow bucket
for b, and inserts the record into the overﬂow bucket
...
All the overﬂow buck-

12
...
22

Overﬂow chaining in a hash structure
...
22
...

We must change the lookup algorithm slightly to handle overﬂow chaining
...
The
system must examine all the records in bucket b to see whether they match the search
key, as before
...
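Overflow chaining and the adjusted lookup can be sketched with a linked list of buckets. The class and function names, and the bucket capacity of two records, are assumptions made for illustration.

```python
class ChainedBucket:
    def __init__(self, capacity):
        self.capacity = capacity
        self.records = []        # (key, record) pairs stored in this bucket
        self.overflow = None     # next overflow bucket in the chain, if any


def insert(bucket, key, record):
    # Walk the chain to the first bucket with free space, extending it if needed.
    while len(bucket.records) >= bucket.capacity:
        if bucket.overflow is None:
            bucket.overflow = ChainedBucket(bucket.capacity)
        bucket = bucket.overflow
    bucket.records.append((key, record))


def lookup(bucket, key):
    # Examine the bucket itself and then every overflow bucket chained to it.
    found = []
    while bucket is not None:
        found.extend(rec for k, rec in bucket.records if k == key)
        bucket = bucket.overflow
    return found


b = ChainedBucket(capacity=2)
for acct in ("A-102", "A-201", "A-218"):
    insert(b, "Perryridge", acct)

print(len(b.records), len(b.overflow.records))
print(lookup(b, "Perryridge"))
```

The third record no longer fits in the bucket, so it is placed in an overflow bucket, and lookup must follow the chain to find it.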

The form of hash structure that we have just described is sometimes referred to
as closed hashing
...
Instead, if a bucket is full, the system inserts records in some other bucket in the initial set of buckets B
...
Other policies, such as computing further hash functions, are also used
...
The reason is that deletion under open hashing is troublesome
...
However, in a database system, it is important to be able to handle deletion as well as insertion
...

An important drawback to the form of hashing that we have described is that
we must choose the hash function when we implement the system, and it cannot be
changed easily thereafter if the ﬁle being indexed grows or shrinks
...
12
...
If B is too small, the buckets contain
records of many different search-key values, and bucket overﬂows can occur
...
We study later, in Section 12
...

12
...
2 Hash Indices
Hashing can be used not only for ﬁle organization, but also for index-structure creation
...
We construct a hash index as follows
...
Figure 12
...
The hash function in the ﬁgure
computes the sum of the digits of the account number modulo 7
...
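The hash function described for the figure, the sum of the digits of the account number modulo 7, can be sketched as follows. The index layout (a dictionary from bucket number to entries) and the use of list positions as record pointers are our own assumptions, not the structure of the figure.

```python
def h(account_number: str, n_buckets: int = 7) -> int:
    """Sum of the decimal digits of the account number, modulo the bucket count."""
    return sum(int(ch) for ch in account_number if ch.isdigit()) % n_buckets


account_numbers = ["A-217", "A-101", "A-110", "A-215", "A-102",
                   "A-201", "A-218", "A-222", "A-305"]

# Hash index: bucket number -> list of (search key, pointer to the record).
hash_index = {}
for position, acct in enumerate(account_numbers):
    hash_index.setdefault(h(acct), []).append((acct, position))


def lookup(acct):
    return [pos for key, pos in hash_index.get(h(acct), []) if key == acct]


print(h("A-217"), h("A-218"))   # 2+1+7 = 10 -> 3, 2+1+8 = 11 -> 4
print(lookup("A-201"))
```

The index stores only search keys and pointers; the records themselves stay in the file.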
23

Hash index on search key account-number of account ﬁle
...
bucket sizes)
...
In this example, account-number is a primary key for account, so each search key has only one associated pointer
...

We use the term hash index to denote hash ﬁle structures as well as secondary
hash indices
...
A
hash index is never needed as a primary index structure, since, if a ﬁle itself is organized by hashing, there is no need for a separate hash index structure on it
...

12
...
Most databases
grow larger over time
...
Choose a hash function based on the current ﬁle size
...

2
...
Although performance degradation is avoided, a signiﬁcant
amount of space may be wasted initially
...
Periodically reorganize the hash structure in response to ﬁle growth
...

This reorganization is a massive, time-consuming operation
...

Several dynamic hashing techniques allow the hash function to be modiﬁed dynamically to accommodate the growth or shrinkage of the database
...
The bibliographical notes provide references to other forms of dynamic hashing
...
6
...
As a result, space efﬁciency is retained
...

With extendable hashing, we choose a hash function h with the desirable properties of uniformity and randomness
...
A typical value for
b is 32
...
12
...

bucket 1

01
...

11
...

...

bucket 2
i3

bucket 3

...

...
24

General extendable hash structure
...
Indeed, 2^32 is over 4 billion, and
that many buckets is unreasonable for all but the largest databases
...
We do not use the entire b
bits of the hash value initially
...
These i
bits are used as an offset into an additional table of bucket addresses
...

Figure 12
...
The i appearing above
the bucket address table in the ﬁgure indicates that i bits of the hash value h(K) are
required to determine the correct bucket for K
...
Although i bits are required to ﬁnd the correct entry in the bucket
address table, several consecutive table entries may point to the same bucket
...
Therefore, we associate with each bucket an integer giving the length of the
common hash preﬁx
...
24 the integer associated with bucket j is shown as
ij
...
6
...

To locate the bucket containing search-key value Kl , the system takes the ﬁrst i
high-order bits of h(Kl ), looks at the corresponding table entry for this bit string, and
follows the bucket pointer in the table entry
...
If there is room in the bucket,

12
...
If, on the other hand, the bucket is full, it
must split the bucket and redistribute the current records, plus the new one
...

• If i = ij , only one entry in the bucket address table points to bucket j
...
It
does so by considering an additional bit of the hash value
...
It replaces
each entry by two entries, both of which contain the same pointer as the original entry
...
The
system allocates a new bucket (bucket z), and sets the second entry to point
to the new bucket
...
Next, it rehashes each record in bucket
j and, depending on the ﬁrst i bits (remember the system has added 1 to i),
either keeps it in bucket j or allocates it to the newly created bucket
...
Usually, the
attempt will succeed
...
If the hash function has been chosen carefully, it is unlikely
that a single insertion will require that a bucket be split more than once, unless
there are a large number of records with the same search key
...
In
such cases, overﬂow buckets are used to store the records, as in static hashing
...
Thus, the system can split bucket j without increasing the size of
the bucket address table
...

The system allocates a new bucket (bucket z), and sets ij and iz to the value
that results from adding 1 to the original ij value
...
(Note that with the new value for ij , not all the entries correspond to hash
preﬁxes that have the same value on the leftmost ij bits
...
Next, as in
the previous case, the system rehashes each record in bucket j, and allocates it
either to bucket j or to the newly created bucket z
...
In the unlikely case that it again fails,
it applies one of the two cases, i = ij or i > ij , as appropriate
...

To delete a record with search-key value Kl , the system follows the same procedure for lookup as before, ending up in some bucket—say, j
...
12
...
25

750
500
600
700
400
900
700
700
350

Sample account ﬁle
...
The bucket too is removed
if it becomes empty
...
The procedure for deciding on
which buckets can be coalesced and how to coalesce buckets is left to you to do as an
exercise
...
Unlike coalescing of buckets, changing the size of
the bucket address table is a rather expensive operation if the table is large
...

Our example account ﬁle in Figure 12
...
The
32-bit hash values on branch-name appear in Figure 12
...
Assume that, initially, the
ﬁle is empty, as in Figure 12
...
We insert the records one by one
...

We insert the record (A-217, Brighton, 750)
...
Next, we insert the record
(A-101, Downtown, 500)
...

When we attempt to insert the next record (A-110, Downtown, 600), we find that
the bucket is full
...
We now use 1 bit, allowing us 2^1 = 2 buckets
...
[Figure: Hash function for branch-name.]

[Figure: extendable hash structure with hash prefix 0, a bucket address table, and bucket 1.]

Figure 12
...

the number of bits necessitates doubling the size of the bucket address table to two
entries
...
Figure 12
...

Next, we insert (A-215, Mianus, 700)
...
Once again, we ﬁnd the bucket full and i = i1
...
This increase in the number of bits necessitates
doubling the size of the bucket address table to four entries, as in Figure 12
...
Since
the bucket of Figure 12
...

For each record in the bucket of Figure 12
...

Next, we insert (A-102, Perryridge, 400), which goes in the same bucket as Mianus
...
The insertion of the third Perryridge record, (A-218, Perryridge, 700),
leads to another overﬂow
...

Hence the system uses an overﬂow bucket, as in Figure 12
...

We continue in this manner until we have inserted all the account records of Figure 12
...
The resulting structure appears in Figure 12
...

[Figure: extendable hash structure with hash prefix 1, a bucket address table, and buckets holding the Brighton and Downtown records.]

Figure 12
...


[Figure: extendable hash structure with a bucket address table; the buckets hold A-217 Brighton 750; A-101 Downtown 500 and A-110 Downtown 600; and A-215 Mianus.]

Figure 12
...

12
...
3 Comparison with Other Schemes
We now examine the advantages and disadvantages of extendable hashing, compared with the other schemes that we have discussed
...
Furthermore, there is minimal space overhead
...
[Figure: Hash structure after seven insertions.]
...

[Figure: extendable hash structure for the account file, with a bucket address table; the buckets hold A-217 Brighton 750 and A-222 Redwood 700; A-101 Downtown 500 and A-110 Downtown 600; A-215 Mianus 700 and A-305 Round Hill 350; and the Perryridge records A-102 (400), A-218 (700), and A-201 (900).]

Figure 12
...

ﬁx length
...
The main space saving of extendable hashing
over other forms of hashing is that no buckets need to be reserved for future growth;
rather, buckets can be allocated dynamically
...
This extra reference has only a minor effect on performance
...
5 do not have this extra level of indirection, they lose their minor performance advantage as they become
full
...

The bibliographical notes reference more detailed descriptions of extendable hashing
implementation
...

12
...
We
can organize ﬁles of records as ordered ﬁles, by using index-sequential organization
or B+ -tree organizations
...

Finally, we can organize them as heap ﬁles, where the records are not ordered in any
particular way
...
...
A database-system implementor could provide many schemes, leaving the ﬁnal decision of which schemes to use
to the database designer
...
Most database systems support B+ -trees and may additionally support
some form of hash ﬁle organization or hash indices
...
The fourth issue, the expected type of query, is critical to the choice of
ordered indexing or hashing
...
, An
from r
where Ai = c
then, to process this query, the system will perform a lookup on an ordered index
or a hash structure for attribute Ai , for value c
...
An ordered-index lookup requires time proportional to the log
of the number of values in r for Ai
...
The only advantage to
an index over a hash structure for this form of query is that the worst-case lookup
time is proportional to the log of the number of values in r for Ai
...
However, the worst-case lookup time is unlikely to occur with hashing, and
hashing is preferable in this case
...
Such a query takes the following form:
select A1 , A2 ,
...

...
First, we perform a lookup on value c1
...

If, instead of an ordered index, we have a hash structure, we can perform a lookup
on c1 and can locate the corresponding bucket—but it is not easy, in general, to determine the next bucket that must be examined
...
Thus, there is no simple notion of
“next bucket in sorted order
...
Since values are
scattered randomly by the hash function, the values in the speciﬁed range are likely
to be scattered across many or all of the buckets
...
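The contrast between the two structures for range queries can be illustrated with a small Python sketch. The data are the balance values of the sample account file; the bucket count and the toy hash function `toy_hash` are assumptions made only for this illustration.

```python
from bisect import bisect_left, bisect_right

# Search-key values kept in sorted order, as an ordered index would keep them.
sorted_keys = [350, 400, 500, 600, 700, 700, 700, 750, 900]


def range_ordered(keys, c1, c2):
    # One lookup on c1, then a sequential read until a value exceeds c2.
    lo = bisect_left(keys, c1)
    hi = bisect_right(keys, c2)
    return keys[lo:hi]


NUM_BUCKETS = 4  # assumed number of buckets


def toy_hash(key):
    # Hypothetical hash function; it scatters neighbouring values.
    return (key * 2654435761) % 2**32 % NUM_BUCKETS


buckets = [[] for _ in range(NUM_BUCKETS)]
for key in sorted_keys:
    buckets[toy_hash(key)].append(key)


def range_hashed(all_buckets, c1, c2):
    # There is no "next bucket in sorted order", so every bucket is read.
    return sorted(k for b in all_buckets for k in b if c1 <= k <= c2)


ordered_result = range_ordered(sorted_keys, 500, 700)
hashed_result = range_hashed(buckets, 500, 700)
```

Both functions return the same values, but the ordered version touches only the contiguous run of qualifying keys, whereas the hashed version has to examine every bucket.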

Usually the designer will choose ordered indexing unless it is known in advance
that range queries will be infrequent, in which case hashing would be chosen
...

12
...
Indices
are not required for correctness, since they are redundant data structures
...
Indices are also important for efﬁcient enforcement of integrity constraints
...

In principle, a database system can decide automatically what indices to create
...
Therefore, most SQL implementations provide the programmer
control over creation and removal of indices via data-deﬁnition-language commands
...
Although the syntax that we
show is widely used and supported by many database systems, it is not part of the
SQL:1999 standard
...

We create an index by the create index command, which takes the form
create index <index-name> on <relation-name> (<attribute-list>)
The attribute-list is the list of attributes of the relations that form the search key for
the index
...
...
Thus, the command
create unique index b-index on branch (branch-name)
declares branch-name to be a candidate key for branch
...
If the index-creation attempt succeeds, any subsequent attempt to insert a tuple that violates the
key declaration will fail
...
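The effect of a unique index can be tried out with Python's built-in `sqlite3` module. This is a sketch under assumptions: SQLite stands in for whichever database system is in use, identifiers are written with underscores (`branch_name`, `b_index`) because unquoted SQL names cannot contain hyphens, and the inserted rows are made-up sample values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table branch (branch_name text, branch_city text, assets integer)"
)
# Hypothetical sample row.
conn.execute("insert into branch values ('Perryridge', 'Horseneck', 1700000)")

# Declares branch_name to be a candidate key for branch.
conn.execute("create unique index b_index on branch (branch_name)")

try:
    # A second tuple with the same branch_name violates the key declaration.
    conn.execute("insert into branch values ('Perryridge', 'Rye', 400000)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# The index name given at creation time is what identifies the index to drop.
conn.execute("drop index b_index")
conn.execute("insert into branch values ('Perryridge', 'Rye', 400000)")
row_count = conn.execute("select count(*) from branch").fetchone()[0]
```

Once the index is dropped, the uniqueness constraint it enforced disappears with it, which is why the final insert succeeds in this sketch.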

Many database systems also provide a way to specify the type of index to be used
(such as B+ -tree or hashing)
...

The index name we speciﬁed for an index is required to drop an index
...
9 Multiple-Key Access
Until now, we have assumed implicitly that only one index (or hash table) is used to
process a query on a relation
...

12
...
1 Using Multiple Single-Key Indices
Assume that the account ﬁle has two indices: one for branch-name and one for balance
...
” We write
select account-number
from account
where branch-name = “Perryridge” and balance = 1000
There are three strategies possible for processing this query:
1
...
Examine each such record to see whether balance = 1000
...
Use the index on balance to ﬁnd all records pertaining to accounts with balances of $1000
...
”
3
...
Also, use the index on balance to ﬁnd pointers to all records

...
Take the intersection of these
two sets of pointers
...
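The third strategy, intersecting the two sets of pointers, can be sketched as follows. The sketch assumes that record numbers serve as pointers and that each single-key index is a dictionary from a search-key value to a set of pointers; because no record in the sample account file has a balance of 1000, the example uses 700 instead.

```python
from collections import defaultdict

# Record number -> (account-number, branch-name, balance), from the sample file.
records = {
    0: ("A-217", "Brighton", 750), 1: ("A-101", "Downtown", 500),
    2: ("A-110", "Downtown", 600), 3: ("A-215", "Mianus", 700),
    4: ("A-102", "Perryridge", 400), 5: ("A-201", "Perryridge", 900),
    6: ("A-218", "Perryridge", 700), 7: ("A-222", "Redwood", 700),
    8: ("A-305", "Round Hill", 350),
}

# Two single-key indices, each mapping a key value to a set of pointers.
branch_index = defaultdict(set)
balance_index = defaultdict(set)
for pointer, (_, branch, balance) in records.items():
    branch_index[branch].add(pointer)
    balance_index[balance].add(pointer)

# Only pointers present in both sets are followed to fetch records.
pointers = branch_index["Perryridge"] & balance_index[700]
result = [records[p][0] for p in sorted(pointers)]
```

Here three pointers come back from the branch-name index and four from the balance index, yet only one record is actually fetched.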

The third strategy is the only one of the three that takes advantage of the existence
of multiple indices
...

• There are many records pertaining to accounts with a balance of $1000
...

If these conditions hold, we must scan a large number of pointers to produce a small
result
...
Bitmap indices are outlined in Section 12
...
4
...
9
...
The structure of the index is the same as that of any
other index, the only difference being that the search key is not a single attribute, but
rather is a list of attributes
...
, an ), where the indexed attributes are A1 ,
...
The ordering
of search-key values is the lexicographic ordering
...

Lexicographic ordering is basically the same as alphabetic ordering of words
...
As an illustration, consider the query
select account-number
from account
where branch-name < “Perryridge” and balance = 1000
We can answer this query by using an ordered index on the search key (branch-name,
balance): For each value of branch-name that is less than “Perryridge” in alphabetic
order, the system locates records with a balance value of 1000
...
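Lexicographic ordering on a composite search key can be illustrated with Python tuples, which compare in exactly this way. The sketch below assumes a sorted list of (branch-name, balance, account-number) tuples as a stand-in for the ordered index, and again substitutes a balance of 700 for 1000 so that the sample data produce a result.

```python
from bisect import bisect_left

# Entries of an ordered index on (branch-name, balance), kept sorted.
composite_index = sorted([
    ("Brighton", 750, "A-217"), ("Downtown", 500, "A-101"),
    ("Downtown", 600, "A-110"), ("Mianus", 700, "A-215"),
    ("Perryridge", 400, "A-102"), ("Perryridge", 900, "A-201"),
    ("Perryridge", 700, "A-218"), ("Redwood", 700, "A-222"),
    ("Round Hill", 350, "A-305"),
])


def equal_on_both(branch, balance):
    # Equality on both attributes: all matches are adjacent in the index.
    pos = bisect_left(composite_index, (branch, balance))
    out = []
    while pos < len(composite_index) and composite_index[pos][:2] == (branch, balance):
        out.append(composite_index[pos][2])
        pos += 1
    return out


def branch_less_than(branch, balance):
    # Comparison on branch-name: every entry below the bound is visited,
    # and most of them fail the balance test.
    end = bisect_left(composite_index, (branch,))
    return [acct for b, bal, acct in composite_index[:end] if bal == balance]


exact = equal_on_both("Perryridge", 700)
below = branch_less_than("Perryridge", 700)
```

The equality query jumps straight to its answer, while the comparison query scans four entries to return one, which is the inefficiency the text goes on to discuss.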

The difference between this query and the previous one is that the condition on
branch-name is a comparison condition, rather than an equality condition
...
We shall
consider the grid ﬁle in Section 12
...
3
...
...
The R-tree is an extension of the B+ -tree to handle indexing on multiple dimensions
...

12
...
3 Grid Files
Figure 12
...
The two-dimensional array in the ﬁgure is called the grid array, and
the one-dimensional arrays are called linear scales
...

Search keys are mapped to cells in this way
...
Only some
of the buckets and pointers from the cells are shown in the ﬁgure
...
The dotted boxes in the
ﬁgure indicate which cells point to the same bucket
...
To ﬁnd the cell to which the key is mapped, we independently locate the row and column to which the cell belongs
...
To do so, we search the array to ﬁnd the least element that is
greater than “Brighton”
...
If it were the ith element, the search key would map to row i − 1
...
[Figure: Grid file on keys branch-name and balance of the account file, showing the grid array, the linear scale for balance, and the buckets.]
...

the ﬁnal row
...
In this case, the balance 500000 maps to column 6
...
Similarly, (“Downtown”, 60000) would map to the cell in row 1 column 5
...
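The mapping from a search key to a grid cell can be sketched with a binary search on each linear scale. The scale values below are assumptions chosen to be consistent with the mappings stated in the text (balance 500000 to column 6, ("Downtown", 60000) to row 1 column 5, and "Perryridge" to row 3); the function names are hypothetical.

```python
from bisect import bisect_left, bisect_right

# Assumed linear scales; rows and columns are numbered from 0.
branch_scale = ["Central", "Mianus", "Perryridge", "Townsend"]
balance_scale = [1_000, 2_000, 5_000, 10_000, 50_000, 100_000]


def locate(scale, value):
    # Position of the least scale element greater than value; if there is
    # none, the value maps to the final row or column (len(scale)).
    return bisect_right(scale, value)


def cell(branch, balance):
    return locate(branch_scale, branch), locate(balance_scale, balance)


def cells_for_query(branch_bound, balance):
    # Cells that may hold records with branch-name < branch_bound
    # and balance equal to the given value.
    last_row = bisect_left(branch_scale, branch_bound)
    column = locate(balance_scale, balance)
    return [(row, column) for row in range(last_row + 1)]


downtown_cell = cell("Downtown", 60_000)
query_cells = cells_for_query("Perryridge", 1_000)
```

Each lookup is independent per dimension, so the row and the column are found by two separate searches of small in-memory arrays before any bucket is read.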

To perform a lookup to answer our example query, with the search condition of
branch-name < “Perryridge” and balance = 1000
we ﬁnd all rows that can contain branch names less than “Perryridge”, using the
linear scale on branch-name
...
Rows 3 and beyond
contain branch names greater than or equal to “Perryridge”
...
In this case, only column 1 satisﬁes
this condition
...

We therefore look up all entries in the buckets pointed to from these three cells
...
The buckets may contain some search
keys that do not satisfy the required condition, so each search key in the buckets must
be tested again to see whether it satisﬁes the search condition
...

We must choose the linear scales in such a way that the records are uniformly distributed across the cells
...
If more than one cell points
to A, the system changes the cell pointers so that some point to A and others to B
...
If only one cell points to bucket A, B becomes
an overﬂow bucket for A
...
The
process is much like the expansion of the bucket address table in extendable hashing,
and is left for you to do as an exercise
...
If we want our structure to be used for queries on n keys, we construct an n-dimensional grid array with n linear scales
...
Consider
this query:
select *
from account
where branch-name = “Perryridge”
The linear scale on branch-name tells us that only cells in row 3 can satisfy this condition
...
Thus, we can use a grid-file index on two search keys to answer queries on either search key by itself, as well as to answer
queries on both search keys
...
If each index were maintained separately, the three together would
occupy more space, and the cost of updating them would be high
...

However, they impose a space overhead (the grid directory can become large), as
well as a performance overhead on record insertion and deletion
...
If insertions to the ﬁle are frequent, reorganization will have to be carried out
periodically, and that can have a high cost
...
9
...

For bitmap indices to be used, records in a relation must be numbered sequentially, starting from, say, 0
...
This is particularly easy to achieve if records are ﬁxed in size, and allocated on consecutive blocks of a ﬁle
...

Consider a relation r, with an attribute A that can take on only one of a small number of values (for example, 2 to 20)
...
Another example
would be an attribute income-level, where income has been broken up into 5 levels:
L1: $0 to $9,999, L2: $10,000 to $19,999, L3: $20,000 to $39,999, L4: $40,000 to $74,999, and
L5: $75,000 to ∞
...

12
...
4
...
In its simplest form, a bitmap index on the
attribute A of relation r consists of one bitmap for each value that A can take
...
The ith bit of the
bitmap for value vj is set to 1 if the record numbered i has the value vj for attribute
A
...

In our example, there is one bitmap for the value m and one for f
...
All
other bits of the bitmap for m are set to 0
...
Figure 12
...

We now consider when bitmaps are useful
...
The bitmap index doesn’t
really help to speed up such a selection
...
record
number  name   gender  address     income-level
0       John   m       Perryridge  L1
1       Diana  f       Brooklyn    L2
2       Mary   f       Jonestown   L1
3       Peter  m       Brooklyn    L4
4       Kathy  f       Perryridge  L3

Bitmaps for gender      Bitmaps for income-level
m   10010               L1  10100
f   01101               L2  01000
                        L3  00001
                        L4  00010
                        L5  00000

Figure 12
...

In fact, bitmap indices are useful for selections mainly when there are selections
on multiple keys
...

Consider now a query that selects women with income in the range $10,000 to
$19,999
...
To evaluate this
selection, we fetch the bitmaps for gender value f and the bitmap for income-level value
L2, and perform an intersection (logical-and) of the two bitmaps
...
In the example in Figure 12
...

Since the first attribute can take 2 values, and the second can take 5 values, we
would expect only about 1 in 10 records, on average, to satisfy a combined condition on the two attributes
...
The system can then compute the
query result by ﬁnding all bits with value 1 in the intersection bitmap, and retrieving
the corresponding records
...
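The selection of women with income level L2 can be traced in a few lines of Python. The sketch assumes the five-record relation of the bitmap figure (gender bitmaps m = 10010 and f = 01101, income-level bitmap L2 = 01000), with the leftmost bit standing for record 0; the variable names are illustrative only.

```python
NUM_RECORDS = 5

# Bitmaps written with record 0 as the leftmost bit.
gender_bitmaps = {"m": "10010", "f": "01101"}
income_bitmaps = {
    "L1": "10100", "L2": "01000", "L3": "00001", "L4": "00010", "L5": "00000",
}


def to_int(bits):
    return int(bits, 2)


# Intersection (logical-and) of the two bitmaps.
both = to_int(gender_bitmaps["f"]) & to_int(income_bitmaps["L2"])
intersection = format(both, f"0{NUM_RECORDS}b")

# Record numbers whose bit is 1 in the intersection bitmap.
matching_records = [i for i, bit in enumerate(intersection) if bit == "1"]

# Counting the 1 bits answers "how many" without touching the relation.
matching_count = intersection.count("1")
```

Only record 1 has to be fetched from the relation, and the count is available from the bitmaps alone.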

Another important use of bitmaps is to count the number of tuples satisfying a
given selection
...
For instance, if we wish
to ﬁnd out how many women have an income level L2, we compute the intersection
of the two bitmaps, and then count the number of bits that are 1 in the intersection
bitmap
...

Bitmap indices are generally quite small compared to the actual relation size
...
Thus the space occupied by a single bitmap
is usually less than 1 percent of the space occupied by the relation
...
If an attribute A
of the relation can take on only one of 8 values, a bitmap index on attribute A would
consist of 8 bitmaps, which together occupy only 1 percent of the size of the relation
...
...
To recognize deleted
records, we can store an existence bitmap, in which bit i is 0 if record i does not exist
and 1 otherwise
...
9
...
2
...
Therefore,
we can do insertion either by appending records to the end of the ﬁle or by replacing
deleted records
...
9
...
2 Efﬁcient Implementation of Bitmap Operations
We can compute the intersection of two bitmaps easily by using a for loop: the ith
iteration of the loop computes the and of the ith bits of the two bitmaps
...
A word usually consists of 32 or 64
bits, depending on the architecture of the computer
...
What is important to note is that a single
bit-wise and instruction can compute the intersection of 32 or 64 bits at once
...
Only 31,250 instructions are needed to compute the intersection of two bitmaps for our relation, assuming a 32-bit word length
...
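The word-at-a-time intersection can be sketched as follows. The relation size of 1,000,000 records is an assumption, chosen because it is the size for which 31,250 32-bit words are needed; the two test bitmaps and the helper names are made up for the illustration.

```python
WORD_BITS = 32
NUM_RECORDS = 1_000_000  # assumed relation size


def pack(bits):
    # Pack a list of 0/1 values into 32-bit words, first record in the
    # most significant position of each word.
    words = []
    for start in range(0, len(bits), WORD_BITS):
        word = 0
        for bit in bits[start:start + WORD_BITS]:
            word = (word << 1) | bit
        words.append(word)
    return words


def intersect(words_a, words_b):
    # One bit-wise and per word handles 32 records at once.
    return [a & b for a, b in zip(words_a, words_b)]


# Hypothetical bitmaps: even-numbered records, and every third record.
bitmap_a = pack([1 if i % 2 == 0 else 0 for i in range(NUM_RECORDS)])
bitmap_b = pack([1 if i % 3 == 0 else 0 for i in range(NUM_RECORDS)])

result = intersect(bitmap_a, bitmap_b)
and_operations = len(result)
ones = sum(bin(word).count("1") for word in result)
```

The loop in `intersect` runs once per word rather than once per record, which is where the factor-of-32 saving comes from.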

Just as bitmap intersection is useful for computing the and of two conditions,
bitmap union is useful for computing the or of two conditions
...

The complement operation can be used to compute a predicate involving the negation of a condition, such as not (income-level = L1)
...
It may appear that not (income-level = L1) can be implemented by just computing the complement of the bitmap for income level L1
...
Bits corresponding to such records would be 0 in the original bitmap,
but would become 1 in the complement, although the records don’t exist
...
For instance, if the value
of income-level is null, the bit would be 0 in the original bitmap for value L1, and 1 in
the complement bitmap
...
Similarly, to handle null values, the complement bitmap must
also be intersected with the complement of the bitmap for the value null
...
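The correction for deleted records and null values can be traced on a small example. The bitmap for L1 (10100) is taken from the figure; the existence bitmap and the null bitmap below are hypothetical, assuming that record 4 has been deleted and that record 1 has a null income-level.

```python
NUM_RECORDS = 5
MASK = (1 << NUM_RECORDS) - 1   # keeps results to NUM_RECORDS bits


def complement(bitmap):
    return ~bitmap & MASK


def show(bitmap):
    return format(bitmap, f"0{NUM_RECORDS}b")


l1 = int("10100", 2)          # income-level = L1
existence = int("11110", 2)   # assumed: record 4 has been deleted
is_null = int("01000", 2)     # assumed: income-level of record 1 is null

# Complement alone wrongly includes the deleted and the null record.
naive = complement(l1)

# Intersect with the existence bitmap and with the complement of the
# bitmap for the value null.
correct = complement(l1) & existence & complement(is_null)

naive_bits = show(naive)
correct_bits = show(correct)
```

After both corrections only record 3 remains, which is the one existing record whose income-level is known and differs from L1.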
We can maintain an array with 256 entries, where the ith entry stores the
1
...

...
Set the total count initially
to 0
...
The number of addition operations would be 1/8 of the
number of tuples, and thus the counting process is very efficient
...
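Counting with a 256-entry array can be sketched as follows, assuming the bitmap is available as a sequence of bytes. The names `ONES_IN_BYTE` and `count_ones` are illustrative.

```python
# Entry i holds the number of bits that are 1 in the binary form of i.
ONES_IN_BYTE = [bin(i).count("1") for i in range(256)]


def count_ones(bitmap):
    # One table lookup and one addition per byte, that is, per 8 tuples.
    total = 0
    for byte in bitmap:
        total += ONES_IN_BYTE[byte]
    return total


example = bytes([0b01101000, 0b11111111, 0b00000000])
example_count = count_ones(example)
```

A larger table indexed by 16-bit units would halve the number of additions again, at the price of a 65,536-entry array.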

12
...
4
...
In a B+ -tree index leaf, for each value we would normally maintain a list
of all records with that value for the indexed attribute
...
For a value that
occurs in many records, we store a bitmap instead of a list of records
...
Let N be
the number of records in the relation, and assume that a record has a 64-bit number
identifying it
...
In contrast,
the list representation requires 64 bits per record where the value occurs, or 64 ∗
N/16 = 4N bits
...
In our example (with a 64-bit record identiﬁer), if fewer than 1 in 64 records
have a particular value, the list representation is preferable for identifying records
with that value, since it uses fewer bits than the bitmap representation
...
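The space comparison can be checked with simple arithmetic. The sketch assumes, as in the text, a 64-bit record identifier; the relation size of 64,000 records and the function names are made up for the illustration.

```python
RECORD_ID_BITS = 64


def bitmap_bits(num_records):
    # One bit per record in the relation, whatever the value's frequency.
    return num_records


def list_bits(occurrences):
    # One record identifier per record in which the value occurs.
    return RECORD_ID_BITS * occurrences


def cheaper(num_records, occurrences):
    if list_bits(occurrences) < bitmap_bits(num_records):
        return "list"
    return "bitmap"


N = 64_000
frequent = N // 16          # value occurs in 1 of every 16 records
rare = N // 64 - 1          # value occurs in fewer than 1 in 64 records

frequent_list_bits = list_bits(frequent)   # 64 * N / 16 = 4N
frequent_choice = cheaper(N, frequent)
rare_choice = cheaper(N, rare)
```

The break-even point falls where the value occurs in exactly 1 of every 64 records, since the two representations then need the same N bits.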

Thus, bitmaps can be used as a compressed storage mechanism at the leaf nodes
of B+ -trees, for those values that occur very frequently
...
10 Summary
• Many queries reference only a small proportion of the records in a ﬁle
...

• Index-sequential ﬁles are one of the oldest index schemes used in database
systems
...
To allow
fast random access, we use an index structure
...
Dense indices contain entries for every search-key value, whereas
sparse indices contain entries only for some search-key values
...
The other indices are called secondary indices
...
However, they impose an overhead
on modiﬁcation of the database
...
...
To overcome this deﬁciency, we can use
a B+ -tree index
...
The height of a B+ tree is proportional to the logarithm to the base N of the number of records
in the relation, where each nonleaf node stores N pointers; the value of N is
often around 50 or 100
...

• Lookup on B+ -trees is straightforward and efﬁcient
...
The number of
operations required for lookup, insertion, and deletion on B+ -trees is proportional to the logarithm to the base N of the number of records in the relation,
where each nonleaf node stores N pointers
...

• B-tree indices are similar to B+ -tree indices
...
The
major disadvantages are overall complexity and reduced fanout for a given
node size
...

• Sequential ﬁle organizations require an index structure to locate data
...
Since we do not know at design time precisely which search-key
values will be stored in the ﬁle, a good hash function to choose is one that assigns search-key values to buckets such that the distribution is both uniform
and random
...

Such hash functions cannot easily accommodate databases that grow signiﬁcantly larger over time
...
One example is extendable hashing, which
copes with changes in database size by splitting and coalescing buckets as the
database grows and shrinks
...
For notational convenience, we assume hash ﬁle organizations
have an implicit hash index on the search key used for hashing
...
When multiple attributes are involved in a selection condition, we can intersect record identifiers retrieved from multiple indices
...

• Bitmap indices provide a very compact representation for indexing attributes
with very few distinct values
...

Review Terms
• Access types
• Access time
• Insertion time
• Deletion time
• Space overhead
• Ordered index
• Primary index
• Clustering index
• Secondary index
• Nonclustering index
• Index-sequential ﬁle
• Index record/entry
• Dense index
• Sparse index
• Multilevel index
• Sequential scan
• B+ -Tree index
• Balanced tree
• B+ -Tree ﬁle organization

• B-Tree index
• Static hashing
• Hash file organization
• Hash index
• Bucket
• Hash function
• Bucket overflow
• Skew
• Closed hashing
• Dynamic hashing
• Extendable hashing
• Multiple-key access
• Indices on multiple keys
• Grid files
• Bitmap index
• Bitmap operations
  ◦ Intersection
  ◦ Union
  ◦ Complement
  ◦ Existence bitmap

Exercises
12
...

12
...

12
...
4 Is it possible in general to have two primary indices on the same relation for
different search keys? Explain your answer
...

12
...
5 Construct a B+ -tree for the following set of key values:
(2, 3, 5, 7, 11, 17, 19, 23, 29, 31)
Assume that the tree is initially empty and values are added in ascending order
...
Four
b
...
Eight
12
...
5, show the steps involved in the following
queries:
a
...

b
...

12
...
5, show the form of the tree after each of the
following series of operations:
a
...

b
...

c
...

d
...

e
...

12
...
What is the expected height of the tree as a function of n?
12
...
5 for a B-tree
...
10 Explain the distinction between closed and open hashing
...

12
...
12 Suppose that we are using extendable hashing on a ﬁle that contains records
with the following search-key values:
2, 3, 5, 7, 11, 17, 19, 23, 29, 31
Show the extendable hash structure for this ﬁle if the hash function is h(x) = x
mod 8 and buckets can hold three records
...
13 Show how the extendable hash structure of Exercise 12
...
Delete 11
...
Delete 31
...
Insert 1
...
Insert 15
...

12
...
14 Give pseudocode for deletion of entries from an extendable hash structure,
including details of when and how to coalesce buckets
...

12
...
Give details of how the count should be maintained when buckets are
split, coalesced or deleted
...
Therefore, it
is best not to reduce the size as soon as it is possible to do so, but instead do
it only if the number of index entries becomes small compared to the bucket
address table size
...
16 Why is a hash structure not the best choice for a search key on which range
queries are likely?
12
...
In cases where an overﬂow bucket would be needed, we instead reorganize the grid ﬁle
...

12
...
25
...
Construct a bitmap index on the attributes branch-name and balance, dividing balance values into 4 ranges: below 250, 250 to below 500, 500 to below
750, and 750 and above
...
Consider a query that requests all accounts in Downtown with a balance of
500 or more
...

12
...
Make sure that
your technique works even in the presence of null values, by using a bitmap
for the value null
...
20 How does data encryption affect index schemes? In particular, how might it
affect schemes that attempt to store data in sorted order?

Bibliographical Notes
Discussions of the basic data structures in indexing and hashing can be found in
Cormen et al
...
B-tree indices were ﬁrst introduced in Bayer [1972] and Bayer
and McCreight [1972]
...
The bibliographic notes in Chapter 16 provide references to
research on allowing concurrent accesses and updates on B+ -trees
...

Several alternative tree and treelike search structures have been proposed
...
Such trees may not be balanced
in the sense that B+ -trees are
...
[1989], Orenstein [1982], Litwin [1981] and Fredkin [1960]
...

Knuth [1973] analyzes a large number of different hashing techniques
...
Extendable hashing was introduced by Fagin et al
...
Linear hashing was introduced by Litwin [1978] and Litwin [1980]; Larson
[1982] presents a performance analysis of linear hashing
...
Larson [1988] presents a variant of linear hashing
...
An alternative given by Ramakrishna and Larson [1989] allows retrieval in a single disk access
at the price of a high overhead for a small fraction of database modiﬁcations
...

The grid ﬁle structure appears in Nievergelt et al
...

Bitmap indices, and variants called bit-sliced indices and projection indices, are described in O’Neil and Quass [1997]
...
They provide very large speedups on certain types of queries, and are today implemented on most database systems
...

Chapter 13

Query Processing

Query processing refers to the range of activities involved in extracting data from
a database
...

13
...
1
...
Parsing and translation
2
...
Evaluation
Before query processing can begin, the system must translate the query into a usable form
...
A more useful internal representation
is one based on the extended relational algebra
...
This translation process is similar to the work
performed by the parser of a compiler
...
The system constructs a parse-tree representation of the query, which it then translates into
a relational-algebra expression
...
[Figure: Steps in query processing.]
...
1 Most compiler texts cover parsing (see the bibliographical
notes)
...

For example, we have seen that, in SQL, a query could be expressed in several different ways
...
Furthermore, the relational-algebra representation of a query
speciﬁes only partially how to evaluate a query; there are usually several ways to
evaluate relational-algebra expressions
...
For example, to implement the preceding selection, we can search
every tuple in account to ﬁnd tuples with balance less than 2500
...

To specify fully how to evaluate a query, we need not only to provide the relational-algebra expression, but also to annotate it with instructions specifying how to evaluate
...
Therefore, the stored relation can be used, instead of uses of the view being replaced by the expression deﬁning
the view
...
2
...

[Figure: A query-evaluation plan.]
...
Annotations may state the algorithm to be used for a speciﬁc
operation, or the particular index or indices to use
...

A sequence of primitive operations that can be used to evaluate a query is a query-execution plan or query-evaluation plan
...
2 illustrates an evaluation plan
for our example query, in which a particular index (denoted in the ﬁgure as “index 1”) is speciﬁed for the selection operation
...

The different evaluation plans for a given query can have different costs
...
Rather, it is the responsibility of the system to construct a query-evaluation plan
that minimizes the cost of query evaluation
...

Once the query plan is chosen, the query is evaluated with that plan, and the result
of the query is output
...
For instance, instead of using the
relational-algebra representation, several databases use an annotated parse-tree representation based on the structure of the given SQL query
...

In order to optimize a query, a query optimizer must know the cost of each operation
...

Section 13
...
Sections 13
...
6 cover the evaluation of individual relational-algebra operations
...

In Section 13
...

13
...
...
The response time for a query-evaluation plan (that is, the clock
time required to execute the plan), assuming no other activity is taking place on the computer, would account for all these costs, and could be used as a good measure of the
cost of the plan
...
Moreover, CPU speeds have been
improving much faster than have disk speeds
...
Finally, estimating the CPU time is relatively hard, compared to estimating the disk-access cost
...

We use the number of block transfers from disk as a measure of the actual cost
...
This assumption ignores the variance arising from rotational
latency (waiting for the desired data to spin under the read-write head) and seek
time (the time that it takes to move the head over the desired track or cylinder)
...
We also
need to distinguish between reads and writes of blocks, since it takes more time to
write a block to disk than to read a block from disk
...
The number of seek operations performed
2
...
The number of blocks written
and then add up these numbers after multiplying them by the average seek time,
average transfer time for reading a block, and average transfer time for writing a
block, respectively
...
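The refined cost measure, which weights seeks, block reads, and block writes separately, can be written out as a small function. The timing values in the example are invented for illustration (integer microseconds) and do not describe any particular disk.

```python
def plan_cost(seeks, blocks_read, blocks_written,
              avg_seek_time, read_time, write_time):
    # Multiply each count by its average time, then add up the products.
    return (seeks * avg_seek_time
            + blocks_read * read_time
            + blocks_written * write_time)


# Hypothetical plan: 10 seeks, 100 blocks read, 20 blocks written,
# with assumed average times of 4000, 100, and 200 microseconds.
example_cost = plan_cost(10, 100, 20,
                         avg_seek_time=4000, read_time=100, write_time=200)
```

With these assumed timings, the ten seeks account for most of the total, which illustrates why a count of block transfers alone can understate the cost of plans that access blocks in scattered order.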
For simplicity we ignore these details, and leave
it to you to work out more precise cost estimates for various operations
...
These are taken into account separately where required
...

In the best case, all data can be read into the buffers, and the disk does not need
to be accessed again
...
When presenting cost
estimates, we generally assume the worst case
...
3 Selection Operation
In query processing, the ﬁle scan is the lowest-level operator to access data
...

...

13
...
1 Basic Algorithms
Consider a selection operation on a relation whose tuples are stored together in one
ﬁle
...
In a linear search, the system scans each ﬁle block and tests
all records to see whether they satisfy the selection condition
...

The cost of linear search, in terms of number of I/O operations, is b_r, where
b_r denotes the number of blocks in the file
...

Although it may be slower than other algorithms for implementing selection, the linear search algorithm can be applied to any ﬁle, regardless of the
ordering of the ﬁle, or the availability of indices, or the nature of the selection
operation
...

• A2 (binary search)
...
The system performs the binary search
on the blocks of the ﬁle
...
If the selection is on a nonkey attribute, more than one block may
contain required records, and the cost of reading the extra blocks has to be
added to the cost estimate
...
2), and dividing it by
the average number of records that are stored per block of the relation
...
3
...
In Chapter 12, we pointed out that it is
efﬁcient to read the records of a ﬁle in an order corresponding closely to physical
order
...
An index that is not a
primary index is called a secondary index
...
Ordered indices,
such as B+ -trees, also permit access to tuples in a sorted order, which is useful for
implementing range queries
...
We
use the selection predicate to guide us in the choice of the index to use in processing
the query
...
...
For an equality comparison on a key
attribute with a primary index, we can use the index to retrieve a single record
that satisﬁes the corresponding equality condition
...

• A4 (primary index, equality on nonkey)
...
The only difference from the previous
case is that multiple records may need to be fetched
...

The cost of the operation is proportional to the height of the tree, plus the
number of blocks containing records with the speciﬁed search key
...
Selections specifying an equality condition
can use a secondary index
...

In the ﬁrst case, only one record is retrieved, and the cost is equal to the
height of the tree plus one I/O operation to fetch the record
...
The cost could become even worse than
that of linear search if a large number of records are retrieved
...
If secondary indices store pointers to records’ physical location, the pointers
will have to be updated when records are moved
...
Accessing a record through a secondary index is then
even more expensive since a search has to be performed on the B+ -tree used in the
ﬁle organization
...

13
...
3 Selections Involving Comparisons
Consider a selection of the form σ_{A≤v}(r)
...
A primary ordered index (for example, a
primary B+ -tree index) can be used when the selection condition is a comparison
...
For A ≥ v, we
look up the value v in the index to ﬁnd the ﬁrst tuple in the ﬁle that has a value
of A = v
...

all tuples that satisfy the condition
...

For comparisons of the form A < v or A ≤ v, an index lookup is not required
...
The case of A ≤ v is similar, except that the scan continues up to (but
not including) the ﬁrst tuple with attribute A > v
...

• A7 (secondary index, comparison)
...
The lowest-level index blocks are scanned, either from the smallest value up to v (for <
and ≤), or from v up to the maximum value (for > and ≥)
...
This step may require an I/O operation for each record fetched, since consecutive records may
be on different disk blocks
...

Therefore the secondary index should be used only if very few records are
selected
...
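
As a rough illustration of the primary-index comparisons described above, the following Python sketch treats a sorted list as the file ordered on A. The names `select_ge` and `select_le` are hypothetical, and `bisect_left` merely stands in for the index lookup; a real system would scan disk blocks rather than a list.

```python
from bisect import bisect_left

def select_ge(sorted_keys, v):
    """sigma_{A >= v}: locate the first key >= v, then scan to the end of the file."""
    start = bisect_left(sorted_keys, v)   # plays the role of the index lookup
    return sorted_keys[start:]

def select_le(sorted_keys, v):
    """sigma_{A <= v}: no lookup needed; scan from the start and stop at the first key > v."""
    result = []
    for key in sorted_keys:
        if key > v:
            break
        result.append(key)
    return result

keys = [3, 7, 7, 12, 19, 24]   # made-up search-key values, already sorted
print(select_ge(keys, 7))      # [7, 7, 12, 19, 24]
print(select_le(keys, 7))      # [3, 7, 7]
```
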
3
...
We now consider more complex selection
predicates
...

• Negation: The result of a selection σ¬θ (r) is the set of tuples of r for which the
condition θ evaluates to false
...

We can implement a selection operation involving either a conjunction or a disjunction of simple conditions by using one of the following algorithms:
• A8 (conjunctive selection using one index)
...
If one is, one of the selection algorithms A2 through A7 can retrieve records satisfying that condition
...

To reduce the cost, we choose a θi and one of algorithms A1 through A7 for
which the combination results in the least cost for σθi (r)
...

• A9 (conjunctive selection using composite index)
...
If the selection speciﬁes an equality condition on two
or more attributes, and a composite index exists on these combined attribute
ﬁelds, then the index can be searched directly
...

• A10 (conjunctive selection by intersection of identiﬁers)
...
This algorithm requires indices with
record pointers, on the ﬁelds involved in the individual conditions
...
The intersection of all the retrieved pointers is the set of pointers to tuples
that satisfy the conjunctive condition
...
If indices are not available on all the individual
conditions, then the algorithm tests the retrieved records against the remaining conditions
...
This cost can be reduced by sorting the list of pointers and
retrieving records in the sorted order
...
Section 13
...

• A11 (disjunctive selection by union of identiﬁers)
...
The union of all the
retrieved pointers yields the set of pointers to all tuples that satisfy the disjunctive condition
...

However, if even one of the conditions does not have an access path, we
will have to perform a linear scan of the relation to ﬁnd tuples that satisfy the
condition
...
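
The identifier-based algorithms A10 and A11 can be sketched with ordinary sets. This is only a minimal illustration: the pointer lists below are made-up record identifiers, and the function names are hypothetical rather than part of any real system.

```python
def conjunctive_by_intersection(pointer_lists):
    """A10: intersect the record pointers returned by the index on each condition."""
    result = set(pointer_lists[0])
    for pointers in pointer_lists[1:]:
        result &= set(pointers)
    # Returning the pointers in sorted order lets records be fetched block by block.
    return sorted(result)

def disjunctive_by_union(pointer_lists):
    """A11: union the record pointers returned by the index on each condition."""
    result = set()
    for pointers in pointer_lists:
        result |= set(pointers)
    return sorted(result)

# Hypothetical pointers returned by two separate index scans.
matches_condition_1 = [2, 5, 8, 11]
matches_condition_2 = [5, 6, 11, 14]

print(conjunctive_by_intersection([matches_condition_1, matches_condition_2]))  # [5, 11]
print(disjunctive_by_union([matches_condition_1, matches_condition_2]))         # [2, 5, 6, 8, 11, 14]
```
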

The implementation of selections with negation conditions is left to you as an exercise (Exercise 13
...

13
...
4 Sorting
Sorting of data plays an important role in database systems for two reasons
...
Second, and equally important for
query processing, several of the relational operations, such as joins, can be implemented efﬁciently if the input relations are ﬁrst sorted
...
5
...
However, such a process orders the relation
only logically, through an index, rather than physically
...

For this reason, it may be desirable to order the records physically
...
In the ﬁrst
case, standard sorting techniques such as quick-sort can be used
...

Sorting of relations that do not ﬁt in memory is called external sorting
...

We describe the external sort–merge algorithm next
...

1
...

i = 0;
repeat
    read M blocks of the relation, or the rest of the relation,
        whichever is smaller;
    sort the in-memory part of the relation;
    write the sorted data to run file Ri;
    i = i + 1;
until the end of the relation
2
...
Suppose, for now, that the total number of runs, N, is less than M, so that we can allocate one page frame to each
run and have space left to hold one page of output
...
The output ﬁle is buffered
to reduce the number of disk write operations
...

In general, if the relation is much larger than memory, there may be M or more
runs generated in the ﬁrst stage, and it is not possible to allocate a page frame for each
run during the merge stage
...
Since there is enough memory for M − 1 input buffer pages, each merge can
take M − 1 runs as input
...
Then, it merges the next M − 1
runs similarly, and so on, until it has processed all the initial runs
...
If this reduced number of runs
is still greater than or equal to M , another pass is made, with the runs created by the
ﬁrst pass as input
...
The
passes repeat as many times as required, until the number of runs is less than M ; a
ﬁnal pass then generates the sorted output
...
3 illustrates the steps of the external sort–merge for an example relation
...
During the merge stage,
two page frames are used for input and one for output
...
Figure: External sorting using sort–merge (stages shown: create runs, merge passes, sorted output)
...
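
The two stages of external sort–merge can be sketched in Python as follows. This is a simplified model under stated assumptions: a "block" is a small list of records, "disk" is just memory, and `heapq.merge` stands in for the buffered (M − 1)-way merge. The function names and the sample keys are hypothetical.

```python
import heapq

def create_runs(blocks, M):
    """Stage 1: read M blocks at a time, sort them in memory, and emit one sorted run."""
    runs = []
    for i in range(0, len(blocks), M):
        in_memory = [record for block in blocks[i:i + M] for record in block]
        runs.append(sorted(in_memory))
    return runs

def merge_pass(runs, M):
    """One merge pass: merge groups of M - 1 runs (one buffer page is kept for output)."""
    fan_in = M - 1
    return [list(heapq.merge(*runs[i:i + fan_in]))
            for i in range(0, len(runs), fan_in)]

def external_sort_merge(blocks, M):
    """Return the sorted records and the number of merge passes that were needed."""
    if M < 3:
        raise ValueError("need at least 3 page frames: two for input, one for output")
    runs = create_runs(blocks, M)
    passes = 0
    while len(runs) > 1:
        runs = merge_pass(runs, M)
        passes += 1
    return (runs[0] if runs else []), passes

# Twelve one-record blocks and three page frames (made-up key values).
blocks = [[k] for k in [24, 19, 31, 33, 14, 16, 21, 3, 2, 7, 5, 9]]
records, passes = external_sort_merge(blocks, M=3)
print(passes)   # 2 (four initial runs, merged two at a time)
```
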
We compute how many block transfers are required for the external sort merge
in this way: Let br denote the number of blocks containing records of relation r
...
The initial number of runs is $\lceil b_r/M \rceil$
...
Each of these passes reads every block of the relation
once and writes it out once, with two exceptions
...
Second, there may be runs that
are not read in or written out during a pass— for example, if there are M runs to
be merged in a pass, M − 1 are read in and merged, and one run is not accessed
during the pass
...
3, we get a total of 12∗(4+1) =
60 block transfers, as you can verify from the ﬁgure
...
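
The block-transfer count can be checked with a few lines of Python. The sketch below assumes the accounting described above: run creation reads and writes every block, each merge pass does the same, and the write of the final output is not counted. The helper names are hypothetical, and the pass count is computed by repeated ceiling division rather than a floating-point logarithm.

```python
def ceil_div(a, b):
    """Integer ceiling of a / b."""
    return -(-a // b)

def sort_merge_block_transfers(b_r, M):
    """Block transfers for external sort-merge of b_r blocks with M page frames (M >= 3)."""
    runs = ceil_div(b_r, M)            # initial number of runs
    merge_passes = 0
    while runs > 1:                    # each pass merges M - 1 runs into one
        runs = ceil_div(runs, M - 1)
        merge_passes += 1
    # 2 * b_r for run creation, 2 * b_r per merge pass, minus b_r for the uncounted final write.
    return b_r * (2 * merge_passes + 1)

print(sort_merge_block_transfers(12, 3))   # 60, matching 12 * (4 + 1)
```
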

13
...

We use the term equi-join to refer to a join of the form $r \bowtie_{r.A = s.B} s$, where A and
B are attributes or sets of attributes of relations r and s respectively
...

• Number of blocks of customer: bcustomer = 400
...

• Number of blocks of depositor: bdepositor = 100
...
5
...
4 shows a simple algorithm to compute the theta join, $r \bowtie_\theta s$, of two relations r and s
...
Relation r is called the outer relation and
relation s the inner relation of the join, since the loop for r encloses the loop for s
...

Like the linear ﬁle-scan algorithm for selection, the nested-loop join algorithm requires no indices, and it can be used regardless of what the join condition is
...
end
end
Figure 13
...

join can be expressed as a theta join followed by elimination of repeated attributes by
a projection
...

The nested-loop join algorithm is expensive, since it examines every pair of tuples
in the two relations
...
The number
of pairs of tuples to be considered is nr ∗ns , where nr denotes the number of tuples in
r, and ns denotes the number of tuples in s
...
In the worst case, the buffer can hold only one block of each
relation, and a total of nr ∗ bs + br block accesses would be required, where br and
bs denote the number of blocks containing tuples of r and s respectively
...

If one of the relations ﬁts entirely in main memory, it is beneﬁcial to use that relation as the inner relation, since the inner relation would then be read only once
...

Now consider the natural join of depositor and customer
...
We can use the nested loops to compute the join; assume that depositor
is the outer relation and customer is the inner relation in the join
...
In the worst case, the number of
block accesses is 5000 ∗ 400 + 100 = 2,000,100
...
This computation
requires at most 100 + 400 = 500 block accesses — a signiﬁcant improvement over the
worst-case scenario
...
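
A minimal Python sketch of the nested-loop join and its two cost bounds follows. The relations are modeled as lists of tuples, the sample rows are made up, and the function names are hypothetical; the cost function simply evaluates the expressions given above.

```python
def nested_loop_join(r, s, theta):
    """Pair every tuple of the outer relation r with every tuple of the inner relation s."""
    return [tr + ts for tr in r for ts in s if theta(tr, ts)]

def nested_loop_cost(n_r, b_r, b_s, inner_fits_in_memory=False):
    """Block accesses: n_r * b_s + b_r in the worst case, b_r + b_s if s stays in memory."""
    if inner_fits_in_memory:
        return b_r + b_s
    return n_r * b_s + b_r

# Made-up rows: depositor(customer_name, account_number), customer(customer_name, street, city).
depositor = [("Hayes", "A-102"), ("Jones", "A-217")]
customer = [("Hayes", "Main", "Harrison"), ("Smith", "North", "Rye")]
same_name = lambda d, c: d[0] == c[0]

print(nested_loop_join(depositor, customer, same_name))
print(nested_loop_cost(5000, 100, 400))                             # 2000100
print(nested_loop_cost(5000, 100, 400, inner_fits_in_memory=True))  # 500
```
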

13
...
2 Block Nested-Loop Join
If the buffer is too small to hold either relation entirely in memory, we can still obtain a major saving in block accesses if we process the relations on a per-block basis,
rather than on a per-tuple basis
...
5 shows block nested-loop join, which is
a variant of the nested-loop join where every block of the inner relation is paired with
every block of the outer relation
...
for each block Br of r do begin
    for each block Bs of s do begin
        for each tuple tr in Br do begin
            for each tuple ts in Bs do begin
                test pair (tr, ts) to see if they satisfy the join condition
                if they do, add tr · ts to the result
...
5

Block nested-loop join
...
As before,
all pairs of tuples that satisfy the join condition are added to the result
...
Thus, in the worst case, there will be a total of br ∗ bs + br block accesses, where br and bs denote the number of blocks containing records of r and s
respectively
...
In the best case, there will be
br + bs block accesses
...
In the worst case we have to read each block of customer
once for each block of depositor
...
This cost is a significant improvement over the
5000 ∗ 400 + 100 = 2,000,100 block accesses needed in the worst case for the basic
nested-loop join
...
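
The block nested-loop join can be sketched in Python by modeling each relation as a list of blocks, where a block is a list of tuples. The names below are hypothetical, and the cost function only restates the worst-case and best-case expressions from the text.

```python
def block_nested_loop_join(r_blocks, s_blocks, theta):
    """Pair every block of the outer relation with every block of the inner relation."""
    result = []
    for block_r in r_blocks:             # each outer block is read once
        for block_s in s_blocks:         # the inner relation is re-read per outer block
            for tr in block_r:
                for ts in block_s:
                    if theta(tr, ts):
                        result.append(tr + ts)
    return result

def block_nested_loop_cost(b_r, b_s, worst_case=True):
    """Block accesses: b_r * b_s + b_r in the worst case, b_r + b_s in the best case."""
    return b_r * b_s + b_r if worst_case else b_r + b_s

# Made-up relations, two tuples per block.
r_blocks = [[(1, "a"), (2, "b")], [(3, "c")]]
s_blocks = [[(2, "x"), (3, "y")], [(3, "z")]]
print(block_nested_loop_join(r_blocks, s_blocks, lambda tr, ts: tr[0] == ts[0]))
print(block_nested_loop_cost(100, 400))   # 40100
```
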

The performance of the nested-loop and block nested-loop procedures can be further improved:
• If the join attributes in a natural join or an equi-join form a key on the inner
relation, then for each outer relation tuple the inner loop can terminate as soon
as the ﬁrst match is found
...
In other words, if memory has M blocks, we read in M − 2 blocks
of the outer relation at a time, and when we read each block of the inner relation we join it with all the M − 2 blocks of the outer relation
...
The total cost is then
$\lceil b_r/(M-2) \rceil \ast b_s + b_r$
...
This scanning
method orders the requests for disk blocks so that the data remaining in the
buffer from the previous scan can be reused, thus reducing the number of disk
accesses needed
...
Section 13
...
3 describes this optimization
...
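
To see how the amount of memory affects the block nested-loop cost, the expression for reading M − 2 outer blocks at a time can be evaluated directly. The function name is hypothetical, and the memory sizes in the example are chosen only for illustration.

```python
def ceil_div(a, b):
    """Integer ceiling of a / b."""
    return -(-a // b)

def chunked_block_nested_loop_cost(b_r, b_s, M):
    """Block accesses when M - 2 blocks of the outer relation are read at a time."""
    if M < 3:
        raise ValueError("need at least 3 blocks: outer chunk, inner block, output")
    return ceil_div(b_r, M - 2) * b_s + b_r

# Outer relation of 100 blocks, inner relation of 400 blocks.
for M in (3, 27, 102):
    print(M, chunked_block_nested_loop_cost(100, 400, M))
```

With M = 3 the cost equals the plain block nested-loop worst case, and once the whole outer relation fits in M − 2 blocks it drops to b_r + b_s.
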
5
...
4), if an index is available on the inner loop’s join
attribute, index lookups can replace ﬁle scans
...

This join method is called an indexed nested-loop join; it can be used with existing
indices, as well as with temporary indices created for the sole purpose of evaluating
the join
...
For example, consider depositor ⋈ customer
...
Then, the relevant tuples in s
are those that satisfy the selection “customer-name = John”
...
For each tuple
in the outer relation r, a lookup is performed on the index for s, and the relevant
tuples are retrieved
...
Then, br disk accesses are needed to read relation
r, where br denotes the number of blocks containing records of r
...
Then, the cost of the join can be computed as
br + nr ∗ c, where nr is the number of records in relation r, and c is the cost of a single
selection on s using the join condition
...
3 how to estimate
the cost of a single selection algorithm (possibly using indices); that estimate gives us
the value of c
...

For example, consider an indexed nested-loop join of depositor ⋈ customer, with
depositor as the outer relation
...
Since customer has 10,000 tuples, the height of the tree is 4, and one more
access is needed to ﬁnd the actual data
...
This cost is lower than the 40,100 accesses needed
for a block nested-loop join
...
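
An indexed nested-loop join can be sketched with a Python dictionary standing in for the index on the inner relation. The names are hypothetical and the rows are made up. The cost call plugs in figures used in this section (100 blocks and 5000 tuples for depositor, and 4 + 1 accesses per lookup); the resulting total is computed here rather than quoted from the text.

```python
from collections import defaultdict

def build_index(s, key):
    """Build an in-memory stand-in for an index on the inner relation's join attribute."""
    index = defaultdict(list)
    for ts in s:
        index[key(ts)].append(ts)
    return index

def indexed_nested_loop_join(r, s_index, key):
    """For each outer tuple, look up the matching inner tuples through the index."""
    return [tr + ts for tr in r for ts in s_index.get(key(tr), [])]

def indexed_nested_loop_cost(b_r, n_r, c):
    """b_r + n_r * c, where c is the cost of one selection on s through the index."""
    return b_r + n_r * c

depositor = [("Hayes", "A-102"), ("Jones", "A-217")]
customer = [("Hayes", "Main", "Harrison"), ("Smith", "North", "Rye")]
index = build_index(customer, key=lambda c: c[0])

print(indexed_nested_loop_join(depositor, index, key=lambda d: d[0]))
print(indexed_nested_loop_cost(b_r=100, n_r=5000, c=5))   # 25100
```
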
5
...
Let r(R) and s(S) be the relations whose
natural join is to be computed, and let R∩S denote their common attributes
...
pr := address of first tuple of r;
ps := address of first tuple of s;
while (ps ≠ null and pr ≠ null) do
    begin
        ts := tuple to which ps points;
        Ss := {ts};
        set ps to point to next tuple of s;
        done := false;
        while (not done and ps ≠ null) do
            begin
                ts′ := tuple to which ps points;
                if (ts′[JoinAttrs] = ts[JoinAttrs])
                    then begin
                        Ss := Ss ∪ {ts′};
                        set ps to point to next tuple of s;
                    end
                    else done := true;
            end
        tr := tuple to which pr points;
        while (pr ≠ null and tr[JoinAttrs] < ts[JoinAttrs]) do
            begin
                set pr to point to next tuple of r;
                tr := tuple to which pr points;
            end
        while (pr ≠ null and tr[JoinAttrs] = ts[JoinAttrs]) do
            begin
                for each ts in Ss do
                    begin
                        add ts ⋈ tr to result;
                    end
                set pr to point to next tuple of r;
                tr := tuple to which pr points;
            end
    end
...
6

Merge join
...
Then, their join can be computed
by a process much like the merge stage in the merge – sort algorithm
...
6 shows the merge join algorithm
...
The merge join algorithm associates one pointer
with each relation
...
As the algorithm proceeds, the pointers move through the relation
...

The algorithm in Figure 13
...
Then, the corresponding tuples (if any) of the other relation are read in, and
are processed as they are read
...
7 shows two relations that are sorted on their join attribute a1
...

Since the relations are in sorted order, tuples with the same value on the join attributes are in consecutive order
...
Since it makes only
a single pass through both ﬁles, the merge join method is efﬁcient; the number of
block accesses is equal to the sum of the number of blocks in both ﬁles, br + bs
...
The merge join algorithm
can also be easily extended from natural joins to the more general case of equi-joins
...
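
The merge join can be sketched in Python for two lists that are already sorted on their join attributes. This is an in-memory illustration with hypothetical names and made-up rows; it follows the same three steps as the algorithm: gather the group of s tuples with equal join values, skip smaller r tuples, then pair the matching r tuples with the whole group.

```python
def merge_join(r, s, key_r, key_s):
    """Equi-join of two relations that are both sorted on their join attributes."""
    result = []
    i = j = 0
    while i < len(r) and j < len(s):
        k = key_s(s[j])
        # Collect the set of consecutive s tuples that share the join value k.
        group = []
        while j < len(s) and key_s(s[j]) == k:
            group.append(s[j])
            j += 1
        # Skip r tuples whose join value is smaller than k.
        while i < len(r) and key_r(r[i]) < k:
            i += 1
        # Pair every r tuple with join value k against the whole group.
        while i < len(r) and key_r(r[i]) == k:
            result.extend(r[i] + ts for ts in group)
            i += 1
    return result

r = [(1, "x"), (2, "y"), (2, "z"), (4, "w")]
s = [(2, "p"), (2, "q"), (3, "r"), (4, "t")]
first = lambda t: t[0]
print(merge_join(r, s, first, first))
```
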

The join attribute here is customer-name
...
In this case, the merge join takes a total of 400 +
100 = 500 block accesses
...
Sorting customer takes $400 \ast (2\lceil \log_2(400/3) \rceil + 1)$, or
6800, block transfers, with 400 more transfers to write out the result
...
Thus, the total cost is 9100 block transfers if the relations are not sorted,
and the memory size is just 3 blocks
...
Adding the cost of writing out the sorted results and reading them
back gives a total cost of 2500 block transfers if the relations are not sorted and the
memory size is 25 blocks
...
6 requires that the set
Ss of all tuples with the same value for the join attributes must ﬁt in main memory
...
7

Sorted relations for merge join
...
This requirement can usually be met, even if the relation s is large
...
The overall cost of the merge join increases as a
result
...
The algorithm scans the records
through the indices, resulting in their being retrieved in sorted order
...
Hence, each tuple access could involve accessing a disk block, and
that is costly
...
Suppose that one of the relations is sorted; the other is unsorted, but has a secondary B+ -tree index on the join attributes
...
The result ﬁle contains tuples from the sorted relation and addresses for
tuples of the unsorted relation
...
Extensions of the technique to handle
two unsorted relations are left as an exercise for you
...
5
...
In the hash join algorithm, a hash function h is used to
partition tuples of both relations
...

We assume that
• h is a hash function mapping JoinAttrs values to {0, 1,
...

• Hr0 , Hr1 ,
...
Each tuple tr ∈ r is put in partition Hri , where i = h(tr [JoinAttrs])
...
, Hsnh denote partitions of s tuples, each initially empty
...

The hash function h should have the “goodness” properties of randomness and uniformity that we discussed in Chapter 12
...
8 depicts the partitioning of the
relations
...
If that value is hashed to some value i, the r tuple has to be in Hri and the
s tuple in Hsi
...

Figure: Hash partitioning of relations r and s (partitions 0 through 4 shown for each relation)

For example, if d is a tuple in depositor, c a tuple in customer, and h a hash function
on the customer-name attributes of the tuples, then d and c must be tested only if h(c) = h(d)
...

However, if h(c) = h(d), we must test c and d to see whether the values in their join
attributes are the same, since it is possible that c and d have different customer-names
that have the same hash value
...
9 shows the details of the hash join algorithm to compute the natural
join of relations r and s
...
After the partitioning of the relations, the rest of the hash join code performs
a separate indexed nested-loop join on each of the partition pairs i, for i = 0,
...

To do so, it ﬁrst builds a hash index on each Hsi , and then probes (that is, looks
up Hsi ) with tuples from Hri
...

The hash index on Hsi is built in memory, so there is no need to access the disk to
retrieve the tuples
...
In the
course of the indexed nested-loop join, the system uses this hash index to retrieve
records that will match records in the probe input
...
It is straightforward to extend the hash join algorithm to compute
general equi-joins
...
It is not necessary for the partitions of the probe relation to ﬁt in
memory
...
If the
size of the build relation is $b_s$ blocks, then, for each of the $n_h$ partitions to be of size
less than or equal to M, $n_h$ must be at least $\lceil b_s/M \rceil$
...
/* Partition s */
for each tuple ts in s do begin
    i := h(ts[JoinAttrs]);
    Hsi := Hsi ∪ {ts};
end
/* Partition r */
for each tuple tr in r do begin
    i := h(tr[JoinAttrs]);
    Hri := Hri ∪ {tr};
end
/* Perform join on each partition */
for i := 0 to nh do begin
    read Hsi and build an in-memory hash index on it
    for each tuple tr in Hri do begin
        probe the hash index on Hsi to locate all tuples ts
            such that ts[JoinAttrs] = tr[JoinAttrs]
        for each matching tuple ts in Hsi do begin
            add tr ⋈ ts to the result
        end
    end
end
Figure 13
...

to account for the extra space occupied by the hash index on the partition as well, so
nh should be correspondingly larger
...
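
The hash join above can be sketched in Python as follows. This is an in-memory illustration only: partitions are lists rather than disk files, Python's built-in `hash` stands in for h, and the sketch uses partitions numbered 0 to n_h − 1 (an assumption made for simplicity). All names and rows are hypothetical.

```python
from collections import defaultdict

def partition(relation, key, n_h):
    """Split a relation into n_h partitions by hashing the join attributes."""
    parts = [[] for _ in range(n_h)]
    for t in relation:
        parts[hash(key(t)) % n_h].append(t)
    return parts

def hash_join(r, s, key_r, key_s, n_h):
    """Join r (probe input) with s (build input), one partition pair at a time."""
    result = []
    for h_r, h_s in zip(partition(r, key_r, n_h), partition(s, key_s, n_h)):
        index = defaultdict(list)               # in-memory hash index on this s partition
        for ts in h_s:
            index[key_s(ts)].append(ts)
        for tr in h_r:                          # probe with tuples of the r partition
            for ts in index.get(key_r(tr), []):
                result.append(tr + ts)
    return result

depositor = [("Hayes", "A-102"), ("Jones", "A-217"), ("Hayes", "A-305")]
customer = [("Hayes", "Main"), ("Smith", "North"), ("Jones", "Park")]
name = lambda t: t[0]
joined = hash_join(depositor, customer, name, name, n_h=3)
print(sorted(joined))
```

The output is sorted before printing because the order in which partitions are filled depends on the hash values.
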

13
...
5
...
Instead, partitioning has to be done in repeated passes
...
Each bucket generated by one pass is separately read in and
partitioned again in the next pass, to create smaller partitions
...
The system
repeats this splitting of the input until each partition of the build input ﬁts in memory
...

A relation does not need recursive partitioning if $M > n_h + 1$, or equivalently $M > (b_s/M) + 1$, which simplifies (approximately) to $M > \sqrt{b_s}$
We can use a memory of this size to partition relations of size
9 million blocks, which is 36 gigabytes
...

13
...
5
...
Hash-table overﬂow can occur if there are many
tuples in the build relation with the same values for the join attributes, or if the hash
function does not have the properties of randomness and uniformity
...

We can handle a small amount of skew by increasing the number of partitions so
that the expected size of each partition (including the hash index on the partition)
is somewhat less than the size of memory
...
5
...

Even if we are conservative on the sizes of the partitions, by using a fudge factor,
overﬂows can still occur
...
Overﬂow resolution is performed during the build phase,
if a hash-index overﬂow is detected
...
Similarly, Hri is also partitioned using the new hash
function, and only tuples in the matching partitions need to be joined
...
In overﬂow avoidance, the build relation s
is initially partitioned into many small partitions, and then some partitions are combined in such a way that each combined partition ﬁts in memory
...

If a large number of tuples in s have the same value for the join attributes, the resolution and avoidance techniques may fail on some partitions
...

13
...
5
...
Our analysis assumes that there is no hash-table overflow
...

The partitioning of the two relations r and s calls for a complete reading of both relations, and a subsequent writing back of them
...
The build and probe phases read each of the partitions once, calling for a further br + bs accesses
...
Accessing such partially ﬁlled blocks can add an overhead of at most 2nh for each of the relations, since
each of the nh partitions could have a partially ﬁlled block that has to be written and
read back
...
The overhead 4nh is quite small compared to br + bs , and can be ignored
...
Each pass reduces
the size of each of the partitions by an expected factor of M − 1; and passes are
repeated until each partition is of size at most M blocks
...
Since, in each pass,
every block of s is read in and written out, the total block transfers for partitioning of
s is $2 b_s \lceil \log_{M-1}(b_s) - 1 \rceil$
...
With a memory size of 20
blocks, depositor can be partitioned into ﬁve partitions, each of size 20 blocks, which
size will ﬁt into memory
...
The relation
customer is similarly partitioned into ﬁve partitions, each of size 80
...

The hash join can be improved if the main memory size is large
...
The cost estimate goes down to br + bs
...
5
...
4 Hybrid Hash–Join
The hybrid hash–join algorithm performs another optimization; it is useful when
memory sizes are relatively large, but not all of the build relation fits in memory
...
Hence,
a total of $n_h + 1$ blocks of memory are needed for partitioning the two relations
...
Further, the hash function is designed in such a
way that the hash index on Hs0 ﬁts in M − nh − 1 blocks, in order that, at the end of
partitioning of s, Hs0 is completely in memory and a hash index can be built on Hs0
...
After they are used for probing,
the tuples can be discarded, so the partition Hr0 does not occupy any memory space
...

The system writes out tuples in the other partitions as usual, and joins them later
...

If the size of the build relation is bs , nh is approximately equal to bs /M
...
For example, suppose the block size is 4 kilobytes, and
the build relation size is 1 gigabyte
...

Consider the join customer ⋈ depositor again
...
It occupies 20 blocks
of memory; one block is for input and one block each is for buffering the other four
partitions
...
Ignoring the cost of writing partially ﬁlled blocks, the
cost is 3(80 + 320) + 20 + 80 = 1300 block transfers, instead of 1500 block transfers
without the hybrid hashing optimization
...
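
The two cost figures in this example can be reproduced with a short calculation. The sketch below assumes no recursive partitioning and equal-sized partitions, with the first partition of the build input kept in memory in the hybrid case; the function names are hypothetical.

```python
def hash_join_cost(b_r, b_s, n_h=0):
    """3(b_r + b_s) + 4 n_h block transfers; n_h = 0 ignores partially filled blocks."""
    return 3 * (b_r + b_s) + 4 * n_h

def hybrid_hash_join_cost(b_r, b_s, partitions):
    """Hybrid cost when the first of `partitions` equal-sized partitions stays in memory."""
    first_s = b_s // partitions      # build partition kept in memory: read once
    first_r = b_r // partitions      # matching probe partition: read once, never written
    spilled = (b_s - first_s) + (b_r - first_r)
    return 3 * spilled + first_s + first_r

# customer has 400 blocks (probe input), depositor has 100 blocks (build input).
print(hash_join_cost(400, 100))              # 1500
print(hybrid_hash_join_cost(400, 100, 5))    # 1300
```
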
5
...
The other join techniques are more efficient than the nested-loop join and its
variants, but can handle only simple join conditions, such as natural joins or equi-joins
...
3
...

Consider the following join with a conjunctive condition:

$$r \bowtie_{\theta_1 \wedge \theta_2 \wedge \cdots \wedge \theta_n} s$$

One or more of the join techniques described earlier may be applicable for joins on
the individual conditions $r \bowtie_{\theta_1} s$, $r \bowtie_{\theta_2} s$, $r \bowtie_{\theta_3} s$, and so on
...

The result of the complete join consists of those tuples in the intermediate result that
satisfy the remaining conditions

$$\theta_1 \wedge \cdots \wedge \theta_{i-1} \wedge \theta_{i+1} \wedge \cdots \wedge \theta_n$$

These conditions can be tested as tuples in $r \bowtie_{\theta_i} s$ are being generated
...
6 describes algorithms for computing the union of relations
...
6 Other Operations
Other relational operations and extended relational operations — such as duplicate
elimination, projection, set operations, outer join, and aggregation — can be implemented as outlined in Sections 13
...
1 through 13
...
5
...
13
...
1 Duplicate Elimination
We can implement duplicate elimination easily by sorting
...
With
external sort–merge, duplicates found while a run is being created can be removed
before the run is written to disk, thereby reducing the number of block transfers
...
The worst-case cost estimate for duplicate elimination is the same
as the worst-case cost estimate for sorting of the relation
...
First, the relation is partitioned on the basis of a hash function on the whole
tuple
...

While constructing the hash index, a tuple is inserted only if it is not already present
...
After all tuples in the partition have been processed, the tuples in the hash index are written to the result
...

Because of the relatively high cost of duplicate elimination, SQL requires an explicit
request by the user to remove duplicates; otherwise, the duplicates are retained
...
6
...
Duplicates can be eliminated by the methods described in Section 13
...
1
...
Generalized projection (which was
discussed in Section 3
...
1) can be implemented in the same way as projection
...
6
...
In r ∪ s, when a concurrent scan of both relations reveals the same tuple in
both ﬁles, only one of the tuples is retained
...
We implement set difference, r − s, similarly, by
retaining tuples in r only if they are absent in s
...
If the relations are not sorted initially, the cost of sorting has to be
included
...

Hashing provides another way to implement these set operations
...
, Hrnh and Hs0 , Hs1 ,
...
Depending on the
operation, the system then takes these steps on each partition i = 0, 1
...
Build an in-memory hash index on Hri
...
Add the tuples in Hsi to the hash index only if they are not already present
...
Add the tuples in the hash index to the result
...
Build an in-memory hash index on Hri
...
For each tuple in Hsi , probe the hash index, and output the tuple to the
result only if it is already present in the hash index
...
Build an in-memory hash index on Hri
...
For each tuple in Hsi , probe the hash index, and, if the tuple is present in
the hash index, delete it from the hash index
...
Add the tuples remaining in the hash index to the result
...
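
The hash-based set operations can be sketched in Python with a `set` playing the role of the in-memory hash index on each partition of r. This is an illustrative model with hypothetical names: both relations are partitioned with the same hash function, and each pair of partitions is processed independently.

```python
def hash_set_operation(r, s, op, n_h=4):
    """Compute r union s, r intersection s, or r - s one partition pair at a time."""
    parts_r = [[] for _ in range(n_h)]
    parts_s = [[] for _ in range(n_h)]
    for t in r:
        parts_r[hash(t) % n_h].append(t)
    for t in s:
        parts_s[hash(t) % n_h].append(t)

    result = []
    for h_r, h_s in zip(parts_r, parts_s):
        index = set(h_r)                           # in-memory hash index on the r partition
        if op == "union":
            index.update(h_s)                      # add s tuples not already present
            result.extend(index)
        elif op == "intersection":
            result.extend({t for t in h_s if t in index})
        elif op == "difference":
            index.difference_update(h_s)           # delete tuples that also occur in s
            result.extend(index)
        else:
            raise ValueError(f"unknown operation: {op}")
    return sorted(result)

r = [(1,), (2,), (3,)]
s = [(2,), (3,), (4,)]
print(hash_set_operation(r, s, "union"))         # [(1,), (2,), (3,), (4,)]
print(hash_set_operation(r, s, "intersection"))  # [(2,), (3,)]
print(hash_set_operation(r, s, "difference"))    # [(1,)]
```
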
6
...
3
...
For example, the natural left
outer join customer ⟕ depositor contains the join of customer and depositor, and, in
addition, for each customer tuple t that has no matching tuple in depositor (that is,
where customer-name is not in depositor), the following tuple t1 is added to the result
...

The remaining attributes (from the schema of depositor) of tuple t1 contain the value
null
...
Compute the corresponding join, and then add further tuples to the join result to get the outer-join result
...
To evaluate r ⟕θ s, we first compute r ⋈θ s, and
save that result as temporary relation q1
...
We can use any of the algorithms for computing the joins, projection, and set difference described earlier
to compute the outer joins
...

The right outer-join operation r ⟖θ s is equivalent to s ⟕θ r, and can
therefore be implemented in a symmetric fashion to the left outer join
...

2
...
It is easy to extend the nested-loop join algorithms
to compute the left outer join: Tuples in the outer relation that do not match
any tuple in the inner relation are written to the output after being padded
with null values
...

Merge join
can be extended to compute the full outer join as follows: When the merge
of the two relations is being done, tuples in either relation that did not match
any tuple in the other relation can be padded with nulls and written to the output
...
Since the relations are sorted, it is easy to detect whether or
not a tuple matches any tuples from the other relation
...

The cost estimates for implementing outer joins using the merge join algorithm are the same as are those for the corresponding join
...

The extension of the hash join algorithm to compute outer joins is left for
you to do as an exercise (Exercise 13
...
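
The nested-loop extension for the left outer join can be sketched in Python as follows. The names are hypothetical, `None` stands in for the null value, and `s_arity` (the number of attributes contributed by the inner relation) is a parameter introduced only for this illustration.

```python
def left_outer_nested_loop_join(r, s, theta, s_arity):
    """Nested-loop join that pads unmatched outer tuples with nulls."""
    result = []
    for tr in r:
        matched = False
        for ts in s:
            if theta(tr, ts):
                result.append(tr + ts)
                matched = True
        if not matched:
            # No inner tuple matched: keep the outer tuple, padded with nulls.
            result.append(tr + (None,) * s_arity)
    return result

customer = [("Hayes", "Main"), ("Smith", "North")]
depositor = [("Hayes", "A-102")]
print(left_outer_nested_loop_join(customer, depositor,
                                  lambda c, d: c[0] == d[0], s_arity=2))
```
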

13
...
5 Aggregation
Recall the aggregation operator G, discussed in Section 3
...
2
...

The aggregation operation can be implemented in the same way as duplicate elimination
...
However, instead of eliminating tuples with the same value for the grouping attribute, we
gather them into groups, and apply the aggregation operations on each group to get
the result
...

Instead of gathering all the tuples in a group and then applying the aggregation
operations, we can implement the aggregation operations sum, min, max, count, and
avg on the ﬂy as the groups are being constructed
...
For the count operation, it maintains a running count for each group for
which a tuple has been found
...

If all tuples of the result will ﬁt in memory, both the sort-based and the hash-based
implementations do not need to write any tuples to disk
...
When we use on-the-fly
aggregation techniques, only one tuple needs to be stored for each of the groups
...
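
On-the-fly aggregation can be sketched in Python with one accumulator per group. The sketch assumes a hash-based grouping (a dictionary keyed on the grouping attribute); the function name and the sample rows are hypothetical. The average is derived at the end from the running sum and count.

```python
def aggregate_on_the_fly(tuples, group_key, value):
    """Maintain running sum, count, min, and max per group; derive avg at the end."""
    groups = {}
    for t in tuples:
        g, v = group_key(t), value(t)
        if g not in groups:
            groups[g] = {"sum": v, "count": 1, "min": v, "max": v}
        else:
            acc = groups[g]
            acc["sum"] += v
            acc["count"] += 1
            acc["min"] = min(acc["min"], v)
            acc["max"] = max(acc["max"], v)
    for acc in groups.values():
        acc["avg"] = acc["sum"] / acc["count"]
    return groups

# Made-up rows of (branch_name, balance).
accounts = [("Perryridge", 400), ("Brighton", 750), ("Perryridge", 900)]
totals = aggregate_on_the_fly(accounts, group_key=lambda t: t[0], value=lambda t: t[1])
print(totals["Perryridge"])
```
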

13
...
Now
we consider how to evaluate an expression containing multiple operations
...
The result of each evaluation is materialized in a temporary
relation for subsequent use
...
An
alternative approach is to evaluate several operations simultaneously in a pipeline,
with the results of one operation passed on to the next, without the need to store a
temporary relation
...
7
...
7
...
We shall see that the costs of these approaches can differ
substantially, but also that there are cases where only the materialization approach is
feasible
...
7
...
Consider the expression
$\Pi_{\mathit{customer\text{-}name}}(\sigma_{\mathit{balance} < 2500}(\mathit{account}) \bowtie \mathit{customer})$
in Figure 13
...

[Operator tree: Π customer-name at the root; below it, the join of σ balance < 2500 (account) with customer]

Figure 13
...

In our example, there is only one such operation; the selection operation on account
...
We execute these operations by the algorithms that we
studied earlier, and we store the results in temporary relations
...
In our
example, the inputs to the join are the customer relation and the temporary relation
created by the selection on account
...

By repeating the process, we will eventually evaluate the operation at the root of
the tree, giving the ﬁnal result of the expression
...

Evaluation as just described is called materialized evaluation, since the results of
each intermediate operation are created (materialized) and then are used for evaluation of the next-level operations
...
When we computed the cost estimates of algorithms, we ignored the
cost of writing the result of the operation to disk
...
We assume that the records
of the result accumulate in a buffer, and, when the buffer is full, they are written to
disk
...

Double buffering (using two buffers, with one continuing execution of the algorithm while the other is being written out) allows the algorithm to execute more
quickly by performing CPU activity in parallel with I/O activity
...
7
...
We achieve this reduction by combining several relational operations into a pipeline of operations, in which the results of one operation are passed
along to the next operation in the pipeline
...
Combining operations into a pipeline eliminates the cost of
reading and writing temporary relations
...
If materialization were applied, evaluation would involve creating a temporary relation to hold the result of the
join, and then reading back in the result to perform the projection
...
By combining the join and
the projection, we avoid creating the intermediate result, and instead create the ﬁnal
result directly
...
13
...
7
...
1 Implementation of Pipelining
We can implement a pipeline by constructing a single, complex operation that combines the operations that constitute the pipeline
...
Therefore, each operation in the pipeline is modeled as a separate process or thread within the system,
which takes a stream of tuples from its pipelined inputs, and generates a stream of
tuples for its output
...

In the example of Figure 13
...
In turn,
it passes the results of the join to the projection as they are generated
...
However,
as a result of pipelining, the inputs to the operations are not available all at once for
processing
...
Demand driven
2
...
Each time that an operation receives a request
for tuples, it computes the next tuple (or tuples) to be returned, and then returns
that tuple
...
If it has some pipelined inputs, the operation also
makes requests for tuples from its pipelined inputs
...

In a producer-driven pipeline, operations do not wait for requests to produce
tuples, but instead generate the tuples eagerly
...
An operation at any other level of a pipeline generates output tuples
when it gets input tuples from lower down in the pipeline, until its output buffer is
full
...
In either case, once the output buffer is full, the operation waits
until its parent operation removes tuples from the buffer, so that the buffer has space
for more tuples
...
The operation repeats this process until all the output tuples have been
generated
...
In a parallel-processing system, operations in a pipeline
may be run concurrently on distinct processors (see Chapter 20)
...
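
The producer-driven scheme described above can be sketched in Python as follows. This is a minimal illustration, not a real engine: it assumes each operation runs in its own thread and that a bounded queue plays the role of the output buffer. The names `scan`, `select`, `run_pipeline`, `END`, and the buffer size are hypothetical.

```python
import queue
import threading

END = object()  # hypothetical end-of-stream marker

def scan(rows, out):
    # Bottom-level operation: pushes tuples eagerly.
    # put() blocks while the output buffer is full.
    for row in rows:
        out.put(row)
    out.put(END)

def select(pred, inp, out):
    # Higher-level operation: produces output as input tuples arrive.
    while True:
        row = inp.get()
        if row is END:
            break
        if pred(row):
            out.put(row)
    out.put(END)

def run_pipeline(rows, pred, buffer_size=2):
    b1 = queue.Queue(maxsize=buffer_size)  # buffer between scan and select
    b2 = queue.Queue(maxsize=buffer_size)  # buffer between select and consumer
    threads = [
        threading.Thread(target=scan, args=(rows, b1)),
        threading.Thread(target=select, args=(pred, b1, b2)),
    ]
    for t in threads:
        t.start()
    result = []
    while True:  # the consumer removes tuples, freeing buffer space
        row = b2.get()
        if row is END:
            break
        result.append(row)
    for t in threads:
        t.join()
    return result
```

A small buffer size forces the producers to wait for their parent to remove tuples, which is the behavior the text describes.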
© The McGraw−Hill
Companies, 2001


pulling data up an operation tree from the top
...

Each operation in a demand-driven pipeline can be implemented as an iterator,
which provides the following functions: open(), next(), and close()
...
The implementation of the operation in turn calls open() and next() on its inputs, to get its input
tuples when required
...
The iterator maintains the state of its execution in between calls, so that
successive next() requests receive successive result tuples
...
When the next() function is called, the ﬁle scan continues from after the previous point; when the next tuple satisfying the selection is
found by scanning the ﬁle, the tuple is returned after storing the point where it was
found in the iterator state
...
On calls to
next(), it would return the next pair of matching tuples
...

Details of the implementation of iterators are left for you to complete in Exercise 13
...
Demand-driven pipelining is used more commonly than producer-driven
pipelining, because it is easier to implement
...
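
A demand-driven pipeline built from iterators might look like the following Python sketch. It assumes relations are in-memory lists of dictionaries and that `next()` returns `None` when no tuples remain; the class names `FileScanSelect` and `Project` and the helper `run` are illustrative, not part of the text.

```python
class FileScanSelect:
    """Selection evaluated by a file scan, exposed as an iterator."""

    def __init__(self, records, predicate):
        self.records = records
        self.predicate = predicate
        self.pos = None  # iterator state: current scan position

    def open(self):
        self.pos = 0  # start the scan at the beginning of the file

    def next(self):
        # Continue from just after the previously returned tuple.
        while self.pos < len(self.records):
            rec = self.records[self.pos]
            self.pos += 1
            if self.predicate(rec):
                return rec
        return None  # no more tuples

    def close(self):
        self.pos = None


class Project:
    """Projection that pulls tuples from its pipelined input on demand."""

    def __init__(self, child, attrs):
        self.child = child
        self.attrs = attrs

    def open(self):
        self.child.open()

    def next(self):
        rec = self.child.next()
        if rec is None:
            return None
        return {a: rec[a] for a in self.attrs}

    def close(self):
        self.child.close()


def run(plan):
    # The top of the tree requests tuples one at a time.
    plan.open()
    out = []
    while (t := plan.next()) is not None:
        out.append(t)
    plan.close()
    return out
```

Each call to `next()` on the projection triggers a call on its input, so tuples are pulled up the tree from the top.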
7
...
2 Evaluation Algorithms for Pipelining
Consider a join operation whose left-hand-side input is pipelined
...
This
unavailability limits the choice of join algorithm to be used
...
However, indexed nested-loop join can be used: As tuples are received for the
left-hand side of the join, they can be used to index the right-hand-side relation, and
to generate tuples in the join result
...

The restrictions on the evaluation algorithms that are eligible for use are a limiting
factor for pipelining
...
Suppose that the join of
r and s is required, and input r is pipelined
...
The cost of this technique is nr ∗ HTi , where HTi is the height of the
index on s
...
With a join
technique such as hash join, it may be possible to perform the join with a cost of
about 3(br + bs )
...
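
The two cost expressions above can be compared numerically. The sketch below simply evaluates nr ∗ HTi and 3(br + bs); the input values are hypothetical and chosen only to show how far apart the two costs can be.

```python
def indexed_nested_loop_cost(n_r, ht_i):
    # One index lookup of height ht_i for each of the n_r pipelined tuples.
    return n_r * ht_i

def hash_join_cost(b_r, b_s):
    # Approximate block accesses for a hash join on materialized inputs.
    return 3 * (b_r + b_s)

# Hypothetical statistics: 10,000 tuples in r, index height 4,
# r occupying 400 blocks and s occupying 100 blocks.
pipelined = indexed_nested_loop_cost(10_000, 4)
materialized = hash_join_cost(400, 100)
```

With these made-up numbers, `pipelined` is 40000 and `materialized` is 1500, so forcing a pipelined plan is not automatically cheaper.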

Silberschatz−Korth−Sudarshan:
Database System
Concepts, Fourth Edition

done_r := false;
done_s := false;
r := ∅;
s := ∅;
result := ∅;
while not done_r or not done_s do
    begin
        if queue is empty, then wait until queue is not empty;
        t := top entry in queue;
        if t = End_r then done_r := true
        else if t = End_s then done_s := true
        else if t is from input r
            then
                begin
                    r := r ∪ {t};
                    result := result ∪ ({t} ⋈ s);
                end
            else /* t is from input s */
                begin
                    s := s ∪ {t};
                    result := result ∪ (r ⋈ {t});
                end
    end
Figure 13
...
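
The pipelined join in the figure can be tried out with the Python sketch below. It assumes all queue entries are already available, so it does not block on an empty queue as the figure does, and it joins with a caller-supplied `matches` predicate. The tagging of entries as `("r", tuple)` or `("s", tuple)` and the marker strings are illustrative choices.

```python
from collections import deque

END_R, END_S = "End_r", "End_s"  # end-of-input markers

def pipelined_join(entries, matches):
    """entries: ("r", tup) / ("s", tup) pairs and the two end markers, in arrival order."""
    q = deque(entries)
    r, s, result = [], [], []
    done_r = done_s = False
    while not (done_r and done_s):
        t = q.popleft()  # sketch: assumes the queue is never empty here
        if t == END_R:
            done_r = True
        elif t == END_S:
            done_s = True
        else:
            source, tup = t
            if source == "r":
                r.append(tup)
                # join the new r tuple with every s tuple seen so far
                result.extend((tup, u) for u in s if matches(tup, u))
            else:
                s.append(tup)
                # join the new s tuple with every r tuple seen so far
                result.extend((u, tup) for u in r if matches(u, tup))
    return result
```

Every matching pair is produced exactly once, at the moment the later of its two tuples arrives.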

The effective use of pipelining requires the use of evaluation algorithms that can
generate output tuples even as tuples are received for the inputs to the operation
...
Only one of the inputs to a join is pipelined
...
Both inputs to the join are pipelined
...
If the pipelined input tuples are sorted on the join attributes, and the join
condition is an equi-join, merge join can also be used
...
However, tuples that are not in the
ﬁrst partition will be output only after the entire pipelined input relation is received
...

If both inputs are pipelined, the choice of join algorithms is more restricted
...
Another alternative is the pipelined join technique, shown in Figure
13
...
The algorithm assumes that the input tuples for both input relations, r and s,
are pipelined
...
Special queue entries, called End_r and End_s, which serve as end-of-file

13
...
For efﬁcient evaluation, appropriate indices should be built on the
relations r and s
...

13
...
In the process of generating the internal form
of the query, the parser checks the syntax of the user’s query, veriﬁes that the
relation names appearing in the query are names of relations in the database,
and so on
...

• Given a query, there are generally a variety of methods for computing the
answer
...
Chapter 14 covers query optimization
...
We can handle complex
selections by computing unions and intersections of the results of simple selections
...

• Queries involving a natural join may be processed in several ways, depending
on the availability of indices and the form of physical storage for the relations
...

If indices are available, the indexed nested-loop join can be used
...
It may be advantageous to sort a relation prior to join computation (so as to allow use of
the merge join strategy)
...
The partitioning is
carried out with a hash function on the join attributes, so that corresponding pairs of partitions can be joined independently
...

• Outer join operations can be implemented by simple extensions of join algorithms
...
13
...

• An expression can be evaluated by means of materialization, where the system computes the result of each subexpression and stores it on disk, and then
uses it to compute the result of the parent expression
...

Review Terms
• Query processing
• Evaluation primitive
• Query-execution plan
• Query-evaluation plan
• Query-execution engine
• Measures of query cost
• Sequential I/O
• Random I/O
• File scan
• Linear search
• Binary search
• Selections using indices
• Access paths
• Index scans
• Conjunctive selection
• Disjunctive selection
• Composite index
• Intersection of identifiers
• External sorting
• External sort–merge
• Runs
• N-way merge
• Equi-join
• Nested-loop join
• Block nested-loop join
• Indexed nested-loop join
• Merge join
• Sort–merge join
• Hybrid merge–join
• Hash join
• Build
• Probe
• Build input
• Probe input
• Recursive partitioning
• Hash-table overflow
• Skew
• Fudge factor
• Overflow resolution
• Overflow avoidance
• Hybrid hash–join
• Operator tree
• Materialized evaluation
• Double buffering
• Pipelined evaluation
• Demand-driven pipeline (lazy, pulling)
• Producer-driven pipeline (eager, pushing)
• Iterator
• Pipelined join

Exercises
13
...

13
...
branch-name
from branch T, branch S
where T
...
assets and S
...

Justify your choice
...
3 What are the advantages and disadvantages of hash indices relative to B+-tree
indices? How might the type of index available influence the choice of a query-processing strategy?
13
...
Show the runs created on each pass of
the sort-merge algorithm, when applied to sort the following tuples on the ﬁrst
attribute: (kangaroo, 17), (wallaby, 21), (emu, 1), (wombat, 13), (platypus, 3),
(lion, 8), (warthog, 4), (zebra, 11), (meerkat, 6), (hyena, 9), (hornbill, 2), (baboon,
12)
...
5 Let relations r1 (A, B, C) and r2 (C, D, E) have the following properties: r1 has
20,000 tuples, r2 has 45,000 tuples, 25 tuples of r1 ﬁt on one block, and 30 tuples
of r2 ﬁt on one block
...

b
...

d
...
6 Design a variant of the hybrid merge–join algorithm for the case where both
relations are not physically sorted, but both have a sorted secondary index on
the join attributes
...
7 The indexed nested-loop join algorithm described in Section 13
...
3 can be inefﬁcient if the index is a secondary index, and there are multiple tuples with the
same value for the join attributes
...
Under what
conditions would this algorithm be more efficient than hybrid merge–join?
13
...
6 for r1 ⋈ r2, where r1 and r2 are as defined in Exercise 13
...

13
...
Assuming infinite memory, what is the lowest cost way (in terms of I/O
operations) to compute r ⋈ s? What is the amount of memory required for this
algorithm?
13
...
List different ways to handle the following
selections that involve negation:
a
...
σ¬(branch-city = “Brooklyn”) (branch)
c
...
11 The hash join algorithm as described in Section 13
...
5 computes the natural join
of two relations
...
(Hint: Keep extra information with each tuple in the hash index, to detect
whether any tuple in the probe relation matches the tuple in the hash index
...

13
...
Use the standard iterator functions in
your pseudocode
...

13
...

Bibliographical Notes
A query processor must parse statements in the query language, and must translate
them into an internal form
...
Most compiler texts, such as Aho et al
...

Knuth [1973] presents an excellent description of external sorting algorithms,
including an optimization that can create initial runs that are (on the average) twice
the size of memory
...
These studies, which were related to the development of System R, determined that either the
nested-loop join or merge join nearly always provided the optimal join method (Blasgen and Eswaran [1976]); hence, these two were the only join algorithms implemented in System R
...
Today, hash joins are considered to be highly efﬁcient
...
Hash
join techniques are described in Kitsuregawa et al
...
Zeller and Gray [1990] and Davison
and Graefe [1994] describe hash join techniques that can adapt to the available memory, which is important in systems where multiple queries may be running at the
same time
...
[1998] describes the use of hash joins and hash teams, which
allow pipelining of hash-joins by using the same partitioning for all hash-joins in a
pipeline sequence, in the Microsoft SQL Server
...
An earlier survey of query-processing techniques appears in Jarke and Koch [1984]
...
[1984] and
Whang and Krishnamurthy [1990]
...

Chapter 14

Query Optimization

Query optimization is the process of selecting the most efﬁcient query-evaluation
plan from among the many strategies usually possible for processing a given query,
especially if the query is complex
...
Rather, we expect the system to construct a
query-evaluation plan that minimizes the cost of query evaluation
...

One aspect of optimization occurs at the relational-algebra level, where the system
attempts to ﬁnd an expression that is equivalent to the given expression, but more
efﬁcient to execute
...

The difference in cost (in terms of evaluation time) between a good strategy and a
bad strategy is often substantial, and may be several orders of magnitude
...

14
...
”
Πcustomer-name (σbranch-city = “Brooklyn” (branch ⋈ (account ⋈ depositor)))

This expression constructs a large intermediate relation, branch ⋈ account ⋈ depositor
...

[Figure: expression trees built from Π customer-name, σ branch-city=Brooklyn, branch, account, and depositor; panel (a) Initial expression tree]

Since we are concerned with only those tuples in the branch relation that pertain to
branches located in Brooklyn, we do not need to consider those tuples that do not
have branch-city = “Brooklyn”
...
Our query
is now represented by the relational-algebra expression
Πcustomer-name ((σbranch-city = “Brooklyn” (branch)) ⋈ (account ⋈ depositor))
which is equivalent to our original algebra expression, but which generates smaller
intermediate relations
...
1 depicts the initial and transformed expressions
...

To choose among different query-evaluation plans, the optimizer has to estimate
the cost of each evaluation plan
...
Instead, optimizers make
use of statistical information about the relations, such as relation sizes and index
depths, to make a good estimate of the cost of a plan
...

In Section 14
...
Using these statistics with the cost formulae in Chapter 13 allows
us to estimate the costs of individual operations
...
7
...
Generation of query-evaluation plans involves two steps: (1) generating expressions that are logically equivalent to the given expression and (2) annotating the resultant expressions in alternative ways to generate alternative query-evaluation plans
...

14
...
It does so by means of equivalence rules that specify how
to transform an expression into a logically equivalent one
...
3
...
In Section 14
...

We can choose one based on the estimated cost of the plans
...
Such optimization, called cost-based optimization, is described in Section 14
...
2
...
In Section 14
...

14
...
Given
an expression such as a ⋈ (b ⋈ c), to estimate the cost of joining a with (b ⋈ c), we
need to have estimates of statistics such as the size of b ⋈ c
...

One thing that will become clear later in this section is that the estimates are not
very accurate, since they are based on assumptions that may not hold exactly
...
However, real-world experience
has shown that even if estimates are not precise, the plans with the lowest estimated
costs usually have actual execution costs that are either the lowest actual execution
costs, or are close to the lowest actual execution costs
...
2
...

• br , the number of blocks containing tuples of relation r
...

• fr , the blocking factor of relation r — that is, the number of tuples of relation r
that ﬁt into one block
...
This value is the same as the size of ΠA (r)
...

The last statistic, V (A, r), can also be maintained for sets of attributes, if desired,
instead of just for individual attributes
...

If we assume that the tuples of relation r are stored together physically in a ﬁle,
the following equation holds:
br = ⌈nr / fr⌉
Statistics about indices, such as the heights of B+-tree indices and number of leaf
pages in the indices, are also maintained in the catalog
...
This update incurs a substantial amount of overhead
...
Instead, they update the statistics during periods of light system load
...
However, if not too many updates occur in the intervals between the updates of
the statistics, the statistics will be sufﬁciently accurate to provide a good estimation
of the relative costs of the different plans
...
Real-world optimizers often
maintain further statistical information to improve the accuracy of their cost estimates of evaluation plans
...
As an example of a histogram, the range of values for an attribute age of a relation person could be divided
into 0 – 9, 10 – 19,
...
With each range we
store a count of the number of person tuples whose age values lie in that range
...
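
A histogram of this kind can be sketched in a few lines of Python. The example below assumes equal-width ranges of 10 (0–9, 10–19, and so on) and, as an extra assumption not stated in the text, that values are spread evenly within a range when estimating an equality selection. The function names are illustrative.

```python
def build_histogram(values, width=10):
    # Map the lower bound of each range to the number of values in it.
    counts = {}
    for v in values:
        lo = (v // width) * width
        counts[lo] = counts.get(lo, 0) + 1
    return counts

def estimate_equal(hist, v, width=10):
    # Assumption: values are uniform within a range, so each of the
    # `width` possible values gets an equal share of the range's count.
    lo = (v // width) * width
    return hist.get(lo, 0) / width
```

For example, with ages `[3, 5, 12, 15, 17, 18, 25]` the histogram is `{0: 2, 10: 4, 20: 1}`, and the estimate for `age = 15` uses only the count stored for the range 10–19.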

14
...
2 Selection Size Estimation
The size estimate of the result of a selection operation depends on the selection predicate
...

• σA = a (r): If we assume uniform distribution of values (that is, each value appears with equal probability), the selection result can be estimated to have
nr /V (A, r) tuples, assuming that the value a appears in attribute A of some
record of r
...
However,
it is often not realistic to assume that each value appears with equal probability
...
There is one tuple in the account relation for
each account
...
Therefore, certain branch-name values appear
with greater probability than do others
...
distribution assumption is often not correct, it is a reasonable approximation
of reality in many cases, and it helps us to keep our presentation relatively
simple
...
If the actual value used
in the comparison (v) is available at the time of cost estimation, a more accurate estimate can be made
...
Assuming that values
are uniformly distributed, we can estimate the number of records that will
satisfy the condition A ≤ v as 0 if v < min(A, r), as nr if v ≥ max(A, r), and
nr · (v − min(A, r)) / (max(A, r) − min(A, r))
otherwise
...
In such cases, we
will assume that approximately one-half the records will satisfy the comparison condition
...

• Complex selections:
Conjunction: A conjunctive selection is a selection of the form
σθ1 ∧θ2 ∧···∧θn (r)
We can estimate the result size of such a selection: For each θi , we estimate the size of the selection σθi (r), denoted by si , as described previously
...

The preceding probability is called the selectivity of the selection σθi (r)
...
Thus, we estimate the number of tuples in the full selection
as
nr ∗ (s1 ∗ s2 ∗ · · · ∗ sn) / nr^n
Disjunction: A disjunctive selection is a selection of the form
σθ1 ∨θ2 ∨···∨θn (r)
A disjunctive condition is satisﬁed by the union of all records satisfying
the individual, simple conditions θi
...
The probability that the tuple will satisfy the disjunction is then 1
minus the probability that it will satisfy none of the conditions:
1 − (1 − s1/nr) ∗ (1 − s2/nr) ∗ · · · ∗ (1 − sn/nr)
Multiplying this value by nr gives us the estimated number of tuples that
satisfy the selection
...
14
...
We already know how to estimate
the number of tuples in σθ (r)
...

We can account for nulls by estimating the number of tuples for which
the condition θ would evaluate to unknown, and subtracting that number
from the above estimate ignoring nulls
...
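
The selection-size estimates above translate directly into code. The Python sketch below covers the equality, range (A ≤ v), conjunction, and disjunction cases under the same uniformity and independence assumptions as the text; the function names are hypothetical, and `sizes` holds the individual estimates s1, ..., sn.

```python
from math import prod

def estimate_eq(n_r, v_a_r):
    # sigma_{A = a}(r): n_r / V(A, r) under uniform distribution
    return n_r / v_a_r

def estimate_le(n_r, v, min_a, max_a):
    # sigma_{A <= v}(r)
    if v < min_a:
        return 0
    if v >= max_a:
        return n_r
    return n_r * (v - min_a) / (max_a - min_a)

def estimate_conjunction(n_r, sizes):
    # n_r * (s1 * s2 * ... * sn) / n_r^n
    return n_r * prod(s / n_r for s in sizes)

def estimate_disjunction(n_r, sizes):
    # n_r * (1 - (1 - s1/n_r) * ... * (1 - sn/n_r))
    return n_r * (1 - prod(1 - s / n_r for s in sizes))
```

Each `s / n_r` term is the selectivity of one simple condition, so the conjunction multiplies selectivities while the disjunction multiplies the probabilities of failing each condition.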

14
...
3 Join Size Estimation
In this section, we see how to estimate the size of the result of a join
...
Each tuple of r × s occupies
lr + ls bytes, from which we can calculate the size of the Cartesian product
...
Let r(R) and s(S) be relations
...

• If R ∩ S is a key for R, then we know that a tuple of s will join with at most
one tuple from r
...
The case where R ∩ S is a key for S is symmetric
to the case just described
...

• The most difﬁcult case is when R ∩ S is a key for neither R nor S
...
Consider a tuple t of r, and assume R ∩ S = {A}
...
Considering all the tuples in r, we estimate
that there are
nr ∗ ns / V (A, s)
tuples in r ⋈ s
...
These two estimates differ if V (A, r) ≠ V (A, s)
...

Thus, the lower of the two estimates is probably the more accurate one
...
attribute A in s
...
More important, the preceding estimate depends on the assumption that each value appears with equal probability
...

We can estimate the size of a theta join r ⋈θ s by rewriting the join as σθ (r × s),
and using the size estimates for Cartesian products along with the size estimates for
selections, which we saw in Section 14
...
2
...

• fcustomer = 25, which implies that bcustomer = 10000/25 = 400
...

• fdepositor = 50, which implies that bdepositor = 5000/50 = 100
...

Also assume that customer-name in depositor is a foreign key on customer
...

Let us now compute the size estimates for depositor ⋈ customer without using information about foreign keys
...
In this case, the lower
of these estimates is the same as that which we computed earlier from information
about foreign keys
...
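
The join-size rule for the case where the common attribute is a key of neither relation can be sketched as below: compute both estimates and keep the lower one. The tuple counts 5000 (depositor) and 10000 (customer) follow from the block calculations above; the distinct-value counts are assumptions made for this illustration (2500 distinct customer-name values in depositor, and 10000 in customer, as if customer-name were a key of customer).

```python
def join_size_estimate(n_r, n_s, v_a_r, v_a_s):
    # Estimate obtained by considering each tuple of r: n_r * n_s / V(A, s)
    from_r = n_r * n_s / v_a_s
    # Estimate obtained by reversing the roles of r and s: n_r * n_s / V(A, r)
    from_s = n_r * n_s / v_a_r
    # The lower of the two is probably the more accurate one.
    return min(from_r, from_s)

# depositor joined with customer; the V(...) values are assumed, see above.
estimate = join_size_estimate(n_r=5000, n_s=10000, v_a_r=2500, v_a_s=10000)
```

Under these assumed statistics the two candidate estimates are 5000 and 20000, and the function returns the lower one.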
2
...

Projection: The estimated size (number of records or number of tuples) of a projection of the form ΠA (r) is V (A, r), since projection eliminates duplicates
...

Set operations: If the two inputs to a set operation are selections on the same relation, we can rewrite the set operation as disjunctions, conjunctions, or negations
...
Similarly, we
can rewrite intersections as conjunctions, and we can rewrite set difference by
using negation, so long as the two relations participating in the set operations
are selections on the same relation
...
2
...

If the inputs are not selections on the same relation, we estimate the sizes
this way: The estimated size of r ∪ s is the sum of the sizes of r and s
...
The estimated
size of r − s is the same size as r
...

Outer join: The estimated size of r ⟕ s is the size of r ⋈ s plus the size of r; that of
r ⟖ s is symmetric, while that of r ⟗ s is the size of r ⋈ s plus the sizes of
r and s
...

14
...
5 Estimation of Number of Distinct Values
For selections, the number of distinct values of an attribute (or set of attributes) A in
the result of a selection, V (A, σθ (r)), can be estimated in these ways:
• If the selection condition θ forces A to take on a speciﬁed value (e
...
, A = 3),
V (A, σθ (r)) = 1
...
g
...

• If the selection condition θ is of the form A op v, where op is a comparison
operator, V (A, σθ (r)) is estimated to be V (A, r) ∗ s, where s is the selectivity
of the selection
...
A more
accurate estimate can be derived for this case using probability theory, but the
above approximation works fairly well
...

• If A contains attributes A1 from r and A2 from s, then V (A, r ⋈ s) is estimated
as
min(V (A1, r) ∗ V (A2 − A1, s), V (A1 − A2, r) ∗ V (A2, s), nr⋈s)

Note that some attributes may be in A1 as well as in A2, and A1 − A2 and
A2−A1 denote, respectively, attributes in A that are only from r and attributes

14
...
Again, more accurate estimates can be derived by
using probability theory, but the above approximations work fairly well
...
The same holds for grouping attributes of aggregation
...
For min(A) and max(A), the number of distinct values can be estimated as min(V (A, r), V (G, r)), where G denotes the grouping attributes
...

14
...
As mentioned at the start of this chapter, a
query can be expressed in several different ways, with different costs of evaluation
...

Two relational-algebra expressions are said to be equivalent if, on every legal database instance, the two expressions generate the same set of tuples
...
) Note that the order of the tuples is irrelevant; the two expressions may
generate the tuples in different orders, but would be considered equivalent as long
as the set of tuples is the same
...
Two expressions in the multiset
version of the relational algebra are said to be equivalent if on every legal database
the two expressions generate the same multiset of tuples
...
We leave extensions to the multiset version of
the relational algebra to you as exercises
...
3
...
We can replace an expression of the ﬁrst form by an expression of the second form, or vice
versa — that is we can replace an expression of the second form by an expression
of the ﬁrst form — since the two expressions would generate the same result on any
valid database
...

We now list a number of general equivalence rules on relational-algebra expressions
...
2
...
A relation name r is
simply a special case of a relational-algebra expression, and can be used wherever E
appears
...
[Figure: Pictorial representation of equivalences]
...
Conjunctive selection operations can be deconstructed into a sequence of individual selections
...

σθ1 ∧θ2 (E) = σθ1 (σθ2 (E))
2
...

σθ1 (σθ2 (E)) = σθ2 (σθ1 (E))
3
...
This transformation can also be referred to as a
cascade of Π
...
(ΠLn (E))
...
Selections can be combined with Cartesian products and theta joins
...
σθ (E1 × E2) = E1 ⋈θ E2
This expression is just the definition of the theta join
...
σθ1 (E1 ⋈θ2 E2) = E1 ⋈θ1∧θ2 E2
5
...

E1 ⋈θ E2 = E2 ⋈θ E1

Actually, the order of attributes differs between the left-hand side and right-hand side, so the equivalence does not hold if the order of attributes is taken
into account
...

14
...

6
...
Natural-join operations are associative
...
Theta joins are associative in the following manner:
(E1 ⋈θ1 E2) ⋈θ2∧θ3 E3 = E1 ⋈θ1∧θ3 (E2 ⋈θ2 E3)

where θ2 involves attributes from only E2 and E3
...
The commutativity and associativity of join operations
are important for join reordering in query optimization
...
The selection operation distributes over the theta-join operation under the following two conditions:
a
...

σθ0 (E1 ⋈θ E2) = (σθ0 (E1)) ⋈θ E2

b
...

σθ1∧θ2 (E1 ⋈θ E2) = (σθ1 (E1)) ⋈θ (σθ2 (E2))

8
...

a
...
Suppose that the
join condition θ involves only attributes in L1 ∪ L2
...
Consider a join E1 ⋈θ E2
...
Let L3 be attributes of E1 that are involved in join
condition θ, but are not in L1 ∪ L2 , and let L4 be attributes of E2 that are
involved in join condition θ, but are not in L1 ∪ L2
...
The set operations union and intersection are commutative
...

10
...

(E1 ∪ E2 ) ∪ E3 = E1 ∪ (E2 ∪ E3 )
(E1 ∩ E2 ) ∩ E3 = E1 ∩ (E2 ∩ E3 )

11
...

σP (E1 − E2 ) = σP (E1 ) − σP (E2 )
Similarly, the preceding equivalence, with − replaced with either ∪ or ∩, also
holds
...

12
...

ΠL (E1 ∪ E2 ) = (ΠL (E1 )) ∪ (ΠL (E2 ))
This is only a partial list of equivalences
...
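
An equivalence rule can be checked on a small database instance. The Python sketch below tests the rule that a selection distributes over a theta join when the selection condition involves only attributes of one input, that is, σθ0 (E1 ⋈θ E2) = (σθ0 (E1)) ⋈θ E2. Relations are modeled as lists of dictionaries, and the sample rows, attribute names, and helper functions are made up for illustration. Agreement on one instance does not prove the rule; it only illustrates it.

```python
def select(pred, rel):
    return [t for t in rel if pred(t)]

def theta_join(r, s, theta):
    # Concatenate every pair of tuples that satisfies the join condition.
    return [{**a, **b} for a in r for b in s if theta(a, b)]

def as_set(rel):
    # Compare results as sets of tuples, ignoring order.
    return {frozenset(t.items()) for t in rel}

branch = [
    {"branch_name": "Downtown", "branch_city": "Brooklyn"},
    {"branch_name": "Perryridge", "branch_city": "Horseneck"},
]
account = [
    {"account_number": "A-101", "acc_branch": "Downtown", "balance": 500},
    {"account_number": "A-102", "acc_branch": "Perryridge", "balance": 400},
    {"account_number": "A-201", "acc_branch": "Downtown", "balance": 900},
]

theta = lambda b, a: b["branch_name"] == a["acc_branch"]
in_brooklyn = lambda t: t["branch_city"] == "Brooklyn"

# Select after the join, versus pushing the selection below the join.
lhs = select(in_brooklyn, theta_join(branch, account, theta))
rhs = theta_join(select(in_brooklyn, branch), account, theta)
```

Both sides contain the same tuples, but the right-hand side joins a smaller input, which is why an optimizer prefers it.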

14
...
2 Examples of Transformations
We now illustrate the use of the equivalence rules
...

In our example in Section 14
...
We can carry out this transformation by using rule 7
...
Remember
that the rule merely says that the two expressions are equivalent; it does not say that
one is better than the other
...
As an illustration, suppose that we modify our original query to restrict
attention to customers who have a balance over $1000
...
However, we can ﬁrst

14
...
a (associativity of natural join) to transform the join branch ⋈ (account ⋈
depositor) into (branch ⋈ account) ⋈ depositor:
Πcustomer-name (σbranch-city = “Brooklyn” ∧ balance > 1000
((branch ⋈ account) ⋈ depositor))

Then, using rule 7
...
Using rule 1, we
can break the selection into two selections, to get the following subexpression:
σbranch-city = “Brooklyn” (σbalance > 1000 (branch ⋈ account))
Both of the preceding expressions select tuples with branch-city = “Brooklyn” and
balance > 1000
...
3 depicts the initial expression and the ﬁnal expression after all these
transformations
...
b to get the ﬁnal expression
directly, without using rule 1 to break the selection into two selections
...
b
can itself be derived from rules 1 and 7
...
The preceding example illustrates that the set of equivalence rules in Section 14
...
1 is not minimal
...

Query optimizers therefore use minimal sets of equivalence rules
...
[Figure: Multiple transformations; panel (b) Tree after multiple transformations]
...
14
...
a and 8
...
The only attributes that we must retain are those
that either appear in the result of the query or are needed to process subsequent
operations
...
Thus, we reduce the size of the intermediate result
...
Therefore, we can modify the expression to
Πcustomer-name (
(Πaccount-number ((σbranch-city = “Brooklyn” (branch)) ⋈ account)) ⋈ depositor)
The projection Πaccount-number reduces the size of the intermediate join results
...
3
...
As mentioned in Chapter 3 and in equivalence rule 6
...
Thus, for all relations r1 , r2 , and r3 ,
(r1 ⋈ r2) ⋈ r3 = r1 ⋈ (r2 ⋈ r3)

Although these expressions are equivalent, the costs of computing them may differ
...
In contrast,
σbranch-city = “Brooklyn” (branch) ⋈ account
is probably a small relation
...
Thus, the preceding expression results in one tuple for each account held at a branch located in Brooklyn
...

14
...
We do not care about
the order in which attributes appear in a join, since it is easy to change the order
before displaying the result
...

Using the associativity and commutativity of the natural join (rules 5 and 6), we
can consider rewriting our relational-algebra expression as
Πcustomer-name (((σbranch-city = “Brooklyn” (branch)) ⋈ depositor) ⋈ account)
That is, we could compute
(σbranch-city = “Brooklyn” (branch)) ⋈ depositor
ﬁrst, and, after that, join the result with account
...
If there are b branches in Brooklyn and d tuples in the depositor
relation, this Cartesian product generates b ∗ d tuples, one for every possible pair of
depositor tuple and branches (without regard for whether the account in depositor is
maintained at the branch)
...
As a result, we would reject this strategy
...

14
...
4 Enumeration of Equivalent Expressions
Query optimizers use equivalence rules to systematically generate expressions equivalent to the given query expression
...

Given an expression, if any subexpression matches one side of an equivalence rule,
the optimizer generates a new expression where the subexpression is transformed to
match the other side of the rule
...

The preceding process is costly both in space and in time
...
Expression-representation techniques that allow
both expressions to point to shared subexpressions can reduce the space requirement
signiﬁcantly, and many query optimizers use them
...
If an optimizer takes cost estimates of evaluation
into account, it may be able to avoid examining some of the expressions, as we shall
see in Section 14
...
We can reduce the time required for optimization by using techniques such as these
...
14
...
4 Choice of Evaluation Plans
Generation of expressions is only part of the query-optimization process, since each
operation in the expression can be implemented with different algorithms
...
Figure 14
...
3
...
Further, decisions about pipelining
have to be made
...
They would do so if the
indices on branch and account store records with equal values for the index attributes
sorted by branch-name
...
4
...
We can choose any ordering
of the operations that ensures that operations lower in the tree are executed before
operations higher in the tree
...
Although a merge join at a given level may be costlier than
a hash join, it may provide a sorted output that makes evaluating a later operation
(such as duplicate elimination, intersection, or another merge join) cheaper
...
[Figure: An evaluation plan]

performing the join
...

Thus, in addition to considering alternative expressions for a query, we must also
consider alternative algorithms for each operation in an expression
...
We can use
these rules to generate all the query-evaluation plans for a given expression
...
2 coupled with cost estimates for various algorithms
and evaluation methods described in Chapter 13
...
There are two broad approaches: The
ﬁrst searches all the plans, and chooses the best plan in a cost-based fashion
...
We discuss these approaches next
...

14
...
2 Cost-Based Optimization
A cost-based optimizer generates a range of query-evaluation plans from the given
query by using the equivalence rules, and chooses the one with the least cost
...
As an illustration, consider the expression
r1 ⋈ r2 ⋈ · · · ⋈ rn

where the joins are expressed without any ordering
...
(We
leave the computation of this expression for you to do in Exercise 14
...
) For joins
involving small numbers of relations, this number is acceptable; for example, with
n = 5, the number is 1680
...
With
n = 7, the number is 665280; with n = 10, the number is greater than 17
...
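
The counts quoted above can be reproduced with a short calculation. The closed form is not visible in this passage; the Python sketch below assumes the standard count of complete join orders for n relations, (2(n − 1))! / (n − 1)!, which agrees with the figures given for n = 5 and n = 7. The function name is illustrative.

```python
from math import factorial

def num_join_orders(n):
    # Assumed closed form: (2(n - 1))! / (n - 1)!
    return factorial(2 * (n - 1)) // factorial(n - 1)

counts = {n: num_join_orders(n) for n in (5, 7, 10)}
```

The rapid growth of these values is what makes exhaustive enumeration impractical beyond a handful of relations.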
For example, suppose we want to ﬁnd the best join order of the form
(r1 ⋈ r2 ⋈ r3) ⋈ r4 ⋈ r5

which represents all join orders where r1 , r2 , and r3 are joined ﬁrst (in some order),
and the result is joined (in some order) with r4 and r5
...
Thus, there appear to be 144 join orders to examine
...
Thus, instead of 144 choices to examine, we need to examine only
12 + 12 choices
...
cost = ∞)
return bestplan[S]
// else bestplan[S] has not been computed earlier, compute it now
for each non-empty subset S1 of S such that S1 ≠ S
P1 = findbestplan(S1)
P2 = findbestplan(S − S1)
A = best algorithm for joining results of P1 and P2
cost = P1
...
cost + cost of A
if cost < bestplan[S]
...
cost = cost
bestplan[S]
...
plan; execute P 2
...
5

Dynamic programming algorithm for join order optimization
...
Dynamic programming algorithms store results of computations
and reuse them, a procedure that can reduce execution time greatly
...
5
...
Each element of the associative array contains two components: the cost of the best plan of S, and the plan itself
...
cost is assumed to be initialized to ∞ if bestplan[S] has not yet
been computed
...
Otherwise, the procedure tries
every way of dividing S into two disjoint subsets
...
The procedure picks the cheapest plan
from among all the alternatives for dividing S into two sets
...
The time
complexity of the procedure can be shown to be O(3^n) (see Exercise 14
...
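The procedure described above can be sketched in Python as a memoized recursion over subsets. This is a minimal illustration rather than the book's exact procedure. The cardinalities in CARD, the fixed 1/1000 join selectivity, and the toy join cost (the sum of the two input sizes) are all assumptions made only so that the example runs. Sort orders are ignored.

```python
from itertools import combinations
from math import inf

# Hypothetical cardinalities; every join is assumed to keep 1/1000
# of the cross product of its inputs.
CARD = {"r1": 1000, "r2": 1500, "r3": 750}


def estimated_size(rels):
    """Toy size estimate for the join of a set of relations."""
    size = 1
    for r in rels:
        size *= CARD[r]
    return size // 1000 ** (len(rels) - 1)


# Associative array: frozenset of relation names -> (cost, plan)
best_plan = {}


def find_best_plan(rels):
    rels = frozenset(rels)
    if rels in best_plan:            # computed earlier, reuse it
        return best_plan[rels]
    if len(rels) == 1:
        (name,) = rels
        best_plan[rels] = (0, name)  # toy assumption: base access is free
        return best_plan[rels]
    best = (inf, None)
    members = sorted(rels)
    # Try every way of dividing the set into two disjoint non-empty subsets.
    for k in range(1, len(members)):
        for left in combinations(members, k):
            s1 = frozenset(left)
            s2 = rels - s1
            cost1, plan1 = find_best_plan(s1)
            cost2, plan2 = find_best_plan(s2)
            # Toy join cost: read both inputs once.
            cost = cost1 + cost2 + estimated_size(s1) + estimated_size(s2)
            if cost < best[0]:
                best = (cost, (plan1, plan2))
    best_plan[rels] = best
    return best


if __name__ == "__main__":
    print(find_best_plan(CARD))
```

Under these assumed numbers the cheapest plan joins r1 and r3 first, because that pair yields the smallest intermediate result. A real optimizer would plug in its own cost estimates in place of the toy formula.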

Actually, the order in which tuples are generated by the join of a set of relations
is also important for ﬁnding the best overall join order, since it can affect the cost of
further joins (for instance, if merge join is used)
...
For
instance, generating the result of r1 ⋈ r2 ⋈ r3 sorted on the attributes common with
r4 or r5 may be useful, but generating it sorted on the attributes common to only r1
and r2 is not useful
...

Hence, it is not sufﬁcient to ﬁnd the best join order for each subset of the set of
n given relations
...

each interesting sort order of the join result for that subset
...
The number of interesting sort orders is generally not large
...
The dynamic-programming algorithm
for ﬁnding the best join order can be easily extended to handle sort orders
...

With n = 10, this number is around 59000, which is much better than the 17
...
More important, the storage required is much less than
before, since we need to store only one join order for each interesting sort order of
each of 1024 subsets of r1 ,
...
Although both numbers still increase rapidly with
n, commonly occurring joins usually have fewer than 10 relations, and can be handled
easily
...
For instance, when examining the plans for an expression, we
can terminate after we examine only a part of the expression, if we determine that
the cheapest plan for that part is already costlier than the cheapest evaluation plan
for a full expression examined earlier
...
Then, no full expression involving that
subexpression needs to be examined
...
Then, only a few competing plans will require a full
analysis of cost
...

14
...
3 Heuristic Optimization
A drawback of cost-based optimization is the cost of optimization itself
...
Hence, many systems use heuristics to reduce the number
of choices that must be made in a cost-based fashion
...

An example of a heuristic rule is the following rule for transforming relational-algebra queries:
• Perform selection operations as early as possible
...
In the ﬁrst transformation example in Section 14
...

We say that the preceding rule is a heuristic because it usually, but not always,
helps to reduce the cost
...

The selection can certainly be performed before the join
...
Performing the selection early — that is, directly on s — would require doing a
scan of all tuples in s
...

The projection operation, like the selection operation, reduces the size of relations
...
This advantage suggests a companion to the “perform selections early” heuristic:
• Perform projections early
...
An example similar to the one used for the selection heuristic
should convince you that this heuristic does not always reduce the cost
...
3
...
We now present an overview of the steps in a typical heuristic optimization algorithm
...
3
1
...
This step, based on equivalence rule 1, facilitates moving selection operations down the query tree
...
Move selection operations down the query tree for the earliest possible execution
...
a, 7
...

For instance, this step transforms σθ (r ⋈ s) into either σθ (r) ⋈ s or r ⋈ σθ (s)
whenever possible
...
The degree of reordering permitted for a particular selection is determined by the attributes
involved in that selection condition
...
Determine which selection operations and join operations will produce the
smallest relations — that is, will produce the relations with the least number
of tuples
...

This step considers the selectivity of a selection or join condition
...
This step relies on the associativity
of binary operations given in equivalence rule 6
...
Replace with join operations those Cartesian product operations that are followed by a selection condition (rule 4
...
The Cartesian product operation is
often expensive to implement since r1 × r2 includes a record for each combination of records from r1 and r2
...

Deconstruct and move as far down the tree as possible lists of projection attributes, creating new projections where needed
...
a, 8
...

6
...

In summary, the heuristics listed here reorder an initial query-tree representation
in such a way that the operations that reduce the size of intermediate results are applied ﬁrst; early selection reduces the number of tuples, and early projection reduces
the number of attributes
...
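The effect of applying a selection early can be seen on a toy example. The sketch below is only an illustration: the relations branch and account, their attribute names, and the helper functions natural_join and select are hypothetical. It checks that pushing the selection below the join gives the same result while the join sees a smaller input.

```python
def natural_join(r, s):
    """Natural join of two lists of dicts on their shared attribute names."""
    out = []
    for t in r:
        for u in s:
            shared = t.keys() & u.keys()
            if all(t[a] == u[a] for a in shared):
                out.append({**t, **u})
    return out


def select(pred, r):
    """Selection: keep the tuples of r that satisfy pred."""
    return [t for t in r if pred(t)]


# Hypothetical toy relations.
branch = [
    {"branch": "Downtown", "city": "Brooklyn"},
    {"branch": "Perryridge", "city": "Horseneck"},
]
account = [
    {"acct": "A-101", "branch": "Downtown"},
    {"acct": "A-102", "branch": "Perryridge"},
    {"acct": "A-201", "branch": "Perryridge"},
]


def in_brooklyn(t):
    return t["city"] == "Brooklyn"


# Selection applied after the join: the join produces 3 tuples first.
late = select(in_brooklyn, natural_join(branch, account))
# Selection pushed down: the join input from branch has only 1 tuple.
early = natural_join(select(in_brooklyn, branch), account)

if __name__ == "__main__":
    print(late == early, len(natural_join(branch, account)))
```

The pushdown is valid here because the selection condition mentions only attributes of branch. As the text notes, the rule is a heuristic, so a smaller intermediate result does not guarantee a cheaper plan in every case.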

Heuristic optimization further maps the heuristically transformed query expression into alternative sequences of operations to produce a set of candidate evaluation plans
...
The access-plan-selection phase of a heuristic optimizer chooses the most efficient strategy for each
operation
...
4
...
For
example, certain query optimizers, such as the System R optimizer, do not consider
all join orders, but rather restrict the search to particular kinds of join orders
...
, rn
...
Left-deep join orders are particularly convenient for pipelined evaluation,
since the right operand is a stored relation, and thus only one input to each join is
pipelined
...
6 illustrates the difference between left-deep join trees and non-left-deep
join trees
...
With the use of dynamic programming
optimizations, the System R optimizer can find the best join order in time O(n 2^n)
...

The System R optimizer uses heuristics to push selections and projections down the
query tree
...
The estimate is likely to be accurate with small buffers; with large buffers, however, the page containing the tuple
may already be in the buffer
...


[Figure: join trees over the relations r1, r2, r3, r4, and r5]
(a) Left-deep join tree
Figure 14
...

Query optimization approaches that integrate heuristic selection and the generation of alternative access plans have been adopted in several systems
...
The cost-based optimization techniques
described here are used for each block of the query separately
...
Each plan uses a left-deep join order,
starting with a different one of the n relations
...
Either nested-loop
or sort-merge join is chosen for each of the joins, depending on the available access
paths
...

The intricacies of SQL introduce a good deal of complexity into query optimizers
...

We brieﬂy outline how to handle nested subqueries in Section 14
...
5
...

Even with the use of heuristics, cost-based query optimization imposes a substantial overhead on query processing
...
The difference in execution time between a good
plan and a bad one may be huge, making query optimization essential
...
Therefore, most commercial systems include relatively sophisticated optimizers
...

14
...
4
...

The parameters are the variables from the outer-level query that are used in the nested
subquery (these variables are called correlation variables)
...

select customer-name
from borrower
where exists (select *
from depositor
where depositor
...
customer-name)
Conceptually, the subquery can be viewed as a function that takes a parameter (here,
borrower
...

SQL evaluates the overall query (conceptually) by computing the Cartesian product of the relations in the outer from clause and then testing the predicates in the
where clause for each tuple in the product
...

This technique for evaluating a query with a nested subquery is called correlated
evaluation
...
A large number of random
disk I/O operations may result
...
Efﬁcient join algorithms help avoid expensive random I/O
...

As an example of transforming a nested subquery into a join, the query in the
preceding example can be rewritten as
select customer-name
from borrower, depositor
where depositor
...
customer-name
(To properly reﬂect SQL semantics, the number of duplicate derivations should not
change because of the rewriting; the rewritten query can be modiﬁed to ensure this
property, as we will see shortly
...
In general, it may not be
possible to directly move the nested subquery relations into the from clause of the
outer query
...
For instance, a query of the
form

select
...

from L1, t1
where P1 and P2^2
where P2^1 contains predicates in P2 without selections involving correlation variables,
and P2^2 reintroduces the selections involving correlation variables (with relations referenced in the predicate appropriately renamed)
...

In our example, the original query would have been transformed to

create table t1 as
select distinct customer-name
from depositor
select customer-name
from borrower, t1
where t1
...
customer-name
The query we rewrote to illustrate creation of a temporary relation can be obtained
by simplifying the above transformed query, assuming the number of duplicates of
each tuple does not matter
...
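The difference between correlated evaluation and the decorrelated form can be sketched in a few lines of Python. This is only an illustration. Each relation is modeled as a list of customer names, the data is made up, and the correlation predicate is assumed to be equality on customer-name, since the predicate is elided in this excerpt. The function names are hypothetical.

```python
def correlated(borrower, depositor):
    """Correlated evaluation: rerun the subquery for every outer tuple."""
    result = []
    for b in borrower:
        # exists (select * from depositor where the names match)
        if any(d == b for d in depositor):
            result.append(b)
    return result


def decorrelated(borrower, depositor):
    """Temporary relation t1 of distinct names, then a join with borrower."""
    t1 = set(depositor)  # select distinct customer-name from depositor
    return [b for b in borrower if b in t1]


def plain_join(borrower, depositor):
    """Join without duplicate elimination on depositor."""
    return [b for b in borrower for d in depositor if d == b]


# Hypothetical data containing duplicates on both sides.
borrower = ["Hayes", "Jones", "Jones", "Smith"]
depositor = ["Jones", "Jones", "Lindsay", "Smith"]

if __name__ == "__main__":
    print(correlated(borrower, depositor))
    print(decorrelated(borrower, depositor))
    print(plain_join(borrower, depositor))
```

The first two functions return each qualifying borrower tuple exactly once per occurrence in borrower. The plain join repeats a name once for every matching depositor tuple, which is why the distinct step matters when the number of duplicates must be preserved.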

Decorrelation is more complicated when the nested subquery uses aggregation,
or when the result of the nested subquery is used to test for equality, or when the
condition linking the nested subquery to the outer query is not exists, and so on
...

Optimization of complex nested subqueries is a difﬁcult task, as you can infer from
the above discussion, and many optimizers do only a limited amount of decorrelation
...

14
...
5 Materialized Views∗∗
When a view is deﬁned, normally the database stores only the query deﬁning the
view
...
Materialized views constitute redundant data, in that their contents can be
inferred from the view deﬁnition and the rest of the database contents
...

Materialized views are important for improving performance in some applications
...
Computing the view requires reading every loan tuple
pertaining to the branch, and summing up the loan amounts, which can be time-consuming
...

14
...
1 View Maintenance
A problem with materialized views is that they must be kept up-to-date when the
data used in the view deﬁnition changes
...
The task of keeping a materialized view up-to-date with
the underlying data is known as view maintenance
...

Another option for maintaining materialized views is to deﬁne triggers on insert,
delete, and update of each relation in the view deﬁnition
...
A simplistic way of doing so is to completely recompute the materialized view on every update
...
We describe how to perform incremental view maintenance in Section 14
...
2
...
Database system programmers no longer need to deﬁne triggers for view
maintenance
...

14
...
2 Incremental View Maintenance
To understand how to incrementally maintain materialized views, we start off by
considering individual operations, and then see how to handle a complete expression
...
To simplify our description, we replace updates to
a tuple by deletion of the tuple followed by insertion of the updated tuple
...
The changes (inserts and deletes) to a
relation or expression are referred to as its differential
...
5
...
1 Join Operation
Consider the materialized view v = r ⋈ s
...
If the old value of r is denoted by r^old, and the new value of r
by r^new, r^new = r^old ∪ ir
...
We can rewrite r^new ⋈ s as (r^old ∪ ir) ⋈ s,
which we can again rewrite as (r^old ⋈ s) ∪ (ir ⋈ s)
...
Inserts to s are handled in an exactly
symmetric fashion
...
Using the
same reasoning as above, we get
Deletes on s are handled in an exactly symmetric fashion
...
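The join case can be sketched with multisets. The code below is an illustration under stated assumptions: r holds (a, b) tuples, s holds (b, c) tuples, both are Python Counter objects so that duplicates are tracked, and the function names are hypothetical. The insert rule follows the rewriting above. The delete rule, which subtracts the join of the deleted tuples with s, is assumed by the same reasoning because its formula is elided in this excerpt.

```python
from collections import Counter


def join(r, s):
    """Multiset join of (a, b) tuples in r with (b, c) tuples in s on b."""
    out = Counter()
    for (a, b), m in r.items():
        for (b2, c), n in s.items():
            if b == b2:
                out[(a, b, c)] += m * n
    return out


def apply_inserts_to_r(view, inserted, s):
    """v_new = v_old ∪ (i_r ⋈ s): only the inserted tuples are joined."""
    view.update(join(inserted, s))


def apply_deletes_from_r(view, deleted, s):
    """v_new = v_old − (d_r ⋈ s), dropping tuples whose count reaches zero."""
    view.subtract(join(deleted, s))
    for t in [t for t, n in view.items() if n <= 0]:
        del view[t]


if __name__ == "__main__":
    r = Counter({(1, "x"): 1})
    s = Counter({("x", 10): 1, ("y", 20): 1})
    v = join(r, s)
    apply_inserts_to_r(v, Counter({(2, "y"): 1}), s)
    print(sorted(v.items()))
```

The point of the technique is that neither maintenance function reads the old contents of r, so the work is proportional to the size of the change and of s rather than to the size of the whole join.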
5
...
2 Selection and Projection Operations
Consider a view v = σθ (r)
...
Consider a materialized
view v = ΠA (r)
...
Then, ΠA (r) has a single tuple (a)
...
The reason is
that the same tuple (a) is derived in two ways, and deleting one tuple from r removes
only one of the ways of deriving (a); the other is still present
...


Let t
...
We ﬁnd (t
...
If the count becomes 0, (t
...

Handling insertions is relatively straightforward
...
If (t
...
If not, we add (t
...
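The counting scheme for projections can be sketched as follows. This is a minimal illustration, not the book's code. The class name ProjectionView and the sample tuples are hypothetical, and delete assumes that the tuple being deleted really was present in r.

```python
from collections import Counter


class ProjectionView:
    """Materialized projection that keeps a derivation count per tuple."""

    def __init__(self, rows, attrs):
        self.attrs = attrs
        self.counts = Counter(self._project(t) for t in rows)

    def _project(self, t):
        return tuple(t[a] for a in self.attrs)

    def insert(self, t):
        # Adds the projected tuple with count 1, or bumps its count.
        self.counts[self._project(t)] += 1

    def delete(self, t):
        key = self._project(t)
        self.counts[key] -= 1
        if self.counts[key] == 0:
            # Last derivation is gone, so the tuple leaves the view.
            del self.counts[key]

    def tuples(self):
        return set(self.counts)


if __name__ == "__main__":
    # Two tuples of r that project onto the same value of A.
    view = ProjectionView([{"A": "a", "B": 2}, {"A": "a", "B": 3}], ["A"])
    view.delete({"A": "a", "B": 2})
    print(view.tuples())  # ("a",) is still derivable from the other tuple
```

Keeping the count is what lets a deletion be handled without rescanning r: the projected tuple is removed only when no derivation of it remains.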

14
...
2
...
The aggregate operations in SQL are count, sum, avg, min, and max:
• count: Consider a materialized view v = A Gcount(B) (r), which computes the
count of the attribute B, after grouping r by attribute A
...
We look for the group t
...
If it is not present,
we add (t
...
If the group t
...

When a set of tuples dr is deleted from r, for each tuple t in dr we do the
following
...
A in the materialized view, and subtract 1
from the count for the group
...
A from the materialized view
...

When a set of tuples ir is inserted into r, for each tuple t in ir we do the following
...
A in the materialized view
...
A, t
...
A, t
...
If the group t
...
B to the aggregate value for the group, and add
1 to the count of the group
...
We look for the group t
...
B from the aggregate value for the group
...
A from the materialized view
...

• avg: Consider a materialized view v = A Gavg(B) (r)
...

Instead, to handle the case of avg, we maintain the sum and count aggregate values as described earlier, and compute the average as the sum divided
by the count
...
(The case of max is
exactly equivalent
...
Maintaining the aggregate values min and max on deletions may be more expensive
...
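The maintenance rules for count, sum, and avg can be combined in one small sketch. It is an illustration only: the class name GroupAggregates and the sample values are hypothetical, and delete assumes that the deleted tuple was present. min and max are left out, since a deletion of the current minimum or maximum may require looking at the remaining tuples of the group.

```python
class GroupAggregates:
    """Per-group count and sum of B, grouped by A; avg is derived on demand."""

    def __init__(self):
        self.groups = {}  # value of A -> [count, sum of B]

    def insert(self, a, b):
        entry = self.groups.setdefault(a, [0, 0])
        entry[0] += 1
        entry[1] += b

    def delete(self, a, b):
        entry = self.groups[a]
        entry[0] -= 1
        entry[1] -= b
        if entry[0] == 0:
            # No tuples left in the group, so the group leaves the view.
            del self.groups[a]

    def avg(self, a):
        count, total = self.groups[a]
        return total / count


if __name__ == "__main__":
    g = GroupAggregates()
    g.insert("Downtown", 500)
    g.insert("Downtown", 700)
    print(g.avg("Downtown"))
```

Storing the count alongside the sum serves two purposes: it makes the average computable from maintained values, and it tells the view when a group has become empty, which the sum alone cannot do because a sum of zero does not imply an empty group.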

14
...
2
...
Given materialized view v =
r ∩ s, when a tuple is inserted in r we check if it is present in s, and if so we add
it to v
...

The other set operations, union and set difference, are handled in a similar fashion; we
leave details to you
...

In the case of deletion from r we have to handle tuples in s that no longer match any
tuple in r
...
Again we leave details to you
...
5
...
5 Handling Expressions
So far we have seen how to update incrementally the result of a single operation
...

For example, suppose we wish to incrementally update a materialized view E1 ⋈ E2
when a set of tuples ir is inserted into relation r
...
Suppose the set of tuples to be inserted into E1 is given by expression D1
...

See the bibliographical notes for further details on incremental view maintenance
with expressions
...
5
...
However, materialized views offer further opportunities for optimization:
• Rewriting queries to use materialized views:
Suppose a materialized view v = r ⋈ s is available, and a user submits a
query r ⋈ s ⋈ t
...
Thus, it is the job of the

...

• Replacing a use of a materialized view by the view deﬁnition:
Suppose a materialized view v = r ⋈ s is available, but without any index
on it, and a user submits a query σA=10 (v)
...
The best plan
for this query may be to replace v by r ⋈ s, which can lead to the query plan
σA=10 (r) ⋈ s; the selection and join can be performed efficiently by using
the indices on r
...
B, respectively
...

The bibliographical notes give pointers to research showing how to efﬁciently perform query optimization with materialized views
...
One simple criterion would be to select a set
of materialized views that minimizes the overall execution time of the workload of
queries and updates, including the time taken to maintain the materialized views
...

Indices are just like materialized views, in that they too are derived data, can speed
up queries, and may slow down updates
...

We examine these issues in more detail in Sections 21
...
5 and 21
...
6
...
5, and the RedBrick Data
Warehouse from Informix, provide tools to help the database administrator with index and materialized view selection
...

14
...
It is the responsibility of the system to transform the query as entered
by the user into an equivalent query that can be computed more efﬁciently
...

• The evaluation of complex queries involves many accesses to disk
...

• The strategy that the database system chooses for evaluating an operation depends on the size of each relation and on the distribution of values within

columns
...
These statistics include
The number of tuples in the relation r
The size of a record (tuple) of relation r in bytes
The number of distinct values that appear in the relation r for a particular
attribute
• These statistics allow us to estimate the sizes of the results of various operations, as well as the cost of executing the operations
...
The presence of these structures has a significant inﬂuence on the choice of a query-processing strategy
...
The ﬁrst step in selecting a query-processing strategy is to ﬁnd a relational-algebra expression that is equivalent to the given expression and is estimated to cost less to execute
...
We use these rules to generate systematically
all expressions equivalent to the given query
...
Several
optimization techniques are available to reduce the number of alternative expressions and plans that need to be generated
...
Heuristic rules for transforming relational-algebra queries include “Perform selection operations as early as possible,”
“Perform projections early,” and “Avoid Cartesian products
...
Incremental
view maintenance is needed to efﬁciently update materialized views when
the underlying relations are modiﬁed
...
Other issues related to materialized views include how
to optimize queries by making use of available materialized views, and how
to select views to be materialized
...
Exercises

14
...
1 Clustering indices may allow faster access to data than a nonclustering index
affords
...

14
...
Assume that r1 has 1000 tuples, r2 has 1500 tuples,
and r3 has 750 tuples
...

14
...
2
...
Let V (C, r1 )
be 900, V (C, r2 ) be 1100, V (E, r2 ) be 50, and V (E, r3 ) be 100
...
Estimate the size of
r1 ⋈ r2 ⋈ r3, and give an efficient strategy for computing the join
...
4 Suppose that a B+ -tree index on branch-city is available on relation branch, and
that no other index is available
...
σ¬(branch-city<“Brooklyn”) (branch)
b
...
σ¬(branch-city<“Brooklyn” ∨ assets<5000) (branch)
14
...
What would be the best way to handle the following selection?
σ(branch-city<“Brooklyn”) ∧ (assets<5000)∧(branch-name=“Downtown”) (branch)

14
...
Explain how you can apply them
to improve the efficiency of certain queries:
a
...

b
...

c
...

14
...
3
...

a
...
σθ1∧θ2 (E1 ⋈θ3 E2) = σθ1 (E1 ⋈θ3 (σθ2 (E2))), where θ2 involves only attributes from E2
14
...

a
...
σB<4 ( A Gmax (B) (R)) and A Gmax (B) (σB<4 (R))
c
...
(R ⟕ S) ⟕ T and R ⟕ (S ⟕ T)
In other words, the natural left outer join is not associative
...
)
e
...
9 SQL allows relations with duplicates (Chapter 4)
...
Define versions of the basic relational-algebra operations σ, Π, ×, ⋈, −, ∪,
and ∩ that work on relations with duplicates, in a way consistent with SQL
...
Check which of the equivalence rules 1 through 7
...

14
...

Hint: A complete binary tree is one where every internal node has exactly
two children
...

(n−1)
If you wish, you can derive the formula for the number of complete binary
trees with n nodes from the formula for the number of binary trees with n
1
nodes
...

14
...
Assume that you can store and look up information about a set of relations (such
as the optimal join order for the set, and the cost of that join order) in constant
time
...
)

14
...
Assume
that there is only one interesting sort order
...
13 A set of equivalence rules is said to be complete if, whenever two expressions
are equivalent, one can be derived from the other by a sequence of uses of the
equivalence rules
...
3
...

14
...
Write a nested query on the relation account to ﬁnd for each branch with
name starting with “B”, all accounts with the maximum balance at the
branch
...
Rewrite the preceding query, without using a nested subquery; in other
words, decorrelate the query
...
Give a procedure (similar to that described in Section 14
...
5) for decorrelating such queries
...
15 Describe how to incrementally maintain the results of the following operations,
on both insertions and deletions
...
Union and set difference
b
...
16 Give an example of an expression deﬁning a materialized view and two situations (sets of statistics for the input relations and the differentials) such that
incremental view maintenance is better than recomputation in one situation,
and recomputation is better in the other situation
...
[1979] describes access-path selection in the System R optimizer, which was one of the earliest relational-query optimizers
...

Query processing in Starburst is described in Haas et al
...
Query optimization
in Oracle is brieﬂy outlined in Oracle [1997]
...
[1996], and Ganguly et al
...

Nonuniform distributions of values cause problems for estimation of query size and
cost
...
Ioannidis and Christodoulakis [1993], Ioannidis and
Poosala [1995], and Poosala et al
...

Exhaustive searching of all query plans is impractical for optimization of joins
involving many relations, and techniques based on randomized searching, which do
not examine all alternatives, have been proposed
...

Parametric query-optimization techniques have been proposed by Ioannidis et al
...
A set of plans — one for each of several
different query selectivities— is computed, and is stored by the optimizer, at compile
time
...

Klug [1982] was an early work on optimization of relational-algebra expressions
with aggregate functions
...
Optimization of queries containing outer
joins is described in Rosenthal and Reiner [1984], Galindo-Legaria and Rosenthal
[1992], and Galindo-Legaria [1994]
...
Extension
of relational algebra to duplicates is described in Dayal et al
...
Optimization of
nested subqueries is discussed in Kim [1982], Ganski and Wong [1987], Dayal [1987],
and more recently, in Seshadri et al
...

When queries are generated through views, more relations often are joined than is
necessary for computation of the query
...
The notion of a tableau
was introduced by Aho et al
...
[1979a], and was further extended
by Sagiv and Yannakakis [1981]
...

Sellis [1988] and Roy et al
...
If an entire group
of queries is considered, it is possible to discover common subexpressions that can be
evaluated once for the entire group
...
Dalvi et al
...

Query optimization can make use of semantic information, such as functional dependencies and other integrity constraints
...
[1990], and in the context of
aggregation, by Sudarshan and Ramakrishnan [1991]
...
[1992c], Srivastava et al
...
[1996]
...
[1993]
...
[1986], Blakeley et al
...
Gupta and Mumick [1995] provides a survey of materialized view maintenance
...
[2001]
...
[1995], Dar et al
...
[2000]
...
[1996], Labio et al
...
[2000]
...
PART 5

Transaction Management

The term transaction refers to a collection of operations that form a single logical unit
of work
...

It is important that either all actions of a transaction be executed completely, or, in
case of some failure, partial effects of a transaction be undone
...
Further, once a transaction is successfully executed, its effects must persist
in the database — a system failure should not result in the database forgetting about
a transaction that successfully completed
...

In a database system where multiple transactions are executing concurrently, if
updates to shared data are not controlled there is potential for transactions to see
inconsistent intermediate states created by updates of other transactions
...
Thus, database
systems must provide mechanisms to isolate transactions from the effects of other
concurrently executing transactions
...

Chapter 15 describes the concept of a transaction in detail, including the properties
of atomicity, durability, isolation, and other properties provided by the transaction
abstraction
...

Chapter 16 describes several concurrency control techniques that help implement
the isolation property
...

CHAPTER 15

Transactions

Often, a collection of several operations on the database appears to be a single unit
from the point of view of the database user
...

Clearly, it is essential that all these operations occur, or that, in case of a failure, none
occur
...

Collections of operations that form a single logical unit of work are called transactions
...
Furthermore, it must
manage concurrent execution of transactions in a way that avoids the introduction of
inconsistency
...
As a result, it
would obtain an incorrect result
...
Details on
concurrent transaction processing and recovery from failures are in Chapters 16 and
17, respectively
...

15
...
Usually, a transaction is initiated by a user program written in a
high-level data-manipulation language or programming language (for example, SQL,
COBOL, C, C++, or Java), where it is delimited by statements (or function calls) of the
form begin transaction and end transaction
...

To ensure integrity of the data, we require that the database system maintain the
following properties of the transactions:
• Atomicity
...

• Consistency
...

• Isolation
...
Thus, each transaction is unaware of other transactions
executing concurrently in the system
...
After a transaction completes successfully, the changes it has made
to the database persist, even if there are system failures
...

To gain a better understanding of ACID properties and the need for them, consider a simpliﬁed banking system consisting of several accounts and a set of transactions that access and update those accounts
...

Transactions access data using two operations:
• read(X), which transfers the data item X from the database to a local buffer
belonging to the transaction that executed the read operation
...

In a real database system, the write operation does not necessarily result in the immediate update of the data on the disk; the write operation may be temporarily stored
in memory and executed on the disk later
...
We shall return to this subject
in Chapter 17
...
This transaction can be deﬁned as
Ti : read(A);
A := A − 50;
write(A);
read(B);
B := B + 50;
write(B)
...
(For ease of presentation, we
consider them in an order different from the order A-C-I-D)
...
Without the consistency
requirement, money could be created or destroyed by the transaction! It can

...

Ensuring consistency for an individual transaction is the responsibility of
the application programmer who codes the transaction
...

• Atomicity: Suppose that, just before the execution of transaction Ti the values
of accounts A and B are $1000 and $2000, respectively
...
Examples of such failures include power
failures, hardware failures, and software errors
...
In
this case, the values of accounts A and B reﬂected in the database are $950 and
$2000
...
In particular, we
note that the sum A + B is no longer preserved
...
We term such a
state an inconsistent state
...
Note, however, that the system must at some
point be in an inconsistent state
...
This state, however, is eventually replaced by the consistent state where the value of account
A is $950, and the value of account B is $2050
...
That is the reason for
the atomicity requirement: If the atomicity property is present, all actions of
the transaction are reﬂected in the database, or none are
...
We discuss these ideas further in Section 15
...
Ensuring atomicity
is the responsibility of the database system itself; speciﬁcally, it is handled by
a component called the transaction-management component, which we describe in detail in Chapter 17
...

The durability property guarantees that, once a transaction completes successfully, all the updates that it carried out on the database persist, even if
there is a system failure after the transaction completes execution
...
We can
guarantee durability by ensuring that either
1
...

2
...

Ensuring durability is the responsibility of a component of the database system called the recovery-management component
...

• Isolation: Even if the consistency and atomicity properties are ensured for
each transaction, if several transactions are executed concurrently, their operations may interleave in some undesirable way, resulting in an inconsistent
state
...
If a
second concurrently running transaction reads A and B at this intermediate
point and computes A + B, it will observe an inconsistent value
...

A way to avoid the problem of concurrently executing transactions is to
execute transactions serially — that is, one after the other
...
4
...

We discuss the problems caused by concurrently executing transactions in
Section 15
...
The isolation property of a transaction ensures that the concurrent execution of transactions results in a system state that is equivalent to a
state that could have been obtained had these transactions executed one at a
time in some order
...
5
...
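The atomicity requirement for the transfer transaction Ti can be illustrated with a small sketch that uses SQLite through Python's standard sqlite3 module. This is an assumption-laden illustration, not the mechanism the text describes. The table name account, the helper functions, and the simulated failure are all hypothetical, and the rollback is delegated to SQLite's own transaction support.

```python
import sqlite3


def make_bank():
    """In-memory database with accounts A = 1000 and B = 2000."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
    conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 1000), ("B", 2000)])
    conn.commit()
    return conn


def transfer(conn, src, dst, amount, fail_midway=False):
    """Move amount from src to dst; either both updates persist or neither."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            if fail_midway:
                # Simulated failure after write(A) and before write(B).
                raise RuntimeError("simulated failure")
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
        return True
    except RuntimeError:
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM account"))


if __name__ == "__main__":
    conn = make_bank()
    transfer(conn, "A", "B", 50, fail_midway=True)
    print(balances(conn))
    transfer(conn, "A", "B", 50)
    print(balances(conn))
```

After the failed attempt the balances are unchanged, so the sum A + B is preserved. After the successful attempt A has decreased by 50 and B has increased by 50, matching the consistent end state described for Ti.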

15
...
However, as we
noted earlier, a transaction may not always complete its execution successfully
...
If we are to ensure the atomicity property, an aborted
transaction must have no effect on the state of the database
...
the aborted transaction made to the database must be undone
...
It is part of the responsibility of the recovery scheme to manage
transaction aborts
...

A committed transaction that has performed updates transforms the database into a
new consistent state, which must persist even if there is a system failure
...
The
only way to undo the effects of a committed transaction is to execute a compensating
transaction
...
However, it is not always possible
to create such a compensating transaction
...
Chapter 24 includes a discussion of compensating transactions
...
We therefore establish a simple abstract transaction model
...
1
...
Similarly, we say that a transaction has aborted only if it has entered the aborted state
...

A transaction starts in the active state
...
At this point, the transaction has completed its execution, but it is still possible that it may have to be aborted, since the actual output
may still be temporarily residing in main memory, and thus a hardware failure may
preclude its successful completion
...
When the last of this information is written out,
the transaction enters the committed state
...
Chapter 17 discusses techniques to deal with loss of data on disk
...
Such a transaction must be rolled back
...
At this point, the system has two options:


[State diagram of a transaction, with states: active, partially committed, failed, committed, aborted]
Figure 15
...

• It can restart the transaction, but only if the transaction was aborted as a result
of some hardware or software error that was not created through the internal logic of the transaction
...

• It can kill the transaction
...

We must be cautious when dealing with observable external writes, such as writes
to a terminal or printer
...
Most systems allow such writes
to take place only after the transaction has entered the committed state
...
If the system should
fail after the transaction has entered the committed state, but before it could complete
the external writes, the database system will carry out the external writes (using the
data in nonvolatile storage) when the system is restarted
...
For example, suppose the external action is that of dispensing cash at an automated teller machine,
and the system fails just before the cash is actually dispensed (we assume that cash
can be dispensed atomically)
...
In such a case, a compensating transaction, such as depositing the cash back in the user's account, needs to be
executed when the system is restarted
...

For certain applications, it may be desirable to allow active transactions to display data to users, particularly for long-duration transactions that run for minutes
or hours
...
Most current transaction systems
ensure atomicity and, therefore, forbid this form of interaction with users
...

15
...
We ﬁrst consider a simple, but extremely inefﬁcient, scheme called the shadow copy scheme
...
The scheme also assumes that the database is simply a ﬁle on
disk
...

In the shadow-copy scheme, a transaction that wants to update the database ﬁrst
creates a complete copy of the database
...
If at any point the transaction has to be aborted, the system merely deletes the new copy
...

If the transaction completes, it is committed as follows
...
(Unix systems use the fsync system call for this purpose.)
...
The old copy of the database is then deleted
...
2 depicts the scheme, showing the database state before and after the update
...
[Figure: Shadow-copy technique for atomicity and durability. Panel (b), after update, shows the new copy of the database.]
...

We now consider how the technique handles transaction and system failures
...
If the transaction fails at any time before db-pointer is
updated, the old contents of the database are not affected
...
Once the transaction has been
committed, all the updates that it performed are in the database pointed to by db-pointer
...

Now consider the issue of system failure
...
Then, when the system restarts, it
will read db-pointer and will thus see the original contents of the database, and none
of the effects of the transaction will be visible on the database
...
Before the pointer is updated,
all updated pages of the new copy of the database were written to disk
...
Therefore, when the system restarts, it will read db-pointer
and will thus see the contents of the database after all the updates performed by the
transaction
...
If some of the
bytes of the pointer were updated by the write, but others were not, the pointer is
meaningless, and neither old nor new versions of the database may be found when
the system restarts
...
In other words, the disk system guarantees that it will update
db-pointer atomically, as long as we make sure that db-pointer lies entirely in a single
sector, which we can ensure by storing db-pointer at the beginning of a block
...

As a simple example of a transaction outside the database domain, consider a text-editing session
...
The actions
executed by the transaction are reading and updating the ﬁle
...

Many text editors use essentially the implementation just described, to ensure that
an editing session is transactional
...
At the
end of the editing session, if the updated ﬁle is to be saved, the text editor uses a ﬁle
rename command to rename the new ﬁle to have the actual ﬁle name
...
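The save-by-rename idea described above can be sketched as follows. This is a minimal illustration, not how any particular editor is implemented. The function name `atomic_save` is hypothetical, and the sketch assumes a file system on which a rename within one directory is atomic.

```python
import os
import tempfile


def atomic_save(path, text):
    """Write text to a new file, then rename it over the old one."""
    directory = os.path.dirname(os.path.abspath(path))
    # Keep the new copy in the same directory so the rename stays on one file system.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write(text)
            tmp.flush()
            os.fsync(tmp.fileno())  # force the new copy to disk before switching over
        os.replace(tmp_path, path)  # plays the role of the pointer update
    except BaseException:
        # Abort: delete the new copy; the old file is left untouched.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

If the process fails before `os.replace` runs, a reader of `path` still sees the old contents; afterwards it sees the new contents. As with the shadow-copy scheme, the cost is a full copy of the file on every save.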

Unfortunately, this implementation is extremely inefﬁcient in the context of large
databases, since executing a single transaction requires copying the entire database
...
There are practical ways of implementing atomicity and durability
that are much less expensive and more powerful
...


15.4 Concurrent Executions
Transaction-processing systems usually allow multiple transactions to run concurrently
...
Ensuring consistency
in spite of concurrent execution of transactions requires extra work; it is far easier to
insist that transactions run serially — that is, one at a time, each starting only after
the previous one has completed
...
A transaction consists of many
steps
...
The CPU and the
disks in a computer system can operate in parallel
...
The parallelism of the CPU
and the I/O system can therefore be exploited to run multiple transactions in
parallel
...
All of this
increases the throughput of the system — that is, the number of transactions
executed in a given amount of time
...

• Reduced waiting time
...
If transactions run serially, a short transaction
may have to wait for a preceding long transaction to complete, which can lead
to unpredictable delays in running a transaction
...
Concurrent execution
reduces the unpredictable delays in running transactions
...

The motivation for using concurrent execution in a database is essentially the same
as the motivation for using multiprogramming in an operating system
...
In this section, we present the
concept of schedules to help identify those executions that are guaranteed to ensure
consistency
...
It does
so through a variety of mechanisms called concurrency-control schemes
...

Consider again the simpliﬁed banking system of Section 15
...
Let T1 and T2 be two transactions that transfer funds from one account to another
...
It is deﬁned as
T1 : read(A);
A := A − 50;
write(A);
read(B);
B := B + 50;
write(B)
...
It is
deﬁned as
T2 : read(A);
temp := A * 0
...

Suppose the current values of accounts A and B are $1000 and $2000, respectively
...
This execution sequence appears in Figure 15
...
In the ﬁgure, the
sequence of instruction steps is in chronological order from top to bottom, with instructions of T1 appearing in the left column and instructions of T2 appearing in the
right column
...
3
takes place, are $855 and $2145, respectively
...
1
A := A – temp
write(A)
read(B)
B := B + temp
write(B)

Figure 15
...


Similarly, if the transactions are executed one at a time in the order T2 followed
by T1 , then the corresponding execution sequence is that of Figure 15
...
Again, as
expected, the sum A + B is preserved, and the ﬁnal values of accounts A and B are
$850 and $2150, respectively
...
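The two serial orders can be checked with a short simulation. This is a sketch under stated assumptions: the rate in T2 is truncated in the text above, and 10 percent is assumed here because it reproduces the balances quoted for both serial schedules. The helper `run_serial` is a hypothetical name.

```python
def t1(db):
    """T1: transfer $50 from account A to account B."""
    db["A"] -= 50
    db["B"] += 50


def t2(db):
    """T2: transfer a fraction of A to B (10 percent is an assumption)."""
    temp = db["A"] * 10 // 100
    db["A"] -= temp
    db["B"] += temp


def run_serial(order, db):
    """Run the transactions one after another on a copy of the database."""
    state = dict(db)
    for txn in order:
        txn(state)
    return state


initial = {"A": 1000, "B": 2000}
after_t1_t2 = run_serial([t1, t2], initial)  # A = 855, B = 2145
after_t2_t1 = run_serial([t2, t1], initial)  # A = 850, B = 2150
```

The final balances differ between the two orders, but in both cases A + B is still $3000, which is the consistency constraint that matters.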
They represent the
chronological order in which instructions are executed in the system
...
For example, in transaction T1 , the instruction write(A) must appear before the
instruction read(B), in any valid schedule
...

These schedules are serial: Each serial schedule consists of a sequence of instructions from various transactions, where the instructions belonging to one single transaction appear together in that schedule
...

When the database system executes several transactions concurrently, the corresponding schedule no longer needs to be serial
...
With multiple transactions, the CPU time is shared among all the transactions
...
In general, it is not possible to predict exactly
how many instructions of a transaction will be executed before the CPU switches to

T1

T2
read(A)
temp := A * 0
...
4

Schedule 2 — a serial schedule in which T2 is followed by T1
...
1
A := A – temp
write(A)
read(B)
B := B + 50
write(B)
read(B)
B := B + temp
write(B)
Figure 15
...

another transaction
...

Returning to our previous example, suppose that the two transactions are executed concurrently
...
5
...
The sum A + B is indeed preserved
...
To illustrate, consider the
schedule of Figure 15
...
After the execution of this schedule, we arrive at a state
where the ﬁnal values of accounts A and B are $950 and $2100, respectively
...
Indeed, the sum A + B is not preserved by the execution of the two
transactions
...
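The lost update can be reproduced step by step. The sketch below assumes one particular interleaving (T1 is interrupted after computing A − 50 but before writing it) and, as before, assumes that T2 moves 10 percent of A; both are assumptions chosen because they yield the final values quoted above. Local variables stand in for each transaction's private buffer.

```python
def run_interleaved():
    db = {"A": 1000, "B": 2000}

    # T1: read(A); A := A - 50 (only in T1's local buffer so far)
    a1 = db["A"]
    a1 -= 50

    # T2: read(A); temp := 10% of A; A := A - temp; write(A); read(B)
    a2 = db["A"]
    temp = a2 * 10 // 100
    a2 -= temp
    db["A"] = a2
    b2 = db["B"]

    # T1: write(A) overwrites T2's update; read(B); B := B + 50; write(B)
    db["A"] = a1
    b1 = db["B"]
    b1 += 50
    db["B"] = b1

    # T2: B := B + temp; write(B), based on the stale value of B it read earlier
    b2 += temp
    db["B"] = b2
    return db


final = run_interleaved()  # A = 950, B = 2100
```

The sum is now $3050 rather than $3000: T1's write of A discards T2's withdrawal, and T2's write of B discards T1's deposit.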
It is the job of the database system to
ensure that any schedule that gets executed will leave the database in a consistent
state
...

We can ensure consistency of the database under concurrent execution by making
sure that any schedule that is executed has the same effect as a schedule that could
have occurred without any concurrent execution
...
We examine this idea in Section 15
...

15
...
Before we examine how the database

1
A := A – temp
write(A)
read(B)
write(A)
read(B)
B := B + 50
write(B)
B := B + temp
write(B)
Figure 15
...

system can carry out this task, we must ﬁrst understand which schedules will ensure consistency, and which schedules will not
...
For this reason, we shall not interpret the type of operations that a
transaction can perform on a data item
...
We thus assume that, between a read(Q) instruction and a write(Q)
instruction on a data item Q, a transaction may perform an arbitrary sequence of operations on the copy of Q that is residing in the local buffer of the transaction
...
We shall therefore usually show only read and write
instructions in schedules, as we do in schedule 3 in Figure 15
...

In this section, we discuss different forms of schedule equivalence; they lead to the
notions of conﬂict serializability and view serializability
...
7

Schedule 3 — showing only the read and write instructions
...
5
...
If Ii and Ij refer to different data
items, then we can swap Ii and Ij without affecting the results of any instruction in
the schedule
...
Since we are dealing with only read and write instructions,
there are four cases that we need to consider:
1
...
The order of Ii and Ij does not matter, since the
same value of Q is read by Ti and Tj , regardless of the order
...
Ii = read(Q), Ij = write(Q)
...
If Ij comes before Ii , then Ti reads
the value of Q that is written by Tj
...

3
...
The order of Ii and Ij matters for reasons similar
to those of the previous case
...
Ii = write(Q), Ij = write(Q)
...
However, the value
obtained by the next read(Q) instruction of S is affected, since the result of
only the latter of the two write instructions is preserved in the database
...

Thus, only in the case where both Ii and Ij are read instructions does the relative
order of their execution not matter
...
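The four cases reduce to a simple test, sketched here. The `Op` record and the function name `conflicts` are illustrative choices, not notation from the text.

```python
from typing import NamedTuple


class Op(NamedTuple):
    txn: str   # transaction name, e.g. "T1"
    kind: str  # "read" or "write"
    item: str  # data item, e.g. "A"


def conflicts(i: Op, j: Op) -> bool:
    """Two operations conflict if they come from different transactions,
    touch the same data item, and at least one of them is a write."""
    return i.txn != j.txn and i.item == j.item and "write" in (i.kind, j.kind)
```

For instance, `conflicts(Op("T1", "write", "A"), Op("T2", "read", "A"))` is true, while a write of A and a read of B never conflict because they refer to different items.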

To illustrate the concept of conﬂicting instructions, we consider schedule 3, in Figure 15
...
The write(A) instruction of T1 conﬂicts with the read(A) instruction of T2
...

Let Ii and Ij be consecutive instructions of a schedule S
...
We expect S to be equivalent to S′, since all
instructions appear in the same order in both schedules except for Ii and Ij , whose
order does not matter
...
7 does not conﬂict
with the read(B) instruction of T1 , we can swap these instructions to generate an
equivalent schedule, schedule 5, in Figure 15
...
Regardless of the initial system state,
schedules 3 and 5 both produce the same ﬁnal system state
...

• Swap the write(B) instruction of T1 with the write(A) instruction of T2
...

8

Schedule 5 — schedule 3 after swapping of a pair of instructions
...
9, is a serial schedule
...
This equivalence
implies that, regardless of the initial system state, schedule 3 will produce the same
ﬁnal state as will some serial schedule
...

In our previous examples, schedule 1 is not conﬂict equivalent to schedule 2
...

The concept of conﬂict equivalence leads to the concept of conﬂict serializability
...
Thus, schedule 3 is conﬂict serializable, since it is conﬂict equivalent to the
serial schedule 1
...
10; it consists of only the signiﬁcant operations (that is, the read and write) of transactions T3 and T4
...

It is possible to have two schedules that produce the same outcome, but that are
not conﬂict equivalent
...
9

Schedule 6 — a serial schedule that is equivalent to schedule 3
...
10

Schedule 7
...
Let schedule 8 be as deﬁned in Figure 15
...
We claim
that schedule 8 is not conflict equivalent to the serial schedule <T1, T5>, since, in
schedule 8, the write(B) instruction of T5 conflicts with the read(B) instruction of T1
...
However, the final values of accounts A and B
after the execution of either schedule 8 or the serial schedule <T1, T5> are the same
— $960 and $2040, respectively
...
For the system to determine that schedule 8
produces the same outcome as the serial schedule <T1, T5>, it must analyze the computation performed by T1 and T5 , rather than just the read and write operations
...
However, there are other deﬁnitions of schedule equivalence based purely on the read and
write operations
...

15
...
2 View Serializability
In this section, we consider a form of equivalence that is less stringent than conﬂict
equivalence, but that, like conﬂict equivalence, is based on only the read and write
operations of transactions
...
11

Schedule 8
...

Consider two schedules S and S′, where the same set of transactions participates
in both schedules
...
1. For each data item Q, if transaction Ti reads the initial value of Q in schedule
S, then transaction Ti must, in schedule S′, also read the initial value of Q
...
2. For each data item Q, if transaction Ti executes read(Q) in schedule S, and if
that value was produced by a write(Q) operation executed by transaction Tj,
then the read(Q) operation of transaction Ti must, in schedule S′, also read the
value of Q that was produced by the same write(Q) operation of transaction Tj
...
3. For each data item Q, the transaction (if any) that performs the final write(Q)
operation in schedule S must perform the final write(Q) operation in schedule S′
...
Condition 3, coupled with
conditions 1 and 2, ensures that both schedules result in the same ﬁnal system state
...
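The three conditions can be checked mechanically. The following is a minimal sketch, assuming a schedule is given as a list of `(transaction, kind, item)` tuples; the function names are hypothetical. The example schedule is an assumed interleaving of three transactions in which two of them write Q without reading it.

```python
def view_profile(schedule):
    """Summarize which write each read sees and which write of each item is last."""
    last_write = {}   # item -> (txn, k): the k-th write of that item by txn
    writes_seen = {}  # (txn, item) -> number of writes so far
    reads_seen = {}   # (txn, item) -> number of reads so far
    reads_from = {}   # (txn, item, k) -> source write, or None for the initial value
    for txn, kind, item in schedule:
        key = (txn, item)
        if kind == "read":
            k = reads_seen.get(key, 0)
            reads_seen[key] = k + 1
            reads_from[(txn, item, k)] = last_write.get(item)
        elif kind == "write":
            k = writes_seen.get(key, 0)
            writes_seen[key] = k + 1
            last_write[item] = (txn, k)
    return reads_from, last_write


def view_equivalent(s1, s2):
    """True if both schedules contain the same operations and satisfy
    the initial-read, reads-from, and final-write conditions."""
    return sorted(s1) == sorted(s2) and view_profile(s1) == view_profile(s2)


interleaved = [("T3", "read", "Q"), ("T4", "write", "Q"),
               ("T3", "write", "Q"), ("T6", "write", "Q")]
serial = [("T3", "read", "Q"), ("T3", "write", "Q"),
          ("T4", "write", "Q"), ("T6", "write", "Q")]
```

Here `view_equivalent(interleaved, serial)` holds: the single read sees the initial value of Q in both schedules, and T6 performs the final write in both.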
However, schedule 1 is view equivalent
to schedule 3, because the values of account A and B read by transaction T2 were
produced by T1 in both schedules
...
We
say that a schedule S is view serializable if it is view equivalent to a serial schedule
...
12
...
Indeed, it is view
equivalent to the serial schedule <T3, T4, T6>, since the one read(Q) instruction reads
the initial value of Q in both schedules, and T6 performs the ﬁnal write of Q in both
schedules
...
Indeed, schedule 9 is not conﬂict serializable, since every pair of consecutive instructions conﬂicts, and, thus, no
swapping of instructions is possible
...
Writes of this sort are called blind
writes
...

T3
read(Q)

T4

T6

write(Q)
write(Q)
write(Q)
Figure 15
...


15
...
We
now address the effect of transaction failures during concurrent execution
...
In a system that allows
concurrent execution, it is necessary also to ensure that any transaction Tj that is
dependent on Ti (that is, Tj has read data written by Ti ) is also aborted
...

In the following two subsections, we address the issue of what schedules are
acceptable from the viewpoint of recovery from transaction failure
...

15
...
1 Recoverable Schedules
Consider schedule 11 in Figure 15
...
Suppose that the system allows T9 to commit immediately
after executing the read(A) instruction
...
Now suppose that T8 fails before it commits
...
However, T9 has already
committed and cannot be aborted
...

Schedule 11, with the commit happening immediately after the read(A) instruction, is an example of a nonrecoverable schedule, which should not be allowed
...
A recoverable schedule is
one where, for each pair of transactions Ti and Tj such that Tj reads a data item previously written by Ti , the commit operation of Ti appears before the commit operation
of Tj
...
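The definition translates into a direct check. This sketch assumes a schedule is a list of `(transaction, kind, item)` tuples in which `kind` is `"read"`, `"write"`, or `"commit"` (with `item` set to `None` for commits); the function name and the example schedule are illustrative.

```python
def is_recoverable(schedule):
    """Every transaction that commits must commit after the transactions
    whose writes it has read."""
    last_writer = {}     # item -> transaction that wrote it most recently
    depends_on = set()   # (reader, writer) pairs
    commit_pos = {}      # transaction -> position of its commit
    for pos, (txn, kind, item) in enumerate(schedule):
        if kind == "write":
            last_writer[item] = txn
        elif kind == "read":
            writer = last_writer.get(item)
            if writer is not None and writer != txn:
                depends_on.add((txn, writer))
        elif kind == "commit":
            commit_pos[txn] = pos
    for reader, writer in depends_on:
        if reader in commit_pos:
            # The writer must already have committed when the reader commits.
            if writer not in commit_pos or commit_pos[writer] > commit_pos[reader]:
                return False
    return True


# T9 reads A written by T8 and commits while T8 is still active.
early_commit = [("T8", "read", "A"), ("T8", "write", "A"),
                ("T9", "read", "A"), ("T9", "commit", None),
                ("T8", "read", "B")]
```

`is_recoverable(early_commit)` is false; moving T9's commit after a commit of T8 makes the schedule recoverable.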
6
...
Such situations occur if transactions have read data written by Ti
...
13

Schedule 11
...
T10          T11          T12
read(A)
read(B)
write(A)
             read(A)
             write(A)
                          read(A)
Figure 15
...

of Figure 15
...
Transaction T10 writes a value of A that is read by transaction T11
...
Suppose that,
at this point, T10 fails
...
Since T11 is dependent on T10 , T11
must be rolled back
...
This
phenomenon, in which a single transaction failure leads to a series of transaction
rollbacks, is called cascading rollback
...
It is desirable to restrict the schedules to those where cascading
rollbacks cannot occur
...
Formally, a
cascadeless schedule is one where, for each pair of transactions Ti and Tj such that
Tj reads a data item previously written by Ti , the commit operation of Ti appears
before the read operation of Tj
...
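A cascadeless check differs from a recoverability check only in when the writer must have committed: before the read, not merely before the reader's commit. A minimal sketch, using the same assumed `(transaction, kind, item)` tuple format with `"commit"` entries; the names are illustrative.

```python
def is_cascadeless(schedule):
    """No transaction may read an item whose latest writer is another,
    still uncommitted, transaction."""
    last_writer = {}
    committed = set()
    for txn, kind, item in schedule:
        if kind == "write":
            last_writer[item] = txn
        elif kind == "commit":
            committed.add(txn)
        elif kind == "read":
            writer = last_writer.get(item)
            if writer is not None and writer != txn and writer not in committed:
                return False
    return True


# T11 reads A from uncommitted T10, and T12 reads A from uncommitted T11.
chain = [("T10", "read", "A"), ("T10", "read", "B"), ("T10", "write", "A"),
         ("T11", "read", "A"), ("T11", "write", "A"),
         ("T12", "read", "A")]
```

`is_cascadeless(chain)` is false, which matches the scenario above: a failure of T10 would force T11 and then T12 to roll back.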

15
...

Speciﬁcally, schedules that are conﬂict or view serializable and cascadeless satisfy
these requirements
...

As a trivial example of a concurrency-control scheme, consider this scheme: A
transaction acquires a lock on the entire database before it starts and releases the
lock after it has committed
...
As
a result of the locking policy, only one transaction can execute at a time
...
These are trivially serializable, and it is easy to
verify that they are cascadeless as well
...
In
other words, it provides a poor degree of concurrency
...
4,
concurrent execution has several performance beneﬁts
...

We study a number of concurrency-control schemes in Chapter 16
...
Some of them allow only conﬂict serializable
schedules to be generated; others allow certain view-serializable schedules that are
not conﬂict-serializable to be generated
...
8 Transaction Deﬁnition in SQL
A data-manipulation language must include a construct for specifying the set of actions that constitute a transaction
...
Transactions are
ended by one of these SQL statements:
• Commit work commits the current transaction and begins a new one
...

The keyword work is optional in both statements
...
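As an illustration of ending a transaction by commit or rollback, the sketch below uses Python's standard `sqlite3` module, whose connection methods `commit()` and `rollback()` end the current transaction. The table `account`, the helper names, and the overdraft rule are assumptions made for the example.

```python
import sqlite3


def balances(conn):
    """Return the current balances as a dict."""
    return dict(conn.execute("SELECT name, balance FROM account"))


def transfer(conn, src, dst, amount):
    """Move amount from src to dst; commit on success, roll back on any error."""
    try:
        conn.execute("UPDATE account SET balance = balance - ? WHERE name = ?",
                     (amount, src))
        (left,) = conn.execute("SELECT balance FROM account WHERE name = ?",
                               (src,)).fetchone()
        if left < 0:
            raise ValueError("insufficient funds")
        conn.execute("UPDATE account SET balance = balance + ? WHERE name = ?",
                     (amount, dst))
        conn.commit()    # both updates become permanent together
    except Exception:
        conn.rollback()  # undo whatever part of the transfer was applied
        raise


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 1000), ("B", 2000)])
conn.commit()
```

A call such as `transfer(conn, "A", "B", 50)` commits both updates; a transfer that would overdraw the source account raises an error and leaves both balances as they were.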

The standard also speciﬁes that the system must ensure both serializability and
freedom from cascading rollback
...
Thus,
conﬂict and view serializability are both acceptable
...

We study such weaker levels of consistency in Section 16
...

15
...
To do that, we must ﬁrst understand how to
determine, given a particular schedule S, whether the schedule is serializable
...
Consider a schedule S
...
This graph consists of a pair G = (V, E), where V is a set
of vertices and E is a set of edges
...
The set of edges consists of all edges Ti → Tj for which
one of three conditions holds:
1
...

2
...

3
...


15
...
15

Testing for Serializability

Precedence graph for (a) schedule 1 and (b) schedule 2
...

For example, the precedence graph for schedule 1 in Figure 15
...
Similarly, Figure 15
...

The precedence graph for schedule 4 appears in Figure 15
...
It contains the edge
T1 → T2 , because T1 executes read(A) before T2 executes write(A)
...

If the precedence graph for S has a cycle, then schedule S is not conﬂict serializable
...

A serializability order of the transactions can be obtained through topological
sorting, which determines a linear order consistent with the partial order of the
precedence graph
...
For example, the graph of Figure 15
...
17b and 15
...

Thus, to test for conﬂict serializability, we need to construct the precedence graph
and to invoke a cycle-detection algorithm
...
Cycle-detection algorithms, such as those based
on depth-first search, require on the order of n² operations, where n is the number of
vertices in the graph (that is, the number of transactions)
...
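The test can be sketched in a few lines. Note that this example swaps in Kahn's topological-sort algorithm in place of depth-first search: it detects a cycle and, when there is none, yields a serializability order in the same pass. Schedules are assumed to be lists of `(transaction, kind, item)` tuples, and the function names and the two example schedules are illustrative.

```python
def precedence_graph(schedule):
    """Edge Ti -> Tj whenever an operation of Ti conflicts with a later one of Tj."""
    edges = set()
    for pos, (ti, ki, qi) in enumerate(schedule):
        for tj, kj, qj in schedule[pos + 1:]:
            if ti != tj and qi == qj and "write" in (ki, kj):
                edges.add((ti, tj))
    return edges


def serializability_order(schedule):
    """Return one equivalent serial order, or None if the graph has a cycle."""
    txns = []
    for txn, _, _ in schedule:
        if txn not in txns:
            txns.append(txn)
    edges = precedence_graph(schedule)
    indegree = {t: 0 for t in txns}
    for _, dst in edges:
        indegree[dst] += 1
    ready = [t for t in txns if indegree[t] == 0]
    order = []
    while ready:
        current = ready.pop(0)
        order.append(current)
        for src, dst in sorted(edges):
            if src == current:
                indegree[dst] -= 1
                if indegree[dst] == 0:
                    ready.append(dst)
    return order if len(order) == len(txns) else None


serializable = [("T1", "read", "A"), ("T1", "write", "A"),
                ("T2", "read", "A"), ("T2", "write", "A"),
                ("T1", "read", "B"), ("T1", "write", "B"),
                ("T2", "read", "B"), ("T2", "write", "B")]
cyclic = [("T1", "read", "A"), ("T2", "read", "A"), ("T2", "write", "A"),
          ("T2", "read", "B"), ("T1", "write", "A"), ("T1", "read", "B"),
          ("T1", "write", "B"), ("T2", "write", "B")]
```

For `serializable` the only edge is T1 → T2, so the order is T1 followed by T2. For `cyclic` the graph contains both T1 → T2 and T2 → T1, so no serial order exists and the function returns `None`.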

Returning to our previous examples, note that the precedence graphs for schedules 1 and 2 (Figure 15
...
The precedence graph for
schedule 4 (Figure 15
...

Testing for view serializability is rather complicated
...
Thus, almost certainly there exists no efﬁcient algorithm to test for view serializability
...
16

T2

Precedence graph for schedule 4
...
17

Illustration of topological sorting
...
However,
concurrency-control schemes can still use sufﬁcient conditions for view serializability
...

15
...
Understanding the concept of a transaction is critical for
understanding and implementing updates of data in a database, in such a way
that concurrent executions and failures of various forms do not result in the
database becoming inconsistent
...

Atomicity ensures that either all the effects of a transaction are reﬂected
in the database, or none are; a failure cannot leave the database in a state
where a transaction is partially executed
...


Durability ensures that, once a transaction has been committed, that transaction’s updates do not get lost, even if there is a system failure
...

• When several transactions execute concurrently in the database, the consistency of data may no longer be preserved
...

Since a transaction is a unit that preserves consistency, a serial execution
of transactions guarantees that consistency is preserved
...

We require that any schedule produced by concurrent processing of a
set of transactions will have an effect equivalent to a schedule produced
when these transactions are run serially in some order
...

There are several different notions of equivalence leading to the concepts
of conﬂict serializability and view serializability
...

• Schedules must be recoverable, to make sure that if transaction a sees the effects of transaction b, and b then aborts, then a also gets aborted
...
Cascadelessness is
ensured by allowing transactions to only read committed data
...
Chapter 16 describes
concurrency-control schemes
...

The shadow copy scheme is used for ensuring atomicity and durability in
text editors; however, it has extremely high overheads when used for database
systems, and, moreover, it does not support concurrent execution
...

• We can test a given schedule for conﬂict serializability by constructing a precedence graph for the schedule, and by searching for absence of cycles in the

graph
...

Review Terms
• Transaction
• ACID properties
    Atomicity
    Consistency
    Isolation
    Durability
• Inconsistent state
• Transaction state
    Active
    Partially committed
    Failed
    Aborted
    Committed
    Terminated
• Transaction
    Restart
    Kill
• Observable external writes
• Shadow copy scheme
• Concurrent executions
• Serial execution
• Schedules
• Conflict of operations
• Conflict equivalence
• Conflict serializability
• View equivalence
• View serializability
• Blind writes
• Recoverability
• Recoverable schedules
• Cascading rollback
• Cascadeless schedules
• Concurrency-control scheme
• Lock
• Serializability testing
• Precedence graph
• Serializability order

Exercises
15
...
Explain the usefulness of each
...
2 Suppose that there is a database system that never fails
...
3 Consider a ﬁle system such as the one on your favorite operating system
...
What are the steps involved in creation and deletion of ﬁles, and in writing
data to a ﬁle?
b
...

15
...
Why might this be the case?
15
...
List all possible sequences of states through which a transaction may pass
...


15
...

15
...

15
...

T2 : read(B);
read(A);
if B = 0 then A := A + 1;
write(A)
...

a
...

b
...

c
...
9 Since every conﬂict-serializable schedule is view serializable, why do we emphasize conﬂict serializability rather than view serializability?
15
...
18
...

15
...

T1

T2

T4

T3
T5

Figure 15
...


15
...

Bibliographical Notes
Gray and Reuter [1993] provides detailed textbook coverage of transaction-processing
concepts, techniques and implementation details, including concurrency control and
recovery issues
...

Early textbook discussions of concurrency control and recovery included Papadimitriou [1986] and Bernstein et al
...
An early survey paper on implementation
issues in concurrency control and recovery is presented by Gray [1978]
...
[1976] in connection
to work on concurrency control for System R
...
[1977] and Papadimitriou [1979]
...
[1990]
...

Chapter 16

Concurrency Control

We saw in Chapter 15 that one of the fundamental properties of a transaction is isolation
...
To ensure that it is, the system must
control the interaction among the concurrent transactions; this control is achieved
through one of a variety of mechanisms called concurrency-control schemes
...
That is, all the schemes presented here ensure that the
schedules are serializable
...
In this chapter, we consider the management of
concurrently executing transactions, and we ignore failures
...

16
...
The most common method used to implement
this requirement is to allow a transaction to access a data item only if it is currently
holding a lock on that item
...
1
...
In this section, we
restrict our attention to two modes:
1
...
If a transaction Ti has obtained a shared-mode lock (denoted by S)
on item Q, then Ti can read, but cannot write, Q
...
Exclusive
...


        S       X
S     true    false
X     false   false
Figure 16
...

We require that every transaction request a lock in an appropriate mode on data
item Q, depending on the types of operations that it will perform on Q
...
The transaction can
proceed with the operation only after the concurrency-control manager grants the
lock to the transaction
...
Let A and B represent arbitrary lock modes
...
If transaction Ti can be granted a lock on Q immediately, in spite
of the presence of the mode B lock, then we say mode A is compatible with mode
B
...
The compatibility
relation between the two modes of locking discussed in this section appears in the
matrix comp of Figure 16
...
An element comp(A, B) of the matrix has the value true if
and only if mode A is compatible with mode B
...
At any time, several shared-mode locks can be held simultaneously (by different transactions) on a particular data item
...
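The compatibility test is easy to express in code. This is a minimal sketch that assumes the usual shared/exclusive rule, in which only two shared locks are compatible with each other; the names `COMPATIBLE` and `can_grant` are hypothetical.

```python
# comp(A, B): may mode A be granted while another transaction holds mode B?
COMPATIBLE = {
    ("S", "S"): True,
    ("S", "X"): False,
    ("X", "S"): False,
    ("X", "X"): False,
}


def can_grant(requested, held_modes):
    """A request is compatible only if it is compatible with every lock
    currently held on the item by other transactions."""
    return all(COMPATIBLE[(requested, held)] for held in held_modes)
```

For example, `can_grant("S", ["S", "S"])` is true, whereas `can_grant("X", ["S"])` is false, so the requesting transaction has to wait.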

A transaction requests a shared lock on data item Q by executing the lock-S(Q)
instruction
...
A transaction can unlock a data item Q by the unlock(Q) instruction
...
If the data item is
already locked by another transaction in an incompatible mode, the concurrency-control manager will not grant the lock until all incompatible locks held by other
transactions have been released
...

T1 : lock-X(B);
read(B);
B := B − 50;
write(B);
unlock(B);
lock-X(A);
read(A);
A := A + 50;
write(A);
unlock(A)
...
2

Transaction T1
...

T2 : lock-S(A);
read(A);
unlock(A);
lock-S(B);
read(B);
unlock(B);
display(A + B)
...
3

Transaction T2
...

Note that a transaction must hold a lock on a data item as long as it accesses that item
...

As an illustration, consider again the simpliﬁed banking system that we introduced in Chapter 15
...
Transaction T1 transfers $50 from account B to account A (Figure 16
...

Transaction T2 displays the total amount of money in accounts A and B—that is, the
sum A + B (Figure 16
...

T1              T2                concurrency-control manager
lock-X(B)
                                  grant-X(B, T1)
read(B)
B := B − 50
write(B)
unlock(B)
                lock-S(A)
                                  grant-S(A, T2)
                read(A)
                unlock(A)
                lock-S(B)
                                  grant-S(B, T2)
                read(B)
                unlock(B)
                display(A + B)
lock-X(A)
                                  grant-X(A, T1)
read(A)
A := A + 50
write(A)
unlock(A)
Figure 16
...


T3 : lock-X(B);
read(B);
B := B − 50;
write(B);
lock-X(A);
read(A);
A := A + 50;
write(A);
unlock(B);
unlock(A)
...
5

Transaction T3
...
If these
two transactions are executed serially, either in the order T1 , T2 or the order T2 , T1 ,
then transaction T2 will display the value $300
...
4 is possible
...
The reason for this mistake is that the
transaction T1 unlocked data item B too early, as a result of which T2 saw an inconsistent state
...
The transaction making a lock request cannot execute its next action until the concurrency-control manager grants the lock
...
Exactly when
within this interval the lock is granted is not important; we can safely assume that the
lock is granted just before the following action of the transaction
...
We let you infer when locks are granted
...
Transaction T3 corresponds to T1 with unlocking delayed (Figure 16
...
Transaction T4 corresponds to T2 with unlocking delayed (Figure 16
...

You should verify that the sequence of reads and writes in schedule 1, which leads
to an incorrect total of $250 being displayed, is no longer possible with T3 and T4
...

Figure 16
...

7

Schedule 2
...
T4 will not print out an inconsistent result in any of
them; we shall see why later
...
Consider the partial
schedule of Figure 16
...
Since T3 is holding an exclusive-mode lock
on B and T4 is requesting a shared-mode lock on B, T4 is waiting for T3 to unlock
B
...
Thus, we have arrived at
a state where neither of these transactions can ever proceed with its normal execution
...
When deadlock occurs, the system must roll back
one of the two transactions
...
These data items are then available
to the other transaction, which can continue with its execution
...
6
...
On the other hand, if we do not
unlock a data item before requesting a lock on another data item, deadlocks may
occur
...
1
...
However, in general, deadlocks are a necessary evil associated with locking, if we want to avoid inconsistent states
...

We shall require that each transaction in the system follow a set of rules, called a
locking protocol, indicating when a transaction may lock and unlock each of the data
items
...
The set of all such
schedules is a proper subset of all possible serializable schedules
...
Before doing
so, we need a few deﬁnitions
...
, Tn } be a set of transactions participating in a schedule S
...
If Ti → Tj , then that precedence implies that in any equivalent serial schedule, Ti must appear before Tj
...
16
...
9 to test for conﬂict serializability
...

We say that a schedule S is legal under a given locking protocol if S is a possible
schedule for a set of transactions that follow the rules of the locking protocol
...

16
...
2 Granting of Locks
When a transaction requests a lock on a data item in a particular mode, and no other
transaction has a lock on the same data item in a conﬂicting mode, the lock can be
granted
...
Suppose a
transaction T2 has a shared-mode lock on a data item, and another transaction T1
requests an exclusive-mode lock on the data item
...
Meanwhile, a transaction T3 may request a shared-mode
lock on the same data item
...
At this point T2 may release the lock,
but still T1 has to wait for T3 to ﬁnish
...
In fact, it is possible that there is a sequence of transactions that
each requests a shared-mode lock on the data item, and each transaction releases the
lock a short while after it is granted, but T1 never gets the exclusive-mode lock on the
data item
...

We can avoid starvation of transactions by granting locks in the following manner:
When a transaction Ti requests a lock on a data item Q in a particular mode M , the
concurrency-control manager grants the lock provided that
1
...

2
...

Thus, a lock request will never get blocked by a lock request that is made later
...
1
...
This protocol requires that each transaction issue lock and unlock requests in two phases:
1
...
A transaction may obtain locks, but may not release any lock
...
Shrinking phase
...


16
...
The transaction acquires locks as
needed
...

For example, transactions T3 and T4 are two phase
...
Note that the unlock instructions do not need to appear
at the end of the transaction
...
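The two-phase property can be checked mechanically: once a transaction has issued its first unlock, it may not issue another lock request. The sketch below is a minimal illustration under that reading; the function name `is_two_phase` and the two sample transactions are hypothetical and are not taken from the figures of this chapter.

```python
def is_two_phase(ops):
    """Return True if no lock is requested after the first unlock.

    `ops` is a list of (action, item) pairs, where action is
    "lock-S", "lock-X", or "unlock".
    """
    shrinking = False  # becomes True once the first unlock is seen
    for action, _item in ops:
        if action == "unlock":
            shrinking = True
        elif action.startswith("lock") and shrinking:
            return False  # a lock request in the shrinking phase
    return True


# Hypothetical lock sequences, used only for illustration.
early_unlock = [("lock-X", "B"), ("unlock", "B"),
                ("lock-X", "A"), ("unlock", "A")]
delayed_unlock = [("lock-X", "B"), ("lock-X", "A"),
                  ("unlock", "B"), ("unlock", "A")]

print(is_two_phase(early_unlock))    # False
print(is_two_phase(delayed_unlock))  # True
```

Note that `delayed_unlock` passes even though its unlocks are not the very last actions a transaction could take; the check only forbids acquiring a lock after releasing one.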

We can show that the two-phase locking protocol ensures conﬂict serializability
...
The point in the schedule where the transaction has obtained its ﬁnal lock (the end of its growing phase) is called the lock point of the
transaction
...
We leave the proof as
an exercise for you to do (see Exercise 16
...

Two-phase locking does not ensure freedom from deadlock
...
7), they are deadlocked
...
6
...
Cascading rollback may occur under two-phase locking
...
8
...

Cascading rollbacks can be avoided by a modiﬁcation of two-phase locking called
the strict two-phase locking protocol
...
This requirement ensures that any data written by an
uncommitted transaction are locked in exclusive mode until the transaction commits,
preventing any other transaction from reading the data
...
T5              T6              T7
lock-X(A)
read(A)
lock-S(B)
read(B)
write(A)
unlock(A)
                lock-X(A)
                read(A)
                write(A)
                unlock(A)
                                lock-S(A)
                                read(A)
Figure 16
...

We can easily verify that, with rigorous two-phase locking, transactions can be serialized in the order in which they commit
...

Consider the following two transactions, for which we have shown only some of
the signiﬁcant read and write operations:
T8 : read(a1 );
read(a2 );

...

T9 : read(a1 );
read(a2 );
display(a1 + a2 )
...
Therefore, any concurrent execution of both transactions amounts to a serial
execution
...
Thus, if T8 could initially lock a1 in shared mode, and
then could later change the lock to exclusive mode, we could get more concurrency,
since T8 and T9 could access a1 and a2 simultaneously
...
We shall provide a mechanism for upgrading
a shared lock to an exclusive lock, and downgrading an exclusive lock to a shared
lock
...
Lock conversion cannot be allowed arbitrarily
...

Returning to our example, transactions T8 and T9 can run concurrently under
the reﬁned two-phase locking protocol, as shown in the incomplete schedule of Figure 16
...

T8              T9
lock-S(a1)
                lock-S(a1)
lock-S(a2)
                lock-S(a2)
lock-S(a3)
lock-S(a4)
                unlock(a1)
                unlock(a2)
lock-S(an)
upgrade(a1)
Figure 16
...


16
...
This enforced wait occurs if Q is currently locked by another transaction in
shared mode
...
Further, if exclusive locks are held until the end of the transaction, the schedules are cascadeless
...
However, to obtain conflict-serializable schedules through non-two-phase locking protocols, we need either to
have additional information about the transactions or to impose some structure or
ordering on the set of data items in the database
...

Strict two-phase locking and rigorous two-phase locking (with lock conversions)
are used extensively in commercial database systems
...

• When Ti issues a write(Q) operation, the system checks to see whether Ti
already holds a shared lock on Q
...
Otherwise, the system issues a lock-X(Q) instruction, followed by the write(Q) instruction
...

16
...
4 Implementation of Locking∗∗
A lock manager can be implemented as a process that receives messages from transactions and sends messages in reply
...
Unlock messages require only an acknowledgment in
response, but may result in a grant message to another waiting transaction
...
It uses a hash table, indexed on the name of a data item, to
ﬁnd the linked list (if any) for a data item; this table is called the lock table
...
The record also notes if the request has currently
been granted
...

16
...
10

Lock table
...
10 shows an example of a lock table
...
The lock table uses overﬂow chaining,
so there is a linked list of data items for each entry in the lock table
...
Granted locks are the ﬁlled-in (black) rectangles, while waiting requests
are the empty rectangles
...

It can be seen, for example, that T23 has been granted locks on I912 and I7, and is
waiting for a lock on I4
...

The lock manager processes requests this way:
• When a lock request message arrives, it adds a record to the end of the linked
list for the data item, if the linked list is present
...

It always grants the ﬁrst lock request on a data item
...
Otherwise the request has
to wait
...

• When the lock manager receives an unlock message from a transaction, it
deletes the record for that data item in the linked list corresponding to that
transaction
...
If it can, the lock manager grants that request, and processes the record following it, if any, similarly,
and so on
...
Once the database system has taken appropriate actions to
undo the transaction (see Section 17
...

This algorithm guarantees freedom from starvation for lock requests, since a request can never be granted while a request received earlier is waiting to be granted
...
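The request and unlock handling described above can be sketched as a small in-memory lock table. This is an illustrative sketch only: it assumes just two modes (S and X), uses a Python dictionary in place of the hash table with overflow chaining, and omits lock upgrades and the handling of aborted transactions. The class and method names are hypothetical.

```python
from collections import defaultdict

# Only shared/shared is a compatible pair of modes.
COMPATIBLE = {("S", "S")}


class LockManager:
    def __init__(self):
        # data item -> list of [transaction, mode, granted], in arrival order
        self.table = defaultdict(list)

    def _can_grant(self, queue, index):
        # A request is granted only if every earlier request on the item
        # has been granted and is compatible with the requested mode.
        mode = queue[index][1]
        return all(granted and (held, mode) in COMPATIBLE
                   for _txn, held, granted in queue[:index])

    def lock(self, txn, item, mode):
        queue = self.table[item]
        queue.append([txn, mode, False])
        queue[-1][2] = self._can_grant(queue, len(queue) - 1)
        return queue[-1][2]  # False means the transaction must wait

    def unlock(self, txn, item):
        queue = self.table[item]
        queue[:] = [rec for rec in queue if rec[0] != txn]
        granted_now = []
        for i, rec in enumerate(queue):
            if rec[2]:
                continue
            if not self._can_grant(queue, i):
                break  # later requests must keep waiting as well
            rec[2] = True
            granted_now.append(rec[0])
        return granted_now


lm = LockManager()
print(lm.lock("T2", "Q", "S"))  # True: first request is always granted
print(lm.lock("T1", "Q", "X"))  # False: conflicts with T2's shared lock
print(lm.lock("T3", "Q", "S"))  # False: T1 asked earlier and is waiting
print(lm.unlock("T2", "Q"))     # ['T1']
```

Because a later request is never granted ahead of an earlier waiting one, T3 cannot overtake T1 here, which is the starvation-freedom argument in miniature.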
6
...
Section 18
...
1
describes an alternative implementation — one that uses shared memory instead of
message passing for lock request/grant
...
1
...
1
...
But, if we wish to develop protocols that are
not two phase, we need additional information on how each transaction will access
the database
...
The simplest model requires
that we have prior knowledge about the order in which the database items will be
accessed
...

To acquire such prior knowledge, we impose a partial ordering → on the set
D = {d1 , d2 ,
...
If di → dj , then any transaction accessing both
di and dj must access di before accessing dj
...

The partial ordering implies that the set D may now be viewed as a directed acyclic
graph, called a database graph
...
We will present a
simple protocol, called the tree protocol, which is restricted to employ only exclusive
locks
...

In the tree protocol, the only lock instruction allowed is lock-X
...
The ﬁrst lock by Ti may be on any data item
...
Subsequently, a data item Q can be locked by Ti only if the parent of Q is
currently locked by Ti
...

16
...
Data items may be unlocked at any time
...
A data item that has been locked and unlocked by Ti cannot subsequently be
relocked by Ti
...

To illustrate this protocol, consider the database graph of Figure 16
...
The following four transactions follow the tree protocol on this graph
...

T11 : lock-X(D); lock-X(H); unlock(D); unlock(H)
...

T13 : lock-X(D); lock-X(H); unlock(D); unlock(H)
...
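The tree-protocol rules lend themselves to a mechanical check of a single transaction's lock sequence. The sketch below is illustrative only: the `parent` mapping is an assumption (it records only that B is the parent of D and D is the parent of H, which is what the lock sequence of T11 requires), and the function name is hypothetical.

```python
def follows_tree_protocol(ops, parent):
    """Check one transaction's lock/unlock sequence against the tree protocol.

    `ops` is a list of ("lock-X" | "unlock", item) pairs; `parent` maps each
    item to its parent in the database tree (the root maps to None).
    """
    held = set()         # items currently locked
    ever_locked = set()  # items locked at any point so far
    for action, item in ops:
        if action == "lock-X":
            if item in ever_locked:
                return False  # an item may not be relocked
            if ever_locked and parent.get(item) not in held:
                return False  # after the first lock, the parent must be held
            held.add(item)
            ever_locked.add(item)
        elif action == "unlock":
            if item not in held:
                return False
            held.remove(item)
        else:
            return False      # only lock-X and unlock are allowed
    return True


# Assumed fragment of the database graph, for illustration only.
parent = {"A": None, "B": "A", "D": "B", "H": "D"}

t11 = [("lock-X", "D"), ("lock-X", "H"), ("unlock", "D"), ("unlock", "H")]
bad = [("lock-X", "D"), ("unlock", "D"), ("lock-X", "H"), ("unlock", "H")]

print(follows_tree_protocol(t11, parent))  # True
print(follows_tree_protocol(bad, parent))  # False: D is not held when H is locked
```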
12
...

Observe that the schedule of Figure 16
...
It can be shown
not only that the tree protocol ensures conﬂict serializability, but also that this protocol ensures freedom from deadlock
...
12 does not ensure recoverability and cascadelessness
...
Holding exclusive locks until the end of the transaction reduces concurrency
...
Whenever a transaction Ti performs a read of an uncommitted data
item, we record a commit dependency of Ti on the transaction that performed the

[Figure: database graph with nodes A, B, C, D, E, F, G, H, I, and J]
Figure 16
...


16
...
12

Serializable schedule under the tree protocol
...
Transaction Ti is then not permitted to commit until the
commit of all transactions on which it has a commit dependency
...

The tree-locking protocol has an advantage over the two-phase locking protocol in
that, unlike two-phase locking, it is deadlock-free, so no rollbacks are required
...
Earlier unlocking may lead to shorter waiting times,
and to an increase in concurrency
...
For example, a transaction that needs
to access data items A and J in the database graph of Figure 16
...
This additional locking results in increased
locking overhead, the possibility of additional waiting time, and a potential decrease
in concurrency
...

For a set of transactions, there may be conﬂict-serializable schedules that cannot
be obtained through the tree protocol
...
Examples of such schedules are explored in the exercises
...
16
...
2 Timestamp-Based Protocols
The locking protocols that we have described thus far determine the order between
every pair of conﬂicting transactions at execution time by the ﬁrst lock that both
members of the pair request that involves incompatible modes
...
The most common method for doing so is to use a timestamp-ordering
scheme
...
2
...
This timestamp is assigned by the database system before the transaction Ti starts execution
...
There are two simple
methods for implementing this scheme:
1
...

2
...

The timestamps of the transactions determine the serializability order
...

To implement this scheme, we associate with each data item Q two timestamp
values:
• W-timestamp(Q) denotes the largest timestamp of any transaction that executed write(Q) successfully
...

These timestamps are updated whenever a new read(Q) or write(Q) instruction is
executed
...
2
...
This protocol operates as follows:
1
...

a
...
Hence, the read operation is rejected, and Ti is rolled
back
...
b
...

2
...

a
...
Hence, the system rejects the write operation and rolls Ti
back
...
If TS(Ti ) < W-timestamp(Q), then Ti is attempting to write an obsolete
value of Q
...

c
...

If a transaction Ti is rolled back by the concurrency-control scheme as a result of the issuance of either a read or write operation, the system assigns it a new timestamp and
restarts it
...
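The read and write tests of the timestamp-ordering protocol can be sketched in a few lines. This is a minimal illustration, assuming integer timestamps and an initial timestamp of 0 on every data item; the names `Item`, `Rollback`, `read`, and `write` are hypothetical, and restarting a rolled-back transaction with a new timestamp is left to the caller.

```python
class Rollback(Exception):
    """Raised when the protocol rejects an operation."""


class Item:
    def __init__(self, value):
        self.value = value
        self.r_ts = 0  # R-timestamp(Q)
        self.w_ts = 0  # W-timestamp(Q)


def read(ts, q):
    if ts < q.w_ts:
        # The value this transaction needed has already been overwritten.
        raise Rollback("read rejected")
    q.r_ts = max(q.r_ts, ts)
    return q.value


def write(ts, q, value):
    if ts < q.r_ts:
        # A later transaction has already read the old value.
        raise Rollback("write rejected: value was needed earlier")
    if ts < q.w_ts:
        # The write is obsolete.
        raise Rollback("write rejected: obsolete value")
    q.value = value
    q.w_ts = ts


q = Item(100)
print(read(2, q))        # 100, and R-timestamp(Q) becomes 2
try:
    write(1, q, 50)      # TS = 1 < R-timestamp(Q) = 2
except Rollback as exc:
    print(exc)           # write rejected: value was needed earlier
write(3, q, 70)          # allowed; W-timestamp(Q) becomes 3
```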
Transaction T14
displays the contents of accounts A and B:
T14 : read(B);
read(A);
display(A + B)
...

In presenting schedules under the timestamp protocol, we shall assume that a transaction is assigned a timestamp immediately before its ﬁrst instruction
...
13, TS(T14 ) < TS(T15 ), and the schedule is possible under the timestamp protocol
...
There are, however, schedules that are possible under the two-phase
locking protocol, but are not possible under the timestamp protocol, and vice versa
(see Exercise 16
...

The timestamp-ordering protocol ensures conﬂict serializability
...

The protocol ensures freedom from deadlock, since no transaction ever waits
...
T14                 T15
read(B)
                    read(B)
                    B := B – 50
                    write(B)
read(A)
                    read(A)
display(A + B)
                    A := A + 50
                    write(A)
                    display(A + B)
Figure 16
...

If a transaction is found to be getting restarted repeatedly, conflicting transactions need
to be temporarily blocked to enable the transaction to finish
...
However, it can be
extended to make the schedules recoverable, in one of several ways:
• Recoverability and cascadelessness can be ensured by performing all writes
together at the end of the transaction
...

• Recoverability and cascadelessness can also be guaranteed by using a limited
form of locking, whereby reads of uncommitted items are postponed until the
transaction that updated the item commits (see Exercise 16
...

• Recoverability alone can be ensured by tracking uncommitted writes, and allowing a transaction Ti to commit only after the commit of any transaction that
wrote a value that Ti read
...
1
...

16
...
3 Thomas’ Write Rule
We now present a modiﬁcation to the timestamp-ordering protocol that allows greater
potential concurrency than does the protocol of Section 16
...
2
...
14, and apply the timestamp-ordering protocol
...
The read(Q) operation of T16 succeeds, as does the write(Q) operation of T17
...
Thus, the
write(Q) by T16 is rejected and transaction T16 must be rolled back
...
Since T17 has already written Q, the value that T16 is attempting to
write is one that will never need to be read
...
Any


16
...
14

Schedule 4
...

This observation leads to a modiﬁed version of the timestamp-ordering protocol
in which obsolete write operations can be ignored under certain circumstances
...
The protocol rules for write
operations, however, are slightly different from the timestamp-ordering protocol of
Section 16
...
2
...

1
...
Hence, the system rejects the write operation and rolls Ti back
...
If TS(Ti ) < W-timestamp(Q), then Ti is attempting to write an obsolete value
of Q
...

3
...

The difference between these rules and those of Section 16
...
2 lies in the second
rule
...
However, here, in those cases where TS(Ti )
≥ R-timestamp(Q), we ignore the obsolete write
...
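The write rules under Thomas' write rule differ from plain timestamp ordering only in how an obsolete write is treated. The following sketch illustrates that difference; the dictionary layout and the timestamps 1 and 2 used for T16 and T17 are assumptions made for the example.

```python
def thomas_write(ts, item, value):
    """Apply write(Q) under Thomas' write rule.

    `item` is a dict with keys "value", "r_ts", and "w_ts".
    Returns "rollback", "ignored", or "written".
    """
    if ts < item["r_ts"]:
        return "rollback"  # a later transaction already read the old value
    if ts < item["w_ts"]:
        return "ignored"   # obsolete write: skip it instead of rolling back
    item["value"] = value
    item["w_ts"] = ts
    return "written"


# T16 (assumed timestamp 1) has read Q; T17 (assumed timestamp 2) then writes Q.
q = {"value": 0, "r_ts": 1, "w_ts": 0}
print(thomas_write(2, q, 20))  # written
print(thomas_write(1, q, 10))  # ignored
print(q["value"])              # 20
```

Under the plain timestamp-ordering protocol the second call would have forced T16 to roll back; here the write is simply dropped and T16 continues.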
This modiﬁcation of transactions makes it possible to generate serializable schedules that would not be possible
under the other protocols presented in this chapter
...
14 is not conﬂict serializable and, thus, is not possible under any of two-phase
locking, the tree protocol, or the timestamp-ordering protocol
...
The result is a schedule that is
view equivalent to the serial schedule
...
3 Validation-Based Protocols
In cases where a majority of transactions are read-only transactions, the rate of conﬂicts among transactions may be low
...
A concurrency-control scheme imposes overhead of
code execution and possible delay of transactions
...
16
...
A difﬁculty in reducing the overhead is that
we do not know in advance which transactions will be involved in a conﬂict
...

We assume that each transaction Ti executes in two or three different phases in its
lifetime, depending on whether it is a read-only or an update transaction
...
Read phase
...
It reads
the values of the various data items and stores them in variables local to Ti
...

2
...
Transaction Ti performs a validation test to determine whether it can copy to the database the temporary local variables that hold the
results of write operations without causing a violation of serializability
...
Write phase
...
Otherwise, the system rolls back
Ti
...
However, all
three phases of concurrently executing transactions can be interleaved
...
We shall, therefore, associate three different timestamps with
transaction Ti :
1
...

2
...

3
...

We determine the serializability order by the timestamp-ordering technique, using
the value of the timestamp Validation(Ti )
...
The reason we have
chosen Validation(Ti ), rather than Start(Ti ), as the timestamp of transaction Ti is that
we can expect faster response time provided that conﬂict rates among transactions
are indeed low
...
Finish(Ti ) < Start(Tj )
...

2
...
This condition ensures that


16
...
15

Schedule 5, a schedule produced by using validation
...
Since the writes of Ti do not affect the
read of Tj , and since Tj cannot affect the read of Ti , the serializability order is
indeed maintained
...
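The validation test can be sketched as a function over the three timestamps and the read and write sets of each transaction. This is an illustrative sketch: the `Txn` record, the integer timestamps, and the sample transactions are assumptions, and the second condition is written in its usual form (the writes of Ti do not intersect the reads of Tj, and Start(Tj) < Finish(Ti) < Validation(Tj)).

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Txn:
    start: int
    validation: int
    finish: Optional[int] = None  # None until the write phase completes
    read_set: set = field(default_factory=set)
    write_set: set = field(default_factory=set)


def validate(tj, earlier):
    """Return True if tj passes validation against every Ti with TS(Ti) < TS(Tj)."""
    for ti in earlier:
        if ti.finish is not None and ti.finish < tj.start:
            continue  # condition 1: Ti finished before Tj started
        if (ti.finish is not None
                and tj.start < ti.finish < tj.validation
                and not (ti.write_set & tj.read_set)):
            continue  # condition 2: Ti's writes do not affect Tj's reads
        return False
    return True


# Hypothetical timestamps for a read-only transaction and an updater.
t14 = Txn(start=1, validation=3, finish=4, read_set={"A", "B"})
t15 = Txn(start=2, validation=5, read_set={"A", "B"}, write_set={"A", "B"})
print(validate(t15, [t14]))  # True: T14 writes nothing

writer = Txn(start=1, validation=2, finish=4, write_set={"A"})
reader = Txn(start=3, validation=5, read_set={"A"})
print(validate(reader, [writer]))  # False: reader may have seen a stale A
```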
Suppose that TS(T14 )
< TS(T15 )
...
15
...
Thus, T14 reads the old values of B and A, and this schedule is serializable
...

However, there is a possibility of starvation of long transactions, due to a sequence
of conﬂicting short transactions that cause repeated restarts of the long transaction
...

This validation scheme is called the optimistic concurrency control scheme since
transactions execute optimistically, assuming they will be able to ﬁnish execution
and validate at the end
...

16
...

There are circumstances, however, where it would be advantageous to group several data items, and to treat them as one individual synchronization unit
...
Clearly, executing these locks is
time consuming
...

16
...
16

Granularity hierarchy
...
On the other hand, if transaction Tj needs to access only a few data
items, it should not be required to lock the entire database, since otherwise concurrency is lost
...
We can make one by allowing data items to be of various sizes and deﬁning a hierarchy of data granularities, where the small granularities are nested within
larger ones
...
Note that the
tree that we describe here is signiﬁcantly different from that used by the tree protocol
(Section 16
...
5)
...
In the tree protocol, each node is an independent
data item
...
16, which consists of four levels
of nodes
...
Below it are nodes of type
area; the database consists of exactly these areas
...
Each area contains exactly those ﬁles that are its child nodes
...
Finally, each ﬁle has nodes of type record
...

Each node in the tree can be locked individually
...
When a transaction locks
a node, in either shared or exclusive mode, the transaction also has implicitly locked
all the descendants of that node in the same lock mode
...
16, in exclusive mode, then it has an
implicit lock in exclusive mode on all the records belonging to that file
...

Suppose that transaction Tj wishes to lock record rb6 of ﬁle Fb
...
But, when Tj issues a lock
request for rb6 , rb6 is not explicitly locked! How does the system determine whether
Tj can lock rb6 ? Tj must traverse the tree from the root to record rb6
...


16
...
17

        S       SIX     X
IS      true    true    false
IX      false   false   false
S       true    false   false
SIX     false   false   false
X       false   false   false
Compatibility matrix
...
To do so, it
simply must lock the root of the hierarchy
...
But how does the system determine if the root node can be
locked? One possibility is for it to search the entire tree
...
A more efﬁcient
way to gain this knowledge is to introduce a new class of lock modes, called intention lock modes
...
Intention locks are put
on all the ancestors of a node before that node is locked explicitly
...
A transaction wishing to lock a node—say, Q—must traverse a path in the
tree from the root to Q
...

There is an intention mode associated with shared mode, and there is one with
exclusive mode
...
Similarly,
if a node is locked in intention-exclusive (IX) mode, then explicit locking is being
done at a lower level, with exclusive-mode or shared-mode locks
...
The compatibility function for these lock
modes is in Figure 16
...

The multiple-granularity locking protocol, which ensures serializability, is this:
Each transaction Ti can lock a node Q by following these rules:
1
...
17
...
It must lock the root of the tree ﬁrst, and can lock it in any mode
...
It can lock a node Q in S or IS mode only if it currently has the parent of Q
locked in either IX or IS mode
...
It can lock a node Q in X, SIX, or IX mode only if it currently has the parent of
Q locked in either IX or SIX mode
...
It can lock a node only if it has not previously unlocked any node (that is, Ti
is two phase)
...
It can unlock a node Q only if it currently has none of the children of Q locked
...
16
...

As an illustration of the protocol, consider the tree of Figure 16
...
Then, T18 needs to
lock the database, area A1 , and Fa in IS mode (and in that order), and ﬁnally
to lock ra2 in S mode
...
Then, T19 needs to
lock the database, area A1 , and ﬁle Fa in IX mode, and ﬁnally to lock ra9 in X
mode
...
Then, T20 needs
to lock the database and area A1 (in that order) in IS mode, and ﬁnally to lock
Fa in S mode
...
It can do so after locking the database in S mode
...

Transaction T19 can execute concurrently with T18 , but not with either T20 or T21
...
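The examples above can be reproduced with a small sketch of the compatibility test and of the locks a transaction requests on its way down from the root. The sketch is illustrative only: it encodes the standard five-mode compatibility matrix (IS, IX, S, SIX, X) as an assumption, the node name "DB" for the root is hypothetical, and `lock_plan` handles only S or X locks on the target node.

```python
MODES = ["IS", "IX", "S", "SIX", "X"]

# Row = mode already held by another transaction, column = requested mode.
_ROWS = {
    "IS":  [True,  True,  True,  True,  False],
    "IX":  [True,  True,  False, False, False],
    "S":   [True,  False, True,  False, False],
    "SIX": [True,  False, False, False, False],
    "X":   [False, False, False, False, False],
}


def compatible(held, requested):
    return _ROWS[held][MODES.index(requested)]


def lock_plan(path, mode):
    """Locks to request, root first, to lock the last node of `path` in S or X mode."""
    if mode not in ("S", "X"):
        raise ValueError("this sketch only plans S or X locks on the target")
    intention = "IS" if mode == "S" else "IX"
    return [(node, intention) for node in path[:-1]] + [(path[-1], mode)]


# T18 reads record ra2 of file Fa.
print(lock_plan(["DB", "A1", "Fa", "ra2"], "S"))
# [('DB', 'IS'), ('A1', 'IS'), ('Fa', 'IS'), ('ra2', 'S')]

# T19 holds IX on Fa: T18's IS on Fa is compatible, T20's S on Fa is not.
print(compatible("IX", "IS"))  # True
print(compatible("IX", "S"))   # False
```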
It is particularly
useful in applications that include a mix of
• Short transactions that access only a few data items
• Long transactions that produce reports from an entire ﬁle or set of ﬁles
There is a similar locking protocol that is applicable to database systems in which
data granularities are organized in the form of a directed acyclic graph
...
Deadlock is possible in the protocol that
we have, as it is in the two-phase locking protocol
...
These techniques are referenced in the bibliographical notes
...
5 Multiversion Schemes
The concurrency-control schemes discussed thus far ensure serializability by either
delaying an operation or aborting the transaction that issued the operation
...
These
difﬁculties could be avoided if old copies of each data item were kept in a system
...
When a transaction issues a read(Q) operation, the concurrency-control manager selects one of the versions of Q to be read
...
serializability
...

16
...
1 Multiversion Timestamp Ordering
The most common transaction ordering technique used by multiversion schemes is
timestamping
...
The database system assigns this timestamp before
the transaction starts execution, as described in Section 16
...

With each data item Q, a sequence of versions ...

Each version Qk contains three data ﬁelds:
• Content is the value of version Qk
...

• R-timestamp(Qk ) is the largest timestamp of any transaction that successfully
read version Qk
...
The content ﬁeld of the version holds the value written by Ti
...
It updates the
R-timestamp value of Qk whenever a transaction Tj reads the content of Qk , and
R-timestamp(Qk ) < TS(Tj )
...
The scheme operates as follows
...
Let Qk denote the version of Q whose write timestamp is the
largest write timestamp less than or equal to TS(Ti )
...
If transaction Ti issues a read(Q), then the value returned is the content of
version Qk
...
If transaction Ti issues write(Q), and if TS(Ti ) < R-timestamp(Qk ), then the system rolls back transaction Ti
...
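The two rules can be sketched with a list of versions per data item. This is a minimal illustration under stated assumptions: timestamps are positive integers, every item starts with one version written at timestamp 0, and a write that is not rolled back either overwrites the transaction's own version (when TS(Ti) equals the version's write timestamp) or creates a new version. All class and method names are hypothetical.

```python
class Rollback(Exception):
    """Raised when a write arrives too late."""


class Version:
    def __init__(self, value, ts):
        self.value = value
        self.w_ts = ts  # W-timestamp of this version
        self.r_ts = ts  # R-timestamp of this version


class MVItem:
    def __init__(self, initial):
        self.versions = [Version(initial, 0)]

    def _select(self, ts):
        # Version whose write timestamp is the largest one <= ts.
        return max((v for v in self.versions if v.w_ts <= ts),
                   key=lambda v: v.w_ts)

    def read(self, ts):
        v = self._select(ts)
        v.r_ts = max(v.r_ts, ts)
        return v.value  # reads never fail and never wait

    def write(self, ts, value):
        v = self._select(ts)
        if ts < v.r_ts:
            raise Rollback("a later transaction already read this version")
        if ts == v.w_ts:
            v.value = value  # overwrite the transaction's own version
        else:
            self.versions.append(Version(value, ts))


q = MVItem(100)
q.write(5, 50)
print(q.read(3))  # 100: the version written at timestamp 0
print(q.read(7))  # 50: the version written at timestamp 5
try:
    q.write(6, 60)  # too late: timestamp 7 already read the version from 5
except Rollback as exc:
    print(exc)
```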

The justiﬁcation for rule 1 is clear
...
The second rule forces a transaction to abort if it is “too late”
in doing a write
...

Versions that are no longer needed are removed according to the following rule
...

Then, the older of the two versions Qk and Qj will not be used again, and can be
deleted
...
In typical database systems, where
reading is a more frequent operation than is writing, this advantage may be of major
practical signiﬁcance
...
First, the reading
of a data item also requires the updating of the R-timestamp ﬁeld, resulting in two
potential disk accesses, rather than one
...
This alternative may be
expensive
...
5
...

This multiversion timestamp-ordering scheme does not ensure recoverability and
cascadelessness
...

16
...
2 Multiversion Two-Phase Locking
The multiversion two-phase locking protocol attempts to combine the advantages
of multiversion concurrency control with the advantages of two-phase locking
...

Update transactions perform rigorous two-phase locking; that is, they hold all
locks up to the end of the transaction
...
Each version of a data item has a single timestamp
...

Read-only transactions are assigned a timestamp by reading the current value
of ts-counter before they start execution; they follow the multiversion timestamp-ordering protocol for performing reads
...

When an update transaction reads an item, it gets a shared lock on the item, and
reads the latest version of that item
...
The write is performed on the new version, and the timestamp of the
new version is initially set to a value ∞, a value greater than that of any possible
timestamp
...
Only one update transaction
is allowed to perform commit processing at a time
...
In either case, read-only transactions never
need to wait for locks
...

Versions are deleted in a manner like that of multiversion timestamp ordering
...
Then, the older of the two versions Qk and Qj will not be used again and can
be deleted
...
Multiversion two-phase locking or variations of it are used in some commercial
database systems
...
6 Deadlock Handling
A system is in a deadlock state if there exists a set of transactions such that every
transaction in the set is waiting for another transaction in the set
...
, Tn } such that T0 is waiting for a
data item that T1 holds, and T1 is waiting for a data item that T2 holds, and
...
None of the transactions can make progress in such a situation
...

Rollback of a transaction may be partial: That is, a transaction may be rolled back to
the point where it obtained a lock whose release resolves the deadlock
...
We can
use a deadlock prevention protocol to ensure that the system will never enter a deadlock state
...
As we
shall see, both methods may result in transaction rollback
...

Note that a detection and recovery scheme requires overhead that includes not
only the run-time cost of maintaining the necessary information and of executing the
detection algorithm, but also the potential losses inherent in recovery from a deadlock
...
6
...
One approach ensures that no
cyclic waits can occur by ordering the requests for locks, or requiring all locks to be
acquired together
...

The simplest scheme under the ﬁrst approach requires that each transaction locks
all its data items before it begins execution
...
There are two main disadvantages to this protocol: (1) it is often
hard to predict, before the transaction begins, what data items need to be locked;
(2) data-item utilization may be very low, since many of the data items may be locked
but unused for a long time
...
We have seen one such scheme in the tree protocol, which uses a
partial ordering of data items
...
Once a transaction has locked a particular item, it cannot
request locks on items that precede that item in the ordering
...
There is no need to change the underlying
concurrency-control system if two-phase locking is used: all that is needed is to ensure that locks are requested in the right order
...
In preemption, when a transaction T2 requests a lock that transaction
T1 holds, the lock granted to T1 may be preempted by rolling back of T1 , and granting
of the lock to T2
...
The system uses these timestamps only to decide whether a transaction
should wait or roll back
...
If a transaction
is rolled back, it retains its old timestamp when restarted
...
The wait–die scheme is a nonpreemptive technique
...
Otherwise, Ti is
rolled back (dies)
...
If T22 requests a data item held by T23 , then T22 will
wait
...

2
...
It is a counterpart to the
wait–die scheme
...
Otherwise, Tj is rolled back (Tj is wounded by Ti )
...
If T24 requests a data item held by T23 , then T24
will wait
...
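The two schemes reduce to a single timestamp comparison each, which the sketch below makes explicit. It is illustrative only: the function names are hypothetical, and the timestamps assigned to T22, T23, and T24 are assumed values chosen so that T22 is the oldest and T24 the youngest.

```python
def wait_die(requester_ts, holder_ts):
    """Nonpreemptive: an older requester waits; a younger requester dies."""
    return "wait" if requester_ts < holder_ts else "rollback requester"


def wound_wait(requester_ts, holder_ts):
    """Preemptive: an older requester wounds the holder; a younger one waits."""
    return "rollback holder" if requester_ts < holder_ts else "wait"


# Assumed timestamps (a smaller timestamp means an older transaction).
ts = {"T22": 5, "T23": 10, "T24": 15}

print(wait_die(ts["T22"], ts["T23"]))    # wait
print(wait_die(ts["T24"], ts["T23"]))    # rollback requester
print(wound_wait(ts["T22"], ts["T23"]))  # rollback holder
print(wound_wait(ts["T24"], ts["T23"]))  # wait
```

In both schemes the transaction that is rolled back is the younger of the two, which is why a restarted transaction that keeps its original timestamp eventually becomes the oldest and is no longer rolled back.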

Both the wound–wait and the wait–die schemes avoid starvation: At any time,
there is a transaction with the smallest timestamp
...
Since timestamps always increase, and since transactions are not assigned new timestamps when they are rolled back, a transaction that
is rolled back repeatedly will eventually have the smallest timestamp, at which point
it will not be rolled back again
...

• In the wait–die scheme, an older transaction must wait for a younger one to
release its data item
...
By contrast, in the wound–wait scheme, an older transaction never waits
for a younger transaction
...
• In the wait–die scheme, if a transaction Ti dies and is rolled back because it
requested a data item held by transaction Tj , then Ti may reissue the same
sequence of requests when it is restarted
...
Thus, Ti may die several times before acquiring the
needed data item
...
Transaction Ti is wounded and rolled back because Tj
requested a data item that it holds
...
Thus, there may be fewer rollbacks in the
wound–wait scheme
...

16
...
2 Timeout-Based Schemes
Another simple approach to deadlock handling is based on lock timeouts
...
If the lock has not been granted within that time, the transaction is said to time
out, and it rolls itself back and restarts
...
This scheme falls somewhere between deadlock prevention, where a
deadlock will never occur, and deadlock detection and recovery, which Section 16
...
3
discusses
...
However, in general
it is hard to decide how long a transaction must wait before timing out
...
Too short a wait
results in transaction rollback even when there is no deadlock, leading to wasted resources
...
Hence, the timeout-based
scheme has limited applicability
...
6
...
An algorithm that examines the state
of the system is invoked periodically to determine whether a deadlock has occurred
...
To do so, the
system must:
• Maintain information about the current allocation of data items to transactions, as well as any outstanding data item requests
...

• Recover from the deadlock when the detection algorithm determines that a
deadlock exists
...

[Figure: wait-for graph with vertices T25, T26, T27, and T28]
Figure 16
...

16
...
3
...
This graph consists of a pair G = (V, E), where V is a set of vertices and E is
a set of edges
...
Each
element in the set E of edges is an ordered pair Ti → Tj
...

When transaction Ti requests a data item currently being held by transaction Tj ,
then the edge Ti → Tj is inserted in the wait-for graph
...

A deadlock exists in the system if and only if the wait-for graph contains a cycle
...
To detect deadlocks,
the system needs to maintain the wait-for graph, and periodically to invoke an algorithm that searches for a cycle in the graph
...
18, which
depicts the following situation:
• Transaction T25 is waiting for transactions T26 and T27
...

• Transaction T26 is waiting for transaction T28
...

Suppose now that transaction T28 is requesting an item held by T27
...
19
...
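Deadlock detection on a wait-for graph amounts to searching a directed graph for a cycle. The sketch below uses a depth-first search; it is one possible approach rather than a prescribed algorithm, and the edge from T27 to T26 is an assumption added so that the example graph contains a cycle once T28 starts waiting for T27.

```python
def has_cycle(graph):
    """Depth-first search for a cycle in {node: set_of_successors}."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    colour = {}

    def visit(node):
        colour[node] = GREY
        for nxt in graph.get(node, ()):
            state = colour.get(nxt, WHITE)
            if state == GREY:
                return True  # back edge: nxt is on the current path
            if state == WHITE and visit(nxt):
                return True
        colour[node] = BLACK
        return False

    return any(colour.get(n, WHITE) == WHITE and visit(n) for n in list(graph))


# An edge Ti -> Tj means Ti is waiting for Tj.
wait_for = {
    "T25": {"T26", "T27"},
    "T26": {"T28"},
    "T27": {"T26"},  # assumed edge, for illustration
    "T28": set(),
}
print(has_cycle(wait_for))  # False

wait_for["T28"].add("T27")  # T28 requests an item held by T27
print(has_cycle(wait_for))  # True: T26 -> T28 -> T27 -> T26
```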

Consequently, the question arises: When should we invoke the detection algorithm? The answer depends on two factors:
1
...
How many transactions will be affected by the deadlock?


16
...
19

Wait-for graph with a cycle
...
Data items allocated to deadlocked transactions will be
unavailable to other transactions until the deadlock can be broken
...
In the worst case, we would invoke the
detection algorithm every time a request for allocation could not be granted immediately
...
6
...
2 Recovery from Deadlock
When a detection algorithm determines that a deadlock exists, the system must recover from the deadlock
...
Three actions need to be taken:
1
...
Given a set of deadlocked transactions, we must determine which transaction (or transactions) to roll back to break the deadlock
...
Unfortunately, the term minimum cost is not a precise one
...
How long the transaction has computed, and how much longer the transaction will compute before it completes its designated task
...
How many data items the transaction has used
...
How many more data items the transaction needs for it to complete
...
How many transactions will be involved in the rollback
...
Rollback
...

The simplest solution is a total rollback: Abort the transaction and then
restart it
...
Such partial rollback requires the system
to maintain additional information about the state of all the running transactions
...
The deadlock detection mechanism should decide which locks the selected transaction needs to release in
order to break the deadlock
...
The recovery mechanism must be capable of performing such partial rollbacks
...
See the bibliographical notes for relevant
references
...
Starvation
...
As a result, this transaction never completes its designated task, thus
there is starvation
...
The most common solution is to include
the number of rollbacks in the cost factor
...
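One way to fold the number of rollbacks into the cost factor is sketched below. The cost formula, the weight of 10, and the names `TxnInfo`, `rollback_cost`, and `pick_victim` are hypothetical choices made for illustration; the text does not prescribe a particular formula.

```python
from dataclasses import dataclass


@dataclass
class TxnInfo:
    name: str
    work_done: float     # how long the transaction has computed so far
    items_used: int      # how many data items it has used
    rollbacks: int = 0   # how many times it was already chosen as victim


def rollback_cost(t, rollback_weight=10.0):
    # Each earlier rollback makes the transaction more expensive to pick
    # again, so the same victim cannot be chosen indefinitely.
    return t.work_done + t.items_used + rollback_weight * t.rollbacks


def pick_victim(deadlocked):
    victim = min(deadlocked, key=rollback_cost)
    victim.rollbacks += 1
    return victim


ta = TxnInfo("Ta", work_done=1.0, items_used=1)
tb = TxnInfo("Tb", work_done=5.0, items_used=2)
first = pick_victim([ta, tb])    # Ta is cheapest (cost 2 versus 7)
second = pick_victim([ta, tb])   # Ta now costs 12, so Tb is chosen
```

Without the `rollbacks` term, the second deadlock would again pick Ta, which is the starvation the text warns about.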
7 Insert and Delete Operations
Until now, we have restricted our attention to read and write operations
...
Some transactions
require not only access to existing data items, but also the ability to create new data
items
...
To examine how such transactions affect concurrency control, we introduce these additional operations:
• delete(Q) deletes data item Q from the database
...

An attempt by a transaction Ti to perform a read(Q) operation after Q has been
deleted results in a logical error in Ti
...
It is also a logical error to attempt to delete a nonexistent data item
...
7
...
Let Ii
and Ij be instructions of Ti and Tj , respectively, that appear in schedule S in consecutive order
...
We consider several instructions Ij
...
Ii and Ij conﬂict
...
If Ij comes before Ii , Tj can execute the read operation successfully
...
Ii and Ij conﬂict
...
If Ij comes before Ii , Tj can execute the write operation successfully
...
Ii and Ij conﬂict
...
If Ij comes before Ii , Ti will have a logical error
...
Ii and Ij conﬂict
...
Then, if Ii comes before Ij , a logical error results
for Ti
...
Likewise, if Q existed


16
...

We can conclude the following:
• Under the two-phase locking protocol, an exclusive lock is required on a data
item before that item can be deleted
...
Suppose that transaction Ti issues delete(Q)
...
Hence, the
delete operation is rejected, and Ti is rolled back
...
Hence, this delete operation is rejected, and Ti is rolled
back
...

16
...
2 Insertion
We have already seen that an insert(Q) operation conﬂicts with a delete(Q) operation
...

Since an insert(Q) assigns a value to data item Q, an insert is treated similarly to a
write for concurrency-control purposes:
• Under the two-phase locking protocol, if Ti performs an insert(Q) operation,
Ti is given an exclusive lock on the newly created data item Q
...

16
...
3 The Phantom Phenomenon
Consider transaction T29 that executes the following SQL query on the bank database:
    select sum(balance)
    from account
    where branch-name = 'Perryridge'
Transaction T29 requires access to all tuples of the account relation pertaining to the
Perryridge branch
...
We expect there to be potential for a
conﬂict for the following reasons:


• If T29 uses the tuple newly inserted by T30 in computing sum(balance), then
T29 read a value written by T30
...

• If T29 does not use the tuple newly inserted by T30 in computing sum(balance),
then in a serial schedule equivalent to S, T29 must come before T30
...
T29 and T30 do not access any tuple in
common, yet they conﬂict with each other! In effect, T29 and T30 conﬂict on a phantom
tuple
...
This problem is called the phantom phenomenon
...
”
To ﬁnd all account tuples with branch-name = “Perryridge”, T29 must search either
the whole account relation, or at least an index on the relation
...
However, T29 is an example of a transaction that reads information about what tuples are
in a relation, and T30 is an example of a transaction that updates that information
...

The simplest solution to this problem is to associate a data item with the relation;
the data item represents the information used to ﬁnd the tuples in the relation
...

Transactions, such as T30 , that update the information about what tuples are in a relation would have to lock the data item in exclusive mode
...

Do not confuse the locking of an entire relation, as in multiple granularity locking, with the locking of the data item corresponding to the relation
...
Locking is still required on tuples
...

The major disadvantage of locking a data item corresponding to the relation is
the low degree of concurrency: two transactions that insert different tuples into a
relation are prevented from executing concurrently
...
Any transaction that inserts a
tuple into a relation must insert information into every index maintained on the relation
...
For simplicity we shall only consider B+ -tree indices
...
A query will usually use one or more indices to access a relation
...
In our example, we assume
that there is an index on account for branch-name
...
If T29 reads the same leaf node to locate all tuples
pertaining to the Perryridge branch, then T29 and T30 conﬂict on that leaf node
...

The index-locking protocol takes advantage of the availability of indices on a relation, by turning instances of the phantom phenomenon into conﬂicts on locks on
index leaf nodes
...

• A transaction Ti can access tuples of a relation only after ﬁrst ﬁnding them
through one or more of the indices on the relation
...

• A transaction Ti may not insert, delete, or update a tuple ti in a relation r
without updating all indices on r
...

For insertion and deletion, the leaf nodes affected are those that contain (after
insertion) or contained (before deletion) the search-key value of the tuple
...

• The rules of the two-phase locking protocol must be observed
...

16
...
If every transaction has the
property that it maintains database consistency if executed alone, then serializability ensures that concurrent executions maintain consistency
...
In these cases, weaker levels of consistency are used
...

16
...
1 Degree-Two Consistency
The purpose of degree-two consistency is to avoid cascading aborts without necessarily ensuring serializability
...
A transaction must hold the appropriate lock mode when it
accesses a data item
...
Exclusive locks cannot be released until
the transaction either commits or aborts
...
Indeed, a transaction may read the same data item twice and obtain different results
...

T3                  T4
lock-S(Q)
read(Q)
unlock(Q)
                    lock-X(Q)
                    read(Q)
                    write(Q)
                    unlock(Q)
lock-S(Q)
read(Q)
unlock(Q)
Figure 16
...
20, T3 reads the value of Q before and after that value is written
by T4
...
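The effect of releasing shared locks early can be replayed in a small simulation of the schedule in the figure. The `LockTable` class and the values 100 and 150 are illustrative assumptions; the sketch also assumes that T4 commits at the point where it unlocks Q.

```python
class LockTable:
    """Tiny lock table: shared (S) locks are compatible only with each other."""

    def __init__(self):
        self.held = {}  # item -> (mode, set of transaction names)

    def lock(self, txn, item, mode):
        if item not in self.held:
            self.held[item] = (mode, {txn})
            return True
        held_mode, owners = self.held[item]
        if mode == "S" and held_mode == "S":
            owners.add(txn)
            return True
        return False  # the request would have to wait

    def unlock(self, txn, item):
        _mode, owners = self.held[item]
        owners.discard(txn)
        if not owners:
            del self.held[item]


def replay_schedule(initial_value, written_value):
    db = {"Q": initial_value}
    locks = LockTable()
    t3_reads = []

    # T3: lock-S(Q); read(Q); unlock(Q) -- the shared lock is released at once
    assert locks.lock("T3", "Q", "S")
    t3_reads.append(db["Q"])
    locks.unlock("T3", "Q")

    # T4: lock-X(Q); read(Q); write(Q); unlock(Q)
    assert locks.lock("T4", "Q", "X")
    _old = db["Q"]
    db["Q"] = written_value
    locks.unlock("T4", "Q")

    # T3: lock-S(Q); read(Q); unlock(Q) -- second read of the same item
    assert locks.lock("T3", "Q", "S")
    t3_reads.append(db["Q"])
    locks.unlock("T3", "Q")
    return t3_reads


t3_reads = replay_schedule(100, 150)   # T3 sees 100, then 150
```

Every lock request in the replay is granted immediately, yet T3 observes two different values of Q, which no serial schedule could produce.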

16
...
2 Cursor Stability
Cursor stability is a form of degree-two consistency designed for programs written
in host languages, which iterate over tuples of a relation by using cursors
...

• Any modiﬁed tuples are locked in exclusive mode until the transaction commits
...
Two-phase locking is
not required
...
Cursor stability is used in practice
on heavily accessed relations as a means of increasing concurrency and improving
system performance
...
Thus, the use of cursor stability is limited to specialized situations with simple
consistency constraints
...
8
...
For instance, a
transaction may operate at the level of read uncommitted, which permits the transaction to read records even if they have not been committed
...
For instance, approximate information is usually sufﬁcient for statistics used for query optimization
...

these transactions were to execute in a serializable fashion, they could interfere with
other transactions, causing the others’ execution to be delayed
...

• Repeatable read allows only committed records to be read, and further requires that, between two reads of a record by a transaction, no other transaction is allowed to update the record
...
For instance, when it is searching for records satisfying some conditions, a transaction may ﬁnd some of the
records inserted by a committed transaction, but may not ﬁnd others
...
For instance, between two reads of a record by the
transaction, the records may have been updated by other committed transactions
...

• Read uncommitted allows even uncommitted records to be read
...

16
...
However, since indices
are accessed frequently, they would become a point of great lock contention, leading
to a low degree of concurrency
...
It is perfectly acceptable for a transaction to perform a lookup
on an index twice, and to ﬁnd that the structure of the index has changed in between,
as long as the index lookup returns the correct set of tuples
...

We outline two techniques for managing concurrent access to B+ -trees
...

The techniques that we present for concurrency control on B+ -trees are based on
locking, but neither two-phase locking nor the tree protocol is employed
...

The ﬁrst technique is called the crabbing protocol:
• When searching for a key value, the crabbing protocol ﬁrst locks the root node
in shared mode
...
After acquiring the lock on the child
node, it releases the lock on the parent node
...


• When inserting or deleting a key value, the crabbing protocol takes these actions:
It follows the same protocol as for searching until it reaches the desired
leaf node
...

It locks the leaf node in exclusive mode and inserts or deletes the key
value
...
After performing these actions, it releases the
locks on the node and siblings
...
Otherwise, it releases the lock on the parent
...

The progress of locking while the protocol both goes down the tree and goes back up
(in case of splits, coalescing, or redistribution) proceeds in a similar crab-like manner
...
There is a possibility of deadlocks between search operations coming
down the tree, and splits, coalescing or redistribution propagating up the tree
...
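The search half of the crabbing protocol can be sketched as follows. This is a simplified illustration, not a full B+-tree: the `Node` class and `crabbing_search` are hypothetical names, and an ordinary `threading.Lock` stands in for a shared-mode lock because the Python standard library has no reader-writer lock.

```python
import threading
from bisect import bisect_right


class Node:
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children        # None for a leaf node
        self.lock = threading.Lock()    # stand-in for a shared-mode lock


def crabbing_search(root, key):
    """Descend from the root, locking each child before releasing its parent."""
    node = root
    node.lock.acquire()
    while node.children is not None:
        child = node.children[bisect_right(node.keys, key)]
        child.lock.acquire()    # first acquire the lock on the child ...
        node.lock.release()     # ... then release the lock on the parent
        node = child
    try:
        return key in node.keys
    finally:
        node.lock.release()


left = Node(["Brighton", "Clearview"])
right = Node(["Downtown", "Perryridge"])
root = Node(["Downtown"], children=[left, right])
found = crabbing_search(root, "Clearview")
missing = crabbing_search(root, "Mianus")
```

At most two nodes are locked at any moment during the descent, which is what lets other operations proceed in the rest of the tree.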

The second technique achieves even more concurrency, avoiding even holding the
lock on one node while acquiring the lock on another node, by using a modiﬁed version of B+ -trees called B-link trees; B-link trees require that every node (including internal nodes, not just the leaves) maintain a pointer to its right sibling
...
We shall illustrate
this technique with an example later, but we ﬁrst present the modiﬁed procedures of
the B-link-tree locking protocol
...
Each node of the B+ -tree must be locked in shared mode before it is
accessed
...
If a split occurs concurrently with a lookup,
the desired search-key value may no longer appear within the range of values
represented by a node accessed during lookup
...
However, the system locks leaf
nodes following the two-phase locking protocol, as Section 16
...
3 describes,
to avoid the phantom phenomenon
...
The system follows the rules for lookup to locate the
leaf node into which it will make the insertion or deletion
...

or deletion
...
7
...

• Split
...
3 and makes it the right sibling of the original node
...

Following this, the transaction releases the exclusive lock on the original node
and requests an exclusive lock on the parent, so that it can insert a pointer to
the new node
...
If a node has too few search-key values after a deletion, the node
with which it will be coalesced must be locked in exclusive mode
...
At this point, the transaction
releases the locks on the coalesced nodes
...

Observe this important fact: An insertion or deletion may lock a node, unlock it, and
subsequently relock it
...

As an illustration, consider the B+ -tree in Figure 16
...
Assume that there are two
concurrent operations on this B+ -tree:
1
...
Look up “Downtown”
Let us assume that the insertion operation begins ﬁrst
...

It therefore converts its shared lock on the node to exclusive mode, and creates a
new node
...
” The new node contains the search-key value “Downtown
...
This lookup operation accesses the root, and follows the pointer
Perryridge

Downtown

Brighton

Clearview

Downtown

Figure 16
...

Round Hill


Perryridge

Downtown

Brighton Clearview

Figure 16
...
21
...
It then accesses that node, and obtains a pointer to the
left child
...
” Since this node is currently locked by the insertion operation in
exclusive mode, the lookup operation must wait
...
It completes the insertion, leaving the B+ -tree as in Figure 16
...

The lookup operation proceeds
...
It therefore follows the right-sibling pointer to locate the next node
...
It
can be shown that, if a lookup holds a pointer to an incorrect node, then, by following
right-sibling pointers, the lookup must eventually reach the correct node
...
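Following right-sibling pointers can be sketched as below. The sketch assumes each node records an exclusive upper bound on the keys it may hold (the `high_key` field); that field, the class `BLinkNode`, and the function `lookup_from` are illustrative assumptions rather than details given in the text.

```python
class BLinkNode:
    def __init__(self, keys, high_key=None, right=None):
        self.keys = keys
        self.high_key = high_key   # exclusive upper bound; None means unbounded
        self.right = right         # pointer to the right sibling


def lookup_from(node, key):
    """Start at a possibly stale node and move right until key is in range.

    Returns (found, number_of_sibling_hops).
    """
    hops = 0
    while node.high_key is not None and key >= node.high_key:
        node = node.right
        hops += 1
    return key in node.keys, hops


# State after a split: the original leaf kept the smaller keys and gained
# a right sibling that now holds "Downtown".
new_leaf = BLinkNode(["Downtown"])
old_leaf = BLinkNode(["Brighton", "Clearview"], high_key="Downtown", right=new_leaf)

# A lookup that obtained a pointer to old_leaf before the split still succeeds.
stale = lookup_from(old_leaf, "Downtown")
direct = lookup_from(old_leaf, "Brighton")
```

The lookup never needs to hold a lock on one node while requesting a lock on another; a stale pointer costs only extra sibling hops.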
Coalescing of nodes
during deletion can cause inconsistencies, since a lookup may have read a pointer
to a deleted node from its parent, before the parent node was updated, and may
then try to access the deleted node
...
Leaving nodes uncoalesced avoids such inconsistencies
...
In most databases, however, insertions are more frequent than deletions, so
it is likely that nodes that have too few search-key values will gain additional values
relatively quickly
...
Key-value locking thus
provides increased concurrency
...
In this technique, every index lookup must lock
not only the keys found within the range (or the single key, in case of a point lookup)
but also the next key value — that is, the key value just greater than the last key value
that was within the range
...
Thus, if a transaction attempts to insert a value
that was within the range of the index lookup of another transaction, the two transactions would conﬂict on the key value next to the inserted key value
...
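The next-key rule can be illustrated with a sorted list standing in for the index. The function names and the integer keys are hypothetical, and returning `None` when no larger key exists is a simplification; an actual system would need some end-of-index marker to lock instead.

```python
from bisect import bisect_left, bisect_right


def keys_locked_by_range_lookup(index_keys, low, high):
    """Keys a lookup of [low, high] locks: those in range plus the next key."""
    lo = bisect_left(index_keys, low)
    hi = bisect_right(index_keys, high)
    locked = list(index_keys[lo:hi])
    if hi < len(index_keys):
        locked.append(index_keys[hi])   # key just greater than the range
    return locked


def key_locked_by_insert(index_keys, new_key):
    """An insert locks the key value next to the inserted key."""
    i = bisect_right(index_keys, new_key)
    return index_keys[i] if i < len(index_keys) else None


index_keys = [10, 20, 30, 40]
reader_locks = keys_locked_by_range_lookup(index_keys, 15, 25)

# Inserting 22 falls inside the reader's range; its next key is 30,
# which the reader has locked, so the two transactions conflict.
conflict = key_locked_by_insert(index_keys, 22) in reader_locks

# Inserting 35 lies outside the range; its next key, 40, is not locked.
no_conflict = key_locked_by_insert(index_keys, 35) in reader_locks
```

The insert of 22 would be a phantom for the range lookup, and the shared next key 30 is what turns it into an ordinary lock conflict.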


16
...
10 Summary
• When several transactions execute concurrently in the database, the consistency of data may no longer be preserved
...

• To ensure serializability, we can use various concurrency-control schemes
...
The most common ones are locking protocols, timestamp-ordering schemes, validation techniques, and multiversion schemes
...

• The two-phase locking protocol allows a transaction to lock a new data item
only if that transaction has not yet unlocked any data item
...
In the absence of information
concerning the manner in which data items are accessed, the two-phase locking protocol is both necessary and sufﬁcient for ensuring serializability
...
The rigorous two-phase locking protocol releases
all locks only at the end of the transaction
...
A unique ﬁxed timestamp is
associated with each transaction in the system
...
Thus, if the timestamp of transaction
Ti is smaller than the timestamp of transaction Tj , then the scheme ensures
that the produced schedule is equivalent to a serial schedule in which transaction Ti appears before transaction Tj
...

• A validation scheme is an appropriate concurrency-control method in cases
where a majority of transactions are read-only transactions, and thus the rate
of conﬂicts among these transactions is low
...
The serializability order is determined by the timestamp of the transaction
...
It must, however, pass a validation test to complete
...

• There are circumstances where it would be advantageous to group several
data items, and to treat them as one aggregate data item for purposes of working, resulting in multiple levels of granularity
...
Such a hierarchy can be represented graphically as a tree
...

16
...

The protocol ensures serializability, but not freedom from deadlock
...
When a read
operation is issued, the system selects one of the versions to be read
...
A read operation
always succeeds
...

In multiversion two-phase locking, write operations may result in a lock
wait or, possibly, in deadlock
...
One way to prevent
deadlock is to use an ordering of data items, and to request locks in a sequence
consistent with the ordering
...
To control the preemption, we assign a unique timestamp to each transaction
...
If a transaction is rolled back, it retains its old timestamp when restarted
...

• If deadlocks are not prevented, the system must deal with them by using a
deadlock detection and recovery scheme
...
A system is in a deadlock state if and only if the wait-for graph
contains a cycle
...
It does so by
rolling back one or more transactions to break the deadlock
...
A transaction that inserts a
new tuple into the database is given an exclusive lock on the tuple
...
Such conﬂict cannot be detected if locking is done only on
tuples accessed by the transactions
...
The index-locking technique solves this problem by requiring locks on certain index buckets
...

• Weak levels of consistency are used in some applications where consistency
of query results is not critical, and using serializability would result in queries
adversely affecting transaction processing
...
SQL:1999 allows queries to specify the level of
consistency that they require
...

• Special concurrency-control techniques can be developed for special data
structures
...
These techniques allow nonserializable access to the B+ -tree, but
they ensure that the B+ -tree structure is correct, and ensure that accesses to
the database itself are serializable
...

16
...
1 Show that the two-phase locking protocol ensures conﬂict serializability, and
that transactions can be serialized according to their lock points
...
2 Consider the following two transactions:
T31 : read(A);
read(B);
if A = 0 then B := B + 1;
write(B)
...

Add lock and unlock instructions to transactions T31 and T32 , so that they observe the two-phase locking protocol
...
3 What beneﬁt does strict two-phase locking provide? What disadvantages result?
16
...
5 Most implementations of database systems use strict two-phase locking
...

16
...
Suppose that we
insert a dummy vertex between each pair of vertices
...


16
...

16
...

• Each transaction must follow the rules of the tree protocol
...

Show that the protocol ensures serializability and deadlock freedom
...
9 Consider the following graph-based locking protocol, which allows only exclusive lock modes, and which operates on data graphs that are in the form of
a rooted directed acyclic graph
...

• To lock any other vertex, the transaction must be holding a lock on the
majority of the parents of that vertex
...

16
...

• A transaction can lock any vertex ﬁrst
...

Show that the protocol ensures serializability and deadlock freedom
...
11 Consider a variant of the tree protocol called the forest protocol
...
Each transaction Ti must follow the
following rules:
• The ﬁrst lock in each tree may be on any data item
...

• Data items may be unlocked at any time
...

Show that the forest protocol does not ensure serializability
...
12 Locking is not done explicitly in persistent programming languages
...
Most modern operating systems allow the user to set access protections (no access, read, write) on pages, and memory accesses that violate the
access protections result in a protection violation (see the Unix mprotect command, for example)
...

16
...
23

        S       X       I
S       true    false   false
X       false   false   false
I       false   false   true

Lock-compatibility matrix
...
(Hint: The
technique is similar to that used for hardware swizzling in Section 11
...
4)
...
13 Consider a database system that includes an atomic increment operation, in
addition to the read and write operations
...

The operation
increment(X) by C
sets the value of X to V + C in an atomic step
...
Figure 16
...

a
...

b
...
(Hint: Consider check-clearing transactions in our bank example
...
14 In timestamp ordering, W-timestamp(Q) denotes the largest timestamp of any
transaction that executed write(Q) successfully
...
Would this change in wording make any difference? Explain your
answer
...
15 When a transaction is rolled back under timestamp ordering, it is assigned a
new timestamp
...
16 In multiple-granularity locking, what is the difference between implicit and
explicit locking?
16
...
Why is it useless?
16
...
Provide examples of both situations, and compare the relative amount of concurrency allowed
...
19 Consider the validation-based concurrency-control scheme of Section 16
...

Show that by choosing Validation(Ti ), rather than Start(Ti ), as the timestamp of
transaction Ti , we can expect better response time provided that conﬂict rates
among transactions are indeed low
...

16
...
20 Show that there are schedules that are possible under the two-phase locking
protocol, but are not possible under the timestamp protocol, and vice versa
...
21 For each of the following protocols, describe aspects of practical applications
that would lead you to suggest using the protocol, and aspects that would
suggest not using the protocol:
•
•
•
•
•
•
•

Two-phase locking
Two-phase locking with multiple-granularity locking
The tree protocol
Timestamp ordering
Validation
Multiversion timestamp ordering
Multiversion two-phase locking

16
...
Explain how the commit bit can prevent cascading abort
...
23 Explain why the following technique for transaction execution may provide
better performance than just using strict two-phase locking: First execute the
transaction without acquiring any locks and without performing any writes to
the database, as in the validation-based techniques; but, unlike in the validation
techniques, perform neither validation nor writes on the database
...
(Hint: Consider
waits for disk I/O
...
24 Under what conditions is it less expensive to avoid deadlock than to allow
deadlocks to occur and then to detect them?
16
...

16
...

Give a schedule whereby the timestamp test for a write operation fails and
causes the ﬁrst transaction to be restarted, in turn causing a cascading abort of
the other transaction
...
(Such a situation, where two or more processes carry out actions, but are
unable to complete their task because of interaction with the other processes,
is called a livelock
...
27 Explain the phantom phenomenon
...
28 Devise a timestamp-based protocol that avoids the phantom phenomenon
...
29 Explain the reason for the use of degree-two consistency
...

16
...
30 Suppose that we use the tree protocol of Section 16
...
5 to manage concurrent
access to a B+ -tree
...
Under what circumstances is it possible to release a lock
earlier?
16
...

Bibliographical Notes
Gray and Reuter [1993] provides detailed textbook coverage of transaction-processing
concepts, including concurrency control concepts and implementation details
...

Early textbook discussions of concurrency control and recovery included Papadimitriou [1986] and Bernstein et al
...
An early survey paper on implementation
issues in concurrency control and recovery is presented by Gray [1978]
...
[1976]
...
Other non-two-phase locking protocols that operate on more general graphs are described in Yannakakis et al
...
General discussions concerning locking protocols are offered by Lien and Weinberger
[1978], Yannakakis et al
...
Korth
[1983] explores various lock modes that can be obtained from the basic shared and
exclusive lock modes
...
6 is from Buckley and Silberschatz [1984]
...
8 is from Kedem
and Silberschatz [1983]
...
9 is from Kedem and Silberschatz [1979]
...
10 is from Yannakakis et al
...
Exercise 16
...

The timestamp-based concurrency-control scheme is from Reed [1983]
...
A timestamp algorithm that does not require any
rollback to ensure serializability is presented by Buckley and Silberschatz [1983]
...

The locking protocol for multiple-granularity data items is from Gray et al
...

A detailed description is presented by Gray et al
...
The effects of locking granularity are discussed by Ries and Stonebraker [1977]
...
This approach includes a class of lock modes
called update modes to deal with lock conversion
...
An extension of the protocol to ensure deadlock freedom is presented by Korth [1982]
...

Discussions concerning multiversion concurrency control are offered by Bernstein
et al
...
A multiversion tree-locking algorithm appears in Silberschatz [1982]
...

16
...
Lai
and Wilkinson [1984] describes a multiversion two-phase locking certiﬁer
...
Holt [1971] and Holt [1972] were the ﬁrst to formalize the notion of deadlocks in terms of a graph model similar to the one presented in this chapter
...
[1981a]
...
[1981] and Yannakakis [1981]
...
[1990]
...
[1975]
...

[1995]
...
The techniques presented in Section 16
...
The technique of key-value locking used
in ARIES provides for very high concurrency on B+ -tree access, and is described in
Mohan [1990a] and Mohan and Levine [1992]
...
Ellis [1987] presents a concurrency-control technique for
linear hashing
...

Concurrency-control algorithms for other index structures appear in Ellis [1980a] and
Ellis [1980b]
...
Chapter 17
...
In any failure, information may be lost
...
An integral part of a database
system is a recovery scheme that can restore the database to the consistent state that
existed before the failure
...

17
...
The simplest type of failure is one that does not
result in the loss of information in the system
...
In this chapter, we shall consider
only the following types of failure:
• Transaction failure
...
The transaction can no longer continue with its normal execution because of some internal condition, such as bad input, data not
found, overﬂow, or resource limit exceeded
...
The system has entered an undesirable state (for example,
deadlock), as a result of which a transaction cannot continue with its normal execution
...

• System crash
...

17
...
The content of nonvolatile
storage remains intact, and is not corrupted
...
Well-designed systems have numerous internal
checks, at the hardware and the software level, that bring the system to a halt
when there is an error
...

• Disk failure
...
Copies of the data on other disks,
or archival backups on tertiary media, such as tapes, are used to recover from
the failure
...
Next, we must consider
how these failure modes affect the contents of the database
...

These algorithms, known as recovery algorithms, have two parts:
1
...

2
...

17
...
To understand how to ensure the
atomicity and durability properties of a transaction, we must gain a better understanding of these storage media and their access methods
...
2
...
We review these terms, and introduce another class of storage, called stable storage
...
Information residing in volatile storage does not usually survive system crashes
...
Access to volatile storage is extremely fast, both because of the speed
of the memory access itself, and because it is possible to access any data item
in volatile storage directly
...
Information residing in nonvolatile storage survives system crashes
...
Disks are
used for online storage, whereas tapes are used for archival storage
...

however, are subject to failure (for example, head crash), which may result
in loss of information
...
This is
because disk and tape devices are electromechanical, rather than based entirely on chips, as is volatile storage
...
Other nonvolatile media are normally used only for
backup data
...
1), though nonvolatile, has insufﬁcient capacity for most database systems
...
Information residing in stable storage is never lost (never should
be taken with a grain of salt, since theoretically never cannot be guaranteed—
for example, it is possible, although extremely unlikely, that a black hole may
envelop the earth and permanently destroy all data!)
...
Section 17
...
2 discusses
stable-storage implementation
...
Certain systems provide battery backup, so that some main
memory can survive system crashes and power failures
...

17
...
2 Stable-Storage Implementation
To implement stable storage, we need to replicate the needed information in several nonvolatile storage media (usually disk) with independent failure modes, and
to update the information in a controlled manner to ensure that failure during data
transfer does not damage the needed information
...
The simplest and fastest
form of RAID is the mirrored disk, which keeps two copies of each block, on separate
disks
...

RAID systems, however, cannot guard against data loss due to disasters such as
ﬁres or ﬂooding
...
However, since tapes cannot be carried off-site continually,
updates since the most recent time that tapes were carried off-site could be lost in
such a disaster
...
Since the blocks are output to a remote system as and when
they are output to local storage, once an output operation is complete, the output is
not lost, even in the event of a disaster such as a ﬁre or ﬂood
...
10
...
Block transfer between memory and disk storage
can result in


• Successful completion
...

• Partial failure
...

• Total failure
...

We require that, if a data-transfer failure occurs, the system detects it and invokes
a recovery procedure to restore the block to a consistent state
...
An output operation
is executed as follows:
1
...

2
...

3
...

During recovery, the system examines each pair of physical blocks
...
(Recall that
errors in a disk block, such as a partial write to the block, are detected by storing a
checksum with each block
...
If both blocks contain no detectable
error, but they differ in content, then the system replaces the content of the ﬁrst block
with the value of the second
...
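The comparison rule for a pair of physical blocks can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the book's code: the names `make_block`, `is_valid`, and `recover_pair` are hypothetical, CRC-32 stands in for whatever checksum the disk actually uses, and the damaged-copy branches follow the usual rule of copying the good block over the bad one, which the surviving text above does not spell out.

```python
import zlib

def make_block(data: bytes):
    """Return (data, checksum) as the block would be stored on disk."""
    return data, zlib.crc32(data)

def is_valid(block) -> bool:
    """A block is valid when its stored checksum matches its data."""
    data, checksum = block
    return zlib.crc32(data) == checksum

def recover_pair(first, second):
    """Return the repaired (first, second) pair of physical blocks.

    Assumes at most one of the two copies is damaged.
    """
    if not is_valid(first):
        return second, second      # first copy damaged: take the second
    if not is_valid(second):
        return first, first        # second copy damaged: take the first
    if first != second:
        return second, second      # both valid but different: second replaces first
    return first, second           # identical and valid: nothing to do
```

The last-but-one branch is the case stated in the text: two error-free copies that differ are reconciled by overwriting the first with the second.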

The requirement of comparing every corresponding pair of blocks during recovery
is expensive to meet
...
On recovery, only
blocks for which writes were in progress need to be compared
...
4
...
Although a larger number of copies reduces
the probability of failure even further than two copies do, it is usually reasonable
to simulate stable storage with only two copies
...
2
...

Blocks are the units of data transfer to and from disk, and may contain several data

17
...
1

Block storage operations
...
We shall assume that no data item spans two or more blocks
...

Transactions input information from the disk to main memory, and then output the
information back onto the disk
...
The blocks residing on the disk are referred to as physical blocks; the blocks
residing temporarily in main memory are referred to as buffer blocks
...

Block movements between disk and main memory are initiated through the following two operations:
1
...

2
...

Figure 17
...

Each transaction Ti has a private work area in which copies of all the data items
accessed and updated by Ti are kept
...
Each data item X kept in the work area of transaction Ti is denoted by xi
...
We transfer data by these two operations:
1
...
It executes
this operation as follows:
a
...

b
...

2
...

It executes this operation as follows:
a
...

b
...

Note that both operations may require the transfer of a block from disk to main memory
...
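The interplay of read(X), write(X), input(B), and output(B) can be sketched as below. This is a hedged Python illustration only: the class name `Storage`, the dictionary layout, and the block names are assumptions made for the example, and the sub-steps follow the usual reading of these operations (bring the block into the buffer if it is absent, then copy the item between the buffer block and the transaction's work area).

```python
class Storage:
    """Toy two-level storage: physical blocks on 'disk', buffer blocks in memory."""

    def __init__(self, disk):
        self.disk = disk          # block id -> {item: value}, the physical blocks
        self.buffer = {}          # block id -> {item: value}, the buffer blocks
        # Each data item lives in exactly one block (no item spans blocks).
        self.block_of = {item: b for b, items in disk.items() for item in items}

    def input(self, b):
        """Transfer physical block b into main memory."""
        self.buffer[b] = dict(self.disk[b])

    def output(self, b):
        """Transfer buffer block b back to disk, replacing the physical block."""
        self.disk[b] = dict(self.buffer[b])

    def read(self, work_area, x):
        """Copy data item x from its buffer block into the work area."""
        b = self.block_of[x]
        if b not in self.buffer:
            self.input(b)
        work_area[x] = self.buffer[b][x]

    def write(self, work_area, x):
        """Copy the work-area value of x into its buffer block (not to disk)."""
        b = self.block_of[x]
        if b not in self.buffer:
            self.input(b)
        self.buffer[b][x] = work_area[x]
```

Note that `write` changes only the buffer block; the disk copy changes when `output` is eventually called, which is why the actual output may take place later.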

A buffer block is eventually written out to the disk either because the buffer manager needs the memory space for other purposes or because the database system
wishes to reﬂect the change to B on the disk
...

When a transaction needs to access a data item X for the ﬁrst time, it must execute
read(X)
...
After the transaction accesses X for the ﬁnal time, it must execute write(X) to reﬂect the change to X in the
database itself
...
Thus, the actual output may
take place later
...

17
...
Suppose that a system crash has occurred during the execution of Ti ,
after output(BA ) has taken place, but before output(BB ) was executed, where BA and
BB denote the buffer blocks on which A and B reside
...
This procedure will result in the value of A becoming $900,
rather than $950
...

• Do not reexecute Ti
...
Thus, the system enters an inconsistent state
...
The reason for this difﬁculty is that we have modiﬁed
the database without having assurance that the transaction will indeed commit
...
However, if
Ti performed multiple database modiﬁcations, several output operations may be required, and a failure may occur after some of these modiﬁcations have been made,
but before all of them are made
...
As we shall
see, this procedure will allow us to output all the modiﬁcations made by a committed transaction, despite failures
...
4 and 17
...
In these two sections, we shall assume that

...
We shall describe how to handle concurrently executing transactions later, in
Section 17
...

17
...
The
log is a sequence of log records, recording all the update activities in the database
...
An update log record describes a single database write
...

• Data-item identiﬁer is the unique identiﬁer of the data item written
...

• Old value is the value of the data item prior to the write
...

Other special log records exist to record signiﬁcant events during transaction processing, such as the start of a transaction and the commit or abort of a transaction
...
Transaction Ti has started
...
Transaction Ti has performed a write on data item Xj
...

•
...

•
...

Whenever a transaction performs a write, it is essential that the log record for that
write be created before the database is modiﬁed
...
Also, we have the ability
to undo a modiﬁcation that has already been output to the database
...

For log records to be useful for recovery from system and disk failures, the log
must reside in stable storage
...
In Section 17
...
In Sections 17
...
1 and 17
...
2, we shall introduce two techniques for using the
log to ensure transaction atomicity despite failures
...
As a result, the volume of data stored in the
log may become unreasonably large
...
4
...

17
...
1 Deferred Database Modiﬁcation
The deferred-modiﬁcation technique ensures transaction atomicity by recording all
database modiﬁcations in the log, but deferring the execution of all write operations
of a transaction until the transaction partially commits
...
The version of the deferred-modiﬁcation technique that we describe in this
section assumes that transactions are executed serially
...
If the system crashes before
the transaction completes its execution, or if the transaction aborts, then the information on the log is simply ignored
...
Before Ti starts its execution,
a record <Ti start> is written to the log
...
Finally, when Ti partially commits, a record <Ti commit> is written to the log
...
Since a failure may occur while this updating is
taking place, we must ensure that, before the start of these updates, all the log records
are written out to stable storage
...

Observe that only the new value of the data item is required by the deferred-modification technique
...

To illustrate, reconsider our simpliﬁed banking system
...

Let T1 be a transaction that withdraws $100 from account C:
T1 : read(C);
C := C − 100;
write(C)
...
The portion of the log containing the relevant
information on these two transactions appears in Figure 17
...

There are various orders in which the actual outputs can take place to both the
database system and the log as a result of the execution of T0 and T1
...
Figure 17
...

appears in Figure 17
...
Note that the value of A is changed in the database only after
the record <T0, A, 950> has been placed in the log
...
The recovery scheme uses the following recovery procedure:
• redo(Ti ) sets the value of all data items updated by transaction Ti to the new
values
...

The redo operation must be idempotent; that is, executing it several times must be
equivalent to executing it once
...

After a failure, the recovery subsystem consults the log to determine which transactions need to be redone
...
Thus, if the system
crashes after the transaction completes its execution, the recovery scheme uses the
information in the log to restore the system to a previous consistent state after the
transaction had completed
...
Figure 17
...
Let us suppose that the
Log

Figure 17
...

(a)
Figure 17
...
3, shown at three different times
...
Assume that the crash
occurs just after the log record for the step
write(B)
of transaction T0 has been written to stable storage
...
4a
...
The values of accounts A and B
remain $1000 and $2000, respectively
...

Now, let us assume the crash comes just after the log record for the step
write(C)
of transaction T1 has been written to stable storage
...
4b
...
After this operation is executed, the values of accounts
A and B are $950 and $2050, respectively
...
As
before, the log records of the incomplete transaction T1 can be deleted from the log
...
The log at the time of this crash is as in Figure 17
...
When
the system comes back up, two commit records are in the log: one for T0 and one
for T1
...
After the system executes these
operations, the values of accounts A, B, and C are $950, $2050, and $600, respectively
...
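The three crash scenarios above can be replayed with a small sketch of deferred-modification recovery. It is illustrative Python, not the book's algorithm text: the tuple encoding of log records, the function name `recover_deferred`, and the starting balances passed in by the caller are assumptions made for the example.

```python
def recover_deferred(log, db):
    """Redo every transaction with a commit record; ignore all others.

    Log records are tuples: (T, "start"), (T, item, new_value), (T, "commit").
    """
    committed = {rec[0] for rec in log if rec[1:] == ("commit",)}
    for rec in log:                              # forward scan of the log
        if len(rec) == 3 and rec[0] in committed:
            _txn, item, new_value = rec
            db[item] = new_value                 # setting a value is idempotent
    return db

# Log for T0 (move $50 from A to B) followed by T1 (withdraw $100 from C).
LOG = [
    ("T0", "start"), ("T0", "A", 950), ("T0", "B", 2050), ("T0", "commit"),
    ("T1", "start"), ("T1", "C", 600), ("T1", "commit"),
]
```

Calling `recover_deferred(LOG[:3], ...)` models the crash just after the write(B) record: nothing is redone. `LOG[:6]` models the crash after the write(C) record, where only T0 is redone, and the full log redoes both transactions. Because redo only assigns new values, running it twice gives the same state as running it once.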
Some changes may have been made to the database as a

...
When the system comes up after the second crash, recovery proceeds exactly as in the preceding
examples
...
In other words,
it restarts the recovery actions from the beginning
...

17
...
2 Immediate Database Modiﬁcation
The immediate-modiﬁcation technique allows database modiﬁcations to be output
to the database while the transaction is still in the active state
...
In the event
of a crash or a transaction failure, the system must use the old-value ﬁeld of the
log records described in Section 17
...
The undo operation, described next,
accomplishes this restoration
...
During its execution, any write(X) operation by Ti is preceded by the writing of the appropriate new update record to the log
...

Since the information in the log is used in reconstructing the state of the database,
we cannot allow the actual update to the database to take place before the corresponding log record is written out to stable storage
...
We shall return to this issue in Section 17
...

As an illustration, let us reconsider our simpliﬁed banking system, with transactions T0 and T1 executed one after the other in the order T0 followed by T1
...
5
...
6 shows one possible order in which the actual outputs took place in both
the database system and the log as a result of the execution of T0 and T1
...
5

Portion of the system log corresponding to T0 and T1
...
17
...
6

State of system log and database corresponding to T0 and T1
...
4
...

Using the log, the system can handle any failure that does not result in the loss
of information in nonvolatile storage
...

• redo(Ti ) sets the value of all data items updated by transaction Ti to the new
values
...

The undo and redo operations must be idempotent to guarantee correct behavior
even if a failure occurs during the recovery process
...

• Transaction Ti needs to be redone if the log contains both the record <Ti start>
and the record <Ti commit>
...
Suppose that the system
crashes before the completion of the transactions
...
The
state of the logs for each of these cases appears in Figure 17
...

First, let us assume that the crash occurs just after the log record for the step
write(B)

(Figure: the same log, shown at three different times.)
...
7a)
...
Thus, transaction T0 must be undone, so an undo(T0 ) is performed
...

Next, let us assume that the crash comes just after the log record for the step
write(C)
of transaction T1 has been written to stable storage (Figure 17
...
When the system
comes back up, two recovery actions need to be taken
...
The operation redo(T0 ) must be performed, since the log contains both
the record <T0 start> and the record <T0 commit>
...

Note that the undo(T1 ) operation is performed before the redo(T0 )
...
However, the order of
doing undo operations ﬁrst, and then redo operations, is important for the recovery
algorithm that we shall see in Section 17
...

Finally, let us assume that the crash occurs just after the log record
<T1 commit>
has been written to stable storage (Figure 17
...
When the system comes back up,
both T0 and T1 need to be redone, since the records <T0 start> and <T1 start>
appear in the log, as do the records <T0 commit> and <T1 commit>
...
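The undo and redo decisions for immediate modification can be sketched in the same style. This is a hedged Python illustration: the tuple encoding (T, item, old_value, new_value), the function name `recover_immediate`, and the database states used in the usage note are assumptions for the example, chosen to match the banking transactions T0 and T1.

```python
def recover_immediate(log, db):
    """Undo transactions that started but did not commit, then redo committed ones.

    Log records are tuples: (T, "start"), (T, item, old, new), (T, "commit").
    """
    committed = {rec[0] for rec in log if rec[1:] == ("commit",)}

    # Undo pass: scan backward, restoring old values of uncommitted transactions.
    for rec in reversed(log):
        if len(rec) == 4 and rec[0] not in committed:
            _txn, item, old, _new = rec
            db[item] = old

    # Redo pass: scan forward, reapplying new values of committed transactions.
    for rec in log:
        if len(rec) == 4 and rec[0] in committed:
            _txn, item, _old, new = rec
            db[item] = new
    return db

LOG = [
    ("T0", "start"), ("T0", "A", 1000, 950), ("T0", "B", 2000, 2050), ("T0", "commit"),
    ("T1", "start"), ("T1", "C", 700, 600), ("T1", "commit"),
]
```

With `LOG[:3]` only undo(T0) has any effect and A and B return to 1000 and 2000; with `LOG[:6]` the sketch undoes T1 and redoes T0; with the full log both transactions are redone. Both passes only assign values, so repeating recovery after a second crash yields the same state.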

17
...
3 Checkpoints
When a system failure occurs, we must consult the log to determine those transactions that need to be redone and those that need to be undone
...
There are two major difﬁculties
with this approach:

1
...

2
...
Although redoing them
will cause no harm, it will nevertheless cause recovery to take longer
...
During execution, the
system maintains the log, using one of the two techniques described in Sections 17
...
1
and 17
...
2
...
Output onto stable storage all log records currently residing in main memory
...
Output to the disk all modiﬁed buffer blocks
...
Output onto stable storage a log record <checkpoint>
...

The presence of a <checkpoint> record in the log allows the system to streamline
its recovery procedure
...
For such a transaction, the <Ti commit> record appears in the log before the
<checkpoint> record
...

Thus, at recovery time, there is no need to perform a redo operation on Ti
...
(We continue
to assume that transactions are run serially
...
It can find such a transaction by searching the log backward, from the end of the log, until it finds the first
<checkpoint> record (since we are searching backward, the record found is the final <checkpoint>
record in the log); then it continues the search backward until it finds
the next <Ti start> record
...

Once the system has identiﬁed transaction Ti , the redo and undo operations need
to be applied to only transaction Ti and all transactions Tj that started executing
after transaction Ti
...
The remainder
(earlier part) of the log can be ignored, and can be erased whenever desired
...
For the immediate-modiﬁcation technique, the recovery operations are:
• For all transactions Tk in T that have no <Tk commit> record in the log, execute undo(Tk )
...

Obviously, the undo operation does not need to be applied when the deferred-modiﬁcation technique is being employed
...
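The backward search described above can be sketched as follows, under the section's assumption that transactions run serially. The record encoding and the function name `transactions_to_consider` are hypothetical and serve only to illustrate the search.

```python
def transactions_to_consider(log):
    """Return the transactions that recovery must examine, in log order.

    Records: ("checkpoint",), (T, "start"), (T, item, old, new), (T, "commit").
    """
    # Search backward for the most recent checkpoint record.
    i = len(log) - 1
    while i >= 0 and log[i] != ("checkpoint",):
        i -= 1

    # Continue backward to the nearest start record; that transaction Ti was
    # the one executing when the checkpoint was taken.
    j = i
    while j >= 0 and log[j][1:] != ("start",):
        j -= 1

    # Ti and every transaction that started after it must be considered.
    first = max(j, 0)
    return [rec[0] for rec in log[first:] if rec[1:] == ("start",)]
```

If the log holds no checkpoint record, the search falls off the front of the log and every transaction is returned, which corresponds to scanning the entire log.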
As an illustration, consider the set of transactions {T0 , T1 ,
...
Suppose that the most recent checkpoint took place during
the execution of transaction T67
...
, T100 need to be
considered during the recovery scheme
...

In Section 17
...
3, we consider an extension of the checkpoint technique for concurrent transaction processing
...
5 Shadow Paging
An alternative to log-based crash-recovery techniques is shadow paging
...
3
...

There are, however, disadvantages to the shadow-paging approach, as we shall see,
that limit its use
...

As before, the database is partitioned into some number of ﬁxed-length blocks,
which are referred to as pages
...
Assume that there are
n pages, numbered 1 through n
...
)
These pages do not need to be stored in any particular order on disk (there are many
reasons why they do not, as we saw in Chapter 11)
...
We use a page table, as in Figure 17
...
The page table has n entries—one for each database page
...
The ﬁrst entry contains a pointer to the
ﬁrst page of the database, the second entry points to the second page, and so on
...
8 shows that the logical order of database pages does not need
to correspond to the physical order in which the pages are placed on disk
...

When the transaction starts, both page tables are identical
...
The current page table may be
changed when a transaction performs a write operation
...

Suppose that the transaction Tj performs a write(X) operation, and that X resides
on the ith page
...
If the ith page (that is, the page on which X resides) is not already in main
memory, then the system issues input(X)
...
If this is the first write performed on the ith page by this transaction, then the
system modiﬁes the current page table as follows:
a
...
Usually, the database system has access
to a list of unused (free) pages, as we saw in Chapter 11
...
Figure 17.8  Sample page table
...
It deletes the page found in step 2a from the list of free page frames; it
copies the contents of the ith page to the page found in step 2a
...
It modiﬁes the current page table so that the ith entry points to the page
found in step 2a
...
It assigns the value of xj to X in the buffer page
...
2
...
Steps 1 and 3 here correspond
to steps 1 and 2 in Section 17
...
3
...
(Figure: shadow page table, current page table, and pages on disk.)
Figure 17
...

page table
...
9 shows the shadow and current page tables for a transaction
performing a write to the fourth page of a database consisting of 10 pages
...
When
the transaction commits, the system writes the current page table to nonvolatile storage
...
It is important that the shadow page table
be stored in nonvolatile storage, since it provides the only means of locating database
pages
...
We do
not care whether the current page table is lost in a crash, since the system recovers by
using the shadow page table
...
A simple way of ﬁnding it is to choose one ﬁxed location in stable storage that
contains the disk address of the shadow page table
...
Because of our deﬁnition of the write operation,
we are guaranteed that the shadow page table will point to the database pages corresponding to the state of the database prior to any transaction that was active at the
time of the crash
...
Unlike our log-based schemes, shadow
paging needs to invoke no undo operations
...
Ensure that all buffer pages in main memory that have been changed by the
transaction are output to disk
...
)
2
...
Note that we must not overwrite the
shadow page table, since we may need it for recovery from a crash
...
Output the disk address of the current page table to the ﬁxed location in stable storage containing the address of the shadow page table
...
Therefore, the current page
table has become the shadow page table, and the transaction is committed
...
If the crash occurs after the completion of step 3, the
effects of the transaction will be preserved; no redo operations need to be invoked
...
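The write and commit steps of shadow paging can be sketched as a copy-on-write page table. This is a simplified Python illustration, not the book's procedure verbatim: the class `ShadowPagedDB`, the in-memory dictionary standing in for disk, and the simple free-page counter are assumptions, and a single assignment stands in for the atomic overwrite of the fixed stable-storage location.

```python
class ShadowPagedDB:
    """Toy shadow paging: logical pages map to physical pages via page tables."""

    def __init__(self, pages):
        self.disk = dict(enumerate(pages))       # physical page id -> contents
        self.shadow = list(range(len(pages)))    # committed page table
        self.current = list(self.shadow)         # page table of the running transaction
        self.next_free = len(pages)              # stand-in for the free-page list

    def read(self, i):
        return self.disk[self.current[i]]

    def write(self, i, value):
        # First write to page i by this transaction: copy it to a free page
        # and repoint the current page table, leaving the shadow copy intact.
        if self.current[i] == self.shadow[i]:
            new_id = self.next_free
            self.next_free += 1
            self.disk[new_id] = self.disk[self.current[i]]
            self.current[i] = new_id
        self.disk[self.current[i]] = value

    def commit(self):
        # Stands in for atomically storing the current table's address
        # in the fixed location; the current table becomes the shadow table.
        self.shadow = list(self.current)

    def crash(self):
        # Recovery needs no undo or redo: start again from the shadow table.
        self.current = list(self.shadow)
```

After a crash before `commit`, the page allocated by `write` is no longer referenced by any page table, which is the garbage that the text says must later be collected.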
The overhead of log-record output is eliminated, and recovery from crashes is signiﬁcantly
faster (since no undo or redo operations are needed)
...
The commit of a single transaction using shadow paging
requires multiple blocks to be output—the actual data blocks, the current page
table, and the disk address of the current page table
...

The overhead of writing an entire page table can be reduced by implementing the page table as a tree structure, with page table entries at the leaves
...
The
nodes of the tree are pages and have a high fanout, like B+ -trees
...
When a
page is to be updated for the ﬁrst time, the system changes the entry in the current page table to point to the copy of the page
...
Otherwise, the
system ﬁrst copies it, and updates the copy
...
The process of copying proceeds up to the root of the tree
...

All the other parts of the tree are shared between the shadow and the current
page table, and do not need to be copied
...
However, several pages of the page table
still need to be copied for each transaction, and the log-based schemes continue
to be superior as long as most transactions update only small parts of the
database
...
In Chapter 11, we considered strategies to ensure locality
— that is, to keep related database pages close physically on the disk
...
Shadow paging causes database pages to
change location when they are updated
...
(See the bibliographical notes for
references
...
Each time that a transaction commits, the database pages
containing the old version of data changed by the transaction become inaccessible
...
9, the page pointed to by the fourth entry of the shadow
page table will become inaccessible once the transaction of that example commits
...
Garbage may be created also as a side
effect of crashes
...
This process, called garbage collection,
imposes additional overhead and complexity on the system
...
(See the bibliographical notes for
references
...
In such systems, some logging is usually required, even if shadow
paging is used
...
4
...
It is relatively
easy to extend the log-based recovery schemes to allow concurrent transactions, as
we shall see in Section 17
...
For these reasons, shadow paging is not widely used
...
6 Recovery with Concurrent Transactions
Until now, we considered recovery in an environment where only a single transaction at a time is executing
...
Regardless
of the number of concurrent transactions, the system has a single disk buffer and a
single log
...
We allow immediate modiﬁcation,
and permit a buffer block to have data items updated by one or more transactions
...
17
...
6
...
To roll back a failed transaction, we must undo the updates performed by the
transaction
...
Using the log-based schemes
for recovery, we restore the value by using the undo information in a log record
...
Then, the update performed by T1 will be lost if T0 is rolled back
...
We can ensure this requirement easily by using strict two-phase locking—that
is, two-phase locking with exclusive locks held until the end of the transaction
...
6
...
The system scans the log backward; for every log record of the form <Ti, Xj, V1, V2> found in the log, the system
restores the data item Xj to its old value V1
...

Scanning the log backward is important, since a transaction may have updated a
data item more than once
...
Scanning the log backward sets A correctly to 10
...
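Why the scan must run backward can be seen in a short sketch. It is illustrative Python only: the function name `rollback`, the tuple encoding of log records, and the particular sequence of updates to A (10 to 20, then 20 to 30) are assumptions chosen to match the final value 10 mentioned above.

```python
def rollback(log, txn, db):
    """Undo all updates of txn by scanning the log backward.

    Records: (T, "start"), (T, item, old_value, new_value).
    """
    for rec in reversed(log):
        if rec[0] != txn:
            continue                     # record belongs to another transaction
        if rec[1:] == ("start",):
            break                        # reached the start of txn: done
        if len(rec) == 4:
            _txn, item, old_value, _new_value = rec
            db[item] = old_value         # restore the value before this write
    return db

log = [("T", "start"), ("T", "A", 10, 20), ("T", "A", 20, 30)]
```

Scanning backward applies the old value 20 first and the old value 10 last, so A ends at 10. A forward scan would apply them in the opposite order and leave A at 20.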

If strict two-phase locking is used for concurrency control, locks held by a transaction T may be released only after the transaction has been rolled back as described
...
6
...
Therefore, restoring the old value of the
data item will not erase the effects of any other transaction
...
6
...
4
...
Since we assumed no concurrency,
it was necessary to consider only the following transactions during recovery:
• Those transactions that started after the most recent checkpoint
• The one transaction, if any, that was active at the time of the most recent checkpoint
The situation is more complex when transactions can execute concurrently, since several transactions may have been active at the time of the most recent checkpoint
...
In a concurrent transaction-processing system, we require that the checkpoint log
record be of the form <checkpoint L>, where L is a list of transactions active at the
time of the checkpoint
...

The requirement that transactions must not perform any updates to buffer blocks
or to the log during checkpointing can be bothersome, since transaction processing
will have to halt while a checkpoint is in progress
...
Section 17
...
5 describes fuzzy checkpointing schemes
...
6
...

The system constructs the two lists as follows: Initially, they are both empty
...

• For each record found of the form <Ti start>, if Ti is not in redo-list, then it
adds Ti to undo-list
...
For each transaction Ti in L, if Ti is not in redo-list then it adds
Ti to the undo-list
...
The system rescans the log from the most recent record backward, and performs an undo for each log record that belongs to a transaction Ti on the undo-list
...
The scan
stops when the <Ti start> records have been found for every transaction Ti
in the undo-list
...
The system locates the most recent <checkpoint L> record on the log
...

3
...
It ignores log records of transactions on the undo-list in this
phase
...

After the system has undone all transactions on the undo-list, it redoes those transactions on the redo-list
...
When
the recovery process has completed, transaction processing resumes
...

Suppose that data item A initially has the value 10
...
Suppose that another transaction Tj then updated data item A to 30 and
committed, following which the system crashed
...
The final value of A should be 30, which we can ensure
by performing undo before performing redo
...
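The construction of the two lists, and the reason undo precedes redo, can be sketched as below. This is a hedged Python illustration: the function names, the tuple encoding with ("checkpoint", L) records, and the simplification of scanning the whole log in each pass (rather than stopping at start records or beginning at the checkpoint) are assumptions made to keep the example short.

```python
def build_lists(log):
    """Scan backward to the most recent checkpoint, building redo and undo lists."""
    redo_list, undo_list = [], []
    for rec in reversed(log):
        if rec[0] == "checkpoint":
            for t in rec[1]:                       # transactions active at checkpoint
                if t not in redo_list:
                    undo_list.append(t)
            break
        if rec[1:] == ("commit",):
            redo_list.append(rec[0])
        elif rec[1:] == ("start",) and rec[0] not in redo_list:
            undo_list.append(rec[0])
    return redo_list, undo_list

def recover(log, db):
    """Undo first (backward), then redo (forward)."""
    redo_list, undo_list = build_lists(log)
    for rec in reversed(log):
        if len(rec) == 4 and rec[0] in undo_list:
            db[rec[1]] = rec[2]                    # restore old value
    for rec in log:
        if len(rec) == 4 and rec[0] in redo_list:
            db[rec[1]] = rec[3]                    # reapply new value
    return db

# Ti set A from 10 to 20 and aborted; Tj then set A to 30 and committed.
log = [("Ti", "start"), ("Ti", "A", 10, 20),
       ("Tj", "start"), ("Tj", "A", 10, 30), ("Tj", "commit")]
```

For this log, the undo pass sets A back to 10 and the redo pass then sets it to 30. Swapping the two passes would leave A at 10, which is the wrong final value.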
7 Buffer Management
In this section, we consider several subtle details that are essential to the implementation of a crash-recovery scheme that ensures data consistency and imposes a minimal
amount of overhead on interactions with the database
...
7
...
This assumption imposes a high overhead on system execution for several
reasons: Typically, output to stable storage is in units of blocks
...
Thus, the output of each log record translates to
a much larger output at the physical level
...
2
...

The cost of performing the output of a block to stable storage is sufﬁciently high
that it is desirable to output multiple log records at once
...
Multiple log records can be gathered in the log buffer, and
output to stable storage in a single output operation
...

As a result of log buffering, a log record may reside in only main memory (volatile
storage) for a considerable time before it is output to stable storage
...
• Transaction Ti enters the commit state after the <Ti commit> log record has
been output to stable storage
...

• Before a block of data in main memory can be output to the database (in nonvolatile storage), all log records pertaining to data in that block must have
been output to stable storage
...
(Strictly speaking,
the WAL rule requires only that the undo information in the log have been
output to stable storage, and permits the redo information to be written later
...
)
The three rules state situations in which certain log records must have been output
to stable storage
...
Thus, when the system ﬁnds it necessary to output a log record to
stable storage, it outputs an entire block of log records, if there are enough log records
in main memory to ﬁll a block
...

Writing the buffered log to disk is sometimes referred to as a log force
...
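The rules above can be sketched with a small log buffer that is forced before a commit completes and before any data block is written. This is an illustrative Python sketch, not a real system's interface: the class `WalBuffer` and its method names are hypothetical, and forcing the entire buffer is a simplification of outputting only the records that pertain to the block.

```python
class WalBuffer:
    """Toy log buffer that enforces write-ahead logging."""

    def __init__(self):
        self.log_buffer = []     # log records still in volatile main memory
        self.stable_log = []     # log records already on stable storage
        self.disk = {}           # block id -> contents in the database on disk

    def append(self, record):
        """Add a log record to the in-memory log buffer."""
        self.log_buffer.append(record)

    def force_log(self):
        """Log force: write all buffered records to stable storage, in order."""
        self.stable_log.extend(self.log_buffer)
        self.log_buffer.clear()

    def commit(self, txn):
        """The transaction is committed only once its commit record is stable."""
        self.append((txn, "commit"))
        self.force_log()

    def output_block(self, block_id, contents):
        """Write a data block, forcing the log first as WAL requires."""
        self.force_log()
        self.disk[block_id] = contents
```

Because `force_log` writes records in the order they were appended, all earlier records of a transaction reach stable storage before its commit record does.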
7
...
2, we described the use of a two-level storage hierarchy
...
Since main memory is typically much smaller than the entire
database, it may be necessary to overwrite a block B1 in main memory when another
block B2 needs to be brought into memory
...
As discussed in Section 11
...
1 in Chapter 11, this
storage hierarchy is the standard operating system concept of virtual memory
...
If the input of block B2 causes block B1 to be chosen for output, all log
records pertaining to data in B1 must be output to stable storage before B1 is output
...

• Output block B1 to disk
...

It is important that no writes to the block B1 be in progress while the system carries out this sequence of actions
...
The lock can be released immediately after the update has been performed
...
It releases the lock once the block output has
completed
...
Latches
are treated as distinct from locks used by the concurrency-control system
...

To illustrate the need for the write-ahead logging requirement, consider our banking example with transactions T0 and T1
...
Assume that the block on which B resides is
not in main memory, and that main memory is full
...
If the system outputs this block to disk and
then a crash occurs, the values in the database for accounts A, B, and C are $950,
$2000, and $700, respectively
...
However, because
of the WAL requirements, the log record
<T0, A, 1000, 950>
must be output to stable storage prior to output of the block on which A resides
...

17
...
3 Operating System Role in Buffer Management
We can manage the database buffer by using one of two approaches:
1
...
The database system manages
data-block transfer in accordance with the requirements in Section 17
...
2
...
The buffer must be kept small enough that other applications have
sufﬁcient main memory available for their needs
...
Likewise, nondatabase applications may not use
that part of main memory reserved for the database buffer, even if some of the
pages in the database buffer are not being used
...
The database system implements its buffer within the virtual memory provided by the operating system
...

But, to ensure the write-ahead logging requirements in Section 17
...
1, the operating system should not write out the database buffer pages itself, but in-

...

The database system in turn would force-output the buffer blocks to the database, after writing relevant log records to stable storage
...
The operating system reserves space on disk
for storing virtual-memory pages that are not currently in main memory; this
space is called swap space
...

Therefore, if the database buffer is in virtual memory, transfers between
database ﬁles and the buffer in virtual memory must be managed by the
database system, which enforces the write-ahead logging requirements that
we discussed
...
If a block Bx is
output by the operating system, that block is not output to the database
...
When the database system needs to output Bx , the operating system may
need ﬁrst to input Bx from its swap space
...

Although both approaches suffer from some drawbacks, one or the other must
be chosen unless the operating system is designed to support the requirements of
database logging
...

17
...
Although failures in which the content of nonvolatile storage is lost
are rare, we nevertheless need to be prepared to deal with this type of failure
...
Our discussions apply as well to other
nonvolatile storage types
...
For example, we may dump the database to one or
more magnetic tapes
...
Once this restoration has been accomplished, the system uses the log
to bring the database system to the most recent consistent state
...
Output all log records currently residing in main memory onto stable storage
...
Output all buffer blocks onto the disk
...
Copy the contents of the database to stable storage
...
Output a log record <dump> onto the stable storage
...
4
...

To recover from the loss of nonvolatile storage, the system restores the database
to disk by using the most recent dump
...
Notice that
no undo operations need to be executed
...

Dumps of a database and checkpointing of buffers are similar
...

First, the entire database must be copied to stable storage, resulting in considerable
data transfer
...
Fuzzy dump schemes have been developed, which allow transactions to be active while the dump is in progress
...

17
...
6 require that, once a transaction updates a data item, no other transaction may update the same data item until the ﬁrst
commits or is rolled back
...

Although strict two-phase locking is acceptable for records in relations, as discussed
in Section 16
...

To increase concurrency, we can use the B+ -tree concurrency-control algorithm described in Section 16
...

As a result, however, the recovery techniques from Section 17
...
Several alternative recovery techniques, applicable even with early lock release, have been proposed
...
We ﬁrst describe an advanced recovery scheme supporting early lock release
...
ARIES is more complex than our advanced recovery scheme, but
incorporates a number of optimizations to minimize recovery time, and provides a
number of other useful features
...
9
...
Consider a transaction T
that inserts an entry into a B+ -tree, and, following the B+ -tree concurrency-control
protocol, releases some locks after the insertion operation completes, but before the
transaction commits
...

For this reason,
the B+ -tree concurrency-control protocol in Section 16
...

Now let us consider how to perform transaction rollback
...
Instead, the insertion operation has to be undone by a logical undo—that is,
in this case, by the execution of a delete operation
...
For example, if the operation inserted an entry in a B+ -tree, the undo information U would
indicate that a deletion operation is to be performed, and would identify the B+ -tree
and what to delete from the tree
...
In contrast, logging of old-value and new-value information
is called physical logging, and the corresponding log records are called physical log
records
...
Before a logical operation begins, it writes a log record <Ti, Oj, operation-begin>, where Oj is the unique identifier for the operation
...
Thus, the usual old-value and new-value information is written out for each update
...

17
...
2 Transaction Rollback
First consider transaction rollback during normal operation (that is, not during recovery from system failure)
...
Unlike rollback
in normal operation, however, rollback in our advanced recovery scheme writes out
special redo-only log records of the form <Ti, Xj, V> containing the value V being
restored to data item Xj during the rollback
...
Such records do not need undo information, since we will
never need to undo such an undo operation
...
It rolls back the operation by using the undo information U in the log record
...
In other words,
the system logs physical undo information for the updates performed during

© The McGraw−Hill
Companies, 2001

rollback, instead of using compensation log records
...
9
...

At the end of the operation rollback, instead of generating a log record
< Ti , Oj , operation-end, U >, the system generates a log record < Ti , Oj ,
operation-abort>
...
When the backward scan of the log continues, the system skips all log records
of the transaction until it finds the log record <Ti, Oj, operation-begin>
...

Observe that skipping over physical log records when the operation-end log record
is found during rollback ensures that the old values in the physical log record are not
used for rollback, once the operation completes
...
These preceding log records
must be skipped to prevent multiple rollback of the same operation, in case there had
been a crash during an earlier rollback, and the transaction had already been partly
rolled back
...

If failures occur while a logical operation is in progress, the operation-end log
record for the operation will not be found when the transaction is rolled back
...
The physical log
records will be used to roll back the incomplete operation
...
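
The rollback rules above can be sketched in Python. This is a minimal illustration under stated assumptions, not the book's implementation: the tuple layouts, the function name `rollback`, and the record tags `op_begin`, `op_end`, `op_abort`, and `redo_only` are hypothetical, and a counter with a logical "add delta" operation (whose logical undo is "subtract delta") stands in for the B+-tree insertion discussed in the text.

```python
# Hypothetical record layouts (assumptions for illustration only):
#   ("start", T)  ("op_begin", T, O)  ("update", T, X, old, new)
#   ("op_end", T, O, U) where U = (item, delta) means "subtract delta from item"
#   ("redo_only", T, X, V)  ("op_abort", T, O)  ("abort", T)

def rollback(txn, log, db):
    """Roll back txn by scanning the log backward, appending records as it goes."""
    skip_op = None                        # operation whose records are being skipped
    for rec in reversed(list(log)):       # snapshot: ignore records appended below
        kind, owner = rec[0], rec[1]
        if owner != txn:
            continue
        if skip_op is not None:
            # Skip everything up to the matching operation-begin record.
            if kind == "op_begin" and rec[2] == skip_op:
                skip_op = None
            continue
        if kind == "update":
            # Physical record of an operation that never completed: restore old value.
            _, _, item, old, _new = rec
            db[item] = old
            log.append(("redo_only", txn, item, old))
        elif kind == "op_end":
            # Completed operation: undo it logically, logging the update physically.
            _, _, op, (item, delta) = rec
            before = db[item]
            db[item] = before - delta
            log.append(("update", txn, item, before, db[item]))
            log.append(("op_abort", txn, op))
            skip_op = op
        elif kind == "op_abort":
            skip_op = rec[2]              # operation was rolled back before a crash
        elif kind == "start":
            log.append(("abort", txn))
            break

# T1 adds 5 to C and releases its lock early; T2 then adds 10 to C.
db = {"C": 15}
log = [
    ("start", "T1"), ("op_begin", "T1", "O1"),
    ("update", "T1", "C", 0, 5), ("op_end", "T1", "O1", ("C", 5)),
    ("start", "T2"), ("op_begin", "T2", "O2"),
    ("update", "T2", "C", 5, 15), ("op_end", "T2", "O2", ("C", 10)),
]
rollback("T1", log, db)
```

In this example, restoring the old value 0 from T1's physical record would wipe out T2's update; the logical undo instead leaves C at 10, which is why the operation-end record carries undo information rather than old values.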
9
...
6
...
It outputs to stable storage all log records currently residing in main memory
...
It outputs to the disk all modiﬁed buffer blocks
...
It outputs onto stable storage a log record <checkpoint L>, where L is a list of
all active transactions
...
9
...
In the redo phase, the system replays updates of all transactions by scanning the log forward from the last checkpoint
...
tem crash, and those that had not committed when the system crash occurred
...
This phase also determines all transactions that
are either in the transaction list in the checkpoint record, or started later, but
did not have either a <Ti abort> or a <Ti commit> record in the log
...

2
...
It
performs rollback by scanning the log backward from the end
...
Thus, log records of a transaction preceding an operation-end record, but after the corresponding operation-begin record, are ignored
...
Scanning of the log stops
when the system has found log records for all transactions in the
undo-list
...
In other words, this phase of restart recovery repeats all
the update actions that were executed after the checkpoint, and whose log records
reached the stable log
...
The actions are repeated in the
same order in which they were carried out; hence, this process is called repeating
history
...

Note that if an operation undo was in progress when the system crash occurred,
the physical log records written during operation undo would be found, and the partial operation undo would itself be undone on the basis of these physical log records
...
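
The two phases of restart recovery can be sketched as follows. This is a simplified illustration, not the book's code: it assumes hypothetical tuple records, handles only physical log records (logical undo and operation-begin/operation-end skipping are left out to keep it short), and treats the database as a dictionary.

```python
# Hypothetical record layouts (assumptions for illustration only):
#   ("start", T)  ("update", T, X, old, new)  ("redo_only", T, X, V)
#   ("commit", T)  ("abort", T)  ("checkpoint", (T1, T2, ...))

def restart_recovery(log, db):
    """Redo phase (repeat history), then undo phase; returns the undo-list."""
    checkpoints = [i for i, rec in enumerate(log) if rec[0] == "checkpoint"]
    if checkpoints:
        start = checkpoints[-1] + 1
        undo_list = set(log[checkpoints[-1]][1])
    else:
        start, undo_list = 0, set()

    # Redo phase: replay every update after the checkpoint, in log order.
    for rec in log[start:]:
        kind = rec[0]
        if kind == "update":
            db[rec[2]] = rec[4]
        elif kind == "redo_only":
            db[rec[2]] = rec[3]
        elif kind == "start":
            undo_list.add(rec[1])
        elif kind in ("commit", "abort"):
            undo_list.discard(rec[1])

    # Undo phase: scan backward until every transaction in undo-list is rolled back.
    pending = set(undo_list)
    for rec in reversed(list(log)):
        if not pending:
            break
        if rec[0] == "checkpoint" or rec[1] not in pending:
            continue
        if rec[0] == "update":
            db[rec[2]] = rec[3]
            log.append(("redo_only", rec[1], rec[2], rec[3]))
        elif rec[0] == "start":
            log.append(("abort", rec[1]))
            pending.discard(rec[1])
    return undo_list

db = {"A": 50, "B": 20}                   # disk state as of the checkpoint
log = [
    ("start", "T1"), ("update", "T1", "A", 100, 50),
    ("checkpoint", ("T1",)),
    ("start", "T2"), ("update", "T2", "B", 20, 30), ("commit", "T2"),
    ("update", "T1", "A", 50, 40),
]
undone = restart_recovery(log, db)
```

Here T2 committed, so its update to B is redone and kept, while T1's updates are first redone (repeating history) and then rolled back.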

17
...
5 Fuzzy Checkpointing
The checkpointing technique described in Section 17
...
3 requires that all updates to
the database be temporarily suspended while the checkpoint is in progress
...

To avoid such interruptions, the checkpointing technique can be modiﬁed to permit updates to start once the checkpoint record has been written, but before the modiﬁed buffer blocks are written to disk
...

Since pages are output to disk only after the checkpoint record has been written, it
is possible that the system could crash before all pages are written
...
One way to deal with incomplete checkpoints is this:
The location in the log of the checkpoint record of the last completed checkpoint

is stored in a ﬁxed position, last-checkpoint, on disk
...
Instead, before it writes the
checkpoint record, it creates a list of all modiﬁed buffer blocks
...

Even with fuzzy checkpointing, a buffer block must not be updated while it is
being output to disk, although other buffer blocks may be updated concurrently
...
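
The handling of incomplete checkpoints described above can be sketched as follows. The sketch assumes that the `last-checkpoint` location is advanced only after every block on the list has been written; the dictionaries, the function name `fuzzy_checkpoint`, and the `crash_after` test hook are hypothetical. It does not model the rule that a block must not be updated while it is being output.

```python
def fuzzy_checkpoint(log, buffer, disk, active_txns, crash_after=None):
    """Take a fuzzy checkpoint; return True if it completed.

    crash_after=n simulates a crash after n blocks were written (test hook only).
    """
    # 1. List the modified blocks before writing the checkpoint record.
    dirty = [block_id for block_id, blk in buffer.items() if blk["dirty"]]
    # 2. Write the checkpoint record; updates may resume from here on.
    log.append(("checkpoint", tuple(active_txns)))
    position = len(log) - 1
    # 3. Output the listed blocks.
    for written, block_id in enumerate(dirty):
        if crash_after is not None and written == crash_after:
            return False                  # last_checkpoint still names the old one
        disk["blocks"][block_id] = buffer[block_id]["data"]
        buffer[block_id]["dirty"] = False
    # 4. Only now does the fixed location point at the new checkpoint record.
    disk["last_checkpoint"] = position
    return True

disk = {"blocks": {}, "last_checkpoint": None}
buffer = {"B1": {"data": "x", "dirty": True}, "B2": {"data": "y", "dirty": True}}
log = [("start", "T1")]
interrupted = fuzzy_checkpoint(log, buffer, disk, ["T1"], crash_after=1)
pointer_after_crash = disk["last_checkpoint"]
completed = fuzzy_checkpoint(log, buffer, disk, ["T1"])
```

Because recovery starts from the record named by `last_checkpoint`, an interrupted checkpoint is simply never used.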

Note that, in our scheme, logical logging is used only for undo purposes, whereas
physical logging is used for redo and undo purposes
...
To perform logical redo, the database state on
disk must be operation consistent, that is, it should not have partial effects of any
operation
...
Therefore, logical redo logging is usually restricted only
to operations that affect a single page; we will see how to handle such logical redos
in Section 17
...
6
...

17
...
6 ARIES
The state of the art in recovery methods is best illustrated by the ARIES recovery
method
...
In contrast, ARIES uses a number of techniques to reduce the
time taken for recovery, and to reduce the overheads of checkpointing
...
The price paid is greater
complexity; the beneﬁts are worth the price
...
Uses a log sequence number (LSN) to identify log records, and stores
LSNs in database pages to identify which operations have been applied to a
database page
...
Supports physiological redo operations, which are physical in that the affected page is physically identiﬁed, but can be logical within the page
...
With physical redo logging, all bytes of the page affected by the shifting of records must
be logged
...
Redo of the deletion operation would
delete the record and shift other records as required
...
3
...
Dirty
pages are those that have been updated in memory and whose disk version is
not up to date
...
Uses a fuzzy checkpointing scheme that records only information about dirty
pages and associated information, and does not even require writing of dirty
pages to disk
...

In the rest of this section we provide an overview of ARIES
...

17
...
6
...
The number is conceptually just a logical identiﬁer whose value is greater
for log records that occur later in the log
...
Typically, ARIES splits a
log into multiple log ﬁles, each of which has a ﬁle number
...
The LSN then consists of a
ﬁle number and an offset within the ﬁle
...
Whenever an operation (whether physical or logical) occurs on a page, the operation stores the LSN of
its log record in the PageLSN ﬁeld of the page
...
In combination with a scheme for recording PageLSNs as part of checkpointing, which we
present later, ARIES can avoid even reading many pages for which logged operations
are already reﬂected on disk
...

The PageLSN is essential for ensuring idempotence in the presence of physiological redo operations, since reapplying a physiological redo that has already been applied to a page could cause incorrect changes to a page
...
Therefore, ARIES uses latches on buffer pages to prevent them from being written to disk while they are being updated
...

Each log record also contains the LSN of the previous log record of the same transaction
...
There are special redo-only
log records generated during transaction rollback, called compensation log records
(CLRs) in ARIES
...
In addition CLRs serve the role of the operation-abort
log records in our scheme
...
This ﬁeld serves the same purpose as the operation identiﬁer in the
operation-abort log record in our scheme, which helps to skip over log records that
have already been rolled back
...
For each page, it stores the PageLSN and a ﬁeld
called the RecLSN which helps identify log records that have been applied already
to the version of the page on disk
...
Whenever the page is ﬂushed to disk, the page is removed from the
DirtyPageTable
...
For each transaction, the checkpoint log record also notes LastLSN, the LSN of
the last log record written by the transaction
...

17
...
6
...

• Analysis pass: This pass determines which transactions to undo, which pages
were dirty at the time of the crash, and the LSN from which the redo pass
should start
...

• Undo pass: This pass rolls back all transactions that were incomplete at the
time of crash
...
It then sets RedoLSN to the minimum
of the RecLSNs of the pages in the DirtyPageTable
...
The redo pass starts its scan
of the log from RedoLSN
...
The analysis pass initially sets the list of
transactions to be undone, undo-list, to the list of transactions in the checkpoint log
record
...

The analysis pass continues scanning forward from the checkpoint
...
Whenever it ﬁnds a transaction end log record, it deletes the transaction
from undo-list
...
The analysis pass also keeps track of the last record
of each transaction in undo-list, which is used in the undo pass
...
The analysis pass also updates DirtyPageTable whenever it ﬁnds a log record for
an update on a page
...
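
The analysis pass can be sketched as follows. This is an illustrative reading, not ARIES itself: it assumes that an LSN is simply the index of a record in a Python list, that the checkpoint record carries a `dirty_pages` table (page to RecLSN) and a `txns` table (transaction to LastLSN), and that RedoLSN falls back to the checkpoint's own LSN when no page is dirty. All field and function names are hypothetical.

```python
def analysis_pass(log, checkpoint_lsn):
    """Return (RedoLSN, DirtyPageTable, undo-list mapping txn -> last LSN)."""
    checkpoint = log[checkpoint_lsn]
    dirty_page_table = dict(checkpoint["dirty_pages"])    # page -> RecLSN
    undo_list = dict(checkpoint["txns"])                  # txn -> LastLSN
    # RedoLSN: smallest RecLSN in the table; with no dirty pages, the checkpoint.
    redo_lsn = min(dirty_page_table.values(), default=checkpoint_lsn)
    for lsn in range(checkpoint_lsn + 1, len(log)):
        rec = log[lsn]
        if rec["kind"] == "end":
            undo_list.pop(rec["txn"], None)               # transaction is finished
            continue
        undo_list[rec["txn"]] = lsn                       # track its last record
        if rec["kind"] == "update" and rec["page"] not in dirty_page_table:
            dirty_page_table[rec["page"]] = lsn           # RecLSN = this record
    return redo_lsn, dirty_page_table, undo_list

log = [
    {"kind": "update", "txn": "T1", "page": "P1"},                      # LSN 0
    {"kind": "update", "txn": "T2", "page": "P2"},                      # LSN 1
    {"kind": "checkpoint", "dirty_pages": {"P2": 1},
     "txns": {"T1": 0, "T2": 1}},                                       # LSN 2
    {"kind": "update", "txn": "T1", "page": "P3"},                      # LSN 3
    {"kind": "commit", "txn": "T1"},                                    # LSN 4
    {"kind": "end", "txn": "T1"},                                       # LSN 5
    {"kind": "update", "txn": "T3", "page": "P2"},                      # LSN 6
]
redo_lsn, dirty_page_table, undo_list = analysis_pass(log, 2)
```

In this run T1 drops out of the undo-list when its end record is seen, T3 is added because it started after the checkpoint, and P3 enters the DirtyPageTable with the LSN of its first update.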

Redo Pass: The redo pass repeats history by replaying every action that is not already
reﬂected in the page on disk
...

Whenever it ﬁnds an update log record, it takes this action:
1
...

2
...

Note that if either of the tests is negative, then the effects of the log record have
already appeared on the page
...
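
A redo pass of this kind can be sketched as follows. The two numbered tests are elided in the text above, so the sketch assumes the tests usually given for ARIES: skip the record if the page is not in the DirtyPageTable or the record's LSN is below the page's RecLSN; otherwise fetch the page and redo only if its PageLSN is below the record's LSN. LSNs are list indexes, and the dictionary fields and `redo` callables are hypothetical.

```python
def redo_pass(log, redo_lsn, dirty_page_table, disk):
    """Replay update records from redo_lsn onward; return the page ids fetched."""
    fetched = []
    for lsn in range(redo_lsn, len(log)):
        rec = log[lsn]
        if rec["kind"] != "update":
            continue
        page_id = rec["page"]
        # Test 1: page not dirty, or record older than RecLSN: skip without I/O.
        if page_id not in dirty_page_table or lsn < dirty_page_table[page_id]:
            continue
        page = disk[page_id]
        fetched.append(page_id)
        # Test 2: redo only if the page has not yet seen this record.
        if page["page_lsn"] < lsn:
            rec["redo"](page)
            page["page_lsn"] = lsn
    return fetched

log = [
    {"kind": "begin", "txn": "T1"},
    {"kind": "update", "txn": "T1", "page": "P1",
     "redo": lambda p: p["records"].remove("r2")},      # physiological delete
    {"kind": "update", "txn": "T1", "page": "P2",
     "redo": lambda p: p["records"].append("q9")},
    {"kind": "update", "txn": "T1", "page": "P1",
     "redo": lambda p: p["records"].append("r4")},
]
disk = {
    "P1": {"page_lsn": 1, "records": ["r1", "r3"]},     # deletion of r2 is on disk
    "P2": {"page_lsn": 2, "records": ["q9"]},           # flushed, so not in the table
}
fetched = redo_pass(log, 1, {"P1": 1}, disk)
```

The PageLSN comparison is what keeps the physiological delete at LSN 1 from being applied a second time; P2 is never read at all because it is absent from the DirtyPageTable.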

Undo Pass and Transaction Rollback: The undo pass is relatively straightforward
...
If a CLR
is found, it uses the UndoNextLSN ﬁeld to skip log records that have already been
rolled back
...

Whenever an update log record is used to perform an undo (whether for transaction rollback during normal processing, or during the restart undo pass), the undo
pass generates a CLR containing the undo action performed (which must be physiological)
...
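
The way CLRs let the undo pass skip work that was already undone can be sketched as follows. This is a single-transaction illustration under assumptions: LSNs are list indexes, every record carries a `prev_lsn` field, each CLR's `undo_next_lsn` is set to the `prev_lsn` of the update it compensates, and a "page" holds just one value. The field and function names are hypothetical.

```python
def undo_transaction(log, last_lsn, pages):
    """Undo one transaction by following PrevLSN / UndoNextLSN pointers backward."""
    txn = log[last_lsn]["txn"]
    tail = last_lsn                       # most recent record of this transaction
    lsn = last_lsn
    while lsn is not None:
        rec = log[lsn]
        if rec["kind"] == "clr":
            # Jump over everything this CLR (and earlier ones) already undid.
            lsn = rec["undo_next_lsn"]
        elif rec["kind"] == "update":
            pages[rec["page"]] = rec["old"]
            log.append({"kind": "clr", "txn": txn, "page": rec["page"],
                        "value": rec["old"], "prev_lsn": tail,
                        "undo_next_lsn": rec["prev_lsn"]})
            tail = len(log) - 1
            lsn = rec["prev_lsn"]
        else:
            lsn = rec["prev_lsn"]

log = [
    {"kind": "update", "txn": "T1", "page": "A", "old": 1, "new": 2, "prev_lsn": None},
    {"kind": "update", "txn": "T1", "page": "B", "old": 10, "new": 20, "prev_lsn": 0},
    {"kind": "update", "txn": "T1", "page": "A", "old": 2, "new": 3, "prev_lsn": 1},
    # Written during a rollback that was interrupted by a crash:
    {"kind": "clr", "txn": "T1", "page": "A", "value": 2, "prev_lsn": 2,
     "undo_next_lsn": 1},
]
pages = {"A": 2, "B": 20}                 # state after the redo pass
undo_transaction(log, 3, pages)
```

The update at LSN 2 is not undone a second time, because the CLR at LSN 3 points the scan directly at LSN 1.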

17
...
6
...
If
some pages of a disk fail, they can be recovered without stopping transaction
processing on other pages
...
This can be quite useful for deadlock handling, since
transactions can be rolled back up to a point that permits release of required
locks, and then restarted from that point
...

• Recovery optimizations: The DirtyPageTable can be used to prefetch pages
during redo, instead of fetching a page only when the system ﬁnds a log
record to be applied to the page
...
Meanwhile, other log records can continue to be processed
...

17
...

Such systems are vulnerable to environmental disasters such as ﬁre, ﬂooding, or
earthquakes
...
Such systems must
provide high availability, that is, the time for which the system is unusable must be
extremely small
...
The remote backup site is sometimes also called the
secondary site
...
We achieve synchronization by sending all log
records from primary site to the remote backup site
...

Figure 17
...

When the primary site fails, the remote backup site takes over processing
...
In effect, the remote backup
site is performing recovery actions that would have been performed at the primary
site when the latter recovered
...
Once recovery has been
performed, the remote backup site starts processing transactions
...
10

Architecture of remote backup system
...
Availability is greatly increased over a single-site system, since the system can
recover even if all data at the primary site are lost
...

Several issues must be addressed in designing a remote backup system:
• Detection of failure
...
Failure of communication lines can fool the remote backup into believing that the primary has failed
...
For example, in addition to the network connection,
there may be a separate modem connection over a telephone line, with services provided by different telecommunication companies
...

• Transfer of control
...
When the original primary site recovers, it can either play the role of remote backup, or take over the role of primary site again
...

The simplest way of transferring control is for the old primary to receive
redo logs from the old backup site, and to catch up with the updates by applying them locally
...

If control must be transferred back, the old backup site can pretend to have
failed, resulting in the old primary taking over
...
If the log at the remote backup grows large, recovery will
take a long time
...
The delay before the remote backup takes over can
be signiﬁcantly reduced as a result
...
In this conﬁguration, the remote backup site continually processes redo log records as they arrive, applying the updates locally
...

• Time to commit
...
This delay can result in a longer wait to commit
a transaction, and some systems therefore permit lower degrees of durability
...

One-safe
...

The problem with this scheme is that the updates of a committed transaction may not have made it to the backup site, when the backup site
takes over processing
...
When the
primary site recovers, the lost updates cannot be merged in directly, since
the updates may conﬂict with later updates performed at the backup site
...

Two-very-safe
...

The problem with this scheme is that transaction processing cannot
proceed if either the primary or the backup site is down
...

Two-safe
...
If only the primary is active, the transaction is
allowed to commit as soon as its commit log record is written to stable
storage at the primary site
...

It results in a slower commit than the one-safe scheme, but the beneﬁts
generally outweigh the cost
...
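
The three degrees of durability can be compared with a small decision function. The definitions of one-safe and two-very-safe are elided in the text above, so this sketch assumes the usual ones: one-safe commits once the commit log record is on stable storage at the primary, and two-very-safe commits only once it is on stable storage at both sites. The names `Durability` and `may_commit` are hypothetical.

```python
from enum import Enum

class Durability(Enum):
    ONE_SAFE = "one-safe"
    TWO_VERY_SAFE = "two-very-safe"
    TWO_SAFE = "two-safe"

def may_commit(mode, logged_at_primary, logged_at_backup, backup_active):
    """Decide whether the primary may declare a transaction committed."""
    if not logged_at_primary:
        return False                      # every scheme needs the primary's log record
    if mode is Durability.ONE_SAFE:
        return True
    if mode is Durability.TWO_VERY_SAFE:
        return backup_active and logged_at_backup
    # Two-safe: behaves like two-very-safe while the backup is active,
    # and like one-safe when only the primary is active.
    return logged_at_backup if backup_active else True
```

For example, with the backup down, `may_commit(Durability.TWO_SAFE, True, False, False)` allows the commit while the two-very-safe scheme blocks it, which is the availability difference described above.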
In these systems, the
failure of a CPU does not result in system failure
...
Recovery actions include rollback of transactions running
on the failed CPU, and recovery of locks held by those transactions
...
However, we should
safeguard the data from disk failure by using, for example, a RAID disk organization
...
Transactions are then required to update
all replicas of any data item that they update
...

17
...
There are a variety of causes of such failure, including disk crash,
power failure, and software errors
...

• In addition to system failures, transactions may also fail for various reasons,
such as violation of integrity constraints or deadlocks
...

Data in volatile storage, such as in RAM, are lost
when the computer crashes
...
Data in stable storage are never lost
...
Ofﬂine,
or archival, stable storage may consist of multiple tape copies of data stored
in a physically secure location
...
To preserve consistency, we require that each transaction be
atomic
...
There are basically two different approaches for
ensuring atomicity: log-based schemes and shadow paging
...

In the deferred-modiﬁcations scheme, during the execution of a transaction, all the write operations are deferred until the transaction partially
commits, at which time the system uses the information on the log associated with the transaction in executing the deferred writes
...
If a crash occurs, the system uses the information
in the log in restoring the state of the system to a previous consistent state
...

• In shadow paging, two page tables are maintained during the life of a transaction: the current page table and the shadow page table
...
The shadow page table and pages
it points to are never changed during the duration of the transaction
...
If the transaction aborts, the current
page table is simply discarded
...

No transaction can be allowed to update a data item that has already been
updated by an incomplete transaction
...

• Transaction processing is based on a storage model in which main memory
holds a log buffer, a database buffer, and a system buffer
...

• Efﬁcient implementation of a recovery scheme requires that the number of
writes to the database and to stable storage be minimized
...

Before a block of data in main memory is output to the database (in nonvolatile storage), all log records pertaining to data in that block must have
been output to stable storage
...
If a failure occurs that results in the loss of physical database
blocks, we use the most recent dump in restoring the database to a previous
consistent state
...

• Advanced recovery techniques support high-concurrency locking techniques,
such as those used for B+ -tree concurrency control
...

When recovering from system failure, the system performs a redo pass using
the log, followed by an undo pass on the log to roll back incomplete transactions
...
It is also based on repeating of history, and allows
logical undo operations
...
It uses log sequence numbers (LSNs) to implement a variety of optimizations that reduce
the time taken for recovery
...
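
The write-ahead rule stated in the summary (all log records pertaining to a block must reach stable storage before the block is output) can be sketched as follows. This is a toy buffer manager under assumptions: the class `BufferManager`, its fields, and the use of a per-block `last_lsn` to decide how much of the log to force are hypothetical choices made for illustration.

```python
class BufferManager:
    """Toy buffer manager that enforces write-ahead logging on block output."""

    def __init__(self):
        self.log_buffer = []      # log records still in main memory
        self.stable_log = []      # log records on stable storage
        self.blocks = {}          # block_id -> {"data": ..., "last_lsn": ...}
        self.disk = {}            # block_id -> data on nonvolatile storage
        self.next_lsn = 0

    def log_update(self, block_id, data):
        """Record an update in the log buffer, then apply it to the buffered block."""
        lsn = self.next_lsn
        self.next_lsn += 1
        self.log_buffer.append((lsn, block_id, data))
        self.blocks[block_id] = {"data": data, "last_lsn": lsn}
        return lsn

    def force_log(self, up_to_lsn):
        """Output log records, in order, up to and including up_to_lsn."""
        while self.log_buffer and self.log_buffer[0][0] <= up_to_lsn:
            self.stable_log.append(self.log_buffer.pop(0))

    def output_block(self, block_id):
        """Write a block to disk, forcing its log records out first (the WAL rule)."""
        block = self.blocks[block_id]
        self.force_log(block["last_lsn"])
        self.disk[block_id] = block["data"]

bm = BufferManager()
bm.log_update("B1", "x")
bm.log_update("B2", "y")
bm.log_update("B1", "z")
bm.output_block("B2")
```

Outputting B2 forces the first two log records to stable storage, while the later record for B1 may stay in the log buffer until B1 is output or its transaction commits.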

Review Terms
• Recovery scheme
• Failure classification
    Transaction failure
    Logical error
    System error
    System crash
    Data-transfer failure
• Fail-stop assumption
• Disk failure
• Storage types
    Volatile storage
    Nonvolatile storage
    Stable storage
• Blocks
    Physical blocks
    Buffer blocks
• Disk buffer
• Force-output
• Log-based recovery
• Log
• Log records
• Update log record
• Deferred modification
• Idempotent
• Immediate modification
• Uncommitted modifications
• Checkpoints
• Shadow paging
    Page table
    Current page table
    Shadow page table
• Garbage collection
• Recovery with concurrent transactions
    Transaction rollback
    Fuzzy checkpoint
    Restart recovery
• Buffer management
• Log-record buffering
• Write-ahead logging (WAL)
• Log force
• Database buffering
• Latches
• Operating system and buffer management
• Loss of nonvolatile storage
• Archival dump
• Fuzzy dump
• Advanced recovery technique
    Physical undo
    Logical undo
    Physical logging
    Logical logging
    Logical operations
    Transaction rollback
    Checkpoints
    Restart recovery
    Redo phase
    Undo phase
• Repeating history
• Fuzzy checkpointing
• ARIES
    Log sequence number (LSN)
    PageLSN
    Physiological redo
    Compensation log record (CLR)
    DirtyPageTable
    Checkpoint log record
• High availability
• Remote backup systems
    Primary site
    Remote backup site
    Secondary site
• Detection of failure
• Transfer of control
• Time to recover
• Hot-spare configuration
• Time to commit
    One-safe
    Two-very-safe
    Two-safe

Exercises
17
...

17
...

a
...

b
...

17
...

17
...
Show, by an example,
how an inconsistent database state could result if log records for a transaction
are not output to stable storage prior to data updated by the transaction being
written to disk
...
5 Explain the purpose of the checkpoint mechanism
...
6 When the system recovers from a crash (see Section 17
...
4), it constructs an
undo-list and a redo-list
...

17
...

17
...
, block 10)
...

read block 3
read block 7
read block 5
read block 3
read block 1
modify block 1
read block 10
modify block 5
17
...

17
...
Give examples of one situation where
logical logging is preferable to physical logging and one situation where physical logging is preferable to logical logging
...
17
...
11 Explain the reasons why recovery of interactive transactions is more difﬁcult
to deal with than is recovery of batch transactions
...
)
17
...

a
...

b
...
Transactions that committed later have their effects rolled back with
this scheme
...

c
...
Why?
17
...

Describe how page access protections provided by modern operating systems
can be used to create before and after images of pages that are updated
...
12
...
14 ARIES assumes there is space in each page for an LSN
...
Suggest a technique to
handle such a situation; your technique must support physical redos but need
not support physiological redos
...
15 Explain the difference between a system crash and a “disaster
...
16 For each of the following requirements, identify the best choice of degree of
durability in a remote backup system:
a
...

b
...

c
...

Bibliographical Notes
Gray and Reuter [1993] is an excellent textbook source of information about recovery,
including interesting implementation and historical details
...
[1987] is
an early textbook source of information on concurrency control and recovery
...
Chandy et al
...

An overview of the recovery scheme of System R is presented by Gray et al
...
The shadow-paging mechanism of System R is described by Lorie [1977]
...
[1980], and Verhofstad [1978]
...
[1980]
...

The state of the art in recovery methods is best illustrated by the ARIES recovery
method, described in Mohan et al
...
ARIES and its variants
are used in several database products, including IBM DB2 and Microsoft SQL Server
...
[2001]
...

Remote backup for disaster recovery (loss of an entire computing facility by, for
example, ﬁre, ﬂood, or earthquake) is considered in King et al
...

Chapter 24 lists references pertaining to long-duration transactions and related
recovery issues
...
PART 6

Database System Architecture

The architecture of a database system is greatly inﬂuenced by the underlying computer system on which the database system runs
...
Database systems can also be designed to exploit parallel computer architectures
...

Chapter 18 ﬁrst outlines the architectures of database systems running on server
systems, which are used in centralized and client–server architectures
...

The chapter then outlines parallel computer architectures, and parallel database architectures designed for different types of parallel computers
...

Chapter 19 presents a number of issues that arise in a distributed database, and
describes how to deal with each issue
...

Distributed query processing and directory systems are also described in this chapter
...

CHAPTER 18

Database System Architectures

The architecture of a database system is greatly inﬂuenced by the underlying computer system on which it runs, in particular by such aspects of computer architecture
as networking, parallelism, and distribution:
• Networking of computers allows some tasks to be executed on a server system, and some tasks to be executed on client systems
...

• Parallel processing within a computer system allows database-system activities to be speeded up, allowing faster response to transactions, as well as more
transactions per second
...
The need for parallel
query processing has led to parallel database systems
...
Keeping multiple copies
of the database across different sites also allows large organizations to continue their database operations even when one site is affected by a natural
disaster, such as ﬂood, ﬁre, or earthquake
...

We study the architecture of database systems in this chapter, starting with the
traditional centralized systems, and covering client – server, parallel, and distributed
database systems
...
1 Centralized and Client–Server Architectures
Centralized database systems are those that run on a single computer system and do
not interact with other computer systems
...
Client – server systems, on
the other hand, have functionality split between a server system, and multiple client
systems
...
1
...
1)
...
Each device controller
is in charge of a speciﬁc type of device (for example, a disk drive, an audio device,
or a video display)
...
Cache memory reduces the contention for memory
access, since it reduces the number of times that the CPU needs to access the shared
memory
...
Personal computers and workstations fall into the ﬁrst category
...
1

A centralized computer system
...
machine at a time
...

It serves a large number of users who are connected to the system via terminals
...
In particular, they may not support
concurrency control, which is not required when only a single user can generate updates
...
Many such systems do not support SQL, and provide a simpler query language, such as a variant of QBE
...

Although general-purpose computer systems today have multiple processors, they
have coarse-granularity parallelism, with only a few processors (about two to four,
typically), all sharing the main memory
...

Thus, such systems support a higher throughput; that is, they allow a greater number of transactions to run per second, although individual transactions do not run
any faster
...
Thus, coarse-granularity parallel machines logically appear to be identical to single-processor
machines, and database systems designed for time-shared machines can be easily
adapted to run on them
...
We study the architecture of
parallel database systems in Section 18
...

18
...
2 Client – Server Systems
As personal computers became faster, more powerful, and cheaper, there was a shift
away from the centralized system architecture
...
Correspondingly, personal computers assumed the user-interface functionality that used to be handled directly by the centralized systems
...
Figure 18
...

Database functionality can be broadly divided into two parts — the front end and
the back end— as in Figure 18
...
The back end manages access structures, query
evaluation and optimization, concurrency control, and recovery
...
The interface between the front end and the back end is through
SQL, or through an application program
...
18
...
2

General structure of a client – server system
...
Any client that uses the ODBC or JDBC interfaces can
connect to any server that provides the interface
...
With the
growth of interface standards, the front-end user interface and the back-end server
are often provided by different vendors
...
Some of the popular application development tools
are PowerBuilder, Magic, and Borland Delphi; Visual Basic is also widely used for
application development
...
In effect, they provide front ends specialized for particular tasks
...
These calls appear like ordinary procedure calls to the programmer, but all the remote procedure calls from a client are
enclosed in a single transaction at the server end
...

Figure 18
...
[Diagram: the front end (SQL user interface, forms interface, report writer, graphical interface) connects through an interface (SQL + API) to the back end (SQL engine).]

18
...
2 Server System Architectures
Server systems can be broadly categorized as transaction servers and data servers
...
Usually,
client machines ship transactions to the server systems, where those transactions are executed, and results are shipped back to clients that are in charge
of displaying the data
...

• Data-server systems allow clients to interact with the servers by making requests to read or update data, in units such as ﬁles or pages
...
Data servers for database systems offer much more functionality; they support units of data — such as pages, tuples, or objects — that
are smaller than a ﬁle
...

Of these, the transaction-server architecture is by far the more widely used architecture
...
2
...
2
...

18
...
1 Transaction Server Process Structure
A typical transaction server system today consists of multiple processes accessing
data in shared memory, as in Figure 18
...
The processes that form part of the database
system include
• Server processes: These are processes that receive user queries (transactions),
execute them, and send the results back
...
Some database systems
use a separate process for each user session, and a few use a single database
process for all user sessions, but with multiple threads so that multiple queries
can execute concurrently
...
Multiple threads within a process can execute
concurrently
...

• Lock manager process: This process implements lock manager functionality,
which includes lock grant, lock release, and deadlock detection
...

Figure 18
...
[Diagram: user processes connect through ODBC and JDBC to server processes; shared memory holds the buffer pool, query plan cache, log buffer, and lock table; a log writer process, a checkpoint process, and log disks are also shown.]

• Log writer process: This process outputs log records from the log record buffer
to stable storage
...

• Checkpoint process: This process performs periodic checkpoints
...

The shared memory contains all shared data, such as:
• Buffer pool
• Lock table

...
Since multiple processes may read or perform updates on data structures in shared memory, there must
be a mechanism to ensure that only one of them is modifying any data structure at
a time, and no process is reading a data structure while it is being written by others
...
Alternative implementations, with less overhead, use special
atomic instructions supported by the computer hardware; one type of atomic instruction tests a memory location and sets it to 1 atomically
...

The mutual exclusion mechanisms are also used to implement latches
...
The
lock request procedure executes the actions that the lock manager process would
take on getting a lock request
...
1
...

• If a lock cannot be obtained immediately because of a lock conﬂict, the lock
request code keeps monitoring the lock table to check when the lock has been
granted
...

To avoid repeated checks on the lock table, operating system semaphores
can be used by the lock request code to wait for a lock grant notiﬁcation
...

Even if the system handles lock requests through shared memory, it still uses the lock
manager process for deadlock detection
...
2
...
In such an environment, it makes sense to ship data to client machines,
to perform all processing at the client machine (which may take a while), and then
to ship the data back to the server machine
...
Data-server architectures have been particularly
popular in object-oriented database systems
...
The unit of communication for data can
be of coarse granularity, such as a page, or ﬁne granularity, such as a tuple (or
an object, in the context of object-oriented database systems)
...

If the unit of communication is a single item, the overhead of message passing is high compared to the amount of data transmitted
...
Fetching items even before they are requested is called
prefetching
...

• Locking
...
A disadvantage of page shipping is that client
machines may be granted locks of too coarse a granularity — a lock on a page
implicitly locks all items contained in the page
...
Other client machines that require locks on those items may be blocked
unnecessarily
...
If
the client machine does not need a prefetched item, it can transfer locks on the
item back to the server, and the locks can then be allocated to other clients
...
Data that are shipped to a client on behalf of a transaction can be
cached at the client, even after the transaction completes, if sufﬁcient storage
space is available
...
However, cache coherency is an issue: Even if a
transaction ﬁnds cached data, it must make sure that those data are up to date,
since they may have been updated by a different client after they were cached
...

• Lock caching
...
Suppose that a client ﬁnds a data item in
the cache, and that it also ﬁnds the lock required for an access to the data item
in the cache
...
However, the server must keep track of cached locks; if a client requests a lock from the server, the server must call back all conﬂicting locks on
the data item from any other client machines that have cached the locks
...

This technique differs from lock de-escalation in that lock caching takes place
across transactions; otherwise, the two techniques are similar
...
© The McGraw−Hill
Companies, 2001

The bibliographical references provide more information about client-server database systems
...
3 Parallel Systems
Parallel systems improve processing and I/O speeds by using multiple CPUs and
disks in parallel
...
The driving
force behind parallel database systems is the demands of applications that have to
query extremely large databases (of the order of terabytes, that is, 10^12 bytes) or
that have to process an extremely large number of transactions per second (of the order of thousands of transactions per second)
...

In parallel processing, many operations are performed simultaneously, as opposed
to serial processing, in which the computational steps are performed sequentially
...
Most high-end machines today offer some degree of coarse-grain parallelism:
two- or four-processor machines are common
...
Parallel computers with hundreds of CPUs and disks
are available commercially
...
A system that processes a large number of small transactions can
improve throughput by processing many transactions in parallel
...

18
...
1 Speedup and Scaleup
Two important issues in studying parallelism are speedup and scaleup
...

Handling larger tasks by increasing the degree of parallelism is called scaleup
...
Now suppose that we increase the size of the system by
increasing the number of processors, disks, and other components of the system
...
Suppose that the execution time of a task on the larger machine
is TL , and that the execution time of the same task on the smaller machine is TS
...
The parallel system is said to
demonstrate linear speedup if the speedup is N when the larger system has N times
the resources (CPU, disk, and so on) of the smaller system
...
Figure 18
...

Silberschatz−Korth−Sudarshan:
Database System
Concepts, Fourth Edition

Chapter 18

resources
Figure 18
...

Scaleup relates to the ability to process larger tasks in the same amount of time by
providing more resources
...
Suppose that the execution time of task Q on a given machine MS is TS , and
the execution time of task QN on a parallel machine ML , which is N times larger than
MS , is TL
...
The parallel system ML is said to
demonstrate linear scaleup on task Q if TL = TS
...
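The two measures can be made concrete with a small Python sketch. It assumes the standard definitions, in which both speedup and scaleup are the ratio TS/TL (the extract above states only the conditions for linearity); the function names and the sample timings are illustrative, not taken from the text.

```python
def speedup(t_small, t_large):
    """Speedup: time of a task on the smaller system divided by its time on the larger one."""
    return t_small / t_large


def scaleup(t_small_task, t_large_task):
    """Scaleup: time of task Q on the small machine divided by time of the N-times
    larger task QN on the N-times larger machine."""
    return t_small_task / t_large_task


def is_linear_speedup(t_small, t_large, n, rel_tol=1e-9):
    """Linear speedup: the speedup equals N when the larger system has N times the resources."""
    return abs(speedup(t_small, t_large) - n) <= rel_tol * n


def is_linear_scaleup(t_small_task, t_large_task, rel_tol=1e-9):
    """Linear scaleup: TL equals TS, so the ratio is 1."""
    return abs(scaleup(t_small_task, t_large_task) - 1.0) <= rel_tol


# Hypothetical timings in seconds: a system with 4 times the resources.
print(speedup(120.0, 30.0), is_linear_speedup(120.0, 30.0, 4))
print(scaleup(60.0, 75.0), is_linear_scaleup(60.0, 75.0))
```

A scaleup ratio below 1 corresponds to sublinear scaleup: the larger machine needs more time for the proportionally larger task.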
Figure 18
...
There are two kinds of
scaleup that are relevant in parallel database systems, depending on how the size of
the task is measured:
• In batch scaleup, the size of the database increases, and the tasks are large jobs
whose runtime depends on the size of the database
...

Thus, the size of the database is the measure of the size of the problem
...

• In transaction scaleup, the rate at which transactions are submitted to the
database increases and the size of the database increases proportionally to
the transaction rate
...
Such transaction processing is especially well adapted
for parallel execution, since transactions can run concurrently and independently on separate processors, and each transaction takes roughly the same
amount of time, even if the database grows
...
The goal of parallelism in database systems is usually to make sure
that the database system can continue to perform at an acceptable speed, even as the

18
...
6

Scaleup with increasing problem size and resources
...
Increasing the capacity of the system by increasing the parallelism provides a smoother path for growth
for an enterprise than does replacing a centralized system by a faster machine (even
assuming that such a machine exists)
...

A number of factors work against efﬁcient parallel operation and can diminish
both speedup and scaleup
...
There is a startup cost associated with initiating a single process
...

• Interference
...
Both speedup and scaleup are affected by this phenomenon
...
By breaking down a single task into a number of parallel steps, we
reduce the size of the average step
...
It is often
difﬁcult to divide a task into exactly equal-sized parts, and the way that the
sizes are distributed is therefore skewed
...

Database System Architectures

18
...
2 Interconnection Networks
Parallel systems consist of a set of components (processors, memory, and disks) that
can communicate with each other via an interconnection network
...
7 shows
three commonly used types of interconnection networks:
• Bus
...
This type of interconnection is shown in Figure 18
...

The bus could be an Ethernet or a parallel interconnect
...
However, they do not scale well with increasing parallelism, since the bus can handle communication from only one
component at a time
...
The components are nodes in a grid, and each component connects
to all its adjacent components in the grid
...
Figure 18
...
Nodes that are not directly connected can communicate with one another by routing messages via a sequence of intermediate nodes that are directly connected to one another
...

• Hypercube
...
Thus, each of the n components is connected to log(n) other
components
...
7c shows a hypercube with 8 nodes
...
In contrast, in a mesh architecture a
component may be 2(√n − 1) links away from some of the other components
(or √n links away, if the mesh interconnection wraps around at the edges of
the grid)
...
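The difference in path lengths can be checked with a short Python sketch. It assumes the usual hypercube construction, in which components are numbered in binary and two components are linked exactly when their numbers differ in one bit (the extract above is abridged at that point), and a square mesh without wraparound. The function names are hypothetical.

```python
import math


def hypercube_neighbors(node, n):
    """Neighbors of `node` in an n-node hypercube: flip each of the log2(n) bits."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    dims = n.bit_length() - 1  # log2(n)
    return [node ^ (1 << bit) for bit in range(dims)]


def hypercube_max_hops(n):
    """Farthest pair differs in every bit, so the worst case is log2(n) links."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    return n.bit_length() - 1


def mesh_max_hops(n):
    """Corner-to-corner distance in a sqrt(n) x sqrt(n) mesh without wraparound."""
    side = math.isqrt(n)
    if side * side != n:
        raise ValueError("n must be a perfect square")
    return 2 * (side - 1)


# Node 000 of the 8-node hypercube is linked to 001, 010, and 100.
print([format(v, "03b") for v in hypercube_neighbors(0b000, 8)])
```

For 16 components, for instance, the functions give a worst case of 4 links in the hypercube against 6 in the mesh.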

Figure 18.7: (a) bus, (b) mesh, (c) hypercube (nodes labeled 000 through 111)

18
...
3
...
Among the most prominent ones are those in Figure 18
...
All the processors share a common memory (Figure 18
...

• Shared disk
...
8b)
...

• Shared nothing
...
8c)
...
This model is a hybrid of the preceding three architectures (Figure 18
...

In Sections 18
...
3
...
3
...
4, we elaborate on each of these models
...
2
...

In fact, they are very important for efﬁcient transaction processing in such systems
...
8

(d) hierarchical
Parallel database architectures
...
18
...
3
...
1 Shared Memory
In a shared-memory architecture, the processors and disks have access to a common
memory, typically via a bus or through an interconnection network
...
A processor can send messages to other processors much faster by using memory writes (which usually take less than a microsecond) than by sending a message
through a communication mechanism
...

Adding more processors does not help after a point, since the processors will spend
most of their time waiting for their turn on the bus to access memory
...
However, at least some of the data will not be in the cache, and accesses will have to go
to the shared memory
...

Maintaining cache coherency becomes an increasing overhead as the number of processors grows
...

18
...
3
...
There are two advantages of this architecture over a shared-memory architecture
...
Second, it offers a
cheap way to provide a degree of fault tolerance: If a processor (or its memory) fails,
the other processors can take over its tasks, since the database is resident on disks
that are accessible from all processors
...
The shared-disk
architecture has found acceptance in many applications
...
Although the
memory bus is no longer a bottleneck, the interconnection to the disk subsystem is
now a bottleneck; it is particularly so in a situation where the database makes a large
number of accesses to disks
...

DEC clusters running Rdb were among the early commercial users of the shared-disk database architecture
...

Digital Equipment Corporation (DEC) is now owned by Compaq
...
18
...
3
...
A processor at one node may communicate with a processor at another node through a high-speed interconnection network
...
Since
local disk references are serviced by local disks at each processor, the shared-nothing
model overcomes the disadvantage of requiring all I/O to go through a single interconnection network; only queries, accesses to nonlocal disks, and result relations pass
through the network
...
Consequently, shared-nothing architectures are
more scalable and can easily support a large number of processors
...

The Teradata database machine was among the earliest commercial systems to
use the shared-nothing database architecture
...

18
...
3
...
At the top level, the system consists of nodes
connected by an interconnection network, which do not share disks or memory with
one another
...
Each node of the system could actually be a shared-memory system with a few processors
...
Thus, a system could be built as a hierarchy,
with shared-memory architecture with a few processors at the base, and a shared-nothing architecture at the top, with possibly a shared-disk architecture in the middle
...
8d illustrates a hierarchical architecture with shared-memory nodes
connected together in a shared-nothing architecture
...

Attempts to reduce the complexity of programming such systems have yielded
distributed virtual-memory architectures, where logically there is a single shared
memory, but physically there are multiple disjoint memory systems; the virtual-memory-mapping hardware, coupled with system software, allows each processor
to view the disjoint memories as a single virtual memory
...

18
...
The
computers in a distributed system communicate with one another through various
communication media, such as high-speed networks or telephone lines
...
The computers in a distributed system may vary in size
and function, ranging from workstations up to mainframe systems
...

We mainly use the term site to emphasize the physical distribution of these systems
...
9
...
Another major difference is that, in a distributed database system, we differentiate between local and
global transactions
...
A global transaction, on the other hand, is one
that either accesses data in a site different from the one at which the transaction was
initiated, or accesses data in several different sites
...

• Sharing data
...
For instance, in a distributed banking
system, where each branch stores data related to that branch, it is possible for
a user in one branch to access data in another branch
...

• Autonomy
...
9

A distributed system
...
are stored locally
...
In a distributed system, there is a global
database administrator responsible for the entire system
...

Depending on the design of the distributed database system, each administrator may have a different degree of local autonomy
...

• Availability
...
In particular, if data items are replicated in several sites, a transaction needing a particular data item may ﬁnd that item in
any of several sites
...

The failure of one site must be detected by the system, and appropriate
action may be needed to recover from the failure
...
Finally, when the failed site recovers or is
repaired, mechanisms must be available to integrate it smoothly back into the
system
...
Availability is crucial for database systems used for real-time applications
...

18
...
1 An Example of a Distributed Database
Consider a banking system consisting of four branches in four different cities
...
Each such installation is thus a site
...
Each branch maintains
(among others) a relation account(Account-schema), where
Account-schema = (account-number, branch-name, balance)
The site containing information about all the branches of the bank maintains the relation branch(Branch-schema), where
Branch-schema = (branch-name, branch-city, assets)
There are other relations maintained at the various sites; we ignore them for the purpose of our example
...
If the transaction was initiated at the Valleyview
branch, then it is considered local; otherwise, it is considered global
...
18
...

In an ideal distributed database system, the sites would share a common global
schema (although some relations may be stored only at some sites), all sites would
run the same distributed database-management software, and the sites would be
aware of each other’s existence
...
However, in reality a distributed database has to be constructed by linking together multiple already-existing
database systems, each with its own schema and possibly running different database-management software
...
We discuss these systems in Section 19
...

18
...
2 Implementation Issues
Atomicity of transactions is an important issue in building a distributed database system
...
Transaction commit protocols ensure such a situation cannot arise
...

The basic idea behind 2PC is for each site to execute the transaction till just before
commit, and then leave the commit decision to a single coordinator site; the transaction is said to be in the ready state at a site at this point
...
Every site where the transaction executed must follow the decision of the coordinator
...
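The core idea can be sketched in a few lines of Python. This is a deliberately simplified model, not the full protocol: it omits logging, timeouts, and recovery from site or coordinator failure, and the names `Site`, `Vote`, and `two_phase_commit` are assumptions introduced for the example.

```python
from enum import Enum


class Vote(Enum):
    READY = "ready"
    ABORT = "abort"


class Site:
    """A participating site that executed part of the transaction."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit  # hypothetical flag: did local execution succeed?
        self.state = {}               # transaction id -> local state

    def prepare(self, txn):
        """Phase 1: enter the ready state, or vote to abort."""
        if self.can_commit:
            self.state[txn] = "ready"
            return Vote.READY
        self.state[txn] = "aborted"
        return Vote.ABORT

    def finish(self, txn, decision):
        """Phase 2: follow the coordinator's decision."""
        self.state[txn] = decision


def two_phase_commit(txn, sites):
    """Coordinator: commit only if every site is ready; otherwise abort everywhere."""
    votes = [site.prepare(txn) for site in sites]
    decision = "committed" if all(v is Vote.READY for v in votes) else "aborted"
    for site in sites:
        site.finish(txn, decision)
    return decision
```

Because every site applies the coordinator's single decision, no site can commit a transaction that another site has aborted, which is the atomicity property the protocol is meant to provide.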
The
2PC protocol is described in detail in Section 19
...
1
...
Since a transaction may access data items at several sites, transaction managers at several sites may
need to coordinate to implement concurrency control
...
Therefore deadlock detection needs to be carried out
across multiple sites
...
Replication of data items,
which is the key to the continued functioning of distributed databases when failures
occur, further complicates concurrency control
...
5 provides detailed coverage of concurrency control in distributed databases
...

18
...

When the tasks to be carried out are complex, involving multiple databases and/or
multiple interactions with humans, coordination of the tasks and ensuring transaction properties for the tasks become more complicated
...
Section 19
...
3 describes
persistent messaging, while Section 24
...

In case an organization has to choose between a distributed architecture and a
centralized architecture for implementing an application, the system architect must
balance the advantages against the disadvantages of distribution of data
...
The primary disadvantage
of distributed database systems is the added complexity required to ensure proper
coordination among the sites
...
It is more difﬁcult to implement a distributed
database system; thus, it is more costly
...
Since the sites that constitute the distributed system operate in parallel, it is harder to ensure the correctness of algorithms,
especially operation during failures of part of the system, and recovery from
failures
...

• Increased processing overhead
...

There are several approaches to distributed database design, ranging from fully
distributed designs to ones that include a large degree of centralization
...

18
...
There are basically two types of networks: local-area networks and wide-area networks
...
In local-area networks, processors are distributed over
small geographical areas, such as a single building or a number of adjacent buildings
...
These differences imply major variations in the speed and reliability of
the communication network, and are reﬂected in the distributed operating-system
design
...
5
...
10) emerged in the early 1970s as a way
for computers to communicate and to share data with one another
...
18
...
10

Local-area network (figure; labeled components: workstation, PC)
...
Because each
small computer is likely to need access to a full complement of peripheral devices
(such as disks and printers), and because some form of data sharing is likely to occur in a single enterprise, it was a natural step to connect these small systems into a
network
...
All the sites in such systems
are close to one another, so the communication links tend to have a higher speed and
lower error rate than do their counterparts in wide-area networks
...
Communication speeds range from a few megabits per
second (for wireless local-area networks), to 1 gigabit per second for Gigabit Ethernet
...

A storage-area network (SAN) is a special type of high-speed local-area network
designed to connect large banks of storage devices (disks) to computers that use the
data
...
The motivation for using storage-area networks to connect multiple computers to large banks
of storage devices is essentially the same as that for shared-disk databases, namely
• Scalability by adding more computers
• High availability, since data is still accessible even if a computer fails
RAID organizations are used in the storage devices to ensure high availability of the
data, permitting processing to continue even if individual disks fail
...

18
...
5
...
Systems that allowed remote terminals to be connected to a central computer
via telephone lines were developed in the early 1960s, but they were not true WANs
...
Work on the Arpanet
began in 1968
...
Typical links on the Internet are ﬁber-optic lines and, sometimes,
satellite channels
...
The last link, to end user sites, is often based on digital subscriber loop (DSL) technology (supporting a few megabits per
second), or cable modem (supporting 10 megabits per second), or dial-up modem
connections over phone lines (supporting up to 56 kilobits per second)
...

• In continuous connection WANs, such as the wired Internet, hosts are connected to the network at all times
...
For applications where consistency is not critical,
such as sharing of documents, groupware systems such as Lotus Notes allow updates of remote data to be made locally, and the updates are then propagated back
to the remote site periodically
...
A mechanism for detecting
conﬂicting updates is described later, in Section 23
...
4; the resolution mechanism for
conﬂicting updates is, however, application dependent
...
6 Summary
• Centralized database systems run entirely on a single computer
...
Client-server interface protocols have
helped the growth of client-server database systems
...

Transaction servers have multiple processes, possibly running on multiple
processors
...
18
...
In addition
to processes that handle queries, there are system processes that carry out
tasks such as lock and log management and checkpointing
...
Such systems strive to
minimize communication between clients and servers by caching data
and locks at the clients
...

• Parallel database systems consist of multiple processors and multiple disks
connected by a fast interconnection network
...
Scaleup measures how well we can handle an increased number of
transactions by increasing parallelism
...

• Parallel database architectures include the shared-memory, shared-disk,
shared-nothing, and hierarchical architectures
...

• A distributed database is a collection of partially independent databases that
(ideally) share a common schema, and coordinate processing of transactions
that access nonlocal data
...

• Principally, there are two types of communication networks: local-area networks and wide-area networks
...
Wide-area networks connect nodes spread over a large
geographical area
...

Storage-area networks are a special type of local-area network designed
to provide fast interconnection between large banks of storage devices and
multiple computers
...
18
...
1 Why is it relatively easy to port a database from a single processor machine to
a multiprocessor machine if individual queries need not be parallelized?
18
...
On the other hand, data server architectures are popular for client-server object-oriented database systems, where
transactions are expected to be relatively long
...

18
...
What would
be the drawback of such an architecture?
18
...
Consider instead a scenario
where client and server machines have exactly the same power
...
5 Consider an object-oriented database system based on a client-server architecture, with the server acting as a data server
...
What is the effect of the speed of the interconnection between the client
and the server on the choice between object and page shipping?
b
...
The page cache stores data in units
of a page, while the object cache stores data in units of objects
...
Describe one beneﬁt of an object cache
over a page cache
...
6 What is lock de-escalation, and under what conditions is it required? Why is it
not required if the unit of data shipping is an item?
18
...
Suppose the company is growing rapidly
each year, and has outgrown its current computer system
...
8 Suppose a transaction is written in C with embedded SQL, and about 80 percent
of the time is spent in the SQL code, with the remaining 20 percent spent in C
code
...

18
...
10 Consider a bank that has a collection of sites, each running a database system
...
Would such a system qualify as a distributed database?
Why?
18
...
Such networks are often conﬁgured with a
server site and multiple client sites
...
What is the advantage of such an
architecture over one where a site can exchange data with another site only by
ﬁrst dialing it up?

Bibliographical Notes
Patterson and Hennessy [1995] and Stone [1993] are textbooks that provide a good
introduction to the area of computer architecture
...
Geiger [1995] and
Signore et al
...
North
[1995] describes the use of a variety of tools for client-server database access
...
[1991] and Franklin et al
...
Biliris and Orenstein [1994] survey object storage
management systems, including client-server related issues
...
[1992]
and Mohan and Narang [1994] describe recovery techniques for client-server systems
...
A survey of parallel computer architectures is
presented by Duncan [1990]
...

Ozsu and Valduriez [1999], Bell and Grimson [1992] and Ceri and Pelagatti [1984]
provide textbook coverage of distributed database systems
...

Comer and Droms [1999] and Thomas [1996] describe computer networking
and the Internet
...
Discussions concerning ATM networks and switches are offered
by de Prycker [1993]
...
Chapter 19
...
Furthermore, the database systems that run
on each site may have a substantial degree of mutual independence
...

Each site may participate in the execution of transactions that access data at one
site, or several sites
...
This distribution of data is the cause of
many difﬁculties in transaction processing and query processing
...

We start by classifying distributed databases as homogeneous or heterogeneous,
in Section 19
...
We then address the question of how to store data in a distributed
database in Section 19
...
In Section 19
...
In Section 19
...
In Section 19
...
In Section 19
...
We
address query processing in distributed databases in Section 19
...
In Section 19
...
In Section 19
...

19
...
In such a system, local sites surrender a portion of their autonomy
in terms of their right to change schemas or database management system software
...

In contrast, in a heterogeneous distributed database, different sites may use different schemas, and different database management system software
...
The differences in schemas are often a major problem
for query processing, while the divergence in software becomes a hindrance for processing transactions that access multiple sites
...
However,
in Section 19
...
Transaction processing issues in such systems are covered later, in
Section 24
...

19
...
There are two approaches to
storing this relation in the distributed database:
• Replication
...
The alternative to replication
is to store only one copy of relation r
...
The system partitions the relation into several fragments, and
stores each fragment at a different site
...
In the following subsections, we elaborate on each of these techniques
...
2
...
In the most
extreme case, we have full replication, in which a copy is stored in every site in the
system
...

• Availability
...
Thus, the system can continue to process queries
involving r, despite the failure of one site
...
If the majority of accesses to the relation r only read the relation, then several sites can process
queries involving r in parallel
...
Hence, data replication minimizes movement of data between
sites
...
• Increased overhead on update
...
Thus,
whenever r is updated, the update must be propagated to all sites containing
replicas
...
For example, in a banking system,
where account information is replicated in various sites, it is necessary to ensure that the balance in a particular account agrees in all sites
...
However, update transactions incur
greater overhead
...
We can simplify the management of replicas of relation r by choosing one of them
as the primary copy of r
...
Similarly, in an airline-reservation system, a flight can be associated with the site at which the flight originates
...
5
...
2
...
, rn
...
There are two different schemes for fragmenting a relation: horizontal fragmentation and vertical fragmentation
...
Vertical fragmentation splits the
relation by decomposing the scheme R of relation r
...
, rn
...

As an illustration, the account relation can be divided into several different fragments, each of which consists of tuples of accounts belonging to a particular branch
...

In general, a horizontal fragment can be deﬁned as a selection on the global relation
r
...
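A small Python sketch may help make horizontal fragmentation concrete. It treats a relation as a list of tuples over (account-number, branch-name, balance), defines each fragment as a selection on branch-name, and rebuilds the relation as the union of the fragments. The sample tuples and the helper names are invented for illustration.

```python
# Hypothetical tuples of account(account-number, branch-name, balance).
account = [
    ("A-305", "Hillside", 500),
    ("A-226", "Hillside", 336),
    ("A-177", "Valleyview", 205),
    ("A-402", "Valleyview", 10000),
]


def select(relation, predicate):
    """Relational selection: keep the tuples that satisfy the predicate."""
    return [t for t in relation if predicate(t)]


def fragment_by_branch(relation):
    """One horizontal fragment per branch, each defined as a selection on branch-name."""
    branches = sorted({t[1] for t in relation})
    return {b: select(relation, lambda t, b=b: t[1] == b) for b in branches}


def reconstruct(fragments):
    """Union of all fragments; duplicates are dropped in case fragments overlap."""
    result = []
    for fragment in fragments.values():
        for t in fragment:
            if t not in result:
                result.append(t)
    return result


fragments = fragment_by_branch(account)
print(fragments["Valleyview"])
```

Each fragment could then be stored at the site of its own branch, and the union of the fragments gives back the global relation.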
19
...
By changing the selection predicates
used to construct the fragments, we can have a particular tuple of r appear in more
than one of the ri
...
Vertical fragmentation of r(R) involves the deﬁnition of several subsets
of attributes R1 , R2 ,
...
More generally, any superkey can be
used
...
The tuple-id value of a tuple is a unique value that distinguishes the tuple from all
other tuples
...
The physical or logical address for a tuple
can be used as a tuple-id, since each tuple has a unique address
...

For privacy reasons, this relation may be fragmented into a relation employee-private-info containing employee-id and salary, and another relation employee-public-info containing attributes employee-id, name, and designation
...
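The employee example can be sketched in Python as two projections that both keep employee-id, followed by a natural join on that attribute to rebuild the original relation. The sample rows and the helper functions `project` and `natural_join` are assumptions made for this illustration.

```python
# Hypothetical tuples of employee-info(employee-id, name, designation, salary).
employee_info = [
    {"employee-id": 1, "name": "Ann", "designation": "Teller", "salary": 42000},
    {"employee-id": 2, "name": "Raj", "designation": "Manager", "salary": 68000},
]


def project(relation, attributes):
    """Relational projection onto the given attributes."""
    return [{a: t[a] for a in attributes} for t in relation]


def natural_join(left, right, key):
    """Join two fragments on the shared key attribute."""
    index = {t[key]: t for t in right}
    return [{**t, **index[t[key]]} for t in left if t[key] in index]


# Both vertical fragments retain employee-id so that the relation can be rebuilt.
employee_private_info = project(employee_info, ["employee-id", "salary"])
employee_public_info = project(employee_info, ["employee-id", "name", "designation"])

rebuilt = natural_join(employee_public_info, employee_private_info, "employee-id")
print(rebuilt == employee_info)
```

Keeping the key in every fragment is what makes the decomposition lossless in this sketch; without it the join could not pair the salary with the right employee.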

The two types of fragmentation can be applied to a single schema; for instance, the
fragments obtained by horizontally fragmenting a relation can be further partitioned
vertically
...
In general, a fragment can be replicated,
replicas of fragments can be fragmented further, and so on
...
2
...
This characteristic, called data transparency, can take several forms:
• Fragmentation transparency
...

• Replication transparency
...

The distributed system may replicate an object to increase either system per-

Distributed Databases

19
...
Users do not have to be concerned with what
data objects have been replicated, or where replicas have been placed
...
Users are not required to know the physical location
of the data
...

Data items, such as relations, fragments, and replicas, must have unique names
...
In a distributed database,
however, we must take care to ensure that two sites do not use the same name for
distinct data items
...
The name server helps to ensure that the same name does not get used
for different data items
...
This approach, however, suffers from two major disadvantages
...
Second, if the name server
crashes, it may not be possible for any site in the distributed system to continue
to run
...
This approach ensures that no two sites
generate the same name (since each site has a unique identiﬁer)
...
This solution, however, fails to achieve location transparency, since site identiﬁers are attached to names
...
account, or account@site17, rather than as simply account
...

To overcome this problem, the database system can create a set of alternative
names or aliases for data items
...
The mapping of aliases
to the real names can be stored at each site
...
Furthermore, the user will be unaffected if the
database administrator decides to move a data item from one site to another
...
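A minimal Python sketch of the two ideas, site-prefixed names and per-site aliases, is given below. The dotted form `site17.account`, the class `AliasCatalog`, and the second site identifier `site5` are assumptions chosen for the example rather than details confirmed by the text.

```python
def qualified_name(site_id, local_name):
    """Prefix a locally chosen name with the unique identifier of the creating site."""
    return f"{site_id}.{local_name}"


class AliasCatalog:
    """Per-site mapping from simple aliases to real (site-qualified) names."""

    def __init__(self):
        self._aliases = {}

    def define(self, alias, real_name):
        """Create or update an alias; called by the administrator, not by users."""
        self._aliases[alias] = real_name

    def resolve(self, alias):
        """Translate the name a user wrote into the real name of the data item."""
        return self._aliases[alias]


catalog = AliasCatalog()
catalog.define("account", qualified_name("site17", "account"))
print(catalog.resolve("account"))

# Moving the data item changes only the catalog entry; user queries still say "account".
catalog.define("account", qualified_name("site5", "account"))
print(catalog.resolve("account"))
```

Users keep writing the simple alias, so relocating a data item requires updating the catalog at each site but no change to user code.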
Instead, the
system should determine which replica to reference on a read request, and should
update all replicas on a write request
...
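The read-one, write-all behaviour described here can be sketched as follows. The class `ReplicatedItem` and the policy of preferring a local replica on reads are assumptions for illustration; the sketch ignores failures and concurrency control, which later sections treat separately.

```python
class ReplicatedItem:
    """A data item with one copy per site; users never name a specific replica."""

    def __init__(self, sites, value):
        self.copies = {site: value for site in sites}

    def read(self, local_site=None):
        """Read from the local replica if there is one, otherwise from any replica."""
        site = local_site if local_site in self.copies else next(iter(self.copies))
        return self.copies[site]

    def write(self, value):
        """Propagate the update to every replica."""
        for site in self.copies:
            self.copies[site] = value


balance = ReplicatedItem(["site1", "site2", "site3"], 500)
balance.write(450)
print(balance.read(local_site="site2"))
```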

19
...
1)
...
The local transactions are those
that access and update data in only one local database; the global transactions are
those that access and update data in several local databases
...

However, for global transactions, this task is much more complicated, since several
sites may be participating in execution
...

In this section we study the system structure of a distributed database, and its
possible failure modes
...
4 we study protocols for ensuring atomic commit of global transactions, and
in Section 19
...

In Section 19
...

19
...
1 System Structure
Each site has its own local transaction manager, whose function is to ensure the ACID
properties of those transactions that execute at that site
...
To understand how such a manager
can be implemented, consider an abstract model of a transaction system, in which
each site contains two subsystems:
• The transaction manager manages the execution of those transactions (or subtransactions) that access data stored in a local site
...

• The transaction coordinator coordinates the execution of the various transactions (both local and global) initiated at that site
...
1
...
1

System architecture
...
The structure of a transaction manager is similar in many respects to the structure
of a centralized system
...

The transaction coordinator subsystem is not needed in the centralized environment, since a transaction accesses data at only a single site
...
For each such transaction, the coordinator is responsible
for
• Starting the execution of the transaction
• Breaking the transaction into a number of subtransactions and distributing
these subtransactions to the appropriate sites for execution
• Coordinating the termination of the transaction, which may result in the transaction being committed at all sites or aborted at all sites

19
...
2 System Failure Modes
A distributed system may suffer from the same types of failure that a centralized
system does (for example, software errors, hardware errors, or disk crashes)
...
The basic failure types are
• Failure of a site
• Loss of messages
• Failure of a communication link
• Network partition
The loss or corruption of messages is always a possibility in a distributed system
...
Information about such protocols may be found in standard textbooks on networking (see the bibliographical notes)
...
If a communication link fails, messages that would have been transmitted across the link must be
rerouted
...
In other cases, a failure may result in there being no connection between some pairs of sites
...
Note that, under this deﬁnition, a subsystem may consist of a
single node
...
4 Commit Protocols
If we are to ensure atomicity, all the sites in which a transaction T executed must
agree on the ﬁnal outcome of the execution
...
To ensure this property, the transaction coordinator of T must
execute a commit protocol
...
4
...
An alternative is the
three-phase commit protocol (3PC), which avoids certain disadvantages of the 2PC
protocol but adds to complexity and overhead
...
4
...

19
...
1 Two-Phase Commit
We first describe how the two-phase commit protocol (2PC) operates during normal
operation, then describe how it handles failures, and finally how it carries out recovery and concurrency control
...

19
...
1
...

• Phase 1
...
It then sends a prepare T message to all sites at which T executed
...
If the answer is no, it adds a <no T>
record to the log, and then responds by sending an abort T message
to Ci
...
The
transaction manager then replies with a ready T message to Ci
...
When Ci receives responses to the prepare T message from all the
sites, or when a prespeciﬁed interval of time has elapsed since the prepare
T message was sent out, Ci can determine whether the transaction T can be
committed or aborted
...
Otherwise, transaction T must be
aborted
...
At
this point, the fate of the transaction has been sealed
...
coordinator sends either a commit T or an abort T message to all participating
sites
...

A site at which T executed can unconditionally abort T at any time before it sends
the message ready T to the coordinator
...
The ready T message is, in effect, a promise
by a site to follow the coordinator’s order to commit T or to abort T
...
Otherwise, if
the site crashes after sending ready T, it may be unable to make good on its promise
...

Since unanimity is required to commit a transaction, the fate of T is sealed as soon
as at least one site responds abort T
...
The ﬁnal verdict
regarding T is determined at the time that the coordinator writes that verdict (commit
or abort) to the log and forces that verdict to stable storage
...
When the coordinator receives the acknowledge T message from all the sites, it adds the <complete T> record to the log
...
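The normal-case message flow described above can be sketched as a small in-memory simulation. This is only an illustration under stated assumptions: the names `Participant` and `two_phase_commit` are hypothetical, logs are plain Python lists rather than stable storage, a missing reply (`None`) stands in for a timeout, and the acknowledge step is omitted.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Participant:
    """A site at which T executed (hypothetical model, not a real API)."""
    name: str
    vote: Optional[str]  # "ready", "abort", or None (no reply before the timeout)
    log: List[str] = field(default_factory=list)

    def on_prepare(self, t: str) -> Optional[str]:
        if self.vote is None:
            return None  # site down or message lost: the coordinator times out
        if self.vote == "ready":
            # In a real system this record is forced to stable storage first.
            self.log.append(f"<ready {t}>")
            return "ready"
        self.log.append(f"<no {t}>")
        return "abort"

    def on_decision(self, t: str, decision: str) -> None:
        self.log.append(f"<{decision} {t}>")


def two_phase_commit(t: str, participants: List[Participant],
                     coord_log: List[str]) -> str:
    # Phase 1: log <prepare T>, then ask every site whether it can commit.
    coord_log.append(f"<prepare {t}>")
    replies = [p.on_prepare(t) for p in participants]

    # Phase 2: commit only on unanimous "ready"; a missing reply counts as abort.
    decision = "commit" if all(r == "ready" for r in replies) else "abort"
    coord_log.append(f"<{decision} {t}>")  # the fate of T is sealed here
    for p, r in zip(participants, replies):
        if r is not None:  # unreachable sites learn the outcome when they recover
            p.on_decision(t, decision)
    return decision
```

With two sites that both answer ready, the function returns "commit"; a single abort answer or a single missing reply is enough to make it return "abort", which mirrors the unanimity requirement.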
4
...
2 Handling of Failures
The 2PC protocol responds in different ways to various types of failures:
• Failure of a participating site
...
If the site fails after the coordinator has received the ready T message
from the site, the coordinator executes the rest of the commit protocol in the
normal fashion, ignoring the failure of the site
...
Let T be one such transaction
...
In this case, the site executes
redo(T)
...
In this case, the site executes undo(T)
...
In this case, the site must consult Ci
to determine the fate of T
...
In the former case, it executes redo(T); in the latter
case, it executes undo(T)
...
It does so by sending a querystatus T message to all the
sites in the system
...
It then notiﬁes Sk about this outcome
...
The decision concerning T is
postponed until Sk can obtain the needed information
...
It continues
to do so until a site that contains the needed information recovers
...

The log contains no control records (abort, commit, ready) concerning T
...
Since the failure of Sk precludes the sending of such a response,
by our algorithm Ci must abort T
...

• Failure of the coordinator
...
We shall see that, in certain cases, the participating sites
cannot decide whether to commit or abort T, and therefore these sites must
wait for the recovery of the failed coordinator
...

If an active site contains an <abort T> record in its log, then T must be
aborted
...
However, the coordinator may have decided to abort T,
but not to commit T
...

If none of the preceding cases holds, then all active sites must have a
<ready T> record in their logs, but no additional control records (such
as <abort T> or <commit T>)
...
Thus, the active sites
must wait for Ci to recover
...
For example, if locking is used, T may
hold locks on data at active sites
...
During this time, other
transactions may be forced to wait for T
...
This
situation is called the blocking problem, because T is blocked pending the
recovery of site Ci
...
When a network partitions, two possibilities exist:
1
...
In this
case, the failure has no effect on the commit protocol
...
The coordinator and its participants belong to several partitions
...
Sites that are not in the partition containing
the coordinator simply execute the protocol to deal with failure of the
coordinator
...
the coordinator follow the usual commit protocol, assuming that the sites
in the other partitions have failed
...
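The failure cases above reduce to two small decision procedures: what a recovering participant does with T, and what the active sites can conclude when the coordinator is down. The sketch below is a hedged illustration; the function names are hypothetical, and each log is modelled simply as the set of control-record kinds ("ready", "commit", "abort") that it holds for T.

```python
from typing import Iterable, List, Set


def recovery_action(control_records: Set[str]) -> str:
    """Action for a recovering participant, from the control records for T in its log."""
    if "commit" in control_records:
        return "redo"
    if "abort" in control_records:
        return "undo"
    if "ready" in control_records:
        return "consult"  # in doubt: ask the coordinator or the other sites
    return "undo"         # no control records: the site never voted, so T is aborted


def decide_without_coordinator(active_site_logs: Iterable[Set[str]]) -> str:
    """Outcome the active sites can infer on their own after a coordinator failure."""
    logs: List[Set[str]] = list(active_site_logs)
    if any("commit" in log for log in logs):
        return "commit"
    if any("abort" in log for log in logs):
        return "abort"
    if any("ready" not in log for log in logs):
        # Some site never sent ready, so the coordinator cannot have committed T.
        return "abort"
    return "blocked"      # every active site is in doubt: wait for the coordinator
```

The "blocked" result is the blocking problem: every active site holds only a ready record, so none of them can rule out either outcome.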

19
...
1
...
9
...

The recovering site must determine the commit–abort status of such transactions by
contacting other sites, as described in Section 19
...
1
...

If recovery is done as just described, however, normal transaction processing at
the site cannot begin until all in-doubt transactions have been committed or rolled
back
...
Further, if the coordinator has failed, and no other site has
information about the commit–abort status of an incomplete transaction, recovery
potentially could become blocked if 2PC is used
...

To circumvent this problem, recovery algorithms typically provide support for
noting lock information in the log
...
) Instead of writing a <ready T> log record, the algorithm writes
a <ready T, L> log record, where L is a list of all write locks held by the transaction
T when the log record is written
...

After lock reacquisition is complete for all in-doubt transactions, transaction processing can start at the site, even before the commit–abort status of the in-doubt transactions is determined
...
Thus, site recovery is faster, and
never gets blocked
...

19
...
2 Three-Phase Commit
The three-phase commit (3PC) protocol is an extension of the two-phase commit protocol that avoids the blocking problem under certain assumptions
...
Under these assumptions, the protocol avoids blocking
by introducing an extra third phase where multiple sites are involved in the decision
to commit
...
If the coordinator fails, the remaining sites ﬁrst select a new coordinator
...
The new
coordinator restarts the third phase of the protocol if some site knew that the old coordinator intended to commit the transaction
...

While the 3PC protocol has the desirable property of not blocking unless k sites
fail, it has the drawback that a partitioning of the network will appear to be the same
as more than k sites failing, which would lead to blocking
...
Because of its overhead, the 3PC protocol is not
widely used
...

19
...
3 Alternative Models of Transaction Processing
For many applications, the blocking problem of two-phase commit is not acceptable
...

In this section we describe how to use persistent messaging to avoid the problem of
distributed commit, and then brieﬂy outline the larger issue of workﬂows; workﬂows
are considered in more detail in Section 24
...

To understand persistent messaging consider how one might transfer funds between two different banks, each with its own computer
...
However, the transaction may have to update the total bank balance, and blocking could
have a serious impact on all other transactions at each bank, since almost all transactions at the bank would update the total bank balance
...
The bank ﬁrst
deducts the amount of the check from the available balance and prints out a check
...
After
verifying the check, the bank increases the local balance by the amount of the check
...
So that funds are not
lost or incorrectly increased, the check must not be lost, and must not be duplicated
and deposited more than once
...

Persistent messages are messages that are guaranteed to be delivered to the recipient exactly once (neither less nor more), regardless of failures, if the transaction
sending the message commits, and are guaranteed to not be delivered if the transaction aborts
...
In contrast, regular
messages may be lost or may even be delivered multiple times in some situations
...
Error handling is more complicated with persistent messaging than with two-phase commit
...
Both sites must therefore be provided with error handling code, along
with code to handle the persistent messages
...

The types of exception conditions that may arise depend on the application, so
it is not possible for the database system to handle exceptions automatically
...
For
instance, it is not acceptable to just lose the money being transferred if the receiving
account has been closed; the money must be credited back to the originating account,
and if that is not possible for some reason, humans must be alerted to resolve the
situation manually
...
In fact, few
organizations would agree to support two-phase commit for transactions originating
outside the organization, since failures could result in blocking of access to local data
...

Workﬂows provide a general model of transaction processing involving multiple
sites and possibly human processing of certain steps
...
The
steps, together, form a workﬂow
...
2
...

We now consider the implementation of persistent messaging
...
The message is also given
a unique message identiﬁer
...
The usual database concurrency
control mechanisms ensure that the system process reads the message only
after the transaction that wrote the message commits; if the transaction aborts,
the usual recovery mechanism would delete the message from the relation
...
If it receives no
acknowledgement from the destination site, after some time it sends the message again
...
In case of permanent failures, the system will decide, after some period of time, that the
message is undeliverable
...

Writing the message to a relation and processing it only after the transaction
commits ensures that the message will be delivered if and only if the transaction commits
...

• Receiving site protocol: When a site receives a persistent message, it runs a
transaction that adds the message to a special received-messages relation, provided it is not already present in the relation (the unique message identiﬁer
detects duplicates)
...

Note that sending the acknowledgment before the transaction commits is
not safe, since a system failure may then result in loss of the message
...

In many messaging systems, it is possible for messages to get delayed arbitrarily, although such delays are very unlikely
...
Deleting
it could result in a duplicate delivery not being detected
...
To deal with this problem, each message is given a timestamp, and if the timestamp of a received
message is older than some cutoff, the message is discarded
...
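The receiving-site rules (record each message once, acknowledge only after the recording transaction commits, discard messages older than a cutoff) can be sketched as follows. This is a minimal in-memory illustration, not a real messaging API: `Message`, `Receiver`, and `deliver` are hypothetical names, a dictionary stands in for the received-messages relation, and a fixed retry count stands in for the sender's "after some period of time" decision.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Message:
    msg_id: str       # unique message identifier
    timestamp: float  # assigned by the sending site
    payload: str


class Receiver:
    def __init__(self, cutoff_age: float) -> None:
        self.cutoff_age = cutoff_age
        self.received: Dict[str, Message] = {}  # stands in for the received-messages relation
        self.processed: List[str] = []

    def receive(self, msg: Message, now: float) -> Optional[str]:
        """Return an acknowledgment (the message id), or None if the message is discarded."""
        if now - msg.timestamp > self.cutoff_age:
            return None  # older than the cutoff: discard
        if msg.msg_id not in self.received:
            # Stands in for a transaction that inserts the row and acts on the message.
            self.received[msg.msg_id] = msg
            self.processed.append(msg.payload)
        return msg.msg_id  # acknowledge only after the "commit" above

    def purge(self, now: float) -> None:
        """Drop rows older than the cutoff; such messages would be discarded anyway."""
        self.received = {k: m for k, m in self.received.items()
                         if now - m.timestamp <= self.cutoff_age}


def deliver(outbox: List[Message], receiver: Receiver, now: float,
            max_tries: int = 3) -> List[str]:
    """Resend each message until acknowledged; return the ids judged undeliverable."""
    undeliverable = []
    for msg in outbox:
        acked = any(receiver.receive(msg, now) is not None for _ in range(max_tries))
        if not acked:
            undeliverable.append(msg.msg_id)
    return undeliverable
```

Delivering the same message twice leaves `processed` with a single entry, which is the exactly-once behavior the duplicate check is meant to provide. Ids returned by `deliver` are the ones for which application-level exception handling would have to run.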

19
...
We assume
that each site participates in the execution of a commit protocol to ensure global transaction atomicity
...
If any site containing a replica of a data item has failed, updates to the
data item cannot be processed
...
6 we describe protocols that can continue
transaction processing even if some sites or links have failed, thereby providing high
availability
...
5
...
The only change that needs to be incorporated is in the way the lock
manager deals with replicated data
...
As in
Chapter 16, we shall assume the existence of the shared and exclusive lock modes
...
19
...
1
...
All lock and unlock requests are made at
site Si
...

The lock manager determines whether the lock can be granted immediately
...
Otherwise, the request is delayed until it can
be granted, at which time a message is sent to the site at which the lock request was
initiated
...
In the case of a write, all the sites where a replica of
the data item resides must be involved in the writing
...
This scheme requires two messages for handling
lock requests, and one message for handling unlock requests
...
Since all lock and unlock requests are made at one
site, the deadlock-handling algorithms discussed in Chapter 16 can be applied
directly to this environment
...
The site Si becomes a bottleneck, since all requests must be processed there
...
If the site Si fails, the concurrency controller is lost
...
6
...

19
...
1
...

Each site maintains a local lock manager whose function is to administer the lock
and unlock requests for those data items that are stored in that site
...
If data item Q is locked in an incompatible mode, then the request is delayed
until it can be granted
...

There are several alternative ways of dealing with replication of data items, which
we study in Sections 19
...
1
...
5
...
6
...
It has a reasonably low overhead, requiring two message transfers for handling lock requests, and
one message transfer for handling unlock requests
...

The deadlock-handling algorithms discussed in Chapter 16 must be modiﬁed, as we
shall discuss in Section 19
...
4, to detect global deadlocks
...
5
...
3 Primary Copy
When a system uses data replication, we can choose one of the replicas as the primary
copy
...

When a transaction needs to lock a data item Q, it requests a lock at the primary
site of Q
...

Thus, the primary copy enables concurrency control for replicated data to be handled like that for unreplicated data
...
However, if the primary site of Q fails, Q is inaccessible, even though other sites
containing a replica may be accessible
...
5
...
4 Majority Protocol
The majority protocol works this way: If data item Q is replicated in n different sites,
then a lock-request message must be sent to more than one-half of the n sites in which
Q is stored
...
As before, the response is delayed until the request can
be granted
...

This scheme deals with replicated data in a decentralized manner, thus avoiding
the drawbacks of central control
...
The majority protocol is more complicated to implement
than are the previous schemes
...

• Deadlock handling
...
As an illustration, consider
a system with four sites and full replication
...
Transaction T1 may succeed
in locking Q at sites S1 and S3, while transaction T2 may succeed in locking
Q at sites S2 and S4
...
Luckily, we can avoid such deadlocks with relative
ease, by requiring all sites to request locks on the replicas of a data item in the
same predetermined order
...
19
...
1
...
The difference from
the majority protocol is that requests for shared locks are given more favorable treatment than requests for exclusive locks
...
When a transaction needs to lock data item Q, it simply requests
a lock on Q from the lock manager at one site that contains a replica of Q
...
When a transaction needs to lock data item Q, it requests a
lock on Q from the lock manager at all sites that contain a replica of Q
...

The biased scheme has the advantage of imposing less overhead on read operations than does the majority protocol
...

However, the additional overhead on writes is a disadvantage
...

19
...
1
...
The
quorum consensus protocol assigns each site a nonnegative weight
...
To execute a write operation, enough replicas must be written so that their
total weight is ≥ Qw
...
For instance, with a small read quorum, reads need to read fewer
replicas, but the write quorum will be higher, hence writes can succeed only if correspondingly more replicas are available
...

In fact, by setting weights and quorums appropriately, the quorum consensus protocol can simulate the majority protocol and the biased protocols
...
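A small sketch can make the quorum arithmetic concrete. It assumes the usual quorum-consensus conditions, Qr + Qw > S and 2 * Qw > S, where S is the total weight of the sites holding the item; that part of the text is elided here, so treat the conditions as an assumption of the example. The function names are hypothetical.

```python
from typing import Dict, Iterable


def valid_quorums(weights: Dict[str, int], q_r: int, q_w: int) -> bool:
    """Check the assumed conditions Qr + Qw > S and 2 * Qw > S."""
    s = sum(weights.values())
    return q_r + q_w > s and 2 * q_w > s


def has_quorum(weights: Dict[str, int], available: Iterable[str], quorum: int) -> bool:
    """True if the available replicas carry at least `quorum` total weight."""
    return sum(weights[site] for site in set(available)) >= quorum


# Four replicas with unit weights (illustrative values only).
unit = {"S1": 1, "S2": 1, "S3": 1, "S4": 1}
majority_ok = valid_quorums(unit, q_r=3, q_w=3)  # behaves like the majority protocol
biased_ok = valid_quorums(unit, q_r=1, q_w=4)    # behaves like the biased protocol
too_small = valid_quorums(unit, q_r=2, q_w=2)    # a read and a write could miss each other
```

With the biased setting, a read succeeds with any one replica available, but a write needs all four, which is the trade-off between read and write quorums described above.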
5
...
2 is that each transaction is given a unique timestamp that the system uses in deciding the serialization
order
...
19
...
2

Generation of unique timestamps
...
Then, the various
protocols can be applied directly to the nonreplicated environment
...
In the centralized scheme, a single site distributes the timestamps
...

In the distributed scheme, each site generates a unique local timestamp by using
either a logical counter or the local clock
...
2)
...
Compare this
technique for generating unique timestamps with the one that we presented in Section 19
...
3 for generating unique names
...
In such a case, the fast site’s logical counter will be larger
than that of other sites
...
What we need is a mechanism to ensure
that local timestamps are generated fairly across the system
...
The logical
clock can be implemented as a counter that is incremented after a new local timestamp is generated
...
In this case,
site Si advances its logical clock to the value x + 1
...
Since
clocks may not be perfectly accurate, a technique similar to that for logical clocks
must be used to ensure that no clock gets far ahead of or behind another clock
...
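The distributed scheme can be sketched with a per-site logical clock. The example assumes that a global timestamp is the pair (local timestamp, site identifier), compared with the site identifier in the least significant position, and that a site advances its clock to x + 1 when it sees a timestamp whose local part x exceeds its own counter. `SiteClock` and its method names are hypothetical.

```python
from typing import Tuple

Timestamp = Tuple[int, int]  # (local timestamp, site identifier)


class SiteClock:
    """Logical clock for one site (illustrative sketch)."""

    def __init__(self, site_id: int) -> None:
        self.site_id = site_id
        self.counter = 0

    def new_timestamp(self) -> Timestamp:
        ts = (self.counter, self.site_id)
        self.counter += 1  # incremented after a new local timestamp is generated
        return ts

    def observe(self, ts: Timestamp) -> None:
        """Called when a transaction carrying timestamp `ts` visits this site."""
        x, _ = ts
        if x > self.counter:
            self.counter = x + 1  # catch up with the faster site


fast, slow = SiteClock(1), SiteClock(2)
stamps = [fast.new_timestamp() for _ in range(5)]  # the fast site runs ahead
first_slow = slow.new_timestamp()
slow.observe(stamps[-1])                           # a transaction from the fast site visits
next_slow = slow.new_timestamp()
```

Because Python compares tuples left to right, the site identifier only breaks ties between equal local timestamps, so a site with a large identifier does not automatically generate larger timestamps.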
5
...
With master-slave replication, the database allows updates at a primary site,
and automatically propagates updates to replicas at other sites
...

An important feature of such replication is that transactions do not obtain locks at
remote sites
...
(but perhaps outdated) view of the database, the replica should reflect a transaction-consistent snapshot of the data at the primary; that is, the replica should reflect all
updates of transactions up to some transaction in the serialization order, and should
not reﬂect any updates of later transactions in the serialization order
...

Master-slave replication is particularly useful for distributing information, for instance from a central office to branch offices of an organization
...
Updates should be propagated periodically — every night, for example — so that update propagation does not interfere with
query processing
...
It also supports snapshot refresh, which can be done either by recomputing the
snapshot or by incrementally updating it
...

With multimaster replication (also called update-anywhere replication) updates
are permitted at any replica of a data item, and are automatically propagated to
all replicas
...
Transactions update the local copy and the system updates other replicas
transparently
...
Many
database systems use the biased protocol, where writes have to lock and update all
replicas and reads lock and read any one replica, as their concurrency-control technique
...

Schemes based on lazy propagation allow transaction processing (including updates)
to proceed even if a site is disconnected from the network, thus improving availability, but, unfortunately, do so at the cost of consistency
...

This approach ensures that updates to an item are ordered serially, although
serializability problems can occur, since transactions may read an old value of
some other data item and use it to perform an update
...

This approach can cause even more problems, since the same data item
may be updated concurrently at multiple sites
...
5
...
Further, human intervention may be required to deal with conﬂicts
...

19
...
4 Deadlock Handling
The deadlock-prevention and deadlock-detection algorithms in Chapter 16 can be
used in a distributed system, provided that modiﬁcations are made
...

Similarly, the timestamp-ordering approach could be directly applied to a distributed
environment, as we saw in Section 19
...
2
...
Furthermore, certain deadlock-prevention techniques may require more sites to be involved
in the execution of a transaction than would otherwise be the case
...
Common
techniques for dealing with this issue require that each site keep a local wait-for
graph
...
For example, Figure 19
...
Note that transactions T2 and T3 appear in both graphs,
indicating that the transactions have requested items at both sites
...
When a transaction Ti on site S1 needs a resource in site S2 , it
sends a request message to site S2
...

Clearly, if any local wait-for graph has a cycle, deadlock has occurred
...
To illustrate this problem, we consider the
local wait-for graphs of Figure 19
...
Each wait-for graph is acyclic; nevertheless, a
deadlock exists in the system because the union of the local wait-for graphs contains
a cycle
...
4
...
Figure: Local wait-for graphs
...
19
...
3
...
Since there is communication delay in the system,
we must distinguish between two types of wait-for graphs
...
The constructed graph is an approximation generated by
the controller during the execution of the controller’s algorithm
...
Correct means in this case
that, if a deadlock exists, it is reported promptly, and if the system reports a deadlock,
it is indeed in a deadlock state
...

• Periodically, when a number of changes have occurred in a local wait-for
graph
...

When the coordinator invokes the deadlock-detection algorithm, it searches its
global graph
...
The coordinator
must notify all the sites that a particular transaction has been selected as victim
...
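The coordinator's search can be sketched as a union of the local wait-for graphs followed by a depth-first search for a cycle. The edges used in the test data are illustrative assumptions and are not taken from the figures; `union_graph` and `find_cycle` are hypothetical names.

```python
from typing import Dict, Iterable, List, Optional, Set

Graph = Dict[str, Set[str]]  # transaction -> transactions it is waiting for


def union_graph(local_graphs: Iterable[Graph]) -> Graph:
    """Build the global wait-for graph as the union of the local graphs."""
    merged: Graph = {}
    for g in local_graphs:
        for src, dsts in g.items():
            merged.setdefault(src, set()).update(dsts)
            for d in dsts:
                merged.setdefault(d, set())  # make sure every node is a key
    return merged


def find_cycle(graph: Graph) -> Optional[List[str]]:
    """Return the transactions on one cycle, or None if the graph is acyclic."""
    WHITE, GREY, BLACK = 0, 1, 2
    color = {n: WHITE for n in graph}
    stack: List[str] = []

    def visit(n: str) -> Optional[List[str]]:
        color[n] = GREY
        stack.append(n)
        for m in sorted(graph[n]):
            if color[m] == GREY:          # back edge: m is on the current path
                return stack[stack.index(m):]
            if color[m] == WHITE:
                found = visit(m)
                if found:
                    return found
        stack.pop()
        color[n] = BLACK
        return None

    for n in sorted(graph):
        if color[n] == WHITE:
            found = visit(n)
            if found:
                return found
    return None


# Illustrative local graphs: each is acyclic, but their union is not.
site1: Graph = {"T1": {"T2"}, "T2": {"T3"}}
site2: Graph = {"T3": {"T4"}, "T4": {"T2"}}
global_cycle = find_cycle(union_graph([site1, site2]))
```

Any transaction on the returned cycle is a candidate victim. The sketch works on a snapshot, so it shares the false-cycle risk of the constructed graph when edges arrive out of order.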

This scheme may produce unnecessary rollbacks if:
• False cycles exist in the global wait-for graph
...
5
...
Transaction T2 then requests a resource
held by T3 at site S2 , resulting in the addition of the edge T2 → T3 in S2
...
Deadlock recovery may be initiated, although
no deadlock has occurred
...
19
...
5

False cycles in the global wait-for graph
...

The likelihood of false cycles is usually sufﬁciently low that they do not cause
a serious performance problem
...
For example,
suppose that site S1 in Figure 19
...
At the same time, the
coordinator has discovered a cycle, and has picked T3 as a victim
...

Deadlock detection can be done in a distributed manner, with several sites taking
on parts of the task, instead of being done at a single site. However, such algorithms
are more complicated and more expensive
...

19
...
In particular, since failures are more likely
in large distributed systems, a distributed database must continue functioning even
when there are various types of failures
...

For a distributed system to be robust, it must detect failures, reconﬁgure the system
so that computation may continue, and recover when a processor or a link is repaired
...
For example, message
loss is handled by retransmission
...
without receipt of an acknowledgment, is usually a symptom of a link failure
...
Failure to ﬁnd
such a route is usually a symptom of network partition
...
The system can usually detect that a failure has occurred, but
it may not be able to identify the type of failure
...
It could be that S2 has failed
...
The problem is partly addressed by using multiple links between sites, so that
even if one link fails the sites will remain connected
...

Suppose that site S1 has discovered that a failure has occurred
...

• If transactions were active at a failed/inaccessible site at the time of the failure,
these transactions should be aborted
...

However, in some cases, when data objects are replicated it may be possible
to proceed with reads and updates even though some replicas are inaccessible
...
We address this issue in Section 19
...
1
...
When a
site rejoins, care must be taken to ensure that data at the site is consistent, as
we will see in Section 19
...
3
...
6
...
Examples of central servers
include a name server, a concurrency coordinator, or a global deadlock detector
...
In particular, these situations must be avoided:
• Two or more central servers are elected in distinct partitions
...

19
...
1 Majority-Based Approach
The majority-based approach to distributed concurrency control in Section 19
...
1
...
In this approach, each data object stores
with it a version number to detect when it was last written to
...
The
transaction does not operate on a until it has successfully obtained a lock on a
majority of the replicas of a
...
(Optionally, they may also write this value back to replicas with lower version numbers
...
The new version number is
one more than the highest version number
...

Failures during a transaction (whether network partitions or site failures) can be tolerated as long as (1) the sites available at commit contain a majority of replicas of all
the objects written to and (2) during reads, a majority of replicas are read to ﬁnd the
version numbers
...

As long as the requirements are satisﬁed, the two-phase commit protocol can be used,
as usual, on the sites that are available
...
This is because
writes would have updated a majority of the replicas, while reads will read a majority
of the replicas and ﬁnd at least one replica that has the latest version
...
We leave the
(straightforward) details to the reader
...
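The version-number rules can be sketched as follows: a read consults a majority of replicas and takes the value with the highest version, and a write goes to a majority with a version one more than the highest version seen. Locking and two-phase commit are left out to keep the example short, and the names `read`, `write`, and `_require_majority` are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple

Replica = Tuple[int, str]  # (version number, value)


def _require_majority(reachable: Iterable[str], n: int) -> Set[str]:
    sites = set(reachable)
    if 2 * len(sites) <= n:
        raise RuntimeError("a majority of replicas is not reachable")
    return sites


def read(replicas: Dict[str, Replica], reachable: Iterable[str]) -> Replica:
    """Read a majority and return the replica with the highest version number."""
    sites = _require_majority(reachable, len(replicas))
    return max((replicas[s] for s in sites), key=lambda r: r[0])


def write(replicas: Dict[str, Replica], reachable: Iterable[str], value: str) -> int:
    """Write to a majority, using one more than the highest version number seen."""
    sites = _require_majority(reachable, len(replicas))
    new_version = max(replicas[s][0] for s in sites) + 1
    for s in sites:
        replicas[s] = (new_version, value)
    return new_version


# Three replicas of one data object (illustrative values only).
copies: Dict[str, Replica] = {"S1": (0, "a"), "S2": (0, "a"), "S3": (0, "a")}
v1 = write(copies, ["S1", "S2"], "b")  # S3 is unreachable and keeps the old version
latest = read(copies, ["S2", "S3"])    # overlaps the write majority at S2
```

Any two majorities intersect, so the read above is guaranteed to see at least one replica carrying the latest version even though S3 is stale.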

19
...
2 Read One, Write All Available Approach
As a special case of quorum consensus, we can employ the biased protocol by giving
unit weights to all sites, setting the read quorum to 1, and setting the write quorum to
n (all sites)
...
This protocol is called the read one, write all
protocol since all replicas must be written
...
In this approach, a read operation proceeds as
in the read one, write all scheme; any available replica can be read, and a read lock is

...
A write operation is shipped to all replicas; and write locks
are acquired on all the replicas
...

While this approach appears very attractive, there are several complications
...
Further, if the network partitions, each partition may proceed to
update the same data item, believing that sites in the other partitions are all dead
...

19
...
3 Site Reintegration
Reintegration of a repaired site or link into the system requires care
...
If the site had replicas of any data items, it must obtain
the current values of these data items and ensure that it receives all future updates
...

An easy solution is to halt the entire system temporarily while the failed site rejoins
it
...

Techniques have been developed to allow failed sites to reintegrate while
updates to data items proceed concurrently
...
If a failed link recovers, two or more partitions can be rejoined
...
See the bibliographical
notes for more information on recovery in distributed systems
...
6
...
10, and replication in distributed databases are two alternative approaches to providing high availability
...
In particular, remote backup
systems help avoid two-phase commit, and its resultant overheads
...
Thus remote backup systems offer a
lower-cost approach to high availability than replication
...

19
...
5 Coordinator Selection
Several of the algorithms that we have presented require the use of a coordinator
...
One way to
continue execution is by maintaining a backup to the coordinator, which is ready to
assume responsibility if the coordinator fails
...
All messages directed to the coordinator are received
by both the coordinator and its backup
...
The only difference
in function between the coordinator and its backup is that the backup does not take
any action that affects other sites
...

In the event that the backup coordinator detects the failure of the actual coordinator, it assumes the role of coordinator
...

The prime advantage to the backup approach is the ability to continue processing
immediately
...
Frequently, the only source
of some of the requisite information is the failed coordinator
...

Thus, the backup-coordinator approach avoids a substantial amount of delay while
the distributed system recovers from a coordinator failure
...
Furthermore, a coordinator and its backup need to communicate regularly to ensure that their activities are
synchronized
...

In the absence of a designated backup coordinator, or in order to handle multiple
failures, a new coordinator may be chosen dynamically by sites that are live
...
Election algorithms require that a unique identiﬁcation number be
associated with each active site in the system
...
To keep the notation and the
discussion simple, assume that the identiﬁcation number of site Si is i and that the
chosen coordinator will always be the active site with the largest identiﬁcation number
...
The algorithm must send this number to each active
site in the system
...
Suppose that site Si
sends a request that is not answered by the coordinator within a prespeciﬁed time

...
In this situation, it is assumed that the coordinator has failed, and Si tries
to elect itself as the site for the new coordinator
...
Site Si then waits, for a time interval T, for an answer from any one of these sites
...

If Si does receive an answer, it begins a time interval T′, to receive a message
informing it that a site with a higher identiﬁcation number has been elected
...
)
If Si receives no message within T′, then it assumes the site with a higher number
has failed, and site Si restarts the algorithm
...

If there are no active sites with higher numbers, the recovered site forces all sites with
lower numbers to let it become the coordinator site, even if there is a currently active
coordinator with a lower number
...
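The outcome of the election can be illustrated with a simplified, synchronous simulation. It assumes that every active site answers immediately, so the timeouts are not modelled, and it follows a single chain of elections rather than the concurrent messages a real system would exchange. The function name `elect` is hypothetical.

```python
from typing import Set


def elect(initiator: int, active: Set[int]) -> int:
    """Return the site that ends up as coordinator when `initiator` starts an election."""
    if initiator not in active:
        raise ValueError("the initiating site must be active")
    current = initiator
    while True:
        higher = sorted(s for s in active if s > current)
        if not higher:
            # No higher-numbered site answered: `current` elects itself.
            return current
        # A higher-numbered site answered, so `current` stands down and that
        # site runs the same algorithm in turn.
        current = higher[0]
```

For example, `elect(2, {1, 2, 4, 5})` yields site 5, the active site with the largest identification number, regardless of which active site starts the election.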

19
...
We examined several techniques for choosing a strategy for processing a
query that minimize the amount of time that it takes to compute the answer
...
In a distributed system, we must take into account
several other matters, including
• The cost of data transmission over the network
• The potential gain in performance from having several sites process parts of
the query in parallel
The relative cost of data transfer over the network and data transfer to and from disk
varies widely depending on the type of network and on the speed of the disks
...
Rather, we must
ﬁnd a good tradeoff between the two
...
7
...
” Although the query is simple — indeed, trivial—processing it is not trivial, since the
account relation may be fragmented, replicated, or both, as we saw in Section 19
...

If the account relation is replicated, we have a choice of replica to make
...

However, if a replica is fragmented, the choice is not so easy to make, since we need
to compute several joins or unions to reconstruct the account relation
...
Query optimization
by exhaustive enumeration of all alternative strategies may not be practical in such
situations
...
The result is the expression
σ_{branch-name = “Hillside”}(account1) ∪ σ_{branch-name = “Hillside”}(account2)
which includes two subexpressions
...
The second involves only account2, and thus can be
evaluated at the Valleyview site
...
In evaluating
σ_{branch-name = “Hillside”}(account2)
we can apply the definition of the account2 fragment to obtain
σ_{branch-name = “Hillside”}(σ_{branch-name = “Valleyview”}(account))
This expression is the empty set, regardless of the contents of the account relation
...
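The elimination of empty subexpressions can be sketched in a few lines. This is an illustrative sketch only: it assumes each fragment is defined by a single equality predicate on branch-name, and that account1 holds the Hillside tuples (the text above confirms only the definition of account2). The names FRAGMENTS and fragments_to_query are hypothetical.

```python
# Hypothetical fragment definitions: each fragment holds one branch's accounts.
FRAGMENTS = {
    "account1": "Hillside",    # assumed definition, not stated in this extract
    "account2": "Valleyview",
}


def fragments_to_query(branch_name):
    """Return the fragments that can contain tuples with the given branch-name.

    A fragment defined by branch-name = b can contribute to the selection
    branch-name = x only when b == x; otherwise the subexpression is empty.
    """
    return [name for name, b in FRAGMENTS.items() if b == branch_name]


print(fragments_to_query("Hillside"))
```

A query for a branch that matches no fragment definition yields an empty list, so no site needs to be contacted at all.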

19
...
2 Simple Join Processing
As we saw in Chapter 13, a major decision in the selection of a query-processing strategy is choosing a join strategy
...
Let SI denote the site
at which the query was issued
...

Among the possible strategies for processing this query are these:

...
Using the techniques of Chapter 13,
choose a strategy for processing the entire query locally at site SI
...
Ship temp1 from S2 to S3, and compute temp2 = temp1 ⋈ branch
at S3
...

• Devise strategies similar to the previous one, with the roles of S1, S2, S3 exchanged
...
Among the factors that must be considered
are the volume of data being shipped, the cost of transmitting a block of data between a pair of sites, and the relative speed of processing at each site
...
If we ship all three relations to SI , and indices exist on
these relations, we may need to re-create these indices at SI
...
However, the second
strategy has the disadvantage that a potentially large relation (customer ⋈ account)
must be shipped from S2 to S3
...
Thus, the second strategy may result in
extra network transmission compared to the ﬁrst strategy
...
7
...
Let the schemas of r1 and r2 be R1 and R2
...
If there are many tuples of r2 that do not
join with any tuple of r1 , then shipping r2 to S1 entails shipping tuples that fail to
contribute to the result
...

A possible strategy to accomplish all this is:
1
...

2
...

3
...

4
...

5
...
The resulting relation is the same as r1 ⋈ r2
...
In step 3, temp2 has the result of r2 ⋈ Π_{R1 ∩ R2}(r1)
...

⋈ r2, the


This strategy is particularly advantageous when relatively few tuples of r2 contribute to the join
...
In such a case, temp2 may have signiﬁcantly
fewer tuples than r2
...
Additional cost is incurred in shipping temp1 to S2
...

This strategy is called a semijoin strategy, after the semijoin operator of the relational algebra, denoted ⋉
...
In step 3, temp2 = r2 ⋉ r1
...
A substantial body of theory has been developed regarding the use of semijoins
for query optimization
...
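The semijoin strategy can be sketched on in-memory relations. This is a minimal sketch under stated assumptions: relations are lists of dictionaries, "shipping" is just passing a value, and the step order is reconstructed from the surrounding text (project r1 at S1, ship temp1 to S2, filter r2 there, ship temp2 back, join at S1). The function names and sample tuples are hypothetical.

```python
def project(relation, attrs):
    """Duplicate-free projection of a list of dict tuples onto attrs."""
    return {tuple(t[a] for a in attrs) for t in relation}


def semijoin_strategy(r1, r2, common):
    """Compute the natural join of r1 (at S1) and r2 (at S2) via a semijoin."""
    # Step 1 (at S1): project r1 onto the attributes common to both schemas.
    temp1 = project(r1, common)
    # Step 2: ship temp1 to S2. Step 3 (at S2): keep the matching tuples of r2.
    temp2 = [t for t in r2 if tuple(t[a] for a in common) in temp1]
    # Step 4: ship temp2 to S1. Step 5 (at S1): join r1 with temp2.
    result = [
        {**a, **b}
        for a in r1
        for b in temp2
        if all(a[c] == b[c] for c in common)
    ]
    return temp2, result


r1 = [{"A": 1, "C": 3}, {"A": 4, "C": 6}]
r2 = [{"C": 3, "D": 4}, {"C": 2, "D": 3}, {"C": 1, "D": 4}]
temp2, joined = semijoin_strategy(r1, r2, ["C"])
print(len(temp2), "of", len(r2), "tuples of r2 shipped")
print(joined)
```

In this sample only one of the three tuples of r2 is shipped, which is the situation in which the strategy pays for the extra cost of shipping temp1.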

19
...
4 Join Strategies that Exploit Parallelism
Consider a join of four relations:
r1 ⋈ r2 ⋈ r3 ⋈ r4

where relation ri is stored at site Si
...
There are many possible strategies for parallel evaluation
...
) In one such strategy, r1 is
shipped to S2, and r1 ⋈ r2 computed at S2
...
Site S2 can ship tuples of (r1 ⋈ r2) to S1 as they are
produced, rather than wait for the entire join to be computed
...
Once tuples of (r1 ⋈ r2) and (r3 ⋈ r4) arrive at S1, the
computation of (r1 ⋈ r2) ⋈ (r3 ⋈ r4) can begin, with the pipelined join technique
of Section 13
...
2
...
Thus, computation of the final join result at S1 can be done
in parallel with the computation of (r1 ⋈ r2) at S2, and with the computation of
(r3 ⋈ r4) at S4
...
8 Heterogeneous Distributed Databases
Many new database applications require data from a variety of preexisting databases
located in a heterogeneous collection of hardware and software environments
...
This software layer
is called a multidatabase system
...
A multidatabase system creates the illusion of logical database integration without requiring
physical database integration
...

Full integration of heterogeneous systems into a homogeneous distributed database is often difﬁcult or impossible:
• Technical difﬁculties
...

• Organizational difﬁculties
...
In such cases, it is important for a multidatabase system to allow the local database systems to retain a high degree of
autonomy over the local database and transactions running against that data
...
In this section, we provide an overview of the challenges faced
in constructing a multidatabase environment from the standpoint of data deﬁnition
and query processing
...
6 provides an overview of transaction management
issues in multidatabases
...
8
...
For instance, some may employ the relational model, whereas others may employ older
data models, such as the network model (see Appendix A) or the hierarchical model
(see Appendix B)
...
A commonly used
choice is the relational model, with SQL as the common query language
...

Another difﬁculty is the provision of a common conceptual schema
...
The multidatabase system must integrate
these separate schemas into one common schema
...

Schema integration is not simply straightforward translation between data-deﬁnition languages
...
The data types used in one system may not be supported by
other systems, and translation between types may not be simple
...
At the semantic level, an integer value for length may be inches in one system and millimeters in another, thus
creating an awkward situation in which equality of integers is only an approximate
notion (as is always the case for ﬂoating-point numbers)
...
For example, a system based in the
United States may refer to the city “Cologne,” whereas one in Germany refers to it as
“Köln”
...
Translation functions must be provided
...
As we noted earlier,
the alternative of converting each database to a common format may not be feasible
without obsoleting existing application programs
...
8
...
Some of the issues
are:
• Given a query on a global schema, the query may have to be translated into
queries on local schemas at each of the sites where the query has to be executed
...

The task is simpliﬁed by writing wrappers for each data source, which provide a view of the local data in the global schema
...
Wrappers may be provided by individual
sites, or may be written separately as part of the multidatabase system
...

• Some data sources may provide only limited query capabilities; for instance,
they may support selections, but not joins
...
Queries may therefore
have to be broken up, to be partly performed at the data source and partly at
the site issuing the query
...
Answers retrieved from the sites may have to be processed to remove
duplicates
...

A query on the entire account relation would require access to both sites and
removal of duplicate answers resulting from tuples with balance between 50
and 100, which are replicated at both sites
...
The usual solution is to rely on only local-level optimization, and just use heuristics at the global level
...
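The duplicate-removal step mentioned in the list above can be sketched as follows. This is an illustration under assumptions: the two sites hold overlapping portions of the account relation, tuples are plain (account-number, balance) pairs, and the sample data and the name merge_answers are hypothetical.

```python
def merge_answers(*site_answers):
    """Union the answers from several sites, dropping replicated tuples."""
    seen = set()
    merged = []
    for answers in site_answers:
        for t in answers:
            if t not in seen:
                seen.add(t)
                merged.append(t)
    return merged


# Hypothetical contents: the tuple with balance 70 is stored at both sites.
site1 = [("A-101", 40), ("A-215", 70)]
site2 = [("A-215", 70), ("A-305", 350)]
merged = merge_answers(site1, site2)
print(merged)
```

The sketch keeps the first copy of each tuple and preserves arrival order; a real system would more likely remove duplicates by sorting or hashing on the full tuple.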
Unlike full-ﬂedged multidatabase systems, mediator systems do not
bother about transaction processing
...

often used in an interchangeable fashion, and systems that are called mediators may
support limited forms of transactions
...

19
...
In the precomputerization days, organizations would create physical
directories of employees and distribute them across the organization
...

In general, a directory is a listing of information about some class of objects such as
persons
...
In the world of physical telephone directories, directories that satisfy lookups in the forward direction are
called white pages, while directories that satisfy lookups in the reverse direction are
called yellow pages
...
However, directories today need to be available over a
computer network, rather than in a physical (paper) form
...
9
...
Such interfaces are good for humans
...
Directories can
be used for storing other types of information, much like ﬁle system directories
...
A user can thus access the same settings from multiple locations,
such as at home and at work, without having to share a ﬁle system
...
The most widely used among them today is the
Lightweight Directory Access Protocol (LDAP)
...
The
question then is, why come up with a specialized protocol for accessing directory
information? There are at least two answers to the question
...
They evolved in parallel with the database access
protocols
...
For example, a particular directory server may store information for Bell Laboratories employees in Murray
Hill, while another may store information for Bell Laboratories employees in
Bangalore, giving both sites autonomy in controlling their local data
...
More importantly, the directory system can be set up to automatically forward queries made at one site to the other site, without user intervention
...
As may be expected, several directory implementations ﬁnd it beneﬁcial to use relational databases to store data, instead of creating
special-purpose storage systems
...
9
...
Clients use the application programmer interface defined by the directory system to communicate with the directory servers
...

The X
...
However,
the protocol is rather complex, and is not widely used
...
500 features, but with less complexity, and is widely used
...

19
...
2
...
Each entry must have a
distinguished name (DN), which uniquely identiﬁes the entry
...
For example, an entry may
have the following distinguished name
...
The order of the components of a distinguished name reﬂects the normal postal address order, rather than
the reverse order used in specifying path names for ﬁles
...

Entries can also have attributes
...
Unlike those in the relational model, attributes

...

LDAP allows the deﬁnition of object classes with attribute names and types
...
Moreover, entries can be speciﬁed to
be of one or more object classes
...

Entries are organized into a directory information tree (DIT), according to their
distinguished names
...
Entries that are internal nodes represent objects such as organizational units,
organizations, or countries
...
For instance, an internal node may
have a DN c=USA, and all entries below it have the value USA for the RDN c
...

Entries may have more than one distinguished name — for example, an entry for a
person in more than one organization
...

19
...
2
...
However, LDAP deﬁnes a network protocol for carrying out data
deﬁnition and manipulation
...
LDAP also deﬁnes a ﬁle format called LDAP Data Interchange
Format (LDIF) that can be used for storing and exchanging information
...
A query must specify the following:
• A base — that is, a node within a DIT — by giving its distinguished name (the
path from the root to the node)
...
Equality, matching by wild-card characters, and approximate equality (the exact deﬁnition of approximate equality is system dependent) are supported
...

• Attributes to return
...

The query can also specify whether to automatically dereference aliases; if alias dereferences are turned off, alias entries can be returned as answers
...
Examples of
LDAP URLs are:

ldap://aura
...
bell-labs
...
research
...
com/o=Lucent,c=USA??sub?cn=Korth
The ﬁrst URL returns all attributes of all entries at the server with organization being
Lucent, and country being USA
...
The
question marks in the URL separate different ﬁelds
...
The second ﬁeld, the list of attributes to return, is left
empty, meaning return all attributes
...
The last parameter is the search condition
...
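The field layout of an LDAP URL can be made concrete with a small parser. This is a rough sketch, not a complete implementation: it splits on question marks only, ignores percent-encoding and extensions, and uses a hypothetical host name (ldap.example.com) in place of the server named in the text. The function name parse_ldap_url is also hypothetical.

```python
def parse_ldap_url(url):
    """Split an LDAP URL into host, base DN, attributes, scope, and filter."""
    prefix = "ldap://"
    if not url.startswith(prefix):
        raise ValueError("not an LDAP URL")
    host, _, rest = url[len(prefix):].partition("/")
    # Question marks separate the fields; missing trailing fields are empty.
    fields = rest.split("?") + [""] * 4
    base_dn, attrs, scope, search_filter = fields[:4]
    return {
        "host": host,
        "base": base_dn,
        # An empty attribute list means "return all attributes".
        "attributes": attrs.split(",") if attrs else [],
        "scope": scope,
        "filter": search_filter,
    }


parsed = parse_ldap_url("ldap://ldap.example.com/o=Lucent,c=USA??sub?cn=Korth")
print(parsed)
```

For the sample URL the attribute list is empty, the scope is sub, and the search condition is cn=Korth, mirroring the field-by-field reading given above.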
Figure 19
...
The code first opens a connection to an LDAP
server by ldap_open and ldap_bind
...
The
arguments to ldap_search_s are the LDAP connection handle, the DN of the base from
which the search should be done, the scope of the search, the search condition, the
list of attributes to be returned, and an attribute called attrsonly, which, if set to 1,
would result in only the schema of the result being returned, without any actual tuples
...

The ﬁrst for loop iterates over and prints each entry in the result
...

Since attributes in LDAP may be multivalued, the third for loop prints each value of
an attribute
...
Figure 19
...

The LDAP API also contains functions to create, update, and delete entries, as well
as other operations on the DIT
...

19
...
2
...
The sufﬁx of a DIT is a sequence of RDN=value
pairs that identify what information the DIT stores; the pairs are concatenated to the
rest of the distinguished name generated by traversing from the entry to the root
...
The DITs may be organizationally and geographically
separated
...

Referrals are the key component that help organize a distributed collection of directories into an integrated system
...

#include ...
h>
main() {
LDAP *ld;
LDAPMessage *res, *entry;
char *dn, *attr, *attrList[] = {"telephoneNumber", NULL};
BerElement *ptr;
int vals, i;
ld = ldap_open("aura
...
bell-labs
...
6

Example of LDAP code in C
...
Access to the referenced DIT is transparent, proceeding without the user’s knowledge
...

The hierarchical naming mechanism used by LDAP helps break up control of information across parts of an organization
...

Although it is not an LDAP requirement, organizations often choose to break up
information either by geography (for instance, an organization may maintain a directory for each site where the organization has a large presence) or by organizational
structure (for instance, each organizational unit, such as department, maintains its
own directory)
...
Work
on standardizing replication in LDAP is in progress
...
10 Summary
• A distributed database system consists of a collection of sites, each of which
maintains a local database system
...
In addition, a
site may participate in the execution of global transactions, that is, transactions
that access data in several sites
...

• Distributed databases may be homogeneous, where all sites have a common
schema and database system code, or heterogeneous, where the schemas and
system codes may differ
...
It is essential that the system
minimize the degree to which a user needs to be aware of how a relation is
stored
...
There are, however, additional failures with which we
need to deal in a distributed environment, including the failure of a site, the
failure of a link, loss of a message, and network partition
...

• To ensure atomicity, all the sites in which a transaction T executed must agree
on the ﬁnal outcome of the execution
...
To ensure this property, the transaction coordinator of T must execute
a commit protocol
...

• The two-phase commit protocol may lead to blocking, the situation in which
the fate of a transaction cannot be determined until a failed site (the coordinator) recovers
...

• Persistent messaging provides an alternative model for handling distributed
transactions
...
Persistent messages (which are guaranteed to be
delivered exactly once, regardless of failures) are sent to remote sites to request actions to be taken there
...


In the case of locking protocols, the only change that needs to be incorporated is in the way that the lock manager is implemented
...
One or more central coordinators
may be used
...

Protocols for handling replicated data include the primary-copy, majority,
biased, and quorum-consensus protocols
...

In the case of timestamping and validation schemes, the only needed
change is to develop a mechanism for generating unique global timestamps
...
Such facilities must be used with great care, since they may result
in nonserializable executions
...

• To provide high availability, a distributed database must detect failures, reconﬁgure itself so that computation may continue, and recover when a processor
or a link is repaired
...

The majority protocol can be extended by using version numbers to permit transaction processing to proceed even in the presence of failures
...
Less-expensive protocols are available to deal with site failures, but they
assume network partitioning does not occur
...
To provide
high availability, the system must maintain a backup copy that is ready to assume responsibility if the coordinator fails
...
The algorithms that determine which site should act as a coordinator are called election algorithms
...
Several
optimization techniques are available to choose which sites need to be accessed
...

• Heterogeneous distributed databases allow sites to have their own schemas
and database system code
...
The local database systems may employ different logical models
and data-definition and data-manipulation languages, and may differ in
their concurrency-control and transaction-management mechanisms
...

• Directory systems can be viewed as a specialized form of database, where
information is organized in a hierarchical fashion similar to the way ﬁles are
organized in a ﬁle system
...

Directories can be distributed across multiple sites to provide autonomy to
individual sites
...

Review Terms
• Homogeneous distributed database
• Heterogeneous distributed database
• Data replication
• Primary copy
• Data fragmentation
  Horizontal fragmentation
  Vertical fragmentation
• Data transparency
  Fragmentation transparency
  Replication transparency
  Location transparency
• Name server
• Aliases
• Distributed transactions
  Local transactions
  Global transactions
• Transaction manager
• Transaction coordinator
• System failure modes
• Network partition
• Commit protocols
• Two-phase commit protocol (2PC)
  Ready state
  In-doubt transactions
  Blocking problem
• Three-phase commit protocol (3PC)
• Persistent messaging
• Concurrency control
• Single lock-manager
• Distributed lock-manager
• Protocols for replicas
  Primary copy
  Majority protocol
  Biased protocol
  Quorum consensus protocol
• Timestamping
• Master–slave replication
• Multimaster (update-anywhere) replication
• Transaction-consistent snapshot
• Lazy propagation
• Deadlock handling
  Local wait-for graph
  Global wait-for graph
  False cycles
• Availability
• Robustness
• Coordinator selection
• Backup coordinator
• Election algorithms
• Bully algorithm
• Distributed query processing
• Semijoin strategy
• Multidatabase system
• Autonomy
• Mediators
• Virtual database
• Directory systems
• LDAP: Lightweight directory access protocol
  Distinguished name (DN)
  Relative distinguished names (RDNs)
  Directory information tree (DIT)
• Distributed directory trees
• DIT suffix
• Referral

Exercises
19
...

19
...

19
...
4 When is it useful to have replication or fragmentation of data? Explain your
answer
...
5 Explain the notions of transparency and autonomy
...
6 To build a highly available distributed system, you must know what kinds of
failures can occur
...
List possible types of failure in a distributed system
...
Which items in your list from part a are also applicable to a centralized
system?
19
...
For each possible
failure that you listed in Exercise 19
...

19
...
Can site A distinguish
among the following?
• B goes down
...

• B is extremely overloaded and response time is 100 times longer than normal
...
9 The persistent messaging scheme described in this chapter depends on timestamps combined with discarding of received messages if they are too old
...

19
...

19
...
Suppose we modify that protocol as follows:
• Only intention-mode locks are allowed on the root
...

Show that these modiﬁcations alleviate this problem without allowing any
nonserializable schedules
...
12 Explain the difference between data replication in a distributed system and the
maintenance of a remote backup site
...
13 Give an example where lazy replication can lead to an inconsistent database
state even when updates get an exclusive lock on the primary (master) copy
...
14 Study and summarize the facilities that the database system you are using provides for dealing with inconsistent states that can be reached with lazy propagation of updates
...
15 Discuss the advantages and disadvantages of the two methods that we presented in Section 19
...
2 for generating globally unique timestamps
...
16 Consider the following deadlock-detection algorithm
...
The edge (Ti, Tj, n) is inserted in the local wait-for graph of S1
...
A request from Ti to Tj in the same site is handled in the usual manner;
no timestamps are associated with the edge (Ti , Tj )
...

On receiving this message, a site sends its local wait-for graph to the coordinator
...
The wait-for graph reﬂects an instantaneous
state of the site, but it is not synchronized with respect to any other site
...

• The graph has an edge (Ti , Tj ) if and only if


There is an edge (Ti , Tj ) in one of the wait-for graphs
...

Show that, if there is a cycle in the constructed graph, then the system is in a
deadlock state, and that, if there is no cycle in the constructed graph, then the
system was not in a deadlock state when the execution of the algorithm began
...
17 Consider a relation that is fragmented horizontally by plant-number:
employee (name, address, salary, plant-number)
Assume that each fragment has two replicas: one stored at the New York site
and one stored locally at the plant site
...

a
...

b
...

c
...

d
...

19
...
Assume
that the machine relation is stored in its entirety at the Armonk site
...

a
...

b
...
”
c
...

d
...

19
...
18, state how your choice of a strategy
depends on:
a
...
The site at which the result is desired
19
...
7
...
21 Is ri ⋉ rj necessarily equal to rj ⋉ ri? Under what conditions does ri ⋉ rj = rj ⋉ ri hold?

19
...

19
...
7

D   E
4   5
6   8
3   2
4   1
2   3
s
Relations for Exercise 19
...

19
...

Bibliographical Notes
Textbook discussions of distributed databases are offered by Ozsu and Valduriez
[1999] and Ceri and Pelagatti [1984]
...
Rothnie et al
...
Breitbart et al
...

The implementation of the transaction concept in a distributed database is presented by Gray [1981], Traiger et al
...
[1991]
...
The three-phase commit protocol is from Skeen [1981]
...

The bully algorithm in Section 19
...
5 is from Garcia-Molina [1982]
...
Distributed concurrency control is covered by Rosenkrantz et al
...
[1978], Bernstein et al
...
[1980], Bernstein and Goodman [1980], Bernstein and Goodman [1981a], Bernstein and Goodman [1982], and Garcia-Molina and Wiederhold
[1982]
...
[1986]
...
Validation techniques for distributed concurrency-control schemes are described by Schlageter [1981], Ceri and Owicki [1983], and
Bassiouni [1988]
...

Attar et al
...
A survey of techniques for recovery in distributed
database systems is presented by Kohler [1981]
...
Problems in this environment are discussed in Gray et al
...
Anderson et al
...
Breitbart et al
...
The user manuals of various database systems provide details of how they handle replication and consistency
...

Distributed deadlock-detection algorithms are presented by Rosenkrantz et al
...
[1983], and Obermarck [1982]
...
16 is from Stuart et al
...

Distributed query processing is discussed in Wong [1977], Epstein et al
...
[1983], Ceri and
Pelagatti [1983], and Wong [1983]
...
[1982]
discuss the approach to distributed query processing taken by R* (a distributed version of System R)
...
The performance results also serve to validate
the cost model used in the R* query optimizer
...
[1982]
...
[1997]
...
[1996] and Papakonstantinou et al
...

Weltman and Dahbura [2000] and Howes et al
...
Kapitskaia et al
...

Chapter 20

Parallel Databases

In this chapter, we discuss fundamental algorithms for parallel database systems that
are based on the relational data model
...

20
...
Today, they are successfully marketed by practically every database system vendor
...
Moreover, the growth of the World Wide Web has created
many sites with millions of viewers, and the increasing amounts of data collected from these viewers have produced extremely large databases at many
companies
...
Queries
used for such purposes are called decision-support queries, and the data requirements for such queries may run into terabytes
...

• The set-oriented nature of database queries naturally lends itself to parallelization
...

• As microprocessors have become cheap, parallel machines have become common and relatively inexpensive
...
Parallelism is also used to provide scaleup, where increasing workloads
are handled without increased response time, via an increase in the degree of parallelism
...
Brieﬂy,
in shared-memory architectures, all processors share a common memory and disks;
in shared-disk architectures, processors have independent memories, but share disks;
in shared-nothing architectures, processors share neither memory nor disks; and hierarchical architectures have nodes that share neither memory nor disks with each
other, but internally each node has a shared-memory or a shared-disk architecture
...
2 I/O Parallelism
In its simplest form, I/O parallelism refers to reducing the time required to retrieve
relations from disk by partitioning the relations on multiple disks
...

In horizontal partitioning, the tuples of a relation are divided (or declustered) among
many disks, so that each tuple resides on one disk
...

20
...
1 Partitioning Techniques
We present three basic data-partitioning strategies
...
, Dn−1 , across which the data are to be partitioned
...
This strategy scans the relation in any order and sends the ith
tuple to disk number D_{i mod n}
...

• Hash partitioning
...
A hash
function is chosen whose range is {0, 1,
...
Each tuple of the original
relation is hashed on the partitioning attributes
...

• Range partitioning
...
It chooses a partitioning attribute, A, as a partitioning
vector
...
Let [v0 , v1 ,
...
Consider a tuple t such
that t[A] = x
...
If x ≥ v_{n−2}, then t goes on disk
D_{n−1}
...

For example, range partitioning with three disks numbered 0, 1, and 2 may
assign tuples with values less than 5 to disk 0, values between 5 and 40 to disk
1, and values greater than 40 to disk 2
...
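The three partitioning strategies can be compared side by side in code. This is a minimal sketch under assumptions: disks are numbered 0 to n-1, Python's built-in hash of an integer stands in for the hash function, and the partitioning vector [5, 40] is chosen to match the three-disk example above. The function names are hypothetical.

```python
import bisect


def round_robin_disk(i, n):
    """Disk for the i-th tuple (0-based scan order) under round-robin."""
    return i % n


def hash_disk(key, n):
    """Disk for a tuple under hash partitioning.

    Assumption: the built-in hash of an int stands in for the hash function;
    any function whose range is {0, ..., n-1} would serve.
    """
    return hash(key) % n


def range_disk(x, vector):
    """Disk for attribute value x under range partitioning.

    With vector [v0, ..., v(n-2)]: x < v0 goes to disk 0, vi <= x < v(i+1)
    goes to disk i+1, and x >= v(n-2) goes to disk n-1.
    """
    return bisect.bisect_right(vector, x)


vector = [5, 40]
for value in (2, 5, 39, 41):
    print(value, "-> disk", range_disk(value, vector))
```

Because range_disk is a binary search over the vector, the same lookup also tells a range query which contiguous run of disks it needs to visit.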

20
...
2 Comparison of Partitioning Techniques
Once a relation has been partitioned among several disks, we can retrieve it in parallel, using all the disks
...
Thus, the transfer rates for reading or writing an entire
relation are much faster with I/O parallelism than without it
...
Access to data
can be classiﬁed as follows:
1
...
Locating a tuple associatively (for example, employee-name = “Campbell”);
these queries, called point queries, seek tuples that have a speciﬁed value
for a speciﬁc attribute
3
...

The different partitioning techniques support these types of access at different levels
of efﬁciency:
• Round-robin
...
With this scheme, both point
queries and range queries are complicated to process, since each of the n disks
must be used for the search
...
This scheme is best suited for point queries based on the
partitioning attribute
...
Directing a query to a single disk saves the startup cost of initiating a query on multiple disks, and
leaves the other disks free to process other queries
...

If the hash function is a good randomizing function, and the partitioning attributes form a key of the relation, then the number of tuples in each of the
disks is approximately the same, without much variance
...

The scheme, however, is not well suited for point queries on nonpartitioning attributes
...
Therefore, all the disks need to be scanned for range queries
to be answered
...
This scheme is well suited for point and range queries on
the partitioning attribute
...
For range queries, we consult
the partitioning vector to find the range of disks on which the tuples may
reside
...

An advantage of this feature is that, if there are only a few tuples in the
queried range, then the query is typically sent to one disk, as opposed to
all the disks
...
On the other hand, if there are many tuples in the queried range (as
there are when the queried range is a larger fraction of the domain of the relation), many tuples have to be retrieved from a few disks, resulting in an I/O
bottleneck (hot spot) at those disks
...
In contrast, hash partitioning and round-robin partitioning would engage all the disks for such queries,
giving a faster response time for approximately the same throughput
...
5
...
In general, hash partitioning or range
partitioning are preferred to round-robin partitioning
...
Large relations are preferably partitioned across all the available disks
...

20
...
3 Handling of Skew
When a relation is partitioned (by a technique other than round-robin), there may be
a skew in the distribution of tuples, with a high percentage of tuples placed in some
partitions and fewer tuples in other partitions
...
All the tuples with the same value for the partitioning
attribute end up in the same partition, resulting in skew
...

Attribute-value skew can result in skewed partitioning regardless of whether range
partitioning or hash partitioning is used
...
Partition skew is less likely with hash
partitioning, if a good hash function is chosen
...

As Section 18
...
1 noted, even a small skew can result in a signiﬁcant decrease in
performance
...
For example, if a relation of 1000 tuples is divided into 10 parts, and the division is skewed, then there may be some partitions of size less than 100 and some
partitions of size more than 100; if even one partition happens to be of size 200, the
speedup that we would obtain by accessing the partitions in parallel is only 5, instead
of the 10 for which we would have hoped
...
If even one partition has
40 tuples (which is possible, given the large number of partitions) the speedup that
we would obtain by accessing them in parallel would be 25, rather than 100
...
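
The arithmetic behind these figures can be checked with a small sketch. It assumes, as the examples do implicitly, that the time to process a partition is proportional to its size, so the parallel time is set by the largest partition. The function name and the concrete partition sizes are hypothetical; the second case assumes 1000 tuples in 100 partitions, which is consistent with the speedups of 25 and 100 quoted above.

```python
def parallel_speedup(partition_sizes):
    """Speedup over a sequential scan when all partitions are read in
    parallel and the largest partition determines the elapsed time."""
    return sum(partition_sizes) / max(partition_sizes)

balanced = [100] * 10                    # 1000 tuples, 10 equal parts
skewed = [200] + [100] * 7 + [50, 50]    # still 1000 tuples, one part of 200
many_parts = [40] + [10] * 93 + [5] * 6  # 1000 tuples, 100 parts, one of 40

speedups = [parallel_speedup(p) for p in (balanced, skewed, many_parts)]
# speedups == [10.0, 5.0, 25.0]
```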

A balanced range-partitioning vector can be constructed by sorting: The relation
is ﬁrst sorted on the partitioning attributes
...
After every 1/n of the relation has been read, the value of the partitioning
attribute of the next tuple is added to the partition vector
...
In case there are many tuples with the same value
for the partitioning attribute, the technique can still result in some skew
...
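
The sort-based construction can be sketched as follows. This is a minimal in-memory illustration (a real system would scan a sorted relation on disk); the function name is hypothetical, and a vector entry v is taken to mean that values greater than or equal to v start the next partition.

```python
def balanced_partition_vector(values, n):
    """Build a range-partitioning vector with n - 1 entries by sorting
    the partitioning-attribute values and sampling every 1/n of them."""
    ordered = sorted(values)
    size = len(ordered)
    if size == 0:
        return []
    vector = []
    for k in range(1, n):
        # After k/n of the relation has been read, record the value of
        # the next tuple as a partition boundary.
        vector.append(ordered[(k * size) // n])
    return vector

even = balanced_partition_vector(range(1, 13), 3)           # [5, 9]
duplicates = balanced_partition_vector([7] * 10 + [1, 2], 3)  # [7, 7]
```

The second call shows the residual skew mentioned above: when one value dominates, the boundaries coincide and all tuples with that value still land in a single partition.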

The I/O overhead for constructing balanced range-partition vectors can be reduced by constructing and storing a frequency table, or histogram, of the attribute
values for each attribute of each relation
...
1 shows an example of a histogram for an integer-valued attribute that takes values in the range 1 to 25
...
It is straightforward to construct a balanced
range-partitioning function given a histogram on the partitioning attributes
...
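
One way to derive a partition vector from a histogram is sketched below. The bucket boundaries and frequencies are hypothetical (the figure's actual frequencies are not reproduced here), and the sketch assumes that values are spread uniformly within each bucket, which is an approximation rather than something the histogram guarantees.

```python
def partition_vector_from_histogram(buckets, n):
    """buckets: list of (low, high, count) with inclusive integer ranges,
    in ascending order. Returns n - 1 boundary values; a boundary v means
    that values >= v belong to the next partition."""
    total = sum(count for _, _, count in buckets)
    vector = []
    seen = 0  # tuples in buckets entirely to the left of the current one
    k = 1     # index of the next boundary to place
    for low, high, count in buckets:
        while k < n and count > 0 and seen + count >= k * total / n:
            # Fraction of this bucket that must fall left of boundary k.
            fraction = (k * total / n - seen) / count
            width = high - low + 1
            vector.append(low + round(fraction * width))
            k += 1
        seen += count
    return vector

histogram = [(1, 10, 50), (11, 20, 30), (21, 30, 20)]  # hypothetical data
vector = partition_vector_from_histogram(histogram, 4)  # [6, 11, 19]
```

Only the histogram is read, not the relation itself, which is where the saving in I/O comes from.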

[Figure: histogram. Vertical axis: frequency (ticks at 10, 20, 30, 40, 50). Horizontal axis: value ranges 1–5, 6–10, and so on.]

Figure 20
...

Another approach to minimizing the effect of skew, particularly with range partitioning, is to use virtual processors
...

Any of the partitioning techniques and query evaluation techniques that we study
later in this chapter can be used, but they map tuples and work to virtual processors
instead of to real processors
...

The idea is that even if one range had many more tuples than the others because
of skew, these tuples would get split across multiple virtual processor ranges
...
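
The effect of virtual processors can be illustrated with a small sketch. The data, the partition vectors, and the round-robin mapping of virtual processors to real processors (virtual processor i runs on real processor i mod n) are assumptions made for the example.

```python
from bisect import bisect_right
from collections import Counter

def load_per_real_processor(values, vector, n_real):
    """Range-partition values into len(vector) + 1 virtual processors,
    then map virtual processor i to real processor i mod n_real."""
    loads = Counter()
    for v in values:
        virtual_id = bisect_right(vector, v)
        loads[virtual_id % n_real] += 1
    return loads

# Skewed data on the domain 0..99: 80 tuples fall in 0..9, 20 in 50..99.
values = ([v for v in range(10) for _ in range(8)]
          + [v for v in range(50, 100, 5) for _ in range(2)])

# Two ranges on two processors: the hot range lands on one processor.
direct = load_per_real_processor(values, [50], 2)
# Twenty virtual ranges of width 5: the hot range is split across two
# virtual processors, which map to different real processors.
virtual = load_per_real_processor(values, list(range(5, 100, 5)), 2)
```

Here the direct partitioning gives loads of 80 and 20 tuples, while the virtual-processor version gives 50 and 50.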

20
...
Transaction throughput can be increased by this form of parallelism
...
Thus, the primary use of interquery parallelism is to scale up a transaction-processing system to support a larger number of
transactions per second
...
Database systems designed
for single-processor systems can be used with few or no changes on a shared-memory
parallel architecture, since even sequential database systems support concurrent processing
...

Supporting interquery parallelism is more complicated in a shared-disk or shared-nothing architecture
...
A parallel database system must also ensure that two processors do not update
the same data independently at the same time
...
The problem of ensuring that the version is the
latest is known as the cache-coherency problem
...
One such protocol for a shared-disk system is this:
1
...
Immediately after the transaction obtains
either a shared or exclusive lock on a page, it also reads the most recent copy
of the page from the shared disk
...
Before a transaction releases an exclusive lock on a page, it ﬂushes the page to
the shared disk; then, it releases the lock
...
This protocol ensures that, when a transaction sets a shared or exclusive lock on a
page, it gets the correct copy of the page
...
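
The two protocol steps that are visible above (read the page from the shared disk right after locking it, and flush it before releasing an exclusive lock) can be sketched as a toy simulation. The class and method names are hypothetical, and the lock manager itself is deliberately omitted: the sketch shows only where the disk reads and writes happen, not how conflicting lock requests are blocked.

```python
class SharedDisk:
    """Stand-in for the disk that all processors can access."""
    def __init__(self):
        self.pages = {}

    def read(self, page_id):
        return self.pages.get(page_id)

    def write(self, page_id, contents):
        self.pages[page_id] = contents


class Processor:
    def __init__(self, disk):
        self.disk = disk
        self.buffer = {}  # local buffer pool; may hold stale copies

    def lock(self, page_id, mode):
        # Lock acquisition is assumed to have succeeded (not modelled).
        # Immediately afterwards, fetch the most recent copy of the page.
        self.buffer[page_id] = self.disk.read(page_id)

    def update(self, page_id, contents):
        self.buffer[page_id] = contents

    def unlock(self, page_id, mode):
        # Flush before releasing an exclusive ("X") lock.
        if mode == "X":
            self.disk.write(page_id, self.buffer[page_id])


disk = SharedDisk()
disk.write("page7", "balance=100")
p0, p1 = Processor(disk), Processor(disk)

p1.lock("page7", "S"); p1.unlock("page7", "S")  # p1 now caches the old copy
p0.lock("page7", "X"); p0.update("page7", "balance=150"); p0.unlock("page7", "X")
p1.lock("page7", "S")                           # re-read replaces the stale copy
seen_by_p1 = p1.buffer["page7"]
```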
Such protocols do not write pages to disk when exclusive
locks are released
...
The protocols have to be designed to handle concurrent requests
...
When other processors want
to read or write the page, they send requests to the home processor Pi of the page,
since they cannot directly communicate with the disk
...

The Oracle 8 and Oracle Rdb systems are examples of shared-disk parallel database
systems that support interquery parallelism
...
4 Intraquery Parallelism
Intraquery parallelism refers to the execution of a single query in parallel on multiple processors and disks
...
Interquery parallelism does not help in this task, since each
query is run sequentially
...
Suppose that the relation has been partitioned across multiple
disks by range partitioning on some attribute, and the sort is requested on the partitioning attribute
...

Thus, we can parallelize a query by parallelizing individual operations
...
We can parallelize the evaluation of the operator tree by
evaluating in parallel some of the operations that do not depend on one another
...
The two operations can be executed in parallel on separate processors, one generating output that is consumed by the other, even as it is generated
...
We can speed up processing of a query by parallelizing the execution of each individual operation, such as sort, select, project,
and join
...
5
...
We can speed up processing of a query by executing in parallel the different operations in a query expression
...
6
...
Since the number of operations in a typical query is small, compared to
the number of tuples processed by each operation, the ﬁrst form of parallelism can

scale better with increasing parallelism
...

In the following discussion of parallelization of queries, we assume that the queries
are read only
...
Rather than presenting algorithms for each architecture
separately, we use a shared-nothing architecture model in our description
...

We can simulate this model easily by using the other architectures, since transfer
of data can be done via shared memory in a shared-memory architecture, and via
shared disks in a shared-disk architecture
...
We mention occasionally how
the algorithms can be further optimized for shared-memory or shared-disk systems
...
, Pn−1 , and n disks D0 , D1 ,
...
A real system may have multiple disks per processor
...
However, for simplicity, we assume here that Di is a single disk
...
5 Intraoperation Parallelism
Since relational operations work on relations containing large sets of tuples, we can
parallelize the operations by executing them in parallel on different subsets of the relations
...
Thus, intraoperation parallelism is natural in a database system
...
5
...
5
...

20
...
1 Parallel Sort
Suppose that we wish to sort a relation that resides on n disks D0 , D1 ,
...
If the
relation has been range partitioned on the attributes on which it is to be sorted, then,
as noted in Section 20
...
2, we can sort each partition separately, and can concatenate
the results to get the full sorted relation
...

If the relation has been partitioned in any other way, we can sort it in one of two
ways:
1
...

2
...

20
...
1
...
When we sort by range partitioning the relation,
it is not necessary to range-partition the relation on the same set of processors or

...
Suppose that we choose processors
P0 , P1 ,
...
There are two steps involved in this
operation:
1
...

To implement range partitioning, in parallel every processor reads the tuples from its disk and sends the tuples to their destination processor
...
, Pm also receives tuples belonging to its partition, and
stores them locally
...

2
...
Each processor executes the same operation
— namely, sorting — on a different data set
...
)
The ﬁnal merge operation is trivial, because the range partitioning in the
ﬁrst phase ensures that, for 1 ≤ i < j ≤ m, the key values in processor Pi are
all less than the key values in Pj
...
Virtual processor partitioning can also be used to reduce skew
...
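
The two steps of range-partitioning sort can be sketched as follows. This is a sequential simulation under simplifying assumptions: each inner list stands for the tuples on one disk, the loop over disks stands for work that the processors would do in parallel, and the function name and partition vector are hypothetical.

```python
from bisect import bisect_right

def range_partitioning_sort(disks, vector):
    """disks: one list of sort keys per source disk.
    vector: range-partitioning vector with m - 1 entries for m sorters."""
    m = len(vector) + 1
    received = [[] for _ in range(m)]

    # Step 1: every processor reads its own disk and sends each tuple
    # to the processor responsible for that tuple's key range.
    for local_tuples in disks:
        for key in local_tuples:
            received[bisect_right(vector, key)].append(key)

    # Step 2: each destination processor sorts its partition locally.
    sorted_partitions = [sorted(partition) for partition in received]

    # Final merge: concatenation suffices, because every key held by an
    # earlier processor is smaller than every key held by a later one.
    result = []
    for partition in sorted_partitions:
        result.extend(partition)
    return result

output = range_partitioning_sort([[9, 1, 5], [7, 3, 8], [2, 6, 4]], [4, 7])
```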
5
...
2 Parallel External Sort–Merge
Parallel external sort–merge is an alternative to range partitioning
...
, Dn−1 (it does not matter how the relation has been partitioned)
...
Each processor Pi locally sorts the data on disk Di
...
The system then merges the sorted runs on each processor to get the ﬁnal
sorted output
...
The system range-partitions the sorted partitions at each processor Pi (all by
the same partition vector) across the processors P0 , P1 ,
...
It sends the
tuples in sorted order, so that each processor receives the tuples in sorted
streams
...
Each processor Pi performs a merge on the streams as they are received, to get
a single sorted run
...
The system concatenates the sorted runs on processors P0 , P1 ,
...
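
The steps of parallel external sort-merge can be sketched in the same simulated style. The function name and partition vector are hypothetical, in-memory lists stand in for disks and network streams, and heapq.merge plays the role of the merge that each processor performs on its incoming sorted streams.

```python
import heapq
from bisect import bisect_right

def parallel_external_sort_merge(disks, vector):
    """disks: one list of sort keys per processor; any initial placement.
    vector: common range-partitioning vector used by all processors."""
    n = len(vector) + 1

    # Each processor first sorts its local data.
    local_runs = [sorted(local_tuples) for local_tuples in disks]

    # Each sorted run is range-partitioned by the same vector. Because the
    # run is scanned in order, every piece sent out is itself sorted.
    streams = [[] for _ in range(n)]  # streams[j]: sorted streams arriving at Pj
    for run in local_runs:
        pieces = [[] for _ in range(n)]
        for key in run:
            pieces[bisect_right(vector, key)].append(key)
        for j in range(n):
            streams[j].append(pieces[j])

    # Each processor merges the sorted streams it receives into one run.
    merged_runs = [list(heapq.merge(*streams[j])) for j in range(n)]

    # Concatenating the runs in processor order gives the final result.
    return [key for run in merged_runs for key in run]

output = parallel_external_sort_merge([[9, 1, 5], [7, 3, 8], [2, 6, 4]], [4, 7])
```

Compared with range-partitioning sort, the destination processors only merge already-sorted streams instead of sorting from scratch.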

As described, this sequence of actions results in an interesting form of execution
skew, since at ﬁrst every processor sends all blocks of partition 0 to P0 , then every
processor sends all blocks of partition 1 to P1 , and so on
...
To avoid this problem, each processor repeatedly sends a block of data to each partition
...
As a result, all processors receive data in parallel
...
The Y-net interconnection network in the Teradata DBC
machines can merge output from multiple processors to give a single sorted output
...
5
...

Parallel join algorithms attempt to split the pairs to be tested over several processors
...
Then, the system collects the
results from each processor to produce the ﬁnal result
...
5
...
1 Partitioned Join
For certain kinds of joins, such as equi-joins and natural joins, it is possible to partition
the two input relations across the processors, and to compute the join locally at each
processor
...
Partitioned join then works this way: The system partitions the relations
r and s each into n partitions, denoted r0 , r1 ,
...
, sn−1
...

The partitioned join technique works correctly only if the join is an equi-join (for
example, r ⋈ r
...
B s) and if we partition r and s by the same partitioning function
on their join attributes
...
In a partitioned join, however, there are two different
ways of partitioning r and s:
• Range partitioning on the join attributes
• Hash partitioning on the join attributes
In either case, the same partitioning function must be used for both relations
...
For
hash partitioning, the same hash function must be used on both relations
...
2
depicts the partitioning in a partitioned parallel join
...
For example, hash–join, merge–join, or
nested-loop join could be used
...
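
A partitioned equi-join can be sketched as below. The sketch makes several assumptions: tuples are Python tuples, the join attributes are given as positions, hash partitioning is used, and a simple in-memory hash join serves as the local join at each processor. Any of the local techniques named above could be substituted.

```python
def partitioned_join(r, s, r_key, s_key, n):
    """Equi-join of r and s on r[r_key] = s[s_key] using n partitions."""
    r_parts = [[] for _ in range(n)]
    s_parts = [[] for _ in range(n)]

    # The same partitioning function is applied to both relations, so
    # tuples that can join always end up at the same processor.
    for t in r:
        r_parts[hash(t[r_key]) % n].append(t)
    for u in s:
        s_parts[hash(u[s_key]) % n].append(u)

    result = []
    for i in range(n):  # each iteration models processor Pi working locally
        index = {}
        for u in s_parts[i]:
            index.setdefault(u[s_key], []).append(u)
        for t in r_parts[i]:
            for u in index.get(t[r_key], []):
                result.append((t, u))
    return result

r = [(1, "a"), (2, "b"), (3, "c")]
s = [(1, "x"), (3, "y"), (3, "z"), (4, "w")]
joined = sorted(partitioned_join(r, s, 0, 0, 3))
```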

[Figure: partitioned parallel join; relations r and s are each split into partitions that are sent to the corresponding processors.]
Figure 20
...

If one or both of the relations r and s are already partitioned on the join attributes
(by either hash partitioning or range partitioning), the work needed for partitioning
is reduced greatly
...
Each processor
Pi reads in the tuples on disk Di , computes for each tuple t the partition j to which t
belongs, and sends tuple t to processor Pj
...

We can optimize the join algorithm used locally at each processor to reduce I/O by
buffering some of the tuples to memory, instead of writing them to disk
...
5
...
3
...
The partition vector should be such
that | ri | + | si | (that is, the sum of the sizes of ri and si ) is roughly equal over all
the i = 0, 1,
...
With a good hash function, hash partitioning is likely to have
a smaller skew, except when there are many tuples with the same values for the join
attributes
...
5
...
2 Fragment-and-Replicate Join
Partitioning is not applicable to all types of joins
...
a ...
Thus, there may be no easy way of partitioning r and s so
that tuples in partition ri join with only tuples in partition si
...
We
first consider a special case of fragment and replicate, the asymmetric fragment-and-replicate join, which works as follows
...
The system partitions one of the relations— say, r
...

2
...

3
...

The asymmetric fragment-and-replicate scheme appears in Figure 20
...
If r is already stored by partitioning, there is no need to partition it further in step 1
...

The general case of fragment and replicate join appears in Figure 20
...
, rn−1 , and partitions s into m partitions, s0 , s1 ,
...
As before, any partitioning technique may
be used on r and on s
...
Asymmetric fragment and
replicate is simply a special case of general fragment and replicate, where m = 1
...

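[Figure: fragment-and-replicate schemes. The panels show the partitions of r (r0, r1, r2, r3, ...) and of s (s0, s1, s2, s3, ..., sm–1) and the processors to which they are assigned (P0, P1, P2, P3, ... in one panel; Pi,j such as P1,2 and P2,1 in the other).]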

...
, P0,m−1 , P1,0 ,
...
Processor Pi,j computes the join of ri with sj
...
To do so, the system replicates ri to processors Pi,0 , Pi,1 ,
...
3b), and replicates si to processors P0,i , P1,i ,
...
3b)
...

Fragment and replicate works with any join condition, since every tuple in r can
be tested with every tuple in s
...
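
Fragment and replicate can be sketched as follows for an arbitrary join condition. The fragments are formed round-robin here purely for convenience (any partitioning technique may be used), the nested loops model the m × n grid of processors, and the function name is hypothetical.

```python
def fragment_and_replicate_join(r, s, n, m, condition):
    """Join r and s under an arbitrary condition(t, u), with r split into
    n fragments and s into m fragments; processor P[i][j] joins r_i with s_j."""
    r_parts = [r[i::n] for i in range(n)]
    s_parts = [s[j::m] for j in range(m)]

    result = []
    for i in range(n):
        for j in range(m):
            # P[i][j] holds a replica of r_i and a replica of s_j and may
            # use any local join technique; a nested loop is used here.
            for t in r_parts[i]:
                for u in s_parts[j]:
                    if condition(t, u):
                        result.append((t, u))
    return result

r, s = [1, 5, 9], [2, 6]
general = sorted(fragment_and_replicate_join(r, s, 2, 2, lambda t, u: t < u))
# With m = 1, s is replicated whole: the asymmetric special case.
asymmetric = sorted(fragment_and_replicate_join(r, s, 3, 1, lambda t, u: t < u))
```

Every pair (t, u) meets at exactly one processor, which is why the scheme is correct for conditions such as t < u that partitioning cannot handle.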

Fragment and replicate usually has a higher cost than partitioning when both relations are of roughly the same size, since at least one of the relations has to be replicated
...
In
such a case, asymmetric fragment and replicate is preferable, even though partitioning could be used
...
5
...
3 Partitioned Parallel Hash–Join
The partitioned hash–join of Section 13
...
5 can be parallelized
...
, Pn−1 , and two relations r and s, such that the relations r and
s are partitioned across multiple disks
...
5
...
If the size of s is less than that of r, the parallel
hash–join algorithm proceeds this way:
1
...
Let ri denote the
tuples of relation r that are mapped to processor Pi ; similarly, let si denote the
tuples of relation s that are mapped to processor Pi
...

2
...
The partitioning at this stage is exactly the same as in the
partitioning phase of the sequential hash–join algorithm
...

3
...
As it receives each tuple, the destination processor repartitions it
by the function h2 , just as the probe relation is partitioned in the sequential
hash–join algorithm
...
Each processor Pi executes the build and probe phases of the hash–join algorithm on the local partitions ri and si of r and s to produce a partition of the
ﬁnal result of the hash–join
...
Therefore, any
of the optimizations of the hash–join described in Chapter 13 can be applied as well

to the parallel case
...

20
...
2
...
Suppose that relation r is
stored by partitioning; the attribute on which it is partitioned does not matter
...

We use asymmetric fragment and replicate, with relation s being replicated and
with the existing partitioning of relation r
...
At the end of this phase, relation s is replicated at all sites
that store tuples of relation r
...
We can overlap the indexed nested-loop join with the
distribution of tuples of relation s, to reduce the costs of writing the tuples of relation
s to disk, and of reading them back
...

20
...
3 Other Relational Operations
The evaluation of other relational operations also can be parallelized:
• Selection
...
Consider ﬁrst the case where θ is of the
form ai = v, where ai is an attribute and v is a value
...
If θ is of the form
l ≤ ai ≤ u (that is, θ is a range selection) and the relation has been range-partitioned on ai, then the selection proceeds at each processor whose partition overlaps with the specified range of values
...

• Duplicate elimination
...
We can also parallelize duplicate elimination by
partitioning the tuples (by either range or hash partitioning) and eliminating
duplicates locally at each processor
...
Projection without duplicate elimination can be performed as tuples are read in from disk in parallel
...

• Aggregation
...
We can parallelize the operation by partitioning the relation on the grouping attributes, and then com-

...
Either hash partitioning
or range partitioning can be used
...

We can reduce the cost of transferring tuples during partitioning by partly
computing aggregate values before partitioning, at least for the commonly
used aggregate functions
...
The system can perform the operation at each processor Pi on those r tuples
stored on disk Di
...
The system partitions the result of the local aggregation
on the grouping attribute A, and performs the aggregation again (on tuples
with the partial sums) at each processor Pi to get the ﬁnal result
...
This idea can be extended easily to the min and
max aggregate functions
...
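
The pre-aggregation idea for sum can be sketched as follows. The relation is modelled as (group, value) pairs spread over several disks, hash partitioning on the grouping attribute is assumed, and the function name is hypothetical.

```python
from collections import defaultdict

def parallel_grouped_sum(disks, n):
    """disks: one list of (group, value) pairs per processor.
    Returns {group: sum of values} using n processors for the final step."""
    # Local aggregation at each processor: one partial sum per group.
    partial_results = []
    for local_tuples in disks:
        partial = defaultdict(int)
        for group, value in local_tuples:
            partial[group] += value
        partial_results.append(partial)

    # Only the partial sums are partitioned on the grouping attribute,
    # so far fewer tuples cross the network than in the naive scheme.
    inbox = [[] for _ in range(n)]
    for partial in partial_results:
        for group, subtotal in partial.items():
            inbox[hash(group) % n].append((group, subtotal))

    # Final aggregation: each processor adds up the partial sums it holds.
    final = {}
    for received in inbox:
        totals = defaultdict(int)
        for group, subtotal in received:
            totals[group] += subtotal
        final.update(totals)
    return final

disks = [[("a", 1), ("b", 2), ("a", 3)], [("b", 4), ("c", 5)], [("a", 6)]]
sums = parallel_grouped_sum(disks, 3)
```

Replacing the addition with min or max gives the corresponding variants, since those functions can also be applied to partial results.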
8
...

20
...
4 Cost of Parallel Evaluation of Operations
We achieve parallelism by partitioning the I/O among multiple disks, and partitioning the CPU work among multiple processors
...
We
already know how to estimate the cost of an operation such as a join or a selection
...

We must also account for the following costs:
• Startup costs for initiating the operation at multiple processors
• Skew in the distribution of work among the processors, with some processors
getting a larger number of tuples than others
• Contention for resources — such as memory, disk, and the communication
network — resulting in delays
• Cost of assembling the ﬁnal result by transmitting partial results from each
processor
The time taken by a parallel operation can be estimated as
Tpart + Tasm + max(T0 , T1 ,
...
Assuming that the
tuples are distributed without any skew, the number of tuples sent to each processor

can be estimated as 1/n of the total number of tuples
...
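
The estimate can be written as a one-line function. Reading Tpart as the time to partition the relations and Tasm as the time to assemble the results is an interpretation based on the costs listed above; the numbers in the example are hypothetical.

```python
def parallel_time(t_part, t_asm, per_processor_times):
    """Estimated elapsed time: partitioning, plus assembly, plus the time
    taken by the slowest processor (the others finish earlier and wait)."""
    return t_part + t_asm + max(per_processor_times)

no_skew = parallel_time(2, 1, [10, 10, 10, 10])  # 13
with_skew = parallel_time(2, 1, [5, 5, 5, 25])   # 28: same total work
```

Both cases perform 40 units of work in total, but the skewed distribution more than doubles the elapsed time because one processor dominates the max term.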

The preceding estimate will be an optimistic estimate, since skew is common
...
A partitioned parallel evaluation, for instance, is only as fast as the slowest of the parallel executions
...

The problem of skew in partitioning is closely related to the problem of partition
overﬂow in sequential hash–joins (Chapter 13)
...
We can use balanced range partitioning and virtual processor partitioning
to minimize skew due to range partitioning, as in Section 20
...
3
...
6 Interoperation Parallelism
There are two forms of interoperation parallelism: pipelined parallelism, and independent parallelism
...
6
...
Recall that, in pipelining, the output tuples of one operation, A, are consumed by a second operation, B, even before the ﬁrst
operation has produced the entire set of tuples in its output
...

Parallel systems use pipelining primarily for the same reason that sequential systems do
...
It is possible to
run operations A and B simultaneously on different processors, so that B consumes
tuples in parallel with A producing them
...

Consider a join of four relations:
r1 ⋈ r2 ⋈ r3 ⋈ r4

We can set up a pipeline that allows the three joins to be computed in parallel
...
As P1 computes tuples in r1 ⋈ r2, it
makes these tuples available to processor P2
...
P2 can use those tuples
that are available to begin computation of temp1 ⋈ r3, even before r1 ⋈ r2 is fully
computed by P1
...
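
The dataflow of such a pipeline can be imitated with Python generators. This sketch is single-threaded, so it shows only the tuple-at-a-time handoff between operations, not true parallel execution; in a parallel system each stage would run on its own processor. The relation contents and the function name are hypothetical, and every join is on the first attribute.

```python
def pipelined_join(left_stream, right_relation, left_key, right_key):
    """Yield joined tuples one at a time as left tuples arrive, so the
    consumer can start before the left input has been fully produced."""
    index = {}
    for u in right_relation:
        index.setdefault(u[right_key], []).append(u)
    for t in left_stream:
        for u in index.get(t[left_key], []):
            yield t + u

r1 = [(1, "a"), (2, "b")]
r2 = [(1, "c"), (2, "d")]
r3 = [(1, "e")]
r4 = [(1, "f")]

temp1 = pipelined_join(iter(r1), r2, 0, 0)  # stage run by P1
temp2 = pipelined_join(temp1, r3, 0, 0)     # stage run by P2
final = pipelined_join(temp2, r4, 0, 0)     # stage run by P3
result = list(final)                        # no intermediate relation is stored
```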

...
First, pipeline chains generally do not attain sufﬁcient length to provide a high degree of parallelism
...
Third, only marginal speedup is obtained for the frequent
cases in which one operator’s execution cost is much higher than are those of the
others
...
The real reason for using pipelining is that pipelined executions can avoid writing intermediate results to disk
...
6
...
This form of parallelism is called independent parallelism
...
Clearly, we can compute temp1 ← r1 ⋈ r2
in parallel with temp2 ← r3 ⋈ r4
...
7
...
2)
...

20
...
3 Query Optimization
Query optimizers account in large measure for the success of relational technology
...

Query optimizers for parallel query evaluation are more complicated than query
optimizers for sequential query evaluation
...
More important is the issue of how
to parallelize a query
...
The expression can be represented by an operator tree, as in Section 13
...

To evaluate an operator tree in a parallel system, we must make the following
decisions:
• How to parallelize each operation, and how many processors to use for it
• What operations to pipeline across different processors, what operations to execute independently in parallel, and what operations to execute sequentially,
one after the other

These decisions constitute the task of scheduling the execution tree
...
For instance, it may appear wise to use the maximum amount of
parallelism available, but it is a good idea not to execute certain operations in parallel
...
Otherwise,
the advantage of parallelism is negated by the overhead of communication
...
Unless the operations are coarse grained, the ﬁnal operation of the pipeline may
wait for a long time to get inputs, while holding precious resources, such as memory
...

The number of parallel evaluation plans from which to choose is much larger than
the number of sequential evaluation plans
...
Hence, we usually adopt heuristic approaches to reduce the number of parallel execution plans that we have to consider
...

The ﬁrst heuristic is to consider only evaluation plans that parallelize every operation across all processors, and that do not use any pipelining
...
Finding the best such execution plan is like doing query optimization in a sequential system
...

The second heuristic is to choose the most efﬁcient sequential evaluation plan,
and then to parallelize the operations in that evaluation plan
...

This model uses existing implementations of operations, operating on local copies of
data, coupled with an exchange operation that moves data around between different
processors
...

Yet another dimension of optimization is the design of physical-storage organization to speed up queries
...
The database administrator must choose a physical organization that appears to be good for the expected mix of database queries
...

20
...
Since large-scale parallel database systems are used primarily for storing
large volumes of data, and for processing decision-support queries on those data,
these topics are the most important in a parallel database system
...

...

With a large number of processors and disks, the probability that at least one processor or disk will malfunction is signiﬁcantly greater than in a single-processor system with one disk
...
Assuming that the probability of failure of a single processor or disk is small, the probability of failure of the system goes up linearly
with the number of processors and disks
...
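
The linear growth can be checked numerically. The sketch assumes that components fail independently with the same probability p, which is a simplification; the function name and the sample values are hypothetical.

```python
def probability_any_failure(p, n):
    """Probability that at least one of n independent components fails,
    when each one fails with probability p."""
    return 1 - (1 - p) ** n

p = 0.001
exact = probability_any_failure(p, 100)  # roughly 0.095
linear_estimate = 100 * p                # 0.1
```

For small p the exact value 1 - (1 - p)^n stays close to n · p, which is the sense in which the failure probability grows linearly with the number of processors and disks.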

Therefore, large-scale parallel database systems, such as Compaq Himalaya,
Teradata, and Informix XPS (now a division of IBM), are designed to operate even
if a processor or disk fails
...
If a processor fails, the data that it stored can still be accessed from the other processors
...
Requests for data stored at the failed site are automatically routed to the
backup sites that store a replica of the data
...
Therefore, the replicas
of the data of a processor are partitioned across multiple other processors
...
Therefore, it is
unacceptable for the database system to be unavailable while such operations are in
progress
...

Consider, for instance, online index construction
...
The index-building operation therefore cannot lock the entire
relation in shared mode, as it would have done otherwise
...

20
...

• In I/O parallelism, relations are partitioned among available disks so that
they can be retrieved faster
...

• Skew is a major problem, especially with increasing degrees of parallelism
...

• In interquery parallelism, we run different queries concurrently to increase
throughput
...
There
are two types of intraquery parallelism: intraoperation parallelism and interoperation parallelism
...
Intraoperation parallelism is natural for relational
operations, since they are set oriented
...

In partitioned parallelism, the relations are split into several parts, and
tuples in ri are joined with only tuples from si
...

In fragment and replicate, both relations are partitioned and each partition is replicated
...
Unlike partitioned parallelism, fragment and replicate and asymmetric fragment-and-replicate
can be used with any join condition
...

• In independent parallelism, different operations that do not depend on one
another are executed in parallel
...

• Query optimization in parallel databases is signiﬁcantly more complex than
query optimization in sequential databases
...
20
...
1 For each of the three partitioning techniques, namely round-robin, hash partitioning, and range partitioning, give an example of a query for which that
partitioning technique would provide the fastest response
...
2 In a range selection on a range-partitioned attribute, it is possible that only
one disk may need to be accessed
...

20
...
Hash partitioning
b
...
4 What form of parallelism (interquery, interoperation, or intraoperation) is likely
to be the most important for each of the following tasks
...
Increasing the throughput of a system with many small queries
b
...
20
...
5 With pipelined parallelism, it is often a good idea to perform several operations
in a pipeline on a single processor, even when many processors are available
...
Explain why
...
Would the arguments you advanced in part a hold if the machine has a
shared-memory architecture? Explain why or why not
...
Would the arguments in part a hold with independent parallelism? (That
is, are there cases where, even if the operations are not pipelined and there
are many processors available, it is still a good idea to perform several
operations on the same processor?)
20
...
What attributes should be used for partitioning?
20
...
How can you optimize the evaluation if the join condition is of
the form | r
...
B | ≤ k, where k is a small constant
...
A join with such a join condition is called a band join
...
8 Describe a good way to parallelize each of the following
...
Full outer join, if the join condition involves comparisons other than equality

a
...

c
...

e
...

20
...

a
...
, 91 – 100, with frequencies
15, 5, 20, 10, 10, 5, 5, 20, 5, and 5, respectively
...

b
...

20
...

20
...

a
...
What are the beneﬁts and drawbacks of using RAID storage instead of storing an extra copy of each data item?

Bibliographical Notes
Relational database systems began appearing in the marketplace in 1983; now, they
dominate it
...
A commercial system, Teradata, and several
research projects, such as GRACE (Kitsuregawa et al
...
[1986]),
GAMMA (DeWitt et al
...
[1990]) were
launched in quick succession
...
Subsequently,
in the late 1980s and the 1990s, several more companies — such as Tandem, Oracle,
Sybase, Informix, and Red-Brick (now a part of Informix, which is itself now a part of
IBM) — entered the parallel database market
...
[1989]) and Volcano (Graefe [1990])
...
Cache-coherency protocols for parallel database systems are discussed by Dias et al
...
Carey et al
...
Parallelism and recovery in database systems are discussed by
Bayer et al
...

Graefe [1993] presents an excellent survey of query processing, including parallel processing of queries
...
[1992]
...
[1984], Kitsuregawa et al
...
[1987], Schneider and DeWitt [1989], Kitsuregawa and Ogawa [1990],
Lin et al
...
[1995], among other works
...
[1992], Deshpande and Larson [1992], and Shatdal and Naughton [1993]
...
[1991], Wolf [1991],
and DeWitt et al
...
Sampling techniques for parallel databases are described
by Seshadri and Naughton [1992] and Ganguly et al
...
The exchange-operator
model was advocated by Graefe [1990] and Graefe [1993]
...
Lu and Tan [1991],
Hong and Stonebraker [1991], Ganguly et al
...
[1993], Hasan
and Motwani [1995], and Jhingran et al
...

PART VII
...
The chapter ﬁrst outlines how to implement user interfaces, in particular Web-based interfaces
...

Chapter 22 describes a number of recent advances in querying and information
retrieval
...
It next covers data warehousing, whereby
data generated by different parts of an organization are gathered centrally
...
Finally, the chapter describes information retrieval,
which deals with techniques for querying collections of text documents, such as Web
pages, to ﬁnd documents of interest
...
Applications such as mobile
computing and its connections with databases, are also described in this chapter
...

Chapter 21

Application Development
and Administration

Practically all use of databases occurs from within application programs
...
Not surprisingly, therefore, database systems have long supported tools such
as form and GUI builders, which help in rapid development of applications that interface with users
...

Once an application has been built, it is often found to run slower than the designers wanted, or to handle fewer transactions per second than they required
...
Benchmarks help to characterize the performance of database systems
...
A variety of standards have been proposed that affect database application
development
...

Legacy systems are systems based on older-generation technology
...
We outline issues
in interfacing with legacy systems, and how they can be replaced by other systems
...
1 Web Interfaces to Databases
The World Wide Web (Web, for short), is a distributed information system based on
hypertext
...
After outlining
several reasons for interfacing databases with the Web (Section 21
...
1), we provide
an overview of Web technology (Section 21
...
2) and then study Web servers (Section 21
...
3) and outline some state-of-the-art techniques for building Web interfaces
to databases, using servlets and server-side scripting languages (Sections 21
...
4 and
21
...
5)
...
1
...

21
...
1 Motivation
The Web has become important as a front end to databases for several reasons: Web
browsers provide a universal front end to information supplied by back ends located
anywhere in the world
...

Further, today, almost everyone who can afford it has access to the Web
...
The HTML forms interface is convenient for transaction processing
...
The server executes an application program
corresponding to the order form, and this action in turn executes transactions on a
database at the server site
...

Another reason for interfacing databases to the Web is that presenting only static
(ﬁxed) documents on a Web site has some limitations, even when the user is not
doing any querying or transaction processing:
• Fixed Web documents do not allow the display to be tailored to the user
...

• When the company data are updated, the Web documents become obsolete
if they are not updated simultaneously
...

We can ﬁx these problems by generating Web documents dynamically from a database
...
Whenever relevant data in the database are updated, the generated documents will automatically become up-to-date
...

Web interfaces provide attractive benefits even for database applications that are
used only within a single organization
...

Hyperlinks, which are links to other documents, can be associated with regions of
the displayed data
...
Hyperlinks are very useful for browsing data, permitting users to get more
details of parts of the data as desired
...
Programs can be written in client-side scripting languages, such as
Javascript, or can be “applets” written in the Java language
...
the construction of sophisticated user interfaces, beyond what is possible with just
HTML, interfaces that can be used without downloading and installing any software
...

21
...
2 Web Fundamentals
Here we review some of the fundamental technology behind the World Wide Web,
for readers who are not familiar with it
...
1
...
1 Uniform Resource Locators
A uniform resource locator (URL) is a globally unique name for each document that
can be accessed on the Web
...
bell-labs
...
The second part gives the unique name
of a machine that has a Web server
...

Much data on the Web is dynamically generated
...
An example of such a URL is
http://www
...
com/search?q=silberschatz
which says that the program search on the server www
...
com should be executed with the argument q=silberschatz
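As a small illustration of how such a URL breaks down, the sketch below uses the standard java.net.URI class to separate the machine name, the program name, and the argument string. The host www.example.com is a placeholder chosen for this sketch (the host in the example above is elided), and the class and method names are our own.

```java
import java.net.URI;

class UrlParts {
    // Name of the machine that runs the Web server.
    static String host(String url) {
        return URI.create(url).getHost();
    }

    // Program to execute: the path without its leading slash.
    static String program(String url) {
        String path = URI.create(url).getPath();
        return path.startsWith("/") ? path.substring(1) : path;
    }

    // Argument string after the '?', or "" if the URL has none.
    static String arguments(String url) {
        String query = URI.create(url).getQuery();
        return query == null ? "" : query;
    }

    public static void main(String[] args) {
        // www.example.com is a hypothetical host used only for illustration.
        String url = "http://www.example.com/search?q=silberschatz";
        System.out.println(host(url) + " runs " + program(url)
                + " with argument " + arguments(url));
    }
}
```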
...

21
...
2
...
1 is an example of the source of an HTML document
...
2 shows the
displayed image that this document creates
...
HTML also supports several other input types
...
The program generates an HTML document, which is then
sent back and displayed to the user; we will see how to construct such programs in
Sections 21
...
3, 21
...
4, and 21
...
5
...
The cascading stylesheet (CSS) standard allows the same

A-101	Downtown	500
A-102	Perryridge	400
A-201	Brighton	900

The account relation

Select account/loan and enter number

Figure 21
...

stylesheet to be used for multiple HTML documents, giving a uniform look to all
the pages on a Web site
...
1
...
3 Client-Side Scripting and Applets
Embedding of program code in documents allows Web pages to be active, carrying out activities such as animation by executing programs at the local site, rather
than just presenting passive text and graphics
...
Further, executing programs at the client site speeds up

A-101	Downtown	500
A-102	Perryridge	400
A-201	Brighton	900
The account relation

Select account/loan and enter number
Account
Figure 21
...
1
...
interaction greatly, compared to every interaction being sent to a server site for processing
...
The malicious actions
could range from reading private information, to deleting or modifying information
on the computer, up to taking control of the computer and propagating the code
to other computers (through e-mail, for example)
...

The Java language became very popular because it provides a safe mode for executing programs on users’ computers
...

Unlike local programs, Java programs (applets) downloaded as part of a Web page
have no authority to perform any actions that could be destructive
...
However,
they are not permitted to access local ﬁles, to execute any system programs, or to
make network connections to any other computers
...
These languages provide constructs that can be embedded within
an HTML document
...
Of these, the Javascript language is by far the
most widely used
...
Scripting
languages can also be used on the server side, as we shall see
...
1
...
The
browser and Web server communicate by a protocol called the HyperText Transfer Protocol (HTTP)
...
The most important feature is the ability to execute programs, with
arguments supplied by the user, and deliver the results back as an HTML document
...
A new service can be created by creating and installing an application program that provides the service
...
The application program typically communicates with a database server,
through ODBC, JDBC, or other protocols, in order to get or store data
...
3 shows a Web service using a three-tier architecture, with a Web server,
an application server, and a database server
...

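[Figure: three-tier Web architecture. A browser connects over a network (HTTP) to a Web server, which connects over a network to an application server, which in turn uses a database server and its data.]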
Figure 21
...

Most Web services today therefore use a two-tier Web architecture, where the application program runs within the Web server, as in Figure 21
...
We study systems
based on the two-tier architecture in more detail in subsequent sections
...

In contrast, when a user logs on to a computer, or connects to an ODBC or JDBC
server, a session is created, and session information is retained at the server and the
client until the session is terminated. Such information includes whether the user was
authenticated using a password and what session options the user set
...
With a connectionless service, the connection is broken as soon as a request is
satisﬁed, leaving connections available for other requests
...
For instance, services typically restrict access to information, and therefore need to authenticate users
...

To create the view of such sessions, extra information has to be stored at the client,
and returned with each request in a session, for a server to identify that a request is
part of a user session
...

This extra information is maintained in the form of a cookie at the client; a cookie
is simply a small piece of text containing identifying information
...
Cookies sent
to different clients contain different identifying text
...
By comparing the
cookie with locally stored cookies at the server, the server can identify the request as

21
...
4

Two-tier Web architecture
...
Cookies can also be used for storing user preferences and
using them when the server replies to a request
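The bookkeeping that a server needs for cookie-based sessions can be sketched as a table from cookie text to user. This is only a minimal in-memory illustration under our own assumptions: the class SessionStore and its methods are hypothetical names, and a real server would also expire old sessions and protect the cookie in transit.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

class SessionStore {
    // Server-side copy of the cookies that have been handed out.
    private final Map<String, String> userByCookie = new HashMap<>();

    // After authentication: issue identifying text unique to this client.
    String createSession(String userId) {
        String cookie = UUID.randomUUID().toString();
        userByCookie.put(cookie, userId);
        return cookie;
    }

    // On each later request: compare the returned cookie with the stored ones.
    // A null result means the request is not part of a known session.
    String lookup(String cookie) {
        return userByCookie.get(cookie);
    }

    void endSession(String cookie) {
        userByCookie.remove(cookie);
    }

    public static void main(String[] args) {
        SessionStore store = new SessionStore();
        String cookie = store.createSession("alice");
        System.out.println("request belongs to: " + store.lookup(cookie));
        System.out.println("unknown cookie maps to: " + store.lookup("forged"));
    }
}
```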
...

21
...
4 Servlets
In a two-tier Web architecture, the application runs as part of the Web server itself
...
The Java servlet speciﬁcation deﬁnes an application programming interface for communication between the Web server and the application program
...
The
program is loaded into the Web server when the server starts up or when the server
receives a Web request for executing the servlet application
...
5 is an example
of servlet code to implement the form in Figure 21
...

The servlet is called BankQueryServlet, while the form speciﬁes that action=“BankQuery”
...

The example will give you an idea of how servlets are used
...

See the bibliographical notes for references to these sources
...
So the doGet() method of the servlet,
which is deﬁned in the code, gets invoked
...

Any values from the form menus and input ﬁelds on the Web page, as well as
cookies, pass through an object of the HttpServletRequest class that is created for the

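import java.io.*;
import javax.servlet.*;
import javax.servlet.http.*;

public class BankQueryServlet extends HttpServlet {
    public void doGet(HttpServletRequest request, HttpServletResponse result)
            throws ServletException, IOException
    {
        // Read the two values submitted by the HTML form.
        String type = request.getParameter("type");
        String number = request.getParameter("number");

        // ... (elided in the source) code that looks up the balance,
        // leaving the value in a variable named balance ...

        // Send an HTML page back to the browser.
        result.setContentType("text/html");
        PrintWriter out = result.getWriter();
        out.println("<HEAD><TITLE>Query Result</TITLE></HEAD>");
        out.println("Balance on " + type + number + " = " + balance);
        out.close();
    }
}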
Figure 21
...

request, and the reply to the request passes through an object of the class HttpServletResponse
...
getParameter(), and uses these to run a query against
a database
...
13
...
The system returns the results of
the query to the requester by printing them out in HTML format to the HttpServletResponse result
...
Invoking the
method getSession(true) of the class HttpServletRequest creates a new object of type
HttpSession if this is the ﬁrst request from that client; the argument true says that a
session must be created if the request is a new request
...
Internally, cookies
are used to recognize that a request is from the same browser session as an earlier
request
...
For instance, the ﬁrst
request in a session may ask for a user-id and password, and store the user-id in the
session object
...

Displaying a set of results from a query is a common task for many database applications
...
JDBC metadata calls
1
...

21
...
5 Server-Side Scripting
Writing even a simple Web application in a programming language such as Java or C
is a rather time-consuming task that requires many lines of code and programmers
familiar with the intricacies of the language
...
Scripting languages provide constructs that can be embedded within HTML documents
...
Each piece of script, when executed, can generate text that is added to the page (or may even delete content from
the page)
...
The executed script may
contain SQL code that is executed against a database
...
These include Server-Side Javascript from Netscape, JScript from Microsoft, JavaServer Pages (JSP) from
Sun, the PHP Hypertext Preprocessor (PHP), ColdFusion’s ColdFusion Markup Language (CFML) and Zope’s DTML
...

For instance, Microsoft’s Active Server Pages (ASP) supports embedded VBScript
and JScript
...
These also support HTML forms for getting parameter values that are used in the queries embedded
in the reports
...
They all support similar
features, but differ in the style of programming and the ease with which simple applications can be created
...
1
...

Ensuring that requests are served with low response times is a major challenge for
Web site developers
...
For instance, suppose the application code for servicing each request
needs to contact a database through JDBC
...
Many applications create a pool of
open JDBC connections, and each request uses one of the connections from the pool
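The pooling idea can be sketched independently of JDBC. In the illustration below the pooled resource is a generic type T, so the same class could hold database connections; the class SimplePool and its methods are our own hypothetical names, and, unlike a production pool, this sketch fails rather than waits when every resource is in use.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

class SimplePool<T> {
    private final Deque<T> idle = new ArrayDeque<>();

    // Open all resources once, up front, instead of once per request.
    SimplePool(int size, Supplier<T> factory) {
        for (int i = 0; i < size; i++) {
            idle.push(factory.get());
        }
    }

    // Hand an already-open resource to a request.
    synchronized T acquire() {
        if (idle.isEmpty()) {
            throw new IllegalStateException("pool exhausted");
        }
        return idle.pop();
    }

    // Return the resource so that a later request can reuse it.
    synchronized void release(T resource) {
        idle.push(resource);
    }

    synchronized int available() {
        return idle.size();
    }

    public static void main(String[] args) {
        int[] opened = {0};
        // Strings stand in for connections; each factory call models one "open".
        SimplePool<String> pool =
                new SimplePool<>(2, () -> "conn-" + (++opened[0]));
        for (int request = 0; request < 5; request++) {
            String conn = pool.acquire();
            pool.release(conn);
        }
        System.out.println("requests served: 5, connections opened: " + opened[0]);
    }
}
```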
...
The cost of communication with the database can be greatly reduced by caching

the results of earlier queries, and reusing them, so long as the query result has not
changed at the database
...

Costs can be further reduced by caching the ﬁnal Web page that is sent in response
to a request
...

Cached query results and cached Web pages are forms of materialized views
...
5)
...

21
...
Various aspects of
a database-system design affect the performance of an application. They range from high-level aspects such as the schema and
transaction design, to database parameters such as buffer sizes, down to hardware
issues such as the number of disks
...

21
...
1 Location of Bottlenecks
The performance of most systems (at least before they are tuned) is usually limited
primarily by the performance of one or a few components, called bottlenecks
...
Improving the performance of a component that is not a bottleneck does
little to improve the overall speed of the system; in the example, improving the speed
of the rest of the code cannot lead to more than a 20 percent improvement overall,
whereas improving the speed of the bottleneck loop could result in an improvement
of nearly 80 percent overall, in the best case
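The arithmetic behind these percentages can be sketched as follows, assuming (as the figures above suggest) that the bottleneck accounts for 80 percent of the running time and the rest of the code for 20 percent. The class and method names are ours.

```java
class Bottleneck {
    // Fraction of the original running time that remains after the part that
    // takes `fraction` of the time is made `speedup` times faster.
    static double remainingTime(double fraction, double speedup) {
        return (1.0 - fraction) + fraction / speedup;
    }

    // Overall improvement, as a fraction of the original running time.
    static double improvement(double fraction, double speedup) {
        return 1.0 - remainingTime(fraction, speedup);
    }

    public static void main(String[] args) {
        double huge = 1e9; // stands in for "infinitely faster", the best case
        System.out.println("speed up the 20% part: " + improvement(0.2, huge));
        System.out.println("speed up the 80% part: " + improvement(0.8, huge));
    }
}
```

Even an unbounded speedup of the non-bottleneck part leaves 80 percent of the running time untouched, which is why tuning effort is best spent on the bottleneck first.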
...
When one bottleneck is removed, it may turn out that another component becomes the bottleneck
...
If the system contains bottlenecks, components that are not
part of the bottleneck are underutilized, and could perhaps have been replaced by
cheaper components with lower performance
...
However, database systems are much more complex, and can
be modeled as queueing systems
...
Each of these services has a
queue associated with it, and small transactions may spend most of their time wait-

...
Figure 21
...

As a result of the numerous queues in the database, bottlenecks in a database system typically show up in the form of long queues for a particular service, or, equivalently, in high utilizations for a particular service
...
Unfortunately, the arrival of requests
in a database system is never so uniform, and is instead random
...
Assuming uniformly randomly distributed arrivals, the length of the queue (and
correspondingly the waiting time) goes up exponentially with utilization; as utilization
approaches 100 percent, the queue length increases sharply, resulting in excessively
long waiting times
...
As a rule of thumb, utilizations of around 70 percent are
considered to be good, and utilizations above 90 percent are considered excessive,
since they will result in signiﬁcant delays
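To make the rule of thumb concrete, the sketch below assumes the simplest textbook queueing model, a single server with random (Poisson) arrivals and exponentially distributed service times, usually written M/M/1. The text does not name a model, so this choice is our assumption. Under it, the mean number of requests in the system at utilization ρ is ρ/(1 − ρ), and the mean response time is 1/(1 − ρ) service times; the growth is hyperbolic rather than literally exponential, but it shows the same sharp rise near 100 percent.

```java
class QueueModel {
    // Mean number of requests in an M/M/1 system (waiting or in service).
    static double meanInSystem(double rho) {
        if (rho < 0.0 || rho >= 1.0) {
            throw new IllegalArgumentException("utilization must be in [0, 1)");
        }
        return rho / (1.0 - rho);
    }

    // Mean response time, measured in multiples of the service time.
    static double responseTimeFactor(double rho) {
        if (rho < 0.0 || rho >= 1.0) {
            throw new IllegalArgumentException("utilization must be in [0, 1)");
        }
        return 1.0 / (1.0 - rho);
    }

    public static void main(String[] args) {
        for (double rho : new double[] {0.5, 0.7, 0.9, 0.99}) {
            System.out.printf("utilization %.2f: %.1f requests in system, "
                    + "response time %.1f x service time%n",
                    rho, meanInSystem(rho), responseTimeFactor(rho));
        }
    }
}
```

Under this model a request at 70 percent utilization takes about 3.3 service times, at 90 percent about 10, and at 99 percent about 100.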
...

[Figure: queues between the components of a database system: transaction source, transaction monitor, transaction manager, concurrency-control manager (lock request, lock grant), CPU manager, buffer manager (page request, page reply), and disk manager]
Figure 21
...

21
...
2 Tunable Parameters
Database administrators can tune a database system at three levels
...
Options for tuning systems at this level include adding disks
or using a RAID system if disk I/O is a bottleneck, adding more memory if the disk
buffer size is a bottleneck, or moving to a faster processor if CPU use is a bottleneck
...
The exact set of database-system parameters that can be
tuned depends on the speciﬁc database system
...
Well-designed database systems perform
as much tuning as possible automatically, freeing the user or database administrator
from the burden
...
If the system automatically adjusts the buffer size by observing indicators
such as page-fault rates, then the user will not have to worry about tuning the buffer
size
...
It includes the schema and transactions
...
Tuning at this level is
comparatively system independent
...
For example, tuning at a higher level may result in the
hardware bottleneck changing from the disk system to the CPU, or vice versa
...
2
...

An important factor in tuning a transaction processing system is to make sure that
the disk subsystem can handle the rate at which I/O operations are required
...
If each transaction requires just 2 I/O operations, a single disk would support at
most 50 transactions per second
...
If the system needs to support n transactions
per second, each performing 2 I/O operations, data must be striped (or otherwise
partitioned) across n/50 disks (ignoring skew)
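The sizing rule can be written out as a short calculation. The figure of 100 I/O operations per second per disk follows from the numbers above (50 transactions per second at 2 I/O operations each); the class and method names are ours, and skew is ignored, as in the text.

```java
class DiskSizing {
    // Number of disks needed so that the disk subsystem can sustain the load.
    static long disksNeeded(double transactionsPerSec, int iosPerTransaction,
                            int iosPerDiskPerSec) {
        double requiredIosPerSec = transactionsPerSec * iosPerTransaction;
        return (long) Math.ceil(requiredIosPerSec / iosPerDiskPerSec);
    }

    public static void main(String[] args) {
        // n = 500 transactions per second, 2 I/Os each, 100 I/Os per disk per second.
        System.out.println("disks needed: " + disksNeeded(500, 2, 100));
    }
}
```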
...
The number of I/O operations per transaction can be reduced
by storing more data in memory
...
Keeping frequently used data in memory reduces the number of
disk I/Os, and is worth the extra cost of memory
...

The question is, for a given amount of money available for spending on disks or
memory, what is the best way to spend the money to achieve maximum number of

...
A reduction of 1 I/O per second saves (price per disk drive) /
(access per second per disk)
...
Storing a page in
memory costs (price per MB of memory) / (pages per MB of memory)
...
Current disk technology and memory and
disk prices give a value of n around 1/300 times per second (or equivalently, once in
5 minutes) for pages that are randomly accessed
...
In
other words, it is worth buying enough memory to cache all pages that are accessed
at least once in 5 minutes on average
...
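The break-even calculation can be sketched from the two cost terms given above: a page accessed n times per second saves n × (price per disk drive / accesses per second per disk) when cached, and costs (price per MB of memory / pages per MB of memory) to keep in memory. Setting the two equal gives the break-even access rate. The prices and rates in main are purely illustrative values that we chose to land near the 5-minute figure; they are not taken from the text.

```java
class BreakEven {
    // Interval between accesses, in seconds, at which keeping a page in memory
    // costs exactly as much as the disk capacity needed to re-read it.
    static double breakEvenSeconds(double pricePerDisk,
                                   double accessesPerSecPerDisk,
                                   double pricePerMbMemory,
                                   double pagesPerMbMemory) {
        double costPerIoPerSec = pricePerDisk / accessesPerSecPerDisk;
        double costPerCachedPage = pricePerMbMemory / pagesPerMbMemory;
        double breakEvenRate = costPerCachedPage / costPerIoPerSec; // accesses per second
        return 1.0 / breakEvenRate;
    }

    public static void main(String[] args) {
        // Hypothetical inputs: a $2000 disk doing 64 random accesses per second,
        // memory at $15 per MB, and 128 pages (8 KB each) per MB.
        double seconds = breakEvenSeconds(2000, 64, 15, 128);
        System.out.printf("cache pages used at least once every %.0f seconds%n", seconds);
    }
}
```

Because the result depends only on price and performance ratios, the break-even interval shifts whenever disk and memory prices change at different rates.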

The formula for ﬁnding the break-even point depends on factors, such as the costs
of disks and memory, that have changed by factors of 100 or 1000 over the past
decade
...
Assuming 1 MB of data is read at a time, we get the 1-minute rule, which
says that sequentially accessed data should be cached in memory if they are used at
least once in 1 minute
...
Some applications need to keep even infrequently used data in memory, to support response times that are less than or comparable to disk access time
...
The answer depends on how frequently the data are updated, since RAID 5 is much slower than
RAID 1 on random writes: RAID 5 requires 2 reads and 2 writes to execute a single random write request
...
We can then calculate the number of disks
required to support the required I/O operations per second by dividing the result of
the calculation by 100 I/O operations per second (for current generation disks)
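A sketch of the comparison follows. It assumes a workload of r random reads and w random writes per second, charges 4 I/O operations per write under RAID 5 (2 reads and 2 writes, as stated above) and 2 per write under RAID 1 (one write to each mirror, which is our assumption from how mirroring works), and divides by 100 I/O operations per second per disk. The class and method names are ours.

```java
class RaidSizing {
    // Disks needed for the I/O rate alone under RAID 1 (mirroring).
    static long raid1Disks(double readsPerSec, double writesPerSec, double iosPerDiskPerSec) {
        return (long) Math.ceil((readsPerSec + 2 * writesPerSec) / iosPerDiskPerSec);
    }

    // Disks needed for the I/O rate alone under RAID 5 (distributed parity).
    static long raid5Disks(double readsPerSec, double writesPerSec, double iosPerDiskPerSec) {
        return (long) Math.ceil((readsPerSec + 4 * writesPerSec) / iosPerDiskPerSec);
    }

    public static void main(String[] args) {
        double r = 1000, w = 500; // hypothetical workload
        System.out.println("RAID 1: " + raid1Disks(r, w, 100) + " disks");
        System.out.println("RAID 5: " + raid5Disks(r, w, 100) + " disks");
    }
}
```

This counts only the disks needed to sustain the I/O rate; the disks needed to hold the data are a separate lower bound, and the larger of the two applies.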
...
For such applications, if RAID 1 is used, the required
number of disks is actually less than the required number of disks if RAID 5 is used!
Thus RAID 5 is useful only when the data storage requirements are very large, but
the I/O rates and data transfer requirements are small, that is, for very large and very
“cold” data
...
21
...
2
...
For example, consider the account relation, with the schema
account (account-number, branch-name, balance)
for which account-number is a key
...

If most accesses to account information look at only the account-number and balance, then they can be run against the account-balance relation, and access is likely to be
somewhat faster, since the branch-name attribute is not fetched
...
This effect would be particularly marked
if the branch-name attribute were large
...

On the other hand, if most accesses to account information require both balance and
branch-name, using the account relation would be preferable, since the cost of the join
of account-balance and account-branch would be avoided
...

Another trick to improve performance is to store a denormalized relation, such
as a join of account and depositor, where the information about branch-names and
balances is repeated for every account holder
...
However,
a query that fetches the names of the customers and the associated balances will
be sped up, since the join of account and depositor will have been precomputed
...

Materialized views can provide the beneﬁts that denormalized relations provide,
at the cost of some extra storage; we describe performance tuning of materialized
views in Section 21
...
6
...
Thus, materialized views are preferable,
whenever they are supported by the database system
...
We saw
such clustered ﬁle organizations in Section 11
...
2
...
21
...
5 Tuning of Indices
We can tune the indices in a system to improve performance
...
If
updates are the bottleneck, there may be too many indices, which have to be updated
when the relations are updated
...

The choice of the type of index also is important
...
If range queries
are common, B-tree indices are preferable to hash indices
...
Only one index on a relation can
be made clustered, by storing the relation sorted on the index attributes
...

To help identify what indices to create, and which index (if any) on each relation
should be clustered, some database systems provide tuning wizards
...

Recommendations on what indices to create are based on these estimates
...
2
...
Recall the example from Section 14
...
As we saw in that section, creating a materialized view
storing the total loan amount for each branch can greatly speed up such queries
...
In the case of immediate view maintenance, if
the updates of a transaction affect the materialized view, the materialized view must
be updated as part of the same transaction
...

In the case of deferred view maintenance, the materialized view is updated later;
until it is updated, the materialized view may be inconsistent with the database relations
...
Using deferred maintenance reduces the burden on
update transactions
...
From the examination, the system administrator may choose an appropriate set of materialized views
...

However, manual choice is tedious for even moderately large sets of query types,
and making a good choice may be difﬁcult, since it requires understanding the costs

of different alternatives; only the query optimizer can estimate the costs with reasonable accuracy, without actually executing the query
...

The administrator repeats the process until a set of views is found that gives acceptable performance
...
Some database systems, such as Microsoft SQL Server 7
...
These tools examine the workload (the history of queries and updates)
and suggest indices and views to be materialized
...

Microsoft’s materialized view selection tool also permits the user to ask “what
if” questions, whereby the user can pick a view, and the optimizer then estimates the
effect of materializing the view on the total cost of the workload and on the individual
costs of different query/update types in the workload
...

Greedy heuristics for materialized view selection operate roughly this way: They
estimate the beneﬁts of materializing different views, and choose the view that gives
either the maximum beneﬁt or the maximum beneﬁt per unit space (that is, beneﬁt divided by the space required to store the view)
...
The process continues until
either the available disk space for storing materialized views is exhausted, or the cost
of view maintenance increases above acceptable limits
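The greedy loop can be sketched as below, using the benefit-per-unit-space criterion and stopping when no remaining view fits in the available space. This is a simplification under our own assumptions: the benefit of each candidate is a fixed number supplied by the caller, whereas a real tool would ask the optimizer to re-estimate benefits after each choice and would also watch the cost of view maintenance. The view names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class GreedyViewSelection {
    static final class Candidate {
        final String name;
        final double benefit; // estimated saving in workload cost
        final double space;   // storage required, assumed positive

        Candidate(String name, double benefit, double space) {
            this.name = name;
            this.benefit = benefit;
            this.space = space;
        }
    }

    // Repeatedly pick the view with the highest benefit per unit space that still fits.
    static List<String> select(List<Candidate> candidates, double spaceBudget) {
        List<Candidate> remaining = new ArrayList<>(candidates);
        List<String> chosen = new ArrayList<>();
        double free = spaceBudget;
        while (true) {
            Candidate best = null;
            for (Candidate c : remaining) {
                if (c.space > free || c.benefit <= 0) {
                    continue; // does not fit, or not worth materializing
                }
                if (best == null || c.benefit / c.space > best.benefit / best.space) {
                    best = c;
                }
            }
            if (best == null) {
                break; // space exhausted or nothing useful left
            }
            chosen.add(best.name);
            free -= best.space;
            remaining.remove(best);
        }
        return chosen;
    }

    public static void main(String[] args) {
        List<Candidate> candidates = new ArrayList<>();
        candidates.add(new Candidate("branch_totals", 100, 10));
        candidates.add(new Candidate("customer_balances", 120, 40));
        candidates.add(new Candidate("daily_sales", 50, 4));
        System.out.println("materialize: " + select(candidates, 20));
    }
}
```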
...
2
...
Today’s advanced optimizers can transform even badly
written queries and execute them efﬁciently, so the need for tuning individual queries
is less important than it used to be
...
Most systems provide a
mechanism to ﬁnd out the exact execution plan for a query; this information can be
used to rewrite the query in a form that the optimizer can deal with better
...
executed only once
...

For example, consider a program that steps through each department speciﬁed in
a list, invoking an embedded SQL query to ﬁnd the total expenses of the department
by using the group by construct on a relation expenses(date, employee, department,
amount)
...
Instead, we can use a single SQL query
to ﬁnd total expenses of all departments; the query can be evaluated with a single
scan
...
Even if there is an index that permits efﬁcient
access to tuples of a given department, using multiple SQL queries can have a high
communication overhead in a client-server system
...

Another technique used widely in client-server systems to reduce the cost of communication and SQL compilation is to use stored procedures, where queries are stored
at the server in the form of procedures, which may be precompiled
...

Concurrent execution of different types of transactions can sometimes lead to poor
performance because of contention on locks
...
During the day, numerous small update transactions are executed almost continuously
...
If the query performs a scan on a relation, it may block out all updates
on the relation while it runs, and that can have a disastrous effect on the performance
of the system
...
This feature should be used if available
...
For databases supporting Web sites, there may be no such quiet period
for updates
...
The application semantics determine whether approximate (inconsistent) answers are acceptable
...
If a transaction performs
many updates, the system log may become full even before the transaction completes, in which case the transaction will have to be rolled back
...
Again, this blocking could lead
to the log getting ﬁlled up
...
Even if the system does not
impose such limits, it is often helpful to break up a large update transaction into a set

of smaller update transactions where possible
...
Such transactions are called minibatch transactions
...
First, if there are concurrent updates on the set of employees, the
result of the set of smaller transactions may not be equivalent to that of the single
large transaction
...
To avoid this problem, as soon as the system recovers from failure, we
must execute the transactions remaining in the batch
...
2
...
Each service shown in
Figure 21
...
Instead of modeling details of a service, the simulation
model may capture only some aspects of each service, such as the service time — that
is, the time taken to ﬁnish processing a request once processing has begun
...

Since requests for a service generally have to wait their turn, each service has an
associated queue in the simulation model
...
The requests are queued up as they arrive, and are serviced according to the
policy for that service, such as ﬁrst come, ﬁrst served
...

Once the simulation model for transaction processing is built, the system administrator can run a number of experiments on it
...
The administrator could run other experiments that vary the service times for each service to ﬁnd out how sensitive the
performance is to each of them
...

21
...
Performance benchmarks are suites of tasks that are used to quantify the performance of software systems
...
3
...
As a result, there is a signiﬁcant amount of variation in their performance on different tasks
...
the most efﬁcient on a particular task; another may be the most efﬁcient on a different task
...
Instead, the performance of a system is measured by suites of standardized
tasks, called performance benchmarks
...

Suppose that we have two tasks, T1 and T2, and that we measure the throughput of a
system as the number of transactions of each type that run in a given amount of time
— say, 1 second
...
Similarly, let system B run both T1 and T2 at
50 transactions per second
...

If we took the average of the two pairs of numbers (that is, 99 and 1, versus 50
and 50), it might appear that the two systems have equal performance
...
5 seconds to ﬁnish, whereas system B would ﬁnish in
just 2 seconds!
The example shows that a simple measure of performance is misleading if there
is more than one type of transaction
...
We can then compute system performance accurately in transactions per second for a speciﬁed workload
...
5/100, which is
0
...
02 seconds per transaction,
on average
...
98 transactions per second, whereas system B runs at 50 transactions per second
...
The harmonic mean of n throughputs t1 ,
...
98
...
Thus, system B is approximately 25 times faster than system A on
a workload consisting of an equal mixture of the two example types of transactions
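The comparison can be checked with a short calculation. The harmonic mean of n throughputs t1, ..., tn is n / (1/t1 + ... + 1/tn). The sketch below applies it to the two systems, taking system A's throughputs as 99 and 1 transactions per second and system B's as 50 and 50, the pairs quoted above; the class and method names are ours.

```java
class Throughput {
    // Harmonic mean of per-task throughputs (transactions per second).
    static double harmonicMean(double... throughputs) {
        double sumOfInverses = 0.0;
        for (double t : throughputs) {
            sumOfInverses += 1.0 / t;
        }
        return throughputs.length / sumOfInverses;
    }

    // Time to run `countPerType` transactions of each type, one after another.
    static double secondsFor(int countPerType, double... throughputs) {
        double seconds = 0.0;
        for (double t : throughputs) {
            seconds += countPerType / t;
        }
        return seconds;
    }

    public static void main(String[] args) {
        double a = harmonicMean(99, 1);
        double b = harmonicMean(50, 50);
        System.out.printf("A: %.2f tps, B: %.2f tps, B/A = %.1f%n", a, b, b / a);
        System.out.printf("50 of each type: A takes %.1f s, B takes %.1f s%n",
                secondsFor(50, 99, 1), secondsFor(50, 50, 50));
    }
}
```

The arithmetic mean of 99 and 1 hides the fact that nearly all of system A's time goes to the slow transaction type; the harmonic mean weights each type by the time it actually consumes.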
...
3
...
These two classes of tasks have different requirements
...
On the other hand, good query-evaluation algorithms and query optimization are required for decision support
...
Other vendors try to strike a balance between the two tasks
...
...
Hence, which database system is best for an application depends
on what mix of the two requirements the application has
...

We must be careful even about taking the harmonic mean of the throughput numbers, because of interference between the transactions
...
The harmonic mean of throughputs should be used
only if the transactions do not interfere with one another
...
3
...

The TPC benchmarks are deﬁned in great detail
...
They deﬁne the number of tuples in the relations not as a
ﬁxed number, but rather as a multiple of the number of claimed transactions per second, to reﬂect that a larger rate of transaction execution is likely to be correlated with
a larger number of accounts
...
When its performance is measured, the system must
provide a response time within certain bounds, so that a high throughput cannot be
obtained at the cost of very long response times
...
Hence, the TPC benchmark also measures performance
in terms of price per TPS
...
Moreover, a
company cannot claim TPC benchmark numbers for its systems without an external
audit that ensures that the system faithfully follows the deﬁnition of the benchmark,
including full support for the ACID properties of transactions
...
This
benchmark simulates a typical bank application by a single type of transaction that
models cash withdrawal and deposit at a bank teller
...
The benchmark also incorporates communication with terminals, to model the end-to-end performance of the
system realistically
...

It removes the parts of the TPC-A benchmark that deal with users, communication,
and terminals, to focus on the back-end database server
...

The TPC-C benchmark was designed to model a more complex system than the
TPC-A benchmark
...
The TPC-C benchmark is
still widely used for transaction processing
...
The TPC-D benchmark was designed to test the performance of database systems
on decision-support queries
...
The TPC-A, TPC-B, and TPC-C benchmarks measure performance
on transaction-processing workloads, and should not be used as a measure of performance on decision-support queries
...

The TPC-D benchmark schema models a sales/distribution application, with parts,
suppliers, customers, and orders, along with some auxiliary information
...
TPC-D at scale factor 1 represents the TPC-D benchmark
on a 1-gigabyte database, while scale factor 10 represents a 10-gigabyte database
...
Some of the queries make use of complex SQL
features, such as aggregation and nested queries
...
There are applications, such as periodic reporting tasks, where the queries are
known in advance and materialized views can be carefully selected to speed up the
queries
...

The TPC-R benchmark (where R stands for reporting) is a reﬁnement of the TPC-D
benchmark
...
In addition, there are two updates, a set of inserts and a set of deletes
...

In contrast, the TPC-H benchmark (where H represents ad hoc) uses the same
schema and workload as TPC-R but prohibits materialized views and other redundant information, and permits indices only on primary and foreign keys
...

Both TPC-H and TPC-R measure performance in this way: The power test runs
the queries and updates one at a time sequentially, and 3600 seconds divided by
the geometric mean of the execution times of the queries (in seconds) gives a measure
of queries per hour
...
There is also a parallel update stream
...

The composite query per hour metric, which is the overall metric, is then obtained as the square root of the product of the power and throughput metrics
...
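The two formulas described above can be sketched as follows. The power figure is 3600 divided by the geometric mean of the query execution times in seconds, and the composite figure is the square root of the product of the power and throughput figures. This follows only the description given here; the official benchmark definition contains further details that the sketch ignores, and the sample times and throughput value in main are hypothetical.

```java
class TpcMetrics {
    // Geometric mean, computed through logarithms to avoid overflow.
    static double geometricMean(double... values) {
        double sumOfLogs = 0.0;
        for (double v : values) {
            sumOfLogs += Math.log(v);
        }
        return Math.exp(sumOfLogs / values.length);
    }

    // Queries per hour from the sequential (power) test.
    static double power(double... executionSeconds) {
        return 3600.0 / geometricMean(executionSeconds);
    }

    // Overall metric: square root of power times throughput.
    static double composite(double power, double throughput) {
        return Math.sqrt(power * throughput);
    }

    public static void main(String[] args) {
        double p = power(1.0, 4.0);          // hypothetical execution times
        double c = composite(p, 200.0);      // hypothetical throughput figure
        System.out.printf("power = %.1f, composite = %.1f%n", p, c);
    }
}
```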

The TPC-W Web commerce benchmark is an end-to-end benchmark that models
Web sites having static content (primarily images) and dynamic content generated
from a database
...
The benchmark models an electronic bookstore,
and like other TPC benchmarks, provides for different scale factors
...

21
...
4 The OODB Benchmarks
The nature of applications in an object-oriented database, OODB, is different from
that of typical transaction-processing applications
...
The Object Operations benchmark, version 1,
popularly known as the OO1 benchmark, was an early proposal
...
The TPC benchmarks provide one or two numbers (in terms of average transactions per second, and
transactions per second per dollar); the OO7 benchmark provides a set of numbers,
containing a separate benchmark number for each of several different kinds of operations
...
It is clear that such a transaction will carry out certain operations, such
as traversing a set of connected objects or retrieving all objects in a class, but it is not
clear exactly what mix of these operations will be used
...

21
...
Today, database systems are complex, and are often made up of multiple independently created parts that need to interact
...
A company that has multiple heterogeneous database systems may
need to exchange data between the databases
...

Formal standards are those developed by a standards organization or by industry
groups, through a public process
...
Some formal standards, like many aspects of the SQL-92 and
SQL:1999 standards, are anticipatory standards that lead the marketplace; they deﬁne
features that vendors then implement in products
...
SQL-89 was in many ways reactionary, since it standardized
features, such as integrity checking, that were already present in the IBM SAA SQL
standard and in other databases
...
Formal standards committees meet periodically, and
members present proposals for features to be added to or modiﬁed in the standard
...
public review, members vote on whether to accept or reject a feature
...
The process of updating the standard then begins, and a new version of the standard is usually released after a few years
...

The DBTG CODASYL standard for network databases, formulated by the Database
Task Group, was one of the early formal standards for databases
...
With the growth of relational databases came a number of new entrants in
the database business; hence, the need for formal standards arose
...
A notable example is ODBC, which is now used in non-Microsoft environments
...

This section gives a very high-level overview of different standards, concentrating
on the goals of the standard
...

21
...
1 SQL Standards
Since SQL is the most widely used query language, much work has been done on
standardizing it
...
The SQL-86 standard was the initial version
...
As people identiﬁed the need for more features, updated versions of the formal SQL standard were developed, called SQL-89 and SQL-92
...
We have seen many of these features in earlier chapters, and will see a few in
later chapters
...

• SQL/Foundation (Part 2) deﬁnes the basics of the standard: types, schemas, tables, views, query and update statements, expressions, security model, predicates, assignment rules, transaction management and so on
...

• SQL/PSM (Persistent Stored Modules) (Part 4) deﬁnes extensions to SQL to
make it procedural
...

The SQL:1999 OLAP features (Section 22
...
3) have been speciﬁed as an amendment
to the earlier version of the SQL:1999 standard
...

• Part 9: SQL/MED (Management of External Data) deﬁnes standards for interfacing an SQL system to external sources
...

• Part 10: SQL/OLB (Object Language Bindings) deﬁnes standards for embedding SQL in Java
...
The multimedia standards propose to cover storage and retrieval of text data,
spatial data, and still images
...
4
...
ODBC is based on the SQL Call-Level Interface
(CLI) standards developed by the X/Open industry consortium and the SQL Access
Group, but has several extensions
...
The standard also deﬁnes
conformance levels for the CLI and the SQL syntax
...
The next level of conformance (level 1) requires support for catalog information retrieval and some other
features over and above the core-level CLI; level 2 requires further features, such as
ability to send and retrieve arrays of parameter values and to retrieve more detailed
catalog information
...

A distributed system provides a more general environment than a client-server
system
...
These standards deﬁne transaction-management primitives (such as transaction begin, commit, abort, and prepare-to-commit) that compliant databases should provide; a transaction manager can invoke these primitives
to implement distributed transactions by two-phase commit
...
Thus, we can use the XA protocols to implement a distributed transaction system in which a single transaction can access relational as well
as object-oriented databases, yet the transaction manager ensures global consistency
via two-phase commit
...
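How a transaction manager might drive these primitives can be sketched as follows. The `ResourceManager` class and its method names are hypothetical stand-ins for compliant databases; they are not the actual XA function names, and failure recovery and logging are left out.

```python
class ResourceManager:
    """Hypothetical stand-in for a database exposing XA-style primitives."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "idle"

    def begin(self):
        self.state = "active"

    def prepare(self):
        # Vote yes only if the local work could be made durable.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(managers):
    """Return True if every participant committed, False if all were aborted."""
    for rm in managers:
        rm.begin()
    # Phase 1: collect votes; all() stops asking at the first "no".
    if all(rm.prepare() for rm in managers):
        # Phase 2: the vote was unanimous, so every participant commits.
        for rm in managers:
            rm.commit()
        return True
    for rm in managers:
        rm.abort()
    return False


relational = ResourceManager("relational")
oodb = ResourceManager("object-oriented")
ok = two_phase_commit([relational, oodb])

failing = [ResourceManager("a"), ResourceManager("b", can_commit=False)]
failed = two_phase_commit(failing)
```

Because the manager only calls the shared primitives, the participants can be different kinds of database systems.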

There are many data sources that are not relational databases, and in fact may not
be databases at all
...
Microsoft’s OLE-DB is
a C++ API with goals similar to ODBC, but for nondatabase data sources that may
provide only limited querying and update facilities
...

However, OLE-DB differs from ODBC in several ways
...
An OLE-DB
program can negotiate with a data source to ﬁnd what interfaces are supported
...
In OLE-DB, commands may be in any language
supported by the data source; while some sources may support SQL, or a limited
subset of SQL, other sources may provide only simple capabilities such as accessing
data in a ﬂat ﬁle, without any query capability
...
A rowset object can be updated by one application, and
other applications sharing that object would get notiﬁed about the change
...

21
...
3 Object Database Standards
Standards in the area of object-oriented databases have so far been driven primarily
by OODB vendors
...

The C++ language interface speciﬁed by ODMG was discussed in Chapter 8
...

The Object Management Group (OMG) is a consortium of companies, formed with
the objective of developing a standard architecture for distributed software applications based on the object-oriented model
...
The Object Request Broker (ORB) is a component
of the OMA architecture that provides message dispatch to distributed objects transparently, so the physical location of the object is not important
...
The IDL helps to support data conversion when
data are shipped between systems with different data representations
...
4
...
Many of these standards are related to e-commerce
...
RosettaNet, which falls into the former category,
uses XML-based standards to facilitate supply-chain management in the computer
and information technology industries
...
BizTalk is a framework of XML schemas and guidelines, backed by Microsoft
...

Participants in electronic marketplaces may store data in a variety of database systems
...

Furthermore, there may be semantic differences (metric versus English measure, distinct monetary currencies, and so forth) in the data
...
These XML wrappers form the basis of a uniﬁed view of data across all
of the participants in the marketplace
...
SOAP is backed by the
World Wide Web Consortium (W3C) and is gaining wide acceptance in industry
(including IBM and Microsoft)
...
For
instance, in business-to-business e-commerce, applications running at one site can
access data from other sites through SOAP
...
(OLAP and data mining are covered in
Chapter 22
...
As of early 2001 the
standard was in working draft stage, and should be ﬁnalized by the end of the year
...

21
...
The types of activities include:
• Presale activities, needed to inform the potential buyer about the product or
service being sold
...

• The marketplace: When there are multiple sellers and buyers for a product,
a marketplace, such as a stock exchange, helps in negotiating the price to be
paid for the product
...

• Payment for the sale
...

• Activities related to delivery of the product or service
...

• Customer support and postsale service
...
For some of the activities the use of databases is straightforward, but there are interesting application development issues for the other activities
...
5
...
The services provided by an e-catalog may vary considerably
...
To help with browsing, products
should be organized into an intuitive hierarchy, so a few clicks on hyperlinks can
lead a customer to the products they are interested in
...
E-catalogs should also provide a means for customers
to easily compare alternatives from which to choose among competing products
...
For instance, a retailer may have
an agreement with a large company to supply some products at a discount
...
Because of legal restrictions on sales of some types of items, customers who are underage, or from certain states or countries, should not be shown items that cannot be
legally sold to them
...
For instance, frequent customers may be offered special
discounts on some items
...
There are also challenges in supporting very high transaction rates, which are
often tackled by caching of query results or generated Web pages
...
5
...
There are several different types of marketplaces:
• In a reverse auction system a buyer states requirements, and sellers bid for
supplying the item
...
In a closed
bidding system, the bids are not made public, whereas in an open bidding
system the bids are made public
...
For simplicity, assume that there is only one instance of each item being sold
...

When there are multiple copies of an item, things become more complicated: Suppose there are four items, and one bidder may want three copies
for $10 each, while another wants two copies for $13 each
...
If the items will be of no value if they are not sold (for
instance, airline seats, which must be sold before the plane leaves), the seller
simply picks a set of bids that maximizes the income
...
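For the four-copy example above, choosing the income-maximizing set of bids can be sketched with an exhaustive search. The sketch assumes each bid is all-or-nothing (a bidder gets every copy asked for or none), which the text does not state; the function name and bid layout are illustrative only.

```python
from itertools import combinations

def best_bids(copies_available, bids):
    """bids: list of (bidder, copies_wanted, price_per_copy) tuples.

    Tries every subset of bids and keeps the feasible one with the most income.
    """
    best_income, best_set = 0, ()
    for r in range(1, len(bids) + 1):
        for subset in combinations(bids, r):
            copies = sum(b[1] for b in subset)
            income = sum(b[1] * b[2] for b in subset)
            if copies <= copies_available and income > best_income:
                best_income, best_set = income, subset
    return best_income, best_set

# Four copies; one bidder wants three at $10 each, another wants two at $13 each.
bids = [("first", 3, 10), ("second", 2, 13)]
income, chosen = best_bids(4, bids)
```

Both bids cannot be satisfied together (five copies would be needed), so under this assumption the three-copy bid wins even though its per-copy price is lower. Exhaustive search is exponential in the number of bids and is only meant for small illustrations.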

• In an exchange, such as a stock exchange, there are multiple sellers and multiple buyers
...
There is usually a market
maker who matches buy and sell bids, deciding on the price for each trade (for
instance, at the price of the sell bid)
...

Among the database issues in handling marketplaces are these:
• Bidders need to be authenticated before they are allowed to bid
...
Bids need to be
communicated quickly to other people involved in the marketplace (such as
all the buyers or all the sellers), who may be numerous
...

• The volumes of trades may be extremely large at times of stock market volatility, or toward the end of auctions
...

21
...
3 Order Settlement
After items have been selected (perhaps through an electronic catalog), and the price
determined (perhaps by an electronic marketplace), the order has to be settled
...

A simple but insecure way of paying electronically is to send a credit card number
...
First, credit card fraud is possible
...
Second, the seller has to be trusted to bill only
for the agreed-on item and to not pass on the card number to unauthorized people
who may misuse it
...
In addition, they provide for better privacy, whereby the seller may not
be given any unnecessary details about the buyer, and the credit card company is not
provided any unnecessary information about the items purchased
...
cannot ﬁnd out the contents
...

The protocols must also prevent person-in-the-middle attacks, where someone
can impersonate the bank or credit-card company, or even the seller, or buyer, and
steal secret information
...
Impersonation is prevented by a system of digital certiﬁcates, whereby public keys are signed by a certiﬁcation agency, whose public key is well known
(or which in turn has its public key certiﬁed by another certiﬁcation agency and so
on up to a key that is well known)
...

The Secure Electronic Transaction (SET) protocol is one such secure payment protocol
...

There are also systems that provide for greater anonymity, similar to that provided by physical cash
...
When
a payment is made in such a system, it is not possible to identify the purchaser
...

21
...
Such systems may still contain valuable data, and
may support critical applications
...

Porting legacy applications to a more modern environment is often costly in terms
of both time and money, since they are often very large, consisting of millions of lines
of code developed by teams of programmers, over several decades
...
One approach used to interoperate between relational databases and legacy databases is to build a layer, called a
wrapper, on top of the legacy systems that can make the legacy system appear to be
a relational database
...
The wrapper is responsible for converting relational queries and updates into
queries and updates on the legacy system
...
Reverse engineering also
examines the code to ﬁnd out what procedures and processes were implemented, in
order to get a high-level model of the system
...
When coming up with the design of a new system, the design
is reviewed, so that it can be improved rather than just reimplemented as is
...
The overall process is
called re-engineering
...
However, abruptly transitioning to a new system, which is called
the big-bang approach, carries several risks
...
Second, there may be bugs or performance problems
in the new system that were not discovered when it was tested
...
In some extreme cases the new
system has even been abandoned, and the legacy system reused, after an attempted
switchover failed
...
For example, the new user interfaces may be
used with the old system in the back end, or vice versa
...
In either case, the legacy and new systems coexist for some time
...
This approach, therefore,
has a higher development cost associated with it
...
7 Summary
• The Web browser has emerged as the most widely used user interface to
databases
...
Web browsers communicate with Web servers by
the HTTP protocol
...

• There are several client-side scripting languages (JavaScript is the most widely
used) that provide richer user interaction at the browser end
...

Servlets are a widely used mechanism to write application programs that run
as part of the Web server process, in order to reduce overheads
...

• Tuning of the database-system parameters, as well as the higher-level database
design — such as the schema, indices, and transactions — is important for good
performance
...

The TPC
benchmark suites are widely used, and the different TPC benchmarks are useful for comparing the performance of databases under different workloads
...
Formal standards exist for SQL
...
Standards for object-oriented databases, such as ODMG, are
being developed by industry groups
...
There are several database issues in e-commerce systems
...

Electronic marketplaces help in pricing of products through auctions, reverse
auctions, or exchanges
...
Orders are settled by electronic payment systems, which
also need high-performance database systems to handle very high transaction
rates
...
Interfacing legacy
systems with new-generation systems is often important when they run
mission-critical systems
...

Review Terms
• Web interfaces to databases
• Connectionless
• HyperText Markup Language (HTML)
• Cookie
• Hyperlinks
• Server-side scripting
• Uniform resource locator (URL)
• Client-side scripting
• Applets
• Client-side scripting language
• Web servers
• Session
• HyperText Transfer Protocol (HTTP)
• Common Gateway Interface (CGI)
• Servlets
• Performance tuning
• Bottlenecks
• Queueing systems
• Tunable parameters
• Tuning of hardware
• Five-minute rule
• One-minute rule
• Tuning of the schema
• Tuning of indices

• Materialized views
• Immediate view maintenance
• Deferred view maintenance
• Tuning of transactions
• Improving set orientedness
• Minibatch transactions
• Performance simulation
• Performance benchmarks
• Service time
• Time to completion
• Database-application classes
• The TPC benchmarks
TPC-A
TPC-B
TPC-C
TPC-D
TPC-R
TPC-H
TPC-W
• Web interactions per second
• OODB benchmarks
OO1
OO7

• Standardization

Formal standards
De facto standards
Anticipatory standards
Reactionary standards
• Database connectivity standards
ODBC
OLE-DB

X/Open XA standards
• Object database standards
ODMG

CORBA
• XML-based standards
• E-commerce
• E-catalogs
• Marketplaces
Auctions
Reverse-auctions
Exchange
• Order settlement
• Digital certiﬁcates
• Legacy systems
• Reverse engineering
• Re-engineering

Exercises
21
...

21
...

21
...

21
...
What are the three broad levels at which a database system can be tuned
to improve performance?
b
...

21
...

21
...
6 Suppose a system runs three types of transactions
...
Suppose the mix of transactions
has 25 percent of type A, 25 percent of type B, and 50 percent of type C
...
What is the average transaction throughput of the system, assuming there
is no interference between the transactions
...
What factors may result in interference between the transactions of different types, leading to the calculated throughput being incorrect?
21
...

What would be the effect of this change on the 5 minute and 1 minute rule?
21
...

21
...
10 List some beneﬁts and drawbacks of an anticipatory standard compared to a
reactionary standard
...
11 Suppose someone impersonates a company and gets a certiﬁcate from a certiﬁcate issuing authority
...
The difﬁculty of the project can be adjusted easily by adding or
deleting features
...
1 Consider the E-R schema of Exercise 2
...
Design and implement a Web-based system to enter, update, and view the data
...
2 Design and implement a shopping cart system that lets shoppers collect
items into a shopping cart (you can decide what information is to be supplied
for each item) and purchased together
...
12 of Chapter 2
...

Project 21
...

Project 21
...
The number of assignments/exams should not
be predefined; that is, more assignments/exams can be added at any time
...

You may also wish to integrate it with the student registration system of
Project 21
...

Project 21
...
Periodic booking (ﬁxed days/times each week for a whole
semester) must be supported
...

You may also wish to integrate it with the student registration system of
Project 21
...

Project 21
...
You should support distributed contribution of questions (by teaching
assistants, for example), editing of questions by whoever is in charge of the
course, and creation of tests from the available set of questions
...

Project 21
...

Incoming mail goes to a common pool
...
If the e-mail is part of an ongoing series of replies (tracked
using the in-reply-to ﬁeld of e-mail) the mail should preferably be replied to
by the same agent who replied earlier
...

Project 21
...
You may also wish to support alerting services, whereby a user
can register interest in items in a particular category, perhaps with other constraints as well, without publicly advertising his/her interest, and is notiﬁed
when such an item is listed for sale
...
9 Design and implement a Web-based newsgroup system
...
The
system tracks which articles were read by a user, so they are not displayed
again
...

You may also wish to provide a rating service for articles, so that articles
with high rating are highlighted permitting the busy reader to skip low-rated
articles
...

21
...
10 Design and implement a Web-based system for managing a sports “ladder
...
Anyone can challenge anyone else to a match, and
the rankings are adjusted according to the result
...
You can try
to invent more complicated rank adjustment systems
...
11 Design and implement a publications listing service
...
Authors should be a separate entity with attributes such as name, institution, department, e-mail, address, and home page
...
For instance, you should provide all publications by a given author (sorted by year,
for example), or all publications by authors from a given institution or department
...

Bibliographical Notes
Information about servlets, including tutorials, standard speciﬁcations, and software,
is available on java
...
com/products/servlet
...
sun
...

An early proposal for a database-system benchmark (the Wisconsin benchmark)
was made by Bitton et al
...
The TPC-A,-B, and -C benchmarks are described
in Gray [1991]
...
tpc
...
Poess
and Floyd [2000] give an overview of the TPC-H, TPC-R, and TPC-W benchmarks
...
[1993]
...

Shasha [1992] provides a good overview of database tuning
...
The ﬁve minute and one minute rules are described in Gray and Putzolu
[1987] and Gray and Graefe [1997]
...
[1994] describes an approach to
automated tuning
...
[1996], Labio et al
...
[2000] and Mistry et al
...

The American National Standard SQL-86 is described in ANSI [1986]
...
The
standards for SQL-89 and SQL-92 are available as ANSI [1989] and ANSI [1992] respectively
...


The X/Open SQL call-level interface is deﬁned in X/Open [1993]; the ODBC API is
described in Microsoft [1997] and Sanders [1998]
...
More information about ODBC, OLE-DB, and ADO can be found
on the Web site www
...
com/data, and in a number of books on the subject
that can be found through www
...
com
...
0 standard is deﬁned in
Cattell [2000]
...

A wealth of information on XML based standards is available online
...

Loeb [1998] provides a detailed description of secure electronic transactions
...
Kirchmer [1999] describes application implementation using standard software such as Enterprise Resource Planning (ERP) packages
...

Tools
There are many Web development tools that support database connectivity through
servlets, JSP, Javascript, or other mechanisms
...
sun
...
apache
...
org), IBM WebSphere (www
...
ibm
...
microsoft
...
allaire
...
caucho
...
zope
...
A few of these, such
as Apache, are free for any use, some are free for noncommercial use or for personal use, while others need to be paid for
...

CHAPTER 22

Advanced Querying and Information Retrieval

Businesses have begun to exploit the burgeoning data online to make better decisions
about their activities, such as what items to stock and how best to target customers
to increase sales
...

Several techniques and tools are available to help with decision support
...
Other analysis tools precompute summaries of very large amounts of data, in order to give fast
responses to queries
...
Another approach to getting knowledge from data is to use
data mining, which aims at detecting various types of patterns in large volumes of
data
...

Textual data, too, has grown explosively
...
Querying of unstructured textual data
is referred to as information retrieval
...
However, the emphasis in the ﬁeld of information systems is different from that in database systems, concentrating on issues such as querying based on
keywords; the relevance of documents to the query; and the analysis, classiﬁcation,
and indexing of documents
...

22
...
3
...
Transaction-processing systems are widely used today, and companies have accumulated a vast amount of information generated by these systems
...
The size of the information storage required
may range up to hundreds of gigabytes, or even terabytes, for large retail chains
...
Information about the items purchased
may include the name of the item, the manufacturer, the model number, the color,
and the size
...

Such large databases can be treasure troves of information for making business
decisions, such as what items to stock and what discounts to offer
...
As another example, a car company may ﬁnd, on
querying its database, that most of its small sports cars are bought by young women
whose annual incomes are above $50,000
...
In both cases, the
company has identiﬁed patterns in customer behavior, and has used the patterns to
make business decisions
...
Several SQL
extensions have therefore been proposed to make data analysis easier
...
In
Section 22
...

• Database query languages are not suited to the performance of detailed statistical analyses of data
...
Such packages have been interfaced with databases,
to allow large volumes of data to be stored in the database and retrieved efﬁciently for analysis
...

• Knowledge-discovery techniques attempt to discover automatically statistical rules and patterns from data
...
Section 22
...

• Large companies have diverse sources of data that they need to use for making
business decisions
...

For performance reasons (as well as for reasons of organization control), the

...

To execute queries efﬁciently on such diverse data, companies have built
data warehouses
...
Thus, they provide the user a single uniform
interface to data
...
4
...

22
...
Since the data stored
in databases are usually large in volume, they need to be summarized in some fashion if we are to derive information that humans can use
...
Several SQL extensions have been developed to support OLAP tools
...
Examples include ﬁnding percentiles, or cumulative distributions, or aggregates
over sliding windows on sequentially ordered data
...

22
...
1 Online Analytical Processing
Statistical analysis often requires grouping on multiple attributes
...
Let us
suppose that clothes are characterized by their item-name, color, and size, and that
we have a relation sales with the schema sales(item-name, color, size, number)
...

Given a relation used for data analysis, we can identify some of its attributes as
measure attributes, since they measure some value, and can be aggregated upon
...
Some (or all) of the other attributes of the relation
are identiﬁed as dimension attributes, since they deﬁne the dimensions on which
measure attributes, and summaries of measure attributes, are viewed
...
(A more realistic version of
the sales relation would have additional dimensions, such as time and sales location,
and additional measures such as monetary value of the sale
...

[Figure 22 ... (cross-tab figure; surviving labels: item-name, color, and size: all)]

To analyze the multidimensional data, a manager may want to see data laid out
as shown in the table in Figure 22
...
The table shows total numbers for different
combinations of item-name and color
...

The table in Figure 22
...
In general, a cross-tab is a table where values for one
attribute (say A) form the row headers, values for another attribute (say B) form the
column headers, and the values in an individual cell are derived as follows
...
If there
is at most one tuple with any (ai , bj ) value, the value in the cell is derived from that
single tuple (if any); for instance, it could be the value of one or more other attributes
of the tuple
...
In our example, the
aggregation used is the sum of the values for attribute number
...
Most cross-tabs have such summary rows and columns
...
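A cross-tab with summary rows and columns can be sketched in a few lines of Python. The tuples below are hypothetical rows of the sales relation, not the data of the figure, and the label `"total"` for the margins is an arbitrary choice.

```python
from collections import defaultdict

# Hypothetical tuples of sales(item-name, color, size, number).
sales = [
    ("skirt", "dark", "small", 2),
    ("skirt", "dark", "large", 6),
    ("skirt", "white", "small", 10),
    ("dress", "dark", "medium", 20),
    ("dress", "white", "small", 5),
]

def cross_tab(rows):
    """Sum `number` into each (item-name, color) cell and into the margins."""
    cells = defaultdict(int)
    for item, color, _size, number in rows:
        for key in ((item, color), (item, "total"),
                    ("total", color), ("total", "total")):
            cells[key] += number
    return dict(cells)

tab = cross_tab(sales)
```

Here item-name values play the role of row headers, color values the role of column headers, and size is summed away, which corresponds to the `size: all` setting.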
A change in the data
values may result in adding more columns, which is not desirable for data storage
...
It is straightforward to
represent a cross-tab without summary values in a relational form with a ﬁxed number of columns
...
2
...

Consider the tuples (skirt, all, 53) and (dress, all, 35)
...
The value all can be thought of
as representing the set of all values for an attribute
...
Similarly, a group by on color can be used to get the tuples with
the value all for item-name, and a group by with no attributes (which can simply be
omitted in SQL) can be used to get the tuple with value all for item-name and color
...
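The combination of group-by queries described above can be tried out with Python's built-in `sqlite3` module. This is a sketch under assumptions: the column is spelled `item_name` because a hyphen is not valid in an unquoted SQL identifier, and the rows are invented so that the totals match the tuples (skirt, all, 53) and (dress, all, 35) mentioned in the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table sales (item_name text, color text, size text, number integer)"
)
# Hypothetical rows chosen so that skirt sums to 53 and dress to 35.
conn.executemany("insert into sales values (?, ?, ?, ?)", [
    ("skirt", "dark", "small", 8),
    ("skirt", "white", "small", 45),
    ("dress", "dark", "medium", 20),
    ("dress", "white", "large", 15),
])

query = """
    select item_name, color, sum(number) from sales group by item_name, color
    union all
    select item_name, 'all', sum(number) from sales group by item_name
    union all
    select 'all', color, sum(number) from sales group by color
    union all
    select 'all', 'all', sum(number) from sales
"""
result = set(conn.execute(query).fetchall())
```

Each branch of the union supplies the tuples for one grouping, with the string `'all'` standing in for the attributes that were aggregated away.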
[Figure 22 ... (table residue; the surviving item-name column lists skirt, dress, shirt, pant, and all, four rows each)]
Figure 22
...
The data cube has three dimensions, namely item-name,
color, and size, and the measure attribute is number
...
Each cell in the data cube contains a value, just as in a
cross-tab
...
3, the value contained in a cell is shown on one of the faces of
the cell; other faces of the cell are shown blank if they are visible
...
The number of different
ways in which the tuples can be grouped for aggregation can be large
...
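To see how quickly the number of groupings grows, the following sketch aggregates a relation over every subset of its dimension attributes, using the value `all` for the attributes that are aggregated away. The rows and the function name are hypothetical; with n dimensions there are 2^n such subsets.

```python
from collections import defaultdict
from itertools import combinations

def data_cube(rows, n_dims):
    """Sum the last field of each row over every subset of the first n_dims fields."""
    cube = defaultdict(int)
    for k in range(n_dims + 1):
        for kept in combinations(range(n_dims), k):
            for row in rows:
                # Keep the chosen dimensions; replace the others with 'all'.
                key = tuple(row[i] if i in kept else "all" for i in range(n_dims))
                cube[key] += row[-1]
    return dict(cube)

# Hypothetical tuples of sales(item-name, color, size, number).
rows = [
    ("skirt", "dark", "small", 2),
    ("skirt", "white", "small", 10),
    ("dress", "dark", "large", 20),
]
cube = data_cube(rows, 3)
groupings = 2 ** 3  # number of subsets of the three dimensions
```

This version rescans the rows once per grouping, so it is only meant to illustrate the idea, not to compute a cube efficiently.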
1
An online analytical processing or OLAP system is an interactive system that permits an analyst to view different summaries of multidimensional data
...

With an OLAP system, a data analyst can look at different cross-tabs on the same
data by interactively selecting the attributes in the cross-tab
...
Grouping on the set of all n dimensions is useful only if the table may have duplicates
...
[Figure 22 ...: Three-dimensional data cube. Surviving labels: item name (skirt, dress, shirts) and color (pastel, white); the layout of the cell values is not recoverable from the extraction]
...
For instance, the analyst may
select a cross-tab on item-name and size, or a cross-tab on color and size
...

An OLAP system provides other functionality as well
...
Such an operation is referred to as
slicing, since it can be thought of as viewing a slice of the data cube
...

When a cross-tab is used to view a multidimensional cube, the values of dimension
attributes that are not part of the cross-tab are shown above the cross-tab
...
1, indicating that data in the crosstab are a summary over all values for the attribute
...

OLAP systems permit users to view data at any desired level of granularity
...
In our example, starting from the data cube on the
sales table, we got our example cross-tab by rolling up on the attribute size
...
Clearly, ﬁner-granularity data cannot be generated from
coarse-granularity data; they must be generated either from the original data, or from
even ﬁner-granularity summary data
...
For instance,
an attribute of type datetime contains a date and a time of day
...
[Figure 22 ...: a) Time Hierarchy, with levels DateTime, Hour of day, Date, Day of week, Month, Quarter, Year; a second hierarchy with levels City, State, Country, Region]
...

of day may look at only the hour value
...
Another
analyst may be interested in aggregates over a month, or a quarter, or for an entire
year
...
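Viewing the same datetime attribute at different levels of detail can be sketched as a roll-up keyed by a level-specific function. The sales records and the level names below are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical (timestamp, number sold) pairs at the finest granularity.
sales = [
    (datetime(2001, 1, 15, 9, 30), 4),
    (datetime(2001, 1, 15, 9, 45), 1),
    (datetime(2001, 2, 3, 14, 5), 7),
    (datetime(2001, 5, 20, 9, 10), 2),
]

# Each level maps a timestamp to the value it is grouped under.
LEVELS = {
    "hour_of_day": lambda t: t.hour,
    "month": lambda t: (t.year, t.month),
    "quarter": lambda t: (t.year, (t.month - 1) // 3 + 1),
    "year": lambda t: t.year,
}

def roll_up(rows, level):
    """Total the numbers sold for each value of the chosen level."""
    totals = defaultdict(int)
    for stamp, number in rows:
        totals[LEVELS[level](stamp)] += number
    return dict(totals)

by_hour = roll_up(sales, "hour_of_day")
by_quarter = roll_up(sales, "quarter")
by_year = roll_up(sales, "year")
```

Note that hour of day is not a coarsening of month or year: it sits on a separate branch of the hierarchy, so it has to be computed from the detailed timestamps rather than from monthly totals.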

Figure 22
...
As another example, Figure 22
...
In
our earlier example, clothes can be grouped by category (for instance, menswear or
womenswear); category would then lie above item-name in our hierarchy on clothes
...

An analyst may be interested in viewing sales of clothes divided as menswear and
womenswear, and not interested in individual values
...
An analyst looking at the detailed level may drill up the
hierarchy, and look at coarser-level aggregates
...
5
...
2
...
Later, OLAP
facilities were integrated into relational systems, with data stored in a relational database
...
[Figure 22 ... (cross-tab figure; surviving labels: category, womenswear, menswear, total)]

Hybrid systems, which store some summaries in memory and store the base data and other
summaries in a relational database, are called hybrid OLAP (HOLAP) systems
The server contains the relational database as well as any MOLAP data cubes
...

A na¨ve way of computing the entire data cube (all groupings) on a relation is to
ı
use any standard algorithm for computing aggregate operations, one grouping at a
time
...
A simple optimization is to compute an aggregation on, say, (item-name, color) from an
aggregation (item-name, color, size), instead of from the original relation
...
1), but note that to compute avg, we additionally need the count value
...
) The amount of data
read drops signiﬁcantly by computing an aggregate from another aggregate, instead
of from the original relation
...
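The optimization above can be sketched in Python. This is only an illustration under stated assumptions: the dictionary `by_item_color_size` and its values are hypothetical sample data, and each entry stores a (sum, count) pair so that an average can still be recovered after rolling up.

```python
from collections import defaultdict

# Hypothetical precomputed aggregate on (item_name, color, size):
# each value is (sum of number, count of underlying sales tuples).
by_item_color_size = {
    ("skirt", "dark", "small"): (5, 2),
    ("skirt", "dark", "medium"): (3, 1),
    ("skirt", "pastel", "small"): (11, 4),
    ("dress", "dark", "small"): (20, 5),
}

def roll_up_size(fine):
    """Derive the (item_name, color) aggregate from the finer aggregate."""
    coarse = defaultdict(lambda: [0, 0])
    for (item, color, _size), (total, count) in fine.items():
        coarse[(item, color)][0] += total   # sums add up directly
        coarse[(item, color)][1] += count   # counts are kept so avg can be derived
    return {key: (s, c) for key, (s, c) in coarse.items()}

by_item_color = roll_up_size(by_item_color_size)

# avg cannot be rolled up from averages alone; it needs sum and count.
avg_by_item_color = {key: s / c for key, (s, c) in by_item_color.items()}
```

The coarser aggregate is computed by reading only the four summary entries, never the original sales tuples.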
See the bibliographical
notes for references to algorithms for efﬁciently computing data cubes
...
Precomputation allows OLAP
queries to be answered within a few seconds, even on datasets that may contain
millions of tuples adding up to gigabytes of data
...

As a result, the entire data cube is often larger than the original relation that formed
the data cube and in many cases it is not feasible to store the entire data cube
...
Instead of computing queries from the original relation, which may take a very long
time, we can compute them from other precomputed queries
...

The query result can be computed from summaries by (item-name, color, size), if that

...
See the bibliographical notes for references on how to select
a good set of groupings for precomputation, given limits on the storage available for
precomputed results
...
Section 22
...
3 discusses SQL extensions to support OLAP
functionality
...
2
...
The SQL:1999 standard, however, deﬁnes a rich set of
aggregate functions, which we outline in this section and in the next two sections
...

The new aggregate functions on single attributes are standard deviation and variance (stddev and variance)
...
2 Some
database systems support other aggregate functions such as median and mode
...

SQL:1999 also supports a new class of binary aggregate functions, which can compute statistical results on pairs of attributes; they include correlations, covariances,
and regression curves, which give a line approximating the relation between the values of the pair of attributes
...

SQL:1999 also supports generalizations of the group by construct, using the cube
and rollup constructs
...

For each grouping, the result contains the null value for attributes not present in
the grouping
...
2, with occurrences of all replaced
by null, can be computed by the query
select item-name, color, sum(number)
from sales
group by cube(item-name, color)
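
As a rough illustration of what the cube construct computes, the following Python sketch sums a measure over every subset of the grouping attributes and reports attributes outside a grouping as None, standing in for null. The sample rows and the helper name `cube` are hypothetical and are not part of SQL:1999.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical rows of sales(item_name, color, number).
sales = [
    ("skirt", "dark", 8),
    ("skirt", "pastel", 35),
    ("dress", "dark", 20),
    ("dress", "pastel", 10),
]

def cube(rows, n_dims):
    """Sum the last field over every subset of the first n_dims attributes."""
    result = defaultdict(int)
    for k in range(n_dims + 1):
        for kept in combinations(range(n_dims), k):
            for row in rows:
                # Attributes outside the grouping are reported as None (null).
                key = tuple(row[i] if i in kept else None for i in range(n_dims))
                result[key] += row[-1]
    return dict(result)

totals = cube(sales, 2)
```

With two attributes there are four groupings: (item_name, color), (item_name), (color), and the empty grouping, whose single tuple has None in both positions.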
2
...
The deﬁnitions of the two types differ
slightly; see a statistics textbook for details
...
...
For instance, suppose we have a table itemcategory(item-name, category) giving the category of each item
...
item-name = itemcategory
...

Multiple rollups and cubes can be used in a single group by clause
...

The cross product of the two gives us the six groupings shown
...
2
...
This dual use of null can cause ambiguity if the
attributes used in a rollup or cube clause contain null values
...
Consider the following query:
select item-name, color, size, sum(number),
grouping(item-name) as item-name-flag,
grouping(color) as color-flag,
grouping(size) as size-flag
from sales
group by cube(item-name, color, size)

...
In each tuple, the
value of a ﬂag ﬁeld is 1 if the corresponding ﬁeld is a null representing all
...
This expression can be
used in place of item-name in the select clause to get “all” in the output of the query,
in place of nulls representing all
...
For instance, we cannot use them to specify that we want only
groupings {(color, size), (size, item-name)}
...

22
...
4 Ranking
Finding the position of a value in a larger set is a common operation
...
While such queries can be expressed in SQL-92,
they are difﬁcult to express and inefﬁcient to evaluate
...
A related
type of query is to ﬁnd the percentile in which a value in a (multi)set belongs, for
example, the bottom third, middle third, or top third
...

Ranking is done in conjunction with an order by speciﬁcation
...
The following query gives the rank of each student
...
An extra order by clause is needed to get them in sorted order, as shown
below
...
In our example, this means deciding what to
do if there are two students with the same marks
...
...
For instance, if the highest
mark is shared by two students, both would get rank 1
...
There is also a dense rank
function that does not create gaps in the ordering
...
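
The difference between rank and dense rank can be sketched in Python. The sample marks are hypothetical, and the two helper functions are illustrative only; they mimic the usual semantics for a descending order.

```python
# Hypothetical (student_id, marks) pairs.
marks = [("s1", 90), ("s2", 90), ("s3", 85), ("s4", 70)]

def rank_desc(rows):
    """rank: 1 + number of rows with strictly higher marks (ties leave gaps)."""
    return {sid: 1 + sum(1 for _, other in rows if other > m) for sid, m in rows}

def dense_rank_desc(rows):
    """dense rank: 1 + number of distinct higher marks (no gaps)."""
    distinct = sorted({m for _, m in rows}, reverse=True)
    position = {m: i + 1 for i, m in enumerate(distinct)}
    return {sid: position[m] for sid, m in rows}

ranks = rank_desc(marks)
dense_ranks = dense_rank_desc(marks)
```

Here the two students who share the highest mark both get rank 1 under either function; only the ranks assigned after the tie differ.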

Ranking can be done within partitions of the data
...
The following query then gives the rank of
students within each section
...
student-id = student-section
...

Multiple rank expressions can be used within a single select statement; thus we
can obtain the overall rank and the rank within the section by using two rank expressions in the same select clause
...
In this case, the
group by clause is applied ﬁrst, and partitioning and ranking are done on the results
of the group by
...
For example,
suppose we had marks for each student for each of several subjects
...
We leave details as an exercise for you
...
Note that bottom
n is simply the same as top n with a reverse sorting order
...

SQL:1999 also speciﬁes several other functions that can be used in place of rank
...
If there
are n tuples in the partition [footnote 3] and the rank of the tuple is r, then its percent rank is
defined as (r − 1)/(n − 1) (and as null if there is only one tuple in the partition)
...
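
The percent rank formula can be checked with a small Python sketch. It assumes an ascending order over a single partition, and the function name `percent_rank` is used here only for illustration.

```python
def percent_rank(values):
    """Map each value to (r - 1) / (n - 1), where r is its rank in ascending order."""
    n = len(values)
    if n == 1:
        return {values[0]: None}   # null when the partition has a single tuple
    # r - 1 equals the number of values strictly smaller than v.
    return {v: sum(1 for w in values if w < v) / (n - 1) for v in values}

result = percent_rank([10, 20, 30, 40, 50])
```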

The entire set is treated as a single partition if no explicit partition is used
...
tion
...

Finally, for a given constant n, the ranking function ntile(n) takes the tuples in each
partition in the speciﬁed order, and divides them into n buckets with equal numbers
of tuples
...
This function is particularly useful for
constructing histograms based on percentiles
...
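
A minimal Python sketch of the bucketing done by ntile(n) follows. It assumes the rows are already in the specified order; how leftover rows are distributed when the count is not divisible by n is an assumption made here (earlier buckets get one extra row each).

```python
def ntile(sorted_rows, n):
    """Assign bucket numbers 1..n to rows that are already in the desired order."""
    size, extra = divmod(len(sorted_rows), n)
    buckets = []
    for b in range(1, n + 1):
        # Assumption: the first `extra` buckets receive one additional row.
        buckets.extend([b] * (size + (1 if b <= extra else 0)))
    return list(zip(sorted_rows, buckets))

thirds = ntile([10, 20, 30, 40, 50, 60], 3)
```

Counting the rows per bucket, or reading off the boundary values between buckets, gives the data for a percentile-based histogram.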

The presence of null values can complicate the definition of rank, since it is not
clear whether they should occur first or last in the sort order
...
2
...

Another example of a window query is one that ﬁnds the cumulative balance in an
account, given a relation specifying the deposits and withdrawals on an account
...

SQL:1999 provides a windowing feature to support such queries
...
Suppose we are given a
relation transaction(account-number, date-time, value), where value is positive for a deposit and negative for a withdrawal
...

Consider the query
4
...
Tuples with the same value for the ordering attribute may be assigned to
different buckets, nondeterministically, in order to make the number of tuples in each bucket equal
...
...

The partition by clause partitions tuples by account number, so for each row only
the tuples in its partition are considered
...
The aggregate function sum(value) is applied on all the tuples in
the window
...
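
The effect of such a window query can be imitated in Python. This sketch assumes the window runs from the start of each account's partition up to the current row, ordered by date-time; the sample transactions are hypothetical.

```python
from collections import defaultdict

# Hypothetical rows of transaction(account_number, date_time, value).
transactions = [
    ("A-101", "2001-01-05 10:00", 500),
    ("A-102", "2001-01-06 09:30", 200),
    ("A-101", "2001-01-07 14:15", -120),
    ("A-101", "2001-01-09 11:45", 40),
]

def cumulative_balances(rows):
    """Running sum of value per account, ordered by date_time."""
    running = defaultdict(int)
    out = []
    # Timestamps in this year-month-day format sort correctly as strings.
    for account, when, value in sorted(rows, key=lambda r: (r[0], r[1])):
        running[account] += value      # sum over all rows up to the current one
        out.append((account, when, running[account]))
    return out

balances = cumulative_balances(transactions)
```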

While the query could be written without these extended constructs, it would be
rather difﬁcult to formulate
...

Other types of windows can be speciﬁed
...
To get a window containing the current, previous, and following row, we can use rows between
1 preceding and 1 following
...
Note that if the ordering is on
a nonkey attribute, the result is not deterministic, since the order of tuples is not fully
deﬁned
...
For
instance, suppose the ordering value of a tuple is v; then range between 10 preceding
and current row would give tuples whose ordering value is between v − 10 and v
(both values inclusive)
...

Clearly, the windowing functionality of SQL:1999 is very rich and can be used to
write rather complex queries with a small amount of effort
...
3 Data Mining
The term data mining refers loosely to the process of semiautomatically analyzing
large databases to ﬁnd useful patterns
...
However, data mining differs from machine
learning and statistics in that it deals with large volumes of data, stored primarily on
disk
...
”

...
The following is an example of a rule, stated informally: “Young women
with annual incomes greater than $50,000 are the most likely people to buy small
sports cars
...
Other types of knowledge are represented
by equations relating different variables to each other, or by other mechanisms for
predicting outcomes when the values of some variables are known
...
We shall study a few examples
of patterns and see how they may be automatically derived from a database
...
There may also be more than one type
of pattern that can be discovered from a given database, and manual interaction may
be needed to pick useful types of patterns
...
However, in our description we concentrate on
the automatic aspect of mining
...
3
...
The most widely used applications are those that require some sort of prediction
...
The prediction is to be based on known attributes of the person, such
as age, income, debts, and past debt repayment history
...
Other types of prediction include predicting which customers may switch
over to a competitor (these customers may be offered special discounts to tempt them
not to switch), predicting which people are likely to respond to promotional mail
(“junk mail”), or predicting what types of phone calling card usage are likely to be
fraudulent
...
If a customer buys a book, an online bookstore may suggest
other associated books
...
A good salesperson is aware of such patterns and exploits them to make additional sales
...
Other types of associations may lead to discovery of causation
...
The medicine was then withdrawn from the market
...
Clusters are another example
of such patterns
...
Detection of clusters of disease remains important even
today
...
22
...
3
...
3
...
We outline what classification is, study techniques for building one type of
classifier, called decision tree classifiers, and then study other prediction techniques
...
The class of the new instance is not known, so other attributes of
the instance must be used to predict the class
...
For instance, suppose that a credit-card company wants to decide
whether or not to give a credit card to an applicant
...

Some of this information could be relevant to the credit worthiness of the applicant, whereas some may not be
...
Then, the company
attempts to ﬁnd rules that classify its current customers into excellent, good, average, or bad, on the basis of the information about the person, other than the actual
payment history (which is unavailable for new customers)
...
The rules may be of
the following form:
∀person P, P
...
income > 75, 000
⇒ P
...
degree = bachelors or
(P
...
income ≤ 75, 000) ⇒ P
...

The process of building a classiﬁer starts from a sample of data, called a training
set
...
For instance, the training set for a credit-card application may be the existing
customers, with their credit worthiness determined from their payment history
...
There are several ways of building a classiﬁer, as we shall see
...
3
...
1 Decision Tree Classiﬁers
The decision tree classiﬁer is a widely used technique for classiﬁcation
...
Figure 22
...

To classify a new instance, we start at the root, and traverse the tree to reach a
leaf; at an internal node we evaluate the predicate (or function) on the data instance,

22
...
6 Classification tree (figure residue; surviving leaf labels: good, excellent)
...
The process continues till we reach a leaf node
...
The class at the leaf is “good,” so we predict that
the credit risk of that person is good
...
The most common way of doing so is to use a greedy algorithm, which
works recursively, starting at the root and building the tree downward
...

At each node, if all, or “almost all” training instances associated with the node belong to the same class, then the node becomes a leaf node associated with that class
...
The data associated with each child node is the set of training
instances that satisfy the partitioning condition for that child node
...
The conditions for the four child nodes are degree = none, degree = bachelors,
degree = masters, and degree = doctorate, respectively
...

At the node corresponding to masters, the attribute income is chosen, with the range
of values partitioned into intervals 0 to 25,000, 25,000 to 50,000, 50,000 to 75,000, and
over 75,000
...
...
As an optimization, since the class for the range 25,000 to 50,000 and the
range 50,000 to 75,000 is the same under the node degree = masters, the two ranges
have been merged into a single range 25,000 to 75,000
...
We shall see shortly how to
measure purity quantitatively
...
The attribute and
condition that result in the maximum purity are chosen
...
Suppose there are k classes, and of the instances in S the fraction of instances
in class i is pi
...
Another measure
of purity is the entropy measure, which is deﬁned as
$$\mathrm{Entropy}(S) = -\sum_{i=1}^{k} p_i \log_2 p_i$$

The entropy value is 0 if all instances are in a single class, and reaches its maximum
when each class has the same number of instances
...
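
The entropy measure is easy to compute directly. The following Python sketch takes a list of class labels; the function name and the sample labels are illustrative assumptions.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Entropy(S) = -sum(p_i * log2(p_i)) over the classes present in labels."""
    n = len(labels)
    # p * log2(1 / p) is the same quantity as -p * log2(p).
    return sum((c / n) * log2(n / c) for c in Counter(labels).values())

single_class = entropy(["good", "good", "good", "good"])
two_equal_classes = entropy(["good", "good", "bad", "bad"])
```

Classes with no instances contribute nothing to the sum, so only the classes that actually occur are visited.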

When a set S is split into multiple sets Si , i = 1, 2,
...
, S_r) = $\sum_{i=1}^{r} \frac{|S_i|}{|S|}\,\mathrm{purity}(S_i)$

That is, the purity is the weighted average of the purity of the sets Si
...

The information gain due to a particular split of S into Si , i = 1, 2,
...
, Sr }) = purity(S) − purity(S1 , S2 ,
...
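
A minimal sketch of the information gain computation, assuming entropy is used as the purity measure. The helper names and the sample splits are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Entropy of a list of class labels."""
    n = len(labels)
    return sum((c / n) * log2(n / c) for c in Counter(labels).values())

def split_purity(parts):
    """Weighted average of the purity of the sets S_1, ..., S_r."""
    total = sum(len(p) for p in parts)
    return sum(len(p) / total * entropy(p) for p in parts if p)

def information_gain(labels, parts):
    """purity(S) minus the weighted purity of the split of S."""
    return entropy(labels) - split_purity(parts)

labels = ["good", "good", "bad", "bad"]
clean_split = information_gain(labels, [["good", "good"], ["bad", "bad"]])
useless_split = information_gain(labels, [["good", "bad"], ["good", "bad"]])
```

A split that separates the classes completely recovers all of the entropy of S, while a split that leaves each part as mixed as S itself gains nothing.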
The number of elements in each of the
sets Si may also be taken into account; otherwise, whether a set Si has 0 elements or
1 element would make a big difference in the number of sets, although the split is the
same for almost all the elements
...
deﬁned in terms of entropy as
r

Information-content(S, {S1 , S2 ,
...
, Sr })
Information-content(S, {S1 , S2 ,
...
Attributes can be either continuous valued, that is, the
values can be ordered in a fashion meaningful to classiﬁcation, such as age or income,
or can be categorical, that is, they have no meaningful order, such as department
names or country names
...

Usually attributes that are numbers (integers/reals) are treated as continuous valued while character string attributes are treated as categorical, but this may be controlled by the user of the system
...

We ﬁrst consider how to ﬁnd best splits for continuous-valued attributes
...
The case of multiway splits is more complicated;
see the bibliographical notes for references on the subject
...
We then compute the information gain obtained by splitting at each value
...
The best binary split for the attribute is the split that
gives the maximum information gain
...
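
The search for the best binary split on a continuous-valued attribute can be sketched as follows. The sketch assumes that a split at value v sends instances with attribute value at most v to one child and the rest to the other, and it uses entropy as the purity measure; the sample values and labels are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Entropy of a list of class labels."""
    n = len(labels)
    return sum((c / n) * log2(n / c) for c in Counter(labels).values())

def best_binary_split(values, labels):
    """Return (split_point, gain) with the maximum information gain."""
    n = len(labels)
    best = (None, 0.0)
    # Splitting at the largest value would leave one side empty, so skip it.
    for v in sorted(set(values))[:-1]:
        left = [c for x, c in zip(values, labels) if x <= v]
        right = [c for x, c in zip(values, labels) if x > v]
        weighted = len(left) / n * entropy(left) + len(right) / n * entropy(right)
        gain = entropy(labels) - weighted
        if gain > best[1]:
            best = (v, gain)
    return best

split_point, gain = best_binary_split([1, 10, 15, 25], ["bad", "bad", "good", "good"])
```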
This works ﬁne for categorical attributes with only a few distinct values, such as degree or gender
...
In such cases, we would try to combine multiple values into each
child, to create a smaller number of children
...

Decision-Tree Construction Algorithm
The main idea of decision tree construction is to evaluate different attributes and different partitioning conditions, and pick the attribute and partitioning condition that
results in the maximum information gain ratio
...
...
, Sr ;
for i = 1, 2,
...
7

Recursive construction of a decision tree
...
If the data can be perfectly classiﬁed, the recursion stops when the
purity of a set is 0
...
In this case, the recursion stops
when the purity of a set is “sufﬁciently high,” and the class of resulting leaf is deﬁned
as the class of the majority of the elements of the set
...

Figure 22
...
The recursion stops when the set is
sufﬁciently pure or the set S is too small for further partitioning to be statistically
signiﬁcant
...

There is a wide variety of decision tree construction algorithms, and we outline
the distinguishing features of a few of them
...

With very large data sets, partitioning may be expensive, since it involves repeated
copying
...

Several of the algorithms also prune subtrees of the generated decision tree to
reduce overﬁtting: A subtree is overﬁtted if it has been so highly tuned to the speciﬁcs
of the training data that it makes many classiﬁcation errors on other data
...
There are different pruning heuristics;
one heuristic uses part of the training data to build the tree and another part of the
training data to test it
...

We can generate classiﬁcation rules from a decision tree, if we so desire
...
An example of such a classiﬁcation rule is
degree = masters and income > 75, 000 ⇒ excellent

22
...
3
...
2 Other Types of Classiﬁers
There are several types of classiﬁers other than decision tree classiﬁers
...
Neural net
classiﬁers use the training data to train artiﬁcial neural nets
...

Bayesian classiﬁers ﬁnd the distribution of attribute values for each class in the
training data; when given a new instance d, they use the distribution information to
estimate, for each class cj , the probability that instance d belongs to class cj , denoted
by p(cj |d), in a manner outlined here
...

To ﬁnd the probability p(cj |d) of instance d being in class cj , Bayesian classiﬁers
use Bayes’ theorem, which says
$$p(c_j \mid d) = \frac{p(d \mid c_j)\, p(c_j)}{p(d)}$$

where p(d|cj ) is the probability of generating instance d given class cj , p(cj ) is the
probability of occurrence of class cj , and p(d) is the probability of instance d occurring
...
p(cj ) is simply
the fraction of training instances that belong to class cj
...
To simplify the task, naive Bayesian classiﬁers assume attributes have
independent distributions, and thereby estimate
p(d|cj ) = p(d1 |cj ) ∗ p(d2 |cj ) ∗
...

The probabilities p(di |cj ) derive from the distribution of values for each attribute i,
for each class cj
...
For
instance, we may divide the range of values of attribute i into equal intervals, and
store the fraction of instances of class cj that fall in each interval
...

A signiﬁcant beneﬁt of Bayesian classiﬁers is that they can classify instances with
unknown and null attribute values — unknown or null attributes are just omitted
from the probability computation
...
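
A naive Bayesian classifier over categorical attributes can be sketched in a few lines of Python. This is an illustration under stated assumptions: attribute values are discrete (no interval histograms), no smoothing is applied to zero counts, and the function names and training data are hypothetical. Since p(d) is the same for every class, the sketch compares only p(d|cj) p(cj).

```python
from collections import Counter, defaultdict

def train(instances, labels):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)   # (class, attribute index) -> value counts
    for attrs, c in zip(instances, labels):
        for i, v in enumerate(attrs):
            if v is not None:
                value_counts[(c, i)][v] += 1
    return class_counts, value_counts

def classify(model, attrs):
    """Pick the class maximizing p(c) times the product of p(d_i | c)."""
    class_counts, value_counts = model
    total = sum(class_counts.values())
    best_class, best_score = None, -1.0
    for c, n_c in class_counts.items():
        score = n_c / total                 # p(c): fraction of training instances
        for i, v in enumerate(attrs):
            if v is None:
                continue                    # unknown or null attributes are omitted
            seen = value_counts[(c, i)]
            score *= seen[v] / sum(seen.values()) if seen else 1.0
        if score > best_score:
            best_class, best_score = c, score
    return best_class

model = train(
    [("masters", "high"), ("masters", "high"), ("none", "low"), ("bachelors", "low")],
    ["excellent", "excellent", "bad", "bad"],
)
```

For example, `classify(model, (None, "low"))` ignores the missing degree and decides on income alone.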

22
...
2
...
Given values for
a set of variables, X1 , X2 ,
...
For
instance, we could treat the level of education as a number and income as another
number, and, on the basis of these two variables, we wish to predict the likelihood of
default, which could be a percentage chance of defaulting, or the amount involved in
the default
...
, a_n such that
$$Y = a_0 + a_1 X_1 + a_2 X_2 + \cdots + a_n X_n$$
Finding such a linear polynomial is called linear regression
...

The ﬁt may only be approximate, because of noise in the data or because the relationship is not exactly a polynomial, so regression aims to ﬁnd coefﬁcients that give
the best possible ﬁt
...
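
For a single variable X, one common notion of "best possible fit" is the least-squares fit, which has a closed form. The Python sketch below assumes that criterion (the text does not name one) and uses hypothetical sample points.

```python
def fit_line(xs, ys):
    """Least-squares coefficients (a0, a1) for Y = a0 + a1 * X."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a1 = sxy / sxx                 # slope
    a0 = mean_y - a1 * mean_x      # intercept
    return a0, a1

a0, a1 = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
```

With noisy data the points no longer lie on one line, and the returned coefficients minimize the sum of squared differences between the predicted and the observed Y values.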
We do not discuss these techniques here, but the bibliographical notes
provide references
...
3
...
Examples of such associations are:
• Someone who buys bread is quite likely also to buy milk
• A person who bought the book Database System Concepts is quite likely also to
buy the book Operating System Concepts
...
When a customer buys a particular book, an online shop may suggest associated books
...
Or the shop may place them at opposite ends of a row, and place
other associated items in between to tempt people to buy those items as well, as the
shoppers walk from one end of the row to the other
...

Association Rules
An example of an association rule is
bread ⇒ milk
In the context of grocery-store purchases, the rule says that customers who buy bread
also tend to buy milk with a high probability
...
In the grocery-store
example, the population may consist of all grocery store purchases; each purchase is
an instance
...
Each customer is an instance
...

...
These are
deﬁned in the context of the population:
• Support is a measure of what fraction of the population satisﬁes both the antecedent and the consequent of the rule
...
001 percent of all purchases include milk and
screwdrivers
...
The rule may not even be statistically signiﬁcant—perhaps there was
only a single purchase that included both milk and screwdrivers
...

On the other hand, if 50 percent of all purchases involve milk and bread,
then support for rules involving bread and milk (and no other item) is relatively high, and such rules may be worth attention
...

• Conﬁdence is a measure of how often the consequent is true when the antecedent is true
...
A rule with a low conﬁdence is not meaningful
...

Note that the conﬁdence of bread ⇒ milk may be very different from the
conﬁdence of milk ⇒ bread, although both have the same support
...
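
Support and confidence can be computed directly from a list of purchases. In this Python sketch each purchase is a set of items; the sample purchases and function names are hypothetical.

```python
# Hypothetical population: each purchase is one instance.
purchases = [
    {"bread", "milk"},
    {"bread", "milk", "cereal"},
    {"milk"},
    {"milk", "screwdriver"},
]

def support(itemset, population):
    """Fraction of the population containing every item in itemset."""
    return sum(1 for p in population if itemset <= p) / len(population)

def confidence(antecedent, consequent, population):
    """How often the consequent holds among instances satisfying the antecedent."""
    both = support(antecedent | consequent, population)
    return both / support(antecedent, population)

bread_then_milk = confidence({"bread"}, {"milk"}, purchases)
milk_then_bread = confidence({"milk"}, {"bread"}, purchases)
```

Both rules involve the same itemset {bread, milk} and therefore have the same support, yet their confidences differ because the antecedents occur with different frequencies.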
, in ⇒ i0
we ﬁrst ﬁnd sets of items with sufﬁcient support, called large itemsets
...

We will shortly see how to compute large itemsets
...
For each large itemset S, we output a
rule S − s ⇒ s for every subset s ⊂ S, provided S − s ⇒ s has sufficient confidence;
the confidence of the rule is given by support of S divided by support of S − s
...
If the number of possible sets
of items is small, a single pass over the data sufﬁces to detect the level of support
for all the sets
...
When a
purchase record is fetched, the count is incremented for each set of items such that
all items in the set are contained in the purchase
...
Those sets with a sufﬁciently high count at the end of the pass correspond to items that have a high degree of association
...
Luckily, almost all the sets would normally
have very low support; optimizations have been developed to eliminate most such
sets from consideration
...

In the a priori technique for generating large itemsets, only sets with single items
are considered in the ﬁrst pass
...

At the end of a pass all sets with sufﬁcient support are output as large itemsets
...
Once a set is
eliminated, none of its supersets needs to be considered
...
At the end of some pass i, we would ﬁnd that no set of size i has
sufﬁcient support, so we do not need to consider any set of size i + 1
...
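
The level-wise idea behind the a priori technique can be sketched in Python. This is a simplified illustration, not a tuned implementation: each "pass" here rescans an in-memory list, candidates of size i + 1 are kept only if all their subsets of size i were large, and the sample purchases and the threshold are hypothetical.

```python
from itertools import combinations

def large_itemsets(purchases, min_support):
    """Return {itemset: support} for all itemsets with sufficient support."""
    n = len(purchases)
    items = sorted({item for p in purchases for item in p})
    candidates = [frozenset([item]) for item in items]   # pass 1: single items
    result = {}
    size = 1
    while candidates:
        counts = {c: sum(1 for p in purchases if c <= p) for c in candidates}
        large = {c: k / n for c, k in counts.items() if k / n >= min_support}
        result.update(large)
        size += 1
        # Once a set is eliminated, none of its supersets is considered.
        pool = sorted({item for c in large for item in c})
        candidates = [
            frozenset(combo)
            for combo in combinations(pool, size)
            if all(frozenset(sub) in large for sub in combinations(combo, size - 1))
        ]
    return result

found = large_itemsets(
    [{"bread", "milk"}, {"bread", "milk", "cereal"}, {"milk", "cereal"}, {"bread", "milk"}],
    min_support=0.5,
)
```

The loop ends as soon as a pass yields no candidates, which happens once no set of the previous size had sufficient support.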

22
...
4 Other Types of Associations
Using plain association rules has several shortcomings
...

For instance, if many people buy cereal and many people buy bread, we can predict
that a fairly large number of people would buy both, even if there is no connection between the two purchases
...
In statistical terms, we look for correlations between items;
correlations can be positive, in that the co-occurrence is higher than would have been
expected, or negative, in that the items co-occur less frequently than predicted
...

Another important class of data-mining applications is sequence associations (or
correlations)
...
Stock-market analysts want to ﬁnd associations among
stock-market price sequences
...
” Discovering such association between sequences can help us to make intelligent investment
decisions
...

Deviations from temporal patterns are often interesting
...
If sales of winter clothes go down in summer, it is not surprising, since
we can predict it from past years; a deviation that we could not have predicted from
past experience would be considered interesting
...
See the bibliographical notes for references to research on this topic
...
22
...
5 Clustering
Intuitively, clustering refers to the problem of ﬁnding clusters of points in the given
data
...
One way is to phrase it as the problem of grouping points into k sets (for a
given k) so that the average distance of points from the centroid of their assigned
cluster is minimized
...
There are other deﬁnitions too; see the bibliographical notes for details
...
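
The quantity being minimized in this formulation can be computed with a short Python sketch. The function names and sample points are hypothetical; the sketch only evaluates a given grouping and does not search for the best one.

```python
from math import dist

def centroid(points):
    """Point whose coordinate on each dimension is the average over the set."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))

def average_distance(clusters):
    """Average distance of every point from the centroid of its assigned cluster."""
    total, count = 0.0, 0
    for points in clusters:
        c = centroid(points)
        total += sum(dist(p, c) for p in points)
        count += len(points)
    return total / count

tight = [[(0, 0), (2, 0)], [(10, 10), (10, 12)]]
loose = [[(0, 0), (10, 10)], [(2, 0), (10, 12)]]
```

Comparing `average_distance(tight)` with `average_distance(loose)` shows that grouping nearby points together gives the smaller value, which is what a clustering algorithm for this formulation tries to achieve for a given k.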

Another type of clustering appears in classiﬁcation systems in biology
...
) For instance, leopards and humans are clustered under the class
mammalia, while crocodiles and snakes are clustered under reptilia
...
The clustering of mammalia has
further subclusters, such as carnivora and primates
...
Given characteristics of different species, biologists have created a complex
hierarchical clustering scheme grouping related species together at different levels of
the hierarchy
...
Internet directory systems (such as Yahoo’s) cluster related documents
in a hierarchical fashion (see Section 22
...
5)
...

The statistics community has studied clustering extensively
...
The Birch clustering algorithm is one such algorithm
...
3
...
3), and guided to appropriate leaf nodes based on nearness
to representative points in the internal nodes of the tree
...
Some postprocessing after insertion of all points gives the desired overall
clustering
...

An interesting application of clustering is to predict what new movies (or books,
or music) a person is likely to be interested in, on the basis of:
1
...
Other people with similar past preferences
3
...
The centroid of a set of points is deﬁned as a point whose coordinate on each dimension is the average
of the coordinates of all the points of that set on that dimension
...
, (x_n, y_n)} is given by $\left( \frac{\sum_{i=1}^{n} x_i}{n}, \frac{\sum_{i=1}^{n} y_i}{n} \right)$

One approach to this problem is as follows
...
The accuracy
of clustering can be improved by previously clustering movies by their similarity, so
even if people have not seen the same movies, if they have seen similar movies they
would be clustered together
...
Given a new
user, we ﬁnd a cluster of users most similar to that user, on the basis of the user’s
preferences for movies already seen
...
In fact,
this problem is an instance of collaborative ﬁltering, where users collaborate in the task
of ﬁltering information to ﬁnd information of interest
...
3
...
For instance, there
are tools that form clusters on pages that a user has visited; this helps users when
they browse the history of their browsing to ﬁnd pages they have visited earlier
...
5
...
3)
...
5
...

Data-visualization systems help users to examine large volumes of data, and to
detect patterns visually
...
A single graphical screen can encode as much information as a far larger number of text
screens
...
The user can then quickly discover
locations where problems are occurring
...

As another example, information about values can be encoded as a color, and can
be displayed with as little as one pixel of screen area
...
The percentage of transactions that buy both items can
be encoded by the color intensity of the pixel
...

Data visualization systems do not automatically detect patterns, but provide system support for users to detect patterns
...

22
...
For instance, large retail chains have hundreds or thousands of stores,
whereas insurance companies may have data from thousands of local branches
...
[Figure residue: data-warehouse architecture, with labels data source 1, data source 2, through data source n, data loaders, DBMS, data warehouse, and query and analysis tools]
Figure 22
...

fore different data may be present in different locations, or on different operational
systems, or under different schemas
...
Corporate decision makers require access to information from all such sources
...
Moreover, the sources of
data may store only current data, whereas decision makers may need access to past
data as well; for instance, information about how purchase patterns have changed in
the past year could be of great importance
...

A data warehouse is a repository (or archive) of information gathered from multiple sources, stored under a uniﬁed schema, at a single site
...
Thus, data warehouses
provide the user a single consolidated interface to data, making decision-support
queries easier to write
...

22
...
1 Components of a Data Warehouse
Figure 22
...

Among the issues to be addressed in building a warehouse are the following:
• When and how to gather data
...
In
a destination-driven architecture, the data warehouse periodically sends requests for new data to the sources
...
...
Two-phase commit is usually far too expensive to be an option, so data warehouses
typically have slightly out-of-date data
...

• What schema to use
...
In fact, they may even use different data
models
...
As a result, the
data stored in the warehouse are not just a copy of the data at the sources
...

• Data cleansing
...
Data sources often deliver data with numerous minor inconsistencies that can be corrected
...
These can be corrected to a reasonable extent by consulting a database of street names and zip codes in each city
...
Records for multiple individuals in a house may be grouped
together so only one mailing is sent to each house; this operation is called
householding
...
Updates on relations at the data sources must
be propagated to the data warehouse
...
If they are not, the problem of propagating updates is basically the
view-maintenance problem, which was discussed in Section 14
...

• What data to summarize
...
However, we can answer many queries
by maintaining just summary data obtained by aggregation on a relation,
rather than maintaining the entire relation
...

Suppose that a relation r has been replaced by a summary relation s
...
If the query requires only summary data, it may be possible to transform it into an equivalent one using s instead; see Section 14
...

22
...
2 Warehouse Schemas
Data warehouses typically have schemas that are designed for data analysis, using
tools such as OLAP tools
...
Tables containing multidimensional data
are called fact tables and are usually very large
...
for a retail store, with one tuple for each item that is sold, is a typical example of a fact
table
...

The measure attributes may include the number of items sold and the price of the
items
...
For
instance, a fact table sales would have attributes item-id, store-id, customer-id, and date,
and measure attributes number and price
...
The item-id attribute of the sales table would be a foreign key into a dimension table item-info, which would contain information such as the name of the
item, the category to which the item belongs, and other item details such as color and
size
...
We can also view the date attribute as a foreign key into a date-info table giving the month, quarter, and year of
each date
...
9
...
More complex data warehouse designs may have multiple levels of dimension tables; for instance, the item-info table may have an attribute
manufacturer-id that is a foreign key into another table giving details of the manufacturer
...
Complex data warehouse designs
may also have more than one fact table
...
9 Star schema for a data warehouse (figure residue; surviving tables: store(store-id, city, state, country) and customer(customer-id, name, street, city, state, zipcode, country))
...
22
...
5 Information-Retrieval Systems
The ﬁeld of information retrieval has developed in parallel with the ﬁeld of databases
...

Data contained in documents is unstructured, without any associated schema
...

The Web provides a convenient way to get to, and to interact with, information
sources across the Internet
...
Information retrieval has played a critical role in making the Web a
productive and useful tool, especially for researchers
...
The data in such systems are organized as a collection of documents; a newspaper
article or a catalog entry (in a library catalog) are examples of documents
...

A user of such a system may want to retrieve a particular document or a particular
class of documents
...
Documents have associated with them a set of
keywords, and documents whose keywords contain those supplied by the user are
retrieved
...
For instance, a video movie may
have associated with it keywords such as its title, director, actors, type, and so on
...

• Database systems deal with several operations that are not addressed in information-retrieval systems
...
These matters are viewed as less important in information systems
...

• Information-retrieval systems deal with several issues that have not been addressed adequately in database systems
...

Silberschatz−Korth−Sudarshan:
Database System
Concepts, Fourth Edition

VII
...
Advanced Querying and
Information Retrieval

22
...
5
...
For example, a user could ask
for all documents that contain the keywords “motorcycle and maintenance,” or documents that contain the keywords “computer or microprocessor,” or even documents
that contain the keyword “computer but not database
...

In full text retrieval, all the words in each document are considered to be keywords
...
We shall use the
word term to refer to the words in a document, since all words are keywords
...
More sophisticated systems estimate
relevance of documents to a query so that the documents can be shown in order of
estimated relevance
...
5
...
1 and 22
...
1
...
Section 22
...
1
...
Some systems also attempt to provide a better set of answers by using
the meanings of terms, rather than just the syntactic occurrence of terms, as outlined
in Section 22
...
1
...

22
...
1
...
Full text retrieval makes this problem worse: Each document may contain
many terms, and documents that mention a term only in passing are treated the same as documents in which the term is indeed relevant
...

Information retrieval systems therefore estimate relevance of documents to a query,
and return only highly ranked documents as answers
...

The ﬁrst question to address is, given a particular term t, how relevant is a particular document d to the term
...
Just counting
the number of occurrences of a term is usually not a good indicator: First, the number of occurrences depends on the length of the document, and second, a document
containing 10 occurrences of a term may not be 10 times as relevant as a document
containing one occurrence
...
22
...
Observe that this metric takes
the length of the document into account
...

Many systems reﬁne the above metric by using other information
...
Similarly, if the ﬁrst occurrence of a term is late
in the document, the document may be considered less relevant than if the ﬁrst occurrence is early in the document
...
In the information retrieval community, the
relevance of a document to a term is referred to as term frequency, regardless of the
exact formula used
...
The relevance of a document to a
query with two or more keywords is estimated by combining the relevance measures
of the document to each keyword
...
However, not all terms used as keywords are equal
...
” A document containing “Silberschatz” but not
“web” should be ranked higher than a document containing the term “web” but not
“Silberschatz
...
The relevance of a document d to a set of terms Q is then defined as

$$r(d, Q) = \sum_{t \in Q} \frac{r(d, t)}{n(t)}$$

This measure can be further reﬁned if the user is permitted to specify weights w(t)
for terms in the query, in which case the user-speciﬁed weights are also taken into
account by using w(t)/n(t) in place of 1/n(t)
...
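The combined measure r(d, Q), including the optional user weights w(t), can be sketched as follows. Two things here are assumptions, because this extract does not show them: the per-term formula in `term_relevance` (a log-scaled, length-normalized count, one common choice) and the reading of n(t) as the number of documents that contain t. The function names are illustrative only.

```python
import math


def term_relevance(doc_terms, term):
    """r(d, t). ASSUMPTION: log(1 + occurrences / document length); the exact
    formula is not shown in this extract."""
    if not doc_terms:
        return 0.0
    return math.log(1 + doc_terms.count(term) / len(doc_terms))


def query_relevance(docs, doc_id, query_terms, weights=None):
    """r(d, Q) = sum over t in Q of w(t) * r(d, t) / n(t).

    docs maps a document id to its list of terms.
    ASSUMPTION: n(t) is the number of documents containing t.
    weights (optional) maps a term to a user-specified weight w(t); default 1.
    """
    total = 0.0
    for t in query_terms:
        n_t = sum(1 for terms in docs.values() if t in terms)
        if n_t == 0:
            continue  # the term occurs nowhere, so it contributes nothing
        w = 1.0 if weights is None else weights.get(t, 1.0)
        total += w * term_relevance(docs[doc_id], t) / n_t
    return total
```

With this weighting, a rare term such as "silberschatz" counts for more than a common term such as "web", which matches the ranking argued for in the text.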
Information-retrieval systems deﬁne a
set of words, called stop words, containing 100 or so of the most common words,
and remove this set from the document when indexing; such words are not used as
keywords, and are discarded if present in the keywords supplied by the user
...
If the terms occur close to each other in the
document, the document would be ranked higher than if they occur far apart
...

22
...
Since there may be a very large number
of documents that are relevant, information retrieval systems typically return only
the ﬁrst few documents with the highest degree of estimated relevance, and permit
users to interactively request further documents
...
5
...
2 Relevance Using Hyperlinks
Early Web search engines ranked documents by using only relevance measures similar to those described in Section 22
...
1
...
However, researchers soon realized that Web
documents have information that plain text documents do not have, namely hyperlinks
...

The basic idea of site ranking is to ﬁnd sites that are popular, and to rank pages
from such sites higher than pages from other sites
...
bell-labs
...
belllabs
...
A site usually contains multiple Web pages
...
For instance, the term “google” may occur in vast numbers of pages, but the site google
...
Documents from google
...

This raises the question of how to deﬁne the popularity of a site
...
However, getting such information
is impossible without the cooperation of the site, and is infeasible for a Web search
engine to implement
...

Traditional measures of relevance of the page (which we saw in Section 22
...
1
...
Pages with high overall relevance value are
returned as answers to a query, as before
...
There are at least two
reasons for this
...
Second, there are far
fewer sites than pages, so computing and using popularity of sites is cheaper than
computing and using popularity of pages
...
For instance, a link from
a popular site to another site s may be considered to be a better indication of the
popularity of s than a link to s from a less popular site
...
This is similar in some sense to giving extra weight to endorsements of products by celebrities (such
as ﬁlm stars), so its signiﬁcance is open to question!

is in fact circular, since the popularity of a site is deﬁned by the popularity of other
sites, and there may be cycles of links between sites
...
The linear equations are deﬁned in such a way that
they have a unique and well-deﬁned solution
...
com uses the referring-site popularity idea
in its definition of page rank, which is a measure of popularity of a page
...
com became a widely used search engine, in a rather short period
of time
...
In the social networking
context, the goal was to deﬁne the prestige of people
...
If
someone is known by multiple prestigious people, then she also has high prestige,
even if she is not known by as large a number of people
...

A hub is a page that stores links to many pages; it does not in itself contain actual
information on a topic, but points to pages that contain actual information
...
Each page then gets a prestige
value as a hub (hub-prestige), and another prestige value as an authority (authority-prestige)
...
A page gets higher hub-prestige if it points to many
pages with high authority-prestige, while a page gets higher authority-prestige if it is
pointed to by many pages with high hub-prestige
...
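The mutually recursive definition of hub-prestige and authority-prestige can be approximated by simple iteration. The sketch below is one way to do it, not the exact procedure from the cited work: the fixed iteration count and the Euclidean normalization are assumptions made to keep the values bounded, and the function name is illustrative.

```python
import math


def hub_authority(links, iterations=50):
    """Iteratively estimate hub-prestige and authority-prestige.

    links maps each page to the list of pages it points to.
    Returns two dicts: (hub_prestige, authority_prestige).
    """
    pages = set(links) | {q for targets in links.values() for q in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # A page gains authority-prestige from the hubs that point to it.
        auth = {p: 0.0 for p in pages}
        for p, targets in links.items():
            for q in targets:
                auth[q] += hub[p]
        # A page gains hub-prestige from the authorities it points to.
        hub = {p: sum(auth[q] for q in links.get(p, [])) for p in pages}
        # Normalize both score vectors (assumption: Euclidean norm).
        for scores in (hub, auth):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for p in scores:
                scores[p] /= norm
    return hub, auth
```

In a toy graph where h1 links to a and b and h2 links only to a, page a ends up with the higher authority-prestige and h1 with the higher hub-prestige.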
See the bibliographical notes
for references giving further details
...
5
...
3 Similarity-Based Retrieval
Certain information-retrieval systems permit similarity-based retrieval
...
The similarity of a document to another may be deﬁned, for
example, on the basis of common terms
...
The terms in the query are themselves weighted by r(d, t)
...
The resultant set
of documents is likely to be what the user intended to ﬁnd
...
In such a situation, instead of
adding further keywords to the query, users may be allowed to identify one or a few
of the returned documents as relevant; the system then uses the identiﬁed documents

22
...
The resultant set of documents is likely to be what the user
intended to ﬁnd
...
5
...
4 Synonyms and Homonyms
Consider the problem of locating documents about motorcycle maintenance for the
keywords “motorcycle” and “maintenance
...
The document titled
Motorcycle Repair would not be retrieved, since the word “maintenance” does not occur in its title
...
Each word can have a set
of synonyms deﬁned, and the occurrence of a word can be replaced by the or of all
its synonyms (including the word itself)
...
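Replacing each keyword by the or of its synonyms can be sketched in a few lines. The synonym table and helper names below are hypothetical; a real system would draw synonyms from a curated source.

```python
def expand_query(keywords, synonyms):
    """Turn an 'and' of keywords into an 'and' of 'or'-groups.

    Each group holds the keyword itself plus its synonyms, sorted for
    reproducible output. synonyms maps a word to a list of synonyms.
    """
    return [sorted({w} | set(synonyms.get(w, ()))) for w in keywords]


def matches(title_words, expanded_query):
    """True if every 'or'-group has at least one word in the title."""
    words = set(title_words)
    return all(any(alt in words for alt in group) for group in expanded_query)
```

With `{"maintenance": ["repair"]}` as the synonym table, the query for "motorcycle" and "maintenance" also matches a title containing "motorcycle" and "repair".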
” This query would ﬁnd the
desired document
...
For instance, the word object has different
meanings as a noun and as a verb
...
Some keyword query systems attempt to disambiguate the meaning
of words in documents, and when a user poses a query, they ﬁnd out the intended
meaning by asking the user
...
However, disambiguating meanings of words in
documents is not an easy task, so not many systems implement this idea
...
Documents that use the synonyms with an
alternative intended meaning would be retrieved
...

22
...
2 Indexing of Documents
An effective index structure is important for efﬁcient processing of queries in an
information-retrieval system
...
To support relevance ranking
based on proximity of keywords, such an index may provide not just identiﬁers of
documents, but also a list of locations in the document where the keyword appears
...
Thus, the system may attempt to keep the set of documents for a keyword in consecutive disk pages
...
, Kn
...
, Sn of all documents that contain the respective keywords
...
22
...
The or operation gives the set of all documents that contain
at least one of the keywords K1 , K2 ,
...
We implement the or operation by computing the union, S1 ∪S2 ∪· · ·∪Sn , of the sets
...
Given a set of document identiﬁers S, we can
eliminate documents that contain the speciﬁed keyword Ki by taking the difference
S − Si , where Si is the set of identiﬁers of documents that contain the keyword Ki
...
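The set operations on document identifiers can be sketched with Python sets standing in for the sets S1, ..., Sn. The whitespace tokenizer and the function names are simplifying assumptions; the and operation is shown as set intersection, alongside the union (or) and difference (not) operations.

```python
def build_index(docs):
    """Map each keyword to the set of identifiers of documents containing it."""
    index = {}
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index.setdefault(word, set()).add(doc_id)
    return index


def and_query(index, keywords):
    """Documents containing all keywords: intersection of the sets Si."""
    sets = [index.get(k, set()) for k in keywords]
    return set.intersection(*sets) if sets else set()


def or_query(index, keywords):
    """Documents containing at least one keyword: union of the sets Si."""
    return set().union(*(index.get(k, set()) for k in keywords))


def not_query(candidates, index, keyword):
    """Remove documents containing the keyword: the difference S - Si."""
    return candidates - index.get(keyword, set())
```

A query such as "computer but not database" becomes `not_query(index["computer"], index, "database")`.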
In this case, all documents containing at least one of the words are
retrieved (as in the or operation), but are ranked by their relevance measure
...
To reduce this effort, they
may use a compressed representation with only a few bits, which approximates the
term frequency
...

22
...
3 Measuring Retrieval Effectiveness
Each keyword may be contained in a large number of documents; hence, a compact
representation is critical to keep space usage of the index low
...
So that storage space
is saved, the index is sometimes stored such that the retrieval is approximate; a few
relevant documents may not be retrieved (called a false drop or false negative), or
a few irrelevant documents may be retrieved (called a false positive)
...

In Web indexing, false positives are not desirable either, since the actual document
may not be quickly accessible for ﬁltering
...
The ﬁrst, precision, measures what percentage of the retrieved
documents are actually relevant to the query
...
Ideally both should
be 100 percent
...
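As a small worked sketch, precision and recall can be computed from the set of retrieved documents and the set of documents judged relevant. Recall is taken here in its usual sense, the fraction of relevant documents that were retrieved; the function name and the convention of returning 0.0 for an empty set are assumptions.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) as fractions between 0 and 1."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

For instance, retrieving documents 1 to 4 when documents 2, 4, and 5 are relevant gives a precision of 2/4 and a recall of 2/3.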
Ranking strategies can result in false
negatives and false positives, but in a more subtle sense
...
However, humans would rarely look beyond the ﬁrst few tens of returned documents, and
may thus miss relevant documents because they are not ranked among the
top few
...

Therefore instead of having a single number as the measure of recall, we
can measure the recall as a function of the number of documents fetched
...
• False positives may occur because irrelevant documents get higher rankings
than relevant documents
...
One option is to measure precision as a function of number of documents fetched
...
With this combined measure, both precision and recall can be
computed as a function of number of documents, if required
...
In general,
we can draw a graph relating precision to recall
...
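Measuring precision and recall as functions of the number of documents fetched can be sketched as below. The ranked list and relevance judgments are supplied by the caller; the function name is illustrative, and the resulting triples are the points one would plot on a precision-recall graph.

```python
def precision_recall_at_k(ranked, relevant, ks):
    """For each cutoff k, return (k, precision, recall) over the top k results."""
    relevant = set(relevant)
    points = []
    for k in ks:
        hits = sum(1 for d in ranked[:k] if d in relevant)
        precision = hits / k if k else 0.0
        recall = hits / len(relevant) if relevant else 0.0
        points.append((k, precision, recall))
    return points
```

Recall can only grow as k grows, while precision typically falls once the highly ranked relevant documents have been fetched.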

Yet another problem with measuring precision and recall lies in how to deﬁne
which documents are really relevant and which are not
...
Researchers therefore have created collections of documents and queries, and have manually tagged documents as relevant or irrelevant
to the queries
...

22
...
4 Web Search Engines
Web crawlers are programs that locate and gather information on the Web
...

A crawler retrieves the documents and adds information found in the documents to a
combined index; the document is generally not stored, although some search engines
do cache a copy of the document to give clients faster access to the documents
...
There are usually many processes, running on multiple machines, involved in crawling
...

New links found during a crawl are added to the database, and may be crawled later
if they are not crawled immediately
...
Pages have to
be refetched (that is, links recrawled) periodically to obtain updated information, and
to discard sites that no longer exist, so that the information in the search index is kept
reasonably up to date
...
It is not a good
idea to add pages to the same index that is being used for queries, since doing so
would require concurrency control on the index, and affect query and update performance
...
At periodic intervals the copies switch over,
with the old one being updated while the new copy is being used for queries
...
22
...

22
...
5 Directories
A typical library user may use a catalog to locate a book for which she is looking
...
Libraries organize books in such a way that related books are kept close together
...

To keep related books close together, libraries use a classiﬁcation hierarchy
...
Within this set of books, there is a ﬁner classiﬁcation, with computer-science books organized together, mathematics books organized
together, and so on
...
At yet another
level in the classiﬁcation hierarchy, computer-science books are broken down into
subareas, such as operating systems, languages, and algorithms
...
10 illustrates a classiﬁcation hierarchy that may be used by a library
...

In an information retrieval system, there is no need to store related documents
close together
...
Thus, such a system could use a classiﬁcation hierarchy similar to

[Classification hierarchy figure; node labels: books, science, math, engineering, computer science, algorithms, graph algorithms]
Figure 22
...

22
...

In an information retrieval system, there is no need to keep a document in a single
spot in the hierarchy
...
All that is
stored at each spot is an identiﬁer of the document (that is, a pointer to the document),
and it is easy to fetch the contents of the document by using the identiﬁer
...
The class of “graph algorithm” documents can appear both under mathematics and under computer science
...
11
...

A directory is simply a classiﬁcation DAG structure
...
Internal nodes may
also contain links, for example to documents that cannot be classiﬁed under any of
the child nodes
...

While browsing down the directory, the user can ﬁnd not only documents on the
topic he is interested in, but also ﬁnd related documents and related classes in the
classiﬁcation hierarchy
...

Organizing the enormous amount of information available on the Web into a directory structure is a daunting task
...
11

A classification DAG for a library information retrieval system (node label visible in this extract: fiction)
...
22
...

• The second problem is, given a document, deciding which nodes of the directory are categories relevant to the document
...
The
Open Directory Project is a large collaborative effort, with different volunteers being
responsible for organizing different branches of the directory
...

There are also techniques for automatically deciding the location of documents based
on computing their similarity to documents that have already been classiﬁed
...
6 Summary
• Decision-support systems analyze online data collected by transaction-processing systems, to help people make business decisions
...
Decision-support systems come in
various forms, including OLAP systems and data mining systems
...

OLAP tools work on multidimensional data, characterized by dimension
attributes and measure attributes
...
Precomputing the data cube helps speed up queries on summaries
of data
...

Drill down, rollup, slicing, and dicing are among the operations that users
perform with OLAP tools
...

• Data mining is the process of semiautomatically analyzing large databases
to ﬁnd useful patterns
...

22
...
Classiﬁcation can be used, for instance, to
predict credit-worthiness levels of new applicants, or to predict the performance of applicants to a university
...
These perform classiﬁcation by constructing a
tree based on training instances with leaves having class labels
...

Several techniques are available to construct decision trees, most of
them based on greedy heuristics
...

• Association rules identify items that co-occur frequently, for instance, items
that tend to be bought by the same customer
...

• Other types of data mining include clustering, text mining, and data visualization
...
Warehouses are used for decision support and analysis on historical data, for instance to predict trends
...
Warehouse schemas tend to be multidimensional, involving one or a few very large fact tables and several much smaller
dimension tables
...
They use a simpler data model than do database systems, but
provide more powerful querying capabilities within the restricted model
...
The query that a user has in mind usually cannot
be stated precisely; hence, information-retrieval systems order answers on the
basis of potential relevance
...

Inverse document frequency
...
Page rank and hub/authority rank are two ways to assign
importance to sites on the basis of links to the site
...
Synonyms and homonyms complicate the task of information retrieval
...
22
...

• Directory structures are used to classify documents with other similar documents
...
22
...
1 For each of the SQL aggregate functions sum, count, min and max, show how
to compute the aggregate value on a multiset S1 ∪ S2 , given the aggregate
values on multisets S1 and S2
...
sum, count, min and max
b
...
standard deviation
22
...

22
...

22
...

22
...

22
...
2
...

22
...
Write an SQL query to
compute a histogram of balance values, dividing the range 0 to the maximum
account balance present, into three equal ranges
...
8 Consider the sales relation from Section 22
...
Write an SQL query to compute
the cube operation on the relation, giving the relation in Figure 22
...
Do not
use the with cube construct
...
9 Construct a decision tree classiﬁer with binary splits at each node, using tuples in relation r(A, B, C) shown below as training data; attribute C denotes
the class
...

(1, 2, a), (2, 1, a), (2, 5, b), (3, 3, b), (3, 6, b), (4, 5, b), (5, 5, c), (6, 3, b), (6, 7, c)
22
...
Under what conditions can the rules be replaced, without any loss of information, by a single rule that says people with salaries between $10,000 and
$30,000 have a credit rating of good
...
11 Suppose half of all the transactions in a clothes shop purchase jeans, and one
third of all transactions in the shop purchase T-shirts
...
Write down all
the (nontrivial) association rules you can deduce from the above information,
giving support and conﬁdence of each rule
...
12 Consider the problem of ﬁnding large itemsets
...
Describe how to ﬁnd the support for a given collection of itemsets by using
a single scan of the data
...

b
...
Show that no superset of this
itemset can have support greater than or equal to j
...
13 Describe beneﬁts and drawbacks of a source-driven architecture for gathering
of data at a data-warehouse, as compared to a destination-driven architecture
...
14 Consider the schema depicted in Figure 22
...
Give an SQL:1999 query to summarize sales numbers and price by store and date, along with the hierarchies
on store and date
...
15 Compute the relevance (using appropriate deﬁnitions of term frequency and
inverse document frequency) of each of the questions in this chapter to the
query “SQL relation
...
22
...
16 What is the difference between a false positive and a false drop? If it is essential
that no relevant information be missed by an information retrieval query, is it
acceptable to have either false positives or false drops? Why?
22
...
Suppose also you have a keyword index that gives you a (sorted) list
of identiﬁers of documents that contain a speciﬁed keyword
...

Bibliographical Notes
Gray et al
...
[1997] describe the data-cube operator
...
[1996], Harinarayan
et al
...
Descriptions of extended aggregation
support in SQL:1999 can be found in the product manuals of database systems such
as Oracle and IBM DB2
...

Witten and Frank [1999] and Han and Kamber [2000] provide textbook coverage
of data mining
...
Fayyad et al
...
Kohavi and Provost [2001]
presents a collection of articles on applications of data mining to electronic commerce
...
[1993] provides an early overview of data mining in databases
...
[1992] and Shafer et al
...
[1996]
...
Algorithms for mining
of different forms of association rules are described by Srikant and Agrawal [1996a]
and Srikant and Agrawal [1996b]
...
[1998] describes techniques for
mining surprising temporal patterns
...
Ng and Han [1994] describes spatial clustering techniques
...
[1996]
...
[1998] provides an empirical analysis of different algorithms
for collaborative ﬁltering
...
[1997]
...
Chakrabarti [1999] provides a survey of Web
resource discovery
...

Poe [1995] and Mattison [1996] provide textbook coverage of data warehousing
...
[1995] describes view maintenance in a data-warehousing environment
...
[1999], Grossman and Frieder [1998], and Baeza-Yates and Ribeiro-Neto [1999] provide textbook descriptions of information retrieval
...
[1999]
...
Salton [1989] is an early textbook on information-retrieval systems
...
nist
...

Brin and Page [1998] describes the anatomy of the Google search engine, including the PageRank technique, while a hubs and authorities based ranking technique
called HITS is described by Kleinberg [1999]
...
A point worth noting is that the PageRank
of a page is computed independent of any query, and as a result a highly ranked page
which just happens to contain some irrelevant keywords would ﬁgure among the top
answers for a query on the irrelevant keywords
...

Tools
A variety of tools are available for each of the applications we have studied in this
chapter
...
These include OLAP tools from Microsoft Corp
...
The Arbor Essbase OLAP tool is from an independent software vendor
...
databeacon
...
Many companies
also provide analysis tools specialized for speciﬁc applications, such as customer relationship management
...
A good
deal of expertise is required to apply general purpose mining tools for speciﬁc applications
...
The Web site www
...
com provides an extensive directory of mining software, solutions, publications, and so on
...
These provide support functionality for data modeling, cleansing, loading, and querying
...
dwinfocenter
...

Google (www
...
com) is a popular search engine
...
yahoo
...
org) provide classiﬁcation hierarchies for Web
sites
...
Other Topics

23
...
In the past few years, however, there has been increasing
need for handling new data types in databases, such as temporal data, spatial data
...

Another major trend in the last decade has created its own issues: the growth
of mobile computers, starting with laptop computers and pocket organizers, and in
more recent years growing to include mobile phones with built-in computers, and a
variety of wearable computers that are increasingly used in commercial applications
...

23
...

• Temporal data
...
In many applications, it is very important to store and retrieve information about past states
...
However, the task is greatly simpliﬁed by database
support for temporal data, which we study in Section 23
...

• Spatial data
...
Applications of spatial data initially stored data
as ﬁles in a ﬁle system, as did early-generation business applications
...
23
...

Spatial-data applications require facilities offered by a database system —
in particular, the ability to store and query large amounts of data efﬁciently
...
In
Section 23
...

• Multimedia data
...
4, we study the features required in database
systems that store multimedia data such as image, video, and audio data
...

• Mobile databases
...
5, we study the database requirements of the
new generation of mobile computing systems, such as notebook computers
and palmtop computing devices, which are connected to base stations via
wireless digital communication networks
...
They also have limited storage capacity, and
thus require special techniques for memory management
...
2 Time in Databases
A database models the state of some aspect of the real world outside itself
...
When the state of
the real world changes, the database gets updated, and information about the old
state gets lost
...
For example, a patient database must store information about the medical history of a patient
...

Databases that store information about states of the real world across time are called
temporal databases
...
The
valid time for a fact is the set of time intervals during which the fact is true in the
real world
...
This latter time is based on the transaction serialization order and is generated automatically by the system
...

A temporal relation is one where each tuple has an associated time when it is
true; the time may be either valid time or transaction time
...
23
...
2

account-number   branch-name   balance
A-101            Downtown      500
A-101            Downtown      100
A-215            Mianus        700
A-215            Mianus        900
A-215            Mianus        700
A-217            Brighton      750

Figure 23
...

bitemporal relation
...
1 shows an example of a temporal relation
...
Intervals
are shown here as a pair of attributes from and to; an actual implementation would
have a structured type, perhaps called Interval, that contains both ﬁelds
...
Although times are shown in textual form, they are stored
internally in a more compact form, such as the number of seconds since some ﬁxed
time on a ﬁxed date (such as 12:00 AM, January 1, 1900) that can be translated back
to the normal textual form
...
2
...
The type date contains four digits for the year (1 – 9999), two digits for the month (1 – 12), and two digits
for the date (1 – 31)
...
The seconds
ﬁeld can go beyond 60, to allow for leap seconds that are added during some years
to correct for small variations in the speed of rotation of Earth
...

Since different places in the world have different local times, there is often a need
for specifying the time zone along with the time
...
(The standard abbreviation is UTC, rather than UCT, since it is an
abbreviation of “Universal Coordinated Time” written in French as universel temps
coordonné
...
For instance, the time could be expressed in terms of U
...
Eastern Standard
Time, with an offset of −5:00, since U
...
Eastern Standard Time is 5 hours behind
UTC
...
23
...
This notion differs from the notion of interval we used previously,
which refers to an interval of time with speciﬁc starting and ending times
...
2
...
Thus, a snapshot of
a temporal relation at a point in time t is the set of tuples in the relation that are true
at time t, with the time-interval attributes projected out
...
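The snapshot operation can be sketched directly from its definition. In this sketch a temporal relation is a list of (values, from, to) triples; the half-open interval convention (from <= t < to) and every date below are assumptions made for illustration, since the from and to columns of the example relation are not shown in this extract.

```python
from datetime import date


def snapshot(temporal_relation, t):
    """Tuples that are true at time t, with the time interval projected out."""
    return {values for values, start, end in temporal_relation if start <= t < end}


# Hypothetical validity intervals attached to a few account tuples.
accounts = [
    (("A-101", "Downtown", 500), date(1999, 1, 1), date(1999, 1, 24)),
    (("A-101", "Downtown", 100), date(1999, 1, 24), date(9999, 12, 31)),
    (("A-215", "Mianus", 700), date(2000, 6, 2), date(2000, 8, 8)),
]
```

The result of `snapshot(accounts, some_date)` is an ordinary (non-temporal) set of tuples, which is why constraints such as functional dependencies can be checked snapshot by snapshot.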

A temporal selection is a selection that involves the time attributes; a temporal
projection is a projection where the tuples in the projection inherit their times from
the tuples in the original relation
...

If the times do not intersect, the tuple is removed from the result
...
The intersect operation can be applied on two intervals, to
give a single (possibly empty) interval
...
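A sketch of interval intersection, and of a join that keeps only tuple pairs whose times intersect, is given below. Intervals are (start, end) pairs treated as half-open, and an empty intersection is represented by None; both conventions, and the function names, are assumptions of this sketch.

```python
def intersect(a, b):
    """Intersection of two (start, end) intervals, or None if it is empty."""
    start, end = max(a[0], b[0]), min(a[1], b[1])
    return (start, end) if start < end else None


def temporal_join(r, s, match):
    """Join two temporal relations given as lists of (values, interval) pairs.

    A result tuple carries the intersection of the two input intervals;
    pairs whose intervals do not intersect are dropped.
    """
    result = []
    for r_vals, r_time in r:
        for s_vals, s_time in s:
            common = intersect(r_time, s_time)
            if common is not None and match(r_vals, s_vals):
                result.append((r_vals + s_vals, common))
    return result
```

Here `match` is any caller-supplied join predicate on the non-temporal attribute values.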

Functional dependencies must be used with care in a temporal relation
...
A temporal functional dependency $X \stackrel{\tau}{\rightarrow} Y$ holds on a relation schema R if, for all legal instances r of R, all
snapshots of r satisfy the functional dependency X → Y
...
SQL:1999 Part 7 (SQL/Temporal), which is currently under development, is the proposed standard for temporal extensions to SQL
...
3 Spatial and Geographic Data
Spatial data support in databases is important for efﬁciently storing, indexing, and
querying of data based on spatial locations
...
We cannot use standard index structures, such as B-trees or hash indices, to answer such a query efficiently
...

Two types of spatial data are particularly important:
• Computer-aided-design (CAD) data, which includes spatial information
about how objects, such as buildings, cars, or aircraft, are constructed
...

1
...

VII
...
Advanced Data Types
and New Applications

23
...

Geographic information systems are special-purpose databases tailored for
storing geographic data
...

23
...
1 Representation of Geometric Information
Figure 23
...
We stress here that geometric information can be represented in several different ways, only some of which we describe
...
object                    representation
line segment              {(x1,y1), (x2,y2)}
triangle                  {(x1,y1), (x2,y2), (x3,y3)}
polygon                   {(x1,y1), (x2,y2), (x3,y3), (x4,y4), (x5,y5)}
polygon (as triangles)    {(x1,y1), (x2,y2), (x3,y3), ID1}
                          {(x1,y1), (x3,y3), (x4,y4), ID1}
                          {(x1,y1), (x4,y4), (x5,y5), ID1}
Figure 23
...

For example,
in a map database, the two coordinates of a point would be its latitude and longitude
...
We can approximately represent an arbitrary curve by
polylines, by partitioning the curve into a sequence of segments
...
Some systems also support circular arcs as primitives, allowing curves to be
represented as sequences of arcs
...
2
...
In an alternative representation, a polygon can be divided into a set of triangles, as shown in Figure 23
...

This process is called triangulation, and any polygon can be triangulated
...
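One simple way to produce a triangulated, first-normal-form representation is fan triangulation, sketched below. It is shown only as an illustration and is valid for convex polygons; general polygons need a more careful algorithm. Each output tuple holds three vertices plus the polygon identifier, and the function name is an assumption.

```python
def fan_triangulate(polygon_id, vertices):
    """Split a convex polygon into triangles that all share vertices[0].

    vertices is a list of (x, y) points in boundary order.
    Returns a list of (p1, p2, p3, polygon_id) tuples.
    ASSUMPTION: the polygon is convex; fan triangulation can produce
    overlapping triangles for a non-convex polygon.
    """
    if len(vertices) < 3:
        raise ValueError("a polygon needs at least three vertices")
    first = vertices[0]
    return [
        (first, vertices[i], vertices[i + 1], polygon_id)
        for i in range(1, len(vertices) - 1)
    ]
```

A five-vertex polygon yields three triangles: (v1, v2, v3), (v1, v3, v4), and (v1, v4, v5), each tagged with the same polygon identifier so the polygon can be reassembled from fixed-size tuples.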
Circles and ellipses can be represented
by corresponding types, or can be approximated by polygons
...
Such non-ﬁrst-normal-form representations are used when supported by
the underlying database
...
Similarly, the triangulated representation of polygons allows a
ﬁrst-normal-form relational representation of polygons
...
Similarly, the representation of planar ﬁgures —
such as triangles, rectangles, and other polygons — does not change much when we
move to three dimensions
...
We can represent arbitrary polyhedra by dividing
them into tetrahedrons, just as we triangulate polygons
...

23
...
2 Design Databases
Computer-aided-design (CAD) systems traditionally stored data in memory during
editing or other processing, and wrote the data back to a ﬁle at the end of a session of
editing
...
For large designs, such
as the design of a large-scale integrated circuit, or the design of an entire airplane,
it may be impossible to hold the complete design in memory
...
Some references use the term closed polygon to refer to what we call polygons, and refer to polylines as
open polygons
...
systems
...

The objects stored in a design database are generally geometric objects
...
Complex two-dimensional objects can be formed from simple
objects by means of union, intersection, and difference operations
...
3
...

Design databases also store nonspatial information about objects, such as the material from which the objects are constructed
...
We concern ourselves here with only the spatial aspects
...
For instance, the designer may want to retrieve that part of the design that corresponds to a particular region of interest
...
3
...
Spatial-index structures are multidimensional, dealing with two- and
three-dimensional data, rather than dealing with just the simple one-dimensional ordering provided by the B+-trees
...
Such errors
often occur if the design is performed manually, and are detected only when a prototype is being constructed
...
Database
support for spatial-integrity constraints helps people to avoid design errors, thereby
keeping the design consistent
...

(a) Difference of cylinders
Figure 23
...

23
...
3 Geographic Data
Geographic data are spatial in nature, but differ from design data in certain ways
...
Maps may
provide not only location information — about boundaries, rivers, and roads, for
example — but also much more detailed information associated with locations, such
as elevation, soil type, land usage, and annual rainfall
...
Such data consist of bit maps or pixel maps, in two or more dimensions
...
Such data can be three-dimensional — for example, the temperature
at different altitudes at different regions, again measured with the help of a
satellite
...
Design databases generally
do not store raster data
...
Vector data are constructed from basic geometric objects, such as
points, line segments, triangles, and other polygons in two dimensions, and
cylinders, spheres, cuboids, and other polyhedrons in three dimensions
...
Rivers and roads may be
represented as unions of multiple line segments
...
Topographical information, such as height, may be represented by a surface divided into polygons covering regions of equal height,
with a height value associated with each polygon
...
3
...
1 Representation of Geographic Data
Geographical features, such as states and large lakes, are represented as complex
polygons
...

Geographic information related to regions, such as annual rainfall, can be represented as an array — that is, in raster form
...
In Section 23
...
5, we study an alternative representation of such arrays by a data structure called a quadtree
...
3
...
The vector representation is more compact than the raster representation in
some applications
...
However, the vector representation is unsuitable
for applications where the data are intrinsically raster based, such as satellite images
...
3
...
2 Applications of Geographic Data
Geographic databases have a variety of uses, including online map services; vehicle-navigation systems; distribution-network information for public-service utilities such

...

Web-based road map services form a very widely used application of map data
...
An important beneﬁt of online maps is that it is easy to scale the
maps to the desired size — that is, to zoom in and out to locate relevant features
...
With this additional information about roads, the maps can be used
for getting directions to go from one place to another and for automatic trip planning
...

Vehicle-navigation systems are systems mounted in automobiles, which provide
road maps and trip planning services
...
With such a system, a driver can never (footnote 3)
get lost — the GPS unit ﬁnds the location in terms of latitude, longitude, and elevation
and the navigation system can query the geographic database to ﬁnd where and on
which road the vehicle is currently located
...
Without detailed maps,
work carried out by one utility may damage the cables of another utility, resulting in large-scale disruption of service
...

So far, we have explained why spatial databases are useful
...

23
...
4 Spatial Queries
There are a number of types of queries that involve spatial locations
...
A query
to ﬁnd all restaurants that lie within a given distance of a given point is an
example of a nearness query
...
For example, we may want to ﬁnd the
nearest gasoline station
...

• Region queries deal with spatial regions
...
A query to ﬁnd all retail shops
within the geographic boundaries of a given town is an example
...
Footnote 3: Well, hardly ever!

• Queries may also request intersections and unions of regions
...
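The nearness and nearest-neighbor queries described in the list above can be sketched as follows. This is a minimal illustration that scans every point; a real system would use a spatial index to avoid the full scan. The relation contents and the function names within_distance and nearest are hypothetical, and distances are plain Euclidean distances in the plane.

```python
import math

# Hypothetical relation: name -> (x, y) location.
restaurants = {"A": (1, 1), "B": (4, 5), "C": (0, 2)}


def within_distance(points, center, radius):
    """Nearness query: names of all points within `radius` of `center`."""
    return [name for name, loc in points.items()
            if math.dist(loc, center) <= radius]


def nearest(points, center):
    """Nearest-neighbor query: name of the point closest to `center`."""
    return min(points, key=lambda name: math.dist(points[name], center))


close_by = within_distance(restaurants, (0, 0), 2.5)
closest = nearest(restaurants, (0, 0))
print(close_by, closest)
```

Both functions take time linear in the number of points, which is the cost that the index structures of the following section are designed to reduce.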

Queries that compute intersections of regions can be thought of as computing the
spatial join of two spatial relations— for example, one representing rainfall and the
other representing population density — with the location playing the role of join attribute
...

Several join algorithms efﬁciently compute spatial joins on vector data
...
Researchers have proposed join techniques based on coordinated traversal of spatial index structures on
the two relations
...

In general, queries on spatial data may have a combination of spatial and nonspatial requirements
...

Since spatial data are inherently graphical, we usually query them by using a
graphical query language
...
The user can invoke various operations on the interface, such as
choosing an area to be viewed (for example, by pointing and clicking on suburbs west
of Manhattan), zooming in and out, choosing what to display on the basis of selection
conditions (for example, houses with more than three bedrooms), overlay of multiple maps (for example, houses with more than three bedrooms overlaid on a map
showing areas with low crime rates), and so on
...
Extensions of SQL have been proposed to permit relational databases
to store and retrieve spatial information efficiently, and to allow queries to mix
spatial and nonspatial conditions
...

23
...
5 Indexing of Spatial Data
Indices are required for efﬁcient access to spatial data
...

23
...
5
...
Tree structures, such
as binary trees and B-trees, operate by successively dividing space into smaller parts
...
in two
...
In a balanced binary tree, the partition
is chosen so that approximately one-half of the points stored in the subtree fall in
each partition
...

We can use that intuition to create tree structures for two-dimensional space, as
well as in higher-dimensional spaces
...
Each level of a k-d
tree partitions the space into two
...
The partitioning proceeds in such
a way that, at each node, approximately one-half of the points stored in the subtree
fall on one side, and one-half fall on the other
...
Figure 23
...
Each line
corresponds to a node in the tree, and the maximum number of points in a leaf node
has been set at 1
...
The numbering of the lines in the ﬁgure indicates the level of
the tree at which the corresponding node appears
...
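A minimal sketch of such a k-d tree for two-dimensional points follows. It assumes that the partitioning dimension alternates between x and y from one level to the next and that each split is made at the median, so that about half the points fall on each side. The dictionary layout and the names build_kd and range_search are hypothetical.

```python
def build_kd(points, depth=0, leaf_size=1):
    """Build a k-d tree; a leaf holds at most `leaf_size` points."""
    if len(points) <= leaf_size:
        return {"leaf": True, "points": list(points)}
    axis = depth % 2                        # 0 splits on x, 1 splits on y
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2                     # about half the points go each way
    return {"leaf": False, "axis": axis, "split": pts[mid][axis],
            "left": build_kd(pts[:mid], depth + 1, leaf_size),
            "right": build_kd(pts[mid:], depth + 1, leaf_size)}


def range_search(node, lo, hi):
    """Return all points p with lo[a] <= p[a] <= hi[a] on both axes."""
    if node["leaf"]:
        return [p for p in node["points"]
                if all(lo[a] <= p[a] <= hi[a] for a in range(2))]
    axis, split = node["axis"], node["split"]
    found = []
    if lo[axis] <= split:                   # left side holds coordinates <= split
        found += range_search(node["left"], lo, hi)
    if hi[axis] >= split:                   # right side holds coordinates >= split
        found += range_search(node["right"], lo, hi)
    return found


points = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = build_kd(points)
hits = sorted(range_search(tree, (3, 1), (8, 5)))
print(hits)
```

With leaf_size set to 1, as in the figure, every leaf holds a single point; a larger leaf_size gives a shallower tree.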
k-d-B trees
are better suited for secondary storage than k-d trees
...
[Figure: Division of space by a k-d tree]
...
23
...
3
...
2 Quadtrees
An alternative representation for two-dimensional data is a quadtree
...
5
...
4
...
The top node is associated with the entire target space
...
Leaf nodes have between zero and some ﬁxed maximum number of points
...
In the example in Figure 23
...

This type of quadtree is called a PR quadtree, to indicate that it stores points and that
the division of space is based on regions, rather than on the actual set of
points stored
...
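A PR quadtree of this kind can be sketched as follows. The sketch assumes that an overfull leaf is split into four equal quadrants at the midpoint of its region, which is what makes the division depend on regions rather than on the stored points. The class name PRQuadtree, its methods, and the sample points are hypothetical; points are tuples.

```python
class PRQuadtree:
    """Point-region quadtree over the rectangle (x0, y0)-(x1, y1)."""

    def __init__(self, x0, y0, x1, y1, capacity=1):
        self.box = (x0, y0, x1, y1)
        self.capacity = capacity      # maximum number of points in a leaf
        self.points = []              # used only while this node is a leaf
        self.children = None          # four sub-quadrants once split

    def _child_for(self, p):
        x0, y0, x1, y1 = self.box
        mx, my = (x0 + x1) / 2, (y0 + y1) / 2
        return self.children[(1 if p[0] >= mx else 0) + (2 if p[1] >= my else 0)]

    def insert(self, p):
        if self.children is not None:
            self._child_for(p).insert(p)
            return
        self.points.append(p)
        # Split only when over capacity; identical points can never be
        # separated, so a leaf holding only duplicates is left alone.
        if len(self.points) > self.capacity and len(set(self.points)) > 1:
            self._split()

    def _split(self):
        x0, y0, x1, y1 = self.box
        mx, my = (x0 + x1) / 2, (y0 + y1) / 2
        c = self.capacity
        self.children = [PRQuadtree(x0, y0, mx, my, c), PRQuadtree(mx, y0, x1, my, c),
                         PRQuadtree(x0, my, mx, y1, c), PRQuadtree(mx, my, x1, y1, c)]
        pending, self.points = self.points, []
        for q in pending:
            self._child_for(q).insert(q)

    def contains(self, p):
        if self.children is not None:
            return self._child_for(p).contains(p)
        return p in self.points


tree = PRQuadtree(0, 0, 8, 8, capacity=1)
for p in [(1, 1), (6, 6), (7, 7)]:
    tree.insert(p)
print(tree.contains((6, 6)), tree.contains((2, 2)))
```

Because the split position ignores the points themselves, two nearby points may force several levels of subdivision before they land in different quadrants, as (6, 6) and (7, 7) do here.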
A
node in a region quadtree is a leaf node if all the array values in the region that it
covers are the same
...
Each node in the region quadtree corresponds
to a subarray of values
...
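The region quadtree can be sketched in the same spirit. The sketch assumes a square array whose side is a power of 2, and it represents a leaf as ("leaf", value) and an internal node as ("node", [four children]). The function name build_region_quadtree and the sample array are hypothetical.

```python
def build_region_quadtree(grid, row=0, col=0, size=None):
    """Build a region quadtree for the size-by-size subarray at (row, col)."""
    if size is None:
        size = len(grid)
    first = grid[row][col]
    uniform = all(grid[r][c] == first
                  for r in range(row, row + size)
                  for c in range(col, col + size))
    if uniform:                      # all values equal: this node is a leaf
        return ("leaf", first)
    half = size // 2                 # otherwise divide into four subarrays
    return ("node", [
        build_region_quadtree(grid, row, col, half),
        build_region_quadtree(grid, row, col + half, half),
        build_region_quadtree(grid, row + half, col, half),
        build_region_quadtree(grid, row + half, col + half, half),
    ])


raster = [[1, 1, 0, 0],
          [1, 1, 0, 0],
          [0, 0, 0, 1],
          [0, 0, 1, 1]]
tree = build_region_quadtree(raster)
print(tree)
```

Large uniform areas collapse into single leaves, which is why this structure can be more compact than the raw array for such data.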

Indexing of line segments and polygons presents new problems
...
However, a line segment or polygon
may cross a partitioning line
...
Multiple occurrences of a line segment or
polygon can result in inefﬁciencies in storage, as well as inefﬁciencies in querying
...
[Figure: Division of space by a quadtree]
...
23
...
5
...
An R-tree is a balanced tree structure with the indexed polygons stored in
leaf nodes, much like a B+-tree
...
The bounding box of a leaf node is
the smallest rectangle parallel to the axes that contains all objects stored in the leaf
node
...
The bounding box
of a polygon is deﬁned, similarly, as the smallest rectangle parallel to the axes that
contains the polygon
...
Each leaf node stores the indexed polygons, and may
optionally store the bounding boxes of the polygons; the bounding boxes help speed
up checks for overlaps of the rectangle with the indexed polygons — if a query rectangle does not overlap with the bounding box of a polygon, it cannot overlap with
the polygon either
...
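The bounding-box test can be sketched as follows. The sketch assumes that a polygon is given as a list of vertices and that a box is a tuple (xmin, ymin, xmax, ymax); the names bounding_box and overlaps are hypothetical. An overlap of boxes only makes a polygon a candidate, so an exact geometric test would still be needed afterward.

```python
def bounding_box(vertices):
    """Smallest axis-parallel rectangle containing all vertices."""
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    return (min(xs), min(ys), max(xs), max(ys))


def overlaps(a, b):
    """True if rectangles a and b share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


triangle = [(1, 1), (4, 2), (2, 5)]
box = bounding_box(triangle)

far_query = (5, 0, 7, 3)      # disjoint from the box, so disjoint from the triangle
near_query = (3, 4, 6, 6)     # overlaps the box; the triangle itself must be checked
print(box, overlaps(box, far_query), overlaps(box, near_query))
```

The first query is rejected with four comparisons and no polygon geometry, which is the saving that stored bounding boxes provide.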
)
Figure 23
...
Note that the bounding boxes are shown with extra space inside them, to
make them stand out pictorially
...

The R-tree itself is at the right side of Figure 23
...
The figure refers to the coordinates of bounding box i as BBi
...
[Figure: An R-tree]
...

• Search: As the figure shows, the bounding boxes associated with sibling nodes
may overlap; in B+-trees, k-d trees, and quadtrees, in contrast, the ranges do
not overlap
...
Similarly, a query to ﬁnd all
polygons that intersect a given polygon has to go down every node where the
associated rectangle intersects the polygon
...
Ideally we should pick a leaf node that has space to hold a new
entry, and whose bounding box contains the bounding box of the polygon
...
At each internal node we may ﬁnd multiple children whose
bounding boxes contain the bounding box of the polygon, and each of these
children needs to be explored
...
If
none of the children satisfy this condition, the algorithm chooses a child node
whose bounding box has the maximum overlap with the bounding box of the
polygon for continuing the traversal
...
Just as with B+-tree insertion, the
R-tree insertion algorithm ensures that the tree remains balanced
...

The insertion procedure differs from B+-tree insertion
mainly in how a node is split
...
This property does not generalize beyond one dimension; that is,
for more than one dimension, it is not always possible to split the entries into
two sets so that their bounding boxes do not overlap
...
The two nodes resulting from the split would
contain the entries in S1 and S2 respectively
...
(The heuristic gets its name from the
fact that it takes time quadratic in the number of entries
...
The quadratic split heuristic works this way: First, it picks a pair of entries
a and b from S such that putting them in the same node would result in a
bounding box with the maximum wasted space; that is, the area of the minimum bounding box of a and b minus the sum of the areas of a and b is the
largest
...

It then iteratively adds the remaining entries, one entry per iteration, to one
of the two sets S1 or S2
...
In each iteration, the heuristic
chooses one of the entries with the maximum difference of i_{e,1} and i_{e,2} and
adds it to S1 if i_{e,1} is less than i_{e,2}, and to S2 otherwise
...
The
iteration stops when all entries have been assigned, or when one of the sets
S1 or S2 has enough entries that all remaining entries have to be added to
the other set so the nodes constructed from S1 and S2 both have the required
minimum occupancy
...

• Deletion: Deletion can be performed like a B+-tree deletion, borrowing entries
from sibling nodes, or merging sibling nodes if a node becomes underfull
...
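The quadratic split heuristic described under insertion above can be sketched as follows. The definition of i_{e,1} and i_{e,2} is not shown in the text, so the sketch assumes they denote the increase in area of the bounding box of S1 (respectively S2) if entry e were added to it. Entries are represented only by their bounding boxes (xmin, ymin, xmax, ymax), and all function names are hypothetical.

```python
def area(b):
    return (b[2] - b[0]) * (b[3] - b[1])


def union(a, b):
    """Smallest rectangle containing both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def growth(group_box, box):
    """Assumed meaning of i_{e,k}: area added to a group's box by one entry."""
    return area(union(group_box, box)) - area(group_box)


def quadratic_split(boxes, min_fill):
    """Return two lists of indices into `boxes`, one for each new node."""
    n = len(boxes)
    if n < 2 or n < 2 * min_fill:
        raise ValueError("too few entries to satisfy the minimum occupancy")
    # Seeds: the pair that would waste the most space if kept together.
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    a, b = max(pairs, key=lambda ij: area(union(boxes[ij[0]], boxes[ij[1]]))
               - area(boxes[ij[0]]) - area(boxes[ij[1]]))
    s1, s2 = [a], [b]
    box1, box2 = boxes[a], boxes[b]
    rest = [k for k in range(n) if k != a and k != b]
    while rest:
        # Stop early if one set needs every remaining entry to reach min_fill.
        if len(s1) + len(rest) <= min_fill:
            s1.extend(rest)
            break
        if len(s2) + len(rest) <= min_fill:
            s2.extend(rest)
            break
        # Pick the entry with the strongest preference for one of the sets.
        k = max(rest, key=lambda e: abs(growth(box1, boxes[e]) - growth(box2, boxes[e])))
        if growth(box1, boxes[k]) < growth(box2, boxes[k]):
            s1.append(k)
            box1 = union(box1, boxes[k])
        else:
            s2.append(k)
            box2 = union(box2, boxes[k])
        rest.remove(k)
    return s1, s2


boxes = [(0, 0, 1, 1), (1, 0, 2, 1), (8, 8, 9, 9), (9, 8, 10, 9), (0, 1, 1, 2)]
s1, s2 = quadratic_split(boxes, min_fill=2)
print(sorted(s1), sorted(s2))
```

In this example the two clusters of boxes end up in different nodes, so the bounding boxes of the resulting nodes do not overlap. The seed selection examines every pair of entries, which is where the quadratic cost arises.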

See the bibliographical references for more details on insertion and deletion operations on R-trees, as well as on variants of R-trees, called R*-trees or R+-trees
...
However, querying may be slower, since multiple paths have to be searched
...
However, because of their better storage
efﬁciency, and their similarity to B-trees, R-trees and their variants have proved popular in database systems that support spatial data
...
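The remark that multiple paths may have to be searched can be made concrete with a small sketch of R-tree search. The tree below is built by hand and is purely hypothetical: an internal node is a dictionary with "bbox" and "children", a leaf has "bbox" and "entries", and each entry is a pair of a bounding box (xmin, ymin, xmax, ymax) and an object identifier.

```python
def overlaps(a, b):
    """True if rectangles a and b share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def search(node, query):
    """Identifiers of all entries whose bounding boxes overlap `query`."""
    if "entries" in node:                                   # leaf node
        return [oid for bbox, oid in node["entries"] if overlaps(bbox, query)]
    found = []
    for child in node["children"]:
        # Sibling boxes may overlap, so several children may qualify.
        if overlaps(child["bbox"], query):
            found.extend(search(child, query))
    return found


leaf1 = {"bbox": (0, 0, 5, 5),
         "entries": [((0, 0, 2, 2), "A"), ((3, 3, 5, 5), "B")]}
leaf2 = {"bbox": (4, 4, 9, 9),
         "entries": [((4, 4, 6, 6), "C"), ((7, 7, 9, 9), "D")]}
root = {"bbox": (0, 0, 9, 9), "children": [leaf1, leaf2]}

both_paths = search(root, (4.5, 4.5, 5.5, 5.5))
one_path = search(root, (0, 0, 1, 1))
print(both_paths, one_path)
```

The first query falls in the area where the two leaf boxes overlap, so both subtrees are visited; the second touches only one leaf.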
4 Multimedia Databases
Multimedia data, such as images, audio, and video — an increasingly popular form
of data — are today almost always stored outside the database, in ﬁle systems
...

However, database features become important when the number of multimedia
objects stored is large
...
Multimedia objects often have descriptive attributes, such as those indicating when they were created, who created them, and to
what category they belong
...

However, storing multimedia outside the database makes it harder to provide
database functionality, such as indexing on the basis of actual multimedia data content
...
It is therefore desirable to store the data
themselves in the database
...

• The database must support large objects, since multimedia data such as videos
can occupy up to a few gigabytes of storage
...
Larger objects could be split into
smaller pieces and stored in the database
...
The SQL/MED standard (MED stands for Management of External Data), which is under development, allows external data, such as ﬁles, to be treated as if they are part of the
database
...

We discuss multimedia data formats in Section 23
...
1
...
Such data
are sometimes called isochronous data, or continuous-media data
...
If
the data are supplied too fast, system buffers may overﬂow, resulting in loss
of data
...
4
...

• Similarity-based retrieval is needed in many multimedia database applications
...
Index structures such as B+-trees and
R-trees cannot be used for this purpose; special index structures need to be
created
...
4
...
4
...
For
image data, the most widely used format is JPEG, named after the standards body
that created it, the Joint Photographic Experts Group
...
The Moving Picture Experts
Group has developed the MPEG series of standards for encoding video and audio
data; these encodings exploit commonalities among a sequence of frames to achieve
a greater degree of compression
...
5 megabytes (compared to approximately 75 megabytes for video in only JPEG)
...

MPEG-2 compresses
1 minute of video and audio to approximately 17 megabytes
...
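To put the MPEG-2 figure in perspective, the following short calculation derives the storage for a complete movie and the corresponding average data rate. The 17 megabytes per minute comes from the text; the two-hour running time and the use of decimal megabytes (10^6 bytes) are assumptions made only for this example.

```python
MPEG2_MB_PER_MINUTE = 17      # figure quoted in the text
BYTES_PER_MB = 10**6          # assumption: decimal megabytes


def storage_mb(minutes):
    """Storage in megabytes for `minutes` of MPEG-2 video and audio."""
    return MPEG2_MB_PER_MINUTE * minutes


def bit_rate_mbit_per_s():
    """Average delivery rate, in megabits per second, implied by the figure."""
    bits_per_minute = MPEG2_MB_PER_MINUTE * BYTES_PER_MB * 8
    return bits_per_minute / 60 / 10**6


movie_mb = storage_mb(120)    # hypothetical two-hour movie
rate = bit_rate_mbit_per_s()
print(movie_mb, round(rate, 2))
```

A two-hour movie therefore needs roughly 2 gigabytes, consistent with the earlier remark that videos can occupy up to a few gigabytes, and must be delivered at a little over 2 megabits per second on average.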

23
...
2 Continuous-Media Data
The most important types of continuous-media data are video and audio data (for example, a database of movies)
...

• Data must be delivered at a rate that does not cause overﬂow of system buffers
...
This need
arises, for example, when the video of a person speaking must show lips moving synchronously with the audio of the person speaking
...
Usually, data are
fetched in periodic cycles
...
The
cycle period is a compromise: A short period uses less memory but requires more
disk arm movement, which is a waste of resources, while a long period reduces disk
arm movement but increases memory requirements and may delay initial delivery
of data
...

Extensive research on delivery of continuous media data has dealt with such issues
as handling arrays of disks and dealing with disk failure
...

Several vendors offer video-on-demand servers
...
The basic architecture of a video-on-demand system
comprises:
• Video server
...
Systems containing a large volume of data may use tertiary
storage for less frequently accessed data
...
People view multimedia data through various devices, collectively
referred to as terminals
...

• Network
...

Video-on-demand service eventually will become ubiquitous, just as cable and
broadcast television are now
...

23
...
3 Similarity-Based Retrieval
In many multimedia applications, data are described only approximately in the database
...
4
...
Two pictures or images that are slightly different as represented
in the database may be considered the same by a user
...
When a new trademark is to be registered, the
system may need ﬁrst to identify all similar trademarks that were registered
previously
...
Speech-based user interfaces are being developed that allow the
user to give a command or identify a data item by speaking
...

• Handwritten data
...
Here again, similarity testing is
required
...
However, similarity
testing is often more successful than speech or handwriting recognition, because the
input can be compared to data already in the system and, thus, the set of choices
available to the system is limited
...
Some systems, including a dial-by-name, voice-activated telephone system,
have been deployed commercially
...

23
...
In distributed database applications, there has usually been strong central database and network administration
...
The increasingly widespread use of personal computers, and, more important,
of laptop or notebook computers
...
2
...

Mobile computing has proved useful in many applications
...
Delivery
services use mobile computers to assist in package tracking
...
New applications of mobile computers continue to emerge
...
Location-dependent queries are an interesting class of
queries that are motivated by mobile computers; in such queries, the location of the
user (computer) is a parameter of the query
...
An
example is a traveler’s information system that provides data on hotels, roadside services, and the like to motorists
...
Increasingly, navigational aids are being offered as a built-in
feature in automobiles
...
This limitation inﬂuences many aspects of system design
...

Increasing amounts of data may reside on machines administered by users, rather
than by database administrators
...
In many cases, there is a conﬂict between the user’s
need to continue to work while disconnected and the need for global data consistency
...
5
...
5
...

23
...
1 A Model of Mobile Computing
The mobile-computing environment consists of mobile computers, referred to as mobile hosts, and a wired network of computers
...
Each mobile
support station manages those mobile hosts within its cell — that is, the geographical area that it covers
...
Since mobile hosts
may, at times, be powered down, a host may leave one cell and rematerialize later at
some distant cell
...
Within a small area, such as a building, mobile hosts may be connected by a
wireless local-area network (LAN) that provides lower-cost connectivity than would
a wide-area cellular network, and that reduces the overhead of handoffs
...
However, such communication can occur between only
nearby hosts
...
Bluetooth uses short-range digital radio to
allow wireless connectivity within a 10-meter range at high speed (up to 721 kilobits per second)
...

The network infrastructure for mobile computing consists in large part of two
technologies: wireless local-area networks (such as Avaya’s Orinoco wireless LAN),
and packet-based cellular telephony networks
...
Second-generation digital
systems retained the focus on voice applications
...
5G systems use packet-based networking and are more suited to data applications
...

Bluetooth, wireless LANs, and 2
...
While such communication
itself does not ﬁt the domain of a usual database application, the accounting, monitoring, and management data pertaining to this communication will generate huge
databases
...
This need for timeliness adds another dimension
to the constraints on the system — a matter we shall discuss further in Section 24
...

The size and power limitations of many mobile computers have led to alternative
memory hierarchies
...
1, may be included
...
The
same considerations of size and energy limit the type and size of the display used
in a mobile device
...
However, the need to present Web-based
data has necessitated the creation of presentation standards
...
WAP-based browsers access special Web pages that use wireless markup language (WML), an XML-based
language designed for the constraints of mobile and wireless Web browsing
...
5
...
This simple fact has a dramatic effect at the network level, since location-based network addresses are no longer constants within the system
...
As we saw in Chapter 19,
we must consider the communication costs when we choose a distributed query-processing strategy
...
Furthermore, there are competing notions of cost to consider:

...
Often, battery power is a scarce resource whose use must
be optimized
...
Thus, transmission and reception of data impose different power demands on the mobile host
...
5
...

A typical application of such broadcast data is stock-market price information
...
First, the mobile host avoids the energy cost
for transmitting data requests
...
Thus, the available transmission
bandwidth is utilized more effectively
...
The mobile host may have local nonvolatile storage
available to cache the broadcast data for possible later use
...
If the cached data are insufﬁcient, there are two options: Wait
for the data to be broadcast, or transmit a request for data
...

Broadcast data may be transmitted according to a ﬁxed schedule or a changeable
schedule
...
In the latter case, the broadcast
schedule must itself be broadcast at a well-known radio frequency and at well-known
time intervals
...

Requests for data can be thought of as being serviced when the requested data are
broadcast
...
The bibliographical notes list recent research papers in the area of broadcast data management
...
5
...

Mobile computers without wireless connectivity are disconnected most of the time
when they are being used, except periodically when they are connected to their host
computers, either physically or through a computer network
...

The user of the mobile host may issue queries and updates on data that reside or are
cached locally
...
Since the mobile host represents a single point of failure, stable storage cannot be simulated well
...
Likewise, updates
occurring in the mobile host cannot be propagated until reconnection occurs
...
In wired distributed systems, partitioning is
considered to be a failure mode; in mobile computing, partitioning via disconnection
is part of the normal mode of operation
...

For data updated by only the mobile host, it is a simple matter to propagate the
updates when the mobile host reconnects
...
When the mobile host is connected, it can be sent invalidation
reports that inform it of out-of-date cache entries
...
A simple solution to this problem is
to invalidate the entire cache on reconnection, but such an extreme solution is highly
costly
...

If updates can occur at both the mobile host and elsewhere, detecting conﬂicting updates is more difﬁcult
...
These schemes do not guarantee that the updates will be consistent
...

The version-vector scheme detects inconsistencies when copies of a document are
independently updated
...
Although we use the term document, the scheme can be applied to any
other data items, such as tuples of a relation
...
When a host i updates a
document d, it increments the version number V_{d,i}[i] by one
...
However, before exchanging documents, the hosts have to discover whether the copies are consistent:
1
...

If, for each k, V_{d,i}[k] ≤ V_{d,j}[k] and the version vectors are not identical, then
the copy of document d at host i is older than the one at host j
...
Host i replaces its copy of d, as well as its
copy of the version vector for d, with the copies from host j
...
If there is a pair of hosts k and m such that V_{d,i}[k] < V_{d,j}[k] and V_{d,i}[m] >
V_{d,j}[m], then the copies are inconsistent; that is, the copy of d at j contains updates performed by host k that have not been propagated to host i, and, similarly, the copy of d at i contains updates performed by host m that have not
been propagated to host j
...
Manual intervention
may be required to merge the updates
...
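The comparison rules above can be sketched as a small function. A version vector is modeled as a dictionary from host name to update count, with missing hosts counted as zero. The first rule is not shown in the text, so the sketch assumes that identical vectors mean identical copies. The names record_update and compare, and the host names, are hypothetical.

```python
def record_update(vector, host):
    """Host `host` updates the document: increment its own entry by one."""
    vector[host] = vector.get(host, 0) + 1


def compare(v_i, v_j):
    """Classify the copies at hosts i and j from their version vectors."""
    hosts = set(v_i) | set(v_j)
    i_behind = any(v_i.get(h, 0) < v_j.get(h, 0) for h in hosts)
    j_behind = any(v_i.get(h, 0) > v_j.get(h, 0) for h in hosts)
    if not i_behind and not j_behind:
        return "identical"        # assumed first rule: vectors are the same
    if i_behind and not j_behind:
        return "i is older"       # host i should take host j's copy and vector
    if j_behind and not i_behind:
        return "j is older"
    return "inconsistent"         # independent updates on both sides


# Both hosts start from the same copy, then update it independently.
at_a = {"A": 1, "B": 0}
at_b = {"A": 1, "B": 0}
before = compare(at_a, at_b)
record_update(at_a, "A")
one_sided = compare(at_a, at_b)
record_update(at_b, "B")
after = compare(at_a, at_b)
print(before, one_sided, after)
```

Only the last case calls for reconciliation; in the one-sided cases the older copy and its vector are simply replaced by the newer ones.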
The scheme gained importance because mobile computers often
store copies of ﬁles that are also present on server systems, in effect constituting a
distributed ﬁle system that is often disconnected
...
The version-vector scheme also has
applications in replicated databases
...
Many applications can perform reconciliation automatically by
executing in each computer those operations that had performed updates on remote
computers during the period of disconnection
...
Alternative techniques may be available in certain applications; in the worst case, however, it must be left to the users to resolve the inconsistencies
...

Another weakness is that the version-vector scheme requires substantial communication between a reconnecting mobile host and that host’s mobile support station
...

The potential for disconnection and the cost of wireless communication limit the
practicality of transaction-processing techniques discussed in Chapter 19 for distributed systems
...
Transactions that span more than one computer
and that include a mobile host face long-term blocking during transaction commit,
unless disconnectivity is rare or predictable
...
6 Summary
• Time plays an important role in database systems
...
Whereas most databases model the state of the real world at a
point in time (at the current time), temporal databases model the states of the
real world across time
...
Temporal query languages simplify
modeling of time, as well as time-related queries
...

• Design data are stored primarily as vector data; geographic data consist of a
combination of vector and raster data
...

• Vector data can be encoded as ﬁrst-normal-form data, or can be stored using
non-ﬁrst-normal-form structures, such as lists
...

• R-trees are a multidimensional extension of B-trees; with variants such as R+-trees and R*-trees, they have proved popular in spatial databases
...

• Multimedia databases are growing in importance
...

• Mobile computing systems have become common, leading to interest in database systems that can run on such systems
...
The query cost model must include
the cost of communication, including monetary cost and battery-power cost,
which is relatively high for mobile systems
...

• Disconnected operation, use of broadcast data, and caching of data are three
important issues being addressed in mobile computing
...
23
...
1 What are the two types of time, and how are they different? Why does it make
sense to have both types of time associated with a tuple?
23
...
3 Suppose you have a relation containing the x, y coordinates and names of
restaurants
...
Which type of index would be preferable, R-tree or B-tree?
Why?
23
...

Is it possible to convert such vector data to raster data? If so, what are the
drawbacks of storing raster data obtained by such conversion, instead of the
original vector data?

23
...
Describe an algorithm to ﬁnd the
nearest neighbor by making use of multiple region queries
...
6 Suppose you want to store line segments in an R-tree
...

• Describe the effect on performance of having large bounding boxes on
queries that ask for line segments intersecting a given region
...
Hint: you can divide segments into smaller
pieces
...
7 Give a recursive procedure to efﬁciently compute the spatial join of two relations with R-tree indices
...
)
23
...
A schema to represent the geographic location of restaurants along with
features such as the cuisine served at the restaurant and the level of expensiveness
...
A query to ﬁnd moderately priced restaurants that serve Indian food and
are within 5 miles of your house (assume any location for your house)
...
A query to ﬁnd for each restaurant the distance from the nearest restaurant
serving the same cuisine and with the same level of expensiveness
...
9 What problems can occur in a continuous-media system if data is delivered
either too slowly or too fast?
23
...
3) can be used
in a broadcast-data environment, where there may occasionally be noise that
prevents reception of part of the data being transmitted
...
11 List three main features of mobile computing over wireless networks that are
distinct from traditional distributed systems
...
12 List three factors that need to be considered in query optimization for mobile
computing that are not considered in traditional query optimizers
...
13 Deﬁne a model of repeatedly broadcast data in which the broadcast medium is
modeled as a virtual disk
...

23
...
Copies of some documents are kept on mobile computers
...
Show how the version-vector scheme can ensure proper updating of the central database and mobile computers when a mobile computer
reconnects
...
15 Give an example to show that the version-vector scheme does not ensure serializability
...
14, with the assumption
that documents 1 and 2 are available on both mobile computers A and B, and
take into account the possibility that a document may be read without being
updated
...

Bibliographical Notes
[1993], Snodgrass et al
...

Stam and Snodgrass [1988] and Soo [1991] provide surveys on temporal data management
...
[1994] presents a glossary of temporal-database concepts, aimed
at unifying the terminology
...
Tansel et al
...
Chomicki [1995] presents techniques for managing temporal integrity constraints
...

[1994]
...
Samet [1990] provides a textbook coverage of spatial data structures
...
Samet
[1990] and Samet [1995b] describe numerous variants of quad trees
...
The R-tree was
originally presented in Guttman [1984]
...
[1987], which describes the R+ tree; Beckmann et al
...

Brinkhoff et al
...

Lo and Ravishankar [1996] and Patel and DeWitt [1996] present partitioning-based
methods for computation of spatial joins
...
Indexing of handwritten documents is discussed in Aref et al
...
[1995a], and Lopresti and Tomkins [1993]
...
[1992]
...
[1995] presents a technique for concurrent
access to indices on spatial data
...
Indexing of multimedia data is discussed in Faloutsos and Lin [1995]
...
[1992], Rangan et al
...
[1994], Freedman and DeWitt
[1995], and Ozden et al
...
Fault tolerance is discussed in Berson et al
...
[1996a]
...
[1996] suggests alternative compression schemes
for video transmission over wireless networks
...
23
...
[1995], Chervenak et al
...

[1995a], and Ozden et al
...

Information management in systems that include mobile computers is studied in
Alonso and Korth [1993] and Imielinski and Badrinath [1994]
...
Indexing of data broadcast over wireless media is considered in
Imielinski et al
...
Caching of data in mobile environments is discussed in Barbará and Imielinski [1994] and Acharya et al
...
Disk management in mobile
computers is addressed in Douglis et al
...

The version-vector scheme for detecting inconsistency in distributed ﬁle systems
is described by Popek et al
...
[1983]
...
24
...
We discussed in those
chapters a variety of schemes for ensuring the ACID properties in an environment
where failure can occur, and where the transactions may run concurrently
...

24
...
The
term TP monitor initially stood for teleprocessing monitor
...
The
CICS TP monitor from IBM was one of the earliest TP monitors, and has been very
widely used
...

24
...
1 TP-Monitor Architectures
Large-scale transaction processing systems are built around a client–server architecture
...

[Figure: (a) Process-per-client model; (b) Single-server model; (c) Many-server, single-router model. Labeled components: remote clients, server, files, monitor, router, servers.]

Figure 24
...

This process-per-client model is illustrated in Figure 24
...
This model presents several problems with respect to memory utilization and processing speed:
• Per-process memory requirements are high
...

• The operating system divides up available CPU time among processes by
switching among them; this technique is called multitasking
...

The above problems can be avoided by having a single-server process to which
all remote clients connect; this model is called the single-server model, illustrated in
Figure 24
...
Remote clients send requests to the server process, which then executes
those requests
...
The server process handles tasks, such as
user authentication, that would normally be handled by the operating system
...
It executes code on

24
...
Unlike the overhead of full multitasking, the cost of switching
between threads is low (typically only a few microseconds)
...
However, they had problems, especially when multiple applications accessed the same database:
• Since all the applications run as a single process, there is no protection among
them
...
It
would be best to run each application as a separate process
...
(However, concurrent
threads within a process can be supported in a shared-memory multiprocessor system
...

One way to solve these problems is to run multiple application-server processes
that access a common database, and to let the clients communicate with the application through a single communication process that routes requests
...
1c
...
The request can, for example, be routed to the most lightly loaded server in a
pool
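
As a rough illustration of this routing policy, the following Python sketch picks the server with the fewest requests in progress. The names `AppServer`, `Router`, and `active_requests` are hypothetical, and a real TP monitor would also handle queuing, failures, and load reporting, none of which is modeled here.

```python
from dataclasses import dataclass


@dataclass
class AppServer:
    """Hypothetical application server tracked by the router."""
    name: str
    active_requests: int = 0


class Router:
    """Hypothetical single communication process that routes client requests."""

    def __init__(self, pool):
        self.pool = list(pool)

    def route(self):
        # Choose the most lightly loaded server; ties go to the earliest in the pool.
        server = min(self.pool, key=lambda s: s.active_requests)
        server.active_requests += 1
        return server

    def finished(self, server):
        # Called when a server completes a request.
        server.active_requests -= 1


def demo_routing():
    router = Router([AppServer("s1"), AppServer("s2")])
    first = router.route()    # both servers idle, so s1 is chosen
    second = router.route()   # s1 is busy, so s2 is chosen
    router.finished(first)    # s1 becomes idle again
    third = router.route()    # s1 is now the more lightly loaded server
    return [first.name, second.name, third.name]
```

Counting requests in progress is only one possible load measure; CPU utilization or queue length could be substituted without changing the structure.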
...
As a further generalization, the application servers can
run on different sites of a parallel or distributed database, and the communication
process can handle the coordination among the processes
...
A Web server has a
main process that receives HTTP requests, and then assigns the task of handling each
request to a separate process (chosen from among a pool of processes)
...

A more general architecture has multiple processes, rather than just one, to communicate with clients
...
Later-generation TP monitors therefore have a different architecture, called the many-server,
many-router model, illustrated in Figure 24
...
A controller process starts up the
other processes, and supervises their functioning
...
Very high performance
Web server systems also adopt such an architecture
...
2
...
When messages arrive, they
may have to be queued; thus, there is a queue manager for incoming messages
...
[Figure: input queue, authorization, lock manager, application servers, recovery manager, log manager, database and resource managers, network]
Figure 24
...

Using a durable queue helps ensure that once received and stored in the queue, the
messages will be processed eventually, regardless of system failures
...
TP monitors often provide
logging, recovery, and concurrency-control facilities, allowing application servers to
implement the ACID transaction properties directly if required
...
Recall that persistent messaging (Section 19
...
3) provides a guarantee that the message will be delivered if (and only if) the transaction commits
...

24
...
2 Application Coordination Using TP monitors
Applications today often have to interact with multiple databases
...
Finally, they may have to communicate with users or
other applications at remote sites
...
It is important to be able to coordinate data accesses, and to
implement ACID properties for transactions across such systems
...
A TP monitor treats each subsystem as a resource manager that provides transactional access to some set of resources
...
prepare to commit transaction (for two-phase commit)
...

The resource-manager interface is deﬁned by the X/Open Distributed Transaction
Processing standard
...
TP monitors—as well as other products, such as SQL
systems, that support the X/Open standards—can connect to the resource managers
...
The TP monitor
can act as coordinator of two-phase commit for transactions that access these services as well as database systems
...
Two-phase commit between the database and the resource
managers for the durable queue and persistent messaging helps ensure that, regardless of failures, either all these actions occur, or none occurs
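
The all-or-nothing behavior described here can be sketched in a few lines of Python. This is a simplified illustration, not the X/Open interface: the class `ResourceManager`, its methods, and the function `two_phase_commit` are hypothetical names, and logging, timeouts, and coordinator failure are deliberately left out.

```python
class ResourceManager:
    """Hypothetical resource manager (database, durable queue, messaging)."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "active"

    def prepare(self):
        # Phase 1: vote on whether this participant is able to commit.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(managers):
    """Coordinator role, as a TP monitor might play it."""
    # Phase 1: collect a vote from every participant.
    votes = [rm.prepare() for rm in managers]
    # Phase 2: commit everywhere only if every vote was yes.
    if all(votes):
        for rm in managers:
            rm.commit()
        return "committed"
    for rm in managers:
        rm.abort()
    return "aborted"
```

For example, if the database and the durable queue vote yes but the messaging manager votes no, all three end in the aborted state, so none of the actions takes effect.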
...
The TP monitor coordinates
activities such as system checkpoints and shutdowns
...
It administers server pools by adding servers or removing
servers without interruption of the system
...
If
a server fails, the TP monitor can detect this failure, abort the transactions in progress,
and restart the transactions
...
When failed nodes
restart, the TP monitor can govern the recovery of the node’s resource managers
...
10) are an example of replicated systems
...
If one site fails, the TP
monitor can transparently route messages to a backup site, masking the failure of the
ﬁrst site
...
As far as the client code
that invokes the RPC is concerned, the call looks like a local procedure-call invocation
...
In such an interface, the RPC mechanism provides calls that can be used to
enclose a series of RPC calls within a transaction
...

24
...
A task defines some work to be done and can be
specified in a number of ways, including a textual description in a file or electronic-mail message, a form, a message, or a computer program
...
24
...
3

Typical processing entity
mailers
humans, application software
humans, application software, DBMSs

Examples of workﬂows
...

Figure 24
...
A simple example is that of an electronic-mail system
...
Each mailer performs a task—forwarding the mail to the
next mailer—and the tasks of multiple mailers may be required to route mail from
source to destination
...
Workﬂow tasks are
also sometimes called steps
...
For instance, consider
the processing of a loan
...
4
...
An employee who processes loan applications veriﬁes the data in the form, using sources
such as credit-reference bureaus
...

Each human here performs a task; in a bank that has not automated the task of loan
processing, the coordination of the tasks is typically carried out by passing of the
loan application, with attached notes and other information, from one employee to

[Figure: loan-processing workflow; labeled elements: customer, loan application, verification, reject, loan disbursement]
Figure 24
...

24
...
Other examples of workﬂows include processing of expense vouchers, of
purchase orders, and of credit-card transactions
...
Hence, it is feasible
for organizations to automate their workﬂows
...
The workﬂow itself then involves handing of responsibility
from one human to the next, and possibly even to programs that can automatically
fetch the required information
...

We have to address two activities, in general, to automate a workﬂow
...
The second problem is workﬂow execution, which we
must do while providing the safeguards of traditional database systems related to
computation correctness and data integrity and durability
...
The idea behind transactional workﬂows is to use
and extend the concepts of transactions to the context of workﬂows
...
Workﬂow activities may require interactions among several such systems, each performing a task, as well as
interactions with humans
...
Here, we
study properties of workﬂow systems at a relatively abstract level, without going
into the details of any particular system
...
2
...
In an abstract view of a task, a task may use parameters stored in its input variables, may retrieve and update data in the local system,
may store its results in its output variables, and may be queried about its execution
state
...

The coordination of tasks can be speciﬁed either statically or dynamically
...
For instance, the tasks in an expense-voucher workﬂow
may consist of the approvals of the voucher by a secretary, a manager, and an accountant, in that order, and ﬁnally by the delivery of a check
...

A generalization of this strategy is to have a precondition for execution of each
task in the workflow, so that all possible tasks in a workflow and their dependencies are known in advance, but only those tasks whose preconditions are satisfied are executed
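
A minimal Python sketch of precondition-driven execution follows. It assumes that each task is a callable returning an outcome string and that each precondition is a predicate over the outcomes recorded so far; the function name `run_workflow` and the voucher outcomes are illustrative only.

```python
def run_workflow(tasks, preconditions):
    """Run every task whose precondition holds, until no more tasks can run.

    tasks:          dict mapping task name -> callable returning an outcome
    preconditions:  dict mapping task name -> predicate over the outcomes dict
    """
    outcomes = {}
    progress = True
    while progress:
        progress = False
        for name, action in tasks.items():
            if name not in outcomes and preconditions[name](outcomes):
                outcomes[name] = action()
                progress = True
    return outcomes


def demo_voucher():
    # Expense-voucher workflow in which the manager rejects the voucher.
    tasks = {
        "secretary": lambda: "approved",
        "manager": lambda: "rejected",
        "accountant": lambda: "approved",
        "deliver_check": lambda: "done",
    }
    preconditions = {
        "secretary": lambda done: True,
        "manager": lambda done: done.get("secretary") == "approved",
        "accountant": lambda done: done.get("manager") == "approved",
        "deliver_check": lambda done: done.get("accountant") == "approved",
    }
    return run_workflow(tasks, preconditions)
```

In this run, only the secretary and manager tasks execute; the accountant and check-delivery tasks are known in advance but never run, because their preconditions are never satisfied.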
...

An example of dynamic scheduling of tasks is an electronic-mail routing system
...

24
...
2 Failure-Atomicity Requirements of a Workﬂow
The workﬂow designer may specify the failure-atomicity requirements of a workﬂow according to the semantics of the workﬂow
...
However, a workﬂow can, in many cases, survive the failure of one of its tasks—
for example, by executing a functionally equivalent task at another site
...

The system must guarantee that every execution of a workﬂow will terminate in a
state that satisﬁes the failure-atomicity requirements deﬁned by the designer
...
All other execution states of
a workflow constitute a set of nonacceptable termination states, in which the failure-atomicity requirements may be violated
...
A
committed acceptable termination state is an execution state in which the objectives
of a workﬂow have been achieved
...
If an aborted acceptable termination state has been reached, all undesirable
effects of the partial execution of the workﬂow must be undone in accordance with
that workﬂow’s failure-atomicity requirements
...
Thus, if a workﬂow was in a nonacceptable termination state at the time of
failure, during system recovery it must be brought to an acceptable termination state
(whether aborted or committed)
...
In case of failures such as a long failure of the veriﬁcation system, the loan application could be

24
...
A committed acceptable termination would
be either the acceptance or the rejection of the loan
...
However, if the multitask transaction later aborts, its failure atomicity may require that we undo the effects of already completed tasks (for example,
committed subtransactions) by executing compensating tasks (as subtransactions)
...

In an expense-voucher-processing workﬂow, for example, a department-budget
balance may be reduced on the basis of an initial approval of a voucher by the manager
...

24
...
3 Execution of Workﬂows
The execution of the tasks may be controlled by a human coordinator or by a software system called a workﬂow-management system
...
A task agent controls the execution of a task by a processing entity
...
A scheduler may submit a task for execution (to a task agent),
or may request that a previously submitted task be aborted
...
In accordance with the workﬂow speciﬁcations,
the scheduler enforces the scheduling dependencies and is responsible for ensuring
that tasks reach acceptable termination states
...
A centralized architecture has a single scheduler that schedules the
tasks for all concurrently executing workﬂows
...
When the issues of concurrent
execution can be separated from the scheduling function, the latter option is a natural
choice
...

The simplest workﬂow-execution systems follow the fully distributed approach
just described and are based on messaging
...
Some implementations use e-mail for messaging; such implementations provide many of the features
of persistent messaging, but generally do not guarantee atomicity of message delivery and transaction commit
...
Execution may also involve presenting messages to humans, who
have then to carry out some action
...

The message contains all relevant information about the task to be performed
...
24
...

The centralized approach is used in workﬂow systems where the data are stored
in a central database
...

It is easier to keep track of the state of a workﬂow with a centralized approach than
it is with a fully distributed approach
...
Ideally, before attempting to execute a workﬂow,
the scheduler should examine that workﬂow to check whether the workﬂow may terminate in a nonacceptable state
...
As an example, let us consider a workﬂow consisting of two tasks represented by subtransactions S1 and S2 , with the failure-atomicity
requirements indicating that either both or neither of the subtransactions should be
committed
...
Therefore, such a workﬂow speciﬁcation is
unsafe, and should be rejected
...

24
...
4 Recovery of a Workﬂow
The objective of workﬂow recovery is to enforce the failure atomicity of the workﬂows
...
For example, the scheduler could continue processing after failure and recovery, as though
nothing happened, thus providing forward recoverability
...
In either case, some subtransactions may need to be committed or even submitted for
execution (for example, compensating subtransactions)
...
To recover the execution-environment context, the failure-recovery routines need to restore the state information of the scheduler at the time of failure, including the information about the
execution states of each task
...

We also need to consider the contents of the message queues
...
Persistent messaging (Section 19
...
3) provides exactly the
features to ensure positive, single handoff
...
24
...
5 Workﬂow Management Systems
Workﬂows are often hand coded as part of application systems
...

The goal of workﬂow management systems is to simplify the construction of workﬂows and make them more reliable, by permitting them to be speciﬁed in a high-level
manner and executed in accordance with the speciﬁcation
...

In today’s world of interconnected organizations, it is not sufﬁcient to manage
workﬂows only within an organization
...
For instance, consider an order placed by
an organization and communicated to another organization that fulﬁlls the order
...

The Workﬂow Management Coalition has developed standards for interoperation
between workﬂow systems
...
See the bibliographical notes for more information
...
3 Main-Memory Databases
To allow a high rate of transaction processing (hundreds or thousands of transactions
per second), we must use high-performance hardware, and must exploit parallelism
...
Disk I/O is often the bottleneck for reads, as well as for transaction commits
...

We can make a database system less disk bound by increasing the size of the
database buffer
...
Today, commercial 64-bit systems can support main
memories of tens of gigabytes
...
The memory size required for
most such systems is not exceptionally large, although there are at least a few applications that require multiple gigabytes of data to be memory resident
...

Large main memories allow faster processing of transactions, since data are memory resident
...
24
...
The improved performance made possible by a large main memory may
result in the logging process becoming a bottleneck
...
The overhead imposed by logging can also be reduced by the group-commit technique discussed
later in this section
...

• Buffer blocks marked as modiﬁed by committed transactions still have to be
written so that the amount of log that has to be replayed at recovery time is
reduced
...

• If the system crashes, all of main memory is lost
...
Therefore, even after recovery is complete, it takes some time before the database is fully loaded in main memory and high-speed processing
of transactions can resume
...
However, data structures can have pointers crossing multiple pages unlike those in
disk databases, where the cost of the I/Os to traverse multiple pages would be
excessively high
...

• There is no need to pin buffer pages in memory before data are accessed, since
buffer pages will never be replaced
...

• Once the disk I/O bottleneck is removed, operations such as locking and latching may become bottlenecks
...

• Recovery algorithms can be optimized, since pages rarely need to be written
out to make space for other pages
...
Additional information on main-memory databases is given in the references in the bibliographical notes
...
The process of committing a transaction T requires these records to be written to
stable storage:
• All log records associated with T that have not been output to stable storage
• The <T commit> log record
These output operations frequently require the output of blocks that are only partially ﬁlled
...
Instead of attempting to commit T when T completes, the system waits until several transactions have completed, or a certain period of time has passed since
a transaction completed execution
...
Blocks written to the log on stable storage would contain records
of several transactions
...
This technique results, on average, in
fewer output operations per committed transaction
...
The delay can be made
quite small (say, 10 milliseconds), which is acceptable for many applications
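
The batching idea behind group commit can be sketched as follows. This is an in-memory illustration under stated assumptions: `GroupCommitLog` and its fields are hypothetical names, a Python list stands in for stable storage, and the timeout is represented by the caller invoking `flush()` explicitly rather than by a real timer.

```python
class GroupCommitLog:
    """Hypothetical log that forces commit records to storage in groups."""

    def __init__(self, group_size):
        self.group_size = group_size
        self.pending = []   # commit records still in the log buffer
        self.stable = []    # blocks already written to (simulated) stable storage
        self.writes = 0     # number of output operations performed

    def commit(self, txn_id):
        # The transaction is not committed until its group is flushed.
        self.pending.append(txn_id)
        if len(self.pending) >= self.group_size:
            return self.flush()
        return []

    def flush(self):
        # One output operation commits every transaction in the group.
        if not self.pending:
            return []
        block = list(self.pending)
        self.stable.append(block)
        self.writes += 1
        self.pending.clear()
        return block
```

With a group size of 3, three commits cost one output operation instead of three; the price is that the first two transactions wait for the third (or for the timeout) before they are reported as committed.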
...
Transactions can commit as soon as the write is performed on
the nonvolatile RAM buffer
...

Note that group commit is useful even in databases with disk-resident data
...
4 Real-Time Transaction Systems
The integrity constraints that we have considered thus far pertain to the values stored
in the database
...
Examples of such applications include plant management,
trafﬁc control, and scheduling
...
Rather, we are concerned
with how many deadlines are missed, and by how much time they are missed
...
Serious problems, such as system crash, may occur if a task is
not completed by its deadline
...
The task has zero value if it is completed after the deadline
...
The task has diminishing value if it is completed after the
deadline, with the value approaching zero as the degree of lateness increases
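
The three kinds of deadline behavior described above can be summarized as a value function. In the sketch below, the labels `"hard"`, `"firm"`, and `"soft"` are the names conventionally given to these three cases and are used here as an assumption; the function name `task_value` and the particular decay formula for the third case are illustrative choices, not taken from the text.

```python
def task_value(kind, finish_time, deadline, full_value=1.0, decay=0.5):
    """Value of completing a task at finish_time, given its deadline."""
    if finish_time <= deadline:
        return full_value
    lateness = finish_time - deadline
    if kind == "hard":
        # Missing the deadline is treated as a serious failure.
        raise RuntimeError("hard deadline missed")
    if kind == "firm":
        # A late result is worthless.
        return 0.0
    if kind == "soft":
        # Value diminishes and approaches zero as lateness grows
        # (the 1 / (1 + decay * lateness) shape is an assumed example).
        return full_value / (1.0 + decay * lateness)
    raise ValueError("unknown deadline kind: " + kind)
```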
...

Transaction management in real-time systems must take deadlines into account
...
In such cases, it may be preferable to pre-empt the
transaction holding the lock, and to allow Ti to proceed
...
24
...
Unfortunately, it is
difﬁcult to determine whether rollback or waiting is preferable in a given situation
...
In the best case, all data accesses reference data in the
database buffer
...
Because the two or more disk accesses
required in the worst case take several orders of magnitude more time than the main-memory references required in the best case, transaction execution time can be estimated only very poorly if data are resident on disk
...

However, even if data are resident in main memory, variances in execution time
arise from lock waits, transaction aborts, and so on
...
They have extended
locking protocols to provide higher priority for transactions with early deadlines
...
The bibliographical notes provide references to research
in the area of real-time databases
...
Designing a real-time system involves ensuring that there is enough processing
power to meet deadlines without requiring excessive hardware resources
...

24
...
Although
the techniques presented here and earlier in Chapters 15, 16, and 17 work well in
those applications, serious problems arise when this concept is applied to database
systems that involve human interaction
...
Once a human interacts with an active transaction, that transaction becomes a long-duration transaction from the perspective of the computer, since human response time is slow relative to computer speed
...
Thus, transactions may be of long duration in human
terms, as well as in machine terms
...
Data generated and displayed to a user by a
long-duration transaction are uncommitted, since the transaction may abort
...
If several users are cooperating on a project, user transactions
may need to exchange data prior to transaction commit
...
• Subtasks
...
The user may wish to abort a subtask without necessarily causing
the entire transaction to abort
...
It is unacceptable to abort a long-duration interactive transaction because of a system crash
...

• Performance
...
This deﬁnition is in contrast to that in a noninteractive system, in which high throughput (number of transactions per second) is the goal
...
However, in the case of interactive transactions, the most costly
resource is the user
...
In those
cases where a task takes a long time, response time should be predictable (that
is, the variance in response times should be low), so that users can manage
their time well
...
5
...
5
...

24
...
1 Nonserializable Executions
The properties that we discussed make it impractical to enforce the requirement
used in earlier chapters that only serializable schedules be permitted
...
When a lock cannot be granted, the transaction requesting the lock is forced to wait for the data item in question to be unlocked
...
If the data item is locked by a short-duration transaction, we expect
that the waiting time will be short (except in case of deadlock or extraordinary
system load)
...
Long waiting times lead to both longer
response time and an increased chance of deadlock
...
Graph-based protocols allow for locks to be released
earlier than under the two-phase locking protocols, and they prevent deadlock
...
Transactions must
lock data items in a manner consistent with this ordering
...
Furthermore, a transaction
must hold a lock until there is no chance that the lock will be needed again
...

• Timestamp-based protocols
...
However, they do require transactions to abort under certain circumstances
...
For noninteractive transactions, this lost work is a performance
issue
...
It
is highly undesirable for a user to ﬁnd that several hours’ worth of work have
been undone
...
Like timestamp-based protocols, validation protocols
enforce serializability by means of transaction abort
...
There are theoretical results, cited
in the bibliographical notes, that substantiate this conclusion
...
We previously discussed the problem of cascading rollback, in which
the abort of a transaction may lead to the abort of other transactions
...
If locking is
used, exclusive locks must be held until the end of the transaction, if cascading rollback is to be avoided
...

Thus, it appears that the enforcement of transaction atomicity must either lead to
an increased probability of long-duration waits or create a possibility of cascading
rollback
...

24
...
2 Concurrency Control
The fundamental goal of database concurrency control is to ensure that concurrent
execution of transactions does not result in a loss of database consistency
...
However, not all schedules that preserve consistency of the database are serializable
...
Although the schedule of Figure 24
...
It also illustrates two important points
about the concept of correctness without serializability
...

• Correctness depends on the properties of operations performed by each transaction
...
However,
there are simpler techniques
...
T1                  T2
read(A)
A := A - 50
write(A)
                    read(B)
                    B := B - 10
                    write(B)
read(B)
B := B + 50
write(B)
                    read(A)
                    A := A + 10
                    write(A)
Figure 24
...

the basis for a split of the database into subdatabases on which concurrency can be
managed separately
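
The schedule in the figure above can be replayed directly to check that it preserves the sum A + B, even though it is not conflict serializable. The sketch below encodes each read/update/write group as a tuple; the names `SCHEDULE` and `run_schedule` and the starting balances are illustrative.

```python
# Each entry is (transaction, data item, amount added), in schedule order.
SCHEDULE = [
    ("T1", "A", -50),
    ("T2", "B", -10),
    ("T1", "B", +50),
    ("T2", "A", +10),
]


def run_schedule(a, b):
    """Execute the interleaved schedule on initial values of A and B."""
    db = {"A": a, "B": b}
    for _txn, item, delta in SCHEDULE:
        value = db[item]        # read(item)
        value = value + delta   # item := item + delta
        db[item] = value        # write(item)
    return db
```

Starting from A = 100 and B = 200, the result is A = 60 and B = 240, so the sum is still 300. The outcome matches a serial execution because the operations are increments and decrements, which commute; with other kinds of operations the same interleaving could violate consistency.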
...

The bibliographical notes reference other techniques for ensuring consistency without requiring serializability
...
6)
...
Since many of the new database applications require the
maintenance of versions of data, concurrency-control techniques that exploit multiple versions are practical
...
5
...
By structuring a transaction as a set of subtransactions, we are able to
enhance parallelism, since it may be possible to run several subtransactions in parallel
...

A nested or multilevel transaction T consists of a set T = {t1 , t2 ,
...
A subtransaction ti in T may abort without
forcing T to abort
...
If
ti commits, this action does not make ti permanent (unlike the situation in Chapter 17)
...
5
...
An execution of T must not violate the partial order P
...

Nesting may be several levels deep, representing a subdivision of a transaction
into subtasks, subsubtasks, and so on
...

If a subtransaction of T is permitted to release locks on completion, T is called
a multilevel transaction
...
Alternatively, if locks held
by a subtransaction ti of T are automatically assigned to T on completion of ti , T is
called a nested transaction
...
5 to show
how nesting can create higher-level operations that may enhance concurrency
...
Any execution of these subtransactions will generate a correct result
...
5 corresponds to the
schedule < T1,1 , T2,1 , T1,2 , T2,2 >
...
5
...
Indeed, multilevel
transactions may allow this exposure
...
The concept of compensating transactions helps us to deal with this problem
...
, tn
...
Now, if the outer-level transaction T has to
be aborted, the effect of its subtransactions must be undone
...
, tk have committed, and that tk+1 was executing when the decision to
abort is made
...
However, it is not possible to abort subtransactions t1 ,
...

Instead, we execute a new subtransaction cti , called a compensating transaction, to
undo the effect of a subtransaction ti
...
compensating transaction cti
...
, ct1
...
5, which we have shown to be correct,
although not conﬂict serializable
...
Suppose that T2 fails just prior to termination, after T2,2 has released its locks
...

• Consider a database insert by transaction Ti that, as a side effect, causes a
B+-tree index to be updated
...
Other transactions may have read these nodes in
accessing data other than the record inserted by Ti
...
9, we can
undo the insertion by deleting the record inserted by Ti
...
Thus, deletion is a compensating action
for insertion
...

Transaction T has three subtransactions: Ti,1 , which makes airline reservations; Ti,2 , which reserves rental cars; and Ti,3 , which reserves a hotel room
...
Instead of undoing all of Ti ,
we compensate for the failure of Ti,3 by deleting the old hotel reservation and
making a new one
...
The techniques described in Section 17
...

Compensation for the failure of a transaction requires that the semantics of the
failed transaction be used
...
For more complex
transactions, the application programmers may have to deﬁne the correct form of
compensation at the time that the transaction is coded
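
One way to picture programmer-defined compensation is to pair each subtransaction with its compensating action and, on abort, to run the compensations for the committed subtransactions in reverse order. The Python sketch below does this under simplifying assumptions: `run_with_compensation` is a hypothetical helper, a raised exception stands for the failure of a subtransaction, and compensations are assumed never to fail.

```python
def run_with_compensation(steps):
    """steps: list of (name, action, compensation) triples.

    Returns (committed, log). If an action raises, the compensations of the
    steps already completed are run in reverse order.
    """
    done = []
    log = []
    for name, action, compensate in steps:
        try:
            action()
        except Exception:
            log.append("fail " + name)
            for done_name, done_compensate in reversed(done):
                done_compensate()
                log.append("compensate " + done_name)
            return False, log
        log.append("do " + name)
        done.append((name, compensate))
    return True, log


def demo_compensation():
    state = []

    def fail():
        raise RuntimeError("t3 failed")

    steps = [
        ("t1", lambda: state.append("t1"), lambda: state.remove("t1")),
        ("t2", lambda: state.append("t2"), lambda: state.remove("t2")),
        ("t3", fail, lambda: None),
    ]
    committed, log = run_with_compensation(steps)
    return committed, log, state
```

In the demonstration, t1 and t2 complete, t3 fails, and the effects of t2 and then t1 are compensated, leaving the state empty.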
...

24
...
5 Implementation Issues
The transaction concepts discussed in this section create serious difﬁculties for implementation
...

Long-duration transactions must survive system crashes
...
However, these actions solve only part of the problem
...
For a long-duration transaction to be resumed after a crash, these data must be restored
...

Logging of updates is made more complex when certain types of data items exist
in the database
...
Such data items are physically large
...

There are two approaches to reducing the overhead of ensuring the recoverability
of large data items:
• Operation logging
...
Operation logging is also called logical logging
...
We perform
undo using the inverse operation, and redo using the operation itself
...
Further, using logical logging for an operation that updates multiple pages is greatly complicated by the fact that some, but not all, of the
updated pages may have been written to the disk, so it is hard to apply either
the redo or the undo of the operation on the disk image during recovery
...
9, provides the concurrency beneﬁts of logical logging while avoiding
the above pitfalls
...
Logging is used for modiﬁcations to small data
items, but large data items are made recoverable via a shadow-page technique
(see Section 17
...
When we use shadowing, only those pages that are actually
modiﬁed need to be stored in duplicate
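
To make the operation-logging approach concrete, the sketch below logs operations rather than old and new values, then performs redo by reapplying each operation and undo by applying inverse operations in reverse order. The data item is modeled as a Python set, and the names `INVERSE`, `apply_op`, `redo`, and `undo` are illustrative. The sketch assumes that every logged delete removed an element that was actually present, since otherwise insert would not be a correct inverse.

```python
# Inverse of each logical operation.
INVERSE = {"insert": "delete", "delete": "insert"}


def apply_op(op, items, value):
    """Apply one logical operation to the data item (a set)."""
    if op == "insert":
        items.add(value)
    elif op == "delete":
        items.discard(value)
    else:
        raise ValueError("unknown operation: " + op)


def redo(log, items):
    # Redo uses the logged operation itself, in log order.
    for op, value in log:
        apply_op(op, items, value)


def undo(log, items):
    # Undo uses the inverse operation, in reverse log order.
    for op, value in reversed(log):
        apply_op(INVERSE[op], items, value)
```

Each log record here holds only an operation name and one argument, which is far smaller than before and after images of a large data item.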
...
Thus, it is desirable
to allow certain noncritical data to be exempt from logging, and to rely instead on
ofﬂine backups and human intervention
...
6 Transaction Management in Multidatabases
Recall from Section 19
...

A multidatabase system supports two types of transactions:
1
...
These transactions are executed by each local database
system outside of the multidatabase system’s control
...
Global transactions
...

24
...

Ensuring the local autonomy of each database system requires that no changes
be made to its software
...

Since the multidatabase system has no control over the execution of local transactions, each local system must use a concurrency-control scheme (for example, two-phase locking or timestamping) to ensure that its schedule is serializable
...

The guarantee of local serializability is not sufﬁcient to ensure global serializability
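
A small precedence-graph check illustrates why. In the Python sketch below, each site contributes its own precedence edges, which form an acyclic graph locally, yet the union of the edges contains a cycle. The transaction names, the edge lists, and the functions `has_cycle` and `globally_serializable` are all illustrative assumptions.

```python
def has_cycle(edges):
    """Return True if the directed graph given by (u, v) edges has a cycle."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set())
    state = {}  # node -> 1 while on the DFS stack, 2 when finished

    def visit(node):
        state[node] = 1
        for nxt in graph[node]:
            if state.get(nxt) == 1:
                return True
            if nxt not in state and visit(nxt):
                return True
        state[node] = 2
        return False

    return any(node not in state and visit(node) for node in graph)


def globally_serializable(site_edges):
    """site_edges: one list of precedence edges per local database system."""
    all_edges = [edge for edges in site_edges for edge in edges]
    return not has_cycle(all_edges)


# Global transactions T1 and T2; local transactions L1 (site 1) and L2 (site 2).
SITE_1 = [("T1", "L1"), ("L1", "T2")]   # locally: T1 before L1 before T2
SITE_2 = [("T2", "L2"), ("L2", "T1")]   # locally: T2 before L2 before T1
```

Each site's graph is acyclic, so each local schedule is serializable, but the combined graph contains the cycle T1, L1, T2, L2, T1. The conflict between T1 and T2 is indirect at both sites, passing through a local transaction that the multidatabase system cannot see.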
...

Suppose that the local schedules are serializable
...
Indeed, even if there is no concurrency among global
transactions (that is, a global transaction is submitted only after the previous one
commits or aborts), local serializability is not sufﬁcient to ensure global serializability (see Exercise 24
...

Depending on the implementation of the local database systems, a global transaction may not be able to control the precise locking behavior of its local subtransactions
...
For example, one local database system may commit its subtransaction and
release locks, while the subtransaction at another local system is still executing
...
If different local systems follow different concurrency-control mechanisms, however, this straightforward sort of global control does not
work
...
Some are based on imposing sufﬁcient conditions to ensure global serializability
...
We consider one of the latter schemes: two-level serializability
...
5
describes further approaches to consistency without serializability; other approaches
are cited in the bibliographical notes
...
If
all local systems follow the two-phase commit protocol, that protocol can be used
to achieve global atomicity
...
Even if a local system is capable of supporting two-phase commit, the organization owning the system
may be unwilling to permit waiting in cases where blocking occurs
...

Further discussion of these matters appears in the literature (see the bibliographical
notes)
...
6
...

• The multidatabase system ensures serializability among the global transactions alone—ignoring the orderings induced by local transactions
...
Local systems already offer
guarantees of serializability; thus, the ﬁrst requirement is easy to achieve
...
Thus, the multidatabase system can ensure the second requirement using standard concurrency-control techniques (the precise choice of technique
does not matter)
...

However, under the 2LSR-based approach, we adopt a requirement weaker than serializability, called strong correctness:
1
...
Guarantee that the set of data items read by each transaction is consistent
It can be shown that certain restrictions on transaction behavior, combined with 2LSR,
are sufﬁcient to ensure strong correctness (although not necessarily to ensure serializability)
...

In each of the protocols, we distinguish between local data and global data
...
Note
that there cannot be any consistency constraints between local data items at distinct
sites
...

The global-read protocol allows global transactions to read, but not to update,
local data items, while disallowing all access to global data by local transactions
...
Local transactions access only local data items
...
Global transactions may access global data items, and may read local data
items (although they must not write local data items)
...
There are no consistency constraints between local and global data items
...
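
The access rules of the global-read protocol listed above can be sketched as a small check. This is an illustrative sketch only: the function name and the string labels for transaction kind, item kind, and operation are assumptions made for the example, and the check covers only the access rules, not the requirement that there be no consistency constraints between local and global data.

```python
def allowed_global_read(txn_kind, item_kind, op):
    """Check one access against the global-read protocol's access rules.

    txn_kind:  "local" or "global"  (kind of transaction)
    item_kind: "local" or "global"  (kind of data item)
    op:        "read" or "write"
    """
    if op not in ("read", "write"):
        raise ValueError("op must be 'read' or 'write'")
    if item_kind not in ("local", "global"):
        raise ValueError("item_kind must be 'local' or 'global'")

    if txn_kind == "local":
        # Local transactions access only local data items.
        return item_kind == "local"
    if txn_kind == "global":
        # Global transactions may access global data items freely,
        # and may read (but not write) local data items.
        return item_kind == "global" or op == "read"
    raise ValueError("txn_kind must be 'local' or 'global'")


print(allowed_global_read("global", "local", "read"))
print(allowed_global_read("global", "local", "write"))
print(allowed_global_read("local", "global", "read"))
```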
In this protocol, we need to introduce the notion of a value dependency

...
A transaction has a value dependency
if the value that it writes to a data item at one site depends on a value that it read for
a data item on another site
...
Local transactions may access local data items, and may read global data items
stored at the site (although they must not write global data items)
...
Global transactions access only global data items
...
No transaction may have a value dependency
...
It allows global transactions to read
and write local data, and allows local transactions to read global data
...

The global-read–write/local-read protocol ensures strong correctness if all these
conditions hold:
1
...

2
...

3
...

4
...

24
...
2 Ensuring Global Serializability
Early multidatabase systems restricted global transactions to be read only
...
It is indeed possible to get such global schedules and to develop a scheme to ensure global serializability, and we ask you to do both in Exercise 24
...

There are a number of general schemes to ensure global serializability in an environment where update as well as read-only transactions can execute
...
A special data item called a ticket is created
in each local database system
...
This requirement ensures that global transactions
conﬂict directly at every site they visit
...
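
The ticket idea can be sketched as follows, assuming that every global transaction reads and then writes the ticket at each site it visits. The class and function names are hypothetical, and the model only records the order in which tickets were taken; it does not implement any local concurrency control.

```python
class Site:
    """A local database holding one ticket data item (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self.ticket = 0
        self.order = []  # global transactions, in the order they wrote the ticket

    def take_ticket(self, txn):
        # Read-then-write of the ticket: any two global transactions that
        # visit this site now conflict directly on this one data item.
        value = self.ticket
        self.ticket = value + 1
        self.order.append(txn)
        return value


def ticket_orders_consistent(sites):
    """Return True if the per-site ticket orders can be merged into a single
    total order (checked with a topological sort)."""
    succ, indeg = {}, {}
    for site in sites:
        for txn in site.order:
            succ.setdefault(txn, set())
            indeg.setdefault(txn, 0)
        # Adjacent pairs suffice because each site's order is total.
        for a, b in zip(site.order, site.order[1:]):
            if b not in succ[a]:
                succ[a].add(b)
                indeg[b] += 1

    ready = [t for t, d in indeg.items() if d == 0]
    seen = 0
    while ready:
        t = ready.pop()
        seen += 1
        for n in succ[t]:
            indeg[n] -= 1
            if indeg[n] == 0:
                ready.append(n)
    return seen == len(indeg)  # all transactions ordered means no cycle


s1, s2 = Site("site1"), Site("site2")
s1.take_ticket("T1")
s1.take_ticket("T2")
s2.take_ticket("T2")
s2.take_ticket("T1")
print(ticket_orders_consistent([s1, s2]))
```

Because the conflict on the ticket is direct, a disagreement between sites shows up in the ticket orders themselves, which is what lets a global scheduler detect it.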
References to such schemes appear in the
bibliographical notes
...
For example, if the local schedules
are such that the commit order and serialization order are always identical, we can
ensure serializability by controlling only the order in which transactions commit
...
They are particularly likely to do so because most transactions submit SQL statements to the underlying database system, instead of submitting individual read, write, commit, and abort steps
...
6
...

24
...
They exist not just in computer applications, but also in almost all organizational activities
...

• Although the usual ACID transactional requirements are too strong or are
unimplementable for such workﬂow applications, workﬂows must satisfy a
limited set of transactional properties that guarantee that a process is not left
in an inconsistent state
...

They have since evolved, and today they provide the infrastructure for building and administering complex transaction-processing systems that have a
large number of clients and multiple servers
...

• Large main memories are exploited in certain systems to achieve high system
throughput
...
Under the group-commit
concept, the number of outputs to stable storage can be reduced, thus releasing this bottleneck
...
Since the concurrency-control techniques used in Chapter 16 use waits,
aborts, or both, alternative techniques must be considered
...

• A long-duration transaction is represented as a nested transaction with atomic
database operations at the lowest level
...
Active long-duration transactions resume once

...
A compensating transaction
is needed to undo updates of nested transactions that have committed, if the
outer-level transaction fails
...
The wide variance
of execution times for read and write operations complicates the transaction-management problem for time-constrained systems
...

The local database systems may employ different logical models and data-definition and data-manipulation languages, and may differ in their concurrency-control and transaction-management mechanisms
...
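
The group-commit concept mentioned in the summary above can be illustrated with a small in-memory model. This is a sketch under stated assumptions: the class name, the fixed group size, and the list standing in for stable storage are all hypothetical, and a real system would typically also flush after a timeout rather than wait indefinitely for a full group.

```python
class GroupCommitLog:
    """Toy log that batches commit records before writing them out."""

    def __init__(self, group_size):
        self.group_size = group_size
        self.pending = []   # commit records still in the log buffer
        self.stable = []    # stands in for stable storage
        self.flushes = 0    # number of outputs to stable storage

    def commit(self, txn_id):
        # The transaction is not durable until its record has been flushed.
        self.pending.append(txn_id)
        if len(self.pending) >= self.group_size:
            self.flush()

    def flush(self):
        if self.pending:
            self.stable.extend(self.pending)  # one output for the whole group
            self.pending.clear()
            self.flushes += 1


log = GroupCommitLog(group_size=4)
for txn in range(8):
    log.commit(txn)
print(log.flushes)  # outputs needed for eight commits in this model
```

With a group size of one the same eight commits would need eight separate outputs; larger groups reduce the number of outputs at the cost of delaying individual commits.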

Review Terms
• TP monitor
• TP-monitor architectures
    - Process per client
    - Single server
    - Many server, single router
    - Many server, many router
• Multitasking
• Context switch
• Multithreaded server
• Queue manager
• Application coordination
    - Resource manager
    - Remote procedure call (RPC)
• Transactional workflows
    - Task
    - Processing entity
    - Workflow specification
    - Workflow execution
• Workflow state
    - Execution states
    - Output values
    - External variables
• Workflow failure atomicity
• Workflow termination states
    - Acceptable
    - Nonacceptable
    - Committed
    - Aborted
• Workflow recovery
• Workflow-management system
• Workflow-management system architectures
    - Centralized
    - Partially distributed
    - Fully distributed
• Main-memory databases
• Group commit
• Real-time systems
• Deadlines
    - Hard deadline
    - Firm deadline
    - Soft deadline
• Real-time databases
• Long-duration transactions
• Exposure of uncommitted data
• Subtasks
• Nonserializable executions
• Nested transactions
• Multilevel transactions
• Saga
• Compensating transactions
• Logical logging
• Multidatabase systems
• Autonomy
• Local transactions
• Global transactions
• Two-level serializability (2LSR)
• Strong correctness
• Local data
• Global data
• Protocols
    - Global-read
    - Local-read
    - Value dependency
    - Global-read–write/local-read
• Ensuring global serializability
• Ticket
Exercises
24
...

24
...

24
...

a
...

b
...

c
...

d
...

24
...
List three reasons why we cannot simply apply a relational
database system using 2PL, physical undo logging, and 2PC
...
5 If the entire database ﬁts in main memory, do we still need a database system
to manage the data? Explain your answer
...
6 Consider a main-memory database system recovering from a system crash
...
7 In the group-commit technique, how many transactions should be part of a
group? Explain your answer
...
8 Is a high-performance transaction system necessarily a real-time system? Why
or why not?
24
...

24
...

24
...
Different threads may run concurrently, attempting to
deliver different messages
...
Model the actions that each thread carries out as a multilevel transaction, so that locks on the queue need not be held till a message is
delivered
...
12 Discuss the modiﬁcations that need to be made in each of the recovery schemes
covered in Chapter 17 if we allow nested transactions
...

24
...

24
...

a
...

b
...

24
...

a
...

b
...

Bibliographical Notes
Gray and Edwards [1995] provides an overview of TP monitor architectures; Gray
and Reuter [1993] provides a detailed (and excellent) textbook description of transaction-processing systems, including chapters on TP monitors
...
X/Open [1991] deﬁnes the X/Open XA
interface
...
Wipﬂer
[1987] is one of several texts on application development using CICS
...
A reference model for workﬂows, proposed by the Workﬂow Management Coalition, is presented in Hollinsworth

© The McGraw−Hill
Companies, 2001

[1994]
...
wfmc
...
Our description of workﬂows
follows the model of Rusinkiewicz and Sheth [1995]
...
Some issues related to workﬂows were addressed in the work
on long-running activities described by Dayal et al
...
[1991]
...
Jin et al
...

Garcia-Molina and Salem [1992] provides an overview of main-memory databases
...
[1993] describes a recovery algorithm designed for main-memory databases
...

[1994]
...
[1990]
...
[1982] describes a real-time database system used in a telecommunications switching system
...
[1990b] and
Soparkar et al
...
Concurrency control and scheduling in real-time databases are
discussed by Haritsa et al
...
[1993], and Pang et al
...
Ozsoyoglu
and Snodgrass [1995] is a survey of research in real-time and temporal databases
...
[1990b], Fekete et al
...
[1988]
...
[1988] and Weihl and Liskov [1990]
...
[1992] and Rothermel
and Mohan [1989]), and the NT/PV model (Korth and Speegle [1994])
...

[1995]
...
[1989]
...
[1988]
...
Multilevel transaction
management is discussed in Weikum [1991]
...
Transaction processing for
long-duration transactions is considered by Weikum and Schek [1984], Haerder and
Rothermel [1987], Weikum et al
...
[1990a]
...
[1994]
presents an extension of 2PL for long-duration transactions by allowing the early
release of locks under certain circumstances
...
[1988], Kaiser [1990],
and Weikum [1991]
...

[1990], Breitbart et al
...
[1992], Soparkar et al
...
[1992b] and Mehrotra et al
...
The ticket scheme is presented in Georgakopoulos et al
...
2LSR is introduced in Mehrotra et al
...
An earlier approach, called quasi-serializability, is presented in Du and Elmagarmid [1989]
