Title: plsql
Description: My notes are about the PL/SQL programs that we do in database management systems.
...
Computer
Science
Volume 1
Silberschatz−Korth−Sudarshan • Database System Concepts, Fourth Edition
Front Matter
Preface
...
Data Models
Introduction
...
Relational Model
...
SQL
...
Integrity and Security
...
Object−Based Databases and XML
Introduction
...
Object−Relational Databases
...
Data Storage and Querying
Introduction
...
Indexing and Hashing
...
Query Optimization
...
Transactions
...
Recovery System
...
Database System Architecture
...
Parallel Databases
...
Application Development and Administration
...
Advanced Data Types and New Applications
...
In this text, we present the fundamental concepts of database management
...
This text is intended for a first course in databases at the junior or senior undergraduate, or first-year graduate, level
...
We assume only a familiarity with basic data structures, computer organization,
and a high-level programming language such as Java, C, or Pascal
...
Important theoretical results are covered, but formal proofs are
omitted
...
In place of proofs, figures and examples are used to suggest why a result is
true
...
Our aim is
to present these concepts and algorithms in a general setting that is not tied to one
particular database system
...
In this fourth edition of Database System Concepts, we have retained the overall style
of the first three editions, while addressing the evolution of database management
...
Every chapter has
been edited, and most have been modified extensively
...
Silberschatz−Korth−Sudarshan:
Database System
Concepts, Fourth Edition
© The McGraw−Hill
Companies, 2001
Preface
Organization
The text is organized in eight major parts, plus three appendices:
• Overview (Chapter 1)
...
We explain how the concept of a database
system has developed, what the common features of database systems are,
what a database system does for the user, and how a database system interfaces with operating systems
...
This example
is used as a running example throughout the book
...
• Data models (Chapters 2 and 3)
...
This model provides a high-level view of the issues in database design,
and of the problems that we encounter in capturing the semantics of realistic
applications within the constraints of a data model
...
• Relational databases (Chapters 4 through 7)
...
Chapter 5 covers
two other relational languages, QBE and Datalog
...
Algorithms
and design issues are deferred to later chapters
...
Chapter 6 presents constraints from the standpoint of database integrity
and security; Chapter 7 shows how constraints can be used in the design of
a relational database
...
The theme of this chapter is the protection of the database
from accidental and intentional damage
...
The theory
of functional dependencies and normalization is covered, with emphasis on
the motivation and intuitive understanding of each normal form
...
• Object-based databases and XML (Chapters 8 through 10)
...
It introduces the concepts of object-oriented programming, and shows how these concepts form the basis for a data model
...
Chapter 9 covers object-relational databases, and shows how the SQL:1999 standard extends
the relational data model to include object-oriented features, such as inheritance, complex types, and object identity
...
The chapter also describes query languages for XML
...
Chapter 11 deals with
disk, file, and file-system structure, and with the mapping of relational and
object data to a file system
...
Chapters 13 and 14 address query-evaluation algorithms, and query optimization
based on equivalence-preserving query transformations
...
• Transaction management (Chapters 15 through 17)
...
Chapter 16 focuses on concurrency control and presents several techniques
for ensuring serializability, including locking, timestamping, and optimistic
(validation) techniques
...
Chapter 17
covers the primary techniques for ensuring correct transaction execution despite system crashes and disk failures
...
• Database system architecture (Chapters 18 through 20)
...
We discuss centralized systems,
client-server systems, parallel and distributed architectures, and network
types in this chapter
...
The chapter also covers issues of system availability during failures and describes the
LDAP directory system
...
The chapter also describes
parallel-system design
...
Chapter 21 covers database application development and administration
...
Chapter 22 covers querying techniques, including decision support systems, and information retrieval
...
The chapter also describes information retrieval techniques for
querying textual data, including hyperlink-based techniques used in Web
search engines
...
Finally, Chapter 24 deals with
advanced transaction processing
...
• Case studies (Chapters 25 through 27)
...
These chapters outline unique features of each of these
products, and describe their internal structure
...
They also cover several interesting practical aspects in the design of
real systems
...
Although most new database applications use either the
relational model or the object-oriented model, the network and hierarchical
data models are still in use
...
bell-labs
...
Appendix C describes advanced relational database design, including the
theory of multivalued dependencies, join dependencies, and the project-join
and domain-key normal forms
...
This appendix, too, is available
only online, on the Web page of the book
...
Our basic procedure was to rewrite the material in each chapter, bringing the older
material up to date, adding discussions on recent developments in database technology, and improving descriptions of topics that students found difficult to understand
...
We have also added a tools section at the end of most chapters, which provides information on software tools related to the topic of the chapter
...
We have added a new chapter covering XML, and three case study chapters covering the leading commercial database systems, including Oracle, IBM DB2, and Microsoft SQL Server
...
For the benefit of those readers familiar with the third edition,
we explain the main changes here:
• Entity-relationship model
...
More examples have been added, and some changed,
to give better intuition to the reader
...
• Relational databases
...
SQL coverage has been significantly expanded to include the with clause, expanded coverage of embedded SQL, and coverage of ODBC and JDBC, whose
usage has increased greatly in the past few years
...
Coverage of QBE
has been revised to remove some ambiguities and to add coverage of the QBE
version used in the Microsoft Access database
...
Coverage of security has been moved to Chapter 6 from its third-edition position of Chapter 19
...
Chapter 7 covers relational-database
design and normal forms
...
Chapter
7 has been significantly rewritten, providing several short-cut algorithms for
dealing with functional dependencies and extended coverage of the overall
database design process
...
• Object-based databases
...
Object-relational coverage in
Chapter 9 has been updated, and in particular the SQL:1999 standard replaces
the extended SQL used in the third edition
...
Chapter 10, covering XML, is a new chapter in the fourth edition
...
Coverage of storage and file structures, in Chapter 11, has been updated; this chapter was Chapter 10 in the
third edition
...
Coverage of RAID has been updated to reflect technology trends
...
Chapter 12, on indexing, now includes coverage of bitmap indices; this
chapter was Chapter 11 in the third edition
...
Partitioned hashing has been dropped, since it is not in significant use
...
All
details regarding cost estimation and query optimization have been moved
to Chapter 14, allowing Chapter 13 to concentrate on query processing algorithms
...
Chapter 14
now has pseudocode for optimization algorithms, and new sections on optimization of nested subqueries and on materialized views
...
Chapter 15, which provides an introduction to transactions, has been updated; this chapter was numbered Chapter 13 in the third
edition
...
Chapter 16, on concurrency control, includes a new section on implementation of lock managers, and a section on weak levels of consistency, which
was in Chapter 20 of the third edition
...
Chapter 17, on recovery, now includes coverage of the ARIES
recovery algorithm
...
As in the third edition, instructors can choose between just introducing
transaction-processing concepts (by covering only Chapter 15), or offering detailed coverage (based on Chapters 15 through 17)
...
Chapter 18, which provides an overview of
database system architectures, has been updated to cover current technology;
this was Chapter 16 in the third edition
...
While the coverage of parallel database query processing techniques in Chapter 20
(which was Chapter 16 in the third edition) is mainly of interest to those who
wish to learn about database internals, distributed databases, now covered in
Chapter 19, is a topic that is more fundamental; it is one that anyone dealing
with databases should be familiar with
...
Coverage of three-phase commit protocol has been abbreviated, as has distributed detection of global deadlocks, since neither is
used much in practice
...
There is
a new section on directory systems, in particular LDAP, since these are quite
widely used as a mechanism for making information available in a distributed
setting
...
Although we have modified and updated the entire text, we
concentrated our presentation of material pertaining to ongoing database research and new database applications in four new chapters, from Chapter 21
to Chapter 24
...
The description of how to build Web interfaces to
databases, including servlets and other mechanisms for server-side scripting,
is new
...
Coverage of materialized view selection is also new
...
There is a new section on e-commerce, focusing on database issues in e-commerce, and a new
section on dealing with legacy systems
...
Coverage of data warehousing and data mining has also been extended greatly
...
Earlier versions of this material were in Chapter 21 of the third edition
...
This material is an updated version of material that was in Chapter 21
of the third edition
...
• Case studies
...
These chapters outline unique features
of each of these products, and describe their internal structure
...
We have marked several sections as advanced, using the symbol
“∗∗”
...
It is possible to design courses by using various subsets of the chapters
...
• If object orientation is to be covered in a separate advanced course, Chapters
8 and 9, and Section 11
...
Alternatively, they could constitute
the foundation of an advanced course in object databases
...
• Both our coverage of transaction processing (Chapters 15 through 17) and our
coverage of database-system architecture (Chapters 18 through 20) consist of
an overview chapter (Chapters 15 and 18, respectively), followed by chapters with details
...
• Chapters 21 through 24 are suitable for an advanced course or for self-study
by students, although Section 21
...
Model course syllabi, based on the text, can be found on the Web home page of the
book (see the following section)
...
bell-labs
...
For more information about how to get a copy of the solution manual, please send electronic mail to
customer
...
com
...
The McGraw-Hill Web page for this book is
http://www
...
com/silberschatz
Contacting Us and Other Users
We provide a mailing list through which users of our book can communicate among
themselves and with us
...
bell-labs
...
We have endeavored to eliminate typos, bugs, and the like from the text
...
We would appreciate it if you would notify us of any
errors or omissions in the book that are not on the current list of errata
...
We also
welcome any contributions to the book Web page that could be of use to other readers, such as programming exercises, project suggestions, online labs and tutorials,
and teaching tips
...
bell-labs
...
Any other correspondence should be sent to Avi Silberschatz, Bell Laboratories, Room 2T-310, 600
Mountain Avenue, Murray Hill, NJ 07974, USA
...
In addition, many people have
written or spoken to us about the book, and have offered suggestions and comments
...
Gurari, The Ohio State
University; Irwin Levinstein, Old Dominion University; Ling Liu, Georgia Institute of Technology; Ami Motro, George Mason University; Bhagirath Narahari, Meral Ozsoyoglu, Case Western Reserve University; and Odinaldo Rodriguez, King’s College London; who served as reviewers of the book and
whose comments helped us greatly in formulating this fourth edition
...
L
...
• Phil Bohannon, for writing the first draft of Chapter 10 describing XML
...
Blakeley, Kalen Delaney, Michael Rys, Michael
Zwilling, Sameet Agarwal, Thomas Casey (all of Microsoft) for writing the
appendices describing the Oracle, IBM DB2, and Microsoft SQL Server database
systems
...
• Marilyn Turnamian and Nandprasad Joshi, whose excellent secretarial assistance was essential for timely completion of this fourth edition
...
The senior developmental editor was Kelley
Butcher
...
The executive marketing manager was
John Wannemacher
...
The freelance copyeditor was George Watson
...
The supplement producer was Jodi Banowetz
...
The freelance indexer was Tobiah Waldron
...
B
...
Edwards,
Christos Faloutsos, Homma Farian, Alan Fekete, Shashi Gadia, Jim Gray, Le Gruenwald,
Ron Hitchens, Yannis Ioannidis, Hyoung-Joo Kim, Won Kim, Henry Korth (father of Henry F
...
V
...
Seshadri, Shashi Shekhar, Amit Sheth, Nandit Soparkar, Greg Speegle, and Marianne Winslett
...
Greg Speegle, Dawn Bezviner, and K
...
Raghavan helped
us to prepare the instructor’s manual for earlier editions
...
The idea of using ships as part of the cover
concept was originally suggested to us by Bruce Stephan
...
Hank
would like to acknowledge his wife, Joan, and his children, Abby and Joe, for their
love and understanding
...
A
...
H
...
K
...
S
...
CHAPTER 1
Introduction
A database-management system (DBMS) is a collection of interrelated data and a
set of programs to access those data
...
The primary goal of a DBMS
is to provide a way to store and retrieve database information that is both convenient
and efficient
...
Management of data involves both defining structures for storage of information and providing mechanisms for the manipulation of information
...
If data are to be shared among several users, the
system must avoid possible anomalous results
...
These
concepts and techniques form the focus of this book
...
1
...
Here are some representative applications:
• Banking: For customer information, accounts, and loans, and banking transactions
...
Airlines were among the
first to use databases in a geographically distributed manner — terminals situated around the world accessed the central database system through phone
lines and other data networks
...
...
• Telecommunication: For keeping records of calls made, generating monthly bills,
maintaining balances on prepaid calling cards, and storing information about
the communication networks
...
• Sales: For customer, product, and purchase information
...
• Human resources: For information about employees, salaries, payroll taxes and
benefits, and for generation of paychecks
...
Over the course of the last four decades of the twentieth century, use of databases
grew in all enterprises
...
Then automated teller machines
came along and let users interact directly with databases
...
The internet revolution of the late 1990s sharply increased direct user access to
databases
...
For
instance, when you access an online bookstore and browse a book or music collection, you are accessing data stored in a database
...
When you access a bank Web site and retrieve
your bank balance and transaction information, the information is retrieved from the
bank’s database system
...
Furthermore, data about your Web accesses may be stored in a database
...
The importance of database systems can be judged in another way — today, database system vendors like Oracle are among the largest software companies in the
world, and database systems form an important part of the product line of more
diversified companies like Microsoft and IBM
...
1
...
2 Database Systems versus File Systems
Consider part of a savings-bank enterprise that keeps information about all customers and savings accounts
...
To allow users to manipulate the information, the
system has a number of application programs that manipulate the files, including
• A program to debit or credit an account
• A program to add a new account
• A program to find the balance of an account
• A program to generate monthly statements
System programmers wrote these application programs to meet the needs of the
bank
...
For example, suppose that the savings bank decides to offer checking accounts
...
Thus,
as time goes by, the system acquires more files and more application programs
...
The system stores permanent records in various files, and it needs different
application programs to extract records from, and add records to, the appropriate
files
...
Keeping organizational information in a file-processing system has a number of
major disadvantages:
• Data redundancy and inconsistency
...
Moreover, the same information may be duplicated in
several places (files)
...
This redundancy leads
to higher storage and access cost
...
For
example, a changed customer address may be reflected in savings-account
records but not elsewhere in the system
...
Suppose that one of the bank officers needs to
find out the names of all customers who live within a particular postal-code
area
...
Because the designers of the original system did not anticipate this request,
there is no application program on hand to meet it
...
The bank officer has
...
Both alternatives are obviously unsatisfactory
...
As expected, a program to generate such a list does
not exist
...
The point here is that conventional file-processing environments do not allow needed data to be retrieved in a convenient and efficient manner
...
• Data isolation
...
• Integrity problems
...
For example, the balance of a bank account may never fall below a prescribed amount (say, $25)
...
However, when new constraints are added, it is difficult
to change the programs to enforce them
...
• Atomicity problems
...
In many applications, it is crucial that, if a
failure occurs, the data be restored to the consistent state that existed prior to
the failure
...
If a system failure occurs during the execution of the program, it is possible
that the $50 was removed from account A but was not credited to account B,
resulting in an inconsistent database state
...
That is, the funds transfer must be atomic — it must happen in its entirety or
not at all
...
• Concurrent-access anomalies
...
In such an environment, interaction of concurrent updates may result in inconsistent data
...
If two customers withdraw funds (say $50 and $100 respectively) from
account A at about the same time, the result of the concurrent executions may
leave the account in an incorrect (or inconsistent) state
...
If the
two programs run concurrently, they may both read the value $500, and write
back $450 and $400, respectively
...
...
To guard against this possibility, the system must maintain some form
of supervision
...
• Security problems
...
For example, in a banking system, payroll personnel need
to see only that part of the database that has information about the various
bank employees
...
But, since application programs are added to the system in an ad hoc
manner, enforcing such security constraints is difficult
...
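The interleaving described under concurrent-access anomalies can be replayed step by step. The sketch below is a hypothetical Python illustration, not code from the book: the function names are invented, and it assumes both withdrawal programs read the balance before either one writes it back.

```python
def interleaved_withdrawals(balance, first, second):
    """Replay two withdrawals that both read before either writes.

    Returns the value each program writes back; whichever write lands
    last becomes the stored balance.
    """
    read_by_first = balance                  # first program reads the balance
    read_by_second = balance                 # second program reads the same value
    write_first = read_by_first - first      # value the first program writes back
    write_second = read_by_second - second   # value the second program writes back
    return write_first, write_second


def serial_withdrawals(balance, first, second):
    """Run the withdrawals one after the other, with no interleaving."""
    return balance - first - second


candidates = interleaved_withdrawals(500, 50, 100)
correct = serial_withdrawals(500, 50, 100)
print(candidates, correct)
```

Neither value in `candidates` equals `correct`, so one withdrawal is lost whichever write happens last.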
In what follows, we shall see the concepts and algorithms that enable database systems to solve the problems with file-processing systems
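As one illustration of how a database system addresses the atomicity problem, the funds transfer can be wrapped in a transaction. This is a minimal sketch using Python's standard sqlite3 module, not the book's code; the table layout, the `transfer` and `balances` helpers, the starting balances, and the simulated failure are all assumptions made for the demonstration.

```python
import sqlite3


def transfer(conn, source, target, amount, fail_midway=False):
    """Move amount between accounts as one all-or-nothing unit."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE account_number = ?",
                (amount, source),
            )
            if fail_midway:
                raise RuntimeError("simulated system failure")
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE account_number = ?",
                (amount, target),
            )
    except RuntimeError:
        pass  # the debit has already been rolled back by the with-block


def balances(conn):
    """Return the current balances as {account_number: balance}."""
    return dict(conn.execute("SELECT account_number, balance FROM account"))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (account_number TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 500), ("B", 200)])
conn.commit()

transfer(conn, "A", "B", 50, fail_midway=True)
after_failure = balances(conn)   # neither step took effect

transfer(conn, "A", "B", 50)
after_success = balances(conn)   # both steps took effect
print(after_failure, after_success)
```

The failed transfer leaves both balances untouched, which is the "in its entirety or not at all" behavior the text asks for.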
...
1
...
A major purpose of a database system is to
provide users with an abstract view of the data
...
1
...
1 Data Abstraction
For the system to be usable, it must retrieve data efficiently
...
Since many database-systems users are not computer trained, developers hide the
complexity from users through several levels of abstraction, to simplify users’ interactions with the system:
• Physical level
...
The physical level describes complex low-level data structures in
detail
...
The next-higher level of abstraction describes what data are
stored in the database, and what relationships exist among those data
...
Although implementation of the simple structures at the logical level may involve complex physical-level structures, the
user of the logical level does not need to be aware of this complexity
...
...
The highest level of abstraction describes only part of the entire
database
...
Many
users of the database system do not need all this information; instead, they
need to access only a part of the database
...
The system may provide many
views for the same database
...
1 shows the relationship among the three levels of abstraction
...
Most high-level programming languages
support the notion of a record type
...
Each field has
a name and a type associated with it
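As a rough illustration of a record type, the sketch below declares one in Python. The field names follow the customer attributes mentioned elsewhere in these notes; the choice of Python, the class name, and the string types are assumptions made for this example.

```python
from dataclasses import dataclass, fields


@dataclass
class Customer:
    """A record type: each field has a name and a type."""
    customer_id: str
    customer_name: str
    customer_street: str
    customer_city: str


# The declaration fixes the field names; the values differ from record to record.
declared = [field.name for field in fields(Customer)]
print(declared)
```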
...
[Figure 1 ... (caption elided): the view level (view 1, view 2, …, view n) above the logical level, above the physical level]
The language compiler hides this level of detail from programmers
...
Database
administrators, on the other hand, may be aware of certain details of the physical
organization of the data
...
Programmers using a programming language work at this level of abstraction
...
Finally, at the view level, computer users see a set of application programs that
hide details of the data types
...
In addition to hiding details of the
logical level of the database, the views also provide a security mechanism to prevent
users from accessing certain parts of the database
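A view that exposes only part of a table can be sketched as follows. This is an assumption-laden illustration using Python's sqlite3 module: the table, the view name `account_list`, and the sample rows are hypothetical, and SQLite itself has no per-user access control, so the example only shows the restricted shape of the data, not enforcement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (account_number TEXT PRIMARY KEY, balance INTEGER)"
)
conn.executemany(
    "INSERT INTO account VALUES (?, ?)", [("A-101", 500), ("A-201", 900)]
)

# A view that exposes account numbers but hides balances.
conn.execute("CREATE VIEW account_list AS SELECT account_number FROM account")

visible_columns = [
    column[0] for column in conn.execute("SELECT * FROM account_list").description
]
visible_rows = conn.execute(
    "SELECT * FROM account_list ORDER BY account_number"
).fetchall()
print(visible_columns, visible_rows)
```

In a system with access control, users would be given access to the view rather than to the underlying table.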
...
1
...
2 Instances and Schemas
Databases change over time as information is inserted and deleted
...
The overall design of the database is called the database schema
...
The concept of database schemas and instances can be understood by analogy to
a program written in a programming language
...
Each
variable has a particular value at a given instant
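The schema and instance distinction can be made concrete with a small sketch. It uses Python's sqlite3 module; the `schema` and `instance` helper names and the sample row are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (account_number TEXT, balance INTEGER)")


def schema(connection):
    """Column names of the account table: the design, which changes rarely."""
    return [row[1] for row in connection.execute("PRAGMA table_info(account)")]


def instance(connection):
    """Rows stored at this moment: the instance, which changes with each update."""
    return connection.execute(
        "SELECT account_number, balance FROM account ORDER BY account_number"
    ).fetchall()


schema_before, instance_before = schema(conn), instance(conn)
conn.execute("INSERT INTO account VALUES ('A-101', 500)")
schema_after, instance_after = schema(conn), instance(conn)
print(schema_before == schema_after, instance_before, instance_after)
```

The insert changes the instance while the schema stays the same, much as assigning to a variable changes its value but not its declaration.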
...
Database systems have several schemas, partitioned according to the levels of abstraction
...
A database
may also have several schemas at the view level, sometimes called subschemas, that
describe different views of the database
...
The physical schema is hidden beneath the logical schema, and can usually
be changed easily without affecting application programs
...
We study languages for describing schemas, after introducing the notion of data
models in the next section
...
4 Data Models
Underlying the structure of a database is the data model: a collection of conceptual
tools for describing data, data relationships, data semantics, and consistency constraints
...
section: the entity-relationship model and the relational model
...
1
...
1 The Entity-Relationship Model
The entity-relationship (E-R) data model is based on a perception of a real world that
consists of a collection of basic objects, called entities, and of relationships among these
objects
...
For example, each person is an entity, and bank accounts can be
considered as entities
...
For example, the attributes account-number and balance may describe one particular account in a bank,
and they form attributes of the account entity set
...
An extra attribute customer-id is used to uniquely identify customers (since it may
be possible to have two customers with the same name, street address, and city)
...
In the United States,
many enterprises use the social-security number of a person (a unique number the
U
...
government assigns to every person in the United States) as a customer
identifier
...
For example, a depositor
relationship associates a customer with each account that she has
...
The overall logical structure (schema) of a database can be expressed graphically
by an E-R diagram, which is built up from the following components:
• Rectangles, which represent entity sets
• Ellipses, which represent attributes
• Diamonds, which represent relationships among entity sets
• Lines, which link attributes to entity sets and entity sets to relationships
Each component is labeled with the entity or relationship that it represents
...
Figure 1
...
The E-R diagram indicates that there are two entity sets,
customer and account, with attributes as outlined earlier
...
In addition to entities and relationships, the E-R model represents certain constraints to which the contents of a database must conform
...
For example, if each account must belong
to only one customer, the E-R model can express that constraint
...
[Figure 1 ... (caption elided): E-R diagram; surviving labels: customer-id, customer-name, customer-street, customer-city, account-number, balance, customer]
1
...
2 Relational Model
The relational model uses a collection of tables to represent both data and the relationships among those data
...
Figure 1
...
The first table, the customer table, shows, for example, that the customer identified
by customer-id 192-83-7465 is named Johnson and lives at 12 Alma St
...
The second table, account, shows, for example, that account A-101 has a balance of
$500, and A-201 has a balance of $900
...
For example,
account number A-101 belongs to the customer whose customer-id is 192-83-7465,
namely Johnson, and customers 192-83-7465 (Johnson) and 019-28-3746 (Smith) share
account number A-201 (they may share a business venture)
...
Record-based models are so named because the database is structured in fixed-format records of several
types
...
Each record type defines a
fixed number of fields, or attributes
...
It is not hard to see how tables may be stored in files
...
The relational model hides such low-level implementation details
from database developers and users
...
Chapters 3 through 7
cover the relational model in detail
...
Database
designs are often carried out in the E-R model, and then translated to the relational
model; Chapter 2 describes the translation process
...
We also note that it is possible to create schemas in the relational model that have
problems such as unnecessarily duplicated information
...
customer-id    customer-name
192-83-7465    Johnson
019-28-3746    Smith
677-89-9011    Hayes
182-73-6091    Turner
321-12-3123    Jones
336-66-9999    Lindsay
019-28-3746    Smith
customer-street
12 Alma St
...
3 Main St
...
100 Main St
...
72 North St
...
3
A sample relational database
...
Then, to represent the fact
that accounts A-101 and A-201 both belong to customer Johnson (with customer-id
192-83-7465), we would need to store two rows in the customer table
...
In Chapter 7, we shall study how to distinguish
good schema designs from bad schema designs
...
4
...
The object-oriented model can be seen as extending the E-R model with notions
of encapsulation, methods (functions), and object identity
...
The object-relational data model combines features of the object-oriented data
model and relational data model
...
Semistructured data models permit the specification of data where individual data
items of the same type may have different sets of attributes
...
The extensible markup language (XML) is widely
used to represent semistructured data
...
Historically, two other data models, the network data model and the hierarchical
data model, preceded the relational data model
...
As a
result they are little used now, except in old database code that is still in service in
some places
...
1
...
In
practice, the data definition and data manipulation languages are not two separate
languages; instead they simply form parts of a single database language, such as the
widely used SQL language
...
5
...
For instance, the following statement in the SQL language defines the account table:
create table account
(account-number char(10),
balance integer)
Execution of the above DDL statement creates the account table
...
A data dictionary contains metadata — that is, data about data
...
A database system consults the data dictionary before
reading or modifying actual data
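To see a data dictionary in practice, the sketch below runs a variant of the DDL statement in SQLite and then reads back the metadata the system recorded. Two assumptions apply: the column is renamed `account_number`, because SQLite would parse the hyphen as a minus sign, and `sqlite_master` is SQLite's own catalog table, which other systems name and structure differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL statement: defines the table and, as a side effect, updates the catalog.
conn.execute("CREATE TABLE account (account_number CHAR(10), balance INTEGER)")

# SQLite stores schema metadata (data about data) in the sqlite_master table.
catalog = conn.execute(
    "SELECT type, name, sql FROM sqlite_master WHERE name = 'account'"
).fetchall()

for object_type, object_name, definition in catalog:
    print(object_type, object_name, definition)
```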
...
These statements define the implementation details of the database schemas,
which are usually hidden from the users
...
For example, suppose the balance on an account should not fall below $100
...
The database systems check these constraints every time the database is updated
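The balance constraint can be sketched as a declared check that the system enforces on every update. This example uses Python's sqlite3 module; the table definition, the account number, and the amounts other than the $100 floor are assumptions chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " account_number CHAR(10) PRIMARY KEY,"
    " balance INTEGER CHECK (balance >= 100))"
)
conn.execute("INSERT INTO account VALUES ('A-101', 500)")
conn.commit()

try:
    # This update would leave the balance at 50, violating the constraint.
    conn.execute(
        "UPDATE account SET balance = balance - 450 WHERE account_number = 'A-101'"
    )
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

balance = conn.execute("SELECT balance FROM account").fetchone()[0]
print(rejected, balance)
```

Because the constraint lives in the schema, no application program has to remember to test for it.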
...
1
...
2 Data-Manipulation Language
Data manipulation is
• The retrieval of information stored in the database
• The insertion of new information into the database
• The deletion of information from the database
• The modification of information stored in the database
A data-manipulation language (DML) is a language that enables users to access
or manipulate data as organized by the appropriate data model
...
• Declarative DMLs (also referred to as nonprocedural DMLs) require a user to
specify what data are needed without specifying how to get those data
...
However, since a user does not have to specify how to get the data, the database
system has to figure out an efficient means of accessing data
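The contrast between saying how to get data and saying what data are needed can be sketched side by side. The code below is a hypothetical Python illustration: the function names are invented, the rows are a subset of the sample customer data, and sqlite3 stands in for a database system.

```python
import sqlite3

rows = [
    ("192-83-7465", "Johnson"),
    ("019-28-3746", "Smith"),
    ("677-89-9011", "Hayes"),
]


def procedural_lookup(records, wanted_id):
    """Procedural style: the caller spells out how to scan the records."""
    names = []
    for customer_id, customer_name in records:
        if customer_id == wanted_id:
            names.append(customer_name)
    return names


def declarative_lookup(records, wanted_id):
    """Declarative style: state what is wanted; the system picks the access path."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (customer_id TEXT, customer_name TEXT)")
    conn.executemany("INSERT INTO customer VALUES (?, ?)", records)
    found = conn.execute(
        "SELECT customer_name FROM customer WHERE customer_id = ?", (wanted_id,)
    ).fetchall()
    conn.close()
    return [name for (name,) in found]


procedural = procedural_lookup(rows, "192-83-7465")
declarative = declarative_lookup(rows, "192-83-7465")
print(procedural, declarative)
```

Both calls return the same answer; only the first one commits the caller to a particular way of finding it.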
...
A query is a statement requesting the retrieval of information
...
Although technically incorrect, it is common practice to use the terms query language and data-manipulation language synonymously
...
customer-name
from customer
where customer
...
If the query were run on the table in Figure 1
...
Queries may involve information from more than one table
...
select account
...
customer-id = 192-83-7465 and
depositor
...
account-number
If the above query were run on the tables in Figure 1
...
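Since the query text above is partly elided, the following is a reconstruction under assumptions rather than the book's exact statement. It loads the account and depositor facts stated earlier in these notes into SQLite (with underscores in place of hyphens in the column names) and asks for the balances of all accounts owned by customer 192-83-7465.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE account (account_number TEXT PRIMARY KEY, balance INTEGER);
    CREATE TABLE depositor (customer_id TEXT, account_number TEXT);
    INSERT INTO account VALUES ('A-101', 500), ('A-201', 900);
    INSERT INTO depositor VALUES
        ('192-83-7465', 'A-101'),
        ('192-83-7465', 'A-201'),
        ('019-28-3746', 'A-201');
    """
)

# Match depositor rows to account rows on account_number, keeping only
# the rows that belong to the requested customer.
owned = conn.execute(
    """
    SELECT account.account_number, account.balance
    FROM depositor, account
    WHERE depositor.customer_id = ?
      AND depositor.account_number = account.account_number
    ORDER BY account.account_number
    """,
    ("192-83-7465",),
).fetchall()
print(owned)
```

The query draws on two tables at once: depositor says which accounts the customer holds, and account supplies their balances.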
There are a number of database query languages in use, either commercially or
experimentally
...
We also study some other query languages in Chapter 5
...
3 apply not only to defining
or structuring data, but also to manipulating data
...
At higher levels of abstraction,
we emphasize ease of use
...
The query processor component of the database system (which we study in
Chapters 13 and 14) translates DML queries into sequences of actions at the physical
level of the database system
...
5
...
Application programs are usually written in a host language, such as Cobol, C, C++, or
Java
...
To access the database, DML statements need to be executed from the host language
...
The Open Database Connectivity (ODBC) standard defined by Microsoft
for use with the C language is a commonly used application program interface standard
...
• By extending the host language syntax to embed DML calls within the host
language program
...
1
...
People who work with a database can be categorized as
database users or database administrators
...
6
...
Different types of user interfaces have been
designed for the different types of users
...
• Naive users are unsophisticated users who interact with the system by invoking one of the application programs that have been written previously
...
This program asks the teller for the amount
of money to be transferred, the account from which the money is to be transferred, and the account to which the money is to be transferred
...
Such a user may access a form, where she
enters her account number
...
The typical user interface for naive users is a forms interface, where the
user can fill in appropriate fields of the form
...
• Application programmers are computer professionals who write application
programs
...
Rapid application development (RAD) tools are tools that enable an application programmer to construct forms and reports without writing a program
...
These languages, sometimes called fourth-generation languages, often
include special features to facilitate the generation of forms and the display of
data on the screen
...
• Sophisticated users interact with the system without writing programs
...
They submit
each such query to a query processor, whose function is to break down DML
statements into instructions that the storage manager understands
...
Online analytical processing (OLAP) tools simplify analysts’ tasks by letting them view summaries of data in different ways
...
The tools also permit the analyst to select specific regions, look at data in more detail (for example, sales by city within a region)
or look at the data in less detail (for example, aggregate products together by
category)
...
We study OLAP tools and data mining in Chapter 22
...
Among these applications are computer-aided design systems, knowledge-base
and expert systems, systems that store data with complex data types (for
example, graphics data and audio data), and environment-modeling systems
...
1
...
2 Database Administrator
One of the main reasons for using DBMSs is to have central control of both the data
and the programs that access those data
...
The functions of a DBA include:
• Schema definition
...
• Storage structure and access-method definition
...
The DBA carries out changes to the schema and physical organization to reflect the changing needs of the
organization, or to alter the physical organization to improve performance
...
By granting different types of
authorization, the database administrator can regulate which parts of the database various users can access
...
• Routine maintenance
...
Ensuring that enough free disk space is available for normal operations,
and upgrading disk space as required
...
1
...
An example is a funds transfer, as in Section 1
...
Clearly, it is essential that either both the credit
and debit occur, or that neither occur
...
This all-or-none requirement is called atomicity
...
That is, the value of the sum A + B must be preserved
...
Finally, after the successful execution of a funds
transfer, the new values of accounts A and B must persist, despite the possibility of
system failure
...
A transaction is a collection of operations that performs a single logical function
in a database application
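A minimal sketch of the all-or-none requirement, using SQLite from Python. The table layout, the starting balances, and the CHECK (balance >= 0) rule used to force a failure are assumptions for the example. Durability is not shown, since the database lives in memory.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
# Hypothetical starting balances for accounts A and B.
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()


def transfer(src, dst, amount):
    """Move amount from src to dst as one transaction; True on success."""
    try:
        with conn:  # commit if both updates succeed, roll back otherwise
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )  # credit
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )  # debit; fails if src would go negative
        return True
    except sqlite3.IntegrityError:
        return False


def balances():
    """Return the current balances as a dict keyed by account name."""
    return dict(conn.execute("SELECT name, balance FROM account"))
```

When the debit fails, the credit that already ran is undone as well, so the sum A + B is the same before and after every call.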
...
tency
...
That is, if the database was consistent when a transaction started, the
database must be consistent when the transaction successfully terminates
...
This temporary inconsistency, although necessary, may lead to difficulty if a failure
occurs
...
For example, the transaction to
transfer funds from account A to account B could be defined to be composed of two
separate programs: one that debits account A, and another that credits account B
...
However, each program by itself does not transform the database from a consistent
state to a new consistent state
...
Ensuring the atomicity and durability properties is the responsibility of the database system itself — specifically, of the transaction-management component
...
However, because of various types of failure, a transaction may not always
complete its execution successfully
...
Thus, the database must
be restored to the state in which it was before the transaction in question started executing
...
Finally, when several transactions update the database concurrently, the consistency of data may no longer be preserved, even though each individual transaction is correct
...
Database systems designed for use on small personal computers may not have
all these features
...
Others do not offer backup and recovery, leaving that to the
user
...
Although such a low-cost, low-feature
approach is adequate for small personal databases, it is inadequate for a medium- to
large-scale enterprise
...
8 Database System Structure
A database system is partitioned into modules that deal with each of the responsibilities of the overall system
...
The storage manager is important because databases typically require a large
amount of storage space
...
A gigabyte is 1000 megabytes
(1 billion bytes), and a terabyte is 1 million megabytes (1 trillion bytes)
...
Data are moved between disk storage and main memory as needed
...
The query processor is important because it helps the database system simplify
and facilitate access to data
...
However, quick processing of updates and queries
is important
...
1
...
1 Storage Manager
A storage manager is a program module that provides the interface between the low-level data stored in the database and the application programs and queries submitted to the system
...
The raw data are stored on the disk using the file system, which is usually provided by a conventional operating system
...
Thus, the storage
manager is responsible for storing, retrieving, and updating data in the database
...
• Transaction manager, which ensures that the database remains in a consistent
(correct) state despite system failures, and that concurrent transaction executions proceed without conflicting
...
• Buffer manager, which is responsible for fetching data from disk storage into
main memory, and deciding what data to cache in main memory
...
The storage manager implements several data structures as part of the physical
system implementation:
• Data files, which store the database itself
...
• Indices, which provide fast access to data items that hold particular values
...
1
...
2 The Query Processor
The query processor components include
• DDL interpreter, which interprets DDL statements and records the definitions
in the data dictionary
...
A query can usually be translated into any of a number of alternative evaluation plans that all give the same result
...
• Query evaluation engine, which executes low-level instructions generated by
the DML compiler
...
4 shows these components and the connections among them
...
9 Application Architectures
Most users of a database system today are not present at the site of the database
system, but connect to it through a network
...
Database applications are usually partitioned into two or three parts, as in Figure 1
...
In a two-tier architecture, the application is partitioned into a component
that resides at the client machine, which invokes database system functionality at the
server machine through query language statements
...
In contrast, in a three-tier architecture, the client machine acts as merely a front
end and does not contain any direct database calls
...
The application
server in turn communicates with a database system to access data
...
Three-tier applications are more appropriate for large applications, and for
applications that run on the World Wide Web
...
10 History of Database Systems
Data processing drives the growth of computers, as it has from the earliest days of
commercial computers
...
Punched cards, invented by Hollerith, were used at the very beginning of the
twentieth century to record U
...
census data, and mechanical systems were used to process the cards and tabulate results
[Figure: components of a database system. Users: naive users (tellers, agents, web-users) use application interfaces; application programmers write application programs; sophisticated users (analysts) use query tools; the database administrator uses administration tools. Query processor: compiler and linker, application program object code, DML queries, DDL interpreter, DML compiler and organizer, query evaluation engine. Storage manager: buffer manager, file manager, authorization and integrity manager, transaction manager. Disk storage: data, indices, data dictionary, statistical data.]
...
Techniques for data storage and processing have evolved over the years:
• 1950s and early 1960s: Magnetic tapes were developed for data storage
...
Processing of data consisted of reading data from one or more tapes and
...
[Figure: Two-tier and three-tier architectures]
...
Data could also be input from punched card decks,
and output to printers
...
The records had to
be in the same sorted order
...
Tapes (and card decks) could be read only sequentially, and data sizes were
much larger than main memory; thus, data processing programs were forced
to process data in a particular order, by reading and merging data from tapes
and card decks
...
The position of data on disk was immaterial, since any location on disk
could be accessed in just tens of milliseconds
...
With disks, network and hierarchical databases could
be created that allowed data structures such as lists and trees to be stored on
disk
...
A landmark paper by Codd [1970] defined the relational model, and nonprocedural ways of querying data in the relational model, and relational
databases were born
...
Codd later won the prestigious Association for Computing
Machinery Turing Award for his work
...
1
...
That changed with System R, a groundbreaking project
at IBM Research that developed techniques for the construction of an efficient
relational database system
...
[1976] and Chamberlin et al
...
The fully functional System R prototype led to IBM’s first relational database product, SQL/DS
...
By the early 1980s, relational databases had become
competitive with network and hierarchical database systems even in the area
of performance
...
Most importantly, they had to keep
efficiency in mind when designing their programs, which involved a lot of
effort
...
Since attaining dominance in the 1980s, the relational
model has reigned supreme among data models
...
• Early 1990s: The SQL language was designed primarily for decision support
applications, which are query intensive, yet the mainstay of databases in the
1980s was transaction processing applications, which are update intensive
...
Tools for analyzing large amounts of data saw rapid growth in
usage
...
Database vendors also began to add object-relational support to their
databases
...
Databases were deployed much more extensively than ever before
...
Database
systems also had to support Web interfaces to data
...
11 Summary
• A database-management system (DBMS) consists of a collection of interrelated data and a collection of programs to access that data
...
• Database systems are ubiquitous today, and most people interact, either directly or indirectly, with databases many times every day
...
The management of data involves both the definition of structures for the storage of
information and the provision of mechanisms for the manipulation of information
...
If data are to be shared among several users, the system must avoid
possible anomalous results
...
That is, the system hides certain details of how the data are
stored and maintained
...
The entity-relationship (E-R) data model is a widely used data
model, and it provides a convenient graphical representation to view data, relationships and constraints
...
Other data models are the object-oriented model, the object-relational model, and semistructured data models
...
A database
schema is specified by a set of definitions that are expressed using a data-definition language (DDL)
...
Nonprocedural DMLs, which require a user to specify
only what data are needed, without specifying exactly how to get those data,
are widely used today
...
• A database system has several subsystems
...
The transaction manager also ensures that concurrent transaction executions proceed without conflicting
...
The storage manager subsystem provides the interface between the low-level data stored in the database and the application programs and queries
submitted to the system
...
• Database applications are typically broken up into a front-end part that runs at
client machines and a part that runs at the back end
...
In three-tier architectures, the back end part is itself broken up into an application server and a database server
...
1 List four significant differences between a file-processing system and a DBMS
...
2 This chapter has described several major advantages of a database system
...
3 Explain the difference between physical and logical data independence
...
4 List five responsibilities of a database management system
...
1
...
6 List seven programming languages that are procedural and two that are nonprocedural
...
1
...
1
...
8 Consider a two-dimensional integer array of size n × m that is to be used in
your favorite programming language
...
Bibliographical Notes
We list below general purpose books, research paper collections, and Web sites on
databases
...
Textbooks covering database systems include Abiteboul et al
...
Textbook coverage of transaction processing is provided
by Bernstein and Newcomer [1997] and Gray and Reuter [1993]
...
Among these are Bancilhon and Buneman [1990], Date [1986], Date [1990], Kim [1995],
Zaniolo et al
...
A review of accomplishments in database management and an assessment of future research challenges appears in Silberschatz et al
...
[1996]
and Bernstein et al
...
The home page of the ACM Special Interest Group on
Management of Data (see www
...
org/sigmod) provides a wealth of information
about database research
...
Codd [1970] is the landmark paper that introduced the relational model
...
Tools
There are a large number of commercial database systems in use today
...
ibm
...
oracle
...
microsoft
...
informix
...
sybase
...
Some of these systems are available free for personal or
noncommercial use, or for development, but are not free for actual deployment
...
mysql
...
postgresql
...
A more complete list of links to vendor Web sites and other information is available from the home page of this book, at www
...
bell-labs
...
PART I
...
In this part, we study two data
models: the entity-relationship model and the relational model
...
It is based on a
perception of a real world that consists of a collection of basic objects, called entities,
and of relationships among these objects
...
It uses a collection of tables to represent both data and the relationships among those data
...
Designers often formulate database schema design by first
modeling data at a high level, using the E-R model, and then translating it into
the relational model
...
The object-oriented data model,
for example, extends the representation of entities by adding notions of encapsulation, methods (functions), and object identity
...
Chapters 8 and 9, respectively, cover these two data models
...
Chapter 2: Entity-Relationship Model
...
It was developed
to facilitate database design by allowing specification of an enterprise schema, which
represents the overall logical structure of a database
...
The E-R model is very useful in mapping the meanings
and interactions of real-world enterprises onto a conceptual schema
...
2
...
2
...
1 Entity Sets
An entity is a “thing” or “object” in the real world that is distinguishable from all
other objects
...
An entity has a
set of properties, and the values for some set of properties may uniquely identify an
entity
...
Thus, the value 677-89-9011 for person-id would uniquely identify one particular person in the enterprise
...
An entity may be concrete, such as a person or a book, or it may be abstract, such as
a loan, or a holiday, or a concept
...
The set of all persons who are customers at a given bank, for example, can
be defined as the entity set customer
...
The individual entities that constitute a
set are said to be the extension of the entity set
...
Entity sets do not need to be disjoint
...
A person entity may be an employee entity, a customer entity, both, or neither
...
Attributes are descriptive properties possessed by each member of an entity set
...
Possible attributes of the customer entity set are customer-id, customer-name, customer-street, and customer-city
...
Possible attributes of the loan entity set are loan-number
and amount
...
For instance, a particular customer
entity may have the value 321-12-3123 for customer-id, the value Jones for customer-name, the value Main for customer-street, and the value Harrison for customer-city
...
In the United States,
many enterprises find it convenient to use the social-security number of a person1
as an attribute whose value uniquely identifies the person
...
For each attribute, there is a set of permitted values, called the domain, or value
set, of that attribute
...
Similarly, the domain of attribute loan-number might
be the set of all strings of the form “L-n” where n is a positive integer
...
Figure 2
...
Formally, an attribute of an entity set is a function that maps from the entity set into
a domain
...
For
example, a particular customer entity may be described by the set {(customer-id, 677-89-9011), (customer-name, Hayes), (customer-street, Main), (customer-city, Harrison)},
meaning that the entity describes a person named Hayes whose customer identifier
is 677-89-9011 and who resides at Main Street in Harrison
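The view of an entity as a set of (attribute, value) pairs, and of an attribute as a function from entities to a domain, can be sketched in a few lines of Python. The helper names attribute and in_loan_number_domain are hypothetical, and the regular expression is one possible reading of "strings of the form L-n where n is a positive integer".

```python
import re

# The Hayes entity from the text, as a set of (attribute, value) pairs.
hayes = frozenset({
    ("customer-id", "677-89-9011"),
    ("customer-name", "Hayes"),
    ("customer-street", "Main"),
    ("customer-city", "Harrison"),
})


def attribute(name):
    """Return a function that maps an entity to its value for `name`."""
    def value_of(entity):
        return dict(entity)[name]
    return value_of


customer_name = attribute("customer-name")
customer_city = attribute("customer-city")


def in_loan_number_domain(value):
    """True if value has the form 'L-n' with n a positive integer."""
    return re.fullmatch(r"L-[1-9][0-9]*", value) is not None
```

For example, customer_name(hayes) evaluates to "Hayes", and in_loan_number_domain("L-15") is true while in_loan_number_domain("15") is not.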
...
The
attribute values describing an entity will constitute a significant portion of the data
stored in the database
...
1
...
Each person is supposed to have only one social-security number, and no two people are supposed to have the same social-security number
...
customer
customer-id   customer-name  customer-street  customer-city
321-12-3123   Jones          Main             Harrison
019-28-3746   Smith          North            Rye
677-89-9011   Hayes          Main             Harrison
555-55-5555   Jackson        Dupont           Woodside
244-66-8800   Curry          North            Rye
963-96-3963   Williams       Nassau           Princeton
335-57-7991   Adams          Spring           Pittsfield

loan
loan-number  amount
L-17         1000
L-23         2000
L-15         1500
L-14         1500
L-19         500
L-11         900
L-16         1300
Figure 2
...
• Simple and composite attributes
...
Composite attributes,
on the other hand, can be divided into subparts (that is, other attributes)
...
Using composite attributes in
a design schema is a good choice if a user will wish to refer to an entire attribute on some occasions, and to only a component of the attribute on other
occasions
...
Composite attributes help us to group
together related attributes, making the modeling cleaner
...
In the composite attribute address, its component attribute street can be further divided
into street-number, street-name, and apartment-number
...
2 depicts these
examples of composite attributes for the customer entity set
...
The attributes in our examples all
have a single value for a particular entity
...
Such attributes
are said to be single valued
...
Consider an employee entity set with the
attribute phone-number
...
This type of attribute is said to be multivalued
...
We assume the address format used in the United States, which includes a numeric postal code called
a zip code
...
[Figure: Composite attributes customer-name and customer-address]
...
Where appropriate, upper and lower bounds may be placed on the number
of values in a multivalued attribute
...
Placing bounds
in this case expresses that the phone-number attribute of the customer entity set
may have between zero and two values
...
The value for this type of attribute can be derived from
the values of other related attributes or entities
...
We can derive the value for this attribute
by counting the number of loan entities associated with that customer
...
If the customer entity set also has an
attribute date-of-birth, we can calculate age from date-of-birth and the current
date
...
In this case, date-of-birth may be referred
to as a base attribute, or a stored attribute
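Derived attributes can be sketched as values that are computed on demand instead of stored. In the example below the Customer class, the date of birth, and the helper loans_held are hypothetical. Only the idea comes from the text: age is calculated from date-of-birth and the current date, and the number of loans held is a count of associated loan entities.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Customer:
    customer_id: str
    date_of_birth: date  # base (stored) attribute

    def age(self, today: date) -> int:
        """Derived attribute: whole years from date_of_birth to today."""
        had_birthday = (today.month, today.day) >= (
            self.date_of_birth.month,
            self.date_of_birth.day,
        )
        return today.year - self.date_of_birth.year - (0 if had_birthday else 1)


def loans_held(customer_id, borrower):
    """Derived attribute: number of loans associated with a customer.

    borrower is a collection of (customer-id, loan-number) pairs.
    """
    return sum(1 for cid, _ in borrower if cid == customer_id)


# Hypothetical date of birth; the (customer, loan) pair is from the text.
hayes = Customer("677-89-9011", date(1970, 6, 15))
borrower = {("677-89-9011", "L-15")}
```

Passing the current date as a parameter keeps the computation repeatable, which is convenient when testing.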
...
An attribute takes a null value when an entity does not have a value for it
...
For example, one may have no middle name
...
An unknown value may be either missing (the value does
exist, but we do not have that information) or not known (we do not know whether or
not the value actually exists)
...
A null value for the
apartment-number attribute could mean that the address does not include an apartment number (not applicable), that an apartment number exists but we do not know
what it is (missing), or that we do not know whether or not an apartment number is
part of the customer’s address (unknown)
...
For example, in addition to keeping track of customers and loans, the bank also
...
Also, if the bank has a number of different branches, then
we may keep information about all the branches of the bank
...
2
...
2 Relationship Sets
A relationship is an association among several entities
...
This relationship specifies that Hayes is a customer with loan number L-15
...
Formally, it is a mathematical relation on n ≥ 2 (possibly nondistinct) entity sets
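If $E_1, E_2, \ldots, E_n$ are
entity sets, then a relationship set $R$ is a subset of
$$\{(e_1, e_2, \ldots, e_n) \mid e_1 \in E_1, e_2 \in E_2, \ldots, e_n \in E_n\}$$
where $(e_1, e_2, \ldots, e_n)$ is a relationship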
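The formal definition of a relationship set as a subset of a product of entity sets can be checked directly with Python sets. The function name is_relationship_set is hypothetical, and the entity sets are reduced to identifiers taken from the text's examples.

```python
from itertools import product

# Entity sets, represented here only by their identifying values.
customer = {"677-89-9011", "321-12-3123"}
loan = {"L-15", "L-17"}

# One relationship instance from the text: Hayes holds loan L-15.
borrower = {("677-89-9011", "L-15")}


def is_relationship_set(r, *entity_sets):
    """True if every tuple in r draws its i-th component from the i-th set."""
    return set(r) <= set(product(*entity_sets))
```

A tuple that mentions a loan outside the loan entity set, such as ("677-89-9011", "L-99"), makes the check fail.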
Consider the two entity sets customer and loan in Figure 2
...
We define the relationship set borrower to denote the association between customers and the bank loans
that the customers have
...
3 depicts this association
...
We can define
the relationship set loan-branch to denote the association between a bank loan and the
branch in which that loan is maintained
...
[Figure: Relationship set borrower]
...
The entity sets $E_1, E_2, \ldots, E_n$ participate in relationship set $R$
...
As an illustration, the individual customer entity
Hayes, who has customer identifier 677-89-9011, and the loan entity L-15 participate
in a relationship instance of borrower
...
The function that an entity plays in a relationship is called that entity’s role
...
However, they are useful when the meaning of a relationship needs clarification
...
In this type of relationship set, sometimes called a recursive relationship set, explicit role names are necessary to specify how an entity
participates in a relationship instance
...
We may have a relationship set works-for that is modeled by ordered pairs of employee entities
...
In this way, all relationships of works-for are characterized by (worker, manager)
pairs; (manager, worker) pairs are excluded
...
Consider a
relationship set depositor with entity sets customer and account
...
The depositor relationship among the entities corresponding to customer Jones and account A-217 has the value “23 May 2001” for attribute access-date, which means that the most recent date that Jones accessed account
A-217 was 23 May 2001
...
We
may wish to store a descriptive attribute for-credit with the relationship, to record
whether a student has taken the course for credit, or is auditing (or sitting in on) the
course
...
To understand
this point, suppose we want to model all the dates when a customer accessed an
account
...
We
cannot represent multiple access dates by multiple relationship instances between the
same customer and account, since the relationship instances would not be uniquely
identifiable using only the participating entities
...
However, there can be more than one relationship set involving the same entity
sets
...
Additionally, suppose each loan must have another customer who serves
as a guarantor for the loan
...
Most of the relationship sets in
a database system are binary
...
As an example, consider the entity sets employee, branch, and job
...
Job entities may have the attributes title and level
...
A ternary relationship among Jones, Perryridge,
and manager indicates that Jones acts as a manager at the Perryridge branch
...
Yet another relationship could be between Smith, Downtown,
and teller, indicating Smith acts as a teller at the Downtown branch
...
A binary relationship set is of degree 2; a ternary relationship set
is of degree 3
...
2 Constraints
An E-R enterprise schema may define certain constraints to which the contents of a
database must conform
...
2
...
1 Mapping Cardinalities
Mapping cardinalities, or cardinality ratios, express the number of entities to which
another entity can be associated via a relationship set
...
In this section, we shall concentrate on only binary relationship
sets
...
An entity in A is associated with at most one entity in B, and an
entity in B is associated with at most one entity in A
...
4a
...
An entity in A is associated with any number (zero or more) of
entities in B
...
(See Figure 2
...
)
• Many to one
...
An
entity in B, however, can be associated with any number (zero or more) of
entities in A
...
5a
...
An entity in A is associated with any number (zero or more) of
entities in B, and an entity in B is associated with any number (zero or more)
of entities in A
...
5b
...
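The four mapping cardinalities can be illustrated by classifying a concrete binary relationship set. This is a sketch under one important assumption: a mapping cardinality is a constraint on every legal instance, whereas the hypothetical function below only reports the tightest class consistent with the pairs it is given.

```python
from collections import Counter


def mapping_cardinality(r):
    """Classify a binary relationship set r, a set of (a, b) pairs."""
    pairs = set(r)
    b_per_a = Counter(a for a, _ in pairs)  # B entities linked to each A entity
    a_per_b = Counter(b for _, b in pairs)  # A entities linked to each B entity
    a_has_many = any(n > 1 for n in b_per_a.values())
    b_has_many = any(n > 1 for n in a_per_b.values())
    if a_has_many and b_has_many:
        return "many to many"
    if a_has_many:
        return "one to many"   # an A entity may have many B; each B at most one A
    if b_has_many:
        return "many to one"   # each A at most one B; a B entity may have many A
    return "one to one"
```

For instance, a borrower instance in which one loan appears with two customers and one customer appears with two loans is reported as many to many.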
[Figure: Mapping cardinalities. (b) One to many.]
...
As an illustration, consider the borrower relationship set
...
If a loan can belong to several
customers (as can loans taken jointly by several business partners), the relationship
set is many to many
...
3 depicts this type of relationship
...
2
...
If only some entities in E
participate in relationships in R, the participation of entity set E in relationship R is
said to be partial
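Participation can be checked on a concrete instance as follows. The function name and the sample pairs are assumptions, apart from the Hayes and L-15 pair, which comes from the text. Total participation is taken here to mean that every entity in E appears in at least one relationship in R, the complement of the partial case described above.

```python
def participation(entity_set, r, position):
    """Return 'total' or 'partial' for entity_set in relationship set r.

    position is the index of the entity set within each tuple of r.
    """
    participating = {t[position] for t in r}
    return "total" if set(entity_set) <= participating else "partial"


# Illustrative instance: every loan has a borrower, but one customer has no loan.
customer = {"677-89-9011", "321-12-3123", "019-28-3746"}
loan = {"L-15", "L-17"}
borrower = {("677-89-9011", "L-15"), ("321-12-3123", "L-17")}
```

As with mapping cardinalities, the check describes one instance only; the constraint in a design applies to all instances.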
...
Therefore the participation of loan in
[Figure: Mapping cardinalities. (a) Many to one.]
...
In contrast, an individual can be a bank customer
whether or not she has a loan with the bank
...
2
...
Conceptually, individual entities are distinct; from a database perspective,
however, the difference among them must be expressed in terms of their attributes
...
In other words, no two entities in an entity set are allowed
to have exactly the same value for all attributes
...
Keys also help uniquely identify relationships, and thus distinguish
relationships from each other
...
3
...
For example, the customer-id attribute of the
entity set customer is sufficient to distinguish one customer entity from another
...
Similarly, the combination of customer-name and customer-id
is a superkey for the entity set customer
...
The concept of a superkey is not sufficient for our purposes, since, as we saw, a
superkey may contain extraneous attributes
...
We are often interested in superkeys for which no proper subset is a superkey
...
It is possible that several distinct sets of attributes could serve as a candidate key
...
Then, both {customer-id} and
{customer-name, customer-street} are candidate keys
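The difference between a superkey and a candidate key can be sketched by testing attribute sets against the customer rows from the text's customer figure. The function names are hypothetical. Note the assumption built into the sketch: a key is a property of the design that must hold for every possible instance, while this code can only test one instance.

```python
from itertools import combinations

COLUMNS = ("customer-id", "customer-name", "customer-street", "customer-city")
ROWS = [
    ("321-12-3123", "Jones", "Main", "Harrison"),
    ("019-28-3746", "Smith", "North", "Rye"),
    ("677-89-9011", "Hayes", "Main", "Harrison"),
    ("555-55-5555", "Jackson", "Dupont", "Woodside"),
    ("244-66-8800", "Curry", "North", "Rye"),
    ("963-96-3963", "Williams", "Nassau", "Princeton"),
    ("335-57-7991", "Adams", "Spring", "Pittsfield"),
]
customers = [dict(zip(COLUMNS, row)) for row in ROWS]


def is_superkey(rows, attrs):
    """True if no two rows agree on all attributes in attrs."""
    names = sorted(attrs)
    seen = set()
    for row in rows:
        key = tuple(row[a] for a in names)
        if key in seen:
            return False
        seen.add(key)
    return True


def is_candidate_key(rows, attrs):
    """True if attrs is a superkey and no proper nonempty subset is one."""
    names = sorted(attrs)
    if not is_superkey(rows, names):
        return False
    return not any(
        is_superkey(rows, subset)
        for size in range(1, len(names))
        for subset in combinations(names, size)
    )
```

On this small instance customer-name alone happens to be unique, which shows why keys have to be chosen from the meaning of the data and not inferred from whatever rows are present.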
...
We shall use the term primary key to denote a candidate key that is chosen by
the database designer as the principal means of identifying entities within an entity
set
...
Any two individual entities in the set are prohibited from
having the same value on the key attributes at the same time
...
Candidate keys must be chosen with care
...
In the United States, the social-security number attribute of a person would be a
candidate key
...
S
...
An alternative
is to use some unique combination of other attributes as a key
...
For instance, the address field of a person should not be part of the primary
key, since it is likely to change
...
Unique identifiers generated by enterprises generally do not
change, except if two enterprises merge; in such a case the same identifier may have
been issued by both enterprises, and a reallocation of identifiers may be required to
make sure they are unique
...
3
...
We need a similar mechanism to distinguish among the various relationships
of a relationship set
...
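Let $R$ be a relationship set involving entity sets $E_1, E_2, \ldots, E_n$, and let primary-key($E_i$) denote the set of attributes that forms the primary key of entity set $E_i$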
...
Assume
for now that the attribute names of all primary keys are unique, and each entity set
participates only once in the relationship
...
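If the relationship set $R$ has no attributes associated with it, then the set of attributes
$$\text{primary-key}(E_1) \cup \text{primary-key}(E_2) \cup \cdots \cup \text{primary-key}(E_n)$$
describes an individual relationship in set $R$. If the relationship set $R$ has attributes $a_1, a_2, \ldots, a_m$ associated with it, then the set of attributes
$$\text{primary-key}(E_1) \cup \text{primary-key}(E_2) \cup \cdots \cup \text{primary-key}(E_n) \cup \{a_1, a_2, \ldots, a_m\}$$
describes an individual relationship in set $R$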
...
In case the attribute names of primary keys are not unique across entity sets, the
attributes are renamed to distinguish them; the name of the entity set combined with
the name of the attribute would form a unique name
...
1
...
As an illustration, consider the entity sets
customer and account, and the relationship set depositor, with attribute access-date, in
Section 2
...
2
...
Then the primary
key of depositor consists of the union of the primary keys of customer and account
...
Similarly, if the relationship is many to one from
account to customer — that is, each account is owned by at most one customer — then
the primary key of depositor is simply the primary key of account
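The dependence of a binary relationship set's primary key on its mapping cardinality can be written as a small lookup. The function name is hypothetical. Only the many-to-many and many-to-one cases are stated in the passage above; the one-to-many case is filled in by symmetry and the one-to-one choice is arbitrary, so both are assumptions of this sketch.

```python
def relationship_primary_key(pk_a, pk_b, cardinality):
    """Primary key of a binary relationship set between entity sets A and B.

    pk_a and pk_b are the primary-key attribute sets of A and B;
    cardinality is read in the direction from A to B.
    """
    pk_a, pk_b = frozenset(pk_a), frozenset(pk_b)
    if cardinality == "many to many":
        return pk_a | pk_b      # union of both primary keys
    if cardinality == "many to one":
        return pk_a             # each A entity is in at most one relationship
    if cardinality == "one to many":
        return pk_b             # assumed by symmetry with the case above
    if cardinality == "one to one":
        return pk_a             # either key would do; A's is chosen arbitrarily
    raise ValueError(f"unknown cardinality: {cardinality!r}")
```

With A as account and B as customer, the many-to-one case returns the primary key of account, matching the depositor example.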
...
For nonbinary relationships, if no cardinality constraints are present then the superkey formed as described earlier in this section is the only candidate key, and it
is chosen as the primary key
...
Since we have not discussed how to specify cardinality constraints on nonbinary relations, we do not discuss this issue further in this
chapter
...
3
...
4 Design Issues
The notions of an entity set and a relationship set are not precise, and it is possible
to define a set of entities and the relationships among them in a number of different ways
...
Section 2
...
4 covers the design process in further detail
...
4
...
It can easily be argued that a telephone is an entity in its own right with attributes
telephone-number and location (the office where the telephone is located)
...
Treating a telephone as an entity telephone permits employees to have several telephone numbers (including zero) associated with
them
...
The main difference then is that treating a telephone as an entity better models a
situation where one may want to keep extra information about a telephone, such as
its location, or its type (mobile, video phone, or plain old telephone), or who all share
the telephone
...
In contrast, it would not be appropriate to treat the attribute employee-name as an
entity; it is difficult to argue that employee-name is an entity in its own right (in contrast
to the telephone)
...
Two natural questions thus arise: What constitutes an attribute, and what constitutes an entity set? Unfortunately, there are no simple answers
...
A common mistake is to use the primary key of an entity set as an attribute of another entity set, instead of using a relationship
...
The relationship borrower is the correct way to represent the connection between loans and
customers, since it makes their connection explicit, rather than implicit via an attribute
...
This should
not be done, since the primary key attributes are already implicit in the relationship
...
4
...
In Section 2
...
1, we assumed that a bank loan is modeled as an entity
...
Each
loan is represented by a relationship between a customer and a branch
...
However, with this design, we cannot represent conveniently a situation in
which several customers hold a loan jointly
...
Then, we must replicate
the values for the descriptive attributes loan-number and amount in each such relationship
...
Two problems arise as a result of the replication: (1) the data are stored multiple
times, wasting storage space, and (2) updates potentially leave the data in an inconsistent state, where the values differ in two relationships for attributes that are supposed to have the same value
...
The problem of replication of the attributes loan-number and amount is absent in
the original design of Section 2
...
1, because there loan is an entity set
...
entities
...
2
...
3 Binary versus n-ary Relationship Sets
Relationships in databases are often binary
...
For
instance, one could create a ternary relationship parent, relating a child to his/her
mother and father
...
Using the two relationships mother and father allows us to record a child’s
mother, even if we are not aware of the father’s identity; a null value would be
required if the ternary relationship parent is used
...
In fact, it is always possible to replace a nonbinary (n-ary, for n > 2) relationship
set by a number of distinct binary relationship sets
...
We replace
the relationship set R by an entity set E, and create three relationship sets:
• RA , relating E and A
• RB , relating E and B
• RC , relating E and C
If the relationship set R had any attributes, these are assigned to entity set E; further,
a special identifying attribute is created for E (since it must be possible to distinguish
different entities in an entity set on the basis of their attribute values)
...
Then, in each of the three new relationship sets, we insert a relationship as follows:
• (ei , ai ) in RA
• (ei , bi ) in RB
• (ei , ci ) in RC
We can generalize this process in a straightforward manner to n-ary relationship
sets
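The replacement procedure for a ternary relationship set can be sketched as follows. This is a minimal illustration, not the authors' algorithm: the function name `replace_ternary` and the generated identifiers `e1`, `e2`, and so on (standing in for the special identifying attribute of E) are assumptions.

```python
def replace_ternary(relationships):
    """Replace ternary relationships (a, b, c) by an entity set E and
    three binary relationship sets RA, RB, RC."""
    entity_set_e, r_a, r_b, r_c = [], [], [], []
    for i, (a, b, c) in enumerate(relationships, start=1):
        e = f"e{i}"              # special identifying attribute for E
        entity_set_e.append(e)
        r_a.append((e, a))       # (ei, ai) in RA
        r_b.append((e, b))       # (ei, bi) in RB
        r_c.append((e, c))       # (ei, ci) in RC
    return entity_set_e, r_a, r_b, r_c


works_on = [("Jones", "Perryridge", "manager"),
            ("Jones", "Downtown", "auditor")]
e_set, r_a, r_b, r_c = replace_ternary(works_on)
```

Each original relationship becomes one entity in E, so the three binary sets can be joined back on that identifier without losing which employee, branch, and job belonged together.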
...
However, this restriction is not always desirable
...
This attribute, along with the extra relationship
sets required, increases the complexity of the design and (as we shall see in
Section 2
...
• An n-ary relationship set shows more clearly that several entities participate in
a single relationship
...
2
...
For example, consider a constraint
that says that R is many-to-one from A, B to C; that is, each pair of entities
from A and B is associated with at most one C entity
...
Consider the relationship set works-on in Section 2
...
2, relating employee, branch,
and job
...
If we did so, we would be able to record
that Jones is a manager and an auditor and that Jones works at Perryridge and Downtown; however, we would not be able to record that Jones is a manager at Perryridge
and an auditor at Downtown, but is not an auditor at Perryridge or a manager at
Downtown
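The loss of information in the Jones example can be checked directly. The sketch below assumes two hypothetical binary relationship sets, `works_at` (employee, branch) and `has_job` (employee, job), obtained by splitting works-on; the names are illustrative only.

```python
# The ternary relationship set: who does which job at which branch.
works_on = {("Jones", "Perryridge", "manager"),
            ("Jones", "Downtown", "auditor")}

# Split into two binary relationship sets (hypothetical names).
works_at = {(emp, branch) for emp, branch, job in works_on}
has_job = {(emp, job) for emp, branch, job in works_on}

# Recombine every branch with every job of the same employee.
recombined = {(emp, branch, job)
              for emp, branch in works_at
              for emp2, job in has_job
              if emp == emp2}

# Combinations the binary design cannot rule out.
spurious = recombined - works_on
```

The recombined set contains four combinations, two of which (auditor at Perryridge, manager at Downtown) were never recorded in works-on.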
...
However, doing so would not be very natural
...
4
...
Thus, attributes of one-to-one or one-to-many relationship sets can be associated with one of the participating entity sets, rather than with the relationship
set
...
In this case, the attribute access-date, which specifies when the customer last
accessed that account, could be associated with the account entity set, as Figure 2
...
Since each account entity participates in a relationship with at most one instance of customer, making this attribute designation would have the same meaning
[Figure: account entities (account-number, access-date) linked to customer entities (customer-name) through depositor.]
account: (A-101, 24 May 1996), (A-215, 3 June 1996), (A-102, 10 June 1996), (A-305, 28 May 1996), (A-201, 17 June 1996), (A-222, 24 June 1996), (A-217, 23 May 1996)
customer: Johnson, Smith, Hayes, Turner, Jones, Lindsay
Figure 2
...
2
...
Attributes of a one-to-many relationship set can be repositioned to only the entity set on the “many” side of
the relationship
...
The design decision of where to place descriptive attributes in such cases, as a
relationship or entity attribute, should reflect the characteristics of the enterprise
being modeled
...
The choice of attribute placement is more clear-cut for many-to-many relationship
sets
...
If we are to express the date on which a specific customer last accessed a specific
account, access-date must be an attribute of the depositor relationship set, rather than
either one of the participating entities
...
When an attribute is determined by the combination of participating
entity sets, rather than by either entity separately, that attribute must be associated
with the many-to-many relationship set
...
7 depicts the placement of access-date as a relationship attribute; again, to keep the figure simple, only some of the
attributes of the two entity sets are shown
...
7
Access-date as attribute of the depositor relationship set
...
2
...
5 Entity-Relationship Diagram
As we saw briefly in Section 1
...
E-R diagrams are simple and clear — qualities that may
well account in large part for the widespread use of the E-R model
...
6
...
8, which consists of two entity sets, customer and loan, related through a binary relationship set borrower
...
The attributes associated with loan are loan-number and amount
...
8, attributes of an entity set that are members of the primary key are underlined
...
To distinguish among these types, we draw either a directed line (→)
or an undirected line (— ) between the relationship set and the entity set in question
...
[Figure: E-R diagram of customer (customer-id, customer-name, customer-street, customer-city) and the attributes loan-number and amount.]
Figure 2
...
2
...
Returning to the E-R diagram of Figure 2
...
If the relationship set borrower were one-to-many, from customer to
loan, then the line from borrower to customer would be directed, with an arrow pointing to the customer entity set (Figure 2
...
Similarly, if the relationship set borrower
were many-to-one from customer to loan, then the line from borrower to loan would
have an arrow pointing to the loan entity set (Figure 2
...
Finally, if the relationship set borrower were one-to-one, then both lines from borrower would have arrows:
[Figure: three E-R diagrams, labeled (a), (b), and (c), each relating customer (customer-id, customer-name, customer-street, customer-city) to loan (loan-number, amount) through borrower.]
Figure 2
...
(a) one to many
...
(c) one-to-one
...
2
...
10
account
E-R diagram with an attribute attached to a relationship set
...
9c)
...
For example, in Figure 2
...
Figure 2
...
Here, a composite attribute name, with component attributes first-name, middle-initial,
and last-name replaces the simple attribute customer-name of customer
...
The attribute street is
itself a composite attribute whose component attributes are street-number, street-name,
and apartment-number
...
11 also illustrates a multivalued attribute phone-number, depicted by a
double ellipse, and a derived attribute age, depicted by a dashed ellipse
...
11
apartment-number
date-of-birth
zip-code
age
E-R diagram with composite, multivalued, and derived attributes
...
[Figure: E-R diagram of employee (employee-id, employee-name, telephone-number) and works-for, with the role labels manager and worker.]
Figure 2
...
We indicate roles in E-R diagrams by labeling the lines that connect diamonds
to rectangles
...
12 shows the role indicators manager and worker between the
employee entity set and the works-for relationship set
...
Figure 2
...
We can specify some types of many-to-one relationships in the case of nonbinary
relationship sets
...
This constraint can be specified by an arrow pointing to job on the edge from works-on
...
Suppose there is a relationship set R between entity sets A1 , A2 ,
...
, An
...
A particular combination of entities from A1 , A2 ,
...
, An
...
, Ai
...
13
branch-name
works-on
branch
E-R diagram with a ternary relationship
...
2
...
14
borrower
loan-number
amount
loan
Total participation of an entity set in a relationship set
...
For each entity set Ak , i < k ≤ n, each combination of the entities from the
other entity sets can be associated with at most one entity from Ak
...
, Ak−1 , Ak+1 ,
...
Each of these interpretations has been used in different books and systems
...
In Chapter 7 (Section 7
...
Double lines are used in an E-R diagram to indicate that the participation of an
entity set in a relationship set is total; that is, each entity in the entity set occurs in at
least one relationship in that relationship set
...
A double line from loan to borrower, as in
Figure 2
...
E-R diagrams also provide a way to indicate more complex constraints on the number of times each entity participates in relationships in a relationship set
...
h, where l is the minimum and h
the maximum cardinality
...
A maximum value of 1 indicates that the entity participates in at most one relationship, while a maximum value ∗ indicates no limit
...
∗ on an edge is equivalent to a double line
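Cardinality limits of the form l..h can be checked mechanically against a set of relationships. The following is a minimal sketch under assumptions: the function name `within_limits`, the tuple layout of relationships, and the use of `None` to stand for the unlimited maximum ∗ are all hypothetical, and the sample data are invented for illustration.

```python
from collections import Counter


def within_limits(entities, relationships, position, low, high=None):
    """Check that every entity participates between low and high times.

    position selects which component of each relationship tuple refers
    to the entity set; high=None plays the role of '*' (no limit).
    """
    counts = Counter(rel[position] for rel in relationships)
    return all(
        counts[e] >= low and (high is None or counts[e] <= high)
        for e in entities
    )


customers = ["Jones", "Smith"]
loans = ["L-17", "L-23"]
borrower = [("Jones", "L-17"), ("Jones", "L-23")]

customer_0_star = within_limits(customers, borrower, 0, 0)    # limit 0..*
loan_1_1 = within_limits(loans, borrower, 1, 1, 1)            # limit 1..1
customer_1_star = within_limits(customers, borrower, 0, 1)    # limit 1..*
```

A minimum of 1 with no maximum is the total-participation check: it fails for `customers` here because Smith holds no loan.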
...
15
...
1, meaning the minimum and the maximum cardinality are
both 1
...
The limit 0
...
Thus, the relationship borrower is one to many from customer to loan, and
further the participation of loan in borrower is total
...
∗ on the edge between customer and borrower, and
think that the relationship borrower is many to one from customer to loan — this is
exactly the reverse of the correct interpretation
...
If we had specified a cardinality limit of 1
...
2
...
15
0
...
1
loan
Cardinality limits on relationship sets
...
6 Weak Entity Sets
An entity set may not have sufficient attributes to form a primary key
...
An entity set that has a primary key is termed a strong
entity set
...
Payment numbers are typically
sequential numbers, starting from 1, generated separately for each loan
...
Thus, this entity set does not have a primary key; it is a weak
entity set
...
Every weak entity must be associated
with an identifying entity; that is, the weak entity set is said to be existence dependent on the identifying entity set
...
The relationship associating the weak entity set with the
identifying entity set is called the identifying relationship
...
In our example, the identifying entity set for payment is loan, and a relationship
loan-payment that associates payment entities with their corresponding loan entities is
the identifying relationship
...
The discriminator of a weak entity set is a set of attributes that allows this distinction to be made
...
The discriminator
of a weak entity set is also called the partial key of the entity set
...
In the case of the entity
set payment, its primary key is {loan-number, payment-number}, where loan-number is
the primary key of the identifying entity set, namely loan, and payment-number distinguishes payment entities within the same loan
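The way the discriminator combines with the identifying entity set's key can be illustrated with a short check. This sketch uses invented payment rows, and the helper names `weak_entity_primary_key` and `is_unique` are assumptions made for the example.

```python
def weak_entity_primary_key(identifying_key, discriminator):
    """Primary key of a weak entity set: the identifying entity set's
    primary key followed by the discriminator."""
    return list(identifying_key) + list(discriminator)


def is_unique(rows, key):
    """Return True if no two rows agree on all attributes in key."""
    seen = set()
    for row in rows:
        value = tuple(row[attr] for attr in key)
        if value in seen:
            return False
        seen.add(value)
    return True


# Hypothetical payments: numbering restarts at 1 for each loan.
payments = [
    {"loan-number": "L-17", "payment-number": 1},
    {"loan-number": "L-17", "payment-number": 2},
    {"loan-number": "L-23", "payment-number": 1},
]

payment_key = weak_entity_primary_key(["loan-number"], ["payment-number"])
discriminator_alone = is_unique(payments, ["payment-number"])
full_key = is_unique(payments, payment_key)
```

The discriminator by itself repeats across loans, while the combined key distinguishes every payment.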
...
2
...
2
...
A weak entity set can participate in relationships other than the identifying relationship
...
A
weak entity set may participate as owner in an identifying relationship with another
weak entity set
...
A particular weak entity would then be identified by a combination
of entities, one from each identifying entity set
...
In E-R diagrams, a doubly outlined box indicates a weak entity set, and a doubly outlined diamond indicates the corresponding identifying relationship
...
16, the weak entity set payment depends on the strong entity set loan via the
relationship set loan-payment
...
Finally,
the arrow from loan-payment to loan indicates that each payment is for a single loan
...
In some cases, the database designer may choose to express a weak entity set as
a multivalued composite attribute of the owner entity set
...
A weak
entity set may be more appropriately modeled as an attribute if it participates in only
the identifying relationship, and if it has few attributes
...
[Figure: E-R diagram labels loan-number, amount, payment-number, payment-date, and loan.]
Figure 2
...
2
...
The same course may be offered in
different semesters, and within a semester there may be several sections for the same
course
...
2
...
In this section, we discuss the extended E-R features of specialization, generalization,
higher- and lower-level entity sets, attribute inheritance, and aggregation
...
7
...
For instance, a subset of entities within an entity set
may have attributes that are not shared by all the entities in the entity set
...
Consider an entity set person, with attributes name, street, and city
...
For example, customer
entities may be described further by the attribute customer-id, whereas employee entities may be described further by the attributes employee-id and salary
...
The specialization of person allows us to distinguish among persons according to whether they
are employees or customers
...
Savings accounts need a minimum
balance, but the bank may set interest rates differently for different customers, offering better rates to favored customers
...
The bank could then create two specializations of account, namely savings-account
and checking-account
...
The entity set savings-account would have all the
attributes of account and an additional attribute interest-rate
...
We can apply specialization repeatedly to refine a design scheme
...
For example, officer entities
may be described further by the attribute office-number, teller entities by the attributes
station-number and hours-per-week, and secretary entities by the attribute hours-per-week
...
An entity set may be specialized by more than one distinguishing feature
...
Another, coexistent, specialization could be based on whether the person
is a temporary (limited-term) employee or a permanent employee, resulting in the
entity sets temporary-employee and permanent-employee
...
For instance, a given employee may be a temporary employee who is a
secretary
...
17 shows
...
The ISA relationship may also be referred to as
a superclass-subclass relationship
...
2
...
2 Generalization
The refinement from an initial entity set into successive levels of entity subgroupings
represents a top-down design process in which distinctions are made explicit
...
The
database designer may have first identified a customer entity set with the attributes
name, street, city, and customer-id, and an employee entity set with the attributes name,
street, city, employee-id, and salary
...
This commonality can be
expressed by generalization, which is a containment relationship that exists between
a higher-level entity set and one or more lower-level entity sets
...
Higher- and lower-level entity sets also may be designated by the terms superclass
and subclass, respectively
...
For all practical purposes, generalization is a simple inversion of specialization
...
[Figure: ISA hierarchy with entity sets person, employee, customer, officer, teller, and secretary; attribute labels name, street, city, credit-rating, salary, hours-worked, office-number, and station-number.]
Figure 2
...
schema for an enterprise
...
New levels of entity representation will be
distinguished (specialization) or synthesized (generalization) as the design schema
comes to express fully the database application and the user requirements of the
database
...
Specialization stems from a single entity set; it emphasizes differences among entities within the set by creating distinct lower-level entity sets
...
Indeed, the reason a designer applies specialization is to represent such distinctive features
...
Generalization proceeds from the recognition that a number of entity sets share
some common features (namely, they are described by the same attributes and participate in the same relationship sets)
...
2
...
Generalization
is used to emphasize the similarities among lower-level entity sets and to hide the
differences; it also permits an economy of representation in that shared attributes are
not repeated
...
7
...
The attributes of the higher-level entity
sets are said to be inherited by the lower-level entity sets
...
Thus, customer is described by its name, street,
and city attributes, and additionally a customer-id attribute; employee is described by
its name, street, and city attributes, and additionally employee-id and salary attributes
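Attribute inheritance can be sketched as a walk up the ISA hierarchy. The dictionaries and the function name `all_attributes` below are hypothetical; the attribute lists follow the person, customer, employee, and officer examples in the text.

```python
def all_attributes(entity_set, parents, own_attributes):
    """Collect inherited attributes first, then the entity set's own."""
    attrs = []
    for parent in parents.get(entity_set, []):
        for attr in all_attributes(parent, parents, own_attributes):
            if attr not in attrs:      # avoid duplicates in a lattice
                attrs.append(attr)
    for attr in own_attributes.get(entity_set, []):
        if attr not in attrs:
            attrs.append(attr)
    return attrs


# Each lower-level entity set lists its higher-level entity sets.
parents = {"customer": ["person"],
           "employee": ["person"],
           "officer": ["employee"]}
own_attributes = {"person": ["name", "street", "city"],
                  "customer": ["customer-id"],
                  "employee": ["employee-id", "salary"],
                  "officer": ["office-number"]}

customer_attrs = all_attributes("customer", parents, own_attributes)
officer_attrs = all_attributes("officer", parents, own_attributes)
```

Because `parents` maps to a list, the same function also covers an entity set with more than one higher-level entity set.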
...
The officer, teller, and
secretary entity sets can participate in the works-for relationship set, since the superclass employee participates in the works-for relationship
...
The above entity sets can participate in any
relationships in which the person entity set participates
...
Figure 2
...
In the figure, employee is a lower-level
entity set of person and a higher-level entity set of the officer, teller, and secretary entity
sets
...
If an entity set is a lower-level entity set in more than one ISA relationship,
then the entity set has multiple inheritance, and the resulting structure is said to be
a lattice
...
7
...
One type of constraint involves
determining which entities can be members of a given lower-level entity set
...
In condition-defined lower-level entity sets, membership
is evaluated on the basis of whether or not an entity satisfies an explicit condition or predicate
...
account has the attribute account-type
...
Only those entities that satisfy the condition
account-type = “savings account” are allowed to belong to the lower-level entity set savings-account
...
Since all the lower-level entities are
evaluated on the basis of the same attribute (in this case, on account-type), this
type of generalization is said to be attribute-defined
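An attribute-defined specialization amounts to routing each entity by the value of one attribute. The sketch below is illustrative only: the function name, the sample accounts, and the mapping of “checking account” to checking-account are assumptions.

```python
def classify_by_attribute(entities, attribute, value_to_subset):
    """Assign each entity to the lower-level entity set selected by the
    value of one defining attribute."""
    subsets = {name: [] for name in value_to_subset.values()}
    for entity in entities:
        target = value_to_subset.get(entity[attribute])
        if target is not None:         # entities matching no condition
            subsets[target].append(entity["account-number"])
    return subsets


# Hypothetical account entities.
accounts = [
    {"account-number": "A-101", "account-type": "savings account"},
    {"account-number": "A-215", "account-type": "checking account"},
    {"account-number": "A-102", "account-type": "savings account"},
]

membership = classify_by_attribute(
    accounts,
    "account-type",
    {"savings account": "savings-account",
     "checking account": "checking-account"},
)
```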
...
User-defined lower-level entity sets are not constrained by a
membership condition; rather, the database user assigns entities to a given entity set
...
We therefore represent the
teams as four lower-level entity sets of the higher-level employee entity set
...
Instead, the user in charge of this decision makes the team assignment on an individual basis
...
A second type of constraint relates to whether or not entities may belong to more
than one lower-level entity set within a single generalization
...
A disjointness constraint requires that an entity belong to no more
than one lower-level entity set
...
• Overlapping
...
For an
illustration, consider the employee work team example, and assume that certain managers participate in more than one work team
...
Thus, the generalization is overlapping
...
The generalization is
overlapping if an employee can also be a customer
...
We can note a disjointness constraint in an E-R diagram by adding the word disjoint next to the triangle symbol
...
This
constraint may be one of the following:
• Total generalization or specialization
...
• Partial generalization or specialization
...
Partial generalization is the default
...
(This notation is similar to the notation for total participation
in a relationship
...
Because the higher-level entity set arrived at through
generalization is generally composed of only those entities in the lower-level entity
sets, the completeness constraint for a generalized higher-level entity set is usually
total
...
The work team entity sets illustrate a partial specialization
...
We may characterize the team entity sets more fully as a partial, overlapping specialization of employee
...
The completeness and disjointness constraints, however, do not depend on each other
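The independence of the two constraints is easy to see when each is written as its own check. This is a rough sketch with invented membership data; the function names `is_disjoint` and `is_total` are assumptions.

```python
def is_disjoint(lower_sets):
    """True if no entity belongs to more than one lower-level set."""
    seen = set()
    for members in lower_sets.values():
        if seen & set(members):
            return False
        seen |= set(members)
    return True


def is_total(higher_set, lower_sets):
    """True if every higher-level entity is in some lower-level set."""
    covered = set().union(*lower_sets.values())
    return set(higher_set) <= covered


# Hypothetical work teams: Jones is on two teams, Hayes on none.
employees = {"Jones", "Smith", "Hayes"}
teams = {"team-1": {"Jones", "Smith"}, "team-2": {"Jones"}}

# Hypothetical accounts: each is exactly one of the two kinds.
all_accounts = {"A-101", "A-215"}
kinds = {"savings-account": {"A-101"}, "checking-account": {"A-215"}}
```

The team data give a partial, overlapping specialization, while the account data give a total, disjoint one; each check is evaluated without reference to the other.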
...
We can see that certain insertion and deletion requirements follow from the constraints that apply to a given generalization or specialization
...
With a
condition-defined constraint, all higher-level entities that satisfy the condition must
be inserted into that lower-level entity set
...
2
...
5 Aggregation
One limitation of the E-R model is that it cannot express relationships among relationships
...
13)
...
Let us assume that there is an entity set manager
...
(A quaternary relationship is
required — a binary relationship between manager and employee would not permit us
to represent which (branch, job) combinations of an employee are managed by which
manager
...
18
...
)
It appears that the relationship sets works-on and manages can be combined into
one single relationship set
...
2
...
18
E-R diagram with redundant relationships
...
If the manager were a
value rather than a manager entity, we could instead make manager a multivalued attribute of the relationship works-on
...
Since the manager is a manager entity, this alternative is
ruled out in any case
...
Aggregation is an abstraction through which relationships are treated as higher-level entities
...
Such an entity set is treated in the same manner as is any other entity set
...
Figure 2
...
2
...
6 Alternative E-R Notations
Figure 2
...
There is
no universal standard for E-R diagram notation, and different books and E-R diagram
software use different notations; Figure 2
...
An entity set may be represented as a box with the name
outside, and the attributes listed one below the other within the box
...
[Figure: E-R diagram with job, employee, branch, and manager connected through works-on and manages.]
Figure 2
...
Cardinality constraints can be indicated in several different ways, as Figure 2
...
The labels ∗ and 1 on the edges out of the relationship are sometimes used for
depicting many-to-many, one-to-one, and many-to-one relationships, as the figure
shows
...
In
another alternative notation in the figure, relationship sets are represented by lines
between entity sets, without diamonds; only binary relationships can be modeled
thus
...
2
...
In this section, we consider how a database designer may
select from the wide range of alternatives
...
2
...
2
...
2
...
[Figure 2.20: Symbols used in the E-R notation. The symbols cover: entity set (E); weak entity set (E); relationship set (R); identifying relationship set for weak entity set (R); attribute (A); multivalued attribute (A); derived attribute (A); primary key (A); discriminating attribute of weak entity set (A); total participation of entity set in relationship; many-to-many, many-to-one, and one-to-one relationships (R); cardinality limits l..h; role name; ISA; and the disjoint label.]
...
6); a strong entity set
and its dependent weak entity sets may be regarded as a single “object” in the
database, since weak entities are existence dependent on a strong entity
• Whether using generalization (Section 2
...
2) is appropriate; generalization, or
a hierarchy of ISA relationships, contributes to modularity by allowing common attributes of similar entity sets to be represented in one place in an E-R
diagram
[Figure 2 ...: alternative E-R notations, showing entity set E with attributes A1, A2, A3 and primary key A1, and many-to-many, one-to-one, and many-to-one relationships R labeled with * and 1.]
• Whether using aggregation (Section 2
...
5) is appropriate; aggregation groups
a part of an E-R diagram into a single entity set, allowing us to treat the aggregate entity set as a single unit without concern for the details of its internal
structure
...
2
...
1 Design Phases
A high-level data model serves the database designer by providing a conceptual
framework in which to specify, in a systematic fashion, what the data requirements
of the database users are, and how the database will be structured to fulfill these
requirements
...
The database designer needs to interact
extensively with domain experts and users to carry out this task
...
Next, the designer chooses a data model, and by applying the concepts of the
chosen data model, translates these requirements into a conceptual schema of the
database
...
Since we have studied only the E-R model so far, we shall
2
...
Stated in terms of the E-R model, the schema
specifies all entity sets, relationship sets, attributes, and mapping constraints
...
She can also examine the design to remove
any redundant features
...
A fully developed conceptual schema will also indicate the functional requirements of the enterprise
...
Example
operations include modifying or updating data, searching for and retrieving specific
data, and deleting data
...
The process of moving from an abstract data model to the implementation of the
database proceeds in two final design phases
...
The designer uses the resulting system-specific database schema in the subsequent physical-design phase, in which the
physical features of the database are specified
...
In this chapter, we cover only the concepts of the E-R model as used in the conceptual-schema-design phase
...
Database design
receives a full treatment in Chapter 7
...
8
...
We employ the E-R data model to translate user requirements
into a conceptual design schema that is depicted as an E-R diagram
...
8
...
However, we do not attempt to model every
aspect of the database design for a bank; we consider only a few aspects, in order to
illustrate the process of database design
...
8
...
1 Data Requirements
The initial specification of user requirements may be based on interviews with the
database users, and on the designer’s own analysis of the enterprise
...
Here are the major characteristics of the banking enterprise
...
Each branch is located in a particular
city and is identified by a unique name
...
• Bank customers are identified by their customer-id values
...
Customers
may have accounts and can take out loans
...
• Bank employees are identified by their employee-id values
...
The bank also keeps track of the employee’s start date and, thus,
length of employment
...
Accounts can be held by more than one customer, and a customer can have more
than one account
...
The
bank maintains a record of each account’s balance, and the most recent date on
which the account was accessed by each customer holding the account
...
• A loan originates at a particular branch and can be held by one or more customers
...
For each loan, the bank
keeps track of the loan amount and the loan payments
...
The date and amount are recorded for each payment
...
Since the modeling requirements for that tracking are similar, and we
would like to keep our example application small, we do not keep track of such deposits and withdrawals in our model
...
8
...
2 Entity Sets Designation
Our specification of data requirements serves as the starting point for constructing a
conceptual schema for the database
...
8
...
1,
we begin to identify entity sets and their attributes:
• The branch entity set, with attributes branch-name, branch-city, and assets
...
A possible additional attribute is banker-name
...
Additional descriptive features are the multivalued attribute dependent-name, the base attribute start-date, and the derived attribute employment-length
...
• Two account entity sets — savings-account and checking-account — with the common attributes of account-number and balance; in addition, savings-account has
the attribute interest-rate and checking-account has the attribute overdraft-amount
...
• The weak entity set loan-payment, with attributes payment-number, payment-date, and payment-amount
...
8
...
3 Relationship Sets Designation
We now return to the rudimentary design scheme of Section 2
...
2
...
In the process, we also refine
some of the decisions we made earlier regarding attributes of entity sets
...
• loan-branch, a many-to-one relationship set that indicates in which branch a
loan originated
...
• loan-payment, a one-to-many relationship from loan to payment, which documents that a payment is made on a loan
...
• cust-banker, with relationship attribute type, a many-to-one relationship set expressing that a customer can be advised by a bank employee, and that a bank
employee can advise one or more customers
...
• works-for, a relationship set between employee entities with role indicators manager and worker; the mapping cardinalities express that an employee works
for only one manager and that a manager supervises one or more employees
...
2
...
2
...
8
...
3, we now present the completed E-R diagram for our example banking enterprise
...
22 depicts the full representation
of a conceptual model of a bank, expressed in terms of E-R concepts
...
8
...
1 and 2
...
2
...
8
...
3
...
2
...
22
interest-rate
overdraft-amount
E-R diagram for a banking enterprise
...
9 Reduction of an E-R Schema to Tables
We can represent a database that conforms to an E-R database schema by a collection
of tables
...
Each table has multiple columns, each of which has a unique name
...
Because the two models employ similar design principles, we can convert an E-R design into a relational design
...
Although important differences
2
...
In this section, we describe how an E-R schema can be represented by tables; and
in Chapter 3, we show how to generate a relational-database schema from an E-R
schema
...
We provide more details about this mapping in Chapter 6 after describing how to
specify constraints on tables
...
9
...
, an
...
Each row in this table corresponds to one entity of the entity
set E
...
8
...
We represent this entity set by
a table called loan, with two columns, as in Figure 2
...
The row
(L-17, 1000)
in the loan table means that loan number L-17 has a loan amount of $1000
...
We can also delete or
modify rows
...
Any row of the loan table must consist of a 2-tuple (v1 , v2 ), where v1 is a loan (that
is, v1 is in set D1 ) and v2 is an amount (that is, v2 is in set D2 )
...
We refer to the set of all
possible rows of loan as the Cartesian product of D1 and D2 , denoted by
D1 × D2
In general, if we have a table of n columns, we denote the Cartesian product of
D1 , D2 , · · · , Dn by
D1 × D2 × · · · × Dn−1 × Dn
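The relationship between a table and the Cartesian product of its column domains can be sketched in a few lines of Python. The two domains below are hypothetical and deliberately tiny; they are assumptions made for illustration, not values taken from the text.

```python
from itertools import product

# Hypothetical, deliberately tiny domains (assumed for illustration only).
D1 = {"L-11", "L-17"}   # permitted loan numbers
D2 = {900, 1000}        # permitted loan amounts

# Every possible row of a two-column table is an element of D1 x D2.
all_possible_rows = set(product(D1, D2))

# An actual loan table contains only some of the possible rows.
loan = {("L-11", 900), ("L-17", 1000)}

print(len(all_possible_rows))     # 4 (2 loan numbers times 2 amounts)
print(loan <= all_possible_rows)  # True: the table is a subset of D1 x D2
```

With n columns, the same call takes n domains, `product(D1, D2, ..., Dn)`.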
loan-number
L-11
L-14
L-15
L-16
L-17
L-23
L-93
Figure 2
...
customer-id    customer-name
019-28-3746    Smith
182-73-6091    Turner
192-83-7465    Johnson
244-66-8800    Curry
321-12-3123    Jones
335-57-7991    Adams
336-66-9999    Lindsay
677-89-9011    Hayes
963-96-3963    Williams
Figure 2
...
As another example, consider the entity set customer of the E-R diagram in Figure 2
...
This entity set has the attributes customer-id, customer-name, customer-street,
and customer-city
...
24
...
9
...
, am
...
Let the primary key of B consist of attributes b1 , b2 ,
...
We
represent the entity set A by a table called A with one column for each attribute of
the set:
{a1 , a2 ,
...
, bn }
As an illustration, consider the entity set payment in the E-R diagram of Figure 2
...
This entity set has three attributes: payment-number, payment-date, and payment-amount
...
Thus, we represent payment by a table with four columns labeled loan-number, payment-number, payment-date, and payment-amount, as in Figure 2
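As a rough sketch of this rule, the columns of the table for a weak entity set can be computed as the primary key of the strong entity set followed by the weak entity set's own attributes. The helper name `weak_entity_columns` is hypothetical and introduced only for this illustration.

```python
def weak_entity_columns(weak_attributes, strong_primary_key):
    """Columns of the table for a weak entity set.

    The result is the union of the strong entity set's primary key and the
    weak entity set's attributes, with the key columns listed first.
    """
    columns = list(strong_primary_key)
    for attribute in weak_attributes:
        if attribute not in columns:
            columns.append(attribute)
    return columns


payment_columns = weak_entity_columns(
    ["payment-number", "payment-date", "payment-amount"],  # attributes of payment
    ["loan-number"],                                       # primary key of loan
)
print(payment_columns)
# ['loan-number', 'payment-number', 'payment-date', 'payment-amount']
```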
...
2
...
3 Tabular Representation of Relationship Sets
Let R be a relationship set, let a1 , a2 ,
...
, bn
...
, am } ∪ {b1 , b2 ,
...
8
...
2
...
9
loan-number    payment-number    payment-date
L-11           53                7 June 2001
L-14           69                28 May 2001
L-15           22                23 May 2001
L-16           58                18 June 2001
L-17           5                 10 May 2001
L-17           6                 7 June 2001
L-17           7                 17 June 2001
L-23           11                17 May 2001
L-93           103               3 June 2001
L-93           104               13 June 2001
Figure 2
...
Since the relationship set has no attributes, the borrower table has two columns, labeled customer-id and loan-number, as shown in Figure 2
...
2
...
3
...
As we noted in Section 2
...
Furthermore, the primary key of a weak entity set includes the primary key of the strong entity set
...
16, the
weak entity set payment is dependent on the strong entity set loan via the relationship set loan-payment
...
Since loan-payment has no descriptive
attributes, the loan-payment table would have two columns, loan-number and payment-number
...
Every (loan-number, payment-number) combination in loan-payment would also be present in the payment table, and vice versa
...
customer-id
019-28-3746
019-28-3746
244-66-8800
321-12-3123
335-57-7991
555-55-5555
677-89-9011
963-96-3963
Figure 2
...
In general, the table for the relationship set linking a weak entity set to its corresponding strong entity set is redundant and does
not need to be present in a tabular representation of an E-R diagram
...
9
...
2 Combination of Tables
Consider a many-to-one relationship set AB from entity set A to entity set B
...
Suppose further that the participation of A in the relationship is total; that is, every
entity a in the entity set A must participate in the relationship AB
...
As an illustration, consider the E-R diagram of Figure 2
...
The double line in the
E-R diagram indicates that the participation of account in the account-branch is total
...
Further, the relationship set account-branch is many to one from account to branch
...
9
...
Suppose address is a composite attribute of entity set customer, and the components of address are street and city
...
2
...
5 Multivalued Attributes
We have seen that attributes in an E-R diagram generally map directly into columns
for the appropriate tables
...
[E-R diagram: entity sets account (account-number, balance) and branch (branch-name, branch-city, assets), linked by the relationship set account-branch]
Figure 2
...
2
...
As an illustration, consider the E-R diagram
in Figure 2
...
The diagram includes the multivalued attribute dependent-name
...
Each dependent of an employee is represented
as a unique row in the table
...
9
...
Although we refer to the generalization in Figure 2
...
1
...
For each lower-level entity set,
create a table that includes a column for each of the attributes of that entity set
plus a column for each attribute of the primary key of the higher-level entity
set
...
17, we have three tables:
• account, with attributes account-number and balance
• savings-account, with attributes account-number and interest-rate
• checking-account, with attributes account-number and overdraft-amount
2
...
Here, do not
create a table for the higher-level entity set
...
Then,
for the E-R diagram of Figure 2
...
• savings-account, with attributes account-number, balance, and interest-rate
• checking-account, with attributes account-number, balance, and overdraft-amount
The savings-account and checking-account relations corresponding to these
tables both have account-number as the primary key
...
Similarly, if the generalization
were not complete — that is, if some accounts were neither savings nor checking
accounts — then such accounts could not be represented with the second method
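The two methods can be compared side by side with a small sketch. The dictionary layout and the function names `method_one` and `method_two` are assumptions made for this illustration; only the attribute names come from the account example.

```python
# Higher-level entity set and its lower-level entity sets (account example).
higher = {
    "name": "account",
    "key": ["account-number"],
    "attributes": ["account-number", "balance"],
}
lower = {
    "savings-account": ["interest-rate"],
    "checking-account": ["overdraft-amount"],
}


def method_one(higher, lower):
    """One table for the higher-level set, plus one table per lower-level set
    holding the higher-level primary key and the lower-level attributes."""
    tables = {higher["name"]: list(higher["attributes"])}
    for name, attributes in lower.items():
        tables[name] = list(higher["key"]) + list(attributes)
    return tables


def method_two(higher, lower):
    """No table for the higher-level set; each lower-level table repeats
    all of the higher-level attributes."""
    return {
        name: list(higher["attributes"]) + list(attributes)
        for name, attributes in lower.items()
    }


print(method_one(higher, lower))
print(method_two(higher, lower))
```

Under the second method an account that is neither a savings nor a checking account has no table to live in, which is the limitation noted above.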
...
9
...
Consider the diagram of Figure 2
...
The table for the relationship set manages between the aggregation of works-on and the entity set manager includes a
column for each attribute in the primary keys of the entity set manager and the relationship set works-on
...
We then transform the relationship sets
and entity sets within the aggregated entity
...
10 The Unified Modeling Language UML∗∗
Entity-relationship diagrams help model the data representation component of a software system
...
Other components include models of user interactions with the system, specification of functional modules of the system and their interaction, etc
...
Some of the parts of UML are:
• Class diagram
...
Later in this
section we illustrate a few features of class diagrams and how they relate to
E-R diagrams
...
Use case diagrams show the interaction between users and
the system, in particular the steps of tasks that users perform (such as withdrawing money or registering for a course)
...
Activity diagrams depict the flow of tasks between various
components of a system
...
Implementation diagrams show the system components and their interconnections, both at the software component level and
the hardware component level
...
See the bibliographic notes for references on UML
...
Figure 2
...
We describe these constructs below
...
UML actually models objects, whereas E-R models entities
...
Class diagrams can depict methods in addition to attributes
...
We represent binary relationship sets in UML by just drawing a line connecting
the entity sets
...
We may also
specify the role played by an entity set in a relationship set by writing the role name
on the line, adjacent to the entity set
...
This box can then be treated as
[Figure: Symbols used in the UML class diagram notation, showing E-R diagram constructs (entity sets and attributes, relationship sets with roles, cardinality constraints, disjoint and overlapping generalization) beside their UML class diagram equivalents]
...
Nonbinary relationships cannot be directly represented in UML — they have to
be converted to binary relationships by the technique we have seen earlier in Section 2
...
3
...
Data Models
2
...
h, where l denotes the minimum and h the maximum number of relationships an entity can participate in
...
28
...
∗ on the E2 side and 0
...
Single values such as 1 or ∗ may be written on edges; the single value 1 on an edge
is treated as equivalent to 1
...
We represent generalization and specialization in UML by connecting entity sets
by a line with a triangle at the end corresponding to the more general entity set
...
UML
diagrams can also represent explicitly the constraints of disjoint/overlapping on generalizations
...
28 shows disjoint and overlapping generalizations of customer
and employee to person
...
An overlapping
generalization allows a person to be both a customer and an employee
...
11 Summary
• The entity-relationship (E-R) data model is based on a perception of a real
world that consists of a set of basic objects called entities, and of relationships
among these objects
...
It was developed to facilitate database design by allowing the specification of an enterprise schema
...
This overall structure can be expressed graphically by an E-R diagram
...
We express the distinction by associating with each entity a set
of attributes that describes the object
...
The collection of all
entities of the same type is an entity set, and the collection of all relationships
of the same type is a relationship set
...
• A superkey of an entity set is a set of one or more attributes that, taken collectively, allows us to identify uniquely an entity in the entity set
...
Similarly, a superkey of a relationship set
is a set of one or more attributes that, taken collectively, allows us to identify
uniquely a relationship in the relationship set
...
minimal superkey for each relationship set from among its superkeys; this is the
relationship set’s primary key
...
An entity set that has a primary key is termed a
strong entity set
...
Specialization
is the result of taking a subset of a higher-level entity set to form a lower-level entity set
...
The attributes of higher-level entity sets are inherited by lower-level entity sets
...
• The various features of the E-R model offer the database designer numerous
choices in how to best represent the enterprise being modeled
...
Aspects of the overall structure of the enterprise may be best described by using weak entity sets, generalization, specialization, or aggregation
...
• A database that conforms to an E-R diagram can be represented by a collection
of tables
...
Each table has a number of columns, each of which has a
unique name
...
• The unified modeling language (UML) provides a graphical means of modeling various components of a software system
...
However, there are some differences between the two that one must beware of
...
2
...
1 Explain the distinctions among the terms primary key, candidate key, and superkey
...
2 Construct an E-R diagram for a car-insurance company whose customers own
one or more cars each
...
2
...
Associate with each patient a log of the various tests and examinations conducted
...
4 A university registrar’s office maintains data about the following entities: (a)
courses, including number, title, credits, syllabus, and prerequisites; (b) course
offerings, including course number, year, semester, section number, instructor(s),
timings, and classroom; (c) students, including student-id, name, and program;
and (d) instructors, including identification number, name, department, and title
...
Construct an E-R diagram for the registrar’s office
...
2
...
a
...
b
...
Make sure that only one relationship
exists between a particular student and course-offering pair, yet you can
represent the marks that a student gets in different exams of a course offering
...
6 Construct appropriate tables for each of the E-R diagrams in Exercises 2
...
4
...
7 Design an E-R diagram for keeping track of the exploits of your favourite sports
team
...
Summary statistics should be modeled as derived attributes
2
...
2
...
2
...
Why, then, do we have weak entity sets?
2
...
Give two examples of where this concept is
useful
...
12 Consider the E-R diagram in Figure 2
...
a
...
b
...
The same music item may be present in cassette or compact disk
format, with differing prices
...
c
...
2
...
Why is allowing this redundancy a bad practice that one should avoid whenever
possible?
2
...
This database could be modeled as the single entity set exam, with attributes
course-name, section-number, room-number, and time
...
Show an E-R diagram illustrating the use of all three additional entity sets
listed
...
2
...
29
address
code
phone
E-R diagram for Exercise 2
...
b
...
2
...
a
...
Design three alternative E-R diagrams to represent the university registrar’s
office of Exercise 2
...
List the merits of each
...
2
...
What do the following mean in terms
of the structure of an enterprise schema?
a
...
b
...
2
...
4
...
30a) using binary relationships, as shown in Figure 2
...
Consider the alternative shown in
[Figure: three E-R diagrams over entity sets A, B, and C: (a) a single relationship set R; (b) an entity set E linked to A, B, and C by relationship sets RA, RB, and RC; (c) relationship sets R1, R2, and R3]
Figure 2
...
17 (attributes not shown)
...
30c
...
2
...
4
...
30b
...
Show a simple instance of E, A, B, C, RA , RB , and RC that cannot correspond to any instance of A, B, C, and R
...
Modify the E-R diagram of Figure 2
...
c
...
d
...
Show how to treat E as a weak entity set so that a primary key attribute
is not required
...
19 A weak entity set can always be made into a strong entity set by adding to its
attributes the primary key attributes of its identifying entity set
...
2
...
The company sells motorcycles, passenger cars, vans, and buses
...
Explain why they
should not be placed at a higher or lower level
...
2
...
21 Explain the distinction between condition-defined and user-defined constraints
...
2
...
2
...
2
...
31 shows a lattice structure of generalization and specialization
...
Discuss how to handle a case where an attribute of X
has the same name as some attribute of Y
...
25 Draw the UML equivalents of the E-R diagrams of Figures 2
...
10, 2
...
13
and 2
...
2
...
Assume that both banks
use exactly the same E-R database schema — the one in Figure 2
...
(This assumption is, of course, highly unrealistic; we consider the more realistic case in
Section 19
...
) If the merged bank is to have a single database, there are several
potential problems:
• The possibility that the two original banks have branches with the same
name
• The possibility that some customers are customers of both original banks
• The possibility that some loan or account numbers were used at both original banks (for different loans or accounts, of course)
For each of these potential problems, describe why there is indeed a potential
for difficulties
...
For your solution, explain any
changes that would have to be made and describe what their effect would be on
the schema and the data
...
27 Reconsider the situation described for Exercise 2
...
As before, the
banks use the schema of Figure 2
...
S
...
What problems (be-
X
ISA
A
Figure 2
...
24 (attributes not shown)
...
Data Models
2
...
24) might occur in this multinational case?
How would you resolve them? Be sure to consider both the scheme and the
actual data values in constructing your answer
...
A logical design methodology for
relational databases using the extended E-R model is presented by Teorey et al
...
Mapping from extended E-R models to the relational model is discussed by Lyngbaek
and Vianu [1987] and Markowitz and Shoshani [1992]
...
[1981]),
GORDAS (Elmasri and Wiederhold [1981]), and ERROL (Markowitz and Raz [1983])
...
Smith and Smith [1977] introduced the concepts of generalization, specialization,
and aggregation and Hammer and McLeod [1980] expanded them
...
Thalheim [2000] provides a detailed textbook coverage of research in E-R modeling
...
[1992] and Elmasri and
Navathe [2000]
...
[1983] provide a collection of papers on the E-R model
...
These tools help a designer create E-R diagrams, and they can automatically create corresponding tables in a database
...
There are also some database-independent data modeling tools that support E-R diagrams and UML class diagrams
...
rational
...
visio
...
cai
...
3
Relational Model
The relational model is today the primary data model for commercial data-processing
applications
...
In this chapter, we first study the fundamentals of the relational model, which provides a very simple yet powerful way of representing data
...
The three we cover in this chapter are not user-friendly, but instead serve as
the formal basis for user-friendly query languages that we study later
...
The relational algebra forms
the basis of the widely used SQL query language
...
The
domain relational calculus is the basis of the QBE query language
...
We study the part of this theory
dealing with queries in this chapter
...
3
...
Each table has a structure similar to that presented in Chapter 2, where
we represented E-R databases by tables
...
Since a table is a collection of such relationships, there is a
close correspondence between the concept of table and the mathematical concept of
relation, from which the relational data model takes its name
...
In this chapter, we shall be using a number of different relations to illustrate the
various concepts underlying the relational data model
...
They differ slightly from the tables that were used in Chapter 2, so that we can simplify our presentation
...
3
...
1 Basic Structure
Consider the account table of Figure 3
...
It has three column headers: account-number,
branch-name, and balance
...
For each
attribute, there is a set of permitted values, called the domain of that attribute
...
Let
D1 denote the set of all account numbers, D2 the set of all branch names, and D3
the set of all balances
...
In general, account will contain only a subset of the set of all possible
rows
...
This definition corresponds almost exactly with our definition of table
...
Because tables are essentially relations, we shall use the mathematical
account-number
A-101
A-102
A-201
A-215
A-217
A-222
A-305
Figure 3
...
account-number    branch-name    balance
A-101             Downtown       500
A-215             Mianus         700
A-102             Perryridge     400
A-305             Round Hill     350
A-201             Brighton       900
A-222             Redwood        700
A-217             Brighton       750
Figure 3
...
The account relation with unordered tuples
...
A tuple variable is a
variable that stands for a tuple; in other words, a tuple variable is a variable whose
domain is the set of all tuples
...
1, there are seven tuples
...
We use the notation t[account-number] to denote
the value of t on the account-number attribute
...
Alternatively, we may write t[1] to denote the value
of tuple t on the first attribute (account-number), t[2] to denote branch-name, and so on
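The two ways of addressing a tuple component, by attribute name and by position, might be sketched as follows. The helper `value` is a hypothetical name, and representing a tuple as a Python tuple alongside a separate schema is an assumption of this sketch.

```python
ACCOUNT_SCHEMA = ("account-number", "branch-name", "balance")


def value(t, attribute, schema=ACCOUNT_SCHEMA):
    """Return t[attribute], where attribute is a name or a 1-based position."""
    if isinstance(attribute, int):
        return t[attribute - 1]          # t[1] is the first attribute
    return t[schema.index(attribute)]    # look the name up in the schema


t = ("A-101", "Downtown", 500)

print(value(t, "account-number"))  # A-101
print(value(t, 1))                 # A-101
print(value(t, 2))                 # Downtown
```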
...
The order in which tuples appear in a relation is irrelevant, since a relation is a
set of tuples
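A short sketch makes the point concrete. Here a relation is modeled as a Python set of tuples (an assumption of this illustration), so two listings of the same tuples in different orders denote the same relation.

```python
# The same three account tuples, listed in two different orders.
sorted_listing = [
    ("A-101", "Downtown", 500),
    ("A-102", "Perryridge", 400),
    ("A-201", "Brighton", 900),
]
unsorted_listing = [
    ("A-201", "Brighton", 900),
    ("A-101", "Downtown", 500),
    ("A-102", "Perryridge", 400),
]

# As lists (ordered sequences) the two listings differ ...
print(sorted_listing == unsorted_listing)            # False

# ... but as relations (sets of tuples) they are equal.
print(set(sorted_listing) == set(unsorted_listing))  # True
```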
...
1, or are unsorted, as in Figure 3
...
We require that, for all relations r, the domains of all attributes of r be atomic
...
For example, the set of integers is an atomic domain, but the set of all sets of integers
is a nonatomic domain
...
The important issue is not what the domain itself is,
but rather how we use domain elements in our database
...
In
all our examples, we shall assume atomic domains
...
It is possible for several attributes to have the same domain
...
It is possible that the attributes customer-name and employee-name will
have the same domain: the set of all person names, which at the physical level is
the set of all character strings
...
It is perhaps less clear whether customer-name
and branch-name should have the same domain
...
However, at the logical level, we may
want customer-name and branch-name to have distinct domains
...
Data Models
3
...
For example, suppose
that we include the attribute telephone-number in the customer relation
...
We would then have to resort to null values to signify that the value is unknown or does not exist
...
We shall assume null values are absent initially, and in Section 3
...
4, we
describe the effect of nulls on different operations
...
1
...
The concept of a relation corresponds to the programming-language notion of a
variable
...
It is convenient to give a name to a relation schema, just as we give names to type
definitions in programming languages
...
Following this notation, we use Account-schema to denote the relation
schema for relation account
...
We shall not be concerned about the precise definition of the domain of
each attribute until we discuss the SQL language in Chapter 4
...
The value of a given variable may change with time;
similarly the contents of a relation instance may change with time as the relation is
updated
...
”
As an example of a relation instance, consider the branch relation of Figure 3
...
The
schema for that relation is
Branch-schema = (branch-name, branch-city, assets)
Note that the attribute branch-name appears in both Branch-schema and Account-schema
...
Rather, using common attributes in
relation schemas is one way of relating tuples of distinct relations
...
3
...
located in Brooklyn
...
Then, for each such branch, we would look in the account relation to find the information about the accounts maintained at that branch
...
Let us continue our banking example
...
The relation schema is
Customer-schema = (customer-name, customer-street, customer-city)
Figure 3
...
Note that we have
omitted the customer-id attribute, which we used in Chapter 2, because now we want to
have smaller relation schemas in our running example of a bank database
...
customer-name
Adams
Brooks
Curry
Glenn
Green
Hayes
Johnson
Jones
Lindsay
Smith
Turner
Williams
Figure 3
...
In a real-world database, the customer-id (which could be a social-security number, or
an identifier generated by the bank) would serve to uniquely identify customers
...
The relation schema to describe this association is
Depositor-schema = (customer-name, account-number)
Figure 3
...
It would appear that, for our banking example, we could have just one relation
schema, rather than several
...
Suppose that we used only one
relation for our example, with schema
(branch-name, branch-city, assets, customer-name, customer-street,
customer-city, account-number, balance)
Observe that, if a customer has several accounts, we must list her address once for
each account
...
This repetition is wasteful and is avoided by the use of several relations, as in our example
...
To represent
incomplete tuples, we must use null values that signify that the value is unknown or
does not exist
...
By using several relations, we can represent the branch information for a bank with no customers without using null values
...
In Chapter 7, we shall study criteria to help us decide when one set of relation
schemas is more appropriate than another, in terms of information repetition and
the existence of null values
...
We include two additional relations to describe data about loans maintained in the
various branches in the bank:
customer-name
Hayes
Johnson
Johnson
Jones
Lindsay
Smith
Turner
Figure 3
...
loan-number    branch-name    amount
L-11           Round Hill     900
L-14           Downtown       1500
L-15           Perryridge     1500
L-16           Perryridge     1300
L-17           Downtown       1000
L-23           Redwood        2000
L-93           Mianus         500
Figure 3
...
The loan relation
...
6 and 3
...
The E-R diagram in Figure 3
...
The relation schemas correspond to the set of tables that we might generate by the method outlined in Section 2
...
Note that the tables for account-branch and
loan-branch have been combined into the tables for account and loan respectively
...
Finally, we
note that the customer relation may contain information about customers who have
neither an account nor a loan at the bank
...
On occasion, we shall need to introduce additional
relation schemas to illustrate particular points
...
1
...
customer-name
Adams
Curry
Hayes
Jackson
Jones
Smith
Smith
Williams
Figure 3
...
[E-R diagram for the banking enterprise: entity sets account (account-number, balance), branch (branch-name, branch-city, assets), customer (customer-name, customer-street, customer-city), and loan (loan-number), with relationship sets account-branch, depositor, loan-branch, and borrower]
Figure 3
...
For example, in Branch-schema, {branch-name} and {branch-name, branch-city} are both superkeys
...
However, {branch-name} is a candidate
key, and for our purpose also will serve as a primary key
...
Let R be a relation schema
...
That is, if t1 and t2 are in r and t1 ≠ t2, then
t1[K] ≠ t2[K]
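The condition that distinct tuples must differ on K can be checked mechanically against one relation instance. The sketch below uses the account tuples of this chapter; the function name `distinguishes_all_tuples` is hypothetical. A single instance can only refute a superkey, never establish it, since the definition quantifies over every legal relation on the schema.

```python
ACCOUNT_SCHEMA = ("account-number", "branch-name", "balance")
account = {
    ("A-101", "Downtown", 500),
    ("A-215", "Mianus", 700),
    ("A-102", "Perryridge", 400),
    ("A-305", "Round Hill", 350),
    ("A-201", "Brighton", 900),
    ("A-222", "Redwood", 700),
    ("A-217", "Brighton", 750),
}


def distinguishes_all_tuples(r, schema, K):
    """True if no two distinct tuples of r agree on every attribute in K."""
    positions = [schema.index(attribute) for attribute in K]
    restricted = {tuple(t[i] for i in positions) for t in r}
    # If two distinct tuples agreed on K, the restriction would collapse them.
    return len(restricted) == len(r)


print(distinguishes_all_tuples(account, ACCOUNT_SCHEMA, ["account-number"]))  # True
print(distinguishes_all_tuples(account, ACCOUNT_SCHEMA, ["branch-name"]))     # False
```

The second call fails because two accounts are held at the Brighton branch, so branch-name alone cannot be a superkey of Account-schema.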
...
The primary key of the entity set becomes the primary key
of the relation
...
The table, and thus the relation, corresponding to a weak
entity set includes
The attributes of the weak entity set
The primary key of the strong entity set on which the weak entity set
depends
3
...
• Relationship set
...
If the relationship is many-to-many, this superkey is also the primary key
...
4
...
Recall from Section 2
...
3 that no table is generated for relationship sets linking a weak entity set to the corresponding strong
entity set
...
Recall from Section 2
...
3 that a binary many-to-one relationship set from A to B can be represented by a table consisting of the attributes of A and attributes (if any exist) of the relationship set
...
For one-to-one relationship sets, the relation
is constructed like that for a many-to-one relationship set
...
• Multivalued attributes
...
9
...
The primary key of the entity or relationship set, together
with the attribute C, becomes the primary key for the relation
...
This attribute is called a foreign key from r1 , referencing r2
...
For example, the attribute branch-name in
Account-schema is a foreign key from Account-schema referencing Branch-schema, since
branch-name is the primary key of Branch-schema
...
It is customary to list the primary key attributes of a relation schema before the
other attributes; for example, the branch-name attribute of Branch-schema is listed first,
since it is the primary key
...
1
...
Figure 3
...
Each relation appears as a box, with the attributes listed inside it and the relation name above it
...
[Schema diagram for the banking enterprise: branch (branch-name, branch-city, assets); account (account-number, branch-name, balance); depositor (customer-name, account-number); customer (customer-name, customer-street, customer-city); loan (loan-number, branch-name, amount)]
Figure 3
...
Foreign key dependencies appear as arrows from the foreign key attributes of the referencing
relation to the primary key of the referenced relation
...
In particular, E-R diagrams
do not show foreign key attributes explicitly, whereas schema diagrams show them
explicitly
...
3
...
5 Query Languages
A query language is a language in which a user requests information from the database
...
Query languages can be categorized as either procedural or nonprocedural
...
In a nonprocedural language, the user describes the desired information without giving a specific
procedure for obtaining that information
...
We shall study
the very widely used query language SQL in Chapter 4
...
In this chapter, we examine “pure” languages: The relational algebra is procedural, whereas the tuple relational calculus and domain relational calculus are nonprocedural
...
Although we shall be concerned with only queries initially, a complete data-manipulation language includes not only a query language, but also a language for
database modification
...
as well as commands to modify parts of existing tuples
...
3
...
It consists of a set of operations
that take one or two relations as input and produce a new relation as their result
...
In addition to the fundamental operations, there are
several other operations— namely, set intersection, natural join, division, and assignment
...
3
...
1 Fundamental Operations
The select, project, and rename operations are called unary operations, because they
operate on one relation
...
3
...
1
...
We use the lowercase
Greek letter sigma (σ) to denote selection
...
The argument relation is in parentheses after the σ
...
6, then the relation that results from the
preceding query is as shown in Figure 3
...
We can find all tuples in which the amount lent is more than $1200 by writing
σ_{amount > 1200} (loan)
In general, we allow comparisons using =, ≠, <, ≤, >, ≥ in the selection predicate
...
Thus, to find those tuples pertaining to loans
of more than $1200 made by the Perryridge branch, we write
σ_{branch-name = “Perryridge” ∧ amount > 1200} (loan)
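The select operation can be imitated with a set comprehension. In this sketch a relation is a Python set of tuples and the predicate is an ordinary function; the helper name `select` and the positional access `t[1]`, `t[2]` are assumptions of the illustration, while the tuples are those of the loan relation.

```python
# loan relation over (loan-number, branch-name, amount)
loan = {
    ("L-11", "Round Hill", 900),
    ("L-14", "Downtown", 1500),
    ("L-15", "Perryridge", 1500),
    ("L-16", "Perryridge", 1300),
    ("L-17", "Downtown", 1000),
    ("L-23", "Redwood", 2000),
    ("L-93", "Mianus", 500),
}


def select(r, predicate):
    """Return the tuples of relation r that satisfy the predicate."""
    return {t for t in r if predicate(t)}


# Loans of more than $1200.
large = select(loan, lambda t: t[2] > 1200)

# Loans of more than $1200 made by the Perryridge branch.
large_perryridge = select(loan, lambda t: t[1] == "Perryridge" and t[2] > 1200)

print(sorted(t[0] for t in large))             # ['L-14', 'L-15', 'L-16', 'L-23']
print(sorted(t[0] for t in large_perryridge))  # ['L-15', 'L-16']
```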
loan-number
L-15
L-16
Figure 3
...
The selection predicate may include comparisons between two attributes
...
To find all customers who have the
same name as their loan officer, we can write
σ_{customer-name = banker-name} (loan-officer)
3
...
1
...
The project operation allows us to produce this relation
...
Since a relation is a set, any duplicate rows are eliminated
...
We list those attributes that
we wish to appear in the result as a subscript to Π
...
Thus, we write the query to list all loan numbers and the amount of the
loan as
Π_{loan-number, amount} (loan)
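A minimal sketch of projection follows, again treating a relation as a Python set of tuples. The helper name `project` is hypothetical. Because the result is built as a set, duplicate rows disappear without any extra step.

```python
LOAN_SCHEMA = ("loan-number", "branch-name", "amount")
loan = {
    ("L-11", "Round Hill", 900),
    ("L-14", "Downtown", 1500),
    ("L-15", "Perryridge", 1500),
    ("L-16", "Perryridge", 1300),
    ("L-17", "Downtown", 1000),
    ("L-23", "Redwood", 2000),
    ("L-93", "Mianus", 500),
}


def project(r, schema, attributes):
    """Keep only the listed attributes of each tuple; duplicates collapse."""
    positions = [schema.index(attribute) for attribute in attributes]
    return {tuple(t[i] for i in positions) for t in r}


numbers_and_amounts = project(loan, LOAN_SCHEMA, ["loan-number", "amount"])
branches = project(loan, LOAN_SCHEMA, ["branch-name"])

print(len(numbers_and_amounts))  # 7: loan-number is distinct in every tuple
print(len(branches))             # 5: Downtown and Perryridge each appear once
```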
Figure 3
...
3
...
1
...
Consider the more complicated query “Find those customers who live in Harrison
...
In general, since the result of a relational-algebra operation is of the same type
(relation) as its inputs, relational-algebra operations can be composed together into
loan-number
L-11
L-14
L-15
L-16
L-17
L-23
L-93
Figure 3
...
3
...
Composing relational-algebra operations into relational-algebra expressions is just like composing arithmetic operations (such as +, −,
∗, and ÷) into arithmetic expressions
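Composition can be sketched by nesting one operation inside another, just as arithmetic operators nest. The example below applies a projection to the result of a selection on the loan relation; it uses loan rather than customer only because the loan tuples are available in this chapter. The helper names `select` and `project` are hypothetical.

```python
LOAN_SCHEMA = ("loan-number", "branch-name", "amount")
loan = {
    ("L-11", "Round Hill", 900),
    ("L-14", "Downtown", 1500),
    ("L-15", "Perryridge", 1500),
    ("L-16", "Perryridge", 1300),
    ("L-17", "Downtown", 1000),
    ("L-23", "Redwood", 2000),
    ("L-93", "Mianus", 500),
}


def select(r, predicate):
    """Tuples of r that satisfy the predicate."""
    return {t for t in r if predicate(t)}


def project(r, schema, attributes):
    """Restriction of every tuple of r to the listed attributes."""
    positions = [schema.index(attribute) for attribute in attributes]
    return {tuple(t[i] for i in positions) for t in r}


# The argument of project is itself an expression that evaluates to a relation.
result = project(
    select(loan, lambda t: t[2] > 1200),
    LOAN_SCHEMA,
    ["loan-number"],
)
print(sorted(result))  # [('L-14',), ('L-15',), ('L-16',), ('L-23',)]
```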
...
2
...
3
...
1
...
Note that the customer relation does not contain the information,
since a customer does not need to have either an account or a loan at the bank
...
5) and
in the borrower relation (Figure 3
...
We know how to find the names of all customers
with a loan in the bank:
Πcustomer-name (borrower)
We also know how to find the names of all customers with an account in the bank:
Πcustomer-name (depositor)
To answer the query, we need the union of these two sets; that is, we need all customer names that appear in either or both of the two relations
...
So the expression needed
is
Πcustomer-name (borrower) ∪ Πcustomer-name (depositor)
The result relation for this query appears in Figure 3
...
Notice that there are 10 tuples
in the result, even though there are seven distinct borrowers and six depositors
...
Since relations are sets, duplicate values are eliminated
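The tuple count can be checked with a short Python sketch. The borrower names are those in the borrower × loan figures of this chapter; the depositor names are taken from the customer-name column of the depositor ⋈ account figure later in the chapter. The `union` helper and its arity check are assumptions of the sketch.

```python
# One-attribute relations: sets of 1-tuples (customer_name,).
borrower_names = {(n,) for n in
                  ["Adams", "Curry", "Hayes", "Jackson", "Jones", "Smith", "Williams"]}
depositor_names = {(n,) for n in
                   ["Hayes", "Johnson", "Jones", "Lindsay", "Smith", "Turner"]}

def union(r, s):
    """Set union, refusing operands whose tuples differ in arity."""
    if len({len(t) for t in r | s}) > 1:
        raise ValueError("union requires relations of the same arity")
    return r | s

either = union(borrower_names, depositor_names)
print(len(borrower_names), len(depositor_names), len(either))
```

Seven plus six names give only ten tuples because Hayes, Jones, and Smith appear in both operands.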
...
12
Names of all customers who have either a loan or an account
...
Data Models
3
...
In general, we must ensure that unions are taken between compatible relations
...
The former is a relation of three attributes;
the latter is a relation of two
...
Such a union would not make sense in most situations
...
The relations r and s must be of the same arity
...
2
...
Note that r and s can be, in general, temporary relations that are the result of relational-algebra expressions
...
2
...
5 The Set Difference Operation
The set-difference operation, denoted by −, allows us to find tuples that are in one
relation but are not in another
...
We can find all customers of the bank who have an account but not a loan by
writing
Πcustomer-name (depositor) − Πcustomer-name (borrower)
The result relation for this query appears in Figure 3
...
As with the union operation, we must ensure that set differences are taken between compatible relations
...
3
...
1
...
We write the Cartesian product of relations r1 and
r2 as r1 × r2
...
13
Customers with an account but no loan
...
Data Models
101
© The McGraw−Hill
Companies, 2001
3
...
2
The Relational Algebra
93
Recall that a relation is by definition a subset of a Cartesian product of a set of
domains
...
However, since the same attribute name
may appear in both r1 and r2 , we need to devise a naming schema to distinguish
between these attributes
...
For example, the relation schema
for r = borrower × loan is
(borrower
...
loan-number, loan
...
branch-name, loan
...
loan-number from loan
...
For
those attributes that appear in only one of the two schemas, we shall usually drop
the relation-name prefix
...
We can
then write the relation schema for r as
(customer-name, borrower
...
loan-number,
branch-name, amount)
This naming convention requires that the relations that are the arguments of the
Cartesian-product operation have distinct names
...
A similar problem arises if we use the result of a relational-algebra expression in a
Cartesian product, since we shall need a name for the relation so that we can refer
to the relation’s attributes
...
2
...
7, we see how to avoid these problems by
using a rename operation
...
Thus, r is a large
relation, as you can see from Figure 3
...
Assume that we have n1 tuples in borrower and n2 tuples in loan
...
In particular, note that for some tuples t in r, it may be that
t[borrower
...
loan-number]
...
Relation R contains all tuples t for which
there is a tuple t1 in r1 and a tuple t2 in r2 for which t[R1 ] = t1 [R1 ] and t[R2 ] =
t2 [R2 ]
...
We need the information in both the loan relation and the borrower
relation to do so
...
15
...
However, the customer-name column may contain customers
customer-name  borrower.loan-number  loan.loan-number  branch-name  amount
Adams          L-16                  L-11              Round Hill   900
Adams          L-16                  L-14              Downtown     1500
Adams          L-16                  L-15              Perryridge   1500
Adams          L-16                  L-16              Perryridge   1300
Adams          L-16                  L-17              Downtown     1000
Adams          L-16                  L-23              Redwood      2000
Adams          L-16                  L-93              Mianus       500
Curry          L-93                  L-11              Round Hill   900
Curry          L-93                  L-14              Downtown     1500
Curry          L-93                  L-15              Perryridge   1500
Curry          L-93                  L-16              Perryridge   1300
Curry          L-93                  L-17              Downtown     1000
Curry          L-93                  L-23              Redwood      2000
Curry          L-93                  L-93              Mianus       500
Hayes          L-15                  L-11              Round Hill   900
Hayes          L-15                  L-14              Downtown     1500
Hayes          L-15                  L-15              Perryridge   1500
Hayes          L-15                  L-16              Perryridge   1300
Hayes          L-15                  L-17              Downtown     1000
Hayes          L-15                  L-23              Redwood      2000
Hayes          L-15                  L-93              Mianus       500
...
Figure 3.14 Result of borrower × loan
customer-name  borrower.loan-number  loan.loan-number  branch-name  amount
Adams          L-16                  L-15              Perryridge   1500
Adams          L-16                  L-16              Perryridge   1300
Curry          L-93                  L-15              Perryridge   1500
Curry          L-93                  L-16              Perryridge   1300
Hayes          L-15                  L-15              Perryridge   1500
Hayes          L-15                  L-16              Perryridge   1300
Jackson        L-14                  L-15              Perryridge   1500
Jackson        L-14                  L-16              Perryridge   1300
Jones          L-17                  L-15              Perryridge   1500
Jones          L-17                  L-16              Perryridge   1300
Smith          L-11                  L-15              Perryridge   1500
Smith          L-11                  L-16              Perryridge   1300
Smith          L-23                  L-15              Perryridge   1500
Smith          L-23                  L-16              Perryridge   1300
Williams       L-17                  L-15              Perryridge   1500
Williams       L-17                  L-16              Perryridge   1300
Figure 3.15 Result of σbranch-name = “Perryridge” (borrower × loan)
...
(If you do not see why that is true,
recall that the Cartesian product takes all possible pairings of one tuple from borrower
with one tuple of loan
...
loan-number
= loan
...
So, if we write
σborrower
...
loan-number
(σbranch-name = “Perryridge” (borrower × loan))
we get only those tuples of borrower × loan that pertain to customers who have a
loan at the Perryridge branch
...
loan-number = loan
...
16, is the correct answer to our query
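The three steps (Cartesian product, two selections, projection) can be traced in Python. This is a sketch under assumptions: tuples are positional, and the combined tuple layout is (customer-name, borrower.loan-number, loan.loan-number, branch-name, amount). The rows come from the figures in this section.

```python
from itertools import product

borrower = {("Adams", "L-16"), ("Curry", "L-93"), ("Hayes", "L-15"),
            ("Jackson", "L-14"), ("Jones", "L-17"), ("Smith", "L-11"),
            ("Smith", "L-23"), ("Williams", "L-17")}
loan = {("L-11", "Round Hill", 900), ("L-14", "Downtown", 1500),
        ("L-15", "Perryridge", 1500), ("L-16", "Perryridge", 1300),
        ("L-17", "Downtown", 1000), ("L-23", "Redwood", 2000),
        ("L-93", "Mianus", 500)}

# borrower x loan: every pairing of one borrower tuple with one loan tuple.
pairs = {b + l for b, l in product(borrower, loan)}

# sigma_{branch-name = "Perryridge"}: many rows pair a customer with
# a Perryridge loan that is not theirs.
at_perryridge = {t for t in pairs if t[3] == "Perryridge"}

# sigma_{borrower.loan-number = loan.loan-number}: keep only genuine pairings.
matching = {t for t in at_perryridge if t[1] == t[2]}

# Pi_{customer-name}
names = {(t[0],) for t in matching}
print(len(pairs), len(at_perryridge), sorted(names))
```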
...
2
...
7 The Rename Operation
Unlike relations in the database, the results of relational-algebra expressions do not
have a name that we can use to refer to them
...
Data Models
3
...
16 Result of Πcustomer -name
(σborrower
...
loan-number
(σbranch-name = “Perryridge” (borrower × loan)))
...
Given a relational-algebra expression E, the expression
ρx (E)
returns the result of expression E under the name x
...
Thus,
we can also apply the rename operation to a relation r to get the same relation under
a new name
...
Assume that a relational-algebra expression E has arity n
...
,An ) (E)
returns the result of expression E under the name x, and with the attributes renamed
to A1 , A2 ,
...
To illustrate renaming a relation, we consider the query “Find the largest account
balance in the bank
...
Step 1: To compute the temporary relation, we need to compare the values of
all account balances
...
First, we need to devise a mechanism to distinguish between
the two balance attributes
...
balance
500
400
700
750
350
Figure 3
...
balance (σaccount
...
balance (account × ρd (account)))
...
balance
900
Figure 3
...
We can now write the temporary relation that consists of the balances that are not
the largest:
Πaccount
...
balance
< d
...
The result contains all
balances except the largest one
...
17 shows this relation
...
balance (σaccount
...
balance
(account × ρd (account)))
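A short Python sketch of the same two-step idea, using only the balance values listed in the figures here. Pairing the relation with itself plays the role of account × ρd (account); the variable names are assumptions of the sketch.

```python
from itertools import product

# Balance values that appear in the account relation.
balances = {500, 400, 700, 750, 350, 900}

# Step 1: a balance is "not the largest" if some other balance exceeds it.
# The second copy of `balances` stands in for the renamed relation d.
not_largest = {a for a, d in product(balances, balances) if a < d}

# Step 2: remove those from the set of all balances.
largest = balances - not_largest
print(sorted(not_largest), largest)
```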
Figure 3
...
As one more example of the rename operation, consider the query “Find the names
of all customers who live on the same street and in the same city as Smith
...
In the following query, we use the rename
operation on the preceding expression to give its result the name smith-addr, and to
rename its attributes to street and city, instead of customer-street and customer-city:
Πcustomer
...
customer -street =smith-addr
...
customer -city=smith-addr
...
4, appears in Figure 3
...
The rename operation is not strictly required, since it is possible to use a positional
notation for attributes
...
refer to the first attribute, the second attribute, and
so on
...
customer-name
Curry
Smith
Figure 3
...
The following relational-algebra expression illustrates the use of positional notation
with the unary operator σ:
σ$2=$3 (R × R)
If a binary operation needs to distinguish between its two operand relations, a similar
positional notation can be used for relation names as well
...
However, the
positional notation is inconvenient for humans, since the position of the attribute is a
number, rather than an easy-to-remember attribute name
...
3
...
2 Formal Definition of the Relational Algebra
The operations in Section 3
...
1 allow us to give a complete definition of an expression
in the relational algebra
...
A general expression in relational algebra is constructed out of smaller subexpressions
...
Then, these are all relational-algebra expressions:
• E1 ∪ E2
• E1 − E2
• E1 × E2
• σP (E1 ), where P is a predicate on attributes in E1
• ΠS (E1 ), where S is a list consisting of some of the attributes in E1
• ρx (E1 ), where x is the new name for the result of E1
3
...
3 Additional Operations
The fundamental operations of the relational algebra are sufficient to express any
relational-algebra query
...
Therefore, we define additional operations that do not add any power to the algebra, but simplify common
queries
...
1
...
3, we introduce operations that extend the power of the relational algebra, to handle null
and aggregate values
...
3
...
3
...
Suppose that we wish to find all customers who have both a loan and an
account
...
20
...
It is simply more convenient to write r ∩ s than to write
r − (r − s)
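The identity is easy to confirm on the customer names used in this chapter. A minimal Python check, with the two name sets assumed from the borrower and depositor figures:

```python
r = {"Adams", "Curry", "Hayes", "Jackson", "Jones", "Smith", "Williams"}  # borrowers
s = {"Hayes", "Johnson", "Jones", "Lindsay", "Smith", "Turner"}           # depositors

via_difference = r - (r - s)   # uses only the fundamental set-difference operation
via_intersection = r & s       # the convenience operation

print(sorted(via_intersection))
```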
...
2
...
2 The Natural-Join Operation
It is often desirable to simplify certain queries that require a Cartesian product
...
Consider the query “Find the names of all customers
who have a loan at the bank, along with the loan number and the loan amount
...
Then, we select
those tuples that pertain to only the same loan-number, followed by the projection of
the resulting customer-name, loan-number, and amount:
Πcustomer -name, loan
...
loan-number = loan
...
It is denoted by the “join” symbol ⋈
...
Although the definition of natural join is complicated, the operation is easy to
apply
...
” We express this query
customer-name
Hayes
Jones
Smith
Figure 3
...
customer-name
Adams
Curry
Hayes
Jackson
Jones
Smith
Smith
Williams
Figure 3
...
by using the natural join as follows:
Πcustomer-name, loan-number, amount (borrower ⋈ loan)
Since the schemas for borrower and loan (that is, Borrower-schema and Loan-schema)
have the attribute loan-number in common, the natural-join operation considers only
pairs of tuples that have the same value on loan-number
...
After performing the projection, we obtain the relation in Figure 3
...
Consider two relation schemas R and S — which are, of course, lists of attribute
names
...
Similarly, those attribute names that
appear in R but not S are denoted by R − S, whereas S − R denotes those attribute
names that appear in S but not in R
...
We are now ready for a formal definition of the natural join
...
The natural join of r and s, denoted by r ⋈ s, is a relation on schema
R ∪ S formally defined as follows:
r ⋈ s = ΠR ∪ S (σr.A1 = s.A1 ∧ r.A2 = s.A2 ∧ ... ∧ r.An = s.An (r × s))
where R ∩ S = {A1, A2, ..., An}
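One way to see the definition at work is a small Python sketch in which each tuple is a dict from attribute name to value. The `natural_join` helper and the underscore attribute names are assumptions of the sketch; only a few rows of borrower and loan are used.

```python
def natural_join(r, s):
    """Pair tuples that agree on every shared attribute; keep one copy of each."""
    result = []
    for tr in r:
        for ts in s:
            common = tr.keys() & ts.keys()          # R intersect S
            if all(tr[a] == ts[a] for a in common):
                merged = {**tr, **ts}               # schema R union S
                if merged not in result:
                    result.append(merged)
    return result

borrower = [{"customer_name": "Adams", "loan_number": "L-16"},
            {"customer_name": "Hayes", "loan_number": "L-15"},
            {"customer_name": "Jones", "loan_number": "L-17"}]
loan = [{"loan_number": "L-15", "branch_name": "Perryridge", "amount": 1500},
        {"loan_number": "L-16", "branch_name": "Perryridge", "amount": 1300},
        {"loan_number": "L-17", "branch_name": "Downtown", "amount": 1000}]

joined = natural_join(borrower, loan)
for row in joined:
    print(row)
```

When the two schemas share no attribute, `common` is empty, `all(...)` is true for every pair, and the function returns the full Cartesian product.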
...
branch-name
Brighton
Perryridge
Figure 3
...
3
...
Πbranch-name (σcustomer-city = “Harrison” (customer ⋈ account ⋈ depositor))
The result relation for this query appears in Figure 3
...
Notice that we wrote customer ⋈ account ⋈ depositor without inserting
parentheses to specify the order in which the natural-join operations on the
three relations should be executed
...
That is, the natural join is associative
...
Πcustomer-name (borrower ⋈ depositor)
Note that in Section 3
...
3
...
We repeat this expression here
...
20
...
• Let r(R) and s(S) be relations without any attributes in common; that is,
R ∩ S = ∅
...
) Then, r ⋈ s = r × s
...
Consider
relations r(R) and s(S), and let θ be a predicate on attributes in the schema R ∪ S
...
2
...
3 The Division Operation
The division operation, denoted by ÷, is suited to queries that include the phrase
“for all
...
We can obtain all branches in Brooklyn by the expression
r1 = Πbranch-name (σbranch-city = “Brooklyn” (branch))
The result relation for this expression appears in Figure 3
...
branch-name
Brighton
Downtown
Figure 3
...
We can find all (customer-name, branch-name) pairs for which the customer has an
account at a branch by writing
r2 = Πcustomer-name, branch-name (depositor ⋈ account)
Figure 3
...
Now, we need to find customers who appear in r2 with every branch name in
r1
...
We
formulate the query by writing
Πcustomer-name, branch-name (depositor ⋈ account)
÷ Πbranch-name (σbranch-city = “Brooklyn” (branch))
The result of this expression is a relation that has the schema (customer-name) and that
contains the tuple (Johnson)
...
The relation r ÷ s is a relation on schema R − S (that
is, on the schema containing all attributes of schema R that are not in schema S)
...
t is in ΠR−S (r)
2
...
tr [S] = ts [S]
b
...
Let r(R) and s(S) be given, with S ⊆ R:
r ÷ s = ΠR−S (r) − ΠR−S ((ΠR−S (r) × s) − ΠR−S,S (r))
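The formula translates almost term by term into Python. In this sketch r is a set of (customer-name, branch-name) pairs and s is a set of branch names. The customer names, the two Brooklyn branches, and the answer (Johnson) come from the text; the branch paired with each customer is an assumption made for illustration.

```python
def divide(r, s):
    """r: set of (x, y) pairs; s: set of y values. Returns the x values
    that are paired in r with every y in s."""
    candidates = {x for x, _ in r}                        # Pi_{R-S}(r)
    all_pairs = {(x, y) for x in candidates for y in s}   # Pi_{R-S}(r) x s
    missing = {x for x, _ in all_pairs - r}               # x lacking some y
    return candidates - missing

# Hypothetical pairing of customers with branches (see the note above).
r2 = {("Hayes", "Perryridge"), ("Johnson", "Downtown"), ("Johnson", "Brighton"),
      ("Jones", "Brighton"), ("Lindsay", "Redwood"), ("Smith", "Mianus"),
      ("Turner", "Round Hill")}
r1 = {"Brighton", "Downtown"}   # the branches located in Brooklyn

print(divide(r2, r1))
```

Jones is excluded because the pair (Jones, Downtown) is in the product but not in r2.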
customer-name
Hayes
Johnson
Johnson
Jones
Lindsay
Smith
Turner
Figure 3
...
3
...
The expression on the right
side of the set difference operator
ΠR−S ((ΠR−S (r) × s) − ΠR−S,S (r))
serves to eliminate those tuples that fail to satisfy the second condition of the definition of division
...
Consider ΠR−S (r) × s
...
The expression
ΠR−S,S (r) merely reorders the attributes of r
...
If a tuple tj is in
ΠR−S ((ΠR−S (r) × s) − ΠR−S,S (r))
then there is some tuple ts in s that does not combine with tuple tj to form a tuple in
r
...
It is these
values that we eliminate from ΠR−S (r)
...
2
...
4 The Assignment Operation
It is convenient at times to write a relational-algebra expression by assigning parts of
it to temporary relation variables
...
To illustrate this operation, consider the
definition of division in Section 3
...
3
...
We could write r ÷ s as
temp1 ← ΠR−S (r)
temp2 ← ΠR−S ((temp1 × s) − ΠR−S,S (r))
result = temp1 − temp2
The evaluation of an assignment does not result in any relation being displayed to
the user
...
This relation variable may be used in subsequent
expressions
...
For relational-algebra queries, assignment must
always be made to a temporary relation variable
...
We discuss this issue in Section 3
...
Note
that the assignment operation does not provide any additional power to the algebra
...
3
...
A simple
extension is to allow arithmetic operations as part of projection
...
3
...
customer-name  limit  credit-balance
Curry          2000   1750
Hayes          1500   1500
Jones          6000   700
Smith          2000   400
Figure 3.25 The credit-info relation
...
Another important extension is the outer-join operation, which
allows relational-algebra expressions to deal with null values, which model missing
information
...
3
...
The generalized projection operation has the form
ΠF1 ,F2 ,
...
, Fn is an arithmetic expression involving constants and attributes in the schema of E
...
For example, suppose we have a relation credit-info, as in Figure 3
...
If we want to
find how much more each person can spend, we can write the following expression:
Πcustomer-name, limit − credit-balance (credit-info)
The attribute resulting from the expression limit − credit -balance does not have a
name
...
As a notational convenience, renaming of attributes can be
combined with generalized projection as illustrated below:
Πcustomer-name, (limit − credit-balance) as credit-available (credit-info)
The second attribute of this generalized projection has been given the name credit-available
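A minimal Python sketch of this generalized projection. The limit and credit-balance pairs are those of the credit-info relation; matching each pair to a customer name is inferred from the credit-available values in the result figure, so treat that pairing as an assumption.

```python
# (customer_name, limit, credit_balance)
credit_info = [
    ("Curry", 2000, 1750),
    ("Hayes", 1500, 1500),
    ("Jones", 6000, 700),
    ("Smith", 2000, 400),
]

# Pi_{customer-name, (limit - credit-balance) as credit-available}(credit-info)
credit_available = {(name, limit - balance) for name, limit, balance in credit_info}
print(sorted(credit_available))
```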
...
26 shows the result of applying this expression to the relation in
Figure 3
...
3
...
2 Aggregate Functions
Aggregate functions take a collection of values and return a single value as a result
...
Thus, the function sum applied on the collection
{1, 1, 3, 4, 4, 11}
customer-name  credit-available
Curry          250
Jones          5300
Smith          1600
Hayes          0
Figure 3.26 The result of Πcustomer-name, (limit − credit-balance) as credit-available (credit-info)
...
The aggregate function avg returns the average of the values
...
The aggregate function count returns the number of the elements in the collection, and returns 6 on
the preceding collection
...
The collections on which aggregate functions operate can have multiple occurrences of a value; the order in which the values appear is not relevant
...
Sets are a special case of multisets where there is only one
copy of each element
...
27, for part-time employees
...
The relational-algebra expression
for this query is:
𝒢sum(salary) (pt-works)
The symbol 𝒢 is the letter G in calligraphic font; read it as “calligraphic G
...
The result of the expression
above is a relation with a single attribute, containing a single row with a numerical
value corresponding to the sum of all the salaries of all employees working part-time
in the bank
...
branch-name  salary
Perryridge   1500
Perryridge   1300
Perryridge   5300
Downtown     1500
Downtown     1300
Downtown     2500
Austin       1500
Austin       1600
Figure 3.27 The pt-works relation
...
Data Models
3
...
If we do want to eliminate duplicates, we use the
same function names as before, with the addition of the hyphenated string “distinct”
appended to the end of the function name (for example, count-distinct)
...
”
In this case, a branch name counts only once, regardless of the number of employees
working that branch
...
27, the result of this query is a single row containing the
value 3
...
To do so, we
need to partition the relation pt-works into groups based on the branch, and to apply
the aggregate function on each group
...
Figure 3
...
The expression sum(salary) in
the right-hand subscript of G indicates that for each group of tuples (that is, each
branch), the aggregation function sum must be applied on the collection of values of
the salary attribute
...
29
...
,Gn GF1 (A1 ), F2 (A2 ),
...
, Gn constitute a list of attributes on which to group; each Fi is an aggregate function; and each Ai is an attribute name
employee-name
Rao
Sato
Johnson
Loreena
Peterson
Adams
Brown
Gopal
Figure 3
...
3
...
29
Result of branch-name 𝒢sum(salary) (pt-works)
...
The meaning of the operation is as follows
...
All tuples in a group have the same values for G1 , G2 ,
...
2
...
, Gn
...
, Gn
...
, gn ), the result has a tuple (g1 , g2 ,
...
, am ) where, for
each i, ai is the result of applying the aggregate function Fi on the multiset of values
for attribute Ai in the group
...
, Gn can
be empty, in which case there is a single group containing all tuples in the relation
...
Going back to our earlier example, if we want to find the maximum salary for
part-time employees at each branch, in addition to the sum of the salaries, we write
the expression
branch-name 𝒢sum(salary), max(salary) (pt-works)
As in generalized projection, the result of an aggregation operation does not have a
name
...
As
a notational convenience, attributes of an aggregation operation can be renamed as
illustrated below:
branch-name 𝒢sum(salary) as sum-salary, max(salary) as max-salary (pt-works)
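Grouped aggregation can be sketched in Python with a dictionary of groups. The (branch-name, salary) rows are those of the pt-works relation; the variable names are assumptions of the sketch, and employee names are omitted because they play no part in the computation.

```python
from collections import defaultdict

# (branch_name, salary) rows of pt-works.
pt_works = [("Perryridge", 1500), ("Perryridge", 1300), ("Perryridge", 5300),
            ("Downtown", 1500), ("Downtown", 1300), ("Downtown", 2500),
            ("Austin", 1500), ("Austin", 1600)]

# Partition on the grouping attribute, keeping duplicates (a multiset per group).
groups = defaultdict(list)
for branch, salary in pt_works:
    groups[branch].append(salary)

# Apply sum and max to each group.
by_branch = {branch: (sum(vals), max(vals)) for branch, vals in groups.items()}

# With no grouping attributes there is a single group holding every tuple.
total_salary = sum(salary for _, salary in pt_works)

# count-distinct(branch-name): duplicates are removed before counting.
branch_count = len({branch for branch, _ in pt_works})

print(by_branch, total_salary, branch_count)
```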
Figure 3
...
branch-name  sum-salary  max-salary
Austin       3100        1600
Downtown     5300        2500
Perryridge   8100        5300
Figure 3
...
employee-name
Coyote
Rabbit
Smith
Williams
employee-name
Coyote
Rabbit
Gates
Williams
Figure 3
...
3
...
3 Outer Join
The outer-join operation is an extension of the join operation to deal with missing
information
...
31
...
A possible approach would be to use the natural-join operation as follows:
employee ⋈ ft-works
The result of this expression appears in Figure 3
...
Notice that we have lost the street
and city information about Smith, since the tuple describing Smith is absent from
the ft-works relation; similarly, we have lost the branch name and salary information
about Gates, since the tuple describing Gates is absent from the employee relation
...
There are
actually three forms of the operation: left outer join, denoted ⟕; right outer join, denoted ⟖; and full outer join, denoted ⟗
...
The results of the expressions
employee-name
Coyote
Rabbit
Williams
street
Toon
Tunnel
Seaview
Figure 3
...
salary
1500
1300
1500
employee-name  street    city          branch-name  salary
Coyote         Toon      Hollywood     Mesa         1500
Rabbit         Tunnel    Carrotville   Mesa         1300
Williams       Seaview   Seattle       Redmond      1500
Smith          Revolver  Death Valley  null         null
Figure 3.33 Result of employee ⟕ ft-works
...
33, 3
...
35, respectively
...
In Figure 3
...
All information from
the left relation is present in the result of the left outer join
...
In Figure 3
...
Thus, all information from the right relation is present in the
result of the right outer join
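The three forms differ only in which unmatched tuples are kept and padded. A hedged Python sketch follows, with each relation stored as a dict keyed by employee-name (which assumes names are unique) and `None` standing in for null. The `outer_join` helper is hypothetical, and the branch shown for Gates is an assumption, since only the salary survives in this copy.

```python
employee = {
    "Coyote": ("Toon", "Hollywood"),
    "Rabbit": ("Tunnel", "Carrotville"),
    "Smith": ("Revolver", "Death Valley"),
    "Williams": ("Seaview", "Seattle"),
}
ft_works = {
    "Coyote": ("Mesa", 1500),
    "Rabbit": ("Mesa", 1300),
    "Gates": ("Redmond", 5300),   # branch assumed for illustration
    "Williams": ("Redmond", 1500),
}
NULL_PAIR = (None, None)

def outer_join(left, right, keep_left, keep_right):
    """Join on the key; optionally keep unmatched tuples from either side,
    padding the missing attributes with nulls."""
    names = set(left) & set(right)
    if keep_left:
        names |= set(left)
    if keep_right:
        names |= set(right)
    return {(n,) + left.get(n, NULL_PAIR) + right.get(n, NULL_PAIR) for n in names}

inner = outer_join(employee, ft_works, False, False)
left_outer = outer_join(employee, ft_works, True, False)
right_outer = outer_join(employee, ft_works, False, True)
full_outer = outer_join(employee, ft_works, True, True)
print(len(inner), len(left_outer), len(right_outer), len(full_outer))
```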
...
Figure 3
...
Since outer join operations may generate results containing null values, we need
to specify how the different relational-algebra operations deal with null values
...
3
...
It is interesting to note that the outer join operations can be expressed by the basic
relational-algebra operations
...
, null)}
where the constant relation {(null,
...
employee-name
Coyote
Rabbit
Williams
Gates
street
Toon
Tunnel
Seaview
null
Figure 3
...
salary
1500
1300
1500
5300
employee-name
Coyote
Rabbit
Williams
Smith
Gates
street
Toon
Tunnel
Seaview
Revolver
null
Figure 3
...
3
...
4 Null Values∗∗
In this section, we define how the various relational algebra operations deal with null
values and complications that arise when a null value participates in an arithmetic
operation or in a comparison
...
Operations and comparisons on null values should therefore be avoided,
where possible
...
Similarly, any comparisons (such as <, <=, >, >=, =) involving a null value evaluate to special value unknown; we cannot say for sure whether the result of the
comparison is true or false, so we say that the result is the new truth value unknown
...
We must therefore define how the three Boolean operations deal with the truth value unknown
...
• or: (true or unknown) = true; (false or unknown) = unknown; (unknown or unknown) = unknown
...
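The three-valued connectives can be sketched in Python, using `None` for the truth value unknown. The `or3` cases mirror the rules stated above; `and3` and `not3` follow the usual three-valued definitions and are an assumption as far as this copy of the text is concerned.

```python
def and3(a, b):
    """Three-valued and: false dominates, then unknown."""
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True

def or3(a, b):
    """Three-valued or: true dominates, then unknown."""
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False

def not3(a):
    """Three-valued not: unknown stays unknown."""
    return None if a is None else (not a)

def less_than(x, y):
    """A comparison that yields unknown when either operand is null."""
    return None if x is None or y is None else x < y

# Selection keeps a tuple only when the predicate is definitely true.
amounts = [900, None, 1500]
kept = [a for a in amounts if less_than(a, 1200) is True]
print(or3(True, None), or3(False, None), and3(False, None), kept)
```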
We are now in a position to outline how the different relational operations deal
with null values
...
• select: The selection operation evaluates predicate P in σP (E) on each tuple t
in E
...
Otherwise,
if the predicate returns unknown or false, t is not added to the result
...
Thus,
the definition of how selection handles nulls also defines how join operations
handle nulls
...
3
...
Thus, if two tuples in the projection result are exactly
the same, and both have nulls in the same fields, they are treated as duplicates
...
• union, intersection, difference: These operations treat nulls just as the projection operation does; they treat tuples that have the same values on all fields as
duplicates even if some of the fields have null values in both tuples
...
• generalized projection: We outlined how nulls are handled in expressions
at the beginning of Section 3
...
4
...
• aggregate: When nulls occur in grouping attributes, the aggregate operation
treats them just as in projection: If two tuples are the same on all grouping
attributes, the operation places them in the same group, even if some of their
attribute values are null
...
If the resultant multiset is empty,
the aggregate result is null
...
However, this would mean
a single unknown value in a large group could make the aggregate result on
the group to be null, and we would lose a lot of useful information
...
Such tuples may be added to the
result (depending on whether the operation is ⟕, ⟖, or ⟗), padded with
nulls
...
4 Modification of the Database
We have limited our attention until now to the extraction of information from the
database
...
We express database modifications by using the assignment operation
...
2
...
3
...
1 Deletion
We express a delete request in much the same way as a query
...
We can delete only whole tuples; we cannot delete values on only particular attributes
...
Here are several examples of relational-algebra delete requests:
• Delete all of Smith’s account records
...
loan ← loan − σamount≥0 and amount≤50 (loan)
• Delete all accounts at branches located in Needham
...
3
...
2 Insertion
To insert data into a relation, we either specify a tuple to be inserted or write a query
whose result is a set of tuples to be inserted
...
Similarly, tuples inserted
must be of the correct arity
...
We express the insertion
of a single tuple by letting E be a constant relation containing one tuple
...
We write
account ← account ∪ {(A-973, “Perryridge”, 1200)}
depositor ← depositor ∪ {(“Smith”, A-973)}
More generally, we might want to insert tuples on the basis of the result of a query
...
Let the loan number serve as the account number
for this savings account
...
Instead of specifying a tuple as we did earlier, we specify a set of tuples that is inserted into both the account and depositor relation
...
Each tuple in the depositor
relation has as customer-name the name of the loan customer who is being given the
new account and the same account number as the corresponding account tuple
...
4
...
We can use the generalized-projection operator to do this task:
r ← ΠF1 ,F2 ,
...
If we want to select some tuples from r and to update only them, we can use
the following expression; here, P denotes the selection condition that chooses which
tuples to update:
r ← ΠF1 ,F2 ,
...
We write
account ← Πaccount-number, branch-name, balance
∗1
...
We write
account ← ΠAN,BN, balance ∗1
...
05 (σbalance≤10000 (account))
where the abbreviations AN and BN stand for account-number and branch-name, respectively
...
5 Views
In our examples up to this point, we have operated at the logical-model level
...
It is not desirable for all users to see the entire logical model
...
Consider a person who
needs to know a customer’s loan number and branch name, but has no need to see
the loan amount
...
An employee in the advertising department, for example, might like to see a relation
consisting of the customers who have either an account or a loan at the bank, and
the branches with which they do business
...
It is possible to support a large number of views on
top of any given set of actual relations
...
5
...
To define a view, we must give
the view a name, and must state the query that computes the view
...
The view
name is represented by v
...
We
wish this view to be called all-customer
...
Using the view all-customer, we can find all customers
of the Perryridge branch by writing
Πcustomer-name (σbranch-name = “Perryridge” (all-customer))
Recall that we wrote the same query in Section 3
...
1 without using views
...
We study the issue of update
operations on views in Section 3
...
2
...
Suppose
that we define relation r1 as follows:
r1 ← Πbranch-name, customer-name (depositor ⋈ account)
∪ Πbranch-name, customer-name (borrower ⋈ loan)
We evaluate the assignment operation once, and r1 does not change when we update the relations depositor, account, loan, or borrower
...
Intuitively, at any given time, the set of tuples in the view relation is the result of
evaluation of the query expression that defines the view at that time
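The contrast between an assignment and a view can be illustrated in Python: an assignment is a value computed once, while a view behaves like a function that is re-evaluated on each use. The data and names below are hypothetical, and the two base sets stand in for the projected joins in the definition of r1.

```python
# Hypothetical (branch_name, customer_name) pairs derived from the base relations.
account_pairs = {("Perryridge", "Hayes"), ("Brighton", "Johnson")}
loan_pairs = {("Downtown", "Jones")}

def all_customer():
    """View: the defining expression is evaluated each time the view is used."""
    return account_pairs | loan_pairs

# Assignment: evaluated once; r1 is a snapshot of the result at this moment.
r1 = account_pairs | loan_pairs

# Update a base relation afterwards.
loan_pairs.add(("Perryridge", "Adams"))

print(("Perryridge", "Adams") in r1)              # snapshot is now out of date
print(("Perryridge", "Adams") in all_customer())  # view reflects the update
```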
...
Thus, if a view relation is computed and stored, it may become out of date if the
relations used to define it are modified
...
When we define a view, the database system stores the definition of the
view itself, rather than the result of evaluation of the relational-algebra expression
that defines the view
...
Thus, whenever we evaluate the query, the view relation
gets recomputed
...
Such views are called materialized views
...
5
...
Of course, the benefits to queries
from the materialization of a view must be weighed against the storage costs and the
added overhead for updates
...
5
...
The difficulty is that a modification
to the database expressed in terms of a view must be translated to a modification to
the actual relations in the logical model of the database
...
Let loan-branch be the view given to the clerk
...
However, to insert a tuple into loan, we must have some value for amount
...
• Insert a tuple (L-37, “Perryridge”, null) into the loan relation
...
3
...
36
amount
900
1500
1500
1300
1000
2000
500
1900
loan-number
L-16
L-93
L-15
L-14
L-17
L-11
L-23
L-17
null
Tuples inserted into loan and borrower
...
Consider the following insertion through this view:
loan-info ← loan-info ∪ {(“Johnson”, 1900)}
The only possible method of inserting tuples into the borrower and loan relations is to
insert (“Johnson”, null) into borrower and (null, null, 1900) into loan
...
36
...
Thus, there is no way to update the relations borrower and loan by using nulls
to get the desired update on loan-info
...
Different database systems specify different
conditions under which they permit updates on view relations; see the database
system manuals for details
...
3
...
3 Views Defined by Using Other Views
In Section 3
...
1 we mentioned that view relations may appear in any place that a
relation name may appear, except for restrictions on the use of views in update expressions
3
...
Thus, one view may be used in the expression defining another view
...
View expansion is one way to define the meaning of views defined in terms of
other views
...
For example, if v1 is used in the definition of v2, v2 is used in the
definition of v3, and v3 is used in the definition of v1, then each of v1, v2, and v3
is recursive
...
2
...
A view relation stands for the expression defining the view, and therefore
a view relation can be replaced by the expression that defines it
...
Hence, view expansion of an expression
repeats the replacement step as follows:
repeat
Find any view relation vi in e1
Replace the view relation vi by the expression defining vi
until no more view relations are present in e1
As long as the view definitions are not recursive, this loop will terminate
...
As an illustration of view expansion, consider the following expression:
σcustomer-name = “John” (perryridge-customer)
The view-expansion procedure initially generates
σcustomer-name = “John” (Πcustomer-name (σbranch-name = “Perryridge”
(all-customer)))
It then generates
σcustomer-name = “John” (Πcustomer-name (σbranch-name = “Perryridge”
(Πbranch-name, customer-name (depositor ⋈ account)
∪ Πbranch-name, customer-name (borrower ⋈ loan))))
There are no more uses of view relations, and view expansion terminates
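The expansion just traced can be sketched in Python by representing expressions as nested tuples and view definitions as a dictionary. The tuple encoding, the operator tags, and the `expand` function are assumptions of this sketch; it uses recursion rather than the repeat loop, which reaches the same result when no view is recursive.

```python
# View definitions: view name -> defining expression.
# An expression is a relation or view name (str) or a tuple (operator, *operands).
views = {
    "all-customer": (
        "union",
        ("project", "branch-name, customer-name", ("join", "depositor", "account")),
        ("project", "branch-name, customer-name", ("join", "borrower", "loan")),
    ),
    "perryridge-customer": (
        "project", "customer-name",
        ("select", 'branch-name = "Perryridge"', "all-customer"),
    ),
}

def expand(expr):
    """Replace every view name by its definition until none remain.
    Terminates only if the view definitions are not recursive."""
    if isinstance(expr, str):
        return expand(views[expr]) if expr in views else expr
    return tuple(expand(part) for part in expr)

query = ("select", 'customer-name = "John"', "perryridge-customer")
expanded = expand(query)
print(expanded)
```

Strings that are not view names (operator tags, predicates, attribute lists, base relations) pass through unchanged.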
...
Data Models
3
...
6 The Tuple Relational Calculus
When we write a relational-algebra expression, we provide a sequence of procedures
that generates the answer to our query
...
It describes the desired information without giving
a specific procedure for obtaining that information
...
Following our
earlier notation, we use t[A] to denote the value of tuple t on attribute A, and we use
t ∈ r to denote that tuple t is in relation r
...
2
...
6
...
To write this query in the tuple relational calculus, we need to write
an expression for a relation on the schema (loan-number)
...
To
express this request, we need the construct “there exists” from mathematical logic
...
”
Using this notation, we can write the query “Find the loan number for each loan
of an amount greater than $1200” as
{t | ∃ s ∈ loan (t[loan-number] = s[loan-number]
∧ s[amount] > 1200)}
In English, we read the preceding expression as “The set of all tuples t such that there
exists a tuple s in relation loan for which the values of t and s for the loan-number
attribute are equal, and the value of s for the amount attribute is greater than $1200
...
Thus, the result is a relation on (loan-number)
...
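The same query can be read as a set comprehension. The sketch below is an analogy in Python, not a calculus evaluator; the sample rows are made up for illustration.

```python
# Illustrative sample rows; the values are assumptions made for this sketch.
loan = [
    {"loan-number": "L-1", "branch-name": "Perryridge", "amount": 1500},
    {"loan-number": "L-2", "branch-name": "Downtown", "amount": 900},
    {"loan-number": "L-3", "branch-name": "Perryridge", "amount": 1300},
]

# {t | ∃ s ∈ loan (t[loan-number] = s[loan-number] ∧ s[amount] > 1200)}
# Each result tuple t carries only the loan-number attribute.
result = {(s["loan-number"],) for s in loan if s["amount"] > 1200}
```

Because the result is a set of one-attribute tuples, it is a relation on (loan-number), as the text states.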
” This query is slightly more complex than the previous queries,
since it involves two relations: borrower and loan
...
We write the query as follows:
3
...
” Tuple variable u ensures that the
customer is a borrower at the Perryridge branch
...
Figure 3
...
To find all customers who have a loan, an account, or both at the bank, we used
the union operation in the relational algebra
...
• The customer-name appears in some tuple of the depositor relation as a depositor of the bank
...
The result of this query appeared earlier in Figure 3
...
If we now want only those customers who have both an account and a loan at the
bank, all we need to do is to change the or (∨) to and (∧) in the preceding expression
...
20
...
” The tuple-relational-calculus expression for this
query is similar to the expressions that we have just seen, except for the use of the not
(¬) symbol:
{t | ∃ u ∈ depositor (t[customer-name] = u[customer-name])
∧ ¬ ∃ s ∈ borrower (t[customer-name] = s[customer-name])}
customer-name
Adams
Hayes
Figure 3
...
This tuple-relational-calculus expression uses the ∃ u ∈ depositor (
...
) clause to eliminate those customers who appear in some tuple of the
borrower relation as having a loan from the bank
...
13
...
The formula
P ⇒ Q means “P implies Q”; that is, “if P is true, then Q must be true
...
The use of implication rather than not and
or often suggests a more intuitive interpretation of a query in English
...
2
...
” To
write this query in the tuple relational calculus, we introduce the “for all” construct,
denoted by ∀
...
”
We write the expression for our query as follows:
{t | ∃ r ∈ customer (r[customer-name] = t[customer-name]) ∧
( ∀ u ∈ branch (u[branch-city] = “Brooklyn” ⇒
∃ s ∈ depositor (t[customer-name] = s[customer-name]
∧ ∃ w ∈ account (w[account-number] = s[account-number]
∧ w[branch-name] = u[branch-name]))))}
In English, we interpret this expression as “The set of all customers (that is, (customer-name) tuples t) such that, for all tuples u in the branch relation, if the value of u on attribute branch-city is Brooklyn, then the customer has an account at the branch whose
name appears in the branch-name attribute of u
...
The first line of the query expression is critical in this case — without the condition
∃ r ∈ customer (r[customer-name] = t[customer-name])
if there is no branch in Brooklyn, any value of t (including values that are not customer names in the depositor relation) would qualify
...
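The quantifiers in the query above map onto Python's `all` and `any`, and the implication P ⇒ Q onto `not P or Q`. The following is a sketch under assumptions: the function name and all sample rows are hypothetical.

```python
def customers_at_all_brooklyn_branches(customer, branch, depositor, account):
    return {
        r["customer-name"]
        for r in customer                                  # ∃ r ∈ customer
        if all(                                            # ∀ u ∈ branch
            u["branch-city"] != "Brooklyn"                 # P ⇒ Q written as ¬P ∨ Q
            or any(                                        # ∃ s ∈ depositor
                s["customer-name"] == r["customer-name"]
                and any(                                   # ∃ w ∈ account
                    w["account-number"] == s["account-number"]
                    and w["branch-name"] == u["branch-name"]
                    for w in account)
                for s in depositor)
            for u in branch)
    }

# Made-up sample data for the sketch.
customer = [{"customer-name": n} for n in ("Ann", "Bob", "Cy")]
branch = [
    {"branch-name": "B1", "branch-city": "Brooklyn"},
    {"branch-name": "B2", "branch-city": "Brooklyn"},
    {"branch-name": "H1", "branch-city": "Horseneck"},
]
account = [
    {"account-number": "A-1", "branch-name": "B1"},
    {"account-number": "A-2", "branch-name": "B2"},
    {"account-number": "A-3", "branch-name": "B1"},
]
depositor = [
    {"customer-name": "Ann", "account-number": "A-1"},
    {"customer-name": "Ann", "account-number": "A-2"},
    {"customer-name": "Bob", "account-number": "A-3"},
]

everywhere = customers_at_all_brooklyn_branches(customer, branch, depositor, account)
no_brooklyn = customers_at_all_brooklyn_branches(customer, branch[2:], depositor, account)
```

With no Brooklyn branch the ∀ condition holds vacuously, so `no_brooklyn` contains every name in `customer`. Iterating over `customer` plays the role of the first line of the query: it keeps the result inside that relation.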
6
...
A tuple-relational-calculus expression is of
the form
{t | P(t)}
where P is a formula
...
A tuple variable is said to be a free variable unless it is quantified by a ∃ or ∀
...
Tuple variable s is said to be a bound variable
...
A tuple-relational-calculus formula is built up out of atoms
...
• If P1 is a formula, then so are ¬P1 and (P1 )
...
• If P1 (s) is a formula containing a free tuple variable s, and r is a relation, then
∃ s ∈ r (P1 (s)) and ∀ s ∈ r (P1 (s))
are also formulae
...
In the tuple relational calculus, these equivalences
include the following three rules:
1
...
2
...
3
...
3
...
3 Safety of Expressions
There is one final issue to be addressed
...
Suppose that we write the expression
{t |¬ (t ∈ loan)}
There are infinitely many tuples that are not in loan
...
To help us define a restriction of the tuple relational calculus, we introduce the
concept of the domain of a tuple relational formula, P
...
They include values
mentioned in P itself, as well as values that appear in a tuple of a relation mentioned in P
...
3
...
For example,
dom(t ∈ loan ∧ t[amount] > 1200) is the set containing 1200 as well as the set of all
values appearing in loan
...
We say that an expression {t | P (t)} is safe if all values that appear in the result
are values from dom(P )
...
Note that
dom(¬ (t ∈ loan)) is the set of all values appearing in loan
...
The other
examples of tuple-relational-calculus expressions that we have written in this section
are safe
...
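As a small illustration of dom(P), the sketch below collects the constants mentioned in a formula together with every value that appears in the relations it mentions. The helper name `dom` mirrors the text; its signature and the sample rows are assumptions.

```python
loan = [("L-1", "Perryridge", 1500), ("L-2", "Downtown", 900)]  # made-up rows

def dom(constants, *relations):
    """Values mentioned in the formula plus every value in the relations it mentions."""
    values = set(constants)
    for relation in relations:
        for row in relation:
            values.update(row)
    return values

# dom(t ∈ loan ∧ t[amount] > 1200): the constant 1200 and all values in loan
d = dom({1200}, loan)

# {t | t ∈ loan ∧ t[amount] > 1200} is safe: every result value lies in d
safe_result = {row for row in loan if row[2] > 1200}
```

Every value in `safe_result` belongs to `d`, which is the condition the definition of safety asks for.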
6
...
We will not prove this assertion here; the bibliographic notes contain references
to the proof
...
We note that the
tuple relational calculus does not have any equivalent of the aggregate operation, but
it can be extended to support aggregation
...
3
...
The domain relational calculus, however, is closely related to the tuple
relational calculus
...
3
...
1 Formal Definition
An expression in the domain relational calculus is of the form
{< x1, x2, ..., xn > | P(x1, x2, ..., xn)}
where x1 , x2 ,
...
P represents a formula composed
of atoms, as was the case in the tuple relational calculus
...
, xn > ∈ r, where r is a relation on n attributes and x1 , x2 ,
...
3
...
We require that attributes x and y have domains that can
be compared by Θ
...
We build up formulae from atoms by using the following rules:
• An atom is a formula
...
• If P1 and P2 are formulae, then so are P1 ∨ P2 , P1 ∧ P2 , and P1 ⇒ P2
...
As a notational shorthand, we write
∃ a, b, c (P (a, b, c))
for
∃ a (∃ b (∃ c (P (a, b, c))))
3
...
2 Example Queries
We now give domain-relational-calculus queries for the examples that we considered earlier
...
• Find the loan number, branch name, and amount for loans of over $1200:
{< l, b, a > | < l, b, a > ∈ loan ∧ a > 1200}
• Find all loan numbers for loans with an amount greater than $1200:
{< l > | ∃ b, a (< l, b, a > ∈ loan ∧ a > 1200)}
Although the second query appears similar to the one that we wrote for the tuple
relational calculus, there is an important difference
...
However, when we write ∃ b in the domain calculus, b refers not to a tuple,
but rather to a domain value
...
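The difference between the two calculi can be mirrored in Python by unpacking each row into domain variables instead of binding a whole tuple. The sample rows are made up for this sketch.

```python
loan = [("L-1", "Perryridge", 1500), ("L-2", "Downtown", 900), ("L-3", "Perryridge", 1300)]

# {< l, b, a > | < l, b, a > ∈ loan ∧ a > 1200}
over_1200 = {(l, b, a) for (l, b, a) in loan if a > 1200}

# {< l > | ∃ b, a (< l, b, a > ∈ loan ∧ a > 1200)}
# b and a range over domain values; only l appears in the result.
loan_numbers = {(l,) for (l, b, a) in loan if a > 1200}
```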
For example,
• Find the names of all customers who have a loan from the Perryridge branch
and find the loan amount:
{< c, a > | ∃ l (< c, l > ∈ borrower
∧ ∃ b (< l, b, a > ∈ loan ∧ b = “Perryridge”))}
• Find the names of all customers who have a loan, an account, or both at the
Perryridge branch:
{< c > | ∃ l (< c, l > ∈ borrower
∧ ∃ b, a (< l, b, a > ∈ loan ∧ b = “Perryridge”))
∨ ∃ a (< c, a > ∈ depositor
∧ ∃ b, n (< a, b, n > ∈ account ∧ b = “Perryridge”))}
• Find the names of all customers who have an account at all the branches located in Brooklyn:
{< c > | ∃ n (< c, n > ∈ customer ) ∧
∀ x, y, z (< x, y, z > ∈ branch ∧ y = “Brooklyn” ⇒
∃ a, b (< a, x, b > ∈ account ∧ < c, a > ∈ depositor ))}
In English, we interpret this expression as “The set of all (customer-name) tuples c such that, for all (branch-name, branch-city, assets) tuples, x, y, z, if the
branch city is Brooklyn, then the following is true”:
There exists a tuple in the relation account with account number a and
branch name x
...
”
3
...
3 Safety of Expressions
We noted that, in the tuple relational calculus (Section 3
...
That led us to define safety for tuple-relational-calculus expressions
...
An expression such as
{< l, b, a > | ¬(< l, b, a > ∈ loan)}
is unsafe, because it allows values in the result that are not in the domain of the
expression
...
Consider the expression
{< x > | ∃ y (< x, y >∈ r) ∧ ∃ z (¬(< x, z >∈ r) ∧ P (x, z))}
where P is some formula involving x and z
...
However, to test the second
part of the formula, ∃ z (¬ (< x, z > ∈ r) ∧ P (x, z)), we must consider values for
z that do not appear in r
...
Thus, it is not possible, in general, to test the second part of the
3
...
Instead,
we add restrictions to prohibit expressions such as the preceding one
...
Since we did not do so in the domain calculus, we
add rules to the definition of safety to deal with cases like our example
...
, xn > | P (x1 , x2 ,
...
All values that appear in tuples of the expression are values from dom(P)
...
For every “there exists” subformula of the form ∃ x (P1 (x)), the subformula is
true if and only if there is a value x in dom(P1 ) such that P1 (x) is true
...
For every “for all” subformula of the form ∀x (P1 (x)), the subformula is true
if and only if P1 (x) is true for all values x from dom(P1 )
...
Consider the
second rule in the definition of safety
...
In general, there would be infinitely many values to
test
...
This restriction reduces to a finite number the tuples we must
consider
...
To assert that
∀x (P1 (x)) is true, we must, in general, test all possible values, so we must examine infinitely many values
...
All the domain-relational-calculus expressions that we have written in the example queries of this section are safe
...
7
...
Since we noted earlier that the restricted tuple relational calculus is equivalent to the
relational algebra, all three of the following are equivalent:
• The basic relational algebra (without the extended relational algebra operations)
• The tuple relational calculus restricted to safe expressions
• The domain relational calculus restricted to safe expressions
We note that the domain relational calculus also does not have any equivalent of the
aggregate operation, but it can be extended to support aggregation, and extending it
to handle arithmetic expressions is straightforward
...
8 Summary
• The relational data model is based on a collection of tables
...
There are several languages for expressing these operations
...
These operations can be combined
to get expressions that express desired queries
...
• The operations in relational algebra can be divided into
Basic operations
Additional operations that can be expressed in terms of the basic operations
Extended operations, some of which add further expressive power to relational algebra
• Databases can be modified by insertion, deletion, or update of tuples
...
• Different users of a shared database may benefit from individualized views of
the database
...
We
evaluate queries involving views by replacing the view with the expression
that defines the view
...
Therefore, database
systems severely restrict updates through views
...
When database relations are updated, the materialized view must be correspondingly updated
...
The basic relational algebra is a procedural language that is
equivalent in power to both forms of the relational calculus when they are
restricted to safe expressions
...
Commercial database systems, therefore, use languages with more “syntactic sugar
...
3
...
Review Terms
• Table
• Relation
• Tuple variable
• Atomic domain
• Null value
• Database schema
• Database instance
• Relation schema
• Relation instance
• Keys
• Foreign key
Referencing relation
Referenced relation
• Schema diagram
• Query language
• Procedural language
• Nonprocedural language
• Relational algebra
• Relational algebra operations
Select σ
Project Π
Union ∪
Set difference −
Cartesian product ×
Rename ρ
• Additional operations
Set-intersection ∩
Natural-join ⋈
Division ÷
• Assignment operation
• Extended relational-algebra operations
Generalized projection Π
Outer join
–– Left outer join ⟕
–– Right outer join ⟖
–– Full outer join ⟗
Aggregation G
• Multisets
• Grouping
• Null values
• Modification of the database
Deletion
Insertion
Updating
• Views
• View definition
• Materialized views
• View update
• View expansion
• Recursive views
• Tuple relational calculus
• Domain relational calculus
• Safety of expressions
• Expressive power of languages
Exercises
3
...
The office maintains data about each class, including the instructor, the number of students
enrolled, and the time and place of the class meetings
...
[E-R diagram; labels: person, car, accident, owns, driver, participated, driver-id, name, address, license, model, year, report-number, location, date, damage-amount]
Figure 3
...
3
...
Illustrate your answer by referring to your solution to Exercise 3
...
3
...
38
...
4 In Chapter 2, we saw how to represent many-to-many, many-to-one, one-to-many, and one-to-one relationship sets
...
3
...
39, where the primary keys are underlined
...
Find the names of all employees who work for First Bank Corporation
...
Find the names and cities of residence of all employees who work for First
Bank Corporation
...
Find the names, street address, and cities of residence of all employees who
work for First Bank Corporation and earn more than $10,000 per annum
...
Find the names of all employees in this database who live in the same city
as the company for which they work
...
Find the names of all employees who live in the same city and on the same
street as do their managers
...
Find the names of all employees in this database who do not work for First
Bank Corporation
...
Find the names of all employees who earn more than every employee of
Small Bank Corporation
...
Assume the companies may be located in several cities
...
3
...
21, which shows the result of the query “Find
the names of all customers who have a loan at the bank
...
Observe that now customer Jackson no longer appears in the result, even though
Jackson does in fact have a loan from the bank
...
3
...
39
Relational database for Exercises 3
...
8 and 3
...
a
...
b
...
How would you
modify the database to achieve this effect?
c
...
Write a query
using an outer join that accomplishes this desire without your having to
modify the database
...
7 The outer-join operations extend the natural-join operation so that tuples from
the participating relations are not lost in the result of the join
...
3
...
39
...
Give all employees of First Bank Corporation a 10 percent salary raise
...
Give all managers in this database a 10 percent salary raise, unless the salary
would be greater than $100,000
...
e
...
a
...
c
...
3
...
Using an aggregate function
...
Without using any aggregate functions
...
10 Consider the relational database of Figure 3
...
Give a relational-algebra expression for each of the following queries:
a
...
b
...
c
...
3
...
3
...
3
...
3
...
Give an expression in the tuple relational
calculus that is equivalent to each of the following:
a
...
c
...
ΠA (r)
σB = 17 (r)
r × s
ΠA,F (σC = D (r × s))
3
...
Give
an expression in the domain relational calculus that is equivalent to each of the
following:
a
...
c
...
e
...
ΠA (r1 )
σB = 17 (r1 )
r1 ∪ r2
r1 ∩ r2
r1 − r2
ΠA,B (r1 ) ⋈ ΠB,C (r2 )
3
...
5 using the tuple relational calculus and the domain relational
calculus
...
16 Let R = (A, B) and S = (A, C), and let r(R) and s(S) be relations
...
b
...
d
...
17 Let R = (A, B) and S = (A, C), and let r(R) and s(S) be relations
...
r
b
...
r
1s
1s
1s
3
...
3
...
A marked null ⊥i is equal to itself, but if
i ≠ j, then ⊥i ≠ ⊥j
...
Consider the view loan-info (Section 3
...
Show how you can use
marked nulls to allow the insertion of the tuple (“Johnson”, 1900) through loan-info
...
3
...
F
...
This work led to the prestigious ACM Turing Award to
Codd in 1981; Codd [1982]
...
J
...
System R is discussed in Astrahan et al
...
[1979],
and Chamberlin et al
...
Ingres is discussed in Stonebraker [1980], Stonebraker
[1986b], and Stonebraker et al
...
Query-by-example is described in Zloof [1977]
...
Many relational-database products are now commercially available
...
Database
products for personal computers include Microsoft Access, dBase, and FoxPro
...
General discussion of the relational data model appears in most database texts
...
The original definition of relational algebra is in Codd [1970];
that of tuple relational calculus is in Codd [1972]
...
Several extensions to the relational calculus have been proposed
...
[1993] describe extensions to scalar aggregate functions
...
Codd [1990] is a compendium of E
...
Codd’s papers on the relational model
...
The problem of updating relational databases
through views is addressed by Bancilhon and Spyratos [1981], Cosmadakis and Papadimitriou [1984], Dayal and Bernstein [1978], and Langerak [1990]
...
5
covers materialized view maintenance, and references to literature on view maintenance can be found at the end of that chapter
...
PART 2
Relational Databases
A relational database is a shared repository of data
...
One is how users specify requests for data: Which of the various query languages do they use? Chapter 4
covers the SQL language, which is the most widely used query language today
...
Another issue is data integrity and security; databases need to protect data from
damage by user actions, whether unintentional or intentional
...
The security component of a database
includes authentication of users, and access control, to restrict the permissible actions
for each user
...
Security and integrity
issues are present regardless of the data model, but for concreteness we study them
in the context of the relational model
...
Relational database design — the design of the relational schema — is the first step
in building a database application
...
There are, however, principles that can be used to distinguish good
database designs from bad ones
...
Chapter 7 describes the formal design of relational
schemas
...
CHAPTER 4
...
However, commercial database systems require a query language
that is more user friendly
...
SQL uses a combination of relational-algebra
and relational-calculus constructs
...
It can define the structure of the data, modify data
in the database, and specify security constraints
...
Rather, we
present SQL’s fundamental constructs and concepts
...
4
...
IBM implemented the language, originally called Sequel,
as part of the System R project in the early 1970s
...
Many products now support the SQL language
...
In 1986, the American National Standards Institute (ANSI) and the International
Organization for Standardization (ISO) published an SQL standard, called SQL-86
...
ANSI published an extended standard for
SQL, SQL-89, in 1989
...
The bibliographic notes provide references to these
standards
...
4
...
The SQL:1999 standard is a superset of the SQL-92 standard;
we cover some features of SQL:1999 in this chapter, and provide more detailed coverage in Chapter 9
...
You
should also be aware that some database systems do not even support all the features of SQL-92, and that many databases provide nonstandard features that we do
not cover here
...
The SQL DDL provides commands for defining relation schemas, deleting relations, and modifying relation schemas
...
The SQL DML includes a
query language based on both the relational algebra and the tuple relational
calculus
...
• View definition
...
• Transaction control
...
• Embedded SQL and dynamic SQL
...
• Integrity
...
Updates that violate integrity
constraints are disallowed
...
The SQL DDL includes commands for specifying access rights
to relations and views
...
We also
briefly outline embedded and dynamic SQL, including the ODBC and JDBC standards
for interacting with a database from programs written in the C and Java languages
...
The enterprise that we use in the examples in this chapter, and later chapters, is a
banking enterprise with the following relation schemas:
Branch-schema = (branch-name, branch-city, assets)
Customer-schema = (customer-name, customer-street, customer-city)
Loan-schema = (loan-number, branch-name, amount)
Borrower-schema = (customer-name, loan-number)
Account-schema = (account-number, branch-name, balance)
Depositor-schema = (customer-name, account-number)
4
...
In actual SQL systems, however,
hyphens are not valid parts of a name (they are treated as the minus operator)
...
For example, we use branch_name in place of
branch-name
...
2 Basic Structure
A relational database consists of a collection of relations, each of which is assigned
a unique name
...
SQL allows the use of null values to indicate that the value either is unknown or does
not exist
...
11
...
• The select clause corresponds to the projection operation of the relational algebra
...
• The from clause corresponds to the Cartesian-product operation of the relational algebra
...
• The where clause corresponds to the selection predicate of the relational algebra
...
That the term select has different meaning in SQL than in the relational algebra is an
unfortunate historical fact
...
A typical SQL query has the form
select A1 , A2 ,
...
, rm
where P
Each Ai represents an attribute, and each ri a relation
...
The query is
equivalent to the relational-algebra expression
ΠA1 , A2 ,
...
However, unlike the result of a
relational-algebra expression, the result of the SQL query may contain multiple copies
of some tuples; we shall return to this issue in Section 4
...
8
...
4
...
In practice, SQL may convert the expression into an equivalent form that can be processed more efficiently
...
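The reading of a query as a projection over a selection over a Cartesian product can be sketched directly. This is a naive evaluator written for illustration only: the function name `select_from_where`, the dictionary encoding of tuples, the qualified attribute names, and the sample rows are all assumptions, and real systems do not evaluate queries this way.

```python
from itertools import product

def select_from_where(attributes, relations, predicate):
    """Evaluate Π_attributes(σ_predicate(r1 × ... × rm)), keeping duplicates."""
    result = []
    for combo in product(*relations):          # from clause: Cartesian product
        row = {}
        for part in combo:
            row.update(part)                   # assumes attribute names do not clash
        if predicate(row):                     # where clause: selection predicate
            result.append(tuple(row[a] for a in attributes))  # select clause: projection
    return result

# Made-up rows; shared attribute names are qualified by hand to avoid clashes.
borrower = [
    {"customer-name": "Ann", "borrower.loan-number": "L-1"},
    {"customer-name": "Bob", "borrower.loan-number": "L-2"},
]
loan = [
    {"loan.loan-number": "L-1", "amount": 1500},
    {"loan.loan-number": "L-2", "amount": 900},
]
rows = select_from_where(
    ["customer-name", "amount"],
    [borrower, loan],
    lambda r: r["borrower.loan-number"] == r["loan.loan-number"],
)
```

The result is a list rather than a set, which reflects the point that an SQL result may contain multiple copies of a tuple.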
4
...
1 The select Clause
The result of an SQL query is, of course, a relation
...
Formal query languages are based on the mathematical notion of a relation being
a set
...
In practice, duplicate elimination is time-consuming
...
Thus, the
preceding query will list each branch-name once for every tuple in which it appears in
the loan relation
...
We can rewrite the preceding query as
select distinct branch-name
from loan
if we want duplicates removed
...
To ensure
the elimination of duplicates in the results of our example queries, we will use distinct whenever it is necessary
...
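The effect of distinct can be checked on a small table. The sketch uses SQLite through Python's `sqlite3` module as a stand-in engine; the rows are made up, and names use underscores because hyphens are not valid in actual SQL names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table loan (loan_number text, branch_name text, amount integer)")
conn.executemany("insert into loan values (?, ?, ?)", [
    ("L-1", "Perryridge", 1500),
    ("L-2", "Downtown", 900),
    ("L-3", "Perryridge", 1300),
])

# One row per loan tuple: Perryridge is listed twice
with_duplicates = [r[0] for r in conn.execute("select branch_name from loan")]

# Duplicates removed
without_duplicates = [r[0] for r in conn.execute("select distinct branch_name from loan")]
```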
However, the number is important in certain applications; we return to this issue in
Section 4
...
8
...
” Thus, the use of
loan
...
A select clause of the form select * indicates that all attributes of all relations
appearing in the from clause are selected
...
For example, the query
select loan-number, branch-name, amount * 100
from loan
4
...
SQL also provides special data types, such as various forms of the date type, and
allows several arithmetic functions to operate on these types
...
2
...
Consider the query “Find all loan
numbers for loans made at the Perryridge branch with loan amounts greater than
$1200
...
The operands of the logical connectives
can be expressions involving the comparison operators <, <=, >, >=, =, and <>
...
SQL includes a between comparison operator to simplify where clauses that specify that a value be less than or equal to some value and greater than or equal to some
other value
...
4
...
3 The from Clause
Finally, let us discuss the use of the from clause
...
Since the natural join is defined in
terms of a Cartesian product, a selection, and a projection, it is a relatively simple
matter to write an SQL expression for the natural join
...
” In SQL, this query can be written as
select customer-name, borrower
...
loan-number = loan
...
attribute-name, as does the relational
algebra, to avoid ambiguity in cases where an attribute appears in the schema of more
than one relation
...
customer-name instead of customer-name in the select clause
...
We can extend the preceding query and consider a more complicated case in which
we require also that the loan be from the Perryridge branch: “Find the customer
names, loan numbers, and loan amounts for all loans at the Perryridge branch
...
loan-number, amount
from borrower, loan
where borrower
...
loan-number and
branch-name = ’Perryridge’
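A runnable version of this kind of join query might look as follows. It is a sketch using SQLite through Python's `sqlite3` module; the rows are invented, and underscores replace the hyphens in names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table borrower (customer_name text, loan_number text);
    create table loan (loan_number text, branch_name text, amount integer);
    insert into borrower values ('Ann', 'L-1'), ('Bob', 'L-2'), ('Cy', 'L-3');
    insert into loan values ('L-1', 'Perryridge', 1500),
                            ('L-2', 'Downtown', 900),
                            ('L-3', 'Perryridge', 1300);
""")

# loan_number appears in both relations, so it is qualified by relation name
perryridge_loans = conn.execute("""
    select customer_name, borrower.loan_number, amount
    from borrower, loan
    where borrower.loan_number = loan.loan_number
      and branch_name = 'Perryridge'
""").fetchall()
```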
SQL includes extensions to perform natural joins and outer joins in the from clause
...
10
...
2
...
It uses the as
clause, taking the form:
old-name as new-name
The as clause can appear in both the select and from clauses
...
loan-number, amount
from borrower, loan
where borrower
...
loan-number
The result of this query is a relation with the following attributes:
customer-name, loan-number, amount
...
We cannot, however, always derive names in this way, for several reasons: First,
two relations in the from clause may have attributes with the same name, in which
case an attribute name is duplicated in the result
...
Third,
4
...
Hence, SQL
provides a way of renaming the attributes of a result relation
...
loan-number as loan-id, amount
from borrower, loan
where borrower
...
loan-number
4
...
5 Tuple Variables
The as clause is particularly useful in defining the notion of tuple variables, as is
done in the tuple relational calculus
...
Tuple variables are defined in the from clause by way of the as
clause
...
loan-number, S
...
loan-number = S
...
When we write expressions of the form relation-name
...
Tuple variables are most useful for comparing two tuples in the same relation
...
Suppose that we want the query “Find the names of all branches that have assets
greater than at least one branch located in Brooklyn
...
branch-name
from branch as T, branch as S
where T
...
assets and S
...
asset, since it would not be clear
which reference to branch is intended
...
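The Brooklyn query compares tuples of one relation with each other, which is where tuple variables are needed. The sketch below runs it on SQLite through Python's `sqlite3` module; the asset figures and the Mianus and Brighton rows are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table branch (branch_name text, branch_city text, assets integer);
    insert into branch values ('Brighton', 'Brooklyn', 7000),
                              ('Downtown', 'Brooklyn', 9000),
                              ('Perryridge', 'Horseneck', 8000),
                              ('Mianus', 'Horseneck', 400);
""")

# T and S are two tuple variables ranging over the same relation
richer = [row[0] for row in conn.execute("""
    select distinct T.branch_name
    from branch as T, branch as S
    where T.assets > S.assets and S.branch_city = 'Brooklyn'
""")]
```

Here a branch qualifies when its assets exceed those of at least one Brooklyn branch, so in this data it is enough to exceed 7000.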
, vn ) to denote a tuple of arity n containing values v1 , v2 ,
...
The comparison operators can be used on tuples, and
the ordering is defined lexicographically
...
4
...
6 String Operations
SQL specifies strings by enclosing them in single quotes, for example, ’Perryridge’,
as we saw earlier
...
4
...
The most commonly used operation on strings is pattern matching using the operator like
...
• Underscore (_): The _
character matches any character
...
To illustrate pattern matching, we consider the following examples:
• ’Perry%’ matches any string beginning with “Perry”
...
• ’___’ matches any string of exactly three characters
...
SQL expresses patterns by using the like comparison operator
...
”
This query can be written as
select customer-name
from customer
where customer-street like ’%Main%’
For patterns to include the special pattern characters (that is, % and _), SQL allows
the specification of an escape character
...
We define the escape character for a like comparison
using the escape keyword
...
• like ’ab\\cd%’ escape ’\’ matches all strings beginning with “ab\cd”
...
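The like examples above can be tried on SQLite through Python's `sqlite3` module. This is a sketch with invented rows; note as a caveat that SQLite's like is case-insensitive for ASCII letters by default, which other systems may not share.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table customer (customer_name text, customer_street text);
    insert into customer values ('Ann', '12 Main Street'),
                                ('Bob', 'North Avenue'),
                                ('Cy', 'Mainland Road');
""")

# Streets containing the substring "Main"
on_main = [row[0] for row in conn.execute(
    "select customer_name from customer where customer_street like '%Main%'")]

# With escape '\', the pattern ab\\cd% matches strings beginning with ab\cd.
# Raw Python strings keep the backslashes exactly as SQL should see them.
matches = conn.execute(
    r"select 'ab\cdef' like 'ab\\cd%' escape '\'").fetchone()[0]
no_match = conn.execute(
    r"select 'abcdef' like 'ab\\cd%' escape '\'").fetchone()[0]
```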
SQL also permits a variety of functions on character strings, such as concatenating (using “||”), extracting substrings, finding the length of strings, converting between uppercase and lowercase, and so on
...
4
...
2
...
The order by clause causes the tuples in the result of a query to appear in
sorted order
...
loan-number = loan
...
To specify the sort order,
we may specify desc for descending order or asc for ascending order
...
Suppose that we wish to list the
entire loan relation in descending order of amount
...
We express this query in
SQL as follows:
select *
from loan
order by amount desc, loan-number asc
To fulfill an order by request, SQL must perform a sort
...
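The two-level ordering can be seen on a small table where two loans share an amount. The sketch uses SQLite through Python's `sqlite3` module with made-up rows and underscore names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table loan (loan_number text, branch_name text, amount integer)")
conn.executemany("insert into loan values (?, ?, ?)", [
    ("L-3", "Perryridge", 1500),
    ("L-1", "Downtown", 1500),
    ("L-2", "Redwood", 900),
])

# Largest amounts first; ties on amount are broken by ascending loan number
rows = conn.execute(
    "select * from loan order by amount desc, loan_number asc").fetchall()
```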
4
...
8 Duplicates
Using relations with duplicates offers advantages in several situations
...
We can define the duplicate
semantics of an SQL query using multiset versions of the relational operators
...
Given
multiset relations r1 and r2 ,
1
...
2
...
3
...
t2 in r1 × r2
...
4
...
An SQL query of the form
select A1 , A2 ,
...
, rm
where P
is equivalent to the relational-algebra expression
ΠA1 , A2 ,
...
4
...
Like union, intersection, and set
difference in relational algebra, the relations participating in the operations must be
compatible; that is, they must have the same set of attributes
...
We shall now construct queries involving the union,
intersect, and except operations of two sets: the set of all customers who have an
account at the bank, which can be derived by
select customer-name
from depositor
and the set of customers who have a loan at the bank, which can be derived by
select customer-name
from borrower
We shall refer to the relations obtained as the result of the preceding queries as
d and b, respectively
...
3
...
The union operation automatically eliminates duplicates, unlike the select clause
...
If we want to retain all duplicates, we must write union all in place of union:
(select customer-name
from depositor)
union all
(select customer-name
from borrower)
The number of duplicate tuples in the result is equal to the total number of duplicates
that appear in both d and b
...
4
...
2 The Intersect Operation
To find all customers who have both a loan and an account at the bank, we write
(select distinct customer-name
from depositor)
intersect
(select distinct customer-name
from borrower)
The intersect operation automatically eliminates duplicates
...
If we want to retain all duplicates, we must write intersect all in place of intersect:
(select customer-name
from depositor)
intersect all
(select customer-name
from borrower)
The number of duplicate tuples that appear in the result is equal to the minimum
number of duplicates in both d and b
...
4
...
3 The Except Operation
To find all customers who have an account but no loan at the bank, we write
(select distinct customer-name
from depositor)
except
(select customer-name
from borrower)
The except operation automatically eliminates duplicates
...
If we want to retain all duplicates, we must write except all in place of except:
(select customer-name
from depositor)
except all
(select customer-name
from borrower)
The number of duplicate copies of a tuple in the result is equal to the number of
duplicate copies of the tuple in d minus the number of duplicate copies of the tuple
in b, provided that the difference is positive
...
If, instead, this customer has two accounts and three loans at the bank, there will be
no tuple with the name Jones in the result
...
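The duplicate counts for union all, intersect all, and except all behave like multiset arithmetic, which Python's `collections.Counter` models directly. This is an analogy rather than SQL: the relations d and b are represented only by how often each name occurs, and the counts for Smith are hypothetical.

```python
from collections import Counter

# Jones has two accounts and three loans; Smith (made up) has three accounts and one loan.
d = Counter({"Jones": 2, "Smith": 3})   # select customer-name from depositor
b = Counter({"Jones": 3, "Smith": 1})   # select customer-name from borrower

union_all = d + b        # copies from both inputs add up
intersect_all = d & b    # minimum of the two counts
except_all = d - b       # count in d minus count in b, kept only when positive

# The plain operations eliminate duplicates, so they act on sets of names
union = set(d) | set(b)
intersect = set(d) & set(b)
```

With two accounts and three loans, Jones does not appear in `except_all`, which matches the case described in the text.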
4 Aggregate Functions
Aggregate functions are functions that take a collection (a set or multiset) of values as
input and return a single value
...
As an illustration, consider the query “Find the average account balance at the
Perryridge branch
...
The result of this query is a relation with a single attribute, containing a single tuple with a numerical value corresponding to the average balance at the Perryridge
branch
...
There are circumstances where we would like to apply the aggregate function not
only to a single set of tuples, but also to a group of sets of tuples; we specify this wish
in SQL using the group by clause
...
Tuples with the same value on all attributes in the
group by clause are placed in one group
...
” We write this query as follows:
select branch-name, avg (balance)
from account
group by branch-name
Retaining duplicates is important in computing an average
...
The
average balance is $7000/4 = $1750
...
If duplicates were eliminated, we would obtain the wrong answer ($6000/3 = $2000)
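The effect of duplicates on an average can be tried with Python's built-in sqlite3 module. This is a sketch under assumptions: identifiers use underscores because hyphenated names are not legal unquoted identifiers in SQLite, and the branch name and the four balances are invented so that they reproduce the $7000/4 and $6000/3 figures.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table account (account_number text, branch_name text, balance integer)")
# Assumed balances: they sum to 7000, and one value is repeated.
conn.executemany(
    "insert into account values (?, ?, ?)",
    [("A-1", "Brighton", 1000), ("A-2", "Brighton", 1000),
     ("A-3", "Brighton", 2000), ("A-4", "Brighton", 3000)],
)

# avg keeps duplicates by default; avg(distinct ...) drops them first.
with_duplicates, without_duplicates = conn.execute(
    "select avg(balance), avg(distinct balance) from account"
).fetchone()
```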
...
If we do want to eliminate duplicates, we use the keyword distinct in
the aggregate expression
...
” In this case, a depositor counts only once, regardless of the
number of accounts that depositor may have
...
account-number = account
...
For example, we might be interested in only those branches where the average
account balance is more than $1200
...
To express such a
query, we use the having clause of SQL
...
We express this
query in SQL as follows:
select branch-name, avg (balance)
from account
group by branch-name
having avg (balance) > 1200
At times, we wish to treat the entire relation as a single group
...
Consider the query “Find the average balance for all
accounts
...
4
...
The notation for this function in SQL is count (*)
...
It is legal to use distinct with
max and min, even though the result does not change
...
If a where clause and a having clause appear in the same query, SQL applies the
predicate in the where clause first
...
SQL then applies the having clause, if it
is present, to each group; it removes the groups that do not satisfy the having clause
predicate
...
To illustrate the use of both a having clause and a where clause in the same query,
we consider the query “Find the average balance for each customer who lives in
Harrison and has at least three accounts
...
customer-name, avg (balance)
from depositor, account, customer
where depositor
...
account-number and
depositor
...
customer-name and
customer-city = ’Harrison’
group by depositor
...
account-number) >= 3
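The order of evaluation (where first, then grouping, then having) can be observed with a small runnable sketch. It uses Python's sqlite3 module with underscore identifiers, and both the sample rows and the exact join conditions are assumptions made to fit the Harrison query described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table customer  (customer_name text, customer_city text);
create table depositor (customer_name text, account_number text);
create table account   (account_number text, balance integer);
insert into customer values ('Hayes', 'Harrison'), ('Jones', 'Harrison'), ('Smith', 'Rye');
insert into depositor values
  ('Hayes', 'A-1'), ('Hayes', 'A-2'), ('Hayes', 'A-3'),
  ('Jones', 'A-4'),
  ('Smith', 'A-5'), ('Smith', 'A-6'), ('Smith', 'A-7');
insert into account values
  ('A-1', 300), ('A-2', 600), ('A-3', 900), ('A-4', 500),
  ('A-5', 100), ('A-6', 200), ('A-7', 300);
""")

rows = conn.execute("""
select depositor.customer_name, avg(balance)
from depositor, account, customer
where depositor.account_number = account.account_number
  and depositor.customer_name = customer.customer_name
  and customer_city = 'Harrison'                        -- applied to tuples first
group by depositor.customer_name
having count(distinct depositor.account_number) >= 3    -- then applied to groups
""").fetchall()
```

Smith has three accounts but is removed by the where clause; Jones lives in Harrison but is removed by the having clause.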
4
...
We can use the special keyword null in a predicate to test for a null value
...
The use of a null value in arithmetic and comparison operations causes several
complications
...
3
...
We now outline how SQL handles null values
...
The result of an arithmetic expression (involving, for example +, −, ∗ or /) is null
if any of the input values is null
...
Since the predicate in a where clause can involve Boolean operations such as and,
or, and not on the results of comparisons, the definitions of the Boolean operations
are extended to deal with the value unknown, as outlined in Section 3
...
4
...
• or: The result of true or unknown is true, false or unknown is unknown, while
unknown or unknown is unknown
...
SQL defines the result of an SQL statement of the form
select
...
If the predicate evaluates to either false or unknown for a tuple in R1 × · · · × Rn
(the projection of) the tuple is not added to the result
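The treatment of unknown can be observed in SQLite through Python's sqlite3 module. This is an illustrative sketch: SQLite reports true as 1, false as 0, and unknown as NULL, and the two account rows are assumed sample data with underscore identifiers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# SQLite reports true as 1, false as 0, and unknown as NULL (None in Python).
true_or_unknown, false_or_unknown, unknown_or_unknown = conn.execute(
    "select (1 = 1) or null, (1 = 0) or null, null or null"
).fetchone()

conn.execute("create table account (account_number text, branch_name text)")
conn.executemany(
    "insert into account values (?, ?)",
    [("A-101", "Perryridge"), ("A-401", None)],  # A-401: branch not known
)
# The predicate is unknown for A-401, so that tuple is left out.
matches = conn.execute(
    "select account_number from account where branch_name = 'Perryridge'"
).fetchall()
```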
...
Null values, when they exist, also complicate the processing of aggregate operators
...
Consider the following query to total all loan amounts:
select sum (amount)
from loan
The values to be summed in the preceding query include null values, since some
tuples have a null value for amount
...
In general, aggregate functions treat nulls according to the following rule: All aggregate functions except count(*) ignore null values in their input collection
...
The count
of an empty collection is defined to be 0, and all other aggregate operations return a
value of null when applied on an empty collection
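The rules for nulls in aggregates can be checked with a short sqlite3 sketch. The loan rows are assumed sample data and the identifiers use underscores; the behavior shown is SQLite's, which follows the rule stated above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table loan (loan_number text, amount integer)")
conn.executemany(
    "insert into loan values (?, ?)",
    [("L-1", 1000), ("L-2", None), ("L-3", 2000)],  # assumed sample rows
)

# sum and count(amount) skip the null; count(*) counts every tuple.
total, count_star, count_amount = conn.execute(
    "select sum(amount), count(*), count(amount) from loan"
).fetchone()

# Aggregates over an empty collection: count gives 0, the others give null.
empty_count, empty_sum, empty_max = conn.execute(
    "select count(amount), sum(amount), max(amount) from loan where 1 = 0"
).fetchone()

# Arithmetic on a null input yields null.
(null_plus_five,) = conn.execute("select null + 5").fetchone()
```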
...
A boolean data type, which can take the values true, false, and unknown, was introduced in SQL:1999
...
4
...
A subquery is a select-from-
where expression that is nested within another query
...
4
...
We shall study these uses in subsequent sections
...
6
...
The in connective tests for set membership, where the set is a
collection of values produced by a select clause
...
As an illustration, reconsider the query “Find all customers who have both a loan and an account at the bank
...
We can take the alternative approach of finding all account
holders at the bank who are members of the set of borrowers from the bank
...
We begin by finding all account
holders, and we write the subquery
(select customer-name
from depositor)
We then need to find those customers who are borrowers from the bank and who
appear in the list of account holders obtained in the subquery
...
The resulting query is
select distinct customer-name
from borrower
where customer-name in (select customer-name
from depositor)
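That the in formulation and the intersect formulation agree can be confirmed on sample data. The sketch below uses Python's sqlite3 module with underscore identifiers; the rows are assumed, and the intersect operands are written without parentheses because SQLite does not accept parenthesized operands there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table depositor (customer_name text, account_number text);
create table borrower  (customer_name text, loan_number text);
insert into depositor values ('Hayes', 'A-102'), ('Jones', 'A-217'), ('Jones', 'A-305');
insert into borrower  values ('Jones', 'L-170'), ('Smith', 'L-230'), ('Hayes', 'L-155');
""")

def names(sql):
    """Run a query and return its first column as a sorted list."""
    return sorted(row[0] for row in conn.execute(sql))

with_in = names("""
    select distinct customer_name from borrower
    where customer_name in (select customer_name from depositor)""")

with_intersect = names("""
    select customer_name from depositor
    intersect
    select customer_name from borrower""")
```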
This example shows that it is possible to write the same query several ways in
SQL
...
We shall see that there is a substantial amount of
redundancy in SQL
...
It is
also possible to test for membership in an arbitrary relation in SQL
...
loan-number = loan
...
account-number = account
...
We use the not in construct in a similar way
...
The following
query selects the names of customers who have a loan at the bank, and whose names
are neither Smith nor Jones
...
6
...
” In Section 4
...
5, we wrote this query as follows:
select distinct T
...
assets > S
...
branch-city = ’Brooklyn’
SQL does, however, offer an alternative style for writing the preceding query
...
This construct
allows us to rewrite the query in a form that resembles closely our formulation of the
query in English
...
The > some
comparison in the where clause of the outer select is true if the assets value of the
tuple is greater than at least one member of the set of all asset values for branches in
Brooklyn
...
4
...
As an exercise, verify that = some is identical to in, whereas <> some is not the same
as not in
...
Early versions of SQL
allowed only any
...
Now we modify our query slightly
...
The construct > all
corresponds to the phrase “greater than all
...
As an exercise, verify that <> all is identical to not in
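The two exercises can be explored with a small Python model of the comparisons. This is only a sketch: the list stands for the result of a subquery, it is assumed to contain no nulls, and the function names are invented for illustration.

```python
def eq_some(x, values):
    """x = some (values)"""
    return any(x == v for v in values)

def neq_some(x, values):
    """x <> some (values)"""
    return any(x != v for v in values)

def neq_all(x, values):
    """x <> all (values)"""
    return all(x != v for v in values)

values = [5, 7]  # assumed result of a subquery, with no nulls

# = some agrees with in, and <> all agrees with not in.
checks_in = all(eq_some(x, values) == (x in values) for x in range(10))
checks_not_in = all(neq_all(x, values) == (x not in values) for x in range(10))

# <> some differs from not in: 5 is in the set, yet 5 <> 7 holds.
counterexample = neq_some(5, values) and not (5 not in values)
```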
...
” Aggregate functions cannot be composed in SQL
...
Instead, we can follow this strategy: We begin
by writing a query to find all average balances, and then nest it as a subquery of a
larger query that finds those branches for which the average balance is greater than
or equal to all average balances:
select branch-name
from account
group by branch-name
having avg (balance) >= all (select avg (balance)
from account
group by branch-name)
4
...
3 Test for Empty Relations
SQL includes a feature for testing whether a subquery has any tuples in its result
...
Using
the exists construct, we can write the query “Find all customers who have both an
account and a loan at the bank” in still another way:
select customer-name
from borrower
where exists (select *
from depositor
where depositor
...
customer-name)
We can test for the nonexistence of tuples in a subquery by using the not exists construct
...
(that is, superset) operation: We can write “relation A contains relation B” as “not
exists (B except A)
...
) To illustrate the
not exists operator, consider again the query “Find all customers who have an account at all the branches located in Brooklyn
...
Using the except construct, we can write the query as
follows:
select distinct S
...
branch-name
from depositor as T, account as R
where T
...
account-number and
S
...
customer-name))
Here, the subquery
(select branch-name
from branch
where branch-city = ’Brooklyn’)
finds all the branches in Brooklyn
...
branch-name
from depositor as T, account as R
where T
...
account-number and
S
...
customer-name)
finds all the branches at which customer S
...
Thus, the
outer select takes each customer and tests whether the set of all branches at which
that customer has an account contains the set of all branches located in Brooklyn
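The containment test can be modeled with Python sets, treating each relation as a set of branch names. This is a sketch, not SQL; the branch and customer data are assumed for illustration.

```python
def contains(a, b):
    """'Relation a contains relation b', written as: not exists (b except a)."""
    return len(b - a) == 0

brooklyn_branches = {"Brighton", "Downtown"}   # assumed sample data
branches_of = {
    "Hayes": {"Brighton", "Downtown", "Perryridge"},
    "Jones": {"Brighton"},
}

# Keep each customer whose set of branches contains every Brooklyn branch.
customers = sorted(
    name for name, branches in branches_of.items()
    if contains(branches, brooklyn_branches)
)
```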
...
In
a subquery, according to the rule, it is legal to use only tuple variables defined in
the subquery itself or in any query that contains the subquery
...
This rule is analogous to the usual scoping rules used for variables
in programming languages
...
6
...
The unique construct returns the value true if the argument subquery contains
no duplicate tuples
...
customer-name
from depositor as T
where unique (select R
...
customer-name = R
...
account-number = account
...
branch-name = ’Perryridge’)
We can test for the existence of duplicate tuples in a subquery by using the not
unique construct
...
customer-name
from depositor T
where not unique (select R
...
customer-name = R
...
account-number = account
...
branch-name = ’Perryridge’)
Formally, the unique test on a relation is defined to fail if and only if the relation
contains two tuples t1 and t2 such that t1 = t2
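The formal definition can be written as a short Python model. It is a sketch under one labeled assumption: `None` stands for null, and a comparison involving null is not considered true, so two tuples with a null field never count as equal.

```python
from itertools import combinations

def tuples_equal(t1, t2):
    """SQL-style tuple equality: true only if every field pair is equal and
    non-null (None stands for null; this null rule is an assumption here)."""
    return all(a is not None and b is not None and a == b for a, b in zip(t1, t2))

def unique(relation):
    """Fail if and only if two tuples t1 and t2 of the relation satisfy t1 = t2."""
    return not any(tuples_equal(t1, t2) for t1, t2 in combinations(relation, 2))

no_duplicates = unique([("Jones",), ("Smith",)])
has_duplicates = unique([("Jones",), ("Jones",)])
null_copies = unique([("Jones", None), ("Jones", None)])
```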
...
4
...
To define a view, we
must give the view a name and must state the query that computes the view
...
The view name is represented by v
...
As an example, consider the view consisting of branch names and the names of
customers who have either an account or a loan at that branch
...
We define this view as follows:
4
...
account-number = account
...
loan-number = loan
...
Since the expression sum(amount) does not have a name, the attribute
name is specified explicitly in the view definition
...
Using the
view all-customer, we can find all customers of the Perryridge branch by writing
select customer-name
from all-customer
where branch-name = ’Perryridge’
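A runnable sketch of defining and querying such a view follows, using Python's sqlite3 module. The view definition here is an assumption built from the description above (customers with an account or a loan at each branch); identifiers use underscores and the rows are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table account   (account_number text, branch_name text, balance integer);
create table depositor (customer_name text, account_number text);
create table loan      (loan_number text, branch_name text, amount integer);
create table borrower  (customer_name text, loan_number text);
insert into account   values ('A-102', 'Perryridge', 400), ('A-217', 'Brighton', 750);
insert into depositor values ('Hayes', 'A-102'), ('Jones', 'A-217');
insert into loan      values ('L-15', 'Perryridge', 1500);
insert into borrower  values ('Adams', 'L-15');

-- Assumed definition: customers with an account or a loan at each branch.
create view all_customer as
  select branch_name, customer_name
  from depositor, account
  where depositor.account_number = account.account_number
  union
  select branch_name, customer_name
  from borrower, loan
  where borrower.loan_number = loan.loan_number;
""")

perryridge = sorted(conn.execute(
    "select customer_name from all_customer where branch_name = 'Perryridge'"
).fetchall())
```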
4
...
(An SQL block consists of a single select-from-where statement, possibly with group by and having clauses
...
4
...
1 Derived Relations
SQL allows a subquery expression to be used in the from clause
...
We do this renaming by using the as clause
...
4
...
The subquery result is named result, with
the attributes branch-name and avg-balance
...
” We wrote this query in Section 4
...
We can now rewrite this query, without using the having clause, as
follows:
select branch-name, avg-balance
from (select branch-name, avg (balance)
from account
group by branch-name)
as branch-avg (branch-name, avg-balance)
where avg-balance > 1200
Note that we do not need to use the having clause, since the subquery in the from
clause computes the average balance, and its result is named as branch-avg; we can
use the attributes of branch-avg directly in the where clause
...
The having clause does not help us in this task, but
we can write this query easily by using a subquery in the from clause, as follows:
select max(tot-balance)
from (select branch-name, sum(balance)
from account
group by branch-name) as branch-total (branch-name, tot-balance)
4
...
2 The with Clause
Complex queries are much easier to write and to understand if we structure them
by breaking them into smaller views that we then combine, just as we structure programs by breaking their task into procedures
...
The with clause provides a way of defining a temporary view whose definition is
available only to the query in which the with clause occurs
...
with max-balance (value) as
select max(balance)
from account
select account-number
from account, max-balance
where account
...
value
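A version of this with query that runs in SQLite is sketched below through Python's sqlite3 module. The join condition `account.balance = max_balance.value` is an assumption about the elided part of the query, identifiers use underscores, the temporary view's body is parenthesized as SQLite requires, and the account rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table account (account_number text, branch_name text, balance integer)")
conn.executemany(
    "insert into account values (?, ?, ?)",
    [("A-101", "Downtown", 500), ("A-215", "Mianus", 900), ("A-305", "Round Hill", 900)],
)

# max_balance exists only for the duration of this one query.
rows = conn.execute("""
with max_balance (value) as (
    select max(balance) from account
)
select account_number
from account, max_balance
where account.balance = max_balance.value
order by account_number
""").fetchall()
```

Both accounts that share the maximum balance are returned.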
4
...
We could have written the above query by using a nested subquery in either the
from clause or the where clause
...
The with clause makes the query
logic clearer; it also permits a view definition to be used in multiple places within a
query
...
We can write the
query using the with clause as follows
...
value >= branch-total-avg
...
You can write the equivalent query
as an exercise
...
9 Modification of the Database
We have restricted our attention until now to the extraction of information from the
database
...
4
...
1 Deletion
A delete request is expressed in much the same way as a query
...
SQL expresses a
deletion by
delete from r
where P
where P represents a predicate and r represents a relation
...
The where
clause can be omitted, in which case all tuples in r are deleted
...
If we want to delete
tuples from several relations, we must use one delete command for each relation
...
4
...
At the other extreme, the where clause may be empty
...
(Well-designed systems will seek confirmation from the user before executing such a devastating request
...
delete from account
where branch-name = ’Perryridge’
• Delete all loans with loan amounts between $1300 and $1500
...
delete from account
where branch-name in (select branch-name
from branch
where branch-city = ’Needham’)
This delete request first finds all branches in Needham, and then deletes all
account tuples pertaining to those branches
...
The delete request can contain a nested select that references the relation
from which tuples are to be deleted
...
We could write
delete from account
where balance < (select avg (balance)
from account)
The delete statement first tests each tuple in the relation account to check whether the
account has a balance less than the average at the bank
...
Performing all the tests before performing any deletion is important — if some tuples
are deleted before other tuples have been tested, the average balance may change,
and the final result of the delete would depend on the order in which the tuples were
processed!
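The difference between testing first and deleting as you go can be made concrete. In the sketch below (Python with sqlite3; the four balances are assumed sample data), the SQL delete compares every tuple against the original average of 600, while the helper `delete_one_at_a_time` is a deliberately wrong model that recomputes the average after each deletion.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table account (account_number text, balance integer)")
conn.executemany(
    "insert into account values (?, ?)",
    [("A-1", 300), ("A-2", 500), ("A-3", 700), ("A-4", 900)],  # average is 600
)

conn.execute("delete from account where balance < (select avg(balance) from account)")
remaining = [b for (b,) in conn.execute("select balance from account order by balance")]

def delete_one_at_a_time(balances):
    """What must NOT happen: recompute the average after every deletion."""
    kept = list(balances)
    for b in list(balances):
        if b < sum(kept) / len(kept):
            kept.remove(b)
    return kept

interleaved = delete_one_at_a_time([300, 500, 700, 900])
```

With the tests done up front, 700 and 900 survive; the interleaved model would keep only 900.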
4
...
9
...
Obviously, the attribute values for inserted tuples must be members of the attribute’s domain
...
The simplest insert statement is a request to insert one tuple
...
We write
insert into account
values (’A-9732’, ’Perryridge’, 1200)
In this example, the values are specified in the order in which the corresponding
attributes are listed in the relation schema
...
For example, the following SQL insert statements are identical
in function to the preceding one:
insert into account (account-number, branch-name, balance)
values (’A-9732’, ’Perryridge’, 1200)
insert into account (branch-name, account-number, balance)
values (’Perryridge’, ’A-9732’, 1200)
More generally, we might want to insert tuples on the basis of the result of a query
...
Let the loan number
serve as the account number for the savings account
...
SQL evaluates the select statement first, giving a set of tuples that is
then inserted into the account relation
...
We also need to add tuples to the depositor relation; we do so by writing
insert into depositor
select customer-name, loan-number
from borrower, loan
where borrower
...
loan-number and
branch-name = ’Perryridge’
This query inserts a tuple (customer-name, loan-number) into the depositor relation for
each customer-name who has a loan in the Perryridge branch with loan number loan-number
...
If we carry out some insertions even as the select statement is being
evaluated, a request such as
insert into account
select *
from account
might insert an infinite number of tuples! The request would insert the first tuple in
account again, creating a second copy of the tuple
...
The select statement may then find this third copy and insert a fourth copy,
and so on, forever
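A system that finishes evaluating the select before inserting simply doubles the relation. The sketch below checks this in SQLite through Python's sqlite3 module; the table has no primary key so that duplicate tuples are allowed, and the two rows are assumed sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# No primary key is declared, so duplicate tuples are allowed in this sketch.
conn.execute("create table account (account_number text, branch_name text, balance integer)")
conn.executemany(
    "insert into account values (?, ?, ?)",
    [("A-101", "Downtown", 500), ("A-102", "Perryridge", 400)],
)

# The select is evaluated in full first, so exactly two tuples are added.
conn.execute("insert into account select * from account")
(count_after,) = conn.execute("select count(*) from account").fetchone()
```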
...
Our discussion of the insert statement considered only examples in which a value
is given for every attribute in inserted tuples
...
The
remaining attributes are assigned a null value denoted by null
...
Consider
the query
select account-number
from account
where branch-name = ’Perryridge’
Since the branch at which account A-401 is maintained is not known, we cannot determine whether it is equal to “Perryridge”
...
11
...
9
...
For this purpose, the update statement can be used
...
Suppose that annual interest payments are being made, and all balances are to be
increased by 5 percent
...
05
4
...
If interest is to be paid only to accounts with a balance of $1000 or more, we can
write
update account
set balance = balance * 1
...
As with
insert and delete, a nested select within an update statement may reference the relation that is being updated
...
For example, we can write the request “Pay 5 percent interest on accounts whose balance is
greater than average” as follows:
update account
set balance = balance * 1
...
We could write two update statements:
update account
set balance = balance * 1
...
05
where balance <= 10000
Note that, as we saw in Chapter 3, the order of the two update statements is important
...
3 percent interest
...
update account
set balance = case
when balance <= 10000 then balance * 1
...
06
end
The general form of the case statement is as follows
...
4
...
when pred n then result n
else result 0
end
The operation returns result i , where i is the first of pred 1 , pred 2 ,
...
Case statements can be used in any place where a value is expected
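The benefit of a single case expression over two ordered update statements can be seen in a runnable sketch. It uses Python's sqlite3 module; the three balances are assumed sample data, and the rates of 5 percent up to $10000 and 6 percent above it follow the example above.

```python
import sqlite3

def fresh_accounts():
    """Create a new in-memory account relation with three assumed balances."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table account (account_number text, balance real)")
    conn.executemany("insert into account values (?, ?)",
                     [("A-1", 8000.0), ("A-2", 10000.0), ("A-3", 20000.0)])
    return conn

def balances(conn):
    return [round(b, 2) for (b,) in conn.execute(
        "select balance from account order by account_number")]

# One pass with case: each account receives exactly one rate.
conn = fresh_accounts()
conn.execute("""
update account
set balance = case
                when balance <= 10000 then balance * 1.05
                else balance * 1.06
              end
""")
with_case = balances(conn)

# Two statements in the wrong order: A-2 crosses 10000 and is paid twice.
conn = fresh_accounts()
conn.execute("update account set balance = balance * 1.05 where balance <= 10000")
conn.execute("update account set balance = balance * 1.06 where balance > 10000")
wrong_order = balances(conn)
```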
...
9
...
As an
illustration, consider the following view definition:
create view loan-branch as
select branch-name, loan-number
from loan
Since SQL allows a view name to appear wherever a relation name is allowed, we can
write
insert into loan-branch
values (’Perryridge’, ’L-307’)
SQL represents this insertion by an insertion into the relation loan, since loan is the
actual relation from which the view loan-branch is constructed
...
This value is a null value
...
As we saw in Chapter 3, the view-update anomaly becomes more difficult to handle when a view is defined in terms of several relations
...
Under this constraint, the update, insert, and delete operations would be forbidden
on the example view all-customer that we defined previously
...
4
...
5 Transactions
A transaction consists of a sequence of query and/or update statements
...
One of the following SQL statements must end the transaction:
• Commit work commits the current transaction; that is, it makes the updates
performed by the transaction become permanent in the database
...
• Rollback work causes the current transaction to be rolled back; that is, it undoes all the updates performed by the SQL statements in the transaction
...
The keyword work is optional in both statements
...
Commit is similar, in a sense, to saving changes to a document that
is being edited, while rollback is similar to quitting the edit session without saving
changes
...
The database system guarantees that in the event of some
failure, such as an error in one of the SQL statements, a power outage, or a system
crash, a transaction’s effects will be rolled back if it has not yet executed commit
work
...
For instance, to transfer money from one account to another we need to update
two account balances
...
An error
while a transaction executes one of its statements would result in undoing of the
effects of the earlier statements of the transaction, so that the database is not left in a
partially updated state
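A transfer of this kind can be sketched with Python's sqlite3 module, whose connection object offers `commit()` and `rollback()`. The account numbers, the balances, and the check constraint that forbids negative balances are assumptions added so that the second transfer fails partway through.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""create table account (
    account_number text primary key,
    balance integer check (balance >= 0))""")
conn.executemany("insert into account values (?, ?)", [("A-101", 500), ("A-215", 700)])
conn.commit()

def transfer(conn, source, target, amount):
    """Move amount between accounts; commit both updates or neither."""
    try:
        conn.execute("update account set balance = balance + ? where account_number = ?",
                     (amount, target))
        conn.execute("update account set balance = balance - ? where account_number = ?",
                     (amount, source))
        conn.commit()
        return True
    except sqlite3.IntegrityError:
        conn.rollback()   # undo the credit that already ran
        return False

ok = transfer(conn, "A-101", "A-215", 200)
failed = transfer(conn, "A-101", "A-215", 1000)   # would overdraw A-101
final = dict(conn.execute("select account_number, balance from account"))
```

The failed transfer leaves no trace: the credit to A-215 is undone along with the rejected debit.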
...
If a program terminates without executing either of these commands, the updates
are either committed or rolled back
...
In many SQL implementations, by default each SQL statement is taken to be a transaction on its own, and gets
committed as soon as it is executed
...
How to turn off automatic commit depends on the specific SQL implementation
...
end
...
4
...
4
...
1
loan
loan-number   branch-name   amount
L-170         Downtown      3000
L-230         Redwood       4000
L-260         Perryridge    1700

borrower
customer-name   loan-number
Jones           L-170
Smith           L-230
Hayes           L-155
The loan and borrower relations
...
These additional operations are typically used as subquery
expressions in the from clause
...
10
...
1
...
Figure 4
...
loan-number = borrower
...
loan-number = borrower
...
The attributes of the
result consist of the attributes of the left-hand-side relation followed by the attributes
of the right-hand-side relation
...
The SQL standard does not require
attribute names in such results to be unique
...
We rename the result relation of a join and the attributes of the result relation by
using an as clause, as illustrated here:
loan inner join borrower on loan
...
loan-number
as lb(loan-number, branch, amount, cust, cust-loan-num)
We rename the second occurrence of loan-number to cust-loan-num
...
Next, we consider an example of the left outer join operation:
loan left outer join borrower on loan
...
loan-number
loan-number   branch-name   amount   customer-name   loan-number
L-170         Downtown      3000     Jones           L-170
L-230         Redwood       4000     Smith           L-230
Figure 4
...
loan-number = borrower
...
loan-number   branch-name
L-170         Downtown
L-230         Redwood
L-260         Perryridge
4
...
3 The result of loan left outer join borrower on
loan
...
loan-number
...
First, compute the
result of the inner join as before
...
Figure 4
...
The tuples (L-170, Downtown, 3000) and (L-230, Redwood, 4000) join with
tuples from borrower and appear in the result of the inner join, and hence in the result
of the left outer join
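The inner join and the left outer join can be compared directly on the loan and borrower data. The sketch below uses Python's sqlite3 module with underscore identifiers; nulls come back as `None`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table loan     (loan_number text, branch_name text, amount integer);
create table borrower (customer_name text, loan_number text);
insert into loan values
  ('L-170', 'Downtown', 3000), ('L-230', 'Redwood', 4000), ('L-260', 'Perryridge', 1700);
insert into borrower values ('Jones', 'L-170'), ('Smith', 'L-230'), ('Hayes', 'L-155');
""")

inner = conn.execute("""
select * from loan inner join borrower
  on loan.loan_number = borrower.loan_number
order by loan.loan_number""").fetchall()

# Same join, but unmatched loan tuples are kept and padded with nulls.
left_outer = conn.execute("""
select * from loan left outer join borrower
  on loan.loan_number = borrower.loan_number
order by loan.loan_number""").fetchall()
```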
...
Finally, we consider an example of the natural join operation:
loan natural inner join borrower
This expression computes the natural join of the two relations
...
Figure 4
...
The result is similar to the result of the inner join with the on condition in
Figure 4
...
However, the attribute
loan-number appears only once in the result of the natural join, whereas it appears
twice in the result of the join with the on condition
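The single occurrence of loan-number in a natural join can be checked by inspecting the result's column names. This sketch uses Python's sqlite3 module with underscore identifiers and the same loan and borrower rows as the figures.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table loan     (loan_number text, branch_name text, amount integer);
create table borrower (customer_name text, loan_number text);
insert into loan values
  ('L-170', 'Downtown', 3000), ('L-230', 'Redwood', 4000), ('L-260', 'Perryridge', 1700);
insert into borrower values ('Jones', 'L-170'), ('Smith', 'L-230'), ('Hayes', 'L-155');
""")

cursor = conn.execute(
    "select * from loan natural inner join borrower order by loan.loan_number")
columns = [d[0] for d in cursor.description]   # names of the result attributes
rows = cursor.fetchall()
```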
...
10
...
10
...
Join operations take two relations and return another relation as the result
...
Each of the variants of the join operations in SQL consists of a join type and a join
condition
...
The join type defines how tuples in each
loan-number
L-170
L-230
Figure 4
...
Join types
inner join
left outer join
right outer join
full outer join
Figure 4
...
, An)
Join types and join conditions
...
Figure 4
...
The
first join type is the inner join, and the other three are the outer joins
...
The use of a join condition is mandatory for outer joins, but is optional for inner
joins (if it is omitted, a Cartesian product results)
...
The keywords inner and outer are
optional, since the rest of the join type enables us to deduce whether the join is an
inner join or an outer join
...
The ordering of the attributes in the result of a
natural join is as follows
...
Next come all nonjoin attributes of the left-hand-side relation, and finally all nonjoin
attributes of the right-hand-side relation
...
Tuples from the right-hand-side relation that do not match any tuple in the left-hand-side relation are padded
with nulls and are added to the result of the right outer join
...
6 shows the result of this expression
...
The
first two tuples in the result are from the inner natural join of loan and borrower
...
Hence, the tuple (L-155, null,
null, Hayes) appears in the join result
...
, An ) is similar to the natural join condition, except that the join attributes are the attributes A1 , A2 ,
...
The attributes A1 , A2 ,
...
The full outer join is a combination of the left and right outer-join types
...
4
...
6
branch-name
Downtown
Redwood
null
4
...
the left-hand-side relation that did not match with any from the right-hand-side, and
adds them to the result
...
For example, Figure 4
...
customer-name = borrower
...
The
first is equivalent to an inner join without a join condition; the second is equivalent
to a full outer join on the “false” condition — that is, where the inner join is empty
...
7
loan-number   branch-name   amount   customer-name
L-170         Downtown      3000     Jones
L-230         Redwood       4000     Smith
L-260         Perryridge    1700     null
L-155         null          null     Hayes
The result of loan full outer join borrower using(loan-number)
...
4
...
11 Data-Definition Language
In most of our discussions of SQL and relational databases, we have accepted a set of
relations as given
...
The SQL DDL allows specification of not only a set of relations, but also information
about each relation, including
• The schema for each relation
• The domain of values associated with each attribute
• The integrity constraints
• The set of indices to be maintained for each relation
• The security and authorization information for each relation
• The physical storage structure of each relation on disk
We discuss here schema definition and domain values; we defer discussion of the
other SQL DDL features to Chapter 6
...
11
...
The full
form, character, can be used instead
...
The full form, character varying, is equivalent
...
The
full form, integer, is equivalent
...
• numeric(p, d): A fixed-point number with user-specified precision
...
Thus, numeric(3,1) allows 44
...
5 or 0
...
• real, double precision: Floating-point and double-precision floating-point
numbers with machine-dependent precision
...
• date: A calendar date containing a (four-digit) year, month, and day of the
month
...
• time: The time of day, in hours, minutes, and seconds
...
It is also possible to store time zone information along with the time
...
A variant, timestamp(p), can be
used to specify the number of fractional digits for seconds (the default here
being 6)
...
45’
Dates must be specified in the format year followed by month followed by day, as
shown
...
We can use an expression of the form cast e as t to convert a character string (or string-valued expression) e to the type t, where t is one of date, time,
or timestamp
...
To extract individual fields of a date or time value d, we can use extract (field from
d), where field can be one of year, month, day, hour, minute, or second
...
SQL
also provides a data type called interval, and it allows computations based on dates
and times and on intervals
...
Similarly, adding
or subtracting an interval to a date or time gives back a date or time, respectively
...
For example, since
every small integer is an integer, a comparison x < y, where x is a small integer and
y is an integer (or vice versa), makes sense
...
A transformation of this sort is called a type coercion
...
As an illustration, suppose that the domain of customer-name is a character string
of length 20, and the domain of branch-name is a character string of length 15
...
As we discussed in Chapter 3, the null value is a member of all domains
...
Consider a tuple in the
customer relation where customer-name is null
...
In cases such
as this, we wish to forbid null values, and we do so by restricting the domain of
customer-name to exclude null values
...
Any database
modification that would cause a null to be inserted in a not null domain generates
an error diagnostic
...
In particular, it is essential to prohibit null values in the primary key of a relation
schema
...
4
...
2 Schema Definition in SQL
We define an SQL relation by using the create table command:
create table r(A1 D1 , A2 D2 ,
...
,
integrity-constraintk )
where r is the name of the relation, each Ai is the name of an attribute in the schema
of relation r, and Di is the domain type of values in the domain of attribute Ai
...
, Ajm ): The primary key specification says that attributes Aj1 , Aj2 ,
...
The primary
key attributes are required to be non-null and unique; that is, no tuple can have
a null value for a primary key attribute, and no two tuples in the relation can
be equal on all the primary-key attributes
...
• check(P): The check clause specifies a predicate P that must be satisfied by
every tuple in the relation
...
Figure 4
...
Note that,
as in earlier chapters, we do not attempt to model precisely the real world in the
bank-database example
...
We use customer-name as a primary key to keep our
database schema simple and short
...
Similarly, it
flags an error and prevents the update if the check condition on the tuple fails
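How a system rejects updates that violate these constraints can be tried in SQLite through Python's sqlite3 module. This is a sketch under assumptions: the branch schema and rows are illustrative, identifiers use underscores, and not null is stated explicitly on the key because SQLite does not by itself forbid nulls in a primary key declared this way.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
create table branch (
    branch_name char(15) not null,
    branch_city char(30),
    assets      integer,
    primary key (branch_name),
    check (assets >= 0)
)""")
conn.execute("insert into branch values ('Perryridge', 'Horseneck', 1700000)")

def rejected(sql):
    """Return True if the database refuses the statement as a constraint violation."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

duplicate_key = rejected("insert into branch values ('Perryridge', 'Rye', 10)")
negative_assets = rejected("insert into branch values ('Downtown', 'Brooklyn', -5)")
null_key = rejected("insert into branch values (null, 'Brooklyn', 10)")
(row_count,) = conn.execute("select count(*) from branch").fetchone()
```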
...
An attribute can be declared to be not null in the
following way:
account-number char(10) not null
1
...
4
...
8
SQL data definition for part of the bank database
...
, Ajm )
The unique specification says that attributes Aj1 , Aj2 ,
...
However, candidate key attributes are permitted to be null unless they have explicitly
been declared to be not null
...
The treatment of nulls here is the same as that of the unique construct defined in
Section 4
...
4
...
For instance, the check
clause in the create table command for relation branch checks that the value of assets
is nonnegative
...
4
...
We consider more
general forms of check conditions, as well as a class of constraints called referential
integrity constraints, in Chapter 6
...
We can use the insert command to load
data into the relation
...
To remove a relation from an SQL database, we use the drop table command
...
The command
drop table r
is a more drastic action than
delete from r
The latter retains relation r, but deletes all tuples in r
...
After r is dropped, no tuples can be inserted
into r unless it is re-created with the create table command
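The difference between the two commands can be observed with a short sqlite3 sketch in Python. The relation r and its single attribute a are placeholders; the exception class caught after the drop is the one SQLite raises for a missing table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table r (a integer)")
conn.execute("insert into r values (1)")

# delete from r: the tuples go, the relation stays.
conn.execute("delete from r")
conn.execute("insert into r values (2)")
(count_after_delete,) = conn.execute("select count(*) from r").fetchone()

# drop table r: the relation itself is gone.
conn.execute("drop table r")
try:
    conn.execute("insert into r values (3)")
    insert_after_drop_failed = False
except sqlite3.OperationalError:   # SQLite reports "no such table"
    insert_after_drop_failed = True

# Re-creating the relation makes insertion possible again.
conn.execute("create table r (a integer)")
conn.execute("insert into r values (4)")
recreated = conn.execute("select a from r").fetchall()
```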
...
All tuples
in the relation are assigned null as the value for the new attribute
...
We can drop attributes from a relation by
the command
alter table r drop A
where r is the name of an existing relation, and A is the name of an attribute of the
relation
...
4
...
Writing queries in SQL is usually much easier than coding the same queries in a general-purpose programming
language
...
Not all queries can be expressed in SQL, since SQL does not provide the full
expressive power of a general-purpose language
...
To write such queries, we can embed SQL within a more
powerful language
...
SQL is designed so that queries written in it can be optimized automatically
and executed efficiently — and providing the full power of a programming
language makes automatic optimization exceedingly difficult
...
Nondeclarative actions— such as printing a report, interacting with a user, or
sending the results of a query to a graphical user interface — cannot be done
from within SQL
...
For an integrated application, the
programs written in the programming language must be able to access the
database
...
A language in which SQL
queries are embedded is referred to as a host language, and the SQL structures permitted in the host language constitute embedded SQL
...
This embedded form of SQL extends the
programmer’s ability to manipulate the database even further
...
An embedded SQL program must be processed by a special preprocessor prior to
compilation
...
Then, the resulting program is compiled by the host-language compiler
...
For instance, a semicolon is used instead of END-EXEC when SQL
is embedded in C
...
Variables of the host language can be used
within embedded SQL statements, but they must be preceded by a colon (:) to distinguish them from SQL variables
...
There are, however, several important differences, as we note
here
...
The result of the
query is not yet computed
...
Consider the banking schema that we have used in this chapter
...
We can
write this query as follows:
EXEC SQL
declare c cursor for
select customer-name, customer-city
from depositor, customer, account
where depositor
...
customer-name and
account
...
account-number and
account
...
We use
this variable to identify the query in the open statement, which causes the query to
be evaluated, and in the fetch statement, which causes the values of one tuple to be
placed in host-language variables
...
The query has a host-language variable (:amount); the
query uses the value of the variable at the time the open statement was executed
...
An embedded SQL program executes a series of fetch statements to retrieve tuples
of the result
...
For our example query, we need one variable to hold the
customer-name value and another to hold the customer-city value
...
Then the statement:
EXEC SQL fetch c into :cn, :cc END-EXEC
produces a tuple of the result relation
...
A single fetch request returns only one tuple
...
Embedded SQL assists the
programmer in managing this iteration
...
When the program
executes an open statement on a cursor, the cursor is set to point to the first tuple
of the result
...
When no further tuples remain to be processed, the
variable SQLSTATE in the SQLCA is set to ’02000’ (meaning “no data”)
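The declare/open/fetch/close cycle described above can be sketched in Python with the standard sqlite3 module. This is only an analogy, not embedded SQL: the table contents, the underscore column names, and the join conditions are assumptions made for illustration, and a `None` return from `fetchone()` plays the role of the "no data" SQLSTATE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.executescript("""
    create table customer (customer_name text, customer_city text);
    create table account (account_number text, balance real);
    create table depositor (customer_name text, account_number text);
    insert into customer values ('Hayes', 'Harrison'), ('Jones', 'Brooklyn');
    insert into account values ('A-101', 500), ('A-217', 750);
    insert into depositor values ('Hayes', 'A-101'), ('Jones', 'A-217');
""")

amount = 600  # host-language variable, referenced in the SQL text as :amount

# "declare" and "open": the query is evaluated with the value amount has now.
c = conn.execute(
    """select customer.customer_name, customer_city
       from depositor, customer, account
       where depositor.customer_name = customer.customer_name
         and depositor.account_number = account.account_number
         and account.balance > :amount""",
    {"amount": amount},
)

rows = []
while True:
    row = c.fetchone()   # "fetch": one tuple per call
    if row is None:      # no further tuples remain
        break
    cn, cc = row         # the two host variables of the example
    rows.append((cn, cc))

c.close()                # "close"
conn.close()
```

Changing `amount` after the `execute` call has no effect on the tuples returned, which mirrors the rule that the query uses the value of the variable at the time of the open statement.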
...
For our example, this statement takes
the form
EXEC SQL close c END-EXEC
SQLJ, the Java embedding of SQL, provides a variation of the above scheme, where
Java iterators are used in place of cursors
...
Embedded SQL expressions for database modification (update, insert, and delete)
do not return a result
...
A database-modification request takes the form
EXEC SQL <any valid update, insert, or delete> END-EXEC
Host-language variables, preceded by a colon, may appear in the SQL database-modification expression
...
Database relations can also be updated through cursors
...
declare c cursor for
select *
from account
where branch-name = 'Perryridge'
for update
We then iterate through the tuples by performing fetch operations on the cursor (as
illustrated earlier), and after fetching each tuple we execute the following code
update account
set balance = balance + 100
where current of c
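SQLite, as used from Python's sqlite3 module, has no `where current of` clause, so the sketch below swaps in a different technique: it fetches each row's `rowid` and updates by that identifier. The table contents are made up for illustration; only the shape of the loop (fetch a tuple, then update that same tuple) matches the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table account (account_number text, branch_name text, balance real);
    insert into account values
        ('A-102', 'Perryridge', 400), ('A-201', 'Perryridge', 900),
        ('A-101', 'Downtown', 500);
""")

# Fetch an identifier for each qualifying tuple so it can be addressed later.
rows = conn.execute(
    "select rowid from account where branch_name = 'Perryridge'"
).fetchall()

for (rid,) in rows:
    # Stand-in for "where current of c": update exactly the fetched row.
    conn.execute(
        "update account set balance = balance + 100 where rowid = ?", (rid,)
    )
conn.commit()

balances = dict(conn.execute("select account_number, balance from account"))
conn.close()
```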
Embedded SQL allows a host-language program to access the database, but it provides no assistance in presenting results to the user or in generating reports
...
We discuss such tools in Chapter 5
(Section 5
...
4
...
In contrast, embedded SQL statements must be completely present
at compile time; they are compiled by the embedded SQL preprocessor
...
Preparing a dynamic SQL statement compiles it, and
subsequent uses of the prepared statement use the compiled version
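As a rough illustration of building a statement at run time, the following Python sketch assembles the SQL text as an ordinary string and leaves a `?` placeholder for a value supplied at execution. The table, the account numbers, and the variable `rate` are assumptions; sqlite3 has no separate prepare call, so `execute` stands in for both the prepare and execute steps.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table account (account_number text, balance real);
    insert into account values ('A-101', 500), ('A-215', 700);
""")

# The statement does not exist at "compile time"; it is a string built now.
rate = 1.05
sqlprog = f"update account set balance = balance * {rate} where account_number = ?"

# The placeholder is filled in when the statement is executed.
conn.execute(sqlprog, ("A-101",))
conn.commit()

balances = dict(conn.execute("select account_number, balance from account"))
conn.close()
```

Only the trusted numeric constant is spliced into the text here; values that come from users are better passed through placeholders.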
...
char * sqlprog = ”update account set balance = balance ∗1
...
However, the syntax above requires extensions to the language or a preprocessor
for the extended language
...
In the rest of this section, we look at two standards for connecting to an SQL
database and performing queries and updates
...
To understand these standards, we need to understand the concept of SQL sessions
...
Thus, all activities of
the user or application are in the context of an SQL session
...
4
...
1 ODBC∗∗
The Open DataBase Connectivity (ODBC) standard defines a way for an application
program to communicate with a database server
...
Applications such as graphical user
interfaces, statistics packages, and spreadsheets can make use of the same ODBC API
to connect to any database server that supports ODBC
...
When the client program makes an ODBC API call, the code
in the library communicates with the server to carry out the requested action, and
fetch results
...
9 shows an example of C code using the ODBC API
...
To do
so, the program first allocates an SQL environment, then a database connection handle
...
The program then opens
the database connection by using SQLConnect
...
#include <stdio.h>
#include <sql.h>
#include <sqlext.h>

int ODBCexample()
{
    RETCODE error;
    HENV env; /* environment */
    HDBC conn; /* database connection */
    SQLAllocEnv(&env);
    SQLAllocConnect(env, &conn);
    SQLConnect(conn, "aura
...
com", SQL_NTS, "avi", SQL_NTS,
               "avipasswd", SQL_NTS);
    {
        char branchname[80];
        float balance;
        int lenOut1, lenOut2;
        HSTMT stmt;
        /* adjacent string literals are concatenated into one query string */
        char * sqlquery = "select branch_name, sum (balance) "
                          "from account "
                          "group by branch_name";
        SQLAllocStmt(conn, &stmt);
        error = SQLExecDirect(stmt, sqlquery, SQL_NTS);
        if (error == SQL_SUCCESS) {
            SQLBindCol(stmt, 1, SQL_C_CHAR, branchname, 80, &lenOut1);
            SQLBindCol(stmt, 2, SQL_C_FLOAT, &balance, 0, &lenOut2);
            /* stop on "no data" or on an error; both differ from SQL_SUCCESS */
            while (SQLFetch(stmt) == SQL_SUCCESS) {
                printf(" %s %g\n", branchname, balance);
            }
        }
        /* free the statement while stmt is still in scope */
        SQLFreeStmt(stmt, SQL_DROP);
    }
    SQLDisconnect(conn);
    SQLFreeConnect(conn);
    SQLFreeEnv(env);
    return 0;
}
Figure 4
...
cluding the connection handle, the server to which to connect, the user identifier,
and the password for the database
...
Once the connection is set up, the program can send SQL commands to the database
by using SQLExecDirect. C language variables can be bound to attributes of the query
result, so that when a result tuple is fetched using SQLFetch, its attribute values are
stored in corresponding C variables
...
The next argument
gives the address of the variable
...
A negative value returned
for the length field indicates that the value is null
...
On each fetch, the program stores the values
in C variables as specified by the calls on SQLBindCol and prints out these values
...
Good
programming style requires that the result of every function call be checked to
make sure there are no errors; we have omitted most of these checks for brevity
...
The question marks are placeholders
for values which will be supplied later
...
ODBC defines functions for a variety of tasks, such as finding all the relations in the
database and finding the names and types of columns of a query result or a relation
in the database
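The same kind of catalog lookup can be tried out in Python with sqlite3. This is an analogy rather than the ODBC catalog functions themselves: `sqlite_master` and `pragma table_info` are SQLite-specific, and the two tables are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table account (account_number text, branch_name text, balance real)")
conn.execute(
    "create table branch (branch_name text, branch_city text, assets real)")

# All relations in the database (SQLite keeps its catalog in sqlite_master).
relations = [name for (name,) in conn.execute(
    "select name from sqlite_master where type = 'table' order by name")]

# Column names of a query result, taken from the cursor's description.
cur = conn.execute(
    "select branch_name, sum(balance) as total from account group by branch_name")
result_columns = [d[0] for d in cur.description]

# Names and declared types of the columns of one relation.
account_columns = [(row[1], row[2])
                   for row in conn.execute("pragma table_info(account)")]
conn.close()
```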
...
The call SQLSetConnectOption(conn, SQL_AUTOCOMMIT, 0) turns
off automatic commit on connection conn, and transactions must then be committed
explicitly by SQLTransact(conn, SQL_COMMIT) or rolled back by SQLTransact(conn,
SQL_ROLLBACK)
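Explicit commit and rollback can be illustrated with a small Python sketch. It assumes the sqlite3 module's default behaviour of opening a transaction implicitly before an update; the `transfer` helper and the account data are hypothetical and are not part of the ODBC API.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table account (account_number text primary key, balance real)")
conn.execute("insert into account values ('A-101', 500), ('A-215', 700)")
conn.commit()

def transfer(conn, src, dst, amount):
    """Move amount from src to dst; commit both updates or neither."""
    try:
        conn.execute(
            "update account set balance = balance - ? where account_number = ?",
            (amount, src))
        (remaining,) = conn.execute(
            "select balance from account where account_number = ?",
            (src,)).fetchone()
        if remaining < 0:
            raise ValueError("insufficient funds")
        conn.execute(
            "update account set balance = balance + ? where account_number = ?",
            (amount, dst))
        conn.commit()      # make both updates permanent
        return True
    except ValueError:
        conn.rollback()    # undo the partial update
        return False

ok = transfer(conn, "A-101", "A-215", 200)
failed = transfer(conn, "A-101", "A-215", 1000)
balances = dict(conn.execute("select account_number, balance from account"))
conn.close()
```

The second call debits the source account and then rolls back, so the balances after it are the same as after the first call.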
...
Each version defines conformance levels, which specify subsets of the functionality defined by
the standard
...
Level 1 requires support
for fetching information about the catalog, such as information about what relations
are present and the types of their attributes
...
The more recent SQL standards (SQL-92 and SQL:1999) define a call level interface
(CLI) that is similar to the ODBC interface, but with some minor differences
...
13
...
(The word JDBC was originally an abbreviation for “Java Database Connectivity”, but the full form is no longer used
...
10 shows an example Java program that uses the JDBC interface
...
forName
...
public static void JDBCexample(String dbid, String userid, String passwd)
{
try
{
Class
...
jdbc
...
OracleDriver”);
Connection conn = DriverManager
...
bell-labs
...
createStatement();
try {
stmt
...
out
...
” + sqle);
}
ResultSet rset = stmt
...
next()) {
System
...
println(rset
...
getFloat(2));
}
stmt
...
close();
}
catch (SQLException sqle)
{
System
...
println("SQLException : " + sqle);
}
}
Figure 4
...
runs (in our example, aura
...
com), the port number it uses for communication (in our example, 2000)
...
The first parameter also specifies the protocol to be used to communicate
with the database (in our example, jdbc:oracle:thin:)
...
A JDBC driver may support multiple protocols, and we must specify one supported by both the database and the driver
...
The program then creates a statement handle on the connection and uses it to
execute an SQL statement and get back results
...
executeUpdate
executes an update statement
...
} catch {
...
prepareStatement(
"insert into account values(?,?,?)");
pStmt
...
setString(2, "Perryridge");
pStmt
...
executeUpdate();
pStmt
...
executeUpdate();
Figure 4
...
catch any exceptions (error conditions) that arise when JDBC calls are made, and print
an appropriate message to the user
...
executeQuery
...
Figure 4
...
We can also create a prepared statement in which some values are replaced by “?”,
thereby specifying that actual values will be provided later
...
The database can compile the query when it is prepared,
and each time it is executed (with new values), the database can reuse the previously
compiled form of the query
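The idea of one statement with "?" placeholders executed several times with different values can be tried in Python's sqlite3 module, shown here as an analogy to a JDBC prepared statement. The account numbers and amounts are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table account (account_number text, branch_name text, balance real)")

# One statement text with "?" placeholders, executed with different values.
insert_stmt = "insert into account values (?, ?, ?)"
conn.execute(insert_stmt, ("A-301", "Perryridge", 1200))
conn.execute(insert_stmt, ("A-302", "Perryridge", 1200))

# Values are bound rather than spliced into the text,
# so a quote character inside the data is harmless.
conn.execute(insert_stmt, ("A-303", "O'Brien Square", 50))
conn.commit()

count = conn.execute("select count(*) from account").fetchone()[0]
odd_branch = conn.execute(
    "select branch_name from account where account_number = ?",
    ("A-303",)).fetchone()[0]
conn.close()
```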
...
11 shows how prepared
statements can be used
...
It can
create an updatable result set from a query that performs a selection and/or a projection on a database relation
...
JDBC also provides an
API to examine database schemas and to find the types of attributes of a result set
...
4
...
We covered the basics of SQL earlier in this chapter
...
4
...
1 Schemas, Catalogs, and Environments
To understand the motivation for schemas and catalogs, consider how files are named
in a file system
...
Current generation file systems of course have a directory structure, with
...
To name a file uniquely, we must specify the full
path name of the file, for example, /users/avi/db-book/chapter4
...
Like early file systems, early database systems also had a single name space for all
relations
...
Contemporary database systems provide a three-level hierarchy for naming relations
...
SQL objects such as relations and views are contained
within a schema
...
The user must provide the user name and, usually, a secret
password for verifying the identity of the user, as we saw in the ODBC and JDBC
examples in Sections 4
...
1 and 4
...
2
...
When a user connects to a database system,
the default catalog and schema are set up for the connection; this corresponds to
the current directory being set to the user’s home directory when the user logs into
an operating system
...
bank-schema
...
Thus if catalog5 is the default
catalog, we can use bank-schema
...
Further, we may also omit the schema name, and the schema part of the name is again
considered to be the default schema for the connection
...
With multiple catalogs and schemas available, different applications and different users can work independently without worrying about name clashes
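The defaulting rule for the three-level name can be written as a small function. This is a sketch of the naming rule only: the function `resolve` is hypothetical, and `account` is used as a sample relation name.

```python
def resolve(name, default_catalog, default_schema):
    """Expand a relation name to (catalog, schema, relation) using the defaults."""
    parts = name.split(".")
    if len(parts) == 3:      # fully qualified: nothing to fill in
        return tuple(parts)
    if len(parts) == 2:      # catalog omitted: use the default catalog
        return (default_catalog, parts[0], parts[1])
    if len(parts) == 1:      # catalog and schema omitted: use both defaults
        return (default_catalog, default_schema, parts[0])
    raise ValueError(f"too many name components: {name!r}")

full = resolve("catalog5.bank-schema.account", "catalog5", "bank-schema")
short = resolve("bank-schema.account", "catalog5", "bank-schema")
bare = resolve("account", "catalog5", "bank-schema")
```

With catalog5 and bank-schema as the connection defaults, all three spellings identify the same relation.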
...
The default catalog and schema are part of an SQL environment that is set up
for each connection
...
All the usual SQL statements, including the
DDL and DML statements, operate in the context of a schema
...
Creation and
dropping of catalogs is implementation dependent and not part of the SQL standard
...
14
...
A module typically contains multiple SQL procedures
...
An extension of the SQL-92 standard language also permits procedural constructs, such as for, while, and if-then-else, and
compound SQL statements (multiple SQL statements between a begin and an end)
...
Such procedures are also called stored procedures
...
Chapter 9 covers procedural extensions of SQL as well as many other new features
of SQL:1999
...
15 Summary
• Commercial database systems do not use the terse, formal query languages
covered in Chapter 3
...
”
• SQL includes a variety of language constructs for queries on the database
...
SQL also allows ordering of query results by sorting on specified attributes
...
Views are useful for hiding unneeded information, and for collecting together
information from more than one relation into a single view
...
• SQL provides constructs for updating, inserting, and deleting information
...
That is, all the operations are carried out successfully, or none is carried out
...
• Modifications to the database may lead to the generation of null values in
tuples
...
• The SQL data definition language is used to create relations with specified
schemas
...
Further details on the SQL DDL, in particular its support for integrity
constraints, appear in Chapter 6
...
The ODBC and JDBC standards define application program interfaces to
access SQL databases from C and Java language programs
...
• We also saw a brief overview of some advanced features of SQL, such as procedural extensions, catalogs, schemas and stored procedures
...
4
...
1 Consider the insurance database of Figure 4
...
Construct the following SQL queries for this relational database
...
Find the total number of people who owned cars that were involved in accidents in 1989
...
Find the number of accidents in which the cars belonging to “John Smith”
were involved
...
Add a new accident to the database; assume any values for required attributes
...
Delete the Mazda belonging to “John Smith”
...
Update the damage amount for the car with license number “AABB2000” in
the accident with report number “AR2197” to $3000
...
2 Consider the employee database of Figure 4
...
Give an expression in SQL for each of the following queries
...
Find the names of all employees who work for First Bank Corporation
...
Figure 4.12 Insurance database
...
Figure 4.13 Employee database
...
Find the names and cities of residence of all employees who work for First
Bank Corporation
...
Find the names, street addresses, and cities of residence of all employees
who work for First Bank Corporation and earn more than $10,000
...
Find all employees in the database who live in the same cities as the companies for which they work
...
Find all employees in the database who live in the same cities and on the
same streets as do their managers
...
Find all employees in the database who do not work for First Bank Corporation
...
Find all employees in the database who earn more than each employee of
Small Bank Corporation
...
Assume that the companies may be located in several cities
...
i
...
j
...
k
...
l
...
4
...
13
...
a
...
b
...
c
...
d
...
e
...
4
...
Give an expression in SQL that is equivalent
to each of the following queries
...
ΠA (r)
b
...
r × s
d
...
5 Let R = (A, B, C), and let r1 and r2 both be relations on schema R
...
a
...
c
...
r1 ∪ r2
r1 ∩ r2
r1 − r2
ΠAB (r1) ⋈ ΠBC (r2)
4
...
Write an
expression in SQL for each of the queries below:
a
...
{< a, b, c > | < a, b > ∈ r ∧ < a, c > ∈ s}
c
...
7 Show that, in SQL, <> all is identical to not in
...
8 Consider the relational database of Figure 4
...
Using SQL, define a view consisting of manager-name and the average salary of all employees who work for
that manager
...
4
...
a1
from p, r1, r2
where p
...
a1 or p
...
a1
Under what conditions does the preceding query select values of p
...
4
...
Using a nested query in the from clause
...
Using a nested query in a having clause
...
11 Suppose that we have a relation marks(student-id, score) and we wish to assign
grades to students based on the score as follows: grade F if score < 40, grade C
if 40 ≤ score < 60, grade B if 60 ≤ score < 80, and grade A if 80 ≤ score
...
Display the grade for each student, based on the marks relation
...
Find the number of students with each grade
...
12 SQL-92 provides an n-ary operation called coalesce, which is defined as follows:
coalesce(A1 , A2 ,
...
, An ,
and returns null if all of A1 , A2 ,
...
Show how to express the coalesce operation using the case operation
...
13 Let a and b be relations with the schemas A(name, address, title) and B(name, address, salary), respectively
...
Make sure that the result relation does not contain two copies of the attributes
name and address, and that the solution is correct even if some tuples in a and b
have null values for attributes name or address
...
14 Give an SQL schema definition for the employee database of Figure 4
...
Choose
an appropriate domain for each attribute and an appropriate primary key for
each relation schema
...
15 Write check conditions for the schema you defined in Exercise 4
...
Every employee works for a company located in the same city as the city in
which the employee lives
...
No employee earns a salary higher than that of his manager
...
16 Describe the circumstances in which you would choose to use embedded SQL
rather than SQL alone or only a general-purpose programming language
...
[1976]
...
[1975] and Chamberlin and Boyce [1974]
...
The IBM Systems Application Architecture definition of SQL is defined by IBM
[1987]
...
Textbook descriptions of the SQL-92 language include Date and Darwen [1997],
Melton and Simon [1993], and Cannan and Otten [1993]
...
More information on SQLJ
and SQLJ software can be obtained from http://www
...
org
...
Eisenberg and Melton [1999] provide an overview of SQL:1999
...
Part 1 (SQL/Framework),
gives an overview of the other parts
...
Part 3 (SQL/CLI) describes the Call-Level Interface
...
The standard is useful to database implementers but is very hard
to read
...
ansi
...
Many database products support SQL features beyond those specified in the standards, and may not support some features of the standard
...
http://java
...
com/docs/books/tutorial is an excellent source for more (and up-to-date) information on JDBC, and on Java in general
...
The ODBC API is described in Microsoft
[1997] and Sanders [1998]
...
Bibliographic references on these matters appear in
that chapter
...
Chapter 5
Other Relational Languages
...
In this chapter, we study two more languages: QBE and Datalog
...
QBE and its variants
are widely used in database systems on personal computers
...
Although not used commercially at present, Datalog has been used in several research database systems
...
Keep in mind that individual implementations of a
language may differ in details, or may support only a subset of the full language
...
While these are not strictly speaking languages, they form the main
interface to a database for many users
...
5
...
The QBE database system was
developed at IBM’s T
...
Watson Research Center in the early 1970s
...
Today, many database systems for personal computers support variants of the QBE language
...
It has two
distinctive features:
1
...
A query in a one-dimensional
language (for example, SQL) can be written in one (possibly long) line
...
(There is a
one-dimensional version of QBE, but we shall not consider it in our discussion)
...
QBE queries are expressed “by example
...
The system generalizes this example to compute the answer to the query
...
We express queries in QBE by skeleton tables
...
1
...
An example row consists of constants and example elements, which are domain
variables
...
[Skeleton tables for the relations branch, customer, loan, borrower, account, and depositor]
Figure 5
...
5
...
1 Queries on One Relation
Returning to our ongoing bank example, to find all loan numbers at the Perryridge
branch, we bring up the skeleton for the loan relation, and fill it in as follows:
loan
loan-number
P
...
For each such tuple, the system assigns the value
of the loan-number attribute to the variable x
...
appears in the loan-number column next to
the variable x
...
As a result,
if a variable does not appear more than once in a query, it may be omitted
...
branch-name
Perryridge
amount
QBE (unlike SQL) performs duplicate elimination automatically
...
after the P
...
ALL
...
in
every field
...
in
the column headed by the relation name:
loan
P
...
branch-name
amount
>700
Comparisons can involve only one arithmetic expression on the right-hand side of
the comparison operation (for example, > ( x + y − 20))
...
The space on the left-hand side of the comparison operation must be blank
...
Note that requiring the left-hand side to be blank implies that we cannot compare
two distinct named variables
...
As yet another example, consider the query “Find the names of all branches that
are not located in Brooklyn
...
branch-city
¬ Brooklyn
assets
The primary purpose of variables in QBE is to force values of certain tuples to have
the same value on certain attributes
...
x
x
To execute this query, the system finds all pairs of tuples in borrower that agree on
the loan-number attribute, where the value for the customer-name attribute is “Smith”
for one tuple and “Jones” for the other
...
In the domain relational calculus, the query would be written as
{< l > | ∃ x (< x, l > ∈ borrower ∧ x = “Smith”)
∧ ∃ x (< x, l > ∈ borrower ∧ x = “Jones”)}
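For comparison, the same "pairs of tuples that agree on loan-number" evaluation can be expressed as a self-join in SQL. The sketch below runs it through Python's sqlite3 module; the borrower tuples and the underscore column names are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table borrower (customer_name text, loan_number text);
    insert into borrower values
        ('Smith', 'L-11'), ('Jones', 'L-11'),
        ('Smith', 'L-23'), ('Jones', 'L-17');
""")

# b1 and b2 are two copies of borrower; the shared variable of the QBE
# query becomes the equality b1.loan_number = b2.loan_number.
loans = [ln for (ln,) in conn.execute("""
    select distinct b1.loan_number
    from borrower as b1, borrower as b2
    where b1.loan_number = b2.loan_number
      and b1.customer_name = 'Smith'
      and b2.customer_name = 'Jones'
""")]
conn.close()
```

Only the loan held jointly by Smith and Jones qualifies; loans held by just one of them are excluded.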
As another example, consider the query “Find all customers who live in the same
city as Jones”:
customer
customer-name
P
...
1
...
The connections among the various relations are achieved through variables that force certain tuples to have the same value
on certain attributes
...
This query can be written as
y
amount
loan-number
x
To evaluate the preceding query, the system finds tuples in loan with “Perryridge”
as the value for the branch-name attribute
...
It
displays the values for the customer-name attribute
...
x
customer-name
x
account-number
loan-number
Now consider the query “Find the names of all customers who have an account
at the bank, but who do not have a loan from the bank
...
x
customer-name
x
account-number
loan-number
Compare the preceding query with our earlier query “Find the names of all customers who have both an account and a loan at the bank
...
This difference, however,
has a major effect on the processing of the query
...
There is a tuple in the depositor relation whose customer-name is the domain
variable x
...
There is no tuple in the borrower relation whose customer-name is the same as
in the domain variable x
...
”
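The two conditions (a depositor tuple exists, and no matching borrower tuple exists) map naturally onto `not exists` in SQL. The following Python sqlite3 sketch uses made-up tuples and underscore column names to show the effect.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table depositor (customer_name text, account_number text);
    create table borrower (customer_name text, loan_number text);
    insert into depositor values
        ('Hayes', 'A-102'), ('Jones', 'A-217'), ('Lindsay', 'A-222');
    insert into borrower values ('Jones', 'L-17'), ('Smith', 'L-11');
""")

# Keep a depositor name only if no borrower tuple carries the same name.
names = [n for (n,) in conn.execute("""
    select distinct customer_name
    from depositor
    where not exists (select *
                      from borrower
                      where borrower.customer_name = depositor.customer_name)
    order by customer_name
""")]
conn.close()
```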
The fact that we placed the ¬ under the relation name, rather than under an attribute name, is important
...
Thus, to
find all customers who have at least two accounts, we write
depositor
customer-name
P
...
”
5
...
3 The Condition Box
At times, it is either inconvenient or impossible to express all the constraints on the
domain variables within the skeleton tables
...
QBE allows logical expressions to appear in a condition box
...
For example, the query “Find the loan numbers of all loans made to Smith, to Jones
(or to both jointly)” can be written as
borrower
customer-name
n
loan-number
P
...
in multiple rows
...
in multiple rows are sometimes hard to
understand, and are best avoided
...
1
...
” We want to include an “x = Jones” constraint in this query
...
conditions
x ≥ 1300
x ≤ 1500
branch-name
balance
x
” This query can be written as
branch
branch-name
P
...
We can
write the query “Find all branches that have assets that are at least twice as large as
the assets of one of the branches located in Brooklyn” much as we did in the preceding query, by modifying the condition box to
conditions
y ≥ 2* z
To find all account numbers of accounts with a balance between $1300 and $2000,
but not exactly $1500, we write
account
account-number
P
...
To find all branches that are located in either Brooklyn or Queens,
we write
branch
branch-name
P
...
1
...
If the result of a query
includes attributes from several relation schemas, we need a mechanism to display
the desired result in a single table
...
We print the desired
result by including the command P
...
As an illustration, consider the query “Find the customer-name, account-number, and
balance for all accounts at the Perryridge branch
...
Join depositor and account
...
Project customer-name, account-number, and balance
...
Create a skeleton table, called result, with attributes customer-name, account-number, and balance
...
2
...
The resulting query is
account
account-number
y
depositor
result
P
...
1
...
We gain this control by inserting either the command AO
...
(descending order) in the appropriate column
...
AO
...
We specify the order in which the sorting should be carried out by including, with
each sort operator (AO or DO), an integer surrounded by parentheses
...
AO(1)
...
DO(2)
...
The command P
...
specifies that the account number should be sorted first;
the command P
...
specifies that the balances for each account should then be
sorted
...
1
...
We must postfix these operators with ALL
...
The ALL
...
Thus, to find
the total balance of all the accounts maintained at the Perryridge branch, we write
account
account-number
branch-name
Perryridge
balance
P
...
ALL
...
Thus, to
find the total number of customers who have an account at the bank, we write
depositor
customer-name
account-number
P
...
UNQ
...
operator, which is analogous to SQL’s group by construct
...
G
...
AVG
...
x
The average balance is computed on a branch-by-branch basis
...
in the P
...
ALL
...
If we wish to display the branch names in ascending order, we replace P
...
by
P
...
G
...
ALL
...
G
...
UNQ
...
UNQ
...
Thus, CNT
...
w is the number of distinct branches in Brooklyn
...
• The customer whose name is x has an account at the branch
...
UNQ
...
If CNT
...
z = CNT
...
w, then customer x must have an account
at all of the branches located in Brooklyn
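The counting argument can be checked on a small example. The sketch below expresses it in SQL, run through Python's sqlite3 module, by comparing two distinct counts; the tables and tuples are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table branch (branch_name text, branch_city text);
    create table account (account_number text, branch_name text);
    create table depositor (customer_name text, account_number text);
    insert into branch values
        ('Brighton', 'Brooklyn'), ('Downtown', 'Brooklyn'),
        ('Perryridge', 'Horseneck');
    insert into account values
        ('A-1', 'Brighton'), ('A-2', 'Downtown'),
        ('A-3', 'Brighton'), ('A-4', 'Perryridge');
    insert into depositor values
        ('Jones', 'A-1'), ('Jones', 'A-2'), ('Hayes', 'A-3'), ('Hayes', 'A-4');
""")

# Left side: distinct Brooklyn branches where the customer has an account.
# Right side: distinct branches in Brooklyn overall.
customers = [n for (n,) in conn.execute("""
    select d.customer_name
    from depositor d, account a, branch b
    where d.account_number = a.account_number
      and a.branch_name = b.branch_name
      and b.branch_city = 'Brooklyn'
    group by d.customer_name
    having count(distinct b.branch_name) =
           (select count(distinct branch_name)
            from branch
            where branch_city = 'Brooklyn')
""")]
conn.close()
```

Hayes has an account at only one of the two Brooklyn branches, so the counts differ and only Jones is returned.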
...
5
...
7 Modification of the Database
In this section, we show how to add, remove, or change information in QBE
...
1
...
1 Deletion
Deletion of tuples from a relation is expressed in much the same way as a query
...
in place of P
...
When we delete information in only
some of the columns, null values, specified by −, are inserted
...
command operates on only one relation
...
operator for each relation
...
customer
D
...
• Delete the branch-city value of the branch whose name is “Perryridge
...
assets
Thus, if before the delete operation the branch relation contains the tuple
(Perryridge, Brooklyn, 50000), the delete results in the replacement of the preceding tuple with the tuple (Perryridge, −, 50000)
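In SQL terms, deleting the value of a single column amounts to setting it to null. A minimal Python sqlite3 sketch of the tuple above follows; the underscore column names are an assumption, and `None` is how the module reports a null.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table branch (branch_name text, branch_city text, assets integer)")
conn.execute("insert into branch values ('Perryridge', 'Brooklyn', 50000)")

# Remove only the branch-city value; the rest of the tuple is kept.
conn.execute(
    "update branch set branch_city = null where branch_name = 'Perryridge'")
conn.commit()

row = conn.execute("select * from branch").fetchone()
conn.close()
```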
...
loan
D
...
branch-name
customer-name
amount
x
loan-number
y
conditions
x = (≥ 1300 and ≤ 1500)
Note that to delete loans we must delete tuples from both the loan and borrower relations
...
account
D
...
branch
account-number
y
customer-name
branch-name
x
branch-name
x
balance
account-number
y
branch-city
Brooklyn
assets
Note that, in expressing a deletion, we can reference relations other than those from
which we are deleting information
...
1
...
2 Insertion
To insert data into a relation, we either specify a tuple to be inserted or write a query
whose result is a set of tuples to be inserted
...
operator in the query expression
...
The simplest insert is a request to insert one tuple
...
We write
account
I
...
To insert information into the branch relation about a new branch with name “Capital” and city
“Queens,” but with a null asset value, we write
branch
I
...
Consider again the situation where we want to provide as a gift, for all loan customers of the Perryridge branch, a new $200 savings account for every loan account
that they have, with the loan number serving as the account number for the savings
account
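The gift-account insertion is an example of inserting the result of a query. One way to express it in SQL is a pair of `insert ... select` statements, sketched here with Python's sqlite3 module; the loan and borrower tuples are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table loan (loan_number text, branch_name text, amount real);
    create table borrower (customer_name text, loan_number text);
    create table account (account_number text, branch_name text, balance real);
    create table depositor (customer_name text, account_number text);
    insert into loan values
        ('L-15', 'Perryridge', 1500), ('L-16', 'Perryridge', 1300),
        ('L-17', 'Downtown', 1000);
    insert into borrower values
        ('Hayes', 'L-15'), ('Adams', 'L-16'), ('Jones', 'L-17');
""")

# One new $200 account per Perryridge loan, numbered by the loan number.
conn.execute("""
    insert into account
    select loan_number, branch_name, 200
    from loan
    where branch_name = 'Perryridge'
""")
# Link each borrower of such a loan to the new account.
conn.execute("""
    insert into depositor
    select customer_name, borrower.loan_number
    from borrower, loan
    where borrower.loan_number = loan.loan_number
      and branch_name = 'Perryridge'
""")
conn.commit()

accounts = conn.execute(
    "select * from account order by account_number").fetchall()
owners = conn.execute(
    "select * from depositor order by customer_name").fetchall()
conn.close()
```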
...
account-number
x
depositor
I
...
5
...
7
...
For this purpose, we use the U
...
As we could
for insert and delete, we can choose the tuples to be updated by using a query
...
Suppose that we want to update the asset value of the Perryridge branch to
$10,000,000
...
10000000
The preceding query updates the assets of the Perryridge branch to $10,000,000,
regardless of the old value
...
Suppose that interest payments are being
made, and all balances are to be increased by 5 percent
...
x * 1
...
05
...
1
...
While
the original QBE was designed for a text-based display environment, Access QBE is
designed for a graphical display environment, and accordingly is called graphical
query-by-example (GQBE)
...
2
An example query in Microsoft Access QBE
...
2 shows a sample GQBE query
...
” Section 5
...
4 showed how it is expressed in QBE
...
A more significant difference is that
the graphical version of QBE uses a line linking attributes of two tables, instead of a
shared variable, to specify a join condition
...
In the example in Figure 5
...
The attribute account-number is
shared between the two selected tables, and the system automatically inserts a link
between the two tables
...
The link can also be
specified to denote a natural outer-join, instead of a natural join
...
in the table
...
Queries involving group by and aggregation can be created in Access as shown in
Figure 5
...
The query in the figure finds the name, street, and city of all customers
who have more than one account at the bank; we saw the QBE version of the query
earlier in Section 5
...
6
...
3
An aggregation query in Microsoft Access QBE
...
are noted in the design grid
...
SQL has a similar requirement
...
Queries are created through a graphical user interface, by first selecting tables
...
Selection conditions, grouping and aggregation can then be specified
on the attributes in the design grid
...
5
...
As in the relational calculus, a user describes the information desired
without giving a specific procedure for obtaining that information
...
However, the meaning of Datalog programs is defined
in a purely declarative manner, unlike the more procedural semantics of Prolog, so
Datalog simplifies writing simple queries and makes query optimization easier
...
2
...
Before presenting a formal definition
of Datalog rules and their formal meaning, we consider examples
...
The symbol :– is read as “if,” and the comma separating
the “account(A, “Perryridge”, B)” from “B > 700” is read as “and
...
4
...
5
...
4
balance
500
700
400
350
900
700
750
The account relation
...
Each rule defines
a set of tuples that the view relation must contain
...
The following Datalog
program specifies the interest rates for accounts:
interest-rate(A, 5) :– account(A, N , B), B < 10000
interest-rate(A, 6) :– account(A, N , B), B >= 10000
The program has two rules defining a view relation interest-rate, whose attributes are
the account number and the interest rate
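Because each rule contributes a set of tuples and the view relation is their union, the two rules can be mimicked with set comprehensions. The Python sketch below does this for a few made-up account facts; it illustrates the meaning of the rules and is not a Datalog engine.

```python
# Facts for account(account-number, branch-name, balance); sample data only.
account = {
    ("A-101", "Downtown", 500),
    ("A-215", "Mianus", 700),
    ("A-900", "Perryridge", 12000),
}

# interest-rate(A, 5) :- account(A, N, B), B < 10000
rule1 = {(a, 5) for (a, n, b) in account if b < 10000}

# interest-rate(A, 6) :- account(A, N, B), B >= 10000
rule2 = {(a, 6) for (a, n, b) in account if b >= 10000}

# The view relation contains the tuples produced by either rule.
interest_rate = rule1 | rule2
```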
...
Datalog rules can also use negation
...
Thus, Datalog rules are compact, compared to SQL
account-number
A-201
A-217
Figure 5
...
However, when relations have a large number of attributes, or the order or
number of attributes of relations may change, the positional notation can be cumbersome and error prone
...
In such a system, the Datalog rule
defining v1 can be written as
v1(account-number A, balance B) :–
account(account-number A, branch-name “Perryridge”, balance B),
B > 700
Translation between the two forms can be done without significant effort, given the
relation schema
...
2
...
2
...
We use the same conventions
as in the relational algebra for denoting relation names, attribute names, and constants (such as numbers or quoted strings)
...
Examples of constants are 4, which is a number, and “John,” which is a string;
X and Name are variables
...
, tn )
where p is the name of a relation with n attributes, and t1 , t2 ,
...
A negative literal has the form
not p(t1 , t2 ,
...
Here is an example of a literal:
account(A, “Perryridge”, B)
Literals involving arithmetic operations are treated specially
...
But what does this notation mean for arithmetic operations such as “>”? The relation > (conceptually) contains tuples of the form (x, y) for every possible pair of
values x, y such that x > y
...
Clearly,
the (conceptual) relation > is infinite
...
For example, A = B + C stands conceptually for +(B, C, A), where the relation + contains every tuple (x, y, z) such that
z = x + y
...
5
...
, vn )
and denotes that the tuple (v1 , v2 ,
...
A set of facts for a relation
can also be written in the usual tabular notation
...
Rules are built
out of literals and have the form
p(t1 , t2 ,
...
, Ln
where each Li is a (positive or negative) literal
...
, tn ) is referred
to as the head of the rule, and the rest of the literals in the rule constitute the body of
the rule
...
As mentioned earlier, there may be several rules defining a
relation
...
6 shows a Datalog program that defines the interest on each account in
the Perryridge branch
...
It
uses the relation account and the view relation interest-rate
...
A view relation v1 is said to depend directly on a view relation v2 if v2 is used
in the expression defining v1
...
Relation interest-rate in turn depends
directly on account
...
, in , for some n, such that v1 depends directly on i1 , i1 depends directly on i2 , and so on till in−1 depends on in
...
6, since we have a chain of dependencies from interest
to interest-rate to account, relation interest also depends indirectly on account
...
A view relation v is said to be recursive if it depends on itself
...
Consider the program in Figure 5
...
Here, the view relation empl depends on itself
(because of the second rule), and is therefore recursive
...
6 is nonrecursive
...
interest-rate(A, 5) :– account(A, N , B), B < 10000
...
Figure 5
...
5
...
empl(X, Y ) :– manager(X, Z), empl(Z, Y )
...
7
Recursive Datalog program
...
2
...
For now, we consider only
programs that are nonrecursive
...
2
...
We define the semantics of a program by starting with the semantics of a single rule
...
2
...
1 Semantics of a Rule
A ground instantiation of a rule is the result of replacing each variable in the rule
by some constant
...
Ground instantiations are often
simply called instantiations
...
A rule usually has many possible instantiations
...
Suppose that we are given a rule R,
p(t1 , t2 ,
...
, Ln
and a set of facts I for the relations used in the rule (I can also be thought of as a
database instance)
...
, vn ) :– l1 , l2 ,
...
, vi,ni ) or of the form not qi (vi,1 ,
vi,2 ,
...
We say that the body of rule instantiation R is satisfied in I if
1
...
, vi,ni ) in the body of R , the set of facts I
contains the fact qi (vi,1 ,
...
2
...
, vj,nj ) in the body of R , the set of facts
I does not contain the fact qj (vj,1 ,
...
account-number
A-201
A-217
Figure 5
...
We define the set of facts that can be inferred from a given set of facts I using rule
R as
infer(R, I) = {p(t1 ,
...
, tni ) is the head of R , and
the body of R is satisfied in I}
...
, Rn }, we define
infer(R, I) = infer(R1 , I) ∪ infer (R2 , I) ∪
...
4
...
The fact account(“A-217”, “Perryridge”, 750) is in the set of facts I
...
Hence, the
body of the rule instantiation is satisfied in I
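The definition of infer can be made concrete with a small sketch. The Python below is one possible rendering for the single rule v1(A, B) :– account(A, “Perryridge”, B), B > 700, assuming facts are stored as tuples whose first element is the relation name. Apart from the fact account(“A-217”, “Perryridge”, 750), the sample facts are assumptions chosen for illustration.

```python
def infer_v1(facts):
    """Return the facts inferable from `facts` using the rule
    v1(A, B) :- account(A, "Perryridge", B), B > 700."""
    inferred = set()
    for fact in facts:
        if fact[0] != "account":
            continue
        _, a, n, b = fact
        # The body is satisfied when the account fact is in I and B > 700 holds.
        if n == "Perryridge" and b > 700:
            inferred.add(("v1", a, b))
    return inferred


# A small set of facts I; only the A-217 fact is taken from the text.
I = {
    ("account", "A-217", "Perryridge", 750),
    ("account", "A-201", "Perryridge", 900),
    ("account", "A-102", "Perryridge", 400),
    ("account", "A-101", "Downtown", 500),
}

V1 = infer_v1(I)
```

Rather than enumerating every ground instantiation, the sketch only tries instantiations built from facts already in I, which yields the same result for this rule.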
...
8
...
2
...
2 Semantics of a Program
When a view relation is defined in terms of another view relation, the set of facts in
the first view depends on the set of facts in the second one
...
Hence, we can layer the view relations in the following way,
and can use the layering to define the semantics of the program:
• A relation is in layer 1 if all relations used in the bodies of rules defining it are
stored in the database
...
• In general, a relation p is in layer i + 1 if (1) it is not in layers 1, 2,
...
, i
...
6
...
9
...
Relation interest-rate is
5
...
9
account
Layering of view relations
...
Relation perryridge-account is similarly in layer 1
...
We can now define the semantics of a Datalog program in terms of the layering of
view relations
...
, n
...
• We define I0 to be the set of facts stored in the database, and define I1 as
I1 = I0 ∪ infer (R1 , I0 )
• We proceed in a similar fashion, defining I2 in terms of I1 and R2 , and so on,
using the following definition:
Ii+1 = Ii ∪ infer (Ri+1 , Ii )
• Finally, the set of facts in the view relations defined by the program (also called
the semantics of the program) is given by the set of facts In corresponding to
the highest layer n
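The layer-by-layer definition above can be sketched as a short loop. The Python below is a minimal sketch, not the book's procedure: it hard-codes one inference function per layer for the relations named in the text (interest-rate and perryridge-account in layer 1, interest in layer 2). The interest formula (balance times rate divided by 100) and the sample facts are assumptions made for the example.

```python
def infer_layer1(facts):
    """Rules for interest-rate and perryridge-account (both use only account)."""
    out = set()
    for f in facts:
        if f[0] == "account":
            _, a, n, b = f
            out.add(("interest-rate", a, 5 if b < 10000 else 6))
            if n == "Perryridge":
                out.add(("perryridge-account", a, b))
    return out


def infer_layer2(facts):
    """Rule for interest, which uses the two layer-1 view relations.
    Assumed formula: interest = balance * rate / 100."""
    rates = {f[1]: f[2] for f in facts if f[0] == "interest-rate"}
    out = set()
    for f in facts:
        if f[0] == "perryridge-account" and f[1] in rates:
            _, a, b = f
            out.add(("interest", a, b * rates[a] / 100))
    return out


def evaluate(i0, layers):
    """Compute I_{i+1} = I_i U infer(R_{i+1}, I_i) for each layer in order."""
    facts = set(i0)
    for infer in layers:
        facts = facts | infer(facts)
    return facts


# Illustrative database facts (I0).
I0 = {
    ("account", "A-217", "Perryridge", 750),
    ("account", "A-101", "Downtown", 500),
}
I2 = evaluate(I0, [infer_layer1, infer_layer2])
```

Because each layer only reads relations from lower layers, a single pass per layer is enough for a nonrecursive program.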
...
6, I0 is the set of facts in the database, and I1 is the set
of facts in the database along with all facts that we can infer from I0 using the rules for
relations interest-rate and perryridge-account
...
The semantics of the program — that is, the set of those facts that are
in each of the view relations— is defined as the set of facts I2
...
5
...
View expansion
can be used with nonrecursive Datalog views as well; conversely, the layering technique described here can also be used with relational-algebra views
...
5
...
2
...
Consider the
rule
gt(X, Y ) :– X > Y
Since the relation defining > is infinite, this rule would generate an infinite number
of facts for the relation gt, and computing them would correspondingly take an infinite
amount of time and space
...
Consider the rule:
not-in-loan(L, B, A) :– not loan(L, B, A)
The idea is that a tuple (loan-number, branch-name, amount) is in view relation not-in-loan if the tuple is not present in the loan relation
...
Finally, if we have a variable in the head that does not appear in the body, we may
get an infinite number of facts where the variable is instantiated to different values
...
Every variable that appears in the head of the rule also appears in a nonarithmetic positive literal in the body of the rule
...
Every variable appearing in a negative literal in the body of the rule also appears in some positive literal in the body of the rule
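The two conditions can be checked mechanically. The following Python sketch assumes a simple, hypothetical rule encoding: the head is a list of variable names, and the body is a list of (kind, variables) pairs with kind being "pos" (nonarithmetic positive literal), "neg" (negative literal), or "arith" (arithmetic literal). It takes the stricter reading in which only nonarithmetic positive literals count for both conditions.

```python
def is_safe(head_vars, body):
    """Check the two safety conditions for one rule.

    head_vars: variable names in the rule head.
    body: list of (kind, variables), kind in {"pos", "neg", "arith"}.
    """
    # Variables bound by nonarithmetic positive literals.
    bound = set()
    for kind, variables in body:
        if kind == "pos":
            bound |= set(variables)
    # Condition 1: every head variable is bound by such a literal.
    if not set(head_vars) <= bound:
        return False
    # Condition 2: every variable in a negative literal is bound as well.
    for kind, variables in body:
        if kind == "neg" and not set(variables) <= bound:
            return False
    return True


# gt(X, Y) :- X > Y
GT_SAFE = is_safe(["X", "Y"], [("arith", ["X", "Y"])])
# not-in-loan(L, B, A) :- not loan(L, B, A)
NOT_IN_LOAN_SAFE = is_safe(["L", "B", "A"], [("neg", ["L", "B", "A"])])
# v1(A, B) :- account(A, N, B), B > 700
V1_SAFE = is_safe(["A", "B"], [("pos", ["A", "N", "B"]), ("arith", ["B"])])
```

Both problematic rules from the text are rejected, while the v1 rule passes because all of its head variables occur in the account literal.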
...
The conditions can be weakened somewhat to allow variables in the head to appear only in an arithmetic literal in the body
in some cases
...
5
...
5 Relational Operations in Datalog
Nonrecursive Datalog expressions without arithmetic operations are equivalent in
expressive power to expressions using the basic operations in relational algebra (∪, −,
×, σ, Π and ρ)
...
Rather, we shall show
through examples how the various relational-algebra operations can be expressed in
Datalog
...
5
...
We perform
projections simply by using only the required attributes in the head of the rule
...
, Xn , Y1 , Y2 ,
...
, Xn ), r2 (Y1 , Y2 ,
...
, Xn , Y1 , Y2 ,
...
We form the union of two relations r1 and r2 (both of arity n) in this way:
query(X1 , X2 ,
...
, Xn )
query(X1 , X2 ,
...
, Xn )
We form the set difference of two relations r1 and r2 in this way:
query(X1 , X2 ,
...
, Xn ), not r2 (X1 , X2 ,
...
A relation can occur more than once in the rule body, but instead
of renaming to give distinct names to the relation occurrences, we can use different
variable names in the different occurrences
...
We leave this demonstration
as an exercise for you to carry out
...
Certain extensions to Datalog support the extended relational update operations
of insertion, deletion, and update
...
Some systems allow the use of + or − in rule heads to
denote relational insertion and deletion
...
Again, there is no standard syntax for this operation
...
2
...
For example, consider employees in an organization
...
Each manager manages a set of people who report to him or her
...
5
...
10
Datalog-Fixpoint procedure
...
Thus employees may be organized in a structure similar to a
tree
...
Suppose now that we want to find out which employees are supervised, directly
or indirectly by a given manager — say, Jones
...
People often write programs to manipulate tree data structures by recursion
...
The
people supervised by Jones are (1) people whose manager is Jones and (2) people
whose manager is supervised by Jones
...
We can encode the preceding recursive definition as a recursive Datalog view,
called empl-jones:
empl-jones(X) :– manager(X, “Jones” )
empl-jones(X) :– manager(X, Y ), empl-jones(Y )
The first rule corresponds to case (1); the second rule corresponds to case (2)
...
We assume that recursive Datalog programs contain no
rules with negative literals
...
The bibliographical
employee-name
Alon
Barinsky
Corbin
Duarte
Estovar
Jones
Rensal
Figure 5
...
5
...
12
Datalog
213
Tuples in empl-jones
(Duarte), (Estovar)
(Duarte), (Estovar), (Barinsky), (Corbin)
(Duarte), (Estovar), (Barinsky), (Corbin), (Alon)
(Duarte), (Estovar), (Barinsky), (Corbin), (Alon)
Employees of Jones in iterations of procedure Datalog-Fixpoint
...
The view relations of a recursive program that contains a set of rules R are defined
to contain exactly the set of facts I computed by the iterative procedure Datalog-Fixpoint in Figure 5
...
The recursion in the Datalog program has been turned into
an iteration in the procedure
...
Consider the program defining empl-jones, with the relation manager, as in Figure 5
...
The set of facts computed for the view relation empl-jones in each iteration
appears in Figure 5
...
In each iteration, the program computes one more level of
employees under Jones and adds it to the set empl-jones
...
Such a termination point must be reached, since the set of managers and
employees is finite
...
You should verify that, at the end of the iteration, the view relation empl-jones
contains exactly those employees who work under Jones
...
Iteration starts with a set of facts I set to the facts in the
database
...
1 Next, the set of rules R in the given Datalog program is used to infer
what facts are true, given that facts in I are true
...
This process is repeated until
no new facts can be inferred
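The iteration just described can be sketched in a few lines of Python for the empl-jones program. This is an illustrative sketch rather than the procedure from the figure: the manager pairs below are assumptions (only the employee names appear in this excerpt), and the function names are hypothetical.

```python
# Assumed manager relation as (employee, manager) pairs.
MANAGER = {
    ("Alon", "Barinsky"),
    ("Barinsky", "Estovar"),
    ("Corbin", "Duarte"),
    ("Duarte", "Jones"),
    ("Estovar", "Jones"),
    ("Jones", "Klinger"),
    ("Rensal", "Klinger"),
}


def infer_empl_jones(manager, empl_jones):
    """Apply both empl-jones rules once to the current set of facts."""
    # empl-jones(X) :- manager(X, "Jones")
    new = {x for x, y in manager if y == "Jones"}
    # empl-jones(X) :- manager(X, Y), empl-jones(Y)
    new |= {x for x, y in manager if y in empl_jones}
    return new


def datalog_fixpoint(manager):
    """Repeat inference until no new facts appear; also record each iteration."""
    facts = set()
    history = [set()]
    while True:
        updated = facts | infer_empl_jones(manager, facts)
        if updated == facts:
            return facts, history
        facts = updated
        history.append(set(facts))


EMPL_JONES, HISTORY = datalog_fixpoint(MANAGER)
```

With this data the recorded iterations grow by one level of the hierarchy each time, and the loop stops when an iteration adds nothing.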
...
At this point, then, we
have the final set of true facts
...
1
...
Thus, in the
Datalog sense of “fact,” a fact may be true (the tuple is indeed in the relation) or false (the tuple is not in
the relation)
...
5
...
Recall that when we make an inference by using a ground instantiation
of a rule, for each negative literal not q in the rule body we check that q is not present
in the set of facts I
...
However, in
the fixed-point iteration, the set of facts I grows in each iteration, and even if q is
not present in I at one iteration, it may appear in I later
...
We require that a recursive program should not contain
negative literals, in order to avoid such problems
...
7):
empl(X, Y ) :– manager(X, Y )
empl(X, Y ) :– manager(X, Z), empl(Z, Y )
To find the direct and indirect subordinates of Jones, we simply use the query
? empl(X, “Jones”)
which gives the same set of values for X as the view empl-jones
...
The view empl defined previously is called the transitive closure of the relation
manager
...
5
...
7 The Power of Recursion
Datalog with recursion has more expressive power than Datalog without recursion
...
For example, we cannot express transitive
closure in Datalog without using recursion (or for that matter, in SQL or QBE without
recursion)
...
Intuitively, a fixed
number of joins can find only those employees that are some (other) fixed number of
levels down from any manager (we will not attempt to prove this result here)
...
If the number of levels of employees
in the manager relation is more than the limit of the query, the query will miss some
levels of employees
...
An alternative to recursion is to use an external mechanism, such as embedded
SQL, to iterate on a nonrecursive query
...
10
...
However, writing such queries by iter-
5
...
The expressive power provided by recursion must be used with care
...
The second rule of the program does not satisfy the safety
condition in Section 5
...
4
...
For such
programs, tuples in view relations can contain only constants from the database, and
hence the view relations must be finite
...
5
...
8 Recursion in Other Languages
The SQL:1999 standard supports a limited form of recursion, using the with recursive
clause
...
We can find every
pair (X, Y ) such that X is directly or indirectly managed by Y , using this SQL:1999
query:
with recursive empl(emp, mgr) as (
select emp, mgr
from manager
union
select emp, empl
...
mgr = empl
...
The additional keyword recursive
specifies that the view is recursive
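A recursive view of this kind can be tried out with Python's built-in sqlite3 module, since SQLite accepts with recursive. The sketch below is an assumption-laden illustration: the recursive branch is written in a form SQLite accepts (with qualified column names), the manager pairs are sample data, and the helper name `subordinates` is hypothetical.

```python
import sqlite3

# Assumed manager relation as (emp, mgr) pairs.
PAIRS = [
    ("Alon", "Barinsky"),
    ("Barinsky", "Estovar"),
    ("Corbin", "Duarte"),
    ("Duarte", "Jones"),
    ("Estovar", "Jones"),
    ("Jones", "Klinger"),
    ("Rensal", "Klinger"),
]


def subordinates(pairs, boss):
    """Return everyone directly or indirectly managed by `boss`, sorted by name."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table manager (emp text, mgr text)")
    conn.executemany("insert into manager values (?, ?)", pairs)
    rows = conn.execute(
        """
        with recursive empl(emp, mgr) as (
            select emp, mgr from manager
            union
            select manager.emp, empl.mgr
            from manager, empl
            where manager.mgr = empl.emp
        )
        select emp from empl where mgr = ? order by emp
        """,
        (boss,),
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]


UNDER_JONES = subordinates(PAIRS, "Jones")
```

Using union rather than union all removes duplicate pairs, which keeps the recursion from repeating work on the same (emp, mgr) pair.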
...
2
...
The procedure Datalog-Fixpoint iteratively uses the function infer(R, I) to compute what facts are true, given a recursive Datalog program
...
Regardless of the language used to define a view V, the view can be thought of as being defined by an
expression EV that, given a set of facts I, returns a set of facts EV (I) for the view relation V
...
5
...
The preceding function has the same form
as the infer function for Datalog
...
Similarly, the function infer is said to be monotonic if
I1 ⊆ I2 ⇒ infer(R, I1 ) ⊆ infer(R, I2 )
Thus, if infer is monotonic, given a set of facts I0 that is a subset of the true facts, we
can be sure that all facts in infer(R, I0 ) are also true
...
2
...
Relational-algebra expressions that use only the operators Π, σ, ×, ⋈, ∪, ∩, or ρ are
monotonic
...
However, relational expressions that use the operator − are not monotonic
...
Let
I1 = { manager 1 (“Alon”, “Barinsky”), manager 1 (“Barinsky”, “Estovar”),
manager 2 (“Alon”, “Barinsky”) }
and let
I2 = { manager 1 (“Alon”, “Barinsky”), manager 1 (“Barinsky”, “Estovar”),
manager 2 (“Alon”, “Barinsky”), manager 2 (“Barinsky”, “Estovar”)}
Consider the expression manager 1 − manager 2
...
But I1 ⊆ I2 ; hence, the expression is not monotonic
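The counterexample can be checked directly with Python sets. In this sketch, facts are encoded as tuples whose first element is the relation name; that encoding and the function name `difference` are assumptions made for the illustration.

```python
def difference(facts):
    """Evaluate manager1 - manager2 over a set of facts."""
    m1 = {f[1:] for f in facts if f[0] == "manager1"}
    m2 = {f[1:] for f in facts if f[0] == "manager2"}
    return m1 - m2


I1 = {
    ("manager1", "Alon", "Barinsky"),
    ("manager1", "Barinsky", "Estovar"),
    ("manager2", "Alon", "Barinsky"),
}
I2 = I1 | {("manager2", "Barinsky", "Estovar")}

RESULT_1 = difference(I1)  # contains ("Barinsky", "Estovar")
RESULT_2 = difference(I2)  # empty
```

Adding a fact to the input removed a tuple from the result, which is exactly what monotonicity rules out.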
...
The fixed-point technique does not work on recursive views defined with nonmonotonic expressions
...
Such relationships define what subparts make up each part
...
An example of an aggregate query on such a structure
would be to compute the total number of subparts of each part
...
The bibliographic notes provide references
to research on defining such views
...
For
example, extended relational operations have been proposed to define transitive closure, and extensions to the SQL syntax to specify (generalized) transitive closure have
been proposed
...
5
...
3 User Interfaces and Tools
Although many people interact with databases, few people use a query language to
directly interact with a database system
...
Forms and graphical user interfaces allow users to enter values that complete predefined queries
...
Graphical user interfaces provide
an easy-to-use way to interact with the database system
...
Report generators permit predefined reports to be generated on the current
database contents
...
3
...
It is worth noting that such interfaces use query languages to communicate with
database systems
...
Chapter 22 covers data analysis tools in more detail
...
In this section, we describe the basic concepts, without going
into the details of any particular user interface product
...
3
...
For example, World Wide Web search
engines provide forms that are used to enter key words
...
As a more database-oriented example, you may connect to a university registration system, where you are asked to enter your roll number and password in a
form
...
There may be further links on the Web page that let you
search for courses and find further information about courses such as the syllabus
and the instructor
...
Most database system vendors also provide proprietary
forms interfaces that offer facilities beyond those present in HTML forms
...
Most database system vendors also provide tools that simplify the creation of graphical user interfaces and forms
...
Users can define the type, size, and format of each field in
a form by using the form editor
...
5
...
For instance, the execution of a query to fill in name and address fields may be associated with filling in a roll number field, and execution of an update statement may
be associated with submitting a form
...
2 For example, a constraint on the course number field may check that the
course number typed in by the user corresponds to an actual course
...
Menus that indicate the valid values that can
be entered in a field can help eliminate the possibility of many types of errors
...
5
...
2 Report Generators
Report generators are tools to generate human-readable summary reports from a
database
...
For example, a report may show the
total sales in each of the past two months for each sales region
...
Variables can be used to store parameters such as the
month and the year and to define fields in the report
...
The query definitions can
make use of the parameter values stored in the variables
...
Report-generator systems
provide a variety of facilities for structuring tabular output, such as defining table
and column headers, displaying subtotals for each group in a table, automatically
splitting long tables into multiple pages, and displaying subtotals at the end of each
page
...
13 is an example of a formatted report
...
The Microsoft Office suite provides a convenient way of embedding formatted
query results from a database, such as MS Access, into a document created with a
text editor, such as MS Word
...
A feature
called OLE (Object Linking and Embedding) links the resulting structure into a text
document
...
The name emphasizes that these tools offer a programming paradigm that is different from the imperative programming paradigm offered by third2
...
5
...
Quarterly Sales Report
Period: Jan
...
13
2,100,000
A formatted report
...
However, this term is less
relevant today, since forms and report generators are typically created with graphical
tools, rather than with programming languages
...
4 Summary
• We have considered two query languages: QBE and Datalog
...
• QBE and its variants have become popular with nonexpert database users because of the intuitive simplicity of the visual paradigm
...
• Datalog is derived from Prolog, but unlike Prolog, it has a declarative semantics, making simple queries easier to write and query evaluation easier to optimize
...
However, no
accepted standards exist for important features, such as grouping and aggregation, in Datalog
...
• Most users interact with databases via forms and graphical user interfaces,
and there are numerous tools to simplify the construction of such interfaces
...
Review Terms
• Query-by-Example (QBE)
• Two-dimensional syntax
• Skeleton tables
• Example rows
• Condition box
• Result relation
• Microsoft Access
• Graphical Query-By-Example (GQBE)
• Design grid
• Datalog
• Rules
• Uses
• Defines
• Positive literal
• Negative literal
• Fact
• Rule
Head
Body
• Datalog program
• Depend on
Directly
Indirectly
• Recursive view
• Nonrecursive view
• Instantiation
Ground instantiation
Satisfied
• Infer
• Semantics
Of a rule
Of a program
• Safety
• Fixed point
• Transitive closure
• Monotonic view definition
• Forms
• Graphical user interfaces
• Report generators
Exercises
5
...
14, where the primary keys are underlined
...
a
...
b
...
c
...
d
...
”
e
...
5
...
15
...
Find the names of all employees who work for First Bank Corporation
...
Find the names and cities of residence of all employees who work for First
Bank Corporation
...
5
...
14
Insurance database
...
Find the names, street addresses, and cities of residence of all employees
who work for First Bank Corporation and earn more than $10,000 per annum
...
Find all employees who live in the same city in which the company for which they
work is located
...
Find all employees who live in the same city and on the same street as their
managers
...
Find all employees in the database who do not work for First Bank Corporation
...
Find all employees who earn more than every employee of Small Bank Corporation
...
Assume that the companies may be located in several cities
...
5
...
15
...
Give expressions in QBE for each of the following queries:
a
...
b
...
c
...
d
...
5
...
15
...
b
...
d
...
Give all employees of First Bank Corporation a 10 percent raise
...
Give all managers in the database a 10 percent raise, unless the salary would
be greater than $100,000
...
employee (person-name, street, city)
works (person-name, company-name, salary)
company (company-name, city)
manages (person-name, manager-name)
Figure 5
...
e
...
5
...
Give expressions in QBE and Datalog equivalent to each of the following queries:
a
...
c
...
ΠA (r)
σB = 17 (r)
r × s
ΠA,F (σC = D (r × s))
5
...
Give expressions in QBE and Datalog equivalent to each of the following queries:
a
...
c
...
r1 ∪ r2
r1 ∩ r2
r1 − r2
ΠAB (r1 ) ⋈ ΠBC (r2 )
5
...
Write expressions in QBE and Datalog for each of the following queries:
a
...
{< a, b, c > | < a, b > ∈ r ∧ < a, c > ∈ s}
c
...
8 Consider the relational database of Figure 5
...
Write a Datalog program for
each of the following queries:
a
...
b
...
c
...
d
...
5
...
5
...
Bibliographical Notes
The experimental version of Query-by-Example is described in Zloof [1977]; the commercial version is described in IBM [1978]
...
Examples are Microsoft Access and Borland Paradox
...
[1993]), and Coral
(described in Ramakrishnan et al
...
[1993])
...
[1984]
...
Ramakrishnan and Ullman [1995] provides a more recent survey on deductive databases
...
Chandra and Harel [1982] and Apt and Pugin [1987] discuss stratified negation
...
[1992a]
...
IBM DB2 QMF and Borland Paradox also support QBE
...
cs
...
edu/coral)
...
sourceforge
...
6
Integrity and Security
Integrity constraints ensure that changes made to the database by authorized users
do not result in a loss of data consistency
...
We have already seen two forms of integrity constraints for the E-R model in Chapter 2:
• Key declarations — the stipulation that certain attributes form a candidate key
for a given entity set
...
In general, an integrity constraint can be an arbitrary predicate pertaining to the
database
...
Thus, we concentrate
on integrity constraints that can be tested with minimal overhead
...
1 and 6
...
3
...
In Section 6
...
Triggers are used
to ensure some types of integrity
...
In Sections 6
...
7, we examine ways in which data
may be misused or intentionally made inconsistent, and present security mechanisms
to guard against such occurrences
...
1 Domain Constraints
We have seen that a domain of possible values must be associated with every attribute
...
6
...
Declaring an attribute to
be of a particular domain acts as a constraint on the values that it can take
...
They are tested easily by the system whenever a new data item is entered into the database
...
For example, the attributes customer-name and employee-name might have the same domain: the set of all
person names
...
It is perhaps less clear whether customer-name and branch-name should have
the same domain
...
However, we would normally not consider the query
“Find all customers who have the same name as a branch” to be a meaningful query
...
From the above discussion, we can see that a proper definition of domain constraints not only allows us to test values inserted in the database, but also permits
us to test queries to ensure that the comparisons made make sense
...
Strongly typed programming languages allow the compiler to check the
program in greater detail
...
For example, the
statements:
create domain Dollars numeric(12,2)
create domain Pounds numeric(12,2)
define the domains Dollars and Pounds to be decimal numbers with a total of 12 digits,
two of which are placed after the decimal point
...
Such an assignment is likely to be due to a programmer error,
where the programmer forgot about the differences in currency
...
Values of one domain can be cast (that is, converted) to another domain
...
A as Pounds
In a real application we would of course multiply r
...
SQL also provides drop domain and alter domain clauses
to drop or modify domains that have been created earlier
...
Specifically, the check
clause permits the schema designer to specify a predicate that must be satisfied by
any value assigned to a variable whose type is the domain
...
create domain HourlyWage numeric(5,2)
constraint wage-value-test check(value >= 4
...
00
...
The name is used to indicate which constraint
an update violated
...
However, in general, the check conditions can be more complex (and
harder to check), since subqueries that refer to other relations are permitted in the
check condition
...
Thus, the condition has to be
checked not only when a tuple is inserted or modified in deposit, but also when the
relation branch changes (in this case, when a tuple is deleted or modified in relation
branch)
...
We discuss such constraints, along with a simpler way
of specifying them in SQL, in Section 6
...
Complex check conditions can be useful when we want to ensure integrity of data,
but we should use them with care, since they may be costly to test
...
2 Referential Integrity
Often, we wish to ensure that a value that appears in one relation for a given set of
attributes also appears for a certain set of attributes in another relation
...
6
...
1 Basic Concepts
Consider a pair of relations r(R) and s(S), and the natural join r ⋈ s
...
That is, there is no ts in s such that
tr [R ∩ S] = ts [R ∩ S]
...
Depending on the entity
set or relationship set being modeled, dangling tuples may or may not be acceptable
...
3
...
Here, our concern is not with queries, but
rather with when we should permit dangling tuples to exist in the database
...
This
situation would be undesirable
...
Therefore, tuple t1 would refer to an account at a branch that does not exist
...
Not all instances of dangling tuples are undesirable, however
...
In this case, a branch exists that
has no accounts
...
Thus, we do not want to prohibit this situation
...
• The attribute branch-name in Branch-schema is not a foreign key
...
1
...
)
In the Lunartown example, tuple t1 in account has a value on the foreign key
branch-name that does not appear in branch
...
Thus, the distinction between our two examples of dangling tuples
is the presence of a foreign key
...
We
say that a subset α of R2 is a foreign key referencing K1 in relation r1 if it is required
that, for every t2 in r2 , there must be a tuple t1 in r1 such that t1 [K1 ] = t2 [α]
...
The latter term arises because the preceding referential-integrity constraint
can be written as Πα (r2 ) ⊆ ΠK1 (r1 )
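The subset formulation translates directly into a check over two relations. The Python sketch below assumes relations are lists of dictionaries keyed by attribute name; the function name `satisfies_foreign_key` and the sample account numbers are hypothetical, while the Lunartown branch name echoes the dangling-tuple example.

```python
def satisfies_foreign_key(r2, alpha, r1, k1):
    """Return True if the projection of r2 on alpha is a subset of
    the projection of r1 on k1."""
    referenced = {tuple(t[a] for a in k1) for t in r1}
    return all(tuple(t[a] for a in alpha) in referenced for t in r2)


# Illustrative relations: branch is referenced, account is referencing.
BRANCH = [{"branch-name": "Perryridge"}, {"branch-name": "Downtown"}]
ACCOUNT_OK = [{"account-number": "A-102", "branch-name": "Perryridge"}]
ACCOUNT_DANGLING = ACCOUNT_OK + [
    {"account-number": "A-999", "branch-name": "Lunartown"}
]

OK = satisfies_foreign_key(ACCOUNT_OK, ["branch-name"], BRANCH, ["branch-name"])
BROKEN = satisfies_foreign_key(
    ACCOUNT_DANGLING, ["branch-name"], BRANCH, ["branch-name"]
)
```

A branch with no accounts (Downtown here) does not violate the constraint, because the subset condition only runs from the referencing relation to the referenced one.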
...
6
...
2 Referential Integrity and the E-R Model
Referential-integrity constraints arise frequently
...
E1
E2
...
...
1
An n-ary relationship set
...
Figure 6
...
, En
...
The attributes of the relation schema for relationship set R
include K1 ∪ K2 ∪ · · · ∪ Kn
...
Recall from
Chapter 2 that the relation schema for a weak entity set must include the primary
key of the entity set on which the weak entity set depends
...
6
...
3 Database Modification
Database modifications can cause violations of referential integrity
...
If a tuple t2 is inserted into r2 , the system must ensure that there is a
tuple t1 in r1 such that t1 [K] = t2 [α]
...
If a tuple t1 is deleted from r1 , the system must compute the set of
tuples in r2 that reference t1 :
σα = t1 [K] (r2 )
If this set is not empty, either the delete command is rejected as an error, or the
tuples that reference t1 must themselves be deleted
...
• Update
...
If a tuple t2 is updated in relation r2 , and the update modifies values for
the foreign key α, then a test similar to the insert case is made
...
The system must ensure that
t2 [α] ∈ ΠK (r1 )
If a tuple t1 is updated in r1 , and the update modifies values for the primary key (K), then a test similar to the delete case is made
...
If this set
is not empty, the update is rejected as an error, or the update is cascaded
in a manner similar to delete
...
2
...
We illustrate foreign-key declarations by using the SQL DDL definition of part of our bank database, shown in Figure 6
...
By default, a foreign key references the primary key attributes of the referenced
table
...
The specified list of attributes must
be declared as a candidate key of the referenced relation
...
However, a foreign key clause can specify that
if a delete or update action on the referenced relation violates the constraint, then,
instead of rejecting the action, the system must take steps to change the tuple in the
referencing relation to restore the constraint
...
foreign key (branch-name) references branch
on delete cascade
on update cascade,
...
Instead, the delete “cascades” to the
6
...
2
SQL data definition for part of the bank database
...
Similarly, the system does not reject an update to a field referenced by the constraint if it
violates the constraint; instead, the system updates the field branch-name in the referencing tuples in account to the new value as well
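Cascading behavior can be observed with Python's built-in sqlite3 module. This is a hedged sketch, not the figure's schema: the tables are cut down to the columns needed, hyphens in names are replaced by underscores to suit SQLite, the rows are sample data, and foreign-key enforcement has to be switched on explicitly in SQLite with a pragma.

```python
import sqlite3


def demo_cascade():
    """Update and delete referenced branch rows; return the remaining accounts."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is on.
    conn.execute("pragma foreign_keys = on")
    conn.execute("create table branch (branch_name text primary key)")
    conn.execute(
        """
        create table account (
            account_number text primary key,
            branch_name text,
            foreign key (branch_name) references branch
                on delete cascade
                on update cascade
        )
        """
    )
    conn.executemany(
        "insert into branch values (?)", [("Perryridge",), ("Downtown",)]
    )
    conn.executemany(
        "insert into account values (?, ?)",
        [("A-102", "Perryridge"), ("A-101", "Downtown")],
    )
    # The rename cascades to the referencing account tuple.
    conn.execute(
        "update branch set branch_name = 'Perry' where branch_name = 'Perryridge'"
    )
    # The delete cascades and removes the Downtown account.
    conn.execute("delete from branch where branch_name = 'Downtown'")
    rows = conn.execute(
        "select account_number, branch_name from account order by account_number"
    ).fetchall()
    conn.close()
    return rows


REMAINING = demo_cascade()
```

Without the two cascade clauses, both the update and the delete would be rejected as constraint violations instead.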
...
If there is a chain of foreign-key dependencies across multiple relations, a deletion
or update at one end of the chain can propagate across the entire chain
...
4
...
As a result, all the changes caused by the transaction and its cascading actions
are undone
...
Attributes of foreign keys are allowed to be null, provided that they have not otherwise been declared to be non-null
...
If
any of the foreign-key columns is null, the tuple is defined automatically to satisfy
the constraint
...
To avoid such complexity, it is best to ensure that all columns of a foreign
key specification are declared to be non-null
...
For instance, suppose we have a relation marriedperson with primary key name, and an attribute spouse, and suppose that spouse is a foreign key on marriedperson
...
Suppose we wish to note the fact that John and Mary are married to each
other by inserting two tuples, one for John and one for Mary, in the above relation
...
After the second tuple is inserted the foreign
key constraint would hold again
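The marriedperson example can be tried out in SQLite, assuming the foreign-key check is postponed until the transaction commits. SQLite spells this `deferrable initially deferred`; the sketch below uses that clause, and the names Ann and Bob are made up to show a transaction that never restores the constraint.

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode, so the
# transaction boundaries below are exactly the ones written out.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("""
    create table marriedperson (
        name   text primary key,
        spouse text not null,
        foreign key (spouse) references marriedperson (name)
            deferrable initially deferred
    )
""")

conn.execute("begin")
conn.execute("insert into marriedperson values ('John', 'Mary')")  # violated for now
conn.execute("insert into marriedperson values ('Mary', 'John')")  # holds again
conn.execute("commit")  # the deferred check runs here
married = conn.execute("select count(*) from marriedperson").fetchone()[0]

# A transaction that ends with a dangling reference cannot commit.
conn.execute("begin")
conn.execute("insert into marriedperson values ('Ann', 'Bob')")
try:
    conn.execute("commit")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
    conn.execute("rollback")
```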
...
1
6
...
Domain constraints and referential-integrity constraints are special forms
of assertions
...
However,
there are many constraints that we cannot express by using only these special forms
...
• Every loan has at least one customer who maintains an account with a minimum balance of $1000
...
An assertion in SQL takes the form
create assertion <assertion-name> check <predicate>
Here is how the two examples of constraints can be written
...
We can work around the problem in the above example in another way, if the spouse attribute can be
set to null: We set the spouse attributes to null when inserting the tuples for John and Mary, and we update
them later
...
6
...
We write
create assertion sum-constraint check
(not exists (select * from branch
where (select sum(amount) from loan
where loan
...
branch-name)
>= (select sum(balance) from account
where account
...
branch-name)))
create assertion balance-constraint check
(not exists (select * from loan
where not exists ( select *
from borrower, depositor, account
where loan
...
loan-number
and borrower
...
customer-name
and depositor
...
account-number
and account
...
If the assertion is valid,
then any future modification to the database is allowed only if it does not cause that
assertion to be violated
...
Hence, assertions should be used with great
care
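Few systems implement create assertion, so the effect of sum-constraint can only be approximated. The sketch below is one way to do it in SQLite from Python: the assertion's predicate is run as an ordinary query after each modification, and the modification is kept only if no branch violates it. The function names, the underscore attribute names, and the sample rows are assumptions made for illustration.

```python
import sqlite3

def branches_violating_sum_constraint(conn):
    """Branches whose total loan amount is not below their total balance."""
    rows = conn.execute("""
        select branch_name from branch
        where (select sum(amount) from loan
               where loan.branch_name = branch.branch_name)
           >= (select sum(balance) from account
               where account.branch_name = branch.branch_name)
    """)
    return [row[0] for row in rows]

def apply_if_valid(conn, statement):
    """Run one modification; keep it only if the assertion still holds."""
    conn.execute(statement)
    if branches_violating_sum_constraint(conn):
        conn.rollback()
        return False
    conn.commit()
    return True

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table branch  (branch_name text primary key);
    create table loan    (loan_number text primary key, branch_name text, amount integer);
    create table account (account_number text primary key, branch_name text, balance integer);
    insert into branch  values ('Perryridge');
    insert into account values ('A-102', 'Perryridge', 1300);
    insert into loan    values ('L-15', 'Perryridge', 1000);
""")

too_large = apply_if_valid(conn, "insert into loan values ('L-16', 'Perryridge', 500)")
small = apply_if_valid(conn, "insert into loan values ('L-17', 'Perryridge', 200)")
loans = [row[0] for row in conn.execute("select loan_number from loan order by loan_number")]
```

The check rescans every branch on every modification, which gives a feel for the overhead that testing an assertion can impose.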
...
6
...
To design a trigger mechanism, we must meet two
requirements:
1
...
This is broken up into an event that
causes the trigger to be checked and a condition that must be satisfied for trigger execution to proceed
...
Specify the actions to be taken when the trigger executes
...
The database stores triggers just as if they were regular data, so that they are persistent and are accessible to all database operations
...
6
...
1 Need for Triggers
Triggers are useful mechanisms for alerting humans or for starting certain tasks automatically when certain conditions are met
...
The bank
gives this loan a loan number identical to the account number of the overdrawn account
...
Suppose that Jones’ withdrawal
of some money from an account made the account balance negative
...
The actions to be taken are:
• Insert a new tuple s in the loan relation with
s[loan-number] = t[account-number]
s[branch-name] = t[branch-name]
s[amount] = −t[balance]
(Note that, since t[balance] is negative, we negate t[balance] to get the loan
amount — a positive number
...
As another example of the use of triggers, suppose a warehouse wishes to maintain a minimum inventory of each item; when the inventory level of an item falls
below the minimum level, an order should be placed automatically
...
Note that trigger systems cannot usually perform updates outside the database,
and hence in the inventory replenishment example, we cannot use a trigger to directly place an order in the external world
...
We must create a separate permanently running
system process that periodically scans the orders relation and places orders
...
The process would also track deliveries of orders,
and alert managers in case of exceptional conditions such as delays in deliveries
...
4
...
Unfortunately, each database system implemented its
6
...
balance < 0
begin atomic
insert into borrower
(select customer-name, account-number
from depositor
where nrow
...
account-number);
insert into loan values
(nrow
...
branch-name, − nrow
...
account-number = nrow
...
3
Example of SQL:1999 syntax for triggers
...
We outline in Figure 6
...
This trigger definition specifies that the trigger is initiated after any update of the
relation account is executed
...
The referencing new row as clause creates a variable
nrow (called a transition variable), which stores the value of an updated row after
the update
...
balance < 0
...
The
begin atomic
...
The two insert statements with the begin
...
The update statement serves to set the account balance back
to 0 from its earlier negative value
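For comparison, here is a sketch of the same overdraft trigger in SQLite's dialect, driven from Python. SQLite writes `new` where SQL:1999 uses a named transition variable such as nrow, and it has no begin atomic; the table layouts, the underscore attribute names, and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table account   (account_number text primary key, branch_name text, balance integer);
    create table depositor (customer_name text, account_number text);
    create table loan      (loan_number text primary key, branch_name text, amount integer);
    create table borrower  (customer_name text, loan_number text);

    create trigger overdraft_trigger after update on account
    for each row
    when new.balance < 0
    begin
        -- every holder of the overdrawn account becomes a borrower
        insert into borrower
            select customer_name, account_number
            from depositor
            where depositor.account_number = new.account_number;
        -- the loan number is the account number; the amount is the overdraft
        insert into loan values
            (new.account_number, new.branch_name, -new.balance);
        -- reset the balance to zero
        update account set balance = 0
            where account_number = new.account_number;
    end;

    insert into account   values ('A-102', 'Perryridge', 400);
    insert into depositor values ('Jones', 'A-102');
""")

# Jones withdraws 500 from an account holding 400.
conn.execute("update account set balance = balance - 500 where account_number = 'A-102'")
balance = conn.execute("select balance from account").fetchone()[0]
loans = conn.execute("select * from loan").fetchall()
borrowers = conn.execute("select * from borrower").fetchall()
```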
...
For example, the action on delete of an account could be to check if the
holders of the account have any remaining accounts, and if they do not, to
delete them from the depositor relation
...
7)
...
Obviously a trigger cannot directly cause such an action outside the database, but could instead add a tuple to a relation storing addresses to which welcome letters need to be sent
...
• For updates, the trigger can specify columns whose update causes the trigger
to execute
...
• The referencing old row as clause can be used to create a variable storing the
old value of an updated or deleted row
...
• Triggers can be activated before the event (insert/delete/update) instead of
after the event
...
For instance, if we wish not to permit overdrafts, we can create a before
trigger that rolls back the transaction if the new balance is negative
...
We can
define a trigger that replaces the value by the null value
...
create trigger setnull-trigger before update on r
referencing new row as nrow
for each row
when nrow
...
phone-number = null
• Instead of carrying out an action for each affected row, we can carry out a single action for the entire SQL statement that caused the insert/delete/update
...
The clauses referencing old table as or referencing new table as can then
be used to refer to temporary tables (called transition tables) containing all the
affected rows
...
A single SQL statement can then be used to carry out multiple actions on
the basis of the transition tables
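Two of the features listed above, a trigger restricted to one column and a before trigger that refuses overdrafts, can be sketched in SQLite as follows. The trigger name and the sample row are made up, and SQLite's `raise(rollback, ...)` stands in for the rollback action; other systems spell this differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table account (account_number text primary key, balance integer);

    -- fires only when the balance column is updated, and before the change is made
    create trigger no_overdraft before update of balance on account
    for each row
    when new.balance < 0
    begin
        select raise(rollback, 'overdraft not permitted');
    end;

    insert into account values ('A-102', 400);
""")

try:
    conn.execute("update account set balance = balance - 500 where account_number = 'A-102'")
    blocked = False
except sqlite3.DatabaseError:
    blocked = True

balance = conn.execute("select balance from account").fetchone()[0]
```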
...
Relational Databases
6
...
4
Triggers
create trigger reorder-trigger after update of amount on inventory
referencing old row as orow, new row as nrow
for each row
when nrow
...
item = orow
...
level > (select level
from minlevel
where minlevel
...
item)
begin
insert into orders
(select item, amount
from reorder
where reorder
...
item)
end
Figure 6
...
• minlevel(item, level), which notes the minimum amount of the item to be maintained
• reorder(item, amount), which notes the amount of the item to be ordered when
its level falls below the minimum
• orders(item, amount), which notes the amount of the item to be ordered
...
4 for reordering the item
...
If we only check that the
new value after an update is below the minimum level, we may place an order erroneously when the item has already been reordered
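The point about checking the old value as well as the new one can be seen in a small SQLite sketch. The inventory(item, level) layout, the choice to fire on updates of level, and the "at or below the minimum" comparison are assumptions made here; the item name and quantities are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table inventory (item text primary key, level integer);
    create table minlevel  (item text primary key, level integer);
    create table reorder   (item text primary key, amount integer);
    create table orders    (item text, amount integer);

    create trigger reorder_trigger after update of level on inventory
    for each row
    when new.level <= (select level from minlevel where minlevel.item = old.item)
     and old.level >  (select level from minlevel where minlevel.item = old.item)
    begin
        insert into orders
            select item, amount from reorder where reorder.item = old.item;
    end;

    insert into inventory values ('widget', 20);
    insert into minlevel  values ('widget', 10);
    insert into reorder   values ('widget', 50);
""")

# First update crosses the minimum level: one order is placed.
conn.execute("update inventory set level = 8 where item = 'widget'")
# Second update starts below the minimum already: no second order.
conn.execute("update inventory set level = 5 where item = 'widget'")
orders = conn.execute("select * from orders").fetchall()
```

Dropping the condition on `old.level` would place a second order on the second update.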
...
For instance, many database systems do not
implement the before clause, and the keyword on is used instead of after
...
Instead, they may specify transition tables by
using the keywords inserted or deleted
...
5 illustrates how the overdraft trigger would be written in MS-SQLServer
...
6
...
3 When Not to Use Triggers
There are many good uses for triggers, such as those we have just seen in Section 6
...
2,
but some uses are best handled by alternative techniques
...
For instance, they used
triggers on insert/delete/update of an employee relation containing salary and dept attributes to maintain the total salary of each department
...
5
...
6
...
balance < 0
begin
insert into borrower
(select customer-name, account-number
from depositor, inserted
where inserted
...
account-number)
insert into loan values
(inserted
...
branch-name, − inserted
...
account-number = inserted
...
5
Example of trigger in MS-SQL server syntax
easier way to maintain summary data
...
A separate process
copied over the changes to the replica (copy) of the database, and the system executed
the changes on the replica
...
In fact, many trigger applications, including our example overdraft trigger, can be
substituted by “encapsulation” features being introduced in SQL:1999
...
That procedure would in turn check for negative balance, and carry out the actions of the overdraft trigger
...
Triggers should be written with great care, since a trigger error detected at run
time causes the failure of the insert/delete/update statement that set off the trigger
...
In the worst case,
this could even lead to an infinite chain of triggering
...
The insert action then triggers yet another insert action, and so on ad infinitum
...
Triggers are occasionally called rules, or active rules, but should not be confused
with Datalog rules (see Section 5
...
6
...
6
...
5
Security and Authorization
duction of inconsistency that integrity constraints provide
...
We
then present mechanisms to guard against such occurrences
...
5
...
Absolute protection
of the database from malicious abuse is not possible, but the cost to the perpetrator
can be made high enough to deter most if not all attempts to access the database
without proper authority
...
Some database-system users may be authorized to access
only a limited portion of the database
...
It is the responsibility of
the database system to ensure that these authorization restrictions are not violated
...
No matter how secure the database system is, weakness in
operating-system security may serve as a means of unauthorized access to the
database
...
Since almost all database systems allow remote access through terminals or networks, software-level security within the network software is as
important as physical security, both on the Internet and in private networks
...
Sites with computer systems must be physically secured against
armed or surreptitious entry by intruders
...
Users must be authorized carefully to reduce the chance of any user
giving access to an intruder in exchange for a bribe or other favors
...
A weakness at a low level of security (physical or human) allows circumvention of
strict high-level (database) security measures
...
Security at the physical and human levels, although important, is beyond the
scope of this text
...
The file system also provides some degree of protection
...
6
...
Finally, network-level security has gained widespread recognition as the Internet
has evolved from an academic research platform to the basis of international electronic commerce
...
We shall present our discussion of security in terms of the
relational-data model, although the concepts of this chapter are equally applicable to
all data models
...
5
...
For
example,
• Read authorization allows reading, but not modification, of data
...
• Update authorization allows modification, but not deletion, of data
...
We may assign the user all, none, or a combination of these types of authorization
...
• Resource authorization allows the creation of new relations
...
• Drop authorization allows the deletion of relations
...
If a user deletes all tuples of a relation, the relation still exists, but
it is empty
...
We regulate the ability to create new relations through resource authorization
...
Index authorization may appear unnecessary, since the creation or deletion of an
index does not alter data in relations
...
However, indices also consume space, and all database modifications
are required to update indices
...
To allow the database
administrator to regulate the use of system resources, it is necessary to treat index
creation as a privilege
...
The ultimate form of authority is that given to the database administrator
...
This form of authorization is analogous to that of a superuser or operator for an
operating system
...
5
...
A view can hide data that a user does
not need to see
...
Views simplify system usage because they restrict
the user’s attention to the data of interest
...
Thus, a combination of relational-level security and view-level security limits a
user’s access to precisely the data that the user needs
...
This clerk is not authorized to see information regarding specific loans that the customer may have
...
But, if she is to have access to the information
needed, the clerk must be granted access to the view cust-loan, which consists of only
the names of customers and the branches at which they have a loan
...
loan-number = loan
...
However, when the
query processor translates it into a query on the actual relations in the database, it
produces a query on borrower and loan
...
Creation of a view does not require resource authorization
...
She receives only those
privileges that provide no additional authorization beyond those that she already
had
...
If a user creates
a view on which no authorization can be granted, the system will deny the view
creation request
...
6
...
4 Granting of Privileges
A user who has been granted some form of authorization may be allowed to pass
on this authorization to other users
...
Consider, as an example, the granting of update authorization on the loan relation of the bank database
...
The passing of authorization from one user to another
can be represented by an authorization graph
...
The graph includes an edge Ui → Uj if user Ui grants update authorization on loan
to Uj
...
In the sample graph in
Figure 6
...
A user has an authorization if and only if there is a path from the root of the authorization graph (namely, the node representing the database administrator) down to
the node representing the user
...
Since U4 has authorization from U1 , that authorization should be revoked as
well
...
Since the database
administrator did not revoke update authorization on loan from U2 , U5 retains update
authorization on loan
...
A pair of devious users might attempt to defeat the rules for revocation of
authorization by granting authorization to each other, as shown in Figure 6
...
If
the database administrator revokes authorization from U2 , U2 retains authorization
through U3 , as in Figure 6
...
If authorization is revoked subsequently from U3 , U3
appears to retain authorization through U2 , as in Figure 6
...
However, when the
database administrator revokes authorization from U3 , the edges from U3 to U2 and
from U2 to U3 are no longer part of a path starting with the database administrator
...
6
Authorization-grant graph
...
[Figure with three panels, (a), (b), and (c): authorization graphs over the nodes DBA, U1, U2, and U3]
Figure 6
...
We require that all edges in an authorization graph be part of some path originating
with the database administrator
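The rule that every edge must lie on a path from the database administrator can be sketched as a reachability computation. The function names are hypothetical, and the starting edges below are an assumed reading of the devious-users example (the administrator grants to U1, U2, and U3, and U2 and U3 grant to each other).

```python
from collections import deque

def authorized_users(edges, root="DBA"):
    """Return every user reachable from the root of the authorization graph."""
    successors = {}
    for grantor, grantee in edges:
        successors.setdefault(grantor, set()).add(grantee)
    reached, queue = {root}, deque([root])
    while queue:
        for grantee in successors.get(queue.popleft(), ()):
            if grantee not in reached:
                reached.add(grantee)
                queue.append(grantee)
    return reached - {root}

def revoke(edges, grantor, grantee, root="DBA"):
    """Remove one grant, then drop every edge no longer on a path from the root."""
    remaining = {edge for edge in edges if edge != (grantor, grantee)}
    reachable = authorized_users(remaining, root) | {root}
    # An edge lies on a path from the root exactly when its source is reachable.
    return {(a, b) for (a, b) in remaining if a in reachable}

edges = {("DBA", "U1"), ("DBA", "U2"), ("DBA", "U3"), ("U2", "U3"), ("U3", "U2")}

after_first = revoke(edges, "DBA", "U2")         # U2 still authorized through U3
after_second = revoke(after_first, "DBA", "U3")  # the U2/U3 cycle is cut off
```

After the second revocation only the edge from the administrator to U1 survives, so neither U2 nor U3 holds the authorization.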
...
8
...
5
...
Each teller must have the same types
of authorizations to the same set of relations
...
A better scheme would be to specify the authorizations that every teller is to be
given, and to separately identify which database users are tellers
...
When a new person is hired as a teller, a user identifier must be allocated
to him, and he must be identified as a teller
...
The notion of roles captures this scheme
...
Authorizations can be granted to roles, in exactly the same fashion as they are granted
to individual users
...
DBA
U1
Figure 6
...
In our bank database, examples of roles could include teller, branch-manager, auditor, and system-administrator
...
The problem with this scheme
is that it would not be possible to identify exactly which teller carried out a transaction, leading to security risks
...
Any authorization that can be granted to a user can be granted to a role
...
And like other authorizations, a user
may also be granted authorization to grant a particular role to others
...
6
...
6 Audit Trails
Many secure database applications require that an audit trail be maintained
...
The audit trail aids security in several ways
...
The bank could then also use the audit trail to trace all
the updates performed by these persons, in order to find other incorrect or fraudulent
updates
...
However, many database systems provide built-in mechanisms to create audit trails, which
are much more convenient to use
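Where no built-in mechanism is available, an audit trail can be approximated with a trigger. The sketch below, in SQLite from Python, logs every change to an account balance. The table and trigger names are hypothetical, and because SQLite has no notion of a database user, the sketch records what changed and when but not who made the change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table account (account_number text primary key, balance integer);
    create table account_audit (
        account_number text,
        old_balance    integer,
        new_balance    integer,
        changed_at     text
    );

    create trigger account_audit_trigger after update of balance on account
    for each row
    begin
        insert into account_audit
        values (old.account_number, old.balance, new.balance, datetime('now'));
    end;

    insert into account values ('A-102', 400);
""")

conn.execute("update account set balance = 350 where account_number = 'A-102'")
trail = conn.execute(
    "select account_number, old_balance, new_balance from account_audit"
).fetchall()
```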
...
6
...
We describe these mechanisms, as well as their limitations, in this section
...
6
...
The select
privilege corresponds to the read privilege
...
If the relation
to be created includes a foreign key that references attributes of another relation,
the user/role must have been granted references privilege on those attributes
...
6
...
The grant statement is used to confer authorization
...
The following grant statement grants users U1 , U2 , and U3 select authorization on
the account relation:
grant select on account to U1 , U2 , U3
The update authorization may be given either on all attributes of the relation or
on only some
...
If the list of attributes is omitted, the
update privilege will be granted on all attributes of the relation
...
The SQL references privilege is granted on specific attributes in a manner like
that for the update privilege
...
However, recall from Section 6
...
In the preceding example, if U1 creates a foreign key in a relation r referencing the
branch-name attribute of the branch relation, and then inserts a tuple into r pertaining
to the Perryridge branch, it is no longer possible to delete the Perryridge branch from
the branch relation without also modifying relation r
...
The privilege all privileges can be used as a short form for all the allowable privileges
...
SQL also includes a usage privilege that authorizes a user to use a specified
domain (recall that a domain corresponds to the programming-language notion of a
type, and may be user defined)
...
6
...
6
...
grant teller to john
create role manager
grant teller to manager
grant manager to mary
Thus the privileges of a user or a role consist of
• All privileges directly granted to the user/role
• All privileges granted to roles that have been granted to the user/role
Note that there can be a chain of roles; for example, the role employee may be granted
to all tellers
...
Thus, the manager role inherits all privileges granted to the roles employee and teller, in addition to privileges
granted directly to manager
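The two-part definition of a user's or role's privileges amounts to a walk along the chain of role grants. A minimal sketch follows; the function name and the privilege strings are made up for illustration, while the role grants mirror the ones above.

```python
def effective_privileges(name, direct, granted_roles):
    """Collect privileges granted directly or through any chain of roles."""
    privileges, pending, seen = set(), [name], set()
    while pending:
        current = pending.pop()
        if current in seen:
            continue  # guard against roles granted in a cycle
        seen.add(current)
        privileges |= direct.get(current, set())
        pending.extend(granted_roles.get(current, ()))
    return privileges

# Privileges granted directly to each role (illustrative values).
direct = {
    "employee": {"select on branch"},
    "teller": {"select on account"},
    "manager": {"update on account"},
}
# Which roles have been granted to which user or role.
granted_roles = {
    "john": ["teller"],
    "teller": ["employee"],
    "manager": ["teller"],
    "mary": ["manager"],
}

john = effective_privileges("john", direct, granted_roles)
mary = effective_privileges("mary", direct, granted_roles)
```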
...
6
...
If we wish to grant a privilege and to allow the recipient
to pass the privilege on to other users, we append the with grant option clause to the
appropriate grant command
...
It takes a form almost
identical to that of grant:
revoke <privilege list> on <relation name or view name>
from <user/role list> [restrict | cascade]
Thus, to revoke the privileges that we granted previously, we write
6
...
5
...
This behavior is called cascading of the
revoke
...
The revoke
statement may alternatively specify restrict:
revoke select on branch from U1 , U2 , U3 restrict
In this case, the system returns an error if there are any cascading revokes, and does
not carry out the revoke action
...
6
...
The SQL standard specifies a primitive authorization mechanism for the database
schema: Only the owner of the schema can carry out any modification to the schema
...
Several database implementations have more powerful authorization mechanisms for database schemas, similar to those discussed earlier, but these mechanisms are nonstandard
...
6
...
For instance,
suppose you want all students to be able to see their own grades, but not the grades
of anyone else
...
Furthermore, with the growth in the Web, database accesses come primarily from
Web application servers
...
The task of authorization then falls on the application server; the entire authorization scheme of SQL is bypassed
...
The problems
are these:
• The code for checking authorization becomes intermixed with the rest of the
application code
...
6
...
Because of an oversight, one of the application programs may not check for authorization, allowing unauthorized users access to confidential data
...
6
...
In such cases, data may
be stored in encrypted form
...
Encryption also forms the basis of
good schemes for authenticating users to a database
...
7
...
Simple encryption
techniques may not provide adequate security, since it may be easy for an unauthorized user to break the code
...
Thus,
Perryridge
becomes
Qfsszsjehf
If an unauthorized user sees only “Qfsszsjehf,” she probably has insufficient information to break the code
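The substitution that turns Perryridge into Qfsszsjehf replaces each letter with the next one in the alphabet. A small sketch shows the scheme and how few keys it has; the function name is hypothetical, and wrapping from z back to a is an assumption.

```python
def shift_letters(text, shift=1):
    """Replace each letter with the one `shift` places later, wrapping at z."""
    shifted = []
    for ch in text:
        if "a" <= ch <= "z":
            shifted.append(chr((ord(ch) - ord("a") + shift) % 26 + ord("a")))
        elif "A" <= ch <= "Z":
            shifted.append(chr((ord(ch) - ord("A") + shift) % 26 + ord("A")))
        else:
            shifted.append(ch)  # leave digits, spaces, and punctuation alone
    return "".join(shifted)

encrypted = shift_letters("Perryridge")
decrypted = shift_letters(encrypted, -1)

# Only 25 other shifts exist, so every possible key can be tried at once.
candidates = [shift_letters(encrypted, -k) for k in range(1, 26)]
```

An intruder who knows the general method only has to scan these 25 candidates for a readable word.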
...
A good encryption technique has the following properties:
• It is relatively simple for authorized users to encrypt and decrypt data
...
• Its encryption key is extremely difficult for an intruder to determine
...
For this scheme to work, the authorized users must be provided with
the encryption key via a secure mechanism
...
The DES standard was reaffirmed in 1983, 1987,
6
...
However, weakness in DES was recognized in 1993 as reaching a
point where a new standard, to be called the Advanced Encryption Standard (AES),
needed to be selected
...
Rijmen and J
...
The Rijndael algorithm was
chosen for its significantly stronger level of security and its relative ease of implementation on current computer systems as well as such devices as smart cards
...
Public-key encryption is an alternative scheme that avoids some of the problems
that we face with the DES
...
Each
user Ui has a public key Ei and a private key Di
...
Each private key is known to only the one user to whom the
key belongs
...
Decryption requires the private key D1
...
If user U1 wants to share data with U2 , U1 encrypts
the data using E2 , the public key of U2
...
For public-key encryption to work, there must be a scheme for encryption that
can be made public without making it easy for people to figure out the scheme for
decryption
...
Such a scheme does exist and is based on these conditions:
• There is an efficient algorithm for testing whether or not a number is prime
...
For purposes of this scheme, data are treated as a collection of integers
...
The
private key consists of the pair (P1 , P2 )
...
Since all that is published is the product P1 P2 , an unauthorized user would need
to be able to factor P1 P2 to steal data
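RSA is one well-known scheme that fits this description, and we assume here that it is the kind of scheme meant. The sketch below uses toy primes to show the arithmetic; real keys use primes hundreds of digits long, and the exponent 17 and the message 65 are arbitrary choices for illustration.

```python
# Toy RSA with tiny primes; illustrative only, not secure.
p1, p2 = 61, 53
n = p1 * p2                      # the published product P1 * P2
phi = (p1 - 1) * (p2 - 1)        # computable only by someone who knows P1 and P2
e = 17                           # public exponent, chosen coprime to phi
d = pow(e, -1, phi)              # private exponent (modular inverse, Python 3.8+)

def encrypt(m):
    """Anyone can encrypt with the public values (e, n)."""
    return pow(m, e, n)

def decrypt(c):
    """Only the holder of d, derived from P1 and P2, can decrypt."""
    return pow(c, d, n)

message = 65                     # data treated as an integer smaller than n
ciphertext = encrypt(message)
recovered = decrypt(ciphertext)
```

Computing `d` requires `phi`, and `phi` requires the two primes, which is why an intruder who sees only the product would have to factor it.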
...
The details of public-key encryption and the mathematical justification of this technique’s properties are referenced in the bibliographic notes
...
A hybrid scheme used for secure communication is as follows: DES
keys are exchanged via a public-key encryption scheme, and DES encryption is used
on the data transmitted subsequently
...
7
...
The simplest form of authentication consists of a secret password which must be presented when a connection is opened to a database
...
6
...
However, the use of passwords has some drawbacks, especially over a
network
...
Once
the eavesdropper has a user name and password, she can connect to the database,
pretending to be the legitimate user
...
The database system sends a challenge string to the user
...
The database system
can verify the authenticity of the user by decrypting the string with the same secret
password, and checking the result with the original challenge string
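The exchange can be sketched with a keyed hash standing in for encryption. This is a substitution on our part: the user computes an HMAC-SHA256 of the challenge under the secret password, and the database recomputes the expected value rather than decrypting the response. The function names and the sample passwords are hypothetical.

```python
import hashlib
import hmac
import secrets

def make_challenge():
    """Database side: a fresh random challenge for each connection attempt."""
    return secrets.token_bytes(16)

def respond(password, challenge):
    """User side: combine the challenge with the secret password."""
    return hmac.new(password.encode(), challenge, hashlib.sha256).digest()

def verify(stored_password, challenge, response):
    """Database side: recompute the expected response and compare."""
    expected = respond(stored_password, challenge)
    return hmac.compare_digest(expected, response)

challenge = make_challenge()
accepted = verify("s3cret", challenge, respond("s3cret", challenge))
refused = verify("s3cret", challenge, respond("guess", challenge))
```

The password itself never crosses the network, and because each challenge is fresh, a response captured by an eavesdropper is of no use for a later connection.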
...
Public-key systems can be used for encryption in challenge–response systems
...
The user decrypts the string using her private key, and returns
the result to the database system
...
This scheme has the added benefit of not storing the secret password in the database,
where it could potentially be seen by system administrators
...
The private key is used to sign data, and the signed data
can be made public
...
Thus, we can authenticate
the data; that is, we can verify that the data were indeed created by the person who
claims to have created them
...
That is, in
case the person who created the data later claims she did not create it (the electronic
equivalent of claiming not to have signed the check), we can prove that that person
must have created the data (unless her private key was leaked to others)
...
8 Summary
• Integrity constraints ensure that changes made to the database by authorized
users do not result in a loss of data consistency
...
In this chapter, we considered several additional
forms of constraints, and discussed mechanisms for ensuring the maintenance
of these constraints
...
Such constraints may also prohibit the use of null values for
particular attributes
...
• Referential-integrity constraints ensure that a value that appears in one relation for a given set of attributes also appears for a certain set of attributes in
another relation
...
Use of more complex constraints may lead to substantial overhead
...
Assertions are declarative
expressions that state predicates that we require always to be true
...
Triggers have many uses, such
as implementing business rules, audit logging, and even carrying out actions
outside the database system
...
• The data stored in the database need to be protected from unauthorized access, malicious destruction or alteration, and accidental introduction of inconsistency
...
Absolute protection of the database
from malicious abuse is not possible, but the cost to the perpetrator can be
made sufficiently high to deter most, if not all, attempts to access the database
without proper authority
...
Authorization is a means by which the database system can be protected against
malicious or unauthorized access
...
However, we must be careful about how authorization can be passed among users if we are to ensure that such authorization can be revoked at some future time
...
• The various authorization provisions in a database system may not provide
sufficient protection for highly sensitive data
...
Only a user who knows how to decipher (decrypt) the encrypted data
can read them
...
Review Terms
• Domain constraints
• Primary key constraint
• Check clause
• Unique constraint
• Referential integrity
• Foreign key constraint
• Cascade
• Assertion
• Trigger
• Event-condition-action model
• Before and after triggers
• Transition variables and tables
• Database security
• Levels of security
• Authorization
• Privileges
Read
Insert
Update
Delete
Index
Resource
Alteration
Drop
Grant
All privileges
• Authorization graph
• Granting of privileges
• Roles
• Encryption
• Secret-key encryption
• Public-key encryption
• Authentication
• Challenge–response system
• Digital signature
• Nonrepudiation
Exercises
6
...
2 to include
the relations loan and borrower
...
2 Consider the following relational database:
employee (employee-name, street, city)
works (employee-name, company-name, salary)
company (company-name, city)
manages (employee-name, manager-name)
Give an SQL DDL definition of this database
...
6
...
Consider a database that includes the following relations:
salaried-worker (name, office, phone, salary)
hourly-worker (name, hourly-wage)
address (name, street, city)
Suppose that we wish to require that every name that appears in address appear
in either salaried-worker or hourly-worker, but not necessarily in both
...
Propose a syntax for expressing such constraints
...
Discuss the actions that the system must take to enforce a constraint of this
form
...
6
...
4 SQL allows a foreign-key dependency to refer to the same relation, as in the
following example:
create table manager
(employee-name char(20) not null,
manager-name char(20) not null,
primary key (employee-name),
foreign key (manager-name) references manager
on delete cascade )
Here, employee-name is a key to the table manager, meaning that each employee
has at most one manager
...
Explain exactly what happens when a tuple in the relation
manager is deleted
...
5 Suppose there are two relations r and s, such that the foreign key B of r references the primary key A of s
...
6
...
6
...
6
...
account-number = account
...
Write active rules to maintain the view, that is, to keep it up to date on insertions
to and deletions from depositor or account
...
6
...
For each item on your list, state
whether this concern relates to physical security, human security, operating-system security, or database security
...
10 Using the relations of our sample bank database, write an SQL expression to
define the following views:
a
...
b
...
c
...
6
...
10, explain how updates
would be performed (if they should be allowed at all)
...
6
...
In this chapter, we described
the use of views as a security mechanism
...
6
...
14 Database systems that store each relation in a separate operating-system file
may use the operating system’s security and authorization scheme, instead of
defining a special scheme themselves
...
6
...
16 Perhaps the most important data items in any database system are the passwords that control access to the database
...
Be sure that your scheme allows the system to test passwords
supplied by users who are attempting to log into the system
...
The original SQL proposals for assertions and triggers are discussed in Astrahan et al
...
[1976], and Chamberlin
et al
...
See the bibliographic notes of Chapter 4 for references to SQL standards
and books on SQL
...
[1980a], Hsu and Imielinski [1985], McCune and Henschen [1989], and Chomicki
[1992]
...
Sheard and Stemple [1989] discusses this
approach
...
McCarthy and Dayal
[1989] discuss the architecture of an active database system based on the event–
condition–action formalism
...
6
...
[1991]
...
A rule system is said to be confluent if, regardless of the rule chosen,
the final state is the same
...
[1995]
...
of Defense [1985]
...
Stonebraker and Wong [1974] discusses the Ingres approach to security, which involves modification of users’ queries to ensure that users do not access data for which
authorization has not been granted
...
Database systems that can produce incorrect answers when necessary for security
maintenance are discussed in Winslett et al
...
Work on security in relational databases includes that of Stachour and Thuraisingham [1990], Jajodia and Sandhu [1990], and Qian and Lunt [1996]
...
Stallings [1998] provides a textbook description of cryptography
...
The Data Encryption Standard is presented by US Dept
...
Public-key encryption is discussed by Rivest
et al
...
Other discussions on cryptography include Diffie and Hellman [1979],
Simmons [1979], Fernandez et al
...
7
Relational-Database Design
This chapter continues our discussion of design issues in relational databases
...
One approach is to design schemas that are in an
appropriate normal form
...
In this chapter, we introduce the notion
of functional dependencies
...
7
...
A domain is atomic if elements of the domain are considered to be indivisible
units
...
A set of names is an example of a nonatomic value
...
Composite attributes, such as an attribute address with component attributes street
and city, also have nonatomic domains
...
The distinction is that we do not
normally consider integers to have subparts, but we consider sets of integers to have
subparts— namely, the integers making up the set
...
The domain of all integers would be nonatomic if we considered each integer to be
an ordered list of digits
...
Examples of such numbers would be CS0012 and
EE1127
...
If a relation schema had an attribute whose domain consists of identification numbers encoded as above, the schema would not be in first normal form
...
Doing so requires extra programming, and information gets encoded in the application program rather than in the database
...
The use of set-valued attributes can lead to designs with redundant storage of data,
which in turn can result in inconsistencies
...
Whenever an account is created, or the set of
owners of an account is updated, the update has to be performed at two places; failure to perform both updates can leave the database in an inconsistent state
...
Set-valued attributes are also more complicated to write queries with, and
more complicated to reason about
...
Although we have not mentioned first normal form earlier, when
we introduced the relational model in Chapter 3 we stated that attribute values must
be atomic
...
For example, composite-valued attributes are often useful, and set-valued attributes are also useful in many cases, which is why both are supported in the E-R
model
...
There is also a runtime overhead of converting data back and forth from the atomic form
...
In fact, modern database
systems do support many types of nonatomic values, as we will see in Chapters 8
and 9
...
7
...
Among the undesirable properties that a bad design may
have are:
7
...
1 shows an instance of the relation lending (Lending-schema)
...
• t[branch-city] is the city in which the branch named t[branch-name] is located
...
• t[amount] is the amount of the loan whose number is t[loan-number]
...
Say that the loan is made
by the Perryridge branch to Adams in the amount of $1500
...
In our design, we need a tuple with values on all the attributes of Lending-schema
...
1
assets     customer-name   loan-number
9000000    Jones           L-17
2100000    Smith           L-23
1700000    Hayes           L-15
9000000    Jackson         L-14
400000     Jones           L-93
8000000    Turner          L-11
300000     Williams        L-29
3700000    Hayes           L-16
9000000    Johnson         L-18
1700000    Glenn           L-25
7100000    Brooks          L-10
Sample lending relation
...
...
In general, the asset and city data for a branch must appear
once for each loan made by that branch
...
Repeating
information wastes space
...
Suppose, for example, that the assets of the Perryridge branch change from 1700000
to 1900000
...
Under our alternative design, many tuples of the lending relation need to be
changed
...
When we perform the update in the alternative database, we must
ensure that every tuple pertaining to the Perryridge branch is updated, or else our
database will show two different asset values for the Perryridge branch
...
We
know that a bank branch has a unique value of assets, so given a branch name we can
uniquely identify the assets value
...
In other words, we say that the functional dependency
branch-name → assets
holds on Lending-schema, but we do not expect the functional dependency branch-name → loan-number to hold
...
We shall see that we can use functional
dependencies to specify formally when a database design is good
...
This is because tuples in the lending relation require values for loan-number, amount, and customer-name
...
Recall, however, that null values are difficult to handle, as we
saw in Section 3
...
4
...
Worse, we would have to delete this information when all the loans have been paid
...
7
...
A functional dependency is a type of constraint that is a
generalization of the notion of key, as discussed in Chapters 2 and 3
...
3
...
They allow us
to express facts about the enterprise that we are modeling with our database
...
In Chapter 2, we defined the notion of a superkey as follows
...
A subset K of R is a superkey of R if, in any legal relation r(R), for all pairs
t1 and t2 of tuples in r such that t1 ≠ t2 , then t1 [K] ≠ t2 [K]
...
The notion of functional dependency generalizes the notion of superkey
...
The functional dependency
α→β
holds on schema R if, in any legal relation r(R), for all pairs of tuples t1 and t2 in r
such that t1 [α] = t2 [α], it is also the case that t1 [β] = t2 [β]
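
The definition can be checked mechanically on a given relation instance. The following Python sketch is illustrative only: the function name `satisfies` and the sample rows are assumptions made for this example and are not a figure from the text. It reports whether any two tuples agree on the left-hand side but differ on the right-hand side.

```python
def satisfies(rows, lhs, rhs):
    """Return True if the relation instance `rows` satisfies lhs -> rhs.

    rows: list of dicts mapping attribute name -> value.
    lhs, rhs: lists of attribute names.
    """
    seen = {}  # lhs values -> rhs values first seen with them
    for t in rows:
        key = tuple(t[a] for a in lhs)
        val = tuple(t[a] for a in rhs)
        if seen.setdefault(key, val) != val:
            return False  # two tuples agree on lhs but differ on rhs
    return True


# Hypothetical instance, chosen only for illustration.
r = [
    {"A": "a1", "B": "b1", "C": "c1"},
    {"A": "a1", "B": "b2", "C": "c1"},
    {"A": "a2", "B": "b2", "C": "c2"},
    {"A": "a3", "B": "b3", "C": "c2"},
]

print(satisfies(r, ["A"], ["C"]))  # True: equal A values always carry equal C values
print(satisfies(r, ["C"], ["A"]))  # False: c2 appears with both a2 and a3
```

Note that such a check only shows that one instance satisfies a dependency; whether the dependency should hold on the schema is a design decision about all legal relations.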
...
That is, K is a superkey if, whenever t1 [K] = t2 [K], it is also the case that
t1 [R] = t2 [R] (that is, t1 = t2 )
...
Consider the schema
Loan-info-schema = (loan-number, branch-name, customer-name, amount)
which is a simplification of the Lending-schema that we saw earlier
...
We shall use functional dependencies in two ways:
1
...
If a relation r is legal under a set F of functional dependencies,
we say that r satisfies F
...
To specify constraints on the set of legal relations
...
If we wish to constrain ourselves to relations on schema R that satisfy a
set F of functional dependencies, we say that F holds on R
...
2, to see which functional dependencies
are satisfied
...
There are two tuples that have an A
value of a1
...
Similarly, the two tuples with an A value of a2 have the same C value, c2
...
The functional dependency C → A is not
satisfied, however
...
7
...
2
B     C     D
b1    c1    d1
b2    c1    d2
b2    c2    d2
b2    c2    d3
b3    c2    d4
Sample relation r
...
These two tuples have the same C values, c2 , but they have different A values, a2 and a3 , respectively
...
Many other functional dependencies are satisfied by r, including, for example, the
functional dependency AB → D
...
Observe that there is no pair of distinct tuples t1 and
t2 such that t1 [AB] = t2 [AB]
...
So, r satisfies AB → D
...
For example, A → A is satisfied by all relations involving attribute A
...
Similarly, AB → A
is satisfied by all relations involving attribute A
...
To distinguish between the concepts of a relation satisfying a dependency and a
dependency holding on a schema, we return to the banking example
...
3, we see that customer-street
→ customer-city is satisfied
...
3
customer-street    customer-city
Main               Harrison
North              Rye
Main               Harrison
North              Rye
Park               Pittsfield
Putnam             Stamford
Nassau             Princeton
Spring             Pittsfield
Alma               Palo Alto
Sand Hill          Woodside
Senator            Brooklyn
Walnut             Stamford
The customer relation
...
loan-number    branch-name    amount
L-17           Downtown       1000
L-23           Redwood        2000
L-15           Perryridge     1500
L-14           Downtown       1500
L-93           Mianus         500
L-11           Round Hill     900
L-29           Pownal         1200
L-16           North Town     1300
L-18           Downtown       2000
L-25           Perryridge     2500
L-10           Brighton       2200
Figure 7
...
can have streets with the same name
...
So, we would not include customer-street → customer-city in the set of functional
dependencies that hold on Customer-schema
...
4, we see that the dependency loan-number → amount is satisfied
...
Therefore, we want to require
that loan-number → amount be satisfied by the loan relation at all times
...
In the branch relation of Figure 7
...
We want to require that branch-name → assets hold on
Branch-schema
...
In what follows, we assume that, when we design a relational database, we first
list those functional dependencies that must always hold
...
5
branch-city    assets
Brooklyn       9000000
Palo Alto      2100000
Horseneck      1700000
Horseneck      400000
Horseneck      8000000
Bennington     300000
Rye            3700000
Brooklyn       7100000
The branch relation
...
Relational Databases
7
...
3
...
Rather, we
need to consider all functional dependencies that hold
...
We say that such functional dependencies are “logically implied” by F
...
Suppose we are given a relation schema R = (A, B, C, G, H, I) and the set of
functional dependencies
A→B
A→C
CG → H
CG → I
B→H
The functional dependency
A→H
7
...
That is, we can show that, whenever our given set of functional
dependencies holds on a relation, A → H must also hold on the relation
...
But that is exactly the definition of A → H
...
The closure of F, denoted by F + , is the
set of all functional dependencies logically implied by F
...
If F were large, this
process would be lengthy and difficult
...
Axioms, or rules of inference, provide a simpler technique for reasoning about
functional dependencies
...
)
for sets of attributes, and uppercase Roman letters from the beginning of the alphabet
for individual attributes
...
We can use the following three rules to find logically implied functional dependencies
...
This collection
of rules is called Armstrong’s axioms in honor of the person who first proposed it
...
If α is a set of attributes and β ⊆ α, then α → β holds
...
If α → β holds and γ is a set of attributes, then γα → γβ
holds
...
If α → β holds and β → γ holds, then α → γ holds
...
They are complete, because, for a given set F of functional dependencies, they allow us to generate all F +
...
Although Armstrong’s axioms are complete, it is tiresome to use them directly for
the computation of F +
...
It is possible to use Armstrong’s axioms to prove that these rules are correct (see Exercises 7
...
9, and 7
...
• Union rule
...
• Decomposition rule
...
• Pseudotransitivity rule
...
Let us apply our rules to the example of schema R = (A, B, C, G, H, I) and the
set F of functional dependencies {A → B, A → C, CG → H, CG → I, B → H}
...
Since A → B and B → H hold, we apply the transitivity rule
...
• CG → HI
...
• AG → I
...
Another way of finding that AG → I holds is as follows
...
Applying the transitivity rule to
this dependency and CG → I, we infer AG → I
...
6 shows a procedure that demonstrates formally how to use Armstrong’s
axioms to compute F +
...
We will also
see an alternative way of computing F + in Section 7
...
3
...
Since a set of size n has 2^n subsets, and each side of a functional dependency is a subset of R, there are a total of 2^n × 2^n = 2^(2n) possible
functional dependencies, where n is the number of attributes in R
...
Thus, the procedure is guaranteed to terminate
...
3
...
One way of doing this is to compute
F + , take all functional dependencies with α as the left-hand side, and take the union
of the right-hand sides of all such dependencies
...
F+ = F
repeat
    for each functional dependency f in F+
        apply reflexivity and augmentation rules on f
        add the resulting functional dependencies to F+
    for each pair of functional dependencies f1 and f2 in F+
        if f1 and f2 can be combined using transitivity
            add the resulting functional dependency to F+
until F+ does not change any further
Figure 7
...
7
...
Let α be a set of attributes
...
Figure 7
...
The
input is a set F of functional dependencies and the set α of attributes
...
To illustrate how the algorithm works, we shall use it to compute (AG)+ with the
functional dependencies defined in Section 7
...
2
...
The first
time that we execute the while loop to test each functional dependency, we find that
• A → B causes us to include B in result
...
• A → C causes result to become ABCG
...
• CG → I causes result to become ABCGHI
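
The trace above can be reproduced with a short Python sketch of the attribute-closure loop. The function name `closure` and the encoding of each dependency as a pair of strings of single-letter attributes are assumptions made for this illustration; the set F is the one given earlier for R = (A, B, C, G, H, I).

```python
def closure(attrs, fds):
    """Closure of the attribute set `attrs` under the dependencies `fds`.

    fds: list of (lhs, rhs) pairs; each side is a string of
    single-letter attribute names (an encoding assumed for this sketch).
    """
    result = set(attrs)
    changed = True
    while changed:  # repeat until a full pass adds nothing
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


F = [("A", "B"), ("A", "C"), ("CG", "H"), ("CG", "I"), ("B", "H")]
print("".join(sorted(closure("AG", F))))  # ABCGHI
```

Each pass scans every dependency, so this simple version may take time quadratic in the size of F in the worst case.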
...
Let us see why the algorithm of Figure 7
...
The first step is correct, since
α → α always holds (by the reflexivity rule)
...
Since we start the while loop with α → result being true, we can add γ to result
only if β ⊆ result and β → γ
...
Another application of transitivity shows that α → γ (using α → β and
β → γ)
...
Thus, any attribute returned by the algorithm
is in α+
...
If there is an attribute in α+ that
is not yet in result, then there must be a functional dependency β → γ for which β ⊆
result, and at least one attribute in γ is not in result
...
There is a faster (although slightly more complex) algorithm that runs in time linear in the size of F; that algorithm is presented as part of
Exercise 7
...
result := α;
while (changes to result) do
    for each functional dependency β → γ in F do
        begin
            if β ⊆ result then result := result ∪ γ;
        end
Figure 7
...
There are several uses of the attribute closure algorithm:
• To test if α is a superkey, we compute α+ , and check if α+ contains all attributes of R
...
That is, we compute α+ by using attribute
closure, and then check if it contains β
...
• It gives us an alternative way to compute F + : For each γ ⊆ R, we find the
closure γ + , and for each S ⊆ γ + , we output a functional dependency γ → S
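
These uses can be sketched in Python on top of one closure routine. The helper names (`is_superkey`, `implies`, `fd_closure`) and the string encoding of attribute sets are assumptions made for this illustration, not notation from the text.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure; fds is a list of (lhs, rhs) strings of attributes."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_superkey(attrs, schema, fds):
    """attrs is a superkey if its closure contains every attribute of the schema."""
    return closure(attrs, fds) >= set(schema)


def implies(fds, lhs, rhs):
    """lhs -> rhs is in F+ exactly when rhs is contained in the closure of lhs."""
    return set(rhs) <= closure(lhs, fds)


def subsets(attrs):
    """Yield every subset of attrs as a frozenset."""
    items = sorted(attrs)
    for k in range(len(items) + 1):
        for combo in combinations(items, k):
            yield frozenset(combo)


def fd_closure(schema, fds):
    """Enumerate F+ as (lhs, rhs) pairs; exponential in the number of attributes."""
    return {(g, s) for g in subsets(schema) for s in subsets(closure(g, fds))}


F = [("A", "B"), ("A", "C"), ("CG", "H"), ("CG", "I"), ("B", "H")]
print(is_superkey("AG", "ABCGHI", F))  # True
print(implies(F, "A", "H"))            # True
```

The enumeration in `fd_closure` is practical only for very small schemas, which is why the superkey and implication tests are usually applied directly.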
...
3
...
Whenever a user performs an update on the relation, the database system must ensure that
the update does not violate any functional dependencies, that is, all the functional
dependencies in F are satisfied in the new database state
...
We can reduce the effort spent in checking for violations by testing a simplified set
of functional dependencies that has the same closure as the given set
...
However, the simplified set is easier to test
...
First, we need some definitions
...
The formal
definition of extraneous attributes is as follows
...
• Attribute A is extraneous in α if A ∈ α, and F logically implies (F − {α →
β}) ∪ {(α − A) → β}
...
For example, suppose we have the functional dependencies AB → C and A → C
in F
...
As another example, suppose we have the
functional dependencies AB → CD and A → C in F
...
Beware of the direction of the implications when using the definition of extraneous
attributes: If you exchange the left-hand side with right-hand side, the implication
will always hold
...
Here is how we can test efficiently if an attribute is extraneous
...
Consider an attribute A in a dependency α → β
...
To do so, compute α+ (the closure
of α) under F ; if α+ includes A, then A is extraneous in β
...
To do so, compute γ + (the closure of γ) under F ; if γ +
includes all attributes in β, then A is extraneous in α
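
Both tests reduce to attribute-closure computations, as the following Python sketch shows. The function names and the string encoding of dependencies are assumptions made for this illustration. The first example assumes F = {AB → CD, A → E, E → C}, which is consistent with the reduced set quoted in the text; the second uses the dependencies AB → C and A → C mentioned earlier.

```python
def closure(attrs, fds):
    """Attribute closure; fds is a list of (lhs, rhs) strings of attributes."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def extraneous_in_rhs(attr, lhs, rhs, fds):
    """Is attr extraneous on the right side of lhs -> rhs?

    Replace lhs -> rhs by lhs -> (rhs - attr) and check whether the
    closure of lhs under the reduced set still contains attr.
    """
    reduced = [fd for fd in fds if fd != (lhs, rhs)]
    reduced.append((lhs, "".join(a for a in rhs if a != attr)))
    return attr in closure(lhs, reduced)


def extraneous_in_lhs(attr, lhs, rhs, fds):
    """Is attr extraneous on the left side of lhs -> rhs?

    Check whether (lhs - attr) already determines all of rhs under fds.
    """
    gamma = set(lhs) - {attr}
    return set(rhs) <= closure(gamma, fds)


F1 = [("AB", "CD"), ("A", "E"), ("E", "C")]  # assumed full set for the first example
print(extraneous_in_rhs("C", "AB", "CD", F1))  # True

F2 = [("AB", "C"), ("A", "C")]
print(extraneous_in_lhs("B", "AB", "C", F2))   # True
```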
...
To check
if C is extraneous in AB → CD, we compute the attribute closure of AB under
F = {AB → D, A → E, and E → C}
...
A canonical cover Fc for F is a set of dependencies such that F logically implies all
dependencies in Fc , and Fc logically implies all dependencies in F
...
• Each left side of a functional dependency in Fc is unique
...
A canonical cover for a set of functional dependencies F can be computed as depicted in Figure 7
...
It is important to note that when checking if an attribute is extraneous, the check uses the dependencies in the current value of Fc , and not the dependencies in F
...
Such functional dependencies
should be deleted
...
However,
Fc is minimal in a certain sense — it does not contain extraneous attributes, and it
Fc = F
repeat
Use the union rule to replace any dependencies in Fc of the form
α1 → β1 and α1 → β2 with α1 → β1 β2
...
/* Note: the test for extraneous attributes is done using Fc , not F */
If an extraneous attribute is found, delete it from α → β
...
Figure 7
...
...
It is cheaper to test Fc
than it is to test F itself
...
• There are two functional dependencies with the same set of attributes on the
left side of the arrow:
A → BC
A→B
We combine these functional dependencies into A → BC
...
This assertion is true because B → C is already in our set of functional dependencies
...
Thus, our canonical cover is
A→B
B→C
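
A canonical cover of this kind can be computed with a short Python sketch. The function names are assumptions, and the input set F = {A → BC, B → C, A → B, AB → C} is assumed here for illustration because it is consistent with the steps described above. The sketch removes one extraneous attribute at a time and always tests against the current value of Fc.

```python
def closure(attrs, fds):
    """Attribute closure; fds is a list of (lhs, rhs) pairs of attribute sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def canonical_cover(fds):
    fc = [(frozenset(l), set(r)) for l, r in fds]
    changed = True
    while changed:
        changed = False
        # Union rule: merge dependencies that share a left side,
        # and drop dependencies whose right side has become empty.
        merged = {}
        for lhs, rhs in fc:
            merged.setdefault(lhs, set()).update(rhs)
        fc = [(lhs, rhs) for lhs, rhs in merged.items() if rhs]
        # Delete one extraneous attribute (tested against the current fc).
        for i, (lhs, rhs) in enumerate(fc):
            for a in sorted(rhs):
                trial = fc[:i] + [(lhs, rhs - {a})] + fc[i + 1:]
                if a in closure(lhs, trial):
                    fc[i] = (lhs, rhs - {a})
                    changed = True
                    break
            if changed:
                break
            for a in sorted(lhs):
                if len(lhs) > 1 and rhs <= closure(lhs - {a}, fc):
                    fc[i] = (lhs - {a}, rhs)
                    changed = True
                    break
            if changed:
                break
    return sorted(("".join(sorted(l)), "".join(sorted(r))) for l, r in fc)


F = [("A", "BC"), ("B", "C"), ("A", "B"), ("AB", "C")]  # assumed input set
print(canonical_cover(F))  # [('A', 'B'), ('B', 'C')]
```

Because the order in which attributes are examined is fixed here (alphabetical), the sketch returns only one of the possibly several canonical covers.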
Given a set F of functional dependencies, it may be that an entire functional dependency in the set is extraneous, in the sense that dropping it does not change the
closure of F
...
Suppose that, to the contrary, there were such an extraneous
functional dependency in Fc
...
A canonical cover might not be unique
...
If we apply the extraneity
test to A → BC, we find that both B and C are extraneous under F
...
Then,
1
...
Now,
B is not extraneous in the right-hand side of A → B under F
...
7
...
If B is deleted, we get the set {A → C, B → AC, and C → AB}
...
As an exercise, can you find one more canonical cover for F ?
7
...
2 suggests that we should decompose a relation schema
that has many attributes into several schemas with fewer attributes
...
Consider an alternative design in which we decompose Lending-schema into the
following two schemas:
Branch-customer-schema = (branch-name, branch-city, assets, customer-name)
Customer-loan-schema = (customer-name, loan-number, amount)
Using the lending relation of Figure 7
...
9 and 7
...
Of course, there are cases in which we need to reconstruct the loan relation
...
No relation in our alternative database contains these data
...
It appears that we can do so by writing
branch-customer
branch-name    branch-city
Downtown       Brooklyn
Redwood        Palo Alto
Perryridge     Horseneck
Downtown       Brooklyn
Mianus         Horseneck
Round Hill     Horseneck
Pownal         Bennington
North Town     Rye
Downtown       Brooklyn
Perryridge     Horseneck
Brighton       Brooklyn
Figure 7
...
customer-name
Jones
Smith
Hayes
Jackson
Jones
Turner
Williams
Hayes
Johnson
Glenn
Brooks
Figure 7
...
Figure 7
...
When
we compare this relation and the lending relation with which we started (Figure 7
...
In our example, branch-customer ⋈ customer-loan
has the following additional tuples:
(Downtown, Brooklyn, 9000000, Jones, L-93, 500)
(Perryridge, Horseneck, 1700000, Hayes, L-16, 1300)
(Mianus, Horseneck, 400000, Jones, L-17, 1000)
(North Town, Rye, 3700000, Hayes, L-15, 1500)
Consider the query, “Find all bank branches that have made a loan in an amount less
than $1000
...
1, we see that the only branches with loan
amounts less than $1000 are Mianus and Round Hill
...
A closer examination of this example shows why
...
Thus, when we join branch-customer and customer-loan, we obtain not only
the tuples we had originally in lending, but also several additional tuples
...
We are no longer able, in general, to represent in the database information
about which customers are borrowers from which branch
...
A
decomposition that is not a lossy-join decomposition is a lossless-join decomposition
7
...
11
assets     customer-name   loan-number   amount
9000000    Jones           L-17          1000
9000000    Jones           L-93          500
2100000    Smith           L-23          2000
1700000    Hayes           L-15          1500
1700000    Hayes           L-16          1300
9000000    Jackson         L-14          1500
400000     Jones           L-17          1000
400000     Jones           L-93          500
8000000    Turner          L-11          900
300000     Williams        L-29          1200
3700000    Hayes           L-15          1500
3700000    Hayes           L-16          1300
9000000    Johnson         L-18          2000
1700000    Glenn           L-25          2500
7100000    Brooks          L-10          2200
The relation branch-customer ⋈ customer-loan
...
It should be clear from our example that a lossy-join decomposition is, in general, a bad database design
...
This representation is not adequate because a customer may have several loans, yet these loans are not necessarily obtained
from the same branch
...
The difference between this example and the preceding one is that the assets of a branch are the same, regardless
of the customer to which we are referring, whereas the lending branch associated
with a certain loan amount does depend on the customer to which we are referring
...
...
That is, the functional
dependency
branch-name → assets branch-city
holds, but customer-name does not functionally determine loan-number
...
Therefore, we restate the preceding examples more concisely and more formally
...
A set of relation schemas {R1 , R2 ,
...
, Rn } is a decomposition of R if, for i = 1, 2,
...
Let r be a relation on schema R, and let ri = ΠRi (r) for i = 1, 2,
...
That is,
{r1 , r2 ,
...
, Rn }
...
When we compute the
relations r1 , r2 ,
...
, n
...
The
details are left for you to complete as an exercise
...
In general, r ≠ r1 ⋈ r2 ⋈ · · · ⋈ rn
...
• R = Lending-schema
...
• R2 = Customer-loan-schema
...
1
...
9
...
10
...
11
...
1 and 7
...
To have a lossless-join decomposition, we need to impose constraints on the set of
possible relations
...
7
...
We say that a relation is legal if it satisfies all rules, or constraints, that we
impose on our database
...
A decomposition {R1 , R2 ,
...
A major part of this chapter deals with the questions of
how to specify constraints on the database, and how to obtain lossless-join decompositions that avoid the pitfalls represented by the examples of bad database designs
that we have seen in this section
...
5 Desirable Properties of Decomposition
We can use a given set of functional dependencies in designing a relational database
in which most of the undesirable properties discussed in Section 7
...
When we design such systems, it may become necessary to decompose a relation
into several smaller relations
...
In later sections, we outline specific ways of decomposing a relational
schema to get the properties we desire
...
2:
Lending-schema = (branch-name, branch-city, assets, customer-name,
loan-number, amount)
The set F of functional dependencies that we require to hold on Lending-schema are
branch-name → branch-city assets
loan-number → amount branch-name
As we discussed in Section 7
...
Assume that we decompose it to the following three relations:
Branch-schema = (branch-name, branch-city, assets)
Loan-schema = (loan-number, branch-name, amount)
Borrower-schema = (customer-name, loan-number)
We claim that this decomposition has several desirable properties, which we discuss
next
...
7
...
1 Lossless-Join Decomposition
In Section 7
...
We claim that the
decomposition in Section 7
...
To demonstrate our claim, we must
first present a criterion for determining whether a decomposition is lossy
...
Let
R1 and R2 form a decomposition of R
...
We can use attribute closure to efficiently test for
superkeys, as we have seen earlier
...
We
begin by decomposing Lending-schema into two schemas:
Branch-schema = (branch-name, branch-city, assets)
Loan-info-schema = (branch-name, customer-name, loan-number, amount)
Since branch-name → branch-city assets, the augmentation rule for functional dependencies (Section 7
...
2) implies that
branch-name → branch-name branch-city assets
Since Branch-schema ∩ Loan-info-schema = {branch-name}, it follows that our initial
decomposition is a lossless-join decomposition
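
The binary test can be sketched in Python with attribute closure: the decomposition into R1 and R2 passes if the common attributes functionally determine all of R1 or all of R2. The function names and the representation of schemas as Python sets of attribute names are assumptions made for this illustration.

```python
def closure(attrs, fds):
    """Attribute closure; fds is a list of (lhs, rhs) pairs of attribute sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_lossless_binary(r1, r2, fds):
    """True if (R1 ∩ R2) -> R1 or (R1 ∩ R2) -> R2 follows from fds."""
    common_closure = closure(r1 & r2, fds)
    return r1 <= common_closure or r2 <= common_closure


F = [
    ({"branch-name"}, {"branch-city", "assets"}),
    ({"loan-number"}, {"amount", "branch-name"}),
]

branch = {"branch-name", "branch-city", "assets"}
loan_info = {"branch-name", "customer-name", "loan-number", "amount"}
print(is_lossless_binary(branch, loan_info, F))  # True

# The earlier decomposition that shares only customer-name fails the test.
branch_customer = {"branch-name", "branch-city", "assets", "customer-name"}
customer_loan = {"customer-name", "loan-number", "amount"}
print(is_lossless_binary(branch_customer, customer_loan, F))  # False
```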
...
For the general case of decomposition of a relation into multiple parts at once, the
test for lossless join decomposition is more complicated
...
While the test for binary decomposition is clearly a sufficient condition for lossless
join, it is a necessary condition only if all constraints are functional dependencies
...
7
...
2 Dependency Preservation
There is another goal in relational-database design: dependency preservation
...
functional dependencies
...
To decide whether joins must be computed to check an update, we need to determine what functional dependencies can be tested by checking each relation individually
...
, Rn
be a decomposition of R
...
Since all functional dependencies in
a restriction involve attributes of only one relation schema, it is possible to test such
a dependency for satisfaction by checking only one relation
...
For instance, suppose F = {A → B, B → C}, and we have a decomposition into
AC and AB
...
The set of restrictions F1 , F2 ,
...
We now must ask whether testing only the restrictions is sufficient
...
F′ is a set of functional dependencies on schema R, but,
in general, F′ ≠ F
...
If the latter is
true, then every dependency in F is logically implied by F′, and, if we verify that F′
is satisfied, we have verified that F is satisfied
...
Figure 7
...
The input
is a set D = {R1 , R2 ,
...
This algorithm is expensive since it requires computation of F + ;
we will describe another algorithm that is more efficient after giving an example of
testing for dependency preservation
...
Instead of applying the algorithm of Figure 7
...
12
Testing for dependency preservation
...
...
• We can test the functional dependency: branch-name → branch-city assets using
Branch-schema = (branch-name, branch-city, assets)
...
If each member of F can be tested on one of the relations of the decomposition, then
the decomposition is dependency preserving
...
The alternative test can
therefore be used as a sufficient condition that is easy to check; if it fails we cannot
conclude that the decomposition is not dependency preserving, instead we will have
to apply the general test
...
The idea is to test each functional dependency α → β in F by using a
modified form of attribute closure to see if it is preserved by the decomposition
...
result = α
while (changes to result) do
    for each Ri in the decomposition
        t = (result ∩ Ri)+ ∩ Ri
        result = result ∪ t
The attribute closure is with respect to the functional dependencies in F
...
The
decomposition is dependency preserving if and only if all the dependencies in F are
preserved
...
This procedure
takes polynomial time, instead of the exponential time required to compute F +
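
A minimal Python sketch of this test follows. The function names and the string encoding of schemas and dependencies are assumptions made for this illustration. The example uses F = {A → B, B → C} with the decomposition into AC and AB mentioned earlier, and, for contrast, a decomposition into AB and BC.

```python
def closure(attrs, fds):
    """Attribute closure under F; fds is a list of (lhs, rhs) strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def preserved(lhs, rhs, decomposition, fds):
    """Is lhs -> rhs preserved? Uses only closures restricted to each Ri."""
    result = set(lhs)
    changed = True
    while changed:
        changed = False
        for ri in decomposition:
            ri = set(ri)
            t = closure(result & ri, fds) & ri  # (result ∩ Ri)+ ∩ Ri
            if not t <= result:
                result |= t
                changed = True
    return set(rhs) <= result


def dependency_preserving(decomposition, fds):
    """The decomposition is dependency preserving iff every FD in F is preserved."""
    return all(preserved(l, r, decomposition, fds) for l, r in fds)


F = [("A", "B"), ("B", "C")]
print(dependency_preserving(["AC", "AB"], F))  # False: B -> C cannot be checked locally
print(dependency_preserving(["AB", "BC"], F))  # True
```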
...
5
...
2
...
The decomposition separates
branch and loan data into distinct relations, thereby eliminating this redundancy
...
In the decomposition, the relation on schema Borrower-schema contains the loan-number, customer-name relationship, and no other schema
does
...
on Borrower-schema
...
Clearly, the lack of redundancy in our decomposition is desirable
...
7
...
In this section we cover BCNF (defined below), and later, in
Section 7
...
7
...
1 Definition
One of the more desirable normal forms that we can obtain is Boyce – Codd normal
form (BCNF)
...
• α is a superkey for schema R
...
As an illustration, consider the following relation schemas and their respective
functional dependencies:
• Customer-schema = (customer-name, customer-street, customer-city)
customer-name → customer-street customer-city
• Branch-schema = (branch-name, assets, branch-city)
branch-name → assets branch-city
• Loan-info-schema = (branch-name, customer-name, loan-number, amount)
loan-number → amount branch-name
We claim that Customer-schema is in BCNF
...
The only nontrivial functional dependencies that hold on
Customer-schema have customer-name on the left side of the arrow
...
Similarly, it can be shown easily that the relation
schema Branch-schema is in BCNF
...
First, note that loan-number
is not a superkey for Loan-info-schema, since we could have a pair of tuples representing a single loan made to two people — for example,
(Downtown, John Bell, L-44, 1000)
(Downtown, Jane Bell, L-44, 1000)
Because we did not list functional dependencies that rule out the preceding case, loan-number is not a candidate key
...
Therefore, Loan-info-schema does not satisfy the definition of
BCNF
...
2
...
We can eliminate this redundancy by redesigning our database such that
all schemas are in BCNF
...
Consider the decomposition of Loan-info-schema into two schemas:
Loan-schema = (loan-number, branch-name, amount)
Borrower-schema = (customer-name, loan-number)
This decomposition is a lossless-join decomposition
...
In this example, it is easy to see that
loan-number → amount branch-name
applies to the Loan-schema, and that only trivial functional dependencies apply to
Borrower-schema
...
Thus, both schemas of our decomposition are in BCNF
...
There is exactly one tuple for each loan in the relation on Loan-schema, and one tuple for each customer of each loan in the relation on
Borrower-schema
...
Often testing of a relation to see if it satisfies BCNF can be simplified:
• To check if a nontrivial dependency α → β causes a violation of BCNF, compute α+ (the attribute closure of α), and verify that it includes all attributes of
R; that is, it is a superkey of R
...
We can show that if none of the dependencies in F causes a violation of
BCNF, then none of the dependencies in F + will cause a violation of BCNF
either
...
That is, it does not suffice to use F when we test a relation Ri , in a decomposition
of R, for violation of BCNF
...
Suppose this were
7
...
Now, neither of the dependencies in
F contains only attributes from (A, C, D, E) so we might be misled into thinking R2
satisfies BCNF
...
Thus, we may need a dependency that is in F + , but is not in F , to
show that a decomposed relation is not in BCNF
...
To check if a relation Ri in a decomposition of R is in BCNF, we apply this test:
• For every subset α of attributes in Ri , check that α+ (the attribute closure of α
under F ) either includes no attribute of Ri − α, or includes all attributes of Ri
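
This subset test can be sketched in Python as follows. The function name `bcnf_violation` and the string encoding are assumptions made for this illustration. The example assumes R = (A, B, C, D, E) with F = {A → B, BC → D} and the decomposed schema (A, C, D, E); the text's own set F is not shown in this extract, so these dependencies are an assumption that is consistent with the discussion above.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure under F; fds is a list of (lhs, rhs) strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violation(ri, fds):
    """Return a witness (alpha, beta) if Ri violates BCNF, else None.

    For each proper nonempty subset alpha of Ri, alpha+ must either add
    no attribute of Ri - alpha or contain all of Ri.
    """
    ri = set(ri)
    for k in range(1, len(ri)):
        for combo in combinations(sorted(ri), k):
            alpha = set(combo)
            cl = closure(alpha, fds)
            beta = (cl - alpha) & ri
            if beta and not ri <= cl:
                return alpha, beta
    return None


F = [("A", "B"), ("BC", "D")]     # assumed dependencies on R = (A, B, C, D, E)
print(bcnf_violation("ACDE", F))  # a witness equivalent to AC -> D
print(bcnf_violation("AB", F))    # None
```

The loop examines every subset of Ri, so the check takes time exponential in the number of attributes of Ri.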
...
The above dependency shows that Ri violates BCNF, and is a “witness” for the violation
...
6
...
7
...
2 Decomposition Algorithm
We are now able to state a general method to decompose a relation schema so as to
satisfy BCNF
...
13 shows an algorithm for this task
...
, Rn by the algorithm
...
The decomposition that the algorithm generates is not only in BCNF, but is also
a lossless-join decomposition
...
result := {R};
done := false;
compute F+;
while (not done) do
    if (there is a schema Ri in result that is not in BCNF)
        then begin
            let α → β be a nontrivial functional dependency that holds
                on Ri such that α → Ri is not in F+, and α ∩ β = ∅;
            result := (result − Ri) ∪ (Ri − β) ∪ (α, β);
        end
        else done := true;
Figure 7
...
We apply the BCNF decomposition algorithm to the Lending-schema schema that
we used in Section 7
...
We can apply the algorithm of Figure 7
...
Thus, Lending-schema is not in BCNF
...
Since branch-name is a key for
Branch-schema, the relation Branch-schema is in BCNF
...
We replace Loan-info-schema by
Loan-schema = (loan-number, branch-name, amount)
Borrower-schema = (customer-name, loan-number)
• Loan-schema and Borrower-schema are in BCNF
...
These relation
schemas are the same as those in Section 7
...
The BCNF decomposition algorithm takes time exponential in the size of the initial
schema, since the algorithm for checking if a relation in the decomposition satisfies
BCNF can take exponential time
...
algorithm that can compute a BCNF decomposition in polynomial time
...
7
...
3 Dependency Preservation
Not every BCNF decomposition is dependency preserving
...
The
set F of functional dependencies that we require to hold on the Banker-schema is
banker-name → branch-name
branch-name customer-name → banker-name
Clearly, Banker-schema is not in BCNF since banker-name is not a superkey
...
13, we obtain the following BCNF decomposition:
Banker-branch-schema = (banker-name, branch-name)
Customer-banker-schema = (customer-name, banker-name)
The decomposed schemas preserve only banker-name → branch-name (and trivial
dependencies), but the closure of {banker-name → branch-name} does not include
customer-name branch-name → banker-name
...
To see why the decomposition of Banker-schema into the schemas Banker-branch-schema and Customer-banker-schema is not dependency preserving, we apply the algorithm of Figure 7
...
We find that the restrictions F1 and F2 of F to each schema
are:
F1 = {banker-name → branch-name}
F2 = ∅ (only trivial dependencies hold on Customer-banker-schema)
(For brevity, we do not show trivial functional dependencies
...
Therefore, (F1 ∪ F2 )+ ≠ F + , and the decomposition is not dependency preserving
...
Moreover, it is easy to see that any BCNF decomposition of Banker-schema
must fail to preserve customer-name branch-name → banker-name
...
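The Banker-schema result can be checked mechanically. The sketch below deliberately uses a different technique from the one in the text: instead of computing the restrictions F1 and F2, it applies the standard polynomial-time test that grows the left side of each dependency using only closures taken inside individual schemas. The function names are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) frozensets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)

def preserved(fd, schemas, fds):
    """True if fd follows from the restrictions of fds to the given schemas."""
    lhs, rhs = fd
    result = set(lhs)
    changed = True
    while changed:
        changed = False
        for ri in schemas:
            gained = closure(result & ri, fds) & ri
            if not gained <= result:
                result |= gained
                changed = True
    return rhs <= result

F = [(frozenset({"banker-name"}), frozenset({"branch-name"})),
     (frozenset({"branch-name", "customer-name"}), frozenset({"banker-name"}))]
schemas = [frozenset({"banker-name", "branch-name"}),      # Banker-branch-schema
           frozenset({"customer-name", "banker-name"})]    # Customer-banker-schema
for fd in F:
    print(sorted(fd[0]), "->", sorted(fd[1]), preserved(fd, schemas, F))
```

The first dependency is reported as preserved and the second as not preserved, which agrees with the conclusion reached through F1 and F2.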
Lossless join
2
...
Dependency preservation
Recall that lossless join is an essential condition for a decomposition, to avoid loss
of information
...
In Section 7
...
There are situations where there is more than one way to decompose a schema
into BCNF
...
For instance, suppose we have a relation schema R(A, B, C) with the
functional dependencies A → B and B → C
...
If we used the dependency A → B (or equivalently, A → C)
to decompose R, we would end up with two relations R1(A, B) and R2(A, C); the
dependency B → C would not be preserved
...
Clearly the decomposition into R1(A, B) and R2(B, C)
is preferable
...
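Both decompositions of R(A, B, C) are lossless-join decompositions, so the preference rests on dependency preservation alone. A minimal sketch of the standard binary lossless-join test (the common attributes must functionally determine one of the two schemas) shows this; the function names are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) frozensets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)

def lossless_binary(r1, r2, fds):
    """True if R1 ∩ R2 -> R1 or R1 ∩ R2 -> R2 is implied by fds."""
    r1, r2 = frozenset(r1), frozenset(r2)
    plus = closure(r1 & r2, fds)
    return r1 <= plus or r2 <= plus

F = [(frozenset("A"), frozenset("B")), (frozenset("B"), frozenset("C"))]
print(lossless_binary("AB", "AC", F))   # decomposition on A -> B
print(lossless_binary("AB", "BC", F))   # decomposition on B -> C
```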
7
...
For such schemas, we have two alternatives if we wish to
check if an update violates any functional dependencies:
• Pay the extra cost of computing joins to test for violations
...
Unlike BCNF, 3NF decompositions may contain some redundancy in the decomposed schema
...
Which of the two alternatives to choose is a design
decision to be made by the database designer on the basis of the application requirements
...
7
...
3NF relaxes this constraint slightly by allowing nontrivial functional dependencies whose left side is not a superkey
...
• α is a superkey for R
...
• Each attribute A in β − α is contained in a candidate key for R
...
The first two alternatives are the same as the two alternatives in the definition of
BCNF
...
It represents, in some sense, a minimal relaxation of the
BCNF conditions that helps ensure that every schema has a dependency-preserving
decomposition into 3NF
...
Observe that any schema that satisfies BCNF also satisfies 3NF, since each of its
functional dependencies would satisfy one of the first two alternatives
...
The definition of 3NF allows certain functional dependencies that are not allowed
in BCNF
...
1
Let us return to our Banker-schema example (Section 7
...
We have shown that this
relation schema does not have a dependency-preserving, lossless-join decomposition
into BCNF
...
To see that it is, we note
that {customer-name, branch-name} is a candidate key for Banker-schema, so the only
attribute not contained in a candidate key for Banker-schema is banker-name
...
Since {customer-name, branch-name}
is a candidate key, these dependencies do not violate the definition of 3NF
...
Also, we can decompose the dependencies in F so that their right-hand side consists of only single attributes, and use the
resultant set in place of F
...
If α is not a superkey, we
have to verify whether each attribute in β is contained in a candidate key of R; this
test is rather more expensive, since it involves finding candidate keys
...
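The 3NF test just described can be sketched in Python. This is an illustration with hypothetical function names: it checks only the dependencies in F, and it finds candidate keys by brute force over subsets, which is where the extra cost mentioned above comes from.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) frozensets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)

def candidate_keys(r, fds):
    r = frozenset(r)
    keys = []
    for size in range(1, len(r) + 1):
        for attrs in map(frozenset, combinations(sorted(r), size)):
            if any(k <= attrs for k in keys):
                continue                       # contains a smaller key: not minimal
            if closure(attrs, fds) >= r:
                keys.append(attrs)
    return keys

def is_3nf(r, fds):
    r = frozenset(r)
    prime = frozenset().union(*candidate_keys(r, fds))
    for lhs, rhs in fds:
        if rhs <= lhs:                         # trivial dependency
            continue
        if closure(lhs, fds) >= r:             # left side is a superkey
            continue
        if not (rhs - lhs) <= prime:           # some attribute is in no candidate key
            return False
    return True

BANKER = {"banker-name", "branch-name", "customer-name"}
F = [(frozenset({"banker-name"}), frozenset({"branch-name"})),
     (frozenset({"customer-name", "branch-name"}), frozenset({"banker-name"}))]
print(is_3nf(BANKER, F))
```

For Banker-schema the check succeeds: banker-name is not a superkey, but branch-name is contained in the candidate key {customer-name, branch-name}.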
7
...
2 Decomposition Algorithm
Figure 7
...
The set of dependencies Fc used in the algorithm is a canonical cover for F
...
1
...
25)
...
The definition we use is equivalent but easier to
understand
...
7
...
, i contains α β
then begin
i := i + 1;
Ri := α β;
end
if none of the schemas Rj , j = 1, 2,
...
, Ri )
Figure 7
...
, i;
initially i = 0, and in this case the set is empty
...
14, consider the following extension to the
Banker-schema in Section 7
...
The functional dependencies for this relation schema are
banker-name → branch-name office-number
customer-name branch-name → banker-name
The for loop in the algorithm causes us to include the following schemas in our
decomposition:
Banker-office-schema = (banker-name, branch-name, office-number)
Banker-schema = (customer-name, branch-name, banker-name)
Since Banker-schema contains a candidate key for Banker-info-schema, we are finished
with the decomposition process
...
It ensures that the decomposition
is a lossless-join decomposition by guaranteeing that at least one schema contains a
candidate key for the schema being decomposed
...
19 provides some insight
into the proof that this suffices to guarantee a lossless join
...
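The synthesis procedure can be sketched in Python. The sketch assumes that its input is already a canonical cover (the two dependencies of the Banker-info-schema example happen to be one), and the function names are hypothetical. A schema is taken to contain a candidate key exactly when its attribute closure covers all of R.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) frozensets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)

def synthesize_3nf(r, fc):
    """fc is assumed to be a canonical cover, as (lhs, rhs) frozenset pairs."""
    r = frozenset(r)
    schemas = []
    for lhs, rhs in fc:
        candidate = lhs | rhs
        if not any(candidate <= s for s in schemas):
            schemas.append(candidate)
    if not any(closure(s, fc) >= r for s in schemas):
        key = set(r)                           # shrink R down to one candidate key
        for a in sorted(r):
            if closure(key - {a}, fc) >= r:
                key.discard(a)
        schemas.append(frozenset(key))
    return schemas

BANKER_INFO = {"branch-name", "customer-name", "banker-name", "office-number"}
FC = [(frozenset({"banker-name"}), frozenset({"branch-name", "office-number"})),
      (frozenset({"customer-name", "branch-name"}), frozenset({"banker-name"}))]
for schema in synthesize_3nf(BANKER_INFO, FC):
    print(sorted(schema))
```

The two schemas printed correspond to Banker-office-schema and Banker-schema; no extra key schema is added because the second one already contains a candidate key.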
The result is not uniquely defined, since a set of functional dependencies
...
If a relation Ri is in the decomposition generated by the synthesis algorithm, then
Ri is in 3NF
...
Therefore, to see that Ri is in
3NF, you must convince yourself that any functional dependency γ → B that holds
on Ri satisfies the definition of 3NF
...
Now, B must be in α or β, since B is in Ri and
α → β generated Ri
...
In this case, the dependency α → β would not have been
in Fc since B would be extraneous in β
...
• B is in β but not α
...
The second condition of 3NF is satisfied
...
Then α must contain some attribute not in γ
...
The derivation could not have used α → β —
if it had been used, α must be contained in the attribute closure of γ,
which is not possible, since we assumed γ is not a superkey
...
This would imply that B
is extraneous in the right-hand side of α → β, which is not possible since
α → β is in the canonical cover Fc
...
• B is in α but not β
...
Interestingly, the algorithm we described for decomposition into 3NF can be implemented in polynomial time, even though testing a given relation to see if it satisfies
3NF is NP-hard
...
7
...
Nevertheless, there are
disadvantages to 3NF: If we do not eliminate all transitive dependencies, we may have to use null values to represent some of the possible meaningful
relationships among data items, and there is the problem of repetition of information
...
Since banker-name → branch-name, we may
want to represent relationships between values for banker-name and values for branch-name in our database
...
customer-name: Jones, Smith, Hayes, Jackson, Curry, Turner (only this column of the figure survives)
Figure 7
...
As an illustration of the repetition of information problem, consider the instance
of Banker-schema in Figure 7
...
Notice that the information indicating that Johnson
is working at the Perryridge branch is repeated
...
BCNF
2
...
Dependency preservation
Since it is not always possible to satisfy all three, we may be forced to choose between
BCNF and dependency preservation with 3NF
...
It is possible, although a little complicated, to write assertions
that enforce a functional dependency (see Exercise 7
...
Thus even if we had
a dependency-preserving decomposition, if we use standard SQL we would not be
able to efficiently test a functional dependency whose left-hand side is not a key
...
Given a BCNF decomposition that is not dependency preserving, we consider each dependency in a minimum cover Fc that is
not preserved in the decomposition
...
The functional dependency can be easily tested on the materialized view, by means of a constraint unique (α)
...
(Later in the
book, in Section 14
...
)
Thus, in case we are not able to get a dependency-preserving BCNF decomposition,
it is generally preferable to opt for BCNF, and use techniques such as materialized
views to reduce the cost of checking functional dependencies
...
7
...
Consider again our banking example
...
However, assume that our bank is attracting wealthy customers who have several addresses (say, a winter home and a summer home)
...
If we
remove this functional dependency, we find BC-schema to be in BCNF with respect to
our modified set of functional dependencies
...
To deal with this problem, we must define a new form of constraint, called a multivalued dependency
...
This normal form, called
fourth normal form (4NF), is more restrictive than BCNF
...
7
...
1 Multivalued Dependencies
Functional dependencies rule out certain tuples from being in a relation
...
Multivalued dependencies, on the other hand, do not rule out the existence of certain
tuples
...
For this reason, functional dependencies sometimes are referred to as equality-generating dependencies, and multivalued dependencies are referred to as tuple-generating dependencies
...
The multivalued dependency
α →→ β
holds on R if, in any legal relation r(R), for all pairs of tuples t1 and t2 in r such that
t1 [α] = t2 [α], there exist tuples t3 and t4 in r such that
t1 [α] = t2 [α] = t3 [α] = t4 [α]
t3 [β] = t1 [β]
t3 [R − β] = t2 [R − β]
t4 [β] = t2 [β]
t4 [R − β] = t1 [R − β]
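The definition can be checked directly on a relation instance. The Python sketch below is an illustration with a hypothetical function name and made-up tuples: for every ordered pair (t1, t2) that agrees on α it builds the required t3 and looks for it in the relation. Because both orderings of each pair are visited, the tuple t4 is covered as the t3 of the swapped pair.

```python
def satisfies_mvd(rows, schema, alpha, beta):
    """rows: iterable of tuples laid out according to schema (attribute names)."""
    pos = {a: i for i, a in enumerate(schema)}
    present = set(rows)
    for t1 in present:
        for t2 in present:
            if any(t1[pos[a]] != t2[pos[a]] for a in alpha):
                continue
            # t3 agrees with t1 on beta and with t2 on R - beta
            t3 = tuple(t1[pos[a]] if a in beta else t2[pos[a]] for a in schema)
            if t3 not in present:
                return False
    return True

BC = ("loan-number", "customer-name", "customer-street", "customer-city")
alpha = {"customer-name"}
beta = {"customer-street", "customer-city"}

# Illustrative tuples (assumed data, not taken from the figures).
legal = [("L-23", "Smith", "North", "Rye"),
         ("L-23", "Smith", "Main", "Manchester"),
         ("L-93", "Curry", "Lake", "Horseneck")]
illegal = [("L-23", "Smith", "North", "Rye"),
           ("L-27", "Smith", "Main", "Manchester")]
print(satisfies_mvd(legal, BC, alpha, beta))
print(satisfies_mvd(illegal, BC, alpha, beta))
```

In the second relation each loan is paired with only one of the customer's addresses, so the tuple required by the definition is missing and the check fails.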
(Figure residue: a table with rows t1, t2, t3, t4 and a column α; the cell values a1 ... an and b1 ... bn are not recoverable from this extract.)
This definition is less complicated than it appears to be
...
16 gives a tabular
picture of t1 , t2 , t3 , and t4
...
If the multivalued dependency α →→ β is satisfied by all relations on schema
R, then α →→ β is a trivial multivalued dependency on schema R
...
To illustrate the difference between functional and multivalued dependencies, we
consider the BC-schema again, and the relation bc (BC-schema) of Figure 7
...
We must
repeat the loan number once for each address a customer has, and we must repeat
the address for each loan a customer has
...
If a customer (say, Smith) has a loan (say, loan
number L-23), we want that loan to be associated with all Smith’s addresses
...
18 is illegal
...
18
...
(The multivalued dependency customer-name →→ loan-number will do as well
...
)
As with functional dependencies, we shall use multivalued dependencies in two
ways:
1
...
To specify constraints on the set of legal relations; we shall thus concern ourselves with only those relations that satisfy a given set of functional and multivalued dependencies
loan-number: L-23, L-23, L-93 (only this column of the figure survives)
Figure 7
...
18
customer-street: North, Main; customer-city: Rye, Manchester (only these columns of the figure survive)
An illegal bc relation
...
Let D denote a set of functional and multivalued dependencies
...
As we did for functional dependencies, we can compute D+ from D, using the formal
definitions of functional dependencies and multivalued dependencies
...
Luckily, multivalued dependencies that occur in practice appear to be quite simple
...
(Section C
...
1 of the appendix outlines a system of inference rules for
multivalued dependencies
...
In other words, every functional dependency is also a multivalued dependency
...
8
...
We saw in the opening paragraphs of Section 7
...
We shall see that we can use the given multivalued dependency to improve the database design, by decomposing BC-schema into a fourth
normal form decomposition
...
• α is a superkey for schema R
...
Note that the definition of 4NF differs from the definition of BCNF in only the use
of multivalued dependencies instead of functional dependencies
...
To see this fact, we note that, if a schema R is not in BCNF, then there is
result := {R};
done := false;
compute D+ ; Given schema Ri , let Di denote the restriction of D+ to Ri
while (not done) do
if (there is a schema Ri in result that is not in 4NF w
...
t
...
19
4NF decomposition algorithm
...
Since α → β implies α →→ β, R cannot be in 4NF
...
, Rn be a decomposition of R
...
Recall that, for a set F of functional
dependencies, the restriction Fi of F to Ri is all functional dependencies in F + that
include only attributes of Ri
...
The restriction of D to Ri is the set Di consisting of
1
...
All multivalued dependencies of the form
α →→ (β ∩ Ri )
where α ⊆ Ri and α →→ β is in D+
...
8
...
Figure 7
...
It is identical
to the BCNF decomposition algorithm of Figure 7
...
If we apply the algorithm of Figure 7
...
Following the algorithm, we replace BC-schema by two
schemas:
Borrower-schema = (customer-name, loan-number)
Customer-schema = (customer-name, customer-street, customer-city)
...
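The effect of this decomposition can be illustrated on a small instance. The Python sketch below uses hypothetical helper names and made-up tuples: it projects a bc relation onto Borrower-schema and Customer-schema, joins the projections, and compares the result with the original relation.

```python
def project(rows, schema, attrs):
    positions = [schema.index(a) for a in attrs]
    return {tuple(t[i] for i in positions) for t in rows}

def natural_join(r1, s1, r2, s2):
    common = [a for a in s1 if a in s2]
    extra = [a for a in s2 if a not in s1]
    joined = set()
    for t in r1:
        for u in r2:
            if all(t[s1.index(a)] == u[s2.index(a)] for a in common):
                joined.add(t + tuple(u[s2.index(a)] for a in extra))
    return joined, s1 + tuple(extra)

BC = ("loan-number", "customer-name", "customer-street", "customer-city")
BORROWER = ("customer-name", "loan-number")
CUSTOMER = ("customer-name", "customer-street", "customer-city")

def rejoin(bc_rows):
    joined, schema = natural_join(project(bc_rows, BC, BORROWER), BORROWER,
                                  project(bc_rows, BC, CUSTOMER), CUSTOMER)
    return project(joined, schema, BC)         # back in BC column order

# Illustrative tuples (assumed data).
legal = {("L-23", "Smith", "North", "Rye"),
         ("L-23", "Smith", "Main", "Manchester"),
         ("L-93", "Curry", "Lake", "Horseneck")}
illegal = {("L-23", "Smith", "North", "Rye"),
           ("L-27", "Smith", "Main", "Manchester")}
print(rejoin(legal) == legal)
print(len(rejoin(illegal)), len(illegal))
```

For the relation that satisfies the multivalued dependency, the join of the projections returns exactly the original tuples. For the other relation the join produces additional tuples, which is why the decomposition is lossless only for relations that satisfy the dependency.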
The following fact about multivalued dependencies and lossless
joins shows that the algorithm of Figure 7
...
Let R1 and R2 form a decomposition of R
...
5
...
The preceding fact about multivalued dependencies is a more general statement about lossless joins
...
The issue of dependency preservation when we decompose a relation becomes
more complicated in the presence of multivalued dependencies
...
1
...
7
...
As we saw earlier, multivalued dependencies help us understand and tackle some forms of repetition of information that cannot be understood in terms of functional dependencies
...
There is a class of even
more general constraints, which leads to a normal form called domain-key normal
form
...
Hence PJNF and domain-key normal form
are used quite rarely
...
Conspicuous by its absence from our discussion of normal forms is second normal form (2NF)
...
We
simply define it, and let you experiment with it in Exercise 7
...
7
...
In
this section we study how normalization fits into the overall database design process
...
4, we assumed that a relation schema
R is given, and proceeded to normalize it
...
R could have been generated when converting an E-R diagram to a set of tables
...
R could have been a single relation containing all attributes that are of interest
...
3
...
In the rest of this section we examine the implications of these approaches
...
7
...
1 E-R Model and Normalization
When we carefully define an E-R diagram, identifying all entities correctly, the tables
generated from the E-R diagram should not need further normalization
...
For instance,
suppose an employee entity had attributes department-number and department-address,
and there is a functional dependency department-number → department-address
...
Most examples of such dependencies arise out of poor E-R diagram design
...
Similarly, a relationship involving more than two entities may not be in a
desirable normal form
...
(In fact, some E-R diagram variants actually make it difficult or impossible to
specify nonbinary relations
...
If the generated relations are not in desired normal form, the problem can be fixed in the E-R diagram
...
Alternatively,
normalization can be left to the designer’s intuition during E-R modeling, and can be
done formally on the relations generated from the E-R model
...
10
...
One of our goals in choosing a
decomposition was that it be a lossless-join decomposition
...
Consider the database of Figure 7
...
The figure depicts a situation in which we have not yet determined the amount
of loan L-58, but wish to record the remainder of the data on the loan
...
In other words, there is no loan-info relation corresponding to the relations
of Figure 7
...
Tuples that disappear when we compute the join are dangling tuples
(see Section 6
...
1)
...
, rn (Rn ) be a set of relations
...
(Figure residue: decomposed relations with columns branch-name, loan-number, and amount; the surviving values are branch-name Round Hill and loan-number L-58.)
Figure 7
...
A tuple t of relation ri is a dangling tuple if t is not in the relation
ΠRi (r1 ⋈ r2 ⋈ · · · ⋈ rn )
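The definition translates into a short computation: join all the relations, then report every tuple that does not reappear in the result. The Python sketch below is an illustration with hypothetical function names; relations are lists of dictionaries, and the customer name in the sample data is an assumed value.

```python
from functools import reduce

def join(r, s):
    """Natural join of two relations given as lists of dicts."""
    out = []
    for t in r:
        for u in s:
            if all(t[a] == u[a] for a in t.keys() & u.keys()):
                out.append({**t, **u})
    return out

def dangling(relations):
    """Map each relation's position to the tuples missing from the full join."""
    full = reduce(join, relations)
    return {i: [t for t in r
                if not any(all(j[a] == t[a] for a in t) for j in full)]
            for i, r in enumerate(relations)}

branch_loan = [{"branch-name": "Round Hill", "loan-number": "L-58"}]
loan_amount = []                                # amount of L-58 not yet known
loan_customer = [{"loan-number": "L-58", "customer-name": "Johnson"}]  # assumed name
print(dangling([branch_loan, loan_amount, loan_customer]))
```

With the amount relation empty, the join is empty, so both stored tuples are dangling. Once a tuple for L-58 is added to the amount relation, no dangling tuples remain.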
Dangling tuples may occur in practical database applications
...
The relation r1 ⋈ r2 ⋈ · · · ⋈ rn is
called a universal relation, since it involves all the attributes in the universe defined
by R1 ∪ R2 ∪ · · · ∪ Rn
...
20
is to include null values in the universal relation
...
Because of them, it may be better to view the relations
of the decomposed design as representing the database, rather than as the universal relation whose schema we decomposed during the normalization process
...
)
Note that we cannot enter all incomplete information into the database of Figure 7
...
For example, we cannot enter a loan number
unless we know at least one of the following:
• The customer name
• The branch name
• The amount of the loan
Thus, a particular decomposition defines a restricted form of incomplete information
that is acceptable in our database
...
Returning again to the
example of Figure 7
...
” This is
because
loan-number → customer-name amount
and therefore the only way that we can relate customer-name and amount is through
loan-number
...
In other words, we do not want to store data for which the key attributes are unknown
...
Thus, our normal forms allow
representation of acceptable incomplete information via dangling tuples, while prohibiting the storage of undesirable incomplete information
...
We cannot use name to refer
to both customer-name and to branch-name
...
Nevertheless, if we defined our relation schemas directly,
rather than in terms of a universal relation, we could obtain relations on schemas
such as the following for our banking example:
branch-loan (name, number)
loan-customer (number, name)
amt (number, amount)
Observe that, with the preceding relations, expressions such as branch-loan ⋈ loan-customer are meaningless
...
In a language such as SQL, however, a query involving branch-loan and loan-customer must remove ambiguity in references to name by prefixing the relation name
...
We believe that using the unique-role assumption (that each attribute name has
a unique meaning in the database) is generally preferable to reusing the same
name in multiple roles
...
7
...
3 Denormalization for Performance
Occasionally database designers choose a schema that has redundant information;
that is, it is not normalized
...
The penalty paid for not using a normalized schema is the extra
work (in terms of coding time and execution time) to keep redundant data consistent
...
In our
normalized schema, this requires a join of account with depositor
...
This makes displaying the account information
faster
...
The process of taking a normalized schema and
making it non-normalized is called denormalization, and designers use it to tune
performance of systems to support time-critical operations
...
A better alternative, supported by many database systems today, is to use the normalized schema, and additionally store the join of account and depositor as a materialized view
...
)
Like denormalization, using materialized views does have space and time overheads;
however, it has the advantage that keeping the view up to date is the job of the
database system, not the application programmer
...
10
...
We give examples here; obviously, such
designs should be avoided
...
A relation earnings(company-id, year, amount) could be used to store the
earnings information
...
An alternative design is to use multiple relations, each storing the earnings for a
different year
...
The only functional dependency here on each
relation would be company-id → earnings, so these relations are also in BCNF
...
Queries would also be more complicated since
they may have to refer to many relations
...
Here the only functional
dependencies are from company-id to the other attributes, and again the relation is
in BCNF
...
Queries would also be more complicated, since they may have to refer to
many attributes
...
While such representations are useful for display to users, for the reasons just given, they are not desirable in a database design
...
7
...
The pitfalls included repeated information and inability to represent some information
...
We laid special emphasis on what dependencies are logically implied by a set of dependencies
...
• We introduced the concept of decomposition, and showed that decompositions must be lossless-join decompositions, and should preferably be dependency preserving
...
• We then presented Boyce – Codd Normal Form (BCNF); relations in BCNF are
free from the pitfalls outlined earlier
...
There are relations for which there is no dependency-preserving BCNF decomposition
...
Relations in 3NF may have some redundancy, but there is always a dependency-preserving decomposition into
3NF
...
We defined fourth normal form (4NF) with multivalued dependencies
...
1
...
• Other normal forms, such as PJNF and DKNF, eliminate more subtle forms
of redundancy
...
Appendix C gives details on these normal forms
...
That is one of the primary
advantages of the relational model compared with the other data models that
we have studied
...
7
...
1 Explain what is meant by repetition of information and inability to represent information
...
7
...
3 Why are certain functional dependencies called trivial functional dependencies?
7
...
21
...
21
B
b1
b1
b1
b1
C
c1
c2
c1
c3
Relation of Exercise 7
...
7
...
7
...
• A many-to-one relationship set exists between entity sets account and customer
...
7 Consider the following proposed rule for functional dependencies: If α → β and
γ → β, then α → γ
...
7
...
(Hint: Use the
augmentation rule to show that, if α → β, then α → αβ
...
)
7
...
7
...
7
...
A → BC
CD → E
B→D
E→A
List the candidate keys for R
...
12 Using the functional dependencies of Exercise 7
...
7
...
11, compute the canonical
cover Fc
...
14 Consider the algorithm in Figure 7
...
Show that this algorithm
is more efficient than the one presented in Figure 7
...
3
...
7
...
Also write an SQL assertion that enforces the functional dependency
...
7
...
2 is not a
lossless-join decomposition:
(A, B, C)
(C, D, E)
Hint: Give an example of a relation r on schema R such that
ΠA, B, C (r) ⋈ ΠC, D, E (r) ≠ r
result := ∅;
/* fdcount is an array whose ith element contains the number
of attributes on the left side of the ith FD that are
not yet known to be in α+ */
for i := 1 to |F | do
begin
let β → γ denote the ith FD;
fdcount [i] := |β|;
end
/* appears is an array with one entry for each attribute
...
Each integer
i on the list indicates that A appears on the left side
of the ith FD */
for each attribute A do
begin
appears [A] := NIL;
for i := 1 to |F | do
begin
let β → γ denote the ith FD;
if A ∈ β then add i to appears [A];
end
end
addin (α);
return (result);
procedure addin (α);
for each attribute A in α do
begin
if A ∉ result then
begin
result := result ∪ {A};
for each element i of appears[A] do
begin
fdcount [i] := fdcount [i] − 1;
if fdcount [i] = 0 then
begin
let β → γ denote the ith FD;
addin (γ);
end
end
end
end
Figure 7
...
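The counting procedure in the figure can be transcribed into Python fairly directly. The sketch below is one possible rendering, not the book's code: the function name closure_fast is hypothetical, dependencies are (left side, right side) pairs of frozensets, every left side is assumed to be nonempty, and the sample dependencies are those of the exercise that lists A → BC, CD → E, B → D, and E → A.

```python
def closure_fast(alpha, fds):
    """Attribute closure of alpha, touching each left-side attribute once."""
    result = set()
    # fdcount[i]: left-side attributes of the ith FD not yet known to be in alpha+
    fdcount = [len(lhs) for lhs, _ in fds]
    # appears[A]: indexes of the FDs whose left side contains A
    appears = {}
    for i, (lhs, _) in enumerate(fds):
        for a in lhs:
            appears.setdefault(a, []).append(i)

    def addin(attrs):
        for a in attrs:
            if a not in result:
                result.add(a)
                for i in appears.get(a, []):
                    fdcount[i] -= 1
                    if fdcount[i] == 0:        # whole left side is now in alpha+
                        addin(fds[i][1])

    addin(alpha)
    return result

F = [(frozenset("A"), frozenset("BC")),
     (frozenset("CD"), frozenset("E")),
     (frozenset("B"), frozenset("D")),
     (frozenset("E"), frozenset("A"))]
print(sorted(closure_fast({"A"}, F)))
print(sorted(closure_fast({"B"}, F)))
```

Each attribute is added to the result at most once, and each counter is decremented at most once per attribute on its left side, so the work is bounded by the total size of the dependencies rather than by repeated scans of F.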
7
...
, Rn be a decomposition of schema U
...
Show that
u ⊆ r1 ⋈ r2 ⋈ · · · ⋈ rn
7
...
2 is not a dependency-preserving
decomposition
...
19 Show that it is possible to ensure that a dependency-preserving decomposition into 3NF is a lossless-join decomposition by guaranteeing that at least one
schema contains a candidate key for the schema being decomposed
...
)
7
...
7
...
2
...
22 Give an example of a relation schema R and set F of functional dependencies
such that there are at least three distinct lossless-join decompositions of R into
BCNF
...
23 In designing a relational database, why might we choose a non-BCNF design?
7
...
2
...
25 Let a prime attribute be one that appears in at least one candidate key
...
Let A be
an attribute that is not in α, is not in β, and for which β → A holds
...
We can restate our definition of 3NF as follows:
A relation schema R is in 3NF with respect to a set F of functional dependencies
if there are no nonprime attributes A in R for which A is transitively dependent
on a key for R
...
7
...
We say that β is partially dependent on α
...
• It is not partially dependent on a candidate key
...
(Hint: Show that every partial dependency is a transitive dependency
...
27 Given the three goals of relational-database design, is there any reason to design
a database schema that is in 2NF, but is in no higher-order normal form? (See
Exercise 7
...
)
Bibliographical Notes
7
...
7
...
7
...
Explain problems that they may cause
...
In that paper, Codd also introduced functional dependencies, and
first, second, and third normal forms
...
Ullman [1988] is an
easily accessible source of proofs of soundness and completeness of Armstrong’s axioms
...
Maier [1983] discusses the theory of functional dependencies
...
[1986] discusses formal aspects of the concept of a
legal relation
...
The desirability of BCNF is discussed in
Bernstein et al
...
A polynomial-time algorithm for BCNF decomposition appears in Tsou and Fischer [1982], and can also be found in Ullman [1988]
...
[1979] gives the algorithm we used to find a lossless-join dependency-preserving decomposition into 3NF
...
[1979a]
...
Beeri et al
...
Our axiomatization is based on theirs
...
Maier [1983] presents the design theory of relational databases in detail
...
[1995] present a more theoretic coverage of many of the
dependencies and normal forms presented here
...
P A
III
...
As a result, researchers have developed several data models to
deal with these application domains
...
In addition, we study XML, a language
that can represent data that is less structured than that of the other data models
...
Inheritance,
object-identity, and encapsulation (information hiding), with methods to provide an
interface to objects, are among the key concepts of object-oriented programming that
have found applications in data modeling
...
While inheritance
and, to some extent, complex types are also present in the E-R model, encapsulation
and object-identity distinguish the object-oriented data model from the E-R model
...
This model provides the rich type system of
object-oriented databases, combined with relations as the basis for storage of data
...
The object-relational data model
provides a smooth migration path from relational databases, which is attractive to
relational database vendors
...
The XML language was initially designed as a way of adding markup information to text documents, but has become important because of its applications in data
exchange
...
Chapter 10 describes the XML language, and
then presents different ways of expressing queries on data represented in XML, and
transforming XML data from one form to another
...
Object−Based
Databases and XML
R T
8
...
Specifically, three widely used database systems— IBM
DB2, Oracle, and Microsoft SQL Server — are covered in Chapters 25, 26, and 27
...
Each of these chapters highlights unique features of each database system: tools,
SQL variations and extensions, and system architecture, including storage organization, query processing, concurrency control and recovery, and replication
...
Furthermore, since products are enhanced regularly, details of the product may change
...
Keep in mind that the chapters in this part use industrial rather than academic
terminology
...
Chapter 25
Oracle
Hakan Jakobsson
Oracle Corporation
When Oracle was founded in 1977 as Software Development Laboratories by Larry
Ellison, Bob Miner, and Ed Oates, there were no commercial relational database products
...
Since then, Oracle has held a leading position in the relational database market, but over the years its product and service offerings have grown beyond the relational database server
...
In addition to database-related servers and tools, the company also offers application software for enterprise resource planning and customer-relationship management, including areas such as financials, human resources, manufacturing, marketing, sales, and supply chain management
...
This chapter surveys a subset of the features, options, and functionality of Oracle
products
...
The feature set described here is based on the
first release of Oracle9i
...
1 Database Design and Querying Tools
Oracle provides a variety of tools for database design, querying, report generation
and data analysis, including OLAP
...
1
...
This is a suite of tools for various aspects of application development, including tools
for forms development, data modeling, reporting, and querying
...
10) for development modeling
...
The suite also supports XML for data exchange with other UML tools
...
It supports such modeling techniques as E-R diagrams, information
engineering, and object analysis and design
...
The metadata can then be used to generate forms and reports
...
The suite also contains application development tools for generating forms, reports, and tools for various aspects of Java and XML-based development
...
Oracle also has an application development tool for data warehousing, Oracle
Warehouse Builder
...
Oracle Warehouse Builder
supports both 3NF and star schemas and can also import designs from Oracle Designer
...
1
...
Oracle Discoverer is a Web-based, ad hoc query, reporting, analysis and Web publishing tool for end users and data analysts
...
Discoverer has wizards to help end
users visualize data as graphs
...
Discoverer’s ad hoc query
interface can generate SQL that takes advantage of this functionality and can provide end users with rich analytical functionality
...
Oracle Express Server is a multidimensional database server
...
ment
...
With the introduction of OLAP services in Oracle9i, Oracle is moving away from
supporting a separate storage engine and moving most of the calculations into SQL
...
The model also provides
a Java OLAP application programmer interface
...
• A common security model can be used for the analytical applications and the
data warehouse
...
• The relational database management system has a larger set of features and
functionality in many areas such as high availability, backup and recovery,
and third-party tool support
...
The main challenge with moving away from a separate multidimensional database
engine is to provide the same performance
...
Oracle has approached this problem in two
ways
...
• Oracle has extended materialized views to permit analytical functions, in particular grouping sets
...
25
...
In addition, Oracle supports a large number of
other language constructs, some of which conform with SQL:1999, while others are
Oracle-specific in syntax or functionality
...
2, including ranking, moving aggregation, cube,
and rollup
...
It is an Oracle-specific syntax for
a feature that Oracle has had since the 1980s
...
The upsert operation combines update and insert, and is useful for merging new data with old data in data warehousing
applications
...
Multitable inserts allow multiple
tables to be updated based on a single scan of new data
...
8
...
25
...
1 Object-Relational Features
Oracle has extensive support for object-relational constructs, including:
• Object types
...
• Collection types
...
• Object tables
...
• Table functions
...
Table functions in Oracle can be
nested
...
• Object views
...
They allow data to be accessed or viewed in an object-oriented style even if the data are really stored in a traditional relational format
...
These can be written in PL/SQL, Java, or C
...
These can be used in SQL statements in the
same way as built-in functions such as sum and count
...
These can be used to store and index XML documents
...
PL/SQL was Oracle’s
original language for stored procedures and it has syntax similar to that used in the
Ada language
...
Oracle provides a package to encapsulate related procedures, functions, and
Silberschatz−Korth−Sudarshan:
Database System
Concepts, Fourth Edition
III
...
Object−Oriented
Databases
25
...
Oracle supports SQLJ (SQL embedded in Java) and JDBC,
and provides a tool to generate Java class definitions corresponding to user-defined
database types
...
2
...
(See Section 6
...
) Triggers can be
written in PL/SQL or Java or as C callouts
...
Row triggers execute once for
every row that is affected (updated or deleted, for example) by the DML operation
...
In each case, the trigger can
be defined as either a before or after trigger, depending on whether it is to be invoked
before or after the DML operation is carried out
...
Depending on the view definition, it may not be possible for Oracle to translate a DML statement on a view to modifications of the underlying base
tables unambiguously
...
A user can create an instead of trigger on a view to specify manually what
operations on the base tables are to occur in response to the DML operation on the
view
...
Oracle also has triggers that execute on a variety of other events, like database
startup or shutdown, server error messages, user logon or logoff, and DDL statements
such as create, alter and drop statements
...
3 Storage and Indexing
In Oracle parlance, a database consists of information stored in files and is accessed
through an instance, which is a shared memory area and a set of processes that interact with the data in the files
...
3
...
Each
table space, in turn, consists of one or more physical structures called data files
...
Usually, an Oracle database will have the following table spaces:
• The system table space, which is always created
...
• Table spaces created to store user data
...
Usually, the decision about what other table spaces should be created is based on performance, availability, maintainability, and ease of administration
...
• Temporary table spaces
...
Temporary table spaces are allocated for sorting,
to make the space management operations involved in spilling to disk more
efficient
...
For
example, it is common to move data from a transactional system to a data warehouse
at regular intervals
...
These operations can be much faster than unloading the data from one database and then using a loader to insert it into the other
...
25
...
2 Segments
The space in a table space is divided into units, called segments, that each contain
data for a specific data structure
...
• Data segments
...
(Partitioning in Oracle is described in Section 25
...
10
...
Each index in a table space has its own index segment, except
for partitioned indices, which have one index segment per partition
...
These are segments used when a sort operation needs
to write data to disk or when data are inserted into a temporary table
...
These segments contain undo information so that an uncommitted transaction can be rolled back
...
5
...
5
...
Below the level of a segment, space is allocated at a level of granularity called an extent
...
A database block is the
lowest level of granularity at which Oracle performs disk I/O
...
Oracle provides storage parameters that allow for detailed control of how space is
allocated and managed, parameters such as:
• The size of a new extent that is to be allocated to provide room for rows that
are inserted into a table
...
• The percentage of space utilization at which a database block is considered full
and at which no more rows will be inserted into that block
...
)
25
...
3 Tables
A standard table in Oracle is heap organized; that is, the storage location of a row in
a table is not based on the values contained in the row, and is fixed when the row
is inserted
...
There are several features and variations
...
The nested table is not stored in line in the parent table, but is stored
in a separate table
...
The data are private to the
session and are automatically removed at the end of its duration
...
7)
...
In a cluster, rows from different tables are stored together in the same block on the basis of some common
columns
...
The primary key/foreign
key values are used to determine the storage location
...
As a tradeoff, a query involving only the department table may have
to involve a substantially larger number of blocks than if that table had been stored
on its own
...
Therefore, an index on the clustering column is mandatory
...
Here, Oracle computes the location of a row by applying a hash
function to the value for the cluster column
...
Since no index traversal is needed to access a row
according to its cluster column value, this organization can save significant amounts
of disk I/O
...
Both the hash cluster and regular cluster organization can be applied to a single
table
...
25
...
4 Index-Organized Tables
In an index organized table, records are stored in an Oracle B-tree index instead of in a
heap
...
While an entry in a regular index contains the key value and row-id of the
indexed row, an index-organized table replaces the row-id with the column values
for the remaining columns of the row
...
Consider looking up all the column values
of a row, given its primary key value
...
For an index-organized table, only the
index probe is necessary
...
In a heap table, each row has a fixed row-id
that does not change
...
Hence, a secondary index on an index-organized table contains not normal row-ids, but logical row-ids instead
...
The physical row-id is referred to as a “guess” since it could be incorrect if the row has been
moved
...
However, if a table is
highly volatile and a large percentage of the guesses are likely to be wrong, it can be
better to create the secondary index with only key values, since using an incorrect
guess may result in a wasted disk I/O
...
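The trade-off around guesses can be illustrated with a small Python sketch. It assumes that a logical row-id pairs a guessed block number with the primary key of the row, and that a stale guess is followed by a probe on the key. The class and function names are hypothetical and are not Oracle interfaces, and the I/O counting is deliberately simplified.

```python
class IndexOrganizedTable:
    """Toy model: rows live in numbered blocks and may move between blocks."""

    def __init__(self):
        self.blocks = {}    # block number -> {primary key: row}
        self.io_count = 0   # simulated block reads

    def insert(self, block_no, pk, row):
        self.blocks.setdefault(block_no, {})[pk] = row

    def move(self, pk, old_block, new_block):
        row = self.blocks[old_block].pop(pk)
        self.insert(new_block, pk, row)

    def read_block(self, block_no):
        self.io_count += 1
        return self.blocks.get(block_no, {})

    def probe_by_key(self, pk):
        # Stand-in for a primary-key B-tree traversal; counted as one read here.
        self.io_count += 1
        for rows in self.blocks.values():
            if pk in rows:
                return rows[pk]
        return None


def fetch(table, logical_rowid):
    """Try the guessed block first; fall back to a key probe if the guess is stale."""
    guess_block, pk = logical_rowid
    rows = table.read_block(guess_block)
    if pk in rows:
        return rows[pk]
    return table.probe_by_key(pk)  # the read of the guessed block was wasted
```

In this model a correct guess costs one read, while a stale guess costs the wasted read plus the probe, which is why key-only secondary indices can be preferable for highly volatile tables.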
3
...
The most commonly used type is a
B-tree index, created on one or multiple columns
...
) Index entries have the following format: For an index
on columns col1 , col2 , and col3 , each row in the table where at least one of the columns
has a nonnull value would result in the index entry
<col1><col2><col3><row-id>
where <col_i> denotes the value for column i and <row-id> is the row-id for
the row
...
For
example, if there are many repeated combinations of <col1><col2> values, the
representation of each distinct <col1><col2> prefix can be shared between the
entries that have that combination of values, rather than stored explicitly for each
such entry
...
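The idea of sharing a prefix between index entries can be sketched in Python as follows. This is only an illustration of the principle on sorted tuples; the function name is hypothetical and the layout is not Oracle's on-disk format.

```python
from itertools import groupby


def compress_prefix(entries, prefix_len):
    """Group sorted index entries so that each distinct prefix is stored once.

    entries: sorted list of tuples such as (col1, col2, col3, row_id).
    Returns a list of (prefix, [suffix, ...]) pairs.
    """
    return [
        (prefix, [entry[prefix_len:] for entry in group])
        for prefix, group in groupby(entries, key=lambda e: e[:prefix_len])
    ]
```

For entries that share many (col1, col2) combinations, each combination appears once in the result, followed by the remaining column value and row-id of every entry that has it.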
25
...
3
...
9
...
Bitmap indices
in Oracle use the same kind of B-tree structure to store the entries as a regular index
...
The number of such possible rows in a block depends
on how many rows can fit into a block, which is a function of the number of columns
in the table and their data types
...
If the column value of that row is that of the index entry, the bit is set
to 1
...
(It is possible that the row does not actually exist because a table
block may well have a smaller number of rows than the number that was calculated
as the maximum possible
...
The compression algorithm is a variation of a compression technique called Byte-Aligned Bitmap Compression (BBC)
...
If the distance between two ones is sufficiently large (that is, there is a sufficient
number of adjacent zeros between them), a run length of zeros, that is, the number of
zeros, is stored
...
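A much simplified Python sketch of storing run lengths of zeros is given below. It always stores a run length before each one, whereas the text indicates that a run length is used only when the run is long enough; it is not the actual BBC format, and the function names are hypothetical.

```python
def encode_zero_runs(bits):
    """Return (runs, trailing): the zero count before each 1, and the zeros after the last 1."""
    runs, zeros = [], 0
    for bit in bits:
        if bit:
            runs.append(zeros)
            zeros = 0
        else:
            zeros += 1
    return runs, zeros


def decode_zero_runs(runs, trailing):
    """Rebuild the original bit list from the run lengths."""
    bits = []
    for run in runs:
        bits.extend([0] * run + [1])
    bits.extend([0] * trailing)
    return bits
```

A sparse bitmap with few ones is then represented by a handful of small integers rather than by one bit per possible row.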
For example, for the condition
(col1 = 1 or col1 = 2) and col2 > 5 and col3 <> 10
Oracle would be able to calculate which rows match the condition by performing
Boolean operations on bitmaps from indices on the three columns
...
• For the index on col2 , all the bitmaps for key values > 5 would be merged in
an operation that corresponds to a logical or
...
Then, a Boolean and would be performed on the results from the first
two indices, followed by two Boolean minuses of the bitmaps for values 10
and null for col3
...
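The evaluation of the example condition can be sketched in Python, using an integer as a bitmap in which bit i stands for row i. The sketch assumes one uncompressed bitmap per distinct key value, with None standing for null; all names are hypothetical.

```python
from collections import defaultdict


def build_bitmap_index(values):
    """Map each distinct key (None stands for null) to an int bitmap of the rows holding it."""
    index = defaultdict(int)
    for row_id, key in enumerate(values):
        index[key] |= 1 << row_id
    return dict(index)


def matching_rows(col1, col2, col3):
    """Rows satisfying (col1 = 1 or col1 = 2) and col2 > 5 and col3 <> 10."""
    idx1 = build_bitmap_index(col1)
    idx2 = build_bitmap_index(col2)
    idx3 = build_bitmap_index(col3)

    # col1: logical or of the bitmaps for key values 1 and 2
    b1 = idx1.get(1, 0) | idx1.get(2, 0)

    # col2: logical or of all bitmaps whose key value is greater than 5
    b2 = 0
    for key, bitmap in idx2.items():
        if key is not None and key > 5:
            b2 |= bitmap

    # and of the first two results, then two minuses for col3 = 10 and col3 is null
    result = b1 & b2
    result &= ~idx3.get(10, 0)
    result &= ~idx3.get(None, 0)

    return [i for i in range(len(col1)) if (result >> i) & 1]
```

The minus for the null bitmap is needed because a row whose col3 is null does not satisfy col3 <> 10 under SQL semantics.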
The ability to use the Boolean operations to combine multiple indices is not limited to bitmap indices
...
As a rule of thumb, bitmap indices tend to be more space efficient than regular
B-tree indices if the number of distinct key values is less than half the number of
rows in the table
...
For columns with a very small number of distinct values (for example, columns referring to properties such as country, state, gender, marital status,
and various status flags), a bitmap index might require only a small fraction of the
space of a regular B-tree index
...
25
...
7 Function-Based Indices
In addition to creating indices on one or multiple columns of a table, Oracle allows
indices to be created on expressions that involve one or more columns, such as col1 +
col2 * 5
...
In order to find all rows
with name “van Gogh” efficiently, the condition
upper(name) = 'VAN GOGH'
would be used in the where clause of the query
...
A function-based index can be created as either a bitmap or a
B-tree index
...
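A function-based index can be pictured as a lookup structure keyed on the value of the expression rather than on the stored column value. The Python sketch below illustrates this for upper(name); the dictionary stands in for the index structure, and the names are hypothetical.

```python
from collections import defaultdict


def build_function_index(rows, column, func):
    """Index rows by func(row[column]); returns {expression value: [row ids]}."""
    index = defaultdict(list)
    for row_id, row in enumerate(rows):
        index[func(row[column])].append(row_id)
    return index


rows = [{"name": "van Gogh"}, {"name": "Van Gogh"}, {"name": "Monet"}]
name_upper_index = build_function_index(rows, "name", str.upper)
# A condition such as upper(name) = 'VAN GOGH' becomes a single lookup.
van_gogh_rows = name_upper_index["VAN GOGH"]
```

Because the index stores the already computed expression value, the lookup does not need to evaluate the function for every row of the table.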
3
...
Oracle supports bitmap join indices primarily for use
with star schemas (see Section 22
...
2)
...
How the rows in
the fact and dimension tables correspond is based on a join condition that is specified
when the index is created, and becomes part of the index metadata
...
processed, the optimizer will look for the same join condition in the where clause of
the query in order to determine if the join index is applicable
...
In all cases, the join conditions between the fact
table on which the index is built and the dimension tables must refer to unique keys
in the dimension tables; that is, an indexed row in the fact table must correspond to
a unique row in each of the dimension tables
...
For example, consider a schema with a fact table for sales, and dimension
tables for customers, products, and time
...
If a multicolumn bitmap join index exists where
the key columns are the constrained dimension table columns (zip code, product category and time), Oracle can use the join index to find rows in the fact table that match
the constraining conditions
...
If the query contains
conditions on some columns of the fact table, indices on those columns could be included in the same access path, even if they were regular B-tree indices or domain
indices (domain indices are described below in Section 25
...
9)
...
3
...
This extensibility feature of the Oracle server allows software vendors to develop
so-called cartridges with functionality for specific application domains, such as text,
spatial data, and images, with indexing functionality beyond that provided by the
standard Oracle index types
...
A domain index must be registered in the data dictionary, together with the operators it supports
...
Oracle allows cost functions to be registered with the operators so that the optimizer can compare the cost of using the domain index to those of
other access paths
...
Once this operator has been registered, the domain index will be considered
as an access path for a query like
select *
from employees
where contains(resume, 'LINUX')
where resume is a text column in the employees table
...
A domain index can be combined with other (bitmap or B-tree) indices in the same
access path by converting between the row-id and bitmap representation and using
Boolean bitmap operations
...
3
...
The
ability to partition a table or index has advantages in many areas
...
• Loading operations in a data warehousing environment are less intrusive:
data can be added to a partition, and then the partition added to a table, which
is an instantaneous operation
...
• Query performance benefits substantially, since the optimizer can recognize
that only a subset of the partitions of a table need to be accessed in order to
resolve a query (partition pruning)
...
Each row in a partitioned table is associated with a specific partition
...
There are several ways to map column values to partitions, giving
rise to several types of partitioning, each with different characteristics: range, hash,
composite, and list partitioning
...
3
...
1 Range Partitioning
In range partitioning, the partitioning criteria are ranges of values
...
In a data warehouse
where data are loaded from the transactional systems at regular intervals, range partitioning can be used to implement a rolling window of historical data efficiently
...
The system actually loads the data into a separate table with the same
column definition as the partitioned table
...
After that, the system can make the separate table a
new partition of the partitioned table, by a simple change to the metadata in the data
dictionary — a nearly instantaneous operation
...
Up until the metadata change, the loading process does not affect the existing
data in the partitioned table in any way
...
Old data can be removed from a table by
simply dropping its partition; this operation does not affect the other partitions
...
If date range
partitioning is used, the query optimizer can restrict the data access to those partitions that are relevant to the query, and avoid a scan of the entire table
...
3
...
2 Hash Partitioning
In hash partitioning, a hash function maps rows to partitions according to the values
in the partitioning columns
...
25
...
10
...
This type of partitioning combines the advantages of range partitioning and hash partitioning
...
3
...
4 List Partitioning
In list partitioning, the values associated with a particular partition are stated in a
list
...
For instance, a table with a state column can be
implicitly partitioned by geographical region if each partition list has the states that
belong in the same region
...
3
...
5
...
In addition, Oracle maintains
the materialized result, updating it when the tables that were referenced in the query
are updated
...
In data warehousing, a common usage for materialized views is to summarize
data
...
” Precomputing the result, or some partial result, of such a
query can speed up query processing dramatically compared to computing it from
scratch by aggregating all detail-level sales records
...
The rewrite consists of changing the query to
use the materialized view instead of the original tables in the query
...
For example, if a query needs sales by quarter, the rewrite can take
advantage of a view that materializes sales by month, by adding additional aggregation to roll up the months to quarters
...
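The extra aggregation that such a rewrite adds can be sketched in Python. The sketch assumes that the materialized view holds one total per (year, month); the function name and data layout are hypothetical.

```python
def rollup_to_quarters(monthly_sales):
    """Aggregate {(year, month): total} into {(year, quarter): total}."""
    quarterly = {}
    for (year, month), total in monthly_sales.items():
        key = (year, (month - 1) // 3 + 1)  # months 1-3 -> quarter 1, and so on
        quarterly[key] = quarterly.get(key, 0) + total
    return quarterly
```

The rollup touches at most twelve rows per year of monthly totals, instead of every detail-level sales record.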
For example,
for a time dimension table in a star schema, Oracle can define a dimension metadata
object to specify how days roll up to months, months to quarters, quarters to years,
and so forth
...
The query rewrite logic looks at
these relationships since they allow a materialized view to be used for wider classes
of queries
...
When there are changes to the data in the tables referenced in the query that defines a materialized view, the materialized view must be refreshed to reflect those
changes
...
In a full refresh, Oracle recomputes the materialized view from scratch,
which may be the best option if the underlying tables have had significant changes,
for example, changes due to a bulk load
...
Incremental refresh may be better if the number of rows that
were changed is low
...
A materialized view is similar to an index in the sense that, while it can improve
query performance, it uses up space, and creating and maintaining it consumes resources
...
25
...
Some of the more important ones are described here briefly
...
4
...
The query processor scans the entire table by getting information about the blocks that make up the table from the extent map, and
scanning those blocks
...
The processor creates a start and/or stop key from conditions
in the query and uses it to scan to a relevant part of the index
...
scan would be followed by a table access by index row-id
...
• Index fast full scan
...
If the index contains all the columns that are needed
in the query, and there are no good start/stop keys that would significantly
reduce that portion of the index that would be scanned in a regular index scan,
this method may be the fastest way to access the data
...
However, unlike a
regular full scan, which traverses the index leaf blocks in order, a fast full scan
does not guarantee that the output preserves the sort order of the index
...
If a query needs only a small subset of the columns of a wide
table, but no single index contains all those columns, the processor can use an
index join to generate the relevant information without accessing the table, by
joining several indices that together contain the needed columns
...
• Cluster and hash cluster access
...
Oracle has several ways to combine information from multiple indices in a single
access path
...
The functionality includes the
ability to perform Boolean operations and, or, and minus on bitmaps representing
row-ids
...
In addition, for many queries involving count(*) on selections
on a table, the result can be computed by just counting the bits that are set in the
bitmap generated by applying the where clause conditions, without accessing the
table
...
(An antijoin in Oracle returns rows from the left-hand
side input that do not match any row in the right-hand side input; this operation is
called anti-semijoin in other literature
...
25
...
2 Optimization
In Chapter 14, we discussed the general topic of query optimization
...
25
...
2
...
Most of the techniques relating to
query transformations and rewrites take place before access path selection, but Oracle also supports several types of cost-based query transformations that generate a
complete plan and return a cost estimate for both a standard version of the query and
one that has been subjected to advanced transformations
...
Some of the major types of transformations and rewrites supported by Oracle are
as follows:
• View merging
...
This transformation is not applicable to all views
...
Oracle offers this feature for certain classes of views
that are not subject to regular view merging because they have a group by or
select distinct in the view definition
...
• Subquery flattening
...
• Materialized view rewrite
...
If some part of the query can be
matched up with an existing materialized view, Oracle can replace that part
of the query with a reference to the table in which the view is materialized
...
If multiple materialized views are applicable, Oracle picks the one that gives the greatest advantage in reducing the amount of
data that has to be processed
...
Oracle then decides
whether to execute the rewritten or the original version of the query on the
basis of the cost estimates
...
Oracle supports a technique for evaluating queries against
star schemas, known as the star transformation
...
fk_i in
(select pk from dimension_table_i
where
One such subquery is generated for each dimension that has some constraining predicate
...
4), the
subquery will contain a join of the applicable tables that make up the dimension
...
Oracle uses the values that are returned from each subquery to probe an
index on the corresponding fact table column, getting a bitmap as a result
...
The resultant bitmap can be used to access matching fact table
rows
...
Both the decision on whether the use of a subquery for a particular dimension is cost-effective, and the decision on whether the rewritten query is better
than the original, are based on the optimizer’s cost estimates
...
4
...
2 Access Path Selection
Oracle has a cost-based optimizer that determines join order, join methods, and access paths
...
In estimating the cost of an operation, the optimizer relies on statistics that have
been computed for schema objects such as tables and indices
...
For column statistics, Oracle supports height-balanced and
frequency histograms
...
Oracle
also tracks which columns are used in where clauses of queries, which makes them potential candidates for histogram creation
...
Oracle uses sampling to speed up the process of gathering the new statistics and
automatically chooses the smallest adequate sample percentage
...
Oracle uses both CPU cost and disk I/Os in the optimizer cost model
...
Oracle’s package for gathering optimizer statistics
computes these measures
...
Oracle addresses this issue in several ways
...
It then changes the order of the tables and determines the best join
methods and access paths for the new join order and so forth, while keeping the best
plan that has been found so far
...
Since this cutoff depends on the cost estimate for the best
plan found so far, finding a good plan early is important so that the optimization can
be stopped after a smaller number of join orders, resulting in better response time
...
For each join order that is considered, the optimizer may make additional passes
over the tables to decide join methods and access paths
...
For instance, a specific
combination of join methods and access paths may eliminate the need to perform an
order by sort
...
25
...
2
...
For example, if a table is partitioned
by date range and the query is constrained to data between two specific dates, the
optimizer determines which partitions contain data between the specified dates and
ensures that only those partitions are accessed
...
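Partition pruning for a date-range-partitioned table can be sketched in Python as a search over the partition bounds. The sketch assumes that each partition is described by an exclusive upper bound and that the bounds are sorted; the function name is hypothetical.

```python
from bisect import bisect_right
from datetime import date


def prune(upper_bounds, lo, hi):
    """Return the indices of partitions that can hold values in [lo, hi].

    upper_bounds[i] is the exclusive upper bound of partition i, in ascending order.
    """
    first = bisect_right(upper_bounds, lo)
    last = min(bisect_right(upper_bounds, hi), len(upper_bounds) - 1)
    return list(range(first, last + 1))


# Four quarterly partitions for 2001 (illustrative data).
bounds = [date(2001, 4, 1), date(2001, 7, 1), date(2001, 10, 1), date(2002, 1, 1)]
needed = prune(bounds, date(2001, 5, 15), date(2001, 8, 1))
```

Only the partitions returned by prune need to be scanned; the others cannot contain qualifying rows.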
25
...
3 Parallel Execution
Oracle allows the execution of a single SQL statement to be parallelized by dividing
the work between multiple processes on a multiprocessor computer
...
Representative examples are decision support
queries that need to process large amounts of data, data loads in a data warehouse,
and index creation or rebuild
...
Depending on the type of
operation, Oracle has several ways to split up the work
...
For some operations, such as a full table scan,
each such slice can be a range of blocks: each parallel query process scans the table
from the block at the start of the range to the block at the end
...
For inserts
into a nonpartitioned table, the data to be inserted are randomly divided across the
parallel processes
...
One way is to divide one of the
inputs to the join between parallel processes and let each process join its slice with
the other input to the join; this is the asymmetric fragment-and-replicate method
of Section 20
...
2
...
For example, if a large table is joined to a small one by a hash
join, Oracle divides the large table among the processes and broadcasts a copy of the
small table to each process, which then joins its slice with the smaller table
...
tables are large, it would be prohibitively expensive to broadcast one of them to all
processes
...
5
...
1)
...
Which one of these processes gets the row is determined by a hash function
on the values of the join column
...
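Routing rows to processes by a hash of the join column can be simulated sequentially in Python. The sketch below uses crc32 purely as a stand-in hash function and runs the per-process joins in a loop; the names are hypothetical and nothing here reflects Oracle's internal hash function.

```python
import zlib


def route(value, num_processes):
    """Pick the process for a join-column value (crc32 is an illustrative choice)."""
    return zlib.crc32(repr(value).encode("utf-8")) % num_processes


def partitioned_hash_join(left, right, num_processes):
    """left and right are lists of (join_key, payload) pairs."""
    left_slices = [[] for _ in range(num_processes)]
    right_slices = [[] for _ in range(num_processes)]
    for row in left:
        left_slices[route(row[0], num_processes)].append(row)
    for row in right:
        right_slices[route(row[0], num_processes)].append(row)

    result = []
    # Each (left slice, right slice) pair would be joined by its own process.
    for l_slice, r_slice in zip(left_slices, right_slices):
        table = {}
        for key, payload in r_slice:
            table.setdefault(key, []).append(payload)
        for key, payload in l_slice:
            for other in table.get(key, []):
                result.append((key, payload, other))
    return result
```

Rows with equal join values always reach the same process, so the union of the local join results equals the result of the full join.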
Oracle parallelizes sort operations by value ranges of the column on which the
sort is performed (that is, using the range-partitioning sort of Section 20
...
1)
...
To maximize the benefits of parallelism, the rows need to be divided
as evenly as possible among the parallel processes, and the problem of determining
range boundaries that generate a good distribution then arises
...
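One way to choose range boundaries is to sample the input and take evenly spaced values from the sorted sample. The Python sketch below assumes this sampling approach and simulates the parallel processes sequentially; the function name and parameters are hypothetical.

```python
import random
from bisect import bisect_right


def range_partition_sort(values, num_processes, sample_size=100, seed=0):
    """Sort by splitting values into ranges, sorting each range, and concatenating."""
    if not values:
        return []
    rng = random.Random(seed)
    sample = sorted(rng.sample(values, min(sample_size, len(values))))
    # num_processes - 1 boundaries taken at evenly spaced positions in the sample
    boundaries = [
        sample[len(sample) * i // num_processes] for i in range(1, num_processes)
    ]
    slices = [[] for _ in range(num_processes)]
    for v in values:
        slices[bisect_right(boundaries, v)].append(v)
    # Each slice would be sorted by its own process; the ranges do not overlap,
    # so concatenating the sorted slices yields a globally sorted result.
    return [v for s in slices for v in sorted(s)]
```

The quality of the sample determines how evenly the work is spread, but the result is correctly sorted for any choice of boundaries.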
25
...
3
...
The coordinator is
responsible for assigning work to the parallel servers and for collecting and returning
data to the user process that issued the statement
...
The degree of parallelism is determined by the optimizer, but
can be throttled back dynamically if the load on the system increases
...
When a sequence of
operations is needed to process a statement, the producer set of servers performs the
first operation and passes the resulting data to the consumer set
...
If a subsequent operation is needed, like another sort,
the roles of the two sets of servers switch
...
Hence, a sequence of operations proceeds by
passing data back and forth between two sets of servers that alternate in their roles as
producers and consumers
...
For shared nothing systems, the cost of accessing data on disk is not uniform
among processes
...
Oracle uses knowledge about device-to-node and device-to-process affinity (that is, the ability to access devices directly) when distributing
work among parallel execution servers
...
5 Concurrency Control and Recovery
Oracle supports concurrency control and recovery techniques that provide a number
of useful features
...
5
...
Read-only queries are given a read-consistent
snapshot, which is a view of the database as it existed at a specific point in time,
containing all updates that were committed by that point in time, and not containing
any updates that were not committed at that point in time
...
(This is basically the multiversion two-phase locking protocol described in
Section 16
...
2
...
The SCN essentially acts as a timestamp, where the time is measured in terms
of transaction commits instead of wall-clock time
...
Hence, the data in the block cannot be included in a consistent
view of the database as it existed at the time of the query’s SCN
...
Oracle retrieves that version of the
data from the rollback segment (rollback segments are described in Section 25
...
2)
...
Should the block with the desired SCN no
longer exist in the rollback segment, the query will return an error
...
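The read-consistency behavior described above can be modeled with a short Python sketch. It assumes that each block carries the SCN of its latest change and that older versions are kept in an undo list standing in for the rollback segment; the class names and the exception are hypothetical.

```python
class SnapshotUnavailable(Exception):
    """Raised when the version needed by a query is no longer in the undo list."""


class Block:
    def __init__(self, value, scn):
        self.value, self.scn = value, scn
        self.undo = []  # older (scn, value) versions, oldest first

    def write(self, value, commit_scn, keep_undo=True):
        if keep_undo:
            self.undo.append((self.scn, self.value))
        else:
            self.undo.clear()  # simulate undo space having been reused
        self.value, self.scn = value, commit_scn

    def read(self, query_scn):
        """Return the newest version whose SCN is not higher than the query's SCN."""
        if self.scn <= query_scn:
            return self.value
        for scn, value in reversed(self.undo):
            if scn <= query_scn:
                return value
        raise SnapshotUnavailable(f"no version at or before SCN {query_scn}")
```

A reader never waits for a writer in this model: it simply selects an older version, or fails if that version has been discarded.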
In the Oracle concurrency model, read operations do not block write operations
and write operations do not block read operations, a property that allows a high
degree of concurrency
...
This kind of scenario is often problematic for database systems where queries
use read locks, since the query may either fail to acquire them or lock large amounts
of data for a long time, thereby preventing transactional activity against that data
and reducing concurrency
...
)
Oracle’s concurrency model is used as a basis for the Flashback Query feature
...
perform queries on the data that existed at that point in time (provided that the data
still exist in the rollback segment)
...
However, recovery of a
very large database can be very costly, especially if the goal is just to retrieve some
data item that had been inadvertently deleted by a user
...
Oracle supports two ANSI/ISO isolation levels, “read committed” and “serializable”
...
The two isolation
levels correspond to whether statement-level or transaction-level read consistency is
used
...
Statement-level
read consistency is the default
...
Updates to different rows do not conflict
...
Locks are held for the duration of a transaction
...
These locks
prevent one user from, say, dropping a table while another user has an uncommitted
transaction that is accessing that table
...
Oracle detects deadlocks automatically and resolves them by rolling back one of
the transactions involved in the deadlock
...
When Oracle invokes an autonomous transaction, it generates a new transaction in a separate context
...
Oracle supports multiple levels of nesting of autonomous transactions
...
5
...
In addition to the
data files that contain tables and indices, there are control files, redo logs, archived
redo logs, and rollback segments
...
Oracle records any transactional modification of a database buffer in the redo log,
which consists of two or more files
...
It logs
changes to indices and rollback segments as well as changes to table data
...
The rollback segment contains information about older versions of the data (that
is, undo information)
...
To be able to recover from a storage failure, the data files and control files should be
backed up regularly
...
Oracle supports hot backups,
that is, backups performed on an online database that is subject to transactional activity
...
First, Oracle rolls forward
by applying the (archived) redo logs to the backup
...
Second, Oracle rolls back uncommitted transactions by using the rollback segment
...
Recovery on a database that has been subject to heavy transactional activity since
the last backup can be time consuming
...
Oracle provides
a GUI tool, Recovery Manager, which automates most tasks associated with backup
and recovery
...
5
...
(This feature is the same as remote backups, described in Section 17
...
) A standby
database is a copy of the regular database that is installed on a separate system
...
Oracle
keeps the standby database up to date by constantly applying archived redo logs
that are shipped from the primary database
...
25
...
Oracle can be configured
so that the operating system process is dedicated exclusively to the statement it is
processing or so that the process can be shared among multiple statements
...
We shall discuss the dedicated
server architecture first and the multithreaded server architecture later
...
6
...
The system code areas are the parts of the memory where the Oracle server code
resides
...
© The McGraw−Hill
Companies, 2001
8
...
tion
...
It also contains memory for sorting and
hashing operations that may occur during the evaluation of the statement
...
It is made
up of several major structures, including:
• The buffer cache
...
A least recently used replacement policy is used except for blocks accessed during a full table scan
...
Some Oracle operations bypass the buffer cache and read data directly from disk
...
This buffer contains the part of the redo log that has not
yet been written to disk
...
Oracle seeks to maximize the number of users that can
use the database concurrently by minimizing the amount of memory that is
needed for each user
...
When multiple users execute the same SQL statement, they can
share most data structures that represent the execution plan for the statement
...
The sharable parts of the data structures representing the SQL statement are
stored in the shared pool, including the text of the statement
...
The determination of whether an SQL statement is the same as one existing in the shared pool is based on exact text
matching and the setting of certain session parameters
...
The shared pool also contains caches for dictionary
information and various control structures
...
6
...
Some of these processes are optional, and in
some cases, multiple processes of the same type can be used for performance reasons
...
When a buffer is removed from the buffer cache, it must be
written back to disk if it has been modified since it entered the cache
...
...
• Log writer
...
It also writes a commit record to disk whenever a transaction commits
...
The checkpoint process updates the headers of the data file when
a checkpoint occurs
...
This process performs crash recovery if needed
...
• Process monitor
...
• Recoverer
...
• Archiver
...
25
...
3 Multithreaded Server
The multithreaded server configuration increases the number of users that a given
number of server processes can support by sharing server processes among statements
...
In doing so, it uses a request queue and a response queue in the
SGA
...
As a server process completes a request, it
puts the result in the response queue to be picked up by the dispatcher and
returned to the user
...
Instead, it stores the session-specific data in
the SGA
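The queue-based flow can be simulated as follows. This is a single-threaded sketch under stated assumptions, not Oracle code: `run_shared_servers`, the round-robin choice of server, and the `session_data` dictionary (standing in for session state kept in the SGA) are all hypothetical.

```python
from collections import deque


def run_shared_servers(requests, num_servers):
    request_queue = deque(requests)   # filled by the dispatcher
    response_queue = deque()          # read back by the dispatcher
    session_data = {}                 # shared area, not owned by any server
    next_server = 0
    while request_queue:
        session_id, statement = request_queue.popleft()
        # Any free server may pick up the request; round-robin is assumed here.
        server_id = next_server
        next_server = (next_server + 1) % num_servers
        # Session state lives in shared memory, so it survives a server switch.
        executed = session_data.setdefault(session_id, [])
        executed.append(statement)
        response_queue.append((session_id, server_id, len(executed)))
    return list(response_queue)


responses = run_shared_servers(
    [("s1", "q1"), ("s1", "q2"), ("s2", "q3")], num_servers=2)
```

Session `s1` is served by two different server processes, yet its statement count is continuous because the session-specific data is not stored inside either server.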
...
6
...
(Recall that, in Oracle terminology, an instance
is the combination of background processes and memory areas
...
This feature was called Oracle Parallel Server in earlier versions of Oracle
...
Silberschatz−Korth−Sudarshan:
Database System
Concepts, Fourth Edition
III
...
Object−Oriented
Databases
25
...
Oracle further optimizes the use of the hardware through features
such as affinity and partitionwise joins
...
If
one node fails, the remaining ones are still available to the application accessing the
database
...
Having multiple instances run against the same database gives rise to some technical issues that do not exist on a single instance
...
To address this, Oracle supports a distributed lock manager and the cache fusion feature, which allows data blocks to flow directly among caches on different instances
using the interconnect, without being written to disk
...
7 Replication, Distribution, and External Data
Oracle provides support for replication and distributed transactions with two-phase
commit
...
7
...
(See Section 19
...
1 for an introduction
to replication
...
(The term “snapshot” in this context should not be confused with the concept of a read-consistent snapshot in the context of the concurrency
model
...
Oracle supports two types
of snapshots: read-only and updatable
...
However, read-only
snapshots allow for a wider range of snapshot definitions
...
Oracle also supports multiple master sites for the same data, where all master
sites act as peers
...
The updates can be propagated either
asynchronously or synchronously
...
Since the same data could be subject to conflicting modifications at different sites, conflict resolution based on some business rules might be
needed
...
With synchronous replication, an update to one master site is propagated immediately to all other sites
...
25
...
2 Distributed Databases
Oracle supports queries and transactions spanning multiple databases on different
systems
...
Oracle has built-in capability to optimize a query that includes tables at different sites, retrieve the relevant data, and return the result as if it had been a normal,
local query
...
25
...
3 External Data Sources
Oracle has several mechanisms for supporting external data sources
...
25
...
3
...
It supports a variety of data formats and it can
perform various filtering operations on the data being loaded
...
7
...
2 External Tables
Oracle allows external data sources, such as flat files, to be referenced in the from
clause of a query as if they were regular tables
...
An access driver is also needed to access the external data
...
The external table feature is primarily intended for extraction, transformation, and
loading (ETL) operations in a data warehousing environment
...
from < external table >
where
...
Since these
operations can be expressed either in native SQL or in functions written in PL/SQL or
Java, the external table feature provides a very powerful mechanism for expressing
all kinds of data transformation and filtering operations
...
25
...
25
...
1 Oracle Enterprise Manager
Oracle Enterprise Manager is Oracle’s main tool for database systems management
...
It also provides performance monitoring and tools to help
an administrator tune application SQL, access paths, and instance and data storage
parameters
...
25
...
2 Database Resource Management
A database administrator needs to be able to control how the processing power of
the hardware is divided among users or groups of users
...
It is also important to be able to prevent a user from inadvertently submitting an
extremely expensive ad hoc query that will unduly delay other users
...
For example, a group of high-priority, interactive users may be guaranteed at
least 60 percent of the CPU
...
A very low-priority group could be assigned 0 percent, which
would mean that queries issued by this group would run only when there are spare
CPU cycles available
...
The database administrator can also set time limits for how
long an SQL statement is allowed to run for each group
...
The resource manager can also
limit the number of user sessions that can be active concurrently for each resource
consumer group
...
oracle
...
oracle
...
Extensible indexing in Oracle8i is described by Srinivasan et al
...
[2000a] describe index organized tables in Oracle8i
...
[2000] describe XML support in Oracle8i
...
[1998] describe materialized
views in Oracle
...
The Oracle Parallel Server is described by Bamford et al
...
Recovery in Oracle
is described by Joshi et al
...
[2001]
...
Chapter 9
Object-Relational Databases
Persistent programming languages add persistence and other database features to existing programming languages by using an existing object-oriented type system
...
Relational
query languages, in particular SQL, need to be correspondingly extended to deal
with the richer type system
...
Object-relational database systems (that is, database systems based on
the object-relational model) provide a convenient migration path for users of relational
databases who wish to use object-oriented features
...
We then show how to extend SQL by adding a variety of object-relational
features
...
Finally, we discuss differences between persistent programming languages and
object-relational systems, and mention criteria for choosing between them
...
1 Nested Relations
In Chapter 7, we defined first normal form (1NF), which requires that all attributes
have atomic domains
...
The assumption of 1NF is a natural one in the bank examples we have considered
...
For example, rather
than view a database as a set of records, users of certain applications view it as a set of
objects (or entities)
...
title       author-set       publisher (name, branch)
Compilers   {Smith, Jones}   (McGraw-Hill, New York)
Networks    {Jones, Frick}   (Oxford, London)
Figure 9
...
We shall see that a simple, easy-to-use interface requires a one-to-one correspondence
between the user’s intuitive notion of an object and the database system’s notion of
a data item
...
Thus, the value of a tuple on an attribute may be a relation, and relations may be contained within relations
...
If we view a tuple of a nested relation as a data item, we have a one-to-one correspondence between
data items and objects in the user’s view of the database
...
Suppose we store for
each book the following information:
• Book title
• Set of authors
• Publisher
• Set of keywords
We can see that, if we define a relation for the preceding information, several domains
will be nonatomic
...
A book may have a set of authors
...
Thus, we are interested
in a subpart of the domain element “set of authors
...
If we store a set of keywords for a book, we expect to be able to
retrieve all books whose keywords include one or more keywords
...
• Publisher
...
However, we may view publisher as consisting of the subfields name
and branch
...
Figure 9
...
The books relation can be represented
in 1NF, as in Figure 9
...
Since we must have atomic domains in 1NF, yet want access to individual authors and to individual keywords, we need one tuple for each
(keyword, author) pair
...
title       author   pub-name   pub-branch   keyword
Compilers   ...      ...        New York     parsing
Compilers   ...      ...        New York     parsing
Compilers   ...      ...        New York     analysis
Compilers   ...      ...        New York     analysis
Networks    ...      ...        London       Internet
Networks    ...      ...        London       Internet
Networks    ...      ...        London       Web
Networks    ...      ...        London       Web
Figure 9
...
flat-books, a 1NF version of non-1NF relation books
...
2 disappears if we
assume that the following multivalued dependencies hold:
• title →→ author
• title →→ keyword
• title → pub-name, pub-branch
Then, we can decompose the relation into 4NF using the schemas:
• authors(title, author)
• keywords(title, keyword)
• books4(title, pub-name, pub-branch)
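The decomposition can be checked on the sample data with a short Python sketch. The set-based representation and the variable names are assumptions made for illustration; the data values are those of the books example. Projecting flat-books onto the three schemas and joining the projections on title should give back exactly the original tuples.

```python
from itertools import product

# Nested view of the sample data: title, authors, publisher, keywords.
books = [
    ("Compilers", ["Smith", "Jones"], ("McGraw-Hill", "New York"),
     ["parsing", "analysis"]),
    ("Networks", ["Jones", "Frick"], ("Oxford", "London"),
     ["Internet", "Web"]),
]

# 1NF version: one tuple per (author, keyword) pair of each book.
flat_books = {
    (title, a, pub[0], pub[1], k)
    for title, authors, pub, keywords in books
    for a, k in product(authors, keywords)
}

# Projections onto the 4NF schemas.
authors = {(t, a) for t, a, _, _, _ in flat_books}
keywords = {(t, k) for t, _, _, _, k in flat_books}
books4 = {(t, pn, pb) for t, _, pn, pb, _ in flat_books}

# Natural join of the three projections on title.
rejoined = {
    (t, a, pn, pb, k)
    for (t, a) in authors
    for (t2, k) in keywords if t2 == t
    for (t3, pn, pb) in books4 if t3 == t
}
```

The eight flat tuples shrink to four author tuples, four keyword tuples, and two publisher tuples, and the join loses no information.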
Figure 9
...
2 onto the preceding decomposition
...
The 4NF design would
require users to include joins in their queries, thereby complicating interaction with
the system
...
In such a view,
however, we lose the one-to-one correspondence between tuples and books
...
2 Complex Types
Nested relations are just one example of extensions to the basic relational model;
other nonatomic data types, such as nested records, have also proved useful
...
With complex type systems and object orientation, we can represent E-R model concepts, such as identity of entities, multivalued attributes, and
generalization and specialization directly, without a complex translation to the relational model
...
9
...
3
4NF version of the relation flat-books of Figure 9
...
In this section, we describe extensions to SQL to allow complex types, including nested relations, and object-oriented features
...
9
...
1 Collection and Large Object Types
Consider this fragment of code
...
keyword-set setof(varchar(20))
...
Sets are an instance of collection types
...
The following attribute definitions illustrate the declaration of an
array:
author-array varchar(20) array [10]
9
...
We can access elements of an
array by specifying the array index, for example author-array[1]
...
SQL:1999 does not support unordered sets or multisets,
although they may appear in future versions of SQL
...
SQL:1999 therefore provides new large-object data types
for character data (clob) and binary data (blob)
...
For example, we may declare attributes
book-review clob(10KB)
image blob(10MB)
movie blob(2GB)
Large objects are typically used in external applications, and it makes little sense to
retrieve them in their entirety by SQL
...
For instance, JDBC permits the programmer to fetch a large object
in small pieces, rather than all at once, much like fetching data from an operating
system file
...
2
...
The second statement defines a structured type Book, which contains
a title, an author-array, which is an array of authors, a publication date, a publisher
(of type Publisher), and a set of keywords
...
) The types
illustrated above are called structured types in SQL:1999
...
The Oracle 8 database system supports nested relations, but uses a syntax different from that in this
chapter
...
9
...
The table is similar
to the nested relation books in Figure 9
...
The array permits us to record the
order of author names
...
Unnamed row types can also be used in SQL:1999 to define composite attributes
...
We can of course create tables without creating an intermediate type for the table
...
2
A structured type can have methods defined on it
...
salary = self
...
salary * percent) / 100;
end
The variable self refers to the structured type instance on which the method is invoked
...
6
...
In Oracle PL/SQL, given a table t, t%rowtype denotes the type of the rows of the table
...
a%type denotes the type of attribute a of table t
...
9
...
3 Creation of Values of Complex Types
In SQL:1999 constructor functions are used to create values of structured types
...
For instance, we could declare a constructor for the type Publisher
like this:
create function Publisher (n varchar(20), b varchar(20))
returns Publisher
begin
set name = n;
set branch = b;
end
We can then use Publisher('McGraw-Hill', 'New York') to create a value of the type
Publisher
...
6; the names of such functions must be different from the name of any structured type
...
That is, the value the constructor creates
has no object identity
...
By default every structured type has a constructor with no arguments, which sets
the attributes to their default values
...
There can be more than one constructor for the same structured type; although
they have the same name, they must be distinguishable by the number of arguments
and types of their arguments
...
For instance,
if we declare an attribute publisher1 as a row type (as in Section 9
...
2), we can construct this value for it:
('McGraw-Hill', 'New York')
without using a constructor
...
We can create multiset values just like
set values, by replacing set by multiset
...
Although sets and multisets are not part of the SQL:1999 standard, the other constructs shown in this
section are part of the standard
...
Here we have created a value for the attribute Publisher by invoking a constructor
function for Publisher with appropriate arguments
...
3 Inheritance
Inheritance can be at the level of types, or at the level of tables
...
9
...
1 Type Inheritance
Suppose that we have the following type definition for people:
create type Person
(name varchar(20),
address varchar(20))
We may want to store extra information in the database about people who are students, and about people who are teachers
...
Student and Teacher are said to be subtypes of Person, and Person is a supertype of
Student, as well as of Teacher
...
However, a subtype can redefine the effect of a method by declaring the method
again, using overriding method in place of method in the method declaration
...
We can do this by using multiple inheritance, which we studied in Chapter 8
...
However, draft versions
of the SQL:1999 standard provided for multiple inheritance, and although the final
9
...
We base
our discussion on the draft versions of the SQL:1999 standard
...
There is a
problem, however, since the attributes name, address, and department are present in
Student, as well as in Teacher
...
So there is no conflict caused by inheriting them from Student as well as Teacher
...
In fact,
a teaching assistant may be a student of one department and a teacher in another
department
...
Multiple inheritance as in the TeachingAssistant example is not supported in SQL:1999
...
The keyword final says that subtypes may not be created
from the given type, while not final says that subtypes may be created
...
” That is, each value must be associated with one specific
type, called its most-specific type, when it is created
...
For example,
suppose that an entity has the type Person, as well as the type Student
...
However, an
entity cannot have the type Student, as well as the type Teacher, unless it has a type,
such as TeachingAssistant, that is a subtype of Teacher, as well as of Student
...
3
...
For instance, suppose we define the people table as follows:
create table people of Person
We can then define tables students and teachers as subtables of people, as follows:
create table students of Student
under people
create table teachers of Teacher
under people
The types of the subtables must be subtypes of the type of the parent table
...
Further, when we declare students and teachers as subtables of people, every tuple
present in students or teachers becomes also implicitly present in people
...
However,
only those attributes that are present in people can be accessed
...
(We
note, however, that multiple inheritance of tables is not supported by SQL:1999
...
SQL:1999 permits us to find tuples that are in people but not in its subtables by using
“only people” in place of people in a query
...
Before we state the constraints, we need a definition: We say that tuples in a subtable correspond to tuples
in a parent table if they have the same values for all inherited attributes
...
The consistency requirements for subtables are:
1
...
2
...
For example, without the first condition, we could have two tuples in students (or
teachers) that correspond to the same person
...
Since SQL:1999 does not support multiple inheritance, the second condition actually prevents a person from being both a teacher and a student
...
Obviously it would be useful to model a situation where a person
9
...
Thus, it can be useful to remove the second consistency constraint
...
3
...
Subtables can be stored in an efficient manner without replication of all inherited
fields, in one of two ways:
• Each table stores the primary key (which may be inherited from a parent table)
and the attributes defined locally
...
• Each table stores all inherited and locally defined attributes
...
Access to all attributes of a tuple is faster,
since a join is not required
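The first storage option can be illustrated with Python's built-in `sqlite3` module. This is only a sketch: SQLite has no subtables, so the `students` table here stores just the key and a locally defined attribute, and the choice of `name` as the primary key, the `department` column, and the person 'Mary' are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table people (name text primary key, address text);
    -- The "subtable" keeps only the inherited key and its local attribute.
    create table students (
        name text primary key references people(name),
        department text
    );
    insert into people values ('John', '23 Coyote Run');
    insert into people values ('Mary', '5 Hill Road');
    insert into students values ('John', 'CS');
""")

# Inherited attributes are recovered by joining with the parent table.
full_students = conn.execute("""
    select p.name, p.address, s.department
    from students s join people p on p.name = s.name
""").fetchall()

# Rough analogue of "only people": tuples present in no subtable.
only_people = conn.execute("""
    select name from people
    where name not in (select name from students)
""").fetchall()
```

The join is the price of the first option; under the second option each table would carry `address` itself and the first query would need no join.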
...
9
...
3 Overlapping Subtables
Inheritance of types should be used with care
...
Student may itself have subtypes such as UndergraduateStudent, GraduateStudent, and
PartTimeStudent
...
As Chapter 8 mentions, each of these categories is sometimes called a role
...
In the preceding example, we would have subtypes such as ForeignUndergraduateStudent, ForeignGraduateStudentFootballPlayer, and so on
...
A better approach in the context of database systems is to allow an object to have
multiple types, without having a most-specific type
...
For example, suppose we again have the type Person, with subtypes Student and
Teacher, and the corresponding table people, with subtables teachers and students
...
There is no need to have a type TeachingAssistant that is a subtype of both Student
and Teacher
...
We note, however, that SQL:1999 prohibits such a situation, because of consistency
requirement 2 from Section 9
...
2
...
We can of course create separate tables to represent the
information without using inheritance
...
9
...
An attribute of a type
can be a reference to an object of a specified type
...
The restriction of the
scope of a reference to tuples of a table is mandatory in SQL:1999, and it makes references behave like foreign keys
...
We can get the identifier value of a tuple by means of a query
...
SQL:1999 adopts a different approach, one where the referenced table must have an
attribute that stores the identifier of the tuple
...
Here, oid is an attribute name, not a keyword
...
oid
instead of select ref(p)
...
The type of the self-referential attribute must be specified as part of the type
definition of the referenced table, and the table definition must specify that the reference is user generated:
create type Person
(name varchar(20),
address varchar(20))
ref using varchar(20)
create table people of Person
ref is oid user generated
When inserting a tuple in people, we must provide a value for the identifier:
insert into people values
('01284567', 'John', '23 Coyote Run')
No other tuple for people or its supertables or subtables can have the same identifier
...
When inserting a tuple for departments, we
can then use
insert into departments
values ('CS', 'John')
9
...
Let us start with a simple example: Find the title and the name of the publisher
of each book
...
name
from books
Notice that the field name of the composite attribute publisher is referred to by a dot
notation
...
5
...
Consider the departments
table defined earlier
...
Since head is a reference to a tuple in the people table, the attribute name in the
preceding query is the name attribute of the tuple from the people table
...
To find
the name and address of the head of a department, we would require an explicit
join of the relations departments and people
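For comparison, the explicit join that a foreign-key design would need can be written out with Python's `sqlite3` module. This is a sketch under stated assumptions: SQLite has no reference types, so `head` is modeled as a plain foreign key holding the identifier of the referenced tuple, and the column name `oid` and the table layout are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table people (oid text primary key, name text, address text);
    create table departments (
        name text primary key,
        head text references people(oid)  -- stands in for ref(Person)
    );
    insert into people values ('01284567', 'John', '23 Coyote Run');
    insert into departments values ('CS', '01284567');
""")

# Without path expressions, following the reference is an explicit join.
head_of_cs = conn.execute("""
    select p.name, p.address
    from departments d join people p on d.head = p.oid
    where d.name = 'CS'
""").fetchall()
```

A path expression hides exactly this join from the user, which is why references simplify such queries.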
...
9
...
2 Collection-Valued Attributes
We now consider how to handle collection-valued attributes
...
An expression evaluating to a collection can appear anywhere that a
relation name may appear, such as in a from clause, as the following paragraphs
illustrate
...
If we want to find all books that have the word “database” as one of their keywords, we can use this query:
select title
from books
where 'database' in (unnest(keyword-set))
Note that we have used unnest(keyword-set) in a position where SQL without nested
relations would have required a select-from-where subexpression
...
If we know that a particular book has three authors, we could write:
select author-array[1], author-array[2], author-array[3]
from books
where title = 'Database System Concepts'
Now, suppose that we want a relation containing pairs of the form “title, author-name” for each book and each author of the book
...
title, A
...
author-array) as A
Since the author-array attribute of books is a collection-valued field, it can be used in a
from clause, where a relation is expected
...
5
...
The books relation has two attributes, author-array and
keyword-set, that are collections, and two attributes, title and publisher, that are not
...
We can use the following query to carry out
the task:
select title, A as author, publisher
...
branch
as pub-branch, K as keyword
from books as B, unnest(B
...
keyword-set) as K
The variable B in the from clause is declared to range over books
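The effect of unnesting can be imitated in Python, where each extra `for` clause plays the role of one `unnest` in the from clause. The dictionary layout and the variable names below are assumptions made for illustration; the data values come from the books example.

```python
books = [
    {"title": "Compilers", "author_array": ["Smith", "Jones"],
     "publisher": ("McGraw-Hill", "New York"),
     "keyword_set": {"parsing", "analysis"}},
    {"title": "Networks", "author_array": ["Jones", "Frick"],
     "publisher": ("Oxford", "London"),
     "keyword_set": {"Internet", "Web"}},
]

# Unnesting one collection: a (title, author) pair per element of the array.
title_author = [(b["title"], a) for b in books for a in b["author_array"]]

# Unnesting both collections gives the fully flattened (1NF) relation.
flat_books = sorted(
    (b["title"], a, b["publisher"][0], b["publisher"][1], k)
    for b in books
    for a in b["author_array"]
    for k in b["keyword_set"]
)

# Membership test on a collection-valued attribute.
parsing_titles = [b["title"] for b in books if "parsing" in b["keyword_set"]]
```

Two books with two authors and two keywords each flatten to eight tuples, one per combination of author and keyword.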
...
Figure 9
...
1)
shows an instance of the books relation, and Figure 9
...
The reverse process of transforming a 1NF relation into a nested relation is called
nesting
...
In the normal
use of grouping in SQL, a temporary multiset relation is (logically) created for each
group, and an aggregate function is applied on the temporary relation
...
Suppose that we are given a 1NF relation flat-books, as in Figure 9
...
The
following query nests the relation on the attribute keyword:
select title, author, Publisher(pub-name, pub-branch) as publisher,
set(keyword) as keyword-set
from flat-books
groupby title, author, publisher
The result of the query on the books relation from Figure 9
...
4
...
title   author   publisher (pub-name, pub-branch)   keyword-set
...     ...      (McGraw-Hill, New York)            {parsing, analysis}
...     ...      (McGraw-Hill, New York)            {parsing, analysis}
...     ...      (Oxford, London)                   {Internet, Web}
...     ...      (Oxford, London)                   {Internet, Web}
A partially nested version of the flat-books relation
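Nesting on one attribute is a grouping step in which the aggregate returns the collected set instead of a single value. The Python sketch below mirrors that idea; the function name `nest_on_keyword` and the tuple layout are assumptions, and the data values are those of the flat-books example.

```python
from collections import defaultdict
from itertools import product

flat_books = (
    [("Compilers", a, "McGraw-Hill", "New York", k)
     for a, k in product(["Smith", "Jones"], ["parsing", "analysis"])]
    + [("Networks", a, "Oxford", "London", k)
       for a, k in product(["Jones", "Frick"], ["Internet", "Web"])]
)


def nest_on_keyword(rows):
    # Group by (title, author, publisher) and collect keywords into a set.
    groups = defaultdict(set)
    for title, author, pub_name, pub_branch, keyword in rows:
        groups[(title, author, (pub_name, pub_branch))].add(keyword)
    return [key + (keywords,) for key, keywords in groups.items()]


nested = nest_on_keyword(flat_books)
```

The eight flat tuples collapse into four, one per (title, author) pair, each carrying the full keyword set of its book.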
...
2 back to the nested table books in Figure 9
...
The following query, which performs the same task as the previous query,
illustrates this approach
...
title = O
...
title = O
...
Observe that the attribute
O
...
An advantage of
this approach is that an orderby clause can be used in the nested query, to generate
results in a desired order
...
Without such an ordering, arrays and lists would not be uniquely
determined
...
The extensions we have shown for nesting illustrate features from some proposals
for extending SQL, but are not part of any standard currently
...
9
...
These can be
defined either by the procedural component of SQL:1999, or by an external programming language such as Java, C, or C++
...
Several database systems support their own procedural languages, such as PL/SQL in Oracle and Transact-SQL in
Microsoft SQL Server
...
9
...
1 SQL Functions and Procedures
Suppose that we want a function that, given the title of a book, returns the count of
the number of authors, using the 4NF schema
...
title = title
return a-count;
end
This function can be used in a query that returns the titles of all books that have
more than one author:
select title
from books4
where author-count(title) > 1
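The same idea can be tried out with Python's `sqlite3` module. This is a sketch, not SQL:1999: the function is written in Python rather than in the database's procedural language, identifiers use underscores because hyphens are not legal in SQLite names, and the single-author book 'Databases' by 'Lee' is a made-up row added so that the filter has something to exclude.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table authors (title text, author text);
    create table books4 (title text, pub_name text, pub_branch text);
    insert into authors values
        ('Compilers', 'Smith'), ('Compilers', 'Jones'),
        ('Networks', 'Jones'), ('Networks', 'Frick'),
        ('Databases', 'Lee');
    insert into books4 values
        ('Compilers', 'McGraw-Hill', 'New York'),
        ('Networks', 'Oxford', 'London'),
        ('Databases', 'Hypothetical Press', 'Nowhere');
""")


def author_count(conn, title):
    # Count the authors recorded for one title in the 4NF schema.
    row = conn.execute(
        "select count(author) from authors where title = ?", (title,)
    ).fetchone()
    return row[0]


titles = [t for (t,) in
          conn.execute("select title from books4 order by title").fetchall()]
multi_author_titles = [t for t in titles if author_count(conn, t) > 1]
```

The function takes the title as a parameter and returns a single integer, which is what lets it appear inside a selection condition.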
Functions are particularly useful with specialized data types such as images and
geometric objects
...
Functions
may be written in an external language such as C, as we see in Section 9
...
2
...
Methods, which we saw in Section 9
...
2, can be viewed as functions associated
with structured types
...
Thus, the body of the
method can refer to an attribute a of the value by using self
...
These attributes can
also be updated by the method
...
The author-count function could instead be written as a procedure:
create procedure author-count-proc(in title varchar(20), out a-count integer)
begin
select count(author) into a-count
from authors
where authors
...
The name,
along with the number of arguments, is used to identify the procedure
...
9
...
2 External Language Routines
SQL:1999 allows us to define functions in a programming language such as C or C++
...
An example of the use of such functions would be to perform a complex arithmetic computation on the data in a tuple
...
They must therefore have several extra parameters: an sqlstate value to indicate failure/success status, a parameter to store the return value of the function, and indicator variables for each parameter/function result to indicate if the value is null
...
Functions defined in a programming language and compiled outside the database
system may be loaded and executed with the database system code
...
Doing so carries the risk that a bug in the program can corrupt the database internal
structures, and can bypass the access-control functionality of the database system
...
Database systems that are concerned about security would typically execute such
code as part of a separate process, communicate the parameter values to it, and fetch
results back, via interprocess communication
...
The sandbox prevents
the Java code from carrying out any reads or updates directly on the database
...
6
...
The part of the SQL:1999
standard that deals with these constructs is called the Persistent Storage Module
(PSM)
...
end, and it may contain multiple SQL statements between the begin and the end
...
6
...
SQL:1999 supports while statements and repeat statements by this syntax:
declare n integer default 0;
while n < 10 do
set n = n + 1;
end while
repeat
set n = n - 1;
until n = 0
end repeat
This code does not do anything useful; it is simply meant to show the syntax of while
and repeat loops
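For readers more familiar with general-purpose languages, the two loops correspond to the following Python sketch. The variable `after_while` is added only to record the intermediate value; Python has no repeat statement, so the post-tested loop is written with `while True` and `break`.

```python
n = 0

# while n < 10 do ... end while: the condition is tested before each pass.
while n < 10:
    n = n + 1
after_while = n

# repeat ... until n = 0 end repeat: the body runs at least once,
# and the condition is tested after each pass.
while True:
    n = n - 1
    if n == 0:
        break
```

The practical difference is that a repeat loop always executes its body once, even if the until condition already holds on entry.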
...
There is also a for loop, which permits iteration over all results of a query:
declare n integer default 0;
for r as
select balance from account
where branch-name = 'Perryridge'
do
set n = n+ r
...
It is possible to give a name to the cursor, by inserting the text cn cursor for just
after the keyword as, where cn is the name we wish to give to the cursor
...
9
...
The statement leave can be used to exit the loop, while iterate starts on
the next tuple, from the beginning of the loop, skipping the remaining statements
...
balance < 1000
then set l = l+ r
...
balance < 5000
then set m = m+ r
...
balance
end if
This code assumes that l, m, and h are integer variables, and r is a row variable
...
balance” in the for loop of the preceding paragraph by
the if-then-else code, the loop would compute the total balances of accounts that fall
under the low, medium, and high balance categories respectively
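The combined loop and conditional can be sketched in Python as follows. The function name `categorize_totals` and the sample balances are hypothetical; the thresholds 1000 and 5000 and the variables `l`, `m`, and `h` follow the code above.

```python
def categorize_totals(balances):
    l = m = h = 0
    for balance in balances:        # plays the role of "for r as select ..."
        if balance < 1000:
            l = l + balance         # low-balance accounts
        elif balance < 5000:
            m = m + balance         # medium-balance accounts
        else:
            h = h + balance         # high-balance accounts
    return l, m, h


low, medium, high = categorize_totals([500, 900, 1500, 7000])
```

Each account contributes to exactly one of the three totals because the branches are tested in order.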
...
Finally, SQL:1999 includes the concept of signaling exception conditions, and declaring handlers that can handle the exception, as in this code:
declare out-of-stock condition
declare exit handler for out-of-stock
begin
...
The handler says that if the condition arises, the action to be taken
is to exit the enclosing begin end statement
...
In addition to explicitly defined conditions, there are also predefined conditions such as sqlexception, sqlwarning, and not found
...
5 provides a larger example of the use of SQL:1999 procedural constructs
...
The relation manager(empname, mgrname), specifying who works directly for which manager, is assumed to be available
...
We saw how to express such a query by recursion
in Chapter 5 (Section 5
...
6)
...
The procedure inserts
all employees who directly work for mgr into newemp before the repeat loop
...
Next, it computes employees
who work for those in newemp, except those who have already been found to be
9
...
-- The relation manager(empname, mgrname) specifies who directly
-- works for whom
...
empname
from newemp, manager
where newemp
...
mgrname;
)
except (
select empname
from empl
);
delete from newemp;
insert into newemp
select *
from temp;
delete from temp;
until not exists (select * from newemp)
end repeat;
end
Figure 9
...
employees of mgr, and stores them in the temporary table temp
...
The repeat loop terminates when it
finds no new (indirect) employees
...
For
example, if a works for b, b works for c, and c works for a, there is a cycle
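The iteration can be sketched in Python with sets standing in for the tables empl, newemp, and temp. The function name find_all_employees and the sample data are assumptions for illustration; the step that adds newemp to empl at the top of the loop is also assumed, since that part of the procedure is not shown above. Removing already-found employees (the "except" step) is what makes the loop terminate even when the manager relation contains a cycle.

```python
def find_all_employees(manager, mgr):
    """manager: iterable of (empname, mgrname) pairs.
    Returns everyone who works directly or indirectly for mgr."""
    manager = set(manager)
    empl = set()                                     # employees found so far
    newemp = {e for (e, m) in manager if m == mgr}   # direct employees of mgr
    while True:                                      # repeat ... until
        empl |= newemp                               # assumed step: record new finds
        # employees of those in newemp, except those already found
        temp = {e for (e, m) in manager if m in newemp} - empl
        newemp = temp                                # newemp is replaced by temp
        if not newemp:                               # until not exists (newemp)
            break
    return empl


# Cycle from the text: a works for b, b works for c, c works for a.
cyclic = [("a", "b"), ("b", "c"), ("c", "a")]
```

With the cyclic data, find_all_employees(cyclic, "c") returns all three names and stops, because on the last pass every candidate is already in empl.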
...
For instance, suppose we have a relation flights(to, from) that says which
cities can be reached from which other cities by a direct flight
...
All we have to do is to replace manager by flights and replace
attribute names correspondingly
...
9
...
Database systems of both types are on the
market, and a database designer needs to choose the kind of system that is appropriate to the needs of the application
...
The declarative nature and limited power (compared to a
programming language) of the SQL language provide good protection of data from
programming errors, and make high-level optimizations, such as reducing I/O, relatively easy
...
) Object-relational systems aim at making data modeling and querying easier by using complex data types
...
A declarative language such as SQL, however, imposes a significant performance
penalty for certain kinds of applications that run primarily in main memory, and
that perform a large number of accesses to the database
...
They
provide low-overhead access to persistent data, and eliminate the need for data translation if the data are to be manipulated by a programming language
...
Typical applications include CAD databases
...
For example, some object-oriented database systems built around a
persistent programming language are implemented on top of a relational database
system
...
To do so, the complex data types supported by object-relational
systems need to be translated to the simpler type system of relational databases
...
For instance, multivalued attributes in the E-R model correspond to set-valued attributes in the object-relational
model
...
ISA hierarchies
in the E-R model correspond to table inheritance in the object-relational model
...
9,
can be used, with some extensions, to translate object-relational data to relational
data
...
8 Summary
• The object-relational data model extends the relational data model by providing a richer type system including collection types, and object orientation
...
• Collection types include nested relations, sets, multisets, and arrays, and the
object-relational model permits attributes of a table to be collections
...
• We saw a variety of features of the extended data-definition language, as
well as the query language, and in particular support for collection-valued
attributes, inheritance, and tuple references
...
• Object-relational database systems (that is, database systems based on the
object-relational model) provide a convenient migration path for users of relational databases who wish to use object-oriented features
...
• We discussed differences between persistent programming languages and
object-relational systems, and mentioned criteria for choosing between them
...
1 Consider the database schema
Emp = (ename, setof(Children), setof(Skills))
Children = (name, Birthday)
Birthday = (day, month, year)
Skills = (type, setof(Exams))
Exams = (year, city)
Assume that attributes of type setof(Children), setof(Skills), and setof(Exams),
have attribute names ChildrenSet, SkillsSet, and ExamsSet, respectively
...
Write the following queries in
SQL:1999 (with the extensions described in this chapter)
...
Find the names of all employees who have a child who has a birthday in
March
...
Find those employees who took an examination for the skill type “typing”
in the city “Dayton”
...
List all skill types in the relation emp
...
2 Redesign the database of Exercise 9
...
List any functional or multivalued dependencies that you assume
...
9
...
3
...
Recall the constraints on subtables, and give all constraints that must be imposed on the relational schema
so that every database instance of the relational schema can also be represented
by an instance of the schema with inheritance
...
4 A car-rental company maintains a vehicle database for all vehicles in its current
fleet
...
Special data are included
for certain types of vehicles:
• Trucks: cargo capacity
• Sports cars: horsepower, renter age requirement
• Vans: number of passengers
• Off-road vehicles: ground clearance, drivetrain (four- or two-wheel drive)
Construct an SQL:1999 schema definition for this database
...
9
...
Under what
circumstances would you choose to use a reference type?
9
...
11, which contains composite, multivalued
and derived attributes
...
Give an SQL:1999 schema definition corresponding to the E-R diagram
...
b
...
9
...
17, which
contains specializations
...
8 Consider the relational schema shown in Figure 3
...
a
...
b
...
10 on the above schema, using
SQL:1999
...
9 Consider an employee database with two relations
employee (employee-name, street, city)
works (employee-name, company-name, salary)
where the primary keys are underlined
...
a
...
b
...
9
...
6
...
9
...
Under what circumstances would
you use each of these features?
9
...
For each of the following applications, state what
type of database system (relational, persistent-programming-language–based
OODB, object relational; do not specify a commercial product) you would recommend
...
a
...
A system to track contributions made to candidates for public office
c
...
Various algebraic query languages are presented in Fischer and Thomas
[1983], Zaniolo [1983], Ozsoyoglu et al
...
[1988]
...
[1989]
...
[1996]
...
POSTGRES (Stonebraker and Rowe [1986] and Stonebraker [1986a]) was an early implementation of an
object-relational system
...
The Iris database system from Hewlett-Packard (Fishman
et al
...
[1990]) provides object-oriented extensions on top
of a relational database system
...
[1989] is an object-oriented extension of SQL implemented in the O2 object-oriented
database system (Deux [1991])
...
XSQL is an
object-oriented extension of SQL proposed by Kifer et al
...
SQL:1999 was the product of an extensive (and long-delayed) standardization effort, which originally started off as adding object-oriented features to SQL and ended
up adding many more features, such as control flow, as we have seen
...
ansi
...
However,
standards documents are very hard to read, and are best left to SQL:1999 implementers
...
Tools
The Informix database system provides support for many object-relational features
...
0
...
IBM DB2 supports many of the
SQL:1999 features
...
Chapter 10
...
In
fact, like the Hyper-Text Markup Language (HTML) on which the World Wide Web is
based, XML has its roots in document management, and is derived from a language
for structuring large documents known as the Standard Generalized Markup Language
(SGML)
...
It is particularly
useful as a data format when an application must communicate with another application, or integrate information from several other applications
...
In this chapter, we introduce XML and discuss both the management of XML data with database techniques and the exchange of data formatted
as XML documents
...
1 Background
To understand XML, it is important to understand its roots as a document markup
language
...
For example, a writer creating text that will eventually
be typeset in a magazine may want to make notes about how the typesetting should
be done
...
In electronic document processing,
a markup language is a formal description of what part of the document is content,
what part is markup, and what the markup means
...
For instance, with
functional markup, text representing section headings (for this section, the word
“Background”) would be marked up as being a section heading, instead of being
marked up as text to be printed in large size, bold font
...
It also helps
different parts of a large document, or different pages in a large Web site, to be formatted in a uniform manner
...
For the family of markup languages that includes HTML, SGML, and XML, the
markup takes the form of tags enclosed in angle-brackets, <>
...
For example, the title of a document might be
marked up as follows
...
This feature is the key to XML’s major role in data representation and exchange, whereas HTML is used primarily for document formatting
...
1
...
These tags provide context for each
value and allow the semantics of the value to be identified
...
However, in spite of
this disadvantage, an XML representation has significant advantages when it is used
to exchange data, for example, as part of a message:
• First, the presence of the tags makes the message self-documenting; that is, a
schema need not be consulted to understand the meaning of the text
...
• Second, the format of the document is not rigid
...
The ability to recognize and ignore unexpected tags allows the
format of the data to evolve over time, without invalidating existing applications
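Both points can be illustrated with a small receiver written in Python. This is a sketch under stated assumptions: the function read_account, the two sample messages, and the extra tag last-accessed are hypothetical. The receiver picks out fields by tag name, so a message that carries an additional, unexpected tag is still processed correctly.

```python
import xml.etree.ElementTree as ET


def read_account(xml_text):
    """Extract only the fields this application knows about."""
    elem = ET.fromstring(xml_text)
    return {
        "account-number": elem.findtext("account-number"),
        "balance": elem.findtext("balance"),
    }


old_msg = ("<account><account-number>A-101</account-number>"
           "<balance>500</balance></account>")
# A newer sender adds a tag that this receiver has never seen.
new_msg = ("<account><account-number>A-101</account-number>"
           "<last-accessed>2001-01-01</last-accessed>"
           "<balance>500</balance></account>")
```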
...
Just as SQL is the dominant language for querying relational data, XML is becoming
the dominant format for data exchange
...
10
...
An element is simply
a pair of matching start- and end-tags, and all the text that appears between them
...
In the example in Figure 10
...
Further, elements in an XML document must nest properly
...
...
is properly nested, whereas
...
...
While proper nesting is an intuitive property, we may define it more formally
...
Tags are properly nested if every start-tag has a unique
matching end-tag that is in the context of the same parent element
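A conforming XML parser enforces this rule. The sketch below uses Python's standard xml.etree.ElementTree parser; the helper name is_well_formed and the two sample strings are assumptions for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts the text, False on a parse error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Every start-tag is closed inside the same parent element.
nested = "<account><balance>500</balance></account>"
# The account end-tag appears while balance is still open.
crossed = "<account><balance>500</account></balance>"
```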
...
2
...
The ability to nest elements within other elements provides an alternative way to
represent information
...
3 shows a representation of the bank information
from Figure 10
...
The
nested representation makes it easy to find all accounts of a customer, although it
would store account elements redundantly if they are owned by multiple customers
...
For instance, a shipping application would store the full address of sender
and receiver redundantly on a shipping document associated with each shipment,
whereas a normalized representation may require a join of shipping records with a
company-address relation to get address information
...
For instance, the
type of an account can be represented as an attribute, as in Figure 10
...
The attributes of
...
...
2
Mixture of text with subelements
...
Figure 10
...
an element appear as name=value pairs before the closing “>” of a tag
...
Furthermore, attributes can appear only once in
a given tag, unlike subelements, which may be repeated
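The difference is easy to observe with a parser. In this sketch, which uses Python's standard ElementTree module, the attribute name acct-type, the repeated owner subelement, and the sample values are assumptions for illustration. A repeated subelement parses without complaint, whereas a tag that repeats an attribute is rejected.

```python
import xml.etree.ElementTree as ET

doc = """
<account acct-type="checking">
  <account-number>A-102</account-number>
  <owner>Hayes</owner>
  <owner>Jones</owner>
</account>
"""
account = ET.fromstring(doc)
acct_type = account.get("acct-type")                  # the single attribute value
owners = [o.text for o in account.findall("owner")]   # subelements may repeat


def parses(xml_text):
    """Return True if the parser accepts the text."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# The same attribute given twice in one tag is not well-formed XML.
duplicate_ok = parses('<account acct-type="checking" acct-type="savings"/>')
```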
...
However, in database and data exchange applications of XML, this distinction is less relevant, and the choice of representing data as
an attribute or a subelement is frequently arbitrary
...
Since XML documents are designed to be exchanged between applications, a namespace mechanism has been introduced to allow organizations to specify globally
unique names to be used as element tags in documents
...
10
...
...
4
Use of attributes
...
The bank may use
a Web URL such as
http://www
...
com
as a unique identifier
...
In Figure 10
...
The abbreviation can
then be used in various element tags, as illustrated in the figure
...
Different elements can then be associated with different namespaces
...
Elements without an explicit namespace prefix would then belong to
the default namespace
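To see how a program observes namespaces, consider this sketch using Python's standard ElementTree module. The namespace URI below is a placeholder of our own (the identifier used in the text is not reproduced), and the element names are assumptions for illustration. The parser replaces each prefix by the full identifier, so two documents that choose different abbreviations for the same namespace yield the same element names.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace identifier, used only for this illustration.
FB = "http://www.example.com/FirstBank"

doc = f"""
<bank xmlns:FB="{FB}">
  <FB:branch>
    <FB:branchname>Downtown</FB:branchname>
  </FB:branch>
  <note>an element with no prefix</note>
</bank>
"""
root = ET.fromstring(doc)

# Prefixed tags are reported as {namespace-identifier}local-name.
tags = [child.tag for child in root]

# Path lookups take a prefix-to-identifier mapping chosen by the caller.
name = root.findtext("FB:branch/FB:branchname", namespaces={"FB": FB})
```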
...
So that we can do so, XML allows this construct:
<![CDATA[<account> · · · </account>]]>
Because it is enclosed within CDATA, the text <account> is treated as normal text
data, not as a tag
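The effect of a CDATA section can be checked with any XML parser. The following sketch uses Python's standard ElementTree module; the enclosing element name example is an assumption for illustration.

```python
import xml.etree.ElementTree as ET

doc = "<example><![CDATA[<account> ... </account>]]></example>"
elem = ET.fromstring(doc)

text = elem.text          # the characters inside the CDATA section, verbatim
child_count = len(elem)   # no account subelement was created
```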
...
com”>
...
Figure 10
...
10
...
3 XML Document Schema
Databases have schemas, which are used to constrain what information can be stored
in the database and to constrain the data types of the stored information
...
While such freedom may occasionally be acceptable given the self-describing nature of the data format, it is not
generally useful when XML documents must be processed automatically as part of
an application, or even when large amounts of related data are to be formatted in
XML
...
10
...
1 Document Type Definition
The document type definition (DTD) is an optional part of an XML document
...
However, the DTD does not in fact constrain types
in the sense of basic types like integer or string
...
The DTD is primarily a list of
rules for what pattern of subelements appear within an element
...
6 shows
a part of an example DTD for a bank information document; the XML document in
Figure 10
...
Each declaration is in the form of a regular expression for the subelements of an
element
...
6, a bank element consists of one or more
account, customer, or depositor elements; the | operator specifies “or” while the +
operator specifies “one or more
...
]>
Figure 10
...
The account element is defined to contain subelements account-number, branch-name and balance (in that order)
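Because each declaration is a regular expression over subelement names, the check a DTD processor performs can be imitated with ordinary regular expressions. The sketch below is not a DTD validator (Python's standard library does not include one); the table CONTENT_MODELS and the function children_match are assumptions for illustration, with the two rules translated by hand from the description above.

```python
import re
import xml.etree.ElementTree as ET

# Content models as regexes over space-terminated child tag names.
CONTENT_MODELS = {
    # one or more account, customer, or depositor elements
    "bank": re.compile(r"((account|customer|depositor) )+"),
    # account-number, branch-name, balance, in that order
    "account": re.compile(r"account-number branch-name balance "),
}


def children_match(elem):
    """Check the sequence of child tags of elem against its content model."""
    model = CONTENT_MODELS.get(elem.tag)
    if model is None:
        return True   # this sketch records no rule for the element
    child_tags = "".join(child.tag + " " for child in elem)
    return model.fullmatch(child_tags) is not None


good = ET.fromstring(
    "<account><account-number>A-101</account-number>"
    "<branch-name>Downtown</branch-name><balance>500</balance></account>")
wrong_order = ET.fromstring(
    "<account><balance>500</balance>"
    "<account-number>A-101</account-number>"
    "<branch-name>Downtown</branch-name></account>")
```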
...
Finally, the elements account-number, branch-name, balance, customer-name, customer-street, and customer-city are all declared to be of type #PCDATA
...
” Two other special type declarations are empty, which says that the element has
no contents, and any, which says that there is no constraint on the subelements of the
element; that is, any elements, even those not mentioned in the DTD, can occur as
subelements of the element
...
The allowable attributes for each element are also declared in the DTD
...
Attributes may be specified to be of
type CDATA, ID, IDREF, or IDREFS; the type CDATA simply says that the attribute contains character data, while the other three are not so simple; they are explained in
more detail shortly
...
Attributes must have a type declaration and a default declaration
...
If an attribute has a default value, for every
element that does not specify a value for the attribute, the default value is filled in
automatically when the XML document is read.
An attribute of type ID provides a unique identifier for the element; a value that
occurs in an ID attribute of an element must not occur in any other element in the
same document
...
<!ATTLIST account
    account-number ID #REQUIRED
    owners IDREFS #REQUIRED >
<!ATTLIST customer
    customer-id ID #REQUIRED
    accounts IDREFS #REQUIRED >
· · · declarations for branch, balance, customer-name,
customer-street and customer-city · · ·
]>
Figure 10
...
The type
IDREFS allows a list of references, separated by spaces
...
7 shows an example DTD in which customer account relationships are
represented by ID and IDREFS attributes, instead of depositor records
...
The customer elements have a new identifier attribute called customer-id
...
Each account element has an attribute
owners, of type IDREFS, which is a list of owners of the account
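A program reading such a document can treat ID values as keys and IDREFS values as lists of keys. The sketch below, in Python, assumes a small document of our own with the attribute names described above; the helper deref and the sample identifiers are hypothetical. Nothing in the lookup restricts which kind of element a reference points to.

```python
import xml.etree.ElementTree as ET

doc = """
<bank-2>
  <account account-number="A-401" owners="C100 C102"/>
  <account account-number="A-402" owners="C102"/>
  <customer customer-id="C100" accounts="A-401"/>
  <customer customer-id="C102" accounts="A-401 A-402"/>
</bank-2>
"""
root = ET.fromstring(doc)

# Collect ID attribute values; each must be unique in the whole document.
by_id = {}
for elem in root.iter():
    for attr in ("account-number", "customer-id"):
        if attr in elem.attrib:
            value = elem.attrib[attr]
            if value in by_id:
                raise ValueError(f"duplicate ID {value}")
            by_id[value] = elem


def deref(elem, attr):
    """Resolve an IDREFS attribute (IDs separated by spaces) to elements."""
    return [by_id[ref] for ref in elem.get(attr, "").split()]


owner_ids = [c.get("customer-id") for c in deref(by_id["A-401"], "owners")]
```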
...
8 shows an example XML document based on the DTD in Figure 10
...
Note that we use a different set of accounts and customers from our earlier example,
in order to illustrate the IDREFS feature better
...
Figure 10
...
Document type definitions are strongly connected to the document formatting heritage of XML
...
Nevertheless, a tremendous number of data exchange formats are being defined in terms of DTDs, since they were
part of the original standard
...
• Individual text elements and attributes cannot be further typed
...
The lack of
such constraints is problematic for data processing and exchange applications,
which must then contain code to verify the types of elements and attributes
...
Order is seldom important for data exchange (unlike document layout,
where it is crucial)
...
6 permits the specification of unordered collections of tags, it is much more difficult to specify that each tag may only appear
once
...
Thus, there is no way to specify
the type of element to which an IDREF or IDREFS attribute should refer
...
7 does not prevent the “owners” attribute of an
account element from referring to other accounts, even though this makes no
sense
...
3
...
We present here an example of XMLSchema, and list
some areas in which it improves on DTDs, without giving full details of XMLSchema’s
syntax
...
9 shows how the DTD in Figure 10
...
The first element is the root element bank, whose type is declared later
...
Observe the use
of types xsd:string and xsd:decimal to constrain the types of data elements
...
XMLSchema can define the minimum and
maximum number of occurrences of subelements by using minOccurs and maxOccurs
...
Among the benefits that XMLSchema offers over DTDs are these:
• It allows user-defined types to be created
...
10
...
w3
...
9
XMLSchema version of DTD from Figure 10
...
• It allows types to be restricted to create specialized types, for instance by specifying minimum and maximum values
...
• It is a superset of DTDs
...
• It is integrated with namespaces to allow different parts of a document to
conform to different schemas
...
9 shows
...
10
...
10
...
In particular, tools for querying and transformation of XML data are essential
to extract information from large bodies of XML data, and to convert data between
different representations (schemas) in XML
...
As a result, querying
and transformation can be combined into a single tool
...
• XSLT was designed to be a transformation language, as part of the XSL style
sheet system, which is used to control the formatting of XML data into HTML
or other print or display languages
...
Furthermore, it is currently the most widely available language for manipulating
XML data
...
XQuery
combines features from many of the earlier proposals for querying XML, in
particular the language Quilt
...
An XML document is modeled as a tree, with nodes corresponding to elements and attributes
...
Correspondingly, each node (whether attribute or element), other than the root element,
has a parent node, which is an element
...
The terms
parent, child, ancestor, descendant, and siblings are interpreted in the tree model of
XML data
...
Elements containing text broken up by intervening subelements can have multiple
text node children
...
Since such structures are not commonly used in database data, we shall assume that elements do not
contain both text and subelements
...
10
...
1 XPath
XPath addresses parts of an XML document by means of path expressions
...
5
...
A path expression in XPath is a sequence of location steps separated by “/” (instead of the “
...
The result of a path expression is a set of values
...
8, the XPath
expression
/bank-2/customer/name
would return these elements:
The expression
/bank-2/customer/name/text()
would return the same names, but without the enclosing tags
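These two expressions can be tried out with Python's standard ElementTree module, which supports a subset of XPath. This is a sketch: the sample document and customer names are assumptions for illustration, and because ElementTree paths are evaluated relative to an element, the leading /bank-2 step corresponds to starting from the root element.

```python
import xml.etree.ElementTree as ET

doc = """
<bank-2>
  <customer customer-id="C100"><name>Joe</name></customer>
  <customer customer-id="C101"><name>Lisa</name></customer>
  <customer customer-id="C102"><name>Mary</name></customer>
</bank-2>
"""
root = ET.fromstring(doc)   # root is the bank-2 element itself

# /bank-2/customer/name : the matching name elements, tags included
name_elems = root.findall("customer/name")
with_tags = [ET.tostring(e, encoding="unicode").strip() for e in name_elems]

# /bank-2/customer/name/text() : the same values without the enclosing tags
texts = [e.text for e in name_elems]
```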
...
(Note
that this is an abstract root “above”
...
As a path expression is evaluated, the result of
the path at any point consists of a set of nodes from the document
...
Since multiple children can have the same name, the number of nodes in the node
set can increase or decrease with each step
...
For instance, /bank-2/account/@account-number returns a set
of all values of account-number attributes of account elements
...
XPath supports a number of other features:
• Selection predicates may follow any step in a path, and are contained in square
brackets
...
We can test the existence of a subelement by listing it without any comparison operation; for instance, if we removed just “> 400” from the above, the
expression would return account numbers of all accounts that have a balance
subelement, regardless of its value
...
For example, the path expression
/bank-2/account[count(customer) > 2]
returns accounts with more than 2 customers
...
) can be used for negation
...
The function id can even be applied on sets of references, or even
strings containing multiple references separated by blanks, such as IDREFS
...
• The | operator allows expression results to be unioned
...
However, the | operator cannot
be nested inside other operators
...
For instance, the expression /bank-2//name finds any name element anywhere under
the /bank-2 element, regardless of the element in which it is contained
...
• Each step in the path need not select from the children of the nodes in the
current node set
...
We omit details, but note that “//”, described above, is a short form for
specifying “all descendants,” while “
...
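Several of the features listed above can be approximated with Python's standard ElementTree module. This is a sketch, not a full XPath engine: ElementTree implements only part of XPath, so numeric comparisons, counting, and union are done here in ordinary Python. The sample document is an assumption for illustration.

```python
import xml.etree.ElementTree as ET

doc = """
<bank-2>
  <account account-number="A-401"><branch>Downtown</branch><balance>500</balance></account>
  <account account-number="A-402"><branch>Perryridge</branch><balance>300</balance></account>
  <account account-number="A-403"><branch>Brighton</branch></account>
  <customer customer-id="C100"><name>Joe</name></customer>
</bank-2>
"""
root = ET.fromstring(doc)   # the bank-2 element

# Existence test: account[balance] keeps accounts that have a balance child.
has_balance = [a.get("account-number") for a in root.findall("account[balance]")]

# Selection predicate with a comparison, filtered in Python.
over_400 = [a.get("account-number") for a in root.findall("account[balance]")
            if float(a.findtext("balance")) > 400]

# Counting the nodes selected by a path.
account_count = len(root.findall("account"))

# Union of two path results (the | operator).
union = root.findall("account") + root.findall("customer")

# "//": name elements anywhere below the root element.
names_anywhere = [e.text for e in root.findall(".//name")]
```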
10
...
2 XSLT
A style sheet is a representation of formatting options for a document, usually stored
outside the document itself, so that formatting is separate from content
...
10
Using XSLT to wrap results in new XML elements
...
The XML Stylesheet
Language (XSL) was originally designed for generating HTML from XML, and is thus
a logical extension of HTML style sheets
...
1 XSLT transformations are quite powerful, and in fact XSLT can even
act as a query language
...
In their basic form, templates allow selection of nodes in an XML tree by an XPath
expression
...
While XSLT can
be used as a query language, its syntax and semantics are quite dissimilar from those
of SQL
...
Consider
this XSLT code:
The first template matches customer elements that occur as children of
the bank-2 root element
...
The first template
outputs the value of the customer-name subelement; note that the value does not
contain the element tag
...
This is required because the default behavior of XSLT on subtrees of the input document that do not match any
template is to copy the subtrees to the output document
...
Figure 10
...
1
...
Formatting is not relevant from a database perspective, so we do not
cover it here
...
”/>
Figure 10
...
Structural recursion is a key part of XSLT
...
The idea of structural recursion is this: When a template matches an element in the tree structure, XSLT can use structural recursion to
apply template rules recursively on subtrees, instead of just outputting a value
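The idea can be imitated in Python with a recursive function over the document tree. This is an analogy rather than XSLT: the function names, the wrapper element customers, and the sample document are assumptions for illustration, and nodes that match no rule are simply skipped here while their subtrees are still visited.

```python
import xml.etree.ElementTree as ET


def apply_templates(elem):
    """Return the output elements produced for elem and its subtree."""
    if elem.tag == "customer":                 # rule that matches customer
        out = ET.Element("customer")
        out.text = elem.findtext("customer-name")
        return [out]
    results = []                               # no rule matched this node:
    for child in elem:                         # recurse into its subtrees
        results.extend(apply_templates(child))
    return results


def transform(root):
    """Wrap all results in one surrounding element."""
    wrapper = ET.Element("customers")
    wrapper.extend(apply_templates(root))
    return wrapper


doc = ("<bank><account><account-number>A-101</account-number></account>"
       "<customer><customer-name>Hayes</customer-name>"
       "<customer-city>Harrison</customer-city></customer>"
       "<customer><customer-name>Jones</customer-name></customer></bank>")
output = ET.tostring(transform(ET.fromstring(doc)), encoding="unicode")
```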
...
For example, the results of our previous query can be placed in a surrounding
...
Without recursion forced by the
would output
the subelements
...
XSLT provides a feature called keys, which permit lookup of elements by using
values of subelements or attributes; the goals are similar to those of the id() function in
XPath, but keys permit attributes other than the ID attributes to be used
...
The match attribute specifies
which nodes the key applies to
...
Note that the expression need not be unique to
an element; that is, more than one element may have the same expression value
...
Keys can be subsequently used in templates as part of any pattern through the
key function
...
12
Joins in XSLT
...
Thus, the XML node for account “A-401” can be
referenced as key(“acctno”, “A-401”)
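Conceptually, a key is an index from expression values to the nodes that have those values. A rough Python analogue is a dictionary of lists; the function build_key, the sample document, and the choice of subelements are assumptions for illustration. A list is kept per value because more than one element may share the same value.

```python
from collections import defaultdict
import xml.etree.ElementTree as ET


def build_key(root, match, use):
    """Index elements with tag `match` by the text of their `use` subelement."""
    index = defaultdict(list)
    for elem in root.iter(match):
        index[elem.findtext(use)].append(elem)
    return index


doc = """
<bank>
  <account><account-number>A-401</account-number><branch-name>Downtown</branch-name></account>
  <account><account-number>A-402</account-number><branch-name>Downtown</branch-name></account>
</bank>
"""
root = ET.fromstring(doc)

acctno = build_key(root, "account", "account-number")
found = acctno["A-401"]                 # analogous to key("acctno", "A-401")

by_branch = build_key(root, "account", "branch-name")
downtown = by_branch["Downtown"]        # two accounts share this value
```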
...
12
...
1
...
The result of the query consists of pairs of customer and account elements enclosed
within cust-acct elements
...
A simple example shows how xsl:sort would be
used in our style sheet to return customer elements sorted by name:
The xsl:sort directive within the xsl:apply-templates element causes nodes to be sorted before they are processed by the next set of templates
...
10
...
3 XQuery
The World Wide Web Consortium (W3C) is developing XQuery, a query language
for XML
...
The XQuery language derives from an XML query language
called Quilt; most of the XQuery features we outline here are part of Quilt
...
4
...
Unlike XSLT, XQuery does not represent queries in XML
...
The for section gives a series
of variables that range over the results of XPath expressions
...
The let clause simply allows complicated expressions to be assigned
to variable names for simplicity of representation
...
Finally, the return section allows the construction of results in XML
...
8, which uses ID and IDREFS:
for $x in /bank-2/account
let $acctno := $x/@account-number
where $x/balance > 400
return <account-number> $acctno </account-number>
Since this query is simple, the let clause is not essential, and the variable $acctno
in the return clause could be replaced with $x/@account-number
...
Thus, an equivalent query may have only for and return clauses:
for $x in /bank-2/account[balance > 400]
return <account-number> $x/@account-number </account-number>
However, the let clause simplifies complex queries
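For readers more familiar with a programming language, the for, let, where, and return sections correspond closely to a loop with a local variable, a condition, and an output step. The Python sketch below is an analogy, not XQuery; the sample document and the function name flwr are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

doc = """
<bank-2>
  <account account-number="A-401"><balance>500</balance></account>
  <account account-number="A-402"><balance>300</balance></account>
</bank-2>
"""
root = ET.fromstring(doc)


def flwr(root):
    results = []
    for x in root.findall("account"):              # for $x in /bank-2/account
        acctno = x.get("account-number")           # let $acctno := $x/@account-number
        if float(x.findtext("balance")) > 400:     # where $x/balance > 400
            results.append(acctno)                 # return the account number
    return results


# The shorter form, with the selection folded into the path and no let.
short_form = [x.get("account-number") for x in root.findall("account")
              if float(x.findtext("balance")) > 400]
```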
...
The function distinct applied on a multiset returns a set without duplication
...
XQuery also provides aggregate functions
such as sum and count that can be applied on collections such as sets and multisets
...
Note also that variables assigned by let clauses may be set- or
multiset-valued, if the path expression on the right-hand side returns a set or multiset
value
...
The join of depositor, account and customer elements in Figure 10
...
4
...
for $a in /bank/account,
$c in /bank/customer,
$d in /bank/depositor
where $a/account-number = $d/account-number
and $c/customer-name = $d/customer-name
return
The same query can be expressed with the selections specified as XPath selections:
for $a in /bank/account,
$c in /bank/customer,
$d in /bank/depositor[account-number = $a/account-number
and customer-name = $c/customer-name]
return
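The join expressed by these queries amounts to three nested loops with two equality conditions. The Python sketch below is an analogy, not XQuery; the sample document, the function name join, and the decision to return (customer-name, account-number) pairs are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

doc = """
<bank>
  <account><account-number>A-101</account-number><balance>500</balance></account>
  <account><account-number>A-102</account-number><balance>400</balance></account>
  <customer><customer-name>Johnson</customer-name></customer>
  <customer><customer-name>Hayes</customer-name></customer>
  <depositor><account-number>A-101</account-number><customer-name>Johnson</customer-name></depositor>
  <depositor><account-number>A-102</account-number><customer-name>Hayes</customer-name></depositor>
  <depositor><account-number>A-101</account-number><customer-name>Hayes</customer-name></depositor>
</bank>
"""
root = ET.fromstring(doc)


def join(root):
    """Pair each customer with each account the customer is a depositor of."""
    pairs = []
    for a in root.findall("account"):              # for $a in /bank/account
        for c in root.findall("customer"):         #     $c in /bank/customer
            for d in root.findall("depositor"):    #     $d in /bank/depositor
                if (a.findtext("account-number") == d.findtext("account-number")
                        and c.findtext("customer-name") == d.findtext("customer-name")):
                    pairs.append((c.findtext("customer-name"),
                                  a.findtext("account-number")))
    return pairs


cust_acct = join(root)
```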
XQuery FLWR expressions can be nested in the return clause, in order to generate
element nestings that do not appear in the source document
...
5
...
For instance, the XML structure shown in Figure 10
...
1 by this
query:
for $c in /bank/customer
return
$c/*
for $d in /bank/depositor[customer-name = $c/customer-name],
$a in /bank/account[account-number=$d/account-number]
return $a
The query also introduces the syntax $c/*, which refers to all the children of the node,
which is bound to the variable $c
...
Path expressions in XQuery are based on path expressions in XPath, but XQuery
provides some extensions (which may eventually be added to XPath itself)
...
The operator can be applied on a value of type
IDREFS to get a set of elements
...
We leave details to the reader
...
For instance, this query outputs all customer elements sorted by the name subelement:
for $c in /bank/customer
return <customer> $c/* </customer> sortby(name)
To sort in descending order, we can use sortby(name descending)
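Sorting query results by a subelement corresponds to sorting a list of elements on a key. The Python sketch below is an analogy, not XQuery; the sample document and customer names are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

doc = """
<bank>
  <customer><name>Turner</name></customer>
  <customer><name>Hayes</name></customer>
  <customer><name>Jones</name></customer>
</bank>
"""
root = ET.fromstring(doc)
customers = root.findall("customer")


def by_name(customer):
    """Sort key: the text of the name subelement."""
    return customer.findtext("name")


ascending = [by_name(c) for c in sorted(customers, key=by_name)]
descending = [by_name(c) for c in sorted(customers, key=by_name, reverse=True)]
```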
...
For instance, we can get a nested
representation of bank information sorted in customer name order, with accounts of
each customer sorted by account number, as follows
...
For instance, the built-in function document(name) returns the root of a named
document; the root can then be used in a path expression to access the contents of the
document
...
XQuery also provides functions to convert between types
...
XQuery offers a variety of other features, such as if-then-else clauses, which can be
used within return clauses, and existential and universal quantification, which can
be used in predicates in where clauses
...
Universal quantification can be expressed by using
every in place of some
...
5 The Application Program Interface
With the wide acceptance of XML as a data representation and exchange format, software tools are widely available for manipulation of XML data
...
Programs may access parts of the document in a navigational fashion,
beginning with the root
...
We outline here some of the interfaces and methods in the Java
API for DOM, to give a flavor of DOM
...
The Node interface provides methods such as getParentNode(), getFirstChild(), and
getNextSibling(), to navigate the DOM tree, starting with the root node
...
Attribute values of an element can be accessed by name, using the method getAttribute(name)
...
The method getData() on the Text node returns the text contents
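The text above describes the Java API. Python's standard xml.dom.minidom module exposes the same DOM navigation as attributes rather than getter methods (parentNode, firstChild, nextSibling, and data), plus the method getAttribute(name). The sketch below uses a sample document of our own, written without whitespace between tags so that no whitespace-only text nodes appear in the tree.

```python
from xml.dom import minidom

dom = minidom.parseString(
    '<bank><account acct-type="checking"><balance>500</balance></account>'
    '<account acct-type="savings"><balance>900</balance></account></bank>'
)
root = dom.documentElement               # the bank element

first = root.firstChild                  # first account element
second = first.nextSibling               # its sibling, the second account
parent_tag = first.parentNode.tagName    # back up to the parent element

acct_type = first.getAttribute("acct-type")

balance_elem = first.firstChild          # the balance element
balance_text = balance_elem.firstChild.data   # Text node contents
```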
...
Many more details are required for writing an actual DOM program; see the bibliographical notes for references to further information
...
However, the DOM interface does not support any form of declarative querying
...
This API is built on the notion of event handlers, which consist of user-specified
functions associated with parsing events
...
The
pieces of a document are always encountered in order from start to finish
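Python's standard library includes a binding of this event-based API in the xml.sax module. In the sketch below, the handler class EventLogger and the sample document are assumptions for illustration; the handler methods startElement, characters, and endElement are invoked by the parser as it reads the document from start to finish.

```python
import xml.sax


class EventLogger(xml.sax.ContentHandler):
    """Record each parsing event in the order it is reported."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def characters(self, content):
        if content.strip():                  # ignore whitespace-only text
            self.events.append(("text", content))

    def endElement(self, name):
        self.events.append(("end", name))


handler = EventLogger()
xml.sax.parseString(b"<account><balance>500</balance></account>", handler)

tag_events = [e for e in handler.events if e[0] != "text"]
text_seen = "".join(value for kind, value in handler.events if kind == "text")
```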
...
10
...
One way to store XML data is to
convert it to relational representation, and store it in a relational database
...
10
...
1 Relational Databases
Since relational databases are widely used in existing applications, there is a great
benefit to be had in storing XML data in relational databases, so that the data can be
accessed from existing applications
...
However, there are many applications
where the XML data is not generated from a relational schema, and translating the
data to relational form for storage may not be straightforward
...
Several alternative approaches are available:
• Store as string
...
For instance, the XML data in Figure 10
...
While the above representation is easy to use, the database system does
not know the schema of the stored elements
...
In fact, it is not even possible to implement simple
selections such as finding all account elements, or finding the account element
with account number A-401, without scanning all tuples of the relation and
examining the contents of the string stored in the tuple
...
For instance, in our example, the
relations would be account-elements, customer-elements, and depositor-elements,
each with an attribute data
...
Thus, a
query that requires account elements with a specified account number can be
answered efficiently with this representation
...
Some database systems, such as Oracle 9, support function indices, which
can help avoid replication of attributes between the XML string and relation
attributes
...
For instance, a function index can be built on a user-defined function that returns the value of the account-number subelement of the XML string in a tuple
...
The above approaches have the drawback that a large part of the XML information is stored within strings
...
• Tree representation
...
Each element and attribute in the XML data is given a unique identifier
...
The relation child
is used to record the parent element of each element and attribute
...
As an exercise, you can represent
the XML data of Figure 10
...
This representation has the advantage that all XML information can be represented directly in relational form, and many XML queries can be translated
into relational queries and executed inside the database system
...
• Map to relations
...
Elements whose schema is unknown are
stored as strings, or as a tree representation
...
All
attributes of these elements are stored as attributes of the relation
...
Otherwise, the relation corresponding to the subelement stores the contents of the subelement, along with
an identifier for the parent type and the attribute stores the identifier of the
subelement
...
If a subelement can occur multiple times in an element, the map-to-relations
approach stores the contents of the subelements in the relation corresponding
to the subelement
...
Note that when we apply this approach to the DTD of the data in Figure 10
...
The bibliographical notes provide references to such hybrid approaches
...
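The tree representation described above can be sketched in a few lines of Python. This is only an illustration: the relation names nodes and child come from the text, but the column layout (id, type, label, value), the helper name shred, and the sample document are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

def shred(xml_text):
    """Return (nodes, child) lists describing the XML tree.

    nodes rows: (id, type, label, value)   -- assumed column layout
    child rows: (child_id, parent_id)
    """
    nodes = []
    child = []
    next_id = 0

    def add(node_type, label, value, parent_id):
        nonlocal next_id
        node_id = next_id
        next_id += 1
        nodes.append((node_id, node_type, label, value))
        if parent_id is not None:
            child.append((node_id, parent_id))
        return node_id

    def visit(elem, parent_id):
        # Text content is kept as the value of the element, if any.
        text = (elem.text or "").strip() or None
        elem_id = add("element", elem.tag, text, parent_id)
        for name, value in elem.attrib.items():
            add("attribute", name, value, elem_id)
        for sub in elem:
            visit(sub, elem_id)

    visit(ET.fromstring(xml_text), None)
    return nodes, child

# Hypothetical sample document.
SAMPLE = ("<account><account-number>A-401</account-number>"
          "<balance>500</balance></account>")
nodes, child = shred(SAMPLE)
for row in nodes:
    print(row)
print(child)
```

Once the data is in this form, a query such as "find the parent of the account-number element" becomes an ordinary join between nodes and child.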
6
...
Since XML is primarily a file format, a natural storage mechanism is simply a flat file
...
In
particular, it lacks data isolation, integrity checks, atomicity, concurrent access, and security
...
...
Thus, this storage format may be sufficient for some applications
...
XML databases are databases that use XML as
their basic data model
...
This allows much of the
object-oriented database infrastructure to be reused, while using a standard
XML interface
...
It is also possible to build XML databases as a layer on top of relational databases
...
7 XML Applications
A central design goal for XML is to make it easier to communicate information, on the
Web and between applications, by allowing the semantics of the data to be described
with the data itself
...
Two applications of XML for communication (exchange of data, and mediation
of Web information resources) illustrate how
XML achieves its goal of supporting data exchange and demonstrate how database
technology and interaction are key in supporting exchange-based applications
...
7
...
Some examples:
• The chemical industry needs information about chemicals, such as their molecular structure, and a variety of important properties such as boiling and melting points, calorific values, solubility in various solvents, and so on
...
• In shipping, carriers of goods and customs and tax officials need shipment
records containing detailed information about the goods being shipped, from
whom and to where they were sent, to whom and to where they are being
shipped, the monetary value of the goods, and so on
...
Using normalized relational schemas to model such complex data requirements
results in a large number of relations, which is often hard for users to manage
...
Nested element representations help reduce the number of relations that must be
...
For instance, in our bank example, listing customers
with account elements nested within customer elements, as in Figure 10
...
1
...
Data in relational databases must be published,
that is, converted to XML form, for export to other applications
...
While application code can perform the publishing and
shredding operations, the operations are so common that the conversions should
be done automatically, without writing application code, where possible
...
An XML-enabled database supports an automatic mapping from its internal model
(relational, object-relational or object-oriented) to XML
...
A simple mapping might assign an element to every row of a table,
and make each column in that row either an attribute or a subelement of the row’s
element
...
A more complicated mapping would allow nested structures to be created
...
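The simple mapping just described (one element per row, with each column becoming either an attribute or a subelement) might look like the following sketch. The function name row_to_element, the as_attributes flag, and the sample row values are hypothetical; a real XML-enabled database would perform this mapping internally.

```python
import xml.etree.ElementTree as ET

def row_to_element(table, row, as_attributes=False):
    """Publish one relational row as an XML element named after its table."""
    elem = ET.Element(table)
    for column, value in row.items():
        if as_attributes:
            elem.set(column, str(value))                 # column -> attribute
        else:
            ET.SubElement(elem, column).text = str(value)  # column -> subelement
    return elem

# Hypothetical row of an account table.
row = {"account-number": "A-401", "branch-name": "Downtown", "balance": 500}
print(ET.tostring(row_to_element("account", row), encoding="unicode"))
print(ET.tostring(row_to_element("account", row, as_attributes=True),
                  encoding="unicode"))
```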
Some database products also allow XML queries to access relational data by treating the XML form of relational data as a virtual XML document
...
7
...
1 Data Mediation
Comparison shopping is an example of a mediation application, in which data about
items, inventory, pricing, and shipping costs are extracted from a variety of Web sites
offering a particular item for sale
...
A personal financial manager is a similar application in the context of banking
...
Suppose that these accounts may be held
at different institutions
...
XML-based mediation addresses the problem by extracting an XML representation of account information from the respective Web sites of
the financial institutions where the individual holds accounts
...
For those that do not, wrapper software is used to generate XML
data from HTML Web pages returned by the Web site
...
Nevertheless, the value provided by mediation often justifies the effort
required to develop and maintain wrappers
...
This may require further transformation of the XML data from each site, since different sites may structure the same information differently
...
...
1, while another may use the
nested format in Figure 10
...
They may also use different names for the same information (for instance, acct-number and account-id), or may even use the same name for
different information
...
Such issues are discussed in more detail in Section 19
...
XML query languages such as XSLT and XQuery play an
important role in the task of transformation between different XML representations
...
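As a rough illustration of the renaming problem mentioned above (acct-number versus account-id), the sketch below rewrites element names from one site's vocabulary into another's. The text assigns this job to XSLT or XQuery; Python is used here only as a stand-in, and the mapping table and function name are assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from one site's element names to the mediator's names.
TAG_MAP = {"account-id": "acct-number"}

def normalize(xml_text, tag_map=TAG_MAP):
    """Return the parsed document with element names rewritten via tag_map."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        elem.tag = tag_map.get(elem.tag, elem.tag)
    return root

root = normalize("<account><account-id>A-401</account-id></account>")
print(ET.tostring(root, encoding="unicode"))
```

Renaming handles only the easy case. When two sites use the same name for different information, the mediator needs site-specific rules rather than a single global table.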
8 Summary
• Like the Hyper-Text Markup Language, HTML, on which the Web is based, the
Extensible Markup Language, XML, is a descendant of the Standard Generalized Markup Language (SGML)
...
• XML documents contain elements, with matching starting and ending tags
indicating the beginning and end of an element
...
Elements may also have
attributes
...
• Elements may have an attribute of type ID that stores a unique identifier for the
element
...
Attributes of type IDREFS can store a list of references
...
The DTD of a document specifies what elements may occur,
how they may be nested, and what attributes each element may have
...
For instance,
they do not provide a type system
...
While it provides more expressive power,
including a powerful type system, it is also more complicated
...
Nesting of elements is reflected by the parent-child
structure of the tree representation
...
XPath is a standard language for path expressions, and allows
required elements to be specified by a file-system-like path, and additionally
allows selections and other features
...
• The XSLT language was originally designed as the transformation language
for a style sheet facility, in other words, to apply formatting information to
...
However, XSLT offers quite powerful querying and transformation features and is widely available, so it is used for querying XML data
...
Each element in the input XML data is matched against available
templates, and the select part of the first matching template is applied to the
element
...
XSLT supports keys, which
can be used to implement some types of joins
...
• The XQuery language, which is currently being standardized, is based on the
Quilt query language
...
However, it supports many extensions to deal with the tree nature of XML
and to allow for the transformation of XML documents into other documents
with a significantly different structure
...
For example, XML
data can be stored as strings in a relational database
...
As another alternative, XML data can be
mapped to relations in the same way that E-R schemas are mapped to relational schemas
...
• The ability to transform documents in languages such as XSLT and XQuery
is a key to the use of XML in mediation applications, such as electronic business exchanges and the extraction and combination of Web data for use by a
personal finance manager or comparison shopper
...
...
1 Give an alternative representation of bank information containing the same
data as in Figure 10
...
Also give
the DTD for this representation
...
2 Show, by giving a DTD, how to represent the books nested-relation from Section 9
...
10
...
4 Write the following queries in XQuery, assuming the DTD from Exercise 10
...
a
...
b
...
c
...
· · · similar PCDATA declarations for year, publisher, place, journal, year,
number, volume, pages, last-name and first-name
]>
Figure 10
...
10
...
3 to list all skill
types in Emp
...
6 Write a query in XQuery on the XML representation in Figure 10
...
(Hint: Use a nested query to
get the effect of an SQL group by
...
7 Write a query in XQuery on the XML representation in Figure 10
...
(Hint: Use universal quantification
...
8 Give a query in XQuery to flip the nesting of data from Exercise 10
...
That is, at
the outermost level of nesting the output must have elements corresponding to
authors, and each such element must have nested within it items corresponding to all the books written by the author
...
9 Give the DTD for an XML representation of the information in Figure 2
...
Create a separate element type to represent each relationship, but use ID and IDREF
to implement primary and foreign keys
...
10 Write queries in XSLT and XQuery to output customer elements with associated account elements nested within the customer elements, given the bank
information representation using ID and IDREFS in Figure 10
...
10
...
13
...
You can assume that only books and articles
appear as top level elements in XML documents
...
12 Consider Exercise 10
...
What change would have to be done to the relational schema
...
13 Write queries in XQuery on the bibliography DTD fragment in Figure 10
...
a
...
b
...
c
...
...
1, and the representation of the tree using nodes and child relations described in Section 10
...
1
...
15 Consider the following recursive DTD
...
Give a small example of data corresponding to the above DTD
...
Show how to map this DTD to a relational schema
...
Bibliographical Notes
The XML Cover Pages site (www
...
org/cover/) contains a wealth of XML
information, including tutorial introductions to XML, standards, publications, and
software
...
A large number of technical reports defining the XML
related standards are available at www
...
org
...
[2000] gives an algebra for XML
...
[2000]
...
Deutsch et al
...
Integration of
keyword querying into XML is outlined by Florescu et al
...
Query optimization for XML is described in McHugh and Widom [1999]
...
Other
work on querying and manipulating XML data includes Chawathe [1999], Deutsch
et al
...
[2000]
...
[1999] describe storage of XML data
...
XML support in commercial databases is described in Banerjee
et al
...
See Chapters 25 through 27 for
more information on XML support in commercial databases
...
[2000], Draper et al
...
[1999], and
Carey et al
...
Tools
A number of tools to deal with XML are available in the public domain
...
oasis-open
...
Kweelt (available at http://db
...
upenn
...
P A
IV
...
A vast majority of databases today
store data on magnetic disk and fetch data into main memory for processing,
or copy data onto tapes and other backup devices for archival storage
...
Chapter 11 begins with an overview of physical storage media, including mechanisms to minimize the chance of data loss due to failures
...
Storage and retrieval of objects is also covered in Chapter 11
...
An index
is a structure that helps locate desired records of a relation quickly, without examining all records
...
Chapter 12 describes several types of indices used
in database systems
...
It is usually convenient to break up queries into smaller operations, roughly
corresponding to the relational algebra operations
...
There are many alternative ways of processing a query, which can have widely
varying costs
...
Chapter 14 describes the process of query optimization
...
Chapter 11
...
For example, at the conceptual or logical level, we viewed the database, in the relational
model, as a collection of tables
...
This is because the goal of a database system is
to simplify and facilitate access to data; users of the system should not be burdened
unnecessarily with the physical details of the implementation of the system
...
We start with characteristics of the
underlying storage media, such as disk and tape systems
...
We consider several alternative
structures, each best suited to a different kind of access to data
...
11
...
These storage media
are classified by the speed with which data can be accessed, by the cost per unit of
data to buy the medium, and by the medium’s reliability
...
The cache is the fastest and most costly form of storage
...
We shall not
be concerned about managing cache storage in the database system
...
The storage medium used for data that are available to be operated on is main memory
...
Although main memory may contain many megabytes of
data, or even gigabytes of data in large server systems, it is generally too small
(or too expensive) for storing the entire database
...
• Flash memory
...
Reading data from flash memory takes less than 100 nanoseconds (a nanosecond is 1/1000 of a microsecond), which is roughly as fast as
reading data from main memory
...
To overwrite memory that has
been written already, we have to erase an entire bank of memory at once; it
is then ready to be written again
...
Flash memory has found popularity as a replacement for magnetic disks
for storing small volumes of data (5 to 10 megabytes) in low-cost computer
systems, such as computer systems that are embedded in other devices, in
hand-held computers, and in other digital electronic devices such as digital
cameras
...
The primary medium for the long-term on-line storage of data is the magnetic disk
...
The system must move the data from disk to main memory so that
they can be accessed
...
The size of magnetic disks currently ranges from a few gigabytes to 80 gigabytes
...
Disk storage survives power failures and system crashes
...
• Optical storage
...
7 or 8
...
Data are stored optically on a disk,
and are read by a laser
...
There are “record-once” versions of compact disk (called CD-R) and digital
video disk (called DVD-R), which can be written only once; such disks are also
called write-once, read-many (WORM) disks
...
Recordable compact disks
are magnetic-optical storage devices that use optical means to read magnetically encoded data
...
...
• Tape storage
...
Although magnetic tape is much cheaper than disks, access to data is much
slower, because the tape must be accessed sequentially from the beginning
...
In contrast, disk storage is referred to as direct-access storage because it is possible
to read data from any location on disk
...
Tape jukeboxes are used to hold exceptionally large
collections of data, such as remote-sensing data from satellites, which could
include as much as hundreds of terabytes (1 terabyte = 10^12 bytes), or even a
petabyte (1 petabyte = 10^15 bytes) of data
...
1) according
to their speed and their cost
...
As we move
down the hierarchy, the cost per bit decreases, whereas the access time increases
...
In fact, many early storage devices, including paper tape and core memories, are relegated to museums now that magnetic tape
and semiconductor memory have become faster and cheaper
...
1
Storage-device hierarchy
...
...
Today, almost all active data are stored on disks, except in rare cases
where they are stored on tape or in optical jukeboxes
...
The media in the next level in the hierarchy — for example,
magnetic disks — are referred to as secondary storage, or online storage
...
In addition to the speed and cost of the various storage systems, there is also the
issue of storage volatility
...
In the hierarchy shown in Figure 11
...
In the absence of expensive battery and generator backup systems, data
must be written to nonvolatile storage for safekeeping
...
11
...
Disk capacities have been growing at over 50 percent per year, but the storage requirements of large applications have also been growing very fast, in some cases even
faster than the growth rate of disk capacities
...
11
...
1 Physical Characteristics of Disks
Physically, disks are relatively simple (Figure 11
...
Each disk platter has a flat circular shape
...
Platters are made from rigid metal or glass and are covered (usually on both sides) with magnetic recording material
...
When the disk is in use, a drive motor spins it at a constant high speed (usually 60,
90, or 120 revolutions per second, but disks running at 250 revolutions per second are
available)
...
The disk surface is logically divided into tracks, which are subdivided into sectors
...
In currently available disks, sector sizes are typically 512 bytes; there are over
16,000 tracks on each platter, and 2 to 4 platters per disk
...
The
numbers above vary among different models; higher-capacity models usually have
more sectors per track and more tracks on each platter
...
There may be hundreds of
concentric tracks on a disk surface, containing thousands of sectors
...
[Diagram labels: spindle, track t, sector s, cylinder c, platter, read-write head, arm, arm assembly, rotation]
Figure 11
...
Each side of a platter of a disk has a read-write head, which moves across the
platter to access different tracks
...
The disk platters mounted on a spindle and the heads mounted
on a disk arm are together known as head-disk assemblies
...
Hence, the
ith tracks of all the platters together are called the ith cylinder
...
They have
a lower cost and faster seek times (due to smaller seek distances) than do the larger-diameter disks (up to 14 inches) that were common earlier, yet they provide high
storage capacity
...
The read-write heads are kept as close as possible to the disk surface to increase
the recording density
...
Because
the head floats so close to the surface, platters must be machined carefully to be flat
...
If the head contacts the disk surface, the head can
scrape the recording medium off the disk, destroying the data that had been there
...
Under normal circumstances, a head crash results in failure of the entire disk, which
must then be replaced
...
...
They are much less susceptible to failure by head crashes
than the older oxide-coated disks
...
This arrangement allows the
computer to switch from track to track quickly, without having to move the head assembly, but because of the large number of heads, the device is extremely expensive
...
Fixed-head disks and multiple-arm disks were
used in high-performance mainframe systems, but are no longer in production
...
It accepts high-level commands to read or write a sector, and
initiates actions, such as moving the disk arm to the right track and actually reading
or writing the data
...
When the sector is
read back, the controller computes the checksum again from the retrieved data and
compares it with the stored checksum; if the data are corrupted, with a high probability the newly computed checksum will not match the stored checksum
...
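The checksum check performed by the controller can be sketched as follows. The text does not say which checksum function a controller uses; CRC-32 from Python's zlib module is an assumption chosen for illustration, and the function names write_sector and read_sector are hypothetical.

```python
import zlib

def write_sector(data: bytes):
    """Store the data together with a checksum computed from it."""
    return data, zlib.crc32(data)

def read_sector(stored):
    """Recompute the checksum on read and compare it with the stored one."""
    data, checksum = stored
    if zlib.crc32(data) != checksum:
        raise IOError("checksum mismatch: sector is probably corrupted")
    return data

stored = write_sector(b"account A-401 balance 500")
print(read_sector(stored))

# Simulate corruption of one byte on the medium.
corrupted = (b"account A-401 balance 900", stored[1])
try:
    read_sector(corrupted)
except IOError as err:
    print("read failed:", err)
```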
Another interesting task that disk controllers perform is remapping of bad sectors
...
The remapping is noted on disk or in nonvolatile memory, and the write is
carried out on the new location
...
3 shows how disks are connected to a computer system
...
In modern disk systems, lower-level functions of the disk controller, such as control of the disk arm, computing and verification of checksums, and
remapping of bad sectors, are implemented within the disk drive unit
...
3
Disk subsystem
...
disks to personal computers and workstations
...
While disks are usually connected directly by cables to the disk controller, they can
be situated remotely and connected by a high-speed network to the disk controller
...
The disks are usually
organized locally using redundant arrays of independent disks (RAID) storage organizations, but the RAID organization may be hidden from the server computers:
the disk subsystems pretend each RAID system is a very large and very reliable disk
...
Remote access to
disks across a storage area network means that disks can be shared by multiple computers, which could run different parts of an application in parallel
...
11
...
2 Performance Measures of Disks
The main measures of the qualities of a disk are capacity, access time, data-transfer
rate, and reliability
...
To access (that is, to read or write) data on a given sector of a disk,
the arm first must move so that it is positioned over the correct track, and then must
wait for the sector to appear under it as the disk rotates
...
Typical seek times range from 2 to 30 milliseconds, depending on how far the
track is from the initial arm position
...
The average seek time is the average of the seek times, measured over a sequence
of (uniformly distributed) random requests
...
Taking these factors into account, the average seek time is around one-half of
the maximum seek time
...
Once the seek has completed, the time spent waiting for the sector to be accessed
to appear under the head is called the rotational latency time
...
1 milliseconds per rotation
...
Thus, the
average latency time of the disk is one-half the time for a full rotation of the disk
...
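The relationship between rotational speed and average latency can be checked with a short calculation. The speeds used are the ones quoted earlier in the chapter (60, 90, 120, and 250 revolutions per second); the function names are illustrative.

```python
def rotation_time_ms(revs_per_second):
    """Time for one full rotation, in milliseconds."""
    return 1000.0 / revs_per_second

def average_latency_ms(revs_per_second):
    """On average the desired sector is half a rotation away."""
    return rotation_time_ms(revs_per_second) / 2

for rps in (60, 90, 120, 250):
    print(rps, "rev/s:",
          round(rotation_time_ms(rps), 2), "ms per rotation,",
          round(average_latency_ms(rps), 2), "ms average latency")
```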
Once the first sector of the data to be accessed has come under
the head, data transfer begins
...
Current disk systems claim to support maximum
transfer rates of about 25 to 40 megabytes per second, although actual transfer rates
may be significantly less, at about 4 to 8 megabytes per second
...
The mean time to failure of a disk (or
of any other system) is the amount of time that, on average, we can expect the system
to run continuously without any failure
...
4 to 136
years
...
A
mean time to failure of 1,200,000 hours does not imply that the disk can be expected
to function for 136 years! Most disks have an expected life span of about 5 years, and
have significantly higher rates of failure once they become more than a few years old
...
The widely used ATA-4 interface standard (also called Ultra-DMA) supports 33 megabytes per second transfer
rates, while ATA-5 supports 66 megabytes per second
...
The transfer rate of the interface is
shared between all disks attached to the interface
...
2
...
Each request specifies the address on the
disk to be referenced; that address is in the form of a block number
...
Block sizes range from
512 bytes to several kilobytes
...
The lower levels of the file-system manager convert block addresses
into the hardware-level cylinder, surface, and sector number
...
One such technique, buffering of blocks
in memory to satisfy future requests, is discussed in Section 11
...
Here, we discuss
several other techniques
...
If several blocks from a cylinder need to be transferred from disk
to main memory, we may be able to save access time by requesting the blocks
in the order in which they will pass under the heads
...
Disk-arm-scheduling algorithms
attempt to order accesses to tracks in a fashion that increases the number of
accesses that can be processed
...
Suppose that,
initially, the arm is moving from the innermost track toward the outside of
the disk
...
is an access request, the arm stops at that track, services requests for the track,
and then continues moving outward until there are no waiting requests for
tracks farther out
...
Now, it reverses direction and starts a new cycle
...
• File organization
...
For example, if we expect a file to be accessed sequentially, then we should
ideally keep all the blocks of the file sequentially on adjacent cylinders
...
However, this control places a burden on the programmer or system administrator to decide, for example, how
many cylinders to allocate for a file, and may require costly reorganization if
data are inserted into or deleted from the file
...
However, over time, a sequential file may become fragmented;
that is, its blocks become scattered all over the disk
...
The restore operation writes back the blocks of each file contiguously (or
nearly so)
...
The performance increases realized from these techniques can
be large, but the system is generally unusable while these utilities operate
...
Since the contents of main memory are lost in
a power failure, information about database updates has to be recorded on
disk to survive possible system crashes
...
We can use nonvolatile random-access memory (NV-RAM) to speed up
disk writes drastically
...
A common way to implement nonvolatile RAM is to use battery-backed-up RAM
...
The controller writes the data to
their destination on disk whenever the disk does not have any other requests,
or when the nonvolatile RAM buffer becomes full
...
...
On recovery from a system crash, any pending buffered writes in the
nonvolatile RAM are written back to the disk
...
Assume that write requests are received in a random fashion, with the disk
being busy on average 90 percent of the time
...
Doubling the buffer to 100 blocks results in approximately only one write per hour
finding the buffer to be full
...
• Log disk
...
All access to the log disk is sequential, essentially
eliminating seek time, and several consecutive blocks can be written at once,
making writes to the log disk several times faster than random writes
...
Furthermore, the log disk can reorder the writes to
minimize disk arm movement
...
File systems that support log disks as above are called journaling file systems
...
Doing so reduces the monetary cost, at the expense of lower performance
...
Data are not written back to their original destination on disk; instead, the
file system keeps track of where in the log disk the blocks were written most
recently, and retrieves them from that location
...
This approach improves write performance, but generates a high
degree of fragmentation for files that are updated often
...
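Returning to the disk-arm scheduling discussion earlier in this list, the arm movement it describes (keep moving in one direction, servicing requests as tracks are reached, then reverse) can be sketched for a fixed batch of pending requests. This is a simplified illustration: it assumes that larger track numbers are farther out and that no new requests arrive during the sweep, and the function name is hypothetical.

```python
def elevator_order(pending, start_track, moving_outward=True):
    """Return the order in which pending track requests are serviced.

    Assumes higher track numbers are farther toward the outside of the disk.
    """
    tracks = sorted(set(pending))
    if moving_outward:
        first = [t for t in tracks if t >= start_track]   # ascending sweep
        second = [t for t in tracks if t < start_track]
        return first + second[::-1]                       # then reverse
    first = [t for t in tracks if t <= start_track]
    second = [t for t in tracks if t > start_track]
    return first[::-1] + second                           # descending, then back

# Hypothetical request queue with the arm at track 53, moving outward.
print(elevator_order([98, 183, 37, 122, 14, 124, 65, 67], 53))
```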
11
...
1
...
The exact arrival rate
and rate of service are not needed since the disk utilization provides enough information for our calculations
...
11
...
3 RAID
Having a large number of disks in a system presents opportunities for improving
the rate at which data can be read or written, if the disks are operated in parallel
...
Furthermore, this setup offers the potential for improving the reliability of data storage, because redundant information can be stored on multiple disks
...
A variety of disk-organization techniques, collectively called redundant arrays of
independent disks (RAID), have been proposed to achieve improved performance
and reliability
...
In fact, the I in RAID,
which now stands for independent, originally stood for inexpensive
...
RAID systems are used for their higher reliability and higher performance
rate, rather than for economic reasons
...
3
...
The chance that some disk out of a set of N disks will
fail is much higher than the chance that a specific single disk will fail
...
Then,
the mean time to failure of some disk in an array of 100 disks will be 100,000 / 100 =
1000 hours, or around 42 days, which is not long at all! If we store only one copy of
the data, then each disk failure will result in loss of a significant amount of data (as
discussed in Section 11
...
1)
...
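The arithmetic behind this estimate is simple enough to check directly. The sketch below divides a single disk's mean time to failure by the number of disks, which relies on the same simplification the text makes; the function name is illustrative.

```python
def array_mttf_hours(single_disk_mttf_hours, num_disks):
    """Mean time until some disk in the array fails."""
    return single_disk_mttf_hours / num_disks

hours = array_mttf_hours(100_000, 100)
days = hours / 24
print(hours, "hours, or about", round(days, 1), "days")
```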
The solution to the problem of reliability is to introduce redundancy; that is, we
store extra information that is not needed normally, but that can be used in the event
of failure of a disk to rebuild the lost information
...
The simplest (but most expensive) approach to introducing redundancy is to duplicate every disk
...
A
logical disk then consists of two physical disks, and every write is carried out on both
disks
...
Data will be lost
only if the second disk fails before the first failed disk is repaired
...
Suppose that the failures of the two disks are independent;
that is, there is no connection between the failure of one disk and the failure of the
other
...
)
You should be aware that the assumption of independence of disk failures is not
valid
...
As disks age, the probability of
failure increases, increasing the chance that a second disk will fail while the first is
being repaired
...
Mirrored-disk systems with
mean time to data loss of about 500,000 to 1,000,000 hours, or 55 to 110 years, are
available today
...
Power failures are not a concern if there is no data
transfer to disk in progress when they occur
...
The solution
to this problem is to write one copy first, then the next, so that one of the two copies
is always consistent
...
This matter is examined in Exercise 11
...
11
...
2 Improvement in Performance via Parallelism
Now let us consider the benefit of parallel access to multiple disks
...
The transfer rate of each read is the same as in a single-disk system,
but the number of reads per unit time has doubled
...
In its simplest form, data striping consists of splitting
the bits of each byte across multiple disks; such striping is called bit-level striping
...
The array of eight disks can be treated as a single disk with sectors that are eight
times the normal size, and, more important, that has eight times the transfer rate
...
Bit-level striping can be generalized to a number of disks that either is a
multiple of 8 or a factor of 8
...
Block-level striping stripes blocks across multiple disks
...
With an array of n disks, block-level striping assigns logical block i
of the disk array to disk (i mod n) + 1; it uses the ⌊i/n⌋th physical block of the disk
to store logical block i
...
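The block-to-disk mapping can be written out as a small function. The sketch assumes that logical blocks and physical blocks are both numbered from 0 and that disks are numbered from 1 to n, which is consistent with the formula above but is not stated explicitly in the text.

```python
def locate_block(i, n):
    """Map logical block i of an n-disk array to (disk, physical block)."""
    disk = (i % n) + 1          # disks are numbered 1..n
    physical_block = i // n     # floor of i/n
    return disk, physical_block

# With 4 disks, consecutive logical blocks rotate across the disks.
for i in range(8):
    print(i, locate_block(i, 4))
```

Because consecutive logical blocks land on different disks, a large sequential read can keep all n disks busy at the same time.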
When
reading a large file, block-level striping fetches n blocks at a time in parallel from the
n disks, giving a high data transfer rate for large reads
...
...
Other levels
of striping, such as bytes of a sector or sectors of a block also are possible
...
Load-balance multiple small accesses (block accesses), so that the throughput
of such accesses increases
...
Parallelize large accesses so that the response time of large accesses is reduced
...
3
...
Striping provides high data-transfer rates, but does not improve reliability
...
These schemes have different cost-performance trade-offs
...
4
...
) For all levels,
the figure depicts four disks’ worth of data, and the extra disks depicted are used to
store redundant information for failure recovery
...
Figure 11
...
• RAID level 1 refers to disk mirroring with block striping
...
4b shows
a mirrored organization that holds four disks’ worth of data
...
Memory systems have long used parity bits for error
detection and correction
...
If one of the bits in the byte
gets damaged (either a 1 becomes a 0, or a 0 becomes a 1), the parity of the
byte changes and thus will not match the stored parity
...
Thus, all 1-bit
errors will be detected by the memory system
...
The idea of error-correcting codes can be used directly in disk arrays by
striping bytes across disks
...
Figure 11
...
The disks labeled P store the error-correction bits
...
Figure 11
...
(a) RAID 0: nonredundant striping
(b) RAID 1: mirrored disks
(c) RAID 2: memory-style error-correcting codes
(d) RAID 3: bit-interleaved parity
(e) RAID 4: block-interleaved parity
(f) RAID 5: block-interleaved distributed parity
(g) RAID 6: P + Q redundancy
Figure 11
...
• RAID level 3, bit-interleaved parity organization, improves on level 2 by
exploiting the fact that disk controllers, unlike memory systems, can detect
whether a sector has been read correctly, so a single parity bit can be used for
error correction, as well as for detection
...
If one of the
sectors gets damaged, the system knows exactly which sector it is, and, for
each bit in the sector, the system can figure out whether it is a 1 or a 0 by computing the parity of the corresponding bits from sectors in the other disks
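The recovery step described here amounts to an exclusive-or over the surviving sectors and the parity sector. The sketch below illustrates that idea under stated assumptions: sectors are modeled as equal-length byte strings, and the names `xor_blocks` and `rebuild` are hypothetical helpers, not part of any RAID product.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bytewise XOR of a list of equally sized byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild(surviving, parity):
    """Recompute the one damaged sector from the surviving sectors and parity."""
    return xor_blocks(list(surviving) + [parity])


if __name__ == "__main__":
    data = [b"\x0f\xa0", b"\x33\x01", b"\xf0\x10", b"\x55\xaa"]
    parity = xor_blocks(data)
    lost = 2                                  # the controller reports which sector is bad
    surviving = data[:lost] + data[lost + 1:]
    print(rebuild(surviving, parity) == data[lost])
```

The approach works only because the controller identifies which sector failed; a single parity bit per position cannot locate an error by itself.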
...
Figure 11
...
RAID level 3 has two benefits over level 1
...
Since reads and writes of a byte are
spread out over multiple disks, with N-way striping of data, the transfer rate
for reading or writing a single block is N times faster than a RAID level 1 organization using N-way striping
...
• RAID level 4, block-interleaved parity organization, uses block-level striping,
like RAID 0, and in addition keeps a parity block on a separate disk for corresponding blocks from N other disks
...
4e
...
A block read accesses only one disk, allowing other requests to be processed by the other disks
...
The transfer rate for large reads is high, since all the disks can be
read in parallel; large writes also have high transfer rates, since the data and
parity can be written in parallel
...
A write of a block has to access the disk on which the block is stored,
as well as the parity disk, since the parity block has to be updated
...
Thus, a single
write requires four disk accesses: two to read the two old blocks, and two to
write the two blocks
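The four accesses of a single-block write can be traced in a few lines. This is a sketch under assumptions: disks are modeled as Python lists of byte strings, the function names are hypothetical, and the update rule (new parity = old parity XOR old data XOR new data) is the standard property of XOR parity rather than something spelled out in the surviving text.

```python
def xor_bytes(*blocks: bytes) -> bytes:
    """Bytewise XOR of equally sized blocks."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, value in enumerate(block):
            out[i] ^= value
    return bytes(out)


def small_write(data_disks, parity_disk, disk_no, block_no, new_data):
    """Overwrite one data block and keep the parity block consistent.

    Returns the number of disk accesses performed.
    """
    accesses = 0
    old_data = data_disks[disk_no][block_no]   # read 1: old data block
    accesses += 1
    old_parity = parity_disk[block_no]         # read 2: old parity block
    accesses += 1
    new_parity = xor_bytes(old_parity, old_data, new_data)
    data_disks[disk_no][block_no] = new_data   # write 1: new data block
    accesses += 1
    parity_disk[block_no] = new_parity         # write 2: new parity block
    accesses += 1
    return accesses
```

The other data disks are never touched, which is why the cost stays at four accesses regardless of the number of disks in the array.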
...
In level 5, all disks can participate in satisfying
read requests, unlike RAID level 4, where the parity disk cannot participate,
so level 5 increases the total number of requests that can be met in a given
amount of time
...
Figure 11
...
The P’s are distributed across all the disks
...
The
following table indicates how the first 20 blocks, numbered 0 to 19, and their
parity blocks are laid out
...
P0    0     1     2     3
4     P1    5     6     7
8     9     P2    10    11
12    13    14    P3    15
16    17    18    19    P4
Note that a parity block cannot store parity for blocks in the same disk,
since then a disk failure would result in loss of data as well as of parity, and
hence would not be recoverable
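The layout for blocks 0 to 19 follows a simple rotation, which the sketch below reproduces. It is an illustration inferred from the table, not a statement of how any particular controller places blocks; the function name `raid5_layout` and the 0-based disk indices are assumptions.

```python
def raid5_layout(num_disks: int = 5, num_stripes: int = 5):
    """Return one row per stripe; each row lists what every disk holds.

    Stripe k keeps its parity block Pk on disk (k mod num_disks), and the
    remaining disks take the next num_disks - 1 logical blocks in order.
    """
    rows = []
    block = 0
    for k in range(num_stripes):
        parity_disk = k % num_disks
        row = []
        for disk in range(num_disks):
            if disk == parity_disk:
                row.append(f"P{k}")
            else:
                row.append(str(block))
                block += 1
        rows.append(row)
    return rows


if __name__ == "__main__":
    for row in raid5_layout():
        print("  ".join(cell.ljust(3) for cell in row))
```

Each parity block sits on a disk that holds none of the data blocks it protects, which is the property the note about same-disk parity requires.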
...
• RAID level 6, the P + Q redundancy scheme, is much like RAID level 5, but
stores extra redundant information to guard against multiple disk failures
...
In the scheme in Figure 11
...
Finally, we note that several variations have been proposed to the basic RAID schemes
described here
...
2
However, the terminology we have presented is the most widely used
...
3
...
Rebuilding is easiest for RAID level 1, since data can
be copied from another disk; for the other levels, we need to access all the other
disks in the array to rebuild data of a failed disk
...
Furthermore, since rebuild time can form a
significant part of the repair time, rebuild performance also influences the mean time
to data loss
...
For example, some products use RAID level 1 to refer to mirroring without striping, and level 1+0 or
level 10 to refer to mirroring with striping
...
Since RAID levels 2 and 4 are subsumed by RAID levels 3 and 5, the choice
of RAID levels is restricted to the remaining levels
...
For small transfers, the disk access time dominates anyway, so the benefit of parallel reads diminishes
...
Level 6 is not supported currently by many RAID
implementations, but it offers better reliability than level 5 and can be used in applications where data safety is very important
...
RAID level 1 is
popular for applications such as storage of log files in a database system, since it
offers the best write performance
...
For applications where data are
read frequently, and written rarely, level 5 is the preferred choice
...
As a result, for
many existing database applications with moderate storage requirements, the monetary cost of the extra disk storage needed for mirroring has become relatively small
(the extra monetary cost, however, remains a significant issue for storage-intensive
applications such as video data storage)
...
RAID level 5, which increases the number of I/O operations needed to write a
single logical block, pays a significant time penalty in terms of write performance
...
RAID system designers have to make several other decisions as well
...
If there are more bits protected by a parity bit,
the space overhead due to parity bits is lower, but there is an increased chance that a
second disk will fail before the first failed disk is repaired, and that will result in data
loss
...
3
...
RAID can be implemented with no change at the hardware level, using only software
modification
...
However, there
are significant benefits to be had by building special-purpose hardware to support
RAID, which we outline below; systems with special hardware support are called
hardware RAID systems
...
Without such hardware support, extra
work needs to be done to detect blocks that may have been partially written before
power failure (see Exercise 11
...
Some hardware RAID implementations permit hot swapping; that is, faulty disks
can be removed and replaced by new ones without turning power off
...
In fact many critical systems today
run on a 24 × 7 schedule; that is, they run 24 hours a day, 7 days a week, providing
no time for shutting down and replacing a failed disk
...
If a disk
fails, the spare disk is immediately used as a replacement
...
The failed disk
can be replaced at leisure
...
To avoid this possibility, good RAID implementations have multiple
redundant power supplies (with battery backups so they continue to function even
if power fails)
...
Thus, failure of any single component will not stop the functioning of the
RAID system
...
3
...
When applied
to arrays of tapes, the RAID structures are able to recover data even if one of the tapes
in an array of tapes is damaged
...
11
...
The two most common tertiary storage media are optical disks and magnetic tapes
...
4
...
They have
a fairly large capacity (640 megabytes), and they are cheap to mass-produce
...
Disks in the DVD-5 format can store 4
...
(in one recording layer), while disks in the DVD-9 format can store 8
...
Recording on both sides of a disk yields even larger
capacities; DVD-10 and DVD-18 formats, which are the two-sided versions of DVD-5
and DVD-9, can store 9
...
CD and DVD drives have much longer seek times (100 milliseconds is common)
than do magnetic-disk drives, since the head assembly is heavier
...
Rotational speeds of CD drives originally corresponded to the audio CD standards, and the speeds of DVD drives originally corresponded to the DVD video standards, but current-generation drives rotate
at many times the standard rate
...
Current CD drives
read at around 3 to 6 megabytes per second, and current DVD drives read at 8 to 15
megabytes per second
...
The transfer rate of optical drives is characterized as n×, which means the drive supports transfers at n times the standard
rate; rates of around 50× for CD and 12× for DVD are now common
...
Since they cannot be overwritten, they can be used
to store information that should not be modified, such as audit trails
...
Jukeboxes are devices that store a large number of optical disks (up to several hundred) and load them automatically on demand to one of a small number (usually, 1 to
10) of drives
...
When a disk is accessed, it is loaded by a mechanical arm from a rack onto a drive
(any disk that was already in the drive must first be placed back on the rack)
...
11
...
2 Magnetic Tapes
Although magnetic tapes are relatively permanent, and can hold large volumes of
data, they are slow in comparison to magnetic and optical disks
...
Thus, they cannot provide random access for secondary-storage requirements, although historically, prior to the
use of magnetic disks, tapes were used as a secondary-storage medium
...
Tapes are also used for storing large volumes of data, such as video or image data,
that either do not need to be accessible quickly or are so voluminous that magnetic-disk storage would be too expensive
...
Moving to the correct spot on a tape can take seconds or even minutes, rather than
milliseconds; once positioned, however, tape drives can write data at densities and
speeds approaching those of disk drives
...
The market is currently fragmented among a wide variety of tape formats
...
Data transfer rates are of the order of a few to tens of megabytes
per second
...
Tapes, however, have
limits on the number of times that they can be read or written reliably
...
Most other tape formats provide larger capacities, at the cost of slower access;
such formats are ideal for data backup, where fast seeks are not important
...
Applications that need such enormous data
storage include imaging systems that gather data by remote-sensing satellites, and
large video libraries for television broadcasters
...
5 Storage Access
A database is mapped into a number of different files, which are maintained by the
underlying operating system
...
Each file is partitioned into fixed-length storage units called blocks, which
are the units of both storage allocation and data transfer
...
6 various ways to organize the data logically in files
...
The exact set of data items that a block
contains is determined by the form of physical data organization being used (see
Section 11
...
We shall assume that no data item spans two or more blocks
...
A major goal of the database system is to minimize the number of block transfers
between the disk and memory
...
The goal is to maximize the chance
that, when a block is accessed, it is already in main memory, and, thus, no disk access
is required
...
The buffer
is that part of main memory available for storage of copies of disk blocks
...
of the block older than the version in the buffer
...
11
...
1 Buffer Manager
Programs in a database system make requests (that is, calls) on the buffer manager
when they need a block from disk
...
If the block is
not in the buffer, the buffer manager first allocates space in the buffer for the block,
throwing out some other block, if necessary, to make space for the new block
...
Then, the buffer manager reads in the requested block from the disk to the buffer, and passes the address of the block in main
memory to the requester
...
If you are familiar with operating-system concepts, you will note that the buffer
manager appears to be nothing more than a virtual-memory manager, like those
found in most operating systems
...
Further, to serve the database system
well, the buffer manager must use techniques more sophisticated than typical virtual-memory management schemes:
• Buffer replacement strategy
...
Most operating systems use a least recently used (LRU) scheme, in which the block that
was referenced least recently is written back to disk and is removed from the
buffer
...
• Pinned blocks
...
For instance, most recovery systems require that a block should
not be written to disk while an update on the block is in progress
...
Although many
operating systems do not support pinned blocks, such a feature is essential for
a database system that is resilient to crashes
...
There are situations in which it is necessary to write
back the block to disk, even though the buffer space that it occupies is not
needed
...
We shall see the
reason for forced output in Chapter 17; briefly, main-memory contents and
thus buffer contents are lost in a crash, whereas data on disk usually survive
a crash
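The three techniques above (a replacement strategy, pinned blocks, and forced output) can be combined in a small model. The following is a minimal sketch, not the design of any real database system: the class `BufferManager`, its method names, and the use of a dictionary as a stand-in for the disk are all assumptions made for illustration, and LRU is used only because it is the scheme the text names.

```python
from collections import OrderedDict


class BufferManager:
    def __init__(self, capacity, disk):
        self.capacity = capacity
        self.disk = disk              # dict: block id -> bytes (stand-in for the disk)
        self.frames = OrderedDict()   # block id -> bytearray, least recently used first
        self.pinned = set()
        self.dirty = set()

    def get(self, block_id):
        """Return the buffered copy of a block, reading it from disk if needed."""
        if block_id in self.frames:
            self.frames.move_to_end(block_id)   # mark as most recently used
            return self.frames[block_id]
        if len(self.frames) >= self.capacity:
            self._evict()
        self.frames[block_id] = bytearray(self.disk[block_id])
        return self.frames[block_id]

    def pin(self, block_id):
        self.pinned.add(block_id)

    def unpin(self, block_id):
        self.pinned.discard(block_id)

    def mark_dirty(self, block_id):
        self.dirty.add(block_id)

    def force_output(self, block_id):
        """Write a modified block back to disk without evicting it."""
        if block_id in self.dirty:
            self.disk[block_id] = bytes(self.frames[block_id])
            self.dirty.discard(block_id)

    def _evict(self):
        for victim in self.frames:              # oldest reference first
            if victim not in self.pinned:
                self.force_output(victim)       # write back only if modified
                del self.frames[victim]
                return
        raise RuntimeError("all buffered blocks are pinned")
```

A caller would pin a block before updating it, mark it dirty afterwards, and unpin it once the update is complete, so that a half-updated block is never chosen as the eviction victim.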
...
5
...
For general-purpose programs, it is not possible to predict accurately
for each tuple b of borrower do
    for each tuple c of customer do
        if b[customer-name] = c[customer-name]
        then begin
            let x be a tuple defined as follows:
                x[customer-name] := b[customer-name]
                x[loan-number] := b[loan-number]
                x[customer-street] := c[customer-street]
                x[customer-city] := c[customer-city]
            include tuple x as part of result of borrower ⋈ customer
        end
    end
end
Figure 11
...
which blocks will be referenced
...
The assumption generally made
is that blocks that have been referenced recently are likely to be referenced again
...
This approach is called the least recently used (LRU) block-replacement scheme
...
However, a database system is able to predict the pattern of future references more accurately than an
operating system
...
The
database system is often able to determine in advance which blocks will be needed by
looking at each of the steps required to perform the user-requested operation
...
To illustrate how information about future block access allows us to improve the
LRU strategy, consider the processing of the relational-algebra expression
borrower ⋈ customer
Assume that the strategy chosen to process this request is given by the pseudocode
program shown in Figure 11
...
(We shall study other strategies in Chapter 13
...
In this
example, we can see that, once a tuple of borrower has been processed, that tuple is not
needed again
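The access pattern is easier to see when the pseudocode is run. The sketch below is a direct Python rendering of the nested loops, with tuples modeled as dictionaries; the function name and the sample tuples (including the loan numbers) are made up for illustration.

```python
def nested_loop_join(borrower, customer):
    """Join borrower and customer on customer-name with two nested loops."""
    result = []
    for b in borrower:            # outer relation: each tuple is used once
        for c in customer:        # inner relation: rescanned for every outer tuple
            if b["customer-name"] == c["customer-name"]:
                result.append({
                    "customer-name": b["customer-name"],
                    "loan-number": b["loan-number"],
                    "customer-street": c["customer-street"],
                    "customer-city": c["customer-city"],
                })
    return result


# Hypothetical sample data.
borrower = [
    {"customer-name": "Hayes", "loan-number": "L-1"},
    {"customer-name": "Turner", "loan-number": "L-2"},
    {"customer-name": "Hayes", "loan-number": "L-3"},
]
customer = [
    {"customer-name": "Hayes", "customer-street": "Main", "customer-city": "Brooklyn"},
    {"customer-name": "Turner", "customer-street": "Putnam", "customer-city": "Stamford"},
]

if __name__ == "__main__":
    for row in nested_loop_join(borrower, customer):
        print(row)
```

The outer loop never returns to an earlier borrower tuple, while the inner loop walks the whole of customer again for each one; that asymmetry is what the buffer manager can exploit.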
...
The buffer manager should be instructed to free the space occupied by a
borrower block as soon as the final tuple has been processed
...
Now consider blocks containing customer tuples
...
When processing of
a customer block is completed, we know that that block will not be accessed again
until all other customer blocks have been processed
...
customer block is the block that will be referenced next
...
Indeed, the optimal
strategy for block replacement is the most recently used (MRU) strategy
...
For the MRU strategy to work correctly for our example, the system must pin the
customer block currently being processed
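A small simulation makes the difference between the two strategies concrete. This is a sketch under simplifying assumptions: only customer blocks are tracked, the buffer holds fewer frames than there are customer blocks, and a block counts as unpinned (and therefore evictable) by the time the next block is requested. The function name `count_disk_reads` is hypothetical.

```python
def count_disk_reads(references, capacity, policy):
    """Count block reads from disk for a reference string under LRU or MRU."""
    buffer = []   # ordered from least recently used to most recently used
    reads = 0
    for block in references:
        if block in buffer:
            buffer.remove(block)                 # hit: refresh its position below
        else:
            reads += 1
            if len(buffer) == capacity:
                # LRU evicts the oldest entry; MRU evicts the newest one.
                buffer.pop(0 if policy == "LRU" else -1)
        buffer.append(block)
    return reads


if __name__ == "__main__":
    # Four customer blocks scanned three times (once per borrower tuple),
    # with room for only three of them in the buffer.
    scans = list(range(4)) * 3
    print("LRU reads:", count_disk_reads(scans, 3, "LRU"))
    print("MRU reads:", count_disk_reads(scans, 3, "MRU"))
```

On this cyclic scan LRU always evicts exactly the block that will be needed soonest, so every reference goes to disk, whereas MRU keeps most of the relation resident between scans.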
...
In addition to using knowledge that the system may have about the request being
processed, the buffer manager can use statistical information about the probability
that a request will reference a particular relation
...
8) keeps track of the logical schema of the
relations as well as their physical storage information is one of the most frequently
accessed parts of the database
...
In Chapter 12, we discuss indices for files
...
The ideal database block-replacement strategy needs knowledge of the database
operations— both those being performed and those that will be performed in the
future
...
Indeed, a surprisingly large number of database systems use LRU, despite that strategy’s faults
...
The strategy that the buffer manager uses for block replacement is influenced by
factors other than the time at which the block will be referenced again
...
If the buffer manager is given information from the concurrency-control subsystem indicating which requests are being delayed, it can use this information to alter its block-replacement strategy
...
The crash-recovery subsystem (Chapter 17) imposes stringent constraints on block
replacement
...
Instead, the block manager must seek permission from the crash-recovery subsystem before writing out a block
...
In Chapter 17, we define precisely the
interaction between the buffer manager and the crash-recovery subsystem
...
6 File Organization
A file is organized logically as a sequence of records
...
Files are provided as a basic construct in operating systems, so we shall
record 0    A-102    Perryridge    400
record 1    A-305    Round Hill    350
record 2    A-215    Mianus        700
record 3    A-101    Downtown      500
record 4    A-222    Redwood       700
record 5    A-201    Perryridge    900
record 6    A-217    Brighton      750
record 7    A-110    Downtown      600
record 8    A-218    Perryridge    700
Figure 11
...
assume the existence of an underlying file system
...
Although blocks are of a fixed size determined by the physical properties of the
disk and by the operating system, record sizes vary
...
One approach to mapping the database to files is to use several files, and to store
records of only one fixed length in any given file
...
Many
of the techniques used for the former can be applied to the variable-length case
...
11
...
1 Fixed-Length Records
As an example, let us consider a file of account records for our bank database
...
A simple approach is to use the first 40 bytes
for the first record, the next 40 bytes for the second record, and so on (Figure 11
...
However, there are two problems with this simple approach:
1
...
The space occupied by the
record to be deleted must be filled with some other record of the file, or we
must have a way of marking deleted records so that they can be ignored
...
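record 0    A-102    Perryridge    400
record 1    A-305    Round Hill    350
record 3    A-101    Downtown      500
record 4    A-222    Redwood       700
record 5    A-201    Perryridge    900
record 6    A-217    Brighton      750
record 7    A-110    Downtown      600
record 8    A-218    Perryridge    700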
Figure 11
...
6, with record 2 deleted and all records moved
...
Unless the block size happens to be a multiple of 40 (which is unlikely), some
records will cross block boundaries
...
It would thus require two block accesses to
read or write such a record
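The boundary problem is easy to check numerically. The sketch below assumes records are packed back to back from byte 0 of the file, uses the 40-byte record size from the example, and picks a 4096-byte block size purely for illustration; the function names are hypothetical.

```python
RECORD_SIZE = 40  # bytes per account record, as in the example


def blocks_touched(record_no, block_size, record_size=RECORD_SIZE):
    """Return (first block, last block) occupied by the given record."""
    first_byte = record_no * record_size
    last_byte = first_byte + record_size - 1
    return first_byte // block_size, last_byte // block_size


def crosses_boundary(record_no, block_size, record_size=RECORD_SIZE):
    first, last = blocks_touched(record_no, block_size, record_size)
    return first != last


if __name__ == "__main__":
    block_size = 4096  # assumed block size; not a multiple of 40
    spanning = [r for r in range(500) if crosses_boundary(r, block_size)]
    print("records that span two blocks:", spanning)
```

A common remedy is to store only as many whole records in a block as will fit and to leave the remaining bytes unused.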
...
7)
...
It might be easier simply to move the
final record of the file into the space occupied by the deleted record (Figure 11
...
It is undesirable to move records to occupy the space freed by a deleted record,
since doing so requires additional block accesses
...
A simple
marker on a deleted record is not sufficient, since it is hard to find this available space
when an insertion is being done
...
At the beginning of the file, we allocate a certain number of bytes as a file header
...
For now, all we need
to store there is the address of the first record whose contents are deleted
...
8
A-102
A-305
A-217
A-110
Brighton
Downtown
750
600
File of Figure 11
...
header
record 0    A-102    Perryridge    400
record 1    (deleted)
record 2    A-215    Mianus        700
record 3    A-101    Downtown      500
record 4    (deleted)
record 5    A-201    Perryridge    900
record 6    (deleted)
record 7    A-110    Downtown      600
record 8    A-218    Perryridge    700
Figure 11
...
6, with free list after deletion of records 1, 4, and 6
...
Intuitively,
we can think of these stored addresses as pointers, since they point to the location of
a record
...
Figure 11
...
6, with the free list, after records 1, 4,
and 6 have been deleted
...
We
change the header pointer to point to the next available record
...
Insertion and deletion for files of fixed-length records are simple to implement,
because the space made available by a deleted record is exactly the space needed to
insert a record
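A free list for fixed-length records takes only a few lines. The following is a minimal in-memory sketch, with slots in a Python list standing in for record positions in the file; the class names are assumptions, and pushing each newly deleted slot onto the front of the list is one possible ordering, chosen here for simplicity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FreeSlot:
    """Marks a deleted record and points to the next deleted one."""
    next_free: Optional[int]


class FixedLengthFile:
    def __init__(self):
        self.free_head: Optional[int] = None   # file header: first deleted record
        self.slots: list = []

    def insert(self, record) -> int:
        """Reuse the first free slot if there is one; otherwise append."""
        if self.free_head is None:
            self.slots.append(record)
            return len(self.slots) - 1
        slot = self.free_head
        self.free_head = self.slots[slot].next_free   # header now points to next free slot
        self.slots[slot] = record
        return slot

    def delete(self, slot: int) -> None:
        """Link the freed slot into the free list."""
        if isinstance(self.slots[slot], FreeSlot):
            raise ValueError("record already deleted")
        self.slots[slot] = FreeSlot(self.free_head)
        self.free_head = slot
```

Neither operation moves any other record, so each insertion or deletion touches only the header and the one slot involved.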
...
An inserted record may not fit in the space left free by a deleted record, or it
may fill only part of that space
...
6
...
For purposes of
illustration, we shall use one example to demonstrate the various implementation
techniques
...
6, in which we use one variable-length record for each
branch name and for all the account information for that branch
...
type account-list = record
branch-name : char (22);
account-info : array [1
...
That is,
the type definition does not limit the number of elements in the array, although any
actual record will have a specific number of elements in its array
...
11
...
2
...
We can then store each record as a
string of consecutive bytes
...
10 shows such an organization to represent the
file of fixed-length records of Figure 11
...
An alternative
version of the byte-string representation stores the record length at the beginning of
each record, instead of using end-of-record symbols
...
10 has some disadvantages:
• It is not easy to reuse space occupied formerly by a deleted record
...
• There is no space, in general, for records to grow longer
...
g
...
Thus, the basic byte-string representation described here is not usually used for implementing variable-length records
...
10
Byte-string representation of variable-length records
...
11
Slotted-page structure
...
The slotted-page structure appears in Figure 11
...
There is a header at the beginning of each block, containing the following information:
1
...
The end of free space in the block
3
...
The free space in the block is contiguous, between the final entry in the
header array, and the first record
...
If a record is deleted, the space that it occupies is freed, and its entry is set to
deleted (its size is set to −1, for example)
...
The end-of-free-space pointer in the header is appropriately updated as well
...
The cost of moving the records is not too high, since the size of a block is
limited: A typical value is 4 kilobytes
...
Instead, pointers must point to the entry in the header that contains the
actual location of the record
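The slotted-page behavior described above can be modeled compactly. This is a sketch under assumptions, not a production page layout: the header entries are kept in a Python list instead of inside the block, header space is not charged against the block size, and the class and method names are hypothetical.

```python
class SlottedPage:
    def __init__(self, size: int = 4096):
        self.data = bytearray(size)
        self.entries = []        # header array: (offset, size); size -1 marks a deleted entry
        self.free_end = size     # end of free space; records occupy data[free_end:]

    def insert(self, record: bytes) -> int:
        """Place the record at the end of free space and return its slot number."""
        if len(record) > self.free_end:
            raise ValueError("not enough free space in block")
        self.free_end -= len(record)
        self.data[self.free_end:self.free_end + len(record)] = record
        self.entries.append((self.free_end, len(record)))
        return len(self.entries) - 1

    def read(self, slot: int) -> bytes:
        offset, size = self.entries[slot]
        if size == -1:
            raise KeyError("record deleted")
        return bytes(self.data[offset:offset + size])

    def delete(self, slot: int) -> None:
        """Free the record and slide lower-addressed records up to close the gap."""
        offset, size = self.entries[slot]
        if size == -1:
            raise KeyError("record deleted")
        start = self.free_end
        self.data[start + size:offset + size] = self.data[start:offset]
        for i, (o, s) in enumerate(self.entries):
            if s != -1 and o < offset:
                self.entries[i] = (o + size, s)   # moved record: update its header entry
        self.entries[slot] = (0, -1)
        self.free_end += size
```

Callers hold slot numbers rather than byte offsets, so records can be moved within the block without invalidating any outside pointer.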
...
11
...
2
...
There are two ways of doing this:
1
...
If there is a maximum record length that is never exceeded,
we can use fixed-length records of that length
...
0    Perryridge    A-102    400    A-201    900    A-218    700
1    Round Hill    A-305    350    ⊥        ⊥      ⊥        ⊥
2    Mianus        A-215    700    ⊥        ⊥      ⊥        ⊥
3    Downtown      A-101    500    A-110    600    ⊥        ⊥
4    Redwood       A-222    700    ⊥        ⊥      ⊥        ⊥
5    Brighton      A-217    750    ⊥        ⊥      ⊥        ⊥
Figure 11
...
10, using the reserved-space method
...
2
...
We can represent variable-length records by lists of fixed-length records, chained together by pointers
...
Figure 11
...
10
would be represented if we allowed a maximum of three accounts per branch
...
Those branches with fewer than three accounts (for example, Round
Hill) have records with null fields
...
12
...
The reserved-space method is useful when most records have a length close to
the maximum
...
In our bank
example, some branches may have many more accounts than others
...
To represent the file by the linked list
method, we add a pointer field as we did in Figure 11
...
The resulting structure appears in Figure 11
...
0
1
2
3
4
5
6
7
Perryridge
Round Hill
Mianus
Downtown
Redwood
8
Figure 11
...
10 using linked lists
...
11
...
14
anchor block:      A-102 400    A-305 350    A-215 700    A-101 500    A-222 700    A-217 750
overflow block:    A-201 900    A-218 700    A-110 600
Anchor-block and overflow-block structures
...
9 and 11
...
9, we use pointers to chain together only deleted records, whereas
in Figure 11
...
A disadvantage to the structure of Figure 11
...
The first record needs to have the branch-name value, but
subsequent records do not
...
This wasted space is significant,
since we expect, in practice, that each branch has a large number of accounts
...
Anchor block, which contains the first record of a chain
2
...
Figure 11
...
11
...
An instance
of a relation is a set of records
...
Several of the possible ways of organizing records in files are:
• Heap file organization
...
There is no ordering of records
...
Records are stored in sequential order, according to the value of a “search key” of each record
...
7
...
A hash function is computed on some attribute of
each record
...
Chapter 12 describes this organization; it is
closely related to the indexing structures described in that chapter
...
However,
in a clustering file organization, records of several different relations are stored in
the same file; further, related records of the different relations are stored on the same
block, so that one I/O operation fetches related records from all the relations
...
Section 11
...
2 describes this organization
...
7
...
A search key is any attribute or set of attributes; it need not be
the primary key, or even a superkey
...
The pointer in each record points to
the next record in search-key order
...
Figure 11
...
In that example, the records are stored in search-key order, using branch-name as the search key
...
It is difficult, however, to maintain physical sequential order as records are inserted and deleted, since it is costly to move many records as a result of a single
A-217    Brighton      750
A-101    Downtown      500
A-110    Downtown      600
A-215    Mianus        700
A-102    Perryridge    400
A-201    Perryridge    900
A-218    Perryridge    700
A-222    Redwood       700
A-305    Round Hill    350
Figure 11
...
A-217    Brighton      750
A-101    Downtown      500
A-110    Downtown      600
A-215    Mianus        700
A-102    Perryridge    400
A-201    Perryridge    900
A-218    Perryridge    700
A-222    Redwood       700
A-305    Round Hill    350

A-888    North Town    800    (overflow block)
Figure 11
...
insertion or deletion
...
For insertion, we apply the following rules:
1
...
2
...
Otherwise, insert the new record in
an overflow block
...
Figure 11
...
15 after the insertion of the record (North
Town, A-888, 800)
...
16 allows fast insertion of new records,
but forces sequential file-processing applications to process records in an order that
does not match the physical order of the records
...
Eventually, however, the correspondence between search-key order and physical order may be totally lost, in which case sequential processing will become much
less efficient
...
Such reorganizations are costly, and must be done during
times when the system load is low
...
In the extreme case in
which insertions rarely occur, it is possible always to keep the file in physically sorted
order
...
15 is not needed
...
7
...
Usually, tuples of a relation can be represented as fixed-length records
...
can be mapped to a simple file structure
...
In such systems, the size of the database
is small, so little is gained from a sophisticated file structure
...
A simple file structure reduces the amount of code needed to implement the system
...
We have seen that there are performance
advantages to be gained from careful assignment of records to blocks, and from careful organization of the blocks themselves
...
However, many large-scale database systems do not rely directly on the underlying operating system for file management
...
The database system stores all relations in this one
file, and manages the file itself
...
customer-name = customer
...
Thus, for each
tuple of depositor, the system must locate the customer tuples with the same value for
customer-name
...
Regardless of how these records are located, however,
they need to be transferred from disk into main memory
...
As a concrete example, consider the depositor and customer relations of Figures
11
...
18, respectively
...
19, we show a file structure designed for efficient execution of queries involving depositor ⋈ customer
...
This structure mixes together tuples of two relations, but allows for efficient
processing of the join
...
Since the corresponding
customer-name
Hayes
Hayes
Hayes
Turner
Figure 11
...
customer-name    customer-street    customer-city
Hayes            Main               Brooklyn
Turner           Putnam             Stamford
Figure 11
...
depositor tuples are stored on the disk near the customer tuple, the block containing the
customer tuple contains tuples of the depositor relation needed to process the query
...
A clustering file organization is a file organization, such as that illustrated in Figure 11
...
Such a file
organization allows us to read records that would satisfy the join condition by using
one block read
...
Our use of clustering has enhanced processing of a particular join (depositor ⋈ customer), but it results in slowing processing of other types of queries
...
Instead of several customer records appearing in one block,
each record is located in a distinct block
...
To locate all tuples of the
customer relation in the structure of Figure 11
...
20
...
Careful use of clustering can produce significant
performance gains in query processing
...
8 Data-Dictionary Storage
So far, we have considered only the representation of the relations themselves
...
19
Brooklyn
Stamford
A-305
Clustering file structure
...
Hayes
Hayes
Main
A-102
Hayes
Hayes
Turner
Turner
A-220
A-503
Putnam
Figure 11
...
schema of the relations
...
Among the types of information that the system must store are these:
• Names of the relations
• Names of the attributes of each relation
• Domains and lengths of attributes
• Names of views defined on the database, and definitions of those views
• Integrity constraints (for example, key constraints)
In addition, many systems keep the following data on users of the system:
• Names of authorized users
• Accounting information about users
• Passwords or other information used to authenticate users
Further, the database may store statistical and descriptive data about the relations,
such as:
• Number of tuples in each relation
• Method of storage for each relation (for example, clustered or nonclustered)
The data dictionary may also note the storage organization (sequential, hash or heap)
of relations, and the location where each relation is stored:
• If relations are stored in operating system files, the dictionary would note the
names of the file (or files) containing each relation
...
In Chapter 12, in which we study indices, we shall see a need to store information
about each index on each of the relations:
• Name of the index
• Name of the relation being indexed
• Attributes on which the index is defined
• Type of index formed
All this information constitutes, in effect, a miniature database
...
It is generally preferable to store the data about the database in the database itself
...
The exact choice of how to represent system data by relations must be made by
the system designers
...
The Index-metadata relation is thus
not in first normal form; it can be normalized, but the above representation is likely
to be more efficient to access
...
The storage organization and location of the Relation-metadata itself must be recorded elsewhere (for example, in the database code itself), since we need this information to find the contents of Relation-metadata
...
9 Storage for Object-Oriented Databases∗∗
The file-organization techniques described in Section 11
...
However, some extra features are needed to support object-oriented database features, such as set-valued fields and persistent pointers
...
9
...
At the lowest level of data representation, both tuples and the data
parts of objects are simply sequences of bytes
...
Objects in object-oriented databases may lack the uniformity of tuples in relational
databases
...
In contrast, data in relational databases are typically required to be (at least) in first normal form
...
Such objects have to be managed differently from
records in a relational system
...
Set-valued fields that have a larger number of
elements can be implemented as relations in the database
...
Each tuple also
contains the object identifier of the object
...
The storage system gives the upper levels
of the database system the view of a set-valued field, even though the set-valued field
has actually been normalized by creating a new relation
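A minimal sketch of this idea in Python follows. It assumes a normalized relation of (object identifier, element) tuples; the relation name, the identifiers, and the function set_valued_field are hypothetical.

```python
from typing import List, Set, Tuple

# Hypothetical normalized relation: one (object_id, element) tuple per set member.
phone_relation: List[Tuple[str, str]] = [
    ("oid-1", "555-0101"),
    ("oid-1", "555-0102"),
    ("oid-2", "555-0199"),
]


def set_valued_field(relation: List[Tuple[str, str]], object_id: str) -> Set[str]:
    """Rebuild the set-valued view of one object's field from the normalized tuples."""
    return {element for oid, element in relation if oid == object_id}
```

The upper levels call set_valued_field and see a set; only the storage layer knows that the elements live in a separate relation.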
...
Such large objects may each be stored in a separate file
...
9
...
11
...
2 Implementation of Object Identifiers
Since objects are identified by object identifiers (OIDs), an object-storage system needs
a mechanism to locate an object, given an OID
...
If the OIDs are physical
OIDs — that is, they encode the location of the object — then the object can be found
directly
...
A volume or file identifier
2
...
An offset within the block
A volume is a logical unit of storage that usually corresponds to a disk
...
The unique
identifier is also stored with the object, and the identifiers in an OID and the corresponding object should match
...
(A dangling pointer is a
pointer that does not point to a valid object
...
21 illustrates this scheme
...
If the space occupied by the object had been
reallocated, there may be a new object in the location, and it may get incorrectly
addressed by the identifier of the old object
...
The unique
identifier helps to detect such errors, since the unique identifiers of the old physical
OID and the new object will not match
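The check can be sketched in Python as follows. This is an illustration under simplifying assumptions: the store is a dictionary keyed by location, a location is a hypothetical (volume, block, offset) triple, and all class and function names are invented for the example.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

Location = Tuple[int, int, int]  # (volume, block, offset): hypothetical encoding


@dataclass(frozen=True)
class PhysicalOID:
    location: Location
    unique_id: int


@dataclass
class StoredObject:
    unique_id: int
    data: str


class DanglingPointerError(Exception):
    """Raised when an OID no longer identifies the object stored at its location."""


def dereference(store: Dict[Location, StoredObject], oid: PhysicalOID) -> StoredObject:
    obj = store.get(oid.location)
    # The identifier kept in the OID must match the one kept with the object.
    if obj is None or obj.unique_id != oid.unique_id:
        raise DanglingPointerError(f"dangling OID: {oid}")
    return obj


store: Dict[Location, StoredObject] = {}
location = (1, 7, 1200)

old_oid = PhysicalOID(location, unique_id=50)
store[location] = StoredObject(unique_id=50, data="old object")

# The old object is deleted and its space is reallocated to a new object.
del store[location]
new_oid = PhysicalOID(location, unique_id=51)
store[location] = StoredObject(unique_id=51, data="new object")
```

Dereferencing new_oid succeeds, while dereferencing old_oid raises DanglingPointerError instead of silently returning the new object.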
...
[Figure 11 ...: (a) General structure. Labels recovered from the figure: Location, Unique-Id, Object, Data, Good OID.]
Suppose that an object has to be moved to a new block, perhaps because the size of
the object has increased, and the old block has no extra space
...
Rather than change
the OID of the object (which involves changing every object that points to this one),
we leave behind a forwarding address at the old location
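A small Python sketch of forwarding addresses follows. It assumes a dictionary-based store and a simplified (block, offset) location; Forward, move_object, and fetch are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Union

Location = Tuple[int, int]  # (block, offset): hypothetical, simplified location


@dataclass
class Forward:
    """Marker left at an old location, pointing to where the object now lives."""
    new_location: Location


Store = Dict[Location, Union[str, Forward]]


def move_object(store: Store, old: Location, new: Location) -> None:
    """Move the object and leave a forwarding address at its old location."""
    store[new] = store[old]
    store[old] = Forward(new)


def fetch(store: Store, location: Location) -> str:
    """Follow forwarding addresses until the object itself is reached."""
    entry = store[location]
    while isinstance(entry, Forward):
        entry = store[entry.new_location]
    return entry


store: Store = {(10, 0): "account A-101"}
move_object(store, old=(10, 0), new=(42, 64))
```

Pointers that still hold the old location keep working, at the price of one extra lookup per forwarding hop.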
...
11
...
3 Management of Persistent Pointers
We implement persistent pointers in a persistent programming language by using
OIDs
...
An important difference between persistent pointers and in-memory
pointers is the size of the pointer
...
On most current computers, in-memory pointers are
usually 4 bytes long, which is sufficient to address 4 gigabytes of memory
...
Since database systems are often bigger than 4 gigabytes, persistent pointers are usually at least 8 bytes
long
...
This feature further increases the size of persistent pointers
...
The action of looking up an object, given its identifier, is called dereferencing
...
Given a persistent pointer, dereferencing an object has an extra step — finding the actual location of the object in memory by looking up the persistent pointer
in a table
...
We
can implement the table lookup fairly efficiently by using a hash table data structure,
but the lookup is still slow compared to a pointer dereference, even if the object is
already in memory
...
Pointer swizzling is a way to cut down the cost of locating persistent objects that
are already present in memory
...
Now the system carries out an extra step — it stores an in-memory
pointer to the object in place of the persistent pointer
...
(When persistent objects have to be
moved from memory back to disk to make space for other persistent objects, the
system must carry out an extra step to ensure that the object is still in memory
...
Pointer swizzling on pointer dereference, as described here, is
called software swizzling
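The following Python sketch shows swizzling on dereference in miniature. It is an assumption-laden model: a dictionary stands in for the disk, another dictionary plays the role of the in-memory lookup table, and Obj, locate, and dereference are hypothetical names.

```python
from typing import Dict, Union


class Obj:
    def __init__(self, payload: str, ref: Union[str, "Obj", None] = None):
        self.payload = payload
        self.ref = ref  # an OID string (persistent pointer) or an Obj (swizzled)


disk: Dict[str, str] = {"oid-7": "account A-101"}  # stand-in for objects on disk
resident: Dict[str, Obj] = {}                      # hash table: OID -> object in memory
stats = {"table_lookups": 0}


def locate(oid: str) -> Obj:
    """Find the object for an OID through the table, loading it if necessary."""
    stats["table_lookups"] += 1
    if oid not in resident:
        resident[oid] = Obj(disk[oid])
    return resident[oid]


def dereference(holder: Obj) -> Union[Obj, None]:
    """Dereference holder.ref, swizzling the pointer the first time it is used."""
    if isinstance(holder.ref, str):
        # Extra step: store the in-memory pointer in place of the persistent one.
        holder.ref = locate(holder.ref)
    return holder.ref


customer = Obj("customer record", ref="oid-7")
first = dereference(customer)
second = dereference(customer)
```

Only the first dereference goes through the table; the second follows the swizzled in-memory pointer directly.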
...
One way to ensure that it will not change is to pin pages containing swizzled objects in the buffer pool, so that they are never replaced until the program
that performed the swizzling has finished execution
...
11
...
4 Hardware Swizzling
Having two types of pointers, persistent and transient (in-memory), is inconvenient
...
It
would be simpler if both persistent and in-memory pointers were of the same type
...
However, the storage cost of longer persistent pointers will have to be borne by in-memory
pointers as well; understandably, this scheme is unpopular
...
When data in a virtual memory page are accessed, and the operating system detects that the page does not have real storage allocated for it, or has
been access protected, then a segmentation violation is said to occur
...
In most Unix systems, the
mmap system call provides this latter functionality
...
3
...
Hardware swizzling has two major advantages over software swizzling:
1
...
2
...
Software written to deal with in-memory pointers
can thereby deal with persistent pointers as well, without any changes
...
9
...
1 Pointer Representation
Hardware swizzling uses the following representation of persistent pointers contained in objects that are on disk
...
4 The page
identifier in a persistent pointer is actually a small indirect pointer, which we call
the short page identifier
...
The system has to look up the short page identifier in a persistent pointer in the
translation table to find the full page identifier
...
In practice, the
translation table is likely to contain much less than the maximum number of elements
(1024 in our example) and will not consume excessive space
...
Hence, a small number of bits is enough to
store the short page identifier
...
Even though only a few
bits are needed for the short page identifier, all the bits of an in-memory pointer,
other than the page-offset bits, are used as the short page identifier
...
The persistent-pointer representation scheme appears in Figure 11
...
The translation table gives the mapping between short page identifiers and the full database page identifiers for each of the short page identifiers in these persistent pointers
...
page
...
Each page maintains extra information so that all persistent pointers in the page
can be found
...
The need to locate all the persistent pointers in a page will become clear
later
...
The term page is generally used to refer to a real-memory or virtual-memory page, and the term
block is used to refer to disk blocks in the database
...
We shall use the terms page and block
interchangeably in this section
...
[Figure: Page image before swizzling. Pointer fields in the figure are labeled PageId and Off.]
...
9
...
2 Swizzling Pointers on a Page
Initially no page of the database has been allocated a page in virtual memory
...
Database pages get loaded into virtual-memory when
the database system needs to access data on the page
...
The system then loads the database page into the virtual-memory page it has allocated to it
...
It takes the following actions for
each persistent pointer in the page
...
Let Pᵢ be the
full page identifier of pᵢ, found in the translation table in page P
...
If page Pᵢ does not already have a virtual-memory page allocated to it, the
system now allocates a free page in virtual memory to it
...
At this
point, the page in virtual address space does not have any storage allocated
for it, either in memory or on disk; it is merely a range of addresses reserved
for the database page
...
2
...
The system updates the persistent pointer being considered, whose
value is ⟨pᵢ, oᵢ⟩, by replacing pᵢ with vᵢ
...
[Figure 11 ...: page image showing object 1, object 2, and object 3 with (PageId, Off) pointer fields, and a translation table with columns PageID and FullPageID.]
Figure 11
...
22 after the system has
brought that page into memory and swizzled the pointers in it
...
34278 has been mapped to page
5001 in memory, whereas the page whose identifier is 519
...
All the pointers in objects
have been updated to reflect the new mapping, and can now be used as in-memory
pointers
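The per-page swizzling pass can be sketched in Python as follows. The sketch is ours, not the system's code: the function swizzle_page, the string full page identifiers, and the free-page list are hypothetical, while the short identifiers 2395 and 4867 and the virtual-memory pages 5001 and 4867 follow the running example.

```python
from typing import Dict, List, Tuple

Pointer = Tuple[int, int]  # (page identifier, offset within the page)


def swizzle_page(
    pointers: List[Pointer],
    translation: Dict[int, str],
    page_map: Dict[str, int],
    free_pages: List[int],
) -> Tuple[List[Pointer], Dict[int, str]]:
    """Swizzle every persistent pointer found in one page.

    pointers:    (short page id, offset) pairs stored in the page
    translation: short page id -> full database page id (the page's own table)
    page_map:    full database page id -> virtual-memory page reserved for it
    free_pages:  unallocated virtual-memory page numbers, consumed from the front
    """
    swizzled: List[Pointer] = []
    new_translation: Dict[int, str] = {}
    for short_id, offset in pointers:
        full_id = translation[short_id]
        if full_id not in page_map:
            # Reserve a range of addresses only; the page is not read in yet.
            page_map[full_id] = free_pages.pop(0)
        virtual_page = page_map[full_id]
        swizzled.append((virtual_page, offset))
        new_translation[virtual_page] = full_id
    return swizzled, new_translation


# Full page identifiers are placeholders; offsets are illustrative.
pointers = [(2395, 255), (4867, 20), (2395, 170)]
translation = {2395: "full-page-A", 4867: "full-page-B"}
page_map = {"full-page-B": 4867}  # this page was already reserved earlier
swizzled, new_translation = swizzle_page(pointers, translation, page_map, [5001])
```

After the pass, the pointers hold virtual-memory page numbers, and the translation table is keyed by those same numbers, so table and pointers stay consistent.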
...
Thus, objects in in-memory pages contain only in-memory pointers
...
That is indeed an important
advantage!
11
...
4
...
As we described,
a segmentation violation will occur, and will result in a function call on the database
system
...
It first determines what database page was allocated to virtual-memory page
vᵢ; let the full page identifier of the database page be Pᵢ
...
)
2
...
It carries out pointer swizzling on page Pᵢ, as described earlier in “Swizzling Pointers on a Page”
...
After swizzling all persistent pointers in P , the system allows the pointer
dereference that resulted in the segmentation violation to continue
...
If any swizzled pointer that points to an object in page vᵢ is dereferenced later,
the dereference proceeds just like any other virtual-memory access, with no extra
overheads
...
This overhead
has to be incurred on every access to objects in the page, whereas when swizzling is
performed, the overhead is incurred only on the first access to an object in the page
...
Hardware swizzling
thus gives excellent performance benefits to applications that repeatedly dereference
pointers
...
9
...
4 Optimizations
Software swizzling performs a deswizzling operation when a page in memory has
to be written back to the database, to convert in-memory pointers back to persistent
pointers
...
For example, as shown in Figure 11
...
34278 (with short
identifier 2395 in the page shown) is mapped to virtual-memory page 5001
...
Thus, the short identifier 5001 in
object 1 and in the table match each other again
...
Several optimizations can be carried out on the basic scheme described here
...
If the system can allocate the page in this attempt, pointers
to it do not need to be updated
...
56850 with short
page identifier 4867 was mapped to virtual-memory page 4867, which is the same as
its short page identifier
...
If every page can be allocated to its appropriate
location in virtual address space, none of the pointers need to be translated, and the
cost of swizzling is reduced significantly
...
If they do not, a page that has been brought into virtual
memory will have to be replaced, and that replacement is hard to do, since there may
be in-memory pointers to objects in that page
...
For set-level swizzling, the system uses a
single translation table for all pages in the segment
...
11
...
5 Disk Versus Memory Structure of Objects
The format in which objects are stored in memory may be different from the format in which they are stored on disk in the database
...
Another reason may be that we want to have the database accessible from
different machines, possibly based on different architectures, and from different languages, and from programs compiled under different compilers, all of which result
in differences in the in-memory representation
...
The physical structure (such as sizes and representation of integers)
in the object depends on the machine on which the program is run
...
The solution to this problem is to make the physical representation of objects in the
database independent of the machine and of the compiler
...
It can do
this conversion transparently at the same time that it swizzles pointers in the object,
so the programmer does not need to worry about the conversion
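As an illustration, the following Python sketch fixes one machine-independent layout and converts to and from it with the standard struct module. The particular layout (a 4-byte account identifier followed by an 8-byte balance, both big-endian) is an assumption made for the example, not the format of any actual system.

```python
import struct

# Assumed on-disk layout for this sketch: 4-byte signed account id followed by
# an 8-byte signed balance, big-endian, with no padding between the fields.
DISK_FORMAT = ">iq"


def to_disk(account_id: int, balance: int) -> bytes:
    """Convert in-memory values to the machine-independent disk representation."""
    return struct.pack(DISK_FORMAT, account_id, balance)


def from_disk(raw: bytes):
    """Convert the disk representation back to in-memory values on any machine."""
    return struct.unpack(DISK_FORMAT, raw)


raw = to_disk(101, 50000)
```

Because the byte order and field sizes are stated explicitly in DISK_FORMAT, the same bytes decode to the same values regardless of the byte order of the machine that reads them.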
...
One such
language is the Object Definition Language (ODL) developed by the Object Database
Management Group (ODMG)
...
The definition of the structure of each class in the database is stored (logically) in
the databases
...
We can generate this code
automatically, using the stored definition of the class of the object
...
Hidden pointers are transient pointers
5
...
However, they differ in how the bits of an integer are laid out within a word
...
These pointers point (indirectly) to tables
used to implement certain methods of the object
...
Therefore, when a process
accesses an object, the hidden pointers must be fixed to point to the correct location
...
11
...
6 Large Objects
Objects may also be extremely large; for instance, multimedia objects may occupy
several megabytes of space
...
Large objects containing binary data are called
binary large objects (blobs), while large objects containing character data are called
character large objects (clobs), as we saw in Section 9
...
1
...
Large objects
and long fields are often stored in a special file (or collection of files) reserved for
long-field storage
...
Large
objects may need to be stored in a contiguous sequence of bytes when they are
brought into memory; in that case, if an object is bigger than a page, contiguous pages
of the buffer pool must be allocated to store it, which makes buffer management more
difficult
...
If inserts and
deletes need to be supported, we can handle large objects by using B-tree structures
(which we study in Chapter 12)
...
For practical reasons, we may manipulate large objects by using application programs, instead of doing so within the database:
• Text data
...
• Image/Graphical data
...
Although some graphical
data often are managed within the database system itself, special application
software is used for many cases, such as integrated circuit design
...
Audio and video data are typically a digitized, compressed representation created and displayed by separate application software
...
The most widely used method for updating such data is the checkout/checkin
method
...
Checkout and checkin correspond roughly to read and write
...
11
...
They are classified by the speed with which they can access data, by their cost per unit
of data to buy the memory, and by their reliability
...
• Two factors determine the reliability of storage media: whether a power failure or system crash causes data to be lost, and what the likelihood is of physical failure of the storage device
...
For disks, we can use mirroring
...
By striping
data across disks, these methods offer high throughput rates on large accesses;
by introducing redundancy across disks, they improve reliability greatly
...
RAID level 1 (mirroring) and RAID level
5 are the most commonly used
...
One approach to mapping the database to files is to use several files,
and to store records of only one fixed length in any given file
...
There are different techniques for implementing variable-length records, including the slotted-page method, the pointer method, and the reserved-space
method
...
If we can access several of the records
we want with only one block access, we save disk accesses
...
• One way to reduce the number of disk accesses is to keep as many blocks as
possible in main memory
...
The buffer is that part of main memory available for storage of copies of disk blocks
The subsystem responsible for the
allocation of buffer space is called the buffer manager
...
There are schemes to detect
dangling persistent pointers
...
The hardware-based schemes use the virtual-memory-management support implemented in hardware, and made accessible to user programs by many current-generation operating systems
...
11
...
1 List the physical storage media available on the computers you use routinely
...
11
...
3 Consider the following data and parity-block arrangement on four disks:
Disk 1
B1
P1
B8
...
...
...
Disk 3
B3
B6
B9
...
...
...
The Bi ’s represent data blocks; the Pi ’s represent parity blocks
...
What, if any, problem might
this arrangement present?
11
...
Assume that partially written blocks can
be detected
...
e
...
Suggest schemes
for getting the effect of atomic block writes with the following RAID schemes
...
a
...
RAID level 5 (block interleaved, distributed parity)
11
...
Thus, the data in the failed disk must be rebuilt and written
to the replacement disk while the system is in operation
...
11
...
MRU is preferable to LRU
...
LRU is preferable to MRU
...
7 Consider the deletion of record 5 from the file of Figure 11
...
Compare the
relative merits of the following techniques for implementing the deletion:
a
...
b
...
c
...
11
...
9 after each of the following steps:
a
...
b
...
c
...
11
...
Explain your answer
...
10 Give an example of a database application in which the pointer method of representing variable-length records is preferable to the reserved-space method
...
11
...
12 after each of the following steps:
a
...
b
...
c
...
11
...
12?
11
...
13 after each of the following steps:
a
...
b
...
c
...
11
...
11
...
Discuss how the control on replacement
that it provides would be useful for the implementation of database systems
...
16 In the sequential file organization, why is an overflow block used even if there
is, at the moment, only one overflow record?
11
...
Store each relation in one file
...
Store multiple relations (perhaps even the entire database) in one file
...
18 Consider a relational database with two relations:
course (course-name, room, instructor)
enrollment (course-name, student-name, grade)
Define instances of these relations for three courses, each of which enrolls five
students
...
11
...
For
each block in the file, two bits are maintained in the bitmap
...
Such bitmaps can be kept in memory even for quite large files
...
Describe how to keep the bitmap up-to-date on record insertions and deletions
...
Outline the benefit of the bitmap technique over free lists when searching
for free space and when updating free space information
...
20 Give a normalized version of the Index-metadata relation, and explain why using the normalized version would result in worse performance
...
21 Explain why a physical OID must contain more information than a pointer to a
physical storage location
...
11
...
22 If physical OIDs are used, an object can be relocated by keeping a forwarding
pointer to its new location
...
11
...
Describe how the unique-id scheme helps in
detecting dangling pointers in an object-oriented database
...
24 Consider the example on page 435, which shows that there is no need for
deswizzling if hardware swizzling is used
...
34278 from 2395 to 5001
...
Rosch and Wethington [1999]
presents an excellent overview of computer hardware, including extensive coverage of all types of storage technology such as floppy disks, magnetic disks, optical
disks, tapes, and storage interfaces
...
Flash memory is discussed by Dippert and Levy [1993]
...
Alternative disk organizations that provide a high degree of fault tolerance include those described by Gray et al
...
Disk striping
is described by Salem and Garcia-Molina [1986]
...
[1988] and Chen and Patterson [1990]
...
[1994] presents an excellent survey of RAID principles and
implementation
...
The log-based file
system, which makes disk access sequential, is described in Rosenblum and Ousterhout [1991]
...
The
broadcast medium can be viewed as a level of the storage hierarchy — as a broadcast
disk with high latency
...
[1995]
...
Further discussion of storage issues in mobile computing appears in Douglis
et al
...
Basic data structures are discussed in Cormen et al
...
There are several papers
describing the storage structure of specific database systems
...
[1976]
discusses System R
...
[1981] reviews System R in retrospect
...
The structure of the Wisconsin Storage System (WiSS) is
described in Chou et al
...
A software tool for the physical design of relational
databases is described by Finkelstein et al
...
Buffer management is discussed in most operating system texts, including in Silberschatz and Galvin [1998]
...
Chou and
Dewitt [1985] presents algorithms for buffer management in database systems, and
describes a performance evaluation
...
[1997] describes techniques used in
the buffer manager of the Oracle database system
...
White and DeWitt [1994] describes the virtual-memory-mapped buffer-management scheme used
in the ObjectStore OODB system and in the QuickStore storage manager
...
The Exodus object storage manager is described in Carey
et al
...
Biliris and Orenstein [1994] provides a survey of storage systems for
object-oriented databases
...
[1994] describes a storage manager for main-memory databases
...
CHAPTER 12
...
For example, a query like “Find all accounts at the Perryridge branch” or “Find the balance
of account number A-101” references only a fraction of the account records
...
Ideally, the
system should be able to locate these records directly
...
12
...
If we want to learn about a particular topic (specified by a word or
a phrase) in this textbook, we can search for the topic in the index at the back of the
book, find the pages where it occurs, and then read the pages to find the information
we are looking for
...
Moreover, the index is much smaller than the book,
further reducing the effort needed to find the words we are looking for
...
To find a book by a particular author, we would search in the author
catalog, and a card in the catalog tells us where to find the book
...
Database system indices play the same role as book indices or card catalogs in
libraries
...
Keeping a sorted list of account numbers would not work well on very large
databases with millions of accounts, since the index would itself be very big; further,
even though keeping the index sorted reduces the search time, finding an account
can still be rather time-consuming
...
We shall discuss several of these techniques in this chapter
...
Based on a sorted ordering of the values
...
Based on a uniform distribution of values across a range of
buckets
...
We shall consider several techniques for both ordered indexing and hashing
...
Rather, each technique is best suited to particular database
applications
...
Access types
can include finding records with a specified attribute value and finding records
whose attribute values fall in a specified range
...
• Insertion time: The time it takes to insert a new data item
...
• Deletion time: The time it takes to delete a data item
...
• Space overhead: The additional space occupied by an index structure
...
We often want to have more than one index for a file
...
An attribute or set of attributes used to look up records in a file is called a search
key
...
This duplicate meaning for key is (unfortunately) well established
in practice
...
12
...
Each
index structure is associated with a particular search key
...
The records in the indexed file may themselves be stored in some sorted order, just
as books in a library are stored according to some attribute such as the Dewey decimal number
...
Figure 12.1 Sequential file for account records
...
A file may have several indices, on different search keys
...
(The term primary index is sometimes
used to mean an index on a primary key
...
) Primary indices are also called clustering indices
...
Indices whose search key specifies an order different from the sequential order of the
file are called secondary indices, or nonclustering indices
...
2
...
Such files, with a primary index on the search key, are called index-sequential files
...
They are
designed for applications that require both sequential processing of the entire file and
random access to individual records
...
1 shows a sequential file of account records taken from our banking example
...
1, the records are stored in search-key order, with
branch-name used as the search key
...
2
...
1 Dense and Sparse Indices
An index record, or index entry, consists of a search-key value, and pointers to one
or more records with that value as their search-key value
...
There are two types of ordered indices that we can use:
• Dense index: An index record appears for every search-key value in the file
...
The rest of the
records with the same search key-value would be stored sequentially after the
first record, since, because the index is a primary one, records are sorted on
the same search key
...
• Sparse index: An index record appears for only some of the search-key values
...
To locate a record,
we find the index entry with the largest search-key value that is less than or
equal to the search-key value for which we are looking
...
Figures 12
...
3 show dense and sparse indices, respectively, for the account
file
...
Using the
dense index of Figure 12
...
We process this record, and follow the pointer in that record to locate the
next record in search-key (branch-name) order
...
If we are using the sparse
index (Figure 12
...
” Since the last entry (in alphabetic order) before “Perryridge” is “Mianus,” we follow that pointer
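The two lookup procedures can be sketched in Python as follows. The records are sample data modeled on the account file, and the choice of sparse-index entries (Brighton, Mianus, Redwood) is an assumption for illustration; list positions stand in for record pointers.

```python
from bisect import bisect_right

# Sample file, sorted on the search key (branch name); values are illustrative.
records = [
    ("Brighton", "A-217", 750),
    ("Downtown", "A-101", 500),
    ("Downtown", "A-110", 600),
    ("Mianus", "A-215", 700),
    ("Perryridge", "A-102", 400),
    ("Perryridge", "A-201", 900),
    ("Perryridge", "A-218", 700),
    ("Redwood", "A-222", 700),
    ("Round Hill", "A-305", 350),
]


def scan_from(position, key):
    """Read sequentially from `position`, collecting the records that match `key`."""
    while position < len(records) and records[position][0] < key:
        position += 1
    matches = []
    while position < len(records) and records[position][0] == key:
        matches.append(records[position])
        position += 1
    return matches


# Dense index: one entry per search-key value, pointing to the first such record.
dense_index = {}
for position, record in enumerate(records):
    dense_index.setdefault(record[0], position)

# Sparse index: entries for only some search-key values (assumed choice).
sparse_index = [("Brighton", 0), ("Mianus", 3), ("Redwood", 7)]


def dense_lookup(key):
    position = dense_index.get(key)
    return [] if position is None else scan_from(position, key)


def sparse_lookup(key):
    keys = [k for k, _ in sparse_index]
    slot = bisect_right(keys, key) - 1  # largest indexed value <= key
    return [] if slot < 0 else scan_from(sparse_index[slot][1], key)
```

For “Perryridge”, the dense index points straight at the first matching record, whereas the sparse lookup starts at the “Mianus” entry and scans forward.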
...
As we have seen, it is generally faster to locate a record if we have a dense index
rather than a sparse index
...
There is a trade-off that the system designer must make between access time and
space overhead
...
The reason this design is a good trade-off is that the dominant cost in processing
...
[Figure 12 ...: index entries Brighton, Downtown, Mianus, Perryridge, Redwood, and Round Hill pointing into the account file (account numbers A-217 through A-305 with their balances).]
Figure 12.3 Sparse index
...
Once we have brought in the block, the time to scan the entire block
is negligible
...
Thus, unless the record is on an overflow block (see Section 11
...
1),
we minimize block accesses while keeping the size of the index (and thus, our space
overhead) as small as possible
...
It is easy to modify our
scheme to handle this situation
...
2
...
2 Multilevel Indices
Even if we use a sparse index, the index itself may become too large for efficient
processing
...
If we have one index record per block, the index has
10,000 records
...
Thus, our index occupies 100 blocks
...
If an index is sufficiently small to be kept in main memory, the search time to find
an entry is low
...
Binary search can be used on the index
file to locate an entry, but the search still has a large cost
...
(⌈x⌉ denotes
the least integer that is greater than or equal to x; that is, we round upward
On a disk system where a
block read takes 30 milliseconds, the search will take 210 milliseconds, which is long
...
In
that case, a sequential search is typically used, and that requires b block reads, which
will take even longer
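The arithmetic behind these figures can be checked with a few lines of Python. The sketch assumes that binary search over b index blocks reads at most ⌈log₂(b)⌉ blocks, which is consistent with the 100-block index, the 30-millisecond block read, and the 210-millisecond total quoted above; the function and constant names are ours.

```python
def binary_search_block_reads(b: int) -> int:
    """Worst-case block reads for binary search over b index blocks: ceil(log2(b))."""
    return (b - 1).bit_length()  # exact integer form of ceil(log2(b)) for b >= 1


BLOCK_READ_MS = 30   # block read time quoted in the text
INDEX_BLOCKS = 100   # index size quoted in the text

binary_reads = binary_search_block_reads(INDEX_BLOCKS)
binary_ms = binary_reads * BLOCK_READ_MS
sequential_ms = INDEX_BLOCKS * BLOCK_READ_MS  # worst case: every index block is read
```

Binary search needs 7 block reads (210 milliseconds), while a sequential search of all 100 blocks could take as long as 3000 milliseconds.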
...
To deal with this problem, we treat the index just as we would treat any other
sequential file, and construct a sparse index on the primary index, as in Figure 12
...
To locate a record, we first use binary search on the outer index to find the record for
the largest search-key value less than or equal to the one that we desire
...
We scan this block until we find the record that
has the largest search-key value less than or equal to the one that we desire
...
Using the two levels of indexing, we have read only one index block, rather than
the seven we read with binary search, if we assume that the outer index is already in
main memory
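A two-level lookup can be sketched in Python as follows. The integer search keys, the block contents, and the names floor_entry and find_data_block are hypothetical; each list stands in for one index block.

```python
from bisect import bisect_right


def floor_entry(entries, key):
    """Return the (search_key, pointer) entry with the largest search_key <= key."""
    keys = [k for k, _ in entries]
    slot = bisect_right(keys, key) - 1
    return entries[slot] if slot >= 0 else None


# Inner index: each block is a sorted list of (first key in data block, data block).
inner_blocks = [
    [(10, 0), (20, 1), (30, 2)],
    [(40, 3), (50, 4), (60, 5)],
]
# Outer index: one entry per inner index block, assumed to be in main memory.
outer_index = [(10, 0), (40, 1)]


def find_data_block(key):
    """Return the data block that may hold `key`, reading one inner index block."""
    outer = floor_entry(outer_index, key)
    if outer is None:
        return None  # key is smaller than every indexed value
    inner = floor_entry(inner_blocks[outer[1]], key)
    return inner[1]
```

Each lookup touches the in-memory outer index and exactly one inner index block before going to the data block.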
...
In such a case, we can create yet another level of index
...
Indices with two or more
levels are called multilevel indices
...
Each level of index could correspond to a unit of physical storage
...
A typical dictionary is an example of a multilevel index in the nondatabase world
...
Such a book
[Figure 12 ...: outer index and inner index (index block 0, index block 1) over data block 0 and data block 1.]
Multilevel indices are closely related to tree structures, such as the binary trees
used for in-memory indexing
...
3
...
2
...
3 Index Update
Regardless of what form of index is used, every index must be updated whenever a
record is either inserted into or deleted from the file
...
• Insertion
...
Again, the actions the system takes next
depend on whether the index is dense or sparse:
Dense indices:
1
...
2
...
If the index record stores pointers to all records with the same
search-key value, the system adds a pointer to the new record to
the index record
...
Otherwise, the index record stores a pointer to only the first record
with the search-key value
...
Sparse indices: We assume that the index stores an entry for each block
...
On the other
hand, if the new record has the least search-key value in its block, the
system updates the index entry pointing to the block; if not, the system
makes no change to the index
...
To delete a record, the system first looks up the record to be deleted
...
If the deleted record was the only record with its particular search-key
value, then the system deletes the corresponding index record from
the index
...
Otherwise the following actions are taken:
a
...
b
...
In this case, if the deleted record was
the first record with the search-key value, the system updates the
index record to point to the next record
...
If the index does not contain an index record with the search-key value
of the deleted record, nothing needs to be done to the index
...
Otherwise the system takes the following actions:
a
...
If the next
search-key value already has an index entry, the entry is deleted
instead of being replaced
...
Otherwise, if the index record for the search-key value points to the
record being deleted, the system updates the index record to point
to the next record with the same search-key value
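For the dense-index case in which each index record stores pointers to all records with the same search-key value, the update rules can be sketched in Python as follows. The class DenseIndex and the integer record pointers are hypothetical; the other variants described above are not covered by this sketch.

```python
class DenseIndex:
    """Dense index whose index records keep pointers to all matching records."""

    def __init__(self):
        self.entries = {}  # search-key value -> list of record pointers

    def insert(self, key, pointer):
        # A new search-key value gets a new index record; otherwise the
        # pointer is added to the existing index record.
        self.entries.setdefault(key, []).append(pointer)

    def delete(self, key, pointer):
        pointers = self.entries.get(key)
        if not pointers or pointer not in pointers:
            return  # nothing in the index refers to this record
        pointers.remove(pointer)
        if not pointers:
            # The deleted record was the only one with this search-key value.
            del self.entries[key]


index = DenseIndex()
index.insert("Perryridge", 4)
index.insert("Perryridge", 5)
index.insert("Mianus", 3)
index.delete("Perryridge", 4)
index.delete("Mianus", 3)
```

After these operations, the index holds a single record for “Perryridge” with one pointer, and the record for “Mianus” has been removed.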
...
On deletion or insertion, the system updates the lowest-level index as described
...
The same technique
applies to further levels of the index, if there are any
...
2
...
A primary index may be sparse, storing only some
of the search-key values, since it is always possible to find records with intermediate
search-key values by a sequential access to a part of the file, as described earlier
...
A secondary index on a candidate key looks just like a dense primary index, except
that the records pointed to by successive values in the index are not stored sequentially
...
If the search key of a primary index is not a candidate key, it suffices
if the index points to the first record with a particular value for the search key, since
the other records can be fetched by a sequential scan of the file
...
The remaining records with the same search-key value could be anywhere in the file, since the
records are ordered by the search key of the primary index, rather than by the search
key of the secondary index
...
12
...
5
Secondary index on account file, on noncandidate key balance
...
The pointers in such a secondary index do not point
directly to the file
...
Figure 12
...
A sequential scan in primary index order is efficient because records in the file are
stored physically in the same order as the index order
...
Because secondary-key order and
physical-key order differ, if we attempt to scan the file sequentially in secondary-key
order, the reading of each record is likely to require the reading of a new block from
disk, which is very slow
...
If a file has multiple indices, whenever the file is
modified, every index must be updated
...
However, they impose a significant overhead
on modification of the database
...
12
...
Although this degradation can be remedied by reorganization of the file,
frequent reorganizations are undesirable
...
12
...
6
...
The B+ -tree index structure is the most widely used of several index structures
that maintain their efficiency despite insertion and deletion of data
...
Each nonleaf node in the tree has between ⌈n/2⌉
and n children, where n is fixed for a particular tree
...
The overhead is acceptable even for frequently modified files, since the cost of file reorganization is avoided
...
This space overhead, too, is acceptable given
the performance benefits of the B+ -tree structure
...
3
...
Figure 12
...
It contains up to n − 1 search-key values K1 , K2 ,
...
, Pn
...
We consider first the structure of the leaf nodes
...
, n − 1, pointer
Pi points to either a file record with search-key value Ki or to a bucket of pointers,
each of which points to a file record with search-key value Ki
...
Pointer Pn has a special purpose that we shall discuss
shortly
...
7 shows one leaf node of a B+ -tree for the account file, in which we have
chosen n to be 3, and the search key is branch-name
...
Now that we have seen the structure of a leaf node, let us consider how search-key
values are assigned to particular nodes
...
We
allow leaf nodes to contain as few as ⌈(n − 1)/2⌉ values
...
Thus, if Li and Lj are leaf nodes and i < j, then every search-key value in Li is less than every search-key value in Lj
...
Now we can explain the use of the pointer Pn
...
This ordering allows for efficient sequential processing of the file
...
The structure of nonleaf nodes is the same as that for leaf nodes, except that all pointers are pointers to tree nodes
...
A leaf node for account B+-tree index (n = 3)
...
The number of pointers in a node is called the fanout of
the node
...
For i = 2, 3,
...
Pointer Pm points to the part of the subtree that contains those key
values greater than or equal to Km−1, and pointer P1 points to the part of the subtree
that contains those search-key values less than K1
...
It is always possible to construct a B+ -tree, for any n, that satisfies the preceding
requirements
...
8 shows a complete B+ -tree for the account file (n = 3)
...
As an example of a B+-tree for which the root must have fewer than ⌈n/2⌉ values,
Figure 12
...
These examples of B+ -trees are all balanced
...
This property is a requirement for a B+ -tree
...
” It is the balance property of B+ -trees
that ensures good performance for lookup, insertion, and deletion
...
8
B+-tree for account file (n = 3)
...
12
...
9
B+-tree for account file with n = 5
...
3
...
Suppose that we wish to find
all records with a search-key value of V
...
10 presents pseudocode for doing
so
...
First, we examine the root node, looking for the smallest search-key value greater than V
...
We then follow pointer Pi to another node
...
In this case
we follow Pm to another node
...
Eventually, we reach a leaf node
...
If
the value V is not found in the leaf node, no record with key value V exists
...
If there are K search-key values in the file, the path is no longer than
$\lceil \log_{\lceil n/2 \rceil}(K) \rceil$
...
With a search-key
size of 12 bytes, and a disk-pointer size of 8 bytes, n is around 200
...
With
n = 100, if we have 1 million search-key values in the file, a lookup requires only
procedure find(value V)
    set C = root node
    while C is not a leaf node begin
        Let Ki = smallest search-key value, if any, greater than V
        if there is no such value then begin
            Let m = the number of pointers in the node
            set C = node pointed to by Pm
        end
        else set C = the node pointed to by Pi
    end
    if there is a key value Ki in C such that Ki = V
        then pointer Pi directs us to the desired record or bucket
        else no record with key value V exists
end procedure
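The lookup procedure can also be written as a short runnable sketch. The Node class and its field names are assumptions made for illustration; the text does not prescribe an in-memory layout. The small tree at the end is a hypothetical example with n = 3 and placeholder record names.

```python
from bisect import bisect_right


class Node:
    """A B+-tree node: sorted keys plus pointers (hypothetical layout)."""

    def __init__(self, keys, pointers, is_leaf):
        self.keys = keys          # K1 .. Km-1, in sorted order
        self.pointers = pointers  # nonleaf: child nodes; leaf: records or buckets
        self.is_leaf = is_leaf


def find(root, v):
    """Return the record (or bucket) stored under key v, or None."""
    c = root
    while not c.is_leaf:
        # bisect_right returns the position of the smallest key greater
        # than v; if there is no such key it equals len(keys), which
        # selects the last pointer Pm.
        c = c.pointers[bisect_right(c.keys, v)]
    for key, ptr in zip(c.keys, c.pointers):
        if key == v:
            return ptr
    return None


# Hypothetical three-level tree over branch names (n = 3).
leaf1 = Node(["Brighton", "Downtown"], ["rec-Brighton", "rec-Downtown"], True)
leaf2 = Node(["Mianus"], ["rec-Mianus"], True)
leaf3 = Node(["Perryridge"], ["rec-Perryridge"], True)
leaf4 = Node(["Redwood", "Round Hill"], ["rec-Redwood", "rec-Round Hill"], True)
left = Node(["Mianus"], [leaf1, leaf2], False)
right = Node(["Redwood"], [leaf3, leaf4], False)
root = Node(["Perryridge"], [left, right], False)
```

Calling find(root, "Perryridge") follows the root's second pointer and then the first pointer of the right child, visiting one node per level.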
Figure 12
...
12
...
Thus, at most four blocks need to be
read from disk for the lookup
...
An important difference between B+ -tree structures and in-memory tree structures, such as binary trees, is the size of a node, and as a result, the height of the
tree
...
In a B+ -tree,
each node is large—typically a disk block—and a node can have a large number of
pointers
...
In
a balanced binary tree, the path for a lookup can be of length ⌈log2(K)⌉, where K is
the number of search-key values
...
If each node were on a different disk block, 20 block reads would be required to process a lookup, in contrast to
the four block reads for the B+ -tree
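The arithmetic behind these figures can be checked with a few lines. This is a sketch under one stated assumption: the 4096-byte block size used for the estimate of n is not given in the passage above and is chosen here only for illustration.

```python
import math


def max_lookup_path(n, num_keys):
    """Bound ceil(log_{ceil(n/2)}(K)) on the number of nodes a lookup visits."""
    min_fanout = math.ceil(n / 2)
    return math.ceil(math.log(num_keys) / math.log(min_fanout))


# Assumed 4096-byte block; each entry is a 12-byte key plus an 8-byte pointer.
n_estimate = 4096 // (12 + 8)

# n = 100 and one million search-key values, as in the text.
bplus_reads = max_lookup_path(100, 1_000_000)

# A balanced binary tree over the same number of keys.
binary_reads = math.ceil(math.log2(1_000_000))
```

With these inputs the B+-tree bound is 4 and the binary-tree path length is 20, matching the block-read counts quoted above.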
...
3
...
Furthermore, when a node is split or a pair of nodes is combined, we must ensure
that balance is preserved
...
Under this assumption, insertion and deletion are performed as defined next
...
Using the same technique as for lookup, we find the leaf node in
which the search-key value would appear
...
If the search-key value does not appear,
we insert the value in the leaf node, and position it such that the search keys
are still in order
...
• Deletion
...
We remove the search-key value from the
leaf node if there is no bucket associated with that search-key value or if the
bucket becomes empty as a result of the deletion
...
Assume that we wish
to insert a record with a branch-name value of “Clearview” into the B+ -tree of Figure 12
...
Using the algorithm for lookup, we find that “Clearview” should appear
in the node containing “Brighton” and “Downtown
...
” Therefore, the node is split into two nodes
...
11
shows the two leaf nodes that result from inserting “Clearview” and splitting the
node containing “Brighton” and “Downtown
...
12
...
11
Split of leaf node on insertion of “Clearview”
...
Having split a leaf node, we must insert the new leaf node into the B+ -tree structure
...
We need to insert this search-key value into the parent of the leaf node that was split
...
12 shows the result of the insertion
...
It was possible to perform this insertion
because there was room for an added search-key value
...
In the worst case, all nodes along the path to the
root must be split
...
The general technique for insertion into a B+ -tree is to determine the leaf node l
into which insertion must occur
...
If this insertion causes a split, proceed recursively up the tree until either
an insertion does not cause a split or a new root is created
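The leaf-level step of this technique can be sketched as follows. The function name is hypothetical, and the split rule used here (the first ⌈n/2⌉ values stay in the existing node, the rest move to the new node) is the usual convention and is stated as an assumption, since the passage above does not spell it out. Propagation into the parent is left out.

```python
import math
from bisect import insort


def insert_into_leaf(keys, new_key, n):
    """Insert new_key into a leaf that can hold at most n - 1 keys.

    Returns (left_keys, right_keys). right_keys is None when the leaf
    had room; otherwise the leaf was split and right_keys[0] is the
    search-key value to insert into the parent.
    """
    keys = list(keys)
    insort(keys, new_key)          # keep the search keys in order
    if len(keys) <= n - 1:
        return keys, None          # no split needed
    cut = math.ceil(n / 2)         # assumed split point
    return keys[:cut], keys[cut:]


left, right = insert_into_leaf(["Brighton", "Downtown"], "Clearview", 3)
```

For the “Clearview” example with n = 3, the existing node keeps “Brighton” and “Clearview”, and the new node receives “Downtown”, which is then the value inserted into the parent.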
...
13 outlines the insertion algorithm in pseudocode
...
Ki and L
...
The
pseudocode also makes use of the function parent(L) to find the parent of a node L
...
The pseudocode refers to inserting an entry (V, P ) into a node
...
For internal nodes, P is stored just after V
...
First,
let us delete “Downtown” from the B+ -tree of Figure 12
...
We locate the entry for
“Downtown” by using our lookup algorithm
...
Since, in our example, n = 3 and
0 < ⌈(n − 1)/2⌉, this node must be eliminated from the B+-tree
...
12
Insertion of “Clearview” into the B+-tree of Figure 12
...
12
...
K1 ,
...
Kn−1, V such that exactly
⌈n/2⌉ of the values L
...
, L
...
Km ≥ V′
/* Note: V′ must be either L
...
Pm , L
...
, L
...
Kn−1 to L
if (V < V′) then insert (P, V) in L
else insert (P, V) in L′
end
else begin
if (V = V′) /* V is smallest value to go to L′ */
then add P, L
...
, L
...
Kn−1 , L
...
Pm ,
...
Pn−1 , L
...
Pn to L
delete L
...
, L
...
Kn−1 , L
...
Pn = L
...
Pn = L
end
end
end procedure
Figure 12
...
Figure 12
...
12
...
In our example, this deletion leaves
the parent node, which formerly contained three pointers, with only two pointers
...
The resulting B+ -tree appears in Figure 12
...
When we make a deletion from a parent of a leaf node, the parent node itself may
become too small
...
14
...
When we delete the pointer to this node in the latter’s parent, the parent is left
with only one pointer
...
However, since the parent node contains useful information, we cannot simply delete
it
...
This sibling node has room to accommodate the information contained
in our now-too-small node, so we coalesce these nodes, such that the sibling node
now contains the keys “Mianus” and “Redwood
...
Figure 12
...
Notice that the root has only one child pointer after the deletion, so
it is deleted and its sole child becomes the root
...
It is not always possible to coalesce nodes
...
12
...
Once again, the leaf node containing “Perryridge” becomes empty
...
However, in this example, the sibling node already contains the maximum number of pointers: three
...
The solution in this case is to redistribute the pointers such that each sibling has two pointers
...
15
Deletion of “Perryridge” from the B+-tree of Figure 12
...
12
...
16
Deletion of “Perryridge” from the B+-tree of Figure 12
...
Figure 12
...
Note that the redistribution of values necessitates a change of a search-key value in the parent of the two siblings
...
If the node is too small, we delete it from its parent
...
Figure 12
...
The procedure
swap variables(L, L′) merely swaps the values of the (pointer) variables L and L′;
this swap has no effect on the tree itself
...
” For nonleaf nodes, this criterion means fewer than ⌈n/2⌉ pointers;
for leaf nodes, it means fewer than ⌈(n − 1)/2⌉ values
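The two thresholds can be captured in a small predicate. This is an illustrative sketch; the function name and argument order are hypothetical, and the root node, which is exempt from the minimum, is not handled.

```python
import math


def has_too_few_entries(is_leaf, count, n):
    """Check the minimum-occupancy rule for a non-root B+-tree node.

    count is the number of values for a leaf node and the number of
    pointers for a nonleaf node; n is the maximum number of pointers.
    """
    if is_leaf:
        return count < math.ceil((n - 1) / 2)
    return count < math.ceil(n / 2)
```

With n = 3, a leaf left with no values and a nonleaf node left with a single pointer are both too small, which is the situation in the deletion examples above.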
...
We can also redistribute
entries by repartitioning entries equally between the two nodes
...
In the case of leaf nodes, the pointer to
an entry actually precedes the key value, so the pointer P precedes the key value V
...
It is worth noting that, as a result of deletion, a key value that is present in an
internal node of the B+ -tree may not be present at any leaf of the tree
...
It can be shown that the number of I/O operations needed for a
worst-case insertion or deletion is proportional to $\log_{\lceil n/2 \rceil}(K)$, where n is the maximum number of pointers in a node, and K is the number of search-key values
...
It is the speed of operation on B+ -trees that makes
them a frequently used index structure in database implementations
...
3
...
3, the main drawback of index-sequential file organization is the degradation of performance as the file grows: With growth, an increasing
percentage of index records and actual records become out of order, and are stored in
overflow blocks
...
We solve the degradation problem for storing the actual records by using
the leaf level of the B+ -tree to organize the blocks containing the actual records
...
12
...
Pn = L
...
Pm is the last pointer in L
remove (L
...
Pm ) from L
insert (L
...
Km−1
end
else begin
let m be such that (L
...
Km ) is the last pointer/value
pair in L
remove (L
...
Km ) from L
insert (L
...
Km ) as the first pointer and value in L,
by shifting other pointers and values right
replace V in parent(L) by L
...
symmetric to the then case
...
17
Deletion of entry from a B+ -tree
...
use the B+ -tree structure not only as an index, but also as an organizer for records in
a file
...
Figure 12
...
Since records are usually larger than pointers, the maximum number of records
that can be stored in a leaf node is less than the number of pointers in a nonleaf node
...
Insertion and deletion of records from a B+ -tree file organization are handled in
the same way as insertion and deletion of entries in a B+ -tree index
...
If the
block located has enough free space for the record, the system stores the record in the
block
...
The split propagates up the B+ -tree in the normal fashion
...
If a block B becomes less
than half full as a result, the records in B are redistributed with the records in an adjacent block B′
...
The system updates the nonleaf
nodes of the B+ -tree in the usual fashion
...
We can improve the utilization of space in a B+-tree by involving more sibling nodes in redistribution during splits and merges
...
During insertion, if a node is full the system attempts to redistribute some of its
entries to one of the adjacent nodes, to make space for a new entry
...
Since the three nodes together contain one more record
than can fit in two nodes, each node will be about two-thirds full
...
(⌊x⌋ denotes the greatest integer that is less than or equal to x; that
is, we drop the fractional part, if any
...
18
B+-tree file organization
...
12
...
If both sibling
nodes have ⌊2n/3⌋ records, instead of borrowing an entry, the system redistributes
the entries in the node and in the two siblings evenly between two of the nodes, and
deletes the third node
...
With three adjacent nodes used for redistribution,
each node can be guaranteed to have ⌊3n/4⌋ entries
...
However, the cost of update becomes higher as more
sibling nodes are involved in the redistribution
...
4 B-Tree Index Files
B-tree indices are similar to B+ -tree indices
...
In the B+ -tree of Figure 12
...
Every search-key value appears in some leaf node;
several are repeated in nonleaf nodes
...
Figure 12
...
12
...
However, since search keys that appear in
nonleaf nodes appear nowhere else in the B-tree, we are forced to include an additional pointer field for each search key in a nonleaf node
...
A generalized B-tree leaf node appears in Figure 12
...
20b
...
In nonleaf nodes, the pointers Pi are the tree pointers that we used also for B+ -trees, while the pointers Bi are
bucket or file-record pointers
...
This discrepancy
occurs because nonleaf nodes must include pointers Bi, thus reducing the number of
search keys that can be held in these nodes
...
The number of nodes accessed in a lookup in a B-tree depends on where the search
key is located
...
In contrast, it is sometimes possible to find the desired value
in a B-tree before reaching a leaf node
...
Moreover, the fact
that fewer search keys appear in a nonleaf B-tree node, compared to B+ -trees, implies
that a B-tree has a smaller fanout and therefore may have depth greater than that of
the corresponding B+ -tree
...
Deletion in a B-tree is more complicated
...
In a B-tree, the deleted entry may appear in a nonleaf node
...
Specifically, if search key Ki is deleted, the smallest search key
appearing in the subtree of pointer Pi+1 must be moved to the field formerly occupied by Ki
...
In contrast, insertion in a B-tree is only slightly more complicated than is insertion in
a B+ -tree
...
Thus, many database system implementers prefer the structural simplicity of a B+ -tree
...
12
...
File organizations based on the technique of hashing allow us to avoid accessing an index structure
...
We
study file organizations and indices based on hashing in the following sections
...
12
...
5
...
In our description of hashing, we shall use the term bucket to denote a unit of storage
that can store one or more records
...
Formally, let K denote the set of all search-key values, and let B denote the set of
all bucket addresses
...
Let h denote a hash
function
...
Assume for now that there is space in the bucket to store
the record
...
To perform a lookup on a search-key value Ki , we simply compute h(Ki ), then
search the bucket with that address
...
If we perform a lookup on K5, the
bucket h(K5) contains records with search-key values K5 and records with search-key values K7
...
Deletion is equally straightforward
...
12
...
1
...
Such
a function is undesirable because all the records have to be kept in the same bucket
...
An ideal hash
function distributes the stored keys uniformly across all the buckets, so that every
bucket has the same number of records
...
That is, the hash function assigns each bucket the
same number of search-key values from the set of all possible search-key values
...
That is, in the average case, each bucket will have
nearly the same number of values assigned to it, regardless of the actual distribution of search-key values
...
As an illustration of these principles, let us choose a hash function for the account
file using the search key branch-name
...
the desirable properties not only on the example account file that we have been using,
but also on an account file of realistic size for a large bank with many branches
...
This hash
function has the virtue of simplicity, but it fails to provide a uniform distribution,
since we expect more branch names to begin with such letters as B and R than Q and
X, for example
...
Suppose that
the minimum balance is 1 and the maximum balance is 100,000, and we use a hash
function that divides the values into 10 ranges, 1–10,000, 10,001–20,000 and so on
...
But records with balances between 1
and 10,000 are far more common than are records with balances between 90,001 and
100,000
...
If the function has a random distribution, even if there
are such correlations in the search keys, the randomness of the distribution will make
it very likely that all buckets will have roughly the same number of records, as long
as each search key occurs in only a small fraction of the records
...
)
Typical hash functions perform computation on the internal binary machine representation of characters in the search key
...
Figure 12
...
Hash functions require careful design
...
A well-designed
function gives an average-case lookup time that is a (small) constant, independent of
the number of search keys in the file
...
5
...
2 Handling of Bucket Overflows
So far, we have assumed that, when a record is inserted, the bucket to which it is
mapped has space to store the record
...
Bucket overflow can occur for several reasons:
• Insufficient buckets
...
This designation, of course, assumes that the total number of records
is known when the hash function is chosen
...
Some buckets are assigned more records than are others, so a bucket
may overflow even when other buckets still have space
...
Skew can occur for two reasons:
Figure 12
...
1
...
2
...
So that the probability of bucket overflow is reduced, the number of buckets is
chosen to be (nr/fr) ∗ (1 + d), where d is a fudge factor, typically around 0
...
Some
space is wasted: About 20 percent of the space in the buckets will be empty
...
Despite allocation of a few more buckets than required, bucket overflow can still
occur
...
If a record must be
inserted into a bucket b, and b is already full, the system provides an overflow bucket
for b, and inserts the record into the overflow bucket
...
All the overflow buck-
12
...
22
Overflow chaining in a hash structure
...
22
...
We must change the lookup algorithm slightly to handle overflow chaining
...
The
system must examine all the records in bucket b to see whether they match the search
key, as before
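Overflow chaining can be sketched in a few lines. The class and attribute names below are hypothetical, the hash function is passed in as a parameter, and buckets are modeled as in-memory lists rather than disk blocks. Lookup examines the primary bucket and then every bucket on its overflow chain.

```python
class Bucket:
    def __init__(self, capacity):
        self.capacity = capacity
        self.records = []       # (search key, record) pairs
        self.overflow = None    # next bucket in the overflow chain, if any


class HashFile:
    """Static hash file with overflow chaining (illustrative only)."""

    def __init__(self, num_buckets, capacity, h=hash):
        self.buckets = [Bucket(capacity) for _ in range(num_buckets)]
        self.capacity = capacity
        self.h = h

    def _bucket(self, key):
        return self.buckets[self.h(key) % len(self.buckets)]

    def insert(self, key, record):
        b = self._bucket(key)
        # Walk the chain until a bucket with free space is found,
        # allocating a new overflow bucket at the end if necessary.
        while len(b.records) >= b.capacity:
            if b.overflow is None:
                b.overflow = Bucket(self.capacity)
            b = b.overflow
        b.records.append((key, record))

    def lookup(self, key):
        matches = []
        b = self._bucket(key)
        while b is not None:
            # A bucket may hold records for several search keys, so
            # every record must be compared against the key.
            matches.extend(r for k, r in b.records if k == key)
            b = b.overflow
        return matches
```

For example, with two-record buckets, inserting three “Perryridge” records places the third in an overflow bucket, and lookup still returns all three.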
...
The form of hash structure that we have just described is sometimes referred to
as closed hashing
...
Instead, if a bucket is full, the system inserts records in some other bucket in the initial set of buckets B
...
Other policies, such as computing further hash functions, are also used
...
The reason is that deletion under open hashing is troublesome
...
However, in a database system, it is important to be able to handle deletion as well as insertion
...
An important drawback to the form of hashing that we have described is that
we must choose the hash function when we implement the system, and it cannot be
changed easily thereafter if the file being indexed grows or shrinks
...
12
...
If B is too small, the buckets contain
records of many different search-key values, and bucket overflows can occur
...
We study later, in Section 12
...
12
...
2 Hash Indices
Hashing can be used not only for file organization, but also for index-structure creation
...
We construct a hash index as follows
...
Figure 12
...
The hash function in the figure
computes the sum of the digits of the account number modulo 7
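A hash function of this kind is easy to write down. The sketch below assumes account numbers of the form used in the example file, such as A-217, and simply ignores non-digit characters; the function name is hypothetical.

```python
def digit_sum_hash(account_number, num_buckets=7):
    """Sum of the decimal digits of the account number, modulo num_buckets."""
    total = sum(int(ch) for ch in account_number if ch.isdecimal())
    return total % num_buckets
```

Under this function A-217 maps to bucket 3 (2 + 1 + 7 = 10, and 10 mod 7 = 3), while A-101 and A-110 both map to bucket 2.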
...
23
Hash index on search key account-number of account file
...
bucket sizes)
...
In this example, account-number is a primary key for account, so each search key has only one associated pointer
...
We use the term hash index to denote hash file structures as well as secondary
hash indices
...
A
hash index is never needed as a primary index structure, since, if a file itself is organized by hashing, there is no need for a separate hash index structure on it
...
12
...
Most databases
grow larger over time
...
Choose a hash function based on the current file size
...
2
...
Although performance degradation is avoided, a significant
amount of space may be wasted initially
...
Periodically reorganize the hash structure in response to file growth
...
This reorganization is a massive, time-consuming operation
...
Several dynamic hashing techniques allow the hash function to be modified dynamically to accommodate the growth or shrinkage of the database
...
The bibliographical notes provide references to other forms of dynamic hashing
...
6
...
As a result, space efficiency is retained
...
With extendable hashing, we choose a hash function h with the desirable properties of uniformity and randomness
...
A typical value for
b is 32
...
12
...
General extendable hash structure
...
Indeed, 2^32 is over 4 billion, and
that many buckets is unreasonable for all but the largest databases
...
We do not use the entire b
bits of the hash value initially
...
These i
bits are used as an offset into an additional table of bucket addresses
...
Figure 12
...
The i appearing above
the bucket address table in the figure indicates that i bits of the hash value h(K) are
required to determine the correct bucket for K
...
Although i bits are required to find the correct entry in the bucket
address table, several consecutive table entries may point to the same bucket
...
Therefore, we associate with each bucket an integer giving the length of the
common hash prefix
...
24 the integer associated with bucket j is shown as
ij
...
6
...
To locate the bucket containing search-key value Kl , the system takes the first i
high-order bits of h(Kl ), looks at the corresponding table entry for this bit string, and
follows the bucket pointer in the table entry
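The table lookup amounts to a right shift of the hash value. In this sketch the function names are hypothetical, the hash value is assumed to be an unsigned 32-bit integer (b = 32, the typical value mentioned earlier), and the bucket address table is modeled as a plain list with 2^i entries.

```python
B = 32  # number of bits in a hash value


def table_index(hash_value, i):
    """Return the first i high-order bits of a B-bit hash value."""
    return hash_value >> (B - i)


def locate_bucket(bucket_address_table, hash_value, i):
    """Follow the table entry selected by the i-bit hash prefix."""
    return bucket_address_table[table_index(hash_value, i)]


# With i = 2 the table has four entries; the first two share a bucket.
table = ["bucket 1", "bucket 1", "bucket 2", "bucket 3"]
found = locate_bucket(table, 0b0110 << 28, 2)
```

Here the prefix of the hash value is 01, so the second table entry is used, which points to the same bucket as the first entry.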
...
If there is room in the bucket,
12
...
If, on the other hand, the bucket is full, it
must split the bucket and redistribute the current records, plus the new one
...
• If i = ij , only one entry in the bucket address table points to bucket j
...
It
does so by considering an additional bit of the hash value
...
It replaces
each entry by two entries, both of which contain the same pointer as the original entry
...
The
system allocates a new bucket (bucket z), and sets the second entry to point
to the new bucket
...
Next, it rehashes each record in bucket
j and, depending on the first i bits (remember the system has added 1 to i),
either keeps it in bucket j or allocates it to the newly created bucket
...
Usually, the
attempt will succeed
...
If the hash function has been chosen carefully, it is unlikely
that a single insertion will require that a bucket be split more than once, unless
there are a large number of records with the same search key
...
In
such cases, overflow buckets are used to store the records, as in static hashing
...
Thus, the system can split bucket j without increasing the size of
the bucket address table
...
The system allocates a new bucket (bucket z), and sets ij and iz to the value
that results from adding 1 to the original ij value
...
(Note that with the new value for ij , not all the entries correspond to hash
prefixes that have the same value on the leftmost ij bits
...
Next, as in
the previous case, the system rehashes each record in bucket j, and allocates it
either to bucket j or to the newly created bucket z
...
In the unlikely case that it again fails,
it applies one of the two cases, i = ij or i > ij , as appropriate
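The two split cases can be combined into one insertion routine. The following is a minimal in-memory sketch, not a definitive implementation: the class names are hypothetical, callers pass the hash value directly, deletion and coalescing are omitted, and where the text would use an overflow bucket the sketch raises an exception instead.

```python
class Bucket:
    def __init__(self, depth, capacity):
        self.depth = depth        # i_j: length of the common hash prefix
        self.capacity = capacity
        self.records = []         # (hash value, record) pairs


class ExtendableHash:
    def __init__(self, capacity, bits=32):
        self.bits = bits
        self.capacity = capacity
        self.i = 0                            # prefix bits currently in use
        self.table = [Bucket(0, capacity)]    # bucket address table, 2**i entries

    def _index(self, hv):
        return hv >> (self.bits - self.i)

    def insert(self, hv, record):
        while True:
            j = self.table[self._index(hv)]
            if len(j.records) < j.capacity:
                j.records.append((hv, record))
                return
            if all(h == hv for h, _ in j.records):
                # Splitting cannot separate identical hash values.
                raise OverflowError("an overflow bucket would be needed here")
            if j.depth == self.i:
                # Case i = i_j: double the table; each entry becomes two
                # adjacent entries holding the same pointer.
                self.table = [b for b in self.table for _ in (0, 1)]
                self.i += 1
            # Case i > i_j: split bucket j without growing the table.
            j.depth += 1
            z = Bucket(j.depth, self.capacity)
            for idx, b in enumerate(self.table):
                # Entries whose newly considered prefix bit is 1 move to z.
                if b is j and (idx >> (self.i - j.depth)) & 1:
                    self.table[idx] = z
            old, j.records = j.records, []
            for h, r in old:                  # rehash the records of bucket j
                self.table[self._index(h)].records.append((h, r))
            # Loop again to reattempt the insertion.
```

The loop mirrors the reattempt step described above: after each split the insertion is tried again, and a further split happens only if the target bucket is still full.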
...
To delete a record with search-key value Kl , the system follows the same procedure for lookup as before, ending up in some bucket—say, j
...
12
...
25
Sample account file
...
The bucket too is removed
if it becomes empty
...
The procedure for deciding on
which buckets can be coalesced and how to coalesce buckets is left to you to do as an
exercise
...
Unlike coalescing of buckets, changing the size of
the bucket address table is a rather expensive operation if the table is large
...
Our example account file in Figure 12
...
The
32-bit hash values on branch-name appear in Figure 12
...
Assume that, initially, the
file is empty, as in Figure 12
...
We insert the records one by one
...
We insert the record (A-217, Brighton, 750)
...
Next, we insert the record
(A-101, Downtown, 500)
...
When we attempt to insert the next record (A-110, Downtown, 600), we find that
the bucket is full
...
We now use 1 bit, allowing us 2^1 = 2 buckets
...
26
Hash function for branch-name
...
Figure 12
...
the number of bits necessitates doubling the size of the bucket address table to two
entries
...
Figure 12
...
Next, we insert (A-215, Mianus, 700)
...
Once again, we find the bucket full and i = i1
...
This increase in the number of bits necessitates
doubling the size of the bucket address table to four entries, as in Figure 12
...
Since
the bucket of Figure 12
...
For each record in the bucket of Figure 12
...
Next, we insert (A-102, Perryridge, 400), which goes in the same bucket as Mianus
...
The insertion of the third Perryridge record, (A-218, Perryridge, 700),
leads to another overflow
...
Hence the system uses an overflow bucket, as in Figure 12
...
We continue in this manner until we have inserted all the account records of Figure 12
...
The resulting structure appears in Figure 12
...
Figure 12
...
Figure 12
...
12
...
3 Comparison with Other Schemes
We now examine the advantages and disadvantages of extendable hashing, compared with the other schemes that we have discussed
...
Furthermore, there is minimal space overhead
...
30
Hash structure after seven insertions
...
Figure 12
...
fix length
...
The main space saving of extendable hashing
over other forms of hashing is that no buckets need to be reserved for future growth;
rather, buckets can be allocated dynamically
...
This extra reference has only a minor effect on performance
...
5 do not have this extra level of indirection, they lose their minor performance advantage as they become
full
...
The bibliographical notes reference more detailed descriptions of extendable hashing
implementation
...
12
...
We
can organize files of records as ordered files, by using index-sequential organization
or B+ -tree organizations
...
Finally, we can organize them as heap files, where the records are not ordered in any
particular way
...
12
...
A database-system implementor could provide many schemes, leaving the final decision of which schemes to use
to the database designer
...
Most database systems support B+ -trees and may additionally support
some form of hash file organization or hash indices
...
The fourth issue, the expected type of query, is critical to the choice of
ordered indexing or hashing
...
, An
from r
where Ai = c
then, to process this query, the system will perform a lookup on an ordered index
or a hash structure for attribute Ai , for value c
...
An ordered-index lookup requires time proportional to the log
of the number of values in r for Ai
...
The only advantage to
an index over a hash structure for this form of query is that the worst-case lookup
time is proportional to the log of the number of values in r for Ai
...
However, the worst-case lookup time is unlikely to occur with hashing, and
hashing is preferable in this case
...
Such a query takes the following form:
select A1 , A2 ,
...
12
...
First, we perform a lookup on value c1
...
If, instead of an ordered index, we have a hash structure, we can perform a lookup
on c1 and can locate the corresponding bucket—but it is not easy, in general, to determine the next bucket that must be examined
...
Thus, there is no simple notion of
“next bucket in sorted order
...
Since values are
scattered randomly by the hash function, the values in the specified range are likely
to be scattered across many or all of the buckets
...
Usually the designer will choose ordered indexing unless it is known in advance
that range queries will be infrequent, in which case hashing would be chosen
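The trade-off above can be illustrated with a minimal sketch. It assumes a range condition of the form c1 <= Ai <= c2; the data, the `hash_index` dictionary, and the sorted-list "ordered index" are hypothetical stand-ins, not the structures of any particular system.

```python
import bisect

# Hypothetical (search-key value, record id) pairs for attribute Ai.
entries = [(350, "r3"), (400, "r1"), (500, "r0"), (700, "r2"), (900, "r4")]

# Hash structure: a dictionary stands in for the buckets.
hash_index = {}
for key, rid in entries:
    hash_index.setdefault(key, []).append(rid)

# Ordered index: keys are kept sorted, so a range is one contiguous slice.
sorted_entries = sorted(entries)
ordered_keys = [key for key, _ in sorted_entries]
ordered_rids = [rid for _, rid in sorted_entries]


def equality_lookup(c):
    """Ai = c: a single probe of the hash structure."""
    return hash_index.get(c, [])


def range_lookup(c1, c2):
    """c1 <= Ai <= c2: find c1 in the ordered index, then read in sorted order."""
    lo = bisect.bisect_left(ordered_keys, c1)
    hi = bisect.bisect_right(ordered_keys, c2)
    return ordered_rids[lo:hi]
```

With the hash structure alone, the range query would have to probe every possible key value or read every bucket, because nothing links one bucket to the "next" one in key order.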
...
12
...
Indices
are not required for correctness, since they are redundant data structures
...
Indices are also important for efficient enforcement of integrity constraints
...
In principle, a database system can decide automatically what indices to create
...
Therefore, most SQL implementations provide the programmer
control over creation and removal of indices via data-definition-language commands
...
Although the syntax that we
show is widely used and supported by many database systems, it is not part of the
SQL:1999 standard
...
We create an index by the create index command, which takes the form
create index <index-name> on <relation-name> (<attribute-list>)
The attribute-list is the list of attributes of the relations that form the search key for
the index
...
12
...
Thus, the command
create unique index b-index on branch (branch-name)
declares branch-name to be a candidate key for branch
...
If the index-creation attempt succeeds, any subsequent attempt to insert a tuple that violates the
key declaration will fail
...
Many database systems also provide a way to specify the type of index to be used
(such as B+ -tree or hashing)
...
The index name we specified for an index is required to drop an index
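As a hedged illustration, the commands can be tried against SQLite through Python's standard `sqlite3` module. The column names use underscores because hyphens are not valid in unquoted SQL identifiers, and the table layout and tuple values are assumptions made for the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table branch (branch_name text, branch_city text, assets integer)"
)
conn.execute("insert into branch values ('Perryridge', 'Horseneck', 1700000)")

# Declares branch_name to be a candidate key for branch.
conn.execute("create unique index b_index on branch (branch_name)")

# A later insert that violates the key declaration is rejected.
try:
    conn.execute("insert into branch values ('Perryridge', 'Brooklyn', 900000)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# The index name is what we use to drop the index.
conn.execute("drop index b_index")
row_count = conn.execute("select count(*) from branch").fetchone()[0]
```

Other systems may differ in details such as identifier rules or the error raised, so this is a sketch of the behavior described rather than a portable script.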
...
9 Multiple-Key Access
Until now, we have assumed implicitly that only one index (or hash table) is used to
process a query on a relation
...
12
...
1 Using Multiple Single-Key Indices
Assume that the account file has two indices: one for branch-name and one for balance
...
” We write
select loan-number
from account
where branch-name = “Perryridge” and balance = 1000
There are three strategies possible for processing this query:
1
...
Examine each such record to see whether balance = 1000
...
Use the index on balance to find all records pertaining to accounts with balances of $1000
...
”
3
...
Also, use the index on balance to find pointers to all records
12
...
Take the intersection of these
two sets of pointers
...
The third strategy is the only one of the three that takes advantage of the existence
of multiple indices
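A minimal sketch of the third strategy follows. The `account` records and the dictionary-based indices are hypothetical; record numbers stand in for pointers.

```python
# Hypothetical account records: record number -> (branch_name, balance).
account = {
    0: ("Perryridge", 1000),
    1: ("Perryridge", 750),
    2: ("Brighton", 1000),
    3: ("Perryridge", 1000),
}


def build_index(records, position):
    """Map each value of one attribute to the set of record pointers."""
    index = {}
    for rid, row in records.items():
        index.setdefault(row[position], set()).add(rid)
    return index


branch_index = build_index(account, 0)
balance_index = build_index(account, 1)

# Fetch both pointer sets, then intersect them before touching any record.
branch_pointers = branch_index.get("Perryridge", set())
balance_pointers = balance_index.get(1000, set())
matching = sorted(branch_pointers & balance_pointers)
```

Both pointer sets are read in full even though the intersection is small, which is the cost the conditions listed next describe.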
...
• There are many records pertaining to accounts with a balance of $1000
...
If these conditions hold, we must scan a large number of pointers to produce a small
result
...
Bitmap indices are outlined in Section 12
...
4
...
9
...
The structure of the index is the same as that of any
other index, the only difference being that the search key is not a single attribute, but
rather is a list of attributes
...
, an ), where the indexed attributes are A1 ,
...
The ordering
of search-key values is the lexicographic ordering
...
Lexicographic ordering is basically the same as alphabetic ordering of words
...
As an illustration, consider the query
select loan-number
from account
where branch-name < “Perryridge” and balance = 1000
We can answer this query by using an ordered index on the search key (branch-name,
balance): For each value of branch-name that is less than “Perryridge” in alphabetic
order, the system locates records with a balance value of 1000
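The lexicographic ordering can be sketched with Python tuples, which compare the same way. The index entries below are hypothetical, and a sorted list stands in for the ordered index.

```python
import bisect

# Hypothetical entries: ((branch_name, balance), record number), kept sorted.
index = sorted([
    (("Brighton", 750), 4),
    (("Downtown", 1000), 1),
    (("Mianus", 1000), 5),
    (("Perryridge", 1000), 0),
    (("Perryridge", 1500), 2),
    (("Round Hill", 1000), 3),
])
keys = [key for key, _ in index]


def equal_on_both(branch, balance):
    """branch-name = branch and balance = balance: one contiguous slice."""
    lo = bisect.bisect_left(keys, (branch, balance))
    hi = bisect.bisect_right(keys, (branch, balance))
    return [rid for _, rid in index[lo:hi]]


def branch_below_and_balance_equal(branch, balance):
    """branch-name < branch and balance = balance.

    Keys with a smaller branch name form a prefix of the sorted order,
    but the matching balances are scattered through that prefix, so
    every entry in the prefix has to be examined.
    """
    end = bisect.bisect_left(keys, (branch,))
    return [rid for key, rid in index[:end] if key[1] == balance]
```

The second function shows why a comparison on the first attribute is less efficient: the index narrows the search only to the prefix, not to the qualifying entries.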
...
The difference between this query and the previous one is that the condition on
branch-name is a comparison condition, rather than an equality condition
...
We shall
consider the grid file in Section 12
...
3
...
12
...
The R-tree is an extension of the B+ -tree to handle indexing on multiple dimensions
...
12
...
3 Grid Files
Figure 12
...
The two-dimensional array in the figure is called the grid array, and
the one-dimensional arrays are called linear scales
...
Search keys are mapped to cells in this way
...
Only some
of the buckets and pointers from the cells are shown in the figure
...
The dotted boxes in the
figure indicate which cells point to the same bucket
...
To find the cell to which the key is mapped, we independently locate the row and column to which the cell belongs
...
To do so, we search the array to find the least element that is
greater than “Brighton”
...
If it were the ith element, the search key would map to row i − 1
...
Figure: Grid file on keys branch-name and balance of the account file (grid array, linear scale for balance, buckets)
...
the final row
...
In this case, the balance 500000 maps to column 6
...
Similarly, (“Downtown”, 60000) would map to the cell in row 1 column 5
...
To perform a lookup to answer our example query, with the search condition of
branch-name < “Perryridge” and balance = 1000
we find all rows that can contain branch names less than “Perryridge”, using the
linear scale on branch-name
...
Rows 3 and beyond
contain branch names greater than or equal to “Perryridge”
...
In this case, only column 1 satisfies
this condition
...
We therefore look up all entries in the buckets pointed to from these three cells
...
The buckets may contain some search
keys that do not satisfy the required condition, so each search key in the buckets must
be tested again to see whether it satisfies the search condition
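The mapping from search keys to cells, and the lookup just described, can be sketched as follows. The two linear scales are assumed values, chosen only so that the results agree with the examples in the text; rows and columns are numbered from 0.

```python
import bisect

# Assumed linear scales (not taken from the figure).
branch_scale = ["Central", "Mianus", "Perryridge", "Townsend"]
balance_scale = [1_000, 2_000, 5_000, 10_000, 50_000, 100_000]


def cell_for(branch_name, balance):
    """Locate row and column independently.

    bisect_right returns the position of the least scale element greater
    than the key; if no element is greater, it returns the final row or
    column number.
    """
    row = bisect.bisect_right(branch_scale, branch_name)
    col = bisect.bisect_right(balance_scale, balance)
    return row, col


def candidate_cells(branch_upper, balance):
    """Cells to examine for: branch-name < branch_upper and balance = balance."""
    last_row = bisect.bisect_left(branch_scale, branch_upper)
    col = bisect.bisect_right(balance_scale, balance)
    return [(row, col) for row in range(last_row + 1)]
```

For the example query, `candidate_cells("Perryridge", 1000)` yields the cells in rows 0 through 2 of column 1. The buckets those cells point to may still hold keys that fail the condition, so each retrieved key must be tested again.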
...
We must choose the linear scales in such a way that the records are uniformly distributed across the cells
...
If more than one cell points
to A, the system changes the cell pointers so that some point to A and others to B
...
If only one cell points to bucket A, B becomes
an overflow bucket for A
...
The
process is much like the expansion of the bucket address table in extensible hashing,
and is left for you to do as an exercise
...
If we want our structure to be used for queries on n keys, we construct an n-dimensional grid array with n linear scales
...
Consider
this query:
select *
from account
where branch-name = “Perryridge”
The linear scale on branch-name tells us that only cells in row 3 can satisfy this condition
...
Thus, we can use a grid-file index on
two search keys to answer queries on either search key by itself, as well as to answer
queries on both search keys
...
If each index were maintained separately, the three together would
occupy more space, and the cost of updating them would be high
...
However, they impose a space overhead (the grid directory can become large), as
well as a performance overhead on record insertion and deletion
...
If insertions to the file are frequent, reorganization will have to be carried out
periodically, and that can have a high cost
...
9
...
For bitmap indices to be used, records in a relation must be numbered sequentially, starting from, say, 0
...
This is particularly easy to achieve if records are fixed in size, and allocated on consecutive blocks of a file
...
Consider a relation r, with an attribute A that can take on only one of a small number (for example, 2 to 20) values
...
Another example
would be an attribute income-level, where income has been broken up into 5 levels:
L1: $0 − 9999, L2: $10, 000 − 19, 999, L3: 20, 000 − 39, 999, L4: 40, 000 − 74, 999, and
L5: 75, 000 − ∞
...
12
...
4
...
In its simplest form, a bitmap index on the
attribute A of relation r consists of one bitmap for each value that A can take
...
The ith bit of the
bitmap for value vj is set to 1 if the record numbered i has the value vj for attribute
A
...
In our example, there is one bitmap for the value m and one for f
...
All
other bits of the bitmap for m are set to 0
...
Figure 12
...
We now consider when bitmaps are useful
...
The bitmap index doesn’t
really help to speed up such a selection
...
record
number  name   gender  address     income-level
0       John   m       Perryridge  L1
1       Diana  f       Brooklyn    L2
2       Mary   f       Jonestown   L1
3       Peter  m       Brooklyn    L4
4       Kathy  f       Perryridge  L3

Bitmaps for gender
m   10010
f   01101

Bitmaps for income-level
L1  10100
L2  01000
L3  00001
L4  00010
L5  00000
Figure 12
...
In fact, bitmap indices are useful for selections mainly when there are selections
on multiple keys
...
Consider now a query that selects women with income in the range 10, 000 −
19, 999
...
To evaluate this
selection, we fetch the bitmaps for gender value f and the bitmap for income-level value
L2, and perform an intersection (logical-and) of the two bitmaps
...
In the example in Figure 12
...
Since the first attribute can take 2 values, and the second can take 5 values, we
would expect only about 1 in 10 records, on an average, to satisfy a combined condition on the two attributes
...
The system can then compute the
query result by finding all bits with value 1 in the intersection bitmap, and retrieving
the corresponding records
...
Another important use of bitmaps is to count the number of tuples satisfying a
given selection
...
For instance, if we wish
to find out how many women have an income level L2, we compute the intersection
of the two bitmaps, and then count the number of bits that are 1 in the intersection
bitmap
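A minimal sketch of both uses, built from the five records of the example relation. Python integers stand in for bitmaps, with bit i representing record i; that representation is an assumption of the sketch.

```python
# The five records of the example relation: (name, gender, address, income-level).
records = [
    ("John", "m", "Perryridge", "L1"),
    ("Diana", "f", "Brooklyn", "L2"),
    ("Mary", "f", "Jonestown", "L1"),
    ("Peter", "m", "Brooklyn", "L4"),
    ("Kathy", "f", "Perryridge", "L3"),
]


def build_bitmaps(rows, position):
    """One bitmap per attribute value; bit i is 1 if record i has that value."""
    bitmaps = {}
    for i, row in enumerate(rows):
        value = row[position]
        bitmaps[value] = bitmaps.get(value, 0) | (1 << i)
    return bitmaps


gender = build_bitmaps(records, 1)
income = build_bitmaps(records, 3)

# Women with income level L2: logical-and of the two bitmaps.
intersection = gender["f"] & income["L2"]

# Retrieve the record numbers whose bit is 1 ...
matching = [i for i in range(len(records)) if (intersection >> i) & 1]

# ... or just count them, without touching the relation at all.
count = bin(intersection).count("1")
```

Counting needs only the bitmaps, which is why the index alone can answer such a query.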
...
Bitmap indices are generally quite small compared to the actual relation size
...
Thus the space occupied by a single bitmap
is usually less than 1 percent of the space occupied by the relation
...
If an attribute A
of the relation can take on only one of 8 values, a bitmap index on attribute A would
consist of 8 bitmaps, which together occupy only 1 percent of the size of the relation
...
12
...
To recognize deleted
records, we can store an existence bitmap, in which bit i is 0 if record i does not exist
and 1 otherwise
...
9
...
2
...
Therefore,
we can do insertion either by appending records to the end of the file or by replacing
deleted records
...
9
...
2 Efficient Implementation of Bitmap Operations
We can compute the intersection of two bitmaps easily by using a for loop: the ith
iteration of the loop computes the and of the ith bits of the two bitmaps
...
A word usually consists of 32 or 64
bits, depending on the architecture of the computer
...
What is important to note is that a single
bit-wise and instruction can compute the intersection of 32 or 64 bits at once
...
Only 31,250 instructions are needed to compute the intersection of two bitmaps for our relation, assuming a 32-bit word length
...
Just like bitmap intersection is useful for computing the and of two conditions,
bitmap union is useful for computing the or of two conditions
...
The complement operation can be used to compute a predicate involving the negation of a condition, such as not (income-level = L1)
...
It may appear that not (income-level = L1) can be implemented by just computing the complement of the bitmap for income level L1
...
Bits corresponding to such records would be 0 in the original bitmap,
but would become 1 in the complement, although the records don’t exist
...
For instance, if the value
of income-level is null, the bit would be 0 in the original bitmap for value L1, and 1 in
the complement bitmap
...
Similarly, to handle null values, the complement bitmap must
also be intersected with the complement of the bitmap for the value null
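The two corrections can be sketched as below. The six-record relation, the deleted record, and the null value are hypothetical; Python integers stand in for bitmaps, with bit i representing record i.

```python
def bitmap(record_numbers):
    """Build a bitmap with bit i set for every record number i given."""
    bits = 0
    for i in record_numbers:
        bits |= 1 << i
    return bits


def records_in(bits):
    """List the record numbers whose bit is 1."""
    return [i for i in range(bits.bit_length()) if (bits >> i) & 1]


NUM_RECORDS = 6
ALL_ONES = (1 << NUM_RECORDS) - 1

income_l1 = bitmap([0, 2])            # records with income-level = L1
income_null = bitmap([5])             # records whose income-level is null
existence = bitmap([0, 1, 2, 3, 5])   # record 4 has been deleted

# Plain complement: wrongly includes the deleted record and the null record.
naive = ~income_l1 & ALL_ONES

# Intersect with the existence bitmap and with the complement of the
# bitmap for null to get not (income-level = L1).
correct = naive & existence & (~income_null & ALL_ONES)
```

Here `records_in(naive)` includes records 4 and 5, while `records_in(correct)` keeps only the existing records with a known income level other than L1.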
...
We can maintain an array with 256 entries, where the ith entry stores the
1
...
12
...
Set the total count initially
to 0
...
The number of addition operations would be 1/8 of the
number of tuples, and thus the counting process is very efficient
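A minimal sketch of the counting technique, assuming the 256-entry array stores, for each byte value i, the number of bits that are 1 in the binary representation of i. The names `BITS_IN_BYTE` and `count_ones` are illustrative.

```python
# Precomputed once: entry i holds the number of 1 bits in byte value i.
BITS_IN_BYTE = [bin(i).count("1") for i in range(256)]


def count_ones(bitmap_bytes):
    """Count the 1 bits of a bitmap with one table lookup and one addition per byte."""
    total = 0  # set the total count initially to 0
    for byte in bitmap_bytes:
        total += BITS_IN_BYTE[byte]
    return total


example_bitmap = bytes([0b10100000, 0b00000001, 0b11111111])
ones = count_ones(example_bitmap)
```

Each byte covers 8 records, so the loop performs one addition for every 8 tuples.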
...
12
...
4
...
In a B+ -tree index leaf, for each value we would normally maintain a list
of all records with that value for the indexed attribute
...
For a value that
occurs in many records, we store a bitmap instead of a list of records
...
Let N be
the number of records in the relation, and assume that a record has a 64-bit number
identifying it
...
In contrast,
the list representation requires 64 bits per record where the value occurs, or 64 ∗
N/16 = 4N bits
...
In our example (with a 64-bit record identifier), if fewer than 1 in 64 records
have a particular value, the list representation is preferable for identifying records
with that value, since it uses fewer bits than the bitmap representation
...
Thus, bitmaps can be used as a compressed storage mechanism at the leaf nodes
of B+ -trees, for those values that occur very frequently
...
10 Summary
• Many queries reference only a small proportion of the records in a file
...
• Index-sequential files are one of the oldest index schemes used in database
systems
...
To allow
fast random access, we use an index structure
...
Dense indices contain entries for every search-key value, whereas
sparse indices contain entries only for some search-key values
...
The other indices are called secondary indices
...
However, they impose an overhead
on modification of the database
...
12
...
To overcome this deficiency, we can use
a B+ -tree index
...
The height of a B+ tree is proportional to the logarithm to the base N of the number of records
in the relation, where each nonleaf node stores N pointers; the value of N is
often around 50 or 100
...
• Lookup on B+ -trees is straightforward and efficient
...
The number of
operations required for lookup, insertion, and deletion on B+ -trees is proportional to the logarithm to the base N of the number of records in the relation,
where each nonleaf node stores N pointers
...
• B-tree indices are similar to B+ -tree indices
...
The
major disadvantages are overall complexity and reduced fanout for a given
node size
...
• Sequential file organizations require an index structure to locate data
...
Since we do not know at design time precisely which search-key
values will be stored in the file, a good hash function to choose is one that assigns search-key values to buckets such that the distribution is both uniform
and random
...
Such hash functions cannot easily accommodate databases that grow significantly larger over time
...
One example is extendable hashing, which
copes with changes in database size by splitting and coalescing buckets as the
database grows and shrinks
...
For notational convenience, we assume hash file organizations
have an implicit hash index on the search key used for hashing
...
When multiple
attributes are involved in a selection condition, we can intersect record identifiers retrieved from multiple indices
...
• Bitmap indices provide a very compact representation for indexing attributes
with very few distinct values
...
Review Terms
• Access types
• Access time
• Insertion time
• Deletion time
• Space overhead
• Ordered index
• Primary index
• Clustering index
• Secondary index
• Nonclustering index
• Index-sequential file
• Index record/entry
• Dense index
• Sparse index
• Multilevel index
• Sequential scan
• B+ -Tree index
• Balanced tree
• B+ -Tree file organization
• B-Tree index
• Static hashing
• Hash file organization
• Hash index
• Bucket
• Hash function
• Bucket overflow
• Skew
• Closed hashing
• Dynamic hashing
• Extendable hashing
• Multiple-key access
• Indices on multiple keys
• Grid files
• Bitmap index
• Bitmap operations
    ◦ Intersection
    ◦ Union
    ◦ Complement
    ◦ Existence bitmap
Exercises
12
...
12
...
12
...
4 Is it possible in general to have two primary indices on the same relation for
different search keys? Explain your answer
...
12
...
5 Construct a B+ -tree for the following set of key values:
(2, 3, 5, 7, 11, 17, 19, 23, 29, 31)
Assume that the tree is initially empty and values are added in ascending order
...
Four
b
...
Eight
12
...
5, show the steps involved in the following
queries:
a
...
b
...
12
...
5, show the form of the tree after each of the
following series of operations:
a
...
b
...
c
...
d
...
e
...
12
...
What is the expected height of the tree as a function of n?
12
...
5 for a B-tree
...
10 Explain the distinction between closed and open hashing
...
12
...
12 Suppose that we are using extendable hashing on a file that contains records
with the following search-key values:
2, 3, 5, 7, 11, 17, 19, 23, 29, 31
Show the extendable hash structure for this file if the hash function is h(x) = x
mod 8 and buckets can hold three records
...
13 Show how the extendable hash structure of Exercise 12
...
Delete 11
...
Delete 31
...
Insert 1
...
Insert 15
...
12
...
14 Give pseudocode for deletion of entries from an extendable hash structure,
including details of when and how to coalesce buckets
...
12
...
Give details of how the count should be maintained when buckets are
split, coalesced or deleted
...
Therefore, it
is best not to reduce the size as soon as it is possible to do so, but instead do
it only if the number of index entries becomes small compared to the bucket
address table size
...
16 Why is a hash structure not the best choice for a search key on which range
queries are likely?
12
...
In cases where an overflow bucket would be needed, we instead reorganize the grid file
...
12
...
25
...
Construct a bitmap index on the attributes branch-name and balance, dividing balance values into 4 ranges: below 250, 250 to below 500, 500 to below
750, and 750 and above
...
Consider a query that requests all accounts in Downtown with a balance of
500 or more
...
12
...
Make sure that
your technique works even in the presence of null values, by using a bitmap
for the value null
...
20 How does data encryption affect index schemes? In particular, how might it
affect schemes that attempt to store data in sorted order?
Bibliographical Notes
Discussions of the basic data structures in indexing and hashing can be found in
Cormen et al
...
B-tree indices were first introduced in Bayer [1972] and Bayer
and McCreight [1972]
...
The bibliographic notes in Chapter 16 provides references to
research on allowing concurrent accesses and updates on B+ -trees
...
Several alternative tree and treelike search structures have been proposed
...
Such trees may not be balanced
in the sense that B+ -trees are
...
[1989], Orenstein
[1982], Litwin [1981] and Fredkin [1960]
...
Knuth [1973] analyzes a large number of different hashing techniques
...
Extendable hashing was introduced by Fagin et al
...
Linear hashing was introduced by Litwin [1978] and Litwin [1980]; Larson
[1982] presents a performance analysis of linear hashing
...
Larson [1988] presents a variant of linear hashing
...
An alternative given by Ramakrishna and Larson [1989] allows retrieval in a single disk access
at the price of a high overhead for a small fraction of database modifications
...
The grid file structure appears in Nievergelt et al
...
Bitmap indices, and variants called bit-sliced indices and projection indices are described in O’Neil and Quass [1997]
...
They provide very large speedups on certain types of queries, and are today implemented on most database systems
...
Chapter 13
Query Processing
Query processing refers to the range of activities involved in extracting data from
a database
...
13
...
1
...
Parsing and translation
2
...
Evaluation
Before query processing can begin, the system must translate the query into a usable form
...
A more useful internal representation
is one based on the extended relational algebra
...
This translation process is similar to the work
performed by the parser of a compiler
...
The system constructs a parse-tree representation of the query, which it then translates into
a relational-algebra expression
...
13
...
1
Steps in query processing
...
1 Most compiler texts cover parsing (see the bibliographical
notes)
...
For example, we have seen that, in SQL, a query could be expressed in several different ways
...
Furthermore, the relational-algebra representation of a query
specifies only partially how to evaluate a query; there are usually several ways to
evaluate relational-algebra expressions
...
For example, to implement the preceding selection, we can search
every tuple in account to find tuples with balance less than 2500
...
To specify fully how to evaluate a query, we need not only to provide the relational-algebra expression, but also to annotate it with instructions specifying how to evaluate
...
Therefore, the stored relation can be used, instead of uses of the view being replaced by the expression defining
the view
...
2
...
13
...
2
A query-evaluation plan
...
Annotations may state the algorithm to be used for a specific
operation, or the particular index or indices to use
...
A sequence of primitive operations that can be used to evaluate a query is a query-execution plan or query-evaluation plan
...
2 illustrates an evaluation plan
for our example query, in which a particular index (denoted in the figure as “index 1”) is specified for the selection operation
...
The different evaluation plans for a given query can have different costs
...
Rather, it is the responsibility of the system to construct a query-evaluation plan
that minimizes the cost of query evaluation
...
Once the query plan is chosen, the query is evaluated with that plan, and the result
of the query is output
...
For instance, instead of using the
relational-algebra representation, several databases use an annotated parse-tree representation based on the structure of the given SQL query
...
In order to optimize a query, a query optimizer must know the cost of each operation
...
Section 13
...
Sections 13
...
6 cover the evaluation of individual relational-algebra operations
...
In Section 13
...
13
...
13
...
The response time for a query-evaluation plan (that is, the clock
time required to execute the plan), assuming no other activity is going on on the computer, would account for all these costs, and could be used as a good measure of the
cost of the plan
...
Moreover, CPU speeds have been
improving much faster than have disk speeds
...
Finally, estimating the CPU time is relatively hard, compared to estimating the disk-access cost
...
We use the number of block transfers from disk as a measure of the actual cost
...
This assumption ignores the variance arising from rotational
latency (waiting for the desired data to spin under the read-write head) and seek
time (the time that it takes to move the head over the desired track or cylinder)
...
We also
need to distinguish between reads and writes of blocks, since it takes more time to
write a block to disk than to read a block from disk
...
The number of seek operations performed
2
...
The number of blocks written
and then add up these numbers after multiplying them by the average seek time,
average transfer time for reading a block, and average transfer time for writing a
block, respectively
...
For simplicity we ignore these details, and leave
it to you to work out more precise cost estimates for various operations
...
These are taken into account separately where required
...
In the best case, all data can be read into the buffers, and the disk does not need
to be accessed again
...
When presenting cost
estimates, we generally assume the worst case
...
3 Selection Operation
In query processing, the file scan is the lowest-level operator to access data
...
13
...
13
...
1 Basic Algorithms
Consider a selection operation on a relation whose tuples are stored together in one
file
...
In a linear search, the system scans each file block and tests
all records to see whether they satisfy the selection condition
...
The cost of linear search, in terms of number of I/O operations, is br , where
br denotes the number of blocks in the file
...
Although it may be slower than other algorithms for implementing selection, the linear search algorithm can be applied to any file, regardless of the
ordering of the file, or the availability of indices, or the nature of the selection
operation
...
• A2 (binary search)
...
The system performs the binary search
on the blocks of the file
...
If the selection is on a nonkey attribute, more than one block may
contain required records, and the cost of reading the extra blocks has to be
added to the cost estimate
...
2), and dividing it by
the average number of records that are stored per block of the relation
...
3
...
In Chapter 12, we pointed out that it is
efficient to read the records of a file in an order corresponding closely to physical
order
...
An index that is not a
primary index is called a secondary index
...
Ordered indices,
such as B+ -trees, also permit access to tuples in a sorted order, which is useful for
implementing range queries
...
We
use the selection predicate to guide us in the choice of the index to use in processing
the query
...
13
...
For an equality comparison on a key
attribute with a primary index, we can use the index to retrieve a single record
that satisfies the corresponding equality condition
...
• A4 (primary index, equality on nonkey)
...
The only difference from the previous
case is that multiple records may need to be fetched
...
The cost of the operation is proportional to the height of the tree, plus the
number of blocks containing records with the specified search key
...
Selections specifying an equality condition
can use a secondary index
...
In the first case, only one record is retrieved, and the cost is equal to the
height of the tree plus one I/O operation to fetch the record
...
The cost could become even worse than
that of linear search if a large number of records are retrieved
...
If secondary indices store pointers to records’ physical location, the pointers
will have to be updated when records are moved
...
Accessing a record through a secondary index is then
even more expensive since a search has to be performed on the B+ -tree used in the
file organization
...
13
...
3 Selections Involving Comparisons
Consider a selection of the form σA≤v (r)
...
A primary ordered index (for example, a
primary B+ -tree index) can be used when the selection condition is a comparison
...
For A ≥ v, we
look up the value v in the index to find the first tuple in the file that has a value
of A = v
...
all tuples that satisfy the condition
...
For comparisons of the form A < v or A ≤ v, an index lookup is not required
...
The case of A ≤ v is similar, except that the scan continues up to (but
not including) the first tuple with attribute A > v
...
• A7 (secondary index, comparison)
...
The lowest-level index blocks are scanned, either from the smallest value up to v (for <
and ≤), or from v up to the maximum value (for > and ≥)
...
This step may require an I/O operation for each record fetched, since consecutive records may
be on different disk blocks
...
Therefore the secondary index should be used only if very few records are
selected
...
3
...
We now consider more complex selection
predicates
...
• Negation: The result of a selection σ¬θ (r) is the set of tuples of r for which the
condition θ evaluates to false
...
We can implement a selection operation involving either a conjunction or a disjunction of simple conditions by using one of the following algorithms:
• A8 (conjunctive selection using one index)
...
If one
is, one of the selection algorithms A2 through A7 can retrieve records satisfying that condition
...
To reduce the cost, we choose a θi and one of algorithms A1 through A7 for
which the combination results in the least cost for σθi (r)
...
• A9 (conjunctive selection using composite index)
...
If the selection specifies an equality condition on two
or more attributes, and a composite index exists on these combined attribute
fields, then the index can be searched directly
...
• A10 (conjunctive selection by intersection of identifiers)
...
This algorithm requires indices with
record pointers, on the fields involved in the individual conditions
...
The intersection of all the retrieved pointers is the set of pointers to tuples
that satisfy the conjunctive condition
...
If indices are not available on all the individual
conditions, then the algorithm tests the retrieved records against the remaining conditions
...
This cost can be reduced by sorting the list of pointers and
retrieving records in the sorted order
...
Section 13
...
• A11 (disjunctive selection by union of identifiers)
...
The union of all the
retrieved pointers yields the set of pointers to all tuples that satisfy the disjunctive condition
...
However, if even one of the conditions does not have an access path, we
will have to perform a linear scan of the relation to find tuples that satisfy the
condition
...
The implementation of selections with negation conditions is left to you as an exercise (Exercise 13
...
13
...
4 Sorting
Sorting of data plays an important role in database systems for two reasons
...
Second, and equally important for
query processing, several of the relational operations, such as joins, can be implemented efficiently if the input relations are first sorted
...
5
...
However, such a process orders the relation
only logically, through an index, rather than physically
...
For this reason, it may be desirable to order the records physically
...
In the first
case, standard sorting techniques such as quick-sort can be used
...
Sorting of relations that do not fit in memory is called external sorting
...
We describe the external sort-merge algorithm next
...
1
...
i = 0;
repeat
    read M blocks of the relation, or the rest of the relation,
        whichever is smaller;
    sort the in-memory part of the relation;
    write the sorted data to run file Ri ;
    i = i + 1;
until the end of the relation
2
...
Suppose, for now, that the total number of runs, N, is less than M, so that we can allocate one page frame to each
run and have space left to hold one page of output
...
13
...
The output file is buffered
to reduce the number of disk write operations
...
In general, if the relation is much larger than memory, there may be M or more
runs generated in the first stage, and it is not possible to allocate a page frame for each
run during the merge stage
...
Since there is enough memory for M − 1 input buffer pages, each merge can
take M − 1 runs as input
...
Then, it merges the next M − 1
runs similarly, and so on, until it has processed all the initial runs
...
If this reduced number of runs
is still greater than or equal to M , another pass is made, with the runs created by the
first pass as input
...
The
passes repeat as many times as required, until the number of runs is less than M ; a
final pass then generates the sorted output
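The two stages can be sketched in a few lines. This is a simplified in-memory model: Python lists stand in for disk blocks and run files, `heapq.merge` plays the role of the buffered multiway merge, and the sample relation is hypothetical.

```python
import heapq


def external_sort_merge(blocks, M):
    """Sort a relation given as a list of blocks (lists of records).

    M is the number of page frames available; M - 1 runs are merged at a
    time, leaving one frame for output.
    """
    if M < 3:
        raise ValueError("this sketch needs at least 3 page frames")

    # Stage 1: read M blocks at a time, sort them, and write a run.
    runs = []
    for start in range(0, len(blocks), M):
        in_memory = [rec for block in blocks[start:start + M] for rec in block]
        runs.append(sorted(in_memory))

    # Stage 2: repeat merge passes until a single run remains.
    while len(runs) > 1:
        runs = [
            list(heapq.merge(*runs[i:i + M - 1]))
            for i in range(0, len(runs), M - 1)
        ]
    return runs[0] if runs else []


# Hypothetical relation with one record per block and M = 3 page frames.
relation = [[("g", 24)], [("a", 19)], [("d", 31)], [("c", 33)],
            [("b", 14)], [("e", 16)], [("r", 16)], [("d", 21)],
            [("m", 3)], [("p", 2)], [("d", 7)], [("a", 14)]]
sorted_records = external_sort_merge(relation, 3)
```

A real implementation would read one block of each run at a time and write the output block by block; the sketch keeps only the control structure of the passes.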
...
3 illustrates the steps of the external sort-merge for an example relation
...
During the merge stage,
two page frames are used for input and one for output
...
Figure: External sorting using sort-merge (create runs, merge passes, sorted output)
...
We compute how many block transfers are required for the external sort-merge
in this way: Let br denote the number of blocks containing records of relation r
...
The initial number of runs is ⌈br/M⌉
...
Each of these passes reads every block of the relation
once and writes it out once, with two exceptions
...
Second, there may be runs that
are not read in or written out during a pass— for example, if there are M runs to
be merged in a pass, M − 1 are read in and merged, and one run is not accessed
during the pass
...
3, we get a total of 12∗(4+1) =
60 block transfers, as you can verify from the figure
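The count can be checked with a short calculation. The sketch below assumes the usual accounting for external sort-merge: run creation and every merge pass each read and write the whole relation, except that the final output is not written. The function name external_sort_cost is hypothetical.

```python
def external_sort_cost(b_r, M):
    """Block transfers for external sort-merge, excluding the final write.

    b_r -- number of blocks in the relation
    M   -- number of page frames available
    """
    runs = -(-b_r // M)               # initial runs: ceil(b_r / M)
    merge_passes = 0
    while runs > 1:                   # each pass merges M - 1 runs at a time
        runs = -(-runs // (M - 1))    # ceil(runs / (M - 1))
        merge_passes += 1
    return b_r * (2 * merge_passes + 1)
```

For 12 blocks and M = 3 there are 4 initial runs and 2 merge passes, so the function returns 12 ∗ (4 + 1) = 60.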
...
13
...
We use the term equi-join to refer to a join of the form r ⋈r.A=s.B s, where A and
B are attributes or sets of attributes of relations r and s respectively
...
• Number of blocks of customer: bcustomer = 400
...
• Number of blocks of depositor: bdepositor = 100
...
5
...
4 shows a simple algorithm to compute the theta join, r ⋈θ s, of two relations r and s
...
Relation r is called the outer relation and
relation s the inner relation of the join, since the loop for r encloses the loop for s
...
Like the linear file-scan algorithm for selection, the nested-loop join algorithm requires no indices, and it can be used regardless of what the join condition is
...
13
...
end
end
Figure 13
...
join can be expressed as a theta join followed by elimination of repeated attributes by
a projection
...
The nested-loop join algorithm is expensive, since it examines every pair of tuples
in the two relations
...
The number
of pairs of tuples to be considered is nr ∗ ns, where nr denotes the number of tuples in
r, and ns denotes the number of tuples in s
...
In the worst case, the buffer can hold only one block of each
relation, and a total of nr ∗ bs + br block accesses would be required, where br and
bs denote the number of blocks containing tuples of r and s respectively
...
If one of the relations fits entirely in main memory, it is beneficial to use that relation as the inner relation, since the inner relation would then be read only once
...
Now consider the natural join of depositor and customer
...
We can use the nested loops to compute the join; assume that depositor
is the outer relation and customer is the inner relation in the join
...
In the worst case, the number of
block accesses is 5000 ∗ 400 + 100 = 2,000,100
...
This computation
requires at most 100 + 400 = 500 block accesses — a significant improvement over the
worst-case scenario
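The two figures above follow directly from the cost expressions. A minimal sketch, with the function name nested_loop_cost and the flag inner_fits_in_memory introduced here purely for illustration:

```python
def nested_loop_cost(n_r, b_r, b_s, inner_fits_in_memory=False):
    """Block accesses for nested-loop join with r as the outer relation.

    n_r -- tuples in the outer relation r
    b_r -- blocks in r
    b_s -- blocks in the inner relation s
    """
    if inner_fits_in_memory:
        # Each relation is read exactly once.
        return b_r + b_s
    # Worst case: the inner relation is scanned once per outer tuple.
    return n_r * b_s + b_r
```

With depositor as the outer relation (5000 tuples, 100 blocks) and customer as the inner relation (400 blocks), the worst case gives 2,000,100 and the in-memory case gives 500.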
...
13
...
2 Block Nested-Loop Join
If the buffer is too small to hold either relation entirely in memory, we can still obtain a major saving in block accesses if we process the relations on a per-block basis,
rather than on a per-tuple basis
...
5 shows block nested-loop join, which is
a variant of the nested-loop join where every block of the inner relation is paired with
every block of the outer relation
...
for each block Br of r do begin
    for each block Bs of s do begin
        for each tuple tr in Br do begin
            for each tuple ts in Bs do begin
                test pair (tr, ts) to see if they satisfy the join condition
                if they do, add tr · ts to the result
...
5
Block nested-loop join
...
As before,
all pairs of tuples that satisfy the join condition are added to the result
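The block-at-a-time loop structure can be written out in Python as follows. This is an illustrative sketch only: relations are assumed to be lists of blocks, each block a list of tuples, and theta is a caller-supplied predicate; none of these names come from the text.

```python
def block_nested_loop_join(r_blocks, s_blocks, theta):
    """Join two relations given as lists of blocks (lists of tuples)."""
    result = []
    for block_r in r_blocks:            # one read of each outer block
        for block_s in s_blocks:        # inner relation scanned once per outer block
            for t_r in block_r:
                for t_s in block_s:
                    if theta(t_r, t_s):
                        result.append(t_r + t_s)   # concatenation tr · ts
    return result
```

Because the two inner loops run entirely over blocks already in memory, the number of block reads depends only on the two outer loops.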
...
Thus, in the worst case, there will be a total of br ∗ bs + br block accesses, where br and bs denote the number of blocks containing records of r and s
respectively
...
In the best case, there will be
br + bs block accesses
...
In the worst case we have to read each block of customer
once for each block of depositor
...
This cost is a significant improvement over the
5000 ∗ 400 + 100 = 2,000,100 block accesses needed in the worst case for the basic
nested-loop join
...
The performance of the nested-loop and block nested-loop procedures can be further improved:
• If the join attributes in a natural join or an equi-join form a key on the inner
relation, then for each outer relation tuple the inner loop can terminate as soon
as the first match is found
...
In other words, if memory has M blocks, we read in M − 2 blocks
of the outer relation at a time, and when we read each block of the inner relation we join it with all the M − 2 blocks of the outer relation
...
The total cost is then
⌈br/(M − 2)⌉ ∗ bs + br
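A small calculation makes the effect of the extra memory concrete. The sketch below assumes M − 2 frames hold the outer relation, one frame buffers the inner relation, and one buffers output; the function name block_nested_loop_cost is hypothetical.

```python
def block_nested_loop_cost(b_r, b_s, M=3):
    """Block accesses for block nested-loop join with M memory blocks.

    The outer relation r is read M - 2 blocks at a time, and the inner
    relation s is scanned once for each such chunk.
    """
    if M < 3:
        raise ValueError("need at least 3 blocks of memory")
    outer_chunks = -(-b_r // (M - 2))   # ceil(b_r / (M - 2))
    return outer_chunks * b_s + b_r
```

With M = 3 this reduces to br ∗ bs + br, which gives 40,100 for depositor (100 blocks) as the outer relation and customer (400 blocks) as the inner relation. With M = 22, the same join would need only 5 ∗ 400 + 100 = 2100 accesses.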
...
13
...
This scanning
method orders the requests for disk blocks so that the data remaining in the
buffer from the previous scan can be reused, thus reducing the number of disk
accesses needed
...
Section 13
...
3 describes this optimization
...
5
...
4), if an index is available on the inner loop’s join
attribute, index lookups can replace file scans
...
This join method is called an indexed nested-loop join; it can be used with existing
indices, as well as with temporary indices created for the sole purpose of evaluating
the join
...
For example, consider depositor ⋈ customer
...
Then, the relevant tuples in s
are those that satisfy the selection “customer-name = John”
...
For each tuple
in the outer relation r, a lookup is performed on the index for s, and the relevant
tuples are retrieved
...
Then, br disk accesses are needed to read relation
r, where br denotes the number of blocks containing records of r
...
Then, the cost of the join can be computed as
br + nr ∗ c, where nr is the number of records in relation r, and c is the cost of a single
selection on s using the join condition
...
3 how to estimate
the cost of a single selection algorithm (possibly using indices); that estimate gives us
the value of c
...
For example, consider an indexed nested-loop join of depositor ⋈ customer, with
depositor as the outer relation
...
Since customer has 10,000 tuples, the height of the tree is 4, and one more
access is needed to find the actual data
...
This cost is lower than the 40,100 accesses needed
for a block nested-loop join
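The expression br + nr ∗ c can be evaluated for this example. The sketch assumes c = 5 (four index levels plus one access for the data, as stated above) and the depositor figures used earlier (100 blocks, 5000 tuples); the function name is hypothetical.

```python
def indexed_nested_loop_cost(b_r, n_r, c):
    """Block accesses for indexed nested-loop join.

    b_r -- blocks in the outer relation r
    n_r -- tuples in r
    c   -- cost of one index lookup on s plus fetching the matching data
    """
    return b_r + n_r * c
```

Under these assumptions, indexed_nested_loop_cost(100, 5000, 5) evaluates to 100 + 5000 ∗ 5 = 25,100.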
...
5
...
Let r(R) and s(S) be the relations whose
natural join is to be computed, and let R∩S denote their common attributes
...
pr := address of first tuple of r;
ps := address of first tuple of s;
while (ps ≠ null and pr ≠ null) do
    begin
        ts := tuple to which ps points;
        Ss := {ts};
        set ps to point to next tuple of s;
        done := false;
        while (not done and ps ≠ null) do
            begin
                ts′ := tuple to which ps points;
                if (ts′[JoinAttrs] = ts[JoinAttrs])
                    then begin
                        Ss := Ss ∪ {ts′};
                        set ps to point to next tuple of s;
                    end
                    else done := true;
            end
        tr := tuple to which pr points;
        while (pr ≠ null and tr[JoinAttrs] < ts[JoinAttrs]) do
            begin
                set pr to point to next tuple of r;
                tr := tuple to which pr points;
            end
        while (pr ≠ null and tr[JoinAttrs] = ts[JoinAttrs]) do
            begin
                for each ts in Ss do
                    begin
                        add ts ⋈ tr to result;
                    end
                set pr to point to next tuple of r;
                tr := tuple to which pr points;
            end
    end
...
6
Merge join
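The same control flow can be expressed in Python. This is a sketch under assumptions, not a definitive implementation: both inputs are in-memory lists already sorted on the join attribute, key_r and key_s are caller-supplied functions that extract that attribute, and the result is the concatenation of matching tuples rather than a natural join with repeated attributes removed.

```python
def merge_join(r, s, key_r, key_s):
    """Merge join of two lists of tuples sorted on their join attribute."""
    result = []
    pr = ps = 0
    while pr < len(r) and ps < len(s):
        # Collect the set Ss of s tuples that share the current join value.
        value = key_s(s[ps])
        group = []
        while ps < len(s) and key_s(s[ps]) == value:
            group.append(s[ps])
            ps += 1
        # Skip r tuples whose join value is smaller.
        while pr < len(r) and key_r(r[pr]) < value:
            pr += 1
        # Join each r tuple with an equal value against the whole group.
        while pr < len(r) and key_r(r[pr]) == value:
            result.extend(t_s + r[pr] for t_s in group)
            pr += 1
    return result
```

Each input is traversed once from front to back, which is why the method needs only a single pass over both sorted files.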
...
Then, their join can be computed
by a process much like the merge stage in the merge-sort algorithm
...
6 shows the merge join algorithm
...
The merge join algorithm associates one pointer
with each relation
...
As the algorithm proceeds, the pointers move through the relation
...
The algorithm in Figure 13
...
Then, the corresponding tuples (if any) of the other relation are read in, and
are processed as they are read
...
7 shows two relations that are sorted on their join attribute a1
...
Since the relations are in sorted order, tuples with the same value on the join attributes are in consecutive order
...
Since it makes only
a single pass through both files, the merge join method is efficient; the number of
block accesses is equal to the sum of the number of blocks in both files, br + bs
...
The merge join algorithm
can also be easily extended from natural joins to the more general case of equi-joins
...
The join attribute here is customer-name
...
In this case, the merge join takes a total of 400 +
100 = 500 block accesses
...
Sorting customer takes 400 ∗ (2⌈log2(400/3)⌉ + 1), or
6800, block transfers, with 400 more transfers to write out the result
...
Thus, the total cost is 9100 block transfers if the relations are not sorted,
and the memory size is just 3 blocks
...
Adding the cost of writing out the sorted results and reading them
back gives a total cost of 2500 block transfers if the relations are not sorted and the
memory size is 25 blocks
...
6 requires that the set
Ss of all tuples with the same value for the join attributes must fit in main memory
...
7
Sorted relations for merge join
...
This requirement can usually be met, even if the relation s is large
...
The overall cost of the merge join increases as a
result
...
The algorithm scans the records
through the indices, resulting in their being retrieved in sorted order
...
Hence, each tuple access could involve accessing a disk block, and
that is costly
...
Suppose that one of the relations is sorted; the other is unsorted, but has a secondary B+-tree index on the join attributes
...
The result file contains tuples from the sorted relation and addresses for
tuples of the unsorted relation
...
Extensions of the technique to handle
two unsorted relations are left as an exercise for you
...
5
...
In the hash join algorithm, a hash function h is used to
partition tuples of both relations
...
We assume that
• h is a hash function mapping JoinAttrs values to {0, 1,
...
• Hr0 , Hr1 ,
...
Each tuple tr ∈ r is put in partition Hri , where i = h(tr [JoinAttrs])
...
, Hsnh denote partitions of s tuples, each initially empty
...
The hash function h should have the “goodness” properties of randomness and uniformity that we discussed in Chapter 12
...
8 depicts the partitioning of the
relations
...
If that value is hashed to some value i, the r tuple has to be in Hri and the
s tuple in Hsi
...
For example, if d is a tuple in depositor, c a tuple in customer, and h a hash function
on the customer-name attributes of the tuples, then d and c must be tested only if
h(c) = h(d)
...
[Figure: relations r and s, each divided by the hash function into partitions 0 through 4]
Figure 13
...
However, if h(c) = h(d), we must test c and d to see whether the values in their join
attributes are the same, since it is possible that c and d have different customer-names
that have the same hash value
...
9 shows the details of the hash join algorithm to compute the natural
join of relations r and s
...
After the partitioning of the relations, the rest of the hash join code performs
a separate indexed nested-loop join on each of the partition pairs i, for i = 0,
...
To do so, it first builds a hash index on each Hsi , and then probes (that is, looks
up Hsi ) with tuples from Hri
...
The hash index on Hsi is built in memory, so there is no need to access the disk to
retrieve the tuples
...
In the
course of the indexed nested-loop join, the system uses this hash index to retrieve
records that will match records in the probe input
...
It is straightforward to extend the hash join algorithm to compute
general equi-joins
...
It is not necessary for the partitions of the probe relation to fit in
memory
...
If the
size of the build relation is bs blocks, then, for each of the nh partitions to be of size
less than or equal to M, nh must be at least ⌈bs/M⌉
...
/* Partition s */
for each tuple ts in s do begin
    i := h(ts[JoinAttrs]);
    Hsi := Hsi ∪ {ts};
end
/* Partition r */
for each tuple tr in r do begin
    i := h(tr[JoinAttrs]);
    Hri := Hri ∪ {tr};
end
/* Perform join on each partition */
for i := 0 to nh do begin
    read Hsi and build an in-memory hash index on it
    for each tuple tr in Hri do begin
        probe the hash index on Hsi to locate all tuples ts
            such that ts[JoinAttrs] = tr[JoinAttrs]
        for each matching tuple ts in Hsi do begin
            add tr ⋈ ts to the result
        end
    end
end
Figure 13
...
to account for the extra space occupied by the hash index on the partition as well, so
nh should be correspondingly larger
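The partition, build, and probe steps can be sketched in Python. The sketch makes several assumptions that are not part of the text: everything is held in memory, Python's built-in hash reduced modulo n_h plays the role of h, partitions are numbered 0 to n_h − 1, and matching tuples are concatenated rather than merged on their common attributes.

```python
from collections import defaultdict


def hash_join(r, s, key_r, key_s, n_h=4):
    """Hash join with s as the build input and r as the probe input."""
    def h(value):
        return hash(value) % n_h

    # Partition both relations with the same hash function.
    parts_s = defaultdict(list)
    parts_r = defaultdict(list)
    for t_s in s:
        parts_s[h(key_s(t_s))].append(t_s)
    for t_r in r:
        parts_r[h(key_r(t_r))].append(t_r)

    result = []
    for i in range(n_h):
        # Build: in-memory hash index on partition i of s.
        index = defaultdict(list)
        for t_s in parts_s[i]:
            index[key_s(t_s)].append(t_s)
        # Probe: look up each tuple of partition i of r.
        for t_r in parts_r[i]:
            for t_s in index.get(key_r(t_r), []):
                result.append(t_r + t_s)
    return result
```

Tuples are compared on their actual join values during the probe, so two different values that happen to land in the same partition are never joined by mistake.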
...
13
...
5
...
Instead, partitioning has to be done in repeated passes
...
Each bucket generated by one pass is separately read in and
partitioned again in the next pass, to create smaller partitions
...
The system
repeats this splitting of the input until each partition of the build input fits in memory
...
A relation does not need recursive partitioning if M > nh + 1, or equivalently M >
(bs/M) + 1, which simplifies (approximately) to M > √bs
...
We can use a memory of this size to partition relations of size
9 million blocks, which is 36 gigabytes
...
13
...
5
...
Hash-table overflow can occur if there are many
tuples in the build relation with the same values for the join attributes, or if the hash
function does not have the properties of randomness and uniformity
...
We can handle a small amount of skew by increasing the number of partitions so
that the expected size of each partition (including the hash index on the partition)
is somewhat less than the size of memory
...
5
...
Even if we are conservative on the sizes of the partitions, by using a fudge factor,
overflows can still occur
...
Overflow resolution is performed during the build phase,
if a hash-index overflow is detected
...
Similarly, Hri is also partitioned using the new hash
function, and only tuples in the matching partitions need to be joined
...
In overflow avoidance, the build relation s
is initially partitioned into many small partitions, and then some partitions are combined in such a way that each combined partition fits in memory
...
If a large number of tuples in s have the same value for the join attributes, the resolution and avoidance techniques may fail on some partitions
...
13
...
5
...
Our analysis assumes that there is no hash-table overflow
...
The partitioning of the two relations r and s calls for a complete reading of both relations, and a subsequent writing back of them
...
The build and probe phases read each of the partitions once, calling for a further br + bs accesses
...
Accessing such partially filled blocks can add an overhead of at most 2nh for each of the relations, since
each of the nh partitions could have a partially filled block that has to be written and
read back
...
The overhead 4nh is quite small compared to br + bs , and can be ignored
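Adding up the pieces named above gives a simple estimate for the case without recursive partitioning: 2(br + bs) for partitioning, br + bs for the build and probe phases, and up to 4nh for partially filled blocks. A sketch, with the function name hash_join_cost chosen here for illustration:

```python
def hash_join_cost(b_r, b_s, n_h=0):
    """Block transfers for hash join when no recursive partitioning is needed.

    Partitioning reads and writes both relations (2 * (b_r + b_s)), the build
    and probe phases read them once more (b_r + b_s), and partially filled
    blocks add at most 4 * n_h.
    """
    return 3 * (b_r + b_s) + 4 * n_h
```

For depositor (100 blocks) and customer (400 blocks), ignoring the 4nh term gives 3 ∗ 500 = 1500 block transfers.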
...
Each pass reduces
the size of each of the partitions by an expected factor of M − 1; and passes are
repeated until each partition is of size at most M blocks
...
Since, in each pass,
every block of s is read in and written out, the total block transfers for partitioning of
s is 2bs ⌈log_(M−1)(bs) − 1⌉
...
With a memory size of 20
blocks, depositor can be partitioned into five partitions, each of size 20 blocks, which
size will fit into memory
...
The relation
customer is similarly partitioned into five partitions, each of size 80
...
The hash join can be improved if the main memory size is large
...
The cost estimate goes down to br + bs
...
5
...
4 Hybrid Hash-Join
The hybrid hash-join algorithm performs another optimization; it is useful when
memory sizes are relatively large, but not all of the build relation fits in memory
...
Hence,
a total of nh + 1 blocks of memory are needed for partitioning the two relations
...
Further, the hash function is designed in such a
way that the hash index on Hs0 fits in M − nh − 1 blocks, in order that, at the end of
partitioning of s, Hs0 is completely in memory and a hash index can be built on Hs0
...
After they are used for probing,
the tuples can be discarded, so the partition Hr0 does not occupy any memory space
...
The system writes out tuples in the other partitions as usual, and joins them later
...
If the size of the build relation is bs , nh is approximately equal to bs /M
...
For example, suppose the block size is 4 kilobytes, and
the build relation size is 1 gigabyte
...
Consider the join customer ⋈ depositor again
...
It occupies 20 blocks
of memory; one block is for input and one block each is for buffering the other four
partitions
...
Ignoring the cost of writing partially filled blocks, the
cost is 3(80 + 320) + 20 + 80 = 1300 block transfers, instead of 1500 block transfers
without the hybrid hashing optimization
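The saving can be reproduced with a short calculation. The sketch assumes that the blocks of the first partition of each relation are read once and never written back, while all other blocks cost three transfers each, as in the plain hash join; the function and parameter names are hypothetical.

```python
def hybrid_hash_join_cost(b_r, b_s, resident_r, resident_s):
    """Block transfers for hybrid hash-join, ignoring partially filled blocks.

    resident_r, resident_s -- blocks of each relation that fall in the first
    partition, which is kept in memory (build side) or probed immediately
    (probe side) instead of being written out.
    """
    spilled = (b_r - resident_r) + (b_s - resident_s)
    return 3 * spilled + resident_r + resident_s
```

With 20 of depositor's 100 blocks and 80 of customer's 400 blocks in the first partition, this gives 3(80 + 320) + 20 + 80 = 1300. Setting both resident sizes to zero recovers the 1500 of the plain hash join.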
...
5
...
The other join techniques are more efficient than the nested-loop join and its
variants, but can handle only simple join conditions, such as natural joins or equi-joins
...
3
...
Consider the following join with a conjunctive condition:
r ⋈θ1∧θ2∧···∧θn s
One or more of the join techniques described earlier may be applicable for joins on
the individual conditions r ⋈θ1 s, r ⋈θ2 s, r ⋈θ3 s, and so on
...
The result of the complete join consists of those tuples in the intermediate result that
satisfy the remaining conditions
θ1 ∧ · · · ∧ θi−1 ∧ θi+1 ∧ · · · ∧ θn
These conditions can be tested as tuples in r ⋈θi s are being generated
...
6 describes algorithms for computing the union of relations
...
6 Other Operations
Other relational operations and extended relational operations — such as duplicate
elimination, projection, set operations, outer join, and aggregation — can be implemented as outlined in Sections 13
...
1 through 13
...
5
...
13
...
1 Duplicate Elimination
We can implement duplicate elimination easily by sorting
...
With
external sort-merge, duplicates found while a run is being created can be removed
before the run is written to disk, thereby reducing the number of block transfers
...
The worst-case cost estimate for duplicate elimination is the same
as the worst-case cost estimate for sorting of the relation
...
First, the relation is partitioned on the basis of a hash function on the whole
tuple
...
While constructing the hash index, a tuple is inserted only if it is not already present
...
After all tuples in the partition have been processed, the tuples in the hash index are written to the result
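The hashing approach to duplicate elimination can be sketched as follows. The sketch is an in-memory illustration under assumptions: a Python set stands in for the in-memory hash index, the built-in hash of the whole tuple is the partitioning function, and the names eliminate_duplicates and n_h are introduced here only for the example.

```python
def eliminate_duplicates(relation, n_h=4):
    """Remove duplicate tuples by partitioning on a hash of the whole tuple."""
    # Partition: identical tuples always land in the same partition.
    partitions = [[] for _ in range(n_h)]
    for t in relation:
        partitions[hash(t) % n_h].append(t)

    result = []
    for part in partitions:
        index = set()                 # in-memory hash index for this partition
        for t in part:
            if t not in index:        # insert only if not already present
                index.add(t)
        result.extend(index)          # write the index contents to the result
    return result
```

Since duplicates can only occur within a partition, each partition can be processed independently of the others.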
...
Because of the relatively high cost of duplicate elimination, SQL requires an explicit
request by the user to remove duplicates; otherwise, the duplicates are retained
...
6
...
Duplicates can be eliminated by the methods described in Section 13
...
1
...
Generalized projection (which was
discussed in Section 3
...
1) can be implemented in the same way as projection
...
6
...
In r ∪ s, when a concurrent scan of both relations reveals the same tuple in
both files, only one of the tuples is retained
...
We implement set difference, r − s, similarly, by
retaining tuples in r only if they are absent in s
...
If the relations are not sorted initially, the cost of sorting has to be
included
...
Hashing provides another way to implement these set operations
...
, Hrnh and Hs0 , Hs1 ,
...
Depending on the
operation, the system then takes these steps on each partition i = 0, 1
...
13
...
Build an in-memory hash index on Hri
...
Add the tuples in Hsi to the hash index only if they are not already present
...
Add the tuples in the hash index to the result
...
Build an in-memory hash index on Hri
...
For each tuple in Hsi , probe the hash index, and output the tuple to the
result only if it is already present in the hash index
...
Build an in-memory hash index on Hri
...
For each tuple in Hsi , probe the hash index, and, if the tuple is present in
the hash index, delete it from the hash index
...
Add the tuples remaining in the hash index to the result
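The per-partition steps for the three set operations can be sketched in Python. The sketch assumes r and s are duplicate-free lists of tuples held in memory, uses a Python set as the in-memory hash index on each partition of r, and introduces the names hash_set_operation and op purely for illustration.

```python
def _partition(relation, n_h):
    parts = [[] for _ in range(n_h)]
    for t in relation:
        parts[hash(t) % n_h].append(t)
    return parts


def hash_set_operation(r, s, op, n_h=4):
    """Compute r union s, r intersect s, or r minus s partition by partition."""
    result = []
    for h_r, h_s in zip(_partition(r, n_h), _partition(s, n_h)):
        index = set(h_r)              # build an in-memory hash index on H_ri
        if op == "union":
            for t in h_s:             # add s tuples only if not already present
                index.add(t)
            result.extend(index)
        elif op == "intersection":
            # Output an s tuple only if it is already in the index.
            result.extend(t for t in h_s if t in index)
        elif op == "difference":
            for t in h_s:             # delete r tuples that also occur in s
                index.discard(t)
            result.extend(index)
        else:
            raise ValueError("op must be 'union', 'intersection', or 'difference'")
    return result
```

The same partitioning and build step serves all three operations; only the way each partition of s is applied to the index differs.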
...
6
...
3
...
For example, the natural left
outer join customer ⟕ depositor contains the join of customer and depositor, and, in
addition, for each customer tuple t that has no matching tuple in depositor (that is,
where customer-name is not in depositor), the following tuple t1 is added to the result
...
The remaining attributes (from the schema of depositor) of tuple t1 contain the value
null
...
Compute the corresponding join, and then add further tuples to the join result to get the outer-join result
...
To evaluate r ⟕θ s, we first compute r ⋈θ s, and
save that result as temporary relation q1
...
We can use any of the algorithms for computing the joins, projection, and set difference described earlier
to compute the outer joins
...
The right outer-join operation r ⟖θ s is equivalent to s ⟕θ r, and can
therefore be implemented in a symmetric fashion to the left outer join
...
2
...
It is easy to extend the nested-loop join algorithms
to compute the left outer join: Tuples in the outer relation that do not match
any tuple in the inner relation are written to the output after being padded
with null values
...
...
Merge join
can be extended to compute the full outer join as follows: When the merge
of the two relations is being done, tuples in either relation that did not match
any tuple in the other relation can be padded with nulls and written to the output
...
Since the relations are sorted, it is easy to detect whether or
not a tuple matches any tuples from the other relation
...
The cost estimates for implementing outer joins using the merge join algorithm are the same as are those for the corresponding join
...
The extension of the hash join algorithm to compute outer joins is left for
you to do as an exercise (Exercise 13
...
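The nested-loop extension for the left outer join described above can be sketched in Python. The sketch is illustrative only: relations are in-memory lists of tuples, theta is a caller-supplied predicate, None stands in for null, and s_arity (the number of attributes of the inner relation) is a parameter introduced here.

```python
def left_outer_nested_loop_join(r, s, theta, s_arity):
    """Left outer theta join of r (outer) with s (inner)."""
    result = []
    for t_r in r:
        matched = False
        for t_s in s:
            if theta(t_r, t_s):
                result.append(t_r + t_s)
                matched = True
        if not matched:
            # Pad the unmatched outer tuple with nulls for the attributes of s.
            result.append(t_r + (None,) * s_arity)
    return result
```

For instance, a customer with no depositor tuple appears once in the output, with None in every depositor position.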
13
...
5 Aggregation
Recall the aggregation operator G, discussed in Section 3
...
2
...
The aggregation operation can be implemented in the same way as duplicate elimination
...
However, instead of eliminating tuples with the same value for the grouping attribute, we
gather them into groups, and apply the aggregation operations on each group to get
the result
...
Instead of gathering all the tuples in a group and then applying the aggregation
operations, we can implement the aggregation operations sum, min, max, count, and
avg on the fly as the groups are being constructed
...
For the count operation, it maintains a running count for each group for
which a tuple has been found
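On-the-fly aggregation can be sketched with one small state record per group. The sketch below is an assumption-laden illustration: groups are kept in a Python dictionary, avg is derived from the running sum and count at the end (a common approach), and the names aggregate_on_the_fly, group_key, and value are introduced here.

```python
def aggregate_on_the_fly(tuples, group_key, value):
    """Compute sum, count, min, max, and avg per group in a single pass."""
    state = {}
    for t in tuples:
        g, v = group_key(t), value(t)
        if g not in state:
            # First tuple seen for this group: initialize its running values.
            state[g] = {"sum": v, "count": 1, "min": v, "max": v}
        else:
            s = state[g]
            s["sum"] += v
            s["count"] += 1
            s["min"] = min(s["min"], v)
            s["max"] = max(s["max"], v)
    # avg is obtained by dividing the running sum by the running count.
    return {g: dict(s, avg=s["sum"] / s["count"]) for g, s in state.items()}
```

Only one record per group is kept, regardless of how many tuples the group contains.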
...
If all tuples of the result will fit in memory, neither the sort-based nor the hash-based
implementation needs to write any tuples to disk
...
When we use on-the-fly aggregation techniques, only one tuple needs to be stored for each of the groups
...
13
...
Now
we consider how to evaluate an expression containing multiple operations
...
The result of each evaluation is materialized in a temporary
relation for subsequent use
...
An
alternative approach is to evaluate several operations simultaneously in a pipeline,
with the results of one operation passed on to the next, without the need to store a
temporary relation
...
7
...
7
...
We shall see that the costs of these approaches can differ
substantially, but also that there are cases where only the materialization approach is
feasible
...
7
...
Consider the expression
Πcustomer-name (σbalance<2500 (account) ⋈ customer)
in Figure 13
...
[Figure: operator tree with Π customer-name at the root, above the join of σ balance < 2500 (applied to account) and customer]
Figure 13
...
...
In our example, there is only one such operation; the selection operation on account
...
We execute these operations by the algorithms that we
studied earlier, and we store the results in temporary relations
...
In our
example, the inputs to the join are the customer relation and the temporary relation
created by the selection on account
...
By repeating the process, we will eventually evaluate the operation at the root of
the tree, giving the final result of the expression
...
Evaluation as just described is called materialized evaluation, since the results of
each intermediate operation are created (materialized) and then are used for evaluation of the next-level operations
...
When we computed the cost estimates of algorithms, we ignored the
cost of writing the result of the operation to disk
...
We assume that the records
of the result accumulate in a buffer, and, when the buffer is full, they are written to
disk
...
Double buffering (using two buffers, with one continuing execution of the algorithm while the other is being written out) allows the algorithm to execute more
quickly by performing CPU activity in parallel with I/O activity
...
7
...
We achieve this reduction by combining several relational operations into a pipeline of operations, in which the results of one operation are passed
along to the next operation in the pipeline
...
Combining operations into a pipeline eliminates the cost of
reading and writing temporary relations
...
If materialization were applied, evaluation would involve creating a temporary relation to hold the result of the
join, and then reading back in the result to perform the projection
...
By combining the join and
the projection, we avoid creating the intermediate result, and instead create the final
result directly
...
13
...
7
...
1 Implementation of Pipelining
We can implement a pipeline by constructing a single, complex operation that combines the operations that constitute the pipeline
...
Therefore, each operation in the pipeline is modeled as a separate process or thread within the system,
which takes a stream of tuples from its pipelined inputs, and generates a stream of
tuples for its output
...
In the example of Figure 13
...
In turn,
it passes the results of the join to the projection as they are generated
...
However,
as a result of pipelining, the inputs to the operations are not available all at once for
processing
...
Demand driven
2
...
Each time that an operation receives a request
for tuples, it computes the next tuple (or tuples) to be returned, and then returns
that tuple
...
If it has some pipelined inputs, the operation also
makes requests for tuples from its pipelined inputs
...
In a producer-driven pipeline, operations do not wait for requests to produce
tuples, but instead generate the tuples eagerly
...
An operation at any other level of a pipeline generates output tuples
when it gets input tuples from lower down in the pipeline, until its output buffer is
full
...
In either case, once the output buffer is full, the operation waits
until its parent operation removes tuples from the buffer, so that the buffer has space
for more tuples
...
The operation repeats this process until all the output tuples have been
generated
...
In a parallel-processing system, operations in a pipeline
may be run concurrently on distinct processors (see Chapter 20)
...
pulling data up an operation tree from the top
...
Each operation in a demand-driven pipeline can be implemented as an iterator,
which provides the following functions: open(), next(), and close()
...
The implementation of the operation in turn calls open() and next() on its inputs, to get its input
tuples when required
...
The iterator maintains the state of its execution in between calls, so that
successive next() requests receive successive result tuples
...
When the next() function is called, the file scan continues from after the previous point; when the next tuple satisfying the selection is
found by scanning the file, the tuple is returned after storing the point where it was
found in the iterator state
...
On calls to
next(), it would return the next pair of matching tuples
...
Details of the implementation of iterators are left for you to complete in Exercise 13
...
Demand-driven pipelining is used more commonly than producer-driven
pipelining, because it is easier to implement
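A demand-driven pipeline built from iterators can be sketched in Python. The classes below are a minimal illustration, not the implementation asked for in the exercise: Scan, Select, Project, and run are names chosen here, relations are in-memory lists, and next() returns None to signal that no tuples remain.

```python
class Scan:
    """File scan over an in-memory list of tuples."""
    def __init__(self, rows):
        self.rows = rows
        self.pos = None

    def open(self):
        self.pos = 0                      # start of the "file"

    def next(self):
        if self.pos >= len(self.rows):
            return None
        t = self.rows[self.pos]
        self.pos += 1                     # remember where the scan stopped
        return t

    def close(self):
        self.pos = None


class Select:
    """Selection that pulls tuples from its input on demand."""
    def __init__(self, child, predicate):
        self.child, self.predicate = child, predicate

    def open(self):
        self.child.open()

    def next(self):
        t = self.child.next()
        while t is not None and not self.predicate(t):
            t = self.child.next()
        return t

    def close(self):
        self.child.close()


class Project:
    """Projection applied to each tuple as it is requested."""
    def __init__(self, child, fn):
        self.child, self.fn = child, fn

    def open(self):
        self.child.open()

    def next(self):
        t = self.child.next()
        return None if t is None else self.fn(t)

    def close(self):
        self.child.close()


def run(op):
    """Pull all tuples from the operation at the top of the pipeline."""
    op.open()
    out = []
    t = op.next()
    while t is not None:
        out.append(t)
        t = op.next()
    op.close()
    return out
```

Calling run on the top operation drives the whole pipeline: each next() request propagates down to the scan, and no intermediate relation is stored.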
...
7
...
2 Evaluation Algorithms for Pipelining
Consider a join operation whose left-hand-side input is pipelined
...
This
unavailability limits the choice of join algorithm to be used
...
However, indexed nested-loop join can be used: As tuples are received for the
left-hand side of the join, they can be used to index the right-hand-side relation, and
to generate tuples in the join result
...
The restrictions on the evaluation algorithms that are eligible for use are a limiting
factor for pipelining
...
Suppose that the join of
r and s is required, and input r is pipelined
...
The cost of this technique is nr ∗ HTi , where HTi is the height of the
index on s
...
With a join
technique such as hash join, it may be possible to perform the join with a cost of
about 3(br + bs )
...
doner := false;
dones := false;
r := ∅;
s := ∅;
result := ∅;
while not doner or not dones do
    begin
        if queue is empty, then wait until queue is not empty;
        t := top entry in queue;
        if t = Endr then doner := true
        else if t = Ends then dones := true
        else if t is from input r
            then
                begin
                    r := r ∪ {t};
                    result := result ∪ ({t} ⋈ s);
                end
            else /* t is from input s */
                begin
                    s := s ∪ {t};
                    result := result ∪ (r ⋈ {t});
                end
    end
Figure 13
...
The effective use of pipelining requires the use of evaluation algorithms that can
generate output tuples even as tuples are received for the inputs to the operation
...
Only one of the inputs to a join is pipelined
...
Both inputs to the join are pipelined
...
If the pipelined input tuples are sorted on the join attributes, and the join
condition is an equi-join, merge join can also be used
...
However, tuples that are not in the
first partition will be output only after the entire pipelined input relation is received
...
If both inputs are pipelined, the choice of join algorithms is more restricted
...
Another alternative is the pipelined join technique, shown in Figure
13
...
The algorithm assumes that the input tuples for both input relations, r and s,
are pipelined
...
Special queue entries, called Endr and Ends , which serve as end-of-file
...
For efficient evaluation, appropriate indices should be built on the
relations r and s
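The pipelined join can be simulated in a single thread. The sketch below makes simplifying assumptions that are not in the text: the queue is a pre-filled deque rather than one fed by concurrent producers, each entry is a (source, tuple) pair, the sentinels END_R and END_S stand in for the end-of-input entries, and the tuples received so far are kept in plain lists instead of indexed structures.

```python
from collections import deque

END_R = object()   # stand-in for the end-of-input marker of r
END_S = object()   # stand-in for the end-of-input marker of s


def pipelined_join(queue, theta):
    """Join two pipelined inputs whose tuples arrive interleaved in one queue."""
    r, s, result = [], [], []
    done_r = done_s = False
    while not (done_r and done_s):
        if not queue:
            # A real system would wait here; this simulation cannot.
            raise RuntimeError("queue ran empty before both end markers arrived")
        entry = queue.popleft()
        if entry is END_R:
            done_r = True
        elif entry is END_S:
            done_s = True
        else:
            source, t = entry
            if source == "r":
                r.append(t)
                # Join the new r tuple with all s tuples seen so far.
                result.extend(t + u for u in s if theta(t, u))
            else:
                s.append(t)
                # Join the new s tuple with all r tuples seen so far.
                result.extend(u + t for u in r if theta(u, t))
    return result
```

Each matching pair is produced exactly once, at the moment the later of its two tuples arrives. The linear scans over r and s are where indices would help in practice.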
...
13
...
In the process of generating the internal form
of the query, the parser checks the syntax of the user’s query, verifies that the
relation names appearing in the query are names of relations in the database,
and so on
...
• Given a query, there are generally a variety of methods for computing the
answer
...
Chapter 14 covers query optimization
...
We can handle complex
selections by computing unions and intersections of the results of simple selections
...
• Queries involving a natural join may be processed in several ways, depending
on the availability of indices and the form of physical storage for the relations
...
If indices are available, the indexed nested-loop join can be used
...
It may be advantageous to sort a relation prior to join computation (so as to allow use of
the merge join strategy)
...
The partitioning is
carried out with a hash function on the join attributes, so that corresponding pairs of partitions can be joined independently
...
• Outer join operations can be implemented by simple extensions of join algorithms
...
Data Storage and
Querying
© The McGraw−Hill
Companies, 2001
13
...
• An expression can be evaluated by means of materialization, where the system computes the result of each subexpression and stores it on disk, and then
uses it to compute the result of the parent expression
...
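As a rough illustration of the hash-partitioning idea in the summary above, the following Python sketch partitions two small in-memory relations on the join attribute and then joins corresponding partitions independently. The function names, the dictionary-based tuples, the sample data, and the choice of three partitions are all assumptions made for this sketch; a real system would write partitions to disk and choose their number from the available memory.

```python
from collections import defaultdict

def partition(relation, attr, n_parts):
    """Split a relation (list of dicts) into n_parts buckets by hashing attr."""
    parts = [[] for _ in range(n_parts)]
    for t in relation:
        parts[hash(t[attr]) % n_parts].append(t)
    return parts

def partitioned_hash_join(r, s, attr, n_parts=3):
    """Equi-join r and s on attr; only corresponding partitions are compared."""
    result = []
    r_parts = partition(r, attr, n_parts)
    s_parts = partition(s, attr, n_parts)
    for r_part, s_part in zip(r_parts, s_parts):
        # Build phase: in-memory hash index on this partition of r.
        index = defaultdict(list)
        for t in r_part:
            index[t[attr]].append(t)
        # Probe phase: look up each tuple of the matching partition of s.
        for u in s_part:
            for t in index[u[attr]]:
                result.append({**t, **u})
    return result

# Illustrative data only.
branch = [{"branch_name": "Downtown", "city": "Brooklyn"},
          {"branch_name": "Perryridge", "city": "Horseneck"}]
account = [{"account_number": "A-101", "branch_name": "Downtown"},
           {"account_number": "A-102", "branch_name": "Perryridge"},
           {"account_number": "A-201", "branch_name": "Downtown"}]
joined = partitioned_hash_join(branch, account, "branch_name")
```

Because tuples with equal join-attribute values always hash to the same partition number, no matching pair is missed even though partitions are joined one pair at a time.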
Review Terms
• Query processing
• Evaluation primitive
• Query-execution plan
• Query-evaluation plan
• Query-execution engine
• Measures of query cost
• Sequential I/O
• Random I/O
• File scan
• Linear search
• Binary search
• Selections using indices
• Access paths
• Index scans
• Conjunctive selection
• Disjunctive selection
• Composite index
• Intersection of identifiers
• External sorting
• External sort–merge
• Runs
• N-way merge
• Equi-join
• Nested-loop join
• Block nested-loop join
• Indexed nested-loop join
• Merge join
• Sort–merge join
• Hybrid merge–join
• Hash join
• Build
• Probe
• Build input
• Probe input
• Recursive partitioning
• Hash-table overflow
• Skew
• Fudge factor
• Overflow resolution
• Overflow avoidance
• Hybrid hash–join
• Operator tree
• Materialized evaluation
• Double buffering
• Pipelined evaluation
• Demand-driven pipeline (lazy, pulling)
• Producer-driven pipeline (eager, pushing)
• Iterator
• Pipelined join
Exercises
13
...
13
...
branch-name
from branch T, branch S
where T
...
assets and S
...
Justify your choice
...
3 What are the advantages and disadvantages of hash indices relative to B+-tree
indices? How might the type of index available influence the choice of a query-processing strategy?
13
...
Show the runs created on each pass of
the sort-merge algorithm, when applied to sort the following tuples on the first
attribute: (kangaroo, 17), (wallaby, 21), (emu, 1), (wombat, 13), (platypus, 3),
(lion, 8), (warthog, 4), (zebra, 11), (meerkat, 6), (hyena, 9), (hornbill, 2), (baboon,
12)
...
5 Let relations r1 (A, B, C) and r2 (C, D, E) have the following properties: r1 has
20,000 tuples, r2 has 45,000 tuples, 25 tuples of r1 fit on one block, and 30 tuples
of r2 fit on one block
...
b
...
d
...
6 Design a variant of the hybrid merge–join algorithm for the case where both
relations are not physically sorted, but both have a sorted secondary index on
the join attributes
...
7 The indexed nested-loop join algorithm described in Section 13
...
3 can be inefficient if the index is a secondary index, and there are multiple tuples with the
same value for the join attributes
...
Under what
conditions would this algorithm be more efficient than hybrid merge–join?
13
...
6 for r1 ⋈ r2, where r1 and r2 are as defined in Exercise 13
...
13
...
Assuming infinite memory, what is the lowest cost way (in terms of I/O
operations) to compute r ⋈ s? What is the amount of memory required for this
algorithm?
13
...
List different ways to handle the following
selections that involve negation:
a
...
σ¬(branch-city = “Brooklyn”) (branch)
c
...
11 The hash join algorithm as described in Section 13
...
5 computes the natural join
of two relations
...
(Hint: Keep extra information with each tuple in the hash index, to detect
whether any tuple in the probe relation matches the tuple in the hash index
...
13
...
Use the standard iterator functions in
your pseudocode
...
13
...
Bibliographical Notes
A query processor must parse statements in the query language, and must translate
them into an internal form
...
Most compiler texts, such as Aho et al
...
Knuth [1973] presents an excellent description of external sorting algorithms,
including an optimization that can create initial runs that are (on the average) twice
the size of memory
...
These studies, which were related to the development of System R, determined that either the
nested-loop join or merge join nearly always provided the optimal join method (Blasgen and Eswaran [1976]); hence, these two were the only join algorithms implemented in System R
...
Today, hash joins are considered to be highly efficient
...
Hash
join techniques are described in Kitsuregawa et al
...
Zeller and Gray [1990] and Davison
and Graefe [1994] describe hash join techniques that can adapt to the available memory,
which is important in systems where multiple queries may be running at the
same time
...
[1998] describes the use of hash joins and hash teams, which
allow pipelining of hash joins by using the same partitioning for all hash joins in a
pipeline sequence, in the Microsoft SQL Server
...
An earlier survey of query-processing techniques appears in Jarke and Koch [1984]
...
[1984] and
Whang and Krishnamurthy [1990]
...
Chapter 14
Query Optimization
Query optimization is the process of selecting the most efficient query-evaluation
plan from among the many strategies usually possible for processing a given query,
especially if the query is complex
...
Rather, we expect the system to construct a
query-evaluation plan that minimizes the cost of query evaluation
...
One aspect of optimization occurs at the relational-algebra level, where the system
attempts to find an expression that is equivalent to the given expression, but more
efficient to execute
...
The difference in cost (in terms of evaluation time) between a good strategy and a
bad strategy is often substantial, and may be several orders of magnitude
...
14
...
”
Πcustomer-name (σbranch-city = “Brooklyn” (branch ⋈ (account ⋈ depositor)))
This expression constructs a large intermediate relation, branch ⋈ account ⋈ depositor
...
Since we are concerned with only those tuples in the branch relation that pertain to
branches located in Brooklyn, we do not need to consider those tuples that do not
have branch-city = “Brooklyn”
Figure 14 ... (a) Initial expression tree [tree diagram over Π customer-name, σ branch-city = Brooklyn, branch, account, depositor]
...
Our query
is now represented by the relational-algebra expression
Πcustomer-name ((σbranch-city = “Brooklyn” (branch)) ⋈ (account ⋈ depositor))
which is equivalent to our original algebra expression, but which generates smaller
intermediate relations
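The effect of this transformation can be sketched on toy data. The Python fragment below evaluates both forms of the query over small in-memory relations and compares the results and the sizes of the intermediate joins. The helper names (`natural_join`, `select`, `project`), the underscore attribute names, and the sample tuples are assumptions made for this sketch, not part of the text.

```python
def natural_join(r, s):
    """Natural join of two relations given as lists of dicts."""
    out = []
    for t in r:
        for u in s:
            common = set(t) & set(u)
            if all(t[a] == u[a] for a in common):
                out.append({**t, **u})
    return out

def select(pred, r):
    """Selection: keep the tuples that satisfy pred."""
    return [t for t in r if pred(t)]

def project(attrs, r):
    """Projection with duplicate elimination; returns a set of value tuples."""
    return {tuple(t[a] for a in attrs) for t in r}

# Illustrative data only.
branch = [{"branch_name": "Downtown", "branch_city": "Brooklyn"},
          {"branch_name": "Brighton", "branch_city": "Brooklyn"},
          {"branch_name": "Perryridge", "branch_city": "Horseneck"}]
account = [{"account_number": "A-101", "branch_name": "Downtown"},
           {"account_number": "A-102", "branch_name": "Perryridge"},
           {"account_number": "A-201", "branch_name": "Brighton"}]
depositor = [{"customer_name": "Johnson", "account_number": "A-101"},
             {"customer_name": "Hayes", "account_number": "A-102"},
             {"customer_name": "Jones", "account_number": "A-201"}]

def in_brooklyn(t):
    return t["branch_city"] == "Brooklyn"

# Original form: join everything, then select.
unfiltered_join = natural_join(branch, natural_join(account, depositor))
original = project(["customer_name"], select(in_brooklyn, unfiltered_join))

# Transformed form: select on branch first, then join.
filtered_join = natural_join(select(in_brooklyn, branch),
                             natural_join(account, depositor))
transformed = project(["customer_name"], filtered_join)
```

On this data both forms return the same customer names, while the join that follows the early selection holds fewer tuples than the unfiltered one.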
...
1 depicts the initial and transformed expressions
...
To choose among different query-evaluation plans, the optimizer has to estimate
the cost of each evaluation plan
...
Instead, optimizers make
use of statistical information about the relations, such as relation sizes and index
depths, to make a good estimate of the cost of a plan
...
In Section 14
...
Using these statistics with the cost formulae in Chapter 13 allows
us to estimate the costs of individual operations
...
7
...
Generation of query-evaluation plans involves two steps: (1) generating expressions that are logically equivalent to the given expression and (2) annotating the resultant expressions in alternative ways to generate alternative
query-evaluation plans
...
14
...
It does so by means of equivalence rules that specify how
to transform an expression into a logically equivalent one
...
3
...
In Section 14
...
We can choose one based on the estimated cost of the plans
...
Such optimization, called cost-based optimization, is described in Section 14
...
2
...
In Section 14
...
14
...
Given
an expression such as a ⋈ (b ⋈ c), to estimate the cost of joining a with (b ⋈ c), we
need to have estimates of statistics such as the size of b ⋈ c
...
One thing that will become clear later in this section is that the estimates are not
very accurate, since they are based on assumptions that may not hold exactly
...
However, real-world experience
has shown that even if estimates are not precise, the plans with the lowest estimated
costs usually have actual execution costs that are either the lowest actual execution
costs, or are close to the lowest actual execution costs
...
2
...
• b_r, the number of blocks containing tuples of relation r
...
• f_r, the blocking factor of relation r, that is, the number of tuples of relation r
that fit into one block
...
This value is the same as the size of ΠA (r)
...
The last statistic, V (A, r), can also be maintained for sets of attributes, if desired,
instead of just for individual attributes
...
If we assume that the tuples of relation r are stored together physically in a file,
the following equation holds:
\[ b_r = \left\lceil \frac{n_r}{f_r} \right\rceil \]
Statistics about indices, such as the heights of B+-tree indices and number of leaf
pages in the indices, are also maintained in the catalog
...
This update incurs a substantial amount of overhead
...
Instead, they update the statistics during periods of light system load
...
However, if not too many updates occur in the intervals between the updates of
the statistics, the statistics will be sufficiently accurate to provide a good estimation
of the relative costs of the different plans
...
Real-world optimizers often
maintain further statistical information to improve the accuracy of their cost estimates of evaluation plans
...
As an example of a histogram, the range of values for an attribute age of a relation person could be divided
into 0–9, 10–19,
...
With each range we
store a count of the number of person tuples whose age values lie in that range
...
14
...
2 Selection Size Estimation
The size estimate of the result of a selection operation depends on the selection predicate
...
• σA=a (r): If we assume uniform distribution of values (that is, each value appears with equal probability), the selection result can be estimated to have
n_r / V(A, r) tuples, assuming that the value a appears in attribute A of some
record of r
...
However,
it is often not realistic to assume that each value appears with equal probability
...
There is one tuple in the account relation for
each account
...
Therefore, certain branch-name values appear
with greater probability than do others
...
distribution assumption is often not correct, it is a reasonable approximation
of reality in many cases, and it helps us to keep our presentation relatively
simple
...
If the actual value used
in the comparison (v) is available at the time of cost estimation, a more accurate estimate can be made
...
Assuming that values
are uniformly distributed, we can estimate the number of records that will
satisfy the condition A ≤ v as 0 if v < min(A, r), as n_r if v ≥ max(A, r), and
\[ n_r \cdot \frac{v - \min(A, r)}{\max(A, r) - \min(A, r)} \]
otherwise
...
In such cases, we
will assume that approximately one-half the records will satisfy the comparison condition
...
• Complex selections:
Conjunction: A conjunctive selection is a selection of the form
σθ1 ∧θ2 ∧···∧θn (r)
We can estimate the result size of such a selection: For each θi , we estimate the size of the selection σθi (r), denoted by si , as described previously
...
The preceding probability is called the selectivity of the selection σθi (r)
...
Thus, we estimate the number of tuples in the full selection
as
\[ n_r \ast \frac{s_1 \ast s_2 \ast \cdots \ast s_n}{n_r^{n}} \]
Disjunction: A disjunctive selection is a selection of the form
σθ1 ∨θ2 ∨···∨θn (r)
A disjunctive condition is satisfied by the union of all records satisfying
the individual, simple conditions θi
...
The probability that the tuple will satisfy the disjunction is then 1
minus the probability that it will satisfy none of the conditions:
\[ 1 - \left(1 - \frac{s_1}{n_r}\right) \ast \left(1 - \frac{s_2}{n_r}\right) \ast \cdots \ast \left(1 - \frac{s_n}{n_r}\right) \]
Multiplying this value by nr gives us the estimated number of tuples that
satisfy the selection
...
14
...
We already know how to estimate
the number of tuples in σθ (r)
...
We can account for nulls by estimating the number of tuples for which
the condition θ would evaluate to unknown, and subtracting that number
from the above estimate ignoring nulls
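The selection-size estimates in this section can be collected into a few small functions. The Python sketch below follows the formulas as stated above, under the same uniformity and independence assumptions; the function names and the sample statistics are hypothetical, and null handling is left out.

```python
def est_equality(n_r, v_a):
    """sigma_{A=a}(r): n_r / V(A, r), assuming uniformly distributed values."""
    return n_r / v_a

def est_at_most(n_r, v, min_a, max_a):
    """sigma_{A<=v}(r): the three cases given in the text."""
    if v < min_a:
        return 0
    if v >= max_a:
        return n_r
    return n_r * (v - min_a) / (max_a - min_a)

def est_conjunction(n_r, sizes):
    """n_r * (s_1 * ... * s_n) / n_r^n, assuming independent conditions."""
    prob = 1.0
    for s in sizes:
        prob *= s / n_r          # selectivity of one condition
    return n_r * prob

def est_disjunction(n_r, sizes):
    """n_r * (1 - (1 - s_1/n_r) * ... * (1 - s_n/n_r))."""
    none = 1.0
    for s in sizes:
        none *= 1 - s / n_r      # probability that this condition fails
    return n_r * (1 - none)

def est_negation(n_r, size_theta):
    """sigma_{not theta}(r), ignoring nulls: n_r minus the estimate for theta."""
    return n_r - size_theta

# Hypothetical statistics: 10000 tuples, 50 distinct values, A ranging over 0..100.
n_r = 10000
s1 = est_equality(n_r, 50)
s2 = est_at_most(n_r, 25, 0, 100)
both = est_conjunction(n_r, [s1, s2])
either = est_disjunction(n_r, [s1, s2])
```

Computing the conjunction from per-condition selectivities rather than from raw counts keeps the intermediate values between 0 and 1, which avoids very large products when there are many conditions.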
...
14
...
3 Join Size Estimation
In this section, we see how to estimate the size of the result of a join
...
Each tuple of r × s occupies
lr + ls bytes, from which we can calculate the size of the Cartesian product
...
Let r(R) and s(S) be relations
...
• If R ∩ S is a key for R, then we know that a tuple of s will join with at most
one tuple from r
...
The case where R ∩ S is a key for S is symmetric
to the case just described
...
• The most difficult case is when R ∩ S is a key for neither R nor S
...
Consider a tuple t of r, and assume R ∩ S = {A}
...
Considering all the tuples in r, we estimate
that there are
\[ \frac{n_r \ast n_s}{V(A, s)} \]
tuples in r ⋈ s
...
These two estimates differ if V(A, r) ≠ V(A, s)
...
Thus, the lower of the two estimates is probably the more accurate one
...
attribute A in s
...
More important, the preceding estimate depends on the assumption that each value appears with equal probability
...
We can estimate the size of a theta join r ⋈θ s by rewriting the join as σθ (r × s),
and using the size estimates for Cartesian products along with the size estimates for
selections, which we saw in Section 14
...
2
...
• f_customer = 25, which implies that b_customer = 10000/25 = 400
...
• f_depositor = 50, which implies that b_depositor = 5000/50 = 100
...
Also assume that customer-name in depositor is a foreign key on customer
...
Let us now compute the size estimates for depositor ⋈ customer without using information about foreign keys
...
In this case, the lower
of these estimates is the same as that which we computed earlier from information
about foreign keys
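The two estimates can be reproduced with a short Python sketch. The relation sizes follow from the blocking factors above (10000 customer tuples and 5000 depositor tuples). The distinct-value counts are assumptions made for this sketch: V(customer-name, customer) is taken to be 10000 on the reasoning that a foreign key references a key of customer, and V(customer-name, depositor) = 2500 is a purely illustrative value. The function name is hypothetical.

```python
def join_size_estimates(n_r, n_s, v_a_r, v_a_s):
    """Two estimates of |r join s| on a common attribute A.

    est_from_s: each tuple of r is assumed to match n_s / V(A, s) tuples of s.
    est_from_r: the symmetric estimate, starting from the tuples of s.
    """
    est_from_s = n_r * n_s / v_a_s
    est_from_r = n_r * n_s / v_a_r
    return est_from_s, est_from_r

n_depositor, n_customer = 5000, 10000
v_depositor = 2500     # assumed value, for illustration only
v_customer = 10000     # assumed: customer-name is a key of customer

e1, e2 = join_size_estimates(n_depositor, n_customer, v_depositor, v_customer)
estimate = min(e1, e2)  # the lower estimate is taken as the more accurate one
```

Under these assumptions the lower estimate equals the number of depositor tuples, which agrees with the foreign-key reasoning that every depositor tuple joins with exactly one customer tuple.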
...
2
...
Projection: The estimated size (number of records or number of tuples) of a projection of the form ΠA (r) is V (A, r), since projection eliminates duplicates
...
Set operations: If the two inputs to a set operation are selections on the same relation, we can rewrite the set operation as disjunctions, conjunctions, or negations
...
Similarly, we
can rewrite intersections as conjunctions, and we can rewrite set difference by
using negation, so long as the two relations participating in the set operations
are selections on the same relation
...
2
...
If the inputs are not selections on the same relation, we estimate the sizes
this way: The estimated size of r ∪ s is the sum of the sizes of r and s
...
The estimated
size of r − s is the same size as r
...
Outer join: The estimated size of r ⟕ s is the size of r ⋈ s plus the size of r; that of
r ⟖ s is symmetric, while that of r ⟗ s is the size of r ⋈ s plus the sizes of
r and s
...
14
...
5 Estimation of Number of Distinct Values
For selections, the number of distinct values of an attribute (or set of attributes) A in
the result of a selection, V (A, σθ (r)), can be estimated in these ways:
• If the selection condition θ forces A to take on a specified value (e
...
, A = 3),
V (A, σθ (r)) = 1
...
g
...
• If the selection condition θ is of the form A op v, where op is a comparison
operator, V (A, σθ (r)) is estimated to be V (A, r) ∗ s, where s is the selectivity
of the selection
...
A more
accurate estimate can be derived for this case using probability theory, but the
above approximation works fairly well
...
• If A contains attributes A1 from r and A2 from s, then V(A, r ⋈ s) is estimated
as
min(V(A1, r) ∗ V(A2 − A1, s), V(A1 − A2, r) ∗ V(A2, s), n_{r ⋈ s})
Note that some attributes may be in A1 as well as in A2, and A1 − A2 and
A2−A1 denote, respectively, attributes in A that are only from r and attributes
14
...
Again, more accurate estimates can be derived by
using probability theory, but the above approximations work fairly well
...
The same holds for grouping attributes of aggregation
...
For min(A) and max(A), the number of distinct values can be estimated as min(V (A, r), V (G, r)), where G denotes the grouping attributes
...
14
...
As mentioned at the start of this chapter, a
query can be expressed in several different ways, with different costs of evaluation
...
Two relational-algebra expressions are said to be equivalent if, on every legal database instance, the two expressions generate the same set of tuples
...
) Note that the order of the tuples is irrelevant; the two expressions may
generate the tuples in different orders, but would be considered equivalent as long
as the set of tuples is the same
...
Two expressions in the multiset
version of the relational algebra are said to be equivalent if on every legal database
the two expressions generate the same multiset of tuples
...
We leave extensions to the multiset version of
the relational algebra to you as exercises
...
3
...
We can replace an expression of the first form by an expression of the second form, or vice
versa, since the two expressions would generate the same result on any
valid database
...
We now list a number of general equivalence rules on relational-algebra expressions
...
2
...
A relation name r is
simply a special case of a relational-algebra expression, and can be used wherever E
appears
...
14
...
2
[Diagram of equivalent expression trees over E1 and E2]
Pictorial representation of equivalences
...
Conjunctive selection operations can be deconstructed into a sequence of individual selections
...
σθ1 ∧θ2 (E) = σθ1 (σθ2 (E))
2
...
σθ1 (σθ2 (E)) = σθ2 (σθ1 (E))
3
...
This transformation can also be referred to as a
cascade of Π
...
(ΠLn (E))
...
Selections can be combined with Cartesian products and theta joins
...
σθ (E1 × E2) = E1 ⋈θ E2
This expression is just the definition of the theta join
...
σθ1 (E1 ⋈θ2 E2) = E1 ⋈θ1∧θ2 E2
5
...
E1 ⋈θ E2 = E2 ⋈θ E1
Actually, the order of attributes differs between the left-hand side and right-hand side, so the equivalence does not hold if the order of attributes is taken
into account
...
14
...
6
...
Natural-join operations are associative
...
Theta joins are associative in the following manner:
(E1 ⋈θ1 E2) ⋈θ2∧θ3 E3 = E1 ⋈θ1∧θ3 (E2 ⋈θ2 E3)
where θ2 involves attributes from only E2 and E3
...
The commutativity and associativity of join operations
are important for join reordering in query optimization
...
The selection operation distributes over the theta-join operation under the following two conditions:
a
...
σθ0 (E1 ⋈θ E2) = (σθ0 (E1)) ⋈θ E2
b
...
σθ1∧θ2 (E1 ⋈θ E2) = (σθ1 (E1)) ⋈θ (σθ2 (E2))
8
...
a
...
Suppose that the
join condition θ involves only attributes in L1 ∪ L2
...
Consider a join E1 ⋈θ E2
...
Let L3 be attributes of E1 that are involved in join
condition θ, but are not in L1 ∪ L2 , and let L4 be attributes of E2 that are
involved in join condition θ, but are not in L1 ∪ L2
...
The set operations union and intersection are commutative
...
10
...
(E1 ∪ E2 ) ∪ E3 = E1 ∪ (E2 ∪ E3 )
(E1 ∩ E2 ) ∩ E3 = E1 ∩ (E2 ∩ E3 )
11
...
σP (E1 − E2 ) = σP (E1 ) − σP (E2 )
Similarly, the preceding equivalence, with − replaced with either ∪ or ∩, also
holds
...
12
...
ΠL (E1 ∪ E2 ) = (ΠL (E1 )) ∪ (ΠL (E2 ))
This is only a partial list of equivalences
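A few of these rules can be checked mechanically on a small database instance. The Python sketch below represents relations as sets of tuples and tests rules 1, 2, 11, and 12 as numbered above. The sample relations, the predicates, and the helper names are assumptions made for illustration, and agreement on one instance is only a sanity check, not a proof of equivalence.

```python
def select(pred, rel):
    """Selection over a relation stored as a set of tuples."""
    return {t for t in rel if pred(t)}

def project(idxs, rel):
    """Projection onto the given column positions, with duplicate elimination."""
    return {tuple(t[i] for i in idxs) for t in rel}

# Illustrative relations with the same two-column schema.
E1 = {(1, "a"), (2, "b"), (3, "c"), (4, "d")}
E2 = {(2, "b"), (4, "d"), (5, "e")}

def p1(t):
    return t[0] > 1

def p2(t):
    return t[0] % 2 == 0

# Rule 1: a conjunctive selection equals a cascade of selections.
rule1 = select(lambda t: p1(t) and p2(t), E1) == select(p1, select(p2, E1))
# Rule 2: selections are commutative.
rule2 = select(p1, select(p2, E1)) == select(p2, select(p1, E1))
# Rule 11: selection distributes over set difference.
rule11 = select(p1, E1 - E2) == select(p1, E1) - select(p1, E2)
# Rule 12: projection distributes over union.
rule12 = project([1], E1 | E2) == project([1], E1) | project([1], E2)
```

An optimizer does not test rules this way at run time; the point of the sketch is only that each side of a rule is an ordinary expression that can be evaluated and compared.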
...
14
...
2 Examples of Transformations
We now illustrate the use of the equivalence rules
...
In our example in Section 14
...
We can carry out this transformation by using rule 7
...
Remember
that the rule merely says that the two expressions are equivalent; it does not say that
one is better than the other
...
As an illustration, suppose that we modify our original query to restrict
attention to customers who have a balance over $1000
...
However, we can first
14
...
a (associativity of natural join) to transform the join branch ⋈ (account ⋈
depositor) into (branch ⋈ account) ⋈ depositor:
Πcustomer-name (σbranch-city = “Brooklyn” ∧ balance > 1000 ((branch ⋈ account) ⋈ depositor))
Then, using rule 7
...
Using rule 1, we
can break the selection into two selections, to get the following subexpression:
σbranch-city = “Brooklyn” (σbalance > 1000 (branch ⋈ account))
Both of the preceding expressions select tuples with branch-city = “Brooklyn” and
balance > 1000
...
3 depicts the initial expression and the final expression after all these
transformations
...
b to get the final expression
directly, without using rule 1 to break the selection into two selections
...
b
can itself be derived from rules 1 and 7
...
The preceding example illustrates that the set of equivalence rules in Section 14
...
1 is not minimal
...
Query optimizers therefore use minimal sets of equivalence rules
...
3
[Tree diagram with nodes branch, σ balance < 1000, and account]
(b) Tree after multiple transformations
Multiple transformations
...
14
...
a and 8
...
The only attributes that we must retain are those
that either appear in the result of the query or are needed to process subsequent
operations
...
Thus, we reduce the size of the intermediate result
...
Therefore, we can modify the expression to
Πcustomer-name (
(Πaccount-number ((σbranch-city = “Brooklyn” (branch)) ⋈ account)) ⋈ depositor)
The projection Πaccount-number reduces the size of the intermediate join results
...
3
...
As mentioned in Chapter 3 and in equivalence rule 6
...
Thus, for all relations r1 , r2 , and r3 ,
(r1 ⋈ r2) ⋈ r3 = r1 ⋈ (r2 ⋈ r3)
Although these expressions are equivalent, the costs of computing them may differ
...
In contrast,
σbranch-city = “Brooklyn” (branch) ⋈ account
is probably a small relation
...
Thus, the preceding expression results in one tuple for each account held by a resident of Brooklyn
...
14
...
We do not care about
the order in which attributes appear in a join, since it is easy to change the order
before displaying the result
...
Using the associativity and commutativity of the natural join (rules 5 and 6), we
can consider rewriting our relational-algebra expression as
Πcustomer-name (((σbranch-city = “Brooklyn” (branch)) ⋈ depositor) ⋈ account)
That is, we could compute
(σbranch-city = “Brooklyn” (branch)) ⋈ depositor
first, and, after that, join the result with account
...
If there are b branches in Brooklyn and d tuples in the depositor
relation, this Cartesian product generates b ∗ d tuples, one for every possible pair of
depositor tuple and branches (without regard for whether the account in depositor is
maintained at the branch)
...
As a result, we would reject this strategy
...
14
...
4 Enumeration of Equivalent Expressions
Query optimizers use equivalence rules to systematically generate expressions equivalent to the given query expression
...
Given an expression, if any subexpression matches one side of an equivalence rule,
the optimizer generates a new expression where the subexpression is transformed to
match the other side of the rule
...
The preceding process is costly both in space and in time
...
Expression-representation techniques that allow
both expressions to point to shared subexpressions can reduce the space requirement
significantly, and many query optimizers use them
...
If an optimizer takes cost estimates of evaluation
into account, it may be able to avoid examining some of the expressions, as we shall
see in Section 14
...
We can reduce the time required for optimization by using techniques such as these
...
14
...
4 Choice of Evaluation Plans
Generation of expressions is only part of the query-optimization process, since each
operation in the expression can be implemented with different algorithms
...
Figure 14
...
3
...
Further, decisions about pipelining
have to be made
...
They would do so if the
indices on branch and account store records with equal values for the index attributes
sorted by branch-name
...
4
...
We can choose any ordering
of the operations that ensures that operations lower in the tree are executed before
operations higher in the tree
...
Although a merge join at a given level may be costlier than
a hash join, it may provide a sorted output that makes evaluating a later operation
(such as duplicate elimination, intersection, or another merge join) cheaper
...
4
An evaluation plan
...
performing the join
...
Thus, in addition to considering alternative expressions for a query, we must also
consider alternative algorithms for each operation in an expression
...
We can use
these rules to generate all the query-evaluation plans for a given expression
...
2 coupled with cost estimates for various algorithms
and evaluation methods described in Chapter 13
...
There are two broad approaches: The
first searches all the plans, and chooses the best plan in a cost-based fashion
...
We discuss these approaches next
...
14
...
2 Cost-Based Optimization
A cost-based optimizer generates a range of query-evaluation plans from the given
query by using the equivalence rules, and chooses the one with the least cost
...
As an illustration, consider the expression
r1 ⋈ r2 ⋈ ··· ⋈ rn
where the joins are expressed without any ordering
...
(We
leave the computation of this expression for you to do in Exercise 14
...
) For joins
involving small numbers of relations, this number is acceptable; for example, with
n = 5, the number is 1680
...
With
n = 7, the number is 665280; with n = 10, the number is greater than 17
...
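The growth of these counts can be checked with a short Python sketch. The closed form used here, (2(n − 1))! / (n − 1)!, is the standard count of complete join orders; it is supplied as an assumption because the formula itself does not appear in the text above, although it does reproduce the values quoted for n = 5 and n = 7. The function name is hypothetical.

```python
from math import factorial

def join_order_count(n):
    """Number of complete join orders for n relations: (2(n-1))! / (n-1)!."""
    return factorial(2 * (n - 1)) // factorial(n - 1)

counts = {n: join_order_count(n) for n in (3, 5, 7, 10)}
```

For three relations the function returns 12, which is consistent with the 12 orderings of r1, r2, and r3 used in the example that follows.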
For example, suppose we want to find the best join order of the form
(r1 ⋈ r2 ⋈ r3) ⋈ r4 ⋈ r5
which represents all join orders where r1 , r2 , and r3 are joined first (in some order),
and the result is joined (in some order) with r4 and r5
...
Thus, there appear to be 144 join orders to examine
...
Thus, instead of 144 choices to examine, we need to examine only
12 + 12 choices
...
14
...
cost = ∞)
return bestplan[S]
// else bestplan[S] has not been computed earlier, compute it now
for each non-empty subset S1 of S such that S1 ≠ S
P1 = findbestplan(S1)
P2 = findbestplan(S − S1)
A = best algorithm for joining results of P1 and P2
cost = P 1
...
cost + cost of A
if cost < bestplan[S]
...
cost = cost
bestplan[S]
...
plan; execute P 2
...
5
Dynamic programming algorithm for join order optimization
...
Dynamic programming algorithms store results of computations
and reuse them, a procedure that can reduce execution time greatly
...
5
...
Each element of the associative array contains two components: the cost of the best plan of S, and the plan itself
...
cost is assumed to be initialized to ∞ if bestplan[S] has not yet
been computed
...
Otherwise, the procedure tries
every way of dividing S into two disjoint subsets
...
The procedure picks the cheapest plan
from among all the alternatives for dividing S into two sets
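The procedure described above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the algorithm of the figure verbatim: memoization through `functools.lru_cache` plays the role of the bestplan array, a single relation is assumed to cost nothing to access, the join cost is a toy product of input sizes instead of a choice among real join algorithms, and the relation sizes and the fixed selectivity are hypothetical.

```python
from functools import lru_cache
from itertools import combinations

# Hypothetical relation sizes (in tuples) and a toy cost model.
SIZES = {"r1": 1000, "r2": 50, "r3": 200, "r4": 10}
SELECTIVITY = 0.01  # assumed fraction of the cross product kept by each join

def result_size(rels):
    """Estimated size of joining all relations in rels (toy model)."""
    size = 1.0
    for name in rels:
        size *= SIZES[name]
    return size * SELECTIVITY ** (len(rels) - 1)

@lru_cache(maxsize=None)
def find_best_plan(rels):
    """rels is a frozenset of relation names; returns (cost, plan)."""
    if len(rels) == 1:
        (name,) = rels
        return 0.0, name  # assumed: accessing a stored relation is free
    best = (float("inf"), None)
    members = sorted(rels)
    # Try every way of dividing rels into two non-empty disjoint subsets.
    for k in range(1, len(members)):
        for left in combinations(members, k):
            s1 = frozenset(left)
            s2 = rels - s1
            c1, p1 = find_best_plan(s1)
            c2, p2 = find_best_plan(s2)
            cost = c1 + c2 + result_size(s1) * result_size(s2)  # toy join cost
            if cost < best[0]:
                best = (cost, (p1, p2))
    return best

def relations_in(plan):
    """Flatten a nested plan tuple into the list of relation names it joins."""
    if isinstance(plan, str):
        return [plan]
    return relations_in(plan[0]) + relations_in(plan[1])

best_cost, best_plan = find_best_plan(frozenset(SIZES))
```

The returned plan is a nested tuple such as `(("r2", "r4"), ...)`, read as a join tree. Because each subset is solved once and cached, the plan for a subset like {r2, r4} is reused by every larger subset that contains it.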
...
The time
complexity of the procedure can be shown to be O(3^n) (see Exercise 14
...
Actually, the order in which tuples are generated by the join of a set of relations
is also important for finding the best overall join order, since it can affect the cost of
further joins (for instance, if merge join is used)
...
For
instance, generating the result of r1 ⋈ r2 ⋈ r3 sorted on the attributes common with
r4 or r5 may be useful, but generating it sorted on the attributes common to only r1
and r2 is not useful
...
Hence, it is not sufficient to find the best join order for each subset of the set of
n given relations
...
each interesting sort order of the join result for that subset
...
The number of interesting sort orders is generally not large
...
The dynamic-programming algorithm
for finding the best join order can be easily extended to handle sort orders
...
With n = 10, this number is around 59000, which is much better than the 17
...
More important, the storage required is much less than
before, since we need to store only one join order for each interesting sort order of
each of 1024 subsets of r1 ,
...
Although both numbers still increase rapidly with
n, commonly occurring joins usually have fewer than 10 relations, and can be handled
easily
...
For instance, when examining the plans for an expression, we
can terminate after we examine only a part of the expression, if we determine that
the cheapest plan for that part is already costlier than the cheapest evaluation plan
for a full expression examined earlier
...
Then, no full expression involving that
subexpression needs to be examined
...
Then, only a few competing plans will require a full
analysis of cost
...
14
...
3 Heuristic Optimization
A drawback of cost-based optimization is the cost of optimization itself
...
Hence, many systems use heuristics to reduce the number
of choices that must be made in a cost-based fashion
...
An example of a heuristic rule is the following rule for transforming relational-algebra queries:
• Perform selection operations as early as possible
...
In the first transformation example in Section 14
...
We say that the preceding rule is a heuristic because it usually, but not always,
helps to reduce the cost
...
The selection can certainly be performed before the join
...
14
...
Performing the selection early — that is, directly on s — would require doing a
scan of all tuples in s
...
The projection operation, like the selection operation, reduces the size of relations
...
This advantage suggests a companion to the “perform selections early” heuristic:
• Perform projections early
...
An example similar to the one used for the selection heuristic
should convince you that this heuristic does not always reduce the cost
...
3
...
We now present an overview of the steps in a typical heuristic optimization algorithm
...
3
1
...
This step, based on equivalence rule 1, facilitates moving selection operations down the query tree
...
Move selection operations down the query tree for the earliest possible execution
...
a, 7
...
For instance, this step transforms σθ (r ⋈ s) into either σθ (r) ⋈ s or r ⋈ σθ (s)
whenever possible
...
The degree of reordering permitted for a particular selection is determined by the attributes
involved in that selection condition
...
Determine which selection operations and join operations will produce the
smallest relations — that is, will produce the relations with the least number
of tuples
...
This step considers the selectivity of a selection or join condition
...
This step relies on the associativity
of binary operations given in equivalence rule 6
...
Replace with join operations those Cartesian product operations that are followed by a selection condition (rule 4
...
The Cartesian product operation is
often expensive to implement since r1 × r2 includes a record for each combination of records from r1 and r2
...
14
...
Deconstruct and move as far down the tree as possible lists of projection attributes, creating new projections where needed
...
a, 8
...
6
...
In summary, the heuristics listed here reorder an initial query-tree representation
in such a way that the operations that reduce the size of intermediate results are applied first; early selection reduces the number of tuples, and early projection reduces
the number of attributes
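
The effect of early selection can be seen in a small sketch. It assumes two toy relations, r(A, B) and s(B, C), stored as lists of dictionaries, and a selection condition θ that involves only attributes of r; the helper names `select` and `natural_join` are ours and are only illustrative.

```python
def select(rel, pred):
    """Selection: keep the tuples of rel that satisfy pred."""
    return [t for t in rel if pred(t)]


def natural_join(r, s):
    """Nested-loop natural join on the attributes the two tuples share."""
    out = []
    for t in r:
        for u in s:
            shared = t.keys() & u.keys()
            if all(t[a] == u[a] for a in shared):
                out.append({**t, **u})
    return out


# Assumed toy data: 100 tuples in r, 10 tuples in s.
r = [{"A": i, "B": i % 10} for i in range(100)]
s = [{"B": b, "C": b * b} for b in range(10)]
theta = lambda t: t["A"] < 5  # uses only attributes of r

joined_first = natural_join(r, s)          # intermediate result: one tuple per r tuple
late = select(joined_first, theta)         # selection applied after the join
selected_first = select(r, theta)          # intermediate result: only the matching r tuples
early = natural_join(selected_first, s)    # join over the reduced input

print(len(joined_first), len(selected_first), late == early)
```

Both orders produce the same answer, but the second builds a much smaller intermediate result. As the text notes, this is a heuristic: it usually helps, but not always.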
...
Heuristic optimization further maps the heuristically transformed query expression into alternative sequences of operations to produce a set of candidate evaluation plans
...
The access-plan–selection phase of a heuristic optimizer chooses the most efficient strategy for each
operation
...
4
...
For
example, certain query optimizers, such as the System R optimizer, do not consider
all join orders, but rather restrict the search to particular kinds of join orders
...
, rn
...
Left-deep join orders are particularly convenient for pipelined evaluation,
since the right operand is a stored relation, and thus only one input to each join is
pipelined
...
6 illustrates the difference between left-deep join trees and non-left-deep
join trees
...
With the use of dynamic programming
optimizations, the System R optimizer can find the best join order in time O(n 2^n)
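
A minimal sketch of dynamic programming over left-deep join orders follows. It is not the System R cost model: we assume a toy cost (the sum of the estimated sizes of all intermediate results) and made-up pairwise join selectivities, and the names `best_left_deep_order`, `sizes`, and `selectivity` are hypothetical. The table `best` holds one entry per subset of relations, and each entry is filled by trying each relation as the last one joined, which gives the O(n 2^n) behavior when the size estimate is treated as a constant-time lookup.

```python
from itertools import combinations


def best_left_deep_order(sizes, selectivity):
    """Return (cost, order) for the cheapest left-deep join order.

    sizes: dict mapping relation name -> number of tuples.
    selectivity: dict mapping frozenset({a, b}) -> assumed join selectivity;
                 a missing pair is treated as a Cartesian product (1.0).
    """
    names = sorted(sizes)

    def result_size(subset):
        # Assumed estimate: product of sizes times all applicable selectivities.
        size = 1.0
        for name in subset:
            size *= sizes[name]
        for a, b in combinations(sorted(subset), 2):
            size *= selectivity.get(frozenset((a, b)), 1.0)
        return size

    # best[S] = (cost, order) of the cheapest left-deep plan joining exactly S.
    best = {frozenset([n]): (0.0, [n]) for n in names}
    for k in range(2, len(names) + 1):
        for combo in combinations(names, k):
            subset = frozenset(combo)
            size = result_size(subset)
            candidates = []
            for last in combo:  # 'last' is the stored relation joined last
                cost_rest, order_rest = best[subset - {last}]
                candidates.append((cost_rest + size, order_rest + [last]))
            best[subset] = min(candidates, key=lambda c: c[0])
    return best[frozenset(names)]


# Illustrative statistics only.
sizes = {"r1": 1000, "r2": 1500, "r3": 750}
selectivity = {
    frozenset(("r1", "r2")): 1 / 1100,
    frozenset(("r2", "r3")): 1 / 100,
}
cost, order = best_left_deep_order(sizes, selectivity)
print(order, round(cost, 1))
```

A real optimizer would also track interesting sort orders and choose a join method per join; the sketch keeps only the subset-by-subset structure of the search.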
...
The System R optimizer uses heuristics to push selections and projections down the
query tree
...
The estimate is likely to be accurate with small buffers; with large buffers, however, the page containing the tuple
may already be in the buffer
...
[Figure: join trees over relations r1, r2, r3, r4, and r5. Panel (a): left-deep join tree]
Figure 14
...
Query optimization approaches that integrate heuristic selection and the generation of alternative access plans have been adopted in several systems
...
The cost-based optimization techniques
described here are used for each block of the query separately
...
Each plan uses a left-deep join order,
starting with a different one of the n relations
...
Either nested-loop
or sort-merge join is chosen for each of the joins, depending on the available access
paths
...
The intricacies of SQL introduce a good deal of complexity into query optimizers
...
We briefly outline how to handle nested subqueries in Section 14
...
5
...
Even with the use of heuristics, cost-based query optimization imposes a substantial overhead on query processing
...
The difference in execution time between a good
plan and a bad one may be huge, making query optimization essential
...
Therefore, most commercial systems include relatively sophisticated optimizers
...
14
...
4
...
The parameters are the variables from the outer-level query that are used in the nested
subquery (these variables are called correlation variables)
...
select customer-name
from borrower
where exists (select *
from depositor
where depositor
...
customer-name)
Conceptually, the subquery can be viewed as a function that takes a parameter (here,
borrower
...
SQL evaluates the overall query (conceptually) by computing the Cartesian product of the relations in the outer from clause and then testing the predicates in the
where clause for each tuple in the product
...
This technique for evaluating a query with a nested subquery is called correlated
evaluation
...
A large number of random
disk I/O operations may result
...
Efficient join algorithms help avoid expensive random I/O
...
As an example of transforming a nested subquery into a join, the query in the
preceding example can be rewritten as
select customer-name
from borrower, depositor
where depositor
...
customer-name
(To properly reflect SQL semantics, the number of duplicate derivations should not
change because of the rewriting; the rewritten query can be modified to ensure this
property, as we will see shortly
...
In general, it may not be
possible to directly move the nested subquery relations into the from clause of the
outer query
...
For instance, a query of the
form
© The McGraw−Hill
Companies, 2001
select
...
from L1, t1
where P1 and P2^2
where P2^1 contains predicates in P2 without selections involving correlation variables,
and P2^2 reintroduces the selections involving correlation variables (with relations referenced in the predicate appropriately renamed)
...
In our example, the original query would have been transformed to
create table t1 as
select distinct customer-name
from depositor
select customer-name
from borrower, t1
where t1
...
customer-name
The query we rewrote to illustrate creation of a temporary relation can be obtained
by simplifying the above transformed query, assuming the number of duplicates of
each tuple does not matter
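
The difference between the three formulations can be checked with a small sketch that uses Python's built-in sqlite3 module. The schema, the sample rows, and the column name `customer_name` (a hyphen is not valid in an identifier here) are assumptions made for illustration, as is the equality predicate on customer name that links the subquery to the outer query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table borrower (customer_name text);
    create table depositor (customer_name text);
    insert into borrower values ('Hayes'), ('Jones'), ('Jones'), ('Smith');
    insert into depositor values ('Hayes'), ('Hayes'), ('Jones'), ('Lindsay');
""")


def names(sql):
    """Run a query and return its single column as a list."""
    return [row[0] for row in conn.execute(sql)]


# Original form: correlated subquery evaluated per borrower tuple.
correlated = names("""
    select customer_name from borrower
    where exists (select * from depositor
                  where depositor.customer_name = borrower.customer_name)
    order by customer_name
""")

# Direct rewrite as a join: may change the number of duplicates.
naive_join = names("""
    select borrower.customer_name from borrower, depositor
    where depositor.customer_name = borrower.customer_name
    order by borrower.customer_name
""")

# Decorrelated form: a temporary relation with duplicates removed, then a join.
conn.execute("create temp table t1 as select distinct customer_name from depositor")
decorrelated = names("""
    select borrower.customer_name from borrower, t1
    where t1.customer_name = borrower.customer_name
    order by borrower.customer_name
""")

print(correlated, naive_join, decorrelated)
```

With this sample data the plain join returns an extra copy of a customer who has two depositor tuples, while the version that joins with the duplicate-free temporary relation matches the correlated query.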
...
Decorrelation is more complicated when the nested subquery uses aggregation,
or when the result of the nested subquery is used to test for equality, or when the
condition linking the nested subquery to the outer query is not exists, and so on
...
Optimization of complex nested subqueries is a difficult task, as you can infer from
the above discussion, and many optimizers do only a limited amount of decorrelation
...
14
...
5 Materialized Views∗∗
When a view is defined, normally the database stores only the query defining the
view
...
Materialized views constitute redundant data, in that their contents can be
inferred from the view definition and the rest of the database contents
...
Materialized views are important for improving performance in some applications
...
Computing the view requires reading every loan tuple
pertaining to the branch, and summing up the loan amounts, which can be time-consuming
...
14
...
1 View Maintenance
A problem with materialized views is that they must be kept up-to-date when the
data used in the view definition changes
...
The task of keeping a materialized view up-to-date with
the underlying data is known as view maintenance
...
Another option for maintaining materialized views is to define triggers on insert,
delete, and update of each relation in the view definition
...
A simplistic way of doing so is to completely recompute the materialized view on every update
...
We describe how to perform incremental view maintenance in Section 14
...
2
...
Database system programmers no longer need to define triggers for view
maintenance
...
14
...
2 Incremental View Maintenance
To understand how to incrementally maintain materialized views, we start off by
considering individual operations, and then see how to handle a complete expression
...
To simplify our description, we replace updates to
a tuple by deletion of the tuple followed by insertion of the updated tuple
...
The changes (inserts and deletes) to a
relation or expression are referred to as its differential
...
5
...
1 Join Operation
Consider the materialized view v = r ⋈ s
...
If the old value of r is denoted by r^old, and the new value of r
by r^new, then r^new = r^old ∪ i_r
...
We can rewrite r^new ⋈ s as (r^old ∪ i_r) ⋈ s,
which we can again rewrite as (r^old ⋈ s) ∪ (i_r ⋈ s)
...
Inserts to s are handled in an exactly
symmetric fashion
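
The rewriting above can be checked on a small example. The sketch assumes set semantics, relations r(A, B) and s(B, C) represented as Python sets of tuples, and a hypothetical helper `join`; the deletion step uses the symmetric rule (remove the join of the deleted tuples with s), which we state here as an assumption.

```python
def join(r, s):
    """Natural join of r(A, B) and s(B, C), both given as sets of tuples."""
    return {(a, b, c) for (a, b) in r for (b2, c) in s if b == b2}


r_old = {(1, "x"), (2, "y")}
s = {("x", 10), ("y", 20), ("y", 30)}
v_old = join(r_old, s)                 # the materialized view before the update

# Insertion: only the new tuples i_r are joined with s.
i_r = {(3, "y"), (4, "z")}
v_new = v_old | join(i_r, s)

# Deletion (assumed symmetric rule): remove the join of the deleted tuples with s.
d_r = {(2, "y")}
v_after_delete = v_new - join(d_r, s)

# Compare against full recomputation.
print(v_new == join(r_old | i_r, s))
print(v_after_delete == join((r_old | i_r) - d_r, s))
```

The incremental version touches only the differential and s, so its cost depends on the size of the change rather than on the size of r.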
...
Using the
same reasoning as above, we get
Deletes on s are handled in an exactly symmetric fashion
...
5
...
2 Selection and Projection Operations
Consider a view v = σθ (r)
...
Consider a materialized
view v = ΠA (r)
...
Then, ΠA (r) has a single tuple (a)
...
The reason is
that the same tuple (a) is derived in two ways, and deleting one tuple from r removes
only one of the ways of deriving (a); the other is still present
...
14
...
Let t
...
We find (t
...
If the count becomes 0, (t
...
Handling insertions is relatively straightforward
...
If (t
...
If not, we add (t
...
14
...
2
...
The aggregate operations in SQL are count, sum, avg, min, and max:
• count: Consider a materialized view v = _A𝒢_count(B)(r), which computes the
count of the attribute B, after grouping r by attribute A
...
We look for the group t
...
If it is not present,
we add (t
...
If the group t
...
When a set of tuples d_r is deleted from r, for each tuple t in d_r we do the
following
...
A in the materialized view, and subtract 1
from the count for the group
...
A from the materialized view
...
When a set of tuples i_r is inserted into r, for each tuple t in i_r we do the following
...
A in the materialized view
...
A, t
...
A, t
...
If the group t
...
B to the aggregate value for the group, and add
1 to the count of the group
...
We look for the group t
...
B from the aggregate value for the group
...
A from the materialized view
...
• avg: Consider a materialized view v = _A𝒢_avg(B)(r)
...
Instead, to handle the case of avg, we maintain the sum and count aggregate values as described earlier, and compute the average as the sum divided
by the count
...
(The case of max is
exactly equivalent
...
Maintaining the aggregate values min and max on deletions may be more expensive
...
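
The bookkeeping for count, sum, and avg described above can be sketched as follows. The class name `GroupedAggregateView`, its methods, and the sample values are hypothetical; the sketch keeps a count and a sum per group, removes a group when its count reaches 0, and derives the average on demand. It does not cover min and max.

```python
class GroupedAggregateView:
    """Keeps, for each group value t.A, the count and the sum of t.B."""

    def __init__(self):
        self.groups = {}  # t.A -> [count, sum of t.B]

    def insert(self, a, b):
        # Create the group if it is not present, then add 1 to the count
        # and t.B to the aggregate value.
        entry = self.groups.setdefault(a, [0, 0])
        entry[0] += 1
        entry[1] += b

    def delete(self, a, b):
        # Subtract 1 from the count and t.B from the aggregate value;
        # drop the group when the count becomes 0.
        entry = self.groups[a]
        entry[0] -= 1
        entry[1] -= b
        if entry[0] == 0:
            del self.groups[a]

    def avg(self, a):
        # The average is never stored; it is the sum divided by the count.
        count, total = self.groups[a]
        return total / count


view = GroupedAggregateView()
for a, b in [("Perryridge", 1000), ("Perryridge", 500), ("Downtown", 300)]:
    view.insert(a, b)
avg_before = view.avg("Perryridge")
view.delete("Perryridge", 500)
avg_after = view.avg("Perryridge")
view.delete("Downtown", 300)
print(avg_before, avg_after, sorted(view.groups))
```

Keeping the count alongside the sum is what makes deletions safe: a sum of 0 does not by itself tell us that the group is empty.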
14
...
2
...
Given materialized view v =
r ∩ s, when a tuple is inserted in r we check if it is present in s, and if so we add
it to v
...
The other set operations, union and set difference, are handled in a similar fashion; we
leave details to you
...
In the case of deletion from r we have to handle tuples in s that no longer match any
tuple in r
...
Again we leave details to you
...
5
...
5 Handling Expressions
So far we have seen how to update incrementally the result of a single operation
...
For example, suppose we wish to incrementally update a materialized view E1 ⋈
E2 when a set of tuples i_r is inserted into relation r
...
Suppose the set of tuples to be inserted into E1 is given by expression D1
...
See the bibliographical notes for further details on incremental view maintenance
with expressions
...
5
...
However, materialized views offer further opportunities for optimization:
• Rewriting queries to use materialized views:
Suppose a materialized view v = r ⋈ s is available, and a user submits a
query r ⋈ s ⋈ t
...
Thus, it is the job of the
14
...
• Replacing a use of a materialized view by the view definition:
Suppose a materialized view v = r ⋈ s is available, but without any index
on it, and a user submits a query σA=10(v)
...
The best plan
for this query may be to replace v by r ⋈ s, which can lead to the query plan
σA=10(r) ⋈ s; the selection and join can be performed efficiently by using
the indices on r
...
B, respectively
...
The bibliographical notes give pointers to research showing how to efficiently perform query optimization with materialized views
...
One simple criterion would be to select a set
of materialized views that minimizes the overall execution time of the workload of
queries and updates, including the time taken to maintain the materialized views
...
Indices are just like materialized views, in that they too are derived data, can speed
up queries, and may slow down updates
...
We examine these issues in more detail in Sections 21
...
5 and 21
...
6
...
5, and the RedBrick Data
Warehouse from Informix, provide tools to help the database administrator with index and materialized view selection
...
14
...
It is the responsibility of the system to transform the query as entered
by the user into an equivalent query that can be computed more efficiently
...
• The evaluation of complex queries involves many accesses to disk
...
• The strategy that the database system chooses for evaluating an operation depends on the size of each relation and on the distribution of values within
columns
...
These statistics include:
  • The number of tuples in the relation r
  • The size of a record (tuple) of relation r in bytes
  • The number of distinct values that appear in the relation r for a particular
    attribute
• These statistics allow us to estimate the sizes of the results of various operations, as well as the cost of executing the operations
...
The presence of these structures has a significant influence on the choice of a query-processing strategy
...
The first step in selecting a query-processing strategy is to find a relational-algebra expression that is equivalent to the given expression and is estimated to cost less to execute
...
We use these rules to generate systematically
all expressions equivalent to the given query
...
Several
optimization techniques are available to reduce the number of alternative expressions and plans that need to be generated
...
Heuristic rules for transforming relational-algebra queries include “Perform selection operations as early as possible,”
“Perform projections early,” and “Avoid Cartesian products
...
Incremental
view maintenance is needed to efficiently update materialized views when
the underlying relations are modified
...
Other issues related to materialized views include how
to optimize queries by making use of available materialized views, and how
to select views to be materialized
...
Exercises
14
...
1 Clustering indices may allow faster access to data than a nonclustering index
affords
...
14
...
Assume that r1 has 1000 tuples, r2 has 1500 tuples,
and r3 has 750 tuples
...
14
...
2
...
Let V (C, r1 )
be 900, V (C, r2 ) be 1100, V (E, r2 ) be 50, and V (E, r3 ) be 100
...
Estimate the size of
r1 ⋈ r2 ⋈ r3, and give an efficient strategy for computing the join
...
4 Suppose that a B+ -tree index on branch-city is available on relation branch, and
that no other index is available
...
σ¬(branch-city<“Brooklyn”)(branch)
b
...
σ¬(branch-city<“Brooklyn” ∨ assets<5000)(branch)
14
...
What would be the best way to handle the following selection?
σ(branch-city<“Brooklyn”) ∧ (assets<5000)∧(branch-name=“Downtown”) (branch)
14
...
Explain how you can apply them
to improve the efficiency of certain queries:
a
...
b
...
c
...
14
...
3
...
a
...
σθ1∧θ2(E1 ⋈θ3 E2) = σθ1(E1 ⋈θ3 (σθ2(E2))), where θ2 involves only attributes from E2
14
...
a
...
σB<4(_A𝒢_max(B)(R)) and _A𝒢_max(B)(σB<4(R))
c
...
(R ⟕ S) ⟕ T and R ⟕ (S ⟕ T)
In other words, the natural left outer join is not associative
...
)
e
...
9 SQL allows relations with duplicates (Chapter 4)
...
Define versions of the basic relational-algebra operations σ, Π, ×, ⋈, −, ∪,
and ∩ that work on relations with duplicates, in a way consistent with SQL
...
Check which of the equivalence rules 1 through 7
...
14
...
Hint: A complete binary tree is one where every internal node has exactly
two children
...
(n−1)
If you wish, you can derive the formula for the number of complete binary
trees with n nodes from the formula for the number of binary trees with n
1
nodes
...
14
...
Assume that you can store and look up information about a set of relations (such
as the optimal join order for the set, and the cost of that join order) in constant
time
...
)
14
...
Assume
that there is only one interesting sort order
...
13 A set of equivalence rules is said to be complete if, whenever two expressions
are equivalent, one can be derived from the other by a sequence of uses of the
equivalence rules
...
3
...
14
...
Write a nested query on the relation account to find for each branch with
name starting with “B”, all accounts with the maximum balance at the
branch
...
Rewrite the preceding query, without using a nested subquery; in other
words, decorrelate the query
...
Give a procedure (similar to that described in Section 14
...
5) for decorrelating such queries
...
15 Describe how to incrementally maintain the results of the following operations,
on both insertions and deletions
...
Union and set difference
b
...
16 Give an example of an expression defining a materialized view and two situations (sets of statistics for the input relations and the differentials) such that
incremental view maintenance is better than recomputation in one situation,
and recomputation is better in the other situation
...
[1979] describes access-path selection in the System R optimizer, which was one of the earliest relational-query optimizers
...
Query processing in Starburst is described in Haas et al
...
Query optimization
in Oracle is briefly outlined in Oracle [1997]
...
[1996], and Ganguly et al
...
Nonuniform distributions of values cause problems for estimation of query size and
cost
...
Ioannidis and Christodoulakis [1993], Ioannidis and
Poosala [1995], and Poosala et al
...
Exhaustive searching of all query plans is impractical for optimization of joins
involving many relations, and techniques based on randomized searching, which do
not examine all alternatives, have been proposed
...
Parametric query-optimization techniques have been proposed by Ioannidis et al
...
14
...
A set of plans — one for each of several
different query selectivities— is computed, and is stored by the optimizer, at compile
time
...
Klug [1982] was an early work on optimization of relational-algebra expressions
with aggregate functions
...
Optimization of queries containing outer
joins is described in Rosenthal and Reiner [1984], Galindo-Legaria and Rosenthal
[1992], and Galindo-Legaria [1994]
...
Extension
of relational algebra to duplicates is described in Dayal et al
...
Optimization of
nested subqueries is discussed in Kim [1982], Ganski and Wong [1987], Dayal [1987],
and more recently, in Seshadri et al
...
When queries are generated through views, more relations often are joined than is
necessary for computation of the query
...
The notion of a tableau
was introduced by Aho et al
...
[1979a], and was further extended
by Sagiv and Yannakakis [1981]
...
Sellis [1988] and Roy et al
...
If an entire group
of queries is considered, it is possible to discover common subexpressions that can be
evaluated once for the entire group
...
Dalvi et al
...
Query optimization can make use of semantic information, such as functional dependencies and other integrity constraints
...
[1990], and in the context of
aggregation, by Sudarshan and Ramakrishnan [1991]
...
[1992c], Srivastava et al
...
[1996]
...
[1993]
...
[1986], Blakeley et al
...
Gupta and Mumick [1995] provides a survey of materialized view maintenance
...
[2001]
...
[1995], Dar et al
...
[2000]
...
[1996], Labio et al
...
[2000]
...
PART 5
Transaction Management
The term transaction refers to a collection of operations that form a single logical unit
of work
...
It is important that either all actions of a transaction be executed completely, or, in
case of some failure, partial effects of a transaction be undone
...
Further, once a transaction is successfully executed, its effects must persist
in the database — a system failure should not result in the database forgetting about
a transaction that successfully completed
...
In a database system where multiple transactions are executing concurrently, if
updates to shared data are not controlled there is potential for transactions to see
inconsistent intermediate states created by updates of other transactions
...
Thus, database
systems must provide mechanisms to isolate transactions from the effects of other
concurrently executing transactions
...
Chapter 15 describes the concept of a transaction in detail, including the properties
of atomicity, durability, isolation, and other properties provided by the transaction
abstraction
...
Chapter 16 describes several concurrency control techniques that help implement
the isolation property
...
CHAPTER 15
Transactions
Often, a collection of several operations on the database appears to be a single unit
from the point of view of the database user
...
Clearly, it is essential that all these operations occur, or that, in case of a failure, none
occur
...
Collections of operations that form a single logical unit of work are called transactions
...
Furthermore, it must
manage concurrent execution of transactions in a way that avoids the introduction of
inconsistency
...
As a result, it
would obtain an incorrect result
...
Details on
concurrent transaction processing and recovery from failures are in Chapters 16 and
17, respectively
...
15
...
Usually, a transaction is initiated by a user program written in a
high-level data-manipulation language or programming language (for example, SQL,
COBOL, C, C++, or Java), where it is delimited by statements (or function calls) of the
form begin transaction and end transaction
...
To ensure integrity of the data, we require that the database system maintain the
following properties of the transactions:
• Atomicity
...
• Consistency
...
• Isolation
...
Thus, each transaction is unaware of other transactions
executing concurrently in the system
...
After a transaction completes successfully, the changes it has made
to the database persist, even if there are system failures
...
To gain a better understanding of ACID properties and the need for them, consider a simplified banking system consisting of several accounts and a set of transactions that access and update those accounts
...
Transactions access data using two operations:
• read(X), which transfers the data item X from the database to a local buffer
belonging to the transaction that executed the read operation
...
In a real database system, the write operation does not necessarily result in the immediate update of the data on the disk; the write operation may be temporarily stored
in memory and executed on the disk later
...
We shall return to this subject
in Chapter 17
...
This transaction can be defined as
Ti : read(A);
A := A − 50;
write(A);
read(B);
B := B + 50;
write(B)
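
The read and write operations and the transfer transaction can be sketched in a few lines. The sketch assumes the database is an in-memory dictionary and that each transaction keeps its own local buffer; the class `Transaction` and the function `transfer_50` are hypothetical names, and the starting balances are the ones used in the discussion of atomicity. It ignores failures and concurrency entirely.

```python
class Transaction:
    """Accesses the database only through read(X) and write(X)."""

    def __init__(self, db):
        self.db = db
        self.local = {}  # local buffer belonging to this transaction

    def read(self, item):
        # Transfer the data item from the database to the local buffer.
        self.local[item] = self.db[item]

    def write(self, item):
        # Transfer the data item from the local buffer back to the database.
        self.db[item] = self.local[item]


def transfer_50(db):
    """Mirror of the transaction in the text: move $50 from A to B."""
    t = Transaction(db)
    t.read("A")
    t.local["A"] -= 50
    t.write("A")
    t.read("B")
    t.local["B"] += 50
    t.write("B")


database = {"A": 1000, "B": 2000}
total_before = database["A"] + database["B"]
transfer_50(database)
total_after = database["A"] + database["B"]
print(database, total_before == total_after)
```

Between the two writes the dictionary holds A = 950 and B = 2000, so the sum is temporarily wrong; a failure at that point is exactly the situation the atomicity requirement addresses.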
...
(For ease of presentation, we
consider them in an order different from the order A-C-I-D)
...
Without the consistency
requirement, money could be created or destroyed by the transaction! It can
15
...
Ensuring consistency for an individual transaction is the responsibility of
the application programmer who codes the transaction
...
• Atomicity: Suppose that, just before the execution of transaction Ti the values
of accounts A and B are $1000 and $2000, respectively
...
Examples of such failures include power
failures, hardware failures, and software errors
...
In
this case, the values of accounts A and B reflected in the database are $950 and
$2000
...
In particular, we
note that the sum A + B is no longer preserved
...
We term such a
state an inconsistent state
...
Note, however, that the system must at some
point be in an inconsistent state
...
This state, however, is eventually replaced by the consistent state where the value of account
A is $950, and the value of account B is $2050
...
That is the reason for
the atomicity requirement: If the atomicity property is present, all actions of
the transaction are reflected in the database, or none are
...
We discuss these ideas further in Section 15
...
Ensuring atomicity
is the responsibility of the database system itself; specifically, it is handled by
a component called the transaction-management component, which we describe in detail in Chapter 17
...
The durability property guarantees that, once a transaction completes successfully, all the updates that it carried out on the database persist, even if
there is a system failure after the transaction completes execution
...
15
...
We can
guarantee durability by ensuring that either
1
...
2
...
Ensuring durability is the responsibility of a component of the database system called the recovery-management component
...
• Isolation: Even if the consistency and atomicity properties are ensured for
each transaction, if several transactions are executed concurrently, their operations may interleave in some undesirable way, resulting in an inconsistent
state
...
If a
second concurrently running transaction reads A and B at this intermediate
point and computes A + B, it will observe an inconsistent value
...
A way to avoid the problem of concurrently executing transactions is to
execute transactions serially — that is, one after the other
...
4
...
We discuss the problems caused by concurrently executing transactions in
Section 15
...
The isolation property of a transaction ensures that the concurrent execution of transactions results in a system state that is equivalent to a
state that could have been obtained had these transactions executed one at a
time in some order
...
5
...
15
...
However, as we
noted earlier, a transaction may not always complete its execution successfully
...
If we are to ensure the atomicity property, an aborted
transaction must have no effect on the state of the database
...
the aborted transaction made to the database must be undone
...
It is part of the responsibility of the recovery scheme to manage
transaction aborts
...
A committed transaction that has performed updates transforms the database into a
new consistent state, which must persist even if there is a system failure
...
The
only way to undo the effects of a committed transaction is to execute a compensating
transaction
...
However, it is not always possible
to create such a compensating transaction
...
Chapter 24 includes a discussion of compensating transactions
...
We therefore establish a simple abstract transaction model
...
1
...
Similarly, we say that a transaction has aborted only if it has entered the aborted state
...
A transaction starts in the active state
...
At this point, the transaction has completed its execution, but it is still possible that it may have to be aborted, since the actual output
may still be temporarily residing in main memory, and thus a hardware failure may
preclude its successful completion
...
When the last of this information is written out,
the transaction enters the committed state
...
Chapter 17 discusses techniques to deal with loss of data on disk
...
Such a transaction must be rolled back
...
At this point, the system has two options:
[Figure: state diagram of a transaction, showing the states active, partially committed, committed, failed, and aborted]
Figure 15
...
• It can restart the transaction, but only if the transaction was aborted as a result
of some hardware or software error that was not created through the internal logic of the transaction
...
• It can kill the transaction
...
We must be cautious when dealing with observable external writes, such as writes
to a terminal or printer
...
Most systems allow such writes
to take place only after the transaction has entered the committed state
...
If the system should
fail after the transaction has entered the committed state, but before it could complete
the external writes, the database system will carry out the external writes (using the
data in nonvolatile storage) when the system is restarted
...
For example,
suppose the external action is that of dispensing cash at an automated teller machine,
and the system fails just before the cash is actually dispensed (we assume that cash
can be dispensed atomically)
...
In such a case, a compensating transaction, such as depositing the cash back in the user's account, needs to be
executed when the system is restarted
...
For certain applications, it may be desirable to allow active transactions to display data to users, particularly for long-duration transactions that run for minutes
or hours
...
Most current transaction systems
ensure atomicity and, therefore, forbid this form of interaction with users
...
15
...
We first consider a simple, but extremely inefficient, scheme called the shadow-copy scheme
...
The scheme also assumes that the database is simply a file on
disk
...
In the shadow-copy scheme, a transaction that wants to update the database first
creates a complete copy of the database
...
If at any point the transaction has to be aborted, the system merely deletes the new copy
...
If the transaction completes, it is committed as follows
...
(Unix systems use the fsync system call for this purpose
...
The old copy of the database is then deleted
...
2 depicts the scheme, showing the database state before and after the update
...
2
[Figure: shadow-copy technique for atomicity and durability. Panel (b), after update, shows the new copy of the database]
...
15
...
We now consider how the technique handles transaction and system failures
...
If the transaction fails at any time before db-pointer is
updated, the old contents of the database are not affected
...
Once the transaction has been
committed, all the updates that it performed are in the database pointed to by db-pointer
...
Now consider the issue of system failure
...
Then, when the system restarts, it
will read db-pointer and will thus see the original contents of the database, and none
of the effects of the transaction will be visible on the database
...
Before the pointer is updated,
all updated pages of the new copy of the database were written to disk
...
Therefore, when the system restarts, it will read db-pointer
and will thus see the contents of the database after all the updates performed by the
transaction
...
If some of the
bytes of the pointer were updated by the write, but others were not, the pointer is
meaningless, and neither old nor new versions of the database may be found when
the system restarts
...
In other words, the disk system guarantees that it will update
db-pointer atomically, as long as we make sure that db-pointer lies entirely in a single
sector, which we can ensure by storing db-pointer at the beginning of a block
...
As a simple example of a transaction outside the database domain, consider a text-editing session
...
The actions
executed by the transaction are reading and updating the file
...
Many text editors use essentially the implementation just described, to ensure that
an editing session is transactional
...
At the
end of the editing session, if the updated file is to be saved, the text editor uses a file
rename command to rename the new file to have the actual file name
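
The save step of such an editor can be sketched with Python's standard library. The function name `save_atomically` is hypothetical, and the sketch assumes that the temporary file is created in the same directory as the target, so that the rename does not cross file systems. It writes the new contents to a separate file, forces them to disk, and only then renames the new file over the old one.

```python
import os
import tempfile


def save_atomically(path, text):
    """Write text to a new file, flush it to disk, then rename it over path."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write(text)
            tmp.flush()
            os.fsync(tmp.fileno())   # make sure the new copy is on disk first
        os.replace(tmp_path, path)   # the rename plays the role of the pointer update
    except BaseException:
        # Abort: discard the new copy; the old file is untouched.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


workdir = tempfile.mkdtemp()
target = os.path.join(workdir, "notes.txt")
save_atomically(target, "old contents")
save_atomically(target, "new contents")
with open(target) as f:
    saved = f.read()
print(saved, os.listdir(workdir))
```

As with the shadow-copy scheme itself, the cost is a full copy of the file on every save, which is acceptable for a text file but not for a large database.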
...
Unfortunately, this implementation is extremely inefficient in the context of large
databases, since executing a single transaction requires copying the entire database
...
There are practical ways of implementing atomicity and durability
that are much less expensive and more powerful
...
Silberschatz−Korth−Sudarshan:
Database System
Concepts, Fourth Edition
V
...
Transactions
15
...
4 Concurrent Executions
Transaction-processing systems usually allow multiple transactions to run concurrently
...
Ensuring consistency
in spite of concurrent execution of transactions requires extra work; it is far easier to
insist that transactions run serially — that is, one at a time, each starting only after
the previous one has completed
...
A transaction consists of many
steps
...
The CPU and the
disks in a computer system can operate in parallel
...
The parallelism of the CPU
and the I/O system can therefore be exploited to run multiple transactions in
parallel
...
All of this
increases the throughput of the system — that is, the number of transactions
executed in a given amount of time
...
• Reduced waiting time
...
If transactions run serially, a short transaction
may have to wait for a preceding long transaction to complete, which can lead
to unpredictable delays in running a transaction
...
Concurrent execution
reduces the unpredictable delays in running transactions
...
The motivation for using concurrent execution in a database is essentially the same
as the motivation for using multiprogramming in an operating system
...
In this section, we present the
concept of schedules to help identify those executions that are guaranteed to ensure
consistency
...
It does
so through a variety of mechanisms called concurrency-control schemes
...
Consider again the simplified banking system of Section 15
...
Let T1 and
T2 be two transactions that transfer funds from one account to another
...
It is defined as
T1 : read(A);
A := A − 50;
write(A);
read(B);
B := B + 50;
write(B)
...
It is
defined as
T2 : read(A);
temp := A * 0
...
Suppose the current values of accounts A and B are $1000 and $2000, respectively
...
This execution sequence appears in Figure 15
...
In the figure, the
sequence of instruction steps is in chronological order from top to bottom, with instructions of T1 appearing in the left column and instructions of T2 appearing in the
right column
...
3
takes place, are $855 and $2145, respectively
...
1
A := A – temp
write(A)
read(B)
B := B + temp
write(B)
Figure 15
...
Similarly, if the transactions are executed one at a time in the order T2 followed
by T1 , then the corresponding execution sequence is that of Figure 15
...
Again, as
expected, the sum A + B is preserved, and the final values of accounts A and B are
$850 and $2150, respectively
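The two serial executions can be checked with a short Python sketch. The multiplier in T2 is truncated in the text above; the sketch assumes 10 percent, which is the value that reproduces the quoted balances. The function names t1 and t2 and the dictionary db are our own.

```python
def t1(db):
    """T1 moves $50 from account A to account B."""
    a = db["A"]          # read(A)
    a -= 50
    db["A"] = a          # write(A)
    b = db["B"]          # read(B)
    b += 50
    db["B"] = b          # write(B)


def t2(db, percent=10):
    """T2 moves a percentage of A's balance to B (10 percent is an assumption)."""
    a = db["A"]          # read(A)
    temp = a * percent // 100
    a -= temp
    db["A"] = a          # write(A)
    b = db["B"]          # read(B)
    b += temp
    db["B"] = b          # write(B)


def run_serial(order):
    """Run the transactions one at a time from the initial balances."""
    db = {"A": 1000, "B": 2000}
    for txn in order:
        txn(db)
    return db
```

Under this assumption, run_serial([t1, t2]) leaves A = 855 and B = 2145, while run_serial([t2, t1]) leaves A = 850 and B = 2150; the sum A + B is 3000 in both cases.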
...
They represent the
chronological order in which instructions are executed in the system
...
For example, in transaction T1 , the instruction write(A) must appear before the
instruction read(B), in any valid schedule
...
These schedules are serial: Each serial schedule consists of a sequence of instructions from various transactions, where the instructions belonging to one single transaction appear together in that schedule
...
When the database system executes several transactions concurrently, the corresponding schedule no longer needs to be serial
...
With multiple transactions, the CPU time is shared among all the transactions
...
In general, it is not possible to predict exactly
how many instructions of a transaction will be executed before the CPU switches to
T1
T2
read(A)
temp := A * 0
...
4
Schedule 2 — a serial schedule in which T2 is followed by T1
...
...
1
A := A – temp
write(A)
read(B)
B := B + 50
write(B)
read(B)
B := B + temp
write(B)
Figure 15
...
another transaction
...
Returning to our previous example, suppose that the two transactions are executed concurrently
...
5
...
The sum A + B is indeed preserved
...
To illustrate, consider the
schedule of Figure 15
...
After the execution of this schedule, we arrive at a state
where the final values of accounts A and B are $950 and $2100, respectively
...
Indeed, the sum A + B is not preserved by the execution of the two
transactions
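To see where the $50 goes, the following Python sketch replays one interleaving that ends in this state. The interleaving is our reading of the flattened figure and should be treated as an assumption, as should the 10 percent multiplier in T2; the names run_schedule_4, w1, and w2 are our own. Each transaction works on a private copy of the items it has read.

```python
def run_schedule_4(a=1000, b=2000, percent=10):
    """Replay an interleaving of T1 and T2 in which an update of A is lost."""
    db = {"A": a, "B": b}
    w1, w2 = {}, {}                      # private workspaces of T1 and T2

    w1["A"] = db["A"]                    # T1: read(A)
    w1["A"] -= 50                        # T1: A := A - 50
    w2["A"] = db["A"]                    # T2: read(A), still the old value
    temp = w2["A"] * percent // 100      # T2: temp := A * (assumed 10 percent)
    w2["A"] -= temp                      # T2: A := A - temp
    db["A"] = w2["A"]                    # T2: write(A)
    w2["B"] = db["B"]                    # T2: read(B)
    db["A"] = w1["A"]                    # T1: write(A), overwriting T2's update
    w1["B"] = db["B"]                    # T1: read(B)
    w1["B"] += 50                        # T1: B := B + 50
    db["B"] = w1["B"]                    # T1: write(B)
    w2["B"] += temp                      # T2: B := B + temp
    db["B"] = w2["B"]                    # T2: write(B), overwriting T1's update
    return db
```

With the initial balances used above, the function returns A = 950 and B = 2100, so the sum has grown from 3000 to 3050.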
...
It is the job of the database system to
ensure that any schedule that gets executed will leave the database in a consistent
state
...
We can ensure consistency of the database under concurrent execution by making
sure that any schedule that is executed has the same effect as a schedule that could
have occurred without any concurrent execution
...
We examine this idea in Section 15
...
15
...
Before we examine how the database system can carry out this task, we must first understand which schedules will ensure consistency, and which schedules will not
...
1
A := A – temp
write(A)
read(B)
write(A)
read(B)
B := B + 50
write(B)
B := B + temp
write(B)
Figure 15
...
For this reason, we shall not interpret the type of operations that a
transaction can perform on a data item
...
We thus assume that, between a read(Q) instruction and a write(Q)
instruction on a data item Q, a transaction may perform an arbitrary sequence of operations on the copy of Q that is residing in the local buffer of the transaction
...
We shall therefore usually show only read and write
instructions in schedules, as we do in schedule 3 in Figure 15
...
In this section, we discuss different forms of schedule equivalence; they lead to the
notions of conflict serializability and view serializability
...
7
Schedule 3 — showing only the read and write instructions
...
15
...
5
...
If Ii and Ij refer to different data
items, then we can swap Ii and Ij without affecting the results of any instruction in
the schedule
...
Since we are dealing with only read and write instructions,
there are four cases that we need to consider:
1
...
The order of Ii and Ij does not matter, since the
same value of Q is read by Ti and Tj , regardless of the order
...
Ii = read(Q), Ij = write(Q)
...
If Ij comes before Ii , then Ti reads
the value of Q that is written by Tj
...
3
...
The order of Ii and Ij matters for reasons similar
to those of the previous case
...
Ii = write(Q), Ij = write(Q)
...
However, the value
obtained by the next read(Q) instruction of S is affected, since the result of
only the latter of the two write instructions is preserved in the database
...
Thus, only in the case where both Ii and Ij are read instructions does the relative
order of their execution not matter
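The four cases reduce to a one-line test, sketched here in Python. The Op record and the function name conflicts are our own; the test assumes, as in the discussion above, that the two instructions belong to different transactions.

```python
from typing import NamedTuple


class Op(NamedTuple):
    txn: str    # transaction name, for example "T1"
    kind: str   # "read" or "write"
    item: str   # data item name, for example "Q"


def conflicts(i: Op, j: Op) -> bool:
    """Two instructions conflict if they come from different transactions,
    touch the same data item, and at least one of them is a write."""
    return i.txn != j.txn and i.item == j.item and "write" in (i.kind, j.kind)
```

Two reads of the same item, or any two instructions on different items, do not conflict and can therefore be swapped.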
...
To illustrate the concept of conflicting instructions, we consider schedule 3, in Figure 15
...
The write(A) instruction of T1 conflicts with the read(A) instruction of T2
...
Let Ii and Ij be consecutive instructions of a schedule S
...
We expect S to be equivalent to S′, since all
instructions appear in the same order in both schedules except for Ii and Ij , whose
order does not matter
...
7 does not conflict
with the read(B) instruction of T1 , we can swap these instructions to generate an
equivalent schedule, schedule 5, in Figure 15
...
Regardless of the initial system state,
schedules 3 and 5 both produce the same final system state
...
• Swap the write(B) instruction of T1 with the write(A) instruction of T2
...
15
...
8
Schedule 5 — schedule 3 after swapping of a pair of instructions
...
9, is a serial schedule
...
This equivalence
implies that, regardless of the initial system state, schedule 3 will produce the same
final state as will some serial schedule
...
In our previous examples, schedule 1 is not conflict equivalent to schedule 2
...
The concept of conflict equivalence leads to the concept of conflict serializability
...
Thus, schedule 3 is conflict serializable, since it is conflict equivalent to the
serial schedule 1
...
10; it consists of only the significant operations (that is, the read and write) of transactions T3 and T4
...
It is possible to have two schedules that produce the same outcome, but that are
not conflict equivalent
...
9
Schedule 6 — a serial schedule that is equivalent to schedule 3
...
15
...
10
Schedule 7
...
Let schedule 8 be as defined in Figure 15
...
We claim that schedule 8 is not conflict equivalent to the serial schedule <T1, T5>, since, in
schedule 8, the write(B) instruction of T5 conflicts with the read(B) instruction of T1
...
However, the final values of accounts A and B
after the execution of either schedule 8 or the serial schedule <T1, T5>
are the same: $960 and $2040, respectively
...
For the system to determine that schedule 8
produces the same outcome as the serial schedule
...
However, there are other definitions of schedule equivalence based purely on the read and
write operations
...
15
...
2 View Serializability
In this section, we consider a form of equivalence that is less stringent than conflict
equivalence, but that, like conflict equivalence, is based on only the read and write
operations of transactions
...
11
Schedule 8
...
Consider two schedules S and S′, where the same set of transactions participates
in both schedules
...
For each data item Q, if transaction Ti reads the initial value of Q in schedule
S, then transaction Ti must, in schedule S′, also read the initial value of Q
...
For each data item Q, if transaction Ti executes read(Q) in schedule S, and if
that value was produced by a write(Q) operation executed by transaction Tj,
then the read(Q) operation of transaction Ti must, in schedule S′, also read the
value of Q that was produced by the same write(Q) operation of transaction Tj
...
For each data item Q, the transaction (if any) that performs the final write(Q)
operation in schedule S must perform the final write(Q) operation in schedule S′
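The three conditions can be checked mechanically. The Python sketch below is one way to do so, under our own representation: a schedule is a list of (transaction, kind, item) steps with kind equal to "read" or "write", and the names view_profile and view_equivalent are hypothetical. For each read it records which write produced the value (or "initial"), and for each item it records the final writer; two schedules over the same steps are view equivalent when these records agree.

```python
def view_profile(schedule):
    """Return (reads_from, final_write) for a list of (txn, kind, item) steps."""
    last_write = {}    # item -> (txn, n): most recent write so far, n-th by that txn
    write_count = {}   # (txn, item) -> number of writes issued so far
    read_count = {}    # (txn, item) -> number of reads issued so far
    reads_from = {}    # (txn, item, n) -> source write, or "initial"
    for txn, kind, item in schedule:
        if kind == "read":
            n = read_count.get((txn, item), 0) + 1
            read_count[(txn, item)] = n
            reads_from[(txn, item, n)] = last_write.get(item, "initial")
        else:
            n = write_count.get((txn, item), 0) + 1
            write_count[(txn, item)] = n
            last_write[item] = (txn, n)
    # After the loop, last_write holds the final write of every item.
    return reads_from, last_write


def view_equivalent(s1, s2):
    """Check conditions 1 to 3 for two schedules made of the same steps."""
    same_steps = sorted(s1) == sorted(s2)
    return same_steps and view_profile(s1) == view_profile(s2)
```

The check runs in time linear in the schedule length; the hard problem discussed later is finding some serial schedule that is view equivalent, not comparing two given schedules.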
...
Condition 3, coupled with
conditions 1 and 2, ensures that both schedules result in the same final system state
...
However, schedule 1 is view equivalent
to schedule 3, because the values of account A and B read by transaction T2 were
produced by T1 in both schedules
...
We
say that a schedule S is view serializable if it is view equivalent to a serial schedule
...
12
...
Indeed, it is view
equivalent to the serial schedule <T3, T4, T6>, since the one read(Q) instruction reads
the initial value of Q in both schedules, and T6 performs the final write of Q in both
schedules
...
Indeed, schedule 9 is not conflict serializable, since every pair of consecutive instructions conflicts, and, thus, no
swapping of instructions is possible
...
Writes of this sort are called blind
writes
...
T3
read(Q)
T4
T6
write(Q)
write(Q)
write(Q)
Figure 15
...
15
...
We
now address the effect of transaction failures during concurrent execution
...
In a system that allows
concurrent execution, it is necessary also to ensure that any transaction Tj that is
dependent on Ti (that is, Tj has read data written by Ti ) is also aborted
...
In the following two subsections, we address the issue of what schedules are
acceptable from the viewpoint of recovery from transaction failure
...
15
...
1 Recoverable Schedules
Consider schedule 11 in Figure 15
...
Suppose that the system allows T9 to commit immediately
after executing the read(A) instruction
...
Now suppose that T8 fails before it commits
...
However, T9 has already
committed and cannot be aborted
...
Schedule 11, with the commit happening immediately after the read(A) instruction, is an example of a nonrecoverable schedule, which should not be allowed
...
A recoverable schedule is
one where, for each pair of transactions Ti and Tj such that Tj reads a data item previously written by Ti , the commit operation of Ti appears before the commit operation
of Tj
...
6
...
Such situations occur if transactions have read data written by Ti
...
13
Schedule 11
...
T10             T11             T12
read(A)
read(B)
write(A)
                read(A)
                write(A)
                                read(A)
Figure 15
...
of Figure 15
...
Transaction T10 writes a value of A that is read by transaction T11
...
Suppose that,
at this point, T10 fails
...
Since T11 is dependent on T10 , T11
must be rolled back
...
This
phenomenon, in which a single transaction failure leads to a series of transaction
rollbacks, is called cascading rollback
...
It is desirable to restrict the schedules to those where cascading
rollbacks cannot occur
...
Formally, a
cascadeless schedule is one where, for each pair of transactions Ti and Tj such that
Tj reads a data item previously written by Ti , the commit operation of Ti appears
before the read operation of Tj
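Both definitions can be tested directly on a schedule that includes commit steps. The Python sketch below is a simplified illustration: a schedule is a list of (transaction, action, item) triples with action equal to "read", "write", or "commit", aborts are not modeled, and the function names are our own.

```python
def dependencies(schedule):
    """Yield (writer, reader, read_position) whenever a transaction reads an
    item whose most recent write came from a different transaction."""
    last_writer = {}
    for pos, (txn, action, item) in enumerate(schedule):
        if action == "write":
            last_writer[item] = txn
        elif action == "read" and last_writer.get(item, txn) != txn:
            yield last_writer[item], txn, pos


def commit_positions(schedule):
    """Map each committed transaction to the position of its commit."""
    return {txn: pos for pos, (txn, action, _) in enumerate(schedule)
            if action == "commit"}


def is_recoverable(schedule):
    """Every reader commits only after the writer it read from has committed."""
    commits = commit_positions(schedule)
    never = len(schedule)  # stands for "has not committed in this schedule"
    return all(reader not in commits or commits.get(writer, never) < commits[reader]
               for writer, reader, _ in dependencies(schedule))


def is_cascadeless(schedule):
    """Every read of another transaction's write happens after that write's commit."""
    commits = commit_positions(schedule)
    never = len(schedule)
    return all(commits.get(writer, never) < pos
               for writer, _, pos in dependencies(schedule))
```

For a schedule in which T9 reads A after T8 writes it and then commits first, is_recoverable returns False. Every schedule that passes is_cascadeless also passes is_recoverable, since a commit that precedes the read also precedes the reader's commit.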
...
15
...
Specifically, schedules that are conflict or view serializable and cascadeless satisfy
these requirements
...
As a trivial example of a concurrency-control scheme, consider this scheme: A
transaction acquires a lock on the entire database before it starts and releases the
lock after it has committed
...
As
a result of the locking policy, only one transaction can execute at a time
...
These are trivially serializable, and it is easy to
verify that they are cascadeless as well
...
In
other words, it provides a poor degree of concurrency
...
4,
concurrent execution has several performance benefits
...
We study a number of concurrency-control schemes in Chapter 16
...
Some of them allow only conflict serializable
schedules to be generated; others allow certain view-serializable schedules that are
not conflict-serializable to be generated
...
8 Transaction Definition in SQL
A data-manipulation language must include a construct for specifying the set of actions that constitute a transaction
...
Transactions are
ended by one of these SQL statements:
• Commit work commits the current transaction and begins a new one
...
The keyword work is optional in both the statements
...
The standard also specifies that the system must ensure both serializability and
freedom from cascading rollback
...
Thus,
conflict and view serializability are both acceptable
...
We study such weaker levels of consistency in Section 16
...
15
...
To do that, we must first understand how to
determine, given a particular schedule S, whether the schedule is serializable
...
Consider a schedule S
...
This graph consists of a pair G = (V, E), where V is a set
of vertices and E is a set of edges
...
The set of edges consists of all edges Ti → Tj for which
one of three conditions holds:
1
...
2
...
3
...
15
...
15
Precedence graph for (a) schedule 1 and (b) schedule 2
...
For example, the precedence graph for schedule 1 in Figure 15
...
Similarly, Figure 15
...
The precedence graph for schedule 4 appears in Figure 15
...
It contains the edge
T1 → T2 , because T1 executes read(A) before T2 executes write(A)
...
If the precedence graph for S has a cycle, then schedule S is not conflict serializable
...
A serializability order of the transactions can be obtained through topological
sorting, which determines a linear order consistent with the partial order of the
precedence graph
...
For example, the graph of Figure 15
...
17b and 15
...
Thus, to test for conflict serializability, we need to construct the precedence graph
and to invoke a cycle-detection algorithm
...
Cycle-detection algorithms, such as those based
on depth-first search, require on the order of n^2 operations, where n is the number of
vertices in the graph (that is, the number of transactions)
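The test can be sketched in Python as follows. The representation is our own: a schedule is a list of (transaction, kind, item) steps with kind equal to "read" or "write". An edge Ti → Tj is added whenever a step of Ti precedes a conflicting step of Tj, and a depth-first search either finds a cycle or yields a topological order. The function names are hypothetical, and the graph construction here is deliberately simple rather than efficient.

```python
from collections import defaultdict


def precedence_graph(schedule):
    """Edges Ti -> Tj for each pair of conflicting steps where Ti's step is first."""
    edges = defaultdict(set)
    for i, (ti, kind_i, item_i) in enumerate(schedule):
        edges[ti]  # make sure every transaction appears as a vertex
        for tj, kind_j, item_j in schedule[i + 1:]:
            if ti != tj and item_i == item_j and "write" in (kind_i, kind_j):
                edges[ti].add(tj)
    return edges


def serializability_order(schedule):
    """Return one serial order of the transactions, or None if there is a cycle."""
    edges = precedence_graph(schedule)
    state = {}   # 1 = on the current DFS path, 2 = finished
    order = []   # vertices in order of completion

    def visit(t):
        if state.get(t) == 1:
            return False          # back edge: the graph has a cycle
        if state.get(t) == 2:
            return True
        state[t] = 1
        for u in edges[t]:
            if not visit(u):
                return False
        state[t] = 2
        order.append(t)
        return True

    for t in list(edges):
        if not visit(t):
            return None
    return order[::-1]            # reverse completion order is a topological order
```

A result of None means the schedule is not conflict serializable; any returned list is one valid serializability order, and there may be others.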
...
Returning to our previous examples, note that the precedence graphs for schedules 1 and 2 (Figure 15
...
The precedence graph for
schedule 4 (Figure 15
...
Testing for view serializability is rather complicated
...
Thus, almost certainly there exists no efficient algorithm to test for view serializability
...
16
T2
Precedence graph for schedule 4
...
15
...
17
Illustration of topological sorting
...
However,
concurrency-control schemes can still use sufficient conditions for view serializability
...
15
...
Understanding the concept of a transaction is critical for
understanding and implementing updates of data in a database, in such a way
that concurrent executions and failures of various forms do not result in the
database becoming inconsistent
...
Atomicity ensures that either all the effects of a transaction are reflected
in the database, or none are; a failure cannot leave the database in a state
where a transaction is partially executed
...
Durability ensures that, once a transaction has been committed, that transaction’s updates do not get lost, even if there is a system failure
...
• When several transactions execute concurrently in the database, the consistency of data may no longer be preserved
...
Since a transaction is a unit that preserves consistency, a serial execution
of transactions guarantees that consistency is preserved
...
We require that any schedule produced by concurrent processing of a
set of transactions will have an effect equivalent to a schedule produced
when these transactions are run serially in some order
...
There are several different notions of equivalence leading to the concepts
of conflict serializability and view serializability
...
• Schedules must be recoverable, to make sure that if transaction a sees the effects of transaction b, and b then aborts, then a also gets aborted
...
Cascadelessness is
ensured by allowing transactions to only read committed data
...
Chapter 16 describes
concurrency-control schemes
...
The shadow copy scheme is used for ensuring atomicity and durability in
text editors; however, it has extremely high overheads when used for database
systems, and, moreover, it does not support concurrent execution
...
• We can test a given schedule for conflict serializability by constructing a precedence graph for the schedule, and by searching for absence of cycles in the
graph
...
Review Terms
• Transaction
• ACID properties
Atomicity
Consistency
Isolation
Durability
• Inconsistent state
• Transaction state
Active
Partially committed
Failed
Aborted
Committed
Terminated
• Transaction
Restart
Kill
• Observable external writes
• Shadow copy scheme
• Concurrent executions
• Serial execution
• Schedules
• Conflict of operations
• Conflict equivalence
• Conflict serializability
• View equivalence
• View serializability
• Blind writes
• Recoverability
• Recoverable schedules
• Cascading rollback
• Cascadeless schedules
• Concurrency-control scheme
• Lock
• Serializability testing
• Precedence graph
• Serializability order
Exercises
15
...
Explain the usefulness of each
...
2 Suppose that there is a database system that never fails
...
3 Consider a file system such as the one on your favorite operating system
...
What are the steps involved in creation and deletion of files, and in writing
data to a file?
b
...
15
...
Why might this be the case?
15
...
List all possible sequences of states through which a transaction may pass
...
15
...
15
...
15
...
T2 : read(B);
read(A);
if B = 0 then A := A + 1;
write(A)
...
a
...
b
...
c
...
9 Since every conflict-serializable schedule is view serializable, why do we emphasize conflict serializability rather than view serializability?
15
...
18
...
15
...
T1
T2
T4
T3
T5
Figure 15
...
15
...
Bibliographical Notes
Gray and Reuter [1993] provides detailed textbook coverage of transaction-processing
concepts, techniques and implementation details, including concurrency control and
recovery issues
...
Early textbook discussions of concurrency control and recovery included Papadimitriou [1986] and Bernstein et al
...
An early survey paper on implementation
issues in concurrency control and recovery is presented by Gray [1978]
...
[1976] in connection
to work on concurrency control for System R
...
[1977] and Papadimitriou [1979]
...
[1990]
...
Chapter 16
Concurrency Control
We saw in Chapter 15 that one of the fundamental properties of a transaction is isolation
...
To ensure that it is, the system must
control the interaction among the concurrent transactions; this control is achieved
through one of a variety of mechanisms called concurrency-control schemes
...
That is, all the schemes presented here ensure that the
schedules are serializable
...
In this chapter, we consider the management of
concurrently executing transactions, and we ignore failures
...
16
...
The most common method used to implement
this requirement is to allow a transaction to access a data item only if it is currently
holding a lock on that item
...
1
...
In this section, we
restrict our attention to two modes:
1
...
If a transaction Ti has obtained a shared-mode lock (denoted by S)
on item Q, then Ti can read, but cannot write, Q
...
Exclusive
...
     S      X
S    true   false
X    false  false
Figure 16
...
We require that every transaction request a lock in an appropriate mode on data
item Q, depending on the types of operations that it will perform on Q
...
The transaction can
proceed with the operation only after the concurrency-control manager grants the
lock to the transaction
...
Let A and B represent arbitrary lock modes
...
If transaction Ti can be granted a lock on Q immediately, in spite
of the presence of the mode B lock, then we say mode A is compatible with mode
B
...
The compatibility
relation between the two modes of locking discussed in this section appears in the
matrix comp of Figure 16
...
An element comp(A, B) of the matrix has the value true if
and only if mode A is compatible with mode B
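For the two modes of this section, the matrix comp and the resulting grant test can be written down directly. The Python names COMP and can_grant are our own; the sketch ignores waiting requests, which are discussed later.

```python
# comp(A, B): may a lock in mode A be granted while another transaction
# holds a lock in mode B on the same item?
COMP = {
    ("S", "S"): True,
    ("S", "X"): False,
    ("X", "S"): False,
    ("X", "X"): False,
}


def can_grant(requested, held_by_others):
    """True if the requested mode is compatible with every mode currently
    held on the item by other transactions."""
    return all(COMP[(requested, held)] for held in held_by_others)
```

Only shared locks are compatible with each other, so can_grant("S", ["S", "S"]) holds while can_grant("X", ["S"]) does not.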
...
At any time, several shared-mode locks can be held simultaneously (by different transactions) on a particular data item
...
A transaction requests a shared lock on data item Q by executing the lock-S(Q)
instruction
...
A transaction can unlock a data item Q by the unlock(Q) instruction
...
If the data item is
already locked by another transaction in an incompatible mode, the concurrency-control manager will not grant the lock until all incompatible locks held by other
transactions have been released
...
T1 : lock-X(B);
read(B);
B := B − 50;
write(B);
unlock(B);
lock-X(A);
read(A);
A := A + 50;
write(A);
unlock(A)
...
2
Transaction T1
...
T2 : lock-S(A);
read(A);
unlock(A);
lock-S(B);
read(B);
unlock(B);
display(A + B)
...
3
Transaction T2
...
Note that a transaction must hold a lock on a data item as long as it accesses that item
...
As an illustration, consider again the simplified banking system that we introduced in Chapter 15
...
Transaction T1 transfers $50 from account B to account A (Figure 16
...
Transaction T2 displays the total amount of money in accounts A and B—that is, the
sum A + B (Figure 16
...
T1                  T2                  concurrency-control manager
lock-X(B)
                                        grant-X(B, T1)
read(B)
B := B − 50
write(B)
unlock(B)
                    lock-S(A)
                                        grant-S(A, T2)
                    read(A)
                    unlock(A)
                    lock-S(B)
                                        grant-S(B, T2)
                    read(B)
                    unlock(B)
                    display(A + B)
lock-X(A)
                                        grant-X(A, T1)
read(A)
A := A + 50
write(A)
unlock(A)
Figure 16
...
T3 : lock-X(B);
read(B);
B := B − 50;
write(B);
lock-X(A);
read(A);
A := A + 50;
write(A);
unlock(B);
unlock(A)
...
5
Transaction T3
...
If these
two transactions are executed serially, either in the order T1 , T2 or the order T2 , T1 ,
then transaction T2 will display the value $300
...
4 is possible
...
The reason for this mistake is that the
transaction T1 unlocked data item B too early, as a result of which T2 saw an inconsistent state
...
The transaction making a lock request cannot execute its next action until the concurrency-control manager grants the lock
...
Exactly when
within this interval the lock is granted is not important; we can safely assume that the
lock is granted just before the following action of the transaction
...
We let you infer when locks are granted
...
Transaction T3 corresponds to T1 with unlocking delayed (Figure 16
...
Transaction T4 corresponds to T2 with unlocking delayed (Figure 16
...
You should verify that the sequence of reads and writes in schedule 1, which leads
to an incorrect total of $250 being displayed, is no longer possible with T3 and T4
...
Figure 16
...
16
...
7
Schedule 2
...
T4 will not print out an inconsistent result in any of
them; we shall see why later
...
Consider the partial
schedule of Figure 16
...
Since T3 is holding an exclusive-mode lock
on B and T4 is requesting a shared-mode lock on B, T4 is waiting for T3 to unlock
B
...
Thus, we have arrived at
a state where neither of these transactions can ever proceed with its normal execution
...
When deadlock occurs, the system must roll back
one of the two transactions
...
These data items are then available
to the other transaction, which can continue with its execution
...
6
...
On the other hand, if we do not
unlock a data item before requesting a lock on another data item, deadlocks may
occur
...
1
...
However, in general, deadlocks are a necessary evil associated with locking, if we want to avoid inconsistent states
...
We shall require that each transaction in the system follow a set of rules, called a
locking protocol, indicating when a transaction may lock and unlock each of the data
items
...
The set of all such
schedules is a proper subset of all possible serializable schedules
...
Before doing
so, we need a few definitions
...
, Tn } be a set of transactions participating in a schedule S
...
If Ti → Tj , then that precedence implies that in any equivalent serial schedule, Ti must appear before Tj
...
...
9 to test for conflict serializability
...
We say that a schedule S is legal under a given locking protocol if S is a possible
schedule for a set of transactions that follow the rules of the locking protocol
...
16
...
2 Granting of Locks
When a transaction requests a lock on a data item in a particular mode, and no other
transaction has a lock on the same data item in a conflicting mode, the lock can be
granted
...
Suppose a
transaction T2 has a shared-mode lock on a data item, and another transaction T1
requests an exclusive-mode lock on the data item
...
Meanwhile, a transaction T3 may request a shared-mode
lock on the same data item
...
At this point T2 may release the lock,
but still T1 has to wait for T3 to finish
...
In fact, it is possible that there is a sequence of transactions that
each requests a shared-mode lock on the data item, and each transaction releases the
lock a short while after it is granted, but T1 never gets the exclusive-mode lock on the
data item
...
We can avoid starvation of transactions by granting locks in the following manner:
When a transaction Ti requests a lock on a data item Q in a particular mode M , the
concurrency-control manager grants the lock provided that
1
...
2
...
Thus, a lock request will never get blocked by a lock request that is made later
...
1
...
This protocol requires that each transaction issue lock and unlock requests in two phases:
1
...
A transaction may obtain locks, but may not release any lock
...
Shrinking phase
...
The transaction acquires locks as
needed
...
For example, transactions T3 and T4 are two phase
...
Note that the unlock instructions do not need to appear
at the end of the transaction
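Whether a single transaction obeys the two-phase rule can be checked by scanning its lock and unlock instructions in order. The Python sketch below uses our own function name and takes the instruction names as plain strings; lock conversions (upgrade and downgrade) are not handled.

```python
def is_two_phase(requests):
    """requests: the lock and unlock instructions of one transaction, in the
    order issued, for example ["lock-X(B)", "lock-X(A)", "unlock(B)"].
    Returns False if any lock is requested after the first unlock."""
    shrinking = False
    for name in requests:
        if name.startswith("unlock"):
            shrinking = True      # the shrinking phase has begun
        elif name.startswith("lock") and shrinking:
            return False          # a lock request in the shrinking phase
    return True
```

Applied to the transactions shown earlier, T3 passes the check, whereas T1 fails it because it issues lock-X(A) after unlock(B).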
...
We can show that the two-phase locking protocol ensures conflict serializability
...
The point in the schedule where the transaction has obtained its final lock (the end of its growing phase) is called the lock point of the
transaction
...
We leave the proof as
an exercise for you to do (see Exercise 16
...
Two-phase locking does not ensure freedom from deadlock
...
7), they are deadlocked
...
6
...
Cascading rollback may occur under two-phase locking
...
8
...
Cascading rollbacks can be avoided by a modification of two-phase locking called
the strict two-phase locking protocol
...
This requirement ensures that any data written by an
uncommitted transaction are locked in exclusive mode until the transaction commits,
preventing any other transaction from reading the data
...
T5              T6              T7
lock-X(A)
read(A)
lock-S(B)
read(B)
write(A)
unlock(A)
                lock-X(A)
                read(A)
                write(A)
                unlock(A)
                                lock-S(A)
                                read(A)
Figure 16
...
We can easily verify that, with rigorous two-phase locking, transactions can be serialized in the order in which they commit
...
Consider the following two transactions, for which we have shown only some of
the significant read and write operations:
T8 : read(a1 );
read(a2 );
...
T9 : read(a1 );
read(a2 );
display(a1 + a2 )
...
Therefore, any concurrent execution of both transactions amounts to a serial
execution
...
Thus, if T8 could initially lock a1 in shared mode, and
then could later change the lock to exclusive mode, we could get more concurrency,
since T8 and T9 could access a1 and a2 simultaneously
...
We shall provide a mechanism for upgrading
a shared lock to an exclusive lock, and downgrading an exclusive lock to a shared
lock
...
Lock conversion cannot be allowed arbitrarily
...
Returning to our example, transactions T8 and T9 can run concurrently under
the refined two-phase locking protocol, as shown in the incomplete schedule of Figure 16
...
T8              T9
lock-S(a1)
                lock-S(a1)
lock-S(a2)
                lock-S(a2)
lock-S(a3)
lock-S(a4)
                unlock(a1)
                unlock(a2)
lock-S(an)
upgrade(a1)
Figure 16
...
This enforced wait occurs if Q is currently locked by another transaction in
shared mode
...
Further, if exclusive locks are held until the end of the transaction, the schedules are cascadeless
...
However, to obtain conflict-serializable schedules through non-two-phase locking protocols, we need either to
have additional information about the transactions or to impose some structure or
ordering on the set of data items in the database
...
Strict two-phase locking and rigorous two-phase locking (with lock conversions)
are used extensively in commercial database systems
...
• When Ti issues a write(Q) operation, the system checks to see whether Ti
already holds a shared lock on Q
...
Otherwise, the system issues a lock-X(Q) instruction, followed by the write(Q) instruction
...
16
...
4 Implementation of Locking∗∗
A lock manager can be implemented as a process that receives messages from transactions and sends messages in reply
...
Unlock messages require only an acknowledgment in
response, but may result in a grant message to another waiting transaction
...
It uses a hash table, indexed on the name of a data item, to
find the linked list (if any) for a data item; this table is called the lock table
...
The record also notes if the request has currently
been granted
...
Transaction
Management
© The McGraw−Hill
Companies, 2001
16
...
10
Lock table
...
10 shows an example of a lock table
...
The lock table uses overflow chaining,
so there is a linked list of data items for each entry in the lock table
...
Granted locks are the filled-in (black) rectangles, while waiting requests
are the empty rectangles
...
It can be seen, for example, that T23 has been granted locks on I912 and I7, and is
waiting for a lock on I4
...
The lock manager processes requests this way:
• When a lock request message arrives, it adds a record to the end of the linked
list for the data item, if the linked list is present
...
It always grants the first lock request on a data item
...
Otherwise the request has
to wait
...
• When the lock manager receives an unlock message from a transaction, it
deletes the record for that data item in the linked list corresponding to that
transaction
...
If it can, the lock manager grants that request, and processes the record following it, if any, similarly,
and so on
...
Once the database system has taken appropriate actions to
undo the transaction (see Section 17
...
This algorithm guarantees freedom from starvation for lock requests, since a request can never be granted while a request received earlier is waiting to be granted
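The request and unlock handling described above can be illustrated with a minimal sketch. It assumes only shared (S) and exclusive (X) modes; the class name LockTable and its methods are hypothetical names chosen for this illustration, and a Python dictionary stands in for the hash table with overflow chaining.

```python
from collections import defaultdict, deque


class LockTable:
    """Toy lock table: one FIFO queue of request records per data item."""

    def __init__(self):
        # item name -> queue of [txn, mode, granted] records, in arrival order
        self.queues = defaultdict(deque)

    @staticmethod
    def _compatible(mode_a, mode_b):
        # Only shared with shared is compatible; X conflicts with everything.
        return mode_a == "S" and mode_b == "S"

    def request(self, txn, item, mode):
        """Append a request; grant it only if every earlier request on the
        item has been granted and is compatible with the new one."""
        queue = self.queues[item]
        granted = all(g and self._compatible(mode, m) for _, m, g in queue)
        queue.append([txn, mode, granted])
        return granted

    def unlock(self, txn, item):
        """Delete txn's record, then grant waiting requests in queue order."""
        queue = deque(r for r in self.queues[item] if r[0] != txn)
        self.queues[item] = queue
        newly_granted = []
        for i, rec in enumerate(queue):
            if rec[2]:
                continue  # already granted
            earlier = list(queue)[:i]
            if all(g and self._compatible(rec[1], m) for _, m, g in earlier):
                rec[2] = True
                newly_granted.append(rec[0])
            else:
                break  # later requests may not overtake a waiting one
        return newly_granted
```

Because a request is granted only when every earlier request has been granted, a later shared request cannot overtake a waiting exclusive request, which is the property behind the freedom from starvation noted above.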
...
6
...
Section 18
...
1
describes an alternative implementation — one that uses shared memory instead of
message passing for lock request/grant
...
1
...
1
...
But, if we wish to develop protocols that are
not two phase, we need additional information on how each transaction will access
the database
...
The simplest model requires
that we have prior knowledge about the order in which the database items will be
accessed
...
To acquire such prior knowledge, we impose a partial ordering → on the set
D = {d1 , d2 ,
...
If di → dj , then any transaction accessing both
di and dj must access di before accessing dj
...
The partial ordering implies that the set D may now be viewed as a directed acyclic
graph, called a database graph
...
We will present a
simple protocol, called the tree protocol, which is restricted to employ only exclusive
locks
...
In the tree protocol, the only lock instruction allowed is lock-X
...
The first lock by Ti may be on any data item
...
Subsequently, a data item Q can be locked by Ti only if the parent of Q is
currently locked by Ti
...
Data items may be unlocked at any time
...
A data item that has been locked and unlocked by Ti cannot subsequently be
relocked by Ti
...
To illustrate this protocol, consider the database graph of Figure 16
...
The following four transactions follow the tree protocol on this graph
...
T11 : lock-X(D); lock-X(H); unlock(D); unlock(H)
...
T13 : lock-X(D); lock-X(H); unlock(D); unlock(H)
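As an illustration, the rules of the tree protocol can be checked mechanically. The following is a minimal sketch: the function name follows_tree_protocol is hypothetical, and the parent map is an assumed tree (only the edge from D to H is implied by the transactions listed above; the remaining edges are made up for the example).

```python
# Assumed parent map for illustration only; not taken from the figure.
PARENT = {"B": "A", "C": "A", "D": "B", "E": "B", "G": "D", "H": "D"}


def follows_tree_protocol(ops, parent):
    """Return True if ops, a list of (action, item) pairs with action in
    {"lock-X", "unlock"}, obeys the tree protocol for one transaction."""
    held, ever_locked = set(), set()
    for action, item in ops:
        if action == "lock-X":
            if item in ever_locked:
                return False  # an item may not be relocked
            if ever_locked and parent.get(item) not in held:
                return False  # after the first lock, the parent must be held
            held.add(item)
            ever_locked.add(item)
        elif action == "unlock":
            if item not in held:
                return False  # cannot unlock an item that is not held
            held.remove(item)
        else:
            return False  # lock-X is the only lock instruction allowed
    return True


t11 = [("lock-X", "D"), ("lock-X", "H"), ("unlock", "D"), ("unlock", "H")]
print(follows_tree_protocol(t11, PARENT))
```

A transaction that unlocks D before requesting H, or that tries to lock D a second time, is rejected by the same check.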
...
12
...
Observe that the schedule of Figure 16
...
It can be shown
not only that the tree protocol ensures conflict serializability, but also that this protocol ensures freedom from deadlock
...
12 does not ensure recoverability and cascadelessness
...
Holding exclusive locks until the end of the transaction reduces concurrency
...
Whenever a transaction Ti performs a read of an uncommitted data
item, we record a commit dependency of Ti on the transaction that performed the
[Figure: database graph with nodes A, B, C, D, E, F, G, H, I, and J]
Figure 16
...
16
...
12
Serializable schedule under the tree protocol
...
Transaction Ti is then not permitted to commit until the
commit of all transactions on which it has a commit dependency
...
The tree-locking protocol has an advantage over the two-phase locking protocol in
that, unlike two-phase locking, it is deadlock-free, so no rollbacks are required
...
Earlier unlocking may lead to shorter waiting times,
and to an increase in concurrency
...
For example, a transaction that needs
to access data items A and J in the database graph of Figure 16
...
This additional locking results in increased
locking overhead, the possibility of additional waiting time, and a potential decrease
in concurrency
...
For a set of transactions, there may be conflict-serializable schedules that cannot
be obtained through the tree protocol
...
Examples of such schedules are explored in the exercises
...
16
...
2 Timestamp-Based Protocols
The locking protocols that we have described thus far determine the order between
every pair of conflicting transactions at execution time by the first lock that both
members of the pair request that involves incompatible modes
...
The most common method for doing so is to use a timestamp-ordering
scheme
...
2
...
This timestamp is assigned by the database system before the transaction Ti starts execution
...
There are two simple
methods for implementing this scheme:
1
...
2
...
The timestamps of the transactions determine the serializability order
...
To implement this scheme, we associate with each data item Q two timestamp
values:
• W-timestamp(Q) denotes the largest timestamp of any transaction that executed write(Q) successfully
...
These timestamps are updated whenever a new read(Q) or write(Q) instruction is
executed
...
2
...
This protocol operates as follows:
1
...
a
...
Hence, the read operation is rejected, and Ti is rolled
back
...
b
...
2
...
a
...
Hence, the system rejects the write operation and rolls Ti
back
...
If TS(Ti ) < W-timestamp(Q), then Ti is attempting to write an obsolete
value of Q
...
c
...
If a transaction Ti is rolled back by the concurrency-control scheme as a result of issuing either a read or a write operation, the system assigns it a new timestamp and
restarts it
...
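The read and write tests of the timestamp-ordering protocol can be sketched as follows, using the rules as they are usually stated: a read is rejected if the item was already overwritten by a later transaction, and a write is rejected if a later transaction has already read or written the item. The names Item, Rollback, ts_read, and ts_write are hypothetical, and timestamps are assumed to be positive integers.

```python
class Rollback(Exception):
    """Raised when the protocol rejects an operation."""


class Item:
    def __init__(self, value=None):
        self.value = value
        self.r_ts = 0  # R-timestamp(Q)
        self.w_ts = 0  # W-timestamp(Q)


def ts_read(ts, q):
    """read(Q) issued by a transaction with timestamp ts."""
    if ts < q.w_ts:
        raise Rollback("the value needed was already overwritten")
    q.r_ts = max(q.r_ts, ts)
    return q.value


def ts_write(ts, q, value):
    """write(Q) issued by a transaction with timestamp ts."""
    if ts < q.r_ts:
        raise Rollback("a later transaction already read the old value")
    if ts < q.w_ts:
        raise Rollback("attempt to write an obsolete value")
    q.value, q.w_ts = value, ts
```

In this sketch a rejected operation simply raises Rollback; assigning a new timestamp and restarting the transaction is left to the caller.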
Transaction T14
displays the contents of accounts A and B:
T14 : read(B);
read(A);
display(A + B)
...
In presenting schedules under the timestamp protocol, we shall assume that a transaction is assigned a timestamp immediately before its first instruction
...
13, TS(T14 ) < TS(T15 ), and the schedule is possible under the timestamp protocol
...
There are, however, schedules that are possible under the two-phase
locking protocol, but are not possible under the timestamp protocol, and vice versa
(see Exercise 16
...
The timestamp-ordering protocol ensures conflict serializability
...
The protocol ensures freedom from deadlock, since no transaction ever waits
...
T14               T15
read(B)
                  read(B)
                  B := B − 50
                  write(B)
read(A)
                  read(A)
display(A + B)
                  A := A + 50
                  write(A)
                  display(A + B)
Figure 16
...
If a transaction is found to be getting restarted repeatedly, conflicting transactions need
to be temporarily blocked to enable the transaction to finish
...
However, it can be
extended to make the schedules recoverable, in one of several ways:
• Recoverability and cascadelessness can be ensured by performing all writes
together at the end of the transaction
...
• Recoverability and cascadelessness can also be guaranteed by using a limited
form of locking, whereby reads of uncommitted items are postponed until the
transaction that updated the item commits (see Exercise 16
...
• Recoverability alone can be ensured by tracking uncommitted writes, and allowing a transaction Ti to commit only after the commit of any transaction that
wrote a value that Ti read
...
1
...
16
...
3 Thomas’ Write Rule
We now present a modification to the timestamp-ordering protocol that allows greater
potential concurrency than does the protocol of Section 16
...
2
...
14, and apply the timestamp-ordering protocol
...
The read(Q) operation of T16 succeeds, as does the write(Q) operation of T17
...
Thus, the
write(Q) by T16 is rejected and transaction T16 must be rolled back
...
Since T17 has already written Q, the value that T16 is attempting to
write is one that will never need to be read
...
Any
16
...
14
Schedule 4
...
This observation leads to a modified version of the timestamp-ordering protocol
in which obsolete write operations can be ignored under certain circumstances
...
The protocol rules for write
operations, however, are slightly different from the timestamp-ordering protocol of
Section 16
...
2
...
1
...
Hence, the system rejects the write operation and rolls Ti back
...
If TS(Ti ) < W-timestamp(Q), then Ti is attempting to write an obsolete value
of Q
...
3
...
The difference between these rules and those of Section 16
...
2 lies in the second
rule
...
However, here, in those cases where TS(Ti )
≥ R-timestamp(Q), we ignore the obsolete write
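A minimal sketch of the write test under Thomas' write rule follows. The function name thomas_write and the dictionary layout are hypothetical; the three outcomes correspond to rejecting the write, ignoring an obsolete write, and performing the write.

```python
def thomas_write(ts, q, value):
    """Apply write(Q) for a transaction with timestamp ts.

    q is a dict with keys "value", "r_ts", and "w_ts".
    Returns "rollback", "ignored", or "written".
    """
    if ts < q["r_ts"]:
        return "rollback"  # a later transaction already read the old value
    if ts < q["w_ts"]:
        return "ignored"   # obsolete write: skip it, no rollback
    q["value"], q["w_ts"] = value, ts
    return "written"


# Assumed timestamps: TS(T16) = 1 and TS(T17) = 2; T16 has already read Q.
q = {"value": 0, "r_ts": 1, "w_ts": 0}
print(thomas_write(2, q, "from T17"))  # T17 writes Q
print(thomas_write(1, q, "from T16"))  # T16's later write is obsolete
```

Under the timestamp-ordering protocol without this rule, the second call would force T16 to roll back; here the write is dropped and Q keeps the value written by T17.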
...
This modification of transactions makes it possible to generate serializable schedules that would not be possible
under the other protocols presented in this chapter
...
14 is not conflict serializable and, thus, is not possible under any of two-phase
locking, the tree protocol, or the timestamp-ordering protocol
...
The result is a schedule that is
view equivalent to the serial schedule
...
3 Validation-Based Protocols
In cases where a majority of transactions are read-only transactions, the rate of conflicts among transactions may be low
...
A concurrency-control scheme imposes overhead of
code execution and possible delay of transactions
...
A difficulty in reducing the overhead is that
we do not know in advance which transactions will be involved in a conflict
...
We assume that each transaction Ti executes in two or three different phases in its
lifetime, depending on whether it is a read-only or an update transaction
...
Read phase
...
It reads
the values of the various data items and stores them in variables local to Ti
...
2
...
Transaction Ti performs a validation test to determine whether it can copy to the database the temporary local variables that hold the
results of write operations without causing a violation of serializability
...
Write phase
...
Otherwise, the system rolls back
Ti
...
However, all
three phases of concurrently executing transactions can be interleaved
...
We shall, therefore, associate three different timestamps with
transaction Ti :
1
...
2
...
3
...
We determine the serializability order by the timestamp-ordering technique, using
the value of the timestamp Validation(Ti )
...
The reason we have
chosen Validation(Ti ), rather than Start(Ti ), as the timestamp of transaction Ti is that
we can expect faster response time provided that conflict rates among transactions
are indeed low
...
Finish(Ti ) < Start(Tj )
...
2
...
This condition ensures that
16
...
15
Schedule 5, a schedule produced by using validation
...
Since the writes of Ti do not affect the
read of Tj , and since Tj cannot affect the read of Ti , the serializability order is
indeed maintained
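The validation test can be sketched as follows, using the two conditions as they are usually stated: a transaction Tj passes if, for every Ti that validated earlier, either Ti finished before Tj started, or Ti finished its write phase before Tj validated and wrote nothing that Tj read. The names Txn and validate are hypothetical, and timestamps are assumed to be integers drawn from one logical clock.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Txn:
    start: int                     # Start(T)
    validation: int                # Validation(T)
    finish: Optional[int] = None   # Finish(T); None until the write phase ends
    read_set: set = field(default_factory=set)
    write_set: set = field(default_factory=set)


def validate(tj, earlier):
    """Return True if tj passes validation against every transaction
    in `earlier`, i.e. every Ti with Validation(Ti) < Validation(Tj)."""
    for ti in earlier:
        if ti.finish is None:
            return False  # Ti has not finished writing; neither condition holds
        if ti.finish < tj.start:
            continue  # Ti completed before Tj started
        if tj.start < ti.finish < tj.validation and not (ti.write_set & tj.read_set):
            continue  # Ti's writes cannot have affected Tj's reads
        return False
    return True
```

A transaction that fails the test is rolled back; one that passes may proceed to its write phase.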
...
Suppose that TS(T14 )
< TS(T15 )
...
15
...
Thus, T14 reads the old values of B and A, and this schedule is serializable
...
However, there is a possibility of starvation of long transactions, due to a sequence
of conflicting short transactions that cause repeated restarts of the long transaction
...
This validation scheme is called the optimistic concurrency control scheme since
transactions execute optimistically, assuming they will be able to finish execution
and validate at the end
...
16
...
There are circumstances, however, where it would be advantageous to group several data items, and to treat them as one individual synchronization unit
...
Clearly, executing these locks is
time consuming
...
16
...
16
[Figure: granularity hierarchy tree; surviving node labels: A2, Fc, rb1 … rbk, rc1 … rcm]
Granularity hierarchy
...
On the other hand, if transaction Tj needs to access only a few data
items, it should not be required to lock the entire database, since otherwise concurrency is lost
...
We can make one by allowing data items to be of various sizes and defining a hierarchy of data granularities, where the small granularities are nested within
larger ones
...
Note that the
tree that we describe here is significantly different from that used by the tree protocol
(Section 16
...
5)
...
In the tree protocol, each node is an independent
data item
...
16, which consists of four levels
of nodes
...
Below it are nodes of type
area; the database consists of exactly these areas
...
Each area contains exactly those files that are its child nodes
...
Finally, each file has nodes of type record
...
Each node in the tree can be locked individually
...
When a transaction locks
a node, in either shared or exclusive mode, the transaction also has implicitly locked
all the descendants of that node in the same lock mode
...
16, in exclusive mode, then it has an
implicit lock in exclusive mode on all the records belonging to that file
...
Suppose that transaction Tj wishes to lock record rb6 of file Fb
...
But, when Tj issues a lock
request for rb6 , rb6 is not explicitly locked! How does the system determine whether
Tj can lock rb6 ? Tj must traverse the tree from the root to record rb6
...
16
...
17
        IS      IX      S       SIX     X
S       true    false   true    false   false
SIX     true    false   false   false   false
X       false   false   false   false   false
Compatibility matrix
...
To do so, it
simply must lock the root of the hierarchy
...
But how does the system determine if the root node can be
locked? One possibility is for it to search the entire tree
...
A more efficient
way to gain this knowledge is to introduce a new class of lock modes, called intention lock modes
...
Intention locks are put
on all the ancestors of a node before that node is locked explicitly
...
A transaction wishing to lock a node—say, Q—must traverse a path in the
tree from the root to Q
...
There is an intention mode associated with shared mode, and there is one with
exclusive mode
...
Similarly,
if a node is locked in intention-exclusive (IX) mode, then explicit locking is being
done at a lower level, with exclusive-mode or shared-mode locks
...
The compatibility function for these lock
modes is in Figure 16
...
The multiple-granularity locking protocol, which ensures serializability, is this:
Each transaction Ti can lock a node Q by following these rules:
1
...
17
...
It must lock the root of the tree first, and can lock it in any mode
...
It can lock a node Q in S or IS mode only if it currently has the parent of Q
locked in either IX or IS mode
...
It can lock a node Q in X, SIX, or IX mode only if it currently has the parent of
Q locked in either IX or SIX mode
...
It can lock a node only if it has not previously unlocked any node (that is, Ti
is two phase)
...
It can unlock a node Q only if it currently has none of the children of Q locked
...
As an illustration of the protocol, consider the tree of Figure 16
...
Then, T18 needs to
lock the database, area A1 , and Fa in IS mode (and in that order), and finally
to lock ra2 in S mode
...
Then, T19 needs to
lock the database, area A1 , and file Fa in IX mode, and finally to lock ra9 in X
mode
...
Then, T20 needs
to lock the database and area A1 (in that order) in IS mode, and finally to lock
Fa in S mode
...
It can do so after locking the database in S mode
...
Transaction T19 can execute concurrently with T18 , but not with either T20 or T21
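The concurrency claims for T18 through T21 can be checked with a minimal sketch. The names COMPAT, path_modes, and can_coexist are hypothetical. The compatibility matrix is written out in full in its standard symmetric form, and the node names db, A1, Fa, ra2, and ra9 follow the example above.

```python
MODES = ["IS", "IX", "S", "SIX", "X"]

# COMPAT[a][b] is True if modes a and b may be held on the same node
# by two different transactions.
COMPAT = {
    "IS":  {"IS": True,  "IX": True,  "S": True,  "SIX": True,  "X": False},
    "IX":  {"IS": True,  "IX": True,  "S": False, "SIX": False, "X": False},
    "S":   {"IS": True,  "IX": False, "S": True,  "SIX": False, "X": False},
    "SIX": {"IS": True,  "IX": False, "S": False, "SIX": False, "X": False},
    "X":   {m: False for m in MODES},
}


def path_modes(target_mode, depth):
    """Modes to request from the root down to a node `depth` levels below it:
    intention locks on the ancestors, then the target mode on the node."""
    intention = "IS" if target_mode in ("S", "IS") else "IX"
    return [intention] * depth + [target_mode]


def can_coexist(locks_a, locks_b):
    """Lock sets (node -> mode) of two transactions can be held together
    if the modes on every node they share are compatible."""
    shared = locks_a.keys() & locks_b.keys()
    return all(COMPAT[locks_a[n]][locks_b[n]] for n in shared)


T18 = dict(zip(["db", "A1", "Fa", "ra2"], path_modes("S", 3)))
T19 = dict(zip(["db", "A1", "Fa", "ra9"], path_modes("X", 3)))
T20 = dict(zip(["db", "A1", "Fa"], path_modes("S", 2)))
T21 = {"db": "S"}
print(can_coexist(T18, T19), can_coexist(T19, T20), can_coexist(T19, T21))
```

T19 conflicts with T20 on file Fa (IX against S) and with T21 on the database node (IX against S), while T18 and T19 meet only in compatible intention modes.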
...
It is particularly
useful in applications that include a mix of
• Short transactions that access only a few data items
• Long transactions that produce reports from an entire file or set of files
There is a similar locking protocol that is applicable to database systems in which
data granularities are organized in the form of a directed acyclic graph
...
Deadlock is possible in the protocol that
we have, as it is in the two-phase locking protocol
...
These techniques are referenced in the bibliographical notes
...
5 Multiversion Schemes
The concurrency-control schemes discussed thus far ensure serializability by either
delaying an operation or aborting the transaction that issued the operation
...
These
difficulties could be avoided if old copies of each data item were kept in a system
...
When a transaction issues a read(Q) operation, the concurrency-control manager selects one of the versions of Q to be read
...
serializability
...
16
...
1 Multiversion Timestamp Ordering
The most common transaction ordering technique used by multiversion schemes is
timestamping
...
The database system assigns this timestamp before
the transaction starts execution, as described in Section 16
...
With each data item Q, a sequence of versions is associated
Each version Qk contains three data fields:
• Content is the value of version Qk
...
• R-timestamp(Qk ) is the largest timestamp of any transaction that successfully
read version Qk
...
The content field of the version holds the value written by Ti
...
It updates the
R-timestamp value of Qk whenever a transaction Tj reads the content of Qk , and
R-timestamp(Qk ) < TS(Tj )
...
The scheme operates as follows
...
Let Qk denote the version of Q whose write timestamp is the
largest write timestamp less than or equal to TS(Ti )
...
If transaction Ti issues a read(Q), then the value returned is the content of
version Qk
...
If transaction Ti issues write(Q), and if TS(Ti ) < R-timestamp(Qk ), then the system rolls back transaction Ti
...
The justification for rule 1 is clear
...
The second rule forces a transaction to abort if it is “too late”
in doing a write
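The two rules can be illustrated with a minimal sketch. The names MVItem and Rollback are hypothetical. The sketch assumes that an initial version with write timestamp 0 exists, that transaction timestamps are positive integers, and that a new version starts with both of its timestamps set to the timestamp of the writer.

```python
class Rollback(Exception):
    """Raised when a write arrives too late."""


class MVItem:
    def __init__(self, initial):
        # Each version is a dict holding its content and two timestamps.
        self.versions = [{"content": initial, "w_ts": 0, "r_ts": 0}]

    def _visible(self, ts):
        """Version whose write timestamp is the largest one <= ts."""
        candidates = (v for v in self.versions if v["w_ts"] <= ts)
        return max(candidates, key=lambda v: v["w_ts"])

    def read(self, ts):
        """Reads never fail and never wait."""
        v = self._visible(ts)
        v["r_ts"] = max(v["r_ts"], ts)
        return v["content"]

    def write(self, ts, content):
        v = self._visible(ts)
        if ts < v["r_ts"]:
            raise Rollback  # a later transaction already read this version
        if ts == v["w_ts"]:
            v["content"] = content  # overwrite the writer's own version
        else:
            self.versions.append({"content": content, "w_ts": ts, "r_ts": ts})


q = MVItem(10)
q.write(7, 20)
print(q.read(6), q.read(8))
```

The transaction with timestamp 6 still sees the initial version, while the one with timestamp 8 sees the version written at timestamp 7.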
...
Versions that are no longer needed are removed according to the following rule
...
Then, the older of the two versions Qk and Qj will not be used again, and can be
deleted
...
In typical database systems, where
reading is a more frequent operation than is writing, this advantage may be of major
practical significance
...
First, the reading
of a data item also requires the updating of the R-timestamp field, resulting in two
potential disk accesses, rather than one
...
This alternative may be
expensive
...
5
...
This multiversion timestamp-ordering scheme does not ensure recoverability and
cascadelessness
...
16
...
2 Multiversion Two-Phase Locking
The multiversion two-phase locking protocol attempts to combine the advantages
of multiversion concurrency control with the advantages of two-phase locking
...
Update transactions perform rigorous two-phase locking; that is, they hold all
locks up to the end of the transaction
...
Each version of a data item has a single timestamp
...
Read-only transactions are assigned a timestamp by reading the current value
of ts-counter before they start execution; they follow the multiversion timestamp-ordering protocol for performing reads
...
When an update transaction reads an item, it gets a shared lock on the item, and
reads the latest version of that item
...
The write is performed on the new version, and the timestamp of the
new version is initially set to a value ∞, a value greater than that of any possible
timestamp
...
Only one update transaction
is allowed to perform commit processing at a time
...
In either case, read-only transactions never
need to wait for locks
...
Versions are deleted in a manner like that of multiversion timestamp ordering
...
Then, the older of the two versions Qk and Qj will not be used again and can
be deleted
...
Multiversion two-phase locking or variations of it are used in some commercial
database systems
...
6 Deadlock Handling
A system is in a deadlock state if there exists a set of transactions such that every
transaction in the set is waiting for another transaction in the set
...
, Tn } such that T0 is waiting for a
data item that T1 holds, and T1 is waiting for a data item that T2 holds, and
...
None of the transactions can make progress in such a situation
...
Rollback of a transaction may be partial: That is, a transaction may be rolled back to
the point where it obtained a lock whose release resolves the deadlock
...
We can
use a deadlock prevention protocol to ensure that the system will never enter a deadlock state
...
As we
shall see, both methods may result in transaction rollback
...
Note that a detection and recovery scheme requires overhead that includes not
only the run-time cost of maintaining the necessary information and of executing the
detection algorithm, but also the potential losses inherent in recovery from a deadlock
...
6
...
One approach ensures that no
cyclic waits can occur by ordering the requests for locks, or requiring all locks to be
acquired together
...
The simplest scheme under the first approach requires that each transaction locks
all its data items before it begins execution
...
There are two main disadvantages to this protocol: (1) it is often
hard to predict, before the transaction begins, what data items need to be locked;
(2) data-item utilization may be very low, since many of the data items may be locked
but unused for a long time
...
We have seen one such scheme in the tree protocol, which uses a
partial ordering of data items
...
Once a transaction has locked a particular item, it cannot
request locks on items that precede that item in the ordering
...
There is no need to change the underlying
concurrency-control system if two-phase locking is used: All that is needed is to ensure that locks are requested in the right order
...
In preemption, when a transaction T2 requests a lock that transaction
T1 holds, the lock granted to T1 may be preempted by rolling back of T1 , and granting
of the lock to T2
...
The system uses these timestamps only to decide whether a transaction
should wait or roll back
...
If a transaction
is rolled back, it retains its old timestamp when restarted
...
The wait–die scheme is a nonpreemptive technique
...
Otherwise, Ti is
rolled back (dies)
...
If T22 requests a data item held by T23 , then T22 will
wait
...
2
...
It is a counterpart to the
wait–die scheme
...
Otherwise, Tj is rolled back (Tj is wounded by Ti )
...
If T24 requests a data item held by T23 , then T24
will wait
...
Both the wound–wait and the wait–die schemes avoid starvation: At any time,
there is a transaction with the smallest timestamp
...
Since timestamps always increase, and since transactions are not assigned new timestamps when they are rolled back, a transaction that
is rolled back repeatedly will eventually have the smallest timestamp, at which point
it will not be rolled back again
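The two decisions can be written side by side as a minimal sketch. The function names and return strings are hypothetical, a smaller timestamp means an older transaction, and the timestamps 5, 10, and 15 for T22, T23, and T24 are assumed for the illustration.

```python
def wait_die(ts_requester, ts_holder):
    """Nonpreemptive: an older requester waits; a younger requester dies."""
    return "wait" if ts_requester < ts_holder else "rollback requester"


def wound_wait(ts_requester, ts_holder):
    """Preemptive: an older requester wounds the holder; a younger one waits."""
    return "rollback holder" if ts_requester < ts_holder else "wait"


TS = {"T22": 5, "T23": 10, "T24": 15}  # assumed timestamps
print(wait_die(TS["T22"], TS["T23"]))    # T22 is older than T23
print(wound_wait(TS["T24"], TS["T23"]))  # T24 is younger than T23
```

In both schemes the transaction that is rolled back is always the younger of the two, which is why a restarted transaction that keeps its timestamp eventually becomes the oldest and is no longer rolled back.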
...
• In the wait–die scheme, an older transaction must wait for a younger one to
release its data item
...
By contrast, in the wound–wait scheme, an older transaction never waits
for a younger transaction
...
• In the wait–die scheme, if a transaction Ti dies and is rolled back because it
requested a data item held by transaction Tj , then Ti may reissue the same
sequence of requests when it is restarted
...
Thus, Ti may die several times before acquiring the
needed data item
...
Transaction Ti is wounded and rolled back because Tj
requested a data item that it holds
...
Thus, there may be fewer rollbacks in the
wound–wait scheme
...
16
...
2 Timeout-Based Schemes
Another simple approach to deadlock handling is based on lock timeouts
...
If the lock has not been granted within that time, the transaction is said to time
out, and it rolls itself back and restarts
...
This scheme falls somewhere between deadlock prevention, where a
deadlock will never occur, and deadlock detection and recovery, which Section 16
...
3
discusses
...
However, in general
it is hard to decide how long a transaction must wait before timing out
...
Too short a wait
results in transaction rollback even when there is no deadlock, leading to wasted resources
...
Hence, the timeout-based
scheme has limited applicability
...
6
...
An algorithm that examines the state
of the system is invoked periodically to determine whether a deadlock has occurred
...
To do so, the
system must:
• Maintain information about the current allocation of data items to transactions, as well as any outstanding data item requests
...
• Recover from the deadlock when the detection algorithm determines that a
deadlock exists
...
[Figure: wait-for graph with vertices T25, T26, T27, and T28]
Figure 16
...
16
...
3
...
This graph consists of a pair G = (V, E), where V is a set of vertices and E is
a set of edges
...
Each
element in the set E of edges is an ordered pair Ti → Tj
...
When transaction Ti requests a data item currently being held by transaction Tj ,
then the edge Ti → Tj is inserted in the wait-for graph
...
A deadlock exists in the system if and only if the wait-for graph contains a cycle
...
To detect deadlocks,
the system needs to maintain the wait-for graph, and periodically to invoke an algorithm that searches for a cycle in the graph
...
18, which
depicts the following situation:
• Transaction T25 is waiting for transactions T26 and T27
...
• Transaction T26 is waiting for transaction T28
...
Suppose now that transaction T28 is requesting an item held by T27
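Searching the wait-for graph for a cycle can be sketched with a depth-first search. The function name has_deadlock is hypothetical. The graph uses the edges listed above; the edge from T27 to T26 is an assumption added for the illustration, since without it the new request by T28 would not close a cycle.

```python
def has_deadlock(wait_for):
    """Return True if the wait-for graph, given as a dict mapping each
    transaction to the set of transactions it waits for, contains a cycle."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {}

    def visit(t):
        color[t] = GREY
        for u in wait_for.get(t, ()):
            c = color.get(u, WHITE)
            if c == GREY:
                return True  # back edge: u is on the current path
            if c == WHITE and visit(u):
                return True
        color[t] = BLACK
        return False

    return any(color.get(t, WHITE) == WHITE and visit(t) for t in list(wait_for))


graph = {
    "T25": {"T26", "T27"},
    "T26": {"T28"},
    "T27": {"T26"},  # assumed edge, not stated in the text above
}
print(has_deadlock(graph))
graph["T28"] = {"T27"}  # T28 requests an item held by T27
print(has_deadlock(graph))
```

With the added edge, T26, T28, and T27 wait for one another in a cycle, so the second call reports a deadlock.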
...
19
...
Consequently, the question arises: When should we invoke the detection algorithm? The answer depends on two factors:
1
...
How many transactions will be affected by the deadlock?
16
...
19
Wait-for graph with a cycle
...
Data items allocated to deadlocked transactions will be
unavailable to other transactions until the deadlock can be broken
...
In the worst case, we would invoke the
detection algorithm every time a request for allocation could not be granted immediately
...
6
...
2 Recovery from Deadlock
When a detection algorithm determines that a deadlock exists, the system must recover from the deadlock
...
Three actions need to be taken:
1
...
Given a set of deadlocked transactions, we must determine which transaction (or transactions) to roll back to break the deadlock
...
Unfortunately, the term minimum cost is not a precise one
...
How long the transaction has computed, and how much longer the transaction will compute before it completes its designated task
...
How many data items the transaction has used
...
How many more data items the transaction needs for it to complete
...
How many transactions will be involved in the rollback
...
Rollback
...
The simplest solution is a total rollback: Abort the transaction and then
restart it
...
Such partial rollback requires the system
to maintain additional information about the state of all the running transactions
...
The deadlock detection mechanism should decide which locks the selected transaction needs to release in
order to break the deadlock
...
The recovery mechanism must be capable of performing such
partial rollbacks
...
See the bibliographical notes for relevant
references
...
Starvation
...
As a result, this transaction never completes its designated task; thus,
there is starvation
...
The most common solution is to include
the number of rollbacks in the cost factor
...
7 Insert and Delete Operations
Until now, we have restricted our attention to read and write operations
...
Some transactions
require not only access to existing data items, but also the ability to create new data
items
...
To examine how such transactions affect concurrency control, we introduce these additional operations:
• delete(Q) deletes data item Q from the database
...
An attempt by a transaction Ti to perform a read(Q) operation after Q has been
deleted results in a logical error in Ti
...
It is also a logical error to attempt to delete a nonexistent data item
...
7
...
Let Ii
and Ij be instructions of Ti and Tj , respectively, that appear in schedule S in consecutive order
...
We consider several instructions Ij
...
Ii and Ij conflict
...
If Ij comes before Ii , Tj can execute the read operation successfully
...
Ii and Ij conflict
...
If Ij comes before Ii , Tj can execute the write operation successfully
...
Ii and Ij conflict
...
If Ij comes before Ii , Ti will have a logical error
...
Ii and Ij conflict
...
Then, if Ii comes before Ij , a logical error results
for Ti
...
Likewise, if Q existed
...
We can conclude the following:
• Under the two-phase locking protocol, an exclusive lock is required on a data
item before that item can be deleted
...
Suppose that transaction Ti issues delete(Q)
...
Hence, the
delete operation is rejected, and Ti is rolled back
...
Hence, this delete operation is rejected, and Ti is rolled
back
...
16
...
2 Insertion
We have already seen that an insert(Q) operation conflicts with a delete(Q) operation
...
Since an insert(Q) assigns a value to data item Q, an insert is treated similarly to a
write for concurrency-control purposes:
• Under the two-phase locking protocol, if Ti performs an insert(Q) operation,
Ti is given an exclusive lock on the newly created data item Q
...
16
...
3 The Phantom Phenomenon
Consider transaction T29 that executes the following SQL query on the bank database:
select sum(balance)
from account
where branch-name = ’Perryridge’
Transaction T29 requires access to all tuples of the account relation pertaining to the
Perryridge branch
...
We expect there to be potential for a
conflict for the following reasons:
• If T29 uses the tuple newly inserted by T30 in computing sum(balance), then
T29 read a value written by T30
...
• If T29 does not use the tuple newly inserted by T30 in computing sum(balance),
then in a serial schedule equivalent to S, T29 must come before T30
...
T29 and T30 do not access any tuple in
common, yet they conflict with each other! In effect, T29 and T30 conflict on a phantom
tuple
...
This problem is called the phantom phenomenon
...
”
To find all account tuples with branch-name = “Perryridge”, T29 must search either
the whole account relation, or at least an index on the relation
...
However, T29 is an example of a transaction that reads information about what tuples are
in a relation, and T30 is an example of a transaction that updates that information
...
The simplest solution to this problem is to associate a data item with the relation;
the data item represents the information used to find the tuples in the relation
...
Transactions, such as T30 , that update the information about what tuples are in a relation would have to lock the data item in exclusive mode
...
Do not confuse the locking of an entire relation, as in multiple granularity locking, with the locking of the data item corresponding to the relation
...
Locking is still required on tuples
...
The major disadvantage of locking a data item corresponding to the relation is
the low degree of concurrency: two transactions that insert different tuples into a
relation are prevented from executing concurrently
...
Any transaction that inserts a
tuple into a relation must insert information into every index maintained on the relation
...
For simplicity we shall only consider B+-tree indices
...
A query will usually use one or more indices to access a relation
...
In our example, we assume
that there is an index on account for branch-name
...
If T29 reads the same leaf node to locate all tuples
pertaining to the Perryridge branch, then T29 and T30 conflict on that leaf node
...
The index-locking protocol takes advantage of the availability of indices on a relation, by turning instances of the phantom phenomenon into conflicts on locks on
index leaf nodes
...
• A transaction Ti can access tuples of a relation only after first finding them
through one or more of the indices on the relation
...
• A transaction Ti may not insert, delete, or update a tuple ti in a relation r
without updating all indices on r
...
For insertion and deletion, the leaf nodes affected are those that contain (after
insertion) or contained (before deletion) the search-key value of the tuple
...
• The rules of the two-phase locking protocol must be observed
...
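The way the index-locking protocol turns the phantom conflict between T29 and T30 into an ordinary lock conflict on an index leaf node can be sketched in Python. This is a minimal sketch under stated assumptions: the class LeafLockTable, its method names, and the leaf identifiers leaf_P and leaf_Q are hypothetical names chosen for illustration, and the sketch only checks lock compatibility. It does not model waiting, lock release under two-phase locking, or a real B+-tree.

```python
class LeafLockTable:
    """Tracks shared (S) and exclusive (X) locks on index leaf nodes."""

    def __init__(self):
        self.shared = {}     # leaf id -> set of transactions holding an S lock
        self.exclusive = {}  # leaf id -> transaction holding the X lock

    def lock_shared(self, txn, leaf):
        holder = self.exclusive.get(leaf)
        if holder is not None and holder != txn:
            return False     # another transaction holds X: conflict
        self.shared.setdefault(leaf, set()).add(txn)
        return True

    def lock_exclusive(self, txn, leaf):
        holder = self.exclusive.get(leaf)
        if holder is not None and holder != txn:
            return False     # another transaction holds X: conflict
        if self.shared.get(leaf, set()) - {txn}:
            return False     # another transaction holds S: conflict
        self.exclusive[leaf] = txn
        return True


locks = LeafLockTable()
# T29 looks up branch-name = 'Perryridge' and S-locks the leaf node it reads.
t29_granted = locks.lock_shared("T29", "leaf_P")
# T30 inserts a Perryridge tuple and needs an X lock on the same leaf node.
t30_granted = locks.lock_exclusive("T30", "leaf_P")
print(t29_granted, t30_granted)  # True False
```

Because T30 cannot obtain the exclusive lock while T29 holds its shared lock, the conflict on the phantom tuple shows up as a conflict on a real data item, the leaf node.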
16
...
If every transaction has the
property that it maintains database consistency if executed alone, then serializability ensures that concurrent executions maintain consistency
...
In these cases, weaker levels of consistency are used
...
16
...
1 Degree-Two Consistency
The purpose of degree-two consistency is to avoid cascading aborts without necessarily ensuring serializability
...
A transaction must hold the appropriate lock mode when it
accesses a data item
...
Exclusive locks cannot be released until
the transaction either commits or aborts
...
Indeed, a transaction may read the same data item twice and obtain different results
T3              T4
lock-S(Q)
read(Q)
unlock(Q)
                lock-X(Q)
                read(Q)
                write(Q)
                unlock(Q)
lock-S(Q)
read(Q)
unlock(Q)
Figure 16
...
...
20, T3 reads the value of Q before and after that value is written
by T4
...
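The effect of this schedule can be replayed in a few lines of Python. This is only an illustrative sketch: the initial value of Q (10) and the update applied by T4 (adding 5) are arbitrary assumptions, and the lock and unlock steps appear as comments because each transaction runs to the end of its step before the other proceeds.

```python
database = {"Q": 10}   # initial value is an arbitrary assumption


def t3_read():
    # T3: lock-S(Q); read(Q); unlock(Q)
    # The shared lock is released right after the read.
    return database["Q"]


def t4_update():
    # T4: lock-X(Q); read(Q); write(Q); unlock(Q)
    database["Q"] = database["Q"] + 5   # the +5 is an arbitrary update


first = t3_read()    # T3 reads Q before T4 writes it
t4_update()
second = t3_read()   # T3 reads Q again after T4 has written it
print(first, second)  # 10 15
```

Since T3 gives up its shared lock between the two reads, nothing prevents T4 from writing Q in between, so the two reads disagree.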
16
...
2 Cursor Stability
Cursor stability is a form of degree-two consistency designed for programs written
in host languages, which iterate over tuples of a relation by using cursors
...
• Any modified tuples are locked in exclusive mode until the transaction commits
...
Two-phase locking is
not required
...
Cursor stability is used in practice
on heavily accessed relations as a means of increasing concurrency and improving
system performance
...
Thus, the use of cursor stability is limited to specialized situations with simple
consistency constraints
...
8
...
For instance, a
transaction may operate at the level of read uncommitted, which permits the transaction to read records even if they have not been committed
...
For instance, approximate information is usually sufficient for statistics used for query optimization
...
these transactions were to execute in a serializable fashion, they could interfere with
other transactions, causing the others’ execution to be delayed
...
• Repeatable read allows only committed records to be read, and further requires that, between two reads of a record by a transaction, no other transaction is allowed to update the record
...
For instance, when it is searching for records satisfying some conditions, a transaction may find some of the
records inserted by a committed transaction, but may not find others
...
For instance, between two reads of a record by the
transaction, the records may have been updated by other committed transactions
...
• Read uncommitted allows even uncommitted records to be read
...
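The consistency levels described above can be summarized as a small lookup table. The sketch below is a hedged illustration, not part of any SQL implementation: the names Level, LEVELS, permits, and weakest_first are hypothetical, and the three columns record only the anomalies discussed in this section (reading uncommitted records, obtaining different values on two reads of a record, and seeing only some of the records inserted by another transaction).

```python
from collections import namedtuple

# Each field says whether the level permits that anomaly.
Level = namedtuple("Level", "uncommitted_reads nonrepeatable_reads phantoms")

LEVELS = {
    "serializable":     Level(False, False, False),
    "repeatable read":  Level(False, False, True),
    "read committed":   Level(False, True,  True),
    "read uncommitted": Level(True,  True,  True),
}


def permits(level, anomaly):
    """Return True if the named consistency level permits the anomaly."""
    return getattr(LEVELS[level], anomaly)


def weakest_first():
    """Order the levels from the one permitting most anomalies to the fewest."""
    return sorted(LEVELS, key=lambda name: sum(LEVELS[name]), reverse=True)
```

For example, permits("repeatable read", "phantoms") is True, which matches the observation that repeatable read alone does not guarantee serializability.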
16
...
However, since indices
are accessed frequently, they would become a point of great lock contention, leading
to a low degree of concurrency
...
It is perfectly acceptable for a transaction to perform a lookup
on an index twice, and to find that the structure of the index has changed in between,
as long as the index lookup returns the correct set of tuples
...
We outline two techniques for managing concurrent access to B+-trees
...
The techniques that we present for concurrency control on B+-trees are based on
locking, but neither two-phase locking nor the tree protocol is employed
...
The first technique is called the crabbing protocol:
• When searching for a key value, the crabbing protocol first locks the root node
in shared mode
...
After acquiring the lock on the child
node, it releases the lock on the parent node
...
• When inserting or deleting a key value, the crabbing protocol takes these actions:
It follows the same protocol as for searching until it reaches the desired
leaf node
...
It locks the leaf node in exclusive mode and inserts or deletes the key
value
...
After performing these actions, it releases the
locks on the node and siblings
...
Otherwise, it releases the lock on the parent
...
The progress of locking while the protocol both goes down the tree and goes back up
(in case of splits, coalescing, or redistribution) proceeds in a similar crab-like manner
...
There is a possibility of deadlocks between search operations coming
down the tree, and splits, coalescing or redistribution propagating up the tree
...
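The search half of the crabbing protocol can be sketched as follows. This is a simplified illustration under stated assumptions: the Node class, the node names, and the trace list are hypothetical, and a plain mutex (threading.Lock) stands in for a shared-mode lock because the Python standard library has no reader-writer lock. Insertion, deletion, splits, and coalescing are not modeled.

```python
import threading


class Node:
    def __init__(self, name, keys, children=None):
        self.name = name                  # label used only for the trace
        self.keys = keys                  # sorted search-key values
        self.children = children or []    # empty list for a leaf node
        self.lock = threading.Lock()      # stands in for a shared-mode lock


def crab_search(root, key, trace):
    """Descend to the leaf for `key`, holding at most two locks at a time."""
    node = root
    node.lock.acquire()
    trace.append(("lock", node.name))
    while node.children:
        # Child i holds keys below keys[i]; the last child holds the rest.
        index = sum(1 for k in node.keys if key >= k)
        child = node.children[index]
        child.lock.acquire()              # lock the child first ...
        trace.append(("lock", child.name))
        node.lock.release()               # ... then release the parent
        trace.append(("unlock", node.name))
        node = child
    found = key in node.keys
    node.lock.release()
    trace.append(("unlock", node.name))
    return found


leaves = [
    Node("leaf0", ["Brighton", "Clearview"]),
    Node("leaf1", ["Downtown", "Mianus"]),
    Node("leaf2", ["Perryridge", "Round Hill"]),
]
root = Node("root", ["Downtown", "Perryridge"], leaves)
trace = []
found = crab_search(root, "Clearview", trace)
```

The trace shows the crab-like movement: the child is locked before the parent is released, so the search never stands on an unlocked node.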
The second technique achieves even more concurrency: it avoids holding the
lock on one node while acquiring the lock on another node, by using a modified version of B+-trees called B-link trees. B-link trees require that every node (including internal nodes, not just the leaves) maintain a pointer to its right sibling
...
We shall illustrate
this technique with an example later, but we first present the modified procedures of
the B-link-tree locking protocol
...
Each node of the B+-tree must be locked in shared mode before it is
accessed
...
If a split occurs concurrently with a lookup,
the desired search-key value may no longer appear within the range of values
represented by a node accessed during lookup
...
However, the system locks leaf
nodes following the two-phase locking protocol, as Section 16
...
3 describes,
to avoid the phantom phenomenon
...
The system follows the rules for lookup to locate the
leaf node into which it will make the insertion or deletion
...
or deletion
...
7
...
• Split
...
3 and makes it the right sibling of the original node
...
Following this, the transaction releases the exclusive lock on the original node
and requests an exclusive lock on the parent, so that it can insert a pointer to
the new node
...
If a node has too few search-key values after a deletion, the node
with which it will be coalesced must be locked in exclusive mode
...
At this point, the transaction
releases the locks on the coalesced nodes
...
Observe this important fact: An insertion or deletion may lock a node, unlock it, and
subsequently relock it
...
As an illustration, consider the B+-tree in Figure 16
...
Assume that there are two
concurrent operations on this B+-tree:
1
...
Look up “Downtown”
Let us assume that the insertion operation begins first
...
It therefore converts its shared lock on the node to exclusive mode, and creates a
new node
...
” The new node contains the search-key value “Downtown
...
This lookup operation accesses the root, and follows the pointer
[B+-tree diagram; surviving node labels: Perryridge, Downtown, Brighton, Clearview, Downtown, Round Hill]
Figure 16
...
[B+-tree diagram; surviving node labels: Perryridge, Downtown, Brighton, Clearview]
Figure 16
...
21
...
It then accesses that node, and obtains a pointer to the
left child
...
” Since this node is currently locked by the insertion operation in
exclusive mode, the lookup operation must wait
...
It completes the insertion, leaving the B+-tree as in Figure 16
...
The lookup operation proceeds
...
It therefore follows the right-sibling pointer to locate the next node
...
It
can be shown that, if a lookup holds a pointer to an incorrect node, then, by following
right-sibling pointers, the lookup must eventually reach the correct node
...
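How a lookup recovers from a stale pointer by following right-sibling pointers can be sketched in Python. This is a minimal sketch under stated assumptions: the class BLinkNode and the function find_leaf are hypothetical names, and each node is assumed to carry a high_key field giving the upper bound of the range it represents, which is one common way to let a lookup notice that the key it wants has moved to the right. Locking is omitted.

```python
class BLinkNode:
    def __init__(self, keys, high_key=None, right=None):
        self.keys = keys          # search-key values stored in this node
        self.high_key = high_key  # assumed upper bound of the node's range
                                  # (None means the range is unbounded)
        self.right = right        # pointer to the right sibling


def find_leaf(node, key):
    """Follow right-sibling pointers until `key` falls in the node's range."""
    hops = 0
    while node.high_key is not None and key >= node.high_key:
        node = node.right
        hops += 1
    return node, hops


# State after a split: the new node holds "Downtown" and is the right
# sibling of the original node. A lookup that obtained its pointer before
# the split still arrives at the original node.
new_node = BLinkNode(["Downtown"])
original = BLinkNode(["Brighton", "Clearview"], high_key="Downtown", right=new_node)

leaf, hops = find_leaf(original, "Downtown")
print(leaf.keys, hops)  # ['Downtown'] 1
```

A lookup for "Brighton" stays on the original node, since that key is still within the node's range.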
Coalescing of nodes
during deletion can cause inconsistencies, since a lookup may have read a pointer
to a deleted node from its parent, before the parent node was updated, and may
then try to access the deleted node
...
Leaving nodes uncoalesced avoids such inconsistencies
...
In most databases, however, insertions are more frequent than deletions, so
it is likely that nodes that have too few search-key values will gain additional values
relatively quickly
...
Key-value locking thus
provides increased concurrency
...
In this technique, every index lookup must lock
not only the keys found within the range (or the single key, in case of a point lookup)
but also the next key value — that is, the key value just greater than the last key value
that was within the range
...
Thus, if a transaction attempts to insert a value
that was within the range of the index lookup of another transaction, the two transactions would conflict on the key value next to the inserted key value
...
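The conflict on the next key value can be illustrated with a short Python sketch. The function names and the integer keys are hypothetical, and the sketch assumes that an insertion locks the key value just greater than the key being inserted; it only computes which key values each operation would lock and does not implement a lock manager. A key beyond the largest key in the index is not handled.

```python
import bisect


def keys_locked_by_range_lookup(index_keys, low, high):
    """Keys in [low, high] plus the next key value just above the range."""
    start = bisect.bisect_left(index_keys, low)
    end = bisect.bisect_right(index_keys, high)
    locked = index_keys[start:end]        # slicing copies, index_keys is untouched
    if end < len(index_keys):
        locked.append(index_keys[end])    # the next key value
    return locked


def key_locked_by_insert(index_keys, new_key):
    """Assumed rule: an insert locks the key value just greater than new_key."""
    pos = bisect.bisect_right(index_keys, new_key)
    return index_keys[pos] if pos < len(index_keys) else None


index_keys = [10, 20, 30, 40, 50]
lookup_locks = keys_locked_by_range_lookup(index_keys, 15, 35)
print(lookup_locks)                                            # [20, 30, 40]
print(key_locked_by_insert(index_keys, 33) in lookup_locks)    # True
print(key_locked_by_insert(index_keys, 45) in lookup_locks)    # False
```

Inserting 33 falls inside the range of the lookup and conflicts on key 40, whereas inserting 45 lies outside the range and locks only key 50, so the two operations can proceed concurrently.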
16
...
10 Summary
• When several transactions execute concurrently in the database, the consistency of data may no longer be preserved
...
• To ensure serializability, we can use various concurrency-control schemes
...
The most common ones are locking protocols, timestamp-ordering schemes, validation techniques, and multiversion schemes
...
• The two-phase locking protocol allows a transaction to lock a new data item
only if that transaction has not yet unlocked any data item
...
In the absence of information
concerning the manner in which data items are accessed, the two-phase locking protocol is both necessary and sufficient for ensuring serializability
...
The rigorous two-phase locking protocol releases
all locks only at the end of the transaction
...
A unique fixed timestamp is
associated with each transaction in the system
...
Thus, if the timestamp of transaction
Ti is smaller than the timestamp of transaction Tj, then the scheme ensures
that the produced schedule is equivalent to a serial schedule in which transaction Ti appears before transaction Tj
...
• A validation scheme is an appropriate concurrency-control method in cases
where a majority of transactions are read-only transactions, and thus the rate
of conflicts among these transactions is low
...
The serializability order is determined by the timestamp of the transaction
...
It must, however, pass a validation test to complete
...
• There are circumstances where it would be advantageous to group several
data items, and to treat them as one aggregate data item for purposes of working, resulting in multiple levels of granularity
...
Such a hierarchy can be represented graphically as a tree
...
The protocol ensures serializability, but not freedom from deadlock
...
When a read
operation is issued, the system selects one of the versions to be read
...
A read operation
always succeeds
...
In multiversion two-phase locking, write operations may result in a lock
wait or, possibly, in deadlock
...
One way to prevent
deadlock is to use an ordering of data items, and to request locks in a sequence
consistent with the ordering
...
To control the preemption, we assign a unique timestamp to each transaction
...
If a transaction is rolled back, it retains its old timestamp when restarted
...
• If deadlocks are not prevented, the system must deal with them by using a
deadlock detection and recovery scheme
...
A system is in a deadlock state if and only if the wait-for graph
contains a cycle
...
It does so by
rolling back one or more transactions to break the deadlock
...
A transaction that inserts a
new tuple into the database is given an exclusive lock on the tuple
...
Such conflict cannot be detected if locking is done only on
tuples accessed by the transactions
...
The index-locking technique solves this problem by requiring locks on certain index buckets
...
• Weak levels of consistency are used in some applications where consistency
of query results is not critical, and using serializability would result in queries
adversely affecting transaction processing
...
SQL:1999 allows queries to specify the level of
consistency that they require
...
• Special concurrency-control techniques can be developed for special data
structures
...
These techniques allow nonserializable access to the B+-tree, but
they ensure that the B+-tree structure is correct, and ensure that accesses to
the database itself are serializable
...
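The summary states that a system is in a deadlock state if and only if the wait-for graph contains a cycle. One way to test for such a cycle is a depth-first search, sketched below in Python. The function name has_deadlock and the dictionary representation of the graph (each transaction mapped to the transactions it is waiting for) are assumptions made for this illustration; the sketch only detects a cycle and does not choose a victim to roll back.

```python
def has_deadlock(wait_for):
    """Return True if the wait-for graph contains a cycle.

    wait_for maps each transaction to the transactions it is waiting for.
    """
    WHITE, GRAY, BLACK = 0, 1, 2   # unvisited, on the current path, finished
    color = {}

    def visit(txn):
        color[txn] = GRAY
        for other in wait_for.get(txn, ()):
            state = color.get(other, WHITE)
            if state == GRAY:
                return True        # edge back to the current path: a cycle
            if state == WHITE and visit(other):
                return True
        color[txn] = BLACK
        return False

    return any(color.get(t, WHITE) == WHITE and visit(t) for t in list(wait_for))


print(has_deadlock({"T1": ["T2"], "T2": ["T3"], "T3": ["T1"]}))  # True
print(has_deadlock({"T1": ["T2"], "T2": ["T3"]}))                # False
```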
16
...
1 Show that the two-phase locking protocol ensures conflict serializability, and
that transactions can be serialized according to their lock points
...
2 Consider the following two transactions:
T31 : read(A);
read(B);
if A = 0 then B := B + 1;
write(B)
...
Add lock and unlock instructions to transactions T31 and T32, so that they observe the two-phase locking protocol
...
3 What benefit does strict two-phase locking provide? What disadvantages result?
16
...
5 Most implementations of database systems use strict two-phase locking
...
16
...
Suppose that we
insert a dummy vertex between each pair of vertices
...
16
...
16
...
• Each transaction must follow the rules of the tree protocol
...
Show that the protocol ensures serializability and deadlock freedom
...
9 Consider the following graph-based locking protocol, which allows only exclusive lock modes, and which operates on data graphs that are in the form of
a rooted directed acyclic graph
...
• To lock any other vertex, the transaction must be holding a lock on the
majority of the parents of that vertex
...
16
...
• A transaction can lock any vertex first
...
Show that the protocol ensures serializability and deadlock freedom
...
11 Consider a variant of the tree protocol called the forest protocol
...
Each transaction Ti must follow the
following rules:
• The first lock in each tree may be on any data item
...
• Data items may be unlocked at any time
...
Show that the forest protocol does not ensure serializability
...
12 Locking is not done explicitly in persistent programming languages
...
Most modern operating systems allow the user to set access protections (no access, read, write) on pages, and memory accesses that violate the
access protections result in a protection violation (see the Unix mprotect system call, for example)
...
16
...
23
        S       X       I
S       true    false   false
X       false   false   false
I       false   false   true
Lock-compatibility matrix
...
(Hint: The
technique is similar to that used for hardware swizzling in Section 11
...
4)
...
13 Consider a database system that includes an atomic increment operation, in
addition to the read and write operations
...
The operation
increment(X) by C
sets the value of X to V + C in an atomic step
...
Figure 16
...
a
...
b
...
(Hint: Consider check-clearing transactions in our bank example
...
14 In timestamp ordering, W-timestamp(Q) denotes the largest timestamp of any
transaction that executed write(Q) successfully
...
Would this change in wording make any difference? Explain your
answer
...
15 When a transaction is rolled back under timestamp ordering, it is assigned a
new timestamp
...
16 In multiple-granularity locking, what is the difference between implicit and
explicit locking?
16
...
Why is it useless?
16
...
Provide examples of both situations, and compare the relative amount of concurrency allowed
...
19 Consider the validation-based concurrency-control scheme of Section 16
...
Show that by choosing Validation(Ti), rather than Start(Ti), as the timestamp of
transaction Ti, we can expect better response time provided that conflict rates
among transactions are indeed low
...
16
...
20 Show that there are schedules that are possible under the two-phase locking
protocol, but are not possible under the timestamp protocol, and vice versa
...
21 For each of the following protocols, describe aspects of practical applications
that would lead you to suggest using the protocol, and aspects that would
suggest not using the protocol:
•
•
•
•
•
•
•
Two-phase locking
Two-phase locking with multiple-granularity locking
The tree protocol
Timestamp ordering
Validation
Multiversion timestamp ordering
Multiversion two-phase locking
16
...
Explain how the commit bit can prevent cascading abort
...
23 Explain why the following technique for transaction execution may provide
better performance than just using strict two-phase locking: First execute the
transaction without acquiring any locks and without performing any writes to
the database as in the validation based techniques, but unlike in the validation
techniques do not perform either validation or perform writes on the database
...
(Hint: Consider
waits for disk I/O
...
24 Under what conditions is it less expensive to avoid deadlock than to allow
deadlocks to occur and then to detect them?
16
...
16
...
Give a schedule whereby the timestamp test for a write operation fails and
causes the first transaction to be restarted, in turn causing a cascading abort of
the other transaction
...
(Such a situation, where two or more processes carry out actions, but are
unable to complete their task because of interaction with the other processes,
is called a livelock
...
27 Explain the phantom phenomenon
...
28 Devise a timestamp-based protocol that avoids the phantom phenomenon
...
29 Explain the reason for the use of degree-two consistency
...
16
...
30 Suppose that we use the tree protocol of Section 16
...
5 to manage concurrent
access to a B+-tree
...
Under what circumstances is it possible to release a lock
earlier?
16
...
Bibliographical Notes
Gray and Reuter [1993] provides detailed textbook coverage of transaction-processing
concepts, including concurrency control concepts and implementation details
...
Early textbook discussions of concurrency control and recovery included Papadimitriou [1986] and Bernstein et al
...
An early survey paper on implementation
issues in concurrency control and recovery is presented by Gray [1978]
...
[1976]
...
Other non-two-phase locking protocols that operate on more general graphs are described in Yannakakis et al
...
General discussions concerning locking protocols are offered by Lien and Weinberger
[1978], Yannakakis et al
...
Korth
[1983] explores various lock modes that can be obtained from the basic shared and
exclusive lock modes
...
6 is from Buckley and Silberschatz [1984]
...
8 is from Kedem
and Silberschatz [1983]
...
9 is from Kedem and Silberschatz [1979]
...
10 is from Yannakakis et al
...
Exercise 16
...
The timestamp-based concurrency-control scheme is from Reed [1983]
...
A timestamp algorithm that does not require any
rollback to ensure serializability is presented by Buckley and Silberschatz [1983]
...
The locking protocol for multiple-granularity data items is from Gray et al
...
A detailed description is presented by Gray et al
...
The effects of locking granularity are discussed by Ries and Stonebraker [1977]
...
This approach includes a class of lock modes
called update modes to deal with lock conversion
...
An extension of the protocol to ensure deadlock freedom is presented by Korth [1982]
...
Discussions concerning multiversion concurrency control are offered by Bernstein
et al
...
A multiversion tree-locking algorithm appears in Silberschatz [1982]
...
16
...
Lai
and Wilkinson [1984] describes a multiversion two-phase locking certifier
...
Holt [1971] and Holt [1972] were the first to formalize the notion of deadlocks in terms of a graph model similar to the one presented in this chapter
...
[1981a]
...
[1981] and Yannakakis [1981]
...
[1990]
...
[1975]
...
[1995]
...
The techniques presented in Section 16
...
The technique of key-value locking used
in ARIES provides for very high concurrency on B+-tree access, and is described in
Mohan [1990a] and Mohan and Levine [1992]
...
Ellis [1987] presents a concurrency-control technique for
linear hashing
...
Concurrency-control algorithms for other index structures appear in Ellis [1980a] and
Ellis [1980b]
...
CHAPTER 17
...
In any failure, information may be lost
...
An integral part of a database
system is a recovery scheme that can restore the database to the consistent state that
existed before the failure
...
17
...
The simplest type of failure is one that does not
result in the loss of information in the system
...
In this chapter, we shall consider
only the following types of failure:
• Transaction failure
...
The transaction can no longer continue with its normal execution because of some internal condition, such as bad input, data not
found, overflow, or resource limit exceeded
...
The system has entered an undesirable state (for example,
deadlock), as a result of which a transaction cannot continue with its normal execution
...
• System crash
...
17
...
The content of nonvolatile
storage remains intact, and is not corrupted
...
Well-designed systems have numerous internal
checks, at the hardware and the software level, that bring the system to a halt
when there is an error
...
• Disk failure
...
Copies of the data on other disks,
or archival backups on tertiary media, such as tapes, are used to recover from
the failure
...
Next, we must consider
how these failure modes affect the contents of the database
...
These algorithms, known as recovery algorithms, have two parts:
1
...
2
...
17
...
To understand how to ensure the
atomicity and durability properties of a transaction, we must gain a better understanding of these storage media and their access methods
...
2
...
We review these terms, and introduce another class of storage, called stable storage
...
Information residing in volatile storage does not usually survive system crashes
...
Access to volatile storage is extremely fast, both because of the speed
of the memory access itself, and because it is possible to access any data item
in volatile storage directly
...
Information residing in nonvolatile storage survives system crashes
...
Disks are
used for online storage, whereas tapes are used for archival storage
...
however, are subject to failure (for example, head crash), which may result
in loss of information
...
This is
because disk and tape devices are electromechanical, rather than based entirely on chips, as is volatile storage
...
Other nonvolatile media are normally used only for
backup data
...
1), though nonvolatile, has insufficient capacity for most database systems
...
Information residing in stable storage is never lost (never should
be taken with a grain of salt, since theoretically never cannot be guaranteed—
for example, it is possible, although extremely unlikely, that a black hole may
envelop the earth and permanently destroy all data!)
...
Section 17
...
2 discusses
stable-storage implementation
...
Certain systems provide battery backup, so that some main
memory can survive system crashes and power failures
...
17
...
2 Stable-Storage Implementation
To implement stable storage, we need to replicate the needed information in several nonvolatile storage media (usually disk) with independent failure modes, and
to update the information in a controlled manner to ensure that failure during data
transfer does not damage the needed information
...
The simplest and fastest
form of RAID is the mirrored disk, which keeps two copies of each block, on separate
disks
...
RAID systems, however, cannot guard against data loss due to disasters such as
fires or flooding
...
However, since tapes cannot be carried off-site continually,
updates since the most recent time that tapes were carried off-site could be lost in
such a disaster
...
Since the blocks are output to a remote system as and when
they are output to local storage, once an output operation is complete, the output is
not lost, even in the event of a disaster such as a fire or flood
...
10
...
Block transfer between memory and disk storage
can result in
• Successful completion
...
• Partial failure
...
• Total failure
...
We require that, if a data-transfer failure occurs, the system detects it and invokes
a recovery procedure to restore the block to a consistent state
...
An output operation
is executed as follows:
1
...
2
...
3
...
During recovery, the system examines each pair of physical blocks
...
(Recall that
errors in a disk block, such as a partial write to the block, are detected by storing a
checksum with each block
...
If both blocks contain no detectable
error, but they differ in content, then the system replaces the content of the first block
with the value of the second
...
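The comparison of a pair of physical blocks during recovery can be sketched in Python. This is a hedged illustration: the function names and the dictionary layout of a block are hypothetical, CRC32 is used only as an example checksum, and the handling of a block with a detectable error (copy the intact block over it) is an assumption of this sketch. A pair in which both copies are damaged is left unchanged here.

```python
import zlib


def make_block(data):
    """Store a CRC32 checksum with the block contents (data is bytes)."""
    return {"data": data, "checksum": zlib.crc32(data)}


def is_intact(block):
    """A block has no detectable error if its checksum matches its data."""
    return zlib.crc32(block["data"]) == block["checksum"]


def recover_pair(first, second):
    """Return the repaired (first, second) pair of physical blocks."""
    if is_intact(first) and is_intact(second):
        if first["data"] != second["data"]:
            first = dict(second)      # both fine but different: second wins
    elif is_intact(first):
        second = dict(first)          # assumed rule: error in second, copy first
    elif is_intact(second):
        first = dict(second)          # assumed rule: error in first, copy second
    return first, second


old = make_block(b"old")
new = make_block(b"new")
# Simulated partial write: the data was cut short, so the checksum no longer matches.
torn = {"data": b"ne", "checksum": zlib.crc32(b"new")}

first, second = recover_pair(new, old)
print(first["data"], second["data"])  # b'old' b'old'
```

In every case the two copies agree after recovery, which is the property that lets the pair behave as a single block of stable storage.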
The requirement of comparing every corresponding pair of blocks during recovery
is expensive to meet
...
On recovery, only
blocks for which writes were in progress need to be compared
...
4
...
Although a larger number of copies reduces
the probability of a failure even further than two copies do, it is usually reasonable
to simulate stable storage with only two copies
...
2
...
Blocks are the units of data transfer to and from disk, and may contain several data
...
17
...
1
Block storage operations
...
We shall assume that no data item spans two or more blocks
...
Transactions input information from the disk to main memory, and then output the
information back onto the disk
...
The blocks residing on the disk are referred to as physical blocks; the blocks
residing temporarily in main memory are referred to as buffer blocks
...
Block movements between disk and main memory are initiated through the following two operations:
1
...
2
...
Figure 17
...
Each transaction Ti has a private work area in which copies of all the data items
accessed and updated by Ti are kept
...
Each data item X kept in the work area of transaction Ti is denoted by xi
...
We transfer data by these two operations:
1
...
It executes
this operation as follows:
a
...
b
...
2
...
It executes this operation as follows:
a
...
b
...
Note that both operations may require the transfer of a block from disk to main memory
...
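The interplay of read(X), write(X), input(B), and output(B) can be sketched in Python. This is a simplified model under stated assumptions: the class BufferedStore, its attribute names, and the block identifiers BA and BB are hypothetical, blocks are modeled as small dictionaries, and the buffer never evicts a block. The sketch assumes that read and write each bring the block into main memory first if it is not already there.

```python
class BufferedStore:
    def __init__(self, disk, block_of):
        self.disk = disk            # block id -> {item: value}, physical blocks
        self.block_of = block_of    # item -> id of the block on which it resides
        self.buffer = {}            # block id -> buffer block in main memory
        self.inputs = 0             # number of input(B) operations issued

    def input(self, block_id):
        """input(B): transfer physical block B to main memory."""
        self.buffer[block_id] = dict(self.disk[block_id])
        self.inputs += 1

    def output(self, block_id):
        """output(B): transfer buffer block B to the disk."""
        self.disk[block_id] = dict(self.buffer[block_id])

    def read(self, work_area, item):
        """read(X): copy X from its buffer block into the private work area."""
        block_id = self.block_of[item]
        if block_id not in self.buffer:
            self.input(block_id)
        work_area[item] = self.buffer[block_id][item]

    def write(self, work_area, item):
        """write(X): copy the work-area value of X into its buffer block."""
        block_id = self.block_of[item]
        if block_id not in self.buffer:
            self.input(block_id)
        self.buffer[block_id][item] = work_area[item]


disk = {"BA": {"A": 1000}, "BB": {"B": 2000}}
store = BufferedStore(disk, {"A": "BA", "B": "BB"})
work = {}
store.read(work, "A")
work["A"] -= 50
store.write(work, "A")
value_on_disk_before_output = disk["BA"]["A"]   # still 1000
store.output("BA")
value_on_disk_after_output = disk["BA"]["A"]    # now 950
```

The write changes only the buffer block; the value on disk changes when output is executed, which may happen later.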
A buffer block is eventually written out to the disk either because the buffer manager needs the memory space for other purposes or because the database system
wishes to reflect the change to B on the disk
...
When a transaction needs to access a data item X for the first time, it must execute
read(X)
...
After the transaction accesses X for the final time, it must execute write(X) to reflect the change to X in the
database itself
...
Thus, the actual output may
take place later
...
17
...
Suppose that a system crash has occurred during the execution of Ti,
after output(BA) has taken place, but before output(BB) was executed, where BA and
BB denote the buffer blocks on which A and B reside
...
This procedure will result in the value of A becoming $900,
rather than $950
...
• Do not reexecute Ti
...
Thus, the system enters an inconsistent state
...
The reason for this difficulty is that we have modified
the database without having assurance that the transaction will indeed commit
...
However, if
Ti performed multiple database modifications, several output operations may be required, and a failure may occur after some of these modifications have been made,
but before all of them are made
...
As we shall
see, this procedure will allow us to output all the modifications made by a committed transaction, despite failures
...
4 and 17
...
In these two sections, we shall assume that
...
We shall describe how to handle concurrently executing transactions later, in
Section 17
...
17
...
The
log is a sequence of log records, recording all the update activities in the database
...
An update log record describes a single database write
...
• Data-item identifier is the unique identifier of the data item written
...
• Old value is the value of the data item prior to the write
...
Other special log records exist to record significant events during transaction processing, such as the start of a transaction and the commit or abort of a transaction
...
Transaction Ti has started
...
Transaction Ti has performed a write on data item Xj
...
•
...
•
...
Whenever a transaction performs a write, it is essential that the log record for that
write be created before the database is modified
...
Also, we have the ability
to undo a modification that has already been output to the database
...
For log records to be useful for recovery from system and disk failures, the log
must reside in stable storage
...
In Section 17
...
In Sections 17
...
1 and 17
...
2, we shall introduce two techniques for using the
log to ensure transaction atomicity despite failures
...
As a result, the volume of data stored in the
log may become unreasonably large
...
4
...
17
...
1 Deferred Database Modification
The deferred-modification technique ensures transaction atomicity by recording all
database modifications in the log, but deferring the execution of all write operations
of a transaction until the transaction partially commits
...
The version of the deferred-modification technique that we describe in this
section assumes that transactions are executed serially
...
If the system crashes before
the transaction completes its execution, or if the transaction aborts, then the information on the log is simply ignored
...
Before Ti starts its execution,
a record
...
Finally, when Ti partially commits, a record
...
Since a failure may occur while this updating is
taking place, we must ensure that, before the start of these updates, all the log records
are written out to stable storage
...
Observe that only the new value of the data item is required by the deferred-modification technique
...
To illustrate, reconsider our simplified banking system
...
Let T1 be a transaction that withdraws $100 from account C:
T1 : read(C);
C := C − 100;
write(C)
...
The portion of the log containing the relevant
information on these two transactions appears in Figure 17
...
There are various orders in which the actual outputs can take place to both the
database system and the log as a result of the execution of T0 and T1
...
Figure 17
...
appears in Figure 17
...
Note that the value of A is changed in the database only after
the record
...
The recovery scheme uses the following recovery procedure:
• redo(Ti) sets the value of all data items updated by transaction Ti to the new
values
...
The redo operation must be idempotent; that is, executing it several times must be
equivalent to executing it once
...
After a failure, the recovery subsystem consults the log to determine which transactions need to be redone
...
Thus, if the system
crashes after the transaction completes its execution, the recovery scheme uses the
information in the log to restore the system to a previous consistent state after the
transaction had completed
...
Figure 17
...
Let us suppose that the
Log
Figure 17
...
(a)
Figure 17
...
3, shown at three different times
...
Assume that the crash
occurs just after the log record for the step
write(B)
of transaction T0 has been written to stable storage
...
4a
...
The values of accounts A and B
remain $1000 and $2000, respectively
...
Now, let us assume the crash comes just after the log record for the step
write(C)
of transaction T1 has been written to stable storage
...
4b
...
After this operation is executed, the values of accounts
A and B are $950 and $2050, respectively
...
As
before, the log records of the incomplete transaction T1 can be deleted from the log
...
The log at the time of this crash is as in Figure 17
...
When
the system comes back up, two commit records are in the log: one for T0 and one
for T1
...
After the system executes these
operations, the values of accounts A, B, and C are $950, $2050, and $600, respectively
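The three crash cases above can be replayed with a minimal Python sketch of redo-only recovery under deferred modification. The tuple layout of the log records and the function name `recover_deferred` are assumptions made for illustration; the account values are the ones used in the banking example.

```python
# Assumed log-record layout for this sketch:
#   ("start", txn), ("write", txn, item, new_value), ("commit", txn)

def recover_deferred(log, db):
    """Redo every transaction that has both a start and a commit record."""
    started = {rec[1] for rec in log if rec[0] == "start"}
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    redo_set = started & committed
    # Forward scan: reapply new values of committed transactions only.
    # Records of incomplete transactions are simply ignored.
    for rec in log:
        if rec[0] == "write" and rec[1] in redo_set:
            _, _txn, item, new_value = rec
            db[item] = new_value
    return db

FULL_LOG = [
    ("start", "T0"),
    ("write", "T0", "A", 950),
    ("write", "T0", "B", 2050),
    ("commit", "T0"),
    ("start", "T1"),
    ("write", "T1", "C", 600),
    ("commit", "T1"),
]
INITIAL = {"A": 1000, "B": 2000, "C": 700}

# Crash just after write(B) of T0, after write(C) of T1, and after T1 commits.
case_a = recover_deferred(FULL_LOG[:3], dict(INITIAL))
case_b = recover_deferred(FULL_LOG[:6], dict(INITIAL))
case_c = recover_deferred(FULL_LOG, dict(INITIAL))
```

Because redo only assigns new values, running `recover_deferred` a second time on its own output changes nothing, which is the idempotence property the text requires.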
...
Some changes may have been made to the database as a
17
...
When the system comes up after the second crash, recovery proceeds exactly as in the preceding
examples
...
In other words,
it restarts the recovery actions from the beginning
...
17
...
2 Immediate Database Modification
The immediate-modification technique allows database modifications to be output
to the database while the transaction is still in the active state
...
In the event
of a crash or a transaction failure, the system must use the old-value field of the
log records described in Section 17
...
The undo operation, described next,
accomplishes this restoration
...
During its execution, any write(X) operation by Ti is preceded by the writing of the appropriate new update record to the log
...
Since the information in the log is used in reconstructing the state of the database,
we cannot allow the actual update to the database to take place before the corresponding log record is written out to stable storage
...
We shall return to this issue in Section 17
...
As an illustration, let us reconsider our simplified banking system, with transactions T0 and T1 executed one after the other in the order T0 followed by T1
...
5
...
6 shows one possible order in which the actual outputs took place in both
the database system and the log as a result of the execution of T0 and T1
...
5
Portion of the system log corresponding to T0 and T1
...
17
...
6
State of system log and database corresponding to T0 and T1
...
4
...
Using the log, the system can handle any failure that does not result in the loss
of information in nonvolatile storage
...
• redo(Ti) sets the value of all data items updated by transaction Ti to the new
values
...
The undo and redo operations must be idempotent to guarantee correct behavior
even if a failure occurs during the recovery process
...
• Transaction Ti needs to be redone if the log contains both the record
and the record
...
Suppose that the system
crashes before the completion of the transactions
...
The
state of the logs for each of these cases appears in Figure 17
...
First, let us assume that the crash occurs just after the log record for the step
write(B)
(a)
Figure 17
...
(c)
The same log, shown at three different times
...
7a)
...
Thus, transaction T0 must be undone, so an undo(T0) is performed
...
Next, let us assume that the crash comes just after the log record for the step
write(C)
of transaction T1 has been written to stable storage (Figure 17
...
When the system
comes back up, two recovery actions need to be taken
...
The operation redo(T0) must be performed, since the log contains both
the record
...
Note that the undo(T1) operation is performed before the redo(T0)
...
However, the order of
doing undo operations first, and then redo operations, is important for the recovery
algorithm that we shall see in Section 17
...
Finally, let us assume that the crash occurs just after the log record
has been written to stable storage (Figure 17
...
When the system comes back up,
both T0 and T1 need to be redone, since the records
appear in the log, as do the records
...
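The undo-then-redo procedure for immediate modification can be sketched in Python as follows. The record layout (with both old and new values) and the name `recover_immediate` are assumptions for illustration, and the "disk" states passed in are hypothetical snapshots of what might have reached the database before each crash.

```python
# Assumed log-record layout for this sketch:
#   ("start", txn), ("update", txn, item, old_value, new_value), ("commit", txn)

def recover_immediate(log, db):
    """Undo transactions without a commit record, then redo committed ones."""
    started = [rec[1] for rec in log if rec[0] == "start"]
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    undo_set = {t for t in started if t not in committed}
    # Undo pass: scan backward, restoring old values.
    for rec in reversed(log):
        if rec[0] == "update" and rec[1] in undo_set:
            db[rec[2]] = rec[3]
    # Redo pass: scan forward, reapplying new values.
    for rec in log:
        if rec[0] == "update" and rec[1] in committed:
            db[rec[2]] = rec[4]
    return db

LOG = [
    ("start", "T0"),
    ("update", "T0", "A", 1000, 950),
    ("update", "T0", "B", 2000, 2050),
    ("commit", "T0"),
    ("start", "T1"),
    ("update", "T1", "C", 700, 600),
    ("commit", "T1"),
]

# (a) crash after write(B) of T0; both updates happened to reach the disk
res_a = recover_immediate(LOG[:3], {"A": 950, "B": 2050, "C": 700})
# (b) crash after write(C) of T1
res_b = recover_immediate(LOG[:6], {"A": 950, "B": 2050, "C": 600})
# (c) crash after T1 commits; only the update to A happened to reach the disk
res_c = recover_immediate(LOG, {"A": 950, "B": 2000, "C": 700})
```

Both passes only assign values taken from the log, so repeating the whole procedure after a crash during recovery gives the same final state.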
17
...
3 Checkpoints
When a system failure occurs, we must consult the log to determine those transactions that need to be redone and those that need to be undone
...
There are two major difficulties
with this approach:
1
...
2
...
Although redoing them
will cause no harm, it will nevertheless cause recovery to take longer
...
During execution, the
system maintains the log, using one of the two techniques described in Sections 17
...
1
and 17
...
2
...
Output onto stable storage all log records currently residing in main memory
...
Output to the disk all modified buffer blocks
...
Output onto stable storage a log record
...
The presence of a
its recovery procedure
...
For such a transaction, the
...
Thus, at recovery time, there is no need to perform a redo operation on Ti
...
(We continue
to assume that transactions are run serially
...
It can find such a transaction by searching the log backward, from the end of the log, until it finds the first
the next
...
Once the system has identified transaction Ti, the redo and undo operations need
to be applied to only transaction Ti and all transactions Tj that started executing
after transaction Ti
...
The remainder
(earlier part) of the log can be ignored, and can be erased whenever desired
...
For the immediate-modification technique, the recovery operations are:
• For all transactions Tk in T that have no
...
Obviously, the undo operation does not need to be applied when the deferred-modification technique is being employed
...
As an illustration, consider the set of transactions {T0 , T1 ,
...
Suppose that the most recent checkpoint took place during
the execution of transaction T67
...
, T100 need to be
considered during the recovery scheme
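A small Python sketch of how a checkpoint narrows the set of transactions to examine, under the serial-execution assumption of this section. The record layout and the name `plan_recovery` are hypothetical; the backward search mirrors the description above (find the most recent checkpoint, then the nearest start record before it).

```python
# Assumed log-record layout for this sketch:
#   ("start", txn), ("update", txn, item, old, new), ("commit", txn), ("checkpoint",)

def plan_recovery(log):
    """Return (redo, undo) lists for serial execution with checkpoints."""
    # Position of the most recent checkpoint record (-1 if there is none).
    cp = max((i for i, rec in enumerate(log) if rec[0] == "checkpoint"), default=-1)
    # Continue backward to the nearest start record before that checkpoint.
    first = 0
    for i in range(cp - 1, -1, -1):
        if log[i][0] == "start":
            first = i
            break
    # Only this transaction and those that started after it are considered.
    tail = log[first:]
    relevant = [rec[1] for rec in tail if rec[0] == "start"]
    committed = {rec[1] for rec in tail if rec[0] == "commit"}
    redo = [t for t in relevant if t in committed]
    undo = [t for t in relevant if t not in committed]
    return redo, undo

example_log = [
    ("start", "T0"), ("commit", "T0"),
    ("start", "T1"), ("update", "T1", "A", 1, 2),
    ("checkpoint",),
    ("commit", "T1"),
    ("start", "T2"), ("update", "T2", "B", 5, 6),
]
redo_list, undo_list = plan_recovery(example_log)
```

Here T0 finished before the checkpoint and is never examined; with deferred modification the `undo` list would simply be ignored.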
...
In Section 17
...
3, we consider an extension of the checkpoint technique for concurrent transaction processing
...
5 Shadow Paging
An alternative to log-based crash-recovery techniques is shadow paging
...
3
...
There are, however, disadvantages to the shadow-paging approach, as we shall see,
that limit its use
...
As before, the database is partitioned into some number of fixed-length blocks,
which are referred to as pages
...
Assume that there are
n pages, numbered 1 through n
...
)
These pages do not need to be stored in any particular order on disk (there are many
reasons why they do not, as we saw in Chapter 11)
...
We use a page table, as in Figure 17
...
The page table has n entries—one for each database page
...
The first entry contains a pointer to the
first page of the database, the second entry points to the second page, and so on
...
8 shows that the logical order of database pages does not need
to correspond to the physical order in which the pages are placed on disk
...
When the transaction starts, both page tables are identical
...
The current page table may be
changed when a transaction performs a write operation
...
Suppose that the transaction Tj performs a write(X) operation, and that X resides
on the ith page
...
If the ith page (that is, the page on which X resides) is not already in main
memory, then the system issues input(X)
...
If this is the first write performed on the ith page by this transaction, then the
system modifies the current page table as follows:
a
...
Usually, the database system has access
to a list of unused (free) pages, as we saw in Chapter 11
...
[Figure: sample page table, with entries 1 through n]
...
It deletes the page found in step 2a from the list of free page frames; it
copies the contents of the ith page to the page found in step 2a
...
It modifies the current page table so that the ith entry points to the page
found in step 2a
...
It assigns the value of xj to X in the buffer page
...
2
...
Steps 1 and 3 here correspond
to steps 1 and 2 in Section 17
...
3
...
[Figure: shadow and current page tables, each with entries 1 through 10, pointing to pages on disk]
...
9 shows the shadow and current page tables for a transaction
performing a write to the fourth page of a database consisting of 10 pages
...
When
the transaction commits, the system writes the current page table to nonvolatile storage
...
It is important that the shadow page table
be stored in nonvolatile storage, since it provides the only means of locating database
pages
...
We do
not care whether the current page table is lost in a crash, since the system recovers by
using the shadow page table
...
A simple way of finding it is to choose one fixed location in stable storage that
contains the disk address of the shadow page table
...
17
...
Because of our definition of the write operation,
we are guaranteed that the shadow page table will point to the database pages corresponding to the state of the database prior to any transaction that was active at the
time of the crash
...
Unlike our log-based schemes, shadow
paging needs to invoke no undo operations
...
Ensure that all buffer pages in main memory that have been changed by the
transaction are output to disk
...
)
2
...
Note that we must not overwrite the
shadow page table, since we may need it for recovery from a crash
...
Output the disk address of the current page table to the fixed location in stable storage containing the address of the shadow page table
...
Therefore, the current page
table has become the shadow page table, and the transaction is committed
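The write and commit steps of shadow paging can be illustrated with a minimal in-memory Python model. Everything here is a simplification made for the sketch: the class name `ShadowPagedDB`, the dictionary standing in for the disk, the counter standing in for the free-page list, and the assignment in `commit` standing in for the output of the page-table address to the fixed location in stable storage. Garbage collection of old pages is not modeled.

```python
class ShadowPagedDB:
    """Toy model: 'disk' maps disk addresses to page contents."""

    def __init__(self, contents):
        self.disk = {addr: value for addr, value in enumerate(contents)}
        # Entry i of a page table holds the disk address of logical page i.
        self.shadow_table = list(range(len(contents)))
        self.current_table = list(self.shadow_table)
        self.next_free = len(contents)   # stand-in for the free-page list
        self.copied = set()              # pages already copied by this transaction

    def read(self, i):
        return self.disk[self.current_table[i]]

    def write(self, i, value):
        if i not in self.copied:
            # First write to page i: take a free page, copy the old contents,
            # and repoint only the current page table.
            new_addr = self.next_free
            self.next_free += 1
            self.disk[new_addr] = self.disk[self.current_table[i]]
            self.current_table[i] = new_addr
            self.copied.add(i)
        self.disk[self.current_table[i]] = value

    def commit(self):
        # The current page table becomes the shadow page table.
        self.shadow_table = list(self.current_table)
        self.copied.clear()

    def crash_and_recover(self):
        # The current table is lost; the shadow table still locates the old pages.
        self.current_table = list(self.shadow_table)
        self.copied.clear()

db = ShadowPagedDB(["p0", "p1", "p2", "p3"])
db.write(3, "new")
untouched = db.disk[db.shadow_table[3]]   # old page is still intact
db.crash_and_recover()
after_crash = db.read(3)
db.write(3, "new2")
db.commit()
db.crash_and_recover()
after_commit = db.read(3)
```

No undo or redo work appears anywhere in `crash_and_recover`, which matches the observation that recovery under shadow paging needs neither.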
...
If the crash occurs after the completion of step 3, the
effects of the transaction will be preserved; no redo operations need to be invoked
...
The overhead of log-record output is eliminated, and recovery from crashes is significantly
faster (since no undo or redo operations are needed)
...
The commit of a single transaction using shadow paging
requires multiple blocks to be output—the actual data blocks, the current page
table, and the disk address of the current page table
...
The overhead of writing an entire page table can be reduced by implementing the page table as a tree structure, with page table entries at the leaves
...
The
nodes of the tree are pages and have a high fanout, like B+-trees
...
When a
page is to be updated for the first time, the system changes the entry in the current page table to point to the copy of the page
...
Otherwise, the
system first copies it, and updates the copy
...
The process of copying proceeds up to the root of the tree
...
17
...
All the other parts of the tree are shared between the shadow and the current
page table, and do not need to be copied
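The sharing described here can be sketched in Python with a two-level page table in which an update copies only the root and the one affected leaf. The fanout of 4, the list-of-lists representation, and the names `build` and `cow_update` are assumptions chosen to keep the example small.

```python
FANOUT = 4  # assumed small fanout for illustration; real nodes would be disk pages

def build(entries):
    """Two-level page table: the root is a list of leaf nodes."""
    return [entries[i:i + FANOUT] for i in range(0, len(entries), FANOUT)]

def cow_update(root, page_no, new_addr):
    """Return a new root in which only the path to page_no has been copied."""
    leaf_idx, slot = divmod(page_no, FANOUT)
    new_leaf = list(root[leaf_idx])      # copy the affected leaf
    new_leaf[slot] = new_addr
    new_root = list(root)                # copy the root (shallow copy)
    new_root[leaf_idx] = new_leaf
    return new_root

shadow = build(list(range(12)))          # pages 0..11 at disk addresses 0..11
current = cow_update(shadow, 5, 99)      # page 5 now lives at address 99
```

After the update, `current[0]` and `current[2]` are the very same leaf objects as in `shadow`; only the root and leaf 1 were duplicated.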
...
However, several pages of the page table
still need to be copied for each transaction, and the log-based schemes continue
to be superior as long as most transactions update only small parts of the
database
...
In Chapter 11, we considered strategies to ensure locality
— that is, to keep related database pages close physically on the disk
...
Shadow paging causes database pages to
change location when they are updated
...
(See the bibliographical notes for
references
...
Each time that a transaction commits, the database pages
containing the old version of data changed by the transaction become inaccessible
...
9, the page pointed to by the fourth entry of the shadow
page table will become inaccessible once the transaction of that example commits
...
Garbage may be created also as a side
effect of crashes
...
This process, called garbage collection,
imposes additional overhead and complexity on the system
...
(See the bibliographical notes for
references
...
In such systems, some logging is usually required, even if shadow
paging is used
...
4
...
It is relatively
easy to extend the log-based recovery schemes to allow concurrent transactions, as
we shall see in Section 17
...
For these reasons, shadow paging is not widely used
...
6 Recovery with Concurrent Transactions
Until now, we considered recovery in an environment where only a single transaction at a time is executing
...
Regardless
of the number of concurrent transactions, the system has a single disk buffer and a
single log
...
We allow immediate modification,
and permit a buffer block to have data items updated by one or more transactions
...
17
...
6
...
To roll back a failed transaction, we must undo the updates performed by the
transaction
...
Using the log-based schemes
for recovery, we restore the value by using the undo information in a log record
...
Then, the update performed by T1 will be lost if T0 is rolled back
...
We can ensure this requirement easily by using strict two-phase locking—that
is, two-phase locking with exclusive locks held until the end of the transaction
...
6
...
The system scans the log backward; for every log record of the form
restores the data item Xj to its old value V1
...
Scanning the log backward is important, since a transaction may have updated a
data item more than once
...
Scanning the log backward sets A correctly to 10
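A short Python sketch shows why the scan direction matters. The two update records used here, taking A from 10 to 20 and then from 20 to 30, are an assumed illustration consistent with the final value 10 mentioned above; the record layout and function names are hypothetical.

```python
# Assumed log-record layout: ("update", txn, item, old_value, new_value)

def rollback_backward(log, txn, db):
    """Correct rollback: restore old values from the newest record to the oldest."""
    for rec in reversed(log):
        if rec[0] == "update" and rec[1] == txn:
            db[rec[2]] = rec[3]
    return db

def rollback_forward(log, txn, db):
    """Incorrect variant, shown only for contrast: scans oldest to newest."""
    for rec in log:
        if rec[0] == "update" and rec[1] == txn:
            db[rec[2]] = rec[3]
    return db

log = [
    ("update", "Ti", "A", 10, 20),
    ("update", "Ti", "A", 20, 30),
]
good = rollback_backward(log, "Ti", {"A": 30})
bad = rollback_forward(log, "Ti", {"A": 30})
```

The backward scan ends on the oldest record and leaves A at 10, whereas the forward scan ends on the newest record and leaves A at 20.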
...
If strict two-phase locking is used for concurrency control, locks held by a transaction T may be released only after the transaction has been rolled back as described
...
6
...
Therefore, restoring the old value of the
data item will not erase the effects of any other transaction
...
6
...
4
...
Since we assumed no concurrency,
it was necessary to consider only the following transactions during recovery:
• Those transactions that started after the most recent checkpoint
• The one transaction, if any, that was active at the time of the most recent checkpoint
The situation is more complex when transactions can execute concurrently, since several transactions may have been active at the time of the most recent checkpoint
...
In a concurrent transaction-processing system, we require that the checkpoint log
record be of the form
time of the checkpoint
...
The requirement that transactions must not perform any updates to buffer blocks
or to the log during checkpointing can be bothersome, since transaction processing
will have to halt while a checkpoint is in progress
...
Section 17
...
5 describes fuzzy checkpointing schemes
...
6
...
The system constructs the two lists as follows: Initially, they are both empty
...
• For each record found of the form
adds Ti to undo-list
...
For each transaction Ti in L, if Ti is not in redo-list then it adds
Ti to the undo-list
...
The system rescans the log from the most recent record backward, and performs an undo for each log record that belongs to a transaction Ti on the undo-list
...
The scan
stops when the
in the undo-list
...
The system locates the most recent
...
3
...
It ignores log records of transactions on the undo-list in this
phase
...
After the system has undone all transactions on the undo-list, it redoes those transactions on the redo-list
...
When
the recovery process has completed, transaction processing resumes
...
Suppose that data item A initially has the value 10
...
Suppose that another transaction Tj then updated data item A to 30 and
committed, following which the system crashed
...
The final value of Q should be 30, which we can ensure
by performing undo before performing redo
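The construction of the undo-list and redo-list, followed by undo before redo, can be sketched in Python. The record layout and the name `restart_recovery` are assumptions. The example log also assumes that Ti had changed A from 10 to 20 and never committed before Tj changed A to 30; the excerpt above states only the initial value 10 and Tj's update.

```python
# Assumed log-record layout:
#   ("checkpoint", [active txns]), ("start", txn),
#   ("update", txn, item, old, new), ("commit", txn)

def restart_recovery(log, db):
    redo_list, undo_list = set(), set()
    cp_index, active_at_cp = 0, []
    # Backward scan up to the most recent checkpoint builds the two lists.
    for i in range(len(log) - 1, -1, -1):
        rec = log[i]
        if rec[0] == "commit":
            redo_list.add(rec[1])
        elif rec[0] == "start" and rec[1] not in redo_list:
            undo_list.add(rec[1])
        elif rec[0] == "checkpoint":
            cp_index, active_at_cp = i, rec[1]
            break
    for txn in active_at_cp:
        if txn not in redo_list:
            undo_list.add(txn)
    # Undo phase: backward from the end until every start record is found.
    pending = set(undo_list)
    for rec in reversed(log):
        if not pending:
            break
        if rec[0] == "update" and rec[1] in undo_list:
            db[rec[2]] = rec[3]
        elif rec[0] == "start":
            pending.discard(rec[1])
    # Redo phase: forward from the checkpoint, skipping undo-list transactions.
    for rec in log[cp_index:]:
        if rec[0] == "update" and rec[1] in redo_list:
            db[rec[2]] = rec[4]
    return db, redo_list, undo_list

log = [
    ("checkpoint", []),
    ("start", "Ti"), ("update", "Ti", "A", 10, 20),
    ("start", "Tj"), ("update", "Tj", "A", 10, 30),
    ("commit", "Tj"),
]
db, redo_list, undo_list = restart_recovery(log, {"A": 20})
```

Swapping the two phases would first set A to 30 and then overwrite it with Ti's old value 10, which is the wrong outcome the text warns about.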
...
7 Buffer Management
In this section, we consider several subtle details that are essential to the implementation of a crash-recovery scheme that ensures data consistency and imposes a minimal
amount of overhead on interactions with the database
...
7
...
This assumption imposes a high overhead on system execution for several
reasons: Typically, output to stable storage is in units of blocks
...
Thus, the output of each log record translates to
a much larger output at the physical level
...
2
...
The cost of performing the output of a block to stable storage is sufficiently high
that it is desirable to output multiple log records at once
...
Multiple log records can be gathered in the log buffer, and
output to stable storage in a single output operation
...
As a result of log buffering, a log record may reside in only main memory (volatile
storage) for a considerable time before it is output to stable storage
...
• Transaction Ti enters the commit state after the
been output to stable storage
...
• Before a block of data in main memory can be output to the database (in nonvolatile storage), all log records pertaining to data in that block must have
been output to stable storage
...
(Strictly speaking,
the WAL rule requires only that the undo information in the log have been
output to stable storage, and permits the redo information to be written later
...
)
The three rules state situations in which certain log records must have been output
to stable storage
...
Thus, when the system finds it necessary to output a log record to
stable storage, it outputs an entire block of log records, if there are enough log records
in main memory to fill a block
...
Writing the buffered log to disk is sometimes referred to as a log force
...
7
...
2, we described the use of a two-level storage hierarchy
...
Since main memory is typically much smaller than the entire
database, it may be necessary to overwrite a block B1 in main memory when another
block B2 needs to be brought into memory
...
As discussed in Section 11
...
1 in Chapter 11, this
storage hierarchy is the standard operating system concept of virtual memory
...
If the input of block B2 causes block B1 to be chosen for output, all log
records pertaining to data in B1 must be output to stable storage before B1 is output
...
• Output block B1 to disk
...
It is important that no writes to the block B1 be in progress while the system carries out this sequence of actions
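The ordering imposed by write-ahead logging can be sketched in Python with a toy store that forces the log buffer before any data block leaves memory. The class `WalStore`, its fields, and the event list used to observe the ordering are all assumptions for illustration; latching is not modeled.

```python
class WalStore:
    """Toy buffer manager that forces the log before a data block is output."""

    def __init__(self):
        self.log_buffer = []      # log records still in volatile memory
        self.stable_log = []      # log records on stable storage
        self.buffer_blocks = {}   # block name -> {item: value} in main memory
        self.disk_blocks = {}     # block name -> {item: value} on disk
        self.events = []          # order of physical outputs, for inspection

    def update(self, txn, block, item, old, new):
        # Log record is created (in the buffer) before the in-memory update.
        self.log_buffer.append((txn, block, item, old, new))
        self.buffer_blocks.setdefault(block, {})[item] = new

    def force_log(self):
        if self.log_buffer:
            self.stable_log.extend(self.log_buffer)
            self.log_buffer.clear()
            self.events.append("log-force")

    def output_block(self, block):
        # WAL: records pertaining to this block must be stable first.
        if any(rec[1] == block for rec in self.log_buffer):
            self.force_log()
        self.disk_blocks[block] = dict(self.buffer_blocks[block])
        self.events.append(f"output {block}")

store = WalStore()
store.update("T0", "block_A", "A", 1000, 950)
store.output_block("block_A")
```

This sketch forces the whole log buffer, not just the records for one block, in the spirit of outputting an entire block of log records at once.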
...
17
...
The lock can be released immediately after the update has been performed
...
It releases the lock once the block output has
completed
...
Latches
are treated as distinct from locks used by the concurrency-control system
...
To illustrate the need for the write-ahead logging requirement, consider our banking example with transactions T0 and T1
...
Assume that the block on which B resides is
not in main memory, and that main memory is full
...
If the system outputs this block to disk and
then a crash occurs, the values in the database for accounts A, B, and C are $950,
$2000, and $700, respectively
...
However, because
of the WAL requirements, the log record
must be output to stable storage prior to output of the block on which A resides
...
17
...
3 Operating System Role in Buffer Management
We can manage the database buffer by using one of two approaches:
1
...
The database system manages
data-block transfer in accordance with the requirements in Section 17
...
2
...
The buffer must be kept small enough that other applications have
sufficient main memory available for their needs
...
Likewise, nondatabase applications may not use
that part of main memory reserved for the database buffer, even if some of the
pages in the database buffer are not being used
...
The database system implements its buffer within the virtual memory provided by the operating system
...
But, to ensure the write-ahead logging requirements in Section 17
...
1, the operating system should not write out the database buffer pages itself, but in-
17
...
The database system in turn would force-output the buffer blocks to the database, after writing relevant log records to stable storage
...
The operating system reserves space on disk
for storing virtual-memory pages that are not currently in main memory; this
space is called swap space
...
Therefore, if the database buffer is in virtual memory, transfers between
database files and the buffer in virtual memory must be managed by the
database system, which enforces the write-ahead logging requirements that
we discussed
...
If a block Bx is
output by the operating system, that block is not output to the database
...
When the database system needs to output Bx, the operating system may
need first to input Bx from its swap space
...
Although both approaches suffer from some drawbacks, one or the other must
be chosen unless the operating system is designed to support the requirements of
database logging
...
17
...
Although failures in which the content of nonvolatile storage is lost
are rare, we nevertheless need to be prepared to deal with this type of failure
...
Our discussions apply as well to other
nonvolatile storage types
...
For example, we may dump the database to one or
more magnetic tapes
...
Once this restoration has been accomplished, the system uses the log
to bring the database system to the most recent consistent state
...
Output all log records currently residing in main memory onto stable storage
...
Output all buffer blocks onto the disk
...
17
...
Copy the contents of the database to stable storage
...
Output a log record
...
4
...
To recover from the loss of nonvolatile storage, the system restores the database
to disk by using the most recent dump
...
Notice that
no undo operations need to be executed
...
Dumps of a database and checkpointing of buffers are similar
...
First, the entire database must be copied to stable storage, resulting in considerable
data transfer
...
Fuzzy dump schemes have been developed, which allow transactions to be active while the dump is in progress
...
17
...
6 require that, once a transaction updates a data item, no other transaction may update the same data item until the first
commits or is rolled back
...
Although strict two-phase locking is acceptable for records in relations, as discussed
in Section 16
...
To increase concurrency, we can use the B+ -tree concurrency-control algorithm described in Section 16
...
As a result, however, the recovery techniques from Section 17
...
Several alternative recovery techniques, applicable even with early lock release, have been proposed
...
We first describe an advanced recovery scheme supporting early lock release
...
ARIES is more complex than our advanced recovery scheme, but
incorporates a number of optimizations to minimize recovery time, and provides a
number of other useful features
...
9
...
Consider a transaction T
that inserts an entry into a B+-tree, and, following the B+-tree concurrency-control
protocol, releases some locks after the insertion operation completes, but before the
transaction commits
...
17
...
For this reason,
the B+ -tree concurrency-control protocol in Section 16
...
Now let us consider how to perform transaction rollback
...
Instead, the insertion operation has to be undone by a logical undo—that is,
in this case, by the execution of a delete operation
...
For example, if the operation inserted an entry in a B+ -tree, the undo information U would
indicate that a deletion operation is to be performed, and would identify the B+ -tree
and what to delete from the tree
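A Python sketch of the difference between physical and logical undo, using a sorted list as a stand-in for a B+-tree leaf. The function names and the form of the undo information `("delete", key)` are assumptions for illustration.

```python
import bisect

def insert_key(index, key):
    """Insert into a sorted index and return the logical undo information U."""
    bisect.insort(index, key)
    return ("delete", key)

def logical_undo(index, undo_info):
    """Undo an insertion by executing the inverse operation."""
    op, key = undo_info
    if op != "delete":
        raise ValueError("this sketch handles only the undo of an insertion")
    index.remove(key)

index = [10, 30]
before_image = list(index)            # what a physical undo would restore

undo_info = insert_key(index, 20)     # transaction T inserts 20
insert_key(index, 15)                 # another transaction inserts 15 after T's early lock release

logical_undo(index, undo_info)        # roll back T logically
```

Restoring `before_image` would have silently erased the other transaction's key 15; the logical undo removes only key 20.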
...
In contrast, logging of old-value and new-value information
is called physical logging, and the corresponding log records are called physical log
records
...
Before a logical operation begins, it writes a log record
...
Thus, the usual old-value and new-value information is written out for each update
...
17
...
2 Transaction Rollback
First consider transaction rollback during normal operation (that is, not during recovery from system failure)
...
Unlike rollback
in normal operation, however, rollback in our advanced recovery scheme writes out
special redo-only log records of the form
restored to data item Xj during the rollback
...
Such records do not need undo information, since we will
never need to undo such an undo operation
...
It rolls back the operation by using the undo information U in the log record
...
In other words,
the system logs physical undo information for the updates performed during
rollback, instead of using compensation log records
...
9
...
At the end of the operation rollback, instead of generating a log record
<Ti, Oj, operation-end, U>, the system generates a log record <Ti, Oj,
operation-abort>
...
When the backward scan of the log continues, the system skips all log records
of the transaction until it finds the log record
...
Observe that skipping over physical log records when the operation-end log record
is found during rollback ensures that the old values in the physical log record are not
used for rollback, once the operation completes
...
These preceding log records
must be skipped to prevent multiple rollback of the same operation, in case there had
been a crash during an earlier rollback, and the transaction had already been partly
rolled back
...
If failures occur while a logical operation is in progress, the operation-end log
record for the operation will not be found when the transaction is rolled back
...
The physical log
records will be used to roll back the incomplete operation
...
9
...
6
...
It outputs to stable storage all log records currently residing in main memory
...
It outputs to the disk all modified buffer blocks
...
It outputs onto stable storage a log record
all active transactions
...
9
...
In the redo phase, the system replays updates of all transactions by scanning the log forward from the last checkpoint
...
tem crash, and those that had not committed when the system crash occurred
...
This phase also determines all transactions that
are either in the transaction list in the checkpoint record, or started later, but
did not have either a
...
2
...
It
performs rollback by scanning the log backward from the end
...
Thus, log records of a transaction preceding an operation-end record, but after the corresponding operation-begin record, are ignored
...
Scanning of the log stops
when the system has found
undo-list
...
In other words, this phase of restart recovery repeats all
the update actions that were executed after the checkpoint, and whose log records
reached the stable log
...
The actions are repeated in the
same order in which they were carried out; hence, this process is called repeating
history
...
Note that if an operation undo was in progress when the system crash occurred,
the physical log records written during operation undo would be found, and the partial operation undo would itself be undone on the basis of these physical log records
...
17
...
5 Fuzzy Checkpointing
The checkpointing technique described in Section 17
...
3 requires that all updates to
the database be temporarily suspended while the checkpoint is in progress
...
To avoid such interruptions, the checkpointing technique can be modified to permit updates to start once the checkpoint record has been written, but before the modified buffer blocks are written to disk
...
Since pages are output to disk only after the checkpoint record has been written, it
is possible that the system could crash before all pages are written
...
One way to deal with incomplete checkpoints is this:
The location in the log of the checkpoint record of the last completed checkpoint
is stored in a fixed position, last-checkpoint, on disk
...
Instead, before it writes the
checkpoint record, it creates a list of all modified buffer blocks
...
Even with fuzzy checkpointing, a buffer block must not be updated while it is
being output to disk, although other buffer blocks may be updated concurrently
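The handling of incomplete checkpoints can be sketched in Python. The sketch assumes that the last-checkpoint position is advanced only after every block on the list of modified blocks has been written; the class name, the `crash_after` parameter used to simulate a crash, and the in-memory stand-ins for disk and log are all hypothetical.

```python
class FuzzyCheckpointer:
    def __init__(self):
        self.log = []
        self.last_checkpoint = None   # stand-in for the fixed position on disk
        self.dirty = {}               # modified buffer blocks: name -> contents
        self.disk = {}

    def checkpoint(self, crash_after=None):
        """Return True if the checkpoint completed, False if a crash cut it short."""
        to_flush = list(self.dirty)   # list is built before the record is written
        self.log.append(("checkpoint", tuple(to_flush)))
        cp_position = len(self.log) - 1
        for n, block in enumerate(to_flush):
            if crash_after is not None and n >= crash_after:
                return False          # simulated crash: pointer is not advanced
            self.disk[block] = self.dirty.pop(block)
        # Assumption: the pointer moves only once all listed blocks are on disk.
        self.last_checkpoint = cp_position
        return True

fc = FuzzyCheckpointer()
fc.dirty = {"B1": 1, "B2": 2}
first_ok = fc.checkpoint(crash_after=1)   # crash after one block was written
pointer_after_crash = fc.last_checkpoint
second_ok = fc.checkpoint()               # a later checkpoint runs to completion
```

After the simulated crash, recovery would still start from the previous completed checkpoint, since `last_checkpoint` never pointed at the incomplete one.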
...
Note that, in our scheme, logical logging is used only for undo purposes, whereas
physical logging is used for redo and undo purposes
...
To perform logical redo, the database state on
disk must be operation consistent, that is, it should not have partial effects of any
operation
...
Therefore, logical redo logging is usually restricted only
to operations that affect a single page; we will see how to handle such logical redos
in Section 17
...
6
...
17
...
6 ARIES
The state of the art in recovery methods is best illustrated by the ARIES recovery
method
...
In contrast, ARIES uses a number of techniques to reduce the
time taken for recovery, and to reduce the overheads of checkpointing
...
The price paid is greater
complexity; the benefits are worth the price
...
Uses a log sequence number (LSN) to identify log records, and uses
LSNs in database pages to identify which operations have been applied to a
database page
...
Supports physiological redo operations, which are physical in that the affected page is physically identified, but can be logical within the page
...
With physical redo logging, all bytes of the page affected by the shifting of records must
be logged
...
Redo of the deletion operation would
delete the record and shift other records as required
...
3
...
Dirty
pages are those that have been updated in memory and whose disk version is
not up-to-date
...
Uses a fuzzy checkpointing scheme that only records information about dirty
pages and associated information, and does not even require writing of dirty
pages to disk
...
In the rest of this section we provide an overview of ARIES
...
17
...
6
...
The number is conceptually just a logical identifier whose value is greater
for log records that occur later in the log
...
Typically, ARIES splits a
log into multiple log files, each of which has a file number
...
The LSN then consists of a
file number and an offset within the file
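One way to picture such an LSN is as an ordered pair, sketched here in Python. The type name `LSN` and its field names are hypothetical, and the sketch assumes that file numbers increase as new log files are created, so that tuple ordering agrees with log order.

```python
from typing import NamedTuple

class LSN(NamedTuple):
    file_no: int   # which log file (assumed to increase over time)
    offset: int    # position of the record within that file

earlier = LSN(file_no=1, offset=900)
later = LSN(file_no=2, offset=0)

# Named tuples compare field by field, so a record in a later file
# orders after any record in an earlier file, whatever the offsets.
is_ordered = earlier < later
```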
...
Whenever an operation (whether physical or logical) occurs on a page, the operation stores the LSN of
its log record in the PageLSN field of the page
...
In combination with a scheme for recording PageLSNs as part of checkpointing, which we
present later, ARIES can avoid even reading many pages for which logged operations
are already reflected on disk
...
The PageLSN is essential for ensuring idempotence in the presence of physiological redo operations, since reapplying a physiological redo that has already been applied to a page could cause incorrect changes to a page
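To see why the PageLSN matters, consider a physiological redo that deletes the record in a given slot and shifts the later records down. The sketch below is an illustration under our own assumptions (the Page class, its slots list, and the function redo_delete are hypothetical): applying the deletion twice would remove a second, unrelated record, and the PageLSN comparison is what prevents that.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    """Hypothetical page: a PageLSN plus an ordered list of records."""
    page_lsn: int = 0
    slots: list = field(default_factory=list)


def redo_delete(page: Page, lsn: int, slot: int) -> bool:
    """Physiological redo of 'delete the record in this slot'.

    Returns True if the deletion was applied, False if the page already
    reflects the log record (its PageLSN is at least the record's LSN).
    """
    if page.page_lsn >= lsn:
        return False          # already applied; reapplying would delete another record
    del page.slots[slot]      # later records shift down by one position
    page.page_lsn = lsn       # remember the last operation applied to the page
    return True
```

Without the comparison, a second redo of the same log record on a page holding b and c would delete b as well.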
...
Therefore, ARIES uses latches on buffer pages to prevent them from being written to disk while they are being updated
...
Each log record also contains the LSN of the previous log record of the same transaction
...
There are special redo-only
log records generated during transaction rollback, called compensation log records
(CLRs) in ARIES
...
In addition CLRs serve the role of the operation-abort
log records in our scheme
...
This field serves the same purpose as the operation identifier in the
operation-abort log record in our scheme, which helps to skip over log records that
have already been rolled back
...
For each page, it stores the PageLSN and a field
called the RecLSN which helps identify log records that have been applied already
to the version of the page on disk
...
Whenever the page is flushed to disk, the page is removed from the
DirtyPageTable
...
For each transaction, the checkpoint log record also notes LastLSN, the LSN of
the last log record written by the transaction
...
17
...
6
...
• Analysis pass: This pass determines which transactions to undo, which pages
were dirty at the time of the crash, and the LSN from which the redo pass
should start
...
• Undo pass: This pass rolls back all transactions that were incomplete at the
time of the crash
...
It then sets RedoLSN to the minimum
of the RecLSNs of the pages in the DirtyPageTable
...
The redo pass starts its scan
of the log from RedoLSN
...
The analysis pass initially sets the list of
transactions to be undone, undo-list, to the list of transactions in the checkpoint log
record
...
The analysis pass continues scanning forward from the checkpoint
...
Whenever it finds a transaction end log record, it deletes the transaction
from undo-list
...
The analysis pass also keeps track of the last record
of each transaction in undo-list, which is used in the undo pass
...
The analysis pass also updates DirtyPageTable whenever it finds a log record for
an update on a page
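The analysis pass can be sketched as a single forward scan. The record layout used here (dictionaries with type, txn, and page keys, and a checkpoint record carrying active_txns and dirty_page_table) is hypothetical. Two details are our own assumptions because this extract elides them: a transaction first seen after the checkpoint is added to undo-list, and RedoLSN falls back to the checkpoint LSN when no page is dirty.

```python
def analysis_pass(log, checkpoint_lsn):
    """Sketch of an ARIES-style analysis pass.

    log maps LSN -> record (a dict). Returns (redo_lsn, undo_list, dirty_pages),
    where undo_list maps each incomplete transaction to its last LSN and
    dirty_pages maps each dirty page to its RecLSN.
    """
    checkpoint = log[checkpoint_lsn]
    dirty_pages = dict(checkpoint["dirty_page_table"])   # page id -> RecLSN
    undo_list = dict(checkpoint["active_txns"])          # txn id -> LastLSN

    # RedoLSN is the minimum RecLSN in the checkpoint's DirtyPageTable.
    redo_lsn = min(dirty_pages.values(), default=checkpoint_lsn)

    for lsn in sorted(l for l in log if l > checkpoint_lsn):
        record = log[lsn]
        txn = record["txn"]
        if record["type"] == "end":
            undo_list.pop(txn, None)      # transaction finished: nothing to undo
            continue
        undo_list[txn] = lsn              # track the last record of each transaction
        if record["type"] == "update":
            # A page dirtied after the checkpoint gets this LSN as its RecLSN.
            dirty_pages.setdefault(record["page"], lsn)
    return redo_lsn, undo_list, dirty_pages
```

The undo_list returned here already holds, for each incomplete transaction, the LSN from which the undo pass would start walking backward.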
...
Redo Pass: The redo pass repeats history by replaying every action that is not already
reflected in the page on disk
...
Whenever it finds an update log record, it takes this action:
1
...
2
...
Note that if either of the tests is negative, then the effects of the log record have
already appeared on the page
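The two numbered tests are elided in this extract, so the following is only a sketch of how they are commonly stated for ARIES. The first test uses the DirtyPageTable and RecLSN and needs no disk access; the second reads the page and compares its PageLSN. The names redo_needed and read_page_lsn are hypothetical.

```python
def redo_needed(lsn, page_id, dirty_page_table, read_page_lsn):
    """Decide whether the update log record with this LSN must be reapplied.

    dirty_page_table maps page id -> RecLSN.
    read_page_lsn(page_id) fetches the page and returns its PageLSN; it is
    called only when the first test cannot rule the record out.
    """
    rec_lsn = dirty_page_table.get(page_id)
    # Test 1: page not dirty, or the record precedes the page's RecLSN.
    if rec_lsn is None or lsn < rec_lsn:
        return False
    # Test 2: the page on disk already reflects this record or a later one.
    return read_page_lsn(page_id) < lsn
```

Ordering the tests this way is what lets the redo pass avoid reading many pages at all: the page is fetched only when the DirtyPageTable says the record might still be missing from disk.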
...
Undo Pass and Transaction Rollback: The undo pass is relatively straightforward
...
If a CLR
is found, it uses the UndoNextLSN field to skip log records that have already been
rolled back
...
Whenever an update log record is used to perform an undo (whether for transaction rollback during normal processing, or during the restart undo pass), the undo
pass generates a CLR containing the undo action performed (which must be physiological)
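The rollback of one transaction can be sketched as a backward walk along the per-transaction chain of log records. The record fields prev_lsn and undo_next_lsn follow the description above; the dictionary layout, the undoes field, and the function name rollback are our own assumptions, and the actual undo action on the page is omitted.

```python
def rollback(log, last_lsn, next_free_lsn):
    """Undo one transaction, starting from its last log record.

    log maps LSN -> record (a dict with 'type' and 'prev_lsn'; CLRs also
    carry 'undo_next_lsn'). New CLRs are appended to log starting at
    next_free_lsn. Returns the LSNs of the updates undone, in order.
    """
    undone = []
    lsn = last_lsn
    last_written = last_lsn
    while lsn is not None:
        record = log[lsn]
        if record["type"] == "clr":
            # Skip everything this CLR shows to be rolled back already.
            lsn = record["undo_next_lsn"]
            continue
        if record["type"] == "update":
            undone.append(lsn)
            log[next_free_lsn] = {
                "type": "clr",
                "prev_lsn": last_written,
                "undo_next_lsn": record["prev_lsn"],  # next record still to undo
                "undoes": lsn,
            }
            last_written = next_free_lsn
            next_free_lsn += 1
        lsn = record["prev_lsn"]
    return undone
```

If a crash interrupts a rollback, the CLRs already written let the restart undo pass resume where the rollback stopped instead of undoing the same updates again.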
...
17
...
6
...
If
some pages of a disk fail, they can be recovered without stopping transaction
processing on other pages
...
This can be quite useful for deadlock handling, since
transactions can be rolled back up to a point that permits release of required
locks, and then restarted from that point
...
• Recovery optimizations: The DirtyPageTable can be used to prefetch pages
during redo, instead of fetching a page only when the system finds a log
record to be applied to the page
...
Meanwhile, other log records can continue to be processed
...
17
...
Such systems are vulnerable to environmental disasters such as fire, flooding, or
earthquakes
...
Such systems must
provide high availability, that is, the time for which the system is unusable must be
extremely small
...
The remote backup site is sometimes also called the
secondary site
...
We achieve synchronization by sending all log
records from primary site to the remote backup site
...
Figure 17
...
When the primary site fails, the remote backup site takes over processing
...
In effect, the remote backup
site is performing recovery actions that would have been performed at the primary
site when the latter recovered
...
Once recovery has been
performed, the remote backup site starts processing transactions
...
Figure 17.10 Architecture of remote backup system
...
Availability is greatly increased over a single-site system, since the system can
recover even if all data at the primary site are lost
...
Several issues must be addressed in designing a remote backup system:
• Detection of failure
...
Failure of communication lines can fool the remote backup into believing that the primary has failed
...
For example, in addition to the network connection,
there may be a separate modem connection over a telephone line, with services provided by different telecommunication companies
...
• Transfer of control
...
When the original primary site recovers, it can either play the role of remote backup, or take over the role of primary site again
...
The simplest way of transferring control is for the old primary to receive
redo logs from the old backup site, and to catch up with the updates by applying them locally
...
If control must be transferred back, the old backup site can pretend to have
failed, resulting in the old primary taking over
...
If the log at the remote backup grows large, recovery will
take a long time
...
The delay before the remote backup takes over can
be significantly reduced as a result
...
In this configuration, the remote backup site continually processes redo log records as they arrive, applying the updates locally
...
• Time to commit
...
This delay can result in a longer wait to commit
a transaction, and some systems therefore permit lower degrees of durability
...
One-safe
...
The problem with this scheme is that the updates of a committed transaction may not have made it to the backup site when the backup site
takes over processing
...
When the
primary site recovers, the lost updates cannot be merged in directly, since
the updates may conflict with later updates performed at the backup site
...
Two-very-safe
...
The problem with this scheme is that transaction processing cannot
proceed if either the primary or the backup site is down
...
Two-safe
...
If only the primary is active, the transaction is
allowed to commit as soon as its commit log record is written to stable
storage at the primary site
...
It results in a slower commit than the one-safe scheme, but the benefits
generally outweigh the cost
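The three degrees of durability differ only in when a transaction is allowed to commit. The sketch below encodes that decision under our reading of the descriptions above, parts of which are elided in this extract; the enum Durability and the function may_commit are hypothetical names.

```python
from enum import Enum


class Durability(Enum):
    ONE_SAFE = "one-safe"
    TWO_VERY_SAFE = "two-very-safe"
    TWO_SAFE = "two-safe"


def may_commit(mode, logged_at_primary, logged_at_backup, backup_active):
    """Return True if the transaction may commit now.

    logged_at_primary / logged_at_backup: the commit log record has reached
    stable storage at that site. backup_active: the backup site is up.
    """
    if not logged_at_primary:
        return False
    if mode is Durability.ONE_SAFE:
        return True                       # the primary's log alone is enough
    if mode is Durability.TWO_VERY_SAFE:
        return backup_active and logged_at_backup   # both sites, always
    # Two-safe: behave like two-very-safe while the backup is active,
    # and fall back to the primary alone when it is not.
    return logged_at_backup if backup_active else True
```

The only case where two-safe and two-very-safe disagree is when the backup is down: two-very-safe blocks, while two-safe lets the primary continue.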
...
In these systems, the
failure of a CPU does not result in system failure
...
Recovery actions include rollback of transactions running
on the failed CPU, and recovery of locks held by those transactions
...
However, we should
safeguard the data from disk failure by using, for example, a RAID disk organization
...
Transactions are then required to update
all replicas of any data item that they update
...
17
...
There are a variety of causes of such failure, including disk crash,
power failure, and software errors
...
• In addition to system failures, transactions may also fail for various reasons,
such as violation of integrity constraints or deadlocks
...
Data in volatile storage, such as in RAM, are lost
when the computer crashes
...
Data in stable storage are never lost
...
Offline,
or archival, stable storage may consist of multiple tape copies of data stored
in a physically secure location
...
To preserve consistency, we require that each transaction be
atomic
...
There are basically two different approaches for
ensuring atomicity: log-based schemes and shadow paging
...
In the deferred-modifications scheme, during the execution of a transaction, all the write operations are deferred until the transaction partially
commits, at which time the system uses the information on the log associated with the transaction in executing the deferred writes
...
If a crash occurs, the system uses the information
in the log in restoring the state of the system to a previous consistent state
...
• In shadow paging, two page tables are maintained during the life of a transaction: the current page table and the shadow page table
...
The shadow page table and pages
it points to are never changed during the duration of the transaction
...
If the transaction aborts, the current
page table is simply discarded
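A minimal sketch of shadow paging, assuming a single transaction at a time and ignoring disk I/O and garbage collection of unused pages. The class ShadowPagedStore and its method names are hypothetical. The first write to a page copies it to a fresh physical page reached only through the current page table, so the shadow page table and the pages it points to never change.

```python
class ShadowPagedStore:
    """Toy store with a shadow page table and a current page table."""

    def __init__(self, pages):
        self.disk = list(pages)                    # physical pages
        self.shadow = list(range(len(self.disk)))  # logical -> physical
        self.current = None                        # exists only inside a transaction

    def begin(self):
        self.current = list(self.shadow)           # start as a copy of the shadow table

    def write(self, logical, data):
        if self.current[logical] == self.shadow[logical]:
            # First update of this page: write to a new physical page.
            self.disk.append(data)
            self.current[logical] = len(self.disk) - 1
        else:
            self.disk[self.current[logical]] = data

    def read(self, logical):
        table = self.shadow if self.current is None else self.current
        return self.disk[table[logical]]

    def commit(self):
        # The current page table becomes the new shadow page table.
        self.shadow, self.current = self.current, None

    def abort(self):
        self.current = None                        # simply discard the current table
```

In this sketch, an abort leaves orphaned physical pages in disk; reclaiming them is the garbage-collection problem associated with shadow paging.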
...
No transaction can be allowed to update a data item that has already been
updated by an incomplete transaction
...
• Transaction processing is based on a storage model in which main memory
holds a log buffer, a database buffer, and a system buffer
...
• Efficient implementation of a recovery scheme requires that the number of
writes to the database and to stable storage be minimized
...
Before a block of data in main memory is output to the database (in nonvolatile storage), all log records pertaining to data in that block must have
been output to stable storage
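The rule in the preceding sentence can be sketched as a guard on block output. This assumes, purely for illustration, that each buffer block remembers the LSN of the last log record that describes it; the Log class and the function output_block are hypothetical names, and the events list stands in for real I/O.

```python
class Log:
    """Toy log that tracks which prefix has reached stable storage."""

    def __init__(self):
        self.next_lsn = 1
        self.flushed_lsn = 0      # highest LSN already on stable storage
        self.events = []          # order in which outputs happen

    def append(self):
        lsn = self.next_lsn
        self.next_lsn += 1
        return lsn

    def force(self, up_to_lsn):
        if up_to_lsn > self.flushed_lsn:
            self.flushed_lsn = up_to_lsn
            self.events.append(("log forced", up_to_lsn))


def output_block(block_id, block_lsn, log):
    """Write a block to the database only after its log records are stable."""
    log.force(block_lsn)                            # log first ...
    log.events.append(("block written", block_id))  # ... then the data block
```

Forcing the log only up to the block's own LSN keeps the number of log writes small, which matches the goal of minimizing writes stated above.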
...
If a failure occurs that results in the loss of physical database
blocks, we use the most recent dump in restoring the database to a previous
consistent state
...
• Advanced recovery techniques support high-concurrency locking techniques,
such as those used for B+-tree concurrency control
...
When recovering from system failure, the system performs a redo pass using
the log, followed by an undo pass on the log to roll back incomplete transactions
...
It is also based on repeating history, and allows
logical undo operations
...
It uses log sequence numbers (LSNs) to implement a variety of optimizations that reduce
the time taken for recovery
...
Review Terms
• Recovery scheme
• Failure classification
    Transaction failure
    Logical error
    System error
    System crash
    Data-transfer failure
• Fail-stop assumption
• Disk failure
• Storage types
    Volatile storage
    Nonvolatile storage
    Stable storage
• Blocks
    Physical blocks
    Buffer blocks
• Disk buffer
• Force-output
• Log-based recovery
• Log
• Log records
• Update log record
• Deferred modification
• Idempotent
• Immediate modification
• Uncommitted modifications
• Checkpoints
• Shadow paging
    Page table
    Current page table
    Shadow page table
• Garbage collection
• Recovery with concurrent transactions
    Transaction rollback
    Fuzzy checkpoint
    Restart recovery
• Buffer management
• Log-record buffering
• Write-ahead logging (WAL)
• Log force
• Database buffering
• Latches
• Operating system and buffer management
• Loss of nonvolatile storage
• Archival dump
• Fuzzy dump
• Advanced recovery technique
    Physical undo
    Logical undo
    Physical logging
    Logical logging
    Logical operations
    Transaction rollback
    Checkpoints
    Restart recovery
    Redo phase
    Undo phase
• Repeating history
• Fuzzy checkpointing
• ARIES
    Log sequence number (LSN)
    PageLSN
    Physiological redo
    Compensation log record (CLR)
    DirtyPageTable
    Checkpoint log record
• High availability
• Remote backup systems
    Primary site
    Remote backup site
    Secondary site
• Detection of failure
• Transfer of control
• Time to recover
• Hot-spare configuration
• Time to commit
    One-safe
    Two-very-safe
    Two-safe
Exercises
17
...
17
...
a
...
b
...
17
...
17
...
Show, by an example,
how an inconsistent database state could result if log records for a transaction
are not output to stable storage prior to data updated by the transaction being
written to disk
...
5 Explain the purpose of the checkpoint mechanism
...
6 When the system recovers from a crash (see Section 17
...
4), it constructs an
undo-list and a redo-list
...
17
...
17
...
, block 10)
...
read block 3
read block 7
read block 5
read block 3
read block 1
modify block 1
read block 10
modify block 5
17
...
17
...
Give examples of one situation where
logical logging is preferable to physical logging and one situation where physical logging is preferable to logical logging
...
11 Explain the reasons why recovery of interactive transactions is more difficult
to deal with than is recovery of batch transactions
...
)
17
...
a
...
b
...
Transactions that committed later have their effects rolled back with
this scheme
...
c
...
Why?
17
...
Describe how page access protections provided by modern operating systems
can be used to create before and after images of pages that are updated
...
12
...
14 ARIES assumes there is space in each page for an LSN
...
Suggest a technique to
handle such a situation; your technique must support physical redos but need
not support physiological redos
...
15 Explain the difference between a system crash and a “disaster
...
16 For each of the following requirements, identify the best choice of degree of
durability in a remote backup system:
a
...
b
...
c
...
Bibliographical Notes
Gray and Reuter [1993] is an excellent textbook source of information about recovery,
including interesting implementation and historical details
...
[1987] is
an early textbook source of information on concurrency control and recovery
...
Chandy et al
...
An overview of the recovery scheme of System R is presented by Gray et al
...
The shadow-paging mechanism of System R is described by Lorie [1977]
...
[1980], and Verhofstad [1978]
...
[1980]
...
The state of the art in recovery methods is best illustrated by the ARIES recovery
method, described in Mohan et al
...
ARIES and its variants
are used in several database products, including IBM DB2 and Microsoft SQL Server
...
[2001]
...
Remote backup for disaster recovery (loss of an entire computing facility by, for
example, fire, flood, or earthquake) is considered in King et al
...
Chapter 24 lists references pertaining to long-duration transactions and related
recovery issues
...
PART 6
Database System Architecture
The architecture of a database system is greatly influenced by the underlying computer system on which the database system runs
...
Database systems can also be designed to exploit parallel computer architectures
...
Chapter 18 first outlines the architectures of database systems running on server
systems, which are used in centralized and client–server architectures
...
The chapter then outlines parallel computer architectures, and parallel database architectures designed for different types of parallel computers
...
Chapter 19 presents a number of issues that arise in a distributed database, and
describes how to deal with each issue
...
Distributed query processing and directory systems are also described in this chapter
...
CHAPTER 18
Database System Architectures
The architecture of a database system is greatly influenced by the underlying computer system on which it runs, in particular by such aspects of computer architecture
as networking, parallelism, and distribution:
• Networking of computers allows some tasks to be executed on a server system, and some tasks to be executed on client systems
...
• Parallel processing within a computer system allows database-system activities to be speeded up, allowing faster response to transactions, as well as more
transactions per second
...
The need for parallel
query processing has led to parallel database systems
...
Keeping multiple copies
of the database across different sites also allows large organizations to continue their database operations even when one site is affected by a natural
disaster, such as flood, fire, or earthquake
...
We study the architecture of database systems in this chapter, starting with the
traditional centralized systems, and covering client–server, parallel, and distributed
database systems
...
1 Centralized and Client–Server Architectures
Centralized database systems are those that run on a single computer system and do
not interact with other computer systems
...
Client–server systems, on
the other hand, have functionality split between a server system and multiple client
systems
...
1
...
1)
...
Each device controller
is in charge of a specific type of device (for example, a disk drive, an audio device,
or a video display)
...
Cache memory reduces the contention for memory
access, since it reduces the number of times that the CPU needs to access the shared
memory
...
Personal computers and workstations fall into the first category
...
Figure 18.1 A centralized computer system
...
machine at a time
...
It serves a large number of users who are connected to the system via terminals
...
In particular, they may not support
concurrency control, which is not required when only a single user can generate updates
...
Many such systems do not support SQL, and provide a simpler query language, such as a variant of QBE
...
Although general-purpose computer systems today have multiple processors, they
have coarse-granularity parallelism, with only a few processors (about two to four,
typically), all sharing the main memory
...
Thus, such systems support a higher throughput; that is, they allow a greater number of transactions to run per second, although individual transactions do not run
any faster
...
Thus, coarse-granularity parallel machines logically appear to be identical to single-processor
machines, and database systems designed for time-shared machines can be easily
adapted to run on them
...
We study the architecture of
parallel database systems in Section 18
...
18
...
2 Client–Server Systems
As personal computers became faster, more powerful, and cheaper, there was a shift
away from the centralized system architecture
...
Correspondingly, personal computers assumed the user-interface functionality that used to be handled directly by the centralized systems
...
Figure 18
...
Database functionality can be broadly divided into two parts, the front end and
the back end, as in Figure 18
...
The back end manages access structures, query
evaluation and optimization, concurrency control, and recovery
...
The interface between the front end and the back end is through
SQL, or through an application program
...
Figure 18.2 General structure of a client–server system
...
Any client that uses the ODBC or JDBC interfaces can
connect to any server that provides the interface
...
With the
growth of interface standards, the front-end user interface and the back-end server
are often provided by different vendors
...
Some of the popular application development tools
are PowerBuilder, Magic, and Borland Delphi; Visual Basic is also widely used for
application development
...
In effect, they provide front ends specialized for particular tasks
...
These calls appear like ordinary procedure calls to the programmer, but all the remote procedure calls from a client are
enclosed in a single transaction at the server end
...
[Figure 18 (number elided): the front end (SQL user-interface, forms interface, report writer, graphical interface) connects through an interface (SQL + API) to the back end (SQL engine).]
18
...
2 Server System Architectures
Server systems can be broadly categorized as transaction servers and data servers
...
Usually,
client machines ship transactions to the server systems, where those transactions are executed, and results are shipped back to clients that are in charge
of displaying the data
...
• Data-server systems allow clients to interact with the servers by making requests to read or update data, in units such as files or pages
...
Data servers for database systems offer much more functionality; they support units of data — such as pages, tuples, or objects — that
are smaller than a file
...
Of these, the transaction-server architecture is by far the more widely used architecture
...
2
...
2
...
18
...
1 Transaction Server Process Structure
A typical transaction server system today consists of multiple processes accessing
data in shared memory, as in Figure 18
...
The processes that form part of the database
system include
• Server processes: These are processes that receive user queries (transactions),
execute them, and send the results back
...
Some database systems
use a separate process for each user session, and a few use a single database
process for all user sessions, but with multiple threads so that multiple queries
can execute concurrently
...
Multiple threads within a process can execute
concurrently
...
• Lock manager process: This process implements lock manager functionality,
which includes lock grant, lock release, and deadlock detection
...
[Figure 18 (number elided): labels recoverable from the figure are user processes, ODBC, JDBC, server processes, shared memory (buffer pool, query plan cache, log buffer, lock table), log writer process, checkpoint process, and log disks.]
• Log writer process: This process outputs log records from the log record buffer
to stable storage
...
• Checkpoint process: This process performs periodic checkpoints
...
The shared memory contains all shared data, such as:
• Buffer pool
• Lock table
Since multiple processes may read or perform updates on data structures in shared memory, there must
be a mechanism to ensure that only one of them is modifying any data structure at
a time, and no process is reading a data structure while it is being written by others
...
Alternative implementations, with lower overhead, use special
atomic instructions supported by the computer hardware; one type of atomic instruction tests a memory location and sets it to 1 atomically
...
The mutual exclusion mechanisms are also used to implement latches
...
The
lock request procedure executes the actions that the lock manager process would
take on getting a lock request
...
1
...
• If a lock cannot be obtained immediately because of a lock conflict, the lock
request code keeps monitoring the lock table to check when the lock has been
granted
...
To avoid repeated checks on the lock table, operating system semaphores
can be used by the lock request code to wait for a lock grant notification
...
Even if the system handles lock requests through shared memory, it still uses the lock
manager process for deadlock detection
...
2
...
In such an environment, it makes sense to ship data to client machines,
to perform all processing at the client machine (which may take a while), and then
to ship the data back to the server machine
...
Data-server architectures have been particularly
popular in object-oriented database systems
...
The unit of communication for data can
be of coarse granularity, such as a page, or fine granularity, such as a tuple (or
an object, in the context of object-oriented database systems)
...
If the unit of communication is a single item, the overhead of message passing is high compared to the amount of data transmitted
...
Fetching items even before they are requested is called
prefetching
...
• Locking
...
A disadvantage of page shipping is that client
machines may be granted locks of too coarse a granularity — a lock on a page
implicitly locks all items contained in the page
...
Other client machines that require locks on those items may be blocked
unnecessarily
...
If
the client machine does not need a prefetched item, it can transfer locks on the
item back to the server, and the locks can then be allocated to other clients
...
Data that are shipped to a client on behalf of a transaction can be
cached at the client, even after the transaction completes, if sufficient storage
space is available
...
However, cache coherency is an issue: Even if a
transaction finds cached data, it must make sure that those data are up to date,
since they may have been updated by a different client after they were cached
...
• Lock caching
...
Suppose that a client finds a data item in
the cache, and that it also finds the lock required for an access to the data item
in the cache
...
However, the server must keep track of cached locks; if a client requests a lock from the server, the server must call back all conflicting locks on
the data item from any other client machines that have cached the locks
...
This technique differs from lock de-escalation in that lock caching takes place
across transactions; otherwise, the two techniques are similar
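The bookkeeping that lock caching requires of the server can be sketched as follows. For brevity the sketch treats every lock as exclusive, which is our simplification and not something the text states; the class LockServer and the method request are hypothetical names.

```python
class LockServer:
    """Toy server that remembers which clients hold cached locks."""

    def __init__(self):
        self.cached = {}   # data item -> set of clients with a cached lock

    def request(self, client, item):
        """Grant the lock on item to client.

        Returns the clients whose conflicting cached locks had to be
        called back first (every lock is treated as exclusive here).
        """
        holders = self.cached.get(item, set())
        called_back = sorted(holders - {client})
        self.cached[item] = {client}
        return called_back
```

A client that asks again for a lock it has already cached causes no call-backs, which is the saving that lock caching is meant to provide.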
...
The bibliographical references provide more information about client–server database systems
...
3 Parallel Systems
Parallel systems improve processing and I/O speeds by using multiple CPUs and
disks in parallel
...
The driving
force behind parallel database systems is the demands of applications that have to
query extremely large databases (of the order of terabytes, that is, 10^12 bytes) or
that have to process an extremely large number of transactions per second (of the order of thousands of transactions per second)
...
In parallel processing, many operations are performed simultaneously, as opposed
to serial processing, in which the computational steps are performed sequentially
...
Most high-end machines today offer some degree of coarse-grain parallelism:
Two- or four-processor machines are common
...
Parallel computers with hundreds of CPUs and disks
are available commercially
...
A system that processes a large number of small transactions can
improve throughput by processing many transactions in parallel
...
18
...
1 Speedup and Scaleup
Two important issues in studying parallelism are speedup and scaleup
...
Handling larger tasks by increasing the degree of parallelism is called scaleup
...
Now suppose that we increase the size of the system by
increasing the number of processors, disks, and other components of the system
...
Suppose that the execution time of a task on the larger machine
is TL, and that the execution time of the same task on the smaller machine is TS
...
The parallel system is said to
demonstrate linear speedup if the speedup is N when the larger system has N times
the resources (CPU, disk, and so on) of the smaller system
...
Figure 18
...
[Figure 18 (number elided): plot with an axis labeled resources.]
Scaleup relates to the ability to process larger tasks in the same amount of time by
providing more resources
...
Suppose that the execution time of task Q on a given machine MS is TS, and
the execution time of task QN on a parallel machine ML, which is N times larger than
MS, is TL
...
The parallel system ML is said to
demonstrate linear scaleup on task Q if TL = TS
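The two measures can be put side by side in a short sketch. We take speedup to be the ratio TS/TL for the same task, which is consistent with linear speedup meaning a value of N; that formula is elided in this extract, so treat it as an assumption. The function names and the tolerance parameter are hypothetical.

```python
def speedup(t_small, t_large):
    """Speedup for the same task: time on the smaller machine over time on the larger."""
    return t_small / t_large


def is_linear_speedup(t_small, t_large, n, tol=1e-9):
    """Linear speedup: the N-times-larger machine runs the same task N times faster."""
    return abs(speedup(t_small, t_large) - n) <= tol


def is_linear_scaleup(t_small, t_large, tol=1e-9):
    """Linear scaleup: the N-times-larger task takes the same time on the
    N-times-larger machine, that is, TL = TS."""
    return abs(t_large - t_small) <= tol
```

For example, a task that takes 100 seconds on the small machine and 25 seconds on a machine with four times the resources shows linear speedup; at 40 seconds the speedup is only 2.5, which is sublinear.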
...
Figure 18
...
There are two kinds of
scaleup that are relevant in parallel database systems, depending on how the size of
the task is measured:
• In batch scaleup, the size of the database increases, and the tasks are large jobs
whose runtime depends on the size of the database
...
Thus, the size of the database is the measure of the size of the problem
...
• In transaction scaleup, the rate at which transactions are submitted to the
database increases and the size of the database increases proportionally to
the transaction rate
...
Such transaction processing is especially well adapted
for parallel execution, since transactions can run concurrently and independently on separate processors, and each transaction takes roughly the same
amount of time, even if the database grows
...
The goal of parallelism in database systems is usually to make sure
that the database system can continue to perform at an acceptable speed, even as the
...
Figure 18.6 Scaleup with increasing problem size and resources
...
Increasing the capacity of the system by increasing the parallelism provides a smoother path for growth
for an enterprise than does replacing a centralized system by a faster machine (even
assuming that such a machine exists)
...
A number of factors work against efficient parallel operation and can diminish
both speedup and scaleup
...
There is a startup cost associated with initiating a single process
...
• Interference
...
Both speedup and scaleup are affected by this phenomenon
...
By breaking down a single task into a number of parallel steps, we
reduce the size of the average step
...
It is often
difficult to divide a task into exactly equal-sized parts, and the way that the
sizes are distributed is therefore skewed
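The effect of skew on speedup can be made concrete with a small calculation. The sketch assumes that each part runs on its own processor, that running time is proportional to part size, and that startup and interference costs are zero; the function name speedup_with_skew is hypothetical.

```python
def speedup_with_skew(part_sizes):
    """Speedup when a task is split into parts that run in parallel.

    The serial time is the sum of the parts; the parallel time is that of
    the largest part, since the task finishes only when every part does.
    """
    return sum(part_sizes) / max(part_sizes)


even = [10] * 10            # ten equal parts of a task of size 100
skewed = [20] + [10] * 8    # same total size, but one part is twice as large
```

With the even split the speedup is 10; with the skewed split the single part of size 20 limits the speedup to 5, even though nine processors are busy.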
...
18
...
2 Interconnection Networks
Parallel systems consist of a set of components (processors, memory, and disks) that
can communicate with each other via an interconnection network
...
7 shows
three commonly used types of interconnection networks:
• Bus
...
This type of interconnection is shown in Figure 18
...
The bus could be an Ethernet or a parallel interconnect
...
However, they do not scale well with increasing parallelism, since the bus can handle communication from only one
component at a time
...
The components are nodes in a grid, and each component connects
to all its adjacent components in the grid
...
Figure 18
...
Nodes that are not directly connected can communicate with one another by routing messages via a sequence of intermediate nodes that are directly connected to one another
...
• Hypercube
...
Thus, each of the n components is connected to log(n) other
components
...
7c shows a hypercube with 8 nodes
...
In contrast, in a mesh architecture a
component may be 2(√n − 1) links away from some of the other components
(or √n links away, if the mesh interconnection wraps around at the edges of
the grid)
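These distances can be checked with a short sketch. It assumes that hypercube nodes are numbered in binary and linked exactly when their numbers differ in one bit, that n is a power of two for the hypercube, and that n is a perfect square for the mesh; the function names are hypothetical.

```python
import math


def hypercube_neighbors(node, n):
    """Nodes directly connected to node in an n-node hypercube."""
    if n <= 0 or n & (n - 1):
        raise ValueError("n must be a power of two")
    dimensions = n.bit_length() - 1          # log2(n) links per node
    return [node ^ (1 << bit) for bit in range(dimensions)]


def hypercube_distance(a, b):
    """Number of links on a shortest path: the count of differing bits."""
    return bin(a ^ b).count("1")


def mesh_max_distance(n, wraparound=False):
    """Greatest number of links between two nodes of a sqrt(n) x sqrt(n) mesh,
    using the figures quoted in the text."""
    side = math.isqrt(n)
    return side if wraparound else 2 * (side - 1)
```

In the 8-node hypercube, node 000 is linked to 001, 010, and 100, and the farthest node, 111, is 3 links away; a 16-node mesh without wraparound can require 6 links.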
...
[Figure 18 (number elided): interconnection networks: (a) bus, (b) mesh, (c) hypercube with nodes labeled 000 through 111.]
18
...
3
...
Among the most prominent ones are those in Figure 18
...
All the processors share a common memory (Figure 18
...
• Shared disk
...
8b)
...
• Shared nothing
...
8c)
...
This model is a hybrid of the preceding three architectures (Figure 18
...
In Sections 18
...
3
...
3
...
4, we elaborate on each of these models
...
2
...
In fact, they are very important for efficient transaction processing in such systems
...
8
(d) hierarchical
Parallel database architectures
...
18
...
3
...
1 Shared Memory
In a shared-memory architecture, the processors and disks have access to a common
memory, typically via a bus or through an interconnection network
...
A processor can send messages to other processors much faster by using memory writes (which usually take less than a microsecond) than by sending a message
through a communication mechanism
...
Adding more processors does not help after a point, since the processors will spend
most of their time waiting for their turn on the bus to access memory
...
However, at least some of the data will not be in the cache, and accesses will have to go
to the shared memory
...
Maintaining cache coherency becomes an increasing overhead as the number of processors grows
...
18
...
3
...
There are two advantages of this architecture over a shared-memory architecture
...
Second, it offers a
cheap way to provide a degree of fault tolerance: If a processor (or its memory) fails,
the other processors can take over its tasks, since the database is resident on disks
that are accessible from all processors
...
The shared-disk
architecture has found acceptance in many applications
...
Although the
memory bus is no longer a bottleneck, the interconnection to the disk subsystem is
now a bottleneck; it is particularly so in a situation where the database makes a large
number of accesses to disks
...
DEC clusters running Rdb were among the early commercial users of the shared-disk database architecture
...
Digital Equipment Corporation (DEC) is now owned by Compaq
...
18
...
3
...
The processors at one node may communicate with another processor at another node by a high-speed interconnection network
...
Since
local disk references are serviced by local disks at each processor, the shared-nothing
model overcomes the disadvantage of requiring all I/O to go through a single interconnection network; only queries, accesses to nonlocal disks, and result relations pass
through the network
...
Consequently, shared-nothing architectures are
more scalable and can easily support a large number of processors
...
The Teradata database machine was among the earliest commercial systems to
use the shared-nothing database architecture
...
18
...
3
...
At the top level, the system consists of nodes
that are connected by an interconnection network and do not share disks or memory with
one another
...
Each node of the system could actually be a shared-memory system with a few processors
...
Thus, a system could be built as a hierarchy,
with shared-memory architecture with a few processors at the base, and a shared-nothing architecture at the top, with possibly a shared-disk architecture in the middle
...
8d illustrates a hierarchical architecture with shared-memory nodes
connected together in a shared-nothing architecture
...
Attempts to reduce the complexity of programming such systems have yielded
distributed virtual-memory architectures, where logically there is a single shared
memory, but physically there are multiple disjoint memory systems; the virtual-memory-mapping hardware, coupled with system software, allows each processor
to view the disjoint memories as a single virtual memory
...
18
...
The
computers in a distributed system communicate with one another through various
communication media, such as high-speed networks or telephone lines
...
The computers in a distributed system may vary in size
and function, ranging from workstations up to mainframe systems
...
We mainly use the term site, to emphasize the physical distribution of these systems
...
9
...
Another major difference is that, in a distributed database system, we differentiate between local and
global transactions
...
A global transaction, on the other hand, is one
that either accesses data in a site different from the one at which the transaction was
initiated, or accesses data in several different sites
...
• Sharing data
...
For instance, in a distributed banking
system, where each branch stores data related to that branch, it is possible for
a user in one branch to access data in another branch
...
• Autonomy
...
9
A distributed system
...
are stored locally
...
In a distributed system, there is a global
database administrator responsible for the entire system
...
Depending on the design of the distributed database system, each administrator may have a different degree of local autonomy
...
• Availability
...
In particular, if data items are replicated in several sites, a transaction needing a particular data item may find that item in
any of several sites
...
The failure of one site must be detected by the system, and appropriate
action may be needed to recover from the failure
...
Finally, when the failed site recovers or is
repaired, mechanisms must be available to integrate it smoothly back into the
system
...
Availability is crucial for database systems used for real-time applications
...
18
...
1 An Example of a Distributed Database
Consider a banking system consisting of four branches in four different cities
...
Each such installation is thus a site
...
Each branch maintains
(among others) a relation account(Account-schema), where
Account-schema = (account-number, branch-name, balance)
The site containing information about all the branches of the bank maintains the relation branch(Branch-schema), where
Branch-schema = (branch-name, branch-city, assets)
There are other relations maintained at the various sites; we ignore them for the purpose of our example
...
If the transaction was initiated at the Valleyview
branch, then it is considered local; otherwise, it is considered global
...
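The distinction between local and global transactions can be stated as a short sketch. It follows the definition given earlier: a transaction is local only if every data item it touches is at the site where it was initiated. The function name and the branch name "Hillside" are illustrative assumptions.

```python
def classify_transaction(initiating_site: str, accessed_sites: set[str]) -> str:
    """Return "local" or "global" for a transaction.

    A transaction is global if it accesses data at a site other than the
    one where it was initiated, or at several different sites.
    """
    if accessed_sites == {initiating_site}:
        return "local"
    return "global"
```

A transaction that touches only Valleyview data is local when initiated at Valleyview and global when initiated anywhere else.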
18
...
In an ideal distributed database system, the sites would share a common global
schema (although some relations may be stored only at some sites), all sites would
run the same distributed database-management software, and the sites would be
aware of each other’s existence
...
However, in reality a distributed database has to be constructed by linking together multiple already-existing
database systems, each with its own schema and possibly running different database-management software
...
We discuss these systems in Section 19
...
18
...
2 Implementation Issues
Atomicity of transactions is an important issue in building a distributed database system
...
Transaction commit protocols ensure such a situation cannot arise
...
The basic idea behind 2PC is for each site to execute the transaction till just before
commit, and then leave the commit decision to a single coordinator site; the transaction is said to be in the ready state at a site at this point
...
Every site where the transaction executed must follow the decision of the coordinator
...
The
2PC protocol is described in detail in Section 19
...
1
...
Since a transaction may access data items at several sites, transaction managers at several sites may
need to coordinate to implement concurrency control
...
Therefore deadlock detection needs to be carried out
across multiple sites
...
Replication of data items,
which is the key to the continued functioning of distributed databases when failures
occur, further complicates concurrency control
...
5 provides detailed coverage of concurrency control in distributed databases
...
18
...
When the tasks to be carried out are complex, involving multiple databases and/or
multiple interactions with humans, coordination of the tasks and ensuring transaction properties for the tasks become more complicated
...
Section 19
...
3 describes
persistent messaging, while Section 24
...
In case an organization has to choose between a distributed architecture and a
centralized architecture for implementing an application, the system architect must
balance the advantages against the disadvantages of distribution of data
...
The primary disadvantage
of distributed database systems is the added complexity required to ensure proper
coordination among the sites
...
It is more difficult to implement a distributed
database system; thus, it is more costly
...
Since the sites that constitute the distributed system operate in parallel, it is harder to ensure the correctness of algorithms,
especially operation during failures of part of the system, and recovery from
failures
...
• Increased processing overhead
...
There are several approaches to distributed database design, ranging from fully
distributed designs to ones that include a large degree of centralization
...
18
...
There are basically two types of networks: local-area networks and wide-area networks
...
In local-area networks, processors are distributed over
small geographical areas, such as a single building or a number of adjacent buildings
...
These differences imply major variations in the speed and reliability of
the communication network, and are reflected in the distributed operating-system
design
...
5
...
10) emerged in the early 1970s as a way
for computers to communicate and to share data with one another
...
18
...
10
workstation
PC
Local-area network
...
Because each
small computer is likely to need access to a full complement of peripheral devices
(such as disks and printers), and because some form of data sharing is likely to occur in a single enterprise, it was a natural step to connect these small systems into a
network
...
All the sites in such systems
are close to one another, so the communication links tend to have a higher speed and
lower error rate than do their counterparts in wide-area networks
...
Communication speeds range from a few megabits per
second (for wireless local-area networks), to 1 gigabit per second for Gigabit Ethernet
...
A storage-area network (SAN) is a special type of high-speed local-area network
designed to connect large banks of storage devices (disks) to computers that use the
data
...
The motivation for using storage-area networks to connect multiple computers to large banks
of storage devices is essentially the same as that for shared-disk databases, namely
• Scalability by adding more computers
• High availability, since data is still accessible even if a computer fails
RAID organizations are used in the storage devices to ensure high availability of the
data, permitting processing to continue even if individual disks fail
...
18
...
5
...
Systems that allowed remote terminals to be connected to a central computer
via telephone lines were developed in the early 1960s, but they were not true WANs
...
Work on the Arpanet
began in 1968
...
Typical links on the Internet are fiber-optic lines and, sometimes,
satellite channels
...
The last link, to end user sites, is often based on digital subscriber loop (DSL) technology (supporting a few megabits per
second), or cable modem (supporting 10 megabits per second), or dial-up modem
connections over phone lines (supporting up to 56 kilobits per second)
...
• In continuous connection WANs, such as the wired Internet, hosts are connected to the network at all times
...
For applications where consistency is not critical,
such as sharing of documents, groupware systems such as Lotus Notes allow updates of remote data to be made locally, and the updates are then propagated back
to the remote site periodically
...
A mechanism for detecting
conflicting updates is described later, in Section 23
...
4; the resolution mechanism for
conflicting updates is, however, application dependent
...
6 Summary
• Centralized database systems run entirely on a single computer
...
Client–server interface protocols have
helped the growth of client–server database systems
...
Transaction servers have multiple processes, possibly running on multiple
processors
...
18
...
In addition
to processes that handle queries, there are system processes that carry out
tasks such as lock and log management and checkpointing
...
Such systems strive to
minimize communication between clients and servers by caching data
and locks at the clients
...
• Parallel database systems consist of multiple processors and multiple disks
connected by a fast interconnection network
...
Scaleup measures how well we can handle an increased number of
transactions by increasing parallelism
...
• Parallel database architectures include the shared-memory, shared-disk,
shared-nothing, and hierarchical architectures
...
• A distributed database is a collection of partially independent databases that
(ideally) share a common schema, and coordinate processing of transactions
that access nonlocal data
...
• Principally, there are two types of communication networks: local-area networks and wide-area networks
...
Wide-area networks connect nodes spread over a large
geographical area
...
Storage-area networks are a special type of local-area network designed
to provide fast interconnection between large banks of storage devices and
multiple computers
...
18
...
1 Why is it relatively easy to port a database from a single processor machine to
a multiprocessor machine if individual queries need not be parallelized?
18
...
On the other hand, data server architectures are popular for client-server object-oriented database systems, where
transactions are expected to be relatively long
...
18
...
What would
be the drawback of such an architecture?
18
...
Consider instead a scenario
where client and server machines have exactly the same power
...
5 Consider an object-oriented database system based on a client-server architecture, with the server acting as a data server
...
What is the effect of the speed of the interconnection between the client
and the server on the choice between object and page shipping?
b
...
The page cache stores data in units
of a page, while the object cache stores data in units of objects
...
Describe one benefit of an object cache
over a page cache
...
6 What is lock de-escalation, and under what conditions is it required? Why is it
not required if the unit of data shipping is an item?
18
...
Suppose the company is growing rapidly
each year, and has outgrown its current computer system
...
8 Suppose a transaction is written in C with embedded SQL, and about 80 percent
of the time is spent in the SQL code, with the remaining 20 percent spent in C
code
...
18
...
10 Consider a bank that has a collection of sites, each running a database system
...
Would such a system qualify as a distributed database?
Why?
18
...
Such networks are often configured with a
server site and multiple client sites
...
What is the advantage of such an
architecture over one where a site can exchange data with another site only by
first dialing it up?
Bibliographical Notes
Patterson and Hennessy [1995] and Stone [1993] are textbooks that provide a good
introduction to the area of computer architecture
...
Geiger [1995] and
Signore et al
...
North
[1995] describes the use of a variety of tools for client–server database access
...
[1991] and Franklin et al
...
Biliris and Orenstein [1994] survey object storage
management systems, including client–server-related issues
...
[1992]
and Mohan and Narang [1994] describe recovery techniques for client-server systems
...
A survey of parallel computer architectures is
presented by Duncan [1990]
...
Ozsu and Valduriez [1999], Bell and Grimson [1992] and Ceri and Pelagatti [1984]
provide textbook coverage of distributed database systems
...
Comer and Droms [1999] and Thomas [1996] describe computer networking
and the Internet
...
Discussions concerning ATM networks and switches are offered
by de Prycker [1993]
...
Chapter 19
Distributed Databases
...
Furthermore, the database systems that run
on each site may have a substantial degree of mutual independence
...
Each site may participate in the execution of transactions that access data at one
site, or several sites
...
This distribution of data is the cause of
many difficulties in transaction processing and query processing
...
We start by classifying distributed databases as homogeneous or heterogeneous,
in Section 19
...
We then address the question of how to store data in a distributed
database in Section 19
...
In Section 19
...
In Section 19
...
In Section 19
...
In Section 19
...
We
address query processing in distributed databases in Section 19
...
In Section 19
...
In Section 19
...
19
...
In such a system, local sites surrender a portion of their autonomy
in terms of their right to change schemas or database management system software
...
In contrast, in a heterogeneous distributed database, different sites may use different schemas, and different database management system software
...
The differences in schemas are often a major problem
for query processing, while the divergence in software becomes a hindrance for processing transactions that access multiple sites
...
However,
in Section 19
...
Transaction processing issues in such systems are covered later, in
Section 24
...
19
...
There are two approaches to
storing this relation in the distributed database:
• Replication
...
The alternative to replication
is to store only one copy of relation r
...
The system partitions the relation into several fragments, and
stores each fragment at a different site
...
In the following subsections, we elaborate on each of these techniques
...
2
...
In the most
extreme case, we have full replication, in which a copy is stored in every site in the
system
...
• Availability
...
Thus, the system can continue to process queries
involving r, despite the failure of one site
...
In the case where the majority of accesses to the relation r result in only the reading of the relation, then several sites can process
queries involving r in parallel
...
Hence, data replication minimizes movement of data between
sites
...
• Increased overhead on update
...
Thus,
whenever r is updated, the update must be propagated to all sites containing
replicas
...
For example, in a banking system,
where account information is replicated in various sites, it is necessary to ensure that the balance in a particular account agrees in all sites
...
However, update transactions incur
greater overhead
...
We can simplify the management of replicas of relation r by choosing one of them
as the primary copy of r
...
Similarly, in an airline-reservation system, a flight can be associated with the site at which the flight originates
...
5
...
2
...
, rn
...
There are two different schemes for fragmenting a relation: horizontal fragmentation and vertical fragmentation
...
Vertical fragmentation splits the
relation by decomposing the scheme R of relation r
...
, rn
...
As an illustration, the account relation can be divided into several different fragments, each of which consists of tuples of accounts belonging to a particular branch
...
In general, a horizontal fragment can be defined as a selection on the global relation
r
...
19
...
By changing the selection predicates
used to construct the fragments, we can have a particular tuple of r appear in more
than one of the ri
...
Vertical fragmentation of r(R) involves the definition of several subsets
of attributes R1 , R2 ,
...
More generally, any superkey can be
used
...
The tuple-id value of a tuple is a unique value that distinguishes the tuple from all
other tuples
...
The physical or logical address for a tuple
can be used as a tuple-id, since each tuple has a unique address
...
For privacy reasons, this relation may be fragmented into a relation employee-private-info containing employee-id and salary, and another relation employee-public-info containing attributes employee-id, name, and designation
...
The two types of fragmentation can be applied to a single schema; for instance, the
fragments obtained by horizontally fragmenting a relation can be further partitioned
vertically
...
In general, a fragment can be replicated,
replicas of fragments can be fragmented further, and so on
...
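Both fragmentation schemes, and the way the original relation is recovered from the fragments, can be sketched with relations held as lists of dictionaries. The tuples, branch names, and helper names below are hypothetical. Only the schemas (account-number, branch-name, balance) and (employee-id, name, designation, salary) come from the text.

```python
def horizontal_fragment(relation, predicate):
    """A horizontal fragment is a selection on the global relation."""
    return [t for t in relation if predicate(t)]

def vertical_fragment(relation, attributes):
    """A vertical fragment is a projection; it must keep a key for rejoining."""
    return [{a: t[a] for a in attributes} for t in relation]

def union(*fragments):
    """Reconstruct a horizontally fragmented relation (duplicates removed)."""
    merged = []
    for fragment in fragments:
        for t in fragment:
            if t not in merged:
                merged.append(t)
    return merged

def natural_join(left, right, key):
    """Reconstruct a vertically fragmented relation by joining on the key."""
    by_key = {t[key]: t for t in right}
    return [{**l, **by_key[l[key]]} for l in left if l[key] in by_key]

# Hypothetical sample data.
account = [
    {"account-number": "A-1", "branch-name": "Hillside", "balance": 500},
    {"account-number": "A-2", "branch-name": "Valleyview", "balance": 900},
    {"account-number": "A-3", "branch-name": "Hillside", "balance": 50},
]
employee_info = [
    {"employee-id": 1, "name": "Lee", "designation": "Teller", "salary": 3000},
    {"employee-id": 2, "name": "Kim", "designation": "Manager", "salary": 5000},
]

# One horizontal fragment per branch.
account_1 = horizontal_fragment(account, lambda t: t["branch-name"] == "Hillside")
account_2 = horizontal_fragment(account, lambda t: t["branch-name"] == "Valleyview")

# Vertical fragments share the key employee-id so the join is lossless.
private_info = vertical_fragment(employee_info, ["employee-id", "salary"])
public_info = vertical_fragment(employee_info, ["employee-id", "name", "designation"])
```

The union of the horizontal fragments yields the account tuples again, and joining the two vertical fragments on employee-id yields employee_info. This works only because both vertical fragments keep the key.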
2
...
This characteristic, called data transparency, can take several forms:
• Fragmentation transparency
...
• Replication transparency
...
The distributed system may replicate an object to increase either system per-
19
...
Users do not have to be concerned with what
data objects have been replicated, or where replicas have been placed
...
Users are not required to know the physical location
of the data
...
Data items, such as relations, fragments, and replicas, must have unique names
...
In a distributed database,
however, we must take care to ensure that two sites do not use the same name for
distinct data items
...
The name server helps to ensure that the same name does not get used
for different data items
...
This approach, however, suffers from two major disadvantages
...
Second, if the name server
crashes, it may not be possible for any site in the distributed system to continue
to run
...
This approach ensures that no two sites
generate the same name (since each site has a unique identifier)
...
This solution, however, fails to achieve location transparency, since site identifiers are attached to names
...
account, or account@site17, rather than as simply account
...
To overcome this problem, the database system can create a set of alternative
names or aliases for data items
...
The mapping of aliases
to the real names can be stored at each site
...
Furthermore, the user will be unaffected if the
database administrator decides to move a data item from one site to another
...
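The naming scheme just described can be illustrated with a minimal sketch: real names carry a site identifier, while users refer to aliases that each site maps to real names. The class and method names are assumptions made for illustration, and "site3" is a hypothetical site.

```python
def qualified_name(site_id: str, local_name: str) -> str:
    """Prefix a locally chosen name with the unique identifier of its site."""
    return f"{site_id}.{local_name}"

class AliasTable:
    """Per-site mapping from user-visible aliases to real (site-qualified) names."""

    def __init__(self):
        self._aliases = {}

    def define(self, alias: str, real_name: str) -> None:
        self._aliases[alias] = real_name

    def resolve(self, name: str) -> str:
        # Names without an alias are assumed to be real names already.
        return self._aliases.get(name, name)
```

If the administrator moves the data item, only the mapping changes; queries written against the alias account remain valid.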
Instead, the
system should determine which replica to reference on a read request, and should
update all replicas on a write request
...
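Replication transparency can be sketched as a wrapper that hides the replicas behind a single read and write interface. This is only a read-one, write-all illustration under the assumption that all replica sites are reachable on a write; the class name and site names are hypothetical.

```python
class ReplicatedItem:
    """A data item stored at several sites, accessed as if it were one copy."""

    def __init__(self, sites, value):
        self.replicas = {site: value for site in sites}

    def read(self, available_sites):
        # The system, not the user, picks a replica at a reachable site.
        for site, value in self.replicas.items():
            if site in available_sites:
                return value
        raise RuntimeError("no replica is reachable")

    def write(self, value):
        # Every replica must be updated, or the copies would diverge.
        for site in self.replicas:
            self.replicas[site] = value
```

Reads can be served by any one site, which is where the availability and parallelism benefits come from; the cost is that each write touches every replica.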
19
...
1)
...
The local transactions are those
that access and update data in only one local database; the global transactions are
those that access and update data in several local databases
...
However, for global transactions, this task is much more complicated, since several
sites may be participating in execution
...
In this section we study the system structure of a distributed database, and its
possible failure modes
...
4 we study protocols for ensuring atomic commit of global transactions, and
in Section 19
...
In Section 19
...
19
...
1 System Structure
Each site has its own local transaction manager, whose function is to ensure the ACID
properties of those transactions that execute at that site
...
To understand how such a manager
can be implemented, consider an abstract model of a transaction system, in which
each site contains two subsystems:
• The transaction manager manages the execution of those transactions (or subtransactions) that access data stored in a local site
...
• The transaction coordinator coordinates the execution of the various transactions (both local and global) initiated at that site
...
1
...
1
System architecture
...
The structure of a transaction manager is similar in many respects to the structure
of a centralized system
...
The transaction coordinator subsystem is not needed in the centralized environment, since a transaction accesses data at only a single site
...
For each such transaction, the coordinator is responsible
for
• Starting the execution of the transaction
• Breaking the transaction into a number of subtransactions and distributing
these subtransactions to the appropriate sites for execution
• Coordinating the termination of the transaction, which may result in the transaction being committed at all sites or aborted at all sites
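The second responsibility, breaking a transaction into subtransactions, can be sketched as grouping operations by the site that stores each data item. The operation format and the location catalog are assumptions made for illustration.

```python
from collections import defaultdict

def split_into_subtransactions(operations, location):
    """Group a transaction's operations by the site holding each data item.

    operations: list of (operation, data_item) pairs, in execution order.
    location:   dict mapping each data item to the site that stores it.
    """
    subtransactions = defaultdict(list)
    for operation, item in operations:
        subtransactions[location[item]].append((operation, item))
    return dict(subtransactions)
```

A funds transfer that reads and writes account A at one site and account B at another therefore yields two subtransactions, one per site, which the coordinator then distributes for execution.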
19
...
2 System Failure Modes
A distributed system may suffer from the same types of failure that a centralized
system does (for example, software errors, hardware errors, or disk crashes)
...
The basic failure types are
• Failure of a site
• Loss of messages
• Failure of a communication link
• Network partition
The loss or corruption of messages is always a possibility in a distributed system
...
Information about such protocols may be found in standard textbooks on networking (see the bibliographical notes)
...
If a communication link fails, messages that would have been transmitted across the link must be
rerouted
...
In other cases, a failure may result in there being no connection between some pairs of sites
...
19
...
Note that, under this definition, a subsystem may consist of a
single node
...
4 Commit Protocols
If we are to ensure atomicity, all the sites in which a transaction T executed must
agree on the final outcome of the execution
...
To ensure this property, the transaction coordinator of T must
execute a commit protocol
...
4
...
An alternative is the
three-phase commit protocol (3PC), which avoids certain disadvantages of the 2PC
protocol but adds to complexity and overhead
...
4
...
19
...
1 Two-Phase Commit
We first describe how the two-phase commit protocol (2PC) operates during normal
operation, then describe how it handles failures and finally how it carries out recovery and concurrency control
...
19
...
1
...
• Phase 1
...
It then sends a prepare T message to all sites at which T executed
...
If the answer is no, it adds a
record
to Ci
...
The
transaction manager then replies with a ready T message to Ci
...
When Ci receives responses to the prepare T message from all the
sites, or when a prespecified interval of time has elapsed since the prepare
T message was sent out, Ci can determine whether the transaction T can be
committed or aborted
...
Otherwise, transaction T must be
aborted
...
At
this point, the fate of the transaction has been sealed
...
coordinator sends either a commit T or an abort T message to all participating
sites
...
A site at which T executed can unconditionally abort T at any time before it sends
the message ready T to the coordinator
...
The ready T message is, in effect, a promise
by a site to follow the coordinator’s order to commit T or to abort T
...
Otherwise, if
the site crashes after sending ready T, it may be unable to make good on its promise
...
Since unanimity is required to commit a transaction, the fate of T is sealed as soon
as at least one site responds abort T
...
The final verdict
regarding T is determined at the time that the coordinator writes that verdict (commit
or abort) to the log and forces that verdict to stable storage
...
When the coordinator receives the acknowledge T message from all the sites, it adds the record
...
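The normal-case message flow of 2PC can be summarized in a short single-process simulation. It is a sketch under simplifying assumptions: there are no timeouts or failures, logs are plain lists standing in for stable storage, and the log-record names ("prepare", "ready", "no", "commit", "abort", "complete") are illustrative labels rather than the exact records of the protocol description.

```python
from dataclasses import dataclass, field

@dataclass
class Participant:
    """A site at which transaction T executed."""
    name: str
    can_commit: bool = True
    log: list = field(default_factory=list)

    def prepare(self, txn: str) -> str:
        # Phase 1: log the local decision before answering the coordinator.
        if self.can_commit:
            self.log.append(("ready", txn))
            return "ready"
        self.log.append(("no", txn))
        return "abort"

    def finish(self, txn: str, decision: str) -> str:
        # Phase 2: record the coordinator's verdict, then acknowledge it.
        self.log.append((decision, txn))
        return "acknowledge"

def two_phase_commit(txn, participants, coordinator_log):
    """Run both phases and return the coordinator's verdict."""
    coordinator_log.append(("prepare", txn))
    votes = [p.prepare(txn) for p in participants]
    # Unanimity is required: a single abort vote seals the fate of T.
    decision = "commit" if all(v == "ready" for v in votes) else "abort"
    coordinator_log.append((decision, txn))   # the verdict is final once logged
    acks = [p.finish(txn, decision) for p in participants]
    if all(a == "acknowledge" for a in acks):
        coordinator_log.append(("complete", txn))
    return decision
```

A real coordinator would also treat a missing reply within the prespecified interval as an abort vote; that path is omitted here.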
4
...
2 Handling of Failures
The 2PC protocol responds in different ways to various types of failures:
• Failure of a participating site
...
If the site fails after the coordinator has received the ready T message
from the site, the coordinator executes the rest of the commit protocol in the
normal fashion, ignoring the failure of the site
...
Let T be one such transaction
...
In this case, the site executes
redo(T)
...
In this case, the site executes undo(T)
...
In this case, the site must consult Ci
to determine the fate of T
...
In the former case, it executes redo(T); in the latter
case, it executes undo(T)
...
It does so by sending a querystatus T message to all the
sites in the system
...
It then notifies Sk about this outcome
...
The decision concerning T is
postponed until Sk can obtain the needed information
...
It continues
to do so until a site that contains the needed information recovers
...
The log contains no control records (abort, commit, ready) concerning T
...
Since the failure of Sk precludes the sending of such a response,
by our algorithm Ci must abort T
...
• Failure of the coordinator
...
We shall see that, in certain cases, the participating sites
cannot decide whether to commit or abort T, and therefore these sites must
wait for the recovery of the failed coordinator
...
If an active site contains an
aborted
...
However, the coordinator may have decided to abort T,
but not to commit T
...
If none of the preceding cases holds, then all active sites must have a
as
...
Thus, the active sites
must wait for Ci to recover
...
For example, if locking is used, T may
hold locks on data at active sites
...
During this time, other
transactions may be forced to wait for T
...
This
situation is called the blocking problem, because T is blocked pending the
recovery of site Ci
...
When a network partitions, two possibilities exist:
1
...
In this
case, the failure has no effect on the commit protocol
...
The coordinator and its participants belong to several partitions
...
Sites that are not in the partition containing
the coordinator simply execute the protocol to deal with failure of the
coordinator
...
the coordinator follow the usual commit protocol, assuming that the sites
in the other partitions have failed
...
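The case analysis above reduces to two small decision procedures: what a recovering participant does with a transaction T found in its log, and what the active sites can conclude when the coordinator has failed. The sketch below assumes logs are lists of (record, transaction) pairs with the illustrative record names "ready", "commit", and "abort"; the function names and the ask_coordinator callback are hypothetical.

```python
def recovery_action(log, txn, ask_coordinator):
    """Action for a recovering participant, based on its own log."""
    records = {kind for kind, t in log if t == txn}
    if "commit" in records:
        return "redo"
    if "abort" in records:
        return "undo"
    if "ready" in records:
        # The site promised to obey the coordinator, so it must ask.
        outcome = ask_coordinator(txn)   # "commit", "abort", or None if unknown
        if outcome == "commit":
            return "redo"
        if outcome == "abort":
            return "undo"
        return "wait"
    # No control records: the site never voted, so T must have been aborted.
    return "undo"

def decide_without_coordinator(active_logs, txn):
    """Decision the active sites can reach while the coordinator is down."""
    per_site = [{kind for kind, t in log if t == txn} for log in active_logs]
    if any("commit" in records for records in per_site):
        return "commit"
    if any("abort" in records for records in per_site):
        return "abort"
    if any("ready" not in records for records in per_site):
        # Some site never voted ready, so the coordinator cannot have committed.
        return "abort"
    return "wait"   # every active site is ready: the blocking problem
```

The final "wait" branch is exactly the blocking problem: every active site holds only a ready record, so none of them can rule out either outcome.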
19
...
1
...
9
...
The recovering site must determine the commit–abort status of such transactions by
contacting other sites, as described in Section 19
...
1
...
If recovery is done as just described, however, normal transaction processing at
the site cannot begin until all in-doubt transactions have been committed or rolled
back
...
Further, if the coordinator has failed, and no other site has
information about the commit–abort status of an incomplete transaction, recovery
potentially could become blocked if 2PC is used
...
To circumvent this problem, recovery algorithms typically provide support for
noting lock information in the log
...
) Instead of writing a
a
T when the log record is written
...
After lock reacquisition is complete for all in-doubt transactions, transaction processing can start at the site, even before the commit–abort status of the in-doubt transactions is determined
...
Thus, site recovery is faster, and
never gets blocked
...
19
...
2 Three-Phase Commit
The three-phase commit (3PC) protocol is an extension of the two-phase commit protocol that avoids the blocking problem under certain assumptions
...
Under these assumptions, the protocol avoids blocking
by introducing an extra third phase where multiple sites are involved in the decision
to commit
...
19
...
If the coordinator fails, the remaining sites first select a new coordinator
...
The new
coordinator restarts the third phase of the protocol if some site knew that the old coordinator intended to commit the transaction
...
While the 3PC protocol has the desirable property of not blocking unless k sites
fail, it has the drawback that a partitioning of the network will appear to be the same
as more than k sites failing, which would lead to blocking
...
Because of its overhead, the 3PC protocol is not
widely used
...
19
...
3 Alternative Models of Transaction Processing
For many applications, the blocking problem of two-phase commit is not acceptable
...
In this section we describe how to use persistent messaging to avoid the problem of
distributed commit, and then briefly outline the larger issue of workflows; workflows
are considered in more detail in Section 24
...
To understand persistent messaging consider how one might transfer funds between two different banks, each with its own computer
...
However, the transaction may have to update the total bank balance, and blocking could
have a serious impact on all other transactions at each bank, since almost all transactions at the bank would update the total bank balance
...
The bank first
deducts the amount of the check from the available balance and prints out a check
...
After
verifying the check, the bank increases the local balance by the amount of the check
...
So that funds are not
lost or incorrectly increased, the check must not be lost, and must not be duplicated
and deposited more than once
...
Persistent messages are messages that are guaranteed to be delivered to the recipient exactly once (neither less nor more), regardless of failures, if the transaction
sending the message commits, and are guaranteed to not be delivered if the transaction aborts
...
In contrast, regular
messages may be lost or may even be delivered multiple times in some situations
...
Error handling is more complicated with persistent messaging than with two-phase commit
...
Both sites must therefore be provided with error handling code, along
with code to handle the persistent messages
...
The types of exception conditions that may arise depend on the application, so
it is not possible for the database system to handle exceptions automatically
...
For
instance, it is not acceptable to just lose the money being transferred if the receiving
account has been closed; the money must be credited back to the originating account,
and if that is not possible for some reason, humans must be alerted to resolve the
situation manually
...
In fact, few
organizations would agree to support two-phase commit for transactions originating
outside the organization, since failures could result in blocking of access to local data
...
Workflows provide a general model of transaction processing involving multiple
sites and possibly human processing of certain steps
...
The
steps, together, form a workflow
...
2
...
We now consider the implementation of persistent messaging
...
The message is also given
a unique message identifier
...
The usual database concurrency
control mechanisms ensure that the system process reads the message only
after the transaction that wrote the message commits; if the transaction aborts,
the usual recovery mechanism would delete the message from the relation
...
If it receives no
acknowledgement from the destination site, after some time it sends the message again
...
In case of permanent failures, the system will decide, after some period of time, that the
message is undeliverable
...
Writing the message to a relation and processing it only after the transaction
commits ensures that the message will be delivered if and only if the transaction commits
...
• Receiving site protocol: When a site receives a persistent message, it runs a
transaction that adds the message to a special received-messages relation, provided it is not already present in the relation (the unique message identifier
detects duplicates)
...
Note that sending the acknowledgment before the transaction commits is
not safe, since a system failure may then result in loss of the message
...
In many messaging systems, it is possible for messages to get delayed arbitrarily, although such delays are very unlikely
...
Deleting
it could result in a duplicate delivery not being detected
...
To deal with this problem, each message is given a timestamp, and if the timestamp of a received
message is older than some cutoff, the message is discarded
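The receiving-site protocol and the timestamp cutoff can be sketched in a few lines of Python. This is only an in-memory illustration under stated assumptions: the class name Receiver, the dictionary that stands in for the received-messages relation, and the cutoff_age parameter are hypothetical and are not taken from any particular messaging system.

```python
class Receiver:
    """Toy receiving site; a dict stands in for the received-messages relation."""

    def __init__(self, cutoff_age):
        self.cutoff_age = cutoff_age   # hypothetical: maximum accepted message age
        self.received = {}             # message id -> (timestamp, payload)
        self.applied = []              # payloads that took effect, each exactly once

    def on_message(self, msg_id, timestamp, payload, now):
        """Handle one delivery; return True if an acknowledgment should be sent."""
        if now - timestamp > self.cutoff_age:
            return False               # older than the cutoff: discard the message
        if msg_id not in self.received:
            # Stands in for the transaction that records and processes the message.
            self.received[msg_id] = (timestamp, payload)
            self.applied.append(payload)
        # Acknowledge only after the message has been recorded (or was a duplicate).
        return True

    def purge(self, now):
        """Forget recorded messages older than the cutoff; late copies are rejected by age."""
        self.received = {
            msg_id: (ts, payload)
            for msg_id, (ts, payload) in self.received.items()
            if now - ts <= self.cutoff_age
        }
```

Because a duplicate is acknowledged but not applied again, the sender may retransmit freely without the message taking effect twice.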
...
19
...
We assume
that each site participates in the execution of a commit protocol to ensure global transaction atomicity
...
If any site containing a replica of a data item has failed, updates to the
data item cannot be processed
...
6 we describe protocols that can continue
transaction processing even if some sites or links have failed, thereby providing high
availability
...
5
...
The only change that needs to be incorporated is in the way the lock
manager deals with replicated data
...
As in
Chapter 16, we shall assume the existence of the shared and exclusive lock modes
...
19
...
1
...
All lock and unlock requests are made at
site Si
...
The lock manager determines whether the lock can be granted immediately
...
Otherwise, the request is delayed until it can
be granted, at which time a message is sent to the site at which the lock request was
initiated
...
In the case of a write, all the sites where a replica of
the data item resides must be involved in the writing
...
This scheme requires two messages for handling
lock requests, and one message for handling unlock requests
...
Since all lock and unlock requests are made at one
site, the deadlock-handling algorithms discussed in Chapter 16 can be applied
directly to this environment
...
The site Si becomes a bottleneck, since all requests must be processed there
...
If the site Si fails, the concurrency controller is lost
...
6
...
19
...
1
...
Each site maintains a local lock manager whose function is to administer the lock
and unlock requests for those data items that are stored in that site
...
If data item Q is locked in an incompatible mode, then the request is delayed
until it can be granted
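A local lock manager of this kind can be sketched as follows. The sketch assumes only the shared (S) and exclusive (X) modes, serves waiting requests in arrival order, and ignores messages and failures; the class and method names are hypothetical.

```python
from collections import deque

# Only a shared lock is compatible with another shared lock.
COMPATIBLE = {("S", "S")}


class LockManager:
    """Toy lock manager for the data items stored at one site."""

    def __init__(self):
        self.held = {}      # item -> list of (transaction, mode) currently granted
        self.waiting = {}   # item -> queue of delayed (transaction, mode) requests

    def request(self, txn, item, mode):
        """Grant the lock at once if possible; otherwise delay the request."""
        holders = self.held.setdefault(item, [])
        queue = self.waiting.setdefault(item, deque())
        if not queue and all((m, mode) in COMPATIBLE for _, m in holders):
            holders.append((txn, mode))
            return True
        queue.append((txn, mode))
        return False

    def release(self, txn, item):
        """Release txn's lock and return the transactions whose requests are now granted."""
        self.held[item] = [(t, m) for t, m in self.held.get(item, []) if t != txn]
        queue = self.waiting.get(item, deque())
        granted = []
        while queue:
            t, m = queue[0]
            if not all((hm, m) in COMPATIBLE for _, hm in self.held[item]):
                break
            queue.popleft()
            self.held[item].append((t, m))
            granted.append(t)
        return granted
```

In a distributed setting, the return value of release would correspond to the messages sent to the sites at which the delayed requests were initiated.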
...
There are several alternative ways of dealing with replication of data items, which
we study in Sections 19
...
1
...
5
...
6
...
It has a reasonably low overhead, requiring two message transfers for handling lock requests, and
one message transfer for handling unlock requests
...
The deadlock-handling algorithms discussed in Chapter 16 must be modified, as we
shall discuss in Section 19
...
4, to detect global deadlocks
...
5
...
3 Primary Copy
When a system uses data replication, we can choose one of the replicas as the primary
copy
...
When a transaction needs to lock a data item Q, it requests a lock at the primary
site of Q
...
Thus, the primary copy enables concurrency control for replicated data to be handled like that for unreplicated data
...
However, if the primary site of Q fails, Q is inaccessible, even though other sites
containing a replica may be accessible
...
5
...
4 Majority Protocol
The majority protocol works this way: If data item Q is replicated in n different sites,
then a lock-request message must be sent to more than one-half of the n sites in which
Q is stored
...
As before, the response is delayed until the request can
be granted
...
This scheme deals with replicated data in a decentralized manner, thus avoiding
the drawbacks of central control
...
The majority protocol is more complicated to implement
than are the previous schemes
...
• Deadlock handling
...
As an illustration, consider
a system with four sites and full replication
...
Transaction T1 may succeed
in locking Q at sites S1 and S3 , while transaction T2 may succeed in locking
Q at sites S2 and S4
...
Luckily, we can avoid such deadlocks with relative
ease, by requiring all sites to request locks on the replicas of a data item in the
same predetermined order
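The effect of a predetermined request order can be seen in a small simulation. The function below is a sketch under simplifying assumptions: every lock is exclusive, transactions take turns issuing one request at a time, a request for a replica that is already locked simply waits, and no lock is ever released. The function name and the data layout are hypothetical.

```python
def lock_majority(orders, n_sites):
    """Simulate interleaved exclusive-lock requests on the replicas of one item.

    orders maps each transaction to the list of sites in the order it asks them.
    Returns, per transaction, whether it obtained locks at a majority of sites.
    """
    majority = n_sites // 2 + 1        # more than one-half of the n sites
    owner = {}                         # site -> transaction holding the lock there
    held = {t: 0 for t in orders}      # number of replicas locked so far
    nxt = {t: 0 for t in orders}       # index of the next site to ask
    waiting = set()                    # transactions delayed at a locked replica
    progressed = True
    while progressed:
        progressed = False
        for t, order in orders.items():
            if t in waiting or held[t] >= majority or nxt[t] >= len(order):
                continue
            site = order[nxt[t]]
            if site in owner:
                waiting.add(t)         # the response is delayed until release
            else:
                owner[site] = t
                held[t] += 1
                nxt[t] += 1
            progressed = True
    return {t: held[t] >= majority for t in orders}


# Different orders: each transaction locks two of four replicas and both wait.
unordered = lock_majority(
    {"T1": ["S1", "S3", "S2", "S4"], "T2": ["S2", "S4", "S1", "S3"]}, 4)

# Same predetermined order: T2 waits at S1 while T1 reaches a majority.
ordered = lock_majority(
    {"T1": ["S1", "S2", "S3", "S4"], "T2": ["S1", "S2", "S3", "S4"]}, 4)
```

In the ordered run T2 is merely delayed until T1 finishes, whereas in the unordered run neither transaction can ever proceed.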
...
19
...
1
...
The difference from
the majority protocol is that requests for shared locks are given more favorable treatment than requests for exclusive locks
...
When a transaction needs to lock data item Q, it simply requests
a lock on Q from the lock manager at one site that contains a replica of Q
...
When a transaction needs to lock data item Q, it requests a
lock on Q from the lock manager at all sites that contain a replica of Q
...
The biased scheme has the advantage of imposing less overhead on read operations than does the majority protocol
...
However, the additional overhead on writes is a disadvantage
...
19
...
1
...
The
quorum consensus protocol assigns each site a nonnegative weight
...
To execute a write operation, enough replicas must be written so that their
total weight is ≥ Qw
...
For instance, with a small read quorum, reads need to read fewer
replicas, but the write quorum will be higher, hence writes can succeed only if correspondingly more replicas are available
...
In fact, by setting weights and quorums appropriately, the quorum consensus protocol can simulate the majority protocol and the biased protocols
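The following sketch checks a choice of quorums and shows the two special cases. It assumes the standard quorum conditions, namely that the read and write quorums together exceed the total weight S and that twice the write quorum exceeds S; these conditions are stated here as an assumption, and the function names are hypothetical.

```python
def valid_quorums(weights, q_read, q_write):
    """Check the assumed conditions Qr + Qw > S and 2 * Qw > S."""
    total = sum(weights.values())
    return q_read + q_write > total and 2 * q_write > total


def can_perform(weights, available, quorum):
    """An operation can proceed if the available replicas reach the quorum."""
    return sum(weights[s] for s in available) >= quorum


# Four sites with unit weights.
weights = {"S1": 1, "S2": 1, "S3": 1, "S4": 1}
n = len(weights)

majority_like = (n // 2 + 1, n // 2 + 1)   # read and write quorums of a majority
biased_like = (1, n)                        # read any one replica, write all
```

With the biased-like setting a read succeeds as long as one replica is available, but a write fails as soon as any replica is unavailable.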
...
5
...
2 is that each transaction is given a unique timestamp that the system uses in deciding the serialization
order
...
19
...
2
Generation of unique timestamps
...
Then, the various
protocols can be applied directly to the nonreplicated environment
...
In the centralized scheme, a single site distributes the timestamps
...
In the distributed scheme, each site generates a unique local timestamp by using
either a logical counter or the local clock
...
2)
...
Compare this
technique for generating unique timestamps with the one that we presented in Section 19
...
3 for generating unique names
...
In such a case, the fast site’s logical counter will be larger
than that of other sites
...
What we need is a mechanism to ensure
that local timestamps are generated fairly across the system
...
The logical
clock can be implemented as a counter that is incremented after a new local timestamp is generated
...
In this case,
site Si advances its logical clock to the value x + 1
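A minimal sketch of the distributed scheme with a logical clock follows. It assumes that a global timestamp is the pair (local timestamp, site identifier), with the site identifier in the least significant position, and that a site advances its clock to x + 1 when it sees a timestamp whose local part x exceeds its current clock value. The class and method names are hypothetical.

```python
class SiteClock:
    """Logical clock of one site; timestamps are (local counter, site id) pairs."""

    def __init__(self, site_id):
        self.site_id = site_id
        self.counter = 0

    def new_timestamp(self):
        """Generate a globally unique timestamp, then increment the counter."""
        ts = (self.counter, self.site_id)
        self.counter += 1
        return ts

    def observe(self, ts):
        """Called when a transaction with timestamp ts visits this site."""
        x, _ = ts
        if x > self.counter:
            self.counter = x + 1   # catch up with the faster site
```

Python compares tuples position by position, so two timestamps with the same local part are still ordered by site identifier, and a slow site that has observed a fast site's timestamp issues later timestamps from then on.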
...
Since
clocks may not be perfectly accurate, a technique similar to that for logical clocks
must be used to ensure that no clock gets far ahead of or behind another clock
...
5
...
With master-slave replication, the database allows updates at a primary site,
and automatically propagates updates to replicas at other sites
...
An important feature of such replication is that transactions do not obtain locks at
remote sites
...
(but perhaps outdated) view of the database, the replica should reflect a transaction-consistent snapshot of the data at the primary; that is, the replica should reflect all
updates of transactions up to some transaction in the serialization order, and should
not reflect any updates of later transactions in the serialization order
...
Master-slave replication is particularly useful for distributing information, for instance from a central office to branch offices of an organization
...
Updates should be propagated periodically — every night, for example — so that update propagation does not interfere with
query processing
...
It also supports snapshot refresh, which can be done either by recomputing the
snapshot or by incrementally updating it
...
With multimaster replication (also called update-anywhere replication) updates
are permitted at any replica of a data item, and are automatically propagated to
all replicas
...
Transactions update the local copy and the system updates other replicas
transparently
...
Many
database systems use the biased protocol, where writes have to lock and update all
replicas and reads lock and read any one replica, as their concurrency-control technique
...
Schemes based on lazy propagation allow transaction processing (including updates)
to proceed even if a site is disconnected from the network, thus improving availability, but, unfortunately, do so at the cost of consistency
...
This approach ensures that updates to an item are ordered serially, although
serializability problems can occur, since transactions may read an old value of
some other data item and use it to perform an update
...
This approach can cause even more problems, since the same data item
may be updated concurrently at multiple sites
...
5
...
Further, human intervention may be required to deal with conflicts
...
19
...
4 Deadlock Handling
The deadlock-prevention and deadlock-detection algorithms in Chapter 16 can be
used in a distributed system, provided that modifications are made
...
Similarly, the timestamp-ordering approach could be directly applied to a distributed
environment, as we saw in Section 19
...
2
...
Furthermore, certain deadlock-prevention techniques may require more sites to be involved
in the execution of a transaction than would otherwise be the case
...
Common
techniques for dealing with this issue require that each site keep a local wait-for
graph
...
For example, Figure 19
...
Note that transactions T2 and T3 appear in both graphs,
indicating that the transactions have requested items at both sites
...
When a transaction Ti on site S1 needs a resource in site S2 , it
sends a request message to site S2
...
Clearly, if any local wait-for graph has a cycle, deadlock has occurred
...
To illustrate this problem, we consider the
local wait-for graphs of Figure 19
...
Each wait-for graph is acyclic; nevertheless, a
deadlock exists in the system because the union of the local wait-for graphs contains
a cycle
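The point that acyclic local graphs can still hide a global deadlock is easy to check mechanically. In the sketch below the edge sets for the two sites are hypothetical values chosen for illustration (they are not copied from the figure), and has_cycle is an ordinary depth-first search.

```python
def has_cycle(edges):
    """Return True if the directed graph given as (waiter, holder) pairs has a cycle."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set())
    WHITE, GRAY, BLACK = 0, 1, 2       # unvisited, on the current path, finished
    color = {node: WHITE for node in graph}

    def visit(node):
        color[node] = GRAY
        for succ in graph[node]:
            if color[succ] == GRAY:
                return True            # back edge: a cycle exists
            if color[succ] == WHITE and visit(succ):
                return True
        color[node] = BLACK
        return False

    return any(color[node] == WHITE and visit(node) for node in graph)


# Hypothetical local wait-for graphs; T2 and T3 appear at both sites.
site_s1 = {("T1", "T2"), ("T2", "T3")}
site_s2 = {("T3", "T4"), ("T4", "T2")}
global_graph = site_s1 | site_s2       # union of the local graphs
```

Here neither local graph has a cycle, but the union contains the cycle T2 → T3 → T4 → T2.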
...
4
...
3
Figure: Local wait-for graphs (panel shown: site S2)
...
19
...
3
...
Since there is communication delay in the system,
we must distinguish between two types of wait-for graphs
...
The constructed graph is an approximation generated by
the controller during the execution of the controller’s algorithm
...
Correct means in this case
that, if a deadlock exists, it is reported promptly, and if the system reports a deadlock,
it is indeed in a deadlock state
...
• Periodically, when a number of changes have occurred in a local wait-for
graph
...
When the coordinator invokes the deadlock-detection algorithm, it searches its
global graph
...
The coordinator
must notify all the sites that a particular transaction has been selected as victim
...
This scheme may produce unnecessary rollbacks if:
• False cycles exist in the global wait-for graph
...
5
...
Transaction T2 then requests a resource
held by T3 at site S2 , resulting in the addition of the edge T2 → T3 in S2
...
Deadlock recovery may be initiated, although
no deadlock has occurred
...
19
...
5
False cycles in the global wait-for graph
...
The likelihood of false cycles is usually sufficiently low that they do not cause
a serious performance problem
...
For example,
suppose that site S1 in Figure 19
...
At the same time, the
coordinator has discovered a cycle, and has picked T3 as a victim
...
Deadlock detection can be done in a distributed manner, with several sites taking
on parts of the task, instead of being done at a single site. However, such algorithms
are more complicated and more expensive
...
19
...
In particular, since failures are more likely
in large distributed systems, a distributed database must continue functioning even
when there are various types of failures
...
For a distributed system to be robust, it must detect failures, reconfigure the system
so that computation may continue, and recover when a processor or a link is repaired
...
For example, message
loss is handled by retransmission
...
without receipt of an acknowledgment, is usually a symptom of a link failure
...
Failure to find
such a route is usually a symptom of network partition
...
The system can usually detect that a failure has occurred, but
it may not be able to identify the type of failure
...
It could be that S2 has failed
...
The problem is partly addressed by using multiple links between sites, so that
even if one link fails the sites will remain connected
...
Suppose that site S1 has discovered that a failure has occurred
...
• If transactions were active at a failed/inaccessible site at the time of the failure,
these transactions should be aborted
...
However, in some cases, when data objects are replicated it may be possible
to proceed with reads and updates even though some replicas are inaccessible
...
We address this issue in Section 19
...
1
...
When a
site rejoins, care must be taken to ensure that data at the site is consistent, as
we will see in Section 19
...
3
...
6
...
Examples of central servers
include a name server, a concurrency coordinator, or a global deadlock detector
...
In particular, these situations must be avoided:
• Two or more central servers are elected in distinct partitions
...
19
...
1 Majority-Based Approach
The majority-based approach to distributed concurrency control in Section 19
...
1
...
In this approach, each data object stores
with it a version number to detect when it was last written to
...
The
transaction does not operate on a until it has successfully obtained a lock on a
majority of the replicas of a
...
(Optionally, they may also write this value back to replicas with lower version numbers
...
The new version number is
one more than the highest version number
...
Failures during a transaction (whether network partitions or site failures) can be tolerated as long as (1) the sites available at commit contain a majority of replicas of all
the objects written to and (2) during reads, a majority of replicas are read to find the
version numbers
...
As long as the requirements are satisfied, the two-phase commit protocol can be used,
as usual, on the sites that are available
...
This is because
writes would have updated a majority of the replicas, while reads will read a majority
of the replicas and find at least one replica that has the latest version
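The use of version numbers can be sketched as follows. The sketch ignores locking and commit processing and models each replica as a (version, value) pair; the function names and the sample data are hypothetical.

```python
def read_item(replicas, reachable):
    """Read a majority of replicas and return the (version, value) with the highest version."""
    if 2 * len(reachable) <= len(replicas):
        raise RuntimeError("a majority of the replicas is not reachable")
    return max((replicas[s] for s in reachable), key=lambda pair: pair[0])


def write_item(replicas, reachable, value):
    """Write a majority of replicas with a version one more than the highest one seen."""
    version, _ = read_item(replicas, reachable)
    for s in reachable:
        replicas[s] = (version + 1, value)
    return version + 1


# Five replicas of one data object, all at version 1.
replicas = {s: (1, "old") for s in ("S1", "S2", "S3", "S4", "S5")}
new_version = write_item(replicas, {"S1", "S2", "S3"}, "new")
latest = read_item(replicas, {"S3", "S4", "S5"})
```

Any two majorities share at least one site, which is why the read over S3, S4, and S5 still finds the value written at S1, S2, and S3.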
...
We leave the
(straightforward) details to the reader
...
19
...
2 Read One, Write All Available Approach
As a special case of quorum consensus, we can employ the biased protocol by giving
unit weights to all sites, setting the read quorum to 1, and setting the write quorum to
n (all sites)
...
This protocol is called the read one, write all
protocol since all replicas must be written
...
In this approach, a read operation proceeds as
in the read one, write all scheme; any available replica can be read, and a read lock is
obtained at that replica
...
A write operation is shipped to all replicas; and write locks
are acquired on all the replicas
...
While this approach appears very attractive, there are several complications
...
Further, if the network partitions, each partition may proceed to
update the same data item, believing that sites in the other partitions are all dead
...
19
...
3 Site Reintegration
Reintegration of a repaired site or link into the system requires care
...
If the site had replicas of any data items, it must obtain
the current values of these data items and ensure that it receives all future updates
...
An easy solution is to halt the entire system temporarily while the failed site rejoins
it
...
Techniques have been developed to allow failed sites to reintegrate while
updates to data items proceed concurrently
...
If a failed link recovers, two or more partitions can be rejoined
...
See the bibliographical
notes for more information on recovery in distributed systems
...
6
...
10, and replication in distributed databases are two alternative approaches to providing high availability
...
In particular, remote backup
systems help avoid two-phase commit, and its resultant overheads
...
Thus remote backup systems offer a
lower-cost approach to high availability than replication
...
19
...
5 Coordinator Selection
Several of the algorithms that we have presented require the use of a coordinator
...
One way to
continue execution is by maintaining a backup to the coordinator, which is ready to
assume responsibility if the coordinator fails
...
All messages directed to the coordinator are received
by both the coordinator and its backup
...
The only difference
in function between the coordinator and its backup is that the backup does not take
any action that affects other sites
...
In the event that the backup coordinator detects the failure of the actual coordinator, it assumes the role of coordinator
...
The prime advantage of the backup approach is the ability to continue processing
immediately
...
Frequently, the only source
of some of the requisite information is the failed coordinator
...
Thus, the backup-coordinator approach avoids a substantial amount of delay while
the distributed system recovers from a coordinator failure
...
Furthermore, a coordinator and its backup need to communicate regularly to ensure that their activities are
synchronized
...
In the absence of a designated backup coordinator, or in order to handle multiple
failures, a new coordinator may be chosen dynamically by sites that are live
...
Election algorithms require that a unique identification number be
associated with each active site in the system
...
To keep the notation and the
discussion simple, assume that the identification number of site Si is i and that the
chosen coordinator will always be the active site with the largest identification number
...
The algorithm must send this number to each active
site in the system
...
Suppose that site Si
sends a request that is not answered by the coordinator within a prespecified time
interval T
...
In this situation, it is assumed that the coordinator has failed, and Si tries
to elect itself as the site for the new coordinator
...
Site Si then waits, for a time interval T, for an answer from any one of these sites
...
If Si does receive an answer, it begins a time interval T′, to receive a message
informing it that a site with a higher identification number has been elected
...
)
If Si receives no message within T′, then it assumes the site with a higher number
has failed, and site Si restarts the algorithm
...
If there are no active sites with higher numbers, the recovered site forces all sites with
lower numbers to let it become the coordinator site, even if there is a currently active
coordinator with a lower number
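The outcome of this election can be illustrated with a heavily simplified sketch. It models no messages or timeouts: a site that is not in the set of active sites is treated as one that never answers, and any higher-numbered site that does answer takes over the election. The function name is hypothetical.

```python
def elect(initiator, active):
    """Return the identification number of the site that ends up as coordinator.

    initiator is the site that noticed the failure; active is the set of
    identification numbers of the sites that are currently up.
    """
    higher = [site for site in active if site > initiator]
    if not higher:
        # No higher-numbered site answers within T: the initiator elects itself.
        return initiator
    # Some higher-numbered site answers and runs the same algorithm in turn.
    return elect(min(higher), active)
```

Whichever active site starts the election, the result is the active site with the largest identification number.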
...
19
...
We examined several techniques for choosing a query-processing strategy
that minimizes the amount of time that it takes to compute the answer
...
In a distributed system, we must take into account
several other matters, including
• The cost of data transmission over the network
• The potential gain in performance from having several sites process parts of
the query in parallel
The relative cost of data transfer over the network and data transfer to and from disk
varies widely depending on the type of network and on the speed of the disks
...
Rather, we must
find a good tradeoff between the two
...
7
...
Although the query is simple (indeed, trivial), processing it is not trivial, since the
account relation may be fragmented, replicated, or both, as we saw in Section 19
...
If the account relation is replicated, we have a choice of replica to make
...
However, if a replica is fragmented, the choice is not so easy to make, since we need
to compute several joins or unions to reconstruct the account relation
...
Query optimization
by exhaustive enumeration of all alternative strategies may not be practical in such
situations
...
The result is the expression
σ_{branch-name = “Hillside”}(account_1) ∪ σ_{branch-name = “Hillside”}(account_2)
which includes two subexpressions
...
The second involves only account_2, and thus can be
evaluated at the Valleyview site
...
In evaluating
σ_{branch-name = “Hillside”}(account_2)
we can apply the definition of the account_2 fragment to obtain
σ_{branch-name = “Hillside”}(σ_{branch-name = “Valleyview”}(account))
This expression is the empty set, regardless of the contents of the account relation
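The same simplification can be checked on sample data. In the sketch below the tuples of account are invented for illustration, and the two fragments are defined by selections on branch-name as in the discussion above.

```python
def select(relation, attr, value):
    """Selection: keep the tuples whose attribute attr equals value."""
    return [t for t in relation if t[attr] == value]


# Hypothetical contents of the account relation.
account = [
    {"account-number": "A-305", "branch-name": "Hillside", "balance": 500},
    {"account-number": "A-226", "branch-name": "Hillside", "balance": 336},
    {"account-number": "A-177", "branch-name": "Valleyview", "balance": 205},
    {"account-number": "A-402", "branch-name": "Valleyview", "balance": 10000},
]

# Horizontal fragments defined by branch-name.
account1 = select(account, "branch-name", "Hillside")
account2 = select(account, "branch-name", "Valleyview")

# The query evaluated on each fragment separately.
from_hillside = select(account1, "branch-name", "Hillside")
from_valleyview = select(account2, "branch-name", "Hillside")
answer = from_hillside + from_valleyview
```

Since from_valleyview is empty whatever the contents of account, the optimizer can drop that subexpression and send the query only to the Hillside site.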
...
19
...
2 Simple Join Processing
As we saw in Chapter 13, a major decision in the selection of a query-processing strategy is choosing a join strategy
...
Let SI denote the site
at which the query was issued
...
Among the possible strategies for processing this query are these:
Using the techniques of Chapter 13,
choose a strategy for processing the entire query locally at site SI
...
Ship temp1 from S2 to S3, and compute temp2 = temp1 ⋈ branch
at S3
...
• Devise strategies similar to the previous one, with the roles of S1 , S2 , S3 exchanged
...
Among the factors that must be considered
are the volume of data being shipped, the cost of transmitting a block of data between a pair of sites, and the relative speed of processing at each site
...
If we ship all three relations to SI , and indices exist on
these relations, we may need to re-create these indices at SI
...
However, the second
strategy has the disadvantage that a potentially large relation (customer ⋈ account)
must be shipped from S2 to S3
...
Thus, the second strategy may result in
extra network transmission compared to the first strategy
...
7
...
Let the schemas of r1 and r2 be R1 and R2
...
If there are many tuples of r2 that do not
join with any tuple of r1 , then shipping r2 to S1 entails shipping tuples that fail to
contribute to the result
...
A possible strategy to accomplish all this is:
1
...
2
...
3
...
4
...
5
...
The resulting relation is the same as r1 ⋈ r2
...
In step 3, temp2 has the result of r2 ⋈ Π_{R1 ∩ R2}(r1)
...
⋈ r2, the
...
This strategy is particularly advantageous when relatively few tuples of r2 contribute to the join
...
In such a case, temp2 may have significantly
fewer tuples than r2
...
Additional cost is incurred in shipping temp1 to S2
...
This strategy is called a semijoin strategy, after the semijoin operator of the relational algebra, denoted ⋉
...
In step 3, temp2 = r2 ⋉ r1
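The semijoin strategy can be sketched on small relations. The sketch assumes the five steps are: project r1 on the common attributes at S1 (temp1), ship temp1 to S2, join r2 with temp1 at S2 (temp2), ship temp2 to S1, and join r1 with temp2 at S1. Relations are modeled as lists of dictionaries, shipping is not modeled, and all function names are hypothetical.

```python
def natural_join(r, s):
    """Natural join of two relations given as lists of dicts."""
    out = []
    for t in r:
        for u in s:
            common = t.keys() & u.keys()
            if all(t[a] == u[a] for a in common):
                out.append({**t, **u})
    return out


def project(r, attrs):
    """Projection on attrs with duplicate elimination."""
    out = []
    for t in r:
        p = {a: t[a] for a in attrs}
        if p not in out:
            out.append(p)
    return out


def semijoin_strategy(r1, r2, common):
    """Return (r1 join r2, number of r2 tuples that would be shipped to S1)."""
    temp1 = project(r1, common)        # computed at S1, then shipped to S2
    temp2 = natural_join(r2, temp1)    # computed at S2: the semijoin of r2 with r1
    result = natural_join(r1, temp2)   # computed at S1 after temp2 is shipped back
    return result, len(temp2)
```

When only a few tuples of r2 have a join partner in r1, temp2 is much smaller than r2, which is the saving that the strategy aims for.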
...
A substantial body of theory has been developed regarding the use of semijoins
for query optimization
...
19
...
4 Join Strategies that Exploit Parallelism
Consider a join of four relations:
r1 ⋈ r2 ⋈ r3 ⋈ r4
where relation ri is stored at site Si
...
There are many possible strategies for parallel evaluation
...
) In one such strategy, r1 is
shipped to S2, and r1 ⋈ r2 computed at S2
...
Site S2 can ship tuples of (r1 ⋈ r2) to S1 as they are
produced, rather than wait for the entire join to be computed
...
Once tuples of (r1 ⋈ r2) and (r3 ⋈ r4) arrive at S1, the
computation of (r1 ⋈ r2) ⋈ (r3 ⋈ r4) can begin, with the pipelined join technique
of Section 13
...
2
...
Thus, computation of the final join result at S1 can be done
in parallel with the computation of (r1 ⋈ r2) at S2, and with the computation of
(r3 ⋈ r4) at S4
...
8 Heterogeneous Distributed Databases
Many new database applications require data from a variety of preexisting databases
located in a heterogeneous collection of hardware and software environments
...
This software layer
is called a multidatabase system
...
A multidatabase system creates the illusion of logical database integration without requiring
physical database integration
...
Full integration of heterogeneous systems into a homogeneous distributed database is often difficult or impossible:
• Technical difficulties
...
• Organizational difficulties
...
In such cases, it is important for a multidatabase system to allow the local database systems to retain a high degree of
autonomy over the local database and transactions running against that data
...
In this section, we provide an overview of the challenges faced
in constructing a multidatabase environment from the standpoint of data definition
and query processing
...
6 provides an overview of transaction management
issues in multidatabases
...
8
...
For instance, some may employ the relational model, whereas others may employ older
data models, such as the network model (see Appendix A) or the hierarchical model
(see Appendix B)
...
A commonly used
choice is the relational model, with SQL as the common query language
...
Another difficulty is the provision of a common conceptual schema
...
The multidatabase system must integrate
these separate schemas into one common schema
...
Schema integration is not simply straightforward translation between data-definition languages
...
The data types used in one system may not be supported by
other systems, and translation between types may not be simple
...
At the semantic level, an integer value for length may be inches in one system and millimeters in another, thus
creating an awkward situation in which equality of integers is only an approximate
notion (as is always the case for floating-point numbers)
...
For example, a system based in the
United States may refer to the city “Cologne,” whereas one in Germany refers to it as
“Köln”
...
Translation functions must be provided
...
As we noted earlier,
the alternative of converting each database to a common format may not be feasible
without obsoleting existing application programs
...
8
...
Some of the issues
are:
• Given a query on a global schema, the query may have to be translated into
queries on local schemas at each of the sites where the query has to be executed
...
The task is simplified by writing wrappers for each data source, which provide a view of the local data in the global schema
...
Wrappers may be provided by individual
sites, or may be written separately as part of the multidatabase system
...
• Some data sources may provide only limited query capabilities; for instance,
they may support selections, but not joins
...
Queries may therefore
have to be broken up, to be partly performed at the data source and partly at
the site issuing the query
...
Answers retrieved from the sites may have to be processed to remove
duplicates
...
A query on the entire account relation would require access to both sites and
removal of duplicate answers resulting from tuples with balance between 50
and 100, which are replicated at both sites
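Duplicate removal at the site issuing the query can be sketched as follows. The sample tuples and the split of the account relation between the two sites are hypothetical; the only property taken from the discussion above is that tuples with balance between 50 and 100 are stored at both sites.

```python
def merge_answers(*site_answers):
    """Combine the answers from several sites, keeping each tuple once."""
    seen = set()
    merged = []
    for answer in site_answers:
        for row in answer:
            if row not in seen:
                seen.add(row)
                merged.append(row)
    return merged


# Hypothetical (account-number, balance) tuples stored at two sites.
site_a = [("A-101", 30), ("A-102", 60), ("A-103", 90)]
site_b = [("A-102", 60), ("A-103", 90), ("A-104", 500)]

all_accounts = merge_answers(site_a, site_b)
```

Each account with a balance in the overlapping range is returned by both sites but appears only once in the merged answer.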
...
The usual solution is to rely on only local-level optimization, and just use heuristics at the global level
...
Unlike full-fledged multidatabase systems, mediator systems do not
bother about transaction processing
...
© The McGraw−Hill
Companies, 2001
19
...
9
Directory Systems
often used in an interchangeable fashion, and systems that are called mediators may
support limited forms of transactions
...
19
...
In the precomputerization days, organizations would create physical
directories of employees and distribute them across the organization
...
In general, a directory is a listing of information about some class of objects such as
persons
...
In the world of physical telephone directories, directories that satisfy lookups in the forward direction are
called white pages, while directories that satisfy lookups in the reverse direction are
called yellow pages
...
However, directories today need to be available over a
computer network, rather than in a physical (paper) form
...
9
...
Such interfaces are good for humans
...
Directories can
be used for storing other types of information, much like file system directories
...
A user can thus access the same settings from multiple locations,
such as at home and at work, without having to share a file system
...
The most widely used among them today is the
Lightweight Directory Access Protocol (LDAP)
...
The
question then is, why come up with a specialized protocol for accessing directory
information? There are at least two answers to the question
...
They evolved in parallel with the database access
protocols
...
For example, a particular directory server may store information for Bell Laboratories employees in Murray
Hill, while another may store information for Bell Laboratories employees in
Bangalore, giving both sites autonomy in controlling their local data
...
More importantly, the directory system can be set up to automatically forward queries made at one site to the other site, without user intervention
...
As may be expected, several directory implementations find it beneficial to use relational databases to store data, instead of creating
special-purpose storage systems
...
9
...
Clients use the application programmer interface defined by the directory system to communicate with the directory servers
...
The X
...
However,
the protocol is rather complex, and is not widely used
...
500 features, but with less complexity, and is widely used
...
19
...
2
...
Each entry must have a
distinguished name (DN), which uniquely identifies the entry
...
For example, an entry may
have the following distinguished name
...
The order of the components of a distinguished name reflects the normal postal address order, rather than
the reverse order used in specifying path names for files
...
Entries can also have attributes
...
Unlike those in the relational model, attributes
Silberschatz−Korth−Sudarshan:
Database System
Concepts, Fourth Edition
VI
...
Distributed Databases
19
...
LDAP allows the definition of object classes with attribute names and types
...
Moreover, entries can be specified to
be of one or more object classes
...
Entries are organized into a directory information tree (DIT), according to their
distinguished names
...
Entries that are internal nodes represent objects such as organizational units,
organizations, or countries
...
For instance, an internal node may
have a DN c=USA, and all entries below it have the value USA for the RDN c
...
Entries may have more than one distinguished name — for example, an entry for a
person in more than one organization
...
19
...
2
...
However, LDAP defines a network protocol for carrying out data
definition and manipulation
...
LDAP also defines a file format called LDAP Data Interchange
Format (LDIF) that can be used for storing and exchanging information
...
A query must specify the following:
• A base — that is, a node within a DIT — by giving its distinguished name (the
path from the root to the node)
...
Equality, matching by wild-card characters, and approximate equality (the exact definition of approximate equality is system dependent) are supported
...
• Attributes to return
...
The query can also specify whether to automatically dereference aliases; if alias dereferences are turned off, alias entries can be returned as answers
...
Examples of
LDAP URLs are:
ldap://aura
...
bell-labs
...
research
...
com/o=Lucent,c=USA??sub?cn=Korth
The first URL returns all attributes of all entries at the server with organization being
Lucent, and country being USA
...
The
question marks in the URL separate different fields
...
The second field, the list of attributes to return, is left
empty, meaning return all attributes
...
The last parameter is the search condition
...
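The field layout of an LDAP URL can be made concrete with a small parser. This is a minimal sketch, assuming the layout described above (base DN, then attributes, scope, and search condition, separated by question marks); the function name `parse_ldap_url` and the host `ldap.example.com` are placeholders invented for the example, and the sketch ignores port numbers and character escaping.

```python
def parse_ldap_url(url):
    """Split an LDAP URL into host, base DN, attributes, scope, and condition."""
    prefix = "ldap://"
    if not url.startswith(prefix):
        raise ValueError("not an LDAP URL: " + url)
    host, _, rest = url[len(prefix):].partition("/")
    # Question marks separate the fields that follow the host.
    fields = rest.split("?")
    fields += [""] * (4 - len(fields))      # pad missing trailing fields
    base, attrs, scope, condition = fields[:4]
    return {
        "host": host,
        "base": base,
        "attributes": attrs.split(",") if attrs else [],  # [] means all attributes
        "scope": scope,
        "condition": condition,
    }

# ldap.example.com is a placeholder host, not one taken from the text.
query = parse_ldap_url("ldap://ldap.example.com/o=Lucent,c=USA??sub?cn=Korth")
```

Here the empty second field yields an empty attribute list, which stands for "return all attributes".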
Figure 19
...
The code first opens a connection to an LDAP
server by ldap_open and ldap_bind
...
The
arguments to ldap_search_s are the LDAP connection handle, the DN of the base from
which the search should be done, the scope of the search, the search condition, the
list of attributes to be returned, and an attribute called attrsonly, which, if set to 1,
would result in only the schema of the result being returned, without any actual tuples
...
The first for loop iterates over and prints each entry in the result
...
Since attributes in LDAP may be multivalued, the third for loop prints each value of
an attribute
...
Figure 19
...
The LDAP API also contains functions to create, update, and delete entries, as well
as other operations on the DIT
...
19
...
2
...
The suffix of a DIT is a sequence of RDN=value
pairs that identify what information the DIT stores; the pairs are concatenated to the
rest of the distinguished name generated by traversing from the entry to the root
...
The DITs may be organizationally and geographically
separated
...
Referrals are the key component that helps organize a distributed collection of directories into an integrated system
...
#include
h>
main() {
LDAP *ld;
LDAPMessage *res, *entry;
char *dn, *attr, *attrList[] = {"telephoneNumber", NULL};
BerElement *ptr;
int vals, i;
ld = ldap_open("aura
...
bell-labs
...
6
Example of LDAP code in C
...
Access to the referenced DIT is transparent, proceeding without the user’s knowledge
...
The hierarchical naming mechanism used by LDAP helps break up control of information across parts of an organization
...
Although it is not an LDAP requirement, organizations often choose to break up
information either by geography (for instance, an organization may maintain a directory for each site where the organization has a large presence) or by organizational
structure (for instance, each organizational unit, such as department, maintains its
own directory)
...
Work
on standardizing replication in LDAP is in progress
...
10 Summary
• A distributed database system consists of a collection of sites, each of which
maintains a local database system
...
In addition, a
site may participate in the execution of global transactions, that is, transactions
that access data in several sites
...
• Distributed databases may be homogeneous, where all sites have a common
schema and database system code, or heterogeneous, where the schemas and
system codes may differ
...
It is essential that the system
minimize the degree to which a user needs to be aware of how a relation is
stored
...
There are, however, additional failures with which we
need to deal in a distributed environment, including the failure of a site, the
failure of a link, loss of a message, and network partition
...
• To ensure atomicity, all the sites in which a transaction T executed must agree
on the final outcome of the execution
...
To ensure this property, the transaction coordinator of T must execute
a commit protocol
...
• The two-phase commit protocol may lead to blocking, the situation in which
the fate of a transaction cannot be determined until a failed site (the coordinator) recovers
...
• Persistent messaging provides an alternative model for handling distributed
transactions
...
Persistent messages (which are guaranteed to be
delivered exactly once, regardless of failures) are sent to remote sites to request actions to be taken there
...
In the case of locking protocols, the only change that needs to be incorporated is in the way that the lock manager is implemented
...
One or more central coordinators
may be used
...
Protocols for handling replicated data include the primary-copy, majority,
biased, and quorum-consensus protocols
...
In the case of timestamping and validation schemes, the only needed
change is to develop a mechanism for generating unique global timestamps
...
Such facilities must be used with great care, since they may result
in nonserializable executions
...
• To provide high availability, a distributed database must detect failures, reconfigure itself so that computation may continue, and recover when a processor
or a link is repaired
...
The majority protocol can be extended by using version numbers to permit transaction processing to proceed even in the presence of failures
...
Less-expensive protocols are available to deal with site failures, but they
assume network partitioning does not occur
...
To provide
high availability, the system must maintain a backup copy that is ready to assume responsibility if the coordinator fails
...
The algorithms that determine which site should act as a coordinator are called election algorithms
...
Several
optimization techniques are available to choose which sites need to be accessed
...
• Heterogeneous distributed databases allow sites to have their own schemas
and database system code
...
The local database systems may employ different logical models
and data-definition and data-manipulation languages, and may differ in
their concurrency-control and transaction-management mechanisms
...
• Directory systems can be viewed as a specialized form of database, where
information is organized in a hierarchical fashion similar to the way files are
organized in a file system
...
Directories can be distributed across multiple sites to provide autonomy to
individual sites
...
Review Terms
• Homogeneous distributed database
• Heterogeneous distributed database
• Data replication
• Primary copy
• Data fragmentation
  Horizontal fragmentation
  Vertical fragmentation
• Data transparency
  Fragmentation transparency
  Replication transparency
  Location transparency
• Name server
• Aliases
• Distributed transactions
  Local transactions
  Global transactions
• Transaction manager
• Transaction coordinator
• System failure modes
• Network partition
• Commit protocols
• Two-phase commit protocol (2PC)
  Ready state
  In-doubt transactions
  Blocking problem
• Three-phase commit protocol (3PC)
• Persistent messaging
• Concurrency control
• Single lock-manager
• Distributed lock-manager
• Protocols for replicas
  Primary copy
  Majority protocol
  Biased protocol
  Quorum consensus protocol
• Timestamping
• Master–slave replication
• Multimaster (update-anywhere) replication
• Transaction-consistent snapshot
• Lazy propagation
• Deadlock handling
  Local wait-for graph
  Global wait-for graph
  False cycles
• Availability
• Robustness
• Mediators
• Virtual database
• Directory systems
• Semijoin strategy
• LDAP: Lightweight directory access protocol
  Distinguished name (DN)
  Relative distinguished names (RDNs)
  Directory information tree (DIT)
• Distributed directory trees
• Multidatabase system
• DIT suffix
• Autonomy
• Referral
• Coordinator selection
• Backup coordinator
• Election algorithms
• Bully algorithm
• Distributed query processing
Exercises
19
...
19
...
19
...
4 When is it useful to have replication or fragmentation of data? Explain your
answer
...
5 Explain the notions of transparency and autonomy
...
6 To build a highly available distributed system, you must know what kinds of
failures can occur
...
List possible types of failure in a distributed system
...
Which items in your list from part a are also applicable to a centralized
system?
19
...
For each possible
failure that you listed in Exercise 19
...
19
...
Can site A distinguish
among the following?
• B goes down
...
• B is extremely overloaded and response time is 100 times longer than normal
...
19
...
9 The persistent messaging scheme described in this chapter depends on timestamps combined with discarding of received messages if they are too old
...
19
...
19
...
Suppose we modify that protocol as follows:
• Only intention-mode locks are allowed on the root
...
Show that these modifications alleviate this problem without allowing any
nonserializable schedules
...
12 Explain the difference between data replication in a distributed system and the
maintenance of a remote backup site
...
13 Give an example where lazy replication can lead to an inconsistent database
state even when updates get an exclusive lock on the primary (master) copy
...
14 Study and summarize the facilities that the database system you are using provides for dealing with inconsistent states that can be reached with lazy propagation of updates
...
15 Discuss the advantages and disadvantages of the two methods that we presented in Section 19
...
2 for generating globally unique timestamps
...
16 Consider the following deadlock-detection algorithm
...
The edge (Ti , Tj , n) is inserted in the local wait-for of S1
...
A request from Ti to Tj in the same site is handled in the usual manner;
no timestamps are associated with the edge (Ti , Tj )
...
On receiving this message, a site sends its local wait-for graph to the coordinator
...
The wait-for graph reflects an instantaneous
state of the site, but it is not synchronized with respect to any other site
...
• The graph has an edge (Ti , Tj ) if and only if
There is an edge (Ti , Tj ) in one of the wait-for graphs
...
Show that, if there is a cycle in the constructed graph, then the system is in a
deadlock state, and that, if there is no cycle in the constructed graph, then the
system was not in a deadlock state when the execution of the algorithm began
...
17 Consider a relation that is fragmented horizontally by plant-number:
employee (name, address, salary, plant-number)
Assume that each fragment has two replicas: one stored at the New York site
and one stored locally at the plant site
...
a
...
b
...
c
...
d
...
19
...
Assume
that the machine relation is stored in its entirety at the Armonk site
...
a
...
b
...
”
c
...
d
...
19
...
18, state how your choice of a strategy
depends on:
a
...
The site at which the result is desired
19
...
7
...
21 Is $r_i \ltimes r_j$ necessarily equal to $r_j \ltimes r_i$? Under what conditions does $r_i \ltimes r_j = r_j \ltimes r_i$ hold?
19
...
19
...
7
Relation s:
D   E
4   5
6   8
3   2
4   1
2   3
Relations for Exercise 19
...
19
...
Bibliographical Notes
Textbook discussions of distributed databases are offered by Ozsu and Valduriez
[1999] and Ceri and Pelagatti [1984]
...
Rothnie et al
...
Breitbart et al
...
The implementation of the transaction concept in a distributed database is presented by Gray [1981], Traiger et al
...
[1991]
...
The three-phase commit protocol is from Skeen [1981]
...
The bully algorithm in Section 19
...
5 is from Garcia-Molina [1982]
...
Distributed concurrency control is covered by Rosenkrantz et al
...
[1978], Bernstein et al
...
[1980], Bernstein and Goodman [1980], Bernstein and Goodman [1981a], Bernstein and Goodman [1982], and Garcia-Molina and Wiederhold
[1982]
...
[1986]
...
Validation techniques for distributed concurrency-control schemes are described by Schlageter [1981], Ceri and Owicki [1983], and
Bassiouni [1988]
...
Attar et al
...
A survey of techniques for recovery in distributed
database systems is presented by Kohler [1981]
...
Problems in this
environment are discussed in Gray et al
...
Anderson et al
...
Breitbart et al
...
The user manuals of various database systems provide details of how they handle replication and consistency
...
Distributed deadlock-detection algorithms are presented by Rosenkrantz et al
...
[1983], and Obermarck [1982]
...
16 is from Stuart et al
...
Distributed query processing is discussed in Wong [1977], Epstein et al
...
[1983], Ceri and
Pelagatti [1983], and Wong [1983]
...
[1982]
discuss the approach to distributed query processing taken by R* (a distributed version of System R)
...
The performance results also serve to validate
the cost model used in the R* query optimizer
...
[1982]
...
[1997]
...
[1996] and Papakonstantinou et al
...
Weltman and Dahbura [2000] and Howes et al
...
Kapitskaia et al
...
CHAPTER 20
Parallel Databases
In this chapter, we discuss fundamental algorithms for parallel database systems that
are based on the relational data model
...
20
...
Today, they are successfully marketed by practically every database system vendor
...
Moreover, the growth of the World Wide Web has created
many sites with millions of viewers, and the increasing amounts of data collected from these viewers have produced extremely large databases at many
companies
...
Queries
used for such purposes are called decision-support queries, and the data requirements for such queries may run into terabytes
...
• The set-oriented nature of database queries naturally lends itself to parallelization
...
• As microprocessors have become cheap, parallel machines have become common and relatively inexpensive
...
Parallelism is also used to provide scaleup, where increasing workloads
are handled without increased response time, via an increase in the degree of parallelism
...
Briefly,
in shared-memory architectures, all processors share a common memory and disks;
in shared-disk architectures, processors have independent memories, but share disks;
in shared-nothing architectures, processors share neither memory nor disks; and hierarchical architectures have nodes that share neither memory nor disks with each
other, but internally each node has a shared-memory or a shared-disk architecture
...
2 I/O Parallelism
In its simplest form, I/O parallelism refers to reducing the time required to retrieve
relations from disk by partitioning the relations on multiple disks
...
In horizontal partitioning, the tuples of a relation are divided (or declustered) among
many disks, so that each tuple resides on one disk
...
20
...
1 Partitioning Techniques
We present three basic data-partitioning strategies
...
, Dn−1 , across which the data are to be partitioned
...
This strategy scans the relation in any order and sends the $i$th
tuple to disk number $D_{i \bmod n}$
...
• Hash partitioning
...
A hash
function is chosen whose range is {0, 1,
...
Each tuple of the original
relation is hashed on the partitioning attributes
...
• Range partitioning
...
It chooses a partitioning attribute, A, as a partitioning
vector
...
Let [v0 , v1 ,
...
Consider a tuple t such
that t[A] = x
...
If $x \ge v_{n-2}$, then $t$ goes on disk
$D_{n-1}$
...
For example, range partitioning with three disks numbered 0, 1, and 2 may
assign tuples with values less than 5 to disk 0, values between 5 and 40 to disk
1, and values greater than 40 to disk 2
...
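The three strategies can be sketched as functions that map a tuple to a disk number. This is a minimal illustration under stated assumptions: the function names are invented, Python's built-in `hash` stands in for whatever hash function a real system would choose, and the range rule assumes the usual reading of a partitioning vector (values below the first entry go to disk 0, values at or above the last entry go to the last disk).

```python
import bisect

def round_robin_disk(i, n):
    """Disk for the i-th tuple scanned (i counts from 0)."""
    return i % n

def hash_disk(key, n):
    """Disk chosen by hashing the partitioning attribute into 0..n-1."""
    return hash(key) % n

def range_disk(x, vector):
    """Disk chosen by a partitioning vector [v0, ..., v(n-2)].

    x < v0 goes to disk 0, vi <= x < v(i+1) goes to disk i+1,
    and x >= v(n-2) goes to disk n-1.
    """
    return bisect.bisect_right(vector, x)

# Three disks; for integer values, the vector [5, 41] sends values below 5
# to disk 0, values 5..40 to disk 1, and values above 40 to disk 2.
vector = [5, 41]
placement = {x: range_disk(x, vector) for x in (2, 5, 40, 41, 90)}
```

Using a binary search on the vector keeps the lookup logarithmic in the number of disks, which matters little for three disks but is the natural choice for larger configurations.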
20
...
2 Comparison of Partitioning Techniques
Once a relation has been partitioned among several disks, we can retrieve it in parallel, using all the disks
...
Thus, the transfer rates for reading or writing an entire
relation are much faster with I/O parallelism than without it
...
Access to data
can be classified as follows:
1
...
Locating a tuple associatively (for example, employee-name = “Campbell”);
these queries, called point queries, seek tuples that have a specified value
for a specific attribute
3
...
The different partitioning techniques support these types of access at different levels
of efficiency:
• Round-robin
...
With this scheme, both point
queries and range queries are complicated to process, since each of the n disks
must be used for the search
...
This scheme is best suited for point queries based on the
partitioning attribute
...
Directing a query to a single disk saves the startup cost of initiating a query on multiple disks, and
leaves the other disks free to process other queries
...
If the hash function is a good randomizing function, and the partitioning attributes form a key of the relation, then the number of tuples in each of the
disks is approximately the same, without much variance
...
The scheme, however, is not well suited for point queries on nonpartitioning attributes
...
Therefore, all the disks need to be scanned for range queries
to be answered
...
This scheme is well suited for point and range queries on
the partitioning attribute
...
For range queries, we consult
the partitioning vector to find the range of disks on which the tuples may
reside
...
An advantage of this feature is that, if there are only a few tuples in the
queried range, then the query is typically sent to one disk, as opposed to
all the disks
...
On the other hand, if there are many tuples in the queried range (as
there are when the queried range is a larger fraction of the domain of the relation), many tuples have to be retrieved from a few disks, resulting in an I/O
bottleneck (hot spot) at those disks
...
In contrast, hash partitioning and round-robin partitioning would engage all the disks for such queries,
giving a faster response time for approximately the same throughput
...
5
...
In general, hash partitioning or range
partitioning is preferred to round-robin partitioning
...
Large relations are preferably partitioned across all the available disks
...
20
...
3 Handling of Skew
When a relation is partitioned (by a technique other than round-robin), there may be
a skew in the distribution of tuples, with a high percentage of tuples placed in some
partitions and fewer tuples in other partitions
...
All the tuples with the same value for the partitioning
attribute end up in the same partition, resulting in skew
...
Attribute-value skew can result in skewed partitioning regardless of whether range
partitioning or hash partitioning is used
...
Partition skew is less likely with hash
partitioning, if a good hash function is chosen
...
As Section 18
...
1 noted, even a small skew can result in a significant decrease in
performance
...
For example, if a relation of 1000 tuples is divided into 10 parts, and the division is skewed, then there may be some partitions of size less than 100 and some
partitions of size more than 100; if even one partition happens to be of size 200, the
speedup that we would obtain by accessing the partitions in parallel is only 5, instead
of the 10 for which we would have hoped
...
If even one partition has
40 tuples (which is possible, given the large number of partitions) the speedup that
we would obtain by accessing them in parallel would be 25, rather than 100
...
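The arithmetic behind these figures is that a parallel scan finishes only when the largest partition has been read. A minimal sketch, assuming equal per-tuple cost and ignoring startup overhead; the function name and the exact sizes of the smaller partitions are invented for the example:

```python
def parallel_speedup(partition_sizes):
    """Speedup over a sequential scan when partitions are read in parallel.

    The parallel scan finishes only when the largest partition is done,
    so the speedup is the total size divided by the largest partition size.
    """
    return sum(partition_sizes) / max(partition_sizes)

# 1000 tuples in 10 parts, one of which holds 200 tuples.
skewed_10 = [200] + [89] * 8 + [88]
# 1000 tuples in 100 parts, one of which holds 40 tuples.
skewed_100 = [40] + [10] * 69 + [9] * 30
# 1000 tuples in 10 equal parts.
balanced_10 = [100] * 10
```

With these inputs the function gives 5 for the first case and 25 for the second, against 10 for the balanced division into 10 parts.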
A balanced range-partitioning vector can be constructed by sorting: The relation
is first sorted on the partitioning attributes
...
After every 1/n of the relation has been read, the value of the partitioning
attribute of the next tuple is added to the partition vector
...
In case there are many tuples with the same value
for the partitioning attribute, the technique can still result in some skew
...
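The sorting-based construction can be sketched in a few lines. This is an illustration only: it sorts in memory, whereas a real system would sort and scan the relation on disk, and the function name and sample data are invented.

```python
def balanced_partition_vector(values, n):
    """Build a range-partitioning vector with n - 1 cut points.

    The values are sorted, then scanned; after every 1/n of the relation,
    the value of the next tuple becomes a cut point.
    """
    ordered = sorted(values)
    vector = []
    for k in range(1, n):
        # Index of the first tuple after k/n of the relation has been read.
        vector.append(ordered[k * len(ordered) // n])
    return vector

ages = [23, 35, 35, 41, 18, 52, 29, 64, 47, 31, 58, 26]
vector = balanced_partition_vector(ages, 3)
```

For the sample data the vector is [31, 47], which places four tuples in each of the three partitions. If many tuples shared one value, several cut points could coincide and some skew would remain.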
The I/O overhead for constructing balanced range-partition vectors can be reduced by constructing and storing a frequency table, or histogram, of the attribute
values for each attribute of each relation
...
1 shows an example of a histogram for an integer-valued attribute that takes values in the range 1 to 25
...
It is straightforward to construct a balanced
range-partitioning function given a histogram on the partitioning attributes
...
[Histogram: frequency (axis marked 10 to 50) plotted against value ranges 1–5, 6–10, and so on]
Figure 20
...
Another approach to minimizing the effect of skew, particularly with range partitioning, is to use virtual processors
...
Any of the partitioning techniques and query evaluation techniques that we study
later in this chapter can be used, but they map tuples and work to virtual processors
instead of to real processors
...
The idea is that even if one range had many more tuples than the others because
of skew, these tuples would get split across multiple virtual processor ranges
...
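The effect of virtual processors can be sketched as follows. The example assumes, as one plausible scheme, that virtual processors are assigned to real processors round-robin; the function name, the data, and the cut points are invented for illustration.

```python
import bisect
from collections import Counter

def real_processor(x, virtual_vector, n_real):
    """Range-partition x to a virtual processor, then map it round-robin."""
    virtual = bisect.bisect_right(virtual_vector, x)
    return virtual % n_real

# Skewed data: most values fall between 0 and 19.
values = list(range(20)) * 5 + [25, 45, 65, 85]

# 2 real processors with one range each: a single cut at 50.
direct = Counter(bisect.bisect_right([50], x) for x in values)

# 10 virtual processors (a cut every 10 values) over the same 2 real ones.
virtual_vector = [10, 20, 30, 40, 50, 60, 70, 80, 90]
spread = Counter(real_processor(x, virtual_vector, 2) for x in values)
```

With one range per real processor, 102 of the 104 tuples land on processor 0. With ten virtual ranges, the crowded region is split across two virtual processors that map to different real processors, giving a 54 to 50 split.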
20
...
Transaction throughput can be increased by this form of parallelism
...
Thus, the primary use of interquery parallelism is to scale up a transaction-processing system to support a larger number of
transactions per second
...
Database systems designed
for single-processor systems can be used with few or no changes on a shared-memory
parallel architecture, since even sequential database systems support concurrent processing
...
Supporting interquery parallelism is more complicated in a shared-disk or shared-nothing architecture
...
A parallel database system must also ensure that two processors do not update
the same data independently at the same time
...
The problem of ensuring that the version is the
latest is known as the cache-coherency problem
...
One such protocol for a shared-disk system is this:
1
...
Immediately after the transaction obtains
either a shared or exclusive lock on a page, it also reads the most recent copy
of the page from the shared disk
...
Before a transaction releases an exclusive lock on a page, it flushes the page to
the shared disk; then, it releases the lock
...
This protocol ensures that, when a transaction sets a shared or exclusive lock on a
page, it gets the correct copy of the page
...
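The two rules of this protocol can be simulated in a few lines. This is a sketch under simplifying assumptions: the lock manager itself is omitted (the calls below are simply issued in an order that a lock manager would allow), and the class and method names are invented for the example.

```python
class SharedDisk:
    """The disk shared by all processors: page id -> contents."""
    def __init__(self):
        self.pages = {}

class Processor:
    """A processor with a private buffer, following the two protocol rules."""
    def __init__(self, disk):
        self.disk = disk
        self.buffer = {}

    def lock(self, page_id):
        # Right after the lock is granted, reread the page from the shared
        # disk, discarding any stale copy in the local buffer.
        self.buffer[page_id] = self.disk.pages.get(page_id)

    def write(self, page_id, value):
        self.buffer[page_id] = value

    def unlock_exclusive(self, page_id):
        # Flush the page to the shared disk, then release the lock.
        self.disk.pages[page_id] = self.buffer[page_id]

disk = SharedDisk()
disk.pages["p1"] = "v0"
p_a, p_b = Processor(disk), Processor(disk)

p_b.lock("p1")                 # B caches v0 and later releases its lock
p_a.lock("p1")
p_a.write("p1", "v1")
p_a.unlock_exclusive("p1")     # v1 reaches the shared disk
p_b.lock("p1")                 # B rereads, so it does not see the stale v0
seen_by_b = p_b.buffer["p1"]
```

The cost of this simplicity is a disk read on every lock acquisition and a disk write on every exclusive release.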
Such protocols do not write pages to disk when exclusive
locks are released
...
The protocols have to be designed to handle concurrent requests
...
When other processors want
to read or write the page, they send requests to the home processor Pi of the page,
since they cannot directly communicate with the disk
...
The Oracle 8 and Oracle Rdb systems are examples of shared-disk parallel database
systems that support interquery parallelism
...
4 Intraquery Parallelism
Intraquery parallelism refers to the execution of a single query in parallel on multiple processors and disks
...
Interquery parallelism does not help in this task, since each
query is run sequentially
...
Suppose that the relation has been partitioned across multiple
disks by range partitioning on some attribute, and the sort is requested on the partitioning attribute
...
Thus, we can parallelize a query by parallelizing individual operations
...
We can parallelize the evaluation of the operator tree by
evaluating in parallel some of the operations that do not depend on one another
...
The two operations can be executed in parallel on separate processors, one generating output that is consumed by the other, even as it is generated
...
We can speed up processing of a query by parallelizing the execution of each individual operation, such as sort, select, project,
and join
...
5
...
We can speed up processing of a query by executing in parallel the different operations in a query expression
...
6
...
Since the number of operations in a typical query is small, compared to
the number of tuples processed by each operation, the first form of parallelism can
scale better with increasing parallelism
...
In the following discussion of parallelization of queries, we assume that the queries
are read only
...
Rather than presenting algorithms for each architecture
separately, we use a shared-nothing architecture model in our description
...
We can simulate this model easily by using the other architectures, since transfer
of data can be done via shared memory in a shared-memory architecture, and via
shared disks in a shared-disk architecture
...
We mention occasionally how
the algorithms can be further optimized for shared-memory or shared-disk systems
...
, Pn−1 , and n disks D0 , D1 ,
...
A real system may have multiple disks per processor
...
However, for simplicity, we assume here that Di is a single disk
...
5 Intraoperation Parallelism
Since relational operations work on relations containing large sets of tuples, we can
parallelize the operations by executing them in parallel on different subsets of the relations
...
Thus, intraoperation parallelism is natural in a database system
...
5
...
5
...
20
...
1 Parallel Sort
Suppose that we wish to sort a relation that resides on n disks D0 , D1 ,
...
If the
relation has been range partitioned on the attributes on which it is to be sorted, then,
as noted in Section 20
...
2, we can sort each partition separately, and can concatenate
the results to get the full sorted relation
...
If the relation has been partitioned in any other way, we can sort it in one of two
ways:
1
...
2
...
20
...
1
...
When we sort by range partitioning the relation,
it is not necessary to range-partition the relation on the same set of processors or
Suppose that we choose processors
P0 , P1 ,
...
There are two steps involved in this
operation:
1
...
To implement range partitioning, in parallel every processor reads the tuples from its disk and sends the tuples to their destination processor
...
, Pm also receives tuples belonging to its partition, and
stores them locally
...
2
...
Each processor executes the same operation
— namely, sorting — on a different data set
...
)
The final merge operation is trivial, because the range partitioning in the
first phase ensures that, for 1 ≤ i < j ≤ m, the key values in processor Pi are
all less than the key values in Pj
...
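The two steps of range-partitioning sort can be sketched as follows. This is a sequential simulation, with lists standing in for disks and processors; the function name, the sample keys, and the partitioning vector are invented for the example.

```python
import bisect

def range_partition_sort(disks, vector):
    """Sort keys spread over several disks by range-partitioning them.

    disks  : one list of sort-key values per source disk
    vector : partitioning vector with one cut point fewer than the
             number of destination processors
    """
    partitions = [[] for _ in range(len(vector) + 1)]
    # Step 1: redistribute every tuple to the processor owning its range.
    for disk in disks:
        for key in disk:
            partitions[bisect.bisect_right(vector, key)].append(key)
    # Step 2: each processor sorts its own partition (done in turn here).
    for part in partitions:
        part.sort()
    # Final merge: the ranges are disjoint and ordered, so concatenate.
    return [key for part in partitions for key in part]

disks = [[42, 7, 19], [3, 88, 25], [61, 14, 50]]
result = range_partition_sort(disks, vector=[20, 50])
```

The quality of the partitioning vector decides how evenly the sorting work in the second step is divided.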
Virtual processor partitioning can also be used to reduce skew
...
5
...
2 Parallel External Sort–Merge
Parallel external sort–merge is an alternative to range partitioning
...
, Dn−1 (it does not matter how the relation has been partitioned)
...
Each processor Pi locally sorts the data on disk Di
...
The system then merges the sorted runs on each processor to get the final
sorted output
...
The system range-partitions the sorted partitions at each processor Pi (all by
the same partition vector) across the processors P0 , P1 ,
...
It sends the
tuples in sorted order, so that each processor receives the tuples in sorted
streams
...
Each processor Pi performs a merge on the streams as they are received, to get
a single sorted run
...
The system concatenates the sorted runs on processors P0 , P1 ,
...
© The McGraw−Hill Companies, 2001
As described, this sequence of actions results in an interesting form of execution
skew, since at first every processor sends all blocks of partition 0 to P0 , then every
processor sends all blocks of partition 1 to P1 , and so on
...
To avoid this problem, each processor repeatedly sends a block of data to each partition
...
As a result, all processors receive data in parallel
...
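A minimal sketch of parallel external sort–merge follows, again simulated on one machine. The function name and sample data are hypothetical; `heapq.merge` from the Python standard library stands in for the merge that each processor performs on its incoming sorted streams.

```python
import heapq
from bisect import bisect_right

def parallel_external_sort_merge(disks, partition_vector):
    """Simulate parallel external sort-merge over per-processor key lists."""
    n_dest = len(partition_vector) + 1

    # Each processor sorts the data on its own disk.
    sorted_runs = [sorted(d) for d in disks]

    # Each sorted run is range-partitioned by the same partition vector.
    # Because the run is scanned in sorted order, every piece sent to a
    # destination is itself a sorted stream.
    streams = [[] for _ in range(n_dest)]
    for run in sorted_runs:
        pieces = [[] for _ in range(n_dest)]
        for key in run:
            pieces[bisect_right(partition_vector, key)].append(key)
        for dest, piece in enumerate(pieces):
            streams[dest].append(piece)

    # Each destination merges the sorted streams it received.
    merged = [list(heapq.merge(*incoming)) for incoming in streams]

    # Concatenating the runs in processor order gives the final result.
    return [key for run in merged for key in run]

output = parallel_external_sort_merge([[9, 2, 30], [17, 4, 55], [21, 1]], [10, 25])
```

The simulation does not model the order in which blocks travel over the network, so it cannot show the execution skew described above; it only shows that the result is correct regardless of how the relation was initially partitioned.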
The Y-net interconnection network in the Teradata DBC
machines can merge output from multiple processors to give a single sorted output
...
5
...
Parallel join algorithms attempt to split the pairs to be tested over several processors
...
Then, the system collects the
results from each processor to produce the final result
...
5
...
1 Partitioned Join
For certain kinds of joins, such as equi-joins and natural joins, it is possible to partition
the two input relations across the processors, and to compute the join locally at each
processor
...
Partitioned join then works this way: The system partitions the relations
r and s each into n partitions, denoted r0 , r1 ,
...
, sn−1
...
The partitioned join technique works correctly only if the join is an equi-join (for
example, r ⋈ r
...
B s) and if we partition r and s by the same partitioning function
on their join attributes
...
In a partitioned join, however, there are two different
ways of partitioning r and s:
• Range partitioning on the join attributes
• Hash partitioning on the join attributes
In either case, the same partitioning function must be used for both relations
...
For
hash partitioning, the same hash function must be used on both relations
...
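The requirement that both relations use the same partitioning function can be illustrated with a small sketch. It assumes tuples are Python tuples and that the join attribute sits at a given column position; the function name and parameters are hypothetical, and the local join at each processor is a simple in-memory hash join.

```python
def partitioned_join(r, s, n, r_col, s_col):
    """Simulate a partitioned equi-join on r[r_col] = s[s_col] over n processors."""
    def h(value):
        # The SAME function must route tuples of both relations.
        return hash(value) % n

    r_parts = [[] for _ in range(n)]
    s_parts = [[] for _ in range(n)]
    for t in r:
        r_parts[h(t[r_col])].append(t)
    for u in s:
        s_parts[h(u[s_col])].append(u)

    result = []
    # Processor i joins r_i with s_i only; no cross-partition pairs are needed.
    for r_i, s_i in zip(r_parts, s_parts):
        index = {}
        for u in s_i:
            index.setdefault(u[s_col], []).append(u)
        for t in r_i:
            for u in index.get(t[r_col], []):
                result.append((t, u))
    return result

r = [(1, "a"), (2, "b"), (3, "c")]
s = [(2, "x"), (3, "y"), (3, "z"), (4, "w")]
joined = partitioned_join(r, s, 3, 0, 0)
```

If r and s were routed with different functions, matching tuples could land on different processors and the local joins would silently miss them.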
2
depicts the partitioning in a partitioned parallel join
...
For example, hash–join, merge–join, or
nested-loop join could be used
...
...
[Figure 20 ... : partitioned parallel join of relations r and s]
If one or both of the relations r and s are already partitioned on the join attributes
(by either hash partitioning or range partitioning), the work needed for partitioning
is reduced greatly
...
Each processor
Pi reads in the tuples on disk Di , computes for each tuple t the partition j to which t
belongs, and sends tuple t to processor Pj
...
We can optimize the join algorithm used locally at each processor to reduce I/O by
buffering some of the tuples to memory, instead of writing them to disk
...
5
...
3
...
The partition vector should be such
that | ri | + | si | (that is, the sum of the sizes of ri and si ) is roughly equal over all
the i = 0, 1,
...
With a good hash function, hash partitioning is likely to have
a smaller skew, except when there are many tuples with the same values for the join
attributes
...
5
...
2 Fragment-and-Replicate Join
Partitioning is not applicable to all types of joins
...
a
Thus, there may be no easy way of partitioning r and s so
that tuples in partition ri join with only tuples in partition si
...
We
first consider a special case of fragment and replicate, asymmetric fragment-and-replicate join, which works as follows
...
...
The system partitions one of the relations— say, r
...
2
...
3
...
The asymmetric fragment-and-replicate scheme appears in Figure 20
...
If r is already stored by partitioning, there is no need to partition it further in step 1
...
The general case of fragment and replicate join appears in Figure 20
...
, rn−1 , and partitions s into m partitions, s0 , s1 ,
...
As before, any partitioning technique may
be used on r and on s
...
Asymmetric fragment and
replicate is simply a special case of general fragment and replicate, where m = 1
...
[Figure: fragment-and-replicate schemes, with partitions of r and s assigned to processors Pi,j]
...
...
, P0,m−1 , P1,0 ,
...
Processor Pi,j computes the join of ri with sj
...
To do so, the system replicates ri to processors Pi,0 , Pi,1 ,
...
3b), and replicates si to processors P0,i , P1,i ,
...
3b)
...
Fragment and replicate works with any join condition, since every tuple in r can
be tested with every tuple in s
...
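The general fragment-and-replicate scheme can be sketched as a grid of n × m simulated processors. The round-robin fragmentation, the function name, and the sample band-join predicate are assumptions chosen for illustration; any partitioning technique and any join condition could be substituted.

```python
def fragment_and_replicate_join(r, s, n, m, theta):
    """Simulate fragment-and-replicate join with an arbitrary predicate theta."""
    # Fragment r into n parts and s into m parts (round-robin here).
    r_parts = [r[i::n] for i in range(n)]
    s_parts = [s[j::m] for j in range(m)]

    result = []
    for i in range(n):
        for j in range(m):
            # Processor P[i][j] holds a copy of r_i and a copy of s_j
            # and tests every pair locally.
            for t in r_parts[i]:
                for u in s_parts[j]:
                    if theta(t, u):
                        result.append((t, u))
    return result

r_keys = [1, 5, 9, 12]
s_keys = [2, 6, 10]
# A non-equi-join condition: values that differ by at most 1.
pairs = fragment_and_replicate_join(r_keys, s_keys, 2, 3, lambda a, b: abs(a - b) <= 1)
```

Setting m = 1 gives the asymmetric case: s is kept whole and copied to every processor that holds a fragment of r.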
Fragment and replicate usually has a higher cost than partitioning when both relations are of roughly the same size, since at least one of the relations has to be replicated
...
In
such a case, asymmetric fragment and replicate is preferable, even though partitioning could be used
...
5
...
3 Partitioned Parallel Hash–Join
The partitioned hash–join of Section 13
...
5 can be parallelized
...
, Pn−1 , and two relations r and s, such that the relations r and
s are partitioned across multiple disks
...
5
...
If the size of s is less than that of r, the parallel
hash–join algorithm proceeds this way:
1
...
Let ri denote the
tuples of relation r that are mapped to processor Pi ; similarly, let si denote the
tuples of relation s that are mapped to processor Pi
...
2
...
The partitioning at this stage is exactly the same as in the
partitioning phase of the sequential hash–join algorithm
...
3
...
As it receives each tuple, the destination processor repartitions it
by the function h2 , just as the probe relation is partitioned in the sequential
hash–join algorithm
...
Each processor Pi executes the build and probe phases of the hash–join algorithm on the local partitions ri and si of r and s to produce a partition of the
final result of the hash–join
...
Therefore, any
of the optimizations of the hash–join described in Chapter 13 can be applied as well
to the parallel case
...
20
...
2
...
Suppose that relation r is
stored by partitioning; the attribute on which it is partitioned does not matter
...
We use asymmetric fragment and replicate, with relation s being replicated and
with the existing partitioning of relation r
...
At the end of this phase, relation s is replicated at all sites
that store tuples of relation r
...
We can overlap the indexed nested-loop join with the
distribution of tuples of relation s, to reduce the costs of writing the tuples of relation
s to disk, and of reading them back
...
20
...
3 Other Relational Operations
The evaluation of other relational operations also can be parallelized:
• Selection
...
Consider first the case where θ is of the
form ai = v, where ai is an attribute and v is a value
...
If θ is of the form
l ≤ ai ≤ u (that is, θ is a range selection) and the relation has been range-partitioned on ai , then the selection proceeds at each processor whose partition overlaps with the specified range of values
...
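Deciding which processors take part in a range selection amounts to a lookup in the partition vector. The sketch below assumes a particular convention that the text does not spell out here: processor i holds the values v with vector[i-1] ≤ v < vector[i]. The function name and the vector [20, 50] are hypothetical.

```python
from bisect import bisect_right

def processors_for_range(partition_vector, low, high):
    """Return the indices of processors whose partition overlaps [low, high]."""
    first = bisect_right(partition_vector, low)   # partition holding `low`
    last = bisect_right(partition_vector, high)   # partition holding `high`
    return list(range(first, last + 1))

vector = [20, 50]   # three processors: v < 20, 20 <= v < 50, v >= 50
narrow = processors_for_range(vector, 20, 25)
wide = processors_for_range(vector, 10, 30)
```

A point selection ai = v is the special case low = high = v, which always maps to a single processor.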
• Duplicate elimination
...
We can also parallelize duplicate elimination by
partitioning the tuples (by either range or hash partitioning) and eliminating
duplicates locally at each processor
...
Projection without duplicate elimination can be performed as tuples are read in from disk in parallel
...
• Aggregation
...
We can parallelize the operation by partitioning the relation on the grouping attributes, and then com-
...
Either hash partitioning
or range partitioning can be used
...
We can reduce the cost of transferring tuples during partitioning by partly
computing aggregate values before partitioning, at least for the commonly
used aggregate functions
...
The system can perform the operation at each processor Pi on those r tuples
stored on disk Di
...
The system partitions the result of the local aggregation
on the grouping attribute A, and performs the aggregation again (on tuples
with the partial sums) at each processor Pi to get the final result
...
This idea can be extended easily to the min and
max aggregate functions
...
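Local pre-aggregation followed by repartitioning can be sketched as follows. The example assumes a sum over a second attribute (called B here purely for illustration) grouped by attribute A, with each tuple stored as an (A, B) pair; the function name and data are hypothetical.

```python
from collections import defaultdict

def parallel_group_sum(disks, n):
    """Simulate sum(B) grouped by A with partial aggregation before partitioning."""
    # Phase 1: each processor aggregates only the tuples on its own disk.
    partials = []
    for tuples in disks:
        local = defaultdict(int)
        for a, b in tuples:
            local[a] += b
        partials.append(local)

    # Phase 2: partial sums (not raw tuples) are partitioned on A, and each
    # receiving processor aggregates the partial sums again.
    received = [defaultdict(int) for _ in range(n)]
    for local in partials:
        for a, subtotal in local.items():
            received[hash(a) % n][a] += subtotal

    # Each group lives on exactly one processor, so the union is the answer.
    final = {}
    for part in received:
        final.update(part)
    return final

disks = [[("x", 1), ("y", 2), ("x", 3)], [("y", 4), ("z", 5)], [("x", 6)]]
totals = parallel_group_sum(disks, 2)
```

Only one partial result per group leaves each processor, rather than one message per tuple. Replacing the addition with min or max in both phases gives the extension mentioned above.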
8
...
20
...
4 Cost of Parallel Evaluation of Operations
We achieve parallelism by partitioning the I/O among multiple disks, and partitioning the CPU work among multiple processors
...
We
already know how to estimate the cost of an operation such as a join or a selection
...
We must also account for the following costs:
• Startup costs for initiating the operation at multiple processors
• Skew in the distribution of work among the processors, with some processors
getting a larger number of tuples than others
• Contention for resources — such as memory, disk, and the communication
network — resulting in delays
• Cost of assembling the final result by transmitting partial results from each
processor
The time taken by a parallel operation can be estimated as
Tpart + Tasm + max(T0 , T1 ,
...
Assuming that the
tuples are distributed without any skew, the number of tuples sent to each processor
can be estimated as 1/n of the total number of tuples
...
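The effect of skew on this estimate is easy to see with numbers. The sketch below assumes that the max in the formula ranges over the per-processor times of all processors taking part, and reads Tpart and Tasm as the partitioning and assembly times; the sample figures are invented.

```python
def parallel_time(t_part, t_asm, per_processor_times):
    """Estimated elapsed time: T_part + T_asm + max over the processors' times."""
    return t_part + t_asm + max(per_processor_times)

# The same total work (40 units) spread evenly and unevenly over 4 processors.
balanced = parallel_time(2, 1, [10, 10, 10, 10])
skewed = parallel_time(2, 1, [4, 4, 4, 28])
```

Both runs do the same amount of work in total, yet the skewed run takes more than twice as long, because the slowest processor alone determines the max term.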
The preceding estimate will be an optimistic estimate, since skew is common
...
A partitioned parallel evaluation, for instance, is only as fast as the slowest of the parallel executions
...
The problem of skew in partitioning is closely related to the problem of partition
overflow in sequential hash–joins (Chapter 13)
...
We can use balanced range partitioning and virtual processor partitioning
to minimize skew due to range partitioning, as in Section 20
...
3
...
6 Interoperation Parallelism
There are two forms of interoperation parallelism: pipelined parallelism, and independent parallelism
...
6
...
Recall that, in pipelining, the output tuples of one operation, A, are consumed by a second operation, B, even before the first
operation has produced the entire set of tuples in its output
...
Parallel systems use pipelining primarily for the same reason that sequential systems do
...
It is possible to
run operations A and B simultaneously on different processors, so that B consumes
tuples in parallel with A producing them
...
Consider a join of four relations:
r1 ⋈ r2 ⋈ r3 ⋈ r4
We can set up a pipeline that allows the three joins to be computed in parallel
...
As P1 computes tuples in r1 ⋈ r2, it
makes these tuples available to processor P2
...
P2 can use those tuples
that are available to begin computation of temp1 ⋈ r3, even before r1 ⋈ r2 is fully
computed by P1
...
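The dataflow of such a pipeline can be imitated with Python generators. This is only a sketch of the consumption pattern: generators run on a single thread, so nothing here executes in parallel, and the relation contents, column positions, and the trace list are assumptions added to make the interleaving visible.

```python
def pipelined_join(name, left, right, left_col, right_col, trace):
    """Join a streamed left input with a stored right input, one tuple at a time."""
    index = {}
    for u in right:
        index.setdefault(u[right_col], []).append(u)
    for t in left:                      # left may itself be a running pipeline stage
        for u in index.get(t[left_col], []):
            trace.append(name)          # record which stage produced a tuple
            yield t + u

r1 = [(1,), (2,)]
r2 = [(1, "a"), (2, "b")]
r3 = [("a", "x"), ("b", "y")]
r4 = [("x", 10), ("y", 20)]

trace = []
temp1 = pipelined_join("J1", r1, r2, 0, 0, trace)      # r1 join r2
temp2 = pipelined_join("J2", temp1, r3, 2, 0, trace)   # temp1 join r3
final = pipelined_join("J3", temp2, r4, 4, 0, trace)   # temp2 join r4
result = list(final)
```

The trace shows that the second and third joins emit their first tuple before the first join has finished, and no intermediate result is ever materialized in full.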
...
First, pipeline chains generally do not attain sufficient length to provide a high degree of parallelism
...
Third, only marginal speedup is obtained for the frequent
cases in which one operator’s execution cost is much higher than are those of the
others
...
The real reason for using pipelining is that pipelined executions can avoid writing intermediate results to disk
...
6
...
This form of parallelism is called independent parallelism
...
Clearly, we can compute temp1 ← r1 ⋈ r2
in parallel with temp2 ← r3 ⋈ r4
...
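The scheduling structure of independent parallelism can be sketched with the standard `concurrent.futures` module. The relations and the helper function are hypothetical, and a thread pool is used only to show that the two joins are submitted without waiting for each other; in CPython, threads do not by themselves give CPU parallelism for this kind of work.

```python
from concurrent.futures import ThreadPoolExecutor

def join_on_first_column(left, right):
    """Equi-join two lists of tuples on their first column."""
    index = {}
    for u in right:
        index.setdefault(u[0], []).append(u)
    return [t + u[1:] for t in left for u in index.get(t[0], [])]

r1 = [(1, "a"), (2, "b")]
r2 = [(1, "c"), (2, "d")]
r3 = [(1, "e")]
r4 = [(1, "f"), (3, "g")]

with ThreadPoolExecutor(max_workers=2) as pool:
    # The two joins do not depend on each other, so both are started at once.
    future1 = pool.submit(join_on_first_column, r1, r2)
    future2 = pool.submit(join_on_first_column, r3, r4)
    temp1 = future1.result()
    temp2 = future2.result()

# The last join needs both intermediate results, so it runs afterwards.
final_join = join_on_first_column(temp1, temp2)
```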
7
...
2)
...
20
...
3 Query Optimization
Query optimizers account in large measure for the success of relational technology
...
Query optimizers for parallel query evaluation are more complicated than query
optimizers for sequential query evaluation
...
More important is the issue of how
to parallelize a query
...
The expression can be represented by an operator tree, as in Section 13
...
To evaluate an operator tree in a parallel system, we must make the following
decisions:
• How to parallelize each operation, and how many processors to use for it
• What operations to pipeline across different processors, what operations to execute independently in parallel, and what operations to execute sequentially,
one after the other
These decisions constitute the task of scheduling the execution tree
...
For instance, it may appear wise to use the maximum amount of
parallelism available, but it is a good idea not to execute certain operations in parallel
...
Otherwise,
the advantage of parallelism is negated by the overhead of communication
...
Unless the operations are coarse grained, the final operation of the pipeline may
wait for a long time to get inputs, while holding precious resources, such as memory
...
The number of parallel evaluation plans from which to choose is much larger than
the number of sequential evaluation plans
...
Hence, we usually adopt heuristic approaches to reduce the number of parallel execution plans that we have to consider
...
The first heuristic is to consider only evaluation plans that parallelize every operation across all processors, and that do not use any pipelining
...
Finding the best such execution plan is like doing query optimization in a sequential system
...
The second heuristic is to choose the most efficient sequential evaluation plan,
and then to parallelize the operations in that evaluation plan
...
This model uses existing implementations of operations, operating on local copies of
data, coupled with an exchange operation that moves data around between different
processors
...
Yet another dimension of optimization is the design of physical-storage organization to speed up queries
...
The database administrator must choose a physical organization that appears to be good for the expected mix of database queries
...
20
...
Since large-scale parallel database systems are used primarily for storing
large volumes of data, and for processing decision-support queries on those data,
these topics are the most important in a parallel database system
...
...
With a large number of processors and disks, the probability that at least one processor or disk will malfunction is significantly greater than in a single-processor system with one disk
...
Assuming that the probability of failure of a single processor or disk is small, the probability of failure of the system goes up linearly
with the number of processors and disks
...
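The near-linear growth can be checked numerically. The sketch assumes that components fail independently, each with the same probability p; under that assumption the chance that at least one of n components fails is 1 − (1 − p)^n, which is close to n·p while n·p is small. The sample values are invented.

```python
def prob_any_failure(p, n):
    """Probability that at least one of n independent components fails."""
    return 1 - (1 - p) ** n

p = 0.001
exact_100 = prob_any_failure(p, 100)
linear_100 = 100 * p          # the linear approximation n * p
exact_200 = prob_any_failure(p, 200)
```

Doubling the number of components roughly doubles the failure probability in this regime; the approximation overestimates slightly and stops being useful once n·p approaches 1.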
Therefore, large-scale parallel database systems, such as Compaq Himalaya,
Teradata, and Informix XPS (now a division of IBM), are designed to operate even
if a processor or disk fails
...
If a processor fails, the data that it stored can still be accessed from the other processors
...
Requests for data stored at the failed site are automatically routed to the
backup sites that store a replica of the data
...
Therefore, the replicas
of the data of a processor are partitioned across multiple other processors
...
Therefore, it is
unacceptable for the database system to be unavailable while such operations are in
progress
...
Consider, for instance, online index construction
...
The index-building operation therefore cannot lock the entire
relation in shared mode, as it would have done otherwise
...
20
...
• In I/O parallelism, relations are partitioned among available disks so that
they can be retrieved faster
...
• Skew is a major problem, especially with increasing degrees of parallelism
...
• In interquery parallelism, we run different queries concurrently to increase
throughput
...
There
are two types of intraquery parallelism: intraoperation parallelism and interoperation parallelism
...
Intraoperation parallelism is natural for relational
operations, since they are set oriented
...
In partitioned parallelism, the relations are split into several parts, and
tuples in ri are joined with only tuples from si
...
In fragment and replicate, both relations are partitioned and each partition is replicated
...
Unlike partitioned parallelism, fragment and replicate and asymmetric fragment-and-replicate
can be used with any join condition
...
• In independent parallelism, different operations that do not depend on one
another are executed in parallel
...
• Query optimization in parallel databases is significantly more complex than
query optimization in sequential databases
...
20
...
1 For each of the three partitioning techniques, namely round-robin, hash partitioning, and range partitioning, give an example of a query for which that
partitioning technique would provide the fastest response
...
2 In a range selection on a range-partitioned attribute, it is possible that only
one disk may need to be accessed
...
20
...
Hash partitioning
b
...
4 What form of parallelism (interquery, interoperation, or intraoperation) is likely
to be the most important for each of the following tasks
...
Increasing the throughput of a system with many small queries
b
...
20
...
5 With pipelined parallelism, it is often a good idea to perform several operations
in a pipeline on a single processor, even when many processors are available
...
Explain why
...
Would the arguments you advanced in part a hold if the machine has a
shared-memory architecture? Explain why or why not
...
Would the arguments in part a hold with independent parallelism? (That
is, are there cases where, even if the operations are not pipelined and there
are many processors available, it is still a good idea to perform several
operations on the same processor?)
20
...
What attributes should be used for partitioning?
20
...
How can you optimize the evaluation if the join condition is of
the form | r
...
B | ≤ k, where k is a small constant
...
A join with such a join condition is called a band join
...
8 Describe a good way to parallelize each of the following
...
Full outer join, if the join condition involves comparisons other than equality
a
...
c
...
e
...
20
...
a
...
, 91 – 100, with frequencies
15, 5, 20, 10, 10, 5, 5, 20, 5, and 5, respectively
...
b
...
20
...
20
...
a
...
What are the benefits and drawbacks of using RAID storage instead of storing an extra copy of each data item?
Bibliographical Notes
Relational database systems began appearing in the marketplace in 1983; now, they
dominate it
...
A commercial system, Teradata, and several
research projects, such as GRACE (Kitsuregawa et al
...
[1986]),
GAMMA (DeWitt et al
...
[1990]) were
launched in quick succession
...
Subsequently,
in the late 1980s and the 1990s, several more companies — such as Tandem, Oracle,
Sybase, Informix, and Red-Brick (now a part of Informix, which is itself now a part of
IBM) — entered the parallel database market
...
[1989]) and Volcano (Graefe [1990])
...
Cache-coherency protocols for parallel database systems are discussed by Dias et al
...
Carey et al
...
Parallelism and recovery in database systems are discussed by
Bayer et al
...
Graefe [1993] presents an excellent survey of query processing, including parallel processing of queries
...
[1992]
...
[1984], Kitsuregawa et al
...
[1987], Schneider and DeWitt [1989], Kitsuregawa and Ogawa [1990],
Lin et al
...
[1995], among other works
...
[1992], Deshpande and Larson [1992], and Shatdal and Naughton [1993]
...
[1991], Wolf [1991],
and DeWitt et al
...
Sampling techniques for parallel databases are described
by Seshadri and Naughton [1992] and Ganguly et al
...
The exchange-operator
model was advocated by Graefe [1990] and Graefe [1993]
...
Lu and Tan [1991],
Hong and Stonebraker [1991], Ganguly et al
...
[1993], Hasan
and Motwani [1995], and Jhingran et al
...
Part VII
...
The chapter first outlines how to implement user interfaces, in particular Web-based interfaces
...
Chapter 22 describes a number of recent advances in querying and information
retrieval
...
It next covers data warehousing, whereby
data generated by different parts of an organization are gathered centrally
...
Finally, the chapter describes information retrieval,
which deals with techniques for querying collections of text documents, such as Web
pages, to find documents of interest
...
Applications such as mobile
computing, and its connections with databases, are also described in this chapter
...
Chapter 21
Application Development and Administration
Practically all use of databases occurs from within application programs
...
Not surprisingly, therefore, database systems have long supported tools such
as form and GUI builders, which help in rapid development of applications that interface with users
...
Once an application has been built, it is often found to run slower than the designers wanted, or to handle fewer transactions per second than they required
...
Benchmarks help to characterize the performance of database systems
...
A variety of standards have been proposed that affect database application
development
...
Legacy systems are systems based on older-generation technology
...
We outline issues
in interfacing with legacy systems, and how they can be replaced by other systems
...
1 Web Interfaces to Databases
The World Wide Web (Web, for short) is a distributed information system based on
hypertext
...
After outlining
several reasons for interfacing databases with the Web (Section 21
...
1), we provide
an overview of Web technology (Section 21
...
2) and then study Web servers (Section 21
...
3) and outline some state-of-the-art techniques for building Web interfaces
to databases, using servlets and server-side scripting languages (Sections 21
...
4 and
21
...
5)
...
1
...
21
...
1 Motivation
The Web has become important as a front end to databases for several reasons: Web
browsers provide a universal front end to information supplied by back ends located
anywhere in the world
...
Further, today, almost everyone who can afford it has access to the Web
...
The HTML forms interface is convenient for transaction processing
...
The server executes an application program
corresponding to the order form, and this action in turn executes transactions on a
database at the server site
...
Another reason for interfacing databases to the Web is that presenting only static
(fixed) documents on a Web site has some limitations, even when the user is not
doing any querying or transaction processing:
• Fixed Web documents do not allow the display to be tailored to the user
...
• When the company data are updated, the Web documents become obsolete
if they are not updated simultaneously
...
We can fix these problems by generating Web documents dynamically from a database
...
Whenever relevant data in the database are updated, the generated documents will automatically become up-to-date
...
Web interfaces provide attractive benefits even for database applications that are
used only within a single organization
...
Hyperlinks, which are links to other documents, can be associated with regions of
the displayed data
...
Hyperlinks are very useful for browsing data, permitting users to get more
details of parts of the data as desired
...
Programs can be written in client-side scripting languages, such as
JavaScript, or can be “applets” written in the Java language
...
the construction of sophisticated user interfaces, beyond what is possible with just
HTML, interfaces that can be used without downloading and installing any software
...
21
...
2 Web Fundamentals
Here we review some of the fundamental technology behind the World Wide Web,
for readers who are not familiar with it
...
1
...
1 Uniform Resource Locators
A uniform resource locator (URL) is a globally unique name for each document that
can be accessed on the Web
...
bell-labs
...
The second part gives the unique name
of a machine that has a Web server
...
Much data on the Web is dynamically generated
...
An example of such a URL is
http://www
...
com/search?q=silberschatz
which says that the program search on the server www
...
com should be executed with the argument q=silberschatz
...
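How such a URL breaks down into a program name and an argument can be shown with Python's standard `urllib.parse` module. The host name in the source is elided, so www.example.com below is a placeholder and not the original address.

```python
from urllib.parse import urlsplit, parse_qs

# Placeholder host; only the path and query string come from the text.
url = "http://www.example.com/search?q=silberschatz"

parts = urlsplit(url)
protocol = parts.scheme            # how the document is accessed
machine = parts.netloc             # the machine running the Web server
program = parts.path               # the program the server should execute
arguments = parse_qs(parts.query)  # the arguments passed to that program
```

The server receives the path and the query string separately, which is how it knows to run the program search with the argument q set to silberschatz.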
21
...
2
...
1 is an example of the source of an HTML document
...
2 shows the
displayed image that this document creates
...
HTML also supports several other input types
...
The program generates an HTML document, which is then
sent back and displayed to the user; we will see how to construct such programs in
Sections 21
...
3, 21
...
4, and 21
...
5
...
The cascading stylesheet (css) standard allows the same
| A-101 | Downtown | 500 |
| A-102 | Perryridge | 400 |
| A-201 | Brighton | 900 |