Copyright (c) 2003 C. J. Date
• An online application is an application whose purpose is to
support an end user who is accessing the database from an
online workstation or terminal.
• Persistent data is data whose lifetime typically exceeds that
of individual application program executions. In other words,
it is data that (a) is stored in the database and (b) persists
from the moment it is created until the moment it is
explicitly destroyed. (Nonpersistent data, by contrast, is
typically destroyed implicitly when the application program
that created it ceases execution, or possibly even sooner.)
• A property is some characteristic or feature possessed by
some entity (or some relationship). Examples are a person's
name, a part's weight, a car's color, or a contract's
duration. (By the way, is a contract an entity or a
relationship? What do you think? Justify your answer!)
• A query language is a language that supports the expression
of high-level commands (such as SELECT, INSERT, etc.) to the
DBMS. SQL is an example of such a language. Note: Despite
the name, query languages typically support much more than
just query──i.e., retrieval──operations alone. (Though not
always! OQL and XQuery──see Chapter 25 and Chapter 27,
respectively──are examples of query languages that do support
retrieval only.)
• Redundancy means the very same piece of information (say the
fact that a certain employee is in a certain department) is
recorded more than once, possibly in more than one way. Note
that redundancy at the physical storage level is often
desirable (for performance reasons), while redundancy at the
logical user level is usually undesirable (because it
complicates the user interface, among other things). But
physical redundancy need not imply logical redundancy, so long
as the system provides an adequate degree of data
independence.
• A relationship is an association among entities. Note: As
with entities, it is strictly necessary to distinguish between
relationship types and relationship occurrences or instances,
but in informal contexts we often use the same term
relationship for both concepts.
• Security means the protection of the data in the database
against unauthorized access.
• Sharing refers to the possibility that individual pieces of
data in the database can be shared among several different
users, in the sense that each of those users can have access
to the same piece of data, possibly even at the same time (and
different users can use it for different purposes).
• A stored field is the smallest unit of stored data.* The
type vs. occurrence (or instance) distinction is important
once again, just as it is with entities and relationships.
──────────
* But see Appendix A (regarding not only this term but also the
terms stored file and stored record).
──────────
• A stored file is the collection of all currently existing
occurrences of one type of stored record.
• A stored record is a collection of related stored fields.
The type vs. occurrence distinction is important yet again.
• A transaction is a logical unit of work, typically involving
several database operations (in particular, several update
operations), whose execution is guaranteed to be atomic──i.e.,
all or nothing──from a logical point of view.
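The "all or nothing" property of a transaction can be sketched in a few lines. What follows is only an illustration, using Python's standard sqlite3 module; the ACCOUNT table, its CHECK constraint, and the transfer function are hypothetical names invented for the example, not anything taken from the book.

```python
import sqlite3

# Hypothetical table and values, used only to illustrate atomicity.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ACCOUNT (ID INTEGER PRIMARY KEY, "
    "BALANCE INTEGER CHECK (BALANCE >= 0))"
)
conn.executemany("INSERT INTO ACCOUNT VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount from src to dst as one logical unit of work."""
    try:
        # The connection used as a context manager commits if the block
        # succeeds and rolls back if an exception escapes from it.
        with conn:
            conn.execute(
                "UPDATE ACCOUNT SET BALANCE = BALANCE + ? WHERE ID = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE ACCOUNT SET BALANCE = BALANCE - ? WHERE ID = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    """Return {ID: BALANCE} for every account."""
    return dict(conn.execute("SELECT ID, BALANCE FROM ACCOUNT ORDER BY ID"))


ok = transfer(conn, 1, 2, 30)       # both updates are applied
failed = transfer(conn, 1, 2, 500)  # second update violates the CHECK,
                                    # so the first update is undone too
```

The credit is deliberately performed before the debit, so that a failed transfer would leave a visible half-finished state if the two updates were not treated as a unit.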
1.2 Some of the advantages are as follows:
• Compactness
• Speed
• Less drudgery
• Currency
• Centralized control
• Data independence
Some of the disadvantages are as follows:
• Security might be compromised (without good controls)
• Integrity might be compromised (without good controls)
• Additional hardware might be required
• Performance overhead might be significant
• Successful operation is crucial (the enterprise might be
highly vulnerable to failure)
• The system is likely to be complex (though such complexity should be concealed from the user)
1.3 A relational system is a system that is based on the
relational model. Loosely speaking, therefore, it is a system in which:
a. The data is perceived by the user as tables (and nothing but tables).
b. The operators at the user's disposal (e.g., for data
retrieval) are operators that generate new tables from old.
In a nonrelational system, by contrast, the user is presented with
data in the form of other structures, either instead of or in
addition to the tables of a relational system. Those other
structures, in turn, require other operators to manipulate them.
For example, in a hierarchic system, the data is presented to the
user in the form of a set of tree structures (hierarchies), and
the operators provided for manipulating such structures include
operators for traversing hierarchic paths──in effect, following
pointers──up and down those trees.
Note: It's worth pointing out that, in a sense, a relation
might be thought of as a special case of a hierarchy (to be
specific, it's a root-only hierarchy). In principle, therefore, a
hierarchic system requires all of the relational operators plus
certain additional operators. And those additional operators
certainly add complexity, but they don't add any functionality
(there's nothing useful that can be done with hierarchies that
can't be done with just relations).
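Point b, operators that "generate new tables from old," can be made concrete with a small sketch. It uses Python's standard sqlite3 module purely as a convenient way to run SQL; the three-column CELLAR table and its rows are assumptions made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical cut-down CELLAR table; the rows are illustrative only.
conn.execute("CREATE TABLE CELLAR (WINE TEXT, PRODUCER TEXT, BOTTLES INTEGER)")
conn.executemany(
    "INSERT INTO CELLAR VALUES (?, ?, ?)",
    [
        ("Chardonnay", "Buena Vista", 1),
        ("Chardonnay", "Geyser Peak", 5),
        ("Zinfandel", "Rafanelli", 9),
    ],
)

# A restriction: an old table goes in, a new table comes out.
restricted = "SELECT WINE, PRODUCER FROM CELLAR WHERE BOTTLES > 2"

# Because that result is itself a table, it can serve as the operand
# of a further operator (here, a projection over WINE).
projected = f"SELECT DISTINCT WINE FROM ({restricted}) ORDER BY WINE"

wines = [row[0] for row in conn.execute(projected)]
```

Nothing other than tables appears at any step, which is the sense in which the user sees "tables (and nothing but tables)."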
1.4 A data model is an abstract, self-contained, logical
definition of the objects,* operators, and so forth, that together
constitute the abstract machine with which users interact (the
objects allow us to model the structure of data, the operators
allow us to model its behavior). An implementation of a given
data model is a physical realization on a real machine of the
components of that model. In a nutshell: The model is what users
have to know about; the implementation is what users don't have
to know about.
──────────
* The term object is being used here in its generic sense, not
its special object-oriented sense.
──────────
The difference between model and implementation is important
because (among other things) it forms the basis for achieving data
independence.
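A rough illustration of the model vs. implementation distinction follows, again using Python's standard sqlite3 module. The rows are borrowed from the result shown in Answer 1.5 c; the index name CELLAR_YEAR is an assumption. Adding the index changes how the data is stored and accessed, but the query as the user writes it, and the result the user sees, stay the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# BIN# has to be written as a quoted identifier in SQLite.
conn.execute(
    'CREATE TABLE CELLAR ("BIN#" INTEGER PRIMARY KEY, WINE TEXT, YEAR INTEGER)'
)
conn.executemany(
    "INSERT INTO CELLAR VALUES (?, ?, ?)",
    [(6, "Chardonnay", 2002), (22, "Fumé Blanc", 2000), (52, "Pinot Noir", 1999)],
)

# The user's view of the database: a table and a query against it.
QUERY = 'SELECT "BIN#", WINE FROM CELLAR WHERE YEAR < 2001 ORDER BY "BIN#"'

before = conn.execute(QUERY).fetchall()

# An implementation-level change (hypothetical index name). The query
# text above is not touched.
conn.execute("CREATE INDEX CELLAR_YEAR ON CELLAR (YEAR)")

after = conn.execute(QUERY).fetchall()
```

This shows only one small aspect of data independence (immunity to a change in access paths); it isn't meant as a complete account.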
        ┌───────────┬───────────┐
1.5  a. │ WINE      │ PRODUCER  │
        ├═══════════┼═══════════┤
        │ Zinfandel │ Rafanelli │
        └───────────┴───────────┘
        ┌────────────────┬──────────────┐
     b. │ WINE           │ PRODUCER     │
        ├════════════════┼══════════════┤
        │ Chardonnay     │ Buena Vista  │
        │ Chardonnay     │ Geyser Peak  │
        │ Joh Riesling   │ Jekel        │
        │ Fumé Blanc     │ Ch St Jean   │
        │ Gewurztraminer │ Ch St Jean   │
        └────────────────┴──────────────┘
        ┌──────┬────────────┬──────┐
     c. │ BIN# │ WINE       │ YEAR │
        ├══════┼────────────┼──────┤
        │    6 │ Chardonnay │ 2002 │
        │   22 │ Fumé Blanc │ 2000 │
        │   52 │ Pinot Noir │ 1999 │
        └──────┴────────────┴──────┘
        ┌────────────────┬──────┬──────┐
     d. │ WINE           │ BIN# │ YEAR │
        ├────────────────┼══════┼──────┤
        │ Cab Sauvignon  │   48 │ 1997 │
        └────────────────┴──────┴──────┘
1.6 We give a solution for part a only: "Rafanelli is a producer
of Zinfandel"──or, more precisely, "Some bin contains some bottles
of Zinfandel that were produced by Rafanelli in some year, and they will be ready to drink in some year."
1.7 a. The specified row (for bin number 80) is added to the
CELLAR table.
b. The rows for bin numbers 45, 48, 64, and 72 are deleted from the CELLAR table.
c. The row for bin number 50 has the number of bottles set to 5.
d. Same as c.
Incidentally, note how convenient it is to be able to refer to
rows by their primary key value (the primary key for the CELLAR
table is {BIN#}──see Chapter 8). In other words, such key values
effectively provide a row-level addressing mechanism in a
relational system.
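The "row-level addressing" remark can be checked with a short sketch. It uses Python's standard sqlite3 module; the rows are hypothetical and the table is cut down to three columns. Since a primary key value identifies at most one row, an update or delete that specifies such a value touches either exactly one row or none.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# BIN# has to be written as a quoted identifier in SQLite.
conn.execute(
    'CREATE TABLE CELLAR ("BIN#" INTEGER PRIMARY KEY, WINE TEXT, BOTTLES INTEGER)'
)
# Hypothetical rows, not the book's actual CELLAR contents.
conn.executemany(
    "INSERT INTO CELLAR VALUES (?, ?, ?)",
    [(30, "Chardonnay", 1), (50, "Pinot Noir", 3), (72, "Zinfandel", 2)],
)

# rowcount reports how many rows each statement affected.
updated = conn.execute(
    'UPDATE CELLAR SET BOTTLES = 5 WHERE "BIN#" = ?', (50,)
).rowcount
deleted = conn.execute('DELETE FROM CELLAR WHERE "BIN#" = ?', (72,)).rowcount
missing = conn.execute('DELETE FROM CELLAR WHERE "BIN#" = ?', (99,)).rowcount

remaining = conn.execute(
    'SELECT "BIN#", BOTTLES FROM CELLAR ORDER BY "BIN#"'
).fetchall()
```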
1.8  a. SELECT BIN#, WINE, BOTTLES
        FROM   CELLAR
        WHERE  PRODUCER = 'Geyser Peak' ;
     b. SELECT BIN#, WINE
        FROM   CELLAR
        WHERE  BOTTLES > 5 ;
     c. SELECT BIN#
        FROM   CELLAR
        WHERE  WINE = 'Cab Sauvignon'
        OR     WINE = 'Pinot Noir'
        OR     WINE = 'Zinfandel'
        OR     WINE = 'Syrah'
        OR     ... ;
        There's no shortcut answer to this question, because "color
        of wine" isn't explicitly recorded in the database; thus, the
        DBMS doesn't know that (e.g.) Pinot Noir is red.
     d. UPDATE CELLAR
        SET    BOTTLES = BOTTLES + 3
        WHERE  BIN# = 30 ;
     e. DELETE
        FROM   CELLAR
        WHERE  WINE = 'Chardonnay' ;
     f. INSERT
        INTO   CELLAR ( BIN#, WINE, PRODUCER, YEAR, BOTTLES, READY )
        VALUES ( 55, 'Merlot', 'Gary Farrell', 2000, 12, 2005 ) ;
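For readers who want to try these statements, here is one way they might be run, using Python's standard sqlite3 module. The three starting rows are hypothetical (they are not the book's CELLAR contents), BIN# is written as a quoted identifier because SQLite requires it, and part c uses an IN list as an equivalent of the OR chain, limited to the four red wines the answer names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    '''CREATE TABLE CELLAR (
           "BIN#" INTEGER PRIMARY KEY, WINE TEXT, PRODUCER TEXT,
           YEAR INTEGER, BOTTLES INTEGER, READY INTEGER)'''
)
# Hypothetical rows, not the book's actual CELLAR contents.
conn.executemany(
    "INSERT INTO CELLAR VALUES (?, ?, ?, ?, ?, ?)",
    [
        (3, "Chardonnay", "Geyser Peak", 2002, 5, 2004),
        (30, "Pinot Noir", "Gary Farrell", 2000, 3, 2003),
        (48, "Cab Sauvignon", "Geyser Peak", 1997, 12, 2006),
    ],
)

# a. Rows for one producer.
a = conn.execute(
    '''SELECT "BIN#", WINE, BOTTLES FROM CELLAR
       WHERE PRODUCER = 'Geyser Peak' ORDER BY "BIN#"'''
).fetchall()

# c. The OR chain written as an IN list. The list of red wines still has
#    to be supplied by the user, since color is not recorded in the table.
REDS = ("Cab Sauvignon", "Pinot Noir", "Zinfandel", "Syrah")
placeholders = ", ".join("?" for _ in REDS)
c = [
    row[0]
    for row in conn.execute(
        f'SELECT "BIN#" FROM CELLAR WHERE WINE IN ({placeholders}) '
        'ORDER BY "BIN#"',
        REDS,
    )
]

# d. Add three bottles to bin 30.
conn.execute('UPDATE CELLAR SET BOTTLES = BOTTLES + 3 WHERE "BIN#" = 30')
# e. Remove every Chardonnay row.
conn.execute("DELETE FROM CELLAR WHERE WINE = 'Chardonnay'")
# f. Add the Merlot row from the answer.
conn.execute(
    '''INSERT INTO CELLAR ("BIN#", WINE, PRODUCER, YEAR, BOTTLES, READY)
       VALUES (55, 'Merlot', 'Gary Farrell', 2000, 12, 2005)'''
)

final = conn.execute(
    'SELECT "BIN#", WINE, BOTTLES FROM CELLAR ORDER BY "BIN#"'
).fetchall()
```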
1.9 No answer provided.
*** End of Chapter 1 ***
Chapter 2
Database System Architecture
Principal Sections
• The three levels of the architecture
• The external level
• The conceptual level
• The internal level
• Mappings
• The DBA
• The DBMS
• Data communications
• Client/server architecture
• Utilities
• Distributed processing
General Remarks
This chapter resembles Chapter 1 in that it's probably best given just a "once over lightly" treatment on a first pass. As with Chapter 1, therefore, it's not really worth giving a blow-by-blow analysis of the individual sections here. However, the following topics, at least, should be touched on in a live class:
• The external, conceptual, and internal levels (and common
synonyms──e.g., physical or stored in place of internal,
community logical or just logical in place of conceptual, user
logical or just logical in place of external; the
terminology issue rears its ugly head again!)
• DDLs, DMLs, and schemas (the last of these also known more
simply as data definitions)
• Point out that the relational model has nothing explicit to
say regarding the internal level (deliberately, of course)
• Logical data independence (at least a brief mention, with a
forward reference to Chapters 3 and──especially──10)
• Steps in processing and executing a DML request (hence, an
overview of the basic components of a DBMS)
• Basic client/server concepts (and note that client vs. server
is, primarily, a logical distinction, not a physical one)
• Basic idea (very superficial) of distributed systems
Note: Section 2.2 and (to a lesser extent) subsequent
sections make use of a rather trivial example based on PL/I and
COBOL. Of course, I do realize that PL/I and COBOL are regarded
as antediluvian in some circles (though they're still very
significant commercially), but which actual languages are used
isn't important! What's more, no PL/I- or COBOL-specific
knowledge is really needed in order to follow the example.
Naturally you can substitute your own favorite more modern
languages if you prefer.
Answers to Exercises
2.1 See Fig. 2.3 in the body of the chapter.
2.2 Some of the following definitions elaborate slightly on those given in the body of the chapter.
• Back end: Same as server, q.v.
• A client is an application that runs on top of the
DBMS──either a user-written application or a "built-in"
application, i.e., an application provided by the DBMS vendor
or some third-party software vendor. The term is also used to
refer to the hardware platform the client application runs on,
especially when that platform is distinct from the one the
server runs on.
• The conceptual view is an abstract representation of the
database in its entirety. The conceptual schema is a
definition of that conceptual view. The conceptual DDL is a
language for writing conceptual schemas.
• The conceptual/internal mapping defines the correspondence
between the conceptual view and the stored database.
• A data definition language (DDL) is a language for defining,
or declaring, database objects.
• The data dictionary is a system database that contains "data
about the data"──i.e., definitions of other objects in the
system, also known as metadata (in particular, all of the
various schemas and mappings will physically be stored, in
both source and object form, in the dictionary). A
comprehensive dictionary will also include cross-reference
information, showing, for instance, which applications use
which pieces of the database, which users require which
reports, what terminals or workstations are connected to the
system, and so on. The dictionary might even──in fact,
probably should──be integrated into the database it defines,
and thus include its own definition (i.e., be
"self-describing").
• A data manipulation language (DML) is a language for
"manipulating" or processing database objects.
• A data sublanguage is that portion of a given language that's
concerned specifically with database objects and operations.
It might or might not be clearly separable from the host
language (q.v.) in which it's embedded or from which it's
invoked.
• A database/data-communications system (DB/DC system) is a
combination of a DC manager and a DBMS, in which the DBMS
looks after the database and the DC manager handles all
messages to and from the DBMS (or, more accurately, to and
from applications that use the DBMS).
• The data communications manager (DC manager) is a software
component that manages all message transmissions between the
user and the DBMS (more accurately, between the user and some
application running on top of the DBMS).
• A distributed database is (loosely) a database that is
logically centralized but physically distributed across many
distinct physical sites. It's a little difficult to make this
definition more precise (different writers tend to use the
term in different ways); carried to its logical conclusion,
however, full support for distributed database implies that a
single application should be able to operate "transparently"
on data that is spread across a variety of different
databases, managed by a variety of different DBMSs, running on
a variety of different machines, supported by a variety of
different operating systems, and connected together by a
variety of different communication networks──where
"transparently" means that the application operates from a
logical point of view as if the data were all managed by a
single DBMS running on a single machine.
• Distributed processing means that distinct machines can be
connected together into some kind of communications network,
in such a way that a single data processing task can be spread
across several machines in the network (and, typically,
carried out in parallel).
• An external view is a more or less abstract representation of
some portion of the total database. An external schema is a
definition of such an external view. An external DDL is a
language for writing external schemas.
• An external/conceptual mapping defines the correspondence
between an external view and the conceptual view.
• Front end: Same as client, q.v.
• A host language is a language in which a data sublanguage is
embedded. The host language is responsible for providing
various nondatabase facilities, such as I/O operations, local
variables, computational operations, if-then-else logic, and
so on.
• Load is the process of creating the initial version of the
database (or portions thereof) from one or more nondatabase
files.
• Logical database design is the process of identifying the
entities of interest to the enterprise and identifying the
information to be recorded about those entities. Note:
Chapter 9 and Part III of the book make it clear that
integrity constraints are highly relevant to the logical
database design process. Note too that logical design should
be done before the corresponding physical design (q.v.).
• The internal view is the database as physically stored.* The
internal schema is the definition of that internal view. The
internal DDL is a language for writing internal schemas.
Note: The book usually uses the more intuitive terms "stored
database" and "stored database definition" in place of
"internal view" and "internal schema," respectively.
──────────
* A slight oversimplification. To paraphrase some remarks from
Section 2.5, the internal view is really "at one remove" from the
physical level, since it doesn't deal with physical records──also
called blocks or pages──nor with device-specific considerations
such as cylinder or track sizes. In other words, it effectively
assumes an unbounded linear address space; details of how that
address space maps to physical storage are highly system-specific
and are deliberately omitted from the general architecture.