An Introduction to Database Systems 8Ed - C J Date - Solutions Manual Episode 2 Part 10 pps


[PNUM = $sx//PNUM][@COLOR = 'Blue'] return

{ $sx/SNUM, $sx/SNAME, $sx/STATUS, $sx/CITY }

</Supplier>

}

</Result>

27.22 Since the document doesn't have any immediate child elements of type Supplier, the return clause is never executed, and the result is the empty sequence. Note: If the query had been

formulated slightly differently, as follows──

<Result>
{ for $sx in document("SuppliersOverShipments.xml")/
             Supplier[CITY = 'London']
  return
    <whatever>
      { $sx/SNUM, $sx/SNAME, $sx/STATUS, $sx/CITY }
    </whatever>
}
</Result>

──then the result would have looked like this:

</Result>

27.23 There appears to be no difference. Here's an actual example (query 1.1.9.3 Q3 from the W3C XML Query Use Cases document; see reference [27.29]):

• Query:

<results>
{
  for $b in document("http://www.bn.com/bib.xml")/bib/book,
      $t in $b/title,
      $a in $b/author
  return
    <result>
      { $t } { $a }
    </result>
}
</results>

• Query (modified):


<results>
{
  for $b in document("http://www.bn.com/bib.xml")/bib/book,
      $t in $b/title,
      $a in $b/author
  return
    <result>
      { $t, $a }
    </result>
}
</results>

• Result (for both queries):*

<title>TCP/IP Illustrated</title>

<last>Stevens</last>

</author>

</result>

<title>Advanced Unix Programming</title>

<last>Stevens</last>

</author>

</result>

<last>Abiteboul</last>

<first>Serge</first>

</author>

</result>

</results>

──────────

* Again we've altered the "official" result very slightly for formatting reasons.

──────────

27.24 See Section 27.6.


27.25 The following observations, at least, spring to mind immediately:

• Several of the functions perform what is essentially type conversion. The expression XMLFILETOCLOB ('BoltDrawing.svg'), for example, might be more conventionally written something like this:

CAST_AS_CLOB ( 'BoltDrawing.svg' )

In other words, XMLDOC should be recognized as a fully fledged type (see Section 27.6, subsection "Documents as Attribute Values").

• Likewise, the expression XMLCONTENT (DRAWING, 'RetrievedBoltDrawing.svg') might more conventionally be written thus:

DRAWING := CAST_AS_XMLDOC ( 'RetrievedBoltDrawing.svg' ) ;

In fact, XMLCONTENT is an update operator (see Chapter 5), and the whole idea of being able to invoke it from inside a read-only operation (SELECT in SQL) is more than a little suspect [3.3].

• Consider the expression XMLFILETOCLOB ('BoltDrawing.svg') once again. The argument here is apparently of type character string. However, that character string is interpreted (in fact, it is dereferenced; see Chapter 26), which means that it can't be just any old character string. In fact, the XMLFILETOCLOB function is more than a little reminiscent of the EXECUTE IMMEDIATE operation of dynamic SQL (see Chapter 4).

• Remarks analogous to those in the previous paragraph apply also to arguments like

'//PartTuple[PNUM = "P3"]/WEIGHT'

(see the XMLEXTRACTREAL example).

27.26 The suggestion is correct, in the following sense. Consider any of the PartsRelation documents shown in the body of the chapter. Clearly it would be easy, albeit tedious, to show a tuple containing exactly the same information as that document (though it's true that the tuple in question would contain just one component, corresponding to the XML document in its entirety). That component in turn would contain a list or sequence of further components, corresponding to the first-level content of the XML document in their "document order"; those


components in turn would (in general) contain further components, and so on. Omitted elements can be represented by empty sequences. Note in particular that tuples in the relational model carry their attribute types with them, just as XML elements carry their tags with them, implying that (contrary to popular opinion!) tuples too, like XML documents, are self-describing, in a sense.
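The nesting described in this answer can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample document below is made up (it is not one of the PartsRelation documents from the chapter), the helper name element_to_tuple is hypothetical, and the sketch does not consult a schema, so it cannot by itself supply empty sequences for omitted elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, loosely in the style of the chapter's examples.
DOC = """<PartsRelation>
  <PartTuple><PNUM>P1</PNUM><PNAME>Nut</PNAME><NOTE/></PartTuple>
  <PartTuple><PNUM>P2</PNUM><PNAME>Bolt</PNAME></PartTuple>
</PartsRelation>"""


def element_to_tuple(element):
    """Return (tag, text, children), with children kept in document order."""
    children = tuple(element_to_tuple(child) for child in element)
    text = (element.text or "").strip()
    return (element.tag, text, children)


# A tuple with just one component, corresponding to the whole document.
as_tuple = (element_to_tuple(ET.fromstring(DOC)),)

# Each component carries its tag with it, just as the XML element does.
first_part = as_tuple[0][2][0]
print(first_part[0], [child[0] for child in first_part[2]])
```

The empty NOTE element comes out as a component with an empty sequence of children, which is the same device the answer suggests for omitted elements.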

27.27 The claim that XML data is "schemaless" is absurd, of course; data that was "schemaless" would have no known structure, and it would be impossible to query it (except by playing games with SUBSTRING operations, if we stretch a point and think of such game-playing as "querying") or to design a query language for it.* Rather, the point is that the schemas for XML data and (say) SQL data are expressed in different styles, styles that might seem distinct at a superficial level but aren't really so very different at a deep level.

──────────

* In fact, it would be a BLOB, i.e., an arbitrarily long bit string, with no internal structure that the DBMS is aware of.

──────────

27.28 In one sense we might say that an analogous remark does apply to relational data. Given that XML fundamentally supports just one data type, viz., character strings, it's at least arguable that the options available for structuring such data (i.e., character-string data specifically) in a relational database are exactly the same as those available in XML. As a trivial example, an address might be represented by a single character string; or by separate strings for street, city, state, and zip; or in a variety of other ways.

In a much larger sense, however, an analogous remark does not apply. First, relational systems provide a variety of additional (and genuine) data types over and above character strings, as well as the ability for users to define their own types; they therefore don't force users to represent everything in character-string form, and indeed they provide very strong incentives not to. Second, there's a large body of design theory available for relational databases that militates against certain bad designs. Third, relational systems provide a wide array of operators, the effect of which is (in part) that there's no logical incentive for biasing designs in such a way as to favor some applications at the expense of others (contrast the situation in XML).


27.29 This writer is aware of no differences of substance──except that the hierarchic model is usually regarded as including certain operators and constraints, while it's not at all clear that the same is true of "the semistructured model."

27.30 No answer provided.


The following text speaks for itself:

(Begin quote)

There are four appendixes. Appendix A is an introduction to a new implementation technology called The TransRelational™ Model. Appendix B gives further details, for reference purposes, of the syntax and semantics of SQL expressions. Appendix C contains a list of the more important abbreviations, acronyms, and symbols introduced in the body of the text. Finally, Appendix D (online) provides a tutorial survey of common storage structures and access methods.

(End quote)

Appendixes ***


The TransRelational™ Model

Principal Sections

• Three levels of abstraction

• The basic idea

• Condensed columns

• Merged columns

• Implementing the relational operators

General Remarks

This is admittedly only an appendix, but if I was the instructor I would certainly cover it in class. "It's the best possible time to be alive, when almost everything you thought you knew is wrong" (from Arcadia, by Tom Stoppard). The appendix is about a radically new implementation technology, which (among other things) does mean that an awful lot of what we've taken for granted for years regarding DBMS implementation is now "wrong," or at least obsolete. For example:

• The data occupies a fraction of the space required for a conventional database today.

• The data is effectively stored in many different sort orders at the same time.

• Indexes and other conventional access paths are completely unnecessary.

• Optimization is much simpler than it is with conventional systems; often, there's just one obviously best way to implement any given relational operation. In particular, the need for cost-based optimizing is almost entirely eliminated.

• Join performance is linear! That means, in effect, that the time it takes to join twenty relations is only twice the time it takes to join ten (loosely speaking). It also means that joining twenty relations, if necessary, is feasible in the first place; in other words, the system is scalable.


• There's no need to compile database requests ahead of time for performance.

• Performance in general is orders of magnitude better than it is with a conventional system.

• Logical design can be done properly (in particular, there is never any need to "denormalize for performance").

• Physical database design can be completely automated.

• Database reorganization as conventionally understood is completely unnecessary.

• The system is much easier to administer, because far fewer human decisions are needed.

• There's no such thing as a "stored relvar" or "stored tuple" at the physical level at all!

In a nutshell, the TransRelational model allows us to build DBMSs that (at last!) truly deliver on the full promise of the relational model. Perhaps you can see why it's my honest opinion that "The TransRelational™ Model" is the biggest advance in the DB field since Ted Codd gave us the relational model, back in 1969.

Note: We're supposed to put that trademark symbol on the term TransRelational, at least the first time we use it, also in titles and the like. Also, you should be aware that various aspects of the TR model (e.g., the idea of storing the data "attribute-wise" rather than "tuple-wise") do somewhat resemble various ideas that have been described elsewhere in the literature; however, nobody else (so far as I know) has described a scheme that's anything like as comprehensive as the TR model; what's more, there are many aspects of the TR model that (again so far as I know) aren't like anything else, anywhere.

The logarithms analogy from reference [A.1] is helpful: "As we all know, logarithms allow what would otherwise be complicated, tedious, and time-consuming numeric problems to be solved by transforming them into vastly simpler but (in a sense) equivalent problems and solving those simpler problems instead. Well, it's my claim that TR technology does the same kind of thing for data management problems." Give some examples.
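If a concrete instance of the logarithms half of the analogy is wanted in class, a tiny sketch like the following will do. It isn't taken from reference [A.1]; the function name and the numbers are just assumptions for illustration. The multiplication problem is transformed into an addition problem, the addition is done, and the answer is transformed back.

```python
import math


def multiply_via_logs(x, y):
    """Multiply two positive numbers by adding their logarithms."""
    # Transform: replace each operand by its logarithm.
    # Solve the simpler problem: add instead of multiply.
    # Transform back: exponentiate the sum.
    return math.exp(math.log(x) + math.log(y))


product = multiply_via_logs(123.0, 456.0)
print(product, 123.0 * 456.0)  # the two agree up to floating-point rounding
```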

Explain and justify the name: The TransRelational™ Model (which we abbreviate to "TR" in the book and in these notes). Credit to Steve Tarin, who invented it. Discuss data independence and the conventional "direct image" style of implementation and the problems it causes.

Note the simplifying assumptions: The database is (a) read-only and (b) in main memory. Stress the fact that these assumptions are made purely for pedagogic reasons; TR can and does do well on updates and on disk.

A.2 Three Levels of Abstraction

Straightforward, but stress the fact that the files are abstractions (as indeed the TR tables are too). Be very careful to use the terminology appropriate to each level from this point forward. Show but do not yet explain in detail the Field Values Table and the (or, rather, a) Record Reconstruction Table for the file of Fig. A.3. Note: Each of those tables is derived from the file independently of the other. Point out that we're definitely not dealing with a direct-image style of implementation!

A.3 The Basic Idea

Explain "the crucial insight": Field Values in the Field Values Table, linkage information in the Record Reconstruction Table. By the way, I deliberately don't abbreviate these terms to FVT and RRT. Students have so much that's novel to learn here that I think such abbreviations get in the way (the names, by contrast, serve to remind students of the functionality). Note: Almost all of the terms in this appendix are taken from reference [A.1] and do not appear in reference [A.2], which, to be frank, is quite difficult to understand, in part precisely because its terminology isn't very good (or even consistent).

Regarding the Field Values Table: Built at load time (so that's when the sorting is done). Explain intuitively obvious advantages for ORDER BY, value lookup, etc. The Field Values Table is the only TR table that contains user data as such. Isomorphic to the file.
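For students who like to see things run, the Field Values Table idea can be sketched as follows. This is a minimal sketch under assumptions, not the implementation described in reference [A.1]: the sample file, the function names, and the use of 0-based row numbers are all my own choices for illustration. Each column is sorted independently at load time, so value lookup at run time is just a binary search.

```python
from bisect import bisect_left, bisect_right

# Hypothetical file of supplier records: (SNUM, SNAME, STATUS, CITY).
FILE = [
    ("S4", "Clark", 20, "London"),
    ("S5", "Adams", 30, "Athens"),
    ("S2", "Jones", 10, "Paris"),
    ("S1", "Smith", 20, "London"),
    ("S3", "Blake", 30, "Paris"),
]


def build_field_values_table(records):
    """Load-time step: sort each field's values independently of the others."""
    return [sorted(column) for column in zip(*records)]


def find_rows(sorted_column, value):
    """Run-time step: binary search for the rows (0-based) holding the value."""
    return range(bisect_left(sorted_column, value),
                 bisect_right(sorted_column, value))


fvt = build_field_values_table(FILE)
city_column = fvt[3]
print(city_column)                              # already in ORDER BY CITY order
print(list(find_rows(city_column, "London")))   # rows holding 'London'
```

Note that the table has the same number of rows and columns as the file, which is all "isomorphic" means here; a row of the table does not correspond to any one record.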

Regarding the Record Reconstruction Table: Also isomorphic, but contains pointers (row numbers). Those row numbers identify rows in the Field Values Table or the Record Reconstruction Table or both, depending on the context. Explain the zigzag algorithm. Can enter the rings (zigzags) anywhere! Explain simple equality restriction queries (binary search). TR lets us do a sort/merge join without having to do the sort! Or, at least, without having to do the run-time sort (explain). Implications for the optimizer: Little or no access path selection. Don't need indexes. Physical database design is simplified (in fact, it should become clear later that it can be automated, given the logical design). No need for performance tuning. A boon for the tired DBA.

Explain how the Record Reconstruction Table is built (or you could set this subsection as a reading assignment). Not unique; we can turn this fact to our advantage, but the details are beyond the scope of this appendix; suffice it to say that some Record Reconstruction Tables are "preferred." See reference [A.1] for further discussion.
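The construction and the zigzag algorithm can be sketched together in Python. Treat this as one possible reading of the idea, under stated assumptions, rather than as the scheme from reference [A.1]: the sample file and all function names are hypothetical, row numbers are 0-based, and the tables are held column by column. Each column's sort order is recorded as a permutation; a cell of the Record Reconstruction Table then says which row of the next column (wrapping around from the last column to the first) continues the same record.

```python
# Hypothetical file of supplier records: (SNUM, SNAME, STATUS, CITY).
FILE = [
    ("S4", "Clark", 20, "London"),
    ("S5", "Adams", 30, "Athens"),
    ("S2", "Jones", 10, "Paris"),
    ("S1", "Smith", 20, "London"),
    ("S3", "Blake", 30, "Paris"),
]


def build_tables(records):
    """Return (field values table, record reconstruction table), column-wise."""
    n, width = len(records), len(records[0])
    # perms[j][i] = number of the record whose field j sits in row i of column j.
    perms = [sorted(range(n), key=lambda r, j=j: records[r][j])
             for j in range(width)]
    fvt = [[records[r][j] for r in perms[j]] for j in range(width)]
    # inverse[j][r] = row of column j that holds field j of record r.
    inverse = []
    for j in range(width):
        inv = [0] * n
        for row, rec in enumerate(perms[j]):
            inv[rec] = row
        inverse.append(inv)
    # rrt[j][i] = row in the next column where the same record continues.
    rrt = [[inverse[(j + 1) % width][perms[j][i]] for i in range(n)]
           for j in range(width)]
    return fvt, rrt


def reconstruct(fvt, rrt, row, col=0):
    """Zigzag: follow the ring of pointers starting from any cell."""
    width = len(fvt)
    values = [None] * width
    for _ in range(width):
        values[col] = fvt[col][row]
        row = rrt[col][row]
        col = (col + 1) % width
    return tuple(values)


fvt, rrt = build_tables(FILE)
print(reconstruct(fvt, rrt, 0))          # enter the ring at the SNUM column
print(reconstruct(fvt, rrt, 1, col=3))   # or enter at the CITY column instead
```

Because STATUS and CITY contain duplicate values, a different tie-breaking choice in the sort would give a different but equally valid Record Reconstruction Table, which is one way to see why the table is not unique.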

A.4 Condensed Columns

An obvious improvement to the Field Values Table but one with far-reaching consequences. Note the implications for update in particular (we're pretending the database is read-only, but this point is worth highlighting in passing). The compression advantages are staggering! But note that we're compressing at the level of field values, not of bit string encodings. Don't have to pay the usual price of extra machine cycles to do the decompressing!

Explain row ranges.* Emphasize the point that these are conceptual: Various more efficient internal representations are possible. Histograms. The TR representation is all about permutations and histograms. Immediately obvious implications for certain kinds of queries, e.g., "How many parts are there of each color?" Explain the revised record reconstruction process.

──────────

* Row ranges look very much like intervals as in Chapter 23. But we'll see in the next section that we sometimes need to deal with empty row ranges, whereas intervals in Chapter 23 were always nonempty.

──────────
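Condensing a column and reading the histogram off the row ranges can be sketched as below. Again this is only a conceptual illustration: the color values and function names are assumptions, row numbers are 0-based, and (as noted above) a real implementation would use some more efficient internal representation of the row ranges.

```python
from itertools import groupby


def condense(sorted_column):
    """Collapse each run of equal values into (value, first_row, last_row)."""
    condensed = []
    row = 0
    for value, run in groupby(sorted_column):
        count = len(list(run))
        condensed.append((value, row, row + count - 1))
        row += count
    return condensed


def histogram(condensed):
    """Answer 'how many of each value?' from the row ranges alone."""
    return {value: last - first + 1 for value, first, last in condensed}


# Hypothetical COLOR column, sorted as it would be in the Field Values Table.
colors = sorted(["Red", "Green", "Blue", "Red", "Blue", "Red"])
condensed = condense(colors)
print(condensed)
print(histogram(condensed))
```

The "How many parts are there of each color?" query never touches the records themselves; the counts fall straight out of the row ranges.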

A.5 Merged Columns

An extension of the condensed-columns idea (in a way). Go through the bill-of-materials example. Explain the implications for join! In effect, we can do a sort/merge join without doing the sort and without doing the merge, either! (The sort and merge are done at load time. Do the heavy lifting ahead of time! As with logarithms, in fact.)
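A much simplified sketch of the merged-column idea follows. It is an assumption-laden illustration, not the bill-of-materials example from the appendix: the part numbers and function names are hypothetical, row ranges are half-open and 0-based, and an absent value is given the empty range (0, 0). At load time the two sorted columns are merged into one list of distinct values, each with a row range per source column; at run time the join simply pairs up the rows wherever both ranges are nonempty.

```python
def row_ranges(sorted_column):
    """Map each distinct value to its half-open (first, last + 1) row range."""
    ranges = {}
    for row, value in enumerate(sorted_column):
        first, _ = ranges.get(value, (row, row))
        ranges[value] = (first, row + 1)
    return ranges


def merge_columns(col_a, col_b):
    """Load-time step: one sorted list of values with a row range per column."""
    ra, rb = row_ranges(col_a), row_ranges(col_b)
    return [(value, ra.get(value, (0, 0)), rb.get(value, (0, 0)))
            for value in sorted(set(ra) | set(rb))]


def join_rows(merged):
    """Run-time step: no sorting or merging, just pair up the row ranges."""
    for value, (a_lo, a_hi), (b_lo, b_hi) in merged:
        for a in range(a_lo, a_hi):
            for b in range(b_lo, b_hi):
                yield value, a, b


# Hypothetical sorted columns: major part numbers and minor part numbers.
major = sorted(["P1", "P1", "P2", "P3"])
minor = sorted(["P2", "P3", "P3", "P4"])
merged = merge_columns(major, minor)
print(merged)
print(list(join_rows(merged)))
```

In a fuller treatment the row numbers produced by the join would then be followed through the Record Reconstruction Table to recover the rest of each record; that step is omitted here.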
