Chapter 2: XML Structures for Existing Databases

    <!ATTLIST Invoice
        InvoiceID ID #REQUIRED
        InvoiceNumber CDATA #REQUIRED
        TrackingNumber CDATA #REQUIRED
        OrderDate CDATA #REQUIRED
        ShipDate CDATA #REQUIRED
        ShipMethod (USPS | FedEx | UPS) #REQUIRED>
    <!ELEMENT MonthlyTotal (MonthlyCustomerTotal*, MonthlyPartTotal*)>
    <!ATTLIST MonthlyTotal
        MonthlyTotalID ID #REQUIRED
        Month CDATA #REQUIRED
        Year CDATA #REQUIRED
        VolumeShipped CDATA #REQUIRED
        PriceShipped CDATA #REQUIRED>

Rule 8: Adding Relationships through Containment. For each relationship we have defined, if the relationship is one-to-one or one-to-many in the direction it is being navigated, and no other relationship leads to the child within the selected subset, then add the child element as element content of the parent element with the appropriate cardinality.

Many-to-One or Multiple Parent Relationships

If the relationship is many-to-one, or the child has more than one parent, then we need to use pointing to describe the relationship. This is done by adding an IDREF or IDREFS attribute to the element on the parent side of the relationship. The IDREF should point to the ID of the child element. If the relationship is one-to-many, and the child has more than one parent, we should use an IDREFS attribute instead.

Note that if we have defined a relationship to be navigable in either direction, for the purposes of this analysis it counts as two different relationships.

Note that these rules favor containment over pointing whenever it is possible. Because of the inherent performance penalties when using the DOM and SAX with pointing relationships, containment is almost always the preferred solution. If a situation requires pointing, however, and its presence in our structures slows our processing too much, we may want to change the relationship to a containment relationship and repeat the pointed-to information wherever it would have appeared before.
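To make the cost difference between containment and pointing concrete, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. The attribute names (CustomerIDREF, CustomerID, PartIDREF, PartID) are borrowed from the example developed in this chapter; the helper build_id_index is hypothetical and is only one way to resolve pointers, not the method the text prescribes.

```python
import xml.etree.ElementTree as ET

# A cut-down document in the shape used by this chapter's example.
DOC = """<SalesData Status="NewVersion">
  <Invoice InvoiceNumber="1" CustomerIDREF="Customer2">
    <LineItem Quantity="2" Price="5" PartIDREF="Part2" />
  </Invoice>
  <Customer CustomerID="Customer2" Name="BobSmith" />
  <Part PartID="Part2" Name="Winkle" />
</SalesData>"""

root = ET.fromstring(DOC)
invoice = root.find("Invoice")

# Containment: the children are reached directly from the parent.
line_items = invoice.findall("LineItem")


def build_id_index(tree_root, id_attr_names):
    """Map each ID value to its element (hypothetical helper).

    ElementTree does not read the DTD, so it cannot know which
    attributes are of type ID; we name them explicitly (an assumption).
    """
    index = {}
    for elem in tree_root.iter():
        for attr in id_attr_names:
            if attr in elem.attrib:
                index[elem.attrib[attr]] = elem
    return index


# Pointing: a full pass over the document is needed before any
# IDREF can be followed.
ids = build_id_index(root, ("CustomerID", "PartID"))
customer = ids[invoice.get("CustomerIDREF")]
part = ids[line_items[0].get("PartIDREF")]
```

The contained LineItem elements are available as soon as we hold the Invoice, whereas following either pointer first requires scanning the whole document (or keeping an index in memory), which is the overhead the paragraph above warns about.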
Applying this rule to our example and adding IDREF/IDREFS attributes, we arrive at the following:

    <!ELEMENT SalesData (Invoice*, MonthlyTotal*)>
    <!ATTLIST SalesData
        Status (NewVersion | UpdatedVersion | CourtesyCopy) #REQUIRED>
    <!ELEMENT Invoice (LineItem*)>
    <!ATTLIST Invoice
        InvoiceID ID #REQUIRED
        InvoiceNumber CDATA #REQUIRED
        TrackingNumber CDATA #REQUIRED
        OrderDate CDATA #REQUIRED
        ShipDate CDATA #REQUIRED
        ShipMethod (USPS | FedEx | UPS) #REQUIRED
        CustomerIDREF IDREF #REQUIRED>
    <!ELEMENT Customer EMPTY>
    <!ELEMENT MonthlyCustomerTotal EMPTY>
    <!ATTLIST MonthlyCustomerTotal
        MonthlyCustomerTotalID ID #REQUIRED
        VolumeShipped CDATA #REQUIRED
        PriceShipped CDATA #REQUIRED
        CustomerIDREF IDREF #REQUIRED>
    <!ELEMENT MonthlyPartTotal EMPTY>
    <!ATTLIST MonthlyPartTotal
        MonthlyPartTotalID ID #REQUIRED
        VolumeShipped CDATA #REQUIRED
        PriceShipped CDATA #REQUIRED
        PartIDREF IDREF #REQUIRED>
    <!ELEMENT LineItem EMPTY>
    <!ATTLIST LineItem
        LineItemID ID #REQUIRED
        Quantity CDATA #REQUIRED
        Price CDATA #REQUIRED
        PartIDREF IDREF #REQUIRED>

Rule 9: Adding Relationships using IDREF/IDREFS. Identify each relationship that is many-to-one in the direction we have defined it, or whose child is the child in more than one relationship we have defined. For each of these relationships, add an IDREF or IDREFS attribute to the element on the parent side of the relationship, which points to the ID of the element on the child side of the relationship.

We're getting close to our final result, but there are still a couple of things we need to do to finalize the structure. We'll see how this is done in the next couple of sections.

Add Missing Elements to the Root Element

We may have noticed a significant flaw in the structure we arrived at in the last section: when building documents using this DTD, there's no place to add a <Customer> element.
It's not the root element of the document, and it doesn't appear in any of the element content models of any of the other elements in the structure. This is because it is only pointed to, not contained. Elements that turn out to only be referenced by IDREF(S) need to be added as allowable element content to the root element of the DTD. Then, when creating the document, the orphaned elements are created within the root element and then pointed to, where appropriate.

Applying this rule to our example, we see that we are missing the <Customer> and <Part> elements. Adding these as allowable structural content to our root element gives us:

    <!ELEMENT SalesData (Invoice*, Customer*, Part*, MonthlyTotal*)>
    <!ATTLIST SalesData
        Status (NewVersion | UpdatedVersion | CourtesyCopy) #REQUIRED>
    <!ELEMENT Invoice (LineItem*)>

Rule 10: Add Missing Elements. For any element that is only pointed to in the structure created so far, add that element as allowable element content of the root element. Set the cardinality suffix of the element being added to *.

Discard Unreferenced ID Attributes

Finally, we need to discard those ID attributes that we created in Rule 5 that do not have IDREF(S) pointing to them. Since we created these attributes in the process of building the XML structures, discarding them if they are not used does not sacrifice information, and saves developers the trouble of generating unique values for the attributes.

Rule 11: Remove Unwanted ID Attributes. Remove ID attributes that are not referenced by IDREF or IDREFS attributes elsewhere in the XML structures.

Applying Rule 11 to our example gives us our final structure. On review, the InvoiceID, LineItemID, MonthlyPartTotalID, MonthlyTotalID, and MonthlyCustomerTotalID attributes are not referenced by any IDREF or IDREFS attributes.
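The review that Rule 11 calls for can be sketched mechanically. The Python below is a rough illustration, not part of the original method: it scans ATTLIST text with a regular expression and assumes this chapter's naming convention, in which an attribute called FooIDREF (or FooIDREFS) points at the ID attribute called FooID. A DTD does not record which ID an IDREF targets, so that pairing is an assumption, and the function name unreferenced_ids is hypothetical. The DTD text is assembled from declarations shown in this chapter, trimmed to the ID and IDREF attributes.

```python
import re

# ID and IDREF attributes from this chapter's structure (other attributes omitted).
DTD = """
<!ATTLIST Invoice InvoiceID ID #REQUIRED CustomerIDREF IDREF #REQUIRED>
<!ATTLIST Customer CustomerID ID #REQUIRED>
<!ATTLIST Part PartID ID #REQUIRED>
<!ATTLIST MonthlyTotal MonthlyTotalID ID #REQUIRED>
<!ATTLIST MonthlyCustomerTotal MonthlyCustomerTotalID ID #REQUIRED
          CustomerIDREF IDREF #REQUIRED>
<!ATTLIST MonthlyPartTotal MonthlyPartTotalID ID #REQUIRED
          PartIDREF IDREF #REQUIRED>
<!ATTLIST LineItem LineItemID ID #REQUIRED PartIDREF IDREF #REQUIRED>
"""


def unreferenced_ids(dtd_text):
    """Return ID attribute names that no IDREF(S) attribute points at."""
    # Each hit is (attribute name, declared type).
    decls = re.findall(
        r"(\w+)\s+(IDREFS|IDREF|ID)\s+#(?:REQUIRED|IMPLIED)", dtd_text
    )
    id_attrs = [name for name, kind in decls if kind == "ID"]
    # Assumed convention: strip the trailing REF/REFS to get the target ID name.
    targets = {
        re.sub(r"REFS?$", "", name) for name, kind in decls if kind != "ID"
    }
    return [name for name in id_attrs if name not in targets]


removable = unreferenced_ids(DTD)
```

Under that naming assumption, removable holds the same five attributes listed in the review above, while CustomerID and PartID are kept because CustomerIDREF and PartIDREF point at them.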
Removing these unreferenced ID attributes, we arrive at our final structure, ch03_ex01.dtd:

    <!ELEMENT SalesData (Invoice*, Customer*, Part*, MonthlyTotal*)>
    <!ATTLIST SalesData
        Status (NewVersion | UpdatedVersion | CourtesyCopy) #REQUIRED>
    <!ELEMENT Invoice (LineItem*)>
    <!ATTLIST Invoice
        InvoiceNumber CDATA #REQUIRED
        TrackingNumber CDATA #REQUIRED
        OrderDate CDATA #REQUIRED
        ShipDate CDATA #REQUIRED
        ShipMethod (USPS | FedEx | UPS) #REQUIRED
        CustomerIDREF IDREF #REQUIRED>
    <!ELEMENT Customer EMPTY>
    <!ATTLIST Customer
        CustomerID ID #REQUIRED
        Name CDATA #REQUIRED
        Address CDATA #REQUIRED
        City CDATA #REQUIRED
        State CDATA #REQUIRED
        PostalCode CDATA #REQUIRED>
    <!ELEMENT Part EMPTY>
    <!ATTLIST Part
        PartID ID #REQUIRED
        PartNumber CDATA #REQUIRED
        Name CDATA #REQUIRED
        Color CDATA #REQUIRED
        Size CDATA #REQUIRED>
    <!ELEMENT MonthlyTotal (MonthlyCustomerTotal*, MonthlyPartTotal*)>
    <!ATTLIST MonthlyTotal
        Month CDATA #REQUIRED
        Year CDATA #REQUIRED
        VolumeShipped CDATA #REQUIRED
        PriceShipped CDATA #REQUIRED>
    <!ELEMENT MonthlyCustomerTotal EMPTY>
    <!ATTLIST MonthlyCustomerTotal
        VolumeShipped CDATA #REQUIRED
        PriceShipped CDATA #REQUIRED
        CustomerIDREF IDREF #REQUIRED>
    <!ELEMENT MonthlyPartTotal EMPTY>
    <!ATTLIST MonthlyPartTotal
        VolumeShipped CDATA #REQUIRED
        PriceShipped CDATA #REQUIRED
        PartIDREF IDREF #REQUIRED>
    <!ELEMENT LineItem EMPTY>
    <!ATTLIST LineItem
        Quantity CDATA #REQUIRED
        Price CDATA #REQUIRED
        PartIDREF IDREF #REQUIRED>

An Example XML Document

Finally, here's an example of an XML document (ch03_ex01.xml) that would be valid for this DTD:

    <?xml version="1.0"?>
    <!DOCTYPE SalesData SYSTEM "http://myserver/xmldb/ch03_ex01.dtd" >
    <SalesData Status="NewVersion">
      <Invoice InvoiceNumber="1"
               TrackingNumber="1"
               OrderDate="01012000"
               ShipDate="07012000"
               ShipMethod="FedEx"
               CustomerIDREF="Customer2">
        <LineItem Quantity="2"
                  Price="5"
                  PartIDREF="Part2" />
      </Invoice>
      <Customer CustomerID="Customer2"
                Name="BobSmith"
                Address="2AnyStreet"
                City="Anytown"
                State="AS"
                PostalCode="ANYCODE" />
      <Part PartID="Part2"
            PartNumber="13"
            Name="Winkle"
            Color="Red"
            Size="10" />
      <MonthlyTotal Month="January"
                    Year="2000"
                    VolumeShipped="2"
                    PriceShipped="10">
        <MonthlyCustomerTotal VolumeShipped="5"
                              PriceShipped="25"
                              CustomerIDREF="Customer2" />
        <MonthlyPartTotal VolumeShipped="8"
                          PriceShipped="40"
                          PartIDREF="Part2" />
      </MonthlyTotal>
    </SalesData>

Summary

In this chapter, we've seen some guidelines for the creation of XML structures to hold data from existing relational databases. We've seen that this isn't an exact science, and that many of the decisions we make while creating XML structures depend entirely on the kinds of information we wish to represent in our documents.

If there's one point in particular we should take away from this chapter, it's that we need to represent relationships in our XML documents with containment as much as possible. XML is designed around the concept of containment: the DOM and XSLT treat XML documents as trees, while SAX and SAX-based parsers treat them as a sequence of branch begin and end events and leaf events. The more pointing relationships we use, the more complicated the navigation of our document will be, and the greater the performance hit our processor will take, especially if we are using SAX or a SAX-based parser.

We must bear in mind as we create these structures that there are usually many XML structures that may be used to represent the same relational database data. The techniques described in this chapter should allow us to optimize our documents for rapid processing and minimum document size. Using the techniques discussed in this chapter, and the next, we should be able to easily move information between our relational database and XML documents.

Here are the eleven rules we have defined for the development of XML structures from relational database structures:

❑ Rule 1: Choose the Data to Include.
Based on the business requirement the XML document will be fulfilling, we decide which tables and columns from our relational database will need to be included in our documents.

❑ Rule 2: Create a Root Element. Create a root element for the document. We add the root element to our DTD, and declare any attributes of that element that are required to hold additional semantic information (such as routing information). Root element names should describe their content.

❑ Rule 3: Model the Content Tables. Create an element in the DTD for each content table we have chosen to model. Declare these elements as EMPTY for now.

❑ Rule 4: Modeling Non-Foreign Key Columns. Create an attribute for each column we have chosen to include in our XML document (except foreign key columns). These attributes should appear in the !ATTLIST declaration of the element corresponding to the table in which they appear. Declare each of these attributes as CDATA, and declare it as #IMPLIED or #REQUIRED depending on whether the original column allows NULLs or not.

❑ Rule 5: Add ID Attributes to the Elements. Add an ID attribute to each of the elements we have created in our XML structure (with the exception of the root element). Use the element name followed by ID for the name of the new attribute, watching as always for name collisions. Declare the attribute as type ID, and #REQUIRED.

❑ Rule 6: Representing Lookup Tables. For each foreign key that we have chosen to include in our XML structures that references a lookup table:
    1. Create an attribute on the element representing the table in which the foreign key is found.
    2. Give the attribute the same name as the table referenced by the foreign key, and make it #REQUIRED if the foreign key does not allow NULLs or #IMPLIED otherwise.
    3. Make the attribute of the enumerated list type. The allowable values should be some human-readable form of the description column for all rows in the lookup table.
❑ Rule 7: Adding Element Content to Root Elements. Add a child element or elements to the allowable content of the root element for each table that models the type of information we want to represent in our document.

❑ Rule 8: Adding Relationships through Containment. For each relationship we have defined, if the relationship is one-to-one or one-to-many in the direction it is being navigated, and no other relationship leads to the child within the selected subset, then add the child element as element content of the parent element with the appropriate cardinality.

❑ Rule 9: Adding Relationships using IDREF/IDREFS. Identify each relationship that is many-to-one in the direction we have defined it, or whose child is the child in more than one relationship we have defined. For each of these relationships, add an IDREF or IDREFS attribute to the element on the parent side of the relationship, which points to the ID of the element on the child side of the relationship.

❑ Rule 10: Add Missing Elements. For any element that is only pointed to in the structure created so far, add that element as allowable element content of the root element. Set the cardinality suffix of the element being added to *.

❑ Rule 11: Remove Unwanted ID Attributes. Remove ID attributes that are not referenced by IDREF or IDREFS attributes elsewhere in the XML structures.

Chapter 3: Database Structures for Existing XML

So far, we have seen some general points on designing XML structures, and how best to design XML documents to represent existing database structures. In this chapter, we'll take a look at how database structures can be designed to store the information contained in an already existing XML structure.

There are a number of reasons why we might need to move data from an XML repository to a relational database. For example, we might have a large amount of data stored in XML that needs to be queried against.
XML (at least with the tools currently available) is not very good at performing queries, especially queries that require more than one document to be examined. In this case, we might want to extract the data content (or some portion of it) from the XML repository and move it to a relational database. Remember that XML's strengths are cross-platform transparency and presentation, while relational databases are vastly better at searching and summarization.

Another good reason to move data into relational structures is to take advantage of the relational database's built-in locking and transactional features. Finally, our documents might contain huge amounts of data, more than we need to access when performing queries or summarizing data, and moving the data to a relational database will allow us to obtain just the data that is of interest to us.

In this chapter, we will see how the various types of element and attribute content that can occur in XML are modeled in a relational database. In the process of doing this, we will go on to develop a set of rules that can be used to generically transform XML DTDs into SQL table creation scripts.

How to Handle the Various DTD Declarations

As we are looking at creating database structures from existing XML structures, we will approach this chapter by looking at the four types of declarations that may appear in DTDs:

❑ element declarations.
❑ attribute list declarations.
❑ entity declarations.
❑ notation declarations.

We can then see how each of these types of declaration can best be modeled in relational database structures. To help us demonstrate this, we will create examples that persist XML documents to a SQL database and show the SQL create scripts. So, let's start with element declarations.

Element Declarations

As we have seen, in DTDs there are five types of element declaration:

❑ element-only.
❑ text-only.
❑ EMPTY.
❑ MIXED.
❑ ANY.
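The five kinds of element declaration listed above can be told apart from the text of the declaration itself. The following Python sketch is illustrative only: classify_content_model is a hypothetical helper, it handles one well-formed <!ELEMENT ...> declaration at a time, and the Note and Emphasis element names in the usage lines are invented for the mixed-content case.

```python
import re


def classify_content_model(declaration):
    """Return (element name, kind) for a single <!ELEMENT ...> declaration.

    kind is one of: "EMPTY", "ANY", "text-only", "MIXED", "element-only".
    """
    match = re.match(r"\s*<!ELEMENT\s+(\S+)\s+(.*?)\s*>\s*$", declaration, re.S)
    if match is None:
        raise ValueError("not an element declaration: %r" % declaration)
    name, model = match.groups()
    compact = re.sub(r"\s+", "", model)  # ignore whitespace inside the model

    if compact == "EMPTY":
        kind = "EMPTY"
    elif compact == "ANY":
        kind = "ANY"
    elif compact in ("(#PCDATA)", "(#PCDATA)*"):
        kind = "text-only"
    elif compact.startswith("(#PCDATA|"):
        kind = "MIXED"  # text interleaved with the named child elements
    else:
        kind = "element-only"
    return name, kind


invoice_kind = classify_content_model("<!ELEMENT Invoice (Customer, LineItem*)>")
customer_kind = classify_content_model("<!ELEMENT Customer (#PCDATA)>")
note_kind = classify_content_model("<!ELEMENT Note (#PCDATA | Emphasis)*>")
```

A classifier along these lines could be a first step when generating table creation scripts from a DTD, since each kind of content model is mapped to relational structures differently.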
So, let's look at each of these in turn, and see how each element content model would be modeled in a relational database.

The Element-only (Structured Content) Model

In this content model, the element may only contain other elements. Let's start with a simple example.

Simple Element Content

In the following DTD (ch03_ex01.dtd) we have a simple content model for an Invoice element:

    <!ELEMENT Invoice (Customer, LineItem*)>
    <!ELEMENT Customer (#PCDATA)>
    <!ELEMENT LineItem (#PCDATA)>

The Invoice element contains exactly one Customer element, followed by zero or more LineItem elements. So, let's see some sample XML that this DTD describes (ch03_ex01.xml):

    <?xml version="1.0"?>
    <!DOCTYPE Invoice SYSTEM "ch03_ex01.dtd" >
    <Invoice>
      <Customer> </Customer>
      <LineItem> </LineItem>
      <LineItem> </LineItem>
    </Invoice>

[...]