    <element attr1="value1" attr2="value2"></element>

However, as we noted above, there is also an alternative syntax, in which we place the closing slash at the end of the opening tag:

    <element attr1="value1" attr2="value2" />

The following line defines an empty element image, with an attribute src whose value is logo.gif:

    <image src="logo.gif" />

Processing Instructions

XML processing instructions contain information for the application using the XML document. Processing instructions do not form part of the character data of the document; the XML parser should pass them unchanged to the application. The syntax of a processing instruction might look strangely familiar:

    <?TargetApp instructions?>

In the following example, php is the target application and print "This XML document was created on Jan-07, 1999"; is the instruction:

    <?php print "This XML document was created on Jan-07, 1999"; ?>

Entity References

Entities are a way of avoiding typing the same long piece of text many times in a document. Entities are declared in the document's DTD (we will see later how to declare entities, when we look at DTDs in more detail). The declared entities can then be referenced throughout the document. When an XML parser parses the document, it replaces each entity reference with the text defined in the entity declaration.

There are two types of entities: internal and external. The replacement text for an internal entity is specified in the entity declaration itself, whereas the replacement text for an external entity resides in a separate file, the location of which is specified in the entity declaration.

After the entity has been declared, it can be referenced within the document using the following syntax:

    &nameofentity;

Note that there must be no space between the ampersand (&), the entity name and the semicolon.
For example, let's assume that an entity myname with the value "Harish Rawat" has been declared in the DTD of the document. The entity myname can be referred to in the document as:

    <author>&myname;</author>

While parsing the document, the parser replaces &myname; with Harish Rawat, so the application using the XML document sees the content of the element author as Harish Rawat.

Comments

Comments can be added to XML documents; the syntax is identical to that for HTML comments:

    <!-- This is a comment -->

The Document Type Definition

The document type definition of an XML document is defined within a declaration known as the document type declaration. The DTD can be contained within this declaration, or the declaration can point to an external document containing the DTD. The DTD consists of element type declarations, attribute list declarations, entity declarations, and notation declarations. We will cover all of these in this section.

Be sure to distinguish between the document type definition, or DTD, and the document type declaration. The syntax for a document type declaration is:

    <!DOCTYPE rootelementname [ ... ]>

The rootelementname is the name of the root element of the document. The declarations for the various elements, attributes, etc., are placed within the square brackets.

An XML document can also have an external DTD, which can be referenced with the following syntax:

    <!DOCTYPE rootelementname SYSTEM "http://www.harawat.com/books.dtd">

Again, rootelementname is the name of the root element of the document. The location of the file containing the DTD is http://www.harawat.com/books.dtd.

Element Type Declarations

The element type declaration indicates whether the element contains other elements, text, or is empty. It also specifies whether the child elements are mandatory or optional, and how many times they can appear.
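The entity mechanism and the internal form of the document type declaration described above can be tried out with a short script. This is only a sketch under stated assumptions: it uses PHP's DOM extension (libxml) rather than the expat-based functions covered later in the chapter, and the book document and its two element declarations are invented for illustration.

```php
<?php
// Illustrative document: an internal DTD subset declaring the entity "myname".
$xml = <<<'XML'
<?xml version="1.0"?>
<!DOCTYPE book [
  <!ELEMENT book (author)>
  <!ELEMENT author (#PCDATA)>
  <!ENTITY myname "Harish Rawat">
]>
<book>
  <!-- This is a comment -->
  <author>&myname;</author>
</book>
XML;

$doc = new DOMDocument();
// LIBXML_NOENT asks libxml to substitute entity references
// with their replacement text while loading.
$doc->loadXML($xml, LIBXML_NOENT);

// The application sees the replacement text, not the reference.
$author = $doc->getElementsByTagName('author')->item(0)->textContent;
echo $author, "\n"; // prints "Harish Rawat"
```

The comment inside the book element is parsed but does not contribute to the text content of author.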
An element type declaration specifying that an element can contain character data looks as follows:

    <!ELEMENT elementname (#PCDATA)>

Here ELEMENT is a keyword, elementname is the name of the element, and #PCDATA is also a keyword. #PCDATA stands for "parsed character data", that is, data that can be handled by the XML parser. For example, the following element declaration specifies that the element title contains character data:

    <!ELEMENT title (#PCDATA)>

The syntax of an element type declaration for an empty element is:

    <!ELEMENT elementname EMPTY>

Here elementname is the name of the element, and EMPTY is a keyword. For example, the following element type declaration specifies that the element image is empty:

    <!ELEMENT image EMPTY>

The syntax of an element type declaration for an element that can contain anything (other elements or parsed character data) is as follows:

    <!ELEMENT elementname ANY>

Here elementname is the name of the element and ANY is a keyword.

An element type declaration for an element that contains only other elements looks like this:

    <!ELEMENT parentelement (childelement1, childelement2, ...)>

Here the element parentelement contains the child elements childelement1, childelement2, etc. For example, the following element type declaration specifies that the element book contains the elements title, authors, isbn and price:

    <!ELEMENT book (title, authors, isbn, price)>

The syntax of an element type declaration specifying that parentelement contains either childelement1 or childelement2, and so on, is:
    <!ELEMENT parentelement (childelement1 | childelement2 | ...)>

For example, the following element type declaration specifies that the element url can contain either httpurl or ftpurl:

    <!ELEMENT url (httpurl | ftpurl)>

The following operators can be used in the element type declaration to specify the number of allowed instances of elements within the parent element:

Operator    Description
*           Zero or more instances of the element are allowed.
+           One or more instances of the element are allowed.
?           The element is optional (zero or one instance).

The following element type declaration specifies that the element authors contains zero or more instances of the element author:

    <!ELEMENT authors (author*)>

The following element type declaration specifies that the element authors contains one or more instances of the element author:

    <!ELEMENT authors (author+)>

The following element type declaration specifies that the element toc contains the element chapters and can optionally contain the element appendixes:

    <!ELEMENT toc (chapters, appendixes?)>

Attribute List Declarations

We saw earlier that an element can have attributes associated with it. The attribute list declaration specifies the attributes which specific elements can take. It also indicates whether the attributes are mandatory or not, the possible values for the attributes, default values, etc. The syntax of the attribute list declaration is:

    <!ATTLIST elementname
              attrname1 datatype1 flag1
              attrname2 datatype2 flag2
              ...>

Here elementname is the name of the element, attrname1 is the name of an attribute, datatype1 specifies the type of information to be passed with the attribute, and flag1 indicates how default values for the attribute are to be handled. The possible values for the datatype field depend on the type of the attribute.
Possible values for the flags field are:

Flag        Description
#REQUIRED   The attribute must be present in all instances of the element. If the attribute is missing from an instance of the element, the document is not valid.
#IMPLIED    The attribute is optional. The DTD supplies no default value, so the application can assume a default of its own if the attribute is not specified in an element.
#FIXED      The attribute can have only one value, given in the declaration, for all instances of the element in the document.

CDATA Attributes

CDATA attributes can have any character data as their value. The following attribute list declaration specifies that instances of the element price must have an attribute currency whose value can be any character data:

    <!ATTLIST price currency CDATA #REQUIRED>

Enumerated Attributes

Enumerated attributes can take one of the values listed in the declaration. The following attribute list declaration specifies that instances of the element author can have an attribute gender, with a value of either "male" or "female":

    <!ATTLIST author gender (male|female) #IMPLIED>

ID and IDREF Attributes

Attributes of type ID must have a unique value within an XML document. These attributes are used to uniquely identify instances of elements in the document. The following attribute list declaration specifies that instances of the element employee must have an attribute employeeid, whose value must be unique in the XML document:

    <!ATTLIST employee employeeid ID #REQUIRED>

The value of an attribute of type IDREF must match the value of an ID attribute on some element in the XML document. Similarly, the value of an attribute of type IDREFS must be a whitespace-delimited list of ID values that appear in the document. Attributes of type IDREF and IDREFS are used to establish links between elements in the document.
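The effect of the #REQUIRED flag described above can be checked with a validating parser. The following is a minimal sketch, assuming PHP's DOM extension (libxml) is available. The helper makeDoc() and the sample book document are hypothetical and exist only for this illustration.

```php
<?php
// Hypothetical helper: builds a small document whose internal DTD
// declares the currency attribute of price as #REQUIRED.
function makeDoc(string $priceAttrs): DOMDocument
{
    $xml = <<<XML
<?xml version="1.0"?>
<!DOCTYPE book [
  <!ELEMENT book (title, price)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT price (#PCDATA)>
  <!ATTLIST price currency CDATA #REQUIRED>
]>
<book><title>Pro PHP</title><price{$priceAttrs}>59.99</price></book>
XML;
    $doc = new DOMDocument();
    $doc->loadXML($xml);
    return $doc;
}

// Collect validation errors internally instead of emitting PHP warnings.
libxml_use_internal_errors(true);

$valid   = makeDoc(' currency="USD"')->validate(); // attribute present
$invalid = makeDoc('')->validate();                // attribute missing

var_dump($valid);   // bool(true)
var_dump($invalid); // bool(false)
```

Both documents are well-formed; only the one carrying the currency attribute is valid against its DTD.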
The following attribute list declaration is used to establish a link between an employee and his or her manager and subordinates:

    <!ATTLIST employee
              employeeid     ID     #REQUIRED
              managerid      IDREF  #IMPLIED
              subordinatesid IDREFS #IMPLIED>

Entity Attributes

Entity attributes provide a mechanism for referring to non-XML (binary) data from an XML document. The value of an entity attribute must match the name of an external entity declaration referring to non-XML data. The following attribute list declaration specifies that the element book can have an entity attribute logo:

    <!ATTLIST book logo ENTITY #IMPLIED>

Notation Declarations

Sometimes elements in XML documents might refer to an external file containing data in a format that an XML parser cannot read. Suppose we have an XML document containing the details of a book. We may want to include a reference to a GIF image of the cover along with the details of the book. The XML parser would not be able to process this data, so we need a mechanism to identify a helper application which will process it.

Notation declarations allow the XML parser to identify helper applications that can be used to process non-XML data. A notation declaration provides a name and an external identifier for a type of non-XML (unparsed) data. The external identifier for the notation allows the XML application to locate a helper application capable of processing data in the given notation. For example, the following notation declaration specifies "file:///usr/bin/netscape" as the helper application for non-XML data of type "gif":

    <!NOTATION gif SYSTEM "file:///usr/bin/netscape">

Entity Declarations

Entity declarations define the entities that are used within the XML document. Whenever the XML parser encounters an entity reference in the XML document, it replaces it with the contents of the entity as defined in the entity declaration.
Internal entity declarations are in the following format:

    <!ENTITY myname "Harish Rawat">

This entity declaration defines an entity myname, with the value "Harish Rawat".

The following is an example of an external entity declaration, referring to a file containing XML data:

    <!ENTITY description1 SYSTEM "description1.xml">

This entity declaration defines an entity named description1, with "description1.xml" as the system identifier. A "system identifier" is the location of the file containing the data.

When declaring external entities, a public identifier for the entity can also be specified. On encountering the external entity reference, the XML parser first tries to resolve it using the public identifier, and falls back to the system identifier only if that fails. In this example, the entity description1 is declared with the public identifier "http://www.harawat.com/description1.xml" and the system identifier "description1.xml":

    <!ENTITY description1 PUBLIC "http://www.harawat.com/description1.xml" "description1.xml">

If the file contains non-XML data, the syntax is:

    <!ENTITY booklogo SYSTEM "booklogo.gif" NDATA gif>

This entity declaration defines an entity booklogo, which refers to an external non-XML file booklogo.gif, of notation gif. The notation gif must itself be declared in the DTD.

XML Support in PHP

PHP supports a set of functions that can be used for writing PHP-based XML applications. These functions can be used for parsing well-formed XML documents. The XML parser in PHP is a stream-based parser. Before parsing the document, different handlers (or callback functions) are registered with the parser. The XML document is fed to the parser in sections, and as the parser parses the document and recognizes different nodes, it calls the appropriate registered handler.
Note that the XML parser does not check the validity of the XML document. It won't generate any errors or warnings if the document is well-formed but not valid.

The PHP XML extension supports the Unicode character set through different character encodings. There are two types of character encoding: source encoding and target encoding. Source encoding is performed when the XML document is parsed. The default source encoding used by PHP is ISO-8859-1. Target encoding is carried out when PHP passes data to registered handler functions. Target encoding affects character data as well as tag names and processing instruction targets.

If the XML parser encounters characters outside the range that its source encoding is capable of representing, it will return an error. If PHP encounters characters in the parsed XML document that cannot be represented in the chosen target encoding, such characters will be replaced by a question mark.

XML support for PHP is implemented using the expat library. Expat is a library written in C for parsing XML documents. More information about expat can be found at http://www.jclark.com/xml/expat.html.

Note that XML support is not available in PHP by default. We discuss installing PHP with XML support in Chapter 2.

The PHP XML API

A PHP script which parses an XML document must perform the following operations:

1. Create an XML parser.
2. Register handler functions (callback functions) with the parser. The parser will call these registered handlers as and when it recognizes different nodes in the XML document. Most of the application logic is implemented in these handler functions.
3. Read the data from the XML file, and pass the data to the parser. This is where the actual parsing of the data occurs.
4. Free the parser, after the complete file has been parsed.
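The four steps above, together with the earlier note that the parser checks well-formedness but not validity, can be sketched in a few lines. This is an illustration rather than the chapter's own listing: the function name isWellFormed() and both sample documents are made up, and the data is passed as a string instead of being read from a file.

```php
<?php
// Hypothetical helper: returns true if the parser accepts the document.
function isWellFormed(string $xml): bool
{
    // Step 1: create the parser.
    $parser = xml_parser_create();

    // Step 2: register a handler (this sketch ignores the data it receives).
    xml_set_character_data_handler(
        $parser,
        function ($parser, $data) {
        }
    );

    // Step 3: pass the data to the parser; true marks the final chunk.
    $ok = xml_parse($parser, $xml, true) === 1;

    // Step 4: the parser is released when $parser goes out of scope;
    // older scripts call xml_parser_free($parser) at this point.
    return $ok;
}

// Well-formed, but not valid: the DTD allows only title inside book.
$notValid = '<!DOCTYPE book [<!ELEMENT book (title)> <!ELEMENT title (#PCDATA)>]>'
          . '<book><isbn>1-861002-77-7</isbn></book>';

// Not well-formed: the title element is never closed.
$malformed = '<book><title>Pro PHP</book>';

var_dump(isWellFormed($notValid));
var_dump(isWellFormed($malformed));
```

The first document breaks its own DTD, yet the parser reports no error for it, because only well-formedness is checked. The second document is rejected.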
We will have a quick look at what this means in practice by showing a very simple XML parser (in fact, just about the simplest possible!), before going on to look at the individual functions in turn.

    <?php
    // First we define the handler functions, which tell the parser what action
    // to take on encountering a specific type of node.
    // We'll just print out element opening and closing tags and character data.

    // The handler for element opening tags
    function startElementHandler($parser, $name, $attribs)
    {
        echo("<$name><BR>");
    }

    // The handler for element closing tags
    function endElementHandler($parser, $name)
    {
        echo("</$name><BR>");
    }

    // The handler for character data
    function cdataHandler($parser, $data)
    {
        echo("$data<BR>");
    }

    // Now we create the parser
    $parser = xml_parser_create();

    // Register the start and end element handlers
    xml_set_element_handler($parser, "startElementHandler", "endElementHandler");

    // Register the character data handler
    xml_set_character_data_handler($parser, "cdataHandler");

    // Open the XML file
    $file = "test.xml";
    if (!($fp = fopen($file, "r"))) {
        die("could not open $file for reading");
    }

    // Read chunks of 4K from the file, and pass them to the parser
    while ($data = fread($fp, 4096)) {
        if (!xml_parse($parser, $data, feof($fp))) {
            die(sprintf("XML error at line %d, column %d",
                        xml_get_current_line_number($parser),
                        xml_get_current_column_number($parser)));
        }
    }

    // Close the file and free the parser once the whole file has been parsed
    fclose($fp);
    xml_parser_free($parser);
    ?>

If we run this script against the following XML file:

    <?xml version="1.0"?>
    <books>
      <book>Pro PHP</book>
      <book>XML in IE5</book>
      <book>Pro XML</book>
    </books>

it produces the following output in the browser:

[Figure: browser output of the script]

Now we'll go on to discuss the functions in detail. In the following sections, all the XML-related functions will be described, along with examples of their use.
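To see what the element handlers actually receive, the names can be collected in an array instead of being echoed. The sketch below assumes one behaviour of PHP's XML extension that this excerpt does not spell out: case folding is enabled by default, so element names reach the handlers in upper case unless the XML_OPTION_CASE_FOLDING option is switched off. The function collectNames() is a hypothetical helper written for this illustration.

```php
<?php
// Hypothetical helper: returns the element names passed to the
// start-element handler, in document order.
function collectNames(string $xml, bool $caseFolding): array
{
    $names = [];
    $parser = xml_parser_create();
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, $caseFolding);
    xml_set_element_handler(
        $parser,
        // Start-element handler: record the name as the parser reports it.
        function ($parser, $name, $attribs) use (&$names) {
            $names[] = $name;
        },
        // End-element handler: closing tags are ignored in this sketch.
        function ($parser, $name) {
        }
    );
    xml_parse($parser, $xml, true);
    return $names;
}

$xml = '<?xml version="1.0"?>'
     . '<books><book>Pro PHP</book><book>Pro XML</book></books>';

print_r(collectNames($xml, true));  // BOOKS, BOOK, BOOK
print_r(collectNames($xml, false)); // books, book, book
```

Collecting values in the handlers rather than printing them keeps the parsing logic separate from presentation, which makes the handlers easier to test.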
Creating an XML Parser

The function xml_parser_create() creates an XML parser context.

    int xml_parser_create(string [encoding_parameter]);

Parameter            Optional   Description                                                                                                                                                    Default
encoding_parameter   Yes        The character source encoding that will be used by the parser. The source encoding, once set, cannot be changed later. The possible values are "ISO-8859-1"   [...]

... Explorer 5 then the PHP script could simply return the XML file; otherwise, the script would convert the XML data into HTML and send an HTML page to the browser.

Different Views of the Same Data

PHP can be used to present different views of the same XML document, by deleting or modifying nodes within the document.

A Sample PHP XML Application

To give a simple example of what can be done with XML and PHP, ...

... UTF-8 encoded string to ISO-8859-1 encoding:

    string utf8_decode(string data);

Parameter   Optional   Description
data        No         UTF-8 encoded string

This function returns an ISO-8859-1 string corresponding to data.

utf8_encode

The utf8_encode() function converts an ISO-8859-1 encoded string to UTF-8 encoding:

    string utf8_encode(string data);

Parameter   Optional   Description
data        No         ISO-8859-1 encoded string

The function returns a UTF-8 string corresponding to data.

PHP XML Applications

Now that we've seen the theory, let's look at some of the types of practical applications that can be developed using XML and PHP.

Web-Enabling Enterprise Applications

XML is now being used as the format for...
    <?php print "This document was created on Jan 01, 1999";?>

The processing instruction handler will be called with the following parameters:

    processingInstructionHandler($parser, "php", "print \"This document was created on Jan 01, 1999\";");

A sample processing instruction handler might look like this: ...

... book:

The other option available from the main page is to view the complete list of all the books. Again, the title of the book acts as a link which the user can click to view the book's table of contents:

This figure displays...

... Mark Wilcox Stefan Zeiger 1-861002-77-7 59.99 &book_1861002777;

Beginning Linux Programming Second Edition Neil Matthew Richard Stones 1-861002-97-1...

... to describe orders, transactions, inventory, billing etc. PHP can be used to provide the web-based front end for these business-to-business applications.

Smart Searches

PHP can be used to search XML documents. For example, if all the articles in a web site are written using the same DTD, which defines elements for author, title, abstract etc., then PHP can be used to search for the articles depending on...

... "ISO-8859-1", "US-ASCII" and "UTF-8".

The function returns a handle (a positive integer value) on success, and false on error. The handle returned by xml_parser_create() will...
consists of one HTML file and four PHP scripts. The main page of the application is the HTML page, named main.html:

Book Information Site Book Information Site Search Books

After the title and page headers, the page contains a form with the ACTION attribute set to one of our PHP pages, search_books.php. This contains a ... "author" or "isbn"). The selected search category will be available in the search_books.php script through the $searchBy variable. There is also a text box where the keyword for the search can be entered (this will be available to the PHP script as the $searchKeyword variable), and a submit button:

Search Books By