Trang 1

The XML Standard

Trang 2

Overview of our XML Standards

• Motivation: HTML vs XML

• XML 101: syntax, elements, attributes, DTDs, …

• XML 201: XML Schema, Namespaces

• XSLT: Transforming and Rendering XML

• XQuery: Search, Transform & Integrate

Trang 3

So what is XML (all about)?

Executive Summary:

• XML = HTML – idiosyncrasies (simplified syntax)

+ user-definable ("semantic") tags

• Separation of data and its presentation

=> simple, very flexible data exchange format:

Trang 4

What’s Wrong with HTML?

Y Papakonstantinou, S Abiteboul, H Garcia-Molina.

“Object Fusion in Mediator Systems” In VLDB 96

HTML confuses presentation

with content

Trang 5

What’s Wrong with HTML

Author

Conference

Title

Trang 6

And Some Repercussions

• Lack of schema/semantics when querying the Web (HTML):

– "find documents (books, papers, ...) where

author = Michael Jackson"

( and learn how software engineering meets the moon

− automation of information management

(retrieval, manipulation, integration)

Trang 7

XML is Based on Markup

<bibliography>
  <paper ID="object-fusion">
    <authors>
      <author>Y. Papakonstantinou</author>
      <author>S. Abiteboul</author>
      <author>H. Garcia-Molina</author>

Decoupled from presentation

Trang 8

Elements and their Content

Callouts: element, element name, character content, element content, empty element

<bibliography>
  <paper ID="object-fusion">
    <authors>
      <author>Y. Papakonstantinou</author>
      <author>S. Abiteboul</author>
      <author>H. Garcia-Molina</author>

Trang 10

XML = Labeled Ordered Trees

<bibliography>
  <paper id="23">
    <authors>
      <author>Yannis</author>
      <author>Serge</author>
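The tree view can be made concrete with a small sketch. It assumes Python's standard `xml.etree.ElementTree` parser and a hypothetical completed version of the fragment (the closing tags are added here; they are not on the slide). Element names are the node labels, and children keep their document order.

```python
import xml.etree.ElementTree as ET

# Hypothetical, completed version of the slide fragment (closing tags added).
DOC = """<bibliography>
  <paper id="23">
    <authors>
      <author>Yannis</author>
      <author>Serge</author>
    </authors>
  </paper>
</bibliography>"""


def outline(node, depth=0):
    """Return one (depth, label) pair per element, in document order."""
    rows = [(depth, node.tag)]
    for child in node:  # children are iterated in their original order
        rows.extend(outline(child, depth + 1))
    return rows


root = ET.fromstring(DOC)
tree_rows = outline(root)
for depth, label in tree_rows:
    print("  " * depth + label)
```

Swapping the two author elements in the input would swap the last two rows, which is what "ordered" means here.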

Trang 11

How do I share structure and metadata/semantics?

How do I learn and use the element structure of a document?

Trang 12

Adding Structure and Semantics

• XML Document Type Definitions (DTDs):

• define the structure of "allowed" documents (i.e.,

– identify your vocabulary

• Resource Description Framework (RDF)

– simple metadata model

Trang 13

XML DTDs as Extended CFGs

<!ELEMENT bibliography (paper*)>

<!ELEMENT paper (authors, fullPaper?, title, booktitle)>

<!ELEMENT authors (author+)>
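One way to read "extended CFG" is that each content model is a regular expression over child-element names. The sketch below is only an illustration of that reading, not a DTD validator: the three content models are hand-translated into Python regular expressions, and `CONTENT_MODELS` and `children_match` are hypothetical names introduced here.

```python
import re

# Content models from the slide, hand-translated to regular expressions over
# child-element names. Each name is followed by a space so that names cannot
# run together (for example "author" versus "authors").
CONTENT_MODELS = {
    "bibliography": r"(paper )*",
    "paper": r"authors (fullPaper )?title booktitle ",
    "authors": r"(author )+",
}


def children_match(parent, child_names):
    """True if the sequence of child names satisfies the parent's content model."""
    word = "".join(name + " " for name in child_names)
    return re.fullmatch(CONTENT_MODELS[parent], word) is not None


print(children_match("paper", ["authors", "title", "booktitle"]))
print(children_match("paper", ["title", "authors", "booktitle"]))
```

The first call succeeds because fullPaper is optional; the second fails because the comma operator fixes the order of the children.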

Trang 14

Document Type Definitions (DTDs)

Define and Constrain Element Names & Structure

Element Type Declarations:

<!ELEMENT bibliography (paper*)>
<!ELEMENT paper (authors, fullPaper?, title, booktitle)>
<!ELEMENT authors (author+)>
<!ELEMENT author (#PCDATA)>
<!ELEMENT fullPaper EMPTY>
<!ELEMENT title (#PCDATA)>
<!ELEMENT booktitle (#PCDATA)>

Attribute List Declarations:

<!ATTLIST fullPaper source ENTITY #REQUIRED>
<!ATTLIST paper ID ID #REQUIRED>

Trang 15

Element Declarations

<!ELEMENT bibliography (paper*)>
<!ELEMENT paper (authors, fullPaper?, title, booktitle)>
<!ELEMENT authors (author+)>
<!ELEMENT author (#PCDATA)>
<!ELEMENT fullPaper EMPTY>
<!ELEMENT title (#PCDATA)>
<!ELEMENT booktitle (#PCDATA)>
<!ATTLIST fullPaper source ENTITY #REQUIRED>
<!ATTLIST paper ID ID #REQUIRED>

• bibliography: sequence of 0 or more paper
• paper: authors followed by optional fullPaper, followed by title, followed by booktitle
• authors: sequence of 1 or more author
• author, title, booktitle: character content

Trang 16

Element Content Declarations

Declaration                 Meaning
element name                Exactly one instance of element
R?                          Zero or one instances of R
R*                          Zero or more instances of R
R+                          One or more instances of R
R1|R2|…|Rn                  One instance of R1 or R2 or … Rn
#PCDATA                     Character content
EMPTY                       Empty element
(#PCDATA | e1 | … | en)*    Mixed content
ANY                         Anything goes

Trang 17

<title>Object Fusion in Mediator Systems</title>
<related papers="semistructured-data mediators"/>
</paper>
</bibliography>

<person ID="yannis"> Yannis’ info </person>

Callouts:
• ID: object identity attribute
• CDATA: character data
• IDREF: intradocument reference (an IDREFS value lists several IDs separated by spaces)
• ENTITY: reference to an external entity
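A short sketch of how an application might follow such references. It assumes Python's standard `xml.etree.ElementTree`, which does not read the DTD, so the roles of `ID` and `papers` are stated by hand. The two extra papers and their titles ("Paper A", "Paper B") are hypothetical placeholders, not part of the slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: one paper references two others through an IDREFS value.
DOC = """<bibliography>
  <paper ID="object-fusion">
    <title>Object Fusion in Mediator Systems</title>
    <related papers="semistructured-data mediators"/>
  </paper>
  <paper ID="semistructured-data"><title>Paper A</title></paper>
  <paper ID="mediators"><title>Paper B</title></paper>
</bibliography>"""

root = ET.fromstring(DOC)

# Index every paper by its ID attribute; an ID must be unique within the document.
by_id = {paper.get("ID"): paper for paper in root.iter("paper")}
assert len(by_id) == len(list(root.iter("paper"))), "ID values must be unique"


def resolve_idrefs(value):
    """Split a whitespace-separated IDREFS value and look up each target element."""
    return [by_id[token] for token in value.split()]


related = root.find("paper[@ID='object-fusion']/related")
titles = [p.findtext("title") for p in resolve_idrefs(related.get("papers"))]
print(titles)
```

A validating parser would perform the uniqueness and dangling-reference checks itself; here a missing target simply raises `KeyError`.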

Trang 18

Attribute Types

Type        Meaning
ID          Token unique within the document
IDREF       Reference to an ID token
IDREFS      Reference to multiple ID tokens
ENTITY      External entity (image, video, …)
ENTITIES    External entities
NMTOKEN     Name token
NMTOKENS    Name tokens

More to appear? More types (e.g., DATE) may soon be part of the standard.

Trang 20

Types of Entities

• Internal (to a doc) vs External ( → use URI)

• General (in XML doc) vs Parameter (in DTD)

• Parsed (XML) vs Unparsed (non-XML)

Trang 21

Internal Text Entities

<!ENTITY WWW "World Wide Web">

<p>We all use the &WWW; </p>

Internal Text Entity Declaration

Entity Reference

<p>We all use the World Wide Web </p>

Logically equivalent to actually appearing
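The equivalence can be observed with a parser. This is a minimal sketch assuming Python's standard `xml.etree.ElementTree`, with the entity declared in an internal DTD subset; the surrounding DOCTYPE wrapper is added here for illustration.

```python
import xml.etree.ElementTree as ET

# The entity declaration lives in the internal DTD subset of the document.
DOC = """<!DOCTYPE p [
  <!ENTITY WWW "World Wide Web">
]>
<p>We all use the &WWW;</p>"""

paragraph = ET.fromstring(DOC)

# The parser has already replaced the reference &WWW; by its replacement text.
print(paragraph.text)
```

The application never sees the reference itself, only the expanded text.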

Trang 22

Unparsed (& "Binary") Entities

<!NOTATION ps SYSTEM "ghostview.exe">
NOTATION declaration (helper app)

<!ENTITY fusion SYSTEM "fusion.ps" NDATA ps>
Declare external and unparsed entity

<!ATTLIST fullPaper source ENTITY #REQUIRED>
Declare attribute type to be ENTITY

<fullPaper source="fusion"/>
Element with ENTITY attribute

Trang 23

From Docs to Data: XML Schema

• XML DTDs (part of the XML spec.)

– flexible, semistructured data model (nesting, ANY, ?, *, |, …)

– but document-oriented (SGML heritage)

• XML Schema (W3C Recommendation since 2001)

– schema definition language in XML

– data-oriented: data types

– extends capabilities of DTD

Trang 24

Sample Data for Introduction to XML Schema

Trang 25

The Simple “Russian Doll” Approach

<xsd:element name="title" type="xsd:string"/>
<xsd:element name="author" type="xsd:string"/>
<xsd:element name="character" minOccurs="0" maxOccurs="unbounded">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element name="name" type="xsd:string"/>
      <xsd:element name="friend-of" type="xsd:string" minOccurs="0" maxOccurs="unbounded"/>
      <xsd:element name="since" type="xsd:date"/>
      <xsd:element name="qualification" type="xsd:string"/>
    </xsd:sequence> …
<xsd:attribute name="isbn" type="xsd:string"/>

Callouts:
• Optional namespace definition
• Sequence compositor
• Simple type: content for title and author
• Complex type: content for book
• character may appear any number of times
• xsd:string, xsd:date: basic types of XML Schema

Trang 26

The Catalog Approach to XML Schema: Stand-Alone Declarations & References

<xsd:element name="title" type="xsd:string"/>

<xsd:element name="author" type="xsd:string"/>

<xsd:element name="name" type="xsd:string"/>

Callouts: attributes, complex type, element character, reference

Trang 27

Catalog Approach Cont’d

<xsd:attribute ref="isbn"/>

</xsd:complexType>

</xsd:element>

Trang 28

<xsd:simpleType name="nameType">

<xsd:restriction base="xsd:string">

Trang 29

Groups: Named Containers of Sets of Elements or Attributes

<xsd:group name="mainBookElements">

<xsd:sequence>

<xsd:element name="title" type="nameType"/>

<xsd:element name="author" type="nameType"/>

Trang 30

Compositors: Sequence, Choice, All

The group nameTypes consists of one of:

• the element "name"

• the sequence containing firstName, middleName, lastName

Trang 31

Compositors (cont’d)

<xsd:complexType name="characterType">
  <xsd:all>
    <xsd:element name="name" type="nameType"/>
    <xsd:element name="friend-of" type="nameType" minOccurs="0" maxOccurs="unbounded"/>
    <xsd:element name="since" type="sinceType"/>
    <xsd:element name="qualification" type="descType"/>
  </xsd:all>
</xsd:complexType>

The characterType consists of name, a list of friend-of, since, and qualification particles in no particular order. (Compare with the sequence compositor.) Note that XML Schema 1.0 limits every particle inside xsd:all to maxOccurs="1"; the unbounded friend-of shown here is accepted only by XML Schema 1.1 processors.

Trang 32

Derivation of Simple Types:

have seen restrictions and facets

The simple type isbnType will be either

• a 10-digit string (notice the pattern)

• the token "TBD" or the token "NA"

Trang 33

By inserting xsd:unique in the book element declaration, we enforce that the character names in each book are unique.

Trang 35

Including Unknown Elements

<xsd:complexType name="descType" mixed="true">

Trang 36

Presenting XML: XSLT

• Why Stylesheets?

– separation of content (XML) from presentation (XSL)

• Why not just CSS for XML?

– XSL is far more powerful:

• selecting elements

• transforming the XML tree

• content based display (result may depend on data)

Trang 37

XSLT Overview

• XSLT stylesheets are written in XML syntax

• XSL components:

1. a language for transforming XML documents (XSLT: integral part of the XSL specification)

2. an XML formatting vocabulary (Formatting Objects: >90% of the formatting properties inherited from CSS)

Trang 38

XSLT Processing Model

[Figure: processing model diagram; surviving labels: XSL stylesheet, Transformation]

Trang 39

XSLT Processing Model

• XSL stylesheet: collection of template rules

• template rule: ( pattern ⇒ template )

• main steps:

– match pattern against source tree

– instantiate template (replace current node “.” by the template in the result tree)

– select further nodes for processing

• control can be

– program-driven ("pull": <xsl:for-each> )

– data/event-driven ("push": <xsl:apply-templates> )
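The difference between the two control styles can be sketched outside XSLT. The Python below is a deliberately simplified model, not an XSLT processor: patterns are bare element names, there are no built-in rules or priorities, and `RULES`, `template`, `apply_templates`, and `pull_style` are hypothetical names introduced for this illustration.

```python
import xml.etree.ElementTree as ET

DOC = "<employees><employee>Ann</employee><employee>Bob</employee></employees>"

RULES = {}  # pattern (here simply an element name) -> template function


def template(match):
    """Register a function as the template for the given pattern."""
    def register(func):
        RULES[match] = func
        return func
    return register


def apply_templates(nodes):
    """Push style: each node selects the rule whose pattern matches it."""
    return "".join(RULES[node.tag](node) for node in nodes)


@template("employees")
def employees_rule(node):
    # Instantiate the template, then select the children for further processing.
    return "<ul>" + apply_templates(list(node)) + "</ul>"


@template("employee")
def employee_rule(node):
    return "<li>" + node.text + "</li>"


def pull_style(root):
    """Pull style: one routine loops over the nodes it wants, like xsl:for-each."""
    items = "".join("<li>" + e.text + "</li>" for e in root.findall("employee"))
    return "<ul>" + items + "</ul>"


root = ET.fromstring(DOC)
pushed = apply_templates([root])
pulled = pull_style(root)
print(pushed)
```

Both styles produce the same result here; the push style scales better when the input structure is irregular, because new element types only need a new rule.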

Trang 40

Template Rule: Example

<xsl:template match="product">

(The value of the match attribute is the pattern; the body of the element is the template.)

(i) match pattern: process <product> elements

(ii) instantiate template: replace each <product> with two HTML tables

(iii) select the <product> grandchildren ("sales/domestic", "sales/foreign") for further processing

Trang 42

Creating the Result Tree

• Literal result elements : non-XSL elements (e.g., HTML) appear “literally” in the result tree

Trang 43

Example of Turning XML into HTML

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="FitnessCenter.xsl"?>
<FitnessCenter>
  <Member level="platinum">
    <Name>Jeff</Name>
    <Phone type="home">555-1234</Phone>
    <Phone type="work">555-4321</Phone>
    <FavoriteColor>lightgrey</FavoriteColor>
  </Member>
</FitnessCenter>

Trang 44

HTML Document in an XSL Template

<?xml version="1.0"?>

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

<xsl:output method="html"/>

Trang 45

Extracting the Member Name

Trang 46

Extracting a Value from an XML Document, Navigating the XML Document

– A slash at the beginning of the path indicates that it is an absolute path, starting from the top of the XML document

/FitnessCenter/Member/Name

"Start from the top of the XML document, go to the FitnessCenter element, from there go to the Member element, and from there go to the Name element."

Trang 47

Document /

Trang 48

Extract the FavoriteColor and use it

Trang 49

An attribute value cannot contain a literal "<", so an element cannot be nested inside it.

- Consequently, the following is NOT valid:

<Body bgcolor="<xsl:value-of select='/FitnessCenter/Member/FavoriteColor'/>">

To extract the value of an XML element and use it as an attribute value you must use curly braces:

<Body bgcolor="{/FitnessCenter/Member/FavoriteColor}">

The expression within the curly braces is evaluated, and its value is assigned to the attribute.
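The two steps (evaluate the path, then assign the value) can be mimicked outside XSLT. This sketch assumes Python's standard `xml.etree.ElementTree`, which supports only a subset of XPath and evaluates paths relative to the element they are called on; it is an analogy to the curly-brace mechanism, not an XSLT processor.

```python
import xml.etree.ElementTree as ET

SOURCE = """<FitnessCenter>
  <Member level="platinum">
    <Name>Jeff</Name>
    <Phone type="home">555-1234</Phone>
    <Phone type="work">555-4321</Phone>
    <FavoriteColor>lightgrey</FavoriteColor>
  </Member>
</FitnessCenter>"""

root = ET.fromstring(SOURCE)  # root is the FitnessCenter element

# Step 1: evaluate the path. ElementTree paths are relative to the element they
# are called on, so /FitnessCenter/Member/FavoriteColor becomes Member/FavoriteColor.
color = root.findtext("Member/FavoriteColor")

# Step 2: assign the resulting string to the attribute of the result element.
body = ET.Element("Body", {"bgcolor": color})
html = ET.tostring(body, encoding="unicode")
print(html)

# The same path mechanism, with an attribute predicate, picks out one Phone.
home_phone = root.findtext("Member/Phone[@type='home']")
print(home_phone)
```

Because the attribute receives a plain string, no markup ever has to appear inside the attribute value.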

Trang 50

Extract the Home Phone Number

</HTML>

</xsl:template>

</xsl:stylesheet>

Trang 51

Creating the Result Tree

• Further XSL elements for

Trang 52

Creating the Result Tree: Repetition

Trang 53

Creating the Result Tree: Sorting

<xsl:template match="employees">

<ul>

<xsl:apply-templates select="employee">

<xsl:sort select="name/last"/>

<xsl:sort select="name/first"/>
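The effect of two nested sort keys can be sketched with a compound key in Python. The employee data below is hypothetical, and Python compares strings by code point, whereas an XSLT processor may apply language-specific collation, so this is only an approximation of what the stylesheet fragment does.

```python
import xml.etree.ElementTree as ET

# Hypothetical input; two employees share a last name to exercise the second key.
DOC = """<employees>
  <employee><name><first>Sue</first><last>Young</last></name></employee>
  <employee><name><first>Bob</first><last>Adams</last></name></employee>
  <employee><name><first>Al</first><last>Young</last></name></employee>
</employees>"""

root = ET.fromstring(DOC)


def sort_key(employee):
    # Primary key name/last, secondary key name/first, mirroring the two xsl:sort elements.
    return (employee.findtext("name/last"), employee.findtext("name/first"))


ordered = sorted(root.findall("employee"), key=sort_key)
items = [f"{e.findtext('name/first')} {e.findtext('name/last')}" for e in ordered]
print(items)
```

The first key decides the order unless two employees tie on it, in which case the second key breaks the tie.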

Trang 54

More on XSL

• XSL(T):

– Conflict resolution for multiple applicable rules

Trang 55

XQuery: Querying XML Sources

• Functional Query Language

– Operates on the XPath/XQuery data model: a list of ordered trees

– A document is a list of size 1

• XQuery expressions are composed of

– Path expressions

– Element constructors

– FLWR expressions

– … and more …

Trang 56

Path Expressions

doc("zoo.xml")//chapter[2]//figure[caption="Tree Frogs"]

In the second chapter of the document zoo.xml, find the figures with caption "Tree Frogs".

[Figure: document tree with nodes book, part, two chapter nodes, section, paragraph, and figure nodes with captions "Tree Frogs" and "Just Frogs"]
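The same kind of path can be tried with the limited XPath subset in Python's standard `xml.etree.ElementTree`. The content of zoo.xml is not given on the slide, so the document below is a hypothetical stand-in that only borrows the element names from the figure.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for zoo.xml; only the element names come from the slide.
ZOO = """<book>
  <part>
    <chapter>
      <figure><caption>Just Frogs</caption></figure>
    </chapter>
    <chapter>
      <section>
        <figure><caption>Tree Frogs</caption></figure>
        <figure><caption>Just Frogs</caption></figure>
      </section>
    </chapter>
  </part>
</book>"""

root = ET.fromstring(ZOO)

# chapter[2]: a chapter that is the second chapter child of its parent.
# figure[caption='Tree Frogs']: a figure with a caption child whose text matches.
figures = root.findall(".//chapter[2]//figure[caption='Tree Frogs']")
captions = [figure.findtext("caption") for figure in figures]
print(captions)
```

The leading `.` is needed because ElementTree evaluates paths relative to an element rather than to the document node.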

Trang 57

More Path Expressions

doc("zoo.xml")/part/chapter[1]//figure[caption="Tree Frogs"]

Find the first immediate chapter subelements of immediate part subelements of the document zoo.xml and retrieve figures that have …

[Figure: document tree with nodes book, part, two chapter nodes, section, paragraph, and figure nodes with captions "Tree Frogs" and "Just Frogs"]

Trang 58

[Figure: result tree; surviving labels: result, "Tree Frogs"]

Trang 59

Bibliography Example Data Set

<bib>

<book>

<author> Aho </author>

<author> Hopcroft </author>

<author> Ullman </author>

<title> Automata Theory </title>

<publisher> Morgan Kaufmann </publisher>

<year> 1998 </year>

</book>

<book>

<author> Ullman </author>

<title> Database Systems </title>

<publisher> Morgan Kaufmann </publisher>

<year> 1998 </year>

</book>

<book>

<author> Abiteboul </author>

<author> Buneman </author>

<author> Suciu </author>

<title> Automata Theory </title>

<publisher> Prentice Hall </publisher>

<year> 1998 </year>

</book>

</bib>

Trang 60

Reviews Example Data Set

<reviews>

<review>

<title> Automata Theory </title>

<comment> It’s the best in automata theory </comment>

<comment> A definitive textbook </comment>

</review>

</reviews>

Trang 61

For-Let-Where-Return (FLWR)

for $b in doc("bib.xml")//book
where $b/publisher = "Morgan Kaufmann"

[Figure: book nodes with year (1998) and publisher (Morgan Kaufmann, Prentice Hall) children]

Trang 62

Think (tuples of) variable bindings

[Figure: book nodes with year (1998) and publisher (Morgan Kaufmann, Prentice Hall) children]

Trang 64

for $p in distinct-values(doc("bib.xml")//publisher)
let $b := doc("bib.xml")//book[publisher = $p]
where count($b) > 1
return $p

List publishers who have published more than 1 book.

Tuples ($p, $b) are formed.
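The tuple-of-bindings view maps naturally onto a loop. The sketch below restates the query in Python over an abridged, hypothetical copy of the bibliography data (authors and years omitted); it illustrates the for/let/where/return roles and is not an XQuery engine.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the bibliography data set (authors and years omitted).
BIB = """<bib>
  <book><title>Automata Theory</title><publisher>Morgan Kaufmann</publisher></book>
  <book><title>Database Systems</title><publisher>Morgan Kaufmann</publisher></book>
  <book><title>Automata Theory</title><publisher>Prentice Hall</publisher></book>
</bib>"""

bib = ET.fromstring(BIB)
books = bib.findall(".//book")

# Distinct publisher values; dict.fromkeys keeps the first-seen order.
publishers = list(dict.fromkeys(book.findtext("publisher") for book in books))

result = []
for p in publishers:                                                  # for: bind $p
    bound = [book for book in books if book.findtext("publisher") == p]  # let: bind $b
    if len(bound) > 1:                                                # where: filter tuples
        result.append(p)                                              # return
print(result)
```

Note that `for` produces one tuple per publisher, while `let` binds the whole list of matching books to a single variable within that tuple.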

Trang 65

Boolean Expressions in WHERE

Trang 66

<author> Aho </author>

<author> Hopcroft </author>

<author> Ullman </author>

<title> Automata Theory </title>

<publisher> Morgan Kaufmann </publisher>

<year> 1998 </year>

<comment> It’s the best in automata theory </comment>

<comment> A definitive textbook </comment>

</book_with_review>

Posted: 23/10/2014, 17:17
