XML and Web Databases
Dr. M. Brindha
Assistant Professor
Department of CSE
NIT, Trichy-15
Introduction
• XML: Extensible Markup Language
• Defined by the WWW Consortium (W3C)
• Originally intended as a document markup language, not a database language
• Documents have tags giving extra information about sections of the
document
• E.g. <title> XML </title> <slide> Introduction …</slide>
• Derived from SGML (Standard Generalized Markup Language), but
simpler to use than SGML
• Extensible, unlike HTML
• Users can add new tags, and separately specify how the tag should be
handled for display
• Goal was (is?) to replace HTML as the language for publishing documents on the Web
XML Introduction (Cont.)
• The ability to specify new tags, and to create nested tag
structures made XML a great way to exchange data, not
just documents.
• Much of the use of XML has been in data exchange applications, not as a replacement
for HTML
• DTD (Document Type Definition) syntax
• <!ELEMENT element (subelements-specification) >
• <!ATTLIST element (attributes) >
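To make the data-exchange use concrete, here is a minimal sketch using Python's standard `xml.etree.ElementTree`: one program turns a flat record into nested tags, another parses it back. The record, its values and the helper names `to_xml`/`from_xml` are hypothetical; the tag names are borrowed from the bank example used in these slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical record to exchange; tag names follow the bank example.
record = {"account-number": "A-101", "branch-name": "Downtown", "balance": "500"}

def to_xml(tag, fields):
    """Serialize a flat dict as one element with one subelement per field."""
    root = ET.Element(tag)
    for name, value in fields.items():
        ET.SubElement(root, name).text = value
    return ET.tostring(root, encoding="unicode")

def from_xml(text):
    """Parse the element back into (tag, dict of subelement texts)."""
    root = ET.fromstring(text)
    return root.tag, {child.tag: child.text for child in root}

message = to_xml("account", record)
print(message)
tag, fields = from_xml(message)
print(tag, fields == record)  # account True
```

The receiver needs no knowledge of the sender's internal data structures, only agreement on the tags, which is what a DTD can pin down.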
Element Specification in DTD
• Subelements can be specified as
• names of elements, or
• #PCDATA (parsed character data), i.e., character strings
• EMPTY (no subelements) or ANY (anything can be a subelement)
• Example
<!ELEMENT depositor (customer-name, account-number)>
<!ELEMENT customer-name (#PCDATA)>
<!ELEMENT account-number (#PCDATA)>
• Subelement specification may have regular expressions
<!ELEMENT bank ( ( account | customer | depositor)+)>
• Notation:
• “,” - sequence
• “|” - alternatives
• “+” - 1 or more occurrences
• “*” - 0 or more occurrences
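Because a content model is a regular expression over child element names, it can be checked with an ordinary regex engine. The sketch below hand-translates `( (account | customer | depositor)+ )` into a Python regex over space-terminated tag names; this encoding and the helper `children_match` are assumptions of the sketch, not a standard API.

```python
import re

# ( (account | customer | depositor)+ ) written as a Python regex over a
# string of child tag names, each followed by one space (sketch's encoding).
BANK_MODEL = re.compile(r"(?:(?:account|customer|depositor) )+")

def children_match(model, child_tags):
    """Return True if the sequence of child tag names fits the content model."""
    return model.fullmatch("".join(tag + " " for tag in child_tags)) is not None

print(children_match(BANK_MODEL, ["account", "customer", "depositor", "account"]))  # True
print(children_match(BANK_MODEL, []))                     # False: "+" needs one or more
print(children_match(BANK_MODEL, ["account", "branch"]))  # False: branch not allowed
```

The trailing space after each name keeps `customer` from accidentally matching the start of a longer name such as `customer-name`.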
Bank DTD
<!DOCTYPE bank [
<!ELEMENT bank ( ( account | customer | depositor)+)>
<!ELEMENT account (account-number, branch-name, balance)>
<!ELEMENT customer (customer-name, customer-street, customer-city)>
<!ELEMENT depositor (customer-name, account-number)>
<!ELEMENT account-number (#PCDATA)>
<!ELEMENT branch-name (#PCDATA)>
<!ELEMENT balance (#PCDATA)>
<!ELEMENT customer-name (#PCDATA)>
<!ELEMENT customer-street (#PCDATA)>
<!ELEMENT customer-city (#PCDATA)>
]>
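Python's `xml.etree.ElementTree` parses but does not validate against a DTD, so this sketch checks a document against the Bank DTD by hand: each content model is translated into a regex over space-terminated child tag names, and the #PCDATA elements are required to have no subelements. The `MODELS`/`PCDATA` tables, the `validate` helper and the sample data are assumptions of the sketch; text appearing inside non-#PCDATA elements is not checked.

```python
import re
import xml.etree.ElementTree as ET

# Bank DTD content models, hand-translated to regexes over "tag " sequences.
MODELS = {
    "bank": r"(?:(?:account|customer|depositor) )+",
    "account": r"account-number branch-name balance ",
    "customer": r"customer-name customer-street customer-city ",
    "depositor": r"customer-name account-number ",
}
# Elements declared (#PCDATA): character data only, no subelements.
PCDATA = {"account-number", "branch-name", "balance",
          "customer-name", "customer-street", "customer-city"}

def validate(elem):
    """Return a list of error strings for elem and its descendants."""
    if elem.tag in PCDATA:
        return [f"{elem.tag}: #PCDATA element has subelements"] if len(elem) else []
    model = MODELS.get(elem.tag)
    if model is None:
        return [f"{elem.tag}: element not declared"]
    errors = []
    seq = "".join(child.tag + " " for child in elem)
    if re.fullmatch(model, seq) is None:
        errors.append(f"{elem.tag}: children {seq.split()} do not match model")
    for child in elem:
        errors.extend(validate(child))
    return errors

doc = ET.fromstring("""
<bank>
  <account><account-number>A-101</account-number>
    <branch-name>Downtown</branch-name><balance>500</balance></account>
  <depositor><customer-name>Johnson</customer-name>
    <account-number>A-101</account-number></depositor>
</bank>""".strip())
print(validate(doc))  # []
```

A real validating parser (for example, one built on a full XML toolkit) would also check attributes and character data; this sketch only covers element structure.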
Attribute Specification in DTD
• Attribute specification: for each attribute
• Name
• Type of attribute
• CDATA
• ID (identifier) or IDREF (ID reference) or IDREFS (multiple IDREFs)
• more on this later
• Whether it is
• mandatory (#REQUIRED)
• has a default value (value),
• or neither (#IMPLIED)
• Examples
• <!ATTLIST account acct-type CDATA "checking">
• <!ATTLIST customer
customer-id ID #REQUIRED
accounts IDREFS #REQUIRED >
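The three cases above (#REQUIRED, default value, #IMPLIED) can be sketched as a small function that completes an element's attributes: missing required attributes are reported and missing defaulted attributes are filled in. The `ATTLIST` dictionary is a hypothetical in-memory form of the two declarations in the examples, and `apply_attlist` is an illustrative helper, not a library call.

```python
# Hypothetical in-memory form of the ATTLIST examples:
# (element, attribute) -> "#REQUIRED", "#IMPLIED", or a default value.
ATTLIST = {
    ("account", "acct-type"): "checking",
    ("customer", "customer-id"): "#REQUIRED",
    ("customer", "accounts"): "#REQUIRED",
}

def apply_attlist(tag, attrs):
    """Return (attributes with defaults filled in, missing required names)."""
    result = dict(attrs)
    missing = []
    for (elem, name), rule in ATTLIST.items():
        if elem != tag or name in result:
            continue  # other element, or attribute already given
        if rule == "#REQUIRED":
            missing.append(name)
        elif rule != "#IMPLIED":
            result[name] = rule  # default value supplied by the DTD
    return result, missing

print(apply_attlist("account", {}))                        # ({'acct-type': 'checking'}, [])
print(apply_attlist("customer", {"customer-id": "C100"}))  # ({'customer-id': 'C100'}, ['accounts'])
```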
IDs and IDREFs
• An element can have at most one attribute of type ID
• The ID attribute value of each element in an XML
document must be distinct
• Thus the ID attribute value is an object identifier
• An attribute of type IDREF must contain the ID value of an element in the same document
• An attribute of type IDREFS contains a set of (one or more) ID values. Each ID value must be the ID value of an element in the same document
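The ID/IDREF rules amount to two integrity checks over the whole document: ID values must be distinct, and every IDREF/IDREFS value must name an existing ID. In a real document the DTD says which attributes have which type; in this sketch the attribute names are passed in by hand, and `check_ids` plus the small sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

def check_ids(root, id_attrs, idref_attrs, idrefs_attrs):
    """Return (duplicate ID values, dangling references) for the document."""
    seen, duplicates = set(), []
    for elem in root.iter():
        for name in id_attrs:
            value = elem.get(name)
            if value is None:
                continue
            if value in seen:
                duplicates.append(value)  # IDs must be distinct document-wide
            seen.add(value)
    dangling = []
    for elem in root.iter():
        refs = [elem.get(n) for n in idref_attrs if elem.get(n) is not None]
        for n in idrefs_attrs:
            refs.extend(elem.get(n, "").split())  # IDREFS: space-separated IDs
        dangling.extend(r for r in refs if r not in seen)
    return duplicates, dangling

doc = ET.fromstring(
    '<bank-2>'
    '<account account-number="A-401" owners="C100 C102"/>'
    '<customer customer-id="C100" accounts="A-401"/>'
    '<customer customer-id="C100" accounts="A-999"/>'
    '</bank-2>')
print(check_ids(doc, ["account-number", "customer-id"], [], ["owners", "accounts"]))
# (['C100'], ['C102', 'A-999'])
```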
Bank DTD with Attributes
• Bank DTD with ID and IDREF attribute types.
<!DOCTYPE bank-2 [
<!ELEMENT account (branch-name, balance)>
<!ATTLIST account
account-number ID #REQUIRED
owners IDREFS #REQUIRED>
<!ELEMENT customer (customer-name, customer-street, customer-city)>
<!ATTLIST customer
customer-id ID #REQUIRED
accounts IDREFS #REQUIRED>
… declarations for branch-name, balance, customer-name,
customer-street and customer-city
]>
XML data with ID and IDREF attributes
<bank-2>
<account account-number="A-401" owners="C100 C102">
<branch-name> Downtown </branch-name>
<balance> 500 </balance>
</account>
<customer customer-id="C100" accounts="A-401">
<customer-name>Joe </customer-name>
<customer-street> Monroe </customer-street>
<customer-city> Madison</customer-city>
</customer>
<customer customer-id="C102" accounts="A-401 A-402">
<customer-name> Mary </customer-name>
<customer-street> Erin </customer-street>
<customer-city> Newark </customer-city>
</customer>
</bank-2>
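Since ID values act as object identifiers, IDREFs can be followed like pointers. This sketch loads the bank-2 data above with `xml.etree.ElementTree`, indexes elements by their ID attribute, and follows an account's `owners` IDREFS to the customers' names. The helper `owner_names` and the `by_id` index are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

BANK2 = """<bank-2>
<account account-number="A-401" owners="C100 C102">
<branch-name> Downtown </branch-name>
<balance> 500 </balance>
</account>
<customer customer-id="C100" accounts="A-401">
<customer-name>Joe </customer-name>
<customer-street> Monroe </customer-street>
<customer-city> Madison</customer-city>
</customer>
<customer customer-id="C102" accounts="A-401 A-402">
<customer-name> Mary </customer-name>
<customer-street> Erin </customer-street>
<customer-city> Newark </customer-city>
</customer>
</bank-2>"""

root = ET.fromstring(BANK2)
# Index each top-level element by whichever ID attribute it carries.
by_id = {}
for elem in root:
    by_id[elem.get("account-number") or elem.get("customer-id")] = elem

def owner_names(account_id):
    """Follow the account's owners IDREFS to the customers' names."""
    account = by_id[account_id]
    return [by_id[ref].findtext("customer-name").strip()
            for ref in account.get("owners").split()]

print(owner_names("A-401"))  # ['Joe', 'Mary']
```

The fragment shown refers to account A-402 without containing it, so following that reference here would raise `KeyError`; a complete document would have to include that account.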
Limitations of DTDs
• No typing of text elements and attributes
• All values are strings, no integers, reals, etc.
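A quick illustration of the typing limitation: whatever a DTD says, a parser hands back character data, so comparisons are string comparisons unless the application converts values itself. The two balance values are made up for this sketch.

```python
import xml.etree.ElementTree as ET

# Two hypothetical balance elements; the DTD can only declare them (#PCDATA).
a = ET.fromstring("<balance>500</balance>").text
b = ET.fromstring("<balance>60</balance>").text
print(type(a).__name__)  # str
print(a < b)             # True: string comparison, "5" < "6"
print(int(a) < int(b))   # False: numeric meaning needs explicit conversion
```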
String Representation
• Indexing:
• Store values of subelements/attributes to be indexed as extra
fields of the relation, and build indices on these fields
• E.g. customer-name or account-number
• Oracle 9 supports function-based indices, which use the result of a function as the key value.
• The function should return the value of the required
subelement/attribute
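The first indexing approach (extra fields alongside the stored string) can be sketched with Python's built-in `sqlite3`: each top-level element is kept whole as a string, the value to be indexed is extracted into its own column, and an ordinary index is built on that column. The table, column and index names and the sample elements are hypothetical; this does not show Oracle's function-based indices.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical top-level account elements, each stored whole as a string.
accounts = [
    "<account><account-number>A-101</account-number>"
    "<branch-name>Downtown</branch-name><balance>500</balance></account>",
    "<account><account-number>A-102</account-number>"
    "<branch-name>Perryridge</branch-name><balance>400</balance></account>",
]

conn = sqlite3.connect(":memory:")
# String column for the element plus an extra column holding the indexed value.
conn.execute("CREATE TABLE account_elems (account_number TEXT, xml TEXT)")
conn.execute("CREATE INDEX idx_acct ON account_elems (account_number)")
for text in accounts:
    number = ET.fromstring(text).findtext("account-number")
    conn.execute("INSERT INTO account_elems VALUES (?, ?)", (number, text))

# The lookup can use the index; only the matching string is parsed afterwards.
row = conn.execute(
    "SELECT xml FROM account_elems WHERE account_number = ?", ("A-102",)).fetchone()
print(ET.fromstring(row[0]).findtext("branch-name"))  # Perryridge
```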
String Representation (Cont.)
• Benefits:
• Can store any XML data even without DTD
• As long as there are many top-level elements in a document,
strings are small compared to full document
• Allows fast access to individual elements.
• Each element/attribute is given a unique identifier
• Type indicates whether the node is an element or an attribute
• Label specifies the tag name of the element or the name of the attribute
• Value is the text value of the element/attribute
• The relation child records the parent-child relationships in the tree
• Can add an extra attribute to child to record ordering of children
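The tree representation can be sketched as a recursive walk that emits one node tuple per element or attribute and one child tuple per parent-child edge, with a position column for ordering. The tuple layouts `(id, type, label, value)` and `(child_id, parent_id, position)` and the `shred` helper are this sketch's assumptions; tail text in mixed content is ignored.

```python
import xml.etree.ElementTree as ET

def shred(xml_text):
    """Return (nodes, child) tuples for an XML document.

    nodes: (id, type, label, value); child: (child_id, parent_id, position).
    Tail text (mixed content) is ignored in this sketch."""
    nodes, child = [], []
    next_id = 0

    def visit(elem, parent_id, position):
        nonlocal next_id
        elem_id = next_id
        next_id += 1
        text = (elem.text or "").strip() or None
        nodes.append((elem_id, "element", elem.tag, text))
        if parent_id is not None:
            child.append((elem_id, parent_id, position))
        # Attributes become nodes too, children of their element (no position).
        for name, value in elem.attrib.items():
            attr_id = next_id
            next_id += 1
            nodes.append((attr_id, "attribute", name, value))
            child.append((attr_id, elem_id, None))
        for pos, sub in enumerate(elem):
            visit(sub, elem_id, pos)

    visit(ET.fromstring(xml_text), None, None)
    return nodes, child

nodes, child = shred('<account account-number="A-101"><balance>500</balance></account>')
print(nodes)  # [(0, 'element', 'account', None), (1, 'attribute', 'account-number', 'A-101'), (2, 'element', 'balance', '500')]
print(child)  # [(1, 0, None), (2, 0, 0)]
```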
Tree Representation (Cont.)
• Benefit: Can store any XML data, even without DTD
• Drawbacks:
• Data is broken up into too many pieces, increasing space
overheads
• Even simple queries require a large number of joins, which can
be slow
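To see why simple queries need many joins, this sketch stores a tiny hand-shredded document in `sqlite3` and asks for the balance of one account. Finding it means going from the account-number node up to its parent and back down to the sibling balance node: four relation instances and three joins for a single field. The table layouts and data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nodes (id INTEGER, type TEXT, label TEXT, value TEXT)")
conn.execute("CREATE TABLE child (child_id INTEGER, parent_id INTEGER)")
# Hand-shredded form of the hypothetical document
# <bank><account><account-number>A-101</account-number><balance>500</balance></account>
#       <account><account-number>A-102</account-number><balance>400</balance></account></bank>
conn.executemany("INSERT INTO nodes VALUES (?, ?, ?, ?)", [
    (0, "element", "bank", None),
    (1, "element", "account", None),
    (2, "element", "account-number", "A-101"),
    (3, "element", "balance", "500"),
    (4, "element", "account", None),
    (5, "element", "account-number", "A-102"),
    (6, "element", "balance", "400"),
])
conn.executemany("INSERT INTO child VALUES (?, ?)",
                 [(1, 0), (2, 1), (3, 1), (4, 0), (5, 4), (6, 4)])

# Balance of account A-102: up from account-number to its parent, down to balance.
QUERY = """
SELECT bal.value
FROM nodes AS num
JOIN child AS up ON up.child_id = num.id
JOIN child AS down ON down.parent_id = up.parent_id
JOIN nodes AS bal ON bal.id = down.child_id
WHERE num.label = 'account-number' AND num.value = ?
  AND bal.label = 'balance'
"""
result = conn.execute(QUERY, ("A-102",)).fetchall()
print(result)  # [('400',)]
```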
Mapping XML Data to Relations
• If DTD of document is known, can map data to relations
• A relation is created for each element type
• Subelements of type #PCDATA and XML attributes are mapped to attributes of the relations
• Benefits:
• Efficient storage
• Can translate XML queries into SQL, execute efficiently, and then
translate SQL results back to XML
• Drawbacks: need to know the DTD; translation overheads are still present
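The round trip described above (XML in, SQL query, XML out) can be sketched with `sqlite3` and `xml.etree.ElementTree`. Because the DTD fixes the subelements of account, a typed relation can be declared up front. The relation name, column names, sample data and the query "accounts with balance above 600" are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

BANK = """<bank>
<account><account-number>A-101</account-number><branch-name>Downtown</branch-name><balance>500</balance></account>
<account><account-number>A-215</account-number><branch-name>Mianus</branch-name><balance>700</balance></account>
</bank>"""

conn = sqlite3.connect(":memory:")
# Known DTD => typed relation with one column per #PCDATA subelement.
conn.execute("CREATE TABLE account (account_number TEXT PRIMARY KEY, "
             "branch_name TEXT, balance INTEGER)")
for acc in ET.fromstring(BANK).iter("account"):
    conn.execute("INSERT INTO account VALUES (?, ?, ?)",
                 (acc.findtext("account-number"), acc.findtext("branch-name"),
                  int(acc.findtext("balance"))))

# The XML query becomes SQL and runs on the relation ...
rows = conn.execute("SELECT account_number, branch_name, balance "
                    "FROM account WHERE balance > 600").fetchall()

# ... and the SQL result is translated back into XML.
result = ET.Element("result")
for number, branch, balance in rows:
    acc = ET.SubElement(result, "account")
    ET.SubElement(acc, "account-number").text = number
    ET.SubElement(acc, "branch-name").text = branch
    ET.SubElement(acc, "balance").text = str(balance)
print(ET.tostring(result, encoding="unicode"))
```

Both translation steps are plain application code here, which is exactly where the remaining overhead sits.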
Mapping XML Data to Relations (Cont.)
• Relation created for each element type contains
• An id attribute to store a unique id for each element
• A relation attribute corresponding to each element attribute
• A parent-id attribute to keep track of parent element
• As in the tree representation
• Position information (ith child) can be stored too
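A generic version of this mapping can be sketched as follows: one relation (a list of row dicts) per element type, each row carrying an id, the parent's id, the element's position among its parent's children, one column per XML attribute, and one column per #PCDATA subelement. The helper `map_to_relations`, the column names and the rule "no children and no attributes means #PCDATA" are assumptions of this sketch; a repeated #PCDATA subelement would overwrite the earlier value.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from itertools import count

def map_to_relations(root):
    """Build one relation (list of row dicts) per non-#PCDATA element type."""
    relations = defaultdict(list)
    ids = count()

    def visit(elem, parent_id, position):
        row = {"id": next(ids), "parent_id": parent_id, "position": position}
        row.update(elem.attrib)  # one relation attribute per element attribute
        relations[elem.tag].append(row)
        for pos, sub in enumerate(elem):
            if len(sub) == 0 and not sub.attrib:
                # Treated as #PCDATA: becomes a column of the parent's relation.
                row[sub.tag] = (sub.text or "").strip()
            else:
                visit(sub, row["id"], pos)

    visit(root, None, None)
    return dict(relations)

doc = ET.fromstring('<bank-2><account account-number="A-401" owners="C100 C102">'
                    '<branch-name>Downtown</branch-name><balance>500</balance>'
                    '</account></bank-2>')
for name, rows in map_to_relations(doc).items():
    print(name, rows)
```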