Notes on XML, XSD and XSLT

Russell Bateman
May 2018
last update:

(I have studied this a couple of times in years past, but never took notes before. Note also that I have filched some minor examples used here from W3Schools rather than invent them.)

XML

eXtensible Mark-up Language is designed to store and transport data independently of the architecture of whatever hosts it. It's also designed to be (more or less) human-readable as well as easily machine-readable.

XML header

The XML declaration is optional in XML 1.0, but mandatory in XML 1.1 documents, e.g.:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>

...where version is self-explanatory; encoding is optional, and a parser detects it automatically as long as the content is UTF-8 or UTF-16 (US-ASCII being a subset of UTF-8); and standalone indicates whether the document can be processed (correctly) without external markup declarations such as an external DTD.

XPath

XPath is for navigating XML documents (XSLT uses XPath).

XSL

eXtensible Stylesheet Language is a styling language for XML, a (very) little like CSS for HTML.

XSLT

XSLT stands for XSL Transformations. Using XSLT, you can transform XML into other formats, like HTML.
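
A minimal sketch of such a transformation, assuming a <note> document in no namespace like the note.xml shown in the DTD section of these notes. The HTML layout is my own invention; the match and select attributes hold XPath expressions:

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html" />

  <!-- "/note" is an XPath pattern matching the root <note> element. -->
  <xsl:template match="/note">
    <html>
      <body>
        <!-- Each select is an XPath expression relative to <note>. -->
        <h1><xsl:value-of select="heading" /></h1>
        <p>From <xsl:value-of select="from" /> to <xsl:value-of select="to" /></p>
        <p><xsl:value-of select="body" /></p>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
```

Note that these unprefixed XPath names only match elements in no namespace; a <note> in the "https://www.w3schools.com" namespace would need a prefix bound to that namespace in the stylesheet.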

XQuery

XQuery is for querying XML documents.

DTD

Document Type Definition defines the structure and legal elements and attributes of an XML document.

An external DTD definition, note.dtd:

<!ELEMENT note    ( to, from, heading, body )>
<!ELEMENT to      ( #PCDATA )>
<!ELEMENT from    ( #PCDATA )>
<!ELEMENT heading ( #PCDATA )>
<!ELEMENT body    ( #PCDATA )>

...consumed by an XML document, say note.xml:

<?xml version="1.0"?>
<!DOCTYPE note SYSTEM "note.dtd">
<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>
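
The same declarations can also be carried inside the document itself, as an internal DTD subset. A sketch of what note.xml would then look like, with no separate note.dtd file:

```xml
<?xml version="1.0"?>
<!-- Same declarations as note.dtd, but placed in the internal subset. -->
<!DOCTYPE note [
  <!ELEMENT note    ( to, from, heading, body )>
  <!ELEMENT to      ( #PCDATA )>
  <!ELEMENT from    ( #PCDATA )>
  <!ELEMENT heading ( #PCDATA )>
  <!ELEMENT body    ( #PCDATA )>
]>
<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>
```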

XSD

XML Schema Definition defines the building blocks of an XML document, including the legal elements and attributes, the number and even the order of child elements, the data types for elements and attributes, and their default or fixed values.

XSD is a more powerful alternative to DTD. Why does it exist?

* For example, with a DTD you cannot express, for an element <elevation>, that it can't contain a value over 29,029 feet. With an XSD you can.
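
A hedged sketch of how an XSD could state that limit, assuming the elevation is recorded as a whole number of feet (the surrounding schema element is there only to make the fragment complete):

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="elevation">
    <xs:simpleType>
      <xs:restriction base="xs:integer">
        <!-- Values above 29029 are rejected by a validating parser. -->
        <xs:maxInclusive value="29029" />
      </xs:restriction>
    </xs:simpleType>
  </xs:element>
</xs:schema>
```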

XSD is extensible: one can extend (i.e.: reuse) the work of another. You can reference multiple schemas in the same XML document.
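
As an illustration of referencing more than one schema, here is a sketch in which every name (the order and address namespaces, order.xsd, address.xsd and the elements) is hypothetical. The xsi:schemaLocation attribute takes whitespace-separated pairs, each a namespace name followed by the location of a schema for it:

```xml
<?xml version="1.0"?>
<order xmlns="http://example.org/order"
       xmlns:addr="http://example.org/address"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://example.org/order   order.xsd
                           http://example.org/address address.xsd">
  <item>Widget</item>
  <!-- This element comes from the second (address) schema. -->
  <addr:city>Provo</addr:city>
</order>
```

For this to validate, the order schema would of course have to allow an address element at that spot, for instance by importing the address namespace.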

A well-formed XML document conforms to XML syntax rules, but even then, a document can contain errors with respect to what it's used for when there's no XSD.
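
For instance, this variation on the note example is perfectly well-formed, yet neither note.dtd nor an XSD requiring the same sequence of children would accept it:

```xml
<?xml version="1.0"?>
<!-- Well-formed: every element is closed and properly nested. -->
<!-- Not valid: <heading> is missing and <from> comes before <to>. -->
<note>
  <from>Jani</from>
  <to>Tove</to>
  <body>Don't forget me this weekend!</body>
</note>
```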

The XSD replacing note.dtd for note.xml above could be (note.xsd):

<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="https://www.w3schools.com"
           xmlns="https://www.w3schools.com"
           elementFormDefault="qualified">
  <xs:element name="note">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="to"      type="xs:string" />
        <xs:element name="from"    type="xs:string" />
        <xs:element name="heading" type="xs:string" />
        <xs:element name="body"    type="xs:string" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>

...but it is not referenced from note.xml in the same way as the DTD. A DTD is referenced through a DOCTYPE declaration:

<?xml version="1.0"?>
<!DOCTYPE note SYSTEM "https://www.w3schools.com/xml/note.dtd">
<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>

...whereas for an XSD, the root element declares its namespace and gives an xsi:schemaLocation hint that pairs that namespace with the location of the schema:

<?xml version="1.0"?>

<note xmlns="https://www.w3schools.com"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="https://www.w3schools.com https://www.w3schools.com/xml/note.xsd">
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>

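The foregoing schema (note.xsd) declares five elements: note, to, from, heading and body. The last four are declared locally inside the complex type of note, so they may appear only inside a <note> element and, because of xs:sequence, only in that order.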

XSD and namespaces

<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://www.w3schools.com"
           xmlns="http://www.w3schools.com"
           elementFormDefault="qualified">
    ...
    ...
</xs:schema>

On line 2, the author of this XSD is stating that the elements and data types used in this schema come from the "http://www.w3.org/2001/XMLSchema" namespace and that, whenever used, these should be prefixed with xs.

On line 3, the author states that the elements defined by this schema, that is note, to, from, heading and body, belong to the "http://www.w3schools.com" namespace (the target namespace). On line 4, that same namespace is declared as the default namespace, so names that come from it are not going to be prefixed. Indeed, any unprefixed element name belongs to that namespace.

Unprefixed attributes always belong to no namespace at all.
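
A small sketch of the difference, using a made-up namespace and made-up names (doc, item, id):

```xml
<doc xmlns="http://example.org/ns" xmlns:p="http://example.org/ns">
  <!-- <item> is unprefixed, so it belongs to the default namespace. -->
  <!-- id is unprefixed, so it belongs to no namespace at all. -->
  <!-- p:id is prefixed, so it belongs to http://example.org/ns. -->
  <item id="1" p:id="2" />
</doc>
```

The two attributes don't clash because their namespace-qualified names differ.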

Unless you complicate the XML document with additional instances of the xmlns attribute on elements in it, you have only one default namespace throughout the entire document. In the following contrived example, the <text> elements come from two different namespaces even though neither carries a prefix:

<element xmlns="ns1">
  <text>This is ns1</text>
  <child xmlns="ns2">
    <text>This is ns2</text>
  </child>
</element>

Without default-namespace declarations, the same names would have to be written with explicit prefixes instead (declared here once on a wrapping <root> element), like this:

<root xmlns:a="ns1" xmlns:b="ns2">
  <a:element>
    <a:text>This is ns1</a:text>
    <b:child>
      <b:text>This is ns2</b:text>
    </b:child>
  </a:element>
</root>

In this sample XSD, ...

<abc:schema xmlns:abc="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.w3schools.com"
            xmlns:xyz="http://www.w3schools.com"
            elementFormDefault="qualified">
    <abc:complexType name="myType"> ... </abc:complexType>
    <abc:element name="myElementOne" type="xyz:myType" />
    <abc:element name="myElementTwo" type="abc:string" />
</abc:schema>

...note that:

  1. First, this XSD is itself an XML document, and <abc:schema ...> declares the namespaces and prefixes the XSD uses (before it goes on to define a schema).
  2. <schema>, <complexType> and <element> (plus the string type) all belong to the "http://www.w3.org/2001/XMLSchema" namespace. Because the attribute xmlns:abc="http://www.w3.org/2001/XMLSchema" binds that namespace to a prefix and there is no default namespace, they must be prefixed with abc when used.
  3. No element or attribute used in the schema document itself belongs to the "http://www.w3schools.com" namespace. This is, in fact, the very schema that defines myType, <myElementOne> and <myElementTwo> for that namespace.
  4. Complex type myType is part of the "http://www.w3schools.com" namespace because a) it's defined in this schema and b) the targetNamespace of this schema is "http://www.w3schools.com".
  5. Because that namespace is mapped to prefix xyz, this prefix must be used to refer to it inside the XSD, as in type="xyz:myType".
  6. <schema>
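
A minimal sketch of an instance document for the sample XSD above. The prefix p is arbitrary and my own choice; what has to match is the namespace name, not the xyz prefix used inside the schema:

```xml
<?xml version="1.0"?>
<!-- myElementTwo is declared with type abc:string, so any text will do. -->
<p:myElementTwo xmlns:p="http://www.w3schools.com">some text</p:myElementTwo>
```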

Details of XSD

(See note.xsd above.)

  1. There's always a <schema> element and it's the root of every XSD.
  2. This element usually contains some attributes (as shown).
  3. The xmlns:xs attribute stipulates that elements and data types used in the schema that come from the "http://www.w3.org/2001/XMLSchema" namespace are prefixed with xs.
  4. The targetNamespace attribute reveals that the elements described in the (note.xsd) schema will come from the "https://www.w3schools.com" namespace.
  5. Finally, the elementFormDefault attribute, when set to "qualified", introduces the requirement that locally declared elements used in an XML document consuming note.xsd be namespace-qualified. Consider this sample fragment of an XSD file:
    <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
                targetNamespace="http://example.org/publishing"
                xmlns:tns="http://example.org/publishing"
                elementFormDefault="qualified">
      <xsd:complexType name="AuthorType">
        <!-- compositor goes here -->
        <xsd:sequence>
           <xsd:element name="name" type="xsd:string" />
           <xsd:element name="phone" type="tns:Phone" />
        </xsd:sequence>
        <xsd:attribute name="id" type="tns:AuthorId" />
      </xsd:complexType>
      <xsd:element name="author" type="tns:AuthorType" />
    </xsd:schema>
    
    This is a directive to any XML instance that claims to conform to this schema: it says whether the locally declared elements (here, name and phone) must be namespace-qualified. Globally declared elements such as author are always qualified. For instance, if you specify elementFormDefault="unqualified", then this XML fragment in a document validated against the XSD above is legal:
    <x:author xmlns:x="http://example.org/publishing">
       <name>Aaron Skonnard</name>
       <phone>(801)390-4552</phone>
    </x:author>
    
    ...but, if elementFormDefault="qualified" is specified, then you must see this instead:
    <x:author xmlns:x="http://example.org/publishing">
       <x:name>Aaron Skonnard</x:name>
       <x:phone>(801)390-4552</x:phone>
    </x:author>
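
Prefixes aren't the only way to satisfy "qualified". As a sketch, assuming the same "http://example.org/publishing" namespace as the instance above, a default namespace declaration qualifies every element without a single prefix:

```xml
<!-- All three elements are in http://example.org/publishing. -->
<author xmlns="http://example.org/publishing">
  <name>Aaron Skonnard</name>
  <phone>(801)390-4552</phone>
</author>
```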
    

Value restrictions

Restrictions on the values of XML elements and attributes are called facets. Here, the value of <age> can never be lower than 0 nor higher than 120:

<xs:element name="age">
  <xs:simpleType>
    <xs:restriction base="xs:integer">
      <xs:minInclusive value="0" />
      <xs:maxInclusive value="120" />
    </xs:restriction>
  </xs:simpleType>
</xs:element>
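
The same restriction can also be given a name so that more than one element reuses it. A hedged sketch, in which AgeType and <retiredAt> are hypothetical names of my own and the schema has no target namespace, for simplicity:

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- A named simple type carrying the two facets. -->
  <xs:simpleType name="AgeType">
    <xs:restriction base="xs:integer">
      <xs:minInclusive value="0" />
      <xs:maxInclusive value="120" />
    </xs:restriction>
  </xs:simpleType>

  <!-- Both elements accept only integers from 0 through 120. -->
  <xs:element name="age"       type="AgeType" />
  <xs:element name="retiredAt" type="AgeType" />
</xs:schema>
```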
