XML Schemas
The value class implements a rich in-memory representation of structured object data, with a tree-like structure and a secondary attribute structure. It is specifically designed to map easily to XML-like languages, and with the xmlschema class you can read and write your internal structure in a wide selection of common XML encodings.
Native encoding
First, let's look at a data structure as it comes out of the savexml method without a schema argument. Here's the code:
{
    value obj = $("UserName", "jsmith") ->
                $("UserID", 1001) ->
                $("Password",
                    $attr("crypt", "plain") ->
                    $val("s3cret")
                ) ->
                $("FirstName", "John") ->
                $("LastName", "Smith") ->
                $("AddressData",
                    $("StreetName", "145th Street") ->
                    $("HouseNumber", "10") ->
                    $("ZipCode", "90157") ->
                    $("City", "Smackville") ->
                    $("State", "Maine")
                ) ->
                $("GroupMemberships",
                    $("GroupOne") ->
                    $("GroupTwo")
                );
}
The GraceXML output looks like this:
<?xml version="1.0"?>
<dict>
<string id="UserName">jsmith</string>
<unsigned id="UserID">1001</unsigned>
<string id="Password" crypt="plain">s3cret</string>
<string id="FirstName">John</string>
<string id="LastName">Smith</string>
<dict id="AddressData">
<string id="StreetName">145th Street</string>
<string id="HouseNumber">10</string>
<string id="ZipCode">90157</string>
<string id="City">Smackville</string>
<string id="State">Maine</string>
</dict>
<array id="GroupMemberships">
<string>GroupOne</string>
<string>GroupTwo</string>
</array>
</dict>
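To make the native convention concrete, here is a minimal C++ sketch, not grace's own serializer, that writes a flat dict so that the element name carries the type and the id attribute carries the key, matching the output above. The Entry struct and the nativeDict function are hypothetical names; nested containers are not handled and XML escaping is left out.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical sketch, not the grace API: one flat entry with its key,
// a grace-style type name and its text value.
struct Entry {
    std::string key;
    std::string type;
    std::string text;
};

// Writes a <dict> in the native style: the element name carries the type,
// the id attribute carries the key. XML escaping is omitted for brevity.
std::string nativeDict(const std::vector<Entry>& entries) {
    std::string out = "<dict>\n";
    for (const Entry& e : entries) {
        out += "<" + e.type + " id=\"" + e.key + "\">" + e.text + "</" + e.type + ">\n";
    }
    out += "</dict>\n";
    return out;
}
```

Called with the UserName and UserID entries, it produces the same two leaf lines as the native output shown above.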
Of course, we do not live in an ideal world where everybody agrees that this is the best and only way to serialize data to XML. In fact, a lot of people out there appear to be actively set on thwarting any reasonable attempt to achieve both easy object serialization and easy interchange with other people's formats through the same mechanism. Most of the time, developers have to set out on archaeological expeditions, wading through a tree of objects with all kinds of filters and enumerators. The grace XML schema system performs this seemingly impossible task in a very diverse range of situations. The schema files are not just about defining the format of the XML data; their most important aspect is defining the actual function of the tags involved in a particular format.
Applying an XML Schema
An xmlschema object can be passed as an extra parameter to the XML-related import and export methods of the value class:
#include <grace/xmlschema.h>

int myApp::main (void)
{
    xmlschema S;
    value storpels;
    string err;

    // Load the schema definition first.
    if (! S.load ("schema:storpel.schema.xml")) return 1;

    // Parse the XML file through the schema; err receives any parser error.
    if (! storpels.loadxml ("etc:storpels.xml", S, err))
    {
        ferr.writeln ("XML Error: %s" %format (err));
        return 1;
    }
    return 0;
}
The error string argument is an optional parameter you can supply if you want to log or print XML parser errors.
Using the Tag Type as a Key
To see how this works out, let's look at some other ways of expressing data, starting with one that's easily done:
<?xml version="1.0"?>
<articleOrder>
<articleCode>WNN-8254</articleCode>
<orderQuantity>1</orderQuantity>
</articleOrder>
This is probably the most common (and most sensible) way of expressing data. The tag type is used as a de-facto key for child elements. The schema for this record format would look like this:
<?xml version="1.0"?>
<xml.schema>
<xml.class name="articleOrder">
<xml.type>dict</xml.type>
<xml.proplist>
<xml.member class="articleCode" id="articleCode"/>
<xml.member class="orderQuantity" id="orderQuantity"/>
</xml.proplist>
</xml.class>
<xml.class name="articleCode"><xml.type>string</xml.type></xml.class>
<xml.class name="orderQuantity"><xml.type>integer</xml.type></xml.class>
</xml.schema>
The schema defines three XML classes. The first one, <articleOrder>, is defined to be a key/value container (or dict). It can contain child objects of two classes, each with a defined relationship between its tag type (class) and its id within the context of the parent container. The two child classes have their own definitions, declaring them to be leaf objects holding string and integer data respectively.
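As a rough model of what this schema tells the parser, the following C++ sketch (standard library only, not the xmlschema implementation) looks up each child tag in a proplist-like table that supplies the key and the leaf type. Element, Member and mapDict are hypothetical names, and folding a member's id and its class type into one Member entry simplifies the two-step lookup the schema describes.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <variant>
#include <vector>

// Hypothetical sketch, not the grace API: a flattened child element.
struct Element {
    std::string tag;
    std::string text;
};

// Simplified <xml.member> plus the type of its class.
struct Member {
    std::string id;
    std::string type;  // only "string" and "integer" are modelled here
};

using Leaf = std::variant<std::string, long>;

// The child's tag selects the member; the member supplies key and type.
// Tags that the proplist does not mention are rejected.
std::map<std::string, Leaf> mapDict(const std::vector<Element>& children,
                                    const std::map<std::string, Member>& proplist) {
    std::map<std::string, Leaf> out;
    for (const Element& child : children) {
        auto it = proplist.find(child.tag);
        if (it == proplist.end())
            throw std::runtime_error("unexpected tag: " + child.tag);
        const Member& m = it->second;
        if (m.type == "integer")
            out[m.id] = std::stol(child.text);
        else
            out[m.id] = child.text;
    }
    return out;
}
```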
This model of strict enumeration is recommended for schemas that are expected to become very complex and those that need the ability to contain unkeyed arrays. You can also tell xmlschema to automatically make the tag name the explicit key by indicating this in the options section:
<?xml version="1.0"?>
<xml.schema>
<xml.schema.options>
<xml.option.defaulttagkey>true</xml.option.defaulttagkey>
</xml.schema.options>
<xml.class name="threadcount">
<xml.type>integer</xml.type>
</xml.class>
</xml.schema>
In this scenario, any tag that is not explicitly listed with an id in the <xml.proplist> of its parent uses its tag name as its key. Such tags are assumed to be string values unless the class is explicitly declared in an <xml.class> with a type, like the threadcount example above.
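The defaulttagkey fallback fits in a few lines of C++. This is a hedged model of the rule just described, not grace code; defaultKeyAndType is a hypothetical helper that receives a table of the class types declared in the schema.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical sketch, not the grace API. For a tag the parent's proplist
// does not mention: the key is the tag name itself, and the type comes from
// a declared <xml.class> if there is one, otherwise it defaults to string.
std::pair<std::string, std::string> defaultKeyAndType(
        const std::string& tag,
        const std::map<std::string, std::string>& declaredTypes) {
    auto it = declaredTypes.find(tag);
    std::string type = (it != declaredTypes.end()) ? it->second : "string";
    return {tag, type};
}
```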
Mixing Tag and Attribute Keys
Let's look at a variation on the previous schema for some other definitions.
<?xml version="1.0"?>
<shoppingBasket>
<article>
<articleCode>AF-28</articleCode>
<articleQuantity>2</articleQuantity>
<articleCost>
<cost class="product" currency="EUR">10.00</cost>
<cost class="shipping" currency="EUR">5.00</cost>
</articleCost>
</article>
<article>
<articleCode>BX-15</articleCode>
<articleQuantity>1</articleQuantity>
<articleCost>
<cost class="product" currency="EUR">25.00</cost>
</articleCost>
</article>
</shoppingBasket>
The first difference from the earlier layout is the way the root node works: it contains a list of one or more objects of the same class (<article>, in this instance). Another difference is the <cost> object, which uses its class attribute as its primary key. The XML schema for this layout would look like this:
<?xml version="1.0"?>
<xml.schema>
<!-- the root class, an array of article objects -->
<xml.class name="shoppingBasket">
<xml.type>array</xml.type>
<xml.proplist>
<xml.member class="article"/>
</xml.proplist>
</xml.class>
<xml.class name="article">
<xml.type>dict</xml.type>
<xml.proplist>
<xml.member class="articleCode" id="articleCode"/>
<xml.member class="articleQuantity" id="articleQuantity"/>
<xml.member class="articleCost" id="articleCost"/>
</xml.proplist>
</xml.class>
<xml.class name="articleCode"><xml.type>string</xml.type></xml.class>
<xml.class name="articleQuantity"><xml.type>string</xml.type></xml.class>
<xml.class name="articleCost">
<xml.type>dict</xml.type>
<xml.proplist>
<xml.member class="cost"/>
</xml.proplist>
</xml.class>
<xml.class name="cost">
<xml.type>float</xml.type>
<xml.attributes>
<xml.attribute label="class" mandatory="true" isindex="true">
<xml.type>string</xml.type>
</xml.attribute>
<xml.attribute label="currency" mandatory="true">
<xml.type>string</xml.type>
</xml.attribute>
</xml.attributes>
</xml.class>
</xml.schema>
There are two collection classes defined here whose child nodes have no implied id for their class. In the case of the <shoppingBasket> root node, this is because it contains a numbered array. In the case of the <articleCost> object, the indexing is achieved through the class attribute of the <cost> object.
The <xml.attributes> definition enumerates any attributes a particular class has and how they should be handled. The mandatory attribute determines whether the attribute is expected. The isindex attribute is a boolean indicating whether a particular attribute should be considered the object's index key. There should be at most one attribute flagged with isindex.
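Here is a minimal C++ sketch of the isindex and mandatory rules applied to the <cost> nodes of the shopping basket. It only models the behaviour described above; Element, Cost and mapCosts are hypothetical names, and the hard-coded attribute labels stand in for what the schema would supply.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical sketch, not the grace API: e.g. <cost class="product" currency="EUR">10.00</cost>.
struct Element {
    std::map<std::string, std::string> attrs;
    std::string text;
};

// Result node: the float value plus the remaining (non-index) attribute.
struct Cost {
    double amount;
    std::string currency;
};

// The isindex attribute ("class") becomes the dict key, both mandatory
// attributes must be present, and the node text is parsed as a float.
std::map<std::string, Cost> mapCosts(const std::vector<Element>& costs) {
    std::map<std::string, Cost> out;
    for (const Element& e : costs) {
        auto key = e.attrs.find("class");
        auto cur = e.attrs.find("currency");
        if (key == e.attrs.end() || cur == e.attrs.end())
            throw std::runtime_error("missing mandatory attribute");
        out[key->second] = Cost{std::stod(e.text), cur->second};
    }
    return out;
}
```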
Wrapping the Value
Let's get more complicated. Sometimes people get carried away when designing XML formats, or they're using a limited form of XML parsing that leads to weird constructs that do not map one-to-one onto a tree of value objects, at least not in a way that makes them easy to access or manipulate. Here's a simple example:
<?xml version="1.0"?>
<settings>
<localCurrency><string>EUR</string></localCurrency>
<localLanguage><string>Dutch</string></localLanguage>
</settings>
A straightforward mapping of this would cause problems because of the mostly redundant <string> objects around each object's true value. There can be legitimate reasons for this layout, for instance when the actual type of the data cannot be predefined. The XML schema language handles the case where a miniature hierarchy of XML nodes represents a single data node through the concept of wrapping containers. Let's take a look at a workable schema for this:
<?xml version="1.0"?>
<xml.schema>
<xml.class name="settings">
<xml.type>dict</xml.type>
<xml.proplist>
<xml.member class="localCurrency" id="localCurrency"/>
<xml.member class="localLanguage" id="localLanguage"/>
</xml.proplist>
</xml.class>
<xml.class name="localCurrency">
<xml.type>container</xml.type>
<xml.container>
<xml.container.types>
<xml.container.type id="string">string</xml.container.type>
</xml.container.types>
</xml.container>
</xml.class>
<xml.class name="localLanguage">
<xml.type>container</xml.type>
<xml.container>
<xml.container.types>
<xml.container.type id="string">string</xml.container.type>
</xml.container.types>
</xml.container>
</xml.class>
<xml.class name="string" wrap="true" contained="true">
<xml.type>string</xml.type>
</xml.class>
</xml.schema>
The schema defines the <localCurrency> and <localLanguage> classes to be containers. What this means is that not all details of the resulting data object are defined by the tag. In this instance, we are only concerned with wrapping the data's type into a child node. The definitions inside <xml.container.types> offer a mapping between datatypes and direct child nodes. In this example, only string data is defined.
The definition of the <string> class shows two extra attributes, wrap and contained, that tell the parser that tags of this type require special treatment with regard to their parent nodes.
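To illustrate what wrapping achieves, here is a hedged C++ sketch that collapses a container and its single typed child into one value. Its type table maps child tags to data types, i.e. the reverse direction of the <xml.container.type> entries; Node, Value and unwrap are hypothetical names, not part of grace.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical sketch, not the grace API: a minimal element tree node.
struct Node {
    std::string tag;
    std::string text;
    std::vector<Node> children;
};

// The single resulting data node: its type and its text value.
struct Value {
    std::string type;
    std::string text;
};

// <localCurrency><string>EUR</string></localCurrency> becomes one value of
// type "string" holding "EUR"; the wrapped child disappears.
Value unwrap(const Node& container, const std::map<std::string, std::string>& typeForTag) {
    if (container.children.size() != 1)
        throw std::runtime_error("expected exactly one wrapped child");
    const Node& inner = container.children.front();
    auto it = typeForTag.find(inner.tag);
    if (it == typeForTag.end())
        throw std::runtime_error("no container type for tag: " + inner.tag);
    return Value{it->second, inner.text};
}
```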
Container Arrays
Let's expand on containers a bit more and look at a subset of Apple's plist format:
<?xml version="1.0"?>
<dict>
<key>firstName</key>
<string>John</string>
<key>lastName</key>
<string>Doe</string>
</dict>
It's not really the kind of format you want to offer for other people to parse, but sometimes there's no choosing whom to talk to. The XML schema for this layout looks like this:
<?xml version="1.0"?>
<xml.schema>
<xml.class name="dict">
<xml.type>container</xml.type>
<xml.container>
<xml.container.idclass>key</xml.container.idclass>
<xml.container.types>
<xml.container.type id="string">string</xml.container.type>
</xml.container.types>
</xml.container>
</xml.class>
<xml.class name="string" contained="true">
<xml.type>string</xml.type>
</xml.class>
</xml.schema>
The <xml.container.idclass> indication is intended for containers that are key/value arrays. It defines a key object that precedes a value object for every child data node. The lack of a wrap attribute in the definition of the <string> class means that the data node ends up as a child of the container, not as an integral part of it.
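The key/value pairing that <xml.container.idclass> describes can be sketched as follows. This C++ model is an assumption, not the library's parser: it keeps only the text of each value node and discards its type, and readKeyed is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical sketch, not the grace API: a flat child of a plist <dict>.
struct Node {
    std::string tag;
    std::string text;
};

// Every value node is preceded by a node of the id class ("key" here);
// the key's text becomes the id of the node that follows it.
std::map<std::string, std::string> readKeyed(const std::vector<Node>& children,
                                             const std::string& idClass = "key") {
    std::map<std::string, std::string> out;
    for (std::size_t i = 0; i < children.size(); i += 2) {
        if (children[i].tag != idClass || i + 1 >= children.size())
            throw std::runtime_error("expected a key followed by a value");
        out[children[i].text] = children[i + 1].text;
    }
    return out;
}
```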
Advanced Wrapping
It's time to take wrapping to its logical extreme with a subset of the patently ridiculous XML-RPC layout:
<?xml version="1.0"?>
<params>
<param>
<name>firstName</name>
<value>
<string>John</string>
</value>
</param><param>
<name>lastName</name>
<value>
<string>Doe</string>
</value>
</param>
</params>
This layout has two useless fluff classes beyond Apple's format. The <param> object envelopes each data node that is a child of the container, and an extra <value> object envelopes the actual contained value. When confronted with such a layout, you have to wonder what kinds of drugs were in fashion among the working group that came up with it. Luckily, xmlschema can still cut through this. The schema for the above circus ends up like this:
<?xml version="1.0"?>
<xml.schema>
<xml.class name="params">
<xml.type>container</xml.type>
<xml.container>
<xml.container.wrapclass>param</xml.container.wrapclass>
<xml.container.valueclass>value</xml.container.valueclass>
<xml.container.idclass>name</xml.container.idclass>
<xml.container.types>
<xml.container.type id="string">string</xml.container.type>
</xml.container.types>
</xml.container>
</xml.class>
<xml.class name="string"><xml.type>string</xml.type></xml.class>
</xml.schema>
That's about as complicated as you can get with wrapping, although there are still some tricks that can be useful.
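The combined effect of wrapclass, valueclass and idclass can be modelled with a short C++ sketch. The class names param, value and name are hard-coded here for brevity, whereas the real schema supplies them; Node, child and flattenParams are hypothetical and do not reflect grace's implementation.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical sketch, not the grace API: a minimal element tree node.
struct Node {
    std::string tag;
    std::string text;
    std::vector<Node> children;
};

// Returns the first child with the given tag, or throws.
const Node& child(const Node& n, const std::string& tag) {
    for (const Node& c : n.children)
        if (c.tag == tag) return c;
    throw std::runtime_error("missing <" + tag + "> in <" + n.tag + ">");
}

// Each wrapclass node (<param>) yields one entry: its idclass child (<name>)
// gives the key, and the single typed node inside its valueclass child
// (<value>) gives the data.
std::map<std::string, std::string> flattenParams(const Node& params) {
    std::map<std::string, std::string> out;
    for (const Node& param : params.children) {
        if (param.tag != "param") continue;
        const Node& value = child(param, "value");
        if (value.children.size() != 1)
            throw std::runtime_error("<value> must wrap exactly one typed node");
        out[child(param, "name").text] = value.children.front().text;
    }
    return out;
}
```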
Making Things Interesting
Let's cook up a likely scenario where we want some aggressive wrapping to keep access paths short, while lots of stupid factors get in our way:
<?xml version="1.0"?>
<UeberXML>
<retailPrice source="catalog">
<currency name="EUR" amount="10.25"/>
</retailPrice>
<wholesalePrice source="vendor">
<currency name="USD" amount="7.45"/>
</wholesalePrice>
<remoteDescription href="http://vendor.com/arts/1857243"/>
<productTag>cool</productTag>
<productTag>fresh</productTag>
<productTag>hip</productTag>
</UeberXML>
The <currency> node has the characteristics of a wrapped node, but the source attribute of the <retailPrice> and <wholesalePrice> tags needs to be preserved as well. We want the amount attribute of the <currency> object to be the object's value. The <productTag> nodes should really be in their own array. There's also something special about how we'll handle the <remoteDescription> node.
Here's the schema taking care of things:
<?xml version="1.0"?>
<xml.schema>
<xml.class name="UeberXML">
<xml.type>dict</xml.type>
<xml.proplist>
<xml.member class="retailPrice" id="retailPrice"/>
<xml.member class="wholesalePrice" id="wholesalePrice"/>
<xml.member class="myunion" id="description"/>
<xml.member class="productTag" id="tags"/>
</xml.proplist>
</xml.class>
<xml.class name="retailPrice">
<xml.type>container</xml.type>
<xml.attributes>
<xml.attribute label="source"><xml.type>string</xml.type></xml.attribute>
</xml.attributes>
<xml.container>
<xml.container.types>
<xml.container.type id="string">currency</xml.container.type>
<xml.container.type id="float">currency</xml.container.type>
<xml.container.type id="integer">currency</xml.container.type>
</xml.container.types>
</xml.container>
</xml.class>
<xml.class name="wholesalePrice">
<xml.type>container</xml.type>
<xml.attributes>
<xml.attribute label="source"><xml.type>string</xml.type></xml.attribute>
</xml.attributes>
<xml.container>
<xml.container.types>
<xml.container.type id="string">currency</xml.container.type>
<xml.container.type id="float">currency</xml.container.type>
<xml.container.type id="integer">currency</xml.container.type>
</xml.container.types>
</xml.container>
</xml.class>
<xml.class name="currency" wrap="true" contained="true" attribvalue="amount">
<xml.type>string</xml.type>
<xml.attributes>
<xml.attribute label="name"><xml.type>string</xml.type></xml.attribute>
<xml.attribute label="amount"><xml.type>string</xml.type></xml.attribute>
</xml.attributes>
</xml.class>
<xml.class name="myunion">
<xml.type>union</xml.type>
<xml.union>
<xml.union.match class="remoteDescription" type="attribexists" label="href"/>
<xml.union.match class="localDescription" type="default"/>
</xml.union>
</xml.class>
<xml.class name="remoteDescription" union="myunion">
<xml.type>string</xml.type>
<xml.attributes>
<xml.attribute label="href" mandatory="true">
<xml.type>string</xml.type>
</xml.attribute>
</xml.attributes>
</xml.class>
<xml.class name="localDescription" union="myunion">
<xml.type>string</xml.type>
</xml.class>
<xml.class name="productTag" array="true">
<xml.type>string</xml.type>
</xml.class>
</xml.schema>
First, note that by defining an attribute on the container class instead of on the wrapped value class, the attributes are automatically combined into the single data object. In the class definition of <currency>, the attribvalue attribute indicates that the data value should be pulled out of the attribute labelled "amount".
The class definition labelled "myunion" represents an abstract union class that can morph between tag types depending on some defined conditions. In this example, a node marked as a myunion object will be transcribed as a <remoteDescription> object if it has an href attribute; otherwise the default <localDescription> object is used. Finally, the array attribute in the definition of the <productTag> class tells the schema system that objects of this class should be treated as child nodes of an implicit array. Converting the example layout with this schema to grace native XML gives the following:
<?xml version="1.0"?>
<UeberXML>
<string id="retailPrice" source="catalog" name="EUR">10.25</string>
<string id="wholesalePrice" source="vendor" name="USD">7.45</string>
<string id="description" href="http://vendor.com/arts/1857243"/>
<array id="tags">
<string>cool</string>
<string>fresh</string>
<string>hip</string>
</array>
</UeberXML>
If you're wondering who would come up with such a convoluted format, this might be a good time to disclose that the schema and data are used by one of the test scripts.
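The three new mechanisms in this example (union matching, attribvalue and implicit arrays) are each small enough to sketch in isolation. The following C++ is a hypothetical model of the rules described above, not the xmlschema code; the function names are made up, and only the attribexists and default match types used here are modelled.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical sketch, not the grace API: an element with attributes and text.
struct Node {
    std::string tag;
    std::map<std::string, std::string> attrs;
    std::string text;
};

// "myunion": remoteDescription if an href attribute exists (attribexists),
// otherwise the default localDescription.
std::string resolveUnion(const Node& n) {
    return n.attrs.count("href") ? "remoteDescription" : "localDescription";
}

// array="true": all siblings of the class are gathered, in document order,
// into one implicit array.
std::vector<std::string> collectArray(const std::vector<Node>& siblings, const std::string& cls) {
    std::vector<std::string> out;
    for (const Node& n : siblings)
        if (n.tag == cls) out.push_back(n.text);
    return out;
}

// attribvalue="amount": the value comes from an attribute instead of the text.
std::string attribValue(const Node& n, const std::string& label) {
    auto it = n.attrs.find(label);
    return it != n.attrs.end() ? it->second : std::string();
}
```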
Example: Push Access Protocol
Now for a real world example, the WAPForum-specified PAP format for submitting Push messages to a PPG system. These messages look like this:
<?xml version="1.0"?>
<!DOCTYPE pap PUBLIC "-//WAPFORUM//DTD PAP 1.0//EN" "http://www.wapforum.org/DTD/pap_1.0.dtd">
<pap>
<push-message push-id="1234_@_thozie_._de">
<address address-value="WAPPUSH=127.0.0.1/TYPE=USER@127.0.0.1" />
</push-message>
</pap>
The only thing really new here is the document type declaration. Some protocols require it to be present; in such cases the schema can define it.
A schema for the PAP layout:
<?xml version="1.0"?>
<xml.schema>
<xml.schema.options>
<xml.option.doctype name="pap" status="PUBLIC" dtd="http://www.wapforum.org/DTD/pap_1.0.dtd" >-//WAPFORUM//DTD PAP 1.0//EN</xml.option.doctype>
</xml.schema.options>
<xml.class name="pap">
<xml.type>dict</xml.type>
<xml.proplist>
<xml.member class="push-message"/>
</xml.proplist>
</xml.class>
<xml.class name="push-message">
<xml.type>array</xml.type>
<xml.attributes>
<xml.attribute label="push-id" mandatory="true" isindex="true">
<xml.type>string</xml.type>
</xml.attribute>
</xml.attributes>
<xml.proplist>
<xml.member class="address"/>
</xml.proplist>
</xml.class>
<xml.class name="address" attribvalue="address-value">
<xml.type>string</xml.type>
<xml.attributes>
<xml.attribute label="address-value">
<xml.type>string</xml.type>
</xml.attribute>
</xml.attributes>
</xml.class>
</xml.schema>
Inside the <xml.schema.options> section, the <xml.option.doctype> node defines the doctype that should be written: the root element name goes in the name attribute, the DTD public status in the status attribute, the URL for the DTD in the dtd attribute, and the actual identification string as the node's data.
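The doctype line the schema asks for is assembled from four pieces of data. A minimal C++ sketch, with doctypeLine as a hypothetical helper rather than anything grace provides:

```cpp
#include <cassert>
#include <string>

// Hypothetical sketch, not the grace API: composes a DOCTYPE declaration
// from the root element name, the status keyword, the public identifier
// and the DTD URL.
std::string doctypeLine(const std::string& name, const std::string& status,
                        const std::string& publicId, const std::string& dtdUrl) {
    return "<!DOCTYPE " + name + " " + status + " \"" + publicId + "\" \"" + dtdUrl + "\">";
}
```

Fed the values from the PAP schema, it reproduces the declaration at the top of the example message.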
Boolean Values as Objects
The final trick is booleans whose value is implied by a tag. It's an Apple classic; observe:
<?xml version="1.0"?>
<Sosumi>
<firstOption><true/></firstOption>
<secondOption><false/></secondOption>
</Sosumi>
Lovely, isn't it? Using the container system, we can still work our way out of it by defining a special boolean type in the schema. Here's a quick implementation of the above layout as a schema:
<?xml version="1.0"?>
<xml.schema>
<xml.class name="Sosumi">
<xml.type>dict</xml.type>
<xml.proplist>
<xml.member class="firstOption" id="firstOption"/>
<xml.member class="secondOption" id="secondOption"/>
</xml.proplist>
</xml.class>
<xml.class name="firstOption">
<xml.type>container</xml.type>
<xml.container>
<xml.container.types>
<xml.container.type id="bool.true">true</xml.container.type>
<xml.container.type id="bool.false">false</xml.container.type>
</xml.container.types>
</xml.container>
</xml.class>
<xml.class name="secondOption">
<xml.type>container</xml.type>
<xml.container>
<xml.container.types>
<xml.container.type id="bool.true">true</xml.container.type>
<xml.container.type id="bool.false">false</xml.container.type>
</xml.container.types>
</xml.container>
</xml.class>
<xml.class name="bool.true" contained="true" wrap="true">
<xml.type>bool.true</xml.type>
</xml.class>
<xml.class name="bool.false" contained="true" wrap="true">
<xml.type>bool.false</xml.type>
</xml.class>
</xml.schema>
This trick is, obviously, only available for booleans. The bool.true and bool.false classes define <true/> and <false/> objects that indicate both the boolean type and its value.
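A small C++ sketch of the rule: the tag of the single wrapped child determines both the type and the value. readBool is a hypothetical helper, and returning an empty std::optional for anything other than <true/> or <false/> is a design choice of this sketch, not documented grace behaviour.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical sketch, not the grace API: only the tag matters here.
struct Node {
    std::string tag;
};

// <firstOption><true/></firstOption> yields true, <false/> yields false;
// any other content yields an empty optional.
std::optional<bool> readBool(const std::vector<Node>& children) {
    if (children.size() != 1) return std::nullopt;
    if (children.front().tag == "true") return true;
    if (children.front().tag == "false") return false;
    return std::nullopt;
}
```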