Sunday 10 April 2011

Document Engineering: Test Assertions Example

I was asked to clarify my previous blog posting from yesterday with some examples, so I'll give it a go.

A simple example is an OASIS Universal Business Language (UBL) invoice's so-called 'calculation model' (an example of rules showing how calculations in the invoice are to be made). The invoice has all kinds of totals and amounts so the way you calculate totals from the amounts is important to get right.

In TAML (Test Assertions Markup Language) I can write the rules as test assertions but using XPath so they can be executed against the XML invoice:
...
        <taml:testAssertion id="IN1" name="Invoice" enable="true">
                <taml:normativeSource>U2ICMDraft5Rule1:

"To be a conforming UBL 2 invoice the document MUST be valid according to a standard UBL 2 Invoice schema."</taml:normativeSource>
                <taml:target type="document" idscheme="'document'">/</taml:target>
                <taml:predicate>count(//in:Invoice) ge 1</taml:predicate>
                <taml:prescription level="mandatory"/>
                <taml:report label="failed" message="Not a standard UBL 2 invoice">The file does not contain a standard UBL 2
invoice.</taml:report>
        </taml:testAssertion>

        <taml:testAssertion id="INTOT1" name="LineExtensionAmount (1)" enable="true">
                <taml:normativeSource>U2ICMDraft5Rule2:

"The 'LineExtensionAmount' in the invoice 'LegalMonetaryTotal' SHOULD equal the sum of all 'LineExtensionAmount's in all of the invoice lines."
                </taml:normativeSource>
                <taml:target type="total"
idscheme="'invoice-total'">
/in:Invoice/cac:LegalMonetaryTotal</taml:target>
                <taml:prerequisite>(count(distinct-values(
//*/@currencyID)) eq
1)

                </taml:prerequisite>
                <taml:predicate>

number(./cbc:LineExtensionAmount) eq
sum(/in:Invoice/cac:InvoiceLine/cbc:LineExtensionAmount)

                </taml:predicate>
                <taml:prescription level="preferred"/>
                <taml:report label="failed" message="Error in Line Extension Amount">

The line extension total is not the sum of the invoice lines'
line extension amounts.

                </taml:report>
        </taml:testAssertion>
...


These are just two rules in an example set of rules.

If I use XML Schema (XSD) 1.1 to apply the assertions (effectively as test cases) by combining them with a schema (see previous blog posting) I run into some immediate problems:

1) Ideally I need to use a schema which targets a UBL invoice, but a) my UBL invoice already has a schema and b) that schema follows design rules which might make it tricky to change a globally defined element into a locally defined one.

2) This assumes I can write my own schema in XSD 1.1 and can find a way to define some elements globally, if the assertions targeting them allow this, and some elements locally, if the assertions targeting those elements demand it (e.g. they have relevance only to certain contexts for that element).


3) Does every assertion map to one or more elements? I need to be able to turn any failed test of one assertion into a report referring to the one TA (by its ID). Can I do that?

4) How do I apply prerequisites? I could add them to the predicate perhaps but it makes it quite complex. The logic needs to be that if (count(distinct-values(//*/@currencyID)) eq 1) then the predicate applies, else it does not apply, so it might be a little more complex than I'd like. Even then this doesn't give me a mapping to my test assertion, so I need to add the TA id somewhere - in a report or annotation, say (can I do that in XML Schema 1.1?).
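To make the prerequisite logic concrete, here is a minimal sketch in Python (not TAML or XSD) of how INTOT1 could be evaluated: the prerequisite decides whether the predicate applies at all, and only then is the predicate tested. The namespace URIs are the standard UBL 2 ones; the function name, the 'notApplicable' label and the tiny sample invoice are my own assumptions for illustration, and the sample is far from a complete, schema-valid UBL invoice.

```python
from decimal import Decimal
import xml.etree.ElementTree as ET

# Standard UBL 2 namespace URIs; the prefixes match those used in the TAML above.
NS = {
    "in": "urn:oasis:names:specification:ubl:schema:xsd:Invoice-2",
    "cac": "urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2",
    "cbc": "urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2",
}

# Hypothetical, heavily cut-down invoice used only to exercise the rule.
SAMPLE = """<Invoice xmlns="urn:oasis:names:specification:ubl:schema:xsd:Invoice-2"
  xmlns:cac="urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2"
  xmlns:cbc="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2">
  <cac:LegalMonetaryTotal>
    <cbc:LineExtensionAmount currencyID="GBP">100.00</cbc:LineExtensionAmount>
  </cac:LegalMonetaryTotal>
  <cac:InvoiceLine>
    <cbc:LineExtensionAmount currencyID="GBP">60.00</cbc:LineExtensionAmount>
  </cac:InvoiceLine>
  <cac:InvoiceLine>
    <cbc:LineExtensionAmount currencyID="GBP">40.00</cbc:LineExtensionAmount>
  </cac:InvoiceLine>
</Invoice>"""


def evaluate_intot1(invoice_xml):
    """Return 'pass', 'fail' or 'notApplicable' for test assertion INTOT1."""
    root = ET.fromstring(invoice_xml)

    # Prerequisite: exactly one distinct currencyID anywhere in the document.
    currencies = {el.get("currencyID") for el in root.iter() if el.get("currencyID")}
    if len(currencies) != 1:
        return "notApplicable"

    # Predicate: the total equals the sum of the invoice lines' amounts.
    # Assumes the LegalMonetaryTotal amount is present.
    total = Decimal(
        root.findtext("cac:LegalMonetaryTotal/cbc:LineExtensionAmount", namespaces=NS)
    )
    line_amounts = root.findall("cac:InvoiceLine/cbc:LineExtensionAmount", NS)
    line_sum = sum((Decimal(el.text) for el in line_amounts), Decimal(0))
    return "pass" if total == line_sum else "fail"


if __name__ == "__main__":
    print("INTOT1:", evaluate_intot1(SAMPLE))
```

Keeping the three outcomes separate is the point: a mixed-currency invoice is neither a pass nor a fail for this TA.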

That's how I'd like it to work but I'm told there is a hitch. An XML Schema 1.1 assert cannot look up values from another part of the document: its XPath expression can only see the element the assert is attached to, together with that element's attributes and descendants. Duh! Still, I can handle it perhaps. I need to take all the values from the UBL that I want to test, calculate the appropriate totals, do the appropriate lookups and put the results in a special XML file - I might call it the provisional report file. It is then this file's markup, specially designed to support my test assertions, which I define using XML Schema 1.1.

e.g.
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="InvoiceCalculationModelReport">
   <xs:complexType>
    <xs:sequence>
     <xs:element name="IN1">
      <xs:complexType>
       <xs:sequence>
        <xs:element name="InvoiceCount">
         <xs:simpleType>
          <xs:restriction base="xs:int">
           <xs:assertion test="$value ge 1"/>
          </xs:restriction>
         </xs:simpleType>
        </xs:element>
       </xs:sequence>
      </xs:complexType>
     </xs:element>
     <xs:element name="INTOT1">
      <xs:complexType>
       <xs:sequence>
        <xs:element name="DistinctCurrencyCount">
         <xs:simpleType>
          <xs:restriction base="xs:int">
           <xs:assertion test="$value eq 1"/>
          </xs:restriction>
         </xs:simpleType>
        </xs:element>
        <xs:element name="LegalMonetaryTotalLineExtensionAmount">
         <xs:simpleType>
          <xs:restriction base="xs:int"/>
         </xs:simpleType>
        </xs:element>
        <xs:element name="SumOfLineExtensionAmounts">
         <xs:simpleType>
          <xs:restriction base="xs:int"/>
         </xs:simpleType>
        </xs:element>
       </xs:sequence>
       <xs:assert test="SumOfLineExtensionAmounts eq LegalMonetaryTotalLineExtensionAmount"/>
      </xs:complexType>
     </xs:element>

     <!-- ... -->
   </xs:sequence>
  </xs:complexType>
 </xs:element>
</xs:schema>


Then it is a relatively simple matter to extract values using code, database or XSLT (with help from the TA XPaths perhaps if the latter is used) from the target UBL invoice into a provisional report file which is validated by this schema. The report file would, in this example with just a few test assertions, look like the following:

<?xml version="1.0" encoding="UTF-8"?>
<InvoiceCalculationModelReport>
  <IN1>
   <InvoiceCount>1</InvoiceCount>
  </IN1>
  <INTOT1>
   <DistinctCurrencyCount>1</DistinctCurrencyCount>
   <LegalMonetaryTotalLineExtensionAmount>100</LegalMonetaryTotalLineExtensionAmount>
   <SumOfLineExtensionAmounts>100</SumOfLineExtensionAmounts>
  </INTOT1>
  <!-- ... -->
</InvoiceCalculationModelReport>


The added advantage is that I can then get results into my provisional report file whatever the target system, even for a non-software system. The final output is the final report file, which reports pass/fail/irrelevant or the like for each assert or, better still, for each TA.
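As a rough illustration of the extraction step, the following Python sketch builds a provisional report with the same element names as the example report above from a UBL invoice. The function name and the cut-down sample invoice are hypothetical; the amounts are whole numbers only because the example schema types them as xs:int.

```python
from decimal import Decimal
import xml.etree.ElementTree as ET

# Standard UBL 2 namespace URIs.
INVOICE_NS = "urn:oasis:names:specification:ubl:schema:xsd:Invoice-2"
NS = {
    "cac": "urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2",
    "cbc": "urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2",
}

# Hypothetical, heavily cut-down invoice used only for illustration.
SAMPLE = """<Invoice xmlns="urn:oasis:names:specification:ubl:schema:xsd:Invoice-2"
  xmlns:cac="urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2"
  xmlns:cbc="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2">
  <cac:LegalMonetaryTotal>
    <cbc:LineExtensionAmount currencyID="GBP">100</cbc:LineExtensionAmount>
  </cac:LegalMonetaryTotal>
  <cac:InvoiceLine>
    <cbc:LineExtensionAmount currencyID="GBP">60</cbc:LineExtensionAmount>
  </cac:InvoiceLine>
  <cac:InvoiceLine>
    <cbc:LineExtensionAmount currencyID="GBP">40</cbc:LineExtensionAmount>
  </cac:InvoiceLine>
</Invoice>"""


def build_provisional_report(invoice_xml):
    """Extract the values the test assertions need into a report element."""
    root = ET.fromstring(invoice_xml)
    report = ET.Element("InvoiceCalculationModelReport")

    # IN1: how many Invoice elements the document contains (root included).
    in1 = ET.SubElement(report, "IN1")
    invoice_count = sum(1 for _ in root.iter("{%s}Invoice" % INVOICE_NS))
    ET.SubElement(in1, "InvoiceCount").text = str(invoice_count)

    # INTOT1: the prerequisite value plus the two amounts to be compared.
    intot1 = ET.SubElement(report, "INTOT1")
    currencies = {el.get("currencyID") for el in root.iter() if el.get("currencyID")}
    ET.SubElement(intot1, "DistinctCurrencyCount").text = str(len(currencies))
    total = root.findtext("cac:LegalMonetaryTotal/cbc:LineExtensionAmount", namespaces=NS)
    ET.SubElement(intot1, "LegalMonetaryTotalLineExtensionAmount").text = total
    lines = root.findall("cac:InvoiceLine/cbc:LineExtensionAmount", NS)
    line_sum = sum((Decimal(el.text) for el in lines), Decimal(0))
    ET.SubElement(intot1, "SumOfLineExtensionAmounts").text = str(line_sum)
    return report


if __name__ == "__main__":
    print(ET.tostring(build_provisional_report(SAMPLE), encoding="unicode"))
```

The resulting document would then be handed to an XSD 1.1 processor for validation against the report schema; that step is not shown here.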

In the open source tool for executing test assertions written in XPath, called Tamelizer (hosted on Google Code) there is a similar way to handle such scenarios. The Test Assertions (TAs) are written in TAML and there is a provisional markup written describing the features of the targeted product. The executable TAs are written in terms of the markup used to describe the product features and executed against that marked up description document.

Here's a shot at how it would work for a target which is a mechanical widget with some features as follows 
1) red button on top
2) number of batteries = 2
3) voltage = 6V
4) alarm sound = continuous
5) country of destination = UK


The spec might have some TAs written for it such as

TA1: if there is a 6V battery then the button on top MUST be red
TA2: there MUST be at least 1 battery
TA3: if there are two batteries then their voltage MUST be 3V
TA4: if the country of destination for the widget is US then the
button on the widget MUST be blue


The features can be marked up with some XML. The test cases are written as the TA expressions but in terms of the markup as XPaths

Now here the XSD 1.1 comes in. The markup for the features is defined now in XSD 1.1 so that the test cases can be added into the asserts of the respective elements.

The markup might be

<widget>
<buttonColour>red</buttonColour>
<batteries>
<number>2</number>
<voltage units="Volts">6</voltage>
</batteries>
<destination>UK</destination>
...
</widget>


The expression of a TA in terms of the above might be

<TA id="TA2"><predicate language="XPath">/widget/batteries/number &gt;= 1</predicate>...</TA>

The schema definition for batteries might be

<xsd:element name="batteries">
...
<xsd:assert test="number &gt;= 1"/>
...
</xsd:element>
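To show how the four widget TAs play out against the features markup, here is a small Python sketch (outside TAML and XSD altogether). Each TA is expressed as a prerequisite and a predicate; the function name and the 'notApplicable' label are my own, and I read the <voltage> value literally as the figure TA1 and TA3 refer to, which is an assumption about the markup.

```python
import xml.etree.ElementTree as ET

# The widget description from above, without the elided part.
WIDGET = """<widget>
  <buttonColour>red</buttonColour>
  <batteries>
    <number>2</number>
    <voltage units="Volts">6</voltage>
  </batteries>
  <destination>UK</destination>
</widget>"""


def evaluate_widget(widget_xml):
    """Return a dict mapping each TA id to 'pass', 'fail' or 'notApplicable'."""
    widget = ET.fromstring(widget_xml)
    colour = widget.findtext("buttonColour")
    number = int(widget.findtext("batteries/number"))
    voltage = float(widget.findtext("batteries/voltage"))
    destination = widget.findtext("destination")

    # TA id -> (prerequisite holds?, predicate holds?)
    tas = {
        "TA1": (voltage == 6, colour == "red"),
        "TA2": (True, number >= 1),
        "TA3": (number == 2, voltage == 3),
        "TA4": (destination == "US", colour == "blue"),
    }

    results = {}
    for ta_id, (prerequisite, predicate) in tas.items():
        if not prerequisite:
            results[ta_id] = "notApplicable"
        else:
            results[ta_id] = "pass" if predicate else "fail"
    return results


if __name__ == "__main__":
    for ta_id, outcome in evaluate_widget(WIDGET).items():
        print(ta_id, outcome)
```

Each result is keyed by exactly one TA id, which is the reporting property discussed next.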



Now the point is that the assertion needs to meet some criteria of TAs. One is that the report needs to refer to exactly one TA by its ID. There might be several TAs applying to the same element so the XSD 1.1 might or might not meet this requirement. This relates to the fact that a TA should itself map to a single requirement in the spec. The TA needs to be self-contained too so that this mapping is unambiguous.

If there is a prerequisite for a TA then there needs to be a corresponding prerequisite for the assertion in the schema.

It gets complex when I have a single prerequisite for many TAs together but I can just apply the same prerequisite logic to each and every assertion individually. If I do, though, evaluating the XPaths might take longer. If I structure the schema carefully, and if I have control over the features markup too, I might be able to put my prerequisites on ancestor elements so that descendants' assertions are irrelevant if those higher assertions fail. Now this is where portability might apply. I really want to make each assertion composable and composability makes portability desirable. Say I have many profiles for my spec: I might want some TAs to be transferred between those profiles. This might mean I have to make the assertions in the schema portable for those TAs so that I can move them from a schema for one profile into a schema for another one. Having those higher-level assertions might hinder this: it might be better to have any prerequisites added to every assertion where they apply so that I can always copy that element with its prerequisites and predicate expressed all in the one assertion. That is really my main point. Plus I tend to think the TAs are going to be each self-contained and atomic so the assertions in the schema relating to them might need to be too.

Now I can go back to the real world example of the UBL invoice test assertions and instead of writing an XSD 1.1 schema again for UBL, I can write a schema for a list of elements to take values extracted from a UBL invoice - the tax totals, line totals, etc. I can derive it from an invoice using XSLT, say, or just programming code. Now the schema for this totals document can itself be written in XML Schema 1.1 and can be modelled along the lines of my set of test assertions. It can then take XPath Boolean expressions derived from my Test Assertions (if the latter are already written in XPath I might only want to combine the prerequisite and predicate expressions to get my XSD 1.1 assert expression). Then I can execute the schema against the totals in their XML markup and, using say a tool like Saxon which can read and execute the XSD 1.1, obtain a list of any deviations from the rules.

Not exactly keeping it simple but sometimes it has to be just a little complex to accomplish what we need, just no more complex than it needs to be, hopefully.

I have to say though, even when compared to Schematron, another assertion-based schema language based on XPath and executed using a two-step XSLT approach, I do rather prefer Tamelizer, the Google Code project, with XPath expressions in executable TAML test assertions. Maybe it won't get as wide use as XML Schema 1.1 but it is just right, I think, for test cases for XML document targets. It takes more skill to write all of the reasoning into the TAML XPath expressions but they execute against the actual target XML so for this kind of target it is worth that little more struggle getting the XPaths right. The prerequisite facility makes it all the more satisfactory in my view.

Document Engineering: Test Assertions and XML Schema Assertions

Computer software does sometimes require some engineering discipline to keep it running smoothly. Software sometimes needs to be spoon-fed information and sometimes that information comes in the form of documents similar to those read by humans (invoices can be sent from computer system to computer system to simplify and improve the efficiency of business transactions, perhaps over the Internet). Sometimes humans need to use software to write documents and software to read those same documents. The software used to write a document might be different from the software used to read it (Word at one end and Open Office at the other, say). On the Internet, web browsers need to read websites written and served up with various kinds of software packages from various software producers. All this makes it important at times to apply some engineering practices to ensure things work well. One such practice is the process from specification to conformance test or interoperability test. The specification for, say, a document might require that software reading the document handle it in this way or that way. It might also say how the document is to be written, perhaps using one of the many markup languages such as those which are based on the W3C standards authority's eXtensible Markup Language (XML). One software engineering practice used over and over for decades of computer system history is the production of many test assertions corresponding to statements in a specification. These are atomic restatements which get numbered or indexed in some way so that a test based on the spec can be tied to an individual statement in the spec. The test assertions are a bit like a special engineering index for the spec to help with testing.

Now a test assertion exists so that testers know that there is something particular which they ideally ought to make a special point of testing. They can refer to this test assertion in their test so that a failure of a component being tested can be tied in a test report to the exactly relevant item in the spec. Not complicated really. They often call the individual tests 'test cases'. Large, complicated systems can involve many thousands of test assertions (for large, complicated specifications, of course) and similarly large numbers of test cases based on these assertions. The test assertions could fill a fair bit of a database or a pretty big file of data, depending how they are stored. Then you have to be able to cater for various versions of the specs and various versions of the software or documents being tested. Simplicity might help make it all manageable and help real people keep track of it all. 

Now I have an interest at the moment in a particular technology and I'd like to explore an idea: that for XML document testing, and maybe any kind of testing where the tests are documented first as XML documents, you can either write the tests using this technology or, when the tests themselves are run some other way, turn the test results into test reports using it. The technology in question is W3C XML Schema version 1.1. This allows an assertion, a bit like a test assertion, to be written in a special syntax for expressions relating to an XML document and inserted into the middle of the definition of a particular part of that XML document. Test assertions targeting XML documents do not have to be expressed so that they can be executed as a test applied to the XML documents. There are some well-known examples where they are written this way (some WS-I web services specs have such test assertions) but there the test assertions are really doubling up as test cases. I reckon the assertion feature in XML Schema 1.1 might be used for such test-assertion-like test cases, but how? If you write a set of descriptions of tests using XML markup of some kind and define that markup using XML Schema 1.1, I reckon you could match every atomic statement of a requirement or the like in a spec to a test assertion for that requirement, put an element in the markup to report on the testing of that test assertion, and define that element in the XML Schema. If the XML Schema is written using version 1.1 then you can put an assertion into the definition of the element which, when executed as an expression against the element in the test report, produces a yes/no answer (a true/false answer, it being a 'Boolean', 'predicate' expression).

That might be one way to do it, which results in a layer of yes/no answers which can be overlaid on the report to show whether test results are conforming to the test assertions or not. Another way works if the target is itself an XML document but here it gets more challenging. The assertions are run against the XML target but how do they get stored in the XML Schema, I wonder? The test assertions might be themselves documented using markup such as the OASIS Test Assertion Guidelines Technical Committee's (in progress) Test Assertion Markup Language (which I helped write up). Then we are left with the test cases, and I wonder whether they can somehow be put into the form of a W3C XML Schema 1.1 schema. They could be put into the form of a Test Assertion Markup Language XPath profile document and executed that way against the target XML using a tool like the Google Code project's Tamelizer (by Fujitsu America, based on previous work with WS-I) but that doesn't use XML Schema 1.1, it uses XSLT 2.0, which is cool. (Schematron is a similar alternative too which also uses XSLT.) I'd like to transfer the XPath (used by XML Schema 1.1 and the above alternatives) to some kind of schema but it isn't obvious how to do so. I think I'd need a report-like XML structure with one element for each test case and for each element (or element's 'type', an XML Schema thing) a schema definition where I can insert the XPath assertion expression. That doesn't work. I don't want to run the schema against the XML structure, I want to run the assertions in it against the target XML. No good. The target XML might have its own schema. What I have to do is create a schema for the target and put my assertions there. But it constrains the way I define my schema, perhaps not the way I want to do it for the type of XML document I am testing. Still it is an option. Another is to write tests and report on them in a test report document and define the test report document's XML using the schema where I put the assertions. That means two steps and is the same as the first option I looked at above, which need not be limited to XML targets (or even software targets, similarly with Tamelizer).

Right, so I might decide I can indeed use XML Schema 1.1 to define the target and put my assertions into the individual elements' definitions. I'd probably want a 'global' element definition for each element which always has the same test assertion(s) applying to it wherever it occurs. If the test assertions depend on where the element occurs then I might have to have local elements defined so they can have a set of test assertions applied to them depending on their context (where they occur in the document). This messes with my design a bit but seems to be part and parcel of this technique. All in all I need my assertions to map to the test assertions and the test assertions to map to the 'normative' statements in the spec for that type of document. OK, fine. It means that to use this approach I have to base my schema design not just on the document's XML structure and on the processes intended for the document, which might impose requirements on the schema design, but also partly on the spec design, insomuch as that dictates the test assertions and their granularity. I might then have a mix of global element definitions and local element definitions, but I would foresee possible problems if an element has some assertions relating to it as a target which depend on its context in the document and some which don't. In these cases the local, context-dependent requirements (and their test assertions) trump the global, context-independent ones and the element might just have to be defined with one or more local definitions in the schema.

I conclude I'm fairly comfortable with the use of W3C XML Schema 1.1 for associating test assertions with XML documents and even with other targets for testing, but it might be limited to applying a kind of truth table of yes and no test reports as a layer over the top of a more general test report for tests made some way other than with the assertion expressions themselves. This goes along similar lines to those used in tools like Tamelizer. To go further and make the schema double as the set of test cases (the test suite or part of it) requires, I think, that the target be an XML document which is itself defined using a schema under the control of the test assertion and / or test case author(s). Cool.

Wednesday 9 March 2011

XML Special Character Gotchas

You know you've got past the beginner tutorial and you're doing the real thing with XML when you start to get encoding and special character issues. Here's what I mean. You get some code which takes some XML, perhaps produced by a web page, and passes it to an XML parser. Now, just like other formats like JSON and HTML, XML requires that certain 'special characters' be 'escaped' (replaced with sequences like '&amp;' which is the escape sequence for an ampersand character). So if your XML contains some of these characters then they have to be replaced or the XML is not right. This introduces a catch-22 gotcha for XML parsers (at least it does for the ubiquitous one I use and probably does for others too) in that you might need to parse the XML as a first step in replacing the special characters (or how can you tell the character is in element or attribute content rather than a comment, namespace string or element or attribute name, say?). Trying to parse XML containing these characters might cause the parser to throw an error. That, apparently, is because the XML standard specs seem to give the impression this is the correct behavior of an XML parser (though I'm told on authority that this isn't strictly what the specs intended).

The way I had to solve this with my C# code and .NET XML parser was to try parsing the XML, then catch any 'XML exception' errors, then do some things with the exception message and parsed data which aren't exactly advisable (since you don't know what state the parser data will be in after the exception) but seem to be unavoidable for an ordinary developer like myself. I have to get the offending bit of data from the parser using the line number and line position of the error from the error / exception message. Then I do some intelligent replacing of a special character while avoiding replacing special characters which are part of the escape sequence of an already replaced special character! Phew! I just wish the parser would do all this for me but there are so few XML parsers available to me in my development environment (two I think) and neither handles this scenario the way I'd like, it seems.

Here's a rendition of a general function to do all this in C# with .NET 2 or 3. I'm not necessarily proficient enough to say anyone else should use this code - it probably needs some better error handling and optimization, besides the fact it makes some assumptions which might not always be safe. It shows the kind of issues a developer may face when parsing XML. Actually, I couldn't find much code anywhere on the Internet to handle this issue. In fact I found very little written about this issue at all apart from an aged link here which helped as a starting point http://support.microsoft.com/kb/316063 :


// The method below belongs inside a class and needs these namespaces.
using System;
using System.IO;
using System.Text;
using System.Xml;

public static string EscapeXmlSpecialCharacters(string XmlString)
{
    // Try to load the XML; if it parses there is nothing left to escape.
    XmlDocument doc = new XmlDocument();
    try
    {
        doc.LoadXml(XmlString);
        return XmlString;
    }
    catch (XmlException ex)
    {
        StringReader str = new StringReader(XmlString);
        StringWriter stw = new StringWriter(new StringBuilder());
        // Copy the lines before the offending line unchanged.
        for (int i = 0; i < ex.LineNumber - 1; i++)
        {
            stw.WriteLine(str.ReadLine());
        }
        string strline = str.ReadLine();
        // Assumption: the offending character sits just before the reported position.
        int index = ex.LinePosition - 2;
        if (strline == null || index < 0 || index >= strline.Length)
        {
            throw; // cannot locate the offending character, so give up
        }
        string strOffendingCharacter = strline.Substring(index, 1);
        // The offending character plus up to five following ones (enough for "&apos;").
        string strOffendingCharacterAndFollowing5 =
            strline.Substring(index, Math.Min(6, strline.Length - index));
        switch (strOffendingCharacter)
        {
            case "<":
                strline = strline.Substring(0, index) + "&lt;" + strline.Substring(index + 1);
                break;
            case "&":
                // ensure we are not replacing the ampersand in an already escaped
                // special character (&lt;, &gt;, &apos;, &quot; or &amp;)
                foreach (string escaped in new string[] { "&lt;", "&gt;", "&amp;", "&apos;", "&quot;" })
                {
                    if (strOffendingCharacterAndFollowing5.StartsWith(escaped, StringComparison.Ordinal))
                    {
                        throw; // already escaped, so the error lies elsewhere
                    }
                }
                strline = strline.Substring(0, index) + "&amp;" + strline.Substring(index + 1);
                break;
            default:
                throw; // not a character this function knows how to escape
        }
        stw.WriteLine(strline);
        stw.Write(str.ReadToEnd());
        string output = stw.ToString();
        str.Close();
        stw.Close();
        // One character is fixed per pass, so recurse until the document parses.
        return EscapeXmlSpecialCharacters(output);
    }
}

Then you can put in a step between receiving some XML from, say, a web control like a grid and sending that XML as a string to an XML parser like XmlReader so that you can read the string into a .NET dataset, say. No idea what the equivalent issues and solution are like in Java, sorry. If you write your own XML parser though this is one of the issues you will have to bear in mind and handle, along with similar XML-related issues like handling the illegal characters and characters which are not encoded with the encoding declared in the XML declaration (usually UTF-8). Targeting such a parser at supporting just MicroXML, say (see earlier blogs on MicroXML and MicroXSD) might help to keep these issues to something manageable for people writing their own parsers; we'll see perhaps.
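For comparison, here is a rough sketch in Python of the easier half of the problem: escaping a bare ampersand without touching one that already starts an escape sequence. It uses a regular expression rather than the parse-and-retry approach above, and the function name is my own. It deliberately does nothing about a stray '<', which is the case that really needs the parser's help.

```python
import re
import xml.etree.ElementTree as ET

# An ampersand that does NOT start one of the five predefined entities
# or a decimal / hexadecimal character reference.
BARE_AMPERSAND = re.compile(r"&(?!(?:lt|gt|amp|apos|quot);|#[0-9]+;|#x[0-9A-Fa-f]+;)")


def escape_bare_ampersands(xml_text):
    """Replace each unescaped '&' with '&amp;' and leave existing escapes alone."""
    return BARE_AMPERSAND.sub("&amp;", xml_text)


if __name__ == "__main__":
    broken = "<a>Fish & chips &amp; peas &lt; &#163;5</a>"
    fixed = escape_bare_ampersands(broken)
    print(fixed)
    # The repaired string now parses without an error.
    print(ET.fromstring(fixed).text)
```

This only works because an ampersand has no legitimate unescaped use in element or attribute content; the same shortcut is not available for angle brackets.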

Tuesday 8 March 2011

MicroXML

Using MicroXML


Some tend to think XML is rather complex and mysterious yet all it is, fundamentally, is a set of building blocks for text: element tags, element content, attribute names, attribute values and the structure (the way the building blocks are arranged). Let’s look at these in turn, while limiting our focus to the simplification of XML which is called MicroXML. In the process we also need to address some special concerns, such as some considerations that have to be given to certain characters and how to denote a comment as being distinct from the XML content. [MicroXML is a subset profile of XML developed by James Clark and published in a blog post by James at http://blog.jclark.com/2010/12/more-on-microxml.html .]

Element Tags


The simplest and most obvious part of XML is the tag. Tags surround text, sometimes surrounding other tags and sometimes including within the tag other structures called attributes. Tags are sometimes in pairs and sometimes alone. When they are in pairs then there is a leading tag or ‘start tag’ and a trailing tag or ‘end tag’ and usually some text between them. If there is a start tag and end tag without any text between them then it can also be written in an alternative, equivalent form called an empty tag which is the one time when the tag has no pair.

In computer programming, the simplest kinds of examples are often called ‘Hello World’ examples because a very famous book once introduced readers to the early programming language called ‘C’ with an example that output a simple string of characters saying just ‘Hello World’. Using MicroXML we might have an equivalent simple example like this:

 <Hello>World</Hello>

This is called an element. This element has a start tag <Hello>, an end tag </Hello> and between the tags some text reading ‘World’. Simple! Start tags have an opening angle bracket or ‘less-than’ sign < followed by the name given to the element followed by the closing angle bracket or ‘greater-than’ sign >. There is normally no white space unless the start tag contains one or more attributes, as we’ll see later. White space is that set of characters in text which do not have any visible shape, like spaces, tabs, new-lines, etc. The name of the element (in this case ‘Hello’) is not allowed to contain any white space. Neither is an element name allowed to start with a digit, but it can contain any number of digits after the first character. A formal definition of MicroXML states that the first character of a name (element or attribute name) cannot be a digit but can be an alpha character or underscore (or one of some more obscure characters which are listed by their character codes; see the formal definition of MicroXML here http://blog.jclark.com/2010/12/more-on-microxml.html ). Name characters after the first can include the underscore too, along with digits, alpha characters and the two punctuation characters point (full stop) ‘.’ and hyphen (dash) ‘-’.
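The naming rules can be sketched as a simple check in Python. This is an ASCII-only simplification: it ignores the more obscure characters the formal definition also permits, and it assumes (as in the full XML name rules) that an underscore is acceptable after the first character as well as in first position. The function name is made up for this example.

```python
import re

# First character: a letter or underscore.
# Later characters: letters, digits, underscore, full stop or hyphen.
ASCII_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_.-]*")


def is_ascii_microxml_name(name):
    """Return True if the whole string is an acceptable ASCII-only name."""
    return ASCII_NAME.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ["Hello", "_h1.x-y", "1Hello", "He llo", "xml:id"]:
        print(candidate, is_ascii_microxml_name(candidate))
```

Note that the colon is rejected here, so the reserved ‘xml:’ attribute names would need handling separately.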

End tags start with an opening angle bracket or less-than sign ‘<’ followed by a forward slash ‘/’ then the same element name as the start tag used and finally the closing angle bracket or greater than sign ‘>’.  So our example element:

 <Hello>World</Hello>

is composed of:
Start tag: <Hello>
Content:  World
End tag: </Hello>

Then there is that special case of element tags which stand alone without being in a pair. These tags are an alternative way to show a special kind of element; one that does not have any content. The ‘Hello’ in our example above, if the text ‘World’ were removed, could be written like this:

 <Hello></Hello>

but it could also be written like this with just one special tag:

 <Hello/>
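A quick way to convince yourself that the two forms are equivalent is to parse both and compare the results. This sketch uses Python's standard-library parser, which handles full XML rather than just MicroXML; the comparison helper is my own.

```python
import xml.etree.ElementTree as ET


def same_element(a, b):
    """Compare name, attributes, text and number of children of two elements."""
    return (
        a.tag == b.tag
        and a.attrib == b.attrib
        and (a.text or "") == (b.text or "")
        and len(a) == len(b)
    )


pair = ET.fromstring("<Hello></Hello>")   # start tag plus end tag, no content
single = ET.fromstring("<Hello/>")        # the standalone empty tag

if __name__ == "__main__":
    print(same_element(pair, single))
```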

Elements can contain another kind of structure called an attribute. Attributes when added to an empty element can look like this:

 <Hello from="Mars" to="World"/>

The special tag for the empty element starts with the opening angle bracket, then comes the element name and then there can sometimes be white space, such as when the element has an attribute, but whether or not there is any white space it always closes with a forward slash immediately followed by a closing angle bracket or greater-than sign. (There are some times when white space follows the element name even when there are no attributes but the reason is not important at this stage except to remember it is possible.)

Note that when there are one or more attributes added to an element the attributes and their values (they always have values in MicroXML) always sit between the element name and the closing angle bracket of the start tag, or when the tag is the standalone empty element tag, between the element name and the forward slash, i.e. like the example above, or like this:

 <Hello from="Mars">World</Hello>

Also, the quotes around the attribute value can be a pair of double quotes or a pair of single quotes.

Element Content and Mixed Content

  
We’ve seen that an element can contain text but it can also contain other elements, or both. When it contains text alone then the text is simply everything between the start and end tag. In the example above that means just ‘World’. When it contains other elements then the other elements sit between the start and end tags but there may or may not be white space between the end of the first element’s start tag and the next element’s start tag. When the element contains both elements and text then it is called mixed content and the text and other elements along with any white space still sit between the first element’s start and end tags. So you can have content like this:

 <Hello><From>Mars</From><To>World</To></Hello>

or with white space like this:

 <Hello> <From>Mars</From> <To>
World</To>
</Hello>

or with mixed content like this:

 <Hello>From Mars <To>World</To></Hello>

Using both white space and mixed content together makes the XML look less recognizable as XML, but it is still allowed, like this:

<Hello>From
Mars
<To>
World</To>

</Hello>
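
To see how a parser hands back mixed content, here is a small sketch using Python's standard xml.etree.ElementTree (a general XML parser, used here as an assumption since no MicroXML-specific tool is named in this post). In that library the text before the first child is stored on the parent, and any text following a child is stored on the child as its 'tail':

```python
import xml.etree.ElementTree as ET

mixed = ET.fromstring("<Hello>From Mars <To>World</To></Hello>")
leading_text = mixed.text          # text before the first child element
child = mixed[0]                   # the <To> element
all_text = "".join(mixed.itertext())
print(repr(leading_text), child.tag, repr(child.text), repr(all_text))

# The same content with extra white space keeps that white space as text.
spaced = ET.fromstring("<Hello>From\nMars\n<To>\nWorld</To>\n\n</Hello>")
words = "".join(spaced.itertext()).split()
print(words)
print(repr(spaced[0].tail))        # white space after </To>
```

The white space is preserved by the parser; it is up to the application to decide whether it matters.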

Attribute Names and Values


The attribute consists of a name and a textual value. The name and value together are placed in an element start tag or empty tag (not end tag) between the element name and the closing angle bracket. The name goes first followed, always, by an equals sign (with or without any white space between) followed (again, with or without any white space) by the attribute’s textual value surrounded by either single or double quotes like this:

 <Hello from='Mars'>World</Hello>

or like this:

 <Hello from = "Mars" to="World"  />

or like this:

 <Hello from=  "Mars" via="My website">World</Hello>

Note the various combinations of characters allowed such as white spaces and single or double quotes. Note too that if an attribute value were ever to contain an element it would not normally be recognized as an element but would normally be treated as just text. (Of course, it would be possible to instruct some computational process to extract the value of the attribute and treat it as XML in its own right but normally the value of the attribute is treated as just textual content.)
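
The points about white space and about markup inside attribute values can be tried out with a short sketch. It uses Python's standard xml.etree.ElementTree, a general XML parser (an assumption of this example; the attribute name 'note' is also made up for illustration):

```python
import xml.etree.ElementTree as ET

# White space around the equals sign and before '/>' is accepted.
spaced = ET.fromstring('<Hello from = "Mars" to="World"  />')
print(spaced.attrib)

# Markup inside an attribute value has to be escaped, and comes back as text.
root = ET.fromstring('<Hello note="&lt;To&gt;World&lt;/To&gt;"/>')
note = root.get("note")
print(note)        # the characters <To>World</To>, as a plain string
print(len(root))   # number of child elements, which is 0 here
```

The value of 'note' looks like an element but the parser reports no child elements; it is just a string.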

The attribute names are limited in their allowed characters just as element names are: no leading digits, only certain punctuation characters allowed, etc. Note also that with an attribute name in MicroXML there is a special case of names beginning ‘xml:’ which are reserved names for special purposes specified in the XML Standards. Normally the colon is not allowed in an attribute or element name in MicroXML (because it has a special function given it in parts of the XML specifications not included in MicroXML but which would affect the way MicroXML was handled by tools used to handle XML in general). The ‘xml:’ prefix is allowed in attribute names in MicroXML, however, mainly to allow a special feature to be used which is an ‘xml:id’ special attribute (out of scope here).

Special and Illegal Characters


We have already seen that some characters in element and attribute names are illegal in MicroXML (and some of these are so in XML in general). In addition, to help processors interpret the XML properly and reliably there are some characters which are not allowed in the content. That might seem alarming and so it should: What, you might think, if there are such characters in text we wish to include between tags or in attribute values? What do we do with these characters? The answer is that they have to be replaced with what can be called ‘escape’ strings. This is one big drawback with using MicroXML or XML in general. The reason is clear when you think of what would happen if the text content of an element had a less-than sign in it: It could look very much like the start of a tag and in some cases might be indistinguishable from one, for example:

<MathStatement>one < two</MathStatement>

is potentially confusing but even more so is:

<two>one<two</two>

Attributes are not exempt and the following is not allowed:

<two one="<two"/>

The special characters this applies to are

Ampersand &
Less-than <
Greater-than >
Double-quote "
Single-quote (apostrophe) '

The reason the ampersand ‘&’ is a special character too becomes apparent when we consider what we have to do with these characters to replace them. It is called ‘escaping’ and it consists of replacing these characters with a special sequence of characters which are as follows:

Ampersand     &amp;
Less-than      &lt;
Greater-than      &gt;
Double-quote      &quot;
Single-quote (apostrophe)       &#39;     or     &apos;

The escape strings themselves all start with an ampersand, so this too is a special character.
Now all this poses problems which may have to be solved for people producing the XML, either through text editing or computer programming. You have to not just replace the special characters with their escape sequences but also ensure that when doing so you distinguish an ampersand that is part of the text from an ampersand at the start of an escape string (else you might end up producing something like &amp;amp; and eventually &amp;amp;amp;amp; or worse). Then you have to think about how to read the XML in any computer code, when to escape those characters, when to replace them back for humans to read, etc. It can all get complex but the same happens with web browsers which have to do this kind of thing for the language of the Web, HTML, too.
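
Escaping, unescaping and the double-escaping hazard can all be seen in a few lines. This is a sketch using the escape and unescape helpers from Python's standard xml.sax.saxutils module; the two dictionaries for the quote characters are this example's own additions, since those helpers only handle the ampersand and the angle brackets by default:

```python
from xml.sax.saxutils import escape, unescape

# Extra replacements for the two quote characters (this example's choice).
QUOTES = {'"': "&quot;", "'": "&apos;"}
UNQUOTES = {"&quot;": '"', "&apos;": "'"}

raw = 'one < two & "three" > \'zero\''
escaped = escape(raw, QUOTES)
print(escaped)

# Escaping text that is already escaped mangles it further.
twice = escape("&amp;")
print(twice)

# Unescaping once gets the original text back.
restored = unescape(escaped, UNQUOTES)
print(restored == raw)
```

The practical rule is to escape exactly once, at the point where text is written into the XML, and to unescape exactly once when reading it back (a parser normally does the latter for you).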

Other characters are said to be ‘illegal’ in MicroXML because all the content and the tags and attribute names and values have to be written in the character set of the encoding known as UTF-8, so any characters not found in this encoding system’s character set have to be replaced at some point too. This might best be achieved by controlling which characters are actually added to the MicroXML rather than by escaping them (because escaping such characters introduces some extra complications out of scope here).

It has not been mentioned until now but the start of a document written in MicroXML should be the UTF-8 character encoding declaration along with the general XML (version 1.0) declaration:

<?xml version="1.0" encoding="UTF-8"?>
… (rest of the MicroXML follows on).

This, for the Hello World example, would look like the following:

<?xml version="1.0" encoding="UTF-8"?>
<Hello from='Mars'>World</Hello>

 

Comments

 

To allow ignorable comment text to be interspersed with the MicroXML, comments are separated from the rest of the XML by a special sequence of characters: a left angle-bracket (less-than sign) followed without white space by an exclamation mark and, again without white space, two consecutive hyphens (dashes), then the comment, then two consecutive hyphens and (without intervening white space) a right angle-bracket (greater-than sign).

<!--   then textual comment here then  -->

This looks like the following when the comment is embedded in some XML:

<Hello>From
<!-- this is a comment -->
Mars
<To>
World</To></Hello>

This says that the comment is not to be regarded as actual content and it can be ignored. The comment’s text does not need to be escaped. Comments cannot be nested in MicroXML. One comment has to be ended before the other begins.

This is allowed:

<Hello>From
<!-- this is a comment -->
<!-- this is another comment -->
Mars
<To>
World</To></Hello>

but not this:

<Hello>From
<!-- this is a comment
<!-- this is an illegally nested comment --> -->
Mars
<To>
World</To></Hello>
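
Both behaviours, the comment being ignored and the nested comment being rejected, can be observed with a general XML parser. A minimal sketch using Python's standard xml.etree.ElementTree (an assumption of this example; it is not a MicroXML-specific parser):

```python
import xml.etree.ElementTree as ET

with_comment = (
    "<Hello>From\n<!-- this is a comment -->\nMars\n<To>\nWorld</To></Hello>"
)
root = ET.fromstring(with_comment)
words = "".join(root.itertext()).split()
print(words)   # the comment text is not part of the content

nested = (
    "<Hello>From\n<!-- this is a comment\n"
    "<!-- this is an illegally nested comment --> -->\n"
    "Mars\n<To>\nWorld</To></Hello>"
)
try:
    ET.fromstring(nested)
    nested_ok = True
except ET.ParseError:
    nested_ok = False
print(nested_ok)
```

The second document fails to parse because two consecutive hyphens may not appear inside a comment, which is exactly what an attempt at nesting produces.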

Structure


There are very many ways the above building blocks of MicroXML (and XML in general) can all be combined to form simple or complex structures but special note needs to be given to the hierarchical or tree-like nature of the structure due to there always being one and only one top level element. This is because there is a distinction made between a self-contained piece of XML and other partial pieces of XML. The first can be called an ‘instance’ and the latter can be called ‘fragments’. An ‘instance’ has special status. This is an instance:

<Hello><From>Mars</From><To>World</To></Hello>

whereas the following is not an instance but is a fragment:

<From>Mars</From><To>World</To>

That is because the first example is wrapped in a single element and the latter has two elements without any single element wrapper.

Then again the following is not even a fragment because as it stands it would not be valid MicroXML because it has a missing tag at the start and an incomplete one at the end:

Mars</From><To>World</To
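
The distinction between an instance and a fragment shows up as soon as the text is given to a parser. The sketch below uses Python's standard xml.etree.ElementTree; the helper name is_instance and the temporary element name 'wrapper' are made up for this illustration:

```python
import xml.etree.ElementTree as ET

def is_instance(text):
    """Return True if text parses as one complete XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

instance = "<Hello><From>Mars</From><To>World</To></Hello>"
fragment = "<From>Mars</From><To>World</To>"
broken = "Mars</From><To>World</To"

results = [is_instance(instance), is_instance(fragment), is_instance(broken)]
print(results)

# A fragment can be read by wrapping it in a temporary element first.
wrapped = ET.fromstring("<wrapper>" + fragment + "</wrapper>")
print([child.tag for child in wrapped])
```

Only the first text is accepted as it stands. The fragment becomes readable once it has a single wrapper; the broken text does not.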

To be perfect the instance example would be more complete as an instance (or XML document) if it had its declaration:

<?xml version="1.0" encoding="UTF-8"?>
<Hello><From>Mars</From><To>World</To></Hello>

Then a common practice is to add some indentation to make it easier to see that it has a structure because that structure sometimes expresses some of the meaning of the XML.

<?xml version="1.0" encoding="UTF-8"?>
<Hello>
 <From>Mars</From>
 <To>World</To>
</Hello>

The indentation is added using white space but this is not essential. It does illustrate how the XML gets its tree-like or hierarchical logical structure: the instance has a single wrapper around the outside, each element may wrap other elements, and attributes wrap nothing.

The tree-like structure is described sometimes using language which alludes to a family tree (as we do later in the discussion of MicroXSD) where one element containing another element is said to be the parent of the second element (and XML elements can only have the one ‘parent’) and the second is said to be one of the child elements of the first. Two child elements sharing the same parent are often metaphorically called ‘siblings’.
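
The family-tree vocabulary maps directly onto what a parser builds. In this sketch, using Python's standard xml.etree.ElementTree, child elements are reached by iterating over the parent; the library keeps no pointer back to the parent, so the parent_of dictionary is built by hand (that dictionary is this example's own device):

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<Hello><From>Mars</From><To>World</To></Hello>")

# Children of the top level element, in document order: these are siblings.
siblings = [child.tag for child in root]
print(siblings)

# Map every element to its one parent.
parent_of = {child: parent for parent in root.iter() for child in parent}
print(parent_of[root[0]].tag, parent_of[root[1]].tag)

# The top level element is the only one without a parent.
print(root in parent_of)
```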

The Use of a Namespace


There are times when there may be several instances of XML in one file or, say with mixed content, when various instances are somehow interspersed. To distinguish one instance from another we can assign a special name to each instance via a special attribute called the namespace attribute. The special name is called a namespace. We will see more about this in the section on MicroXSD, but the way that a namespace name is composed is a matter of discretion and might involve adhering to some sort of namespace naming scheme (such as schemes involving URIs or domain names). The namespaces in MicroXML are limited compared to full, standard XML 1.0 and are assigned using the attribute ‘xmlns’ like this:

<?xml version="1.0" encoding="UTF-8"?>
<Hello xmlns="somenamespace">
 <From>Mars</From>
 <To>World</To>
</Hello>

Any string is allowed as the value of the attribute. It is worth bearing in mind that this is a key area of simplification in the MicroXML profile of XML; full, standard XML is greatly complicated by the allowance of any number of namespaces in an instance with each namespace being assigned to respective elements and attributes using special prefixes in the element and attribute names. For MicroXML this is eliminated for greater simplicity. This does mean that we have to keep the namespaces separated more so that we are limited in how we combine elements from different namespaces: Essentially all elements from one namespace have to be completed (all end tags closed) before another begins. For example we might have a file whose top element has no namespace but within this element are two parent elements for two parts of the rest of the instance each with a different namespace:

<?xml version="1.0" encoding="UTF-8"?>
<Hello>
 <From xmlns="somenamespace1">Mars</From>
 <To xmlns="somenamespace2">World</To>
</Hello>

This is allowed because the two namespaces do not overlap. Every element within an element having a particular namespace has that same namespace in MicroXML.
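
How the namespace is passed down from an element to everything inside it can be seen by listing the element names a parser reports. A sketch using Python's standard xml.etree.ElementTree, which writes a namespaced name as the namespace in curly brackets followed by the local name (that notation belongs to the library, not to MicroXML):

```python
import xml.etree.ElementTree as ET

single = ET.fromstring(
    '<Hello xmlns="somenamespace"><From>Mars</From><To>World</To></Hello>'
)
single_tags = [el.tag for el in single.iter()]
print(single_tags)

split = ET.fromstring(
    '<Hello><From xmlns="somenamespace1">Mars</From>'
    '<To xmlns="somenamespace2">World</To></Hello>'
)
split_tags = [el.tag for el in split.iter()]
print(split_tags)
```

In the first document all three elements carry the same namespace even though only the top element declares it. In the second the top element has none and each child has its own.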


The Use of a Schema


It might clearly help the handling of the XML or of various parts of it if we pay attention to the design of its structure and seek to document it in a way which humans or even software can read.  The structure with its element and attribute names can be defined outside of the XML using some techniques designed especially for XML which can be used with MicroXML too. This also provides a designer of the XML structure with the opportunity to add some further information to describe and possibly constrain the XML, such as by associating a type with the content of a particular element or attribute. One such technique is to associate a schema with the XML. Such a schema can be written using MicroXSD, as seen in the previous blog [ http://stephengreenxml.blogspot.com/2011/02/microxsd.html ] which is a subset profile of W3C XML Schema. MicroXSD was developed by this blog's author especially for use with MicroXML.

Thursday 3 March 2011

MicroXSD

Introduction

MicroXML


Toward the end of 2010 XML was getting a bit of a rethink in the XML community. First people talked about what for some had been unthinkable in its first decade, an ‘XML 2.0’. That may be on hold; perhaps just as well. For some it has been a major selling point that no 2.0 was planned when XML was created, so archived XML documents might still be readable (parseable) with software decades from now and printed XML still comprehensible decades after that.
What seems more likely is that those who find XML too onerous to use for their purposes will get as much as they need in the form of a predefined subset of XML. Efforts to define possible 80:20 subsets (80 percent of what you need with 20 percent of the complexity) have begun afresh of late, with the first major contender being called 'MicroXML'. The main discussion of MicroXML happened on the XML-Dev public mailing list in December 2010. [MicroXML is a subset profile of XML developed by James Clark and published in a blog post by James at http://blog.jclark.com/2010/12/more-on-microxml.html . I blog on its use in the next blog.]

MicroXSD

Along with MicroXML there was discussion of the possibility of a simplified subset of languages used for defining structural constraints on XML instances. The standard full version of the main constraint language is called W3C XML Schema but often an alias is used for this in the acronym XSD (XML Schema Definition). MicroXSD is such a subset.
MicroXSD eliminates all but the essentials and only allows 'local' element definitions with unnamed complex and simple types. This makes the 'schema' look similar to the XML it defines. Some will find that an improvement because many acknowledge that the average schema written with XSD-proper is somewhat incomprehensible, even to the well-trained eye. MicroXML, as defined in late 2010 does not allow multiple namespaces (namespaces add some of the most atrocious complexities to standard XML) so MicroXSD need only support zero or one namespace, meaning no imports. Having an easier way to define (and understand) the constraints to be applied to a vocabulary written in MicroXML might allow more people to find it within their means to produce and use XML and write software supporting this subset of it.

Using MicroXSD

Writing the simplest schema

A Schema written in MicroXSD inherits the semantics of a schema written in standard W3C XML Schema 1.0 (often known by the acronym XSD) except that not every element and attribute in W3C XML Schema is allowed in MicroXSD: It is a subset. (Note again that MicroXSD is particularly but not solely targeted for defining XML instances written in the subset of XML called 'MicroXML' as mentioned in the introduction.)
A schema is a file or string (or the like) of XML markup which defines and constrains the structural aspects of another instance or fragment of XML. It cannot express every aspect of the constraints which might be required for some uses of XML but it is often the first choice to use for the definition since it supports some of the most common requirements and is quite ubiquitous, well known and well supported in XML-related software. Once a schema is defined there are other ways to fill the gaps, such as with prose specification statements, assertions or test assertions. The MicroXSD subset cannot support all typical use cases (such as cases where there is much reuse of XML syntax, as with XHTML) but where it can be used it offers ease of interpretation of the schema by a human reader plus reduced complexity when writing software to execute validation of the schema against a target instance. Often the latter is necessary as a prerequisite to further processing of XML in software.

The 'schema' Element: <schema>

Every conforming MicroXSD schema starts with the 'schema' element as its outermost (top level) element.
An example:
<schema xmlns="http://www.w3.org/2001/XMLSchema" version="123" attributeFormDefault="unqualified" elementFormDefault="qualified" targetNamespace="somenamespace">


Let's pull this apart and see how it works.
<schema> is the top level element in a schema. It wraps all the other schema elements in this way:
<schema ...> ...
</schema>


Attributes of the 'schema' element are as follows.
xmlns="http://www.w3.org/2001/XMLSchema" : This attribute with this name and value is mandatory in a schema top level element. It provides the schema with a namespace. All elements between <schema> and </schema> share this namespace. This namespace is the namespace of the language used to write the schema, NOT the namespace of the XML constrained by the schema. For the latter, in MicroXSD, the 'targetNamespace' attribute is used (see below).
version="123": This attribute gives the schema a version number (sometimes characters other than numbers are used too such as '123.1.0' or even 'foo-draft-1').
attributeFormDefault="unqualified" elementFormDefault="qualified" : Included for compatibility with existing software processing W3C XML Schema and with the XML Schema standard, these attributes and the values shown are necessary to clarify how to interpret the namespaces of attributes and elements in the instance of XML constrained by the schema. They are fixed with these values in MicroXSD.

targetNamespace="somenamespace" : This attribute defines which namespace is to be assigned to an XML instance constrained by the schema. It is optional and if missing the namespace is assumed to be empty. Typically namespaces follow some scheme (scheme, not schema) which helps, for example, to associate the namespace with some provenance. One such scheme is to use the authors' URL or domain name perhaps together with an identifier of some kind. Any string is allowed though.
In a MicroXSD schema no other attributes should be present in this top level 'schema' element. The only element allowed immediately below (within) this 'schema' element in a MicroXSD schema is the 'element' element (the element called 'element'!). (Other possibilities acceptable in W3C XML Schema are not supported with the MicroXSD subset profile.)

The Topmost 'element' Element: <element>



The element called 'element' (beware confusion!) is found just below the 'schema' element but can also be nested further down within itself: 'element' elements can contain other 'element' elements (albeit indirectly). An example of its use when it is directly below the top level 'schema' element is
<element name="Hello"> ... </element>
name="Hello": This 'name' attribute is the only attribute allowed in 'element' in MicroXSD for the topmost 'element' (which is directly below the top level 'schema' element). It defines the name of the top level element in the constrained XML instance. It can have as its value any valid XML name as specified by the XML 1.0 Standard and in MicroXML: Starting with an alpha (ASCII) or underscore ('_') character, not including any spaces, but with numbers and some punctuation characters (such as the point '.' or underscore '_') allowed after the first character.
When the 'element' element occurs lower down in the schema structure it can also take the attributes 'minOccurs' and 'maxOccurs' and these will be looked at later.

The only child element allowed in MicroXSD for the 'element' element is called 'complexType'.

Example of a topmost element with its complex type:
<element name="Hello">
 <complexType mixed="true">
  ...
Wherever the 'complexType' element is used in MicroXSD its syntax is the same.

Simplest use of the 'complexType' Element


Here is a so-called 'Hello World' example:
For the simple XML
<Hello>World</Hello>
we can use MicroXSD to write the schema
<schema xmlns="http://www.w3.org/2001/XMLSchema" version="0.1" attributeFormDefault="unqualified" elementFormDefault="qualified">
 <element name="Hello">
  <complexType>
   <simpleContent>
    <extension base="string"/>
   </simpleContent>
  </complexType>
 </element>
</schema>
Here the complexType element has one child which is 'simpleContent' (other child elements are allowed in MicroXSD and these will be looked at later on). That child 'simpleContent' itself has one required child element (the only allowed child element of 'simpleContent' in MicroXSD) which is called 'extension' which has a single, required attribute called 'base'. The 'base' attribute has a value which defines the datatype of the content (in the above example, the datatype of the content of the 'Hello' element). Allowed values for 'base' are string, decimal, integer, date, dateTime, boolean and base64Binary. These are some of the more commonly used standard datatypes for W3C XML Schema.
The above schema allows the XML
<Hello>Reader</Hello>
but does not allow
<Hello><Reader/></Hello>
because the latter contains another element named 'Reader' rather than just some text data of type 'string'.
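
To make the accept/reject behaviour concrete, here is a hand-rolled sketch of the check in Python using only the standard library. It is not a schema validator and not part of MicroXSD; the function name check_simple_content is made up for this illustration, and it only understands the one shape of schema shown above (a single top element with simple content of base type 'string'):

```python
import xml.etree.ElementTree as ET

NS = {"x": "http://www.w3.org/2001/XMLSchema"}

SCHEMA = """<schema xmlns="http://www.w3.org/2001/XMLSchema" version="0.1"
  attributeFormDefault="unqualified" elementFormDefault="qualified">
 <element name="Hello">
  <complexType>
   <simpleContent>
    <extension base="string"/>
   </simpleContent>
  </complexType>
 </element>
</schema>"""

def check_simple_content(schema_text, instance_text):
    """Check an instance against a 'simpleContent' string schema."""
    schema = ET.fromstring(schema_text)
    top = schema.find("x:element", NS)
    extension = top.find("x:complexType/x:simpleContent/x:extension", NS)
    if extension is None or extension.get("base") != "string":
        raise ValueError("this sketch only handles base='string'")
    instance = ET.fromstring(instance_text)
    # The top level element must have the declared name ...
    if instance.tag != top.get("name"):
        return False
    # ... and simple content means text only, no child elements.
    return len(instance) == 0

print(check_simple_content(SCHEMA, "<Hello>Reader</Hello>"))
print(check_simple_content(SCHEMA, "<Hello><Reader/></Hello>"))
```

A real W3C XML Schema processor does a great deal more than this, but the sketch shows how little is needed for the simplest MicroXSD case.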
The markup language, XML, allows both elements and attributes so in the next section we will look at how to add some attributes to this simple element.

Adding attributes

A very simple 'Hello World' example XML and a corresponding MicroXSD schema :
<Hello>World</Hello>
which may be constrained by MicroXSD schema
<schema xmlns="http://www.w3.org/2001/XMLSchema" version="0.1" attributeFormDefault="unqualified" elementFormDefault="qualified">
 <element name="Hello">
  <complexType>
   <simpleContent>
    <extension base="string"/>
   </simpleContent>
  </complexType>
 </element>
</schema>
XML can have attributes as well as elements so a Hello World example could be written:
<Greeting to="World">Hello</Greeting>

The 'attribute' Element: <attribute>

The 'complexType' allows us to specify whether an element has child elements and whether it has attributes. When there are no child elements to be added to the instance element we use 'complexType' with a 'simpleContent' child and inside that we place a child of 'simpleContent' called 'extension'. In MicroXSD the element 'simpleContent' can only have this 'extension' element as a child element.
So far we have defined the following
<Greeting>Hello</Greeting>
and we need to add the attribute named 'to' to give us
<Greeting to="World">Hello</Greeting>
We do this by adding 'attribute' elements as children of the 'extension' element. The 'extension' element in MicroXSD is limited to having the attribute 'base' (with values limited in MicroXSD to a range of commonly used datatypes: string, decimal, integer, date, dateTime, boolean and base64Binary) and the child element 'attribute'.
<schema xmlns="http://www.w3.org/2001/XMLSchema" version="0.1" attributeFormDefault="unqualified" elementFormDefault="qualified">
 <element name="Greeting">
  <complexType>
   <simpleContent>
    <extension base="string">
     <attribute name="to"/>
    </extension>
   </simpleContent>
  </complexType>
 </element>
</schema>
To show that the attribute itself has content of a particular datatype we can use the element 'simpleType'. With MicroXSD this is the one way to do it. With full W3C XML Schema there is an alternative way which is to use a 'type' attribute of the 'attribute' element but that is not included in MicroXSD. Likewise in full W3C XML Schema there are several ways to assign a datatype to a simple content element but only one applies in MicroXSD, as shown above.
<schema xmlns="http://www.w3.org/2001/XMLSchema" version="0.1" attributeFormDefault="unqualified" elementFormDefault="qualified">
 <element name="Greeting">
  <complexType>
   <simpleContent>
    <extension base="string">
     <attribute name="to">
      <simpleType>
       <restriction base="string"/>
      </simpleType>
     </attribute>
    </extension>
   </simpleContent>
  </complexType>
 </element>
</schema>

The 'simpleType' Element: <simpleType>

<simpleType>
 <restriction base="string"/>
</simpleType>
In MicroXSD the element called 'simpleType' has only one child element called 'restriction' and no attributes. The 'restriction' element has only one attribute in MicroXSD named 'base' which has the same function and list of values as the 'base' attribute of the 'extension' element described above.
Of course there can be more than one attribute for an element, though only one with any particular name (multiple attributes with the same name on the same element are not allowed). We could add another attribute to the example schema with the name 'from' like this:
<schema xmlns="http://www.w3.org/2001/XMLSchema" version="0.1" attributeFormDefault="unqualified" elementFormDefault="qualified">
 <element name="Greeting">
  <complexType>
   <simpleContent>
    <extension base="string">
     <attribute name="to">
      <simpleType>
        <restriction base="string"/>
      </simpleType>
     </attribute>
     <attribute name="from">
      <simpleType>
       <restriction base="string"/>
      </simpleType>
     </attribute>
    </extension>
   </simpleContent>
  </complexType>
 </element>
</schema>

This allows the further attribute 'from' to be added to the XML example:
<Greeting to="World" from="Mars">Hello</Greeting>
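
As a rough illustration of what the 'attribute' declarations buy us, the following Python sketch reads the attribute names out of the schema and checks that an instance uses no others. It uses only the standard library; the function name attributes_allowed is made up for this example, and the check ignores datatypes and everything else a real schema processor would look at:

```python
import xml.etree.ElementTree as ET

XSD = "{http://www.w3.org/2001/XMLSchema}"

SCHEMA = """<schema xmlns="http://www.w3.org/2001/XMLSchema" version="0.1"
  attributeFormDefault="unqualified" elementFormDefault="qualified">
 <element name="Greeting">
  <complexType>
   <simpleContent>
    <extension base="string">
     <attribute name="to">
      <simpleType><restriction base="string"/></simpleType>
     </attribute>
     <attribute name="from">
      <simpleType><restriction base="string"/></simpleType>
     </attribute>
    </extension>
   </simpleContent>
  </complexType>
 </element>
</schema>"""

schema = ET.fromstring(SCHEMA)
# Names of every attribute the schema declares.
declared = {a.get("name") for a in schema.iter(XSD + "attribute")}
print(sorted(declared))

def attributes_allowed(instance_text):
    """True if the instance's top element uses only declared attributes."""
    instance = ET.fromstring(instance_text)
    return set(instance.attrib) <= declared

print(attributes_allowed('<Greeting to="World" from="Mars">Hello</Greeting>'))
print(attributes_allowed('<Greeting via="My website">Hello</Greeting>'))
```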


The 'attribute' element (confusing language isn't it!) in MicroXSD can have just the one attribute 'name'.
For example:
<attribute name="to">...</attribute>
The 'name' assigns a name to the attribute in the XML instance and like the name of an element, the name of an attribute has to conform to the naming standard in XML 1.0 which means no numbers as first character but apart from the first character both alphanumeric characters and some other characters such as point '.' and underscore '_' are allowed in any order. Because MicroXSD supports in particular a subset of XML called MicroXML (see introduction), the attributes' (and elements') names do not include the 'prefix' allowed in standard XML. This is to eliminate multiple namespaces. This decreases the complexity required in MicroXSD significantly. Note: It has been proposed that there should be support in MicroXML for prefixes for attribute names but as yet this has not been included in MicroXSD (version 2012). The term given to standard element and attribute names in XML without the prefix is 'NCName' (as distinct from the term for a name qualified with a prefix and associated namespace which is a 'QName' and QNames are not supported in this version of MicroXSD).

Simply supporting complexity



Here is a very simple example XML and corresponding MicroXSD schema :
<Greeting>Hello</Greeting>
which may be constrained by MicroXSD schema


<schema xmlns="http://www.w3.org/2001/XMLSchema" attributeFormDefault="unqualified" elementFormDefault="qualified">
 <element name="Greeting">
  <complexType>
   <simpleContent>
    <extension base="string"></extension>
   </simpleContent>
  </complexType>
 </element>
</schema>
XML can have child elements within any given element so a Hello World example could be written:
<Greeting><Hello/></Greeting>
This requires that we add an element, albeit an empty one, to the top level element. We do this by adding 'sequence' or 'choice' elements as children of the 'complexType' element (instead of the 'simpleContent' element). The 'sequence' element in MicroXSD is limited to having no attributes but a range of possible child elements. Likewise the 'choice' element. Each of these elements, 'sequence' and 'choice' allow us to specify that an element has child elements. The difference between them is clear from their names: 'sequence' is used when the sequence of the child elements is fixed whereas 'choice' is used to show that there is a choice between certain child elements or sets of child elements. When there is just one child element it is best to use 'sequence'. In our case we just want one child element so we will use 'sequence'.


<schema xmlns="http://www.w3.org/2001/XMLSchema" attributeFormDefault="unqualified" elementFormDefault="qualified">
 <element name="Greeting">
  <complexType>
   <sequence>
    <element name="Hello">
     <complexType></complexType>
    </element>
   </sequence>
  </complexType>
 </element>
</schema>

Here we have the unusual situation of having an empty complexType to show that the child element 'Hello' is empty. This complexType in MicroXSD, though, can be any complexType allowed by MicroXSD.
Suppose we now wish to add an attribute to this child element, such as for an XML instance as follows:
<Greeting><Hello exclamation="true"/></Greeting>
We add an attribute using the 'attribute' element but we place this element after any 'sequence' or 'choice' elements. We cannot use an 'attribute' element like this when using the 'simpleContent' we used before but if there is neither a 'simpleContent' nor a 'sequence' or 'choice' we can still add 'attribute' elements. To show that the attribute itself has content of a particular datatype we can use the element 'simpleType'. In the example we can make the datatype 'boolean' to match the use of values 'true' and 'false'.

<schema xmlns="http://www.w3.org/2001/XMLSchema" attributeFormDefault="unqualified" elementFormDefault="qualified">
 <element name="Greeting">
  <complexType>
   <sequence>
    <element name="Hello">
     <complexType>
      <attribute name="exclamation">
       <simpleType>
        <restriction base="boolean"/>
       </simpleType>
      </attribute>
     </complexType>
    </element>
   </sequence>
  </complexType>
 </element>
</schema>

With full W3C XML Schema there are other options available besides 'sequence' and 'choice' but with MicroXSD the only children of the 'complexType' are 'simpleContent', 'sequence', 'choice' and, when 'simpleContent' is not used, the 'attribute' element - to keep things simple. However, many structures with varying degrees of complexity are possible even with MicroXSD because the 'sequence' and 'choice' elements themselves can have nested 'sequence' and 'choice' elements. So the possible children of both 'sequence' and 'choice' in MicroXSD are 'element', 'sequence' (nested) and 'choice' (nested). Multiple 'sequence' or 'choice' elements are not allowed side-by-side though, they can only be nested.
Here is a schema using MicroXSD with a little nesting
<schema xmlns="http://www.w3.org/2001/XMLSchema" attributeFormDefault="unqualified" elementFormDefault="qualified">
 <element name="Greeting">
  <complexType>
   <sequence>
    <element name="Hello">
     <complexType>
      <choice>
       <element name="World">
        <complexType></complexType>
       </element>
       <element name="Mars">
        <complexType></complexType>
       </element>
      </choice>
      <attribute name="exclamation">
       <simpleType>
        <restriction base="boolean"/>
       </simpleType>
      </attribute>
     </complexType>
    </element>
   </sequence>
  </complexType>
 </element>
</schema>
and this would allow XML instances that are a little more complex, like this
<Greeting>
 <Hello exclamation="true">
  <World/>
 </Hello>
</Greeting>
or this
<Greeting>
 <Hello exclamation="false">
  <Mars/>
 </Hello>
</Greeting>
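
The effect of the 'choice' can be sketched in a few lines of Python using only the standard library. This is a hand-written check for this one schema, not a general validator; the function name check_greeting is made up, and the list of accepted boolean spellings is taken from the W3C XML Schema boolean datatype:

```python
import xml.etree.ElementTree as ET

NS = {"x": "http://www.w3.org/2001/XMLSchema"}

SCHEMA = """<schema xmlns="http://www.w3.org/2001/XMLSchema"
  attributeFormDefault="unqualified" elementFormDefault="qualified">
 <element name="Greeting">
  <complexType>
   <sequence>
    <element name="Hello">
     <complexType>
      <choice>
       <element name="World"><complexType></complexType></element>
       <element name="Mars"><complexType></complexType></element>
      </choice>
      <attribute name="exclamation">
       <simpleType><restriction base="boolean"/></simpleType>
      </attribute>
     </complexType>
    </element>
   </sequence>
  </complexType>
 </element>
</schema>"""

schema = ET.fromstring(SCHEMA)
hello_decl = schema.find("x:element/x:complexType/x:sequence/x:element", NS)
# The element names offered by the 'choice'.
alternatives = [
    e.get("name")
    for e in hello_decl.findall("x:complexType/x:choice/x:element", NS)
]

def check_greeting(instance_text):
    greeting = ET.fromstring(instance_text)
    if greeting.tag != "Greeting" or len(greeting) != 1:
        return False
    hello = greeting[0]
    if hello.tag != hello_decl.get("name"):
        return False
    # 'choice' here means exactly one of the alternatives is present.
    if len(hello) != 1 or hello[0].tag not in alternatives:
        return False
    # The attribute is optional; if present it must be a boolean value.
    value = hello.get("exclamation")
    return value is None or value in ("true", "false", "1", "0")

print(check_greeting('<Greeting><Hello exclamation="true"><World/></Hello></Greeting>'))
print(check_greeting('<Greeting><Hello><World/><Mars/></Hello></Greeting>'))
```

The second instance is rejected because it contains both alternatives where the 'choice' allows only one.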

This concludes the description of the more complex uses of MicroXSD. The degree of complexity is unlimited but other aspects are limited by both the fact we are using a subset of W3C XML Schema and the fact that W3C XML Schema is itself somewhat limited in its feature set and supplemented by other related technologies such as ISO Schematron, RelaxNG, Test Assertions, formatting tables, plain text specifications and programming code. However, the fact that MicroXSD is a subset of W3C XML Schema means that adhering to the MicroXSD metaschema in your schema design should ensure that tools conforming to W3C XML Schema can readily handle a MicroXSD schema.

The MicroXSD 'Metaschema' (the schema defining a MicroXSD schema)

<?xml version="1.0" encoding="UTF-8"?>
<schema targetNamespace="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified" attributeFormDefault="unqualified" version="2012.02" xmlns="http://www.w3.org/2001/XMLSchema">
 <!-- MicroXSD 2012.02 -->
 <!-- -->
 <element name="schema">
  <complexType>
   <sequence>
    <element name="element" minOccurs="0">
     <complexType>
      <sequence>
       <element name="complexType" type="complexType_type"/>
      </sequence>
      <attribute name="name" type="NCName" use="required"/>
     </complexType>
    </element>
   </sequence>
   <attribute name="version" type="string" use="optional"/>
   <attribute name="attributeFormDefault" use="required" fixed="unqualified"/>
   <attribute name="elementFormDefault" use="required" fixed="qualified"/>
   <attribute name="targetNamespace" type="string" use="optional"/>
  </complexType>
 </element>
 <complexType name="element_type">
  <sequence>
   <element name="complexType" type="complexType_type"/>
  </sequence>
  <attribute name="name" type="NCName" use="required"/>
  <attribute name="minOccurs" use="optional">
   <simpleType>
    <restriction base="string">
     <enumeration value="0"/>
     <enumeration value="1"/>
    </restriction>
   </simpleType>
  </attribute>
  <attribute name="maxOccurs" use="optional">
   <simpleType>
    <restriction base="string">
     <enumeration value="1"/>
     <enumeration value="unbounded"/>
    </restriction>
   </simpleType>
  </attribute>
 </complexType>
 <simpleType name="base_type">
  <restriction base="string">
   <enumeration value="string"/>
   <enumeration value="decimal"/>
   <enumeration value="integer"/>
   <enumeration value="date"/>
   <enumeration value="dateTime"/>
   <enumeration value="boolean"/>
   <enumeration value="base64Binary"/>
  </restriction>
 </simpleType>
 <group name="element_sequence_choice">
  <choice>
   <element name="element" type="element_type"/>
   <group ref="sequence_choice"/>
  </choice>
 </group>
 <complexType name="restriction_type">
  <attribute name="base" type="base_type" use="required"/>
 </complexType>
 <complexType name="attribute_type">
  <sequence>
   <element name="simpleType" type="simpleType_type"/>
  </sequence>
  <attribute name="use" use="optional">
   <simpleType>
    <restriction base="string">
     <enumeration value="optional"/>
     <enumeration value="required"/>
    </restriction>
   </simpleType>
  </attribute>
  <attribute name="name" type="NCName" use="required"/>
 </complexType>
 <complexType name="extension_type">
  <sequence>
   <element name="attribute" type="attribute_type" minOccurs="0" maxOccurs="unbounded"/>
  </sequence>
  <attribute name="base" type="base_type" use="required"/>
 </complexType>
 <complexType name="complexType_type">
  <choice>
   <element name="simpleContent">
    <complexType>
     <sequence>
      <element name="extension" type="extension_type"/>
     </sequence>
    </complexType>
   </element>
   <sequence>
    <group ref="sequence_choice" minOccurs="0"/>
    <element name="attribute" type="attribute_type" minOccurs="0" maxOccurs="unbounded"/>
   </sequence>
  </choice>
  <attribute name="mixed" use="optional">
   <simpleType>
    <restriction base="string">
     <enumeration value="true"/>
     <enumeration value="false"/>
    </restriction>
   </simpleType>
  </attribute>
 </complexType>
 <complexType name="simpleType_type">
  <sequence>
   <element name="restriction" type="restriction_type"/>
  </sequence>
 </complexType>
 <group name="sequence_choice">
  <choice>
   <element name="sequence">
    <complexType>
     <group ref="element_sequence_choice" maxOccurs="unbounded"/>
    </complexType>
   </element>
   <element name="choice">
    <complexType>
     <group ref="element_sequence_choice" maxOccurs="unbounded"/>
    </complexType>
   </element>
  </choice>
 </group>
</schema>
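
To illustrate the 'simpleContent' branch of 'complexType_type' in the metaschema, here is a small sketch of a schema that I believe fits it: an element with text content of one of the allowed base types plus one attribute. The names 'Amount' and 'currencyID' are assumptions chosen for this example and are not part of MicroXSD itself:

```xml
<schema xmlns="http://www.w3.org/2001/XMLSchema" attributeFormDefault="unqualified" elementFormDefault="qualified">
 <element name="Amount">
  <complexType>
   <simpleContent>
    <!-- 'base' must be one of the values enumerated in 'base_type' -->
    <extension base="decimal">
     <!-- each attribute carries its own simpleType/restriction, as 'attribute_type' requires -->
     <attribute name="currencyID" use="required">
      <simpleType>
       <restriction base="string"/>
      </simpleType>
     </attribute>
    </extension>
   </simpleContent>
  </complexType>
 </element>
</schema>
```

An instance along these lines should then be allowed:

```xml
<Amount currencyID="EUR">12.50</Amount>
```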