From Schemas to Overlays: XML, Epischemas, and a Minimal Viable Product Line Architecture
Written by ChatGPT, as prompted by Stephen D Green, June 2026
A recurring assumption in XML systems is that a document has a schema and that the primary task is to determine whether the document is valid according to that schema. This assumption has shaped much of the XML ecosystem, from XML Schema (XSD) and RELAX NG to validation tools and editor support.
Yet many real-world document ecosystems do not work this way.
Instead, they contain a stable, long-lived document model surrounded by numerous contextual interpretations. Different organizations, departments, workflows, jurisdictions, industries, and applications apply different constraints to the same underlying representation. The document remains recognizable and interoperable, while its contextual interpretation evolves.
This observation suggests a broader architectural pattern.
A COBOL Analogy
The idea is reminiscent of COBOL's REDEFINES facility.
With REDEFINES, the same storage can be interpreted according to different record layouts. The underlying bytes remain unchanged. What changes is the interpretation applied to them.
The significance of REDEFINES is not polymorphism, inheritance, or object orientation. Rather, it is the separation of representation from interpretation.
A stable representation can participate in multiple legitimate interpretations.
While XML is fundamentally different from memory-oriented programming languages, a similar principle can be applied at the document level.
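The separation of representation from interpretation can be illustrated outside COBOL. The following is a minimal Python sketch of the analogy, not COBOL semantics: the ten-byte record and both field layouts are invented for illustration.

```python
import struct

# Ten bytes of "storage". The content and both layouts are hypothetical.
record = b"2026061542"

# Interpretation A: a single 10-character text field.
(as_text,) = struct.unpack("10s", record)

# Interpretation B: year (4), month (2), day (2), code (2) over the same bytes.
year, month, day, code = struct.unpack("4s2s2s2s", record)

print(as_text)
print(int(year), int(month), int(day), code)
```

The bytes in `record` are never copied or modified; only the format string, which stands in for the record layout, changes.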
The Conventional XML View
Consider a simple business document:
<BusinessDocument>
  <DocumentID>...</DocumentID>
  <DocumentDate>...</DocumentDate>
  <Sender>...</Sender>
  <Receiver>...</Receiver>
  <Items>...</Items>
  <Total>...</Total>
</BusinessDocument>
Traditional schema thinking tends to ask:
What schema validates this document?
However, many practical situations are better described by a different question:
What stable structure does this document conform to, and what contextual constraints are currently being applied?
For example:
- An automatically generated invoice may require a UUID as its DocumentID.
- A manually generated invoice may permit free-form identifiers.
- A legacy order system may require a different identifier format.
- A particular jurisdiction may impose additional constraints.
- A trading partner may impose its own profile.
The structure remains unchanged. The contextual constraints differ.
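The two questions can be kept apart in code. The sketch below, using only Python's standard library, checks the stable structure once and then applies a contextual rule to DocumentID. The context names ("automated-invoice", "manual-invoice"), the sample values, and the UUID pattern are assumptions made for illustration.

```python
import re
import xml.etree.ElementTree as ET

DOC = """<BusinessDocument>
  <DocumentID>3f2b8c1e-9d4a-4c6b-8e2f-1a2b3c4d5e6f</DocumentID>
  <DocumentDate>2026-06-15</DocumentDate>
  <Sender>A</Sender>
  <Receiver>B</Receiver>
  <Items/>
  <Total>0.00</Total>
</BusinessDocument>"""

# The stable structure: the child elements of the sample document, in order.
EXPECTED_CHILDREN = ["DocumentID", "DocumentDate", "Sender",
                     "Receiver", "Items", "Total"]

UUID_RE = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}", re.I)

# Hypothetical contextual constraints on DocumentID, one per context.
CONTEXTS = {
    "automated-invoice": lambda doc_id: UUID_RE.fullmatch(doc_id) is not None,
    "manual-invoice": lambda doc_id: len(doc_id.strip()) > 0,
}

def conforms_to_base(root):
    """Structural check only; says nothing about contextual rules."""
    return (root.tag == "BusinessDocument"
            and [child.tag for child in root] == EXPECTED_CHILDREN)

def check(xml_text, context):
    """Return (structure_ok, context_ok) as two separate answers."""
    root = ET.fromstring(xml_text)
    doc_id = root.findtext("DocumentID", default="")
    return conforms_to_base(root), CONTEXTS[context](doc_id)

manual_doc = DOC.replace("3f2b8c1e-9d4a-4c6b-8e2f-1a2b3c4d5e6f", "INV-001")
print(check(DOC, "automated-invoice"))
print(check(manual_doc, "automated-invoice"))
print(check(manual_doc, "manual-invoice"))
```

The same document with a free-form identifier stays structurally conformant in every context; only the contextual answer changes.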
Base Schemas and Epischemas
An interesting proposal in the XML community is the concept of an "epischema," described by Gerrit Imsieke in “Epischema – Schema Constraints That Facilitate Content Completion” (XML.com, 29 April 2017).
The central idea is that a base schema defines the general structure, while one or more epischemas apply additional restrictions without modifying the base schema itself.
The epischema is not a replacement for the base schema. It is an overlay.
This immediately suggests a more general architecture:
Base Schema → Canonical Product
Epischema → Product Line Overlay
The base schema provides stability.
The overlay provides specialization.
The base schema can remain stable for many years while overlays evolve, proliferate, and occasionally disappear.
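As a toy model of the overlay idea (not the mechanism described in Imsieke's article), a base can be represented as sets of allowed values and an overlay as a narrowing of those sets. The field names and values below are hypothetical; the point of the sketch is that the overlay may only restrict and that the base is never modified.

```python
from typing import Dict, Set

Constraints = Dict[str, Set[str]]

# Hypothetical base constraints: allowed values per field.
BASE: Constraints = {
    "Currency": {"EUR", "USD", "GBP", "JPY"},
    "DocumentType": {"Order", "Invoice"},
}

def apply_overlay(base: Constraints, overlay: Constraints) -> Constraints:
    """Return a narrowed copy of base; base itself is left untouched."""
    unknown = set(overlay) - set(base)
    if unknown:
        raise ValueError(f"overlay names unknown fields: {sorted(unknown)}")
    result: Constraints = {}
    for field, allowed in base.items():
        narrowed = overlay.get(field, allowed)
        # An overlay may only restrict, never widen, what the base permits.
        if not narrowed <= allowed:
            raise ValueError(
                f"overlay widens {field}: {sorted(narrowed - allowed)}")
        result[field] = set(narrowed)
    return result

eurozone = apply_overlay(BASE, {"Currency": {"EUR"}})
print(sorted(eurozone["Currency"]), sorted(BASE["Currency"]))
```

Because every overlay result is a subset of the base, anything accepted under an overlay is also accepted under the base, which is what lets overlays come and go without affecting the shared contract.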
A Minimal Viable Product Line Architecture
Viewed through the lens of a Minimal Viable Product Line (MVPL) architecture, the pattern becomes even clearer.
The canonical schema becomes a Product.
It defines:
- the vocabulary;
- the structural relationships;
- the major business concepts;
- the long-term interoperability contract.
The overlays become Product Lines.
They define:
- industry profiles;
- workflow profiles;
- regulatory requirements;
- trading-partner agreements;
- organizational policies;
- experimental extensions.
The Product remains stable.
The Product Lines evolve.
This separation allows innovation without destabilizing the shared foundation.
Public and Private Overlays
Not all overlays are equal.
Some are public.
A public overlay may represent:
- an industry profile;
- a regulatory profile;
- a standards-body specification;
- a trading-partner agreement.
These overlays become part of the interoperability contract between independent organizations.
Other overlays are private.
Different departments within the same organization may apply different constraints to the same document:
- accounting overlays;
- logistics overlays;
- analytics overlays;
- archival overlays.
These need not be visible outside the organization.
The architecture therefore naturally separates into layers:
Canonical Product
↓
Public Overlays
↓
Private Overlays
Each layer has different governance, funding, and lifecycle requirements.
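The layering can be sketched as an ordered evaluation in which public overlays are checked before private ones and every finding is tagged with its layer. This is an illustrative sketch only: the Overlay class, the rule functions, and the flat dictionary standing in for a parsed document are all assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Doc = Dict[str, str]  # simplification: a flat stand-in for a parsed document

@dataclass(frozen=True)
class Overlay:
    name: str
    layer: str                      # "public" or "private"
    rule: Callable[[Doc], bool]     # True when the document satisfies the rule
    message: str

LAYER_ORDER = {"public": 0, "private": 1}

def evaluate(doc: Doc, overlays: List[Overlay]) -> List[str]:
    """Apply public overlays first, then private ones; collect findings."""
    findings = []
    for ov in sorted(overlays, key=lambda o: LAYER_ORDER[o.layer]):
        if not ov.rule(doc):
            findings.append(f"[{ov.layer}] {ov.name}: {ov.message}")
    return findings

# Hypothetical overlays, deliberately listed out of layer order.
OVERLAYS = [
    Overlay("accounting", "private",
            lambda d: bool(d.get("CostCentre")), "cost centre required"),
    Overlay("partner-profile", "public",
            lambda d: d.get("DocumentID", "").startswith("INV-"),
            "DocumentID must start with INV-"),
]

for finding in evaluate({"DocumentID": "X1"}, OVERLAYS):
    print(finding)
```

Tagging each finding with its layer makes it possible to report public findings to external partners while keeping private findings inside the organization.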
Governance and Funding
This distinction extends beyond technology.
The canonical product behaves like shared infrastructure.
It requires:
- long-term governance;
- version management;
- architectural stewardship;
- compatibility management;
- stable funding.
The overlays are different.
Many overlays may be funded simply as part of the projects that require them.
Some may be temporary.
Some may be experimental.
Some may never be published.
The architecture therefore aligns technical structure with economic reality.
The shared infrastructure receives long-term stewardship.
The contextual overlays receive local funding and local control.
Beyond Validation
The most interesting implication is that validation becomes only part of the story.
Traditional XML validation asks:
Valid?
Yes or No
An overlay architecture asks a different question:
What interpretation emerges
when these overlays are applied?
This resembles compiler architecture more than traditional validation.
A compiler transforms:
Source Text
↓
Parse Tree
↓
Abstract Syntax Tree
↓
Typed Semantic Model
Similarly, a document-processing architecture might produce:
XML Instance
↓
Canonical Infoset
↓
Overlay Application
↓
Contextual Infoset
↓
Business Interpretation
The result is not merely a validation outcome but an enriched interpretation.
Different overlays may generate different contextual infosets from the same canonical representation.
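One way to picture a contextual infoset is as the canonical content plus overlay-specific annotations. In the sketch below the overlay functions, the annotation keys, and the sample document are hypothetical; the result of applying an overlay is an enriched dictionary rather than a yes/no answer.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

DOC = ("<BusinessDocument><DocumentID>INV-001</DocumentID>"
       "<Items><Item/><Item/></Items><Total>12.50</Total></BusinessDocument>")

def accounting_overlay(root):
    # Hypothetical: accounting reads Total as a decimal amount.
    return {"Total": {"type": "amount",
                      "value": Decimal(root.findtext("Total"))}}

def logistics_overlay(root):
    # Hypothetical: logistics cares about the number of item lines.
    return {"Items": {"type": "shipment-lines",
                      "count": len(root.find("Items"))}}

def interpret(xml_text, overlays):
    """Build a canonical view, then let each overlay annotate it."""
    root = ET.fromstring(xml_text)
    infoset = {child.tag: {"text": (child.text or "").strip()}
               for child in root}
    for overlay in overlays:
        for tag, annotations in overlay(root).items():
            infoset[tag].update(annotations)
    return infoset

accounting_view = interpret(DOC, [accounting_overlay])
logistics_view = interpret(DOC, [logistics_overlay])
print(accounting_view["Total"])
print(logistics_view["Items"])
```

The two views are built from the same XML text, yet each carries only the annotations its overlay contributes.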
Stable Assumptions Rather Than Invariants
This perspective also changes how we think about standards.
The term "invariant" is often used to describe properties that remain true throughout a process.
However, many standards are not invariant in a strict mathematical sense.
A4 paper dimensions are not physically invariant. Paper can be cut, folded, or destroyed.
The value of the standard comes from a shared assumption.
Participants can rely on it.
Automation can be built around it.
Coordination becomes possible.
Similarly, a canonical document model provides stable assumptions for an ecosystem.
An order may become an invoice.
Datatype constraints may change.
Business state may change.
Workflow context may change.
Yet participants continue to rely upon the stable assumptions embodied in the canonical model.
The real value of standardization lies not in immutability but in the creation of reliable assumptions that independent actors can share.
Toward a First-Class Overlay Architecture
The XML ecosystem already contains many of the necessary technologies:
- RELAX NG
- Schematron
- NVDL
- XProc
- XML Catalogs
- xml-model processing instructions
- Genericode
- PSVI-style augmentation
What is largely missing is a unifying architectural model.
Rather than thinking in terms of one document and one schema, future XML tooling could explicitly recognize:
Instance
↓
Canonical Product
↓
Overlay Selection
↓
Overlay Composition
↓
Contextual Infoset
↓
Business Interpretation
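The selection and composition steps in this pipeline might be sketched as a registry lookup followed by a combined check. Everything named here (the registry entries, the context keys, and the rule functions) is hypothetical, and a flat dictionary again stands in for the parsed document.

```python
from typing import Callable, Dict, List, NamedTuple

Doc = Dict[str, str]

class RegisteredOverlay(NamedTuple):
    name: str
    applies_to: Dict[str, str]       # context keys that must all match
    check: Callable[[Doc], bool]     # True when the document passes

# Hypothetical registry of overlays and the contexts they apply to.
REGISTRY = [
    RegisteredOverlay("de-tax", {"jurisdiction": "DE"},
                      lambda d: "TaxID" in d),
    RegisteredOverlay("acme-profile", {"partner": "acme"},
                      lambda d: d.get("DocumentID", "").startswith("AC-")),
    RegisteredOverlay("fr-tax", {"jurisdiction": "FR"},
                      lambda d: "SIRET" in d),
]

def select(registry: List[RegisteredOverlay],
           context: Dict[str, str]) -> List[RegisteredOverlay]:
    """Overlay selection: keep overlays whose conditions match the context."""
    return [o for o in registry
            if all(context.get(k) == v for k, v in o.applies_to.items())]

def compose(overlays: List[RegisteredOverlay]) -> Callable[[Doc], List[str]]:
    """Overlay composition: one function reporting every failed overlay."""
    def combined(doc: Doc) -> List[str]:
        return [o.name for o in overlays if not o.check(doc)]
    return combined

selected = select(REGISTRY, {"jurisdiction": "DE", "partner": "acme"})
failed = compose(selected)({"DocumentID": "AC-7"})
print([o.name for o in selected], failed)
```

Keeping selection separate from composition means the same registry can serve many contexts, with the context alone deciding which overlays take part.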
In such a world, validation overlays become first-class citizens.
The canonical product provides stability.
The overlays provide adaptability.
The resulting architecture supports long-lived standards, evolving business requirements, organizational diversity, and perhaps even AI-generated contextual interpretations.
The XML document remains stable.
The interpretations evolve.
That may be the closest document-oriented analogue to what COBOL REDEFINES achieved decades ago in the world of data structures.
Reference: Gerrit Imsieke, “Epischema – Schema Constraints That Facilitate Content Completion”, XML.com, 29 April 2017. https://www.xml.com/articles/2017/04/29/epischemas/