The COBOL REDEFINES facility and the Natural REDEFINE facility are often described superficially as forms of polymorphism, but this is not really accurate in the modern software engineering sense. In object-oriented programming, polymorphism refers primarily to the ability for one interface or operation to exhibit different behavior depending on the type of the object involved. COBOL REDEFINES is instead fundamentally about alternate interpretations of the same underlying storage. A region of bytes can be viewed through multiple different structural definitions, each imposing different datatype constraints, field boundaries, and decoding rules. The underlying storage does not change; only the interpretation changes.
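The storage-overlay idea can be sketched outside COBOL. The following is a minimal Python illustration, not COBOL semantics in full: the 12-byte record, the two layouts, and the field names are all assumptions made up for this example. It only shows one unchanged byte region being decoded through two different structural definitions.

```python
import struct

# One fixed 12-byte storage area (illustrative data, not from any real system).
record = b"20260514ABCD"

# Layout 1: an 8-character text date followed by a 4-character code.
DATE_AND_CODE = struct.Struct("8s4s")

# Layout 2: the same 12 bytes split into year, month, day, and code.
YMD_AND_CODE = struct.Struct("4s2s2s4s")


def as_date_and_code(buf: bytes) -> dict:
    """Decode the buffer through layout 1; both fields stay text."""
    date, code = DATE_AND_CODE.unpack(buf)
    return {"date": date.decode("ascii"), "code": code.decode("ascii")}


def as_ymd_and_code(buf: bytes) -> dict:
    """Decode the buffer through layout 2; the date parts become integers."""
    year, month, day, code = YMD_AND_CODE.unpack(buf)
    return {
        "year": int(year),
        "month": int(month),
        "day": int(day),
        "code": code.decode("ascii"),
    }


view1 = as_date_and_code(record)
view2 = as_ymd_and_code(record)
```

Neither function copies or modifies `record`; the two results differ only because the field boundaries and decoding rules differ.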
This distinction becomes particularly interesting when considering XML and schema languages. XML is normally understood as a self-describing document format, and in schema-centric practice it is also treated as intrinsically typed. An XML document is generally expected to carry enough structural information that validators and processors know which interpretation applies. XML Schema Definition (XSD), especially, strongly encourages this model through namespaces, global element declarations, type derivation, and xsi:type. Yet there is another possible architectural direction, one that resembles COBOL REDEFINES not at the level of memory bytes and offsets, but at the level of canonical serialized document structures.
Consider a generic business document structure:
<BusinessDocument>
  <DocumentID>...</DocumentID>
  <DocumentDate>...</DocumentDate>
  <Sender>...</Sender>
  <Receiver>...</Receiver>
  <DocumentItem>...</DocumentItem>
</BusinessDocument>
One profile might interpret this as an order, constraining DocumentID to be an alphanumeric token. Another profile might interpret it as an automatically generated invoice, constraining DocumentID to be a UUID. A third might interpret it as a manually generated invoice, allowing special characters and whitespace in DocumentID. Importantly, the semantics are not the most interesting aspect here. The crucial idea is that the same serialized byte stream is subjected to different typing and validation overlays depending on context.
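As a rough sketch of the three profiles just described, the following Python applies three different constraints to one unchanged byte stream. The profile names and regular expressions are assumptions chosen for illustration, and the regex check is only a stand-in for real schema validation.

```python
import re
import xml.etree.ElementTree as ET

# The serialized document is fixed; only the overlay applied to it varies.
DOCUMENT = (
    b"<BusinessDocument>"
    b"<DocumentID>550e8400-e29b-41d4-a716-446655440000</DocumentID>"
    b"</BusinessDocument>"
)

# Hypothetical profiles: each maps to a pattern that DocumentID must fully match.
PROFILES = {
    # Order: alphanumeric token only.
    "order": re.compile(r"[A-Za-z0-9]+"),
    # Automatically generated invoice: UUID syntax (8-4-4-4-12 hex digits).
    "uuid-invoice": re.compile(
        r"[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}"
    ),
    # Manually generated invoice: any non-empty text, including whitespace.
    "manual-invoice": re.compile(r".+", re.DOTALL),
}


def is_valid(document: bytes, profile: str) -> bool:
    """Parse the same bytes and check DocumentID against the chosen profile."""
    root = ET.fromstring(document)
    doc_id = root.findtext("DocumentID", default="")
    return PROFILES[profile].fullmatch(doc_id) is not None


results = {name: is_valid(DOCUMENT, name) for name in PROFILES}
```

With this input the UUID value is rejected by the order overlay because of its hyphens, and accepted by the other two, although the bytes are identical in all three cases.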
At first glance, this might appear similar to XML polymorphism through xsi:type. One could imagine something like:
<DocumentID xsi:type="UUIDInvoiceID">550e8400-e29b-41d4-a716-446655440000</DocumentID>
or even assigning UUID-like type identifiers analogous to COM class IDs or GUIDs. A schema could define many such types and validators could dispatch accordingly. Technically this works, because xsi:type enables runtime substitution of derived types. However, this is not especially satisfying architecturally because the type binding is embedded directly into the instance document itself. The document effectively self-declares its interpretation. This is early-bound, intrinsic typing rather than externally applied interpretation.
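For contrast with what follows, the early-bound approach can be sketched as a dispatch on the type name that the instance itself declares. This is an illustration under assumptions: the type names and patterns are hypothetical, and a real XSD processor would resolve xsi:type as a qualified name against schema type definitions rather than look it up in a dictionary.

```python
import re
import xml.etree.ElementTree as ET
from typing import Tuple

XSI = "http://www.w3.org/2001/XMLSchema-instance"

INSTANCE = (
    '<DocumentID xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" '
    'xsi:type="UUIDInvoiceID">550e8400-e29b-41d4-a716-446655440000</DocumentID>'
)

# Hypothetical type names mapped to lexical checks (stand-ins for schema types).
TYPE_CHECKS = {
    "UUIDInvoiceID": re.compile(
        r"[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}"
    ),
    "OrderID": re.compile(r"[A-Za-z0-9]+"),
}


def check_by_declared_type(xml_text: str) -> Tuple[str, bool]:
    """Read the type the instance declares for itself and validate against it."""
    elem = ET.fromstring(xml_text)
    declared = elem.get(f"{{{XSI}}}type")
    if declared is None:
        raise ValueError("instance carries no xsi:type attribute")
    if declared not in TYPE_CHECKS:
        raise LookupError(f"unknown type name: {declared}")
    text = (elem.text or "").strip()
    return declared, TYPE_CHECKS[declared].fullmatch(text) is not None


outcome = check_by_declared_type(INSTANCE)
```

The function takes no profile argument at all: the only way to change the interpretation is to edit the document, which is exactly the limitation discussed here.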
A more interesting possibility is externalized or late-bound schema application. In this model, the XML instance remains structurally neutral. The interpretation is selected later by the processing environment. One simple mechanism for this would be XML processing instructions:
<?document-profile uuid-invoice?>
<BusinessDocument>
  <DocumentID>550e8400-e29b-41d4-a716-446655440000</DocumentID>
</BusinessDocument>
A processing pipeline could examine the processing instruction, select the appropriate schema set, and validate the document accordingly. Another processing instruction could select a different schema profile entirely:
<?document-profile manual-invoice?>
In this architecture, the elements and attributes of the document carry no type binding. The processing instruction is only a routing hint: it can be changed, removed, or replaced by out-of-band context without touching the content. Type interpretation is therefore externalized into the processing pipeline. This begins to resemble COBOL REDEFINES much more closely, though at the level of serialized document interpretation rather than memory overlays.
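A minimal sketch of the selection step, using Python's standard xml.dom.minidom, might look like the following. The schema file paths and the `fallback` parameter are assumptions for illustration; the fallback stands for externally supplied context when the document carries no processing instruction.

```python
from xml.dom import Node, minidom

# Hypothetical mapping from profile name to schema file; paths are illustrative.
PROFILE_SCHEMAS = {
    "uuid-invoice": "profiles/uuid-invoice.rng",
    "manual-invoice": "profiles/manual-invoice.rng",
}


def select_profile(xml_text):
    """Return the data of the first top-level document-profile PI, or None."""
    doc = minidom.parseString(xml_text)
    for node in doc.childNodes:
        if (node.nodeType == Node.PROCESSING_INSTRUCTION_NODE
                and node.target == "document-profile"):
            return node.data.strip()
    return None


def schema_for(xml_text, fallback=None):
    """Pick a schema from the PI, or from external context if no PI is present."""
    profile = select_profile(xml_text) or fallback
    if profile not in PROFILE_SCHEMAS:
        raise LookupError(f"no schema registered for profile: {profile!r}")
    return PROFILE_SCHEMAS[profile]


with_pi = (
    "<?document-profile uuid-invoice?>"
    "<BusinessDocument><DocumentID>x</DocumentID></BusinessDocument>"
)
without_pi = "<BusinessDocument><DocumentID>x</DocumentID></BusinessDocument>"

chosen_from_pi = schema_for(with_pi)
chosen_from_context = schema_for(without_pi, fallback="manual-invoice")
```

The element content is identical in both calls; only the routing information differs, and in the second call it comes entirely from outside the document.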
Importantly, the XML Infoset itself does not necessarily change in this process. The XML Information Set is simply the parsed abstract representation of the document: elements, attributes, namespaces, character data, processing instructions, and so on. The underlying Infoset may remain identical while different validation profiles impose different typing overlays on top of it. What changes instead is the Post-Schema-Validation Infoset, or PSVI. The same canonical Infoset may yield multiple alternative PSVIs depending on which schema set is applied. One schema may annotate DocumentID as a UUID type. Another may annotate it as a legacy invoice identifier. Yet another may classify it as a free-form manually entered identifier. In this sense, the same parsed document tree acquires different type projections depending on externally selected constraints.
This architectural direction begins to resemble the way OASIS genericode is used, typically together with OASIS Context/Value Association (CVA). Genericode represents code lists as standalone XML documents, and CVA associates those lists with contexts in an instance document, so the permitted values are maintained and applied outside the structural schema. The XML itself remains relatively generic while interpretation rules are supplied externally through profiles, code lists, and constraint layers. This is philosophically very different from mainstream XSD-centric XML architectures, which generally assume that documents are intrinsically typed and self-identifying through namespaces and schema bindings.
The resemblance to older SGML architectural ideas is also striking. SGML often treated structure, validation, semantics, rendering, and processing context as distinct layers rather than collapsing them into a single intrinsic type system. Modern XML tooling often conflates these concerns through namespaces and schema declarations. What emerges here instead is a layered architecture in which the XML instance is merely a canonical syntax tree and interpretation is projected onto it later through validation overlays.
This becomes especially compelling when considering RELAX NG. RELAX NG was intentionally designed as a simpler, more orthogonal schema language than XSD. It is less tied to an intrinsic type system and more oriented toward structural grammar validation. The RELAX NG DTD Compatibility specification additionally defines annotations for default attribute values and describes the corresponding Infoset augmentation. A processor that implements this feature can therefore not only validate a document but also augment its Infoset by inserting defaulted attributes. Support for this feature varies among validators, Jing included, so it should be confirmed for the chosen toolchain rather than assumed.
This changes the nature of schema processing considerably. Schemas cease to be merely passive accept-or-reject grammars and instead begin acting as active overlay transformations. One RELAX NG profile might inject attributes identifying a UUID invoice profile, while another profile might inject attributes corresponding to manually generated invoices. The same canonical XML document can therefore produce different augmented Infosets depending on which profile is externally applied.
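The overlay-as-transformation idea can be sketched in a few lines of Python. This simulates the effect of profile-specific attribute defaults and does not invoke a RELAX NG processor; the attribute names `idScheme` and `generated` and the profile table are hypothetical.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical per-profile defaults: element name -> attributes added if absent.
PROFILE_DEFAULTS = {
    "uuid-invoice": {
        "DocumentID": {"idScheme": "uuid", "generated": "auto"},
    },
    "manual-invoice": {
        "DocumentID": {"idScheme": "free-text", "generated": "manual"},
    },
}


def augment(root: ET.Element, profile: str) -> ET.Element:
    """Return an augmented copy of the tree; the source tree is left untouched."""
    augmented = copy.deepcopy(root)
    defaults = PROFILE_DEFAULTS[profile]
    for elem in augmented.iter():
        for name, value in defaults.get(elem.tag, {}).items():
            # Like a schema default, only fill in attributes that are missing.
            if name not in elem.attrib:
                elem.set(name, value)
    return augmented


source = ET.fromstring(
    "<BusinessDocument>"
    "<DocumentID>550e8400-e29b-41d4-a716-446655440000</DocumentID>"
    "</BusinessDocument>"
)
serialized_before = ET.tostring(source)

auto_view = augment(source, "uuid-invoice")
manual_view = augment(source, "manual-invoice")
serialized_after = ET.tostring(source)
```

Each call yields a differently annotated tree, while the source tree serializes to the same bytes before and after.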
At this point the architecture begins to resemble compiler pipelines more than ordinary XML validation. One can imagine the following processing flow:
Canonical XML document → processing instruction or external context → profile selection → RELAX NG validation and augmentation → augmented Infoset → downstream typed interpretation.
This is essentially late-bound schema projection. The XML document itself remains minimally typed and structurally stable while multiple external overlays provide different validation, augmentation, and interpretation layers. XSD can be applied externally, since xsi:schemaLocation is only a hint and a processor may be handed any schema, but its idioms (global element declarations, namespace-bound types, xsi:type) push toward treating type identity as intrinsic to the document. RELAX NG, Schematron, genericode, and XProc together form a much more flexible ecosystem for this kind of externally projected typing model.
XProc is particularly well suited to this approach because it was designed specifically as a pipeline orchestration language for XML processing. An XProc pipeline can inspect processing instructions, select schema profiles dynamically, invoke RELAX NG validators, apply Schematron rules, augment Infosets, and route documents through multiple validation overlays. This makes it possible to construct sophisticated multi-stage interpretation systems in which schema binding occurs operationally at runtime rather than being statically embedded into the document.
Schematron complements this especially well because it provides rule-based contextual validation rather than merely grammatical validation. A Schematron layer can express assertions such as “if profile equals UUID invoice then DocumentID must match UUID syntax” or “if profile equals manual invoice then DocumentID may contain special characters.” This creates an architecture where canonical XML structure is separated from contextual validation overlays, much like alternate record definitions in older mainframe systems.
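The shape of such profile-conditional assertions can be sketched in Python as follows. This imitates the rule-and-assertion structure of Schematron without using a Schematron engine. The rule set is an assumption: in particular, "must not be empty" is a made-up stand-in for the manual-invoice profile, which above only grants a permission rather than stating a constraint.

```python
import re
import xml.etree.ElementTree as ET
from typing import Callable, List, NamedTuple

UUID_PATTERN = re.compile(
    r"[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}"
)


class Rule(NamedTuple):
    profile: str                          # profile the rule applies to
    test: Callable[[ET.Element], bool]    # returns True when the assertion holds
    message: str                          # reported when the assertion fails


# Hypothetical rules, one per profile.
RULES = [
    Rule(
        "uuid-invoice",
        lambda doc: UUID_PATTERN.fullmatch(doc.findtext("DocumentID", "")) is not None,
        "DocumentID must match UUID syntax",
    ),
    Rule(
        "manual-invoice",
        lambda doc: doc.findtext("DocumentID", "").strip() != "",
        "DocumentID must not be empty",
    ),
]


def failed_assertions(doc: ET.Element, profile: str) -> List[str]:
    """Run only the rules for the selected profile and collect failure messages."""
    return [
        rule.message
        for rule in RULES
        if rule.profile == profile and not rule.test(doc)
    ]


manual_doc = ET.fromstring(
    "<BusinessDocument><DocumentID>INV 2026/05-A</DocumentID></BusinessDocument>"
)
as_uuid_invoice = failed_assertions(manual_doc, "uuid-invoice")
as_manual_invoice = failed_assertions(manual_doc, "manual-invoice")
```

The same document fails under one profile and passes under the other, and the profile is an argument to the check rather than a property of the document.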
The result is an XML architecture that behaves surprisingly similarly in spirit to COBOL REDEFINES. The mechanisms are entirely different. COBOL operates at the level of contiguous storage and byte reinterpretation. XML operates at the level of abstract syntax trees and externally applied typing overlays. Yet the underlying conceptual pattern is remarkably similar: a stable representation subjected to multiple alternate typed interpretations selected according to context.
In this model, schemas become overlays rather than absolute type definitions. Validation becomes projection rather than intrinsic identity checking. XML documents cease to be fully self-describing objects and instead become canonical serialized forms onto which multiple interpretation layers can later be applied. That architectural direction is arguably much closer to sophisticated data processing systems, compiler pipelines, and overlay-based interpretation frameworks than to conventional object-oriented XML document models.
Wording by ChatGPT, prompted by Stephen D Green, May 2026