//
// Copyright (c) 2020, 2021 Contributors to the Eclipse Foundation
//
== Architecture
=== Introduction
This chapter describes the architectural
components, comprising the XML-databinding facility, that realize the
goals outlined in <<Goals>>. The scope of
this version of the specification covers many additional goals beyond those
in JAXB 1.0. As a result, the JAXB 1.0 architecture has been revised
significantly.
=== Overview
The XML data-binding facility consists of the
following architectural components:
* *_schema compiler_*: A schema compiler binds a
_source schema_ to a set of _schema derived program elements_. The binding
is described by an XML-based language, *_binding language_*.
* *_schema generator_*: A schema generator maps a
set of existing program elements to a _derived schema_. The mapping is
described by *_program annotations_*.
* *_binding runtime framework_* that provides two
primary operations for accessing, manipulating and validating XML
content using either schema derived or existing program elements: +
** _Unmarshalling_ is the process of reading
an XML document and constructing a tree of _content objects_. Each content
object is an instance of either a schema derived or an existing program
element mapped by the schema generator and corresponds to an instance in
the XML document. Thus, the content tree reflects the document's
content. +
Validation can optionally be enabled as part of the
unmarshalling process. _Validation_ is the process of verifying that an
XML document meets all the constraints expressed in the schema.
** _Marshalling_ is the inverse of
unmarshalling, i.e., it is the process of traversing a content tree and
writing an XML document that reflects the tree's content. Validation can
optionally be enabled as part of the marshalling process.
As used in this specification, the term
_schema_ includes the W3C XML Schema as defined in the XML Schema 1.0
Recommendation[XSD Part 1][XSD Part 2]. <<a210>> illustrates relationships
between concepts introduced in this section.
.Non-Normative Jakarta XML Binding Architecture diagram
[[a210]]
image::images/xmlb-3.png[image]
JAXB-annotated classes are common to both
binding schemes. They are either generated by a schema compiler or the
result of a programmer adding JAXB annotations to existing Java classes.
The universal unmarshal/marshal process is driven by the JAXB
annotations on the portable JAXB-annotated classes.
Note that the binding declarations object in
the above diagram is logical. Binding declarations can either be inlined
within the schema or they can appear in an external binding file that is
associated with the source schema.
.JAXB 1.0 style binding of schema to interface/implementation classes.
image::images/xmlb-4.png[image]
Note that the application accesses only the
schema-derived interfaces, factory methods and `jakarta.xml.bind` APIs
directly. This convention is necessary to enable switching between JAXB
implementations.
=== Java Representation
The content tree contains instances of _bound types_,
types that bind and provide access to XML content. Each bound
type corresponds to one or more schema components. As much as possible,
for type safety and ease of use, a bound type constrains its values
to match the constraints of the corresponding schema components. The different
bound types, which may be either schema derived or authored by a user,
are described below.
*_Value Class_* A coarse grained schema
component, such as a complex type definition, is bound to a Value class.
The Java class hierarchy is used to preserve XML Schema's derived-by-extension
type definition hierarchy. JAXB-annotated classes are
portable and, in comparison to schema derived interfaces/implementation
classes, result in a smaller number of classes.
*_Property_* A fine-grained schema component,
such as an attribute declaration or an element declaration with a simple
type, is bound directly to a _property_ or a _field_ within a value class.
A property is _realized_ in a value class by
a set of JavaBeans-style _access methods_. These methods include the
usual `get` and `set` methods for retrieving and modifying a property's
value; they also provide for the deletion and, if appropriate, the
re-initialization of a property's value.
Properties are also used for references from
one content instance to another. If an instance of a schema component
_X_ can occur within, or be referenced from, an instance of some other
component _Y_ then the content class derived from _Y_ will define a
property that can contain instances of _X_.
Binding a fine-grained schema component to a
field is useful when a bound type does not follow the JavaBeans
patterns. It makes it possible to map such types to a schema without the
need to refactor them.
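
As a rough illustration of the two styles described above, the following sketch shows a hypothetical `Customer` class whose properties are realized by `get` and `set` access methods, and a hypothetical `Address` class that exposes public fields instead. All class and member names are invented for this example, and the mapping annotations that a real binding would carry are omitted.

[source,java]
----
// Hypothetical content classes; names are illustrative and annotations are omitted.

// Field-style binding: this type does not follow the JavaBeans patterns,
// so its public fields would be mapped directly, without refactoring.
class Address {
    public String street;
    public String city;
}

// Property-style binding: each property is realized by get/set access methods.
class Customer {
    private String name;
    private Address address; // a property that references another content instance

    public String getName() { return name; }
    public void setName(String name) { this.name = name; }

    public Address getAddress() { return address; }
    public void setAddress(Address address) { this.address = address; }
}
----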
*_Interface_* JAXB 1.0 bound schema components
(XML content) to schema derived content interfaces and implementation
classes. The interface/implementation classes tightly couple the schema
derived implementation classes to the Jakarta XML Binding implementation
runtime framework and are thus not portable. The binding of schema components to
schema derived interfaces continues to be supported in Jakarta XML Binding.
[NOTE]
.Note
====
The mapping of existing Java interfaces to schema constructs is not
supported. Since an existing class can implement multiple interfaces,
there is no obvious mapping of existing interfaces to XML schema constructs.
====
*_Enum type_* The J2SE 5.0 platform introduced
language support for type-safe enumeration types. Enum types are used
to represent values of schema types with enumeration values.
*_Collection type_* Collections are used to
represent content models. Where possible, for type safety, parametric
lists are used for homogeneous collections. For example, a repeating element
in a content model is bound to a parametric list.
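
The enum and collection bound types can be pictured with the following sketch. It assumes a hypothetical schema type with the enumeration values `ground` and `air` and a hypothetical repeating `trackingNote` element; the `value()` and `fromValue()` helpers are one plausible shape for such an enum, not a statement of what a schema compiler must generate.

[source,java]
----
import java.util.ArrayList;
import java.util.List;

// Hypothetical enum standing in for a schema simple type with enumeration values.
enum ShippingMethod {
    GROUND("ground"), AIR("air");

    private final String lexicalValue; // the value as it would appear in the XML

    ShippingMethod(String lexicalValue) { this.lexicalValue = lexicalValue; }

    public String value() { return lexicalValue; }

    // Maps an XML lexical value back to the enum constant, rejecting unknown values.
    public static ShippingMethod fromValue(String v) {
        for (ShippingMethod m : values()) {
            if (m.lexicalValue.equals(v)) return m;
        }
        throw new IllegalArgumentException("Not an enumerated value: " + v);
    }
}

// Hypothetical value class: a repeating element is exposed as a parametric list.
class Shipment {
    private final List<String> trackingNote = new ArrayList<>();
    private ShippingMethod method;

    // The list element type is fixed at compile time, so only String items can be added.
    public List<String> getTrackingNote() { return trackingNote; }

    public ShippingMethod getMethod() { return method; }
    public void setMethod(ShippingMethod method) { this.method = method; }
}
----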
*_DOM node_* In some cases, binding XML content
to a DOM or DOM-like representation rather than a collection of types is
more natural to a programmer. One example is an open content model that
allows elements whose types are not statically constrained by the
schema.
A content tree can be created by unmarshalling
an XML document or by programmatic construction. Each bound type in
the content tree is created as follows:
* schema derived implementation classes that
implement schema derived interfaces can be created using factory methods
generated by the schema compiler.
* schema derived value classes can be created
using a constructor or a factory method generated by the schema
compiler.
* existing types, authored by users, are
required to provide a no-arg constructor. The no-arg constructor is used
by an unmarshaller during unmarshalling to create an instance of the
type.
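
The no-arg constructor requirement is easier to see with a small sketch. The helper below uses plain Java reflection to create an instance from a `Class` object, which is one way a runtime could instantiate a user-authored type before populating it. The `Point` class and the `ContentObjects` helper are hypothetical and are not part of any Jakarta XML Binding API; a real implementation may work differently.

[source,java]
----
import java.lang.reflect.Constructor;

// Hypothetical user-authored type that provides the required no-arg constructor.
class Point {
    private int x;
    private int y;

    public Point() { }                                   // used for reflective creation
    public Point(int x, int y) { this.x = x; this.y = y; }

    public int getX() { return x; }
    public void setX(int x) { this.x = x; }
    public int getY() { return y; }
    public void setY(int y) { this.y = y; }
}

// Illustrative helper only; the name and shape are assumptions.
final class ContentObjects {
    private ContentObjects() { }

    // Creates an empty instance; fails with NoSuchMethodException
    // when the type declares no no-arg constructor.
    static <T> T newInstance(Class<T> type) throws ReflectiveOperationException {
        Constructor<T> noArg = type.getDeclaredConstructor();
        return noArg.newInstance();
    }
}
----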
==== Binding Declarations
A particular binding of a given source schema
is defined by a set of _binding declarations_. Binding declarations are
written in a _binding language_, which is itself an application of XML.
A binding declaration can occur within the annotation `appinfo` of each
XML Schema component. Alternatively, binding declarations can occur in
an auxiliary file. Each binding declaration within the auxiliary file is
associated with a schema component in the source schema. It was necessary
to support binding declarations external to the source schema in order
to allow for customization of an XML Schema that one prefers not to
modify. The schema compiler hence actually requires two inputs, a source
schema and a set of binding declarations.
Binding declarations enable one to override
default binding rules, thereby allowing for user customization of the
schema-derived value class. Additionally, binding declarations allow for
further refinements to be introduced into the binding to Java
representation that could not be derived from the schema alone.
The binding declarations need not define
every last detail of a binding. The schema compiler assumes _default
binding declarations_ for those components of the source schema that are
not mentioned explicitly by binding declarations. Default declarations
both reduce the verbosity of the customization and make it more robust
to the evolution of the source schema. The defaulting rules are
sufficiently powerful that in many cases a usable binding can be
produced with no binding declarations at all. By defining a standardized
format for the binding declarations, it is envisioned that tools will be
built to greatly aid the process of customizing the binding from schema
components to a Java representation.
==== Mapping Annotations
A mapping annotation defines the mapping of a
program element to one or more schema components. A mapping annotation
typically contains one or more annotation members to allow customized
binding. An annotation member can be required or optional. A mapping
annotation can be collocated with the program element in the source. The
schema generator hence actually requires two inputs: a set of classes
and a set of mapping annotations.
Defaults make it easy to use the mapping
annotations. In the absence of a mapping annotation on a program
element, the schema generator assumes, when required by a mapping rule,
a _default mapping annotation_. This, together with an appropriate choice
of default values for optional annotation members, makes it possible to
produce in many cases a usable mapping with minimal mapping annotations.
Thus mapping annotations provide a powerful yet easy-to-use
customization mechanism.
=== Annotations
Many of the architectural components are driven by program
annotations defined by this specification, _mapping annotations_.
*_Java to Schema Mapping_* Mapping annotations
provide meta data that describe or customize the mapping of existing
classes to a derived schema.
*_Portable Value Classes_* Mapping annotations
provide information for unmarshalling and marshalling of an XML instance
into a content tree representing the XML content without the need for a
schema at run time. Thus schema derived code annotated with mapping
annotations is portable, i.e., it is capable of being marshalled and
unmarshalled by a universal marshaller and unmarshaller written by a
JAXB vendor implementation.
*_Adding application specific behavior and data_*
Applications can choose to add either behavior or data to schema derived
code. Section <<Modifying Schema-Derived Code>>
specifies how the mapping annotation, `@jakarta.annotation.Generated`,
should be used by a developer to distinguish developer added/modified code
from schema-derived code. This information can be utilized by tools to
preserve application specific code across regenerations of schema
derived code.
=== Binding Framework
The binding framework has been revised
significantly since JAXB 1.0. Significant changes include:
* support for unmarshalling of invalid XML
content.
* removal of on-demand validation.
* unmarshal/marshal time validation deferring
to JAXP validation.
==== Unmarshalling
===== Invalid XML Content
*_Rationale:_* Invalid XML content can arise for
many reasons:
* When the cost of validation needs to be avoided.
* When the schema for the XML has evolved.
* When the XML is from a non-schema-aware processor.
* When the schema is not authoritative.
Support for invalid XML content required
changes to JAXB 1.0 schema-to-Java binding rules as well as the
introduction of a flexible unmarshalling mode. These changes are
described in <<Unmarshalling Modes>>.
==== Validation
The constraints expressed in a schema fall
into three general categories:
* A _type_ constraint imposes requirements
upon the values that may be provided by constraint facets in simple type
definitions.
* A _local structural_ constraint imposes
requirements upon every instance of a given element type, e.g., that
required attributes are given values and that a complex element's
content matches its content specification.
* A _global structural_ constraint imposes
requirements upon an entire document, e.g., that `ID` values are unique
and that for every `IDREF` attribute value there exists an element with
the corresponding `ID` attribute value.
A _document_ is valid if, and only if, all of
the constraints expressed in its schema are satisfied. The manner in
which constraints are enforced in a set of derived classes has a
significant impact upon the usability of those classes. All constraints
could, in principle, be checked only during unmarshalling. This approach
would, however, yield classes that violate the _fail-fast_ principle of
API design: errors should, if feasible, be reported as soon as they are
detected. In the context of schema-derived classes, this principle
ensures that violations of schema constraints are signalled when they
occur rather than later on when they may be more difficult to diagnose.
With this principle in mind we see that schema
constraints can, in general, be enforced in three ways:
* _Static_ enforcement leverages the type
system of the Java programming language to ensure that a schema
constraint is checked at the application's compilation time. Type
constraints are often good candidates for static enforcement. If an
attribute is constrained by a schema to have a boolean value, e.g., then
the access methods for that attribute's property can simply accept and
return values of type `boolean`.
* _Simple dynamic_ enforcement performs a
trivial run-time check and throws an appropriate exception upon failure.
Type constraints that do not easily map directly to Java classes or
primitive types are best enforced in this way. If an attribute is
constrained to have an integer value between zero and 100, e.g., then
the corresponding property's access methods can accept and return `int`
values and its mutation method can throw a run-time exception if its
argument is out of range.
* _Complex dynamic_ enforcement performs a
potentially costly run-time check, usually involving more than one
content object, and throwing an appropriate exception upon failure.
Local structural constraints are usually enforced in this way: the
structure of a complex element's content, e.g., can in general only be
checked by examining the types of its children and ensuring that they
match the schema's content model for that element. Global structural
constraints must be enforced in this way: the uniqueness of `ID` values,
e.g., can only be checked by examining the entire content tree.
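
The first two enforcement styles can be sketched in a few lines. The `TaskStatus` class below is hypothetical: its `done` property relies on the `boolean` type for static enforcement, while its `percentComplete` property performs a trivial range check for the zero-to-100 case mentioned above. The choice of `IllegalArgumentException` is an assumption made for this example.

[source,java]
----
// Hypothetical schema-derived class; names and the exception type are assumptions.
class TaskStatus {
    private boolean done;
    private int percentComplete;

    // Static enforcement: the boolean parameter type rules out non-boolean
    // values at compile time, so no run-time check is needed.
    public boolean isDone() { return done; }
    public void setDone(boolean done) { this.done = done; }

    public int getPercentComplete() { return percentComplete; }

    // Simple dynamic enforcement: a trivial range check that fails fast,
    // leaving the previous value in place when the argument is rejected.
    public void setPercentComplete(int value) {
        if (value < 0 || value > 100) {
            throw new IllegalArgumentException(
                "percentComplete must be between 0 and 100, got " + value);
        }
        this.percentComplete = value;
    }
}
----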
It is straightforward to implement both static
and simple dynamic checks so as to satisfy the fail-fast principle.
Constraints that require complex dynamic checks could, in theory, also
be implemented so as to fail as soon as possible. The resulting classes
would be rather clumsy to use, however, because it is often convenient
to violate structural constraints on a temporary basis while
constructing or manipulating a content tree.
Consider, e.g., a complex type definition
whose content specification is very complex. Suppose that an instance of
the corresponding value class is to be modified, and that the only way
to achieve the desired result involves a sequence of changes during
which the content specification would be violated. If the content
instance were to check continuously that its content is valid, then the
only way to modify the content would be to copy it, modify the copy, and
then install the new copy in place of the old content. It would be much
more convenient to be able to modify the content in place.
A similar analysis applies to most other sorts
of structural constraints, and especially to global structural
constraints. Schema-derived classes have the ability to enable or
disable a mode that verifies type constraints. JAXB mapped classes can
optionally be validated at unmarshal and marshal time.
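
To make the trade-off concrete, the sketch below checks a global structural constraint, uniqueness of `ID` values, by walking an entire content tree on demand rather than on every mutation. The `ContentNode` and `IdConstraint` types are hypothetical stand-ins for content objects and are not part of the binding framework. Because the check is deferred, two nodes can exchange their IDs in place even though the tree is briefly invalid between the two assignments.

[source,java]
----
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical content object carrying an optional ID value.
class ContentNode {
    String id;
    final List<ContentNode> children = new ArrayList<>();

    ContentNode(String id) { this.id = id; }

    ContentNode add(ContentNode child) { children.add(child); return this; }
}

final class IdConstraint {
    private IdConstraint() { }

    // Complex dynamic check: visits every node and returns each ID that occurs more than once.
    static Set<String> duplicateIds(ContentNode root) {
        Set<String> seen = new HashSet<>();
        Set<String> duplicates = new TreeSet<>();
        Deque<ContentNode> pending = new ArrayDeque<>();
        pending.push(root);
        while (!pending.isEmpty()) {
            ContentNode node = pending.pop();
            if (node.id != null && !seen.add(node.id)) {
                duplicates.add(node.id);
            }
            for (ContentNode child : node.children) {
                pending.push(child);
            }
        }
        return duplicates;
    }
}
----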
===== Validation Re-architecture
The detection of complex schema constraint
violations has been redesigned so that a Jakarta XML Binding implementation
delegates to the validation API in JAXP. JAXP defines a standard
validation API (`javax.xml.validation` package) for validating XML
content against constraints within a schema. Furthermore, JAXP has
been incorporated into the J2SE 5.0 platform. Any Jakarta XML Binding implementation
that takes advantage of the validation API will therefore have a smaller
footprint.
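
For readers unfamiliar with the JAXP validation API, the following minimal sketch shows the `javax.xml.validation` calls involved in checking a document against a schema. It uses JAXP only and says nothing about how a binding implementation wires validation into unmarshalling or marshalling. The `JaxpValidationSketch` class and the small `quantity` schema are assumptions made for this example.

[source,java]
----
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

final class JaxpValidationSketch {
    private JaxpValidationSketch() { }

    // Illustrative schema: a positive integer below 100.
    static final String QUANTITY_XSD =
        "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
      + "<xsd:element name='quantity'>"
      + "<xsd:simpleType><xsd:restriction base='xsd:positiveInteger'>"
      + "<xsd:maxExclusive value='100'/>"
      + "</xsd:restriction></xsd:simpleType>"
      + "</xsd:element></xsd:schema>";

    // Compiles the schema text into a reusable Schema object.
    static Schema compile(String xsd) throws SAXException {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        return factory.newSchema(new StreamSource(new StringReader(xsd)));
    }

    // With no ErrorHandler set, the first validation error is thrown as a SAXException.
    static boolean isValid(Schema schema, String xml) throws IOException {
        Validator validator = schema.newValidator();
        try {
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }
}
----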
===== Unmarshal validation
When the unmarshalling process incorporates
validation and it successfully completes without any validation errors,
both the input document and the resulting content tree are guaranteed to
be valid.
However, always requiring validation during
unmarshalling proves to be too rigid and restrictive a requirement.
Since existing XML parsers allow schema validation to be disabled, there
exist a significant number of XML processing uses that disable schema
validation to improve processing speed and/or to be able to process
documents containing invalid or incomplete content. To enable the JAXB
architecture to be used in these processing scenarios, the binding
framework makes validation optional.
===== Marshal Validation
Validation may also be optionally performed
at marshal time. This is new for Jakarta XML Binding. Validation of the object graph
while marshalling is useful in web services where the marshalled output
must conform to schema constraints specified in a WSDL document. This
could provide a valuable debugging aid for dealing with any
interoperability problems.
===== Handling Validation Failures
While it would be possible to notify a JAXB
application that a validation error has occurred by throwing a
`JAXBException` when the error is detected, this means of communicating
a validation error results in only one failure at a time being handled.
Potentially, the validation operation would have to be called as many
times as there are validation errors. Both in terms of validation
processing and for the application's benefit, it is better to detect as
many errors and warnings as possible during a single validation pass. To
allow for multiple validation errors to be processed in one pass, each
validation error is mapped to a validation error event. A validation
error event relates the validation error or warning encountered to the
location of the text or object(s) involved with the error. The stream of
potential validation error events can be communicated to the application
either through a registered validation event handler at the time the
validation error is encountered, or via a collection of validation
failure events that the application can request after the operation has
completed.
Unmarshalling and marshalling are the two
operations that can result in multiple validation failures. The same
mechanism is used to handle both failure scenarios. See
<<General Validation Processing>> for further details.
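
The idea of reporting many failures in a single pass can be sketched with JAXP alone. The example below registers an `org.xml.sax.ErrorHandler` that records each error and lets validation continue, which is analogous to, but not the same as, the validation event handler described above. The `CollectingErrorHandler` and `OnePassValidation` classes and the small `order` schema are hypothetical.

[source,java]
----
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Records warnings and errors instead of aborting on the first one.
final class CollectingErrorHandler implements ErrorHandler {
    final List<SAXParseException> events = new ArrayList<>();

    @Override public void warning(SAXParseException e) { events.add(e); }
    @Override public void error(SAXParseException e) { events.add(e); } // record and continue
    @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
}

final class OnePassValidation {
    private OnePassValidation() { }

    // Illustrative schema: an order holding one or more positive quantities.
    static final String ORDER_XSD =
        "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
      + "<xsd:element name='order'><xsd:complexType><xsd:sequence>"
      + "<xsd:element name='quantity' type='xsd:positiveInteger' maxOccurs='unbounded'/>"
      + "</xsd:sequence></xsd:complexType></xsd:element>"
      + "</xsd:schema>";

    // Validates once and returns every recorded event, each carrying its location.
    static List<SAXParseException> validate(String xsd, String xml)
            throws SAXException, IOException {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Validator validator =
            factory.newSchema(new StreamSource(new StringReader(xsd))).newValidator();
        CollectingErrorHandler handler = new CollectingErrorHandler();
        validator.setErrorHandler(handler);
        validator.validate(new StreamSource(new StringReader(xml)));
        return handler.events;
    }
}
----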
=== An example
Throughout this specification we will refer to
and build upon the familiar schema from [XSD Part 0], which describes a
purchase order, as a running example to illustrate various binding
concepts as they are defined. Note that all schema name attributes with
values in *this font* are bound by JAXB technology to either a Java
interface or JavaBean-like property. Please note that the derived Java
code in the example only approximates the default binding of the
schema-to-Java representation.
[source,xml,subs="specialcharacters,quotes"]
----
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name=*"purchaseOrder"* type=*"PurchaseOrderType"*/>
  <xsd:element name=*"comment"* type=*"xsd:string"*/>
  <xsd:complexType name=*"PurchaseOrderType"*>
    <xsd:sequence>
      <xsd:element name=*"shipTo"* type="USAddress"/>
      <xsd:element name=*"billTo"* type="USAddress"/>
      <xsd:element ref=*"comment"* minOccurs="0"/>
      <xsd:element name=*"items"* type="Items"/>
    </xsd:sequence>
    <xsd:attribute name=*"orderDate"* type="xsd:date"/>
  </xsd:complexType>
  <xsd:complexType name=*"USAddress"*>
    <xsd:sequence>
      <xsd:element name=*"name"* type="xsd:string"/>
      <xsd:element name=*"street"* type="xsd:string"/>
      <xsd:element name=*"city"* type="xsd:string"/>
      <xsd:element name=*"state"* type="xsd:string"/>
      <xsd:element name=*"zip"* type="xsd:decimal"/>
    </xsd:sequence>
    <xsd:attribute name=*"country"* type="xsd:NMTOKEN" fixed="US"/>
  </xsd:complexType>
  <xsd:complexType name=*"Items"* >
    <xsd:sequence>
      <xsd:element name=*"item"* minOccurs="1" maxOccurs="unbounded">
        <xsd:complexType>
          <xsd:sequence>
            <xsd:element name=*"productName"* type="xsd:string"/>
            <xsd:element name=*"quantity"* >
              <xsd:simpleType>
                <xsd:restriction base="xsd:positiveInteger">
                  <xsd:maxExclusive value="100"/>
                </xsd:restriction>
              </xsd:simpleType>
            </xsd:element>
            <xsd:element name=*"USPrice"* type="xsd:decimal"/>
            <xsd:element ref=*"comment"* minOccurs="0"/>
            <xsd:element name=*"shipDate"* type="xsd:date" minOccurs="0"/>
          </xsd:sequence>
          <xsd:attribute name=*"partNum"* type="SKU" use="required"/>
        </xsd:complexType>
      </xsd:element>
    </xsd:sequence>
  </xsd:complexType>
  <!-- Stock Keeping Unit, a code for identifying products -->
  <xsd:simpleType name=*"SKU"* >
    <xsd:restriction base="xsd:string">
      <xsd:pattern value="\d{3}-[A-Z]{2}"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:schema>
----
Binding of purchase order schema to a Java
representationfootnote:[In the interest of
terseness, Jakarta XML Binding program annotations have been omitted.]:
[source,java,subs="+macros"]
----
import jakarta.xml.bind.JAXBElement;
import javax.xml.datatype.XMLGregorianCalendar;
import java.util.List;

public class PurchaseOrderType {
    USAddress getShipTo() {...}              void setShipTo(USAddress value) {...}
    USAddress getBillTo() {...}              void setBillTo(USAddress value) {...}
    /** Optional to set Comment property. */
    String getComment() {...}                void setComment(String value) {...}
    Items getItems() {...}                   void setItems(Items value) {...}
    XMLGregorianCalendar getOrderDate() {...} void setOrderDate(XMLGregorianCalendar value) {...}
}

public class USAddress {
    String getName() {...}                   void setName(String value) {...}
    String getStreet() {...}                 void setStreet(String value) {...}
    String getCity() {...}                   void setCity(String value) {...}
    String getState() {...}                  void setState(String value) {...}
    int getZip() {...}                       void setZip(int value) {...}
    static final String COUNTRY="US";footnote:creq[Appropriate
customization required to bind a fixed attribute to a constant value.]
}

public class Items {
    public static class ItemType {
        String getProductName() {...}        void setProductName(String value) {...}
        /** Type constraint on Quantity setter value 1..99.footnote:[Type constraint
checking only performed if customization enables it and implementation supports fail-fast checking] */
        int getQuantity() {...}              void setQuantity(int value) {...}
        float getUSPrice() {...}             void setUSPrice(float value) {...}
        /** Optional to set Comment property. */
        String getComment() {...}            void setComment(String value) {...}
        XMLGregorianCalendar getShipDate() {...} void setShipDate(XMLGregorianCalendar value) {...}
        /** Type constraint on PartNum setter value "\d{3}-[A-Z]{2}".footnote:creq[] */
        String getPartNum() {...}            void setPartNum(String value) {...}
    }
    /** Local structural constraint: one or more instances of Items.ItemType. */
    List<Items.ItemType> getItem() {...}
}

public class ObjectFactory {
    // type factories
    Object newInstance(Class javaInterface) {...}
    PurchaseOrderType createPurchaseOrderType() {...}
    USAddress createUSAddress() {...}
    Items createItems() {...}
    Items.ItemType createItemsItemType() {...}
    // element factories
    JAXBElement<PurchaseOrderType> createPurchaseOrder(PurchaseOrderType value) {...}
    JAXBElement<String> createComment(String value) {...}
}
----
The purchase order schema does not describe
any global structural constraints.
The coming chapters will identify how these
XML Schema concepts were bound to a Java representation. Just as in [XSD
Part 0], additions will be made to the schema example to illustrate the
binding concepts being discussed.