blob: 2a5ef7c4cecd74656ae50e0c2d540e5c8c8612fa [file] [log] [blame]
<html>
<head>
<!--
Copyright (c) 2010, 2021 Oracle and/or its affiliates. All rights reserved.
This program and the accompanying materials are made available under the
terms of the Eclipse Distribution License v. 1.0, which is available at
http://www.eclipse.org/org/documents/edl-v10.php.
SPDX-License-Identifier: BSD-3-Clause
-->
<title>Design of XSOM</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"/>
<style>
pre {
background-color: rgb(240,240,240);
margin-left: 2em;
margin-right: 2em;
padding: 1em;
}
p {
margin-left: 2em;
}
dt {
margin-top: 0.5em;
margin-left: 2em;
font-weight: bold;
}
dd {
margin-left: 3em;
}
</style>
</head>
<body>
<h1 style="text-align:center">Design of XSOM</h1>
<div align=right style="font-size:smaller">
By <a href="mailto:kohsuke.kawaguchi@sun.com">Kohsuke Kawaguchi</a><br>
</div>
<p>
This document describes the details you need to know to extend/maintain XSOM.
</p>
<h1>Design Goals</h1>
<p>
The primary design goals of XSOM are:
</p>
<ol>
<li>Expose all the information defined in the schema spec
<li>Provide additional methods that helps simplifying client applications.
</ol>
<p>
Providing mutation methods was a non-goal for this project, primarily because of the added complexity.
</p>
<h1>Building workspace</h1>
<p>
The workspace uses Ant as the build tool. The followings are the major targets:
</p>
<dl>
<dt>clean</dt>
<dd>remove the intermediate and output files.</dd>
<dt>compile</dt>
<dd>generate a parser by RelaxNGCC and compile all the source files into the bin directory.</dd>
<dt>jar</dt>
<dd>make a jar file</dd>
<dt>release</dt>
<dd>build a distribution zip file that contains everything from the source code to a binary file</dd>
<dt>src-zip</dt>
<dd>Build a zip file that contains the source code.</dd>
</dl>
<h1>Architecture</h1>
<p>
XSOM consists of roughly three parts.
The first part is the public interface, which is defined in the <code>com.sun.xml.xsom</code> package. The entire functionality of XSOM is exposed via this interface. This interface is derived from a draft document submitted to W3C by some WG members.
</p><p>
The second part is the implementation of these interfaces, the <code>com.sun.xml.xsom.impl</code> package. These code are all hand-written.
</p><p>
The third part is a parser that reads XML representation of XML Schema and builds XSOM nodes accordingly. The package is <code>com.sun.xml.xsom.parser</code>. This part of the code is mostly generated by <a href="http://relaxngcc.sourceforge.net/">RelaxNGCC</a>.
</p>
<center>
<img src="architecture.png"/>
</center>
<h1>Implementation Details</h1>
<p>
Most of the implementation classes are fairly simple. Probably the only one interesting piece of code is the <code>Ref</code> class, which is a reference to other schema components.
</p><p>
The <code>Ref</code> class itself is just a place hodler and this class defined a series of inner interfaces that are specialized to hold a reference to different kinds of schema components.
The sole purpose of this indirection layer is to support forward references during a parsing of the XML representation.
</p><p>
A typical reference interface would look like this:
</p>
<pre>
public static interface Term {
/** Obtains a reference as a term. */
XSTerm getTerm();
}
</pre>
<p>
In case this indirection is unnecessary, all implementation classes of <code>XSTerm</code> implements this <code>Ref.Term</code> interface. This applies to all the other types of the <code>Ref</code> interface. Therefore, whereever a reference is necessary, you can stimply pass a real object. In other words, a direct reference (<code>XS***Impl</code>) can be always treated as an indirect reference (<code>Ref.***</code>).
</p><p>
Implementations for forward references are placed in the <code>com.sun.xml.xsom.impl.parser.DelayedRef</code> class. The detail will be discussed later.
</p>
<h1>Parser</h1>
<p>
The following collaboration diagram shows various objects that participate in a parsing process.
</p>
<center>
<img src="collaboration.png"/>
</center>
<p>
<code>XSOMParser</code> is the only publicly visible component in this picture. This class also keeps references to vairous other objects that are necessary to parse schemas. This includes an error handler, the root <code>SchemaSet</code> object, an entity resolver, etc.
</p><p>
Whenever the parse method is called, it will create a new NGCCRuntimeEx and configure XMLReader so that a schema file is parsed into this NGCCRuntimeEx instance.
<code>NGCCRuntimeEx</code> derives from <code>NGCCRuntime</code>, which is a class generated by RelaxNGCC. This object will use other RelaxNGCC-generated classes and parse a document and constructs a XSOM object graph appropriately.
</p><p>
When a new XML document is referenced by an import or include statement, a new set of <code>NGCCRuntimeEx</code> is set up to parse that document. One NGCCRuntimeEx can only parse one XML document.
</p>
<h2>Forward references and back-patching</h2>
<p>
Since we use SAX to parse schemas, the referenced schema component is often unavailable when we hit a reference. Because of this, when we see a reference, we create a "delayed" reference that keeps the name of the referenced component.
</p><p>
Note that because of the way XML Schema &lt;redefine> works, all the references by name must be lazily bound even if the component is already defined.
</p><p>
All these "delayed" references are remembered and tracked by XSOMParser. When the client calls the <code>XSOMParser.getResult</code> method, XSOMParser will make sure that they resolve to a schema component correctly.
"Delayed" references are available in the <code>DelayedRef</code> class.
</p>
<h2>RelaxNGCC</h2>
<p>
The actual parser is generated by RelaxNGCC from <code>xsom/src/*.rng</code> files. <code>xmlschema.rng</code> is the entry point and all the other files are referenced from this file. For more information about RelaxNGCC, goto <a href="http://relaxngcc.sourceforge.net/">here</a>. Or just contact me (as I'm one of the developers of RelaxNGCC.)
</p>
</body>
</html>