| // |
| // Copyright (c) 2020 Contributors to the Eclipse Foundation |
| // |
| |
| [appendix] |
| == [[a4649]]Binding XML Names to Java Identifiers |
| |
| === Overview |
| |
| This section provides default mappings from: |
| |
| * XML Name to Java identifier |
| * Model group to Java identifier |
| * Namespace URI to Java package name |
| |
| === [[a4656]]The Name to Identifier Mapping Algorithm |
| |
| Java identifiers typically follow three |
| simple, well-known conventions: |
| |
| * Class and interface names always begin with |
| an upper-case letter. The remaining characters are either digits, |
| lower-case letters, or upper-case letters. Upper-case letters within a |
| multi-word name serve to identify the start of each non-initial word, or |
| sometimes to stand for acronyms. |
| * Method names and components of a package |
| name always begin with a lower-case letter, and otherwise are exactly |
| like class and interface names. |
| * Constant names are entirely in upper case, |
| with each pair of words separated by the underscore character (‘_’, |
| \u005F, LOW LINE). |
| |
| XML names, however, are much richer than Java |
| identifiers: They may include not only the standard Java identifier |
| characters but also various punctuation and special characters that are |
| not permitted in Java identifiers. Like most Java identifiers, most XML |
| names are in practice composed of more than one natural-language word. |
| Non-initial words within an XML name typically start with an upper-case |
| letter followed by a lower-case letter, as in Java language, or are |
| prefixed by punctuation characters, which is not usual in the Java |
| language and, for most punctuation characters, is in fact illegal. |
| |
| In order to map an arbitrary XML name into a |
| Java class, method, or constant identifier, the XML name is first broken |
| into a _word list_ . For the purpose of constructing word lists from XML |
| names we use the following definitions: |
| |
| * A _punctuation character_ is one of the |
| following: |
| * A hyphen (’-’, \u002D, HYPHEN-MINUS), |
| * A period (‘.’, \u002E, FULL STOP), |
| * A colon (’:’, \u003A, COLON), |
| * A dot (‘.’, \u00B7, MIDDLE DOT), |
| * \u0387, GREEK ANO TELEIA, |
| * \u06DD, ARABIC END OF AYAH, or |
| * \u06DE, ARABIC START OF RUB EL HIZB. |
| * An underscore (’_’, \u005F, LOW LINE) with |
| following exceptionlink:#a5380[29] |
| |
| These are all legal characters in XML names. |
| |
| * A _letter_ is a character for which the |
| _Character.isLetter_ method returns _true_ , _i.e._ , a letter according |
| to the Unicode standard. Every letter is a legal Java identifier |
| character, both initial and non-initial. |
| * A _digit_ is a character for which the |
| _Character.isDigit_ method returns _true_ , _i.e._ , a digit according |
| to the Unicode Standard. Every digit is a legal non-initial Java |
| identifier character. |
| * A _mark_ is a character that is in none of |
| the previous categories but for which the |
| _Character.isJavaIdentifierPart_ method returns _true_ . This category |
| includes numeric letters, combining marks, non-spacing marks, and |
| ignorable control characters. |
| |
| Every XML name character falls into one of |
| the above categories. We further divide letters into three |
| subcategories: |
| |
| * An _upper-case letter_ is a letter for |
| which the _Character.isUpperCase_ method returns _true_ , |
| * A _lowercase letter_ is a letter for which |
| the _Character.isLowerCase_ method returns _true_ , and |
| * All other letters are _uncased_ . |
| |
| An XML name is split into a word list by |
| removing any leading and trailing punctuation characters and then |
| searching for _word breaks_ . A word break is defined by three regular |
| expressions: A prefix, a separator, and a suffix. The prefix matches |
| part of the word that precedes the break, the separator is not part of |
| any word, and the suffix matches part of the word that follows the |
| break. The word breaks are defined as: |
| |
| === [[a4681]]XML Word Breaks |
| |
| Prefix |
| |
| Separator |
| |
| Suffix |
| |
| Example |
| |
| {empty}[^punct] |
| |
| punct+1 |
| |
| {empty}[^punct] |
| |
| foo|--|bar |
| |
| digit |
| |
| |
| |
| {empty}[^ _digit_ ] |
| |
| foo|22|bar |
| |
| {empty}[^digit] |
| |
| |
| |
| digit |
| |
| foo|22 |
| |
| lower |
| |
| |
| |
| {empty}[^lower] |
| |
| foo|Bar |
| |
| upper |
| |
| |
| |
| upper lower |
| |
| FOO|Bar |
| |
| letter |
| |
| |
| |
| {empty}[^letter] |
| |
| Foo|\u2160 |
| |
| {empty}[^letter] |
| |
| |
| |
| letter |
| |
| \u2160|Foo |
| |
| uncased |
| |
| |
| |
| {empty}[^uncased] |
| |
| |
| |
| {empty}[^uncased] |
| |
| |
| |
| uncased |
| |
| |
| |
| (The character _\u2160_ is ROMAN NUMERAL ONE, |
| a numeric letter.) |
| |
| After splitting, if a word begins with a |
| lower-case character then its first character is converted to upper |
| case. The final result is a word list in which each word is either |
| |
| * A string of upper- and lower-case letters, |
| the first character of which is upper case (includes underscore,’_’, for |
| exception case1). |
| * A string of digits, or |
| * A string of uncased letters and marks. |
| |
| Given an XML name in word-list form, each of |
| the three types of Java identifiers is constructed as follows: |
| |
| * A class or interface identifier is |
| constructed by concatenating the words in the list, |
| * A method identifier is constructed by |
| concatenating the words in the list. A prefix verb ( _get_ , _set_ , |
| _etc._ ) is prepended to the result. |
| * A constant identifier is constructed by |
| converting each word in the list to upper case; the words are then |
| concatenated, separated by underscores. |
| |
| This algorithm will not change an XML name |
| that is already a legal and conventional Java class, method, or constant |
| identifier, except perhaps to add an initial verb in the case of a |
| property access method. |
| |
| To improve user experience with default |
| binding, the automated resolution of frequent naming collision is |
| specified in link:jaxb.html#a4770[See Standardized Name |
| Collision Resolution]“. |
| |
| === Example |
| |
| === [[a4734]]XML Names and derived Java Class, Method, and Constant Names |
| |
| XML Name |
| |
| Class Name |
| |
| Method Name |
| |
| Constant Name |
| |
| mixedCaseName |
| |
| MixedCaseName |
| |
| getMixedCaseName |
| |
| MIXED_CASE_NAME |
| |
| Answer42 |
| |
| Answer42 |
| |
| getAnswer42 |
| |
| ANSWER_42 |
| |
| name-with-dashes |
| |
| NameWithDashes |
| |
| getNameWithDashes |
| |
| NAME_WITH_DASHES |
| |
| other_punct-chars |
| |
| OtherPunctChars |
| |
| getOtherPunctChars |
| |
| OTHER_PUNCT_CHARS |
| |
| === [[a4755]]XML Names and derived Java Class, Method, and Constant Names when <jaxb:globalBindings underscoreHandling=”asCharInWord”> |
| |
| [width="100%",cols="25%,25%,25%,25%",options="header",] |
| |=== |
| |XML Name |Class |
| Name |Method Name |
| |Constant Name |
| |other_punct-chars |
| |Other_punctChars |
| |getOther_punctChars |
| |OTHER_PUNCT_CHARS |
| |
| |name_with_underscore |
| |Name_with_underscore |
| |name_with_underscore |
| |NAME_WITH_UNDERSCORE |
| |=== |
| |
| === [[a4767]]Collisions and conflicts |
| |
| It is possible that the name-mapping |
| algorithm will map two distinct XML names to the same word list.These |
| cases will result in a _collision_ if, and only if, the same Java |
| identifier is constructed from the word list and is used to name two |
| distinct generated classes or two distinct methods or constants in the |
| same generated class. It is also possible if two or more namespaces are |
| customized to map to the same Java package, XML names that are unique |
| due to belonging to distinct namespaces could mapped to the same Java |
| Class identifier. Collisions are not permitted by the schema compiler |
| and are reported as errors; they may be repaired by revising XML name |
| within the source schema or by specifying a customized binding that maps |
| one of the two XML names to an alternative Java identifier. |
| |
| A class name must not conflict with the |
| generated JAXB class, _ObjectFactory_ , link:jaxb.html#a482[See |
| Java Package], that occurs in each schema-derived Java package. Method |
| names are forbidden to conflict with Java keywords or literals, with |
| methods declared in _java.lang.Object_ , or with methods declared in the |
| binding-framework classes. Such conflicts are reported as errors and may |
| be repaired by revising the appropriate schema or by specifying an |
| appropriate customized binding that resolves the name collision. |
| |
| === [[a4770]]Standardized Name Collision Resolution |
| |
| Given the frequency of an XML element or |
| attribute with the name “class” or “Class” resulting in a naming |
| collision with the inherited method _java.lang.Object.getClass()_ , |
| method name mapping automatically resolves this conflict by mapping |
| these XML names to the java method identifier “getClazz”. |
| |
| * |
| |
| === [[a4773]]Deriving a legal Java identifier from an enum facet value |
| |
| Given that an enum facet’s value is not |
| restricted to an XML name, the XML Name to Java identifier algorithm is |
| not applicable to generating a Java identifier from an enum facet’s |
| value. The following algorithm maps an enum facet value to a valid Java |
| constant identifier name. |
| |
| * For each character in enum facet value, + |
| copy the character to a string representation _javaId_ when |
| _java.lang.Character.isJavaIdentifierPart()_ is _true_ . |
| * To follow Java constant naming convention, |
| each valid lower case character must be copied as its upper case |
| equivalent. |
| * There is no derived Java constant |
| identifier when any of the following occur: |
| * _javaId.length() == 0_ |
| * |
| _java.lang.Character.isJavaIdentifierStart(javaId.get(0)) == false_ |
| |
| === [[a4780]]Deriving an identifier for a model group |
| |
| XML Schema has the concept of a group of |
| element declarations. Occasionally, it is convenient to bind the |
| grouping as a Java content property or a Java value class. When a |
| semantically meaningful name for the group is not provided within the |
| source schema or via a binding declaration customization, it is |
| necessary to generate a Java identifier from the grouping. Below is an |
| algorithm to generate such an identifier. |
| |
| A name is computed for an unnamed model group |
| by concatenating together the first 3 element declarations and/or |
| wildcards that occur within the model group. Each XML \{name} is mapped |
| to a Java identifier for a method using the XML Name to Java Identifier |
| Mapping algorithm. Since wildcard does not have a \{name} property, it |
| is represented as the Java identifier “ _Any_ ”. The Java identifiers |
| are concatenated together with the separator “ _And_ ” for sequence and |
| all compositor and “ _Or_ ” for choice compositors. For example, a |
| sequence of element _foo_ and element _bar_ would map to _“_ _FooAndBar_ |
| _”_ and a choice of element _foo_ and element _bar_ maps to _“_ |
| _FooOrBar_ _._ ” Lastly, a sequence of wildcard and element _bar_ would |
| map to the Java identifier _“_ _AnyAndBar_ _”_ . |
| |
| === Example: |
| |
| Given XML Schema fragment: |
| |
| <xs:choice> + |
| <xs:sequence> + |
| <xs:element ref="A"/> + |
| <xs:any processContents="strict"/> + |
| </xs:sequence> + |
| <xs:element ref="C"/> |
| |
| </xs:choice> |
| |
| The generated Java identifier would be |
| _AAndAnyOrC_ . |
| |
| === [[a4788]]Generating a Java package name |
| |
| This section describes how to generate a |
| package name to hold the derived Java representation. The motivation for |
| specifying a default means to generate a Java package name is to |
| increase the chances that a schema can be processed by a schema compiler |
| without requiring the user to specify customizations. |
| |
| If a schema has a target namespace, the next |
| subsection describes how to map the URI into a Java package name. If the |
| schema has no target namespace, there is a section that describes an |
| algorithm to generate a Java package name from the schema filename. |
| |
| === Mapping from a Namespace URI |
| |
| An XML namespace is represented by a URI. |
| Since XML Namespace will be mapped to a Java package, it is necessary to |
| specify a default mapping from a URI to a Java package name. The URI |
| format is described in [RFC2396]. |
| |
| The following steps describe how to map a URI |
| to a Java package name. The example URI, |
| _http://www.acme.com/go/espeak.xsd_ , is used to illustrate each step. |
| |
| . Remove the scheme and _":"_ part from the |
| beginning of the URI, if present. + |
| Since there is no formal syntax to identify the optional URI scheme, |
| restrict the schemes to be removed to case insensitive checks for |
| schemes “ _http_ ” and “ _urn_ ”. |
| |
| _//www.acme.com/go/espeak.xsd_ |
| |
| . Remove the trailing file type, one of _.??_ |
| or _.???_ or _.html_ . |
| |
| _//www.acme.com/go/espeak_ |
| |
| . Parse the remaining string into a list of |
| strings using _’/’_ and _‘:’_ as separators. Treat consecutive |
| separators as a single separator. |
| |
| _\{"www.acme.com", "go", "espeak" }_ |
| |
| . For each string in the list produced by |
| previous step, unescape each escape sequence octet. |
| |
| _\{"www.acme.com", "go", "espeak" }_ |
| |
| . If the scheme is a “urn”, replace all |
| dashes, “-”, occurring in the first component with |
| “.”.link:#a5381[30] |
| . Apply algorithm described in Section 7.7 |
| “Unique Package Names” in [JLS] to derive a unique package name from the |
| potential internet domain name contained within the first component. The |
| internet domain name is reversed, component by component. Note that a |
| leading “ _www_ .” is not considered part of an internet domain name and |
| must be dropped. |
| |
| If the first component does not contain |
| either one of the top-level domain names, for example, com, gov, net, |
| org, edu, or one of the English two-letter codes identifying countries |
| as specified in ISO Standard 3166, 1981, this step must be skipped. |
| |
| _\{“com”, “acme”, “go”, “espeak”}_ |
| |
| . For each string in the list, convert each |
| string to be all lower case. |
| |
| _\{"com”, “acme”, "go", "espeak" }_ |
| |
| . For each string remaining, the following |
| conventions are adopted from [JLS] Section 7.7, “Unique Package Names.” |
| . If the sting component contains a hyphen, |
| or any other special character not allowed in an identifier, convert it |
| into an underscore. |
| . If any of the resulting package name |
| components are keywords then append underscore to them. |
| . If any of the resulting package name |
| components start with a digit, or any other character that is not |
| allowed as an initial character of an identifier, have an underscore |
| prefixed to the component. |
| |
| _\{"com”, “acme”, "go", "espeak" }_ |
| |
| . Concatenate the resultant list of strings |
| using _’.’_ as a separating character to produce a package name. |
| |
| _Final package name: "com.acme.go.espeak"._ |
| |
| link:jaxb.html#a4767[See Collisions |
| and conflicts], specifies what to do when the above algorithm results in |
| an invalid Java package name. |
| |
| === [[a4816]]Conforming Java Identifier Algorithm |
| |
| This section describes how to convert a legal |
| Java identifier which may not conform to Java naming conventions to a |
| Java identifier that conforms to the standard naming conventions. |
| link:jaxb.html#a1608[See Customized Name Mapping]“discusses when |
| this algorithm is applied to customization names. |
| |
| Since a legal Java identifier is also a XML |
| name, this algorithm is the same as link:jaxb.html#a4656[See The |
| Name to Identifier Mapping Algorithm]” with the following exception: |
| constant names must not be mapped to a Java constant that conforms to |
| the Java naming convention for a constant. |