 //
 // Copyright (c) 2020 Contributors to the Eclipse Foundation
 //

 [appendix]
 == [[a4649]]Binding XML Names to Java Identifiers

 === Overview

 This section provides default mappings from:

 * XML Name to Java identifier
 * Model group to Java identifier
 * Namespace URI to Java package name

 === [[a4656]]The Name to Identifier Mapping Algorithm

 Java identifiers typically follow three
 simple, well-known conventions:

 * Class and interface names always begin with
 an upper-case letter. The remaining characters are either digits,
 lower-case letters, or upper-case letters. Upper-case letters within a
 multi-word name serve to identify the start of each non-initial word, or
 sometimes to stand for acronyms.
 * Method names and components of a package
 name always begin with a lower-case letter, and otherwise are exactly
 like class and interface names.
 * Constant names are entirely in upper case,
 with each pair of words separated by the underscore character (‘_’,
 \u005F, LOW LINE).

 XML names, however, are much richer than Java
 identifiers: They may include not only the standard Java identifier
 characters but also various punctuation and special characters that are
 not permitted in Java identifiers. Like most Java identifiers, most XML
 names are in practice composed of more than one natural-language word.
 Non-initial words within an XML name typically start with an upper-case
 letter followed by a lower-case letter, as in the Java language, or are
 prefixed by punctuation characters, which is not usual in the Java
 language and, for most punctuation characters, is in fact illegal.

 In order to map an arbitrary XML name into a
 Java class, method, or constant identifier, the XML name is first broken
 into a _word list_ . For the purpose of constructing word lists from XML
 names we use the following definitions:

 * A _punctuation character_ is one of the
 following:
 ** A hyphen (‘-’, \u002D, HYPHEN-MINUS),
 ** A period (‘.’, \u002E, FULL STOP),
 ** A colon (‘:’, \u003A, COLON),
 ** A dot (‘·’, \u00B7, MIDDLE DOT),
 ** \u0387, GREEK ANO TELEIA,
 ** \u06DD, ARABIC END OF AYAH,
 ** \u06DE, ARABIC START OF RUB EL HIZB, or
 ** An underscore (‘_’, \u005F, LOW LINE), with
 the following exception link:#a5380[29].

 These are all legal characters in XML names.

 * A _letter_ is a character for which the
 _Character.isLetter_ method returns _true_ , _i.e._ , a letter according
 to the Unicode standard. Every letter is a legal Java identifier
 character, both initial and non-initial.
 * A _digit_ is a character for which the
 _Character.isDigit_ method returns _true_ , _i.e._ , a digit according
 to the Unicode Standard. Every digit is a legal non-initial Java
 identifier character.
 * A _mark_ is a character that is in none of
 the previous categories but for which the
 _Character.isJavaIdentifierPart_ method returns _true_ . This category
 includes numeric letters, combining marks, non-spacing marks, and
 ignorable control characters.

 Every XML name character falls into one of
 the above categories. We further divide letters into three
 subcategories:

 * An _upper-case letter_ is a letter for
 which the _Character.isUpperCase_ method returns _true_ ,
 * A _lower-case letter_ is a letter for which
 the _Character.isLowerCase_ method returns _true_ , and
 * All other letters are _uncased_ .
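The categories above can be sketched as a small classifier. This is only an illustration, not part of the specification: the names `XmlNameChars`, `Kind`, `isPunct`, and `classify` are hypothetical, and the underscore is treated as punctuation (the default case, ignoring the exception noted above).

```java
/** Illustrative sketch: classifies a character using the categories defined above. */
final class XmlNameChars {
    private XmlNameChars() {}

    /** Hypothetical category names; OTHER covers characters outside every category. */
    enum Kind { PUNCT, DIGIT, UPPER, LOWER, UNCASED, MARK, OTHER }

    /** The punctuation characters listed above; the underscore is included (default handling). */
    static boolean isPunct(char c) {
        return c == '-' || c == '.' || c == ':' || c == '_'
            || c == '\u00B7' || c == '\u0387' || c == '\u06DD' || c == '\u06DE';
    }

    static Kind classify(char c) {
        if (isPunct(c)) return Kind.PUNCT;
        if (Character.isDigit(c)) return Kind.DIGIT;
        if (Character.isLetter(c)) {
            // Letters are subdivided into upper-case, lower-case, and uncased.
            if (Character.isUpperCase(c)) return Kind.UPPER;
            if (Character.isLowerCase(c)) return Kind.LOWER;
            return Kind.UNCASED;
        }
        // A mark is in none of the previous categories but may appear in a Java identifier.
        if (Character.isJavaIdentifierPart(c)) return Kind.MARK;
        return Kind.OTHER;
    }
}
```

Punctuation is tested first so that the underscore, which is also a legal Java identifier character, is not reported as a mark.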

 An XML name is split into a word list by
 removing any leading and trailing punctuation characters and then
 searching for _word breaks_ . A word break is defined by three regular
 expressions: A prefix, a separator, and a suffix. The prefix matches
 part of the word that precedes the break, the separator is not part of
 any word, and the suffix matches part of the word that follows the
 break. The word breaks are defined as:

 === [[a4681]]XML Word Breaks

 [width="100%",cols="25%,25%,25%,25%",options="header",]
 |===
 |Prefix |Separator |Suffix |Example

 |{empty}[^punct] |punct+ |{empty}[^punct] |foo\|--\|bar
 |digit | |{empty}[^digit] |foo\|22\|bar
 |{empty}[^digit] | |digit |foo\|22
 |lower | |{empty}[^lower] |foo\|Bar
 |upper | |upper lower |FOO\|Bar
 |letter | |{empty}[^letter] |Foo\|\u2160
 |{empty}[^letter] | |letter |\u2160\|Foo
 |uncased | |{empty}[^uncased] |
 |{empty}[^uncased] | |uncased |
 |===

 (The character _\u2160_ is ROMAN NUMERAL ONE,
 a numeric letter.)

 After splitting, if a word begins with a
 lower-case character then its first character is converted to upper
 case. The final result is a word list in which each word is either

 * A string of upper- and lower-case letters,
 the first character of which is upper case (this includes the underscore, ‘_’, in
 the exception case noted for punctuation characters above),
 * A string of digits, or
 * A string of uncased letters and marks.
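The splitting procedure can be sketched as follows. This is a minimal illustration that reads the word-break table literally and uses the default underscore handling. The class and method names (`XmlWordSplitter`, `split`, `isBreak`) are hypothetical and do not come from any binding implementation.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch: splits an XML name into a word list using the word-break rules. */
final class XmlWordSplitter {
    private XmlWordSplitter() {}

    static boolean isPunct(char c) {
        return c == '-' || c == '.' || c == ':' || c == '_'
            || c == '\u00B7' || c == '\u0387' || c == '\u06DD' || c == '\u06DE';
    }

    private static boolean lower(char c) { return Character.isLetter(c) && Character.isLowerCase(c); }
    private static boolean upper(char c) { return Character.isLetter(c) && Character.isUpperCase(c); }
    private static boolean uncased(char c) { return Character.isLetter(c) && !lower(c) && !upper(c); }

    /** True when a break falls between prev and cur; next is the character after cur, or 0 at the end. */
    static boolean isBreak(char prev, char cur, char next) {
        if (Character.isDigit(prev) != Character.isDigit(cur)) return true;   // digit / non-digit boundary
        if (lower(prev) && !lower(cur)) return true;                          // lower | [^lower]
        if (upper(prev) && upper(cur) && lower(next)) return true;            // upper | upper lower
        if (Character.isLetter(prev) != Character.isLetter(cur)) return true; // letter / non-letter boundary
        return uncased(prev) != uncased(cur);                                 // uncased / cased boundary
    }

    static List<String> split(String xmlName) {
        List<String> words = new ArrayList<>();
        StringBuilder word = new StringBuilder();
        for (int i = 0; i < xmlName.length(); i++) {
            char c = xmlName.charAt(i);
            if (isPunct(c)) {          // punctuation separates words and belongs to none of them
                flush(word, words);
                continue;
            }
            if (word.length() > 0) {
                char prev = word.charAt(word.length() - 1);
                char next = i + 1 < xmlName.length() ? xmlName.charAt(i + 1) : '\0';
                if (isBreak(prev, c, next)) flush(word, words);
            }
            word.append(c);
        }
        flush(word, words);
        return words;
    }

    /** Closes the current word, converting its first character to upper case. */
    private static void flush(StringBuilder word, List<String> words) {
        if (word.length() == 0) return;
        word.setCharAt(0, Character.toUpperCase(word.charAt(0)));
        words.add(word.toString());
        word.setLength(0);
    }
}
```

Treating every punctuation character as a point where the current word ends also covers leading and trailing punctuation, since an empty word is never added to the list.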

 Given an XML name in word-list form, each of
 the three types of Java identifiers is constructed as follows:

 * A class or interface identifier is
 constructed by concatenating the words in the list,
 * A method identifier is constructed by
 concatenating the words in the list. A prefix verb ( _get_ , _set_ ,
 _etc._ ) is prepended to the result.
 * A constant identifier is constructed by
 converting each word in the list to upper case; the words are then
 concatenated, separated by underscores.

 This algorithm will not change an XML name
 that is already a legal and conventional Java class, method, or constant
 identifier, except perhaps to add an initial verb in the case of a
 property access method.
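As a minimal sketch, the three constructions can be written over a word list that has already been split and capitalized. The class name `JavaNames` and its method names are hypothetical, and the prefix verb is passed in by the caller.

```java
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

/** Illustrative sketch: builds the three kinds of Java identifier from a word list. */
final class JavaNames {
    private JavaNames() {}

    /** Class or interface identifier: the words concatenated. */
    static String className(List<String> words) {
        return String.join("", words);
    }

    /** Method identifier: a prefix verb such as "get" or "set", then the concatenated words. */
    static String methodName(String verb, List<String> words) {
        return verb + className(words);
    }

    /** Constant identifier: each word in upper case, separated by underscores. */
    static String constantName(List<String> words) {
        return words.stream()
                .map(w -> w.toUpperCase(Locale.ROOT))
                .collect(Collectors.joining("_"));
    }
}
```

For the word list `Mixed`, `Case`, `Name`, these methods yield `MixedCaseName`, `getMixedCaseName` (with the verb `get`), and `MIXED_CASE_NAME`.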

 To improve the user experience with default
 binding, automated resolution of frequent naming collisions is
 specified in link:jaxb.html#a4770[Standardized Name
 Collision Resolution].

 === Example

 === [[a4734]]XML Names and derived Java Class, Method, and Constant Names

 [width="100%",cols="25%,25%,25%,25%",options="header",]
 |===
 |XML Name |Class Name |Method Name |Constant Name

 |mixedCaseName
 |MixedCaseName
 |getMixedCaseName
 |MIXED_CASE_NAME

 |Answer42
 |Answer42
 |getAnswer42
 |ANSWER_42

 |name-with-dashes
 |NameWithDashes
 |getNameWithDashes
 |NAME_WITH_DASHES

 |other_punct-chars
 |OtherPunctChars
 |getOtherPunctChars
 |OTHER_PUNCT_CHARS
 |===

 === [[a4755]]XML Names and derived Java Class, Method, and Constant Names when <jaxb:globalBindings underscoreHandling="asCharInWord">

 [width="100%",cols="25%,25%,25%,25%",options="header",]
 |===
 |XML Name |Class Name |Method Name |Constant Name

 |other_punct-chars
 |Other_punctChars
 |getOther_punctChars
 |OTHER_PUNCT_CHARS

 |name_with_underscore
 |Name_with_underscore
 |getName_with_underscore
 |NAME_WITH_UNDERSCORE
 |===

 === [[a4767]]Collisions and conflicts

 It is possible that the name-mapping
 algorithm will map two distinct XML names to the same word list. These
 cases result in a _collision_ if, and only if, the same Java
 identifier is constructed from the word list and is used to name two
 distinct generated classes or two distinct methods or constants in the
 same generated class. It is also possible, if two or more namespaces are
 customized to map to the same Java package, that XML names that are unique
 because they belong to distinct namespaces are mapped to the same Java
 class identifier. Collisions are not permitted by the schema compiler
 and are reported as errors; they may be repaired by revising the XML name
 within the source schema or by specifying a customized binding that maps
 one of the two XML names to an alternative Java identifier.

 A class name must not conflict with the
 generated JAXB class _ObjectFactory_ (see link:jaxb.html#a482[Java
 Package]) that occurs in each schema-derived Java package. Method
 names must not conflict with Java keywords or literals, with
 methods declared in _java.lang.Object_, or with methods declared in the
 binding-framework classes. Such conflicts are reported as errors and may
 be repaired by revising the appropriate schema or by specifying an
 appropriate customized binding that resolves the name collision.

 === [[a4770]]Standardized Name Collision Resolution

 Because an XML element or attribute named
 “class” or “Class” frequently causes a naming
 collision with the inherited method _java.lang.Object.getClass()_,
 method name mapping automatically resolves this conflict by mapping
 these XML names to the Java method identifier “getClazz”.
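A minimal sketch of this rule, assuming the XML name has already been mapped to its class-style identifier (so both “class” and “Class” arrive as `Class`). The names `GetterNames` and `getterFor` are hypothetical.

```java
/** Illustrative sketch: derives a getter name, applying the standardized "getClazz" resolution. */
final class GetterNames {
    private GetterNames() {}

    /** javaName is the class-style identifier already derived from the XML name. */
    static String getterFor(String javaName) {
        // "getClass" would collide with java.lang.Object.getClass(), so it is replaced.
        return "Class".equals(javaName) ? "getClazz" : "get" + javaName;
    }
}
```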


 === [[a4773]]Deriving a legal Java identifier from an enum facet value

 Given that an enum facet’s value is not
 restricted to an XML name, the XML Name to Java identifier algorithm is
 not applicable to generating a Java identifier from an enum facet’s
 value. The following algorithm maps an enum facet value to a valid Java
 constant identifier name.

 * For each character in the enum facet value,
 copy the character to a string representation _javaId_ when
 _java.lang.Character.isJavaIdentifierPart()_ is _true_.
 * To follow the Java constant naming convention,
 each valid lower-case character must be copied as its upper-case
 equivalent.
 * There is no derived Java constant
 identifier when any of the following occur:
 ** _javaId.length() == 0_
 ** _java.lang.Character.isJavaIdentifierStart(javaId.charAt(0)) == false_
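The algorithm above can be sketched directly. The class name `EnumConstantNames` and the method name `fromFacetValue` are hypothetical, and returning an empty `Optional` is one possible way to model "no derived identifier".

```java
import java.util.Optional;

/** Illustrative sketch: derives a Java constant identifier from an enum facet value. */
final class EnumConstantNames {
    private EnumConstantNames() {}

    static Optional<String> fromFacetValue(String facetValue) {
        StringBuilder javaId = new StringBuilder();
        for (int i = 0; i < facetValue.length(); i++) {
            char c = facetValue.charAt(i);
            if (Character.isJavaIdentifierPart(c)) {
                // Lower-case characters are copied as their upper-case equivalents.
                javaId.append(Character.isLowerCase(c) ? Character.toUpperCase(c) : c);
            }
        }
        // No identifier can be derived from an empty result or an illegal first character.
        if (javaId.length() == 0 || !Character.isJavaIdentifierStart(javaId.charAt(0))) {
            return Optional.empty();
        }
        return Optional.of(javaId.toString());
    }
}
```

Under this sketch the facet value `light blue` yields `LIGHTBLUE`, because the space is not a Java identifier character, while `1st` yields no identifier, because the result would start with a digit.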

 === [[a4780]]Deriving an identifier for a model group

 XML Schema has the concept of a group of
 element declarations. Occasionally, it is convenient to bind the
 grouping as a Java content property or a Java value class. When a
 semantically meaningful name for the group is not provided within the
 source schema or via a binding declaration customization, it is
 necessary to generate a Java identifier from the grouping. Below is an
 algorithm to generate such an identifier.

 A name is computed for an unnamed model group
 by concatenating together the first 3 element declarations and/or
 wildcards that occur within the model group. Each XML \{name} is mapped
 to a Java identifier for a method using the XML Name to Java Identifier
 Mapping algorithm. Since a wildcard does not have a \{name} property, it
 is represented as the Java identifier “_Any_”. The Java identifiers
 are concatenated together with the separator “_And_” for sequence and
 all compositors and “_Or_” for choice compositors. For example, a
 sequence of element _foo_ and element _bar_ maps to “_FooAndBar_”,
 and a choice of element _foo_ and element _bar_ maps to
 “_FooOrBar_”. Lastly, a sequence of a wildcard and element _bar_
 maps to the Java identifier “_AnyAndBar_”.

 === Example:

 Given the XML Schema fragment:

 <xs:choice> +
 <xs:sequence> +
 <xs:element ref="A"/> +
 <xs:any processContents="strict"/> +
 </xs:sequence> +
 <xs:element ref="C"/> +
 </xs:choice>

 The generated Java identifier would be
 _AAndAnyOrC_ .
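The naming rule can be sketched over a small particle tree. The handling of nested groups here (depth-first traversal, with each group joined by its own separator and at most three leaves counted overall) is an assumption inferred from the example above, not a statement of the specification. The names `ModelGroupNames`, `Particle`, and `nameOf` are hypothetical, and element names are assumed to be already mapped to Java identifiers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative sketch: derives a name for an unnamed model group. */
final class ModelGroupNames {
    private ModelGroupNames() {}

    /** A leaf (element or wildcard) has a name; a group has a separator and children. */
    static final class Particle {
        final String name;
        final String separator;
        final List<Particle> children;

        private Particle(String name, String separator, List<Particle> children) {
            this.name = name;
            this.separator = separator;
            this.children = children;
        }
    }

    static Particle element(String javaName) { return new Particle(javaName, null, null); }
    static Particle any() { return new Particle("Any", null, null); }
    /** Used for both sequence and all compositors. */
    static Particle sequence(Particle... c) { return new Particle(null, "And", Arrays.asList(c)); }
    static Particle choice(Particle... c) { return new Particle(null, "Or", Arrays.asList(c)); }

    static String nameOf(Particle group) {
        return build(group, new int[] {3});   // only the first 3 leaves contribute
    }

    private static String build(Particle p, int[] remaining) {
        if (p.children == null) {             // element declaration or wildcard
            if (remaining[0] == 0) return "";
            remaining[0]--;
            return p.name;
        }
        List<String> parts = new ArrayList<>();
        for (Particle child : p.children) {
            String s = build(child, remaining);
            if (!s.isEmpty()) parts.add(s);
        }
        return String.join(p.separator, parts);
    }
}
```

With these helpers, `nameOf(choice(sequence(element("A"), any()), element("C")))` reproduces the identifier from the example.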

 === [[a4788]]Generating a Java package name

 This section describes how to generate a
 package name to hold the derived Java representation. The motivation for
 specifying a default means to generate a Java package name is to
 increase the chances that a schema can be processed by a schema compiler
 without requiring the user to specify customizations.

 If a schema has a target namespace, the next
 subsection describes how to map the URI into a Java package name. If the
 schema has no target namespace, there is a section that describes an
 algorithm to generate a Java package name from the schema filename.

 === Mapping from a Namespace URI

 An XML namespace is represented by a URI.
 Since an XML namespace will be mapped to a Java package, it is necessary to
 specify a default mapping from a URI to a Java package name. The URI
 format is described in [RFC2396].

 The following steps describe how to map a URI
 to a Java package name. The example URI,
 _http://www.acme.com/go/espeak.xsd_ , is used to illustrate each step.

 . Remove the scheme and _":"_ part from the
 beginning of the URI, if present. +
 Since there is no formal syntax to identify the optional URI scheme,
 only the schemes “_http_” and “_urn_” are removed, and they are
 matched case-insensitively.

  _//www.acme.com/go/espeak.xsd_

 . Remove the trailing file type, one of _.??_
 or _.???_ or _.html_ .

  _//www.acme.com/go/espeak_

 . Parse the remaining string into a list of
 strings using _’/’_ and _‘:’_ as separators. Treat consecutive
 separators as a single separator.

  _\{"www.acme.com", "go", "espeak" }_

 . For each string in the list produced by
 the previous step, unescape each escape sequence octet.

  _\{"www.acme.com", "go", "espeak" }_

 . If the scheme is “urn”, replace all
 dashes, “-”, occurring in the first component with
 “.”. link:#a5381[30]
 . Apply the algorithm described in Section 7.7
 “Unique Package Names” in [JLS] to derive a unique package name from the
 potential internet domain name contained within the first component. The
 internet domain name is reversed, component by component. Note that a
 leading “_www._” is not considered part of an internet domain name and
 must be dropped.

 If the first component does not contain
 one of the top-level domain names (for example, com, gov, net,
 org, or edu) or one of the English two-letter codes identifying countries
 as specified in ISO Standard 3166, 1981, this step must be skipped.

  _\{"com", "acme", "go", "espeak" }_

 . Convert each string in the list
 to lower case.

  _\{"com", "acme", "go", "espeak" }_

 . For each string remaining, the following
 conventions are adopted from [JLS] Section 7.7, “Unique Package Names”:
 .. If the string component contains a hyphen,
 or any other special character not allowed in an identifier, convert it
 into an underscore.
 .. If any of the resulting package name
 components are keywords, append an underscore to them.
 .. If any of the resulting package name
 components start with a digit, or any other character that is not
 allowed as an initial character of an identifier, prefix an underscore
 to the component.

  _\{"com", "acme", "go", "espeak" }_

 . Concatenate the resultant list of strings
 using _’.’_ as a separating character to produce a package name.

  _Final package name: "com.acme.go.espeak"._

 link:jaxb.html#a4767[Collisions
 and conflicts] specifies what to do when the above algorithm results in
 an invalid Java package name.
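The steps above can be sketched as follows. This is a simplified illustration under stated assumptions, not a conforming implementation: it omits step 4 (unescaping), removes a trailing file type only from the last of several components so that a bare host name keeps its top-level domain, and treats any two-letter final label as a stand-in for an ISO 3166 country code. The names `NamespacePackages` and `toPackageName` are hypothetical. Keyword detection uses `javax.lang.model.SourceVersion.isKeyword`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import javax.lang.model.SourceVersion;

/** Illustrative sketch: maps a namespace URI to a default Java package name. */
final class NamespacePackages {
    private NamespacePackages() {}

    // Top-level domains named in the text; two-letter labels are accepted separately (assumption).
    private static final Set<String> TLDS =
            new HashSet<>(Arrays.asList("com", "gov", "net", "org", "edu"));

    static String toPackageName(String uri) {
        // Step 1: remove a leading "http:" or "urn:" (case-insensitive).
        String lower = uri.toLowerCase(Locale.ROOT);
        boolean urn = lower.startsWith("urn:");
        String rest = urn ? uri.substring(4)
                : lower.startsWith("http:") ? uri.substring(5) : uri;

        // Step 3: split on '/' and ':'; consecutive separators count as one.
        List<String> parts = new ArrayList<>();
        for (String s : rest.split("[/:]+")) {
            if (!s.isEmpty()) parts.add(s);
        }
        if (parts.isEmpty()) return "";

        // Step 2, applied to the last of several components (sketch assumption).
        if (parts.size() > 1) {
            int last = parts.size() - 1;
            parts.set(last, parts.get(last).replaceFirst("\\.(html|[^.]{2,3})$", ""));
        }
        // Step 4 (unescaping escaped octets) is omitted from this sketch.

        // Step 5: for URNs, dashes in the first component become dots.
        String first = parts.remove(0);
        if (urn) first = first.replace('-', '.');

        // Step 6: reverse the domain labels, dropping a leading "www".
        List<String> labels = new ArrayList<>(Arrays.asList(first.split("\\.")));
        String tld = labels.isEmpty() ? "" : labels.get(labels.size() - 1).toLowerCase(Locale.ROOT);
        if (TLDS.contains(tld) || tld.matches("[a-z]{2}")) {
            if (labels.size() > 1 && labels.get(0).equalsIgnoreCase("www")) labels.remove(0);
            Collections.reverse(labels);
            parts.addAll(0, labels);
        } else {
            parts.add(0, first);   // not a recognizable domain name: step skipped
        }

        StringBuilder pkg = new StringBuilder();
        for (String part : parts) {
            // Step 7: lower-case. Step 8: make the component a legal identifier.
            StringBuilder id = new StringBuilder(part.toLowerCase(Locale.ROOT));
            for (int i = 0; i < id.length(); i++) {
                if (!Character.isJavaIdentifierPart(id.charAt(i))) id.setCharAt(i, '_');
            }
            if (SourceVersion.isKeyword(id)) id.append('_');
            if (id.length() > 0 && !Character.isJavaIdentifierStart(id.charAt(0))) id.insert(0, '_');
            // Step 9: join the components with '.'.
            if (pkg.length() > 0) pkg.append('.');
            pkg.append(id);
        }
        return pkg.toString();
    }
}
```

For the example URI used in the steps, `toPackageName("http://www.acme.com/go/espeak.xsd")` returns `com.acme.go.espeak`.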

 === [[a4816]]Conforming Java Identifier Algorithm

 This section describes how to convert a legal
 Java identifier that may not conform to Java naming conventions into a
 Java identifier that conforms to the standard naming conventions.
 link:jaxb.html#a1608[Customized Name Mapping] discusses when
 this algorithm is applied to customization names.

 Since a legal Java identifier is also an XML
 name, this algorithm is the same as link:jaxb.html#a4656[The
 Name to Identifier Mapping Algorithm], with the following exception:
 constant names must not be mapped to a Java constant that conforms to
 the Java naming convention for a constant.