Next: Limitations Up: XML Enhancements to Java™ Previous: Getting Started



XJ Language Overview

The XJ programming language adds a set of classes to the standard class library defined by Java (see Figure 1). At the root of this set of classes is the XMLObject class, which corresponds loosely to the Node class in DOM [World Wide Web Consortium 2000]. The XMLObject class and its subclasses are treated specially by the XJ compiler.

[Figure 1: The hierarchy of XJ types]

In the class hierarchy, XMLElement is the superclass of all classes corresponding to XML element declarations, and XMLAtomic is the superclass of all classes corresponding to XML atomic types. The XMLObject and XMLAtomic classes are abstract and instances of them cannot be constructed directly. Instances of XMLElement may be constructed, as described in §3.6.

The result of an XPath expression is always an instance of the Sequence class, which corresponds to an ordered list of zero or more XMLObject instances. The XMLCursor class implements the java.util.Iterator interface, and is used to iterate over instances of the Sequence class. Both of these classes support a limited form of genericity, as defined in Java 5.0 [Sun Microsystems], for more robust typechecking. The classes in the com.ibm.xj.io package support serialization of XML data to a java.io.OutputStream.

In this section, we will describe the XJ classes, XPath expressions, and the construction of XML data in greater detail. We will also describe the interaction between XML classes and the types of Java -- the coercions between XML Schema types (such as xsd:int) and corresponding primitive types (such as int).


XJ Type References

The XJ language allows programmers to refer to element declarations defined in XML Schemas as if they were classes. In addition, XJ programmers may refer to the XML Schema built-in types [Biron and Malhotra 2004] as classes as well.


Referring to XML Schemas

An XML Schema in XJ is treated in a manner similar to a package in Java. When resolving a type, the XJ compiler first attempts to resolve a simple name using the standard mechanisms of Java [Gosling et al. 2000, §6]. If a corresponding package or type cannot be found, the compiler appends ".xsd" to the simple name and attempts to discover an XML Schema of that name using the current classpath. For example, consider the type reference:

com.ibm.xj.samples.totals.salesschema

Assume the compiler has found a package corresponding to com.ibm.xj.samples.totals. The compiler will then attempt to discover an appropriate package or type named salesschema in the package com.ibm.xj.samples.totals. If this fails, the compiler will try to find the resource with the name com/ibm/xj/samples/totals/salesschema.xsd using the standard resource mechanism of Java (that is, using the classpath).
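
The fallback lookup can be pictured in plain Java. The following is a minimal sketch, not the XJ compiler's code: the class and method names are ours, and it only assumes that a dotted schema reference is turned into a classpath resource name by replacing "." with "/" and appending ".xsd", as in the example above.

```java
import java.net.URL;

class SchemaLookupSketch {
    /** Maps a dotted schema reference to the classpath resource name described in the text. */
    static String toResourceName(String qualifiedName) {
        return qualifiedName.replace('.', '/') + ".xsd";
    }

    /** Returns the URL of the schema if it is on the classpath, or null if no such resource exists. */
    static URL findSchema(String qualifiedName) {
        ClassLoader loader = SchemaLookupSketch.class.getClassLoader();
        if (loader == null) {
            loader = ClassLoader.getSystemClassLoader();
        }
        return loader.getResource(toResourceName(qualifiedName));
    }

    public static void main(String[] args) {
        String ref = "com.ibm.xj.samples.totals.salesschema";
        System.out.println(toResourceName(ref));
        // Prints null unless salesschema.xsd is actually on the classpath.
        System.out.println(findSchema(ref));
    }
}
```

Because the lookup goes through the ordinary resource mechanism, a schema packaged in a jar is found in the same way as one in a directory on the classpath.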


Referring to Element Declarations

Element declarations in XML Schema may be global or local in scope. Global element declarations appear at the top level of the XML Schema document. Local element declarations, on the other hand, appear entirely within a complex type definition. Names used in local element declarations are scoped to the complex type within which they are declared. For example, in the XML Schema defined in Appendix A, the declaration of element salesdata is global. Local declarations of a sales element occur in the definitions of the complex types YearType and RegionType.

Global element declarations in an XML Schema are treated in XJ as top-level classes in a package. So for example, the reference:

com.ibm.xj.samples.totals.salesschema.salesdata
would refer to the XJ class corresponding to the global element declaration of salesdata in the XML Schema, salesschema.

Local element declarations are named as if they were nested classes. Element names of local scope are disambiguated by qualifying the names with the sequence of names of containing elements, where each name in the sequence is separated by a ".". The following type refers to the sales element that is defined within the year local element declaration in salesdata.

com.ibm.xj.samples.totals.salesschema.salesdata.year.sales

The following type refers to a different sales element declaration, the one that is defined in the scope of RegionType (instances of these types are not assignable to each other):

com.ibm.xj.samples.totals.salesschema.salesdata.year.region.sales

Two types corresponding to elements in an XML Schema are the same if they refer to the same element declaration in the XML Schema. Note that there may be multiple ways of specifying the same XJ type. All XJ XML classes derived from element declarations are subclasses of the XMLElement class.

Schema Built-In Types

XJ supports the built-in atomic types, such as xsd:string and xsd:decimal, defined by XML Schema. Appendix B lists all the built-in types defined by XML Schema. To refer to an XML Schema built-in atomic type in an XJ program, the type name is prefaced by "com.ibm.xj.xsd." or just by "xsd.". For example, xsd.integer refers to the XML Schema built-in type xsd:integer.

All the subtyping relationships between types listed in Appendix B are supported by XJ. For example, the following assignment is valid because xsd:short is a subtype of xsd:decimal:

xsd.short shortVar = ...;
xsd.decimal decVar = shortVar;

The list types in Appendix B (xsd:NMTOKENS, xsd:IDREFS, and xsd:ENTITIES) are represented using the Sequence class (§3.3) in XJ. For example, the built-in list type xsd:NMTOKENS is represented as the XJ type Sequence<xsd.NMTOKEN>.

Currently, construction of XML atomic types and references to user-defined atomic and simple types is not supported.


Import of XML types

XJ modifies the semantics of the import statement (the syntax remains unchanged). In addition to type-import-on-demand declarations for packages and single-type-import declarations, XJ supports type-import-on-demand declarations for XML Schemas, and single-type-import declarations for XML element and atomic type declarations in XML Schemas, using the lookup mechanism described in §3.1.1.

The rules for disambiguation of names are as in Java. Local names hide imported names, and if the same simple name is used for two different imported types, a compile-time error occurs. The type-import-on-demand declaration of an XML Schema behaves as that of a type-import-on-demand of a package in Java. Specifically, the same disambiguation rules hold if two XML Schemas (or an XML Schema and a package) declare types whose names conflict.

Type-on-demand Import of an XML Schema

Consider an on-demand import statement of the form (as in Java):

import PackageOrTypeName.*;

The XJ compiler first attempts to resolve PackageOrTypeName using the standard mechanisms used by Java. If a corresponding package or type cannot be found, the compiler appends ".xsd" to PackageOrTypeName and attempts to discover an XML Schema of that name using the mechanism of §3.1.1. If such an XML Schema is discovered, all global element declarations in the XML Schema are available within the compilation unit. Programmers can declare variables, methods, and fields using these types; they may be used wherever a reference type is expected. For example, given an import-on-demand statement of salesschema, the use of the simple name salesdata, which refers to the global element declaration in salesschema, is valid:

import com.ibm.xj.samples.totals.salesschema.*;
...
salesdata document;

Single-type-import of XML Schema Declarations

A single type import statement with respect to an XML Schema behaves as expected, that is, only the declaration referred to by the statement is visible within the compilation unit. In the example below, the first import statement imports only the global element salesdata from the XML Schema salesschema, and the second import statement imports the XJ type corresponding to the sales element defined in YearType.

import com.ibm.xj.samples.totals.salesschema.salesdata;
import com.ibm.xj.samples.totals.salesschema.salesdata.year.sales;

The following statement imports a different sales element declaration, the one that is defined in the scope of RegionType (instances of the two sales types are not assignable to each other):

import com.ibm.xj.samples.totals.salesschema.salesdata.year.region.sales;

Default Imports

The XJ compiler automatically adds an import of the form com.ibm.xj.* to all compilation units (in addition to the automatic import of java.lang.*). A programmer may therefore refer to XMLObject, Sequence, and other classes in the default XJ package without the "com.ibm.xj" prefix.


XJ Generics

For better static typing of XPath expressions, XJ supports a limited form of generic types based on Java 5.0 [Sun Microsystems ]. The Sequence and XMLCursor types are generic container classes, and may be parametrized with respect to other XJ types. So, for example, the type of an ordered sequence that contains item elements would be com.ibm.xj.Sequence<item> (or Sequence<item> for short).

The Sequence class is used to hold lists of XML values -- the result of the evaluation of an XPath expression is always a Sequence. A Sequence<E> class, where E is the formal type parameter, has the following methods:

The XMLCursor class implements the java.util.Iterator interface, and is used to iterate over a Sequence. The subtyping rules for generics in Java 5.0 apply to these two classes. Specifically, an instance of Sequence<XMLElement> cannot be assigned to a variable of type Sequence<XMLObject>, even though XMLElement is a subclass of XMLObject.
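
The same invariance rule can be observed in plain Java 5.0 generics. The sketch below uses java.util.List, Integer, and Number as stand-ins for Sequence, XMLElement, and XMLObject; it illustrates the Java rule only. Whether XJ's limited generics accept wildcards such as the one shown is not stated here, so that part should be read as ordinary Java rather than as XJ.

```java
import java.util.ArrayList;
import java.util.List;

class InvarianceSketch {
    /** Accepts a list of Number or of any subtype of Number (read-only use). */
    static double total(List<? extends Number> values) {
        double sum = 0;
        for (Number n : values) {
            sum += n.doubleValue();
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Integer> ints = new ArrayList<>();
        ints.add(40);
        ints.add(2);
        // List<Number> nums = ints;          // rejected by javac: generic types are invariant
        List<? extends Number> view = ints;   // accepted in Java: a wildcard reference
        System.out.println(total(view));
    }
}
```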


Automatic Unboxing

To simplify programming, XJ supports automatic unboxing of Sequence and XJ element classes. An assignment of an XML sequence class to an XML element class is allowed statically if the parametric type of the Sequence is compatible with the XML element class. For example, in the following:

salesdata s;
Sequence<salesdata> seq = ...;
s = seq;
The assignment s = seq; will be valid statically because the parametric type of the sequence and the type of the element class are compatible (they are both salesdata). The dynamic semantics of the unboxing is to retrieve the element contained in the sequence seq if seq is a singleton sequence, and to throw a com.ibm.xj.NonSingletonSequenceException otherwise.
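
The dynamic rule can be sketched in plain Java as follows. This is an illustration under stated assumptions, not the XJ runtime: java.util.List stands in for Sequence, and NonSingletonSketchException is a hypothetical stand-in for com.ibm.xj.NonSingletonSequenceException.

```java
import java.util.List;

class UnboxSketch {
    /** Hypothetical stand-in for com.ibm.xj.NonSingletonSequenceException. */
    static class NonSingletonSketchException extends RuntimeException {
        NonSingletonSketchException(int size) {
            super("expected exactly one item, found " + size);
        }
    }

    /** Returns the sole member of seq, or throws if seq is not a singleton. */
    static <E> E unbox(List<E> seq) {
        if (seq.size() != 1) {
            throw new NonSingletonSketchException(seq.size());
        }
        return seq.get(0);
    }
}
```

In XJ the compiler inserts the equivalent of this check at the assignment, so the programmer writes only s = seq;.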

Similarly, the compiler allows automatic unboxing of an element class to an atomic class, if according to the XML Schema declaration of the element, the element has atomic content compatible with the atomic class. For example, consider the statements:

salesdata.year.theyear y = ...;
xsd.string s = y;

where y is an XJ variable declared to be of the XML element class salesdata.year.theyear. Since the content of theyear is declared to be xsd:string, the compiler allows the assignment. At runtime, s would be set to refer to the contents of the element referred to by y.


XPath Expressions

The ability to specify XPath expressions inline in XJ enables programmers to extract information from XML data in a concise and declarative manner. Most of the processing of XML data in XJ is done with XPath expressions; to this end, the expression syntax of Java is extended to include XPath expressions. An XPath expression in XJ is any expression that satisfies the production XPath, defined as follows:

XPath  ::= QualifiedIdentifier '[|' XP '|]'
         | Primary '[|' XP '|]'
XP     ::= Expr

QualifiedIdentifier [Gosling et al. 2000, §18.1, page 449] and Primary [Gosling et al. 2000, §18.1, page 451] refer to the corresponding productions defined by the Java Language Specification. XP corresponds to an Expr, where this non-terminal is as defined by XPath 1.0 [Clark and DeRose 1999, §3.1]. We will refer to the QualifiedIdentifier or Primary with respect to which the XPath expression is evaluated as the context specifier. The XJ parser lexically analyzes XPath expressions as specified by the XPath specification; keywords and other parsing constructs of Java do not apply while parsing an XPath expression.

Referring to In-Scope Variables

The XPath expression can refer to XJ variables that are visible in the current scope. The following sample code demonstrates the use of an XPath expression that refers to an XJ variable min. At runtime, the XPath expression would return all year children of the context node (the value referred to by sd) such that the sum of the contents of the sales descendants of the year element is greater than the value of min.

int min = 70;
Sequence<year> ys = sd[|year[sum(.//sales) > $min]|];
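
For comparison, the same query can be written against the standard javax.xml.xpath API, where the variable binding has to be supplied by hand. The sketch below is plain Java, not XJ; the class name, method name, and sample document are ours, and the result is untyped DOM nodes rather than a Sequence<year>.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathVariableSketch {
    /** Counts the year children of the root whose descendant sales sum to more than min. */
    static int countYearsAbove(String xml, double min) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // Bind $min explicitly; XJ takes the value from the in-scope variable instead.
            xpath.setXPathVariableResolver(
                    name -> "min".equals(name.getLocalPart()) ? Double.valueOf(min) : null);
            NodeList years = (NodeList) xpath.evaluate(
                    "year[sum(.//sales) > $min]",
                    doc.getDocumentElement(),
                    XPathConstants.NODESET);
            return years.getLength();
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<salesdata>"
                + "<year><sales>50</sales><region><sales>30</sales></region></year>"
                + "<year><sales>10</sales></year>"
                + "</salesdata>";
        System.out.println(countYearsAbove(xml, 70));
    }
}
```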

Static Semantics

An XPath expression is valid in XJ if the type of its context specifier resolves to a subclass of XMLElement or to a Sequence. It is a static error if the evaluation of the XPath expression returns an empty result for all XML values that are valid with respect to the type of the context specifier. For example, the XPath expression in the following code raises a static error: year elements do not have children labeled sale, so the compiler can determine that the XPath expression will always return an empty result at runtime.

year y = ...;
Sequence s  = y[|sale|];

All variables used in the XPath expression must be visible in the current scope. Only variables whose names conform to the identifier syntax of Java are considered valid; specifically, variables whose names contain namespace references raise a compile-time error. The static type of an XPath expression is always an instantiation of com.ibm.xj.Sequence<...>. If the compiler can determine that the XPath expression always evaluates to instances of a particular XML class, the static type is a Sequence of that class. Otherwise, it is Sequence<XMLObject>. For example, in the XPath expression below, the static type is determined to be Sequence<year>, since the XPath expression will always return zero or more year elements.

int min = 70;
Sequence<year> ys = sd[|year[sum(.//sales) > $min]|];
A unique type cannot be determined for the result of the following XPath expression; the static type of the XPath expression is Sequence<XMLObject>.

sd[|//*|];

Runtime Semantics

At runtime, the XPath expression is evaluated with respect to the XML value referred to by the context specifier, as defined by XPath 1.0. If the context specifier evaluates to a Sequence at runtime, each member of the Sequence is used as a context node in the evaluation of the XPath expression, and the union of the results of all evaluations is taken to form the result. The result is an instance of Sequence -- an ordered list of zero or more XML values with no duplicates. According to the XPath 1.0 specification, an XPath expression can return a node set, number, boolean value, or string. When the result of the XPath expression is a node set, the XJ result is a Sequence consisting of the nodes in the set in arbitrary order. When the result of the XPath expression is not a node set, the XJ result is a Sequence of the appropriate type (for example, Sequence<xsd.boolean> when the result is a boolean value). Variable references are resolved to their current values at the time the XPath expression is evaluated.
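
The union step for a Sequence-valued context specifier can be sketched in plain Java. This is only an illustration: java.util.List stands in for Sequence, the step function stands in for the evaluation of the XPath expression on one context node, and duplicates are detected with equals(), whereas an XPath implementation compares node identity. The encounter order kept by the sketch is an artifact of LinkedHashSet; as noted above, XJ makes no promise about the order of a node set.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

class UnionSketch {
    /** Applies step to every context item and merges the results without duplicates. */
    static <N, R> List<R> evaluateOverSequence(List<N> contexts, Function<N, List<R>> step) {
        Set<R> merged = new LinkedHashSet<>();
        for (N context : contexts) {
            merged.addAll(step.apply(context));
        }
        return new ArrayList<>(merged);
    }
}
```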


Construction of XML Data

XJ introduces two mechanisms for constructing instances of XML classes that correspond to XML Schema element type declarations. XML data can be constructed either from an external source or by embedding literal XML in an XJ program.

Constructing XML Instances from External Sources

XMLElement and all XML classes corresponding to global element declarations have three constructors, which take a stream, a file, or a URL as the source of the data (the examples below pass a java.io.FileInputStream, a java.io.File, and a java.net.URL). In these examples, salesdata is an XJ class derived from an XML Schema element declaration. The constructors load the XML data from the stream, file, or URL as appropriate and construct instances of the XJ classes corresponding to the elements in the data. For example, the following expressions load an XML value and construct instances of salesdata:
new salesdata(new java.io.FileInputStream("com/ibm/xj/samples/totals/chart.xml"));
new salesdata(new java.io.File("com/ibm/xj/samples/totals/chart.xml"));
new salesdata(new java.net.URL("file:com/ibm/xj/samples/totals/chart.xml"));

Inline Construction of XML

XJ supports the construction of XML data using literal XML -- the syntax of this construction is similar to that of direct element construction in XQuery [Katz et al. 2003]. For example, the following statement creates an instance of the theyear element class:

theyear y = new theyear(<theyear>1998</theyear>);

The XML literal within the parentheses can be any well-formed block of XML:

region r = new region(<region>
                        <name>NorthEast</name>
                        <sales unit='GBP'>75</sales>
                      </region>);

The content of the XML literal is validated according to the XML Schema declaration for the declared element type, in this case, region. One can construct dynamic XML based on expressions evaluated at runtime. For example, given the previous two statements, one can construct an instance of salesdata:

float conversion = 1.9f;
salesdata s =
  new salesdata(
    <salesdata>
      <year>
        {y}
        <sales unit='Dollars'>{grossSales * conversion}</sales>
        {r}
      </year>
    </salesdata>);

The braces '{' and '}' delimit XJ expressions that are evaluated at runtime to provide the values that are to be inserted during construction. The pattern '{{' is interpreted as a single '{' in case the programmer wishes to insert a literal '{'. Similarly, '}}' is interpreted as '}'[*]. In the example, grossSales is some XJ variable that is visible in the current scope. Assuming that the runtime value of grossSales is 100, the result of evaluating the construction is:

<salesdata>
  <year>
    <theyear>1998</theyear>
    <sales unit='Dollars'>190.0</sales>
    <region>
      <name>NorthEast</name>
      <sales unit='GBP'>75</sales>
    </region>
  </year>
</salesdata>
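
The brace rules can be illustrated with a small plain-Java template expander. This is a sketch of the escaping convention only, under the simplifying assumption that each '{...}' contains a bare name looked up in a map; XJ itself evaluates arbitrary expressions inside the braces and builds typed XML values rather than strings. The class and method names are ours.

```java
import java.util.Map;

class BraceSketch {
    /** Replaces {name} with values.get(name); '{{' and '}}' yield literal braces. */
    static String expand(String template, Map<String, String> values) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < template.length()) {
            char c = template.charAt(i);
            if (c != '{' && c != '}') {
                out.append(c);
                i++;
            } else if (i + 1 < template.length() && template.charAt(i + 1) == c) {
                out.append(c);            // doubled brace: emit one literal brace
                i += 2;
            } else if (c == '}') {
                throw new IllegalArgumentException("unmatched '}' at index " + i);
            } else {
                int close = template.indexOf('}', i + 1);
                if (close < 0) {
                    throw new IllegalArgumentException("unclosed '{' at index " + i);
                }
                String name = template.substring(i + 1, close).trim();
                String value = values.get(name);
                if (value == null) {
                    throw new IllegalArgumentException("no value for " + name);
                }
                out.append(value);
                i = close + 1;
            }
        }
        return out.toString();
    }
}
```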

To construct untyped XML, that is, XML that is not to be validated with respect to any XML Schema, one can use the XML literal constructor defined for XMLElement:

XMLElement a = new XMLElement(<theyear>1998</theyear>);

where the argument to the constructor is any well-formed block of XML data. The type of the XML value constructed is XMLElement. The variable a cannot be assigned to the variable y declared above even though they are constructed from the same literal XML. The variable a is an instance of XMLElement, and the variable y is an instance of theyear, and an XMLElement may not be assigned to a theyear.


Namespaces

XJ supports the declaration of XML namespaces -- an XJ reference to an XML Schema can be used as the namespace prefix for the target namespace associated with the schema. So, if sampleschema is a reference to an XML Schema that is associated with the target namespace "http://sample.com/sampleschema", then "sampleschema" can be used as the namespace prefix corresponding to that target namespace in XPath expressions and construction.

For example, the following code constructs an element book in the target namespace associated with sampleschema. Note the use of xmlns attributes to declare new namespace prefixes (as in XML).

import sampleschema.book;
...
new book(<sampleschema:book>
           <au:author xmlns:au="http://sample.com/author">
               John Steinbeck
           </au:author>
          </sampleschema:book>
        );

The following code would be invalid, since the <book> element constructed is in the default namespace and sampleschema.book is in the namespace associated with sampleschema, "http://sample.com/sampleschema":

import sampleschema.book;
...
new book(<book>
           <au:author xmlns:au="http://sample.com/author">
               John Steinbeck
           </au:author>
          </book>
        );


Implicit Coercions

XJ supports a number of implicit coercions between Java types and XML types to simplify programming. The rules for automatic coercion from an XML atomic built-in type to a Java type are summarized in Table I below. The XJ classes for the built-in types xsd.QName, xsd.NOTATION, xsd.base64Binary, xsd.hexBinary and xsd.anyURI do not have any implicit coercions defined. A value of one of these built-in atomic types may be assigned to an XML variable of the same type or used in the construction of an element type instance.

Consider primitive XML Schema types. An implicit coercion of the primitive XML Schema type xsd.float to its corresponding primitive Java type float is allowed. Similarly, an implicit coercion of xsd.double to the primitive Java type double is allowed, and of xsd.boolean to the primitive Java type boolean. An implicit coercion of the primitive XML Schema type xsd.decimal to the primitive Java type double is allowed.

The primitive XML Schema type xsd.string is a finite length sequence of characters. The atomic class corresponding to xsd.string can be implicitly coerced into the Java class java.lang.String, which is a fixed length sequence of the primitive type char, which is a 16-bit unsigned integer representing a Unicode character.[*]

The XML Schema built-in numeric types xsd:byte, xsd:short, xsd:int, and xsd:long derive from the built-in type xsd:integer. Each of these derived XML Schema types can be implicitly coerced to the Java primitive type of the same name. xsd:integer in turn derives from the primitive type xsd:decimal, and can be implicitly coerced to the Java primitive type long.


Table I: Implicit coercions from XML built-in atomic types to Java types

XML Schema type          Java type
-----------------------  -------------------
xsd:anySimpleType        Sequence<XMLObject>
xsd:float                float
xsd:double               double
xsd:boolean              boolean
xsd:string               java.lang.String
xsd:normalizedString     java.lang.String
xsd:token                java.lang.String
xsd:language             java.lang.String
xsd:Name                 java.lang.String
xsd:NMTOKEN              java.lang.String
xsd:NCName               java.lang.String
xsd:ID                   java.lang.String
xsd:IDREF                java.lang.String
xsd:ENTITY               java.lang.String
xsd:decimal              double
xsd:integer              long
xsd:long                 long
xsd:int                  int
xsd:short                short
xsd:byte                 byte
xsd:nonPositiveInteger   long
xsd:negativeInteger      long
xsd:nonNegativeInteger   long
xsd:positiveInteger      long
xsd:unsignedLong         long
xsd:unsignedInt          long
xsd:unsignedShort        long
xsd:unsignedByte         long
xsd:date                 java.util.Date
xsd:dateTime             java.util.Date
xsd:gMonthDay            java.util.Date
xsd:gDay                 java.util.Date
xsd:gMonth               java.util.Date
xsd:gYearMonth           java.util.Date
xsd:gYear                java.util.Date
xsd:time                 java.util.Date
xsd:duration             long


Any non-primitive XML Schema atomic type T must derive from some XML Schema primitive type P. Values of the type T can be implicitly coerced to the same Java type as the type P.
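
One way to picture this rule is as a lookup that walks up the derivation chain until it reaches a type with a known Java counterpart. The plain-Java sketch below is ours and covers only a few rows of Table I; the derived type ex:zipCode is hypothetical and appears only to exercise the walk. It does not describe how the XJ compiler represents types internally.

```java
import java.util.HashMap;
import java.util.Map;

class CoercionLookupSketch {
    /** A few rows of Table I: XML Schema type name to Java type. */
    static final Map<String, Class<?>> JAVA_TYPE = new HashMap<>();
    /** Derivation links: type name to the name of its base type. */
    static final Map<String, String> BASE_TYPE = new HashMap<>();

    static {
        JAVA_TYPE.put("xsd:string", String.class);
        JAVA_TYPE.put("xsd:decimal", double.class);
        JAVA_TYPE.put("xsd:integer", long.class);
        JAVA_TYPE.put("xsd:int", int.class);
        BASE_TYPE.put("xsd:token", "xsd:normalizedString");
        BASE_TYPE.put("xsd:normalizedString", "xsd:string");
        BASE_TYPE.put("ex:zipCode", "xsd:token"); // hypothetical derived type
    }

    /** Returns the Java type of the nearest ancestor (or the type itself) that has an entry. */
    static Class<?> javaTypeFor(String xsdType) {
        String current = xsdType;
        while (current != null) {
            Class<?> javaType = JAVA_TYPE.get(current);
            if (javaType != null) {
                return javaType;
            }
            current = BASE_TYPE.get(current);
        }
        throw new IllegalArgumentException("no coercion known for " + xsdType);
    }
}
```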

In §3.4, we described how an XML element or sequence class can be unboxed to an XML atomic class. Given the rules for implicitly coercing an XML atomic type to a Java type, one can compose coercions and unboxing to implicitly coerce an XML element or sequence class to a Java type. For example, consider the XJ statement:

String s = sd[|theyear|];

where sd is an XJ variable declared to be of an XML element class salesdata. The result type of the XPath expression sd[|theyear|], Sequence<theyear>, can be unboxed to the XML element class theyear, which in turn can be unboxed to the XML atomic class xsd.string. Finally, the assignment brings about another implicit coercion, from xsd.string to java.lang.String.

As another example, consider the XJ statement:

double d = sd[|year/sales|];

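The XPath expression year/sales yields a singleton sequence whose member is of element class sales, which has simple content based on xsd:double. The sequence is therefore unboxed to sales, then to xsd.double, and finally coerced to the Java primitive type double.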

Explicit cast operations from XML Schema to Java work as expected. So, (String)sd[|theyear|] results in the same coercions as String s = sd[|theyear|].


Output of XML Data

XJ provides helper classes for outputting an instance of an XML class to an external java.io.OutputStream. These classes, com.ibm.xj.io.XMLOutputStream and com.ibm.xj.io.XMLDocumentOutputStream, both support the interface offered by java.io.PrintStream, and behave like PrintStream when invoked on objects that are not instances of XJ XML classes. For instances of XJ XML classes, they format the XML appropriately. The standard way of using XMLOutputStream is as follows:

XMLOutputStream out = new XMLOutputStream(System.out); // Any stream can be passed in
out.println(new XMLElement(<a> ... </a>));

XMLDocumentOutputStream behaves similarly to XMLOutputStream, except that it prepends the XML header (for example, '<?xml version="1.0" encoding="UTF-8"?>') before outputting an XML class. Only the "UTF-8" encoding is supported at the moment.
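
The difference between the two streams resembles the choice, in the standard javax.xml.transform API, of whether to omit the XML declaration. The following plain-Java sketch shows that analogy only; it does not use or describe the XJ output classes, and the class and method names are ours.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class XmlHeaderSketch {
    /** Serializes xml with or without the leading XML declaration. */
    static String serialize(String xml, boolean withHeader) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, withHeader ? "no" : "yes");
            transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("serialization failed", e);
        }
    }
}
```

Calling serialize(xml, false) corresponds roughly to the role of XMLOutputStream, and serialize(xml, true) to that of XMLDocumentOutputStream.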


Exception Handling

All run-time exceptions that are XJ-specific are instances of com.ibm.xj.XJException or one of its subclasses. The XJException class is used to uniformly wrap any exceptions that may be thrown by the XJ runtime. XJExceptions are unchecked exceptions, so they do not need to be declared in the throws clause of a method. Thus, they are similar to Java's RuntimeException.

There is currently one subclass of XJException: com.ibm.xj.NonSingletonSequenceException, thrown when a Sequence with more than one element is coerced to an element type.

Here are the current scenarios that will result in throwing an instance of XJException:

  1. When the XJ runtime fails to initialize.
  2. When the XML Schema file to be imported could not be found at run time.
  3. When a change to the imported XML Schema is detected at run time.
  4. When the imported XML Schema file could not be opened at run time.
  5. When constructing from a stream and the document could not be parsed.
  6. When an XPath expression could not be evaluated.
  7. When attempting to coerce an empty sequence or a null element to an atomic type.
  8. When unable to convert the text content of an XML element to a given atomic type.

If an exception that is not a (subclass of) XJException is thrown by the XJ runtime, this is a bug and should be reported (§1.3).
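
The wrapping pattern described in this section can be sketched in plain Java. SketchXJException below is a hypothetical stand-in for com.ibm.xj.XJException, and the parse method is ours; the sketch only shows how a checked parser exception can be rethrown as an unchecked one with the original exception kept as the cause, which is the behavior scenario 5 suggests.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class WrapSketch {
    /** Hypothetical stand-in for com.ibm.xj.XJException (unchecked). */
    static class SketchXJException extends RuntimeException {
        SketchXJException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /** Parses xml, wrapping any parser failure in the unchecked stand-in exception. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new SketchXJException("document could not be parsed", e);
        }
    }
}
```

A caller that wants to recover can catch the single unchecked type and inspect getCause() for the underlying failure; callers that do not care need no throws clause.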



Footnotes

[*] (Inline Construction of XML) Note that braces can also be inserted using the '{"{"}' and '{"}"}' constructs.
[*] (Implicit Coercions) The rules for string encoding conversion are the default Java rules, which may not correspond to the XML Schema rules.


XJ Group 2005-09-13