XJ

Technical Summary

The integration of XML into an object-oriented language requires the specification of how XML documents are represented in the language's value space, how the type system is extended to represent XML types, and what expressions are available to operate on the XML values. A common characteristic of previous proposals for the integration of XML is that each has defined its own type system for XML. At one extreme, JAXB, a tool for generating Java classes from an XML Schema, embeds XML types completely into the Java type system. Since the XML type system and the Java type system are dissimilar, JAXB is forced to make arbitrary decisions about the representation of XML types. For example, the Java type system cannot represent the ordering or the tagged union constraints of XML Schema.
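The tagged union limitation can be made concrete with a small sketch. It assumes a hypothetical schema in which a contact element contains a choice of either an email or a phone child. The class below is hand-written for illustration and is not actual JAXB output; the names ContactSketch, Contact, and satisfiesChoice are invented here.

```java
public class ContactSketch {
    /**
     * Naive mapping of the hypothetical choice: one nullable field per
     * alternative. Nothing in the Java type says "exactly one of these".
     */
    static final class Contact {
        String email; // set when the email alternative is chosen
        String phone; // set when the phone alternative is chosen
    }

    /**
     * Run-time check for the "exactly one alternative" rule that the schema
     * states but the Java type above cannot express.
     */
    static boolean satisfiesChoice(Contact c) {
        return (c.email != null) ^ (c.phone != null);
    }

    public static void main(String[] args) {
        Contact both = new Contact();
        both.email = "a@example.org";
        both.phone = "555-0100";
        // This compiles, although no schema-valid document has both children.
        System.out.println(satisfiesChoice(both)); // prints "false"
    }
}
```

The constraint moves from the type checker to a hand-written run-time check, which is the kind of arbitrary representation decision described above.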

The design of a custom type system and expression syntax for XML in a programming language has two significant drawbacks:

  • Considerable effort is being expended in developing tools for XML standards. A non-standard type system and expression syntax cannot take advantage of these tools, and they place an undue burden on the XML programmer.
  • One of the primary uses of XML is in application integration. When a language implements a custom type system for XML, it is often unclear how XML values in the language interact with heterogeneous processes (processes not developed using the language). For example, some languages implement type systems built on regular expression types. Regular expression types are richer and more expressive than XML Schema. When an application serializes an XML value, it must also publish the XML Schema against which the value is valid. For a regular expression type, it is unclear what the corresponding XML Schema would be. In general, the language must define a mapping that approximates the regular expression type by an XML Schema. Understanding both the language's type system and its corresponding mapping to XML standards again places an undue burden on an XML programmer.

The goal of the design of XJ is to be consistent with existing XML and Java specifications. The basis for the XJ type system for XML values is XML Schema. The expression syntax and semantics for navigating XML values are consistent with the XPath specification. It is our contention that this consistency with existing standards is more intuitive and practical for a programmer.

XJ integrates XML into Java by introducing a new subclass of java.lang.Object, com.ibm.xj.XMLObject, that serves as the root class for all XML types. Each element and atomic type declaration in an XML Schema is mapped into an XML class that is a subclass of com.ibm.xj.XMLObject. XPath expressions can be applied to all subclasses of com.ibm.xj.XMLObject. Other than the hierarchy under com.ibm.xj.XMLObject, Java is mostly unchanged; the only change is the definition of how XML primitive types may be coerced into Java primitive values and vice versa. To ensure consistency with external XML standards, XJ XML values have the following properties:

  • The serialization of an XJ XML value that is an instance of an XML class is valid (using XML Schema validation rules) according to the XML Schema declaration corresponding to the XML class.
  • An XML tree that is valid according to an XML Schema declaration is converted into an instance of the XML class corresponding to the declaration.
  • The semantics of XPath in XJ is consistent with that of XQuery. If the execution of an XPath expression on an XML document results in a sequence of nodes in the document, then the execution of the XPath expression on an instance of an XJ XML class corresponding to the root of the XML document results in a sequence of instances of XML classes corresponding to those nodes. Moreover, the coercions on XML types defined in XQuery, such as automatic unboxing of elements, are available in XJ.
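The XPath behavior described in the last property can be illustrated with the standard javax.xml.xpath API that ships with the JDK. This is a minimal sketch of the baseline that XJ is said to stay consistent with: it is plain Java over a DOM tree, not XJ syntax, and it uses XPath 1.0 rather than XQuery. The sample document and the names XPathSketch, ORDER, select, and number are assumptions made for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathSketch {
    // Hypothetical sample document, not taken from the XJ distribution.
    static final String ORDER =
        "<order><item><price>10.50</price></item>"
      + "<item><price>4.25</price></item></order>";

    /** Parses an XML string into a DOM document. */
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
    }

    /** Evaluates an XPath expression and returns the text of each selected node. */
    static List<String> select(Document doc, String expr) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(expr, doc, XPathConstants.NODESET);
        List<String> out = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            out.add(nodes.item(i).getTextContent());
        }
        return out;
    }

    /** Evaluates an XPath expression whose result is a number. */
    static double number(Document doc, String expr) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return (Double) xpath.evaluate(expr, doc, XPathConstants.NUMBER);
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(ORDER);
        // The path selects a sequence of price nodes from the document.
        System.out.println(select(doc, "/order/item/price"));
        // sum() converts each selected element to a number before adding.
        System.out.println(number(doc, "sum(/order/item/price)"));
    }
}
```

With the DOM API the result is an untyped NodeList that must be cast and unpacked by hand. According to the properties above, XJ instead yields a sequence of instances of the XML classes corresponding to the selected nodes, and the conversion performed here by sum() is loosely analogous to the automatic unboxing of elements that XJ takes from XQuery.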