IBM

XML Schema Infoset API Requirements

IBM Working Draft 15 February 2002 19:00 ET

This version:
http://www.research.ibm.com/XML/schema/WD-XML-Schema-Infoset-API-Req.htm
Latest version:
http://www.research.ibm.com/XML/schema/WD-XML-Schema-Infoset-API-Req.htm
Previous version:
None
Editor:
Bob Schloss <rschloss@us.ibm.com>

Abstract

IBM has established these as requirements for some of its own work on schema-related APIs, and we offer them for consideration as potential requirements for APIs to be included in W3C recommendations.

This document lists the design principles, scope, and requirements for an XML Schema Infoset and API. It includes requirements as they relate to development time and runtime software which

It includes requirements concerning

Status of this Document

This document was shared by IBM with the W3C Schema and DOM WGs in February 2002. It represents the views of the IBM XML Schema Infoset API Working Team (Achille Fokoué, Ed Merks, Lisa Martin, Arnaud Le Hors, Elena Litani, Bob Schloss, Chuck Campbell, Noah Mendelsohn).

This is a draft document and may be updated, replaced or obsoleted by other documents at any time.

Please send comments to the editor <rschloss@us.ibm.com> and to IBM's representatives to the W3C Schema Working Group (<lmartin@ca.ibm.com> and <chuckc@us.ibm.com>) and the W3C DOM Working Group (<elitani@ca.ibm.com> and <lehors@us.ibm.com>).


Table of Contents

  1. Introduction
  2. Design Principles and Scope
  3. Requirements
    1. Design
    2. Data Model
    3. Interface
    4. Processing
    5. Algorithms
    6. Coordination
    7. Intellectual Property
  4. Non-Requirements
  5. Desirable Features
  6. Use Cases
  7. References

1. Introduction

The XML Schema Recommendation [XML-schema] specifies

Numerous other specifications are using the XML Schema mechanisms to specify constraints on XML documents used as messages in program interfaces [WSDL], to constrain values that an agent accepts when a user interacts with an XForm [XForms], etc.

This specification provides requirements for a data model and an API to be used by programs such as:

Schema APIs and models have begun to appear from commercial software vendors (such as TIBCO [TIBCO Schema API]) and from open source efforts (such as Castor [Castor Schema]). The W3C Schema WG has been advised that Markup Technologies, a W3C member, is also developing an API to Schema components. These generally support a subset of the operations one might wish to perform on W3C XML Schemas; for example, updates to schema components, and coordinating changes to corresponding schema documents, are often not considered.

A complete API with multiple implementations would be an extremely valuable augmentation to the W3C XML Schema language specifications.

This document uses the term core methods to mean the set of methods or functions which permit construction or examination of W3C XML Schema components. This document uses the term non-core methods to mean the set of methods or functions which make construction, examination, modification, comparison, search of, or diagnostic feedback on, W3C XML Schemas much easier to code, but which, if not implemented, can have their functions accomplished by more extensive use of the core methods.
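The distinction between core and non-core methods can be illustrated with a minimal sketch. All names below (SchemaComponent, CoreSchema, SchemaConvenience, InMemorySchema) are hypothetical and are not taken from any proposed API; the point is only that the convenience method is written entirely in terms of the core methods.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical schema component: a kind (such as "element") plus an optional name. */
final class SchemaComponent {
    final String kind;
    final String name; // null for anonymous components

    SchemaComponent(String kind, String name) {
        this.kind = kind;
        this.name = name;
    }
}

/** Hypothetical core methods: the minimum needed to construct and examine components. */
interface CoreSchema {
    void add(SchemaComponent component);
    List<SchemaComponent> components();
}

/** Non-core convenience method, implemented using only the core methods. */
final class SchemaConvenience {
    private SchemaConvenience() {}

    /** Returns the first component of the given kind and name, or null if there is none. */
    static SchemaComponent findByName(CoreSchema schema, String kind, String name) {
        for (SchemaComponent c : schema.components()) {
            if (c.kind.equals(kind) && name.equals(c.name)) {
                return c;
            }
        }
        return null;
    }
}

/** Trivial in-memory implementation of the core methods, for illustration only. */
final class InMemorySchema implements CoreSchema {
    private final List<SchemaComponent> items = new ArrayList<>();

    @Override
    public void add(SchemaComponent component) {
        items.add(component);
    }

    @Override
    public List<SchemaComponent> components() {
        return new ArrayList<>(items); // defensive copy
    }
}
```

An implementation that omitted findByName would lose convenience but no capability, since a caller could write the same loop against components().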

2. Design Principles and Scope

This section describes high level principles of design and definition of scope. They are an expression of intent/motivation. Fuller motivations of the requirements and non-requirements appear in subsequent sections.

  1. The XML Schema Infoset API specification must describe

    • how to express the API in a commonly used language, such as Java, and
    • how to use a language which can use interfaces expressed in IDL
    to create, examine or modify a schema, including the abstract components of the schema as well as the XML representation of XML schema documents.
  2. Methods should be provided to retrieve or delete schema components either
    • all components of a specific component type
    • by name, for components which have names
  3. The XML Schema Infoset API must describe functions to
    • create its data model by deserializing XML Schema documents,
    • serialize its data model as one or more XML Schema documents, and
    • serialize its data model as an XML document compatible with the portion of the Post Schema Validation Infoset grammar that describes schema components [Serialized Infosets Schema from University of Edinburgh].
  4. The model must represent schemas which are incomplete or in-error.
    • The specification must provide for representation of a schema object which is missing some of the properties that it would be required to have before it can be used in assessment.
    • The specification must provide for representation of references from one schema object to another by name, even though the component defining that name is not in the data model.
    • The specification must provide for representation of a schema object or set of schema objects which have properties such that a violation of a constraint on schema would be recognized if these properties were not changed before assessment.
    • The model must be able to represent cyclic references and self-references
  5. The specification may include method call ordering constraints, specifying that programs using the API must call one method before calling a different method in cases where that makes the API syntax clear and will simplify the development of software to provide the API.
  6. The specification must provide a mechanism for values of minOccurs and maxOccurs and of facets to be read and set in the native type system of the implementing programming language, and not only as Unicode strings.
  7. The specification will not include methods to take a portion of an XML infoset and determine whether a particular set of schema components consider the infoset valid, invalid, or indeterminate without additional schema components.
  8. The specification should define error code values or exception names by which a library implementing the API returns information to the calling program when an illegal state occurs.
  9. The specification should strive to limit optionality in the core methods, and maximize extensibility such that all of the specification can be quickly implemented
  10. A set of schema documents that are deserialized through this API, and then serialized through this API, should result in equivalent XML documents, in terms of their implications for validation (including PSVI information). Information should not be thrown away or compiled in ways that would prevent this.

    Part of the API concerning capability introspection (see below) will allow using programs to request that component ordering, namespace prefixes, IDs, names of identity-constraint definitions, global elementFormDefault and attributeFormDefault information, whitespace and XML comments and entity references all be preserved, and for a given implementation to specify that it is capable, or is not capable, of doing so.

  11. The API should have capability introspection, so that callers can determine whether the implementation they are using has implemented various parts of the specification.
    1. Names of the exceptions detected concerning constraints on schema by a given implementation should be visible through this capability introspection
    2. Exceptions which are detected immediately, and Exceptions which are detected only upon special checking method calls, should be visible through this capability introspection
    3. Non-core named sets of convenience methods
  12. The spec for the API may document non-core sets of convenience methods
  13. The spec will avoid specifying methods which return a message to be shown to a developer, or which cause a message to be sent to standard output. In the exceptional case where this cannot be avoided, the method must be non-Core and must be passed an argument to specify the language to be used for the message.

3. Requirements

Reasons for the given requirement appear after some requirements within brackets: [reason] .

1. Design

  1. The specification should strive to limit optionality in the core methods, and maximize extensibility such that all of the specification can be quickly implemented.
  2. The spec for the API may document non-core sets of convenience methods. [Modularity].
  3. The spec will avoid specifying methods which return a message to be shown to a developer, or which cause a message to be sent to standard output. Any method which returns a message to be shown to a developer, or causes a message to be sent to standard output, must be non-Core and must specify the language to be used for the message. [Support I18N].

2. Data Model

  1. The data model must represent the abstract components of the schema as well as additional information from the XML representation of XML schema documents.
  2. The specification must provide for representation of a schema object which is missing some of the properties that it would be required to have before it can be used in assessment. [This allows the model to be used in schema editors].
  3. The specification must provide for representation of references from one schema object to another by name, even though the component defining that name is not in the data model. [This allows the model to be used in schema editors, and a reference to be created before the definition is created].
  4. The specification must provide for representation of a schema object or set of schema objects which have properties such that a violation of a constraint on schema would be recognized if these properties were not changed before assessment. [This allows the model created in a schema editor to be persisted in an error state].
  5. The model must be able to represent cyclic references and self-references. [This allows the model created in a schema editor to be persisted in an error state].
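The data model requirements above can be sketched as follows. This is a sketch under assumptions, not a proposed design: the names TypeDefinition and EditableSchemaModel are hypothetical, and only one kind of reference (a type's base type) is modelled. References are held by name, so a definition may be absent, and a cycle or self-reference is representable and detectable rather than rejected.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical type definition whose base type is recorded by name, not by object pointer. */
final class TypeDefinition {
    final String name;
    String baseTypeName; // may be null (incomplete) or name a type that is not in the model

    TypeDefinition(String name, String baseTypeName) {
        this.name = name;
        this.baseTypeName = baseTypeName;
    }
}

/** Hypothetical model that tolerates incomplete and in-error schemas. */
final class EditableSchemaModel {
    private final Map<String, TypeDefinition> types = new HashMap<>();

    void put(TypeDefinition type) {
        types.put(type.name, type);
    }

    /** Resolves the base type lazily; returns null when the referenced definition is absent. */
    TypeDefinition resolveBase(TypeDefinition type) {
        return type.baseTypeName == null ? null : types.get(type.baseTypeName);
    }

    /** Reports whether following base-type references from the named type revisits a type. */
    boolean hasDerivationCycle(String name) {
        Set<String> seen = new HashSet<>();
        TypeDefinition current = types.get(name);
        while (current != null) {
            if (!seen.add(current.name)) {
                return true; // already visited: cycle or self-reference
            }
            current = resolveBase(current);
        }
        return false;
    }
}
```

Because the reference is a name, a schema editor can create the reference first and add the definition later, at which point resolveBase starts returning it.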

3. Interface

  1. Methods described in this interface must be usable from
    • programs written in Java and
    • programs written in a language that allows interfaces expressed in IDL
  2. The specification must include method call ordering constraints, specifying that programs using the API must call one method before calling a different method in cases where that makes the API syntax clear and will simplify the development of software to provide the API.
  3. The specification must provide a mechanism for values of minOccurs and maxOccurs and of facets to be read and set in the native type system of the implementing programming language, and not only as Unicode strings.
  4. The specification should define error code values or exception names by which a library implementing the API returns information to the calling program when an illegal state occurs. These error code values and exception names should utilize the error codes already defined by W3C XML Schema specifications when they are available.
  5. The API should have capability introspection, so that callers can determine whether the implementation they are using has implemented various parts of the specification.
    1. Names of the exceptions detected concerning constraints on schema by a given implementation should be visible through this capability introspection
    2. Exceptions which are detected immediately, and Exceptions which are detected only upon special checking method calls, should be visible through this capability introspection
    3. Non-core named sets of convenience methods
  6. Convenience methods should be provided to locate all unresolved references in a schema data model
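Two of the interface requirements above, native-typed occurrence values and capability introspection, might look like the following in Java. This is a sketch only: Occurrence, Capability, and SchemaApiImplementation are hypothetical names, the capability list is illustrative, and the use of int with a sentinel for "unbounded" is an assumption (the schema type is an unbounded non-negative integer).

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical native-typed view of minOccurs and maxOccurs. */
final class Occurrence {
    /** Assumption: a sentinel stands for maxOccurs="unbounded". */
    static final int UNBOUNDED = -1;

    final int min;
    final int max;

    Occurrence(int min, int max) {
        this.min = min;
        this.max = max;
    }

    /** Converts the lexical forms found in a schema document into native values. */
    static Occurrence parse(String minOccurs, String maxOccurs) {
        int max = "unbounded".equals(maxOccurs) ? UNBOUNDED : Integer.parseInt(maxOccurs);
        return new Occurrence(Integer.parseInt(minOccurs), max);
    }

    boolean isUnbounded() {
        return max == UNBOUNDED;
    }

    /** An inconsistent range is still representable; it is reported, not rejected. */
    boolean isConsistent() {
        return min >= 0 && (isUnbounded() || max >= min);
    }
}

/** Hypothetical optional features an implementation may report. */
enum Capability {
    PRESERVE_WHITESPACE,
    PRESERVE_COMMENTS,
    IMMEDIATE_CONSTRAINT_CHECKS,
    FIND_UNRESOLVED_REFERENCES
}

/** Hypothetical entry point that lets callers ask what is implemented. */
final class SchemaApiImplementation {
    private final Set<Capability> supported = EnumSet.noneOf(Capability.class);

    SchemaApiImplementation(Capability... capabilities) {
        for (Capability c : capabilities) {
            supported.add(c);
        }
    }

    boolean supports(Capability capability) {
        return supported.contains(capability);
    }
}
```

Occurrence deliberately does not throw when maxOccurs is less than minOccurs, in keeping with the data model requirement that in-error schemas be representable.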

4. Processing

  1. Methods should be provided to retrieve or delete schema components either
    • all components of a specific component type
    • by name, for components which have names
  2. The XML Schema Infoset API must describe functions to
    • create its data model by deserializing XML Schema documents,
    • serialize its data model as one or more XML Schema documents, and
    • serialize its data model as an XML document compatible with the portion of the Post Schema Validation Infoset grammar that describes schema components.
  3. A set of schema documents that are deserialized through this API, and then serialized through this API, should result in Equivalent XML documents, in terms of their implications for validation (including PSVI information). Information should not be thrown away or compiled in ways that would prevent this.

    Part of the API concerning capability introspection will allow using programs to request that "round-tripping" a schema be as precise as possible -- near-identical. Near-Identical need not include attribute quotation systems used or attribute order on the same line. Near-Identical should include whitespace and element start tags, element end tags, attributes, and element content appearing on the same lines as they did in the original document. Near-Identical includes elements inside <appinfo> and <documentation> appearing on the same lines as they did in the original document.

    Equivalent includes the presence of schema language for which components are normally not created (such as minOccurs=maxOccurs=0).

  4. When a schema document is deserialized, the API provides facilities for specifying one of the following behaviors:
    • deserialize any schema document referenced in an <include>, <import>, or <redefine> element information item.
    • do not deserialize schema documents referred to
  5. When a schema document is deserialized, the API provides facilities for specifying one of the following behaviors:
    • connect unresolved symbolic references from existing models into the components of the model built from this schema
    • do not connect unresolved symbolic references
  6. When a schema data model is destroyed, and models of other schema documents remain which had connections to components in the destroyed model, the API provides facilities for specifying one of the following behaviors:
    • restore unresolved symbolic references into existing models, where connections into the components of the model built from this schema had been
    •  
  7. When a schema data model has its targetNamespace changed, and models of other schema documents remain which had connections to components in the model using the old target namespace, the API provides facilities for specifying one of the following behaviors:
    • restore unresolved symbolic references, using the old target namespace, into existing models, where connections into the components of the model built from this schema had been
    •  
  8. When a schema data model is destroyed, and models of other non-schema documents (WSDL, XForm) remain which had connections to components in the destroyed model, the API provides facilities for signaling an event, allowing the non-schema data model to be adjusted.

    [Question: should the event only be signalled if there actually were connections, or if there could have been connections?]

  9. When a schema data model has its targetNamespace changed, and models of other non-schema documents (WSDL, XForm) remain which had connections to components in the model using the old target namespace, the API provides facilities for signaling an event, allowing the non-schema data model to be adjusted.
  10. The API should provide facilities so that a component in the data model for one schema document can be copied into the data model for a different schema document.
  11. The API could provide facilities so that when schema validity is tested more than once, the output from the n+1st test indicates which errors had been present and remain present, which errors have become present since the last test, and which errors that were present are no longer present. [Would make it easier to maintain an error messages list without recreating the entire list after each test. Only useful for very large schemas with large numbers of errors.]
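The incremental error reporting described in the last item reduces to set differences between two validity tests. A minimal sketch follows, assuming each error can be identified by a stable string key; the class name ErrorDelta and that keying scheme are assumptions, not part of these requirements.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical result of comparing the errors from two successive validity tests. */
final class ErrorDelta {
    final Set<String> remaining; // present before and still present
    final Set<String> added;     // present now but not before
    final Set<String> resolved;  // present before but no longer present

    private ErrorDelta(Set<String> remaining, Set<String> added, Set<String> resolved) {
        this.remaining = remaining;
        this.added = added;
        this.resolved = resolved;
    }

    /** Computes the three groups without modifying either input set. */
    static ErrorDelta between(Set<String> previous, Set<String> current) {
        Set<String> remaining = new LinkedHashSet<>(current);
        remaining.retainAll(previous);

        Set<String> added = new LinkedHashSet<>(current);
        added.removeAll(previous);

        Set<String> resolved = new LinkedHashSet<>(previous);
        resolved.removeAll(current);

        return new ErrorDelta(remaining, added, resolved);
    }
}
```

A tool maintaining an error list would then append the added entries and drop the resolved ones, leaving the remaining entries untouched.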

5. Algorithms

(no requirements)

6. Coordination

The XML Schema Infoset API specification should meet the requirements of (so as to support) or work with the following applications:

To ensure the above requirements are adequately addressed, the XML Schema Infoset API specification must be reviewed by a designated member of the following communities:

7. Intellectual Property

  1. The specification should be free of encumbering technologies: requiring no licensing fees for implementation and use.

    Members of the XML Schema Infoset API Working Group or Task Force and any other Working Group constituted within the XML Schema Infoset API Activity (if the W3C should charter an activity) are expected to disclose any intellectual property they have in this area. Any intellectual property essential to implement specifications produced by this work (Activity) must be at least available for licensing on a royalty-free basis. At the suggestion of the Working Group or Task Force, and at the discretion of the Director of W3C, technologies may be accepted if they are licensed on reasonable, non-discriminatory terms.

    Note: This is not a statement as to whether IBM believes a Schema Infoset API is better created in a RAND or RF framework.

4. Non-Requirements

  1. An implementation of the methods in this API should be usable without any need to invoke the DOM Level 3 Abstract Schema methods, described in DOM Level 3 Abstract Schemas.

5. Desirable Features

  1. The specification should use UML Logical Class Diagrams to express properties of data model instance objects and associations between objects.
  2. For the convenience of developers, the specification should use Javadoc to describe the API as seen by Java software

6. Use Cases

This list is not an exhaustive list of use cases. It represents a range of operations that should be doable using the methods provided by the Schema Infoset API.

Schema examination

  1. determine if the schema has a rule for validating a global element called 'foo' - return true if it does, false if it doesn't
  2. determine if 'group1' and 'group2' have the same effective content model, although they have different group names
  3. determine if components referenced via <include>, <redefine>, or <import> could not be accessed, and specify a non-access reason code (e.g. no connectivity, resolved URI does not exist (404), resolved URI does not reference an XML document, resolved URI does not reference a W3C XML Schema, self-reference, other).
  4. retrieve all foreign attributes located on the first element information item which declares a global element
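For comparison, the first examination use case can be approximated today without a schema API by reading the schema document with the standard JAXP DOM parser. The sketch below is not the Schema Infoset API; the class and method names are hypothetical, and it inspects a single schema document only, ignoring <include>, <import>, <redefine>, and the target namespace.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class GlobalElementLookup {
    static final String XSD_NS = "http://www.w3.org/2001/XMLSchema";

    private GlobalElementLookup() {}

    /** Returns true if the schema document declares a global element with the given name. */
    static boolean declaresGlobalElement(String schemaText, String name) {
        Document doc;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required to compare namespace and local name
            DocumentBuilder builder = factory.newDocumentBuilder();
            doc = builder.parse(new InputSource(new StringReader(schemaText)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not a well-formed schema document", e);
        }
        // Global declarations are direct children of the <schema> element.
        Element root = doc.getDocumentElement();
        for (Node n = root.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE
                    && XSD_NS.equals(n.getNamespaceURI())
                    && "element".equals(n.getLocalName())
                    && name.equals(((Element) n).getAttribute("name"))) {
                return true;
            }
        }
        return false;
    }
}
```

Working at the document level like this sees only the XML representation; a component-level API would also account for declarations contributed by other schema documents.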

Schema reduction

  1. given a schema and an element name, remove any schema components that are not needed to validate that element and its content (including child elements and their content) and its attributes
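Schema reduction amounts to a reachability computation over component references. The following sketch assumes the references have already been extracted into a map from component name to the names it refers to; that map, and the names SchemaReducer and neededComponents, are hypothetical simplifications (real components live in several symbol spaces and namespaces).

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class SchemaReducer {
    private SchemaReducer() {}

    /**
     * Returns the root plus every component reachable from it by following references.
     * Components outside the returned set are candidates for removal.
     */
    static Set<String> neededComponents(Map<String, List<String>> references, String root) {
        Set<String> needed = new LinkedHashSet<>();
        Deque<String> pending = new ArrayDeque<>();
        pending.push(root);
        while (!pending.isEmpty()) {
            String current = pending.pop();
            if (!needed.add(current)) {
                continue; // already visited; this also terminates cyclic references
            }
            List<String> targets = references.get(current);
            if (targets == null) {
                targets = Collections.emptyList(); // unresolved or leaf component
            }
            for (String target : targets) {
                pending.push(target);
            }
        }
        return needed;
    }
}
```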

Schema modification

  1. take an existing schema, update its targetNamespace and its version
  2. take an existing schema, find any simpleType which is derived from (http://www.w3.org/2001/XMLSchema):string and which has the enumerated value "topSecret" and modify it to have the enumerated value "superSecret" instead
  3. take an existing schema, find all global groups which are composed with <all>, and modify them to be composed with <sequence>
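The enumeration rewrite in the second modification use case can be approximated at the document level with the standard DOM API. This sketch is not the Schema Infoset API: EnumerationRewriter is a hypothetical name, and the code only handles simple types that restrict xs:string directly, whereas a component-level API would also follow indirect derivations.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class EnumerationRewriter {
    static final String XSD_NS = "http://www.w3.org/2001/XMLSchema";

    private EnumerationRewriter() {}

    /** Parses schema document text with namespace support enabled. */
    static Document parse(String schemaText) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(schemaText)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not a well-formed schema document", e);
        }
    }

    /** Rewrites matching enumeration facets in place and returns how many were changed. */
    static int replaceEnumValue(Document doc, String oldValue, String newValue) {
        NodeList facets = doc.getElementsByTagNameNS(XSD_NS, "enumeration");
        int changed = 0;
        for (int i = 0; i < facets.getLength(); i++) {
            Element facet = (Element) facets.item(i);
            Node parent = facet.getParentNode();
            if (parent instanceof Element
                    && isRestrictionOfString((Element) parent)
                    && oldValue.equals(facet.getAttribute("value"))) {
                facet.setAttribute("value", newValue);
                changed++;
            }
        }
        return changed;
    }

    /** True for an xs:restriction whose base QName resolves to xs:string. */
    private static boolean isRestrictionOfString(Element e) {
        if (!XSD_NS.equals(e.getNamespaceURI()) || !"restriction".equals(e.getLocalName())) {
            return false;
        }
        String base = e.getAttribute("base");
        int colon = base.indexOf(':');
        String prefix = colon < 0 ? null : base.substring(0, colon);
        String local = colon < 0 ? base : base.substring(colon + 1);
        // A null prefix asks for the default namespace in scope.
        return "string".equals(local) && XSD_NS.equals(e.lookupNamespaceURI(prefix));
    }
}
```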

Schema creation

  1. create a schema that contains a global element declaration with an anonymous simpleType that is a restriction of one of the schema built-in types, specifying facets, and serialize the schema to a document using the XML Representation of Schema
  2. create a schema that contains a global element declaration with an anonymous complexType
  3. create a schema that contains a foreign attribute on a schema <element> element information item

PSVI examination

  1. For an element in an instance document that was assessed with a schema, return the element abstract component which was used in the assessment
  2. For an element in an instance document that was assessed with a schema, return a binary indication of whether the element's content was valid or invalid
  3. For an attribute in an instance document that was assessed with a schema, return the attribute abstract component which was used in the assessment
  4. For an attribute in an instance document that was assessed with a schema, return a binary indication of whether the attribute's content was valid or invalid

7. References

Castor Schema
Castor XML Schema Support from Exolab.org, Arnaud Blandin, 0.9.3.9, December 12, 2001
DOM
Document Object Model Core, Level 3. Arnaud Le Hors. W3C Working Draft. January 2001.
http://www.w3.org/TR/DOM-Level-3-Core/core.html
DOM-AS
Document Object Model Abstract Schemas. B. Chang, J. Kesselman, R. Rahman. February 14, 2002
http://www.w3.org/2001/02/WD-DOM-Level-3-ASLS-SaintValentin/abstract-schemas.html
InfoSet
XML Information Set, W3C Proposed Recommendation. John Cowan. August 2001.
http://www.w3.org/TR/2001/PR-xml-infoset-20010810/
List
XML Schema Infoset API List (an unmoderated and unchartered public list which should be created).
Serialized Infosets Schema from University of Edinburgh
A schema for serialized infosets, Richard Tobin and Henry Thompson, LTG, University of Edinburgh, May 2001
TIBCO Schema API
TIBCO Schema API for Java, December 2001
WSDL
Web Services Definition Language (WSDL) 1.1. E. Christensen, F. Curbera, G. Meredith, S. Weerawarana. March 25, 2001.
http://www.w3.org/TR/wsdl
XForms
XForms 1.0. M. Dubinko, J. Dietl, R. Merrick, D. Raggett, T. V. Raman, L. Bucsay Welsh. August 28, 2001.
http://www.w3.org/TR/xforms
XML
Extensible Markup Language (XML) 1.0 Recommendation. T. Bray, J. Paoli, C. M. Sperberg-McQueen. February 1998.
http://www.w3.org/TR/1998/REC-xml-19980210
XML-ns
Namespaces in XML Recommendation. T. Bray, D. Hollander, A. Layman. January 1999.
http://www.w3.org/TR/1999/REC-xml-names-19990114/
XML-schema
XML Schema Part 1: Structures W3C Recommendation. D. Beech, M. Maloney, N. Mendelsohn, H. Thompson. May 2001.
http://www.w3.org/TR/2001/REC-xmlschema-1-20010502/
XML Schema Part 2: Datatypes W3C Recommendation. P. Biron, A. Malhotra. May 2001.
http://www.w3.org/TR/2001/REC-xmlschema-2-20010502/
XSet
Full Fidelity Information Set Representation. Jonathan Borden. XML-Dev
http://lists.xml.org/archives/xml-dev/200008/msg00239.html
URI
RFC2396. Uniform Resource Identifiers (URI): Generic Syntax. T. Berners-Lee, R. Fielding, L. Masinter. August 1998
http://www.ietf.org/rfc/rfc2396.txt