IBM has established these as requirements for some of its own work on schema-related APIs, and we offer them
for consideration
as potential requirements for APIs to be included in W3C recommendations.
This document lists the design principles, scope, and requirements for an
XML Schema Infoset and API.
It includes requirements as they relate to development time and runtime software
which
Status of this Document
This document was shared by IBM with the W3C Schema and DOM WGs in February 2002.
It represents the views of
the IBM XML Schema Infoset API Working
Team (Achille Fokoué, Ed Merks, Lisa Martin, Arnaud Le Hors, Elena Litani, Bob Schloss,
Chuck Campbell, Noah Mendelsohn).
This is a draft document and may be updated, replaced or obsoleted
by other documents at any time.
Please send comments to the editor <rschloss@us.ibm.com>
and to IBM's representatives to the W3C Schema Working Group
(<lmartin@ca.ibm.com> and <chuckc@us.ibm.com>)
and the W3C DOM Working Group (<elitani@ca.ibm.com> and <lehors@us.ibm.com>).
The XML Schema Recommendation [XML-schema] specifies
- the syntax of a class of resources called XML schema documents,
- the semantics of those documents, in terms of classifying any XML element (including the document element of an XML instance document) as valid or invalid according to a schema,
- a representation of the constraints of a schema as a set of abstract components with properties,
- optional schema-related markup in instance documents, and
- a process of assessment for XML content, which can augment the infoset of that content with post-schema-validation infoset information.
Numerous other specifications are using the XML Schema mechanisms to
specify constraints on XML documents used as messages in program interfaces
[WSDL],
to constrain values that an agent accepts when a user interacts with an XForm
[XForms], etc.
This specification provides
requirements for a data model and an API to be used by programs such as:
- editors of XML instance documents which provide guidance based on a schema
- tools that examine pairs of schemas
  - to compare them
  - to generate code that can map instance documents conforming to the first schema into instance documents conforming to the second schema
- mapping tools that support non-XML data sources at one end and schema-described XML at the other
- tools to visualize, create, modify and extend XML Schemas.
Schema APIs and models have begun to appear from commercial software vendors
(such as TIBCO [TIBCO Schema API]) and from open
source efforts (such as Castor [Castor Schema]).
The W3C Schema WG has been advised that Markup Technologies, a W3C member, is also
developing an API to Schema components.
These generally support a subset of the operations
one might wish to perform on W3C XML Schemas; for example, updates to schema components,
and coordinating changes to corresponding schema documents, are often not considered.
A complete API with multiple implementations would be an extremely valuable augmentation to the
W3C XML Schema language specifications.
This document uses the term core methods to mean the set of methods or functions which
permit construction or examination of W3C XML Schema components.
This document uses the term non-core methods to mean the set of methods or functions which
make construction, examination, modification, comparison, search of, or diagnostic feedback on,
W3C XML Schemas much easier to code,
but which, if not implemented, can have their functions accomplished by more extensive use of the
core methods.
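The distinction can be illustrated with a minimal Java sketch. All names in it (CoreSchemaModel, SchemaComponent, findByKindAndName) are hypothetical and are not taken from any existing API. The only point is that the non-core helper is written entirely in terms of the core methods, so an implementation that omits it loses convenience but not function.

```java
import java.util.ArrayList;
import java.util.List;

final class CoreVsNonCoreSketch {

    /** Hypothetical component kinds; a real API would cover every kind of schema component. */
    enum Kind { ELEMENT_DECLARATION, ATTRIBUTE_DECLARATION, TYPE_DEFINITION }

    /** Hypothetical schema component: a kind and an optional name (null if anonymous). */
    static final class SchemaComponent {
        final Kind kind;
        final String name;
        SchemaComponent(Kind kind, String name) { this.kind = kind; this.name = name; }
    }

    /** Hypothetical core methods: construct and examine components, nothing more. */
    static final class CoreSchemaModel {
        private final List<SchemaComponent> components = new ArrayList<>();

        SchemaComponent createComponent(Kind kind, String name) {
            SchemaComponent c = new SchemaComponent(kind, name);
            components.add(c);
            return c;
        }

        List<SchemaComponent> getComponents() {
            return new ArrayList<>(components); // defensive copy
        }
    }

    /** Hypothetical non-core convenience method, built only from the core methods above. */
    static List<SchemaComponent> findByKindAndName(CoreSchemaModel model, Kind kind, String name) {
        List<SchemaComponent> matches = new ArrayList<>();
        for (SchemaComponent c : model.getComponents()) {
            if (c.kind == kind && name.equals(c.name)) {
                matches.add(c);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        CoreSchemaModel model = new CoreSchemaModel();
        model.createComponent(Kind.ELEMENT_DECLARATION, "foo");
        model.createComponent(Kind.TYPE_DEFINITION, "foo");
        System.out.println(findByKindAndName(model, Kind.ELEMENT_DECLARATION, "foo").size());
    }
}
```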
This section describes high level principles of design and definition of
scope. They are an expression of intent/motivation. Fuller motivations of
the requirements and non-requirements appear in subsequent sections.
- The XML Schema Infoset API specification must describe how to express the API in a commonly used language, such as Java, and how to use a language which can use interfaces expressed in IDL, to create, examine or modify a schema, including the abstract components of the schema as well as the XML representation of XML schema documents.
- Methods should be provided to retrieve or delete schema components, either
  - all components of a specific component type, or
  - by name, for components which have names.
- The XML Schema Infoset API must describe functions to
  - create its data model by deserializing XML Schema documents,
  - serialize its data model as one or more XML Schema documents, and
  - serialize its data model as an XML document compatible with the portion of the Post Schema Validation Infoset grammar that describes schema components [Serialized Infosets Schema from University of Edinburgh].
- The model must represent schemas which are incomplete or in-error.
- The specification must provide for representation of a schema object
which is missing some of the properties that it would be required to have
before it can be used in assessment.
- The specification must provide for representation of references from
one schema object to another by name, even though the component defining
that name is not in the data model.
- The specification must provide for representation of a schema object
or set of schema objects which have properties such that a violation of a constraint on
schema would be recognized if these properties were not changed before
assessment.
- The model must be able to represent cyclic references and self-references.
- The specification may include method call ordering constraints,
specifying that programs using the API must call one method before calling
a different method in cases where that makes the API syntax clear and will
simplify the development of software to provide the API.
- The specification must provide a mechanism for values of minOccurs and
maxOccurs and of facets to be read and set in the native type system of the
implementing programming language, and not only as Unicode strings.
- The specification will not include methods to take a portion of an XML infoset
and determine whether a particular set of schema components consider the infoset
valid, invalid, or indeterminate without additional schema components.
- The specification should define error code values or exception names
by which a library implementing the API returns information to the calling
program when an illegal state occurs.
- The specification should strive to limit optionality in the core methods, and maximize
extensibility such that all of the specification can be quickly implemented.
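As a rough illustration of the principles on incomplete schemas, references by name, and natively typed occurrence values, the following Java sketch shows one possible shape for such a model. Every class and field name here is an assumption made for the example, and the occurrence bounds are attached to the element declaration only for brevity (the abstract schema model places them on particles).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.namespace.QName;

final class IncompleteModelSketch {

    /** Hypothetical reference by name; the named target may not be in the model yet. */
    static final class TypeRef {
        final QName name;
        TypeRef(QName name) { this.name = name; }
    }

    /** Hypothetical element declaration; typeRef may stay null while the schema is being edited. */
    static final class ElementDecl {
        static final int UNBOUNDED = -1;  // stands in for maxOccurs="unbounded"
        final QName name;
        TypeRef typeRef;
        int minOccurs = 1;                // native int rather than a Unicode string
        int maxOccurs = 1;
        ElementDecl(QName name) { this.name = name; }
    }

    /** Hypothetical model that accepts incomplete content and dangling references. */
    static final class SchemaModel {
        final Map<QName, ElementDecl> elements = new LinkedHashMap<>();
        final Map<QName, Object> types = new LinkedHashMap<>(); // type definitions are opaque in this sketch

        /** Lists references whose target name is not defined in this model. */
        List<TypeRef> unresolvedReferences() {
            List<TypeRef> out = new ArrayList<>();
            for (ElementDecl e : elements.values()) {
                if (e.typeRef != null && !types.containsKey(e.typeRef.name)) {
                    out.add(e.typeRef);
                }
            }
            return out;
        }
    }

    public static void main(String[] args) {
        SchemaModel model = new SchemaModel();
        QName typeName = new QName("urn:example", "OrderType");

        ElementDecl order = new ElementDecl(new QName("urn:example", "order"));
        order.typeRef = new TypeRef(typeName);   // reference created before its definition
        order.maxOccurs = ElementDecl.UNBOUNDED;
        model.elements.put(order.name, order);
        System.out.println("unresolved: " + model.unresolvedReferences().size());

        model.types.put(typeName, new Object()); // the definition arrives later
        System.out.println("unresolved: " + model.unresolvedReferences().size());
    }
}
```

The model never rejects the dangling reference; it only reports it on request, which is the behavior a schema editor would need.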
A set of schema documents that are deserialized through this API,
and then serialized through this API, should result in equivalent XML documents,
in terms of their implications for validation (including PSVI information).
Information should not be thrown away or compiled in ways that would prevent this.
The part of the API concerning capability introspection (see below) will allow
using programs to request that component ordering, namespace prefixes, IDs,
names of identity-constraint definitions,
global elementFormDefault and attributeFormDefault information, whitespace,
XML comments and entity references all be preserved, and will allow a given implementation
to specify whether or not it is capable of doing so.
- The API should have capability introspection, so that callers can determine
whether the implementation they are using has implemented various parts of
the specification.
- Names of the exceptions detected concerning constraints on schema
by a given implementation should be visible through this capability introspection
- Exceptions which are detected immediately, and Exceptions which are detected only upon special checking
method calls, should be visible through this capability introspection
- Non-core named sets of convenience methods
- The spec for the API may document non-core sets of convenience methods
- The spec will avoid specifying
methods which return a message to be shown to a developer,
or which cause a message to be sent to standard output. In the exceptional case
where this cannot be avoided, the method
must be non-core and must be passed an argument to specify the language to be used for the message.
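Capability introspection could take many forms. The Java sketch below shows one simple possibility, in which an implementation advertises a fixed set of features and a caller checks for a feature before relying on it. The Feature constants and the Capabilities interface are invented for this example and do not come from any existing API.

```java
import java.util.Collections;
import java.util.EnumSet;
import java.util.Set;

final class CapabilitySketch {

    /** Hypothetical feature names, loosely following the preservation options discussed above. */
    enum Feature {
        PRESERVE_COMPONENT_ORDER,
        PRESERVE_NAMESPACE_PREFIXES,
        PRESERVE_WHITESPACE_AND_COMMENTS,
        IMMEDIATE_CONSTRAINT_CHECKING
    }

    /** Hypothetical introspection interface that an implementation would expose. */
    interface Capabilities {
        boolean supports(Feature feature);
        Set<Feature> supportedFeatures();
    }

    /** Toy implementation that advertises a fixed set of features. */
    static final class FixedCapabilities implements Capabilities {
        private final Set<Feature> supported;

        FixedCapabilities(EnumSet<Feature> supported) {
            this.supported = EnumSet.copyOf(supported);
        }

        @Override
        public boolean supports(Feature feature) {
            return supported.contains(feature);
        }

        @Override
        public Set<Feature> supportedFeatures() {
            return Collections.unmodifiableSet(supported);
        }
    }

    public static void main(String[] args) {
        Capabilities caps = new FixedCapabilities(
                EnumSet.of(Feature.PRESERVE_COMPONENT_ORDER, Feature.PRESERVE_NAMESPACE_PREFIXES));

        // A caller decides whether precise round-tripping can be requested at all.
        if (caps.supports(Feature.PRESERVE_WHITESPACE_AND_COMMENTS)) {
            System.out.println("whitespace and comments will be preserved");
        } else {
            System.out.println("whitespace and comments may be lost on serialization");
        }
    }
}
```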
Reasons for the given requirement appear after some requirements within brackets: [reason].
- The specification should strive to limit optionality in the core methods, and maximize
extensibility such that all of the specification can be quickly implemented.
- The spec for the API may document non-core sets of convenience methods.
[Modularity].
- The spec will avoid specifying
methods which return a message to be shown to a developer,
or which cause a message to be sent to standard output.
Any method which returns a message to be shown to a developer,
or causes a message to be sent to standard output,
must be non-Core and must specify the language to be used for the message.
[Support I18N].
- The data model must represent the abstract components of the schema as
well as additional information from the XML representation of XML schema documents.
- The specification must provide for representation of a schema object
which is missing some of the properties that it would be required to have
before it can be used in assessment.
[This allows the model to be used in schema editors].
- The specification must provide for representation of references from
one schema object to another by name, even though the component defining
that name is not in the data model.
[This allows the model to be used in schema editors, and a reference to be created
before the definition is created].
- The specification must provide for representation of a schema object
or set of schema objects which have properties such that a violation of a constraint on
schema would be recognized if these properties were not changed before
assessment.
[This allows the model created in a schema editor to be persisted in an error state].
- The model must be able to represent cyclic references and self-references.
[This allows the model created in a schema editor to be persisted in an error state].
- Methods described in this interface must be usable from
- programs written in Java and
- programs written in a language that allows interfaces expressed in IDL
- The specification must include method call ordering constraints,
specifying that programs using the API must call one method before calling
a different method in cases where that makes the API syntax clear and will
simplify the development of software to provide the API.
- The specification must provide a mechanism for values of minOccurs and
maxOccurs and of facets to be read and set in the native type system of the
implementing programming language, and not only as Unicode strings.
- The specification should define error code values or exception names
by which a library implementing the API returns information to the calling
program when an illegal state occurs. These error code values and exception names
should utilize the error codes already defined by W3C XML Schema specifications
when they are available.
- The API should have capability introspection, so that callers can determine
whether the implementation they are using has implemented various parts of
the specification.
- Names of the exceptions detected concerning constraints on schema
by a given implementation should be visible through this capability introspection
- Exceptions which are detected immediately, and Exceptions which are detected only upon special checking
method calls, should be visible through this capability introspection
- Non-core named sets of convenience methods
- Convenience methods should be provided to locate all unresolved references
in a schema data model
- Methods should be provided to retrieve or delete schema components, either
  - all components of a specific component type, or
  - by name, for components which have names.
- The XML Schema Infoset API must describe functions to
  - create its data model by deserializing XML Schema documents,
  - serialize its data model as one or more XML Schema documents, and
  - serialize its data model as an XML document compatible with the portion of the Post Schema Validation Infoset grammar that describes schema components.
- A set of schema documents that are deserialized through this API,
and then serialized through this API, should result in Equivalent XML documents,
in terms of their implications for validation (including PSVI information).
Information should not be thrown away or compiled in ways that would prevent this.
Part of the API concerning capability introspection will allow
using programs to request that "round-tripping" a schema be as
precise as possible -- near-identical.
Near-Identical need not include the quotation marks used around attribute values or the
order of attributes on the same line.
Near-Identical should include whitespace and
element start tags, element end tags, attributes, and element content
appearing on the same lines as they did in the original document.
Near-Identical includes elements inside <appinfo> and <documentation>
appearing on the same lines as they did in the original document.
Equivalent includes the presence of schema language for which components
are normally not created (such as minOccurs=maxOccurs=0).
- When a schema document is deserialized, the API provides facilities for
specifying one of the following behaviors:
  - deserialize any schema document referenced in an <include>, <import> or <redefine> element information item
  - do not deserialize schema documents referred to
- When a schema document is deserialized, the API provides facilities for specifying
one of the following behaviors:
  - connect unresolved symbolic references from existing models into the components of the model built from this schema
  - do not connect unresolved symbolic references
- When a schema data model is destroyed, and models of other schema documents
remain which had connections to components in the destroyed model,
the API provides facilities for specifying
one of the following behaviors:
  - restore unresolved symbolic references into existing models, where connections into the components of the model built from this schema had been
  -
- When a schema data model has its targetNamespace changed, and models of other schema documents
remain which had connections to components in the model using the old target namespace,
the API provides facilities for specifying
one of the following behaviors:
  - restore unresolved symbolic references, using the old target namespace, into existing models, where connections into the components of the model built from this schema had been
  -
- When a schema data model is destroyed, and models of other non-schema documents (WSDL, XForm)
remain which had connections to components in the destroyed model,
the API provides facilities for signaling an event, allowing the non-schema data model to be
adjusted.
[Question: should the event only be signalled if there actually were connections,
or if there could have been connections?]
- When a schema data model has its targetNamespace changed, and models of other non-schema documents (WSDL, XForm)
remain which had connections to components in the model using the old target namespace,
the API provides facilities for signaling an event, allowing the non-schema data model to be
adjusted.
- The API should provide facilities so that
a component in the data model for one schema document can be copied
into the data model for a different schema document.
- The API could provide facilities so that when schema validity is tested more than once,
the output from the (n+1)st test indicates which errors had been present and
remain present, which errors have become present since the last test, and which errors that
were present are no longer present.
[Would make it easier to maintain an error messages list without recreating the entire list after
each test. Only useful for very large schemas with large numbers of errors.]
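The comparison of successive validity tests described in the last requirement amounts to three set operations. The Java sketch below assumes, purely for illustration, that each error can be reduced to a stable string identifier; how an implementation would actually identify "the same error" across two tests is left open by this document.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

final class ValidityDiffSketch {

    /** Result of comparing two validity tests; error identifiers are plain strings in this sketch. */
    static final class Diff {
        final Set<String> stillPresent;
        final Set<String> newlyPresent;
        final Set<String> noLongerPresent;

        Diff(Set<String> stillPresent, Set<String> newlyPresent, Set<String> noLongerPresent) {
            this.stillPresent = stillPresent;
            this.newlyPresent = newlyPresent;
            this.noLongerPresent = noLongerPresent;
        }
    }

    /** Compares the errors of the previous test with those of the current test. */
    static Diff compare(Set<String> previous, Set<String> current) {
        Set<String> still = new LinkedHashSet<>(current);
        still.retainAll(previous);            // intersection: present before and now

        Set<String> added = new LinkedHashSet<>(current);
        added.removeAll(previous);            // present now, absent before

        Set<String> gone = new LinkedHashSet<>(previous);
        gone.removeAll(current);              // present before, absent now

        return new Diff(still, added, gone);
    }

    public static void main(String[] args) {
        Set<String> first = new LinkedHashSet<>(Arrays.asList("error-A", "error-B"));
        Set<String> second = new LinkedHashSet<>(Arrays.asList("error-B", "error-C"));
        Diff d = compare(first, second);
        System.out.println("still present:     " + d.stillPresent);
        System.out.println("newly present:     " + d.newlyPresent);
        System.out.println("no longer present: " + d.noLongerPresent);
    }
}
```

A caller maintaining an error list could then remove the entries in noLongerPresent and append those in newlyPresent, without rebuilding the list.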
(no requirements)
The XML Schema Infoset API specification should meet the requirements of (so as to
support) or work with the following applications:
- (no W3C applications being developed as of February 2002).
To ensure the above requirements are adequately addressed, the XML Schema Infoset API
specification must be reviewed by a designated member of the following
communities:
- The specification should be free of encumbering technologies: requiring no
licensing fees for implementation and use.
Members of the XML Schema Infoset API Working Group or Task Force
and any other Working Group
constituted within the XML Schema Infoset API Activity (if the W3C should
charter an activity)
are expected to disclose any
intellectual property they have in this area. Any intellectual property
essential to implement specifications produced by this work (Activity) must be at
least available for licensing on a royalty-free basis. At the suggestion of
the Working Group or Task Force,
and at the discretion of the Director of W3C, technologies
may be accepted if they are licensed on reasonable, non-discriminatory terms.
- An implementation of the methods in this API should be usable
without any need to invoke the DOM Level 3 Abstract Schema methods, described in
DOM Level 3 Abstract Schemas [DOM-AS].
- The specification should use UML Logical Class Diagrams to express
properties of data model instance objects and associations between objects.
- For the convenience of developers, the specification should use Javadoc
to describe the API as seen by Java software
This list is not an exhaustive list of use cases. It represents a range of operations that
should be doable using the methods provided by the Schema Infoset API.
- determine if the schema has a rule for validating a global element called 'foo' - return true if it does, false if it doesn't
- determine if 'group1' and 'group2' have the same effective content model, although they have different group names
- determine if components referenced via <include>, <redefine> or <import> could
not be accessed, and specify a non-access reason code (e.g. no connectivity, resolved URI does not
exist (404), resolved URI does not reference an XML document, resolved URI does not reference
a W3C XML Schema, self-reference, other).
- retrieve all foreign attributes located on the first element information item which declares
a global element
- given a schema and an element name, remove any schema components that are not needed to validate that element and its content (including child elements and their content) and its attributes
- take an existing schema, update its targetNamespace and its version
- take an existing schema, find any simpleType which is derived from (http://www.w3.org/2001/XMLSchema):string and
which has the enumerated value "topSecret"
and modify it to have the enumerated value "superSecret" instead
- take an existing schema, find all global groups which are composed with <all>, and
modify them to be composed with <sequence>
- create a schema that contains a global element declaration with an anonymous simpleType that
is a restriction of one of the schema built-in types, specifying facets, and serialize the schema
to a document using the XML Representation of Schema
- create a schema that contains a global element declaration with an anonymous complexType
- create a schema that contains a foreign attribute on a schema <element> element information item
- For an element in an instance document that was assessed with a schema, return the element
abstract component which was used in the assessment
- For an element in an instance document that was assessed with a schema, return a binary indication
of whether the element's content was valid or invalid
- For an attribute in an instance document that was assessed with a schema, return the attribute
abstract component which was used in the assessment
- For an attribute in an instance document that was assessed with a schema, return a binary
indication of whether the attribute's content was valid or invalid
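To make two of these use cases concrete, the Java sketch below approximates them with the standard JDK DOM parser, working only on the XML representation of a single schema document. It is not the API this document calls for. It does not build abstract components, does not follow <include>, <import> or <redefine>, and the enumeration rewrite does not verify that the enclosing simpleType derives from string, since that would require resolving the type hierarchy. The class name, method names and sample schema are assumptions made for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class UseCaseSketch {
    static final String XSD_NS = "http://www.w3.org/2001/XMLSchema";

    /** Small sample schema document used only by this sketch. */
    static final String SAMPLE =
            "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
          + "<xs:element name='foo' type='Level'/>"
          + "<xs:simpleType name='Level'><xs:restriction base='xs:string'>"
          + "<xs:enumeration value='topSecret'/><xs:enumeration value='public'/>"
          + "</xs:restriction></xs:simpleType></xs:schema>";

    /** Parses a schema document with namespace awareness; wraps checked exceptions. */
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse schema document", e);
        }
    }

    /** True if the schema document has a top-level xs:element with the given name. */
    static boolean hasGlobalElement(Document schemaDoc, String localName) {
        Element root = schemaDoc.getDocumentElement();
        for (Node n = root.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE
                    && XSD_NS.equals(n.getNamespaceURI())
                    && "element".equals(n.getLocalName())
                    && localName.equals(((Element) n).getAttribute("name"))) {
                return true;
            }
        }
        return false;
    }

    /** Rewrites every xs:enumeration whose value equals oldValue; returns how many were changed. */
    static int replaceEnumeration(Document schemaDoc, String oldValue, String newValue) {
        NodeList enums = schemaDoc.getElementsByTagNameNS(XSD_NS, "enumeration");
        int changed = 0;
        for (int i = 0; i < enums.getLength(); i++) {
            Element e = (Element) enums.item(i);
            if (oldValue.equals(e.getAttribute("value"))) {
                e.setAttribute("value", newValue);
                changed++;
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        Document doc = parse(SAMPLE);
        System.out.println("has global element foo: " + hasGlobalElement(doc, "foo"));
        System.out.println("enumerations changed: " + replaceEnumeration(doc, "topSecret", "superSecret"));
    }
}
```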
- Castor Schema
- Castor XML Schema Support from Exolab.org,
Arnaud Blandin, 0.9.3.9, December 12, 2001
- DOM
- Document Object Model Core, Level 3. Arnaud Le Hors. W3C Working Draft. January 2001.
- http://www.w3.org/TR/DOM-Level-3-Core/core.html
- DOM-AS
- Document Object Model Abstract Schemas. B. Chang, J. Kesselman, R. Rahman. February 14, 2002
- http://www.w3.org/2001/02/WD-DOM-Level-3-ASLS-SaintValentin/abstract-schemas.html
- InfoSet
- XML Information Set, W3C Proposed Recommendation. John Cowan. August 2001.
- http://www.w3.org/TR/2001/PR-xml-infoset-20010810/
- List
- XML Schema Infoset API List (an unmoderated and unchartered public list which should be created).
- Serialized Infosets Schema from University of Edinburgh
- A schema for serialized infosets,
Richard Tobin and Henry Thompson, LTG, University of Edinburgh, May 2001
- TIBCO Schema API
- TIBCO Schema API for Java,
December 2001
- WSDL
- Web Services Description Language (WSDL) 1.1. E. Christensen, F. Curbera, G. Meredith, S. Weerawarana. March 15, 2001.
- http://www.w3.org/TR/wsdl
- XForms
- XForms 1.0. M. Dubinko, J. Dietl, R. Merrick, D. Raggett, T. V. Raman, L. Bucsay Welsh. August 28, 2001.
- http://www.w3.org/TR/xforms
- XML
- Extensible Markup Language (XML) 1.0 Recommendation. T. Bray, J. Paoli, C. M. Sperberg-McQueen. February 1998.
- http://www.w3.org/TR/1998/REC-xml-19980210
- XML-ns
- Namespaces in XML Recommendation. T. Bray, D. Hollander, A. Layman. January 1999.
- http://www.w3.org/TR/1999/REC-xml-names-19990114/
- XML-schema
- XML Schema Part 1: Structures W3C Recommendation. D. Beech, M. Maloney, N. Mendelsohn, H. Thompson. May 2001.
- http://www.w3.org/TR/2001/REC-xmlschema-1-20010502/
- XML Schema Part 2: Datatypes W3C Recommendation. P. Biron, A. Malhotra. May 2001.
- http://www.w3.org/TR/2001/REC-xmlschema-2-20010502/
- XSet
- Full Fidelity Information Set Representation. Jonathan Borden. XML-Dev
- http://lists.xml.org/archives/xml-dev/200008/msg00239.html
- URI
- RFC2396. Uniform Resource Identifiers (URI): Generic Syntax. T. Berners-Lee, R. Fielding, L. Masinter. August 1998.
- http://www.ietf.org/rfc/rfc2396.txt