Validation

The "validation" block provides a number of components for validating documents, from within Cocoon, against schemas expressed in different grammars.

In addition to a ValidatingTransformer, which validates documents in a pipeline, and a ValidationReportTransformer, which produces validity reports, the "validation" block also provides a simple framework for accessing validation capabilities from within other components.

Validation Framework: the Validator interface

The main entry point of the validation framework is the org.apache.cocoon.components.validation.Validator interface, which specifies an extremely simple way to obtain handlers able to validate XML documents.

A Validator instance can be obtained using Avalon's normal lookup mechanism:

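import org.apache.avalon.framework.service.ServiceException;
import org.apache.avalon.framework.service.ServiceManager;
import org.apache.avalon.framework.service.Serviceable;
import org.apache.cocoon.components.validation.Validator;

public class MyClass implements Serviceable {
  private ServiceManager serviceManager;

  // Called by the container: keep the ServiceManager for later lookups
  public void service(ServiceManager manager) { this.serviceManager = manager; }

  public void myMethod() throws ServiceException {
    Validator validator = (Validator) this.serviceManager.lookup(Validator.ROLE);
    try { /* ... obtain ValidationHandlers from the validator ... */ }
    finally { this.serviceManager.release(validator); }
  }
}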

Basically, a Validator abstracts access to a schema. It is designed to be independent of the grammar language. In most cases, when the caller does not know the grammar, it should be able to determine the grammar of the supplied schema automatically.

Well-known exceptions to automatic schema detection are grammars not written in parseable XML, such as the RELAX NG Compact syntax or XML DTDs. Writing detection algorithms for those languages would be so complex that it would probably outweigh the benefits of having them.

Once a Validator instance has been obtained, one would normally invoke one of its getValidationHandler(...) methods to obtain handlers that validate a stream of SAX events. Which method to call depends on the parameters available.

The one parameter always required by getValidationHandler(...) is the source of the schema to use for validation, either as a String or as an Excalibur Source instance.

Optionally, one can also specify the grammar of the schema. This forces the Validator to parse the schema using the specified language (for example RELAX NG or the W3C XML Schema language). It should be used whenever the grammar cannot be detected automatically, such as for the non-XML grammars mentioned above.

Finally, a SAX ErrorHandler can optionally be specified when obtaining a ValidationHandler. It will be notified of all inconsistencies found in the validated document. The ValidationHandler will throw SAXExceptions only when the ErrorHandler is configured to do so.
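Here is a minimal sketch of such an ErrorHandler, written only against the standard org.xml.sax API. It illustrates what "configured to do so" can mean in practice. The class name CollectingErrorHandler and its failOnError flag are hypothetical. A handler that records a problem and returns normally lets processing continue. A handler that rethrows the exception stops processing at that point.

```java
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

/**
 * Hypothetical ErrorHandler that records every problem it is notified of,
 * and rethrows validation errors only when failOnError is true.
 */
final class CollectingErrorHandler implements ErrorHandler {
  private final boolean failOnError;
  private final List<String> problems = new ArrayList<>();

  CollectingErrorHandler(boolean failOnError) {
    this.failOnError = failOnError;
  }

  @Override
  public void warning(SAXParseException e) {
    // Warnings are recorded but never stop processing
    problems.add("warning: " + e.getMessage());
  }

  @Override
  public void error(SAXParseException e) throws SAXException {
    problems.add("error: " + e.getMessage());
    if (failOnError) {
      throw e; // rethrowing is what makes processing stop at the first error
    }
  }

  @Override
  public void fatalError(SAXParseException e) throws SAXException {
    problems.add("fatal: " + e.getMessage());
    throw e; // fatal errors always stop processing
  }

  List<String> getProblems() {
    return problems;
  }
}
```

Under these assumptions, passing a lenient instance when obtaining a ValidationHandler would gather a full list of problems. A strict instance would abort at the first invalid construct.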

Configuring the validation framework in cocoon.xconf files

Configuration of the validation framework is extremely easy. If you built the "validation" block together with Cocoon, you should already have something like the following in your build/webapp/WEB-INF/cocoon.xconf file:

<!--+ The shared Validator instance in Cocoon.
    |
    | This defaults to an instance of a "CachedValidator". To disable schema
    | caching add the following attribute to the <validator ... /> element:
    |
    |   class="org.apache.cocoon.components.validation.impl.DefaultValidator"
    +-->

<validator logger="core.validation">
  <schema-parser name="jing" class="org.apache.cocoon.components.validation.jing.JingSchemaParser"/>
  <schema-parser name="jaxp" class="org.apache.cocoon.components.validation.jaxp.JaxpSchemaParser">
    <factory-class>org.apache.xerces.jaxp.validation.XMLSchemaFactory</factory-class>
  </schema-parser>
</validator>

The root <validator/> element declares the shared Validator instance. Schema caching is the default behavior. As the comment in the example above explains, it can be disabled by adding a class attribute that selects the DefaultValidator implementation.

To support multiple grammars, the Validator can be configured with more than one SchemaParser. Each SchemaParser can support more than one grammar. For example, the jaxp parser declared above supports all the grammars supported by your Java Virtual Machine's JAXP library.

This means that, in some cases, multiple providers (SchemaParsers) can support the same grammar. We'll see later how to make sure that a specific SchemaParser is used when needed.
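Which grammars the jaxp parser can handle therefore depends on the JVM. As a rough, JDK-only check, one can ask javax.xml.validation.SchemaFactory for each grammar identifier. SchemaFactory.newInstance(String) throws IllegalArgumentException when no implementation of that language is available. The class JaxpGrammarProbe is hypothetical; this is not how JaxpSchemaParser itself works.

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.validation.SchemaFactory;

/** Hypothetical helper: reports which grammar identifiers the JVM's JAXP library supports. */
final class JaxpGrammarProbe {
  static List<String> supportedGrammars(String... grammars) {
    List<String> supported = new ArrayList<>();
    for (String grammar : grammars) {
      try {
        // Throws IllegalArgumentException if no SchemaFactory supports this language
        SchemaFactory.newInstance(grammar);
        supported.add(grammar);
      } catch (IllegalArgumentException unsupported) {
        // Not available through JAXP in this JVM: skip it
      }
    }
    return supported;
  }
}
```

The JAXP specification requires support for W3C XML Schema, so that identifier should always be reported. Other grammars, such as RELAX NG, typically appear only when an additional JAXP implementation is on the classpath.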

A note on grammars

For grammars written in XML, the grammar name is, wherever possible, the namespace URI of the schema's root element. At the time of writing, the well-known grammars are the following:

Name             | Namespace                                      | Validator interface field name
ISO Schematron   | http://purl.oclc.org/dsdl/schematron           | Validator.GRAMMAR_ISO_SCHEMATRON
RELAX Core       | http://www.xml.gr.jp/xmlns/relaxCore           | Validator.GRAMMAR_RELAX_CORE
RELAX Namespace  | http://www.xml.gr.jp/xmlns/relaxNamespace      | Validator.GRAMMAR_RELAX_NS
RELAX NG         | http://relaxng.org/ns/structure/1.0            | Validator.GRAMMAR_RELAX_NG
Schematron 1.5   | http://www.ascc.net/xml/schematron             | Validator.GRAMMAR_SCHEMATRON
TREX             | http://www.thaiopensource.com/trex             | Validator.GRAMMAR_TREX
XML Schema       | http://www.w3.org/2001/XMLSchema               | Validator.GRAMMAR_XML_SCHEMA
XML DTD          | N/A (identifier: http://www.w3.org/TR/REC-xml) | Validator.GRAMMAR_XML_DTD

This implies that automatic grammar detection normally works by analyzing the namespace of the schema document's root element. Of course, as stated above, this does not work for non-XML schema languages.
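Below is a minimal sketch of namespace-based detection that uses only the JDK's SAX parser. It assumes that the root element's namespace is all that matters. It is not Cocoon's actual detection code, and the class name is hypothetical. Parsing stops as soon as the root element is reported, so the rest of the schema is never read.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical helper: guesses a schema's grammar from its root element's namespace. */
final class RootNamespaceDetector {
  /** Thrown from the handler to stop parsing once the root element is seen. */
  private static final class RootFound extends SAXException {
    final String namespace;
    RootFound(String namespace) { this.namespace = namespace; }
  }

  /** Returns the root namespace URI, or null if the root element has no namespace. */
  static String detectGrammar(String schemaXml) throws Exception {
    SAXParserFactory factory = SAXParserFactory.newInstance();
    factory.setNamespaceAware(true); // needed to receive namespace URIs
    SAXParser parser = factory.newSAXParser();
    try {
      parser.parse(new InputSource(new StringReader(schemaXml)), new DefaultHandler() {
        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
          throw new RootFound(uri); // the first element reported is the root
        }
      });
    } catch (RootFound found) {
      return found.namespace.isEmpty() ? null : found.namespace;
    }
    return null; // only reached if no element was ever reported
  }
}
```

A DTD or a RELAX NG Compact schema is not an XML document, so feeding one to this sketch makes the parse fail with a SAXParseException. That is why the grammar must be given explicitly for those languages.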

To select a specific provider (SchemaParser), the grammar name can be prefixed with an extra identifier. Suppose, for example, that both the JAXP and JING parsers declared in the cocoon.xconf snippet above can handle the RELAX NG grammar. The following two identifiers then select one implementation over the other:

  • jing:http://relaxng.org/ns/structure/1.0 for the JING SchemaParser
  • jaxp:http://relaxng.org/ns/structure/1.0 for the JAXP SchemaParser

In other words, the grammar identifier can be prefixed with the schema parser name declared in cocoon.xconf, separated by a colon.
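Grammar names are usually URIs, and URIs contain a colon of their own (as in "http:"). A prefix therefore cannot be recognized just by looking for the first colon. The sketch below shows one plausible way to interpret such identifiers: the text before the first colon counts as a parser name only when a SchemaParser with that name is actually configured. The class and method names are hypothetical, and this is an assumption, not Cocoon's implementation.

```java
import java.util.Set;

/** Hypothetical helper: splits an optional "parser-name:" prefix off a grammar identifier. */
final class GrammarSelector {
  /** The SchemaParser to use (null means any) and the bare grammar name. */
  static final class Selection {
    final String parserName;
    final String grammar;
    Selection(String parserName, String grammar) {
      this.parserName = parserName;
      this.grammar = grammar;
    }
  }

  static Selection select(String identifier, Set<String> configuredParsers) {
    int colon = identifier.indexOf(':');
    if (colon > 0) {
      String prefix = identifier.substring(0, colon);
      // Accept the prefix only if such a parser exists; otherwise the "http"
      // of an unprefixed URI would be mistaken for a parser name.
      if (configuredParsers.contains(prefix)) {
        return new Selection(prefix, identifier.substring(colon + 1));
      }
    }
    return new Selection(null, identifier);
  }
}
```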

The ValidationHandler interface

The ValidationHandler interface is a simple union of the ContentHandler and LexicalHandler SAX interfaces.

To validate a document, simply send its SAX events to the ValidationHandler. The ErrorHandler specified when the handler was obtained will be notified of all inconsistencies, and may throw SAXExceptions back to the caller.
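Cocoon's own ValidationHandler is not reproduced here. JAXP offers a JDK-only analogue of the same pattern: javax.xml.validation.ValidatorHandler is also a ContentHandler. It validates the SAX events it receives and reports problems to an ErrorHandler. The sketch below compiles a W3C XML Schema and then lets a SAX parser stream a document into the handler. The class name SaxValidationDemo is hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.ValidatorHandler;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;

/** Hypothetical demo: validates a document by streaming its SAX events into a JAXP handler. */
final class SaxValidationDemo {
  static List<String> validate(String xsd, String xml) throws Exception {
    Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
        .newSchema(new StreamSource(new StringReader(xsd)));
    ValidatorHandler handler = schema.newValidatorHandler();

    // Record problems instead of throwing, so every inconsistency is reported
    List<String> problems = new ArrayList<>();
    handler.setErrorHandler(new ErrorHandler() {
      public void warning(SAXParseException e) { problems.add(e.getMessage()); }
      public void error(SAXParseException e) { problems.add(e.getMessage()); }
      public void fatalError(SAXParseException e) { problems.add(e.getMessage()); }
    });

    // Any SAX event source would do; here a namespace-aware parser produces them
    SAXParserFactory factory = SAXParserFactory.newInstance();
    factory.setNamespaceAware(true);
    XMLReader reader = factory.newSAXParser().getXMLReader();
    reader.setContentHandler(handler);
    reader.parse(new InputSource(new StringReader(xml)));
    return problems;
  }
}
```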

In addition, the ValidationHandler provides an extra method called getValidity(). This method exposes a SourceValidity instance associated with the previously parsed schema (and possibly all the subschemas it includes). This is extremely useful when dealing with pipeline caches, because it makes it possible to determine whether the behavior of the ValidationHandler will change as a result of schema changes.
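The real SourceValidity interface comes from Excalibur, and its methods differ from what is shown here (see its JavaDoc). The sketch below is a simplified, hypothetical stand-in that captures the idea. It snapshots the last-modified times of a schema file and its included subschemas. A cache could then keep reusing earlier results for as long as none of those files has changed.

```java
import java.io.File;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical, simplified stand-in for a schema validity object:
 * remembers the last-modified time of each schema file it covers.
 */
final class SchemaTimestampValidity {
  private final Map<File, Long> snapshot = new LinkedHashMap<>();

  SchemaTimestampValidity(List<File> schemaFiles) {
    for (File f : schemaFiles) {
      snapshot.put(f, f.lastModified());
    }
  }

  /** True while none of the recorded schema files has been modified. */
  boolean isStillValid() {
    for (Map.Entry<File, Long> e : snapshot.entrySet()) {
      if (e.getKey().lastModified() != e.getValue()) {
        return false;
      }
    }
    return true;
  }
}
```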

Extensions implementation details

For normal development (extending the validation framework with new grammars) developers should implement the SchemaParser and Schema interfaces. A number of different abstract classes are provided in the org.apache.cocoon.components.validation.impl package.

The sources of the SchemaParser implementations already provided, together with their JavaDoc comments, can be extremely helpful when developing new providers.

Alternatively, one can provide implementations against the JAXP API, and Cocoon can use them directly by configuring the JaxpSchemaParser class. More than one JaxpSchemaParser can be configured for a Validator in the cocoon.xconf file. Just specify a different factory class for each one and, optionally, the grammar languages it provides (see the JavaDoc for more information).

Fields

CocoonBlock: validation