This chapter describes the optional DOM Level 3 Content
Model (CM) feature. This module provides a
representation for XML content models, e.g., DTDs and XML Schemas,
together with operations on the content models, and how such
information within the content models could be applied to XML
documents used in both the document-editing and CM-editing worlds.
It also provides additional tests for well-formedness of XML
documents, including Namespace well-formedness. A DOM application
can use the hasFeature method of the DOMImplementation interface
to determine whether a given DOM supports these capabilities or not.
One feature string
for the CM-editing interfaces listed in this section is "CM-EDIT"
and another feature string for document-editing interfaces is
"CM-DOC".
This chapter interacts strongly with the Load and Save chapter, which is also under development in DOM Level 3. Not only will that code serialize/deserialize content models, but it may also wind up defining its well-formedness and validity checks in terms of what is defined in this chapter. In addition, the CM and Load/Save functional areas will share a common error-reporting mechanism allowing user-registered error callbacks. Note that this may not imply that the parser actually calls the DOM's validation code -- it may be able to achieve better performance via its own -- but the appearance to the user should probably be "as if" the DOM has been asked to validate the document, and parsers should probably be able to validate newly loaded documents in terms of a previously loaded DOM CM.
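As a rough illustration of the feature test named in the opening paragraph, the sketch below asks a DOM implementation about the two feature strings "CM-EDIT" and "CM-DOC". It uses Python's standard xml.dom.minidom only as a convenient DOMImplementation; the version string "3.0" and the helper name supported_cm_features are assumptions made for this example, and minidom does not implement the CM feature, so both answers are expected to be negative.

```python
from xml.dom.minidom import getDOMImplementation


def supported_cm_features(impl, version="3.0"):
    """Report which of the two CM feature strings an implementation claims.

    The version string is an assumption for this sketch; the draft text
    only names the feature strings themselves.
    """
    features = ("CM-EDIT", "CM-DOC")
    return {name: bool(impl.hasFeature(name, version)) for name in features}


impl = getDOMImplementation()
report = supported_cm_features(impl)
print(report)
```

An application would typically branch on this report and fall back to plain Core DOM editing when neither feature is present.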
Finally, this chapter will have separate sections to address the needs of the document-editing and CM-editing worlds, along with a section that details overlapping areas such as validation. This keeps the document-editing world's focus (editing documents and using the information in the CM) distinct from the CM-editing world's focus (defining and manipulating that information).
In the October 9, 1997 DOM requirements document, the following appeared: "There will be a way to determine the presence of a DTD. There will be a way to add, remove, and change declarations in the underlying DTD (if available). There will be a way to test conformance of all or part of the given document against a DTD (if available)." In later discussions, the following was added, "There will be a way to query element/attribute (and maybe other) declarations in the underlying DTD (if available)," supplementing the primitive support for these in Level 1.
That work was deferred past Level 2, in the hope that XML Schemas would be addressed as well. It is anticipated that lowest common denominator general APIs generated in this chapter can support both DTDs and XML Schemas, and other XML content models down the road.
The kinds of information that a Content Model must make available are mostly self-evident from the definitions of Infoset, DTDs, and XML Schemas. Note, however, that some kinds of information on which the DOM already relies, e.g., default values for attributes, will finally be given a visible representation here.
The content model referenced in these use cases/requirements is an abstraction and does not refer solely to DTDs or XML Schemas.
For the CM-editing and document-editing worlds, the following use cases and requirements are common to both and could be labeled as the "Validation and Other Common Functionality" section:
Use Cases:
Requirements:
Specific to the CM-editing world, the following are use cases and requirements and could be labeled as the "CM-editing" section:
Use Cases:
Requirements:
Specific to the document-editing world, the following are use cases and requirements and could be labeled as the "Document-editing" section:
Use Cases:
Requirements:
General Issues:
QName, e.g., foo:bar, whereas the latter will report its namespace and local name, e.g., {http://my.namespace}bar. We have added the isNamespaceAware attribute to the generic CM object to help applications determine which of these fields are important, but we are still analyzing this challenge.
A list of the proposed Content Model data structures and functions follows, starting with the data structures and "CM-editing" methods.
CMModel is an abstract object that could map to a DTD, an XML Schema, a database schema, etc. It is a generalized content model object that has both an internal and an external subset. The internal subset would always exist, even if empty, with the external subset (if present) being represented as a link to one or more CMExternalModels. It is possible, however, that none of these CMExternalModels are active.
interface CMModel : CMNode {
  readonly attribute boolean isNamespaceAware;
  readonly attribute ElementDeclaration rootElementDecl;
  DOMString getLocation();
  nsElement getCMNamespace();
  CMNamedNodeMap getCMNodes();
  boolean removeNode(in CMNode node);
  boolean insertBefore(in CMNode newNode, in CMNode refNode);
  boolean validate();
};
isNamespaceAware of type boolean, readonly
  ... QNames.
rootElementDecl of type ElementDeclaration, readonly
getCMNamespace
  ... CMModel.
  Return value: Namespace of ...
getCMNodes
getLocation
  Return value: This method returns a DOMString defining the absolute location from which this document is retrieved, including the document name.
insertBefore
removeNode
validate
  Return value: Is the CM valid?
CMExternalModel is an abstract object that could map to a DTD, an XML Schema, a database schema, etc. It is a generalized content model object that is not bound to a particular XML document.
interface CMExternalModel : CMModel { };
CMNode is analogous to a Node in the Core DOM, e.g., an element declaration. This can exist for both CMExternalModel (include/ignore must be handled here) and CMModel. It should handle the following:
interface CommentsPIsDeclaration {
  attribute ProcessingInstruction pis;
  attribute Comment comments;
};
interface ConditionalDeclaration {
  attribute boolean includeIgnore;
};
Opaque.
interface CMNode {
  const unsigned short ELEMENT_DECLARATION = 1;
  const unsigned short ATTRIBUTE_DECLARATION = 2;
  const unsigned short CM_NOTATION_DECLARATION = 3;
  const unsigned short ENTITY_DECLARATION = 4;
  const unsigned short CM_CHILDREN = 5;
  const unsigned short CM_MODEL = 6;
  const unsigned short CM_EXTERNALMODEL = 7;
  readonly attribute unsigned short cmNodeType;
  CMNode cloneCM();
  CMNode cloneExternalCM();
};
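ELEMENT_DECLARATION: ElementDeclaration.
ATTRIBUTE_DECLARATION: AttributeDeclaration.
CM_NOTATION_DECLARATION: CMNotationDeclaration.
ENTITY_DECLARATION: EntityDeclaration.
CM_CHILDREN: CMChildren.
CM_MODEL: CMModel.
CM_EXTERNALMODEL: CMExternalModel.
cmNodeType of type unsigned short, readonly
cloneCM
cloneExternalCM
  ... CMExternalModel. It is possible that a document would not refer to the CMNode returned.
  Return value: Cloned ...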
CMNodeList is the CM analogue to NodeList; the document order is meaningful, as opposed to CMNamedNodeMap.
interface CMNodeList { };
CMNamedNodeMap is the CM analogue to NamedNodeMap. The order is not meaningful.
interface CMNamedNodeMap { };
The primitive datatypes supported currently are: string, boolean, float, double, decimal.
interface CMDataType {
  const short STRING_DATATYPE = 1;
  const short BOOLEAN_DATATYPE = 2;
  const short FLOAT_DATATYPE = 3;
  const short DOUBLE_DATATYPE = 4;
  const short LONG_DATATYPE = 5;
  const short INT_DATATYPE = 6;
  const short SHORT_DATATYPE = 7;
  const short BYTE_DATATYPE = 8;
  attribute int lowValue;
  attribute int highValue;
  short getPrimitiveType();
};
string data type as defined in XML Schema Datatypes.
boolean data type as defined in XML Schema Datatypes.
float data type as defined in XML Schema Datatypes.
double data type as defined in XML Schema Datatypes.
integer data type as defined in XML Schema Datatypes.
getPrimitiveType
  Return value: code representing the primitive type of the attached data item.
The element name along with the content specification in the context of a CMNode.
interface ElementDeclaration {
  int getContentType();
  CMChildren getCMChildren();
  CMNamedNodeMap getCMAttributes();
  CMNamedNodeMap getCMGrandChildren();
};
getCMAttributes
  CMNamedNodeMap containing AttributeDeclarations for all the attributes that can appear on this type of element.
  Return value: Attributes list for this ...
getCMChildren
  Return value: Content model of element.
getCMGrandChildren
  CMNamedNodeMap containing ElementDeclarations for all the Elements that can appear as children of this type of element. Note that which ones can actually appear, and in what order, is defined by the CMChildren.
  Return value: Children list for this ...
getContentType
  Return value: Content type constant.
An element in the context of a CMNode.
interface CMChildren {
  attribute DOMString listOperator;
  attribute CMDataType elementType;
  attribute int multiplicity;
  attribute CMNamedNodeMap subModels;
  readonly attribute boolean isPCDataOnly;
};
elementType of type CMDataType
isPCDataOnly of type boolean, readonly
listOperator of type DOMString
multiplicity of type int
subModels of type CMNamedNodeMap
  ... CMNodes in which the element can be defined.
An attribute in the context of a CMNode.
interface AttributeDeclaration {
  const short NO_VALUE_CONSTRAINT = 0;
  const short DEFAULT_VALUE_CONSTRAINT = 1;
  const short FIXED_VALUE_CONSTRAINT = 2;
  readonly attribute DOMString attrName;
  attribute CMDataType attrType;
  attribute DOMString attributeValue;
  attribute DOMString enumAttr;
  attribute CMNodeList ownerElement;
  attribute short constraintType;
};
attrName of type DOMString, readonly
attrType of type CMDataType
attributeValue of type DOMString
constraintType of type short
enumAttr of type DOMString
ownerElement of type CMNodeList
As in current DOM.
interface EntityDeclaration { };
This interface represents a notation declaration.
interface CMNotationDeclaration {
  attribute DOMString strSystemIdentifier;
  attribute DOMString strPublicIdentifier;
};
strPublicIdentifier of type DOMString
strSystemIdentifier of type DOMString
This section contains "Validation and Other" methods common to both the document-editing and CM-editing worlds (includes Document, DOMImplementation, and DOMErrorHandler methods).
The setErrorHandler method is defined on the Document interface.
interface Document {
  void setErrorHandler(in DOMErrorHandler handler);
};
setErrorHandler
  handler of type DOMErrorHandler
This interface extends the Document interface with additional methods for both document and CM editing.
interface DocumentCM : Document {
  int numCMs();
  CMModel getInternalCM();
  CMExternalModel * getCMs();
  CMModel getActiveCM();
  void addCM(in CMModel cm);
  void removeCM(in CMModel cm);
  boolean activateCM(in CMModel cm);
};
activateCM
  ... CMModel active. Note that if a user wants to activate one CM to get default attribute values and then activate another to do validation, a user can do that; however, only one CM is active at a time.
  cm of type CMModel: ... CMModel points to a list of CMExternalModels; with this call, only the specified CM will be active.
  Return value: True if the ...
addCM
  ... CMModel with a document. Can be invoked multiple times to result in a list of CMExternalModels. Note, however, that only one internal CMModel is associated with the document, and that only one of the possible list of CMExternalModels is active at any one time.
  cm of type CMModel
getActiveCM
  ... CMExternalModel for a document.
getCMs
  ... CMExternalModels associated with a document from the CMModel. This list arises when addCM() is invoked.
  Return value: A list of ...
getInternalCM
numCMs
  ... CMExternalModels associated with the document. Only one CMModel can be associated with the document, but it may point to a list of CMExternalModels.
  Return value: Non-negative number of external CM objects.
removeCM
  ... CMExternalModel. Can be invoked multiple times to remove a number of these in the list of CMExternalModels.
  cm of type CMModel
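The bookkeeping described for DocumentCM (one internal CM, a list of external CMs built up by addCM, and at most one active CM) can be sketched as follows. This is only an illustration under stated assumptions: the class DocumentCMSketch and its snake_case method names are hypothetical, CMs are represented by plain strings, and two behaviours the draft does not spell out are chosen arbitrarily here (activating a CM that was never added returns False, and removing the active CM leaves no CM active).

```python
class DocumentCMSketch:
    """Hypothetical stand-in for DocumentCM; CMs are plain strings here."""

    def __init__(self, internal_cm):
        self._internal = internal_cm  # exactly one internal CM per document
        self._externals = []          # external CMs, in the order they were added
        self._active = None           # at most one CM is active at a time

    def add_cm(self, cm):
        # Repeated calls build up the list of external CMs.
        if cm not in self._externals:
            self._externals.append(cm)

    def remove_cm(self, cm):
        if cm in self._externals:
            self._externals.remove(cm)
            if self._active == cm:
                self._active = None   # assumption: removal deactivates

    def num_cms(self):
        return len(self._externals)

    def get_cms(self):
        return list(self._externals)

    def get_internal_cm(self):
        return self._internal

    def activate_cm(self, cm):
        """Make cm the only active CM; False if it was never added (assumption)."""
        if cm not in self._externals:
            return False
        self._active = cm
        return True

    def get_active_cm(self):
        return self._active


doc = DocumentCMSketch(internal_cm="internal-subset")
doc.add_cm("defaults.dtd")
doc.add_cm("strict.xsd")
doc.activate_cm("defaults.dtd")  # e.g. to pick up default attribute values
doc.activate_cm("strict.xsd")    # then switch CMs before validating
```

The two consecutive activate_cm calls mirror the scenario in the text: one CM is activated for defaults, another for validation, and only the most recently activated one remains active.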
This interface extends the DOMImplementation interface with additional methods.
interface DOMImplementationCM : DOMImplementation {
  CMModel createCM();
  CMExternalModel createExternalCM();
};
createCM
  Return value: A NULL return indicates failure.
createExternalCM
  Return value: A NULL return indicates failure.
This section contains "Document-editing" methods (includes Node, Element, Text and Document methods).
This interface extends the Node interface with additional methods for guided document editing.
interface NodeCM : Node {
  boolean canInsertBefore(in Node newChild, in Node refChild) raises(DOMException);
  boolean canRemoveChild(in Node oldChild) raises(DOMException);
  boolean canReplaceChild(in Node newChild, in Node oldChild) raises(DOMException);
  boolean canAppendChild(in Node newChild) raises(DOMException);
  boolean isValid();
};
canAppendChild
  ... AppendChild.
  newChild of type Node: Node to be appended.
  Return value: Success or failure.
  Exceptions: DOMException.
canInsertBefore
  ... Node::InsertBefore operation would make this document invalid with respect to the currently active CM. ISSUE: Describe "valid" when referring to partially completed documents.
  newChild of type Node: Node to be inserted.
  refChild of type Node: ... Node.
  Return value: A boolean that is true if the ...
  Exceptions: DOMException.
canRemoveChild
  ... RemoveChild.
  oldChild of type Node: Node to be removed.
  Return value: Success or failure.
  Exceptions: DOMException.
canReplaceChild
  ... ReplaceChild.
  newChild of type Node: ... Node.
  oldChild of type Node: Node to be replaced.
  Return value: Success or failure.
  Exceptions: DOMException.
isValid
  Return value: True if the node is valid in the current context, false if not.
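To make the "can" queries of NodeCM more concrete, here is a minimal sketch of how canInsertBefore and canAppendChild might be answered without modifying the tree. Everything CM-related in it is an assumption for illustration: the CONTENT_MODELS table stands in for the active CM, each content model is reduced to a regular expression over child element names, and the function names are hypothetical. Python's xml.dom.minidom is used only to supply a DOM tree.

```python
import re
from xml.dom.minidom import parseString

# Hypothetical stand-in for the active CM: element name -> regular expression
# over the space-terminated sequence of child element names.
CONTENT_MODELS = {
    "list": re.compile(r"(title )?(item )+"),
}


def child_names(parent):
    """Names of the element children of parent, in document order."""
    return [c.tagName for c in parent.childNodes if c.nodeType == c.ELEMENT_NODE]


def sequence_is_valid(parent_name, names):
    model = CONTENT_MODELS.get(parent_name)
    if model is None:
        return True  # assumption: undeclared elements are unconstrained
    return model.fullmatch("".join(name + " " for name in names)) is not None


def can_insert_before(parent, new_name, ref_index):
    """Would inserting an element named new_name before child ref_index keep parent valid?"""
    names = child_names(parent)
    names.insert(ref_index, new_name)  # works on a copy; the tree is untouched
    return sequence_is_valid(parent.tagName, names)


def can_append_child(parent, new_name):
    return can_insert_before(parent, new_name, len(child_names(parent)))


root = parseString("<list><item/><item/></list>").documentElement
title_first = can_insert_before(root, "title", 0)
title_last = can_append_child(root, "title")
```

Because the checks operate on a copied list of names, an editor can ask them repeatedly while building a menu of permitted insertions, and only call the real insertBefore once the user has chosen.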
This interface extends the Element interface with additional methods for guided document editing.
interface ElementCM : Element {
  int contentType();
  ElementDeclaration getElementDeclaration() raises(DOMException);
  boolean canSetAttribute(in DOMString attrname, in DOMString attrval);
  boolean canSetAttributeNode(in Node node);
  boolean canSetAttributeNodeNS(in Node node, in DOMString namespaceURI, in DOMString localName);
  boolean canSetAttributeNS(in DOMString attrname, in DOMString attrval, in DOMString namespaceURI, in DOMString localName);
};
canSetAttribute
  attrname of type DOMString
  attrval of type DOMString
  Return value: Success or failure.
canSetAttributeNS
  attrname of type DOMString
  attrval of type DOMString
  namespaceURI of type DOMString: namespaceURI of namespace.
  localName of type DOMString: localName of namespace.
  Return value: Success or failure.
canSetAttributeNode
  node of type Node: Node in which the attribute can possibly be set.
  Return value: Success or failure.
canSetAttributeNodeNS
  node of type Node: Node in which to set the namespace.
  namespaceURI of type DOMString: namespaceURI of namespace.
  localName of type DOMString: localName of namespace.
  Return value: Success or failure.
contentType
  Return value: Constant for mixed, empty, any, etc.
getElementDeclaration
  Return value: ElementDeclaration object.
  Exceptions: Raised if no DTD is present.
This interface extends the CharacterData interface with additional methods for document editing.
interface CharacterDataCM : Text {
  boolean isWhitespaceOnly();
  boolean canSetData(in unsigned long offset, in DOMString arg) raises(DOMException);
  boolean canAppendData(in DOMString arg) raises(DOMException);
  boolean canReplaceData(in unsigned long offset, in unsigned long count, in DOMString arg) raises(DOMException);
  boolean canInsertData(in unsigned long offset, in DOMString arg) raises(DOMException);
  boolean canDeleteData(in unsigned long offset, in DOMString arg) raises(DOMException);
};
canAppendData
  arg of type DOMString
  Return value: Success or failure.
  Exceptions: DOMException.
canDeleteData
  offset of type unsigned long
  arg of type DOMString
  Return value: Success or failure.
  Exceptions: DOMException.
canInsertData
  offset of type unsigned long
  arg of type DOMString
  Return value: Success or failure.
  Exceptions: DOMException.
canReplaceData
  offset of type unsigned long
  count of type unsigned long
  arg of type DOMString
  Return value: Success or failure.
  Exceptions: DOMException.
canSetData
  offset of type unsigned long
  arg of type DOMString
  Return value: Success or failure.
  Exceptions: DOMException.
isWhitespaceOnly
  Return value: True if the content is only whitespace; false for non-whitespace if it is a text node in element content.
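The isWhitespaceOnly test can be approximated on an ordinary DOM text node as shown below. This is a sketch, not the draft's definition: the function name is_whitespace_only is hypothetical, whitespace is taken to mean the four characters of the XML 1.0 S production (space, tab, carriage return, line feed), and the sketch does not consult a CM to decide whether the node sits in element content.

```python
from xml.dom.minidom import parseString

# The characters matched by the S production of XML 1.0.
XML_WHITESPACE = " \t\r\n"


def is_whitespace_only(text_node):
    """True if every character of the node's data is XML whitespace."""
    return all(ch in XML_WHITESPACE for ch in text_node.data)


root = parseString("<a>\n  <b>text</b>\n</a>").documentElement
indent = root.firstChild                                 # the "\n  " before <b>
content = root.getElementsByTagName("b")[0].firstChild  # the "text" node
```

Note that str.strip() is deliberately avoided, since it also strips characters (such as form feed or non-breaking space) that XML does not treat as whitespace.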
This interface extends the DocumentType interface with additional methods for document editing.
interface DocumentTypeCM : DocumentType {
  boolean isElementDefined(in DOMString elemTypeName);
  boolean isElementDefinedNS(in DOMString elemTypeName, in DOMString namespaceURI, in DOMString localName);
  boolean isAttributeDefined(in DOMString elemTypeName, in DOMString attrName);
  boolean isAttributeDefinedNS(in DOMString elemTypeName, in DOMString attrName, in DOMString namespaceURI, in DOMString localName);
  boolean isEntityDefined(in DOMString entName);
};
isAttributeDefined
  elemTypeName of type DOMString
  attrName of type DOMString
  Return value: Success or failure.
isAttributeDefinedNS
  elemTypeName of type DOMString
  attrName of type DOMString
  namespaceURI of type DOMString: namespaceURI of namespace.
  localName of type DOMString: localName of namespace.
  Return value: Success or failure.
isElementDefined
  elemTypeName of type DOMString
  Return value: Success or failure.
isElementDefinedNS
  elemTypeName of type DOMString
  namespaceURI of type DOMString: namespaceURI of namespace.
  localName of type DOMString: localName of namespace.
  Return value: Success or failure.
isEntityDefined
  entName of type DOMString
  Return value: Success or failure.
This interface extends Attr to provide guided editing of an XML document.
interface AttributeCM {
  AttributeDeclaration getAttributeDeclaration();
  CMNotationDeclaration getNotation() raises(DOMException);
};
getAttributeDeclaration
  Return value: The attribute declaration corresponding to this attribute.
getNotation
  Return value: Returns the notation declaration for this attribute if the type is of notation type, null otherwise.
  Exceptions: DOMException.
This section contains DOM error handling interfaces.
Basic interface for DOM error handlers. If an application needs to implement customized error handling for DOM such as CM or Load/Save, it must implement this interface and then register an instance using the setErrorHandler method. All errors and warnings will then be reported through this interface. Application writers can override the methods in a subclass to take user-specified actions.
interface DOMErrorHandler {
  void warning(in DOMLocator where, in DOMString how, in DOMString why) raises(DOMSystemException);
  void fatalError(in DOMLocator where, in DOMString how, in DOMString why) raises(DOMSystemException);
  void error(in DOMLocator where, in DOMString how, in DOMString why) raises(DOMSystemException);
};
error
  where of type DOMLocator
  how of type DOMString
  why of type DOMString
  Exceptions: A subclass of DOMException.
fatalError
  where of type DOMLocator
  how of type DOMString
  why of type DOMString
  Exceptions: A subclass of DOMException.
warning
  where of type DOMLocator
  how of type DOMString
  why of type DOMString
  Exceptions: A subclass of DOMException.
This interface provides document location information and is similar to a SAX locator object.
interface DOMLocator {
  int getColumnNumber();
  int getLineNumber();
  DOMString getPublicID();
  DOMString getSystemID();
  Node getNode();
};
getColumnNumber
  Return value: The column number, or -1 if none is available.
getLineNumber
  Return value: The line number, or -1 if none is available.
getNode
  Return value: The Node, or null if none is available.
getPublicID
  Return value: A string containing the public identifier, or null if none is available.
getSystemID
  Return value: A string containing the system identifier, or null if none is available.
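The error-reporting pattern described by DOMErrorHandler and DOMLocator could look roughly like this in application code. The sketch is illustrative only: Locator, CollectingErrorHandler, and FatalDOMError are hypothetical names, the -1 and None defaults follow the "none is available" conventions in the text, and the choice to raise on fatal errors (rather than on all errors) is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Locator:
    """Hypothetical value object mirroring the DOMLocator accessors."""
    line_number: int = -1            # -1 when not available
    column_number: int = -1          # -1 when not available
    public_id: Optional[str] = None  # None when not available
    system_id: Optional[str] = None
    node: object = None


class FatalDOMError(Exception):
    """Hypothetical stand-in for the exception a handler may raise."""


class CollectingErrorHandler:
    """Records every report; aborts only on fatal errors (an assumption)."""

    def __init__(self):
        self.reports = []

    def warning(self, where, how, why):
        self.reports.append(("warning", where, how, why))

    def error(self, where, how, why):
        self.reports.append(("error", where, how, why))

    def fatal_error(self, where, how, why):
        self.reports.append(("fatal", where, how, why))
        raise FatalDOMError(f"{why} (line {where.line_number})")


handler = CollectingErrorHandler()
handler.warning(
    Locator(line_number=3, column_number=7, system_id="doc.xml"),
    "validate",
    "undeclared attribute",
)
```

A handler of this shape would be registered once (via setErrorHandler in the draft) and then shared by both the CM and Load/Save code paths.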
Editing and generating a content model falls in the CM-editing world. The most obvious need here is for tools that author content models, either under user control, i.e., explicitly designed document types, or generated from other representations. The latter class includes transcoding tools, e.g., synthesizing an XML representation to match a database schema.
It's important to note here that a DTD's "internal subset" is part of the Content Model, yet is loaded, stored, and maintained as part of the individual document instance. This implies that even tools which do not want to let users change the definition of the Document Type may need to support editing operations upon this portion of the CM. It also means that our representation of the CM must be aware of where each portion of its content resides, so that when the serializer processes this document it can write out just the internal subset. A similar issue may arise with external parsed entities, or if schemas introduce the ability to reference other schemas. Finally, the internal-subset case suggests that we may want at least a two-level representation of content models, so a single DOM representation of a DTD can be shared among several documents, each potentially also having its own internal subset; it's possible that entity layering may be represented the same way.
The API for altering the content model may also be the CM's official interface with parsers. One of the ongoing problems in the DOM is that there is some information which must currently be created via completely undocumented mechanisms, which limits the ability to mix and match DOMs and parsers. Given that specialized DOMs are going to become more common (sub-classed, or wrappers around other kinds of storage, or optimized for specific tasks), we must avoid that situation and provide a "builder" API. Particular pairs of DOMs and parsers may bypass it, but it's required as a portability mechanism.
Note that several of these applications require that a CM can be created, loaded, and manipulated without, or before, being bound to a specific Document. A related issue is that we would want to be able to share a single representation of a CM among several documents, both for storage efficiency and so that changes in the CM can quickly be tested by validating it against a set of known-good documents. Similarly, there is a known problem in DOM Level 2 where we assume that the DocumentType will be created before the Document, which is fine for newly-constructed documents but not a good match for the order in which an XML parser encounters this data; being able to "rebind" a Document to a new CM after it has been created may be desirable.
As noted earlier, questions about whether one can alter the content of the CM via its syntax, via higher-level abstractions, or both, exist. It's also worth noting that many of the editing concepts from the Document tree still apply; users should probably be able to clone part of a CM, remove and re-insert parts, and so on.
In addition to using the content model to validate a document instance, applications would like to be able to use it to guide construction and editing of documents, which falls into the document-editing world. Examples of this sort of guided editing already exist, and are becoming more common. The necessary queries can be phrased in several ways, the most useful of which may be a combination of "what does the DTD allow me to insert here" and "if I insert this here, will the document still be valid". The former is better suited to presentation to humans via a user interface, and when taken together with sub-tree validation may subsume the latter.
It has been proposed that in addition to asking questions about specific parts of the content model, there should be a reasonable way to obtain a list of all the defined symbols of a given type (element, attribute, entity) independent of whether they're valid in a given location; that might be useful in building a list in a user-interface, which could then be updated to reflect which of these are relevant for the program's current state.
Remember that namespaces also weigh in on this issue. In the case of attributes, a "can-this-go-there" query may prompt a namespace-well-formedness check and warn you if you are about to conflict with or overwrite another attribute with the same namespaceURI/localName but a different prefix, or with the same nodeName but a different namespaceURI.
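The two attribute conflicts just described can be detected with a small check like the one below. It is a sketch under simplifying assumptions: attributes are modelled as (prefix, localName, namespaceURI) tuples rather than DOM Attr nodes, the function name attribute_conflicts is hypothetical, and an attribute identical in all three fields (a plain overwrite) is not reported.

```python
def attribute_conflicts(existing, candidate):
    """List namespace conflicts between a candidate attribute and existing ones.

    existing:  list of (prefix, local_name, namespace_uri) tuples on the element.
    candidate: one such tuple for the attribute about to be set.
    """
    c_prefix, c_local, c_uri = candidate
    c_node_name = f"{c_prefix}:{c_local}" if c_prefix else c_local
    problems = []
    for prefix, local, uri in existing:
        node_name = f"{prefix}:{local}" if prefix else local
        if (uri, local) == (c_uri, c_local) and prefix != c_prefix:
            problems.append(
                f"same namespaceURI/localName as {node_name}, different prefix"
            )
        elif node_name == c_node_name and uri != c_uri:
            problems.append(
                f"same nodeName as {node_name}, different namespaceURI"
            )
    return problems


existing = [("a", "href", "urn:x"), ("b", "id", "urn:y")]
prefix_clash = attribute_conflicts(existing, ("z", "href", "urn:x"))
uri_clash = attribute_conflicts(existing, ("b", "id", "urn:other"))
no_clash = attribute_conflicts(existing, ("c", "title", "urn:x"))
```

An editor could surface the returned strings as warnings before committing the attribute change.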
As mentioned above, we have to deal with the fact that the shortest distance between two valid documents may be through an invalid one. Users may want to know several levels of detail (all the possible children, those which would be valid given what precedes this point, those which would be valid given both preceding and following siblings). Also, once XML Schemas introduce context sensitive validity, we may have to consider the effect of children as well as the individual node being inserted.
The most obvious use for a content model (DTD or XML Schema or any Content Model) is to use it to validate that a given XML document is in fact a properly constructed instance of the document type described by this CM. This again falls into the document-editing world. The XML spec only discusses performing this test at the time the document is loaded into the "processor", which most of us have taken to mean that this check should be performed at parse time. But it is obviously desirable to be able to validate a document -- or selected subtrees -- again at other times. One such case would be validating an edited or newly constructed document before serializing it or otherwise passing it to other users. This issue also arises if the "internal subset" is altered -- or if the whole Content Model changes.
In the past, the DOM has allowed users to create invalid documents, and assumed the serializer would accept the task of detecting problems and announcing/repairing them when the document was written out in XML syntax... or that they would be checked for validity when read back in. We considered adding validity checks to the DOM's existing editing operations to prevent creation of invalid documents, but are currently inclined against this for several reasons. First, it would impose a significant amount of computational overhead on the DOM, which might be unnecessary in many situations, e.g., if the change is occurring in a context where we know the result will be valid. Second, "the shortest distance between two good documents may be through a bad document". Preventing a document from becoming temporarily invalid may impose a considerable amount of additional work on higher-level code and users. Hence our current plan is to continue to permit editing to produce invalid DOMs, but provide operations which permit a user to check the validity of a node on demand.
Note that validation includes checking that ID attributes are unique, and that IDREFs point to IDs which actually exist.
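The ID and IDREF checks mentioned above can be sketched as a single pass over the document. This example makes a loud assumption: since no CM is available here to say which attributes are of type ID or IDREF, the attribute names "id" and "ref" are simply treated as such. The function name check_ids is hypothetical, and xml.dom.minidom is used only as a convenient DOM.

```python
from xml.dom.minidom import parseString


def check_ids(document, id_attr="id", idref_attr="ref"):
    """Return (duplicate_ids, dangling_refs) for the whole document.

    The attribute names are assumptions; a real CM would supply the
    declared ID and IDREF attributes for each element type.
    """
    seen, duplicates, refs = set(), [], []
    for element in document.getElementsByTagName("*"):
        if element.hasAttribute(id_attr):
            value = element.getAttribute(id_attr)
            if value in seen:
                duplicates.append(value)
            seen.add(value)
        if element.hasAttribute(idref_attr):
            refs.append(element.getAttribute(idref_attr))
    # References are resolved only after all IDs are known, so forward
    # references are allowed.
    dangling = [ref for ref in refs if ref not in seen]
    return duplicates, dangling


doc = parseString(
    '<doc><sec id="s1"/><sec id="s1"/><link ref="s2"/><link ref="s1"/></doc>'
)
duplicates, dangling = check_ids(doc)
```

Collecting references first and resolving them at the end matters because an IDREF may legitimately point to an ID that appears later in the document.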
XML defined the "well-formed" (WF) state for documents which are parsed without reference to their DTDs. Knowing that a document is well-formed may be useful by itself even when a DTD is available. For example, users may wish to deliberately save an invalid document, perhaps as a checkpoint before further editing. Hence, the CM feature will permit both full validity checking (see next section) and "lightweight" WF checking, as requested by the caller, as well as processing entity declarations in the CM even if validation is not turned on. This falls within the document-editing world.
While the DOM inherently enforces some of XML's well-formedness conditions (proper nesting of elements, constraints on which children may be placed within each node), there are some checks that are not yet performed. These include:
In addition, Namespaces introduce their own concepts of well-formedness. Specifically:
namespaceNormalize operation, which would create the implied declarations and reconcile conflicts in some reasonably standardized manner. This may be a major undertaking, since some DOMs may be using the namespace to direct subclassing of the nodes or similar special treatment; as with the existing normalize method, you may be left with a different-but-equivalent set of node objects.
In the past, the DOM has allowed users to create documents which violate these rules, and assumed the serializer would accept the task of detecting problems and announcing/repairing them when the document was written out in XML syntax. We considered adding WF checks to the DOM's existing editing operations to prevent WF violations from arising, but are currently inclined against this for two reasons. First, it would impose a significant amount of computational overhead on the DOM, which might be unnecessary in many situations (for example, if the change is occurring in a context where we know the illegal characters have already been prevented from arising). Second, "the shortest distance between two good documents may be through a bad document" -- preventing a document from becoming temporarily ill-formed may impose a considerable amount of additional work on higher-level code and users. (Note possible issue for Serialization: In some applications, being able to save and reload marginally poorly-formed DOMs might be useful -- editor checkpoint files, for example.) Hence our current plan is to continue to permit editing to produce ill-formed DOMs, but provide operations which permit a user to check the well-formedness of a node on demand, and possibly provide some of the primitive (e.g., string-checking) functions directly.
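One of the primitive string-checking functions mentioned above might test character data against the Char production of XML 1.0. The sketch below shows such an on-demand check; the function names is_xml_char and first_illegal_char are hypothetical, and minidom is used only to show that a DOM may accept a text node containing a character that could never be serialized as well-formed XML.

```python
from xml.dom.minidom import getDOMImplementation


def is_xml_char(ch):
    """True if ch matches the Char production of XML 1.0."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def first_illegal_char(text):
    """Index of the first character XML 1.0 does not allow, or -1 if none."""
    for index, ch in enumerate(text):
        if not is_xml_char(ch):
            return index
    return -1


# The DOM accepts this text node at edit time; the check runs on demand.
doc = getDOMImplementation().createDocument(None, "root", None)
node = doc.createTextNode("bell\x07here")
bad_index = first_illegal_char(node.data)
```

Running the check only when the caller asks for it matches the plan in the text: editing stays cheap, and the cost of verification is paid at checkpoints such as just before serialization.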