.. _xml-documents: ************* XML Documents ************* Lasso provides a full suite of objects both for constructing new XML documents and parsing existing XML documents. Lasso's implementation follows the `DOM Level 2 Core specification`_ as closely as possible. This introduces a series of objects each representing the various components that can be found within an XML document. The Lasso object names match up with the objects specified in the DOM standard with the addition of an ``xml_`` prefix. Also provided is a simplified method for parsing existing XML data. This method is called `xml` and does not conform to the DOM specification. Lasso also provides both XPath and XSLT functionality. This functionality is integrated into the XML object model, though it is not considered part of the DOM specification itself. In cases where elements are accessed by numeric position, Lasso's implementation conforms to the DOM specification's zero-based indexes, as opposed to Lasso's standard one-based positions. This will be noted in all relevant cases within this chapter. Creating XML Documents ====================== XML documents are created either from existing XML character data or as empty documents. An empty XML document will initially contain only the root document node which can then have children or attributes added to it. A document created from existing XML character data will be parsed and validated and the resulting document object tree will be created. When attempting to create an XML document from existing data, if the data is not valid, then a failure will be generated during parsing. The current `error_msg` will indicate the encountered error. New XML documents can be created in one of two ways: the DOM Level 2-conformant :type:`xml_DOMImplementation` type, or the `xml` method. Both have the same abilities, but the `xml` method provides a simplified interface and is compatible with earlier Lasso versions. It's important to note that `xml` is not itself an object, it is merely a method that provides a moderately easier to use interface to XML document creation. Internally, the `xml` method uses the :type:`xml_DOMImplementation` type and therefore provides equivalent functionality to the :type:`xml_DOMImplementation` type. Using xml --------- The `xml` method is presented in five variations; two for parsing existing XML documents and three for creating new blank documents. .. method:: xml(text::string) .. method:: xml(text::bytes) These first two methods parse existing XML data in either string or raw bytes form. If the document parsing is successful, these methods return the top-level :type:`xml_document` node object. .. method: xml(namespaceUri::string, rootNodeName::string) .. method:: xml(nsUri::string, rootNodeName::string, dtd::xml_documentType= ?) .. method:: xml() These subsequent three methods create a new document consisting of only the root :type:`xml_document` node and no children. These methods return the top-level :type:`xml_document` node object. The first methods create the document given a namespace and a root element name, along with an optional document type node (an :type:`xml_documentType`, created through the `xml_DOMImplementation->createDocumentType` method). The last method takes zero parameters and returns a document with no namespace and the root element name set to "none". In all cases, the resulting value from the `xml` method will be the root element of the document. This will be an object of type :type:`xml_element`. It's important to note that this is not the :type:`xml_document` object, which differs from the root element node. This behavior is a departure from that of the :type:`xml_DOMImplementation` type which does return the :type:`xml_document` object itself. The owning :type:`xml_document` object can be obtained from any node within that document by calling the `xml_node->ownerDocument` method. xml Examples ^^^^^^^^^^^^ Example of creating an XML document from existing data:: local(myDocumentText) = 'b content' local(myDocumentObj) = xml(#myDocumentText) Example of creating a blank XML document:: local(myDocumentObj) = xml('my_namespace', 'a') Using xml_DOMImplementation --------------------------- The :type:`xml_DOMImplementation` type provides comparable functionality to the `xml` method, but follows the DOM Level 2 specification. An object of the type :type:`xml_DOMImplementation` is stateless and can be created with zero parameters. Once an :type:`xml_DOMImplementation` object is obtained it can be used to create or parse XML documents as well as create XML document types. This functionality is presented in the following four methods. .. type:: xml_DOMImplementation .. member: xml_DOMImplementation->createDocument(namespaceUri::string, rootNodeName::string) .. member:: xml_DOMImplementation->createDocument(nsUri::string, rootNodeName::string, dtd::xml_documentType= ?) .. member:: xml_DOMImplementation->createDocumentType(qname::string, publicid::string, systemid::string) .. member:: xml_DOMImplementation->parseDocument(text::bytes) In contrast to the `xml` method, when creating or parsing an XML document the :type:`xml_DOMImplementation` object returns the document node. This will be an object of type :type:`xml_document`. It's important to note that this is not the root element node. The root element node can be obtained through the `xml_document->documentElement` method. xml_DOMImplementation Examples ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Example of creating an XML document from existing data:: local(myDocumentText) = 'b content' local(myDocumentObj) = xml_DOMImplementation->parseDocument( bytes(#myDocumentText) ) Example of creating a blank XML document:: local(domImpl) = xml_DOMImplementation local(docType) = #domImpl->createDocumentType( 'svg:svg', '-//W3C//DTD SVG 1.1//EN', 'http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd' ) local(myDocumentObj) = #domImpl->createDocument( 'http://www.w3.org/2000/svg', 'svg:svg', #docType ) The resulting document would have the following format: .. code-block:: xml Creating XML Node Objects ------------------------- While the :type:`xml_DOMImplementation` object is responsible for creating the initial :type:`xml_document` object, the :type:`xml_document` object is the means through which new XML node object types are created, including element, attribute, and text nodes. All XML objects always belong to a particular instance of the :type:`xml_document` type. No XML node objects can be created without an existing document. Nodes can be copied into another existing :type:`xml_document`, but nodes are never shared between documents. The following methods are use for creating new nodes: .. type:: xml_document .. member:: xml_document->createElement(tagName::string)::xml_element .. member:: xml_document->createElementNS(nsUri::string, qualifiedName::string)::xml_element The first version creates a new element node without a namespace. The second version permits a namespace to be specified. .. member:: xml_document->createAttribute(name::string)::xml_attr .. member:: xml_document->createAttributeNS(nsUri::string, qualifiedName::string)::xml_attr The first version creates a new attribute without a namespace. The second version permits a namespace to be specified. .. member:: xml_document->createDocumentFragment()::xml_documentFragment .. member:: xml_document->createTextNode(data::string)::xml_text .. member:: xml_document->createComment(data::string)::xml_comment .. member:: xml_document->createCDATASection(data::string)::xml_cdataSection .. member:: xml_document->createProcessingInstruction(target::string, data::string)::xml_processingInstruction .. member:: xml_document->createEntityReference(name::string)::xml_entityReference .. member:: xml_document->importNode(importedNode::xml_node, deep::boolean)::xml_node Imports a node from another document into the document of the target object and returns the new node. The new node is not yet placed within the current document and so it has no parent. If "false" is given for the second parameter, then the node's children and attributes are not copied. If "true" is given, then all attributes and child nodes are copied into the current document. The following table lists all the possible objects that may be encountered within or inserted into an XML document. .. tabularcolumns:: llL .. _xml-object-names: .. table:: XML Object Names ============================= ===================== =================================== Lasso XML Object Name XML DOM Level 2 Name Description ============================= ===================== =================================== ``xml_DOMImplementation`` DOMImplementation Creates `xml_document` and `xml_documentType` objects. Can parse existing XML documents or create new empty documents. ``xml_node`` Node Base functionality supported by all objects. ``xml_document`` Document Represents the entire document and provides access to the document's data. ``xml_element`` Element Represents an XML element node. ``xml_attr`` Attr Represents an attribute of an XML element node. ``xml_characterData`` CharacterData Represents character data within the document. This is the base object type for `xml_text` and `xml_cdataSection` objects. ``xml_text`` Text Represents the character data of an `xml_element` or `xml_attr` node. ``xml_cdataSection`` CDATASection Represents a CDATA node. ``xml_entityReference`` EntityReference Represents an entity reference. ``xml_entity`` Entity Represents a parsed or unparsed entity within the document. ``xml_processingInstruction`` ProcessingInstruction Represents a processing instruction located within the document. ``xml_comment`` Comment Represents the content of an XML comment node. ``xml_documentType`` DocumentType Represents the doctype attribute of an XML document. ``xml_documentFragment`` DocumentFragment Represents a minimal document object. ``xml_notation`` Notation Represents a notation declared in the DTD. ``xml_nodeList`` NodeList Represents a list of node objects. Provides random access to the list. This list uses zero-based indexes, in contrast to Lasso's standard one-based positions. ``xml_namedNodeMap`` NamedNodeMap Represents a collection of nodes that can be accessed by name. ============================= ===================== =================================== Inspecting XML Objects ---------------------- Lasso's XML interface permits all the various pieces of an XML document to be inspected. This includes accessing attributes, node content, node children etc. The methods listed in this section are not meant to be exhaustive, but instead to show the methods most commonly used when working with an XML document. .. type:: xml_node .. member:: xml_node->nodeType()::string Returns the name of the type of node. For example, an :type:`xml_element` node would return "ELEMENT_NODE". This is in contrast to the DOM Level 2 specification which returns an integer value. .. member:: xml_node->nodeName()::string Returns the name of the node. This value will depend on the type of the node in question. For :type:`xml_element` nodes, this will be the same value as the tag name. For :type:`xml_attr` nodes, this will be the same as the attribute name. .. member:: xml_node->prefix() Returns the namespace prefix of the node or "null" if it is unspecified. .. member:: xml_node->localName() Returns the local part of the qualified name of the node. .. member:: xml_node->namespaceURI() Returns the namespace URI of the node or "null" if it is unspecified. .. member:: xml_node->nodeValue() Returns the value of the node as a string. This result will vary depending on the node type. For example an attribute node will return the attribute value. A text node will return the text content for the node. Many node types, such as element nodes, will return "null". This value is read/write for nodes that have values, and in such cases can be set with the `xml_node->nodeValue=` method. .. member:: xml_node->parentNode() Returns the parent of the node or "null" if there is no parent. Some, such as attribute nodes and the document node, do not have parents. .. member:: xml_node->ownerDocument() Returns the :type:`xml_document` that is the owner of the target node. In the case of the document node, this will be "null". .. type:: xml_element .. member:: xml_element->tagName()::string Returns the name of the element. .. member:: xml_element->getAttribute(name::string)::string Returns the value of the specified attribute. Returns an empty string if the attribute does not exist or has no value. .. member:: xml_element->getAttributeNS(nsUri::string, localName::string) Returns the value of the attribute matching the given namespace and local name. Returns an empty string if the attribute does not exist or has no value. .. member:: xml_element->getAttributeNode(name::string) Returns the specified attribute node. Returns "null" if the attribute does not exist. .. member:: xml_element->getAttributeNodeNS(nsUri::string, localName::string) Returns the attribute node matching the given namespace and local name. Returns "null" if the attribute does not exist. .. member:: xml_element->hasAttribute(name::string)::boolean Returns "true" if the specified attribute exists. .. member:: xml_element->hasAttributeNS(nsUri::string, localName::string)::boolean Returns "true" if the attribute matching the given namespace and local name exists. .. type:: xml_attr .. member:: xml_attr->name()::string Returns the name of the attribute. .. member:: xml_attr->ownerElement() Returns the element node that owns the attribute or "null" if the attribute is not in use. .. member:: xml_attr->value()::string Returns the value of the attribute. This value is read/write. .. type:: xml_nodeList .. member:: xml_nodeList->length()::integer Returns the number of nodes in the list. .. member:: xml_nodeList->item(index::integer) Returns the node indicated by the index. Indexes start at zero and go up to length-1. Returns "null" if the index is invalid. .. type:: xml_nodeMap .. member:: xml_nodeMap->length()::integer Returns the number of nodes in the map. .. member:: xml_nodeMap->getNamedItem(name::string) Returns the node matching the indicated name. .. member:: xml_nodeMap->getNamedItemNS(nsUri::string, localName::string) Returns the node matching the indicated namespace URI and local name. .. member:: xml_nodeMap->item(index::integer) Returns the node indicated by the index. Indexes start at zero and go up to length-1. Returns "null" if the index is invalid. Modifying XML Objects --------------------- Various parts of an XML document can be modified. This includes setting node values, adding or removing child nodes, adding or removing attributes, or removing items from node maps. .. member:: xml_node->nodeValue=(value::string) Sets the value of the node to the indicated string. Only the following node types are able to have their values set: :type:`xml_attr`, :type:`xml_cdataSection`, :type:`xml_comment`, :type:`xml_processingInstruction`, :type:`xml_text`. .. member:: xml_node->insertBefore(new::xml_node, ref::xml_node)::xml_node Inserts the new node into the document immediately before the ref node. Returns the newly inserted node. .. member:: xml_node->replaceChild(new::xml_node, ref::xml_node)::xml_node Replaces the ref node in the document with the new node. Returns the new node. .. member:: xml_node->appendChild(new::xml_node)::xml_node Inserts the new node into the document at the end of the target node's child list. Returns the new node. .. member:: xml_node->removeChild(c::xml_node)::xml_node Removes the indicated child node from the document. Returns the removed node. .. member:: xml_node->normalize() Modifies the document such that no two text nodes are adjacent. All adjacent text nodes are merged into one text node. .. member:: xml_element->setAttribute(name::string, value::string) Adds an attribute with the given name and value. If the attribute already exists then the value is set accordingly. .. member:: xml_element->setAttributeNS(uri::string, qname::string, value::string) Adds an attribute with the given namespace, name, and value. If the attribute already exists its value is set accordingly. .. member:: xml_element->setAttributeNode(node::xml_attr) Adds the new attribute node. If an attribute with the same name already exists it is replaced. To add a namespace-aware attribute, use `xml_element->setAttributeNodeNS` instead. .. member:: xml_element->setAttributeNodeNS(node::xml_attr) Adds the new attribute node. If an attribute with the same namespace/name combination already exists it is replaced. .. member:: xml_element->removeAttribute(name::string) Removes the attribute with the indicated name. .. member:: xml_element->removeAttributeNS(uri::string, qname::string) Removes the attribute with the given namespace/name combination. .. member:: xml_element->removeAttributeNode(node::xml_attr)::xml_attr Removes the indicated attribute node. Returns the removed node. .. note:: Some node maps are read-only and cannot be modified. .. member:: xml_nodeMap->setNamedItem(node::xml_node)::xml_node Adds the node to the node map based on the "nodeName" value of the node. Replaces any duplicate node within the map. Returns the added node. .. member:: xml_nodeMap->setNamedItemNS(node::xml_node)::xml_node Adds the node to the node map based on the namespace/name combination. Replaces any duplicate node within the map. Returns the added node. .. member:: xml_nodeMap->removeNamedItem(name::string) Removes the node with the given name from the map. Returns the removed node. .. member:: xml_nodeMap->removeNamedItemNS(uri::string, qname::string) Removes the node with the given namespace/name combination from the map. Returns the removed node. XPath ===== Lasso's XML API supports the XPath 1.0 specification for any :type:`xml_node` type through the `xml_node->extract` and `xml_node->extractOne` methods. Consult the `XPath specification`_ for the specifics of XPath syntax. Using XPath ----------- XPath is used to address a specific set of nodes within an XML document. For example, child nodes matching a node name pattern can be located, or nodes with specific attributes can be easily found within the document. .. member:: xml_node->extract(xpath::string) Executes the XPath in the node and returns all matches as a staticarray. .. member:: xml_node->extract(xpath::string, namespaces::staticarray) Executes the XPath in the node and returns all matches as a staticarray. This method should be used for XML documents that use namespaces. The second parameter is a staticarray containing the relevant namespace prefixes and URI pairs that are used within the XPath expression. Note that the namespace prefixes used in the XPath expression do not have to match those used within the document itself. .. member:: xml_node->extractOne(xpath::string) Executes the XPath in the node and returns the first matching node or "null" if there are no matches. .. member:: xml_node->extractOne(xpath::string, namespaces::staticarray) Executes the XPath in the node and returns the first matching node or "null" if there are no matches. This method should be used for XML documents that use namespaces. The second parameter is a staticarray containing the relevant namespace prefixes and URI pairs that are used within the XPath expression. Note that the namespace prefixes used in the XPath expression do not have to match those used within the document itself. XPath Examples ^^^^^^^^^^^^^^ Extract all child elements of the a node:: local(doc) = xml( ' C Content ') #doc->extract('//a/*') // => staticarray(, C Content) Using namespaces, extract all child elements of the a node:: local(doc) = xml( ' C Content ') #doc->extract('//n:a/*', (: 'n'='my_uri')) // => staticarray(, C Content) Extract the first child element of the a node:: local(doc) = xml( ' C Content ') #doc->extractOne('//a/*') // => Extract the ``"at"`` attribute from the second child element of the a node:: local(doc) = xml( ' C Content ') #doc->extractOne('//n:a/*[2]/@at', (: 'n'='my_uri')) // => at="val2" XSLT ==== Lasso's XML API supports XSL Transformations (XSLT) 1.0. For the specifics of XSLT, consult the `XSLT specification`_. XSLT support is provided on any :type:`xml_node` type through the `~xml_node->transform` method. This method accepts an XSLT template as a string as well as a list of all variables to be made available during the transformation. The transformation is performed and a new XML document is returned. .. member:: xml_node->transform(sheet::string, variables::staticarray)::xml_document Performs an XSLT transformation on the document and returns the resulting newly produced document. .. _DOM Level 2 Core specification: http://www.w3.org/TR/DOM-Level-2-Core/ .. _XPath specification: http://www.w3.org/TR/xpath/ .. _XSLT specification: http://www.w3.org/TR/xslt/