XML Documents

Lasso provides a full suite of objects for both constructing new XML documents and parsing existing ones. Lasso’s implementation follows the DOM Level 2 Core specification as closely as possible, introducing a series of objects that each represent one of the components found within an XML document. The Lasso object names match the objects specified in the DOM standard, with the addition of an xml_ prefix. Also provided is a simplified method for parsing existing XML data. This method is called xml and does not conform to the DOM specification.

Lasso also provides both XPath and XSLT functionality. This functionality is integrated into the XML object model, though it is not considered part of the DOM specification itself.

In cases where elements are accessed by numeric position, Lasso’s implementation conforms to the DOM specification’s zero-based indexes, as opposed to Lasso’s standard one-based positions. This will be noted in all relevant cases within this chapter.

Creating XML Documents

XML documents are created either from existing XML character data or as empty documents. An empty XML document initially contains only the root document node, to which children or attributes can then be added. A document created from existing XML character data is parsed and validated, and the resulting document object tree is built. If the data is not valid, a failure is generated during parsing, and the current error_msg describes the error encountered.

New XML documents can be created in one of two ways: the DOM Level 2-conformant xml_DOMImplementation type, or the xml method. Both have the same abilities, but the xml method provides a simplified interface and is compatible with earlier Lasso versions. Note that xml is not itself an object; it is merely a method that provides a somewhat easier interface to XML document creation. Internally, the xml method uses the xml_DOMImplementation type and therefore provides equivalent functionality.

Using xml

The xml method comes in several variations: two for parsing existing XML documents and the remainder for creating new blank documents.

xml(text::string)
xml(text::bytes)

These first two methods parse existing XML data in either string or raw bytes form. If the document parsing is successful, these methods return the root element node of the parsed document, as described below.

xml(nsUri::string, rootNodeName::string, dtd::xml_documentType=?)
xml()

The remaining signatures create a new document consisting of only the root node and no children, and return the root element of that document, as described below. The first form creates the document given a namespace and a root element name, along with an optional document type node (an xml_documentType, created through the xml_DOMImplementation->createDocumentType method), so it can be called with or without the dtd parameter. The last form takes no parameters and returns a document with no namespace and the root element name set to “none”.

In all cases, the resulting value from the xml method will be the root element of the document. This will be an object of type xml_element. It’s important to note that this is not the xml_document object, which differs from the root element node. This behavior is a departure from that of the xml_DOMImplementation type which does return the xml_document object itself. The owning xml_document object can be obtained from any node within that document by calling the xml_node->ownerDocument method.

xml Examples

Example of creating an XML document from existing data:

local(myDocumentText) = '<a><b>b content</b><c/></a>'
local(myDocumentObj)  = xml(#myDocumentText)
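
When the text is not valid XML, parsing fails as described earlier. The following is a minimal sketch of capturing that failure. It assumes Lasso's general-purpose protect and handle constructs, which are not covered in this chapter, and the names badText and parseError are illustrative only.

```lasso
// Hypothetical input: the <b> element is never closed
local(badText)    = '<a><b>b content</a>'
local(parseError) = ''

protect => {
   handle => {
      // Runs as the protect block exits; after a failed parse,
      // error_msg is expected to describe the parser's complaint
      #parseError = error_msg
   }
   // Parsing invalid data generates a failure at this point
   local(root) = xml(#badText)
}
```

The exact wording of the message depends on the parser, so it is best treated as diagnostic text rather than something to match against.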

Example of creating a blank XML document:

local(myDocumentObj) = xml('my_namespace', 'a')
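
Because xml returns the root element rather than the document, it can help to see both objects side by side. This short sketch uses only methods described in this chapter; the variable names are illustrative.

```lasso
// xml returns the root element (an xml_element), not the document
local(root) = xml('<a><b>b content</b><c/></a>')

// For an element, nodeName is the same value as the tag name
local(rootName) = #root->nodeName

// Step up from any node to the owning xml_document
local(doc) = #root->ownerDocument

// documentElement leads from the document back to its root element
local(sameRoot) = #doc->documentElement
```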

Using xml_DOMImplementation

The xml_DOMImplementation type provides comparable functionality to the xml method, but follows the DOM Level 2 specification. An object of the type xml_DOMImplementation is stateless and can be created with no parameters. Once an xml_DOMImplementation object is obtained it can create or parse XML documents as well as create XML document types.

This functionality is presented in the following methods.

type xml_DOMImplementation
xml_DOMImplementation->createDocument(nsUri::string, rootNodeName::string, dtd::xml_documentType=?)
xml_DOMImplementation->createDocumentType(qname::string, publicid::string, systemid::string)
xml_DOMImplementation->parseDocument(text::bytes)

In contrast to the xml method, when creating or parsing an XML document the xml_DOMImplementation object returns the document node. This will be an object of type xml_document. It’s important to note that this is not the root element node. The root element node can be obtained through the xml_document->documentElement method.

xml_DOMImplementation Examples

Example of creating an XML document from existing data:

local(myDocumentText) = '<a><b>b content</b><c/></a>'
local(myDocumentObj)  =
   xml_DOMImplementation->parseDocument(
      bytes(#myDocumentText)
   )

Example of creating a blank XML document:

local(domImpl) = xml_DOMImplementation
local(docType) = #domImpl->createDocumentType(
   'svg:svg',
   '-//W3C//DTD SVG 1.1//EN',
   'http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd'
)
local(myDocumentObj) = #domImpl->createDocument(
   'http://www.w3.org/2000/svg',
   'svg:svg',
   #docType
)

The resulting document would have the following format:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE svg:svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
<svg:svg xmlns:svg="http://www.w3.org/2000/svg"/>
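
Going the other way, a document obtained from xml_DOMImplementation must be asked for its root element before element methods can be used. A brief sketch, using only methods listed in this chapter (variable names are illustrative):

```lasso
local(domImpl) = xml_DOMImplementation

// parseDocument takes bytes and returns the xml_document node
local(doc) = #domImpl->parseDocument(bytes('<a><b>b content</b><c/></a>'))

// The root element is reached through documentElement
local(root) = #doc->documentElement

// tagName is available on the element, not on the document
local(tag) = #root->tagName
```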

Creating XML Node Objects

While the xml_DOMImplementation object is responsible for creating the initial xml_document object, the xml_document object is the means through which new XML node object types are created, including element, attribute, and text nodes. All XML objects always belong to a particular instance of the xml_document type. No XML node objects can be created without an existing document. Nodes can be copied into another existing xml_document, but nodes are never shared between documents.

The following methods are used for creating new nodes:

type xml_document
xml_document->createElement(tagName::string) → xml_element
xml_document->createElementNS(nsUri::string, qualifiedName::string) → xml_element

The first version creates a new element node without a namespace. The second version permits a namespace to be specified.

xml_document->createAttribute(name::string) → xml_attr
xml_document->createAttributeNS(nsUri::string, qualifiedName::string) → xml_attr

The first version creates a new attribute without a namespace. The second version permits a namespace to be specified.

xml_document->createDocumentFragment() → xml_documentFragment
xml_document->createTextNode(data::string) → xml_text
xml_document->createComment(data::string) → xml_comment
xml_document->createCDATASection(data::string) → xml_cdataSection
xml_document->createProcessingInstruction(target::string, data::string) → xml_processingInstruction
xml_document->createEntityReference(name::string) → xml_entityReference
xml_document->importNode(importedNode::xml_node, deep::boolean) → xml_node

Imports a node from another document into the document of the target object and returns the new node. The new node is not yet placed within the current document and so it has no parent. If “false” is given for the second parameter, the node’s children and attributes are not copied. If “true” is given, then all attributes and child nodes are copied into the current document.
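
The creation methods above only produce nodes; nothing appears in the document until a node is attached. The sketch below builds a small tree and imports an element from a second document. It relies on xml_node->appendChild, which is described later in this chapter, and all variable names are illustrative.

```lasso
// Start from a one-element document and get its xml_document
local(root) = xml('<a/>')
local(doc)  = #root->ownerDocument

// New nodes are created by the document, then attached to a parent
local(b) = #doc->createElement('b')
#b->appendChild(#doc->createTextNode('b content'))
#root->appendChild(#b)
#root->appendChild(#doc->createComment('generated'))

// A node from another document must be imported before it can be used
local(otherRoot) = xml('<x><y at="val"/></x>')

// Passing true copies the attributes and child nodes as well
local(copy) = #doc->importNode(#otherRoot, true)

// The imported copy has no parent until it is attached
#root->appendChild(#copy)
```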

The following table lists all the possible objects that may be encountered within or inserted into an XML document.

XML Object Names
Lasso XML Object Name | XML DOM Level 2 Name | Description
xml_DOMImplementation | DOMImplementation | Creates xml_document and xml_documentType objects. Can parse existing XML documents or create new empty documents.
xml_node | Node | Base functionality supported by all objects.
xml_document | Document | Represents the entire document and provides access to the document’s data.
xml_element | Element | Represents an XML element node.
xml_attr | Attr | Represents an attribute of an XML element node.
xml_characterData | CharacterData | Represents character data within the document. This is the base object type for xml_text and xml_cdataSection objects.
xml_text | Text | Represents the character data of an xml_element or xml_attr node.
xml_cdataSection | CDATASection | Represents a CDATA node.
xml_entityReference | EntityReference | Represents an entity reference.
xml_entity | Entity | Represents a parsed or unparsed entity within the document.
xml_processingInstruction | ProcessingInstruction | Represents a processing instruction located within the document.
xml_comment | Comment | Represents the content of an XML comment node.
xml_documentType | DocumentType | Represents the doctype attribute of an XML document.
xml_documentFragment | DocumentFragment | Represents a minimal document object.
xml_notation | Notation | Represents a notation declared in the DTD.
xml_nodeList | NodeList | Represents a list of node objects. Provides random access to the list. This list uses zero-based indexes, in contrast to Lasso’s standard one-based positions.
xml_namedNodeMap | NamedNodeMap | Represents a collection of nodes that can be accessed by name.

Inspecting XML Objects

Lasso’s XML interface permits all the various pieces of an XML document to be inspected. This includes accessing attributes, node content, and node children. The methods listed in this section are not meant to be exhaustive, but instead to show the methods most commonly used when working with an XML document.

type xml_node
xml_node->nodeType() → string

Returns the name of the type of node. For example, an xml_element node would return “ELEMENT_NODE”. This is in contrast to the DOM Level 2 specification which returns an integer value.

xml_node->nodeName() → string

Returns the name of the node. This value will depend on the type of the node in question. For xml_element nodes, this will be the same value as the tag name. For xml_attr nodes, this will be the same as the attribute name.

xml_node->prefix()

Returns the namespace prefix of the node or “null” if it is unspecified.

xml_node->localName()

Returns the local part of the qualified name of the node.

xml_node->namespaceURI()

Returns the namespace URI of the node or “null” if it is unspecified.

xml_node->nodeValue()

Returns the value of the node as a string. This result will vary depending on the node type. For example, an attribute node will return the attribute value, and a text node will return the text content for the node. Many node types, such as element nodes, will return “null”. This value is read/write for nodes that have values, and in such cases can be set with the xml_node->nodeValue= method.

xml_node->parentNode()

Returns the parent of the node or “null” if there is no parent. Some nodes, such as attribute nodes and the document node, do not have parents.

xml_node->ownerDocument()

Returns the xml_document that is the owner of the target node. In the case of the document node, this will be “null”.

type xml_element
xml_element->tagName() → string

Returns the name of the element.

xml_element->getAttribute(name::string) → string

Returns the value of the specified attribute. Returns an empty string if the attribute does not exist or has no value.

xml_element->getAttributeNS(nsUri::string, localName::string)

Returns the value of the attribute matching the given namespace and local name. Returns an empty string if the attribute does not exist or has no value.

xml_element->getAttributeNode(name::string)

Returns the specified attribute node. Returns “null” if the attribute does not exist.

xml_element->getAttributeNodeNS(nsUri::string, localName::string)

Returns the attribute node matching the given namespace and local name. Returns “null” if the attribute does not exist.

xml_element->hasAttribute(name::string) → boolean

Returns “true” if the specified attribute exists.

xml_element->hasAttributeNS(nsUri::string, localName::string) → boolean

Returns “true” if the attribute matching the given namespace and local name exists.

type xml_attr
xml_attr->name() → string

Returns the name of the attribute.

xml_attr->ownerElement()

Returns the element node that owns the attribute or “null” if the attribute is not in use.

xml_attr->value() → string

Returns the value of the attribute. This value is read/write.

type xml_nodeList
xml_nodeList->length() → integer

Returns the number of nodes in the list.

xml_nodeList->item(index::integer)

Returns the node specified by the index. Indexes start at zero and go up to length-1. Returns “null” if the index is invalid.

type xml_nodeMap
xml_nodeMap->length() → integer

Returns the number of nodes in the map.

xml_nodeMap->getNamedItem(name::string)

Returns the node matching the specified name.

xml_nodeMap->getNamedItemNS(nsUri::string, localName::string)

Returns the node matching the specified namespace URI and local name.

xml_nodeMap->item(index::integer)

Returns the node specified by the index. Indexes start at zero and go up to length-1. Returns “null” if the index is invalid.
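
As an illustration of the inspection methods, the following sketch reads an element and one of its attributes. It uses only methods listed above; the comments restate the behavior described in this section rather than captured output, and the variable names are illustrative.

```lasso
local(root) = xml('<a at="val"/>')

// nodeType returns a name; the text gives "ELEMENT_NODE" for elements
local(kind) = #root->nodeType

// For an element, tagName and nodeName are the same value
local(tag) = #root->tagName

// Attribute values can be read directly as strings
local(hasAt)   = #root->hasAttribute('at')
local(atValue) = #root->getAttribute('at')

// A missing attribute is described as yielding an empty string
local(missing) = #root->getAttribute('no_such_attribute')

// Or the attribute can be retrieved as an xml_attr node
local(attr)      = #root->getAttributeNode('at')
local(attrName)  = #attr->name
local(attrValue) = #attr->value

// ownerElement leads from the attribute back to its element
local(owner) = #attr->ownerElement
```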

Modifying XML Objects

Various parts of an XML document can be modified. This includes setting node values, adding or removing child nodes, adding or removing attributes, or removing items from node maps.

xml_node->nodeValue=(value::string)

Sets the value of the node to the specified string. Only the following node types are able to have their values set: xml_attr, xml_cdataSection, xml_comment, xml_processingInstruction, xml_text.

xml_node->insertBefore(new::xml_node, ref::xml_node) → xml_node

Inserts the new node into the document immediately before the ref node. Returns the newly inserted node.

xml_node->replaceChild(new::xml_node, ref::xml_node) → xml_node

Replaces the ref node in the document with the new node. Returns the new node.

xml_node->appendChild(new::xml_node) → xml_node

Inserts the new node into the document at the end of the target node’s child list. Returns the new node.

xml_node->removeChild(c::xml_node) → xml_node

Removes the specified child node from the document. Returns the removed node.

xml_node->normalize()

Modifies the document such that no two text nodes are adjacent. All adjacent text nodes are merged into one text node.

xml_element->setAttribute(name::string, value::string)

Adds an attribute with the given name and value. If the attribute already exists then the value is set accordingly.

xml_element->setAttributeNS(uri::string, qname::string, value::string)

Adds an attribute with the given namespace, name, and value. If the attribute already exists its value is set accordingly.

xml_element->setAttributeNode(node::xml_attr)

Adds the new attribute node. If an attribute with the same name already exists it is replaced. To add a namespace-aware attribute, use xml_element->setAttributeNodeNS instead.

xml_element->setAttributeNodeNS(node::xml_attr)

Adds the new attribute node. If an attribute with the same namespace/name combination already exists it is replaced.

xml_element->removeAttribute(name::string)

Removes the attribute with the specified name.

xml_element->removeAttributeNS(uri::string, qname::string)

Removes the attribute with the given namespace/name combination.

xml_element->removeAttributeNode(node::xml_attr) → xml_attr

Removes the specified attribute node. Returns the removed node.

Note

Some node maps are read-only and cannot be modified.

xml_nodeMap->setNamedItem(node::xml_node) → xml_node

Adds the node to the node map based on the “nodeName” value of the node. Replaces any duplicate node within the map. Returns the added node.

xml_nodeMap->setNamedItemNS(node::xml_node) → xml_node

Adds the node to the node map based on the namespace/name combination. Replaces any duplicate node within the map. Returns the added node.

xml_nodeMap->removeNamedItem(name::string)

Removes the node with the given name from the map. Returns the removed node.

xml_nodeMap->removeNamedItemNS(uri::string, qname::string)

Removes the node with the given namespace/name combination from the map. Returns the removed node.
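
The modification methods can be combined as in the following sketch, which builds a short list, reorders it, edits attributes and text, and removes a child. It uses only methods described in this chapter, and the variable names are illustrative.

```lasso
local(root) = xml('<list/>')
local(doc)  = #root->ownerDocument

local(first)  = #doc->createElement('item')
local(second) = #doc->createElement('item')

// Attach second, then place first immediately before it
#root->appendChild(#second)
#root->insertBefore(#first, #second)

// Setting an attribute that already exists updates its value
#first->setAttribute('id', '1')
#first->setAttribute('id', 'one')

// Attributes can be added and then removed by name
#second->setAttribute('temp', 'x')
#second->removeAttribute('temp')

// Text nodes are among the node types whose value can be set
local(text) = #doc->createTextNode('draft')
#first->appendChild(#text)
#text->nodeValue = 'final'

// removeChild detaches the node and returns it
local(removed) = #root->removeChild(#second)
```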

XPath

Lasso’s XML API supports the XPath 1.0 specification for any xml_node type through the xml_node->extract and xml_node->extractOne methods. Consult the XPath specification for the specifics of XPath syntax.

Using XPath

XPath is used to address a specific set of nodes within an XML document. For example, child nodes matching a node name pattern can be located, or nodes with specific attributes can be easily found within the document.

xml_node->extract(xpath::string)

Executes the XPath in the node and returns all matches as a staticarray.

xml_node->extract(xpath::string, namespaces::staticarray)

Executes the XPath in the node and returns all matches as a staticarray. This method should be used for XML documents that use namespaces. The second parameter is a staticarray of prefix/URI pairs covering the namespace prefixes used within the XPath expression. Note that the namespace prefixes used in the XPath expression do not have to match those used within the document itself.

xml_node->extractOne(xpath::string)

Executes the XPath in the node and returns the first matching node or “null” if there are no matches.

xml_node->extractOne(xpath::string, namespaces::staticarray)

Executes the XPath in the node and returns the first matching node or “null” if there are no matches. This method should be used for XML documents that use namespaces. The second parameter is a staticarray of prefix/URI pairs covering the namespace prefixes used within the XPath expression. Note that the namespace prefixes used in the XPath expression do not have to match those used within the document itself.

XPath Examples

Extract all child elements of the a node:

local(doc) = xml(
   '<a>
      <b at="val"/>
      <c at="val2">C Content</c>
   </a>')
#doc->extract('//a/*')

// => staticarray(<b at="val"/>, <c at="val2">C Content</c>)

Using namespaces, extract all child elements of the a node:

local(doc) = xml(
   '<a xmlns="my_uri">
      <b at="val"/>
      <c at="val2">C Content</c>
   </a>')
#doc->extract('//n:a/*', (: 'n'='my_uri'))

// => staticarray(<b at="val"/>, <c at="val2">C Content</c>)

Extract the first child element of the a node:

local(doc) = xml(
   '<a>
      <b at="val"/>
      <c at="val2">C Content</c>
   </a>')
#doc->extractOne('//a/*')

// => <b at="val"/>

Extract the "at" attribute from the second child element of the a node:

local(doc) = xml(
   '<a xmlns="my_uri">
      <b at="val"/>
      <c at="val2">C Content</c>
   </a>')
#doc->extractOne('//n:a/*[2]/@at', (: 'n'='my_uri'))

// => at="val2"
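
XPath predicates can also select nodes by attribute value, and the nodes returned can be inspected with the methods from earlier sections. A small sketch follows; the variable names are illustrative, and size is assumed here to be the usual staticarray member for counting elements.

```lasso
local(doc) = xml(
   '<a>
      <b at="val"/>
      <c at="val2">C Content</c>
   </a>')

// Find the first element whose "at" attribute equals "val2"
local(match) = #doc->extractOne('//*[@at="val2"]')

// The result is an ordinary node, so nodeName and getAttribute apply
local(matchName) = #match->nodeName
local(matchAt)   = #match->getAttribute('at')

// extractOne is described as returning "null" when nothing matches
local(none) = #doc->extractOne('//d')

// extract returns a staticarray, so the matches can be counted
local(childCount) = #doc->extract('//a/*')->size
```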

XSLT

Lasso’s XML API supports XSL Transformations (XSLT) 1.0. For the specifics of XSLT, consult the XSLT specification.

XSLT support is provided on any xml_node type through the transform method, which accepts an XSLT template as a string as well as a list of all variables to be made available during the transformation. The transformation is performed and a new XML document is returned.

xml_node->transform(sheet::string, variables::staticarray) → xml_document

Performs an XSLT transformation on the document and returns the resulting newly produced document.
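
A minimal sketch of a transformation is shown below. The stylesheet is a hypothetical example that copies the text of the b element into a new out element. The format of the variables parameter is not specified in this chapter, so an empty staticarray is passed on the assumption that no variables are needed.

```lasso
local(doc) = xml('<a><b>b content</b></a>')

// A small XSLT 1.0 stylesheet, supplied as a string
local(sheet) = '<xsl:stylesheet version="1.0"
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   <xsl:template match="/">
      <out><xsl:value-of select="a/b"/></out>
   </xsl:template>
</xsl:stylesheet>'

// No stylesheet variables are supplied in this sketch
local(result) = #doc->transform(#sheet, staticarray)

// The result is a new xml_document; its root element is
// reached through documentElement
local(outRoot) = #result->documentElement
```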