The experimental, loadable DOM XML/XSLT filter module
    mod-dom.so
    is invoked by the zebra.cfg configuration statement
    
     recordtype.xml: dom.db/filter_dom_conf.xml
    
    In this example the DOM XML filter is configured to work
    on all data files with suffix
    *.xml, where the configuration file is found in the
    path db/filter_dom_conf.xml.
   
The DOM XSLT filter configuration file must be valid XML. It might look like this:
     
     <?xml version="1.0" encoding="UTF-8"?>
     <dom xmlns="http://indexdata.com/zebra-2.0">
       <input>
         <xmlreader level="1"/>
         <!-- <marc inputcharset="marc-8"/> -->
       </input>
       <extract>
         <xslt stylesheet="common2index.xsl"/>
       </extract>
       <store>
         <xslt stylesheet="common2store.xsl"/>
       </store>
       <retrieve name="dc">
         <xslt stylesheet="store2dc.xsl"/>
       </retrieve>
       <retrieve name="mods">
         <xslt stylesheet="store2mods.xsl"/>
       </retrieve>
     </dom>
     
    
    The root XML element <dom> and all other DOM
    XML filter elements reside in the namespace
    xmlns="http://indexdata.com/zebra-2.0".
   
    All pipeline definition elements - i.e. the
    <input>,
    <extract>,
    <store>, and
    <retrieve> elements - are optional.
    Missing pipeline definitions are interpreted as
    do-nothing identity pipelines.
   
    All pipeline definition elements may contain zero or more
    <xslt stylesheet="path/file.xsl"/>
    XSLT transformation instructions, which are performed
    sequentially from top to bottom.
    The paths in the stylesheet attributes
    are relative to Zebra's working directory, or absolute from the file
    system root.
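
    As a minimal sketch of such chaining, an <extract> pipeline with
    two steps could look like the fragment below. Both stylesheet
    paths are hypothetical: the first is meant to illustrate a path
    relative to the working directory, the second an absolute path.
    The fragment belongs inside the <dom> root element.

```xml
<extract>
  <!-- step 1 (hypothetical file, relative path): clean up the common format -->
  <xslt stylesheet="db/normalize.xsl"/>
  <!-- step 2 (hypothetical file, absolute path): receives the output of step 1 -->
  <xslt stylesheet="/usr/local/share/zebra/common2index.xsl"/>
</extract>
```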
   
     The <input> pipeline definition element
     may contain either one XML Reader definition
     <xmlreader level="1"/>, used to split
     an XML collection input stream into individual XML DOM
     documents at the prescribed element level,
     or one MARC binary
     parsing instruction
     <marc inputcharset="marc-8"/>, which defines
     a conversion to MARCXML format DOM trees. The allowed values
     of the inputcharset attribute depend on your
     local iconv set-up.
    
     Both input parsers deliver individual DOM XML documents to the
     following chain of zero or more
     <xslt stylesheet="path/file.xsl"/>
     XSLT transformations. At the end of this pipeline, the documents
     are in the common format, used to feed both the
     <extract> and
     <store> pipelines.
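
     For illustration, an <input> pipeline that reads binary MARC
     records and then converts the resulting MARCXML trees to a common
     format might be sketched as follows. The stylesheet name
     marcxml2common.xsl is a hypothetical placeholder, not a file
     shipped with Zebra.

```xml
<input>
  <!-- parse binary MARC records into MARCXML DOM trees -->
  <marc inputcharset="marc-8"/>
  <!-- hypothetical stylesheet: MARCXML to the common format -->
  <xslt stylesheet="marcxml2common.xsl"/>
</input>
```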
    
     The <extract> pipeline takes documents
     from any common DOM XML format to the Zebra specific
     indexing DOM XML format.
     It may consist of zero or more
     <xslt stylesheet="path/file.xsl"/>
     XSLT transformations, and the outcome is handed to the
     Zebra core to drive the process of building the inverted
     indexes. See
     Section 2.5, “Canonical Indexing Format” for
     details.
    
     The <store> pipeline takes documents
     from any common DOM XML format to the Zebra specific
     storage DOM XML format.
     It may consist of zero or more
     <xslt stylesheet="path/file.xsl"/>
     XSLT transformations, and the outcome is handed to the
     Zebra core for deposition into the internal storage system.
    
     Finally, there may be one or more
     <retrieve> pipeline definitions, each
     of them again consisting of zero or more
     <xslt stylesheet="path/file.xsl"/>
     XSLT transformations. These are used for document
     presentation after search, and take the internal storage DOM
     XML to the requested output formats during record present
     requests.
    
     Multiple
     <retrieve> pipeline definitions
     are distinguished by their unique name
     attributes; these are the literal schema or
     element set names used in
     SRW,
     SRU and
     Z39.50 protocol queries.
    
     DOM XML indexing comes in two flavors: pure
     processing-instruction governed plain XML documents, and - very
     similar to the Alvis filter indexing format - XML documents
     containing XML <record> and
     <index> instructions from the magic
     namespace xmlns:z="http://indexdata.com/zebra-2.0".
    
The output of the processing instruction driven
      indexing XSLT stylesheets must contain
      processing instructions named
      zebra-2.0.
      The output of the XSLT indexing transformation is then
      parsed using DOM methods, and the contained instructions are
      performed on the elements and their
       subtrees directly following the processing instructions.
     
For example, the output of the command
       xsltproc dom-index-pi.xsl marc-one.xml
      might look like this:
       
       <?xml version="1.0" encoding="UTF-8"?>
       <?zebra-2.0 record id=11224466 rank=42?>
       <record>
       <?zebra-2.0 index control:0?>
       <control>11224466</control>
       <?zebra-2.0 index any:w title:w title:p title:s?>
       <title>How to program a computer</title>
      </record>
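
       A stylesheet producing output of this shape could be sketched as
       below. It is not the actual dom-index-pi.xsl; it assumes that
       the input is a single MARCXML record in the
       http://www.loc.gov/MARC21/slim namespace, with the record ID in
       control field 001 and the title in field 245, subfield a. The
       static rank of 42 is hard-coded for illustration only.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:m="http://www.loc.gov/MARC21/slim"
    exclude-result-prefixes="m">

  <xsl:output method="xml" encoding="UTF-8" indent="yes"/>

  <!-- assumption: the document element is one MARCXML record -->
  <xsl:template match="/m:record">
    <!-- record instruction: ID from control field 001, fixed rank -->
    <xsl:processing-instruction name="zebra-2.0">
      <xsl:text>record id=</xsl:text>
      <xsl:value-of select="normalize-space(m:controlfield[@tag='001'])"/>
      <xsl:text> rank=42</xsl:text>
    </xsl:processing-instruction>
    <record>
      <!-- each index instruction applies to the element that follows it -->
      <xsl:processing-instruction name="zebra-2.0">index control:0</xsl:processing-instruction>
      <control>
        <xsl:value-of select="normalize-space(m:controlfield[@tag='001'])"/>
      </control>
      <xsl:processing-instruction name="zebra-2.0">index any:w title:w title:p title:s</xsl:processing-instruction>
      <title>
        <xsl:value-of select="m:datafield[@tag='245']/m:subfield[@code='a']"/>
      </title>
    </record>
  </xsl:template>

</xsl:stylesheet>
```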
       
      
The output of the indexing XSLT stylesheets must contain
      certain elements in the magic
      xmlns:z="http://indexdata.com/zebra-2.0"
      namespace. The output of the XSLT indexing transformation is then
      parsed using DOM methods, and the contained instructions are
      performed on the magic elements and their
       subtrees.
     
For example, the output of the command
       xsltproc dom-index-element.xsl marc-one.xml
      might look like this:
       
       <?xml version="1.0" encoding="UTF-8"?>
       <z:record xmlns:z="http://indexdata.com/zebra-2.0"
       z:id="11224466" z:rank="42">
       <z:index name="control:0">11224466</z:index>
       <z:index name="any:w title:w title:p title:s">
       How to program a computer</z:index>
      </z:record>
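
       The element-based flavor can be produced in a similar way. The
       following is a sketch only, not the actual
       dom-index-element.xsl; it assumes a single MARCXML record as
       input (namespace http://www.loc.gov/MARC21/slim), with the ID in
       control field 001 and the title in field 245, subfield a, and it
       hard-codes the rank value.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:z="http://indexdata.com/zebra-2.0"
    xmlns:m="http://www.loc.gov/MARC21/slim"
    exclude-result-prefixes="m">

  <xsl:output method="xml" encoding="UTF-8" indent="yes"/>

  <!-- assumption: the document element is one MARCXML record -->
  <xsl:template match="/m:record">
    <!-- record instruction: ID taken from control field 001 -->
    <z:record z:id="{normalize-space(m:controlfield[@tag='001'])}" z:rank="42">
      <z:index name="control:0">
        <xsl:value-of select="normalize-space(m:controlfield[@tag='001'])"/>
      </z:index>
      <!-- one text value fed into several indexes at once -->
      <z:index name="any:w title:w title:p title:s">
        <xsl:value-of select="m:datafield[@tag='245']/m:subfield[@code='a']"/>
      </z:index>
    </z:record>
  </xsl:template>

</xsl:stylesheet>
```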
       
      
Both indexing formats are defined with equal semantics and behavior in mind:
Zebra specific instructions are either
         processing instructions named
         zebra-2.0 or
         elements contained in the namespace
         xmlns:z="http://indexdata.com/zebra-2.0".
	
There must be exactly one record
	 instruction, which sets the scope for the following,
	 possibly nested index and
	 group instructions.
	
	 The unique record instruction
	 may have additional attributes id,
	 rank and type.
	 Attribute id is the value of the opaque ID
	 and may be any string not containing the whitespace character
	 ' '.
	 The rank attribute value must be a
	 non-negative integer. See
	 Section 9, “Relevance Ranking and Sorting of Result Sets”.
	 The type attribute specifies how the record
	 is to be treated. The following values may be given for
	 type:
	 
insert: The record is inserted. If the record already exists, it is skipped (i.e. not replaced).
replace: The record is replaced. If the record does not already exist, it is skipped (i.e. not inserted).
delete: The record is deleted. If the record does not already exist, a warning is issued and the rest of the records in the input stream are skipped.
update: The record is inserted or replaced depending on whether the record exists or not. This is the default behavior, but it may be effectively changed from "outside" the scope of the DOM filter by zebraidx commands or extended services updates.
adelete: The record is deleted. If the record does not already exist, it is skipped (i.e. nothing is deleted). Requires version 2.0.54 or later.
	 Note that the value of type is used to
	 determine the action only if the Zebra indexer is running
	 in "update" mode (i.e. zebraidx update) or if the specialUpdate
	 action of the
	 Extended
          Service Update is used.
	 For this reason a specialUpdate may end up deleting records!
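
	 As a sketch, an indexing record that requests deletion without
	 failing on missing records might look as follows. It assumes
	 that the type attribute is written with the z: prefix in the
	 same way as z:id and z:rank in the element-based examples of
	 this section; this spelling is an assumption and should be
	 checked against your Zebra version.

```xml
<!-- delete record 11224466 if it exists, otherwise skip it silently -->
<z:record xmlns:z="http://indexdata.com/zebra-2.0"
          z:id="11224466" z:type="adelete">
  <z:index name="control:0">11224466</z:index>
</z:record>
```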
	
 Multiple and possibly nested index
         instructions must contain at least one
         indexname:indextype
         pair, and may contain multiple such pairs separated by the
         whitespace character  ' '. In each index
         pair, the name and the type of the index are separated by a
         colon character ':'.
	
Any index name consisting of ASCII letters and following the standard Zebra rules will do; see Section 3.5.1, “Mapping of PQF APT access points”.
         Index types are restricted to the values defined in
         the standard configuration
         file default.idx, see
         Section 2.3, “BIB-1 Attribute Set” and
         Chapter 10, Field Structure and Character Sets
   for details.
	
         DOM input documents which do not result in both one
         unique valid
         record instruction and one or more valid
         index instructions cannot be searched and
         found. Therefore, processing of
         invalid documents is aborted, and any content of
         the <extract> and
         <store> pipelines is discarded.
	 A warning is issued in the logs.
	
	 The group instruction can be used to group
	 indexing material for proximity search, so that a query can
	 require all of its terms to occur within the same
	 group. It takes an optional unit attribute,
	 which can be one of the known Z39.50 proximity units:
	 sentence (3),
	 paragraph (4),
         section (5),
         chapter (6),
         document (7),
         element (8),
	 subelement (9),
         elementType (10).
         If omitted, the unit element is used.
	
         For example, in order to search within the same group of unit type
         chapter, the
         corresponding Z39.50 proximity search would be:
         @prox 0 0 0 0 k 6 leftop rightop
        
The group facility requires Zebra 2.1.0 or later.
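
         A grouped indexing record might be sketched as follows. The
         element name z:group and the unprefixed unit attribute are
         assumptions derived from the description of the group
         instruction in this section, and the text content is made up
         for illustration.

```xml
<z:record xmlns:z="http://indexdata.com/zebra-2.0" z:id="11224466">
  <!-- each group delimits one proximity unit of type chapter -->
  <z:group unit="chapter">
    <z:index name="any:w">Text of the first chapter</z:index>
  </z:group>
  <z:group unit="chapter">
    <z:index name="any:w">Text of the second chapter</z:index>
  </z:group>
</z:record>
```

         With such a record, a proximity query restricted to unit
         chapter (6) is intended to match only when both operands occur
         inside the same group.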
The examples work as follows:
      From the original XML file
      marc-one.xml (or from the XML record DOM of the
      same form coming from an <input>
      pipeline),
      the indexing
      pipeline <extract>
      produces an indexing XML record, which is defined by
      the record instruction.
      Zebra uses the content of
      z:id="11224466"
      or
      id=11224466
      as internal
      record ID, and - in case static ranking is set - the content of
      rank=42
      or
      z:rank="42"
      as static rank.
     
In these examples, the following literal indexes are constructed:
       any:w
       control:0
       title:w
       title:p
       title:s
      
      where the indexing type is defined after the
      literal ':' character.
      Any value from the standard configuration
      file default.idx will do.
      Finally, any
      text() node content recursively contained
      inside the <z:index> element, or any
      element following an index processing instruction,
      will be filtered through the
      appropriate char map for character normalization, and will be
      inserted in the named indexes.
     
Finally, this example configuration can be queried using PQF queries, either transported by Z39.50 (here using yaz-client)
       
       Z> open localhost:9999
       Z> elem dc
       Z> form xml
       Z>
       Z> find @attr 1=control @attr 4=3 11224466
       Z> scan @attr 1=control @attr 4=3 ""
       Z>
       Z> find @attr 1=title program
       Z> scan @attr 1=title ""
       Z>
       Z> find @attr 1=title @attr 4=2 "How to program a computer"
       Z> scan @attr 1=title @attr 4=2 ""
       
      
      or using the proprietary
      extensions x-pquery and
      x-pScanClause to
      SRU and SRW
      
       
       http://localhost:9999/?version=1.1&operation=searchRetrieve&x-pquery=@attr 1=title program
       http://localhost:9999/?version=1.1&operation=scan&x-pScanClause=@attr 1=title ""
       
      See the section called “The SRU Server” for more information on SRU/SRW configuration, and the section called “YAZ server virtual hosts” or the YAZ CQL section for the details of the YAZ frontend server.
      Notice that there are no *.abs,
      *.est, *.map, or other GRS-1
      filter configuration files involved in this process, and that the
      literal index names are used during search and retrieval.
     
      If we want to support the usual
      bib-1 Z39.50 numeric access points, it is a
      good idea to choose string index names defined in the default
      configuration file tab/bib1.att, see
      Section 3.4, “The Attribute Set (.att) Files”.