Metadata is a great way to enrich your content. However, simply having metadata does not help you or your end users very much; you must put that metadata to use. Index sheets enable you to take advantage of both internal and external metadata. They unleash the power of metadata by indexing your content according to the metadata tags or properties within it. An index sheet takes a source document and indexes its contents according to the indexing rules you set up within the index sheet. Unlike metadata, index sheets are specific to NXT.
Index sheets, in short, are XML documents that follow XSL (eXtensible Stylesheet Language) rules. More precisely, an index sheet is an XML document that follows a rule-based system built on XPath and a subset of XSLT: index sheets are stylesheets for indexing. Because index sheets are based on the XPath/XSLT rule system, their syntax is standard and may already be familiar to you. Index sheet file names use the .xil extension, which stands for "eXtensible Indexing Language."
Index sheets are simple in structure but powerful and effective in purpose. Index sheets deal with what is inside a document, whereas the makefile deals with a document as a whole. The makefile is not concerned with what is in the document, but an index sheet is.
When you install NXT Online Server or NXT Builder, you receive premade, fully functional, out-of-the-box index sheets. You can modify and customize any or all of these index sheets to fit your indexing needs. Table 1 lists these index sheets, the content and document types they index, and a brief description of each.
Default Index Sheet | Content Type | Document Type | Description |
---|---|---|---|
HTML.XIL | text/html | HTML | Indexes most HTML elements and designates the "Title" to be the document title in the TOC. |
HTML-title.XIL | text/html | HTML | Similar to HTML.xil, except that it creates the table of contents hierarchy from the heading elements (H1, H2, and so on). |
XML.XIL | text/xml | XML | Indexes all elements by element name for any given XML document. |
PDF.xil | application/pdf | Adobe Acrobat PDF | Transforms the PDF document to XML, then indexes the (metadata) properties of PDF documents. This index sheet is used by default for PDF indexing. |
PDF-transform.xil | application/pdf | Adobe Acrobat PDF | Transforms the PDF document to XML, then indexes the (metadata) properties of PDF documents. This PDF index sheet was introduced in NXT 4.10 for custom indexing of PDF files. |
MSExcel.XIL | application/msexcel | Microsoft Excel | Indexes the (metadata) properties of MSExcel documents. |
MSWord.XIL | application/msword | Microsoft Word | Transforms the MSWord document to HTML, then indexes the (metadata) properties of MSWord documents. |
ODT.XIL | application/vnd.oasis.opendocument.text | OpenDocument Text (ODT) | Indexes the (metadata) properties of ODT documents. |
MSPowerPoint.XIL | application/mspowerpoint | Microsoft PowerPoint | Indexes the (metadata) properties of MSPowerPoint documents. |
Metadata.XIL | metadata | Metadata | Indexes all external metadata files, leveraging Dublin Core elements as well as non-Dublin Core standard elements within those metadata documents. |
Library Manager, the content bridges, ccBuild, and Manage Content all use these default index sheets to further index your content (beyond a simple full-text index) as you build your content collections.
Note: To index ODT documents, you must install the corresponding version of the OpenOffice IFilters included in Apache OpenOffice 4.0 and higher, or in similar software (for example, LibreOffice 4.3 and higher). You can download and install the required software manually from the official site.
The NXT indexing engine uses two index sheets to index PDF files: PDF.xil and PDF-transform.xil.
To revert to the NXT version 4.9 indexing behavior, use the PDF-transform.xil file. You can also use PDF-transform.xil if you want to customize your indexing process, for example, to generate facets for PDF files or to use an external utility during indexing.
To apply the rules that improve PDF indexing performance, use PDF.xil, which is used by default. These new indexing rules were introduced in NXT 4.10; with NXT 4.10 and higher, you can index large PDF files faster.
Starting with NXT 4.10, a dedicated tool, PDFSupport, is used for PDF indexing. It is installed to ensure that PDF files are processed correctly across several NXT products. This is useful when you uninstall one NXT product but keep another: the registration of the PDF libraries is still handled correctly.
You may use or modify the default index sheets as you like, or you may create an index sheet completely from scratch. Whichever path you choose, the information in this section will help you build your index sheets and meet your indexing needs.
Note: Index sheets are fixed. If you edit them, you must start your collection over (in Library Manager, use the Start Collection Over menu item) and redistribute your collection.
Index sheets are very powerful in what they do and in what they enable your end users to do; however, their makeup is relatively simple. There are basically two building blocks that make up index sheets:
Because index sheets are XML documents that generally follow XSLT/XPath rules and are specific to NXT, an index sheet contains both standard XML/XSL markup and proprietary (NXT-specific) markup. To distinguish between the two, index sheets use namespaces.
Namespaces are used in XML to avoid naming collisions between elements from different XML vocabularies. A namespace prefix becomes part of an element name and precedes the local name within the "<" and ">" characters, like this: <namespace:element_name>.
The namespace prefix for standard XML/XSL items is "xsl", and the prefix for NXT items is "np".
You must declare the namespaces you use in an XML document. The declarations appear at the top of the document (usually in the root element's start tag, immediately following the XML declaration), and each namespace declaration begins with xmlns: (which stands for XML namespace). Figure 1 shows the namespace declaration statement of an index sheet in which two namespaces are declared: xsl and np.
Figure 1. Namespace Declaration in an Index Sheet
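The following minimal sketch illustrates such a declaration. It assumes that, as in XSLT, the root element is xsl:stylesheet; the xsl prefix is bound to the standard W3C XSLT namespace URI, but the np URI shown is a hypothetical placeholder. Copy the actual root element and np namespace URI from one of the default index sheets (for example, HTML.xil) rather than from this sketch.

```xml
<?xml version="1.0"?>
<!-- Sketch only: the root element name is assumed to follow XSLT,
     and the np namespace URI is a HYPOTHETICAL placeholder. -->
<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:np="urn:example:hypothetical-nxt-namespace">
  <!-- xsl: prefix marks standard XML/XSL items (template blocks, apply-templates). -->
  <!-- np: prefix marks NXT-specific items (np:index, field definitions). -->
  <!-- Field definitions and template blocks go here. -->
</xsl:stylesheet>
```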
Template blocks are XPath/XSLT items. Template blocks define and delineate the indexing "action items" for an index sheet. These template blocks follow the XPath rules for matching on elements. Template blocks begin with the start tag <xsl:template ...> and end with </xsl:template>.
Template blocks look for, or "match" on, element names (they can also match on attributes of elements) and index the contents of the element (what is between the tags). Keep this in mind: the tag or element names of your metadata provide the basis for matching in your template blocks.
Fields are NXT items. Fields are indexing aliases for the matched-on elements. For example, you could match on a <feline> element and index it as "cat"; "cat" would then be the field name for the "feline" element. Either way, you must index each element that you instruct the index sheet to find (match on) as some field. That field may have the same name as the element or a different one. You must define any field whose name differs from the element name.
Fields can be defined in two places: within the index sheet or within the makefile (see makefile.dtd). Regardless of where you define your fields, field definitions exist outside of the template blocks.
After fields are defined (if needed), you can use those fields within the template blocks to index your content. Figure 2 shows an example of a "two-action" (two template blocks) index sheet with a field definition and its usage.
Figure 2. Sample Index Sheet
Remember that everything an index sheet indexes is indexed as a field. In Figure 2, you see two actions (template blocks) but only one field definition. The first template block matches on the "title" element, then indexes the element's contents as a field with the same name as the element (this is the function of the "field-element-name" attribute). The second template block matches on the "dc:creator" element and indexes it as "author", using the field defined toward the top of the index sheet. Indexsheet.dtd governs the structure of all index sheets.
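A minimal sketch of an index sheet along the lines of Figure 2 follows. The template blocks, np:index, field-element-name, and xsl:apply-templates come from the description above. The field definition element (np:field with a name attribute), the field attribute on np:index, the value given to field-element-name, and the np namespace URI are hypothetical placeholders; check Indexsheet.dtd and the default index sheets for the actual names. The dc prefix is declared with the standard Dublin Core URI because standard XSLT requires a bound prefix for a pattern such as dc:creator.

```xml
<?xml version="1.0"?>
<!-- Sketch of a "two-action" index sheet. Names marked HYPOTHETICAL are
     placeholders, not confirmed NXT syntax; see Indexsheet.dtd. -->
<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:np="urn:example:hypothetical-nxt-namespace"
    xmlns:dc="http://purl.org/dc/elements/1.1/">

  <!-- Field definition (HYPOTHETICAL element and attribute names).
       It sits outside the template blocks and defines the "author" field. -->
  <np:field name="author"/>

  <!-- Action 1: index <title> under a field named after the element itself.
       The value "yes" for field-element-name is HYPOTHETICAL. -->
  <xsl:template match="title">
    <np:index field-element-name="yes">
      <xsl:apply-templates/>
    </np:index>
  </xsl:template>

  <!-- Action 2: index <dc:creator> as the defined "author" field.
       The field attribute on np:index is HYPOTHETICAL. -->
  <xsl:template match="dc:creator">
    <np:index field="author">
      <xsl:apply-templates/>
    </np:index>
  </xsl:template>

</xsl:stylesheet>
```

Only the second block needs a field definition, because its field name ("author") differs from the element name ("dc:creator").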
Note: You may match on more than one element to apply a rule. When doing so, use the pipe character "|" to separate the element names. For example:
```xml
<!-- Apply one rule to every HTML heading element; no spaces around the pipes. -->
<xsl:template match='H1|H2|H3|H4|H5|H6'>
  <np:index toc-heading="title-HTML" title-field="dc:title">
    <xsl:apply-templates/>
  </np:index>
</xsl:template>
```
Be aware that you must not add spaces between the element names and the pipe character in the template match. 'H2|H3' is valid; 'H2 | H3' is not.
In Figure 2, notice that in the middle of each template block there is an empty XML/XSL element: <xsl:apply-templates/>. The NXT XIL language requires the "apply-templates" element within each template block. The purpose of the "apply-templates" element is twofold.
First, the apply-templates element enables NXT to perform the indexing indicated by the np:index element. Apply-templates is like the "on" switch or the "go" button, telling NXT to do what np:index indicates.
Second, as in XSLT, apply-templates tells NXT to process the children (the content between the tags) of the matched-on element. When the index sheet matches on an element, the np:index element in the template block indicates what to do with the element content. The xsl:apply-templates element starts the process by "grabbing" the children of the matched-on element (the data between its start and end tags). The child content is indexed according to the rules indicated in the np:index element. The same data that was just indexed is then processed, or parsed, for elements that may match other template blocks in your index sheet. If any elements within the child data match template blocks in your index sheet, those template blocks are applied to the matching elements; hence the element name "apply-templates".
Without the apply-templates element nested within the np:index element of the template block, indexing does not happen. This differs from XSLT: in XSLT, if the apply-templates element is omitted, the action of the template block still occurs, but the children of the matched-on element are not processed.
Using index sheets to index your HTML and XML content enables you to standardize, or normalize, the way your end users search for content. For example, suppose some HTML and XML documents on your Content Network use "author" tags to designate the person who created the document, whereas other documents use the Dublin Core "dc:creator" tag, within internal or external document metadata, for the same purpose. With index sheets and XIL, you do not have to index these as separate entities: you can match on the elements separately but index them under the same field name. Figure 3 shows two examples of this.
Figure 3. Indexing Two Different Elements as the Same Field
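The sketch below shows two ways to express this idea. The first uses separate template blocks; the second uses the pipe syntax from the earlier note to match both elements in one block. The field attribute on np:index is a hypothetical placeholder for however your index sheet references a defined field, and the "author" field is assumed to be defined elsewhere in the index sheet or makefile.

```xml
<!-- Fragment: assumes the xsl, np, and dc prefixes are declared on the root
     element and that an "author" field has been defined. -->

<!-- Option 1: two template blocks, one field. -->
<xsl:template match="author">
  <np:index field="author">  <!-- "field" attribute is HYPOTHETICAL -->
    <xsl:apply-templates/>
  </np:index>
</xsl:template>
<xsl:template match="dc:creator">
  <np:index field="author">
    <xsl:apply-templates/>
  </np:index>
</xsl:template>

<!-- Option 2: one template block; no spaces around the pipe. -->
<xsl:template match="author|dc:creator">
  <np:index field="author">
    <xsl:apply-templates/>
  </np:index>
</xsl:template>
```

Use only one of these options in a given index sheet: in standard XSLT, having two template blocks match the same element is a conflict.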
One benefit of indexing content in this fashion is that your end users can search the "author" field and get results from both "author" and "dc:creator" elements. Another benefit is that you can use one index sheet on more than one document. The relationship of index sheets to documents is one-to-many: you can use one index sheet to index many documents, but each document can be indexed by only one index sheet.
The use of index sheets permeates every collection builder application in the NXT 4 product family. Once you have created your indexsheet, choose the implementation process that matches the method you use to create content collections (Library Manager, ccBuild, or Manage Content); the implementation process differs for each of these applications.
Content collections that you build with Library Manager use index sheets to extend the full-text indexing that NXT does by default. As with Manage Content, NXT applies indexsheets to your library collection content according to the content type of that content. Generally, Library Manager and the content bridges use the out-of-the-box, default index sheets to index the content going into the content collections. However, you can implement additional index sheets for Library Manager and the content bridges to use when indexing your library content. Indexsheets are implemented in Library Manager on a per-collection basis.
Note: Index sheets are fixed. If you edit them, you must start your collection over (in Library Manager, use the Start Collection Over menu item) and redistribute your collection.
Library Manager allows you to implement indexsheets only for collection builder nodes in your library. Because collection reference nodes reference content collections that were built outside of the Library Manager interface, NXT assumes that the appropriate indexsheets were applied when the referenced collection was built. In addition, only one content bridge, the File System Content Bridge, allows you to implement other indexsheets or to control which indexsheet indexes which document type. All other content bridges automatically use the appropriate indexsheet based on the content type.
To completely implement an indexsheet with either of the allowable content bridges, you must complete the following two-step process:
You must perform the steps in this order; otherwise, the indexsheet will not appear in the Index Sheet property drop-down list when you try to select it. Also, if you have a new document type that you want indexed with your new indexsheet, you need to add that document type, specify its publishing rules, and select the appropriate indexsheet.
To implement indexsheets within a content collection built with ccBuild, you use your content collection makefile. Implementing indexsheets in the ccBuild process is a two-step process:

1. Add an indexsheet element for each indexsheet.
2. Add an indexsheet attribute to each document element.

Remember that you can apply one indexsheet to multiple documents, but each document can reference only one indexsheet. Using the MakeStart utility, you can initially construct your makefile with a single indexsheet for your content collection by entering the path and name of the indexsheet in the appropriate text box.
MakeStart takes the indexsheet information and creates and configures an indexsheet element for you. All indexsheet elements must be nested children of the content-collection element and must come before the first document element in your makefile. The indexsheet element must contain two attributes: id and source. The id attribute can be any value you choose, but it must be unique among all other IDs in your collection. The source attribute is simply the path and file name of the indexsheet you are adding. Each indexsheet element is an empty element (there is nothing between the start and end tags) and cannot have any elements nested within it.
The only case in which the ID must be a particular value is indexing external metadata. Figure 4 also shows the method for associating external metadata with a document within a content collection. Assign an indexsheet element's id attribute the value "metadata" to designate an indexsheet for indexing metadata content in your collection. The indexsheet that you specify in your makefile as the "metadata" indexsheet is used on all metadata files in your content collection (the files referenced by metadata elements nested within document elements).
To include other indexsheets within your content collection, you must add subsequent indexsheet elements directly below the indexsheet element created by MakeStart. Include all necessary attributes and values in each indexsheet element; otherwise, ccBuild will encounter a fatal error when it tries to build your content collection. Figure 4 shows part of a makefile with multiple indexsheet elements.
Figure 4. Declaring Indexsheets in the Makefile
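The following sketch shows the general shape of such declarations. The content-collection and indexsheet elements, the id and source attributes, and the reserved "metadata" id come from this section. The other id values and all file paths are hypothetical, and any other attributes your makefile requires are omitted (see makefile.dtd).

```xml
<content-collection>
  <!-- Created by MakeStart (id and path here are hypothetical). -->
  <indexsheet id="html" source="IndexSheets/HTML.xil"/>

  <!-- Additional indexsheets go directly below the first one.
       Each is an empty element with both id and source; a missing
       attribute causes a fatal error when ccBuild builds the collection. -->
  <indexsheet id="custom-xml" source="IndexSheets/MyCustom.xil"/>

  <!-- The reserved id "metadata" designates the indexsheet applied to every
       external metadata file referenced by a nested metadata element. -->
  <indexsheet id="metadata" source="IndexSheets/Metadata.xil"/>

  <!-- All indexsheet elements must come before the first document element. -->
</content-collection>
```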
Once you add the indexsheet elements to your makefile, you need to decide which indexsheets will index which content collection documents. For each indexable document, you designate the indexsheet that indexes it with the indexsheet attribute of its document element. The value of the indexsheet attribute must match the id attribute value of one of your indexsheet elements.
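The fragment below sketches document elements that reference indexsheets by id. It assumes that indexsheet elements with the hypothetical ids "html" and "custom-xml" have been declared. The attribute that names each document's file (written here as source) and the attributes of the metadata element are assumptions; the actual names are defined in makefile.dtd.

```xml
<!-- Fragment: these document elements follow the indexsheet elements
     inside content-collection. -->

<!-- One indexsheet can index many documents... -->
<document source="docs/overview.html" indexsheet="html">
  <!-- External metadata for this document. It is indexed by the
       indexsheet whose id is "metadata", not by "html".
       The source attribute name here is an assumption. -->
  <metadata source="docs/overview.meta.xml"/>
</document>
<document source="docs/install.html" indexsheet="html"/>

<!-- ...but each document references exactly one indexsheet. -->
<document source="docs/catalog.xml" indexsheet="custom-xml"/>
```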
Note: NXT 4 indexes most content types. However, NXT 4 does not index multimedia files such as images, audio files, and video files. For a complete list of indexable media, see
Once you have configured both the indexsheet elements and the indexsheet attributes for your collection content, you can run ccBuild to build or update your content collection. Then you can go to your NXT site and run searches to see the effects of your indexsheets.
Manage Content uses seven of the eight default indexsheets; the one it does not use is HTML-title.xil. Manage Content chooses which of these indexsheets to apply to a given document based on the document's content type, or MIME type. When you add a document or file to a content collection with Manage Content, NXT automatically indexes that document on the fly. To do this, NXT must know which indexsheet is appropriate for that document.
NXT determines a document's content type from its extension (this is how your operating system determines which application can open a file). NXT checks the extension against your operating system's registry to find the MIME type that corresponds to that extension. Once NXT identifies a document's type, it uses the pairings in Table 2 (which are listed in the Index Sheets field of the Add a Collection dialog) to determine which indexsheet to apply to the document. NXT then applies the indexsheet, indexes the document, and places a copy of the document and its index information in your content collection.
Index Sheets Field Entry (MIME type=indexsheet path) |
---|
text/html=C:\Program Files\Rocket\NXT 4\Online Server/IndexSheets/HTML.xil; |
text/xml=C:\Program Files\Rocket\NXT 4\Online Server/IndexSheets/XML.xil; |
metadata=C:\Program Files\Rocket\NXT 4\Online Server/IndexSheets/Metadata.xil; |
application/msword=C:\Program Files\Rocket\NXT 4\Online Server/IndexSheets/MSWord.xil; |
application/pdf=C:\Program Files\Rocket\NXT 4\Online Server/IndexSheets/PDF.xil; |
application/pdf=C:\Program Files\Rocket\NXT 4\Online Server/IndexSheets/PDF-transform.xil; |
application/vnd.ms-powerpoint=C:\Program Files\Rocket\NXT 4\Online Server/IndexSheets/MSPowerPoint.xil; |
application/x-mspowerpoint=C:\Program Files\Rocket\NXT 4\Online Server/IndexSheets/MSPowerPoint.xil; |
application/vnd.ms-excel=C:\Program Files\Rocket\NXT 4\Online Server/IndexSheets/MSExcel.xil; |
application/x-msexcel=C:\Program Files\Rocket\NXT 4\Online Server/IndexSheets/MSExcel.xil |
During the build process, NXT obtains the MIME type from the server registry based on the document's extension, and then applies the appropriate indexsheet for that MIME type. NXT does this for each document in each of your Content Services and Manage Content collections.
You can designate a different indexsheet for a certain type of content by changing the value (path and file name) paired with that MIME type. You must do this when you create your collection with Manage Content; once the content collection is built, you cannot modify the indexsheets that NXT applies to the content you add to it. A change made this way applies only to that specific content collection.
Note: Each MIME type can have only one associated indexsheet; however, one indexsheet can be used for multiple MIME types.
Copyright © 2006-2023, Rocket Software, Inc. All rights reserved.