Your end users have two basic means of finding content on your site: navigating the table of contents and running searches. Two options may seem too few for finding content efficiently, but you can enhance both of them so that your end users find content more efficiently.
Metadata is descriptive data: data that describes data. It is similar to a dictionary, which consists of words that describe words. Metadata can be a very useful tool for describing and finding content.
Metadata can be nearly any data that describes another set of data or provides added information about that data. Metadata can be a set of document properties that describe a document, such as "title", "author", "subject", "abstract", "date", "ID", "name", and "size", to name a few. Metadata can also be a completely separate paragraph describing a document, like a document abstract or summary. Furthermore, metadata can be a set of specialty tags or elements surrounding specific data within an XML or HTML document.
You can create your own metadata structure to fit the specific needs of your content. The metadata "door" can swing wide open to fit your needs. However, there is a cross-industry standard set of accepted metadata elements and attributes for the electronic environment. This standard set of metadata elements is known as the Dublin Core (see http://dublincore.org/documents/1999/07/02/dces/ for a complete list of the elements).
Although the Dublin Core has established a standard set of metadata elements that you can use, you are not restricted to those fifteen elements. You can use the Dublin Core metadata elements or create your own. The first step to expanding your end users' ability to search for and find content is to enrich your content by adding metadata to it. Metadata can exist in various places. There are two types of metadata: internal and external.
In mark-up language documents, XML and HTML, you can include metadata within the document it is describing. This is "internal metadata." Within XML and HTML documents internal metadata is contained within descriptive elements (start and end tags). Placing metadata within meaningful and descriptive elements will enhance your ability to appropriately index that metadata content. Figure 1 shows an example of an HTML document with internal metadata elements.
Properties on MS Word, ODT, PDF, and other non-mark-up documents are also considered to be internal metadata. However, with these types of documents, the set of properties is usually fixed, so you cannot add properties to them. You can only modify the values of the existing properties. Therefore, we will focus our discussion of adding metadata on HTML and XML documents.
Figure 1. Internal Metadata in an HTML Document
Not all element tags in an HTML document are metadata elements. Most, in fact, are for the purpose of displaying the document. The "title" tag is one that is metadata for that document. Metadata tags are tags or element names that describe or further define the content between the tags. Your browser processes HTML pages. It disregards any tags that it does not understand and simply displays the content between those tags. Because the browser ignores metadata tagging in this way, you can mark up your content as much as you want without affecting the document's display.
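As a rough illustration of the idea, the following Python sketch reads the text inside a few descriptive tags of an HTML page. The sample page, the `author` and `subject` tag names, and the `extract_tagged_metadata` helper are all hypothetical and are not taken from NXT or from Figure 1. The sketch uses only the standard `html.parser` module.

```python
from html.parser import HTMLParser

# Hypothetical sample page: <author> and <subject> are custom descriptive
# tags that a browser would ignore while still displaying their text.
SAMPLE_HTML = """<html>
<head><title>Installing the Widget</title></head>
<body>
<h1>Installing the Widget</h1>
<p>Written by <author>J. Smith</author> about <subject>installation</subject>.</p>
</body>
</html>"""


class TagTextCollector(HTMLParser):
    """Collects the text found inside a chosen set of metadata tags."""

    def __init__(self, metadata_tags):
        super().__init__()
        self.metadata_tags = set(metadata_tags)
        self.metadata = {}
        self._current = None  # metadata tag currently open, if any

    def handle_starttag(self, tag, attrs):
        if tag in self.metadata_tags:
            self._current = tag
            self.metadata.setdefault(tag, "")

    def handle_endtag(self, tag):
        if tag == self._current:
            self._current = None

    def handle_data(self, data):
        if self._current is not None:
            self.metadata[self._current] += data


def extract_tagged_metadata(html_text, metadata_tags):
    """Return a dict mapping each metadata tag name to the text it encloses."""
    collector = TagTextCollector(metadata_tags)
    collector.feed(html_text)
    collector.close()
    return collector.metadata


if __name__ == "__main__":
    print(extract_tagged_metadata(SAMPLE_HTML, ["title", "author", "subject"]))
```

The display tags (`h1`, `p`) are skipped because they are not in the list of metadata tags, which mirrors the distinction between display elements and metadata elements.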
XML documents are different from HTML documents in that all XML document tags are metadata tags, because XML documents are pure content with no display information. You use XSL style sheets to display XML content. Figure 2 shows an example of internal metadata in an XML document.
Figure 2. Internal Metadata in an XML Document
Notice that the tags in this XML document are very descriptive of what the tags contain, or what is between the tags. Having these types of descriptive tags within your HTML or XML content enables you to effectively index these documents beyond a full-text index by using indexsheets.
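To make the point about descriptive tags concrete, here is a minimal Python sketch that groups the text of an XML document by element name, which is roughly the kind of information a field-level index needs. The sample document and the `fields_by_tag` helper are hypothetical, and the sketch does not represent how NXT indexsheets actually work.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML document whose tag names describe their content.
SAMPLE_XML = """<recipe>
  <name>Pancakes</name>
  <author>J. Smith</author>
  <ingredient>flour</ingredient>
  <ingredient>milk</ingredient>
</recipe>"""


def fields_by_tag(xml_text):
    """Group the non-empty text of every element under its tag name."""
    root = ET.fromstring(xml_text)
    fields = {}
    for element in root.iter():
        text = (element.text or "").strip()
        if text:
            fields.setdefault(element.tag, []).append(text)
    return fields


if __name__ == "__main__":
    for tag, values in fields_by_tag(SAMPLE_XML).items():
        print(tag, values)
```

With the values grouped this way, a search could be limited to one field, for example only the `author` values, instead of the full text.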
Metadata can also exist, for all types of documents and files (including XML and HTML), in a file or document separate from the document it is describing. This type of metadata is "external metadata." External metadata is contained in a Resource Description Framework (RDF) file. An RDF file is an XML document that is specifically structured to house metadata. RDF files also enable you to associate metadata with non-text files, such as images and audio/video files. Figure 3 shows an example of an RDF file containing metadata for the graphic.jpg image.
Figure 3. External Metadata in an RDF File
Notice that the RDF file in Figure 3 contains both Dublin Core elements and custom, or non-Dublin Core, elements. You may implement alternative forms of external metadata, as long as these files follow XML specifications. For more information about RDF, see either of the following web sites:
http://www.ukoln.ac.uk/metadata/resources/dc/datamodel/WD-dc-rdf/
http://www.w3.org/TR/REC-rdf-syntax
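The following Python sketch shows what an RDF file for graphic.jpg might look like and how its properties could be separated into Dublin Core and custom elements. The RDF and Dublin Core namespace URIs are the published ones. The `my` namespace, the `department` element, the property values, and the `split_rdf_properties` helper are assumptions made for illustration, and the layout may differ from the file shown in Figure 3.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical RDF file describing graphic.jpg. The "my" namespace and
# its <my:department> element stand in for custom, non-Dublin Core metadata.
SAMPLE_RDF = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:my="http://example.com/custom-metadata#">
  <rdf:Description rdf:about="graphic.jpg">
    <dc:title>Company Logo</dc:title>
    <dc:creator>J. Smith</dc:creator>
    <my:department>Marketing</my:department>
  </rdf:Description>
</rdf:RDF>"""


def split_rdf_properties(rdf_text):
    """Return (described resource, Dublin Core properties, custom properties)."""
    root = ET.fromstring(rdf_text)
    description = root.find(f"{{{RDF_NS}}}Description")
    dublin_core, custom = {}, {}
    for prop in description:
        # ElementTree writes qualified names as "{namespace}local-name".
        namespace, _, local_name = prop.tag[1:].partition("}")
        target = dublin_core if namespace == DC_NS else custom
        target[local_name] = (prop.text or "").strip()
    about = description.get(f"{{{RDF_NS}}}about")
    return about, dublin_core, custom


if __name__ == "__main__":
    print(split_rdf_properties(SAMPLE_RDF))
```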
Once you create your external metadata files, you must make a "connection" between your metadata file and the file it describes. NXT has three different ways to associate external metadata files to their parent documents:
The last two methods for associating external metadata files to their parent files are specific to NXT; "Filename" is not.
Filename association means that the external metadata file has the same name, extension included, as the parent document, usually followed by the .rdf extension. So, the name of an RDF file for the graphic.jpg image in Figure 3 would be graphic.jpg.rdf. This is the method of association you would use with the File System Content Bridge.
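The naming rule can be sketched in a few lines of Python. The `metadata_path_for` and `parent_name_for` helpers are hypothetical names used only to illustrate the convention. They are not part of NXT.

```python
from pathlib import Path


def metadata_path_for(document_path, extension=".rdf"):
    """Return the metadata file path: the full document name plus the extension."""
    document_path = Path(document_path)
    return document_path.with_name(document_path.name + extension)


def parent_name_for(metadata_name, extension=".rdf"):
    """Return the parent document name for a metadata file name, or None."""
    if metadata_name.endswith(extension) and len(metadata_name) > len(extension):
        return metadata_name[: -len(extension)]
    return None


if __name__ == "__main__":
    print(metadata_path_for("graphic.jpg"))    # graphic.jpg.rdf
    print(parent_name_for("graphic.jpg.rdf"))  # graphic.jpg
```

Note that the document's own extension is kept, so graphic.jpg and graphic.png would each get a distinct metadata file.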
When you use the File System Content Bridge to build a content collection in Library Manager and you have external metadata files that describe the content, you must "tell" the Library Manager to handle and index those external metadata files. Library Manager handles content for the File System Content Bridge with content rules according to document type. You can add a property value to the rules for a given document type to handle external metadata. Use the following process to have the Library Manager index and handle external metadata files for the File System Content Bridge:
In this process, .rdf is the default extension for external metadata files, and the default Property Source value is Constant. You can use .rdf or another extension.
Most of this process is for adding external metadata to an existing content collection in your library. If you know that you already have, or will have, external metadata, you can instead start and build a new content collection builder node using the Metadata in .RDF Files File System Content Bridge template. Figure 4 shows this option.
Figure 4. File System Content Bridge Metadata Template Option
This template pre-defines each File System Content Bridge default document type with the metadata property set for .rdf extension files. This would eliminate steps 2 - 9 (other than the steps for building a File System Content Bridge collection builder). If you have external metadata files but use a different extension, you can modify the Property Source Value accordingly. You should always do step 9 to make sure you do not publish your metadata files.
When you add the metadata property to your content rules, you "tell" NXT to take any file with the extension that you indicate for a given document type, associate it with its namesake parent document, and index it with the Metadata.xil indexsheet. Metadata.xil is a premade, out-of-the-box indexsheet for indexing metadata files. Therefore, all you need to do is set the metadata property for NXT to perform the index.
If you set this property for a document type, the build system will try to find a metadata file for all documents of the given type. If the build system finds a file of that document type that does not have an associated metadata file, it will log a warning for each of those documents.
When you set this property, the build system knows to index each metadata file with the Metadata.xil indexsheet (or the indexsheet that has the ID value of Metadata).
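The warning behavior described above can be approximated with a short Python sketch. It is only an illustration under stated assumptions, not NXT's build system: the `documents_missing_metadata` helper, the logger name, and the file names are hypothetical, and the check works on a plain list of file names rather than a real content collection.

```python
import logging

logger = logging.getLogger("metadata_check")


def documents_missing_metadata(filenames, document_extension, metadata_extension=".rdf"):
    """Return documents of one type that have no name-associated metadata file.

    A warning is logged for each such document.
    """
    available = set(filenames)
    missing = []
    for name in sorted(available):
        if not name.endswith(document_extension):
            continue  # not a document of the type being checked
        if name + metadata_extension not in available:
            logger.warning("No metadata file found for %s", name)
            missing.append(name)
    return missing


if __name__ == "__main__":
    logging.basicConfig(level=logging.WARNING)
    files = ["graphic.jpg", "graphic.jpg.rdf", "photo.jpg", "notes.html"]
    print(documents_missing_metadata(files, ".jpg"))
```

Only documents of the checked type are reported, so notes.html is ignored when the check runs for .jpg files.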
Another way you can associate an external metadata file to a document is in your content collection makefile, if you built, or plan to build, the content collection with ccBuild. This association is accomplished in a two-step process.
1. Nest a metadata element within the parent document element.
2. Add an indexsheet element specifically for indexing metadata files in your content collection.
The makefile.dtd allows only 0 or 1 metadata elements to be nested within any given document element. Figure 5 shows a makefile indexsheet element designating an indexsheet for metadata (id="metadata") and a document element with a nested metadata file element.
Figure 5. Metadata Association in the Makefile
This type of association is as close as you get to a "physical" association in the electronic world. Unlike with the File System Content Bridge, neither the name nor the extension of the metadata file designates it as a metadata file, and neither dictates its parent document. The nesting determines the metadata file's parent, and the metadata element name designates the file as a metadata file. The file that you reference in the location attribute of the metadata element can have any name and any extension.
Remember to include a metadata indexsheet element so that NXT indexes your metadata files; otherwise, they go unindexed. You can have only one "metadata" indexsheet per content collection, which is why the metadata element does not have an attribute to identify an indexsheet to use for indexing. So, make sure that your metadata indexsheet is sufficient to cover all of your content collection's metadata files.
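The two rules above (at most one nested metadata element per document, and a "metadata" indexsheet whenever metadata files are referenced) can be checked with a small Python sketch. The element names `indexsheet`, `document`, and `metadata`, the `id="metadata"` value, and the `location` attribute on the metadata element come from the text. Everything else is an assumption: the `content-collection` root element, the `location` attributes on `document` and `indexsheet`, the file names, and the `check_metadata_associations` helper are hypothetical and do not reflect the real makefile.dtd.

```python
import xml.etree.ElementTree as ET

# Hypothetical makefile fragment. The root element name and the location
# attributes on <document> and <indexsheet> are assumptions.
SAMPLE_MAKEFILE = """<content-collection>
  <indexsheet id="metadata" location="Metadata.xil"/>
  <document location="graphic.jpg">
    <metadata location="descriptions/logo-info.xml"/>
  </document>
  <document location="chapter1.html"/>
</content-collection>"""


def check_metadata_associations(makefile_text):
    """Return (document-to-metadata associations, list of problems found)."""
    root = ET.fromstring(makefile_text)
    associations = {}
    problems = []
    for document in root.iter("document"):
        name = document.get("location")
        metadata_elements = document.findall("metadata")
        if len(metadata_elements) > 1:
            problems.append(f"{name}: more than one metadata element")
        elif metadata_elements:
            # The nesting, not the file name, ties the metadata to its parent.
            associations[name] = metadata_elements[0].get("location")
    has_metadata_indexsheet = any(
        sheet.get("id") == "metadata" for sheet in root.iter("indexsheet")
    )
    if associations and not has_metadata_indexsheet:
        problems.append('metadata files present but no indexsheet with id="metadata"')
    return associations, problems


if __name__ == "__main__":
    print(check_metadata_associations(SAMPLE_MAKEFILE))
```

In the sample, the metadata file name has nothing in common with graphic.jpg, which illustrates that only the nesting determines the parent document.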
With Manage Content you can add metadata on-the-fly to documents in a content collection. Figure 6 shows the Properties interface for adding metadata to a content collection document. When you use Manage Content to assign metadata to a file, NXT places this metadata in a name-associated RDF file.
Figure 6. Adding Metadata with Manage Content
The metadata fields you see in Figure 6 correspond, from top to bottom, to the Dublin Core elements title, subject, creator, and description.
You may be wondering what the difference is between metadata and the properties of a document (like a Microsoft Word document). The short answer is that there is no difference. The longer answer is that when NXT indexes certain files, such as Microsoft Office and PDF files, it converts the documents to HTML. This conversion creates documents with internal metadata, part of which contains the document properties. NXT then applies an indexsheet to the documents to leverage and index the internal metadata. Once NXT has finished indexing those documents, it deletes the HTML version of each document. Thus, in the end, NXT handles document properties the same way it handles other forms of metadata.
Copyright © 2006-2023, Rocket Software, Inc. All rights reserved.