M Chopey PlanningImplementing
As one step toward realizing this goal, the major task of the
Metadata Workshop was to identify a simple set of elements for
describing networked electronic resources. To make this task
manageable, it was limited in two ways. First, only those elements
necessary for the discovery of the resource were considered. It was
believed that resource discovery is the most pressing need that
metadata can satisfy, and one that would have to be satisfied re-
gardless of the subject matter or internal complexity of the object.
share bibliographic data for commonly held items have been the two
main driving forces behind the development of national and interna-
tional cataloging standards and formats.
Because metadata for digital collections is not likely to be stored for
use by any institution except the one creating and maintaining it, the
driving force behind the development of metadata standards for digital
collections in the future is most likely to be a desire for uniform access
methodology across collections. As a practical matter, much of the co-
operation among institutions that has occurred to date and led to current
metadata standards has probably been motivated more by a sense of
trepidation about venturing into this entirely new territory alone, and by
the tradition of open information sharing among the communities de-
veloping metadata standards, than anything else. Planners and imple-
menters of new projects with a metadata component are still considered
pioneers, as standards and documentation for many metadata schemes
and types of metadata implementations are often still very sketchy as to
the specifics of element definitions, data values, and best practices for
technical details on such things as storing metadata and designing ef-
fective access mechanisms to retrieve it. Fortunately, guidance from
metadata pioneers is becoming more available to new planners and
implementers. (One example, highly recommended as a source of guidance
on planning and implementing metadata for various types of collections,
is Diane I. Hillmann and Elaine L. Westbrooks, eds., Metadata in
Practice (Chicago: American Library Association, 2004).)
the content of the collection for markup and display purposes (as is the
case, for example, with Encoded Archival Description (EAD), a metadata
scheme used for the markup of archival finding aids). What follows is
first a summary of the general considerations for planning and
implementing the metadata component for any type of collection, and
then some more specific considerations for different types of
collections.
Wide Web for this purpose, many metadata implementations rely on lo-
cally defined data elements and data values in their metadata records
that cannot be designated in a general-use metadata creation tool. (See,
for example, the freely available “Dublin Core Metadata Template”
provided by the Nordic Metadata Project (available at
http://www.lub.lu.se/cgi-bin/nmdc.pl), or the Connexion® cataloging tool
available to OCLC subscribers at http://connexion.oclc.org/.)
A metadata creation tool is generally set up as a “web form,” which
might be installed on an indexer’s workstation or might be available to
indexers via a local or wide area network or the Internet. The indexer
uses the form to enter data values appropriate to the object being
described in defined fields, and then saves the record, whereupon the
record is automatically encoded in the format in which it will be stored
in, or at least transported to, the online collection’s database. The encoding of
the metadata might affect how the metadata are retrieved and displayed
by the project’s search and display mechanisms and how the record is
parsed for storage in the collection database. Encoding metadata records
in a standard syntax allows a project to exchange records with other
metadata applications outside of the project for which the metadata are
created. If the indexer is creating a record in Dublin Core, he or she
might enter data values into the web form as follows:
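As an illustration only, the following Python sketch shows how values keyed into such a form might be encoded automatically as simple Dublin Core in XML when the record is saved. The sample values, the function name `encode_record`, and the wrapping `record` element are hypothetical; only the element names and the namespace URI are taken from the Dublin Core element set.

```python
import xml.etree.ElementTree as ET

# Namespace of the simple Dublin Core element set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def encode_record(form_values):
    """Encode (element, value) pairs from a web form as Dublin Core XML."""
    record = ET.Element("record")  # hypothetical wrapper element
    for element, value in form_values:
        if value.strip():  # skip fields the indexer left blank
            child = ET.SubElement(record, f"{{{DC_NS}}}{element}")
            child.text = value.strip()
    return ET.tostring(record, encoding="unicode")


# Hypothetical values an indexer might key into the form.
form_values = [
    ("title", "View of the harbor at dusk"),
    ("creator", "Doe, Jane"),
    ("date", "1923"),
    ("description", ""),
]
xml_record = encode_record(form_values)
print(xml_record)
```

Because the output uses a standard syntax, the same record could in principle be parsed for storage locally or exchanged with another application; a real project would add whatever wrapper and local elements its own database requires.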
264 METADATA: A CATALOGER’S PRIMER
Cataloging Function
This stage entails planning what the user of the collection will see
upon accessing the home page or welcome screen of the collection on
the World Wide Web; what search, browse, and navigational options
will be available from that screen and others; and, what the user will be
directed to upon clicking any given hypertext link (including a hyper-
text link embodied by a thumbnail image or other graphic, which is a
popular means of providing navigation between pages in the WWW en-
vironment) on any given page. Although it is best to plan as much as
possible of the final product before the construction of the site begins,
most esthetic and navigational functions can be adjusted relatively eas-
ily at any point later in the development of the repository, so it is not as
TABLE 1
Functional areas (columns): Curatorial | Digital object formatting | Web design and Programming | Database/Retrieval system design | Cataloging and Indexing | Administrative/Managerial
Envision the final product a a a a
Plan appearance and technical characteristics of digital objects a a
Estimate disk storage requirements a a a
Determine staffing requirements a
Determine equipment/technical infrastructure required a
Project costs a
Secure funding a
Procure equipment/technical infrastructure a
Hire or select staff a
Designate metadata scheme and element set a a a
Define parameters and input standards for data values a a a
Designate exchange syntax for metadata a a
Design and construct metadata creation tool a a
Design database for metadata storage and retrieval a a a
Assemble rights and credit information a a
Train digital object formatting/digitization technicians a
Scan/digitize objects; gather already-digitized objects a
Design and construct user interface a a a
Program search, browse, and navigational functions a a a
Inspect site for Web Content Accessibility Guidelines (WCAG) compliance a a
Test final product a a
Design channels for ongoing feedback from users a a
Publicize and release project a a
Secure continuing funding a
Part II: How to Create, Apply, and Use Metadata 269
Esthetics
Designing what the user will see upon accessing the home page re-
quires a collaboration between the vision of the curator and the skills of
the web designer for making that vision a reality. Decisions to be made
at this stage include all esthetic considerations from the color of the
background, to the size, color, and style of the fonts, to the selection of
images, graphics, and special effects to be displayed on the various
pages of the online collection. The choices have an impact on the
amount of storage space required, and possibly on response time. Most
repositories will want to apply at least some level of standard design
template to each page the user accesses in the course of navigating and
viewing a collection. The standard template might include copyright in-
formation, an e-mail link for contacting the webmaster for information
or to provide feedback, the name and logo or graphic associated with the
repository and/or institution sponsoring it, etc.
Navigation Functions
tire collection at any point in a user’s journey through it, even when
sub-collections have been defined and the user has entered one.
Another important consideration in providing navigational capabili-
ties in a digital repository is to allow a user to move to any part of the
collection, or any search screen or index, with a minimal number of
mouse clicks from any other part of the collection or from any stage in
the retrieval of data. Allowing a user to cache as much of his or her ses-
sion as possible can also prevent frustration when a user wants to back-
track. Some collections allow users to retrieve the “search history” of a
session at any point and either re-execute a search or refine one. Useful
feedback from users can be obtained during the pre-release testing pe-
riod and in an ongoing fashion during the life of the project, but care
should be taken during the planning and design stage to anticipate po-
tential navigation problems.
Browsing Functionality
lection database and retrieve every object that pertained to that cate-
gory. So for example, if the user clicked on the “Yayoi culture” link or
icon, a search would be executed that gathered every object in the col-
lection whose associated metadata contained the word “Yayoi” in some
predefined data element or elements. The cataloging function would
need to know every category that might be defined anywhere in the site,
so that the metadata will contain the appropriate terms to gather every
object that belongs to that category.
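A category link of this kind amounts to a stored search. The sketch below, in Python, assumes a hypothetical record layout (a dictionary whose elements hold lists of values) and a hypothetical function name, `browse_category`; it gathers every object whose predefined elements contain the category term.

```python
def browse_category(records, term, elements=("subject", "coverage")):
    """Return the ids of records whose chosen elements contain the term."""
    needle = term.lower()
    hits = []
    for rec in records:
        # Collect the values of every element designated for this search.
        values = [v for e in elements for v in rec.get(e, [])]
        if any(needle in v.lower() for v in values):
            hits.append(rec["id"])
    return hits


# Hypothetical metadata records for three objects.
records = [
    {"id": "obj1", "subject": ["Yayoi culture", "Pottery"]},
    {"id": "obj2", "subject": ["Jomon culture"]},
    {"id": "obj3", "coverage": ["Japan, Yayoi period"]},
]
yayoi_hits = browse_category(records, "Yayoi")
print(yayoi_hits)
```

The search succeeds only if indexers have entered the category term in one of the designated elements, which is why the cataloging function must know every category in advance.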
Enabling browsing of indexes, such as name, title, or subject indexes,
likewise requires the collaboration of the curatorial, web design, and
cataloging functions. The curator should determine the types of indexes
to be made available for browsing, and the places within the site where
the indexes will be available. The web designer designs the appearance
and linking functionality of the indexes, and the cataloger ensures that
there are defined data elements in the metadata records that correspond
to the indexes selected by the curator.
For example, the curator might determine that useful indexes for a
particular textual document collection would be: author name, docu-
ment title, and subject. The web designer would place these indexes
(perhaps in the form of “pull-down menus,” for one popular design ex-
ample) in the pages within the site where the curator has determined that
browsing a particular index might be useful, and would write the scripts
to populate that index with the appropriate data from a defined field
within the metadata, and to execute the gathering of relevant documents
when an access point within an index is selected. The access points con-
tained within this type of index come from a defined data element
within the metadata records for the collection. For example, an author
index might access a data field defined as “Author” or “Creator.” The
index is dynamic in that every time a new object and its associated
metadata are added to the collection, the index expands to include that
new access point. It might also be desirable for the index to count the oc-
currences of a given access point within the index, e.g., the number of
documents in the collection written by a given author. An index must
also be designed to sort the data that populates it–usually alphabetically,
but sometimes chronologically or according to some other order.
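The behavior described above (populate the index from one defined element, count occurrences, and sort) can be sketched in a few lines of Python. The record layout and the name `build_index` are assumptions made for illustration.

```python
from collections import Counter


def build_index(records, element):
    """Build a browse index of (access point, count) pairs, sorted A-Z."""
    counts = Counter(v for rec in records for v in rec.get(element, []))
    # Sort alphabetically without regard to case.
    return sorted(counts.items(), key=lambda item: item[0].casefold())


# Hypothetical records for a textual document collection.
docs = [
    {"id": 1, "creator": ["Roe, Richard"]},
    {"id": 2, "creator": ["Doe, Jane"]},
    {"id": 3, "creator": ["Roe, Richard"]},
]
author_index = build_index(docs, "creator")
for name, count in author_index:
    print(f"{name} [{count}]")
```

Rebuilding the index whenever a record is added is what makes it dynamic; a production system would more likely ask the database to do the grouping and counting.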
The cataloging role is crucial in constructing useful indexes. The cat-
aloger must ensure that the metadata scheme chosen or designed for the
collection defines the data element needed for any given index, and that
the data element is defined in such a way as to enable browsing (and
also other kinds of searching) at the appropriate level of specificity. For
example, if the curator determines that indexes and other retrieval
mechanisms should allow a user to search for or browse all names asso-
ciated with objects in the collection, but also to search for or browse
names of persons or corporate bodies performing a particular role, the
cataloger needs to define that particular data element accordingly. In the
Dublin Core metadata scheme, this might be accomplished by assigning
the data element “Creator” with a locally-specified data element refine-
ment, which might be anything from “photographer” to “course archi-
tect” to “funding agency.” It would then be the responsibility of the web
design and database programming functions to ensure that the data is
parsed correctly upon import to the collection database, and that search
mechanisms and indexes are able to retrieve the data correctly from the
database.
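One way to picture the parsing involved is the Python sketch below. It assumes, purely for illustration, that a refinement is written after a period in the element label (for example "Creator.photographer") and that a record is a list of (label, value) pairs; the function names are hypothetical.

```python
def parse_label(label):
    """Split a label such as 'Creator.photographer' into element and refinement."""
    element, _, refinement = label.partition(".")
    return element, refinement or None


def names(record, role=None):
    """Return all Creator values, or only those with the given refinement."""
    found = []
    for label, value in record:
        element, refinement = parse_label(label)
        if element == "Creator" and (role is None or refinement == role):
            found.append(value)
    return found


# Hypothetical record with refined and unrefined Creator elements.
record = [
    ("Title", "Clubhouse at dawn"),
    ("Creator.photographer", "Doe, Jane"),
    ("Creator.course architect", "Roe, Richard"),
    ("Creator", "Smith, Al"),
]
all_names = names(record)
photographers = names(record, role="photographer")
print(all_names, photographers)
```

Because the refinement only narrows the element, a search that ignores it still retrieves every name, while a role-specific search or index uses it to filter.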
In order for an index to be useful, the data that populates it also must
be constructed in a consistent manner and one that is logical for sorting.
Data consistency is the responsibility of the cataloging function, and is
accomplished by creating documentation with clear and thorough rules
for data entry, and by effective training of indexers (who actually create
the metadata records).
The creation of a subject index can be simple or complex, depending
on how large and detailed the list or thesaurus of subject terms defined
for the collection is. The most fundamental principle in constructing a
subject index is to use a controlled vocabulary. That is, there should
only be one authorized subject term or phrase for a given topic or con-
cept. Most controlled vocabulary lists can be made more useful by pro-
viding a reference structure to direct users from unauthorized synonyms
to authorized forms.
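A reference structure of this kind can be as simple as a lookup table. In the Python sketch below the terms, the table names, and the function `resolve` are all hypothetical; it returns the single authorized form for whatever the user typed, or nothing if the term is unknown.

```python
# Hypothetical controlled vocabulary: one authorized term per concept.
AUTHORIZED = {"Automobiles", "Motion pictures"}

# "Use" references from unauthorized synonyms to authorized forms.
USE_REFERENCES = {
    "cars": "Automobiles",
    "movies": "Motion pictures",
    "films": "Motion pictures",
}


def resolve(term):
    """Return the authorized form for a term, or None if it is unknown."""
    key = term.strip().casefold()
    for authorized in AUTHORIZED:
        if authorized.casefold() == key:
            return authorized
    return USE_REFERENCES.get(key)


print(resolve("Movies"))
```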
Many general and discipline-specific subject thesauri exist and can
be adopted or adapted for use in a metadata-driven digital repository.
Examples of general subject thesauri are Library of Congress Subject
Headings or the Sears List of Subject Headings. Discipline-specific the-
sauri include lists such as the Art & Architecture Thesaurus (AAT 2000)
or the Thesaurus of Psychological Index Terms (Gallagher 2004).
Adopting an existing thesaurus makes planning easier, but some the-
sauri are very complicated, and can be applied correctly and effectively
only by highly trained and experienced indexers. If the decision is made
to construct a local list or thesaurus, the work should be done under the
direction of, or in close consultation with, a professional cataloger expe-
rienced in the principles of subject analysis and document indexing.
There is an international (ISO 1986) standard for the construction of
thesauri, but its documentation might be hard to understand and apply
by anyone who is not already very familiar with subject cataloging and
document indexing principles.
Once a subject list has been selected or devised, there are still deci-
sions to be made about how subject browsing and searching will be
implemented in the collection. A browsable subject index can be con-
structed in the same manner as a name or title or other index, but since
subject terms in a list are likely to have hierarchical and horizontal rela-
tionships with other subject terms in the list, the display of results of a
subject browse or search is an additional consideration for the cura-
torial, cataloging, web design, and database/retrieval system design
functions. For example, a hypothetical thesaurus for use with a collec-
tion of botanical literature might include among its authorized subject
vocabulary the term lichens. The reference structure for that hypotheti-
cal thesaurus might indicate that lichens has broader term Cryptogams
and narrower terms Ascolichens, Epiphytic lichens, lichen-forming fungi,
and rare lichens, and that Ascolichens in turn has narrower terms
Caliciales, Graphidales, Lecanorales, Lichinales, Peltigerales, Pertusa-
riales, and Verrucariales. If a user elected to browse on the term li-
chens, an effective display of results might be something like the fol-
lowing:
Lichens [22]
Narrower terms:
Ascolichens
Epiphytic lichens
Lichen-forming fungi
Rare lichens
indicating that the collection contains 22 items on the term selected, and
pointing the user to related terms that might also be of interest. If the
user clicked on one of the other links, a similarly displayed result would
appear, placing the selected term within the context of its related subject
terms. Library catalog software is designed to display subject term re-
lationships in a fashion somewhat similar to this, but the means of ac-
complishing this with library catalog software involves loading into
the library catalog database an entire file of records wherein each term
in the thesaurus has its own record containing all of its broader and re-
lated terms. This would be a highly inefficient method for any meta-
data-driven repository to try to replicate. A more efficient solution for a
metadata-driven repository would be to encode the thesaurus in XML
and program the display of subject search or browse results to reference
the XML document simultaneously as the search or browse queries the
database of object-associated metadata.
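A minimal sketch of that approach follows, in Python. The XML layout (`thesaurus`, `term`, `broader`, `narrower`), the record layout, and the function name `browse_subject` are assumptions for illustration, and the 22 sample records exist only to reproduce the count in the display above.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML encoding of one entry from the botanical thesaurus.
THESAURUS_XML = """
<thesaurus>
  <term name="Lichens">
    <broader>Cryptogams</broader>
    <narrower>Ascolichens</narrower>
    <narrower>Epiphytic lichens</narrower>
    <narrower>Lichen-forming fungi</narrower>
    <narrower>Rare lichens</narrower>
  </term>
</thesaurus>
"""


def browse_subject(term, thesaurus, records):
    """Return display lines: the term with its item count, then narrower terms."""
    # Count items by querying the object-associated metadata.
    count = sum(1 for rec in records if term in rec["subject"])
    lines = [f"{term} [{count}]"]
    # Look up related terms in the XML thesaurus at the same time.
    for entry in thesaurus.findall("term"):
        if entry.get("name") == term:
            narrower = [n.text for n in entry.findall("narrower")]
            if narrower:
                lines.append("Narrower terms:")
                lines.extend(narrower)
    return lines


thesaurus = ET.fromstring(THESAURUS_XML.strip())
records = [{"subject": ["Lichens"]} for _ in range(22)]
records.append({"subject": ["Ascolichens"]})
display = browse_subject("Lichens", thesaurus, records)
print("\n".join(display))
```

The thesaurus is held once, in a single document, instead of being loaded into the metadata database as one record per term.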
Search Functionality
Object Renderability
Format Longevity
cures grant funding can usually apply part of the grant funding toward
the salaries of these staff members. Each functional area in the project
should have documented its projected staffing needs and written job de-
scriptions in the earlier stage “identify staffing needs.” If possible, rep-
resentatives from each functional area should also be involved in the
hiring process. In areas in which extensive training of staff might be re-
quired, such as indexing, an effort should be made to hire staff who can
accept a long-term commitment to the project, so that re-training can be
avoided and optimal efficiency can be achieved from project staff.
Well before this stage is reached (probably during the “envision the
final product” stage), the cataloging function of the project will most
likely have determined what established metadata scheme will be
employed, or, if a locally invented metadata scheme is to be used, the
general parameters of that local scheme.
There are many benefits to selecting an established metadata stan-
dard, especially if there is any chance that the repository institution
might later wish to integrate its collection with other collections, or
make its collections searchable through outside applications’ search
mechanisms. By adopting an established metadata standard, a project
may gain access to users’ manuals, documentation on recommended
best practices for data values and encoding, and possibly tools for data
input. A project can also benefit from the experience of other projects
using that scheme, as other implementers will often publish usage
guidelines and best practices for their own projects in addition to those
promulgated by the scheme’s maintenance agency. Methods of encod-
ing metadata are often designed with established metadata schemes in
mind, so a project might find documentation on methods of encoding
metadata more useful and applicable when using an established meta-
data scheme. The project’s web and database programmers are likely to
be able to find in electronic discussion forums shareable scripts and
other techniques for making use of the project’s metadata if an estab-
lished metadata scheme is employed.
Metadata schemes are sometimes extensible, meaning that a project
can adopt the structure and syntax of a metadata scheme, and use some
or all of its defined data elements at their broadest definitions, while still
allowing for local description needs. In Dublin Core, this is accom-
plished by means of data element qualifiers (DCMI 2004), which refine
the meaning of a data element while still allowing the element itself to
them up in the form of a local metadata standard. This will become the
basis for informing much of the database programming and web design
of the online repository, and, when expanded with instructions for in-
putting and encoding the metadata (see below), will become the proj-
ect’s usage guide for metadata creation and manipulation, and probably
the most important piece of documentation produced by the project.
After the metadata scheme and its data elements have been desig-
nated and documented, the cataloger should carefully define each ele-
ment as to the nature of data that should occupy the field labeled with
that data element. These definitions should be documented in the usage
guide in terms and language that can be understood by the indexers (and
possibly digital formatting technicians) who will be creating the proj-
ect’s metadata. The cataloger should also provide detailed instructions
for inputting data, specifying the exact form that the data should take
(including punctuation, spacing, and capitalization), and guidelines for
transcribing data found on the object for which the metadata are being
created. For some elements, the cataloger might stipulate that the data
value for that element conform to a standard (such as ISO . . . standard
for the expression of a “Date” element like “Date.Created”) or that the
data value be selected from a list, such as a standard list of subject terms
or a thesaurus. Some data elements might reference an online registry of
some sort, like the “virtual international authority file” envisioned by
Barbara Tillett (2001).
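Input rules of this kind can also be checked by the metadata creation tool itself. The Python sketch below assumes two hypothetical local rules: that "Date.Created" must take the form YYYY-MM-DD, and that subjects must come from a short authorized list. The date pattern, the list, and the function name `check_record` are illustrative choices, not requirements of any particular standard.

```python
import re
from datetime import datetime

# Hypothetical authorized subject list for the collection.
SUBJECT_LIST = {"Lichens", "Ascolichens"}


def check_record(record):
    """Return a list of problems found in a record's data values."""
    problems = []
    date = record.get("Date.Created", "")
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", date):
        problems.append("Date.Created must have the form YYYY-MM-DD")
    else:
        try:
            datetime.strptime(date, "%Y-%m-%d")  # rejects e.g. 30 February
        except ValueError:
            problems.append("Date.Created is not a real calendar date")
    for subject in record.get("Subject", []):
        if subject not in SUBJECT_LIST:
            problems.append(f"Subject not in authorized list: {subject}")
    return problems


sample = {"Date.Created": "March 3, 2004", "Subject": ["Moss"]}
sample_problems = check_record(sample)
print(sample_problems)
```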
At this stage all of the documentation needed for the training of index-
ers should have been written by the cataloging function. Documentation
can be revised as necessary if it is not easily understood by the indexers,
who will most likely not be professional catalogers. The goal should be to
allow indexers to create metadata with minimal supervision. In the early
stages of metadata creation, all records created by indexers should be
stored in a “review file” for inspection by a professional cataloger, who
will file the reviewed record to the live database. As indexers become
more experienced, they may be allowed to store records directly to the
live database, but the review file should be available throughout the life
of the project so that indexers can file records to it when they are uncer-
tain about the data for any given field. Some repositories will use au-
thority-controlled headings (name and/or subject) in their records. In
such cases, the indexer should probably be authorized to select al-
ready-established headings from the repository’s authority file, but not
to enter new, unestablished headings. Records requiring headings that
have not yet been established should be stored in the review file, from
which a professional cataloger will periodically retrieve them and es-
tablish the headings in question (and add them to the authority file).
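The filing rules in this paragraph can be summarized as a small routing decision. The Python sketch below uses hypothetical names (`file_record`, `establish_headings`) and a hypothetical record layout with a `headings` list; it sends a record to the review file unless an experienced indexer has used only established headings and has not flagged any uncertainty.

```python
def file_record(record, authority_file, indexer_is_experienced,
                flagged_uncertain=False):
    """Decide whether a record goes to the review file or the live database."""
    unestablished = [h for h in record["headings"] if h not in authority_file]
    if unestablished or flagged_uncertain or not indexer_is_experienced:
        return "review"
    return "live"


def establish_headings(record, authority_file):
    """Cataloger's step: add the record's headings to the authority file."""
    authority_file.update(record["headings"])


# Hypothetical authority file and a record with a not-yet-established name.
authority_file = {"Doe, Jane", "Lichens"}
new_record = {"headings": ["Roe, Richard", "Lichens"]}
first_destination = file_record(new_record, authority_file, True)
establish_headings(new_record, authority_file)
second_destination = file_record(new_record, authority_file, True)
print(first_destination, second_destination)
```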
As noted above, metadata record creation can be accomplished effi-
ciently with a web-based form designed by the repository. Indexers
should be encouraged to report problems with the form and the record
filing process (such as slow response time or bugs), and suggest im-
provements to make their work easier and the overall metadata creation
workflow more efficient. Indexers will be responsible for transcribing
data found on the digital objects (or surrogates of them) and also for an-
alyzing the objects in order to record other characteristics and attributes.
The workflow should be designed to make it as easy as possible for in-
dexers to ascertain the information they need at the metadata creation
stage. For example, if file size is recorded in the metadata for objects in
the collection, the file size should be readily ascertainable by the in-
dexer, perhaps provided by the digital formatting function. Many repos-
itories will wish to have the curatorial function provide suggested name
headings and subjects and other controlled vocabulary data so that the
indexer is not responsible for determining them. Transcription can be
problematic in some types of digital collections because, in textual doc-
ument indexing for example, a data element such as “title” might be am-
biguous or difficult to determine. In newspaper indexing, a given article
might have several different titles, including a generic or recurring col-
umn title, a more specific title, a subtitle, and a running title that changes
when the article continues on a different page. A digital book or report
might likewise have a title that differs between the title page, the cover,
and the running title appearing at the top or bottom of each page in the
item. In library cataloging, one title is always chosen as the main title,
and the problem of choosing this main title is addressed by a separate set
of rules in the cataloging code for every class of material that might be
cataloged by a library, and an ordered list within each class for choosing
the “source of information” for the title and other data that might need to
be transcribed. A digital repository might adopt this approach in creat-
ing its rules for transcription, or the repository might determine that all
titles should be transcribed with appropriate refinement of the “title” el-
ement, but without any one title chosen as the main title. In any case, the
documentation used by the indexers should give clear and unambiguous
instructions for identifying every data element that needs to be
transcribed (including title, place of publication or origin, publisher
or distributor, date, standard number appearing on the item, etc.),
in addition to the instructions for how that data should be transcribed.
Some types of digital objects, such as art images or sound files, might
have no transcribable data at all. In such cases, the curatorial function
will be responsible for supplying much of the data that an indexer will
enter in the metadata record. The curatorial function might need guid-
ance from the cataloging function (in the form of documentation created
by the cataloging function) on how to determine and supply this data.
Import/Attach Metadata
This stage is executed by the web design function according to the vi-
sion laid out by the team at the “envision the final product” stage. As the
user interface is being constructed, the curatorial function should be
monitoring its progress to ensure that the curatorial vision of the final
product is being realized. The web design function might present alterna-
tive layouts at each stage of the construction for the curatorial function, or
perhaps the entire planning and implementation team, to examine and se-
lect the best alternative. Many of the search and browse features of the
user interface will be designed using web design and database software.
This is an advantage not only because it makes the web designer’s task
easier, but also because it is likely to result in interfaces that are familiar
to World Wide Web users, making the repository easier to use and
navigate. The web designer should be familiar with relevant guidelines
and standards for accessibility for people with disabilities, and
should program into the user interface alternative means of access
wherever they are required.
WORKS CITED
Anglo-American Cataloguing Rules, 2nd ed. (AACR2). 2002 revision, 2004 update.
Chicago: American Library Association.
Bishoff, Liz and Elizabeth S. Meagher. 2004. Building Heritage Colorado: The Colo-
rado digitization experience. In Diane I. Hillmann and Elaine L. Westbrooks, eds.,
Metadata in practice. Chicago: American Library Association, pp. 22-25.
Caplan, Priscilla. 2003. Metadata fundamentals for all librarians. Chicago: American
Library Association.
Colorado Digitization Program, Heritage Colorado. Available at:
http://www.cdpheritage.org/heritage/.
Dublin Core Metadata Initiative (DCMI). 2003. Dublin Core metadata element set,
version 1.1: reference description, available at:
http://dublincore.org/documents/dces/.
Dublin Core Metadata Initiative (DCMI). 2004. DCMI metadata terms, available at:
http://dublincore.org/documents/dcmi-terms/Qualifiers.
Gallagher, Lisa A. ed. 2004. Thesaurus of psychological index terms, 10th ed. Wash-
ington, DC: American Psychological Association.
Getty Vocabulary Program. 2000. Art & architecture thesaurus. Los Angeles, Calif.:
J. Paul Getty Trust.
Hillmann, Diane I. and Elaine L. Westbrooks, eds. 2004. Metadata in practice. Chi-
cago: American Library Association.
International Federation of Library Associations and Institutions (IFLA). 2004. Family
of ISBDs, available at: http://www.ifla.org/VI/3/nd1/isbdlist.htm.
International Organization for Standardization (ISO). 1986. Documentation:
Guidelines for the establishment and development of monolingual thesauri,
2nd ed., ISO 2788. Geneva: ISO.
Library of Congress. Cataloging Distribution Service. 2004. Library of Congress sub-
ject headings, 27th ed. Washington, DC: Library of Congress.
Library of Congress. Network Development and MARC Standards Office. 2003.
MARC 21 concise format for bibliographic data, 2003 concise ed. available at:
http://purl.access.gpo.gov/GPO/LPS35317.
Library of Congress. Network Development and MARC Standards Office and Society
of American Archivists. 2002. Encoded archival description (EAD): Official EAD
version 2002 website. Washington, DC: Library of Congress, available at:
http://www.loc.gov/ead/.
Sears List of Subject Headings. 2004. 18th ed. Bronx, N.Y.: H. W. Wilson.
Sperberg-McQueen, C.M. and Lou Burnard eds. 2002. Guidelines for text encoding
and interchange. Oxford: Published for the TEI Consortium by the Humanities
Computing Unit, University of Oxford.
Tillett, Barbara. 2001. Authority control on the Web. In Proceedings of the bicenten-
nial conference on bibliographic control for the new millennium: Confronting the
challenges of networked resources and the Web, Washington, D.C., November
15-17, 2000, ed. Ann M. Sandberg-Fox, Washington, DC: Library of Congress,
Cataloging Distribution Service, p. 207-220.
United States. Department of Justice. Americans with Disabilities Act, ADA homepage,
available at: http://www.usdoj.gov/crt/ada/adahom1.htm.
Weibel, Stuart. 1996. A proposed convention for embedding metadata in HTML. In
W3C workshop on distributed indexing and searching, May 1996, available at:
http://www.w3.org/Search/9605-Indexing-Workshop/reportOutcomes/S6Group2.html.
Weibel, Stuart et al. 1995. OCLC/NCSA metadata workshop report, available at:
http://www.oclc.org:5047/oclc/research/conferences/metadata/dublin_core_report.html.
World Wide Web Consortium (W3C). 1997. Platform for Internet content selection
(PICS), available at: http://www.w3.org/PICS/.
World Wide Web Consortium (W3C). Resource Description Framework, available at:
http://www.w3.org/RDF/.
World Wide Web Consortium (W3C). 2001. Technology and Society Domain. Seman-
tic Web, available at: http://www.w3.org/2001/sw/.
World Wide Web Consortium (W3C). Web Accessibility Initiative homepage, avail-
able at: http://www.w3.org/WAI/Resources/.