Technical Introduction. (Paul Spence)
In 2004, the Centre for Computing in the Humanities began a pilot project in collaboration
with the Department of Spanish and Spanish-American studies at King’s College London to
explore the extent to which some of the traditional scholarly research activities associated
with an academic department could be represented using an XML-based architecture. The
principal aim was to create a resource that integrated primary sources with scholarly
commentary within an environment that allowed users to re-arrange materials according to
different points of focus: the scholars involved; the authors of the primary source
materials; the research areas or thematic bibliographies.
The project focused on one of the major research areas in the Department of Spanish and
Spanish-American studies, namely the literature, culture and history of the Spanish Golden
Age, but the underlying goal was that the framework developed could be re-applied to any
academic department wishing to publish its research within an integrated environment.
The project team includes both domain specialists from the Spanish and Spanish-American
department (initially led by Professor Barry Ife but now led by head of department Professor
Robert Archer and also including Dr Trudi Darby and Dr Robert Goodwin, as well as Dr Alex
Samson from UCL) and humanities computing specialists from CCH including Dr Arianna Ciula,
Zaneta Au, Paul Vetch, Mark Stewart and Paul Spence, who was the Director and the lead
analyst for the technical research.
The research outcomes have so far included a freely-available web publication that
integrates 20 electronic versions of primary texts, over 200 bibliographic entries relating
to participating scholars, and a number of publications in digital form both in Spanish and
English, as well as the Spanish-English Translations Database, 1500-1640 and
a digital version of John Minsheu's 1599 Spanish-English Dictionary. The web
publication represents only a portion of the research carried out on the project: other
research includes a project led by Barry Ife that experimentally compared and analysed texts
from Gonzalo Fernández de Oviedo y Valdés and Alvar Núñez Cabeza de Vaca - each providing
very different accounts of a sixteenth-century expedition to Florida - using XML-based
techniques. The next stage of the website will both broaden the research focus of the web
publication and extend this initial pilot to cover the full range of research carried out by
the Spanish and Spanish-American department.
The project made use of a number of customized versions of the P4 Guidelines of the Text
Encoding Initiative (TEI) [http://www.tei-c.org/]. TEI is a major and long-standing (since
1989) international scholarly standards initiative, and its Guidelines allow
encoding systems to be developed for a wide range of text-based materials across the
humanities. Many projects have used TEI to great effect to encode primary source texts, but
there is relatively little use of TEI to encode pre-existing secondary scholarly materials
and even less experience to date of using it to encode born-digital materials. The encoding
structures developed for EMS sought to provide a common framework while also
reflecting the very different nature of the materials on the project. The result is a larger
corpus of material, sub-divided into smaller corpora for electronic publications, primary
source materials, bibliographic materials and presentational materials describing the
activities carried out under the EMS ‘umbrella’ and all encoded using TEI XML.
TEI provides an excellent basis for encoding individual documents within a recognised
corpus and is widely used within the humanities for text encoding, but it is less clear
which standard(s) should be used to combine textual materials arranged as different corpora
and the technical tools in this area are still relatively immature. CCH has carried out
extensive research in combining TEI with the METS standard [1], and in the
case of EMS the objective was both to develop an extensible architecture that could
represent departmental research and to provide tools to enable it to be published. The
architecture developed has proven to be robust enough to make it easy to extend the resource
to new kinds of publication and allows users to manage the relationships between different
entities in a given research universe. The tools developed will be enhanced as part of the
next stage of development, but they already make it easy for departmental research to be
published early and often. The resource created also highlights two key advantages of an XML
approach over other technical approaches: the fact that it is possible to inter-link content
at various levels of granularity (both at document ‘object’ and intra-documental level) and
the fact that deep encoding allows users to carry out structured searches across multiple
content components.
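The two advantages described above can be illustrated with a minimal sketch. It is not the EMS implementation: the miniature corpus, the element and attribute names (a simplified, TEI-like vocabulary) and the helper functions find_place and resolve_ref are all assumptions made for illustration. The sketch shows a link that points from one document to a section inside another, and a search that uses the markup to find a place name across primary texts, publications and bibliographic entries alike.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified TEI-like corpus. Element names, attributes and
# content are illustrative only and are not taken from the EMS encoding.
CORPUS = """
<teiCorpus>
  <TEI id="text1" type="primary">
    <text><body>
      <div id="text1-ch1"><head>Chapter 1</head>
        <p>An account of the voyage to <name type="place">Florida</name>.</p>
      </div>
    </body></text>
  </TEI>
  <TEI id="art1" type="publication">
    <text><body>
      <p>The narrative in <ref target="text1#text1-ch1">chapter 1</ref>
         mentions <name type="place">Florida</name> repeatedly.</p>
    </body></text>
  </TEI>
  <TEI id="bib1" type="bibliography">
    <text><body>
      <bibl><title>Studies on <name type="place">Seville</name></title></bibl>
    </body></text>
  </TEI>
</teiCorpus>
"""


def find_place(root, place):
    """Return (id, type) for each document that tags `place` as a place name."""
    hits = []
    for doc in root.findall("TEI"):
        for name in doc.iter("name"):
            if name.get("type") == "place" and (name.text or "").strip() == place:
                hits.append((doc.get("id"), doc.get("type")))
                break  # one hit per document is enough
    return hits


def resolve_ref(root, target):
    """Resolve a 'docid#fragid' pointer to the element it addresses."""
    doc_id, _, frag_id = target.partition("#")
    for doc in root.findall("TEI"):
        if doc.get("id") != doc_id:
            continue
        if not frag_id:
            return doc  # document-level ('object') link
        for el in doc.iter():
            if el.get("id") == frag_id:
                return el  # intra-document link
    return None


root = ET.fromstring(CORPUS.strip())

# Structured search across different kinds of content component.
print(find_place(root, "Florida"))

# Follow the link in the publication down to a section of the primary text.
ref = root.find(".//ref")
section = resolve_ref(root, ref.get("target"))
print(section.findtext("head"))
```

Because the search tests the markup rather than the raw character string, it would not match an untagged occurrence of the same word, which is the sense in which the search is structured.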
XML is used widely in the humanities in the context of digital publication and
semantically-aware information retrieval, as in the EMS project. However, the potential
benefits of XML as a data interchange format – which is the main reason why it is so widely
used in the business world – are rarely taken advantage of in digital scholarship. One of
the technical aims of the EMS project was to explore XML’s potential as a format for
integration between projects and/or institutional applications with different academic and
technological starting points.
The web publication already covers a range of scholarly
activities, from fine-grained text encoding and digital publication to more mundane tasks such as
an institution's need to provide information on its members of staff, research areas,
courses taught and bibliographies, and combines them in a seamless interface. The fact that
the material is encoded in XML makes it relatively easy to re-purpose the information for
different media (print, mobile phones) and in different formats (PDF versions of articles
that can be used as the basis for conventional print journals). Using the single-source
publishing principle made possible by XML’s separation of content from presentation,
moreover, it is possible to re-configure the data to meet vastly different research
objectives. The next stage of the project will explore the potential to provide more
specialised outputs (bibliographies for individual courses or for personal CVs, project
lists for wider institutional websites) and to connect the research of an academic
department to other digital repositories (so that, for example, one might combine or share
research from different institutions around the area of ‘Anglo-Spanish literary relations’).
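The single-source principle mentioned above can be sketched in a few lines. This is an illustration under stated assumptions, not the project's publishing pipeline: the bibliographic records are placeholders, the element names are a simplified TEI-like vocabulary, and the functions records, to_html and to_plain are hypothetical. The point is only that one XML source, which records what each piece of content is rather than how it looks, can feed several outputs.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical source data: placeholder records, not real EMS bibliography.
SOURCE = """
<listBibl>
  <bibl id="b1"><author>Example Author</author><title>A Placeholder Study</title><date>2005</date></bibl>
  <bibl id="b2"><author>Another Author</author><title>Ships &amp; Letters</title><date>2006</date></bibl>
</listBibl>
"""


def records(xml_text):
    """Parse the single XML source into presentation-neutral records."""
    root = ET.fromstring(xml_text.strip())
    for bibl in root.findall("bibl"):
        yield {
            "author": bibl.findtext("author", default=""),
            "title": bibl.findtext("title", default=""),
            "date": bibl.findtext("date", default=""),
        }


def to_html(xml_text):
    """Render the records as an HTML list, e.g. for a web page."""
    items = [
        f"<li>{escape(r['author'])}, <em>{escape(r['title'])}</em> ({escape(r['date'])})</li>"
        for r in records(xml_text)
    ]
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


def to_plain(xml_text):
    """Render the same records as plain text, e.g. for a CV or course handout."""
    return "\n".join(
        f"{r['author']}. {r['title']}. {r['date']}." for r in records(xml_text)
    )


print(to_html(SOURCE))
print(to_plain(SOURCE))
```

Adding a further output, such as a course reading list, would mean writing another rendering function over the same records while leaving the encoded source untouched.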