Technical Introduction (Paul Spence)

In 2004, the Centre for Computing in the Humanities (CCH) began a pilot project in collaboration with the Department of Spanish and Spanish-American Studies at King’s College London. The project explored the extent to which some of the traditional research activities of an academic department could be represented using an XML-based architecture. The principal aim was to create a resource that integrated primary sources with scholarly commentary, within an environment that allowed users to re-arrange materials according to different points of focus: the scholars involved; the authors of the primary source materials; and the research areas or thematic bibliographies.

The project focused on one of the major research areas in the Department of Spanish and Spanish-American Studies, namely the literature, culture and history of the Spanish Golden Age. The underlying goal, however, was a framework that could be re-applied to any academic department wishing to publish its research within an integrated environment.

The project team includes domain specialists from the Department of Spanish and Spanish-American Studies (initially led by Professor Barry Ife and now by the head of department, Professor Robert Archer, and also including Dr Trudi Darby and Dr Robert Goodwin, as well as Dr Alex Samson from UCL). It also includes humanities computing specialists from CCH: Dr Arianna Ciula, Zaneta Au, Paul Vetch, Mark Stewart and Paul Spence, who was the Director and lead analyst for the technical research.

The research outcomes so far include a freely available web publication that integrates 20 electronic versions of primary texts, over 200 bibliographic entries relating to participating scholars and a number of digital publications in both Spanish and English. They also include the Spanish-English Translations Database, 1500-1640, and a digital version of John Minsheu's 1599 Spanish-English Dictionary. The web publication represents only a portion of the research carried out on the project. Other work includes a project led by Barry Ife that used XML-based techniques for an experimental comparison and analysis of texts by Gonzalo Fernández de Oviedo y Valdés and Alvar Núñez Cabeza de Vaca, which give very different accounts of a sixteenth-century expedition to Florida. The next stage of the website will broaden the research focus of the web publication. It will also extend this initial pilot to cover the full range of research carried out by the Department of Spanish and Spanish-American Studies.

The project made use of a number of customized versions of the P4 Guidelines of the Text Encoding Initiative (TEI) [http://www.tei-c.org/]. TEI is a major and long-standing (since 1989) international scholarly standards initiative. Its Guidelines allow encoding systems to be developed for a wide range of text-based materials across the humanities. Many projects have used TEI to great effect to encode primary source texts. TEI has been used relatively little, however, to encode existing secondary scholarly materials, and there is even less experience to date of using it for born-digital materials. The encoding structures developed for EMS sought to provide a common framework while also reflecting the very different nature of the project's materials. The result is a single larger corpus, sub-divided into smaller corpora for electronic publications, primary source materials, bibliographic materials and presentational materials describing the activities carried out under the EMS ‘umbrella’, all encoded in TEI XML.

TEI provides an excellent basis for encoding individual documents within a recognised corpus and is widely used for text encoding in the humanities. It is less clear which standard(s) should be used to combine textual materials arranged as different corpora, and the technical tools in this area are still relatively immature. CCH has carried out extensive research in combining TEI with the METS standard.¹ In the case of EMS, the objective was both to develop an extensible architecture that could represent departmental research and to provide tools to enable that research to be published. The architecture has proven robust enough to make it easy to extend the resource to new kinds of publication, and it allows users to manage the relationships between different entities in a given research universe. The tools will be enhanced in the next stage of development, but they already make it easy for departmental research to be published early and often. The resource also highlights two key advantages of an XML approach over other technical approaches. First, content can be inter-linked at various levels of granularity, both at the level of the document as an ‘object’ and within individual documents. Second, deep encoding allows users to carry out structured searches across multiple content components.
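To make these two advantages concrete, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. It works on two tiny, hypothetical documents written in a simplified P4 style (root element `TEI.2`, plain `id` attributes). This is not EMS code or data. The document content, the `key` values on `persName` and the function names `load_corpus`, `find_mentions` and `resolve_links` are illustrative assumptions, and the markup has not been validated against any TEI schema. The sketch runs a structured search (which divisions of which documents mention a given person). It also follows a `ref` link from a scholarly article into one specific division of a primary text.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified P4-style documents (not real EMS files and not
# validated against a TEI schema): a primary text and a scholarly article.
PRIMARY = """<TEI.2 id="oviedo">
  <text><body>
    <div id="oviedo-ch1"><p>[text mentioning <persName key="narvaez">Pánfilo de Narváez</persName>]</p></div>
    <div id="oviedo-ch2"><p>[text of a second division]</p></div>
  </body></text>
</TEI.2>"""

ARTICLE = """<TEI.2 id="article1">
  <text><body>
    <div id="article1-s1">
      <p>Both <persName key="oviedo">Oviedo</persName> and
      <persName key="cabeza">Cabeza de Vaca</persName> describe
      <persName key="narvaez">Narváez</persName>; compare
      <ref target="oviedo-ch1">the passage in Oviedo</ref>.</p>
    </div>
  </body></text>
</TEI.2>"""


def load_corpus(sources):
    """Parse each document and index every element that carries an id."""
    docs, ids = {}, {}
    for name, text in sources.items():
        root = ET.fromstring(text)
        docs[name] = root
        for el in root.iter():  # includes the root element itself
            if "id" in el.attrib:
                ids[el.get("id")] = (name, el)
    return docs, ids


def find_mentions(docs, key):
    """Structured search: (document, div id) pairs whose div names a person key.

    Assumes divs are not nested; otherwise an enclosing div would also match.
    """
    hits = []
    for name, root in docs.items():
        for div in root.iter("div"):
            if any(p.get("key") == key for p in div.iter("persName")):
                hits.append((name, div.get("id")))
    return hits


def resolve_links(docs, ids):
    """Follow each <ref target="..."> to the element it points at, in any document."""
    links = []
    for name, root in docs.items():
        for ref in root.iter("ref"):
            target_doc, target_el = ids[ref.get("target")]
            links.append((name, ref.get("target"), target_doc, target_el.tag))
    return links


docs, ids = load_corpus({"oviedo": PRIMARY, "article1": ARTICLE})
print(find_mentions(docs, "narvaez"))
# [('oviedo', 'oviedo-ch1'), ('article1', 'article1-s1')]
print(resolve_links(docs, ids))
# [('article1', 'oviedo-ch1', 'oviedo', 'div')]
```

The link target is a division inside another document, not the document as a whole. The search is driven by encoded attributes (`key`) rather than by free-text matching on spelling variants such as "Narváez" and "Pánfilo de Narváez".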

XML is widely used in the humanities for digital publication and semantically aware information retrieval, as in the EMS project. However, the potential benefits of XML as a data interchange format (the main reason it is so widely used in the business world) are rarely exploited in digital scholarship. One of the technical aims of the EMS project was to explore XML’s potential as a format for integration between projects and/or institutional applications with different academic and technological starting points.
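The following minimal Python sketch illustrates XML used for interchange rather than display. It reads bibliographic entries from a hypothetical, simplified TEI-style `listBibl` fragment and re-serialises them as JSON, the kind of format a separate institutional application (a staff directory, say) might consume. The entries, the JSON field names and the function name `bibl_to_records` are placeholders, not the EMS data model.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical TEI-style bibliography fragment; the entries are placeholders.
BIBLIOGRAPHY = """<listBibl>
  <bibl id="b1"><author>Scholar, A.</author><title>An example article</title><date>2005</date></bibl>
  <bibl id="b2"><author>Scholar, B.</author><title>An example monograph</title><date>2006</date></bibl>
</listBibl>"""


def bibl_to_records(xml_text):
    """Flatten each <bibl> into a plain dict that a non-XML system can use."""
    records = []
    for bibl in ET.fromstring(xml_text).iter("bibl"):
        records.append({
            "id": bibl.get("id"),
            "author": bibl.findtext("author"),
            "title": bibl.findtext("title"),
            "year": bibl.findtext("date"),  # hypothetical field mapping
        })
    return records


# Serialise for exchange with another (hypothetical) application.
payload = json.dumps(bibl_to_records(BIBLIOGRAPHY), ensure_ascii=False, indent=2)
print(payload)
```

Because each field is explicitly encoded, the mapping between the scholarly XML and the receiving system is a small, inspectable piece of code rather than a scraping exercise.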

The web publication already brings together a range of scholarly activities and combines them in a seamless interface. These range from fine-grained text encoding and digital publication to more mundane tasks, such as an institution's need to provide information on its staff, research areas, courses taught and bibliographies. Because the material is encoded in XML, it is relatively easy to re-purpose the information for different media (print, mobile phones) and in different formats (for example, PDF versions of articles that can serve as the basis for conventional print journals). Moreover, the single-source publishing principle, made possible by XML’s separation of content from presentation, makes it possible to re-configure the data to meet very different research objectives. The next stage of the project will explore the potential to provide more specialised outputs, such as bibliographies for individual courses or personal CVs, or project lists for wider institutional websites. It will also explore connecting the research of an academic department to other digital repositories, so that, for example, one might combine or share research from different institutions around the area of ‘Anglo-Spanish literary relations’.
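The single-source principle can be sketched in Python. One hypothetical XML bibliography is rendered both as an HTML list for the web and as a plain-text, single-author list of the sort one might drop into a personal CV. The data, the function names `entries`, `as_html` and `as_cv`, and both output formats are illustrative assumptions. In XML publishing pipelines such transformations are commonly written in XSLT. Python is used here only to keep the example self-contained.

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical single source: one XML bibliography feeds every output below.
SOURCE = """<listBibl>
  <bibl><author>Scholar, A.</author><title>An example article</title><date>2005</date></bibl>
  <bibl><author>Scholar, B.</author><title>An example monograph</title><date>2006</date></bibl>
  <bibl><author>Scholar, A.</author><title>An example edition</title><date>2007</date></bibl>
</listBibl>"""


def entries(xml_text, author=None):
    """Yield (author, title, year) tuples, optionally restricted to one author."""
    for bibl in ET.fromstring(xml_text).iter("bibl"):
        a = bibl.findtext("author")
        if author is None or a == author:
            yield a, bibl.findtext("title"), bibl.findtext("date")


def as_html(xml_text):
    """Web view: the whole bibliography as an HTML list, in source order."""
    items = "".join(
        f"<li>{html.escape(a)}, <i>{html.escape(t)}</i> ({html.escape(y)})</li>"
        for a, t, y in entries(xml_text)
    )
    return f"<ul>{items}</ul>"


def as_cv(xml_text, author):
    """CV view: one person's entries as plain text, newest first.

    Sorting on the date string assumes four-digit years.
    """
    rows = sorted(entries(xml_text, author), key=lambda e: e[2], reverse=True)
    return "\n".join(f"{y}. {t}." for _, t, y in rows)


print(as_cv(SOURCE, "Scholar, A."))
# 2007. An example edition.
# 2005. An example article.
```

Neither output is stored separately. Each is derived on demand from the same encoded source, so a correction made once in the XML appears in every view.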

Footnotes

1.		Metadata Encoding and Transmission Standard, http://www.loc.gov/standards/mets/ (accessed 12 December 2007)