I gave a talk on my PhD research (the GSTAR project) at the Hestia 2 event in Southampton last Thursday. Given I am still early on in the process, and having been asked to relate my work to the world of commercial archaeology, I decided to follow an overview of my research with some ideas for the future and how Linked Data approaches could be used to overhaul the (painful and convoluted) ways we manage heritage data in the UK.
The talk will soon be up on the project webpages and the slides are presented below via Slideshare. There are some great write-ups of the day over on the Hestia webpages.
In the questions and discussion afterwards, a couple of topics came up that have been raised before. Firstly, the idea that the CIDOC CRM and derivatives such as CRM EH are unnecessarily complicated. Secondly, the idea that such rich modelling serves no useful purpose as very few people, if anyone at all, will be interested in using the richness contained therein.
Complicated and Complex
The first issue is one I have written up in my literature review but is one which requires further investigation. Ontologies such as the CIDOC CRM are indeed complicated and I have been quoted as describing working with them as ‘difficult’ (eg Isaksen, 2011). I do still think, though, that such approaches are unavoidable if we are to adequately represent the complexity of heritage data. A major criticism of computational approaches in archaeology going back some decades was the idea that they are inherently reductionist. Falling back on overly simplistic data models risks the same problems inherent in what are now seen as legacy information systems, which in their day were just as cutting edge as Linked Data approaches are today. There is a big difference between complex and complicated (see eg Kamensky, 2011), and a complicated model is sometimes essential to describe a complex reality.
CIDOC CRM and extensions such as CRM EH can be described as complicated, but working with them is quite straightforward once the class structure is understood, especially given the ability to create and use patterns: reusable blocks which can easily be recycled. The benefits of using such a rich model to describe complex scenarios far outweigh any disadvantages associated with complicated models and/or implementation difficulties. After all, the world, both now and in the past, is (and was) a complex place!
As an example, let’s look at the relationship between a find and the stratigraphic unit in which it was found. In CRM EH terms, we would start with a production event which creates the object, followed eventually by some kind of deposition event which results in the find coming to rest in the location where it is ultimately found, whether by an archaeologist through excavation or by a member of the public through a casual find, metal detecting or some other means.
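The event chain described above can be sketched in a few lines of Python. This is a minimal illustration only: the class and field names (`Event`, `Find`, `history`) and the example data are hypothetical stand-ins, not actual CRM EH class identifiers or real records.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Event:
    """Something that happened to a find (hypothetical, not a CRM EH class)."""
    kind: str                      # e.g. "production", "deposition", "discovery"
    place: Optional[str] = None    # e.g. a stratigraphic unit label
    agent: Optional[str] = None    # who carried out the event, if known


@dataclass
class Find:
    """A find described by the events in its life, not by flat attributes."""
    label: str
    history: List[Event] = field(default_factory=list)

    def events_of(self, kind: str) -> List[Event]:
        return [e for e in self.history if e.kind == kind]

    def found_in(self) -> Optional[str]:
        """The stratigraphic unit is derived from the latest deposition event."""
        depositions = self.events_of("deposition")
        return depositions[-1].place if depositions else None


# Illustrative data only.
brooch = Find("copper alloy brooch")
brooch.history.append(Event("production"))
brooch.history.append(Event("deposition", place="context 1023"))
brooch.history.append(Event("discovery", place="context 1023", agent="excavator"))

print(brooch.found_in())
```

The point of the sketch is that the link between find and stratigraphic unit is not a stored attribute but something read off the deposition event, so the other events in the chain remain available to query.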
A much simpler model would simply record a bunch of attributes against a find: a data-centric view of the world of the kind prevalent in relational systems modelled using, eg, entity-relationship approaches. Taking classifications of finds in particular, this simple approach is problematic. Without any explicit record of the process by which an archaeologist makes assertions about a find in order to classify it using some typology, the semantics of the archaeological process are lost and the real semantic benefits of Linked Data approaches are not realised. We end up with the same simplistic, ‘perfect’, idealised record of reality commonly found in many archaeological information systems.
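The contrast described above can be made concrete with a short Python sketch. Again this is a hedged illustration: the names (`FlatFind`, `Classification`, `ClassifiedFind`), the typology labels and the example data are all invented for the purpose and do not correspond to real CRM EH identifiers or any published typology.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FlatFind:
    """Attribute-only record: the type is stored, the reasoning is not."""
    label: str
    find_type: str


@dataclass
class Classification:
    """An explicit assertion: who assigned which type, using which typology."""
    assigned_type: str
    typology: str
    classified_by: str
    evidence: str


@dataclass
class ClassifiedFind:
    """A find whose type is the outcome of recorded classification events."""
    label: str
    classifications: List[Classification] = field(default_factory=list)

    def current_type(self) -> Optional[str]:
        # The most recent assertion wins, but earlier ones are retained.
        if not self.classifications:
            return None
        return self.classifications[-1].assigned_type

    def needs_review(self, superseded_typology: str) -> bool:
        """True if the latest classification relies on a superseded typology."""
        if not self.classifications:
            return False
        return self.classifications[-1].typology == superseded_typology


# Illustrative data only.
flat = FlatFind("sherd 42", find_type="Type A jar")

rich = ClassifiedFind("sherd 42")
rich.classifications.append(
    Classification("Type A jar", "Typology v1", "finds specialist", "rim form")
)

print(flat.find_type, rich.current_type())
print(rich.needs_review("Typology v1"))
```

Both records give the same type, but only the second can say why the find was classified that way, or flag itself for review when the typology it relied on is revised.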
The second issue, the notion that there is little audience for richer data models and resources, has cropped up in numerous seminars, workshops and conference sessions I have attended and participated in. It is certainly true that many resources to date have not captured the richness of archaeological data and as such it is generally impossible to know why a find was classified in a particular way or which strands of evidence were used in the creation of the overall published narrative. But that is not to suggest that if our data models become more descriptive, this data will not be used. I have had as many, if not more, discussions of potential and desirable use cases for ways in which we can do more with such resources. These range from better understandings of the archaeological process, which would tie research frameworks to fieldwork to synthesis in more meaningful ways, through to managing change and the propagation of paradigm shifts, such as revised typologies, through to fieldwork datasets and heritage inventories. Knowing how we come to form and then manage our shared knowledge base in the digital world is essential, given the highly theoretical, subjective, multi-vocal and often contradictory nature of archaeological discourse and scholarship.
Taking this to its logical conclusion would involve implementing working archaeological information systems which incorporate some of these principles. One example would be moving to a situation where HERs become responsible solely for their own data, as does English Heritage, with all of them publishing rich Linked Data which together forms a coherent national record, but without the current resource-hungry problems relating to transfer and interchange of data and all the redundancy, duplication and inconsistency that entails. For a fuller discussion of these themes, see Cripps, 2013.
So I think my research area has a good deal of promise and I hope to be able to demonstrate some practical outputs over the coming months. I will blog here as I go along. Do check back for updates.
Cripps, Paul. 2013. “Places, People, Events and Stuff; Building Blocks for Archaeological Information Systems.” In Proceedings of the Computer Applications in Archaeology Conference, 2012.
Isaksen, Leif. 2011. “Archaeology and the Semantic Web.” PhD Thesis.
Kamensky, John. 2011. “Managing the Complicated vs the Complex.” IBM Center for the Business of Government.