Postgraduate Researchers Presentation Day – 2nd prize for best oral presentation
I am very pleased to report that I won an award for my presentation on the GSTAR project at the Postgraduate Researchers Presentation Day, held on 15th May 2015 at the University of South Wales Postgraduate Research Centre, Trefforest Campus.
The award was for 2nd place best oral presentation. Very happymaking.
It was also really good to have some positive feedback from professors and peers outside of my specific area of research. Next stop, thesis, viva and (hopefully) award of degree next year – fingers crossed!
Following on from my presentation at CAA2014 in Paris, I was invited to submit a paper to a session at CAA2015 covering Linked Data (LD) and focussing on the difference between being theoretically interoperable and interoperating in practice.
The session abstract is as follows:
Linked Data and Semantic Web based approaches to data management have now become commonplace in the field of heritage. So commonplace in fact, that despite frequent mention in digital literature, and a growing familiarity with concepts such as URIs and RDF across the domain, it is starting to see fall off in Computer Science conferences and journals as many of the purely technical issues are seen to be ‘solved’. So is the revolution over? We propose that until the benefits of Linked Data are seen in real interconnections between independent systems it will not properly have begun. This session will discuss the socio-technical challenges required to build a concrete Semantic Web in the heritage sector.
We particularly invite papers that offer practical approaches and experience relating to:
Interface development and user support for ingestion, annotation and consumption
Management, publication and sustainability of Linked Data resources
Building cross and inter-domain Linked Data communities
Processes for establishing usage conventions of specific terms, vocabularies and ontologies
Alignment processes for overlapping vocabularies
Engage non-technical users with adopting semantic technologies
Licensing and acknowledgment in distributed systems (especially those across multiple legal jurisdictions)
Incorporation within other software paradigms: TEI, GIS, plain text, imaging software, VR, etc.
Access implications of integrating open and private content
Mapping the Field – what components are now properly in place? What remains to be done?
Papers should try to provide evidence of proposed approaches in use across multiple systems wherever possible. Purely theoretical papers and those dealing solely with a single data system are explicitly out of scope for this session.
Keywords: Linked Data, Semantic Web, Web Science
The abstract for the paper is as follows:
From interoperable to interoperating Geosemantic resources; practical examples of producing and using Linked Geospatial Data
Paul Cripps, University of South Wales (firstname.lastname@example.org);
Douglas Tudhope, University of South Wales (email@example.com)
The concept of using geospatial information within Semantic Web and Linked Data environments is not new. For example, geospatial information was very much at the heart of the CRMEH archaeological extension to the CIDOC CRM a decade ago (Cripps et al. 2004) although this was not implemented; a review of the situation regarding geosemantics in 2005 commented “the semantic web is not ready to provide the expressiveness in terms of rules and language for geospatial application” (O’Dea et al. 2005 p.73). It is only recently that Linked Geospatial Data has begun to become a reality through works such as GeoSPARQL (Perry & Herring 2012; Battle & Kolas 2012), a W3C/OGC standard, and the emerging CRMgeo standard (Doerr & Hiebel 2013). This paper presents some real world, practical examples of creating and working with archaeological geosemantic resources using currently available standards and Open Source tools.
The first example demonstrates a lightweight mapping between the CRMEH, CIDOC CRM and GeoSPARQL ontologies using data available from the Archaeology Data Service (ADS) digital archive and Linked Data repository. The second example demonstrates the use of Ordnance Survey (OS) Open Data within a Linked Data resource published via the ADS Linked Data repository. Both examples feature the use of Open Source tools including the STELLAR toolkit, Open Refine, Parliament, OS OpenSpace API and custom components developed and released under open license.
The first example will also be placed in the context of the GSTAR project which is using the approaches described to produce Linked Geospatial Data for research purposes from commonly used platforms for managing archaeological resources within the UK heritage sector. These include the Historic Buildings and Sites and Monuments Record (HBSMR) software from exeGesIS, used by UK Historic Environment Records (HERs), and MODES, used by museums for managing museum collections. As such, the outputs from the GSTAR project have wider applicability in moving geosemantic information from interoperable to interoperating in the UK.
Battle, R. & Kolas, D., 2012. GeoSPARQL: Enabling a Geospatial Semantic Web. Semantic Web Journal, 0(0), pp.1–17.
Cripps, P. et al., 2004. Ontological Modelling of the work of the Centre for Archaeology, Heraklion. Available at: http://cidoc.ics.forth.gr/docs/Ontological_Modelling_Project_Report_ Sep2004.pdf.
Doerr, M. & Hiebel, G., 2013. CRMgeo : Linking the CIDOC CRM to GeoSPARQL through a Spatiotemporal Refinement, Heraklion.
O’Dea, D., Geoghegan, S. & Ekins, C., 2005. Dealing with geospatial information in the semantic web. In AOW ’05 Proceedings of the 2005 Australasian Ontology Workshop – Volume 58. pp. 69–73. Available at: http://dl.acm.org/citation.cfm?id=1151945 [Accessed April 13, 2013].
Perry, M. & Herring, J., 2012. OGC GeoSPARQL – A Geographic Query Language for RDF Data, Available at: http://www.opengis.net/doc/IS/geosparql/1.0.
GSTAR @ CAA: From interoperable to Interoperating
My work on the GSTAR project addresses exactly the issues raised in the session abstract through an investigation of the application of Linked Geospatial Data (LGD) and semantic web techniques for archaeological research purposes. This investigation builds on conceptual structures such as the CIDOC CRM, CRMEH and GeoSPARQL, incorporating real world archaeological data from a range of sources through to providing working technology demonstrators. This was illustrated through the use of case studies based on my research and also through my work on projects such as the Colonisation of Britain project, undertaken for Wessex Archaeology, and the Later Silbury project, being undertaken for Historic England; the former resulted in a Linked Data resource now online at the Archaeology Data Service and the latter will do too upon completion.
Focussing on producing and then using LGD, my talk looked at the background to my research, the methods, techniques and tools used and some of the pitfalls and successes in creating interoperating LGD resources. I had hoped to be a bit further ahead and be able to demonstrate some map based querying and visualisation in action but at the time, these elements were not ready and interacting with a SPARQL endpoint is hardly the most audience grabbing activity! The research was also put into the context of the broader historic environment sector in England by showing how interoperating geosemantic resources could form the backbone of a digital ecosystem to support research, management and development control functions for a broad range of sectoral user groups.
Since returning from Siena, work has proceeded apace to finalise the geosemantic resource ready for the next phase of activity; taking real world archaeological research questions and expressing these using GeoSPARQL queries to demonstrate the way in which such resources can be used for research purposes. Part of this involves engaging domain experts to see where and how their research interests can be elucidated through the applications of such approaches (more on this here).
Ultimately, the querying and results visualisation components will be housed in a web based interface, hiding the complexity of SPARQL endpoints, to demonstrate how geosemantic resources can underpin user focussed research tools such as Virtual Learning Environments. Whilst it would have been nice to present more of this at CAA2015, the plan is now to complete this phase over the summer ready for a fully fledged demonstration of the whole (completed and submitted) research project at CAA2016 in Oslo.
The main focus of the GSTAR project is to investigate the use of geosemantic technologies for archaeological research purposes. To this end, a geosemantic resource has been created from a range of sources and the next step is to express real world archaeological research questions in the form of queries which can be actioned on this resource. Whilst I have my own ideas regarding interesting research questions for my study area, in order to engage with the broader research community and draw on their extensive experience and knowledge, I will be taking GSTAR on the road tomorrow, giving an overview of the project to the Avebury and Stonehenge Archaeological and Historical Research Group so as to be able to pick their brains about potential areas of archaeological research which may be interesting and fruitful to explore.
To date, such exploration may well have been hindered by the usual silo based storage of archaeological information compounded by semantic inconsistencies between data sources. Not to mention the lack of spatial integration or integration at different scales of recording. Having a wealth of interoperable data and tools with which to explore and analyse it may just open doors previously locked and barred (or at least firmly jammed shut).
Of course, the primary aim of the project is to investigate the technologies but, being an archaeologist, I would like to engage with archaeological discourse in addition to focussing on matters technological. After all, for the technology demonstrators to be successful, they really need to be able to support real world topics rather than just using simple use cases solely to demonstrate some theoretical situations; far too many demonstrators and exemplars are based around simplistic scenarios which obviously work but are far removed from any real world applications. We know already that the technologies underpinning the GSTAR project work ‘in the lab’ but can they be applied successfully in a complex subject domain such as archaeology for complex use cases such as those presented by archaeological research processes?
I’ve been using Mendeley now for a long time and, as one of their advisors, I am a keen advocate of the platform. It makes my life so much easier by managing my references and my PDF collection, gathering references from online resources, offering mobile app support (I use Scholarley until an official app emerges) and providing a very neat plugin for MS Word to add and format citations.
RefMe – The free web and mobile tool to generate citations, reference lists and bibliographies
But when it comes to hard copy, there is no other solution than to manually create an entry in Mendeley. Till now. I’ve signed up with RefMe which has a handy mobile app which can scan bar codes on published works and generate references automatically. Even better, it can then output these references to Mendeley. RefMe offers a whole bunch of other functionality too but for me, I don’t need another reference manager. Being able to generate references in my Mendeley database using the tools RefMe provides is, however, just plain brilliant. So, workflow is now:
Wave phone at source material (book, journal, etc)
Update Mendeley from RefMe (one click)
Make and drink tea whilst writing and citing from within Word
To be honest, most of my work relating to my current research makes use of digital resources obtained through library subscriptions to journals but where this really comes into its own is when I need to reference my own library, the collection of books and journals amassed over the many years I’ve been working in archaeology. I just need to spend an afternoon cataloguing the lot now…
One of the outputs from the Pilot Study was an approach to working with geospatial data within the broader framework provided by the CIDOC CRM ontology and the CRMEH archaeological extension. Whilst there is ongoing work by myself and others to add archaeological and spatio-temporal components directly to the CIDOC CRM, for the purposes of the GSTAR project, a lightweight approach has been developed and deployed to suit the needs of the project; CRMEH already adds archaeological excavation capabilities and the spatial extension presented here gives a range of geospatial capabilities, as provided by a mapping to GeoSPARQL.
For the purposes of the GSTAR project, there is a need to be able to incorporate into semantic resources rich geospatial data representing depictions of archaeological features, sites and monuments, also boundaries of activities and events plus locations where objects were discovered. Whilst the CRMEH was developed with spatial information at its core, this has not, to date, been formally expressed. This is now possible using GeoSPARQL.
During the early stages of the GSTAR project, related work became apparent, notably two extensions to the CIDOC CRM (of which CRMEH is itself an extension) pertaining to spatio-temporal information (CRMgeo) and archaeological excavation information (CRMarchaeo). These will ultimately offer greater research potential but the mapping presented here can be seen as an example of a simple, lightweight solution targeting the ‘low hanging fruit’ so often talked about with respect to ontologies and Linked Data; a mapping which meets the needs of the GSTAR project, retains compatibility with the CIDOC CRM and GeoSPARQL standards and provides core geospatial functionality for CRMEH albeit without the reasoning power (and associated complexity) of the two aforementioned extensions.
Application within GSTAR
The semantic resource will be used through the Case Studies planned for the GSTAR project to investigate the use of geosemantic tools for archaeological research. Two of these are focussing on the integration aspect, looking at what I have defined as ‘horizontal’ and ‘vertical’ integration using the spatial components of source data. Horizontal integration refers to linkages between inventories, ie from site finds inventories to museum object inventories to sites and monuments inventories. Vertical integration refers to linkages between primary and derived data, from fieldwork databases containing records of features and finds up to inventories of higher order data objects containing records derived from these primary observations.
Related Work I – CRMgeo
Work is ongoing to produce a spatio-temporal model through integration of GeoSPARQL and CIDOC CRM: the CRMgeo extension, currently in draft form. This promises to be an incredibly powerful resource capable of advanced spatio-temporal description and reasoning.
The work is described as follows:
CRMgeo is an extension for the CIDOC CRM to provide an “articulation” (linkage) between the standards of the geospatial and the cultural heritage community in particular between GeoSPARQL and CIDOC CRM. The model was developed from the analysis of the epistemological processes of defining, using and determining places. This means that we analyzed how a question, such as “is this the place of the Varus Battle” or “is this the place where Lord Nelson died”, can be verified or falsified, including geometric specifications. Consequently, we reached at a detailed model which seems to give a complete account of all practical components necessary to verify such a question, in agreement with the laws of physics, the practice of geometric measurement and archaeological reasoning. This model indeed appears to have the capability to link both ontologies and shows the way how to correctly reconcile data at any scale and time – not by inventing precision or truth that cannot be acquired, but by quantifying or delimiting the inherent indeterminacies, as it is good practice in natural sciences. This model aims at being a comprehensive theory from which mutually compatible simplification can be derived for implementations in more constraint environment, such at those lacking moving frames.
Related Work II – CRMarchaeo
Similarly, work is ongoing to produce an archaeological excavation model: the CRMarchaeo extension, currently in draft form. This promises to support description of and reasoning about archaeological excavation information from a range of recording methodologies.
This project is described as follows:
CRMarchaeo is an ontology and RDF Schema to encode metadata about the archaeological excavation process.
The goal of this model is to provide the means to document excavations in such a way that the following functionality is supported:
Maximize interpretation capability after excavation or to continue excavation
Reason of excavation (goals). What is the archaeological question?
Possibility of knowledge revision after excavation
Comparing previous excavations on same site (space)
All kinds of comprehensive statistical studies (“collective behavior”)
My contribution to CRMarchaeo is running in parallel to my work on the CRMEH. Whilst ultimately there will need to be some decisions as to which extension to use for new projects and resources, there is currently a fair amount of data out in the wild which uses CRMEH and, at least until CRMarchaeo is finalised and probably longer, there will be some co-existence of these two complementary models. After all, the two models are very much related and oversight has been maintained to ensure a good degree of compatibility between them.
A lightweight mapping
A decision was made to create a lightweight mapping of CRMEH to GeoSPARQL rather than implement a combination of CRMarchaeo and CRMgeo for three main reasons:
Firstly, these extensions are centred on the core CIDOC CRM ontology rather than the CRMEH extension. As CRMEH is being used for the GSTAR project, their use would have required a mapping process anyway to ensure compliance.
Secondly, both of these ‘emerging’ standards are currently in draft form, in the process of being finalised and formally adopted. As such, they are not yet fixed and remain subject to review, improvement and change. Some components in particular still require more work before completion.
Finally, the degree to which the advanced features offered by these extensions could be made use of through the GSTAR project is uncertain. A lightweight mapping can be seen as an 80% or 90% solution, covering most eventualities and avoiding the overheads associated with the rather more complex extensions whilst retaining overall compatibility.
The key spatial components needed are already present in CRMEH. There are two main components covering excavation data: the Context (aka Stratigraphic Unit; the atomic unit of archaeological recording) and the ContextDepiction (a depiction of the Stratigraphic Unit, typically a polygon shown in plan view). A Context is related to a ContextDepiction through the property Depicts / Is Depicted By with a Context being depicted by one or more depictions.
These extend from the core CIDOC CRM: the CRMEH class Context (EH0007) is a subclass of Place (E53) whilst ContextDepiction (EH0022) is a subclass of Place Appellation (E44). In GeoSPARQL, there are also two classes to describe spatial information, with Features having some representation in the form of Geometry. There is a good alignment here between the CRMEH classes (or indeed the parent CIDOC CRM classes) and the GeoSPARQL classes, allowing the ontologies to be linked as described in the GeoSPARQL User Guide written by Dave Kolas and Robert Battle.
This is illustrated in the following diagram:
Alignment of CRMEH classes and properties with GeoSPARQL classes and properties
As shown in the diagram, the rdfs:subClassOf, rdfs:subPropertyOf and rdf:type (‘is a’) relationships can be used to link the two ontologies. This maps the necessary classes and also allows instances of ContextDepictions to behave as Simple Features as used within GeoSPARQL.
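As a rough illustration of what such a linkage might look like when written out, here is a minimal Python sketch that emits the class and property alignments as Turtle. The `crmeh:` namespace URI and the local names `Context`, `ContextDepiction` and `isDepictedBy` are placeholders assumed for the example, not the published CRMEH identifiers, and the choice of `geo:Feature`, `geo:Geometry` and `geo:hasGeometry` as targets is one plausible reading of the alignment rather than a copy of the diagram.

```python
# Prefixes used in the output. The geo: and rdfs: URIs are the standard ones;
# the crmeh: URI is a hypothetical placeholder, NOT the published CRMEH namespace.
PREFIXES = {
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "geo": "http://www.opengis.net/ont/geosparql#",
    "crmeh": "http://example.org/crmeh#",
}

# Assumed alignment, stated once per resource: (subject, predicate, object).
# The crmeh local names are illustrative only.
MAPPING = [
    ("crmeh:Context", "rdfs:subClassOf", "geo:Feature"),
    ("crmeh:ContextDepiction", "rdfs:subClassOf", "geo:Geometry"),
    ("crmeh:isDepictedBy", "rdfs:subPropertyOf", "geo:hasGeometry"),
]


def mapping_to_turtle(prefixes, triples):
    """Serialise prefix declarations and mapping triples as Turtle text."""
    lines = [f"@prefix {name}: <{uri}> ." for name, uri in prefixes.items()]
    lines.append("")  # blank line between prefixes and statements
    lines.extend(f"{s} {p} {o} ." for s, p, o in triples)
    return "\n".join(lines) + "\n"


MAPPING_TURTLE = mapping_to_turtle(PREFIXES, MAPPING)
print(MAPPING_TURTLE)
```

Because the alignment is only three statements, it can be loaded once alongside the data; a triple store with RDFS inference enabled can then treat each Context as a GeoSPARQL Feature without the source data being rewritten.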
From mapping to RDF
This mapping allows Contexts to be depicted by one or more pieces of geometry, each instance of a ContextDepiction making use of an OGC Simple Features type (Point, Line, Polygon, etc) and represented using one of the standard formats, in this case WKT.
The mapping can also be applied at the broader CIDOC CRM level and inherited by the CRMEH (and other) classes and properties if this is advantageous.
With respect to data, resources can be created very simply by adding the class inheritance relationships once to a given resource then creating appropriate assertions relating to the ContextDepiction. This means in practice, GIS data can be converted very easily using a variety of tools (eg StringTemplate, Java(script), Python, VB or even Microsoft Excel) to produce suitable RDF of whatever flavour (ntriples, turtle, etc) ready for ingestion into a triple store.
An example of this for a single ContextDepiction is shown below:
In the example (above), namespaces are shown in blue, subproperty/subclass relationships to be defined once in green and the block of RDF to be used for each ContextDepiction in red. NB The GSTAR namespace houses a WISSKI installation, currently not configured and acting as a placeholder only; the URIs do not resolve.
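To give a flavour of the conversion step in Python, the following is a minimal sketch, not the tooling used for the Pilot Study. It turns a list of coordinates into a WKT polygon and wraps it in the per-depiction block of Turtle. The base URI, the `crmeh:` local names and the record identifiers are all hypothetical, and the `crmeh:`, `geo:` and `sf:` prefixes are assumed to be declared once elsewhere in the output file.

```python
def wkt_polygon(coords, crs_uri=None):
    """Build a WKT POLYGON from (x, y) pairs, closing the ring if needed.

    If crs_uri is given it is prepended in the GeoSPARQL wktLiteral style;
    without it, GeoSPARQL assumes CRS84 (longitude, latitude).
    """
    ring = list(coords)
    if ring[0] != ring[-1]:
        ring.append(ring[0])  # WKT rings must end where they start
    body = ", ".join(f"{x} {y}" for x, y in ring)
    wkt = f"POLYGON(({body}))"
    return f"<{crs_uri}> {wkt}" if crs_uri else wkt


def depiction_to_turtle(context_id, depiction_id, coords,
                        base="http://example.org/gstar/", crs_uri=None):
    """Emit the block of Turtle for one Context and one ContextDepiction.

    The base URI and the crmeh: names are placeholders for illustration.
    """
    wkt = wkt_polygon(coords, crs_uri)
    context = f"<{base}context/{context_id}>"
    depiction = f"<{base}depiction/{depiction_id}>"
    return (
        f"{context} a crmeh:Context ;\n"
        f"    crmeh:isDepictedBy {depiction} .\n"
        f"{depiction} a crmeh:ContextDepiction, sf:Polygon ;\n"
        f'    geo:asWKT "{wkt}"^^geo:wktLiteral .\n'
    )


# A made-up rectangular context, purely for demonstration.
example = depiction_to_turtle("1001", "1001-plan", [(0, 0), (2, 0), (2, 1), (0, 1)])
print(example)
```

For data recorded on the British National Grid, passing `crs_uri="http://www.opengis.net/def/crs/EPSG/0/27700"` would tag the literal with the appropriate coordinate reference system.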
A working system
The system used for the Pilot Study took GIS data from the ADS archives, processed it as above and loaded it alongside the CRMEH RDF encoding and Erlangen CIDOC CRM RDF encoding. This was then successfully tested using a range of OGC spatial operators, SPARQL and GeoSPARQL queries.
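By way of illustration only, a query of the kind tested might be assembled as in the Python sketch below. It finds Contexts whose depiction falls within a given search area using the GeoSPARQL `geof:sfWithin` function. The `crmeh:` namespace and local names are hypothetical placeholders, and this is not one of the actual queries run during the Pilot Study.

```python
def contexts_within(wkt_area):
    """Return a GeoSPARQL query selecting Contexts depicted within wkt_area.

    wkt_area is a WKT string such as 'POLYGON((0 0, 10 0, 10 10, 0 10, 0 0))'.
    The crmeh: prefix and its local names are placeholders for illustration.
    """
    return f"""PREFIX geo: <http://www.opengis.net/ont/geosparql#>
PREFIX geof: <http://www.opengis.net/def/function/geosparql/>
PREFIX crmeh: <http://example.org/crmeh#>

SELECT ?context ?wkt WHERE {{
  ?context a crmeh:Context ;
           crmeh:isDepictedBy ?depiction .
  ?depiction geo:asWKT ?wkt .
  FILTER(geof:sfWithin(?wkt, "{wkt_area}"^^geo:wktLiteral))
}}"""


query = contexts_within("POLYGON((0 0, 10 0, 10 10, 0 10, 0 0))")
print(query)
```

The resulting string can be pasted into the query form of any GeoSPARQL-aware endpoint; swapping `geof:sfWithin` for another Simple Features function such as `geof:sfIntersects` changes the spatial test without altering the rest of the pattern.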
For a fuller account of this, please see the Transfer Report when that is published more widely. Or wait a while longer for the thesis (due 2016).
In summary, whilst emerging standards building on the CIDOC CRM covering geospatial and archaeological excavation are forthcoming, a simple, lightweight approach can also be deployed for this use case to give a good range of functionality without the complexity, albeit sacrificing some semantic richness.
The mapping described here could also be applied directly to the parent CIDOC CRM classes/properties (rather than the child CRMEH classes/properties used here) to give a more generic linkage to GeoSPARQL suitable for use in a broader range of cases.