On Thursday 24th April, I gave a presentation on my PhD research project (GSTAR) at the 2014 Computer Applications and Quantitative Methods in Archaeology (CAA) conference in Paris, France. The presentation formed part of session S07, Ontologies and standards for improving interoperability of archaeological data: from models towards practical experiences in various contexts, organised by Anne-Violaine Szabados, Katell Briatte, Maria Emilia Masci and Christophe Tufféry. Reinhard Foertsch and Sebastian Rahtz chaired the session.
The abstract describes the talk, which covered work to date in the first year of the project:
Much work has been undertaken over the past decade relating to the application of semantic approaches to archaeological data resources, notably by English Heritage and the University of South Wales. These two organisations, over the course of a number of projects, developed an archaeological extension to the CIDOC CRM ontology through the Ontological Modelling Project (Cripps & May, 2010), then applied this to a number of archaeological resources through the subsequent STAR project (May, Binding and Tudhope, 2011), implementing tools to facilitate integration of other resources through the STELLAR project (May, Binding, Tudhope, & Jeffrey, 2012), and now, in partnership with the Bespoke HER User Group, RCAHMS, RCAHMW and Wessex Archaeology, are implementing SKOS based vocabularies and associated tools to enable the augmentation of these semantic resources through the SENESCHAL project.
From the outset, it was observed that the spatial component of archaeological data would be a key element, archaeological data being inherently spatial in nature. To date, most applications of spatial semantics in the heritage sector have focussed on place names and named locations for sites, monuments and object provenances. The GSTAR project aims to extend semantic approaches to archaeological data fully into the geospatial domain, focussing instead on the detailed spatial data emerging from archaeological excavation and survey work, and is investigating approaches for the creation, use, management and dissemination of such spatial data within a geosemantic framework, building on the CIDOC CRM, with particular reference to the sharing and integration of disparate resources.
This paper will present work to date in the first year of the GSTAR project. This has been centred on the identification of suitable platforms and methods for the integration of semantic and geospatial data including comparisons of different approaches emerging from the Semantic Web and Geospatial research communities. Testing and prototyping has been accomplished using sample data from the Archaeology Data Service, making use of available geospatial and (geo)semantic tools, both FOSS and commercial.
Cripps, P. and K. May 2010. To OO or not to OO? Revelations from Ontological Modelling of an Archaeological Information System, in: Nicolucci, F. and S. Hermon (eds.), Beyond the Artifact. Digital Interpretation of the Past. Proceedings of CAA2004, Prato 13–17 April 2004. Archaeolingua, Budapest, pp. 59-63.
May, K., C. Binding and D. Tudhope 2011. A STAR is Born: Some Emerging Semantic Technologies for Archaeological Resources, in: Jerem, E., F. Redő and V. Szeverényi (eds.), On the Road to Reconstructing the Past. Computer Applications and Quantitative Methods in Archaeology (CAA). Proceedings of the 36th International Conference. Budapest, April 2-6, 2008. Archaeolingua, Budapest, pp. 111-116 (CD-ROM 402-408).
May, K., C. Binding, D. Tudhope and S. Jeffrey 2012. Semantic Technologies Enhancing Links and Linked Data for Archaeological Resources, in: Zhou, M., I. Romanowska, Z. Wu, P. Xu and P. Verhagen (eds.), Revive the Past. Computer Applications and Quantitative Methods in Archaeology (CAA). Proceedings of the 39th International Conference, Beijing, April 12-16, 2011. Pallas Publications, Amsterdam, pp. 261-272.
The presentation is available on Slideshare.
The presentation also prompted some positive comments on Twitter, which was lovely:
Next it's @pauljcripps on 'Geosemantic Tools for Archaeological Research' (GeoSPARQL). Looking forward to this! #caa2014
I’ll be talking about geospatial topics relating to historic environment information management at this seminar on 14th May. Another classic title for the event, following up on the successful NACHOS seminar. Watch this space for details of the forthcoming Burritos workshop…
More seriously, the event is described as:
On 14 May 2014 the Council for British Archaeology (CBA) is hosting a one day seminar on behalf of FISH and HEIRNET at the University of York to discuss common issues facing the historic environment information sector and make progress towards a shared vision and agenda for historic environment information management.
The key aims of the seminar are to:
Encourage discussion between different groups that produce and manage historic environment information from across the sector (professional, research and voluntary) to identify common goals and issues
Develop information sharing networks and working partnerships across the sector to pool resources in the areas of skills development and application of information technology
The TACOS keynotes, discussions and demonstrations will build upon a ‘show and tell’ event (the NACHOS seminar) held at the British Museum in November 2012, which identified the need for integration of information sources in support of the National Heritage Protection Plan (NHPP). The seminar will investigate current historic environment information management practices and identify areas for improvement through cross-sector collaboration, organised around three overarching themes:
Use of information and reuse of data (e.g. ‘Big Data’ projects reusing historic environment information/datasets, the role of information standards, the integration of different types of historic environment information and built heritage information)
Skills development (e.g. skill gaps in professional practice, university provision)
Use of new information systems and technology (e.g. access to information and technology, how skills development and training is accessed – potential barriers)
I’ll be talking about my research and some of the opportunities now available for making better use of digital heritage information, particularly geospatial data. Hopefully this will complement the talks by Peter McKeague (RCAHMS), Ceri Binding (University of South Wales) and Dan Pett (PAS) in particular, but will also touch on skills issues being discussed by Kenny Aitchison (Landward Research), Julian Richards (University of York) and Ed Lee (EH). It’s only a fifteen minute talk so I will try to focus on direction, overview and a bit of blue skies thinking; there’s more detail on many of these topics in my various publications.
The talks will be videoed and streamed (where possible) and there will be social media channels too, so do keep an eye on Twitter. My slides will also be on my Slideshare after the event.
LGD14 Barcamp, featuring open plan space and beanbags.
I was very pleased to attend this event co-organised by the World Wide Web Consortium (W3C) through the SmartOpenData project, the Open Geospatial Consortium (OGC), the UK Government (data.gov.uk), the Ordnance Survey (OS) and Google. Hosted by Google Campus London, the two day event comprised presentations, lightning talks and a barcamp, all focussing on the use of geospatial data within the world of Linked Data. It was refreshing to be amongst researchers, users, developers and commercial folk all working in this area; I for one picked up some good ideas to help with my research project and hopefully my contributions were of use.
It was certainly good to bring together the camps working in this area: the geospatial technologists on one side and the web folks on the other (and people like me who have a foot in each camp, as well as limbs in other domains, my primary domain being digital cultural heritage of course). To make this stuff work, it’s going to take both groups working together through their respective consortia, the W3C and OGC.
I noted a number of specific highlights that really inspired and gave me food for thought. Some reinforced my own perceptions and others gave me some new ideas for application to my project. The extensive use of IRC and Twitter combined with fast internet access throughout the event made it possible to discuss and find out more whilst talks were ongoing. The format lent itself to interaction and I was impressed by the amount of progress made in such a short space of time, with new working groups forming and ideas for revisions to standards such as GeoSPARQL forthcoming.
Some of my favourite bits:
Ontologies and Linked Data
The discussion of the relationship of ontologies to Linked Data resources was informative. Whilst there is a tendency in the world of the web to target the low hanging fruit, publish data and sort out issues later, it is my opinion that there needs to be robust semantics within our Linked Data resources; otherwise we have a web of mess rather than semantically interoperable data. I noted a couple of points made by Tim Duffy (British Geological Survey) that resonated here.
Kerry Taylor (Commonwealth Scientific and Industrial Research Organisation) gave some examples of where ontological development can support but also restrict aims, showing how things can go wrong when trying to implement the various standards out there. This is an important point; ontologies need to be simple enough to work with but also suit the domain and applications.
GeoSPARQL and geometries
It was interesting to note that the use of Well Known Text (WKT) within GeoSPARQL can be problematic; hearing Lars G. Svensson (Deutsche Nationalbibliothek) talk about their experiences was reassuring given my experiences over the past few months!
Two crucial issues were raised by Raphaël Troncy (Eurecom), relating firstly to the use of coordinate systems and secondly to the way in which geometries are represented. I have often found the way in which geospatial data is used on the web to be problematic, with web developers focussing solely on location and paying only minimal attention to Coordinate Reference Systems (CRS), Spatial Reference Systems (SRS) or Spatial Reference Identifiers (SRID). In many cases this is an acceptable way of working (if you just want features on maps in roughly the right place), but a lack of clarity regarding spatial frameworks is problematic for any more detailed use of geospatial data. Being explicit about coordinate systems is essential for transforming between them, and allows factors such as tectonic plate movement to be taken into account. Put simply, assuming WGS84 is the only way to reference coordinates is a gross oversimplification.
Secondly, he went on to talk about the implementation of this within GeoSPARQL. The standard does support CRS (a good start) but the implementation is a little complex in my view. He suggested making CRS definitions simply part of the semantic model rather than being fudged into a geometry node as they currently are; a geometry node currently comprises up to three parts, the first being an (optional) SRID, the second being the geometry itself and the third being a literal describing the format of the geometry (eg a WKT or GML literal). It was suggested that these could better be stored as individual assertions relating to a geometry object and this was well received and may well appear in the next version of the standard: hurrah!
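The suggestion can be sketched in a few lines of Python. This is a minimal illustration, not standard machinery: the ex: URIs and the ex:hasCRS property are hypothetical; only the geo:asWKT property, the wktLiteral datatype and the CRS84 default come from GeoSPARQL itself.

```python
# Illustrative sketch: split the composite GeoSPARQL literal
# "<CRS URI> WKT" into an explicit CRS assertion plus a plain geometry.

def split_geometry_literal(literal: str):
    """Split an optional leading CRS URI off a GeoSPARQL WKT literal."""
    if literal.startswith("<"):
        crs_uri, wkt = literal.split(">", 1)
        return crs_uri.lstrip("<"), wkt.strip()
    # No explicit CRS: GeoSPARQL defaults to WGS84 longitude/latitude (CRS84)
    return "http://www.opengis.net/def/crs/OGC/1.3/CRS84", literal.strip()

composite = "<http://www.opengis.net/def/crs/EPSG/0/27700> POINT(400000 150000)"
crs, wkt = split_geometry_literal(composite)

# Under the proposed model these become two separate assertions on the
# geometry node rather than one packed literal (ex:hasCRS is a
# hypothetical property used purely for illustration):
triples = [
    f"ex:geom1 ex:hasCRS <{crs}> .",
    f'ex:geom1 geo:asWKT "{wkt}"^^geo:wktLiteral .',
]
```

Keeping the CRS as its own assertion means a SPARQL query can filter or group geometries by coordinate system without string-parsing every literal.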
great idea from @rtroncy – represent the #CRS / #SRID as an explicit triple rather than as part of the geometry node – YES!!! #lgd14
Vocabulary versioning

A thorny issue if ever there was one. With heritage data in particular, it is important to know the provenance of vocabularies. This topic came up a couple of times and it was pleasing to hear that a lightweight solution exists (a current namespace plus historical, versioned namespaces; a bit clunky but doable) and that versioning can be more fully supported using ontologies designed for the purpose.
A key question with Linked Data is how do you know who is using your data? Does this matter? Arguably not, but as with anything, proper citation and accreditation is useful, polite and can be used to demonstrate impact (a good thing when looking for funding). Turns out that Adam Leadbetter (British Oceanographic Data Centre) and Dicky Allison (Woods Hole Oceanographic Institution) have both been using the Heritage Data vocabularies I blogged about previously, which is great stuff but this only came to light through seeing the inclusion of English Heritage as a provider on one of their slides!
Precision & Accuracy
Important concepts for heritage data are precision and accuracy. When working with historic maps in particular, it is important to be able to record the tolerances against which data has been captured. As with coordinate systems, this is an area often ignored in the world of Linked Data, with coordinates expressed to spurious levels of precision (ten decimal places is a *seriously* precise measurement!) and no metadata to describe overall accuracy. Coming from a geospatial background where these are core items of metadata, I find the lack of proper support for them within current Linked Data standards problematic. It took a speaker working with heritage data to make this point; nice one Rob Warren (Big Data Institute, Dalhousie University).
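To put the "ten decimal places" point in concrete terms, here is a back-of-envelope calculation; the helper function is purely illustrative and assumes one degree of latitude is roughly 111,320 m.

```python
# Rough ground distance represented by the last decimal place of a
# latitude value. One degree of latitude is approximately 111,320 m.

METRES_PER_DEGREE_LAT = 111_320  # approximate

def ground_size_mm(decimal_places: int) -> float:
    """Approximate ground distance (in mm) represented by the last
    decimal place of a latitude expressed in decimal degrees."""
    return 10 ** -decimal_places * METRES_PER_DEGREE_LAT * 1000

# Ten decimal places resolves to roughly a hundredth of a millimetre,
# far beyond anything a survey instrument (let alone a historic map)
# can justify; five decimal places is already around a metre.
tenth_place = ground_size_mm(10)
fifth_place = ground_size_mm(5)
```

Without accompanying accuracy metadata, those trailing digits imply a certainty that simply is not there.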
There was also talk of temporal aspects; most spatial data has some kind of temporal component to it. Interestingly, the data I work with is placed in archaeological time, and rarely do we have absolute temporal data; chronologies are typically relative and imprecise, with only occasional pegs to the temporal classes generally used in Linked Data (timestamps, dates, etc). I think this makes for an interesting area in which to try out ideas, and the way this is represented in cultural heritage ontologies such as the CIDOC CRM, whilst a bit different to the norm, actually encapsulates some very powerful constructs for working with spatio-temporal data.
Last but not least, there was a liberal spread of really cool stuff.
Strabon had a few mentions, with the point made that the GeoKnow report on platforms had evaluated an old version; Strabon is now a very capable and scalable system. Being a semantic spatio-temporal system built from the ground up, rather than adding semantic, spatial and temporal functionality to an existing system, it sounds promising. I will certainly be reviewing it in more detail as a result.
Also, building on the Strabon system comes Sextant. This application is described as:
Sextant is a web-based system for the visualization and exploration of time-evolving linked geospatial data and the creation, sharing, and collaborative editing of ‘temporally-enriched’ thematic maps which are produced by combining different sources of such data and other geospatial information available in standard OGC file formats (e.g., KML).
This looks like a very interesting platform for mapping geosemantic data, one which I will definitely be investigating further.
An absolutely brilliant piece of work was presented by John Goodwin (Ordnance Survey), entitled Rapid Assembly of Geo-centred Linked Data applications (RAGLD). A collaboration between the University of Southampton, Ordnance Survey and Seme4, this project provides a neat suite of developer tools (currently in beta) for working with Linked Geospatial Data. Massive +1 from me!
Another really interesting platform is map4rdf. This is described as:
map4rdf is a mapping and faceted browsing tool for exploring and visualizing RDF datasets enhanced with geometrical information. map4rdf is open source software. Just configure it to use your SPARQL endpoint and provide your users with a nice map-based visualization of your data.
Again, this is one I will be investigating further for my GSTAR project.
Google Campus is just cool. Enough said. I love their displays of historical computer gear and of course the open plan, beanbag-filled working space. I’m really tempted to join up and hang out there more (if only the trains to London didn’t require a mortgage…)
A brilliant event, well organised and some amazing ideas and discussion. Not only that, but an excellent forum for meeting people working in the same subject area; my Twitter peeps grew considerably as a result and I’ve added lots of new folks to my LinkedGeoData list.
Big thanks of course to John Goodwin and Phil Archer for leading on the organisation front.
The first investigation in the GeoSemantic Technologies for Archaeological Research (GSTAR) project is nearing completion: an assessment of approaches to integrating geospatial archaeological data into a semantic framework in order to provide geosemantic capabilities.
The investigation draws on archaeological excavation data lodged with the Archaeology Data Service (ADS) and made available as Linked Data (LD) through the ADS’s Linked Data platform. The data relates to the Cobham Golf Course site and was produced by Oxford Archaeology (OA) as part of the Channel Tunnel Rail Link (CTRL) project then turned into a Linked Data resource through the Semantic Technologies Enhancing Links and Linked data for Archaeological Resources (STELLAR) project, undertaken by the Hypermedia Research Unit at the University of South Wales (USW).
Mapping a feature by Wessex Archaeology
The GSTAR literature review identified two strands of integration approach within the published literature. The first, emerging from the Semantic Web and Linked Data communities, involves the direct inclusion of geospatial data within semantic resources, leveraging World Wide Web Consortium (W3C) standards for the Resource Description Framework (RDF) and Open Geospatial Consortium (OGC) standards for Well Known Text (WKT, part of the Simple Features specification) and GeoSPARQL. The second, emerging from the Geographic Information Science (GISc) community, involves the use of Web Feature Services (WFS) within broader Spatial Data Infrastructures (SDI), running in parallel with and linked to semantic resources.
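The first strand can be illustrated with a simple GeoSPARQL query. This is a hedged sketch rather than a query from the project itself: the geo: and geof: prefixes are the standard GeoSPARQL namespaces, but the data being queried and the search polygon are hypothetical.

```python
# Illustrative construction of a GeoSPARQL query finding features whose
# geometry falls within a supplied search area (WKT polygon).

def make_within_query(search_area_wkt: str) -> str:
    """Build a GeoSPARQL SELECT query using the sfWithin filter function."""
    return f"""
PREFIX geo:  <http://www.opengis.net/ont/geosparql#>
PREFIX geof: <http://www.opengis.net/def/function/geosparql/>

SELECT ?feature WHERE {{
  ?feature geo:hasGeometry ?geom .
  ?geom geo:asWKT ?wkt .
  FILTER(geof:sfWithin(?wkt, "{search_area_wkt}"^^geo:wktLiteral))
}}"""

query = make_within_query("POLYGON((0 0, 0 10, 10 10, 10 0, 0 0))")
```

In this strand the triplestore itself evaluates the spatial filter; in the second strand the equivalent spatial question would be answered by the WFS/SDI side instead.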
This initial GSTAR investigation looked at both strands with a view to assessing suitable modes for subsequent use in the next phases of the GSTAR project. A WISSKI installation has also been set up to allow for the minting of any additional URIs needed.
The EHE0022 class, used to describe depictions, carries the scope note: ‘The Spatial co-ordinates of a Context, defining the actual spatial extent of the context. Usually recorded at the time of excavation or other investigative work.’
Further triples were also added to describe the depiction using the GeoSPARQL ogc:hasGeometry and ogc:asWKT properties.
[code]
<owl:ObjectProperty rdf:about="http://www.opengis.net/ont/geosparql#hasGeometry">
  <rdfs:label xml:lang="en">hasGeometry</rdfs:label>
  <rdfs:comment xml:lang="en">A spatial representation for a given feature.</rdfs:comment>
  <skos:definition xml:lang="en">A spatial representation for a given feature.</skos:definition>
  <dc:description xml:lang="en">A spatial representation for a given feature.</dc:description>
  <dc:creator>OGC GeoSPARQL 1.0 Standard Working Group</dc:creator>
  <rdfs:domain rdf:resource="http://www.opengis.net/ont/geosparql#Feature"/>
  <rdfs:range rdf:resource="http://www.opengis.net/ont/geosparql#Geometry"/>
</owl:ObjectProperty>
[/code]
The OWL definition of the hasGeometry property
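The triples linking a depiction to its geometry via hasGeometry and asWKT, as described above, can be sketched as follows. The depiction and geometry URIs are hypothetical examples; only the properties and datatype come from the GeoSPARQL ontology namespace.

```python
# Illustrative N-Triples-style output linking a depiction to a geometry
# node carrying a WKT serialisation. URIs in the example call below are
# invented placeholders, not real GSTAR resources.

GEO = "http://www.opengis.net/ont/geosparql#"

def depiction_triples(depiction_uri: str, geometry_uri: str, wkt: str):
    """Return the two triples attaching a WKT geometry to a depiction."""
    return [
        f"<{depiction_uri}> <{GEO}hasGeometry> <{geometry_uri}> .",
        f'<{geometry_uri}> <{GEO}asWKT> "{wkt}"^^<{GEO}wktLiteral> .',
    ]

triples = depiction_triples(
    "http://example.org/depiction/1",
    "http://example.org/geom/1",
    "POINT(401500 157200)",
)
```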
GIS Server route
A second approach used the same base platform and data but accessed the geospatial component via WFS provided by GeoServer, drawing on the Oracle database.
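As a rough illustration of this route, a WFS GetFeature request can be composed as a simple URL. The endpoint and layer name below are hypothetical placeholders; the request parameters themselves are standard WFS 1.1.0 key-value pairs.

```python
# Sketch of the second strand: fetching the geospatial component over an
# OGC WFS GetFeature request (e.g. against a GeoServer instance).

from urllib.parse import urlencode

def wfs_getfeature_url(base_url: str, type_name: str, max_features: int = 50) -> str:
    """Compose a WFS 1.1.0 GetFeature URL for the given feature type."""
    params = {
        "service": "WFS",
        "version": "1.1.0",
        "request": "GetFeature",
        "typeName": type_name,
        "maxFeatures": max_features,
    }
    return f"{base_url}?{urlencode(params)}"

# Hypothetical endpoint and layer name, purely for illustration:
url = wfs_getfeature_url("http://localhost:8080/geoserver/wfs", "gstar:contexts")
```

The semantic resource then only needs to hold a link to the feature, with the geometry itself served (and spatially indexed) by the GIS side.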
I gave a talk on my PhD research (the GSTAR project) at the Hestia 2 event in Southampton last Thursday. Given I am still early on in the process, and having been asked to relate my work to the world of commercial archaeology, I decided to follow an overview of my research with some ideas for the future and how Linked Data approaches could be used to overhaul the (painful and convoluted) ways we manage heritage data in the UK.
In the questions and discussion afterwards, a couple of topics that have been raised before came up. Firstly, the idea that the CIDOC CRM and derivatives such as CRM EH are unnecessarily complicated. Secondly, the idea that this serves no useful purpose as very few people, if anyone at all, will be interested in using the richness contained therein.
Complicated and Complex
The first issue is one I have written up in my literature review, but one which requires further investigation. Ontologies such as the CIDOC CRM are indeed complicated and I have been quoted as describing working with them as ‘difficult’ (eg Isaksen, 2011). I do still think, though, that such approaches are unavoidable if we are to adequately represent the complexity of heritage data. A major criticism of computational approaches in archaeology, going back some decades, was that they are inherently reductionist; falling back to overly simplistic data models risks falling foul of the same problems inherent in what are now seen as legacy information systems, which in their day were just as cutting edge as Linked Data approaches are today. There is a big difference between complex and complicated (see eg Kamensky, 2011) and the latter is sometimes essential to describe the former.
Complexity by versionz
The CIDOC CRM and extensions such as CRM EH can be described as complicated, but working with them is quite straightforward once the class structure is understood, especially given the ability to create and use patterns: re-usable blocks which can easily be recycled. And the benefits of using such a rich model to describe complex scenarios far outweigh any disadvantages associated with complicated models and/or implementation difficulties. After all, the world both now and in the past is/was a complex place!
As an example, let’s look at the relationship between a find and the stratigraphic unit in which it was found. In CRM EH terms, we would start with a production event which creates the object, followed ultimately by some kind of deposition event resulting in the find coming to rest in the location where it is eventually recovered, whether by an archaeologist through excavation or by a member of the public through a casual find, metal detecting or by whatever other means.
A much simpler model would simply record a bunch of attributes against a find: a data-centric view of the kind prevalent in relational systems modelled using, eg, entity-relationship approaches. Taking classifications of finds in particular, this simple approach is problematic: without any explicit recording of the process of an archaeologist making assertions about a find in order to classify it against some typology, the semantics of the archaeological process are lost and the real semantic benefits of Linked Data approaches are not realised. We end up with the same simplistic, ‘perfect’, idealised record of reality commonly found in many archaeological information systems.
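The contrast can be made concrete with a toy example, in plain Python rather than CRM EH syntax; all names, identifiers and values here are invented for illustration. The flat record carries the classification as a bare attribute, while the event-based record keeps the classification assertion itself.

```python
# Flat, attribute-style record: no trace of who classified the find,
# when, or against which typology.
flat_record = {
    "find_id": "SF1042",
    "type": "brooch",
    "context": "1003",
}

# Event-based record: the classification is itself a recorded event,
# so the assertion (and its author, date and reference typology) survives.
event_record = {
    "find_id": "SF1042",
    "events": [
        {
            "event": "classification",
            "assigned_type": "brooch",
            "typology": "a named reference typology",  # hypothetical
            "by": "J. Smith",                          # hypothetical
            "date": "2013-07-02",
        },
        {
            "event": "deposition",
            "context": "1003",
        },
    ],
}

# The event-based form can answer "who classified this, and against
# what?"; the flat form cannot.
classifiers = [e["by"] for e in event_record["events"]
               if e["event"] == "classification"]
```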
Audience? by orkomedix
The notion that there is little audience for richer data models and resources has cropped up in numerous seminars, workshops and conference sessions I have attended and participated in. It is certainly true that many resources to date have not captured the richness of archaeological data, and as such it is generally impossible to know why a find was classified in a particular way or which strands of evidence were used in the creation of the overall published narrative. But that is not to suggest that if our data models become more descriptive, this data will not be used. I have had equally as many, if not more, discussions of potential and desirable use cases for doing more with such resources: everything from better understanding of the archaeological process, to tying research frameworks to fieldwork and synthesis in more meaningful ways, through to managing change and propagating paradigm shifts (such as revised typologies) through fieldwork datasets and heritage inventories. Knowing how we come to form and then manage our shared knowledge base in the digital world is essential, given the highly theoretical, subjective, multi-vocal and often contradictory nature of archaeological discourse and scholarship.
Taking this to its logical conclusion would involve implementing working archaeological information systems incorporating some of these principles: for example, moving to a situation where HERs become responsible solely for their own data, as does English Heritage, all of whom publish rich Linked Data which together forms a coherent national record, but without the current resource-hungry problems relating to transfer and interchange of data and all the redundancy, duplication and inconsistency that entails. For a fuller discussion of these themes, see Cripps, 2013.
So I think my research area has a good deal of promise and I hope to be able to demonstrate some practical outputs over the coming months. I will blog here as I go along. Do check back for updates.
Cripps, P. 2013. Places, People, Events and Stuff; Building Blocks for Archaeological Information Systems, in: Proceedings of the Computer Applications and Quantitative Methods in Archaeology (CAA) Conference, 2012.