Tag Archives: SPARQL

GSTAR web services; now with added GeoJSON

Archaeogeomancy: Digital Heritage Specialists – archaeological geomatics – the majick of spatial data in archaeology – archaeological information systems for the digital age:

"A boundary is an array of an array of an array of arrays" by Paul Downey

Following on from the last update concerning the GSTAR web services, the final pieces of infrastructure for the case studies and demonstrator are nearly complete. Building on the API, a GeoJSON output format has been added so that results from GeoSPARQL queries can a) be accessed via a simple URL as with all other outputs and b) visualised using a web map or indeed any platform which can consume GeoJSON. 

I had assumed this last element would be straightforward, after all, plotting results on a map is just one of those things one would expect from some geospatial resource. But a couple of hurdles presented themselves.

1. Well Known Text

Geospatial data modelled using the GeoSPARQL ontology and stored in a triple store such as Parliament typically makes use of either Well Known Text (WKT) or Geography Markup Language (GML) to store geometries. Importantly, as per the GeoSPARQL standard, there is the ability to include a URI for a Coordinate Reference System (CRS) within a geo:wktLiteral node:

Req 10: All RDFS Literals of type geo:wktLiteral shall consist of an optional URI identifying the coordinate reference system followed by Simple Features Well Known Text (WKT) describing a geometric value. Valid geo:wktLiterals are formed by concatenating a valid, absolute URI as defined in [RFC 2396], one or more spaces (Unicode U+0020 character) as a separator, and a WKT string as defined in Simple Features [ISO 19125-1].

If using any of the standard outputs from Parliament via Jena using a SPARQL endpoint, the output includes the data as stored and (as was politely pointed out to me by a couple of eminent GeoJSON folks) embedding WKT within JSON structures is not really the done thing.

Consider me slapped. But the triple store is doing exactly as asked, outputting a JSON version of the query results, whatever form they may be in. So I needed an extra step to produce a proper GeoJSON output in which geometries are represented as JSON arrays rather than WKT.

This inclusion of a CRS URI element in a geo:wktLiteral node is problematic for systems expecting a plain old WKT string in which the first element is the geometry type. The overall node structure of a feature geometry when represented in RDF is as follows, comprising three elements: the CRS URI, the WKT and the datatype.

"<http://www.opengis.net/def/crs/EPSG/0/4326> Point(33.95 -83.38)"^^<http://www.opengis.net/ont/geosparql#wktLiteral>

The final element here is the datatype, recorded using the standard ^^<datatype> notation, as generally used for typed literals; this is handled easily and becomes a datatype node in XML and JSON output formats. As such, this element is not a problem when appended to a WKT string.

{ "datatype": "http://www.opengis.net/ont/geosparql#wktLiteral" , "type": "typed-literal", "value" : "<http://www.opengis.net/def/crs/EPSG/0/4326> Point(33.95 -83.38)"}

More generally speaking, it has been noted by the research community (for example by a number of contributors to the Linked Geospatial Data event I attended) that this compound approach is unhelpful and somewhat at odds with the usual approach to semantic structures in which each assertion is represented as a discrete triple. But currently, this is what we have, so that’s that.

Anyway, the solution was actually rather low tech. One of the few triple stores to natively support GeoJSON output is Strabon, but the triple store selected for use in the GSTAR project is Parliament. So it was necessary to investigate suitable approaches using the platforms in use. A quick look at the Strabon source code (available online under the Mozilla Public License v2.0) revealed an elegantly simple solution. Taking the same example used above, it is clear that there is a pattern to the structure which, with some judicious use of basic Java string functions, can be used to clean up the string and extract the individual components.

Firstly, the EPSG code for the CRS can be extracted by checking whether the string begins with a < character, which indicates that a URI is present. If so, the EPSG code is the last element of the URI, appended to the base URI for the namespace. Secondly, the WKT serialisation itself follows the closing > of the CRS URI. In the example below, the < and > characters are the markers used to get the indices for the substrings, 4326 is the EPSG code and Point(33.95 -83.38) is the WKT geometry.

<http://www.opengis.net/def/crs/EPSG/0/4326> Point(33.95 -83.38)

So this gives a nice plain WKT string which can be processed using GeoTools, with the added benefit of a Proj4 Coordinate Reference System (derived from the EPSG code) which can be used to control any transformations. If there is no CRS element, the input string can be safely returned as is.
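The string handling described above can be sketched roughly as follows. This is a minimal illustration using only the Java standard library, not the actual Sparqlr or Strabon code; the class name WktLiteral and its field names are made up for the example, and the no-CRS case returns the input with surrounding whitespace trimmed.

```java
/** Splits a geo:wktLiteral value into its optional CRS URI, EPSG code and plain WKT. */
final class WktLiteral {
    final String crsUri;   // null when the literal carries no CRS URI
    final String epsgCode; // last path element of the CRS URI, or null
    final String wkt;      // plain WKT, starting with the geometry type

    private WktLiteral(String crsUri, String epsgCode, String wkt) {
        this.crsUri = crsUri;
        this.epsgCode = epsgCode;
        this.wkt = wkt;
    }

    static WktLiteral parse(String value) {
        String trimmed = value.trim();
        // No leading '<' means no CRS URI is present: hand the WKT back unchanged.
        if (!trimmed.startsWith("<")) {
            return new WktLiteral(null, null, trimmed);
        }
        int close = trimmed.indexOf('>');
        if (close < 0) {
            throw new IllegalArgumentException("Unterminated CRS URI: " + value);
        }
        // Everything between '<' and '>' is the CRS URI.
        String uri = trimmed.substring(1, close);
        // The EPSG code is the last element of the URI path.
        String code = uri.substring(uri.lastIndexOf('/') + 1);
        // The WKT follows the closing '>' and the separating space(s).
        String wkt = trimmed.substring(close + 1).trim();
        return new WktLiteral(uri, code, wkt);
    }
}
```

Calling WktLiteral.parse on the example literal gives an epsgCode of 4326 and a wkt of Point(33.95 -83.38), which is the form a WKT reader such as the one in GeoTools expects.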

2. Generating GeoJSON output

Having resolved the WKT issue, the only remaining hurdle was to take the processed WKT geometries and output JSON coordinate arrays within a GeoJSON structure, as per the GeoJSON standard. This was accomplished using Jackson to manipulate the JSON data combined with geojson-jackson to fuse the JSON and GeoJSON elements. This made it relatively easy to take the SPARQL Query Results JSON Format data and transform it into GeoJSON.
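To make the target structure concrete, here is a deliberately simplified sketch of turning a plain WKT point into a GeoJSON Feature. It does not use Jackson or geojson-jackson as the real service does; it handles 2D points only and builds the JSON by hand with the standard library. The class name, method name and the latitudeFirst flag are all assumptions for the example: GeoJSON wants longitude before latitude, and whether the WKT has them the other way round depends on the CRS, so the caller has to say.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Converts a plain 2D WKT point into a GeoJSON Feature string (illustration only). */
final class PointToGeoJson {
    // Matches e.g. "Point(33.95 -83.38)" or "POINT (33.95 -83.38)", ignoring case.
    private static final Pattern POINT = Pattern.compile(
            "^\\s*POINT\\s*\\(\\s*(\\S+)\\s+(\\S+)\\s*\\)\\s*$",
            Pattern.CASE_INSENSITIVE);

    static String toFeature(String wkt, boolean latitudeFirst) {
        Matcher m = POINT.matcher(wkt);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a 2D WKT point: " + wkt);
        }
        // parseDouble rejects anything that is not a number.
        double first = Double.parseDouble(m.group(1));
        double second = Double.parseDouble(m.group(2));
        // GeoJSON positions are [longitude, latitude]; swap if the WKT is lat/long.
        double lon = latitudeFirst ? second : first;
        double lat = latitudeFirst ? first : second;
        return "{\"type\":\"Feature\","
                + "\"geometry\":{\"type\":\"Point\",\"coordinates\":[" + lon + "," + lat + "]},"
                + "\"properties\":{}}";
    }
}
```

For the example point, assuming the coordinates are given latitude first, toFeature("Point(33.95 -83.38)", true) yields a geometry with coordinates [-83.38,33.95]. Lines, polygons and multi-part geometries need nested arrays, which is where a proper WKT reader and a GeoJSON library earn their keep.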

This extra output format has now been added to the Sparqlr API so GeoJSON can be returned for the results of any query containing spatial data.

Results

A final check was to ensure the GeoJSON data is well formed using a suitable viewer, in this case QGIS which has good support for a wide range of OGC standard formats including GeoJSON.

GeoJSON output from Parliament via the Sparqlr API, viewed in QGIS

Next steps

The final pieces of the puzzle are the demonstrator interface and case study queries showing the capabilities of such resources. The former comprises a web based interface including some Seneschal widgets (for selecting controlled vocabulary items to be used in queries) combined with a web map for visualising results and making spatial selections. The latter comprises some interesting prebuilt GeoSPARQL queries based on real world archaeological research questions, visualised using the same web based UI.

Then just need to finish off that thesis…

The post GSTAR web services; now with added GeoJSON appeared first on Archaeogeomancy: Digital Heritage Specialists.

GSTAR Web Services

Web by David Reid

With all the source data prepped and ready to go, the next step is to build some demonstrators to show how such geosemantic resources can be used in practice. Whilst very powerful, a SPARQL endpoint is not the most friendly way of interacting with data resources, especially from within a web based application where options for programming are a bit limited. There is still quite some debate on this topic which will be covered in more detail in the thesis (watch this space; still on track for submission 1st/2nd quarter 2016!) but the approach I have opted for is an API using web services to provide a range of outputs via a combination of URLs and parameters.

API vs Endpoint

The API is currently being finalised but the initial tests are working well, happily providing a range of outputs from the triple store. This approach allows the web service backend to draw on all the resources needed to handle geosemantic data and respond to a range of requests in a form readily digestible by an XHTML+JavaScript web application, including mapping. From a geospatial perspective, it means geospatial data can be requested from the web service in a form ready to be mapped (eg JSON) or used in map popups (eg HTML) rather than having to process large piles of RDF within the browser. It also takes advantage of browser caching for the URL based resources, dramatically improving performance by reducing the need for trips to the server to get data.

SPARQLr – the GSTAR web service

Sparklers by Derek Key

The Sparqlr web service which implements the API is a RESTful service and is being built using Jersey, which is a great platform for such tasks. This talks to the Parliament triple store via Jena with some GeoTools components added to handle spatial data. As the service runs as a Java application on a GlassFish web server, it is possible to make use of the full range of Java tools out there without being limited to what can be achieved within a web browser. And thankfully, much of the code produced previously for the GSTAR Pilot is being recycled! As usual, all development is being undertaken using Eclipse+Maven.

Various queries can be performed in this way, some using basic URL syntax eg http://gstar:8080/sparqlr/api/features to return records about excavated features from archaeological archives as an RDF graph (default) or http://gstar:8080/sparqlr/api/artefacts/ntriple to return records about archaeological objects from museum collections as N-Triples. Parameters are also implemented which can be used to return particular records (eg http://gstar:8080/sparqlr/api/sites?MonUID=MWI11909 to return records about some pits near Stonehenge).

On the geosemantic front, parameters are also being used to pass in geometries as URL-encoded (UTF-8) WKT strings to facilitate spatial searches, with the incoming WKT geometries used within the web service to add GeoSPARQL filters to SPARQL queries (eg http://gstar:8080/sparqlr/api/sites/within?polygon=POLYGON+%28%28569186.11565982+169502.18457639%2C+569186.02731245+169502.23116132%2C+569185.82348375+169502.25234642%2C+569185.70491406+169502.19113299%2C+569185.57672168+169502.04343594%2C+569185.54491719+169501.88381892%2C+569185.64107717+169501.66842781%2C+569185.82308162+169501.55315212%2C+569186.01894577+169501.56512577%2C+569186.19385893+169501.68723228%2C+569186.29291775+169501.91384472%2C+569186.25804717+169502.07848985%2C+569186.11565982+169502.18457639%29%29 to return all sites/monuments within a specified region such as a user generated polygon drawn on a web map).
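As a rough illustration of the round trip, the sketch below URL-encodes a WKT polygon for use as the polygon parameter, decodes it again and wraps it in a GeoSPARQL filter. It uses only the Java standard library and is not the Sparqlr implementation: the class and method names and the ?geom variable are invented for the example, the query is assumed to declare the usual geo: and geof: prefixes, and any CRS URI that the literal might need is left out.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Encodes WKT for a URL parameter and builds a GeoSPARQL "within" filter from it. */
final class WithinQuery {
    /** URL-encodes WKT: spaces become '+', brackets and commas become %XX escapes. */
    static String encodeParam(String wkt) {
        return URLEncoder.encode(wkt, StandardCharsets.UTF_8);
    }

    /** Reverses encodeParam, as the web service would do with the incoming parameter. */
    static String decodeParam(String encoded) {
        return URLDecoder.decode(encoded, StandardCharsets.UTF_8);
    }

    /** Builds a filter keeping only geometries within the given WKT polygon. */
    static String withinFilter(String geometryVar, String wkt) {
        // Refuse characters that could break out of the quoted literal.
        if (wkt.indexOf('"') >= 0 || wkt.indexOf('\\') >= 0 || wkt.indexOf('\n') >= 0) {
            throw new IllegalArgumentException("Unexpected character in WKT");
        }
        return "FILTER(geof:sfWithin(?" + geometryVar + ", \"" + wkt + "\"^^geo:wktLiteral))";
    }
}
```

Encoding POLYGON ((1 2, 3 4, 5 6, 1 2)) gives POLYGON+%28%281+2%2C+3+4%2C+5+6%2C+1+2%29%29, the same shape as the parameter in the URL above. Checking the decoded string before splicing it into a query matters because the parameter comes straight from the browser.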

The bulk of this is now up and running with the next step being to build the web application. This will involve the construction of a web map using OpenLayers to present results and facilitate user input (eg to capture polygonal search areas or points with buffer distances; the kind of spatial searches typically used within web based GIS).

The post GSTAR Web Services appeared first on Archaeogeomancy: Digital Heritage Specialists.

Linked Data: From interoperable to interoperating

Piazza Mercato, Siena

Videos of all the presentations in this CAA session, held in Siena 2015, which I blogged about earlier. Full credit and thanks due to Doug Rocks-Macqueen and his Recording Archaeology project for recording and making this and other sessions available (see also the session on ArchaeoFOSS and the keynotes). Thanks also to Leif Isaksen and Keith May for organising and chairing the session.

The session outline:

Linked Data and Semantic Web based approaches to data management have now become commonplace in the field of heritage. So commonplace in fact, that despite frequent mention in digital literature, and a growing familiarity with concepts such as URIs and RDF across the domain, it is starting to see fall off in Computer Science conferences and journals as many of the purely technical issues are seen to be ‘solved’. So is the revolution over? We propose that until the benefits of Linked Data are seen in real interconnections between independent systems it will not properly have begun. This session will discuss the socio-technical challenges required to build a concrete Semantic Web in the heritage sector.

The videos for the accepted papers:

  • The Syrian Heritage Project in the IT infrastructure of the German Archaeological Institute. Philipp Gerth, Sebastian Cuy (video)
  • Using CIDOC CRM for dynamically querying ArSol, a relational database, from the semantic web. Olivier Marlet, Stéphane Curet, Xavier Rodier, Béatrice Bouchou-Markhoff (video)
  • How to move from Relational to Linked Open Data 5 Star – a numismatic example. Karsten Tolle, David Wigg-Wolf (video)
  • The Labeling System: A bottom-up approach for enriched vocabularies in the humanities. Florian Thiery, Thomas Engel (video)
  • From interoperable to interoperating Geosemantic resources. Paul J Cripps, Douglas Tudhope (video)

The playlist for the session:

The post Linked Data: From interoperable to interoperating appeared first on Archaeogeomancy: Digital Heritage Specialists.

From interoperable to interoperating Geosemantic resources

Ospedale Psichiatrico – the conference venue, aka (rather appropriately, perhaps) the Asylum…

Following on from my earlier post on CAA2015, my presentation entitled From interoperable to interoperating Geosemantic resources is now available on YouTube thanks to Doug Rocks-Macqueen and his Recording Archaeology project. Indeed, there is a whole collection of presentations from the conference (and numerous other conferences) available, all thanks to Doug’s dedication; his work is a great asset to the community and the growing resource he is creating is of enormous benefit, so all thanks due to Doug.

Anyway, back on topic.

There is a competition with a super special prize for the person who guesses correctly the number of times I say ‘um’ during the presentation; answers on a postcard please :-)

Whilst in Siena, as well as hearing all the fantastically interesting talks and networking over a beer or two, there was a little time for some sightseeing and photography:

The post From interoperable to interoperating Geosemantic resources appeared first on Archaeogeomancy: Digital Heritage Specialists.

GSTAR @ Computer Applications in Archaeology (CAA) 2015

Conference

Following on from my presentation at CAA2014 in Paris, I was invited to submit a paper to a session at CAA2015 covering Linked Data (LD) and focussing on the difference between being theoretically interoperable and interoperating in practice.

Conference Session

The session abstract is as follows:

Linked Data and Semantic Web based approaches to data management have now become commonplace in the field of heritage. So commonplace in fact, that despite frequent mention in digital literature, and a growing familiarity with concepts such as URIs and RDF across the domain, it is starting to see fall off in Computer Science conferences and journals as many of the purely technical issues are seen to be ‘solved’. So is the revolution over? We propose that until the benefits of Linked Data are seen in real interconnections between independent systems it will not properly have begun. This session will discuss the socio-technical challenges required to build a concrete Semantic Web in the heritage sector.

We particularly invite papers that offer practical approaches and experience relating to:

  • Interface development and user support for ingestion, annotation and consumption
  • Management, publication and sustainability of Linked Data resources
  • Building cross and inter-domain Linked Data communities
  • Processes for establishing usage conventions of specific terms, vocabularies and ontologies
  • Alignment processes for overlapping vocabularies
  • Engage non-technical users with adopting semantic technologies
  • Licensing and acknowledgment in distributed systems (especially those across multiple legal jurisdictions)
  • Incorporation within other software paradigms: TEI, GIS, plain text, imaging software, VR, etc.
  • Access implications of integrating open and private content
  • Mapping the Field – what components are now properly in place? What remains to be done?

Papers should try to provide evidence of proposed approaches in use across multiple systems wherever possible. Purely theoretical papers and those dealing solely with a single data system are explicitly out of scope for this session.

Keywords: Linked Data, Semantic Web, Web Science

Conference Paper

The abstract for the paper is as follows:

From interoperable to interoperating Geosemantic resources; practical examples of producing and using Linked Geospatial Data

Paul Cripps, University of South Wales (paul.cripps@southwales.ac.uk);

Douglas Tudhope, University of South Wales (douglas.tudhope@southwales.ac.uk)

Keywords: Geospatial; Linked Data; ontology; CIDOC CRM; GeoSPARQL

The concept of using geospatial information within Semantic Web and Linked Data environments is not new. For example, geospatial information was very much at the heart of the CRMEH archaeological extension to the CIDOC CRM a decade ago (Cripps et al. 2004) although this was not implemented; a review of the situation regarding geosemantics in 2005 commented “the semantic web is not ready to provide the expressiveness in terms of rules and language for geospatial application” (O’Dea et al. 2005 p.73). It is only recently that Linked Geospatial Data has begun to become a reality through works such as GeoSPARQL (Perry & Herring 2012; Battle & Kolas 2012), a W3C/OGC standard, and the emerging CRMgeo standard (Doerr & Hiebel 2013). This paper presents some real world, practical examples of creating and working with archaeological geosemantic resources using currently available standards and Open Source tools.

The first example demonstrates a lightweight mapping between the CRMEH, CIDOC CRM and GeoSPARQL ontologies using data available from the Archaeology Data Service (ADS) digital archive and Linked Data repository. The second example demonstrates the use of Ordnance Survey (OS) Open Data within a Linked Data resource published via the ADS Linked Data repository. Both examples feature the use of Open Source tools including the STELLAR toolkit, Open Refine, Parliament, OS OpenSpace API and custom components developed and released under open license.

The first example will also be placed in the context of the GSTAR project which is using the approaches described to produce Linked Geospatial Data for research purposes from commonly used platforms for managing archaeological resources within the UK heritage sector. These include the Historic Buildings and Sites and Monuments Record (HBSMR) software from exeGesIS, used by UK Historic Environment Records (HERs), and MODES, used by museums for managing museum collections. As such, the outputs from the GSTAR project have wider applicability in moving geosemantic information from interoperable to interoperating in the UK.

Battle, R. & Kolas, D., 2012. GeoSPARQL: Enabling a Geospatial Semantic Web. Semantic Web Journal, 0(0), pp.1–17.

Cripps, P. et al., 2004. Ontological Modelling of the work of the Centre for Archaeology, Heraklion. Available at: http://cidoc.ics.forth.gr/docs/Ontological_Modelling_Project_Report_Sep2004.pdf.

Doerr, M. & Hiebel, G., 2013. CRMgeo : Linking the CIDOC CRM to GeoSPARQL through a Spatiotemporal Refinement, Heraklion.

O’Dea, D., Geoghegan, S. & Ekins, C., 2005. Dealing with geospatial information in the semantic web. In AOW ’05 Proceedings of the 2005 Australasian Ontology Workshop – Volume 58. pp. 69–73. Available at: http://dl.acm.org/citation.cfm?id=1151945 [Accessed April 13, 2013].

Perry, M. & Herring, J., 2012. OGC GeoSPARQL – A Geographic Query Language for RDF Data, Available at: http://www.opengis.net/doc/IS/geosparql/1.0.

GSTAR @ CAA: From interoperable to Interoperating

My work on the GSTAR project addresses exactly the issues raised in the session abstract through an investigation of the application of Linked Geospatial Data (LGD) and semantic web techniques for archaeological research purposes. This investigation builds on conceptual structures such as the CIDOC CRM, CRM-EH and GeoSPARQL, incorporating real world archaeological data from a range of sources through to providing working technology demonstrators. This was illustrated through the use of case studies based on my research and also through my work on projects such as the Colonisation of Britain project, undertaken for Wessex Archaeology, and the Later Silbury project, being undertaken for Historic England; the former resulted in a Linked Data resource now online at the Archaeology Data Service and the latter will do too upon completion.

Focussing on producing and then using LGD, my talk looked at the background to my research, the methods, techniques and tools used and some of the pitfalls and successes in creating interoperating LGD resources. I had hoped to be a bit further ahead and be able to demonstrate some map based querying and visualisation in action but at the time, these elements were not ready and interacting with a SPARQL endpoint is hardly the most audience grabbing activity! The research was also put into the context of the broader historic environment sector in England by showing how interoperating geosemantic resources could form the backbone of a digital ecosystem to support research, management and development control functions for a broad range of sectoral user groups.

Where next?

Since returning from Siena, work has proceeded apace to finalise the geosemantic resource ready for the next phase of activity; taking real world archaeological research questions and expressing these using GeoSPARQL queries to demonstrate the way in which such resources can be used for research purposes. Part of this involves engaging domain experts to see where and how their research interests can be elucidated through the applications of such approaches (more on this here).

Ultimately, the querying and results visualisation components will be housed in a web based interface, hiding the complexity of SPARQL endpoints, to demonstrate how geosemantic resources can underpin user focussed research tools such as Virtual Learning Environments. Whilst it would have been nice to present more of this at CAA2015, the plan is now to complete this phase over the summer ready for a fully fledged demonstration of the whole (completed and submitted) research project at CAA2016 in Oslo.

The post GSTAR @ Computer Applications in Archaeology (CAA) 2015 appeared first on Archaeogeomancy: Digital Heritage Specialists.