Provenance

Provenance concerns information about how an entity came to be and about its contributions to the existence of others. Dydra exposes metadata about repositories sufficient to answer questions about:

  • data lineage
  • retrospective repository state
  • responsibility for changes

Lineage is first-order only. That is, the information about a transaction enumerates the data set constituents at the level of graphs. It does not describe dependencies at the level of individual inserted or deleted statements [cui2001], and no attempt is made to identify graph subcomponents [ding2005].
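Graph-level lineage means that a transaction's effect on the data is summarized per graph rather than per statement. The following is a minimal sketch under assumed conditions, not Dydra's implementation: a repository state is modeled as a set of (subject, predicate, object, graph) tuples, and the hypothetical function `classify_graph_changes` diffs the states before and after a transaction, reporting each affected graph as created, deleted, or modified while discarding the individual statements.

```python
from typing import Dict, Iterable, Set, Tuple

# Assumed data model: a quad is (subject, predicate, object, graph).
Quad = Tuple[str, str, str, str]


def graphs_of(quads: Iterable[Quad]) -> Dict[str, Set[Quad]]:
    """Group quads by their graph identifier."""
    grouped: Dict[str, Set[Quad]] = {}
    for quad in quads:
        grouped.setdefault(quad[3], set()).add(quad)
    return grouped


def classify_graph_changes(before: Iterable[Quad],
                           after: Iterable[Quad]) -> Dict[str, str]:
    """Summarize a transaction per graph as 'created', 'deleted' or 'modified'.

    Which statements were inserted or deleted is deliberately not reported,
    mirroring first-order, graph-level lineage. Unchanged graphs are omitted.
    """
    old, new = graphs_of(before), graphs_of(after)
    changes: Dict[str, str] = {}
    for graph in old.keys() | new.keys():
        if graph not in old:
            changes[graph] = "created"
        elif graph not in new:
            changes[graph] = "deleted"
        elif old[graph] != new[graph]:
            changes[graph] = "modified"
    return changes


if __name__ == "__main__":
    before = {("ex:s", "ex:p", "1", "ex:g1"), ("ex:s", "ex:p", "2", "ex:g2")}
    after = {("ex:s", "ex:p", "1", "ex:g1"), ("ex:s", "ex:p", "3", "ex:g2"),
             ("ex:s", "ex:p", "4", "ex:g3")}
    print(sorted(classify_graph_changes(before, after).items()))
    # [('ex:g2', 'modified'), ('ex:g3', 'created')]
```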

Each repository can specify a respective provenance repository. This is reflected in the base repository's service description as a prov:hasProvenanceService association, along with additional prov:hasProvenance associations to aid the discovery of revisions. In addition, references appear in the HTTP response headers of SPARQL protocol responses, as proposed by PROV-AQ [prov-aq], section 3.1:

  • the response header with rel="prov:has_query_service" specifies, as the anchor, the SPARQL query service of the respective provenance repository;
  • the response header with rel="prov:has_provenance" specifies, as the anchor, the graph identifier of the repository revision in effect for the query.
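A client can locate these references without parsing the response body. The sketch below is a simplified, hypothetical parser (not a Dydra client library) for an HTTP `Link` header. It assumes link-values of the form `<target>; rel="..."; anchor="..."`, accepts relations written either as full PROV namespace IRIs or with the `prov:` prefix, and returns each link's target and anchor parameter as found, without deciding how a particular server assigns them. Quoted parameter values that contain commas or semicolons are not handled.

```python
import re
from typing import Dict, List

PROV_NS = "http://www.w3.org/ns/prov#"

# One link-value: <target> followed by ";"-separated parameters.
# Simplification: commas and semicolons inside quoted values are not supported.
LINK_VALUE = re.compile(r"<([^>]*)>((?:\s*;\s*[^;,]+)*)")
PARAM = re.compile(r""";\s*([A-Za-z*-]+)\s*=\s*(?:"([^"]*)"|'([^']*)'|([^;,\s]+))""")


def expand_rel(rel: str) -> str:
    """Expand a 'prov:' prefixed name to the full PROV namespace IRI."""
    return PROV_NS + rel[len("prov:"):] if rel.startswith("prov:") else rel


def provenance_links(link_header: str) -> Dict[str, List[Dict[str, str]]]:
    """Collect PROV link-values from a Link header, keyed by relation local name."""
    found: Dict[str, List[Dict[str, str]]] = {}
    for match in LINK_VALUE.finditer(link_header):
        target, param_text = match.group(1), match.group(2)
        params: Dict[str, str] = {}
        for param in PARAM.finditer(param_text):
            name, *values = param.groups()
            # Exactly one of the quoted / single-quoted / bare alternatives matched.
            params[name.lower()] = next(v for v in values if v is not None)
        for rel in params.get("rel", "").split():
            rel = expand_rel(rel)
            if rel.startswith(PROV_NS):
                entry = {"target": target}
                if "anchor" in params:
                    entry["anchor"] = params["anchor"]
                found.setdefault(rel[len(PROV_NS):], []).append(entry)
    return found
```

Links with non-PROV relations (for example `rel="next"`) are ignored, so the function can be applied to the full `Link` header of any response.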


Provenance Schema

The schema derives from the W3C provenance ontology, PROV-O [prov-o].

@prefix : <urn:dydra:> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
:Transaction rdfs:subClassOf prov:Activity .
:Revision rdfs:subClassOf prov:Entity .
:Graph rdfs:subClassOf prov:Entity .
:Account rdfs:subClassOf prov:Agent .
:Operation rdfs:subClassOf prov:Entity .
:Query rdfs:subClassOf :Operation .
:Repository rdfs:subClassOf prov:Collection .
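One consequence of these axioms is that every :Query is also a prov:Entity, and so is every :Repository, since PROV-O declares prov:Collection a subclass of prov:Entity. A minimal sketch of that inference, assuming the hierarchy is held in a plain dictionary keyed by prefixed names (a hypothetical representation, not a Dydra API), computes the transitive rdfs:subClassOf closure:

```python
from typing import Dict, Set

# Direct rdfs:subClassOf assertions from the schema, keyed by prefixed name.
SUBCLASS_OF: Dict[str, Set[str]] = {
    ":Transaction": {"prov:Activity"},
    ":Revision": {"prov:Entity"},
    ":Graph": {"prov:Entity"},
    ":Account": {"prov:Agent"},
    ":Operation": {"prov:Entity"},
    ":Query": {":Operation"},
    ":Repository": {"prov:Collection"},
    # Asserted by PROV-O itself rather than by the Dydra schema.
    "prov:Collection": {"prov:Entity"},
}


def superclasses(cls: str) -> Set[str]:
    """Return every class that `cls` is a (transitive) subclass of."""
    seen: Set[str] = set()
    stack = list(SUBCLASS_OF.get(cls, ()))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(SUBCLASS_OF.get(parent, ()))
    return seen
```

For example, `superclasses(":Query")` returns `{":Operation", "prov:Entity"}`.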

Provenance Information

Provenance information is compiled after each update request whenever a provenance repository has been specified. It identifies the entities involved in the transaction: the account, repository, query, transaction, and generated revision. For each of those, it records the following associations, with that entity as subject:

Account

Repository

  • prov:wasAssociatedWith : the account
  • prov:hadMember : the revision
  • as graph : the context for the query, transaction, and revision statements

Query

Transaction

  • prov:used : the parent revision (unless initial)
  • prov:generated : the revision
  • prov:hadPlan : the query
  • prov:startedAtTime, prov:endedAtTime
  • prov:wasRevisionOf : the parent revision (unless initial)

Revision

  • prov:wasDerivedFrom : read graphs
  • prov:wasGeneratedBy : the transaction
  • prov:wasInvalidatedBy : the succeeding transaction (if applicable)
  • prov:wasRevisionOf : the parent revision
  • prov:wasUsedBy : the succeeding transaction (if applicable)
  • prov:startedAtTime

Parent Revision

  • prov:endedAtTime

Graphs

  • prov:wasGeneratedBy, prov:wasInvalidatedBy, prov:wasInfluencedBy : the transaction, depending on creation, deletion, or modification.
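To make the enumeration above concrete, the hypothetical function below generates the statements for a single update as N-Quads, using the repository IRI as the graph for every statement. It is a sketch under stated assumptions, not Dydra's implementation: all IRIs are caller-supplied placeholders, the graph change kinds are passed in rather than computed, and a new revision's prov:startedAtTime and its parent's prov:endedAtTime are both assumed to equal the transaction's end time.

```python
from typing import Dict, List, Optional, Tuple

PROV = "http://www.w3.org/ns/prov#"
XSD_DATETIME = "http://www.w3.org/2001/XMLSchema#dateTime"

# Graph-level change kind -> property relating the graph to the transaction.
GRAPH_PROPERTY = {
    "created": "wasGeneratedBy",
    "deleted": "wasInvalidatedBy",
    "modified": "wasInfluencedBy",
}


def iri(value: str) -> str:
    """Write an IRI in N-Quads syntax."""
    return f"<{value}>"


def timestamp(value: str) -> str:
    """Write an xsd:dateTime literal in N-Quads syntax."""
    return f'"{value}"^^<{XSD_DATETIME}>'


def transaction_quads(repository: str, account: str, query: str,
                      transaction: str, revision: str, parent: Optional[str],
                      started: str, ended: str, read_graphs: List[str],
                      graph_changes: Dict[str, str]) -> List[str]:
    """Return N-Quads lines for one update, all in the repository's graph."""
    triples: List[Tuple[str, str, str]] = [
        (repository, "wasAssociatedWith", iri(account)),
        (repository, "hadMember", iri(revision)),
        (transaction, "generated", iri(revision)),
        (transaction, "hadPlan", iri(query)),
        (transaction, "startedAtTime", timestamp(started)),
        (transaction, "endedAtTime", timestamp(ended)),
        (revision, "wasGeneratedBy", iri(transaction)),
        # Assumption: a revision takes effect when its transaction ends.
        (revision, "startedAtTime", timestamp(ended)),
    ]
    triples += [(revision, "wasDerivedFrom", iri(g)) for g in read_graphs]
    if parent is not None:  # the initial revision has no parent
        triples += [
            (transaction, "used", iri(parent)),
            (transaction, "wasRevisionOf", iri(parent)),
            (revision, "wasRevisionOf", iri(parent)),
            # The parent revision is superseded by this transaction.
            (parent, "wasInvalidatedBy", iri(transaction)),
            (parent, "wasUsedBy", iri(transaction)),
            (parent, "endedAtTime", timestamp(ended)),
        ]
    for graph, change in sorted(graph_changes.items()):
        triples.append((graph, GRAPH_PROPERTY[change], iri(transaction)))
    return [f"{iri(s)} <{PROV}{p}> {o} {iri(repository)} ." for s, p, o in triples]
```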

A Simple Example

The information collected in a provenance repository after a simple sequence of three updates to its base repository would appear as follows:

[Figure: contents of the provenance repository after three updates (provenance.pdf)]
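Since each revision records prov:wasRevisionOf its parent, the succession of revisions in such a repository can be recovered by following that property back from the latest revision, which is one basis for answering retrospective questions about repository state. A minimal sketch, assuming the provenance statements are already available as (subject, predicate, object) tuples; the function name and revision IRIs are hypothetical:

```python
from typing import Iterable, List, Tuple

WAS_REVISION_OF = "http://www.w3.org/ns/prov#wasRevisionOf"

Triple = Tuple[str, str, str]


def revision_history(triples: Iterable[Triple], current: str) -> List[str]:
    """List revisions from `current` back to the initial one, newest first."""
    # Transactions may also carry prov:wasRevisionOf; they are never reached
    # when the walk starts from a revision, so they do not disturb the chain.
    parent_of = {s: o for s, p, o in triples if p == WAS_REVISION_OF}
    history = [current]
    while history[-1] in parent_of:
        parent = parent_of[history[-1]]
        if parent in history:  # guard against a malformed, cyclic record
            raise ValueError(f"revision cycle at {parent}")
        history.append(parent)
    return history
```

For three updates, with revision 3 pointing to revision 2 and revision 2 to revision 1, the walk from revision 3 yields all three revisions, newest first.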

Query Responses

Update query responses include links to provenance information in the headers and/or the encoded body, depending on the particular encoding. (Not yet implemented.)

HTML

For HTML responses, e.g. the query editor page, the document head should contain provenance and provenance-service links, along with an anchor link to the abstract repository. (Not yet implemented.)
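A minimal sketch of such a head fragment, assuming the link relation IRIs defined by PROV-AQ (prov:has_anchor, prov:has_provenance, and prov:has_query_service, expanded in the `http://www.w3.org/ns/prov#` namespace) and caller-supplied, hypothetical IRIs for the repository, its current revision, and the provenance query service:

```python
from html import escape

PROV = "http://www.w3.org/ns/prov#"


def provenance_head_links(anchor: str, provenance: str, query_service: str) -> str:
    """Render one <link> element per PROV-AQ relation for an HTML <head>."""
    links = [
        ("has_anchor", anchor),          # the abstract repository
        ("has_provenance", provenance),  # e.g. the current revision's graph
        ("has_query_service", query_service),
    ]
    # Escape each IRI so characters such as '&' are valid inside the attribute.
    return "\n".join(
        f'<link rel="{PROV}{rel}" href="{escape(href, quote=True)}">'
        for rel, href in links
    )
```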

RDF

For RDF responses, the prov:hasProvenance, prov:hasAnchor, and prov:hasProvenanceService properties should be incorporated into the encoded result ([prov-aq], section 3.2.1). (Not yet implemented.)
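A minimal sketch of such an encoding as a Turtle snippet, using the property names given above. The final PROV-AQ Note instead defines the terms prov:has_provenance, prov:has_anchor, and prov:has_query_service, so the names may need adjusting. The function name and all IRIs are hypothetical, and the choice of subject is left to the caller:

```python
PROV = "http://www.w3.org/ns/prov#"

# Property names as used in this document; the final PROV-AQ Note defines
# prov:has_provenance, prov:has_anchor and prov:has_query_service instead.
PROPERTIES = ("hasAnchor", "hasProvenance", "hasProvenanceService")


def provenance_turtle(subject: str, anchor: str, provenance: str, service: str) -> str:
    """Render the provenance properties of `subject` as a Turtle snippet."""
    values = dict(zip(PROPERTIES, (anchor, provenance, service)))
    body = " ;\n    ".join(f"prov:{name} <{values[name]}>" for name in PROPERTIES)
    return f"@prefix prov: <{PROV}> .\n\n<{subject}>\n    {body} .\n"
```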


Configuration

In order to enable provenance processing:

  • Configure the base repository to record provenance data in a designated provenance repository:

    <> <urn:dydra:provenanceRepositoryId> <http://localhost/account/provenance-repo-id> .

  • Specify the provenance repository as a request pragma, in the form of a PREFIX declaration:

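PREFIX provenanceRepositoryId: <http://localhost/account/provenance-repo-id>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
INSERT DATA {
  # <http://example.org/value> is a hypothetical predicate
  <http://example.org/uri1/0001> <http://example.org/value> 'object~:0001' .
  <http://example.org/uri1/0001> rdf:type <http://example.org/thing> .
}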
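The pragma form can also be sent programmatically. A minimal sketch using only the Python standard library, assuming the SPARQL 1.1 Protocol convention of POSTing the update with Content-Type application/sparql-update and a hypothetical endpoint URL; it only builds the request, leaving authentication and sending (for example with `urllib.request.urlopen`) to the caller.

```python
import urllib.request

# Hypothetical identifiers; substitute the actual host, account and repositories.
ENDPOINT = "http://localhost/account/repository/sparql"
PROVENANCE_REPOSITORY = "http://localhost/account/provenance-repo-id"


def update_with_provenance(update: str) -> urllib.request.Request:
    """Prefix a SPARQL update with the provenance pragma and wrap it in a POST."""
    body = f"PREFIX provenanceRepositoryId: <{PROVENANCE_REPOSITORY}>\n{update}"
    return urllib.request.Request(
        ENDPOINT,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/sparql-update"},
        method="POST",
    )


if __name__ == "__main__":
    request = update_with_provenance(
        "INSERT DATA { <http://example.org/uri1/0001> a <http://example.org/thing> . }"
    )
    print(request.get_method(), request.full_url)
```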

[cui2001] ftp://db.stanford.edu/pub/dbpubs/2001/56/56.pdf.gz
[ding2005] ftp://www.ksl.stanford.edu/local/pub/KSL_Reports/KSL-05-06.pdf
[prov-aq] http://www.w3.org/TR/prov-aq/#resource-accessed-by-http
[prov-o] http://www.w3.org/TR/prov-o/