Provenance concerns information about how an entity came to be and about its contributions towards the existence of others. Dydra exposes meta-data about repositories sufficient to answer questions about
- data lineage
- retrospective repository state
- responsibility for changes
Lineage is first-order only. That is, the information about a transaction enumerates the data set constituents at the level of graphs. It does not describe dependencies at the level of individual inserted or deleted statements [cui2001], and no attempt is made to identify graph subcomponents [ding2005].
Each repository can specify a respective provenance repository. This is reflected in the base repository's service description as a prov:hasProvenanceService association, along with additional prov:hasProvenance associations to aid discovery for revisions. In addition, references appear in the HTTP response headers of SPARQL protocol responses, as proposed by PROV-AQ [prov-aq] (section 3.1):
- the query response header rel='prov:has_query_service' specifies as the anchor the SPARQL query service for the respective provenance repository.
- the query response header rel='prov:has_provenance' specifies as the anchor the graph identifier for the repository revision effective for the query.
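As an illustration of how a client might read these references, the following Python sketch parses a Link header value and indexes the links by their rel parameter. The header value used here, including its target URI and the revision identifier, is invented for illustration; the exact header layout emitted by the service is an assumption, and the parser handles only double-quoted and bare parameter values.

```python
import re

# One link-value: <target> followed by zero or more ;name=value parameters.
_LINK_RE = re.compile(r'<([^>]*)>((?:\s*;\s*[\w*-]+\s*=\s*(?:"[^"]*"|[^;,\s"]+))*)')
_PARAM_RE = re.compile(r';\s*([\w*-]+)\s*=\s*(?:"([^"]*)"|([^;,\s"]+))')


def parse_link_header(value):
    """Return {rel: {"target": ..., "anchor": ...}} for each link in the header value."""
    links = {}
    for target, params_text in _LINK_RE.findall(value):
        params = {}
        for name, quoted, bare in _PARAM_RE.findall(params_text):
            params[name.lower()] = quoted or bare
        rel = params.get("rel")
        if rel is not None:
            links[rel] = {"target": target, "anchor": params.get("anchor")}
    return links


# Hypothetical header value; every URI and identifier here is a placeholder.
EXAMPLE_HEADER = (
    '<http://localhost/account/repo>; rel="prov:has_query_service"; '
    'anchor="http://localhost/account/provenance-repo-id/sparql", '
    '<http://localhost/account/repo>; rel="prov:has_provenance"; '
    'anchor="urn:dydra:revision:example"'
)

links = parse_link_header(EXAMPLE_HEADER)
provenance_service = links["prov:has_query_service"]["anchor"]
revision_graph = links["prov:has_provenance"]["anchor"]
```

With the anchors extracted, a client could direct follow-up queries about the revision graph to the provenance query service.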
The schema derives from the W3C provenance ontology (PROV-O) [prov-o].
@prefix : <urn:dydra:> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
:Transaction rdfs:subClassOf prov:Activity .
:Revision rdfs:subClassOf prov:Entity .
:Graph rdfs:subClassOf prov:Entity .
:Account rdfs:subClassOf prov:Agent .
:Operation rdfs:subClassOf prov:Entity .
:Query rdfs:subClassOf :Operation .
:Repository rdfs:subClassOf prov:Collection .
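Assuming that recorded transactions are typed with the classes above, a client could list them by posting a SPARQL query to the provenance repository's query service. The Python sketch below only builds the request and does not send it. The endpoint URL is a placeholder, and the assumption that instances carry an rdf:type of urn:dydra:Transaction is not confirmed by this document.

```python
import urllib.request

# Hypothetical endpoint of the provenance repository's SPARQL query service.
PROVENANCE_ENDPOINT = "http://localhost/account/provenance-repo-id/sparql"

# Assumes transaction records are typed as urn:dydra:Transaction.
LIST_TRANSACTIONS = """\
PREFIX : <urn:dydra:>
SELECT ?transaction
WHERE { ?transaction a :Transaction }
"""


def build_query_request(endpoint, query):
    """Build (but do not send) a SPARQL 1.1 Protocol query request via direct POST."""
    return urllib.request.Request(
        endpoint,
        data=query.encode("utf-8"),
        headers={
            "Content-Type": "application/sparql-query",
            "Accept": "application/sparql-results+json",
        },
        method="POST",
    )


request = build_query_request(PROVENANCE_ENDPOINT, LIST_TRANSACTIONS)
# urllib.request.urlopen(request) would send it to a live service.
```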
The provenance information is compiled subsequent to each update request when a provenance repository has been specified. It states the identities of the transaction entities: account, repository, query, transaction, and generated revision. For each of those, it records the following subject and associations:
The information collected in a provenance repository after a simple sequence of three updates was performed on its base repository would appear as follows:
Update query responses include links to provenance information in the headers and/or the encoded body, depending on the particular encoding. (NYI)
For HTML responses, e.g. the query editor page, provenance and provenance-service links should be present in the head, along with an anchor link to the abstract repository. (NYI)
In order to enable provenance processing:
Configure the base repository to record provenance data:
<> <urn:dydra:provenanceRepositoryId> <http://localhost/account/provenance-repo-id> .
Specify the repository as a request pragma:
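PREFIX provenanceRepositoryId: <http://localhost/account/provenance-repo-id>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
INSERT DATA {
  # No predicate is given for this statement, so it is left commented out:
  # <http://examle.org/uri1/0001> ... 'object~:0001' .
  <http://examle.org/uri1/0001> rdf:type <http://example.org/thing> .
}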
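A client that builds update requests programmatically could add the pragma by prepending the PREFIX declaration to the update text. The following Python sketch performs only that string manipulation. The helper name and the sample update are hypothetical, and the repository IRI is the same placeholder as in the request above.

```python
def with_provenance_pragma(update, provenance_repository_iri):
    """Prepend the provenanceRepositoryId pragma, written as a PREFIX declaration."""
    pragma = f"PREFIX provenanceRepositoryId: <{provenance_repository_iri}>"
    return pragma + "\n" + update


# Hypothetical update text; any SPARQL update would do.
UPDATE = """\
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
INSERT DATA {
  <http://example.org/uri1/0001> rdf:type <http://example.org/thing> .
}
"""

request_body = with_provenance_pragma(
    UPDATE, "http://localhost/account/provenance-repo-id"
)
print(request_body)
```

The resulting text would then be sent as the body of a SPARQL update request, for instance with the Content-Type application/sparql-update.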
[cui2001] ftp://db.stanford.edu/pub/dbpubs/2001/56/56.pdf.gz
[ding2005] ftp://www.ksl.stanford.edu/local/pub/KSL_Reports/KSL-05-06.pdf
[prov-aq] http://www.w3.org/TR/prov-aq/#resource-accessed-by-http
[prov-o] http://www.w3.org/TR/prov-o/