Provenance¶
Provenance concerns information about how an entity came to be and about its contributions towards the existence of others. Dydra exposes meta-data about repositories sufficient to answer questions about:
- data lineage
- retrospective repository state
- responsibility for changes
Lineage is first-order only. That is, the information about a transaction enumerates the data set constituents at the level of graphs. It does not describe dependencies at the level of individual inserted or deleted statements [cui2001], and no attempt is made to identify graph subcomponents [ding2005].
Each repository can specify a respective provenance repository. This is reflected in the base repository’s service description as a prov:hasProvenanceService association, along with additional prov:hasProvenance associations to aid discovery for revisions. In addition, references appear in the HTTP response headers of SPARQL protocol responses, as proposed by PROV-AQ [prov-aq] (3.1):
- the query response header rel=’prov:has_query_service’ specifies as the anchor the SPARQL query service for the respective provenance repository.
- the query response header rel=’prov:has_provenance’ specifies as the anchor the graph identifier for the repository revision effective for the query.
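As a minimal sketch of how a client might pick these two relations out of a response, the following parses a Link header value and indexes the provenance links by their rel value. The function names and the sample header are hypothetical, and the exact header layout a given server emits (including whether the service appears as the link target or as the anchor) is an assumption that should be checked against a real response.

```python
import re

# One link-value: <target> followed by zero or more ;name=value parameters.
_LINK_RE = re.compile(r'<([^>]*)>((?:\s*;\s*[\w-]+\s*=\s*(?:"[^"]*"|[^;,\s]+))*)')
_PARAM_RE = re.compile(r';\s*([\w-]+)\s*=\s*("[^"]*"|[^;,\s]+)')


def parse_link_header(value):
    """Return one dict per link-value, holding 'target' plus its parameters."""
    links = []
    for target, params in _LINK_RE.findall(value):
        link = {"target": target}
        for name, raw in _PARAM_RE.findall(params):
            link[name.lower()] = raw.strip('"')
        links.append(link)
    return links


def provenance_links(value):
    """Index the provenance-related links by their rel value."""
    wanted = ("prov:has_query_service", "prov:has_provenance")
    return {
        link["rel"]: link
        for link in parse_link_header(value)
        if link.get("rel") in wanted
    }


# Hypothetical header value, for illustration only.
sample_header = (
    '<http://example.org/prov/sparql>; rel="prov:has_query_service"; '
    'anchor="http://example.org/repo", '
    '<urn:example:rev-2>; rel="prov:has_provenance"'
)
sample_links = provenance_links(sample_header)
```

Both the target and the anchor are returned for each link, so the caller can use whichever of the two carries the service or revision identifier in practice.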
Provenance Schema¶
The schema derives from the W3C [provenance ontology](http://www.w3.org/TR/prov-o/).
@prefix : <urn:dydra:> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
:Transaction rdfs:subClassOf prov:Activity .
:Revision rdfs:subClassOf prov:Entity .
:Graph rdfs:subClassOf prov:Entity .
:Account rdfs:subClassOf prov:Agent .
:Operation rdfs:subClassOf prov:Entity .
:Query rdfs:subClassOf :Operation .
:Repository rdfs:subClassOf prov:Collection .
Provenance Information¶
The provenance information is compiled subsequent to each update request when a provenance repository has been specified. It states the identities of the transaction entities: account, repository, query, transaction, and generated revision. For each of those, it records the following subject and associations:
Account¶
Repository¶
prov:wasAssociatedWith : the account
prov:hadMember : the revision
as graph : the context for the query, transaction, and revision statements
Query¶
urn:dydra:signature : the query text hash sum
urn:dydra:user_id : the request identifier supplied by the client
Transaction¶
prov:used : the parent revision (unless initial)
prov:generated : the revision
prov:hadPlan : the query
prov:startedAtTime, prov:endedAtTime
prov:wasRevisionOf : the parent revision (unless initial)
Revision¶
prov:wasDerivedFrom : read graphs
prov:wasGeneratedBy : the transaction
prov:wasInvalidatedBy : the succeeding transaction (if applicable)
prov:wasRevisionOf : the parent revision
prov:wasUsedBy : the succeeding transaction (if applicable)
prov:startedAtTime
Parent Revision¶
prov:endedAtTime
Graphs¶
prov:wasGeneratedBy, prov:wasInvalidatedBy, prov:wasInfluencedBy : the transaction, depending on creation, deletion, or modification.
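To illustrate how these associations support lineage questions, the following sketch follows prov:wasRevisionOf links from a revision back to the initial one. It assumes the provenance statements have already been fetched as plain (subject, predicate, object) tuples; the function name and the urn:example identifiers are hypothetical and do not reflect the identifiers Dydra actually generates.

```python
PROV = "http://www.w3.org/ns/prov#"


def revision_history(triples, revision):
    """Follow prov:wasRevisionOf from a revision back to the initial revision."""
    parent_of = {s: o for s, p, o in triples if p == PROV + "wasRevisionOf"}
    history = [revision]
    seen = {revision}
    while history[-1] in parent_of:
        parent = parent_of[history[-1]]
        if parent in seen:  # guard against malformed, cyclic data
            break
        history.append(parent)
        seen.add(parent)
    return history


# Hypothetical statements for three successive revisions.
sample_triples = [
    ("urn:example:rev-3", PROV + "wasRevisionOf", "urn:example:rev-2"),
    ("urn:example:rev-2", PROV + "wasRevisionOf", "urn:example:rev-1"),
    ("urn:example:txn-3", PROV + "generated", "urn:example:rev-3"),
]
sample_history = revision_history(sample_triples, "urn:example:rev-3")
```

The same traversal could be expressed as a SPARQL property path against the provenance repository; the in-memory form is used here only to keep the sketch self-contained.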
A Simple Example¶
The information collected in a provenance repository after a simple sequence of three updates was performed on its base repository would appear as follows:
Query Responses¶
Update query responses include links to provenance information in the headers and/or the encoded body, depending on the particular encoding. (NYI)
HTML¶
For HTML responses, e.g. the query editor page, the head should contain provenance and provenance-service links along with an anchor link to the abstract repository. (NYI)
Configuration¶
In order to enable provenance processing:
Configure the base repository to record provenance data
<> <urn:dydra:provenanceRepositoryId> <http://localhost/account/provenance-repo-id> .
Specify the repository as a request pragma
PREFIX provenanceRepositoryId: <http://localhost/account/provenance-repo-id>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
INSERT DATA {
<http://example.org/uri1/0001> 'object~:0001' .
<http://example.org/uri1/0001> rdf:type <http://example.org/thing> .
}
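The request pragma shown above is written as an ordinary PREFIX declaration placed before the update. As a small sketch, a client could add it to an existing update string as follows. The helper name is hypothetical, the IRI check only rejects the most obvious invalid characters, and how the service interprets the pragma is taken solely from the example above.

```python
import re


def with_provenance_pragma(update, provenance_repository_iri):
    """Prepend the provenanceRepositoryId pragma as a PREFIX declaration."""
    # Reject the most obvious characters that cannot appear in a SPARQL IRI reference.
    if re.search(r'[<>"{}|^`\\\s]', provenance_repository_iri):
        raise ValueError("not a valid IRI reference: %r" % provenance_repository_iri)
    return "PREFIX provenanceRepositoryId: <%s>\n%s" % (
        provenance_repository_iri,
        update,
    )


# Hypothetical update text; the repository IRI is the one used in the example above.
sample_update = "INSERT DATA { <http://example.org/s> <http://example.org/p> 'o' . }"
sample_request = with_provenance_pragma(
    sample_update, "http://localhost/account/provenance-repo-id"
)
```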