Federation

Federation concerns ways for a query to draw upon diverse data sources and processing capacity beyond the immediate dataset and query processor. The SPARQL 1.1 recommendation includes provisions for expressing federated queries in SPARQL.

In Dydra, every form of federation involves two queries. The first is a base query, which a first host processes against an initial repository. The second is a sub-query, which a second host processes against some other repository. The first host incorporates the sub-query's solutions into its evaluation of the base query's algebra. The forms differ in which hosts play which of the two roles, which query forms initiate federated processing, and which data sources are involved.
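The incorporation step can be pictured as the SPARQL algebra Join of two solution sequences: each base-query solution is merged with every sub-query solution that agrees with it on their shared variables. The following Python sketch is illustrative only. It uses a naive nested-loop join over dictionaries and invented example data, and it is not Dydra's evaluation machinery.

```python
from typing import Dict, List

# A solution maps variable names (without the leading "?") to RDF terms,
# represented here simply as strings.
Solution = Dict[str, str]


def compatible(a: Solution, b: Solution) -> bool:
    """Two solutions are compatible when they agree on every shared variable."""
    return all(b[var] == value for var, value in a.items() if var in b)


def join(base: List[Solution], sub: List[Solution]) -> List[Solution]:
    """Nested-loop join: merge each compatible pair of base and sub-query solutions."""
    return [{**a, **b} for a in base for b in sub if compatible(a, b)]


# Hypothetical base-query solutions, as the first host would compute them locally.
base = [
    {"s": "http://example.org/alice", "p": "http://xmlns.com/foaf/0.1/name", "o": "Alice"},
    {"s": "http://example.org/bob", "p": "http://xmlns.com/foaf/0.1/name", "o": "Bob"},
]
# Hypothetical sub-query solutions, as returned by the second host.
sub = [{"s": "http://example.org/alice", "mbox": "mailto:alice@example.org"}]

for solution in join(base, sub):
    print(solution)
```

Only the solution for Alice survives, because Bob has no compatible sub-query solution.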

Federation Request Modes

Dydra supports internal and external federation. Internal federation applies when the same query processor hosts both the repository to which a sub-query is applied and the repository of the base query. In this case, the sub-query is executed as a subtask within the initial host's query processor, subject to intra-host access and authorization. External federation applies when a remote host's processor performs the sub-query upon request. In this case, for each such location, the first processor determines a SPARQL endpoint, sends a query request to that endpoint's host, receives the results, and integrates them into its algebra data flow. The federation mode none suppresses federation processing.
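The division of labour between the modes can be summarized as a small decision function. This Python sketch assumes the caller already knows whether a SERVICE location is local to the host. The return values "subtask" and "remote-request" and the FederationError class are hypothetical labels for illustration, not Dydra identifiers. Only the mode IRIs come from this page.

```python
from enum import Enum


class Mode(Enum):
    NONE = "urn:dydra:none"
    INTERNAL = "urn:dydra:internal"
    EXTERNAL = "urn:dydra:external"


class FederationError(Exception):
    """Raised when a SERVICE reference is not permitted under the active mode."""


def dispatch(mode: Mode, location_is_local: bool) -> str:
    """Decide how the sub-query for one SERVICE location would be processed."""
    if mode is Mode.NONE:
        # Federation is suppressed entirely.
        raise FederationError("federation is disabled")
    if location_is_local:
        # Internal federation: run as a subtask in this query processor.
        return "subtask"
    if mode is Mode.EXTERNAL:
        # External federation: send a SPARQL request to the remote endpoint.
        return "remote-request"
    # Internal mode does not reach remote hosts.
    raise FederationError("remote locations require external mode")
```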

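The default mode for a query is specified in the initial system configuration. An individual query may specify its own mode, subject to the constraints in the following table. For each default mode, the last column lists the alternative modes a query may select instead.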

mode | effect | permitted variations
--- | --- | ---
#<urn:dydra:none> | federation is disabled: GRAPH forms apply only to a repository’s named graphs, and any SERVICE form causes an error | (no alternatives)
#<urn:dydra:internal> | federation is enabled for references local to the host | #<urn:dydra:none>
#<urn:dydra:external> | federation is enabled for both local and remote references | #<urn:dydra:none>, #<urn:dydra:internal>
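The override rule in the table can be checked mechanically. The following Python sketch makes two assumptions. First, a query's requested mode is read from a `PREFIX federation_mode:` declaration, as in the examples on this page, and a regular expression stands in for real SPARQL parsing. Second, requesting the configured default itself is always allowed. The function names are hypothetical.

```python
import re
from typing import Optional

# Alternatives a query may select for each configured default, transcribed from the table.
PERMITTED = {
    "urn:dydra:none": set(),
    "urn:dydra:internal": {"urn:dydra:none"},
    "urn:dydra:external": {"urn:dydra:none", "urn:dydra:internal"},
}

# Simplification: a real implementation would take the prefix from the parsed query.
PREFIX_PATTERN = re.compile(r"PREFIX\s+federation_mode:\s*<([^>]+)>", re.IGNORECASE)


def requested_mode(query_text: str) -> Optional[str]:
    """Return the mode IRI a query requests, or None when it requests none."""
    match = PREFIX_PATTERN.search(query_text)
    return match.group(1) if match else None


def effective_mode(default: str, requested: Optional[str] = None) -> str:
    """Return the mode a query runs under, given the configured default."""
    if requested is None or requested == default:
        return default
    if requested in PERMITTED[default]:
        return requested
    raise ValueError(f"mode {requested} is not permitted when the default is {default}")
```

Under this reading, a query can only narrow the configured default and never widen it.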

Federation Query Forms

The SPARQL 1.1 Federated Query specification includes the SERVICE form, with which a query indicates that one of its components is to be executed by another processor. This is explicit federation. SPARQL 1.0, by contrast, made it possible to designate diverse datasets through GRAPH forms. However, its specification says only that, if an IRI is given in a dataset description, “attempts are made to obtain an RDF graph associated with the IRI.” In other words, it permits the first processor merely to obtain the designated graph and incorporate it into the local dataset, to which it then applies the query. The Dydra SPARQL processor performs explicit federation only: any graph a query references must already be present in the repository when the processor initiates the request.

Authorization

When a query refers either to another local repository or to a remote repository location, that access must be authorized. Authorization has two aspects: the client perspective and the service perspective. Each repository may freely specify its authorized locations in the form of ACL entries, each of which names either an originating repository or a request agent. A reference from a query is permitted when the query’s initial repository matches an entry among the service location’s authorized clients, or when the request agent satisfies the analogous constraint. The respective ``system`` repository contains these permissions as ACL entries.
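The matching rule can be sketched as follows. The ACL entry is modelled here as a hypothetical (kind, value) pair. How Dydra actually encodes ACL entries in the ``system`` repository is not described on this page, so the encoding, the function name and the agent IRI in the usage example are all assumptions.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class AclEntry:
    kind: str   # "repository" or "agent" (hypothetical encoding)
    value: str  # repository identifier or agent identifier


def is_authorized(entries: Iterable[AclEntry], origin_repository: str,
                  agent: Optional[str]) -> bool:
    """Permit the reference when the originating repository or the request
    agent matches one of the service location's ACL entries."""
    for entry in entries:
        if entry.kind == "repository" and entry.value == origin_repository:
            return True
        if entry.kind == "agent" and agent is not None and entry.value == agent:
            return True
    return False


# Hypothetical ACL of a service location.
acl = [
    AclEntry("repository", "my/field-harvest"),
    AclEntry("agent", "http://example.org/agents/harvester"),
]
```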

Examples

Internal Federation

A SERVICE form whose location is local to the query processor host is executed as a SubSelect within the same query processor. The location may be given either as a constant IRI or as a variable bound to such an IRI. References to repositories within the same account are always permitted, but references to any other repository require authorization.

PREFIX federation_mode: <urn:dydra:internal>  # or external
select *
where { ?s ?p ?o .
 service <http://localhost/jhacker/foaf> {
  ?s <http://xmlns.com/foaf/0.1/mbox> ?mbox .
 }
}
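The same-account rule can be sketched as a check on the location's path. This Python sketch assumes that a local location path has the form /account/repository, as in http://localhost/jhacker/foaf, and that repository identifiers are written account/repository. The `authorized` callback is a hypothetical stand-in for the ACL check described under Authorization.

```python
from typing import Callable, Tuple
from urllib.parse import urlparse


def repository_path(location: str) -> Tuple[str, str]:
    """Split a location such as http://localhost/jhacker/foaf into (account, repository)."""
    parts = urlparse(location).path.strip("/").split("/")
    if len(parts) < 2:
        raise ValueError(f"no account/repository path in {location}")
    return parts[0], parts[1]


def internal_reference_permitted(origin: str, location: str,
                                 authorized: Callable[[str, str], bool]) -> bool:
    """Same-account references are always permitted; others defer to `authorized`."""
    origin_account = origin.split("/")[0]
    account, repository = repository_path(location)
    if account == origin_account:
        return True
    return authorized(origin, f"{account}/{repository}")
```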

A local host is indicated by any of the following authorities:
- localhost
- 127.0.0.1
- dydra.com
- the exact hostname returned by the hostname function
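The locality test amounts to comparing the location's host against a short list of names. This Python sketch makes three assumptions. It takes `localhost`, the host the examples use, as the local name. It uses `socket.gethostname()` in place of the hostname function. It compares host names case-insensitively, because `urlparse` lower-cases them.

```python
import socket
from urllib.parse import urlparse

# Authorities treated as local (assumed from the list of local authorities).
LOCAL_AUTHORITIES = {"localhost", "127.0.0.1", "dydra.com"}


def is_local(location: str) -> bool:
    """Return True when the SERVICE location's host designates this processor."""
    host = urlparse(location).hostname  # lower-cased, without port or user info
    if host is None:
        return False
    return host in LOCAL_AUTHORITIES or host == socket.gethostname().lower()
```

For example, `is_local("http://localhost/jhacker/foaf")` is true, while `is_local("http://w3.org/tbl/foaf.nt")` is false.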

External Federation

A SERVICE form whose location IRI is external to the query processor host is executed as a SPARQL request to the remote processor. Every external request requires authorization to access the respective service.

PREFIX federation_mode: <urn:dydra:external>
select *
where { ?s ?p ?o .
 service <http://w3.org/tbl/foaf.nt> {
  ?s <http://xmlns.com/foaf/0.1/mbox> ?mbox .
 }
}
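On the wire, an external sub-query is an ordinary SPARQL protocol request. This Python sketch builds, but does not send, a SPARQL 1.1 Protocol query request using GET with a `query` parameter and asks for JSON results. Which sub-query text Dydra actually sends, and which HTTP method and result format it uses, are not stated on this page. The endpoint http://example.org/sparql and the function name are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request


def service_request(endpoint: str, sub_query: str) -> Request:
    """Build (but do not send) a SPARQL 1.1 Protocol query request via GET."""
    separator = "&" if "?" in endpoint else "?"
    url = endpoint + separator + urlencode({"query": sub_query})
    return Request(url, headers={"Accept": "application/sparql-results+json"}, method="GET")


# Hypothetical sub-query derived from the SERVICE group above.
sub_query = "SELECT * WHERE { ?s <http://xmlns.com/foaf/0.1/mbox> ?mbox }"
request = service_request("http://example.org/sparql", sub_query)
print(request.full_url)
```

Passing the request to `urllib.request.urlopen` would perform the call. The sketch stops short of that so that it runs without network access.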

Internal Federation with Virtual Sources

A local SERVICE location may designate data sources other than concrete repositories. Stored views are identified by their external resource identifiers, and alternative backends by their declared repository aliases.

View Federation

Where the location IRI includes a view suffix, that view is executed and its results are incorporated according to its dimensionality. A SELECT query expression yields a result field whose dimensions are the projection variables of its select form. A CONSTRUCT or DESCRIBE expression always yields a result field with the dimensions ?s, ?p, ?o.

A view location is any IRI that follows the pattern for internal federation described above.
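The dimensionality rule can be stated as a small function. In this Python sketch, the query form and projection variables are passed in directly rather than read from a stored view. Triples from a CONSTRUCT or DESCRIBE view become one solution each over s, p and o. The function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def view_dimensions(query_form: str, projection: Sequence[str] = ()) -> Tuple[str, ...]:
    """Return the dimensions of the result field a view contributes."""
    form = query_form.upper()
    if form == "SELECT":
        # SELECT views keep their own projection variables.
        return tuple(projection)
    if form in ("CONSTRUCT", "DESCRIBE"):
        # Graph-producing views always contribute ?s, ?p, ?o.
        return ("s", "p", "o")
    raise ValueError(f"unsupported view query form: {query_form}")


def triples_to_solutions(triples: Sequence[Tuple[str, str, str]]) -> List[Dict[str, str]]:
    """Turn each triple of a CONSTRUCT or DESCRIBE result into one s/p/o solution."""
    return [dict(zip(("s", "p", "o"), triple)) for triple in triples]
```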

Relational Federation

PostgreSQL views are declared in the store configuration (/srv/dydra/config/server.conf) to define an arbitrary mapping from a Postgres view to an RDF solution field, in which the columns of each view become the dimensions of the field. For example, given the declaration

pgsql {
  tr_crop_plots_seed_start {
    storage pgsql
    pgsql-table public.tr_crop_plots_seed_start
  }
  tr_crop_plots_seed_end {
    storage pgsql
    pgsql-table public.tr_crop_plots_seed_end
  }
  tr_crop_plots_seed {
    storage pgsql
    pgsql-table public.tr_crop_plots_seed
  }
}

and the Postgres views defined as

         View "marti.tr_crop_plots_seed"
Column | Type | Modifiers | Storage  | Description
--------+------+-----------+----------+-------------
 s      | text |           | extended |
 p      | text |           | extended |
 o      | text |           | extended |
View definition:
 SELECT concat('http://example.org/plot/', tr_crop_plots.crop_plot) AS s,
    'http://example.org/seed'::text AS p,
    tr_crop_plots.seed AS o
   FROM tr_crop_plots;

          View "marti.tr_crop_plots_seed_end"
 Column |  Type   | Modifiers | Storage  | Description
--------+---------+-----------+----------+-------------
 s      | text    |           | extended |
 p      | unknown |           | plain    |
 o      | date    |           | plain    |
View definition:
 SELECT concat('http://example.org/plot/', tr_crop_plots.crop_plot) AS s,
    'http://example.org/end_date' AS p,
    tr_crop_plots.end_date AS o
   FROM tr_crop_plots;

         View "marti.tr_crop_plots_seed_start"
 Column |  Type   | Modifiers | Storage  | Description
--------+---------+-----------+----------+-------------
 s      | text    |           | extended |
 p      | unknown |           | plain    |
 o      | date    |           | plain    |
View definition:
 SELECT concat('http://example.org/plot/', tr_crop_plots.crop_plot) AS s,
    'http://example.org/start_date' AS p,
    tr_crop_plots.start_date AS o
   FROM tr_crop_plots;

a federation operation could take the form

select ?field ?start ?end ?amount
where {
 ?field <http://example.org/harvest> ?amount .
 { service <http://localhost/pgsql/tr_crop_plots_seed> { ?field <http://example.org/seed> ?seed } }
 { service <http://localhost/pgsql/tr_crop_plots_seed_start> { ?field <http://example.org/start_date> ?start } }
 { service <http://localhost/pgsql/tr_crop_plots_seed_end> { ?field <http://example.org/end_date> ?end } }
}

Here the base query is evaluated against the repository my/field-harvest.
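The mapping from view rows to solutions can be tried out without Postgres. This Python sketch uses an in-memory SQLite database as a stand-in. It recreates `tr_crop_plots` with invented sample data and defines one of the three views, with SQLite's `||` in place of Postgres `concat()`. It then reads the view's rows as s/p/o solutions and evaluates one SERVICE pattern over them. It illustrates the idea only and is not how Dydra's pgsql backend is implemented.

```python
import sqlite3
from typing import Dict, List

# In-memory stand-in for the Postgres table and one view; the data are invented.
connection = sqlite3.connect(":memory:")
connection.executescript("""
    CREATE TABLE tr_crop_plots (crop_plot TEXT, seed TEXT, start_date TEXT, end_date TEXT);
    INSERT INTO tr_crop_plots VALUES ('17', 'barley', '2016-03-01', '2016-07-15');
    CREATE VIEW tr_crop_plots_seed_start AS
        SELECT 'http://example.org/plot/' || crop_plot AS s,
               'http://example.org/start_date' AS p,
               start_date AS o
        FROM tr_crop_plots;
""")


def view_solutions(view: str) -> List[Dict[str, str]]:
    """Read a view and return one solution per row, keyed by column name.
    The view name is interpolated into SQL, so it must come from trusted configuration."""
    cursor = connection.execute(f"SELECT * FROM {view}")
    columns = [description[0] for description in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


def match(solutions: List[Dict[str, str]], subject_var: str,
          predicate: str, object_var: str) -> List[Dict[str, str]]:
    """Evaluate { ?subject_var <predicate> ?object_var } over s/p/o solutions."""
    return [{subject_var: sol["s"], object_var: sol["o"]}
            for sol in solutions if sol["p"] == predicate]


start = match(view_solutions("tr_crop_plots_seed_start"),
              "field", "http://example.org/start_date", "start")
print(start)
```

In the query above, the base query would then join solutions like these with the harvest triples on ?field.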