The SPECIAL Policy Log Vocabulary

This document specifies splog, a vocabulary for logging data processing and sharing events that should comply with the consent given by a data subject. It also models the actions of giving and revoking consent.

Introduction

The European General Data Protection Regulation defines a set of obligations for personal data controllers and processors. Primary obligations include: obtaining explicit consent from the data subject for the processing of personal data, providing full transparency with respect to the processing, and enabling data rectification and erasure (albeit only in certain circumstances). At the core of any transparency architecture is the logging of events in relation to the processing and sharing of personal data. The logs should enable verification that data processors abide by the access and usage control policies that have been associated with the data based on the data subject’s consent and the applicable regulations.

The SPECIAL Policy Log Vocabulary focuses purely on representing such logs with the W3C RDF (Resource Description Framework) standard and publishing them following the principles of Linked Data.

At the heart of a log is a set of log entries organized along the privacy-aware content they relate to, together with associated metadata. The SPECIAL Policy Log Vocabulary makes extensive use of the Usage Policy Language Ontology, defined within the SPECIAL H2020 EU project.

The SPECIAL Policy Log vocabulary in turn builds upon the following existing RDF vocabularies:

SPL for privacy policies
Provenance Ontology for provenance information
Dublin Core Terms for metadata

Audience and scope

This document describes the SPECIAL Policy Log. It is aimed at people who wish to publish, in RDF, data processing and sharing events that must comply with a privacy policy, as well as consent-related activities (acquisition and revocation). The mechanics of translating logs from other formats are not covered here.

Document conventions and namespaces

The namespace for the SPECIAL Policy Log Vocabulary is http://purl.org/specialprivacy/splog#. We write triples in this document in the Turtle RDF syntax [[TURTLE]] using the following namespace prefixes:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX spl: <http://specialprivacy.ercim.eu/langs/usage-policy#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX prov: <http://www.w3.org/ns/prov#>
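PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX splog: <http://purl.org/specialprivacy/splog#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX dcat: <http://www.w3.org/ns/dcat#>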

Concepts

Log

A Log is a collection of data that records data processing and sharing events as well as consent-related activities (acquisition and revocation). The data in a Log can be roughly described as belonging to one of the following categories:

Log metadata:: This is metadata that SHOULD describe the log as a whole, such as the label or title, the software agent(s) it belongs to, etc. Metadata is described in Section 7. One of the most important aspects is the Processor (a splog:Processor instance) whose service is logged, related via the splog:processor property (a subproperty of prov:agent).
Log entries:: This is the actual data from the Log. The Log MUST make use of the splog:logEntry property (a prov:wasGeneratedBy subproperty) to point to each entry in the log.

Optionally, and for the sake of compactness, events MAY be grouped along a given dimension or set of dimensions, forming log groups. This is described in Section 6. In such a case, the log can point to the groups through the specific property splog:eventGroup (a subproperty of splog:event).

Log entries

Log entries contain information about processing and sharing events associated with data subjects, as well as actions related to the consent provided (or revoked) by data subjects. These different types of entries are represented in our model by a hierarchy of log entry classes. A LogEntry has two main types (subclasses), PolicyEntry and DataEvent, described as follows:

PolicyEntry:: This class reflects log entries related to policies and consent. We currently consider two subclasses: ConsentAssertion, specifying a consent provided by a data subject to a Controller (linked with the splog:controller property), and ConsentRevocation, denoting the revocation of a given consent. Note that, in principle, we assume that a consent provided by a data subject replaces any previous consent, hence consent updates are implicit. Nonetheless, companies may wish to explicitly record a revocation entry pointing to the revoked consent, so our model includes this capability via the splog:revoke property. The latter may facilitate consent tracking and consent versioning.
DataEvent:: This class considers log entries that are actually events on the data, i.e., the aforementioned data processing and sharing events. In the case of the latter, the concrete Recipient can be specified, via the splog:recipient property.
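
As a rough illustration of the two PolicyEntry subclasses, the following sketch records a consent assertion and a later revocation that points back to it. The eg: namespace and all eg: resources are hypothetical, and the direction of splog:revoke (from the revocation to the revoked consent) is an assumption based on the description above. Only the properties relevant to consent tracking are shown.

PREFIX splog: <http://purl.org/specialprivacy/splog#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX eg: <http://example.org/>    # hypothetical example namespace

# Consent given by eg:user1 to the controller eg:beFitInc.
eg:consent1 a splog:ConsentAssertion ;
    splog:dataSubject      eg:user1 ;
    splog:controller       eg:beFitInc ;
    splog:validityTime     "2018-01-02T10:00:00Z"^^xsd:dateTimeStamp ;
    splog:logEntryContent  eg:consentContent1 .

# Explicit revocation entry pointing to the revoked consent (assumed direction).
eg:revocation1 a splog:ConsentRevocation ;
    splog:dataSubject      eg:user1 ;
    splog:validityTime     "2018-03-01T08:30:00Z"^^xsd:dateTimeStamp ;
    splog:revoke           eg:consent1 .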

In turn, the data in a log entry can be described as belonging to one of the following kinds:

Log entry metadata:: This is metadata that SHOULD describe the LogEntry as a whole. Metadata is described in Section 7.
Data subject:: The log entry SHOULD reference the data subject(s) involved in the entry. This is specified with the splog:dataSubject property (a prov:wasAssociatedWith subproperty) pointing to the appropriate splog:DataSubject involved in the entry. For the sake of simplicity, we assume that an entry is related to a single data subject, but multiple data subjects MAY be specified using multiple splog:dataSubject properties. Note that in the case of anonymized logs, no subject can be specified.
Content:: The log entry MUST reference the actual data of the log. This is specified with the splog:logEntryContent property, pointing to the actual splog:LogEntryContent. This is described in Section 5.
Timestamps:: The log entry MUST reference the time at which the event occurred using the splog:validityTime property (a subproperty of prov:atTime). Note that this treats log entries as instantaneous events. For the sake of log preservation, the log entry SHOULD also reflect the time at which the entry was recorded, using splog:transactionTime (a dct:issued subproperty).
Message:: The entry SHOULD reference a splog:message of the log representing a human-friendly text.
Immutable record:: The log entry MAY reference a splog:InmutableRecord of its contents.
Activities:: The log entry MAY reference the BPM Activity, via the splog:activity property, and the concrete Case, via the splog:case property. Activities and Cases are members of a splog:Process, specified with skos:member. Both Activities and Cases can point to the involved splog:Processor, via the splog:performedBy property.
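
The activity-related properties in the list above could be combined as in the following sketch. All eg: resources and the eg: namespace are hypothetical, and the Activity and Case resources are left untyped because their class IRIs are not given here.

PREFIX splog: <http://purl.org/specialprivacy/splog#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX eg: <http://example.org/>    # hypothetical example namespace

# A log entry linked to the BPM activity and case it belongs to.
eg:logEntry1 a splog:ProcessingEvent ;
    splog:activity  eg:collectPosition ;
    splog:case      eg:case42 .

# Activities and cases are members of a process.
eg:trackingProcess a splog:Process ;
    skos:member  eg:collectPosition, eg:case42 .

# Both can point to the processor that performed them.
eg:collectPosition  splog:performedBy  eg:beFitInc .
eg:case42           splog:performedBy  eg:beFitInc .

eg:beFitInc a splog:Processor .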

Basic Example

This example provides a quick overview of how the SPECIAL Policy Log vocabulary might be used to represent a log. First, the log description:

eg:log1 a splog:Log ;
    dct:title             "Log of Database R2D2"@en ;
    dct:description       "This contains a dump of our Database R2D2 used to track BeFit devices"@en ;
    dct:issued            "2018-02-14"^^xsd:date ;
    prov:wasAttributedTo  eg:TrackingSystemR2D2 ;
    splog:processor       eg:beFitInc .

where eg:log1 is an instance of a Log, and eg:TrackingSystemR2D2 is a software system within the eg:beFitInc company.

Then, we include a new entry in the log, which is a processing event. The collection of the new position took place on 10 January 2018 at 13:20 (i.e. the validity time) and the event was recorded a few seconds later (i.e. the transaction time).

eg:log1 splog:event eg:logEntry1 .
eg:logEntry1 a splog:ProcessingEvent ;
    dct:title               "Collection of new device positions in Database R2D2 in January 2018"@en ;
    splog:dataSubject       eg:user1 ;
    dct:description         "We collected a new position of your BeFit device in our database in Europe"@en ;
    splog:transactionTime   "2018-01-10T13:20:50Z"^^xsd:dateTimeStamp ;
    splog:validityTime      "2018-01-10T13:20:00Z"^^xsd:dateTimeStamp ;
    splog:message           "Tracking position by GPS... collected!" ;
    splog:logEntryContent   eg:content1 ;
    splog:inmutableRecord   eg:iRec1 .

where eg:logEntry1 is an instance of a ProcessingEvent, eg:user1 is the data subject related to the event, eg:iRec1 represents the immutable version of the event, and eg:content1 points to the actual content, defined as follows:


eg:content1 a splog:LogEntryContent ;
    dct:description     "This contains the data item collected by a BeFit device in January 2018 in Vienna, only for the health purpose of the service"@en ;
    spl:hasData         svd:Location ;
    spl:hasProcessing   eg:SensorGathering ;
    spl:hasPurpose      eg:HealthTracking ;
    spl:hasStorage      [spl:hasLocation svl:OurServers] ;
    spl:hasRecipient    [a svr:Ours] .

eg:SensorGathering rdfs:subClassOf svpr:Collect .
eg:HealthTracking rdfs:subClassOf svpu:Health .

In turn, the immutable Record can be defined as the hash of the content and the data subject, which can be kept in a different ledger or Knowledge Base, together with the definition of the hash algorithm:

eg:iRec1 a splog:InmutableRecord ;
    splog:hashContent    "AZ8QWE..."^^xsd:base64Binary ;
    splog:hashUser       "BHJQQ..."^^xsd:base64Binary ;
    splog:hashAlgorithm  eg:hashRSA ;
    splog:hashKeyLength  eg:hash2048 .

Log Entry Content

The log entry content is represented by the splog:LogEntryContent class, which is a subclass (rdfs:subClassOf) of the SPECIAL spl:Authorization class. This way, event content and data policy authorizations can be checked for compliance.

Thus, the event content must include the following five elements, defined in the SPECIAL usage policy language:

spl:hasData

specifies the data involved in the event. The data can be given at different levels of detail, e.g. as concrete instances, as in Example 3, which contains the actual location data, or as a general data category only, e.g. stating eg:dataItem1 a svd:Location without the actual coordinates.

spl:hasProcessing

specifies how the data is processed.

spl:hasPurpose

specifies the purpose of the data processing.

spl:hasStorage

specifies where and for how long the data is stored.

spl:hasRecipient

specifies potential disclosures to other recipients, including third parties.
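
The two levels of detail for spl:hasData could look roughly as follows. This is only a sketch: the eg: namespace, eg:Location (a stand-in for a data category such as svd:Location), and the eg:latitude and eg:longitude properties are hypothetical, and the remaining content elements are omitted for brevity.

PREFIX splog: <http://purl.org/specialprivacy/splog#>
PREFIX spl: <http://specialprivacy.ercim.eu/langs/usage-policy#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX eg: <http://example.org/>    # hypothetical example namespace

# Category level: the data item is only typed, no values are recorded.
eg:contentA a splog:LogEntryContent ;
    spl:hasData  eg:dataItem1 .
eg:dataItem1 a eg:Location .

# Instance level: the data item also carries the actual coordinates.
eg:contentB a splog:LogEntryContent ;
    spl:hasData  eg:dataItem2 .
eg:dataItem2 a eg:Location ;
    eg:latitude   "48.2082"^^xsd:decimal ;
    eg:longitude  "16.3738"^^xsd:decimal .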

Log Entry Group

Figure: Pictorial summary of Log grouping (UML-style block diagram of the terms in this vocabulary related to grouping).

A log entry group is a subclass of a log entry, containing information about one or more log entries in order to support presentation and processing. The data in a log entry group can be roughly described as belonging to one of the following kinds:

Log entry group metadata:: This is metadata that describes the log entry group as a whole. Metadata is described in Section 7.
Timestamps:: The entry MAY reference the interval of time covered by the group, using the splog:validityStartTime and splog:validityEndTime properties (subproperties of prov:startedAtTime and prov:endedAtTime), denoting the validity time. For the sake of log preservation, the entry SHOULD also reflect the time at which the entry was recorded, e.g. using the splog:transactionTime property (a dct:issued subproperty).
Dimensions:: The group MUST reference the component(s) it groups. This is specified with the splog:dimension property (a subproperty of splog:logEntryContent), pointing to a particular splog:LogEntryContent.
Data subject:: The group MAY reference the data subject(s) it groups, via the splog:dataSubjectGroup property (a prov:wasAssociatedWith subproperty). This property refers to a splog:DataSubjectGroup instance that groups all the splog:DataSubject instances, using skos:broader.
Events:: The group MAY point to the particular events included in the group through the splog:member property (a skos:broader subproperty).

The following example shows a log grouping all recommendations given during a month, without specifying the concrete data.

eg:log1 a splog:Log ;
    splog:eventGroup eg:recommendationsJanuary2018 .

eg:recommendationsJanuary2018  a splog:logEntryGroup ;
    splog:transactionTime   "2018-02-01T00:05:00Z"^^xsd:dateTimeStamp ;
    splog:validityTime      "2018-01-31T23:59:59Z"^^xsd:dateTimeStamp ;
    splog:dataSubjectGroup  eg:basicSubjectGroup1 ;
    splog:dimension         eg:templateOfferRecommendation .

eg:basicSubjectGroup1       splog:member eg:user1, eg:user2, eg:user3 .

eg:templateOfferRecommendation a splog:LogEntryContent ;
    spl:hasData             eg:OfferRecommendation ;
    spl:hasProcessing       eg:MonthlyDataAnalysis ;
    spl:hasPurpose          eg:MonthlyOffersRecommendation ;
    spl:hasStorage          [spl:hasLocation svl:OurServers] ;
    spl:hasRecipient        [a svr:Ours] .

eg:OfferRecommendation rdfs:subClassOf  svd:Location ;
    rdfs:comment  "We recommended you an offer at the end of the month based on the location of your device during the given month. The concrete offer is not stored in this log" .
eg:MonthlyDataAnalysis rdfs:subClassOf  svpr:Analyze .
eg:MonthlyOffersRecommendation rdfs:subClassOf eg:RecommendationActivity .
eg:RecommendationActivity rdfs:subClassOf svpu:Marketing .
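
The example above records a single validity time. A group that instead states the covered interval and lists its member events might be sketched as follows. The resource names, the eg: namespace, and the February dates are hypothetical; the class name is spelled as in the example above.

PREFIX splog: <http://purl.org/specialprivacy/splog#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX eg: <http://example.org/>    # hypothetical example namespace

eg:recommendationsFebruary2018 a splog:logEntryGroup ;
    # Interval covered by the grouped events (validity time).
    splog:validityStartTime  "2018-02-01T00:00:00Z"^^xsd:dateTimeStamp ;
    splog:validityEndTime    "2018-02-28T23:59:59Z"^^xsd:dateTimeStamp ;
    # Time at which the group itself was recorded.
    splog:transactionTime    "2018-03-01T00:05:00Z"^^xsd:dateTimeStamp ;
    splog:dimension          eg:templateOfferRecommendation ;
    # Optional pointers to the individual events in the group.
    splog:member             eg:logEntry7, eg:logEntry8 .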

Log Provenance

In the log model, we assume that the description of entries coming from different systems can be merged and integrated together in a single store, which will potentially serve transparency and compliance mechanisms.

In certain scenarios, named graphs can be used to encapsulate logs before integrating entries coming from different subsystems. For example, assume that a gym company, ViennaGym, referred to with the namespace viennagym, makes offers based on a mutual sharing policy with the previously introduced company beFitInc. The following example builds upon the previous gathering event (see Examples 1 and 2) and shows its integration with a marketing event from ViennaGym. First, the data item is gathered by beFitInc (previous example); then it is shared between beFitInc and ViennaGym; finally, the latter uses the data for marketing purposes. This series of events is encapsulated in three graphs, eg:tracking, eg:sharing, and viennagym:marketing, respectively. We use the TriG [[TRIG]] syntax, which extends Turtle with named graphs.

eg:tracking prov:agent eg:beFitInc .
eg:sharing prov:agent eg:beFitInc .
viennagym:marketing prov:agent viennagym:ViennaGymInc .

eg:tracking {
  eg:logEntry1 a splog:ProcessingEvent ;
     prov:wasAssociatedWith  eg:user1 ;
     splog:transactionTime   "2018-01-10T13:20:50Z"^^xsd:dateTimeStamp ;
     splog:validityTime      "2018-01-10T13:20:00Z"^^xsd:dateTimeStamp ;
     splog:message           "Tracking position by GPS... collected!" ;
     splog:logEntryContent   eg:content1 .

  eg:content1 a splog:LogEntryContent ;
     spl:hasData         svd:Location ;
     spl:hasProcessing   befit:sensorGathering ;
     spl:hasPurpose      befit:HealthTracking ;
     spl:hasStorage      [spl:hasLocation svl:OurServers] ;
     spl:hasRecipient    [a svr:Ours] .
}

eg:sharing {
  eg:logEntry2 a splog:SharingEvent ;
     prov:wasAssociatedWith  eg:user1 ;
     splog:transactionTime   "2018-01-15T09:02:30Z"^^xsd:dateTimeStamp ;
     splog:validityTime      "2018-01-15T09:00:00Z"^^xsd:dateTimeStamp ;
     splog:recipient         viennagym:ViennaGymInc ;
     splog:logEntryContent   eg:content2 .

  eg:content2 a splog:LogEntryContent ;
     spl:hasData         svd:Location ;
     spl:hasProcessing   eg:SecureTransferPartner ;
     spl:hasPurpose      eg:PartnerRecommendation ;
     spl:hasStorage      [spl:hasLocation svl:OurServers] ;
     spl:hasRecipient    viennagym:Company .

  eg:SecureTransferPartner rdfs:subClassOf svpr:Transfer .
  eg:PartnerRecommendation rdfs:subClassOf eg:RecommendationActivity .
  eg:RecommendationActivity rdfs:subClassOf svpu:Marketing .
  viennagym:Company rdfs:subClassOf spl:AnyRecipient .
}

viennagym:marketing {
  viennagym:entry1111 a splog:ProcessingEvent ;
     prov:wasAssociatedWith  befit:Sue ;
     splog:transactionTime   "2018-01-27T13:00:30Z"^^xsd:dateTimeStamp ;
     splog:validityTime      "2018-01-27T13:00:00Z"^^xsd:dateTimeStamp ;
     splog:message           "Send offer of our gym!" ;
     ...
     splog:logEntryContent   viennagym:marketing6590 .

  viennagym:marketing6590 a splog:LogEntryContent ;
     spl:hasData         svd:Location ;
     spl:hasProcessing   viennagym:Analysis ;
     spl:hasPurpose      viennagym:GymRecommendation ;
     spl:hasStorage      [spl:hasLocation svl:OurServers] ;
     spl:hasRecipient    [a svr:Ours] .

  viennagym:Analysis rdfs:subClassOf svpr:Analyze .
  viennagym:GymRecommendation rdfs:subClassOf svpu:Marketing .
}

Recording instance data

In principle, the main objective of SPLog is to record data processing and sharing events, together with policy-related events (consent assertion and revocation), while keeping the subjects' actual (instance) data in a different ledger. However, SPLog additionally provides an optional instance module (a) to store such instance data, or (b) to refer to a service or API where the instance data can be located. In the following, we provide details and examples on the potential use of this vocabulary for these two cases. In both cases, we consider a splog:InstanceData class (a subclass of dcat:Dataset) associated with a splog:DataEvent log entry via the splog:instanceData property. The instance data can then be served in different splog:DataDistribution instances (a subclass of dcat:Distribution), e.g. one distribution stored as raw CSV data and one as JSON data. Combining stored and referenced data is also possible.

A first possibility is to store the actual data in the log. For instance, BeFit may decide to store in the log both the data collection event and (a copy of) the actual data collected from Sue's BeFit device. Note that physically storing the instance data in the log implies that the log contains (even more) sensitive data. Thus, in general, it is not recommended to keep the instance data on a public ledger (such as a blockchain), as a security breach or a future break of the hash function would expose the actual data. In addition, as in the previous case of the log entries, an immutable ledger would prevent the controller from deleting or rectifying the data (as is required), hence cryptographic deletion mechanisms must be in place for the actual data.

The SPLog vocabulary provides two ways of storing instance data, (a) storing raw data (e.g. JSON, CSV, etc.) or (b) storing the semantic representation of the data (i.e. RDF data).

Storing raw data. In this case, the splog:DataDistribution contains the raw data itself, using the splog:rawData property and further described with additional properties such as dct:format media type or dcat:byteSize. The following is an example of a raw distribution in CSV, showing the collection of location data from Sue.

  befit:entry3918 a splog:ProcessingEvent ;
     ...
     splog:instanceData befit:instance3918 .

  befit:instance3918 a splog:InstanceData ;
     dct:title "Actual collected data" ;
     dcat:contactPoint befit:CollectionContactPoint ;
     splog:dataDistribution befit:distribution3918_1 .
     ...

  befit:distribution3918_1 a splog:DataDistribution ;
     dcat:mediaType "text/csv" ;
     dcat:byteSize "304"^^xsd:decimal ;
     splog:rawData "PersonName,Position,Time\nSue,\"48.2082 N, 16.3738 E\",2018-01-27T13:00:00Z" .

Storing RDF data. In case the data is actually RDF data, the concrete resource (e.g. an RDF resource or named graph) can be specified via splog:RDFData. The following is an example of storing a distribution in RDF, showing the collection of heart rate data from Sue.

  befit:instance5000 a splog:InstanceData ;
     dct:title "Actual collected data" ;
     dcat:contactPoint befit:CollectionContactPoint ;
     splog:dataDistribution befit:distribution5000_1 .
     ...

  befit:distribution5000_1 a splog:DataDistribution ;
     splog:RDFData befit:collection_5000 .

  befit:collection_5000 foaf:name "Sue" ;
     befit:heartRate 80 .

The second possibility is to reference instance data that is located externally. One option is for the distribution to use the aforementioned splog:RDFData property with a resource that is itself external; in that case, the data can be retrieved by standard Linked Data dereferencing. More generally, SPLog provides access to external data via splog:downloadURL (a subproperty of dcat:downloadURL), typically described with a dct:format media type, or via splog:accessURL (a subproperty of dcat:accessURL) when the data cannot be downloaded directly but an access point exists (e.g. a landing page, feed, or SPARQL endpoint). The following is an example of a reference to a JSON resource storing the heart rate data collected from Sue at the given time.

  befit:instance888 a splog:InstanceData ;
     dct:title "Actual collected data" ;
     dcat:contactPoint befit:CollectionContactPoint ;
     splog:dataDistribution befit:distribution888_1 .
     ...

  befit:distribution888_1 a splog:DataDistribution ;
     dcat:mediaType "application/json" ;
     dcat:downloadURL
      <http://example.org/internalAPI/getHistoricData/Sue/20180325090930> .
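
Stored and referenced data can also be combined for the same instance data. The following sketch, under assumed names, attaches two distributions to one splog:InstanceData resource: one holding raw JSON and one reachable only through an access point. The befit: namespace IRI, the resource names, the JSON payload, and the endpoint URL are all hypothetical.

PREFIX splog: <http://purl.org/specialprivacy/splog#>
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX befit: <http://example.org/befit/>    # hypothetical namespace IRI

befit:instance999 a splog:InstanceData ;
    splog:dataDistribution befit:distribution999_1, befit:distribution999_2 .

# Distribution 1: the raw data is stored in the log itself.
befit:distribution999_1 a splog:DataDistribution ;
    dcat:mediaType "application/json" ;
    splog:rawData "{\"name\": \"Sue\", \"heartRate\": 80}" .

# Distribution 2: the data is not directly downloadable, only an access point is given.
befit:distribution999_2 a splog:DataDistribution ;
    splog:accessURL <http://example.org/sparql> .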

Outline of the vocabulary