A Decentralised Registry to Support the Dynamic Binding of Semantic Components in Solid Data Browsers

Wouter Termont1, Wouter Janssens1, Ruben Verborgh1,3, Tom Haegemans1,2,*

* Corresponding author

1 Digita, R&D Dept., Breydelstraat 34-40 - 1040 Brussels, Belgium

2 KU Leuven, Department of Decision Sciences and Management Information Systems, Naamsestraat 69 - 3000 Leuven, Belgium

3 Ghent University – imec, Department of Electronics and Information Systems, Technologiepark-Zwijnaarde 126, 9052 Gent, Belgium





This document is the first version of a ‘living document’. You can find an up-to-date version at go.digita.ai/semcom.

The corresponding author can be contacted at tom@digita.ai or tom.haegemans@kuleuven.be.

This project has received funding from the European Union’s H2020 research and innovation programme under Grant Agreement no 871528.


1. Introduction

Recently, much attention has been given to Tim Berners-Lee’s Solid specification (Capadisli, Berners-Lee, Verborgh, Kjernsmo, Bingham, & Zagidulin, 2021) and the ecosystem that has grown around it. This is not surprising, as the specification is meant to give people a better grip on their personal information, a concern that is highly relevant today.

One feature of a Solid ecosystem that enables this improved ‘grip’ is that people can see which parties store data about them. For this to work, organisations need to provide their users with a ‘pod’ (i.e. a digital vault) that contains the users’ information. Then, much like people can connect multiple mailboxes to one mail client (e.g. Microsoft Outlook), they can connect several of their pods to a single Solid client, often called a ‘data browser’. Solid clients render the contents of one or more pods in a way that people can easily understand. For example, if one of your pods contains your location in the form of x- and y-coordinates (which are difficult to interpret), a data browser can plot this location on an actual map (which is easier to understand).

However, even though data browsers are an essential part of the Solid ecosystem, they are not easy to build. The main reason is that pods can contain all kinds of data (e.g. your location, heart rate, …), and the developers of data browsers cannot possibly foresee all of these kinds of data, let alone create an exhaustive list of them.

To solve this problem, data browsers need a way to inspect, at runtime, the kind of data that should be displayed, and then to load and dynamically bind the right component to display it.

In the literature, several theoretical building blocks are available for designing such a system, such as linked data shapes, linked data forms and dependency injection. Likewise, in practice, similar problems have been solved using registries of micro-frontend components, like OpenComponents (OpenComponents, n.d.) and Bit.dev (Bit, n.d.). However, to the best of our knowledge, these theoretical building blocks have not yet been implemented in practice, and none of the practical solutions is based on semantics or linked data.

Therefore, the goal of this research is to construct a decentralised registry to support the dynamic binding of semantic components in Solid data browsers. This registry builds on insights from practical solutions such as OpenComponents and Bit.dev, while using theoretical building blocks such as linked data shapes as its foundation.

The remainder of this report is structured as follows. Section 2 presents the requirements for a decentralised registry to support the dynamic binding of semantic components in Solid data browsers. Section 3 details how the design science methodology was adopted. Section 4 introduces the high-level architecture and concepts underlying the software tool that enables dynamic binding of semantic components in Solid data browsers. Section 5 discusses how the results of Section 4 relate to the requirements described in Section 2, and Section 6 concludes the research.

2. Requirements for Enabling Dynamic Binding of Semantic Components in Solid Data Browsers

To enable Solid clients, such as data browsers, to render disparate data from one or more pods without an exhaustive list of all possible kinds of data, at least three steps or roles need to be implemented: shape discovery, component negotiation and dynamic binding.

Shape Discovery

A first requirement is a way for the client application to learn about the kinds of data present in a pod. Such a discovery mechanism can rely either on the presence of metadata, or on some form of pattern-matching or other data-processing method that would extract such metadata on-the-fly. Since metadata, described within the Resource Description Framework (RDF; Cyganiak, Wood, & Lanthaler, 2014), plays a central role in the Solid ecosystem, this could be the preferred approach, at least in those cases where such metadata is present.

Crucial to this first requirement is a metadata standard for describing different kinds of data. Several proposals for doing so, at different levels of complexity, are already being developed within the extended Solid ecosystem. At the most basic level, class annotations of RDF Schema (RDFS; Brickley & Guha, 2014) could be used to infer the presence of certain kinds of data. However, since these annotations have no normative force on their own, it can be hard to infer with any certainty whether the annotated data are correct and complete.
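To illustrate this most basic, class-based form of discovery, the following TypeScript sketch collects the classes asserted with rdf:type in a set of triples. The Triple interface and the discoverClasses function are hypothetical simplifications (a real client would rely on an RDF parsing library), and the schema.org classes in the usage example are merely illustrative. As noted above, the result only reflects what the data claims to be; it does not check the data against any shape.

```typescript
// Hypothetical, simplified triple representation; a real client would use
// an RDF library to parse the resources fetched from a pod.
interface Triple {
  subject: string;
  predicate: string;
  object: string;
}

const RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";

// Collects every class asserted via rdf:type. This reflects what the data
// claims to be; it does not validate the data against a shape.
function discoverClasses(triples: Triple[]): Set<string> {
  const classes = new Set<string>();
  for (const triple of triples) {
    if (triple.predicate === RDF_TYPE) {
      classes.add(triple.object);
    }
  }
  return classes;
}

// Usage with illustrative schema.org annotations.
const podTriples: Triple[] = [
  { subject: "https://pod.example/me#location", predicate: RDF_TYPE, object: "http://schema.org/GeoCoordinates" },
  { subject: "https://pod.example/me#location", predicate: "http://schema.org/latitude", object: "50.85" },
  { subject: "https://pod.example/me", predicate: RDF_TYPE, object: "http://schema.org/Person" },
];
console.log(Array.from(discoverClasses(podTriples)));
```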

The most advanced proposal for dealing with this normative aspect is linked data shapes, as discussed by Berners-Lee (2019) and Verborgh (2019): descriptions of the structural constraints to which a data resource should adhere. The two leading languages for describing linked data shapes are the Shapes Constraint Language (ShaCL; Knublauch & Kontokostas, 2017) and Shape Expressions (ShEx; Prud'hommeaux, Boneva, Labra Gayo, & Kellogg, 2019).

A further evolution on top of these languages is Shape Trees (Prud'hommeaux & Bingham, 2021), which addresses the problem of declaring shapes that span multiple resources by linking ShaCL and ShEx shapes to the resource hierarchy. Note that, of the above-mentioned shape languages, Shape Trees is the only one that the current Solid specification explicitly mandates to be sent as a response header. This makes sense, since Shape Trees subsumes both ShaCL and ShEx, but it forces discovery mechanisms either to process the most complex language, even in cases where shape- or class-based discovery would suffice, or to discover these less complex constraints in the structured data itself.

Component Negotiation

Once a standard way of describing kinds of data is available, a second requirement is a way for client applications to retrieve, based on this description, a component capable of rendering data that adheres to it. Since multiple components may well be able to render the same kind of data, the client will sometimes have to choose between them. This choice can be based on the rendering capabilities of each component, but also on any other preferences the client application might have.

Because the client cannot know about all possible kinds of data in advance, and this variety might be arbitrarily large, the components themselves will almost always reside at a remote location. Together with the above-mentioned need to choose between multiple components, this necessitates some form of component negotiation. Such a negotiation process can be implemented in any number of ways, based on the metadata describing the data's shape, the metadata describing the client's preferences, and the components' metadata describing their compatibility with such shapes and preferences.
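As one possible form of component negotiation, the TypeScript sketch below treats shape compatibility as a hard filter and uses client preferences only to rank the remaining candidates. The ComponentOffer and ClientPreferences types, their fields and the scoring rule are hypothetical; they do not reproduce the SemCom metadata vocabulary, and a real negotiation could equally take place server-side or over several exchanges.

```typescript
// Hypothetical component metadata, reduced to what negotiation needs.
interface ComponentOffer {
  id: string;
  shapes: string[];       // classes or shapes the component can render
  displaySizes: string[]; // illustrative values such as "small" or "large"
  runtime: string;        // illustrative value such as "web-components"
}

// Hypothetical client preferences; all fields are optional.
interface ClientPreferences {
  displaySize?: string;
  runtime?: string;
}

// Compatibility is a hard filter; preferences only affect the ranking.
function negotiate(shape: string, offers: ComponentOffer[], prefs: ClientPreferences): ComponentOffer[] {
  const score = (offer: ComponentOffer): number =>
    (prefs.displaySize !== undefined && offer.displaySizes.includes(prefs.displaySize) ? 1 : 0) +
    (prefs.runtime !== undefined && offer.runtime === prefs.runtime ? 1 : 0);

  return offers
    .filter((offer) => offer.shapes.includes(shape))
    // Best score first; ties are broken by id to keep the result deterministic.
    .sort((a, b) => score(b) - score(a) || a.id.localeCompare(b.id));
}

const GEO = "http://schema.org/GeoCoordinates";
const offers: ComponentOffer[] = [
  { id: "map-large", shapes: [GEO], displaySizes: ["large"], runtime: "web-components" },
  { id: "map-small", shapes: [GEO], displaySizes: ["small"], runtime: "web-components" },
  { id: "heart-rate", shapes: ["https://example.org/HeartRate"], displaySizes: ["small"], runtime: "web-components" },
];
console.log(negotiate(GEO, offers, { displaySize: "small", runtime: "web-components" }).map((o) => o.id));
```

Keeping compatibility and preference separate means that a component which cannot render the data is never returned, however well it matches the client's other preferences.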

Note that the negotiation depends only on metadata, not on the components themselves, so the files of each component can be served from a different remote location than the one where the metadata is registered and the negotiation takes place. This allows the entire remote architecture to be highly decentralised and to rely on the serving power of Content Delivery Networks (CDNs). The negotiation process can also take many forms, varying in the number of exchanges between the client and the remote server(s), whether the decision is made client-side or server-side, and the number of compatible components returned.

Ultimately, though, the negotiation should result in one or more data-compatible components being available on the client side, either after being returned by the registry directly, or after being fetched by the client from a location pointed out by the registry (such as a CDN).

Dynamic Binding

Given a component that can render the data found in the pod(s), the client should then be able to insert this component into its runtime without disrupting the user experience. Depending on the language or framework, this will require a combination of dynamic import, which loads the component's module into the runtime, and some form of late binding (e.g. runtime dependency injection), which links the component to the place where it is needed in the client's process.

Note that, since the client knows nothing about the implementation of the component it retrieved, all components should adhere to a common interface that enables the client to construct and insert them in a standardised fashion. More importantly, such an interface tells the client how to pass the component the information it needs, such as the data to be rendered and any required configuration parameters or constraints.
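A minimal TypeScript sketch of dynamic binding follows, assuming a hypothetical component contract (SemanticComponent) and a hypothetical module convention in which each component module exports a create function. The module loader is injected rather than hard-coded: in a browser it would typically wrap a dynamic import(url), while the stub in the usage example lets the sketch run without network access. Binding instances to named slots stands in for the late-binding step; it is not the API of the SemCom SDK.

```typescript
// Hypothetical contract all components adhere to, so that the client can
// construct and feed them without knowing their implementation.
interface SemanticComponent {
  setData(data: unknown): void;
  render(): string;
}

// Hypothetical convention: each component module exports a `create` factory.
interface ComponentModule {
  create(config: Record<string, string>): SemanticComponent;
}

// In a browser this could be `(url) => import(url)`; here it is injected.
type ModuleLoader = (url: string) => Promise<ComponentModule>;

class ComponentBinder {
  private readonly load: ModuleLoader;
  private readonly slots = new Map<string, SemanticComponent>();

  constructor(load: ModuleLoader) {
    this.load = load;
  }

  // Loads the module at runtime, instantiates the component, passes it the
  // data and late-binds the instance to a named slot of the client.
  async bind(slot: string, url: string, config: Record<string, string>, data: unknown): Promise<SemanticComponent> {
    const mod = await this.load(url);
    const component = mod.create(config);
    component.setData(data);
    this.slots.set(slot, component);
    return component;
  }

  get(slot: string): SemanticComponent | undefined {
    return this.slots.get(slot);
  }
}

// Usage with a stub loader, so that no network access is needed.
const stubLoader: ModuleLoader = async () => ({
  create: (config) => {
    let value: unknown = null;
    return {
      setData: (data) => { value = data; },
      render: () => `${config.label}: ${JSON.stringify(value)}`,
    };
  },
});

const binder = new ComponentBinder(stubLoader);
binder
  .bind("main", "https://cdn.example/location-map.js", { label: "location" }, { lat: 50.85 })
  .then((component) => console.log(component.render()));
```

Injecting the loader is itself a small instance of dependency injection: the binder never needs to know whether components come from a CDN, a local bundle or a test stub.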

3. Methodology

The artefact in this research was developed following the design science methodology (Peffers, Tuunanen, Rothenberger, & Chatterjee, 2007). Design science in information systems research aims to “create and evaluate IT artefacts to solve identified organisational problems” (Hevner, March, Park, & Ram, 2004, p. 77). It differs from professional design in two ways. First, the developed artefact must contribute to the knowledge base (Gregor & Hevner, 2013, p. 342), and second, the creation and evaluation of the artefact must follow a rigorous process (Peffers et al., 2007, p. 49).

Accordingly, our software, first, demonstrates a solution to a problem of great importance to the Solid research community, and second, is grounded in several theoretical foundations, such as linked data shapes and dependency injection.

4. SemCom: A Specification of a Decentralised Registry of Linked Data Shapes and Semantic Components

In this section, we introduce the artefact to support the dynamic binding of semantic components in a Solid data browser. The artefact, called SemCom, is a specification that gives rise to an ecosystem of decentralised registries containing a mapping between linked data shapes and components. This mapping adds semantic information to web components so that they can be bound dynamically in data browsers. In what follows, we first introduce the specification, an updated version of which will be maintained online (Digita, 2021a). Next, we discuss its reference implementation, consisting of a software development kit (SDK) and a repository, the full interfaces of which can be found online (Digita, 2021b).

The SemCom Specification, Ecosystem and Architecture

At the centre of the ecosystem are semantic components, which consist of code and metadata (see Figure 4.1). The component's code is executed at runtime so that, when relevant, it can be bound dynamically to a Solid data browser. The component's metadata describes its nature through several attributes.


Figure 4.1: A schematic representation of a semantic component that can be bound dynamically to a Solid data browser on a mobile device.


The key attributes of a component’s metadata are the input and output attributes, as these contain the component’s semantic information in the form of references to RDFS classes, shapes (e.g. ShaCL or ShEx) and/or Shape Trees. As such, these attributes establish the many-to-many mapping between components and linked data shapes. The input attribute specifies which linked data shapes the component can be used for. The output attribute specifies the linked data shapes that can be used to describe the component's output.

Other metadata attributes can describe the purposes for which the component can be used, the runtime in which it can be used, its license, its version, et cetera. Especially interesting is metadata for cross-platform development, such as intended media types and display sizes, which helps applications respond to the device they run on.
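The following TypeScript sketch shows how input attributes give rise to a many-to-many mapping: one component can declare several input shapes, and one shape can be served by several components. The ComponentMetadata field names and the example IRIs are hypothetical and only mirror the attributes listed above; they are not the SemCom vocabulary.

```typescript
// Hypothetical metadata record; field names mirror the attributes described
// in the text but are not the SemCom vocabulary.
interface ComponentMetadata {
  uri: string;
  input: string[];  // classes, shapes or Shape Trees the component can render
  output: string[]; // shapes describing the component's output
  purposes?: string[];
  runtime?: string;
  license?: string;
  version?: string;
  mediaTypes?: string[];
  displaySizes?: string[];
}

// Inverts the input attributes into a shape -> component URIs index.
function indexByInputShape(components: ComponentMetadata[]): Map<string, string[]> {
  const index = new Map<string, string[]>();
  for (const component of components) {
    for (const shape of component.input) {
      const uris = index.get(shape) ?? [];
      uris.push(component.uri);
      index.set(shape, uris);
    }
  }
  return index;
}

const LOCATION = "https://example.org/shapes#Location";
const TABLE = "https://example.org/shapes#Table";
const catalogue: ComponentMetadata[] = [
  { uri: "https://cdn.example/location-map.js", input: [LOCATION], output: [], version: "1.0.0", displaySizes: ["large"] },
  { uri: "https://cdn.example/generic-table.js", input: [LOCATION, TABLE], output: [], license: "MIT" },
];
console.log(indexByInputShape(catalogue).get(LOCATION));
```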

To impose as few restrictions as possible on the interoperability between SemCom components and the apps using them, app–component interaction is performed by means of events. The specification provides for four types of events:

Each of these events can contain data in one of five data types: plain text, JSON, RDF quads, a binary large object (blob) or an array of unsigned 8-bit integers.
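The five data types can be modelled in TypeScript as a discriminated union, as in the sketch below. The kind tags, the simplified Quad interface, the SemComEvent envelope and the describePayload helper are hypothetical; the SemCom specification defines its own event format. The Blob type assumes an environment, such as a browser or a recent Node.js version, in which Blob is available globally.

```typescript
// Simplified, hypothetical quad representation.
interface Quad {
  subject: string;
  predicate: string;
  object: string;
  graph: string;
}

// One variant per data type listed in the text; the `kind` tags are illustrative.
type Payload =
  | { kind: "text"; value: string }
  | { kind: "json"; value: unknown }
  | { kind: "quads"; value: Quad[] }
  | { kind: "blob"; value: Blob }
  | { kind: "bytes"; value: Uint8Array };

// Hypothetical event envelope; e.g. "read" or "response" as the type.
interface SemComEvent {
  type: string;
  payload?: Payload;
}

// Exhaustive switch: the compiler narrows `value` based on `kind`.
function describePayload(payload: Payload): string {
  switch (payload.kind) {
    case "text":
      return `text (${payload.value.length} characters)`;
    case "json":
      return "json";
    case "quads":
      return `${payload.value.length} quads`;
    case "blob":
      return `blob (${payload.value.size} bytes)`;
    case "bytes":
      return `${payload.value.byteLength} bytes`;
  }
}

const responseEvent: SemComEvent = { type: "response", payload: { kind: "bytes", value: new Uint8Array([1, 2, 3]) } };
if (responseEvent.payload) console.log(describePayload(responseEvent.payload));
```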

Figure 4.2: An entity-relation overview of the SemCom architecture.

From an architectural point of view, the SemCom ecosystem stores the components’ code at a host, such as a content delivery network (CDN), and the components’ metadata at a repository (see Figure 4.2). For scalability reasons, the repositories thus do not contain any actual code: they function as stores for component metadata and are designed to work in a decentralised way.

Using the requirement-level keywords of RFC 2119 (Bradner, 1997), the SemCom specification imposes the following requirements on repositories:

This way, a Solid app can query the endpoint(s) of a single SemCom node for components capable of rendering data that adheres to the shape discovered in the Solid pod, while the SemCom registry it communicates with is actually a decentralised network of interacting nodes (see Figure 4.3).

Figure 4.3: A conceptual overview of the workings of SemCom
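The following TypeScript sketch illustrates one way a single node could answer a query on behalf of the whole network: it combines its local metadata with the results of forwarding the query to its peers, and tracks visited nodes so that cycles in the network do not cause endless forwarding. The RegistryNode class is hypothetical and in-memory; real nodes would call each other's HTTP endpoints asynchronously.

```typescript
// Hypothetical in-memory registry node; real nodes would expose HTTP
// endpoints and query each other asynchronously.
class RegistryNode {
  readonly id: string;
  private readonly local: Map<string, string[]>; // shape -> component URIs
  private readonly peers: RegistryNode[] = [];

  constructor(id: string, local: Map<string, string[]>) {
    this.id = id;
    this.local = local;
  }

  connect(peer: RegistryNode): void {
    this.peers.push(peer);
  }

  // Answers from local metadata and forwards the query to peers; the shared
  // `visited` set prevents endless forwarding around cycles.
  query(shape: string, visited: Set<string> = new Set<string>()): string[] {
    if (visited.has(this.id)) return [];
    visited.add(this.id);
    const results = new Set<string>(this.local.get(shape) ?? []);
    for (const peer of this.peers) {
      for (const uri of peer.query(shape, visited)) results.add(uri);
    }
    return Array.from(results);
  }
}

const SHAPE = "https://example.org/shapes#Location";
const nodeA = new RegistryNode("a", new Map([[SHAPE, ["https://cdn.example/map.js"]]]));
const nodeB = new RegistryNode("b", new Map([[SHAPE, ["https://cdn.example/table.js"]]]));
const nodeC = new RegistryNode("c", new Map([[SHAPE, ["https://cdn.example/map.js", "https://cdn.example/chart.js"]]]));
nodeA.connect(nodeB);
nodeB.connect(nodeC);
nodeC.connect(nodeA); // closes a cycle in the network

console.log(nodeA.query(SHAPE));
```

From the app's point of view only nodeA is contacted, yet the answer covers the metadata held by all three nodes, with duplicates removed.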

Reference Implementation & General Process

The reference implementation of the SemCom ecosystem consists of a Node.js implementation of the repository software, an SDK for browser applications using Web Components, and a number of example components. The SDK can be used by a web application to query the repository for web components, based on metadata including RDFS classes discovered in a pod, and then render these web components using the data.

The reference implementation is designed to support the following process (see Figure 4.4):

  1. A user uses a Solid app (e.g. a data browser) to access data stored in their Solid pod.
  2. Using the SDK, the Solid app discovers the shape of the user’s data.
  3. The app queries the SemCom repository for components compatible with this shape. If fitting components are found, the repository returns their metadata to the app.
  4. The app decides which component to use, fetches its code, and loads it dynamically.
  5. The app initializes an instance of the component.
  6. The component requests data from the app with a read event.
  7. To provide the data, the app can fetch it from the user's pod.
  8. The app then sends a response event containing the requested data.
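Steps 6 to 8 can be sketched in TypeScript as a read/response round trip. In this simplified, hypothetical version the component calls the app directly instead of dispatching events, and the pod is abstracted as a PodFetch function so that the sketch runs offline; a real app would use an authenticated Solid fetch. The event field names are illustrative.

```typescript
// Hypothetical event shapes for steps 6 to 8.
interface ReadEvent {
  type: "read";
  uri: string;
}

interface ResponseEvent {
  type: "response";
  uri: string;
  data: string;
}

// The pod is abstracted as a function; a real app would use an
// authenticated Solid fetch here.
type PodFetch = (uri: string) => Promise<string>;

class HostApp {
  private readonly fetchFromPod: PodFetch;

  constructor(fetchFromPod: PodFetch) {
    this.fetchFromPod = fetchFromPod;
  }

  // Steps 7 and 8: fetch the requested data and answer with a response event.
  async handle(event: ReadEvent): Promise<ResponseEvent> {
    const data = await this.fetchFromPod(event.uri);
    return { type: "response", uri: event.uri, data };
  }
}

// Step 6, component side: issue a read event and wait for the response.
async function requestData(app: HostApp, uri: string): Promise<string> {
  const reply = await app.handle({ type: "read", uri });
  return reply.data;
}

const podContents = new Map<string, string>([["https://pod.example/me/location", "50.85,4.35"]]);
const hostApp = new HostApp(async (uri) => podContents.get(uri) ?? "");
requestData(hostApp, "https://pod.example/me/location").then((data) => console.log(data));
```

Because the component only ever asks the app for data, it never needs the user's credentials or knowledge of where the pod is located.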

Figure 4.4: A sequence diagram of the SemCom workflow.

5. Discussion

Given the reference implementation of a registry for semantic components put forth in Section 4, we can now compare it with the requirements listed in Section 2, which leads to a number of open questions and possible alternative approaches.

First, we chose to rely on RDFS classes to detect the kind of data present in a pod, rather than on some other kind of metadata or a pattern-matching approach. Our rationale is to stay as close as possible to the current state of the Solid specification, in which shapes (e.g. ShaCL or ShEx) and Shape Trees are not yet fully incorporated. Future changes or additions to the specification may warrant adding other shape- or class-based discovery mechanisms. A pattern-matching approach, on the other hand, would be very useful in scenarios where other metadata is lacking. However, such an approach would require a great deal of attention to the pattern-matching algorithm itself, which would have led us too far from the main goal of this project. A pattern-matching alternative therefore remains an interesting subject for future research.

Second, the interface of the components in the reference implementation, as well as the way they are dynamically imported and created using the available DOM functionality, is heavily influenced by our choice of a Web Components-based SDK aimed at browser applications. This remains the most obvious choice, given that the Solid ecosystem is first and foremost a bundle of web technologies and that the vast majority of first-wave Solid apps are therefore browser applications. Nevertheless, we are confident that the level of abstraction employed in the SemCom specification is high enough to allow for easy adoption on other platforms and in other frameworks.

Third, while the component code in the reference implementation is fetched in a fully decentralised way, since the registry only stores metadata, the decentralisation of the metadata itself relies on the nodes synchronising with one another. While this does not pose a scalability problem, since only metadata is stored and it takes up very little space, a better alternative might be to let the nodes query each other's endpoints, as proposed in the specification.

Last, we want to point out a practical characteristic of the reference implementation, which suggests some avenues for further research and desirable additions to the SemCom specification. When a Solid app using the SDK is loaded multiple times on the same (or similar) data, different compatible components may be used to render the data across loads. One reason for this behavior can be that a new version of a component was registered with the repository with slightly different metadata tags, either in the shape tree metadata or in the other preferences (e.g. optimal display size). Another reason could be a small change in the shape tree sent by the user's pod, for which the repository might consider a different component better suited, even though the component used during an earlier load might still be compatible. If not taken into account, such behavior could result in an erratic user experience. While it would definitely be interesting to look into possible mitigations on the side of the repository, the most suitable way to handle this is probably client-specific. Some clients might, for example, want to fix a specific component on the first load, while others may want to offer the choice to the user (e.g. saved as a preference). Such choices would be a good candidate for a future extension of the SDK.
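As a sketch of the first client-side option, the hypothetical TypeScript helper below pins the component chosen on the first load for a given shape and keeps using it as long as the registry still lists it as compatible, falling back to the registry's best-ranked candidate otherwise. The PinStore map stands in for whatever persistent storage a client would use (e.g. a saved user preference), and the file names are illustrative.

```typescript
// Hypothetical preference store: shape -> pinned component URI. A real client
// might persist this, e.g. as a user preference.
type PinStore = Map<string, string>;

// Reuses the pinned component while it is still among the compatible
// candidates; otherwise takes the registry's best-ranked candidate and pins it.
function chooseStable(shape: string, candidates: string[], pins: PinStore): string | undefined {
  const pinned = pins.get(shape);
  if (pinned !== undefined && candidates.includes(pinned)) {
    return pinned;
  }
  const choice = candidates.length > 0 ? candidates[0] : undefined;
  if (choice !== undefined) {
    pins.set(shape, choice);
  }
  return choice;
}

const pins: PinStore = new Map();
const LOC = "https://example.org/shapes#Location";
console.log(chooseStable(LOC, ["map.js", "table.js"], pins)); // first load: map.js is pinned
console.log(chooseStable(LOC, ["table.js", "map.js"], pins)); // re-ranked, but map.js is kept
console.log(chooseStable(LOC, ["table.js"], pins));           // map.js no longer compatible
```

Because the pinned component is only reused while it remains compatible, stability never overrides the registry's compatibility check.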

6. Conclusion

Using a design science methodology, we have addressed the problem that applications consuming data from Solid pods cannot know in advance all the kinds of data those pods may contain, and therefore cannot be built to render them all. Based on three requirements (shape discovery, component negotiation and dynamic binding), we designed SemCom, a specification of a decentralised registry of linked data shapes and semantic components. This specification prescribes how Solid apps can discover the kinds of data present in a pod; use this information as metadata to query a decentralised registry for compatible components; fetch such a component from a decentralised host; dynamically import it into the runtime; and use it to render the data from the pod. We also provided a reference implementation of this specification, consisting of a Node.js repository and an SDK for browser applications that leverages the existing Web Components standards. In the discussion, we defended some of our design choices, critically assessed the practical behavior of the reference implementation, and suggested a number of aspects for further research.

References

Berners-Lee, T. (2019, April 26). Linked Data Shapes, Forms and Footprints. W3C Design Issues. https://www.w3.org/DesignIssues/Footprints.html 

Bit. (n.d.). The Platform for the Modular Web. https://bit.dev 

Bradner, S. (1997). Key words for use in RFCs to Indicate Requirement Levels. RFC 2119, BCP 14, doi:10.17487/RFC2119. https://www.rfc-editor.org/info/rfc2119 

Brickley, D., & Guha, R.V. (2014, February 25). RDF Schema 1.1. https://www.w3.org/TR/2014/REC-rdf-schema-20140225 

Capadisli, S., Berners-Lee, T., Verborgh, R., Kjernsmo, K., Bingham, J., & Zagidulin, D. (Eds.). (2021, June 7). Solid Protocol. https://solidproject.org/TR/protocol 

Cyganiak, R., Wood, D., & Lanthaler, M. (Eds.). (2014, February 25). RDF 1.1 Concepts and Abstract Syntax. https://www.w3.org/TR/rdf11-concepts 

Digita. (2021a, August 16). Semantic Components Specification. https://docs.develop.digita.ai/semcom/specification 

Digita. (2021b, August 16). Semantic Components Software Development Kit. https://docs.develop.digita.ai/semcom/sdk 

Gregor, S., & Hevner, A. R. (2013). Positioning and Presenting Design Science Research for Maximum Impact. MIS Quarterly, 37(2), 337–355.

Hevner, A. R., March, S. T., Park, J., & Ram, S. (2004). Design Science in Information Systems Research. MIS Quarterly, 28(1), 75–105.

Knublauch, H., & Kontokostas, D. (Eds.). (2017, July 20). Shapes Constraint Language (ShaCL). https://www.w3.org/TR/2017/REC-shacl-20170720 

OpenComponents. (n.d.). Painless micro frontends delivery. https://opencomponents.github.io 

Peffers, K., Tuunanen, T., Rothenberger, M. A., & Chatterjee, S. (2007). A design science research methodology for information systems research. Journal of Management Information Systems, 24(3), 45–77.

Prud'hommeaux, E., & Bingham, J. (Eds.). (2021, August 17). Shape Trees Specification. https://shapetrees.org/TR/specification 

Prud'hommeaux, E., Boneva, I., Labra Gayo, J. E., & Kellogg, G. (Eds.). (2019, October 8). Shape Expressions Language 2.1. https://shex.io/shex-semantics 

Sporny, M., Longley, D., Kellogg, G., Lanthaler, M., Champin, P.-A., & Lindström, N. (2020, July 16). JSON-LD 1.1. https://www.w3.org/TR/json-ld11 

The W3C SPARQL Working Group (Ed.). (2013, March 21). SPARQL 1.1 Overview. https://www.w3.org/TR/sparql11-overview 

Verborgh, R. (2019, June 17). Shaping Linked Data Apps. https://ruben.verborgh.org/blog/2019/06/17/shaping-linked-data-apps 

Digita BV - TOPOS Office Center - Breydelstraat 34-40 - 1040 Brussels
VAT: BE 0705 969 661 - RPR: Brussels - Bank: BE 37 7360 4900 0828


[1] After some consideration, we downgraded the endpoint for SPARQL requests (The W3C SPARQL Working Group, 2013) from a MUST to a MAY requirement, in line with a similar performance-based decision in the Solid specification.

[2] Our choice of JSON-LD (Sporny, Longley, Kellogg, Lanthaler, Champin, & Lindström, 2020) as the default aligns with the community's preference for a developer-friendly way of serializing RDF. Similar decisions are being made in the current drafts of the Solid specification at the time of writing.