Understanding the lifecycle, management, and integration of research data across platforms.

The highly fragmented nature of FCT research acts as a significant barrier to the development of an FCT Research Data Ecosystem (RDE). The current section investigates the challenges for an FCT RDE, as well as the recommendations that can be adopted to minimise their detrimental effects. Based on a review of literature and documentation and on expert consultations, this study outlines eight core FCT RDE challenges:

1.    FCT-Specific Data Standardisation: dearth of standards specific to FCT and FCT research.

2.    Technological Requirements for FCT Research: high heterogeneity of tools and lack of specialised capabilities.

3.    Importance of an Interoperable Data Infrastructure: need for the better interoperability of data spaces.

4.    Legal Policies and Regulation: handling a plethora of EU and national laws with complex application.

5.    Strengthening Cross-Stakeholder Trust and Enabling Transparency: low trust amongst stakeholders, driven by a lack of transparency.

6.    Achieving Appropriate Data Control: absence of a common data management and governance framework.

7.    Data Quality Assurance: need for reliable data quality assessment and verification tools; unclear data ownership.

8.    Intra-Organisational Issues: need for highly trained and experienced personnel.

The lack of FCT-specific standardisation in data collection, storage, and analysis increases the risk of inconsistencies and errors in the data and, subsequently, in research findings. A current issue is the management and accommodation of the vast spectrum of data formats used in FCT research: the multitude of formats often requires a different set of tools and solutions for sharing, processing, and storage. This also hinders the interoperable processing of data from multiple sources. Reportedly, another issue is the lack of pressure on organisations to comply with relevant standards, which is compounded by the absence of FCT-specific standards and challenges the development and deployment of a successful FCT RDE. These aspects mark the pressing need for shared pan-European FCT data standards, regulations, and processes. The development of single-source search capabilities, paired with standardised FCT data and metadata vocabularies, can increase the findability of resources. Furthermore, a clear data governance framework needs to be created and applied to support the establishment of clear rules and procedures for FCT data sharing.

FCT research requires a wide range of technological solutions and capabilities, which are often costly, especially if they are FCT-specific. As mentioned above, the multitude of data formats used in FCT research creates heterogeneity in the required tools, technical requirements, and knowledge. Furthermore, the current use of disparate data platforms and formats hinders efficient data sharing and analysis, making these processes time-consuming. The FCT research domain will benefit from the development of new, user-friendly, interoperable, and widely available technologies, paired with automated AI capabilities that assist the analysis, visualisation, and interpretation of data. It is crucial to introduce domain-wide standardisation of data collection, storage, and processing. Moreover, the development and use of a common, secure European cloud storage infrastructure will enable the handling of large amounts of data in a secure manner.

Interoperability is a vital prerequisite for the successful implementation of an FCT RDE. Presently, the multiple sets of non-specific standards applied by a limited number of FCT stakeholders limit the national and EU-level interoperability of data. Thus, the central data space components of a successful FCT RDE need to be interoperable with national systems and cross-border processes. Additionally, the standardisation of data formats and metadata and the interoperability of data platforms are crucial for FCT data sharing.

FCT research and all data practices within it need to adhere to legal policies and regulations, but the present abundance of regulations presents a challenge for the domain. Currently, organisations hesitate to share data due to the complexity of the GDPR and its openness to interpretation. Furthermore, a substantial amount of FCT research is conducted in and through third countries, which requires researchers and organisations to be aware of differences in, and changes to, the specific regulations that impact their work, both in their own country and in partner countries. To overcome these challenges, FCT-specific policies and procedures, defining data management practices, data sharing agreements, security measures, and data access and use guidelines, need to be implemented and shared in a clear and accessible way.

The FCT research domain faces high levels of cross-stakeholder distrust, which is linked to concerns about potential data misuse and to data quality uncertainties. The reluctance to share sensitive data is further fuelled by security concerns. For FCT research, it is crucial to have trustworthy partnerships, built on transparency, mutual respect, and communication. Trust can also be reinforced through the implementation of clear legal and policy guidelines and protocols. Furthermore, citizens need to be treated as rightful FCT research stakeholders and engaged with regularly to increase public understanding of FCT, while ensuring transparency about how their data is used.

Thorough data control is paramount in an FCT RDE. Data control is a complex issue that encompasses a wide range of aspects, including levels of data access for different stakeholders, application of data protection practices, data and metadata standardisation, tool applicability, and the implementation of policies and regulations. There is currently a heavy reliance on the GDPR as interpreted and applied by each EU member state, but also a reported lack of knowledge on how the GDPR impacts FCT data sharing, which creates hesitancy and reinforces silos. Thus, FCT data policies and governance frameworks are required to ensure that data is managed securely and that research is conducted ethically, legally, and in line with all relevant regulations.

Data quality is of utmost importance for FCT research: poor data quality can lead to erroneous conclusions and wasted resources, which in turn breeds reluctance to share and request data. The lack of widely applied standardised practices and guidelines highlights the importance of implementing more reliable data quality assessment and verification tools. An additional issue is the lack of clarity regarding the rights and responsibilities of contributing organisations, which makes it even more difficult to establish who is responsible for data quality assurance. To overcome this challenge, new methods for data quality assurance need to be developed and implemented.

The FCT research domain faces a number of intra-organisational issues, including non-standardised vocabularies, a lack of skilled personnel, costly equipment and tools, and insufficient funding. The use of disparate languages and vocabularies across and within organisations hampers data re-use and sharing and highlights the need for the definition and use of a standardised language. The domain also faces a shortage of skilled personnel, highlighting the need for better education and training, specifically on data security best practices and on the responsibilities of the different roles in maintaining data security. Moreover, the domain requires a wide spectrum of personnel with specific skills, including expertise in data science, machine learning, statistical analysis, and cybersecurity, among others.

The issues surrounding the successful development and sustainability of a European FCT RDE are complex and interlinked. The current section has provided an overview of the FCT RDE challenges and the recommended actions to address them, acting as an important foundation for the successful implementation and long-term use of the FCT RDE.



Skill Level: Beginner

Onboarding of a new Participant

A new Participant requests credentials for the Research Data Ecosystem using the Participant Connector. The request is received by the Issuer Connector, so that the Issuer can view and validate it. Once the request is validated and the credentials are issued, the Participant is able to operate in the ecosystem.

Upload of a dataset on the Connector and metadata documentation

A Participant can upload a new dataset locally inside the Participant Connector. At this stage the dataset is not disclosed; it remains visible only to the user accounts registered on the Participant Connector.

The Participant can document the dataset by providing metadata related to provenance, legal, ethical, and privacy aspects, the structure of the dataset, multiple formats of the same dataset if necessary, etc.

Publication of metadata on the Catalogue

Once the metadata are finalised, the Participant can publish them on the Catalogue, where they become visible to other participants.

12.2       Onboarding of a new Participant

Let’s consider two Participants that want to share data. As previously mentioned, they need dedicated Connectors running on properly configured nodes. To avoid data misuse or malicious behaviour, participants must prove to each other that they are trustworthy; they need something that certifies trust. No organisation can take part in the RDE without undergoing an approval process. Indeed, to provide a high level of trust between Participants, any organisation must first be checked and then approved to be part of the RDE.

Onboarding consists of:

1.    verifying the identity of a new Participant

2.    providing the verified Participant with a proof (i.e. credentials) certifying that it has been verified as trusted and can take part in the RDE

3.    registering the Participant’s information in the publicly available Catalogue, so that it is searchable by other participants.


Figure 7: Onboarding

As shown in Figure 7, the Onboarding process starts when a new Participant wants to take part in the RDE. To achieve that, the Participant must send a request through its Connector to an Issuer, which acts as a trusted authority of the RDE. The request will include information about the Participant, as well as supporting files (documentation about the organisation, certifications, etc.).

When a new request is received, the Issuer must perform the required checks to assess the veracity of the Participant’s claims. These checks consist of an offline verification of the trustworthiness of the requesting Participant (e.g. work conducted by the organisation, no legal infringements, certifications, purpose of the participation in the RDE, etc.). If this verification is successful, the request can be approved. The Issuer then generates and signs a Verifiable Credential (VC) and sends it back to the Participant. The VC is also registered on the Ledger for later verification.

Once the VC is obtained (e.g. by email), the Participant stores it on the Connector. The VC will be shown to other participants when requested (like a passport). Specifically, before any interaction between two participants A and B, they (or rather their Connectors) will ask each other for their credentials, signed by A and B to prove their identities (Figure 8). They can then validate each other’s credentials by verifying the signature of the Participant, the signature of the Issuer, and the validity status of the credentials. In this way, a Participant can be sure that the other Participant has been deemed trustworthy by an Issuer, and interactions can occur.


Figure 8: Verifiable Credential verification
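The sign-and-verify exchange above can be sketched in a few lines of code. The following is a minimal illustration, assuming Ed25519 signatures and hypothetical identifiers and field names; the RDE’s actual credential schema, key management, and Ledger anchoring are more involved.

```python
import json

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

# Issuer key pair; in the RDE the Issuer's public key would be resolvable
# via the Ledger rather than held in a local variable.
issuer_key = Ed25519PrivateKey.generate()
issuer_pub = issuer_key.public_key()

# 1. The Issuer approves the onboarding request and signs a credential.
credential = {
    "issuer": "did:example:rde-issuer",        # hypothetical identifier
    "subject": "did:example:participant-a",    # hypothetical identifier
    "claim": "verified RDE participant",
}
payload = json.dumps(credential, sort_keys=True).encode()
signature = issuer_key.sign(payload)

# 2. Before an interaction, the peer's Connector checks the Issuer's
#    signature (and, in the full flow, the validity status on the Ledger).
try:
    issuer_pub.verify(signature, payload)
    print("credential valid: interaction may proceed")
except InvalidSignature:
    print("credential invalid: reject interaction")
```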

 

12.3       Upload of a dataset on the Connector and metadata documentation

To make datasets discoverable and searchable in the ecosystem, it is vital to collect metadata about the available datasets in the Catalogue, while the datasets themselves remain stored on Participant premises (in compliance with data sovereignty and decentralisation principles).

Once the Participant is registered in the RDE, it can start sharing its datasets. When a dataset is ready to be shared, the Participant can upload it to the Connector (local submission). The local submission of a dataset to the Connector does not make it immediately available in the RDE. To make a dataset available and searchable by other participants of the RDE, the data provider must publish the metadata of the dataset on the Catalogue. The local submission step is essential to allow the Participant to define and enrich the metadata of a dataset before their publication on the Catalogue. When the metadata are finalised, the data provider can publish them on the Catalogue.
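As an illustration, a locally submitted dataset might be documented with a metadata record like the sketch below. The field names loosely follow common catalogue vocabularies (e.g. DCAT) and are assumptions; the RDE’s actual metadata schema may differ.

```python
# Illustrative metadata record for a locally submitted dataset.
dataset_metadata = {
    "title": "Example incident reports 2020-2023",
    "description": "Anonymised incident reports collected by Participant A.",
    "publisher": "did:example:participant-a",   # hypothetical identifier
    "provenance": "Collected under project X; anonymised on ingestion.",
    "legal": {"basis": "GDPR Art. 6(1)(e)", "access": "on request"},
    "distributions": [                          # multiple formats of one dataset
        {"format": "text/csv", "byteSize": 10485760},
        {"format": "application/parquet", "byteSize": 4194304},
    ],
}
```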

12.4       Publication of metadata on the Catalogue

This stage consists of advertising a dataset in the ecosystem by publishing its metadata on the Catalogue. From this moment on, any other participant in the ecosystem can search and visualise the metadata published on the Catalogue, as well as send requests to the data provider to access the dataset.

If needed, the Participant can delist the metadata, update them, and re-publish them on the Catalogue, so as to keep dataset information up to date for the research community.
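A hedged sketch of what this publish/update/delist cycle could look like from a Connector, assuming a simple REST-style Catalogue API; the base URL, endpoint paths, and payloads are illustrative, not the actual Catalogue interface.

```python
import requests

CATALOGUE = "https://catalogue.example.org"  # hypothetical base URL

dataset_metadata = {"title": "Example incident reports", "publisher": "participant-a"}

# Publish: the metadata entry becomes searchable by all participants.
resp = requests.post(f"{CATALOGUE}/datasets", json=dataset_metadata)
dataset_id = resp.json()["id"]

# Update: revise the metadata and re-publish under the same identifier.
dataset_metadata["description"] = "Updated description."
requests.put(f"{CATALOGUE}/datasets/{dataset_id}", json=dataset_metadata)

# Delist: remove the entry so it no longer appears in catalogue searches.
requests.delete(f"{CATALOGUE}/datasets/{dataset_id}")
```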



Skill Level: Beginner

In the European Union (EU), the type of license you use to share data depends on whether the data is personal or non-personal. In addition, choosing between an individual license file and an open license for exchanging non-personal data depends on the requirements and considerations of the parties involved. It is important to note that the license file or data processing agreement (DPA) should be tailored to the specific dataset or model being shared.

 

Figure 21: Options for creating a license, depending on the type of data and the requirements on the dataset.

 

The creation of a license file relies on the data provider's responses to a series of tailored inquiries. These responses are then used, alongside the mitigation actions, to generate the license. Depending on the type of data and any additional constraints from the data provider, the software enables the creation of different types of licenses.

In the EU, the handling of personal data must follow the GDPR [] rules. Hence, if the data is classified as personal data, the license file needs to include GDPR compliance; this kind of agreement is called a data processing agreement (DPA). Complying with data privacy regulations is crucial for any organisation dealing with personal data, and possessing a valid license file is vital to minimising possible legal consequences in the future. If GDPR regulations apply, contract drafting is significantly constrained, as the requirements of the GDPR [] must be adhered to. In particular, the security measures at the data processor's end should align with the current state of technology and consider the sensitivity of the data. A generic description of technical and organisational measures is not sufficient; a concrete description must be included in the DPA.

If the data does not involve personal information and there are no other usage restrictions, the module recommends using an open license such as one of the Creative Commons licenses. If the non-personal data is classified as a type that requires individually negotiated contracts, the license file should reflect this: there could be restrictions on usage, such as a commercial-use restriction (so that the data may not be used for any commercial purposes) or confidentiality obligations if the data contains sensitive information. In this case the module generates a proprietary license agreement.
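The branching just described can be summarised in a small decision function. This is a sketch of the logic only; the function name and inputs are illustrative, not the module’s actual API.

```python
# Personal data -> DPA; restricted non-personal data -> proprietary
# agreement; unrestricted non-personal data -> open license.
def recommend_license(is_personal: bool, has_usage_restrictions: bool) -> str:
    if is_personal:
        # GDPR applies: the output must be a data processing agreement (DPA)
        # with concrete technical and organisational measures.
        return "Data Processing Agreement (GDPR-compliant)"
    if has_usage_restrictions:
        # e.g. commercial-use restriction or confidentiality obligations.
        return "Proprietary license agreement (individually negotiated)"
    return "Open license (e.g. Creative Commons)"

print(recommend_license(is_personal=False, has_usage_restrictions=True))
```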

 

Tool Description 


Figure 22. Start page of the tool 

 

On the start page of the tool (Figure 22) there are two parts: one for the Risk Mitigation File and one for License Generation. In the first part, a user can choose to upload a risk mitigation file (a JSON-based file) created during the risk mitigation analysis. By pressing the Open File button, the results from the file are automatically processed, properly formatted, and shown as a set of Risk Factor – Action pairs in a new tab. The results from this file can be used as an aid for answering the questions in the second part of the tool, i.e. License Generation. Figure 23 shows an example of the results from the risk mitigation file.


Figure 23. An example of the Risk Factor – Action pairs from the risk mitigation file analysis 
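For illustration, reading such a file and printing its Risk Factor – Action pairs could look like the sketch below, assuming the file is a JSON list of objects with risk_factor and action fields; the real file name and layout are defined by the risk mitigation analysis.

```python
import json

# Hypothetical file name; produced by the risk mitigation analysis.
with open("risk_mitigation.json") as f:
    entries = json.load(f)   # assumed: a list of {"risk_factor", "action"}

# Display each pair, as the tool does after "Open File" is pressed.
for entry in entries:
    print(f"{entry['risk_factor']}  ->  {entry['action']}")
```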

 

To create a license document, a user can choose one of the two links in the License Generation part of the start page. One link is for license generation involving personal data, and the other is for non-personal data. Figure 24 shows the page for personal data, opened by clicking the corresponding link.

 

Figure 24. Personal data license generation page 

 

To generate a license document, a user needs to answer a set of questions. For some questions that require text input, suggestions are provided to help the user formulate the answers. A user can leave the suggested text as it is, in which case it will appear verbatim in the output license document, or modify it. For some questions, additional explanatory text is provided under an asterisk (*) to help the user with the question or its parts. By clicking the Submit button at the end of the page, a new page with the license text is created, with the possibility to download the license text as a .txt file (via the Download button at the end of the page). An example of the license text generated for the personal data case is shown in Figure 25.


Figure 25. An example of the license text for the personal data 
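Conceptually, generating the license text amounts to filling a template with the user’s answers and writing the result to a .txt file. The sketch below is an assumption about how this could work; the template wording, placeholder names, and answer values are illustrative only.

```python
# Illustrative template; the actual wording of the generated
# agreements is defined by the tool.
TEMPLATE = (
    "DATA PROCESSING AGREEMENT\n"
    "Controller: {controller}\n"
    "Processor: {processor}\n"
    "Purpose of processing: {purpose}\n"
    "Technical and organisational measures: {measures}\n"
)

# Answers collected from the question form (hypothetical values).
answers = {
    "controller": "Participant A",
    "processor": "Participant B",
    "purpose": "Research on anonymised incident data",
    "measures": "Encryption at rest and in transit; role-based access control",
}

license_text = TEMPLATE.format(**answers)
with open("license.txt", "w") as f:   # the tool's Download step
    f.write(license_text)
```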

 

For non-personal data, there is also a set of questions to be answered, some of which are the same as for personal data. Figure 26 shows the page for non-personal data. The procedure for generating the license document (.txt file) is the same as for personal data. Figure 27 shows an example of the generated license text for non-personal data.


Figure 26. Non-personal data license generation page 

Figure 27. An example of the license text for the non-personal data 




Skill Level: Beginner

In this course, you will deepen your understanding of the proposed architecture for the Research Data Ecosystem (RDE), focusing on its core elements, processes, and technological components. The course will explore how this architecture enables data access and sharing while emphasizing principles like decentralization, data sovereignty, and interoperability. Participants will also experience live demos of key components, showcasing the system's functionality in action.

Skill Level: Beginner

The “Establishing Governance in the Data Ecosystem” course explores the governance model of the LAGO project, aimed at facilitating secure and reliable data sharing through trust and policy adherence. Operating within a federated environment with decentralised data storage, the course outlines a two-phase workflow: data provisioning and publication, followed by request and transfer. Participants will learn about governance functions within the Connector component, including identity and catalogue management to ensure data integrity and accessibility. The course also highlights specific nodes, such as the Federator and the Federated Catalogue, which support federation maintenance and data description storage.

Skill Level: Beginner