PLAZA & FAIR Data
What is FAIR data?
FAIR data is data that adheres to a set of guiding principles, in order to make it possible for data providers to easily:
- Find data and data sources
- Access data and data sources
- Interoperate between data providers
- Reuse data
These principles are further clarified and standardized by the FAIR working groups.
How is PLAZA FAIR?
The developers of the PLAZA platform are dedicated to making the PLAZA platform compatible with the FAIR guiding principles.
This is a long-term goal for the PLAZA project, and with each software iteration we are one step closer to being fully FAIR compliant.
The data is findable: we currently offer a warehouse service that provides, for versions 3.0 and later, JSON objects detailing where the data is located. This service works both per PLAZA instance and globally, by aggregating the data for all PLAZA instances.
(meta)Data are assigned a globally unique and eternally persistent identifier.
Work in progress, pending the identification of all dependencies. For now, each JSON object is uniquely identified by the URL it is served from.
Data are described with rich metadata.
Current JSON objects have the necessary meta-data within them, and additional documentation is available.
(meta)Data are registered or indexed in a searchable resource.
Work in progress within the ELIXIR framework.
Metadata specify the data identifier.
Data identifiers are logically constructed from the core meta-data terms.
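As a rough illustration of the findability points above, the following sketch shows how a client might read a warehouse-style JSON object and locate files in it. The layout is an assumption made for this example: the field names (`instance`, `version`, `files`, `type`, `url`), the instance name, and the example.org URLs are hypothetical and do not describe the actual PLAZA warehouse schema.

```python
import json

# Hypothetical warehouse response; the real PLAZA schema may differ.
EXAMPLE_RESPONSE = """
{
  "instance": "example_instance",
  "version": "3.0",
  "files": [
    {"type": "fasta", "url": "ftp://example.org/plaza/example.fasta"},
    {"type": "gff", "url": "ftp://example.org/plaza/example.gff"}
  ]
}
"""


def index_by_url(raw_json):
    """Return a dict mapping each file URL to its metadata entry.

    Using the URL as the unique key is an assumption for illustration.
    """
    listing = json.loads(raw_json)
    index = {}
    for entry in listing["files"]:
        # Copy instance-level metadata into each entry so it is self-describing.
        record = dict(entry, instance=listing["instance"], version=listing["version"])
        index[entry["url"]] = record
    return index


def find_files(index, file_type):
    """Return the sorted URLs of all files of the given type."""
    return sorted(url for url, meta in index.items() if meta["type"] == file_type)


index = index_by_url(EXAMPLE_RESPONSE)
print(find_files(index, "gff"))
```

Keying the index on the URL keeps lookups simple while no separate persistent identifier exists; a client could switch to such an identifier later without changing the rest of the code.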
The data is accessible: both the PLAZA warehouse and the PLAZA gene-centric REST API operate over the standard HTTP(S) protocol and use standard JSON formats to exchange data.
(meta)Data are retrievable by their identifier using a standardized communications protocol.
The protocol is open, free, and universally implementable.
The protocol allows for an authentication and authorization procedure, where necessary.
Metadata are accessible, even when the data are no longer available.
For the warehouse data: the JSON objects (meta-data) describing and listing the data (on the FTP servers) will continue to be available as long as the PLAZA instance(s) are available, even if the data on the FTP servers is no longer available.
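The accessibility points above can be sketched as a small client that retrieves a JSON metadata object over HTTP(S) and then checks whether the data file it describes is still reachable. This is only a sketch under stated assumptions: the `url` field name is hypothetical, and the `opener` parameter is an illustrative hook (defaulting to the standard library's `urllib.request.urlopen`) that allows the functions to be exercised without network access.

```python
import json
import urllib.request


def fetch_metadata(url, opener=urllib.request.urlopen):
    """Fetch a JSON metadata object over HTTP(S) and decode it."""
    with opener(url) as response:
        return json.loads(response.read().decode("utf-8"))


def data_status(entry, opener=urllib.request.urlopen):
    """Report whether the data file behind a metadata entry is reachable.

    The metadata entry stays usable either way; only the status differs.
    The "url" key is a hypothetical field name.
    """
    try:
        with opener(entry["url"]):
            return "available"
    except OSError:
        # urllib's URLError and HTTPError are subclasses of OSError.
        return "metadata only"
```

A client would call `fetch_metadata` with the address of a warehouse JSON object and pass each listed entry to `data_status`; an entry reported as "metadata only" still carries its description even though the file itself can no longer be downloaded.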
(meta)Data use a formal, accessible, shared, and broadly applicable language for knowledge representation.
Currently the meta-data is not yet fully formalized, but work is in progress to make further use of ontologies to improve this.
(meta)Data use vocabularies that follow FAIR principles.
Once standard ontologies are in use, this will be the case.
(meta)Data include qualified references to other (meta)data.
Work in progress.
The data is reusable. The goal of the PLAZA warehouse is to provide the PLAZA data to other frameworks (e.g. Galaxy) in order to speed up the deployment of new tools and data within those frameworks. The meta-data is standardized across all PLAZA instances so that the service can be reused by other frameworks.
(meta)Data have a plurality of accurate and relevant attributes.
Implemented. By making use of accurate and relevant meta-data, the requested data is quickly identified. Furthermore, each data file contains the necessary attributes to identify the content per file.
(meta)Data are released with a clear and accessible data usage license.
View the PLAZA data license.
(meta)Data are associated with their provenance.
We provide the data per PLAZA instance, and within each PLAZA instance we keep track of the genome versions that are used to produce the data. Furthermore, we intend to track additional information, such as the software versions used and the dates/versions of external files (e.g. OBO files for Gene Ontology), in order to improve provenance tracking.
(meta)Data meet domain-relevant community standards.
Implemented. The data is offered in the standard formats expected by the community (e.g. FASTA for sequence files, GFF for annotation files, TAB-delimited for other data files). Meta-data exchange is through JSON.
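Because the files follow community formats, they can be read with very little code. The sketch below is a minimal illustration, not part of PLAZA itself: it parses FASTA text into a dictionary and splits one GFF feature line into its nine standard tab-separated columns. The function names and the sample records are made up for this example.

```python
def parse_fasta(text):
    """Parse FASTA text into a dict of {sequence_id: sequence}."""
    sequences = {}
    current_id = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The identifier is the first whitespace-separated token of the header.
            tokens = line[1:].split()
            current_id = tokens[0] if tokens else ""
            sequences[current_id] = ""
        elif current_id is not None:
            # Sequence lines are concatenated until the next header.
            sequences[current_id] += line
    return sequences


# The nine columns of a GFF feature line, in order.
GFF_COLUMNS = ("seqid", "source", "type", "start", "end",
               "score", "strand", "phase", "attributes")


def parse_gff_line(line):
    """Split one GFF feature line into a dict keyed by column name."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(GFF_COLUMNS):
        raise ValueError(f"expected 9 tab-separated columns, got {len(fields)}")
    record = dict(zip(GFF_COLUMNS, fields))
    # Coordinates are integers; the other columns are kept as strings.
    record["start"] = int(record["start"])
    record["end"] = int(record["end"])
    return record


# Made-up sample records, for illustration only.
print(parse_fasta(">geneA example\nATG\nCCC\n"))
print(parse_gff_line("chr1\texample\tgene\t10\t50\t.\t+\t.\tID=geneA"))
```

For production use, an established parser (for example from Biopython) is usually a better choice than hand-written code; the sketch only shows that standard formats keep the barrier to reuse low.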