Open and FAIR data assets
Agricultural research is no longer driven only by hypothesis-based science. With the advent of powerful data capabilities, it now also encompasses a predictive, empirical method that operates over large data pools to discern patterns rapidly and with agility. To take advantage of these approaches, CGIAR is committed to well-described, machine-interpretable, openly available data that are highly findable, accessible, interoperable, and reusable (FAIR). CGIAR Centers are committed to making their datasets available on institutional repositories that are FAIR-compliant, and through the work of the Big Data Platform, CGIAR made strong progress toward making its data assets open and FAIR in 2020.
The GARDIAN data ecosystem and knowledge base grew to about 170,000 publications and 27,000 datasets, increases of 10% and 17%, respectively, from 2019. GARDIAN is a one-stop shop that allows users to find, visualize, and map data generated by CGIAR and its partners.
In 2020, the Big Data Platform built awareness around the need for open and FAIR assets, and developed data standards, tooling, and services to make it easy for researchers to collect data born-FAIR or to make legacy data FAIR — now accessible via a toolkit feature in GARDIAN. Findable datasets across CGIAR increased by about 20% between 2019 and 2020, and there was a 60% increase in open data over that period. In general, the amount of open data has been increasing since 2017, when the Platform came into operation.
This change is likely attributable to the Platform’s cross-CGIAR efforts on awareness building and capacity enhancement, backed by its seven domain-specific Communities of Practice, which fostered recognition that data are valuable CGIAR assets and must be well stewarded. To highlight the importance of open and FAIR data, the Platform continued to improve data science and analytics capabilities in 2020, developing data pipelines to popular crop models in the Collaborative GARDIAN Labs (CG Labs) analytical environment.
The spatial visualization and querying features in CG Labs were also enhanced in 2020, allowing users to specify desired parameters on very large, visualizable datasets (for example, the CMIP6 climate forecast dataset and the ISRIC global soil dataset). These innovation-focused efforts are critical to the development of high-value data products that help answer such questions as where fertilizer use might be most profitable and environmentally friendly in Africa. Further, they demonstrate the possibilities of open, actionable data, and facilitate CGIAR’s digital transformation.