BioFAIR-Funded Pathfinder Projects

The BioFAIR Pathfinder Projects represent an £800,000 investment across nine short-term initiatives designed to tackle real-world bottlenecks in making UK life science data Findable, Accessible, Interoperable, and Reusable (FAIR) and AI-ready. These pathfinders serve as our testing ground to diagnose specific friction points, prototype scalable solutions, and develop shared standards within their communities.

By solving practical interoperability and cultural challenges today, these projects will generate proven successes that feed directly into BioFAIR, ensuring our emerging national data infrastructure is built on community-led evidence rather than assumptions.

Creating AI-enabled analysis pipelines for FAIR neuroscience data and models

Project Lead: Padraig Gleeson & Ankur Sinha — University College London

Contact: p.gleeson@ucl.ac.uk & ankur.sinha@ucl.ac.uk

The analysis of experimental data and its interpretation through computational models are central to modern neuroscience, yet the reuse of the field’s rapidly growing datasets remains limited. Researchers must navigate heterogeneous data formats, specialised modelling languages, and multi-stage analysis pipelines that demand significant programming expertise. As a result, many neuroscientists with deep domain knowledge cannot fully benefit from the expanding body of FAIR datasets and models available to the community.

The Open Source Brain (OSB) platform already improves access by providing a unified interface to over 450 experimental datasets and more than 3,000 computational models drawn from community archives, while supporting best-practice standards including NeuroML and SBML for modelling and Neurodata Without Borders (NWB) for experimental data. Even so, many researchers still struggle with the technical hurdles of building, configuring and running advanced modelling pipelines.

This project will address that challenge by creating an AI-enabled research assistant that lets users search, inspect, analyse and simulate OSB datasets and models through natural language. Combining state-of-the-art large language models (LLMs) with retrieval-augmented generation (RAG) and specialist AI agents, the system will guide users through complex workflows that previously required expert programming, a significant step beyond keyword-based search and manual Jupyter scripting.

The guiding question is how heterogeneous, multimodal FAIR neuroscience data and models can be made genuinely usable by researchers who are not computational specialists. By lowering technical barriers, the project will increase the reuse, transparency and accessibility of UK-led community resources such as OSB and NeuroML. The platform will be openly developed and designed to generalise to other life-science data resources, providing a proof-of-concept for UK and global initiatives seeking to make their contents more FAIR through leading-edge AI technologies.

OME-Zarr Framework for AI-Driven Whole Slide Analysis

Project Lead: David Harris-Birtill & Craig Myles — University of St Andrews

Contact: dcchb@st-andrews.ac.uk & cggm1@st-andrews.ac.uk

Digital pathology is rapidly adopting artificial intelligence for tasks such as biomarker prediction, tumour stratification and survival analysis. Yet computational pathology remains constrained by the fragmented, proprietary image formats produced by commercial scanners. The Open Microscopy Environment (OME) next-generation file format, OME-Zarr, offers an open, cloud-optimised and metadata-rich representation of whole-slide imaging, with growing adoption by repositories such as the EMBL-EBI BioImage Archive. Despite this momentum, there is no community framework showing how OME-Zarr whole-slide images can support AI training and reproducible AI prediction through shared workflows, so pathology groups face real barriers to adopting OME-Zarr and reusing AI methods.

This Pathfinder project builds on existing Findable, Accessible and Interoperable imaging resources, concentrating on strengthening Reusability through a reusable development and inference framework for OME-Zarr whole-slide images. The work will be grounded in colorectal cancer biomarker prediction using the SurGen dataset in the BioImage Archive as a community reference dataset, accompanied by reproducible examples. Its stakeholders — computational pathology researchers, AI method developers, research software engineers, data curators and clinicians — will be able to develop and benchmark digital pathology AI methods against a well-characterised reference resource.

Workflows and software will be released on GitHub and registered on WorkflowHub to maximise findability and reuse, enabling AI-ready, OME-Zarr-based curation practices. In line with BioFAIR’s Pathfinder objectives, the project will evaluate OME-Zarr for machine learning by benchmarking throughput, reproducibility and cross-site portability of the resulting workflows. Clinical collaborators Dr Jamie Wilson (Clinical Lead in Pathology) and Professor Shaun Walsh (Lead Consultant in Pathology) at Ninewells Hospital will help maintain clinical relevance and translatability. Co-design with the OME team (advised by Dr Jean-Marie Burel) and the BioImage Archive (supported by Matthew Hartley) will ensure the outputs align with community standards and are valuable across research, curation and clinical innovation.

FAIR-Figures: making publication figures AI-ready by applying FAIR data principles

Project Lead: Melissa Harrison — EMBL-EBI & Tim Beck – University of Nottingham

Contact: mharrison@ebi.ac.uk & tim.beck@nottingham.ac.uk

Brain images published in the biomedical literature could be collated into datasets for training Artificial Intelligence (AI) models and supporting neuroimaging AI education. Although large numbers of suitable images exist as figures across the literature, literature databases do not offer figure-search capabilities, making published figures difficult to find and reuse.

This project will bring together a UK brain-imaging AI research community with the team behind Europe PMC, one of the largest biomedical literature databases, to bridge the gap between researchers’ need for ad hoc image datasets and Europe PMC’s existing search capabilities. By applying FAIR data principles, the project will make publication figures in Europe PMC searchable, enabling brain images to be gathered into custom, reusable datasets for AI applications.

The work will achieve this by co-creating, with the UK brain-imaging community and the Europe PMC team, a metadata model describing the image features needed to search publication figures. Building on previous pilot work, state-of-the-art software components for splitting multi-panel figures into individual images and for extracting figure information from publication text will be integrated into an open, reusable workflow. This workflow will populate the metadata model with standard vocabulary terms and supply the information required to make brain images meaningfully searchable.
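
As a rough illustration of what a searchable figure record could look like, the sketch below models one panel split out of a multi-panel figure and filters panels by vocabulary terms. It is not the project's metadata model, which is still to be co-created: the `PanelRecord` class, its field names, the `search_panels` helper and the `VOCAB:` term identifiers are all hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class PanelRecord:
    """One image panel extracted from a publication figure (hypothetical fields)."""
    article_id: str      # placeholder identifier for the source publication
    figure_label: str    # label of the parent figure, e.g. "Figure 1"
    panel_label: str     # label assigned after splitting a multi-panel figure
    caption_text: str    # caption or in-text description linked to the panel
    terms: FrozenSet[str] = frozenset()  # placeholder vocabulary term identifiers


def search_panels(records: Iterable[PanelRecord],
                  required_terms: Iterable[str]) -> List[PanelRecord]:
    """Return the panels annotated with every one of the required terms."""
    required = set(required_terms)
    return [record for record in records if required <= record.terms]


# Toy records with made-up identifiers and terms, for illustration only.
records = [
    PanelRecord("ARTICLE-1", "Figure 1", "A", "Axial brain scan",
                frozenset({"VOCAB:MRI", "VOCAB:BRAIN"})),
    PanelRecord("ARTICLE-1", "Figure 1", "B", "Bar chart of group means",
                frozenset({"VOCAB:CHART"})),
    PanelRecord("ARTICLE-2", "Figure 3", "A", "Coronal brain scan",
                frozenset({"VOCAB:CT", "VOCAB:BRAIN"})),
]

brain_hits = search_panels(records, {"VOCAB:BRAIN"})
mri_brain_hits = search_panels(records, {"VOCAB:BRAIN", "VOCAB:MRI"})

for hit in brain_hits:
    print(hit.article_id, hit.figure_label, hit.panel_label)
```

Storing terms as a set per panel keeps the search a simple subset test, so a custom dataset can be assembled by combining whichever terms a community agrees on.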

The project will deliver a prototype public user interface and API for searching and retrieving individual images within Europe PMC, enabling the construction of AI-ready brain-image datasets. Crucially, the approach is designed as a template that can be scaled across other life-science domains, allowing communities to build custom image datasets that meet their own specialised requirements. The work was scoped during an HDR UK and ELIXIR-UK joint hackathon and has been shaped by members of the NIHR Nottingham Biomedical Research Centre’s imaging themes.

UK Regulatory Network Commons Portal: a FAIR benchmarking framework for gene regulatory network inference methods

Project Lead: Kedar Natarajan — University of Southampton

Contact: knn1y25@soton.ac.uk

Gene regulatory networks (GRNs) are fundamental to biological systems, describing how regulators — most often transcription factors — orchestrate gene expression to coordinate cell identity. Although single-cell RNA sequencing (scRNA-seq) and chromatin accessibility (scATAC-seq) provide rich data, there is little agreement on how best to infer GRNs from multimodal datasets. More than 30 computational methods now exist, often with incompatible input and output formats, diverse performance metrics, and limited validation. Researchers consequently re-run analyses repeatedly, struggle to compare methods fairly, and have limited confidence in the predictions they use to design experiments. This creates critical FAIR gaps: scattered predictions in incompatible formats, an absence of systematic benchmarking on consistent datasets, and no community-validated reference networks.

This project proposes the UK Regulatory Network Commons Portal, a FAIR repository and benchmarking framework that will curate and harmonise fragmented GRN predictions into validated, reusable resources, organised around four “commons”. A Methods Commons will benchmark ten diverse inference approaches — spanning correlation-based, motif-enrichment, trajectory-based, enhancer-focused and machine-learning methods — using containerised, reproducible pipelines. A Data Commons will establish tiered reference standards (Gold, Silver, Bronze) from human and mouse single-cell multimodal datasets, rigorously validated against perturbation and chromatin-mapping studies to quantify prediction confidence. A Training Hub, delivered with the ELIXIR-UK Southampton Node, AI@Southampton and BioFAIR, will provide toolkits, resources and workshops to upskill the UK research community. Finally, a People Commons will foster a national community of practice committed to maintaining these standards beyond the funding period.
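
To make the idea of benchmarking methods against a shared reference concrete, here is a minimal sketch that harmonises predicted regulator-target edges into one format and scores them against a reference network using precision, recall and F1. The project does not specify its metrics or formats, so the choice of metrics, the function names and the toy gene labels (`TF1`, `G1`, and so on) are assumptions made purely for illustration.

```python
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[str, str]  # (regulator, target)


def normalise_edges(edges: Iterable[Edge]) -> Set[Edge]:
    """Strip whitespace, upper-case labels and de-duplicate edges."""
    return {(reg.strip().upper(), tgt.strip().upper()) for reg, tgt in edges}


def score_against_reference(predicted: Iterable[Edge],
                            reference: Iterable[Edge]) -> Dict[str, float]:
    """Compute precision, recall and F1 of predicted edges against a reference."""
    pred = normalise_edges(predicted)
    ref = normalise_edges(reference)
    true_positives = len(pred & ref)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(ref) if ref else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Toy reference network and predictions; all labels are placeholders.
reference = [("TF1", "G1"), ("TF1", "G2"), ("TF2", "G3"), ("TF3", "G4")]
predictions = {
    "method_a": [("tf1", "g1"), ("TF1", "G2"), ("TF2", "G9")],
    "method_b": [("TF1", "G1"), ("TF2", "G3"), ("TF3", "G4"), ("TF3", "G5")],
}

scores = {name: score_against_reference(edges, reference)
          for name, edges in predictions.items()}

for name, result in scores.items():
    print(name, {metric: round(value, 3) for metric, value in result.items()})
```

Normalising every method's output before scoring is what allows a like-for-like comparison on the same reference; a real benchmark would also need to handle edge weights, gene identifier mapping and confidence tiers.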

By demonstrating how computational outputs can achieve true interoperability, the initiative will serve the broader BioFAIR infrastructure while addressing fundamental questions in mammalian regulatory logic for developers, biologists and data stewards across the UK life sciences community.

FAIRPath: Foundations for a FAIRer Plant Pathology Community

Project Lead: Richard Ostler — Rothamsted Research

Contact: richard.ostler@rothamsted.ac.uk

FAIRPath aims to accelerate the transformation of plant pathology datasets — generated from field and controlled-environment trials — into Findable, Accessible, Interoperable and Reusable resources. The project builds on prior work at Rothamsted Research that mobilised more than 20 years of historic plant pathology data for the cereal root disease Take-all using the MIAPPE (Minimum Information About a Plant Phenotyping Experiment) metadata standard. That work developed strategies for converting legacy datasets into a modern data standard and revealed gaps in MIAPPE’s ability to capture complex, single- and multi-year experimental designs and statistical metadata for plant pathology.

The project has three objectives. First, it will develop practical tools to simplify FAIR adoption by researchers, including updated MIAPPE Excel metadata templates, validation tools, and data-collection form libraries built with Open Data Kit (ODK). Second, it will extend MIAPPE and related ontologies to better capture plant pathology metadata and improve the representation of statistical design elements. Third, it will deliver training and engagement activities — including a “bring your own data” workshop and webinar to refine and validate the tools, conference attendance to promote uptake across the community, and guidance for both mobilising historic data and creating “born-FAIR” new data.

Expected outcomes include increased researcher data literacy, greater awareness of current gaps within the community, improved interoperability of plant pathology datasets, enhanced MIAPPE coverage, and a sustainable framework for community-driven FAIR practices. While the project works specifically with the plant pathology community, many resources will benefit the wider plant sciences and MIAPPE user community. In doing so, FAIRPath will strengthen UK plant sciences by enabling reuse of historical data, supporting reproducible research, and advancing the FAIR principles in life sciences in line with BioFAIR’s mission.

The FAIR-in-action Bridge

Project Lead: Xenia Perez Sitja — Earlham Institute

Contact: xenia.perez-sitja@earlham.ac.uk

The FAIR-in-action Bridge aims to accelerate the practical adoption of the FAIR principles across UK life-science research-performing organisations (RPOs). While many organisations have developed FAIR data policies and guidance, research data management (RDM) professionals face a persistent challenge: translating high-level guidance into practical, reusable implementation materials that work in real institutional settings.

The project targets this “missing middle layer” between policy and everyday research practice by co-creating two FAIR-in-action playbooks. These will translate existing resources into facilitator-ready activities, templates and practical examples that can be owned and adapted locally, then reused at scale.

The work builds on the sustained success of the ELIXIR-UK RDM Club, a national network of more than 275 RDM professionals across 58+ UK RPOs and 29 international organisations. This established community will be mobilised to co-create, test and validate the playbooks within host institutions, with six diverse UK RPOs participating in the rollout to lay the groundwork for national adoption. Each rollout will be a one-day, in-person event co-delivered with local teams to build facilitator capacity, incorporate local systems and policies, and support sustainable reuse, with feedback informing iterative refinement.

Sustainability is ensured through the RDM Club as a permanent national community of practice — part of the wider ELIXIR consortium — supporting ongoing reuse, adaptation and peer learning beyond the funding period. Final outputs will be published openly through resources such as RDMkit, the FAIR Cookbook, TeSS and Zenodo. Expected outcomes include increased capacity among RDM professionals to embed and advocate for FAIR RDM, stronger institutional integration of FAIR practices, and improved engagement with FAIR principles among researchers and students — supporting the early stages of culture change across the UK life-science research system.

Connecting Bioconductor, Galaxy and nf-core: FAIR, AI-ready workflows and training for single-cell analysis

Project Lead: Kevin Rue-Albrecht — University of Oxford

Contact: kevin.rue-albrecht@imm.ox.ac.uk

Researchers analysing single-cell data face a fragmented tool landscape: many excellent methods exist across different communities, but inconsistent packaging, documentation and metadata make it difficult to find, combine or reuse the right tools for a given biological question. Bioconductor offers a rich set of R-based packages for single-cell analysis, while Galaxy provides a graphical, reproducible environment that makes such tools accessible to users without programming experience. This project will improve how Bioconductor tools and training materials are shared across workflow platforms, making them more FAIR and better prepared for future AI-driven applications.

The project brings together a team with complementary expertise spanning the Bioconductor, nf-core, ELIXIR and Galaxy communities, with deep knowledge of existing infrastructure including WorkflowHub, bio.tools, RDMkit and EDAM. It will synergise with the work of BioFAIR fellows Marisa Loach and Nicola Soranzo to measurably improve Bioconductor’s FAIR-ness and to create and disseminate exemplar single-cell workflows that demonstrate how Bioconductor tools can be packaged, described and registered for reuse across platforms and prepared for AI-driven discovery.

Training activities will include online sessions and in-person “bring-your-own-data” events to help users apply the workflows to their own data. Expected outcomes include tooling for adapting Bioconductor methods as Galaxy wrappers and nf-core modules; extensions to existing Galaxy single-cell workflows demonstrating interoperability with Bioconductor software; guidance for aligning Bioconductor metadata with EDAM and registering workflows via WorkflowHub and the Intergalactic Workflow Commission; tutorials published through the Galaxy Training Network, the ELIXIR Training e-Support System (TeSS) and Bioconductor platforms; and contributions to the single-cell RDMkit. Together, these outputs will help researchers run reproducible single-cell analyses and strengthen the links between major open-source bioinformatics ecosystems.

Advancing Comparative Metabolomics: building towards a comprehensive, FAIR-compliant community resource for model organism metabolomes

Project Lead: Ralf Weber — University of Birmingham

Contact: r.j.weber@bham.ac.uk

Model organisms are central to fundamental and translational research across the life sciences. While large-scale genomic, transcriptomic and proteomic initiatives have transformed knowledge of model-organism biology, understanding of their metabolomes and metabolic biochemistry has lagged — despite the critical regulatory roles of small molecules in cellular pathways, physiology and environmental responses. Comparative genomics has revealed conserved and species-specific genetic features, yet systematic comparative metabolomics across model organisms remains underdeveloped, largely because of a lack of specialised, scalable, FAIR-compliant community resources and standardised workflows.

The Deep Metabolome Annotation (DMA) strategy, developed at the University of Birmingham, has partly addressed this gap by combining diverse chromatographic separations with comprehensive multi-stage mass spectrometry and bespoke computational workflows to deliver high-confidence, reusable metabolite and lipid annotations. Applied to Daphnia magna, DMA yielded more than 8,500 annotations and a prototype community database and web portal, DMAdb, which includes open-source and Galaxy-based tools and workflows for data processing, quality control and annotation.

Building on these foundations, this project will prototype a comparative metabolomics resource as a stepping stone towards enabling researchers to study how conserved and species-specific pathways and perturbation responses differ across model organisms. It will expand DMAdb into a multi-model-organism, FAIR-compliant community resource by incorporating DMA characterisations for Drosophila melanogaster and Caenorhabditis elegans, enhancing the Galaxy-based tools and workflows, and improving the web portal for cross-species exploration and visualisation of metabolome data. Interoperability will be strengthened through piloted integration with UK ELIXIR-endorsed resources such as MetaboLights, LIPID MAPS and FlyBase, embedding DMAdb within the wider ELIXIR/BioFAIR ecosystem. The resource will support the reuse and enrichment of public metabolomics datasets and provide a scalable foundation for future expansion to additional model organisms and emerging annotation approaches.

From Local Monitoring to National Capability: FAIR workflows for ecological living laboratories

Project Lead: Phil Wilkes & Robert Barber — Royal Botanic Gardens, Kew

Contact: p.wilkes@kew.org & R.Barber@kew.org

This project will deliver practical FAIR workflows, metadata templates and training materials enabling UK ecological monitoring sites to integrate multi-modal data for climate-change research at a national scale.

Climate change is disrupting ecological timing across the UK. Rising temperatures drive earlier insect emergence and bird breeding, while photoperiod-dependent plants respond more slowly, creating temporal mismatches across trophic levels. Understanding which ecosystems are most vulnerable requires integrating phenological, biodiversity, climate and vegetation data across many sites and years. Ecological “living laboratories” at universities, botanic gardens and conservation estates are generating rich long-term datasets on exactly these topics, collectively representing an unparalleled national resource for understanding climate-driven ecological change.

While national initiatives promote FAIR principles, existing guidance remains too high-level for landscape-scale monitoring teams working with multi-modal ecological data. There is no practical blueprint for how sites managing diverse datasets should structure, describe and publish their data for cross-site integration. As a result, data remain locked in local systems with inconsistent metadata and variable formats, preventing the synthesis needed to detect large-scale patterns of ecological desynchronisation.

The project addresses this gap through three work packages: a national survey mapping FAIR maturity across 50+ UK living labs; the development and testing of lightweight metadata templates and workflows at the Wakehurst Ecosystem Observatory; and the production of open guidance, training materials and a directory establishing a future UK FAIR Living Labs network. Outputs will be co-designed with the High Weald AONB Partnership, the Weald to Waves initiative, and the UK Centre for Ecology & Hydrology to ensure they meet real operational needs and align with national data infrastructure. By enabling data integration across sites and disciplines, the work will help UK ecological monitoring networks answer pressing questions about climate-driven ecosystem change and demonstrate practical pathways from FAIR principles to FAIR practice.

Don’t see your specific scientific domain listed above? We are continuously working to map and support the diverse landscape of UK life sciences. Check out our Opportunities page to get involved with upcoming funding calls, and keep an eye out for our next round of Pathfinder funding launching in late 2026!