Stemformatics is a data portal and visualisation platform for stem cell research. Initially established to host curated gene expression datasets, it has become a central repository for transcriptional profiles of both pluripotent and adult stem cells, with a particular focus on haematopoiesis and myeloid subsets. Stemformatics hosts an extensive data collection derived from multiple platforms, offering an in-depth view of stem cell biology. More recently, Stemformatics has expanded its focus beyond simple curation to collating and integrating public datasets with shared phenotypes. This shift has led to the creation of several integrated expression atlases, notably of human myeloid, blood and dendritic cells. These atlases support cross-dataset comparisons, enabling researchers to annotate myeloid subsets in external single-cell data and to identify tissue-specific or activation properties of laboratory models or primary cells.
The Stemformatics team has a long-term vision to develop artificial intelligence models that identify experimental or environmental drivers of complex gene expression phenotypes. Specifically, we are implementing deep neural networks to identify influential variables such as cell type, tissue origin, or time-dependent and dose-dependent ligand responses. This requires deep curation of the metadata accompanying each transcriptional dataset. We are also assessing the usefulness of large language models, such as ChatGPT, for extracting methods from academic papers to assist with rapid curation and integration of public data. Stemformatics atlases are freely available at www.stemformatics.org, and updates are provided via the Git repository https://github.com/wellslab/s4m-api.