Biomedical Data Science Seminar Series – Biomics

Biomedical Data Science Seminar Series

BIOMICS will establish a dedicated seminar series featuring scientific talks by experts from the partnering institutions, to take place every three months. These seminars will cover the latest achievements in the field and will be publicized and open to the entire GIMM, as well as to the regional and national scientific communities.


[3] The third speaker of the Biomedical Data Science Seminar Series is Bernardo Almeida, a Senior AI Research Scientist at InstaDeep in Paris, where he is developing large language foundation models for biology.

Decoding the genome with foundation models

The human genome encodes the fundamental instructions of human biology, yet deciphering how its sequence governs molecular function and influences disease remains one of the central challenges in biomedicine. As genomic and biomedical data continue to expand exponentially, genomics foundation models have emerged as powerful approaches capable of capturing complex, multi-scale patterns embedded in these sequences. In this talk, I will present our efforts to develop such models to learn the “code” of the genome – beginning with self-supervised models trained directly on genomic sequences, extending these architectures to integrate natural language, and introducing a next generation of unified models that bridge multiple training paradigms. Together, these advances bring us closer to a comprehensive, computable understanding of genome function.


[2] The second speaker of the Biomedical Data Science Seminar Series is Pedro Beltrão, a core member of the BIOMICS project and spokesperson for ETH in the consortium.

The genetics of human trait variation across the scales of biological organization

The number of genetic studies of human traits and diseases has grown in recent years, with hundreds of thousands of gene-to-phenotype mappings made through genome-wide association studies, clinical studies, or studies of model organisms. However, connecting trait-associated genetic variation to mechanisms through individual proteins and cellular processes remains a challenge. Our group is interested in building computational and experimental approaches that aim to address this challenge. In this talk, I will briefly introduce some of our work on using AlphaFold models to study the impact of protein missense mutations and on predicting tissue-type differences in protein-protein interactions. I will then focus primarily on describing our ongoing efforts to study the differences and similarities between genes linked to traits by different genetic approaches: GWAS, rare disorder studies, and mouse KO phenotypes. We find that rare disorder studies and GWAS are biased toward identifying different types of genes, which often do not overlap even for the same or related traits. Despite the low gene-level overlap, we observe convergence at the level of cellular processes linked to the same types of traits, regardless of the technologies used to study gene-to-trait associations. Finally, we show how this convergence allows us to improve the prediction of novel candidate disease genes.


[1] The first speaker of the Biomedical Data Science Seminar Series is Hagen Tilgner, a member of the BIOMICS Scientific Advisory Board.