
Abstract
The human genome encodes the fundamental instructions of human biology, yet deciphering how its sequence governs molecular function and influences disease remains one of the central challenges in biomedicine. As genomic and biomedical data continue to expand exponentially, genomics foundation models have emerged as powerful approaches capable of capturing the complex, multi-scale patterns embedded in these sequences. In this talk, I will present our efforts to develop such models to learn the “code” of the genome, beginning with self-supervised models trained directly on genomic sequences, then extending these architectures to integrate natural language, and finally introducing a next generation of unified models that bridge multiple training paradigms. Together, these advances bring us closer to a comprehensive, computable understanding of genome function.
Short Bio
Bernardo Almeida is a Senior AI Research Scientist at InstaDeep in Paris, where he is developing large language foundation models for biology. He received his PhD in 2023 from the University of Vienna for his work on deep learning models to understand the information encoded in the genome sequence. Earlier, he obtained his undergraduate degree in Biomedical Sciences and his master's degree in Oncobiology from the University of Algarve, Portugal.