What are the most fascinating scientific findings?
Why have we invited tens of thousands of participants to our studies?
How have we contributed to the COVID-19 crisis?

Yearbook of Institute of Genomics 2021


In the following overview, we highlight the most significant accomplishments of the Institute of Genomics, University of Tartu. I can proudly say that we have made a remarkable effort that will eventually bear fruit for researchers, for Estonians and indeed for the whole world. The more than 200,000 individuals who have voluntarily joined the Estonian Biobank have given this work a major boost. In addition to research, we have worked to put this knowledge into practice so that people can benefit from it.


To link genetic data to healthcare information, the Ministry of Social Affairs of Estonia has launched a national personalised medicine programme. Over the last year, we have made a significant contribution, together with the programme’s other partners, to ensure that the first medical services based on genetic data are available to Estonians in 2023.


Our next challenge is to contribute to international innovation and to make new discoveries in the field of medicine. As one of Europe's most successful population-based biobanks, the Estonian Biobank is a welcome partner in international research projects around the world. By creating opportunities to make a valuable database safely available to trustworthy partners, together we can move forward quickly in developing personalised medicine.


Mait Metspalu, director of the Institute of Genomics

 Remarkable research findings

In 2021, researchers from the Institute of Genomics authored or co-authored 130 publications. Of these, 33 appeared in high-impact scientific journals (The Lancet, Cell, Nature, Science, Nature Genetics, Nature Communications).


Patterns of genetic connectedness between modern and medieval Estonian genomes reveal the origins of a major ancestry component of the Finnish population

The Finnish population is a unique example of a genetic isolate affected by a recent founder event. Previous studies have suggested that the ancestors of Finnic-speaking Finns and Estonians reached the circum-Baltic region by the 1st millennium BC. However, high linguistic similarity points to a more recent split of their languages. To study genetic connectedness between Finns and Estonians directly, we first assessed the efficacy of imputation of low-coverage ancient genomes by sequencing a medieval Estonian genome to high depth (23×) and evaluated the performance of its down-sampled replicas. We find that ancient genomes imputed from >0.1× coverage can be reliably used in principal-component analyses without projection. By searching for long shared allele intervals (LSAIs; similar to identity-by-descent segments) in unphased data for >143,000 present-day Estonians, 99 Finns, and 14 imputed ancient genomes from Estonia, we find unexpectedly high levels of individual connectedness between Estonians and Finns for the last eight centuries in contrast to their clear differentiation by allele frequencies. High levels of sharing of these segments between Estonians and Finns predate the demographic expansion and late settlement process of Finland. One plausible source of this extensive sharing is the 8th–10th centuries AD migration event from North Estonia to Finland that has been proposed to explain uniquely shared linguistic features between the Finnish language and the northern dialect of Estonian and shared Christianity-related loanwords from Slavic. These results suggest that LSAI detection provides a computationally tractable way to detect fine-scale structure in large cohorts.
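
As an illustration only, the idea of searching unphased genotypes for long shared intervals can be sketched in a few lines of Python. The sketch assumes genotypes coded as 0, 1 or 2 copies of an alternative allele and treats two individuals as sharing at least one allele at every site except where they are opposite homozygotes (0 versus 2). The function name `shared_intervals`, the `min_sites` threshold and the toy genotypes are hypothetical, and the published analysis may define and filter LSAIs differently.

```python
def shared_intervals(g1, g2, min_sites):
    """Return (start, end) index pairs, end exclusive, of runs of consecutive
    sites where two unphased genotype vectors are never opposite homozygotes.

    Assumption: genotypes are coded 0/1/2 (alternative allele counts).
    """
    intervals = []
    start = None
    n = min(len(g1), len(g2))
    for i in range(n):
        # 0 vs 2 means the two individuals cannot share an allele at this site.
        opposite_homozygotes = abs(g1[i] - g2[i]) == 2
        if not opposite_homozygotes:
            if start is None:
                start = i  # open a new candidate interval
        else:
            if start is not None and i - start >= min_sites:
                intervals.append((start, i))
            start = None  # the run is broken at this site
    # Close a run that extends to the last site.
    if start is not None and n - start >= min_sites:
        intervals.append((start, n))
    return intervals


# Hypothetical toy genotypes for two individuals at nine sites.
person_a = [0, 1, 2, 2, 0, 1, 1, 2, 0]
person_b = [2, 1, 1, 2, 1, 0, 2, 0, 0]
print(shared_intervals(person_a, person_b, min_sites=3))
```

In real data, interval length would more likely be measured in genetic or physical distance than in site counts; the site-count threshold here only keeps the example short.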

Read the article


Revisiting the out of Africa event with a deep-learning approach

Anatomically modern humans evolved around 300 thousand years ago in Africa. They started to appear in the fossil record outside of Africa as early as 100 thousand years ago, although other hominins existed throughout Eurasia much earlier. Recently, several studies argued in favor of a single out of Africa event for modern humans on the basis of whole-genome sequence analyses. However, the single out of Africa model is in contrast with some of the findings from fossil records, which support two out of Africa events, and uniparental data, which propose a back to Africa movement. Here, we used a deep-learning approach coupled with approximate Bayesian computation and sequential Monte Carlo to revisit these hypotheses from the whole-genome sequence perspective. Our results support the back to Africa model over other alternatives. We estimated that there were two sequential separations between African and out of Africa populations, happening around 60-90 thousand years ago and separated by 13-15 thousand years. One of the populations resulting from the more recent split has replaced the older West African population to a large extent, while the other one has founded the out of Africa populations.

Read the article


Phenotypic differences between highlanders and lowlanders in Papua New Guinea

Altitude is one of the most demanding environmental pressures for human populations. Highlanders from Asia, America and Africa have been shown to exhibit different biological adaptations, but Oceanian populations remain understudied [Woolcock et al., 1972; Cotes et al., 1974; Senn et al., 2010]. We tested the hypothesis that highlanders phenotypically differ from lowlanders in Papua New Guinea, as a result of inhabiting the highest mountains in Oceania for at least 20,000 years. Six phenotypes were significantly different between Papua New Guinean highlanders and lowlanders. Highlanders show shorter height (p-value = 0.001), smaller waist circumference (p-value = 0.002), larger Forced Vital Capacity (FVC) (p-value = 0.008), larger maximal (p-value = 3.20e-4) and minimal chest depth (p-value = 2.37e-5) and higher haemoglobin concentration (p-value = 3.36e-4). Our study reports specific phenotypes in Papua New Guinean highlanders potentially related to altitude adaptation. Similar to other human groups adapted to high altitude, the evolutionary history of Papua New Guineans appears to have also followed an adaptive biological strategy for altitude.

Read the article


Ancient genomes reveal structural shifts after the arrival of Steppe-related ancestry in the Italian Peninsula

Across Europe, the genetics of the Chalcolithic/Bronze Age transition is increasingly characterized in terms of an influx of Steppe-related ancestry. The effect of this major shift on the genetic structure of populations in the Italian Peninsula remains underexplored. Here, genome-wide shotgun data for 22 individuals from commingled cave and single burials in Northeastern and Central Italy dated between 3200 and 1500 BCE provide the first genomic characterization of Bronze Age individuals (n = 8; 0.001–1.2× coverage) from the central Italian Peninsula, filling a gap in the literature between 1950 and 1500 BCE. Our study confirms a diversity of ancestry components during the Chalcolithic and the arrival of Steppe-related ancestry in the central Italian Peninsula as early as 1600 BCE, with this ancestry component increasing through time. We detect close patrilineal kinship in the burial patterns of Chalcolithic commingled cave burials and a shift away from this in the Bronze Age (2200–900 BCE) along with lowered runs of homozygosity, which may reflect larger changes in population structure. Finally, we find no evidence that the arrival of Steppe-related ancestry in Central Italy directly led to changes in frequency of 115 phenotypes present in the dataset, rather that the post-Roman Imperial period had a stronger influence, particularly on the frequency of variants associated with protection against Hansen’s disease (leprosy). Our study provides a closer look at local dynamics of demography and phenotypic shifts as they occurred as part of a broader phenomenon of widespread admixture during the Chalcolithic/Bronze Age transition.

Read the article


Mycobacterium leprae diversity and population dynamics in medieval Europe from novel ancient genomes

Hansen’s disease (leprosy), widespread in medieval Europe, is today mainly prevalent in tropical and subtropical regions with around 200,000 new cases reported annually. Despite its long history and appearance in historical records, its origins and past dissemination patterns are still widely unknown. Applying ancient DNA approaches to its major causative agent, Mycobacterium leprae, can significantly improve our understanding of the disease’s complex history. Previous studies have identified a high genetic continuity of the pathogen over the last 1500 years and the existence of at least four M. leprae lineages in some parts of Europe since the Early Medieval period. Here, we reconstructed 19 ancient M. leprae genomes to further investigate M. leprae’s genetic variation in Europe, with a dedicated focus on bacterial genomes from previously unstudied regions (Belarus, Iberia, Russia, Scotland), from multiple sites in a single region (Cambridgeshire, England), and from two Iberian leprosaria. Overall, our data confirm the existence of similar phylogeographic patterns across Europe, including high diversity in leprosaria. Further, we identified a new genotype in Belarus. By doubling the number of complete ancient M. leprae genomes, our results improve our knowledge of the past phylogeography of M. leprae and reveal a particularly high M. leprae diversity in European medieval leprosaria. Our findings allow us to detect similar patterns of strain diversity across Europe with branch 3 as the most common branch and the leprosaria as centers for high diversity. The higher resolution of our phylogeny tree also refined our understanding of the interspecies transfer between red squirrels and humans pointing to a late antique/early medieval transmission.

Read the article

Harnessing pluripotent stem cells as models to decipher human evolution

The study of human evolution, long constrained by a lack of experimental model systems, has been transformed by the emergence of the induced pluripotent stem cell (iPSC) field. iPSCs can be readily established from noninvasive tissue sources, from both humans and other primates; they can be maintained in the laboratory indefinitely, and they can be differentiated into other tissue types. These qualities mean that iPSCs are rapidly becoming established as viable and powerful model systems with which it is possible to address questions in human evolution that were until now logistically and ethically intractable, especially in the quest to understand humans' place among the great apes, and the genetic basis of human uniqueness. In this review, we discuss the key lessons and takeaways of this nascent field, from the types of research iPSCs make possible to lingering challenges and likely future directions. We provide a comprehensive overview of how the seemingly unlikely combination of iPSCs and explicit evolutionary frameworks is transforming what is possible in our understanding of humanity's past and present.

Read the article

The Population-Specific Impact of Neandertal Introgression on Human Disease

Since the discovery of admixture between modern humans and Neandertals, multiple studies have investigated the effect of Neandertal-derived DNA on human disease and non-disease phenotypes. These studies have linked Neandertal ancestry to skin- and hair-related phenotypes, immunity, and neurological and behavioural traits. However, these inferences have so far been limited to cohorts with participants of European ancestry. Here, I analyze summary statistics from 40 disease GWAS (genome-wide association study) cohorts of ∼212,000 individuals provided by the Biobank Japan Project for phenotypic effects of Neandertal DNA. I show that Neandertal DNA is associated with autoimmune diseases, prostate cancer and type 2 diabetes. Many of these disease associations are linked to population-specific Neandertal DNA, highlighting the importance of studying a wider range of ancestries to characterize the phenotypic legacy of Neandertals in people today.

Read the article


Large-scale cis- and trans-eQTL analyses identify thousands of genetic loci and polygenic scores that regulate blood gene expression

Trait-associated genetic variants affect complex phenotypes primarily via regulatory mechanisms on the transcriptome. To investigate the genetics of gene expression, we performed cis- and trans-expression quantitative trait locus (eQTL) analyses using blood-derived expression from 31,684 individuals through the eQTLGen Consortium. We detected cis-eQTL for 88% of genes, and these were replicable in numerous tissues. Distal trans-eQTL (detected for 37% of 10,317 trait-associated variants tested) showed lower replication rates, partially due to low replication power and confounding by cell type composition. However, replication analyses in single-cell RNA-seq data prioritized intracellular trans-eQTL. Trans-eQTL exerted their effects via several mechanisms, primarily through regulation by transcription factors. Expression of 13% of the genes correlated with polygenic scores for 1,263 phenotypes, pinpointing potential drivers for those traits. In summary, this work represents a large eQTL resource, and its results serve as a starting point for in-depth interpretation of complex phenotypes.

Read the article


Genome-wide association study identifies five risk loci for pernicious anemia

Pernicious anemia is a rare condition characterized by vitamin B12 deficiency anemia due to lack of intrinsic factor, often caused by autoimmune gastritis. Patients with pernicious anemia have a higher incidence of other autoimmune disorders, such as type 1 diabetes, vitiligo, and autoimmune thyroid issues. Therefore, the disease has a clear autoimmune basis, although the genetic susceptibility factors have thus far remained poorly studied. We conduct a genome-wide association study meta-analysis in 2166 cases and 659,516 European controls from population-based biobanks and identify genome-wide significant signals in or near the PTPN22 (rs6679677, p = 1.91 × 10⁻²⁴, OR = 1.63), PNPT1 (rs12616502, p = 3.14 × 10⁻⁸, OR = 1.70), HLA-DQB1 (rs28414666, p = 1.40 × 10⁻¹⁶, OR = 1.38), IL2RA (rs2476491, p = 1.90 × 10⁻⁸, OR = 1.22) and AIRE (rs74203920, p = 2.33 × 10⁻⁹, OR = 1.83) genes, thus providing robust associations between pernicious anemia and genetic risk factors.

Read the article


Using fecal immunochemical tubes for the analysis of the gut microbiome has the potential to improve colorectal cancer screening

Colorectal cancer (CRC) is a challenging public health problem whose successful treatment depends on the stage at diagnosis. Recently, CRC-specific microbiome signatures have been proposed as a marker for CRC detection. Since many countries have initiated CRC screening programs, it would be useful to analyze the microbiome in the samples collected in fecal immunochemical test (FIT) tubes for fecal occult blood testing. Therefore, we investigated the impact of FIT tubes and stabilization buffer on the microbial community structure evaluated in stool samples from 30 volunteers and compared the detected communities to those of fresh-frozen samples, highlighting previously published cancer-specific communities. Altogether, 214 samples were analyzed by 16S rRNA gene sequencing, including positive and negative controls. Our results indicated that the variation between individuals was greater than the differences introduced by the collection strategy. The vast majority of the genera were stable for up to 7 days. None of the changes observed between fresh-frozen samples and FIT tube specimens were related to previously identified CRC-specific bacteria. Overall, we show that FIT tubes can be used for profiling the microbiota in CRC screening programs. This circumvents the need to collect additional samples and can possibly improve the sensitivity of CRC detection.

Read the article


Machine Learning Reveals Time-Varying Microbial Predictors with Complex Effects on Glucose Regulation

The incidence of type 2 diabetes (T2D) has been increasing globally, and a growing body of evidence links type 2 diabetes with altered microbiota composition. Type 2 diabetes is preceded by a long prediabetic state characterized by changes in various metabolic parameters. We tested whether the gut microbiome could have predictive potential for T2D development during the healthy and prediabetic disease stages. We used prospective data of 608 well-phenotyped Finnish men collected from the population-based Metabolic Syndrome in Men (METSIM) study to build machine learning models for predicting continuous glucose and insulin measures in a shorter (1.5 year) and longer (4 year) period. Our results show that the inclusion of the gut microbiome improves prediction accuracy for modeling T2D-associated parameters such as glycosylated hemoglobin and insulin measures. We identified novel microbial biomarkers and described their effects on the predictions using interpretable machine learning techniques, which revealed complex linear and nonlinear associations. Additionally, the modeling strategy carried out allowed us to compare the stability of model performance and biomarker selection, also revealing differences in short-term and long-term predictions. The identified microbiome biomarkers provide a predictive measure for various metabolic traits related to T2D, thus providing an additional parameter for personal risk assessment. Our work also highlights the need for robust modeling strategies and the value of interpretable machine learning.

Recent studies have shown a clear link between gut microbiota and type 2 diabetes. However, current results are based on cross-sectional studies that aim to determine the microbial dysbiosis when the disease is already prevalent. In order to consider the microbiome as a factor in disease risk assessment, prospective studies are needed. Our study is the first to assess the gut microbiome as a predictive measure for several type 2 diabetes-associated parameters in a longitudinal study setting. Our results revealed a number of novel microbial biomarkers that can improve the prediction accuracy for continuous insulin measures and glycosylated hemoglobin levels. These results make the prospect of using the microbiome in personalized medicine promising.

Read the article


Leveraging Northern European population history: novel low-frequency variants for polycystic ovary syndrome

Polycystic ovary syndrome (PCOS) is a common, complex disorder with unknown aetiology. While previous genome-wide association studies (GWAS) have mapped several loci associated with PCOS, the analysis of populations with unique population history and genetic makeup has the potential to uncover new low-frequency variants with larger effects. We identified three novel genome-wide significant associations with PCOS, with two putative independent causal variants in the checkpoint kinase 2 (CHEK2) gene and a third in myosin X (MYO10).

Read the article


Mendelian Randomization Identifies the Potential Causal Impact of Dietary Patterns on Circulating Blood Metabolites

Nutrition plays an important role in the development and progress of several health conditions, but the exact mechanism is often still unclear. Blood metabolites are likely candidates to be mediating these relationships, as their levels are strongly dependent on the frequency of consumption of several foods/drinks. Understanding the causal effect of food on metabolites is thus of extreme importance. To establish these effects, we utilized two-sample Mendelian randomization using the genetic variants associated with dietary traits as instrumental variables. The estimates of single-nucleotide polymorphisms’ effects on exposures were obtained from a recent genome-wide association study (GWAS) of 25 individual and 15 principal-component dietary traits, whereas the ones for outcomes were obtained from a GWAS of 123 blood metabolites measured by nuclear magnetic resonance spectroscopy. We identified 413 potentially causal links between food and metabolites, replicating previous findings, such as the association between increased oily fish consumption and higher DHA, and highlighting several novel associations. Most of the associations were related to very-low-density, intermediate-density (IDL), and low-density lipoproteins (LDL). For example, we found that constituents of IDL particles and large LDL particles were raised by coffee and alcohol while lowered by an overall healthier diet and fruit consumption. Our findings provide a strong base of evidence for planning future randomized controlled trials (RCTs) aimed at understanding the role of diet in determining blood metabolite levels.
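
For readers unfamiliar with two-sample Mendelian randomization, the following Python sketch shows how per-variant effect estimates from two separate GWAS can be combined into one causal estimate. It uses the Wald ratio and a fixed-effect inverse-variance-weighted (IVW) estimator, which are standard choices in the field but are assumed here for illustration; the study itself may have used other estimators and sensitivity analyses. All function names and numbers below are hypothetical toy values, not results from the article.

```python
from math import sqrt


def wald_ratio(beta_exposure, beta_outcome):
    """Causal estimate from a single variant: outcome effect / exposure effect."""
    return beta_outcome / beta_exposure


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance-weighted estimate across variants.

    Equivalent to a weighted regression of outcome effects on exposure
    effects through the origin, with weights 1 / se_outcome**2.
    Returns (estimate, standard_error).
    """
    numerator = sum(
        bx * by / se ** 2
        for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome)
    )
    denominator = sum(
        bx ** 2 / se ** 2 for bx, se in zip(beta_exposure, se_outcome)
    )
    return numerator / denominator, sqrt(1.0 / denominator)


# Hypothetical toy summary statistics for three genetic instruments:
# effects on a dietary trait (exposure) and on a metabolite (outcome).
bx = [0.10, 0.20, 0.05]
by = [0.05, 0.10, 0.025]
se_by = [0.01, 0.02, 0.01]

estimate, se = ivw_estimate(bx, by, se_by)
print(f"IVW estimate: {estimate:.3f} (SE {se:.3f})")
```

Each variant in the toy data has the same Wald ratio, so the pooled estimate equals that ratio; with real summary statistics the ratios differ and the weights determine how much each variant contributes.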

Read the article

Explore all our publications from this year (narrow the search results with the filters "Institute of Genomics" and "2021") in the Estonian Research Information System

 Responding to COVID-19

The COVID-19 pandemic has highlighted the importance of scientists more than ever. Beyond developing vaccines and advising governments, scientists have faced wider questions: how to detect the spread of the virus, how to prevent severe disease in at-risk groups, and how to explain to people why it is important to get vaccinated. Our researchers have contributed to answering these questions.


COVID-19 host genetics initiative

Researchers from the Estonian Genome Centre are taking part in a large collaborative project, the COVID-19 host genetics initiative, which brings together the human genetics community to generate, share, and analyze data to learn the genetic determinants of COVID-19 susceptibility, severity, and outcomes. Such discoveries could help to generate hypotheses for drug repurposing, identify individuals at unusually high or low risk, and contribute to global knowledge of the biology of SARS-CoV-2 infection and disease.

Learn more about the recent findings


More than 10,000 coronavirus samples were analysed

The Core Facility of Genomics has sequenced and analysed more than 10,000 Estonian SARS-CoV-2 whole genomes. Genomic sequences of viruses provide valuable epidemiological information about infection spread and control. This information is used for describing Estonian coronavirus diversity, evolution and spreading networks.


The saliva test is reliable for detecting COVID-19

In collaboration with the Health Board and the Institute of Family Medicine and Public Health, we showed that samples for detecting the coronavirus can also be taken from saliva. This means that sampling does not have to be carried out by a medical professional, which would also make testing more cost-effective.


 The activities of the Estonian Biobank

A large study on mental health genetics

Nearly one in two people is affected by a mental health problem in their lifetime, but the exact mechanisms causing these problems remain unclear. To bring more clarity to the field of mental health, the Estonian Biobank at the University of Tartu launched a large study in spring 2021. More than 86,000 Estonian Biobank participants responded to the questionnaire, which is nearly 50% of the invitees. The results of this study will contribute towards the development of novel personalised medicine approaches in mental health.

Personality study

The Estonian Biobank and the Institute of Psychology launched a one-of-a-kind study on the associations between personality traits, genes, life experiences and health. The study is unique in its combination of comprehensive measurement and a large number of participants. All Estonian Biobank participants are invited to complete a web survey. Upon completion, participants can request personalised feedback on their main personality traits. Data collection will continue until spring 2022.

More than 5,000 biobank participants joined this year

Although the main recruitment of biobank participants has ended, thousands of people had the opportunity to join the Estonian Biobank this year.

The Estonian Biobank has established a population-based biobank of Estonia with a current cohort size of more than 207,000 individuals (genotyped with genome-wide arrays), reflecting the age, sex and geographical distribution of the adult Estonian population. Considering the fact that about 20% of Estonia's adult population has joined the programme, it is indeed a database that is very important for the development of medical science both domestically and internationally.

 Overview of the year in numbers

Doctoral Defense: Natàlia Pujol Gualdo "Decoding genetic associations of female reproductive health traits"

Andres Metspalu elected member of Academia Europaea

New technology to help sequence whole genomes from 10,000 gene donors