Frequently Asked Questions
1. What is the Dog Genome Annotation (DoGA) project?
The Dog Genome Annotation (DoGA) project is a research initiative focused on functionally annotating the canine genome. This involves identifying and characterizing regulatory elements such as promoters and enhancers, which control when and where genes are expressed. The project has built a comprehensive tissue biobank and used RNA sequencing to create a promoter and gene expression atlas for 100 canine tissues.
2. Why is the DoGA project important?
Despite the dog's significance as a model for human diseases and its unique population history, the functional annotation of its genome remained incomplete. This lack of knowledge about regulatory elements hampered gene discovery for various traits, including development, morphology, disease, and behavior. DoGA addresses this gap by providing an extensive resource that will significantly advance canine genomics research.
3. What resources and data are available through the DoGA project?
The DoGA project provides several valuable resources to the scientific community:
DoGA biobank: A collection of almost 6,000 samples from 132 different tissues collected from 49 animals (dogs and wolves). This biobank, with detailed clinical and pathological metadata, is available to researchers.
Promoter and gene expression atlas: An online interactive atlas that enables visualization of expression profiles of genes and promoters across 100 tissues. Users can search by genomic location, gene symbol, or tissue name.
Genome browser tracks: Genome browser views of the data, functional annotation, and species comparisons are available through the Zenbu and UCSC Genome Browsers.
Data analysis scripts: R-Markdown documents and scripts used for data analysis are publicly available, allowing researchers to replicate analyses and apply them to different canine genomes.
Whole genome sequencing data: Sequences from six dogs and four wolves are available to support allele-specific analyses of gene expression.
4. How was the DoGA data generated and validated?
The project employed STRT2-seq, a 5'-end-targeted RNA sequencing method, to capture active promoter regions and quantify gene expression across the 100 tissues. This approach identified more than 100,000 candidate promoter regions, including promoters that had not previously been annotated.
The quality of the identified promoters was validated in several ways:
Overlap with epigenetic marks: The promoters showed significant overlap with previously published datasets for open chromatin regions and promoter-associated histone marks (H3K4me3 and H3K27ac).
PCR validation: Several promoter candidates were validated using PCR and Sanger sequencing, confirming the accuracy of the identified transcription start sites (TSSs).
Biological relevance: The expression profiles of the promoters clustered according to their tissue of origin, indicating the biological relevance and quality of the data.
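The overlap check described above can be illustrated with a small Python sketch. It is not the DoGA pipeline: the function names (merge_intervals, fraction_overlapping) and the toy coordinates are hypothetical, and it assumes both promoters and peaks are given as BED-style half-open intervals (start inclusive, end exclusive).

```python
from bisect import bisect_right
from collections import defaultdict


def merge_intervals(intervals):
    """Merge overlapping or touching (start, end) intervals into a sorted, disjoint list."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def fraction_overlapping(promoters, peaks):
    """Fraction of promoter intervals that overlap at least one peak.

    Both arguments are lists of (chrom, start, end) tuples in half-open
    coordinates. Returns 0.0 for an empty promoter list.
    """
    peaks_by_chrom = defaultdict(list)
    for chrom, start, end in peaks:
        peaks_by_chrom[chrom].append((start, end))

    # After merging, starts and ends are both sorted, so binary search is valid.
    index = {}
    for chrom, intervals in peaks_by_chrom.items():
        merged = merge_intervals(intervals)
        index[chrom] = ([s for s, _ in merged], [e for _, e in merged])

    hits = 0
    for chrom, start, end in promoters:
        if chrom not in index:
            continue
        starts, ends = index[chrom]
        # First merged peak that ends after the promoter starts.
        i = bisect_right(ends, start)
        if i < len(starts) and starts[i] < end:
            hits += 1
    return hits / len(promoters) if promoters else 0.0


# Toy example with made-up coordinates (not DoGA data).
example_promoters = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 80)]
example_peaks = [("chr1", 150, 300), ("chr1", 600, 700), ("chr3", 0, 10)]
example_fraction = fraction_overlapping(example_promoters, example_peaks)
```

In practice this kind of comparison is usually run with a dedicated interval tool on BED files. The sketch only shows the underlying logic of counting promoters that coincide with a histone-mark or open-chromatin peak.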
5. How can the DoGA data be used to study canine diseases?
The DoGA resource is highly valuable for studying canine diseases. It provides tissue-specific expression profiles for hundreds of genes associated with diseases listed in the Online Mendelian Inheritance in Animals (OMIA) database. Researchers can use this data to prioritize candidate genes, identify the tissues most likely to be affected by disease-associated genomic variants, and investigate whether alternative promoters contribute to the tissue-specific manifestation of a disease.
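One widely used way to summarize how tissue-specific a gene's expression is, which can help when ranking candidate genes, is the tau index: tau = sum(1 - x_i / max(x)) / (n - 1), where x_i is the expression in tissue i and n is the number of tissues. The Python sketch below is illustrative only. The source does not state that DoGA uses this measure, and the function name tau_index and the example values are hypothetical.

```python
def tau_index(expression):
    """Tissue-specificity index for one gene.

    expression: list of non-negative expression values, one per tissue.
    Returns a value between 0 (equally expressed everywhere) and
    1 (expressed in a single tissue).
    """
    n = len(expression)
    if n < 2:
        raise ValueError("tau needs expression values from at least two tissues")
    if min(expression) < 0:
        raise ValueError("expression values must be non-negative")
    peak = max(expression)
    if peak == 0:
        raise ValueError("tau is undefined when the gene is not expressed in any tissue")
    return sum(1 - value / peak for value in expression) / (n - 1)


# Made-up expression values across three tissues (not DoGA data).
broad_gene = tau_index([5.0, 5.0, 5.0])       # same level in every tissue
specific_gene = tau_index([10.0, 0.0, 0.0])   # detected in one tissue only
intermediate_gene = tau_index([10.0, 5.0, 0.0])
```

A gene with a high tau whose peak tissue matches the tissue affected by a disease would be a more plausible candidate than a broadly expressed gene, although any threshold is a judgment call.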
6. How does the DoGA data contribute to understanding the evolution of dogs and wolves?
The inclusion of tissue samples from both dogs and wolves provides a unique opportunity to study the molecular basis of behavioral evolution during domestication. By comparing gene expression and regulatory element usage between dogs and wolves, researchers can gain insights into the genetic changes that have shaped canine behavior.
7. What are some examples of how the DoGA data can be applied?
The DoGA data and resources have various applications, including:
Investigating alternative promoter usage: The data allows the analysis of how alternative promoters of a gene are used differently across tissues and developmental stages, which can impact gene function and contribute to phenotypic diversity.
Prioritizing candidate causal variants: The overlap of promoter regions with previously identified lineage- and behavior-specific SNPs allows researchers to prioritize these variants for further functional studies.
Studying gene expression during embryonic development: The availability of data from dog embryos at different developmental stages provides insights into the dynamic changes in gene and promoter usage during canine development.
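As a rough illustration of the alternative promoter analysis mentioned above, the following Python sketch computes, for a single gene, the share of expression contributed by each promoter in each tissue. It is a minimal sketch under stated assumptions: the input layout, the function names (promoter_usage, dominant_promoter), and the promoter identifiers and values are all hypothetical and do not reflect the DoGA file formats.

```python
def promoter_usage(counts):
    """Per-tissue usage fractions for the promoters of one gene.

    counts: {promoter_id: {tissue: expression}} for promoters of ONE gene.
    Returns {tissue: {promoter_id: fraction}}; tissues in which the gene
    has no expression from any promoter are skipped.
    """
    tissues = sorted({t for per_tissue in counts.values() for t in per_tissue})
    usage = {}
    for tissue in tissues:
        total = sum(per_tissue.get(tissue, 0.0) for per_tissue in counts.values())
        if total == 0:
            continue  # gene not detected in this tissue
        usage[tissue] = {
            promoter: per_tissue.get(tissue, 0.0) / total
            for promoter, per_tissue in counts.items()
        }
    return usage


def dominant_promoter(usage):
    """Return the promoter with the largest usage fraction in each tissue."""
    return {tissue: max(fractions, key=fractions.get) for tissue, fractions in usage.items()}


# Made-up values for a gene with two promoters (not DoGA data).
example_counts = {
    "P1": {"brain": 30.0, "liver": 5.0},
    "P2": {"brain": 10.0, "liver": 15.0},
}
example_usage = promoter_usage(example_counts)
example_dominant = dominant_promoter(example_usage)
```

A gene whose dominant promoter differs between tissues, as in this toy example, is a candidate for tissue-specific alternative promoter usage and could be examined further in the atlas or genome browser tracks.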
8. What are the limitations of the DoGA data, and how will the project continue in the future?
The DoGA project acknowledges some limitations, including the incompleteness of the gene expression atlas and promoter list. Future efforts will focus on expanding the biobank, including additional tissues and developmental stages. Single-cell resolution data will be incorporated to further refine the understanding of cell type-specific promoter usage. The consortium also plans to investigate other regulatory elements, such as enhancers, and continue comparative studies between dogs and wolves, particularly focusing on brain tissues and their role in behavioral evolution.