B1458 - RNA-Seq, Deep Sequencing and In Vivo Chromatin Studies Identifying Functional Elements Relevant to Reading and Language - 25/10/2012

B number: 
B1458
Principal applicant name: 
Prof Jeffrey Gruen (Yale University, USA)
Co-applicants: 
Dr Natalie Powers (Yale University, USA), Dr John Eicher (Yale University, USA)
Title of project: 
RNA-Seq, Deep Sequencing, and In Vivo Chromatin Studies Identifying Functional Elements Relevant to Reading and Language.
Proposal summary: 

Aims and Hypotheses

Learning disabilities are disorders characterized by unexpected difficulty with a specific mode of learning, generally despite normal IQ and educational opportunity. Most learning disabilities involve language: the NICHD estimates that 15-20% of Americans have a language-based learning disability, and this high prevalence makes academic remediation a costly burden on the educational system. By far the most common are dyslexia and language impairment (LI), which are specific deficits in processing and expressing written and spoken language, respectively. Both disorders are highly heritable, and genetic studies over the past three decades have identified a number of risk loci and genes. However, because these studies have relied almost exclusively on genetic methods, little is known about how most risk variants exert their effects. Moreover, the known risk variants account for only a small fraction of the heritability of these disorders; much of it is still 'missing,' which precludes development of a gene-based diagnostic test. Such a test would be useful for early detection of affected individuals, because interventions are far more effective when administered earlier in life. In our current funding period, we reported compelling evidence that BV677278, a polymorphic tandem repeat within an intron of the dyslexia risk gene DCDC2, is a regulatory element that substantially influences reading and verbal language skills. We showed that this element can modulate expression from the DCDC2 promoter, that it binds specifically to the potent transcription factor ETV6, that at least two of its alleles significantly reduce mean reading or language performance, and that these alleles show a synergistic genetic interaction with a known risk allele of KIAA0319, another dyslexia risk gene in the same locus (DYX2).
We also observed associations of reading and language measures with other DYX2 regions, such as C6orf62 and THEM2, indicating that other DYX2 elements contribute to reading and language alongside the risk variants in DCDC2 and KIAA0319. Based on these results, we hypothesize that BV677278 directly regulates KIAA0319 and other genes via a regulatory complex containing ETV6, and that the deleterious effects of specific BV677278 alleles are due to differences in target gene expression. We also hypothesize that the BV677278 alleles and the KIAA0319 risk haplotype are not the only functional variants in the DYX2 locus that influence reading and language. To test these hypotheses, we will address the following specific aims:

Specific Aim 1) Examine the effect of BV677278 on gene expression. Because a deletion encompassing BV677278 exists naturally in humans, we can examine the effect of having two, one, or zero copies of BV677278 on gene expression, as well as the effects of different BV677278 genotypes. To this end, we will correlate BV677278 genotypes with extant expression microarray data from ~1,000 lymphoblastoid cell lines from ALSPAC subjects. The allele frequency of the deletion suggests that there will be at least 7 BV677278-null cell lines among them. We will also perform RNA-seq on ~100 of these cell lines selected for genotypes of interest, including all BV677278-null lines.
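The expected number of BV677278-null lines follows from Hardy-Weinberg proportions. The sketch below is illustrative only: it assumes Hardy-Weinberg equilibrium and back-calculates a deletion allele frequency from the figure of seven nulls in ~1,000 lines (q = √(7/1000) ≈ 0.084). That frequency is derived here for the example and is not a measured ALSPAC value; the function names are hypothetical.

```python
import math


def hwe_genotype_counts(n_samples: int, q_del: float) -> dict:
    """Expected +/+, del/+ and del/del counts under Hardy-Weinberg equilibrium.

    q_del is the frequency of the 2,445 bp deletion allele (an assumed input).
    """
    p = 1.0 - q_del
    return {
        "+/+ (2 copies)": n_samples * p * p,
        "del/+ (1 copy)": n_samples * 2 * p * q_del,
        "del/del (0 copies)": n_samples * q_del * q_del,
    }


def implied_deletion_frequency(n_samples: int, n_null: int) -> float:
    """Deletion allele frequency that would yield n_null homozygous nulls under HWE."""
    return math.sqrt(n_null / n_samples)


q = implied_deletion_frequency(1000, 7)  # hypothetical back-calculation
counts = hwe_genotype_counts(1000, q)
for genotype, expected in counts.items():
    print(f"{genotype}: {expected:.1f}")
```

With these assumed inputs the script prints about 839.7, 153.3 and 7.0 expected lines with two, one and zero copies, respectively, so hemizygous lines would be far more plentiful than nulls.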

Specific Aim 2) Identify putative regulatory targets that interact with the ETV6-BV677278 complex. We will use chromatin interaction analysis by paired-end tag sequencing (ChIA-PET) to find putative regulatory targets that physically interact with the ETV6-BV677278 complex. ChIA-PET allows an unbiased, genome-wide scan for physical interactions with this complex, and positive results can be confirmed by chromosome conformation capture (3C). 3C will also serve as a backup if ChIA-PET fails, although it requires prior identification of candidate target sequences, which will be informed by our expression study in Aim 1. These experiments will be performed in 20 ALSPAC lymphoblastoid cell lines selected to carry two, one, or zero copies of BV677278.

Specific Aim 3) Identify additional DYX2 variants independent of the BV677278 regulatory element and the KIAA0319 risk haplotype. Our data suggest that, although BV677278 and the KIAA0319 risk haplotype are important sources of the DYX2 linkage and association with dyslexia, other DYX2 variants also contribute risk. To identify these variants and elucidate their functions, we will exploit the single-base resolution of next-generation sequencing to deep-sequence the entire DYX2 locus in 1,000 ALSPAC subjects. We will select these subjects using an extreme-phenotypes approach: the 250 worst performers on reading tasks, the 250 worst performers on language tasks, and the 250 best performers on each of those same tasks. We expect risk variants to accumulate in specific regions in severely affected individuals; these regions will be assessed for functional effects on gene expression, protein function, and histone modifications.
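The extreme-phenotypes selection can be sketched as below. This is a minimal illustration under stated assumptions: lower scores mean worse performance, a subject who falls in more than one extreme group is counted only once (the proposal does not say how overlaps are handled), and the scores are synthetic random numbers rather than ALSPAC data. `select_extremes` is a hypothetical helper name.

```python
import random


def select_extremes(scores_by_task, n_per_group):
    """Return the n lowest- and n highest-scoring subjects for each task.

    scores_by_task maps a task name to {subject_id: score}; a lower score is
    assumed to mean worse performance. Subjects already placed in an earlier
    group are skipped so that the groups stay disjoint (an assumed rule).
    """
    chosen = set()
    groups = {}
    for task, task_scores in scores_by_task.items():
        ranked = sorted(task_scores, key=task_scores.get)  # worst to best
        for label, order in (("worst", ranked), ("best", list(reversed(ranked)))):
            picked = [sid for sid in order if sid not in chosen][:n_per_group]
            chosen.update(picked)
            groups[(task, label)] = picked
    return groups


# Synthetic, independent scores for 5,000 hypothetical subjects (not ALSPAC data).
random.seed(0)
subjects = [f"S{i:04d}" for i in range(5000)]
scores = {
    "reading": {s: random.gauss(0, 1) for s in subjects},
    "language": {s: random.gauss(0, 1) for s in subjects},
}
groups = select_extremes(scores, n_per_group=250)
for (task, label), ids in groups.items():
    print(task, label, len(ids))
print("total subjects:", len(set().union(*groups.values())))
```

Drawing the groups in sequence and skipping already-chosen subjects keeps the total at exactly 1,000 (4 × 250); a different overlap rule would change which subjects fill the later groups.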

The overall goal of this proposal is to explore the molecular mechanisms by which genetic variants in the DYX2 locus influence written and verbal language skills and impart heritable risk to dyslexia and LI. Our proposed experiments will elucidate the mechanism of action of BV677278, characterize its synergistic interaction with KIAA0319, identify other regulatory and gene targets of BV677278 that may not have been captured by genetic approaches, and detect other variants in the DYX2 locus that influence reading and language. By expanding our understanding of the biological basis of complex reading and language processes, we hope to open future options for diagnosis and treatment of language-based learning disabilities.

Exposure Variables

RNA, DNA and lymphoblastoid cell lines from ALSPAC subjects will be chosen on the basis of their BV677278 genotype, expression data from the microarray analyses of ALSPAC cell lines, performance on verbal language and reading tasks, IQ, and ancestry.

BV677278 is a hypermutable compound short tandem repeat with more than 30 alleles. A 2,445 bp deletion encompassing the entire BV677278 element also occurs naturally in humans: individuals heterozygous for this deletion are hemizygous (have only one copy) for BV677278, while individuals homozygous for the deletion are completely BV677278-null (no copies of BV677278 in their genome). Our analyses of ALSPAC data thus far have shown that BV677278 allele 5 is associated with dyslexia and lower reading skills, and that BV677278 allele 6 is associated with language impairment and lower language skills. These alleles were not only associated with their respective phenotypes; their effects were strong enough to significantly reduce the mean performance of carriers vs. non-carriers on reading tasks (allele 5) and language tasks (allele 6). Additionally, we found that BV677278 interacts synergistically with a known risk variant in KIAA0319, the other dyslexia gene in DYX2, to adversely affect several reading, language, and cognitive phenotypes [Powers et al., submitted]. We also showed that BV677278 specifically binds a nuclear protein and is capable of modulating expression from the DCDC2 promoter [Meng et al., 2011], and we identified this protein as the potent transcription factor and proto-oncogene ETV6 [Powers et al., submitted]. In our aims, we propose to take advantage of the BV677278 microdeletion and the various BV677278 genotypes by comparing RNA-sequencing, deep sequencing of the DYX2 locus, and chromatin studies in lymphoblastoid cell lines with two, one, or no copies (control) of BV677278, and with the alleles that we have so far found to be deleterious to language and reading (alleles 5 and 6).

Outcome Variables

For Aim 1, RNA-sequencing will yield quantitative expression data (counts of sequenced cDNA fragments per transcript) from subjects with different BV677278 genotypes. We will be able to compare expression among subjects with two, one, or no copies of BV677278, as well as among BV677278 genotypes of interest, especially alleles 5 and 6. Sequencing 100 selected ALSPAC subjects will allow us to adjust for intra-subject and inter-subject variation and to compare RNA-seq results with the extant ALSPAC microarray data. We intend to perform RNA-seq at high depth (approximately two samples per flow-cell lane on the Illumina HiSeq platform), which should also enable us to detect differences in splice variants.
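A toy version of the Aim 1 comparison is shown below: raw read counts are scaled to counts per million (CPM) so that libraries of different sizes are comparable, and CPM is then regressed on BV677278 copy number. This is a minimal sketch with invented counts and a placeholder gene name (`GENE_X`); ordinary least squares on CPM is a simplification swapped in for illustration, and a real analysis would more likely use a count-based differential-expression model with covariates such as ancestry.

```python
def counts_per_million(counts):
    """Scale one sample's raw read counts (gene -> count) to counts per million."""
    total = sum(counts.values())
    return {gene: 1e6 * c / total for gene, c in counts.items()}


def dosage_slope(copy_numbers, expression):
    """Ordinary least-squares slope of expression on BV677278 copy number (0, 1 or 2)."""
    n = len(copy_numbers)
    mean_x = sum(copy_numbers) / n
    mean_y = sum(expression) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(copy_numbers, expression))
    sxx = sum((x - mean_x) ** 2 for x in copy_numbers)
    return sxy / sxx


# Toy data: (BV677278 copy number, raw counts) for six hypothetical samples.
# "GENE_X" is a placeholder gene; "OTHER" stands in for all remaining reads.
samples = [
    (2, {"GENE_X": 400, "OTHER": 1_999_600}),
    (2, {"GENE_X": 190, "OTHER": 999_810}),
    (1, {"GENE_X": 150, "OTHER": 999_850}),
    (1, {"GENE_X": 290, "OTHER": 1_999_710}),
    (0, {"GENE_X": 100, "OTHER": 999_900}),
    (0, {"GENE_X": 105, "OTHER": 999_895}),
]
copies = [cn for cn, _ in samples]
gene_x_cpm = [counts_per_million(counts)["GENE_X"] for _, counts in samples]
print("GENE_X CPM:", gene_x_cpm)
print(f"change in CPM per BV677278 copy: {dosage_slope(copies, gene_x_cpm):.2f}")
```

The first and fourth samples have twice the library size of the others, so the CPM step halves their raw counts before the comparison; the fitted slope for this toy data is 46.25 CPM per copy.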

For Aim 2, the outcome of the ChIA-PET chromatin studies will be next-generation sequence data of fusion fragments. If the experiment succeeds, one half of each relevant fragment should contain sequence from BV677278, and the other half sequence from putative target genes (under BV677278 transcriptional control) located elsewhere in the DYX2 locus (cis) or beyond it (trans). Bioinformatic analysis will distinguish cis from trans elements. Within the DYX2 locus, we will especially look for fusion fragments containing sequence from the KIAA0319 risk haplotype or promoter region, as well as from other possible regulatory elements in the region. Outside DYX2, we will use existing annotation from publicly available resources such as JASPAR, ENCODE, and Ensembl, together with in vitro transcriptional reporter assays, to determine whether we have hit transcriptional control elements and target genes that may be relevant to language and reading.
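The cis/trans distinction described above reduces to interval checks on the two ends of each paired-end tag. In the sketch below, "cis" follows the proposal's usage (partner end inside DYX2) and "trans" covers everything outside the locus, including other chromosomes. The coordinates are placeholders, not the real BV677278 or DYX2 positions, and `Interval` and `classify_pet` are hypothetical names.

```python
from typing import NamedTuple, Optional, Tuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive

    def contains(self, chrom: str, pos: int) -> bool:
        return chrom == self.chrom and self.start <= pos < self.end


# Placeholder coordinates for illustration only; replace with the real
# BV677278 and DYX2 boundaries for the genome build being used.
DYX2 = Interval("chr6", 24_000_000, 25_500_000)
BAIT = Interval("chr6", 24_250_000, 24_252_500)


def classify_pet(end_a: Tuple[str, int], end_b: Tuple[str, int],
                 bait: Interval, locus: Interval) -> Optional[str]:
    """Label one paired-end tag relative to the bait (BV677278) and locus (DYX2)."""
    in_a, in_b = bait.contains(*end_a), bait.contains(*end_b)
    if in_a and in_b:
        return "self"   # both ends in the bait; likely a self-ligation product
    if not (in_a or in_b):
        return None     # bait not involved; not informative for this question
    partner = end_b if in_a else end_a
    return "cis" if locus.contains(*partner) else "trans"


for pair in [(("chr6", 24_251_000), ("chr6", 24_600_000)),
             (("chr12", 11_900_000), ("chr6", 24_251_500)),
             (("chr1", 1_000), ("chr2", 2_000))]:
    print(pair, classify_pet(*pair, BAIT, DYX2))
```

The three example pairs are labelled cis, trans, and None, respectively. Tags with neither end in the bait are set aside rather than discarded outright, since they may still be useful as background.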

For Aim 3, deep sequencing of the 1.5 Mb DYX2 locus will yield variants, both known and novel, distributed throughout the locus. Here we will look for clusters of variants in putative regulatory regions such as the DCDC2 promoter, the KIAA0319 promoter, and the promoters of other genes that appear to influence reading and language. Analysis of this outcome will be somewhat empirical in terms of setting cluster boundaries and significance thresholds. It will depend on the frequency of the variants we observe, and on whether hits are in linkage disequilibrium (LD) with SNPs previously shown to be associated with reading, language, or cognition, have putative functional roles, or show evolutionary conservation.
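Because cluster boundaries will be set empirically, one simple starting point is gap-based clustering. Sorted variant positions are chained together while consecutive variants lie within a chosen distance, and only groups above a minimum size are kept. The sketch below illustrates that idea only; `cluster_positions`, `max_gap`, `min_variants`, and the example positions are hypothetical. Comparing cluster burden between the worst and best performers, with significance testing, would be a separate step.

```python
def cluster_positions(positions, max_gap, min_variants):
    """Group variant positions (bp) into clusters.

    Consecutive variants at most max_gap bp apart join the same cluster;
    clusters with fewer than min_variants variants are discarded.
    Returns a list of (start, end, count) tuples in positional order.
    """
    clusters = []
    current = []

    def flush():
        # Keep the current run only if it is large enough.
        if current and len(current) >= min_variants:
            clusters.append((current[0], current[-1], len(current)))

    for pos in sorted(positions):
        if current and pos - current[-1] > max_gap:
            flush()
            current.clear()
        current.append(pos)
    flush()
    return clusters


# Hypothetical variant positions (offsets within the locus, not real data).
variants = [9120, 100, 150, 180, 5000, 9000, 9050, 9100]
print(cluster_positions(variants, max_gap=100, min_variants=3))
```

With these inputs, the isolated variant at 5000 is dropped and two clusters remain: 100 to 180 (three variants) and 9000 to 9120 (four variants). Changing `max_gap` or `min_variants` moves the boundaries, which is exactly the empirical choice the analysis will have to make.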

Confounding Variables

A significant confounder for these experiments will be the heterogeneous genetic background of the subjects, and consequently of the lymphoblastoid cell lines, that we will use. We will try to minimize the effects of genetic background by restricting studies to subjects of European ancestry, by comparing outcomes from subjects with the same BV677278 genotypes, and by using population-structure metrics already calculated in genome-wide association studies completed in ALSPAC.

There are inherent biases in the techniques we have selected. Therefore, we intend to confirm all novel variants identified by next-generation sequencing with Sanger sequencing, and all putative physical interactions identified by ChIA-PET with chromosome conformation capture (3C). We will also use luciferase reporter experiments to confirm the function of novel putative regulatory elements identified by sequencing or the chromatin studies.

Materials Requested and Selection Criteria:

1) RNA: We request 5 micrograms (concentration ~50 nanograms/microliter) of RNA from each of 100 selected subjects for RNA-sequencing. Subject selection will be informed by our genotyping of the BV677278 element and by analysis of the microarray expression data (pending). The goal of subject selection is to identify subjects with BV677278 genotypes of interest, specifically alleles previously related to reading and language performance and the presence/absence of the BV677278 element (del/del and del/+), and to relate these genotypes to global gene expression.

2) Lymphoblastoid Cell Lines: We request 20 lymphoblastoid cell lines from selected subjects for in vivo chromatin studies. Subject selection will be informed by BV677278 genotype, particularly the presence/absence of the BV677278 element and risk alleles (e.g. alleles 5 and 6).

3) DNA: Depending on availability, we request 1-3 micrograms (concentration ~100 nanograms/microliter) of cell line-derived DNA from each of 1,000 selected subjects for deep sequencing of the 1.5 Mb DYX2 locus. Subject selection will be based on performance on selected language and reading tasks in ALSPAC at ages 7, 8, and 9 years. We will select subjects using an extreme-phenotypes approach: the 250 worst performers on reading tasks, the 250 worst performers on language tasks, and the 250 best performers on each of those same tasks.
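The requested masses and concentrations imply the per-sample volumes computed below. This small sketch assumes that each requested mass applies to each subject's sample; `volume_ul` is a hypothetical helper.

```python
def volume_ul(mass_ug: float, conc_ng_per_ul: float) -> float:
    """Microlitres of sample needed to deliver mass_ug micrograms at conc_ng_per_ul."""
    return mass_ug * 1000.0 / conc_ng_per_ul  # 1 microgram = 1,000 nanograms


print("RNA, 5 ug at 50 ng/uL:", volume_ul(5, 50), "uL")
print("DNA, 1-3 ug at 100 ng/uL:", volume_ul(1, 100), "-", volume_ul(3, 100), "uL")
```

At these concentrations, each RNA sample corresponds to about 100 µL and each DNA sample to about 10 to 30 µL.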

Performance Sites

All non-sequencing experiments will be performed in Dr. Gruen's lab at the Yale Child Health Research Center, including all chromatin studies (ChIA-PET and 3C), any required cell culture, DNA extraction, and PCR.

All sequencing (next-generation and conventional Sanger), sequence alignment, and annotation will be performed at the Yale Center for Genome Analysis and the W.M. Keck Foundation Biotechnology Resource Laboratory at Yale. Analysis of aligned and annotated sequence results will be performed in Dr. Gruen's lab at the Yale Child Health Research Center.

Security and Storage of Information and Materials

ALSPAC phenotype, genotype, and sequence information are stored only on networked, secured Yale desktop computers. No ALSPAC data are downloaded to computers, thumb drives, peripheral drives, or laptop computers that are not directly connected to the Yale network. No ALSPAC data may be shared with collaborators other than those approved by ALSPAC. In addition, no identifying information that could link phenotype, genotype, or sequence information to a specific ALSPAC subject is stored with these data.

RNA and DNA are stored in -70 °C freezers in Dr. Gruen's lab in the Yale Child Health Research Center (464 Congress Avenue). Doors to the lab are locked and accessible by key entry only. Access to the Yale Child Health Research Center is restricted to Yale employees, students, and post-docs who work in the Center. In addition, no identifying information that could link the material with a specific ALSPAC subject is recorded on the RNA/DNA tubes.

Lymphoblastoid cell lines (LCLs) are stored in liquid nitrogen freezers in the Yale Child Health Research Center. Active cultures are kept in CO2 incubators in dedicated facilities in the Center. Access to the Yale Child Health Research Center is restricted to Yale employees, students, and post-docs who work in the Center. In addition, no identifying information that could link the material with a specific ALSPAC subject is recorded on the cell culture flasks or storage vials.

Data Sharing

Per ALSPAC protocol, all data generated in the course of this research will be returned to the ALSPAC within 12 months of its generation.

Ethical Considerations

The main ethical concerns of this proposed research are 1) protection of subject privacy, 2) security and storage of information and materials, 3) serendipitous identification of potentially important health information, and 4) subject withdrawal from the study.

1) Protection of Subject Privacy

Per ALSPAC protocol, all materials and information from ALSPAC are stripped of any identifiers that could link them with a specific subject before being sent to collaborators. There is a theoretical possibility that whole-genome or whole-exome sequencing could identify an exceedingly rare phenotype that could be traced to a specific ALSPAC subject. However, we will not perform whole-genome or whole-exome sequencing on any subject DNA in this proposal. Furthermore, there are no known syndromes, rare or common, that could become known to us by deep sequencing the DYX2 locus. Therefore, the risk of violating subject privacy through this research is minimal.

2) Security and Storage of Information and Materials

All information and data obtained during the proposed studies will be stored on Yale-maintained servers with security maintained by Yale University networks. Please see details in the above security section.

3) Serendipitous Identification of Potentially Important Health Information

In the course of deep sequencing the 1.5 megabases (Mb) of the DYX2 dyslexia locus on 6p22, it is theoretically possible that we could identify a translocation suggesting a risk of malignancy. The 1.5 Mb we propose to sequence is well characterized in terms of coding regions and genes, and it does not contain any oncogenes, proto-oncogenes, or sequences previously described as causing malignancies. However, a recent report by Longoni et al. (2012) suggests that DCDC2 could be a novel oncogenic target of the ETS transcription factor ESE3/EHF. The authors found that DCDC2 was aberrantly expressed in 53 malignant prostate tumors but absent in 10 normal control prostates. They concluded that ESE3/EHF, which is expressed in normal prostate and frequently lost in prostate tumors, keeps DCDC2 repressed by binding to a newly identified ETS binding site in the DCDC2 promoter. The authors do not report that genomic rearrangements, mutations, or RNA splice variants involving DCDC2 caused prostate malignancies or contributed to drug resistance, but a translocation could theoretically relieve DCDC2 repression by separating DCDC2 from its regulatory element.

The risk of serendipitously finding a genomic rearrangement or mutation in the DYX2 locus that could increase the risk of malignancy is extremely small. However, should we find any genomic rearrangement, mutation, or splice variant in the course of deep sequencing DYX2 that could suggest even a small increase in the risk of malignancy, we will report it to ALSPAC within 6 months of identifying it. It will then be ALSPAC's responsibility to decide whether to share the information with the subject or to take any further action.

4) Subject Withdrawal from the Study

If ALSPAC notifies us that a donor-subject has withdrawn from the study, we will destroy any biomaterials we have received from that subject, including DNA, RNA, or lymphoblastoid cell lines. Because all information, genotypes, and variables are de-identified, we do not expect that these would need to be deleted; however, we will delete any data if directed to do so by ALSPAC. The Yale Human Investigation Committee will oversee this procedure.

Oversight of Protection of Human Subjects

All human subject research at Yale is under the oversight of the Yale Human Investigation Committee (HIC). Protocols describing protection of human subjects in detail must be reviewed and approved by the Yale HIC prior to applying for any funding, federal or private, or submission of results for publication. All prior research with the ALSPAC, including this proposal, has been submitted for review by the Yale HIC.

References:

Cho K, Frijters JC, Zhang H, Miller LL, Gruen JR. Prenatal exposure to nicotine and impaired reading performance. The Journal of Pediatrics, in press.

Eicher JD, Powers NR, Cho K, Miller LL, Mueller K, Ring SM, Tomblin JB, Gruen JR. Associations of prenatal nicotine exposure and the dopamine related genes ANKK1/DRD2 to verbal language, manuscript in preparation.

Longoni N, Kunderfranco P, Pellini S, Albino D, Mello-Grand M, Pinton S, D'Ambrosio G, Sarti M, Sessa F, Chiorino G, Catapano CV, Carbone GM. (2012) Aberrant expression of the neuronal-specific protein DCDC2 promotes malignant phenotypes and is associated with prostate cancer progression. Oncogene doi: 10.1038/onc.2012.245 (epub ahead of print)

Meng H, Smith SD, Hager K, Held M, Liu J, Olson RK, Pennington B, DeFries JC, Gelernter J, O'Reilly-Pol T, Somlo S, Skudlarski P, Shaywitz SE, Shaywitz BA, Marchione K, Wang Y, Paramasivam M, LoTurco JJ, Page GP, Gruen JR. DCDC2 is associated with reading disability and modulates neuronal development in the brain, Proc Natl Acad Sci USA, 102: 17053-17058, 2005. PMID 16278297

Meng H, Powers NR, Tang L, Cope NA, Zhang P-X, Fuleihan R, Gibson C, Page GP, Gruen JR. A dyslexia-associated variant in DCDC2 changes gene expression. Behavior Genetics 2011 Jan;41(1):58-66.

Powers NR, Eicher JD, Butter F, Kong Y, Miller LL, Ring SM, Mann M, Gruen JR. Alleles of a Rapidly-Evolving ETV6 Binding Site in DCDC2 Confer Risk of Reading and Language Impairment, submitted.

Date proposal received: 
Thursday, 25 October, 2012
Date proposal approved: 
Thursday, 25 October, 2012
Keywords: 
Genetics, Speech and Language, Speech & Language
Primary keyword: