ALSPAC OMICs Data Catalogue
2025-12-18 14:30:49
Introduction
Welcome to the ALSPAC Omics Catalogue, a guide to the omics data offered by ALSPAC. This catalogue features a variety of named ALSPAC datasets, each consisting of collected or produced data that has been organized, named, and curated for ease of use. Every named ALSPAC dataset comes with accompanying metadata that provides information about the dataset as a whole. Each named ALSPAC dataset has at least one release version that includes a curated selection of files detailed in the metadata sections.
Please note that these datasets are not generally accessible. Please see http://www.bristol.ac.uk/alspac/researchers/access/ for details on how to request access.
The information within this catalogue is made available for browsing to help both internal ALSPAC users and external researchers understand the data and facilitate prospective data requests.
As standard, we offer external collaborators “freezes” of specific named ALSPAC datasets. These freezes, along with their metadata, are outlined in this catalogue. External collaborators are granted access to these freezes once their request is approved. A freeze is a carefully selected subset of the data files within a version: it contains the core data from a dataset, with participants who have withdrawn consent removed and specific dataset IDs applied. Freezes are updated periodically.
Documentation for the current freeze of each dataset is presented below in the form of a YAML file, which lists the files external collaborators will receive along with their metadata.
Because withdrawn individuals are removed from the freezes, the number of participants in each dataset may change over time and may not match the numbers given in the Methodology fields.
Freeze 1 timing: July 2021 - Dec 2022
Freeze 2 timing: Dec 2022 - Dec 2023
Freeze 3 timing: Jan 2023 - Oct 2024
Freeze 4 timing: Oct 2024 - June 2025
Freeze 5 timing: June 2025 - Dec 2025
Freeze 6 timing: Dec 2025 - Current
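Each freeze document in this catalogue records an md5sum for every file in the freeze. The following is a minimal sketch of how a received file could be checked against that value, assuming Python's standard hashlib module. The function names and the throwaway demo file are hypothetical and are not part of any ALSPAC tooling.

```python
import hashlib
import os
import tempfile


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_freeze_doc(path, expected_md5):
    """Compare a file's digest with the md5sum listed in a freeze document."""
    return md5_of_file(path) == expected_md5.strip().lower()


# Demonstration on a throwaway file (hypothetical content, not ALSPAC data).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
    demo_path = tmp.name

demo_ok = matches_freeze_doc(demo_path, "5d41402abc4b2a76b9719d911017c592")
os.remove(demo_path)
print(demo_ok)
```

Reading in chunks keeps memory use flat, which matters for the multi-gigabyte files listed in some freezes.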
Genetic Array Data
Genome-wide - Illumina 550 quad - G1 (gwa_550_g1)
Description
This dataset contains genome-wide array genotype calls for G1 individuals.
Reference genome build: GRCh37
Methodology
ALSPAC children were genotyped using the Illumina HumanHap550 quad chip genotyping platform by 23andMe, subcontracting the Wellcome Trust Sanger Institute, Cambridge, UK and the Laboratory Corporation of America, Burlington, NC, US. The resulting raw genome-wide data were subjected to standard quality control methods. Individuals were excluded on the basis of gender mismatches; minimal or excessive heterozygosity; disproportionate levels of individual missingness (>3%) and insufficient sample replication (IBD < 0.8).
Population stratification was assessed by multidimensional scaling analysis and compared with Hapmap II (release 22) European descent (CEU), Han Chinese, Japanese and Yoruba reference populations; all individuals with non-European ancestry were removed.
SNPs with a minor allele frequency of < 1%, a call rate of < 95% or evidence for violations of Hardy-Weinberg equilibrium (P < 5E-7) were removed.
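The SNP-level thresholds above can be sketched as a simple per-SNP check. This is an illustration only, not the pipeline that was actually run: the function names and genotype-count inputs are hypothetical, and the Hardy-Weinberg P value is assumed to have been computed elsewhere (for example by PLINK).

```python
def snp_qc_metrics(n_aa, n_ab, n_bb, n_missing):
    """Return (minor allele frequency, call rate) from genotype counts."""
    n_called = n_aa + n_ab + n_bb
    n_total = n_called + n_missing
    if n_called == 0:
        return 0.0, 0.0
    freq_a = (2 * n_aa + n_ab) / (2 * n_called)
    maf = min(freq_a, 1.0 - freq_a)
    call_rate = n_called / n_total
    return maf, call_rate


def snp_passes_qc(n_aa, n_ab, n_bb, n_missing, hwe_p,
                  min_maf=0.01, min_call_rate=0.95, min_hwe_p=5e-7):
    """Keep a SNP unless MAF < 1%, call rate < 95% or HWE P < 5E-7."""
    maf, call_rate = snp_qc_metrics(n_aa, n_ab, n_bb, n_missing)
    return maf >= min_maf and call_rate >= min_call_rate and hwe_p >= min_hwe_p


# Hypothetical counts for illustration.
print(snp_passes_qc(8000, 180, 1, 41, hwe_p=0.5))   # common enough, well called
print(snp_passes_qc(8200, 20, 0, 2, hwe_p=0.5))     # MAF below 1%
```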
Associated publication:
- Horikoshi et al 2013 (https://doi.org/10.1038/ng.2477)
Freeze Docs
# This yaml file is a description of a freeze of a released version of a named alspac dataset
# It should conform to the schema https://github.com/alspac/alspac-data-catalogue-schema
id: alspacdcs:gwa_550_g1_2022-12-05_f6
name: >-
Genome-wide array data for G1 individuals 2022-12-05 freeze 6
description: >-
The sixth freeze of the genome-wide array data for G1 based on the 2022-12-05 release. The data is in plink format.
Contains .hh file, which is produced automatically when the input data contains heterozygous calls where they shouldn't be possible (haploid chromosomes, male X/Y), or there are nonmissing calls for nonmales on the Y chromosome. Consists of a text file with one line per error (sorted primarily by variant ID, secondarily by sample ID) with the following three fields:
1. Family ID
2. Within-family ID
3. Variant ID
freeze_size: 997M
linker_file_md5sum: 45415c7d4fae355b4fb2d6ccd042620d
woc_file_md5sum: 6c887db8c7dd10cc695630ca73b41405
all_individuals_to_exclude_md5sum: e4efce63f9f671548d08c8bb2f9cc4f7
git_tag: https://github.com/alspac/dataset_gwa_550_g1/releases/tag/freeze6
is_current_freeze: true
freeze_number: 6
freeze_date: 2025-09-30
previous_freeze: alspacdcs:gwa_550_g1_2022-12-05_f5
freeze_of_alspac_dataset_version: alspacdcs:gwa_550_g1_2022-12-05
freeze_of_named_alspac_dataset: alspacdcs:gwa_550_g1
contains:
- data
files: []
data:
contains:
- freeze_id.bed
- freeze_id.bim
- freeze_id.fam
- freeze_id.hh
- freeze_id.log
files:
- id: alspacdcs:19718249-e6b9-437a-89e6-f8023285ba85
name: freeze_id.bed
md5sum: c708b16229b4a9af9ddd2f98e34b2d39
filesize: 981.4MB
filetype: .bed
belongs_to: data
- id: alspacdcs:bd518596-7d72-4662-a351-2aabc4b2c816
name: freeze_id.bim
md5sum: 0be48a05ee0e98d0de8180ae658768b2
filesize: 13.4MB
filetype: .bim
number_of_variants: 500527
belongs_to: data
- id: alspacdcs:d25e6ef8-f6c0-4a29-b7f9-1cf9cf4139bb
name: freeze_id.fam
md5sum: 09847a7ba78db2da9fd6495a5d771c4f
filesize: 248.9KB
filetype: .fam
number_of_participants: 8222
belongs_to: data
- id: alspacdcs:34ad861a-4492-41d5-a527-17e673aa8196
name: freeze_id.hh
md5sum: 609e4e8b8fd7f660b853b3f99013c0a4
filesize: 1.6MB
filetype: .hh
belongs_to: data
- id: alspacdcs:bad942d2-bd9e-4862-817c-d7e000bad2e0
name: freeze_id.log
md5sum: af9ddd5af43a34e2acf38ecb99d8fe4b
filesize: 1.1KB
filetype: .log
belongs_to: data
Genome-wide - Illumina exome core array - G0 partners (gwa_exome_g0p)
Description
This dataset contains genome-wide array genotype calls for G0 mothers and partners.
Reference genome build: GRCh37
Methodology
3,453 ALSPAC mothers and fathers and 535,478 SNPs were genotyped using the Illumina HumanCoreExome chip genotyping platform by the ALSPAC lab and called using GenomeStudio. The resulting raw genome-wide data were subjected to standard quality control methods using PLINK (v1.07). Individuals were excluded on the basis of gender mismatches (n = 80); minimal or excessive heterozygosity (n = 64); disproportionate levels of individual missingness (>5%, n = 60) and possible contamination (n = 3).
Population stratification was assessed by multidimensional scaling analysis and principal component analysis, with comparison against 1000 Genomes phase 3 data; all individuals with non-European ancestry were removed (n = 266). Cryptic relatedness was measured as SNP relatedness in GCTA (relatedness > 0.1, n = 69 removed). SNPs with a call rate of < 95% or evidence for violations of Hardy-Weinberg equilibrium (P < 1E-7) and those which failed GenomeStudio quality control measures were removed (n = 21,298). 6,594 duplicate SNPs were also removed. This resulted in genotypes for 2,911 unrelated mothers and fathers at 507,586 SNPs. We then identified 2217 samples where the ALN assigned historically by the lab matched the genetically assigned ALN.
1737 putative G0 partner-G1 pairs for whom both G0 partner and G1 have called genotype data available were identified based on ALN. Given the G0 partners were invited by the G0 mother to take part and only enrolled in the study in their own right several years later, it could not be assumed that all G0 partners were biologically related to G1. Called genotype data for the 1720 unique G0 partners and 1737 unique G1s were merged (i.e. there were 17 pairs of siblings/twins among the G1 offspring), using plink v1.90b7.2 64-bit (11 Dec 2023).
After application of the plink filters --geno 0.05, --maf 0.01, --snps-only just-acgt and --autosome, 113288 SNPs remained. The --related command in KING version 2.3.2 was used to perform kinship analysis, which confirmed that all 1737 putative G0 partner-G1 pairs are genetically related. The relatedness observed is as would be expected for biological father-offspring pairs, using the inference criteria described in Table 1 of Manichaikul, Ani, et al. “Robust relationship inference in genome-wide association studies.” Bioinformatics 26.22 (2010): 2867-2873.
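KING reports an estimated kinship coefficient for each pair. As a rough illustration of how such coefficients map to degrees of relatedness, the sketch below uses the kinship cut-offs commonly quoted from Manichaikul et al. (2010), which are successive powers of 2^(-1/2). The function name is hypothetical, and this is not the code used to produce the dataset. Separating parent-offspring pairs from full siblings within the first-degree band needs additional IBD/IBS0 information, which this sketch does not attempt.

```python
def relationship_degree(kinship):
    """Classify a pair by its estimated kinship coefficient."""
    # Cut-offs are 2**-1.5, 2**-2.5, 2**-3.5 and 2**-4.5
    # (approximately 0.354, 0.177, 0.0884 and 0.0442).
    if kinship > 2 ** -1.5:
        return "duplicate/MZ twin"
    if kinship > 2 ** -2.5:
        return "first-degree"
    if kinship > 2 ** -3.5:
        return "second-degree"
    if kinship > 2 ** -4.5:
        return "third-degree"
    return "unrelated"


# The expected kinship for a parent-offspring pair is 0.25.
print(relationship_degree(0.25))
```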
Freeze Docs
# This yaml file is a description of a freeze of a released version of a named alspac dataset
# It should conform to the schema https://github.com/alspac/alspac-data-catalogue-schema
id: alspacdcs:gwa_exome_g0p_2016-11-22_f6
name: Freeze 6 version 2016-11-22 Genome-wide - Illumina exome core array - G0 partners
description: >-
Freeze 6 version 2016-11-22 Genome-wide array data including genotype calls for G0 partners, including additional G0 mothers who were absent from previous genotyping rounds.
Data in plink format, including .hh file, which is produced automatically when the input data contains heterozygous calls where they shouldn't be possible (haploid chromosomes, male X/Y), or there are nonmissing calls for nonmales on the Y chromosome. Consists of a text file with one line per error (sorted primarily by variant ID, secondarily by sample ID) with the following three fields:
1. Family ID
2. Within-family ID
3. Variant ID
freeze_size: 281M
linker_file_md5sum: 45415c7d4fae355b4fb2d6ccd042620d
woc_file_md5sum: 6c887db8c7dd10cc695630ca73b41405
all_individuals_to_exclude_md5sum: e4efce63f9f671548d08c8bb2f9cc4f7
git_tag: https://github.com/alspac/dataset_gwa_exome_g0p/releases/tag/freeze6
is_current_freeze: true
freeze_number: 6
freeze_date: 2025-09-30
previous_freeze: alspacdcs:gwa_exome_g0p_2016-11-22_f5
freeze_of_alspac_dataset_version: alspacdcs:gwa_exome_g0p_2016-11-22
freeze_of_named_alspac_dataset: alspacdcs:gwa_exome_g0p
contains:
- data
files: []
data:
contains:
- freeze_id.bed
- freeze_id.bim
- freeze_id.fam
- freeze_id.hh
- freeze_id.log
files:
- id: alspacdcs:ec23e114-dc37-440f-a7cc-fca07375ccad
name: freeze_id.bed
md5sum: 304b0d356880c5174806ce08d7beffd3
filesize: 266.2MB
filetype: .bed
belongs_to: data
- id: alspacdcs:76ea8622-c436-4e75-a8d2-2c5b2bfd0d2c
name: freeze_id.bim
md5sum: 0fe43f888776059fef0a76d3f08d00ad
filesize: 13.9MB
filetype: .bim
number_of_variants: 507586
belongs_to: data
- id: alspacdcs:0e23f056-c8a1-4ccb-8f41-ffae41613be2
name: freeze_id.fam
md5sum: 5145c717970e73ceaa7268ce00a7ea15
filesize: 122.3KB
filetype: .fam
number_of_participants: 2198
belongs_to: data
- id: alspacdcs:e5d619e3-7202-4d90-92f4-6a37e47bfe39
name: freeze_id.hh
md5sum: cfa6a113c8f8e54c4e5d4b69e8a31fa9
filesize: 115.3KB
filetype: .hh
belongs_to: data
- id: alspacdcs:773da93d-79bf-464c-a4d6-e3d35d9398f1
name: freeze_id.log
md5sum: a1ab2605df887f103555e24b11bb2545
filesize: 1.1KB
filetype: .log
belongs_to: data
Genome-wide - Illumina 660 quad - G0 mothers (gwa_660_g0m)
Description
This dataset contains genome-wide array data including raw files and genotype calls for G0 mothers.
Legacy 1 reference genome: GRCh36
Legacy 2 reference genome: GRCh37
Methodology
ALSPAC mothers were genotyped using the Illumina human660W-quad array at Centre National de Génotypage (CNG) and genotypes were called with Illumina GenomeStudio. PLINK (v1.07) was used to carry out quality control measures on an initial set of 10,015 subjects and 557,124 directly genotyped SNPs.
SNPs were removed if they displayed more than 5% missingness or a Hardy-Weinberg equilibrium P value of less than 1.0e-06. Additionally SNPs with a minor allele frequency of less than 1% were removed. Samples were excluded if they displayed more than 5% missingness, had indeterminate X chromosome heterozygosity or extreme autosomal heterozygosity. Samples showing evidence of population stratification were identified by multidimensional scaling of genome-wide identity by state pairwise distances using the four HapMap populations as a reference, and then excluded.
Cryptic relatedness was assessed using an IBD estimate of more than 0.125, which is expected to correspond to roughly 12.5% of alleles shared IBD, or relatedness at the first cousin level. Related subjects that passed all other quality control thresholds were retained. In total, 9,048 subjects and 526,688 SNPs passed these quality control filters.
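The 12.5% figure for first cousins follows from the standard path-counting rule for expected IBD sharing: each path of meioses linking two relatives through a common ancestor contributes (1/2) raised to the path length. A minimal worked sketch, assuming a non-inbred pedigree (the function name is hypothetical):

```python
from fractions import Fraction


def expected_ibd_sharing(path_lengths):
    """Sum (1/2)**length over each path linking two relatives via a common ancestor."""
    return sum(Fraction(1, 2) ** length for length in path_lengths)


# First cousins share two grandparents; each path spans four meioses.
first_cousins = expected_ibd_sharing([4, 4])
# Full siblings share two parents; each path spans two meioses.
full_siblings = expected_ibd_sharing([2, 2])

print(first_cousins, float(first_cousins))
print(full_siblings)
```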
Associated publication:
- Rietveld et al 2013 (https://doi.org/10.1126/science.1235488)
Freeze Docs
# This yaml file is a description of a freeze of a released version of a named alspac dataset
# It should conform to the schema https://github.com/alspac/alspac-data-catalogue-schema
id: alspacdcs:gwa_660_g0m_2022-12-05_f6
name: Freeze 6 version 2022-12-05 Genome-wide - Illumina 660 quad - G0 mothers
description: >-
Freeze 6 of genome-wide array data including genotype calls for G0 mothers.
Contains 2 sets of data, legacy1 and legacy2.
legacy1: A dir/folder containing the plink data files.
Includes full set of SNPs (aligned to hg36), but missing ~500 mothers who
were excluded in legacy QC due to strict relatedness inclusion thresholds.
legacy2: A dir/folder containing the plink data files
Includes full set of individuals but due to legacy QC is restricted
to a set of ~480k SNPs that overlap with the Illumina 550k array
(which was used for G1 in gwa_550_g1). This QC was performed alongside liftOver to Hg37.
freeze_size: 2G
linker_file_md5sum: 45415c7d4fae355b4fb2d6ccd042620d
woc_file_md5sum: 6c887db8c7dd10cc695630ca73b41405
all_individuals_to_exclude_md5sum: e4efce63f9f671548d08c8bb2f9cc4f7
git_tag: https://github.com/alspac/dataset_gwa_660_g0m/releases/tag/freeze6
is_current_freeze: true
freeze_number: 6
freeze_date: 2025-09-30
freeze_of_alspac_dataset_version: alspacdcs:gwa_660_g0m_2022-12-05
freeze_of_named_alspac_dataset: alspacdcs:gwa_660_g0m
contains:
- data
files: []
data:
contains:
- legacy2
- legacy1
files: []
legacy2:
contains:
- freeze_id.bed
- freeze_id.bim
- freeze_id.fam
- freeze_id.log
files:
- id: alspacdcs:321df196-491b-4cdf-99a1-7ff40882c242
name: freeze_id.bed
md5sum: 7559903a4811210f6289497e1323dfe7
filesize: 960.3MB
filetype: .bed
belongs_to: data/legacy2
- id: alspacdcs:0843af40-8fea-4265-9870-cd492fab06cd
name: freeze_id.bim
md5sum: b4a1adb225de05d92d0af585950fd423
filesize: 12.3MB
filetype: .bim
number_of_variants: 465740
belongs_to: data/legacy2
- id: alspacdcs:f7bb4abb-2e0f-4809-9449-53cb2e35659c
name: freeze_id.fam
md5sum: 4f3c4043ebed461f5b1272b5ab8579ea
filesize: 447.6KB
filetype: .fam
number_of_participants: 8648
belongs_to: data/legacy2
- id: alspacdcs:d28b0471-5e01-49ea-9107-37c92f06a29c
name: freeze_id.log
md5sum: 59864532d578c1ba0fdf3aa95022510b
filesize: 981.0B
filetype: .log
belongs_to: data/legacy2
legacy1:
contains:
- freeze_id.bed
- freeze_id.bim
- freeze_id.fam
- freeze_id.log
files:
- id: alspacdcs:375a4bd0-df16-45e1-878d-aef46db50e8b
name: freeze_id.bed
md5sum: be66d3cc1d3d906c4d396cc161a605b1
filesize: 1019.6MB
filetype: .bed
belongs_to: data/legacy1
- id: alspacdcs:daf916d2-8f0b-4113-afa9-d1d420b9d894
name: freeze_id.bim
md5sum: 88b8c2221ef4ddc03118042db70d8575
filesize: 14.0MB
filetype: .bim
number_of_variants: 526688
belongs_to: data/legacy1
- id: alspacdcs:e2152aba-48db-4ca8-96f3-86a6ec257448
name: freeze_id.fam
md5sum: c97e448b8ae0bf29c1ca609a4719d05b
filesize: 253.7KB
filetype: .fam
number_of_participants: 8118
belongs_to: data/legacy1
- id: alspacdcs:b1cad595-044c-4b34-922b-3ed8095ae628
name: freeze_id.log
md5sum: 69b9dd28928c7adfb0459e7c3f07a0f0
filesize: 981.0B
filetype: .log
belongs_to: data/legacy1
Genome-wide - CNV - G1 (cnv_550_g1)
Description
This dataset contains predicted ALSPAC CNVs using PennCNV, generated from 23andMe raw genotype data.
Methodology
Log R Ratio (LRR) and B Allele Frequency (BAF) data were missing from the 23andMe raw genotype data, so we generated these values ourselves using an in-house algorithm. Once these data were generated, we ran PennCNV using the hh550 libraries.
The PennCNV calls were then filtered. Multiple calls were merged using the ‘clean_cnv.pl’ script, using a merge fraction of 0.5. Individuals with > 30 CNVs, a Log R Ratio SD of > 0.3, a BAF drift of > 0.002, and a waviness factor of > 0.05 were removed. CNVs in which at least 50% of the length of the CNV call overlapped with any telomeric, centromeric or immunoglobulin region were removed using the ‘scan_region.pl’ script in PennCNV.
In addition, CNVs covering fewer than 5 probes, of a length < 5kb, and with a confidence score below 10 were removed. Density was calculated as the number of probes in a CNV divided by the length of the CNV, and CNVs where the density of probes across the call was < 1 probe per 20kb were removed.
These QC parameters are suggestions only, and the resulting calls are provided in filtered.cnv. Analysts can apply their own filter parameters to the raw calls in data.cnv.
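For analysts who want to apply their own thresholds to the raw calls, the filters described above could be sketched as follows. This is an illustration under stated assumptions, not the script used to produce the filtered file: the function and argument names are hypothetical, and each listed criterion is treated as independently sufficient for removal.

```python
def sample_passes(n_cnvs, lrr_sd, baf_drift, waviness,
                  max_cnvs=30, max_lrr_sd=0.3,
                  max_baf_drift=0.002, max_waviness=0.05):
    """Sample-level check: fail if any metric exceeds its limit."""
    return (n_cnvs <= max_cnvs and lrr_sd <= max_lrr_sd
            and baf_drift <= max_baf_drift and waviness <= max_waviness)


def cnv_call_passes(n_probes, length_bp, confidence,
                    min_probes=5, min_length_bp=5_000,
                    min_confidence=10.0, min_density_per_bp=1 / 20_000):
    """Call-level check on probe count, length, confidence and probe density."""
    if n_probes < min_probes:
        return False
    if length_bp < min_length_bp:
        return False
    if confidence < min_confidence:
        return False
    # Density is the number of probes divided by the CNV length.
    density = n_probes / length_bp
    return density >= min_density_per_bp


# Hypothetical calls for illustration.
print(cnv_call_passes(n_probes=10, length_bp=50_000, confidence=25.0))
print(cnv_call_passes(n_probes=5, length_bp=200_000, confidence=25.0))
```

Keeping the thresholds as keyword arguments makes it straightforward to try stricter or looser settings on the unfiltered calls.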
Freeze Docs
# This yaml file is a description of a freeze of a released version of a named alspac dataset
# It should conform to the schema https://github.com/alspac/alspac-data-catalogue-schema
id: alspacdcs:cnv_550_g1_2015-11-09_f6
name: Genome-wide - CNV - G1 release version 2015-11-09 freeze 6
description: >-
This is the sixth freeze of the 2015-11-09 version of
cnv_550_g1 dataset.
It contains two csv versions of the CNV call data, the unfiltered
and filtered versions.
freeze_size: 27m
linker_file_md5sum: a8b3ed028e1a22a41e428612a62bc7c9
woc_file_md5sum: 163b7668b82ec7e5e6b7e35aecbbb473
all_individuals_to_exclude_md5sum: e551ddec737da29e25fc8d3119989a6a
git_tag: https://github.com/alspac/dataset_cnv_550_g1/releases/tag/freeze6
is_current_freeze: true
freeze_number: 6
freeze_date: 2025-09-30
previous_freeze: alspacdcs:cnv_550_g1_2015-11-09_f5
freeze_of_alspac_dataset_version: alspacdcs:cnv_550_g1_2015-11-09
freeze_of_named_alspac_dataset: alspacdcs:cnv_550_g1
contains:
- data
files: []
data:
contains:
- new_filtered.csv
- new_cnvdata.csv
files:
- id: alspacdcs:50895b7b-f20f-4d6d-97ae-a1bc18f1d393
name: new_filtered.csv
md5sum: aeb36ef5266f890bfecff3448325da8c
description: >-
CNV data that has been filtered.
columns
V1 - Position
V2 - Number of markers in the region
V3 - CNV length
V4 - Copy number estimate
V6 - Start SNP
V7 - End SNP
V8 - Confidence score
qlet - within pregnancy ID
cnv_550_g1 - pregnancy ID
number_of_participants: 6791 #data$id_qlet <- paste(data$cnv_550_g1, data$qlet, sep="_")
#length(unique(data$id_qlet))
number_of_cnv_variants: 14242 # Read file into R as data then:
# dim(unique(data[1]))
filesize: 5.9MB
filetype: .csv
belongs_to: data
- id: alspacdcs:45549969-94fc-4e76-91d1-3a12e750d380
name: new_cnvdata.csv
md5sum: 3bf366db747ed456613b100566bbd9a8
description: >-
This is the output of PennCNV before filtering.
columns
V1 - Position
V2 - Number of markers in the region
V3 - CNV length
V4 - Copy number estimate
V6 - Start SNP
V7 - End SNP
V8 - Confidence score
qlet - within pregnancy ID
cnv_550_g1 - pregnancy ID
number_of_participants: 7448 #data$id_qlet <- paste(data$cnv_550_g1, data$qlet, sep="_")
#length(unique(data$id_qlet))
number_of_cnv_variants: 70025 # Read file into R as data then:
# dim(unique(data[1]))
filesize: 20.8MB
filetype: .csv
belongs_to: data
Imputed Data
Genome-wide - HRC imputed - G0 mothers + G1 (gi_hrc_g0m_g1)
Description
This dataset contains genotype data imputed to HRC for G0 mothers and G1.
Reference genome build: GRCh37
Methodology
ALSPAC children were genotyped using the Illumina HumanHap550 quad chip genotyping platform by 23andMe, subcontracting the Wellcome Trust Sanger Institute, Cambridge, UK and the Laboratory Corporation of America, Burlington, NC, US. The resulting raw genome-wide data were subjected to standard quality control methods. Individuals were excluded on the basis of gender mismatches; minimal or excessive heterozygosity; disproportionate levels of individual missingness (>3%) and insufficient sample replication (IBD < 0.8).
Population stratification was assessed by multidimensional scaling analysis and compared with Hapmap II (release 22) European descent (CEU), Han Chinese, Japanese and Yoruba reference populations; all individuals with non-European ancestry were removed.
SNPs with a minor allele frequency of < 1%, a call rate of < 95% or evidence for violations of Hardy-Weinberg equilibrium (P < 5E-7) were removed.
Related subjects that passed all other quality control thresholds were retained during subsequent phasing and imputation. 9,115 subjects and 500,527 SNPs passed these quality control filters.
ALSPAC mothers were genotyped using the Illumina human660W-quad array at Centre National de Génotypage (CNG) and genotypes were called with Illumina GenomeStudio. PLINK (v1.07) was used to carry out quality control measures on an initial set of 10,015 subjects and 557,124 directly genotyped SNPs. SNPs were removed if they displayed more than 5% missingness or a Hardy-Weinberg equilibrium P value of less than 1.0e-06. Additionally SNPs with a minor allele frequency of less than 1% were removed.
Samples were excluded if they displayed more than 5% missingness, had indeterminate X chromosome heterozygosity or extreme autosomal heterozygosity. Samples showing evidence of population stratification were identified by multidimensional scaling of genome-wide identity by state pairwise distances using the four HapMap populations as a reference, and then excluded.
9,048 subjects and 526,688 SNPs passed these quality control filters.
We combined 477,482 SNP genotypes in common between the sample of mothers and sample of children. We removed SNPs with genotype missingness above 1% due to poor quality (11,396 SNPs removed) and removed a further 321 subjects due to potential ID mismatches. This resulted in a dataset of 17,842 subjects containing 6,305 duos and 465,740 SNPs (112 were removed during liftover and 234 were out of HWE after combination). We estimated haplotypes using ShapeIT (v2.r644) which utilises relatedness during phasing. The phased haplotypes were then imputed to the Haplotype Reference Consortium (HRCr1.1, 2016) panel of approximately 31,000 phased whole genomes. The HRC panel was phased using ShapeIt v2.r727, and the imputation was performed using the Michigan imputation server.
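The SNP counts quoted in this paragraph can be reconciled with a short calculation, using only the figures given above (the variable names are illustrative):

```python
snps_in_common = 477_482       # SNPs shared by the mothers' and children's arrays
removed_missingness = 11_396   # genotype missingness above 1%
removed_liftover = 112         # lost during liftover
removed_hwe = 234              # out of HWE after combination

remaining = snps_in_common - removed_missingness - removed_liftover - removed_hwe
print(remaining)  # 465740
```

The result matches the 465,740 SNPs reported for the combined dataset.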
Freeze Docs
# This yaml file is a description of a freeze of a released version of a named alspac dataset
# It should conform to the schema https://github.com/alspac/alspac-data-catalogue-schema
id: alspacdcs:gi_hrc_g0m_g1_2017-05-04_f6
name: >-
Genome-wide - HRC imputed - G0 mothers + G1 version 2017-05-04 freeze 6
description: >-
Freeze 6 of version 2017-05-04 Genome-wide array data imputed to the HRC reference panel for G0 mothers and G1 individuals in bgen and sample file format (version 1.2).
freeze_size: 114G
linker_file_md5sum: 45415c7d4fae355b4fb2d6ccd042620d
woc_file_md5sum: 6c887db8c7dd10cc695630ca73b41405
all_individuals_to_exclude_md5sum: e4efce63f9f671548d08c8bb2f9cc4f7
git_tag: https://github.com/alspac/dataset_gi_hrc_g0m_g1/releases/tag/freeze6
is_current_freeze: true
freeze_number: 6
freeze_date: 2025-09-30
previous_freeze: alspacdcs:gi_hrc_g0m_g1_2017-05-04_f5
freeze_of_alspac_dataset_version: alspacdcs:gi_hrc_g0m_g1_2017-05-04
freeze_of_named_alspac_dataset: alspacdcs:gi_hrc_g0m_g1
contains:
- data
files: []
data:
contains:
- filtered_21.bgen
- filtered_22.bgen
- filtered_23male.bgen
- filtered_20.bgen
- filtered_19.bgen
- filtered_18.bgen
- filtered_15.bgen
- filtered_17.bgen
- filtered_14.bgen
- filtered_23female.bgen
- filtered_16.bgen
- filtered_13.bgen
- filtered_09.bgen
- filtered_12.bgen
- filtered_10.bgen
- filtered_11.bgen
- filtered_08.bgen
- filtered_07.bgen
- filtered_06.bgen
- filtered_05.bgen
- filtered_03.bgen
- filtered_04.bgen
- filtered_01.bgen
- filtered_02.bgen
- swapped_23_male.sample
- swapped_23_female.sample
- swapped.sample
files:
- id: alspacdcs:c7184af4-0945-445e-b8f1-9343d2797971
name: filtered_21.bgen
md5sum: d944952cd7ae62525a0b2902306b0371
filesize: 1.7GB
filetype: .bgen
number_of_variants: 531276
number_of_participants: 17443
belongs_to: data
- id: alspacdcs:8991c75e-ddca-4ed8-acd9-b1cc8e52b465
name: filtered_22.bgen
md5sum: e6e35fc7bb4af26579e86117182a867b
filesize: 1.8GB
filetype: .bgen
number_of_variants: 524544
number_of_participants: 17443
belongs_to: data
- id: alspacdcs:505ca2f7-01dc-4780-807d-32238508e3d1
name: filtered_23male.bgen
md5sum: 0a865f7f362a08741f62980790ea00d1
filesize: 1.2GB
filetype: .bgen
number_of_variants: 1228035
number_of_participants: 4500
belongs_to: data
- id: alspacdcs:56d0823d-e769-4547-be13-4c48d2b69897
name: filtered_20.bgen
md5sum: 91cd660fb3febebc4acd427f69ce9b76
filesize: 2.6GB
filetype: .bgen
number_of_variants: 884983
number_of_participants: 17443
belongs_to: data
- id: alspacdcs:24ba7793-a9e5-447c-81b3-37051ac9e9e8
name: filtered_19.bgen
md5sum: 395c180a38fc6b3982ec31a2b870e520
filesize: 3.4GB
filetype: .bgen
number_of_variants: 868554
number_of_participants: 17443
belongs_to: data
- id: alspacdcs:bec2e8f8-1f53-42a1-ae88-92866d0e6288
name: filtered_18.bgen
md5sum: 8b07da40891cd7d6d140402e11ccc450
filesize: 3.1GB
filetype: .bgen
number_of_variants: 1104755
number_of_participants: 17443
belongs_to: data
- id: alspacdcs:d855abcd-9b51-49ca-a47d-d6462f41315b
name: filtered_15.bgen
md5sum: 949e7d7fd1db2ae89ef659957967e03e
filesize: 3.4GB
filetype: .bgen
number_of_variants: 1139215
number_of_participants: 17443
belongs_to: data
- id: alspacdcs:b7170945-432a-4a9a-9e3b-17f629463fa3
name: filtered_17.bgen
md5sum: b405b7608ef1189ce700fd8fe1df096e
filesize: 3.6GB
filetype: .bgen
number_of_variants: 1090072
number_of_participants: 17443
belongs_to: data
- id: alspacdcs:ad4b422c-8473-4593-b683-fdc862642761
name: filtered_14.bgen
md5sum: 0c20c36da72c89c59204a344b66d3758
filesize: 3.5GB
filetype: .bgen
number_of_variants: 1266536
number_of_participants: 17443
belongs_to: data
- id: alspacdcs:82972f35-103d-45f8-ad02-f3c4a5194c86
name: filtered_23female.bgen
md5sum: d4abdc0d84bda1f8a3eec5c9cee8977b
filesize: 4.2GB
filetype: .bgen
number_of_variants: 1228035
number_of_participants: 12943
belongs_to: data
- id: alspacdcs:a0b692db-9a88-4ef2-b73b-dbce7161257d
name: filtered_16.bgen
md5sum: d941572511b3377843eb6a8aefe79a34
filesize: 4.1GB
filetype: .bgen
number_of_variants: 1281298
number_of_participants: 17443
belongs_to: data
- id: alspacdcs:3043d7b8-e1e4-4c57-a130-d65a654e4062
name: filtered_13.bgen
md5sum: 6c7efae02d5581e86db400aff3213b8f
filesize: 3.7GB
filetype: .bgen
number_of_variants: 1385434
number_of_participants: 17443
belongs_to: data
- id: alspacdcs:55c6a805-bdbe-4436-9cd5-f047ce051900
name: filtered_09.bgen
md5sum: 0cc5973a41ede08fa4afe3958c4972c6
filesize: 4.5GB
filetype: .bgen
number_of_variants: 1675899
number_of_participants: 17443
belongs_to: data
- id: alspacdcs:754f61a5-e376-4755-b4d4-66c5a6ba2188
name: filtered_12.bgen
md5sum: 016275752dfe39b2b921f82e476420bf
filesize: 5.1GB
filetype: .bgen
number_of_variants: 1848118
number_of_participants: 17443
belongs_to: data
- id: alspacdcs:89eb6efe-438d-42ce-992a-644c8edcdc44
name: filtered_10.bgen
md5sum: a44c8a763298e2fb94efccb35d21cf5c
filesize: 5.1GB
filetype: .bgen
number_of_variants: 1927504
number_of_participants: 17443
belongs_to: data
- id: alspacdcs:f42cc063-672e-4299-9af5-0c98ec7382eb
name: filtered_11.bgen
md5sum: b1fee7aa390f3c52f4884a2fb5f7196a
filesize: 5.2GB
filetype: .bgen
number_of_variants: 1936990
number_of_participants: 17443
belongs_to: data
- id: alspacdcs:afdcb88c-111d-4b2c-9485-cb66eed0f392
name: filtered_08.bgen
md5sum: 3403e27325bbf785589498e42f4536dd
filesize: 5.7GB
filetype: .bgen
number_of_variants: 2242706
number_of_participants: 17443
belongs_to: data
- id: alspacdcs:1ce1ee18-6715-4ea5-8721-2c07334e001c
name: filtered_07.bgen
md5sum: 68fa282857de492d9a36ad0e5d045ed9
filesize: 6.6GB
filetype: .bgen
number_of_variants: 2289306
number_of_participants: 17443
belongs_to: data
- id: alspacdcs:185f7830-7af7-4808-87ee-43dd1ae05b2c
name: filtered_06.bgen
md5sum: 74b1b0d1e46662f1b0216a9e67c42f54
filesize: 6.3GB
filetype: .bgen
number_of_variants: 2460112
number_of_participants: 17443
belongs_to: data
- id: alspacdcs:919fe0bb-9bb6-4d1e-8dbc-bbf6ea76d994
name: filtered_05.bgen
md5sum: 90c9d5589d86b9611cc6b16da239fd36
filesize: 6.7GB
filetype: .bgen
number_of_variants: 2588170
number_of_participants: 17443
belongs_to: data
- id: alspacdcs:31cb65a9-320e-4db7-a243-af3547f269f9
name: filtered_03.bgen
md5sum: de1fe1cd4acd6b25430310c9e849caaa
filesize: 7.3GB
filetype: .bgen
number_of_variants: 2821895
number_of_participants: 17443
belongs_to: data
- id: alspacdcs:f1b8b2f7-d0a5-46e0-8f31-a54bfd84066d
name: filtered_04.bgen
md5sum: 7e0625019c52c820202127cf0edd4e63
filesize: 7.9GB
filetype: .bgen
number_of_variants: 2787582
number_of_participants: 17443
belongs_to: data
- id: alspacdcs:b03079de-2c94-45e6-a7b3-d5b18267bac1
name: filtered_01.bgen
md5sum: 99bcd042d88989e303d6425e0a82f27d
filesize: 8.6GB
filetype: .bgen
number_of_variants: 3069932
number_of_participants: 17443
belongs_to: data
- id: alspacdcs:08a494f3-84fc-455a-a2a0-4bdd0efe8b3b
name: filtered_02.bgen
md5sum: f50b3709b381b89a571468133a954f38
filesize: 8.7GB
filetype: .bgen
number_of_variants: 3392238
number_of_participants: 17443
belongs_to: data
- id: alspacdcs:18f8763d-b876-4648-8946-e03d564d5a6e
name: swapped_23_male.sample
md5sum: 48c1a0e6ab8f3c7a22662957b69646dd
filesize: 259.3KB
filetype: .sample
number_of_participants: 4500
belongs_to: data
- id: alspacdcs:f9603ac1-c20b-4824-9608-15acaac5769d
name: swapped_23_female.sample
md5sum: 77adf7126efae7f70c74c32abac67679
filesize: 745.8KB
filetype: .sample
number_of_participants: 12943
belongs_to: data
- id: alspacdcs:015eadb8-9b0c-4f4d-8206-bb715f0a3c03
name: swapped.sample
md5sum: 83772169b3ae48a868b30615c69804a6
filesize: 1005.1KB
filetype: .sample
number_of_participants: 17443
belongs_to: data
Genome-wide - HapMap2 imputed - G1 (gi_hapmap2_g1)
Description
This dataset contains genotype data imputed to HapMap 2 for G1.
Reference genome build: GRCh36
Methodology
A total of 9912 subjects were genotyped using the Illumina HumanHap550 quad genome-wide SNP genotyping platform by 23andMe, subcontracting the Wellcome Trust Sanger Institute, Cambridge, UK and the Laboratory Corporation of America, Burlington, NC, USA.
Individuals were excluded from further analysis on the basis of having incorrect gender assignments; minimal or excessive heterozygosity (<0.320 or >0.345 for the Sanger data and <0.310 or >0.330 for the LabCorp data); disproportionate levels of individual missingness (>3%); evidence of cryptic relatedness (>10% IBD); or being of non-European ancestry, as detected by a multidimensional scaling analysis seeded with HapMap 2 individuals. EIGENSTRAT analysis revealed no additional obvious population stratification, and genome-wide analyses with other phenotypes indicate a low lambda. The resulting data set consisted of 8365 individuals (84% of those genotyped).
SNPs with a minor allele frequency of <1% or a call rate of <95% were removed. Furthermore, only SNPs which passed an exact test of Hardy-Weinberg equilibrium (P > 5E-7) were considered for analysis. Genotypes were subsequently imputed with MACH 1.0.16 Markov Chain Haplotyping software, using CEPH individuals from phase 2 of the HapMap project as a reference set (release 22).
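The heterozygosity bounds differ by genotyping laboratory, so any re-implementation needs a per-lab lookup. A minimal sketch, assuming a per-sample heterozygosity rate has already been computed; the dictionary keys and function name are hypothetical. The last lines check the retained fraction quoted above.

```python
# Acceptable heterozygosity range for each genotyping laboratory.
HET_BOUNDS = {
    "Sanger": (0.320, 0.345),
    "LabCorp": (0.310, 0.330),
}


def heterozygosity_ok(lab, het_rate):
    """Return True if the rate lies within the bounds for the given lab."""
    low, high = HET_BOUNDS[lab]
    return low <= het_rate <= high


# The same rate can pass for one lab and fail for the other.
print(heterozygosity_ok("Sanger", 0.335))
print(heterozygosity_ok("LabCorp", 0.335))

# Retained fraction: 8365 of the 9912 genotyped subjects.
retained_percent = round(100 * 8365 / 9912)
print(retained_percent)  # 84
```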
Associated publication:
- https://doi.org/10.1093/hmg/ddr309
Freeze Docs
# This yaml file is a description of a freeze of a released version of a named alspac dataset
# It should conform to the schema https://github.com/alspac/alspac-data-catalogue-schema
id: alspacdcs:gi_hapmap2_g1_2022-12-07_f6
name: Genome-wide - HapMap2 imputed - G1 version 2022-12-07 freeze 6
description: >-
Freeze 6 of 2022-12-07 version of Genome-wide array data imputed to the HapMap2 reference panel for G1 individuals.
In standard PLINK format; see https://www.cog-genomics.org/plink/1.9/formats for further information.
freeze_size: 5G
linker_file_md5sum: 45415c7d4fae355b4fb2d6ccd042620d
woc_file_md5sum: 6c887db8c7dd10cc695630ca73b41405
all_individuals_to_exclude_md5sum: e4efce63f9f671548d08c8bb2f9cc4f7
git_tag: https://github.com/alspac/dataset_gi_hapmap2_g1/releases/tag/freeze6
is_current_freeze: true
freeze_number: 6
freeze_date: 2025-09-30
previous_freeze: alspacdcs:gi_hapmap2_g1_2022-12-07_f5
freeze_of_alspac_dataset_version: alspacdcs:gi_hapmap2_g1_2022-12-07
freeze_of_named_alspac_dataset: alspacdcs:gi_hapmap2_g1
contains:
- data
files: []
data:
contains:
- freeze_id.bed
- freeze_id.bim
- freeze_id.fam
- freeze_id.log
files:
- id: alspacdcs:51e0b3de-dcce-456f-a99b-76752e42cbc7
name: freeze_id.bed
md5sum: 4362bfc2985fe02e84950530668c379d
filesize: 4.9GB
filetype: .bed
belongs_to: data
- id: alspacdcs:fa76edfd-d194-49ef-b9b7-8a9f922582e2
name: freeze_id.bim
md5sum: da64bd173633ec7198b8c9b7f61fabca
filesize: 67.6MB
filetype: .bim
number_of_variants: 2543887
belongs_to: data
- id: alspacdcs:a7e7e68b-7ff1-4287-a4dc-2437ddee77e4
name: freeze_id.fam
md5sum: fd2ed9d93ab7c69f6bfef1927ae0feaa
filesize: 273.0KB
filetype: .fam
number_of_participants: 8222
belongs_to: data
- id: alspacdcs:476a079c-a8c6-446c-a84a-af79610ee216
name: freeze_id.log
md5sum: b291cf31d5cfd1f44eb6217819c9fa20
filesize: 941.0B
filetype: .log
belongs_to: data
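Each file entry in the freeze documentation carries an `md5sum`, which a recipient can use to check that a delivered file is intact. The following is a minimal sketch, assuming the files have been copied to local disk. The helper names `md5_of_file` and `verify` and the throwaway demonstration file are illustrative only and are not ALSPAC tooling.

```python
import hashlib
import os
import tempfile


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_md5):
    """Return True if the file's MD5 matches the value from the freeze docs."""
    return md5_of_file(path) == expected_md5.lower()


# Demonstration on a throwaway file. For real data, pass the local path of a
# delivered file (e.g. freeze_id.bim) and the md5sum listed for it above.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
    demo_path = tmp.name

demo_ok = verify(demo_path, "5d41402abc4b2a76b9719d911017c592")
os.remove(demo_path)
print(demo_ok)  # True
```

Reading in chunks keeps memory use low for multi-gigabyte files such as the .bed file.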
Genome-wide - HapMap2 imputed - G0 mothers (gi_hapmap2_g0m)
Description
This dataset contains genotype data imputed to HapMap 2 for G0
mothers.
Reference genome build: GRCh36
Methodology
A total of 10 015 women (mothers from the ALSPAC cohort) were genotyped using the Illumina 660 quad SNP chip, which contains 557 124 SNP markers. Markers with minor allele frequency < 1%, SNPs with >5% missing genotypes and any markers that failed an exact test of Hardy-Weinberg equilibrium (P < 1 x 10^-6) were excluded from further analyses. Genome-wide identity by state sharing was calculated for each pair of individuals in the cohort to identify cryptic relatedness.
In order to identify individuals who might have ancestries other than Western European, we merged data from both cohorts with the 60 western European (CEU) founder, 60 Nigerian (YRI) founder and 90 Japanese (JPT) and Han Chinese (CHB) individuals from the International HapMap Project. Genome-wide IBS distances for each pair of individuals were calculated on markers shared between the HapMap and the Illumina 660K SNP chip, and then the multidimensional scaling option in R was used to generate a two-dimensional plot based upon individuals’ scores on the first two principal coordinates from this analysis. Samples that did not cluster with the CEU individuals were excluded from subsequent analyses. In addition, we plotted the proportion of missing data for each individual against their genome-wide heterozygosity. Any individual who did not cluster with the others was removed from further analyses. Samples were also excluded from analyses in the case of excessive missingness (>5%) or unusual genome-wide or X chromosome heterozygosity, and one individual was excluded from each pair of putatively related individuals (genome-wide IBD >10%). After data cleaning, 8340 individuals and 526688 SNPs were left in the genome-wide data set.
We then conducted imputation using the MACH Markov Chain Haplotyping software with CEU individuals from phase 2 of the HapMap project as a reference set (release 22). The final imputed data set consisted of 8340 individuals, each with 2 594 390 imputed markers. Only imputed genotypes with minor allele frequencies ≥1% and R-sqr ≥0.3 were considered for association. Of these 8340 with genetic data, 2874 mothers also had phenotype data available.
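The post-imputation filter described above (minor allele frequency ≥1% and R-sqr ≥0.3) can be illustrated as follows. This is a sketch under stated assumptions: the marker names, allele frequencies and R-sqr values are invented for illustration, and the helpers `minor_allele_frequency` and `keep_for_association` are hypothetical rather than part of MACH or of the ALSPAC pipeline.

```python
MIN_MAF = 0.01  # minor allele frequency threshold (>= 1%)
MIN_RSQ = 0.3   # imputation quality threshold (R-sqr >= 0.3)


def minor_allele_frequency(allele_freq):
    """Fold an allele frequency onto the minor allele (always <= 0.5)."""
    return min(allele_freq, 1.0 - allele_freq)


def keep_for_association(allele_freq, rsq):
    """Return True if an imputed marker meets both thresholds."""
    return minor_allele_frequency(allele_freq) >= MIN_MAF and rsq >= MIN_RSQ


# (marker name, frequency of the coded allele, R-sqr); illustrative values only.
markers = [
    ("rsA", 0.250, 0.95),
    ("rsB", 0.995, 0.90),  # minor allele frequency 0.005, too rare
    ("rsC", 0.400, 0.20),  # poorly imputed
    ("rsD", 0.980, 0.35),
]

kept_markers = [name for name, freq, rsq in markers if keep_for_association(freq, rsq)]
print(kept_markers)  # ['rsA', 'rsD']
```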
Associated publication:
- https://doi.org/10.1093/hmg/ddt239
Freeze Docs
# This yaml file is a description of a freeze of a released version of a named alspac dataset
# It should conform to the schema https://github.com/alspac/alspac-data-catalogue-schema
id: alspacdcs:gi_hapmap2_g0m_2022-12-07_f6
name: Genome-wide - HapMap2 imputed - G0 mothers version 2022-12-07 freeze 6
description: >-
Version 2022-12-07 freeze 6 of Genome-wide array data imputed to the HapMap2 reference panel for G0 mothers.
The number of variants & individuals within each plink file set can be viewed within the log file.
freeze_size: 5G
linker_file_md5sum: 45415c7d4fae355b4fb2d6ccd042620d
woc_file_md5sum: 6c887db8c7dd10cc695630ca73b41405
all_individuals_to_exclude_md5sum: e4efce63f9f671548d08c8bb2f9cc4f7
git_tag: https://github.com/alspac/dataset_gi_hapmap2_g0m/releases/tag/freeze6
is_current_freeze: true
freeze_number: 6
freeze_date: 2025-09-30
previous_freeze: alspacdcs:gi_hapmap2_g0m_2022-12-07_f5
freeze_of_alspac_dataset_version: alspacdcs:gi_hapmap2_g0m_2022-12-07
freeze_of_named_alspac_dataset: alspacdcs:gi_hapmap2_g0m
contains:
- plink
files: []
plink:
contains:
- freeze_id_chr21.bed
- freeze_id_chr22.bed
- freeze_id_chr20.bed
- freeze_id_chr18.bed
- freeze_id_chr19.bed
- freeze_id_chr14.bed
- freeze_id_chr17.bed
- freeze_id_chr16.bed
- freeze_id_chr12.bed
- freeze_id_chr15.bed
- freeze_id_chr13.bed
- freeze_id_chr4.bed
- freeze_id_chr9.bed
- freeze_id_chr6.bed
- freeze_id_chr11.bed
- freeze_id_chr3.bed
- freeze_id_chr7.bed
- freeze_id_chr10.bed
- freeze_id_chr1.bed
- freeze_id_chr8.bed
- freeze_id_chr2.bed
- freeze_id_chr5.bed
- freeze_id_chr21.bim
- freeze_id_chr14.bim
- freeze_id_chr19.bim
- freeze_id_chr5.bim
- freeze_id_chr18.bim
- freeze_id_chr15.bim
- freeze_id_chr7.bim
- freeze_id_chr16.bim
- freeze_id_chr20.bim
- freeze_id_chr11.bim
- freeze_id_chr4.bim
- freeze_id_chr10.bim
- freeze_id_chr17.bim
- freeze_id_chr2.bim
- freeze_id_chr1.bim
- freeze_id_chr12.bim
- freeze_id_chr6.bim
- freeze_id_chr9.bim
- freeze_id_chr22.bim
- freeze_id_chr3.bim
- freeze_id_chr8.bim
- freeze_id_chr13.bim
- freeze_id_chr20.fam
- freeze_id_chr16.fam
- freeze_id_chr17.fam
- freeze_id_chr6.fam
- freeze_id_chr12.fam
- freeze_id_chr4.fam
- freeze_id_chr14.fam
- freeze_id_chr15.fam
- freeze_id_chr7.fam
- freeze_id_chr2.fam
- freeze_id_chr11.fam
- freeze_id_chr13.fam
- freeze_id_chr21.fam
- freeze_id_chr1.fam
- freeze_id_chr8.fam
- freeze_id_chr18.fam
- freeze_id_chr3.fam
- freeze_id_chr9.fam
- freeze_id_chr22.fam
- freeze_id_chr10.fam
- freeze_id_chr5.fam
- freeze_id_chr19.fam
- freeze_id_chr20.log
- freeze_id_chr5.log
- freeze_id_chr6.log
- freeze_id_chr4.log
- freeze_id_chr2.log
- freeze_id_chr18.log
- freeze_id_chr10.log
- freeze_id_chr12.log
- freeze_id_chr15.log
- freeze_id_chr17.log
- freeze_id_chr3.log
- freeze_id_chr9.log
- freeze_id_chr1.log
- freeze_id_chr14.log
- freeze_id_chr7.log
- freeze_id_chr22.log
- freeze_id_chr16.log
- freeze_id_chr8.log
- freeze_id_chr19.log
- freeze_id_chr13.log
- freeze_id_chr21.log
- freeze_id_chr11.log
files:
- id: alspacdcs:4d0de059-4893-4c69-9fa5-43f78961fe14
name: freeze_id_chr21.bed
md5sum: 13165e1c9a27aa42853429b0246a1ed5
filesize: 65.6MB
filetype: .bed
belongs_to: plink
- id: alspacdcs:ad0f93f0-f7b3-4500-a73c-a26fd706680f
name: freeze_id_chr22.bed
md5sum: 5abcf552c585152ed0ee11754f3e7833
filesize: 65.5MB
filetype: .bed
belongs_to: plink
- id: alspacdcs:7c2cd7f8-5513-4c49-88be-06d7c251f381
name: freeze_id_chr20.bed
md5sum: 2af011bb98d6b8a8b00b7d938700fdac
filesize: 122.8MB
filetype: .bed
belongs_to: plink
- id: alspacdcs:1dd436e8-5f33-4c9a-832c-74c3cb343f3f
name: freeze_id_chr18.bed
md5sum: 6b46a8d2993dae303334b9a51b50b92c
filesize: 148.7MB
filetype: .bed
belongs_to: plink
- id: alspacdcs:c76b53ac-facb-4a38-b114-a15a2bbad2a5
name: freeze_id_chr19.bed
md5sum: 801ccb3bb64dddaabfc2b7a4a1e4c5b0
filesize: 71.7MB
filetype: .bed
belongs_to: plink
- id: alspacdcs:27a29455-1a51-42da-869a-b84c9d1f5575
name: freeze_id_chr14.bed
md5sum: a41f9803ec71a0dcdf137806b21ba2e6
filesize: 162.5MB
filetype: .bed
belongs_to: plink
- id: alspacdcs:0676b576-8d37-4879-873f-82c926d3db10
name: freeze_id_chr17.bed
md5sum: c6d54ed5ac68f2e0bd806b6124463ee4
filesize: 113.2MB
filetype: .bed
belongs_to: plink
- id: alspacdcs:94c3ca16-2032-4cb3-b723-bea46b76e195
name: freeze_id_chr16.bed
md5sum: b04eb2e4e66fef7ee7d48cb666d78c38
filesize: 138.5MB
filetype: .bed
belongs_to: plink
- id: alspacdcs:79c2a62c-2b42-4cc7-81f2-1dbaa7ebcf2b
name: freeze_id_chr12.bed
md5sum: 367f44ccd183c47334cfc7cb8333628a
filesize: 241.7MB
filetype: .bed
belongs_to: plink
- id: alspacdcs:6e50d58c-be8e-4371-89fe-6ba9e9c764c3
name: freeze_id_chr15.bed
md5sum: 611159bc9c4500de559615d0a7c549f2
filesize: 140.0MB
filetype: .bed
belongs_to: plink
- id: alspacdcs:fe6ad8e2-6509-4194-a22f-bb85550bcbce
name: freeze_id_chr13.bed
md5sum: 0e99cf077012880a802dc36ce72142c1
filesize: 201.6MB
filetype: .bed
belongs_to: plink
- id: alspacdcs:ed39eb11-b13b-4498-9c64-a85f01d6e6e9
name: freeze_id_chr4.bed
md5sum: 147fee33c621f644dad5a2d8ee86fc1d
filesize: 315.9MB
filetype: .bed
belongs_to: plink
- id: alspacdcs:c4f65aef-2670-4c92-bd14-e116317599e0
name: freeze_id_chr9.bed
md5sum: 58ff215f0652257867e42f567ff1c2be
filesize: 236.4MB
filetype: .bed
belongs_to: plink
- id: alspacdcs:0f814664-6480-4e8d-9714-c5f6d14d9e99
name: freeze_id_chr6.bed
md5sum: 953f9c82981d59d25dabe44ba5718b29
filesize: 353.1MB
filetype: .bed
belongs_to: plink
- id: alspacdcs:d887041f-c254-41ca-bec4-e1077f661182
name: freeze_id_chr11.bed
md5sum: 3c89898ce9fc0445c566ea0c060fb9db
filesize: 251.8MB
filetype: .bed
belongs_to: plink
- id: alspacdcs:34f61935-0af5-45bc-84ae-9d6d47230e4b
name: freeze_id_chr3.bed
md5sum: 609847ca0489b7a97725ec275f8337d2
filesize: 337.5MB
filetype: .bed
belongs_to: plink
- id: alspacdcs:6461c8cb-58e4-4117-a6c5-217f4a05731a
name: freeze_id_chr7.bed
md5sum: fb9e8aaf4ae7c3fc75233248ec9d03b0
filesize: 277.3MB
filetype: .bed
belongs_to: plink
- id: alspacdcs:ecf053b3-2e66-4f55-ba41-c14dade20283
name: freeze_id_chr10.bed
md5sum: 4606d4a5a008927b6ab051461218094a
filesize: 267.9MB
filetype: .bed
belongs_to: plink
- id: alspacdcs:138530d5-810e-4c0f-bf28-8e4527e75994
name: freeze_id_chr1.bed
md5sum: 01f7205ea4b6e852c0e8feb72a2cb9cd
filesize: 374.7MB
filetype: .bed
belongs_to: plink
- id: alspacdcs:3f74efb1-d0fa-4d63-be04-cc952f0f8fe7
name: freeze_id_chr8.bed
md5sum: de34e8ef57e4c08991e4778401adf861
filesize: 285.5MB
filetype: .bed
belongs_to: plink
- id: alspacdcs:cfd357f9-898a-4c61-bfb1-8622eee6b3c4
name: freeze_id_chr2.bed
md5sum: 494713bafedd17c3be4e782f7881dcc0
filesize: 427.5MB
filetype: .bed
belongs_to: plink
- id: alspacdcs:bba780dc-a454-428e-adcd-4d573c91dac3
name: freeze_id_chr5.bed
md5sum: a3a47a8ea90e0fa39d5c203436b6d982
filesize: 325.5MB
filetype: .bed
belongs_to: plink
- id: alspacdcs:0d071024-e6e0-42f6-bb30-c46b8e10878e
name: freeze_id_chr21.bim
md5sum: c1f6f2181c49172608ac79e18425e4f4
filesize: 924.7KB
filetype: .bim
number_of_variants: 33863
belongs_to: plink
- id: alspacdcs:99819ab1-c952-473f-b867-4ad5ea1602bc
name: freeze_id_chr14.bim
md5sum: 4a933818aaea48201f455ebd07ea1b78
filesize: 2.3MB
filetype: .bim
number_of_variants: 83936
belongs_to: plink
- id: alspacdcs:524d05bb-8de8-404a-96c8-51ec20e24d4e
name: freeze_id_chr19.bim
md5sum: c6fce7e15e198304f752ccbce66299b9
filesize: 1012.3KB
filetype: .bim
number_of_variants: 37045
belongs_to: plink
- id: alspacdcs:223d49ae-f1ef-40c5-b75f-d2507f94523d
name: freeze_id_chr5.bim
md5sum: e8f55ef9016bf2f03ee43f08a6c974c3
filesize: 4.4MB
filetype: .bim
number_of_variants: 168144
belongs_to: plink
- id: alspacdcs:4c02c893-aec2-46cb-a4c6-0eaeb202dae2
name: freeze_id_chr18.bim
md5sum: 9ffd8f006c82701060dff29bf460e8fe
filesize: 2.1MB
filetype: .bim
number_of_variants: 76812
belongs_to: plink
- id: alspacdcs:839b79d3-999f-4747-b6af-8046a9a44a30
name: freeze_id_chr15.bim
md5sum: 1e1139db4b031ba577b5ac6ae000ce6f
filesize: 1.9MB
filetype: .bim
number_of_variants: 72300
belongs_to: plink
- id: alspacdcs:7b7c44cd-7f15-47f0-81b1-84be66ad88d1
name: freeze_id_chr7.bim
md5sum: dae38c5168605323dfc584a73f3ce4a1
filesize: 3.8MB
filetype: .bim
number_of_variants: 143232
belongs_to: plink
- id: alspacdcs:3ba18f76-ecc8-41d1-ae4b-544df2519703
name: freeze_id_chr16.bim
md5sum: 8bd9cb45256b6b5ca37ce66eec810035
filesize: 1.9MB
filetype: .bim
number_of_variants: 71550
belongs_to: plink
- id: alspacdcs:fa231004-46b8-413c-96b8-91dd0b5644c0
name: freeze_id_chr20.bim
md5sum: 6e0b2d6cd06cc6e36f9cbc3f8df0a169
filesize: 1.7MB
filetype: .bim
number_of_variants: 63408
belongs_to: plink
- id: alspacdcs:8babfa2c-b0f7-442d-99f2-f25928c8def7
name: freeze_id_chr11.bim
md5sum: 703ecef520ce7363c24e9600b363570f
filesize: 3.5MB
filetype: .bim
number_of_variants: 130069
belongs_to: plink
- id: alspacdcs:eb0a5025-f35c-460a-9384-0228c559483e
name: freeze_id_chr4.bim
md5sum: 54a244447b1345636690b252215bfd2d
filesize: 4.3MB
filetype: .bim
number_of_variants: 163157
belongs_to: plink
- id: alspacdcs:5e0dae3e-bcc0-46ad-b09e-f04e324fb888
name: freeze_id_chr10.bim
md5sum: 3c259904c7da548d25c86a4a36e96285
filesize: 3.8MB
filetype: .bim
number_of_variants: 138402
belongs_to: plink
- id: alspacdcs:90635b2b-b142-48af-9e6d-18d72cfc0634
name: freeze_id_chr17.bim
md5sum: 0dc0770759f9edccec7ce305e07b57d4
filesize: 1.6MB
filetype: .bim
number_of_variants: 58455
belongs_to: plink
- id: alspacdcs:205cf2e2-d9a3-43c9-bdac-ccd1d055e456
name: freeze_id_chr2.bim
md5sum: 275cefa559489b51bebbc65657a91822
filesize: 5.9MB
filetype: .bim
number_of_variants: 220833
belongs_to: plink
- id: alspacdcs:2824e3a5-b354-46f9-8d93-617e1e13b935
name: freeze_id_chr1.bim
md5sum: 44795681691b62d1921ad8855fd11a09
filesize: 5.1MB
filetype: .bim
number_of_variants: 193554
belongs_to: plink
- id: alspacdcs:22b98431-8ff4-4727-990b-2af616a6073a
name: freeze_id_chr12.bim
md5sum: 515a46f735c531163377d114549042b5
filesize: 3.4MB
filetype: .bim
number_of_variants: 124860
belongs_to: plink
- id: alspacdcs:f088a831-e7d8-4924-a07c-25ee38c02dab
name: freeze_id_chr6.bim
md5sum: 3fd4e793a35c5e935454efc1105be192
filesize: 4.8MB
filetype: .bim
number_of_variants: 182381
belongs_to: plink
- id: alspacdcs:30e15b56-e8e9-40ff-aaf2-7e13e0b6e966
name: freeze_id_chr9.bim
md5sum: 1e828e0f36c2d168ce6c1df5887a764b
filesize: 3.2MB
filetype: .bim
number_of_variants: 122112
belongs_to: plink
- id: alspacdcs:6495275d-0674-48af-8b89-96cd2a4cac31
name: freeze_id_chr22.bim
md5sum: 86a1da3366ba87e62f561dc09f64f9ac
filesize: 920.9KB
filetype: .bim
number_of_variants: 33815
belongs_to: plink
- id: alspacdcs:add2c16b-6453-404c-a3b2-2f12165323a7
name: freeze_id_chr3.bim
md5sum: 96d147406f1f24697b0cb9af0c7091fc
filesize: 4.6MB
filetype: .bim
number_of_variants: 174356
belongs_to: plink
- id: alspacdcs:cb453287-93df-430f-82b4-1274c53b8f1b
name: freeze_id_chr8.bim
md5sum: 6243ef376ee6cbe643bec69201bec604
filesize: 3.9MB
filetype: .bim
number_of_variants: 147483
belongs_to: plink
- id: alspacdcs:e6547c62-7904-4ef8-915b-794fc791728a
name: freeze_id_chr13.bim
md5sum: cd1b7c80977fb5a0bbd87bc83dd85aed
filesize: 2.8MB
filetype: .bim
number_of_variants: 104120
belongs_to: plink
- id: alspacdcs:082730f3-2524-4bd6-a513-9519176ff930
name: freeze_id_chr20.fam
md5sum: 02a0b436dddcc4646d6fd0fb2ac3f591
filesize: 277.5KB
filetype: .fam
number_of_participants: 8118
belongs_to: plink
- id: alspacdcs:487e2beb-1c64-45a5-8c59-3ca980045a18
name: freeze_id_chr16.fam
md5sum: 02a0b436dddcc4646d6fd0fb2ac3f591
filesize: 277.5KB
filetype: .fam
number_of_participants: 8118
belongs_to: plink
- id: alspacdcs:eb1a7726-3011-4dec-83cc-609c55b87c70
name: freeze_id_chr17.fam
md5sum: 02a0b436dddcc4646d6fd0fb2ac3f591
filesize: 277.5KB
filetype: .fam
number_of_participants: 8118
belongs_to: plink
- id: alspacdcs:af7bcd06-b0c6-4b22-bfec-99d963ab0ad9
name: freeze_id_chr6.fam
md5sum: 02a0b436dddcc4646d6fd0fb2ac3f591
filesize: 277.5KB
filetype: .fam
number_of_participants: 8118
belongs_to: plink
- id: alspacdcs:86767f39-0f41-4260-a643-9ba9ddf3347b
name: freeze_id_chr12.fam
md5sum: 02a0b436dddcc4646d6fd0fb2ac3f591
filesize: 277.5KB
filetype: .fam
number_of_participants: 8118
belongs_to: plink
- id: alspacdcs:a58901cc-5262-47e7-ab84-09044aad73df
name: freeze_id_chr4.fam
md5sum: 02a0b436dddcc4646d6fd0fb2ac3f591
filesize: 277.5KB
filetype: .fam
number_of_participants: 8118
belongs_to: plink
- id: alspacdcs:ed5bc02f-d28f-4cf0-8d10-55571afd99c2
name: freeze_id_chr14.fam
md5sum: 02a0b436dddcc4646d6fd0fb2ac3f591
filesize: 277.5KB
filetype: .fam
number_of_participants: 8118
belongs_to: plink
- id: alspacdcs:e2da1046-3a96-43d2-902b-0a2f0ee3ed48
name: freeze_id_chr15.fam
md5sum: 02a0b436dddcc4646d6fd0fb2ac3f591
filesize: 277.5KB
filetype: .fam
number_of_participants: 8118
belongs_to: plink
- id: alspacdcs:48ceeaba-3116-4183-a42b-5e9a9bdc08fa
name: freeze_id_chr7.fam
md5sum: 02a0b436dddcc4646d6fd0fb2ac3f591
filesize: 277.5KB
filetype: .fam
number_of_participants: 8118
belongs_to: plink
- id: alspacdcs:4a18d6e0-e531-47de-96ee-40724b3c8a09
name: freeze_id_chr2.fam
md5sum: 02a0b436dddcc4646d6fd0fb2ac3f591
filesize: 277.5KB
filetype: .fam
number_of_participants: 8118
belongs_to: plink
- id: alspacdcs:cc02ab10-5505-4dd8-b754-4c5b126a3125
name: freeze_id_chr11.fam
md5sum: 02a0b436dddcc4646d6fd0fb2ac3f591
filesize: 277.5KB
filetype: .fam
number_of_participants: 8118
belongs_to: plink
- id: alspacdcs:4fb39ed6-6bc7-4f8f-8e44-750f84f89269
name: freeze_id_chr13.fam
md5sum: 02a0b436dddcc4646d6fd0fb2ac3f591
filesize: 277.5KB
filetype: .fam
number_of_participants: 8118
belongs_to: plink
- id: alspacdcs:feb5710a-76e2-452d-955a-bda1d5c5cf26
name: freeze_id_chr21.fam
md5sum: 02a0b436dddcc4646d6fd0fb2ac3f591
filesize: 277.5KB
filetype: .fam
number_of_participants: 8118
belongs_to: plink
- id: alspacdcs:a9dd0bc9-3310-4ed5-afd4-948cbf300d97
name: freeze_id_chr1.fam
md5sum: 02a0b436dddcc4646d6fd0fb2ac3f591
filesize: 277.5KB
filetype: .fam
number_of_participants: 8118
belongs_to: plink
- id: alspacdcs:de7be791-f12e-4513-8f52-0adb6d82b44c
name: freeze_id_chr8.fam
md5sum: 02a0b436dddcc4646d6fd0fb2ac3f591
filesize: 277.5KB
filetype: .fam
number_of_participants: 8118
belongs_to: plink
- id: alspacdcs:5490fdc5-b116-476c-a5f9-0fb8e288293d
name: freeze_id_chr18.fam
md5sum: 02a0b436dddcc4646d6fd0fb2ac3f591
filesize: 277.5KB
filetype: .fam
number_of_participants: 8118
belongs_to: plink
- id: alspacdcs:873a1134-d308-471f-ae4d-629ffbf27d01
name: freeze_id_chr3.fam
md5sum: 02a0b436dddcc4646d6fd0fb2ac3f591
filesize: 277.5KB
filetype: .fam
number_of_participants: 8118
belongs_to: plink
- id: alspacdcs:619e79bc-51be-4a8e-b576-6f1090995b7e
name: freeze_id_chr9.fam
md5sum: 02a0b436dddcc4646d6fd0fb2ac3f591
filesize: 277.5KB
filetype: .fam
number_of_participants: 8118
belongs_to: plink
- id: alspacdcs:d9db5a85-5691-4f7c-9ffe-ff20318e2d6c
name: freeze_id_chr22.fam
md5sum: 02a0b436dddcc4646d6fd0fb2ac3f591
filesize: 277.5KB
filetype: .fam
number_of_participants: 8118
belongs_to: plink
- id: alspacdcs:5b8b7ca5-6be1-4cf1-b2c1-f3ba32d5c3c0
name: freeze_id_chr10.fam
md5sum: 02a0b436dddcc4646d6fd0fb2ac3f591
filesize: 277.5KB
filetype: .fam
number_of_participants: 8118
belongs_to: plink
- id: alspacdcs:748e1050-e5ad-49cc-b2c9-e3d615c463c6
name: freeze_id_chr5.fam
md5sum: 02a0b436dddcc4646d6fd0fb2ac3f591
filesize: 277.5KB
filetype: .fam
number_of_participants: 8118
belongs_to: plink
- id: alspacdcs:4adc17ab-eec6-4270-ad85-d31dc80fb3f7
name: freeze_id_chr19.fam
md5sum: 02a0b436dddcc4646d6fd0fb2ac3f591
filesize: 277.5KB
filetype: .fam
number_of_participants: 8118
belongs_to: plink
- id: alspacdcs:acc95b26-02b3-47a0-9445-36f175e792b0
name: freeze_id_chr20.log
md5sum: 39173b45309913c6ccc1cd639081f198
filesize: 975.0B
filetype: .log
belongs_to: plink
- id: alspacdcs:4cc54de9-97be-4dcb-bde5-48205f1f6cff
name: freeze_id_chr5.log
md5sum: c676952aec770492e37516fb583043a1
filesize: 971.0B
filetype: .log
belongs_to: plink
- id: alspacdcs:39e7375f-d35c-4b93-80f6-fc9cc0c43ec5
name: freeze_id_chr6.log
md5sum: f1a481559a61558066b4a5f82a54b261
filesize: 971.0B
filetype: .log
belongs_to: plink
- id: alspacdcs:d63190d2-6602-4a17-836e-b787de1a3ad3
name: freeze_id_chr4.log
md5sum: b944fbfdcb2d6e4578398d6a75cea4eb
filesize: 971.0B
filetype: .log
belongs_to: plink
- id: alspacdcs:4eb9745a-7faa-4370-95a7-3627edefd059
name: freeze_id_chr2.log
md5sum: 58c0ae51d3d950091908b26d9fcbf662
filesize: 971.0B
filetype: .log
belongs_to: plink
- id: alspacdcs:6a2c28f5-7c46-4738-b348-a14c8aa09088
name: freeze_id_chr18.log
md5sum: c85719b99d729238125cef5af686af40
filesize: 975.0B
filetype: .log
belongs_to: plink
- id: alspacdcs:fab8e72c-5d9e-4391-85a5-0705a5ba1931
name: freeze_id_chr10.log
md5sum: 9944731b8939bb29ec0f058fb85fc8a6
filesize: 977.0B
filetype: .log
belongs_to: plink
- id: alspacdcs:37d48538-54f2-4006-9010-b55dabdffa0c
name: freeze_id_chr12.log
md5sum: 6a92b280ef1cb18e0cca60b34b77cf8c
filesize: 977.0B
filetype: .log
belongs_to: plink
- id: alspacdcs:891848f8-5c96-4433-bff8-0690191ccf78
name: freeze_id_chr15.log
md5sum: f111c4fde2aef6c4d6549b509b07a8eb
filesize: 975.0B
filetype: .log
belongs_to: plink
- id: alspacdcs:f59eb5fe-0712-4324-9dba-55fcfdd8995d
name: freeze_id_chr17.log
md5sum: 2477b8d24bc302ba49a95c89b5894560
filesize: 975.0B
filetype: .log
belongs_to: plink
- id: alspacdcs:05923457-4385-4adf-bc5b-699393117763
name: freeze_id_chr3.log
md5sum: d46fc3e1a1fbbcf6ee7e5aca5b3913e6
filesize: 971.0B
filetype: .log
belongs_to: plink
- id: alspacdcs:1e37f74e-932b-4cdd-a6a8-2097c7c9d343
name: freeze_id_chr9.log
md5sum: c334d5c2e55192550c082508c43f0b8c
filesize: 971.0B
filetype: .log
belongs_to: plink
- id: alspacdcs:d4c5a687-74e9-4714-aae4-40d7354ac513
name: freeze_id_chr1.log
md5sum: a31745db1e5b091d2b083d188e078b51
filesize: 971.0B
filetype: .log
belongs_to: plink
- id: alspacdcs:cfd03b67-3c6e-4d45-a46e-799b0c76e795
name: freeze_id_chr14.log
md5sum: 1da37f38292b25228306b0c12d0233b7
filesize: 975.0B
filetype: .log
belongs_to: plink
- id: alspacdcs:53d1a685-75fc-4759-a16e-55e37f4fcee3
name: freeze_id_chr7.log
md5sum: ca37be820a48ec6005f80e8298b0d24f
filesize: 971.0B
filetype: .log
belongs_to: plink
- id: alspacdcs:1cd105ee-dd11-41a4-b044-a87663ed7f9a
name: freeze_id_chr22.log
md5sum: a400793a0f02086dd3c2f32a40a42ea5
filesize: 975.0B
filetype: .log
belongs_to: plink
- id: alspacdcs:4a5875a7-ad8d-4727-a34d-f97236d87e6b
name: freeze_id_chr16.log
md5sum: 088cc8fbf1bb1741cf4406b9cca934c5
filesize: 975.0B
filetype: .log
belongs_to: plink
- id: alspacdcs:d99b8342-76da-40a6-877f-9cc23f6c0cc5
name: freeze_id_chr8.log
md5sum: b612c22e9f96a11ffc220257fb1c40a2
filesize: 971.0B
filetype: .log
belongs_to: plink
- id: alspacdcs:6d04e6c1-cb79-466d-b225-d56cc0633e7c
name: freeze_id_chr19.log
md5sum: f1e8728cc5a2ec80a0b31845c5797403
filesize: 975.0B
filetype: .log
belongs_to: plink
- id: alspacdcs:149473be-439e-4b4e-95ad-6ba89828c0b2
name: freeze_id_chr13.log
md5sum: 86280b924bb8789c35d7313eff1d4b83
filesize: 977.0B
filetype: .log
belongs_to: plink
- id: alspacdcs:eaa04a24-6af6-437a-bfe4-8fefe5ecc7b5
name: freeze_id_chr21.log
md5sum: a0e7a40ef5874dd577150a472100520a
filesize: 975.0B
filetype: .log
belongs_to: plink
- id: alspacdcs:8546ec4d-3642-41a0-bb82-74284d9fefa2
name: freeze_id_chr11.log
md5sum: 308e63e5f26651ad4f144d4485b05176
filesize: 977.0B
filetype: .log
belongs_to: plink
Genome-wide - 1000G imputed - G0 partners (gi_1000g_g0p)
Description
This dataset contains genome-wide array data imputed to the 1000
genomes reference panel for G0 partners, with some additional G0 mothers
and G1 individuals. The data have been cleaned and flipped to the positive
strand, are in b37 coordinates, and have been imputed to 1000 Genomes
phase 1 version 3.
Reference genome build: GRCh37
Methodology
3,453 ALSPAC mothers and fathers were genotyped at 535,478 SNPs using the Illumina HumanCoreExome chip genotyping platform by the ALSPAC lab, and genotypes were called using GenomeStudio. The resulting raw genome-wide data were subjected to standard quality control methods using PLINK (v1.07). Individuals were excluded on the basis of gender mismatches (n = 80); minimal or excessive heterozygosity (n = 64); disproportionate levels of individual missingness (>5%, n = 60) and possible contamination (n = 3).
Population stratification was assessed by multidimensional scaling analysis and compared with 1000 Genomes phase 3 data and principal component analysis (n = 266); all individuals with non-European ancestry were removed.
Cryptic relatedness was measured as SNP relatedness in GCTA (relatedness > 0.1, n = 69 removed). SNPs with a call rate of < 95% or evidence for violations of Hardy-Weinberg equilibrium (P < 1E-7) and those which failed GenomeStudio quality control measures were removed (n = 21,298). 6,594 duplicate SNPs were also removed.
This resulted in genotypes for 2,911 unrelated mothers and fathers at 507,586 SNPs. We then identified 2217 samples where the ALN assigned historically by the lab matched the genetically assigned ALN.
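The sample and SNP counts reported above can be reconciled with a short tally. This is only an arithmetic check. It assumes that the exclusion categories do not overlap (each individual or SNP is counted once) and that n = 266 is the number of individuals removed for non-European ancestry. The dictionary labels are paraphrases for illustration.

```python
genotyped_samples = 3453
sample_exclusions = {
    "gender mismatch": 80,
    "minimal or excessive heterozygosity": 64,
    "individual missingness > 5%": 60,
    "possible contamination": 3,
    "non-European ancestry": 266,
    "cryptic relatedness": 69,
}

genotyped_snps = 535478
snp_exclusions = {
    "call rate, Hardy-Weinberg or GenomeStudio QC failure": 21298,
    "duplicate SNPs": 6594,
}

remaining_samples = genotyped_samples - sum(sample_exclusions.values())
remaining_snps = genotyped_snps - sum(snp_exclusions.values())

print(remaining_samples)  # 2911
print(remaining_snps)     # 507586
```

Under these assumptions the totals match the 2,911 individuals and 507,586 SNPs stated in the text.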
We phased the data for 3074 samples that passed QC but contained related subjects using SHAPEIT v2.r837. We then removed 155,336 monomorphic SNPs, 1033 markers not in 1000 Genomes, 11,842 A/T or G/C SNPs and 10 duplicate sites to give 337,732 SNPs on chromosomes 1-23. Of the 329,363 markers on chromosomes 1-22, 298,742 overlapped the reference genome. We imputed to the 1000 Genomes phase 1 version 3 reference panel using the Michigan Imputation Server. We then identified 2217 samples where the ALN assigned historically by the lab matched the genetically assigned ALN. We then removed 12 subjects who have withdrawn consent and 6 subjects genotyped in an earlier work package to give 2201 subjects.
1737 putative G0 partner-G1 pairs for whom both G0 partner and G1 have called genotype data available were identified based on ALN. Given the G0 partners were invited by the G0 mother to take part and only enrolled in the study in their own right several years later, it could not be assumed that all G0 partners were biologically related to G1. Called genotype data for the 1720 unique G0 partners and 1737 unique G1s were merged (i.e. there were 17 pairs of siblings/twins among the G1 offspring), using plink v1.90b7.2 64-bit (11 Dec 2023).
After application of the PLINK filters --geno 0.05, --maf 0.01, --snps-only just-acgt and --autosome, the --related command in KING version 2.3.2 was used to perform kinship analysis, which confirmed that all 1737 putative G0 partner-G1 pairs are genetically related, as would be expected for biological father-offspring pairs under the inference criteria described in Table 1 of Manichaikul, Ani, et al. "Robust relationship inference in genome-wide association studies." Bioinformatics 26.22 (2010): 2867-2873.
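As an illustration of the kinship-based inference referred to above, the sketch below maps an estimated kinship coefficient to a relationship category using boundaries at powers of 1/2, as given in the cited Table 1. It is a simplified sketch, not KING itself: the function name `classify_kinship` is hypothetical, and separating parent-offspring pairs from full siblings within the first-degree range relies on an additional zero-sharing criterion that is not shown here.

```python
# Kinship-coefficient boundaries (powers of 1/2), following the inference
# criteria in Table 1 of Manichaikul et al. (2010).
DUPLICATE_MIN = 2 ** -1.5      # about 0.354
FIRST_DEGREE_MIN = 2 ** -2.5   # about 0.177
SECOND_DEGREE_MIN = 2 ** -3.5  # about 0.0884
THIRD_DEGREE_MIN = 2 ** -4.5   # about 0.0442


def classify_kinship(kinship):
    """Map an estimated kinship coefficient to a relationship category."""
    if kinship > DUPLICATE_MIN:
        return "duplicate/MZ twin"
    if kinship > FIRST_DEGREE_MIN:
        return "first-degree"
    if kinship > SECOND_DEGREE_MIN:
        return "second-degree"
    if kinship > THIRD_DEGREE_MIN:
        return "third-degree"
    return "unrelated"


# Expected kinship values for idealised relationships (not ALSPAC data).
for value in (0.5, 0.25, 0.125, 0.0625, 0.0):
    print(value, classify_kinship(value))
```

A biological father-offspring pair has an expected kinship coefficient of 0.25, which falls in the first-degree range.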
Freeze Docs
# This yaml file is a description of a freeze of a released version of a named alspac dataset
# It should conform to the schema https://github.com/alspac/alspac-data-catalogue-schema
id: alspacdcs:gi_1000g_g0p_2016-11-22_f6
name: Genome-wide - 1000G imputed - G0 partners version 2016-11-22 freeze 6
description: >-
This dataset is the sixth freeze of 2016-11-22 version of the Genome-wide array data imputed to the 1000 genomes reference panel
for G0 partners, with some additional G0 mothers and G1 individuals.
freeze_size: 44G
linker_file_md5sum: 45415c7d4fae355b4fb2d6ccd042620d
woc_file_md5sum: 6c887db8c7dd10cc695630ca73b41405
all_individuals_to_exclude_md5sum: e4efce63f9f671548d08c8bb2f9cc4f7
git_tag: https://github.com/alspac/dataset_gi_1000g_g0p/releases/tag/freeze6
is_current_freeze: true
freeze_number: 6
freeze_date: 2025-09-30
previous_freeze: alspacdcs:gi_1000g_g0p_2016-11-22_f5
freeze_of_alspac_dataset_version: alspacdcs:gi_1000g_g0p_2016-11-22
freeze_of_named_alspac_dataset: alspacdcs:gi_1000g_g0p
contains:
- data
files: []
data:
contains:
- filtered_data_chr22.bgen
- filtered_data_chr21.bgen
- filtered_data_chr20.bgen
- filtered_data_chr19.bgen
- filtered_data_chr16.bgen
- filtered_data_chr18.bgen
- filtered_data_chr14.bgen
- filtered_data_chr17.bgen
- filtered_data_chr15.bgen
- filtered_data_chr13.bgen
- filtered_data_chr09.bgen
- filtered_data_chr12.bgen
- filtered_data_chr11.bgen
- filtered_data_chr10.bgen
- filtered_data_chr07.bgen
- filtered_data_chr08.bgen
- filtered_data_chr06.bgen
- filtered_data_chr04.bgen
- filtered_data_chr05.bgen
- filtered_data_chr02.bgen
- filtered_data_chr03.bgen
- filtered_data_chr01.bgen
- swapped.sample
files:
- id: alspacdcs:88d6c1fb-637e-45e6-aed7-b3a314f01b43
name: filtered_data_chr22.bgen
md5sum: 824412e963441699f260c6245f65659d
filesize: 721.5MB
filetype: .bgen
number_of_variants: 366590
number_of_participants: 2198
belongs_to: data
- id: alspacdcs:4df43797-f090-4a69-b8f0-eb27dde2b726
name: filtered_data_chr21.bgen
md5sum: 7881bdc24e7f0adbfb800b49d1efd590
filesize: 671.1MB
filetype: .bgen
number_of_variants: 378064
number_of_participants: 2198
belongs_to: data
- id: alspacdcs:3a69092e-67f8-40e1-8f48-bd8a0ef43806
name: filtered_data_chr20.bgen
md5sum: d241eb21be3188c26c460e1f65f0d8c1
filesize: 1.1GB
filetype: .bgen
number_of_variants: 618749
number_of_participants: 2198
belongs_to: data
- id: alspacdcs:2c1484b3-3a79-47c4-957b-eb7dc0a0a343
name: filtered_data_chr19.bgen
md5sum: 37ea045cd9f4027cba547b7b89c3a1a0
filesize: 1.2GB
filetype: .bgen
number_of_variants: 606147
number_of_participants: 2198
belongs_to: data
- id: alspacdcs:127c1e60-f254-4f6f-af8b-6a5fc78f1c12
name: filtered_data_chr16.bgen
md5sum: 52f065575d3cb2dff34df6763a583766
filesize: 1.5GB
filetype: .bgen
number_of_variants: 867901
number_of_participants: 2198
belongs_to: data
- id: alspacdcs:a4d6a3eb-9f36-4c56-b79a-b48b9bb04772
name: filtered_data_chr18.bgen
md5sum: b8e055a6c0955bb67161c9f7a1d8cad7
filesize: 1.3GB
filetype: .bgen
number_of_variants: 783661
number_of_participants: 2198
belongs_to: data
- id: alspacdcs:588abf2e-4186-4fbb-aca3-9e9e8a0d7b33
name: filtered_data_chr14.bgen
md5sum: 1ecd96aab2925bafd7d20497d85dd937
filesize: 1.4GB
filetype: .bgen
number_of_variants: 903811
number_of_participants: 2198
belongs_to: data
- id: alspacdcs:a2c4b90b-c8d5-4781-86fc-30abb3dda4ca
name: filtered_data_chr17.bgen
md5sum: 73d85caf67dcedc63b11a43bd5ccb44d
filesize: 1.4GB
filetype: .bgen
number_of_variants: 755467
number_of_participants: 2198
belongs_to: data
- id: alspacdcs:1494da4f-c326-4be9-bd9a-1042fb06339d
name: filtered_data_chr15.bgen
md5sum: f8c5b54206189808e9a361cc0da63798
filesize: 1.4GB
filetype: .bgen
number_of_variants: 814028
number_of_participants: 2198
belongs_to: data
- id: alspacdcs:f31abcf8-9570-4e8e-a3cb-3f6064a6a362
name: filtered_data_chr13.bgen
md5sum: 176a10d38ab80783a8e392e5791edea7
filesize: 1.5GB
filetype: .bgen
number_of_variants: 988473
number_of_participants: 2198
belongs_to: data
- id: alspacdcs:f6fd76fe-aec2-460e-9e16-b131e0a91776
name: filtered_data_chr09.bgen
md5sum: 82a480f3e8792db2c1cec3adc50e1357
filesize: 1.9GB
filetype: .bgen
number_of_variants: 1189463
number_of_participants: 2198
belongs_to: data
- id: alspacdcs:b9b4f93b-3ee7-404f-9b24-bd3b6d3b4736
name: filtered_data_chr12.bgen
md5sum: 509202db22200fe0bd58210ab8e9c757
filesize: 2.1GB
filetype: .bgen
number_of_variants: 1316510
number_of_participants: 2198
belongs_to: data
- id: alspacdcs:7c8064bb-916e-4e4f-a36b-225234db223b
name: filtered_data_chr11.bgen
md5sum: b1b7e3bef0fe72cd90bd0ba456f687aa
filesize: 2.1GB
filetype: .bgen
number_of_variants: 1359640
number_of_participants: 2198
belongs_to: data
- id: alspacdcs:6c6e37e4-78cd-46c1-b784-db9aa41e00ff
name: filtered_data_chr10.bgen
md5sum: 8f64fe184e4c876a345a728ed5eeddcf
filesize: 2.1GB
filetype: .bgen
number_of_variants: 1363104
number_of_participants: 2198
belongs_to: data
- id: alspacdcs:1f2be053-4a68-440a-b328-f806d7ab6790
name: filtered_data_chr07.bgen
md5sum: f832922558eddcf3feed87091c2ec0ae
filesize: 2.6GB
filetype: .bgen
number_of_variants: 1601293
number_of_participants: 2198
belongs_to: data
- id: alspacdcs:4ab48954-47ad-4138-bc9a-2e2b65df9ec5
name: filtered_data_chr08.bgen
md5sum: 47d79712e676a0048f90858cbb888179
filesize: 2.3GB
filetype: .bgen
number_of_variants: 1558902
number_of_participants: 2198
belongs_to: data
- id: alspacdcs:021c15de-7ffd-4ee2-b834-8f9e8411dd04
name: filtered_data_chr06.bgen
md5sum: a9327ad1591fdf7d349b066544e71c3a
filesize: 2.6GB
filetype: .bgen
number_of_variants: 1758025
number_of_participants: 2198
belongs_to: data
- id: alspacdcs:12db13fe-2594-45b1-bccd-f6fe916cb6b7
name: filtered_data_chr04.bgen
md5sum: 514f09f02c74fc3eca83379e9e99c5dc
filesize: 3.1GB
filetype: .bgen
number_of_variants: 1969883
number_of_participants: 2198
belongs_to: data
- id: alspacdcs:2f9c9b1f-553a-48ab-973e-cad46335845f
name: filtered_data_chr05.bgen
md5sum: f4accbf5bdd6a2ccc9598e9e2221915d
filesize: 2.7GB
filetype: .bgen
number_of_variants: 1809961
number_of_participants: 2198
belongs_to: data
- id: alspacdcs:250aba37-8329-430a-9065-37e3fa65494e
name: filtered_data_chr02.bgen
md5sum: e297c8d30455053d23ac360bcc886bb0
filesize: 3.5GB
filetype: .bgen
number_of_variants: 2349883
number_of_participants: 2198
belongs_to: data
- id: alspacdcs:9c547458-6640-4bb3-972e-408f607047f7
name: filtered_data_chr03.bgen
md5sum: c0b55e9d65c219ffb1b8c58a0ebb7c18
filesize: 3.0GB
filetype: .bgen
number_of_variants: 1969275
number_of_participants: 2198
belongs_to: data
- id: alspacdcs:4acc911f-cc1b-419e-8f16-f178f079229a
name: filtered_data_chr01.bgen
md5sum: a5eb049e4df5a8b005ae51b47947d830
filesize: 3.3GB
filetype: .bgen
number_of_variants: 2159337
number_of_participants: 2198
belongs_to: data
- id: alspacdcs:859a623f-b408-4cb5-81b7-4f48da58e7b6
name: swapped.sample
md5sum: 1bf22d5d9118fc1479199f108af11138
filesize: 164.9KB
filetype: .sample
number_of_participants: 2198
belongs_to: data
Genome-wide - 1000G imputed - G0 mothers + G1 (gi_1000g_g0m_g1)
Description
This dataset contains genome-wide 1000G imputed data for G0 mothers +
G1. The data have been cleaned, flipped to the positive strand, placed in
b37 coordinates and imputed to 1000 genomes phase 1 version 3.
Reference genome build: GRCh37
Methodology
ALSPAC children were genotyped using the Illumina HumanHap550 quad chip genotyping platforms by 23andme subcontracting the Wellcome Trust Sanger Institute, Cambridge, UK and the Laboratory Corporation of America, Burlington, NC, US. The resulting raw genome-wide data were subjected to standard quality control methods. Individuals were excluded on the basis of gender mismatches; minimal or excessive heterozygosity; disproportionate levels of individual missingness (>3%) and insufficient sample replication (IBD < 0.8).
Population stratification was assessed by multidimensional scaling analysis and compared with Hapmap II (release 22) European descent (CEU), Han Chinese, Japanese and Yoruba reference populations; all individuals with non-European ancestry were removed.
SNPs with a minor allele frequency of < 1%, a call rate of < 95% or evidence for violations of Hardy-Weinberg equilibrium (P < 5E-7) were removed. 9,115 subjects and 500,527 SNPs passed these quality control filters.
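The variant-level thresholds above can be sketched as a simple predicate. This is an illustration only: the `VariantStats` record and its field names are hypothetical, and the original quality control was not necessarily implemented this way.

```python
from dataclasses import dataclass

# Thresholds quoted in the text for the G1 (children) array data.
MIN_MAF = 0.01        # SNPs with MAF < 1% were removed
MIN_CALL_RATE = 0.95  # SNPs with a call rate < 95% were removed
HWE_P_CUTOFF = 5e-7   # SNPs with HWE P < 5E-7 were removed


@dataclass
class VariantStats:
    """Hypothetical per-SNP summary; the field names are illustrative."""
    snp_id: str
    maf: float
    call_rate: float
    hwe_p: float


def passes_variant_qc(v: VariantStats) -> bool:
    """Return True if the SNP survives all three variant-level filters."""
    return (
        v.maf >= MIN_MAF
        and v.call_rate >= MIN_CALL_RATE
        and v.hwe_p >= HWE_P_CUTOFF
    )
```

A SNP failing any one of the three conditions is dropped, which mirrors the "or" in the description above.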
ALSPAC mothers were genotyped using the Illumina human660W-quad array at Centre National de Génotypage (CNG) and genotypes were called with Illumina GenomeStudio. PLINK (v1.07) was used to carry out quality control measures on an initial set of 10,015 subjects and 557,124 directly genotyped SNPs. SNPs were removed if they displayed more than 5% missingness or a Hardy-Weinberg equilibrium P value of less than 1.0e-06. Additionally SNPs with a minor allele frequency of less than 1% were removed.
Samples were excluded if they displayed more than 5% missingness, had indeterminate X chromosome heterozygosity or extreme autosomal heterozygosity. Samples showing evidence of population stratification were identified by multidimensional scaling of genome-wide identity by state pairwise distances using the four HapMap populations as a reference, and then excluded.
9,048 subjects and 526,688 SNPs passed these quality control filters.
We combined 477,482 SNP genotypes in common between the sample of mothers and sample of children. We removed SNPs with genotype missingness above 1% due to poor quality (11,396 SNPs removed) and removed a further 321 subjects due to potential ID mismatches. This resulted in a dataset of 17,842 subjects containing 6,305 duos and 465,740 SNPs (112 were removed during liftover and 234 were out of HWE after combination). We estimated haplotypes using ShapeIT (v2.r644), which utilises relatedness during phasing. We obtained a phased version of the 1000 genomes reference panel (Phase 1, Version 3) from the Impute2 reference data repository (phased using ShapeIT v2.r644, haplotype release date Dec 2013). Imputation of the target data was performed using Impute V2.2.2 against the reference panel (all polymorphic SNPs excluding singletons), using all 2186 reference haplotypes (including non-Europeans).
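The counts quoted above reconcile with one another, as the short check below shows. It assumes that the subject total is the sum of the two QC-passed samples minus the 321 removed subjects; the text does not state this explicitly, but the numbers agree.

```python
# Counts quoted in the methodology text.
children_passing_qc = 9_115
mothers_passing_qc = 9_048
removed_id_mismatch = 321

shared_snps = 477_482
removed_missingness = 11_396
removed_liftover = 112
removed_hwe = 234

# Assumption: subjects = children + mothers - ID-mismatch removals.
subjects = children_passing_qc + mothers_passing_qc - removed_id_mismatch
snps = shared_snps - removed_missingness - removed_liftover - removed_hwe

print(subjects)  # 17842
print(snps)      # 465740
```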
This gave 8,237 eligible children and 8,196 eligible mothers with available genotype data after exclusion of related subjects using cryptic relatedness measures described previously.
Known issues: There is a known strand issue present within this imputation: the Dec 2013 haplotype release of 1000 genomes phase 1 version 3 has 199 reported SNPs with incorrect strand. For more information and the origins of this list please visit https://mathgen.stats.ox.ac.uk/impute/data_download_1000G_phase1_integrated_SHAPEIT2_16-06-14.html. It is very unlikely that they have systematic effects across the genome; the problem is most probably isolated to these 199 known problematic SNPs. The user is advised to discard them from their analysis.
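Discarding the listed SNPs amounts to a set-difference on variant identifiers. The sketch below assumes the exclusion list has been saved locally as one identifier per line; the helper names and the example identifiers are hypothetical, and the real list must be obtained from the page linked above.

```python
def load_ids(lines):
    """Collect non-empty, stripped IDs from an iterable of text lines."""
    return {line.strip() for line in lines if line.strip()}


def drop_excluded(variant_ids, excluded_ids):
    """Return the variant IDs that are not in the exclusion set, in order."""
    excluded = set(excluded_ids)
    return [vid for vid in variant_ids if vid not in excluded]


# Hypothetical identifiers for illustration only.
excluded = load_ids(["rs_example_1\n", "rs_example_3\n", "\n"])
kept = drop_excluded(["rs_example_1", "rs_example_2", "rs_example_3"], excluded)
print(kept)  # ['rs_example_2']
```

In practice `load_ids` could be given an open file handle, since it only iterates over lines.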
The bgen files within the gi_1000g_g0m_g1 dataset have NA in place of the chromosome column. Some tools accept this, while others do not, so users may wish to re-format the dataset (using QCtool or equivalent) for their work.
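One possible way to prepare such a re-formatting step is to recover the chromosome from the file name and pass it to the tool. The sketch below only builds a command line and does not run it. It assumes the `filtered_NN.bgen` naming used in the freeze documentation, and it assumes QCTOOL v2 options `-g`, `-s`, `-og` and `-assume-chromosome`; these should be checked against `qctool -help` for the installed version. Whether code 23 should be written as X is left to the user.

```python
import re
from pathlib import Path


def chromosome_from_name(filename: str) -> str:
    """Extract the chromosome code from a name like 'filtered_07.bgen'."""
    match = re.fullmatch(r"filtered_(\d{2})\.bgen", Path(filename).name)
    if match is None:
        raise ValueError(f"unexpected file name: {filename}")
    # Strip the leading zero, e.g. '07' -> '7'.
    return str(int(match.group(1)))


def qctool_command(bgen_path: str, sample_path: str, out_path: str) -> list[str]:
    """Build (but do not execute) a re-formatting command.

    The option names are assumptions about QCTOOL v2, not taken from
    the catalogue; verify them before use.
    """
    chrom = chromosome_from_name(bgen_path)
    return [
        "qctool",
        "-g", bgen_path,
        "-s", sample_path,
        "-assume-chromosome", chrom,
        "-og", out_path,
    ]
```

The returned list can be passed to `subprocess.run` once the options have been confirmed.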
Allele frequency concordance with other cohorts: When contributing to consortia you may find that the allele frequencies in ALSPAC for a few thousand SNPs are discordant from a reference panel used by the consortium. This is to be expected: when allele frequencies are calculated in two different samples for many millions of SNPs, even samples from the same population, a number of SNPs will appear highly discordant by chance alone.
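A rough calculation shows why some discordance is expected by chance. The sketch below uses a normal approximation for the difference between two allele frequency estimates drawn from the same population; the SNP count, sample sizes and z thresholds used in any call are illustrative assumptions, not values from this catalogue.

```python
import math


def se_of_difference(p: float, n1: int, n2: int) -> float:
    """Standard error of the difference between two allele frequency
    estimates for a SNP with true frequency p, in samples of n1 and n2
    diploid individuals (2*n alleles each)."""
    return math.sqrt(p * (1 - p) * (1 / (2 * n1) + 1 / (2 * n2)))


def expected_discordant(n_snps: int, z: float) -> float:
    """Expected number of SNPs whose frequency difference exceeds z
    standard errors by chance alone (two-sided normal tail)."""
    return n_snps * math.erfc(z / math.sqrt(2))


# Illustrative only: ten million SNPs, flagged at 3 or 4 standard errors.
for z in (3.0, 4.0):
    print(z, round(expected_discordant(10_000_000, z)))
```

Even with no true difference between the samples, a threshold of a few standard errors flags hundreds to tens of thousands of SNPs at this scale.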
Freeze Docs
# This yaml file is a description of a freeze of a released version of a named alspac dataset
# It should conform to the schema https://github.com/alspac/alspac-data-catalogue-schema
id: alspacdcs:gi_1000g_g0m_g1_2015-10-30_f6
name: >-
Genome-wide - 1000G imputed - G0 mothers + G1 version 2015-10-30
freeze 6
description: >-
This is the sixth freeze of the 2015-10-30 version of the
gi_1000g_g0m_g1 dataset. It contains data in the Oxford format,
which is a combination of bgen and sample (version 1.2) files. It is a subset of
the data in gi_1000g_g0m_g1_2015-10-30 limited to one format and
with participants who have withdrawn their consent removed.
The Dec 2013 haplotype release of 1000 genomes phase 1 version 3 has 199 reported SNPs
with incorrect strand. The strand issues are present in this imputation version. For more
information and the origins of this list please visit:
https://mathgen.stats.ox.ac.uk/impute/data_download_1000G_phase1_integrated_SHAPEIT2_16-06-14.html
It is very unlikely that they have systematic effects across the genome and most
probably are just isolated to these 199 known problematic SNPs.
The user is advised to discard them from their analysis.
freeze_size: 123G
linker_file_md5sum: 45415c7d4fae355b4fb2d6ccd042620d
woc_file_md5sum: 6c887db8c7dd10cc695630ca73b41405
all_individuals_to_exclude_md5sum: e4efce63f9f671548d08c8bb2f9cc4f7
git_tag: https://github.com/alspac/dataset_gi_1000g_g0m_g1/releases/tag/freeze6
is_current_freeze: true
freeze_number: 6
freeze_date: 2025-09-30
previous_freeze: alspacdcs:gi_1000g_g0m_g1_2015-10-30_f5
freeze_of_alspac_dataset_version: alspacdcs:gi_1000g_g0m_g1_2015-10-30
freeze_of_named_alspac_dataset: alspacdcs:gi_1000g_g0m_g1
contains:
- data
files: []
data:
contains:
- filtered_22.bgen
- filtered_21.bgen
- filtered_20.bgen
- filtered_19.bgen
- filtered_17.bgen
- filtered_18.bgen
- filtered_15.bgen
- filtered_16.bgen
- filtered_14.bgen
- filtered_13.bgen
- filtered_09.bgen
- filtered_23.bgen
- filtered_12.bgen
- filtered_11.bgen
- filtered_10.bgen
- filtered_08.bgen
- filtered_07.bgen
- filtered_06.bgen
- filtered_05.bgen
- filtered_03.bgen
- filtered_04.bgen
- filtered_01.bgen
- filtered_02.bgen
- swapped.sample
files:
- id: alspacdcs:a4c65023-c1e8-483e-af6a-a5eda202135b
name: filtered_22.bgen
md5sum: fe115a073819d3ddca57180a314edf96
filesize: 2.0GB
filetype: .bgen
number_of_variants: 365644
number_of_participants: 17443
belongs_to: data
- id: alspacdcs:e189d2bf-679c-4d71-ab1b-5911d0680689
name: filtered_21.bgen
md5sum: 7d481004542668f9bfec0cc9a6f23205
filesize: 1.9GB
filetype: .bgen
number_of_variants: 377554
number_of_participants: 17443
belongs_to: data
- id: alspacdcs:8e5ff0c0-87a7-49d0-9602-aae35bc1f4d3
name: filtered_20.bgen
md5sum: 32d561ccc75a8cff9cfb7d0ff2f6beb5
filesize: 2.7GB
filetype: .bgen
number_of_variants: 617694
number_of_participants: 17443
belongs_to: data
- id: alspacdcs:f4852506-1b9f-4dd7-aa43-703931a0beae
name: filtered_19.bgen
md5sum: 9136b29ea7e9ccbdcb4ac7889fe8aef7
filesize: 3.9GB
filetype: .bgen
number_of_variants: 603516
number_of_participants: 17443
belongs_to: data
- id: alspacdcs:e1455c23-9df9-4073-93bc-fd99abe9837b
name: filtered_17.bgen
md5sum: aa761d8764e878d227a4af63c9748b63
filesize: 3.8GB
filetype: .bgen
number_of_variants: 753174
number_of_participants: 17443
belongs_to: data
- id: alspacdcs:2ff1ff0c-3531-495d-8efb-1333d09a586e
name: filtered_18.bgen
md5sum: e60407d3601e26584b9e8cbdefc1d62c
filesize: 3.4GB
filetype: .bgen
number_of_variants: 783010
number_of_participants: 17443
belongs_to: data
- id: alspacdcs:05db4f4c-9edf-46dd-a873-d411cb05bc99
name: filtered_15.bgen
md5sum: 618133a6ef0e5be6cbb9b20214d689d9
filesize: 3.7GB
filetype: .bgen
number_of_variants: 812545
number_of_participants: 17443
belongs_to: data
- id: alspacdcs:7cb98407-45d4-4484-8b9f-3c37402169c0
name: filtered_16.bgen
md5sum: 485eaa35595bd2d5b09ac112661a7e00
filesize: 4.3GB
filetype: .bgen
number_of_variants: 865998
number_of_participants: 17443
belongs_to: data
- id: alspacdcs:544d9814-32ad-4375-b7cd-11b0cd4b6191
name: filtered_14.bgen
md5sum: 36a40f49a0b30786fba809efe8fb515f
filesize: 3.9GB
filetype: .bgen
number_of_variants: 904351
number_of_participants: 17443
belongs_to: data
- id: alspacdcs:807ca4b7-b7fc-4a08-9fed-7505100e8e3a
name: filtered_13.bgen
md5sum: 0cd06c79431689b0abf3b611b4353054
filesize: 3.9GB
filetype: .bgen
number_of_variants: 987740
number_of_participants: 17443
belongs_to: data
- id: alspacdcs:3826dfea-edc3-4b5f-97f2-27f0b8984a1b
name: filtered_09.bgen
md5sum: aa484f17e3432cf848f8284842cf12d5
filesize: 5.0GB
filetype: .bgen
number_of_variants: 1187731
number_of_participants: 17443
belongs_to: data
- id: alspacdcs:94361599-9a8e-4d65-a64b-8dd2f205ecb3
name: filtered_23.bgen
md5sum: 3c60c10ed23c2d8e66999e6f736646da
filesize: 5.9GB
filetype: .bgen
number_of_variants: 1250218
number_of_participants: 17443
belongs_to: data
- id: alspacdcs:eb39d1c6-76c0-4c65-893d-7441d408874c
name: filtered_12.bgen
md5sum: ebfb0facd3f3e9329a1cec9d2edf035b
filesize: 5.3GB
filetype: .bgen
number_of_variants: 1314328
number_of_participants: 17443
belongs_to: data
- id: alspacdcs:a1a16d1a-f4fc-4fe8-8058-8bd9b94e9a02
name: filtered_11.bgen
md5sum: 34c90038607804acde536fbdcefb5f12
filesize: 5.3GB
filetype: .bgen
number_of_variants: 1356882
number_of_participants: 17443
belongs_to: data
- id: alspacdcs:02184bad-3ba9-4e29-b390-31192645fd5b
name: filtered_10.bgen
md5sum: 57e035cd8f5b67b99e7292482712f007
filesize: 5.4GB
filetype: .bgen
number_of_variants: 1361506
number_of_participants: 17443
belongs_to: data
- id: alspacdcs:d13bffcf-8879-4246-af07-8ccd87bb30da
name: filtered_08.bgen
md5sum: d79768b17b72de5f27ff7a65bc2f4f22
filesize: 5.9GB
filetype: .bgen
number_of_variants: 1557429
number_of_participants: 17443
belongs_to: data
- id: alspacdcs:e21e81ad-c2cc-457e-bfd9-370f40ced126
name: filtered_07.bgen
md5sum: d31560a8a8a2ae087ea92d81d85c337e
filesize: 7.1GB
filetype: .bgen
number_of_variants: 1599387
number_of_participants: 17443
belongs_to: data
- id: alspacdcs:5cd3c842-678f-4bac-921a-ca7e810df276
name: filtered_06.bgen
md5sum: 76bc20f38bb1c155375c38e597c501ab
filesize: 6.8GB
filetype: .bgen
number_of_variants: 1755859
number_of_participants: 17443
belongs_to: data
- id: alspacdcs:c835c01a-4641-4d85-8ae6-e2ebc5675cea
name: filtered_05.bgen
md5sum: 1c79aeefda8460272e4f964182f10afd
filesize: 6.8GB
filetype: .bgen
number_of_variants: 1808090
number_of_participants: 17443
belongs_to: data
- id: alspacdcs:c220949d-7fd4-4714-ab41-3638c77a45b7
name: filtered_03.bgen
md5sum: ab8430120ce8f09840e194b1a4649ea9
filesize: 7.6GB
filetype: .bgen
number_of_variants: 1966662
number_of_participants: 17443
belongs_to: data
- id: alspacdcs:76f8330c-390d-4f9a-8e0d-ba437fa5154a
name: filtered_04.bgen
md5sum: 064e18391df4c15af5c8a99dacccceae
filesize: 8.3GB
filetype: .bgen
number_of_variants: 1968171
number_of_participants: 17443
belongs_to: data
- id: alspacdcs:8ee96eba-2b3c-4fcb-b550-9e81a934938a
name: filtered_01.bgen
md5sum: 2d645050a449c6c9210f8c9948790555
filesize: 9.0GB
filetype: .bgen
number_of_variants: 2155158
number_of_participants: 17443
belongs_to: data
- id: alspacdcs:9e07e8aa-956e-4e5d-a505-412fe106c9a3
name: filtered_02.bgen
md5sum: 621d9fb9e88ee50f898372f0a17439d8
filesize: 9.1GB
filetype: .bgen
number_of_variants: 2346862
number_of_participants: 17443
belongs_to: data
- id: alspacdcs:f450474c-3688-44db-8fce-c11c1a229950
name: swapped.sample
md5sum: 8e3d90b5108bc3da7ede33e39718f57d
filesize: 1.2MB
filetype: .sample
number_of_participants: 17443
belongs_to: data
Genome-wide - TOPMed round 2 imputed - G0 mothers + G1 (gi_topmed_g0m_g1)
Description
This dataset contains genotype data imputed to TOPMed round 2 for G0
mothers and G1.
Reference genome build: GRCh38
Methodology
ALSPAC children were genotyped using the Illumina HumanHap550 quad chip genotyping platforms by 23andme subcontracting the Wellcome Trust Sanger Institute, Cambridge, UK and the Laboratory Corporation of America, Burlington, NC, US. The resulting raw genome-wide data were subjected to standard quality control methods. Individuals were excluded on the basis of gender mismatches; minimal or excessive heterozygosity; disproportionate levels of individual missingness (>3%) and insufficient sample replication (IBD < 0.8).
Population stratification was assessed by multidimensional scaling analysis and compared with Hapmap II (release 22) European descent (CEU), Han Chinese, Japanese and Yoruba reference populations; all individuals with non-European ancestry were removed.
SNPs with a minor allele frequency of < 1%, a call rate of < 95% or evidence for violations of Hardy-Weinberg equilibrium (P < 5E-7) were removed. Cryptic relatedness was measured as proportion of identity by descent (IBD > 0.1).
Related subjects that passed all other quality control thresholds were retained during subsequent phasing and imputation. 9,115 subjects and 500,527 SNPs passed these quality control filters.
ALSPAC mothers were genotyped using the Illumina human660W-quad array at Centre National de Génotypage (CNG) and genotypes were called with Illumina GenomeStudio. PLINK (v1.07) was used to carry out quality control measures on an initial set of 10,015 subjects and 557,124 directly genotyped SNPs. SNPs were removed if they displayed more than 5% missingness or a Hardy-Weinberg equilibrium P value of less than 1.0e-06. Additionally SNPs with a minor allele frequency of less than 1% were removed.
Samples were excluded if they displayed more than 5% missingness, had indeterminate X chromosome heterozygosity or extreme autosomal heterozygosity. Samples showing evidence of population stratification were identified by multidimensional scaling of genome-wide identity by state pairwise distances using the four HapMap populations as a reference, and then excluded.
9,048 subjects and 526,688 SNPs passed these quality control filters.
We combined 477,482 SNP genotypes in common between the sample of mothers and sample of children. We removed SNPs with genotype missingness above 1% due to poor quality (11,396 SNPs removed) and removed a further 321 subjects due to potential ID mismatches. This resulted in a dataset of 17,842 subjects containing 6,305 duos and 465,740 SNPs (112 were removed during liftOver and 234 were out of HWE after combination).
Individuals in this dataset who had withdrawn from the project were removed before imputation-specific quality control. This left 17450 individuals.
The combined mothers and children genotype panel was filtered using Plink 2.0 to remove SNPs with MAF below 0.01 or missing call rates exceeding 0.01. The joint set of SNPs was checked for palindromic SNPs, but none were present. The combined call set was then lifted from GRCh37 to GRCh38 using UCSC liftOver.
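A palindromic (strand-ambiguous) SNP is one whose two alleles are complementary bases, so flipping strand leaves the allele pair unchanged. A minimal sketch of such a check follows; the function name is hypothetical and this is not the code used to produce the dataset.

```python
# Watson-Crick complements for the four bases.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def is_palindromic(allele1: str, allele2: str) -> bool:
    """Return True for A/T, T/A, C/G and G/C allele pairs."""
    a1, a2 = allele1.upper(), allele2.upper()
    # Multi-base or unknown alleles have no entry and are never flagged.
    return a1 in COMPLEMENT and COMPLEMENT[a1] == a2


print(is_palindromic("A", "T"))  # True
print(is_palindromic("A", "G"))  # False
```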
The dataset was then filtered to retain SNPs with a HWE P value above 1e-6, leaving 455150 SNPs. The combined autosomal call set was converted to VCF files and uploaded to the TOPMed imputation server to flag variants requiring a strand fix. Any SNPs flagged with an issue were corrected or filtered out using Plink2. 454248 SNPs remained within the autosomes.
Phasing and imputation were conducted on the Michigan TOPMed imputation server (v1.7.4) in October 2023. Phasing was done using Eagle (v2.4). Imputation to TOPMed R2 was done using minimac4 (v1.0.2). An R squared filter of 0.3 was applied.
Freeze Docs
# This yaml file is a description of a freeze of a released version of a named alspac dataset
# It should conform to the schema https://github.com/alspac/alspac-data-catalogue-schema
id: alspacdcs:gi_topmed_g0m_g1_2025-07-25_f6
name: >-
Genome-wide - TOPmed imputed - G0 mothers + G1 version 2025-07-25
freeze 6
description: >-
Freeze 6 of version 2025-07-25: genome-wide array data imputed to the TOPMed round 2 reference panel for G0 mothers and G1 individuals, in bgen and sample file format (version 1.2).
In the 2025-07-25 version of the dataset, all monomorphic variants have been filtered out to reduce overall size.
freeze_size: 102G
linker_file_md5sum: 45415c7d4fae355b4fb2d6ccd042620d
woc_file_md5sum: 6c887db8c7dd10cc695630ca73b41405
all_individuals_to_exclude_md5sum: e4efce63f9f671548d08c8bb2f9cc4f7
git_tag: https://github.com/alspac/dataset_gi_topmed_g0m_g1/releases/tag/freeze6
is_current_freeze: true
freeze_number: 6
freeze_date: 2025-09-30
previous_freeze: alspacdcs:gi_topmed_g0m_g1_2024-12-19_f5
freeze_of_alspac_dataset_version: alspacdcs:gi_topmed_g0m_g1_2025-07-25
freeze_of_named_alspac_dataset: alspacdcs:gi_topmed_g0m_g1
contains:
- data
files: []
data:
contains:
- chr22_freeze.bgen
- chr21_freeze.bgen
- chr20_freeze.bgen
- chr19_freeze.bgen
- chr15_freeze.bgen
- chr16_freeze.bgen
- chr17_freeze.bgen
- chr18_freeze.bgen
- chr14_freeze.bgen
- chr6_freeze.bgen
- chr11_freeze.bgen
- chr9_freeze.bgen
- chr7_freeze.bgen
- chr13_freeze.bgen
- chr12_freeze.bgen
- chr8_freeze.bgen
- chr10_freeze.bgen
- chr3_freeze.bgen
- chr5_freeze.bgen
- chr1_freeze.bgen
- chr4_freeze.bgen
- chr2_freeze.bgen
- freeze.sample
files:
- id: alspacdcs:bc269aaf-b168-4486-8fb9-9765872ace28
name: chr22_freeze.bgen
md5sum: dfa50660994f12fb75c6555ff5a8aecb
filesize: 1.6GB
filetype: .bgen
number_of_variants: 962561
number_of_participants: 17443
belongs_to: data
- id: alspacdcs:66e69d02-4806-43fa-ba36-61cb82c32f4f
name: chr21_freeze.bgen
md5sum: f5e9aff73c2f8e53827dd2436a2617ed
filesize: 1.5GB
filetype: .bgen
number_of_variants: 900622
number_of_participants: 17443
belongs_to: data
- id: alspacdcs:18df2c84-cf46-4826-96fd-a1eb764104a7
name: chr20_freeze.bgen
md5sum: d20992c1426cd71ba5347438e4796b04
filesize: 2.4GB
filetype: .bgen
number_of_variants: 1553082
number_of_participants: 17443
belongs_to: data
- id: alspacdcs:f2bf3956-efe5-4e8c-b661-57224158a638
name: chr19_freeze.bgen
md5sum: 2b411c0382a710f441d1a4facff85377
filesize: 2.8GB
filetype: .bgen
number_of_variants: 1545576
number_of_participants: 17443
belongs_to: data
- id: alspacdcs:bf62d97c-7a23-423b-b058-4cc1399a22aa
name: chr15_freeze.bgen
md5sum: 2ba7b10c3acdbc05068324b5e6c49e64
filesize: 3.0GB
filetype: .bgen
number_of_variants: 1991728
number_of_participants: 17443
belongs_to: data
- id: alspacdcs:507db985-2ff1-47f8-83cb-3a0e9c23e73c
name: chr16_freeze.bgen
md5sum: fb6c49d8517a13f2fe100ec945cae487
filesize: 3.4GB
filetype: .bgen
number_of_variants: 2182386
number_of_participants: 17443
belongs_to: data
- id: alspacdcs:a10bbf67-2c3b-4500-9c9a-cf412bf8e5bf
name: chr17_freeze.bgen
md5sum: 7e7b431d7a9a56854437bc0e508224c9
filesize: 3.1GB
filetype: .bgen
number_of_variants: 1960949
number_of_participants: 17443
belongs_to: data
- id: alspacdcs:73ff5630-9dae-48fc-a1c7-393389a1d5b9
name: chr18_freeze.bgen
md5sum: 46b5223f3aab805ff1472c8170f16246
filesize: 3.0GB
filetype: .bgen
number_of_variants: 1917077
number_of_participants: 17443
belongs_to: data
- id: alspacdcs:198a0ce9-b36e-4283-882e-2c340083a7c8
name: chr14_freeze.bgen
md5sum: 31e3d2a5e3025ff295732b81fb8eb67e
filesize: 3.2GB
filetype: .bgen
number_of_variants: 2168141
number_of_participants: 17443
belongs_to: data
- id: alspacdcs:a77b0371-6a4c-4cb8-9efd-6554fcc65687
name: chr6_freeze.bgen
md5sum: 7ad4e779bec8ff015b6f456fbd2168a1
filesize: 6.0GB
filetype: .bgen
number_of_variants: 4170487
number_of_participants: 17443
belongs_to: data
- id: alspacdcs:5c029c98-e44f-45dd-aa3d-b55fabc4825c
name: chr11_freeze.bgen
md5sum: 4c0bb18bd8b37d03167dba5b5832c73d
filesize: 5.0GB
filetype: .bgen
number_of_variants: 3361214
number_of_participants: 17443
belongs_to: data
- id: alspacdcs:55d0c079-6544-42d4-a777-128384faccba
name: chr9_freeze.bgen
md5sum: 7de2977c6dc18dbfea7f13a1fc935a2a
filesize: 4.3GB
filetype: .bgen
number_of_variants: 2996234
number_of_participants: 17443
belongs_to: data
- id: alspacdcs:080fa42b-02ec-4ae7-ade0-ae1a7c3c3d60
name: chr7_freeze.bgen
md5sum: f24b5dca4f7eb1f3f0352f4921d81439
filesize: 6.1GB
filetype: .bgen
number_of_variants: 3924564
number_of_participants: 17443
belongs_to: data
- id: alspacdcs:5f06e0fd-682c-4c2f-970e-0787dbc6402b
name: chr13_freeze.bgen
md5sum: bd34a1d9ee2d3793753d40c941065ebe
filesize: 3.6GB
filetype: .bgen
number_of_variants: 2429492
number_of_participants: 17443
belongs_to: data
- id: alspacdcs:78d407d7-0af8-4829-b814-5b8daf36c820
name: chr12_freeze.bgen
md5sum: af25a87a5c448baa1afabc0e863570d4
filesize: 4.8GB
filetype: .bgen
number_of_variants: 3247986
number_of_participants: 17443
belongs_to: data
- id: alspacdcs:35de9a9c-6ce6-4fdc-9595-0b53664187c0
name: chr8_freeze.bgen
md5sum: 5ff96ffda3f0f0eb369488cba3daf90c
filesize: 5.3GB
filetype: .bgen
number_of_variants: 3767813
number_of_participants: 17443
belongs_to: data
- id: alspacdcs:75358122-201b-40c6-8301-f6b52f9dafad
name: chr10_freeze.bgen
md5sum: 49c9fec775b0a100122c55fd4dbcb1e6
filesize: 5.0GB
filetype: .bgen
number_of_variants: 3328581
number_of_participants: 17443
belongs_to: data
- id: alspacdcs:2d2df7b7-21d0-4594-98c2-b13b88d8bb6e
name: chr3_freeze.bgen
md5sum: 968e0450a13df1db361bb7f051eb15d6
filesize: 7.0GB
filetype: .bgen
number_of_variants: 4839527
number_of_participants: 17443
belongs_to: data
- id: alspacdcs:789c55ab-434b-4a57-8cf2-a06032d26655
name: chr5_freeze.bgen
md5sum: efb585b055f4fc6152539451536fdce7
filesize: 6.3GB
filetype: .bgen
number_of_variants: 4361228
number_of_participants: 17443
belongs_to: data
- id: alspacdcs:ab21d436-44ac-4644-9423-ea97d8d30621
name: chr1_freeze.bgen
md5sum: 71d10371666edf730b3a9bbbba4d9656
filesize: 7.9GB
filetype: .bgen
number_of_variants: 5442779
number_of_participants: 17443
belongs_to: data
- id: alspacdcs:e2d62c6c-1091-4167-80a5-3b9a1d7bdc76
name: chr4_freeze.bgen
md5sum: 061a70600f6dc402186c4e6d5a466e36
filesize: 7.5GB
filetype: .bgen
number_of_variants: 4721728
number_of_participants: 17443
belongs_to: data
- id: alspacdcs:ceb776ea-f450-4941-a824-5c1f2f5b5170
name: chr2_freeze.bgen
md5sum: 9f34455ed28304c165f40516d7ba6a28
filesize: 8.2GB
filetype: .bgen
number_of_variants: 5860289
number_of_participants: 17443
belongs_to: data
- id: alspacdcs:f8a3e43c-2a74-43e5-9b56-10b38cf07d23
name: freeze.sample
md5sum: 4dd920b481fbd03cb9cde07d05fd0e40
filesize: 953.9KB
filetype: .sample
number_of_participants: 17443
belongs_to: data
Sequence Data
Whole genome sequencing - G1 (wgs_hiseq_g1)
Description
This dataset contains whole genome sequencing for G1 individuals,
part of the UK10K dataset.
Reference genome build: GRCh37
Methodology
ALSPAC and TwinsUK cohorts were sequenced at an average read depth of 6.7x through the UK10K program (http://www.UK10K.org) using the Illumina HiSeq platform, and aligned to the GRCh37 human reference using BWA. SNV calls were made using samtools/bcftools, and VQSR and GATK were used to recall these calls.
Associated publication:
- http://www.ncbi.nlm.nih.gov/pubmed/26367797
Please ensure you have permission to access this data (http://www.uk10k.org/data_access.html) before using it.
Freeze Docs
# This yaml file is a description of a freeze of a released version of a named alspac dataset
# It should conform to the schema https://github.com/alspac/alspac-data-catalogue-schema
id: alspacdcs:wgs_hiseq_g1_2016-08-18_f6
name: Whole genome sequencing - G1 version 2016-08-18 freeze 6
description: >-
This is freeze 6 of version 2016-08-18 of the whole genome sequencing data for G1 individuals, part of the UK10K dataset.
freeze_size: 341G
linker_file_md5sum: 45415c7d4fae355b4fb2d6ccd042620d
woc_file_md5sum: 6c887db8c7dd10cc695630ca73b41405
all_individuals_to_exclude_md5sum: e4efce63f9f671548d08c8bb2f9cc4f7
git_tag: https://github.com/alspac/dataset_wgs_hiseq_g1/releases/tag/freeze6
is_current_freeze: true
freeze_number: 6
freeze_date: 2025-09-30
previous_freeze: alspacdcs:wgs_hiseq_g1_2016-08-18_f5
freeze_of_alspac_dataset_version: alspacdcs:wgs_hiseq_g1_2016-08-18
freeze_of_named_alspac_dataset: alspacdcs:wgs_hiseq_g1
contains:
- data
files: []
data:
contains:
- 4_freeze.vcf.gz.csi
- 3_freeze.vcf.gz.csi
- 7_freeze.vcf.gz.csi
- 17_freeze.vcf.gz.csi
- 20_freeze.vcf.gz.csi
- 15_freeze.vcf.gz.csi
- 9_freeze.vcf.gz.csi
- 16_freeze.vcf.gz.csi
- 12_freeze.vcf.gz.csi
- 10_freeze.vcf.gz.csi
- 6_freeze.vcf.gz.csi
- X_freeze.vcf.gz.csi
- 8_freeze.vcf.gz.csi
- 21_freeze.vcf.gz.csi
- 19_freeze.vcf.gz.csi
- 1_freeze.vcf.gz.csi
- 11_freeze.vcf.gz.csi
- 13_freeze.vcf.gz.csi
- 18_freeze.vcf.gz.csi
- 5_freeze.vcf.gz.csi
- 2_freeze.vcf.gz.csi
- 22_freeze.vcf.gz.csi
- 14_freeze.vcf.gz.csi
- 21_freeze.vcf.gz
- 22_freeze.vcf.gz
- 19_freeze.vcf.gz
- 20_freeze.vcf.gz
- 15_freeze.vcf.gz
- 17_freeze.vcf.gz
- 14_freeze.vcf.gz
- 18_freeze.vcf.gz
- 16_freeze.vcf.gz
- X_freeze.vcf.gz
- 13_freeze.vcf.gz
- 9_freeze.vcf.gz
- 10_freeze.vcf.gz
- 8_freeze.vcf.gz
- 12_freeze.vcf.gz
- 11_freeze.vcf.gz
- 6_freeze.vcf.gz
- 5_freeze.vcf.gz
- 7_freeze.vcf.gz
- 4_freeze.vcf.gz
- 3_freeze.vcf.gz
- 1_freeze.vcf.gz
- 2_freeze.vcf.gz
files:
- id: alspacdcs:bc517e41-e006-4bb4-98d6-561234d4c927
name: 4_freeze.vcf.gz.csi
md5sum: 1e96e09dda062a07d0e6dbed3d620609
filesize: 122.6KB
filetype: .csi
belongs_to: data
- id: alspacdcs:657dbf85-17d4-48d2-9ef6-7eff00bf0e62
name: 3_freeze.vcf.gz.csi
md5sum: 710268ee23b70a2f4a7692c016d23954
filesize: 127.9KB
filetype: .csi
belongs_to: data
- id: alspacdcs:1d4e1d33-6af9-458f-9b80-6387ec4718ae
name: 7_freeze.vcf.gz.csi
md5sum: 79858dec4f5980281600065acc55645a
filesize: 101.8KB
filetype: .csi
belongs_to: data
- id: alspacdcs:49f0b8f5-d6de-43c2-b0d9-aebddb7e95a1
name: 17_freeze.vcf.gz.csi
md5sum: e55c23ef41e971f4cbba256ab90f6c0e
filesize: 49.9KB
filetype: .csi
belongs_to: data
- id: alspacdcs:38301176-dce1-4712-997a-38e165d0e86b
name: 20_freeze.vcf.gz.csi
md5sum: 01abfd1020dfdd35ae4b8fafd887bb75
filesize: 38.2KB
filetype: .csi
belongs_to: data
- id: alspacdcs:d353fef6-de4b-4b8f-a6b9-cd1e8cb2d8b3
name: 15_freeze.vcf.gz.csi
md5sum: 59ed2877301a832f461cf72965a7456c
filesize: 51.7KB
filetype: .csi
belongs_to: data
- id: alspacdcs:6a990423-f1d8-4a78-9182-34fc1342067e
name: 9_freeze.vcf.gz.csi
md5sum: 8f582cdaf97496a225c064103b4966df
filesize: 75.4KB
filetype: .csi
belongs_to: data
- id: alspacdcs:81bca53a-7958-421e-969e-767f56f494d9
name: 16_freeze.vcf.gz.csi
md5sum: ac3a87b8284a80237a5612ce1c10763b
filesize: 50.4KB
filetype: .csi
belongs_to: data
- id: alspacdcs:85c02321-3454-4bf6-b5fc-b56262cf1be8
name: 12_freeze.vcf.gz.csi
md5sum: eadd942f5d3d41bbd6747d4ed1445fdf
filesize: 85.5KB
filetype: .csi
belongs_to: data
- id: alspacdcs:10019143-029a-4dd7-90b6-8f762d3e351a
name: 10_freeze.vcf.gz.csi
md5sum: c6a35b0f8ab981baba9f5f76cadb807f
filesize: 85.5KB
filetype: .csi
belongs_to: data
- id: alspacdcs:f3e4c4bb-b99e-4c9d-81ec-f6fed8fd4ccb
name: 6_freeze.vcf.gz.csi
md5sum: 66afc781a4738e294b7c1c71ee7d0bf0
filesize: 109.9KB
filetype: .csi
belongs_to: data
- id: alspacdcs:84116aa8-a76f-4af1-b138-2cdeb4c28c35
name: X_freeze.vcf.gz.csi
md5sum: feb208ab0f31fe27ab5b4b8a688ad67c
filesize: 96.0KB
filetype: .csi
belongs_to: data
- id: alspacdcs:138a1186-3078-4eae-b8d2-832f78aabea5
name: 8_freeze.vcf.gz.csi
md5sum: d00479aa24a74e7d00e90fd4c259b63d
filesize: 92.8KB
filetype: .csi
belongs_to: data
- id: alspacdcs:76a34242-a3dc-4c52-a6ba-d0a3339ac7c4
name: 21_freeze.vcf.gz.csi
md5sum: 45b9ef5f1036c573549400642b1817fd
filesize: 22.1KB
filetype: .csi
belongs_to: data
- id: alspacdcs:e3c1eeb7-9b1a-4ad4-af45-9d3a22a828b9
name: 19_freeze.vcf.gz.csi
md5sum: c0d1ba8b4f99a46bf2690484b4e8a08f
filesize: 35.8KB
filetype: .csi
belongs_to: data
- id: alspacdcs:7ee211cc-e460-4339-897e-8c58f68bc8d6
name: 1_freeze.vcf.gz.csi
md5sum: 689c7e022a0a6b95ee0d2355b03e7bad
filesize: 145.6KB
filetype: .csi
belongs_to: data
- id: alspacdcs:12b528bb-2278-43b5-aa04-56d33c6e8bd3
name: 11_freeze.vcf.gz.csi
md5sum: 361b60ef1aa9e2c8a5edb2b524c3176e
filesize: 85.2KB
filetype: .csi
belongs_to: data
- id: alspacdcs:53e14d75-9344-4d8a-a050-bd10b17adbb0
name: 13_freeze.vcf.gz.csi
md5sum: 8ec1fa0d623cb977bb19d811741b1a39
filesize: 62.1KB
filetype: .csi
belongs_to: data
- id: alspacdcs:27546e70-05f1-48c8-b908-9f499bf1d0c1
name: 18_freeze.vcf.gz.csi
md5sum: 03c5e29b9df938ecc249d83d5a29f37e
filesize: 48.5KB
filetype: .csi
belongs_to: data
- id: alspacdcs:8e68f9d4-5814-4d86-9180-7b8489bbf2ac
name: 5_freeze.vcf.gz.csi
md5sum: b6dbbdbd8640267057a6b1db5311add8
filesize: 116.1KB
filetype: .csi
belongs_to: data
- id: alspacdcs:ae159f9a-33ef-4b1a-8347-9ce73d4a5c5c
name: 2_freeze.vcf.gz.csi
md5sum: 6cec8fff890738c60c71bf42ca6ba952
filesize: 156.1KB
filetype: .csi
belongs_to: data
- id: alspacdcs:6849eb0f-6cba-497e-b608-4f0277b548f9
name: 22_freeze.vcf.gz.csi
md5sum: 80b954d0b63ae61f6661a512b040023d
filesize: 22.1KB
filetype: .csi
belongs_to: data
- id: alspacdcs:04fa0db1-24dd-4c3b-a8c9-64b607850b61
name: 14_freeze.vcf.gz.csi
md5sum: 01148117d507206aeeec75d4cb909b9e
filesize: 56.6KB
filetype: .csi
belongs_to: data
- id: alspacdcs:a40bee63-e57d-4fe2-b0f8-79bb97fef1a3
name: 21_freeze.vcf.gz
md5sum: 407fc245f5af69ebee43b7f6900c7d3a
filesize: 4.3GB
filetype: .gz
number_of_variants: 563988
number_of_participants: 1865
belongs_to: data
- id: alspacdcs:5d0ee7ad-cd85-4c36-b0c4-e18184da88f0
name: 22_freeze.vcf.gz
md5sum: 41eb74a2deb305d78562e4a0545ad429
filesize: 4.4GB
filetype: .gz
number_of_variants: 552675
number_of_participants: 1865
belongs_to: data
- id: alspacdcs:61e3b02f-9177-4274-bba7-97427a5cc05f
name: 19_freeze.vcf.gz
md5sum: 2aabff2631af303fb7510912d483387a
filesize: 7.0GB
filetype: .gz
number_of_variants: 886630
number_of_participants: 1865
belongs_to: data
- id: alspacdcs:251aea18-1247-4bd4-bb73-032149511ff7
name: 20_freeze.vcf.gz
md5sum: 9f7d7d3408b37dbf1d3b910486d6b8de
filesize: 7.5GB
filetype: .gz
number_of_variants: 970869
number_of_participants: 1865
belongs_to: data
- id: alspacdcs:aaf7deab-9969-45ee-a464-88fa1bba3c37
name: 15_freeze.vcf.gz
md5sum: 70828f306ee2698ef5bf3f68d6482214
filesize: 9.7GB
filetype: .gz
number_of_variants: 1262404
number_of_participants: 1865
belongs_to: data
- id: alspacdcs:1d328f6b-1c5f-44d4-b179-f301064f87be
name: 17_freeze.vcf.gz
md5sum: 8170bc093573b70bbc86f40fa820b555
filesize: 9.1GB
filetype: .gz
number_of_variants: 1177884
number_of_participants: 1865
belongs_to: data
- id: alspacdcs:0234d628-9ebe-4eae-891f-0c25105caa73
name: 14_freeze.vcf.gz
md5sum: 13c6732a437d7c5998ad4aa23327217f
filesize: 10.7GB
filetype: .gz
number_of_variants: 1403580
number_of_participants: 1865
belongs_to: data
- id: alspacdcs:349521d9-a02e-4e93-9d0c-7bc4a923760e
name: 18_freeze.vcf.gz
md5sum: a8c08c119c443ae7de5c5679cc8cf84c
filesize: 9.4GB
filetype: .gz
number_of_variants: 1220427
number_of_participants: 1865
belongs_to: data
- id: alspacdcs:228136ed-df23-4d67-acf4-f32775e6262f
name: 16_freeze.vcf.gz
md5sum: 660eb1a7f9bfa9453104b6f6a36b3792
filesize: 10.6GB
filetype: .gz
number_of_variants: 1373607
number_of_participants: 1865
belongs_to: data
- id: alspacdcs:99891c2e-9103-4405-b91a-0a0f6ea0ad38
name: X_freeze.vcf.gz
md5sum: 83fd00533ca8fa79ef2cb90cf24a4447
filesize: 10.5GB
filetype: .gz
number_of_variants: 1700742
number_of_participants: 1865
belongs_to: data
- id: alspacdcs:1eba6a4a-f4b7-43c7-83fb-062ff890cae9
name: 13_freeze.vcf.gz
md5sum: 884aee3994ad39536cbe22216b8013be
filesize: 11.8GB
filetype: .gz
number_of_variants: 1527053
number_of_participants: 1865
belongs_to: data
- id: alspacdcs:e5c1a4b3-1d71-4361-a285-08034f6abd29
name: 9_freeze.vcf.gz
md5sum: 342ecb473fad630b7a5a2da3084e2870
filesize: 14.2GB
filetype: .gz
number_of_variants: 1845456
number_of_participants: 1865
belongs_to: data
- id: alspacdcs:b1bb1feb-f3f5-46b6-9371-2e8df16ed159
name: 10_freeze.vcf.gz
md5sum: 6f6447119f502325bead2828c6e8eeda
filesize: 16.3GB
filetype: .gz
number_of_variants: 2110436
number_of_participants: 1865
belongs_to: data
- id: alspacdcs:86e912ca-7535-4e06-959f-9c67d4b7b09e
name: 8_freeze.vcf.gz
md5sum: d7f427e88eaa35de8c0f85d406a1e17d
filesize: 18.8GB
filetype: .gz
number_of_variants: 2451009
number_of_participants: 1865
belongs_to: data
- id: alspacdcs:6ce14e08-2a85-46be-8ea6-e4eb81d444b4
name: 12_freeze.vcf.gz
md5sum: 0128a134ac2d2be58424a0dfe4fadb63
filesize: 15.7GB
filetype: .gz
number_of_variants: 2047922
number_of_participants: 1865
belongs_to: data
- id: alspacdcs:1976a545-7c16-4f99-9a7d-9c5d2d77860d
name: 11_freeze.vcf.gz
md5sum: 0160ce2e72cdf53ff2b7770c84415225
filesize: 16.4GB
filetype: .gz
number_of_variants: 2125064
number_of_participants: 1865
belongs_to: data
- id: alspacdcs:22088f45-f8f4-4386-b1a3-939ddecb1641
name: 6_freeze.vcf.gz
md5sum: ed5f30f81e5145e74c2b60266b0baab1
filesize: 21.0GB
filetype: .gz
number_of_variants: 2704091
number_of_participants: 1865
belongs_to: data
- id: alspacdcs:6dd03d42-28da-4947-b785-b843e90a48f9
name: 5_freeze.vcf.gz
md5sum: ceb9bc294a5fae86481d6a17d4807b62
filesize: 21.6GB
filetype: .gz
number_of_variants: 2804359
number_of_participants: 1865
belongs_to: data
- id: alspacdcs:ab486b1a-2849-4297-94af-6a7e4437f18d
name: 7_freeze.vcf.gz
md5sum: 586f5de80643fd974397c7fafaf3227a
filesize: 19.0GB
filetype: .gz
number_of_variants: 2445204
number_of_participants: 1865
belongs_to: data
- id: alspacdcs:4709a4af-7190-4f19-8ffb-869f546a9d16
name: 4_freeze.vcf.gz
md5sum: c224d57595609b177ec126f7a356c12f
filesize: 23.2GB
filetype: .gz
number_of_variants: 3019176
number_of_participants: 1865
belongs_to: data
- id: alspacdcs:6533b2ac-5b9a-4f86-9f95-df2a61013ae2
name: 3_freeze.vcf.gz
md5sum: 2ff06564c5dd09220d04cba5eff9601e
filesize: 24.2GB
filetype: .gz
number_of_variants: 3147254
number_of_participants: 1865
belongs_to: data
- id: alspacdcs:da3293aa-2c59-4b9b-9f21-ed3912c050c2
name: 1_freeze.vcf.gz
md5sum: 0f78af877e87d5f733e51f2f3d3885f6
filesize: 26.3GB
filetype: .gz
number_of_variants: 3406915
number_of_participants: 1865
belongs_to: data
- id: alspacdcs:794abf20-8d47-423b-b7a2-343ddb24255a
name: 2_freeze.vcf.gz
md5sum: 57503332a8bd17997da7916c8057cad4
filesize: 28.8GB
filetype: .gz
number_of_variants: 3749277
number_of_participants: 1865
belongs_to: data
Whole exome sequencing - G0 & G1 (wes_novaseq_g0_g1)
Description
This dataset contains whole exome sequencing for G0 and G1 individuals. It was generated at the Sanger Institute as part of an initiative sequencing multiple birth cohorts: ALSPAC, MCS and BiB. As part of this initiative, the exome sequencing data will also be available via EGA, but researchers will still gain access through ALSPAC's project approval system.
Reference genome build: GRCh38
Methodology
Exome sequencing was conducted on DNA for 12,374 participants (8,605 children and 3,389 of their parents) at the Sanger Institute, using Illumina NovaSeq. Reads were aligned to GRCh38 with BWA-MEM. There was an average on-target depth of ~62X for ALSPAC.
QC was conducted on the dataset at the Sanger Institute; details are given in the associated publication (Koko et al., 2024). Sample QC was done before variant calling (base-calls after sequencing, alignment quality, CRAM file quality) and after variant calling (PCA analysis, comparison to array data, relatedness). Integrated variant QC removed potentially false positive variants using a trained random forest model. Genotype QC removed low-quality individual genotype calls.
Single nucleotide variant (SNV) and small insertion/deletion (indel) calling was conducted with GATK HaplotypeCaller, GenomicsDBImport and GenotypeGVCFs (GATK version 4.2.4.0 for ALSPAC) following GATK best practices (Van der Auwera and O’Connor, 2020).
There were 12 individuals identified as having sex mismatches within the dataset, flagged on the basis of the X F statistic. When the Y coverage of these individuals was examined, 3 were clear mismatches on both the X F statistic and Y depth, while 9 were mismatches on the X F statistic only. The 3 individuals with clear mismatches on both statistics were removed from the dataset, while the other mismatches were retained.
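The sex-mismatch handling described above can be sketched as a small decision rule. This is only an illustration: the function name, the label strings and the sample flags are hypothetical, and the thresholds used to decide whether a sample counts as a mismatch on either statistic are not stated here, so the sketch takes the two flags as given.

```python
def sex_check_action(x_f_mismatch: bool, y_depth_mismatch: bool) -> str:
    """Return the action for one sample given its two mismatch flags.

    Samples are flagged on the X F statistic; Y depth is only used to
    confirm a flagged sample. Label strings are illustrative.
    """
    if x_f_mismatch and y_depth_mismatch:
        return "remove"          # clear mismatch on both statistics
    if x_f_mismatch:
        return "retain_flagged"  # mismatch on the X F statistic only
    return "keep"                # not flagged on the X F statistic


# Made-up flags for three hypothetical samples (not ALSPAC data).
samples = {
    "sample_a": (True, True),
    "sample_b": (True, False),
    "sample_c": (False, False),
}
actions = {name: sex_check_action(*flags) for name, flags in samples.items()}
```

Applied to the 12 flagged individuals, a rule of this shape would remove the 3 with mismatches on both statistics and retain the other 9.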
Associated publication:
- doi.org/10.12688/wellcomeopenres.22697.1
Freeze Docs
# This yaml file is a description of a freeze of a released version of a named alspac dataset
# It should conform to the schema https://github.com/alspac/alspac-data-catalogue-schema
id: alspacdcs:wes_novaseq_g0_g1_2024-09-20_f6
name: >-
Whole Exome Sequencing - Novaseq - G0 & G1 version 2024-09-20 freeze 6
description: >-
This is the first iteration of wes_novaseq_g0_g1, first introduced in freeze 4. It contains data in vcf 4.2 format. It contains the majority of the G1 cohort (n=~8296), accompanied by G0 mothers (n=~1642) and partners (n=~1630) to create trios. Over time, participants may withdraw their consent and will subsequently be removed from the dataset, so the number of available individuals from each cohort may differ from that stated above.
This exome sequencing (ES) was conducted at the Sanger Institute as part of an effort to exome sequence ALSPAC, MCS and BiB. All ES data was quality controlled at the Sanger Institute prior to this ALSPAC release, and this has been extensively documented in the relevant publication (see below).
In brief (excerpt from associated publication, Koko et al., 2024):
"Sample QC:
* Before variant calling: Samples were removed if they failed one or more filters based on quality of base-calls after sequencing, or quality of the CRAM files of aligned reads. The remainder then underwent variant calling.
* After variant calling: We assigned individuals to populations using principal component analysis (PCA), then identified and removed individuals who were outliers on one or more variant-based metrics within each of the populations. We compared the exome data to genotyping array data from the same samples and removed samples that did not match as expected, since these could be sample mix-ups. The samples were also checked for unexpected relatedness; samples showing conflicts between reported and inferred relatedness were removed. This sample QC was split in two separate steps, before and after variant and genotype QC, as detailed in the coming sections.
Integrated variant and genotype QC:
* Variant QC: We removed candidate variants which may not be real, instead being artefacts or mapping errors, using a trained random forest model to distinguish likely true positives from likely false positives.
* Genotype QC: We removed low-quality individual genotype calls from the dataset. This was done in conjunction with variant QC, as we will explain below."
For extended information, such as thresholds, please see the publication.
Associated publication:
Koko et al., 2024
DOI: https://doi.org/10.12688/wellcomeopenres.22697.2
freeze_size: 167G
linker_file_md5sum: 45415c7d4fae355b4fb2d6ccd042620d
woc_file_md5sum: 6c887db8c7dd10cc695630ca73b41405
all_individuals_to_exclude_md5sum: e4efce63f9f671548d08c8bb2f9cc4f7
git_tag: https://github.com/alspac/dataset_wes_novaseq_g0_g1/releases/tag/freeze6
is_current_freeze: true
freeze_number: 6
freeze_date: 2025-09-30
previous_freeze: alspacdcs:wes_novaseq_g0_g1_2024-09-20_f5
freeze_of_alspac_dataset_version: alspacdcs:wes_novaseq_g0_g1_2024-09-20
freeze_of_named_alspac_dataset: alspacdcs:wes_novaseq_g0_g1
contains:
- data
files: []
data:
contains:
- chr13_data.vcf.gz.csi
- chr3_data.vcf.gz.csi
- chr1_data.vcf.gz.csi
- chr7_data.vcf.gz.csi
- chr6_data.vcf.gz.csi
- chr10_data.vcf.gz.csi
- chr9_data.vcf.gz.csi
- chr14_data.vcf.gz.csi
- chrY_data.vcf.gz.csi
- chr21_data.vcf.gz.csi
- chr18_data.vcf.gz.csi
- chr12_data.vcf.gz.csi
- chr19_data.vcf.gz.csi
- chr16_data.vcf.gz.csi
- chr20_data.vcf.gz.csi
- chr17_data.vcf.gz.csi
- chr8_data.vcf.gz.csi
- chr11_data.vcf.gz.csi
- chr2_data.vcf.gz.csi
- chr15_data.vcf.gz.csi
- chr5_data.vcf.gz.csi
- chr22_data.vcf.gz.csi
- chr4_data.vcf.gz.csi
- chrX_data.vcf.gz.csi
- chrY_data.vcf.gz
- chr21_data.vcf.gz
- chr13_data.vcf.gz
- chr18_data.vcf.gz
- chr20_data.vcf.gz
- chrX_data.vcf.gz
- chr22_data.vcf.gz
- chr15_data.vcf.gz
- chr9_data.vcf.gz
- chr10_data.vcf.gz
- chr7_data.vcf.gz
- chr8_data.vcf.gz
- chr14_data.vcf.gz
- chr4_data.vcf.gz
- chr5_data.vcf.gz
- chr12_data.vcf.gz
- chr16_data.vcf.gz
- chr3_data.vcf.gz
- chr6_data.vcf.gz
- chr17_data.vcf.gz
- chr11_data.vcf.gz
- chr19_data.vcf.gz
- chr2_data.vcf.gz
- chr1_data.vcf.gz
files:
- id: alspacdcs:dacbe298-6538-4cc4-8294-2b82588032b5
name: chr13_data.vcf.gz.csi
md5sum: 4b1088f3717f8dfa6e9d1435125ebcb7
filesize: 13.4KB
filetype: .csi
belongs_to: data
- id: alspacdcs:f61a7fa1-107a-4d3c-8a44-8784c9d5dccb
name: chr3_data.vcf.gz.csi
md5sum: c4ba2ca77fe3ef0a54ffd132b67e501a
filesize: 37.9KB
filetype: .csi
belongs_to: data
- id: alspacdcs:9c248b37-fb0f-4108-bec7-6bfccd87838e
name: chr1_data.vcf.gz.csi
md5sum: 0cbe45073d54a442cee4d7adb3cc1d43
filesize: 59.4KB
filetype: .csi
belongs_to: data
- id: alspacdcs:c4e40fb2-5707-4ec8-8d14-97193f34f5e1
name: chr7_data.vcf.gz.csi
md5sum: aeb5dbe67f4ea80055674dd7f975bb07
filesize: 32.2KB
filetype: .csi
belongs_to: data
- id: alspacdcs:8a9b904b-000a-44f4-ba14-726c70d25d28
name: chr6_data.vcf.gz.csi
md5sum: 7e749d71a869e5e13a30ffbfd91fa741
filesize: 32.1KB
filetype: .csi
belongs_to: data
- id: alspacdcs:09dfab5a-2e81-4893-bb03-9693b44e9422
name: chr10_data.vcf.gz.csi
md5sum: 2e7d3061a8c3321352176fa1a6c75613
filesize: 27.8KB
filetype: .csi
belongs_to: data
- id: alspacdcs:a4a08f7c-d6ac-4014-a6ef-c982e53b6239
name: chr9_data.vcf.gz.csi
md5sum: dc1bf4d82d9b65fe01a0be1ded03928c
filesize: 25.0KB
filetype: .csi
belongs_to: data
- id: alspacdcs:d8fa72a5-9103-402a-baa2-9d0dc1e0e8e5
name: chr14_data.vcf.gz.csi
md5sum: 18984469d87f958ceb6b6978b27c98d5
filesize: 19.1KB
filetype: .csi
belongs_to: data
- id: alspacdcs:f83490ab-222d-4eb0-af78-534022ba18e7
name: chrY_data.vcf.gz.csi
md5sum: ee2bdf73b5f72154dc520c4da1a3b3f6
filesize: 129.0B
filetype: .csi
belongs_to: data
- id: alspacdcs:6c66a88c-bc26-407f-a116-12a68eb98e62
name: chr21_data.vcf.gz.csi
md5sum: a4ad759539e5f3c2e586b528e8640b80
filesize: 6.3KB
filetype: .csi
belongs_to: data
- id: alspacdcs:0e5acc30-23ef-48e2-8bee-addc4e8b4561
name: chr18_data.vcf.gz.csi
md5sum: 2222655064ce404092d784f6b6fd1ac9
filesize: 12.4KB
filetype: .csi
belongs_to: data
- id: alspacdcs:4cab45e5-a2dd-4c13-8110-bd0a50727082
name: chr12_data.vcf.gz.csi
md5sum: 1c215daae12e958e9e7adc65d9e036bd
filesize: 31.7KB
filetype: .csi
belongs_to: data
- id: alspacdcs:0f1f7933-322e-4490-96f6-96b6962aff05
name: chr19_data.vcf.gz.csi
md5sum: 3b2b9f9973f5d97cce113b3ab878e60a
filesize: 23.7KB
filetype: .csi
belongs_to: data
- id: alspacdcs:ce4ecf99-46a8-4115-a0fa-63b7e0e9c49c
name: chr16_data.vcf.gz.csi
md5sum: a1d59dc67083f2596e9702af8e734ccb
filesize: 19.9KB
filetype: .csi
belongs_to: data
- id: alspacdcs:e9455650-d11c-4338-84c4-4ef06288e6df
name: chr20_data.vcf.gz.csi
md5sum: 755cdaf303b2f09a19c2f6d11cf7401f
filesize: 14.7KB
filetype: .csi
belongs_to: data
- id: alspacdcs:96909c3e-2255-4019-ac36-e67bec620851
name: chr17_data.vcf.gz.csi
md5sum: 947c56aa0082ffff41fabd3196e0abc5
filesize: 26.2KB
filetype: .csi
belongs_to: data
- id: alspacdcs:bb2f79c7-7877-4571-8b61-0d3316753943
name: chr8_data.vcf.gz.csi
md5sum: 0a95f0920e76f0331993186b80fc8e65
filesize: 24.6KB
filetype: .csi
belongs_to: data
- id: alspacdcs:ccc44a15-b42c-423c-a185-341cf3cdc0ee
name: chr11_data.vcf.gz.csi
md5sum: 7348297f255b9025f3e485be1b62b2f6
filesize: 31.5KB
filetype: .csi
belongs_to: data
- id: alspacdcs:60776f11-9a96-4752-95de-343118a30e61
name: chr2_data.vcf.gz.csi
md5sum: 5d1135a086b68de5e46d4152cabfd04a
filesize: 47.6KB
filetype: .csi
belongs_to: data
- id: alspacdcs:d3895d74-5cd3-4c2f-8a08-fe87081c05c2
name: chr15_data.vcf.gz.csi
md5sum: 5a62f6f281f6dbd005afb149ee52daf1
filesize: 19.6KB
filetype: .csi
belongs_to: data
- id: alspacdcs:fbf90a1b-cc06-41b4-be90-69a1b43ab0d9
name: chr5_data.vcf.gz.csi
md5sum: 948665995542d984eedc0b52141d18a1
filesize: 30.8KB
filetype: .csi
belongs_to: data
- id: alspacdcs:9893fdc5-0368-4864-a5bf-2e4cd8519682
name: chr22_data.vcf.gz.csi
md5sum: 44dd6c78e5648b2ebdec6fce8d0feb1c
filesize: 11.0KB
filetype: .csi
belongs_to: data
- id: alspacdcs:9b01dd43-9a1f-4e89-a13f-893b5e3f35eb
name: chr4_data.vcf.gz.csi
md5sum: 014f977ed3351d8052197e58f24db7db
filesize: 29.7KB
filetype: .csi
belongs_to: data
- id: alspacdcs:8ac119cb-c6e6-48e2-8b95-b414de9d7923
name: chrX_data.vcf.gz.csi
md5sum: 561bc722911156aadade8d025721f0af
filesize: 22.9KB
filetype: .csi
belongs_to: data
- id: alspacdcs:cdd72e0b-0fb0-4855-b8eb-212032b2fea2
name: chrY_data.vcf.gz
md5sum: e4ea71e21eb7e842a8a6a63dcff96f5c
filesize: 363.9KB
filetype: .gz
number_of_variants: 9
number_of_participants: 11499
belongs_to: data
- id: alspacdcs:11e3f53f-e9a8-490b-add4-8a04c879f10b
name: chr21_data.vcf.gz
md5sum: b7918e78a4f18b43cc1958f4552cbfc6
filesize: 1.9GB
filetype: .gz
number_of_variants: 42207
number_of_participants: 11499
belongs_to: data
- id: alspacdcs:f599664c-be2b-4672-a0cb-55649de2ce0b
name: chr13_data.vcf.gz
md5sum: a2614d41e8ffdabf8f1ed8f2cbcd1479
filesize: 2.8GB
filetype: .gz
number_of_variants: 63931
number_of_participants: 11499
belongs_to: data
- id: alspacdcs:171aa6f5-e09d-4094-b660-fe0269d43567
name: chr18_data.vcf.gz
md5sum: 0f1a4c920c7121630a57b721fc876c04
filesize: 2.5GB
filetype: .gz
number_of_variants: 57017
number_of_participants: 11499
belongs_to: data
- id: alspacdcs:c6379949-8645-4d5e-837f-7112e0d984ab
name: chr20_data.vcf.gz
md5sum: 6595e284dbfa1a5787edf228356b97e4
filesize: 4.3GB
filetype: .gz
number_of_variants: 96655
number_of_participants: 11499
belongs_to: data
- id: alspacdcs:eb560ff9-0a18-492c-8720-dfe6fef3145d
name: chrX_data.vcf.gz
md5sum: 64c17e9b179a795bbb7e759000e711f5
filesize: 3.8GB
filetype: .gz
number_of_variants: 86925
number_of_participants: 11499
belongs_to: data
- id: alspacdcs:d6361e5f-b844-4b42-83fe-4906a7295f03
name: chr22_data.vcf.gz
md5sum: 30138d4b019ded3aaed8427dd8a06f87
filesize: 4.2GB
filetype: .gz
number_of_variants: 94446
number_of_participants: 11499
belongs_to: data
- id: alspacdcs:7093f81c-6f4e-415b-a465-363d20ffa553
name: chr15_data.vcf.gz
md5sum: 74aa176cbdc57357581703748d44aea0
filesize: 5.6GB
filetype: .gz
number_of_variants: 127646
number_of_participants: 11499
belongs_to: data
- id: alspacdcs:231e21a8-e379-43cd-a25b-7a7fa12f3c43
name: chr9_data.vcf.gz
md5sum: 0afb58ce21087d9e50bfa2d86793a8d9
filesize: 7.1GB
filetype: .gz
number_of_variants: 161039
number_of_participants: 11499
belongs_to: data
- id: alspacdcs:6e8945b4-3f3c-4d3a-afea-08d838f056ff
name: chr10_data.vcf.gz
md5sum: 3adcd480d18be470dacfae3e5f96d426
filesize: 6.5GB
filetype: .gz
number_of_variants: 149730
number_of_participants: 11499
belongs_to: data
- id: alspacdcs:d92a64f4-a4e6-43d9-bbeb-930d2210b84e
name: chr7_data.vcf.gz
md5sum: 6db789bff9c81208c328843e5781e7f6
filesize: 8.1GB
filetype: .gz
number_of_variants: 181925
number_of_participants: 11499
belongs_to: data
- id: alspacdcs:4ca85d29-56d8-44e1-812b-496d8fc11e40
name: chr8_data.vcf.gz
md5sum: 4bcd43a72b346de5b5787a403ef74e05
filesize: 5.9GB
filetype: .gz
number_of_variants: 133894
number_of_participants: 11499
belongs_to: data
- id: alspacdcs:d0bfe7af-b25c-445c-84b1-98bac22f963a
name: chr14_data.vcf.gz
md5sum: fb66388a3b5110af66d81e26253e188b
filesize: 5.7GB
filetype: .gz
number_of_variants: 128137
number_of_participants: 11499
belongs_to: data
- id: alspacdcs:ff11a24a-a18e-4ca4-9fa7-d815190390bb
name: chr4_data.vcf.gz
md5sum: e3af61e808ba72769d41139f14da6a37
filesize: 6.1GB
filetype: .gz
number_of_variants: 140675
number_of_participants: 11499
belongs_to: data
- id: alspacdcs:b3eaa07e-cefb-4a7b-ac6b-49f476078a80
name: chr5_data.vcf.gz
md5sum: 18000c1da5ab74f7f1dd75d9d2cc016b
filesize: 7.0GB
filetype: .gz
number_of_variants: 161010
number_of_participants: 11499
belongs_to: data
- id: alspacdcs:9998c158-0e4a-48d8-8da5-0bb5c659a15c
name: chr12_data.vcf.gz
md5sum: 96671ee7f5203edc897c091eeec95afa
filesize: 8.5GB
filetype: .gz
number_of_variants: 193518
number_of_participants: 11499
belongs_to: data
- id: alspacdcs:69cdce0a-8164-421e-a8bb-419732d8c5cc
name: chr16_data.vcf.gz
md5sum: d715e6d923f9cab2145d21988b8ebc4e
filesize: 8.3GB
filetype: .gz
number_of_variants: 186300
number_of_participants: 11499
belongs_to: data
- id: alspacdcs:6a1bf086-48e0-44c7-bc77-2dd9a9f55ac7
name: chr3_data.vcf.gz
md5sum: 20775cf2ee65817d8aeab72cc1f2c217
filesize: 9.1GB
filetype: .gz
number_of_variants: 206875
number_of_participants: 11499
belongs_to: data
- id: alspacdcs:1c1275fd-bc2f-410b-bc62-ff8a13a46dce
name: chr6_data.vcf.gz
md5sum: e158c0b18e4f47b23bdc9f022a125411
filesize: 8.0GB
filetype: .gz
number_of_variants: 181754
number_of_participants: 11499
belongs_to: data
- id: alspacdcs:f0fa15b4-2b5d-4c9d-befa-a1ab3cca9de4
name: chr17_data.vcf.gz
md5sum: 09644c190b7c6e7ef48be198e256452a
filesize: 10.0GB
filetype: .gz
number_of_variants: 224774
number_of_participants: 11499
belongs_to: data
- id: alspacdcs:a9db4b9d-ea30-4b0d-a1af-9ed5a3bcf9ce
name: chr11_data.vcf.gz
md5sum: ed62fd5d53cf7e2c412a5b6107b33aa2
filesize: 10.2GB
filetype: .gz
number_of_variants: 227858
number_of_participants: 11499
belongs_to: data
- id: alspacdcs:3c7e8732-e9a3-4dba-a784-fb75517d8d88
name: chr19_data.vcf.gz
md5sum: 292b7e544d1e4480489168c9fd0889a0
filesize: 12.5GB
filetype: .gz
number_of_variants: 271080
number_of_participants: 11499
belongs_to: data
- id: alspacdcs:41e14418-d698-44d4-8a57-7dc70749d6a8
name: chr2_data.vcf.gz
md5sum: e5c46d64ec1d086e8a6af1ae985112c2
filesize: 11.8GB
filetype: .gz
number_of_variants: 272150
number_of_participants: 11499
belongs_to: data
- id: alspacdcs:4dad866c-05dc-4c26-8ce6-8485c907df79
name: chr1_data.vcf.gz
md5sum: 4d4cdab191e20d80e68cd5ca1a8ae997
filesize: 16.3GB
filetype: .gz
number_of_variants: 370645
number_of_participants: 11499
belongs_to: data
Whole exome sequencing - G1 (wes_novaseq_g1)
Description
This dataset contains whole exome sequencing for G1 individuals. It
was generated at the Broad Institute for ~2900 G1 individuals.
Reference genome build: GRCh38
Methodology
The exomes returned from the Broad Institute did not undergo PCA or relatedness filtering; instead, they were provided as raw VCF data. The following thresholds were applied to the samples:
- Chimera rate: Less than 0.05
- Contamination rate: Less than 0.10
- PF aligned rate: More than 0.60
87 individuals who were believed to be sample mismatches were removed from the dataset. These exomes had a discordance rate above 0.05 when compared to existing array data using bcftools gtcheck.
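The sample-level filters above can be expressed as a single pass/fail check. The following is a minimal sketch rather than the pipeline actually used: the class and field names are hypothetical, and it assumes each metric is already available as a proportion between 0 and 1 (how the discordance rate was derived from the bcftools gtcheck output is not specified here).

```python
from dataclasses import dataclass

# Thresholds as stated in the Methodology text.
CHIMERA_MAX = 0.05        # chimera rate must be less than this
CONTAMINATION_MAX = 0.10  # contamination rate must be less than this
PF_ALIGNED_MIN = 0.60     # PF aligned rate must be more than this
DISCORDANCE_MAX = 0.05    # samples above this versus array data were removed


@dataclass
class SampleMetrics:
    """Hypothetical per-sample QC metrics, each a proportion in [0, 1]."""
    chimera_rate: float
    contamination_rate: float
    pf_aligned_rate: float
    array_discordance: float


def passes_qc(m: SampleMetrics) -> bool:
    """Return True if the sample meets every stated threshold."""
    return (
        m.chimera_rate < CHIMERA_MAX
        and m.contamination_rate < CONTAMINATION_MAX
        and m.pf_aligned_rate > PF_ALIGNED_MIN
        and m.array_discordance <= DISCORDANCE_MAX
    )
```

For example, `passes_qc(SampleMetrics(0.01, 0.02, 0.95, 0.20))` is False because the made-up discordance value exceeds 0.05.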
Associated publications:
- https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9980234/ (conducted additional QC beyond dataset)
Freeze Docs
# This yaml file is a description of a freeze of a released version of a named alspac dataset
# It should conform to the schema https://github.com/alspac/alspac-data-catalogue-schema
id: alspacdcs:wes_novaseq_g1_204-04-12_f6
name: >-
Whole Exome Sequencing - Novaseq - G1 version 2024-04-09 freeze 6
description: >-
This contains whole exome sequencing done at the Broad Institute, first introduced in freeze 4. It contains data in vcf 4.2 format and an index file in csi format. It is a subset of the G1 cohort, with participants who have withdrawn their consent removed and omics IDs applied according to the freeze.
Samples were selected for whole exome sequencing at the Broad Institute from the G1 cohort (the cohort of index children) and were from subjects who were singletons/unrelated and of European/British ancestry, had blood-derived DNA available, and had been genotyped on a whole genome genotyping array.
The QC was performed at the Broad. The following thresholds were applied:
Chimera rate < 0.05
Contamination rate < 0.10
PF aligned rate > 0.60
87 individuals who were believed to be sample mismatches were removed from the dataset. These exomes had a discordance rate above 0.05 when compared to existing array data using bcftools gtcheck.
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9980234/ describes this dataset in supplementary materials.
freeze_size: 28G
linker_file_md5sum: 45415c7d4fae355b4fb2d6ccd042620d
woc_file_md5sum: 6c887db8c7dd10cc695630ca73b41405
all_individuals_to_exclude_md5sum: e4efce63f9f671548d08c8bb2f9cc4f7
git_tag: https://github.com/alspac/dataset_wes_novaseq_g1/releases/tag/freeze6
is_current_freeze: true
freeze_number: 6
freeze_date: 2025-09-30
previous_freeze: alspacdcs:wes_novaseq_g1_204-04-12_f5
freeze_of_alspac_dataset_version: alspacdcs:wes_novaseq_g1_2024-03-26
freeze_of_named_alspac_dataset: alspacdcs:wes_novaseq_g1
contains:
- data
files: []
data:
contains:
- all_chr.vcf.gz.csi
- all_chr.vcf.gz
files:
- id: alspacdcs:614f9332-d95a-446a-a3fc-031649e1d6b3
name: all_chr.vcf.gz.csi
md5sum: afe84f33398dea988e73e6f66a781977
filesize: 785.7KB
filetype: .csi
belongs_to: data
- id: alspacdcs:60a87d01-0070-41a1-8026-fb726623bc40
name: all_chr.vcf.gz
md5sum: 3c6c1622289d2df4a5c871275c3bdb9a
filesize: 27.1GB
filetype: .gz
number_of_variants: 2965032
number_of_participants: 2879
belongs_to: data
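Each file entry in the freeze docs carries an md5sum, which can be used to confirm that a received file is intact. The following is a minimal sketch, assuming the file is available locally; the function names and the local path are hypothetical and not part of any ALSPAC tooling.

```python
import hashlib
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks
    so that multi-gigabyte VCF files do not need to fit in memory."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_catalogue(path: Path, expected_md5: str) -> bool:
    """Compare a local file's digest with the md5sum listed in the freeze docs."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

For instance, `matches_catalogue(Path("all_chr.vcf.gz"), "3c6c1622289d2df4a5c871275c3bdb9a")` would check a local copy of the file listed above against its catalogue entry.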
Epigenetic Data
DNA methylation - EPIC & 450k - G0 + G1 (dnam_epic450_g0_g1)
Description
This dataset contains methylation data collected from both G0 and G1 on two arrays at different timepoints. This dataset supersedes dnam_450_g0m_g1.
There is data from the Illumina Infinium HumanMethylation450K BeadChip array on G0 mothers at two timepoints (pregnancy and middle age), G1 participants at 5 timepoints (across birth, childhood and adolescence) and G0 participants at one timepoint. This dataset also contains Infinium MethylationEPIC v1.0 data on 2721 G1 individuals at 2 timepoints.
This dataset was generated as part of the Accessible Resource for Integrated Epigenomics Studies (http://www.ariesepigenomics.org.uk/).
Methodology
Preprocessing and quality control for this dataset was conducted using Meffil.
Associated publications:
- https://doi.org/10.1093/ije/dyv072
- https://doi.org/10.1093/bioinformatics/bty476
Associated R packages:
- aries: https://github.com/MRCIEU/aries is associated with loading and using this dataset.
- meffil: https://github.com/perishky/meffil/ was used for QC and normalisations within
Freeze Docs
# This yaml file is a description of a freeze of a released version of a named alspac dataset
# It should conform to the schema https://github.com/alspac/alspac-data-catalogue-schema
id: alspacdcs:dnam_epic450_g0_g1_2022-7-13_f6
name: >-
DNA methylation - EPIC & 450k - G0 + G1 version 2022-7-13 Freeze 6
description: >-
This is the freeze 6 version of dnam_epic450_g0_g1, which was first introduced
in freeze 2 and first released 2022-7-13.
This dataset consists of multiple sections, each described below:
Betas:
Normalized betas using functional normalization. We used 10 PCs on the control matrix to regress out technical variation. Slide was regressed out as a random effect before normalization. CpGs are in rows and samples in columns. These are in .gds format.
control_matrix:
The 850 control probes are summarized in 42 control types. These probes can roughly be divided into negative control probes (613), probes intended for between array normalization (186) and the remainder (49), which are designed for quality control, including assessing the bisulfite conversion rate. None of these probes are designed to measure a biological signal. The summarized control probes can be used as surrogates for unwanted variation and are used for the functional normalization. Samples are rows and 42 control types are in columns. These are in .txt format.
derived:
dnamage:
DNA methylation aging estimates from within the dataset. Further information on this data and its usage is found within the `dnamage.html` and `dnamage.md` within the docs dir/folder.
dnamage data file is a csv file containing DNA methylation aging estimates within the dataset.
cellcounts:
Files contain cell counts estimated using a variety of cell type references using the Houseman deconvolution algorithm (PMID: 22568884). In each file, samples correspond to rows and cell types to columns.
reports:
Collection of QC and normalization reports generated by the R meffil package upon freeze creation. This was first introduced in freeze 6. These are in html format.
detection_p_values:
This matrix shows the detection p-values for each sample and each CpG and is extracted from the idat files using the "meffil.load.detection.pvalues" function in meffil. CpGs are in rows and samples in columns. These are .gds files.
samplesheet:
Manifest files with columns extracted directly from LIMS, plus age, sex, omics ID, timepoint, timecode and sampletype columns, genotype columns to report sample mismatches, and a duplicate.rm column to remove duplicates. Samples in rows, variables in columns. These are csv files, and samplesheet.csv is the same as samplesheet-common.csv
cell count files specific details:
andrews-and-bakulski-cord-blood.txt
Cord blood cell count estimates derived using the Bakulski et al. 2016 reference (PMID 27019159; https://bioconductor.org/packages/release/data/experiment/html/FlowSorted.CordBlood.450k.html). This reference has been implemented in meffil. Cell counts estimated for b-cells, cd4+ t cells, cd8+ t cells, granulocytes, monocytes, natural killer cells and nucleated red blood cells. In this text file, samples are in rows and cell types in columns.
gervin-and-lyle-cord-blood.txt
Cord blood cell count estimates derived using the Gervin et al. 2019 reference (PMID 31455416; GEO accession GSE127824). Cell counts estimated for b-cells, cd4+ t cells, cd8+ t cells, granulocytes, monocytes, and natural killer cells. This reference has been implemented in meffil. In this text file, samples are in rows and cell types in columns.
cord-blood-gse68456.txt
Cord blood cell count estimates derived using the de Goede et al. 2015 reference (PMID 26366232; GEO accession GSE68456). Cell counts estimated for b-cells, cd4+ t cells, cd8+ t cells, granulocytes, monocytes, natural killer cells and nucleated red blood cells. This reference has been implemented in meffil. In this text file, samples are in rows and cell types in columns.
blood-gse35069-complete.txt
Cell counts in peripheral blood predicted using the peripheral blood reference published in Reinius et al. 2012 (PMID: 22848472). Same as 'blood-gse35069.txt' but replaces granulocytes with eosinophils and neutrophils. This reference has been implemented in meffil. In this text file, samples are in rows and cell types in columns.
blood-gse35069.txt
Blood cell count estimates derived using the Reinius et al. 2012 reference (PMID 25424692; GEO accession GSE35069). Cell counts estimated for b-cells, cd4+ t cells, cd8+ t cells, granulocytes, monocytes, and natural killer cells. In this text file, samples are in rows and cell types in columns.
blood-idoloptimized-epic.txt
Cell counts in peripheral blood predicted using the cell type reference from Bioconductor package FlowSorted.Blood.EPIC. This reference has been implemented in meffil. In this text file, samples are in rows and cell types in columns.
blood-idoloptimized.txt
Cell counts in peripheral blood predicted using the cell type reference from Bioconductor package FlowSorted.Blood.EPIC but restricted to the IDOLOptimizedCpGs450klegacy CpG sites. This reference has been implemented in meffil. In this text file, samples are in rows and cell types in columns.
combined-cord-blood.txt
Cord blood cell count estimates derived using the Bakulski et al, Gervin et al., de Goede et al., and Lin et al. references (https://bioconductor.org/packages/release/data/experiment/html/FlowSorted.CordBloodCombined.450k.html) for CpG sites selected using the IDOL algorithm and optimized for the Illumina Infinium HumanMethylation450 Beadchip. Cell counts estimated for b-cells, cd4+ t cells, cd8+ t cells, granulocytes, monocytes, natural killer cells and nucleated red blood cells. In this text file, samples are in rows and cell types in columns.
freeze_size:
linker_file_md5sum: 45415c7d4fae355b4fb2d6ccd042620d
woc_file_md5sum: 6c887db8c7dd10cc695630ca73b41405
all_individuals_to_exclude_md5sum: e4efce63f9f671548d08c8bb2f9cc4f7
git_tag: https://github.com/alspac/dataset_dnam_epic450_g0_g1/releases/tag/Freeze6
is_current_freeze: true
freeze_number: 6
freeze_date: 2025-09-30 ### Update to align with date of release
previous_freeze: 5
freeze_of_alspac_dataset_version: alspacdcs:dnam_epic450_g0_g1_2022-7-13
freeze_of_named_alspac_dataset: alspacdcs:dnam_epic450_g0_g1
contains:
- data
files: []
data:
contains:
- samplesheet
- derived
- betas
- detection_p_values
- control_matrix
files: []
samplesheet:
contains:
- samplesheet-epic.csv
- samplesheet-450.csv
- samplesheet-common.csv
- samplesheet.csv
files:
- id: alspacdcs:70eb011a-4680-4cb3-aa12-8dfae3ef55ca
name: samplesheet-epic.csv
md5sum: b186ad4f758ca51dfeb0e9cab45f2c3f
filesize: 1.0MB
filetype: .csv
belongs_to: data/samplesheet
- id: alspacdcs:ff6084b6-26f2-41f8-b4ec-1f59f0846ee6
name: samplesheet-450.csv
md5sum: 4ade52ccb70cea58acf588fd06e2952f
filesize: 2.1MB
filetype: .csv
belongs_to: data/samplesheet
- id: alspacdcs:bbdb5acd-979d-4078-8b4a-db7e57af77d8
name: samplesheet-common.csv
md5sum: 45032bf2732e0493f93f20afb3f588c4
filesize: 3.2MB
filetype: .csv
belongs_to: data/samplesheet
- id: alspacdcs:5e6a3daa-4078-40b4-8e0c-57f66d3c8511
name: samplesheet.csv
md5sum: 45032bf2732e0493f93f20afb3f588c4
filesize: 3.2MB
filetype: .csv
belongs_to: data/samplesheet
derived:
contains:
- dnamage.csv
- reports
- cellcounts
files:
- id: alspacdcs:9fc2876f-8cdf-43cd-a70f-e9e50160d284
name: dnamage.csv
md5sum: bd0c2efef6ee145cd0804d61c7e83151
filesize: 11.2MB
filetype: .csv
belongs_to: data/derived
reports:
contains:
- qc
- normalization
files: []
qc:
contains:
- qc-report-450.html
- qc-report-epic.html
- qc-report-common.html
- qc-report-450.md
- qc-report-common.md
- qc-report-epic.md
- figure
files:
- id: alspacdcs:a7070157-9eac-4a11-bf92-4b0d21f70088
name: qc-report-450.html
md5sum: 2b3e191892ea9537adbf0559261961be
filesize: 1.7MB
filetype: .html
belongs_to: data/derived/reports/qc
- id: alspacdcs:36710710-e348-4971-8ccb-03af82681f42
name: qc-report-epic.html
md5sum: b2847e855c05d0b6bd3ccc70a5499fe3
filesize: 1.7MB
filetype: .html
belongs_to: data/derived/reports/qc
- id: alspacdcs:56b25c8b-8a2e-4cf1-82a0-e8e73131590d
name: qc-report-common.html
md5sum: d132611210a761836c41c0d2b466fc54
filesize: 1.6MB
filetype: .html
belongs_to: data/derived/reports/qc
- id: alspacdcs:3f35fece-6d92-47c9-8eee-b86f7ffc10ba
name: qc-report-450.md
md5sum: 74cf93e9000b1448661b8b8c9f83a085
filesize: 21.1KB
filetype: .md
belongs_to: data/derived/reports/qc
- id: alspacdcs:82b6d23d-02f5-4b06-8c98-deae14de0e53
name: qc-report-common.md
md5sum: ebb115f7ccc1f9b170da20e86046beb0
filesize: 21.1KB
filetype: .md
belongs_to: data/derived/reports/qc
- id: alspacdcs:03e7aa54-1f47-4f68-b51f-3f39c8b4893b
name: qc-report-epic.md
md5sum: b072dd1b731f524f5509d5e91a738e62
filesize: 19.9KB
filetype: .md
belongs_to: data/derived/reports/qc
figure:
contains:
- unnamed-chunk-16-1.png
- unnamed-chunk-36-1.png
- unnamed-chunk-5-1.png
- unnamed-chunk-7-1.png
- unnamed-chunk-35-1.png
- unnamed-chunk-13-1.png
- unnamed-chunk-9-1.png
- unnamed-chunk-11-1.png
- unnamed-chunk-12-1.png
- unnamed-chunk-3-1.png
files:
- id: alspacdcs:496ef4c5-c5de-4e89-bcee-f36a700d8399
name: unnamed-chunk-16-1.png
md5sum: 60eb456b6848a4574507634f6110aff5
filesize: 107.4KB
filetype: .png
belongs_to: data/derived/reports/qc/figure
- id: alspacdcs:618340c6-f9c8-4058-9255-b3b71bbcb7ee
name: unnamed-chunk-36-1.png
md5sum: 1c1cbfadbc51a707f2d596c7b524cc3f
filesize: 29.9KB
filetype: .png
belongs_to: data/derived/reports/qc/figure
- id: alspacdcs:a9d24441-669b-4715-aa66-3fc3e6ea4cb3
name: unnamed-chunk-5-1.png
md5sum: 5d111e4bbc9bf43c14e0218f43b38ae8
filesize: 126.9KB
filetype: .png
belongs_to: data/derived/reports/qc/figure
- id: alspacdcs:6963248d-017c-402c-9540-be746811cc29
name: unnamed-chunk-7-1.png
md5sum: a736e2022533b8774eb71eaea5205503
filesize: 452.8KB
filetype: .png
belongs_to: data/derived/reports/qc/figure
- id: alspacdcs:ff23ca91-d373-4dd8-87e9-aff48e28dbdc
name: unnamed-chunk-35-1.png
md5sum: d387226410b191145b3a3da2ba725288
filesize: 26.1KB
filetype: .png
belongs_to: data/derived/reports/qc/figure
- id: alspacdcs:80dc37e4-07aa-4a46-906a-6124d2f66ea7
name: unnamed-chunk-13-1.png
md5sum: ce7fb6fee9571240aca9917e43548dbc
filesize: 79.5KB
filetype: .png
belongs_to: data/derived/reports/qc/figure
- id: alspacdcs:97770508-bbd3-42f3-8812-529f8eb0c079
name: unnamed-chunk-9-1.png
md5sum: c4a04fe47757519977dd3ca2f6f05908
filesize: 56.8KB
filetype: .png
belongs_to: data/derived/reports/qc/figure
- id: alspacdcs:743c3f26-3694-4a95-9e96-a67d3ea3f80e
name: unnamed-chunk-11-1.png
md5sum: fb762339b8382426fc163086764cd8f7
filesize: 33.4KB
filetype: .png
belongs_to: data/derived/reports/qc/figure
- id: alspacdcs:a8cc7846-d470-4cf7-b23a-eb2386e06fe9
name: unnamed-chunk-12-1.png
md5sum: 13ba50bfb39465f8f3af6532367c18af
filesize: 268.5KB
filetype: .png
belongs_to: data/derived/reports/qc/figure
- id: alspacdcs:a4138686-1912-4ea7-97fe-006172270faf
name: unnamed-chunk-3-1.png
md5sum: 770268d74e2f4e73cc2fcf4f7198fb9b
filesize: 106.4KB
filetype: .png
belongs_to: data/derived/reports/qc/figure
normalization:
contains:
- norm-report-450.html
- norm-report-epic.html
- norm-report-common.html
- norm-report-epic.md
- norm-report-450.md
- norm-report-common.md
- figure
files:
- id: alspacdcs:27351be8-a689-436a-b30e-358a3441ef68
name: norm-report-450.html
md5sum: aba86d0952ddcd684fbd494c11c9feb7
filesize: 1.8MB
filetype: .html
belongs_to: data/derived/reports/normalization
- id: alspacdcs:8bd2e716-8d6c-4f37-88ee-700398bf0dcc
name: norm-report-epic.html
md5sum: f07cbf1e5f46b12770520dfc909e419d
filesize: 1.7MB
filetype: .html
belongs_to: data/derived/reports/normalization
- id: alspacdcs:d7dadf30-de9c-4666-b053-d5340a0d3579
name: norm-report-common.html
md5sum: b7c8a003959b2b426252fabd307dc397
filesize: 1.7MB
filetype: .html
belongs_to: data/derived/reports/normalization
- id: alspacdcs:93f6590a-8ba6-4dec-9a92-22956531e803
name: norm-report-epic.md
md5sum: c0a90deb92e022cd440996b114990c38
filesize: 11.6KB
filetype: .md
belongs_to: data/derived/reports/normalization
- id: alspacdcs:007eb8c6-facc-40ae-855b-63b41df638bf
name: norm-report-450.md
md5sum: d36090a1fdac51bbd314f01bca31a22f
filesize: 20.1KB
filetype: .md
belongs_to: data/derived/reports/normalization
- id: alspacdcs:4afe1adf-5dcc-48c9-a9f0-ed8c975b4566
name: norm-report-common.md
md5sum: bd71c88eaafaba4abb5644ac3a43f185
filesize: 24.5KB
filetype: .md
belongs_to: data/derived/reports/normalization
figure:
contains:
- unnamed-chunk-47-1.png
- unnamed-chunk-45-1.png
- unnamed-chunk-46-1.png
- unnamed-chunk-32-1.png
- unnamed-chunk-27-1.png
- unnamed-chunk-43-1.png
- unnamed-chunk-78-1.png
- unnamed-chunk-76-1.png
- unnamed-chunk-48-1.png
- unnamed-chunk-44-1.png
- unnamed-chunk-38-1.png
- unnamed-chunk-61-1.png
- unnamed-chunk-64-1.png
- unnamed-chunk-50-1.png
- unnamed-chunk-21-1.png
- unnamed-chunk-77-1.png
- unnamed-chunk-35-1.png
- unnamed-chunk-74-1.png
- unnamed-chunk-67-1.png
- unnamed-chunk-72-1.png
- unnamed-chunk-42-1.png
- unnamed-chunk-56-1.png
- unnamed-chunk-75-1.png
- unnamed-chunk-23-1.png
- unnamed-chunk-49-1.png
- unnamed-chunk-79-1.png
- unnamed-chunk-71-1.png
- unnamed-chunk-22-1.png
- unnamed-chunk-51-1.png
- unnamed-chunk-80-1.png
- unnamed-chunk-73-1.png
files:
- id: alspacdcs:8504be18-9cf2-4f25-b89c-dbb7bc32a1bc
name: unnamed-chunk-47-1.png
md5sum: e8a702b741c582874e86c524178b3224
filesize: 43.6KB
filetype: .png
belongs_to: data/derived/reports/normalization/figure
- id: alspacdcs:0f88ece8-0b4f-4fb9-b792-1001e22f4f1d
name: unnamed-chunk-45-1.png
md5sum: 397ceb400ba6cc54f8957a04a7223b80
filesize: 40.5KB
filetype: .png
belongs_to: data/derived/reports/normalization/figure
- id: alspacdcs:1b7de838-299d-41e7-b71f-25abe6655548
name: unnamed-chunk-46-1.png
md5sum: 2a361ec26c3d5d729fc3ecc385f3dc37
filesize: 41.6KB
filetype: .png
belongs_to: data/derived/reports/normalization/figure
- id: alspacdcs:61e18d09-99de-4071-a146-45375913c274
name: unnamed-chunk-32-1.png
md5sum: e2fd34ad90f83276ac8531d441b6efea
filesize: 12.9KB
filetype: .png
belongs_to: data/derived/reports/normalization/figure
- id: alspacdcs:4bec956f-9904-4fcd-9b86-1df9d759fae3
name: unnamed-chunk-27-1.png
md5sum: d741adec199194e22300578b399b2adf
filesize: 212.6KB
filetype: .png
belongs_to: data/derived/reports/normalization/figure
- id: alspacdcs:39bd9815-c0ae-46d6-985a-e23a1b6fcad0
name: unnamed-chunk-43-1.png
md5sum: db015f1d599967aacff3e918ca210ed7
filesize: 45.0KB
filetype: .png
belongs_to: data/derived/reports/normalization/figure
- id: alspacdcs:f5c9a61b-da3c-4dc7-96fb-4e8646fda8e7
name: unnamed-chunk-78-1.png
md5sum: 1376d286deab61aff5b82ca728e79193
filesize: 40.2KB
filetype: .png
belongs_to: data/derived/reports/normalization/figure
- id: alspacdcs:6222599c-5d8c-4c6c-ac4a-53a339726aaa
name: unnamed-chunk-76-1.png
md5sum: b3a9faa0ba9216acccb586e28e6cc7e7
filesize: 35.3KB
filetype: .png
belongs_to: data/derived/reports/normalization/figure
- id: alspacdcs:5e6da925-c4de-4468-9c6c-8e8d100b2595
name: unnamed-chunk-48-1.png
md5sum: 974eb0b4ab2a678d374df2ae5fa4cb36
filesize: 41.4KB
filetype: .png
belongs_to: data/derived/reports/normalization/figure
- id: alspacdcs:4522e502-c9ce-4586-9c05-e49abd83ae25
name: unnamed-chunk-44-1.png
md5sum: c16673a6bb0109a0c533055bcccfd9b0
filesize: 40.8KB
filetype: .png
belongs_to: data/derived/reports/normalization/figure
- id: alspacdcs:fc0ff28a-e49e-4a1d-8ce7-3ab6b874c4bb
name: unnamed-chunk-38-1.png
md5sum: 523af90e3cb6d55e1927bdcc7ba581c1
filesize: 14.9KB
filetype: .png
belongs_to: data/derived/reports/normalization/figure
- id: alspacdcs:43d661fe-a4ae-44f8-85f9-7abc89e0fb16
name: unnamed-chunk-61-1.png
md5sum: 8909fde9844719d9c0dc6d4f27d6bee7
filesize: 14.1KB
filetype: .png
belongs_to: data/derived/reports/normalization/figure
- id: alspacdcs:df11f3a1-616b-4c48-8d7e-11963516aa07
name: unnamed-chunk-64-1.png
md5sum: 7feb3f164bc369cae2872b7f4e67b6a9
filesize: 15.7KB
filetype: .png
belongs_to: data/derived/reports/normalization/figure
- id: alspacdcs:fe380c9b-5cec-4b25-b006-cfb999e5e441
name: unnamed-chunk-50-1.png
md5sum: 63d4be9ba5dd131d42c5a1755a32ab3c
filesize: 37.7KB
filetype: .png
belongs_to: data/derived/reports/normalization/figure
- id: alspacdcs:4ef9c734-79a5-43ca-83f7-8341034378fd
name: unnamed-chunk-21-1.png
md5sum: 843c2535db5036f216cde2995c7d6e4d
filesize: 7.9KB
filetype: .png
belongs_to: data/derived/reports/normalization/figure
- id: alspacdcs:34f39e92-0032-4f12-8a97-f0e34ca5c2eb
name: unnamed-chunk-77-1.png
md5sum: 3cd5906468110fc3efa0e7059e50f011
filesize: 33.6KB
filetype: .png
belongs_to: data/derived/reports/normalization/figure
- id: alspacdcs:556bd24f-dc52-43ae-a7fc-0d36710441bc
name: unnamed-chunk-35-1.png
md5sum: 62397c374b15bafae8d4c77b5ab66903
filesize: 13.8KB
filetype: .png
belongs_to: data/derived/reports/normalization/figure
- id: alspacdcs:a776fb34-97c7-4ff4-8805-c08108b232af
name: unnamed-chunk-74-1.png
md5sum: dc69a1f6993cea3c5c289edbebe1395f
filesize: 34.6KB
filetype: .png
belongs_to: data/derived/reports/normalization/figure
- id: alspacdcs:a5ed73d6-5ceb-4e2d-8eff-355f7a66af27
name: unnamed-chunk-67-1.png
md5sum: 8310a8c9e27a59e61f93314fc83d4f0f
filesize: 13.3KB
filetype: .png
belongs_to: data/derived/reports/normalization/figure
- id: alspacdcs:f1cb7439-8fd0-4821-b6fb-fd8f71bec3a5
name: unnamed-chunk-72-1.png
md5sum: 62659ae0e97f0d410803cdb84f86ad06
filesize: 43.0KB
filetype: .png
belongs_to: data/derived/reports/normalization/figure
- id: alspacdcs:bebe6e86-f64f-4624-8ea0-46b0018c6993
name: unnamed-chunk-42-1.png
md5sum: cd0df689b84096cbb12bca6965b465dc
filesize: 43.7KB
filetype: .png
belongs_to: data/derived/reports/normalization/figure
- id: alspacdcs:72654aa4-46cb-4df7-9383-fc5e79f9e2f2
name: unnamed-chunk-56-1.png
md5sum: a305a033260dd8391902463465d4539b
filesize: 195.0KB
filetype: .png
belongs_to: data/derived/reports/normalization/figure
- id: alspacdcs:855b211f-c5a9-4cc9-9c2a-7c13e4816ddc
name: unnamed-chunk-75-1.png
md5sum: 86dbcf633fe7791f3fcd2730bbaca1fe
filesize: 33.7KB
filetype: .png
belongs_to: data/derived/reports/normalization/figure
- id: alspacdcs:d7b53a91-ddea-4975-8962-cac7adafd94c
name: unnamed-chunk-23-1.png
md5sum: 7a6f89888fd84a613415240965293fe2
filesize: 8.3KB
filetype: .png
belongs_to: data/derived/reports/normalization/figure
- id: alspacdcs:477db1b7-d261-42ec-822b-77f70a645a52
name: unnamed-chunk-49-1.png
md5sum: f6b29e0a8dab04649277a20958a9bdd3
filesize: 40.9KB
filetype: .png
belongs_to: data/derived/reports/normalization/figure
- id: alspacdcs:de610732-67e7-4d03-84ad-8bd0c3a6b3ec
name: unnamed-chunk-79-1.png
md5sum: b348bd0dfdea753fbfa4502b5e4e85ef
filesize: 37.8KB
filetype: .png
belongs_to: data/derived/reports/normalization/figure
- id: alspacdcs:bb91723a-f756-45be-bcaf-24dbc9e8db46
name: unnamed-chunk-71-1.png
md5sum: 540e6c04a4bf0de24c33072f251de261
filesize: 35.3KB
filetype: .png
belongs_to: data/derived/reports/normalization/figure
- id: alspacdcs:efa4497e-b13d-40d5-b86c-ca7a86edf94f
name: unnamed-chunk-22-1.png
md5sum: a197d74fb49c30f66707e60a8bdebd33
filesize: 7.9KB
filetype: .png
belongs_to: data/derived/reports/normalization/figure
- id: alspacdcs:b5bc7cc3-a98e-49df-8fd4-955a2a612837
name: unnamed-chunk-51-1.png
md5sum: ff43a8fecff1d2aab034fc95dc58353d
filesize: 36.1KB
filetype: .png
belongs_to: data/derived/reports/normalization/figure
- id: alspacdcs:f7f8c4cc-37c2-4c70-9a6b-646777b8b277
name: unnamed-chunk-80-1.png
md5sum: a0024a6beb1ae9e8f1049cc6494b41b0
filesize: 37.9KB
filetype: .png
belongs_to: data/derived/reports/normalization/figure
- id: alspacdcs:0a9c7485-73bc-4992-b58b-e338a65d253b
name: unnamed-chunk-73-1.png
md5sum: 6640674615e33436ee7f004385ccae38
filesize: 44.7KB
filetype: .png
belongs_to: data/derived/reports/normalization/figure
cellcounts:
contains:
- blood-idoloptimized-epic.txt
- combined-cord-blood.txt
- andrews-and-bakulski-cord-blood.txt
- gervin-and-lyle-cord-blood.txt
- blood-gse35069.txt
- blood-gse35069-complete.txt
- blood-idoloptimized.txt
- cord-blood-gse68456.txt
files:
- id: alspacdcs:bde83af1-b866-418a-9778-91fb56b4da3f
name: blood-idoloptimized-epic.txt
md5sum: 7331e83d31e1d200bbff3d041223cde1
filesize: 345.9KB
filetype: .txt
belongs_to: data/derived/cellcounts
- id: alspacdcs:f17a7996-c9af-408c-8ae2-93ed3ebc1524
name: combined-cord-blood.txt
md5sum: 7cbcf72ca00012d17d22ff6d21b7575c
filesize: 128.2KB
filetype: .txt
belongs_to: data/derived/cellcounts
- id: alspacdcs:e5e3dfbb-1eb2-4c37-a480-55e1e39cfdbf
name: andrews-and-bakulski-cord-blood.txt
md5sum: 33c69aa8e50deb28355dcb82d01c7510
filesize: 113.7KB
filetype: .txt
belongs_to: data/derived/cellcounts
- id: alspacdcs:e553d486-dd1e-424d-80c6-327f6f749740
name: gervin-and-lyle-cord-blood.txt
md5sum: 099c4cf9bd4ecfee91c19c3c2d2b6f70
filesize: 99.5KB
filetype: .txt
belongs_to: data/derived/cellcounts
- id: alspacdcs:8c7e54c5-a525-44f4-a553-003b09436936
name: blood-gse35069.txt
md5sum: 53fb63b4cef457d90688b3ddb861fa73
filesize: 1020.6KB
filetype: .txt
belongs_to: data/derived/cellcounts
- id: alspacdcs:4cddb029-b2ec-4794-8290-6f01f18b0d0f
name: blood-gse35069-complete.txt
md5sum: 27ab648c56b56e62709a98fcba95a764
filesize: 1.1MB
filetype: .txt
belongs_to: data/derived/cellcounts
- id: alspacdcs:5956609a-dee6-4552-a59f-d6db466fc45c
name: blood-idoloptimized.txt
md5sum: 2c2bdbf34093960af969ca37ae43c77b
filesize: 1.1MB
filetype: .txt
belongs_to: data/derived/cellcounts
- id: alspacdcs:051b757c-971b-45e5-a010-f269e09fc9fb
name: cord-blood-gse68456.txt
md5sum: 941f8a9ce1289ab5baaf10fb29bd8941
filesize: 129.8KB
filetype: .txt
belongs_to: data/derived/cellcounts
betas:
contains:
- epic.gds
- 450.gds
- common.gds
files:
- id: alspacdcs:c0cb59ba-fb67-4a49-87ee-12a86c9d82fc
name: epic.gds
md5sum: 0357486c3af3b5ee120c7b05bf077340
filesize: 17.5GB
filetype: .gds
belongs_to: data/betas
- id: alspacdcs:d4ad6861-58d0-4500-9807-eefdd933dd68
name: 450.gds
md5sum: 02e9b3cdda39d3476bfce111f5935f93
filesize: 21.3GB
filetype: .gds
belongs_to: data/betas
- id: alspacdcs:c27c47a4-6f52-474f-93f6-3ffa9360fef4
name: common.gds
md5sum: 2d447051e6241bf35dc1bfba4e740848
filesize: 29.1GB
filetype: .gds
belongs_to: data/betas
detection_p_values:
contains:
- epic.gds
- 450.gds
- common.gds
files:
- id: alspacdcs:fe210696-2b93-4d85-8211-fdae1ff5f647
name: epic.gds
md5sum: 341d1194d468e10e80be9dc9990c474b
filesize: 17.7GB
filetype: .gds
belongs_to: data/detection_p_values
- id: alspacdcs:01a334ae-04d5-4d4a-9fff-b365c8893c54
name: 450.gds
md5sum: 1c437226b2aab0c00aed7098e739f49d
filesize: 21.5GB
filetype: .gds
belongs_to: data/detection_p_values
- id: alspacdcs:aeaf85d2-4522-431b-975e-2ae0a85e4fbe
name: common.gds
md5sum: c6f4348fa7d92a5f341f69e1784036da
filesize: 29.3GB
filetype: .gds
belongs_to: data/detection_p_values
control_matrix:
contains:
- epic.txt
- common.txt
- 450.txt
files:
- id: alspacdcs:9042df18-14aa-4597-8d56-6d2e991a0c0a
name: epic.txt
md5sum: 7a680d3ccd26a491ec7dde2ce91eeeab
filesize: 1008.8KB
filetype: .txt
belongs_to: data/control_matrix
- id: alspacdcs:440e5699-4274-40f1-a49b-dcb6d47af57a
name: common.txt
md5sum: 42d21ff7a2ead483e85b909b279e9912
filesize: 3.1MB
filetype: .txt
belongs_to: data/control_matrix
- id: alspacdcs:b8218208-65c7-4174-8c5e-e259fbb0da6b
name: 450.txt
md5sum: 9e6aa62498c5bb7493f7512e274056ba
filesize: 2.1MB
filetype: .txt
belongs_to: data/control_matrix
Gene Expression Data
Gene expression - array - G1 (ge_ht12_g1)
Description
There are two different types of QC’d data available in this version, one performed by David Evans for the Bryois et al 2014 paper, and one performed by Gibran Hemani for the molgenis eQTL mapping meta analysis. A version without QC is available as well. Details on the QC’d versions can be seen below.
This data was generated from LCLs. The majority of samples used in their generation were collected at age 9 years. LCLs are lymphoblastoid cell lines, which were produced by transforming lymphocytes with Epstein-Barr virus and culturing them before DNA was extracted. Gene expression patterns may not be the same as those from untransformed lymphocytes taken from a 9-year-old.
Methodology
Bryois: - LCLs from unrelated individuals were grown under identical conditions and cells frozen in RNAlater. RNA was extracted using an RNeasy extraction kit (Qiagen) and was amplified using the Illumina TotalPrep-96 RNA Amplification kit (Ambion). Expression profiling of the samples, each with two technical replicates, was performed using the Illumina Human HT-12 V3 BeadChips (Illumina Inc) including 48,804 probes, where 200 ng of total RNA was processed according to the protocol supplied by Illumina. Raw data was imported to the Illumina Beadstudio software and probes with fewer than three beads present were excluded. Log2-transformed expression signals were then normalized with quantile normalization of the replicates of each individual followed by quantile normalization across all individuals.
We restricted our analysis to 23,935 probes tagging genes annotated in Ensembl. Principal component analysis was performed on 931 individuals. 62 individuals with principal component 1 or 2 greater than one standard deviation of the population were excluded from further analysis. See http://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1004461 for full details.
Molgenis: - Genetic outliers, meaning any individuals that were clear outliers in the first 2 genetic principal components, were removed. Each probe was quantile normalised and then log2 transformed. The data were then adjusted for the first 4 genetic MDS components and for expression principal components (excluding those that had genetic associations), and scaled to have mean 0 and variance 1. See https://github.com/molgenis/systemsgenetics/wiki/eQTL-mapping-analysis-cookbook for full details.
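As an illustration of what quantile normalisation does, the following is a minimal Python sketch, not the code used by either pipeline. It applies a log2 transform first and then quantile normalises across samples, and it breaks ties by position, which is a simplifying assumption. The function name and the input values are invented for the example.

```python
import math


def quantile_normalise(columns):
    """Quantile-normalise equal-length columns (one list per sample).

    Each value is replaced by the mean, across samples, of the values
    that share its rank. Ties are broken by position (a simplification).
    """
    n_rows = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    # Mean of the k-th smallest value across all samples.
    rank_means = [
        sum(col[k] for col in sorted_cols) / len(columns)
        for k in range(n_rows)
    ]
    normalised = []
    for col in columns:
        # Indices of this column ordered from smallest to largest value.
        order = sorted(range(n_rows), key=lambda i: col[i])
        out = [0.0] * n_rows
        for rank, i in enumerate(order):
            out[i] = rank_means[rank]
        normalised.append(out)
    return normalised


# Hypothetical raw signals: three probes (rows) for two samples (columns).
raw = [[4.0, 16.0, 8.0], [2.0, 32.0, 4.0]]
logged = [[math.log2(v) for v in col] for col in raw]
normalised = quantile_normalise(logged)
```

After normalisation every sample has the same set of values, so differences between samples reflect only the ranking of probes within each sample.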
Freeze Docs
# This yaml file is a description of a freeze of a released version of a named alspac dataset
# It should conform to the schema https://github.com/alspac/alspac-data-catalogue-schema
id: alspacdcs:ge_ht12_g1_2015-11-02_f6
name: Gene expression - array - G1 release version 2015-11-02 freeze 6
description: >-
This is the sixth freeze of the 2015-11-02 version of
ge_ht12_g1 dataset which has .csv distributions of the data rather than
.Rdata files in order to be easier to use across different data
science software and languages.
freeze_size: 2.6G
linker_file_md5sum: 45415c7d4fae355b4fb2d6ccd042620d
woc_file_md5sum: 6c887db8c7dd10cc695630ca73b41405
all_individuals_to_exclude_md5sum: e4efce63f9f671548d08c8bb2f9cc4f7
git_tag: https://github.com/alspac/dataset_ge_ht12_g1/releases/tag/freeze6
is_current_freeze: true
freeze_number: 6
freeze_date: 2025-09-30
previous_freeze: alspacdcs:ge_ht12_g1_2015-11-02_f5
freeze_of_alspac_dataset_version: alspacdcs:ge_ht12_g1_2015-11-02
freeze_of_named_alspac_dataset: alspacdcs:ge_ht12_g1
contains:
- data
- docs
files: []
data:
contains:
- bryois.csv
- molgenis.csv
- raw.csv
files:
- id: alspacdcs:ad99beeb-c614-4f26-ba39-1823a17c9fb2
name: bryois.csv
md5sum: 47f1e98d0b16a448362c299f86d80bbb
description: >-
Csv version of the bryois data.
IDs in columns and Illumina probe IDs in rows.
This is the normalised data used in Bryois et al 2014.
Probe IDs are mapped to genes in raw.csv
filesize: 741.2MB
filetype: .csv
belongs_to: data
number_of_participants: 947
number_of_gene_expression_probe_values: 48630
- id: alspacdcs:aa466bac-6e5a-427a-aecc-89a7114dc811
name: molgenis.csv
md5sum: 6fe74a566bad2d2357554adf53a41960
description: >-
The freeze 6 csv version of the molgenis data.
IDs in columns and Illumina probe IDs in rows.
Normalised data following the molgenis pipeline,
found at
https://github.com/molgenis/systemsgenetics/wiki/eQTL-mapping-analysis-cookbook.
Probe IDs are mapped to genes in raw.csv
filesize: 751.2MB
filetype: .csv
belongs_to: data
number_of_participants: 879
number_of_gene_expression_probe_values: 48630
- id: alspacdcs:d83e0b3c-5a0e-4116-be15-9f89d58b3675
name: raw.csv
md5sum: 3f6ac964549b12dea0cd245f7f8b9dcd
description: >-
The freeze 6 csv version of the raw ge data.
IDs in columns and probes in rows. Four columns per
individual, with two columns for average signal and two columns
for average number of beads.
Presumably this is a file generated by the Illumina Genome
Studio software.
filesize: 1.1GB
filetype: .csv
belongs_to: data
number_of_participants: 994
number_of_gene_expression_probe_values: 48630
Omics tips
Introduction
This section is a guide to using ’Omics datasets. It explains which software to use and describes common file formats. It’s a good starting point for beginners and helpful for problem-solving.
Disclaimer
Some information is copied or reworded from software documentation. Check the original documentation alongside this guide for up-to-date information. Note that some links may no longer work.
Operating systems
You can use ALSPAC data with any operating system, but Unix-based systems like Macintosh, Linux, or BSD are more convenient due to the data’s size and complexity. We recommend using the command line and programming scripts with languages like Bash, R, Python, or Perl. Many online resources are available to learn these tools. Use free/libre and open-source software where possible.
Links:
- Unix guide: https://www.osc.edu/supercomputing/unix-cmds
- Beginning Python: https://www.python.org/about/gettingstarted/
- Beginning R: https://www.statmethods.net/r-tutorial/index.html
- Free/libre and open-source software: https://www.fsf.org/about/
Key Omics software
Plink
Plink is a tool for performing quality control and whole genome association analysis of genetic data.
- Link: http://zzz.bwh.harvard.edu/plink/
SNPTest
SNPTest is a tool for performing whole genome association analysis of genetic data. (Not open source)
- Link: https://mathgen.stats.ox.ac.uk/genetics_software/snptest/snptest.html
BoltLmm
BoltLmm is a tool for performing genome association analysis of genetic data. It is recommended for analysis of more than 5000 samples, and its methods automatically take into account population substructure.
- Link: https://data.broadinstitute.org/alkesgroup/BOLT-LMM/
Qctools
A tool for quality control of genetic data. It is also useful for inspecting and modifying .gen, .bgen and .vcf files (see the File types section below).
- Link: https://www.well.ox.ac.uk/~gav/qctool_v2/
SAMTOOLS
Samtools is a suite of tools used for genomic analysis.
- Link: http://www.htslib.org/
VCFTOOLS
A program package for working with vcf files. It is often used alongside samtools but is distributed separately.
- Link: https://vcftools.github.io/index.html
BCFTOOLS
Part of the samtools project. It allows users to manipulate .vcf and .bcf files.
- Link: http://samtools.github.io/bcftools/bcftools.html
File types
In a Unix environment the suffix of a file name (the file extension) does not explicitly mean anything to the operating system, unlike in Windows, which uses it to determine the file type. In a Unix system it is just part of the file name, and humans use it to distinguish file formats. The following is a non-exhaustive list of file types you may encounter whilst using ALSPAC Omics data.
.gen
This is an ‘Oxford’ data format for genetic data. The .gen file is a plain text file, which means that standard Unix command line tools, for example ‘head’ or ‘less’, can be used to inspect the data.
The .gen (genotype) file stores data in a one-line-per-SNP format. The first 5 entries of each line are the SNP ID, RS ID of the SNP, base-pair position of the SNP, the allele coded A and the allele coded B. The SNP ID can be used to denote the chromosome number of each SNP. The next three numbers on the line are the probabilities of the three genotypes AA, AB and BB at the SNP for the first individual in the cohort. The next three numbers are the genotype probabilities for the second individual in the cohort. The next three numbers are for the third individual and so on. The order of individuals in the genotype file should match the order of the individuals in the sample file (see below). Note that the probabilities need not sum to 1, to allow for the possibility of a NULL genotype call. This format allows for genotype uncertainty. This genotype file format is the same as that produced by the genotype calling algorithm CHIAMO.
NOTE: We recommend that you arrange SNPs in base-pair order in the genotype files. This is required if you want to use the files with IMPUTE and will make viewing the output of SNPTEST somewhat easier.
For example, suppose you want to create a genotype file for 2 individuals at 5 SNPs whose genotypes are:
| SNP 1 | AA | AA |
| SNP 2 | GG | GT |
| SNP 3 | CC | CT |
| SNP 4 | CT | CT |
| SNP 5 | AG | GG |
The correct genotype file would look like this:
| SNP1 | rs1 | 1000 | A | C | 1 | 0 | 0 | 1 | 0 | 0 |
| SNP2 | rs2 | 2000 | G | T | 1 | 0 | 0 | 0 | 1 | 0 |
| SNP3 | rs3 | 3000 | C | T | 1 | 0 | 0 | 0 | 1 | 0 |
| SNP4 | rs4 | 4000 | C | T | 0 | 1 | 0 | 0 | 1 | 0 |
| SNP5 | rs5 | 5000 | A | G | 0 | 1 | 0 | 0 | 0 | 1 |
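The layout described above can be read with a few lines of Python. This is a minimal sketch that assumes whitespace-separated lines with exactly five leading fields, as in the example. The function name `parse_gen_line` is invented for illustration, and some .gen files seen in practice carry an extra leading column that this sketch does not handle.

```python
def parse_gen_line(line):
    """Split one .gen line into its SNP fields and per-individual probabilities."""
    fields = line.split()
    snp_id, rs_id, position, allele_a, allele_b = fields[:5]
    probs = [float(x) for x in fields[5:]]
    if len(probs) % 3 != 0:
        raise ValueError("expected three probabilities per individual")
    # Group as (P(AA), P(AB), P(BB)) for each individual, in sample-file order.
    per_individual = [tuple(probs[i:i + 3]) for i in range(0, len(probs), 3)]
    return {
        "snp_id": snp_id,
        "rs_id": rs_id,
        "position": int(position),
        "allele_a": allele_a,
        "allele_b": allele_b,
        "probabilities": per_individual,
    }


# Second line of the example genotype file above (genotypes GG and GT).
record = parse_gen_line("SNP2 rs2 2000 G T 1 0 0 0 1 0")
```

Here `record["probabilities"]` holds one triple per individual, so the second individual's heterozygous call appears as a probability of 1 in the middle (AB) position.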
.bgen
A binary version of a .gen file. This file cannot be visually inspected on the command line. .bgen files are used because they greatly improve the speed and storage efficiency of software handling large amounts of Omics data. The full details of the file format are discussed at: https://www.well.ox.ac.uk/~gav/bgen_format/
.bgen files are normally used with tools such as qctools and snptest. There is also a library for reading .bgen files into R: https://bitbucket.org/gavinband/bgen/wiki/rbgen
.sample
The .sample file is paired with either .gen or .bgen files. It contains information on the samples that is not genetic. It is a plain text file that can be inspected with standard Unix command line tools.
Please note that the sample file format changed with the release of SNPTEST v2. Specifically, the way in which covariates and phenotypes are coded on the second line of the header file has changed.
The sample file has three parts: (a) a header line detailing the names of the columns in the file, (b) a line detailing the types of variables stored in each column, and (c) a line for each individual detailing the information for that individual. Here is an example of the start of a sample file for reference:
| ID_1 | ID_2 | missing | cov_1 | cov_2 | cov_3 | cov_4 | pheno1 | bin1 |
|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | D | D | C | C | P | B |
| 1 | 1 | 0.007 | 1 | 2 | 0.0019 | -0.008 | 1.233 | 1 |
| 2 | 2 | 0.009 | 1 | 2 | 0.0022 | -0.001 | 6.234 | 0 |
| 3 | 3 | 0.005 | 1 | 2 | 0.0025 | 0.0028 | 6.121 | 1 |
| 4 | 4 | 0.007 | 2 | 1 | 0.0017 | -0.011 | 3.234 | 1 |
| 5 | 5 | 0.004 | 3 | 2 | -0.012 | 0.0236 | 2.786 | 0 |
The header line: This line needs a minimum of three entries. The first three entries should always be ID_1, ID_2 and missing. They denote that the first three columns contain the first ID, second ID and missing data proportion of each individual. Additional entries on this line should be the names of covariates or phenotypes that are included in the file. In the above example, there are 4 covariates named cov_1, cov_2, cov_3, cov_4, a continuous phenotype named pheno1 and a binary phenotype named bin1. NOTE: All phenotypes should appear after the covariates in this file.
The second line of the file details the type of variable included in each column. The first three entries of this line should be set to 0. Subsequent entries in this line for covariates and phenotypes should be specified by the following rules:
| D | Discrete covariate (coded using positive integers) |
| C | Continuous covariates |
| P | Continuous Phenotype |
| B | Binary Phenotype (0 = Controls, 1 = Cases) |
The remainder of the file should consist of a line for each individual containing the information specified by the entries of the header line (see example above). Use spaces, not tabs, to separate the entries of the sample file, because a space is the expected separator.
Missing values - Specifying missing values for covariates and phenotypes is possible. It was previously recommended that you use -9 for missing values. This was the default value assumed by SNPTEST v1, although the -missing_code option in SNPTEST v1 meant that you could use other numeric values for the missing code. In SNPTEST v2 the behavior of the -missing_code option has changed so that it now takes a comma-separated list of values, each of which is treated as missing when encountered in the sample file(s). Default missing values are now denoted by the two character string “NA”.
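The three-part structure described above can be checked and read with a short Python sketch. This is an illustration only: the function name `parse_sample_file` is invented, values are kept as strings, and the example text reuses the first two individuals from the table above with one value replaced by “NA” to show the missing-value handling.

```python
def parse_sample_file(text, missing_codes=("NA",)):
    """Parse a SNPTEST v2 style sample file into column types and rows."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    names = lines[0].split()   # (a) header line: column names
    types = lines[1].split()   # (b) type line: 0, D, C, P or B
    if names[:3] != ["ID_1", "ID_2", "missing"]:
        raise ValueError("first three columns must be ID_1, ID_2 and missing")
    if len(types) != len(names):
        raise ValueError("type line and header line differ in length")
    rows = []
    for ln in lines[2:]:       # (c) one line per individual
        values = ln.split()
        if len(values) != len(names):
            raise ValueError("row has wrong number of entries: " + ln)
        # Entries matching a missing code become None.
        rows.append({
            name: (None if value in missing_codes else value)
            for name, value in zip(names, values)
        })
    return dict(zip(names, types)), rows


# Illustrative input; the NA in the second row is added for this example.
example = """ID_1 ID_2 missing cov_1 cov_2 cov_3 cov_4 pheno1 bin1
0 0 0 D D C C P B
1 1 0.007 1 2 0.0019 -0.008 1.233 1
2 2 0.009 1 2 0.0022 NA 6.234 0
"""
column_types, individuals = parse_sample_file(example)
```

Passing a different tuple as `missing_codes`, for example `("NA", "-9")`, mirrors the idea of a list of missing codes, although this sketch does not call SNPTEST itself.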
.ped
A plink format file that is in plain text and can be viewed with standard tools. It contains genetic variant data. https://www.cog-genomics.org/plink/1.9/formats#ped
.map
A plink format file that is in plain text. It contains information about variants. https://www.cog-genomics.org/plink/1.9/formats#map
.bed
A plink format file that is a binary equivalent of a .ped file. It is smaller and faster to process but is not easily viewable or editable. https://www.cog-genomics.org/plink/1.9/formats#bed
.bim
A plink format file, similar to a .map file but used with binary .bed files. https://www.cog-genomics.org/plink/1.9/formats#bim
.fam
A plain text format that contains sample information for plink binary files. https://www.cog-genomics.org/plink/1.9/formats#fam
.csv
A plain text format where fields are separated by commas (comma-separated values).
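Some ALSPAC csv files, such as the gene expression files described earlier in this catalogue, hold IDs in columns and probe IDs in rows. The following Python sketch shows one way to re-key such a layout by participant. The header name, probe IDs, participant IDs and values are all invented for illustration, and the real files may differ in their header layout.

```python
import csv
import io

# Hypothetical miniature file: probe IDs in rows, participant IDs in columns.
text = "probe_id,id_A,id_B\nILMN_0001,7.1,6.9\nILMN_0002,8.4,8.8\n"

reader = csv.reader(io.StringIO(text))
header = next(reader)
participant_ids = header[1:]  # every column after the probe ID column

# Re-key the values by participant, then by probe.
by_participant = {pid: {} for pid in participant_ids}
for row in reader:
    probe_id, values = row[0], row[1:]
    for pid, value in zip(participant_ids, values):
        by_participant[pid][probe_id] = float(value)
```

For a real file, replace `io.StringIO(text)` with an open file handle. Files of several hundred megabytes are better processed row by row, as here, than loaded whole into a spreadsheet.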
.vcf
VCF files are a flexible file format for storing different types of genetic variants. They are a plain text format that can be inspected on the command line with standard Unix tools. However, they are often very large files, and specific tools such as ‘vcftools’ are useful for working with this data. SNPs are commonly stored in these files, but other variants such as copy number variations can also be stored. The basic form of a vcf file is described at: https://en.wikipedia.org/wiki/Variant_Call_Format
.bcf
This is a binary version of a vcf file. It cannot be inspected on the command line, but can be used with the genomic tools mentioned in this document.
.tar.gz
This is a standard Unix file format for bundling and compressing a set of files. It is similar to a .zip file. It is made by first bundling a set of files into a .tar file (sometimes called a tarball). This is then compressed using ‘gzip’. https://en.wikipedia.org/wiki/Tar_(computing) https://en.wikipedia.org/wiki/Gzip
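As a small illustration of the bundle-then-compress idea, the following Python sketch creates a .tar.gz archive from one file, lists its contents and extracts it again, using the standard library `tarfile` module. The file and directory names are invented, and everything happens in a temporary directory. Only extract archives from sources you trust.

```python
import os
import tarfile
import tempfile

workdir = tempfile.mkdtemp()
source = os.path.join(workdir, "example.txt")
with open(source, "w") as handle:
    handle.write("hello\n")

# Bundle and gzip-compress in one step (mode "w:gz").
archive = os.path.join(workdir, "bundle.tar.gz")
with tarfile.open(archive, "w:gz") as tar:
    tar.add(source, arcname="example.txt")

# List the contents, then extract into a separate directory.
outdir = os.path.join(workdir, "extracted")
os.makedirs(outdir)
with tarfile.open(archive, "r:gz") as tar:
    member_names = tar.getnames()
    tar.extractall(outdir)

with open(os.path.join(outdir, "example.txt")) as handle:
    extracted_text = handle.read()
```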
.enc
This file extension is used as a convention to mean that the file is encrypted. You will need the password that was used to encrypt the data in order to decrypt the files. https://en.wikipedia.org/wiki/OpenSSL
Variant/SNP ids
There are many types of genetic variation. A common type is a single nucleotide polymorphism (SNP). Others include copy number variations.
Variants can be specified by a Chromosome and location in reference to a specific build of the human genome. They can also be given a reference SNP (rs) cluster identifier.
- chr:Location
- rs_ids
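The two identifier styles can be told apart with a simple pattern match. The Python sketch below is illustrative only: the function name is invented, the accepted chromosome labels are an assumption, and identifiers that carry extra parts (such as alleles appended after the position) are deliberately not handled.

```python
import re


def parse_variant_id(variant_id):
    """Classify an identifier as an rs ID or a chr:location pair."""
    if re.fullmatch(r"rs\d+", variant_id):
        return {"kind": "rs_id", "rs_id": variant_id}
    # Optional "chr" prefix, then a chromosome label, a colon and a position.
    match = re.fullmatch(r"(?:chr)?([0-9]{1,2}|X|Y|MT?):(\d+)", variant_id)
    if match:
        return {
            "kind": "chr:location",
            "chromosome": match.group(1),
            "position": int(match.group(2)),
        }
    raise ValueError("unrecognised variant identifier: " + variant_id)


by_rs = parse_variant_id("rs123")
by_location = parse_variant_id("chr2:5000")
```

A chr:location identifier is only meaningful together with the genome build it refers to, so it is worth recording the build alongside any parsed positions.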
Overview of Imputation reference panels
SNP array data frequently contain hundreds of thousands of variants. However, due to linkage disequilibrium it is possible to estimate many more SNP values for an individual. This estimation procedure is called imputation, and it works by combining an individual's SNP array data with a large reference population of sequenced data. In this way it is possible to have accurate estimates of millions of SNP values for an individual without the cost of fully sequencing each person. ALSPAC has pre-run the imputation process using several different imputation panels.
Panels
- TOPMed: The latest reference panel (to ALSPAC), which has the most SNPs.
- HRC: This reference panel preceded TOPMed, and our data contains circa 40 million SNPs.
- 1000 Genomes: This is the previous generation reference panel which is still widely used in ALSPAC studies. There are some SNPs that appear in this panel that are not in the HRC panel.
- Hapmap: This was the first widely used imputation panel.
SNP data types from imputation.
SNPs that have been imputed can be stored and analysed in different formats. These can be appropriate for different types of analysis; for example, an analysis could assume an additive effect for the minor allele or it could assume a recessive/dominant effect.
- Best guess. The data will be presented as 0, 1 or 2 to represent how many of the minor alleles at that position a person has. The best guess is derived from the probability of a variant calculated from the imputation process.
- Dosage. These are the probabilities that the person has 0, 1 or 2 copies of the minor allele, e.g. 0.1, 0.2, 0.7. They sum to one across the three possibilities (i.e. for each SNP for each individual).
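The relationship between the two representations can be sketched in Python. The function names are invented for illustration. The second function computes the probability-weighted allele count, a single number between 0 and 2 that many tools also refer to as the dosage, so it is worth checking which meaning a given file or tool uses.

```python
def best_guess(probabilities):
    """Return the allele count (0, 1 or 2) with the highest probability."""
    return max(range(3), key=lambda count: probabilities[count])


def expected_allele_count(probabilities):
    """Return the probability-weighted allele count, between 0 and 2."""
    p0, p1, p2 = probabilities
    return 0 * p0 + 1 * p1 + 2 * p2


# Probabilities of carrying 0, 1 and 2 copies of the minor allele.
probs = (0.1, 0.2, 0.7)
guess = best_guess(probs)
expected = expected_allele_count(probs)
```

For these probabilities the best guess is 2 copies, while the expected count is approximately 1.6, which shows how a best-guess call discards the uncertainty that the probabilities retain.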
SNP Statistics
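You can generate statistics on your SNP data using the program ‘QCtools’. Its per-SNP statistics include the imputation information scores, and it can also summarise each sample. For example, the following command computes per-sample statistics: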
qctool -g example.bgen -s example.sample -sample-stats -osample sample-stats.txt
Best practice
GWAS
We recommend you follow the steps outlined in the following paper when performing GWAS: Marees, Andries T., et al. “A tutorial on conducting genome‐wide association studies: Quality control and statistical analysis.” International journal of methods in psychiatric research 27.2 (2018): e1608. https://doi.org/10.1002/mpr.1608
Phewas
We recommend you follow the steps outlined in the following paper when performing Phewas: Millard, L., Davies, N., Timpson, N. et al. MR-PheWAS: hypothesis prioritization among potential causal effects of body mass index on many outcomes, using Mendelian randomization. Sci Rep 5, 16645 (2015). https://doi.org/10.1038/srep16645
Methylation
The following paper describes the methylation data available in ALSPAC: Relton, Caroline L., et al. “Data resource profile: accessible resource for integrated epigenomic studies (ARIES).” International journal of epidemiology 44.4 (2015): 1181-1190.
Population stratification
Population stratification is when an observed genetic association is driven by systematic differences in ancestry (population or geography) rather than by a real effect of the variant. Not taking this into account can lead to biased effect estimates. One common way to account for it is to calculate principal components (PCs) of the genetic data and then include these as covariables in any models.
ALSPAC do not provide PCs as part of the standard omics datasets, as these would need to be regenerated and tested alongside each freeze. PCs can be generated using PLINK, Hail or a variety of other tools.
For more information about how to do this in plink see: https://www.cog-genomics.org/plink/1.9/strat
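As a rough sketch of this approach (not an ALSPAC-recommended pipeline), PCs could be generated in PLINK 1.9 by first pruning variants in linkage disequilibrium and then running `--pca`. The file prefix `mydata`, the pruning settings (50-variant window, step of 5, r² threshold of 0.2) and the number of PCs (10) are illustrative assumptions; choose values suited to your own analysis.

```shell
# Sketch: compute genetic principal components with PLINK 1.9.
# Usage: compute_pcs <bfile prefix> <number of PCs>
compute_pcs() {
  prefix=$1
  n_pcs=$2
  # Step 1: LD-prune so PCs are not driven by blocks of correlated SNPs.
  plink --bfile "$prefix" --indep-pairwise 50 5 0.2 \
        --out "${prefix}_pruned" || return 1
  # Step 2: PCA on the pruned variants.
  # Writes <prefix>_pca.eigenvec and <prefix>_pca.eigenval.
  plink --bfile "$prefix" --extract "${prefix}_pruned.prune.in" \
        --pca "$n_pcs" --out "${prefix}_pca"
}

# Only run when PLINK and the (hypothetical) input files are present.
if command -v plink >/dev/null 2>&1 && [ -f mydata.bed ]; then
  compute_pcs mydata 10
else
  echo "plink or mydata.bed not found; skipping PCA example"
fi
```

The resulting `mydata_pca.eigenvec` file can then be supplied as covariates in an association model, for example through PLINK's `--covar` option.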
Another common way to account for population substructure is to use linear mixed models, for example with the BOLT-LMM software tool.
https://data.broadinstitute.org/alkesgroup/BOLT-LMM/
Polygenic risk scores (PRS)
These are scores which estimate the combined effect of variants in an individual's genome on a given phenotypic trait or disease.
Further explanations can be found online, such as: https://www.genome.gov/Health/Genomics-and-Medicine/Polygenic-risk-scores
An example tutorial for calculating PRSs: https://www.nature.com/articles/s41596-020-0353-1
Different collaborators often generate PRSs for ALSPAC, but these are not shared as part of our standard omics datasets. Collaborators who require PRSs will need to generate these themselves.
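At its simplest, a PRS for one individual is the sum, over the variants in the score, of the number of effect alleles carried multiplied by a per-variant weight (typically an effect size from an earlier GWAS). The sketch below illustrates only that arithmetic, using awk on two toy, hypothetical files; a real analysis would use dedicated software and deal with issues such as allele alignment and linkage disequilibrium, as covered in the tutorial linked above.

```shell
# Toy per-variant weights (hypothetical values): snp_id effect_size
cat > toy_weights.txt <<'EOF'
rs1 0.5
rs2 -0.2
rs3 0.1
EOF

# Toy effect-allele counts for one individual (hypothetical): snp_id count
cat > toy_dosages.txt <<'EOF'
rs1 2
rs2 1
rs3 0
EOF

# First file: store each weight by SNP id.
# Second file: accumulate weight * count for SNPs that have a weight.
awk 'NR == FNR { w[$1] = $2; next }
     ($1 in w) { score += w[$1] * $2 }
     END { printf "%.2f\n", score }' toy_weights.txt toy_dosages.txt > toy_prs.txt

cat toy_prs.txt
```

Here the score is 0.5 × 2 + (−0.2) × 1 + 0.1 × 0 = 0.80.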
Common tasks
Here we provide links to webpages with instructions, along with brief code examples, for completing common tasks using the software described above (section x):
- Extract some SNPs from a bgen data file and convert to plain text.
https://www.well.ox.ac.uk/~gav/qctool_v2/documentation/examples/filtering_variants.html
- Extract some SNPs from bed data:
http://zzz.bwh.harvard.edu/plink/dataman.shtml
plink --bfile mydata --chr 2 --from-kb 5000 --to-kb 10000 --make-bed --out mydata_subset
- Reading .bgen and .sample oxford files in plink
PLINK supports bgen files, but it is strict about the column types declared in the data.sample file. You may need to remove or retype columns before PLINK will read a data.sample file. For more info see:
https://www.cog-genomics.org/plink/2.0/input
To make a new sample file removing some columns you can use the Unix command:
cut -f 1,2,3 -d " " data.sample > data2.sample
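To illustrate the `cut` command above, this sketch builds a toy Oxford-style sample file (hypothetical IDs and a made-up `pheno1` column) and keeps only the first three space-separated columns. Because `cut` works column-wise, both header lines of the sample file are trimmed in the same way as the data rows.

```shell
# Toy Oxford-format sample file: a header line, a column-type line, then data.
cat > toy_data.sample <<'EOF'
ID_1 ID_2 missing pheno1
0 0 0 P
id1 id1 0 1.5
id2 id2 0 2.5
EOF

# Keep columns 1-3 (ID_1, ID_2, missing), dropping the extra phenotype column.
cut -f 1,2,3 -d " " toy_data.sample > toy_data2.sample

cat toy_data2.sample
```

This relies on the file being delimited by single spaces; a file with tabs or repeated spaces would need a different delimiter or a tool such as awk.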
Courses
Working with ’Omics data can be complicated, but there are many excellent resources available to help you learn how to do this. There are both paid in-person courses and free online courses.
Details on paid courses offered by Bristol University can be found here: https://www.bristol.ac.uk/medical-school/study/short-courses/ In addition, a number of free online courses are summarised here: https://www.mooc-list.com/tags/bioinformatics
Further sources of help
Stack exchange
Stack Exchange is an online Q&A community which is divided into different sub-communities. The first and most well-known is Stack Overflow, which is one of the best places on the Internet to ask questions about programming. Other useful Stack Exchange sites include bioinformatics https://bioinformatics.stackexchange.com/, maths https://mathoverflow.net/ and statistics https://stats.stackexchange.com/.
Biostars
Biostars is a bioinformatics community Q&A website: https://www.biostars.org/
Mailing lists
Individual products or projects often have their own mailing list. For example, to get help using SNPTEST you can ask on the mailing list: https://mathgen.stats.ox.ac.uk/genetics_software/snptest/snptest.html#contact
AI tools
AI tools such as ChatGPT can be useful for understanding how to work with omics data, but please be aware of their limitations and check their answers against documentation or research papers directly.
Ask ALSPAC
If you cannot find the answer to your question, or you think there is something wrong with your data, then please contact the alspac-omics@bristol.ac.uk mailbox and we will do our best to help you.