ALSPAC OMICs Data Catalogue


1. Introduction

Welcome to the ALSPAC Omics Catalog, your gateway to the omics data offered by ALSPAC. The catalog lists named ALSPAC datasets, each consisting of collected or produced data that has been organized, named, and curated for ease of use. Every named ALSPAC dataset comes with metadata describing the dataset as a whole, and has at least one release version comprising a curated selection of files, which are detailed in the metadata sections.

Please note that these datasets are not generally accessible. The metadata is available for browsing so that both internal ALSPAC users and external researchers can understand the data and prepare custom data requests. See http://www.bristol.ac.uk/alspac/researchers/access/ for how to request data.

For external ALSPAC collaborators, we offer standard "freezes" of specific versions of named ALSPAC datasets. These freezes, along with their metadata, are outlined in this catalog. External collaborators are granted access to these freezes upon request (see http://www.bristol.ac.uk/alspac/researchers/access/ ). A freeze is a carefully selected subset of the data files within a version: it contains the core data from a dataset, with data from participants who have withdrawn consent removed and dataset-specific IDs applied. Freezes are updated periodically.

[Figure: NamedALSPACDataset, DatasetVersion, Freeze]
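
The three levels described above (a named dataset, its versions, and the freezes of a version) can be sketched as a small data structure. This is only an illustration: the class and field names below are hypothetical and are not taken from the ALSPAC Data Catalog Schema; the example values come from the gwa_550_g1 entries in this catalog.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Freeze:
    """A subset of a version's files prepared for external collaborators."""
    number: int
    freeze_date: str
    is_current: bool = False


@dataclass
class DatasetVersion:
    """One release version of a named dataset."""
    version_date: str
    is_current: bool = False
    freezes: List[Freeze] = field(default_factory=list)

    def current_freeze(self) -> Optional[Freeze]:
        # Return the freeze flagged as current, or None if there is none.
        for freeze in self.freezes:
            if freeze.is_current:
                return freeze
        return None


@dataclass
class NamedALSPACDataset:
    """A curated, named dataset with one or more versions."""
    name: str
    versions: List[DatasetVersion] = field(default_factory=list)


# Example values taken from the gwa_550_g1 metadata in this catalog.
gwa_550_g1 = NamedALSPACDataset(
    name="gwa_550_g1",
    versions=[
        DatasetVersion(
            version_date="2022-12-05",
            is_current=True,
            freezes=[Freeze(number=2, freeze_date="2022-12-19", is_current=True)],
        )
    ],
)
```

Calling `gwa_550_g1.versions[0].current_freeze()` returns the freeze flagged as current, which mirrors how the "is current version" and "is current freeze" properties are used in the metadata.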

The metadata presented in our catalog adheres to the ALSPAC Data Catalog Schema, which is written in LinkML. For the full schema documentation, please visit: https://alspac.github.io/alspac-data-catalogue-schema/

This website embeds RDFa, which makes the metadata machine-readable and lets compatible tools, such as Apache Any23 and Apache Jena, extract it and query it with SPARQL.

For more information, see the document on FAIR data principles and the document describing the rationale and construction of this catalog.

2. Catalog overview

3. Genetic Array Data

3.1. Genome-wide - Illumina 550 quad - G1 (gwa_550_g1)

3.1.2. Version Docs

alspacdcs:gwa 550 g1 2022-12-05
a
dcat:Dataset
schema:description
Genome-wide array data genotype calls for G1 individuals 2022-12-05.
alspacdcs:has containers
alspacdcs:has freezes
alspacdcs:has next version
NA
alspacdcs:has parts
alspacdcs:a511d25c-ae38-4ae5-ae6b-2e95cad4cc17
a
dcat:Dataset
alspacdcs:data distributions
alspacdcs:0f55fc05-69bf-46a2-82cc-d42335c362f8
a
dcat:Distribution
alspacdcs:belongs to container
alspacdcs:69b0b1a9-9af0-4b85-a9c9-2973fc1ab5c4
a
nfo:Folder
schema:description
A dir/folder containing the plink data files
schema:name
data
dcat:byteSize
14M
schema:description
Extended variant information file accompanying a .bed binary genotype table. (--make-just-bim can be used to update just this file.) A text file with no header line, and one line per variant with the following six fields: 1. Chromosome code (either an integer, or 'X'/'Y'/'XY'/'MT'; '0' indicates unknown) or name 2. Variant identifier 3. Position in morgans or centimorgans (safe to use dummy value of '0') 4. Base-pair coordinate (1-based; limited to 2^31-2) 5. Allele 1 (corresponding to clear bits in .bed; usually minor) 6. Allele 2 (corresponding to set bits in .bed; usually major)
alspacdcs:md5sum
e78729d05d074db4508474379c7fbe8a
dcat:mediaType
.bim
schema:name
data.bim
schema:description
Variant information
schema:name
Variant Information
,
alspacdcs:624c63f2-7750-46ba-837c-2629f2c80e2e
a
dcat:Dataset
alspacdcs:data distributions
alspacdcs:fb0184f4-82b9-472f-86c4-fc3de9986b6c
a
dcat:Distribution
alspacdcs:belongs to container
dcat:byteSize
139k
schema:description
A text file with no header line, and one line per sample with the following six fields: 1. Family ID ('FID') 2. Within-family ID ('IID'; cannot be '0') 3. Within-family ID of father ('0' if father isn't in dataset) 4. Within-family ID of mother ('0' if mother isn't in dataset) 5. Sex code ('1' = male, '2' = female, '0' = unknown) 6. Phenotype value ('1' = control, '2' = case, '-9'/'0'/non-numeric = missing data if case/control)
alspacdcs:md5sum
171c7c4b5877b539a509328a03b8ce35
dcat:mediaType
.fam
schema:name
data.fam
schema:description
sample information
schema:name
data sample information
and
alspacdcs:22528cfc-f8b0-477d-a016-3c3260c6ddd2
a
dcat:Dataset
alspacdcs:data distributions
alspacdcs:a37b5aa2-a52c-4a9c-9d15-c1350ee2a286
a
dcat:Distribution
alspacdcs:belongs to container
dcat:byteSize
999M
schema:description
Plink bed file. Primary representation of genotype calls at biallelic variants. Must be accompanied by .bim and .fam files.
alspacdcs:md5sum
a483efdaf34696f4dfd768e9ba515634
dcat:mediaType
.bed
schema:name
data.bed
schema:description
Genotype data
schema:name
Biallelic genotype table
alspacdcs:has previous version
NA
alspacdcs:is current version
True
schema:name
Genome-wide array data genotype calls for G1 individuals 2022-12-05.
alspacdcs:qc description
alspacdcs:version of
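
The .bim and .fam descriptions above each define a headerless, whitespace-delimited text file with six fields per line. A minimal parsing sketch, assuming exactly that layout, is shown below. The record and function names are hypothetical, and the sample lines in the usage note are invented for illustration; they are not ALSPAC data.

```python
from typing import NamedTuple


class BimRecord(NamedTuple):
    chrom: str        # chromosome code or name; '0' means unknown
    variant_id: str
    cm: float         # position in morgans or centimorgans (often a dummy 0)
    bp: int           # 1-based base-pair coordinate
    allele1: str      # usually the minor allele
    allele2: str      # usually the major allele


class FamRecord(NamedTuple):
    fid: str          # family ID
    iid: str          # within-family ID
    father: str       # '0' if the father is not in the dataset
    mother: str       # '0' if the mother is not in the dataset
    sex: str          # decoded from the 1/2/0 sex code
    phenotype: str    # kept as text; may be numeric or a missing-value code


SEX_CODES = {"1": "male", "2": "female", "0": "unknown"}


def _six_fields(line: str) -> list:
    fields = line.split()
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, found {len(fields)}: {line!r}")
    return fields


def parse_bim_line(line: str) -> BimRecord:
    """Parse one variant line of a .bim file."""
    chrom, variant_id, cm, bp, a1, a2 = _six_fields(line)
    return BimRecord(chrom, variant_id, float(cm), int(bp), a1, a2)


def parse_fam_line(line: str) -> FamRecord:
    """Parse one sample line of a .fam file."""
    fid, iid, father, mother, sex, phenotype = _six_fields(line)
    # Treat any unrecognised sex code as unknown rather than failing.
    return FamRecord(fid, iid, father, mother, SEX_CODES.get(sex, "unknown"), phenotype)
```

For example, `parse_bim_line("1 rs0000 0 12345 A G")` yields a record whose `bp` is the integer 12345, and `parse_fam_line("F1 I1 0 0 2 -9")` yields a record whose `sex` is "female". Reading a whole file is then a matter of applying the relevant function to each non-empty line.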

3.1.3. Freeze Docs

alspacdcs:gwa 550 g1 2022-12-05 f2
a
dcat:Dataset
alspacdcs:all individuals to exclude md5sum
da4785a577a4d837883710f7ab45af51
schema:description
The second freeze of the genome-wide array data for G1 based on a 2022-12-05 release. The data is in plink format.
alspacdcs:freeze date
2022-12-19
alspacdcs:freeze number
2
alspacdcs:freeze of alspac dataset version
alspacdcs:freeze of named alspac dataset
alspacdcs:freeze size
997M
alspacdcs:git tag
alspacdcs:has containers
alspacdcs:has parts
alspacdcs:c3ac5077-d8d4-44d5-9456-3b731d23f67f
a
dcat:Dataset
alspacdcs:data distributions
alspacdcs:c20cd22a-61ac-49a6-8adb-2b877868784f
a
dcat:Distribution
alspacdcs:belongs to container
alspacdcs:5b87a9bf-879b-4d26-b3e2-aab9b14a1fdb
a
nfo:Folder
schema:description
A dir/folder containing the two freeze data files
schema:name
data
dcat:byteSize
249k
schema:description
A text file with no header line, and one line per sample with the following six fields: 1. Family ID ('FID') 2. Within-family ID ('IID'; cannot be '0') 3. Within-family ID of father ('0' if father isn't in dataset) 4. Within-family ID of mother ('0' if mother isn't in dataset) 5. Sex code ('1' = male, '2' = female, '0' = unknown) 6. Phenotype value ('1' = control, '2' = case, '-9'/'0'/non-numeric = missing data if case/control)
alspacdcs:md5sum
0cbe669d9dc4c8b8fb3b2792e3d872ca
dcat:mediaType
.fam
schema:name
freeze_id.fam
alspacdcs:number of participants
8224
schema:description
Sample ids
schema:name
sample info
,
alspacdcs:af4a19ce-a0c0-4086-80da-da4a6865dae0
a
dcat:Dataset
alspacdcs:data distributions
alspacdcs:f00b1310-f7f6-47c7-b46d-7082f43f542d
a
dcat:Distribution
alspacdcs:belongs to container
dcat:byteSize
14M
schema:description
Extended variant information file accompanying a .bed binary genotype table. (--make-just-bim can be used to update just this file.) A text file with no header line, and one line per variant with the following six fields: 1. Chromosome code (either an integer, or 'X'/'Y'/'XY'/'MT'; '0' indicates unknown) or name 2. Variant identifier 3. Position in morgans or centimorgans (safe to use dummy value of '0') 4. Base-pair coordinate (1-based; limited to 2^31-2) 5. Allele 1 (corresponding to clear bits in .bed; usually minor) 6. Allele 2 (corresponding to set bits in .bed; usually major)
alspacdcs:md5sum
c7fa007331fab0e8b6ce5b78412848da
dcat:mediaType
.bim
schema:name
freeze_id.bim
alspacdcs:number of variants
500527
schema:description
Information about SNPs
schema:name
Variant Information
and
alspacdcs:8d57fbeb-51de-48f9-a92f-92d70f936a5a
a
dcat:Dataset
alspacdcs:data distributions
alspacdcs:dfc98933-69fa-4e53-99ab-55e50836ccbf
a
dcat:Distribution
alspacdcs:belongs to container
dcat:byteSize
1.7M
schema:description
Produced automatically when the input data contains heterozygous calls where they shouldn't be possible (haploid chromosomes, male X/Y), or there are nonmissing calls for nonmales on the Y chromosome. A text file with one line per error (sorted primarily by variant ID, secondarily by sample ID) with the following three fields: 1. Family ID 2. Within-family ID 3. Variant ID
alspacdcs:md5sum
f6aa76751c6b3ce70a83f999c036970b
dcat:mediaType
.hh
schema:name
freeze_id.hh
schema:description
A plink report
schema:name
Heterozygous haploid and nonmale Y chromosome call list
alspacdcs:is current freeze
true
alspacdcs:linker file md5sum
33b08d90fa3e43504308f20283088a6b
schema:name
Genome-wide array data including raw files and genotype calls for G1 individuals 2022-12-05 freeze 2
alspacdcs:woc file md5sum
2feb3852dfd14c2868072624fd7fa1ea
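
Each distribution in the freeze metadata above carries an alspacdcs:md5sum value, which can be used to confirm that a received file matches the catalogued one. The sketch below assumes the freeze files have been copied to a local directory under the names given in the metadata; the function names and the directory layout are hypothetical, while the checksums are copied from the entries above.

```python
import hashlib
from pathlib import Path

# Checksums copied from the freeze metadata above.
EXPECTED_MD5 = {
    "freeze_id.fam": "0cbe669d9dc4c8b8fb3b2792e3d872ca",
    "freeze_id.bim": "c7fa007331fab0e8b6ce5b78412848da",
    "freeze_id.hh": "f6aa76751c6b3ce70a83f999c036970b",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_freeze(directory, expected=EXPECTED_MD5):
    """Map each expected file name to True if it exists and its MD5 matches."""
    results = {}
    for name, md5 in expected.items():
        path = Path(directory) / name
        results[name] = path.is_file() and md5_of_file(path) == md5.lower()
    return results
```

Calling `verify_freeze("data")` on a hypothetical local `data` directory returns a dictionary of file names to booleans; any `False` entry indicates a missing file or a checksum mismatch and is worth investigating before analysis.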

3.2. Genome-wide - Illumina exome core array - G0 partners (gwa_exome_g0p)

3.2.2. Version Docs

alspacdcs:gwa exome g0p 2016-11-22
a
dcat:Dataset
schema:description
Version 2016-11-22 of Genome-wide array data including raw files and genotype calls for G2 individuals
alspacdcs:has containers
alspacdcs:has freezes
alspacdcs:has next version
NA
alspacdcs:has parts
alspacdcs:d80c84b9-e2f5-4076-b722-40364e93e7b4
a
dcat:Dataset
alspacdcs:data distributions
alspacdcs:64307fde-e52e-4cf6-b501-fc3eefd46030
a
dcat:Distribution
alspacdcs:belongs to container
dcat:byteSize
96.4KB
alspacdcs:md5sum
a62ff8156561c8d03a29331420542c01
dcat:mediaType
.jpg
schema:name
200739110110_R05C02_1_Focus_scan#1_swath#1_point#1_try#1.jpg
schema:name
200739110110_R05C02_1_Focus_scan#1_swath#1_point#1_try#1
,
alspacdcs:959953e6-cfdb-4a09-8d14-2f086a211667
a
dcat:Dataset
alspacdcs:data distributions
alspacdcs:02ea8dbd-04c1-4c3b-889b-2f8d3c4df109
a
dcat:Distribution
alspacdcs:belongs to container
dcat:byteSize
100.0KB
alspacdcs:md5sum
1528b55c9bbc4c350adc0a462680475c
dcat:mediaType
.jpg
schema:name
200739100056_R03C01_3_Focus_scan#1_swath#1_point#1_try#1.jpg
schema:name
200739100056_R03C01_3_Focus_scan#1_swath#1_point#1_try#1
,
alspacdcs:a3ce29c5-3c73-41b6-b997-98ff55680519
a
dcat:Dataset
alspacdcs:data distributions
alspacdcs:b18d641e-4d6f-42bc-98ae-55421638e191
a
dcat:Distribution
alspacdcs:belongs to container
dcat:byteSize
96.6KB
alspacdcs:md5sum
b0c2f6819439a647b74fec3638a81a59
dcat:mediaType
.jpg
schema:name
200739100016_R06C01_4_Focus_scan#1_swath#1_point#1_try#1.jpg
schema:name
200739100016_R06C01_4_Focus_scan#1_swath#1_point#1_try#1