FinnGen UKBB MVP Meta Analysis Step
gentropy.finngen_ukb_mvp_meta.FinngenUkbMvpMetaSummaryStatisticsIngestionStep
FinnGen, UK Biobank and Million Veteran Program (MVP) meta-analysis summary statistics ingestion step.
Process overview
The step performs the following operations:
- Prepares `FinnGenManifest` and `EFOCuration`.
- Builds the `StudyIndex`.
- Reads the raw summary statistics paths from `StudyIndex`.
- Converts source summary statistics from BGZIP into Parquet.
- Prepares `VariantDirection` for allele flipping.
- Harmonises `SummaryStatistics`.
- Performs quality control on harmonised `SummaryStatistics`.
- Updates `StudyIndex` with QC results.
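The BGZIP-to-Parquet conversion listed above starts by reading block-gzipped text. BGZIP files are valid multi-member gzip streams, so Python's standard `gzip` module can read them. The sketch below only illustrates that reading stage and is not how the step itself is implemented; the function name `read_bgzip_tsv` and the tab-separated layout are assumptions made for the example.

```python
import csv
import gzip
from pathlib import Path


def read_bgzip_tsv(path: Path) -> list[dict[str, str]]:
    """Read a (block-)gzipped, tab-separated file into a list of row dicts.

    Hypothetical helper for illustration; not part of gentropy.
    """
    # gzip.open reads concatenated gzip members transparently,
    # which is how BGZIP lays out its compressed blocks.
    with gzip.open(path, mode="rt", newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))
```

All values come back as strings, so a real conversion would still need to cast columns to a typed schema before writing Parquet.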
graph TD
%% --- INPUTS ---
A1([source_manifest_path]) --> B1
A2([efo_curation_path]) --> B2
A3([gnomad_variant_index_path]) --> G1
A4([Source Summary Statistics in BGZIP format]) --> C3
%% --- STEP 1: StudyIndex ---
subgraph "Building studyIndex"
B1["FinnGenMetaManifest"] --> C1["StudyIndex"]
B2["EFOMapping"] --> C1
end
%% --- STEP 2: Raw Summary Statistics ---
subgraph "Downloading summary statistics"
C1 --> C2["List of summary statistics paths"]
C2 --> C3["Raw summary statistics in parquet format"]
end
%% --- STEP 3: Variant Annotations ---
subgraph "Variant Annotations"
G1["VariantIndex"] --> G2["VariantDirection"]
end
%% --- STEP 4: Harmonised Summary Statistics ---
subgraph "Harmonising summary statistics"
C3 --> D1["Allele flipping"]
B1 --> D1
G2 --> D1
D1 --> D2["Removal of not meta-analysed variants"]
D2 --> D3["Removal of low imputation score variants"]
D3 --> D4["Removal of low allele count variants"]
D4 --> E1["Harmonised summary statistics in parquet format"]
end
%% --- STEP 5: QC ---
subgraph "Summary Statistics QC"
E1 --> Q1["SummaryStatistics QC"]
Q1 --> Q2["StudyIndex annotated with QC"]
C1 --> Q2
end
%% --- STYLING ---
classDef input fill:#f8f8ff,stroke:#555,stroke-width:1px,color:#000;
classDef output fill:#e7ffe7,stroke:#555,stroke-width:1px,color:#000;
class A1,A2,A3,A4 input;
class Q2,E1,Q1 output;
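The filtering stages in the harmonisation subgraph can be sketched in plain Python as a cascade of three filters. This is a minimal illustration, not the gentropy implementation: the `Variant` record and its field names are hypothetical, the thresholds mirror the documented defaults (`0.8` and `20`), and the sketch assumes that values equal to a threshold are kept, which this page does not specify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """Hypothetical, simplified record; not the gentropy schema."""

    variant_id: str
    is_meta_analysed: bool
    imputation_score: float
    allele_count: int


def apply_filters(
    variants: list[Variant],
    imputation_score_threshold: float = 0.8,
    min_allele_count_threshold: int = 20,
) -> list[Variant]:
    """Apply the three removal stages in the order shown in the diagram."""
    # 1. Remove variants that were not meta-analysed.
    kept = [v for v in variants if v.is_meta_analysed]
    # 2. Remove variants with a low imputation score (assumed inclusive bound).
    kept = [v for v in kept if v.imputation_score >= imputation_score_threshold]
    # 3. Remove variants with a low allele count (assumed inclusive bound).
    return [v for v in kept if v.allele_count >= min_allele_count_threshold]
```

Keeping each stage as a separate pass makes it easy to switch one off, which is what the `perform_*_filter` flags of the step allow.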
Inputs
- This step requires the gnomAD variant index to perform the allele flipping during harmonisation.
- The `source_manifest_path` should point to a manifest that includes paths to the summary statistics files.
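To show why a reference variant index is needed for allele flipping, here is a minimal sketch of the general technique. It assumes a reference set of `(chromosome, position, ref, alt)` tuples; when an association reports its alleles in the opposite order, the alleles are swapped, the sign of the effect size is reversed, and the effect allele frequency becomes its complement. The `Assoc` type and `harmonise` function are hypothetical, strand-complement handling is omitted, and dropping unmatched variants is an assumption of the sketch rather than documented behaviour.

```python
from typing import NamedTuple, Optional


class Assoc(NamedTuple):
    """Hypothetical association record used only in this example."""

    chrom: str
    pos: int
    ref: str
    alt: str
    beta: float
    eaf: float  # frequency of the effect (alt) allele


def harmonise(
    assoc: Assoc, reference: set[tuple[str, int, str, str]]
) -> Optional[Assoc]:
    """Align an association to the allele order used by the reference."""
    if (assoc.chrom, assoc.pos, assoc.ref, assoc.alt) in reference:
        return assoc  # already in reference orientation
    if (assoc.chrom, assoc.pos, assoc.alt, assoc.ref) in reference:
        # Alleles are reversed: swap them, negate beta, complement the frequency.
        return assoc._replace(
            ref=assoc.alt, alt=assoc.ref, beta=-assoc.beta, eaf=1.0 - assoc.eaf
        )
    return None  # not found in the reference (dropped in this sketch)
```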
Outputs
This step outputs 4 artifacts:
- Raw summary statistics in Parquet format.
- Harmonised summary statistics in Parquet format.
- Summary statistics QC results in Parquet format.
- Study Index in Parquet format (updated with QC results).
Source code in src/gentropy/finngen_ukb_mvp_meta.py
__init__(session: Session, source_manifest_path: str, efo_curation_path: str, gnomad_variant_index_path: str, study_index_output_path: str, raw_summary_statistics_output_path: str, harmonised_summary_statistics_output_path: str, harmonised_summary_statistics_qc_output_path: str, perform_meta_analysis_filter: bool = True, imputation_score_threshold: float = 0.8, perform_imputation_score_filter: bool = True, min_allele_count_threshold: int = 20, perform_min_allele_count_filter: bool = True, min_allele_frequency_threshold: float = 0.0001, perform_min_allele_frequency_filter: bool = False, filter_out_ambiguous_variants: bool = False, qc_threshold: float = 1e-08) -> None
Data ingestion and harmonisation step for the FinnGen, UK Biobank and MVP meta-analysis.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `session` | `Session` | Session object. | required |
| `source_manifest_path` | `str` | Path to the manifest file. | required |
| `efo_curation_path` | `str` | Path to the EFO curation file. | required |
| `gnomad_variant_index_path` | `str` | Path to the gnomAD variant index file. | required |
| `study_index_output_path` | `str` | Output path for the study index. | required |
| `raw_summary_statistics_output_path` | `str` | Output path for raw summary statistics. | required |
| `harmonised_summary_statistics_output_path` | `str` | Output path for harmonised summary statistics. | required |
| `harmonised_summary_statistics_qc_output_path` | `str` | Output path for harmonised summary statistics QC results. | required |
| `perform_meta_analysis_filter` | `bool` | Whether to filter non-meta analyzed variants. | `True` |
| `imputation_score_threshold` | `float` | Imputation score threshold. | `0.8` |
| `perform_imputation_score_filter` | `bool` | Whether to filter low imputation scores. | `True` |
| `min_allele_count_threshold` | `int` | Minimum allele count threshold. | `20` |
| `perform_min_allele_count_filter` | `bool` | Whether to filter low allele counts. | `True` |
| `min_allele_frequency_threshold` | `float` | Minimum allele frequency threshold. | `0.0001` |
| `perform_min_allele_frequency_filter` | `bool` | Whether to filter low allele frequencies. | `False` |
| `filter_out_ambiguous_variants` | `bool` | Whether to filter out ambiguous variants. | `False` |
| `qc_threshold` | `float` | P-value threshold for QC. | `1e-08` |
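The defaults above mean that three filters are active unless overridden. The sketch below gathers the optional parameters into a small dataclass to make that explicit. `FilterSettings` and `enabled_filters` are hypothetical names introduced for this example and are not part of gentropy; only the field names and default values are taken from the documented signature.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FilterSettings:
    """Hypothetical container mirroring the documented optional parameters."""

    perform_meta_analysis_filter: bool = True
    imputation_score_threshold: float = 0.8
    perform_imputation_score_filter: bool = True
    min_allele_count_threshold: int = 20
    perform_min_allele_count_filter: bool = True
    min_allele_frequency_threshold: float = 0.0001
    perform_min_allele_frequency_filter: bool = False
    filter_out_ambiguous_variants: bool = False
    qc_threshold: float = 1e-08

    def enabled_filters(self) -> list[str]:
        """Return the names of the filters that are switched on."""
        flags = {
            "meta_analysis": self.perform_meta_analysis_filter,
            "imputation_score": self.perform_imputation_score_filter,
            "min_allele_count": self.perform_min_allele_count_filter,
            "min_allele_frequency": self.perform_min_allele_frequency_filter,
            "ambiguous_variants": self.filter_out_ambiguous_variants,
        }
        return [name for name, enabled in flags.items() if enabled]
```

Because the field names match the constructor's keyword arguments, `dataclasses.asdict(FilterSettings())` yields a dictionary that could be combined with the required path arguments when building the step.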
Raises:
| Type | Description |
|---|---|
| `AssertionError` | If no summary statistics paths are found in the study index. |
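The failure mode described above can be illustrated with a short guard. This is a sketch under stated assumptions, not the step's code: `collect_paths` is a hypothetical helper, study index rows are modelled as plain dictionaries, and the key name `summarystatsLocation` is assumed for the example.

```python
def collect_paths(study_rows: list[dict]) -> list[str]:
    """Collect summary statistics paths and fail fast if there are none.

    Hypothetical helper; the row layout and key name are assumptions.
    """
    paths = [
        row["summarystatsLocation"]
        for row in study_rows
        if row.get("summarystatsLocation")
    ]
    # Fail early, as continuing without any paths would leave nothing to ingest.
    assert len(paths) > 0, "No summary statistics paths found in the study index."
    return paths
```

Note that `assert` statements are skipped when Python runs with the `-O` flag, so a caller relying on this check should not run in optimised mode.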
Source code in src/gentropy/finngen_ukb_mvp_meta.py