
gwas_catalog_sumstat_preprocess

gentropy.gwas_catalog_sumstat_preprocess.GWASCatalogSumstatsPreprocessStep

Step to preprocess GWAS Catalog harmonised summary stats.

It also applies a sanity filter to the summary statistics before saving them.

Source code in src/gentropy/gwas_catalog_sumstat_preprocess.py
class GWASCatalogSumstatsPreprocessStep:
    """Step to preprocess GWAS Catalog harmonised summary stats.

    It also applies a sanity filter to the summary statistics before saving them.
    """

    def __init__(
        self, session: Session, raw_sumstats_path: str, out_sumstats_path: str
    ) -> None:
        """Run step to preprocess GWAS Catalog harmonised summary stats and produce SummaryStatistics dataset.

        Args:
            session (Session): Session object.
            raw_sumstats_path (str): Input GWAS Catalog harmonised summary stats path.
            out_sumstats_path (str): Output SummaryStatistics dataset path.
        """
        # Processing dataset:
        GWASCatalogSummaryStatistics.from_gwas_harmonized_summary_stats(
            session.spark, raw_sumstats_path
        ).sanity_filter().df.write.mode(session.write_mode).parquet(out_sumstats_path)
        session.logger.info("Processing dataset successfully completed.")
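
The source above calls `sanity_filter()` but does not show which rows it rejects. The following is a minimal sketch of that kind of row-level check, written in plain Python rather than Spark. The column names `pValue` and `beta` and the specific rules are assumptions made for illustration, not gentropy's actual criteria.

```python
import math
from typing import Iterable, List, Mapping, Optional

# Hypothetical row shape: column names are assumptions, not gentropy's schema.
Row = Mapping[str, Optional[float]]


def is_sane(row: Row) -> bool:
    """Return True if a row passes the illustrative checks below."""
    p_value = row.get("pValue")
    beta = row.get("beta")
    # Reject rows with missing statistics.
    if p_value is None or beta is None:
        return False
    # Reject rows whose statistics are NaN.
    if math.isnan(p_value) or math.isnan(beta):
        return False
    # Keep only p-values in the half-open interval (0, 1].
    return 0.0 < p_value <= 1.0


def sanity_filter(rows: Iterable[Row]) -> List[Row]:
    """Keep only the rows that pass is_sane (assumed rules, see above)."""
    return [row for row in rows if is_sane(row)]


rows = [
    {"pValue": 0.01, "beta": 0.2},          # kept
    {"pValue": 0.0, "beta": 0.1},           # dropped: p-value is not above 0
    {"pValue": 0.5, "beta": None},          # dropped: missing beta
    {"pValue": 1.5, "beta": 0.3},           # dropped: p-value above 1
    {"pValue": 0.2, "beta": float("nan")},  # dropped: NaN beta
]
kept = sanity_filter(rows)
```

In the real step the equivalent filtering runs on a Spark DataFrame before the Parquet write, so rejected rows never reach the output path.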

__init__(session: Session, raw_sumstats_path: str, out_sumstats_path: str) -> None

Run step to preprocess GWAS Catalog harmonised summary stats and produce SummaryStatistics dataset.

Parameters:

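| Name | Type | Description | Default |
| --- | --- | --- | --- |
| session | Session | Session object providing the Spark session, write mode, and logger. | required |
| raw_sumstats_path | str | Input GWAS Catalog harmonised summary stats path. | required |
| out_sumstats_path | str | Output SummaryStatistics dataset path. | required |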
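
Because all of the work happens inside `__init__`, constructing the step is what runs it; there is no separate `run()` call. The sketch below mimics that control flow with stand-ins so it can run without Spark or gentropy installed. `StubSession`, `StubPreprocessStep`, and both paths are hypothetical names introduced for illustration only.

```python
import logging
from dataclasses import dataclass, field
from typing import List


@dataclass
class StubSession:
    """Hypothetical stand-in exposing the attributes the step reads."""

    # Spark's DataFrameWriter.mode accepts "append", "overwrite",
    # "ignore", and "error"/"errorifexists".
    write_mode: str = "overwrite"
    logger: logging.Logger = field(
        default_factory=lambda: logging.getLogger("stub_step")
    )


class StubPreprocessStep:
    """Mirrors the read -> sanity_filter -> write chain, recording each stage."""

    def __init__(
        self, session: StubSession, raw_sumstats_path: str, out_sumstats_path: str
    ) -> None:
        # The real step performs these stages on a Spark DataFrame;
        # here they are only recorded as strings.
        self.stages: List[str] = [
            f"read {raw_sumstats_path}",
            "sanity_filter",
            f"write parquet {out_sumstats_path} mode={session.write_mode}",
        ]
        session.logger.info("Processing dataset successfully completed.")


# Instantiating the step triggers the whole pipeline (paths are made up).
step = StubPreprocessStep(
    session=StubSession(write_mode="overwrite"),
    raw_sumstats_path="input/harmonised_sumstats",
    out_sumstats_path="output/summary_statistics",
)
```

With the real class, the call has the same shape: `GWASCatalogSumstatsPreprocessStep(session=..., raw_sumstats_path=..., out_sumstats_path=...)`, where `session.write_mode` decides what happens if the output path already exists.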
Source code in src/gentropy/gwas_catalog_sumstat_preprocess.py
def __init__(
    self, session: Session, raw_sumstats_path: str, out_sumstats_path: str
) -> None:
    """Run step to preprocess GWAS Catalog harmonised summary stats and produce SummaryStatistics dataset.

    Args:
        session (Session): Session object.
        raw_sumstats_path (str): Input GWAS Catalog harmonised summary stats path.
        out_sumstats_path (str): Output SummaryStatistics dataset path.
    """
    # Processing dataset:
    GWASCatalogSummaryStatistics.from_gwas_harmonized_summary_stats(
        session.spark, raw_sumstats_path
    ).sanity_filter().df.write.mode(session.write_mode).parquet(out_sumstats_path)
    session.logger.info("Processing dataset successfully completed.")