Intervals
gentropy.dataset.intervals.Intervals
dataclass
Bases: Dataset
Intervals dataset links genes to genomic regions based on genome interaction studies.
Source code in src/gentropy/dataset/intervals.py
from_source(spark: SparkSession, source_name: str, source_path: str, gene_index: GeneIndex, lift: LiftOverSpark) -> Intervals
classmethod
Collect interval data for a particular source.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
spark | SparkSession | Spark session | required |
source_name | str | Name of the interval source | required |
source_path | str | Path to the interval source file | required |
gene_index | GeneIndex | Gene index | required |
lift | LiftOverSpark | LiftOverSpark instance to convert coordinates from GRCh37 (hg19) to GRCh38 (hg38) | required |
Returns:
Name | Type | Description |
---|---|---|
Intervals | Intervals | Intervals dataset |
Raises:
Type | Description |
---|---|
ValueError | If the source name is not recognised |
Source code in src/gentropy/dataset/intervals.py
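The ValueError behaviour described above can be illustrated with a minimal sketch of dispatching on a source name. Everything here is an assumption made for illustration: the source names `example_a` and `example_b`, the parser functions, and `from_source_sketch` itself are hypothetical, and the real method also takes a Spark session, a gene index and a lift-over instance, which are omitted.

```python
from typing import Callable, Dict


# Hypothetical parsers: this section does not list the real source names
# or the classes that parse them.
def _parse_example_a(path: str) -> str:
    return f"parsed A from {path}"


def _parse_example_b(path: str) -> str:
    return f"parsed B from {path}"


# Registry mapping a source name to the function that reads that source.
PARSERS: Dict[str, Callable[[str], str]] = {
    "example_a": _parse_example_a,
    "example_b": _parse_example_b,
}


def from_source_sketch(source_name: str, source_path: str) -> str:
    """Pick a parser by name; reject names that are not registered."""
    if source_name not in PARSERS:
        raise ValueError(f"Unknown interval source: {source_name}")
    return PARSERS[source_name](source_path)
```

Calling `from_source_sketch("unknown", "file.bed")` raises ValueError, mirroring the documented behaviour for an unrecognised source name.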
get_schema() -> StructType
classmethod
Provides the schema for the Intervals dataset.
Returns:
Name | Type | Description |
---|---|---|
StructType | StructType | Schema for the Intervals dataset |
Source code in src/gentropy/dataset/intervals.py
v2g(variant_index: VariantIndex) -> V2G
Convert intervals into V2G by intersecting with a variant index.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
variant_index | VariantIndex | Variant index dataset | required |
Returns:
Name | Type | Description |
---|---|---|
V2G | V2G | Variant-to-gene evidence dataset |
Source code in src/gentropy/dataset/intervals.py
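The idea of intersecting intervals with a variant index can be sketched in plain Python. This is not the library's Spark implementation. The variant fields (`variantId`, `chromosome`, `position`), the inclusive interval bounds and the function name `intersect_sketch` are all assumptions; the interval fields follow the schema on this page, where `start` and `end` are stored as strings.

```python
from typing import Dict, List, Tuple


def intersect_sketch(
    intervals: List[Dict[str, str]],
    variants: List[Dict[str, object]],
) -> List[Tuple[str, str]]:
    """Return (variantId, geneId) pairs where the variant lies in an interval.

    Assumes inclusive bounds and matching chromosome names.
    """
    pairs: List[Tuple[str, str]] = []
    for variant in variants:
        for interval in intervals:
            if variant["chromosome"] != interval["chromosome"]:
                continue
            # start/end are strings in the schema, so cast before comparing.
            start, end = int(interval["start"]), int(interval["end"])
            if start <= int(variant["position"]) <= end:
                pairs.append((str(variant["variantId"]), interval["geneId"]))
    return pairs


# Illustrative data only; identifiers are made up.
intervals = [
    {"chromosome": "1", "start": "100", "end": "200", "geneId": "GENE_A"},
    {"chromosome": "2", "start": "100", "end": "200", "geneId": "GENE_B"},
]
variants = [
    {"variantId": "1_150_A_T", "chromosome": "1", "position": 150},
    {"variantId": "1_500_G_C", "chromosome": "1", "position": 500},
]
result = intersect_sketch(intervals, variants)
```

A nested loop is used here for readability; on real data this would be a join on chromosome with a range condition.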
Schema
root
|-- chromosome: string (nullable = false)
|-- start: string (nullable = false)
|-- end: string (nullable = false)
|-- geneId: string (nullable = false)
|-- resourceScore: double (nullable = true)
|-- score: double (nullable = true)
|-- datasourceId: string (nullable = false)
|-- datatypeId: string (nullable = false)
|-- pmid: string (nullable = true)
|-- biofeature: string (nullable = true)
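To make the nullability rules above concrete, the following sketch checks a plain dictionary against the printed schema. It is a stdlib-only illustration and not how the library validates data; the helper `validate_row` and the example values are hypothetical.

```python
from typing import Any, Dict, List, Tuple

# Field name -> (expected Python type, nullable), mirroring the schema above.
INTERVALS_FIELDS: Dict[str, Tuple[type, bool]] = {
    "chromosome": (str, False),
    "start": (str, False),
    "end": (str, False),
    "geneId": (str, False),
    "resourceScore": (float, True),
    "score": (float, True),
    "datasourceId": (str, False),
    "datatypeId": (str, False),
    "pmid": (str, True),
    "biofeature": (str, True),
}


def validate_row(row: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the row conforms."""
    problems: List[str] = []
    for name, (expected_type, nullable) in INTERVALS_FIELDS.items():
        value = row.get(name)
        if value is None:
            if not nullable:
                problems.append(f"{name} must not be null")
        elif not isinstance(value, expected_type):
            problems.append(f"{name} should be {expected_type.__name__}")
    return problems


# Illustrative row; all values are made up.
example_row = {
    "chromosome": "1",
    "start": "100",
    "end": "200",
    "geneId": "GENE_A",
    "score": 0.5,
    "datasourceId": "example_source",
    "datatypeId": "example_type",
}
```

Here `validate_row(example_row)` returns an empty list because the omitted fields (`resourceScore`, `pmid`, `biofeature`) are nullable.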