Intervals
gentropy.dataset.intervals.Intervals
dataclass
Bases: Dataset
The Intervals dataset links genes to genomic regions based on genome interaction studies.
Source code in src/gentropy/dataset/intervals.py
from_source(spark: SparkSession, source_name: str, source_path: str, gene_index: GeneIndex, lift: LiftOverSpark) -> Intervals
classmethod
Collect interval data for a particular source.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
spark | SparkSession | Spark session | required |
source_name | str | Name of the interval source | required |
source_path | str | Path to the interval source file | required |
gene_index | GeneIndex | Gene index | required |
lift | LiftOverSpark | LiftOverSpark instance used to convert coordinates from GRCh37 (hg19) to GRCh38 (hg38) | required |
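The `lift` parameter converts interval coordinates from GRCh37 to GRCh38. Liftover tools generally work from a chain file describing ungapped aligned blocks between the two assemblies. The following is a simplified, dependency-free sketch of that idea, not LiftOverSpark's implementation: `AlignedBlock`, `lift_position` and `lift_interval` are hypothetical names, the blocks are made up, and coordinates are assumed to be 0-based and half-open. Strand and size-change limits are ignored, so an interval whose two ends land in different blocks is kept as long as both ends map to the same chromosome in the right order.

```python
from __future__ import annotations

from typing import NamedTuple, Optional


class AlignedBlock(NamedTuple):
    """One ungapped aligned block between two assemblies (hypothetical, simplified)."""

    src_chrom: str
    src_start: int  # 0-based start on the source assembly (GRCh37)
    dst_chrom: str
    dst_start: int  # 0-based start on the target assembly (GRCh38)
    size: int  # number of bases in the block


def lift_position(
    blocks: list[AlignedBlock], chrom: str, pos: int
) -> Optional[tuple[str, int]]:
    """Map one 0-based position; return None if no block covers it."""
    for block in blocks:
        if block.src_chrom == chrom and block.src_start <= pos < block.src_start + block.size:
            return block.dst_chrom, block.dst_start + (pos - block.src_start)
    return None


def lift_interval(
    blocks: list[AlignedBlock], chrom: str, start: int, end: int
) -> Optional[tuple[str, int, int]]:
    """Lift a half-open interval [start, end) by lifting its first and last base.

    The interval is dropped (None) if either end is unmapped, the ends land on
    different chromosomes, or they come out in reverse order.
    """
    first = lift_position(blocks, chrom, start)
    last = lift_position(blocks, chrom, end - 1)  # last base inside the interval
    if first is None or last is None:
        return None
    if first[0] != last[0] or last[1] < first[1]:
        return None
    return first[0], first[1], last[1] + 1


# Made-up blocks for illustration only.
blocks = [
    AlignedBlock("1", 1000, "1", 1500, 500),
    AlignedBlock("1", 2000, "1", 2600, 300),
]
print(lift_position(blocks, "1", 1200))  # ('1', 1700)
print(lift_interval(blocks, "1", 1100, 1200))  # ('1', 1600, 1700)
print(lift_interval(blocks, "1", 1400, 1600))  # None: base 1599 is not in any block
```

Dropping intervals that cannot be mapped cleanly, rather than guessing a position, is the conservative choice for a gene-linking dataset.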
Returns:
Name | Type | Description |
---|---|---|
Intervals | Intervals | Intervals dataset |
Raises:
Type | Description |
---|---|
ValueError | If the source name is not recognised |
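`from_source` raises `ValueError` for an unrecognised `source_name`, which suggests it selects a source-specific parser by name. A minimal sketch of that dispatch pattern, assuming a plain dictionary lookup; `PARSERS`, `collect_intervals` and the `source_a`/`source_b` parsers are hypothetical placeholders, not gentropy's actual source names or functions:

```python
from __future__ import annotations

from typing import Callable


# Hypothetical per-source parsers; real ones would read and reshape the file.
def parse_source_a(path: str) -> list[dict]:
    return [{"datasourceId": "source_a", "path": path}]


def parse_source_b(path: str) -> list[dict]:
    return [{"datasourceId": "source_b", "path": path}]


# Registry mapping each recognised source name to its parser.
PARSERS: dict[str, Callable[[str], list[dict]]] = {
    "source_a": parse_source_a,
    "source_b": parse_source_b,
}


def collect_intervals(source_name: str, source_path: str) -> list[dict]:
    """Pick the parser for source_name; unknown names raise ValueError."""
    try:
        parser = PARSERS[source_name]
    except KeyError:
        raise ValueError(
            f"Unrecognised interval source {source_name!r}; "
            f"expected one of {sorted(PARSERS)}"
        ) from None
    return parser(source_path)


print(collect_intervals("source_a", "intervals.tsv"))
# [{'datasourceId': 'source_a', 'path': 'intervals.tsv'}]
```

Keeping the valid names in one registry means the error message can list them, and adding a source is a one-line change.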
get_schema() -> StructType
classmethod
Provides the schema for the Intervals dataset.
Returns:
Name | Type | Description |
---|---|---|
StructType | StructType | Schema for the Intervals dataset |
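`get_schema` returns a PySpark `StructType`. As a dependency-free sketch, the fields from the schema listing can be transcribed into plain Python and used to check a single row for nullability and type. `INTERVALS_FIELDS` and `schema_violations` are hypothetical helpers, and the example row is made up.

```python
from __future__ import annotations

# (field name, expected Python type, nullable), transcribed from the schema listing.
INTERVALS_FIELDS: list[tuple[str, type, bool]] = [
    ("chromosome", str, False),
    ("start", str, False),
    ("end", str, False),
    ("geneId", str, False),
    ("resourceScore", float, True),
    ("score", float, True),
    ("datasourceId", str, False),
    ("datatypeId", str, False),
    ("pmid", str, True),
    ("biofeature", str, True),
]


def schema_violations(row: dict) -> list[str]:
    """Return a list of problems; an empty list means the row conforms.

    A missing key is treated the same as an explicit None, and type checks
    are strict (an int is not accepted where a float is expected).
    """
    expected = {name for name, _, _ in INTERVALS_FIELDS}
    problems = [f"unexpected field: {name}" for name in sorted(set(row) - expected)]
    for name, py_type, nullable in INTERVALS_FIELDS:
        value = row.get(name)
        if value is None:
            if not nullable:
                problems.append(f"{name} is null but the schema marks it non-nullable")
        elif not isinstance(value, py_type):
            problems.append(f"{name} should be {py_type.__name__}, got {type(value).__name__}")
    return problems


row = {
    "chromosome": "1", "start": "1000", "end": "2000", "geneId": "GENE_A",
    "resourceScore": 0.8, "score": None, "datasourceId": "source_a",
    "datatypeId": "example_type", "pmid": None, "biofeature": None,
}
print(schema_violations(row))  # []
print(schema_violations(dict(row, start=1000, geneId=None)))
# ['start should be str, got int', 'geneId is null but the schema marks it non-nullable']
```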
Schema
root
|-- chromosome: string (nullable = false)
|-- start: string (nullable = false)
|-- end: string (nullable = false)
|-- geneId: string (nullable = false)
|-- resourceScore: double (nullable = true)
|-- score: double (nullable = true)
|-- datasourceId: string (nullable = false)
|-- datatypeId: string (nullable = false)
|-- pmid: string (nullable = true)
|-- biofeature: string (nullable = true)
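Because the dataset links genes to genomic regions, a typical lookup asks which genes have an interval covering a given position. `start` and `end` are typed as strings in the schema, so they must be cast before any numeric comparison; compared as strings, `"900"` sorts after `"1000"`. A minimal pure-Python sketch over made-up rows, assuming both interval ends are inclusive; `genes_overlapping` is a hypothetical helper:

```python
from __future__ import annotations


def genes_overlapping(rows: list[dict], chromosome: str, position: int) -> set[str]:
    """Return geneIds of intervals on `chromosome` that contain `position`.

    start/end are stored as strings, so they are cast to int first.
    Both ends are treated as inclusive (an assumption about the convention).
    """
    hits = set()
    for row in rows:
        if row["chromosome"] != chromosome:
            continue
        if int(row["start"]) <= position <= int(row["end"]):
            hits.add(row["geneId"])
    return hits


# Made-up rows carrying only the fields the lookup needs.
rows = [
    {"chromosome": "1", "start": "900", "end": "1500", "geneId": "GENE_A"},
    {"chromosome": "1", "start": "1200", "end": "3000", "geneId": "GENE_B"},
    {"chromosome": "2", "start": "900", "end": "1500", "geneId": "GENE_C"},
]
print(sorted(genes_overlapping(rows, "1", 1300)))  # ['GENE_A', 'GENE_B']
print("900" <= "1000")  # False: why string coordinates must be cast
```

On a Spark DataFrame the same filter would need an explicit cast, for example `f.col("start").cast("long")` with `pyspark.sql.functions` imported as `f`, before comparing against the position.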