Gene index
Bases: Dataset
Gene index dataset.
Gene-based annotation.
Source code in src/otg/dataset/gene_index.py
filter_by_biotypes(biotypes)
Filter by approved biotypes.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
biotypes | list | List of Ensembl biotypes to keep. | required |
Returns:
Name | Type | Description |
---|---|---|
GeneIndex | GeneIndex | Gene index dataset filtered by biotypes. |
Source code in src/otg/dataset/gene_index.py
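As a rough illustration of what filtering by biotype means, the sketch below uses plain Python dictionaries rather than the Spark-backed `GeneIndex`. The helper `keep_biotypes` and the sample rows are hypothetical; only the `geneId` and `biotype` field names come from the schema. In this sketch, rows with a missing biotype are dropped, which is an assumption about the intended behaviour.

```python
from typing import Iterable, List

# Hypothetical sample rows; field names follow the GeneIndex schema.
genes = [
    {"geneId": "ENSG_A", "biotype": "protein_coding"},
    {"geneId": "ENSG_B", "biotype": "lncRNA"},
    {"geneId": "ENSG_C", "biotype": None},  # biotype is nullable
]


def keep_biotypes(rows: Iterable[dict], biotypes: List[str]) -> List[dict]:
    """Keep only rows whose biotype is in the approved list (hypothetical helper)."""
    approved = set(biotypes)
    return [row for row in rows if row["biotype"] in approved]


filtered = keep_biotypes(genes, ["protein_coding"])
```

Here `filtered` holds only the `ENSG_A` row, since it is the only one whose biotype appears in the list passed in.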
from_source(target_index)
classmethod
Initialise GeneIndex from source dataset.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
target_index | DataFrame | Target index dataframe | required |
Returns:
Name | Type | Description |
---|---|---|
GeneIndex | GeneIndex | Gene index dataset |
Source code in src/otg/dataset/gene_index.py
get_schema()
classmethod
Provides the schema for the GeneIndex dataset.
Source code in src/otg/dataset/gene_index.py
locations_lut()
Gene location information.
Returns:
Name | Type | Description |
---|---|---|
DataFrame | DataFrame | Gene LUT including genomic location information. |
Source code in src/otg/dataset/gene_index.py
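A location lookup table can be thought of as a projection of the gene index onto its location columns. The plain-Python sketch below assumes those columns are `geneId`, `chromosome` and `tss` (taken from the schema); the actual columns returned by `locations_lut()` may differ, and `locations_lookup` is a hypothetical helper, not part of the library.

```python
from typing import Iterable, List

# Assumed location columns, chosen from the GeneIndex schema.
LOCATION_FIELDS = ("geneId", "chromosome", "tss")

# Hypothetical sample rows.
genes = [
    {"geneId": "ENSG_A", "chromosome": "1", "tss": 1000, "approvedSymbol": "AAA"},
    {"geneId": "ENSG_B", "chromosome": "X", "tss": None, "approvedSymbol": "BBB"},
]


def locations_lookup(rows: Iterable[dict]) -> List[dict]:
    """Project each row onto the assumed location fields only."""
    return [{field: row[field] for field in LOCATION_FIELDS} for row in rows]


lut = locations_lookup(genes)
```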
symbols_lut()
Gene symbol lookup table.
Pre-processes the gene/target dataset to create a lookup table of gene symbols, including obsolete gene symbols.
Returns:
Name | Type | Description |
---|---|---|
DataFrame | DataFrame | Gene LUT for symbol mapping containing |
Source code in src/otg/dataset/gene_index.py
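To make the idea of a symbol lookup concrete, the sketch below flattens each gene's approved symbol and its obsolete symbols into `(geneId, symbol)` pairs. It is plain Python, not the Spark implementation; the helper `symbols_lookup`, the pair layout and the sample rows are assumptions for illustration, while the field names follow the schema.

```python
from typing import Iterable, List, Tuple

# Hypothetical sample rows; obsoleteSymbols is a nullable array of structs.
genes = [
    {
        "geneId": "ENSG_A",
        "approvedSymbol": "AAA",
        "obsoleteSymbols": [{"label": "OLD1", "source": "HGNC"}],
    },
    {"geneId": "ENSG_B", "approvedSymbol": "BBB", "obsoleteSymbols": None},
]


def symbols_lookup(rows: Iterable[dict]) -> List[Tuple[str, str]]:
    """Emit one (geneId, symbol) pair per approved or obsolete symbol."""
    pairs = []
    for row in rows:
        symbols = [row["approvedSymbol"]]
        # A null array is treated as empty.
        symbols += [entry["label"] for entry in (row["obsoleteSymbols"] or [])]
        pairs += [(row["geneId"], s) for s in symbols if s is not None]
    return pairs


lut = symbols_lookup(genes)
```

With such a table, an outdated symbol like `OLD1` resolves to the same `geneId` as the current symbol `AAA`.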
Schema
root
|-- geneId: string (nullable = false)
|-- chromosome: string (nullable = false)
|-- approvedSymbol: string (nullable = true)
|-- biotype: string (nullable = true)
|-- approvedName: string (nullable = true)
|-- obsoleteSymbols: array (nullable = true)
|    |-- element: struct (containsNull = true)
|    |    |-- label: string (nullable = true)
|    |    |-- source: string (nullable = true)
|-- tss: long (nullable = true)
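The schema marks only `geneId` and `chromosome` as non-nullable. A minimal plain-Python check of that constraint might look like the sketch below; `missing_required` is a hypothetical helper and is not part of the `GeneIndex` API.

```python
from typing import List

# Fields declared with nullable = false in the schema.
REQUIRED_FIELDS = ("geneId", "chromosome")


def missing_required(row: dict) -> List[str]:
    """Return the required fields that are absent or null in a row."""
    return [field for field in REQUIRED_FIELDS if row.get(field) is None]


problems = missing_required({"geneId": "ENSG_A", "chromosome": None})
```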