Besides providing a name for every human gene, the HGNC complete
gene set also includes mappings of HGNC symbols to
gene entries in other popular databases or resources, e.g. to Entrez
gene or UCSC gene identifiers.
The helper crosswalk() provides a simple interface to crosswalk (i.e.
map or translate) between identifiers. It matches the supplied terms
exactly against one column of an imported HGNC complete gene set and
returns the corresponding values from another column.
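The idea of an exact-match lookup between two columns can be sketched in base R with match(). This is only an illustration on a toy table, not the implementation of crosswalk(); the helper name lookup_exact() and the two-row data frame toy_dt are assumptions made up for the example.

```r
# Toy stand-in for an HGNC data set (values taken from the examples in this
# article); a real data set has many more rows and columns.
toy_dt <- data.frame(
  symbol = c("A1BG", "A1BG-AS1"),
  hgnc_id = c("HGNC:5", "HGNC:37133"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: find each value in column `from` and return the
# entry of column `to` on the same row. match() compares exactly, so
# unmatched values give NA.
lookup_exact <- function(value, from, to, data) {
  data[[to]][match(value, data[[from]])]
}

ids <- lookup_exact(
  c("A1BG-AS1", "A1BG", "a1bg"),
  from = "symbol",
  to = "hgnc_id",
  data = toy_dt
)

# "a1bg" differs in case from "A1BG", so it has no exact match.
is.na(ids)
#> [1] FALSE FALSE  TRUE
```

In this sketch the results come back in the order of the input, and anything without an exact match is NA.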
Gene symbol to HUGO identifier and back
# Example bundled data set.
hgnc_dt <- hgnc_dataset_example()
# From gene symbol to HUGO identifier
crosswalk(
  c("A1BG", "A1BG-AS1"),
  from = "symbol",
  to = "hgnc_id",
  hgnc_dataset = hgnc_dt
)
#> [1] "HGNC:5" "HGNC:37133"
# and back.
crosswalk(
  c("HGNC:5", "HGNC:37133"),
  from = "hgnc_id",
  to = "symbol",
  hgnc_dataset = hgnc_dt
)
#> [1] "A1BG" "A1BG-AS1"
Gene symbol to aliases
Note that this is typically a one-to-many mapping, so the result is a
list instead of a character vector.
crosswalk(
  c("A1BG", "A1BG-AS1", "A1CF"),
  from = "symbol",
  to = "alias_symbol",
  hgnc_dataset = hgnc_dt
)
#> [[1]]
#> [1] NA
#>
#> [[2]]
#> [1] "FLJ23569"
#>
#> [[3]]
#> [1] "ACF" "ASP" "ACF64" "ACF65" "APOBEC1CF"
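A list result can be awkward to join back to a table. One way to flatten it, sketched here in base R, is to repeat each input symbol once per returned alias. To keep the sketch self-contained, the list below is typed in by hand from the output above rather than produced by crosswalk(); the names symbols, aliases and alias_tbl are just for illustration.

```r
symbols <- c("A1BG", "A1BG-AS1", "A1CF")
aliases <- list(
  NA_character_,
  "FLJ23569",
  c("ACF", "ASP", "ACF64", "ACF65", "APOBEC1CF")
)

# One row per (symbol, alias) pair; lengths() gives the number of aliases
# returned for each symbol.
alias_tbl <- data.frame(
  symbol = rep(symbols, times = lengths(aliases)),
  alias_symbol = unlist(aliases),
  stringsAsFactors = FALSE
)

nrow(alias_tbl)
#> [1] 7
```

Symbols without an alias keep a single row with NA, which can be dropped with alias_tbl[!is.na(alias_tbl$alias_symbol), ] if it is not needed.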
Gene symbol to locus group
# From gene symbol to locus group
crosswalk(
  c("A1BG", "A1BG-AS1"),
  from = "symbol",
  to = "locus_group",
  hgnc_dataset = hgnc_dt
)
#> [1] "protein-coding gene" "non-coding RNA"
Gene symbol to locus type
# From gene symbol to locus type
crosswalk(
  c("A1BG", "A1BG-AS1"),
  from = "symbol",
  to = "locus_type",
  hgnc_dataset = hgnc_dt
)
#> [1] "gene with protein product" "RNA, long non-coding"
Gene symbol to Entrez gene id
# From gene symbol to Entrez gene id
crosswalk(
  c("A1BG", "A1BG-AS1"),
  from = "symbol",
  to = "entrez_id",
  hgnc_dataset = hgnc_dt
)
#> [1] 1 503538
Gene symbol to Ensembl gene id
# From gene symbol to Ensembl gene id
crosswalk(
  c("A1BG", "A1BG-AS1"),
  from = "symbol",
  to = "ensembl_gene_id",
  hgnc_dataset = hgnc_dt
)
#> [1] "ENSG00000121410" "ENSG00000268895"
Gene symbol to MGI marker id
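Map human gene symbols to the identifiers of their mouse homologs, i.e.
Mouse Genome Informatics (MGI) marker ids. As with aliases, a gene may
map to several identifiers, so the result is a list. If you also need to
work with mouse gene names, you might find the package
mgi.report.reader useful.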
# From gene symbol to MGI marker id
crosswalk(
  c("A1BG", "A1BG-AS1", "AADACL4"),
  from = "symbol",
  to = "mgd_id",
  hgnc_dataset = hgnc_dt
)
#> [[1]]
#> [1] "MGI:2152878"
#>
#> [[2]]
#> [1] NA
#>
#> [[3]]
#> [1] "MGI:2685282" "MGI:2685284" "MGI:2685880" "MGI:3650257" "MGI:3650721"
#> [6] "MGI:3652194"