Import HGNC data — import_hgnc

import_hgnc_dataset() imports HGNC data into R. Specify a directory path in addition if you wish to save the data to disk.

Usage

import_hgnc_dataset(file = latest_archive_url())

Arguments

file: A file or URL of the complete HGNC data set (in TSV format). Use list_archives() to list previous versions of these data. Pass one of the URLs (column url) to file to import that specific version. By default the value of file is the URL corresponding to the latest version, i.e. the returned value of latest_archive_url().
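
As a hedged sketch of the workflow described for file: list the archives, pick one URL from the url column, and pass it to import_hgnc_dataset(). Taking the first row is an assumption made only for illustration; the row order of list_archives() is not checked here, and the call needs network access.

```r
library(hgnc)

# List the archived releases; per the argument description,
# the `url` column holds the download URLs.
archives <- list_archives()

# Pick one archived version. Using the first row is an illustrative
# assumption; inspect `archives` to choose the release you need.
archive_url <- archives$url[1]

# Import that specific version instead of the default latest one.
hgnc_data <- import_hgnc_dataset(file = archive_url)
```

Calling import_hgnc_dataset() with no arguments is equivalent to passing file = latest_archive_url().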

Value

A tibble of the HGNC data set consisting of 55 columns:

hgnc_id: A unique ID provided by HGNC for each gene with an approved symbol. IDs are of the format 'HGNC:n', where n is a unique number. HGNC IDs remain stable even if a name or symbol changes.
hgnc_id2: A stripped down version of hgnc_id where the prefix 'HGNC:' has been removed. This column is added by the package {hgnc}.
symbol: The official gene symbol approved by the HGNC, typically a short form of the gene name. Symbols are approved in accordance with the Guidelines for Human Gene Nomenclature.
name: The full gene name approved by the HGNC; corresponds to the approved symbol above.
locus_group: A group name for a set of related locus types as defined by the HGNC. One of: 'protein-coding gene', 'non-coding RNA', 'pseudogene' or 'other'.
locus_type: Specifies the genetic class of each gene entry, including various types of RNA and other gene-related categories, such as pseudogenes and virus integration sites.
status: Status of the symbol report, which can be either 'Approved' or 'Entry Withdrawn'.
location: Chromosomal location. Indicates the cytogenetic location of the gene or region on the chromosome, e.g., '19q13.43'. In the absence of that information, it may be listed as 'not on reference assembly', 'unplaced', or 'reserved'.
location_sortable: A sortable version of the location column, allowing easier sorting by chromosomal location.
alias_symbol: Alternative symbols that have been used to refer to the gene. Aliases may be from literature, other databases, or represent membership of a gene group.
alias_name: Alternative names for the gene. Aliases may be from literature, other databases, or represent membership of a gene group.
prev_symbol: Symbols that were previously approved by the HGNC for this gene.
prev_name: Names that were previously approved by the HGNC for this gene.
gene_group: A gene group. Each gene has been assigned to one or more groups, according to either sequence similarity or information from publications, specialist advisors, or other databases.
gene_group_id: Gene group identifier, an integer number. See gene_group for the gene group name.
date_approved_reserved: The date the entry was first approved.
date_symbol_changed: The date the gene symbol was last changed.
date_name_changed: The date the gene name was last changed.
date_modified: Date the entry was last modified.
entrez_id: Entrez gene identifier.
ensembl_gene_id: Ensembl gene identifier.
vega_id: VEGA gene identifier.
ucsc_id: UCSC gene identifier.
ena: International Nucleotide Sequence Database Collaboration (GenBank, ENA and DDBJ) accession number(s).
refseq_accession: The Reference Sequence (RefSeq) identifier for that entry, provided by the NCBI.
ccds_id: Consensus CDS identifier.
uniprot_ids: UniProt protein accession.
pubmed_id: PubMed and Europe PubMed Central PMIDs.
mgd_id: Mouse Genome Informatics (MGI) database identifier.
rgd_id: Rat Genome Database (RGD) gene identifier.
lsdb: The name of the Locus Specific Mutation Database and URL for the gene.
cosmic: Symbol used within the Catalogue Of Somatic Mutations In Cancer (COSMIC) for the gene.
omim_id: Online Mendelian Inheritance in Man (OMIM) identifier.
mirbase: miRBase identifier.
homeodb: Homeobox Database identifier.
snornabase: snoRNABase identifier.
bioparadigms_slc: Symbol used to link to the SLC tables database at bioparadigms.org for the gene.
orphanet: Orphanet identifier.
pseudogene_org: Pseudogene.org identifier.
horde_id: Symbol used within HORDE for the gene.
merops: Identifier used to link to the MEROPS peptidase database.
imgt: Symbol used within the international ImMunoGeneTics information system (IMGT).
iuphar: The objectId used to link to the IUPHAR/BPS Guide to PHARMACOLOGY database.
kznf_gene_catalog: Lawrence Livermore National Laboratory Human KZNF Gene Catalog (LLNL) identifier.
mamit_trnadb: Identifier to link to the Mamit-tRNA database.
cd: Symbol used within the Human Cell Differentiation Molecule database for the gene.
lncrnadb: lncRNA Database identifier.
enzyme_id: ENZYME EC accession number.
intermediate_filament_db: Identifier used to link to the Human Intermediate Filament Database.
rna_central_ids: Identifier in RNAcentral, the non-coding RNA sequence database.
lncipedia: The LNCipedia identifier to which the gene belongs. This will only appear if the gene is a long non-coding RNA.
gtrnadb: The GtRNAdb identifier to which the gene belongs. This will only appear if the gene is a tRNA.
agr: The Alliance of Genome Resources HGNC ID for the human gene page within the resource.
mane_select: MANE Select nucleotide accession with version (i.e., NCBI RefSeq or Ensembl transcript ID and version).
gencc: Gene Curation Coalition (GenCC) Database identifier.
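
To illustrate how a few of these columns relate, here is a minimal base R sketch on a hand-written toy data frame (hgnc_toy is a hypothetical stand-in, not a real download). It derives a prefix-free identifier in the way hgnc_id2 is described and filters on locus_group and status. The column types in the real tibble may differ from this toy.

```r
# Toy stand-in for a few columns of the imported data set
# (hand-written rows for illustration, not an actual import).
hgnc_toy <- data.frame(
  hgnc_id     = c("HGNC:5", "HGNC:37133"),
  symbol      = c("A1BG", "A1BG-AS1"),
  locus_group = c("protein-coding gene", "non-coding RNA"),
  status      = c("Approved", "Approved"),
  stringsAsFactors = FALSE
)

# Drop the leading 'HGNC:' prefix, mirroring the description of hgnc_id2.
hgnc_toy$hgnc_id2 <- sub("^HGNC:", "", hgnc_toy$hgnc_id)

# Keep only approved protein-coding genes.
protein_coding <- hgnc_toy[
  hgnc_toy$locus_group == "protein-coding gene" &
    hgnc_toy$status == "Approved",
]
```

The same filter should carry over to the tibble returned by import_hgnc_dataset(), since it relies only on the locus_group and status values listed above.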

Examples

# Not run: imports from the URL of the latest archive.
if (FALSE) import_hgnc_dataset()