import_hgnc_dataset() imports HGNC data into R. Specify a directory path
in addition if you wish to save the data to disk.
Usage
import_hgnc_dataset(file = latest_archive_url())
Arguments
- file
A file or URL of the complete HGNC data set (in TSV format). Use list_archives() to list previous versions of these data. Pass one of the URLs (column url) to file to import that specific version. By default the value of file is the URL corresponding to the latest version, i.e. the returned value of latest_archive_url().
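As a minimal sketch of the workflow described for the file argument, the helper below picks one archived URL and imports it. The name import_archive_by_row() and its row parameter are hypothetical and not part of {hgnc}. The sketch assumes that list_archives() returns a table with a url column, as the text above indicates. It makes no assumption about row order, so check the listing before choosing a row.

```r
# Hypothetical helper (not part of {hgnc}): import the archive listed in a given row.
import_archive_by_row <- function(row = 1L) {
  # list_archives() lists previous versions; per the text, the URLs are in column `url`.
  archives <- hgnc::list_archives()
  stopifnot(row >= 1L, row <= nrow(archives))
  archive_url <- archives$url[[row]]
  # Pass the chosen URL to `file` to import that specific version.
  hgnc::import_hgnc_dataset(file = archive_url)
}

# Not run here (both calls need network access):
# hgnc_archived <- import_archive_by_row(1L)
# hgnc_latest   <- hgnc::import_hgnc_dataset()  # default: file = latest_archive_url()
```

Defining the helper does not download anything. Data is only fetched when the function is called.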
Value
A tibble of the HGNC data set consisting of 55 columns:
- hgnc_id: A unique ID provided by HGNC for each gene with an approved symbol. IDs are of the format 'HGNC:n', where n is a unique number. HGNC IDs remain stable even if a name or symbol changes.
- hgnc_id2: A stripped down version of hgnc_id where the prefix 'HGNC:' has been removed. This column is added by the package {hgnc}.
- symbol: The official gene symbol approved by the HGNC, typically a short form of the gene name. Symbols are approved in accordance with the Guidelines for Human Gene Nomenclature.
- name: The full gene name approved by the HGNC; corresponds to the approved symbol above.
- locus_group: A group name for a set of related locus types as defined by the HGNC. One of: 'protein-coding gene', 'non-coding RNA', 'pseudogene' or 'other'.
- locus_type: Specifies the genetic class of each gene entry, including various types of RNA and other gene-related categories, such as pseudogenes and virus integration sites.
- status: Status of the symbol report, which can be either 'Approved' or 'Entry Withdrawn'.
- location: Chromosomal location. Indicates the cytogenetic location of the gene or region on the chromosome, e.g., '19q13.43'. In the absence of that information, it may be listed as 'not on reference assembly', 'unplaced', or 'reserved'.
- location_sortable: A sortable version of the location column, allowing easier sorting by chromosomal location.
- alias_symbol: Alternative symbols that have been used to refer to the gene. Aliases may be from literature, other databases, or represent membership of a gene group.
- alias_name: Alternative names for the gene. Aliases may be from literature, other databases, or represent membership of a gene group.
- prev_symbol: This field displays any symbols that were previously HGNC-approved nomenclature.
- prev_name: This field displays any names that were previously HGNC-approved nomenclature.
- gene_group: A gene group. Each gene has been assigned to one or more groups, according to either sequence similarity or information from publications, specialist advisors, or other databases.
- gene_group_id: Gene group identifier, an integer number. This column contains the gene group identifiers. See gene_group for the gene group name.
- date_approved_reserved: The date the entry was first approved.
- date_symbol_changed: The date the gene symbol was last changed.
- date_name_changed: The date the gene name was last changed.
- date_modified: Date the entry was last modified.
- entrez_id: Entrez gene identifier.
- ensembl_gene_id: Ensembl gene identifier.
- vega_id: VEGA gene identifier.
- ucsc_id: UCSC gene identifier.
- ena: International Nucleotide Sequence Database Collaboration (GenBank, ENA and DDBJ) accession number(s).
- refseq_accession: The Reference Sequence (RefSeq) identifier for that entry, provided by the NCBI.
- ccds_id: Consensus CDS identifier.
- uniprot_ids: UniProt protein accession.
- pubmed_id: Pubmed and Europe Pubmed Central PMIDs.
- mgd_id: Mouse genome informatics database identifier.
- rgd_id: Rat genome database gene identifier.
- lsdb: The name of the Locus Specific Mutation Database and URL for the gene.
- cosmic: Symbol used within the Catalogue of somatic mutations in cancer for the gene.
- omim_id: Online Mendelian Inheritance in Man (OMIM) identifier.
- mirbase: miRBase identifier.
- homeodb: Homeobox Database identifier.
- snornabase: snoRNABase identifier.
- bioparadigms_slc: Symbol used to link to the SLC tables database at bioparadigms.org for the gene.
- orphanet: Orphanet identifier.
- pseudogene_org: Pseudogene.org identifier.
- horde_id: Symbol used within HORDE for the gene.
- merops: Identifier used to link to the MEROPS peptidase database.
- imgt: Symbol used within international ImMunoGeneTics information system.
- iuphar: The objectId used to link to the IUPHAR/BPS Guide to PHARMACOLOGY database.
- kznf_gene_catalog: Lawrence Livermore National Laboratory Human KZNF Gene Catalog (LLNL) identifier.
- mamit_trnadb: Identifier to link to the Mamit-tRNA database.
- cd: Symbol used within the Human Cell Differentiation Molecule database for the gene.
- lncrnadb: lncRNA Database identifier.
- enzyme_id: ENZYME EC accession number.
- intermediate_filament_db: Identifier used to link to the Human Intermediate Filament Database.
- rna_central_ids: Identifier in the RNAcentral, The non-coding RNA sequence database.
- lncipedia: The LNCipedia identifier to which the gene belongs. This will only appear if the gene is a long non-coding RNA.
- gtrnadb: The GtRNAdb identifier to which the gene belongs. This will only appear if the gene is a tRNA.
- agr: The Alliance of Genomic Resources HGNC ID for the Human gene page within the resource.
- mane_select: MANE Select nucleotide accession with version (i.e., NCBI RefSeq or Ensembl transcript ID and version).
- gencc: Gene Curation Coalition (GenCC) Database identifier.
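
To illustrate how a few of these columns relate, the sketch below works on a small hand-made data frame, not on the real data set. The rows are placeholders, not real HGNC records. The sub() call is only an assumption about how a prefix-free ID like hgnc_id2 could be derived. The source does not state how {hgnc} computes it or which type it stores.

```r
# Toy stand-in for the imported tibble: placeholder rows, not real HGNC records.
toy <- data.frame(
  hgnc_id     = c("HGNC:1", "HGNC:2", "HGNC:3"),
  symbol      = c("GENEA", "GENEB", "GENEC"),
  locus_group = c("protein-coding gene", "pseudogene", "protein-coding gene"),
  status      = c("Approved", "Approved", "Entry Withdrawn"),
  stringsAsFactors = FALSE
)

# Strip the 'HGNC:' prefix to mimic hgnc_id2 (kept as character here, by assumption).
toy$hgnc_id2 <- sub("^HGNC:", "", toy$hgnc_id)

# Keep only approved protein-coding genes, using the documented value sets.
approved_pc <- toy[toy$status == "Approved" &
                     toy$locus_group == "protein-coding gene", ]
```

The same filter should carry over to the tibble returned by import_hgnc_dataset(), since it relies only on the status and locus_group values documented above.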