The mgi.20240527 package is an R data annotation package that provides Mouse Genome Informatics (MGI) gene marker identifiers, symbols, and names. It supports quick mapping between symbols, identifiers, and names, as well as conversion to and from Entrez and Ensembl gene identifiers. The data are pinned to the MGI release of May 27, 2024, so all mappings reflect the gene models available in that snapshot rather than any later update.
This package may be useful if you need a fast, offline solution for mouse gene marker remapping and are comfortable with the data being pinned to the May 27, 2024 release.
Note that the authors of this package are not affiliated with the original MGI data providers.
Installation
# install.packages("pak")
pak::pak("patterninstitute/mgi.20240527")
MGI gene markers
The function markers() returns the gene markers that had an associated gene model as of 27 May 2024.
library(mgi.20240527)
markers()
#> # A tibble: 71,489 × 15
#> mrk_id mrk_type mrk_symbol mrk_name genome_build entrez_gene_id ncbi_gene_chr
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 MGI:8… Gene a nonagou… GRCm39 50518 2
#> 2 MGI:8… Gene Pzp PZP, al… GRCm39 11287 6
#> 3 MGI:8… Gene Abl1 c-abl o… GRCm39 11350 2
#> 4 MGI:8… Gene Abl2 ABL pro… GRCm39 11352 1
#> 5 MGI:8… Gene Scgb1b27 secreto… GRCm39 11354 7
#> 6 MGI:8… Gene Scgb2b27 secreto… GRCm39 233099 7
#> 7 MGI:8… Gene Scgb2b26 secreto… GRCm39 110187 7
#> 8 MGI:8… Gene Acadl acyl-Co… GRCm39 11363 1
#> 9 MGI:8… Gene Acadm acyl-Co… GRCm39 11364 3
#> 10 MGI:8… Gene Acads acyl-Co… GRCm39 11409 5
#> # ℹ 71,479 more rows
#> # ℹ 8 more variables: ncbi_gene_start <int>, ncbi_gene_end <int>,
#> # ncbi_gene_strand <chr>, ensembl_gene_id <chr>, ensembl_gene_chr <chr>,
#> # ensembl_gene_start <int>, ensembl_gene_end <int>, ensembl_gene_strand <chr>
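Because the result is a tibble, it can be summarised like any other data frame. The sketch below does not call the package: it re-types the ten rows printed above, using only the columns that are fully visible, into a base R data frame named `first_rows` (a name chosen here for illustration) and then counts markers per chromosome. It assumes the same operations carry over unchanged to the full markers() output.

```r
# First ten rows of the markers() output shown above, re-typed by hand.
# Only columns that are fully visible in the printed tibble are included.
first_rows <- data.frame(
  mrk_symbol = c("a", "Pzp", "Abl1", "Abl2", "Scgb1b27",
                 "Scgb2b27", "Scgb2b26", "Acadl", "Acadm", "Acads"),
  entrez_gene_id = c("50518", "11287", "11350", "11352", "11354",
                     "233099", "110187", "11363", "11364", "11409"),
  ncbi_gene_chr = c("2", "6", "2", "1", "7", "7", "7", "1", "3", "5"),
  stringsAsFactors = FALSE
)

# Number of markers per chromosome (NCBI gene model).
chr_counts <- table(first_rows$ncbi_gene_chr)

# Symbols of the markers located on chromosome 7.
chr7_symbols <- first_rows$mrk_symbol[first_rows$ncbi_gene_chr == "7"]
chr7_symbols
#> [1] "Scgb1b27" "Scgb2b27" "Scgb2b26"
```

Note that identifier and chromosome columns are character vectors, so comparisons use quoted values such as "7".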
Crosswalk between identifiers or symbols
# As of 27 May 2024, the Dcdc3 symbol is deprecated in favor of Rp1.
update_symbol("Dcdc3")
#> [1] "Rp1"
# Because Dcdc3 is remapped to Rp1, both symbols map to MGI:1341105.
symbol_to_marker_id("Dcdc3")
#> [1] "MGI:1341105"
symbol_to_marker_id("Rp1")
#> [1] "MGI:1341105"
symbol_to_ensembl_id("Rp1")
#> [1] "ENSMUSG00000025900"
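The behaviour above, where a deprecated symbol and its replacement resolve to the same marker, can be pictured as a two-step lookup: first update the symbol, then look up the identifier. The following base R sketch illustrates that idea with the values shown above. It is not the package's implementation, and `synonyms`, `marker_ids`, `toy_update_symbol()`, and `toy_symbol_to_marker_id()` are hypothetical names introduced only for this example.

```r
# Hypothetical lookup tables, filled with the values shown above.
synonyms   <- c(Dcdc3 = "Rp1")          # deprecated symbol -> current symbol
marker_ids <- c(Rp1 = "MGI:1341105")    # current symbol -> MGI marker ID

# Hypothetical helper: replace deprecated symbols, leave the rest unchanged.
# (The real update_symbol() may treat unknown symbols differently.)
toy_update_symbol <- function(x) {
  is_old <- x %in% names(synonyms)
  x[is_old] <- synonyms[x[is_old]]
  unname(x)
}

# Hypothetical helper: update first, then look up; unknown symbols give NA.
toy_symbol_to_marker_id <- function(x) {
  unname(marker_ids[toy_update_symbol(x)])
}

toy_symbol_to_marker_id(c("Dcdc3", "Rp1", "NotAGene"))
#> [1] "MGI:1341105" "MGI:1341105" NA
```

Updating before the lookup is what lets old and new symbols share one identifier table.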
Markers with genomic coordinates
markers("MGI:87860")
#> # A tibble: 1 × 15
#> mrk_id mrk_type mrk_symbol mrk_name genome_build entrez_gene_id ncbi_gene_chr
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 MGI:87… Gene Abl2 ABL pro… GRCm39 11352 1
#> # ℹ 8 more variables: ncbi_gene_start <int>, ncbi_gene_end <int>,
#> # ncbi_gene_strand <chr>, ensembl_gene_id <chr>, ensembl_gene_chr <chr>,
#> # ensembl_gene_start <int>, ensembl_gene_end <int>, ensembl_gene_strand <chr>
markers("ENSMUSG00000066583", "ensembl_gene_id")
#> # A tibble: 1 × 15
#> ensembl_gene_id mrk_id mrk_type mrk_symbol mrk_name genome_build
#> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 ENSMUSG00000066583 MGI:87862 Gene Scgb1b27 secretoglobin, … GRCm39
#> # ℹ 9 more variables: entrez_gene_id <chr>, ncbi_gene_chr <chr>,
#> # ncbi_gene_start <int>, ncbi_gene_end <int>, ncbi_gene_strand <chr>,
#> # ensembl_gene_chr <chr>, ensembl_gene_start <int>, ensembl_gene_end <int>,
#> # ensembl_gene_strand <chr>
Remapping old symbols
symbols <- c(
"Xkr4",
"LOC102640625",
"sid2057",
"Sox17",
"ENSMUSG00000074760",
"R75157",
"6430567E01Rik",
"VINAS"
)
# Current MGI marker symbols as of 2024-05-27.
update_symbol(symbols)
#> [1] "Xkr4" NA "1110004F10Rik" "Sox17"
#> [5] "1700010L13Rik" "Zyx" "Zzz3" "1500026H17Rik"
# Map MGI marker symbols to marker identifiers.
symbol_to_marker_id(symbols)
#> [1] "MGI:3528744" NA "MGI:1929274" "MGI:107543" "MGI:1922878"
#> [6] "MGI:103072" "MGI:1920453" "MGI:1916252"
# Map MGI marker symbols to Ensembl identifiers.
symbol_to_ensembl_id(symbols)
#> [1] "ENSMUSG00000051951" NA "ENSMUSG00000030663"
#> [4] "ENSMUSG00000025902" NA "ENSMUSG00000029860"
#> [7] "ENSMUSG00000039068" "ENSMUSG00000097383"
# Map MGI marker symbols to Entrez identifiers.
symbol_to_entrez_id(symbols)
#> [1] "497097" NA "56372" "20671" NA "22793" "108946" "69002"
# Retired Ensembl identifiers: for example, 1700010L13Rik (MGI:1922878) would be
# expected to map to ENSMUSG00000074760.
# URL: https://www.informatics.jax.org/marker/MGI:1922878
symbol_to_marker_id("1700010L13Rik")
#> [1] "MGI:1922878"
# But the result is NA because ENSMUSG00000074760 has status Retired:
# URL: https://www.ensembl.org/Mus_musculus/Gene/Idhistory?g=ENSMUSG00000074760
symbol_to_ensembl_id("1700010L13Rik")
#> [1] NA
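Since unmapped inputs come back as NA, it is worth checking which symbols failed before using the results downstream. The sketch below does not call the package: it copies the `symbols` vector and the symbol_to_ensembl_id(symbols) output from above into plain vectors and separates mapped from unmapped inputs. The names `ensembl_ids`, `unmapped`, and `mapped` are chosen here for illustration.

```r
symbols <- c("Xkr4", "LOC102640625", "sid2057", "Sox17",
             "ENSMUSG00000074760", "R75157", "6430567E01Rik", "VINAS")

# Result of symbol_to_ensembl_id(symbols), copied from the output above.
ensembl_ids <- c("ENSMUSG00000051951", NA, "ENSMUSG00000030663",
                 "ENSMUSG00000025902", NA, "ENSMUSG00000029860",
                 "ENSMUSG00000039068", "ENSMUSG00000097383")

# Inputs for which no Ensembl identifier was returned.
is_missing <- is.na(ensembl_ids)
unmapped <- symbols[is_missing]
unmapped
#> [1] "LOC102640625"       "ENSMUSG00000074760"

# Successful mappings only, named by the input symbol.
mapped <- setNames(ensembl_ids[!is_missing], symbols[!is_missing])
```

An NA here can mean either that the symbol is unknown to the release or, as in the 1700010L13Rik case, that the corresponding Ensembl identifier has been retired.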
Disclaimer
The mgi.20240527 package is provided “as is” without any warranties, express or implied. While we strive to ensure the accuracy and reliability of the data, we do not guarantee its completeness or correctness. The package is based on the data release from May 27, 2024, and may not include subsequent updates or corrections. Users are encouraged to verify the information independently and use it at their own risk. The authors and maintainers of this package are not liable for any direct, indirect, incidental, or consequential damages arising from its use.
References
Baldarelli RM, Smith CL, Ringwald M, Richardson JE, Bult CJ, Mouse Genome Informatics Group. Mouse Genome Informatics: an integrated knowledgebase system for the laboratory mouse. Genetics. 2024 May 7;227(1):iyae031.
Baldarelli RM, Smith CM, Finger JH, Hayamizu TF, McCright IJ, Xu J, Shaw DR, Beal JS, Blodgett O, Campbell J, Corbani LE, Frost PJ, Giannatto SC, Miers DB, Kadin JA, Richardson JE, Ringwald M. The mouse Gene Expression Database (GXD): 2021 update. Nucleic Acids Res. 2021 Jan 8;49(D1):D924-D931.
Krupke DM, Begley DA, Sundberg JP, Richardson JE, Neuhauser SB, Bult CJ. The Mouse Tumor Biology Database: A Comprehensive Resource for Mouse Models of Human Cancer. Cancer Res. 2017 Nov 1;77(21):e67-e70.
Related software
- MGI main data portal: https://www.informatics.jax.org/
- MGI batch query: https://www.informatics.jax.org/batch
- MouseMine: https://www.mousemine.org/mousemine/begin.do
- Bioconductor packages: