This function retrieves a few extra details about one or more toplevel sequences. A toplevel sequence is a genomic region in the genome assembly that is not a component of another sequence region. Toplevel sequences are therefore chromosomes and any unlocalised or unplaced scaffolds.
Usage
get_toplevel_sequence_info(
species_name = "homo_sapiens",
toplevel_sequence = c(1:22, "X", "Y", "MT"),
verbose = FALSE,
warnings = TRUE,
progress_bar = TRUE
)
Arguments
- species_name
The species name, i.e., the scientific name in lowercase with spaces replaced by underscores. Examples:
'homo_sapiens' (human), 'ovis_aries' (domestic sheep) or 'capra_hircus' (goat).
- toplevel_sequence
One or more toplevel sequence names, e.g., chromosome names such as
"1", "X", or "Y", or a non-chromosome sequence, e.g., a scaffold such as "KI270757.1".
- verbose
Whether to be chatty.
- warnings
Whether to print warnings.
- progress_bar
Whether to show a progress bar.
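The naming rule described for species_name (lowercase, spaces replaced by underscores) can be sketched in base R. The helper `to_ensembl_species_name()` below is hypothetical and not part of the package; it only illustrates the formatting rule and does not check that the resulting name exists in Ensembl.

```r
# Hypothetical helper (an assumption for illustration, not a package function):
# format a scientific name the way the species_name argument expects it.
to_ensembl_species_name <- function(scientific_name) {
  # Trim surrounding whitespace, lowercase everything, then replace each
  # run of whitespace with a single underscore.
  gsub("\\s+", "_", tolower(trimws(scientific_name)))
}

# Vectorised over its input: yields the three species names listed above.
to_ensembl_species_name(c("Homo sapiens", "Ovis aries", "Capra hircus"))
```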
Value
A tibble of 8 variables, each row being a toplevel sequence:
- species_name
Ensembl species name: this is the name used internally by Ensembl to uniquely identify a species. It is the scientific name in lowercase with spaces replaced by underscores, e.g., 'homo_sapiens'.
- toplevel_sequence
Name of the toplevel sequence.
- is_chromosome
A logical indicating whether the toplevel sequence is a chromosome (TRUE) or not (FALSE).
- coordinate_system
Coordinate system type.
- assembly_exception_type
Assembly exception type.
- is_circular
A logical indicating whether the toplevel sequence is a circular sequence (TRUE) or not (FALSE).
- assembly_name
Assembly name.
- length
Genomic length of the toplevel sequence in base pairs.
Examples
# Get details about human chromosomes (default)
get_toplevel_sequence_info()
#> # A tibble: 25 × 8
#> species_name toplevel_sequence is_chromosome coordinate_system
#> <chr> <chr> <lgl> <chr>
#> 1 homo_sapiens 1 TRUE chromosome
#> 2 homo_sapiens 2 TRUE chromosome
#> 3 homo_sapiens 3 TRUE chromosome
#> 4 homo_sapiens 4 TRUE chromosome
#> 5 homo_sapiens 5 TRUE chromosome
#> 6 homo_sapiens 6 TRUE chromosome
#> 7 homo_sapiens 7 TRUE chromosome
#> 8 homo_sapiens 8 TRUE chromosome
#> 9 homo_sapiens 9 TRUE chromosome
#> 10 homo_sapiens 10 TRUE chromosome
#> # ℹ 15 more rows
#> # ℹ 4 more variables: assembly_exception_type <chr>, is_circular <lgl>,
#> # assembly_name <chr>, length <int>
# Get details about a scaffold
# (To find available toplevel sequences to query use the function
# `get_toplevel_sequences()`)
get_toplevel_sequence_info(species_name = 'homo_sapiens', toplevel_sequence = 'KI270757.1')
#> # A tibble: 1 × 8
#> species_name toplevel_sequence is_chromosome coordinate_system
#> <chr> <chr> <lgl> <chr>
#> 1 homo_sapiens KI270757.1 FALSE scaffold
#> # ℹ 4 more variables: assembly_exception_type <chr>, is_circular <lgl>,
#> # assembly_name <chr>, length <int>
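The default toplevel_sequence argument shown under Usage mixes integers and strings inside `c()`. As a small base R sketch (the variable name `default_sequences` is only illustrative), R coerces such a vector to character, giving 25 sequence names, which matches the 25 rows returned by the first example.

```r
# Rebuild the default value of `toplevel_sequence` from the Usage section.
# Mixing integers and strings in c() coerces every element to character.
default_sequences <- c(1:22, "X", "Y", "MT")

class(default_sequences)   # "character"
length(default_sequences)  # 25
```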