Genomic features relevant to R-loops. Both mm10 and hg38 annotations are available.
annots_primary_hg38(quiet = FALSE)
annots_full_hg38(quiet = FALSE)
annots_primary_mm10(quiet = FALSE)
annots_full_mm10(quiet = FALSE)
quiet: If TRUE, messages are suppressed. Default: FALSE.
A list of tbl objects. See details.
A list of tbl objects (tidyverse-style data frames) containing annotations as genomic ranges. The primary annotations (e.g., annots_primary_hg38()) are an abbreviated version of the full annotations (e.g., annots_full_hg38()). See the description below for further details:
This section details the annotation databases which are available in RLHub. See the succeeding section ("Objects available based on accessor") for a list of which databases are available within each function. All processing was performed using this script as part of the RLBase-data processing protocol.
Description: Centromere locations within the genome.
Source: UCSC table centromeres.
Description: Copy-number alterations found in inherited disorder cell lines. See source for full description. CNV states (0-4)
are represented in the data as separate types. For example, Deep deletion (0) sites are accessed with
Source: UCSC table coriellDelDup.
Description: CpG island predicted locations throughout the genome.
Source: UCSC table cpgIslandExt.
Description: The UCSC Encode_CREs table contains putative promoter-like ("prom"), promoter-enhancer-like ("enhP"), distal-enhancer-like ("enhD"), H3K4me3 ("K4me3"), and CTCF ("CTCF") chromatin states across the genome.
Source: UCSC table encodeCcreCombined.
Description: The collection of curated transcription-factor binding profiles from ENCODE, made available through the UCSC Table Browser.
Source: UCSC table encRegTfbsClustered.
Description: G4-Quadruplex ChIP-Seq data
Source: GEO accession GSE63874.
Description: Re-processed and binned G4-Quadruplex Predictions. The type names for this database are the G4Q prediction classes and follow the pattern
tl: the length of guanine tracts in region;
nl: number of locations for G4 formation;
gn: the number of possible simultaneous G4 structures. For more information, see the source publication here.
Due to the large number of possible configurations of gn, they were binned based on frequency.
Description: RNA species provided by UCSC KnownGene, split up by the "transcriptType" column from the source table.
Source: UCSC table knownGene.
Description: Microsatellite DNA regions predicted based on motif.
Source: UCSC table microsat.
Description: List of predicted poly-A sites, split by the "name2" column of the source table.
Source: UCSC table wgEncodeGencodePolyaV38.
Description: Repeat masker table from UCSC containing genomic annotations for predicted repetitive elements, split by class of repetitive element ("repClass").
Source: UCSC table rmsk.
Description: Regions of G or C-skew profiled using the skewr program. See the RLBase-data README.md for steps.
Source: From UCSC goldenPath, hg38 and mm10 gene GTF.
CpG islands for mm10 and hg38 provided as described in the CpG_Islands entry above. Processing
Description: snoRNA, miRNA, and scaRNA species provided by UCSC table browser and split by the "type" column.
Source: UCSC table wgRna.
Description: UCSC table of alternative splice events predicted from transcriptome data sets. Split by "name" column.
Source: UCSC table knownAlt.
Description: UCSC table containing predicted tRNA genes.
Source: UCSC table tRNAs.
Here, we show which objects are available with each accessor function:
Accessor functions (e.g., annots_primary_hg38()) return a named list of tbl objects that specify feature ranges. Below, we detail the naming and structure of each.
The names in the list objects provided by each accessor function (e.g., annots_primary_hg38()) follow this structure: DataBase__Type.
DataBase is the database from which annotations were derived and Type indicates the specific annotations from the database which are included in the tbl. This is required as some databases produce more than one type of annotation (e.g., Transcript_Features contains "Exon" (Transcript_Features__Exon) and "Intron" (Transcript_Features__Intron)).
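As a minimal sketch of how this naming scheme can be used, the base R snippet below splits annotation names on the double underscore to recover the database and type parts. The names vector is a stand-in typed by hand rather than the output of a real accessor call, and "SomeDatabase__TypeA" is a purely hypothetical entry added for illustration.

```r
# Stand-in for names(annots_primary_hg38()); written by hand so the
# snippet runs without RLHub. "SomeDatabase__TypeA" is hypothetical.
annot_names <- c(
  "Transcript_Features__Exon",
  "Transcript_Features__Intron",
  "SomeDatabase__TypeA"
)

# Split each name on the literal "__" separator.
parts <- strsplit(annot_names, "__", fixed = TRUE)

# Collect the two parts into a lookup table.
name_tbl <- data.frame(
  name = annot_names,
  database = vapply(parts, `[`, character(1), 1),
  type = vapply(parts, `[`, character(1), 2),
  stringsAsFactors = FALSE
)

# Names of all annotations derived from one database.
tx_names <- name_tbl$name[name_tbl$database == "Transcript_Features"]
```

With a real accessor result, the same vector of names could then be used to subset the list, for example with annos[tx_names].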
Each tbl returned has the following structure:
"chrom" - the chromosome of the feature range (UCSC style)
"start" - the starting position of the feature range.
"end" - the end position of the feature range.
"strand" - the strand of the feature range.
"id" - A unique ID for the feature range.
annos <- annots_primary_hg38()
annos <- annots_full_hg38()
annos <- annots_primary_mm10()
annos <- annots_full_mm10()
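The sketch below illustrates one way to work with the kind of object these accessors are described as returning: a named list of tables with chromosome, start, end, strand and id columns. It uses a small mock list (annos_mock, built by the hypothetical helper make_tbl) with invented coordinates and IDs so that it runs without RLHub. Plain data frames stand in for tbl objects, and the chromosome column is assumed to be named "chrom".

```r
# Hypothetical helper that builds one mock feature table.
make_tbl <- function(chrom, start, end, strand, id) {
  data.frame(
    chrom = chrom, start = start, end = end,
    strand = strand, id = id,
    stringsAsFactors = FALSE
  )
}

# Mock stand-in for an accessor result; all values are invented.
annos_mock <- list(
  Transcript_Features__Exon = make_tbl(
    c("chr1", "chr2"), c(100L, 200L), c(250L, 450L),
    c("+", "-"), c("exon_1", "exon_2")
  ),
  Transcript_Features__Intron = make_tbl(
    "chr1", 251L, 499L, "+", "intron_1"
  )
)

# Number of feature ranges in each annotation.
n_ranges <- vapply(annos_mock, nrow, integer(1))

# Stack all tables into one, recording the source annotation name.
combined <- do.call(rbind, lapply(names(annos_mock), function(nm) {
  data.frame(annotation = nm, annos_mock[[nm]], stringsAsFactors = FALSE)
}))

# Keep only the ranges on one chromosome.
chr1_only <- combined[combined$chrom == "chr1", ]
```

Stacking into a single table keeps the annotation name alongside each range, which can be convenient for filtering or counting across annotations in one step.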