#' @include AnnotationTrack-class.R NULL ## GeneRegionTrack Class ----------------------------------------------------- #' GeneRegionTrack class and methods #' #' #' A class to hold gene model data for a genomic region. #' #' #' A track containing all gene models in a particular region. The data are #' usually fetched dynamially from an online data store, but it is also #' possible to manully construct objects from local data. Connections to #' particular online data sources should be implemented as sub-classes, and #' \code{GeneRegionTrack} is just the commone denominator that is being used #' for plotting later on. There are several levels of data associated to a #' \code{GeneRegionTrack}: #' #' \describe{ #' #' \item{exon level:}{identifiers are stored in the exon column of the #' \code{\linkS4class{GRanges}} object in the \code{range} slot. Data may be #' extracted using the \code{exon} method.} #' #' \item{transcript level:}{identifiers are stored in the transcript column of #' the \code{\linkS4class{GRanges}} object. Data may be extracted using the #' \code{transcript} method.} #' #' \item{gene level:}{identifiers are stored in the gene column of the #' \code{\linkS4class{GRanges}} object, more human-readable versions in the #' symbol column. Data may be extracted using the \code{gene} or the #' \code{symbol} methods.} #' #' \item{transcript-type level:}{information is stored in the feature column of #' the \code{\linkS4class{GRanges}} object. If a display parameter of the same #' name is specified, the software will use its value for the coloring.} #' #' } #' #' \code{GeneRegionTrack} objects also know about coding regions and non-coding #' regions (e.g., UTRs) in a transcript, and will indicate those by using #' different shapes (wide boxes for all coding regions, thinner boxes for #' non-coding regions). This is archived by setting the \code{feature} values #' of the object for non-coding elements to one of the options that are #' provided in the \code{thinBoxFeature} display parameters. All other elements #' are considered to be coding elements. #' #' @name GeneRegionTrack-class #' #' @param range #' #' An optional meta argument to handle the different input types. If the #' \code{range} argument is missing, all the relevant information to create the #' object has to be provided as individual function arguments (see below). #' #' The different input options for \code{range} are: #' #' \describe{ #' #' \item{}{A \code{TxDb} object: all the necessary gene model information #' including exon locations, transcript groupings and associated gene ids are #' contained in \code{TxDb} objects, and the coercion between the two is almost #' completely automated. If desired, the data to be fetched from the #' \code{TxDb} object can be restricted using the constructor's #' \code{chromosome}, \code{start} and \code{end} arguments. See below for #' details. A direct coercion method \code{as(obj, "GeneRegionTrack")} is also #' available. A nice added benefit of this input option is that the UTR and #' coding region information that is part of the original \code{TxDb} object is #' retained in the \code{GeneRegionTrack}.} #' #' \item{}{A \code{GRanges} object: the genomic ranges for the #' \code{GeneRegion} track as well as the optional additional metadata columns #' \code{feature}, \code{transcript}, \code{gene}, \code{exon} and #' \code{symbol} (see description of the individual function parameters below #' for details). Calling the constructor on a \code{GRanges} object without #' further arguments, e.g. \code{GeneRegionTrack(range=obj)} is equivalent to #' calling the coerce method \code{as(obj, "GeneRegionTrack")}.} #' #' \item{}{A \code{GRangesList} object: this is very similar to the previous #' case, except that the grouping information that is part of the list #' structure is preserved in the \code{GeneRegionTrack}. I.e., all the elements #' within one list item receive the same group id. For consistancy, there is #' also a coercion method from \code{GRangesLists} \code{as(obj, #' "GeneRegionTrack")}. Please note that unless the necessary information about #' gene ids, symbols, etc. is present in the individual \code{GRanges} meta #' data slots, the object will not be particularly useful, because all the #' identifiers will be set to a common default value.} #' #' \item{}{An \code{\linkS4class{IRanges}} object: almost identical to the #' \code{GRanges} case, except that the chromosome and strand information as #' well as all additional data has to be provided in the separate #' \code{chromosome}, \code{strand}, \code{feature}, \code{transcript}, #' \code{symbol}, \code{exon} or \code{gene} arguments, because it can not be #' directly encoded in an \code{IRanges} object. Note that only the former two #' are mandatory (if not provided explicitely the more or less reasonable #' default values \code{chromosome=NA} and \code{strand=*} are used, but not #' providing information about the gene-to-transcript relationship or the #' human-readble symbols renders a lot of the class' functionality useles.} #' #' \item{}{A \code{data.frame} object: the \code{data.frame} needs to contain #' at least the two mandatory columns \code{start} and \code{end} with the #' range coordinates. It may also contain a \code{chromosome} and a #' \code{strand} column with the chromosome and strand information for each #' range. If missing, this information will be drawn from the constructor's #' \code{chromosome} or \code{strand} arguments. In addition, the #' \code{feature}, \code{exon}, \code{transcript}, \code{gene} and #' \code{symbol} data can be provided as columns in the \code{data.frame}. The #' above comments about potential default values also apply here.} #' #' \item{}{A \code{character} scalar: in this case the value of the #' \code{range} argument is considered to be a file path to an annotation file #' on disk. A range of file types are supported by the \code{Gviz} package as #' identified by the file extension. See the \code{importFunction} #' documentation below for further details.} #' #' } #' @template GeneRegionTrack-class_param #' @return #' #' The return value of the constructor function is a new object of class #' \code{GeneRegionTrack}. #' @section Objects from the class: #' #' Objects can be created using the constructor function #' \code{GeneRegionTrack}. #' #' @author Florian Hahne, Steve Lianoglou #' #' @inherit GdObject-class seealso #' #' @examples #' #' #' ## The empty object #' GeneRegionTrack() #' #' ## Load some sample data #' data(cyp2b10) #' #' ## Construct the object #' grTrack <- GeneRegionTrack( #' start = 26682683, end = 26711643, #' rstart = cyp2b10$start, rends = cyp2b10$end, chromosome = 7, genome = "mm9", #' transcript = cyp2b10$transcript, gene = cyp2b10$gene, symbol = cyp2b10$symbol, #' feature = cyp2b10$feature, exon = cyp2b10$exon, #' name = "Cyp2b10", strand = cyp2b10$strand #' ) #' #' ## Directly from the data.frame #' grTrack <- GeneRegionTrack(cyp2b10) #' #' ## From a TxDb object #' if (require(GenomicFeatures)) { #' samplefile <- system.file("extdata", #' "hg19_knownGene_sample.sqlite", #' package = "GenomicFeatures") #' txdb <- loadDb(samplefile) #' GeneRegionTrack(txdb) #' GeneRegionTrack(txdb, chromosome = "chr6", start = 35000000, end = 40000000) #' } #' \dontshow{ #' ## For some annoying reason the postscript device does not know about #' ## the sans font #' if (!interactive()) { #' font <- ps.options()$family #' displayPars(grTrack) <- list(fontfamily = font, fontfamily.title = font) #' } #' } #' #' ## Plotting #' plotTracks(grTrack) #' #' ## Track names #' names(grTrack) #' names(grTrack) <- "foo" #' plotTracks(grTrack) #' #' ## Subsetting and splitting #' subTrack <- subset(grTrack, from = 26700000, to = 26705000) #' length(subTrack) #' subTrack <- grTrack[transcript(grTrack) == "ENSMUST00000144140"] #' split(grTrack, transcript(grTrack)) #' #' ## Accessors #' start(grTrack) #' end(grTrack) #' width(grTrack) #' position(grTrack) #' width(subTrack) <- width(subTrack) + 100 #' #' strand(grTrack) #' strand(subTrack) <- "-" #' #' chromosome(grTrack) #' chromosome(subTrack) <- "chrX" #' #' genome(grTrack) #' genome(subTrack) <- "hg19" #' #' range(grTrack) #' ranges(grTrack) #' #' ## Annotation #' identifier(grTrack) #' identifier(grTrack, "lowest") #' identifier(subTrack) <- "bar" #' #' feature(grTrack) #' feature(subTrack) <- "foo" #' #' exon(grTrack) #' exon(subTrack) <- letters[1:2] #' #' gene(grTrack) #' gene(subTrack) <- "bar" #' #' symbol(grTrack) #' symbol(subTrack) <- "foo" #' #' transcript(grTrack) #' transcript(subTrack) <- c("foo", "bar") #' chromosome(subTrack) <- "chr7" #' plotTracks(subTrack) #' #' values(grTrack) #' #' ## Grouping #' group(grTrack) #' group(subTrack) <- "Group 1" #' transcript(subTrack) #' plotTracks(subTrack) #' #' ## Collapsing transcripts #' plotTracks(grTrack, #' collapseTranscripts = TRUE, showId = TRUE, #' extend.left = 10000, shape = "arrow" #' ) #' #' ## Stacking #' stacking(grTrack) #' stacking(grTrack) <- "dense" #' plotTracks(grTrack) #' #' ## coercion #' as(grTrack, "data.frame") #' as(grTrack, "UCSCData") #' #' ## HTML image map #' coords(grTrack) #' tags(grTrack) #' grTrack <- plotTracks(grTrack)$foo #' coords(grTrack) #' tags(grTrack) #' @exportClass GeneRegionTrack setClass("GeneRegionTrack", contains = "AnnotationTrack", representation = representation(start = "NumericOrNULL", end = "NumericOrNULL"), prototype = prototype( columns = c("feature", "transcript", "symbol", "gene", "exon"), stacking = "squish", stacks = 0, start = 0, end = 0, name = "GeneRegionTrack", dp = DisplayPars( arrowHeadWidth = 10, arrowHeadMaxWidth = 20, col = NULL, collapseTranscripts = FALSE, exonAnnotation = NULL, fill = "orange", min.distance = 0, shape = c("smallArrow", "box"), showExonId = NULL, thinBoxFeature = .THIN_BOX_FEATURES, transcriptAnnotation = NULL ) ) ) ## Initialize ---------------------------------------------------------------- #' @describeIn GeneRegionTrack-class Initialize. #' @export setMethod("initialize", "GeneRegionTrack", function(.Object, start, end, ...) { if (is.null(list(...)$range) && is.null(list(...)$genome) && is.null(list(...)$chromosome)) { return(.Object) } ## the diplay parameter defaults .makeParMapping() .Object <- .updatePars(.Object, "GeneRegionTrack") .Object@start <- ifelse(is.null(start), 0, start) .Object@end <- ifelse(is.null(end), 0, end) .Object <- callNextMethod(.Object, ...) return(.Object) }) ## ReferenceGeneRegionTrack Class -------------------------------------------- ## The file-based version of the GeneRegionTrack class. This will mainly provide a means to dispatch to ## a special 'subset' method which should stream the necessary data from disk. #' @describeIn GeneRegionTrack-class The file-based version of the `GeneRegionTrack-class`. #' @exportClass ReferenceGeneRegionTrack setClass("ReferenceGeneRegionTrack", contains = c("GeneRegionTrack", "ReferenceTrack")) ## Initialize ---------------------------------------------------------------- ## This just needs to set the appropriate slots that are being inherited from ReferenceTrack because the ## multiple inheritence has some strange features with regards to method selection #' @describeIn GeneRegionTrack-class Initialize. #' @export setMethod("initialize", "ReferenceGeneRegionTrack", function(.Object, stream, reference, mapping = list(), args = list(), defaults = list(), ...) { .Object <- selectMethod("initialize", "ReferenceTrack")(.Object = .Object, reference = reference, stream = stream, mapping = mapping, args = args, defaults = defaults) .Object <- callNextMethod(.Object, ...) return(.Object) }) ## Constructor --------------------------------------------------------------- ## Constructor. The following arguments are supported: ## o range: one in a whole number of different potential inputs upon which the .buildRanges method will dispatch ## o start, end: numeric vectors of the track start and end coordinates. ## o genome, chromosome: the reference genome and active chromosome for the track. ## o rstarts, rends, rwidths: integer vectors of exon start and end locations or widths, or a character vector of ## comma-delimited exon locations, one vector element for each transcript ## o strand, feature, exon, transcript, gene, symbol, chromosome: vectors of equal length containing ## the exon strand, biotype, exon id, transcript id, gene id, human-readable gene symbol and chromosome information ## o stacking: character controlling the stacking of overlapping items. One in 'hide', ## 'dense', 'squish', 'pack' or 'full'. ## o name: the name of the track. This will be used for the title panel. ## o exonList: boolean, causing the values in starts, rends or rwidths to be interpreted as delim-separated ## lists that have to be exploded. All other annotation arguments will be repeated accordingly. ## o delim: the delimiter if coordinates are in a list ## All additional items in ... are being treated as DisplayParameters #' @describeIn GeneRegionTrack-class Constructor function for `GeneRegionTrack-class`. #' @export GeneRegionTrack <- function(range = NULL, rstarts = NULL, rends = NULL, rwidths = NULL, strand, feature, exon, transcript, gene, symbol, chromosome, genome, stacking = "squish", name = "GeneRegionTrack", start = NULL, end = NULL, importFunction, stream = FALSE, ...) { ## Some defaults covars <- if (is.data.frame(range)) range else if (is(range, "GRanges")) as.data.frame(mcols(range)) else data.frame() isStream <- FALSE if (!is.character(range)) { n <- if (is.null(range)) max(c(length(start), length(end), length(width))) else if (is(range, "data.frame")) nrow(range) else length(range) if (is.null(covars[["feature"]]) && missing(feature)) { feature <- paste("exon", seq_len(n), sep = "_") } if (is.null(covars[["exon"]]) && missing(exon)) { exon <- make.unique(rep(if (!missing(feature) && !is.null(feature)) as.character(feature) else covars[["feature"]], n)[seq_len(n)]) } if (is.null(covars[["transcript"]]) && missing(transcript)) { transcript <- paste("transcript", seq_len(n), sep = "_") } if (is.null(covars[["gene"]]) && missing(gene)) { gene <- paste("gene", seq_len(n), sep = "_") } } ## Build a GRanges object from the inputs .missingToNull(c("feature", "exon", "transcript", "gene", "symbol", "strand", "chromosome", "importFunction", "genome")) args <- list( feature = feature, id = exon, exon = exon, transcript = transcript, gene = gene, symbol = symbol, strand = strand, chromosome = chromosome, genome = genome ) defs <- list( feature = "unknown", id = "unknown", exon = "unknown", transcript = "unknown", genome = NA, gene = "unknown", symbol = "unknown", strand = "*", density = 1, chromosome = "chrNA" ) range <- .buildRange( range = range, groupId = "transcript", start = rstarts, end = rends, width = rwidths, args = args, defaults = defs, chromosome = chromosome, tstart = start, tend = end, trackType = "GeneRegionTrack", importFun = importFunction, genome = genome ) if (is.list(range)) { isStream <- TRUE slist <- range range <- GRanges() } if (is.null(start)) { start <- if (!length(range)) NULL else min(start(range)) } if (is.null(end)) { end <- if (!length(range)) NULL else max(end(range)) } if (missing(chromosome) || is.null(chromosome)) { chromosome <- if (length(range) > 0) .chrName(as.character(seqnames(range)[1])) else "chrNA" } genome <- .getGenomeFromGRange(range, ifelse(is.null(genome), character(), genome[1])) if (!isStream) { return(new("GeneRegionTrack", start = start, end = end, chromosome = chromosome[1], range = range, name = name, genome = genome, stacking = stacking, ... )) } else { return(new("ReferenceGeneRegionTrack", start = start, end = end, chromosome = chromosome[1], range = range, name = name, genome = genome, stacking = stacking, stream = slist[["stream"]], reference = slist[["reference"]], mapping = slist[["mapping"]], args = args, defaults = defs, ... )) } } ## .buildRange --------------------------------------------------------------- ## ## Helper methods to build a GRanges object from the input arguments. ## ## Coordinates for grouped elements may be passed in as comma-separated values ## (e.g. "1,5,9"), in which case we need to split and convert to numeric. ## This also implies that the additional arguments (like feature, group, etc.) ## have to be replicated accordingly. We handle this by passing along the repeat ## vector 'by' to the numeric method below. ## For TxDb objects we extract the grouping information and use the GRanges method #' @importClassesFrom GenomicFeatures TxDb #' @importMethodsFrom GenomicFeatures isActiveSeq "isActiveSeq<-" cdsBy exonsBy fiveUTRsByTranscript threeUTRsByTranscript transcriptsBy transcripts setMethod( ".buildRange", signature("TxDb"), function(range, groupId = "transcript", tstart, tend, chromosome, args, ...) { ## If chromosome (and optional start and end) information is present we only extract parts of the annotation data noSubset <- is.null(tstart) && is.null(tend) if (!is.null(chromosome)) { chromosome <- .chrName(chromosome) ## Seems like TxDb objects use pass by reference for the active chromosomes, so we have to ## restore the old values after we are done oldAct <- seqlevels(range) oldRange <- range on.exit({ restoreSeqlevels(oldRange) seqlevels(oldRange, pruning.mode = "coarse") <- oldAct }) restoreSeqlevels(range) seqlevels(range, pruning.mode = "coarse") <- chromosome sl <- seqlengths(range) if (is.null(tstart)) { tstart <- rep(1, length(chromosome)) } if (is.null(tend)) { tend <- sl[chromosome] + 1 tend[is.na(tend)] <- tstart[is.na(tend)] + 1 } sRange <- GRanges(seqnames = chromosome, ranges = IRanges(start = tstart, end = tend)) } ## First the mapping of internal transcript ID to transcript name txs <- as.data.frame(values(transcripts(range, columns = c("tx_id", "tx_name")))) rownames(txs) <- txs[, "tx_id"] ## Now the CDS ranges t2c <- cdsBy(range, "tx") names(t2c) <- txs[names(t2c), 2] tids <- rep(names(t2c), elementNROWS(t2c)) t2c <- unlist(t2c) if (length(t2c)) { t2c$tx_id <- tids t2c$feature_type <- "CDS" } ## And the 5'UTRS t2f <- fiveUTRsByTranscript(range) names(t2f) <- txs[names(t2f), 2] tids <- rep(names(t2f), elementNROWS(t2f)) t2f <- unlist(t2f) if (length(t2f)) { t2f$tx_id <- tids t2f$feature_type <- "utr5" } ## And the 3'UTRS t2t <- threeUTRsByTranscript(range) names(t2t) <- txs[names(t2t), 2] tids <- rep(names(t2t), elementNROWS(t2t)) t2t <- unlist(t2t) if (length(t2t)) { t2t$tx_id <- tids t2t$feature_type <- "utr3" } ## And finally all the non-coding transcripts nt2e <- exonsBy(range, "tx") names(nt2e) <- txs[names(nt2e), 2] nt2e <- nt2e[!names(nt2e) %in% c(values(t2c)$tx_id, values(t2f)$tx_id, values(t2t)$tx_id)] tids <- rep(names(nt2e), elementNROWS(nt2e)) nt2e <- unlist(nt2e) if (length(nt2e)) { nt2e$tx_id <- tids nt2e$feature_type <- "ncRNA" } ## Now we can merge the three back together (we need to change the column names of t2c to make them all the same) colnames(values(t2c))[c(1, 2)] <- c("exon_id", "exon_name") ## t2e <- c(t2c, t2f, t2t, nt2e) ## This is super-slow, much more efficient if we build the GRanges object from the individual bits and pieces vals <- DataFrame( exon_id = c(values(t2c)$exon_id, values(t2f)$exon_id, values(t2t)$exon_id, values(nt2e)$exon_id), exon_name = c(values(t2c)$exon_name, values(t2f)$exon_name, values(t2t)$exon_name, values(nt2e)$exon_name), exon_rank = c(values(t2c)$exon_rank, values(t2f)$exon_rank, values(t2t)$exon_rank, values(nt2e)$exon_rank), tx_id = c(values(t2c)$tx_id, values(t2f)$tx_id, values(t2t)$tx_id, values(nt2e)$tx_id), feature_type = c(values(t2c)$feature_type, values(t2f)$feature_type, values(t2t)$feature_type, values(nt2e)$feature_type) ) t2e <- GRanges( seqnames = c(seqnames(t2c), seqnames(t2f), seqnames(t2t), seqnames(nt2e)), ranges = IRanges( start = c(start(t2c), start(t2f), start(t2t), start(nt2e)), end = c(end(t2c), end(t2f), end(t2t), end(nt2e)) ), strand = c(strand(t2c), strand(t2f), strand(t2t), strand(nt2e)) ) if (all(is.na(vals$exon_name))) { vals$exon_name <- paste(vals$tx_id, vals$exon_rank, sep = "_") } values(t2e) <- vals if (length(t2e) == 0) { return(GRanges()) } ## Add the gene level annotation g2t <- transcriptsBy(range, "gene") gids <- rep(names(g2t), elementNROWS(g2t)) g2t <- unlist(g2t) values(g2t)[["gene_id"]] <- gids values(t2e)$gene_id <- gids[match(values(t2e)$tx_id, as.character(txs[as.character(values(g2t)$tx_id), 2]))] vals <- values(t2e)[c("tx_id", "exon_name", "exon_rank", "feature_type", "tx_id", "gene_id")] colnames(vals) <- c("transcript", "exon", "rank", "feature", "symbol", "gene") ## Add the genome information genome(t2e) <- unique(genome(range)) ## Finally we re-assign, subset if necessary, and sort range <- t2e values(range) <- vals if (!noSubset && !is.null(chromosome)) { ## We have to keep all exons for all the overlapping transcripts txSel <- unique(subsetByOverlaps(g2t, sRange)$tx_name) range <- range[range$transcript %in% txSel] } args <- list(genome = genome(range)[1]) return(.buildRange(range = sort(range), chromosome = chromosome, args = args, ...)) } ) ## For ensDb objects we extract the grouping information and use the GRanges method #' @importClassesFrom ensembldb EnsDb #' @importMethodsFrom ensembldb cdsBy exonsBy fiveUTRsByTranscript threeUTRsByTranscript transcriptsBy transcripts setMethod( ".buildRange", signature("EnsDb"), function(range, groupId = "transcript", tstart, tend, chromosome, args, ...) { ## If chromosome (and optional start and end) information is present we only extract parts of the annotation data noSubset <- is.null(tstart) && is.null(tend) if (!is.null(chromosome)) { chromosome <- .chrName(chromosome) sl <- seqlengths(range) if (is.null(tstart)) { tstart <- rep(1, length(chromosome)) } if (is.null(tend)) { tend <- sl[chromosome] + 1 tend[is.na(tend)] <- tstart[is.na(tend)] + 1 } sRange <- GRanges(seqnames = chromosome, ranges = IRanges(start = tstart, end = tend)) ## sRange <- GRangesFilter(sRange, type = "any") can we filter directly? } ## First the mapping of internal transcript ID to transcript name txs <- as.data.frame(values(transcripts(range, columns = c("tx_id", "tx_name")))) rownames(txs) <- txs[, "tx_id"] ## Now the CDS ranges t2c <- cdsBy(range, "tx") names(t2c) <- txs[names(t2c), 2] tids <- rep(names(t2c), elementNROWS(t2c)) t2c <- unlist(t2c) if (length(t2c)) { t2c$tx_id <- tids t2c$feature_type <- "CDS" } ## And the 5'UTRS t2f <- fiveUTRsByTranscript(range) names(t2f) <- txs[names(t2f), 2] tids <- rep(names(t2f), elementNROWS(t2f)) t2f <- unlist(t2f) if (length(t2f)) { t2f$tx_id <- tids t2f$feature_type <- "utr5" } ## And the 3'UTRS t2t <- threeUTRsByTranscript(range) names(t2t) <- txs[names(t2t), 2] tids <- rep(names(t2t), elementNROWS(t2t)) t2t <- unlist(t2t) if (length(t2t)) { t2t$tx_id <- tids t2t$feature_type <- "utr3" } ## And finally all the non-coding transcripts nt2e <- exonsBy(range, "tx") names(nt2e) <- txs[names(nt2e), 2] nt2e <- nt2e[!names(nt2e) %in% c(values(t2c)$tx_id, values(t2f)$tx_id, values(t2t)$tx_id)] tids <- rep(names(nt2e), elementNROWS(nt2e)) nt2e <- unlist(nt2e) if (length(nt2e)) { nt2e$tx_id <- tids nt2e$feature_type <- "ncRNA" } ## Now we can merge the three back together (we need to change the column names of t2c to make them all the same) ## t2e <- c(t2c, t2f, t2t, nt2e) ## This is super-slow, much more efficient if we build the GRanges object from the individual bits and pieces vals <- DataFrame( exon_id = c(values(t2c)$exon_id, values(t2f)$exon_id, values(t2t)$exon_id, values(nt2e)$exon_id), exon_name = c(values(t2c)$exon_id, values(t2f)$exon_id, values(t2t)$exon_id, values(nt2e)$exon_id), exon_rank = c(values(t2c)$exon_rank, values(t2f)$exon_rank, values(t2t)$exon_rank, values(nt2e)$exon_rank), tx_id = c(values(t2c)$tx_id, values(t2f)$tx_id, values(t2t)$tx_id, values(nt2e)$tx_id), feature_type = c(values(t2c)$feature_type, values(t2f)$feature_type, values(t2t)$feature_type, values(nt2e)$feature_type) ) t2e <- GRanges( seqnames = c(seqnames(t2c), seqnames(t2f), seqnames(t2t), seqnames(nt2e)), ranges = IRanges( start = c(start(t2c), start(t2f), start(t2t), start(nt2e)), end = c(end(t2c), end(t2f), end(t2t), end(nt2e)) ), strand = c(strand(t2c), strand(t2f), strand(t2t), strand(nt2e)) ) if (all(is.na(vals$exon_name))) { vals$exon_name <- paste(vals$tx_id, vals$exon_rank, sep = "_") } values(t2e) <- vals if (length(t2e) == 0) { return(GRanges()) } ## Add the gene level annotation g2t <- transcriptsBy(range, "gene") gids <- rep(names(g2t), elementNROWS(g2t)) g2t <- unlist(g2t) values(g2t)[["gene_id"]] <- gids values(t2e)$gene_id <- gids[match(values(t2e)$tx_id, as.character(txs[as.character(values(g2t)$tx_id), 2]))] vals <- values(t2e)[c("tx_id", "exon_name", "exon_rank", "feature_type", "tx_id", "gene_id")] colnames(vals) <- c("transcript", "exon", "rank", "feature", "symbol", "gene") ## Add the genome information genome(t2e) <- unique(genome(range)) ## Finally we re-assign, subset if necessary, and sort range <- t2e values(range) <- vals if (!noSubset && !is.null(chromosome)) { ## We have to keep all exons for all the overlapping transcripts txSel <- unique(subsetByOverlaps(g2t, sRange)$tx_name) range <- range[range$transcript %in% txSel] } args <- list(genome = genome(range)[1]) return(.buildRange(range = sort(range), chromosome = chromosome, args = args, ...)) } ) ## General accessors --------------------------------------------------------- ## Annotation Accessors ------------------------------------------------------ #' @describeIn GeneRegionTrack-class Extract the gene identifiers for all #' gene models. #' @export setMethod("gene", signature(GdObject = "GeneRegionTrack"), function(GdObject) .getAnn(GdObject, "gene")) #' @describeIn GeneRegionTrack-class Replace the gene identifiers for all #' gene models. #' The replacement value must be a character of appropriate length or another #' vector that can be coerced into such. #' @export setReplaceMethod("gene", signature("GeneRegionTrack", "character"), function(GdObject, value) .setAnn(GdObject, value, "gene")) #' @describeIn GeneRegionTrack-class Extract the human-readble gene symbol #' for all gene models. #' @export setMethod("symbol", signature(GdObject = "GeneRegionTrack"), function(GdObject) .getAnn(GdObject, "symbol")) #' @describeIn GeneRegionTrack-class Replace the human-readable gene symbol #' for all gene models. #' The replacement value must be a character of appropriate length or another #' vector that can be coerced into such. #' @export setReplaceMethod("symbol", signature("GeneRegionTrack", "character"), function(GdObject, value) .setAnn(GdObject, value, "symbol")) #' @describeIn GeneRegionTrack-class Extract the transcript identifiers for all #' transcripts in the gene models. #' @export setMethod("transcript", signature(GdObject = "GeneRegionTrack"), function(GdObject) .getAnn(GdObject, "transcript")) #' @describeIn GeneRegionTrack-class Replace the transcript identifiers for all #' transcripts in the gene model. The replacement value must be a character of #' appropriate length or another vector that can be coerced into such. #' @export setReplaceMethod( "transcript", signature("GeneRegionTrack", "character"), function(GdObject, value) .setAnn(GdObject, value, "transcript") ) #' @describeIn GeneRegionTrack-class Extract the exon identifiers for all exons #' in the gene models. #' @export setMethod("exon", signature(GdObject = "GeneRegionTrack"), function(GdObject) .getAnn(GdObject, "exon")) #' @describeIn GeneRegionTrack-class replace the exon identifiers for all exons #' in the gene model. The replacement value must be a character of appropriate #' length or another vector that can be coerced into such. #' @export setReplaceMethod("exon", signature("GeneRegionTrack", "character"), function(GdObject, value) .setAnn(GdObject, value, "exon")) #' @describeIn GeneRegionTrack-class extract the group membership for all track items. #' @export setMethod("group", "GeneRegionTrack", function(GdObject) transcript(GdObject)) #' @describeIn GeneRegionTrack-class replace the grouping information for track items. #' The replacement value must be a factor of appropriate length or another #' vector that can be coerced into such. #' @export setReplaceMethod( "group", signature("GeneRegionTrack", "character"), function(GdObject, value) .setAnn(GdObject, value, "transcript") ) #' @describeIn GeneRegionTrack-class return track item identifiers. #' Depending on the setting of the optional argument lowest, these are either #' the group identifiers or the individual item identifiers. #' export setMethod("identifier", "GeneRegionTrack", function(GdObject, type = .dpOrDefault(GdObject, "transcriptAnnotation", "symbol")) { if (is.logical(type)) { type <- ifelse("symbol", "gene", type[1]) } if (is.null(type)) { type <- "symbol" } id <- switch(as.character(type), "symbol" = symbol(GdObject), "gene" = gene(GdObject), "transcript" = transcript(GdObject), "feature" = feature(GdObject), "exon" = exon(GdObject), "lowest" = exon(GdObject), symbol(GdObject) ) id[is.na(id)] <- "NA" return(id) }) #' @describeIn GeneRegionTrack-class Set the track item identifiers. #' The replacement value has to be a character vector of appropriate length. #' This always replaces the group-level identifiers, so essentially it is #' similar to `groups<-`. #' @export setReplaceMethod("identifier", c("GeneRegionTrack", "character"), function(GdObject, value) { type <- .dpOrDefault(GdObject, "transcriptAnnotation", "symbol") switch(as.character(type), "symbol" = symbol(GdObject) <- value, "gene" = gene(GdObject) <- value, "transcript" = transcript(GdObject) <- value, "feature" = feature(GdObject) <- value, "exon" = exon(GdObject) <- value, "lowest" = exon(GdObject) <- value, symbol(GdObject) <- value ) return(GdObject) }) ## Stacking ------------------------------------------------------------------ ## Consolidate --------------------------------------------------------------- ## Collapse ----------------------------------------------------------------- ## Subset -------------------------------------------------------------------- ## FIXME: Still needs to be implemented #' @describeIn GeneRegionTrack-class Subset a GeneRegionTrack by coordinates #' and sort if necessary. #' @export setMethod("subset", signature(x = "ReferenceGeneRegionTrack"), function(x, ...) { warning("ReferenceGeneRegionTrack objects are not supported yet.") return(callNextMethod()) }) ## Position ------------------------------------------------------------------ ## DrawGrid ------------------------------------------------------------------ ## DrawAxis ------------------------------------------------------------------ ## DrawGD -------------------------------------------------------------------- #' @describeIn GeneRegionTrack-class plot the object to a graphics device. #' The return value of this method is the input object, potentially updated #' during the plotting operation. Internally, there are two modes in which the #' method can be called. Either in 'prepare' mode, in which case no plotting is #' done but the object is preprocessed based on the available space, or in #' 'plotting' mode, in which case the actual graphical output is created. #' Since subsetting of the object can be potentially costly, this can be #' switched off in case subsetting has already been performed before or #' is not necessary. #' #' @export setMethod("drawGD", signature("GeneRegionTrack"), function(GdObject, ...) { displayPars(GdObject) <- list(showFeatureId = as.vector(.dpOrDefault(GdObject, "showExonId"))) GdObject <- callNextMethod() return(invisible(GdObject)) }) ## SetAs --------------------------------------------------------------------- #' @importFrom rtracklayer GenomicData setAs( "GeneRegionTrack", "UCSCData", function(from, to) { ranges <- cbind(as(as(ranges(from), "DataFrame"), "data.frame"), start = start(from), end = end(from), color = .getBiotypeColor(from), strand = strand(from) ) ranges <- ranges[order(start(from)), ] ranges <- split(ranges, ranges[, "X.transcript"]) start <- vapply(ranges, function(x) min(x$start), FUN.VALUE = numeric(1L)) end <- vapply(ranges, function(x) max(x$end), FUN.VALUE = numeric(1L)) name <- vapply(ranges, function(x) as.character(unique(x$X.symbol)), FUN.VALUE = character(1L)) color <- vapply(ranges, function(x) as.character(unique(x$color)), FUN.VALUE = character(1L)) strand <- vapply(ranges, function(x) as.character(unique(x$strand)), FUN.VALUE = character(1L)) strand[strand == "*"] <- "+" id <- names(ranges) blocks <- vapply(ranges, nrow, FUN.VALUE = numeric(1L)) bsizes <- vapply(ranges, function(x) paste(x$end - x$start + 1, collapse = ","), FUN.VALUE = character(1L)) bstarts <- vapply(ranges, function(x) paste(x$start - min(x$start), collapse = ","), FUN.VALUE = character(1L)) dcolor <- as.integer(col2rgb(.dpOrDefault(from, "col"))) line <- new("BasicTrackLine", name = names(from), description = names(from), visibility = stacking(from), color = dcolor, itemRgb = TRUE ) new("UCSCData", rtracklayer::GenomicData(IRanges(start, end), chrom = chromosome(from), id = id, name = name, itemRgb = color, blockCount = blocks, blockSizes = bsizes, blockStarts = bstarts, strand = strand ), trackLine = line ) } ) setAs("GRanges", "GeneRegionTrack", function(from, to) GeneRegionTrack(range = from)) setAs("GRangesList", "GeneRegionTrack", function(from, to) GeneRegionTrack(range = from)) setAs("TxDb", "GeneRegionTrack", function(from, to) GeneRegionTrack(range = from)) ## Show ---------------------------------------------------------------------- #' @describeIn GeneRegionTrack-class Show method. #' @export setMethod("show", signature(object = "GeneRegionTrack"), function(object) { cat(sprintf("GeneRegionTrack '%s'\n%s\n", names(object), .annotationTrackInfo(object))) }) #' @describeIn GeneRegionTrack-class Show method. #' @export setMethod("show", signature(object = "ReferenceGeneRegionTrack"), function(object) { .referenceTrackInfo(object, "ReferenceGeneRegionTrack") })