Parse UCSC tracks that use the Gokey format
character vector containing lines read from a track file, or valid path or connection to a track file.
character vector containing valid regular expression patterns used to recognize when a track should be considered an overlay coverage track. For example, track name="trackA F" and track name="trackA R" would be recognized as the forward and reverse strands for a track named "trackA". Overlay tracks are handled using the UCSC "multiWig" approach, and not the composite track approach. To disable overlay_grep, use overlay_grep="^$". To enable overlay_grep for all tracks, use overlay_grep="$".
integer value indicating the starting priority used when assigning a priority to each track.
character string indicating the output format, where "text" will return one long character string, and "list" will return a list with one track per list element with class "glue","character".
character indicating the type of debug output:
df: returns the intermediate track_df data.frame;
pri: prints priority during track parsing;
none: does no debug, the default.
logical indicating whether multiWig parent tracks should be named by concatenating header1 and header2 values.
logical indicating whether to print verbose output during processing.
additional arguments are treated as a named list of track parameters that override existing parameter values. For example, scoreFilter=1 will override the default for bigBed tracks, scoreFilter=5.
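As a rough illustration of the override behavior described for the additional arguments, the sketch below uses base R modifyList(). The names bigbed_defaults, user_overrides, and track_params are hypothetical, as is the visibility entry; only scoreFilter and its values 5 and 1 come from the description above. This is not necessarily how parse_ucsc_gokey() applies overrides internally.

```r
# Hypothetical default parameters for a bigBed track; only scoreFilter=5
# is taken from the documentation, the visibility entry is illustrative.
bigbed_defaults <- list(scoreFilter=5, visibility="pack")

# Values as they might be captured from `...` in a function call.
user_overrides <- list(scoreFilter=1)

# modifyList() replaces entries with matching names and keeps the rest.
track_params <- modifyList(bigbed_defaults, user_overrides)
str(track_params)
```

Parameters that are not named in the override list keep their default values.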
by default a character string suitable to cat() directly into a text file, when output_format="text". When output_format="list" it returns a list of glue objects, which can be concatenated into one character string with Reduce("+", trackline_list).
Given a text file, or lines from a text file, representing the Gokey format, this function will parse the track lines into groups, and return a text string usable in a UCSC genome browser track hub.
In general, the intention is to convert a set of UCSC track lines to a track hub format, where common track options are converted to relevant track hub configuration lines.
Tracks are generally divided into two types of groupings:
Tracks whose names match the overlay_grep regular expression pattern are configured as multiWig overlay tracks. This configuration uses the UCSC multiWig format as described here https://genome.ucsc.edu/goldenPath/help/trackDb/trackDbHub.html#aggregate
More specifically, a parent track is configured as a superTrack.
Each track matching overlay_grep is converted to a shared track name after removing the relevant grep pattern. Each unique track group is used as an intermediate track with "container multiWig".
Each track group is assigned priority in order of each unique track group defined in the track config lines.
Individual tracks are configured as child tracks to the track groups.
All other tracks are grouped as composite tracks, specifically using composite track view, as described here https://genome.ucsc.edu/goldenPath/help/trackDb/trackDbHub.html#compositeTrack
More specifically, the parent track is configured as a compositeTrack, including views as "view Views COV=Coverage JUNC=Junctions PEAK=Peaks" by default.
An intermediate track is created to represent each view, by default "JUNC"; however, this value is not visible to users unless there are multiple different view values.
Each track is configured as a child to the relevant view track. Track priority is assigned in the order it appears in the track config lines. The priority allows peak tracks to be ordered directly after or before the associated coverage track.
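The split between overlay and composite tracks can be sketched in base R as follows. This is only an illustration of the idea, not the function's implementation: the pattern in overlay_pattern is a hypothetical value (the default for overlay_grep is not shown on this page), and the track names are modeled on the examples further down.

```r
# Track names modeled on the examples in this page.
track_names <- c("trackname1_pos", "trackname1_neg",
   "trackname2_pos", "trackname2_neg", "trackname5")

# Hypothetical overlay pattern matching a strand suffix; the actual
# default value of overlay_grep is an assumption here.
overlay_pattern <- "[ _](pos|neg|F|R)$"

# Tracks matching the pattern would be handled as multiWig overlays.
is_overlay <- grepl(overlay_pattern, track_names)

# Removing the matched pattern yields the shared track group name.
group_names <- sub(overlay_pattern, "", track_names[is_overlay])

# Each unique group collects its child tracks, in order of appearance.
overlay_groups <- split(track_names[is_overlay],
   factor(group_names, levels=unique(group_names)))
print(overlay_groups)

# Remaining tracks would fall through to the composite track grouping.
composite_tracks <- track_names[!is_overlay]
print(composite_tracks)
```

Using a factor with levels in order of first appearance keeps the groups in the same order as the input lines, which mirrors how priority is described as following the order of the track config lines.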
Note that in both scenarios above, there is one top-level parent track that contains a subset of tracks. The top-level grouping can be defined in the track lines by supplying two header lines immediately before each top-level grouping of tracks, referred to as header1 and header2 for clarity.
The first header line header1 is used as the top-level track.
For composite tracks, one composite track view is created underneath the top-level track for each secondary header header2.
Composite tracks can associate two views to the same parent by using only the second header line header2 for subsequent track groups.
In this way, composite views can effectively contain a subgroup of tracks within each top-level header header1.
For multiWig overlay tracks, each overlay track is grouped into the top-level header header1 track. However, there is no additional subgroup available.
An example for two composite tracks, each with one view.
headingA1
headingA2
track name=trackname1
track name=trackname2
headingB1
headingB2
track name=trackname5
track name=trackname6
In this case, there will be two top-level parent tracks, labeled "headingA1" and "headingB1", which appear inside the track hub. Within each track, there will be one composite view:
for headingA1 there is one internal track headingA2; and
for headingB1 there is one internal track headingB2.
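The header convention above can be pictured with a small base R sketch that assigns header1 and header2 to each track line. It is a simplified reading of the description, not the parser used by parse_ucsc_gokey(): it assumes that any line not starting with "track" is a header, that two consecutive header lines set header1 and header2, and that a single header line updates only header2. The names input_lines, pending, and header_df are hypothetical.

```r
input_lines <- c("headingA1", "headingA2",
   "track name=trackname1", "track name=trackname2",
   "headingB1", "headingB2",
   "track name=trackname5", "track name=trackname6")

# Assumption: lines beginning with "track" are tracks, all others are headers.
is_track <- grepl("^track[[:space:]]", input_lines)

header1 <- NA_character_
header2 <- NA_character_
pending <- character(0)   # header lines seen since the last track line
rows <- list()
for (i in seq_along(input_lines)) {
   if (!is_track[i]) {
      pending <- c(pending, input_lines[i])
      next
   }
   if (length(pending) >= 2) {
      # two header lines start a new top-level group
      header1 <- pending[1]
      header2 <- pending[2]
   } else if (length(pending) == 1) {
      # a single header line starts a new subgroup in the same parent
      header2 <- pending[1]
   }
   pending <- character(0)
   track_name <- sub("^track[[:space:]]+name=([^[:space:]]+).*$", "\\1",
      input_lines[i])
   rows[[length(rows) + 1]] <- data.frame(header1=header1,
      header2=header2, name=track_name, stringsAsFactors=FALSE)
}
header_df <- do.call(rbind, rows)
print(header_df)
```

Each row of header_df pairs a track with its top-level parent and its view, which matches the two parent tracks and two internal tracks described for this example.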
Other jam ucsc browser functions: assign_track_defaults(), get_track_defaults(), make_ucsc_trackname()
# example of two composite track top-level parent tracks
track_lines_text <- c("headingA1
headingA2
track name=trackname1 shortLabel=trackname1 bigDataUrl=some_url
track name=trackname2 shortLabel=trackname2 bigDataUrl=some_url
track name=trackname3 shortLabel=trackname3 bigDataUrl=some_url
track name=trackname4 shortLabel=trackname4 bigDataUrl=some_url
headingB1
headingB2
track name=trackname5 shortLabel=trackname5 bigDataUrl=some_url
track name=trackname6 shortLabel=trackname6 bigDataUrl=some_url
track name=trackname7 shortLabel=trackname7 bigDataUrl=some_url
track name=trackname8 shortLabel=trackname8 bigDataUrl=some_url
")
track_lines <- unlist(strsplit(track_lines_text, "\n"));
cat(parse_ucsc_gokey(track_lines))
#>
#>
#> track headingA1
#> compositeTrack on
#> shortLabel headingA1
#> longLabel headingA1
#> superTrack on show
#> configurable on
#> subGroup1 view Views \
#> COV=Coverage \
#> JUNC=Junctions \
#> PEAK=Peaks
#> visibility full
#> priority 5100
#>
#>
#>
#> track headingA2
#> parent headingA1 on
#> shortLabel headingA2
#> longLabel headingA2
#> view COV
#> compositeTrack on
#> type bigwig
#> configurable on
#> centerLabelsDense on
#> dragAndDrop on
#> maxHeightPixels 100:35:5
#> transformFunc NONE
#> smoothingWindow off
#> windowingFunction mean+whiskers
#> gridDefault on
#> autoScale on
#> visibility full
#> priority 5110
#>
#>
#>
#> track trackname1
#> parent headingA2 on
#> type bigwig
#> shortLabel trackname1
#> longLabel trackname1
#> bigDataUrl some_url
#> color 0,0,150
#> gridDefault on
#> autoScale on
#> alwaysZero on
#> smoothingWindow off
#> windowingFunction mean+whiskers
#> visibility full
#> priority 5112
#>
#>
#>
#> track trackname2
#> parent headingA2 on
#> type bigwig
#> shortLabel trackname2
#> longLabel trackname2
#> bigDataUrl some_url
#> color 0,0,150
#> gridDefault on
#> autoScale on
#> alwaysZero on
#> smoothingWindow off
#> windowingFunction mean+whiskers
#> visibility full
#> priority 5114
#>
#>
#>
#> track trackname3
#> parent headingA2 on
#> type bigwig
#> shortLabel trackname3
#> longLabel trackname3
#> bigDataUrl some_url
#> color 0,0,150
#> gridDefault on
#> autoScale on
#> alwaysZero on
#> smoothingWindow off
#> windowingFunction mean+whiskers
#> visibility full
#> priority 5116
#>
#>
#>
#> track trackname4
#> parent headingA2 on
#> type bigwig
#> shortLabel trackname4
#> longLabel trackname4
#> bigDataUrl some_url
#> color 0,0,150
#> gridDefault on
#> autoScale on
#> alwaysZero on
#> smoothingWindow off
#> windowingFunction mean+whiskers
#> visibility full
#> priority 5118
#>
#>
#>
#> track headingB1
#> compositeTrack on
#> shortLabel headingB1
#> longLabel headingB1
#> superTrack on show
#> configurable on
#> subGroup1 view Views \
#> COV=Coverage \
#> JUNC=Junctions \
#> PEAK=Peaks
#> visibility full
#> priority 5300
#>
#>
#>
#> track headingB2
#> parent headingB1 on
#> shortLabel headingB2
#> longLabel headingB2
#> view COV
#> compositeTrack on
#> type bigwig
#> configurable on
#> centerLabelsDense on
#> dragAndDrop on
#> maxHeightPixels 100:35:5
#> transformFunc NONE
#> smoothingWindow off
#> windowingFunction mean+whiskers
#> gridDefault on
#> autoScale on
#> visibility full
#> priority 5330
#>
#>
#>
#> track trackname5
#> parent headingB2 on
#> type bigwig
#> shortLabel trackname5
#> longLabel trackname5
#> bigDataUrl some_url
#> color 0,0,150
#> gridDefault on
#> autoScale on
#> alwaysZero on
#> smoothingWindow off
#> windowingFunction mean+whiskers
#> visibility full
#> priority 5332
#>
#>
#>
#> track trackname6
#> parent headingB2 on
#> type bigwig
#> shortLabel trackname6
#> longLabel trackname6
#> bigDataUrl some_url
#> color 0,0,150
#> gridDefault on
#> autoScale on
#> alwaysZero on
#> smoothingWindow off
#> windowingFunction mean+whiskers
#> visibility full
#> priority 5334
#>
#>
#>
#> track trackname7
#> parent headingB2 on
#> type bigwig
#> shortLabel trackname7
#> longLabel trackname7
#> bigDataUrl some_url
#> color 0,0,150
#> gridDefault on
#> autoScale on
#> alwaysZero on
#> smoothingWindow off
#> windowingFunction mean+whiskers
#> visibility full
#> priority 5336
#>
#>
#>
#> track trackname8
#> parent headingB2 on
#> type bigwig
#> shortLabel trackname8
#> longLabel trackname8
#> bigDataUrl some_url
#> color 0,0,150
#> gridDefault on
#> autoScale on
#> alwaysZero on
#> smoothingWindow off
#> windowingFunction mean+whiskers
#> visibility full
#> priority 5338
#>
track_df <- parse_ucsc_gokey(track_lines, debug="df")
#> ## (12:31:47) 21Sep2023: jamba::sdim(track_dfl):
#> rows cols class
#> trackname1 1 3 data.frame
#> trackname2 1 3 data.frame
#> trackname3 1 3 data.frame
#> trackname4 1 3 data.frame
#> trackname5 1 3 data.frame
#> trackname6 1 3 data.frame
#> trackname7 1 3 data.frame
#> trackname8 1 3 data.frame
#> $trackname1
#> name shortLabel bigDataUrl
#> 1 trackname1 trackname1 some_url
#>
#> $trackname2
#> name shortLabel bigDataUrl
#> 1 trackname2 trackname2 some_url
#>
#> ## (12:31:47) 21Sep2023: head(track_df$name, 4):
#> [1] "trackname1" "trackname2" "trackname3" "trackname4"
# example of two multiWig overlay top-level parent tracks
track_lines_text2 <- c("headingA1
headingA2
track name=trackname1_pos shortLabel=trackname1_pos bigDataUrl=some_url
track name=trackname1_neg shortLabel=trackname1_neg bigDataUrl=some_url
track name=trackname2_pos shortLabel=trackname2_pos bigDataUrl=some_url
track name=trackname2_neg shortLabel=trackname2_neg bigDataUrl=some_url
headingB1
headingB2
track name=trackname3_pos shortLabel=trackname3_pos bigDataUrl=some_url
track name=trackname3_neg shortLabel=trackname3_neg bigDataUrl=some_url
track name=trackname4_pos shortLabel=trackname4_pos bigDataUrl=some_url
track name=trackname4_neg shortLabel=trackname4_neg bigDataUrl=some_url
")
track_lines2 <- unlist(strsplit(track_lines_text2, "\n"));
track_text2 <- parse_ucsc_gokey(track_lines2);
cat(track_text2);
#>
#>
#> track headingA1_headingA2
#> superTrack on show
#> shortLabel headingA1: headingA2
#> longLabel headingA1: headingA2
#> configurable on
#> priority 5100
#>
#>
#>
#> track trackname1
#> superTrack headingA1_headingA2 full
#> type bigwig
#> container multiWig
#> aggregate transparentOverlay
#> shortLabel trackname1
#> longLabel trackname1
#> showSubtrackColorOnUi on
#> centerLabelsDense on
#> alwaysZero on
#> graphTypeDefault bar
#> maxHeightPixels 100:66:5
#> autoScale on
#> windowingFunction mean+whiskers
#> visibility full
#> priority 5110
#>
#>
#>
#> track trackname1_pos
#> parent trackname1
#> shortLabel trackname1_pos
#> longLabel trackname1_pos
#> bigDataUrl some_url
#> type bigwig
#> color 0,0,150
#> priority 5112
#>
#>
#>
#> track trackname1_neg
#> parent trackname1
#> shortLabel trackname1_neg
#> longLabel trackname1_neg
#> bigDataUrl some_url
#> type bigwig
#> color 0,0,150
#> priority 5114
#>
#>
#>
#> track trackname2
#> superTrack headingA1_headingA2 full
#> type bigwig
#> container multiWig
#> aggregate transparentOverlay
#> shortLabel trackname2
#> longLabel trackname2
#> showSubtrackColorOnUi on
#> centerLabelsDense on
#> alwaysZero on
#> graphTypeDefault bar
#> maxHeightPixels 100:66:5
#> autoScale on
#> windowingFunction mean+whiskers
#> visibility full
#> priority 5130
#>
#>
#>
#> track trackname2_pos
#> parent trackname2
#> shortLabel trackname2_pos
#> longLabel trackname2_pos
#> bigDataUrl some_url
#> type bigwig
#> color 0,0,150
#> priority 5132
#>
#>
#>
#> track trackname2_neg
#> parent trackname2
#> shortLabel trackname2_neg
#> longLabel trackname2_neg
#> bigDataUrl some_url
#> type bigwig
#> color 0,0,150
#> priority 5134
#>
#>
#>
#> track headingB1_headingB2
#> superTrack on show
#> shortLabel headingB1: headingB2
#> longLabel headingB1: headingB2
#> configurable on
#> priority 5300
#>
#>
#>
#> track trackname3
#> superTrack headingB1_headingB2 full
#> type bigwig
#> container multiWig
#> aggregate transparentOverlay
#> shortLabel trackname3
#> longLabel trackname3
#> showSubtrackColorOnUi on
#> centerLabelsDense on
#> alwaysZero on
#> graphTypeDefault bar
#> maxHeightPixels 100:66:5
#> autoScale on
#> windowingFunction mean+whiskers
#> visibility full
#> priority 5350
#>
#>
#>
#> track trackname3_pos
#> parent trackname3
#> shortLabel trackname3_pos
#> longLabel trackname3_pos
#> bigDataUrl some_url
#> type bigwig
#> color 0,0,150
#> priority 5352
#>
#>
#>
#> track trackname3_neg
#> parent trackname3
#> shortLabel trackname3_neg
#> longLabel trackname3_neg
#> bigDataUrl some_url
#> type bigwig
#> color 0,0,150
#> priority 5354
#>
#>
#>
#> track trackname4
#> superTrack headingB1_headingB2 full
#> type bigwig
#> container multiWig
#> aggregate transparentOverlay
#> shortLabel trackname4
#> longLabel trackname4
#> showSubtrackColorOnUi on
#> centerLabelsDense on
#> alwaysZero on
#> graphTypeDefault bar
#> maxHeightPixels 100:66:5
#> autoScale on
#> windowingFunction mean+whiskers
#> visibility full
#> priority 5370
#>
#>
#>
#> track trackname4_pos
#> parent trackname4
#> shortLabel trackname4_pos
#> longLabel trackname4_pos
#> bigDataUrl some_url
#> type bigwig
#> color 0,0,150
#> priority 5372
#>
#>
#>
#> track trackname4_neg
#> parent trackname4
#> shortLabel trackname4_neg
#> longLabel trackname4_neg
#> bigDataUrl some_url
#> type bigwig
#> color 0,0,150
#> priority 5374
#>
# the final step is to save into a text file
if (FALSE) {
cat(track_text2, file="trackDb_platjam.txt")
}