Parse UCSC tracks that use the Gokey format

parse_ucsc_gokey(
  track_lines,
  overlay_grep = c("[ -._](plus|minus|F|R|pos|neg)($|[ -._])"),
  priority = 5000,
  output_format = c("text", "list"),
  debug = c("none"),
  multiwig_concat_header = TRUE,
  verbose = FALSE,
  ...
)

Arguments

track_lines

character vector containing lines read from a track file, or valid path or connection to a track file.

overlay_grep

character vector containing valid regular expression patterns used to recognize when a track should be considered an overlay coverage track. For example, track name="trackA F" and track name="trackA R" would be recognized as the forward and reverse strands of a track named "trackA". Overlay tracks are handled using the UCSC "multiWig" approach, not the composite track approach. To disable overlay_grep, use overlay_grep="^$". To enable overlay_grep for all tracks, use overlay_grep="$".
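The matching behavior described above can be checked with base R alone. This is a minimal sketch, assuming the pattern is tested against each track name with something like grepl(); the variable example_names is hypothetical and the function's internal matching code is not shown in this page.

```r
# Default pattern copied from the usage section above.
overlay_grep <- "[ -._](plus|minus|F|R|pos|neg)($|[ -._])"

# Hypothetical track names, for illustration only.
example_names <- c("trackA F", "trackA R", "trackB")

# Assumption: overlay detection is a regular expression test on the name.
is_overlay <- grepl(overlay_grep, example_names)

# "^$" matches only an empty string, so no named track is an overlay.
is_overlay_disabled <- grepl("^$", example_names)

# "$" matches the end of every string, so every track is an overlay.
is_overlay_all <- grepl("$", example_names)

print(data.frame(
   name = example_names,
   default = is_overlay,
   disabled = is_overlay_disabled,
   all = is_overlay_all))
```

With the default pattern, only names ending in a delimiter followed by a strand-like suffix are flagged.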

priority

integer value indicating the starting priority used when assigning a priority to each track.

output_format

character string indicating the output format, where "text" returns one long character string, and "list" returns a list with one track per list element, each with class "glue", "character".

debug

character indicating type of debug output:

  • df: returns the intermediate track_df data.frame;

  • pri: prints priority during track parsing;

  • none: produces no debug output, the default.

multiwig_concat_header

logical indicating whether multiWig parent tracks should be named by concatenating header1 and header2 values.

verbose

logical indicating whether to print verbose output during processing.

...

additional arguments are treated as a named list of track parameters that override existing parameter values. For example scoreFilter=1 will override the default for bigBed tracks scoreFilter=5.
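The override semantics can be illustrated without the package. This is only a sketch of the idea, assuming named arguments in ... replace same-named defaults; override_params() and bigbed_defaults are hypothetical names, and the real defaults are managed elsewhere (see get_track_defaults()).

```r
# Hypothetical helper: collect ... into a named list and let it
# replace any same-named entries in the defaults.
override_params <- function(defaults, ...) {
   overrides <- list(...)
   utils::modifyList(defaults, overrides)
}

# Illustrative defaults; only scoreFilter=5 is taken from the text above.
bigbed_defaults <- list(type = "bigBed", scoreFilter = 5)

new_params <- override_params(bigbed_defaults, scoreFilter = 1)
print(new_params$scoreFilter)
```

Entries that are not named in ... keep their default values.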

Value

By default, when output_format="text", a character string suitable to cat() directly into a text file. When output_format="list", it returns a list of glue objects, which can be concatenated into one character string with Reduce("+", trackline_list).

Details

Given a text file, or lines from a text file, representing the Gokey format, this function will parse the track lines into groups, and return a text string usable in a UCSC genome browser track hub.

In general, the intention is to convert a set of UCSC track lines to a track hub format, where common track options are converted to relevant track hub configuration lines.

Tracks are generally divided into two types of groupings:

multiWig Overlay Tracks

Track names that match the overlay_grep regular expression pattern are configured as multiWig overlay tracks. This configuration uses the UCSC multiWig format as described here: https://genome.ucsc.edu/goldenPath/help/trackDb/trackDbHub.html#aggregate

  • More specifically, a parent track is configured as a superTrack.

  • Each track matching overlay_grep is assigned a shared track group name, obtained by removing the matched pattern from its track name. Each unique track group is used as an intermediate track with "container multiWig".

  • Track groups are assigned priority in the order each unique track group first appears in the track config lines.

  • Individual tracks are configured as child tracks to the track groups.
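The grouping step in the list above can be sketched in base R. This assumes the shared group name is obtained by deleting the matched pattern with gsub(), and that groups keep their order of first appearance; the function's actual internals may differ, and track_names is hypothetical input.

```r
overlay_grep <- "[ -._](plus|minus|F|R|pos|neg)($|[ -._])"

# Hypothetical track names: two stranded pairs and one unstranded track.
track_names <- c(
   "trackname1_pos", "trackname1_neg",
   "trackname2_pos", "trackname2_neg",
   "trackname3")

# Only names matching the pattern become overlay tracks.
is_overlay <- grepl(overlay_grep, track_names)
overlay_names <- track_names[is_overlay]

# Assumption: the group name is the track name with the match removed.
group_names <- gsub(overlay_grep, "", overlay_names)

# Keep groups in order of first appearance, one element per multiWig container.
overlay_groups <- split(
   overlay_names,
   factor(group_names, levels = unique(group_names)))
print(overlay_groups)
```

Each list element corresponds to one intermediate "container multiWig" track, and its values are the child tracks. The unmatched "trackname3" is left for the composite view grouping.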

Composite View Tracks

All other tracks are grouped as composite tracks, specifically using composite track view, as described here https://genome.ucsc.edu/goldenPath/help/trackDb/trackDbHub.html#compositeTrack

  • More specifically, the parent track is configured as a compositeTrack, including views as "view Views COV=Coverage JUNC=Junctions PEAK=Peaks" by default.

  • An intermediate track is created to represent each view, by default "JUNC". However, this value is not visible to users unless there are multiple different view values.

  • Each track is configured as a child to the relevant view track. Track priority is assigned in the order it appears in the track config lines. The priority allows peak tracks to be ordered directly after or before the associated coverage track.

Top-Level Parent Tracks

Note that in both scenarios above, there is one top-level parent track that contains a subset of tracks. The top-level grouping can be defined in the track lines by supplying two header lines immediately before each top-level grouping of tracks, referred to as header1 and header2 for clarity.

The first header line header1 is used as the top-level track. For composite tracks, one composite track view is created underneath the top-level track for each secondary header header2. Composite tracks can associate two views with the same parent by supplying only the second header line header2 for subsequent track groups. In this way, composite views can effectively contain a subgroup of tracks within each top-level header header1.

For multiWig overlay tracks, each overlay track is grouped into the top-level header header1 track. However, there is no additional subgroup available.

An example of two composite tracks, each with one view:

headingA1
headingA2
track name=trackname1
track name=trackname2

headingB1
headingB2
track name=trackname5
track name=trackname6

In this case, there will be two top-level parent tracks, labeled "headingA1" and "headingB1", which appear inside the track hub. Within each track, there will be one composite view: for headingA1 there is one internal track headingA2; and for headingB1 there is one internal track headingB2.
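The header rules described above can be sketched as a small base R loop. This is an illustration under stated assumptions, not the function's implementation: any non-blank line that does not start with "track " is treated as a header, two headers before a track set header1 and header2, and a single header sets only header2. The names header_df and pending are hypothetical.

```r
track_lines <- c(
   "headingA1",
   "headingA2",
   "track name=trackname1",
   "track name=trackname2",
   "",
   "headingB1",
   "headingB2",
   "track name=trackname5",
   "track name=trackname6")

# Drop blank lines, then flag which remaining lines are track lines.
lines_use <- track_lines[nzchar(trimws(track_lines))]
is_track <- grepl("^track ", lines_use)

header1 <- NA_character_
header2 <- NA_character_
pending <- character(0)   # header lines seen since the last track line
rows <- list()
for (i in seq_along(lines_use)) {
   if (!is_track[i]) {
      pending <- c(pending, lines_use[i])
      next
   }
   if (length(pending) >= 2) {
      # two header lines: new top-level parent and new view
      header1 <- pending[1]
      header2 <- pending[2]
   } else if (length(pending) == 1) {
      # one header line: new view under the current top-level parent
      header2 <- pending[1]
   }
   pending <- character(0)
   rows[[length(rows) + 1]] <- data.frame(
      header1 = header1,
      header2 = header2,
      name = sub("^track name=([^ ]+).*$", "\\1", lines_use[i]),
      stringsAsFactors = FALSE)
}
header_df <- do.call(rbind, rows)
print(header_df)
```

Each row associates one track with its top-level parent and its view, which mirrors the two-parent layout described for "headingA1" and "headingB1".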

See also

Other jam ucsc browser functions: assign_track_defaults(), get_track_defaults(), make_ucsc_trackname()

Examples

# example of two composite track top-level parent tracks
track_lines_text <- c("headingA1
headingA2
track name=trackname1 shortLabel=trackname1 bigDataUrl=some_url
track name=trackname2 shortLabel=trackname2 bigDataUrl=some_url
track name=trackname3 shortLabel=trackname3 bigDataUrl=some_url
track name=trackname4 shortLabel=trackname4 bigDataUrl=some_url

headingB1
headingB2
track name=trackname5 shortLabel=trackname5 bigDataUrl=some_url
track name=trackname6 shortLabel=trackname6 bigDataUrl=some_url
track name=trackname7 shortLabel=trackname7 bigDataUrl=some_url
track name=trackname8 shortLabel=trackname8 bigDataUrl=some_url
")
track_lines <- unlist(strsplit(track_lines_text, "\n"));
cat(parse_ucsc_gokey(track_lines))
#> 
#> 
#> track             headingA1
#> compositeTrack    on
#> shortLabel        headingA1
#> longLabel         headingA1
#> superTrack        on show
#> configurable      on
#> subGroup1         view Views \
#>    COV=Coverage \
#>    JUNC=Junctions \
#>    PEAK=Peaks
#> visibility        full
#> priority          5100
#> 
#>  
#> 
#>    track                headingA2
#>    parent               headingA1 on
#>    shortLabel           headingA2
#>    longLabel            headingA2
#>    view                 COV
#>    compositeTrack       on
#>    type                 bigwig
#>    configurable         on
#>    centerLabelsDense    on
#>    dragAndDrop          on
#>    maxHeightPixels      100:35:5
#>    transformFunc        NONE
#>    smoothingWindow      off
#>    windowingFunction    mean+whiskers
#>    gridDefault          on
#>    autoScale            on
#>    visibility           full
#>    priority             5110
#> 
#>  
#> 
#>       track                trackname1
#>       parent               headingA2 on
#>       type                 bigwig
#>       shortLabel           trackname1
#>       longLabel            trackname1
#>       bigDataUrl           some_url
#>       color                0,0,150
#>       gridDefault          on
#>       autoScale            on
#>       alwaysZero           on
#>       smoothingWindow      off
#>       windowingFunction    mean+whiskers
#>       visibility           full
#>       priority             5112
#> 
#>  
#> 
#>       track                trackname2
#>       parent               headingA2 on
#>       type                 bigwig
#>       shortLabel           trackname2
#>       longLabel            trackname2
#>       bigDataUrl           some_url
#>       color                0,0,150
#>       gridDefault          on
#>       autoScale            on
#>       alwaysZero           on
#>       smoothingWindow      off
#>       windowingFunction    mean+whiskers
#>       visibility           full
#>       priority             5114
#> 
#>  
#> 
#>       track                trackname3
#>       parent               headingA2 on
#>       type                 bigwig
#>       shortLabel           trackname3
#>       longLabel            trackname3
#>       bigDataUrl           some_url
#>       color                0,0,150
#>       gridDefault          on
#>       autoScale            on
#>       alwaysZero           on
#>       smoothingWindow      off
#>       windowingFunction    mean+whiskers
#>       visibility           full
#>       priority             5116
#> 
#>  
#> 
#>       track                trackname4
#>       parent               headingA2 on
#>       type                 bigwig
#>       shortLabel           trackname4
#>       longLabel            trackname4
#>       bigDataUrl           some_url
#>       color                0,0,150
#>       gridDefault          on
#>       autoScale            on
#>       alwaysZero           on
#>       smoothingWindow      off
#>       windowingFunction    mean+whiskers
#>       visibility           full
#>       priority             5118
#> 
#>   
#> 
#> track             headingB1
#> compositeTrack    on
#> shortLabel        headingB1
#> longLabel         headingB1
#> superTrack        on show
#> configurable      on
#> subGroup1         view Views \
#>    COV=Coverage \
#>    JUNC=Junctions \
#>    PEAK=Peaks
#> visibility        full
#> priority          5300
#> 
#>   
#> 
#>    track                headingB2
#>    parent               headingB1 on
#>    shortLabel           headingB2
#>    longLabel            headingB2
#>    view                 COV
#>    compositeTrack       on
#>    type                 bigwig
#>    configurable         on
#>    centerLabelsDense    on
#>    dragAndDrop          on
#>    maxHeightPixels      100:35:5
#>    transformFunc        NONE
#>    smoothingWindow      off
#>    windowingFunction    mean+whiskers
#>    gridDefault          on
#>    autoScale            on
#>    visibility           full
#>    priority             5330
#> 
#>  
#> 
#>       track                trackname5
#>       parent               headingB2 on
#>       type                 bigwig
#>       shortLabel           trackname5
#>       longLabel            trackname5
#>       bigDataUrl           some_url
#>       color                0,0,150
#>       gridDefault          on
#>       autoScale            on
#>       alwaysZero           on
#>       smoothingWindow      off
#>       windowingFunction    mean+whiskers
#>       visibility           full
#>       priority             5332
#> 
#>  
#> 
#>       track                trackname6
#>       parent               headingB2 on
#>       type                 bigwig
#>       shortLabel           trackname6
#>       longLabel            trackname6
#>       bigDataUrl           some_url
#>       color                0,0,150
#>       gridDefault          on
#>       autoScale            on
#>       alwaysZero           on
#>       smoothingWindow      off
#>       windowingFunction    mean+whiskers
#>       visibility           full
#>       priority             5334
#> 
#>  
#> 
#>       track                trackname7
#>       parent               headingB2 on
#>       type                 bigwig
#>       shortLabel           trackname7
#>       longLabel            trackname7
#>       bigDataUrl           some_url
#>       color                0,0,150
#>       gridDefault          on
#>       autoScale            on
#>       alwaysZero           on
#>       smoothingWindow      off
#>       windowingFunction    mean+whiskers
#>       visibility           full
#>       priority             5336
#> 
#>  
#> 
#>       track                trackname8
#>       parent               headingB2 on
#>       type                 bigwig
#>       shortLabel           trackname8
#>       longLabel            trackname8
#>       bigDataUrl           some_url
#>       color                0,0,150
#>       gridDefault          on
#>       autoScale            on
#>       alwaysZero           on
#>       smoothingWindow      off
#>       windowingFunction    mean+whiskers
#>       visibility           full
#>       priority             5338
#> 
track_df <- parse_ucsc_gokey(track_lines, debug="df")
#> ##  (12:31:47) 21Sep2023:   jamba::sdim(track_dfl): 
#>            rows cols      class
#> trackname1    1    3 data.frame
#> trackname2    1    3 data.frame
#> trackname3    1    3 data.frame
#> trackname4    1    3 data.frame
#> trackname5    1    3 data.frame
#> trackname6    1    3 data.frame
#> trackname7    1    3 data.frame
#> trackname8    1    3 data.frame
#> $trackname1
#>         name shortLabel bigDataUrl
#> 1 trackname1 trackname1   some_url
#> 
#> $trackname2
#>         name shortLabel bigDataUrl
#> 1 trackname2 trackname2   some_url
#> 
#> ##  (12:31:47) 21Sep2023:   head(track_df$name, 4): 
#> [1] "trackname1" "trackname2" "trackname3" "trackname4"

# example of two multiWig overlay top-level parent tracks
track_lines_text2 <- c("headingA1
headingA2
track name=trackname1_pos shortLabel=trackname1_pos bigDataUrl=some_url
track name=trackname1_neg shortLabel=trackname1_neg bigDataUrl=some_url
track name=trackname2_pos shortLabel=trackname2_pos bigDataUrl=some_url
track name=trackname2_neg shortLabel=trackname2_neg bigDataUrl=some_url

headingB1
headingB2
track name=trackname3_pos shortLabel=trackname3_pos bigDataUrl=some_url
track name=trackname3_neg shortLabel=trackname3_neg bigDataUrl=some_url
track name=trackname4_pos shortLabel=trackname4_pos bigDataUrl=some_url
track name=trackname4_neg shortLabel=trackname4_neg bigDataUrl=some_url
")
track_lines2 <- unlist(strsplit(track_lines_text2, "\n"));
track_text2 <- parse_ucsc_gokey(track_lines2);
cat(track_text2);
#> 
#> 
#> track                headingA1_headingA2
#> superTrack           on show
#> shortLabel           headingA1: headingA2
#> longLabel            headingA1: headingA2
#> configurable         on
#> priority             5100
#> 
#>  
#> 
#>    track                  trackname1
#>    superTrack             headingA1_headingA2 full
#>    type                   bigwig
#>    container              multiWig
#>    aggregate              transparentOverlay
#>    shortLabel             trackname1
#>    longLabel              trackname1
#>    showSubtrackColorOnUi  on
#>    centerLabelsDense      on
#>    alwaysZero             on
#>    graphTypeDefault       bar
#>    maxHeightPixels        100:66:5
#>    autoScale              on
#>    windowingFunction      mean+whiskers
#>    visibility             full
#>    priority               5110
#> 
#>  
#> 
#>       track             trackname1_pos
#>       parent            trackname1
#>       shortLabel        trackname1_pos
#>       longLabel         trackname1_pos
#>       bigDataUrl        some_url
#>       type              bigwig
#>       color             0,0,150
#>       priority          5112
#> 
#>  
#> 
#>       track             trackname1_neg
#>       parent            trackname1
#>       shortLabel        trackname1_neg
#>       longLabel         trackname1_neg
#>       bigDataUrl        some_url
#>       type              bigwig
#>       color             0,0,150
#>       priority          5114
#> 
#>  
#> 
#>    track                  trackname2
#>    superTrack             headingA1_headingA2 full
#>    type                   bigwig
#>    container              multiWig
#>    aggregate              transparentOverlay
#>    shortLabel             trackname2
#>    longLabel              trackname2
#>    showSubtrackColorOnUi  on
#>    centerLabelsDense      on
#>    alwaysZero             on
#>    graphTypeDefault       bar
#>    maxHeightPixels        100:66:5
#>    autoScale              on
#>    windowingFunction      mean+whiskers
#>    visibility             full
#>    priority               5130
#> 
#>  
#> 
#>       track             trackname2_pos
#>       parent            trackname2
#>       shortLabel        trackname2_pos
#>       longLabel         trackname2_pos
#>       bigDataUrl        some_url
#>       type              bigwig
#>       color             0,0,150
#>       priority          5132
#> 
#>  
#> 
#>       track             trackname2_neg
#>       parent            trackname2
#>       shortLabel        trackname2_neg
#>       longLabel         trackname2_neg
#>       bigDataUrl        some_url
#>       type              bigwig
#>       color             0,0,150
#>       priority          5134
#> 
#>    
#> 
#> track                headingB1_headingB2
#> superTrack           on show
#> shortLabel           headingB1: headingB2
#> longLabel            headingB1: headingB2
#> configurable         on
#> priority             5300
#> 
#>    
#> 
#>    track                  trackname3
#>    superTrack             headingB1_headingB2 full
#>    type                   bigwig
#>    container              multiWig
#>    aggregate              transparentOverlay
#>    shortLabel             trackname3
#>    longLabel              trackname3
#>    showSubtrackColorOnUi  on
#>    centerLabelsDense      on
#>    alwaysZero             on
#>    graphTypeDefault       bar
#>    maxHeightPixels        100:66:5
#>    autoScale              on
#>    windowingFunction      mean+whiskers
#>    visibility             full
#>    priority               5350
#> 
#>  
#> 
#>       track             trackname3_pos
#>       parent            trackname3
#>       shortLabel        trackname3_pos
#>       longLabel         trackname3_pos
#>       bigDataUrl        some_url
#>       type              bigwig
#>       color             0,0,150
#>       priority          5352
#> 
#>  
#> 
#>       track             trackname3_neg
#>       parent            trackname3
#>       shortLabel        trackname3_neg
#>       longLabel         trackname3_neg
#>       bigDataUrl        some_url
#>       type              bigwig
#>       color             0,0,150
#>       priority          5354
#> 
#>  
#> 
#>    track                  trackname4
#>    superTrack             headingB1_headingB2 full
#>    type                   bigwig
#>    container              multiWig
#>    aggregate              transparentOverlay
#>    shortLabel             trackname4
#>    longLabel              trackname4
#>    showSubtrackColorOnUi  on
#>    centerLabelsDense      on
#>    alwaysZero             on
#>    graphTypeDefault       bar
#>    maxHeightPixels        100:66:5
#>    autoScale              on
#>    windowingFunction      mean+whiskers
#>    visibility             full
#>    priority               5370
#> 
#>  
#> 
#>       track             trackname4_pos
#>       parent            trackname4
#>       shortLabel        trackname4_pos
#>       longLabel         trackname4_pos
#>       bigDataUrl        some_url
#>       type              bigwig
#>       color             0,0,150
#>       priority          5372
#> 
#>  
#> 
#>       track             trackname4_neg
#>       parent            trackname4
#>       shortLabel        trackname4_neg
#>       longLabel         trackname4_neg
#>       bigDataUrl        some_url
#>       type              bigwig
#>       color             0,0,150
#>       priority          5374
#> 

# the final step is to save into a text file
if (FALSE) {
   cat(track_text2, file="trackDb_platjam.txt")
}