[
  {
    "path": ".Rbuildignore",
    "content": "^ggtranscript\\.Rproj$\n^\\.Rproj\\.user$\n^dev$\n^README\\.Rmd$\n^\\.github$\n^codecov\\.yml$\n^.pre-commit-config.yaml$\n^data-raw$\n^_pkgdown.yml$\n^tests/testthat/_snaps/*\n^LICENSE\\.md$\n^cran-comments\\.md$\n"
  },
  {
    "path": ".github/.gitignore",
    "content": "*.html\n"
  },
  {
    "path": ".github/workflows/check-bioc.yml",
    "content": "on:\n  push:\n  pull_request:\n\nname: R-CMD-check-bioc\n\n## These environment variables control whether to run GHA code later on that is\n## specific to testthat, covr, and pkgdown.\nenv:\n  has_testthat: 'true'\n  run_covr: 'true'\n  run_pkgdown: 'true'\n  has_RUnit: 'false'\n  cache-version: 'cache-v1'\n\njobs:\n  build-check:\n    runs-on: ${{ matrix.config.os }}\n    name: ${{ matrix.config.os }} (${{ matrix.config.r }})\n    container: ${{ matrix.config.cont }}\n\n    strategy:\n      fail-fast: false\n      matrix:\n        config:\n          - { os: ubuntu-latest, r: '4.4', bioc: '3.19', cont: \"bioconductor/bioconductor_docker:RELEASE_3_19\", rspm: \"https://packagemanager.rstudio.com/cran/__linux__/focal/latest\" }\n          \n    env:\n      R_REMOTES_NO_ERRORS_FROM_WARNINGS: true\n      RSPM: ${{ matrix.config.rspm }}\n      NOT_CRAN: true\n      TZ: UTC\n      GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}\n      GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }}\n\n    steps:\n\n      - name: Set R Library home on Linux\n        if: runner.os == 'Linux'\n        run: |\n          mkdir /__w/_temp/Library\n          echo \".libPaths('/__w/_temp/Library')\" > ~/.Rprofile\n\n      - name: Checkout Repository\n        uses: actions/checkout@v3\n\n      - name: Setup R from r-lib\n        if: runner.os != 'Linux'\n        uses: r-lib/actions/setup-r@v2\n        with:\n          r-version: ${{ matrix.config.r }}\n          http-user-agent: ${{ matrix.config.http-user-agent }}\n\n      - name: Setup pandoc from r-lib\n        if: runner.os != 'Linux'\n        uses: r-lib/actions/setup-pandoc@v2\n\n      - name: Query dependencies\n        run: |\n          install.packages('remotes')\n          saveRDS(remotes::dev_package_deps(dependencies = TRUE), \".github/depends.Rds\", version = 2)\n        shell: Rscript {0}\n\n      - name: Restore R package cache\n        if: \"!contains(github.event.head_commit.message, '/nocache') && runner.os != 'Linux'\"\n      
  uses: actions/cache@v3\n        with:\n          path: ${{ env.R_LIBS_USER }}\n          key: ${{ env.cache-version }}-${{ runner.os }}-biocversion-RELEASE_3_14-r-4.1-${{ hashFiles('.github/depends.Rds') }}\n          restore-keys: ${{ env.cache-version }}-${{ runner.os }}-biocversion-RELEASE_3_14-r-4.1-\n\n      - name: Cache R packages on Linux\n        if: \"!contains(github.event.head_commit.message, '/nocache') && runner.os == 'Linux' \"\n        uses: actions/cache@v3\n        with:\n          path: /home/runner/work/_temp/Library\n          key: ${{ env.cache-version }}-${{ runner.os }}-biocversion-RELEASE_3_14-r-4.1-${{ hashFiles('.github/depends.Rds') }}\n          restore-keys: ${{ env.cache-version }}-${{ runner.os }}-biocversion-RELEASE_3_14-r-4.1-\n\n      - name: Install Linux system dependencies\n        if: runner.os == 'Linux'\n        run: |\n          sysreqs=$(Rscript -e 'cat(\"apt-get update -y && apt-get install -y\", paste(gsub(\"apt-get install -y \", \"\", remotes::system_requirements(\"ubuntu\", \"20.04\")), collapse = \" \"))')\n          echo $sysreqs\n          sudo -s eval \"$sysreqs\"\n\n      - name: Install BiocManager\n        run: |\n          message(paste('****', Sys.time(), 'installing BiocManager ****'))\n          remotes::install_cran(\"BiocManager\")\n        shell: Rscript {0}\n\n      - name: Set BiocVersion\n        run: |\n          BiocManager::install(version = \"${{ matrix.config.bioc }}\", ask = FALSE, force = TRUE)\n        shell: Rscript {0}\n\n      - name: Install dependencies pass 1\n        run: |\n          ## Try installing the package dependencies in steps. 
First the local\n          ## dependencies, then any remaining dependencies to avoid the\n          ## issues described at\n          ## https://stat.ethz.ch/pipermail/bioc-devel/2020-April/016675.html\n          ## https://github.com/r-lib/remotes/issues/296\n          ## Ideally, all dependencies should get installed in the first pass.\n\n          ## Set the repos source depending on the OS\n          ## Alternatively use https://storage.googleapis.com/bioconductor_docker/packages/\n          ## though based on https://bit.ly/bioc2021-package-binaries\n          ## the Azure link will be the main one going forward.\n          gha_repos <- if(\n              .Platform$OS.type == \"unix\" && Sys.info()[\"sysname\"] != \"Darwin\"\n          ) c(\n              \"AnVIL\" = \"https://bioconductordocker.blob.core.windows.net/packages/3.14/bioc\",\n              BiocManager::repositories()\n              ) else BiocManager::repositories()\n\n          ## For running the checks\n          message(paste('****', Sys.time(), 'installing rcmdcheck and BiocCheck ****'))\n          install.packages(c(\"rcmdcheck\", \"BiocCheck\"), repos = gha_repos)\n\n          ## Pass #1 at installing dependencies\n          ## This pass uses AnVIL-powered fast binaries\n          ## details at https://github.com/nturaga/bioc2021-bioconductor-binaries\n          ## The speed gains only apply to the docker builds.\n          message(paste('****', Sys.time(), 'pass number 1 at installing dependencies: local dependencies ****'))\n          remotes::install_local(dependencies = TRUE, repos = gha_repos, build_vignettes = FALSE, upgrade = TRUE)\n        continue-on-error: true\n        shell: Rscript {0}\n\n      - name: Install dependencies pass 2\n        run: |\n          ## Pass #2 at installing dependencies\n          ## This pass does not use AnVIL and will thus update any packages\n          ## that have seen been updated in Bioconductor\n          message(paste('****', Sys.time(), 'pass 
number 2 at installing dependencies: any remaining dependencies ****'))\n          remotes::install_local(dependencies = TRUE, repos = BiocManager::repositories(), build_vignettes = TRUE, upgrade = TRUE, force = TRUE)\n        shell: Rscript {0}\n\n      - name: Install BiocGenerics\n        if:  env.has_RUnit == 'true'\n        run: |\n          ## Install BiocGenerics\n          BiocManager::install(\"BiocGenerics\")\n        shell: Rscript {0}\n\n      - name: Install covr\n        if: github.ref == 'refs/heads/master' && env.run_covr == 'true' && runner.os == 'Linux'\n        run: |\n          remotes::install_cran(\"covr\")\n        shell: Rscript {0}\n\n      - name: Install pkgdown\n        if: github.ref == 'refs/heads/master' && env.run_pkgdown == 'true' && runner.os == 'Linux'\n        run: |\n          remotes::install_cran(\"pkgdown\")\n        shell: Rscript {0}\n\n      - name: Session info\n        run: |\n          options(width = 100)\n          pkgs <- installed.packages()[, \"Package\"]\n          sessioninfo::session_info(pkgs, include_base = TRUE)\n        shell: Rscript {0}\n\n      - name: Run CMD check\n        env:\n          _R_CHECK_CRAN_INCOMING_: false\n          DISPLAY: 99.0\n        run: |\n          options(crayon.enabled = TRUE)\n          rcmdcheck::rcmdcheck(\n              args = c(\"--no-manual\", \"--no-vignettes\", \"--timings\"),\n              build_args = c(\"--no-manual\", \"--keep-empty-dirs\", \"--no-resave-data\"),\n              error_on = \"warning\",\n              check_dir = \"check\"\n          )\n        shell: Rscript {0}\n\n      ## Might need an to add this to the if:  && runner.os == 'Linux'\n      - name: Reveal testthat details\n        if:  env.has_testthat == 'true'\n        run: find . 
-name testthat.Rout -exec cat '{}' ';'\n\n      - name: Run RUnit tests\n        if:  env.has_RUnit == 'true'\n        run: |\n          BiocGenerics:::testPackage()\n        shell: Rscript {0}\n\n      - name: Run BiocCheck\n        env:\n          DISPLAY: 99.0\n        run: |\n          BiocCheck::BiocCheck(\n              dir('check', 'tar.gz$', full.names = TRUE),\n              `quit-with-status` = FALSE,\n              `no-check-R-ver` = TRUE,\n              `no-check-bioc-help` = TRUE\n          )\n        shell: Rscript {0}\n\n      - name: Test coverage\n        if: github.ref == 'refs/heads/master' && env.run_covr == 'true' && runner.os == 'Linux'\n        run: |\n          covr::codecov()\n        shell: Rscript {0}\n\n      - name: Install package\n        if: github.ref == 'refs/heads/master' && env.run_pkgdown == 'true' && runner.os == 'Linux'\n        run: R CMD INSTALL .\n        \n      - name: Get R package info\n        if: runner.os == 'Linux'\n        run: |\n          #### DockerHub repos must be lowercase (,,) ####\n          name=$(grep '^Package:' DESCRIPTION | cut -d\\   -f2)\n          echo \"packageNameOrig=${name}\" >> $GITHUB_ENV\n          echo $name\n          version=$(grep Version DESCRIPTION | grep -o \"[0-9.]\\+\")\n          echo \"packageVersion=${version}\" >> $GITHUB_ENV\n          echo $version\n        shell: bash {0}\n\n      - name: Build and deploy pkgdown site\n        if: github.ref == 'refs/heads/master' && env.run_pkgdown == 'true' && runner.os == 'Linux'\n        run: |\n          git config --global --add safe.directory /__w/${{env.packageNameOrig}}/${{env.packageNameOrig}} \n          git config --local user.name \"$GITHUB_ACTOR\"\n          git config --local user.email \"$GITHUB_ACTOR@users.noreply.github.com\"\n          Rscript -e \"pkgdown::deploy_to_branch(new_process = FALSE)\"\n        shell: bash {0}\n        ## Note that you need to run pkgdown::deploy_to_branch(new_process = FALSE)\n        ## at least 
once locally before this will work. This creates the gh-pages\n        ## branch (erasing anything you haven't version controlled!) and\n        ## makes the git history recognizable by pkgdown.\n\n      - name: Upload check results\n        if: failure()\n        uses: actions/upload-artifact@master\n        with:\n          name: ${{ runner.os }}-biocversion-RELEASE_3_14-r-4.1-results\n          path: check\n"
  },
  {
    "path": ".gitignore",
    "content": ".Rproj.user\ninst/doc\n*.DS_Store\n*.pdf\n*.Rproj\n"
  },
  {
    "path": ".pre-commit-config.yaml",
    "content": "repos:\n-   repo: https://github.com/lorenzwalthert/precommit\n    rev: v0.1.3.9133\n    hooks:\n    -   id: readme-rmd-rendered # make sure README.Rmd is rendered to README.md\n    -   id: parsable-R\n        exclude: >\n          (?x)^(\n          tests/testthat/in/style-files-fail-parse\\.R|\n          tests/testthat/in/parsable-R-fail\\.R|\n          )$\n    -   id: style-files # style code in the tidyverse style\n        args: [--indent_by=4]\n        exclude: >\n          (?x)^(\n          tests/testthat/in/.*\\.R|\n          renv/.*\n          )$\n    -   id: deps-in-desc # all dependencies pkg::func are in listed in dec\n        args: [--allow_private_imports]\n        exclude: >\n          (?x)^(\n          tests/testthat/in/.*|\n          inst/renv-update\\.R|\n          renv/activate.R|\n          vignettes/FAQ\\.Rmd|\n          )$\n    -   id: lintr\n        args: [--warn_only]\n        verbose: true\n\n-   repo: https://github.com/pre-commit/pre-commit-hooks\n    rev: v4.0.1\n    hooks:\n    -   id: check-added-large-files # make sure no large files commited\n    -   id: end-of-file-fixer\n        exclude: '\\.Rd'\n"
  },
  {
    "path": "DESCRIPTION",
    "content": "Package: ggtranscript\nTitle: Visualizing Transcript Structure and Annotation using 'ggplot2'\nVersion: 1.0.0\nAuthors@R:\n    c(\n    person(\"David\", \"Zhang\", , \"dyzhang32@gmail.com\", \n           role = c(\"aut\", \"cre\"),\n           comment = c(ORCID = \"0000-0003-2382-8460\")), \n    person(\"Emil\", \"Gustavsson\", , \"e.gustavsson@ucl.ac.uk\", role = c(\"aut\"),\n           comment = c(ORCID = \"0000-0003-0541-7537\")),\n    person(\"Regina\", \"Reynolds\", , \"regina.reynolds.16@ucl.ac.uk\", \n           role = c(\"ctb\"), comment = c(ORCID = \"0000-0001-6470-7919\")), \n    person(\"Sonia\", \"Ruiz\", , \"s.ruiz@ucl.ac.uk\", \n           role = c(\"ctb\"))\n    )\nDescription: The goal of ggtranscript is the simplify the process of visualizing \n    transcript structure and annotation. To achieve this, ggtranscript \n    introduces 5 new geoms (geom_range(), geom_half_range(), geom_intron(), \n    geom_junction() and geom_junction_label_repel()) as well as several helper \n    functions. 
As a 'ggplot2' extension, ggtranscript inherits 'ggplot2's \n    familiarity and flexibility, enabling users to intuitively adjust \n    aesthetics, parameters, scales etc as well as complement ggtranscript geoms\n    with existing 'ggplot2' geoms to create informative, publication-ready \n    plots.\nLicense: MIT + file LICENSE\nURL: https://github.com/dzhang32/ggtranscript\nBugReports: https://github.com/dzhang32/ggtranscript/issues\nEncoding: UTF-8\nRoxygen: list(markdown = TRUE)\nRoxygenNote: 7.3.2\nSuggests: \n    BiocStyle,\n    covr,\n    ggpubr,\n    knitr,\n    rmarkdown,\n    rtracklayer,\n    sessioninfo,\n    testthat (>= 3.0.0),\n    vdiffr\nConfig/testthat/edition: 3\nVignetteBuilder: knitr\nDepends: \n    R (>= 2.10)\nLazyData: true\nImports: \n    dplyr,\n    GenomicRanges,\n    ggplot2,\n    magrittr,\n    rlang,\n    S4Vectors,\n    GenomeInfoDb,\n    ggrepel\nCollate: \n    'add_exon_number.R'\n    'add_utr.R'\n    'data.R'\n    'geom_range.R'\n    'geom_half_range.R'\n    'geom_intron.R'\n    'geom_junction.R'\n    'geom_junction_label_repel.R'\n    'ggtranscript-package.R'\n    'globals.R'\n    'shorten_gaps.R'\n    'to_diff.R'\n    'to_intron.R'\n    'utils.R'\n"
  },
  {
    "path": "LICENSE",
    "content": "YEAR: 2022\nCOPYRIGHT HOLDER: ggtranscript authors\n"
  },
  {
    "path": "LICENSE.md",
    "content": "# MIT License\n\nCopyright (c) 2022 ggtranscript authors\n\nPermission is hereby granted, free of charge, to any person obtaining a copy\nof this software and associated documentation files (the \"Software\"), to deal\nin the Software without restriction, including without limitation the rights\nto use, copy, modify, merge, publish, distribute, sublicense, and/or sell\ncopies of the Software, and to permit persons to whom the Software is\nfurnished to do so, subject to the following conditions:\n\nThe above copyright notice and this permission notice shall be included in all\ncopies or substantial portions of the Software.\n\nTHE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\nIMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\nFITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE\nAUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\nLIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\nOUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE\nSOFTWARE.\n"
  },
  {
    "path": "NAMESPACE",
    "content": "# Generated by roxygen2: do not edit by hand\n\nexport(add_exon_number)\nexport(add_utr)\nexport(geom_half_range)\nexport(geom_intron)\nexport(geom_junction)\nexport(geom_junction_label_repel)\nexport(geom_range)\nexport(shorten_gaps)\nexport(to_diff)\nexport(to_intron)\nimport(ggrepel)\nimportFrom(ggplot2,aes)\nimportFrom(magrittr,\"%>%\")\nimportFrom(rlang,\"%||%\")\n"
  },
  {
    "path": "NEWS.md",
    "content": "\n# ggtranscript 1.0.0\n\n## Fixes\n\n* Updates `geom`s to work with the latest version of `ggplot2` and `ggrepel` (R version `4.4`). \n* Fixes all unit tests.\n* Fixes CI workflow issues and simplifies testing to linux-only.\n\n# ggtranscript 0.99.9\n\n## NEW FEATURES\n\n* Address ggtranscript reviews; update docs with examples of using gtf/bed files, integration with `ggplot2` extensions and add usage of `shorten_gaps()` to README.\n\n# ggtranscript 0.99.8\n\n## NEW FEATURES\n\n* Address CRAN feedback; changing ggplot2 -> 'ggplot2', remove biocViews and contributing, removing Date field in DESCRIPTION.\n\n# ggtranscript 0.99.7\n\n## NEW FEATURES\n\n* Add `cran-comments.md` in preparation for first CRAN submission.\n* Update CI to run `R CMD Check` on latest R version (4.2).\n\n# ggtranscript 0.99.6\n\n## NEW FEATURES\n\n* Add `@return` documentation for `geom_*` functions for `BiocCheck`.\n\n# ggtranscript 0.99.5\n\n## NEW FEATURES\n\n* Change branch to naming from main to master to match BBS.\n\n# ggtranscript 0.99.4\n\n## NEW FEATURES\n\n* Change email to UCL email for Bioconductor submission. \n\n## NEW FEATURES\n\n* Add `add_utr()` for adding UTRs as ranges. This helper function is designed to \nwork with `shorten_gaps()`, enabling shortening of gaps whilst visually \ndifferentiating UTRs from the CDS.\n* Allow `to_intron()` to take CDS and UTRs ranges as input. \n* Submit to Bioconductor.\n\n# ggtranscript 0.99.2\n\n## NEW FEATURES\n\n* Add `geom_junction_label_repel()` for labeling junctions (e.g. with counts).\n* Add `add_exon_number()` for visualizing the exon number/order.\n\n# ggtranscript 0.99.1\n\n## NEW FEATURES\n\n* Implement base geoms: `geom_range()`, `geom_half_range()`, `geom_intron()`, \n`geom_junction()` and helper functions: `to_intron()`, `to_diff()` and \n`shorten_gaps()`.\n"
  },
  {
    "path": "R/add_exon_number.R",
    "content": "#' Add exon number\n#'\n#' `add_exon_number()` adds the exon number (the order the exons are transcribed\n#' within each transcript) as a column in `exons`. This can be useful when\n#' visualizing long, complex transcript structures, in order to keep track of\n#' specific exons of interest.\n#'\n#' To note, a \"strand\" column must be present within `exons`. The strand is used\n#' to differentiate whether exon numbers should be calculated according to\n#' ascending (\"+\") or descending (\"-\") genomic co-ordinates. For ambiguous\n#' strands (\"*\"), `add_exon_number()` will be assume the strand be \"+\".\n#'\n#' @inheritParams to_diff\n#'\n#' @return `data.frame()` equivalent to input `exons`, with the additional\n#'   column \"exon_number\".\n#'\n#' @export\n#' @examples\n#' library(magrittr)\n#' library(ggplot2)\n#'\n#' # to illustrate the package's functionality\n#' # ggtranscript includes example transcript annotation\n#' sod1_annotation %>% head()\n#'\n#' # extract exons\n#' sod1_exons <- sod1_annotation %>% dplyr::filter(type == \"exon\")\n#' sod1_exons %>% head()\n#'\n#' # add the exon number for each transcript\n#' sod1_exons <- sod1_exons %>% add_exon_number(group_var = \"transcript_name\")\n#'\n#' base <- sod1_exons %>%\n#'     ggplot(aes(\n#'         xstart = start,\n#'         xend = end,\n#'         y = transcript_name\n#'     )) +\n#'     geom_range() +\n#'     geom_intron(\n#'         data = to_intron(sod1_exons, \"transcript_name\"),\n#'         strand = \"+\"\n#'     )\n#'\n#' # it can be useful to annotate exons with their exon number\n#' # using ggplot2::geom_text()\n#' base +\n#'     geom_text(aes(\n#'         x = (start + end) / 2, # plot label at midpoint of exon\n#'         label = exon_number\n#'     ),\n#'     size = 3.5,\n#'     nudge_y = 0.4\n#'     )\n#'\n#' # Or alternatively, using ggrepel::geom_label_repel()\n#' # to separate labels from exons\n#' base +\n#'     ggrepel::geom_label_repel(ggplot2::aes(\n#'         x = 
(start + end) / 2,\n#'         label = exon_number\n#'     ),\n#'     size = 3.5,\n#'     min.segment.length = 0\n#'     )\nadd_exon_number <- function(exons, group_var = NULL) {\n    .check_coord_object(exons, check_strand = TRUE)\n    .check_group_var(exons, group_var)\n\n    if (!is.null(group_var)) {\n        exons <- exons %>% dplyr::group_by_at(.vars = group_var)\n    }\n\n    # arrange to make sure order reflects genomic position\n    exons <- exons %>%\n        dplyr::arrange_at(c(.vars = c(group_var, \"start\", \"end\")))\n\n    # add exon number, assuming all plus strand at start\n    exons <- exons %>%\n        dplyr::mutate(\n            exon_number = dplyr::row_number(),\n            n_exons = dplyr::n()\n        ) %>%\n        dplyr::ungroup()\n\n    # convert exon number for minus strand\n    exons <- exons %>%\n        dplyr::mutate(\n            exon_number = ifelse(\n                strand == \"-\",\n                n_exons - exon_number + 1,\n                exon_number\n            )\n        ) %>%\n        dplyr::select(-n_exons)\n\n    return(exons)\n}\n"
  },
  {
    "path": "R/add_utr.R",
    "content": "#' Add untranslated regions (UTRs)\n#'\n#' Given a set of `exons` (encompassing the CDS and UTRs) and `cds` regions,\n#' `add_utr()` will calculate and add the corresponding UTR regions as ranges.\n#' This can be useful when combined with `shorten_gaps()` to visualize\n#' transcripts with long introns, whilst differentiating UTRs from CDS regions.\n#'\n#' The definition of the inputted `cds` regions are expected to range from the\n#' beginning of the start codon to the end of the stop codon. Sometimes, for\n#' example in the case of Ensembl, reference annotation will omit the stop\n#' codons from the CDS definition. In such cases, users should manually ensure\n#' that the `cds` includes both the start and stop codons.\n#'\n#' @inheritParams to_diff\n#' @param cds `data.frame()` contains coding sequence ranges for the transcripts\n#'   in `exons`.\n#'\n#' @return `data.frame()` contains differentiated CDS and UTR ranges.\n#'\n#' @export\n#' @examples\n#'\n#' library(magrittr)\n#' library(ggplot2)\n#'\n#' # to illustrate the package's functionality\n#' # ggtranscript includes example transcript annotation\n#' pknox1_annotation %>% head()\n#'\n#' # extract exons\n#' pknox1_exons <- pknox1_annotation %>% dplyr::filter(type == \"exon\")\n#' pknox1_exons %>% head()\n#'\n#' # extract cds\n#' pknox1_cds <- pknox1_annotation %>% dplyr::filter(type == \"CDS\")\n#' pknox1_cds %>% head()\n#'\n#' # the CDS definition originating from the Ensembl reference annotation\n#' # does not include the stop codon\n#' # we must incorporate the stop codons into the CDS manually\n#' # by adding 3 base pairs to the end of the CDS of each transcript\n#' pknox1_cds_w_stop <- pknox1_cds %>%\n#'     dplyr::group_by(transcript_name) %>%\n#'     dplyr::mutate(\n#'         end = ifelse(end == max(end), end + 3, end)\n#'     ) %>%\n#'     dplyr::ungroup()\n#'\n#' # add_utr() adds ranges that represent the UTRs\n#' pknox1_cds_utr <- add_utr(\n#'     pknox1_exons,\n#'     
pknox1_cds_w_stop,\n#'     group_var = \"transcript_name\"\n#' )\n#'\n#' pknox1_cds_utr %>% head()\n#'\n#' # this can be useful when combined with shorten_gaps()\n#' # to visualize transcripts with long introns whilst differentiating UTRs\n#' pknox1_cds_utr_rescaled <-\n#'     shorten_gaps(\n#'         exons = pknox1_cds_utr,\n#'         introns = to_intron(pknox1_cds_utr, \"transcript_name\"),\n#'         group_var = \"transcript_name\"\n#'     )\n#'\n#' pknox1_cds_utr_rescaled %>%\n#'     dplyr::filter(type == \"CDS\") %>%\n#'     ggplot(aes(\n#'         xstart = start,\n#'         xend = end,\n#'         y = transcript_name\n#'     )) +\n#'     geom_range() +\n#'     geom_range(\n#'         data = pknox1_cds_utr_rescaled %>% dplyr::filter(type == \"UTR\"),\n#'         height = 0.25,\n#'         fill = \"white\"\n#'     ) +\n#'     geom_intron(\n#'         data = to_intron(\n#'             pknox1_cds_utr_rescaled %>% dplyr::filter(type != \"intron\"),\n#'             \"transcript_name\"\n#'         ),\n#'         arrow.min.intron.length = 110\n#'     )\nadd_utr <- function(exons,\n                    cds,\n                    group_var = NULL) {\n\n    # input checks\n    .check_coord_object(exons, check_seqnames = TRUE)\n    .check_group_var(exons, group_var)\n    .check_coord_object(cds, check_seqnames = TRUE)\n    .check_group_var(cds, group_var)\n\n    # we have to create dummy group for downstream for loop if there is no group\n    null_group <- is.null(group_var)\n    if (null_group) {\n        exons <- exons %>% dplyr::mutate(dummy_group = \"A\")\n        cds <- cds %>% dplyr::mutate(dummy_group = \"A\")\n        group_var <- \"dummy_group\"\n    }\n\n    groups <- cds[[group_var]] %>% unique()\n\n    # convert to GenomicRanges for downstream processing\n    exons_gr <- exons %>% GenomicRanges::GRanges()\n    cds_gr <- cds %>% GenomicRanges::GRanges()\n\n    exons_w_utr <- vector(\"list\", length = length(groups))\n\n    for (i in seq_along(groups)) {\n    
    exons_gr_curr <- exons_gr %>%\n            .[GenomicRanges::mcols(exons_gr)[[group_var]] == groups[i]]\n\n        cds_gr_curr <- cds_gr %>%\n            .[GenomicRanges::mcols(cds_gr)[[group_var]] == groups[i]]\n\n        # use setdiff to get regions in exon but not in cds (i.e. the utrs)\n        utrs_curr <- GenomicRanges::setdiff(exons_gr_curr, cds_gr_curr)\n        GenomicRanges::mcols(utrs_curr)[[group_var]] <- groups[i]\n\n        utrs_curr$type <- \"UTR\"\n        cds_gr_curr$type <- \"CDS\"\n\n        exons_w_utr[[i]] <- c(utrs_curr, cds_gr_curr) %>% sort()\n    }\n\n    exons_w_utr <- exons_w_utr %>%\n        do.call(c, .) %>%\n        as.data.frame() %>%\n        dplyr::as_tibble()\n\n    # remove dummy_group if created\n    if (null_group) {\n        exons_w_utr <- exons_w_utr %>% dplyr::select(-dummy_group)\n    }\n\n    return(exons_w_utr)\n}\n"
  },
  {
    "path": "R/data.R",
    "content": "#' Example transcript annotation\n#'\n#' Transcript annotation including the co-ordinates (hg38) of the genes,\n#' transcripts, exons and CDS regions for \\emph{SOD1} and \\emph{PKNOX1}, which\n#' originate from version 105 of the Ensembl reference annotation.\n#'\n#' @format A `tibble::tibble()`:\n#' \\describe{\n#'   \\item{seqnames}{`factor()` chromosome.}\n#'   \\item{start}{`integer()` start position.}\n#'   \\item{end}{`integer()` end position.}\n#'   \\item{strand}{`factor()` strand.}\n#'   \\item{type}{`factor()` E.g.gene, transcript, exon or CDS.}\n#'   \\item{gene_name}{`character()` name of gene (GBA).}\n#'   \\item{transcript_name}{`character()` name of transcript.}\n#'   \\item{transcript_biotype}{`character()` biotype of transcript.}\n#' }\n#'\n#' @source generated using `ggtranscript/data-raw/sod1_pknox1_annotation.R`\n\"sod1_annotation\"\n\n#' @rdname sod1_annotation\n\"pknox1_annotation\"\n\n#' Example junctions\n#'\n#' Junction co-ordinates and counts associated with the \\emph{SOD1} gene.\n#' Junctions counts originate from GTEx liver samples and are downloaded via the\n#' Bioconductor package `snapcount`. Only unannotated junctions with a mean\n#' count above 0.3 have been retained for this example.\n#'\n#' @format A `tibble::tibble()`:\n#' \\describe{\n#'   \\item{seqnames}{`factor()` chromosome.}\n#'   \\item{start}{`integer()` start position.}\n#'   \\item{end}{`integer()` end position.}\n#'   \\item{strand}{`factor()` strand.}\n#'   \\item{mean_count}{`factor()` Average count across all GTEx liver samples.}\n#' }\n#'\n#' @source generated using `ggtranscript/data-raw/sod1_junctions.R`\n\"sod1_junctions\"\n"
  },
  {
    "path": "R/geom_half_range.R",
    "content": "#' @param range.orientation `character()` one of \"top\" or \"bottom\", specifying\n#'   where the half ranges will be plotted with respect to each transcript\n#'   (`y`).\n#'\n#' @export\n#' @rdname geom_range\ngeom_half_range <- function(mapping = NULL, data = NULL,\n                            stat = \"identity\", position = \"identity\",\n                            ...,\n                            range.orientation = \"bottom\",\n                            linejoin = \"mitre\",\n                            na.rm = FALSE,\n                            show.legend = NA,\n                            inherit.aes = TRUE) {\n    ggplot2::layer(\n        data = data,\n        mapping = mapping,\n        stat = stat,\n        geom = GeomHalfRange,\n        position = position,\n        show.legend = show.legend,\n        inherit.aes = inherit.aes,\n        params = list(\n            range.orientation = range.orientation,\n            linejoin = linejoin,\n            na.rm = na.rm,\n            ...\n        )\n    )\n}\n\n#' `GeomHalfRange` is `GeomRange` with default parameters for `vjust` and\n#' `height` as well as the added parameter `range.orientation`\n#'\n#' @include geom_range.R\n#' @keywords internal\n#' @noRd\nGeomHalfRange <- ggplot2::ggproto(\"GeomHalfRange\", GeomRange,\n    setup_data = function(data, params) {\n        # check that range.orientation is one of possible options\n        .check_range.orientation(params)\n\n        # modified from ggplot2::GeomTile\n        data$height <- data$height %||% params$height %||% 0.25\n\n        transform(\n            data,\n            xmin = xstart,\n            xmax = xend,\n            ymin = y - height / 2,\n            ymax = y + height / 2,\n            height = NULL\n        )\n    },\n    draw_panel = function(data,\n                          panel_params,\n                          coord,\n                          range.orientation = \"bottom\",\n                          lineend = 
\"butt\",\n                          linejoin = \"mitre\") {\n        vjust <- ifelse(\n            range.orientation == \"bottom\",\n            1.5,\n            0.5\n        )\n\n        GeomRange$draw_panel(\n            data = data,\n            panel_params = panel_params,\n            coord = coord,\n            vjust = vjust,\n            lineend = lineend,\n            linejoin = linejoin\n        )\n    }\n)\n\n#' @keywords internal\n#' @noRd\n.check_range.orientation <- function(params) {\n    not_orient_option <-\n        !(params$range.orientation %in% c(\"top\", \"bottom\"))\n\n    if (not_orient_option) {\n        stop(\n            \"range.orientation must be one of \",\n            \"'alternating', 'top' or 'bottom'\"\n        )\n    }\n}\n"
  },
  {
    "path": "R/geom_intron.R",
    "content": "#' Plot intron lines with strand arrows\n#'\n#' `geom_intron()` draws horizontal lines with central arrows that are designed\n#' to represent introns. In combination with `geom_range()`/`geom_half_range()`,\n#' these geoms form the core components for visualizing transcript structures.\n#'\n#' `geom_intron()` requires the following `aes()`; `xstart`, `xend` and `y`\n#' (e.g. transcript name). If users do not have intron co-ordinates, these can\n#' be generated from the corresponding exons using `to_intron()`. The `strand`\n#' option (one of \"+\" or \"-\") adjusts the arrow direction to match the direction\n#' of transcription. The `arrow.min.intron.length` parameter can be useful to\n#' remove strand arrows that overlap exons, which can be a problem if plotted\n#' introns include those that are relatively short.\n#'\n#' @inheritParams ggplot2::layer\n#' @inheritParams ggplot2::geom_point\n#' @inheritParams ggplot2::geom_segment\n#' @param arrow.min.intron.length `integer()` the minimum required width of an\n#'   intron for a strand arrow to be drawn. This can be useful to remove strand\n#'   arrows on short introns that overlap adjacent exons.\n#'\n#' @return the return value of a `geom_*` function is not intended to be\n#'   directly handled by users. 
Therefore, `geom_*` functions should never be\n#'   executed in isolation, rather used in combination with a\n#'   `ggplot2::ggplot()` call.\n#'\n#' @export\n#' @examples\n#'\n#' library(magrittr)\n#' library(ggplot2)\n#'\n#' # to illustrate the package's functionality\n#' # ggtranscript includes example transcript annotation\n#' pknox1_annotation %>% head()\n#'\n#' # extract exons\n#' pknox1_exons <- pknox1_annotation %>% dplyr::filter(type == \"exon\")\n#' pknox1_exons %>% head()\n#'\n#' # to_intron() is a helper function included in ggtranscript\n#' # which is useful for converting exon co-ordinates to introns\n#' pknox1_introns <- pknox1_exons %>% to_intron(group_var = \"transcript_name\")\n#' pknox1_introns %>% head()\n#'\n#' base <- pknox1_introns %>%\n#'     ggplot(aes(\n#'         xstart = start,\n#'         xend = end,\n#'         y = transcript_name\n#'     ))\n#'\n#' # by default, geom_intron() assumes introns originate from the \"+\" strand\n#' base + geom_intron()\n#'\n#' # however this can be modified using the strand option\n#' base + geom_intron(strand = \"-\")\n#'\n#' # strand can also be set as an aes()\n#' base + geom_intron(aes(strand = strand))\n#'\n#' # as a ggplot2 extension, ggtranscript geoms inherit the\n#' # the functionality from the parameters and aesthetics in ggplot2\n#' base + geom_intron(\n#'     aes(colour = transcript_name),\n#'     linewidth = 1\n#' )\n#'\n#' # together, geom_range() and geom_intron() are designed to visualize\n#' # the core components of transcript annotation\n#' pknox1_exons %>%\n#'     ggplot(aes(\n#'         xstart = start,\n#'         xend = end,\n#'         y = transcript_name\n#'     )) +\n#'     geom_range() +\n#'     geom_intron(\n#'         data = pknox1_introns\n#'     )\n#'\n#' # for short introns, sometimes strand arrows will overlap exons\n#' # to avoid this, users can set the arrow.min.intron.length parameter\n#' pknox1_exons %>%\n#'     ggplot(aes(\n#'         xstart = start,\n#'         xend = 
end,\n#'         y = transcript_name\n#'     )) +\n#'     geom_range() +\n#'     geom_intron(\n#'         data = pknox1_introns,\n#'         arrow.min.intron.length = 3500\n#'     )\ngeom_intron <- function(mapping = NULL, data = NULL,\n                        stat = \"identity\", position = \"identity\",\n                        ...,\n                        arrow = grid::arrow(ends = \"last\", length = grid::unit(0.1, \"inches\")),\n                        arrow.fill = NULL,\n                        lineend = \"butt\",\n                        linejoin = \"round\",\n                        na.rm = FALSE,\n                        arrow.min.intron.length = 0,\n                        show.legend = NA,\n                        inherit.aes = TRUE) {\n    ggplot2::layer(\n        data = data,\n        mapping = mapping,\n        stat = stat,\n        geom = GeomIntron,\n        position = position,\n        show.legend = show.legend,\n        inherit.aes = inherit.aes,\n        params = list(\n            arrow = arrow,\n            arrow.fill = arrow.fill,\n            lineend = lineend,\n            linejoin = linejoin,\n            na.rm = na.rm,\n            arrow.min.intron.length = arrow.min.intron.length,\n            ...\n        )\n    )\n}\n\n#' `GeomIntron` is pretty much `ggplot2::GeomSegment` with the `required_aes`\n#' changed to `xstart`/`xend` to match genetic nomenclature and the added arrows\n#' to indicate direction of transcription (configured with `strand` and\n#' `arrow.min.intron.length`)\n#' @noRd\nGeomIntron <- ggplot2::ggproto(\"GeomIntron\", ggplot2::GeomSegment,\n    required_aes = c(\"xstart\", \"xend\", \"y\"),\n    default_aes = aes(\n        colour = \"black\",\n        linewidth = 0.5,\n        linetype = 1,\n        alpha = NA,\n        strand = \"+\"\n    ),\n    setup_params = function(data, params) {\n        # check that arrow.min.intron.length numeric is >= 0\n        arrow.min_numeric <- 
is.numeric(params$arrow.min.intron.length)\n        arrow.min_neg <- params$arrow.min.intron.length < 0\n\n        if (!arrow.min_numeric | arrow.min_neg) {\n            stop(\"arrow.min.intron.length must be a numeric >= 0\")\n        }\n\n        params\n    },\n    setup_data = function(data, params) {\n        # needed to permit usage of xstart/xend\n        transform(\n            data,\n            x = xstart,\n            yend = y,\n            xstart = NULL\n        )\n    },\n    draw_panel = function(data,\n                          panel_params,\n                          coord,\n                          arrow = NULL,\n                          arrow.fill = NULL,\n                          lineend = \"butt\",\n                          linejoin = \"round\",\n                          na.rm = FALSE,\n                          arrow.min.intron.length = 0) {\n\n        # check that strand has no NAs and is one of \"+\" or \"-\"\n        .check_strand(data$strand)\n\n        # first, create the intron grob, which is just a pure line (no arrow)\n        intron_grob <- ggplot2::GeomSegment$draw_panel(\n            data = data,\n            panel_params = panel_params,\n            coord = coord,\n            arrow = NULL,\n            arrow.fill = NULL,\n            lineend = lineend,\n            linejoin = linejoin,\n            na.rm = na.rm\n        )\n\n        # then, create the arrow grobs, one per strand\n        # need both as the direction of arrow (as far as I can tell)\n        # is dependent on the orientation of the x/xend\n        strand_arrow_plus_grob <- .create_strand_arrow_grob(\n            target_strand = \"+\",\n            arrow.min.intron.length = arrow.min.intron.length,\n            data = data,\n            panel_params = panel_params,\n            coord = coord,\n            arrow = arrow,\n            arrow.fill = arrow.fill,\n            lineend = lineend,\n            linejoin = linejoin,\n            na.rm = na.rm\n        )\n\n     
   strand_arrow_minus_grob <- .create_strand_arrow_grob(\n            target_strand = \"-\",\n            arrow.min.intron.length = arrow.min.intron.length,\n            data = data,\n            panel_params = panel_params,\n            coord = coord,\n            arrow = arrow,\n            arrow.fill = arrow.fill,\n            lineend = lineend,\n            linejoin = linejoin,\n            na.rm = na.rm\n        )\n\n        # draw_panel expects return of a grob\n        # here, as we build multiple grobs (i.e. intron lines + arrows)\n        # we use a grobTree to combine the two\n        grid::grobTree(\n            intron_grob,\n            strand_arrow_plus_grob,\n            strand_arrow_minus_grob\n        )\n    }\n)\n\n#' @keywords internal\n#' @noRd\n.check_strand <- function(strand) {\n    # TODO - add option for \"*\" arrow?\n    any_na <- any(is.na(strand))\n    plus_minus <- !(all(strand %in% c(\"+\", \"-\")))\n\n    if (any_na | plus_minus) {\n        stop(\"strand values must be one of '+' and '-'\")\n    }\n\n    return(invisible())\n}\n\n#' @keywords internal\n#' @noRd\n.create_strand_arrow_grob <- function(target_strand,\n                                      arrow.min.intron.length,\n                                      data,\n                                      panel_params,\n                                      coord,\n                                      arrow,\n                                      arrow.fill,\n                                      lineend,\n                                      linejoin,\n                                      na.rm) {\n\n    # filter for introns that match target strand\n    # and have a length above arrow.min.intron.length\n    match_strand <- data$strand == target_strand\n    ab_min <- abs(data$x - data$xend) > arrow.min.intron.length\n    arrow_data <- data[match_strand & ab_min, ]\n\n    # if there are no arrows to plot, use a nullGrob() to add nothing\n    if (nrow(arrow_data) == 0) {\n        
arrow_grob <- grid::nullGrob()\n    } else {\n\n        # obtain the correct orientation of arrow (dependent on strand)\n        # as the arrow can only be placed at either end of a geom_segment/path\n        # the minus strand swaps x and xend, which reverses the arrow direction\n        if (target_strand == \"+\") {\n            arrow_data <- transform(\n                arrow_data,\n                xend = (x + xend) / 2\n            )\n        } else {\n            arrow_data <- transform(\n                arrow_data,\n                mid = (x + xend) / 2,\n                x = xend\n            )\n            arrow_data <- transform(\n                arrow_data,\n                xend = mid\n            )\n        }\n\n        arrow_grob <- ggplot2::GeomSegment$draw_panel(\n            data = arrow_data,\n            panel_params = panel_params,\n            coord = coord,\n            arrow = arrow,\n            arrow.fill = arrow.fill,\n            lineend = lineend,\n            linejoin = linejoin,\n            na.rm = na.rm\n        )\n    }\n\n    return(arrow_grob)\n}\n"
  },
  {
    "path": "R/geom_junction.R",
    "content": "#' Plot junction curves\n#'\n#' `geom_junction()` draws curves that are designed to represent junction reads\n#' from RNA-sequencing data. It can be useful to overlay junction data on\n#' transcript annotation (plotted using `geom_range()`/`geom_half_range()` and\n#' `geom_intron()`) to understand which splicing events or transcripts have\n#' support from RNA-sequencing data.\n#'\n#' `geom_junction()` requires the following `aes()`; `xstart`, `xend` and `y`\n#' (e.g. transcript name). `geom_junction()` curves can be modified using\n#' `junction.y.max`, which can be useful when junctions overlap one\n#' another/other transcripts or extend beyond the plot margins. By default,\n#' junction curves will alternate between being plotted on the top and bottom of\n#' each transcript (`y`), however this can be modified via\n#' `junction.orientation`.\n#'\n#' @inheritParams ggplot2::layer\n#' @inheritParams ggplot2::geom_bar\n#' @inheritParams grid::curveGrob\n#' @param junction.orientation `character()` one of \"alternating\", \"top\" or\n#'   \"bottom\", specifying where the junctions will be plotted with respect to\n#'   each transcript (`y`).\n#' @param junction.y.max `double()` the max y-value of each junction curve. It\n#'   can be useful to adjust this parameter when junction curves overlap with\n#'   one another/other transcripts or extend beyond the plot margins.\n#'\n#' @return the return value of a `geom_*` function is not intended to be\n#'   directly handled by users. 
Therefore, `geom_*` functions should never be\n#'   executed in isolation, rather used in combination with a\n#'   `ggplot2::ggplot()` call.\n#'\n#' @export\n#' @examples\n#'\n#' library(magrittr)\n#' library(ggplot2)\n#'\n#' # to illustrate the package's functionality\n#' # ggtranscript includes example transcript annotation\n#' sod1_annotation %>% head()\n#'\n#' # as well as a set of example (unannotated) junctions\n#' # originating from GTEx and downloaded via the Bioconductor package snapcount\n#' sod1_junctions\n#'\n#' # extract exons\n#' sod1_exons <- sod1_annotation %>% dplyr::filter(\n#'     type == \"exon\",\n#'     transcript_name == \"SOD1-201\"\n#' )\n#' sod1_exons %>% head()\n#'\n#' # add transcript_name to junctions for plotting\n#' sod1_junctions <- sod1_junctions %>%\n#'     dplyr::mutate(transcript_name = \"SOD1-201\")\n#'\n#' # junctions can be plotted as curves using geom_junction()\n#' base <- sod1_junctions %>%\n#'     ggplot2::ggplot(ggplot2::aes(\n#'         xstart = start,\n#'         xend = end,\n#'         y = transcript_name\n#'     ))\n#'\n#' # sometimes, depending on the number and widths of transcripts and junctions\n#' # junctions will overlap one another or extend beyond the plot margin\n#' base + geom_junction()\n#'\n#' # in such cases, junction.y.max can be adjusted to modify the max y of curves\n#' base + geom_junction(junction.y.max = 0.5)\n#'\n#' # ncp can be used to improve the smoothness of curves\n#' base + geom_junction(junction.y.max = 0.5, ncp = 30)\n#'\n#' # junction.orientation controls where the junctions are plotted\n#' # with respect to each transcript\n#' # either alternating (default), or on the top or bottom\n#' base + geom_junction(junction.orientation = \"top\", junction.y.max = 0.5)\n#' base + geom_junction(junction.orientation = \"bottom\", junction.y.max = 0.5)\n#'\n#' # it can be useful to overlay junction curves onto existing annotation\n#' # plotted using geom_range() and geom_intron()\n#' base <- 
sod1_exons %>%\n#'     ggplot(aes(\n#'         xstart = start,\n#'         xend = end,\n#'         y = transcript_name\n#'     )) +\n#'     geom_range() +\n#'     geom_intron(\n#'         data = to_intron(sod1_exons, \"transcript_name\")\n#'     )\n#'\n#' base + geom_junction(\n#'     data = sod1_junctions,\n#'     junction.y.max = 0.5\n#' )\n#'\n#' # as a ggplot2 extension, ggtranscript geoms inherit the\n#' # the functionality from the parameters and aesthetics in ggplot2\n#' # this can be useful when mapping junction thickness to their counts\n#' base + geom_junction(\n#'     data = sod1_junctions,\n#'     aes(linewidth = mean_count),\n#'     junction.y.max = 0.5,\n#'     colour = \"purple\"\n#' ) +\n#'     scale_linewidth(range = c(0.1, 1))\n#'\n#' # it can be useful to combine geom_junction() with geom_half_range()\n#' sod1_exons %>%\n#'     ggplot(aes(\n#'         xstart = start,\n#'         xend = end,\n#'         y = transcript_name\n#'     )) +\n#'     geom_half_range() +\n#'     geom_intron(\n#'         data = to_intron(sod1_exons, \"transcript_name\")\n#'     ) +\n#'     geom_junction(\n#'         data = sod1_junctions,\n#'         aes(linewidth = mean_count),\n#'         junction.y.max = 0.5,\n#'         junction.orientation = \"top\",\n#'         colour = \"purple\"\n#'     ) +\n#'     scale_linewidth(range = c(0.1, 1))\ngeom_junction <- function(mapping = NULL,\n                          data = NULL,\n                          stat = \"identity\",\n                          position = \"identity\",\n                          junction.orientation = \"alternating\",\n                          junction.y.max = 1,\n                          angle = 90,\n                          ncp = 15,\n                          na.rm = FALSE,\n                          orientation = NA,\n                          show.legend = NA,\n                          inherit.aes = TRUE,\n                          ...) 
{\n    ggplot2::layer(\n        data = data,\n        mapping = mapping,\n        stat = stat,\n        geom = GeomJunction,\n        position = position,\n        show.legend = show.legend,\n        inherit.aes = inherit.aes,\n        params = list(\n            junction.orientation = junction.orientation,\n            junction.y.max = junction.y.max,\n            angle = angle,\n            ncp = ncp,\n            na.rm = na.rm,\n            orientation = orientation,\n            ...\n        )\n    )\n}\n\n#' @keywords internal\n#' @noRd\nGeomJunction <- ggplot2::ggproto(\"GeomJunction\", ggplot2::GeomLine,\n    required_aes = c(\"xstart\", \"xend\", \"y\"),\n    setup_data = function(data, params) {\n        # check that junction.orientation is one of the possible options\n        .check_junction.orientation(params)\n        # check that junction.y.max is length 1 and numeric\n        .check_junction.y.max(params)\n\n        # we need a unique group id per junction, rather than per transcript\n        # similar to spring example from ggplot2 book\n        # https://ggplot2-book.org/spring1.html#spring3\n        if (is.null(data$group)) {\n            data$group <- seq_len(nrow(data))\n        }\n        if (anyDuplicated(data$group)) {\n            data$group <- paste(data$group, seq_len(nrow(data)), sep = \"-\")\n        }\n\n        # needed to permit usage of xstart/xend\n        transform(\n            data,\n            x = xstart,\n            xstart = NULL\n        )\n    },\n    draw_panel = function(data,\n                          panel_params,\n                          coord,\n                          junction.orientation = \"alternating\",\n                          junction.y.max = 1,\n                          angle = 90,\n                          ncp = 15) {\n        # junction_index represents the order of each junction within tx\n        # needed for junction.orientation = \"alternating\"\n        data <- data %>%\n     
       dplyr::group_by(y) %>%\n            dplyr::mutate(junction_index = dplyr::row_number()) %>%\n            dplyr::ungroup()\n\n        # obtain the actual curves using grid:::calcControlPoints\n        junctions <- .get_junction_curves(data, angle, ncp)\n\n        # normalise curve points to lie between 0-1\n        # scale to fit depending on N txs, width of junctions\n        junctions <- .get_normalised_curve(\n            junctions,\n            junction.orientation,\n            junction.y.max\n        )\n\n        ggplot2::GeomLine$draw_panel(junctions, panel_params, coord)\n    }\n)\n\n#' @keywords internal\n#' @noRd\n.get_junction_curves <- function(data, angle, ncp) {\n\n    #  very similar to springs example\n    # create the junction points, whilst preserving aes\n    # https://ggplot2-book.org/spring1.html#spring3\n    # TODO - implementation could probably be vectorised for speed\n    cols_to_keep <- setdiff(names(data), c(\"x\", \"xend\", \"y\"))\n    junctions <- lapply(seq_len(nrow(data)), function(i) {\n        junction_curve <- .get_junction_curve(\n            data$x[i], data$xend[i], data$y[i],\n            angle, ncp\n        )\n        cbind(junction_curve, unclass(data[i, cols_to_keep]))\n    })\n\n    junctions <- do.call(rbind, junctions)\n\n    return(junctions)\n}\n\n\n#' @keywords internal\n#' @noRd\n.get_junction_curve <- function(x, xend, y, angle, ncp) {\n    # creates the points for each curve\n    curve_points <- calcControlPoints(\n        x1 = x, x2 = xend,\n        y1 = y, y2 = y,\n        angle = angle,\n        curvature = -0.5,\n        ncp = ncp\n    )\n\n    # need to re-add the original points as these not included\n    # by grid:::calcControlPoints\n    # makes sure junctions curves meet the intron lines\n    junction_curve <- data.frame(\n        x_points = c(x, curve_points$x, xend),\n        y_points = c(y, curve_points$y, y),\n        y_original = y\n    ) %>%\n        dplyr::rename(\n            x = x_points,\n   
         y = y_points\n        )\n\n    return(junction_curve)\n}\n\n#' @keywords internal\n#' @noRd\n.get_normalised_curve <- function(junctions,\n                                  junction.orientation,\n                                  junction.y.max) {\n\n    # junction.y.max is equivalent to the max y of each junction curve\n    # each tx internally uses an integer y\n    # scaling factor (sf) is used to normalise the junction curve points\n    sf <- 1 / junction.y.max\n\n    # each curve point is normalised with relation to the original tx y\n    # first divided by the max(y), meaning all y values lie between 0-1\n    # then divided by the sf, setting the max y\n    if (junction.orientation == \"top\") {\n        junctions <- junctions %>% dplyr::mutate(\n            y = ifelse(y == y_original, y, y_original + (y / max(y)) / sf)\n        )\n    } else if (junction.orientation == \"bottom\") {\n        junctions <- junctions %>% dplyr::mutate(\n            y = ifelse(y == y_original, y, y_original - (y / max(y)) / sf)\n        )\n    } else if (junction.orientation == \"alternating\") {\n        junctions <- junctions %>% dplyr::mutate(y = dplyr::case_when(\n            y == y_original ~ y,\n            junction_index %% 2 == 0 ~ y_original - (y / max(y) / sf),\n            junction_index %% 2 == 1 ~ y_original + (y / max(y) / sf)\n        ))\n    }\n\n    return(junctions)\n}\n\n#' @keywords internal\n#' @noRd\n.check_junction.orientation <- function(params) {\n    not_orient_option <-\n        !(params$junction.orientation %in% c(\"alternating\", \"top\", \"bottom\"))\n\n    if (not_orient_option) {\n        stop(\n            \"junction.orientation must be one of \",\n            \"'alternating', 'top' or 'bottom'\"\n        )\n    }\n}\n\n#' @keywords internal\n#' @noRd\n.check_junction.y.max <- function(params) {\n    if (length(params$junction.y.max) != 1) {\n        stop(\n            \"junction.y.max must have a length of 1\"\n        )\n    }\n    if 
(!is.numeric(params$junction.y.max)) {\n        stop(\n            \"junction.y.max must be a numeric value (integer/double)\"\n        )\n    }\n}\n\ncalcControlPoints <- grid:::calcControlPoints\n"
  },
  {
    "path": "R/geom_junction_label_repel.R",
    "content": "#' Label junction curves\n#'\n#' `geom_junction_label_repel()` labels junction curves at their midpoint using\n#' `ggrepel::geom_label_repel()`. This can be useful to label and compare\n#' junctions (plotted using `geom_junction()`) with metrics of their usage (e.g.\n#' read counts or percent-spliced-in).\n#'\n#' `geom_junction_label_repel()` requires the following `aes()`; `xstart`,\n#' `xend`, `y` (e.g. transcript name) and `label`. Under the hood,\n#' `geom_junction_label_repel()` generates the same junction curves as\n#' `geom_junction()` to obtain curve midpoints for labeling. Therefore, it is\n#' important that users use the same input data and parameters that alter\n#' junction curves (namely `junction.orientation`, `junction.y.max`, `angle`,\n#' `ncp`) for `geom_junction_label_repel()` that they have used for\n#' `geom_junction()`.\n#'\n#' @inheritParams ggrepel::geom_text_repel\n#' @inheritParams grid::curveGrob\n#' @inheritParams geom_junction\n#'\n#' @return the return value of a `geom_*` function is not intended to be\n#'   directly handled by users. 
Therefore, `geom_*` functions should never be\n#'   executed in isolation, rather used in combination with a\n#'   `ggplot2::ggplot()` call.\n#'\n#' @export\n#' @examples\n#' library(magrittr)\n#' library(ggplot2)\n#'\n#' # to illustrate the package's functionality\n#' # ggtranscript includes example transcript annotation\n#' sod1_annotation %>% head()\n#'\n#' # as well as a set of example (unannotated) junctions\n#' # originating from GTEx and downloaded via the Bioconductor package snapcount\n#' sod1_junctions\n#'\n#' # extract exons\n#' sod1_exons <- sod1_annotation %>% dplyr::filter(\n#'     type == \"exon\",\n#'     transcript_name == \"SOD1-201\"\n#' )\n#' sod1_exons %>% head()\n#'\n#' # add transcript_name to junctions for plotting\n#' sod1_junctions <- sod1_junctions %>%\n#'     dplyr::mutate(transcript_name = \"SOD1-201\")\n#'\n#' # geom_junction_label_repel() can be used to label junctions\n#' base <- sod1_exons %>%\n#'     ggplot(aes(\n#'         xstart = start,\n#'         xend = end,\n#'         y = transcript_name\n#'     )) +\n#'     geom_range() +\n#'     geom_intron(\n#'         data = to_intron(sod1_exons, \"transcript_name\")\n#'     )\n#'\n#' # this can be useful to label junctions with their counts\n#' base +\n#'     geom_junction(\n#'         data = sod1_junctions,\n#'         junction.y.max = 0.5\n#'     ) +\n#'     geom_junction_label_repel(\n#'         data = sod1_junctions,\n#'         aes(label = round(mean_count, 2)),\n#'         junction.y.max = 0.5\n#'     )\ngeom_junction_label_repel <- function(mapping = NULL,\n                                      data = NULL,\n                                      stat = \"identity\",\n                                      position = \"identity\",\n                                      parse = FALSE,\n                                      ...,\n                                      junction.orientation = \"alternating\",\n                                      junction.y.max = 1,\n                   
                   angle = 90,\n                                      ncp = 15,\n                                      box.padding = 0.25,\n                                      label.padding = 0.25,\n                                      point.padding = 1e-6,\n                                      label.r = 0.15,\n                                      label.size = 0.25,\n                                      min.segment.length = 0,\n                                      arrow = NULL,\n                                      force = 1,\n                                      force_pull = 1,\n                                      max.time = 0.5,\n                                      max.iter = 10000,\n                                      max.overlaps = getOption(\"ggrepel.max.overlaps\", default = 10),\n                                      nudge_x = 0,\n                                      nudge_y = 0,\n                                      xlim = c(NA, NA),\n                                      ylim = c(NA, NA),\n                                      na.rm = FALSE,\n                                      show.legend = NA,\n                                      direction = c(\"both\", \"y\", \"x\"),\n                                      seed = NA,\n                                      verbose = FALSE,\n                                      inherit.aes = TRUE) {\n    if (!missing(nudge_x) || !missing(nudge_y)) {\n        if (!missing(position)) {\n            stop(\"Specify either `position` or `nudge_x`/`nudge_y`\", call. 
= FALSE)\n        }\n        position <- position_nudge_repel(nudge_x, nudge_y)\n    }\n    ggplot2::layer(\n        data = data,\n        mapping = mapping,\n        stat = stat,\n        geom = GeomJunctionLabelRepel,\n        position = position,\n        show.legend = show.legend,\n        inherit.aes = inherit.aes,\n        params = list(\n            parse = parse,\n            junction.orientation = junction.orientation,\n            junction.y.max = junction.y.max,\n            angle = angle,\n            ncp = ncp,\n            box.padding  = to_unit(box.padding),\n            label.padding = to_unit(label.padding),\n            point.padding  = to_unit(point.padding),\n            label.r = to_unit(label.r),\n            label.size = label.size,\n            min.segment.length = to_unit(min.segment.length),\n            arrow = arrow,\n            na.rm = na.rm,\n            force = force,\n            force_pull = force_pull,\n            max.time = max.time,\n            max.iter = max.iter,\n            max.overlaps = max.overlaps,\n            nudge_x = nudge_x,\n            nudge_y = nudge_y,\n            xlim = xlim,\n            ylim = ylim,\n            direction = match.arg(direction),\n            seed = seed,\n            verbose = verbose,\n            ...\n        )\n    )\n}\n\n#' @include geom_junction.R\n#' @keywords internal\n#' @noRd\nGeomJunctionLabelRepel <- ggplot2::ggproto(\n    \"GeomJunctionLabelRepel\", ggrepel::GeomLabelRepel,\n    required_aes = c(\"xstart\", \"xend\", \"y\", \"label\"),\n    # copied from ggrepel::GeomLabelRepel with segment.colour and segment.alpha\n    # defaults set to appropriate values, rather than NULL\n    # this avoids warnings e.g. Unknown or uninitialised column: `segment.alpha`\n    # but does cause issues when setting e.g. 
aes(colour = tx)\n    # TODO - resolve either warning or make segment.colour borrow colour aes\n    default_aes = aes(\n        colour = \"black\",\n        fill = \"white\",\n        size = 3.88,\n        angle = 0,\n        alpha = NA,\n        family = \"\",\n        fontface = 1,\n        lineheight = 1.2,\n        hjust = 0.5,\n        vjust = 0.5,\n        point.size = 1,\n        segment.linetype = 1,\n        segment.colour = \"black\",\n        segment.size = 0.5,\n        segment.alpha = NA,\n        segment.curvature = 0,\n        segment.angle = 90,\n        segment.ncp = 1,\n        segment.shape = 0.5,\n        segment.square = TRUE,\n        segment.squareShape = 1,\n        segment.inflect = FALSE,\n        segment.debug = FALSE\n    ),\n    setup_data = GeomJunction$setup_data,\n    draw_panel = function(data, panel_scales, coord,\n                          parse = FALSE,\n                          na.rm = FALSE,\n                          junction.orientation = \"alternating\",\n                          junction.y.max = 1,\n                          angle = 90,\n                          ncp = 15,\n                          box.padding = 0.25,\n                          label.padding = 0.25,\n                          point.padding = 1e-6,\n                          label.r = 0.15,\n                          label.size = 0.25,\n                          min.segment.length = 0,\n                          arrow = NULL,\n                          force = 1,\n                          force_pull = 1,\n                          max.time = 0.5,\n                          max.iter = 10000,\n                          max.overlaps = 10,\n                          nudge_x = 0,\n                          nudge_y = 0,\n                          xlim = c(NA, NA),\n                          ylim = c(NA, NA),\n                          direction = \"both\",\n                          seed = NA,\n                          verbose = FALSE) {\n\n        # 
junction_index represents the order of each junction within tx\n        # needed for junction.orientation = \"alternating\"\n        data <- data %>%\n            dplyr::group_by(y) %>%\n            dplyr::mutate(junction_index = dplyr::row_number()) %>%\n            dplyr::ungroup()\n\n        # obtain the midpoints of junction curves (where we want label)\n        junction_midpoints <-\n            to_junction_midpoints(\n                data,\n                angle,\n                ncp,\n                junction.orientation,\n                junction.y.max\n            )\n\n        ggrepel::GeomLabelRepel$draw_panel(\n            data = junction_midpoints,\n            panel_scales = panel_scales,\n            coord = coord,\n            parse = parse,\n            na.rm = na.rm,\n            box.padding = box.padding,\n            label.padding = label.padding,\n            point.padding = point.padding,\n            label.r = label.r,\n            label.size = label.size,\n            min.segment.length = min.segment.length,\n            arrow = arrow,\n            force = force,\n            force_pull = force_pull,\n            max.time = max.time,\n            max.iter = max.iter,\n            max.overlaps = max.overlaps,\n            nudge_x = nudge_x,\n            nudge_y = nudge_y,\n            xlim = xlim,\n            ylim = ylim,\n            direction = direction,\n            seed = seed,\n            verbose = verbose\n        )\n    }\n)\n\n#' Wrapper for obtaining junction curve midpoints\n#'\n#' @keywords internal\n#' @noRd\nto_junction_midpoints <- function(data,\n                                  angle,\n                                  ncp,\n                                  junction.orientation,\n                                  junction.y.max) {\n    # TODO - maybe export this as helper?\n    junctions <- .get_junction_curves(data, angle, ncp)\n    junctions <- .get_normalised_curve(\n        junctions,\n        junction.orientation,\n   
     junction.y.max\n    )\n    junction_midpoints <- .get_curve_midpoints(junctions)\n\n    return(junction_midpoints)\n}\n\n#' @keywords internal\n#' @noRd\n.get_curve_midpoints <- function(junctions) {\n\n    # get the mid points of each curve for labeling junctions\n    # these are the points with the x value closest to median(x)\n    # this cannot be == median(x), this will not pick up point for even ncp's\n    junctions_mid <- junctions %>%\n        dplyr::group_by(group) %>%\n        dplyr::mutate(\n            median_x = stats::median(x),\n            median_diff = abs(x - median_x)\n        ) %>%\n        dplyr::filter(median_diff == min(median_diff)) %>%\n        dplyr::ungroup() %>%\n        dplyr::select(-median_x, -median_diff)\n\n    return(junctions_mid)\n}\n\n\nto_unit <- ggrepel:::to_unit\n"
  },
  {
    "path": "R/geom_range.R",
    "content": "#' Plot genomic ranges\n#'\n#' `geom_range()` and `geom_half_range()` draw tiles that are designed to\n#' represent range-based genomic features, such as exons. In combination with\n#' `geom_intron()`, these geoms form the core components for visualizing\n#' transcript structures.\n#'\n#' `geom_range()` and `geom_half_range()` require the following `aes()`;\n#' `xstart`, `xend` and `y` (e.g. transcript name). `geom_half_range()` takes\n#' advantage of the vertical symmetry of transcript annotation by plotting only\n#' half of a range on the top or bottom of a transcript structure. This can be\n#' useful for comparing between two transcripts or free up plotting space for\n#' other transcript annotations (e.g. `geom_junction()`).\n#'\n#' @inheritParams ggplot2::layer\n#' @inheritParams ggplot2::geom_point\n#' @inheritParams ggplot2::geom_tile\n#' @inheritParams ggplot2::geom_segment\n#' @inheritParams grid::rectGrob\n#'\n#' @return the return value of a `geom_*` function is not intended to be\n#'   directly handled by users. 
Therefore, `geom_*` functions should never be\n#'   executed in isolation, rather used in combination with a\n#'   `ggplot2::ggplot()` call.\n#'\n#' @export\n#' @examples\n#'\n#' library(magrittr)\n#' library(ggplot2)\n#'\n#' # to illustrate the package's functionality\n#' # ggtranscript includes example transcript annotation\n#' sod1_annotation %>% head()\n#'\n#' # extract exons\n#' sod1_exons <- sod1_annotation %>% dplyr::filter(type == \"exon\")\n#' sod1_exons %>% head()\n#'\n#' base <- sod1_exons %>%\n#'     ggplot(aes(\n#'         xstart = start,\n#'         xend = end,\n#'         y = transcript_name\n#'     ))\n#'\n#' # geom_range() is designed to visualise range-based annotation such as exons\n#' base + geom_range()\n#'\n#' # geom_half_range() allows users to plot half ranges\n#' # on the top or bottom of the transcript\n#' base + geom_half_range()\n#'\n#' # where the half ranges are plotted can be adjusted using range.orientation\n#' base + geom_half_range(range.orientation = \"top\")\n#'\n#' # as a ggplot2 extension, ggtranscript geoms inherit\n#' # the functionality from the parameters and aesthetics in ggplot2\n#' base + geom_range(\n#'     aes(fill = transcript_name),\n#'     linewidth = 1\n#' )\n#'\n#' # together, geom_range() and geom_intron() are designed to visualize\n#' # the core components of transcript annotation\n#' base + geom_range(\n#'     aes(fill = transcript_biotype)\n#' ) +\n#'     geom_intron(\n#'         data = to_intron(sod1_exons, \"transcript_name\")\n#'     )\n#'\n#' # for protein coding transcripts\n#' # geom_range() can be useful for visualizing UTRs that lie outside of the CDS\n#' sod1_exons_prot_coding <- sod1_exons %>%\n#'     dplyr::filter(transcript_biotype == \"protein_coding\")\n#'\n#' # extract cds\n#' sod1_cds <- sod1_annotation %>%\n#'     dplyr::filter(type == \"CDS\")\n#'\n#' sod1_exons_prot_coding %>%\n#'     ggplot(aes(\n#'         xstart = start,\n#'         xend = end,\n#'         y = transcript_name\n#'     )) 
+\n#'     geom_range(\n#'         fill = \"white\",\n#'         height = 0.25\n#'     ) +\n#'     geom_range(\n#'         data = sod1_cds\n#'     ) +\n#'     geom_intron(\n#'         data = to_intron(sod1_exons_prot_coding, \"transcript_name\")\n#'     )\n#'\n#' # geom_half_range() can be useful for comparing between two transcripts\n#' # enabling visualization of one transcript on the top, other on the bottom\n#' sod1_201_exons <- sod1_exons %>% dplyr::filter(transcript_name == \"SOD1-201\")\n#' sod1_201_cds <- sod1_cds %>% dplyr::filter(transcript_name == \"SOD1-201\")\n#' sod1_202_exons <- sod1_exons %>% dplyr::filter(transcript_name == \"SOD1-202\")\n#' sod1_202_cds <- sod1_cds %>% dplyr::filter(transcript_name == \"SOD1-202\")\n#'\n#' sod1_201_plot <- sod1_201_exons %>%\n#'     ggplot(aes(\n#'         xstart = start,\n#'         xend = end,\n#'         y = \"SOD1-201/202\"\n#'     )) +\n#'     geom_half_range(\n#'         fill = \"white\",\n#'         height = 0.125\n#'     ) +\n#'     geom_half_range(\n#'         data = sod1_201_cds\n#'     ) +\n#'     geom_intron(\n#'         data = to_intron(sod1_201_exons, \"transcript_name\")\n#'     )\n#'\n#' sod1_201_plot\n#'\n#' sod1_201_202_plot <- sod1_201_plot +\n#'     geom_half_range(\n#'         data = sod1_202_exons,\n#'         range.orientation = \"top\",\n#'         fill = \"white\",\n#'         height = 0.125\n#'     ) +\n#'     geom_half_range(\n#'         data = sod1_202_cds,\n#'         range.orientation = \"top\",\n#'         fill = \"purple\"\n#'     ) +\n#'     geom_intron(\n#'         data = to_intron(sod1_202_exons, \"transcript_name\")\n#'     )\n#'\n#' sod1_201_202_plot\n#'\n#' # leveraging existing ggplot2 functionality via e.g. 
coord_cartesian()\n#' # can be useful to zoom in on areas of interest\n#' sod1_201_202_plot + coord_cartesian(xlim = c(31659500, 31660000))\ngeom_range <- function(mapping = NULL, data = NULL,\n                       stat = \"identity\", position = \"identity\",\n                       ...,\n                       vjust = NULL,\n                       linejoin = \"mitre\",\n                       na.rm = FALSE,\n                       show.legend = NA,\n                       inherit.aes = TRUE) {\n    ggplot2::layer(\n        data = data,\n        mapping = mapping,\n        stat = stat,\n        geom = GeomRange,\n        position = position,\n        show.legend = show.legend,\n        inherit.aes = inherit.aes,\n        params = list(\n            vjust = vjust,\n            linejoin = linejoin,\n            na.rm = na.rm,\n            ...\n        )\n    )\n}\n\n#' `GeomRange` is `ggplot2::GeomTile` with modified `aes` to match genetic\n#' nomenclature (`xstart`/`xend`)\n#' @keywords internal\n#' @noRd\nGeomRange <- ggplot2::ggproto(\"GeomRange\", ggplot2::GeomTile,\n    required_aes = c(\"xstart\", \"xend\", \"y\"),\n    default_aes = aes(\n        fill = \"grey\",\n        colour = \"black\",\n        linewidth = 0.25,\n        linetype = 1,\n        alpha = NA,\n        height = NA\n    ),\n    setup_data = function(data, params) {\n        # modified from ggplot2::GeomTile\n        data$height <- data$height %||% params$height %||% 0.5\n\n        transform(\n            data,\n            xmin = xstart,\n            xmax = xend,\n            ymin = y - height / 2,\n            ymax = y + height / 2,\n            height = NULL\n        )\n    },\n    draw_panel = function(self,\n                          data,\n                          panel_params,\n                          coord,\n                          vjust = NULL,\n                          lineend = \"butt\",\n                          linejoin = \"mitre\") {\n        if (!coord$is_linear()) {\n  
          # prefer to match geom_curve and warn\n            # rather than copy the implementation from GeomRect for simplicity\n            # also don't think geom_range would be used for non-linear coords\n            rlang::warn(\"geom_range() is not implemented for non-linear coordinates\")\n        }\n\n        coords <- coord$transform(data, panel_params)\n        grid::rectGrob(\n            coords$xmin, coords$ymax,\n            width = coords$xmax - coords$xmin,\n            height = coords$ymax - coords$ymin,\n            default.units = \"native\",\n            just = c(\"left\", \"top\"),\n            vjust = vjust,\n            gp = grid::gpar(\n                col = coords$colour,\n                fill = ggplot2::alpha(coords$fill, coords$alpha),\n                lwd = coords$linewidth * ggplot2::.pt,\n                lty = coords$linetype,\n                linejoin = linejoin,\n                lineend = lineend\n            )\n        )\n    }\n)\n"
  },
  {
    "path": "R/ggtranscript-package.R",
    "content": "#' `ggtranscript`: Visualizing transcript structure and annotation using\n#' `ggplot2`\n#'\n#' The goal of `ggtranscript` is the simplify the process of visualizing\n#' transcript structure and annotation. To achieve this, `ggtranscript`\n#' introduces 5 new geoms (`geom_range()`, `geom_half_range()`, `geom_intron()`,\n#' `geom_junction()` and `geom_junction_label_repel()`) as well as several\n#' helper functions. As a `ggplot2` extension, `ggtranscript` inherits\n#' `ggplot2`'s familiarity and flexibility, enabling users to intuitively adjust\n#' aesthetics, parameters, scales etc as well as complement `ggtranscript` geoms\n#' with existing `ggplot2` geoms to create informative, publication-ready plots.\n#'\n#' @docType package\n#' @name ggtranscript\n\"_PACKAGE\"\n\n#' @importFrom rlang %||%\n#' @importFrom magrittr %>%\n#' @importFrom ggplot2 aes\n#' @import ggrepel\nNULL\n"
  },
  {
    "path": "R/globals.R",
    "content": "# bypass R CMD Check notes, related to tidyverse non-standard evaluation\n# https://www.r-bloggers.com/2019/08/no-visible-binding-for-global-variable/\nutils::globalVariables(c(\n    \"x\",\n    \"start\",\n    \"end\",\n    \":=\",\n    \"intron_start\",\n    \"intron_end\",\n    \"exons\",\n    \"xend\",\n    \"mid\",\n    \"index\",\n    \"diff_type\",\n    \"in_exons\",\n    \"in_ref_exons\",\n    \".\",\n    # shorten_gaps()\n    \"width\",\n    \"rescaled_start\",\n    \"rescaled_end\",\n    \"width_tx_start\",\n    \"seqnames\",\n    \"strand\",\n    \"shorten_type\",\n    \"gap_width\",\n    \"shortened_gap_width\",\n    \"shortened_gap_diff\",\n    \"sum_shortened_gap_diff\",\n    \"intron_indexes\",\n    \"shortened_width\",\n    \"type\",\n    # add_exon_number()\n    \"exon_number\",\n    \"n_exons\",\n    # geom_junction_label_repel()\n    \"group\",\n    \"median\",\n    \"median_x\",\n    \"median_diff\",\n    \"x_points\",\n    \"y_points\",\n    \"y\",\n    \"y_original\",\n    \"position_nudge_repel\",\n    # add_utr(),\n    \"dummy_group\"\n))\n"
  },
  {
    "path": "R/shorten_gaps.R",
    "content": "#' Improve transcript structure visualization by shortening gaps\n#'\n#' For a given set of exons and introns, `shorten_gaps()` reduces the width of\n#' gaps (regions that do not overlap any `exons`) to a user-inputted\n#' `target_gap_width`. This can be useful when visualizing transcripts that have\n#' long introns, to hone in on the regions of interest (i.e. exons) and better\n#' compare between transcript structures.\n#'\n#' After `shorten_gaps()` reduces the size of gaps, it will re-scale `exons` and\n#' `introns` to preserve exon alignment. This process will only reduce the width\n#' of input `introns`, never `exons`. Importantly, the outputted re-scaled\n#' co-ordinates should only be used for visualization as they will not match the\n#' original genomic coordinates.\n#'\n#' @inheritParams to_diff\n#' @param introns `data.frame()` the intron co-ordinates corresponding to the\n#'   input `exons`. This can be created by applying `to_intron()` to the\n#'   `exons`. If introns originate from multiple transcripts, they must be\n#'   differentiated using `group_var`. 
If a user is not using `to_intron()`,\n#'   they must make sure intron start/ends are defined precisely as the adjacent\n#'   exon boundaries (rather than exon end + 1 and exon start - 1).\n#' @param target_gap_width `integer()` the width in base pairs to shorten the\n#'   gaps to.\n#'\n#' @return `data.frame()` contains the re-scaled co-ordinates of `introns` and\n#'   `exons` of each input transcript with shortened gaps.\n#'\n#' @export\n#' @examples\n#'\n#' library(magrittr)\n#' library(ggplot2)\n#'\n#' # to illustrate the package's functionality\n#' # ggtranscript includes example transcript annotation\n#' pknox1_annotation %>% head()\n#'\n#' # extract exons\n#' pknox1_exons <- pknox1_annotation %>% dplyr::filter(type == \"exon\")\n#' pknox1_exons %>% head()\n#'\n#' # to_intron() is a helper function included in ggtranscript\n#' # which is useful for converting exon co-ordinates to introns\n#' pknox1_introns <- pknox1_exons %>% to_intron(group_var = \"transcript_name\")\n#' pknox1_introns %>% head()\n#'\n#' # for transcripts with long introns, the exons of interest\n#' # can be difficult to visualize clearly when using the default scale\n#' pknox1_exons %>%\n#'     ggplot(aes(\n#'         xstart = start,\n#'         xend = end,\n#'         y = transcript_name\n#'     )) +\n#'     geom_range() +\n#'     geom_intron(\n#'         data = pknox1_introns,\n#'         arrow.min.intron.length = 3500\n#'     )\n#'\n#' # in such cases it can be useful to rescale the exons and introns\n#' # using shorten_gaps() which shortens regions that do not overlap an exon\n#' pknox1_rescaled <-\n#'     shorten_gaps(pknox1_exons, pknox1_introns, group_var = \"transcript_name\")\n#'\n#' pknox1_rescaled %>% head()\n#'\n#' # this allows us to visualize differences in exonic structure more clearly\n#' pknox1_rescaled %>%\n#'     dplyr::filter(type == \"exon\") %>%\n#'     ggplot(aes(\n#'         xstart = start,\n#'         xend = end,\n#'         y = transcript_name\n#'     )) +\n#'     
geom_range() +\n#'     geom_intron(\n#'         data = pknox1_rescaled %>% dplyr::filter(type == \"intron\"),\n#'         arrow.min.intron.length = 300\n#'     )\n#'\n#' # shorten_gaps() can be used in combination with to_diff()\n#' # to further highlight differences in exon structure\n#' # here, all other transcripts are compared to the MANE-select transcript\n#' pknox1_rescaled_diffs <- to_diff(\n#'     exons = pknox1_rescaled %>%\n#'         dplyr::filter(type == \"exon\", transcript_name != \"PKNOX1-201\"),\n#'     ref_exons = pknox1_rescaled %>%\n#'         dplyr::filter(type == \"exon\", transcript_name == \"PKNOX1-201\"),\n#'     group_var = \"transcript_name\"\n#' )\n#'\n#' pknox1_rescaled %>%\n#'     dplyr::filter(type == \"exon\") %>%\n#'     ggplot(aes(\n#'         xstart = start,\n#'         xend = end,\n#'         y = transcript_name\n#'     )) +\n#'     geom_range() +\n#'     geom_intron(\n#'         data = pknox1_rescaled %>% dplyr::filter(type == \"intron\"),\n#'         arrow.min.intron.length = 300\n#'     ) +\n#'     geom_range(\n#'         data = pknox1_rescaled_diffs,\n#'         aes(fill = diff_type),\n#'         alpha = 0.2\n#'     )\nshorten_gaps <- function(exons,\n                         introns,\n                         group_var = NULL,\n                         target_gap_width = 100L) {\n\n    # input checks\n    .check_coord_object(exons, check_seqnames = TRUE, check_strand = TRUE)\n    .check_coord_object(introns, check_seqnames = TRUE, check_strand = TRUE)\n    .check_group_var(exons, group_var)\n    .check_group_var(introns, group_var)\n    target_gap_width <- .check_target_gap_width(target_gap_width)\n\n    # check type column, create if not present\n    exons <- .get_type(exons, \"exons\")\n    introns <- .get_type(introns, \"introns\")\n\n    # to_intron() defines introns using the exon boundaries\n    # we need to convert this to the actual gap definition to make sure\n    # comparison to GenomicRanges::gaps() when using 
\"equal\" works correctly\n    # this is converted back in .get_rescaled_txs()\n    introns <- introns %>%\n        dplyr::mutate(\n            start = start + 1,\n            end = end - 1\n        )\n\n    # we use GenomicRanges methods for downstream processing\n    exons_gr <- GenomicRanges::GRanges(exons)\n    introns_gr <- GenomicRanges::GRanges(introns)\n\n    # obtain actual gaps, i.e. regions that overlap no exons\n    intron_gaps <- .get_gaps(exons_gr)\n\n    # by mapping gaps back to introns, we can then shorten overlapping gaps\n    gap_map_intron <- .get_gap_map(introns_gr, intron_gaps)\n    introns_shortened <- .get_shortened_gaps(\n        introns,\n        intron_gaps,\n        gap_map_intron,\n        group_var,\n        target_gap_width\n    )\n\n    # don't have to take tx_start_gaps into account if only 1 tx\n    if (!is.null(group_var)) {\n        # because we're shortening intron_gaps, we also need to shorten the\n        # region from start of the plot and start of each tx (tx_start_gaps)\n        tx_start_gaps <- .get_tx_start_gaps(exons, group_var)\n        gap_map_tx_start_gaps <- .get_gap_map(\n            tx_start_gaps %>% GenomicRanges::GRanges(),\n            intron_gaps\n        )\n        tx_start_gaps_shortened <- .get_shortened_gaps(\n            tx_start_gaps,\n            intron_gaps,\n            gap_map_tx_start_gaps,\n            group_var,\n            target_gap_width\n        ) %>%\n            dplyr::select(-start, -end, -strand, -seqnames, -strand)\n    }\n\n    rescaled_tx <- .get_rescaled_txs(\n        exons,\n        introns_shortened,\n        tx_start_gaps_shortened,\n        group_var\n    )\n\n    return(rescaled_tx)\n}\n\n#' Add a type column if it is not present already\n#'\n#' @keywords internal\n#' @noRd\n.get_type <- function(x, exons_introns) {\n    if (!is.null(x[[\"type\"]])) {\n        # if there is an existing type column for introns\n        # need to make sure this is all \"intron\" for downstream 
functions\n        # don't check for exons, as this can be variable (e.g. \"five_prime_utr\")\n\n        if (exons_introns == \"introns\") {\n            allowed_types <- \"intron\"\n\n            if (!all(x[[\"type\"]] %in% allowed_types)) {\n                stop(\n                    \"values in the 'type' column of \", exons_introns, \" must be one of: \",\n                    allowed_types %>% paste0(\"'\", ., \"'\") %>% paste(collapse = \", \")\n                )\n            }\n        }\n    } else {\n        # if there isn't, we add a default type column\n        default_type <- ifelse(exons_introns == \"exons\", \"exon\", \"intron\")\n\n        x <- x %>% dplyr::mutate(type = default_type)\n    }\n\n    return(x)\n}\n\n#' @keywords internal\n#' @noRd\n.get_gaps <- function(exons_gr) {\n    orig_seqnames <- exons_gr %>%\n        GenomicRanges::seqnames() %>%\n        as.character() %>%\n        unique()\n\n    orig_strand <- exons_gr %>%\n        GenomicRanges::strand() %>%\n        as.character() %>%\n        unique()\n\n    # make sure we only have exons from a single transcript\n    .check_len_1_strand_seqnames(orig_seqnames, orig_strand)\n\n    # \"reduce\" exons - here meaning to collapse into single meta transcript\n    exons_gr_reduced <- exons_gr %>% GenomicRanges::reduce()\n\n    # keep only the relevant seqnames, otherwise gaps includes all seqlevels\n    GenomeInfoDb::seqlevels(exons_gr_reduced, pruning.mode = \"coarse\") <-\n        orig_seqnames\n\n    # obtain intronic gaps of the meta transcript\n    intron_gaps <- exons_gr_reduced %>%\n        GenomicRanges::gaps(\n            start = min(GenomicRanges::start(exons_gr_reduced)),\n            end = max(GenomicRanges::end(exons_gr_reduced))\n        )\n\n    # gaps creates a gap per strand too, keep only those from the original strand\n    intron_gaps <- intron_gaps %>%\n        .[GenomicRanges::strand(intron_gaps) == orig_strand]\n\n    return(intron_gaps)\n}\n\n#' @keywords internal\n#' 
@noRd\n.get_tx_start_gaps <- function(exons, group_var) {\n\n    # need to scale the transcript starts so scaled introns/exons align\n    # importantly, this tx start also has to take into account\n    # whether intron_gaps that overlap it have been shortened\n    # here, get the tx_start_gap - the region between\n    # 1. the start of plot (smallest start position of all txs)\n    # 2. the start of each tx\n    tx_start_gaps <-\n        exons %>%\n        dplyr::group_by_at(.vars = c(\n            group_var\n        )) %>%\n        dplyr::summarise(\n            seqnames = unique(seqnames),\n            strand = unique(strand),\n            end = min(start), # min start of this transcript\n            start = min(exons[[\"start\"]]) # min start of all transcripts\n        )\n\n    return(tx_start_gaps)\n}\n\n#' map the gaps back to introns/transcript start gaps\n#' @keywords internal\n#' @noRd\n.get_gap_map <- function(y, intron_gaps) {\n\n    # when we reduce the length of the intron_gaps, we need to make sure\n    # the exons/introns remain aligned\n    # to do this, we need to map the intron_gaps back onto the introns\n\n    # the simplest case is when gaps are identical to original introns\n    equal_hits <- GenomicRanges::findOverlaps(\n        intron_gaps,\n        y,\n        type = \"equal\"\n    )\n\n    # often, the intron_gaps don't map identically\n    # this occurs due to the exons of one tx overlapping the intron of another\n    # we find cases when the gaps are completely contained within an original\n    # intron using type = \"within\", but this also catches the \"equal\" intron_gaps\n    within_hits <- GenomicRanges::findOverlaps(\n        intron_gaps,\n        y,\n        type = \"within\"\n    )\n\n    # convert to data.frame() to use dplyr::anti_join()\n    equal_hits <- equal_hits %>% as.data.frame()\n    within_hits <- within_hits %>% as.data.frame()\n\n    # remove the \"equal\" hits from the \"within\"\n    pure_within_hits <- 
within_hits %>%\n        dplyr::anti_join(equal_hits, by = c(\"queryHits\", \"subjectHits\"))\n\n    # need both \"equal\" and \"pure_within\" hits\n    gap_map <- list(\n        equal = equal_hits,\n        pure_within = pure_within_hits\n    )\n\n    return(gap_map)\n}\n\n#' @keywords internal\n#' @noRd\n.get_shortened_gaps <- function(y,\n                                intron_gaps,\n                                gap_map,\n                                group_var,\n                                target_gap_width) {\n\n    # we need the intron/tx_start_gap widths (to shorten them)\n    y <- y %>% dplyr::mutate(width = (end - start) + 1)\n\n    # characterise introns by shortening type\n    y_shortened <- y %>%\n        dplyr::mutate(\n            shorten_type = dplyr::case_when(\n                dplyr::row_number() %in% gap_map[[\"equal\"]][[\"subjectHits\"]] ~\n                \"equal\",\n                dplyr::row_number() %in% gap_map[[\"pure_within\"]][[\"subjectHits\"]] ~\n                \"pure_within\",\n                TRUE ~ \"none\"\n            )\n        )\n\n    # for the \"equal\" cases, simply shorten the widths to the target_gap_width\n    y_shortened <- y_shortened %>%\n        dplyr::mutate(\n            shortened_width = ifelse(\n                (shorten_type == \"equal\") & (width > target_gap_width),\n                target_gap_width,\n                width\n            )\n        )\n\n    # for the \"within\" cases we need to shorten the intron widths\n    # by the !total! 
amount the overlapping gaps are shortened\n    overlapping_gap_indexes <- gap_map[[\"pure_within\"]][[\"queryHits\"]]\n\n    # only have to do this if there are gaps that are \"pure_within\"\n    if (length(overlapping_gap_indexes) > 0) {\n\n        # one intron may overlap multiple gaps\n        # first, calculate the sum of the reduction in gap widths\n        sum_gap_diff <- dplyr::tibble(\n            intron_indexes = gap_map[[\"pure_within\"]][[\"subjectHits\"]],\n            gap_width = GenomicRanges::width(intron_gaps)[overlapping_gap_indexes]\n        ) %>%\n            dplyr::mutate(\n                shortened_gap_width = ifelse(\n                    gap_width > target_gap_width,\n                    target_gap_width,\n                    gap_width\n                ),\n                shortened_gap_diff = gap_width - shortened_gap_width,\n            ) %>%\n            dplyr::group_by(intron_indexes) %>%\n            dplyr::summarise(\n                sum_shortened_gap_diff = sum(shortened_gap_diff)\n            )\n\n        # now actually do reduction for introns with \"pure_within\" gaps\n        y_shortened[[\"sum_shortened_gap_diff\"]] <- NA_integer_\n\n        y_shortened[[\"sum_shortened_gap_diff\"]][sum_gap_diff[[\"intron_indexes\"]]] <-\n            sum_gap_diff[[\"sum_shortened_gap_diff\"]]\n\n        y_shortened <- y_shortened %>%\n            dplyr::mutate(\n                shortened_width = ifelse(\n                    is.na(sum_shortened_gap_diff),\n                    shortened_width,\n                    width - sum_shortened_gap_diff\n                )\n            ) %>%\n            dplyr::select(-sum_shortened_gap_diff)\n    }\n\n    # remove unnecessary intermediate cols\n    y_shortened <- y_shortened %>%\n        dplyr::select(\n            -shorten_type,\n            -width,\n            width = shortened_width\n        )\n\n    return(y_shortened)\n}\n\n#' @keywords internal\n#' @noRd\n.get_rescaled_txs <- function(exons,\n               
               introns_shortened,\n                              tx_start_gaps_shortened,\n                              group_var) {\n\n    # calculate the rescaled exon/intron start/ends using\n    # the widths of the exons and reduced introns\n    rescaled_tx <- exons %>% dplyr::mutate(\n        width = (end - start) + 1\n    )\n\n    # bind together exons and introns and arrange into genomic order\n    rescaled_tx <- rescaled_tx %>%\n        dplyr::bind_rows(\n            introns_shortened\n        ) %>%\n        dplyr::arrange_at(.vars = c(group_var, \"start\", \"end\"))\n\n    # calculate the rescaled coords using cumsum of the widths of introns/exons\n    rescaled_tx <- rescaled_tx %>%\n        dplyr::group_by_at(.vars = c(\n            group_var\n        )) %>%\n        dplyr::mutate(\n            rescaled_end = cumsum(width),\n            rescaled_start = rescaled_end - (width - 1)\n        ) %>%\n        dplyr::ungroup()\n\n    # account for the tx starts being in different places\n    # to keep everything aligned\n    if (is.null(group_var)) {\n        # if only 1 tx, we use 1 as the dummy rescaled tx_start\n        rescaled_tx <- rescaled_tx %>%\n            dplyr::mutate(width_tx_start = 1)\n    } else {\n        rescaled_tx <- rescaled_tx %>%\n            dplyr::left_join(\n                tx_start_gaps_shortened,\n                by = c(group_var),\n                suffix = c(\"\", \"_tx_start\")\n            )\n    }\n\n    rescaled_tx <- rescaled_tx %>%\n        dplyr::mutate(\n            rescaled_end = rescaled_end + width_tx_start,\n            rescaled_start = rescaled_start + width_tx_start\n        ) %>%\n        dplyr::select(-dplyr::contains(\"width\"))\n\n    # convert introns back to be defined by exon boundaries, match to_intron()\n    rescaled_tx <- rescaled_tx %>%\n        dplyr::mutate(\n            start = ifelse(type == \"intron\", start - 1, start),\n            end = ifelse(type == \"intron\", end + 1, end),\n            
rescaled_start = ifelse(\n                type == \"intron\", rescaled_start - 1, rescaled_start\n            ),\n            rescaled_end = ifelse(\n                type == \"intron\", rescaled_end + 1, rescaled_end\n            )\n        )\n\n    # remove original start/end\n    rescaled_tx <- rescaled_tx %>% dplyr::select(-start, -end)\n\n    rescaled_tx <- rescaled_tx %>%\n        dplyr::select(\n            seqnames,\n            start = rescaled_start,\n            end = rescaled_end,\n            strand,\n            dplyr::everything()\n        )\n\n    return(rescaled_tx)\n}\n\n#' we expect the exons to originate from a single gene.\n#' therefore, unique strand and seqnames should be of length 1\n#' @keywords internal\n#' @noRd\n.check_len_1_strand_seqnames <- function(orig_seqnames, orig_strand) {\n    ab_1_uniq <- \"of object contains more than 1 unique value. \"\n    reason <- \"object is expected to contain exons from a single gene.\"\n\n    if (length(orig_seqnames) != 1) {\n        stop(\"seqnames \", ab_1_uniq, reason)\n    }\n\n    if (length(orig_strand) != 1) {\n        stop(\"strand \", ab_1_uniq, reason)\n    }\n}\n\n#' @keywords internal\n#' @noRd\n.check_target_gap_width <- function(target_gap_width) {\n    if (!is.integer(target_gap_width)) {\n        warning(\"target_gap_width must be an integer, coercing...\")\n        target_gap_width <- target_gap_width %>%\n            as.integer()\n    }\n\n    return(target_gap_width)\n}\n"
  },
  {
    "path": "R/to_diff.R",
    "content": "#' Obtain the differences between transcript structure\n#'\n#' `to_diff()` obtains the difference between `exons` from a set of transcripts\n#' to a reference transcript (`ref_exons`). This can be useful when visualizing\n#' the differences between transcript structure. `to_diff()` expects two sets of\n#' input exons; 1. `exons` - exons from any number of transcripts that will be\n#' compared to `ref_exons` and 2. `ref_exons` - exons from a single transcript\n#' which acts as the reference to compare against.\n#'\n#' @param exons `data.frame()` contains exons which can originate from multiple\n#'   transcripts differentiated by `group_var`.\n#' @param ref_exons `data.frame()` contains exons that originate from a single\n#'   transcript, which `exons` will be compared against.\n#' @param group_var `character()` if input data originates from more than 1\n#'   transcript, `group_var` must specify the column that differentiates\n#'   transcripts (e.g. \"transcript_id\").\n#'\n#' @return `data.frame()` details the differences between `exons` and\n#'   `ref_exons`.\n#'\n#' @export\n#' @examples\n#'\n#' library(magrittr)\n#' library(ggplot2)\n#'\n#' # to illustrate the package's functionality\n#' # ggtranscript includes example transcript annotation\n#' sod1_annotation %>% head()\n#'\n#' # extract exons\n#' sod1_exons <- sod1_annotation %>% dplyr::filter(type == \"exon\")\n#' sod1_exons %>% head()\n#'\n#' # for this example, let's compare transcripts to the MANE-select transcript\n#' sod1_mane <- sod1_exons %>% dplyr::filter(transcript_name == \"SOD1-201\")\n#' sod1_not_mane <- sod1_exons %>% dplyr::filter(transcript_name != \"SOD1-201\")\n#'\n#' # to_diff() obtains the differences between the exons as ranges\n#' sod1_diffs <- to_diff(\n#'     exons = sod1_not_mane,\n#'     ref_exons = sod1_mane,\n#'     group_var = \"transcript_name\"\n#' )\n#'\n#' sod1_diffs %>% head()\n#'\n#' # using geom_range(), it can be useful to visually overlay\n#' # the 
differences on top of the transcript annotation\n#' sod1_exons %>%\n#'     ggplot(aes(\n#'         xstart = start,\n#'         xend = end,\n#'         y = transcript_name\n#'     )) +\n#'     geom_range() +\n#'     geom_intron(\n#'         data = to_intron(sod1_exons, \"transcript_name\")\n#'     ) +\n#'     geom_range(\n#'         data = sod1_diffs,\n#'         ggplot2::aes(fill = diff_type),\n#'         alpha = 0.2\n#'     )\nto_diff <- function(exons, ref_exons, group_var = NULL) {\n    .check_coord_object(exons, check_seqnames = TRUE, check_strand = TRUE)\n    .check_coord_object(ref_exons, check_seqnames = TRUE, check_strand = TRUE)\n    .check_group_var(exons, group_var)\n\n    # need to remember if group is NULL for downstream\n    null_group <- is.null(group_var)\n\n    # we have to create dummy group if there is no group for .get_diff\n    if (null_group) {\n        exons <- exons %>% dplyr::mutate(dummy_group = \"A\")\n        group_var <- \"dummy_group\"\n    }\n\n    diffs <- .get_diff(exons, ref_exons, group_var)\n\n    # remove the dummy_group if created\n    if (null_group) diffs[[group_var]] <- NULL\n\n    return(diffs)\n}\n\n#' The heavy lifting of `to_diff()` happens here.\n#'\n#' @keywords internal\n#' @noRd\n.get_diff <- function(exons, ref_exons, group_var) {\n    groups <- exons[[group_var]] %>% unique()\n\n    # needs to be a genomic range for downstream processing\n    exons_gr <- GenomicRanges::GRanges(exons)\n    ref_exons_gr <- GenomicRanges::GRanges(ref_exons)\n\n    # pre-allocate one slot per group (transcript), not per grouping column\n    diffs <- vector(\"list\", length = length(groups))\n\n    for (i in seq_along(groups)) {\n        exons_gr_curr <- exons_gr %>%\n            .[GenomicRanges::mcols(exons_gr)[[group_var]] == groups[i]]\n\n        # get the disjoint pieces (flattening and breaking apart exons)\n        disjoint_pieces <- GenomicRanges::disjoin(\n            c(ref_exons_gr, exons_gr_curr)\n        )\n\n        # find whether the disjoint pieces overlap exons or ref_exons\n        # those 
that only overlap 1 are the differences\n        # TODO - perhaps allow modification of findOverlaps() via ... ?\n        overlap_exons <- GenomicRanges::findOverlaps(\n            disjoint_pieces, exons_gr_curr\n        )\n        overlap_ref_exons <- GenomicRanges::findOverlaps(\n            disjoint_pieces, ref_exons_gr\n        )\n\n        # convert pieces back to data.frame and classify diffs\n        # TODO - could improve efficiency by placing this step post-loop\n        # i.e. manipulate the grs instead\n        diff_curr <- disjoint_pieces %>%\n            as.data.frame() %>%\n            dplyr::mutate(\n                index = dplyr::row_number(),\n                type = \"diff\",\n                in_exons = index %in% S4Vectors::queryHits(overlap_exons),\n                in_ref_exons = index %in% S4Vectors::queryHits(overlap_ref_exons)\n            ) %>%\n            dplyr::mutate(\n                diff_type = dplyr::case_when(\n                    in_exons & in_ref_exons ~ \"both\",\n                    in_exons & !in_ref_exons ~ \"not_in_ref\",\n                    !in_exons & in_ref_exons ~ \"in_ref\"\n                )\n            )\n\n        # add back in group info\n        diff_curr[[group_var]] <- groups[i]\n\n        # keep only diffs and necessary cols\n        diffs[[i]] <-\n            diff_curr %>%\n            dplyr::filter(diff_type != \"both\") %>%\n            dplyr::select(-in_exons, -in_ref_exons, -index)\n    }\n\n    diffs <- diffs %>% do.call(dplyr::bind_rows, .)\n\n    return(diffs)\n}\n"
  },
  {
    "path": "R/to_intron.R",
    "content": "#' Convert exon co-ordinates to introns\n#'\n#' Given a set of `exons`, `to_intron()` will return the corresponding introns.\n#'\n#' It is important to note that, for visualization purposes, `to_intron()`\n#' defines introns precisely as the exon boundaries, rather than the intron\n#' start/end being (exon end + 1)/(exon start - 1).\n#'\n#' @inheritParams to_diff\n#'\n#' @return `data.frame()` contains the intron co-ordinates.\n#'\n#' @export\n#' @examples\n#' library(magrittr)\n#' library(ggplot2)\n#'\n#' # to illustrate the package's functionality\n#' # ggtranscript includes example transcript annotation\n#' sod1_annotation %>% head()\n#'\n#' # extract exons\n#' sod1_exons <- sod1_annotation %>% dplyr::filter(type == \"exon\")\n#' sod1_exons %>% head()\n#'\n#' # to_intron() is a helper function included in ggtranscript\n#' # which is useful for converting exon co-ordinates to introns\n#' sod1_introns <- sod1_exons %>% to_intron(group_var = \"transcript_name\")\n#' sod1_introns %>% head()\n#'\n#' # this can be particular useful when combined with\n#' # geom_range() and geom_intron()\n#' # to visualize the core components of transcript annotation\n#' sod1_exons %>%\n#'     ggplot(aes(\n#'         xstart = start,\n#'         xend = end,\n#'         y = transcript_name\n#'     )) +\n#'     geom_range() +\n#'     geom_intron(\n#'         data = to_intron(sod1_exons, \"transcript_name\")\n#'     )\nto_intron <- function(exons, group_var = NULL) {\n    .check_coord_object(exons)\n    .check_group_var(exons, group_var)\n\n    # TODO - switch this to using GenomicRanges::gaps()?\n\n    if (!is.null(group_var)) {\n        exons <- exons %>% dplyr::group_by_at(.vars = group_var)\n    }\n\n    # make sure exons are arranged by coord, so that dplyr::lag works correctly\n    exons <- exons %>%\n        dplyr::arrange(start, end)\n\n    # obtain intron start and ends\n    introns <- exons %>%\n        dplyr::mutate(\n            intron_start := 
dplyr::lag(end),\n            intron_end := start,\n            type = \"intron\"\n        ) %>%\n        dplyr::select(-start, -end)\n\n    # remove the introduced artifact NAs\n    introns <- introns %>%\n        dplyr::ungroup() %>%\n        dplyr::filter(!is.na(intron_start) & !is.na(intron_end))\n\n    # filter out introns with a width of 1, this should only happen when\n    # utrs are included and are directly adjacent to end of cds\n    introns <- introns %>% dplyr::filter(abs(intron_end - intron_start) != 1)\n\n    introns <- introns %>% dplyr::rename(start = intron_start, end = intron_end)\n\n    return(introns)\n}\n"
  },
  {
    "path": "R/utils.R",
    "content": "#' @keywords internal\n#' @noRd\n.check_coord_object <- function(x,\n                                check_seqnames = FALSE,\n                                check_strand = FALSE) {\n    if (!is.data.frame(x)) {\n        stop(\n            \"object must be a data.frame. \",\n            \"GRanges objects are currently not supported and must be converted \",\n            \"using e.g. as.data.frame()\"\n        )\n    }\n\n    if (!all(c(\"start\", \"end\") %in% colnames(x))) {\n        stop(\"object must have the columns 'start' and 'end'\")\n    }\n\n    if (check_seqnames) {\n        if (!(\"seqnames\" %in% colnames(x))) {\n            stop(\"object must have the column 'seqnames'\")\n        }\n    }\n\n    if (check_strand) {\n        if (!(\"strand\" %in% colnames(x))) {\n            stop(\"object must have the column 'strand'\")\n        }\n    }\n}\n\n#' @keywords internal\n#' @noRd\n.check_group_var <- function(x, group_var) {\n    if (!is.null(group_var)) {\n        if (!all(group_var %in% colnames(x))) {\n            stop(\n                \"group_var ('\", group_var, \"') \",\n                \"must be a column in object\"\n            )\n        }\n    }\n}\n"
  },
  {
    "path": "README.Rmd",
    "content": "---\noutput: github_document\n---\n\n```{r, include = FALSE}\nknitr::opts_chunk$set(\n    collapse = TRUE,\n    comment = \"#>\",\n    fig.path = \"man/figures/README-\",\n    out.width = \"100%\",\n    dpi = 300\n)\n```\n\n# ggtranscript <img src=\"man/figures/ggtranscript_logo_cropped.svg\" align=\"right\" height=\"139\" />\n\n<!-- badges: start -->\n[![GitHub issues](https://img.shields.io/github/issues/dzhang32/ggtranscript)](https://github.com/dzhang32/ggtranscript/issues)\n[![GitHub pulls](https://img.shields.io/github/issues-pr/dzhang32/ggtranscript)](https://github.com/dzhang32/ggtranscript/pulls)\n[![Lifecycle: experimental](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](https://lifecycle.r-lib.org/articles/stages.html#experimental)\n[![R-CMD-check-bioc](https://github.com/dzhang32/ggtranscript/workflows/R-CMD-check-bioc/badge.svg)](https://github.com/dzhang32/ggtranscript/actions)\n[![Codecov test coverage](https://codecov.io/gh/dzhang32/ggtranscript/branch/main/graph/badge.svg)](https://app.codecov.io/gh/dzhang32/ggtranscript?branch=main)\n<!-- badges: end -->\n\n`ggtranscript` is a `ggplot2` extension that makes it to easy to visualize transcript structure and annotation. \n\n## Installation\n\n```{r \"install_dev\", eval = FALSE}\n# you can install the development version of ggtranscript from GitHub:\n# install.packages(\"devtools\")\ndevtools::install_github(\"dzhang32/ggtranscript\")\n```\n\n## Usage\n\n`ggtranscript` introduces 5 new geoms (`geom_range()`, `geom_half_range()`, `geom_intron()`, `geom_junction()` and `geom_junction_label_repel()`) and several helper functions designed to facilitate the visualization of transcript structure and annotation. 
The following guide takes you on a quick tour of using these geoms; for a more detailed overview, see the [Getting Started tutorial](https://dzhang32.github.io/ggtranscript/articles/ggtranscript.html).\n\n`geom_range()` and `geom_intron()` enable the plotting of exons and introns, the core components of transcript annotation. `ggtranscript` also provides `to_intron()`, which converts exon co-ordinates to the corresponding introns. Together, these functions enable users to plot transcript structures with only exons as the required input and just a few lines of code.  \n\n```{r geom-range-intron}\nlibrary(magrittr)\nlibrary(dplyr)\nlibrary(ggplot2)\nlibrary(ggtranscript)\n\n# to illustrate the package's functionality\n# ggtranscript includes example transcript annotation\nsod1_annotation %>% head()\n\n# extract exons\nsod1_exons <- sod1_annotation %>% dplyr::filter(type == \"exon\")\n\nsod1_exons %>%\n    ggplot(aes(\n        xstart = start,\n        xend = end,\n        y = transcript_name\n    )) +\n    geom_range(\n        aes(fill = transcript_biotype)\n    ) +\n    geom_intron(\n        data = to_intron(sod1_exons, \"transcript_name\"),\n        aes(strand = strand)\n    )\n```\n\n`ggtranscript` provides the helper function `shorten_gaps()`, which reduces the size of the gaps. `shorten_gaps()` then rescales the exon and intron co-ordinates to preserve the original exon alignment. This allows you to home in on the differences in the exonic structure, which can be particularly useful if the transcript has relatively long introns. 
\n\n```{r shorten-gaps}\n\nsod1_rescaled <- shorten_gaps(\n  sod1_exons, \n  to_intron(sod1_exons, \"transcript_name\"), \n  group_var = \"transcript_name\"\n  )\n\nsod1_rescaled %>%\n    dplyr::filter(type == \"exon\") %>%\n    ggplot(aes(\n        xstart = start,\n        xend = end,\n        y = transcript_name\n    )) +\n    geom_range(\n      aes(fill = transcript_biotype)\n    ) +\n    geom_intron(\n        data = sod1_rescaled %>% dplyr::filter(type == \"intron\"), \n        arrow.min.intron.length = 200\n    )\n\n```\n\n\n`geom_range()` can be used for any range-based genomic annotation. For example, when plotting protein-coding transcripts, users may find it helpful to visually distinguish the coding segments from UTRs. \n\n```{r geom-range-intron-w-cds}\n# filter for only exons from protein coding transcripts\nsod1_exons_prot_cod <- sod1_exons %>%\n    dplyr::filter(transcript_biotype == \"protein_coding\")\n\n# obtain cds\nsod1_cds <- sod1_annotation %>% dplyr::filter(type == \"CDS\")\n\nsod1_exons_prot_cod %>%\n    ggplot(aes(\n        xstart = start,\n        xend = end,\n        y = transcript_name\n    )) +\n    geom_range(\n        fill = \"white\",\n        height = 0.25\n    ) +\n    geom_range(\n        data = sod1_cds\n    ) +\n    geom_intron(\n        data = to_intron(sod1_exons_prot_cod, \"transcript_name\"),\n        aes(strand = strand),\n        arrow.min.intron.length = 500,\n    )\n```\n\n`geom_half_range()` takes advantage of the vertical symmetry of transcript annotation by plotting only half of a range on the top or bottom of a transcript structure. One use case of `geom_half_range()` is to visualize the differences between transcript structure more clearly. 
\n\n```{r geom-half-range, fig.height = 3}\n\n# extract exons and cds for the two transcripts to be compared\nsod1_201_exons <- sod1_exons %>% dplyr::filter(transcript_name == \"SOD1-201\")\nsod1_201_cds <- sod1_cds %>% dplyr::filter(transcript_name == \"SOD1-201\")\nsod1_202_exons <- sod1_exons %>% dplyr::filter(transcript_name == \"SOD1-202\")\nsod1_202_cds <- sod1_cds %>% dplyr::filter(transcript_name == \"SOD1-202\")\n\nsod1_201_202_plot <- sod1_201_exons %>%\n    ggplot(aes(\n        xstart = start,\n        xend = end,\n        y = \"SOD1-201/202\"\n    )) +\n    geom_half_range(\n        fill = \"white\",\n        height = 0.125\n    ) +\n    geom_half_range(\n        data = sod1_201_cds\n    ) +\n    geom_intron(\n        data = to_intron(sod1_201_exons, \"transcript_name\")\n    ) +\n    geom_half_range(\n        data = sod1_202_exons,\n        range.orientation = \"top\",\n        fill = \"white\",\n        height = 0.125\n    ) +\n    geom_half_range(\n        data = sod1_202_cds,\n        range.orientation = \"top\",\n        fill = \"purple\"\n    ) +\n    geom_intron(\n        data = to_intron(sod1_202_exons, \"transcript_name\")\n    )\n\nsod1_201_202_plot\n\n```\n\nAs a `ggplot2` extension, `ggtranscript` inherits the familiarity and functionality of `ggplot2`. For instance, by leveraging `coord_cartesian()`, users can zoom in on regions of interest. \n\n```{r geom-half-range-zoomed, fig.height = 3}\n\nsod1_201_202_plot + coord_cartesian(xlim = c(31659500, 31660000))\n\n```\n\n`geom_junction()` enables the plotting of junction curves, which can be overlaid across transcript structures. `geom_junction_label_repel()` adds a label to junction curves, which can often be useful to mark junctions with a metric of their usage such as read counts. 
\n\n```{r geom-junction, fig.height = 3}\n\n# ggtranscript includes a set of example (unannotated) junctions\n# originating from GTEx and downloaded via the Bioconductor package snapcount\nsod1_junctions\n\n# add transcript_name to junctions for plotting\nsod1_junctions <- sod1_junctions %>%\n    dplyr::mutate(transcript_name = \"SOD1-201\")\n\nsod1_201_exons %>%\n  ggplot(aes(\n    xstart = start,\n    xend = end,\n    y = transcript_name\n  )) +\n  geom_range(\n    fill = \"white\", \n    height = 0.25\n  ) +\n  geom_range(\n    data = sod1_201_cds\n  ) + \n  geom_intron(\n    data = to_intron(sod1_201_exons, \"transcript_name\")\n  ) + \n  geom_junction(\n    data = sod1_junctions,\n    junction.y.max = 0.5\n  ) +\n  geom_junction_label_repel(\n    data = sod1_junctions,\n    aes(label = round(mean_count, 2)),\n    junction.y.max = 0.5\n  )\n\n```\n\nAlternatively, users may prefer to map junction read counts to the thickness of the junction curves. As a `ggplot2` extension, this can be done intuitively by modifying the size `aes()` of `geom_junction()`. In addition, by modifying `ggplot2` scales and themes, users can easily create informative, publication-ready plots.\n\n```{r geom-junction-pub, fig.height = 3}\n\nsod1_201_exons %>%\n  ggplot(aes(\n    xstart = start,\n    xend = end,\n    y = transcript_name\n  )) +\n  geom_range(\n    fill = \"white\", \n    height = 0.25\n  ) +\n  geom_range(\n    data = sod1_201_cds\n  ) + \n  geom_intron(\n    data = to_intron(sod1_201_exons, \"transcript_name\")\n  ) + \n  geom_junction(\n    data = sod1_junctions,\n    aes(size = mean_count),\n    junction.y.max = 0.5, \n    ncp = 30, \n    colour = \"purple\"\n  ) + \n  scale_size_continuous(range = c(0.1, 1), guide = \"none\") + \n  xlab(\"Genomic position (chr21)\") + \n  ylab(\"Transcript name\") + \n  theme_bw()\n\n```\n\n## Citation\n\n```{r citing-ggtranscript}\n\ncitation(\"ggtranscript\")\n\n```\n\n## Credits\n\n* `ggtranscript` was developed using `biocthis`.\n"
  },
  {
    "path": "README.md",
    "content": "\n# ggtranscript <img src=\"man/figures/ggtranscript_logo_cropped.svg\" align=\"right\" height=\"139\" />\n\n<!-- badges: start -->\n\n[![GitHub\nissues](https://img.shields.io/github/issues/dzhang32/ggtranscript)](https://github.com/dzhang32/ggtranscript/issues)\n[![GitHub\npulls](https://img.shields.io/github/issues-pr/dzhang32/ggtranscript)](https://github.com/dzhang32/ggtranscript/pulls)\n[![Lifecycle:\nexperimental](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](https://lifecycle.r-lib.org/articles/stages.html#experimental)\n[![R-CMD-check-bioc](https://github.com/dzhang32/ggtranscript/workflows/R-CMD-check-bioc/badge.svg)](https://github.com/dzhang32/ggtranscript/actions)\n[![Codecov test\ncoverage](https://codecov.io/gh/dzhang32/ggtranscript/branch/main/graph/badge.svg)](https://app.codecov.io/gh/dzhang32/ggtranscript?branch=main)\n<!-- badges: end -->\n\n`ggtranscript` is a `ggplot2` extension that makes it to easy to\nvisualize transcript structure and annotation.\n\n## Installation\n\n``` r\n# you can install the development version of ggtranscript from GitHub:\n# install.packages(\"devtools\")\ndevtools::install_github(\"dzhang32/ggtranscript\")\n```\n\n## Usage\n\n`ggtranscript` introduces 5 new geoms (`geom_range()`,\n`geom_half_range()`, `geom_intron()`, `geom_junction()` and\n`geom_junction_label_repel()`) and several helper functions designed to\nfacilitate the visualization of transcript structure and annotation. The\nfollowing guide takes you on a quick tour of using these geoms, for a\nmore detailed overview see the [Getting Started\ntutorial](https://dzhang32.github.io/ggtranscript/articles/ggtranscript.html).\n\n`geom_range()` and `geom_intron()` enable the plotting of exons and\nintrons, the core components of transcript annotation. `ggtranscript`\nalso provides `to_intron()`, which converts exon co-ordinates to the\ncorresponding introns. 
Together, `ggtranscript` enables users to plot\ntranscript structures with only exons as the required input and just a\nfew lines of code.\n\n``` r\nlibrary(magrittr)\nlibrary(dplyr)\n#> \n#> Attaching package: 'dplyr'\n#> The following objects are masked from 'package:stats':\n#> \n#>     filter, lag\n#> The following objects are masked from 'package:base':\n#> \n#>     intersect, setdiff, setequal, union\nlibrary(ggplot2)\nlibrary(ggtranscript)\n\n# to illustrate the package's functionality\n# ggtranscript includes example transcript annotation\nsod1_annotation %>% head()\n#> # A tibble: 6 × 8\n#>   seqnames    start      end strand type        gene_name transcript_name\n#>   <fct>       <int>    <int> <fct>  <fct>       <chr>     <chr>          \n#> 1 21       31659666 31668931 +      gene        SOD1      <NA>           \n#> 2 21       31659666 31668931 +      transcript  SOD1      SOD1-202       \n#> 3 21       31659666 31659784 +      exon        SOD1      SOD1-202       \n#> 4 21       31659770 31659784 +      CDS         SOD1      SOD1-202       \n#> 5 21       31659770 31659772 +      start_codon SOD1      SOD1-202       \n#> 6 21       31663790 31663886 +      exon        SOD1      SOD1-202       \n#> # ℹ 1 more variable: transcript_biotype <chr>\n\n# extract exons\nsod1_exons <- sod1_annotation %>% dplyr::filter(type == \"exon\")\n\nsod1_exons %>%\n    ggplot(aes(\n        xstart = start,\n        xend = end,\n        y = transcript_name\n    )) +\n    geom_range(\n        aes(fill = transcript_biotype)\n    ) +\n    geom_intron(\n        data = to_intron(sod1_exons, \"transcript_name\"),\n        aes(strand = strand)\n    )\n```\n\n<img src=\"man/figures/README-geom-range-intron-1.png\" width=\"100%\" />\n\n`ggtranscript` provides the helper function `shorten_gaps()`, which\nreduces the size of the gaps. `shorten_gaps()` then rescales the exon\nand intron co-ordinates to preserve the original exon alignment. 
This\nallows you to home in on the differences in the exonic structure, which\ncan be particularly useful if the transcript has relatively long introns.\n\n``` r\n\nsod1_rescaled <- shorten_gaps(\n  sod1_exons, \n  to_intron(sod1_exons, \"transcript_name\"), \n  group_var = \"transcript_name\"\n  )\n\nsod1_rescaled %>%\n    dplyr::filter(type == \"exon\") %>%\n    ggplot(aes(\n        xstart = start,\n        xend = end,\n        y = transcript_name\n    )) +\n    geom_range(\n      aes(fill = transcript_biotype)\n    ) +\n    geom_intron(\n        data = sod1_rescaled %>% dplyr::filter(type == \"intron\"), \n        arrow.min.intron.length = 200\n    )\n```\n\n<img src=\"man/figures/README-shorten-gaps-1.png\" width=\"100%\" />\n\n`geom_range()` can be used for any range-based genomic annotation. For\nexample, when plotting protein-coding transcripts, users may find it\nhelpful to visually distinguish the coding segments from UTRs.\n\n``` r\n# filter for only exons from protein coding transcripts\nsod1_exons_prot_cod <- sod1_exons %>%\n    dplyr::filter(transcript_biotype == \"protein_coding\")\n\n# obtain cds\nsod1_cds <- sod1_annotation %>% dplyr::filter(type == \"CDS\")\n\nsod1_exons_prot_cod %>%\n    ggplot(aes(\n        xstart = start,\n        xend = end,\n        y = transcript_name\n    )) +\n    geom_range(\n        fill = \"white\",\n        height = 0.25\n    ) +\n    geom_range(\n        data = sod1_cds\n    ) +\n    geom_intron(\n        data = to_intron(sod1_exons_prot_cod, \"transcript_name\"),\n        aes(strand = strand),\n        arrow.min.intron.length = 500,\n    )\n```\n\n<img src=\"man/figures/README-geom-range-intron-w-cds-1.png\" width=\"100%\" />\n\n`geom_half_range()` takes advantage of the vertical symmetry of\ntranscript annotation by plotting only half of a range on the top or\nbottom of a transcript structure. 
One use case of `geom_half_range()` is\nto visualize the differences between transcript structure more clearly.\n\n``` r\n\n# extract exons and cds for the two transcripts to be compared\nsod1_201_exons <- sod1_exons %>% dplyr::filter(transcript_name == \"SOD1-201\")\nsod1_201_cds <- sod1_cds %>% dplyr::filter(transcript_name == \"SOD1-201\")\nsod1_202_exons <- sod1_exons %>% dplyr::filter(transcript_name == \"SOD1-202\")\nsod1_202_cds <- sod1_cds %>% dplyr::filter(transcript_name == \"SOD1-202\")\n\nsod1_201_202_plot <- sod1_201_exons %>%\n    ggplot(aes(\n        xstart = start,\n        xend = end,\n        y = \"SOD1-201/202\"\n    )) +\n    geom_half_range(\n        fill = \"white\",\n        height = 0.125\n    ) +\n    geom_half_range(\n        data = sod1_201_cds\n    ) +\n    geom_intron(\n        data = to_intron(sod1_201_exons, \"transcript_name\")\n    ) +\n    geom_half_range(\n        data = sod1_202_exons,\n        range.orientation = \"top\",\n        fill = \"white\",\n        height = 0.125\n    ) +\n    geom_half_range(\n        data = sod1_202_cds,\n        range.orientation = \"top\",\n        fill = \"purple\"\n    ) +\n    geom_intron(\n        data = to_intron(sod1_202_exons, \"transcript_name\")\n    )\n\nsod1_201_202_plot\n```\n\n<img src=\"man/figures/README-geom-half-range-1.png\" width=\"100%\" />\n\nAs a `ggplot2` extension, `ggtranscript` inherits the familiarity and\nfunctionality of `ggplot2`. For instance, by leveraging\n`coord_cartesian()`, users can zoom in on regions of interest.\n\n``` r\n\nsod1_201_202_plot + coord_cartesian(xlim = c(31659500, 31660000))\n```\n\n<img src=\"man/figures/README-geom-half-range-zoomed-1.png\" width=\"100%\" />\n\n`geom_junction()` enables the plotting of junction curves, which can be\noverlaid across transcript structures. 
`geom_junction_label_repel()`\nadds a label to junction curves, which can often be useful to mark\njunctions with a metric of their usage such as read counts.\n\n``` r\n\n# ggtranscript includes a set of example (unannotated) junctions\n# originating from GTEx and downloaded via the Bioconductor package snapcount\nsod1_junctions\n#> # A tibble: 5 × 5\n#>   seqnames    start      end strand mean_count\n#>   <fct>       <int>    <int> <fct>       <dbl>\n#> 1 chr21    31659787 31666448 +           0.463\n#> 2 chr21    31659842 31660554 +           0.831\n#> 3 chr21    31659842 31663794 +           0.316\n#> 4 chr21    31659842 31667257 +           4.35 \n#> 5 chr21    31660351 31663789 +           0.324\n\n# add transcript_name to junctions for plotting\nsod1_junctions <- sod1_junctions %>%\n    dplyr::mutate(transcript_name = \"SOD1-201\")\n\nsod1_201_exons %>%\n  ggplot(aes(\n    xstart = start,\n    xend = end,\n    y = transcript_name\n  )) +\n  geom_range(\n    fill = \"white\", \n    height = 0.25\n  ) +\n  geom_range(\n    data = sod1_201_cds\n  ) + \n  geom_intron(\n    data = to_intron(sod1_201_exons, \"transcript_name\")\n  ) + \n  geom_junction(\n    data = sod1_junctions,\n    junction.y.max = 0.5\n  ) +\n  geom_junction_label_repel(\n    data = sod1_junctions,\n    aes(label = round(mean_count, 2)),\n    junction.y.max = 0.5\n  )\n```\n\n<img src=\"man/figures/README-geom-junction-1.png\" width=\"100%\" />\n\nAlternatively, users may prefer to map junction read counts to the\nthickness of the junction curves. As a `ggplot2` extension, this can be\ndone intuitively by modifying the size `aes()` of `geom_junction()`. 
In\naddition, by modifying `ggplot2` scales and themes, users can easily\ncreate informative, publication-ready plots.\n\n``` r\n\nsod1_201_exons %>%\n  ggplot(aes(\n    xstart = start,\n    xend = end,\n    y = transcript_name\n  )) +\n  geom_range(\n    fill = \"white\", \n    height = 0.25\n  ) +\n  geom_range(\n    data = sod1_201_cds\n  ) + \n  geom_intron(\n    data = to_intron(sod1_201_exons, \"transcript_name\")\n  ) + \n  geom_junction(\n    data = sod1_junctions,\n    aes(size = mean_count),\n    junction.y.max = 0.5, \n    ncp = 30, \n    colour = \"purple\"\n  ) + \n  scale_size_continuous(range = c(0.1, 1), guide = \"none\") + \n  xlab(\"Genomic position (chr21)\") + \n  ylab(\"Transcript name\") + \n  theme_bw()\n#> Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.\n#> ℹ Please use `linewidth` instead.\n#> This warning is displayed once every 8 hours.\n#> Call `lifecycle::last_lifecycle_warnings()` to see where this warning was\n#> generated.\n```\n\n<img src=\"man/figures/README-geom-junction-pub-1.png\" width=\"100%\" />\n\n## Citation\n\n``` r\n\ncitation(\"ggtranscript\")\n#> To cite package 'ggtranscript' in publications use:\n#> \n#>   Gustavsson EK, Zhang D, Reynolds RH, Garcia-Ruiz S, Ryten M (2022).\n#>   \"ggtranscript: an R package for the visualization and interpretation\n#>   of transcript isoforms using ggplot2.\" _Bioinformatics_.\n#>   doi:10.1093/bioinformatics/btac409\n#>   <https://doi.org/10.1093/bioinformatics/btac409>,\n#>   <https://academic.oup.com/bioinformatics/article/38/15/3844/6617821>.\n#> \n#> A BibTeX entry for LaTeX users is\n#> \n#>   @Article{,\n#>     title = {ggtranscript: an R package for the visualization and interpretation of transcript isoforms using ggplot2},\n#>     author = {Emil K Gustavsson and David Zhang and Regina H Reynolds and Sonia Garcia-Ruiz and Mina Ryten},\n#>     year = {2022},\n#>     journal = {Bioinformatics},\n#>     doi = 
{https://doi.org/10.1093/bioinformatics/btac409},\n#>     url = {https://academic.oup.com/bioinformatics/article/38/15/3844/6617821},\n#>   }\n```\n\n## Credits\n\n- `ggtranscript` was developed using `biocthis`.\n"
  },
  {
    "path": "_pkgdown.yml",
    "content": "template:\n  bootstrap: 5\n  bootswatch: cosmo\n"
  },
  {
    "path": "codecov.yml",
    "content": "comment: false\n\ncoverage:\n  status:\n    project:\n      default:\n        target: auto\n        threshold: 1%\n        informational: true\n    patch:\n      default:\n        target: auto\n        threshold: 1%\n        informational: true\n"
  },
  {
    "path": "data-raw/ggplot2_exts_thumbnail.R",
    "content": "\n# Load libraries ----------------------------------------------------------\n\nlibrary(tidyverse)\ndevtools::load_all(\".\")\n\n# Main --------------------------------------------------------------------\n\nsod1_201_exons <- sod1_annotation %>%\n    dplyr::filter(\n        type == \"exon\",\n        transcript_name == \"SOD1-201\"\n    )\n\nsod1_201_cds <- sod1_annotation %>%\n    dplyr::filter(\n        type == \"CDS\",\n        transcript_name == \"SOD1-201\"\n    )\n\nsod1_junctions <- sod1_junctions %>% dplyr::mutate(transcript_name = \"SOD1-201\")\n\nggplot2_exts_figure <- sod1_201_exons %>%\n    ggplot(aes(\n        xstart = start,\n        xend = end,\n        y = transcript_name\n    )) +\n    geom_range(\n        fill = \"white\",\n        height = 0.125\n    ) +\n    geom_range(\n        data = sod1_201_cds,\n        height = 0.25\n    ) +\n    geom_intron(\n        data = to_intron(sod1_201_exons, \"transcript_name\")\n    ) +\n    geom_junction(\n        data = sod1_junctions,\n        aes(size = mean_count),\n        junction.y.max = 0.25,\n        ncp = 30,\n        colour = \"purple\"\n    ) +\n    scale_size_continuous(range = c(0.1, 1), guide = \"none\") +\n    xlab(\"Genomic position (chr21)\") +\n    ylab(\"Transcript name\") +\n    theme_bw() +\n    theme(\n        axis.line = element_line(colour = \"black\"),\n        panel.grid = element_blank(),\n        panel.border = element_blank()\n    )\n\nggplot2_exts_figure\n\n# Save data ---------------------------------------------------------------\n\nggsave(\n    plot = ggplot2_exts_figure,\n    filename = here::here(\"man\", \"figures\", \"dzhang32-ggtranscript.png\"),\n    height = 3,\n    width = 3.5,\n    dpi = 600\n)\n"
  },
  {
    "path": "data-raw/ggtranscript_logo.R",
    "content": "\n# Load libraries ----------------------------------------------------------\n\nlibrary(tidyverse)\nlibrary(hexSticker)\nlibrary(showtext)\ndevtools::load_all(\".\")\n\n# Main --------------------------------------------------------------------\n\nlogo_exons <- tribble(\n    ~start, ~end, ~tx, ~letter,\n    #-------|-----|----|--------\n    150, 200, \"J\", \"T\",\n    500, 550, \"J\", \"T\",\n    300, 310, \"I\", \"T\",\n    350, 400, \"I\", \"T\",\n    300, 350, \"H\", \"T\",\n    390, 400, \"H\", \"T\",\n    300, 310, \"G\", \"T\",\n    350, 400, \"G\", \"T\",\n    700, 800, \"J\", \"X_top\",\n    1100, 1200, \"J\", \"X_top\",\n    700, 800, \"G\", \"X_bot\",\n    1100, 1200, \"G\", \"X_bot\"\n) %>%\n    dplyr::mutate(\n        tx = tx %>% factor(\n            levels = LETTERS[1:17]\n        )\n    )\n\nlogo_utr <- tribble(\n    ~start, ~end, ~tx, ~letter,\n    #-------|-----|----|--------\n    100, 150, \"J\", \"T\",\n    550, 600, \"J\", \"T\",\n)\n\nlogo_introns <- logo_exons %>%\n    dplyr::filter(letter == \"T\") %>%\n    to_intron(group_var = \"tx\")\n\nlogo_junctions <- logo_exons %>%\n    dplyr::filter(letter %in% c(\"X_top\", \"X_bot\")) %>%\n    to_intron(group_var = \"tx\")\n\nsize <- 0.3\ncolour <- \"black\"\nfill <- ggpubr::get_palette(\"jco\", 10)[10]\n\n# create T\nggtranscript_logo <- logo_exons %>%\n    dplyr::filter(letter == \"T\") %>%\n    ggplot(aes(\n        xstart = start,\n        xend = end,\n        y = tx\n    )) +\n    geom_range(\n        fill = fill,\n        size = size,\n        colour = colour\n    ) +\n    geom_range(\n        data = logo_utr,\n        fill = \"white\",\n        height = 0.25,\n        size = size,\n        colour = colour\n    ) +\n    geom_intron(\n        data = logo_introns,\n        size = size,\n        colour = colour,\n        arrow.min.intron.length = 100\n    )\n\nggtranscript_logo <- ggtranscript_logo +\n    geom_half_range(\n        data = logo_exons %>% dplyr::filter(letter == 
\"X_top\"),\n        range.orientation = \"top\",\n        fill = fill,\n        size = size,\n        colour = colour,\n    ) +\n    geom_half_range(\n        data = logo_exons %>% dplyr::filter(letter == \"X_bot\"),\n        fill = fill,\n        size = size,\n        colour = colour,\n    ) +\n    geom_junction(\n        data = logo_junctions %>% dplyr::filter(letter == \"X_bot\"),\n        size = size,\n        colour = colour,\n        junction.orientation = \"top\",\n        junction.y.max = 1.4,\n        ncp = 50\n    ) +\n    geom_junction(\n        data = logo_junctions %>% dplyr::filter(letter == \"X_top\"),\n        size = size,\n        colour = colour,\n        junction.orientation = \"bottom\",\n        junction.y.max = 1.4,\n        ncp = 50\n    )\n\nggtranscript_logo <- ggtranscript_logo +\n    scale_x_continuous(\n        limits = c(-300, 1600),\n        minor_breaks = seq(-300, 1500, 100)\n    ) +\n    scale_y_discrete(drop = FALSE) +\n    theme_bw() +\n    theme(\n        axis.text = element_blank(),\n        axis.ticks = element_blank(),\n        axis.title = element_blank(),\n        panel.grid.major = element_line(\n            size = size,\n            colour = ggpubr::get_palette(\"Greys\", 10)[2]\n        ),\n        panel.grid.minor = element_line(\n            size = size,\n            colour = ggpubr::get_palette(\"Greys\", 10)[2]\n        )\n    )\n\nggtranscript_logo\n\n# Save data ---------------------------------------------------------------\n\n# use font from https://fonts.google.com\nfont_add_google(name = \"Raleway\", family = \"Raleway\")\nshowtext_auto()\n\nggtranscript_logo_hex <- hexSticker::sticker(\n    # the plot (TX)\n    subplot = ggtranscript_logo,\n    s_x = 0.98,\n    s_y = 1.2,\n    s_width = 2.8,\n    s_height = 3,\n    # the package\n    package = \"ggtranscript\",\n    p_x = 1,\n    p_y = 0.65,\n    p_size = 35,\n    p_family = \"Raleway\",\n    p_fontface = \"bold\",\n    p_color = ggpubr::get_palette(\"jco\", 
10)[6],\n    # hex border\n    h_color = ggpubr::get_palette(\"jco\", 10)[6],\n    h_fill = \"white\",\n    h_size = 2,\n    # url\n    url = \"https://github.com/dzhang32/ggtranscript\",\n    u_family = \"Raleway\",\n    u_color = ggpubr::get_palette(\"jco\", 10)[6],\n    u_size = 6.5,\n    # general\n    filename = here::here(\"man\", \"figures\", \"ggtranscript_logo.png\"),\n    dpi = 600,\n    white_around_sticker = TRUE\n)\n\n# here::here(\"man\", \"figures\", \"ggtranscript_logo.png\") is then\n# manually cropped to remove the white background in Inkscape\nplot(ggtranscript_logo_hex)\n"
  },
  {
    "path": "data-raw/sod1_junctions.R",
    "content": "\n# Load libraries ----------------------------------------------------------\n\nlibrary(tidyverse)\nlibrary(snapcount)\nlibrary(SummarizedExperiment)\n\n# Main --------------------------------------------------------------------\n\n# obtain gtex junctions across SOD1\nsod1_query <- snapcount::QueryBuilder(compilation = \"gtex\", regions = \"SOD1\")\n\n# keeping only unannotated junctions\n# from liver where SOD1 is highly expressed\n# https://gtexportal.org/home/gene/SOD1\nsod1_query <- set_row_filters(sod1_query, annotated == 0)\nsod1_query <- set_column_filters(sod1_query, SMTS == \"Liver\")\n\nsod1_junctions <- snapcount::query_jx(sod1_query)\n\n# obtain mean counts\nmean_counts <-\n    sod1_junctions %>%\n    SummarizedExperiment::assays() %>%\n    .[[\"counts\"]] %>%\n    as.matrix() %>%\n    rowMeans()\n\nsod1_junctions <- sod1_junctions %>%\n    SummarizedExperiment::rowRanges() %>%\n    as.data.frame() %>%\n    dplyr::as_tibble()\n\n# minor QC and tidying of the junctions\nsod1_junctions <-\n    sod1_junctions %>%\n    dplyr::mutate(mean_count = mean_counts) %>%\n    dplyr::filter(mean_count > 0.3) %>%\n    dplyr::select(\n        seqnames,\n        start,\n        end,\n        strand,\n        mean_count\n    )\n\n# Save data ---------------------------------------------------------------\n\nusethis::use_data(\n    sod1_junctions,\n    compress = \"gzip\",\n    overwrite = TRUE\n)\n"
  },
  {
    "path": "data-raw/sod1_pknox1_annotation.R",
    "content": "\n# Load libraries ----------------------------------------------------------\n\nlibrary(tidyverse)\nlibrary(rtracklayer)\nlibrary(R.utils)\n\n# Main --------------------------------------------------------------------\n\ngtf_path <- file.path(tempdir(), \"Homo_sapiens.GRCh38.105.chr.gtf.gz\")\n\n# download ens 105 gtf\ndownload.file(\n    stringr::str_c(\n        \"http://ftp.ensembl.org/pub/release-105/gtf/homo_sapiens/\",\n        \"Homo_sapiens.GRCh38.105.chr.gtf.gz\"\n    ),\n    destfile = gtf_path\n)\n\n# unzip gtf\nR.utils::gunzip(gtf_path)\n\ngtf_path <- gtf_path %>%\n    stringr::str_remove(\"\\\\.gz$\")\n\ngtf <- rtracklayer::import(gtf_path)\n\n# extract example gene transcripts\n# convert to tibble()\nsod1_annotation <-\n    gtf[!is.na(gtf$gene_name) & gtf$gene_name == \"SOD1\"] %>%\n    as.data.frame() %>%\n    dplyr::as_tibble() %>%\n    dplyr::select(\n        seqnames,\n        start,\n        end,\n        strand,\n        type,\n        gene_name,\n        transcript_name,\n        transcript_biotype\n    )\n\npknox1_annotation <-\n    gtf[!is.na(gtf$gene_name) & gtf$gene_name == \"PKNOX1\"] %>%\n    as.data.frame() %>%\n    dplyr::as_tibble() %>%\n    dplyr::select(\n        seqnames,\n        start,\n        end,\n        strand,\n        type,\n        gene_name,\n        transcript_name,\n        transcript_biotype\n    )\n\n# Save data ---------------------------------------------------------------\n\nusethis::use_data(\n    sod1_annotation,\n    compress = \"gzip\",\n    overwrite = TRUE\n)\n\nusethis::use_data(\n    pknox1_annotation,\n    compress = \"gzip\",\n    overwrite = TRUE\n)\n"
  },
  {
    "path": "inst/CITATION",
    "content": "pkgVer <- function(pkg) {\n    if (!exists(\"meta\") || is.null(meta)) meta <- packageDescription(pkg)\n    ver <- meta$Version\n    paste0('https://github.com/dzhang32/', pkg, ' - R package version ', ver)\n}\n\nc(\n    bibentry(bibtype=\"article\",\n        title = \"ggtranscript: an R package for the visualization and interpretation of transcript isoforms using ggplot2\",\n        author = personList(\n            as.person(\"Emil K Gustavsson\"), \n            as.person(\"David Zhang\"),\n            as.person(\"Regina H Reynolds\"), \n            as.person(\"Sonia Garcia-Ruiz\"),\n            as.person(\"Mina Ryten\")\n        ),\n        year = 2022,\n        journal = \"Bioinformatics\",\n        doi = \"https://doi.org/10.1093/bioinformatics/btac409\",\n        url = \"https://academic.oup.com/bioinformatics/article/38/15/3844/6617821\"\n    )\n)\n"
  },
  {
    "path": "man/add_exon_number.Rd",
    "content": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/add_exon_number.R\n\\name{add_exon_number}\n\\alias{add_exon_number}\n\\title{Add exon number}\n\\usage{\nadd_exon_number(exons, group_var = NULL)\n}\n\\arguments{\n\\item{exons}{\\code{data.frame()} contains exons which can originate from multiple\ntranscripts differentiated by \\code{group_var}.}\n\n\\item{group_var}{\\code{character()} if input data originates from more than 1\ntranscript, \\code{group_var} must specify the column that differentiates\ntranscripts (e.g. \"transcript_id\").}\n}\n\\value{\n\\code{data.frame()} equivalent to input \\code{exons}, with the additional\ncolumn \"exon_number\".\n}\n\\description{\n\\code{add_exon_number()} adds the exon number (the order the exons are transcribed\nwithin each transcript) as a column in \\code{exons}. This can be useful when\nvisualizing long, complex transcript structures, in order to keep track of\nspecific exons of interest.\n}\n\\details{\nTo note, a \"strand\" column must be present within \\code{exons}. The strand is used\nto differentiate whether exon numbers should be calculated according to\nascending (\"+\") or descending (\"-\") genomic co-ordinates. 
For ambiguous\nstrands (\"*\"), \\code{add_exon_number()} will assume the strand to be \"+\".\n}\n\\examples{\nlibrary(magrittr)\nlibrary(ggplot2)\n\n# to illustrate the package's functionality\n# ggtranscript includes example transcript annotation\nsod1_annotation \\%>\\% head()\n\n# extract exons\nsod1_exons <- sod1_annotation \\%>\\% dplyr::filter(type == \"exon\")\nsod1_exons \\%>\\% head()\n\n# add the exon number for each transcript\nsod1_exons <- sod1_exons \\%>\\% add_exon_number(group_var = \"transcript_name\")\n\nbase <- sod1_exons \\%>\\%\n    ggplot(aes(\n        xstart = start,\n        xend = end,\n        y = transcript_name\n    )) +\n    geom_range() +\n    geom_intron(\n        data = to_intron(sod1_exons, \"transcript_name\"),\n        strand = \"+\"\n    )\n\n# it can be useful to annotate exons with their exon number\n# using ggplot2::geom_text()\nbase +\n    geom_text(aes(\n        x = (start + end) / 2, # plot label at midpoint of exon\n        label = exon_number\n    ),\n    size = 3.5,\n    nudge_y = 0.4\n    )\n\n# alternatively, use ggrepel::geom_label_repel()\n# to separate labels from exons\nbase +\n    ggrepel::geom_label_repel(ggplot2::aes(\n        x = (start + end) / 2,\n        label = exon_number\n    ),\n    size = 3.5,\n    min.segment.length = 0\n    )\n}\n"
  },
  {
    "path": "man/add_utr.Rd",
    "content": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/add_utr.R\n\\name{add_utr}\n\\alias{add_utr}\n\\title{Add untranslated regions (UTRs)}\n\\usage{\nadd_utr(exons, cds, group_var = NULL)\n}\n\\arguments{\n\\item{exons}{\\code{data.frame()} contains exons which can originate from multiple\ntranscripts differentiated by \\code{group_var}.}\n\n\\item{cds}{\\code{data.frame()} contains coding sequence ranges for the transcripts\nin \\code{exons}.}\n\n\\item{group_var}{\\code{character()} if input data originates from more than 1\ntranscript, \\code{group_var} must specify the column that differentiates\ntranscripts (e.g. \"transcript_id\").}\n}\n\\value{\n\\code{data.frame()} contains differentiated CDS and UTR ranges.\n}\n\\description{\nGiven a set of \\code{exons} (encompassing the CDS and UTRs) and \\code{cds} regions,\n\\code{add_utr()} will calculate and add the corresponding UTR regions as ranges.\nThis can be useful when combined with \\code{shorten_gaps()} to visualize\ntranscripts with long introns, whilst differentiating UTRs from CDS regions.\n}\n\\details{\nThe definition of the inputted \\code{cds} regions are expected to range from the\nbeginning of the start codon to the end of the stop codon. Sometimes, for\nexample in the case of Ensembl, reference annotation will omit the stop\ncodons from the CDS definition. 
In such cases, users should manually ensure\nthat the \\code{cds} includes both the start and stop codons.\n}\n\\examples{\n\nlibrary(magrittr)\nlibrary(ggplot2)\n\n# to illustrate the package's functionality\n# ggtranscript includes example transcript annotation\npknox1_annotation \\%>\\% head()\n\n# extract exons\npknox1_exons <- pknox1_annotation \\%>\\% dplyr::filter(type == \"exon\")\npknox1_exons \\%>\\% head()\n\n# extract cds\npknox1_cds <- pknox1_annotation \\%>\\% dplyr::filter(type == \"CDS\")\npknox1_cds \\%>\\% head()\n\n# the CDS definition originating from the Ensembl reference annotation\n# does not include the stop codon\n# we must incorporate the stop codons into the CDS manually\n# by adding 3 base pairs to the end of the CDS of each transcript\npknox1_cds_w_stop <- pknox1_cds \\%>\\%\n    dplyr::group_by(transcript_name) \\%>\\%\n    dplyr::mutate(\n        end = ifelse(end == max(end), end + 3, end)\n    ) \\%>\\%\n    dplyr::ungroup()\n\n# add_utr() adds ranges that represent the UTRs\npknox1_cds_utr <- add_utr(\n    pknox1_exons,\n    pknox1_cds_w_stop,\n    group_var = \"transcript_name\"\n)\n\npknox1_cds_utr \\%>\\% head()\n\n# this can be useful when combined with shorten_gaps()\n# to visualize transcripts with long introns whilst differentiating UTRs\npknox1_cds_utr_rescaled <-\n    shorten_gaps(\n        exons = pknox1_cds_utr,\n        introns = to_intron(pknox1_cds_utr, \"transcript_name\"),\n        group_var = \"transcript_name\"\n    )\n\npknox1_cds_utr_rescaled \\%>\\%\n    dplyr::filter(type == \"CDS\") \\%>\\%\n    ggplot(aes(\n        xstart = start,\n        xend = end,\n        y = transcript_name\n    )) +\n    geom_range() +\n    geom_range(\n        data = pknox1_cds_utr_rescaled \\%>\\% dplyr::filter(type == \"UTR\"),\n        height = 0.25,\n        fill = \"white\"\n    ) +\n    geom_intron(\n        data = to_intron(\n            pknox1_cds_utr_rescaled \\%>\\% dplyr::filter(type != \"intron\"),\n            
\"transcript_name\"\n        ),\n        arrow.min.intron.length = 110\n    )\n}\n"
  },
  {
    "path": "man/geom_intron.Rd",
    "content": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/geom_intron.R\n\\name{geom_intron}\n\\alias{geom_intron}\n\\title{Plot intron lines with strand arrows}\n\\usage{\ngeom_intron(\n  mapping = NULL,\n  data = NULL,\n  stat = \"identity\",\n  position = \"identity\",\n  ...,\n  arrow = grid::arrow(ends = \"last\", length = grid::unit(0.1, \"inches\")),\n  arrow.fill = NULL,\n  lineend = \"butt\",\n  linejoin = \"round\",\n  na.rm = FALSE,\n  arrow.min.intron.length = 0,\n  show.legend = NA,\n  inherit.aes = TRUE\n)\n}\n\\arguments{\n\\item{mapping}{Set of aesthetic mappings created by \\code{\\link[ggplot2:aes]{aes()}}. If specified and\n\\code{inherit.aes = TRUE} (the default), it is combined with the default mapping\nat the top level of the plot. You must supply \\code{mapping} if there is no plot\nmapping.}\n\n\\item{data}{The data to be displayed in this layer. There are three\noptions:\n\nIf \\code{NULL}, the default, the data is inherited from the plot\ndata as specified in the call to \\code{\\link[ggplot2:ggplot]{ggplot()}}.\n\nA \\code{data.frame}, or other object, will override the plot\ndata. All objects will be fortified to produce a data frame. See\n\\code{\\link[ggplot2:fortify]{fortify()}} for which variables will be created.\n\nA \\code{function} will be called with a single argument,\nthe plot data. The return value must be a \\code{data.frame}, and\nwill be used as the layer data. A \\code{function} can be created\nfrom a \\code{formula} (e.g. \\code{~ head(.x, 10)}).}\n\n\\item{stat}{The statistical transformation to use on the data for this layer.\nWhen using a \\verb{geom_*()} function to construct a layer, the \\code{stat}\nargument can be used the override the default coupling between geoms and\nstats. The \\code{stat} argument accepts the following:\n\\itemize{\n\\item A \\code{Stat} ggproto subclass, for example \\code{StatCount}.\n\\item A string naming the stat. 
To give the stat as a string, strip the\nfunction name of the \\code{stat_} prefix. For example, to use \\code{stat_count()},\ngive the stat as \\code{\"count\"}.\n\\item For more information and other ways to specify the stat, see the\n\\link[ggplot2:layer_stats]{layer stat} documentation.\n}}\n\n\\item{position}{A position adjustment to use on the data for this layer. This\ncan be used in various ways, including to prevent overplotting and\nimproving the display. The \\code{position} argument accepts the following:\n\\itemize{\n\\item The result of calling a position function, such as \\code{position_jitter()}.\nThis method allows for passing extra arguments to the position.\n\\item A string naming the position adjustment. To give the position as a\nstring, strip the function name of the \\code{position_} prefix. For example,\nto use \\code{position_jitter()}, give the position as \\code{\"jitter\"}.\n\\item For more information and other ways to specify the position, see the\n\\link[ggplot2:layer_positions]{layer position} documentation.\n}}\n\n\\item{...}{Other arguments passed on to \\code{\\link[ggplot2:layer]{layer()}}'s \\code{params} argument. These\narguments broadly fall into one of 4 categories below. Notably, further\narguments to the \\code{position} argument, or aesthetics that are required\ncan \\emph{not} be passed through \\code{...}. Unknown arguments that are not part\nof the 4 categories below are ignored.\n\\itemize{\n\\item Static aesthetics that are not mapped to a scale, but are at a fixed\nvalue and apply to the layer as a whole. For example, \\code{colour = \"red\"}\nor \\code{linewidth = 3}. The geom's documentation has an \\strong{Aesthetics}\nsection that lists the available options. The 'required' aesthetics\ncannot be passed on to the \\code{params}. 
Please note that while passing\nunmapped aesthetics as vectors is technically possible, the order and\nrequired length is not guaranteed to be parallel to the input data.\n\\item When constructing a layer using\na \\verb{stat_*()} function, the \\code{...} argument can be used to pass on\nparameters to the \\code{geom} part of the layer. An example of this is\n\\code{stat_density(geom = \"area\", outline.type = \"both\")}. The geom's\ndocumentation lists which parameters it can accept.\n\\item Inversely, when constructing a layer using a\n\\verb{geom_*()} function, the \\code{...} argument can be used to pass on parameters\nto the \\code{stat} part of the layer. An example of this is\n\\code{geom_area(stat = \"density\", adjust = 0.5)}. The stat's documentation\nlists which parameters it can accept.\n\\item The \\code{key_glyph} argument of \\code{\\link[ggplot2:layer]{layer()}} may also be passed on through\n\\code{...}. This can be one of the functions described as\n\\link[ggplot2:draw_key]{key glyphs}, to change the display of the layer in the legend.\n}}\n\n\\item{arrow}{specification for arrow heads, as created by \\code{\\link[grid:arrow]{grid::arrow()}}.}\n\n\\item{arrow.fill}{fill colour to use for the arrow head (if closed). \\code{NULL}\nmeans use \\code{colour} aesthetic.}\n\n\\item{lineend}{Line end style (round, butt, square).}\n\n\\item{linejoin}{Line join style (round, mitre, bevel).}\n\n\\item{na.rm}{If \\code{FALSE}, the default, missing values are removed with\na warning. If \\code{TRUE}, missing values are silently removed.}\n\n\\item{arrow.min.intron.length}{\\code{integer()} the minimum required width of an\nintron for a strand arrow to be drawn. This can be useful to remove strand\narrows on short introns that overlap adjacent exons.}\n\n\\item{show.legend}{logical. 
Should this layer be included in the legends?\n\\code{NA}, the default, includes if any aesthetics are mapped.\n\\code{FALSE} never includes, and \\code{TRUE} always includes.\nIt can also be a named logical vector to finely select the aesthetics to\ndisplay.}\n\n\\item{inherit.aes}{If \\code{FALSE}, overrides the default aesthetics,\nrather than combining with them. This is most useful for helper functions\nthat define both data and aesthetics and shouldn't inherit behaviour from\nthe default plot specification, e.g. \\code{\\link[ggplot2:borders]{borders()}}.}\n}\n\\value{\nthe return value of a \\verb{geom_*} function is not intended to be\ndirectly handled by users. Therefore, \\verb{geom_*} functions should never be\nexecuted in isolation, rather used in combination with a\n\\code{ggplot2::ggplot()} call.\n}\n\\description{\n\\code{geom_intron()} draws horizontal lines with central arrows that are designed\nto represent introns. In combination with \\code{geom_range()}/\\code{geom_half_range()},\nthese geoms form the core components for visualizing transcript structures.\n}\n\\details{\n\\code{geom_intron()} requires the following \\code{aes()}; \\code{xstart}, \\code{xend} and \\code{y}\n(e.g. transcript name). If users do not have intron co-ordinates, these can\nbe generated from the corresponding exons using \\code{to_intron()}. The \\code{strand}\noption (one of \"+\" or \"-\") adjusts the arrow direction to match the direction\nof transcription. 
The \\code{arrow.min.intron.length} parameter can be useful to\nremove strand arrows that overlap exons, which can be a problem if plotted\nintrons include those that are relatively short.\n}\n\\examples{\n\nlibrary(magrittr)\nlibrary(ggplot2)\n\n# to illustrate the package's functionality\n# ggtranscript includes example transcript annotation\npknox1_annotation \\%>\\% head()\n\n# extract exons\npknox1_exons <- pknox1_annotation \\%>\\% dplyr::filter(type == \"exon\")\npknox1_exons \\%>\\% head()\n\n# to_intron() is a helper function included in ggtranscript\n# which is useful for converting exon co-ordinates to introns\npknox1_introns <- pknox1_exons \\%>\\% to_intron(group_var = \"transcript_name\")\npknox1_introns \\%>\\% head()\n\nbase <- pknox1_introns \\%>\\%\n    ggplot(aes(\n        xstart = start,\n        xend = end,\n        y = transcript_name\n    ))\n\n# by default, geom_intron() assumes introns originate from the \"+\" strand\nbase + geom_intron()\n\n# however this can be modified using the strand option\nbase + geom_intron(strand = \"-\")\n\n# strand can also be set as an aes()\nbase + geom_intron(aes(strand = strand))\n\n# as a ggplot2 extension, ggtranscript geoms inherit\n# the functionality of the parameters and aesthetics in ggplot2\nbase + geom_intron(\n    aes(colour = transcript_name),\n    linewidth = 1\n)\n\n# together, geom_range() and geom_intron() are designed to visualize\n# the core components of transcript annotation\npknox1_exons \\%>\\%\n    ggplot(aes(\n        xstart = start,\n        xend = end,\n        y = transcript_name\n    )) +\n    geom_range() +\n    geom_intron(\n        data = pknox1_introns\n    )\n\n# for short introns, sometimes strand arrows will overlap exons\n# to avoid this, users can set the arrow.min.intron.length parameter\npknox1_exons \\%>\\%\n    ggplot(aes(\n        xstart = start,\n        xend = end,\n        y = transcript_name\n    )) +\n    geom_range() +\n    geom_intron(\n        data = 
pknox1_introns,\n        arrow.min.intron.length = 3500\n    )\n}\n"
  },
  {
    "path": "man/geom_junction.Rd",
    "content": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/geom_junction.R\n\\name{geom_junction}\n\\alias{geom_junction}\n\\title{Plot junction curves}\n\\usage{\ngeom_junction(\n  mapping = NULL,\n  data = NULL,\n  stat = \"identity\",\n  position = \"identity\",\n  junction.orientation = \"alternating\",\n  junction.y.max = 1,\n  angle = 90,\n  ncp = 15,\n  na.rm = FALSE,\n  orientation = NA,\n  show.legend = NA,\n  inherit.aes = TRUE,\n  ...\n)\n}\n\\arguments{\n\\item{mapping}{Set of aesthetic mappings created by \\code{\\link[ggplot2:aes]{aes()}}. If specified and\n\\code{inherit.aes = TRUE} (the default), it is combined with the default mapping\nat the top level of the plot. You must supply \\code{mapping} if there is no plot\nmapping.}\n\n\\item{data}{The data to be displayed in this layer. There are three\noptions:\n\nIf \\code{NULL}, the default, the data is inherited from the plot\ndata as specified in the call to \\code{\\link[ggplot2:ggplot]{ggplot()}}.\n\nA \\code{data.frame}, or other object, will override the plot\ndata. All objects will be fortified to produce a data frame. See\n\\code{\\link[ggplot2:fortify]{fortify()}} for which variables will be created.\n\nA \\code{function} will be called with a single argument,\nthe plot data. The return value must be a \\code{data.frame}, and\nwill be used as the layer data. A \\code{function} can be created\nfrom a \\code{formula} (e.g. \\code{~ head(.x, 10)}).}\n\n\\item{stat}{The statistical transformation to use on the data for this layer.\nWhen using a \\verb{geom_*()} function to construct a layer, the \\code{stat}\nargument can be used the override the default coupling between geoms and\nstats. The \\code{stat} argument accepts the following:\n\\itemize{\n\\item A \\code{Stat} ggproto subclass, for example \\code{StatCount}.\n\\item A string naming the stat. To give the stat as a string, strip the\nfunction name of the \\code{stat_} prefix. 
For example, to use \\code{stat_count()},\ngive the stat as \\code{\"count\"}.\n\\item For more information and other ways to specify the stat, see the\n\\link[ggplot2:layer_stats]{layer stat} documentation.\n}}\n\n\\item{position}{A position adjustment to use on the data for this layer. This\ncan be used in various ways, including to prevent overplotting and\nimproving the display. The \\code{position} argument accepts the following:\n\\itemize{\n\\item The result of calling a position function, such as \\code{position_jitter()}.\nThis method allows for passing extra arguments to the position.\n\\item A string naming the position adjustment. To give the position as a\nstring, strip the function name of the \\code{position_} prefix. For example,\nto use \\code{position_jitter()}, give the position as \\code{\"jitter\"}.\n\\item For more information and other ways to specify the position, see the\n\\link[ggplot2:layer_positions]{layer position} documentation.\n}}\n\n\\item{junction.orientation}{\\code{character()} one of \"alternating\", \"top\" or\n\"bottom\", specifying where the junctions will be plotted with respect to\neach transcript (\\code{y}).}\n\n\\item{junction.y.max}{\\code{double()} the max y-value of each junction curve. It\ncan be useful to adjust this parameter when junction curves overlap with\none another/other transcripts or extend beyond the plot margins.}\n\n\\item{angle}{A numeric value between 0 and 180,\n    giving an amount to skew the control\n    points of the curve.  Values less than 90 skew the curve towards\n    the start point and values greater than 90 skew the curve\n    towards the end point.}\n\n\\item{ncp}{The number of control points used to draw the curve.\n    More control points creates a smoother curve.}\n\n\\item{na.rm}{If \\code{FALSE}, the default, missing values are removed with\na warning. If \\code{TRUE}, missing values are silently removed.}\n\n\\item{orientation}{The orientation of the layer. 
The default (\\code{NA})\nautomatically determines the orientation from the aesthetic mapping. In the\nrare event that this fails it can be given explicitly by setting \\code{orientation}\nto either \\code{\"x\"} or \\code{\"y\"}. See the \\emph{Orientation} section for more detail.}\n\n\\item{show.legend}{logical. Should this layer be included in the legends?\n\\code{NA}, the default, includes if any aesthetics are mapped.\n\\code{FALSE} never includes, and \\code{TRUE} always includes.\nIt can also be a named logical vector to finely select the aesthetics to\ndisplay.}\n\n\\item{inherit.aes}{If \\code{FALSE}, overrides the default aesthetics,\nrather than combining with them. This is most useful for helper functions\nthat define both data and aesthetics and shouldn't inherit behaviour from\nthe default plot specification, e.g. \\code{\\link[ggplot2:borders]{borders()}}.}\n\n\\item{...}{Other arguments passed on to \\code{\\link[ggplot2:layer]{layer()}}'s \\code{params} argument. These\narguments broadly fall into one of 4 categories below. Notably, further\narguments to the \\code{position} argument, or aesthetics that are required\ncan \\emph{not} be passed through \\code{...}. Unknown arguments that are not part\nof the 4 categories below are ignored.\n\\itemize{\n\\item Static aesthetics that are not mapped to a scale, but are at a fixed\nvalue and apply to the layer as a whole. For example, \\code{colour = \"red\"}\nor \\code{linewidth = 3}. The geom's documentation has an \\strong{Aesthetics}\nsection that lists the available options. The 'required' aesthetics\ncannot be passed on to the \\code{params}. Please note that while passing\nunmapped aesthetics as vectors is technically possible, the order and\nrequired length is not guaranteed to be parallel to the input data.\n\\item When constructing a layer using\na \\verb{stat_*()} function, the \\code{...} argument can be used to pass on\nparameters to the \\code{geom} part of the layer. 
An example of this is\n\\code{stat_density(geom = \"area\", outline.type = \"both\")}. The geom's\ndocumentation lists which parameters it can accept.\n\\item Inversely, when constructing a layer using a\n\\verb{geom_*()} function, the \\code{...} argument can be used to pass on parameters\nto the \\code{stat} part of the layer. An example of this is\n\\code{geom_area(stat = \"density\", adjust = 0.5)}. The stat's documentation\nlists which parameters it can accept.\n\\item The \\code{key_glyph} argument of \\code{\\link[ggplot2:layer]{layer()}} may also be passed on through\n\\code{...}. This can be one of the functions described as\n\\link[ggplot2:draw_key]{key glyphs}, to change the display of the layer in the legend.\n}}\n}\n\\value{\nthe return value of a \\verb{geom_*} function is not intended to be\ndirectly handled by users. Therefore, \\verb{geom_*} functions should never be\nexecuted in isolation, rather used in combination with a\n\\code{ggplot2::ggplot()} call.\n}\n\\description{\n\\code{geom_junction()} draws curves that are designed to represent junction reads\nfrom RNA-sequencing data. It can be useful to overlay junction data on\ntranscript annotation (plotted using \\code{geom_range()}/\\code{geom_half_range()} and\n\\code{geom_intron()}) to understand which splicing events or transcripts have\nsupport from RNA-sequencing data.\n}\n\\details{\n\\code{geom_junction()} requires the following \\code{aes()}; \\code{xstart}, \\code{xend} and \\code{y}\n(e.g. transcript name). \\code{geom_junction()} curves can be modified using\n\\code{junction.y.max}, which can be useful when junctions overlap one\nanother/other transcripts or extend beyond the plot margins. 
By default,\njunction curves will alternate between being plotted on the top and bottom of\neach transcript (\\code{y}), however this can be modified via\n\\code{junction.orientation}.\n}\n\\examples{\n\nlibrary(magrittr)\nlibrary(ggplot2)\n\n# to illustrate the package's functionality\n# ggtranscript includes example transcript annotation\nsod1_annotation \\%>\\% head()\n\n# as well as a set of example (unannotated) junctions\n# originating from GTEx and downloaded via the Bioconductor package snapcount\nsod1_junctions\n\n# extract exons\nsod1_exons <- sod1_annotation \\%>\\% dplyr::filter(\n    type == \"exon\",\n    transcript_name == \"SOD1-201\"\n)\nsod1_exons \\%>\\% head()\n\n# add transcript_name to junctions for plotting\nsod1_junctions <- sod1_junctions \\%>\\%\n    dplyr::mutate(transcript_name = \"SOD1-201\")\n\n# junctions can be plotted as curves using geom_junction()\nbase <- sod1_junctions \\%>\\%\n    ggplot2::ggplot(ggplot2::aes(\n        xstart = start,\n        xend = end,\n        y = transcript_name\n    ))\n\n# sometimes, depending on the number and widths of transcripts and junctions\n# junctions will overlap one another or extend beyond the plot margin\nbase + geom_junction()\n\n# in such cases, junction.y.max can be adjusted to modify the max y of curves\nbase + geom_junction(junction.y.max = 0.5)\n\n# ncp can be used to improve the smoothness of curves\nbase + geom_junction(junction.y.max = 0.5, ncp = 30)\n\n# junction.orientation controls where the junctions are plotted\n# with respect to each transcript\n# either alternating (default), or on the top or bottom\nbase + geom_junction(junction.orientation = \"top\", junction.y.max = 0.5)\nbase + geom_junction(junction.orientation = \"bottom\", junction.y.max = 0.5)\n\n# it can be useful to overlay junction curves onto existing annotation\n# plotted using geom_range() and geom_intron()\nbase <- sod1_exons \\%>\\%\n    ggplot(aes(\n        xstart = start,\n        xend = end,\n        y 
= transcript_name\n    )) +\n    geom_range() +\n    geom_intron(\n        data = to_intron(sod1_exons, \"transcript_name\")\n    )\n\nbase + geom_junction(\n    data = sod1_junctions,\n    junction.y.max = 0.5\n)\n\n# as a ggplot2 extension, ggtranscript geoms inherit\n# the functionality of the parameters and aesthetics in ggplot2\n# this can be useful when mapping junction thickness to their counts\nbase + geom_junction(\n    data = sod1_junctions,\n    aes(linewidth = mean_count),\n    junction.y.max = 0.5,\n    colour = \"purple\"\n) +\n    scale_linewidth(range = c(0.1, 1))\n\n# it can be useful to combine geom_junction() with geom_half_range()\nsod1_exons \\%>\\%\n    ggplot(aes(\n        xstart = start,\n        xend = end,\n        y = transcript_name\n    )) +\n    geom_half_range() +\n    geom_intron(\n        data = to_intron(sod1_exons, \"transcript_name\")\n    ) +\n    geom_junction(\n        data = sod1_junctions,\n        aes(linewidth = mean_count),\n        junction.y.max = 0.5,\n        junction.orientation = \"top\",\n        colour = \"purple\"\n    ) +\n    scale_linewidth(range = c(0.1, 1))\n}\n"
  },
  {
    "path": "man/geom_junction_label_repel.Rd",
    "content": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/geom_junction_label_repel.R\n\\name{geom_junction_label_repel}\n\\alias{geom_junction_label_repel}\n\\title{Label junction curves}\n\\usage{\ngeom_junction_label_repel(\n  mapping = NULL,\n  data = NULL,\n  stat = \"identity\",\n  position = \"identity\",\n  parse = FALSE,\n  ...,\n  junction.orientation = \"alternating\",\n  junction.y.max = 1,\n  angle = 90,\n  ncp = 15,\n  box.padding = 0.25,\n  label.padding = 0.25,\n  point.padding = 1e-06,\n  label.r = 0.15,\n  label.size = 0.25,\n  min.segment.length = 0,\n  arrow = NULL,\n  force = 1,\n  force_pull = 1,\n  max.time = 0.5,\n  max.iter = 10000,\n  max.overlaps = getOption(\"ggrepel.max.overlaps\", default = 10),\n  nudge_x = 0,\n  nudge_y = 0,\n  xlim = c(NA, NA),\n  ylim = c(NA, NA),\n  na.rm = FALSE,\n  show.legend = NA,\n  direction = c(\"both\", \"y\", \"x\"),\n  seed = NA,\n  verbose = FALSE,\n  inherit.aes = TRUE\n)\n}\n\\arguments{\n\\item{mapping}{Set of aesthetic mappings created by \\code{\\link[ggplot2]{aes}} or\n\\code{\\link[ggplot2]{aes_}}. If specified and \\code{inherit.aes = TRUE} (the\ndefault), is combined with the default mapping at the top level of the\nplot. You only need to supply \\code{mapping} if there isn't a mapping\ndefined for the plot.}\n\n\\item{data}{A data frame. If specified, overrides the default data frame\ndefined at the top level of the plot.}\n\n\\item{stat}{The statistical transformation to use on the data for this\nlayer, as a string.}\n\n\\item{position}{Position adjustment, either as a string, or the result of\na call to a position adjustment function.}\n\n\\item{parse}{If TRUE, the labels will be parsed into expressions and\ndisplayed as described in ?plotmath}\n\n\\item{...}{other arguments passed on to \\code{\\link[ggplot2]{layer}}. 
There are\n  three types of arguments you can use here:\n\n  \\itemize{\n    \\item Aesthetics: to set an aesthetic to a fixed value, like\n       \\code{colour = \"red\"} or \\code{size = 3}.\n    \\item Other arguments to the layer, for example you override the\n      default \\code{stat} associated with the layer.\n    \\item Other arguments passed on to the stat.\n  }}\n\n\\item{junction.orientation}{\\code{character()} one of \"alternating\", \"top\" or\n\"bottom\", specifying where the junctions will be plotted with respect to\neach transcript (\\code{y}).}\n\n\\item{junction.y.max}{\\code{double()} the max y-value of each junction curve. It\ncan be useful to adjust this parameter when junction curves overlap with\none another/other transcripts or extend beyond the plot margins.}\n\n\\item{angle}{A numeric value between 0 and 180,\n    giving an amount to skew the control\n    points of the curve.  Values less than 90 skew the curve towards\n    the start point and values greater than 90 skew the curve\n    towards the end point.}\n\n\\item{ncp}{The number of control points used to draw the curve.\n    More control points creates a smoother curve.}\n\n\\item{box.padding}{Amount of padding around bounding box, as unit or number.\nDefaults to 0.25. (Default unit is lines, but other units can be specified\nby passing \\code{unit(x, \"units\")}).}\n\n\\item{label.padding}{Amount of padding around label, as unit or number.\nDefaults to 0.25. (Default unit is lines, but other units can be specified\nby passing \\code{unit(x, \"units\")}).}\n\n\\item{point.padding}{Amount of padding around labeled point, as unit or\nnumber. Defaults to 0. (Default unit is lines, but other units can be\nspecified by passing \\code{unit(x, \"units\")}).}\n\n\\item{label.r}{Radius of rounded corners, as unit or number. Defaults\nto 0.15. 
(Default unit is lines, but other units can be specified by\npassing \\code{unit(x, \"units\")}).}\n\n\\item{label.size}{Size of label border, in mm.}\n\n\\item{min.segment.length}{Skip drawing segments shorter than this, as unit or\nnumber. Defaults to 0. (Default unit is lines, but other units can be\nspecified by passing \\code{unit(x, \"units\")}).}\n\n\\item{arrow}{specification for arrow heads, as created by \\code{\\link[grid]{arrow}}}\n\n\\item{force}{Force of repulsion between overlapping text labels. Defaults\nto 1.}\n\n\\item{force_pull}{Force of attraction between a text label and its\ncorresponding data point. Defaults to 1.}\n\n\\item{max.time}{Maximum number of seconds to try to resolve overlaps.\nDefaults to 0.5.}\n\n\\item{max.iter}{Maximum number of iterations to try to resolve overlaps.\nDefaults to 10000.}\n\n\\item{max.overlaps}{Exclude text labels when they overlap too many other\nthings. For each text label, we count how many other text labels or other\ndata points it overlaps, and exclude the text label if it has too many overlaps.\nDefaults to 10.}\n\n\\item{nudge_x, nudge_y}{Horizontal and vertical adjustments to nudge the\nstarting position of each text label. The units for \\code{nudge_x} and\n\\code{nudge_y} are the same as for the data units on the x-axis and y-axis.}\n\n\\item{xlim, ylim}{Limits for the x and y axes. Text labels will be constrained\nto these limits. By default, text labels are constrained to the entire plot\narea.}\n\n\\item{na.rm}{If \\code{FALSE} (the default), removes missing values with\na warning.  If \\code{TRUE} silently removes missing values.}\n\n\\item{show.legend}{logical. 
Should this layer be included in the legends?\n\\code{NA}, the default, includes if any aesthetics are mapped.\n\\code{FALSE} never includes, and \\code{TRUE} always includes.}\n\n\\item{direction}{\"both\", \"x\", or \"y\" -- direction in which to adjust position of labels}\n\n\\item{seed}{Random seed passed to \\code{\\link[base]{set.seed}}. Defaults to\n\\code{NA}, which means that \\code{set.seed} will not be called.}\n\n\\item{verbose}{If \\code{TRUE}, some diagnostics of the repel algorithm are printed}\n\n\\item{inherit.aes}{If \\code{FALSE}, overrides the default aesthetics,\nrather than combining with them. This is most useful for helper functions\nthat define both data and aesthetics and shouldn't inherit behaviour from\nthe default plot specification, e.g. \\code{\\link[ggplot2]{borders}}.}\n}\n\\value{\nthe return value of a \\verb{geom_*} function is not intended to be\ndirectly handled by users. Therefore, \\verb{geom_*} functions should never be\nexecuted in isolation, rather used in combination with a\n\\code{ggplot2::ggplot()} call.\n}\n\\description{\n\\code{geom_junction_label_repel()} labels junction curves at their midpoint using\n\\code{ggrepel::geom_label_repel()}. This can be useful to label and compare\njunctions (plotted using \\code{geom_junction()}) with metrics of their usage (e.g.\nread counts or percent-spliced-in).\n}\n\\details{\n\\code{geom_junction_label_repel()} requires the following \\code{aes()}; \\code{xstart},\n\\code{xend}, \\code{y} (e.g. transcript name) and \\code{label}. Under the hood,\n\\code{geom_junction_label_repel()} generates the same junction curves as\n\\code{geom_junction()} to obtain curve midpoints for labeling. 
Therefore, it is\nimportant that users use the same input data and parameters that alter\njunction curves (namely \\code{junction.orientation}, \\code{junction.y.max}, \\code{angle},\n\\code{ncp}) for \\code{geom_junction_label_repel()} that they have used for\n\\code{geom_junction()}.\n}\n\\examples{\nlibrary(magrittr)\nlibrary(ggplot2)\n\n# to illustrate the package's functionality\n# ggtranscript includes example transcript annotation\nsod1_annotation \\%>\\% head()\n\n# as well as a set of example (unannotated) junctions\n# originating from GTEx and downloaded via the Bioconductor package snapcount\nsod1_junctions\n\n# extract exons\nsod1_exons <- sod1_annotation \\%>\\% dplyr::filter(\n    type == \"exon\",\n    transcript_name == \"SOD1-201\"\n)\nsod1_exons \\%>\\% head()\n\n# add transcript_name to junctions for plotting\nsod1_junctions <- sod1_junctions \\%>\\%\n    dplyr::mutate(transcript_name = \"SOD1-201\")\n\n# geom_junction_label_repel() can be used to label junctions\nbase <- sod1_exons \\%>\\%\n    ggplot(aes(\n        xstart = start,\n        xend = end,\n        y = transcript_name\n    )) +\n    geom_range() +\n    geom_intron(\n        data = to_intron(sod1_exons, \"transcript_name\")\n    )\n\n# this can be useful to label junctions with their counts\nbase +\n    geom_junction(\n        data = sod1_junctions,\n        junction.y.max = 0.5\n    ) +\n    geom_junction_label_repel(\n        data = sod1_junctions,\n        aes(label = round(mean_count, 2)),\n        junction.y.max = 0.5\n    )\n}\n"
  },
  {
    "path": "man/geom_range.Rd",
    "content": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/geom_range.R, R/geom_half_range.R\n\\name{geom_range}\n\\alias{geom_range}\n\\alias{geom_half_range}\n\\title{Plot genomic ranges}\n\\usage{\ngeom_range(\n  mapping = NULL,\n  data = NULL,\n  stat = \"identity\",\n  position = \"identity\",\n  ...,\n  vjust = NULL,\n  linejoin = \"mitre\",\n  na.rm = FALSE,\n  show.legend = NA,\n  inherit.aes = TRUE\n)\n\ngeom_half_range(\n  mapping = NULL,\n  data = NULL,\n  stat = \"identity\",\n  position = \"identity\",\n  ...,\n  range.orientation = \"bottom\",\n  linejoin = \"mitre\",\n  na.rm = FALSE,\n  show.legend = NA,\n  inherit.aes = TRUE\n)\n}\n\\arguments{\n\\item{mapping}{Set of aesthetic mappings created by \\code{\\link[ggplot2:aes]{aes()}}. If specified and\n\\code{inherit.aes = TRUE} (the default), it is combined with the default mapping\nat the top level of the plot. You must supply \\code{mapping} if there is no plot\nmapping.}\n\n\\item{data}{The data to be displayed in this layer. There are three\noptions:\n\nIf \\code{NULL}, the default, the data is inherited from the plot\ndata as specified in the call to \\code{\\link[ggplot2:ggplot]{ggplot()}}.\n\nA \\code{data.frame}, or other object, will override the plot\ndata. All objects will be fortified to produce a data frame. See\n\\code{\\link[ggplot2:fortify]{fortify()}} for which variables will be created.\n\nA \\code{function} will be called with a single argument,\nthe plot data. The return value must be a \\code{data.frame}, and\nwill be used as the layer data. A \\code{function} can be created\nfrom a \\code{formula} (e.g. \\code{~ head(.x, 10)}).}\n\n\\item{stat}{The statistical transformation to use on the data for this layer.\nWhen using a \\verb{geom_*()} function to construct a layer, the \\code{stat}\nargument can be used the override the default coupling between geoms and\nstats. 
The \\code{stat} argument accepts the following:\n\\itemize{\n\\item A \\code{Stat} ggproto subclass, for example \\code{StatCount}.\n\\item A string naming the stat. To give the stat as a string, strip the\nfunction name of the \\code{stat_} prefix. For example, to use \\code{stat_count()},\ngive the stat as \\code{\"count\"}.\n\\item For more information and other ways to specify the stat, see the\n\\link[ggplot2:layer_stats]{layer stat} documentation.\n}}\n\n\\item{position}{A position adjustment to use on the data for this layer. This\ncan be used in various ways, including to prevent overplotting and\nimproving the display. The \\code{position} argument accepts the following:\n\\itemize{\n\\item The result of calling a position function, such as \\code{position_jitter()}.\nThis method allows for passing extra arguments to the position.\n\\item A string naming the position adjustment. To give the position as a\nstring, strip the function name of the \\code{position_} prefix. For example,\nto use \\code{position_jitter()}, give the position as \\code{\"jitter\"}.\n\\item For more information and other ways to specify the position, see the\n\\link[ggplot2:layer_positions]{layer position} documentation.\n}}\n\n\\item{...}{Other arguments passed on to \\code{\\link[ggplot2:layer]{layer()}}'s \\code{params} argument. These\narguments broadly fall into one of 4 categories below. Notably, further\narguments to the \\code{position} argument, or aesthetics that are required\ncan \\emph{not} be passed through \\code{...}. Unknown arguments that are not part\nof the 4 categories below are ignored.\n\\itemize{\n\\item Static aesthetics that are not mapped to a scale, but are at a fixed\nvalue and apply to the layer as a whole. For example, \\code{colour = \"red\"}\nor \\code{linewidth = 3}. The geom's documentation has an \\strong{Aesthetics}\nsection that lists the available options. The 'required' aesthetics\ncannot be passed on to the \\code{params}. 
Please note that while passing\nunmapped aesthetics as vectors is technically possible, the order and\nrequired length is not guaranteed to be parallel to the input data.\n\\item When constructing a layer using\na \\verb{stat_*()} function, the \\code{...} argument can be used to pass on\nparameters to the \\code{geom} part of the layer. An example of this is\n\\code{stat_density(geom = \"area\", outline.type = \"both\")}. The geom's\ndocumentation lists which parameters it can accept.\n\\item Inversely, when constructing a layer using a\n\\verb{geom_*()} function, the \\code{...} argument can be used to pass on parameters\nto the \\code{stat} part of the layer. An example of this is\n\\code{geom_area(stat = \"density\", adjust = 0.5)}. The stat's documentation\nlists which parameters it can accept.\n\\item The \\code{key_glyph} argument of \\code{\\link[ggplot2:layer]{layer()}} may also be passed on through\n\\code{...}. This can be one of the functions described as\n\\link[ggplot2:draw_key]{key glyphs}, to change the display of the layer in the legend.\n}}\n\n\\item{vjust}{A numeric vector specifying vertical justification.\n    If specified, overrides the \\code{just} setting.}\n\n\\item{linejoin}{Line join style (round, mitre, bevel).}\n\n\\item{na.rm}{If \\code{FALSE}, the default, missing values are removed with\na warning. If \\code{TRUE}, missing values are silently removed.}\n\n\\item{show.legend}{logical. Should this layer be included in the legends?\n\\code{NA}, the default, includes if any aesthetics are mapped.\n\\code{FALSE} never includes, and \\code{TRUE} always includes.\nIt can also be a named logical vector to finely select the aesthetics to\ndisplay.}\n\n\\item{inherit.aes}{If \\code{FALSE}, overrides the default aesthetics,\nrather than combining with them. This is most useful for helper functions\nthat define both data and aesthetics and shouldn't inherit behaviour from\nthe default plot specification, e.g. 
\\code{\\link[ggplot2:borders]{borders()}}.}\n\n\\item{range.orientation}{\\code{character()} one of \"top\" or \"bottom\", specifying\nwhere the half ranges will be plotted with respect to each transcript\n(\\code{y}).}\n}\n\\value{\nthe return value of a \\verb{geom_*} function is not intended to be\ndirectly handled by users. Therefore, \\verb{geom_*} functions should never be\nexecuted in isolation, rather used in combination with a\n\\code{ggplot2::ggplot()} call.\n}\n\\description{\n\\code{geom_range()} and \\code{geom_half_range()} draw tiles that are designed to\nrepresent range-based genomic features, such as exons. In combination with\n\\code{geom_intron()}, these geoms form the core components for visualizing\ntranscript structures.\n}\n\\details{\n\\code{geom_range()} and \\code{geom_half_range()} require the following \\code{aes()}:\n\\code{xstart}, \\code{xend} and \\code{y} (e.g. transcript name). \\code{geom_half_range()} takes\nadvantage of the vertical symmetry of transcript annotation by plotting only\nhalf of a range on the top or bottom of a transcript structure. This can be\nuseful for comparing two transcripts or freeing up plotting space for\nother transcript annotations (e.g. 
\\code{geom_junction()}).\n}\n\\examples{\n\nlibrary(magrittr)\nlibrary(ggplot2)\n\n# to illustrate the package's functionality\n# ggtranscript includes example transcript annotation\nsod1_annotation \\%>\\% head()\n\n# extract exons\nsod1_exons <- sod1_annotation \\%>\\% dplyr::filter(type == \"exon\")\nsod1_exons \\%>\\% head()\n\nbase <- sod1_exons \\%>\\%\n    ggplot(aes(\n        xstart = start,\n        xend = end,\n        y = transcript_name\n    ))\n\n# geom_range() is designed to visualise range-based annotation such as exons\nbase + geom_range()\n\n# geom_half_range() allows users to plot half ranges\n# on the top or bottom of the transcript\nbase + geom_half_range()\n\n# where the half ranges are plotted can be adjusted using range.orientation\nbase + geom_half_range(range.orientation = \"top\")\n\n# as a ggplot2 extension, ggtranscript geoms inherit\n# the functionality of the parameters and aesthetics in ggplot2\nbase + geom_range(\n    aes(fill = transcript_name),\n    linewidth = 1\n)\n\n# together, geom_range() and geom_intron() are designed to visualize\n# the core components of transcript annotation\nbase + geom_range(\n    aes(fill = transcript_biotype)\n) +\n    geom_intron(\n        data = to_intron(sod1_exons, \"transcript_name\")\n    )\n\n# for protein coding transcripts\n# geom_range() can be useful for visualizing UTRs that lie outside of the CDS\nsod1_exons_prot_coding <- sod1_exons \\%>\\%\n    dplyr::filter(transcript_biotype == \"protein_coding\")\n\n# extract cds\nsod1_cds <- sod1_annotation \\%>\\%\n    dplyr::filter(type == \"CDS\")\n\nsod1_exons_prot_coding \\%>\\%\n    ggplot(aes(\n        xstart = start,\n        xend = end,\n        y = transcript_name\n    )) +\n    geom_range(\n        fill = \"white\",\n        height = 0.25\n    ) +\n    geom_range(\n        data = sod1_cds\n    ) +\n    geom_intron(\n        data = to_intron(sod1_exons_prot_coding, \"transcript_name\")\n    )\n\n# geom_half_range() can be useful for 
comparing between two transcripts\n# enabling visualization of one transcript on the top, other on the bottom\nsod1_201_exons <- sod1_exons \\%>\\% dplyr::filter(transcript_name == \"SOD1-201\")\nsod1_201_cds <- sod1_cds \\%>\\% dplyr::filter(transcript_name == \"SOD1-201\")\nsod1_202_exons <- sod1_exons \\%>\\% dplyr::filter(transcript_name == \"SOD1-202\")\nsod1_202_cds <- sod1_cds \\%>\\% dplyr::filter(transcript_name == \"SOD1-202\")\n\nsod1_201_plot <- sod1_201_exons \\%>\\%\n    ggplot(aes(\n        xstart = start,\n        xend = end,\n        y = \"SOD1-201/202\"\n    )) +\n    geom_half_range(\n        fill = \"white\",\n        height = 0.125\n    ) +\n    geom_half_range(\n        data = sod1_201_cds\n    ) +\n    geom_intron(\n        data = to_intron(sod1_201_exons, \"transcript_name\")\n    )\n\nsod1_201_plot\n\nsod1_201_202_plot <- sod1_201_plot +\n    geom_half_range(\n        data = sod1_202_exons,\n        range.orientation = \"top\",\n        fill = \"white\",\n        height = 0.125\n    ) +\n    geom_half_range(\n        data = sod1_202_cds,\n        range.orientation = \"top\",\n        fill = \"purple\"\n    ) +\n    geom_intron(\n        data = to_intron(sod1_202_exons, \"transcript_name\")\n    )\n\nsod1_201_202_plot\n\n# leveraging existing ggplot2 functionality via e.g. coord_cartesian()\n# can be useful to zoom in on areas of interest\nsod1_201_202_plot + coord_cartesian(xlim = c(31659500, 31660000))\n}\n"
  },
  {
    "path": "man/ggtranscript.Rd",
    "content": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/ggtranscript-package.R\n\\docType{package}\n\\name{ggtranscript}\n\\alias{ggtranscript-package}\n\\alias{ggtranscript}\n\\title{\\code{ggtranscript}: Visualizing transcript structure and annotation using\n\\code{ggplot2}}\n\\description{\nThe goal of \\code{ggtranscript} is the simplify the process of visualizing\ntranscript structure and annotation. To achieve this, \\code{ggtranscript}\nintroduces 5 new geoms (\\code{geom_range()}, \\code{geom_half_range()}, \\code{geom_intron()},\n\\code{geom_junction()} and \\code{geom_junction_label_repel()}) as well as several\nhelper functions. As a \\code{ggplot2} extension, \\code{ggtranscript} inherits\n\\code{ggplot2}'s familiarity and flexibility, enabling users to intuitively adjust\naesthetics, parameters, scales etc as well as complement \\code{ggtranscript} geoms\nwith existing \\code{ggplot2} geoms to create informative, publication-ready plots.\n}\n\\seealso{\nUseful links:\n\\itemize{\n  \\item \\url{https://github.com/dzhang32/ggtranscript}\n  \\item Report bugs at \\url{https://github.com/dzhang32/ggtranscript/issues}\n}\n\n}\n\\author{\n\\strong{Maintainer}: David Zhang \\email{dyzhang32@gmail.com} (\\href{https://orcid.org/0000-0003-2382-8460}{ORCID})\n\nAuthors:\n\\itemize{\n  \\item Emil Gustavsson \\email{e.gustavsson@ucl.ac.uk} (\\href{https://orcid.org/0000-0003-0541-7537}{ORCID})\n}\n\nOther contributors:\n\\itemize{\n  \\item Regina Reynolds \\email{regina.reynolds.16@ucl.ac.uk} (\\href{https://orcid.org/0000-0001-6470-7919}{ORCID}) [contributor]\n  \\item Sonia Ruiz \\email{s.ruiz@ucl.ac.uk} [contributor]\n}\n\n}\n"
  },
  {
    "path": "man/shorten_gaps.Rd",
    "content": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/shorten_gaps.R\n\\name{shorten_gaps}\n\\alias{shorten_gaps}\n\\title{Improve transcript structure visualization by shortening gaps}\n\\usage{\nshorten_gaps(exons, introns, group_var = NULL, target_gap_width = 100L)\n}\n\\arguments{\n\\item{exons}{\\code{data.frame()} contains exons which can originate from multiple\ntranscripts differentiated by \\code{group_var}.}\n\n\\item{introns}{\\code{data.frame()} the intron co-ordinates corresponding to the\ninput \\code{exons}. This can be created by applying \\code{to_intron()} to the\n\\code{exons}. If introns originate from multiple transcripts, they must be\ndifferentiated using \\code{group_var}. If a user is not using \\code{to_intron()},\nthey must make sure intron start/ends are defined precisely as the adjacent\nexon boundaries (rather than exon end + 1 and exon start - 1).}\n\n\\item{group_var}{\\code{character()} if input data originates from more than 1\ntranscript, \\code{group_var} must specify the column that differentiates\ntranscripts (e.g. \"transcript_id\").}\n\n\\item{target_gap_width}{\\code{integer()} the width in base pairs to shorten the\ngaps to.}\n}\n\\value{\n\\code{data.frame()} contains the re-scaled co-ordinates of \\code{introns} and\n\\code{exons} of each input transcript with shortened gaps.\n}\n\\description{\nFor a given set of exons and introns, \\code{shorten_gaps()} reduces the width of\ngaps (regions that do not overlap any \\code{exons}) to a user-inputted\n\\code{target_gap_width}. This can be useful when visualizing transcripts that have\nlong introns, to hone in on the regions of interest (i.e. exons) and better\ncompare between transcript structures.\n}\n\\details{\nAfter \\code{shorten_gaps()} reduces the size of gaps, it will re-scale \\code{exons} and\n\\code{introns} to preserve exon alignment. This process will only reduce the width\nof input \\code{introns}, never \\code{exons}. 
Importantly, the outputted re-scaled\nco-ordinates should only be used for visualization as they will not match the\noriginal genomic coordinates.\n}\n\\examples{\n\nlibrary(magrittr)\nlibrary(ggplot2)\n\n# to illustrate the package's functionality\n# ggtranscript includes example transcript annotation\npknox1_annotation \\%>\\% head()\n\n# extract exons\npknox1_exons <- pknox1_annotation \\%>\\% dplyr::filter(type == \"exon\")\npknox1_exons \\%>\\% head()\n\n# to_intron() is a helper function included in ggtranscript\n# which is useful for converting exon co-ordinates to introns\npknox1_introns <- pknox1_exons \\%>\\% to_intron(group_var = \"transcript_name\")\npknox1_introns \\%>\\% head()\n\n# for transcripts with long introns, the exons of interest\n# can be difficult to visualize clearly when using the default scale\npknox1_exons \\%>\\%\n    ggplot(aes(\n        xstart = start,\n        xend = end,\n        y = transcript_name\n    )) +\n    geom_range() +\n    geom_intron(\n        data = pknox1_introns,\n        arrow.min.intron.length = 3500\n    )\n\n# in such cases it can be useful to rescale the exons and introns\n# using shorten_gaps() which shortens regions that do not overlap an exon\npknox1_rescaled <-\n    shorten_gaps(pknox1_exons, pknox1_introns, group_var = \"transcript_name\")\n\npknox1_rescaled \\%>\\% head()\n\n# this allows us to visualize differences in exonic structure more clearly\npknox1_rescaled \\%>\\%\n    dplyr::filter(type == \"exon\") \\%>\\%\n    ggplot(aes(\n        xstart = start,\n        xend = end,\n        y = transcript_name\n    )) +\n    geom_range() +\n    geom_intron(\n        data = pknox1_rescaled \\%>\\% dplyr::filter(type == \"intron\"),\n        arrow.min.intron.length = 300\n    )\n\n# shorten_gaps() can be used in combination with to_diff()\n# to further highlight differences in exon structure\n# here, all other transcripts are compared to the MANE-select transcript\npknox1_rescaled_diffs <- to_diff(\n    exons = 
pknox1_rescaled \\%>\\%\n        dplyr::filter(type == \"exon\", transcript_name != \"PKNOX1-201\"),\n    ref_exons = pknox1_rescaled \\%>\\%\n        dplyr::filter(type == \"exon\", transcript_name == \"PKNOX1-201\"),\n    group_var = \"transcript_name\"\n)\n\npknox1_rescaled \\%>\\%\n    dplyr::filter(type == \"exon\") \\%>\\%\n    ggplot(aes(\n        xstart = start,\n        xend = end,\n        y = transcript_name\n    )) +\n    geom_range() +\n    geom_intron(\n        data = pknox1_rescaled \\%>\\% dplyr::filter(type == \"intron\"),\n        arrow.min.intron.length = 300\n    ) +\n    geom_range(\n        data = pknox1_rescaled_diffs,\n        aes(fill = diff_type),\n        alpha = 0.2\n    )\n}\n"
  },
  {
    "path": "man/sod1_annotation.Rd",
    "content": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/data.R\n\\docType{data}\n\\name{sod1_annotation}\n\\alias{sod1_annotation}\n\\alias{pknox1_annotation}\n\\title{Example transcript annotation}\n\\format{\nA \\code{tibble::tibble()}:\n\\describe{\n\\item{seqnames}{\\code{factor()} chromosome.}\n\\item{start}{\\code{integer()} start position.}\n\\item{end}{\\code{integer()} end position.}\n\\item{strand}{\\code{factor()} strand.}\n\\item{type}{\\code{factor()} E.g.gene, transcript, exon or CDS.}\n\\item{gene_name}{\\code{character()} name of gene (GBA).}\n\\item{transcript_name}{\\code{character()} name of transcript.}\n\\item{transcript_biotype}{\\code{character()} biotype of transcript.}\n}\n\nAn object of class \\code{tbl_df} (inherits from \\code{tbl}, \\code{data.frame}) with 112 rows and 8 columns.\n}\n\\source{\ngenerated using \\code{ggtranscript/data-raw/sod1_pknox1_annotation.R}\n}\n\\usage{\nsod1_annotation\n\npknox1_annotation\n}\n\\description{\nTranscript annotation including the co-ordinates (hg38) of the genes,\ntranscripts, exons and CDS regions for \\emph{SOD1} and \\emph{PKNOX1}, which\noriginate from version 105 of the Ensembl reference annotation.\n}\n\\keyword{datasets}\n"
  },
  {
    "path": "man/sod1_junctions.Rd",
    "content": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/data.R\n\\docType{data}\n\\name{sod1_junctions}\n\\alias{sod1_junctions}\n\\title{Example junctions}\n\\format{\nA \\code{tibble::tibble()}:\n\\describe{\n\\item{seqnames}{\\code{factor()} chromosome.}\n\\item{start}{\\code{integer()} start position.}\n\\item{end}{\\code{integer()} end position.}\n\\item{strand}{\\code{factor()} strand.}\n\\item{mean_count}{\\code{factor()} Average count across all GTEx liver samples.}\n}\n}\n\\source{\ngenerated using \\code{ggtranscript/data-raw/sod1_junctions.R}\n}\n\\usage{\nsod1_junctions\n}\n\\description{\nJunction co-ordinates and counts associated with the \\emph{SOD1} gene.\nJunctions counts originate from GTEx liver samples and are downloaded via the\nBioconductor package \\code{snapcount}. Only unannotated junctions with a mean\ncount above 0.3 have been retained for this example.\n}\n\\keyword{datasets}\n"
  },
  {
    "path": "man/to_diff.Rd",
    "content": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/to_diff.R\n\\name{to_diff}\n\\alias{to_diff}\n\\title{Obtain the differences between transcript structure}\n\\usage{\nto_diff(exons, ref_exons, group_var = NULL)\n}\n\\arguments{\n\\item{exons}{\\code{data.frame()} contains exons which can originate from multiple\ntranscripts differentiated by \\code{group_var}.}\n\n\\item{ref_exons}{\\code{data.frame()} contains exons that originate from a single\ntranscript, which \\code{exons} will be compared against.}\n\n\\item{group_var}{\\code{character()} if input data originates from more than 1\ntranscript, \\code{group_var} must specify the column that differentiates\ntranscripts (e.g. \"transcript_id\").}\n}\n\\value{\n\\code{data.frame()} details the differences between \\code{exons} and\n\\code{ref_exons}.\n}\n\\description{\n\\code{to_diff()} obtains the difference between \\code{exons} from a set of transcripts\nto a reference transcript (\\code{ref_exons}). This can be useful when visualizing\nthe differences between transcript structure. \\code{to_diff()} expects two sets of\ninput exons; 1. \\code{exons} - exons from any number of transcripts that will be\ncompared to \\code{ref_exons} and 2. 
\\code{ref_exons} - exons from a single transcript\nwhich acts as the reference to compare against.\n}\n\\examples{\n\nlibrary(magrittr)\nlibrary(ggplot2)\n\n# to illustrate the package's functionality\n# ggtranscript includes example transcript annotation\nsod1_annotation \\%>\\% head()\n\n# extract exons\nsod1_exons <- sod1_annotation \\%>\\% dplyr::filter(type == \"exon\")\nsod1_exons \\%>\\% head()\n\n# for this example, let's compare transcripts to the MANE-select transcript\nsod1_mane <- sod1_exons \\%>\\% dplyr::filter(transcript_name == \"SOD1-201\")\nsod1_not_mane <- sod1_exons \\%>\\% dplyr::filter(transcript_name != \"SOD1-201\")\n\n# to_diff() obtains the differences between the exons as ranges\nsod1_diffs <- to_diff(\n    exons = sod1_not_mane,\n    ref_exons = sod1_mane,\n    group_var = \"transcript_name\"\n)\n\nsod1_diffs \\%>\\% head()\n\n# using geom_range(), it can be useful to visually overlay\n# the differences on top of the transcript annotation\nsod1_exons \\%>\\%\n    ggplot(aes(\n        xstart = start,\n        xend = end,\n        y = transcript_name\n    )) +\n    geom_range() +\n    geom_intron(\n        data = to_intron(sod1_exons, \"transcript_name\")\n    ) +\n    geom_range(\n        data = sod1_diffs,\n        ggplot2::aes(fill = diff_type),\n        alpha = 0.2\n    )\n}\n"
  },
  {
    "path": "man/to_intron.Rd",
    "content": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/to_intron.R\n\\name{to_intron}\n\\alias{to_intron}\n\\title{Convert exon co-ordinates to introns}\n\\usage{\nto_intron(exons, group_var = NULL)\n}\n\\arguments{\n\\item{exons}{\\code{data.frame()} contains exons which can originate from multiple\ntranscripts differentiated by \\code{group_var}.}\n\n\\item{group_var}{\\code{character()} if input data originates from more than 1\ntranscript, \\code{group_var} must specify the column that differentiates\ntranscripts (e.g. \"transcript_id\").}\n}\n\\value{\n\\code{data.frame()} contains the intron co-ordinates.\n}\n\\description{\nGiven a set of \\code{exons}, \\code{to_intron()} will return the corresponding introns.\n}\n\\details{\nIt is important to note that, for visualization purposes, \\code{to_intron()}\ndefines introns precisely as the exon boundaries, rather than the intron\nstart/end being (exon end + 1)/(exon start - 1).\n}\n\\examples{\nlibrary(magrittr)\nlibrary(ggplot2)\n\n# to illustrate the package's functionality\n# ggtranscript includes example transcript annotation\nsod1_annotation \\%>\\% head()\n\n# extract exons\nsod1_exons <- sod1_annotation \\%>\\% dplyr::filter(type == \"exon\")\nsod1_exons \\%>\\% head()\n\n# to_intron() is a helper function included in ggtranscript\n# which is useful for converting exon co-ordinates to introns\nsod1_introns <- sod1_exons \\%>\\% to_intron(group_var = \"transcript_name\")\nsod1_introns \\%>\\% head()\n\n# this can be particular useful when combined with\n# geom_range() and geom_intron()\n# to visualize the core components of transcript annotation\nsod1_exons \\%>\\%\n    ggplot(aes(\n        xstart = start,\n        xend = end,\n        y = transcript_name\n    )) +\n    geom_range() +\n    geom_intron(\n        data = to_intron(sod1_exons, \"transcript_name\")\n    )\n}\n"
  },
  {
    "path": "tests/testthat/test-add_exon_number.R",
    "content": "sod1_exons <- sod1_annotation %>%\n    dplyr::filter(type == \"exon\")\n\n# create dummy transcripts with both positive and minus strand\n# purely for testing strand functionality\ntest_exons <- sod1_exons %>%\n    dplyr::filter(transcript_name == \"SOD1-202\") %>%\n    dplyr::mutate(strand = \"-\") %>%\n    dplyr::bind_rows(\n        sod1_exons %>%\n            dplyr::filter(transcript_name == \"SOD1-201\")\n    )\n\n##### add_exon_number #####\n\ntestthat::test_that(\"add_exon_number() works correctly\", {\n    test_exon_number <- test_exons %>%\n        add_exon_number(group_var = \"transcript_name\")\n\n    test_exon_number_plus <- test_exon_number %>%\n        dplyr::filter(strand == \"+\")\n    test_exon_number_minus <- test_exon_number %>%\n        dplyr::filter(strand == \"-\")\n\n    expect_true(\"exon_number\" %in% colnames(test_exon_number))\n    expect_true(is.numeric(test_exon_number[[\"exon_number\"]]))\n\n    expect_equal(\n        test_exon_number_plus[[\"exon_number\"]],\n        seq_len(nrow(test_exon_number_plus))\n    )\n    expect_equal(\n        test_exon_number_minus[[\"exon_number\"]],\n        seq_len(nrow(test_exon_number_minus)) %>% rev()\n    )\n\n    # check order makes no difference\n    set.seed(32)\n    expect_equal(\n        test_exons[sample(seq_len(nrow(test_exons)), nrow(test_exons)), ] %>%\n            add_exon_number(group_var = \"transcript_name\"),\n        test_exon_number\n    )\n})\n\ntestthat::test_that(\"add_exon_number(group_var = NULL) works correctly\", {\n    test_exon_number_plus <- test_exons %>%\n        dplyr::filter(strand == \"+\") %>%\n        add_exon_number(group_var = NULL)\n    test_exon_number_minus <- test_exons %>%\n        dplyr::filter(strand == \"-\") %>%\n        add_exon_number(group_var = NULL)\n\n    expect_equal(\n        test_exon_number_plus[[\"exon_number\"]],\n        seq_len(nrow(test_exon_number_plus))\n    )\n    expect_equal(\n        
test_exon_number_minus[[\"exon_number\"]],\n        seq_len(nrow(test_exon_number_minus)) %>% rev()\n    )\n})\n"
  },
  {
    "path": "tests/testthat/test-add_utr.R",
    "content": "pknox1_exons <- pknox1_annotation %>% dplyr::filter(type == \"exon\")\npknox1_cds <- pknox1_annotation %>% dplyr::filter(type == \"CDS\")\npknox1_utr <- pknox1_annotation %>% dplyr::filter(grepl(\"utr\", type))\n\n##### add_utr #####\n\n# add 3 bp to end of cds as stop codon not included in ensembl cds\npknox1_cds_w_stop <- pknox1_cds %>%\n    dplyr::group_by(transcript_name) %>%\n    dplyr::mutate(\n        end = ifelse(end == max(end), end + 3, end)\n    ) %>%\n    dplyr::ungroup()\n\npknox1_cds_utr <- add_utr(\n    pknox1_exons,\n    pknox1_cds_w_stop,\n    group_var = \"transcript_name\"\n)\n\npknox1_cds_utr_1_tx <- add_utr(\n    pknox1_exons %>% dplyr::filter(transcript_name == \"PKNOX1-203\"),\n    pknox1_cds_w_stop %>% dplyr::filter(transcript_name == \"PKNOX1-203\"),\n    group_var = \"transcript_name\"\n)\n\npknox1_cds_utr_1_tx_no_group <- add_utr(\n    pknox1_exons %>% dplyr::filter(transcript_name == \"PKNOX1-203\"),\n    pknox1_cds_w_stop %>% dplyr::filter(transcript_name == \"PKNOX1-203\"),\n    group_var = NULL\n)\n\ntest_add_utrs <- function(cds_utr_add_utr, utr_annotation, cds_annotation) {\n    utr_add_utr <- cds_utr_add_utr %>%\n        dplyr::filter(type == \"UTR\") %>%\n        dplyr::select(start, end) %>%\n        dplyr::arrange(start, end)\n\n    cds_add_utr <- cds_utr_add_utr %>%\n        dplyr::filter(type == \"CDS\") %>%\n        dplyr::select(start, end) %>%\n        dplyr::arrange(start, end)\n\n    utr_annotation <- utr_annotation %>%\n        dplyr::select(start, end) %>%\n        dplyr::arrange(start, end)\n\n    cds_annotation <- cds_annotation %>%\n        dplyr::select(start, end) %>%\n        dplyr::arrange(start, end)\n\n    no_na_type <- all(!is.na(cds_utr_add_utr[[\"type\"]]))\n    no_dummy_group <- is.null(cds_utr_add_utr[[\"dummy_group\"]])\n    correct_utrs <- all.equal(utr_add_utr, utr_annotation)\n    correct_cds <- all.equal(cds_add_utr, cds_annotation)\n\n    check_add_utr <- all(no_na_type, 
no_dummy_group, correct_utrs, correct_cds)\n\n    return(check_add_utr)\n}\n\ntestthat::test_that(\n    \"add_utr() works correctly\",\n    {\n        expect_true(test_add_utrs(pknox1_cds_utr, pknox1_utr, pknox1_cds_w_stop))\n        expect_true(test_add_utrs(\n            pknox1_cds_utr_1_tx,\n            pknox1_utr %>% dplyr::filter(transcript_name == \"PKNOX1-203\"),\n            pknox1_cds_w_stop %>% dplyr::filter(transcript_name == \"PKNOX1-203\")\n        ))\n        expect_true(test_add_utrs(\n            pknox1_cds_utr_1_tx_no_group,\n            pknox1_utr %>% dplyr::filter(transcript_name == \"PKNOX1-203\"),\n            pknox1_cds_w_stop %>% dplyr::filter(transcript_name == \"PKNOX1-203\")\n        ))\n    }\n)\n\n##### add_utr & shorten_gaps #####\n\npknox1_cds_utr_rescaled <-\n    shorten_gaps(\n        exons = pknox1_cds_utr,\n        introns = to_intron(pknox1_cds_utr, \"transcript_name\"),\n        group_var = \"transcript_name\"\n    )\n\n# add labels helps manual checking\nplot_before_after_rescaled <- function(cds_utr_before,\n                                       cds_utr_after,\n                                       group_var,\n                                       add_labels = FALSE) {\n    before_rescaling <- cds_utr_before %>%\n        dplyr::filter(type == \"CDS\") %>%\n        ggplot2::ggplot(ggplot2::aes(\n            xstart = start,\n            xend = end,\n            y = .data[[group_var]]\n        )) +\n        geom_range() +\n        geom_range(\n            data = cds_utr_before %>% dplyr::filter(type == \"UTR\"),\n            height = 0.25,\n            fill = \"white\"\n        ) +\n        geom_intron(\n            data = to_intron(cds_utr_before, \"transcript_name\"),\n        )\n\n    after_rescaling <- cds_utr_after %>%\n        dplyr::filter(type == \"CDS\") %>%\n        ggplot2::ggplot(ggplot2::aes(\n            xstart = start,\n            xend = end,\n            y = .data[[group_var]]\n        )) +\n        
geom_range() +\n        geom_range(\n            data = cds_utr_after %>% dplyr::filter(type == \"UTR\"),\n            height = 0.25,\n            fill = \"white\"\n        ) +\n        geom_intron(\n            data = to_intron(\n                cds_utr_after %>%\n                    dplyr::filter(type != \"intron\"),\n                \"transcript_name\"\n            ),\n        )\n\n    before_after_list <- list(before_rescaling, after_rescaling)\n\n    if (add_labels) {\n        for (i in seq_len(length(before_after_list))) {\n            before_after_list[[i]] <- before_after_list[[i]] +\n                ggrepel::geom_label_repel(\n                    ggplot2::aes_string(\n                        x = \"start\",\n                        y = group_var,\n                        label = \"start\"\n                    ),\n                    min.segment.length = 0\n                )\n        }\n    }\n\n    before_after_plot <- ggpubr::ggarrange(\n        plotlist = before_after_list, nrow = 2\n    )\n\n    return(before_after_plot)\n}\n\ntestthat::test_that(\n    \"shorten_gaps works correctly\",\n    {\n        test_rescaled_w_utr_plot <- plot_before_after_rescaled(\n            pknox1_cds_utr,\n            pknox1_cds_utr_rescaled,\n            group_var = \"transcript_name\",\n            add_labels = FALSE\n        )\n\n        vdiffr::expect_doppelganger(\n            \"test rescaled with utr plot\",\n            test_rescaled_w_utr_plot\n        )\n    }\n)\n"
  },
  {
    "path": "tests/testthat/test-geom_half_range.R",
    "content": "# create dummy exons for testing\ntest_exons <-\n    dplyr::tibble(\n        start = c(100, 300, 500, 650),\n        end = start + 100,\n        strand = c(\"+\", \"+\", \"-\", \"-\"),\n        tx = c(\"A\", \"A\", \"B\", \"B\")\n    )\n\n# create base plot to be used in downstream tests\ntest_exons_plot <- test_exons %>%\n    ggplot2::ggplot(aes(\n        xstart = start,\n        xend = end,\n        y = tx\n    ))\n\n##### geom_half_range #####\n\ntestthat::test_that(\n    \"geom_half_range() works correctly\",\n    {\n        base_geom_half_range <- test_exons_plot +\n            geom_half_range()\n        w_param_geom_half_range <- test_exons_plot +\n            geom_half_range(colour = \"red\", fill = \"blue\")\n        w_aes_geom_half_range <- test_exons_plot +\n            geom_half_range(aes(fill = tx))\n        w_facet_geom_half_range <- test_exons_plot +\n            geom_half_range() +\n            ggplot2::facet_wrap(~tx)\n\n        vdiffr::expect_doppelganger(\n            \"Base geom_half_range plot\",\n            base_geom_half_range\n        )\n        vdiffr::expect_doppelganger(\n            \"With param geom_half_range plot\",\n            w_param_geom_half_range\n        )\n        vdiffr::expect_doppelganger(\n            \"With aes geom_half_range plot\",\n            w_aes_geom_half_range\n        )\n        vdiffr::expect_doppelganger(\n            \"With facet geom_half_range plot\",\n            w_facet_geom_half_range\n        )\n    }\n)\n\ntestthat::test_that(\n    \"geom_half_range(range.orientation = x) works correctly\",\n    {\n        w_top_geom_half_range <- test_exons_plot +\n            geom_half_range(range.orientation = \"top\")\n        w_both_geom_half_range <- test_exons_plot +\n            geom_half_range(range.orientation = \"top\", fill = \"red\") +\n            geom_half_range(range.orientation = \"bottom\", fill = \"blue\")\n\n        vdiffr::expect_doppelganger(\n            \"With top geom_half_range 
plot\",\n            w_top_geom_half_range\n        )\n        vdiffr::expect_doppelganger(\n            \"With both geom_half_range plot\",\n            w_both_geom_half_range\n        )\n    }\n)\n\ntestthat::test_that(\n    \"geom_half_range() catches user input errors\",\n    {\n        a_range.orientation <- test_exons_plot +\n            geom_half_range(range.orientation = \"a\")\n\n        expect_error(\n            print(a_range.orientation),\n            \"range.orientation must be one of\"\n        )\n    }\n)\n"
  },
  {
    "path": "tests/testthat/test-geom_intron.R",
    "content": "test_introns <-\n    dplyr::tibble(\n        strand = c(\"+\", \"-\"),\n        tx = c(\"A\", \"B\"),\n        start = c(201, 601),\n        end = c(299, 649),\n        type = \"intron\"\n    )\n\n# create base plot to be used in downstream tests\ntest_introns_plot <- test_introns %>%\n    ggplot2::ggplot(aes(\n        xstart = start,\n        xend = end,\n        y = tx\n    ))\n\n##### geom_intron #####\n\ntestthat::test_that(\n    \"geom_intron() works correctly\",\n    {\n        base_geom_intron <- test_introns_plot +\n            geom_intron()\n        w_param_geom_intron <- test_introns_plot +\n            geom_intron(colour = \"blue\", linewidth = 2)\n        w_aes_geom_intron <- test_introns_plot +\n            geom_intron(aes(colour = tx, linewidth = c(1L, 2L)))\n        w_facet_geom_intron <- test_introns_plot +\n            geom_intron() +\n            ggplot2::facet_wrap(~tx)\n\n        vdiffr::expect_doppelganger(\n            \"Base geom_intron plot\",\n            base_geom_intron\n        )\n        vdiffr::expect_doppelganger(\n            \"With param geom_intron plot\",\n            w_param_geom_intron\n        )\n        vdiffr::expect_doppelganger(\n            \"With aes geom_intron plot\",\n            w_aes_geom_intron\n        )\n        vdiffr::expect_doppelganger(\n            \"With facet geom_intron plot\",\n            w_facet_geom_intron\n        )\n    }\n)\n\ntestthat::test_that(\n    \"geom_intron(strand = x) works correctly\",\n    {\n        minus_strand <- test_introns_plot +\n            geom_intron(strand = \"-\")\n        factor_strand <- test_introns_plot +\n            geom_intron(strand = factor(\"-\"))\n        as_aes_strand <- test_introns_plot +\n            geom_intron(aes(strand = strand))\n\n        vdiffr::expect_doppelganger(\n            \"Minus strand plot\",\n            minus_strand\n        )\n        vdiffr::expect_doppelganger(\n            \"factor strand plot\",\n            
factor_strand\n        )\n        vdiffr::expect_doppelganger(\n            \"As aes strand plot\",\n            as_aes_strand\n        )\n    }\n)\n\ntestthat::test_that(\n    \"geom_intron(arrow.min.intron.length = x) works correctly\",\n    {\n        base_arrow.min <- test_introns_plot +\n            geom_intron(arrow.min.intron.length = 50)\n        w_strand_arrow_min <- test_introns_plot +\n            geom_intron(arrow.min.intron.length = 50, strand = \"-\")\n\n        vdiffr::expect_doppelganger(\n            \"base arrow.min plot\",\n            base_arrow.min\n        )\n        vdiffr::expect_doppelganger(\n            \"with strand arrow.min plot\",\n            w_strand_arrow_min\n        )\n    }\n)\n\ntestthat::test_that(\n    \"geom_intron() catches strand input errors\",\n    {\n        na_strand <- test_introns_plot +\n            geom_intron(strand = c(NA, rep(\"+\", nrow(test_introns) - 1)))\n        a_strand <- test_introns_plot +\n            geom_intron(strand = \"a\")\n        int_strand <- test_introns_plot +\n            geom_intron(aes(strand = start))\n        # seems to require print to catch error\n        expect_error(\n            print(na_strand),\n            \"strand values must be one of\"\n        )\n        expect_error(\n            print(a_strand),\n            \"strand values must be one of\"\n        )\n        expect_error(\n            print(int_strand),\n            \"strand values must be one of\"\n        )\n    }\n)\n\ntestthat::test_that(\n    \"geom_intron() catches arrow.min.intron.length input errors\",\n    {\n        neg_arrow.min <- test_introns_plot +\n            geom_intron(arrow.min.intron.length = -1)\n        chr_arrow.min <- test_introns_plot +\n            geom_intron(arrow.min.intron.length = \"1\")\n        # seems to require print to catch error\n        expect_error(\n            print(neg_arrow.min),\n            \"arrow.min.intron.length must be \"\n        )\n        expect_error(\n            
print(chr_arrow.min),\n            \"arrow.min.intron.length must be \"\n        )\n    }\n)\n"
  },
  {
    "path": "tests/testthat/test-geom_junction.R",
    "content": "# manually create the expected introns\ntest_introns <-\n    sod1_annotation %>%\n    dplyr::filter(type == \"exon\") %>%\n    to_intron(group_var = \"transcript_name\")\n\n# create base plot to be used in downstream tests\ntest_introns_plot <- test_introns %>%\n    ggplot2::ggplot(aes(\n        xstart = start,\n        xend = end,\n        y = transcript_name\n    ))\n\n##### geom_junction #####\n\ntestthat::test_that(\n    \"geom_junction() works correctly\",\n    {\n        base_geom_junction <- test_introns_plot +\n            geom_junction()\n        w_param_geom_junction <- test_introns_plot +\n            geom_junction(colour = \"red\")\n        w_aes_geom_junction <- test_introns_plot +\n            geom_junction(aes(colour = transcript_name))\n        w_facet_geom_junction <- test_introns_plot +\n            geom_junction() +\n            ggplot2::facet_wrap(~transcript_biotype)\n\n        vdiffr::expect_doppelganger(\n            \"Base geom_junction plot\",\n            base_geom_junction\n        )\n        vdiffr::expect_doppelganger(\n            \"With param geom_junction plot\",\n            w_param_geom_junction\n        )\n        vdiffr::expect_doppelganger(\n            \"With aes geom_junction plot\",\n            w_aes_geom_junction\n        )\n        vdiffr::expect_doppelganger(\n            \"With facet geom_junction plot\",\n            w_facet_geom_junction\n        )\n    }\n)\n\ntestthat::test_that(\n    \"geom_junction(junction.orientation = x) works correctly\",\n    {\n        top_junction.orientation <- test_introns_plot +\n            geom_junction(junction.orientation = \"top\")\n        bottom_junction.orientation <- test_introns_plot +\n            geom_junction(junction.orientation = \"bottom\")\n        w_aes_param_top_junction.orientation <- test_introns_plot +\n            geom_junction(\n                aes(colour = transcript_name),\n                linewidth = 1,\n                junction.orientation = 
\"top\"\n            )\n\n        vdiffr::expect_doppelganger(\n            \"top junction.orientation plot\",\n            top_junction.orientation\n        )\n        vdiffr::expect_doppelganger(\n            \"bottom junction.orientation plot\",\n            bottom_junction.orientation\n        )\n        vdiffr::expect_doppelganger(\n            \"with aes and param top junction.orientation plot\",\n            w_aes_param_top_junction.orientation\n        )\n    }\n)\n\ntestthat::test_that(\n    \"geom_junction(junction.y.max = x) works correctly\",\n    {\n        junction.y.max_0.5 <- test_introns_plot +\n            geom_junction(junction.y.max = 0.5)\n        w_aes_param_junction.y.max_0.5 <- test_introns_plot +\n            geom_junction(\n                aes(colour = transcript_name),\n                linewidth = 1,\n                junction.y.max = 0.5\n            )\n        w_facet_junction.y.max_0.5 <- test_introns_plot +\n            geom_junction(junction.y.max = 0.5) +\n            ggplot2::facet_wrap(~transcript_biotype)\n\n        vdiffr::expect_doppelganger(\n            \"0.5 junction.y.max plot\",\n            junction.y.max_0.5\n        )\n        vdiffr::expect_doppelganger(\n            \"with aes and param 0.5 junction.y.max plot\",\n            w_aes_param_junction.y.max_0.5\n        )\n        vdiffr::expect_doppelganger(\n            \"with facet 0.5 junction.y.max plot\",\n            w_facet_junction.y.max_0.5\n        )\n    }\n)\n\ntestthat::test_that(\n    \"geom_junction() catches junction.orientation input errors\",\n    {\n        a_junction.orientation <- test_introns_plot +\n            geom_junction(junction.orientation = \"a\")\n\n        expect_error(\n            print(a_junction.orientation),\n            \"junction.orientation must be one of\"\n        )\n    }\n)\n\ntestthat::test_that(\n    \"geom_junction() catches junction.y.max input errors\",\n    {\n        len_2_junction.y.max <- test_introns_plot +\n            
geom_junction(junction.y.max = c(1, 2))\n        a_junction.y.max <- test_introns_plot +\n            geom_junction(junction.y.max = \"a\")\n\n        expect_error(\n            print(len_2_junction.y.max),\n            \"junction.y.max must have a length of 1\"\n        )\n        expect_error(\n            print(a_junction.y.max),\n            \"junction.y.max must be a numeric value\"\n        )\n    }\n)\n"
  },
  {
    "path": "tests/testthat/test-geom_junction_label_repel.R",
    "content": "# manually create the expected introns\ntest_introns <-\n    sod1_annotation %>%\n    dplyr::filter(\n        type == \"exon\",\n        transcript_name %in% c(\"SOD1-201\", \"SOD1-202\")\n    ) %>%\n    to_intron(group_var = \"transcript_name\") %>%\n    dplyr::mutate(\n        count = dplyr::row_number()\n    )\n\n# create base plot to be used in downstream tests\ntest_introns_plot <- test_introns %>%\n    ggplot2::ggplot(aes(\n        xstart = start,\n        xend = end,\n        y = transcript_name\n    ))\n\n##### geom_junction_label_repel #####\n\ntestthat::test_that(\n    \"geom_junction() works correctly\",\n    {\n        base_geom_junction_labels <- test_introns_plot +\n            geom_junction() +\n            geom_junction_label_repel(\n                aes(label = count),\n                seed = 32\n            )\n        w_param_geom_junction_labels <- test_introns_plot +\n            geom_junction(\n                junction.y.max = 0.5\n            ) +\n            geom_junction_label_repel(\n                aes(label = count),\n                junction.y.max = 0.5,\n                seed = 32\n            )\n        w_aes_geom_junction_labels <- test_introns_plot +\n            geom_junction(aes(colour = transcript_name)) +\n            geom_junction_label_repel(\n                aes(\n                    label = count,\n                    colour = transcript_name\n                ),\n                seed = 32\n            )\n        w_facet_geom_junction_labels <- test_introns_plot +\n            geom_junction() +\n            geom_junction_label_repel(\n                aes(label = count),\n                seed = 32\n            ) +\n            ggplot2::facet_wrap(transcript_name ~ ., drop = TRUE)\n\n        vdiffr::expect_doppelganger(\n            \"Base geom_junction_label_repel plot\",\n            base_geom_junction_labels\n        )\n        vdiffr::expect_doppelganger(\n            \"With param geom_junction_label_repel 
plot\",\n            w_param_geom_junction_labels\n        )\n        vdiffr::expect_doppelganger(\n            \"With aes geom_junction_label_repel plot\",\n            w_aes_geom_junction_labels\n        )\n        vdiffr::expect_doppelganger(\n            \"With facet geom_junction_label_repel plot\",\n            w_facet_geom_junction_labels\n        )\n    }\n)\n"
  },
  {
    "path": "tests/testthat/test-geom_range.R",
    "content": "# create dummy exons for testing\ntest_exons <-\n    dplyr::tibble(\n        start = c(100, 300, 500, 650),\n        end = start + 100,\n        strand = c(\"+\", \"+\", \"-\", \"-\"),\n        tx = c(\"A\", \"A\", \"B\", \"B\")\n    )\n\n# create base plot to be used in downstream tests\ntest_exons_plot <- test_exons %>%\n    ggplot2::ggplot(aes(\n        xstart = start,\n        xend = end,\n        y = tx\n    ))\n\n##### geom_range #####\n\ntestthat::test_that(\n    \"geom_range() works correctly\",\n    {\n        base_geom_range <- test_exons_plot +\n            geom_range()\n        w_param_geom_range <- test_exons_plot +\n            geom_range(colour = \"red\", fill = \"blue\")\n        w_aes_geom_range <- test_exons_plot +\n            geom_range(aes(fill = tx))\n        w_facet_geom_range <- test_exons_plot +\n            geom_range() +\n            ggplot2::facet_wrap(~tx)\n\n        vdiffr::expect_doppelganger(\n            \"Base geom_range plot\",\n            geom_range\n        )\n        vdiffr::expect_doppelganger(\n            \"With param geom_range plot\",\n            w_param_geom_range\n        )\n        vdiffr::expect_doppelganger(\n            \"With aes geom_range plot\",\n            w_aes_geom_range\n        )\n        vdiffr::expect_doppelganger(\n            \"With facet geom_range plot\",\n            w_facet_geom_range\n        )\n    }\n)\n\ntestthat::test_that(\n    \"geom_range(vjust = x) works correctly\",\n    {\n        w_vjust_geom_range <- test_exons_plot +\n            geom_range(vjust = 1.5, height = 0.25)\n\n        vdiffr::expect_doppelganger(\n            \"With vjust geom_range plot\",\n            w_vjust_geom_range\n        )\n    }\n)\n"
  },
  {
    "path": "tests/testthat/test-shorten_gaps.R",
    "content": "test_exons <-\n    dplyr::tibble(\n        seqnames = \"1\",\n        start = c(100, 300, 500, 650),\n        end = start + 100,\n        strand = \"+\",\n        tx = c(\"A\", \"A\", \"B\", \"B\")\n    )\n\npknox1_exons <- pknox1_annotation %>% dplyr::filter(type == \"exon\")\npknox1_introns <- pknox1_exons %>%\n    to_intron(\"transcript_name\")\n\n##### .get_gaps #####\n\n# need to create gaps globally for downstream tests\npknox1_intron_gaps <- .get_gaps(GenomicRanges::GRanges(pknox1_exons))\ntest_intron_gaps <- .get_gaps(GenomicRanges::GRanges(test_exons))\n\ntest_.get_gaps <- function(exons, intron_gaps) {\n\n    # intron_gaps should not overlap any exons\n    exons_gap_hits <- GenomicRanges::findOverlaps(\n        GenomicRanges::GRanges(exons),\n        intron_gaps\n    )\n\n    overlap_exons <- length(exons_gap_hits) == 0\n\n    return(overlap_exons)\n}\n\ntestthat::test_that(\".get_gaps() works correctly\", {\n    expect_true(test_.get_gaps(\n        pknox1_exons, pknox1_intron_gaps\n    ))\n    expect_true(test_.get_gaps(\n        test_exons, test_intron_gaps\n    ))\n})\n\n##### .get_tx_start_gaps #####\n\npknox1_tx_start_gaps <-\n    .get_tx_start_gaps(pknox1_exons, \"transcript_name\")\n\ntest_exons_tx_start_gaps <-\n    .get_tx_start_gaps(test_exons, NULL)\n\ntest_.get_tx_start_gaps <- function(exons, tx_start_gaps, group_var) {\n    unique_start <- length(unique(tx_start_gaps[[\"start\"]])) == 1\n    correct_start <- all(tx_start_gaps[[\"start\"]] == min(exons[[\"start\"]]))\n    correct_end <- exons %>%\n        dplyr::group_by_at(.vars = group_var) %>%\n        dplyr::summarise(tx_start = min(start))\n    correct_end <- all(tx_start_gaps[[\"end\"]] == correct_end[[\"tx_start\"]])\n\n    correct_all <- all(\n        unique_start, correct_start, correct_end\n    )\n\n    return(correct_all)\n}\n\ntestthat::test_that(\".get_tx_start_gaps() works correctly\", {\n    expect_true(test_.get_tx_start_gaps(\n        pknox1_exons,\n        
pknox1_tx_start_gaps,\n        \"transcript_name\"\n    ))\n    expect_true(test_.get_tx_start_gaps(\n        test_exons,\n        test_exons_tx_start_gaps,\n        NULL\n    ))\n})\n\n##### .check_len_1_strand_seqnames #####\n\ntestthat::test_that(\n    \".check_len_1_strand_seqnames() catches user input errors\",\n    {\n        expect_error(\n            .check_len_1_strand_seqnames(1:2, 1),\n            \"seqnames of object contains more than 1 unique value\"\n        )\n        expect_error(\n            .check_len_1_strand_seqnames(1, 1:2),\n            \"strand of object contains more than 1 unique value\"\n        )\n    }\n)\n\n##### .check_type #####\n\ntestthat::test_that(\".get_type() works correctly\", {\n    added_type_exons <- pknox1_exons %>%\n        dplyr::select(-type) %>%\n        .get_type(\"exons\") %>%\n        .[[\"type\"]]\n    added_type_introns <- pknox1_exons %>%\n        dplyr::select(-type) %>%\n        .get_type(\"introns\") %>%\n        .[[\"type\"]]\n\n    expect_true(\n        all(added_type_exons == \"exon\")\n    )\n    expect_true(\n        all(added_type_introns == \"intron\")\n    )\n    expect_identical(\n        .get_type(pknox1_exons, \"exons\"),\n        pknox1_exons\n    )\n    expect_identical(\n        .get_type(pknox1_introns, \"introns\"),\n        pknox1_introns\n    )\n})\n\ntestthat::test_that(\".get_type() catches user input errors\", {\n    expect_error(\n        .get_type(pknox1_exons, \"introns\"),\n        \"values in the 'type' column of introns must be one of:\"\n    )\n})\n\n##### .check_target_gap_width #####\n\ntestthat::test_that(\".check_target_gap_width() catches user input errors\", {\n    expect_warning(\n        .check_target_gap_width(100),\n        \"target_gap_width must be an integer, coercing\"\n    )\n})\n\n##### shorten_gaps #####\n\n# also using this to test drop_orig_coords\npknox1_rescaled_tx <- shorten_gaps(\n    pknox1_exons,\n    pknox1_introns,\n    group_var = \"transcript_name\",\n  
  target_gap_width = 100L\n)\n\npknox1_exons_1_tx <- pknox1_exons %>%\n    dplyr::filter(transcript_name == \"PKNOX1-202\")\npknox1_introns_1_tx <- pknox1_introns %>%\n    dplyr::filter(transcript_name == \"PKNOX1-202\")\n\npknox1_rescaled_1_tx <- shorten_gaps(\n    pknox1_exons_1_tx,\n    pknox1_introns_1_tx,\n    group_var = \"transcript_name\",\n    target_gap_width = 100L\n)\n\npknox1_rescaled_1_tx_no_group <- shorten_gaps(\n    pknox1_exons_1_tx,\n    pknox1_introns_1_tx,\n    group_var = NULL,\n    target_gap_width = 100L\n)\n\ntest_rescaled_tx <- shorten_gaps(\n    test_exons,\n    to_intron(test_exons, \"tx\"),\n    group_var = \"tx\",\n    target_gap_width = 50L\n)\n\ntestthat::test_that(\"shorten_gaps() keeps existing columns\", {\n    expect_true(!is.null(pknox1_rescaled_tx[[\"transcript_biotype\"]]))\n    expect_true(!is.null(\n        pknox1_rescaled_1_tx_no_group[[\"transcript_biotype\"]]\n    ))\n})\n\ntestthat::test_that(\"shorten_gaps() takes user inputted type\", {\n\n    # for test, we modify all exons types to \"utr\"\n    all_utr <- shorten_gaps(\n        pknox1_exons %>% dplyr::mutate(type = \"utr\"),\n        pknox1_introns,\n        group_var = \"transcript_name\",\n        target_gap_width = 100L\n    )\n    expect_true(all(pknox1_rescaled_tx[[\"type\"]] %in% c(\"exon\", \"intron\")))\n    expect_true(all(all_utr[[\"type\"]] %in% c(\"utr\", \"intron\")))\n})\n\ntest_shorten_gaps <- function(exons, rescaled_tx) {\n\n    # should never shorten exons\n    exon_widths_before <- exons[[\"end\"]] - exons[[\"start\"]]\n    exon_widths_after <- rescaled_tx %>%\n        dplyr::filter(type == \"exon\") %>%\n        dplyr::mutate(width = end - start) %>%\n        .[[\"width\"]]\n\n    unchanged_exon_widths <- all.equal(\n        sort(exon_widths_before), sort(exon_widths_after)\n    )\n\n    return(unchanged_exon_widths)\n}\n\ntestthat::test_that(\"shorten_gaps() never modifies exons\", {\n    expect_true(test_shorten_gaps(\n        pknox1_exons,\n    
    pknox1_rescaled_tx\n    ))\n    expect_true(test_shorten_gaps(\n        pknox1_exons_1_tx,\n        pknox1_rescaled_1_tx\n    ))\n    expect_true(test_shorten_gaps(\n        pknox1_exons_1_tx,\n        pknox1_rescaled_1_tx_no_group\n    ))\n    expect_true(test_shorten_gaps(\n        test_exons,\n        test_rescaled_tx\n    ))\n})\n\n# add labels helps manual checking\nplot_rescaled_tx <- function(exons,\n                             rescaled_tx,\n                             group_var,\n                             add_labels = FALSE) {\n    before_rescaling <- exons %>%\n        ggplot2::ggplot(ggplot2::aes(\n            xstart =  start,\n            xend = end,\n            y = .data[[group_var]]\n        )) +\n        geom_range() +\n        geom_intron(\n            data = to_intron(exons, group_var),\n            strand = \"-\",\n            arrow.min.intron.length = 500\n        )\n\n    after_rescaling <- rescaled_tx %>%\n        dplyr::filter(type == \"exon\") %>%\n        ggplot2::ggplot(ggplot2::aes(\n            xstart = start,\n            xend = end,\n            y = .data[[group_var]]\n        )) +\n        geom_range() +\n        geom_intron(\n            data = rescaled_tx %>%\n                dplyr::filter(type == \"intron\"),\n            strand = \"-\",\n            arrow.min.intron.length = 500\n        )\n\n    before_after_list <- list(before_rescaling, after_rescaling)\n\n    if (add_labels) {\n        for (i in seq_len(length(before_after_list))) {\n            before_after_list[[i]] <- before_after_list[[i]] +\n                ggrepel::geom_label_repel(\n                    ggplot2::aes(\n                        x = end,\n                        y = .data[[group_var]],\n                        label = end\n                    ),\n                    linewidth = 2,\n                    min.segment.length = 0\n                )\n        }\n    }\n\n    before_after_plot <- ggpubr::ggarrange(\n        plotlist = before_after_list, nrow 
= 2\n    )\n\n    return(before_after_plot)\n}\n\ntestthat::test_that(\n    \"shorten_gaps works correctly\",\n    {\n        test_rescaled_plot <- plot_rescaled_tx(\n            test_exons, test_rescaled_tx, \"tx\"\n        )\n        pknox1_rescaled_plot <- plot_rescaled_tx(\n            pknox1_exons, pknox1_rescaled_tx, \"transcript_name\"\n        )\n        pknox1_rescaled_plot_1_tx <- plot_rescaled_tx(\n            pknox1_exons_1_tx, pknox1_rescaled_1_tx, \"transcript_name\"\n        )\n        # make sure everything works okay even if group is set to NULL\n        pknox1_plot_1_tx_no_group <- plot_rescaled_tx(\n            pknox1_exons_1_tx,\n            pknox1_rescaled_1_tx_no_group,\n            \"transcript_name\"\n        )\n\n        vdiffr::expect_doppelganger(\n            \"test exons rescaled plot\",\n            test_rescaled_plot\n        )\n        vdiffr::expect_doppelganger(\n            \"pknox1 rescaled plot\",\n            pknox1_rescaled_plot\n        )\n        vdiffr::expect_doppelganger(\n            \"pknox1 rescaled plot 1 tx\",\n            pknox1_rescaled_plot_1_tx\n        )\n        vdiffr::expect_doppelganger(\n            \"pknox1 rescaled plot 1 tx no group\",\n            pknox1_plot_1_tx_no_group\n        )\n    }\n)\n"
  },
  {
    "path": "tests/testthat/test-to_diff.R",
    "content": "sod1_exons <- sod1_annotation %>%\n    dplyr::filter(type == \"exon\")\n\nmane <- sod1_exons %>%\n    dplyr::filter(transcript_name == \"SOD1-201\")\n\nsingle_tx <- sod1_exons %>%\n    dplyr::filter(transcript_name %in% c(\"SOD1-202\"))\n\nmulti_tx <- sod1_exons %>%\n    dplyr::filter(transcript_name %in% c(\"SOD1-202\", \"SOD1-203\", \"SOD1-204\"))\n\n##### to_diff #####\n\ntestthat::test_that(\"to_diff() works correctly\", {\n    test_diffs <- to_diff(\n        exons = single_tx,\n        ref_exons = mane\n    )\n    expect_true(is.data.frame(test_diffs))\n    expect_true(nrow(test_diffs) > 0)\n    expect_true(all(\n        c(\"seqnames\", \"start\", \"end\", \"strand\", \"type\", \"diff_type\") %in%\n            colnames(test_diffs)\n    ))\n})\n\ntestthat::test_that(\"to_diff() works correctly for single transcripts\", {\n    test_diffs <- to_diff(\n        exons = single_tx,\n        ref_exons = mane,\n        group_var = \"transcript_name\"\n    )\n    # think the easiest way to check diffs is via plotting\n    single_tx_diff_plot <- mane %>%\n        dplyr::bind_rows(single_tx) %>%\n        ggplot2::ggplot(\n            aes(\n                xstart = start,\n                xend = end,\n                y = transcript_name\n            )\n        ) +\n        geom_range() +\n        geom_range(\n            data = test_diffs,\n            alpha = 0.2,\n            fill = \"red\"\n        )\n\n    vdiffr::expect_doppelganger(\n        \"single tx diff plot\",\n        single_tx_diff_plot\n    )\n})\n\ntestthat::test_that(\"to_diff() works correctly for multiple transcripts\", {\n    test_diffs <- to_diff(\n        exons = multi_tx,\n        ref_exons = mane,\n        group_var = \"transcript_name\"\n    )\n    multi_tx_diff_plot <- mane %>%\n        dplyr::bind_rows(multi_tx) %>%\n        ggplot2::ggplot(\n            aes(\n                xstart = start,\n                xend = end,\n                y = transcript_name\n            )\n        
) +\n        geom_range() +\n        geom_range(\n            data = test_diffs,\n            aes(fill = diff_type),\n            alpha = 0.2,\n        )\n\n    vdiffr::expect_doppelganger(\n        \"multi tx diff plot\",\n        multi_tx_diff_plot\n    )\n})\n"
  },
  {
    "path": "tests/testthat/test-to_intron.R",
    "content": "# create dummy exons for testing\ntest_exons <-\n    dplyr::tibble(\n        start = c(100, 300, 500, 650),\n        end = start + 100,\n        strand = c(\"+\", \"+\", \"-\", \"-\"),\n        tx = c(\"A\", \"A\", \"B\", \"B\")\n    )\n\n# manually create the expected introns\ntest_introns <-\n    dplyr::tibble(\n        strand = c(\"+\", \"-\"),\n        tx = c(\"A\", \"B\"),\n        start = c(200, 600),\n        end = c(300, 650),\n        type = \"intron\"\n    )\n\npknox1_cds_utr <-\n    pknox1_annotation %>% dplyr::filter(\n        type == \"CDS\" | grepl(\"utr\", type)\n    )\n\n##### to_intron #####\n\ntestthat::test_that(\"to_intron() obtains introns correctly\", {\n    # with group_var\n    expect_equal(\n        test_introns,\n        test_exons %>% to_intron(group_var = \"tx\")\n    )\n    # without group_var\n    expect_equal(\n        test_introns %>% dplyr::filter(tx != \"B\"),\n        test_exons %>%\n            dplyr::filter(tx != \"B\") %>%\n            to_intron()\n    )\n})\n\ntestthat::test_that(\n    \"to_intron() obtains introns correctly, regardless of exon order\",\n    {\n        set.seed(32)\n\n        expect_equal(\n            test_introns,\n            test_exons %>%\n                .[sample(seq_len(nrow(test_exons))), ] %>%\n                to_intron(group_var = \"tx\")\n        )\n    }\n)\n"
  },
  {
    "path": "tests/testthat/test-utils.R",
    "content": "# create dummy exons for testing\ntest_exons <-\n    dplyr::tibble(\n        start = c(100, 300, 500, 650),\n        end = start + 100,\n        strand = c(\"+\", \"+\", \"-\", \"-\"),\n        tx = c(\"A\", \"A\", \"B\", \"B\")\n    )\n\n##### .check_coord_object #####\n\ntestthat::test_that(\".check_coord_object() works correctly\", {\n    expect_equal(\n        .check_coord_object(test_exons),\n        NULL\n    )\n})\n\ntestthat::test_that(\".check_coord_object() catches user input errors\", {\n    expect_error(\n        .check_coord_object(\"1\"),\n        \"must be a data.frame\"\n    )\n    expect_error(\n        .check_coord_object(test_exons %>% dplyr::select(-start)),\n        \"must have the columns\"\n    )\n    expect_error(\n        .check_coord_object(test_exons %>% dplyr::select(-end)),\n        \"must have the columns\"\n    )\n    expect_error(\n        .check_coord_object(test_exons, check_seqnames = TRUE),\n        \"must have the column\"\n    )\n    expect_error(\n        .check_coord_object(\n            test_exons %>% dplyr::select(-strand),\n            check_strand = TRUE\n        ),\n        \"must have the column\"\n    )\n})\n\n##### .check_group_var #####\n\ntestthat::test_that(\".check_group_var() works correctly\", {\n    expect_equal(\n        .check_group_var(test_exons, group_var = NULL),\n        NULL\n    )\n    expect_equal(\n        .check_group_var(test_exons, group_var = \"tx\"),\n        NULL\n    )\n})\n\ntestthat::test_that(\".check_group_var() catches user input errors\", {\n    expect_error(\n        .check_group_var(test_exons, \"not_a_col\"),\n        \"must be a column in\"\n    )\n})\n"
  },
  {
    "path": "tests/testthat.R",
    "content": "library(testthat)\nlibrary(ggtranscript)\n\ntest_check(\"ggtranscript\")\n"
  },
  {
    "path": "vignettes/.gitignore",
    "content": "*.html\n*.R\n"
  },
  {
    "path": "vignettes/ggtranscript.Rmd",
    "content": "---\ntitle: \"Getting started\"\nauthor: \n  - name: David Zhang\n    affiliation:\n    - UCL\n    email: dyzhang32@gmail.com\noutput: \n  BiocStyle::html_document:\n    self_contained: yes\n    toc: true\n    toc_float: true\n    toc_depth: 2\n    code_folding: show\npackage: \"`r pkg_ver('ggtranscript')`\"\nvignette: >\n  %\\VignetteIndexEntry{Introduction to ggtranscript}\n  %\\VignetteEngine{knitr::rmarkdown}\n  %\\VignetteEncoding{UTF-8}  \n---\n\n```{r setup, include = FALSE}\nknitr::opts_chunk$set(\n    collapse = TRUE,\n    comment = \"#>\",\n    crop = NULL\n)\n```\n\n```{r load-ggtranscript}\nlibrary(magrittr)\nlibrary(dplyr)\nlibrary(ggtranscript)\nlibrary(ggplot2)\nlibrary(rtracklayer)\n```\n\n`ggtranscript` is designed to make it easy to visualize transcript structure and annotation using `ggplot2`. \n\nAs the intended users are those who work with genetic and/or transcriptomic data in `R`, this tutorial assumes a basic understanding of transcript annotation and familiarity with `ggplot2`. \n\n<br>\n\n## Input data\n\n### Example data\n\nIn order to showcase the package's functionality, `ggtranscript` includes example transcript annotation for the genes *SOD1* and *PKNOX1*, as well as a set of unannotated junctions associated with *SOD1*. These specific genes are unimportant, chosen arbitrarily for illustration. However, it worth noting that the input data for `ggtranscript`, as a `ggplot2` extension, is required be a [`data.frame`](https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/data.frame) or [`tibble`](https://tibble.tidyverse.org).\n\n```{r example-data}\n\nsod1_annotation %>% head()\n\npknox1_annotation %>% head()\n\nsod1_junctions\n\n```\n\n### Importing data from a gtf\n\nYou may be asking, what if I have a `gtf` file or a `GRanges` object? The below demonstrates how to wrangle a `gtf` into the required format for `ggtranscript` and extract the relevant annotation for a particular gene of interest. 
\n\nFor the purposes of the vignette, here we download a `gtf` (Ensembl version 105), then load the `gtf` into `R` using `rtracklayer::import()`.\n\n```{r import-gtf-data}\n\n# download ens 105 gtf into a temporary directory\ngtf_path <- file.path(tempdir(), \"Homo_sapiens.GRCh38.105.chr.gtf.gz\")\n\ndownload.file(\n    paste0(\n        \"http://ftp.ensembl.org/pub/release-105/gtf/homo_sapiens/\",\n        \"Homo_sapiens.GRCh38.105.chr.gtf.gz\"\n    ),\n    destfile = gtf_path\n)\n\ngtf <- rtracklayer::import(gtf_path)\n\nclass(gtf)\n\n```\n\nNote that the loaded `gtf` is a `GRanges` object. The input data for `ggtranscript`, as a `ggplot2` extension, must be a [`data.frame`](https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/data.frame) or [`tibble`](https://tibble.tidyverse.org). We can convert a `GRanges` to a `data.frame` using `as.data.frame()` or to a `tibble` using `dplyr::as_tibble()`. Either works with `ggtranscript`; however, we prefer `tibble`s over `data.frame`s for several [reasons](https://r4ds.had.co.nz/tibbles.html#tibbles-vs.-data.frame). \n\n```{r convert-gtf-df}\n\ngtf <- gtf %>% dplyr::as_tibble()\n\nclass(gtf)\n\n```\n\nNow that the `gtf` is a `tibble` (or `data.frame`), we can `dplyr::filter()` rows and `dplyr::select()` columns to keep the annotation columns required for any specific gene of interest. 
Here, we illustrate how you would obtain the annotation for the gene *SOD1*, ready for plotting with `ggtranscript`.\n\n```{r get-sod1-annot}\n\n# filter your gtf for the gene of interest, here \"SOD1\"\ngene_of_interest <- \"SOD1\"\n\nsod1_annotation_from_gtf <- gtf %>% \n  dplyr::filter(\n    !is.na(gene_name), \n    gene_name == gene_of_interest\n  ) \n\n# extract the required annotation columns\nsod1_annotation_from_gtf <- sod1_annotation_from_gtf %>% \n  dplyr::select(\n    seqnames,\n    start,\n    end,\n    strand,\n    type,\n    gene_name,\n    transcript_name,\n    transcript_biotype\n  )\n\nsod1_annotation_from_gtf %>% head()\n\n```\n\n### Importing data from a bed file\n\nTo plot ranges from a `bed` file using `ggtranscript`, you can first import the `bed` file into `R` using `rtracklayer::import.bed()`. This creates a `UCSCData` object.\n\n```{r import-bed-data}\n\n# for the example, we'll use the test bed file provided by rtracklayer \ntest_bed <- system.file(\"tests/test.bed\", package = \"rtracklayer\")\n\nbed <- rtracklayer::import.bed(test_bed)\n\nclass(bed)\n\n```\n\nA `UCSCData` object can be coerced using `dplyr::as_tibble()` into a `tibble`, a data structure that can be plotted with `ggplot2`/`ggtranscript`.\n\n```{r convert-bed-df}\n\nbed <- bed %>% dplyr::as_tibble()\n\nclass(bed)\n\nbed %>% head()\n\n```\n\n<br>\n\n## Basic usage\n\n### Required aesthetics\n\n`ggtranscript` introduces five new geoms designed to simplify the visualization of transcript structure and annotation: `geom_range()`, `geom_half_range()`, `geom_intron()`, `geom_junction()` and `geom_junction_label_repel()`. 
The required aesthetics (`aes()`) for these geoms are designed to match the data formats which are widely used in genetic and transcriptomic analyses:\n\n```{r geom-aes, echo = FALSE}\n\ndplyr::tribble(\n    ~`Required aes()`, ~Type, ~Description, ~`Required by`,\n    #-------|-----|----|--------\n    \"xstart\", \"integer\", \"Start position in base pairs\", \"All geoms\", \n    \"xend\", \"integer\", \"End position in base pairs\", \"All geoms\", \n    \"y\", \"character or factor\", \"The group used for the y axis, most often a transcript id or name\", \"All geoms\",\n    \"label\", \"integer or character\", \"Variable used to label junction curves\", \"Only geom_junction_label_repel()\",\n) %>% \n  knitr::kable()\n\n```\n\n### Plotting exons and introns {#plotting_exons_and_introns}\n\nIn the simplest case, the core components of transcript structure are exons and introns, which can be plotted using `geom_range()` and `geom_intron()`. To facilitate this, `ggtranscript` also provides `to_intron()`, which converts exon co-ordinates into introns. 
Therefore, you can plot transcript structures with only exons as input and just a few lines of code.\n\n> 📌: As `ggtranscript` geoms share required aesthetics, it is recommended to set these via `ggplot()`, rather than in the individual `geom_*()` calls, to avoid code duplication.\n\n```{r geom-range-intron}\n\n# to illustrate the package's functionality\n# ggtranscript includes example transcript annotation\nsod1_annotation %>% head()\n\n# extract exons\nsod1_exons <- sod1_annotation %>% dplyr::filter(type == \"exon\")\n\nsod1_exons %>%\n    ggplot(aes(\n        xstart = start,\n        xend = end,\n        y = transcript_name\n    )) +\n    geom_range(\n        aes(fill = transcript_biotype)\n    ) +\n    geom_intron(\n        data = to_intron(sod1_exons, \"transcript_name\"),\n        aes(strand = strand)\n    )\n\n```\n\n### Differentiating UTRs from the coding sequence\n\nAs suggested by its name, `geom_range()` is designed to visualize range-based transcript annotation. This includes but is not limited to exons. For instance, for protein coding transcripts it can be useful to visually distinguish the coding sequence (CDS) of a transcript from its UTRs. 
This can be achieved by adjusting the height and fill of `geom_range()` and overlaying the CDS on top of the exons (including UTRs).\n\n```{r geom-range-intron-w-cds}\n\n# filter for only exons from protein coding transcripts\nsod1_exons_prot_cod <- sod1_exons %>%\n    dplyr::filter(transcript_biotype == \"protein_coding\")\n\n# obtain cds\nsod1_cds <- sod1_annotation %>% dplyr::filter(type == \"CDS\")\n\nsod1_exons_prot_cod %>%\n    ggplot(aes(\n        xstart = start,\n        xend = end,\n        y = transcript_name\n    )) +\n    geom_range(\n        fill = \"white\",\n        height = 0.25\n    ) +\n    geom_range(\n        data = sod1_cds\n    ) +\n    geom_intron(\n        data = to_intron(sod1_exons_prot_cod, \"transcript_name\"),\n        aes(strand = strand),\n        arrow.min.intron.length = 500\n    )\n\n```\n\n### Plotting junctions\n\n`geom_junction()` plots curved lines that are intended to represent junction reads. Junctions are reads obtained through RNA-sequencing (RNA-seq) data that map with gapped alignment to the genome. Often, this gap is indicative of a splicing event, but it can also originate from other genomic events such as indels.\n\nIt can be useful to visually overlay junctions on top of an existing transcript structure. For example, this can help to understand which existing transcripts are expressed in the RNA-seq sample or inform the location or interpretation of the novel splice sites.\n\n`geom_junction_label_repel()` adds labels to junction curves. This can be useful for labeling junctions with a measure of their expression or support such as read counts or percent-spliced-in. Alternatively, you may choose to visually map this measure to the thickness of the junction curves by adjusting the size `aes()`. 
Or, as shown below, both of these options can be combined.\n\n```{r geom-junction, fig.height = 3}\n\n# extract exons and cds for the MANE-select transcript\nsod1_201_exons <- sod1_exons %>% dplyr::filter(transcript_name == \"SOD1-201\")\nsod1_201_cds <- sod1_cds %>% dplyr::filter(transcript_name == \"SOD1-201\")\n\n# add transcript name column to junctions for plotting\nsod1_junctions <- sod1_junctions %>% dplyr::mutate(transcript_name = \"SOD1-201\")\n\nsod1_201_exons %>%\n  ggplot(aes(\n    xstart = start,\n    xend = end,\n    y = transcript_name\n  )) +\n  geom_range(\n    fill = \"white\", \n    height = 0.25\n  ) +\n  geom_range(\n    data = sod1_201_cds\n  ) + \n  geom_intron(\n    data = to_intron(sod1_201_exons, \"transcript_name\")\n  ) + \n  geom_junction(\n    data = sod1_junctions,\n    aes(size = mean_count),\n    junction.y.max = 0.5\n  ) +\n  geom_junction_label_repel(\n    data = sod1_junctions,\n    aes(label = round(mean_count, 2)),\n    junction.y.max = 0.5\n  ) + \n  scale_size_continuous(range = c(0.1, 1))\n\n```\n\n<br>\n\n## Visualizing transcript structure differences\n\n### Context\n\nOne of the primary reasons for visualizing transcript structures is to better observe the differences between them. Often this can be achieved by simply plotting the exons and introns as shown in [basic usage](#plotting_exons_and_introns). However, for longer, complex transcripts this may not be as straightforward. \n\nFor example, the transcripts of *PKNOX1* have relatively long introns, which makes the comparison between transcript structures (especially small differences in exons) more difficult. \n\n> 📌: For relatively short introns, strand arrows may overlap exons. In such cases, the `arrow.min.intron.length` parameter of `geom_intron()` can be used to set the minimum intron length for a strand arrow to be plotted. 
\n\n```{r transcript-diff-base}\n\n# extract exons\npknox1_exons <- pknox1_annotation %>% dplyr::filter(type == \"exon\")\n\npknox1_exons %>%\n    ggplot(aes(\n        xstart = start,\n        xend = end,\n        y = transcript_name\n    )) +\n    geom_range(\n        aes(fill = transcript_biotype)\n    ) +\n    geom_intron(\n        data = to_intron(pknox1_exons, \"transcript_name\"),\n        aes(strand = strand), \n        arrow.min.intron.length = 3500\n    )\n\n```\n\n### Improving transcript structure visualisation using `shorten_gaps()`\n\n`ggtranscript` provides the helper function `shorten_gaps()`, which reduces the size of the gaps (regions that do not overlap an exon). `shorten_gaps()` then rescales the exon and intron co-ordinates, preserving the original exon alignment. This allows you to home in on the differences of interest, such as the exonic structure.\n\n> 📌: The rescaled co-ordinates returned by `shorten_gaps()` will not match the original genomic positions. Therefore, it is recommended that `shorten_gaps()` is used for visualization purposes only. 
\n\n```{r shorten-gaps}\n\n# extract exons\npknox1_exons <- pknox1_annotation %>% dplyr::filter(type == \"exon\")\n\npknox1_rescaled <- shorten_gaps(\n  exons = pknox1_exons, \n  introns = to_intron(pknox1_exons, \"transcript_name\"), \n  group_var = \"transcript_name\"\n)\n\n# shorten_gaps() returns exons and introns all in one data.frame\n# let's split these for plotting \npknox1_rescaled_exons <- pknox1_rescaled %>% dplyr::filter(type == \"exon\") \npknox1_rescaled_introns <- pknox1_rescaled %>% dplyr::filter(type == \"intron\") \n\npknox1_rescaled_exons %>% \n    ggplot(aes(\n        xstart = start,\n        xend = end,\n        y = transcript_name\n    )) +\n    geom_range(\n        aes(fill = transcript_biotype)\n    ) +\n    geom_intron(\n        data = pknox1_rescaled_introns,\n        aes(strand = strand), \n        arrow.min.intron.length = 300\n    )\n\n```\n\n### Comparing between two transcripts using `geom_half_range()`\n\nIf you are interested in the differences between two transcripts, you can use `geom_half_range()` whilst adjusting `range.orientation` to plot the exons from each on opposite sides of the transcript structure. This can reveal small differences in exon structure, such as those observed here at the 5' ends of *PKNOX1-201* and *PKNOX1-203*. 
\n\n```{r geom-half-range, fig.height = 3}\n\n# extract the two transcripts to be compared\npknox1_rescaled_201_exons <- pknox1_rescaled_exons %>% \n  dplyr::filter(transcript_name == \"PKNOX1-201\")\npknox1_rescaled_203_exons <- pknox1_rescaled_exons %>% \n  dplyr::filter(transcript_name == \"PKNOX1-203\")\n\npknox1_rescaled_201_exons %>%\n    ggplot(aes(\n        xstart = start,\n        xend = end,\n        y = \"PKNOX1-201/203\"\n    )) +\n    geom_half_range() +\n    geom_intron(\n        data = to_intron(pknox1_rescaled_201_exons, \"transcript_name\"), \n        arrow.min.intron.length = 300\n    ) +\n    geom_half_range(\n        data = pknox1_rescaled_203_exons,\n        range.orientation = \"top\", \n        fill = \"purple\"\n    ) +\n    geom_intron(\n        data = to_intron(pknox1_rescaled_203_exons, \"transcript_name\"), \n        arrow.min.intron.length = 300\n    )\n\n```\n\n### Comparing many transcripts to a single reference transcript using `to_diff()`\n\nSometimes, it can be useful to visualize the differences of several transcripts with respect to one transcript. For example, you may be interested in how other transcripts differ in structure from the MANE-select transcript. This exploration can reveal whether certain important regions are missing or novel regions are added, hinting at differences in transcript function.\n\n`to_diff()` is a helper function designed for this situation. Here, we apply this to *PKNOX1*, finding the differences between all other transcripts and the MANE-select transcript (*PKNOX1-201*). \n\n> 📌: Although we apply `to_diff()` below to the rescaled exons and introns (output by `shorten_gaps()`), `to_diff()` can also be applied to the original, unscaled transcripts with the same effect. 
\n\n```{r to-diff}\n\nmane <- pknox1_rescaled_201_exons\n\nnot_mane <- pknox1_rescaled_exons %>% \n  dplyr::filter(transcript_name != \"PKNOX1-201\")\n\npknox1_rescaled_diffs <- to_diff(\n    exons = not_mane,\n    ref_exons = mane,\n    group_var = \"transcript_name\"\n)\n\npknox1_rescaled_exons %>%\n    ggplot(aes(\n        xstart = start,\n        xend = end,\n        y = transcript_name\n    )) +\n    geom_range() +\n    geom_intron(\n        data = pknox1_rescaled_introns,\n        arrow.min.intron.length = 300\n    ) +\n    geom_range(\n        data = pknox1_rescaled_diffs,\n        aes(fill = diff_type),\n        alpha = 0.2\n    )\n\n```\n\n<br>\n\n## Integrating existing `ggplot2` functionality\n\nAs a `ggplot2` extension, `ggtranscript` inherits `ggplot2`'s familiarity and flexibility, enabling users to intuitively adjust aesthetics, parameters, scales, etc., as well as complement `ggtranscript` geoms with existing `ggplot2` geoms to create informative, publication-ready plots.\n\nBelow are some examples of complementing `ggtranscript` with existing `ggplot2` functionality that we have found useful: \n\n  - Adding exon annotation such as [exon number/order](https://dzhang32.github.io/ggtranscript/reference/add_exon_number.html) using `add_exon_number()` and `geom_text()`\n  \n```{r exon-num-ex}\n\nbase_sod1_plot <- sod1_exons %>% \n  ggplot(aes(\n    xstart = start,\n    xend = end,\n    y = transcript_name\n  )) +\n  geom_range(\n    aes(fill = transcript_biotype)\n  ) +\n  geom_intron(\n    data = to_intron(sod1_exons, \"transcript_name\"),\n    aes(strand = strand)\n  ) \n\nbase_sod1_plot + \n  geom_text(\n    data = add_exon_number(sod1_exons, \"transcript_name\"),\n    aes(\n      x = (start + end) / 2, # plot label at midpoint of exon\n      label = exon_number\n    ),\n    size = 3.5,\n    nudge_y = 0.4\n  )\n\n```\n  \n  - Zooming in on areas of interest using `coord_cartesian()` or `ggforce::facet_zoom()`\n  \n```{r 
zoom-ex}\n\nbase_sod1_plot + \n  coord_cartesian(xlim = c(31665500, 31669000))\n\n```\n  \n  - Plotting mutations using `geom_vline()`\n  \n```{r mutation-ex}\n\nexample_mutation <- dplyr::tibble(\n  transcript_name = \"SOD1-204\", \n  position = 31661600\n)\n\n# xstart and xend are set here to override the default aes()\nbase_sod1_plot + \n  geom_vline(\n    data = example_mutation, \n    aes(\n      xintercept = position, \n      xstart = NULL,\n      xend = NULL\n      ), \n    linetype = 2,\n    colour = \"red\"\n  )\n\n```\n\n  - Beautifying plots using themes and scales\n  \n```{r beautify-ex}\n\nbase_sod1_plot + \n  theme_bw() + \n  scale_x_continuous(name = \"Position\") + \n  scale_y_discrete(name = \"Transcript name\") + \n  scale_fill_discrete(\n    name = \"Transcript biotype\",\n    labels = c(\"Processed transcript\", \"Protein-coding\")\n    )\n  \n```\n  \n<br>\n\n## Session info\n\n<details>\n  <summary>Show/hide</summary>\n```{r session-info, echo = FALSE}\n\nlibrary(\"sessioninfo\")\n\noptions(width = 120)\n\nsession_info()\n```\n</details> \n"
  }
]