Repository: egeulgen/pathfindR Branch: master Commit: 7ce1330d6d16 Files: 138 Total size: 688.6 KB Directory structure: gitextract_46eozuid/ ├── .Rbuildignore ├── .Rinstignore ├── .github/ │ ├── .gitignore │ ├── ISSUE_TEMPLATE/ │ │ ├── bug_report.md │ │ └── feature_request.md │ └── workflows/ │ ├── R-CMD-check.yaml │ ├── branch_naming_policy.yaml │ ├── pkgdown.yaml │ └── test-coverage.yaml ├── .gitignore ├── CODE_OF_CONDUCT.md ├── CONTRIBUTING.md ├── DESCRIPTION ├── LICENSE ├── LICENSE.md ├── NAMESPACE ├── NEWS.md ├── R/ │ ├── active_snw_search.R │ ├── clustering.R │ ├── comparison.R │ ├── core.R │ ├── data_generation.R │ ├── enrichment.R │ ├── pathfindr.R │ ├── scoring.R │ ├── utility.R │ ├── visualization.R │ └── zzz.R ├── README.Rmd ├── README.md ├── _pkgdown.yml ├── codecov.yml ├── cran-comments.md ├── inst/ │ ├── CITATION │ ├── extdata/ │ │ ├── CREB.txt │ │ ├── MYC.txt │ │ └── resultActiveSubnetworkSearch.txt │ ├── java/ │ │ └── ActiveSubnetworkSearch.jar │ └── rmd/ │ ├── conversion_table.Rmd │ ├── enriched_terms.Rmd │ └── results.Rmd ├── java/ │ ├── ActiveSubnetworkSearchAlgorithms/ │ │ ├── ActiveSubnetworkSearch.java │ │ ├── GAIndividual.java │ │ ├── GeneticAlgorithm.java │ │ ├── GreedySearch.java │ │ └── SimulatedAnnealing.java │ ├── ActiveSubnetworkSearchMisc/ │ │ ├── Gaussian.java │ │ ├── ScoreCalculations.java │ │ ├── Subnetwork.java │ │ └── ZStatistics.java │ ├── Application/ │ │ ├── AppActiveSubnetworkSearch.java │ │ └── Parameters.java │ ├── File/ │ │ ├── ExperimentFileReader.java │ │ └── SIFReader.java │ └── Network/ │ ├── Network.java │ ├── Node.java │ └── SubnetworkFinder.java ├── man/ │ ├── UpSet_plot.Rd │ ├── active_snw_enrichment_wrapper.Rd │ ├── active_snw_search.Rd │ ├── annotate_term_genes.Rd │ ├── check_java_version.Rd │ ├── cluster_enriched_terms.Rd │ ├── cluster_graph_vis.Rd │ ├── color_kegg_pathway.Rd │ ├── combine_pathfindR_results.Rd │ ├── combined_results_graph.Rd │ ├── configure_output_dir.Rd │ ├── create_HTML_report.Rd │ ├── 
create_kappa_matrix.Rd │ ├── enrichment.Rd │ ├── enrichment_analyses.Rd │ ├── enrichment_chart.Rd │ ├── fetch_gene_set.Rd │ ├── fetch_java_version.Rd │ ├── filterActiveSnws.Rd │ ├── fuzzy_term_clustering.Rd │ ├── get_biogrid_pin.Rd │ ├── get_gene_sets_list.Rd │ ├── get_kegg_gsets.Rd │ ├── get_mgsigdb_gsets.Rd │ ├── get_pin_file.Rd │ ├── get_reactome_gsets.Rd │ ├── gset_list_from_gmt.Rd │ ├── hierarchical_term_clustering.Rd │ ├── hyperg_test.Rd │ ├── input_processing.Rd │ ├── input_testing.Rd │ ├── isColor.Rd │ ├── pathfindr.Rd │ ├── plot_scores.Rd │ ├── process_pin.Rd │ ├── return_pin_path.Rd │ ├── run_pathfindr.Rd │ ├── safe_get_content.Rd │ ├── score_terms.Rd │ ├── single_iter_wrapper.Rd │ ├── summarize_enrichment_results.Rd │ ├── term_gene_graph.Rd │ ├── term_gene_heatmap.Rd │ ├── visualize_KEGG_diagram.Rd │ ├── visualize_active_subnetworks.Rd │ ├── visualize_term_interactions.Rd │ └── visualize_terms.Rd ├── renv/ │ ├── .gitignore │ ├── activate.R │ └── settings.json ├── revdep/ │ ├── .gitignore │ ├── email.yml │ └── failures.md ├── slides/ │ └── cost_charme_school/ │ └── demo_script.R ├── tests/ │ ├── testthat/ │ │ ├── test-active_snw_search.R │ │ ├── test-clustering.R │ │ ├── test-comparison.R │ │ ├── test-core.R │ │ ├── test-data_generation.R │ │ ├── test-enrichment.R │ │ ├── test-scoring.R │ │ ├── test-utility.R │ │ ├── test-visualization.R │ │ └── test-zzz.R │ ├── testthat-active_snw.R │ ├── testthat-clustering.R │ ├── testthat-comparison.R │ ├── testthat-core.R │ ├── testthat-data_generation.R │ ├── testthat-enrichment.R │ ├── testthat-scoring.R │ ├── testthat-utility.R │ ├── testthat-visualization.R │ └── testthat-zzz.R └── vignettes/ ├── .gitignore ├── comparing_results.Rmd ├── intro_vignette.Rmd ├── manual_execution.Rmd ├── non_hs_analysis.Rmd ├── obtain_data.Rmd └── visualization_vignette.Rmd ================================================ FILE CONTENTS ================================================ ================================================ 
FILE: .Rbuildignore ================================================ ^renv$ ^renv\.lock$ ^slides$ ^CODE_OF_CONDUCT\.md$ ^CONTRIBUTING.md$ ^\.github$ ^Meta$ ^doc$ ^.*\.Rprofile$ ^.*\.Rproj$ ^\.Rproj\.user$ ^data-raw$ ^misc$ ^README.md$ ^\.travis\.yml$ ^cran-comments\.md$ ^CRAN-RELEASE$ ^Dockerfile_dev$ ^codecov\.yml$ ^LICENSE\.md$ ^README\.Rmd$ ^docs$ ^_pkgdown\.yml$ ^pkgdown$ ^revdep$ ^CRAN-SUBMISSION$ ================================================ FILE: .Rinstignore ================================================ ^slides$ ^java$ ^misc_data$ ================================================ FILE: .github/.gitignore ================================================ *.html ================================================ FILE: .github/ISSUE_TEMPLATE/bug_report.md ================================================ --- name: Bug report about: Create a report to help us improve title: '' labels: 'bug' assignees: '' --- **Describe the bug** A clear and concise description of what the bug is. **To Reproduce** Steps to reproduce the behavior: 1. Prepare input as '...' 2. Run the following function: '....' 3. See error **Expected behavior** A clear and concise description of what you expected to happen. **Screenshots** If applicable, add screenshots to help explain your problem. **Desktop (please complete the following information):** - OS: [e.g. macOS, Windows, Linux] - Version [e.g. 10.14.5] ** R Session Information:** Please provide the R session information (by running `sessionInfo()`) **Additional context** Add any other context about the problem here. While pathfindR is an R package, the active subnetwork search functionality is written in Java. 
If you suspect any issue regarding Java, please provide your Java version (by running `java --version`) ================================================ FILE: .github/ISSUE_TEMPLATE/feature_request.md ================================================ --- name: Feature request about: Suggest an idea for this project title: '' labels: 'enhancement' assignees: '' --- **Is your feature request related to a problem? Please describe.** A clear and concise description of what the problem is. Ex. I'm always frustrated when [...] **Describe the solution you'd like** A clear and concise description of what you want to happen. **Describe alternatives you've considered** A clear and concise description of any alternative solutions or features you've considered. **Additional context** Add any other context or screenshots about the feature request here. ================================================ FILE: .github/workflows/R-CMD-check.yaml ================================================ # Workflow derived from https://github.com/r-lib/actions/tree/v2/examples # Need help debugging build failures? 
Start at https://github.com/r-lib/actions#where-to-find-help on: push: branches: - master pull_request: branches: - master name: R-CMD-check jobs: R-CMD-check: runs-on: ${{ matrix.config.os }} name: ${{ matrix.config.os }} (${{ matrix.config.r }}) strategy: fail-fast: false matrix: config: - {os: macos-latest, r: 'release'} - {os: windows-latest, r: 'release'} - {os: ubuntu-latest, r: 'devel', http-user-agent: 'release'} - {os: ubuntu-latest, r: 'release'} - {os: ubuntu-latest, r: 'oldrel-1'} env: GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }} R_KEEP_PKG_SOURCE: yes steps: - uses: actions/checkout@v4 - uses: r-lib/actions/setup-pandoc@v2 - uses: r-lib/actions/setup-r@v2 with: r-version: ${{ matrix.config.r }} http-user-agent: ${{ matrix.config.http-user-agent }} use-public-rspm: true - uses: r-lib/actions/setup-r-dependencies@v2 with: extra-packages: any::rcmdcheck needs: check - uses: r-lib/actions/check-r-package@v2 with: upload-snapshots: true build_args: 'c("--no-manual","--compact-vignettes=gs+qpdf")' ================================================ FILE: .github/workflows/branch_naming_policy.yaml ================================================ name: Branch Naming Policy Action on: create: delete: pull_request: branches: - master jobs: branch-naming-policy: runs-on: ubuntu-latest steps: - name: Checkout uses: actions/checkout@v4 - name: Run Branch Naming Policy Action uses: nicklegan/github-repo-branch-naming-policy-action@v1.1.1 if: github.ref_type == 'branch' || github.ref_type == 'pull_request' with: token: ${{ secrets.REPO_TOKEN }} regex: '^(feature|fix|docs|refactor|test|release|chore|experiment)\/[a-zA-Z0-9-]+$' delete: true ================================================ FILE: .github/workflows/pkgdown.yaml ================================================ # Workflow derived from https://github.com/r-lib/actions/tree/v2/examples # Need help debugging build failures? 
Start at https://github.com/r-lib/actions#where-to-find-help on: push: branches: [main, master] pull_request: branches: [main, master] release: types: [published] workflow_dispatch: name: pkgdown jobs: pkgdown: runs-on: ubuntu-latest # Only restrict concurrency for non-PR jobs concurrency: group: pkgdown-${{ github.event_name != 'pull_request' || github.run_id }} env: GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }} permissions: contents: write steps: - uses: actions/checkout@v4 - uses: r-lib/actions/setup-pandoc@v2 - uses: r-lib/actions/setup-r@v2 with: use-public-rspm: true - uses: r-lib/actions/setup-r-dependencies@v2 with: extra-packages: any::pkgdown, local::. needs: website - name: Build site run: pkgdown::build_site_github_pages(new_process = FALSE, install = FALSE) shell: Rscript {0} - name: Deploy to GitHub pages 🚀 if: github.event_name != 'pull_request' uses: JamesIves/github-pages-deploy-action@v4.5.0 with: clean: false branch: gh-pages folder: docs ================================================ FILE: .github/workflows/test-coverage.yaml ================================================ # Workflow derived from https://github.com/r-lib/actions/tree/v2/examples # Need help debugging build failures? 
Start at https://github.com/r-lib/actions#where-to-find-help on: push: branches: [main, master] pull_request: branches: [main, master] name: test-coverage jobs: test-coverage: runs-on: ubuntu-latest env: GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }} steps: - uses: actions/checkout@v4 - uses: r-lib/actions/setup-r@v2 with: use-public-rspm: true - uses: r-lib/actions/setup-r-dependencies@v2 with: extra-packages: any::covr needs: coverage - name: Test coverage run: | covr::codecov( quiet = FALSE, clean = FALSE, install_path = file.path(normalizePath(Sys.getenv("RUNNER_TEMP"), winslash = "/"), "package") ) shell: Rscript {0} - name: Show testthat output if: always() run: | ## -------------------------------------------------------------------- find '${{ runner.temp }}/package' -name 'testthat.Rout*' -exec cat '{}' \; || true shell: bash - name: Upload test results if: failure() uses: actions/upload-artifact@v4 with: name: coverage-test-failures path: ${{ runner.temp }}/package ================================================ FILE: .gitignore ================================================ Meta doc inst/doc misc data-raw *.pptx .Rprofile *.DS_Store *.Rproj *.RData *.Ruserdata *.Rproj.user *.Rhistory .Rproj.user docs ================================================ FILE: CODE_OF_CONDUCT.md ================================================ # Contributor Covenant Code of Conduct ## Our Pledge In the interest of fostering an open and welcoming environment, we as contributors and maintainers pledge to making participation in our project and our community a harassment-free experience for everyone, regardless of age, body size, disability, ethnicity, sex characteristics, gender identity and expression, level of experience, education, socio-economic status, nationality, personal appearance, race, religion, or sexual identity and orientation. 
## Our Standards Examples of behavior that contributes to creating a positive environment include: * Using welcoming and inclusive language * Being respectful of differing viewpoints and experiences * Gracefully accepting constructive criticism * Focusing on what is best for the community * Showing empathy towards other community members Examples of unacceptable behavior by participants include: * The use of sexualized language or imagery and unwelcome sexual attention or advances * Trolling, insulting/derogatory comments, and personal or political attacks * Public or private harassment * Publishing others' private information, such as a physical or electronic address, without explicit permission * Other conduct which could reasonably be considered inappropriate in a professional setting ## Our Responsibilities Project maintainers are responsible for clarifying the standards of acceptable behavior and are expected to take appropriate and fair corrective action in response to any instances of unacceptable behavior. Project maintainers have the right and responsibility to remove, edit, or reject comments, commits, code, wiki edits, issues, and other contributions that are not aligned to this Code of Conduct, or to ban temporarily or permanently any contributor for other behaviors that they deem inappropriate, threatening, offensive, or harmful. ## Scope This Code of Conduct applies both within project spaces and in public spaces when an individual is representing the project or its community. Examples of representing a project or community include using an official project e-mail address, posting via an official social media account, or acting as an appointed representative at an online or offline event. Representation of a project may be further defined and clarified by project maintainers. ## Enforcement Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by contacting the project team at egeulgen@gmail.com. 
All complaints will be reviewed and investigated and will result in a response that is deemed necessary and appropriate to the circumstances. The project team is obligated to maintain confidentiality with regard to the reporter of an incident. Further details of specific enforcement policies may be posted separately. Project maintainers who do not follow or enforce the Code of Conduct in good faith may face temporary or permanent repercussions as determined by other members of the project's leadership. ## Attribution This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4, available at https://www.contributor-covenant.org/version/1/4/code-of-conduct.html [homepage]: https://www.contributor-covenant.org For answers to common questions about this code of conduct, see https://www.contributor-covenant.org/faq ================================================ FILE: CONTRIBUTING.md ================================================ # Contributing to pathfindR development The goal of this guide is to help you in contributing to pathfindR. The guide is divided into two main pieces: 1. Filing a bug report or feature request in an issue. 1. Suggesting a change via a pull request. Please note that pathfindR is released with a [Contributor Code of Conduct](CODE_OF_CONDUCT.md). By contributing to this project, you agree to abide by its terms. ## Issues When filing an issue, the most important thing is to include a minimal reproducible example so that we can quickly verify the problem, and then figure out how to fix it. There are three things you need to include to make your example reproducible: required packages, data, code. 1. **Packages** should be loaded at the top of the script, so it's easy to see which ones the example needs. 1. The easiest way to include **data** is to use `dput()` to generate the R code to recreate it. For example, to recreate the `mtcars` dataset in R, I'd perform the following steps: 1. Run `dput(mtcars)` in R 2. 
Copy the output 3. In my reproducible script, type `mtcars <- ` then paste. But even better is if you can create a `data.frame()` with just a handful of rows and columns that still illustrates the problem. For more complex **data**, you can use `saveRDS()` to save the object and attach it to the issue. 1. Spend a little bit of time ensuring that your **code** is easy for others to read: * make sure you've used spaces and your variable names are concise, but informative * use comments to indicate where your problem lies * do your best to remove everything that is not related to the problem. The shorter your code is, the easier it is to understand. You can check you have actually made a reproducible example by starting up a fresh R session and pasting your script in. ## Pull requests To contribute a change to pathfindR, follow these steps: 1. Create a branch in git and make your changes. 1. Push the branch to GitHub and open a pull request (PR). 1. Discuss the pull request. 1. Iterate until either we accept the PR or decide that it's not a good fit for pathfindR. If you're not familiar with git or GitHub, please start by reading ## Branch Naming Conventions We follow the following branch naming convention during development: ### Feature Development: - Use the prefix `feature/` followed by a brief description of the feature. - Example: `feature/add-new-method`, `feature/update-active-snw-search` ### Bug Fixes: - Use the prefix `fix/` followed by a description of the fix or the issue number. - Example: `fix/correct-typo`, `fix/#123` ### Documentation: - Use the prefix `docs/` for updates exclusively in the documentation. - Example: `docs/update-readme`, `docs/add-examples` ### Refactoring: - Use `refactor/` when modifying the structure and organization of code without changing its external behavior. - Example: `refactor/reorganize-tests`, `refactor/optimization-code` ### Testing: - Use `test/` for changes related to testing only. 
- Example: `test/add-unit-tests`, `test/expand-tests` ### Releases (for maintainers only): - Use `release/` for preparing a new version release. - Example: `release/v1.0.0`, `release/v2.0.0` ### Chore/Maintenance (mostly for maintainers): - Use `chore/` for mundane tasks like updating dependencies or minor tasks that don't modify the source code. - Example: `chore/update-packages`, `chore/license-update` ### Experimental: - Use `experiment/` for experimental work that might not be merged into `master` - Example: `experiment/new-algorithm`, `experiment/test-new-library` # Attribution This Contributing guide was adapted from [ggplot2](https://github.com/tidyverse/ggplot2) ================================================ FILE: DESCRIPTION ================================================ Package: pathfindR Type: Package Title: Enrichment Analysis Utilizing Active Subnetworks Version: 2.7.0.9000 Authors@R: c(person("Ege", "Ulgen", role = c("cre", "cph"), email = "egeulgen@gmail.com", comment = c(ORCID = "0000-0003-2090-3621")), person("Ozan", "Ozisik", role = "aut", email = "ozanytu@gmail.com", comment = c(ORCID = "0000-0001-5980-8002"))) Maintainer: Ege Ulgen Description: Enrichment analysis enables researchers to uncover mechanisms underlying a phenotype. However, conventional methods for enrichment analysis do not take into account protein-protein interaction information, resulting in incomplete conclusions. 'pathfindR' is a tool for enrichment analysis utilizing active subnetworks. The main function identifies active subnetworks in a protein-protein interaction network using a user-provided list of genes and associated p values. It then performs enrichment analyses on the identified subnetworks, identifying enriched terms (i.e. pathways or, more broadly, gene sets) that possibly underlie the phenotype of interest. 
'pathfindR' also offers functionalities to cluster the enriched terms and identify representative terms in each cluster, to score the enriched terms per sample and to visualize analysis results. The enrichment, clustering and other methods implemented in 'pathfindR' are described in detail in Ulgen E, Ozisik O, Sezerman OU. 2019. 'pathfindR': An R Package for Comprehensive Identification of Enriched Pathways in Omics Data Through Active Subnetworks. Front. Genet. . License: MIT + file LICENSE URL: https://egeulgen.github.io/pathfindR/, https://github.com/egeulgen/pathfindR BugReports: https://github.com/egeulgen/pathfindR/issues Encoding: UTF-8 LazyData: true SystemRequirements: Java (>= 8.0) biocViews: Imports: DBI, AnnotationDbi, doParallel, foreach, rmarkdown, ggplot2, ggraph, ggupset, fpc, ggkegg (>= 1.4.0), grDevices, httr, igraph, R.utils, msigdbr (>= 24.1.0), knitr Depends: R (>= 4.3.0), pathfindR.data (>= 2.0) Suggests: org.Hs.eg.db, testthat (>= 2.3.2), covr, mockery RoxygenNote: 7.3.3 VignetteBuilder: knitr ================================================ FILE: LICENSE ================================================ YEAR: 2020 COPYRIGHT HOLDER: Ege Ulgen ================================================ FILE: LICENSE.md ================================================ # MIT License Copyright (c) 2020 Ege Ulgen Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. 
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. ================================================ FILE: NAMESPACE ================================================ # Generated by roxygen2: do not edit by hand export(UpSet_plot) export(active_snw_search) export(annotate_term_genes) export(cluster_enriched_terms) export(cluster_graph_vis) export(combine_pathfindR_results) export(combined_results_graph) export(create_kappa_matrix) export(enrichment) export(enrichment_analyses) export(enrichment_chart) export(fetch_gene_set) export(filterActiveSnws) export(fuzzy_term_clustering) export(get_gene_sets_list) export(get_pin_file) export(hierarchical_term_clustering) export(hyperg_test) export(input_processing) export(input_testing) export(plot_scores) export(return_pin_path) export(run_pathfindR) export(score_terms) export(summarize_enrichment_results) export(term_gene_graph) export(term_gene_heatmap) export(visualize_KEGG_diagram) export(visualize_active_subnetworks) export(visualize_term_interactions) export(visualize_terms) import(doParallel) import(foreach) import(ggplot2) import(ggraph) import(graphics) import(knitr) import(parallel) import(pathfindR.data) import(rmarkdown) importFrom(ggkegg,pathway) importFrom(httr,GET) importFrom(httr,content) importFrom(httr,http_error) importFrom(httr,status_code) importFrom(httr,timeout) ================================================ FILE: NEWS.md ================================================ # pathfindR (development version) # pathfindR 2.7.0 ## Minor Changes and Bug Fixes - Moved org.Hs.eg.db from "Imports" to 
"Suggests" per new CRAN policy. Relevant functions revert to default behaviour if the required package is not installed. # pathfindR 2.6.0 ## Minor Changes and Bug Fixes - fixed missing argument issue in `get_gene_sets_list`(#230) - refactored to introduce `safe_get_content` so that URL access issues are handled more gracefully # pathfindR 2.5.1 ## Minor Changes and Bug Fixes - fixed NA values in kappa matrix generation that will cause error as part of the latest `igraph` update (#227) # pathfindR 2.5.0 ## Major Changes - updated dependencies so that `pathfindR` depends on `msigdbr (>= 24.1.0)` - added the new `db_species` argument to the `get_mgsigdb_gsets()` data generation function ## Minor Changes and Bug Fixes - fixed test assertions that will break as part of the latest `ggplot2` update (#223) # pathfindR 2.4.2 ## Minor Changes and Bug Fixes - fixed a bug in `visualize_KEGG_diagram()` where `ggkegg` was raising an error (#214) # pathfindR 2.4.1 ## Minor Changes and Bug Fixes - fixed a bug regarding KEGG gene set fetching: removed the conversion functionality in `get_kegg_gsets()` which now returns KEGG IDs so that the user can convert the returned identifiers using a more appropriate tool (e.g. 
BioMart) should they wish # pathfindR 2.4.0 ## Major Changes - implemented a new `color_kegg_pathway()` function using `ggkegg` to create colored KEGG pathway ggplot objects (instead of using `KEGGREST` to obtain the colored PNG files, which no longer works, #169) - renamed the `visualize_hsa_KEGG` function to `visualize_KEGG_diagram()` to reflect that it is now able to handle KEGG pathway enrichment results from any organism - updated the `visualize_terms()`, `visualize_term_interactions()` and `visualize_KEGG_diagram()` functions so that they now return a list of ggplot objects (named by term ID) - updated the `get_kegg_gsets()` function to also use `ggkegg` for fetching genes per pathway data - removed unneeded dependencies: `magick`, `KEGGgraph` and `KEGGREST` ## Minor Changes and Bug Fixes - updated the `get_biogrid_pin()` function so that it can now determine the latest version and download/process it from BioGRID (via setting `release = "latest"`, which is now the default behavior) # pathfindR 2.3.1 ## Minor Changes and Bug Fixes - fixed a bug in the `UpSet_plot()` function regarding the interaction with the `ggupset` package that was discovered in a reverse dependency check for `ggplot2 3.5.0` (#189) - fixed gene symbol case mismatch issue in `score_terms()` (#186) - applied enhancement suggestion from #184 to enable scale fill manual for `term_gene_graph()` # pathfindR 2.3.0 ## Major Changes - reverted removal of `create_HTML_report()` so `run_pathfindR()` once again generates HTML reports # pathfindR 2.2.0 ## Minor Changes and Bug Fixes - added the `disable_parallel` argument in `active_snw_enrichment_wrapper()` to be able to disable parallel runs via `foreach` - fixed the issue encountered on CentOS where `foreach` wasn't loading `pathfindR` (#164) - fixed a CRAN error due to a package documentation issue (#172) - performed some refactoring and updated/improved all tests # pathfindR 2.1.0 ## Minor Changes and Bug Fixes - removed `create_HTML_report()` so 
`run_pathfindR()` no longer generates an HTML report # pathfindR 2.0.1 ## Minor Changes and Bug Fixes - added the `dir_for_report` argument in the internal function `create_HTML_report()` to fix test issues on CRAN # pathfindR 2.0.0 ## Major Changes - updated the Java active subnetwork search component and added the `seedForRandom` argument in `active_snw_search()` to ensure reproducibility. By default, in `run_pathfindR()`, a seed is set for each iteration to produce reproducible results (#108) - as the example input/output data were renamed for convenience in 'pathfindR.data' v2.0, 'pathfindR' now depends on pathfindR.data (>= 2.0) - refactored/simplified `run_pathfindR()` - visualization of enriched term diagrams is now NOT part of `run_pathfindR()` - default behavior of `run_pathfindR()` is now to run in a temporary directory. The user can still set `output_dir` to run in a specified directory and also produce HTML reports - in `hierarchical_term_clustering()`, updated the sequence of the number of clusters for which silhouette width is calculated for choosing the optimal number of clusters. 
This should speed up the function for cases with a large number of enriched terms - updated the relevant vignettes to reflect the implemented changes ## Minor Changes and Bug Fixes - fixed a minor issue in `return_pin_path()` where the PIN was not properly read (#157) # pathfindR 1.6.4 ## Minor Changes and Bug Fixes - updated the alias selection function within `input_processing()` so that an alias that is not already present is selected - updated the min-max scaling (controlled by `scale_vals`) in `color_kegg_pathway()`, the default is now `scale_vals=TRUE` - updated the `term_gene_heatmap()` function so that legend title is shown and can be customized - updated the `term_gene_heatmap()` function so that coloring is proper when no change values are provided in `genes_df` - added the `sort_terms_by_p` argument to the `term_gene_heatmap()` function to enable sorting of terms by 'lowest_p' - in visualization functions, made coloring of up-/down-regulated genes consistent (#126) - added the `vertex.label.cex` and `vertex.size.scaling` arguments to `cluster_graph_vis()` - added the `show_legend` argument to `visualize_term_interactions()` to toggle the legend # pathfindR 1.6.3 ## Minor Changes and Bug Fixes - Fixed coloring issue in `color_kegg_pathway()` - In `color_kegg_pathway()` the default value for `normalize_vals` is now `FALSE` # pathfindR 1.6.2 ## Major Changes - fixed an issue in `get_kegg_gsets()` where empty result was returned for some organisms due to an error in parsing (#72) ## Minor Changes and Bug Fixes - added `repel = TRUE` in `term_gene_graph()` and `combined_results_graph()` for better visualization of labels - fixed minor issue in `enrichment_chart()` (#75) - fixed minor issue in `visualize_term_interactions()` - fixed issue in `get_biogrid_pin()` where the download method was set to `wget` (now set to `auto`, per #83) - updated to using tab3 format for `get_biogrid_pin()` (if tab3 is available for the chosen release, otherwise tab2 format is 
used) - updated the default version of PIN obtained by `get_biogrid_pin()` to '4.4.200' - in `get_kegg_gsets()`, improved parsing of KEGG term descriptions so that no description is duplicated (#87) - in `score_terms()`, if using descriptions, the ID is now appended for (any) duplicated term descriptions (#87) - in `obtain_colored_url()`, swapped `bg_color` with `fg_color` due to an issue with `KEGGREST` - added legend to `term_gene_heatmap()` (#95) - in `get_biogrid_pin()`, the "download.file.method" from global options is used - `combined_results_graph()` raises an error if there are no common terms in the combined data frame # pathfindR 1.6.1 ## Major Changes - In `run_pathfindR()`, the default `iterations` was set back to 10 (the default for all other v1.x) # pathfindR 1.6.0 ## Major Changes - In `run_pathfindR()`, as "GR" (the default active subnetwork search method) provides nearly identical results in each iteration, the default `iterations` is set to 1 - added the column 'support' (the proportion of active subnetworks leading to enrichment over all subnetworks) in the output - updated the download URL in `get_biogrid_pin()` as BioGRID updated the URL for download ## Minor Changes and Bug Fixes - changed old argument in the "Step-by-Step Execution of the pathfindR Enrichment Workflow" vignette - fixed an issue in `visualize_term_interactions()` where the file name was too long, it was causing an error on Windows. 
Limited to 100 characters (#58) # pathfindR 1.5.1 ## Minor Changes and Bug Fixes - Fixed issue in `check_java_version()` where java version 14 could not be parsed (#49) - Fixed issue in `combined_results_graph()` where gene nodes were not colored correctly (#55) # pathfindR 1.5.0 ## Major Changes - created separate package `pathfindR.data` for storing pathfindR data - added the function `visualize_active_subnetworks()` for visualizing graphs of active subnetworks - add the new vignette "Comparing Two pathfindR Results" that briefly describes how different pathfindR results can be compared - added the functions `combine_pathfindR_results()` and `combined_results_graph()` for comparison of 2 pathfindR results and term-gene graph of the combined results, respectively - added the function `get_pin_file()` for obtaining organism-specific PIN data (only from BioGRID for now) - added the function `get_gene_sets_list()` for obtaining organism-specific gene sets list from KEGG, Reactome and MSigDB - added the function `term_gene_heatmap()` to create heatmap visualizations of enriched terms and the involved input genes. Rows are enriched terms and columns are involved input genes. 
If `genes_df` is provided, colors of the tiles indicate the change values - added the function `UpSet_plot()` to create UpSet plots of enriched terms - added the human cell markers gene sets data `cell_markers_gsets` and `cell_markers_descriptions` ## Minor Changes and Bug Fixes - fixed an issue regarding `parallel::makeCluster()` in `run_pathfindR()` (#45) - fixed save-related issue in `download_kegg_png()` (#37, @rix133) - added the output data `RA_comparison_output` of pathfindR results on another RA-related dataset (GSE84074) - in `visualize_hsa_KEGG()`, fixed the issue where >1 entrez ids were returned for a gene symbol (the first one is kept) - in `visualize_hsa_KEGG()`, implemented a tryCatch to avoid any issues when `KEGGREST::color.pathway.by.objects()` might fail (#28) - in `visualize_hsa_KEGG()`, now limiting the number of genes passed on to `KEGGREST::color.pathway.by.objects()` to < 60 (because the KEGG API now limits the number?) - changed default visualization in `term_gene_heatmap()` (i.e. when `genes_df` is not provided) to binary colored heatmap (by default, "green" and "red", controlled by `low` and `high`) by up-/down-regulation status - updated the vignette "pathfindR Analysis for non-Homo-sapiens organisms" to reflect new data generation functions `get_pin_file()` and `get_gene_sets_list()` and fixed a minor issue in the vignette (#46) # pathfindR 1.4.2 ## Minor Changes and Bug Fixes - Fixed corner case in `create_kappa_matrix()` when `chance` is 1, the metric is turned into 0 - Fixed misused `class(.) == *` in `cluster_graph_vis()` # pathfindR 1.4.1 ## Major Changes - Fixed error in DESCRIPTION: the Java version in SystemRequirements was corrected to "Java (>= 8.0)" - The Java version is now checked ## Minor Changes and Bug Fixes - Fixed behavior: when no input genes are present in the enriched hsa KEGG pathway, visualization of the pathway is now skipped - Added the argument `max_to_plot` to `visualize_hsa_KEGG()` and to `run_pathfindR()`.
This argument controls the number of pathways to be visualized (default is NULL, i.e. no filter). This was implemented so as not to slow down the runtime of `run_pathfindR()`, as downloading the png files is slow. - Fixed links to visualizations in `enriched_terms.Rmd` # pathfindR 1.4.0 ## Major Changes - Replaced most occurrences of "pathway" with "term". This was adopted because "term" reflects the utility of the package better. The enrichment and clustering approaches work with any kind of gene set data (be it pathway gene sets, gene ontology gene sets, motif gene sets etc.) Accordingly: - `DESCRIPTION` was updated - The functions `annotate_pathway_DEGs()`, `calculate_pw_scores()`, `cluster_pathways()`, `fuzzy_pw_clustering()`, `hierarchical_pw_clustering()`, `visualize_pw_interactions()` and `visualize_pws()` were renamed to `annotate_term_DEGs()`, `score_terms()`, `cluster_enriched_terms()`, `fuzzy_term_clustering()`, `hierarchical_term_clustering()`, `visualize_term_interactions()` and `visualize_terms()` respectively - The Rmd template file for the report `enriched_pathways.Rmd` was renamed to `enriched_terms.Rmd` - All the Rmd template files for the report were updated - Documentation of each function was updated accordingly - Added the visualization function `term_gene_graph()`, which creates a graph of enriched terms - involved genes - Made changes in `enrichment()` and `enrichment_analyses()` to get enrichment results faster - Added the function `fetch_gene_set()` for obtaining gene set data more easily - Terms in gene sets can now be filtered according to the number of genes a term contains (controlled by `min_gset_size`, `max_gset_size` in `fetch_gene_set()` and `run_pathfindR()`) - Added the argument `gaCrossover` during active subnetwork search which controls the probability of a crossover in GA (default = 1, i.e.
always perform crossover) - Added unit tests using `testthat` - Updated all gene sets data - Updated all RA example data - The vignettes were updated - Updated all PIN data - Improved speed of kappa matrix calculation (`create_kappa_matrix()`) - Added vignette for non-Homo-sapiens organisms - Added Mus musculus (mmu) data: - `mmu_kegg_genes` & `mmu_kegg_descriptions`: mmu KEGG gene sets data - mmu STRING PIN - `myeloma_input` & `myeloma_output`: example mmu input and output data - Added the STRING PIN (combined score >= 400) - The argument `sig_gene_thr` in subnetwork filtering via `filterActiveSnws()` now serves as the threshold proportion of significant genes in the active subnetwork. For example, if there are 100 significant genes and `sig_gene_thr = 0.03`, subnetworks that contain at least 3 (100 x 0.03) significant genes will be accepted for further analysis - Removed `pathview` dependency by implementing colored pathway diagram visualization function using `KEGGREST` and `KEGGgraph` ## Minor Changes and Bug Fixes - In `hierarchical_term_clustering()`, redefined the distance measure as `1 - kappa statistic` - Fixed minor issue in `cluster_graph_vis()` (during the calculations for additional node colors) - Removed title from graph visualization of hierarchical clustering in `cluster_graph_vis()` - In `active_snw_search()`, unnecessary warnings during active subnetwork search were removed - Fixed minor issue in `enrichment_chart()`, supplying fuzzy clustered results no longer raises an error - Added new checks in `input_testing()` and `input_processing()` to ensure that both the initial input data frame and the processed input data frame for active subnetwork search contain at least 2 genes (to fix the corner case encountered in issue #17) - Fixed minor issue in `enrichment_chart()`, ensuring that bubble sizes displayed in the legend (proportional to # of DEGs) are integers - In `enrichment_chart()`, added the arguments `num_bubbles` (default is 4) to control number of
bubbles displayed in the legend and `even_breaks` (default is `TRUE`) to indicate if even increments of breaks are required - Updated the logo - Minor fix in `term_gene_graph()` (create the igraph object as an undirected graph for better auto layout) - Minor fix in `visualize_term_interactions()`. The legend no longer displays "Non-input Active Snw. Genes" if they were not provided - The argument `human_genes` in `run_pathfindR()` and `input_processing()` was renamed as `convert2alias` - The gene symbols in the input data frame, the PIN and the gene sets are now turned into uppercase (for obtaining the best overlap) - Added the argument `top_terms` to `enrichment_chart()`, controlling the number of top enriched terms to plot (default is 10) - Other minor bug/error fixes # pathfindR 1.3.0 ## Major Changes - Separated the steps of the function `run_pathfindR` into individual functions: `active_snw_search`, `enrichment_analyses`, `summarize_enrichment_results`, `annotate_pathway_DEGs`, `visualize_pws`. - renamed the function `pathmap` as `visualize_hsa_KEGG`, updated the function to produce different visualizations for inputs with binary change values (ordered) and no change values (the `input_processing` function assigns a change value of 100 to all). - Created the new visualization function `visualize_pw_interactions`, which creates PNG files visualizing the interactions (in the selected PIN) of genes involved in the given pathways. - Added new vignette, describing the step-by-step execution of the pathfindR workflow - Changed clustering metric to kappa statistic, created the new clustering related functions `create_kappa_matrix`, `hierarchical_pw_clustering`, `fuzzy_pw_clustering` and `cluster_pathways`. - Implemented the new function `cluster_graph_vis` for visualizing graph diagrams of clustering results. ## Minor Changes and Bug Fixes - Fixed the bug where the arguments `score_quan_thr` and `sig_gene_thr` for `run_pathfindR` were not being utilized.
- in `run_pathfindR`, added message at the end of run, reporting the number of enriched pathways. - the function `run_pathfindR` now creates a variable `org_dir` that is the "path/to/original/working/directory". `org_dir` is used in multiple functions to return to the original working directory if anything fails. This changes the previous behavior where if a function stopped with an error the directory was changed to "..", i.e. the parent directory. This change was adopted so that the user is returned to the original working directory if they supply a recursive output folder (`output_dir`, e.g. "./ALL_RESULTS/RESULT_A"). - in `input_processing`, added the argument `human_genes` to only perform alias symbol conversion when human gene symbols are provided. - Updated the Rmd files used to create the report HTML files - Added the data for `GO-All`, all annotations in the GO database (BP+MF+CC) - Updated the vignette `pathfindR - An R Package for Pathway Enrichment Analysis Utilizing Active Subnetworks` to reflect the new functionalities. # pathfindR 1.2.3 ## Minor Changes and Bug Fixes - in the function `plot_scores`, added the argument `label_cases` to indicate whether or not to label the cases in the pathway scoring heatmap plot. Also added the argument `case_control_titles` which allows the user to change the default "Case" and "Control" headers. Also added the arguments `low` and `high` used to change the low and high end colors of the scoring color gradient. - in the function `plot_scores`, reversed the color gradient to match the coloring scheme used by pathview (i.e. red for positive values, green for negative values) - minor change in `parseActiveSnwSearch`, replaced `score_thr` by `score_quan_thr`. This was done so that the scoring filter for active subnetworks could be performed based on the distribution of the current active subnetworks and not using a constant empirical score value threshold.
- minor change in `parseActiveSnwSearch`, increased `sig_gene_thr` from 2 to 10 as we observed that, in most cases, this resulted in faster runs with comparable results. - in `choose_clusters`, added the argument `p_val_threshold` to be used as p value threshold for filtering the enriched pathways prior to clustering. # pathfindR 1.2.2 ## Major Changes - fixed issue related to the package `pathview`. ## Minor Changes and Bug Fixes - in the function `choose_clusters`, added option to use pathway names instead of pathway ids when visualizing the clustering dendrogram and heatmap. # pathfindR 1.2.1 ## Major Changes - Added the option to specify a custom gene set when using `run_pathfindR`. For this, the `gene_sets` argument should be set to "Custom" and `custom_genes` and `custom_pathways` should be provided. ## Minor Changes and Bug Fixes - fixed minor bug in `calculate_pw_scores` where if there was one DEG, subsetting the experiment matrix failed - added if condition to check if there were DEGs in `calculate_pw_scores`. If there are none, the pathway is skipped. - in `calculate_pw_scores`, if `cases` are provided, the pathways are reordered before plotting the heat map and returning the matrix according to their activity in `cases`. This way, "up" pathways are grouped together, same for "down" pathways. - in `calculate_pwd`, if a pathway has perfect overlap with other pathways, the correlation value is set to 1 instead of NA. - in `choose_clusters`, if `result_df` has fewer than 3 pathways, do not perform clustering. - `run_pathfindR` checks whether the output directory (`output_dir`) already exists and if it exists, now appends "(1)" to `output_dir` and displays a warning message. This was implemented to prevent writing over existing results. - in `run_pathfindR`, recursive creation for the output directory (`output_dir`) is now supported. - in `run_pathfindR`, if no pathways are found, the function returns an empty data frame instead of raising an error.
# pathfindR 1.2 ## Major Changes - Implemented the (per subject) pathway scoring function `calculate_pw_scores` and the function to plot the heatmap of pathway scores per subject `plot_scores`. - Added the `auto` parameter to `choose_clusters`. When `auto == TRUE` (default), the function chooses the optimal number of clusters `k` automatically, as the value which maximizes the average silhouette width. It then returns a data frame with the cluster assignments and the representative/member statuses of each pathway. - Added the `Fold_Enrichment` column to the resulting data frame of `enrichment`, and as a corollary to the resulting data frame of `run_pathfindR`. - Added the option `bubble` to plot a bubble chart displaying the enrichment results in `run_pathfindR` using the helper function `enrichment_chart`. To plot the bubble chart set `bubble = TRUE` in `run_pathfindR` or use `enrichment_chart(your_result_df)`. ## Minor Changes and Bug Fixes - Add the parameter `silent_option` to `run_pathfindR`. When `silent_option == TRUE` (default), the console outputs during active subnetwork search are printed to a file named "console_out.txt". If `silent_option == FALSE`, the output is printed on the screen. Default was set to `TRUE` because multiple console outputs are simultaneously printed when running in parallel. - Added the `list_active_snw_genes` parameter to `run_pathfindR`. When `list_active_snw_genes == TRUE`, the function adds the column `non_DEG_Active_Snw_Genes`, which reports the non-DEG active subnetwork genes for the active subnetwork which was enriched for the given pathway with the lowest p value. - Added the data `RA_clustered`, which is the example output of the clustering workflow. - In the function, `run_pathfindR` added the option to specify the argument `output_dir` which specifies the directory to be created under the current working directory for storing the result HTML files. `output_dir` is "pathfindR_Results" by default. 
- `run_pathfindR` now checks whether the output directory (`output_dir`) already exists and if it exists, stops and displays an error message. This was implemented to prevent writing over existing results. - `genes_table.html` now contains a second table displaying the input gene symbols for which there were no interactions in the PIN. # pathfindR 1.1 ## Major Changes - Added the `gene_sets` option in `run_pathfindR` to choose between different gene sets. Available gene sets are `KEGG`, `Reactome`, `BioCarta` and Gene Ontology gene sets (`GO-BP`, `GO-CC` and `GO-MF`) - `cluster_pathways` automatically recognizes the ID type and chooses the gene sets accordingly ## Minor Changes and Bug Fixes - Fixed issue regarding p values < 1e-13. No active subnetworks were found when there were p values < 1e-13. These are now changed to 1e-13 in the function `input_processing` - In `input_processing`, genes for which no interactions are found in the PIN are now removed before active subnetwork search - Duplicated gene symbols no longer raise an error. If there are duplicated symbols, the lowest p value is chosen for each gene symbol in the function `input_processing` - To prevent the formation of nested folders, by default and on errors, the function `run_pathfindR` returns to the user's working directory. - Citation information is now provided for our BioRxiv pre-print ================================================ FILE: R/active_snw_search.R ================================================ #' Perform Active Subnetwork Search #' #' @param input_for_search the input data that active subnetwork search uses. The input #' must be a data frame containing at least these 2 columns: \describe{ #' \item{GENE}{Gene Symbol} #' \item{P_VALUE}{p value obtained through a test, e.g.
differential expression/methylation} #' } #' @inheritParams return_pin_path #' @param snws_file name for active subnetwork search output data #' \strong{without file extension} (default = 'active_snws') #' @param dir_for_parallel_run (previously created) directory for a parallel run iteration. #' Used in the wrapper function (see ?run_pathfindR) (Default = NULL) #' @inheritParams filterActiveSnws #' @param search_method algorithm to use when performing active subnetwork #' search. Options are greedy search (GR), simulated annealing (SA) or genetic #' algorithm (GA) for the search (default = 'GR'). #' @param seedForRandom seed for reproducibility while running the java modules (applies for GR and SA) #' @param silent_option boolean value indicating whether to print the messages #' to the console (FALSE) or not (TRUE, this will print to a temp. file) during #' active subnetwork search (default = TRUE). This option was added because #' during parallel runs, the console messages get disorderly printed. #' @param use_all_positives if TRUE: in GA, adds an individual with all positive #' nodes. In SA, initializes candidate solution with all positive nodes. (default = FALSE) #' @param geneInitProbs For SA and GA, probability of adding a gene in initial solution (default = 0.1) #' @param saTemp0 Initial temperature for SA (default = 1.0) #' @param saTemp1 Final temperature for SA (default = 0.01) #' @param saIter Iteration number for SA (default = 10000) #' @param gaPop Population size for GA (default = 400) #' @param gaIter Iteration number for GA (default = 200) #' @param gaThread Number of threads to be used in GA (default = 5) #' @param gaCrossover Applies crossover with the given probability in GA (default = 1, i.e. always perform crossover) #' @param gaMut For GA, applies mutation with given mutation rate (default = 0, i.e. 
mutation off) #' @param grMaxDepth Sets max depth in greedy search, 0 for no limit (default = 1) #' @param grSearchDepth Search depth in greedy search (default = 1) #' @param grOverlap Overlap threshold for results of greedy search (default = 0.5) #' @param grSubNum Number of subnetworks to be presented in the results (default = 1000) #' #' @return A list of genes in every identified active subnetwork that has a score greater than #' the `score_quan_thr`th quantile and that has at least `sig_gene_thr` affected genes. #' #' @export #' #' @examples #' \donttest{ #' processed_df <- example_pathfindR_input[1:15, -2] #' colnames(processed_df) <- c('GENE', 'P_VALUE') #' GR_snws <- active_snw_search( #' input_for_search = processed_df, #' pin_name_path = 'KEGG', #' search_method = 'GR', #' score_quan_thr = 0.8 #' ) #' # clean-up #' unlink('active_snw_search', recursive = TRUE) #' } active_snw_search <- function(input_for_search, pin_name_path = "Biogrid", snws_file = "active_snws", dir_for_parallel_run = NULL, score_quan_thr = 0.8, sig_gene_thr = 0.02, search_method = "GR", seedForRandom = 1234, silent_option = TRUE, use_all_positives = FALSE, geneInitProbs = 0.1, saTemp0 = 1, saTemp1 = 0.01, saIter = 10000, gaPop = 400, gaIter = 10000, gaThread = 5, gaCrossover = 1, gaMut = 0, grMaxDepth = 1, grSearchDepth = 1, grOverlap = 0.5, grSubNum = 1000) { ############ Argument checks input_for_search if (!is.data.frame(input_for_search)) { stop("`input_for_search` should be data frame") } cnames <- c("GENE", "P_VALUE") if (any(!cnames %in% colnames(input_for_search))) { stop("`input_for_search` should contain the columns ", paste(dQuote(cnames), collapse = ",")) } # pin_name_path (fetch pin path) pin_path <- return_pin_path(pin_name_path) # snws_file if (!suppressWarnings(file.create(file.path(tempdir(check = TRUE), snws_file)))) { stop("`snws_file` may be containing forbidden characters. 
Please change and try again") } # search_method valid_mets <- c("GR", "SA", "GA") if (!search_method %in% valid_mets) { stop("`search_method` should be one of ", paste(dQuote(valid_mets), collapse = ", ")) } # silent_option if (!is.logical(silent_option)) { stop("`silent_option` should be either TRUE or FALSE") } # use_all_positives if (!is.logical(use_all_positives)) { stop("`use_all_positives` should be either TRUE or FALSE") } ############ Initial Steps If dir_for_parallel_run is provided, change ############ working dir to dir_for_parallel_run if (!is.null(dir_for_parallel_run)) { org_dir <- getwd() on.exit(setwd(org_dir)) setwd(dir_for_parallel_run) } ## turn silent_option into shell argument tmp_out <- file.path(tempdir(check = TRUE), paste0("console_out_", snws_file, ".txt")) silent_option <- ifelse(silent_option, paste0(" > ", tmp_out), "") ## turn use_all_positives into the java argument use_all_positives <- ifelse(use_all_positives, " -useAllPositives", "") ## absolute path for active snw search jar active_search_jar_path <- system.file("java/ActiveSubnetworkSearch.jar", package = "pathfindR") ## create directory for active subnetworks if (!dir.exists("active_snw_search")) { dir.create("active_snw_search") } if (!file.exists("active_snw_search/input_for_search.txt")) { input_for_search$GENE <- base::toupper(input_for_search$GENE) utils::write.table(input_for_search[, c("GENE", "P_VALUE")], "active_snw_search/input_for_search.txt", col.names = FALSE, row.names = FALSE, quote = FALSE, sep = "\t") } input_path <- normalizePath("active_snw_search/input_for_search.txt") ############ Run active Subnetwork Search running Active Subnetwork Search system(paste0("java -Xss4m -jar \"", active_search_jar_path, "\"", " -sif=\"", pin_path, "\"", " -sig=\"", input_path, "\"", " -method=", search_method, " -seedForRandom=", seedForRandom, use_all_positives, " -saTemp0=", saTemp0, " -saTemp1=", saTemp1, " -saIter=", format(saIter, scientific = FALSE), " -geneInitProb=", 
geneInitProbs, " -gaPop=", gaPop, " -gaIter=", gaIter, " -gaThread=", gaThread, " -gaCrossover=", gaCrossover, " -gaMut=", gaMut, " -grMaxDepth=", grMaxDepth, " -grSearchDepth=", grSearchDepth, " -grOverlap=", grOverlap, " -grSubNum=", grSubNum, silent_option)) snws_file <- file.path("active_snw_search", paste0(snws_file, ".txt")) file.rename(from = "resultActiveSubnetworkSearch.txt", to = snws_file) ############ Parse and filter active subnetworks filtered_snws <- filterActiveSnws(active_snw_path = snws_file, sig_genes_vec = input_for_search$GENE, score_quan_thr = score_quan_thr, sig_gene_thr = sig_gene_thr) if (is.null(filtered_snws)) { snws <- list() } else { snws <- filtered_snws$subnetworks } message(paste0("Found ", length(snws), " active subnetworks\n\n")) return(snws) } #' Parse Active Subnetwork Search Output File and Filter the Subnetworks #' #' @param active_snw_path path to the output of an Active Subnetwork Search #' @param sig_genes_vec vector of significant gene symbols. In the scope of this #' package, these are the input genes that were used for active subnetwork search #' @param score_quan_thr active subnetwork score quantile threshold. Must be #' between 0 and 1 or set to -1 for not filtering. (Default = 0.8) #' @param sig_gene_thr threshold for the minimum proportion of significant genes in #' the subnetwork (Default = 0.02) If the number of genes to use as threshold is #' calculated to be < 2 (e.g. 50 signif. 
genes x 0.01 = 0.5), the threshold number #' is set to 2 #' #' @return A list containing \code{subnetworks}: a list of genes in every #' active subnetwork that has a score greater than the \code{score_quan_thr}th #' quantile and that contains at least \code{sig_gene_thr} of significant genes #' and \code{scores}: the score of each filtered active subnetwork #' @export #' #' @seealso See \code{\link{run_pathfindR}} for the wrapper function of the #' pathfindR enrichment workflow #' #' @examples #' path2snw_list <- system.file( #' 'extdata/resultActiveSubnetworkSearch.txt', #' package = 'pathfindR' #' ) #' filtered <- filterActiveSnws( #' active_snw_path = path2snw_list, #' sig_genes_vec = example_pathfindR_input$Gene.symbol #' ) filterActiveSnws <- function(active_snw_path, sig_genes_vec, score_quan_thr = 0.8, sig_gene_thr = 0.02) { ## Arg. checks active_snw_path <- suppressWarnings(normalizePath(active_snw_path)) if (!file.exists(active_snw_path)) { stop("The active subnetwork file does not exist!
Check the `active_snw_path` argument") } if (!is.atomic(sig_genes_vec)) { stop("`sig_genes_vec` should be a vector") } if (!is.numeric(score_quan_thr)) { stop("`score_quan_thr` should be numeric") } if (score_quan_thr != -1 & (score_quan_thr > 1 | score_quan_thr < 0)) { stop("`score_quan_thr` should be in [0, 1] or -1 (if not filtering)") } if (!is.numeric(sig_gene_thr)) { stop("`sig_gene_thr` should be numeric") } if (sig_gene_thr < 0 | sig_gene_thr > 1) { stop("`sig_gene_thr` should be in [0, 1]") } output <- readLines(active_snw_path) if (length(output) == 0) { return(NULL) } score_vec <- c() subnetworks <- list() for (i in base::seq_len(length(output))) { snw <- output[[i]] snw <- unlist(strsplit(snw, "\\s")) score_vec <- c(score_vec, as.numeric(snw[1])) subnetworks[[i]] <- snw[-1] } # keep subnetworks with score over the 'score_quan_thr'th quantile if (score_quan_thr == -1) { score_thr <- min(score_vec) - 1 } else { score_thr <- stats::quantile(score_vec, score_quan_thr) } cond <- as.numeric(score_vec) > as.numeric(score_thr) subnetworks <- subnetworks[cond] score_vec <- as.numeric(score_vec)[cond] # select subnetworks containing at least 'sig_gene_thr' of significant # genes snw_sig_counts <- vapply(subnetworks, function(snw_genes) { sum(base::toupper(snw_genes) %in% base::toupper(sig_genes_vec)) }, 1) sig_gene_num_thr <- sig_gene_thr * length(sig_genes_vec) sig_gene_num_thr <- max(2, sig_gene_num_thr) cond <- (snw_sig_counts >= sig_gene_num_thr) subnetworks <- subnetworks[cond] score_vec <- score_vec[cond] return(list(subnetworks = subnetworks, scores = score_vec)) } #' Visualize Active Subnetworks #' #' @inheritParams filterActiveSnws #' @inheritParams term_gene_heatmap #' @inheritParams return_pin_path #' @param num_snws number of top subnetworks to be visualized (leave blank if #' you want to visualize all subnetworks) #' @inheritParams term_gene_graph #' @param ... 
additional arguments for \code{\link{input_processing}} #' #' @return a list of ggplot objects of graph visualizations of identified active #' subnetworks. Green nodes are down-regulated genes, reds are up-regulated genes #' and yellows are non-input genes #' @export #' #' @examples #' path2snw_list <- system.file( #' 'extdata/resultActiveSubnetworkSearch.txt', #' package = 'pathfindR' #' ) #' # visualize top 2 active subnetworks #' g_list <- visualize_active_subnetworks( #' active_snw_path = path2snw_list, #' genes_df = example_pathfindR_input[1:10, ], #' pin_name_path = 'KEGG', #' num_snws = 2 #' ) visualize_active_subnetworks <- function(active_snw_path, genes_df, pin_name_path = "Biogrid", num_snws, layout = "stress", score_quan_thr = 0.8, sig_gene_thr = 0.02, ...) { # process input data frame processed_input <- input_processing(genes_df, pin_name_path = pin_name_path, ...) # parse and filter active subnetworks active_snw_list <- filterActiveSnws(active_snw_path = active_snw_path, sig_genes_vec = processed_input$GENE, score_quan_thr = score_quan_thr, sig_gene_thr = sig_gene_thr) if (is.null(active_snw_list) | length(active_snw_list$scores) == 0) { return(NULL) } score_vec <- active_snw_list$scores subnetworks <- active_snw_list$subnetworks if (missing(num_snws)) { num_snws <- length(subnetworks) } if (num_snws > length(subnetworks)) { num_snws <- length(subnetworks) } # load PIN data load PIN pin_path <- return_pin_path(pin_name_path) pin <- utils::read.delim(file = pin_path, header = FALSE) pin$V2 <- NULL pin[, 1] <- base::toupper(pin[, 1]) pin[, 2] <- base::toupper(pin[, 2]) # create graphs graphs_list <- list() for (idx in seq_len(num_snws)) { snw <- subnetworks[[idx]] num_input_genes <- sum(processed_input$GENE %in% snw) perc_input_genes <- round(num_input_genes/length(processed_input$GENE) * 100, 2) snw_interactions <- pin[pin[, 1] %in% snw & pin[, 2] %in% snw, ] g <- igraph::graph_from_data_frame(snw_interactions, directed = FALSE) cond_up_gene <- 
names(igraph::V(g)) %in% processed_input$GENE[processed_input$CHANGE > 0] cond_down_gene <- names(igraph::V(g)) %in% processed_input$GENE[processed_input$CHANGE < 0] igraph::V(g)$type <- ifelse(cond_up_gene, "up", ifelse(cond_down_gene, "down", "non-input")) igraph::V(g)$label.cex <- 0.5 igraph::V(g)$frame.color <- "gray" igraph::V(g)$color <- ifelse(igraph::V(g)$type == "non-input", "#FFD500", ifelse(igraph::V(g)$type == "up", "#D2222D", "#35CD35")) color_lookup <- c(`#35CD35` = "down-regulated gene", `#D2222D` = "up-regulated gene", `#FFD500` = "non-input gene") p <- ggraph::ggraph(g, layout = layout) p <- p + ggraph::geom_edge_link(alpha = 0.8, colour = "darkgrey") p <- p + ggraph::geom_node_point(ggplot2::aes(color = .data$color), size = 2) p <- p + ggplot2::theme_void() p <- p + ggraph::geom_node_text(ggplot2::aes(label = .data$name), nudge_y = 0.2) p <- p + ggplot2::scale_colour_manual(values = unique(igraph::V(g)$color), name = NULL, labels = color_lookup[unique(igraph::V(g)$color)]) p <- p + ggplot2::labs(title = paste0("Active Subnetwork #", idx), subtitle = paste0("Score=", round(score_vec[idx], 2), ", ", num_input_genes, "(", perc_input_genes, "%) input genes")) p <- p + ggplot2::theme(plot.title = ggplot2::element_text(hjust = 0.5), plot.subtitle = ggplot2::element_text(hjust = 0.5), legend.position = "bottom") graphs_list[[idx]] <- p } return(graphs_list) } ================================================ FILE: R/clustering.R ================================================ #' Create Kappa Statistics Matrix #' #' @param enrichment_res data frame of pathfindR enrichment results. Must-have #' columns are 'Term_Description' (if \code{use_description = TRUE}) or 'ID' #' (if \code{use_description = FALSE}), 'Down_regulated', and 'Up_regulated'. #' If \code{use_active_snw_genes = TRUE}, 'non_Signif_Snw_Genes' must also be #' provided. 
#' @param use_description Boolean argument to indicate whether term descriptions #' (in the 'Term_Description' column) should be used. (default = \code{FALSE}) #' @param use_active_snw_genes boolean to indicate whether or not to use #' non-input active subnetwork genes in the calculation of kappa statistics #' (default = FALSE, i.e. only use affected genes) #' #' @return a matrix of kappa statistics between each term in the #' enrichment results. #' #' @export #' #' @examples #' sub_df <- example_pathfindR_output[1:3, ] #' create_kappa_matrix(sub_df) create_kappa_matrix <- function(enrichment_res, use_description = FALSE, use_active_snw_genes = FALSE) { ### Argument checks if (!is.logical(use_description)) { stop("`use_description` should be TRUE or FALSE") } if (!is.logical(use_active_snw_genes)) { stop("`use_active_snw_genes` should be TRUE or FALSE") } if (!is.data.frame(enrichment_res)) { stop("`enrichment_res` should be a data frame of enrichment results") } if (nrow(enrichment_res) < 2) { stop("`enrichment_res` should contain at least 2 rows") } nec_cols <- c("Down_regulated", "Up_regulated") if (use_description) { nec_cols <- c("Term_Description", nec_cols) } else { nec_cols <- c("ID", nec_cols) } if (use_active_snw_genes) { nec_cols <- c(nec_cols, "non_Signif_Snw_Genes") } if (!all(nec_cols %in% colnames(enrichment_res))) { stop("`enrichment_res` should contain all of ", paste(dQuote(nec_cols), collapse = ", ")) } ### Initial steps Column to use for gene set names chosen_id <- ifelse(use_description, which(colnames(enrichment_res) == "Term_Description"), which(colnames(enrichment_res) == "ID")) # list of genes down_idx <- which(colnames(enrichment_res) == "Down_regulated") up_idx <- which(colnames(enrichment_res) == "Up_regulated") genes_lists <- apply(enrichment_res, 1, function(x) { base::toupper(c(unlist(strsplit(as.character(x[up_idx]), ", ")), unlist(strsplit(as.character(x[down_idx]), ", ")))) }) if (use_active_snw_genes) { active_idx <- 
which(colnames(enrichment_res) == "non_Signif_Snw_Genes") genes_lists <- mapply(function(x, y) { c(x, unlist(strsplit(as.character(y), ", "))) }, genes_lists, enrichment_res[, active_idx]) } # Exclude zero-length gene sets excluded_idx <- which(vapply(genes_lists, length, 1) == 0) if (length(excluded_idx) != 0) { genes_lists <- genes_lists[-excluded_idx] enrichment_res <- enrichment_res[-excluded_idx, ] } ### Create Kappa Matrix all_genes <- unique(unlist(genes_lists, use.names = FALSE)) N <- nrow(enrichment_res) term_names <- enrichment_res[, chosen_id] kappa_mat <- matrix(0, nrow = N, ncol = N, dimnames = list(term_names, term_names)) diag(kappa_mat) <- 1 total <- length(all_genes) for (i in 1:(N - 1)) { for (j in (i + 1):N) { genes_i <- genes_lists[[i]] genes_j <- genes_lists[[j]] both <- length(intersect(genes_i, genes_j)) term_i <- length(base::setdiff(genes_i, genes_j)) term_j <- length(base::setdiff(genes_j, genes_i)) no_terms <- total - sum(both, term_i, term_j) observed <- (both + no_terms)/total chance <- (both + term_i) * (both + term_j) chance <- chance + (term_j + no_terms) * (term_i + no_terms) chance <- chance/total^2 kappa_mat[j, i] <- kappa_mat[i, j] <- (observed - chance)/(1 - chance) } } kappa_mat[is.na(kappa_mat)] <- 0 return(kappa_mat) } #' Hierarchical Clustering of Enriched Terms #' #' @param kappa_mat matrix of kappa statistics (output of \code{\link{create_kappa_matrix}}) #' @inheritParams create_kappa_matrix #' @param num_clusters number of clusters to be formed (default = \code{NULL}). #' If \code{NULL}, the optimal number of clusters is determined as the number #' which yields the highest average silhouette width. 
#' @param clu_method the agglomeration method to be used
#' (default = 'average', see \code{\link[stats]{hclust}})
#' @param plot_hmap boolean to indicate whether to plot the kappa statistics
#' clustering heatmap or not (default = FALSE)
#' @param plot_dend boolean to indicate whether to plot the clustering
#' dendrogram partitioned into the optimal number of clusters (default = TRUE)
#'
#' @details The function initially performs hierarchical clustering
#' of the enriched terms in \code{enrichment_res} using the kappa statistics
#' (defining the distance as \code{1 - kappa_statistic}). Next, if
#' \code{num_clusters} is not specified, the clustering dendrogram is cut at a
#' sequence of candidate k values (from k = 2 up to about half the number of
#' terms). The optimal number of clusters is determined as the k value which
#' yields the highest average silhouette width.
#'
#' @return a vector of clusters for each enriched term in the enrichment results.
#' @export
#'
#' @examples
#' \dontrun{
#' hierarchical_term_clustering(kappa_mat, enrichment_res)
#' hierarchical_term_clustering(kappa_mat, enrichment_res, clu_method = 'complete')
#' }
hierarchical_term_clustering <- function(kappa_mat, enrichment_res, num_clusters = NULL,
    use_description = FALSE, clu_method = "average", plot_hmap = FALSE, plot_dend = TRUE) {
    ### Set ID/Name index
    chosen_id <- ifelse(use_description, which(colnames(enrichment_res) == "Term_Description"),
        which(colnames(enrichment_res) == "ID"))

    ### Argument checks
    if (!isSymmetric.matrix(kappa_mat)) {
        stop("`kappa_mat` should be a symmetric matrix")
    }

    if (!all(colnames(kappa_mat) %in% enrichment_res[, chosen_id])) {
        stop("All terms in `kappa_mat` should be present in `enrichment_res`")
    }

    if (!is.logical(plot_hmap)) {
        stop("`plot_hmap` should be TRUE or FALSE")
    }

    if (!is.logical(plot_dend)) {
        stop("`plot_dend` should be TRUE or FALSE")
    }

    ### Add excluded terms (those with zero-length gene sets)
    kappa_mat2 <- kappa_mat
    cond <- !enrichment_res[, chosen_id] %in% rownames(kappa_mat2)
    outliers
<- enrichment_res[cond, chosen_id] outliers_mat <- matrix(-1, nrow = nrow(kappa_mat2), ncol = length(outliers), dimnames = list(rownames(kappa_mat2), outliers)) kappa_mat2 <- cbind(kappa_mat2, outliers_mat) outliers_mat <- matrix(-1, nrow = length(outliers), ncol = ncol(kappa_mat2), dimnames = list(outliers, colnames(kappa_mat2))) kappa_mat2 <- rbind(kappa_mat2, outliers_mat) ### Perform hierarchical clustering clu <- stats::hclust(stats::as.dist(1 - kappa_mat2), method = clu_method) if (plot_hmap) { stats::heatmap(kappa_mat2, distfun = function(x) stats::as.dist(1 - x), hclustfun = function(x) stats::hclust(x, method = clu_method)) } ### Choose optimal k (if not specified) if (is.null(num_clusters)) { kmax <- max(nrow(kappa_mat2)%/%2, 2) # sequence of k (number of clusters) to try if (kmax <= 20) { kseq <- 2:kmax } else if (kmax <= 100) { kseq <- c(2:19, seq(20, kmax%/%10 * 10, 10)) } else { kseq <- c(2:19, seq(20, 99, 10), seq(100, kmax%/%50 * 50, 50)) } # calculate average silhouette width per k in sequence avg_sils <- c() for (k in kseq) { avg_sils <- c(avg_sils, fpc::cluster.stats(stats::as.dist(1 - kappa_mat2), stats::cutree(clu, k = k), silhouette = TRUE)$avg.silwidth) } k_opt <- kseq[which.max(avg_sils)] message(paste("The maximum average silhouette width was", round(max(avg_sils), 2), "for k =", k_opt, "\n\n")) } else { k_opt <- num_clusters } if (plot_dend) { graphics::plot(clu) stats::rect.hclust(clu, k = k_opt) } clusters <- stats::cutree(clu, k = k_opt) return(clusters) } #' Heuristic Fuzzy Multiple-linkage Partitioning of Enriched Terms #' #' @inheritParams hierarchical_term_clustering #' @inheritParams create_kappa_matrix #' @param kappa_threshold threshold for kappa statistics, defining strong #' relation (default = 0.35) #' #' @details The fuzzy clustering algorithm was implemented based on: #' Huang DW, Sherman BT, Tan Q, et al. 
The DAVID Gene Functional #' Classification Tool: a novel biological module-centric algorithm to #' functionally analyze large gene lists. Genome Biol. 2007;8(9):R183. #' #' @return a boolean matrix of cluster assignments. Each row corresponds to an #' enriched term, each column corresponds to a cluster. #' @export #' #' @examples #' \dontrun{ #' fuzzy_term_clustering(kappa_mat, enrichment_res) #' fuzzy_term_clustering(kappa_mat, enrichment_res, kappa_threshold = 0.45) #' } fuzzy_term_clustering <- function(kappa_mat, enrichment_res, kappa_threshold = 0.35, use_description = FALSE) { ### Set ID/Name index chosen_id <- ifelse(use_description, which(colnames(enrichment_res) == "Term_Description"), which(colnames(enrichment_res) == "ID")) ### Argument checks if (!isSymmetric.matrix(kappa_mat)) { stop("`kappa_mat` should be a symmetric matrix") } if (!all(colnames(kappa_mat) %in% enrichment_res[, chosen_id])) { stop("All terms in `kappa_mat` should be present in `enrichment_res`") } if (!is.numeric(kappa_threshold)) { stop("`kappa_threshold` should be numeric") } if (kappa_threshold > 1) { stop("`kappa_threshold` should be at most 1 as kappa statistic is always <= 1") } ### Find Qualified Seeds qualified_seeds <- list() j <- 1 for (i in base::seq_len(nrow(kappa_mat))) { current_term <- rownames(kappa_mat)[i] current_term_kappa <- kappa_mat[i, ] init_membership_cond <- current_term_kappa >= kappa_threshold if (sum(init_membership_cond) > 3) { related_terms <- names(current_term_kappa)[init_membership_cond] terms <- c(current_term, related_terms) related_kappa <- kappa_mat[rownames(kappa_mat) %in% terms, colnames(kappa_mat) %in% terms] diag(related_kappa) <- 0 tight_relationship_cond <- sum(related_kappa >= kappa_threshold)/(nrow(related_kappa)^2) >= 0.5 if (tight_relationship_cond) { qualified_seeds[[j]] <- related_terms names(qualified_seeds)[j] <- current_term j <- j + 1 } } } ### Fuzzy Clustering clusters <- unique(qualified_seeds) i <- 1 j <- i + 1 while (i < 
length(clusters)) {
        common_terms <- intersect(clusters[[i]], clusters[[j]])
        all_terms <- union(clusters[[i]], clusters[[j]])

        # merge two clusters if they share more than half of their terms
        if (length(common_terms)/length(all_terms) > 0.5 && i != j) {
            clusters[[i]] <- all_terms
            clusters[[j]] <- NULL
            i <- 1
            j <- i + 1
        } else if (j < length(clusters)) {
            j <- j + 1
        } else {
            i <- i + 1
            j <- 1
        }
    }

    ### Find Outliers
    cond <- !enrichment_res[, chosen_id] %in% c(names(clusters), unlist(clusters))
    outliers <- enrichment_res[cond, chosen_id]
    for (outlier in outliers) {
        clusters[[outlier]] <- outlier
    }

    ### Return Cluster Matrix
    names(clusters) <- base::seq_len(length(clusters))
    cluster_mat <- matrix(FALSE, nrow = nrow(enrichment_res), ncol = length(clusters),
        dimnames = list(enrichment_res[, chosen_id], names(clusters)))
    for (clu in names(clusters)) {
        clu_terms <- clusters[[clu]]
        cluster_mat[clu_terms, clu] <- TRUE
    }
    return(cluster_mat)
}

#' Graph Visualization of Clustered Enriched Terms
#'
#' @param clu_obj clustering result (either a matrix obtained via
#' \code{\link{fuzzy_term_clustering}} or a vector obtained via
#' \code{\link{hierarchical_term_clustering}})
#' @inheritParams fuzzy_term_clustering
#' @param vertex.label.cex font size for vertex labels; it is interpreted as a
#' multiplication factor of some device-dependent base font size (default = 0.7)
#' @param vertex.size.scaling scaling factor for the node size (default = 2.5)
#'
#' @return Plots a graph diagram of clustering results. Each node is an enriched term
#' from `enrichment_res`. Size of each node corresponds to -log10(lowest_p)
#' (multiplied by \code{vertex.size.scaling}). Thickness of the edge between two
#' nodes corresponds to the kappa statistic between the two terms. Color of each
#' node corresponds to its cluster. For fuzzy clustering, if a term is in
#' multiple clusters, multiple colors are utilized.
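#'
#' @details As a minimal sketch (not run, and only one possible workflow), the
#' inputs of this function can be prepared from the bundled
#' \code{example_pathfindR_output} data as shown below. Using the first 10 rows
#' is an arbitrary choice made for illustration:
#' \preformatted{
#' # use a small subset of the example enrichment results
#' enrichment_res <- example_pathfindR_output[1:10, ]
#' # pairwise kappa statistics between the enriched terms
#' kappa_mat <- create_kappa_matrix(enrichment_res)
#' # integer vector of cluster assignments (hierarchical clustering)
#' clu_obj <- hierarchical_term_clustering(kappa_mat, enrichment_res,
#'                                         plot_dend = FALSE)
#' # graph of the clustered terms
#' cluster_graph_vis(clu_obj, kappa_mat, enrichment_res)
#' }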
#' #' @export #' #' @examples #' \dontrun{ #' cluster_graph_vis(clu_obj, kappa_mat, enrichment_res) #' } cluster_graph_vis <- function(clu_obj, kappa_mat, enrichment_res, kappa_threshold = 0.35, use_description = FALSE, vertex.label.cex = 0.7, vertex.size.scaling = 2.5) { ### Set ID/Name index chosen_id <- ifelse(use_description, which(colnames(enrichment_res) == "Term_Description"), which(colnames(enrichment_res) == "ID")) ### For coloring nodes all_cols <- c("#E41A1C", "#377EB8", "#4DAF4A", "#984EA3", "#FF7F00", "#FFFF33", "#A65628", "#F781BF", "#999999", "#66C2A5", "#FC8D62", "#8DA0CB", "#E78AC3", "#A6D854", "#FFD92F", "#E5C494", "#B3B3B3", "#8DD3C7", "#FFFFB3", "#BEBADA", "#FB8072", "#80B1D3", "#FDB462", "#B3DE69", "#FCCDE5", "#D9D9D9", "#BC80BD", "#CCEBC5", "#FFED6F", "#A6CEE3", "#1F78B4", "#B2DF8A", "#33A02C", "#FB9A99", "#E31A1C", "#FDBF6F", "#FF7F00", "#CAB2D6", "#6A3D9A", "#FFFF99", "#B15928") if (is.matrix(clu_obj)) { ### Argument checks if (!all(rownames(clu_obj) %in% colnames(kappa_mat))) { stop("Not all terms in `clu_obj` present in `kappa_mat`!") } ### Prep data Remove weak links kappa_mat2 <- kappa_mat diag(kappa_mat2) <- 0 kappa_mat2 <- ifelse(kappa_mat2 < kappa_threshold, 0, kappa_mat2) # Add missing terms missing <- rownames(clu_obj)[!rownames(clu_obj) %in% colnames(kappa_mat2)] missing_mat <- matrix(0, nrow = nrow(kappa_mat2), ncol = length(missing), dimnames = list(rownames(kappa_mat2), missing)) kappa_mat2 <- cbind(kappa_mat2, missing_mat) missing <- rownames(clu_obj)[!rownames(clu_obj) %in% rownames(kappa_mat2)] missing_mat <- matrix(0, nrow = length(missing), ncol = ncol(kappa_mat2), dimnames = list(missing, colnames(kappa_mat2))) kappa_mat2 <- rbind(kappa_mat2, missing_mat) ### Create Graph, Set Color, Size and Percentages values <- apply(clu_obj, 1, function(x) which(x)) percs <- list() for (i in base::seq_len(length(values))) { percs[[i]] <- rep(1/length(values[[i]]), length(values[[i]])) } g <- 
igraph::graph_from_adjacency_matrix(kappa_mat2, weighted = TRUE) if (length(all_cols) < max(as.integer(colnames(clu_obj)))) { num_extra <- max(as.integer(colnames(clu_obj))) - length(all_cols) extra_colors <- grDevices::rainbow(num_extra) all_cols <- c(all_cols, extra_colors) } # Node shapes are either circle (single cluster) or pie (multiple # clusters) igraph::V(g)$shape <- ifelse(vapply(percs, length, 1) > 1, "pie", "circle") # Node colors are cluster memberships cols <- lapply(values, function(x) all_cols[x]) igraph::V(g)$color <- vapply(cols, function(x) x[1], "") # Node sizes are -log(lowest_p) p_idx <- match(names(igraph::V(g)), enrichment_res[, chosen_id]) transformed_p <- -log10(enrichment_res$lowest_p[p_idx]) igraph::V(g)$size <- transformed_p * vertex.size.scaling ### Plot Graph igraph::plot.igraph(g, vertex.pie = percs, vertex.pie.color = cols, layout = igraph::layout_nicely(g), edge.curved = FALSE, vertex.label.dist = 0, vertex.label.color = "black", asp = 1, vertex.label.cex = vertex.label.cex, edge.width = igraph::E(g)$weight, edge.arrow.mode = 0) } else if (is.integer(clu_obj)) { ### Argument checks if (!all(names(clu_obj) %in% colnames(kappa_mat))) { stop("Not all terms in `clu_obj` present in `kappa_mat`!") } ### Prep data Remove weak links kappa_mat2 <- kappa_mat diag(kappa_mat2) <- 0 kappa_mat2 <- ifelse(kappa_mat2 > kappa_threshold, kappa_mat2, 0) # Add missing terms missing <- names(clu_obj)[!names(clu_obj) %in% colnames(kappa_mat2)] missing_mat <- matrix(0, nrow = nrow(kappa_mat2), ncol = length(missing), dimnames = list(rownames(kappa_mat2), missing)) kappa_mat2 <- cbind(kappa_mat2, missing_mat) missing <- names(clu_obj)[!names(clu_obj) %in% rownames(kappa_mat2)] missing_mat <- matrix(0, nrow = length(missing), ncol = ncol(kappa_mat2), dimnames = list(missing, colnames(kappa_mat2))) kappa_mat2 <- rbind(kappa_mat2, missing_mat) ### Create Graph, Set Colors and Sizes g <- igraph::graph_from_adjacency_matrix(kappa_mat2, weighted = TRUE) 
        igraph::V(g)$Clu <- clu_obj[match(igraph::V(g)$name, names(clu_obj))]

        if (length(all_cols) < max(as.integer(igraph::V(g)$Clu))) {
            num_extra <- max(clu_obj) - length(all_cols)
            extra_colors <- grDevices::rainbow(num_extra)
            all_cols <- c(all_cols, extra_colors)
        }

        # Node colors are cluster memberships
        igraph::V(g)$color <- all_cols[as.integer(igraph::V(g)$Clu)]

        # Node sizes are -log10(lowest_p), scaled by vertex.size.scaling
        p_idx <- match(names(igraph::V(g)), enrichment_res[, chosen_id])
        transformed_p <- -log10(enrichment_res$lowest_p[p_idx])
        igraph::V(g)$size <- transformed_p * vertex.size.scaling

        ### Plot graph
        igraph::plot.igraph(g, layout = igraph::layout_nicely(g), edge.curved = FALSE,
            vertex.label.dist = 0, vertex.label.color = "black", asp = 0, vertex.label.cex = vertex.label.cex,
            edge.width = igraph::E(g)$weight, edge.arrow.mode = 0)
    } else {
        stop("Invalid class for `clu_obj`!")
    }
}

#' Cluster Enriched Terms
#'
#' @inheritParams create_kappa_matrix
#' @param method Either 'hierarchical' or 'fuzzy'. Details of clustering are
#' provided in the corresponding functions \code{\link{hierarchical_term_clustering}},
#' and \code{\link{fuzzy_term_clustering}}
#' @param plot_clusters_graph boolean value indicating whether or not to plot
#' the graph diagram of clustering results (default = TRUE)
#' @param ... additional arguments for \code{\link{hierarchical_term_clustering}},
#' \code{\link{fuzzy_term_clustering}} and \code{\link{cluster_graph_vis}}.
#' See documentation of these functions for more details.
#'
#'
#' @return a data frame of clustering results. For 'hierarchical', the cluster
#' assignments (Cluster) and whether the term is representative of its cluster
#' (Status) are added as columns. For 'fuzzy', terms that are in multiple
#' clusters are provided for each cluster. The cluster assignments (Cluster)
#' and whether the term is representative of its cluster (Status) are
#' added as columns.
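#'
#' @details A minimal sketch of using the returned data frame (an
#' illustration only; it relies on the \code{Cluster} and \code{Status}
#' columns described in the return value):
#' \preformatted{
#' clustered <- cluster_enriched_terms(example_pathfindR_output[1:3, ],
#'                                     plot_clusters_graph = FALSE)
#' # keep only the representative term(s) of each cluster
#' representatives <- clustered[clustered$Status == "Representative", ]
#' # number of terms in each cluster
#' table(clustered$Cluster)
#' }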
#' #' @export #' #' @examples #' example_clustered <- cluster_enriched_terms( #' example_pathfindR_output[1:3, ], #' plot_clusters_graph = FALSE #' ) #' example_clustered <- cluster_enriched_terms( #' example_pathfindR_output[1:3, ], #' method = 'fuzzy', plot_clusters_graph = FALSE #' ) #' @seealso See \code{\link{hierarchical_term_clustering}} for hierarchical #' clustering of enriched terms. #' See \code{\link{fuzzy_term_clustering}} for fuzzy clustering of enriched terms. #' See \code{\link{cluster_graph_vis}} for graph visualization of clustering. cluster_enriched_terms <- function(enrichment_res, method = "hierarchical", plot_clusters_graph = TRUE, use_description = FALSE, use_active_snw_genes = FALSE, ...) { ### Argument Checks if (!method %in% c("hierarchical", "fuzzy")) { stop("the clustering `method` must either be \"hierarchical\" or \"fuzzy\"") } if (!is.logical(plot_clusters_graph)) { stop("`plot_clusters_graph` must be logical!") } ### Create Kappa Matrix kappa_mat <- create_kappa_matrix(enrichment_res = enrichment_res, use_description = use_description, use_active_snw_genes = use_active_snw_genes) kappa_mat[is.na(kappa_mat)] <- 0 ### Cluster Terms if (method == "hierarchical") { clu_obj <- R.utils::doCall("hierarchical_term_clustering", kappa_mat = kappa_mat, enrichment_res = enrichment_res, use_description = use_description, ...) } else { clu_obj <- R.utils::doCall("fuzzy_term_clustering", kappa_mat = kappa_mat, enrichment_res = enrichment_res, use_description = use_description, ...) } ### Graph Visualization of Clusters if (plot_clusters_graph) { R.utils::doCall("cluster_graph_vis", clu_obj = clu_obj, kappa_mat = kappa_mat, enrichment_res = enrichment_res, use_description = use_description, ...) 
} ### Returned Data Frame with Cluster Information clustered_df <- enrichment_res ### Set ID/Name index chosen_id <- ifelse(use_description, which(colnames(enrichment_res) == "Term_Description"), which(colnames(enrichment_res) == "ID")) if (method == "hierarchical") { ### Assign Clusters and Representatives clu_idx <- match(clustered_df[, chosen_id], names(clu_obj)) clustered_df$Cluster <- clu_obj[clu_idx] clustered_df <- clustered_df[order(clustered_df$Cluster, clustered_df$lowest_p, decreasing = FALSE), ] tmp <- tapply(clustered_df[, chosen_id], clustered_df$Cluster, function(x) x[1]) stat_cond <- clustered_df[, chosen_id] %in% tmp clustered_df$Status <- ifelse(stat_cond, "Representative", "Member") } else { term_list <- list() for (term in rownames(clu_obj)) { term_list[[term]] <- which(clu_obj[term, ]) } ### Assign Clusters and Representatives clustered_df2 <- c() for (i in base::seq_len(nrow(clustered_df))) { current_row <- clustered_df[i, ] current_clusters <- term_list[[current_row[, chosen_id]]] for (clu in current_clusters) { clustered_df2 <- rbind(clustered_df2, data.frame(current_row, Cluster = clu)) } } clustered_df <- clustered_df2 clustered_df <- clustered_df[order(clustered_df$Cluster, clustered_df$lowest_p, decreasing = FALSE), ] tmp <- tapply(clustered_df[, chosen_id], clustered_df$Cluster, function(x) x[1]) stat_cond <- clustered_df[, chosen_id] %in% tmp clustered_df$Status <- ifelse(stat_cond, "Representative", "Member") } return(clustered_df) } ================================================ FILE: R/comparison.R ================================================ #' Combine 2 pathfindR Results #' #' @param result_A data frame of first pathfindR enrichment results #' @param result_B data frame of second pathfindR enrichment results #' @param plot_common boolean to indicate whether or not to plot the term-gene #' graph of the common terms (default=\code{TRUE}) #' #' @return Data frame of combined pathfindR enrichment results. 
#' Columns are: \describe{
#'   \item{ID}{ID of the enriched term}
#'   \item{Term_Description}{Description of the enriched term}
#'   \item{Fold_Enrichment_A}{Fold enrichment value for the enriched term (Calculated using ONLY the input genes)}
#'   \item{occurrence_A}{the number of iterations in which the given term was found to be enriched}
#'   \item{lowest_p_A}{the lowest adjusted-p value of the given term over all iterations}
#'   \item{highest_p_A}{the highest adjusted-p value of the given term over all iterations}
#'   \item{Up_regulated_A}{the up-regulated genes in the input involved in the given term's gene set, comma-separated}
#'   \item{Down_regulated_A}{the down-regulated genes in the input involved in the given term's gene set, comma-separated}
#'   \item{Fold_Enrichment_B}{Fold enrichment value for the enriched term (Calculated using ONLY the input genes)}
#'   \item{occurrence_B}{the number of iterations in which the given term was found to be enriched}
#'   \item{lowest_p_B}{the lowest adjusted-p value of the given term over all iterations}
#'   \item{highest_p_B}{the highest adjusted-p value of the given term over all iterations}
#'   \item{Up_regulated_B}{the up-regulated genes in the input involved in the given term's gene set, comma-separated}
#'   \item{Down_regulated_B}{the down-regulated genes in the input involved in the given term's gene set, comma-separated}
#'   \item{combined_p}{the combined p value (via Fisher's method)}
#'   \item{status}{whether the term is found in both analyses ('common'), found only in the first ('A only') or found only in the second ('B only')}
#' }
#' By default, the function also displays the term-gene graph of the common terms
#'
#' @export
#'
#' @examples
#' combined_results <- combine_pathfindR_results(example_pathfindR_output, example_comparison_output)
combine_pathfindR_results <- function(result_A, result_B, plot_common = TRUE) {
    combined_df <- merge(result_A, result_B, by = c("ID", "Term_Description"), all = TRUE,
suffixes = c("_A", "_B")) ### Calculate combined p values combined_df$combined_p <- NA for (i in seq_len(nrow(combined_df))) { p_vec <- c(combined_df$lowest_p_A[i], combined_df$lowest_p_B[i]) p_vec <- p_vec[!is.na(p_vec)] combined_df$combined_p[i] <- stats::pchisq(q = sum(log(p_vec)) * -2, df = length(p_vec) * 2, lower.tail = FALSE) } ### Indicate intersection status combined_df$status <- ifelse(is.na(combined_df$lowest_p_A), "B only", ifelse(is.na(combined_df$lowest_p_B), "A only", "common")) ### Plot graph common terms if (plot_common) { graphics::plot(combined_results_graph(combined_df)) } message("You may run `combined_results_graph()` to create visualizations of combined term-gene graphs of selected terms") return(combined_df) } #' Combined Results Graph #' #' @param combined_df Data frame of combined pathfindR enrichment results #' @param selected_terms the vector of selected terms for creating the graph #' (either IDs or term descriptions). If set to \code{'common'}, all of the #' common terms are used. (default = 'common') #' @inheritParams term_gene_graph #' #' @return a \code{\link[ggraph]{ggraph}} object containing the combined term-gene graph. #' Each node corresponds to an enriched term (orange if common, different shades of blue otherwise), #' an up-regulated gene (green), a down-regulated gene (red) or #' a conflicting (i.e. up in one analysis, down in the other or vice versa) gene #' (gray). An edge between a term and a gene indicates #' that the given term involves the gene. Size of a term node is proportional #' to either the number of genes (if \code{node_size = 'num_genes'}) or #' the -log10(lowest p value) (if \code{node_size = 'p_val'}). 
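#'
#' @details A minimal sketch of plotting terms other than the common ones
#' (an illustration only; picking the first three 'A only' terms is an
#' arbitrary choice, and it assumes that at least one such term exists):
#' \preformatted{
#' combined_df <- combine_pathfindR_results(example_pathfindR_output,
#'                                          example_comparison_output,
#'                                          plot_common = FALSE)
#' # IDs of the terms enriched only in the first analysis
#' a_only_ids <- combined_df$ID[combined_df$status == "A only"]
#' g <- combined_results_graph(combined_df,
#'                             selected_terms = head(a_only_ids, 3),
#'                             node_size = "p_val")
#' graphics::plot(g)
#' }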
#' @export
#'
#' @examples
#' combined_results <- combine_pathfindR_results(
#'   example_pathfindR_output,
#'   example_comparison_output,
#'   plot_common = FALSE
#' )
#' g <- combined_results_graph(combined_results, selected_terms = sample(combined_results$ID, 3))
combined_results_graph <- function(combined_df, selected_terms = "common", use_description = FALSE,
    layout = "stress", node_size = "num_genes") {
    ############ Argument Checks
    ### Check use_description is boolean
    if (!is.logical(use_description)) {
        stop("`use_description` must either be TRUE or FALSE!")
    }

    ### Set column for term labels
    ID_column <- ifelse(use_description, "Term_Description", "ID")

    ### Check node_size
    val_node_size <- c("num_genes", "p_val")
    if (!node_size %in% val_node_size) {
        stop("`node_size` should be one of ", paste(dQuote(val_node_size), collapse = ", "))
    }

    if (!is.data.frame(combined_df)) {
        stop("`combined_df` should be a data frame")
    }

    ### Check necessary columns
    necessary_cols <- c(ID_column, "combined_p", "Up_regulated_A", "Down_regulated_A",
        "Up_regulated_B", "Down_regulated_B")

    if (!all(necessary_cols %in% colnames(combined_df))) {
        stop(paste(c("All of", paste(necessary_cols, collapse = ", "), "must be present in `combined_df`!"),
            collapse = " "))
    }

    ############ Initial steps
    ### Filter for selected terms
    if (any(selected_terms == "common")) {
        if (!any(combined_df$status == "common")) {
            stop("There are no common terms")
        }
        combined_df <- combined_df[combined_df$status == "common", ]
    } else {
        if (!any(selected_terms %in% combined_df[, ID_column])) {
            stop("None of the `selected_terms` are in the combined results!")
        }
        combined_df <- combined_df[combined_df[, ID_column] %in% selected_terms, ]
    }

    ### Prep data frame for graph
    graph_df <- data.frame()
    for (i in base::seq_len(nrow(combined_df))) {
        up_genes <- c(unlist(strsplit(combined_df$Up_regulated_A[i], ", ")),
            unlist(strsplit(combined_df$Up_regulated_B[i], ", ")))
        down_genes <- c(unlist(strsplit(combined_df$Down_regulated_A[i], ", ")),
unlist(strsplit(combined_df$Down_regulated_B[i], ", "))) genes <- c(up_genes, down_genes) genes <- genes[!is.na(genes)] for (gene in genes) { graph_df <- rbind(graph_df, data.frame(Term = combined_df[i, ID_column], Gene = gene)) } } graph_df <- unique(graph_df) up_genes_A <- unlist(lapply(combined_df$Up_regulated_A, function(x) unlist(strsplit(x, ", ")))) down_genes_A <- unlist(lapply(combined_df$Down_regulated_A, function(x) unlist(strsplit(x, ", ")))) up_genes_B <- unlist(lapply(combined_df$Up_regulated_B, function(x) unlist(strsplit(x, ", ")))) down_genes_B <- unlist(lapply(combined_df$Down_regulated_B, function(x) unlist(strsplit(x, ", ")))) terms_A <- combined_df[!is.na(combined_df$lowest_p_A) & is.na(combined_df$lowest_p_B), ID_column] terms_B <- combined_df[is.na(combined_df$lowest_p_A) & !is.na(combined_df$lowest_p_B), ID_column] ############ Create graph object and plot create igraph object g <- igraph::graph_from_data_frame(graph_df, directed = FALSE) igraph::V(g)$type <- ifelse(names(igraph::V(g)) %in% terms_A, "A-only term", ifelse(names(igraph::V(g)) %in% terms_B, "B-only term", ifelse(names(igraph::V(g)) %in% combined_df[, ID_column], "common term", "gene"))) # Adjust node sizes if (node_size == "num_genes") { sizes <- igraph::degree(g) sizes <- ifelse(grepl("term", igraph::V(g)$type), sizes, 2) size_label <- "# genes" } else { idx <- match(names(igraph::V(g)), combined_df[, ID_column]) sizes <- -log10(combined_df$combined_p[idx]) sizes[is.na(sizes)] <- 2 size_label <- "-log10(p)" } igraph::V(g)$size <- sizes igraph::V(g)$label.cex <- 0.5 igraph::V(g)$frame.color <- "gray" cond_up_A <- names(igraph::V(g)) %in% up_genes_A cond_up_B <- names(igraph::V(g)) %in% up_genes_B cond_down_A <- names(igraph::V(g)) %in% down_genes_A cond_down_B <- names(igraph::V(g)) %in% down_genes_B missing_A <- !cond_up_A & !cond_down_A missing_B <- !cond_up_B & !cond_down_B up_cond <- (cond_up_A & cond_up_B) | (missing_A & cond_up_B) | (cond_up_A & missing_B) down_cond <- 
(cond_down_A & cond_down_B) | (missing_A & cond_down_B) | (cond_down_A & missing_B) igraph::V(g)$for_coloring <- ifelse(igraph::V(g)$type == "common term", "Common term", ifelse(igraph::V(g)$type == "A-only term", "A-only term", ifelse(igraph::V(g)$type == "B-only term", "B-only term", ifelse(up_cond, "Up gene", ifelse(down_cond, "Down gene", "Conflicting gene"))))) ### Create graph create_graph <- function(g, for_coloring, size) { color_var <- ggplot2::enquo(for_coloring) size_var <- ggplot2::enquo(size) p <- ggraph::ggraph(g, layout = layout) p <- p + ggraph::geom_edge_link(alpha = 0.8, colour = "darkgrey") p <- p + ggraph::geom_node_point(ggplot2::aes(color = !!color_var, size = !!size_var)) p <- p + ggplot2::scale_size(range = c(5, 10), breaks = round(seq(round(min(igraph::V(g)$size)), round(max(igraph::V(g)$size)), length.out = 4)), name = size_label) p <- p + ggplot2::theme_void() p <- p + suppressWarnings(ggraph::geom_node_text(ggplot2::aes(label = .data$name), nudge_y = 0.2, repel = TRUE, max.overlaps = 20)) vertex_cols <- c(`Common term` = "#FCCA46", `A-only term` = "#9FB8AD", `B-only term` = "#619B8A", `Up gene` = "green", `Down gene` = "red", `Conflicting gene` = "gray") p <- p + ggplot2::scale_colour_manual(values = vertex_cols, name = NULL) p <- p + ggplot2::ggtitle("Combined Terms Graph") p <- p + ggplot2::theme(plot.title = ggplot2::element_text(hjust = 0.5)) return(p) } return(create_graph(g, for_coloring, size)) } ================================================ FILE: R/core.R ================================================ #' Wrapper Function for pathfindR - Active-Subnetwork-Oriented Enrichment Workflow #' #' \code{run_pathfindR} is the wrapper function for the pathfindR workflow #' #' This function takes in a data frame consisting of Gene Symbol, log-fold-change #' and adjusted-p values. After input testing, any gene symbols that are not in #' the PIN are converted to alias symbols if the alias is in the PIN. 
#' Next, active subnetwork search is performed. Enrichment analysis is
#' performed using the genes in each of the active subnetworks. Terms with
#' adjusted-p values higher than \code{enrichment_threshold} are discarded. The
#' lowest adjusted-p value (over all subnetworks) for each term is kept. This
#' process of active subnetwork search and enrichment is repeated for a selected
#' number of \code{iterations}, which is done in parallel. Over all iterations,
#' the lowest and the highest adjusted-p values, as well as number of occurrences
#' are reported for each enriched term.
#'
#' @inheritParams input_processing
#' @inheritParams fetch_gene_set
#' @inheritParams enrichment_analyses
#' @param plot_enrichment_chart boolean value. If TRUE, a bubble chart displaying
#' the enrichment results is plotted. (default = TRUE)
#' @param output_dir the directory to be created where the output and intermediate
#' files are saved (default = \code{NULL}, a temporary directory is used)
#' @param ... additional arguments for \code{\link{active_snw_enrichment_wrapper}}
#'
#' @return Data frame of pathfindR enrichment results.
#' Columns are: \describe{
#'   \item{ID}{ID of the enriched term}
#'   \item{Term_Description}{Description of the enriched term}
#'   \item{Fold_Enrichment}{Fold enrichment value for the enriched term (Calculated using ONLY the input genes)}
#'   \item{occurrence}{the number of iterations in which the given term was found to be enriched}
#'   \item{support}{the median support (proportion of active subnetworks leading to enrichment within an iteration) over all iterations}
#'   \item{lowest_p}{the lowest adjusted-p value of the given term over all iterations}
#'   \item{highest_p}{the highest adjusted-p value of the given term over all iterations}
#'   \item{non_Signif_Snw_Genes (OPTIONAL)}{the non-significant active subnetwork genes, comma-separated}
#'   \item{Up_regulated}{the up-regulated genes (as determined by `change value` > 0, if the `change column` was provided) in the input involved in the given term's gene set, comma-separated. If the change column is not provided, all affected genes are listed here.}
#'   \item{Down_regulated}{the down-regulated genes (as determined by `change value` < 0, if the `change column` was provided) in the input involved in the given term's gene set, comma-separated}
#' }
#' If \code{output_dir} is specified, the function also creates an HTML report
#' with the pathfindR enrichment results linked to the visualizations of the
#' enriched terms in addition to the table of converted gene symbols. This
#' report can be found in '\code{output_dir}/results.html' under the current
#' working directory.
#'
#' By default, a bubble chart of the top 10 enrichment results is plotted. The x-axis
#' corresponds to fold enrichment values while the y-axis indicates the enriched
#' terms. Sizes of the bubbles indicate the number of significant genes in the given terms.
#' Color indicates the -log10(lowest-p) value; the more red it is, the more
#' significant the enriched term is. See \code{\link{enrichment_chart}}.
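#'
#' @details A minimal sketch of inspecting the returned data frame (an
#' illustration only; the cut-off of 5 occurrences is an arbitrary assumption,
#' not a package recommendation):
#' \preformatted{
#' res <- run_pathfindR(example_pathfindR_input)
#' # terms found enriched in at least 5 iterations, most significant first
#' stable <- res[res$occurrence >= 5, ]
#' stable <- stable[order(stable$lowest_p), ]
#' head(stable[, c("ID", "Term_Description", "occurrence", "lowest_p")])
#' }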
#' #' @import knitr #' @import rmarkdown #' @import parallel #' @import doParallel #' @import foreach #' @import graphics #' #' @export #' #' @section Warning: Especially depending on the protein interaction network, #' the algorithm and the number of iterations you choose, 'active subnetwork #' search + enrichment' component of \code{run_pathfindR} may take a long time to finish. #' #' @seealso #' \code{\link{input_testing}} for input testing, \code{\link{input_processing}} for input processing, #' \code{\link{active_snw_search}} for active subnetwork search and subnetwork filtering, #' \code{\link{enrichment_analyses}} for enrichment analysis (using the active subnetworks), #' \code{\link{summarize_enrichment_results}} for summarizing the active-subnetwork-oriented enrichment results, #' \code{\link{annotate_term_genes}} for annotation of affected genes in the given gene sets, #' \code{\link{visualize_terms}} for visualization of enriched terms, #' \code{\link{enrichment_chart}} for a visual summary of the pathfindR enrichment results, #' \code{\link[foreach]{foreach}} for details on parallel execution of looping constructs, #' \code{\link{cluster_enriched_terms}} for clustering the resulting enriched terms and partitioning into clusters. #' #' @examples #' \dontrun{ #' run_pathfindR(example_pathfindR_input) #' } run_pathfindR <- function(input, gene_sets = "KEGG", min_gset_size = 10, max_gset_size = 300, custom_genes = NULL, custom_descriptions = NULL, pin_name_path = "Biogrid", p_val_threshold = 0.05, enrichment_threshold = 0.05, convert2alias = TRUE, plot_enrichment_chart = TRUE, output_dir = NULL, list_active_snw_genes = FALSE, ...) 
{ ############ Argument checks if (!is.logical(plot_enrichment_chart)) { stop("`plot_enrichment_chart` should be either TRUE or FALSE") } if (!is.logical(list_active_snw_genes)) { stop("`list_active_snw_genes` should be either TRUE or FALSE") } gset_list <- fetch_gene_set(gene_sets = gene_sets, min_gset_size = min_gset_size, max_gset_size = max_gset_size, custom_genes = custom_genes, custom_descriptions = custom_descriptions) ## absolute path to PIN pin_path <- return_pin_path(pin_name_path) ## create output dir output_dir_org <- output_dir output_dir <- configure_output_dir(output_dir) # on exit, set working directory back to original working directory org_dir <- getwd() on.exit(setwd(org_dir)) # create and change working directory into the output directory dir.create(output_dir, recursive = TRUE) output_dir <- normalizePath(output_dir) setwd(output_dir) input_testing(input, p_val_threshold) input_processed <- input_processing(input, p_val_threshold, pin_path, convert2alias) combined_res <- active_snw_enrichment_wrapper(input_processed, pin_path, gset_list, enrichment_threshold, list_active_snw_genes, ...) setwd(output_dir) ## In case no enrichment was found if (is.null(combined_res)) { warning("Did not find any enriched terms!", call. 
= FALSE) return(data.frame()) } final_res <- summarize_enrichment_results(combined_res, list_active_snw_genes) final_res <- annotate_term_genes(result_df = final_res, input_processed = input_processed, genes_by_term = gset_list$genes_by_term) if (!is.null(output_dir_org)) { create_HTML_report(input = input, input_processed = input_processed, final_res = final_res, dir_for_report = output_dir) } if (plot_enrichment_chart) { graphics::plot(enrichment_chart(result_df = final_res)) } message(paste0("Found ", nrow(final_res), " enriched terms\n\n")) message("You may run:\n") message("- cluster_enriched_terms() for clustering enriched terms\n") message("- visualize_terms() for visualizing enriched term diagrams\n\n") return(final_res) } ================================================ FILE: R/data_generation.R ================================================ #' Safely download and parse web content #' #' This helper function retrieves content from a given URL using \pkg{httr}. #' It ensures that common issues (e.g. no internet, timeouts, HTTP errors, #' or parsing errors) are handled gracefully with clear, informative error messages. #' #' @param url Character string. The URL of the resource to download. #' @param ... Additional arguments passed to \code{\link[httr]{GET}}. #' @param timeout_sec Numeric. Timeout in seconds for the request (default = 10). #' #' @return A character string containing the parsed content of the response #' (UTF-8 encoded). On failure, an error is raised with a clear message. #' #' @details #' This function is intended for use inside package functions. #' For examples, vignettes, or tests, wrap calls in a connectivity check #' (e.g. using \code{http_error(HEAD(url))}) to avoid CRAN failures #' when the resource is temporarily unavailable. 
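#'
#' A minimal sketch of such a guard, for illustration only (the variable names
#' are hypothetical and this snippet is not part of the package code;
#' \code{HEAD()} and \code{http_error()} are from \pkg{httr}):
#' \preformatted{
#' url <- "https://downloads.thebiogrid.org/BioGRID/Latest-Release/"
#' # TRUE only if the HEAD request succeeds and returns a non-error status
#' reachable <- tryCatch(!httr::http_error(httr::HEAD(url)), error = function(e) FALSE)
#' if (reachable) {
#'   result <- safe_get_content(url)
#' }
#' }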
#'
#' @examples
#' \dontrun{
#' # Retrieve the latest BioGRID release page
#' result <- safe_get_content("https://downloads.thebiogrid.org/BioGRID/Latest-Release/")
#' }
#'
#' @importFrom httr GET timeout http_error status_code content
safe_get_content <- function(url, ..., timeout_sec = 10) {
  res <- tryCatch(
    {
      GET(url, timeout(timeout_sec), ...)
    },
    error = function(e) {
      stop("Failed to retrieve resource from ", url, ". Error: ", conditionMessage(e), call. = FALSE)
    }
  )

  # Check HTTP status
  if (http_error(res)) {
    stop("The resource at ", url, " is unavailable. HTTP status: ", status_code(res), call. = FALSE)
  }

  # Return the response body as UTF-8 text
  content <- tryCatch(
    content(res, as = "text", encoding = "UTF-8"),
    error = function(e) {
      stop("Failed to parse content from ", url, ". Error: ", conditionMessage(e), call. = FALSE)
    }
  )

  return(content)
}

#' Process Data frame of Protein-protein Interactions
#'
#' @param pin_df data frame of protein-protein interactions with 2 columns:
#' 'Interactor_A' and 'Interactor_B'
#'
#' @return processed PIN data frame (removes self-interactions and
#' duplicated interactions)
process_pin <- function(pin_df) {
  # remove self-interactions
  pin_df <- pin_df[pin_df$Interactor_A != pin_df$Interactor_B, ]
  # remove duplicated interactions (including symmetric ones)
  pin_df <- unique(t(apply(pin_df, 1, sort)))
  pin_df <- as.data.frame(pin_df)
  colnames(pin_df) <- c("Interactor_A", "Interactor_B")
  return(pin_df)
}

#' Retrieve the Requested Release of Organism-specific BioGRID PIN
#'
#' @param org organism name. BioGRID naming requires underscores for spaces so
#' 'Homo sapiens' becomes 'Homo_sapiens', 'Mus musculus' becomes 'Mus_musculus'
#' etc. See \url{https://wiki.thebiogrid.org/doku.php/statistics} for a full
#' list of available organisms (default = 'Homo_sapiens')
#' @param path2pin the path of the file to save the PIN data. By default, the
#' PIN data is saved in a temporary file
#' @param release the requested BioGRID release (default = 'latest')
#'
#' @return the path of the file in which the PIN data was saved. If
#' \code{path2pin} was not supplied by the user, the PIN data is saved in a
#' temporary file
get_biogrid_pin <- function(org = "Homo_sapiens", path2pin, release = "latest") {
  # check organism name
  all_org_names <- c("Anopheles_gambiae_PEST", "Apis_mellifera", "Arabidopsis_thaliana_Columbia",
    "Bacillus_subtilis_168", "Bos_taurus", "Caenorhabditis_elegans", "Candida_albicans_SC5314",
    "Canis_familiaris", "Cavia_porcellus", "Chlamydomonas_reinhardtii", "Chlorocebus_sabaeus",
    "Cricetulus_griseus", "Danio_rerio", "Dictyostelium_discoideum_AX4", "Drosophila_melanogaster",
    "Emericella_nidulans_FGSC_A4", "Equus_caballus", "Escherichia_coli_K12_MC4100_BW2952",
    "Escherichia_coli_K12_MG1655", "Escherichia_coli_K12_W3110", "Escherichia_coli_K12",
    "Gallus_gallus", "Glycine_max", "Hepatitus_C_Virus", "Homo_sapiens", "Human_Herpesvirus_1",
    "Human_Herpesvirus_2", "Human_Herpesvirus_3", "Human_Herpesvirus_4", "Human_Herpesvirus_5",
    "Human_Herpesvirus_6A", "Human_Herpesvirus_6B", "Human_Herpesvirus_7", "Human_Herpesvirus_8",
    "Human_Immunodeficiency_Virus_1", "Human_Immunodeficiency_Virus_2", "Human_papillomavirus_10",
    "Human_papillomavirus_16", "Human_papillomavirus_6b", "Leishmania_major_Friedlin",
    "Macaca_mulatta", "Meleagris_gallopavo", "Mus_musculus", "Mycobacterium_tuberculosis_H37Rv",
    "Neurospora_crassa_OR74A", "Nicotiana_tomentosiformis", "Oryctolagus_cuniculus",
    "Oryza_sativa_Japonica", "Ovis_aries", "Pan_troglodytes", "Pediculus_humanus",
    "Plasmodium_falciparum_3D7", "Rattus_norvegicus", "Ricinus_communis",
    "Saccharomyces_cerevisiae_S288c", "Schizosaccharomyces_pombe_972h", "Selaginella_moellendorffii",
    "Simian_Immunodeficiency_Virus", "Simian_Virus_40", "Solanum_lycopersicum", "Solanum_tuberosum",
    "Streptococcus_pneumoniae_ATCCBAA255", "Strongylocentrotus_purpuratus",
    "Sus_scrofa", "Tobacco_Mosaic_Virus", "Ustilago_maydis_521", "Vaccinia_Virus", "Vitis_vinifera",
    "Xenopus_laevis", "Zea_mays")
  if (!org %in% all_org_names) {
    stop(paste(org, "is not a valid Biogrid organism.", "Available organisms are listed on: https://wiki.thebiogrid.org/doku.php/statistics"))
  }

  if (release == "latest") {
    result <- safe_get_content("https://downloads.thebiogrid.org/BioGRID/Latest-Release/")
    h2_matches <- regexpr("(?<=<h2>BioGRID Release\\s)(\\d\\.\\d\\.\\d+)", result, perl = TRUE)
    release <- regmatches(result, h2_matches)
  }

  # release directory for download
  rel_dir <- paste0("BIOGRID-", release)

  # choose tab2 vs. tab3
  tab_v <- ifelse(utils::compareVersion(release, "3.5.183") == -1, ".tab2", ".tab3")

  # download organism files (tab2 or tab3 format, depending on the release)
  tmp <- tempfile()
  fname <- paste0("BIOGRID-ORGANISM-", release, tab_v)
  biogrid_url <- paste0("https://downloads.thebiogrid.org/Download/BioGRID/Release-Archive/", rel_dir, "/", fname, ".zip")
  utils::download.file(biogrid_url, tmp, method = getOption("download.file.method"), quiet = TRUE)

  # parse organism names
  all_org_files <- utils::unzip(tmp, list = TRUE)
  all_org_files$Organism <- sub("\\.tab\\d\\.txt", "", all_org_files$Name)
  all_org_files$Organism <- sub("BIOGRID-ORGANISM-", "", all_org_files$Organism)
  all_org_files$Organism <- sub("-.*\\d+$", "", all_org_files$Organism)
  org_file <- all_org_files$Name[all_org_files$Organism == org]

  # process and save organism PIN file
  biogrid_df <- utils::read.delim(unz(tmp, org_file), check.names = FALSE, colClasses = "character")
  biogrid_pin <- data.frame(Interactor_A = biogrid_df[, "Official Symbol Interactor A"], Interactor_B = biogrid_df[, "Official Symbol Interactor B"])
  biogrid_pin <- process_pin(biogrid_pin)
  final_pin <- data.frame(intA = biogrid_pin$Interactor_A, pp = "pp", intB = biogrid_pin$Interactor_B)

  if (missing(path2pin)) {
    path2pin <- tempfile()
  }
  utils::write.table(final_pin, path2pin, sep = "\t", row.names = FALSE, col.names = FALSE, quote = FALSE)
  return(path2pin)
}

#' Retrieve Organism-specific PIN data
#'
#' @param source As of this version, this function is implemented to get data
#' from 'BioGRID' only. This argument (and this wrapper function) was implemented
#' for future utility
#' @inheritParams get_biogrid_pin
#' @param ... additional arguments for \code{\link{get_biogrid_pin}}
#'
#' @return the path of the file in which the PIN data was saved.
If #' \code{path2pin} was not supplied by the user, the PIN data is saved in a #' temporary file #' @export #' #' @examples #' \dontrun{ #' pin_path <- get_pin_file() #' } get_pin_file <- function(source = "BioGRID", org = "Homo_sapiens", path2pin, ...) { ## TODO if (source != "BioGRID") { stop("As of this version, this function is implemented to get data from BioGRID only") } path2pin <- get_biogrid_pin(org = org, path2pin = path2pin, ...) return(path2pin) } #' Retrieve Gene Sets from GMT-format File #' #' @param path2gmt path to the gmt file #' @param descriptions_idx index for descriptions (default = 2) #' #' @return list containing 2 elements: \itemize{ #' \item{gene_sets - A list containing the genes involved in each gene set} #' \item{descriptions - A named vector containing the descriptions for each gene set} #' } gset_list_from_gmt <- function(path2gmt, descriptions_idx = 2) { gset_names_idx <- ifelse(descriptions_idx == 2, 1, 2) gmt_lines <- readLines(path2gmt) ## Genes list genes_list <- lapply(gmt_lines, function(x) { x <- unlist(strsplit(x, "\t")) x <- unique(x[3:length(x)]) x <- x[x != ""] return(x) }) names(genes_list) <- vapply(gmt_lines, function(x) { x <- unlist(strsplit(x, "\t")) return(x[gset_names_idx]) }, "a") ## Descriptions vector descriptions_vec <- vapply(gmt_lines, function(x) { x <- unlist(strsplit(x, "\t")) return(x[descriptions_idx]) }, "a") names(descriptions_vec) <- names(genes_list) # remove empty gene sets (if any) genes_list <- genes_list[vapply(genes_list, length, 1) != 0] descriptions_vec <- descriptions_vec[names(genes_list)] return(list(gene_sets = genes_list, descriptions = descriptions_vec)) } #' Retrieve Organism-specific KEGG Pathway Gene Sets #' #' @param org_code KEGG organism code for the selected organism. 
For a full list #' of all available organisms, see \url{https://www.genome.jp/kegg/catalog/org_list.html} #' #' @return list containing 2 elements: \itemize{ #' \item{gene_sets - A list containing KEGG IDs for the genes involved in each KEGG pathway} #' \item{descriptions - A named vector containing the descriptions for each KEGG pathway} #' } #' @importFrom ggkegg pathway get_kegg_gsets <- function(org_code = "hsa") { message("Grab a cup of coffee, this will take a while...") all_pathways_url <- paste0("https://rest.kegg.jp/list/pathway/", org_code) all_pathways_result <- safe_get_content(all_pathways_url) parsed_all_pathways_result <- strsplit(all_pathways_result, "\n")[[1]] pathway_ids <- vapply(parsed_all_pathways_result, function(x) unlist(strsplit(x, "\t"))[1], "id") pathway_descriptons <- vapply(parsed_all_pathways_result, function(x) unlist(strsplit(x, "\t"))[2], "description") names(pathway_descriptons) <- pathway_ids genes_by_pathway <- lapply(pathway_ids, function(pw_id) { pathways_graph <- pathway(pid = pw_id, directory = tempdir(), use_cache = FALSE, return_tbl_graph = FALSE) all_pw_kegg_ids <- igraph::V(pathways_graph)$name[igraph::V(pathways_graph)$type == "gene"] all_pw_kegg_ids <- unlist(strsplit(all_pw_kegg_ids, " ")) all_pw_kegg_ids <- unique(all_pw_kegg_ids) return(all_pw_kegg_ids) }) names(genes_by_pathway) <- pathway_ids # remove empty gene sets (e.g. pure metabolic pathways) kegg_genes <- genes_by_pathway[vapply(genes_by_pathway, length, 1) != 0] kegg_descriptions <- pathway_descriptons kegg_descriptions <- sub(" & .*$", "", sub("-([^-]*)$", "&\\1", kegg_descriptions)) kegg_descriptions <- kegg_descriptions[names(kegg_descriptions) %in% names(kegg_genes)] result <- list(gene_sets = kegg_genes, descriptions = kegg_descriptions) return(result) } #' Retrieve Reactome Pathway Gene Sets #' #' @return Gets the latest Reactome pathways gene sets in gmt format. 
Parses the
#' gmt file and returns a list containing 2 elements: \itemize{
#' \item{gene_sets - A list containing the genes involved in each Reactome pathway}
#' \item{descriptions - A named vector containing the descriptions for each Reactome pathway}
#' }
#'
get_reactome_gsets <- function() {
  tmp <- tempfile()
  reactome_url <- "https://reactome.org/download/current/ReactomePathways.gmt.zip"
  utils::download.file(reactome_url, tmp, method = getOption("download.file.method"))
  reactome_gmt <- unz(tmp, "ReactomePathways.gmt")
  result <- gset_list_from_gmt(reactome_gmt, descriptions_idx = 1)
  close(reactome_gmt)

  # fix illegal char(s)
  result$descriptions <- gsub("[^ -~]", "", result$descriptions)

  return(result)
}

#' Retrieve Organism-specific MSigDB Gene Sets
#'
#' @param species species name for output genes, such as Homo sapiens, Mus musculus, etc.
#' See \code{\link[msigdbr]{msigdbr_species}} for all the species available in
#' the msigdbr package.
#' @param db_species Species abbreviation for the human or mouse databases ("HS" or "MM").
#' @param collection collection abbreviation, e.g., H, C1 (default = NULL,
#' i.e. do not filter by collection).
#' See \code{\link[msigdbr]{msigdbr_collections}} for all available options in
#' the msigdbr package.
#' @param subcollection sub-collection abbreviation, such as CGP, BP, etc. (default = NULL,
#' i.e. do not filter by sub-collection).
#' See \code{\link[msigdbr]{msigdbr_collections}} for all available options in
#' the msigdbr package.
#'
#' @return Retrieves the MSigDB gene sets and returns a list containing 2 elements: \itemize{
#' \item{gene_sets - A list containing the genes involved in each of the selected MSigDB gene sets}
#' \item{descriptions - A named vector containing the descriptions for each selected MSigDB gene set}
#' }
#'
#' @details this function utilizes the function \code{\link[msigdbr]{msigdbr}}
#' from the \code{msigdbr} package to retrieve the 'Molecular Signatures Database'
#' (MSigDB) gene sets (Subramanian et al. 
2005 , #' Liberzon et al. 2015 ). #' Available collections are: H: hallmark gene sets, C1: positional gene sets, #' C2: curated gene sets, C3: motif gene sets, C4: computational gene sets, #' C5: GO gene sets, C6: oncogenic signatures and C7: immunologic signatures get_mgsigdb_gsets <- function(species = "Homo sapiens", db_species = "HS", collection = NULL, subcollection = NULL) { msig_df <- msigdbr::msigdbr( species = species, collection = collection, subcollection = subcollection, db_species = db_species ) ### create gene sets list all_gs_ids <- unique(msig_df$gs_id) msig_gsets_list <- list() for (id in all_gs_ids) { sub_df <- msig_df[msig_df$gs_id == id, ] msig_gsets_list[[id]] <- unique(sub_df$gene_symbol) } ### create gene sets descriptions msig_gsets_descriptions <- msig_df[, c("gs_name", "gs_id")] msig_gsets_descriptions <- unique(msig_gsets_descriptions) tmp <- msig_gsets_descriptions$gs_id msig_gsets_descriptions <- msig_gsets_descriptions$gs_name names(msig_gsets_descriptions) <- tmp result <- list(gene_sets = msig_gsets_list, descriptions = msig_gsets_descriptions) return(result) } #' Retrieve Organism-specific Gene Sets List #' #' @param source As of this version, either 'KEGG', 'Reactome' or 'MSigDB' (default = 'KEGG') #' @param org_code (Used for 'KEGG' only) KEGG organism code for the selected organism. For a full list #' of all available organisms, see \url{https://www.genome.jp/kegg/catalog/org_list.html} #' @inheritParams get_mgsigdb_gsets #' #' @return A list containing 2 elements: \itemize{ #' \item{gene_sets - A list containing the genes involved in each gene set} #' \item{descriptions - A named vector containing the descriptions for each gene set} #' }. For 'KEGG' and 'MSigDB', it is possible to choose a specific organism. For a full list #' of all available KEGG organisms, see \url{https://www.genome.jp/kegg/catalog/org_list.html}. 
#' See \code{\link[msigdbr]{msigdbr_species}} for all the species available in #' the msigdbr package used for obtaining 'MSigDB' gene sets. #' For Reactome, there is only one collection of pathway gene sets. #' @export #' get_gene_sets_list <- function(source = "KEGG", org_code = "hsa", species = "Homo sapiens", db_species = "HS", collection, subcollection = NULL) { if (source == "KEGG") { return(get_kegg_gsets(org_code)) } else if (source == "Reactome") { message("For Reactome, there is only one collection of pathway gene sets.") return(get_reactome_gsets()) } else if (source == "MSigDB") { return( get_mgsigdb_gsets( species = species, db_species= db_species, collection = collection, subcollection = subcollection ) ) } else { stop("As of this version, this function is implemented to get data from KEGG, Reactome and MSigDB only") } } ================================================ FILE: R/enrichment.R ================================================ #' Hypergeometric Distribution-based Hypothesis Testing #' #' @param term_genes vector of genes in the selected term gene set #' @param chosen_genes vector containing the set of input genes #' @param background_genes vector of background genes (i.e. universal set of #' genes in the experiment) #' #' @return the p-value as determined using the hypergeometric distribution. #' #' @details To determine whether the \code{chosen_genes} are enriched #' (compared to a background pool of genes) in the \code{term_genes}, the #' hypergeometric distribution is assumed and the appropriate p value #' (the value under the right tail) is calculated and returned. 
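#'
#' As a sketch of the calculation in symbols (the notation below is introduced
#' here for illustration only): let \eqn{N} be the number of background genes,
#' \eqn{K} the number of term genes found in the background, \eqn{n} the number
#' of chosen genes and \eqn{k} the number of chosen genes that are also term
#' genes. The returned value is then
#' \deqn{p = P(X \ge k) = \sum_{i = k}^{\min(K, n)} \frac{\binom{K}{i}\binom{N - K}{n - i}}{\binom{N}{n}},}
#' which corresponds to \code{phyper(k - 1, K, N - K, n, lower.tail = FALSE)}.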
#' #' @export #' #' @examples #' hyperg_test(letters[1:5], letters[2:5], letters) #' hyperg_test(letters[1:5], letters[2:10], letters) #' hyperg_test(letters[1:5], letters[2:13], letters) hyperg_test <- function(term_genes, chosen_genes, background_genes) { #### Argument checks if (!is.atomic(term_genes)) { stop("`term_genes` should be a vector") } if (!is.atomic(chosen_genes)) { stop("`chosen_genes` should be a vector") } if (!is.atomic(background_genes)) { stop("`background_genes` should be a vector") } if (length(term_genes) > length(background_genes)) { stop("`term_genes` cannot be larger than `background_genes`!") } if (length(chosen_genes) > length(background_genes)) { stop("`chosen_genes` cannot be larger than `background_genes`!") } #### Calculate p value term_genes_selected <- sum(chosen_genes %in% term_genes) term_genes_in_pool <- sum(term_genes %in% background_genes) tot_genes_in_pool <- length(background_genes) non_term_genes_in_pool <- tot_genes_in_pool - term_genes_in_pool num_selected_genes <- length(chosen_genes) p_val <- stats::phyper(term_genes_selected - 1, term_genes_in_pool, non_term_genes_in_pool, num_selected_genes, lower.tail = FALSE) return(p_val) } #' Perform Enrichment Analysis for a Single Gene Set #' #' @param input_genes The set of gene symbols to be used for enrichment #' analysis. In the scope of this package, these are genes that were #' identified for an active subnetwork #' @param genes_by_term List that contains genes for each gene set. Names of #' this list are gene set IDs (default = kegg_genes) #' @param term_descriptions Vector that contains term descriptions for the #' gene sets. Names of this vector are gene set IDs (default = kegg_descriptions) #' @param adj_method correction method to be used for adjusting p-values. #' (default = 'bonferroni') #' @param enrichment_threshold adjusted-p value threshold used when filtering #' enrichment results (default = 0.05) #' @param sig_genes_vec vector of significant gene symbols. 
In the scope of this #' package, these are the input genes that were used for active subnetwork search #' @param background_genes vector of background genes. In the scope of this package, #' the background genes are taken as all genes in the PIN #' (see \code{\link{enrichment_analyses}}) #' #' @return A data frame that contains enrichment results #' @export #' @seealso \code{\link[stats]{p.adjust}} for adjustment of p values. See #' \code{\link{run_pathfindR}} for the wrapper function of the pathfindR #' workflow. \code{\link{hyperg_test}} for the details on hypergeometric #' distribution-based hypothesis testing. #' @examples #' enrichment( #' input_genes = c('PER1', 'PER2', 'CRY1', 'CREB1'), #' sig_genes_vec = 'PER1', #' background_genes = unlist(pathfindR.data::kegg_genes) #' ) enrichment <- function(input_genes, genes_by_term = pathfindR.data::kegg_genes, term_descriptions = pathfindR.data::kegg_descriptions, adj_method = "bonferroni", enrichment_threshold = 0.05, sig_genes_vec, background_genes) { #### Argument checks input genes if (!is.atomic(input_genes)) { stop("`input_genes` should be a vector of gene symbols") } ## gene sets data if (!is.list(genes_by_term)) { stop("`genes_by_term` should be a list of term gene sets") } if (is.null(names(genes_by_term))) { stop("`genes_by_term` should be a named list (names are gene set IDs)") } if (!is.atomic(term_descriptions)) { stop("`term_descriptions` should be a vector of term gene descriptions") } if (is.null(names(term_descriptions))) { stop("`term_descriptions` should be a named vector (names are gene set IDs)") } if (length(genes_by_term) != length(term_descriptions)) { stop("The lengths of `genes_by_term` and `term_descriptions` should be the same") } if (any(names(genes_by_term) != names(term_descriptions))) { stop("The names of `genes_by_term` and `term_descriptions` should all be the same") } ## enrichment threshold if (!is.numeric(enrichment_threshold)) { stop("`enrichment_threshold` should be a numeric 
value between 0 and 1")
  }
  if (enrichment_threshold < 0 | enrichment_threshold > 1) {
    stop("`enrichment_threshold` should be between 0 and 1")
  }

  ## signif. genes and background (universal set) genes
  if (!is.atomic(sig_genes_vec)) {
    stop("`sig_genes_vec` should be a vector")
  }
  if (!is.atomic(background_genes)) {
    stop("`background_genes` should be a vector")
  }

  #### Obtain p values
  enrichment_res <- vapply(genes_by_term, hyperg_test, 0.1, input_genes, background_genes)
  enrichment_res <- as.data.frame(enrichment_res)
  colnames(enrichment_res) <- "p_value"

  # Adjust p values
  idx <- order(enrichment_res$p_value)
  enrichment_res <- enrichment_res[idx, , drop = FALSE]
  enrichment_res$adj_p <- stats::p.adjust(enrichment_res$p_value, method = adj_method)

  #### Filter by adj-p
  cond <- enrichment_res$adj_p <= enrichment_threshold
  # Empty case (if all adj-p > threshold)
  if (sum(cond) == 0) {
    return(NULL)
  }
  enrichment_res <- enrichment_res[cond, ]

  #### Add other columns
  ## Term IDs
  enrichment_res$ID <- rownames(enrichment_res)

  ## Term descriptions
  idx <- match(enrichment_res$ID, names(term_descriptions))
  enrichment_res$Term_Description <- term_descriptions[idx]

  # Fold enrichment
  gset_for_fe <- genes_by_term[rownames(enrichment_res)]
  A <- vapply(gset_for_fe, function(gset) length(intersect(sig_genes_vec, gset)), 1L)/length(sig_genes_vec)
  B <- vapply(gset_for_fe, function(gset) length(intersect(background_genes, gset)), 1L)/length(background_genes)
  enrichment_res$Fold_Enrichment <- A/B

  # Non-significant Subnetwork Genes
  non_sig_snw_genes <- base::setdiff(input_genes, sig_genes_vec)
  for (i in base::seq_len(nrow(enrichment_res))) {
    tmp <- intersect(non_sig_snw_genes, genes_by_term[[enrichment_res$ID[i]]])
    enrichment_res$non_Signif_Snw_Genes[i] <- paste(tmp, collapse = ", ")
  }

  ## reorder columns
  to_order <- c("ID", "Term_Description", "Fold_Enrichment", "p_value", "adj_p", "non_Signif_Snw_Genes")
  enrichment_res <- enrichment_res[, to_order]

  return(enrichment_res)
}

#' Perform Enrichment Analyses 
on the Input Subnetworks #' #' @param snws a list of subnetwork genes (i.e., vectors of genes for each subnetwork) #' @inheritParams enrichment #' @inheritParams return_pin_path #' @param list_active_snw_genes boolean value indicating whether or not to report #' the non-significant active subnetwork genes for the active subnetwork which was enriched for #' the given term with the lowest p value (default = \code{FALSE}) #' #' @return a dataframe of combined enrichment results. Columns are: \describe{ #' \item{ID}{ID of the enriched term} #' \item{Term_Description}{Description of the enriched term} #' \item{Fold_Enrichment}{Fold enrichment value for the enriched term} #' \item{p_value}{p value of enrichment} #' \item{adj_p}{adjusted p value of enrichment} #' \item{support}{the support (proportion of active subnetworks leading to enrichment over all subnetworks) for the gene set} #' \item{non_Signif_Snw_Genes (OPTIONAL)}{the non-significant active subnetwork genes, comma-separated} #' } #' #' @export #' #' @seealso \code{\link{enrichment}} for the enrichment analysis for a single gene set #' #' @examples #' enr_res <- enrichment_analyses( #' snws = example_active_snws[1:2], #' sig_genes_vec = example_pathfindR_input$Gene.symbol[1:25], #' pin_name_path = 'KEGG' #' ) enrichment_analyses <- function(snws, sig_genes_vec, pin_name_path = "Biogrid", genes_by_term = pathfindR.data::kegg_genes, term_descriptions = pathfindR.data::kegg_descriptions, adj_method = "bonferroni", enrichment_threshold = 0.05, list_active_snw_genes = FALSE) { ### Argument check if (!is.logical(list_active_snw_genes)) { stop("`list_active_snw_genes` should be either TRUE or FALSE") } ### Load PIN Data pin_path <- return_pin_path(pin_name_path) pin <- utils::read.delim(file = pin_path, header = FALSE) background_genes <- unique(c(pin[, 1], pin[, 3])) # turn all to upper case for best match genes_by_term <- lapply(genes_by_term, base::toupper) sig_genes_vec <- base::toupper(sig_genes_vec) 
background_genes <- base::toupper(background_genes) ############ Enrichment per subnetwork enrichment_res <- lapply(snws, function(x) { enrichment(input_genes = base::toupper(x), genes_by_term = genes_by_term, term_descriptions = term_descriptions, adj_method = adj_method, enrichment_threshold = enrichment_threshold, sig_genes_vec = sig_genes_vec, background_genes = background_genes) }) ### indices for snw.s if (length(enrichment_res) != 0) { for (i in seq_len(length(enrichment_res))) { if (!is.null(enrichment_res[[i]])) { enrichment_res[[i]]$snw_idx <- i } } } ############ Combine Enrichments Results for All Subnetworks enrichment_res <- Reduce(rbind, enrichment_res) ############ Process if non-empty if (!is.null(enrichment_res)) { ## calculate support values support <- tapply(enrichment_res$snw_idx, enrichment_res$ID, length) support <- support/length(snws) enrichment_res$support <- support[match(enrichment_res$ID, names(support))] enrichment_res$snw_idx <- NULL ## delete non_Signif_Snw_Genes if list_active_snw_genes == FALSE if (!list_active_snw_genes) { enrichment_res$non_Signif_Snw_Genes <- NULL } ## keep lowest p for each term idx <- order(enrichment_res$adj_p) enrichment_res <- enrichment_res[idx, ] enrichment_res <- enrichment_res[!duplicated(enrichment_res$ID), ] } return(enrichment_res) } #' Summarize Enrichment Results #' #' @param enrichment_res a dataframe of combined enrichment results. Columns are: \describe{ #' \item{ID}{ID of the enriched term} #' \item{Term_Description}{Description of the enriched term} #' \item{Fold_Enrichment}{Fold enrichment value for the enriched term} #' \item{p_value}{p value of enrichment} #' \item{adj_p}{adjusted p value of enrichment} #' \item{non_Signif_Snw_Genes (OPTIONAL)}{the non-significant active subnetwork genes, comma-separated} #' } #' @inheritParams enrichment_analyses #' #' @return a dataframe of summarized enrichment results (over multiple iterations). 
Columns are: \describe{
#' \item{ID}{ID of the enriched term}
#' \item{Term_Description}{Description of the enriched term}
#' \item{Fold_Enrichment}{Fold enrichment value for the enriched term}
#' \item{occurrence}{the number of iterations in which the given term was found to be enriched}
#' \item{support}{the median support (proportion of active subnetworks leading to enrichment within an iteration) over all iterations}
#' \item{lowest_p}{the lowest adjusted-p value of the given term over all iterations}
#' \item{highest_p}{the highest adjusted-p value of the given term over all iterations}
#' \item{non_Signif_Snw_Genes (OPTIONAL)}{the non-significant active subnetwork genes, comma-separated}
#' }
#' @export
#'
#' @examples
#' \dontrun{
#' summarize_enrichment_results(enrichment_res)
#' }
summarize_enrichment_results <- function(enrichment_res, list_active_snw_genes = FALSE) {
  message("## Processing the enrichment results over all iterations")

  ## Argument checks
  if (!is.logical(list_active_snw_genes)) {
    stop("`list_active_snw_genes` should be either TRUE or FALSE")
  }

  nec_cols <- c("ID", "Term_Description", "Fold_Enrichment", "p_value", "adj_p", "support")
  if (list_active_snw_genes) {
    nec_cols <- c(nec_cols, "non_Signif_Snw_Genes")
  }

  if (!is.data.frame(enrichment_res)) {
    stop("`enrichment_res` should be a data frame")
  }
  if (ncol(enrichment_res) != length(nec_cols)) {
    stop("`enrichment_res` should have exactly ", length(nec_cols), " columns")
  }
  if (!all(nec_cols %in% colnames(enrichment_res))) {
    stop("`enrichment_res` should have column names ", paste(dQuote(nec_cols), collapse = ", "))
  }

  ## Annotate lowest p, highest p, occurrence and median support
  final_res <- enrichment_res
  lowest_p <- tapply(enrichment_res$adj_p, enrichment_res$ID, min)
  highest_p <- tapply(enrichment_res$adj_p, enrichment_res$ID, max)
  occurrence <- tapply(enrichment_res$adj_p, enrichment_res$ID, length)
  support <- tapply(enrichment_res$support, enrichment_res$ID, stats::median)
matched_idx <- match(final_res$ID, names(lowest_p)) final_res$lowest_p <- as.numeric(lowest_p[matched_idx]) matched_idx <- match(final_res$ID, names(highest_p)) final_res$highest_p <- as.numeric(highest_p[matched_idx]) matched_idx <- match(final_res$ID, names(occurrence)) final_res$occurrence <- as.numeric(occurrence[matched_idx]) matched_idx <- match(final_res$ID, names(support)) final_res$support <- as.numeric(support[matched_idx]) ## reorder columns keep <- c("ID", "Term_Description", "Fold_Enrichment", "occurrence", "support", "lowest_p", "highest_p") if (list_active_snw_genes) { keep <- c(keep, "non_Signif_Snw_Genes") } final_res <- final_res[, keep] ## keep data with lowest p-value over all iterations final_res <- final_res[order(final_res$lowest_p), ] final_res <- final_res[!duplicated(final_res$ID), ] rownames(final_res) <- NULL return(final_res) } ================================================ FILE: R/pathfindr.R ================================================ #' pathfindR: A package for Enrichment Analysis Utilizing Active Subnetworks #' #' pathfindR is a tool for active-subnetwork-oriented gene set enrichment analysis. #' The main aim of the package is to identify active subnetworks in a #' protein-protein interaction network using a user-provided list of genes #' and associated p values then performing enrichment analyses on the identified #' subnetworks, discovering enriched terms (i.e. pathways, gene ontology, TF target #' gene sets etc.) that possibly underlie the phenotype of interest. #' #' For analysis on non-Homo sapiens organisms, pathfindR offers utility functions #' for obtaining organism-specific PIN data and organism-specific gene sets data. #' #' pathfindR also offers functionalities to cluster the enriched terms and #' identify representative terms in each cluster, to score the enriched terms #' per sample and to visualize analysis results. 
#' #' #' @seealso See \code{\link{run_pathfindR}} for details on the pathfindR #' active-subnetwork-oriented enrichment analysis #' See \code{\link{cluster_enriched_terms}} for details on methods of enriched #' terms clustering to define clusters of biologically-related terms #' See \code{\link{score_terms}} for details on agglomerated score calculation #' for enriched terms to investigate how a gene set is altered in a given sample #' (or in cases vs. controls) #' See \code{\link{term_gene_heatmap}} for details on visualization of the heatmap #' of enriched terms by involved genes #' See \code{\link{term_gene_graph}} for details on visualizing terms and #' term-related genes as a graph to determine the degree of overlap between the #' enriched terms by identifying shared and/or distinct significant genes #' See \code{\link{UpSet_plot}} for details on creating an UpSet plot of the #' enriched terms. #' See \code{\link{get_pin_file}} for obtaining organism-specific PIN data and #' \code{\link{get_gene_sets_list}} for obtaining organism-specific gene sets data #' @import pathfindR.data #' @name pathfindR "_PACKAGE" globalVariables(c("for_coloring", "size")) ================================================ FILE: R/scoring.R ================================================ #' Calculate Agglomerated Scores of Enriched Terms for Each Subject #' #' @param enrichment_table a data frame that must contain the 3 columns below: \describe{ #' \item{Term_Description}{Description of the enriched term (necessary if \code{use_description = TRUE})} #' \item{ID}{ID of the enriched term (necessary if \code{use_description = FALSE})} #' \item{Up_regulated}{the up-regulated genes in the input involved in the given term's gene set, comma-separated} #' \item{Down_regulated}{the down-regulated genes in the input involved in the given term's gene set, comma-separated} #' } #' @param exp_mat the experiment (e.g., gene expression/methylation) matrix. #' Columns are samples and rows are genes. 
Column names must contain sample #' names and row names must contain the gene symbols. #' @param cases (Optional) A vector of sample names that are cases in the #' case/control experiment. (default = NULL) #' @param use_description Boolean argument to indicate whether term descriptions #' (in the 'Term_Description' column) should be used. (default = \code{FALSE}) #' @param plot_hmap Boolean value to indicate whether or not to draw the #' heatmap plot of the scores. (default = TRUE) #' @param ... Additional arguments for \code{\link{plot_scores}} for aesthetics #' of the heatmap plot #' #' @return Matrix of agglomerated scores of each enriched term per sample. #' Columns are samples, rows are enriched terms. Optionally, displays a heatmap #' of this matrix. #' #' @section Conceptual Background: #' For an experiment matrix (containing expression, methylation, etc. values), #' the rows of which are genes and the columns of which are samples, #' we denote: \itemize{ #' \item E as a matrix of size \ifelse{html}{\out{m x n}}{\eqn{m \times n}} #' \item G as the set of all genes in the experiment \ifelse{html}{\out{G = Ei., i ∈ [1, m]}}{\eqn{G = E_{i\cdot}, \ \ i \in [1, m]}} #' \item S as the set of all samples in the experiment \ifelse{html}{\out{S = E.j, j ∈ [1, n]}}{\eqn{S = E_{\cdot j}, \ \ j \in [1, n]}} #' } #' #' We next define the gene score matrix GS (the standardized experiment matrix, #' also of size \ifelse{html}{\out{m x n}}{\eqn{m \times n}}) as: #' #' \ifelse{html}{\out{GSgs = (Egs - ēg) / sg}}{\eqn{GS_{gs} = \frac{E_{gs} - \bar{e_g}}{s_g}}} #' #' where \ifelse{html}{\out{g ∈ G}}{\eqn{g \in G}}, \ifelse{html}{\out{s ∈ S}}{\eqn{s \in S}}, #' \ifelse{html}{\out{ēg}}{\eqn{\bar{e_g}}} is the mean of #' all values for gene g and \ifelse{html}{\out{sg}}{\eqn{s_g}} #' is the standard deviation of all values for gene g.
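#'
#' As a minimal sketch of this standardization (toy values only;
#' \code{E_toy} and \code{GS_toy} are hypothetical names, not package objects),
#' each gene (row) of a small matrix can be centered and scaled in base R.
#' \code{scale()} uses the sample standard deviation, as \code{stats::sd()} does:
#' \preformatted{
#' E_toy <- matrix(c(1, 2, 3, 10, 20, 60), nrow = 2, byrow = TRUE,
#'                 dimnames = list(c("G1", "G2"), c("S1", "S2", "S3")))
#' # scale() standardizes columns, so transpose to standardize per gene (row)
#' GS_toy <- t(scale(t(E_toy)))
#' GS_toy["G1", ]  # -1, 0, 1 (mean of G1 is 2, standard deviation is 1)
#' }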
#' #' We next denote T to be a set of terms (where each \ifelse{html}{\out{t ∈ T}}{\eqn{t \in T}} #' is a set of term-related genes, i.e., #' \ifelse{html}{\out{t = \{gx, ..., gy\} ⊂ G}}{\eqn{t = \{g_x, ..., g_y\} \subset G}}) #' and finally define the agglomerated term scores matrix TS (where rows #' correspond to terms and columns correspond to samples s.t. the matrix has size #' \ifelse{html}{\out{|T| x n}}{\eqn{|T| \times n}}) as: #' #' \ifelse{html}{\out{TSts = 1/|t| ∑ g ∈ t GSgs}}{\eqn{TS_{ts} = \frac{1}{|t|}\sum_{g \in t} GS_{gs}}}, #' where \ifelse{html}{\out{t ∈ T}}{\eqn{t \in T}} and \ifelse{html}{\out{s ∈ S}}{\eqn{s \in S}}. #' #' @export #' #' @examples #' score_matrix <- score_terms( #' example_pathfindR_output, #' example_experiment_matrix, #' plot_hmap = FALSE #' ) score_terms <- function(enrichment_table, exp_mat, cases = NULL, use_description = FALSE, plot_hmap = TRUE, ...) { #### Argument Checks if (!is.logical(use_description)) { stop("`use_description` should either be TRUE or FALSE") } if (!is.logical(plot_hmap)) { stop("`plot_hmap` should either be TRUE or FALSE") } if (!is.data.frame(enrichment_table)) { stop("`enrichment_table` should be a data frame of enrichment results") } ID_column <- ifelse(use_description, "Term_Description", "ID") nec_cols <- c(ID_column, "Up_regulated", "Down_regulated") if (!all(nec_cols %in% colnames(enrichment_table))) { stop("`enrichment_table` should contain all of ", paste(dQuote(nec_cols), collapse = ", ")) } if (!is.matrix(exp_mat)) { stop("`exp_mat` should be a matrix") } if (!is.null(cases)) { if (!is.atomic(cases)) { stop("`cases` should be a vector") } if (!all(cases %in% colnames(exp_mat))) { stop("Missing `cases` in `exp_mat`") } } ## fix duplicated term descriptions (if using description) if (use_description) { dup_desc <- enrichment_table$Term_Description[duplicated(enrichment_table$Term_Description)] tmp <- ifelse(enrichment_table$Term_Description %in% dup_desc,
paste0(enrichment_table$Term_Description, "_", enrichment_table$ID), enrichment_table$Term_Description) enrichment_table$Term_Description <- tmp } #### Create score matrix all_scores_matrix <- c() for (i in base::seq_len(nrow(enrichment_table))) { # Get signif. genes up_genes <- enrichment_table$Up_regulated[i] down_genes <- enrichment_table$Down_regulated[i] up_genes <- unlist(strsplit(up_genes, ", ")) down_genes <- unlist(strsplit(down_genes, ", ")) genes <- c(up_genes, down_genes) # convert gene symbols to upper case for comparison genes <- toupper(genes) exp_mat_genes <- rownames(exp_mat) exp_mat_genes <- toupper(exp_mat_genes) # some genes may not be in exp. matrix genes <- genes[genes %in% exp_mat_genes] if (length(genes) != 0) { # subset exp. matrix to include only genes sub_mat <- exp_mat[exp_mat_genes %in% genes, , drop = FALSE] current_term_score_matrix <- c() for (gene in genes) { gene_vec <- sub_mat[toupper(rownames(sub_mat)) == gene, ] gene_vec <- as.numeric(gene_vec) names(gene_vec) <- colnames(sub_mat) # calculate mean and sd across samples gene_mean <- base::mean(gene_vec) gene_sd <- stats::sd(gene_vec) gene_scores <- vapply(gene_vec, function(x) (x - gene_mean)/gene_sd, 1.2) current_term_score_matrix <- rbind(current_term_score_matrix, gene_scores) rownames(current_term_score_matrix)[nrow(current_term_score_matrix)] <- gene } current_term_scores <- apply(current_term_score_matrix, 2, base::mean) all_scores_matrix <- rbind(all_scores_matrix, current_term_scores) rownames(all_scores_matrix)[nrow(all_scores_matrix)] <- enrichment_table[i, ID_column] } } if (!is.null(cases)) { ## order as cases, then controls match1 <- match(cases, colnames(all_scores_matrix)) match2 <- setdiff(base::seq_len(ncol(all_scores_matrix)), match1) all_scores_matrix <- all_scores_matrix[, c(match1, match2)] } if (plot_hmap) { heatmap <- plot_scores(score_matrix = all_scores_matrix, cases = cases, ...) 
graphics::plot(heatmap) } return(all_scores_matrix) } #' Plot the Heatmap of Score Matrix of Enriched Terms per Sample #' #' @param score_matrix Matrix of agglomerated enriched term scores per sample. Columns are #' samples, rows are enriched terms #' @inheritParams score_terms #' @param label_samples Boolean value to indicate whether or not to label the #' samples in the heatmap plot (default = TRUE) #' @param case_title Naming of the 'Case' group (as in \code{cases}) (default = 'Case') #' @param control_title Naming of the 'Control' group (default = 'Control') #' @param low a string indicating the color of 'low' values in the coloring gradient (default = 'green') #' @param mid a string indicating the color of 'mid' values in the coloring gradient (default = 'black') #' @param high a string indicating the color of 'high' values in the coloring gradient (default = 'red') #' #' @return A `ggplot2` object containing the heatmap plot. x-axis indicates #' the samples. y-axis indicates the enriched terms. 'Score' indicates the #' score of the term in a given sample. If \code{cases} are provided, the plot is #' divided into 2 facets, named by \code{case_title} and \code{control_title}. 
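#' As a usage sketch of the faceted form (\code{my_cases} is a hypothetical
#' character vector holding a subset of \code{colnames(score_matrix)}; the
#' titles are arbitrary labels): \preformatted{
#' g <- plot_scores(score_matrix, cases = my_cases,
#'                  case_title = "Tumor", control_title = "Normal")
#' }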
#' #' @import ggplot2 #' @export #' #' @examples #' score_matrix <- score_terms( #' example_pathfindR_output, #' example_experiment_matrix, #' plot_hmap = FALSE #' ) #' hmap <- plot_scores(score_matrix) plot_scores <- function(score_matrix, cases = NULL, label_samples = TRUE, case_title = "Case", control_title = "Control", low = "green", mid = "black", high = "red") { #### Argument Checks if (!is.matrix(score_matrix)) { stop("`score_matrix` should be a matrix") } if (!is.null(cases)) { if (!is.atomic(cases)) { stop("`cases` should be a vector") } if (!all(cases %in% colnames(score_matrix))) { stop("Missing `cases` in `score_matrix`") } } if (!is.logical(label_samples)) { stop("`label_samples` should be TRUE or FALSE") } if (!is.character(case_title) | length(case_title) != 1) { stop("`case_title` should be a single character value") } if (!is.character(control_title) | length(control_title) != 1) { stop("`control_title` should be a single character value") } if (!isColor(low)) { stop("`low` should be a valid color") } if (!isColor(mid)) { stop("`mid` should be a valid color") } if (!isColor(high)) { stop("`high` should be a valid color") } #### Create plot sort according to activity (up/down) if (!is.null(cases)) { tmp <- rowMeans(score_matrix[, cases, drop = FALSE]) score_matrix <- score_matrix[c(which(tmp >= 0), which(tmp < 0)), ] } ## transform the matrix var_names <- list() var_names[["Term"]] <- factor(rownames(score_matrix), levels = rev(rownames(score_matrix))) var_names[["Sample"]] <- factor(colnames(score_matrix), levels = colnames(score_matrix)) score_df <- expand.grid(var_names, KEEP.OUT.ATTRS = FALSE, stringsAsFactors = FALSE) scores <- as.vector(score_matrix) scores <- data.frame(scores) score_df <- cbind(score_df, scores) if (!is.null(cases)) { score_df$Type <- ifelse(score_df$Sample %in% cases, case_title, control_title) score_df$Type <- factor(score_df$Type, levels = c(case_title, control_title)) } g <- ggplot2::ggplot(score_df, ggplot2::aes(x = 
.data$Sample, y = .data$Term)) g <- g + ggplot2::geom_tile(ggplot2::aes(fill = .data$scores), color = "white") g <- g + ggplot2::scale_fill_gradient2(low = low, mid = mid, high = high) g <- g + ggplot2::theme(axis.title.x = ggplot2::element_blank(), axis.title.y = ggplot2::element_blank(), axis.text.x = ggplot2::element_text(angle = 45, hjust = 1), legend.title = ggplot2::element_text(size = 10), legend.text = ggplot2::element_text(size = 12)) g <- g + ggplot2::labs(fill = "Score") if (!is.null(cases)) { g <- g + ggplot2::facet_grid(~Type, scales = "free_x", space = "free") g <- g + ggplot2::theme(strip.text.x = ggplot2::element_text(size = 12, face = "bold")) } if (!label_samples) { g <- g + ggplot2::theme(axis.text.x = ggplot2::element_blank(), axis.ticks.x = ggplot2::element_blank()) } return(g) } ================================================ FILE: R/utility.R ================================================ #' Active Subnetwork Search + Enrichment Analysis Wrapper for a Single Iteration #' #' @param i current iteration index (default = \code{NULL}) #' @param dirs vector of directories for parallel runs #' @inheritParams active_snw_search #' @inheritParams enrichment_analyses #' @inheritParams active_snw_enrichment_wrapper #' #' @return Data frame of enrichment results using active subnetwork search results single_iter_wrapper <- function(i = NULL, dirs, input_processed, pin_path, score_quan_thr, sig_gene_thr, search_method, silent_option, use_all_positives, geneInitProbs, saTemp0, saTemp1, saIter, gaPop, gaIter, gaThread, gaCrossover, gaMut, grMaxDepth, grSearchDepth, grOverlap, grSubNum, gset_list, adj_method, enrichment_threshold, list_active_snw_genes) { snws_file <- "active_snws" dir_for_parallel_run <- NULL if (!is.null(i)) { snws_file <- paste0("active_snws_", i) dir_for_parallel_run <- dirs[i] } snws <- active_snw_search(input_for_search = input_processed, pin_name_path = pin_path, snws_file = snws_file, dir_for_parallel_run = dir_for_parallel_run, 
score_quan_thr = score_quan_thr, sig_gene_thr = sig_gene_thr, search_method = search_method, seedForRandom = ifelse(is.null(i), 1234, i), silent_option = silent_option, use_all_positives = use_all_positives, geneInitProbs = ifelse(!is.null(i), geneInitProbs[i], geneInitProbs), saTemp0 = saTemp0, saTemp1 = saTemp1, saIter = saIter, gaPop = gaPop, gaIter = gaIter, gaThread = gaThread, gaCrossover = gaCrossover, gaMut = gaMut, grMaxDepth = grMaxDepth, grSearchDepth = grSearchDepth, grOverlap = grOverlap, grSubNum = grSubNum) enrichment_res <- enrichment_analyses(snws = snws, sig_genes_vec = input_processed$GENE, pin_name_path = pin_path, genes_by_term = gset_list$genes_by_term, term_descriptions = gset_list$term_descriptions, adj_method = adj_method, enrichment_threshold = enrichment_threshold, list_active_snw_genes = list_active_snw_genes) return(enrichment_res) } #' Wrapper for Active Subnetwork Search + Enrichment over Single/Multiple Iteration(s) #' #' @param input_processed processed input data frame #' @param pin_path path/to/PIN/file #' @param gset_list list for gene sets #' @param disable_parallel boolean to indicate whether to disable parallel runs #' via \code{foreach} (default = FALSE) #' @inheritParams run_pathfindR #' @inheritParams active_snw_search #' @inheritParams enrichment_analyses #' @param iterations number of iterations for active subnetwork search and #' enrichment analyses (Default = 10) #' @param n_processes optional argument for specifying the number of processes #' used by foreach. If not specified, the function determines this #' automatically (Default == NULL. 
Gets set to 1 for Genetic Algorithm) #' #' @return Data frame of combined pathfindR enrichment results active_snw_enrichment_wrapper <- function(input_processed, pin_path, gset_list, enrichment_threshold, list_active_snw_genes, adj_method = "bonferroni", search_method = "GR", disable_parallel = FALSE, use_all_positives = FALSE, iterations = 10, n_processes = NULL, score_quan_thr = 0.8, sig_gene_thr = 0.02, saTemp0 = 1, saTemp1 = 0.01, saIter = 10000, gaPop = 400, gaIter = 200, gaThread = 5, gaCrossover = 1, gaMut = 0, grMaxDepth = 1, grSearchDepth = 1, grOverlap = 0.5, grSubNum = 1000, silent_option = TRUE) { message("## Performing Active Subnetwork Search and Enrichment") ############ Argument checks Active Subnetwork Search Method valid_mets <- c("GR", "SA", "GA") if (!search_method %in% valid_mets) { stop("`search_method` should be one of ", paste(dQuote(valid_mets), collapse = ", ")) } ## If search_method is GA, set iterations as 1 if (search_method == "GA") { warning("`iterations` is set to 1 because `search_method = \"GA\"`", call. 
= FALSE) iterations <- 1 } if (!is.null(n_processes)) { if (!is.numeric(n_processes)) { stop("`n_processes` should be either NULL or a positive integer") } if (n_processes < 1) { stop("`n_processes` should be >= 1") } } # calculate the number of processes, if necessary if (is.null(n_processes)) { n_processes <- parallel::detectCores() - 1 } ## If iterations < n_processes, set n_processes to iterations if (iterations < n_processes & iterations != 1) { message("`n_processes` is set to `iterations` because `iterations` < `n_processes`") n_processes <- iterations } if (!is.logical(use_all_positives)) { stop("`use_all_positives` should be either TRUE or FALSE") } if (!is.logical(silent_option)) { stop("`silent_option` should be either TRUE or FALSE") } if (!is.logical(disable_parallel)) { stop("`disable_parallel` should be either TRUE or FALSE") } if (!is.numeric(iterations)) { stop("`iterations` should be a positive integer") } if (iterations < 1) { stop("`iterations` should be >= 1") } geneInitProbs <- 0.1 dirs <- c() if (iterations > 1) { geneInitProbs <- seq(from = 0.01, to = 0.2, length.out = iterations) for (i in base::seq_len(iterations)) { dir_i <- file.path("active_snw_searches", paste0("Iteration_", i)) dir.create(dir_i, recursive = TRUE, showWarnings = FALSE) dirs <- c(dirs, dir_i) } } if (iterations == 1) { combined_res <- single_iter_wrapper(i = NULL, dirs, input_processed, pin_path, score_quan_thr, sig_gene_thr, search_method, silent_option, use_all_positives, geneInitProbs, saTemp0, saTemp1, saIter, gaPop, gaIter, gaThread, gaCrossover, gaMut, grMaxDepth, grSearchDepth, grOverlap, grSubNum, gset_list, adj_method, enrichment_threshold, list_active_snw_genes) } else { if (!disable_parallel) { cl <- parallel::makeCluster(n_processes, setup_strategy = "sequential") doParallel::registerDoParallel(cl) `%dopar%` <- foreach::`%dopar%` combined_res <- foreach::foreach(i = 1:iterations, .combine = rbind, .packages = "pathfindR") %dopar% { single_iter_wrapper(i,
dirs, input_processed, pin_path, score_quan_thr, sig_gene_thr, search_method, silent_option, use_all_positives, geneInitProbs, saTemp0, saTemp1, saIter, gaPop, gaIter, gaThread, gaCrossover, gaMut, grMaxDepth, grSearchDepth, grOverlap, grSubNum, gset_list, adj_method, enrichment_threshold, list_active_snw_genes) } parallel::stopCluster(cl) } else { combined_res <- c() for (i in 1:iterations) { ## pass the same positional arguments as in the parallel branch current_res <- single_iter_wrapper(i, dirs, input_processed, pin_path, score_quan_thr, sig_gene_thr, search_method, silent_option, use_all_positives, geneInitProbs, saTemp0, saTemp1, saIter, gaPop, gaIter, gaThread, gaCrossover, gaMut, grMaxDepth, grSearchDepth, grOverlap, grSubNum, gset_list, adj_method, enrichment_threshold, list_active_snw_genes) combined_res <- rbind(combined_res, current_res) } } } return(combined_res) } #' Configure Output Directory Name #' #' @inheritParams run_pathfindR #' #' @return /path/to/output/dir configure_output_dir <- function(output_dir = NULL) { output_dir_init <- output_dir output_dir <- ifelse(is.null(output_dir), file.path(tempdir(check = TRUE), "pathfindR_results"), output_dir) dir_changed <- FALSE while (dir.exists(output_dir)) { output_dir <- sub("/$", "", output_dir) if (grepl("\\(\\d+\\)$", output_dir)) { output_dir <- unlist(strsplit(output_dir, "\\(")) suffix <- as.numeric(sub("\\)", "", output_dir[2])) + 1 output_dir <- paste0(output_dir[1], "(", suffix, ")") } else { output_dir <- paste0(output_dir, "(1)") } dir_changed <- TRUE } if (dir_changed & !is.null(output_dir_init)) { message(paste0("There is already a directory named \"", output_dir_init, "\".\nWriting the result to \"", output_dir, "\" not to overwrite any previous results.")) } return(output_dir) } #' Create HTML Report of pathfindR Results #' #' @inheritParams run_pathfindR #' @param input_processed processed input data frame #' @param final_res final pathfindR result data frame #' @param dir_for_report directory to render the report in create_HTML_report <- function(input, input_processed,
final_res, dir_for_report) { message("## Creating HTML report") rmarkdown::render(input = system.file("rmd", "results.Rmd", package = "pathfindR"), output_dir = dir_for_report) rmarkdown::render(input = system.file("rmd", "enriched_terms.Rmd", package = "pathfindR"), params = list(df = final_res), output_dir = dir_for_report) rmarkdown::render(input = system.file("rmd", "conversion_table.Rmd", package = "pathfindR"), params = list(df = input_processed, original_df = input), output_dir = dir_for_report) } #' Input Testing #' #' @param input the input data that pathfindR uses. The input must be a data #' frame with three columns: \enumerate{ #' \item Gene Symbol (Gene Symbol) #' \item Change value, e.g. log(fold change) (OPTIONAL) #' \item p value, e.g. adjusted p value associated with differential expression #' } #' @param p_val_threshold the p value threshold to use when filtering #' the input data frame. Must be a numeric value between 0 and 1. (default = 0.05) #' #' @return Only checks if the input and the threshold follow the required #' specifications. #' @export #' @seealso See \code{\link{run_pathfindR}} for the wrapper function of the #' pathfindR workflow #' @examples #' input_testing(example_pathfindR_input, 0.05) input_testing <- function(input, p_val_threshold = 0.05) { message("## Testing input") if (!is.data.frame(input)) { stop("the input is not a data frame") } if (ncol(input) != 2 & ncol(input) != 3) { stop("the input should have 2 or 3 columns") } if (nrow(input) < 2) { stop("There must be at least 2 rows (genes) in the input data frame") } if (!is.numeric(p_val_threshold)) { stop("`p_val_threshold` must be a numeric value between 0 and 1") } if (p_val_threshold > 1 | p_val_threshold < 0) { stop("`p_val_threshold` must be between 0 and 1") } # if changes are provided, p vals are in col. 3, else in col.
2 p_column <- ifelse(ncol(input) == 3, 3, 2) if (any(is.na(input[, p_column]))) { stop("p values cannot contain NA values") } if (!all(is.numeric(input[, p_column]))) { stop("p values must all be numeric") } if (any(input[, p_column] > 1 | input[, p_column] < 0)) { stop("p values must all be between 0 and 1") } message("The input looks OK") } #' Process Input #' @inheritParams input_testing #' @inheritParams active_snw_search #' @inheritParams return_pin_path #' @param convert2alias boolean to indicate whether or not to convert gene symbols #' in the input that are not found in the PIN to an alias symbol found in the PIN #' (default = TRUE) IMPORTANT NOTE: the conversion uses human gene symbols/alias symbols. #' #' @return This function first filters the input so that all p values are less #' than or equal to the threshold. Next, gene symbols that are not found in #' the PIN are identified. If aliases of these gene symbols are found in the #' PIN, the symbols are converted to the corresponding aliases. The #' resulting data frame containing the original gene symbols, the updated #' symbols, change values and p values is then returned. #' @export #' #' @seealso See \code{\link{run_pathfindR}} for the wrapper function of the #' pathfindR workflow #' #' @examples #' processed_df <- input_processing( #' input = example_pathfindR_input[1:5, ], #' pin_name_path = 'KEGG' #' ) #' processed_df <- input_processing( #' input = example_pathfindR_input[1:5, ], #' pin_name_path = 'KEGG', #' convert2alias = FALSE #' ) input_processing <- function(input, p_val_threshold = 0.05, pin_name_path = "Biogrid", convert2alias = TRUE) { message("## Processing input. 
Converting gene symbols, if necessary (and if human gene symbols provided)") if (!is.logical(convert2alias)) { stop("`convert2alias` should be either TRUE or FALSE") } pin_path <- return_pin_path(pin_name_path) if (ncol(input) == 2) { input <- data.frame(GENE = input[, 1], CHANGE = rep(1e+06, nrow(input)), P_VALUE = input[, 2]) } colnames(input) <- c("GENE", "CHANGE", "P_VALUE") ## Turn GENE into character if (is.factor(input$GENE)) { warning("The gene column was turned into character from factor.", call. = FALSE) input$GENE <- as.character(input$GENE) } message("Number of genes provided in input: ", nrow(input)) ## Discard larger than p-value threshold if (sum(input$P_VALUE <= p_val_threshold) == 0) { stop("No input p value is lower than the provided threshold (", p_val_threshold, ")") } input <- input[input$P_VALUE <= p_val_threshold, ] message("Number of genes in input after p-value filtering: ", nrow(input)) ## Choose lowest p for each gene if (anyDuplicated(input$GENE)) { warning("Duplicated genes found! The lowest p value for each gene was selected", call. = FALSE) input <- input[order(input$P_VALUE, decreasing = FALSE), ] input <- input[!duplicated(input$GENE), ] } ## Fix p < 1e-13 if (any(input$P_VALUE < 1e-13)) { message("pathfindR cannot handle p values < 1e-13. 
These were changed to 1e-13") input$P_VALUE <- ifelse(input$P_VALUE < 1e-13, 1e-13, input$P_VALUE) } ## load and prep pin pin <- utils::read.delim(file = pin_path, header = FALSE) ## Genes not in pin PIN_genes <- c(base::toupper(pin[, 1]), base::toupper(pin[, 3])) missing_symbols <- input$GENE[!base::toupper(input$GENE) %in% PIN_genes] non_missing_symbols <- input$GENE[base::toupper(input$GENE) %in% PIN_genes] if (convert2alias & !requireNamespace("org.Hs.eg.db", quietly = TRUE)) { message( "Package 'org.Hs.eg.db' is not installed; returning input genes unchanged.\n", "Install it with:\n", " if (!requireNamespace('BiocManager', quietly = TRUE)) install.packages('BiocManager')\n", " BiocManager::install('org.Hs.eg.db')" ) convert2alias <- FALSE } if (convert2alias & length(missing_symbols) != 0) { ## use SQL to get alias table and gene_info table (contains the ## symbols) first open the database connection db_con <- org.Hs.eg.db::org.Hs.eg_dbconn() ## the SQL query sql_query <- "SELECT * FROM alias, gene_info WHERE alias._id == gene_info._id;" ## execute the query on the database hsa_alias_df <- DBI::dbGetQuery(db_con, sql_query) select_alias <- function(result, converted, idx) { while (idx > 0) { if (!result[idx] %in% c(converted[, 2], non_missing_symbols)) { return(result[idx]) } idx <- idx - 1 } return("NOT_FOUND") } ## loop for getting all symbols converted <- c() for (i in base::seq_len(length(missing_symbols))) { result <- hsa_alias_df[hsa_alias_df$alias_symbol == missing_symbols[i], c("alias_symbol", "symbol")] result <- hsa_alias_df[hsa_alias_df$symbol %in% result$symbol, c("alias_symbol", "symbol")] result <- result$alias_symbol[base::toupper(result$alias_symbol) %in% PIN_genes] ## avoid duplicate entries to_add <- select_alias(result, converted, length(result)) converted <- rbind(converted, c(missing_symbols[i], to_add)) } ## Convert to appropriate symbol input$new_gene <- input$GENE input$new_gene[match(converted[, 1], input$new_gene)] <- converted[, 2] } 
else { input$new_gene <- ifelse(input$GENE %in% missing_symbols, "NOT_FOUND", input$GENE) } ## number and percent still missing n <- sum(input$new_gene == "NOT_FOUND") perc <- n/nrow(input) * 100 if (n == nrow(input)) { stop("None of the genes were in the PIN\nPlease check your gene symbols") } ## Give out warning indicating the number of still missing if (n != 0) { message(paste0("Could not find any interactions for ", n, " (", round(perc, 2), "%) genes in the PIN")) } else { message(paste0("Found interactions for all genes in the PIN")) } ## reorder columns input <- input[, c(1, 4, 2, 3)] colnames(input) <- c("old_GENE", "GENE", "CHANGE", "P_VALUE") input <- input[input$GENE != "NOT_FOUND", ] ## Keep lowest p value for duplicated genes input <- input[order(input$P_VALUE), ] input <- input[!duplicated(input$GENE), ] ## Check that at least two genes remain if (nrow(input) < 2) { stop("After processing, 1 gene (or no genes) could be mapped to the PIN") } message("Final number of genes in input: ", nrow(input)) return(input) } #' Annotate the Affected Genes in the Provided Enriched Terms #' #' Function to annotate the involved affected (input) genes in each term. #' #' @param result_df data frame of enrichment results. #' The only must-have column is 'ID'. #' @param input_processed input data processed via \code{\link{input_processing}} #' @param genes_by_term List that contains genes for each gene set. 
Names of #' this list are gene set IDs (default = kegg_genes) #' #' @return The original data frame with two additional columns: \describe{ #' \item{Up_regulated}{the up-regulated genes in the input involved in the given term's gene set, comma-separated} #' \item{Down_regulated}{the down-regulated genes in the input involved in the given term's gene set, comma-separated} #' } #' @export #' #' @examples #' example_gene_data <- example_pathfindR_input #' colnames(example_gene_data) <- c('GENE', 'CHANGE', 'P_VALUE') #' #' annotated_result <- annotate_term_genes( #' result_df = example_pathfindR_output, #' input_processed = example_gene_data #' ) annotate_term_genes <- function(result_df, input_processed, genes_by_term = pathfindR.data::kegg_genes) { message("## Annotating involved genes and visualizing enriched terms") ### Argument checks if (!is.data.frame(result_df)) { stop("`result_df` should be a data frame") } if (!"ID" %in% colnames(result_df)) { stop("`result_df` should contain an \"ID\" column") } if (!is.data.frame(input_processed)) { stop("`input_processed` should be a data frame") } if (!all(c("GENE", "CHANGE") %in% colnames(input_processed))) { stop("`input_processed` should contain the columns \"GENE\" and \"CHANGE\"") } if (!is.list(genes_by_term)) { stop("`genes_by_term` should be a list of term gene sets") } if (is.null(names(genes_by_term))) { stop("`genes_by_term` should be a named list (names are gene set IDs)") } ### Annotate up/down-regulated term-related genes Up/Down-regulated genes upreg <- base::toupper(input_processed$GENE[input_processed$CHANGE >= 0]) downreg <- base::toupper(input_processed$GENE[input_processed$CHANGE < 0]) ## Annotation annotated_df <- result_df annotated_df$Down_regulated <- annotated_df$Up_regulated <- NA for (i in base::seq_len(nrow(annotated_df))) { idx <- which(names(genes_by_term) == annotated_df$ID[i]) temp <- genes_by_term[[idx]] annotated_df$Up_regulated[i] <- paste(temp[base::toupper(temp) %in% upreg], collapse = 
", ") annotated_df$Down_regulated[i] <- paste(temp[base::toupper(temp) %in% downreg], collapse = ", ") } return(annotated_df) } #' Fetch Gene Set Objects #' #' Function for obtaining the gene sets per term and the term descriptions to #' be used for enrichment analysis. #' #' @param gene_sets Name of the gene sets to be used for enrichment analysis. #' Available gene sets are 'KEGG', 'Reactome', 'BioCarta', 'GO-All', #' 'GO-BP', 'GO-CC', 'GO-MF', 'cell_markers', 'mmu_KEGG' or 'Custom'. #' If 'Custom', the arguments \code{custom_genes} and \code{custom_descriptions} #' must be specified. (Default = 'KEGG') #' @param min_gset_size minimum number of genes a term must contain (default = 10) #' @param max_gset_size maximum number of genes a term must contain (default = 300) #' @param custom_genes a list containing the genes involved in each custom #' term. Each element is a vector of gene symbols located in the given custom #' term. Names should correspond to the IDs of the custom terms. #' @param custom_descriptions A vector containing the descriptions for each #' custom term. Names of the vector should correspond to the IDs of the custom #' terms. 
#' #' @return a list containing 2 elements \describe{ #' \item{genes_by_term}{list of vectors of genes contained in each term} #' \item{term_descriptions}{vector of descriptions per each term} #' } #' #' @export #' #' @examples #' KEGG_gset <- fetch_gene_set() #' GO_MF_gset <- fetch_gene_set('GO-MF', min_gset_size = 20, max_gset_size = 100) fetch_gene_set <- function(gene_sets = "KEGG", min_gset_size = 10, max_gset_size = 300, custom_genes = NULL, custom_descriptions = NULL) { ### Argument checks all_gs_opts <- c("KEGG", "Reactome", "BioCarta", "GO-All", "GO-BP", "GO-CC", "GO-MF", "cell_markers", "mmu_KEGG", "Custom") if (!gene_sets %in% all_gs_opts) { stop("`gene_sets` should be one of ", paste(dQuote(all_gs_opts), collapse = ", ")) } if (!is.numeric(min_gset_size)) { stop("`min_gset_size` should be numeric") } if (!is.numeric(max_gset_size)) { stop("`max_gset_size` should be numeric") } ### Custom Gene Sets if (gene_sets == "Custom") { if (is.null(custom_genes) | is.null(custom_descriptions)) { stop("`custom_genes` and `custom_descriptions` must be provided if `gene_sets = \"Custom\"`") } if (!is.list(custom_genes)) { stop("`custom_genes` should be a list of term gene sets") } if (is.null(names(custom_genes))) { stop("`custom_genes` should be a named list (names are gene set IDs)") } if (!is.atomic(custom_descriptions)) { stop("`custom_descriptions` should be a vector of term gene descriptions") } if (is.null(names(custom_descriptions))) { stop("`custom_descriptions` should be a named vector (names are gene set IDs)") } # filter by size gset_lens <- vapply(custom_genes, length, 1) keep <- which(gset_lens >= min_gset_size & gset_lens <= max_gset_size) custom_genes <- custom_genes[keep] custom_descriptions <- custom_descriptions[names(custom_genes)] return(list(genes_by_term = custom_genes, term_descriptions = custom_descriptions)) } ### Built-in Gene Sets GO gene sets if (grepl("^GO", gene_sets)) { genes_by_term <- pathfindR.data::go_all_genes GO_df <- 
pathfindR.data:::GO_all_terms_df term_descriptions <- GO_df$GO_term names(term_descriptions) <- GO_df$GO_ID if (gene_sets == "GO-BP") { tmp <- GO_df$GO_ID[GO_df$Category == "Process"] genes_by_term <- genes_by_term[tmp] term_descriptions <- term_descriptions[tmp] } else if (gene_sets == "GO-CC") { tmp <- GO_df$GO_ID[GO_df$Category == "Component"] genes_by_term <- genes_by_term[tmp] term_descriptions <- term_descriptions[tmp] } else if (gene_sets == "GO-MF") { tmp <- GO_df$GO_ID[GO_df$Category == "Function"] genes_by_term <- genes_by_term[tmp] term_descriptions <- term_descriptions[tmp] } ## non-GO (KEGG, Reactome, BioCarta, mmu_KEGG) } else { if (gene_sets == "KEGG") { genes_by_term <- pathfindR.data::kegg_genes term_descriptions <- pathfindR.data::kegg_descriptions } else if (gene_sets == "Reactome") { genes_by_term <- pathfindR.data::reactome_genes term_descriptions <- pathfindR.data::reactome_descriptions } else if (gene_sets == "BioCarta") { genes_by_term <- pathfindR.data::biocarta_genes term_descriptions <- pathfindR.data::biocarta_descriptions } else if (gene_sets == "mmu_KEGG") { genes_by_term <- pathfindR.data::mmu_kegg_genes term_descriptions <- pathfindR.data::mmu_kegg_descriptions } else { genes_by_term <- pathfindR.data::cell_markers_gsets term_descriptions <- pathfindR.data::cell_markers_descriptions } } # filter by size term_lens <- vapply(genes_by_term, length, 1) keep <- which(term_lens >= min_gset_size & term_lens <= max_gset_size) genes_by_term <- genes_by_term[keep] term_descriptions <- term_descriptions[names(genes_by_term)] return(list(genes_by_term = genes_by_term, term_descriptions = term_descriptions)) } #' Return The Path to Given Protein-Protein Interaction Network (PIN) #' #' This function returns the absolute path/to/PIN.sif. The default PINs are #' 'Biogrid', 'STRING', 'GeneMania', 'IntAct', 'KEGG' and 'mmu_STRING'. The user can also #' use any other PIN by specifying the 'path/to/PIN.sif'.
#' All PINs to be used
#' in this package must be formatted as SIF files: i.e. have 3 columns with no
#' header, no row names and be tab-separated. Columns 1 and 3 must be
#' interactors' gene symbols, column 2 must be a column with all
#' rows consisting of 'pp'.
#'
#' @param pin_name_path Name of the chosen PIN or absolute/path/to/PIN.sif. If PIN name,
#' must be one of c('Biogrid', 'STRING', 'GeneMania', 'IntAct', 'KEGG', 'mmu_STRING'). If
#' path/to/PIN.sif, the file must comply with the PIN specifications. (Default = 'Biogrid')
#'
#' @return The absolute path to chosen PIN.
#'
#' @export
#' @seealso See \code{\link{run_pathfindR}} for the wrapper function of the
#' pathfindR workflow
#' @examples
#' \dontrun{
#' pin_path <- return_pin_path('GeneMania')
#' }
return_pin_path <- function(pin_name_path = "Biogrid") {
    ## Default PINs
    valid_opts <- c("Biogrid", "STRING", "GeneMania", "IntAct", "KEGG", "mmu_STRING",
        "/path/to/custom/SIF")
    if (pin_name_path %in% valid_opts[-length(valid_opts)]) {
        path <- file.path(tempdir(check = TRUE), paste0(pin_name_path, ".sif"))
        if (!file.exists(path)) {
            adj_list <- utils::getFromNamespace(paste0(tolower(pin_name_path), "_adj_list"),
                ns = "pathfindR.data")

            pin_df <- lapply(seq_along(adj_list), function(i, nm, val) {
                data.frame(base::toupper(nm[[i]]), "pp", base::toupper(val[[i]]))
            }, val = adj_list, nm = names(adj_list))
            pin_df <- base::do.call("rbind", pin_df)

            utils::write.table(pin_df, path, sep = "\t", row.names = FALSE, col.names = FALSE,
                quote = FALSE)
        }
        path <- normalizePath(path)

        ## Custom PIN
    } else if (file.exists(suppressWarnings(normalizePath(pin_name_path)))) {
        path <- normalizePath(pin_name_path)
        pin <- utils::read.delim(file = path, quote = "", header = FALSE)
        if (ncol(pin) != 3) {
            stop("The PIN file must have 3 columns and be tab-separated")
        }

        if (any(pin[, 2] != "pp")) {
            stop("The second column of the PIN file must all be \"pp\" ")
        }

        if (any(grepl("[a-z]", pin[, 1])) | any(grepl("[a-z]", pin[, 3]))) {
            pin[, 1] <-
                base::toupper(pin[, 1])
            pin[, 3] <- base::toupper(pin[, 3])

            path <- file.path(tempdir(check = TRUE), "custom_PIN.sif")
            utils::write.table(pin, path, sep = "\t", row.names = FALSE, col.names = FALSE,
                quote = FALSE)
            path <- normalizePath(path)
        }
    } else {
        stop("The chosen PIN must be one of:\n", paste(dQuote(valid_opts), collapse = ", "))
    }
    return(path)
}

================================================
FILE: R/visualization.R
================================================
#' Check if value is a valid color
#'
#' @param x value
#'
#' @return TRUE if x is a valid color, otherwise FALSE
isColor <- function(x) {
    if (!is.character(x) | length(x) != 1) {
        return(FALSE)
    }
    tryCatch(is.matrix(grDevices::col2rgb(x)), error = function(e) FALSE)
}

#' Create Diagrams for Enriched Terms
#'
#' @param result_df Data frame of enrichment results. Must-have columns for
#' KEGG human pathway diagrams (\code{is_KEGG_result = TRUE}) are: 'ID' and 'Term_Description'.
#' Must-have columns for the rest are: 'Term_Description', 'Up_regulated' and
#' 'Down_regulated'
#' @param input_processed input data processed via \code{\link{input_processing}},
#' not necessary when \code{is_KEGG_result = FALSE}
#' @param is_KEGG_result boolean to indicate whether KEGG gene sets were used for
#' enrichment analysis or not (default = \code{TRUE})
#' @inheritParams return_pin_path
#' @param ... additional arguments for \code{\link{visualize_KEGG_diagram}} (used
#' when \code{is_KEGG_result = TRUE}) or \code{\link{visualize_term_interactions}}
#' (used when \code{is_KEGG_result = FALSE})
#'
#' @return Depending on the argument \code{is_KEGG_result}, creates visualization of
#' interactions of genes involved in the list of enriched terms in
#' \code{result_df}. Returns a list of ggplot objects named by Term ID.
#'
#'
#' @details For \code{is_KEGG_result = TRUE}, KEGG pathway diagrams are created,
#' affected nodes colored by up/down regulation status.
#' For other gene sets, interactions of affected genes are determined (via a shortest-path
#' algorithm) and are visualized (colored by change status) using igraph.
#'
#'
#' @export
#'
#' @seealso See \code{\link{visualize_KEGG_diagram}} for the visualization function
#' of KEGG diagrams. See \code{\link{visualize_term_interactions}} for the
#' visualization function that generates diagrams showing the interactions of
#' input genes in the PIN. See \code{\link{run_pathfindR}} for the wrapper
#' function of the pathfindR workflow.
#'
#' @examples
#' \dontrun{
#' input_processed <- data.frame(
#'   GENE = c("PARP1", "NDUFA1", "STX6", "SNAP23"),
#'   CHANGE = c(1.5, -2, 3, 5)
#' )
#' result_df <- example_pathfindR_output[1:2, ]
#'
#' gg_list <- visualize_terms(result_df, input_processed)
#' gg_list2 <- visualize_terms(result_df, is_KEGG_result = FALSE, pin_name_path = 'IntAct')
#' }
visualize_terms <- function(
    result_df, input_processed = NULL, is_KEGG_result = TRUE, pin_name_path = "Biogrid", ...
) {
    ############ Argument Checks
    if (!is.data.frame(result_df)) {
        stop("`result_df` should be a data frame")
    }

    if (!is.logical(is_KEGG_result)) {
        stop("the argument `is_KEGG_result` should be either TRUE or FALSE")
    }

    if (is_KEGG_result) {
        nec_cols <- "ID"
    } else {
        nec_cols <- c("Term_Description", "Up_regulated", "Down_regulated")
    }
    if (!all(nec_cols %in% colnames(result_df))) {
        stop("`result_df` should contain the following columns: ", paste(dQuote(nec_cols),
            collapse = ", "))
    }

    if (is_KEGG_result) {
        if (is.null(input_processed)) {
            stop("`input_processed` should be specified when `is_KEGG_result = TRUE`")
        }
    }

    ############ Generate Diagrams
    if (is_KEGG_result) {
        visualize_KEGG_diagram(
            kegg_pw_ids = result_df$ID, input_processed = input_processed, ...
        )
    } else {
        visualize_term_interactions(
            result_df = result_df, pin_name_path = pin_name_path, ...
        )
    }
}

#' Visualize Interactions of Genes Involved in the Given Enriched Terms
#'
#' @param result_df Data frame of enrichment results. Must-have columns
#' are: 'Term_Description', 'Up_regulated' and 'Down_regulated'
#' @inheritParams return_pin_path
#' @param show_legend Boolean to indicate whether to display the legend (\code{TRUE})
#' or not (\code{FALSE}) (default: \code{TRUE})
#'
#' @return list of ggplot objects (named by Term ID) visualizing the interactions of genes involved
#' in the given enriched terms (annotated in the \code{result_df}) in the PIN used
#' for enrichment analysis (specified by \code{pin_name_path}).
#'
#' @details The following steps are performed for the visualization of interactions
#' of genes involved for each enriched term: \enumerate{
#'   \item shortest paths between all affected genes are determined (via \code{\link[igraph]{igraph}})
#'   \item the nodes of all shortest paths are merged
#'   \item the PIN is subset using the merged nodes (genes)
#'   \item using the PIN subset, the graph showing the interactions is generated
#'   \item the final graph is visualized using \code{\link[igraph]{igraph}}, colored by change
#'   status (if provided)
#' }
#'
#' @export
#'
#' @seealso See \code{\link{visualize_terms}} for the wrapper function
#' for creating enriched term diagrams. See \code{\link{run_pathfindR}} for the
#' wrapper function of the pathfindR enrichment workflow.
#'
#' @examples
#' \dontrun{
#' result_df <- example_pathfindR_output[1:2, ]
#' gg_list <- visualize_term_interactions(result_df, pin_name_path = 'IntAct')
#' }
visualize_term_interactions <- function(result_df, pin_name_path, show_legend = TRUE) {
    ############ Initial Steps
    ## fix naming issue
    result_df$Term_Description <- gsub("\\/", "-", result_df$Term_Description)

    ## load PIN
    pin_path <- return_pin_path(pin_name_path)
    pin <- utils::read.delim(file = pin_path, header = FALSE)
    pin$V2 <- NULL

    pin[, 1] <- base::toupper(pin[, 1])
    pin[, 2] <- base::toupper(pin[, 2])

    ## pin graph
    pin_g <- igraph::graph_from_data_frame(pin, directed = FALSE)

    ############ Visualize interactions by enriched term
    pw_vis_list <- list()
    for (i in base::seq_len(nrow(result_df))) {
        current_row <- result_df[i, ]

        up_genes <- base::toupper(unlist(strsplit(current_row$Up_regulated, ", ")))
        down_genes <- base::toupper(unlist(strsplit(current_row$Down_regulated, ", ")))
        current_genes <- c(down_genes, up_genes)

        ## Add active snw genes if listed
        if (!is.null(result_df$non_Signif_Snw_Genes)) {
            snw_genes <- unlist(strsplit(current_row$non_Signif_Snw_Genes, ", "))
            snw_genes <- base::toupper(snw_genes)
            current_genes <- c(current_genes, snw_genes)
        } else {
            snw_genes <- NULL
        }

        if (length(current_genes) < 2) {
            message(paste0("< 2 genes, skipping visualization of ", current_row$Term_Description))
        } else {
            cat("Visualizing:", paste0("(", i, ")"), current_row$Term_Description,
                paste(rep(" ", 200), collapse = ""), "\r")

            ## Find genes without direct interaction
            cond1 <- pin$V1 %in% current_genes
            cond2 <- pin$V3 %in% current_genes
            direct_interactions <- pin[cond1 & cond2, ]
            tmp <- c(direct_interactions$V1, direct_interactions$V3)
            missing_genes <- current_genes[!current_genes %in% tmp]

            ## Find shortest path between genes without direct interaction and
            ## other current_genes
            s_path_genes <- c()
            for (gene in missing_genes) {
                tmp <- suppressWarnings(igraph::shortest_paths(pin_g, from = which(names(igraph::V(pin_g)) ==
                    gene), to = which(names(igraph::V(pin_g)) %in% current_genes), output = "vpath"))
                tmp <- unique(unlist(lapply(tmp$vpath, function(x) names(x))))
                s_path_genes <- unique(c(s_path_genes, tmp))
            }

            final_genes <- unique(c(current_genes, s_path_genes))
            cond1 <- pin$V1 %in% final_genes
            cond2 <- pin$V3 %in% final_genes
            final_interactions <- pin[cond1 & cond2, ]
            g <- igraph::graph_from_data_frame(final_interactions, directed = FALSE)

            cond1 <- names(igraph::V(g)) %in% up_genes
            cond2 <- names(igraph::V(g)) %in% down_genes
            cond3 <- names(igraph::V(g)) %in% snw_genes
            node_type <- as.factor(ifelse(cond1, "up", ifelse(cond2, "down", ifelse(cond3,
                "interactor", "none"))))
            igraph::V(g)$type <- node_type

            node_colors <- c("green", "red", "blue", "gray")
            names(node_colors) <- c("up", "down", "interactor", "none")
            node_colors <- node_colors[levels(node_type)]

            type_descriptions <- c(
                none = "other",
                up = "up-regulated gene",
                down = "down-regulated gene",
                interactor = "interacting non-input gene"
            )
            type_descriptions <- type_descriptions[levels(node_type)]

            p <- ggraph::ggraph(g, layout = "stress")
            p <- p + ggraph::geom_edge_link(alpha = 0.8, colour = "darkgrey", linewidth = 0.5)
            p <- p + ggraph::geom_node_point(ggplot2::aes(color = .data$type), size = 5)
            p <- p + ggplot2::theme_void()
            p <- p + suppressWarnings(ggraph::geom_node_text(ggplot2::aes(label = .data$name),
                nudge_y = 0.2, repel = TRUE, max.overlaps = 20))
            p <- p + ggplot2::scale_color_manual(values = node_colors, name = NULL, labels = type_descriptions)
            p <- p + ggplot2::ggtitle(
                paste(current_row$Term_Description, "\n Involved Gene Interactions in", pin_name_path)
            )

            pw_vis_list[[current_row$ID]] <- p
        }
    }
    return(pw_vis_list)
}

#' Visualize Human KEGG Pathways
#'
#' @param kegg_pw_ids KEGG ids of pathways to be colored and visualized
#' @param input_processed input data processed via \code{\link{input_processing}}
#' @inheritParams color_kegg_pathway
#'
#' @return Creates colored visualizations of the enriched human KEGG pathways
#' and returns them as a list of ggplot objects, named by Term ID.
#'
#' @seealso See \code{\link{visualize_terms}} for the wrapper function for
#' creating enriched term diagrams. See \code{\link{run_pathfindR}} for the
#' wrapper function of the pathfindR enrichment workflow.
#'
#' @export
#'
#' @examples
#' \dontrun{
#' input_processed <- data.frame(
#'   GENE = c("PKLR", "GPI", "CREB1", "INS"),
#'   CHANGE = c(1.5, -2, 3, 5)
#' )
#' gg_list <- visualize_KEGG_diagram(c("hsa00010", "hsa04911"), input_processed)
#' }
visualize_KEGG_diagram <- function(
    kegg_pw_ids, input_processed, scale_vals = TRUE, node_cols = NULL, legend.position = "top"
) {
    message("This function utilises one functionality of `ggkegg`. For more options, visit https://github.com/noriakis/ggkegg")
    ############ Arg checks

    ### kegg_pw_ids
    if (!is.atomic(kegg_pw_ids)) {
        stop("`kegg_pw_ids` should be a vector of KEGG IDs")
    }
    if (!all(grepl("^[a-z]{3}[0-9]{5}$", kegg_pw_ids))) {
        stop("`kegg_pw_ids` should be a vector of valid hsa KEGG IDs")
    }

    ### input_processed
    if (!is.data.frame(input_processed)) {
        stop("`input_processed` should be a data frame")
    }

    nec_cols <- c("GENE", "CHANGE")
    if (!all(nec_cols %in% colnames(input_processed))) {
        stop("`input_processed` should contain the following columns: ", paste(dQuote(nec_cols),
            collapse = ", "))
    }

    if (!requireNamespace("org.Hs.eg.db", quietly = TRUE)) {
        message(
            "Package 'org.Hs.eg.db' is not installed; returning empty list.\n",
            "Install it with:\n",
            " if (!requireNamespace('BiocManager', quietly = TRUE)) install.packages('BiocManager')\n",
            " BiocManager::install('org.Hs.eg.db')"
        )
        return(list())
    }

    ############ Create change vector
    ### Convert gene symbols into NCBI gene IDs
    tmp <- AnnotationDbi::mget(input_processed$GENE, AnnotationDbi::revmap(org.Hs.eg.db::org.Hs.egSYMBOL),
        ifnotfound = NA)
    input_processed$EG_ID <- vapply(tmp, function(x) as.character(x[1]), "EGID")
    input_processed <- input_processed[!is.na(input_processed$EG_ID), ]

    ### A rule of thumb for the
    ### 'kegg' ID is entrezgene ID for eukaryote species
    input_processed$KEGG_ID <- paste0("hsa:", input_processed$EG_ID)

    ############ Fetch all pathway genes, create vector of change values and
    ############ Generate colored pathway diagrams for each pathway
    change_vec <- input_processed$CHANGE
    names(change_vec) <- input_processed$KEGG_ID

    cat("Generating pathway diagrams of", length(kegg_pw_ids), "KEGG pathways\n\n")
    pw_vis_list <- lapply(
        kegg_pw_ids, color_kegg_pathway,
        change_vec = change_vec, scale_vals = scale_vals,
        node_cols = node_cols, legend.position = legend.position
    )
    names(pw_vis_list) <- kegg_pw_ids

    return(pw_vis_list)
}

#' Color hsa KEGG pathway
#'
#' @param pw_id hsa KEGG pathway id (e.g. hsa05012)
#' @param change_vec vector of change values, names should be hsa KEGG gene ids
#' @param scale_vals should change values be scaled? (default = \code{TRUE})
#' @param node_cols low, middle and high color values for coloring the pathway nodes
#' (default = \code{NULL}). If \code{node_cols=NULL}, the low, middle and high colors
#' are set as 'red', 'gray' and 'green'. If all change values are 1e6 (in case no
#' changes are supplied, this dummy value is assigned by
#' \code{\link{input_processing}}), only one color ('#F38F18' if NULL) is used.
#' @inheritParams ggplot2::theme
#'
#' @return a ggplot object containing the colored KEGG pathway diagram visualization
#'
#' @examples
#' \dontrun{
#' pw_id <- 'hsa00010'
#' change_vec <- c(-2, 4, 6)
#' names(change_vec) <- c('hsa:2821', 'hsa:226', 'hsa:229')
#' result <- pathfindR:::color_kegg_pathway(pw_id, change_vec)
#' }
color_kegg_pathway <- function(pw_id, change_vec, scale_vals = TRUE, node_cols = NULL,
    legend.position = "top") {
    ############ Arg checks
    if (!is.logical(scale_vals)) {
        stop("`scale_vals` should be logical")
    }

    ## check node_cols
    if (!is.null(node_cols)) {
        if (!is.atomic(node_cols)) {
            stop("`node_cols` should be a vector of colors")
        }

        if (!all(change_vec == 1e+06) & length(node_cols) != 3) {
            stop("the length of `node_cols` should be 3")
        }
        if (!all(vapply(node_cols, isColor, TRUE))) {
            stop("`node_cols` should be a vector of valid colors")
        }
    }

    ############ Set node palette if node_cols not supplied, use default
    ############ color(s)
    if (!is.null(node_cols)) {
        if (all(change_vec == 1e+06)) {
            message("all `change_vec` values are 1e6, using the first color in `node_cols`")
            low_col <- mid_col <- high_col <- node_cols[1]
        } else {
            low_col <- node_cols[1]
            mid_col <- node_cols[2]
            high_col <- node_cols[3]
        }
    } else if (all(change_vec == 1e+06)) {
        ## NO CHANGES SUPPLIED
        low_col <- mid_col <- high_col <- "#F38F18"
    } else {
        low_col <- "red"
        mid_col <- "gray"
        high_col <- "green"
    }

    ############ Assign the input change values to any corresponding pathway gene nodes
    # create pathway graph object and collect all pathway genes
    ggkegg_temp_dir <- file.path(tempdir(check = TRUE), "ggkegg")
    dir.create(ggkegg_temp_dir, showWarnings = FALSE)
    g <- tryCatch({
        ggkegg::pathway(pid = pw_id, directory = ggkegg_temp_dir)
    }, error = function(e) {
        message(paste("Cannot parse KEGG pathway for:", pw_id))
        message("Here's the original error message:")
        message(e$message)
        return(NULL)
    }, warning = function(w) {
        message(paste("Cannot parse KEGG pathway for:", pw_id))
        message("Here's the original warning message:")
        message(w$message)
        return(NULL)
    })

    if (is.null(g)) {
        return(NULL)
    }

    gene_nodes <- names(igraph::V(g))[igraph::V(g)$type == "gene"]

    ## aggregate change values over all pathway gene nodes
    pw_vis_changes <- c()
    for (i in seq_len(length(gene_nodes))) {
        node_name <- gene_nodes[i]
        node <- unlist(strsplit(node_name, " "))
        cond <- names(change_vec) %in% node
        if (any(cond)) {
            node_val <- mean(change_vec[cond])
            names(node_val) <- node_name
            pw_vis_changes <- c(pw_vis_changes, node_val)
        }
    }

    ## if no input genes present in chosen pathway
    if (all(is.na(pw_vis_changes))) {
        return(NULL)
    }

    ############ Determine node colors
    ### scaling
    if (!all(pw_vis_changes == 1e+06) & scale_vals) {
        common_limit <- max(abs(pw_vis_changes))
        pw_vis_changes <- ifelse(pw_vis_changes < 0, -abs(pw_vis_changes) / common_limit,
            pw_vis_changes / common_limit)
    }

    ############ Create pathway diagram visualisation
    igraph::V(g)$change_value <- NA
    igraph::V(g)$change_value[match(names(pw_vis_changes), names(igraph::V(g)))] <- pw_vis_changes

    p <- ggraph::ggraph(g, layout = "manual", x = igraph::V(g)$x, y = igraph::V(g)$y)
    p <- p + ggkegg::geom_node_rect(ggplot2::aes(filter = !is.na(.data$change_value), fill = .data$change_value))
    p <- p + ggkegg::overlay_raw_map(pw_id)
    p <- p + ggplot2::scale_fill_gradient2(low = low_col, mid = mid_col, high = high_col)
    p <- p + ggplot2::theme_void()
    p <- p + ggplot2::theme(
        legend.title = ggplot2::element_blank(),
        legend.position = legend.position
    )

    return(p)
}

#' Create Bubble Chart of Enrichment Results
#'
#' This function is used to create a ggplot2 bubble chart displaying the
#' enrichment results.
#'
#' @param result_df a data frame that must contain the following columns: \describe{
#'   \item{Term_Description}{Description of the enriched term}
#'   \item{Fold_Enrichment}{Fold enrichment value for the enriched term}
#'   \item{lowest_p}{the lowest adjusted-p value of the given term over all iterations}
#'   \item{Up_regulated}{the up-regulated genes in the input involved in the given term's gene set, comma-separated}
#'   \item{Down_regulated}{the down-regulated genes in the input involved in the given term's gene set, comma-separated}
#'   \item{Cluster(OPTIONAL)}{the cluster to which the enriched term is assigned}
#' }
#' @param top_terms number of top terms (according to the 'lowest_p' column)
#' to plot (default = 10). If \code{plot_by_cluster = TRUE}, selects the top
#' \code{top_terms} terms per each cluster. Set \code{top_terms = NULL} to plot
#' for all terms. If the total number of terms is less than \code{top_terms},
#' all terms are plotted.
#' @param plot_by_cluster boolean value indicating whether or not to group the
#' enriched terms by cluster (works if \code{result_df} contains a
#' 'Cluster' column).
#' @param num_bubbles number of sizes displayed in the legend \code{# genes}
#' (Default = 4)
#' @param even_breaks whether or not to set even breaks for the number of sizes
#' displayed in the legend \code{# genes}. If \code{TRUE} (default), sets
#' equal breaks and the number of displayed bubbles may be different than the
#' number set by \code{num_bubbles}. If the exact number set by
#' \code{num_bubbles} is required, set this argument to \code{FALSE}
#'
#' @return a \code{\link[ggplot2]{ggplot2}} object containing the bubble chart.
#' The x-axis corresponds to fold enrichment values while the y-axis indicates
#' the enriched terms. Size of the bubble indicates the number of significant
#' genes in the given enriched term. Color indicates the -log10(lowest-p) value.
#' The closer the color is to red, the more significant the enrichment is.
#' Optionally, if 'Cluster' is a column of \code{result_df} and
#' \code{plot_by_cluster == TRUE}, the enriched terms are grouped by clusters.
#'
#' @import ggplot2
#' @export
#'
#' @examples
#' g <- enrichment_chart(example_pathfindR_output)
enrichment_chart <- function(result_df, top_terms = 10, plot_by_cluster = FALSE,
    num_bubbles = 4, even_breaks = TRUE) {
    message("Plotting the enrichment bubble chart")
    necessary <- c("Term_Description", "Fold_Enrichment", "lowest_p", "Up_regulated",
        "Down_regulated")

    if (!all(necessary %in% colnames(result_df))) {
        stop("The input data frame must have the columns:\n", paste(necessary, collapse = ", "))
    }

    if (!is.logical(plot_by_cluster)) {
        stop("`plot_by_cluster` must be either TRUE or FALSE")
    }

    if (!is.numeric(top_terms) & !is.null(top_terms)) {
        stop("`top_terms` must be either numeric or NULL")
    }

    if (!is.null(top_terms)) {
        if (top_terms < 1) {
            stop("`top_terms` must be >= 1")
        }
    }

    # sort by lowest adj.p
    result_df <- result_df[order(result_df$lowest_p), ]

    ## Filter for top_terms
    if (!is.null(top_terms)) {
        if (plot_by_cluster & "Cluster" %in% colnames(result_df)) {
            keep_ids <- tapply(result_df$ID, result_df$Cluster, function(x) {
                x[seq_len(min(top_terms, length(x)))]
            })
            keep_ids <- unlist(keep_ids)
            result_df <- result_df[result_df$ID %in% keep_ids, ]
        } else if (top_terms < nrow(result_df)) {
            result_df <- result_df[seq_len(top_terms), ]
        }
    }

    num_genes <- vapply(result_df$Up_regulated, function(x) length(unlist(strsplit(x, ", "))), 1)
    num_genes <- num_genes + vapply(result_df$Down_regulated, function(x) length(unlist(strsplit(x, ", "))), 1)

    result_df$Term_Description <- factor(result_df$Term_Description, levels = rev(unique(result_df$Term_Description)))

    log_p <- -log10(result_df$lowest_p)

    g <- ggplot2::ggplot(result_df, ggplot2::aes(.data$Fold_Enrichment, .data$Term_Description))
    g <- g + ggplot2::geom_point(ggplot2::aes(color = log_p, size = num_genes), na.rm = TRUE)
    g <- g + ggplot2::theme_bw()
    g <- g + ggplot2::theme(axis.text.x
        = ggplot2::element_text(size = 10), axis.text.y = ggplot2::element_text(size = 10),
        plot.title = ggplot2::element_blank())
    g <- g + ggplot2::xlab("Fold Enrichment")
    g <- g + ggplot2::theme(axis.title.y = ggplot2::element_blank())
    g <- g + ggplot2::labs(size = "# genes", color = expression(-log[10](p)))

    ## breaks for # genes
    if (max(num_genes) < num_bubbles) {
        g <- g + ggplot2::scale_size_continuous(breaks = seq(0, max(num_genes)))
    } else {
        if (even_breaks) {
            brks <- base::seq(0, max(num_genes), round(max(num_genes)/(num_bubbles + 1)))
        } else {
            brks <- base::round(base::seq(0, max(num_genes), length.out = num_bubbles + 1))
        }
        g <- g + ggplot2::scale_size_continuous(breaks = brks)
    }

    g <- g + ggplot2::scale_color_gradient(low = "#f5efef", high = "red")

    if (plot_by_cluster & "Cluster" %in% colnames(result_df)) {
        g <- g + ggplot2::facet_grid(result_df$Cluster ~ ., scales = "free_y", space = "free",
            drop = TRUE)
    } else if (plot_by_cluster) {
        message("For plotting by cluster, there must be a column named `Cluster` in the input data frame!")
    }

    return(g)
}

#' Create Term-Gene Graph
#'
#' @param result_df A dataframe of pathfindR results that must contain the following columns: \describe{
#'   \item{Term_Description}{Description of the enriched term (necessary if \code{use_description = TRUE})}
#'   \item{ID}{ID of the enriched term (necessary if \code{use_description = FALSE})}
#'   \item{lowest_p}{the lowest adjusted-p value of the given term over all iterations}
#'   \item{Up_regulated}{the up-regulated genes in the input involved in the given term's gene set, comma-separated}
#'   \item{Down_regulated}{the down-regulated genes in the input involved in the given term's gene set, comma-separated}
#' }
#' @param num_terms Number of top enriched terms to use while creating the graph. Set to \code{NULL} to use
#' all enriched terms (default = 10, i.e. top 10 terms)
#' @param layout The type of layout to create (see \code{\link[ggraph]{ggraph}} for details.
#' Default = 'stress')
#' @param use_description Boolean argument to indicate whether term descriptions
#' (in the 'Term_Description' column) should be used. (default = \code{FALSE})
#' @param node_size Argument to indicate whether to use number of significant genes ('num_genes')
#' or the -log10(lowest p value) ('p_val') for adjusting the node sizes (default = 'num_genes')
#' @param node_colors vector of 3 colors to be used for coloring nodes (colors for term nodes, up, and down, respectively)
#'
#' @return a \code{\link[ggraph]{ggraph}} object containing the term-gene graph.
#' Each node corresponds to an enriched term (beige), an up-regulated gene (green)
#' or a down-regulated gene (red). An edge between a term and a gene indicates
#' that the given term involves the gene. Size of a term node is proportional
#' to either the number of genes (if \code{node_size = 'num_genes'}) or
#' the -log10(lowest p value) (if \code{node_size = 'p_val'}).
#'
#' @details This function (adapted from the Gene-Concept network visualization
#' by the R package \code{enrichplot}) can be utilized to visualize which input
#' genes are involved in the enriched terms as a graph. The term-gene graph
#' shows the links between genes and biological terms and allows for the
#' investigation of multiple terms to which significant genes are related. The
#' graph also enables determination of the overlap between the enriched terms
#' by identifying shared and distinct significant term-related genes.
#'
#' @import ggraph
#' @export
#'
#' @examples
#' p <- term_gene_graph(example_pathfindR_output)
#' p <- term_gene_graph(example_pathfindR_output, num_terms = 5)
#' p <- term_gene_graph(example_pathfindR_output, node_size = 'p_val')
term_gene_graph <- function(result_df, num_terms = 10, layout = "stress", use_description = FALSE,
    node_size = "num_genes", node_colors = c("#E5D7BF", "green", "red")) {
    ############ Argument Checks
    ### Check num_terms is NULL or numeric
    if (!is.numeric(num_terms) & !is.null(num_terms)) {
        stop("`num_terms` must either be numeric or NULL!")
    }

    ### Check use_description is boolean
    if (!is.logical(use_description)) {
        stop("`use_description` must either be TRUE or FALSE!")
    }

    ### Set column for term labels
    ID_column <- ifelse(use_description, "Term_Description", "ID")

    ### Check node_size
    val_node_size <- c("num_genes", "p_val")
    if (!node_size %in% val_node_size) {
        stop("`node_size` should be one of ", paste(dQuote(val_node_size), collapse = ", "))
    }

    if (!is.data.frame(result_df)) {
        stop("`result_df` should be a data frame")
    }

    ### Check necessary columns
    necessary_cols <- c(ID_column, "lowest_p", "Up_regulated", "Down_regulated")

    if (!all(necessary_cols %in% colnames(result_df))) {
        stop(paste(c("All of", paste(necessary_cols, collapse = ", "), "must be present in `result_df`!"),
            collapse = " "))
    }

    if (!is.atomic(node_colors)) {
        stop("`node_colors` should be a vector of colors")
    }

    if (!all(vapply(node_colors, isColor, TRUE))) {
        stop("`node_colors` should be a vector of valid colors")
    }

    if (length(node_colors) != 3) {
        stop("`node_colors` must contain exactly 3 colors")
    }

    ############ Initial steps
    ### set num_terms to NULL if number of enriched terms is smaller than num_terms
    if (!is.null(num_terms)) {
        if (nrow(result_df) < num_terms) {
            num_terms <- NULL
        }
    }

    ### Order and filter for top N terms
    result_df <- result_df[order(result_df$lowest_p, decreasing = FALSE), ]
    if (!is.null(num_terms)) {
        result_df <- result_df[1:num_terms, ]
    }

    ### Prep data frame for graph
    graph_df <- data.frame()
    for (i in base::seq_len(nrow(result_df))) {
        up_genes <- unlist(strsplit(result_df$Up_regulated[i], ", "))
        down_genes <- unlist(strsplit(result_df$Down_regulated[i], ", "))
        for (gene in c(up_genes, down_genes)) {
            graph_df <- rbind(graph_df, data.frame(Term = result_df[i, ID_column], Gene = gene))
        }
    }

    up_genes <- lapply(result_df$Up_regulated, function(x) unlist(strsplit(x, ", ")))
    up_genes <- unlist(up_genes)

    ############ Create graph object and plot
    ### create igraph object
    g <- igraph::graph_from_data_frame(graph_df, directed = FALSE)
    cond_term <- names(igraph::V(g)) %in% result_df[, ID_column]
    cond_up_gene <- names(igraph::V(g)) %in% up_genes
    node_type <- ifelse(cond_term, "term", ifelse(cond_up_gene, "up", "down"))
    node_type <- factor(node_type, levels = c("term", "up", "down"))
    node_type <- droplevels(node_type)
    igraph::V(g)$type <- node_type

    type_descriptions <- c(term = "enriched term", up = "up-regulated gene", down = "down-regulated gene")
    type_descriptions <- type_descriptions[levels(node_type)]

    names(node_colors) <- c("term", "up", "down")
    node_colors <- node_colors[levels(node_type)]

    # Adjust node sizes
    if (node_size == "num_genes") {
        sizes <- igraph::degree(g)
        sizes <- ifelse(igraph::V(g)$type == "term", sizes, 2)
        size_label <- "# genes"
    } else {
        idx <- match(names(igraph::V(g)), result_df[, ID_column])
        sizes <- -log10(result_df$lowest_p[idx])
        sizes[is.na(sizes)] <- 2
        size_label <- "-log10(p)"
    }
    igraph::V(g)$size <- sizes
    igraph::V(g)$label.cex <- 0.5
    igraph::V(g)$frame.color <- "gray"

    ### Create graph
    p <- ggraph::ggraph(g, layout = layout)
    p <- p + ggraph::geom_edge_link(alpha = 0.8, colour = "darkgrey")
    p <- p + ggraph::geom_node_point(ggplot2::aes(color = .data$type, size = .data$size))
    p <- p + ggplot2::scale_size(range = c(5, 10), breaks = round(seq(round(min(igraph::V(g)$size)),
        round(max(igraph::V(g)$size)), length.out = 4)), name = size_label)
    p <- p + ggplot2::theme_void()
    p <- p +
        suppressWarnings(ggraph::geom_node_text(ggplot2::aes(label = .data$name), nudge_y = 0.2,
            repel = TRUE, max.overlaps = 20))
    p <- p + ggplot2::scale_color_manual(values = node_colors, name = NULL, labels = type_descriptions)
    if (is.null(num_terms)) {
        p <- p + ggplot2::ggtitle("Term-Gene Graph")
    } else {
        p <- p + ggplot2::ggtitle("Term-Gene Graph", subtitle = paste(c("Top", num_terms, "terms"),
            collapse = " "))
    }

    p <- p + ggplot2::theme(plot.title = ggplot2::element_text(hjust = 0.5), plot.subtitle = ggplot2::element_text(hjust = 0.5))

    return(p)
}

#' Create Terms by Genes Heatmap
#'
#' @param result_df A dataframe of pathfindR results that must contain the following columns: \describe{
#'   \item{Term_Description}{Description of the enriched term (necessary if \code{use_description = TRUE})}
#'   \item{ID}{ID of the enriched term (necessary if \code{use_description = FALSE})}
#'   \item{lowest_p}{the lowest adjusted-p value of the given term over all iterations}
#'   \item{Up_regulated}{the up-regulated genes in the input involved in the given term's gene set, comma-separated}
#'   \item{Down_regulated}{the down-regulated genes in the input involved in the given term's gene set, comma-separated}
#' }
#' @param genes_df the input data that was used with \code{\link{run_pathfindR}}.
#' It must be a data frame with 3 columns: \enumerate{
#'   \item Gene Symbol (Gene Symbol)
#'   \item Change value, e.g. log(fold change) (optional)
#'   \item p value, e.g. adjusted p value associated with differential expression
#' } The change values in this data frame are used to color the affected genes
#' @param num_terms Number of top enriched terms to use while creating the plot.
#' Set to \code{NULL} to use
#' all enriched terms (default = 10)
#' @inheritParams term_gene_graph
#' @inheritParams plot_scores
#' @param legend_title legend title (default = 'change')
#' @param sort_terms_by_p boolean to indicate whether to sort terms by 'lowest_p'
#' (\code{TRUE}) or by number of genes (\code{FALSE}) (default = \code{FALSE})
#' @param ... additional arguments for \code{\link{input_processing}} (used if
#' \code{genes_df} is provided)
#'
#' @return a ggplot2 object of a heatmap where rows are enriched terms and
#' columns are involved input genes. If \code{genes_df} is provided, colors of
#' the tiles indicate the change values.
#' @export
#'
#' @examples
#' term_gene_heatmap(example_pathfindR_output, num_terms = 3)
term_gene_heatmap <- function(result_df, genes_df, num_terms = 10, use_description = FALSE,
    low = "red", mid = "black", high = "green", legend_title = "change", sort_terms_by_p = FALSE,
    ...) {
    ############ Arg checks
    if (!is.logical(use_description)) {
        stop("`use_description` must either be TRUE or FALSE!")
    }

    ### Set column for term labels
    ID_column <- ifelse(use_description, "Term_Description", "ID")

    if (!is.data.frame(result_df)) {
        stop("`result_df` should be a data frame")
    }

    nec_cols <- c(ID_column, "lowest_p", "Up_regulated", "Down_regulated")
    if (!all(nec_cols %in% colnames(result_df))) {
        stop("`result_df` should have the following columns: ", paste(dQuote(nec_cols),
            collapse = ", "))
    }

    if (!missing(genes_df)) {
        suppressMessages(input_testing(genes_df))
    }

    if (!is.null(num_terms)) {
        if (!is.numeric(num_terms)) {
            stop("`num_terms` should be numeric or NULL")
        }

        if (num_terms < 1) {
            stop("`num_terms` should be > 0 or NULL")
        }
    }

    if (!isColor(low)) {
        stop("`low` should be a valid color")
    }
    if (!isColor(mid)) {
        stop("`mid` should be a valid color")
    }
    if (!isColor(high)) {
        stop("`high` should be a valid color")
    }

    ############ Init prep steps
    result_df <- result_df[order(result_df$lowest_p), ]
    ### select the top num_terms terms
    if
(!is.null(num_terms)) { if (num_terms < nrow(result_df)) { result_df <- result_df[1:num_terms, ] } } ### process input genes (if provided) if (!missing(genes_df)) { genes_df <- input_processing(input = genes_df, ...) } ### parse genes from enrichment results parse_genes <- function(vec, idx) { return(unname(unlist(strsplit(vec[idx], ", ")))) } up_genes <- apply(result_df, 1, parse_genes, which(colnames(result_df) == "Up_regulated")) down_genes <- apply(result_df, 1, parse_genes, which(colnames(result_df) == "Down_regulated")) if (length(down_genes) == 0) { down_genes <- rep(NA, nrow(result_df)) } if (length(up_genes) == 0) { up_genes <- rep(NA, nrow(result_df)) } names(up_genes) <- names(down_genes) <- result_df[, ID_column] ############ Create terms-by-genes matrix and order all_genes <- unique(c(unlist(up_genes), unlist(down_genes))) all_genes <- all_genes[!is.na(all_genes)] all_terms <- result_df[, ID_column] term_genes_mat <- matrix(0, nrow = nrow(result_df), ncol = length(all_genes), dimnames = list(all_terms, all_genes)) for (i in seq_len(nrow(term_genes_mat))) { current_term <- rownames(term_genes_mat)[i] current_genes <- c(up_genes[[current_term]], down_genes[[current_term]]) current_genes <- current_genes[!is.na(current_genes)] term_genes_mat[i, match(current_genes, colnames(term_genes_mat))] <- 1 } ### Order by column term_genes_mat <- term_genes_mat[, order(colSums(term_genes_mat), decreasing = TRUE)] ### Order by row ordering_func <- function(row) { n <- length(row) pow <- 2^-(0:(n - 1)) return(row %*% pow) } term_genes_mat <- term_genes_mat[order(apply(term_genes_mat, 1, ordering_func), decreasing = TRUE), ] ### Transform the matrix var_names <- list() var_names[["Enriched_Term"]] <- factor(rownames(term_genes_mat), levels = rev(rownames(term_genes_mat))) var_names[["Symbol"]] <- factor(colnames(term_genes_mat), levels = colnames(term_genes_mat)) term_genes_df <- expand.grid(var_names, KEEP.OUT.ATTRS = FALSE, stringsAsFactors = FALSE) value <- 
as.vector(term_genes_mat) value <- data.frame(value) term_genes_df <- cbind(term_genes_df, value) term_genes_df$value[term_genes_df$value == 0] <- NA bg_df <- expand.grid(Enriched_Term = all_terms, Symbol = all_genes) if (sort_terms_by_p) { bg_df$Enriched_Term <- factor(bg_df$Enriched_Term, levels = rev(result_df[, ID_column])) } else { bg_df$Enriched_Term <- factor(bg_df$Enriched_Term, levels = rev(rownames(term_genes_mat))) } bg_df$Symbol <- factor(bg_df$Symbol, levels = colnames(term_genes_mat)) if (!missing(genes_df)) { for (i in seq_len(nrow(term_genes_df))) { if (!is.na(term_genes_df$value[i])) { if (all(genes_df$CHANGE == 1e+06)) { term_genes_df$value[i] <- ifelse(term_genes_df$Symbol[i] %in% up_genes[[as.character(term_genes_df$Enriched_Term[i])]], 1, -1) } else { term_genes_df$value[i] <- genes_df$CHANGE[genes_df$GENE == term_genes_df$Symbol[i]] } } } if (all(genes_df$CHANGE == 1e+06)) { term_genes_df$value <- factor(term_genes_df$value, levels = c(-1, 1)) } } else { for (i in seq_len(nrow(term_genes_df))) { if (!is.na(term_genes_df$value[i])) { term_genes_df$value[i] <- ifelse(term_genes_df$Symbol[i] %in% unlist(up_genes), "up", "down") } } } g <- ggplot2::ggplot(bg_df, ggplot2::aes(x = .data$Symbol, y = .data$Enriched_Term)) g <- g + ggplot2::geom_tile(fill = "white", color = "white") g <- g + ggplot2::theme(axis.ticks.y = ggplot2::element_blank(), axis.text.x = ggplot2::element_text(angle = 90, hjust = 1), axis.text.y = ggplot2::element_text(colour = "#000000"), axis.title.x = ggplot2::element_blank(), axis.title.y = ggplot2::element_blank(), panel.grid.major.x = ggplot2::element_blank(), panel.grid.major.y = ggplot2::element_blank(), panel.grid.minor.x = ggplot2::element_blank(), panel.grid.minor.y = ggplot2::element_blank(), panel.background = ggplot2::element_rect(fill = "#ffffff")) g <- g + ggplot2::geom_tile(data = term_genes_df, ggplot2::aes(fill = .data$value), color = "gray60") if (!missing(genes_df)) { if (all(genes_df$CHANGE == 1e+06)) { g <- 
g + ggplot2::scale_fill_manual(values = c(low, high), na.value = "white", name = legend_title) } else { g <- g + ggplot2::scale_fill_gradient2(low = low, mid = mid, high = high, na.value = "white", name = legend_title) } } else { g <- g + ggplot2::scale_fill_manual(values = c(low, high), na.value = "white", name = legend_title) } return(g) } #' Create UpSet Plot of Enriched Terms #' #' @inheritParams term_gene_heatmap #' @param method the option for producing the plot. Options include 'heatmap', #' 'boxplot' and 'barplot'. (default = 'heatmap') #' #' @return UpSet plots are plots of the intersections of sets as a matrix. This #' function creates a ggplot object of an UpSet plot where the x-axis is the #' UpSet plot of intersections of enriched terms. By default (i.e. #' \code{method = 'heatmap'}) the main plot is a heatmap of genes at the #' corresponding intersections, colored by up/down regulation (if #' \code{genes_df} is provided, colored by change values). If #' \code{method = 'barplot'}, the main plot is bar plots of the number of genes #' at the corresponding intersections. Finally, if \code{method = 'boxplot'} and #' if \code{genes_df} is provided, then the main plot displays the boxplots of #' change values of the genes at the corresponding intersections. #' @export #' #' @examples #' UpSet_plot(example_pathfindR_output) UpSet_plot <- function(result_df, genes_df, num_terms = 10, method = "heatmap", use_description = FALSE, low = "red", mid = "black", high = "green", ...) 
{
    ############ Arg checks
    if (!is.logical(use_description)) {
        stop("`use_description` must either be TRUE or FALSE!")
    }

    ### Set column for term labels
    ID_column <- ifelse(use_description, "Term_Description", "ID")

    if (!is.data.frame(result_df)) {
        stop("`result_df` should be a data frame")
    }

    nec_cols <- c(ID_column, "lowest_p", "Up_regulated", "Down_regulated")
    if (!all(nec_cols %in% colnames(result_df))) {
        stop("`result_df` should have the following columns: ", paste(dQuote(nec_cols), collapse = ", "))
    }

    if (!missing(genes_df)) {
        suppressMessages(input_testing(genes_df))
    }

    if (!is.null(num_terms)) {
        if (!is.numeric(num_terms)) {
            stop("`num_terms` should be numeric or NULL")
        }

        if (num_terms < 1) {
            stop("`num_terms` should be > 0 or NULL")
        }
    }

    valid_opts <- c("heatmap", "boxplot", "barplot")
    if (!method %in% valid_opts) {
        stop("`method` should be one of ", paste(dQuote(valid_opts), collapse = ", "))
    }

    if (!isColor(low)) {
        stop("`low` should be a valid color")
    }

    if (!isColor(mid)) {
        stop("`mid` should be a valid color")
    }

    if (!isColor(high)) {
        stop("`high` should be a valid color")
    }

    ########## Init prep steps
    result_df <- result_df[order(result_df$lowest_p), ]
    ### keep only the top num_terms terms
    if (!is.null(num_terms)) {
        if (num_terms < nrow(result_df)) {
            result_df <- result_df[1:num_terms, ]
        }
    }

    ### process input genes (if provided)
    if (!missing(genes_df)) {
        genes_df <- input_processing(input = genes_df, ...)
} ### parse genes from enrichment results parse_genes <- function(vec, idx) { return(unname(unlist(strsplit(vec[idx], ", ")))) } up_genes <- apply(result_df, 1, parse_genes, which(colnames(result_df) == "Up_regulated")) down_genes <- apply(result_df, 1, parse_genes, which(colnames(result_df) == "Down_regulated")) if (length(down_genes) == 0) { down_genes <- rep(NA, nrow(result_df)) } if (length(up_genes) == 0) { up_genes <- rep(NA, nrow(result_df)) } names(up_genes) <- names(down_genes) <- result_df[, ID_column] ############ Create terms-by-genes matrix and order all_genes <- unique(c(unlist(up_genes), unlist(down_genes))) all_terms <- result_df[, ID_column] term_genes_mat <- matrix(0, nrow = nrow(result_df), ncol = length(all_genes), dimnames = list(all_terms, all_genes)) for (i in seq_len(nrow(term_genes_mat))) { current_term <- rownames(term_genes_mat)[i] current_genes <- c(up_genes[[current_term]], down_genes[[current_term]]) term_genes_mat[i, match(current_genes, colnames(term_genes_mat))] <- 1 } ### Transform the matrix var_names <- list() var_names[["Enriched_Term"]] <- factor(rownames(term_genes_mat), levels = rownames(term_genes_mat)) var_names[["Symbol"]] <- factor(colnames(term_genes_mat), levels = colnames(term_genes_mat)) term_genes_df <- expand.grid(var_names, KEEP.OUT.ATTRS = FALSE, stringsAsFactors = FALSE) value <- as.vector(term_genes_mat) value <- data.frame(value) term_genes_df <- cbind(term_genes_df, value) term_genes_df <- term_genes_df[term_genes_df$value != 0, ] ### Order according to frequencies term_genes_df$Enriched_Term <- factor(term_genes_df$Enriched_Term, levels = names(sort(table(term_genes_df$Enriched_Term), decreasing = TRUE))) term_genes_df$Symbol <- factor(term_genes_df$Symbol, levels = rev(names(sort(table(term_genes_df$Symbol))))) terms_lists <- rev(split(term_genes_df$Enriched_Term, term_genes_df$Symbol)) plot_df <- data.frame( Gene = names(terms_lists), Up_Down = ifelse(names(terms_lists) %in% unlist(up_genes), "up", "down"), 
        stringsAsFactors = FALSE
    )
    plot_df$Term <- terms_lists

    bg_df <- expand.grid(Gene = unique(plot_df$Gene), Term = unique(plot_df$Term))

    if (method == "heatmap") {
        g <- ggplot2::ggplot(bg_df, ggplot2::aes(x = .data$Term, y = .data$Gene))
        g <- g + ggplot2::geom_tile(fill = "white", color = "gray60")

        if (missing(genes_df)) {
            g <- g + ggplot2::geom_tile(data = plot_df, ggplot2::aes(x = .data$Term, y = .data$Gene, fill = .data$Up_Down), color = "gray60")
            g <- g + ggplot2::scale_fill_manual(values = c(low, high))
        } else {
            plot_df$Value <- genes_df$CHANGE[match(names(plot_df$Term), genes_df$GENE)]
            g <- g + ggplot2::geom_tile(data = plot_df, ggplot2::aes(x = .data$Term, y = .data$Gene, fill = .data$Value), color = "gray60")
            g <- g + ggplot2::scale_fill_gradient2(low = low, mid = mid, high = high)
        }
        g <- g + ggplot2::theme_minimal()
        g <- g + ggplot2::theme(axis.title = ggplot2::element_blank(), panel.grid.major = ggplot2::element_blank(), panel.grid.minor = ggplot2::element_blank(), legend.title = ggplot2::element_blank())
    } else if (method == "boxplot") {
        if (missing(genes_df)) {
            stop("For `method = boxplot`, you must provide `genes_df`")
        }

        plot_df$Value <- genes_df$CHANGE[match(names(plot_df$Term), genes_df$GENE)]
        g <- ggplot2::ggplot(plot_df, ggplot2::aes(x = .data$Term, y = .data$Value))
        g <- g + ggplot2::geom_boxplot()
        g <- g + ggplot2::geom_jitter(width = 0.1)
    } else {
        g <- ggplot2::ggplot(plot_df, ggplot2::aes(x = .data$Term))
        g <- g + ggplot2::geom_bar()
    }

    g <- g + ggupset::scale_x_upset(order_by = ifelse(missing(genes_df), "freq", "degree"), reverse = !missing(genes_df))

    return(g)
}

================================================
FILE: R/zzz.R
================================================
.onAttach <- function(libname, pkgname) {
    packageStartupMessage("############################################################################## Welcome to pathfindR! Please cite the article below if you use pathfindR in published research: Ulgen E, Ozisik O, Sezerman OU. 2019.
pathfindR: An R Package for Comprehensive Identification of Enriched Pathways in Omics Data Through Active Subnetworks. Front. Genet. doi:10.3389/fgene.2019.00858 ##############################################################################") check_java_version() } #' Obtain Java Version #' #' @return character vector containing the output of 'java -version' #' #' @details this function was adapted from the CRAN package \code{sparklyr} fetch_java_version <- function() { java_home <- Sys.getenv("JAVA_HOME", unset = NA) if (!is.na(java_home)) { java <- file.path(java_home, "bin", "java") if (identical(.Platform$OS.type, "windows")) { java <- paste0(java, ".exe") } if (!file.exists(java)) { java <- "" } } else { java <- Sys.which("java") } if (java == "") { stop("Java version not detected. Please download and install Java from ", dQuote("https://www.java.com/en/")) } version <- system2(java, "-version", stderr = TRUE, stdout = TRUE) if (length(version) < 1) { stop("Java version not detected. Please download and install Java from ", dQuote("https://www.java.com/en/")) } return(version) } #' Check Java Version #' #' @param version character vector containing the output of 'java -version'. 
If
#' NULL, result of \code{\link{fetch_java_version}} is used (default = NULL)
#'
#' @return only parses and checks whether the java version is >= 1.8
#'
#' @details this function was adapted from the CRAN package \code{sparklyr}
check_java_version <- function(version = NULL) {
    if (is.null(version)) {
        version <- fetch_java_version()
    }

    # find line with version info
    versionLine <- version[grepl("version", version)]
    if (length(versionLine) != 1) {
        stop("Java version detected but couldn't parse version from ", paste(version, collapse = " - "))
    }

    # transform to usable R version string
    vers_string <- strsplit(versionLine, "\\s+", perl = TRUE)[[1]]
    vers_string <- vers_string[grepl("[0-9]+\\.[0-9]+\\.[0-9]+", vers_string, perl = TRUE)]
    if (length(vers_string) != 1) {
        vers_string <- strsplit(versionLine, "\\s+", perl = TRUE)[[1]]
        vers_string <- vers_string[grepl("[0-9]+", vers_string, perl = TRUE)]
        vers_string <- vers_string[!grepl("-", vers_string)]
        if (length(vers_string) != 1) {
            stop("Java version detected but couldn't parse version from: ", versionLine)
        }
    }

    parsedVersion <- gsub("^\"|\"$", "", vers_string)
    parsedVersion <- gsub("_", ".", parsedVersion)
    parsedVersion <- gsub("[^0-9.]+", "", parsedVersion)

    # ensure Java 1.8 (8) or higher
    if (utils::compareVersion(parsedVersion, "1.8") < 0) {
        stop("Java version ", parsedVersion, " detected but Java >=8 is required.
Please download and install Java from ", dQuote("https://www.java.com/en/")) } } ================================================ FILE: README.Rmd ================================================ --- output: github_document --- ```{r, include=FALSE} knitr::opts_chunk$set(collapse=TRUE, comment="#>", fig.path="inst/extdata/", out.width="100%") suppressPackageStartupMessages(library(pathfindR)) ``` # pathfindR: An R Package for Enrichment Analysis Utilizing Active Subnetworks [![R-CMD-check](https://github.com/egeulgen/pathfindR/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/egeulgen/pathfindR/actions/workflows/R-CMD-check.yaml) [![codecov](https://codecov.io/gh/egeulgen/pathfindR/graph/badge.svg?token=8m9aPaXzNr)](https://codecov.io/gh/egeulgen/pathfindR) [![CRAN version](https://www.r-pkg.org/badges/version/pathfindR)](https://cran.r-project.org/package=pathfindR) [![CRAN total downloads](https://cranlogs.r-pkg.org/badges/grand-total/pathfindR)](https://cran.r-project.org/package=pathfindR) [![install with bioconda](https://img.shields.io/badge/install%20with-bioconda-brightgreen.svg?style=flat)](http://bioconda.github.io/recipes/r-pathfindr/README.html) [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/license/mit) # Overview `pathfindR` is an R package for enrichment analysis via active subnetworks. The package also offers functionality to cluster the enriched terms and identify representative terms in each cluster, score the enriched terms per sample, and visualize analysis results. As of the latest version, the package also allows comparison of two pathfindR results. The functionality suite of pathfindR is described in detail in _Ulgen E, Ozisik O, Sezerman OU. 2019. pathfindR: An R Package for Comprehensive Identification of Enriched Pathways in Omics Data Through Active Subnetworks. Front. Genet. 
[https://doi.org/10.3389/fgene.2019.00858](https://doi.org/10.3389/fgene.2019.00858)_ For detailed documentation, see [pathfindR's website](https://egeulgen.github.io/pathfindR/). # Installation - You can install the released version of pathfindR from CRAN via: ```{r installation1, eval=FALSE} install.packages("pathfindR") ``` - Since version 2.1.0, you may also install `pathfindR` via conda: ```{bash conda, eval=FALSE} conda install -c bioconda r-pathfindr ``` - Via [pak](https://pak.r-lib.org/) (this might be preferable given `pathfindR`'s Bioconductor dependencies): ```{r installation2, eval=FALSE} install.packages("pak") # if you have not installed "pak" pak::pkg_install("pathfindR") ``` - And the development version from GitHub via `devtools`: ```{r installation3, eval=FALSE} install.packages("devtools") # if you have not installed "devtools" devtools::install_github("egeulgen/pathfindR") ``` > **IMPORTANT NOTE** > For the active subnetwork search component to work, the user must have [Java (>= 8.0)](https://www.java.com/en/download/) installed, and the path/to/java must be in the PATH environment variable. 
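The Java requirement above can be checked from within R before running an analysis. The snippet below is a minimal sketch rather than the package's own check: `java_version_ok()` is a hypothetical helper (not part of pathfindR), and it assumes the version appears as a token such as `"1.8.0_292"` or `"17.0.2"` in the output of `java -version`.

```r
# Hypothetical helper (not part of pathfindR): TRUE if a line of
# 'java -version' output reports Java 8 (i.e., 1.8) or newer.
java_version_ok <- function(version_line) {
    # first numeric token, e.g. "1.8.0_292" or "17.0.2"
    token <- regmatches(version_line, regexpr("[0-9]+(\\.[0-9]+)*(_[0-9]+)?", version_line))
    if (length(token) != 1) {
        stop("could not parse a version from: ", version_line)
    }
    parsed <- gsub("_", ".", token)
    utils::compareVersion(parsed, "1.8") >= 0
}

# locate java on the PATH and, if found, inspect its reported version
java <- Sys.which("java")
if (nzchar(java)) {
    out <- system2(java, "-version", stdout = TRUE, stderr = TRUE)
    version_line <- out[grepl("version", out)]
    if (length(version_line) >= 1) {
        print(java_version_ok(version_line[1]))
    }
} else {
    message("java was not found on the PATH")
}
```

If the check reports that Java is missing or too old, installing a recent Java and making sure it is on the PATH (or setting `JAVA_HOME`) should resolve it.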
We also have docker images available on [Docker Hub](https://hub.docker.com/repository/docker/egeulgen/pathfindr) and [GitHub](https://github.com/egeulgen/pathfindR/packages): ```{bash docker, eval=FALSE} # pull image for the latest release docker pull egeulgen/pathfindr:latest # pull image for a specific version (e.g., 1.4.1) docker pull egeulgen/pathfindr:1.4.1 ``` Online app on superbio.ai: [https://app.superbio.ai/apps/111/](https://app.superbio.ai/apps/111/) # Enrichment Analysis with pathfindR ![pathfindR Enrichment Workflow](https://github.com/egeulgen/pathfindR/blob/master/vignettes/pathfindr.png?raw=true "pathfindr Enrichment Workflow") This workflow takes in a data frame consisting of "gene symbols", "change values" (optional), and "associated p-values": ```{r example_input, echo=FALSE} tmp <- example_pathfindR_input[1:4, ] tmp$logFC <- round(tmp$logFC,2) tmp$adj.P.Val <- format(tmp$adj.P.Val, digits=2) colnames(tmp) <- c("Gene_symbol", "logFC", "FDR_p") knitr::kable(tmp, align=c("l", "c", "c")) ``` After input testing, any gene symbol that is not in the chosen protein-protein interaction network (PIN) is converted to an alias symbol if there is an alias that is found in the PIN. After mapping the input genes with the associated p-values onto the PIN, active subnetwork search is performed. The resulting active subnetworks are then filtered based on their scores and the number of significant genes they contain. > An active subnetwork can be defined as a group of interconnected genes in a protein-protein interaction network (PIN) that predominantly consists of significantly altered genes. In other words, active subnetworks define distinct disease-associated sets of interacting genes, whether discovered through the original analysis or discovered because of being in interaction with a significant gene. 
These filtered lists of active subnetworks are then used for enrichment analyses, i.e., using the genes in each of the active subnetworks, the significantly enriched terms (pathways/gene sets) are identified. Enriched terms with adjusted p-values larger than the given threshold are discarded, and the lowest adjusted p-value (among all active subnetworks) for each term is kept. This process of `active subnetwork search + enrichment analyses` is repeated for a selected number of iterations, performed in parallel. Over all iterations, the lowest and the highest adjusted p-values, and the number of occurrences among all iterations are reported for each significantly enriched term. This workflow can be run using the function `run_pathfindR()`: ```{r basic_usage, eval=FALSE} library(pathfindR) output_df <- run_pathfindR(input_df) ``` This wrapper function performs the active-subnetwork-oriented enrichment analysis, and returns a data frame of enriched terms: ![pathfindR Enrichment Chart](https://github.com/egeulgen/pathfindR/blob/master/vignettes/enrichment_chart.png?raw=true "Enrichment Chart") Some useful arguments are: ```{r useful_args, eval=FALSE} # set an output directory for saving active subnetworks and creating an HTML report # (default=NULL, sets a temporary directory) output_df <- run_pathfindR(input_df, output_dir="/top/secret/results") # change the gene sets used for analysis (default="KEGG") output_df <- run_pathfindR(input_df, gene_sets="GO-MF") # change the PIN for active subnetwork search (default=Biogrid) output_df <- run_pathfindR(input_df, pin_name_path="IntAct") # or use an external PIN of your choice output_df <- run_pathfindR(input_df, pin_name_path="/path/to/my/PIN.sif") # change the number of iterations (default=10) output_df <- run_pathfindR(input_df, iterations=25) # report the non-significant active subnetwork genes (for later analyses) output_df <- run_pathfindR(input_df, list_active_snw_genes=TRUE) ``` The available PINs are "Biogrid", 
"STRING", "GeneMania", "IntAct", "KEGG" and "mmu_STRING". The available gene sets are "KEGG", "Reactome", "BioCarta", "GO-All", "GO-BP", "GO-CC", "GO-MF", and "mmu_KEGG". You also use a custom PIN (see `?return_pin_path`) or a custom gene set (see `?fetch_gene_set`) > As of the latest development version, pathfindR offers utility functions for obtaining organism-specific PIN data (for now, only BioGRID PINs) and organism-specific gene sets (KEGG and Reactome) data via `get_pin_file()` and `get_gene_sets_list()`, respectively. # Clustering of the Enriched Terms ![Enriched Terms Clustering Workflow](https://github.com/egeulgen/pathfindR/blob/master/vignettes/term_clustering.png?raw=true "Enriched Terms Clustering Workflow") The wrapper function for this workflow is `cluster_enriched_terms()`. This workflow first calculates the pairwise kappa statistics between the enriched terms. The function then performs hierarchical clustering (by default), automatically determines the optimal number of clusters by maximizing the average silhouette width and returns a data frame with cluster assignments. ```{r clustering_h, eval=FALSE} # default settings clustered_df <- cluster_enriched_terms(output_df) # display the heatmap of hierarchical clustering clustered_df <- cluster_enriched_terms(output_df, plot_hmap=TRUE) # display the dendrogram and automatically-determined clusters clustered_df <- cluster_enriched_terms(output_df, plot_dend=TRUE) # change agglomeration method (default="average") for hierarchical clustering clustered_df <- cluster_enriched_terms(output_df, clu_method="centroid") ``` Alternatively, the `fuzzy` clustering method (as described in Huang DW, Sherman BT, Tan Q, et al. The DAVID Gene Functional Classification Tool: a novel biological module-centric algorithm to functionally analyze large gene lists. Genome Biol. 2007;8(9):R183.) 
can be used:

```{r clustering_f, eval=FALSE}
clustered_df_fuzzy <- cluster_enriched_terms(output_df, method="fuzzy")
```

# Visualization of Enrichment Results

## Enriched Term Diagrams

For H. sapiens KEGG enrichment analyses, `visualize_terms()` can be used to generate KEGG pathway diagrams as `ggraph` (inherits from `ggplot`) objects (using [`ggkegg`](https://github.com/noriakis/ggkegg)):

```{r KEGG_vis, eval=FALSE}
input_processed <- input_processing(example_pathfindR_input)
gg_list <- visualize_terms(
  result_df = example_pathfindR_output,
  input_processed = input_processed,
  is_KEGG_result = TRUE
)  # this function returns a list of ggraph objects (named by Term ID)

# save one of the plots as PDF image
ggplot2::ggsave(
  "hsa04911_diagram.pdf",   # path to output, format is determined by extension
  gg_list$hsa04911,         # what to plot
  width = 5,                # adjust width
  height = 5                # adjust height
)
```

![KEGG Pathway Diagram](https://github.com/egeulgen/pathfindR/blob/master/vignettes/example_kegg_pathway_diagram.png?raw=true)

Alternatively (i.e., for other types of (non-KEGG) enrichment analyses), an interaction diagram per enriched term can be generated again via `visualize_terms()`.
These diagrams are also returned as `ggraph` objects:

```{r nonKEGG_viss, eval=FALSE}
input_processed <- input_processing(example_pathfindR_input)
gg_list <- visualize_terms(
  result_df = example_pathfindR_output,
  input_processed = input_processed,
  is_KEGG_result = FALSE,
  pin_name_path = "Biogrid"
)  # this function returns a list of ggraph objects (named by Term ID)

# save one of the plots as PDF image
ggplot2::ggsave(
  "diabetic_cardiomyopathy_interactions.pdf",   # path to output, format is determined by extension
  gg_list$hsa04911,                             # what to plot
  width = 10,                                   # adjust width
  height = 6                                    # adjust height
)
```

![Interaction Diagram](https://github.com/egeulgen/pathfindR/blob/master/vignettes/example_interaction_vis.png?raw=true)

## Term-Gene Heatmap

The function `term_gene_heatmap()` can visualize the heatmap of enriched terms by the involved input genes. This heatmap allows visual identification of the input genes involved in the enriched terms, and the common or distinct genes between different terms. If the input data frame (same as in `run_pathfindR()`) is supplied, the tile colors indicate the change values.

![Term-Gene Heatmap](https://github.com/egeulgen/pathfindR/blob/master/vignettes/hmap.png?raw=true "Term-Gene Heatmap")

## Term-Gene Graph

The function `term_gene_graph()` (adapted from the Gene-Concept network visualization by the R package `enrichplot`) can be utilized to visualize which significant genes are involved in the enriched terms. The function creates the term-gene graph, displaying the connections between genes and biological terms (enriched pathways or gene sets). This allows for the investigation of multiple terms to which significant genes are related. The graph also enables the determination of the degree of overlap between the enriched terms by identifying shared and/or distinct significant genes.
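Both the heatmap and the graph rest on the same information: which input genes belong to which enriched term. The base-R sketch below builds such a terms-by-genes membership matrix from comma-separated gene strings. The `res` data frame and its gene names are made up for illustration; only the column names `ID`, `Up_regulated` and `Down_regulated` follow the pathfindR result format, and the code is a simplified illustration rather than the package's implementation.

```r
# Toy result table (made-up terms and genes, for illustration only)
res <- data.frame(
    ID = c("term_A", "term_B"),
    Up_regulated = c("GENE1, GENE2", "GENE2"),
    Down_regulated = c("GENE3", ""),
    stringsAsFactors = FALSE
)

# split the comma-separated gene strings into one character vector per term
split_genes <- function(x) strsplit(x, ", ", fixed = TRUE)
genes_per_term <- Map(c, split_genes(res$Up_regulated), split_genes(res$Down_regulated))
names(genes_per_term) <- res$ID

# terms-by-genes 0/1 membership matrix (rows = terms, columns = genes)
all_genes <- unique(unlist(genes_per_term))
term_genes_mat <- t(vapply(
    genes_per_term,
    function(g) as.numeric(all_genes %in% g),
    numeric(length(all_genes))
))
colnames(term_genes_mat) <- all_genes

# genes shared by more than one term
shared_genes <- names(which(colSums(term_genes_mat) > 1))
shared_genes
```

Column sums of this matrix give the number of terms each gene is involved in, which is the kind of overlap the two plots make visible.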
![Term-Gene Graph](https://github.com/egeulgen/pathfindR/blob/master/vignettes/term_gene.png?raw=true "Term-Gene Graph") ## UpSet Plot UpSet plots are plots of the intersections of sets as a matrix. This function creates a ggplot object of an UpSet plot where the x-axis is the UpSet plot of intersections of enriched terms. By default (i.e., `method="heatmap"`), the main plot is a heatmap of genes at the corresponding intersections, colored by up-/down-regulation (if `genes_df` is provided, colored by change values). If `method="barplot"`, the main plot is bar plots of the number of genes at the corresponding intersections. Finally, if `method="boxplot"` and `genes_df` is provided, then the main plot displays the boxplots of the genes' change values at the corresponding intersections. ![UpSet plot](https://github.com/egeulgen/pathfindR/blob/master/vignettes/upset.png?raw=true "UpSet Plot") # Per Sample Enriched Term Scores ![Agglomerated Scores for all Enriched Terms per Sample](https://github.com/egeulgen/pathfindR/blob/master/vignettes/score_hmap.png?raw=true "Scoring per Sample") The function `score_terms()` can be used to calculate the agglomerated z score of each enriched term per sample. This allows the user to examine the scores individually and infer how a term is overall altered (activated or repressed) in a given sample or a group of samples. # Comparison of 2 pathfindR Results The function `combine_pathfindR_results()` allows combining two pathfindR analysis results for investigating common and distinct terms between the groups. Below is an example for comparing two different results using rheumatoid arthritis-related data. ```{r compare2res, eval=FALSE} combined_df <- combine_pathfindR_results( result_A=an_output_df, result_B=another_output_df ) ``` By default, `combine_pathfindR_results()` plots the term-gene graph for the common terms in the combined results. 
The function `combined_results_graph()` can be used to create this graph (using only selected terms etc.) later on. ```{r compare_graph, eval=FALSE} combined_results_graph(combined_df, selected_terms=c("hsa04144", "hsa04141", "hsa04140")) ``` ![Combined Results Graph](https://github.com/egeulgen/pathfindR/blob/master/vignettes/combined_graph.png?raw=true "Combined Results Graph") ================================================ FILE: README.md ================================================ # pathfindR: An R Package for Enrichment Analysis Utilizing Active Subnetworks [![R-CMD-check](https://github.com/egeulgen/pathfindR/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/egeulgen/pathfindR/actions/workflows/R-CMD-check.yaml) [![codecov](https://codecov.io/gh/egeulgen/pathfindR/graph/badge.svg?token=8m9aPaXzNr)](https://codecov.io/gh/egeulgen/pathfindR) [![CRAN version](https://www.r-pkg.org/badges/version/pathfindR)](https://cran.r-project.org/package=pathfindR) [![CRAN total downloads](https://cranlogs.r-pkg.org/badges/grand-total/pathfindR)](https://cran.r-project.org/package=pathfindR) [![install with bioconda](https://img.shields.io/badge/install%20with-bioconda-brightgreen.svg?style=flat)](http://bioconda.github.io/recipes/r-pathfindr/README.html) [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/license/mit) # Overview `pathfindR` is an R package for enrichment analysis via active subnetworks. The package also offers functionality to cluster the enriched terms and identify representative terms in each cluster, score the enriched terms per sample, and visualize analysis results. As of the latest version, the package also allows comparison of two pathfindR results. The functionality suite of pathfindR is described in detail in *Ulgen E, Ozisik O, Sezerman OU. 2019. pathfindR: An R Package for Comprehensive Identification of Enriched Pathways in Omics Data Through Active Subnetworks. Front. Genet. 
<https://doi.org/10.3389/fgene.2019.00858>*

For detailed documentation, see [pathfindR’s website](https://egeulgen.github.io/pathfindR/).

# Installation

- You can install the released version of pathfindR from CRAN via:

``` r
install.packages("pathfindR")
```

- Since version 2.1.0, you may also install `pathfindR` via conda:

``` bash
conda install -c bioconda r-pathfindr
```

- Via [pak](https://pak.r-lib.org/) (this might be preferable given `pathfindR`’s Bioconductor dependencies):

``` r
install.packages("pak") # if you have not installed "pak"
pak::pkg_install("pathfindR")
```

- And the development version from GitHub via `devtools`:

``` r
install.packages("devtools") # if you have not installed "devtools"
devtools::install_github("egeulgen/pathfindR")
```

> **IMPORTANT NOTE** For the active subnetwork search component to work,
> the user must have [Java (\>= 8.0)](https://www.java.com/en/download/)
> installed, and the path/to/java must be in the PATH environment
> variable.

We also have docker images available on [Docker Hub](https://hub.docker.com/repository/docker/egeulgen/pathfindr) and [GitHub](https://github.com/egeulgen/pathfindR/packages):

``` bash
# pull image for the latest release
docker pull egeulgen/pathfindr:latest

# pull image for a specific version (e.g., 1.4.1)
docker pull egeulgen/pathfindr:1.4.1
```

Online app on superbio.ai: <https://app.superbio.ai/apps/111/>

# Enrichment Analysis with pathfindR
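![pathfindR Enrichment Workflow](https://github.com/egeulgen/pathfindR/blob/master/vignettes/pathfindr.png?raw=true "pathfindr Enrichment Workflow")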
This workflow takes in a data frame consisting of “gene symbols”, “change values” (optional), and “associated p-values”: | Gene_symbol | logFC | FDR_p | |:------------|:-----:|:-------:| | FAM110A | -0.69 | 3.4e-06 | | RNASE2 | 1.35 | 1.0e-05 | | S100A8 | 1.54 | 3.5e-05 | | S100A9 | 1.03 | 2.3e-04 | After input testing, any gene symbol that is not in the chosen protein-protein interaction network (PIN) is converted to an alias symbol if there is an alias that is found in the PIN. After mapping the input genes with the associated p-values onto the PIN, active subnetwork search is performed. The resulting active subnetworks are then filtered based on their scores and the number of significant genes they contain. > An active subnetwork can be defined as a group of interconnected genes > in a protein-protein interaction network (PIN) that predominantly > consists of significantly altered genes. In other words, active > subnetworks define distinct disease-associated sets of interacting > genes, whether discovered through the original analysis or discovered > because of being in interaction with a significant gene. These filtered lists of active subnetworks are then used for enrichment analyses, i.e., using the genes in each of the active subnetworks, the significantly enriched terms (pathways/gene sets) are identified. Enriched terms with adjusted p-values larger than the given threshold are discarded, and the lowest adjusted p-value (among all active subnetworks) for each term is kept. This process of `active subnetwork search + enrichment analyses` is repeated for a selected number of iterations, performed in parallel. Over all iterations, the lowest and the highest adjusted p-values, and the number of occurrences among all iterations are reported for each significantly enriched term. 
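The summary over iterations described above can be pictured with a small base-R sketch. The `iter_res` table is invented for illustration (it is not pathfindR output), and the column names of `summary_df` are only chosen to resemble those of the result data frame; the package's own aggregation may differ in detail.

```r
# Made-up per-iteration results: one row per (iteration, enriched term)
iter_res <- data.frame(
    iteration = c(1, 1, 2, 3, 3),
    ID = c("term_A", "term_B", "term_A", "term_A", "term_B"),
    adj_p = c(0.010, 0.040, 0.002, 0.030, 0.020),
    stringsAsFactors = FALSE
)

# one row per term: lowest/highest adjusted p-value and number of occurrences
summary_df <- data.frame(ID = sort(unique(iter_res$ID)), stringsAsFactors = FALSE)
summary_df$occurrence <- as.numeric(table(iter_res$ID)[summary_df$ID])
summary_df$lowest_p <- as.numeric(tapply(iter_res$adj_p, iter_res$ID, min)[summary_df$ID])
summary_df$highest_p <- as.numeric(tapply(iter_res$adj_p, iter_res$ID, max)[summary_df$ID])

# most significant terms first
summary_df <- summary_df[order(summary_df$lowest_p), ]
summary_df
```

A term that appears in most iterations with a consistently low adjusted p-value is a more stable finding than one that shows up once.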
This workflow can be run using the function `run_pathfindR()`:

``` r
library(pathfindR)
output_df <- run_pathfindR(input_df)
```

This wrapper function performs the active-subnetwork-oriented enrichment analysis, and returns a data frame of enriched terms:
pathfindR Enrichment Chart
Some useful arguments are:

``` r
# set an output directory for saving active subnetworks and creating an HTML report
# (default=NULL, sets a temporary directory)
output_df <- run_pathfindR(input_df, output_dir="/top/secret/results")

# change the gene sets used for analysis (default="KEGG")
output_df <- run_pathfindR(input_df, gene_sets="GO-MF")

# change the PIN for active subnetwork search (default=Biogrid)
output_df <- run_pathfindR(input_df, pin_name_path="IntAct")
# or use an external PIN of your choice
output_df <- run_pathfindR(input_df, pin_name_path="/path/to/my/PIN.sif")

# change the number of iterations (default=10)
output_df <- run_pathfindR(input_df, iterations=25)

# report the non-significant active subnetwork genes (for later analyses)
output_df <- run_pathfindR(input_df, list_active_snw_genes=TRUE)
```

The available PINs are “Biogrid”, “STRING”, “GeneMania”, “IntAct”, “KEGG” and “mmu_STRING”. The available gene sets are “KEGG”, “Reactome”, “BioCarta”, “GO-All”, “GO-BP”, “GO-CC”, “GO-MF”, and “mmu_KEGG”. You can also use a custom PIN (see `?return_pin_path`) or a custom gene set (see `?fetch_gene_set`).

> As of the latest development version, pathfindR offers utility
> functions for obtaining organism-specific PIN data (for now, only
> BioGRID PINs) and organism-specific gene sets (KEGG and Reactome) data
> via `get_pin_file()` and `get_gene_sets_list()`, respectively.

# Clustering of the Enriched Terms

![Enriched Terms Clustering Workflow](https://github.com/egeulgen/pathfindR/blob/master/vignettes/term_clustering.png?raw=true "Enriched Terms Clustering Workflow")

The wrapper function for this workflow is `cluster_enriched_terms()`. This workflow first calculates the pairwise kappa statistics between the enriched terms. The function then performs hierarchical clustering (by default), automatically determines the optimal number of clusters by maximizing the average silhouette width and returns a data frame with cluster assignments.
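The pairwise kappa statistic mentioned above can be sketched in base R as Cohen's kappa on binary gene membership. The function name `kappa_between_terms()`, the toy gene sets, and the choice of gene universe are assumptions made for illustration; the package's own computation is in `create_kappa_matrix()` and may differ in detail.

``` r
# Hypothetical helper: Cohen's kappa between two terms, where each gene in
# the universe is labelled as "in the term" or "not in the term"
kappa_between_terms <- function(genes_a, genes_b, all_genes) {
  in_a <- all_genes %in% genes_a
  in_b <- all_genes %in% genes_b
  n <- length(all_genes)

  # observed agreement: genes in both terms or in neither
  observed <- sum(in_a == in_b) / n
  # agreement expected by chance, from the marginal frequencies
  expected <- (sum(in_a) * sum(in_b) + sum(!in_a) * sum(!in_b)) / n^2

  if (expected == 1) {
    return(NA_real_) # degenerate case: kappa is undefined
  }
  (observed - expected) / (1 - expected)
}

# Toy universe of six genes and two overlapping terms (assumed data)
universe <- paste0("g", 1:6)
term_a <- c("g1", "g2", "g3")
term_b <- c("g2", "g3", "g4")

k_ab <- kappa_between_terms(term_a, term_b, universe)
k_aa <- kappa_between_terms(term_a, term_a, universe)
```

Terms with highly overlapping gene content get a kappa close to 1, so a matrix of such values can serve as the similarity input for the clustering step.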
``` r
# default settings
clustered_df <- cluster_enriched_terms(output_df)

# display the heatmap of hierarchical clustering
clustered_df <- cluster_enriched_terms(output_df, plot_hmap=TRUE)

# display the dendrogram and automatically-determined clusters
clustered_df <- cluster_enriched_terms(output_df, plot_dend=TRUE)

# change agglomeration method (default="average") for hierarchical clustering
clustered_df <- cluster_enriched_terms(output_df, clu_method="centroid")
```

Alternatively, the `fuzzy` clustering method (as described in Huang DW, Sherman BT, Tan Q, et al. The DAVID Gene Functional Classification Tool: a novel biological module-centric algorithm to functionally analyze large gene lists. Genome Biol. 2007;8(9):R183.) can be used:

``` r
clustered_df_fuzzy <- cluster_enriched_terms(output_df, method="fuzzy")
```

# Visualization of Enrichment Results

## Enriched Term Diagrams

For H.sapiens KEGG enrichment analyses, `visualize_terms()` can be used to generate KEGG pathway diagrams as `ggraph` (inherits from `ggplot`) objects (using [`ggkegg`](https://github.com/noriakis/ggkegg)):

``` r
input_processed <- input_processing(example_pathfindR_input)
gg_list <- visualize_terms(
  result_df = example_pathfindR_output,
  input_processed = input_processed,
  is_KEGG_result = TRUE
)
# this function returns a list of ggraph objects (named by Term ID)

# save one of the plots as PDF image
ggplot2::ggsave(
  "hsa04911_diagram.pdf", # path to output, format is determined by extension
  gg_list$hsa04911, # what to plot
  width = 5, # adjust width
  height = 5 # adjust height
)
```
KEGG Pathway Diagram
Alternatively (i.e., for other types of (non-KEGG) enrichment analyses), an interaction diagram per enriched term can be generated again via `visualize_terms()`. These diagrams are also returned as `ggraph` objects:

``` r
input_processed <- input_processing(example_pathfindR_input)
gg_list <- visualize_terms(
  result_df = example_pathfindR_output,
  input_processed = input_processed,
  is_KEGG_result = FALSE,
  pin_name_path = "Biogrid"
)
# this function returns a list of ggraph objects (named by Term ID)

# save one of the plots as PDF image
ggplot2::ggsave(
  "diabetic_cardiomyopathy_interactions.pdf", # path to output, format is determined by extension
  gg_list$hsa04911, # what to plot
  width = 10, # adjust width
  height = 6 # adjust height
)
```
Interaction Diagram
## Term-Gene Heatmap

The function `term_gene_heatmap()` can visualize the heatmap of enriched terms by the involved input genes. This heatmap allows visual identification of the input genes involved in the enriched terms, and the common or distinct genes between different terms. If the input data frame (same as in `run_pathfindR()`) is supplied, the tile colors indicate the change values.
Term-Gene Heatmap
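The term-by-gene membership that such a heatmap displays can be sketched in base R. The term names and gene lists below are hypothetical, and this is not how `term_gene_heatmap()` is implemented; it only illustrates how common and distinct genes between terms can be read off a binary matrix.

``` r
# Hypothetical enriched terms and the input genes involved in each
term_genes <- list(
  term_A = c("S100A8", "S100A9", "RNASE2"),
  term_B = c("S100A8", "FAM110A"),
  term_C = c("RNASE2", "S100A8")
)

all_genes <- sort(unique(unlist(term_genes)))

# Rows are terms, columns are genes; TRUE marks a gene involved in the term
membership <- t(sapply(term_genes, function(genes) all_genes %in% genes))
colnames(membership) <- all_genes

# Genes shared by every term, and genes found in exactly one term
common_genes <- all_genes[colSums(membership) == length(term_genes)]
distinct_genes <- all_genes[colSums(membership) == 1]
```

In the actual heatmap each `TRUE` cell corresponds to a drawn tile, and supplying change values replaces the binary fill with a color scale.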
## Term-Gene Graph

The function `term_gene_graph()` (adapted from the Gene-Concept network visualization by the R package `enrichplot`) can be utilized to visualize which significant genes are involved in the enriched terms. The function creates the term-gene graph, displaying the connections between genes and biological terms (enriched pathways or gene sets). This allows for the investigation of multiple terms to which significant genes are related. The graph also enables the determination of the degree of overlap between the enriched terms by identifying shared and/or distinct significant genes.
Term-Gene Graph
## UpSet Plot

UpSet plots are plots of the intersections of sets as a matrix. The function `UpSet_plot()` creates a ggplot object of an UpSet plot where the x-axis is the UpSet plot of intersections of enriched terms. By default (i.e., `method="heatmap"`), the main plot is a heatmap of genes at the corresponding intersections, colored by up-/down-regulation (if `genes_df` is provided, colored by change values). If `method="barplot"`, the main plot is bar plots of the number of genes at the corresponding intersections. Finally, if `method="boxplot"` and `genes_df` is provided, then the main plot displays the boxplots of the genes’ change values at the corresponding intersections.
UpSet plot
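The intersections that an UpSet plot lays out on its x-axis can be sketched in base R. The gene sets below are hypothetical, and the sketch assumes the usual UpSet convention that each gene is counted only at the exact combination of terms containing it; it does not reproduce the package's plotting code.

``` r
# Hypothetical enriched terms and their genes
term_genes <- list(
  term_A = c("g1", "g2", "g3", "g6"),
  term_B = c("g2", "g3", "g4", "g6"),
  term_C = c("g3", "g5")
)

all_genes <- sort(unique(unlist(term_genes)))

# For each gene, record the exact combination of terms that contain it
combo_of_gene <- vapply(all_genes, function(g) {
  containing <- names(term_genes)[
    vapply(term_genes, function(x) g %in% x, logical(1))
  ]
  paste(containing, collapse = "&")
}, character(1))

# Number of genes at each intersection (what a bar plot variant would show)
intersection_sizes <- table(combo_of_gene)
```

With change values available per gene, the same grouping could feed a box plot per intersection instead of a count.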
# Per Sample Enriched Term Scores
Agglomerated Scores for all Enriched Terms per Sample
The function `score_terms()` can be used to calculate the agglomerated z score of each enriched term per sample. This allows the user to examine the scores individually and infer how a term is overall altered (activated or repressed) in a given sample or a group of samples.

# Comparison of 2 pathfindR Results

The function `combine_pathfindR_results()` allows combining two pathfindR analysis results for investigating common and distinct terms between the groups. Below is an example for comparing two different results using rheumatoid arthritis-related data.

``` r
combined_df <- combine_pathfindR_results(
  result_A=an_output_df,
  result_B=another_output_df
)
```

By default, `combine_pathfindR_results()` plots the term-gene graph for the common terms in the combined results. The function `combined_results_graph()` can be used to create this graph (using only selected terms etc.) later on.

``` r
combined_results_graph(combined_df, selected_terms=c("hsa04144", "hsa04141", "hsa04140"))
```
Combined Results Graph
================================================
FILE: _pkgdown.yml
================================================
destination: docs

template:
  params:
    bootswatch: united
    docsearch:
      api_key: 7f13d388d59fea08d4add29291ea2896
      index_name: pathfindr

url: https://egeulgen.github.io/pathfindR/

news:
  - one_page: false

reference:
  - title: "pathfindR"
    desc: >
      pathfindR package
    contents:
      - pathfindR
  - title: "Main functions"
    desc: >
      Main functions of pathfindR
    contents:
      - run_pathfindR
      - cluster_enriched_terms
      - enrichment_chart
      - score_terms
      - plot_scores
      - term_gene_heatmap
      - term_gene_graph
      - UpSet_plot
      - visualize_terms
      - cluster_graph_vis
      - visualize_active_subnetworks
  - title: "Data Generation"
    desc: >
      Functions to generate PIN and gene sets data
    contents:
      - get_pin_file
      - get_gene_sets_list
  - title: "Comparison of 2 pathfindR Results"
    desc: >
      Functions to compare 2 different pathfindR results
    contents:
      - combine_pathfindR_results
      - combined_results_graph
  - title: "Enrichment-related functions"
    desc: >
      Active subnetwork search- and Enrichment-related functions
    contents:
      - active_snw_enrichment_wrapper
      - single_iter_wrapper
      - active_snw_search
      - annotate_term_genes
      - enrichment
      - enrichment_analyses
      - fetch_gene_set
      - filterActiveSnws
      - hyperg_test
      - input_processing
      - input_testing
      - return_pin_path
      - summarize_enrichment_results
  - title: "Clustering-related functions"
    desc: >
      Clustering-related functions
    contents:
      - create_kappa_matrix
      - fuzzy_term_clustering
      - hierarchical_term_clustering
  - title: "Visualization functions"
    desc: >
      Visualization-related functions
    contents:
      - color_kegg_pathway
      - visualize_KEGG_diagram
      - visualize_term_interactions
  - title: "Misc. functions"
    desc: >
      Miscellaneous functions
    contents:
      - configure_output_dir
      - check_java_version
      - fetch_java_version
      - get_biogrid_pin
      - process_pin
      - get_kegg_gsets
      - get_reactome_gsets
      - get_mgsigdb_gsets
      - gset_list_from_gmt
      - create_HTML_report
      - isColor
      - safe_get_content

================================================
FILE: codecov.yml
================================================
comment: false

coverage:
  status:
    project:
      default:
        target: auto
        threshold: 1%
    patch:
      default:
        target: auto
        threshold: 1%

================================================
FILE: cran-comments.md
================================================
## Test environments
* local OS X 26.2, R 4.5.2
* macOS-latest (on GitHub-Actions), R 4.5.2
* windows-latest (on GitHub-Actions), R 4.5.2
* ubuntu-latest (on GitHub-Actions), R 4.5.2
* ubuntu-latest (on GitHub-Actions), R devel
* ubuntu-latest (on GitHub-Actions), R 4.4.3
* win-builder (devel and release)

## R CMD check results
There were no ERRORs, WARNINGs or NOTEs.

This is a minor release for 'pathfindR', fixing the CRAN errors due to strong dependencies on a package from Bioconductor data annotation repository. The package was moved to 'Suggests' and code was updated to conditionally execute if installed, raising an informative message if not.

## Downstream dependencies
There are currently no downstream dependencies for this package.
================================================
FILE: inst/CITATION
================================================
citHeader("Please cite the article below if you use pathfindR in published research")

bibentry(
  bibtype = "Article",
  author = c(person("Ege","Ulgen"), person("Ozan","Ozisik"), person(c("Osman","Ugur"),"Sezerman")),
  title = "pathfindR: An R Package for Comprehensive Identification of Enriched Pathways in Omics Data Through Active Subnetworks",
  journal = "Frontiers in Genetics",
  volume = 10,
  year = 2019,
  pages = 858,
  url = "https://doi.org/10.3389/fgene.2019.00858",
  textVersion = "Ulgen E, Ozisik O, Sezerman OU. 2019. pathfindR: An R Package for Comprehensive Identification of Enriched Pathways in Omics Data Through Active Subnetworks. Front. Genet. https://doi.org/10.3389/fgene.2019.00858"
)

================================================
FILE: inst/extdata/CREB.txt
================================================
CREB_01 > Genes having at least one occurence of the transcription factor binding site V$CREB_01 (v7.4 TRANSFAC) in the regions spanning up to 4 kb around their transcription starting sites.
ABCE1 ABHD16A ADAP1 ADCY8 ADNP ADNP2 AFF4 AHI1 AKIRIN1 ALS2 ANAPC10 APPBP2 AREG ARIH1 ARL4D ASPHD1 ATF3 ATG5 ATL2 ATP6V0C AVPI1 BNIP3L BRAF BRE C11orf87 C12orf66 C1orf35 C3orf26 CALM2 CAMK2D CBX8 CCBL1 CCDC148 CCNA2 CD2AP CDC42 CDK2AP2 CDS1 CDX4 CENPE CHGB CHMP1B CHMP2A CHPF CLDN6 CLDN7 CLSTN3 CNTROB CREM CRH CSDC2 CTC1 CXCL16 CYLD DAAM2 DCTN1 DDX19A DDX28 DDX51 DEPDC4 DGUOK DHX36 DIO2 DNAJC27 DNTTIP1 DUS2L DUSP1 EEF2 ELAVL1 ELL2 ELOVL5 EPB41 EPHA2 FAM131A FAM174A FAM65A FGF6 FLJ44313 FLT1 FOSB FOXD3 G3BP2 GEM GLI1 GNB4 GNL1 GPBP1 GPM6B GPR3 GTF3C1 HAS1 HDX HHIP HIST1H4E HOXC10 HS3ST2 HS3ST3A1 HSP90AB1 ID1 IFT20 IKBKB ING4 INTS7 IRF2BPL IRX4 IRX6 JUND KCNA5 KCNF1 KCTD8 LDHA LGR5 LMCD1 LTBP1 MAF MAFF MAOA MAP1LC3A MAP3K13 MAPK10 MBIP MBNL2 MCAM MITF MLF2 MMGT1 MRGPRF MRRF MYL6 NCALD NDUFA10 NDUFB2 NEUROD6 NF1 NOC4L NOL4 NR2E1 NR4A2 NUP214 NUP98 NUPL2 OGDH ORMDL2 OSBPL9 OSR1 PACRGL PAFAH1B1 PAK1 PARD6A PCSK1 PDLIM3 PDP1 PEG3 PER1 PFAS PHACTR3 PITX2 PKP4 PLCD3 PLK4 PNMA3 PNMA6A PNRC1 PPARGC1A PPM1A PPP1R15A PPP2R2A PRELID1 PRR3 PTPRU RAB24 RAB25 RAB6A RAD51C RAI1 RALGAPA1P RBKS RBM18 RBMS2 RBP5 RCAN1 RCE1 RELB RIPK4 RNF44 RNF7 RNMTL1 RPL41 RPRD1A RPS29 RUNDC3A RUSC1 RUSC1-AS1 SARNP SCAMP5 SCG2 SDHB SEMA4C SENP2 SEZ6L2 SGIP1 SIK1 SLC18A2 SLC25A25 SLC35F5 SLC38A1 SNAP25 SPAG9 SPATA7 SREBF2 SRRM4 SST ST13 STAT3 SULT4A1 SUV39H2 SYNGR3 SYT11 TEX14 TGIF2 TH THADA THOC1 TIPRL TMEM147 TMEM39A TMUB2 TNFAIP1 TP53INP2 TRAF4 TRAP1 TSC22D2 TSPAN7 TUBB2B UBE2H UCN UMPS USP48 VGF VPS37B WFDC3 WISP1 WNT10A XPNPEP3 XRN2 YTHDC2 YWHAZ ZBTB11 ZBTB37 ZC3H10 ZFAND2B ZFYVE27 ZIM2 ZMYM2 ZMYND15 ZNF184 ZNF295 ZNF335 ZNF367 ZNF576 ZNF593 ZNF687 ================================================ FILE: inst/extdata/MYC.txt ================================================ CACGTG_MYC_Q2 > Genes having at least one occurence of the highly conserved motif M2 CACGTG sites. The motif matches transcription factor binding site V$MYC_Q2 (v7.4 TRANSFAC). 
AATF ABCA1 ABCA2 ABCB6 ABCC4 ACACA ACAN ACAP3 ACLY ACVR2B ACY1 ADAM10 ADAM12 ADAMTS17 ADAMTS19 ADAMTS3 ADCK1 ADCY3 ADK ADNP ADPRHL1 ADSS AEN AFF4 AGMAT AGRP AIFM3 AK2 AK3 AKAP1 AKAP10 AKAP12 ALDH18A1 ALDH3B1 ALDH6A1 ALDOA ALG1 ALX4 AMD1 AMMECR1L AMPD2 ANAPC13 ANGPT2 ANKHD1 ANKHD1-EIF4EBP3 ANKRD12 ANKRD13B ANKRD17 ANXA6 AP1S2 AP3D1 AP3M1 APEX1 APOA5 ARF6 ARHGAP12 ARHGAP17 ARHGAP20 ARHGAP44 ARL2 ARL3 ARMC6 ARMCX1 ARMCX2 ARMCX3 ARMCX4 ARMCX6 ARPC5 ARRDC3 ARRDC4 ARX ASNA1 ASPHD1 ASPSCR1 ASS1 ATAD3A ATAD3B ATF1 ATF4 ATF7IP ATG3 ATG5 ATIC ATL2 ATOH8 ATP1B3 ATP5F1 ATP5G1 ATP6V0B ATP6V1A ATP6V1C1 ATP6V1H ATP7A ATXN3 ATXN7L2 AVP AVPI1 B3GALT2 B3GALT6 B3GNT9 B4GALT2 BAI2 BARHL1 BATF2 BATF3 BAX BCAS3 BCKDHA BCL11B BCL6 BCL7C BCL9 BCL9L BCOR BDNF BEND4 BEX1 BEX2 BFAR BHLHA15 BHLHE40 BHLHE41 BLOC1S1 BMP2 BMP2K BMP4 BMP6 BMP7 BOK BRD2 BRDT C10orf46 C11orf10 C12orf12 C12orf66 C14orf132 C15orf39 C16orf42 C16orf72 C17orf102 C19orf2 C19orf26 C19orf54 C19orf70 C1orf43 C1orf51 C1QBP C20orf111 C20orf112 C21orf91 C2CD2L C2CD4A C2orf67 C5orf4 C5orf41 C6orf125 C6orf211 C9orf139 C9orf85 CA14 CACNA1D CAD CAMK2D CAMK4 CAMKK1 CAMKK2 CAMKV CANX CBX5 CBX6 CCAR1 CCDC103 CCDC126 CCDC132 CCDC41 CCDC6 CCNYL1 CD164 CD3EAP CDC14A CDK20 CDK5R1 CDKL5 CDKN2C CEACAM5 CEBPA CEBPB CELF1 CELSR3 CEP57 CEP63 CEP95 CFL1 CGN CGREF1 CHD4 CHRM1 CHRNA7 CHST11 CIRH1A CITED2 CLCN2 CLIP2 CLN3 CLTC CMTM8 CNNM1 CNOT4 CNPPD1 CNPY2 CNPY3 CNST COL25A1 COL2A1 COMMD3 COMMD8 COPS7A COPZ1 CPEB4 CPT1A CRMP1 CSDA CSDE1 CSK CSRP2BP CTAGE5 CTBP2 CTIF CTSA CTSF CUL5 CXorf41 CXorf56 CYP2D6 DAZL DCAF11 DCAF13 DCHS1 DCTN4 DCTPP1 DCUN1D4 DDX18 DDX3X DDX4 DDX5 DEPDC7 DHX35 DIABLO DIAPH1 DIRC2 DKFZp547B139 DLC1 DLX1 DLX2 DNAAF1 DNAJB5 DNAJB9 DNMT3A DNTTIP2 DOHH DOLK DOPEY1 DPAGT1 DRD1 DSCAM DTNA DUSP1 DUSP4 DUSP7 DVL2 DYM DYRK1B DZIP1 E2F3 E2F8 EBNA1BP2 EEF1B2 EEF1E1 EFNB1 EFTUD2 EGLN2 EIF2C2 EIF3A EIF3B EIF3E EIF3J EIF4A1 EIF4B EIF4E EIF4G1 ELAVL3 ELK1 ELOVL5 EME1 EN1 EN2 ENO3 ENOPH1 ENPP6 EPB41 EPB41L4B EPC1 ERCC6 ERF ESR2 ESRP2 ESRRA 
ESYT1 ETV1 EVC2 EWSR1 EXOSC5 EYA4 FABP3 FADS3 FAF1 FAM108B1 FAM116A FAM117A FAM123B FAM134A FAM13B FAM164C FAM179B FAM192A FAM19A4 FAM76B FARP1 FBL FBXL19 FBXL19-AS1 FBXO33 FCHSD2 FEN1 FGF11 FGF14 FGF19 FGF6 FHOD1 FKBP10 FKBP11 FKBP5 FLJ45684 FLT3 FLVCR2 FMR1 FOSL1 FOXD3 FOXF2 FOXJ3 FOXO3 FOXRED2 FPGS FRMD3 FXR1 FXYD2 FXYD6 G6PC3 GABARAP GADD45B GADD45G GAPDH GAR1 GATA4 GATA5 GCSH GEMIN4 GGN GIGYF2 GIT1 GJA1 GK GLA GLB1L GLS2 GLYR1 GMFB GNA13 GNAS GNB2 GNL1 GNPTG GPC3 GPD1 GPM6B GPR176 GPRC5C GPS1 GPX1 GRIN2A GRK6 GSK3B GTF2A1 GTF2H1 GTPBP1 GUCY1A2 GYG1 H2AFZ H3F3A HBP1 HEBP2 HERPUD1 HEXA HHEX HHIP HIF1A HIRA HLX HMGA1 HMGN2 HMHA1 HMOX1 HNRNPA1 HNRNPA3 HNRNPD HNRNPF HNRNPH2 HNRNPH3 HNRNPM HNRPDL HOOK2 HOXA1 HOXA11 HOXA3 HOXA4 HOXA7 HOXA9 HOXB4 HOXB5 HOXB7 HOXC11 HOXC12 HOXC13 HOXC5 HOXD10 HPCA HPCAL4 HPS3 HPS5 HRH3 HRSP12 HS3ST3A1 HSD11B1L HSDL1 HSP90AB1 HSPA4 HSPA9 HSPBAP1 HSPD1 HSPE1 HSPH1 HYAL2 ICAM5 IER5L IFRD2 IGF1R IGF2BP1 IGF2BP3 IGF2R IGSF22 IKZF3 IL15RA IL1RAPL1 ILF3 ILK INSM1 IPO13 IPO4 IPO7 IQCG IQGAP2 IRF9 IRS4 JAKMIP1 JOSD1 JPH1 KAT5 KAT6A KBTBD2 KCMF1 KCNAB1 KCNE4 KCNH4 KCNK5 KCNN4 KCNQ5 KCTD15 KDM3A KDM4C KDM6A KIAA0090 KIAA0586 KIAA0664 KIAA1033 KIAA1407 KIAA1539 KIAA1715 KIAA1737 KLF10 KLF11 KLF9 KLHDC3 KLHL28 KRTCAP2 LAMP1 LAP3 LCLAT1 LDHA LEF1 LEPREL4 LHX9 LIG3 LIN28A LMLN LMNB1 LMX1A LOC147727 LOC80054 LONP1 LONRF3 LPCAT4 LRFN4 LRP8 LRRC16B LRRC48 LTBR LYAR LYPD1 LZTFL1 LZTS2 MACROD1 MAEL MAFF MANF MAP7 MAPKAPK3 MAT2A MAX MBNL1 MCART1 MCM2 MCM8 MCOLN1 MDN1 MEA1 MED28 MEOX2 MEPCE METAP1D METTL11A MFHAS1 MFSD5 MGC13053 MICAL2 MICU1 MIEN1 MINK1 MLL MMP23A MMP23B MNT MON1A MORF4L2 MPP3 MPV17 MRM1 MRPL27 MRPL40 MRTO4 MTAP MTCH2 MTHFD1 MTUS1 MXD3 MXD4 MYB MYCL1 MYL12A MYO19 NAA25 NAPA NAT10 NCL NCOA6 NDUFA7 NDUFAF4 NDUFS1 NEFM NEK6 NET1 NEURL2 NEUROD1 NEUROD6 NFATC3 NFX1 NGFRAP1 NID1 NIT1 NKX2-2 NKX2-3 NLN NMNAT2 NOL6 NOL8 NOP56 NOP58 NOTCH1 NPM1 NPTN NPTX1 NR0B2 NR1D1 NR4A3 NR5A1 NR6A1 NRAS NRIP3 NRXN1 NTHL1 NTN3 NUDC NUDT11 NUP153 NUP62CL NXPH1 ODC1 
OGDHL OGT OLFM2 ONECUT1 OPRD1 OSGEP OSR1 PA2G4 PABPC1 PABPC4 PAICS PANK3 PATZ1 PAX6 PBRM1 PCDHA10 PDCD6IP PDIA2 PDIA4 PDK2 PDP2 PDPR PEPD PER1 PES1 PFDN2 PFDN6 PFKFB3 PFN1 PHC3 PHF15 PHF17 PHF20L1 PIAS4 PICALM PIGW PIK3IP1 PITX3 PKN1 PLA2G4A PLA2G6 PLAG1 PLAGL1 PLBD1 PLCG2 PLEKHA6 PNMAL1 POGK POLA1 POLH POLR2H POLR2L POLR3A POLR3C POLR3D POLR3E POP1 POU3F2 PPARGC1A PPARGC1B PPAT PPCS PPM1A PPP1R1B PPP1R3B PPP1R3C PPP1R9B PPP2R2B PPRC1 PRDM4 PRELID1 PRKCE PRKCG PRKCH PRMT1 PROK2 PRPS1 PRPS2 PRR3 PRR7 PRRC2C PSEN2 PSMB3 PSME3 PTCH1 PTGES2 PTMA PUS1 PWP2 QTRT1 QTRTD1 RAB24 RAB2A RAB30 RAB31 RAB3IL1 RABGAP1 RAD50 RAD9A RALYL RAMP2 RANBP1 RAPGEF6 RARG RASD2 RASGRP2 RBBP4 RBBP6 RBBP7 RBM15B RBM19 RBM3 RCL1 RCOR2 REEP3 RELB REV1 REXO2 RFX4 RFX5 RGL1 RHEBL1 RHOBTB3 RLF RNF115 RNF128 RNF145 RNF146 RNF219 RNF43 RNF44 RORC RPA1 RPIA RPL13A RPL19 RPL22 RPL30 RPS11 RPS19 RPS2 RPS28 RPS6KA5 RPUSD4 RRAGB RRAGC RRP15 RRP8 RRS1 RSPO2 RSPRY1 RTN4 RTN4R RTN4RL2 RUNX1T1 RUNX2 RUSC1 RXRB SAE1 SAMD12 SASH3 SATB2 SC5DL SCAMP3 SCFD2 SCRT2 SCYL1 SDC1 SEC11C SEC23IP SELS SEMA3F SEMA7A SEPT3 SERBP1 SET SEZ6L SEZ6L2 SFXN2 SGK1 SGOL1 SGTB SHMT1 SHMT2 SIGMAR1 SIRT1 SLC12A5 SLC12A6 SLC17A2 SLC17A9 SLC1A7 SLC20A1 SLC24A4 SLC25A32 SLC25A33 SLC25A37 SLC26A10 SLC26A2 SLC31A2 SLC33A1 SLC35A5 SLC36A1 SLC38A2 SLC38A5 SLC38A7 SLC39A11 SLC39A7 SLC43A1 SLC4A11 SLC6A15 SLC7A3 SLC7A5 SLC7A5P1 SLC9A5 SLCO1C1 SLCO5A1 SMAD2 SMC3 SMG1 SMYD4 SNCAIP SNCB SNN SNRPA SNTB2 SNX16 SNX2 SNX5 SNX8 SOCS2 SOCS5 SORD SORL1 SOX12 SPATA2L SPG21 SPIN2A SPNS1 SPPL3 SRFBP1 SRP72 SSR1 ST6GAL1 STAT3 STC2 STK31 STMN1 STMN4 STT3B STX6 STXBP2 SUCLG2 SUGP2 SUMF1 SUPT16H SYBU SYNCRIP SYNRG SYT3 SYT6 TAC1 TADA1 TAF6L TAGLN2 TBC1D15 TBC1D5 TBL1X TBL1Y TBX4 TCEAL1 TCEAL3 TCEAL7 TCEAL8 TCERG1 TCF4 TCOF1 TDRD1 TEF TESK2 TET2 TEX12 TFAP4 TFB2M TFRC TGFB2 TGFB3 TGIF2 THAP5 THUMPD2 TIAL1 TIMM10 TIMM50 TIMM8A TIMM9 TLE3 TLE4 TLL1 TMED10 TMEM108 TMEM132E TMEM146 TMEM47 TMEM86A TMUB1 TNFRSF21 TNPO2 TNXB TOM1 TOM1L2 TOP1 TOPORS TPM2 TRAPPC8 TRIB1 
TRIM3 TRIM37 TRIM46 TRIP10 TRMT2A TRMT6 TRPM7 TSC1 TSC2 TSKU TSPAN4 TSR2 TSSK3 TUBA4A TUBA4B TUG1 TXLNG TXNDC12 U2AF2 UBA1 UBE2B UBE2C UBE4B UBIAD1 UBR4 UBR5 UBXN1 UBXN10 UBXN6 UNC45B USP15 USP2 USP31 USP34 USP36 UTP14A UTP18 UTY UVRAG VARS2 VGF VLDLR VPS13A VPS16 VPS26A VPS33A VPS37B VWA2 WBP2 WDR17 WDR46 WDR65 WDR77 WEE1 WHSC1L1 WIPI1 WWP2 XPO1 XPO5 XRN2 YBX1 YBX2 YEATS2 YPEL5 ZADH2 ZBTB10 ZBTB11 ZBTB40 ZBTB49 ZBTB5 ZBTB8OS ZC3H10 ZCCHC7 ZCWPW1 ZFP91 ZFYVE26 ZHX2 ZIC1 ZMYM6 ZMYND12 ZNF296 ZNF318 ZNF503 ZNF565 ZNF574 ZNF593 ZNF711 ZNF771 ZNF800 ZNHIT6 ZNRF2 ZZZ3 ================================================ FILE: inst/extdata/resultActiveSubnetworkSearch.txt ================================================ 91.13730033969254 ZRANB1 ALKBH5 HNRNPH1 EWSR1 PSMD7 CSNK2A2 TRRAP ZNF148 CBX3 NUP93 NUP214 CFL1 PFN1 ATPIF1 PDIA4 RMDN3 HNRNPD HNRNPR RPL31 RPL26 EIF4A3 SARNP YARS ACTG1 TFG MAGED1 DNAAF5 C16ORF58 MARC1 PGRMC2 PTRH2 ATP5J VDAC1 SF3B6 ILF3 SHMT1 APEH ATIC ECH1 GSTO1 TXN CDC37 FBXW4 PFDN5 PIK3R4 PARP1 PRPSAP1 PRPSAP2 COPS5 COPS7B ZZEF1 TCEB2 PCBP1 UBL5 CRKL SGK223 U2AF2 GMFG SF3B2 SRSF5 RPLP2 RPS24 HNRNPDL C12ORF65 MRPS18C SRP54 EIF2S3 BRD7 BRD9 CKAP4 ITGB1 ATP5I UBAC2 PDHA1 ECHS1 MDH2 MRPL49 DNAJA3 PYCR2 SLIRP CALM3 MYL6 MLH1 CKB ARIH2 SMARCA4 ZMYND8 KDM1A LEO1 MRPL33 AKR7A2 MTMR12 THEM6 IARS AIMP2 EEF2 GAPDH ANXA7 PDHB RCC2 BUD31 SON S100A9 RNPS1 ZNF207 DPY30 SRRM1 SRPK1 SRSF8 ERH PHRF1 DDX23 PTBP1 RANGAP1 UBE2I HMGN2 UTRN PHF19 HDAC1 RING1 KDM2B SIN3A THAP11 TNPO1 KCTD5 CYFIP1 GNB1 MTOR TUBB S100A8 RPA1 XPC LSM3 USP11 DYNC1H1 TSR2 DDX24 GMDS ALDH9A1 STK38 LRRFIP1 CLASP2 TRIP12 WRNIP1 ELP3 ARG1 HNRNPA1 HNRNPAB SNAP23 SEPT9 GORASP2 TXLNA NUP62 UBIAD1 EIF2B1 DDX54 TAF15 ARAF FAF2 SLC25A5 RBM14 WDR33 S100P RAB11FIP3 DDOST RRP1 ANKHD1 UQCRQ SLC25A3 SUGP2 QSOX2 ATP2A2 NOA1 GAK PUF60 DAZAP1 AASDHPPT CAPRIN1 NUFIP2 HMGB2 CNDP2 SLC9A3R1 PUS1 NRD1 VPS25 ZNF664 POLD2 MAT2B UVRAG KLF13 WDR11 CREB1 TAF4 ANAPC1 DNMT1 ABCF1 ACOT13 JAK1 PHACTR4 SLBP NKTR NOL9 SBF1 PSMG2 
FBXO21 MLLT6 TIMM8B GTF2B HMCES PITPNB TAMM41 COMT SPECC1L DCK HNMT CRELD2 BCL9L DSTN GLRX CASP3 METAP1 GNE C2ORF68 CEP85 CNOT11 WDR4 60.49402062461474 DDX58 FNBP1 MCEMP1 EWSR1 DDIT3 COPS7B FBXW4 PFDN5 ZNF148 TCEB2 S100A8 KDM1A SF3B2 SRSF5 PTBP1 RANGAP1 PDHA1 ECHS1 SLC9A3R1 KCTD5 GOT1 DNAJA3 MAP4K1 RNF34 EEF2 ATIC USP11 ELMSAN1 FOXJ3 WRNIP1 ELP3 IARS HNRNPAB ACTB ACTG1 CFL1 FAF2 TMEM43 PDIA4 TMEM131 SPG7 UBE2I PRPSAP1 BAG4 IST1 SNRPB BHLHE40 ADA YEATS2 MYL6B ABCF1 TICAM1 DDX54 GSE1 BTBD2 ZNF207 PHRF1 TAF4 GLTSCR1 IGBP1 CENPB GID8 MLLT6 KCTD20 GAK NUDT9 RETN SPOCK2 ZMIZ1 BLOC1S4 WDR59 UBL5 FBXL12 FAM43A PARP6 COMT KDM2B HK3 ZNF296 SARS FRMD8 MDM1 ST6GAL1 IMP3 C14ORF2 CD37 RTN3 TSPYL1 S100A12 SLC16A11 C6ORF136 RASAL3 GLRX FAM65A RUNX3 TEX261 TMCC3 57.29788313236089 APP C12ORF65 PYCR2 HNRNPH1 DFFB COPS5 COPS7B PRPSAP2 CAPRIN1 RPL31 ADARB1 SRPK1 SF3B2 SF3B6 VDAC1 CLEC4D ANXA5 ACTB GSTO1 ECH1 DYNLT1 ERH PAXIP1 FUCA1 ETS1 UBE2I SRSF8 PTBP1 PDHB ITGB1 SLIRP ECHDC2 MRPL49 HNRNPA1 GAPDH CFL1 PCBP1 CDC37 BMX DDX47 WDR45B TUBB S100A8 EDEM1 MAT2B IARS AIMP2 GORASP2 RPUSD2 SEPT9 GAK HMGB2 IGBP1 ASPSCR1 ABCF1 PHACTR4 DDX54 NUDT9 ARHGAP17 COMT CD74 TMEM109 ZSCAN21 BUD31 RING1 MLLT6 HADH MED24 CENPB APEX2 LCP2 TBC1D9B ZFAND5 SLC41A3 BLOC1S1 ANKRD13D HMCES RASA3 TMA7 FRMD8 STK25 NDUFB3 UBE2G1 GIMAP6 TSPYL1 CRACR2B ZFP36 C1ORF174 BET1L RNF34 S1PR1 54.15217634009142 M FNBP1 ANKHD1 EIF4A3 HNRNPR RPL26 SRSF5 RCC2 PDHB MDH2 CKB PFN1 BAG4 PGRMC2 GHITM SLC52A2 DNAJA3 KDM1A RNF34 RMDN3 STIM2 CKAP4 ARF1 RAB9A STX6 SCRIB UTRN ITGB1 UBAC2 FAF2 TMEM43 PDIA4 PIGT PAFAH1B1 DDOST SLC25A3 ATP2A2 RBM14 SNRPB TMEM131 DSTN SLIRP SLC25A5 RPS24 EIF2S3 DDX54 SNAP23 IKBKB PRKCQ PHACTR4 STX10 C16ORF58 GORASP2 SHMT1 WDR11 RTN3 THEM6 GAPDH AP2B1 HADH PUS1 JAK1 MED24 MIPEP MARCKSL1 SLC23A2 ALG9 SLC41A3 YARS WDR45B ELP3 KIAA0195 WDR4 C2CD2 NOL9 COMT MYOF PPP1R16B CNTNAP1 LRRC8C AHCYL2 ZDHHC8 RFTN1 SBF1 DDX47 RAD17 TM9SF4 BET1L KIAA0355 TELO2 TMCC3 BTN3A3 MDFIC 38.16260765638983 RAB7A EDEM1 FAF2 RMDN3 STIM2 
CKAP4 PIGT TCTN3 TMEM43 UBAC2 VDAC1 MYOF RPA1 ITGB1 STX6 SNAP23 RAB9A WDR59 HPS3 PIK3R4 TUBB COPS5 SLC25A3 ACTB U2AF2 CFL1 PDHA1 UQCRQ COX7A2 UVRAG RTN3 DDX54 GORASP2 STX10 DDOST ATP2A2 CBX3 TMEM131 IST1 HMGCL BLOC1S1 HEATR5B DSCR3 BET1L 38.09303453097859 GOLGA2 FAM110A GATA2 KDM1A SIN3A GSE1 KDM2B UBE2I CDC37 TXLNA NUP62 PFN1 CKB ANKHD1 PDHA1 ECH1 RAB9A STX6 GORASP2 ATP6V1D DDX54 IFT20 PCBP1 PAXIP1 NCOA6 RNF214 EDEM1 PPP1R16B SNRPB LCP2 MLLT6 CCNC CCDC92 SUGP2 GLTSCR1 AFF4 37.94922574840974 SEC61B RPL31 SRP54 CKAP4 RMDN3 ILF3 RPLP2 RPS24 EEF2 COPS5 SLC25A3 DDOST EDEM1 FAF2 TMEM43 PDIA4 PIGT TCTN3 PGRMC2 PTRH2 STIM2 CALM1 RTN3 RAB9A ARF1 ITGB1 SLC25A5 GORASP2 ATP2A2 TMEM109 MBOAT2 POFUT1 TMEM131 TNPO1 RANGAP1 AHCYL2 GNB1 IFT20 SON RPL39 TOR1A TMCC3 COX7C BTN3A3 COX7A2 MUTYH 32.513228872394464 BCAR1 RPL26 MTOR RRAGD STK38 SHMT1 ABL1 CRKL UBL5 KIAA0355 UVRAG GNE PSMG2 MED24 CRLF3 HMGN2 IST1 HPS6 BMX SUMO3 NUFIP2 SLPI C14ORF2 GAK COX7A2 POLD2 32.36264675233986 SMAD3 ZMIZ1 ETS1 SMARCA4 ACTB U2AF2 CTCF SIN3A PARP1 MED24 SMAD7 HDAC1 TGIF2 RUNX3 ARID1A NCOA6 RBM14 RPA1 ANXA7 ITGB1 PIAS3 NUP214 JAK1 HP SH2D2A ZCCHC14 DOCK9 GMDS 32.029488486189834 ERBB2 FRS3 TMA7 ARF1 JAK1 ABL1 CRKL AP2B1 ACTB GAPDH CFL1 ILF3 CKB PTPRA CALM1 CLASP2 TNPO1 TXN CDC37 BMX SRPK1 STX6 ITGB1 IARS LCP2 SH2B3 ZMYND8 SH2D2A APLP1 31.95879977723965 SRC IKBKB CDC37 BMX HNRNPA1 RPLP2 CFL1 PARP1 NCOA6 ABL1 MICAL1 ARHGAP17 EVL TUBB ANXA7 BTBD2 PTPRA CALM1 SLC25A3 FASLG SH2D2A CYSTM1 TCEB2 LHFPL2 RAB9A PHF19 ASAP1 S1PR1 31.579541097028233 MMGT1 MYL6 MYL6B VDAC1 RMDN3 FAF2 TMEM43 TCTN3 ATP2A2 PDHA1 MDH2 ANXA1 ANXA5 TUBB GAPDH CLEC4D TNPO1 CKAP4 STX6 RAB9A C16ORF58 TMEM109 MYOF MED24 SF3B2 IST1 30.99958176142714 DDX39A RBP7 MED29 CRLF3 FNTA VPS9D1 DDOST PARP1 ACTB UBE2I SARNP ANXA1 ILF3 SNRPB ERH LSM3 USP11 ALG9 QDPR IRF1 POLE4 GNE CRACR2B STIM2 C1ORF174 KLF2 ATP6V0E1 BCL11B APLP1 ZC3H7A 30.976151623504936 PPP2CA FNBP1 TNPO1 HNRNPH1 PAXIP1 RPA1 ANAPC1 AIMP2 ATP6V1D MAP4K1 CARD11 IKBKB PPP2R3C ZNF217 ZFP36 
NRD1 SON STK25 FAM43A INTS9 EP400 TMEM109 RBL2 IGBP1 30.66887205839992 MKRN2 HNRNPR HNRNPDL HDAC1 TGIF2 CREB1 ZFP36 ALG13 ZC3H7A RNF214 CEP85 KIAA0355 ARF1 AP3D1 PARP1 MTOR ASAP1 CAPRIN1 IARS KDM1A NTNG2 TM9SF4 DYNC1H1 IFT20 PTPRA S1PR1 30.549398058010485 NSP2 HNRNPR TFG ANXA1 RRS1 SRPK1 DDX24 NFATC3 HEATR5B PTRH2 AGK TXLNA SH3BP5L CCDC102A CCDC92 VDAC1 COX7A2 ATP2A2 ZC3H7A MARC1 SUGP2 ZFAND5 TTF2 SPECC1L EEF2K CLASP2 TIMM8B 30.20892482298902 HRAS SCRIB ANXA7 ANXA1 MRPL49 FNTA MTOR STK38 ITGB1 ANAPC1 BTBD2 PTPRA PHACTR4 ACTG1 SNAP23 SEPT9 GORASP2 UBIAD1 MARCKSL1 SLC23A2 RALGDS NBEAL2 TLR6 SBF1 PHF19 DOCK9 S1PR1 MUTYH 30.06034561405813 UBE3A PSMD7 CCDC92 SCRIB ACTB AIMP2 DSTN CKB COPS5 COPS7B TUBB GNE ANXA1 UTRN CALM1 DNMT1 MDM1 SUMO3 ECH1 DAZAP2 ZFAND2B FAF2 ZFAND5 JADE1 SGK223 IRF1 POLE4 29.56177733780863 MEOX2 HNRNPD CCNC ARIH2 IKBKB WDR37 STX6 TSR2 ZNF330 MLLT6 ZMYND8 BUD31 TXLNA DSCR3 MOAP1 C1ORF174 METAP1 EIF4A3 MAGED1 DPH3 COX7A2 PSMB10 TJAP1 HK3 29.534218339459187 AGO2 HNRNPR HNRNPD ILF3 CTCF PCBP1 COPS5 GAPDH HNRNPA1 HNRNPAB ACTB AP2B1 CNDP2 GEMIN4 TFG ZC3H7A RNF214 CEP85 ZFP36 KIAA0355 SAMD4B IGBP1 NCOA6 EEF2K 29.433975525714484 BICD2 FAM110A NUP93 NUP62 NFATC3 PAFAH1B1 DYNC1H1 IST1 RANGAP1 PCBP1 ZC3H7A KIAA0355 MTOR TXLNG TTF2 TMEM131 SEPT9 ASAP1 29.366908226555786 SMG7 RPL31 LEO1 KDM1A CSNK2A2 PSMD7 RING1 KDM2B ALG13 ZFP36 CEP85 RNF214 ZC3H7A SAMD4B KIAA0355 CAPRIN1 ZNF217 GSE1 U2AF2 PFN1 ARID1A PAXIP1 NCOA6 YEATS2 DDOST MLLT6 29.116580283771295 SH3KBP1 CRKL MAP4K1 ABL1 AP2B1 RPA1 ELMSAN1 ARHGAP17 CKAP4 ASAP1 FRS3 SRRM1 ZFP36 DNMT1 UBE2I PHACTR4 SLC25A3 SEPT9 KLF13 RNF34 28.74239366438648 MYO6 ACTB PTRH2 HNRNPA1 PCBP1 WDR33 COPS5 GEMIN4 SEPT9 PDHA1 CLEC4D CALM3 MYL6B CALM1 GAK STX6 RAB9A AP2B1 YEATS2 WRNIP1 USP11 EVL 28.53371294563163 BRD1 SCRIB FEZ1 TELO2 KIAA0355 ZNF148 ZNF330 DNMT1 CBX3 WRNIP1 USP11 DDX24 JADE1 ING4 DPY30 BCL11B CTCF QSOX2 WDR37 GID8 ANAPC1 ZZEF1 DDX47 DNAJA3 28.510823733459013 KRT31 FAM110A FRS3 GNE ZNF148 PSMG2 JOSD1 FNTA TXLNA 
TXLNG SNRPB BAG4 ALG13 RBM14 AIMP2 NUFIP2 ABL1 VPS9D1 CCNC KDM1A COMT ASPSCR1 MXD3 SLIRP 28.482266596865543 YTHDC1 RPL31 ADARB1 SRPK1 ZNF330 LEO1 PARP1 CENPB PYHIN1 SNRPB HNRNPA1 EIF4A3 RNPS1 DDX23 PFDN5 ABL1 SOX13 STX6 POFUT1 ZNF317 SMARCAL1 FAF2 TLR6 PHF19 28.430905333676385 MED23 TRRAP CSNK2A2 PSMD7 RING1 ATPIF1 SRRM1 PAXIP1 NCOA6 RBM14 PARP1 MED24 CCNC MED29 ETS1 EP400 KDM1A INTS9 SRP54 GEMIN4 AGK PDXP PSMB10 28.361188225398905 SMARCA2 ACTB GAPDH SMARCA4 DDIT3 ARID1A GATA3 KDM1A GATA2 BRD7 BRD9 GLTSCR1 PARP1 ETS1 SIN3A HDAC1 IRF1 ACTG1 MYL6 ZNF212 JADE1 NUP62 28.26035995500177 SEC16A RNF214 CEP85 ZFP36 ALG13 PFN1 HNRNPA1 ACTB PTRH2 AP2B1 GAK RBM14 WDR33 CLEC4D PDHA1 RAB9A STX6 ANKHD1 KIAA0355 SAMD4B IKBKB STK38 ELMSAN1 KCTD5 28.187402251544047 DCTN1 PSMD7 DYNC1H1 PAFAH1B1 ACTB CDC37 FBXW4 PCBP1 ACTG1 ZC3H7A CAPRIN1 TTF2 SPECC1L DYNLT1 CLASP2 CCDC64 ANXA1 AGK UQCRQ MIPEP KCTD5 RBL2 28.13225095313616 LNX1 DDIT3 CRACR2B ZNF148 PFDN5 IARS AIMP2 ILF3 EIF4A3 HNRNPH1 JOSD1 NUP62 DYNLT1 ABCA1 CEP85 HSBP1 LCP2 FBXL12 SH2D2A DCTD DOCK9 27.913794256344993 NSP12 RNF214 CEP85 ZC3H7A ANKHD1 ZNF217 GSE1 FRMD8 ANAPC1 SNRPB PFDN5 EEF2 PDHA1 SLC25A5 ALDH9A1 HADH CRKL CCDC92 SPECC1L IFT20 27.876520709497008 IKBKG DDIT3 PRKCQ IKBKB CDC37 MTOR PARP1 MLLT6 UBE2I ANXA1 RNF34 SEPT9 MAP4K1 CARD11 ACTG1 ABCA1 FRMD8 STK25 SRPK1 ZFAND5 AKR7A2 PFDN5 27.728264803811395 CRK ACTB AP2B1 ABL1 CRKL MAP4K1 DNAJA3 TUBB SGK223 ASAP1 MICAL1 ARHGAP17 PHACTR4 PTPRA SH2D2A ATIC CNDP2 STK38 FASLG BUD31 NUFIP2 27.650138213066064 HGS TFG MAGED1 DAZAP2 IST1 UBE2I BLOC1S1 ABCA1 PTRH2 TUBB STX6 RAB9A CLEC4D HNRNPDL IL2RB MIF4GD CDR2 ATP2A2 NUP62 27.587134145670063 CALML3 CCDC102A CALM1 MYL6 MYL6B HMGB2 JOSD1 HMCES GAK AP2B1 DYNC1H1 PYHIN1 SRPK1 HNRNPA1 CFL1 ACTG1 INTS9 PIGT SPECC1L GOT1 BHLHE40 GNB1 27.553956583637234 RAD50 HNRNPR ILF3 HNRNPA1 ACTB DYNC1H1 RCC2 TCEB2 MLH1 RPA1 PAXIP1 GATA3 PARP1 PYHIN1 USP11 DYNLT1 FNBP1 ZFAND2B ZC3H7A RBL2 27.034416026855386 MAP3K7 CDC37 IKBKB PRKCQ CARD11 MAP4K1 COPS5 
COPS7B MDH2 ANXA5 PDIA4 HMGB2 ECHS1 EEF2 ATIC VDAC1 SMAD7 MAGED1 DAZAP2 YEATS2 POLE4 MARCKSL1 26.953813646267744 MDFI DCANP1 TSPYL1 EWSR1 CDC37 MAGED1 ILF3 SHMT1 CRY1 ZNF330 GNE GATA2 JOSD1 STX6 ZNF696 TOR1A NUFIP2 AASDHPPT SPG7 26.652634059296886 PRRC2B TFG GATA3 GATA2 RNF214 CEP85 ZFP36 ALG13 BAG4 ANKHD1 ZC3H7A PCBP1 HNRNPA1 RBM14 CAPRIN1 SAMD4B KIAA0355 PYHIN1 USP11 CD74 DPY30 26.60955502665256 SET RPL31 CTCF RPS24 DDX24 TSPYL1 ZNF330 ELMSAN1 CBX3 HMGB2 GAPDH ACTB MARCKSL1 CD81 HNRNPH1 PFN1 CALM1 DDX23 SUGP2 IRF1 ASAP1 26.604841398775786 RBPMS TFG EWSR1 PCBP1 ILF3 RBM14 SNRPB BAG4 MAGED1 DAZAP2 BHLHE40 ALG13 GSE1 GATA2 ANKHD1 NUFIP2 SLIRP WDR54 ATP6V0E2 PPP1R16B SPG7 26.559339681656965 LAMTOR5 TTF2 HMCES RRAGD RAB9A ITGB1 DSTN GORASP2 BLOC1S1 STK38 DBI HNRNPH1 SIN3A NCOA6 GAPDH CD74 KLHL6 DNAJA3 26.53741759469747 MYL12A ACTB SLC25A3 VDAC1 PDHB COX7C PDHA1 HNRNPA1 COX7A2 LRRFIP1 MYL6B MYL6 HERC1 IST1 DNMT1 PAXIP1 IGBP1 26.50678531012626 RYBP CSNK2A2 RING1 KDM2B GATA3 GATA2 HDAC1 UBE2I PARP1 MYOF SCRIB ABL1 AP3D1 CBX3 TCEB2 AGK ZNF207 IFT20 STK25 GNB1 26.381565144143718 LYN FNBP1 FASLG SCRIB ARF1 ERH DYNLT1 SNAP23 SEPT9 DDX23 UTRN ITGB1 PHACTR4 EVL LCP2 MARCKSL1 ZDHHC8 CDC37 IRAK2 RFTN1 RASA3 26.34167459781998 RCN1 FAM110A MYL6 TCTN3 PDIA4 ILF3 COPS5 SLC25A3 CLEC4D WDR45B CD74 RAB9A WRNIP1 KCTD5 SPECC1L FBXL12 26.295698935954142 CYSRT1 FRS3 GNE GATA3 GATA2 RMDN3 HSBP1 JOSD1 ZNF696 ZNF330 ZNF319 ERH SNRPB FASLG NUFIP2 DNAJA3 RALGDS ZCCHC14 MXD3 SPG7 26.116452991137777 GNB2 IL27RA VDAC1 CLEC4D GNB1 CD81 KCTD5 DYNLT1 PFDN5 BHLHE40 MTOR PDHB ANXA7 ANXA1 MED29 ZNF212 COMT 25.911842498569833 METTL21B SMARCAL1 HERC1 TSPYL1 DNAAF5 TELO2 TBC1D9B NBEAL2 RAD17 IKBKB GAK MDH2 TRIP12 TTF2 SBF1 GEMIN4 ANAPC1 ANKHD1 PFDN5 IGBP1 25.765880786853874 CBL ACTB GAPDH HNRNPD JAK1 ABL1 CRKL ASAP1 ARF1 DAZAP1 PFN1 CALM1 CD81 TUBB FRS3 PRKCQ PRPSAP2 SMAD7 IRF1 LCP2 25.487196278902946 KAT2B TRRAP WDR37 YEATS2 ACTB RPA1 HNRNPD PARP1 ETS1 HMGN2 TAF15 SMAD7 KLF10 KLF13 IRF1 KLF2 
25.426630150517504 ATG9A EDEM1 FAF2 UBAC2 ZFAND2B HEATR5B STX6 GAK PARP1 AP3D1 WDR11 STX10 SYPL1 UVRAG PGRMC2 AP4B1 TJAP1 CAPRIN1 25.382403590600777 FKBP5 PRKCQ IKBKB CDC37 PCBP1 HNRNPH1 SF3B2 MAP4K1 ACTG1 ANAPC1 HPS3 ALDH9A1 PDHB SMARCAL1 AP2B1 TTF2 DNAJC27 25.306585932491917 UBQLN1 RPL31 RPL26 RPLP2 MTOR EDEM1 GNB1 HNRNPH1 MLLT6 IST1 TICAM1 ABCF1 GOT1 MIPEP GAPDH SLPI DAZAP2 ZFAND2B UBE2I H2AFJ 25.270191207789683 MEX3B RUNX3 MAGED1 ZC3H7A RNF214 CNOT11 CEP85 KIAA0355 ALG13 SAMD4B ANKHD1 CAPRIN1 NUFIP2 U2AF2 ZCCHC14 AP2B1 GID8 25.253942730998883 PCM1 TSPYL1 EWSR1 DDIT3 DDX23 CEP85 MED29 IFT20 TXLNA TXLNG ACTB TTF2 CCDC92 MAGED1 STX10 TBC1D31 ALG13 PPP2R3C 25.247542609638433 SNX27 FNBP1 MTMR12 THEM6 JAK1 HEATR5B PAFAH1B1 WDR45B HP ORM1 SRSF8 STK38 LSM3 FAM65A OTULIN SBF1 KCTD5 ARID1A 25.217045209311454 NDUFA4 GHITM ECH1 TAMM41 PDHA1 CLEC4D UQCRQ COX6A1 COX7C PDHB COX7A2 SLC25A3 ATP5J ATP5E AGK RFTN1 NDUFB3 25.122699797674088 TANK CSNK2A2 TELO2 AIMP2 MDH2 HNRNPH1 ACTG1 PAFAH1B1 ZFAND2B IKBKB TXLNG TXLNA NUP62 RAB9A ZFP36 TBC1D31 TICAM1 PFDN5 25.060602196492937 WDYHV1 CEP85 RASAL3 BTBD2 EIF2B1 AIMP2 ACTB ACTG1 EP400 MIF4GD SLPI CDR2 ACOT13 DYNLT1 PFDN5 HMGCL GMDS 25.051732878129698 TFAP4 TTF2 PYHIN1 CTCF SRP54 HMGB2 SIN3A ARID1A BRD7 HDAC1 ELMSAN1 KCTD5 S100A8 TCEB2 YEATS2 WDR45B 25.003358997383305 NSP10 SNRPB SON GSE1 ZNF217 PDHA1 SLC25A5 ZZEF1 THAP11 MYL6B HK3 ARID1A MED29 MLLT6 AASDHPPT MARCKSL1 AP2B1 WDR11 24.954419596014894 TRIP6 FRS3 GNE RNF214 PCBP1 ILF3 COPS5 PARP1 GSE1 ZFP36 KIAA0355 KCTD5 FASLG S1PR1 24.83451468168788 INF2 ACTB ACTG1 PFN1 CALM1 GAK STX6 RAB9A WDR59 HPS3 CKAP4 RMDN3 SBF1 HERC1 RAD17 CALM3 ERH SEPT9 24.827734322394786 KLK10 GMFG HP ORM1 ORM2 CCNC S100A9 S100A8 GOT1 S100P PGLYRP1 HK3 CAMP MMP9 24.766579216258826 CDK4 RUNX3 HDAC1 COPS5 GEMIN4 RPA1 ANXA7 APLP1 WDR33 TUBB CDC37 CDKN1C RBL2 SLBP GATA2 DAZAP2 24.75634944872382 MYOD1 KDM1A TRRAP EP400 SMARCA4 ARID1A HDAC1 RING1 KDM2B ELMSAN1 BHLHE40 CREB1 GSE1 SMAD7 PAXIP1 CDKN1C 24.697673358683844 
STAT1 IL27RA JAK1 IL2RB SHMT1 APEH SMARCA4 COPS5 SARS U2AF2 HDAC1 UBE2I IRF1 MYL6B OTULIN HADH IGBP1 MTOR 24.498781860019943 CEP135 ECH1 DNAAF5 MAGED1 GSE1 HDAC1 SRPK1 CEP85 ANXA7 PAFAH1B1 TXLNG TXLNA TBC1D31 TMEM131 ATP6V1D DBI 24.426709546580838 POLR2B ZMYND8 KDM1A CCNC HNRNPD MED24 MED29 SNRPB TUBB CALM1 INTS9 MAP4K1 GTF2B JADE1 PHRF1 24.3944898474767 BYSL HERC1 CSNK2A2 PYHIN1 SMARCA4 PARP1 AIMP2 ZNF330 TNPO1 CLEC4D COPS5 RPS24 KLHL6 TSR2 ZNF212 JADE1 BMX BHLHE40 24.363399531551607 CEP128 CCDC102A ECH1 NFATC3 CEP85 ANXA1 PAFAH1B1 GID8 TXLNG TXLNA AP2B1 TTF2 TBC1D31 GNB1 DDX47 HDAC1 PPP2R3C 24.35367728369922 CCR1 ORM1 HP ANXA1 ARG1 CSTA S100A9 S100A8 GDPD3 CD74 ANKRD22 CKAP4 S100P 24.346665796907565 RELA CSNK2A2 ABCA1 ANXA1 PIAS3 HDAC1 PARP1 NCOA6 RBM14 SRF DNAJA3 MAP4K1 DYNC1H1 KDM1A IKBKB ING4 GTF2B NUFIP2 COMMD6 TIMM8B 24.31928244629448 KRTAP10-8 FAM110A FRS3 ZSCAN21 ZNF330 MLLT6 GNE C2ORF68 UBL5 JOSD1 ZNF696 ZNF319 NUFIP2 RALGDS PDIA4 ASPSCR1 24.296049906303015 PAX6 PAXIP1 NCOA6 GSE1 ZNF148 SMARCA4 CTCF ARID1A GLTSCR1 BRD7 DNMT1 ARIH2 EP400 RING1 KDM2B MLLT6 ELMSAN1 MARCKSL1 24.16612736205139 LGALS3 MYL6 PAXIP1 TXN PYHIN1 PARP1 DDOST IGBP1 CKAP4 ITGB1 DSTN SLC25A5 SYPL1 GEMIN4 GOT1 PTPRA DBI 24.05099156926739 ARID4B ZNF330 SIN3A CSNK2A2 HDAC1 SMARCA4 GATA2 ARID1A ETS1 PARP1 BCL11B TGIF2 MXD3 KLF10 SRPK1 DYNLT1 PAFAH1B1 23.855333393527275 ATP6AP2 ATP6V0E1 RMDN3 PDIA4 TCTN3 CKAP4 CLEC4D RAB9A ATP6V1D UBAC2 GDPD3 IST1 S100A8 RANGAP1 S100P 23.77178100370523 TLE1 ACTB ANXA7 HDAC1 GATA3 GATA2 SIN3A PARP1 ERH PFN1 ALG13 RUNX3 DAZAP2 SBF1 BTBD2 23.74199242534983 KRT40 ZSCAN21 GNE C2ORF68 GATA2 KDM1A THAP11 JOSD1 ZNF696 TXLNA TXLNG ZNF319 ZNF317 SNRPB FASLG COMT SPG7 SLIRP 23.738941897343484 PSMD4 AHCYL2 ACTB GAPDH MDH2 AIMP2 RPA1 XPC CBX3 AP2B1 ABL1 PSMB10 PSMD7 CCDC92 MED24 PDIA4 PTPRA S100P 23.622306931199184 DNAJB11 ECH1 PAXIP1 EDEM1 PDIA4 TCTN3 CKAP4 BRD7 CLEC4D DDOST ADA CD74 SEPT9 MAT2B SIMC1 23.575659537725052 CNOT3 KDM1A TRRAP ELP3 INTS9 ZFP36 CNOT11 TMEM131 
RNF214 CEP85 ZC3H7A SAMD4B KIAA0355 CAPRIN1 SRRM1 DPY30 23.570620483092753 AQP3 DNAAF5 C16ORF58 TELO2 TBC1D9B C14ORF2 TUBB HNRNPH1 WDR11 COX6A1 SLC25A3 SPG7 NDUFB3 AGPAT2 23.441872793469827 NOTCH2 ZSCAN21 STX6 RAB9A CKAP4 ZNF696 ZNF317 E4F1 ZNF664 EEF2 PTPRA SIN3A KIAA0355 CRKL ZMYND8 ATP2A2 23.176542122147268 TBK1 TFG CDC37 TXLNA TXLNG MTOR RRAGD EDEM1 MAP4K1 RNF214 SAMD4B ZFP36 KIAA0355 TICAM1 IRF1 S100P 23.127158306389877 TRIM63 GATA3 RING1 ING4 JOSD1 CKB ILF3 RBM14 UBE2I PIAS3 SRF UBE2G1 KLHL36 AKR7A2 NUFIP2 PDHB NDUFA1 23.094161882115614 MYO9B ACTB ACTG1 PFN1 CALM1 MYL6 CALM3 MYL6B PCBP1 BRD7 NUP214 MOAP1 ARF1 HLA-DPA1 23.037291279391198 DYNLL2 EWSR1 PCBP1 HNRNPH1 PAXIP1 NCOA6 ETS1 TCEB2 DYNC1H1 PAFAH1B1 DYNLT1 KDM1A RNF214 CDR2 22.950992094490754 PTGES3 EWSR1 SMARCAL1 CDC37 MAP4K1 ABL1 LEO1 ZNF330 NOL9 ELP3 WDR11 SARS CLEC4D TTF2 SBF1 SLIRP 22.929939493409215 SVIL ACTB GAPDH DSTN PARP1 RNF144A HMGN2 SEPT9 MIF4GD PAXIP1 CALM3 CALM1 GNB1 DYNLT1 22.922724785261654 EPB41L3 EIF4A3 SNRPB SRRM1 CSNK2A2 RNPS1 GID8 TAF4 DPY30 PAXIP1 EEF2K FAF2 RNF144A AP3D1 AFF4 ELMSAN1 22.911035474216124 FHL3 FAM110A GATA2 AIMP2 SUGP2 ZCCHC14 ZFP36 SNRPB UBE2I CREB1 ANKHD1 TJAP1 SLIRP 22.805753645852427 TNFAIP3 ANKRD13D TICAM1 RNF216 CARD11 IRAK2 PCBP1 UBE2I RCC2 FAF2 ATP2A2 IKBKB MTOR ALDH9A1 22.803687847515796 PLS3 ACTB PAFAH1B1 ANXA5 EWSR1 SF3B2 PYHIN1 HNRNPH1 IARS SEPT9 PIGT AIF1 ALDH9A1 22.688545400903262 NDUFA5 EWSR1 VDAC1 PDHB C12ORF65 SLIRP MDH2 DAZAP2 TCTN3 MAP4K1 DNAJA3 COX6A1 NDUFB3 NDUFA1 22.677008170066358 POMK XYLT2 HDAC1 ATP5J ATP5E GHITM PGRMC2 TCTN3 PIGT DGAT1 RTN3 ATP2A2 DDOST PSMD7 DYNC1H1 QSOX2 GPR114 22.64634993302088 IQGAP3 HERC1 IST1 KCTD5 TUBB BRD7 PARP1 MYL6 CALM3 MYL6B CALM1 BCL11B 22.49127909969904 TRIM56 ZFP36 ALG13 ZC3H7A PCBP1 ILF3 PARP1 HNRNPH1 TICAM1 FAF2 ING4 ALDH9A1 22.48205888708661 FOS DDIT3 SMARCA4 BRD7 ARID1A EP400 PAXIP1 NCOA6 ATP5I CALM1 MYL6B KDM2B ABL1 UBE2I ANKHD1 22.435308433675367 ANXA7 SEMA5B ACTB GAPDH ANXA1 RPA1 PDHB COPS5 PCBP1 PSMB10 
APLP1 EDEM1 SUMO3 CENPB 22.412284808998795 VPS33A TELO2 PCBP1 SRPK1 TMCC3 AP3D1 ELP3 RAB9A TBC1D31 ALG13 TTC21B UVRAG CNOT11 ALDH9A1 22.401184151315004 PDCD6 CRKL TFG IST1 BAG4 TXN PCBP1 PARP1 HNRNPH1 IGBP1 ANXA5 QDPR ARID1A 22.39876501455635 ZUFSP ACTB GAPDH TUBB FNBP1 MTOR USP11 DYNC1H1 RPA1 ARG1 CSTA UTRN GATA2 DSCR3 CTCF POLE4 22.350023063177872 NFIA KDM2B GATA3 PAXIP1 GATA2 GSE1 RPS24 RRS1 PARP1 ZNF148 ARID1A ELMSAN1 DPY30 SON CREB1 22.241979826524467 MED17 TRRAP MED24 CCNC HNRNPD PARP1 SMARCA4 MED29 INTS9 WDR33 FBXO21 TBC1D31 22.17787793942259 VAMP3 ZSCAN21 C12ORF65 RMDN3 STX2 SNAP23 STX6 STX10 RAB9A LHFPL2 MLH1 CD81 COMT SYPL1 BET1L 22.1206301728684 PSMD2 PSMD7 CCDC92 APEH VPS9D1 HDAC1 U2AF2 CALM1 AP2B1 PARP1 AIMP2 FAF2 ZFAND2B SFSWAP 22.06403313517576 LEMD3 ACTB PTRH2 CBX3 ZNF330 ILF3 RBM14 PARP1 CKAP4 RMDN3 TCTN3 STX6 RAB9A CLEC2D 22.01178243677922 LRIF1 FEZ1 PFN1 PARP1 BRD7 CBX3 ANXA7 APLP1 ANXA1 S100A8 DSCR3 DYNLT1 SPG7 21.99414021892283 GPR45 DNAAF5 C16ORF58 TELO2 MTOR TUBB TBC1D9B NBEAL2 C2CD2 JAK1 TM9SF4 GEMIN4 FBXL12 TNPO1 DCK SPG7 21.964211839576638 KIF1B ACTB PAFAH1B1 UBE2I SCRIB HERC1 CSNK2A2 PCBP1 HMGB2 MIF4GD CALM3 CALM1 ZNF217 RAB9A 21.91274218826615 TJP2 ACTB TTF2 PFN1 SRPK1 DDX23 ARHGAP17 FNBP1 SCRIB PIGT CD74 GTF2H5 GOT1 H2AFJ 21.874809291831554 RPS27A RPL31 EWSR1 HNRNPA1 RPL26 E4F1 RPS24 PARP1 HDAC1 PAXIP1 DAZAP2 DOCK9 ZC3H7A CRY1 21.744976403500328 TSC1 FAM110A CDR2 FRS3 PFN1 DAZAP1 SH2D2A EDEM1 ARF1 RAB9A STX6 IKBKB JAK1 KDM1A 21.740965021284055 WTAP ACTB PDHA1 TUBB SLIRP TXN PYHIN1 THAP11 IFT20 SH3BP5L CA4 ANXA1 HNRNPH1 CSTA 21.73765112608427 JAK3 CD81 GTF2B MAGED1 RNPS1 JAK1 IL2RB ANXA1 DCK CRY1 WDR45B AP3D1 DNAJA3 21.730823127708998 RICTOR TSPYL1 MIF4GD RPL26 MTOR TELO2 HNRNPD IKBKB TUBB STK38 GNB1 ARF1 STX6 21.71815736971932 DVL3 TFG TSPYL1 ZSCAN21 PCBP1 PHF19 KDM1A PLAGL2 ZNF696 PPP1R16B ZNF319 BHLHE40 DDX54 21.7036238713553 ANKRD28 HNRNPR FNBP1 PFN1 RTN3 MAGED1 PRPSAP1 CTCF PAFAH1B1 LEO1 CD74 AP2B1 AGPAT2 21.7026987332619 HMGN1 
KLHL36 HMGN2 PARP1 XPC PFN1 ANXA5 CALM1 ATPIF1 PDIA4 JADE1 SUGP2 MAP4K1 LCP2 PLAGL2 21.679038499034174 NPTN TSPYL1 DNAAF5 TELO2 MTOR NBEAL2 RAD17 HEATR5B GEMIN4 FBXL12 ATP2A2 TNPO1 SLC25A5 SIMC1 21.665706703251033 LZTS2 SHMT1 CEP85 ZFP36 ALG13 WDR33 KIAA0355 SAMD4B DUSP5 STX6 BMX RANGAP1 CCNC SBF1 PHF19 SPG7 21.657324992346748 CLTA ACTB AP2B1 GAK HEATR5B COPS5 TFG ARIH2 IST1 PYCR2 TMEM43 STX10 AP3D1 CLEC4D ELMSAN1 21.61692909717344 PIAS4 ZNF330 PARP1 ETS1 UBE2I HDAC1 CBX3 YEATS2 SMAD7 HNRNPH1 TICAM1 SIMC1 21.616067982348312 NIPSNAP1 ACTB COX7C VDAC1 EEF2 PDHB MDH2 CALM1 PDHA1 UQCRQ ARHGAP17 TCTN3 DBI 21.573527581064123 SKP2 WDR59 CDKN1C COPS5 IKBKB MTOR SMAD7 MLH1 AP5B1 ANAPC1 SLC9A3R1 MMP9 RBL2 CRY1 PFDN5 21.554456075506707 NOL10 ZNF330 PARP1 H2AFJ RPL31 ADARB1 IMP3 SRSF5 EIF4A3 RPS24 JADE1 DDX47 DDX23 21.52275062308499 PSMC1 PSMD7 CCDC92 MIF4GD LEO1 KDM1A HDAC1 U2AF2 AKR7A2 PIAS3 ATP6V1D DPY30 CRY1 SFSWAP 21.50193674447883 MMS19 PRPSAP1 COPS5 MAP4K1 ABCF1 EDEM1 ARF1 TMEM43 QSOX2 ORAOV1 ELP3 AP2B1 MLH1 RANGAP1 POLD2 MUTYH 21.469832097503673 KLF15 RPL39 CTCF GSE1 ZNF148 ZNF330 BRD7 CAPRIN1 DDX24 KDM2B RING1 ZNF319 EP400 TAF4 21.46565442294332 MAVS PCBP1 RMDN3 FAF2 RAB9A IKBKB MAP4K1 PTRH2 WRNIP1 ABL1 TICAM1 GNB1 RNF34 21.44890130293635 EIF3I SRRM1 PAXIP1 PARP1 CTCF PYHIN1 HMGB2 TRIP12 EIF2S3 CD74 CD81 CAPRIN1 DDX47 21.384501218150543 FTL SMARCAL1 TSPYL1 GATA3 FAM43A THAP11 DUSP5 AP3D1 ELP3 TBC1D31 SBF1 ASPSCR1 UQCRQ DDX47 SBK1 21.362369535057887 SERPINB4 CCNC S100A9 PIGT S100A8 GOT1 COPS5 MED24 CCDC92 BMX UBAC2 HLA-DPA1 PFDN5 21.336770871113693 PLSCR1 CRKL TFG EWSR1 ILF3 CRY1 DAZAP2 FRS3 SLPI MFNG NUFIP2 SPG7 ADCY7 21.259060954991394 GRN AHCYL2 CRKL SLPI ZSCAN21 GNE FAM43A INTS9 ALG13 HK3 CRY1 NUFIP2 21.22211596637735 GTF2E1 ZNF330 PARP1 EWSR1 GEMIN4 ERH DDX23 SRRM1 CBX3 GTF2B SARS IKBKB RPUSD2 GTF2H5 21.171061474785397 SDCCAG3 KDM1A TTF2 YEATS2 CEP85 KIAA0355 SAMD4B NUP62 RBM14 HNRNPH1 PAFAH1B1 AP2B1 21.148014995146898 UQCRB COX6A1 COX7C ACTB PSMB10 PDHB VDAC1 
DDOST UQCRQ GNB1 SPECC1L CALM1 C14ORF2 21.102952273421558 RNF11 ANKRD13D PSMD7 S100A9 S100A8 TXN RNF216 IKBKB MYOF PDXP UBE2G1 AP2B1 21.085367560477668 RAI14 ACTB RPA1 VDAC1 FEZ1 PCBP1 CALM3 STX6 PACSIN1 ASAP1 SEPT9 21.049384049475936 TPM3 CCDC102A CALM1 U2AF2 ACTB PDHA1 VDAC1 CKB SEPT9 PDIA4 IMP3 PAXIP1 KDM1A KLF10 21.036711119741426 CTR9 PYHIN1 PARP1 LEO1 TCEB2 ZNF330 AFF4 DDX23 SNRPB AKR7A2 BMX TMEM131 TJAP1 21.03524615609842 TNPO3 ST6GAL1 FNBP1 TNPO1 PARP1 RBM14 HNRNPH1 HDAC1 PTPRA MFNG C16ORF58 S1PR1 21.03340396317452 PIGT IL1RN TCTN3 CSTA S100A9 S100A8 ARG1 PDIA4 CKAP4 GID8 CLEC2D 20.973819332734642 AKAP9 KDM1A SF3B2 SNRPB RBM14 HDAC1 FNBP1 CALM1 PFN1 AP2B1 SAMD3 MAGED1 STK25 VPS9D1 20.932353161307823 DYNC1I1 FAM110A PFDN5 GNB1 DYNLT1 PAFAH1B1 DYNC1H1 CCDC64 CALM1 ANXA7 20.891372724232568 RAD51 RPL31 CTCF PFN1 EVL ABL1 CSTA BRD9 RPA1 ELMSAN1 EP400 UBE2I IGBP1 DNAJA3 20.857089875489 TIPRL EIF4A3 TFG FUCA1 IARS FASLG ZC3H7A DYNLT1 CREB1 PUF60 XPNPEP1 IGBP1 20.697447240584054 TNRC6A CEP85 ZFP36 ALG13 ZC3H7A CNOT11 KIAA0355 ARID1A ARIH2 NCOA6 TXLNG AP2B1 AFF4 KDM1A ANAPC1 TNPO1 20.670589778379085 CCNA2 HELB RPA1 ELMSAN1 GATA2 PARP1 COPS5 CDKN1C PRKCQ SLBP CALM1 RBL2 20.55594649417474 MAGEA9 RAD17 SUGP2 STX6 ELP3 TTF2 SARS ARHGAP17 CNOT11 DDX24 TAF4 20.508258250814002 COPA SMARCAL1 RPA1 SF3B2 PYHIN1 USP11 PDHA1 SEPT9 CD74 WDR11 ZNF217 RTN3 BET1L 20.461119166972882 PSMC4 PSMD7 CCDC92 RAD17 ZFAND2B PAFAH1B1 ZFAND5 TTF2 SEPT9 CRY1 USP11 PFDN5 20.45361068395323 TPM4 ACTB CKB ILF3 COPS5 HNRNPH1 TELO2 CALM1 RNF144A RAB9A CLEC4D PACSIN1 KLF10 20.37617733405232 SULT1C4 TTF2 SBF1 ANAPC1 MAGED1 WDR59 HPS3 HPS6 NBEAL2 ANKHD1 RAD17 WDR11 20.3749152650239 CCDC88A SRSF8 PAXIP1 PTPRA CALM1 DYNC1H1 AP2B1 NUP62 ANXA7 ANXA1 STX6 S100P 20.36680311684649 IPO4 MYL6 AIMP2 EEF2 PARP1 ZNF330 TNPO1 CLEC4D PAFAH1B1 ZC3H7A 20.351428249792228 IQCB1 ACTB PAFAH1B1 SLC25A3 GAPDH TUBB CKB ECH1 CALM3 CALM1 GEMIN4 IRAK2 QDPR S100P 20.342280376611587 BRPF3 JADE1 ING4 CSNK2A2 SIN3A DDX24 BAG4 
CRACR2B CBX3 YEATS2 SUGP2 USP11 CENPB 20.341650927239456 NDC80 HERC1 CEP85 HNRNPH1 TXLNG TXLNA AKR7A2 TBC1D31 ATP6V1D IFT20 NUFIP2 SPG7 20.32731188478642 SNX6 ZMYND8 TSPYL1 CSNK2A2 HERC1 PFN1 CALM1 SHMT1 APEH FAM43A STX6 RAB9A 20.302484292894984 CDK6 CDC37 UBE2I DDIT3 SLBP NFATC3 CDKN1C WDR33 ZFP36 CA4 KLF10 RBL2 LPIN1 20.17529612617826 CERS2 ATP5J ATP5I ATPIF1 MICAL1 TCTN3 HNRNPH1 RFTN1 RAB9A SNAP23 TRIP12 PTPRA 20.17008364092812 POLR2H CCNC MED29 CSNK2A2 PYHIN1 PARP1 INTS9 PHRF1 IKBKB GTF2B UVRAG 20.16139702054575 EIF4E2 FAM110A CDR2 MAGED1 ZFP36 ZC3H7A RNF214 HNRNPD ARIH2 20.153030956967186 ABI2 ACTB ACTG1 RNPS1 MED29 IFT20 NUP62 BAG4 ABL1 SNAP23 PFDN5 KIAA0355 20.134840484107603 MED15 TRRAP MED24 CCNC HNRNPD EWSR1 NCOA6 IGBP1 MED29 ELP3 INTS9 MLLT6 20.132178021128354 PLIN3 EIF4A3 CCNC PIAS3 VDAC1 C16ORF58 RAB9A GOT1 WDR4 ATIC TMEM43 PARP6 IGBP1 20.098736137725858 UBE2D3 HERC1 RNF216 ARIH2 HNRNPH1 RING1 KLHL3 UBE2G1 STK25 RNF34 FBXL12 XPNPEP1 20.096389228605318 TRIM54 CCNC FAM110A CDC37 UBE2I KDM1A CALM3 JOSD1 TJAP1 VPS9D1 20.0779340675615 CEP162 KDM1A DNAJA3 PYCR2 RPA1 ACTB TTF2 YEATS2 TBC1D31 TXLNA TXLNG GAK 20.05649072924457 CEP131 PFN1 ATP5J RNF214 CEP85 SAMD4B CALM3 AP2B1 KDM1A AKR7A2 TBC1D31 PPP2R3C 20.042434173433982 ATP2B1 VDAC1 CKAP4 STX6 ITGB1 RAB9A CALM3 CALM1 HNRNPH1 ETS1 CLEC2D 19.97396532413994 MIS12 HP AKR7A2 HERC1 CBX3 TJAP1 PTPRA CEP85 MRPL49 MED24 SF3B2 S100P 19.97052951089356 OASL HNRNPR SRSF5 RRS1 SF3B2 DDX54 DDX24 HNRNPAB CSTA TM9SF4 GEMIN4 ZNF317 19.96045703163332 MTUS2 FAM110A TFG JOSD1 ZNF696 RASAL3 RPA1 AFF4 SPG7 SLIRP 19.92461931546622 EDC4 TRRAP WDR37 YEATS2 CRKL MRPS18C RPA1 AP2B1 PDHA1 ZFP36 BAG4 ABCF1 19.91877567423529 KIF7 ZFP36 KIAA0355 SAMD4B TBC1D31 NRD1 PFN1 NUP62 TNFAIP6 RBM14 HNRNPH1 19.910107659171956 C14ORF1 TFG ANXA1 ANXA7 PFN1 S100A8 SNRPB BTBD2 CREB1 CD74 19.89655996319854 HOXA1 FRS3 SLPI GNE RALGDS UBL5 CRELD2 KDM1A ALG13 BAG4 SNRPB BUD31 19.87378424312189 IRAK1 CDC37 IKBKB EEF2 MAGED1 TELO2 GAK IRAK2 HPS6 PFDN5 DNAJA3 
CLEC2D 19.842559749010086 RFC5 ZNF330 PARP1 PCBP1 RPA1 RAD17 ITGB1 GTF2B CCNC PPP1R16B 19.822317856775523 RCOR3 KDM1A GATA3 GATA2 GSE1 ZMYND8 HDAC1 ZNF217 ALG13 PHF19 TXLNA CDR2 MXD3 19.81517650315358 NUBP2 TTF2 TUBB BRD7 ARID1A S100A9 PIK3R4 UVRAG MRI1 IGBP1 FBXL12 19.759847321825347 DNAJB6 RRS1 PARP1 DNMT1 HNRNPH1 NOL9 RPUSD2 BAG4 CAPRIN1 DYNLT1 GAPDH ASAP1 19.756906236101734 ATP6V1B2 ACTB TFG PYHIN1 RMDN3 STX6 RAB9A ATP6V1D COX6A1 EEF2 IARS ATP2A2 19.72322397634765 S100A2 THAP11 ILF3 GAPDH COPS5 TCEB2 PIGT MED29 CCDC102A PAXIP1 CDR2 ARHGAP17 19.716122884436086 COX5B COX6A1 COX7C PDHA1 DNAJA3 PARP1 PSMD7 CDR2 BHLHE40 RPLP2 AP2B1 19.68259725695752 SMURF2 PIAS3 UBE2I PARP1 AIMP2 RBM14 RUNX3 SMAD7 DAZAP2 SLC25A5 UBE2G1 19.678378445752724 ARHGAP17 WDR49 COMT ACTB CKB PFN1 SLC9A3R1 FNBP1 EWSR1 ABL1 PACSIN1 19.644446990927882 LASP1 ACTB U2AF2 COPS5 TFG DAZAP2 BHLHE40 BAG4 PFN1 CD81 SH2D2A DNMT1 19.571676583754964 FOXK1 AHCYL2 SCRIB RNPS1 ZNF330 SIN3A HDAC1 RBL2 EEF2 SRF POLD2 19.564399741835985 GOLGA6L9 FAM110A CDC37 RNPS1 ZNF696 GTF2B TJAP1 STK25 SLIRP 19.56270124649877 ARL13B SCRIB STX6 ITGB1 RAB9A PHACTR4 UBE2I TMEM43 APEH SLFN13 GEMIN4 STK25 CMTM7 19.561747471319276 SMAD9 KDM1A DNAJA3 ACTB DSTN E4F1 SMAD7 MED24 MED29 DPY30 TRIP12 METAP1 19.55623251819622 CCDC101 TRRAP WDR37 YEATS2 ACTB ACTG1 PAXIP1 NUP62 TNPO1 BUD31 19.509816900695025 MYH14 ECH1 KLF10 MYL6 MYL6B CLEC4D DNMT1 HDAC1 USP11 PSMB10 19.487152882623747 TF CD81 HP ORM1 FNBP1 GOT1 ECHS1 RMDN3 PGRMC2 SRPK1 PIGT 19.485295819026575 MBIP TRRAP WDR37 EWSR1 CALM1 YEATS2 ACTB POLE4 CCNC S100P ETS1 TXLNA MOAP1 19.482044221883875 CCT8L2 TTF2 HMCES THAP11 YEATS2 CNOT11 RAD17 IKBKB LEO1 SIMC1 TRIP12 19.46407309046348 FAHD1 CFL1 PCBP1 DAZAP1 MDH2 ANXA1 GOT1 PUS1 PAXIP1 MIPEP NUDT9 19.455382939166878 CLIC1 ACTB U2AF2 EWSR1 HNRNPH1 ADA PDHB CLEC4D TOR1A NUP62 19.438550206093833 MKRN3 FAM110A MIF4GD SLPI MAGED1 APEX2 UBE2I SON VPS9D1 19.40573560655779 MTA3 ZNF296 BCL11B ZMYND8 KDM1A GATA3 GSE1 SIN3A PARP1 HDAC1 BAG4 DPY30 
19.397856450624943 KANK2 PARP1 SEPT9 PLAGL2 AP2B1 ZNF212 ZFP36 KIAA0355 TXLNA GEMIN4 C2ORF68 19.39526008815284 NCOR2 GATA3 GATA2 SMARCA4 SIN3A WDR59 HDAC1 UBE2I SRF NCOA6 NRD1 ALG13 19.35512084754303 LGALS1 ACTB U2AF2 SNRPB GEMIN4 PCBP1 IGBP1 SEPT9 PTPRA ITGB1 POLE4 19.331616418935166 EFEMP1 FAM110A PDIA4 ANXA5 SLPI ZSCAN21 BAG4 NUFIP2 E4F1 ZNF696 19.321135702619927 PSMD3 ACTB GAPDH U2AF2 SNRPB DYNC1H1 PSMD7 CCDC92 ZFAND2B CD74 GNB1 CRY1 19.30691752661284 KRT8 NUP93 NUP62 CDR2 CEP85 FAF2 ARIH2 ANXA1 TBC1D31 ALG13 LCP2 19.303735537022803 NVL ZNF330 ILF3 CRY1 RPL31 ADARB1 UBE2I PARP1 PYHIN1 RPS24 TTF2 DDX23 19.300485699250302 YKT6 RMDN3 CKAP4 TCTN3 STX6 RAB9A HNRNPH1 DBI MDH2 FNTA GOT1 BET1L 19.260791230499745 CCND2 ACTB RPA1 DYNLT1 PFDN5 SPECC1L CALM3 ABL1 CBX3 CALM1 CDKN1C RBL2 19.26052864274564 MALL CLEC10A MCEMP1 CLDN9 EWSR1 SFSWAP CKAP4 PGRMC2 ATP5J UBE2I DAGLA WDR33 RFTN1 19.22841133690489 PAPD5 ZNF330 SMARCA4 PARP1 RPL31 ADARB1 SRSF5 SF3B2 ZCCHC14 TCTN3 19.21438172969777 CREB1 ATF1 FEZ1 WDR59 SMARCA4 ETS1 UBE2I HDAC1 TTF2 ABL1 CRY1 TAF4 19.213160396405982 XRN1 CALM1 QSOX2 RPA1 HNRNPA1 MLH1 POLD2 ZFP36 ALG13 ZC3H7A ATP5G2 DDX24 19.193591153777238 L3MBTL2 ZNF296 TSPYL1 TBC1D9B THAP11 HDAC1 RING1 KDM2B CBX3 ZFP36 PFDN5 IRF1 19.17647111909451 KDM6A ETS1 PAXIP1 GATA3 GATA2 SMARCA4 SRF NCOA6 DPY30 YEATS2 FNTA 19.170996167169086 CBLB CD81 HMGB2 CRKL ABL1 ASAP1 TICAM1 GORASP2 CRY1 CARD11 DSCR3 19.156378056606947 TBC1D22B CDR2 ARF1 RAB9A STX6 GDPD3 S100A9 S100A8 S100P 19.152962556050998 BANP AHCYL2 PRKCH LAP2 PARP1 UBE2I HDAC1 SNRPB STK38 PDHB 19.148752393108914 TNIP1 CDC37 IKBKB TXLNA DAZAP2 PIGT AP2B1 GTF2B TTF2 CARD11 HDAC1 19.147769346738343 VPS4B RPL31 NUP214 UBE2I IST1 VPS9D1 CSNK2A2 TUBB UVRAG CALM1 SPOCK2 IGBP1 19.131949136717967 TOP3A TTF2 COX7A2 PDHA1 RPA1 RAD17 PARP1 MLH1 ECHDC2 NAA38 MARCKSL1 19.13076243711087 SPPL2B GHITM SLC52A2 PGRMC2 CDR2 PIGT CLEC2D TMEM43 QSOX2 DGAT1 GPR114 TOR1A UQCRQ 19.12347919588782 LMO2 DDIT3 GATA3 GATA2 AIMP2 DAZAP2 MED24 UBE2I 
YEATS2 NUP62 19.121618184043303 UPK1A ACTB COX7A2 ALG9 S100A9 S100A8 SLC52A2 CKAP4 GDPD3 19.118161453979877 LSM2 ACTB CDC37 ILF3 SNRPB SF3B2 LSM3 NAA38 GEMIN4 DDX23 ARHGAP17 BUD31 19.114296101095956 HIVEP1 ETS1 PARP1 CDC37 MAGED1 BHLHE40 CREB1 GATA3 GATA2 KLF10 19.113472505120797 CEP19 PUS1 SCRIB ANAPC1 RABL2B PPP2R3C SRPK1 INTS9 HMGB2 CREB1 AP3D1 TRIP12 19.096368846649064 CRYAA KDM1A TRRAP GSE1 NCF4 CEP85 DCK GORASP2 PAFAH1B1 RAB24 OTULIN 19.094817039065568 TXN2 HADH TXN MLH1 AP2B1 ILF3 PPP1R16B DCTD DDIT3 CRACR2B ANXA1 C2ORF50 19.057268865970883 GPR17 XYLT2 DNAAF5 TBC1D9B PIK3R4 MTOR HEATR5B DGAT1 GEMIN4 ATP2A2 NDUFB3 SIMC1 19.045182507189242 TECR ACTB SLC25A3 VDAC1 DDOST PGRMC2 PDHA1 CTCF PYHIN1 PAXIP1 ZC3H7A GORASP2 19.03050236564008 RBM12 CSNK2A2 PCBP1 HNRNPA1 SNRPB DAZAP1 IGBP1 ALG13 GNE GATA3 BTBD2 19.024373962959444 ACTBL2 ACTB RNASE3 RPA1 CLEC4D ACTG1 CFL1 ADA BRD7 COPS5 FAM65A CALM1 PHF19 19.005401735020634 TSC2 HERC1 EDEM1 ARF1 ARIH2 SRSF5 CALM1 RAB9A RRAGD GAPDH OTULIN 18.995803666174993 RMND1 COX6A1 PUS1 MFNG PYCR2 SLIRP ECHS1 TMEM43 TMEM109 TMEM5 18.96026247853289 SIRT2 SARS CALM1 KCTD5 RUNX3 ANAPC1 MOAP1 RMDN3 PARP1 SLC25A5 18.931216172323577 TOPBP1 ZNF330 SMARCA4 PARP1 RPA1 SF3B2 MUTYH ABL1 ACTG1 ARID1A TELO2 18.880273131018253 MARCH2 ANKRD13D IKBKB HADH S100A9 S100A8 ALDH9A1 STX6 RMDN3 18.8634091464663 CASP8 TICAM1 CARD11 RALGDS QDPR FASLG IKBKB MTOR PARP1 RNF34 AIF1 18.859793921498206 UBE2L3 HERC1 RNF216 DAZAP1 ARIH2 MDH2 PARP1 RNF144A TRIP12 SMAD7 18.8583894898744 PIAS2 ZNF330 PARP1 UBE2I SUMO3 CREB1 TXLNA ZNF319 PAXIP1 ADA TSR2 18.85500475832615 SNAPIN BLOC1S4 BLOC1S1 TOR1A DOCK9 SNAP23 RAB9A HMGB2 NUP62 S100P 18.831240465115705 UBXN1 CD81 MAGED1 FAF2 PRPSAP2 BTBD2 UBE2I MAP4K1 PAFAH1B1 IGBP1 RPA1 18.82384352352349 MIB1 ECH1 DNAJA3 MAGED1 RRS1 TMCC3 AP2B1 TBC1D31 GEMIN4 YARS RANGAP1 NUFIP2 18.771864505001417 NDUFB10 ACTB SLC25A3 VDAC1 PDHB COX7C MARC1 NDUFB3 NDUFA1 DNAJA3 18.757705337947726 HOOK1 MIF4GD MYL6 ARF1 HNRNPH1 RBM14 UBE2I DYNC1H1 
AP4B1 IFT20 18.687600223483518 ARFIP2 CSNK2A2 SARS CALM1 EWSR1 C16ORF58 IFT20 NUP62 ARF1 DDOST STX6 18.654617427831408 ZBTB48 EIF4A3 HNRNPH1 CTCF DDX54 MRPS18C ZNF317 ABL1 DDX24 CENPB 18.648278262136973 DAPK1 SARNP SH2D2A LRRFIP1 CALM1 PARP1 ABL1 APEX2 DSTN EEF2 CAPRIN1 CNTNAP1 18.625191815768023 CCDC53 KDM1A ACTG1 TXLNA NUP62 IFT20 BLOC1S4 BLOC1S1 HSBP1 CKAP4 18.613707031693966 ATL3 RTN3 ATP5E GPR114 GOT1 EDEM1 DDOST ITGB1 UBL5 TMEM109 18.608491932005382 HNRNPLL PAXIP1 MAGED1 BHLHE40 UBE2I PTBP1 PCBP1 WDR33 CALM1 H2AFJ 18.56529717198281 ANP32B AHCYL2 CD81 EWSR1 CALM1 ANXA5 PFN1 ATPIF1 PDIA4 PARP1 RPA1 DNAJC27 18.54525598807852 PDE4DIP KDM1A MIF4GD S100A9 ZZEF1 CALM1 ASAP1 GDPD3 ZNF696 18.51730224032609 ATP6V0A1 TFG ATP6V0E1 ATP6V1D PTRH2 MARCKSL1 GPR114 RMDN3 SLC25A3 TCTN3 AGPAT2 18.515290446059403 ACTN1 TXN DYNLT1 FAF2 PARP1 CKB CD81 ITGB1 CLEC4D ARID1A 18.490950401234517 SLC2A1 TELO2 PIK3R4 PARP1 DDOST TMEM43 UBE2I RAB9A AGK GEMIN4 SPG7 AGPAT2 18.480387369752872 NSF PTRH2 GORASP2 STX6 STX2 SNAP23 HNRNPAB RAB9A STX10 ASPSCR1 BET1L RAB24 18.435480920682263 UBQLN4 MIF4GD TXN RNPS1 DAZAP2 ATPIF1 ANKRD13D MOAP1 STK25 HDAC1 MLLT6 18.42097874642018 EIF5B JADE1 ETS1 PAXIP1 SRRM1 BRD7 ZNF207 ANXA5 CALM1 PYHIN1 18.41752972112872 ASNS ZFP36 PDIA4 PAFAH1B1 COPS5 IARS USP11 GLRX CD74 XPNPEP1 18.41218533710208 XPO7 ZNF330 NUP93 UBE2I NUP62 IFT20 DSCR3 USP11 AP2B1 18.396392430993856 ARF5 GHITM RAB11FIP3 HNRNPH1 DAZAP1 ARF1 AP3D1 FAF2 MRI1 GOT1 GORASP2 18.37044446705495 WIZ EWSR1 PARP1 PRPSAP1 ZNF330 CBX3 HDAC1 GATA3 GATA2 PYHIN1 18.339639700863625 RHOV SCRIB MYL6B MYOF HNRNPH1 RAB9A ITGB1 MTOR SNAP23 RASA3 18.328622114707695 SEC23B EIF4A3 SRPK1 CLEC4D ITGB1 DDOST ARF1 DAZAP1 SUMO3 KIAA0355 18.328519961304476 ZNF24 ZNF330 ZSCAN21 RPL31 CBX3 PARP1 UBE2I USP11 PPP1R16B APLP1 PRKCQ SFSWAP 18.321824014256123 DDIT4L EIF4A3 TXN BAG4 LSM3 CALM3 CALM1 DBI PUF60 CRLF3 18.301390330688754 CLINT1 ACTB RPA1 EWSR1 CALM1 GAK STX6 RAB9A ARF1 AP2B1 TFG PFDN5 18.242067302980896 CHCHD4 ILF3 SON 
TXN CDR2 GOT1 ECHS1 WDR54 MED29 CRELD2 RAB9A 18.23829564467257 RIPK1 HNRNPR PARP1 RPA1 ANXA1 ITGB1 TICAM1 RNF216 IKBKB TAF4 18.23828550624874 ERGIC3 TCTN3 TMEM109 ANKRD13D CD81 HP UBIAD1 C16ORF58 GPR114 BUD31 18.20184529809352 STX1A CD81 TXLNA CDC37 UBE2I STX2 SNAP23 STX6 STX10 CDR2 CMTM7 ZNF696 18.198325271132198 HARS2 IMP3 SARS TXN C12ORF65 PDHA1 MDH2 PFDN5 GTF2H5 NUDT9 18.187868803564555 NFATC2IP ZNF330 PARP1 HNRNPH1 CBX3 SUMO3 RPL26 TMA7 CALM1 18.18178613831558 ELF2 ZNF330 ZNF148 TRRAP EP400 CBX3 PARP1 CALM1 DYNLT1 18.177610063533027 VAMP2 ILF3 SYPL1 CALM3 CALM1 SNAP23 SEPT9 STX6 STX2 RAB9A 18.174546799272033 PIP CSNK2A2 RING1 USP11 S100A9 TOR1A NFATC3 RAD17 ZC3H7A PPP2R3C 18.14954325162262 DEPDC1B FAM110A NUP93 RAB9A ANAPC1 SMARCA4 TCTN3 RANGAP1 18.148689022614946 STX11 KDM1A FAM110A CDR2 ANKHD1 RNPS1 SNAP23 STX6 STX10 HSBP1 18.07293777403692 CDC20 TRRAP PAXIP1 ANAPC1 CDR2 CTCF HDAC1 COPS5 TUBB IST1 RNF34 18.06537352829875 ATP1B3 ALG9 EWSR1 RPLP2 RPL26 VDAC1 EEF2 ATP5J PAXIP1 TM9SF4 RTN3 CNTNAP1 18.058430968208235 FH GOT1 ECHS1 MDH2 DAZAP1 ALDH9A1 ATIC PAFAH1B1 XPNPEP1 KLHL6 18.054029816552436 ATR TELO2 XPC RPA1 RAD17 ABL1 CREB1 HDAC1 E4F1 IFT20 18.043557300899707 HSF2BP FAM110A CDC37 MAGED1 UBE2I HSBP1 SEPT9 SPG7 18.035869441777375 DNM1 ACTB UBE2I PARP1 FNBP1 IST1 PACSIN1 ASAP1 ANKRD22 18.014966775295967 HCK ACTB CDC37 IRAK2 CRKL ABL1 FRS3 PRPSAP1 PRPSAP2 FASLG RASA3 17.999239629722133 CCNDBP1 FAM110A IMP3 PAXIP1 COPS5 STK25 SPG7 PSMB10 ZNF696 17.997378770344223 RUFY1 COMT FNTA PFN1 ANKHD1 TELO2 BMX AP3D1 TUBB PUF60 DYNC1H1 RAB9A 17.992518965392787 HCST DNAAF5 JAK1 OTULIN TAMM41 AP5B1 GEMIN4 PTPRA LRRC8C MTOR SIMC1 17.98383233661214 VMA21 ATP6V0E1 CLEC4D EDEM1 RMDN3 DAGLA SEPT9 TCTN3 LHFPL2 17.976748177489497 TAB1 HNRNPR CDC37 IKBKB MAGED1 MICAL1 CKB CFL1 CARD11 SMAD7 17.95016160990202 ZNF48 ADARB1 RRS1 RPLP2 ILF3 DDX54 DDX24 SRRM1 ZNF317 TXLNG 17.946900157130642 TLX2 RPL39 KDM2B CSNK2A2 DDX54 DDX24 PAXIP1 GSE1 ARID1A ELMSAN1 17.92469569460188 GOLGB1 KDM1A 
UBE2I CLEC4D PDHA1 ASAP1 ARF1 RAB9A STX6 GAK EWSR1 PFN1 17.92296645714955 STX12 EWSR1 VDAC1 ABCA1 TM9SF4 STX10 STX6 STX2 SNAP23 RAB9A TXLNA EIF2B1 17.916064893873326 ASPM FEZ1 COPS7B MOAP1 CALM3 CALM1 MXD3 CENPB HLA-DPA1 KIAA0355 MTOR 17.907111849332058 KEAP1 DDIT3 BCL11B PYHIN1 USP11 BRD7 EEF2 GAPDH IRF1 IKBKB KLHL3 LSM3 C2ORF68 17.895459931767437 INPPL1 TICAM1 PFN1 ABL1 SUMO3 ANXA1 TUBB HNRNPH1 SLC25A5 SAMD4B 17.89301132233023 RAB11FIP1 TFG NUP62 DYNLT1 RPA1 ASAP1 STX6 RAB9A S100P 17.87551032688699 RAMP3 GHITM C16ORF58 PYCR2 C2CD2 TUBB PTRH2 CKAP4 SPG7 AGPAT2 FBXL12 17.842736731254284 PNMA1 ACTB FAM110A SNRPB BTBD2 ZNF148 MOAP1 VPS9D1 SPG7 SLIRP 17.825485321672883 PPM1B EIF4A3 VDAC1 S100A8 ABL1 IKBKB ANXA1 GAK ECHS1 S1PR1 17.790638061159452 GRAP2 ZMYND8 ZNF319 ZSCAN21 ANAPC1 MAGED1 BAG4 ABL1 FASLG MAP4K1 LCP2 17.74976522305623 MAP1LC3A TBC1D9B MOAP1 SRPK1 HNRNPH1 PARP1 TUBB VPS9D1 AP2B1 17.749404315652754 RMND5A KDM1A RPL31 GEMIN4 PIGT GID8 ZCCHC14 ARID1A PAPD7 USP11 PDHB 17.731747104066947 CTIF AHCYL2 NUP62 NUP214 TUBB MIF4GD SLBP AP2B1 17.731701247949797 PAFAH1B1 HSP90AA4P ACTB DYNC1H1 DYNLT1 CLASP2 ACTG1 ATIC GAK GID8 FNTA PDIA4 17.72957967176349 AMFR TXN EDEM1 FAF2 UBAC2 CALM1 PTPRA UBE2G1 THEM6 ATP2A2 17.72781688378375 ITCH EWSR1 DAZAP1 ARID1A IKBKB UBE2G1 RAB9A PACSIN1 SMAD7 KLF10 17.70929939195824 MED8 CCNC MED29 MED24 EWSR1 GAPDH INTS9 HDAC1 TCEB2 BRD9 17.687433724996705 PAPSS1 CNDP2 ALKBH5 ARHGAP17 DBI ATIC PAFAH1B1 PARP1 EVL 17.68495250384136 NPRL2 CFL1 PCBP1 ANXA7 RRAGD WDR59 RAD17 ARF1 ARIH2 17.67703921298801 VPS18 SRRM1 UVRAG EDEM1 TJAP1 IFT20 AP3D1 RAB9A 17.67040654526595 ALAS1 ACTG1 EP400 C12ORF65 PDHA1 MDH2 SLIRP ECHDC2 HMGCL AGK 17.668826609745818 INCA1 TFG DAZAP2 PFDN5 SNRPB LSM3 FRS3 GTF2H5 DSCR3 ASPSCR1 METAP1 17.66739858163018 ALB ARIH2 HNRNPA1 HP KCTD5 S100A8 GOT1 EEF2K FOXJ3 17.640103658590974 VAMP5 COMT IMP3 SLC9A3R1 SNAP23 STX6 STX2 STX10 TNPO1 BET1L PTRH2 SLIRP 17.621481945212242 ZYX ACTB U2AF2 EVL PUF60 CCDC92 IMP3 KLHL6 APEH ALG13 
17.61515943652794 VASP ACTB ACTG1 PHACTR4 EVL PFN1 ALG13 FRS3 HNRNPH1 SF3B2 17.605763039296725 MINOS1 COX7C PDHA1 UQCRQ PDHB C14ORF2 MED24 LEO1 17.5843438987046 DCP1A TTF2 CRKL KIAA0355 ZFP36 CEP85 SAMD4B PAXIP1 BAG4 YARS 17.579609036915 TBL1XR1 KDM1A GATA3 GATA2 HDAC1 ACTB UBE2I HNRNPD WDR59 ARF1 ABCF1 17.568996843252716 NCOA3 ETS1 ALG13 EWSR1 HNRNPD ABL1 GATA3 GATA2 NFATC3 IKBKB 17.548439981710015 GOPC CCDC102A DDIT3 PTRH2 STX6 RAB9A CLEC4D ASPSCR1 NUP62 CLASP2 17.52217877989635 RALA ANKRD13D ACTB MARCKSL1 ILF3 CTCF AKR7A2 ATP5J CALM3 CALM1 ARF1 17.507233647365528 SH3GLB1 SARNP UBE2I RANGAP1 ARHGAP17 DSCR3 IRAK2 EDEM1 SIN3A GORASP2 17.505892555040685 SPAG9 FNBP1 HDAC1 U2AF2 HNRNPA1 DDX23 LRRFIP1 MAP4K1 RAB9A ABL1 17.502402023603658 EGFL8 ORM1 HP MMP9 HNRNPH1 S100A9 TIMM8B RMDN3 17.498376830412813 SCAMP2 RTN3 TCTN3 RAB9A STX6 SNAP23 ATP5J GNB1 ARIH2 LRRC8C 17.485076415876733 KMT2B ZNF330 CBX3 YEATS2 PAXIP1 NCOA6 DPY30 ANXA7 ANXA1 17.461767487524938 TRIM41 CSNK2A2 RNPS1 ZSCAN21 HNRNPH1 PLAGL2 ZNF696 ZNF319 UBE2I ANKHD1 17.45267074083423 UBE2O CSNK2A2 NOL9 ZNF330 AFF4 DDX54 DUSP5 E4F1 SMAD7 CLASP2 17.42279531113533 LGALS8 APEH PDIA4 ANXA5 ANXA1 HMGB2 ITGB1 TXN PTPRA LRRC8C 17.402842583915255 TRIM66 TELO2 KIAA0355 ALG13 GSE1 JAK1 GLTSCR1 MLH1 USP11 DNAJA3 17.391114150212022 RNH1 RNASE3 PYHIN1 PARP1 U2AF2 THAP11 ZNF696 DSTN FBXL12 17.3763304532466 ZNF644 ZNF330 PARP1 HNRNPR CBX3 GATA2 RPS24 TCTN3 17.376138466020933 GLUL CCNC PIAS3 UBE2I WDR59 ASPSCR1 UBAC2 PIGT ZFAND2B ARIH2 HNRNPH1 TAF15 17.37573524227005 PPP6R1 TTF2 TUBB PFDN5 PRPSAP1 FAM43A MED24 S1PR1 CD74 17.354649530663025 RBM4B IMP3 RRS1 SUGP2 CTCF SON DDX54 DDOST DDX24 SFSWAP 17.33265038164944 LAMTOR2 ILF3 RRAGD MTOR RAB9A BLOC1S1 PDHA1 RING1 17.27176419434903 CEP70 CDC37 ZSCAN21 GSE1 KDM1A ZNF148 DAZAP2 ZNF696 PPP1R16B STK25 17.264800467104696 COMMD4 ARHGAP17 DSCR3 COMMD6 NUP62 SF3B2 ATP6V1D APEX2 17.216393478640946 ZNF622 ZNF330 PAXIP1 PCBP1 WDR33 PYHIN1 HMGB2 ATP6V1D CKAP4 17.195372681096107 MGME1 SOX13 
CDC37 PDHA1 MDH2 DDX54 MIPEP HMGCL RAB24 17.162938734881806 NDUFB9 ACTB COX7C VDAC1 COX7A2 ILF3 FEZ1 NDUFB3 NDUFA1 SPG7 17.16143327826983 USP5 ACTG1 MLH1 RNF216 COPS5 IKBKB HNRNPA1 PARP1 DAZAP2 OTULIN 17.159812883321646 TAF6L TRRAP WDR37 YEATS2 XYLT2 HDAC1 PARP1 PFDN5 HLA-DPA1 17.153055678075496 CDKL3 TTF2 HNRNPH1 GNB1 SRPK1 GDPD3 17.148203603584427 EPHA2 CDC37 EEF2 RNF144A ANXA1 STX6 ITGB1 RAB9A AP2B1 NUDT9 17.141995823664153 FBXL19 CCNC S100A9 MED29 MED24 COX7A2 ATPIF1 HNRNPH1 17.128740863187105 OSBPL8 CSNK2A2 TCTN3 CKAP4 STX6 RAB9A CLEC4D CD74 OTULIN GPR114 17.125577970599593 NOC4L RPL31 ADARB1 RRS1 BRD7 ILF3 DNAJA3 BHLHE40 DAZAP2 ZNF148 17.12409034019394 PRKACB HP TBC1D31 ERH PARP1 DDX23 RNF216 ARF1 DFFB RNF34 17.117328586616598 NUP54 WDR59 IFT20 NUP62 HDAC1 KIAA0355 LEO1 AP2B1 17.10271946575575 CETN2 TTF2 XPC PARP1 HNRNPD CALM3 CALM1 GTF2H5 17.08270282018985 FGB HP ORM1 GOT1 AIMP2 PIGT ANXA7 RING1 XPNPEP1 17.00733208733818 PLEC PRKCH PCBP1 HNRNPA1 PDHA1 ETS1 MOAP1 GOT1 DYNLT1 17.00724666278472 SLC25A4 ACTB BRD7 SLC25A5 VDAC1 COPS5 CLEC2D FAF2 RAB9A HLA-DPA1 16.996327473340266 HIPK2 DAZAP2 SUMO3 UBE2I PARP1 ABL1 HNRNPH1 CBX3 16.9899152942407 ZBTB9 ZNF330 PARP1 UBE2I TSPYL1 GATA3 GATA2 MED29 TSR2 16.983958281243105 NEDD4L DAZAP2 IKBKB HNRNPH1 DDX54 APEX2 ABL1 SMAD7 SLIRP 16.960100644517627 MTX1 GHITM ACTB HNRNPD COX7A2 SRRM1 C16ORF58 HLA-DPA1 CKAP4 DDOST 16.934699250434406 TRIM9 ACTB PTRH2 RTN3 CEP85 ZC3H7A PSMG2 QDPR GNB1 EVL 16.900108799281384 CHD1 ZNF330 PARP1 HDAC1 CBX3 PYHIN1 BMX DOCK9 DUSP5 DDX23 16.884286848921455 SLC25A12 GHITM ATPIF1 PAFAH1B1 TM9SF4 BRD9 BRD7 AGK COX6A1 COX7A2 TIMM8B 16.86944207121782 CXCR4 DNAAF5 JAK1 PTRH2 TM9SF4 WDR11 OTULIN DGAT1 SPG7 LRRC8C MTOR 16.84947160782626 TOMM34 ZFAND5 ACTB CDC37 HNRNPD MRPS18C ATP6V1D SLC16A11 CALM1 PLAGL2 16.84853546956054 KRTAP13-3 MIPEP ZNF330 UBL5 ZNF319 PFDN5 GNE GATA2 16.825851174065047 NEK7 HADH SBF1 NUP214 SLC9A3R1 DPH3 EEF2 CTCF ANXA5 AP3D1 16.81076736954127 ARL1 ACTB PARP1 EP400 TJAP1 ITGB1 BET1L 
GORASP2 16.803924091631114 IRF3 WRNIP1 HNRNPH1 EWSR1 MAP4K1 OTULIN TICAM1 TELO2 RBL2 S1PR1 16.7953850502838 PUM2 ZFP36 ALG13 ZC3H7A RNF214 NUFIP2 SRSF8 HNRNPH1 16.784404873309732 NAMPT CD81 HNRNPH1 U2AF2 HNRNPA1 CLEC4D EDEM1 LEO1 16.783732196123356 C17ORF59 BLOC1S1 CDR2 IFT20 NUP62 UQCRQ HSBP1 RAB9A 16.780217354392438 GABARAPL2 GIMAP6 VDAC1 TXN TBC1D9B PTPRA HNRNPD TSR2 16.777324656444506 CXXC1 DNMT1 PAXIP1 NCOA6 DPY30 YEATS2 HNRNPH1 H2AFJ 16.765373188894834 RYK NUP93 CDC37 PARP1 PTPRA F5 DYNC1H1 AGK 16.763517232608326 MEAF6 TRRAP EP400 PARP1 JADE1 ING4 DPY30 ANKHD1 16.72863180277571 SEC24A TFG EWSR1 ILF3 IGBP1 CLEC4D DSTN ARF1 RAB9A CREB1 16.722669347904347 PCBD1 CCDC102A KLF13 RBL2 GORASP2 TFF3 BRD7 COPS5 RPS24 YEATS2 16.722118550242232 A2M ACTB CDC37 FBXW4 FBXL12 ANXA7 ALKBH5 PAXIP1 RETN 16.711617896269008 APOA1 CD81 HP ORM1 DGAT1 ABCA1 GOT1 QDPR TMEM43 U2AF2 16.710504842795768 FARP1 NUP214 CDC37 SNRPB LSM3 RNPS1 MYL6B DYNLT1 16.70870300017079 DDX19A NUP62 NUP214 ARID1A MIF4GD CLEC4D MRI1 RAB9A 16.706562591021132 PIK3R2 ACTB BRD7 RRS1 TUBB USP11 CRKL ABL1 EDEM1 RNF144A 16.70276799682795 RAB3GAP1 FEZ1 CKAP4 RAB9A GNE PAXIP1 HNRNPH1 AP2B1 16.69880931582648 GNAI2 ACTB U2AF2 HNRNPH1 CD81 GNB1 CYSTM1 RASA3 16.69640197293929 SYDE1 FAM110A PYCR2 HNRNPH1 CALM1 WDR33 16.690827286074203 PLEKHF2 FRMD8 RTN3 MRI1 PRPSAP1 BRD7 PACSIN1 AIMP2 DAZAP2 16.67892069653526 GID8 SPON1 RNPS1 CALM1 PIGT HNRNPH1 PAFAH1B1 PGRMC2 16.67603733156661 DUSP14 TFG MRPS18C WDR59 PYHIN1 PYCR2 OTULIN BAG4 ANKHD1 GOT1 16.673868630948487 PAK4 ACTB DYNC1H1 RCC2 TCEB2 ABL1 SRPK1 STX6 RAB9A 16.67009208289545 AZGP1 CCNC CSNK2A2 S100A8 ORM1 ARG1 TAF15 PSMB10 16.66879271203885 POC1A BLOC1S4 AP4B1 TMCC3 MDM1 HNRNPH1 U2AF2 PFDN5 CKB 16.66335891305705 ALDH3A1 PIGT ARG1 CSTA SRF TXN GOT1 FBXL12 16.642778478791808 ARFGEF1 HNRNPDL DPY30 NUP62 HDAC1 STX6 RAB9A CLEC2D 16.634845113728268 TMOD3 ACTB PAFAH1B1 ACTG1 CFL1 LRRFIP1 CALM1 SF3B2 CLEC4D ERH 16.61727412110016 C11ORF58 MYL6B MAP4K1 DUSP5 PARP1 EWSR1 RANGAP1 
CBX3 BMX 16.610766286640867 ERC1 KDM1A CRLF3 ACTB PFN1 NUP62 EWSR1 BMX WDR4 IKBKB 16.596994217730888 PHF20L1 KDM1A DNMT1 HDAC1 JADE1 YEATS2 XPNPEP1 CALM1 16.59598111004374 FAM134C XYLT2 ATP5J DAZAP2 RTN3 UBIAD1 C16ORF58 THEM6 CRY1 CMTM7 16.590846226017806 C3ORF52 SLC23A2 AGPAT2 CLEC10A MARC1 RMDN3 THEM6 UBAC2 KIAA0195 YEATS2 DAGLA 16.58544993837413 PTCH1 CSNK2A2 MYL6 TCTN3 TCEB2 DYNC1H1 CD74 TMEM131 SPG7 16.584833823961432 USP25 TMEM43 SUMO3 UBE2I ANXA1 GAPDH ASPSCR1 WRNIP1 EDEM1 SON 16.583654088755168 EIF3L HNRNPR CAPRIN1 CSNK2A2 CTCF CD74 EIF2S3 CD81 TTC21B BMX 16.56847792951107 ILVBL GHITM GEMIN4 TBC1D9B UQCRQ SLC25A3 FAF2 TCTN3 WDR45B MTOR 16.555636001206043 CEP97 PFN1 CALM1 AP2B1 RNF214 BTBD2 CALM3 SNAP23 KIAA0355 16.547371349390563 ACAD11 CCNC MED29 MED24 SLBP STK25 CAPRIN1 HNRNPH1 16.547185004519505 LRRK1 CDC37 MAP4K1 GAK S100A8 SH2D2A ABL1 ECHS1 ASAP1 16.546676792140822 DAB1 DAZAP2 MAGED1 BHLHE40 CRKL APLP1 SNRPB PAFAH1B1 CLASP2 16.53908828708178 PACSIN2 ACTG1 PDIA4 EWSR1 ARIH2 PACSIN1 ASAP1 FASLG STX6 SBK1 16.525608114567383 SGTA ACTB U2AF2 CALM1 DSTN SLPI EDEM1 RNF144A TFF3 AASDHPPT 16.51864160126772 FAM129B SHMT1 GOT1 PTPRA HNRNPH1 PUF60 STX6 RAB9A 16.517706094980625 TXNDC5 JMJD8 TXN WDR59 TCTN3 PDIA4 ZNF207 PTPRA 16.493822462293114 HMGXB4 ZNF330 SIN3A CBX3 ZNF296 JADE1 UBE2I UBL5 16.47752056472647 MBNL1 ORM1 HP HNRNPH1 HNRNPA1 PGRMC2 CKAP4 ANKRD22 16.473890123608427 TINF2 HNMT ACTB GAPDH TUBB ANXA5 ADA CKB SARS PAXIP1 16.471585542795882 CTNND1 EIF4A3 ACTB AIMP2 ACTG1 RNF144A STX6 RAB9A PTPRA 16.460484931002252 BAIAP2 KDM1A ACTB ACTG1 PFN1 NRD1 SLC9A3R1 PTPRA CRY1 STX6 16.445225865674477 RUNX1T1 ETS1 ZFP36 HDAC1 DNMT1 SIN3A GSE1 CDR2 16.429460003974558 FGFR1OP PIK3R4 ELP3 WRNIP1 ABL1 TBC1D31 TXLNA SPECC1L RABL2B PPP2R3C 16.421336304176734 ASAP1 ARHGAP9 CRKL ARF1 RAB11FIP3 PACSIN1 PDHA1 ASPSCR1 16.412935077952277 HIST1H2BO ZNF330 XPC PARP1 H2AFJ ING4 JADE1 CD81 16.399303849552872 VPS33B TBC1D31 UVRAG RMDN3 ZC3H7A DYNLT1 COMMD6 STX6 AP3D1 
16.396794686763364 SP110 CSNK2A2 RUNX3 CBX3 RMDN3 AFF4 ANXA7 CENPB 16.36998139259394 NUDCD3 KDM1A UBE2I PARP1 TUBB DYNLT1 PAFAH1B1 DYNC1H1 KLHL3 KLHL6 16.36720488365212 TAB2 PIAS3 UBE2I HDAC1 SIN3A SMAD7 EDEM1 IKBKB HNRNPH1 16.349906534066797 TSPAN4 ITGB1 CD81 HNRNPH1 FRS3 GNE CLEC2D 16.34006896549489 FOXH1 TFG MAGED1 DAZAP2 IARS SCRIB COX7A2 YARS 16.316466337999003 GNAS ACTB CD81 GNB1 CLEC4D RMDN3 CALM1 GTF2H5 ELMSAN1 16.296003760182465 USP21 FAM110A UTRN GATA3 FUCA1 BHLHE40 JOSD1 RNF144A 16.2852731495118 EPC1 TRRAP ELP3 EP400 ACTB HNRNPH1 SRF 16.284772617464178 SNX3 EIF4A3 TFG PAXIP1 GOT1 DAZAP1 IGBP1 RAB9A 16.264200043157967 ATPAF2 HMGN2 PPP1R16B MIF4GD CLEC4D EWSR1 DDIT3 MICAL1 ECHS1 HMGCL 16.247728033207103 HLA-B IL27RA ST6GAL1 TELO2 EDEM1 FAF2 PTRH2 WDR11 CLEC4D C16ORF58 16.2442081272534 DUSP16 CEP85 CNOT11 KIAA0355 DYNC1H1 STK38 ANAPC1 HPS3 RNF34 KCTD20 16.23614677408253 PIP4K2C AHCYL2 ACTG1 CSNK2A2 RNPS1 SCRIB DDX23 AFF4 GOT1 EDEM1 16.23464525562317 CBWD1 TTF2 TUBB FAM43A WDR59 C2CD2 DDX47 METAP1 16.216856238419606 GIPC1 NUP93 EWSR1 DDIT3 GEMIN4 HNRNPH1 PFN1 DOCK9 16.20290775512864 SCLT1 KDM1A BLOC1S1 IFT20 CALM1 PHACTR4 DYNC1H1 HSBP1 COMMD6 16.196967540870038 WEE1 CSNK2A2 EEF2 PCBP1 PAXIP1 GATA3 ADA ING4 MTOR 16.187348851315615 LINC00839 PTBP1 SEPT9 RPLP2 QSOX2 PARP6 PTPRA CALM1 16.17987945542773 LURAP1 DOCK10 CCDC92 SBF1 TSPYL1 UTRN HSBP1 U2AF2 ELP3 16.176639755202395 ACADVL GOT1 INTS9 EEF2K PARP1 MDH2 DNAJA3 HMGCL 16.175080066253535 WDR46 ADARB1 RRS1 SRSF5 DDX24 TSPYL1 CSNK2A2 ZNF317 H2AFJ 16.161542765064663 H2AFJ PNMAL1 HNRNPD ING4 U2AF2 PARP1 TRIP12 SFSWAP 16.160956607601783 PRDX6 VDAC1 PAXIP1 ORM2 ANXA1 PARP1 RANGAP1 GSTO1 16.158663697549933 MED19 GTF2B CCNC MED29 MED24 PARP1 AFF4 DCK 16.132902748781305 SLC6A15 KLHL36 PTPRA CKAP4 PTRH2 STX6 RAB9A AGPAT2 KIAA0355 AP2B1 16.111030568296503 EXOC3 IFT20 PTPRA SUMO3 STX6 RAB9A MLH1 METAP1 16.079466892295642 TGFBR1 CSNK2A2 IKBKB AP2B1 FNTA PPP1R16B USP11 SMAD7 BTBD2 FBXL12 16.05080252324916 GDI2 ACTB U2AF2 
RRS1 PARP1 TUBB RAB24 RAB9A 16.049780308592744 MYOG ZCCHC14 FAM110A SRF SMARCA4 MLH1 16.04629491970636 ZBTB11 ZNF330 RPL31 ADARB1 RPL26 SRSF5 E4F1 SMAD7 16.032447140002745 TMEM120B COMT ATP5J C16ORF58 TCTN3 STX2 CD74 CLEC2D 16.0317118160904 NEK6 KCTD5 NUP93 CDC37 BHLHE40 FOXJ3 16.02310707129147 STAM2 IST1 CLEC4D DAZAP2 JAK1 PARP1 LCP2 STX6 16.003244726448333 POLE3 APEH YEATS2 POLE4 RAD17 RPA1 POLD2 ARF1 WDR45B 15.979654706376744 CCNH CCNC PTBP1 PDHB RPA1 TFG BCL11B GTF2H5 15.97746926960866 MELK FAM110A TXN SMAD7 PSMG2 15.968991066438525 KRT38 KDM1A CCDC102A FAM110A TXLNA TXLNG THAP11 CREB1 15.952221449386027 NDEL1 ACTB PAFAH1B1 DYNC1H1 DYNLT1 RBM14 AIMP2 UTRN TUBB SFSWAP 15.940494341495826 NCAPD2 PSMD7 PAXIP1 PARP1 SNRPB SF3B2 HDAC1 CD74 15.940093981638165 MCL1 VDAC1 BAG4 EDEM1 DBI SEPT9 ARG1 TXLNG S1PR1 15.940000041295681 RNF5 S100A9 EDEM1 S100A8 C16ORF58 AGPAT2 15.897820053267852 RAB11FIP5 JMJD8 NUP93 RAB9A STX6 ERH RNPS1 ECHS1 15.896089608622601 CLU PAXIP1 ORM1 HP HLA-DPA1 KDM1A TOR1A MMP9 KIAA0355 15.884184984777965 RIOK2 JADE1 GHITM CSNK2A2 PYHIN1 PARP1 EWSR1 SRPK1 H2AFJ 15.883232700331392 TNIK ACTG1 PAFAH1B1 DYNC1H1 FNTA CCDC92 GSE1 IKBKB 15.87882932139874 ZCRB1 RPL31 ADARB1 RPL26 CTCF SRSF5 RRS1 SF3B2 SF3B6 SNRPB 15.860555368201913 IQGAP2 ACTB PARP1 MYL6 CALM1 IKBKB MAP4K1 CD74 15.85554818844333 ARFIP1 ACTB PFN1 ECH1 RAB9A ARF1 DYNLT1 SARS SHMT1 QDPR 15.849079847984552 PSMD5 ZMYND8 PSMD7 CCDC92 BTBD2 SON PARP1 PSMB10 15.839002914621174 SLC39A7 ARF1 ARIH2 DAZAP1 HNRNPH1 TCTN3 HDAC1 NUDT9 15.825626024309042 ROR2 ALG13 MAGED1 DAZAP2 BHLHE40 PTPRA STX6 RAB9A 15.806671809834366 PLEKHA5 CCDC92 PFN1 CALM1 STX6 RAB9A SH3BP5L DYNLT1 SLC25A5 15.787227371909902 C19ORF52 GEMIN4 CRACR2B ATP6V1D PDHA1 FAF2 PARP1 AGK 15.76946839274223 PIK3R3 WRNIP1 FRS3 RBP7 SH2D2A MICAL1 ORM1 RING1 15.761089433737174 VPS11 KDM1A CCDC92 SCRIB TICAM1 CSTA UVRAG AP3D1 RAB9A 15.750710665271558 ENV PSMD7 VDAC1 ATP2A2 CKAP4 TOR1A CALM1 TMEM43 EVL TIMM8B 15.73976617927808 BRPF1 ZNF330 ZMYND8 
CSNK2A2 RING1 JADE1 ING4 DYNLT1 DDX23 15.73751855547109 CDK19 CCNC MED29 MED24 CTCF SMARCA4 CDC37 STK38 15.729987427345261 PUS7 ZNF330 CBX3 ACOT13 EWSR1 RAD17 DUSP5 S100P 15.727534655297669 WAS CRKL FNBP1 LCP2 PACSIN1 UBE2I RPA1 15.723468157074976 COMMD1 TTF2 DSCR3 COMMD6 NUP62 HMGB2 TCEB2 15.721644696551733 USP47 FNBP1 HDAC1 LRRFIP1 IST1 PUS1 DUSP5 PRPSAP2 15.701963920129039 HIST2H2BF ZNF330 CD81 HMGN2 SRSF5 PRPSAP2 C2ORF68 GID8 15.701341664354155 SMN2 GEMIN4 SNRPB SRSF5 BRD7 MAGED1 SNU13 PDHA1 15.69748755538086 IKBKAP APEH PFN1 DPH3 BMX IKBKB SF3B2 DYNC1H1 ELP3 ZNF217 15.694961512113478 SYT2 AHCYL2 COPS5 COPS7B EIF2B1 CALM1 AFF4 CENPB LEO1 15.68962213204478 NDUFB1 C14ORF2 MED24 EWSR1 NDUFB3 NDUFA1 LEO1 15.676376867247177 TSPAN15 PIGT QSOX2 HNRNPH1 TM9SF4 DGAT1 RTN3 THEM6 S1PR1 LRRC8C 15.66053620538008 CIDEB COX7C C14ORF2 DFFB HNRNPH1 ATP5I DNAJA3 15.649891697563719 ATP2B4 SLC41A3 ATP5J CALM3 CALM1 TMEM5 LHFPL2 RAB9A 15.641630683709483 TAF3 ZNF330 PARP1 CTCF RPL31 DDX23 MICAL1 BHLHE40 TAF4 15.637244950327485 PTPN21 AHCYL2 AKR7A2 CSNK2A2 DDX54 DSCR3 BMX GID8 CEP85 15.625029081217942 TUBB8 IL27RA FAM43A TCTN3 TUBB COPS5 GNB1 PFDN5 15.624669221963435 USP18 IKBKB HERC1 MYL6 CARD11 UVRAG EEF2 15.588648143243995 POC5 DDX47 DDOST RNF214 DYNC1H1 CALM1 AGK CALM3 15.577116519930607 SRP1 DAZAP2 ACTB ACTG1 S100A9 S100A8 15.572086061903365 PPP4R2 GEMIN4 DSTN CAPRIN1 IKBKB SF3B2 SF3B6 ELMSAN1 IGBP1 MARCKSL1 15.56827998584919 NAGK STIM2 TXN GOT1 PIGT HNRNPH1 C2ORF68 XPNPEP1 15.562532280822358 OIP5 THAP11 ING4 MED29 PLAGL2 KDM1A SNRPB NUP62 15.559594736957242 RNF111 SUMO3 UBE2I CREB1 TSPYL1 CSNK2A2 XPC AP2B1 SMAD7 15.557429362211144 CSTF1 DOCK10 RBM14 SNRPB HNRNPA1 DNAAF5 ALDH9A1 PAFAH1B1 15.553688819971903 PLEKHG4 TFG ZNF319 GATA3 ZNF148 ARF1 ZCCHC14 SLIRP METAP1 15.542170910855692 CBFB ETS1 ARIH2 TCEB2 RBM14 ANXA1 RUNX3 BCL11B TAF4 15.536452490355899 F2RL1 CD81 GEMIN4 COPS5 TELO2 TBC1D9B ATP5J C16ORF58 ATP2A2 15.535961294810384 HLA-C IL27RA ST6GAL1 PIGT PDIA4 CLEC4D CKAP4 FAF2 
MTMR12 STX10 15.534114358260656 BCL2 GHITM VDAC1 RTN3 MOAP1 UBE2I HNRNPD PARP1 IKBKB 15.533404440158645 CLIC4 CRKL HADH CLEC4D LSM3 ADA HNRNPA1 PARP1 15.520544839745105 ATXN10 MAGED1 GATA3 ABCA1 SMARCA4 PTPRA TNPO1 GNB1 15.496123999634769 CREB3L1 TRRAP DDIT3 ORM2 COMT UBIAD1 C16ORF58 CMTM7 15.491293611880717 WLS FAF2 RAB9A TCTN3 PARP1 HNRNPH1 15.461829961702833 FBXO38 TRRAP EP400 YEATS2 TAF4 RING1 USP11 ZNF217 15.450550316999559 SMG9 ZC3H7A RNF214 SARS PFDN5 PRPSAP1 PPP2R3C 15.426384951050304 FOXRED2 MIPEP TNFAIP6 DPY30 HERC1 EDEM1 PDIA4 VDAC1 RAB9A 15.425917716032867 SRGAP2 ACTB PFN1 FAM110A CALM3 FASLG S100P 15.422690579936049 VPS45 COPS5 HNRNPD ARIH2 STX10 STX6 RAB9A AP2B1 15.418415901594125 DST ACTB COPS7B CALM1 STX6 RAB9A TXLNA EIF2S3 DYNLT1 15.415364741229673 PFKM APEH SARS CALM1 PCBP1 SUMO3 ELMSAN1 15.4099128323473 SETD2 SCRIB RNPS1 SNRPB RPA1 DDX23 PACSIN1 WDR37 15.406980960943029 MAD2L1 ATP5E TUBB APEX2 MLH1 TSR2 ANAPC1 YEATS2 STK25 15.406252299971804 LACTB CCDC102A CALM1 MDH2 PDHB ECH1 PAXIP1 MARC1 15.402711277620153 MYCBP2 TBC1D9B CALM1 CD81 CFL1 CRY1 USP11 PUF60 15.396935924275263 ECM1 CCNC FRS3 GNE SRPK1 FASLG IL2RB 15.37730311625595 SUFU ZNF330 ZFP36 RUNX3 SLC41A3 EDEM1 NFATC3 15.372741331014076 EYA2 AKR7A2 TFG GSE1 THEM6 CEP85 PIGT ARID1A 15.349175895187445 CARD10 KDM1A GSE1 CDC37 TMEM43 JOSD1 HNRNPH1 15.341469127898563 GNL3L ZNF330 BRD7 RPL31 RPS24 PYHIN1 PAFAH1B1 OTULIN S100P 15.341311908693433 FGG PIGT KDM1A HP RING1 NUP93 BLOC1S1 GOT1 15.332547492732672 COPB1 SRPK1 SF3B2 ARF1 ZC3H7A CD74 KCTD5 15.326547410839261 PC ILF3 BRD7 SLC25A3 GOT1 DYNLT1 MDH2 FOXJ3 15.314352339322406 SLX4 SUMO3 UBE2I PARP1 RPA1 TSPYL1 GID8 15.311255187290874 OXSR1 AP5B1 ARHGAP17 HNRNPH1 TCEB2 DYNC1H1 15.30554778340895 MSH3 ZNF330 PARP1 RPA1 CLEC4D MLH1 ORAOV1 15.294569698171255 ATP6V1E1 ZFP36 ATP6V1D AIMP2 XPNPEP1 RMDN3 TMEM43 ARID1A 15.287075643354028 GLI4 ZNF317 RRS1 CTCF NOA1 DDX24 RETN CENPB ZNF696 15.257357234458235 RGS20 TXLNA TXLNG ZNF664 FAF2 FASLG CRY1 NUFIP2 
ZNF696 15.236179671414925 OPRM1 ATP6V0E1 DNAAF5 GEMIN4 COPS5 CALM1 ARF1 NDUFB3 15.23524860800903 PATZ1 PPP1R16B USP11 ILF3 HNRNPH1 SCRIB DAZAP2 PFDN5 15.227784128141206 CENPM VPS25 GNE MICAL1 COPS5 PAFAH1B1 PDXP LEO1 15.204726404679072 C11ORF30 ETS1 NUP214 CSNK2A2 HDAC1 DDIT3 SON DYNLT1 15.158855629279438 FBF1 PSMD7 SCRIB TXLNA TXLNG IFT20 CEP85 AGK FBXL12 15.156024313376316 VWCE FBXO21 HLA-DPA1 ZSCAN21 NUFIP2 E4F1 ZNF664 ZNF696 15.146391307382599 RAB11B HADH RAB11FIP3 SH3BP5L ARF1 BTN3A3 RPA1 CCDC64 15.14304136761394 ENTHD2 CCNC DSCR3 AP4B1 LSM3 SF3B2 TNPO1 MAT2B 15.14243275966237 XPOT NUP93 WDR11 RAB9A NUP62 PTPRA 15.133795975681004 NCOA1 GTF2B ETS1 SMARCA4 COPS5 ARID1A SRF NCOA6 15.132902760384578 CEP120 TTF2 TBC1D31 ECH1 ANAPC1 YEATS2 CNOT11 CKB 15.128644851715702 CENPV ADARB1 ILF3 C12ORF65 PARP1 PYHIN1 FNTA CLEC4D 15.1035351275964 UBA5 APEH HADH SARS ATP6V1D PARP6 GSTO1 ECHS1 15.102492137094542 MRTO4 RPL31 ADARB1 SRPK1 HNRNPH1 DDX24 PYHIN1 ARID1A 15.101646578516238 RALBP1 GSE1 HDAC1 RNPS1 AP2B1 SEPT9 15.09225683574621 NARS PUS1 ABL1 PARP1 GOT1 IARS ANXA5 15.091045033968834 NUMB FRS3 GAK STX6 RAB9A AP2B1 PTPRA IGBP1 MTOR 15.088913125212214 WASF2 APEH ACTB ACTG1 EVL PFN1 ABL1 UTRN HSBP1 15.081391671513344 KIF18B ZNF330 PARP1 AHCYL2 NUP62 IMP3 15.078345229678735 TACC3 TTF2 SLBP ASPSCR1 NUP62 EVL MLLT6 15.073728626475999 ELK4 ACTB ACTG1 H2AFJ ERH SRF PRPSAP1 PRPSAP2 15.073648740277468 BAZ1B ZNF330 SMARCA4 PYHIN1 PARP1 HMGN2 CBX3 ARID1A 15.06765795657382 REEP6 CDC37 RTN3 ATP5J GPR114 IFT20 TMEM43 GORASP2 15.061554428939509 MBD3L1 ZNF296 UBL5 SIN3A HDAC1 FRS3 SNRPB MLH1 15.04254796183253 EPN2 DAZAP2 PARP1 TXLNA RNF214 STX6 AP2B1 15.020521594219765 SPINT2 CLEC4D TBC1D9B MARC1 THAP11 PTPRA CNTNAP1 ZNF696 15.011061405772034 NDUFA13 ATP5J MDH2 ATP6V1D MTMR12 NDUFB3 NDUFA1 15.009003725914887 B3GAT3 XYLT2 CLEC2D DNAAF5 HNRNPD CKAP4 STX6 14.998436277651132 BCCIP ACTB MARCKSL1 CDC37 PSMB10 HMGB2 ALDH9A1 DDX24 14.990558598267253 IPO9 AIMP2 ACTG1 FAF2 ZNF330 TNPO1 H2AFJ S1PR1 
14.98464710176754 SGOL1 KLHL36 TBC1D31 ECH1 CBX3 PAFAH1B1 MLLT6 HNRNPAB 14.983710838573753 GLMN KDM1A IL27RA ST6GAL1 CDC37 MFNG C16ORF58 TCEB2 14.972520526277455 MCMBP PSMD7 DYNC1H1 TCEB2 RPA1 DDIT3 ATP6V1D DUSP5 14.970841334464218 FAM168A TFG EWSR1 HNRNPH1 DAZAP2 OTULIN RNF216 14.955353056539847 GCD7 RTN3 UBE2I PUF60 AIMP2 HSBP1 DNAJA3 ZNF212 14.95426536391778 RAB10 CFL1 MDH2 CALM1 CLEC4D RPA1 ITGB1 MICAL1 14.951051209501884 RASSF10 FAM110A ANXA1 CSTA PCBP1 14.93673947379863 ATXN1L ALG13 GATA2 ANKHD1 SUGP2 DAZAP2 GORASP2 S100P 14.927319768291202 HMGCL MIPEP GTF2B MS4A7 FBXO21 GSTO1 14.92509537396442 CORO1A DDX24 U2AF2 TJAP1 IFT20 TMCC3 14.920797341901133 SUCLG1 ACTB GSTO1 DOCK10 ANXA5 MDH2 ECHS1 SEPT9 14.908251005516775 PFN2 ACTB ACTG1 EVL GEMIN4 SIN3A WDR33 HNRNPH1 14.90449333655689 ZNF609 ALG13 GATA3 GATA2 HDAC1 PIGT HNRNPH1 TAF15 14.903698021848014 ZNF687 ZMYND8 TSPYL1 CSNK2A2 HDAC1 PYHIN1 CENPB DDX24 14.895259445445555 TAF1D ADARB1 HNRNPH1 ILF3 RNPS1 CSNK2A2 FEZ1 CENPB 14.893436067265139 ATP2C1 FAF2 RMDN3 HLA-DPA1 OTULIN GORASP2 S1PR1 CLEC2D AGK 14.886596050997753 RABL6 SMARCA4 PARP1 CAPRIN1 MED29 STX6 RAB9A 14.871411459595244 PRKAA2 CDR2 AIMP2 ZNF212 STIM2 DNMT1 RASAL3 UBE2I 14.868943292226202 EMILIN1 TTF2 NUP214 PUS1 IFT20 WDR11 14.851084135537645 PAK2 SCRIB ABL1 EEF2 CKB GAPDH RPA1 PAFAH1B1 14.850405335794173 ISY1 CCNC SRRM1 SNRPB UBE2I BUD31 MLH1 POLE4 14.835281063429976 GNA13 CD81 GNB1 MRPS18C UTRN PHF19 RNF144A S1PR1 14.821536815769784 MED31 CCNC MED29 MED24 ERH DAZAP2 ANXA7 14.815821244038327 USP54 ZNF148 PIGT ZCCHC14 ZFP36 KIAA0355 SNRPB BHLHE40 14.809417337935024 PTPRF ACTB GAPDH ATP6V0E1 DNAAF5 PTPRA SEPT9 14.7879076406152 NSFL1C KDM1A DYNLT1 ASPSCR1 DBI ATIC GMDS PARP6 AP2B1 14.777756102430113 INTS7 PAXIP1 SLC9A3R1 HNRNPD FAF2 SUMO3 HLA-DPA1 14.748978630850628 CACNA1A RPL31 TELO2 MOAP1 CALM3 TCTN3 PUF60 TAF15 14.746549495008853 SAAL1 ST6GAL1 PTPRA TMEM43 QSOX2 CKAP4 PTRH2 S1PR1 SPG7 14.744273321890732 TTYH1 XYLT2 ALG9 KIAA0195 TAMM41 DGAT1 THEM6 
DAGLA AGPAT2 LRRC8C 14.742989722545698 CCDC120 MAGED1 DAZAP2 HNRNPH1 GID8 CDR2 PLAGL2 14.730963231649827 DNAJC11 PITPNB RNF214 RMDN3 TCTN3 ANXA7 14.726287590291909 IRF2BP2 TRRAP SCRIB PARP1 SIN3A KLF10 NUDT9 14.707477862874384 NDUFA10 PDHA1 UQCRQ RMDN3 NDUFB3 NDUFA1 ELMSAN1 14.706109044870798 ZDHHC17 CSTA CNDP2 ZFP36 IFT20 STK25 SNAP23 EVL 14.699001765592229 CIRBP PAXIP1 COPS5 HNRNPA1 ARIH2 CAPRIN1 DOCK9 SRF 14.698104690140756 TOM1L2 SIN3A IGBP1 CREB1 DSTN RAB9A CALM3 CALM1 14.664765953545581 VDR RUNX3 STK38 SRPK1 NCOA6 MED24 14.656220157241687 HIRA UBE2I HNRNPD RPA1 PYHIN1 SMARCA4 CTCF CSTA 14.638638682988514 SNX12 DAZAP1 HNRNPH1 EIF4A3 ELMSAN1 IGBP1 14.636167734860132 MTPAP C12ORF65 PDHA1 CALM1 SLIRP STK25 DYNLT1 LEO1 14.626340866270665 APOD CLEC10A ZMYND8 CCNC ATP6V0E1 TOR1A KIAA0355 CLASP2 14.615614289920204 SMG1 EIF4A3 ZC3H7A RNF214 TELO2 EEF2 14.614446791960203 AP1G1 COMT AIMP2 ARF1 CREB1 HEATR5B AP2B1 PAFAH1B1 14.604407722606613 TACSTD2 GEMIN4 TBC1D9B EIF2B1 TNPO1 MED24 HNRNPAB MTOR 14.599528345408455 MET CLEC4D ITGB1 SH2D2A PCBP1 SH2B3 14.59321985832173 RINT1 FAM110A STX6 CKAP4 TXLNA RBL2 14.585510872544091 TMEM184A NRD1 DGAT1 UQCRQ HNRNPH1 AASDHPPT SPG7 AGPAT2 14.576893732970479 CHD8 GTF2B CBX3 CTCF U2AF2 DYNLT1 CREB1 AP3D1 14.569988940649422 EIF2B5 S100A9 OPN3 EIF2S3 CD74 EIF2B1 14.560632203976718 S100A4 TTC21B ZZEF1 PARP1 GAK DNAJA3 HSBP1 IGBP1 14.551894346955736 MACF1 ACTB PFN1 SNRPB RBM14 GOT1 CBX3 ASAP1 STX6 14.535508606477173 ARHGAP18 IST1 BAG4 S100A9 MRI1 MIPEP GAK AP2B1 14.522276717124278 AMOTL2 DDIT3 CDC37 NUP62 RASAL3 CDR2 ANXA1 14.509699238226958 ENKD1 MIF4GD CDR2 HNRNPH1 MLH1 PPP2R3C S100P 14.503235710060022 ZMYM4 KDM1A ZSCAN21 CBX3 PARP1 CTCF PYHIN1 TUBB 14.500830484484682 MED21 GTF2B CCNC MED29 MED24 CSNK2A2 CKAP4 PTRH2 14.497762528559528 FTH1 GATA3 BAG4 PCBP1 PACSIN1 TNPO1 BRD7 14.497167754290938 FAM91A1 KCTD5 TUBB BRD7 TNFAIP6 STX6 RAB9A 14.494305343661955 SGTB SLPI EDEM1 MTOR RNF144A EIF2B1 DDX24 14.488715022648904 HOXB9 AHCYL2 ING4 PARP1 
PFDN5 S100P CKLF 14.4870544863434 LDLRAD1 ALG9 DNAAF5 JMJD8 PIGT ZNF696 CNTNAP1 CMTM7 14.486934274192736 UCHL1 PYHIN1 DNMT1 ILF3 COPS5 CLEC4D UBE2I TMEM5 14.475595830125046 TRAF1 ECH1 EWSR1 GORASP2 GATA2 IKBKB JOSD1 TICAM1 NUFIP2 14.464025817044417 CMTM5 SLC23A2 MCEMP1 TMEM43 TUBB DGAT1 C16ORF58 PPP2R3C 14.443172774533613 SV2A NDUFB3 SPG7 WDR11 RAB9A CALM1 CLEC2D 14.441202817255288 DNAJC6 KDM1A HDAC1 COX6A1 HK3 GAK AP2B1 WDR11 14.425176356594495 HOXC9 AHCYL2 PIGT ZFP36 MAGED1 HNRNPH1 S100P 14.42281411997591 BLOC1S2 CCNC ACTB BLOC1S4 BLOC1S1 STX2 IFT20 14.417781482273915 MOB4 MYL6 CALM1 PAXIP1 RMDN3 STK25 HDAC1 14.407430941776948 GMPS AIMP2 ANXA5 GSTO1 PARP1 CRY1 AGPAT2 14.406138254204377 ZNF777 ADARB1 RPL26 MRPS18C SRSF5 NKTR SCRIB ZNF212 14.40113789512277 ATXN2L PCBP1 NUFIP2 ZC3H7A AIMP2 ADA 14.395977640558531 LAGE3 CNDP2 RNF214 HK3 TCEB2 WDR11 14.392059312497357 EPB41L1 KCTD5 COPS5 RNPS1 PTPRA EEF2K AFF4 14.391582558401 MNDA RRS1 CAPRIN1 DDX47 CTCF DDX54 DDX24 NOL9 14.387737267370053 GLO1 ATIC CLEC4D PARP1 EWSR1 HNRNPA1 IGBP1 QDPR 14.362025728634679 DEPDC5 ARF1 DAZAP1 HNRNPD RRAGD WDR59 14.358813622571114 CD6 ZMYND8 CSNK2A2 SBF1 PAXIP1 DDX47 XPNPEP1 14.357825919578593 CCP110 CALM3 CALM1 AP2B1 SNAP23 DYNLT1 TUBB SAMD4B 14.345373302536832 EFHD1 ACTB CLEC4D S100A8 PLAGL2 CALM3 14.328541690386054 ZNF655 ZNF330 IMP3 CDC37 DSCR3 VPS9D1 EVL 14.318998225813118 WBP11 ZNF330 RPL31 EWSR1 DDX23 C16ORF58 PYHIN1 ARF1 H2AFJ 14.31874569276878 ATRIP HMGN2 ST6GAL1 TXLNA TELO2 IFT20 RAD17 RPA1 14.317280296589368 SLX1B ANXA1 THAP11 GEMIN4 SLIRP CALM1 S100A8 14.31538049992126 FOXR2 TRRAP EP400 CCNC EIF2B1 PSMG2 14.306306111268016 PTDSS2 HNRNPR ILF3 H2AFJ TCTN3 S1PR1 CLEC2D 14.305368161660532 LMO3 AIMP2 HNRNPH1 BRD7 BHLHE40 SAMD3 14.29858331837413 LENG1 EWSR1 CDR2 IARS UBL5 H2AFJ KLF10 14.29801629683905 EXOC4 IFT20 DYNLT1 RPA1 HNRNPH1 STX6 14.296864617112503 HLA-DPB1 C2CD2 HLA-DPA1 STIM2 TBC1D9B CD74 ZZEF1 GID8 14.29534476861751 CEP72 AHCYL2 TBC1D31 SOX13 UBE2I HNRNPH1 PPP2R3C 
14.294423010960298 SGPL1 CSNK2A2 VDAC1 CD74 TCTN3 TM9SF4 BRD7 UQCRQ 14.282586431773723 PRDM1 KDM1A HDAC1 GSE1 KDM2B IRF1 AGK BMX 14.271988577978682 PDCD7 FBXO21 NUP62 FEZ1 SNRPB LEO1 CALM1 14.266049355687565 TCL1B ZNF330 BAG4 HPS3 HPS6 MED29 MED24 14.262926785345126 TGM1 CCNC THAP11 PIGT TCTN3 JOSD1 GOT1 TAF15 14.239321973041342 COPG1 TELO2 MAGED1 ARF1 AP3D1 CD74 GEMIN4 SF3B2 14.231903724696291 H1FNT MRPS18C E4F1 ZNF317 CTCF MRPL33 MRPL49 SF3B2 14.214787339325838 IPO8 ZNF330 ZSCAN21 TNPO1 DUSP5 S1PR1 SLIRP ZNF696 14.21234697151638 GPR35 CD81 GHITM VDAC1 VPS25 RTN3 C16ORF58 ATPIF1 KIAA0195 14.210070749500918 TMEM31 PSMD7 CCDC92 UBAC2 FAF2 ZFAND2B RNF144A ANKRD13D CMTM7 14.203511802765265 ATP6V0C MCEMP1 ERH ATP5J ATPIF1 TCTN3 CLEC2D 14.199553554519772 CNP CLEC4D HPS6 CALM1 TUBB BRD7 RAB9A 14.19942877252465 C6ORF165 MIF4GD MAGED1 GSE1 SAMD3 KLHL3 14.1666517561191 SERTAD3 CCNC TXN UBL5 IMP3 SNRPB 14.16425244951171 NDUFB7 ZNF330 CCNC UQCRQ CKAP4 NDUFB3 NDUFA1 14.159542622530301 ADD3 ACTB ACTG1 UBE2I DYNLT1 ASAP1 STX6 RAB9A 14.157796306634246 AHNAK2 CSNK2A2 PYCR2 PFN1 UBIAD1 STX6 GOT1 MIPEP 14.148986131305842 SLC22A4 TAMM41 ALG9 DGAT1 UQCRQ C14ORF2 PTRH2 SPG7 14.142585503153851 TFAP2A CITED4 RBM14 PARP1 EWSR1 UBE2I PYHIN1 AP2B1 14.126401601073487 FAM9B KDM2B ECH1 SLPI CDC37 BLOC1S1 14.11698360628591 C8ORF33 ADARB1 RRS1 ILF3 COPS5 SF3B2 PYHIN1 FEZ1 CRELD2 14.09670937979197 PKN1 CD81 PCBP1 TUBB CDR2 PLAGL2 SLIRP 14.096460685304157 ATP1A3 VDAC1 SLC25A3 AGK TMEM5 AGPAT2 CLEC2D S1PR1 14.09091431891004 TMEM9B C16ORF58 STX10 STX6 HNRNPH1 CLEC2D 14.080928100266034 TSHR CD81 ACTB HNRNPA1 SCRIB TUBB JAK1 CRELD2 14.07677455964096 L3MBTL3 ZNF330 DNMT1 KDM1A CRLF3 HDAC1 SNRPB 14.074186567554621 CD97 XYLT2 CLEC2D ADARB1 HNRNPD NDUFB3 14.063389780930601 VIPR1 GHITM PFN1 ATPIF1 C16ORF58 C14ORF2 AGPAT2 TM9SF4 14.047697404324515 LHFPL5 CD81 UBIAD1 C16ORF58 RMDN3 DAGLA CMTM7 CD37 14.019767573935857 TRIP10 ARHGAP17 ASAP1 SBK1 CAPRIN1 INTS9 14.016244020808331 TRIM67 TAMM41 VDAC1 U2AF2 EVL 
QDPR CEP85 ZCCHC14 13.97106267306708 CENPF FNTA PAFAH1B1 TXLNA PAXIP1 RMDN3 HNRNPAB 13.96512240617107 POU6F2 CCNC CDC37 ZNF319 ZNF148 BHLHE40 VPS9D1 13.9620968727965 PEG10 COMT ACTB UBE2I RBM14 HNRNPH1 GATA3 AP3D1 13.960547509257054 HSDL2 PDHA1 MDH2 MIPEP TCTN3 HDAC1 PSMB10 13.95221521815776 CSGALNACT1 TRRAP EP400 DNMT1 STX6 13.937941140827986 TM9SF3 FAF2 TCTN3 HNRNPH1 S1PR1 CLEC2D 13.901465512151681 GTF3C2 RNPS1 PARP1 RNF144A RPA1 STX6 13.891768487992056 SHBG APEH ACTB DSTN SRF MXD4 13.889394629042139 MB21D1 RRS1 CTCF ANXA1 ANXA5 MYOF CENPB IST1 TXLNA 13.88879957377773 BICD1 CRKL NFATC3 CKB TBC1D31 IST1 SH3BP5L DBI 13.888014641070207 GORAB ARF1 CD81 CRKL KCTD5 BET1L CMTM7 13.866536928577545 ARID5A MAGED1 DAZAP2 BAG4 SH2D2A ALDH9A1 PLAGL2 13.864841176114938 CENPB ZNF331 PYHIN1 PARP1 MXD3 ANXA7 13.844141838513725 DCTN5 ACTB DYNC1H1 ACTG1 FNBP1 TXN 13.840392818732811 FAU CD81 RRS1 RPS24 CTCF COPS5 PTBP1 HMGB2 13.830190341568729 TMEM30A IL27RA CLEC4D MARC1 HLA-DPA1 HNRNPH1 CLEC2D 13.824976426695509 PJA1 MAGED1 FAF2 DDX54 JOSD1 LCP2 13.821307537057125 KRTAP13-2 GEMIN4 GNE GATA3 HNRNPH1 PFDN5 13.82091863045577 HK2 FNTA VDAC1 HK3 ARIH2 PARP1 OTULIN 13.801978899167814 EEA1 ABL1 PARP1 MLLT6 HSBP1 STX6 RAB9A 13.801231239852132 ZNF768 ZNF330 TNPO1 EWSR1 SRSF5 ADARB1 SCRIB 13.795015218110475 POTEF EWSR1 TAF15 ADA HLA-DPA1 TUBB GID8 S1PR1 13.786997962041116 RER1 ATP5I TOR1A TCTN3 PGRMC2 HLA-DPA1 GORASP2 S1PR1 13.779166064238144 RBCK1 CARD11 ITGB1 RAB9A ZFAND2B UBE2G1 OTULIN 13.766438796468979 TP53BP2 ACTB CCDC92 ANXA1 KDM1A DYNLT1 TXLNA RASAL3 13.757358903316197 TEKT4 GSE1 HNRNPH1 CRY1 PLAGL2 13.754981062989167 CD63 CD81 ITGB1 ALG13 TMEM43 SRF 13.749187665037981 CPVL SCRIB SARNP STK25 SEPT9 QSOX2 DYNLT1 13.747331086519326 UBR1 BTBD2 BRD7 HNRNPA1 EDEM1 FAF2 CLEC2D 13.74524726650152 WHAMMP3 HERC1 BLOC1S1 CDR2 ANKHD1 WDR37 TXLNG 13.721591975028716 GABARAPL1 TMEM131 RTN3 TBC1D9B SRPK1 HNRNPH1 WDR11 13.717543038006806 TMX2 TLR5 TM9SF4 CD81 SRSF5 C16ORF58 CMTM7 13.715612245385882 
NF1 PFN1 CALM1 SMARCA4 ECHDC2 FAF2 HLA-DPA1 CA4 13.709079796406828 SEC23IP ACTB DYNC1H1 RCC2 DSTN ARF1 ARIH2 RAB9A 13.704506994994043 TARS2 ZFP36 SYPL1 CDC37 C12ORF65 PDHA1 NAA38 13.704167533836676 ZNF408 ZNF330 ZMYND8 ZNF317 UBE2I CENPB ZNF212 13.702505984133895 ACSL4 EDEM1 TJAP1 HNRNPD BRD7 13.699840286172384 BRAF FNTA CDC37 PARP1 MYOF AP2B1 S100P 13.696836524005418 SOHLH1 TRRAP TTF2 WDR59 HPS3 PFDN5 13.677485930740957 GATAD1 ETS1 SLBP E4F1 HDAC1 SIN3A 13.667274027745199 RFX1 XYLT2 HDAC1 CBX3 PYHIN1 MAGED1 ABL1 13.658056312246414 POLR2L PHRF1 IMP3 SNRPB ARG1 MED29 13.650040380902949 GDI1 RCC2 ALDH9A1 RAB24 RAB9A 13.646802418884919 BLVRB ORM1 RPL31 GMDS GOT1 MDH2 13.643682598516305 TTC4 CDC37 S100A9 BTBD2 HDAC1 TXLNG 13.639634140063906 ALAD WDR4 CCNC GOT1 ANKRD22 PARP1 13.630042136985038 RAB6B SBF1 CALM1 ARF1 RING1 RPA1 CCDC64 13.629012356509211 USP4 RNPS1 IKBKB ANXA7 RBL2 USP11 13.625784239264714 SPANXN2 TSPYL1 DDX24 WDR37 RNF214 WDR54 13.6136308433884 CTNNA3 FAM110A GNB1 PLAGL2 13.612013975419917 CCR2 RTN3 CD81 SLC41A3 NTNG2 GOT1 SRSF5 13.611979105100254 B4GALT3 XYLT2 GHITM MFNG PDIA4 ATPIF1 ARF1 HLA-DPA1 13.598885776510313 PRDX4 TTF2 PDIA4 EWSR1 PYCR2 CRY1 13.59083307231909 GAB2 CRKL FAM110A STK38 13.56740927457904 LCN2 CCNC PDIA4 LY96 IRAK2 HNRNPA1 GID8 13.5608554918248 MED29 STEAP3 CCNC MED24 IFT20 EIF2B1 13.533374477148104 CDSN JADE1 CCNC GOT1 ARG1 USP11 DYNC1H1 13.508573726716403 LIN37 ZNF296 BCL11B DPY30 SIN3A RBL2 EWSR1 CDR2 13.502656594965496 ATF4 GTF2B HK3 CKAP4 DDIT3 ZNF212 13.49274644723814 TRAF4 KDM1A UBE2I TICAM1 FRS3 SNRPB CEP85 GORASP2 PLAGL2 13.491692514103148 SLC7A1 GHITM COMT DGAT1 RMDN3 CKAP4 RNF144A LEO1 13.48802351253996 UBN2 ETS1 EWSR1 CSTA CBX3 PYHIN1 CALM3 13.47847287643377 JAK2 CALM1 JOSD1 JAK1 IL2RB DNAJA3 13.474314317996262 CCDC132 KDM1A SF3B2 THAP11 PPP2R3C STX6 RAB9A 13.464653568121086 WIPF2 EIF4A3 ACTB ACTG1 ZFAND5 TXLNG HNRNPDL 13.456927709278041 MYO15B CCNC ZCCHC14 NUP62 DNAJA3 13.444037441348483 SH3PXD2A FAM110A FASLG HNRNPH1 
13.437785550896827 MPHOSPH8 ZNF330 ILF3 TRRAP IMP3 ERH 13.432047486363093 PRKAR1B CDC37 TUBB PRPSAP2 FAM43A EEF2K POLE4 13.428297600536807 SHARPIN PSMD7 GSE1 IKBKB WDR11 RAB9A OTULIN 13.421121466466964 CDK15 CDC37 RBM14 ANKRD22 ORM1 AFF4 13.407415681291369 TTI1 TRRAP TELO2 MTOR HERC1 S1PR1 AGK 13.405334299167512 MTMR2 MTMR12 SBF1 SRSF5 COX7A2 XPNPEP1 13.403087901237766 ZMYND19 GSE1 PIGT GID8 TUBB GORASP2 MLLT6 13.391244179507627 JADE2 JADE1 ING4 CSNK2A2 GORASP2 PIK3R4 PACSIN1 13.374122882590308 MIS18A DYNLT1 TXLNA TXLNG RMDN3 IFT20 AIMP2 13.365715297242449 TYMP PIGT CCNC HNRNPH1 EEF2 GAPDH GOT1 13.360718218774418 CTDSP1 FAM110A HNRNPH1 HMGCL 13.358229961689704 SLC18A2 DNAAF5 TELO2 TMEM109 JAK1 OTULIN 13.357981301769028 NCAPD3 RPL31 RPS24 PAXIP1 CALM1 BRD7 CD74 13.35630284483614 PBX2 WDR4 AFF4 CREB1 HNRNPH1 STK25 13.353031096243674 PBXIP1 ZMYND8 XYLT2 CCNC GHITM RMDN3 STX6 13.348757149020487 RNF170 FBXO21 TMEM109 S100A9 TM9SF4 13.345235734581498 SEPT8 JADE1 SEPT9 PUS1 CALM1 13.33819730113847 HIST1H2BN CD81 EWSR1 HNRNPD ING4 TMA7 13.337093616838812 TYMS ATIC TXN DCK WDR45B IGBP1 GOT1 13.327583169015186 STK3 NRD1 TSPYL1 ORM2 ZC3H7A 13.324058562454878 CEP290 TBC1D31 ECH1 CALM3 CALM1 DDX47 KLHL3 13.317805848742136 CYCS HADH VDAC1 RBL2 HNRNPH1 SFSWAP 13.297351255303914 PPP1R13B ZFP36 TXLNA NUP62 RASAL3 MLH1 13.294333181493785 CCDC153 AIMP2 GORASP2 NUP62 DDIT3 CDR2 13.283856007317798 UBE2V1 TXN CALM1 DAZAP2 ARIH2 RNF144A ECHS1 13.282617980893415 CRX CCNC TFG ARIH2 PPP1R16B PSMB10 MLLT6 AASDHPPT 13.271614009938997 KRTAP19-7 GEMIN4 GNE BHLHE40 DAZAP2 MIPEP 13.263949834634003 PPP3R1 PDHB CDR2 TMCC3 CALM3 CALM1 13.262703645254199 SOCS3 JAK1 ABL1 IL2RB SH2D2A TCEB2 13.259067590988048 LLGL1 SMARCAL1 PTPRA USP11 BRD7 STX6 RAB9A 13.25659015229679 DHFR CDC37 TBC1D9B HPS3 RAD17 HNRNPH1 13.252482265608082 SCGN SNAP23 STX6 STX10 GNE CNOT11 13.248991289368847 SLC1A1 EWSR1 HNRNPH1 JAK1 SPG7 AGPAT2 TM9SF4 13.248113296095777 COL4A3BP PARP1 RANGAP1 RTN3 STX2 TAF15 IFT20 13.241093402902848 
HSPA14 GMFG SEPT9 PAXIP1 ATP5E 13.240083163653946 CCT6B PAXIP1 WDR37 FBXW4 GNB1 HDAC1 13.228444875590714 MAP3K4 ACTB CDC37 ACTG1 HNRNPD ZFP36 13.216398320570889 SLAIN2 AKR7A2 HERC1 PFN1 SRPK1 AP2B1 13.215651343412931 MAP7D3 ZC3H7A DYNLT1 RMDN3 CALM3 CALM1 13.215389239964676 FAM9C CCNC HSBP1 BLOC1S4 BLOC1S1 STX6 13.213196304257522 SPCS2 ILF3 HNRNPH1 TCTN3 CKAP4 TM9SF4 13.200638477233342 XPNPEP1 DMGDH ATP6V1D WDR4 GORASP2 PUF60 13.184059276239408 KANSL1 KDM1A DPY30 YEATS2 CTCF NUP62 CDR2 13.176651081413965 HYPK TXLNA PCBP1 YEATS2 CALM1 C2ORF68 13.17122202292975 GJB1 COMT CD37 HNRNPH1 LHFPL2 CALM1 13.170005366190786 GPATCH2L KDM1A CSNK2A2 TSPYL1 S100A9 PLAGL2 13.169484808803535 ARL8A ITGB1 GHITM ZSCAN21 SPECC1L TBC1D9B PUF60 13.16597190748735 BRAT1 PSMD7 PARP1 MTOR GPR114 INTS9 13.164523600068346 THRB ACTB UBE2I PARP1 NCOA6 HNRNPH1 13.16251984874304 FAM103A1 DAZAP2 MAGED1 HNRNPH1 COPS5 BAG4 13.161692169169015 RPAP3 CDC37 MAP4K1 TELO2 TM9SF4 KDM1A SBF1 ANAPC1 13.158157055788442 LAMTOR3 RRAGD RAB9A UQCRQ PTRH2 PUF60 13.156424827525322 NUS1 ACTB TCTN3 HLA-DPA1 HNRNPH1 GPR114 13.149885514145168 CHRAC1 PARP1 XPC DCK RNF214 GOT1 13.14716381117448 MARCH4 SARS S100A9 MARC1 PTPRA KIAA0195 13.134130184306802 GPANK1 MAGED1 HNRNPH1 BAG4 PFDN5 AP2B1 13.13142106532119 PTOV1 XYLT2 HNRNPH1 HDAC1 FEZ1 TCEB2 13.117255451103132 TNKS ZMYND8 FNBP1 SH3BP5L GMDS 13.113329452889042 TGS1 LSM10 NCOA6 SCRIB CRY1 SNRPB ZZEF1 13.113201706727365 DOK4 AHCYL2 KDM1A GSE1 PFDN5 RBP7 13.107951416852334 ZMIZ1 SETD4 ETS1 PIAS3 NFATC3 HNRNPD 13.107464984593593 POLA1 HELB RPA1 PARP1 POLD2 POLE4 13.104390783741206 ITGA5 CD81 EWSR1 ITGB1 SPECC1L MMP9 AP2B1 13.096750788301103 SFSWAP IFI27L1 JADE1 H2AFJ EWSR1 DDX23 RNPS1 13.086658428799515 STXBP1 SHMT1 ARHGAP17 CALM1 ECHS1 STX2 13.085397863770039 CD9 ITGB1 CD81 HNRNPH1 CTCF GOT1 13.085257214666772 HLA-A HLA-DPA1 STX6 RAB9A CLEC4D BUD31 13.077978333835484 TRIM68 TFG ARHGAP17 TCTN3 BTN3A3 13.075748556786317 TTC26 HK3 HERC1 FAM43A IFT20 13.049909587928354 KCTD3 
FAM110A MRI1 SLC9A3R1 ASPSCR1 COPS5 13.046845438203533 LIMK2 CFL1 PGLYRP1 CDC37 EEF2 ATP5E 13.046763940182355 RAB2B TTF2 FNTA PARP1 RAB9A 13.03344498415573 SERINC1 ILF3 HNRNPH1 GPR114 IFT20 13.029195669599432 SCAF4 SNRPB HNRNPA1 RPA1 RNPS1 GATA3 RRS1 13.026763292070262 MAPKAP1 FAM110A STK38 MTOR 13.024327958966104 CSPP1 TUBB PFDN5 RNF214 KIAA0355 SAMD4B 13.01838079645147 PRKD2 TMA7 ANXA5 MDH2 HNRNPH1 TAF15 13.0157735733304 KIF20B H2AFJ CRY1 PAFAH1B1 SF3B2 MXD3 13.01101756437445 RFWD2 ETS1 COPS7B BTBD2 COPS5 TCEB2 13.007214252958244 EMP1 MS4A7 CD37 HNRNPH1 ATP6V0E1 CLDN9 13.005440618023538 PPP1R32 TFG PFDN5 FRS3 HNRNPH1 BAG4 12.988936406052147 HABP4 JADE1 PPP1R16B PIAS3 UBE2I DDX24 H2AFJ 12.978918475784493 RNF7 ACTB ARID1A TBC1D31 COPS7B TCEB2 ARIH2 12.971137736057823 DARS2 FNTA PDHA1 DNAJA3 AIMP2 NCOA6 12.966808167658233 SORBS3 IMP3 TXLNA HNRNPDL MICAL1 FASLG SIN3A 12.966605312803864 USP32 DNAAF5 ANKHD1 RNF144A RAB9A TRIP12 12.96451131317066 IVL JADE1 GOT1 PIGT CSTA GID8 HNRNPA1 12.961051788435846 DKK2 AHCYL2 SIN3A S100A9 CENPB ZZEF1 12.959552251138502 LTBP4 CRKL DYNLT1 NUFIP2 CRY1 E4F1 12.956586808934262 HERC1 ZNF571 HMGN2 ARF1 ABL1 EDEM1 ECHS1 12.954635932397242 RRAGA ACTB DYNLT1 RRAGD RAB9A EDEM1 12.945627735256437 UBE2H JADE1 RNF216 GOT1 RNF144A RNF34 12.9440695459552 PBX4 ZNF330 KDM1A HNRNPH1 IFT20 12.93878407132058 AARS EIF4A3 EEF2K CLEC4D IARS 12.935428040539033 DDX3Y CD81 PCBP1 RNF34 USP11 12.934316738857081 TRIM44 TSPYL1 DDX24 TXLNA TXLNG TCEB2 12.934102737694474 ALS2CR11 CDR2 AP2B1 CNOT11 DNAJA3 HNRNPD 12.932418498968426 BCL10 IKBKB PRKCQ CARD11 TMEM43 SLC9A3R1 OTULIN 12.924767511259562 PLN MS4A7 ATP2A2 ATP6V0E1 RTN3 STX2 CLEC2D 12.9235721491963 PLEKHG6 ARF1 ACTB NUP214 ANXA1 HNRNPH1 12.919752197400918 PI4KA CALM1 MAP4K1 RAB9A S100A9 C2ORF50 12.916441192163163 C10ORF12 ZNF330 CBX3 EWSR1 NBEAL2 PHF19 12.908988763675014 PPEF1 SLC25A3 ATP2A2 HERC1 DNAJA3 CALM1 12.903730181520613 BMPR2 HNRNPR FRS3 C16ORF58 SMAD7 DYNLT1 12.885906947284935 MYO9A MYL6B PFN1 CKB 
BRD7 SAMD3 12.884937420498597 DHDH RNF214 SBF1 WDR59 TUBB 12.873702188085337 SNAPC5 BLOC1S1 NUP62 DDIT3 12.865712783640276 DDX60 DDX47 ERH MAGED1 CD74 EIF2S3 12.853292077019905 KIAA1683 ANAPC1 PUF60 WDR59 CALM3 CALM1 12.853099550283657 PLD2 ACTB GAPDH TUBB MTOR BAG4 ARF1 DYNLT1 12.852934090492212 USP13 TXN ATP6V1D DAZAP2 FAF2 12.830495263923973 RPS6KB1 ABL1 MTOR USP11 RCC2 EEF2K 12.828456460380654 HNF1A ACTB KDM2B CBX3 PAXIP1 CALM3 12.826934446399154 PPP4R1 HNRNPR HDAC1 SF3B2 IKBKB MED24 12.824445031052672 MYD88 LRRFIP1 TLR5 TXN IRAK2 12.823421172458008 SIPA1L1 ACTB FAM110A USP11 SAMD4B 12.82186380945007 IL13RA2 DNAAF5 DSCR3 BAG4 JAK1 SLC25A5 12.820345143346154 PTPN13 PTPRA FASLG DFFB PARP1 STX6 12.816851565002347 RAB6A CLEC4D RPA1 ARF1 RAB9A CCDC64 12.814317302079758 ARHGAP12 ACTB ACTG1 PHACTR4 ASAP1 SRPK1 12.81099531415499 COX17 COMT CFL1 HADH TXN HMGCL 12.80892213645707 APRT TUBB CLEC4D PSMB10 MLH1 NUDT9 ADA 12.790211875026548 MCCC1 ECH1 DYNLT1 DYNC1H1 MIPEP ARF1 NAA38 PFDN5 12.789876745089192 RHBDD2 COMT EDEM1 SLC25A3 TCTN3 DYNC1H1 12.77675866043283 ANKRD26 TBC1D31 SPECC1L PFN1 ARID1A STX6 RAB9A 12.770199511572732 RBM14-RBM4 FAF2 WDR4 H2AFJ WDR37 ZSCAN21 NOL9 12.764765140446867 HOXA5 AHCYL2 HLA-DPA1 DDIT3 NUFIP2 ANKHD1 S100P 12.754224269979586 CMSS1 RPL31 RPL26 SRSF5 PACSIN1 S1PR1 12.748445344565313 HAL DOCK10 VDAC1 UBAC2 GOT1 PIGT PFDN5 HNRNPA1 12.740096980426063 CRELD1 GPR114 RMDN3 NUFIP2 NDUFB3 SPG7 12.735076839334619 SERPINA12 CCNC TCTN3 SEC22C DYNC1H1 FOXJ3 12.728647880300256 MARCH1 JAK1 VPS9D1 STX10 STX6 PTPRA 12.727834860530066 HSF2 NUP62 ANXA1 UBE2I BAG4 DDIT3 12.726983515534382 TRAPPC9 IKBKB TFG GATA2 STX6 NFATC3 12.724901894157206 GRB10 ABL1 WDR59 NOA1 DDX24 USP11 12.724426265432443 CHPT1 ITGB1 TCTN3 ALG9 TMEM5 CLEC2D 12.719092908622864 USP2 CRACR2B GEMIN4 CARD11 SMAD7 CRY1 12.718395161851076 COG4 FAM110A HDAC1 RAB9A 12.708187742157715 EPB41 RNPS1 C2CD2 EEF2K ECHS1 STX6 DYNC1H1 12.704714419736337 GSC2 ANXA5 CALM1 ORAOV1 GTF2H5 TSR2 CALM3 
12.7011091195529 FAM90A1 CCNC CDR2 PFDN5 PLAGL2 MLH1 ZNF212 12.697441082988284 PIP5K1A AHCYL2 PARP1 CENPB HNRNPH1 12.691140626850947 DCAF13 PAXIP1 RPA1 DDX47 H2AFJ NUFIP2 12.688499355686796 GDE1 DAZAP2 TMEM109 HNRNPA1 UBAC2 XPNPEP1 12.686206415923946 ND4 CTCF GHITM NDUFB3 NDUFA1 ATP5J 12.682908113727049 ZACN ALG9 HNRNPH1 SLC41A3 LRRC8C EDEM1 12.680036932203036 SH3GL2 ACTG1 ERH PAXIP1 LCP2 CALM1 12.678600229641985 FAM96B PAXIP1 SON MAP4K1 PAFAH1B1 ELP3 12.674693113804508 SLC12A4 ILF3 HNRNPH1 TMEM43 STX6 12.673664676371178 TRIML2 NUP62 CCDC92 EIF2B1 CDR2 12.671467107829773 PDAP1 KLHL36 HNRNPH1 U2AF2 CD74 12.669975911174232 TSTA3 MAT2B GOT1 ANXA5 DSTN HNRNPH1 12.666857891707162 FKBP9 APEH COX6A1 COX7C ACTB XPNPEP1 12.665860155802582 SLC25A15 HLA-DPA1 PARP1 DNAJA3 CLEC2D 12.656567185543286 AAK1 ACTB DYNC1H1 PSMD7 DYNLT1 AP2B1 GAK PFN1 12.65642255762714 UQCC1 SBF1 TMEM131 MRPS18C CDR2 PDHA1 ECHDC2 12.65629440309535 PKP4 SCRIB STX6 PAXIP1 HNRNPH1 12.645129520941639 ZNF460 ZNF330 AHCYL2 MRPS18C NOA1 CENPB AP3D1 12.644129103560928 SPAG5 TSPYL1 PFN1 COPS7B IFT20 12.642632034060462 LPAR1 XYLT2 PIGT CKAP4 TM9SF4 USP11 NDUFB3 12.635815251543947 ISLR TCTN3 CRELD2 SLC25A5 SPG7 MTOR 12.628541127803826 RPAP2 PHRF1 ACTB PFN1 TNFAIP6 MED24 12.623554075986746 NT5C1A GDPD3 CDC37 S100A9 12.62063534157844 GNA11 CD81 ACTB GNB1 DYNLT1 SLC9A3R1 12.618230071219948 MAPK9 SHMT1 NFATC3 STK25 HDAC1 XPNPEP1 12.617570886483609 ZKSCAN8 SCRIB MRPS18C NOA1 ZSCAN21 THAP11 CENPB 12.613417030843538 VCPIP1 ACTB AP2B1 PFN1 AP5B1 ASPSCR1 FAF2 12.613334695584571 ZNF800 HDAC1 ADARB1 SRSF5 FBXO21 TM9SF4 12.611438562677618 STON2 ACTB AP2B1 RNPS1 GAK ABL1 TOR1A 12.609903236700042 MKS1 ECH1 RNF34 DDX24 KIAA0355 SAMD4B 12.609795980454468 TCHP ACTB ZFP36 KIAA0355 CDR2 TUBB S100P 12.604479713492816 RBBP8 TRRAP PAXIP1 UBE2I RBL2 ZNF217 12.592559776499902 FAM195B DNMT1 ZFP36 ZC3H7A PPP1R16B 12.591383524260609 SCYL1 ACTB SMARCA4 ARID1A BTBD2 ARF1 CKAP4 DDX23 12.58717338010275 TNFRSF1A HNRNPD JAK1 UBE2I BAG4 IKBKB 
FASLG 12.580664715623618 TARBP2 ADARB1 ILF3 CAPRIN1 SRP54 MAP4K1 TGIF2 12.57379066574236 LCP1 ACTB U2AF2 CNDP2 AIF1 GOT1 12.570138223956597 TMEM55B FAF2 RAB9A TBC1D9B DFFB NFATC3 12.562996964214928 AP3S1 CD81 BMX AP3D1 DUSP5 SPG7 AGK 12.560524205460064 LSG1 CLEC4D CKAP4 PTRH2 RAB9A PYHIN1 HMGB2 12.553869938314373 ZNF592 ZNF330 ZMYND8 CSNK2A2 HDAC1 SCRIB 12.553081935183153 UBAC1 PSMD7 HMGN2 RNASE3 VPS25 RNF216 DAZAP2 12.552451473258257 C12ORF5 PIGT ANXA1 GSTO1 HMGCL PCBP1 12.548327648987696 SLC2A8 C2CD2 AGPAT2 CLEC2D AP2B1 IARS 12.53804980163159 ATP2A3 GHITM RMDN3 S1PR1 CLEC2D ATP2A2 SPG7 12.529918255732493 SEC22A CLEC10A RNF144A GORASP2 STX2 CD74 CLEC2D 12.525940122846919 TSPYL6 MYL6 DDX24 CENPB WDR59 12.51773125485418 NDUFA3 HLA-DPA1 NDUFB3 NDUFA1 INTS9 12.51578089830613 TPCN2 HK3 STX6 AP3D1 WDR11 PTPRA 12.514743611966564 MINA C1ORF174 FNTA DDX54 ILF3 RNPS1 12.509899874891586 SPEN CCNC BAG4 CBX3 HDAC1 PYHIN1 DYNLT1 12.499346134667679 NCF1 NCF4 FASLG ACTB RCC2 EEF2 12.496025391401774 CNN2 CD81 ACTB U2AF2 ANXA1 MAT2B 12.49596597572004 FOXO3 TRRAP IKBKB HNRNPH1 RAD17 12.491226176417404 TMEM87A ATP5J STX6 RAB9A TMEM131 CLEC4D 12.485730559703105 OBFC1 CDC37 SPECC1L MED29 MED24 BRD7 12.482763634260044 PALLD ACTB DSTN PFN1 SARS WDR4 IGBP1 12.481136837564845 HIC1 DNMT1 PIAS3 UBE2I BCL11B ARID1A PHF19 12.474570293288217 SSH3 ZNF330 HERC1 ASAP1 RPA1 12.473312988369914 TMEM14B STX6 UBIAD1 COMT SRSF5 SFSWAP 12.46851020946378 EXOSC6 ZFP36 MIF4GD GSE1 ARIH2 12.463193349393734 CENPK TBC1D31 SPECC1L PAFAH1B1 LEO1 COPS5 12.45830242512749 SH3BP4 FAM110A PARP1 12.45049650858817 CCR4 CNTNAP1 ATP2A2 GHITM NUP62 CLASP2 12.44981455040277 RERE ZMYND8 HDAC1 DPY30 YEATS2 ALG13 12.433511011713414 MAGEA11 DOCK10 ILF3 EWSR1 MXD3 TRIM51 12.432649061412166 DRD2 GHITM SLC22A17 RTN3 CALM1 CALM3 12.423309769548975 NRBP1 TTF2 COPS5 TCEB2 DYNLT1 C2ORF50 12.421925887481848 MGMT ACTB ANXA1 CSNK2A2 DDX24 PRKCH 12.415612251227067 MAPRE3 PIK3R4 TTF2 TMEM131 STIM2 SPECC1L CLASP2 12.41291968955834 NDUFB2 
NDUFA1 MTOR MED24 HNRNPH1 12.408369891568181 MS4A4A AGPAT2 C16ORF58 DNAAF5 CMTM7 ATP6V0E1 12.407344019827867 TUBA3C CD81 FEZ1 AP4B1 FAF2 DYNLT1 12.404306998260306 TM4SF5 STX10 STX6 TMEM43 PTRH2 CYSTM1 12.397162827527717 PRSS8 IL27RA NDUFB3 AGK IARS 12.390609640362428 PPARD KDM1A HDAC1 TTF2 TELO2 RAD17 12.38887783516027 TRIM39 KDM1A CCNC MOAP1 CBX3 UBL5 12.386457352604717 NOS1AP SCRIB RNPS1 AP2B1 STX6 PLAGL2 12.381746816361211 WDR73 DAZAP2 ANXA7 ARID1A INTS9 12.38076494716938 ARHGEF7 SCRIB CALM1 ATPIF1 MYL6B IGBP1 12.378400783097414 ABCB10 TCTN3 HLA-DPA1 PDHA1 DNAJA3 12.375722651846571 MPO S100A8 IRAK2 SRPK1 12.371429326997916 SACM1L RTN3 COPS5 PARP1 DDOST TM9SF4 S1PR1 12.37096064853776 MOSPD2 CLEC4D GPR114 RMDN3 FASLG TCTN3 12.367453301762332 CD44 CD74 PHRF1 GOT1 ITGB1 12.361978858350152 GPRIN1 ACTB HNRNPD S1PR1 STX6 PTPRA 12.360881360859432 PPP1R8 SARS HNRNPH1 U2AF2 SNRPB 12.357507784676356 EGLN1 ING4 RUNX3 PRPSAP1 RMDN3 12.350049781616514 FBXW8 MYL6 CALM1 MAP4K1 MMP9 PFDN5 12.345024404752504 VPS26A APEH VPS25 CRLF3 IFT20 EIF2S3 12.338932731569418 CDKN1B SLC9A3R1 ABL1 UBE2I COPS5 ORM2 IRF1 ================================================ FILE: inst/rmd/conversion_table.Rmd ================================================ --- title: '`r logo_path <- system.file("extdata", "logo.png", package = "pathfindR"); knitr::opts_chunk$set(out.width="15%"); knitr::include_graphics(logo_path)` pathfindR - Converted Genes and Genes without Interactions' date: "`r format(Sys.time(), '%d %B, %Y')`" output: html_document params: df: "" original_df: "" --- ```{r setup, include=FALSE} knitr::opts_chunk$set(echo = FALSE) ``` ## Table of Converted Gene Symbols ```{r converted_tbl, table1, comment=NA} genes_df <- params$df colnames(genes_df) <- c("Old Symbol", "Converted Symbol", "Change", "p-value") genes_df <- genes_df[genes_df[, 1] != genes_df[, 2], ] knitr::kable(genes_df, align = "c", table.caption.prefix ="") ``` ## Table of Genes without Interactions (not found in the PIN) ```{r 
gene_wo_interaction, table2, comment=NA} org_df <- params$original_df missing_df <- org_df[!org_df[, 1] %in% params$df[, 1], ] knitr::kable(missing_df, align = "c", table.caption.prefix ="") ``` ================================================ FILE: inst/rmd/enriched_terms.Rmd ================================================ --- title: '`r logo_path <- system.file("extdata", "logo.png", package = "pathfindR"); knitr::opts_chunk$set(out.width="15%"); knitr::include_graphics(logo_path)` pathfindR - All Enriched Terms' output: html_document params: df: "" --- ```{r setup, include=FALSE} knitr::opts_chunk$set(echo = TRUE) ``` ```{r table, echo = F, comment=NA} result_df <- params$df result_df$lowest_p <- format(result_df$lowest_p, digits = 2) result_df$highest_p <- format(result_df$highest_p, digits = 2) create_link <- function(text, link) return(paste0("[", text, "]", "(", link, ")")) knitr::kable(result_df, align = "c", table.caption.prefix ="") ``` ================================================ FILE: inst/rmd/results.Rmd ================================================ --- title: '`r logo_path <- system.file("extdata", "logo.png", package = "pathfindR"); knitr::opts_chunk$set(out.width="15%"); knitr::include_graphics(logo_path)` pathfindR - Results' date: "`r format(Sys.time(), '%d %B, %Y')`" output: html_document --- ```{r setup, include=FALSE} knitr::opts_chunk$set(echo = TRUE) ``` pathfindR-Enrichment results are presented below: ## [All terms found to be enriched](./enriched_terms.html) A table that lists all terms found to be enriched as well as lists of up- or down-regulated genes for each term. If requested, the term descriptions are linked to the visualizations of these terms, where affected genes are colored by change values (if provided).
## [Tables of genes with converted gene symbols and genes without interactions](./conversion_table.html) - A table listing the genes whose symbols (Old Symbol) were converted to aliases (Converted Symbol) that were in the protein-protein interaction network. - A table listing the input genes for which no interactions in the PIN were found (after the aliases were also checked). ================================================ FILE: java/ActiveSubnetworkSearchAlgorithms/ActiveSubnetworkSearch.java ================================================ package ActiveSubnetworkSearchAlgorithms; import ActiveSubnetworkSearchMisc.ScoreCalculations; import ActiveSubnetworkSearchMisc.Subnetwork; import Application.AppActiveSubnetworkSearch; import Application.Parameters; import File.ExperimentFileReader; import File.SIFReader; import Network.Network; import Network.Node; import java.io.BufferedWriter; import java.io.FileNotFoundException; import java.io.FileWriter; import java.io.IOException; import java.util.AbstractMap.SimpleEntry; import java.util.ArrayList; import java.util.logging.Level; import java.util.logging.Logger; /** * * @author Ozan Ozisik */ public class ActiveSubnetworkSearch { /** * scoreCalculations and network are used in other classes */ public static ScoreCalculations scoreCalculations; public static Network network; public static ArrayList networkNodeList; public static void activeSubnetworkSearch(){ network=SIFReader.readSIF(Parameters.sifPath); if(network==null){ Logger.getLogger(AppActiveSubnetworkSearch.class.getName()).log(Level.SEVERE, "SIF file could not be loaded"); System.exit(0); } networkNodeList=network.getNodeList(); ArrayList> namePValuePairList=ExperimentFileReader.readExperimentFile(Parameters.experimentFilePath); if(namePValuePairList==null){ Logger.getLogger(AppActiveSubnetworkSearch.class.getName()).log(Level.SEVERE, "Experiment file could not be loaded"); System.exit(0); } scoreCalculations=new ScoreCalculations(namePValuePairList); 
ArrayList subnetworkList; if(Parameters.useSAorGAorGR==Parameters.SearchMethod.GA){ GeneticAlgorithm geneticAlgorithm=new GeneticAlgorithm(); subnetworkList=geneticAlgorithm.geneticAlgorithm(); }else if(Parameters.useSAorGAorGR==Parameters.SearchMethod.SA){ SimulatedAnnealing simulatedAnnealing=new SimulatedAnnealing(); subnetworkList=simulatedAnnealing.simulatedAnnealing(); }else{ GreedySearch greedySearch=new GreedySearch(); subnetworkList=greedySearch.greedySearch(); } try { BufferedWriter bw=new BufferedWriter(new FileWriter(Parameters.resultFilePath)); for(Subnetwork subnetwork:subnetworkList){ if(subnetwork.getScore()>0){ bw.write(subnetwork.getScore()+" "); // bw.write(subnetwork.numberOfNodes()+" "); for(Node node:subnetwork.getNodeList()){ bw.write(node.getName()+" "); } bw.newLine(); } } bw.close(); } catch (FileNotFoundException ex) { Logger.getLogger(ActiveSubnetworkSearch.class.getName()).log(Level.SEVERE, null, ex); } catch (IOException ex) { Logger.getLogger(ActiveSubnetworkSearch.class.getName()).log(Level.SEVERE, null, ex); } } } ================================================ FILE: java/ActiveSubnetworkSearchAlgorithms/GAIndividual.java ================================================ package ActiveSubnetworkSearchAlgorithms; import ActiveSubnetworkSearchMisc.Subnetwork; import Network.Node; import Network.SubnetworkFinder; import java.text.DecimalFormat; import java.util.ArrayList; import java.util.Collections; import java.util.HashSet; import java.util.Iterator; /** * * @author Ozan Ozisik */ public class GAIndividual implements Comparable{ private ArrayList representationBoolean; private ArrayList networkNodeList; private HashSet nodesOnSet; private ArrayList subnetworkList; public GAIndividual(HashSet nodesOnSet){ this.nodesOnSet=nodesOnSet; this.networkNodeList=ActiveSubnetworkSearch.networkNodeList; representationBoolean=new ArrayList<>(); for(Node node:networkNodeList){ if(nodesOnSet.contains(node)){ 
representationBoolean.add(Boolean.TRUE); }else{ representationBoolean.add(Boolean.FALSE); } } subnetworkList=(new SubnetworkFinder()).findSubnetworksDFS(nodesOnSet); Collections.sort(subnetworkList,Collections.reverseOrder()); } public GAIndividual(ArrayList representationBoolean){ this.representationBoolean=representationBoolean; this.networkNodeList=ActiveSubnetworkSearch.networkNodeList; nodesOnSet=new HashSet<>(); for(int i=0;i ownIt = this.subnetworkList.iterator(); Iterator otherIt = ((GAIndividual)o).getSubnetworkList().iterator(); while (!decision && (ownIt.hasNext() && otherIt.hasNext())) { Subnetwork subnetworkOwn=ownIt.next(); Subnetwork subnetworkOther=otherIt.next(); if (subnetworkOwn.getScore() > subnetworkOther.getScore()) { result=1; decision = true; }else if (subnetworkOwn.getScore() < subnetworkOther.getScore()) { result=-1; decision = true; } } if(!decision){ //Gives advantage to the individual with more subnetworks //when one individual's subnetwork set is a subset of other's if(ownIt.hasNext()){ result=1; }else if(otherIt.hasNext()){ result=-1; } } return result; } public ArrayList getRepresentationBoolean() { return representationBoolean; } public ArrayList getNetworkNodeList() { return networkNodeList; } public HashSet getNodesOnSet() { return nodesOnSet; } public ArrayList getSubnetworkList() { return subnetworkList; } public Subnetwork getHighestScoringSubnetwork(){ return subnetworkList.get(0); } /** * @return score of highest scoring subnetwork in the individual */ public double getScore(){ if(subnetworkList.isEmpty()){ return 0; }else{ return subnetworkList.get(0).getScore(); } } public String toString(){ String str=""; for(Subnetwork subnetwork:subnetworkList){ if(subnetwork.numberOfNodes()>1){ str+=subnetwork.numberOfNodes()+" "; str+=(new DecimalFormat("###.##")).format(subnetwork.getScore())+", "; } } return str; } } ================================================ FILE: java/ActiveSubnetworkSearchAlgorithms/GeneticAlgorithm.java 
================================================ package ActiveSubnetworkSearchAlgorithms; import ActiveSubnetworkSearchMisc.ScoreCalculations; import ActiveSubnetworkSearchMisc.Subnetwork; import Application.Parameters; import Network.*; import java.util.ArrayList; import java.util.Collections; import java.util.Random; import java.util.concurrent.ThreadLocalRandom; import java.util.logging.Level; import java.util.logging.Logger; /** * * @author Ozan Ozisik */ enum SelectionType {RANKSELECTION, ROULETTEWHEEL}; enum CrossoverType {SINGLEPOINT, MULTIPOINT, UNIFORM}; public class GeneticAlgorithm { ScoreCalculations scoreCalculations; ArrayList networkNodeList; Random random; int populationSize; public ArrayList geneticAlgorithm(){ scoreCalculations=ActiveSubnetworkSearch.scoreCalculations; networkNodeList = ActiveSubnetworkSearch.networkNodeList; populationSize=Parameters.ga_populationSize; ArrayList population=new ArrayList<>(); random=new Random(Parameters.seedForRandom); initializePopulation(population, populationSize); printSituation(population); int addRandomIndividualCount=0; boolean running=true; int iter=0; GAIndividual lastBestIndividual=population.get(0); int lastBestRepeatNumber=0; while(running){ /** * New population created */ ArrayList newPopulation=createNewPopulation(population , SelectionType.RANKSELECTION, CrossoverType.UNIFORM); /** * After each 10 steps 10% of the population (worst scoring part) * is replaced with random individuals */ if(addRandomIndividualCount==10){ for(int i=1;i<=(int)Parameters.ga_populationSize*0.1;i++){ newPopulation.set(newPopulation.size()-i, createRandomGAIndividual()); } addRandomIndividualCount=0; } /** * Best scoring individuals are checked to prevent score decrement * in the next population. * There is a possibility that this may overwrite one of * the randomly added individuals above. It is not a big deal. 
*/ if(Parameters.ga_Elitism){ if(newPopulation.get(0).compareTo(population.get(0))<0){ newPopulation.set(newPopulation.size()-1, population.get(0)); } } Collections.sort(newPopulation,Collections.reverseOrder()); population=newPopulation; System.out.println("New Population, iter="+iter); printSituation(population); //TODO: Be careful about GAIndividuals with 0 subnetworks, you may //consider adding empty subnetwork with score 0 in subnetwork finder //when no subnetwork found addRandomIndividualCount++; iter++; if(population.get(0).compareTo(lastBestIndividual)==0){ lastBestRepeatNumber++; }else{ lastBestIndividual=population.get(0); lastBestRepeatNumber=0; } if(lastBestRepeatNumber>=50){ running=false; System.out.println("The score did not improve in 50 steps"); } if(iter>=Parameters.ga_totalIterations){ running=false; } } return population.get(0).getSubnetworkList(); } private void printSituation(ArrayList population){ for (int i = 0; i < 10; i++) { System.out.println(population.get(i).toString()); } } private void initializePopulation(ArrayList population, int populationSize){ for (int i = 0; i < populationSize; i++) { population.add(createRandomGAIndividual()); } /** * Creates a GAIndividual that contains all the genes that have * positive scores */ if(Parameters.startWithAllPositiveZScoreNodes){ ArrayList individualPositiveZ=new ArrayList<>(); for(int i=0;i 0); } population.set(populationSize-1, new GAIndividual(individualPositiveZ)); } Collections.sort(population,Collections.reverseOrder()); } private GAIndividual createRandomGAIndividual(){ ArrayList individual = new ArrayList<>(); for (int i = 0; i < networkNodeList.size(); i++) { individual.add(random.nextDouble() createNewPopulation(ArrayList population, SelectionType selectionType, CrossoverType crossoverType){ ArrayList newPopulation=new ArrayList<>(); long start=System.nanoTime(); ArrayList threads=new ArrayList(); for(int i=0;i newPopulation; ArrayList population; SelectionType selectionType; 
CrossoverType crossoverType; public NewPopulationFactory(ArrayList population, ArrayList newPopulation, SelectionType selectionType, CrossoverType crossoverType){ this.population=population; this.newPopulation=newPopulation; this.selectionType=selectionType; this.crossoverType=crossoverType; } @Override public void run() { while(newPopulation.size() population) { double totalWeight = 0; double weights[] = new double[population.size()]; if (selectionType == SelectionType.RANKSELECTION) { for (int i = 0; i < population.size(); i++) { weights[i] = population.size()-i; totalWeight = totalWeight + weights[i]; } } else {//SelectionType.ROULETTEWHEEL, individuals who have the same //highest scoring subnetwork will have the same weight for (int i = 0; i < population.size(); i++) { weights[i] = population.get(i).getScore(); totalWeight = totalWeight + weights[i]; } } GAIndividual[] parents=new GAIndividual[2]; for (int i = 0; i < 2; i++) { int randomIndex = -1; double rand = ThreadLocalRandom.current().nextDouble() * totalWeight; int rr = 0; while ((rr < population.size()) && (randomIndex == -1)) { rand = rand - weights[rr]; if (rand <= 0.0) { randomIndex = rr; } rr++; } parents[i]=population.get(randomIndex); } return parents; } private GAIndividual[] crossoverAndMutation(GAIndividual parent1, GAIndividual parent2){ ArrayList parent1Boolean=parent1.getRepresentationBoolean(); ArrayList parent2Boolean=parent2.getRepresentationBoolean(); ArrayList child1Boolean=new ArrayList<>(); ArrayList child2Boolean=new ArrayList<>(); /** * Crossover */ if(ThreadLocalRandom.current().nextDouble()0){ for(int i=0;i node2BestComponent; /** * Track the best score generated from the current starting point */ double bestScore; /** * Map from a node to the number of nodes which are dependent on this node * for connectivity into the graph */ HashMap node2DependentCount; /** * Map from a node to it's predecessor in the search tree When we remove * this node, that predecessor may be optionally 
added to the list of * removable nodes, depending on whether it has any other predecessors */ HashMap node2Predecessor; /** * Lets us know if we need to repeat the greedy search from a new starting * point */ boolean greedyDone; /** * Determines which nodes are within max depth of the starting point */ HashSet withinMaxDepth; ArrayList nodeList; Network graph; public ArrayList greedySearch() { this.max_depth = Parameters.gr_maxDepth; this.search_depth = Parameters.gr_searchDepth; node2BestComponent = new HashMap(); nodeList = ActiveSubnetworkSearch.networkNodeList; graph = ActiveSubnetworkSearch.network; int percent=0; for(int nodeNo=0;nodeNopercent){ percent=newPercent; System.out.println(percent+"% of seeds checked"); } withinMaxDepth = new HashSet(); /** * determine which nodes are within max-depth * of this starting node and add them to a hash set * so we can easily identify them * if the user doesn't wish to limit the maximum * depth, just add every node into the max depth * hash, thus all nodes are accepted as possible * additions */ if (max_depth==0) { for(Node node:nodeList){ withinMaxDepth.add(node); } } else { /** * recursively find the nodes within a max depth */ initializeMaxDepth(seed, max_depth); } // set the neighborhood of nodes to initially be only // the single node we are starting the search from ArrayList nodeListForSubnetwork=new ArrayList(); nodeListForSubnetwork.add(seed); Subnetwork component = new Subnetwork(nodeListForSubnetwork); // make sure that the seed is never added to the list of removables node2DependentCount = new HashMap(); node2Predecessor = new HashMap(); node2DependentCount.put(seed, 1); // we don't need to make a predecessor entry for the seed, // since it should never be added to the list of removable nodes HashSet removableNodes = new HashSet(); bestScore = Double.NEGATIVE_INFINITY; runGreedySearchRecursive(search_depth, component, seed, removableNodes); runGreedyRemovalSearch(component, removableNodes); for(Node
node:component.getNodeList()){ Subnetwork oldBest = node2BestComponent.get(node); if (oldBest == null || oldBest.getScore() < component.getScore()) { node2BestComponent.put(node, component); } } } System.out.println("100%"); ArrayList subnetworkList=new ArrayList(node2BestComponent.values()); Collections.sort(subnetworkList,Collections.reverseOrder()); System.out.println("Filtering"); System.out.println("Subnetwork number"+subnetworkList.size()); for(int i=subnetworkList.size()-1;i>=0;i--){ if(subnetworkList.get(i).getScore()<=0){ subnetworkList.remove(i); } } System.out.println("Subnetwork number"+subnetworkList.size()); for(int i=subnetworkList.size()-1;i>=0;i--){ if(subnetworkList.get(i).numberOfNodes()<2){ subnetworkList.remove(i); } } System.out.println("Subnetwork number"+subnetworkList.size()); return filterSubnetworkList(subnetworkList); } /** * Takes sorted subnetworkList and filters subnetworks using overlap threshold * @param subnetworkList */ private ArrayList filterSubnetworkList(ArrayList subnetworkList){ ArrayList filteredSubnetworkList=new ArrayList<>(); ArrayList subnetworkListToBeDeleted=new ArrayList<>(); int percent=0; int i=0; while(ipercent){ percent=newPercent; System.out.println(percent+"% of subnetworks checked, "+(filteredSubnetworkList.size()+1) + " filtered subnetworks in the list"); } Subnetwork subnetwork1=subnetworkList.get(i); if (!subnetworkListToBeDeleted.contains(subnetwork1)) { filteredSubnetworkList.add(subnetwork1); for (int j = i + 1; j < subnetworkList.size(); j++) { Subnetwork subnetwork2 = subnetworkList.get(j); if (!subnetworkListToBeDeleted.contains(subnetwork2)) { int common = 0; for (Node node1 : subnetwork1.getNodeList()) { if(subnetwork2.contains(node1)){ common++; } } int size; if(subnetwork1.numberOfNodes()Parameters.gr_overlapThreshold){ //subnetwork2 is added because it has lower score //subnetworkList is sorted subnetworkListToBeDeleted.add(subnetwork2); } } } } i++; } System.out.println("100%"); 
//subnetworkList.removeAll(subnetworkListToBeDeleted); return filteredSubnetworkList; } /** * Recursively find the nodes within a max depth */ private void initializeMaxDepth(Node current, int depth) { withinMaxDepth.add(current); if (depth > 0) { for (Node neighbor : graph.getNeighborSet(current)) { if (!withinMaxDepth.contains(neighbor)) { initializeMaxDepth(neighbor, depth - 1); } } } } /** * Recursive greedy search function. Called from greedySearch() to start a * recursive set of calls to greedily identify high scoring networks. The * idea for this search is that we make a recursive call for each addition * of a node from the neighborhood. At each stage we check to see if we have * found a higher scoring network, and if so, store it in one of the global * variables. You know how in the Wonder Twins, one of them turned into an * elephant and the other turned into a bucket of water? This function is * like the elephant. * * @param depth The remaining depth allowed for this greedy search. * @param component The current component we are branching from. * @param lastAdded The last node added. * @param removableNodes Nodes that can be removed. */ private boolean runGreedySearchRecursive(int depth, Subnetwork component, Node lastAdded, HashSet removableNodes) { boolean improved = false; // score this component, check and see if the global top scores should // be updated, if we have found a better score, then return true if (component.getScore() > bestScore) { depth = search_depth; improved = true; bestScore = component.getScore(); } if (depth > 0) { // if depth > 0, otherwise we are out of depth and the recursive // calls will end // Get an iterator of nodes which are next to the last added node boolean anyCallImproved = false; removableNodes.remove(lastAdded); int dependentCount = 0; for(Node newNeighbor:graph.getNeighborSet(lastAdded)){ //this node is only a new neighbor if it is not currently // in the component.
if (withinMaxDepth.contains(newNeighbor) && !component.contains(newNeighbor)) { component.addNode(newNeighbor); removableNodes.add(newNeighbor); boolean thisCallImproved = runGreedySearchRecursive( depth - 1, component, newNeighbor, removableNodes); if (!thisCallImproved) { component.removeNode(newNeighbor); removableNodes.remove(newNeighbor); }else { dependentCount += 1; anyCallImproved = true; node2Predecessor.put(newNeighbor, lastAdded); } } } improved = improved | anyCallImproved; if (dependentCount > 0) { removableNodes.remove(lastAdded); node2DependentCount.put(lastAdded, dependentCount); } } return improved; } private void runGreedyRemovalSearch(Subnetwork component, HashSet removableNodes) { LinkedList list = new LinkedList(removableNodes); while (!list.isEmpty()) { Node current = (Node) list.removeFirst(); component.removeNode(current); double score = component.getScore(); if (score > bestScore) { bestScore = score; Node predecessor = (Node) node2Predecessor.get(current); int dependentCount = node2DependentCount.get(predecessor); dependentCount -= 1; if (dependentCount == 0) { removableNodes.add(predecessor); }else { node2DependentCount.put(predecessor, dependentCount); } }else { component.addNode(current); } } } } ================================================ FILE: java/ActiveSubnetworkSearchAlgorithms/SimulatedAnnealing.java ================================================ package ActiveSubnetworkSearchAlgorithms; import ActiveSubnetworkSearchMisc.Subnetwork; import ActiveSubnetworkSearchMisc.ScoreCalculations; import Application.Parameters; import Network.*; import java.text.DecimalFormat; import java.util.ArrayList; import java.util.Collections; import java.util.HashSet; import java.util.Iterator; import java.util.Random; /** * * @author Ozan Ozisik Some code parts are * adapted from https://github.com/idekerlab/jActiveModules * * Simulated Annealing for active subnetwork search. 
* * Notes: In the code from idekerLab, all nodes are "on" at the beginning, which * costs many iterations to clean up. Here, if the related parameter is set, * all the nodes with positive z-scores are on and the others are off; otherwise * they are set randomly. In the idekerLab code, keeping the change was the default behavior when * the score was not improved in any subnetwork in the list and randomness did not * reject the change. I changed the default to false, although this situation * occurs rarely. */ public class SimulatedAnnealing { /** * * @param network * @param scoreCalculations */ public ArrayList simulatedAnnealing() { Random rand = new Random(Parameters.seedForRandom); Network network=ActiveSubnetworkSearch.network; ScoreCalculations scoreCalculations=ActiveSubnetworkSearch.scoreCalculations; ArrayList nodeList = ActiveSubnetworkSearch.networkNodeList; HashSet nodesOnSet = new HashSet<>(); HashSet nodesOffSet = new HashSet<>(nodeList); if(Parameters.startWithAllPositiveZScoreNodes){ for (Node node : nodesOffSet) { if (scoreCalculations.getZScore(node) > 0) { nodesOnSet.add(node); } } }else{ for (Node node : nodesOffSet) { if(rand.nextDouble() subnetworkList = subnetworkFinder.findSubnetworksDFS(nodesOnSet); //subnetworkList.sort((subnetwork1, subnetwork2) -> - (int)Math.signum(subnetwork1.getScore()-subnetwork2.getScore())); Collections.sort(subnetworkList, Collections.reverseOrder()); for (int i = 0; i < subnetworkList.size(); i++) { if (subnetworkList.get(i).numberOfNodes() > 1) { System.out.print(subnetworkList.get(i).numberOfNodes() + " " + (new DecimalFormat("###.##")).format(subnetworkList.get(i).getScore()) + ", "); } } System.out.println(""); double initialTemperature = Parameters.sa_initialTemperature; double finalTemperature = Parameters.sa_finalTemperature; int totalIterations = Parameters.sa_totalIterations; double T = initialTemperature; double temp_step = 1 - Math.pow((finalTemperature / initialTemperature), (1.0 / totalIterations)); System.out.println("Percentage of finished job, node
number and score of modules that have more than one node are as follows:"); int percent=0; System.out.println("0%"); //TODO: There should be another stop mechanism, not only iteration number for (int iteration = 0; iteration < totalIterations; iteration++) { int newPercent=(100*iteration)/totalIterations; if(newPercent>percent){ percent=newPercent; System.out.println(percent+"% "); printSituation(subnetworkList); } Node node = nodeList.get(rand.nextInt(nodeList.size())); toggleNodeState(nodesOnSet, nodesOffSet, node); ArrayList newSubnetworkList = subnetworkFinder.findSubnetworksDFS(nodesOnSet); //newSubnetworkList.sort((subnetwork1, subnetwork2) -> -(int)Math.signum(subnetwork1.getScore()-subnetwork2.getScore())); Collections.sort(newSubnetworkList,Collections.reverseOrder()); boolean decision = false; boolean keep = false;//was true in IdekerLab code Iterator oldIt = subnetworkList.iterator(); Iterator newIt = newSubnetworkList.iterator(); //Note: There is a higher chance of accepting a change while (!decision && (newIt.hasNext() && oldIt.hasNext())) { Subnetwork subnetworkOld=oldIt.next(); Subnetwork subnetworkNew=newIt.next(); double delta = subnetworkNew.getScore() - subnetworkOld.getScore(); if (delta > .001) { keep = true; decision = true; }else if (rand.nextDouble() > Math.exp(delta / T)) { keep = false; decision = true; } } if (keep) { subnetworkList = newSubnetworkList; } else { toggleNodeState(nodesOnSet, nodesOffSet, node); } T = T * (1 - temp_step); } System.out.println("100%"); printSituation(subnetworkList); return subnetworkList; } /** * Moves node from nodesOnSet to nodesOff set or vice versa. 
* @param nodesOnSet * @param nodesOffSet * @param node */ public void toggleNodeState(HashSet nodesOnSet, HashSet nodesOffSet, Node node) { if (nodesOnSet.contains(node)) { nodesOnSet.remove(node); nodesOffSet.add(node); } else { nodesOffSet.remove(node); nodesOnSet.add(node); } } public void printSituation(ArrayList subnetworkList){ for (int i = 0; i < subnetworkList.size(); i++) { if (subnetworkList.get(i).numberOfNodes() > 1) { System.out.print(subnetworkList.get(i).numberOfNodes() + " " + (new DecimalFormat("###.##")).format(subnetworkList.get(i).getScore()) + ", "); } } System.out.println(""); } } ================================================ FILE: java/ActiveSubnetworkSearchMisc/Gaussian.java ================================================ package ActiveSubnetworkSearchMisc; /****************************************************************************** * * https://introcs.cs.princeton.edu/java/22library/Gaussian.java.html * * Function to compute the Gaussian pdf (probability density function) * and the Gaussian cdf (cumulative distribution function) * * % java Gaussian 820 1019 209 * 0.17050966869132111 * * % java Gaussian 1500 1019 209 * 0.9893164837383883 * * % java Gaussian 1500 1025 231 * 0.9801220907365489 * * The approximation is accurate to absolute error less than 8 * 10^(-16). * Reference: Evaluating the Normal Distribution by George Marsaglia.
* http://www.jstatsoft.org/v11/a04/paper * ******************************************************************************/ public class Gaussian { // return pdf(x) = standard Gaussian pdf public static double pdf(double x) { return Math.exp(-x*x / 2) / Math.sqrt(2 * Math.PI); } // return pdf(x, mu, sigma) = Gaussian pdf with mean mu and stddev sigma public static double pdf(double x, double mu, double sigma) { return pdf((x - mu) / sigma) / sigma; } // return cdf(z) = standard Gaussian cdf using Taylor approximation public static double cdf(double z) { if (z < -8.0) return 0.0; if (z > 8.0) return 1.0; double sum = 0.0, term = z; for (int i = 3; sum + term != sum; i += 2) { sum = sum + term; term = term * z * z / i; } return 0.5 + sum * pdf(z); } // return cdf(z, mu, sigma) = Gaussian cdf with mean mu and stddev sigma public static double cdf(double z, double mu, double sigma) { return cdf((z - mu) / sigma); } // Compute z such that cdf(z) = y via bisection search public static double inverseCDF(double y) { return inverseCDF(y, 0.00000001, -8, 8); } // bisection search private static double inverseCDF(double y, double delta, double lo, double hi) { double mid = lo + (hi - lo) / 2; if (hi - lo < delta) return mid; if (cdf(mid) > y) return inverseCDF(y, delta, lo, mid); else return inverseCDF(y, delta, mid, hi); } } ================================================ FILE: java/ActiveSubnetworkSearchMisc/ScoreCalculations.java ================================================ package ActiveSubnetworkSearchMisc; import Application.Parameters; import ActiveSubnetworkSearchAlgorithms.ActiveSubnetworkSearch; import Network.*; import java.util.ArrayList; import java.util.HashMap; import java.util.AbstractMap.SimpleEntry; import java.util.Collections; import java.util.Random; import java.util.logging.Level; import java.util.logging.Logger; /** * * @author Ozan Ozisik * Some code parts from https://github.com/idekerlab/jActiveModules * * Everything related to score calculation
is in this class. * * As Monte Carlo approach includes randomness, the score calibrated by this * approach will be different in each run. * */ public class ScoreCalculations { private HashMap nodeToPValueMap; private HashMap nodeToZScoreMap; private final ArrayList networkNodeList; private double[] samplingScoreMeans; private double[] samplingScoreStds; private double[] samplingScoreMins; private double[] samplingScoreMaxs; private double MIN_SIG = 0.0000000000001; private double MAX_SIG = 1 - MIN_SIG; public ScoreCalculations(ArrayList> namePValuePairList) { this.networkNodeList=ActiveSubnetworkSearch.networkNodeList; fillNodeToPValueMap(namePValuePairList); process(); } private void fillNodeToPValueMap(ArrayList> namePValuePairList) { nodeToPValueMap = new HashMap(); int geneFromExperimentNotExisingInNetwork = 0; for (SimpleEntry entry : namePValuePairList) { Node node = new Node(entry.getKey()); if (networkNodeList.contains(node)) { double pValue = entry.getValue(); if(pValueMAX_SIG){ pValue=MAX_SIG; } double existingPValue=nodeToPValueMap.get(node) == null ? 1 : nodeToPValueMap.get(node); if (pValue < existingPValue) { nodeToPValueMap.put(node, pValue); } } else { geneFromExperimentNotExisingInNetwork++; } } System.out.println(nodeToPValueMap); if(geneFromExperimentNotExisingInNetwork>0){ Logger.getLogger(ScoreCalculations.class.getName()).log(Level.WARNING, "{0} genes in experiment file does not exist in the network", geneFromExperimentNotExisingInNetwork); } //Assign p-value to genes that do not exist in the experiment file. 
for (Node node : networkNodeList) { if (!nodeToPValueMap.containsKey(node)) { nodeToPValueMap.put(node, Parameters.pForNonSignificantNodes); } } } public void process() { boolean tmpPenaltyForSize=Parameters.penaltyForSize; Parameters.penaltyForSize=false; calculateZScores(); calculateMeanAndStdForMonteCarlo(); Parameters.penaltyForSize=tmpPenaltyForSize; } public Double getPValue(Node node) { return nodeToPValueMap.get(node); } public Double getZScore(Node node) { return nodeToZScoreMap.get(node); } private void calculateZScores() { nodeToZScoreMap=new HashMap(); for (Node node : networkNodeList) { double pValue = nodeToPValueMap.get(node); nodeToZScoreMap.put(node, ZStatistics.oneMinusNormalCDFInverse(pValue)); } } private void calculateMeanAndStdForMonteCarlo() { int numberOfNodes = networkNodeList.size(); samplingScoreMeans = new double[numberOfNodes+1];//0th position is not used samplingScoreStds = new double[numberOfNodes+1];//0th position is not used samplingScoreMins = new double[numberOfNodes+1];//0th position is not used samplingScoreMaxs = new double[numberOfNodes+1];//0th position is not used double[] samplingScoreSums=new double[numberOfNodes+1];//0th position is not used double[] samplingScoreSquareSums=new double[numberOfNodes+1];//0th position is not used for (int i = 0; i < numberOfNodes+1; i++) { samplingScoreSums[i] = 0; samplingScoreSquareSums[i] = 0; samplingScoreMins[i]=Double.MAX_VALUE; samplingScoreMaxs[i]=-Double.MAX_VALUE; } int numberOfTrials=2000; ArrayList nodeListForSampling=new ArrayList<>(networkNodeList); ArrayList significantNodesList=new ArrayList<>(); ArrayList nonsignificantNodesList=new ArrayList<>(); for(Node node:networkNodeList){ if(nodeToZScoreMap.get(node)>0){ significantNodesList.add(node); }else{ nonsignificantNodesList.add(node); } } // System.out.println(""+significantNodesList.size()+" "+nonsignificantNodesList.size()); Random random=new Random(Parameters.seedForRandom); for (int trial = 0; trial < numberOfTrials;
trial++) { // long start=System.nanoTime(); Collections.shuffle(nodeListForSampling, random); //This code can be used to first add significant nodes and start //sampling with positive scored nodes // Collections.shuffle(significantNodesList, random); // Collections.shuffle(nonsignificantNodesList, random); // nodeListForSampling.clear(); // nodeListForSampling.addAll(significantNodesList); // nodeListForSampling.addAll(nonsignificantNodesList); double zSum=0; int numberOfNodesInSubnetwork=0; for(Node node:nodeListForSampling){ zSum=zSum+nodeToZScoreMap.get(node); numberOfNodesInSubnetwork++; double score=ScoreCalculations.this.calculateScoreOfSubnetwork(numberOfNodesInSubnetwork,zSum,false); samplingScoreSums[numberOfNodesInSubnetwork]+=score; samplingScoreSquareSums[numberOfNodesInSubnetwork]+=score*score; if(score<samplingScoreMins[numberOfNodesInSubnetwork]){ samplingScoreMins[numberOfNodesInSubnetwork]=score; } if(score>samplingScoreMaxs[numberOfNodesInSubnetwork]){ samplingScoreMaxs[numberOfNodesInSubnetwork]=score; } } // long stop=System.nanoTime(); // System.out.println((stop-start)/1000);//ms } for(int i=1;i<=numberOfNodes;i++){ samplingScoreMeans[i]=samplingScoreSums[i]/numberOfTrials; /** * var = SUM((x-xmean)^2) / N * var = SUM(x^2 - 2*xmean*x + xmean^2)/N * var = SUM(x^2)/N - (2*xmean*SUM(x))/N + (N*xmean^2)/N * var = SUM(x^2 )/N - 2*xmean^2 + xmean^2 * var = SUM(x^2 )/N - xmean^2 */ samplingScoreStds[i]=samplingScoreSquareSums[i]/numberOfTrials - samplingScoreMeans[i]*samplingScoreMeans[i]; samplingScoreStds[i]=Math.sqrt(samplingScoreStds[i]+0.0000001); } } /** * Calculates score of subnetwork. Returns zero for one node subnetworks. * @param subnetwork * @param subnetworkScoreNormalization * @return */ public double calculateScoreOfSubnetwork(Subnetwork subnetwork, boolean subnetworkScoreNormalization) { return ScoreCalculations.this.calculateScoreOfSubnetwork(subnetwork.getNodeList(), subnetworkScoreNormalization); } /** * Calculates score using node list. Returns zero for one node subnetworks.
* @param nodeList * @param subnetworkScoreNormalization * @return */ public double calculateScoreOfSubnetwork(ArrayList nodeList, boolean subnetworkScoreNormalization) { int numberOfNodes=nodeList.size(); double zSum=0; for(Node node:nodeList){ zSum=zSum+nodeToZScoreMap.get(node); } return ScoreCalculations.this.calculateScoreOfSubnetwork(numberOfNodes, zSum, subnetworkScoreNormalization); } /** * Calculates score using z score sum and number of nodes. * Returns zero for one node subnetworks. * @param numberOfNodes * @param zSum * @param subnetworkScoreNormalization * @return */ public double calculateScoreOfSubnetwork(int numberOfNodes, double zSum, boolean subnetworkScoreNormalization) { if(numberOfNodes==1){ return 0; } double score=zSum/Math.sqrt(numberOfNodes); if(subnetworkScoreNormalization){ score=normalizeScore(score, numberOfNodes); } if(Parameters.penaltyForSize){ score=penaltyForSize(score, numberOfNodes); } return score; } private double normalizeScore(double score, int numberOfNodes){ return (score-samplingScoreMeans[numberOfNodes])/samplingScoreStds[numberOfNodes]; } private double penaltyForSize(double score, int numberOfNodes){ score=score*Gaussian.cdf(score, 100, 30)*1000; return score; } } ================================================ FILE: java/ActiveSubnetworkSearchMisc/Subnetwork.java ================================================ package ActiveSubnetworkSearchMisc; import Network.Network; import ActiveSubnetworkSearchAlgorithms.ActiveSubnetworkSearch; import Network.Node; import java.util.ArrayList; import java.util.HashSet; /** * * @author Ozan Ozisik */ public class Subnetwork implements Comparable { private Network network; private ArrayList nodeList; private ScoreCalculations scoreCalculations; private double score; private double zSum; private HashSet neighborSet; private ArrayList neighborList; public Subnetwork(ArrayList nodeList){ this.nodeList=nodeList; this.scoreCalculations=ActiveSubnetworkSearch.scoreCalculations; 
neighborSet=new HashSet<>(); neighborList=new ArrayList(); network = ActiveSubnetworkSearch.network; zSum=0; for(Node node:nodeList){ zSum=zSum+scoreCalculations.getZScore(node); } this.score=scoreCalculations.calculateScoreOfSubnetwork(nodeList.size(), zSum, true); } //TODO It may be better to return a copy of the list or Collections.unmodifiableList(nodeList), here and in other private collection returning areas public ArrayList getNodeList(){ return nodeList; } public HashSet getNeighborSet(){ if(neighborSet.isEmpty()){ extractNeighborSet(); } return neighborSet; } public ArrayList getNeighborList(){ if(neighborSet.isEmpty()){ extractNeighborSet(); } return neighborList; } public int numberOfNodes(){ return nodeList.size(); } public double getScore(){ return score; } @Override public int compareTo(Object o) { return (int)Math.signum(this.getScore()-((Subnetwork)o).getScore()); } public boolean contains(Node node){ return nodeList.contains(node); } public void addNode(Node node){ nodeList.add(node); zSum=zSum+scoreCalculations.getZScore(node); this.score=scoreCalculations.calculateScoreOfSubnetwork(nodeList.size(), zSum, true); //neighborSet is cleared for reextraction in case of need. //It could also be updated here. neighborSet.clear(); } public void removeNode(Node node){ if(nodeList.contains(node)){ nodeList.remove(node); zSum=zSum-scoreCalculations.getZScore(node); this.score=scoreCalculations.calculateScoreOfSubnetwork(nodeList.size(), zSum, true); //neighborSet is cleared for reextraction in case of need. //It could also be updated here. 
neighborSet.clear(); } } private void extractNeighborSet(){ neighborSet.clear(); for(Node node:nodeList){ neighborSet.addAll(network.getNeighborSet(node)); } neighborSet.removeAll(nodeList); neighborList.addAll(neighborSet); } } ================================================ FILE: java/ActiveSubnetworkSearchMisc/ZStatistics.java ================================================ package ActiveSubnetworkSearchMisc; /** * * @author Ozan Ozisik * adapted from https://github.com/idekerlab/jActiveModules */ public class ZStatistics { public static double oneMinusNormalCDFInverse(double p) { if (p <= 0.5) { if (p > 0) { return oneMinusNormalCDFInversePLT5(p); } else { return Double.POSITIVE_INFINITY; } } else if (p < 1) { return -oneMinusNormalCDFInversePLT5(1 - p); } else { return Double.NEGATIVE_INFINITY; } } //from 26.2.23, page 933, Handbook of Mathematical Functions, NBS, 1964 //Requires 0 < p <= 0.5 private static double oneMinusNormalCDFInversePLT5(double p) { double t, temp; if (p < 0) { throw new IllegalArgumentException("oneMinusNormalCDFInversePLT5 called with negative p\n"); } else if (p > 0.5) { throw new IllegalArgumentException("oneMinusNormalCDFInversePLT5 called with p > 0.5\n"); } else { t = Math.sqrt(-2 * Math.log(p)); temp = 2.515517 + 0.802853 * t + 0.010328 * t * t; temp = t - temp / (1 + 1.432788 * t + 0.189269 * t * t + 0.001308 * t * t * t); return temp; } } } ================================================ FILE: java/Application/AppActiveSubnetworkSearch.java ================================================ package Application; import ActiveSubnetworkSearchAlgorithms.ActiveSubnetworkSearch; import java.util.logging.Level; import java.util.logging.Logger; /** * * @author Ozan Ozisik */ public class AppActiveSubnetworkSearch { /** * @param args the command line arguments */ public static void main(String[] args) { try{ processArguments(args); }catch(Exception e){ Logger.getLogger(AppActiveSubnetworkSearch.class.getName()).log(Level.SEVERE,
"Please check the arguments"); System.exit(0); } ActiveSubnetworkSearch.activeSubnetworkSearch(); } public static void processArguments(String[] args) throws Exception { String helpText; helpText = "Options of the application are\n" + "-sif= \tuses the given interaction file\n" + "-sig= \tuses the given experiment file (gene p-value pairs)\n" + "-method=[GR|SA|GA] \truns greedy search, simulated annealing or genetic algorithm for the search (default GR)\n" + "-useAllPositives \tif used, adds an individual with all positive nodes in GA, initializes candidate solution with all positive nodes in SA (default false)\n" + "-geneInitProb= \tprobability of adding a gene in initial solution for SA and GA (default 0.1)\n" + "-saTemp0= \tinitial temperature for SA (default 1.0)\n" + "-saTemp1= \tfinal temperature for SA (default 0.01)\n" + "-saIter= \titeration number for SA (default 10000)\n" + "-gaPop= \tpopulation size for GA (default 400)\n" + "-gaIter= \titeration number for GA (default 200)\n" + "-gaThread= \tnumber of threads to be used in GA (default 5)\n" + "-gaCrossover= \tapplies crossover with given probability (default 1)\n" + "-gaMut= \tapplies mutation with given rate (default 0)\n" + "-grMaxDepth= \tsets max depth in greedy search, 0 for no limit (default 1)\n" + "-grSearchDepth= \tsets search depth in greedy search (default 1)\n" + "-grOverlap= \tsets overlap threshold for results of greedy search (default 0.5)\n" + "-grSubNum= \tsets number of subnetworks to be presented in the results (default 1000)\n" + "-seedForRandom= \tsets the seed for random number generators, useful for reproducibility (default 1234)\n"; if(args.length==0 || args[0].equals("-h") || args[0].equals("help") || args[0].equals("-help")){ System.out.println(helpText); }else{ for(int i=0;i<args.length;i++){ String[] str=args[i].split("="); String argType=str[0]; String value=""; if(str.length>1){ value=str[1]; } switch(argType){ case "-sif":Parameters.sifPath=value;break; case "-sig":Parameters.experimentFilePath=value;break; case
"-method":Parameters.useSAorGAorGR=Parameters.SearchMethod.valueOf(value);break; case "-useAllPositives":Parameters.startWithAllPositiveZScoreNodes=true;break; case "-geneInitProb":Parameters.geneInitialAdditionProbability=Double.parseDouble(value);break; case "-saTemp0":Parameters.sa_initialTemperature=Double.parseDouble(value);break; case "-saTemp1":Parameters.sa_finalTemperature=Double.parseDouble(value);break; case "-saIter":Parameters.sa_totalIterations=Integer.parseInt(value);break; case "-gaPop":Parameters.ga_populationSize=Integer.parseInt(value);break; case "-gaIter":Parameters.ga_totalIterations=Integer.parseInt(value);break; case "-gaThread":Parameters.ga_threadNumber=Integer.parseInt(value);break; case "-gaCrossover":Parameters.ga_crossoverRate=Double.parseDouble(value);break; case "-gaMut":Parameters.ga_mutationRate=Double.parseDouble(value);break; case "-grMaxDepth":Parameters.gr_maxDepth=Integer.parseInt(value);break; case "-grSearchDepth":Parameters.gr_searchDepth=Integer.parseInt(value);break; case "-grOverlap":Parameters.gr_overlapThreshold=Double.parseDouble(value);break; case "-grSubNum":Parameters.gr_subnetworkNum=Integer.parseInt(value);break; case "-seedForRandom":Parameters.seedForRandom=Integer.parseInt(value);break; default:System.out.println("Unknown argument: "+argType); } } } } } ================================================ FILE: java/Application/Parameters.java ================================================ package Application; /** * * @author Ozan Ozisik */ public class Parameters { public static String sifPath="BIOGRID-ORGANISM-Homo_sapiens-3.4.155.OzCleaned.sif"; public static String experimentFilePath="Behcet_jp_GWASPvalue.txt"; public static String resultFilePath="resultActiveSubnetworkSearch.txt"; public enum SearchMethod{GR, SA, GA}; public static SearchMethod useSAorGAorGR=SearchMethod.GR;//(default GR) public static boolean startWithAllPositiveZScoreNodes=false;//(default false) public static double geneInitialAdditionProbability=0.1;//(default 0.1) public static boolean penaltyForSize=false; public
static double pForNonSignificantNodes=0.5;//0.9999999999999 public static double sa_initialTemperature=1.0;//(default 1.0) public static double sa_finalTemperature=0.01;//(default 0.01) public static int sa_totalIterations=10000;//(default 10000) public static int ga_populationSize=400;//(default 400) public static int ga_totalIterations=200;//(default 200) public static int ga_threadNumber=5;//(default 5) public static double ga_crossoverRate=1; public static double ga_mutationRate=0.0; public static boolean ga_Elitism=true; public static int gr_maxDepth=1;//(default 1) public static int gr_searchDepth=1;//(default 1) public static double gr_overlapThreshold=0.5;//(default 0.5) public static double gr_subnetworkNum=1000;//(default 1000) public static int seedForRandom=1234; } ================================================ FILE: java/File/ExperimentFileReader.java ================================================ package File; import java.io.BufferedReader; import java.io.FileNotFoundException; import java.io.FileReader; import java.io.IOException; import java.util.logging.Level; import java.util.logging.Logger; import java.util.AbstractMap.SimpleEntry; import java.util.ArrayList; /** * * @author Ozan Ozisik */ public class ExperimentFileReader { public static ArrayList> readExperimentFile(String path){ try { //A list of pairs is used to allow multiple p-values for the same gene. 
//Handling multiple p-values for the same gene is not the responsibility of this class ArrayList<SimpleEntry<String, Double>> namePValuePairList=new ArrayList<>(); BufferedReader bufReader=new BufferedReader(new FileReader(path)); String line; String[] strArr; int lineNo=1; while ((line = bufReader.readLine()) != null) { strArr=line.split("[ \\t]"); if(strArr.length==2){ try{ namePValuePairList.add(new SimpleEntry<>(strArr[0].toUpperCase(), Double.parseDouble(strArr[1]))); }catch(NumberFormatException nfe){ Logger.getLogger(ExperimentFileReader.class.getName()).log(Level.WARNING, "Unexpected number format in experiment file line {0}, discarded", lineNo); } }else{ Logger.getLogger(ExperimentFileReader.class.getName()).log(Level.WARNING, "Unexpected column number in experiment file line {0}, discarded", lineNo); } lineNo++; } return namePValuePairList; } catch (FileNotFoundException ex) { Logger.getLogger(ExperimentFileReader.class.getName()).log(Level.SEVERE, "Experiment file not found", ex); return null; } catch (IOException ex) { Logger.getLogger(ExperimentFileReader.class.getName()).log(Level.SEVERE, null, ex); return null; } } } ================================================ FILE: java/File/SIFReader.java ================================================ package File; import Network.Network; import java.io.BufferedReader; import java.io.FileNotFoundException; import java.io.FileReader; import java.io.IOException; import java.util.logging.Level; import java.util.logging.Logger; /** * * @author Ozan Ozisik */ public class SIFReader { public static Network readSIF(String path){ try { int columnNumber; BufferedReader bufReader=new BufferedReader(new FileReader(path)); Network network=new Network(); bufReader.mark(300); String line; line = bufReader.readLine(); String[] strArr=line.split("[ \\t]"); if(strArr.length==2){ columnNumber=2; }else if(strArr.length==3){ columnNumber=3; }else{ Logger.getLogger(SIFReader.class.getName()).log(Level.SEVERE, "SIF file must have 2 or 3 columns"); return null; }
bufReader.reset(); int lineNo=1; while ((line = bufReader.readLine()) != null) { strArr=line.split("[ \\t]"); if(strArr.length==columnNumber){ String strNode1, strNode2; if(columnNumber==2){ strNode1=strArr[0]; strNode2=strArr[1]; }else{ strNode1=strArr[0]; strNode2=strArr[2]; } strNode1=strNode1.toUpperCase(); strNode2=strNode2.toUpperCase(); network.addInteraction(strNode1,strNode2); }else{ Logger.getLogger(SIFReader.class.getName()).log(Level.WARNING, "Unexpected column number in SIF line {0}, discarded", lineNo); } lineNo++; } return network; } catch (FileNotFoundException ex) { Logger.getLogger(SIFReader.class.getName()).log(Level.SEVERE, "SIF file not found", ex); return null; } catch (IOException ex) { Logger.getLogger(SIFReader.class.getName()).log(Level.SEVERE, null, ex); return null; } } } ================================================ FILE: java/Network/Network.java ================================================ package Network; import java.util.ArrayList; import java.util.HashMap; import java.util.HashSet; import java.util.logging.Level; import java.util.logging.Logger; /** * * @author Ozan Ozisik */ public class Network { private HashMap> adjacency; private boolean selfInteractionWarningGiven=false; public Network() { adjacency=new HashMap>(); } public void addInteraction(String strNode1, String strNode2){ Node node1=new Node(strNode1); Node node2=new Node(strNode2); addInteraction(node1, node2); } public void addInteraction(Node node1, Node node2){ if(node1.equals(node2)){ if(!selfInteractionWarningGiven){ Logger.getLogger(Network.class.getName()).log(Level.WARNING, "Self interactions are discarded."); selfInteractionWarningGiven=true; } }else{ if(adjacency.get(node1)==null){ adjacency.put(node1, new HashSet()); } if(adjacency.get(node2)==null){ adjacency.put(node2, new HashSet()); } adjacency.get(node1).add(node2); adjacency.get(node2).add(node1); } } public HashSet getNeighborSet(Node node){ return adjacency.get(node); } public ArrayList 
getNodeList(){ return new ArrayList<>(adjacency.keySet()); } public boolean areAdjacent(Node node1, Node node2){ return adjacency.get(node1).contains(node2); } public int getNumberOfNodes(){ return adjacency.keySet().size(); } public int getNumberOfInteractions(){ int interactionNumber=0; ArrayList nodeList=getNodeList(); for(int i=0;i nodesOnSet; HashSet reached; public SubnetworkFinder(){ network=ActiveSubnetworkSearch.network; } /** * Finds the connected subnetworks of the given nodes using depth first * search. This method may return empty ArrayList, this should be handled in the * calling methods. * @param nodesOnSet * @return */ public ArrayList findSubnetworksDFS(HashSet nodesOnSet){ this.nodesOnSet=nodesOnSet; ArrayList subnetworkList=new ArrayList(); reached=new HashSet<>(2 * nodesOnSet.size()); for(Node node:nodesOnSet){ if(!reached.contains(node)){ ArrayList subnetworkNodeList = new ArrayList<>(); search(node, subnetworkNodeList); subnetworkList.add(new Subnetwork(subnetworkNodeList)); } } return subnetworkList; } private void search(Node node, ArrayList subnetworkNodeList){ reached.add(node); subnetworkNodeList.add(node); HashSet neighborNodesSet=network.getNeighborSet(node); for(Node neighborNode:neighborNodesSet){ if((nodesOnSet.contains(neighborNode))&&(!reached.contains(neighborNode))){ search(neighborNode, subnetworkNodeList); } } } public ArrayList findSubnetworksDFSNonRecursive(HashSet nodesOnSet){ ArrayList subnetworkList=new ArrayList(); HashSet reached=new HashSet<>(2 * nodesOnSet.size()); for(Node node:nodesOnSet){ if(!reached.contains(node)){ ArrayList subnetworkNodeList = new ArrayList<>(); LinkedList nodesToBeChecked=new LinkedList<>(); nodesToBeChecked.add(node); while(!nodesToBeChecked.isEmpty()){ Node curNode=nodesToBeChecked.pop(); if(!reached.contains(curNode)){ reached.add(curNode); subnetworkNodeList.add(curNode); for(Node neighborNode:network.getNeighborSet(curNode)){ 
if((nodesOnSet.contains(neighborNode))&&(!reached.contains(neighborNode))){ nodesToBeChecked.push(neighborNode); } } } } subnetworkList.add(new Subnetwork(subnetworkNodeList)); } } return subnetworkList; } public ArrayList findSubnetworksBFS(HashSet nodesOnSet){ ArrayList subnetworkList=new ArrayList(); HashSet reached=new HashSet<>(2 * nodesOnSet.size()); for(Node node:nodesOnSet){ if(!reached.contains(node)){ ArrayList subnetworkNodeList = new ArrayList<>(); LinkedList nodesToBeChecked=new LinkedList<>(); nodesToBeChecked.add(node); while(!nodesToBeChecked.isEmpty()){ Node curNode=nodesToBeChecked.pop(); if(!reached.contains(curNode)){ reached.add(curNode); subnetworkNodeList.add(curNode); for(Node neighborNode:network.getNeighborSet(curNode)){ if((nodesOnSet.contains(neighborNode))&&(!reached.contains(neighborNode))){ nodesToBeChecked.add(neighborNode); } } } } subnetworkList.add(new Subnetwork(subnetworkNodeList)); } } return subnetworkList; } } ================================================ FILE: man/UpSet_plot.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/visualization.R \name{UpSet_plot} \alias{UpSet_plot} \title{Create UpSet Plot of Enriched Terms} \usage{ UpSet_plot( result_df, genes_df, num_terms = 10, method = "heatmap", use_description = FALSE, low = "red", mid = "black", high = "green", ... 
) } \arguments{ \item{result_df}{A data frame of pathfindR results that must contain the following columns: \describe{ \item{Term_Description}{Description of the enriched term (necessary if \code{use_description = TRUE})} \item{ID}{ID of the enriched term (necessary if \code{use_description = FALSE})} \item{lowest_p}{the lowest adjusted-p value of the given term over all iterations} \item{Up_regulated}{the up-regulated genes in the input involved in the given term's gene set, comma-separated} \item{Down_regulated}{the down-regulated genes in the input involved in the given term's gene set, comma-separated} }} \item{genes_df}{the input data that was used with \code{\link{run_pathfindR}}. It must be a data frame with 3 columns: \enumerate{ \item Gene Symbol (Gene Symbol) \item Change value, e.g. log(fold change) (optional) \item p value, e.g. adjusted p value associated with differential expression } The change values in this data frame are used to color the affected genes} \item{num_terms}{Number of top enriched terms to use while creating the plot. Set to \code{NULL} to use all enriched terms (default = 10)} \item{method}{the option for producing the plot. Options include 'heatmap', 'boxplot' and 'barplot'. (default = 'heatmap')} \item{use_description}{Boolean argument to indicate whether term descriptions (in the 'Term_Description' column) should be used. (default = \code{FALSE})} \item{low}{a string indicating the color of 'low' values in the coloring gradient (default = 'red')} \item{mid}{a string indicating the color of 'mid' values in the coloring gradient (default = 'black')} \item{high}{a string indicating the color of 'high' values in the coloring gradient (default = 'green')} \item{...}{additional arguments for \code{\link{input_processing}} (used if \code{genes_df} is provided)} } \value{ UpSet plots are plots of the intersections of sets as a matrix.
This function creates a ggplot object of an UpSet plot where the x-axis is the UpSet plot of intersections of enriched terms. By default (i.e. \code{method = 'heatmap'}) the main plot is a heatmap of genes at the corresponding intersections, colored by up/down regulation (if \code{genes_df} is provided, colored by change values). If \code{method = 'barplot'}, the main plot is bar plots of the number of genes at the corresponding intersections. Finally, if \code{method = 'boxplot'} and if \code{genes_df} is provided, then the main plot displays the boxplots of change values of the genes at the corresponding intersections. } \description{ Create UpSet Plot of Enriched Terms } \examples{ UpSet_plot(example_pathfindR_output) } ================================================ FILE: man/active_snw_enrichment_wrapper.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/utility.R \name{active_snw_enrichment_wrapper} \alias{active_snw_enrichment_wrapper} \title{Wrapper for Active Subnetwork Search + Enrichment over Single/Multiple Iteration(s)} \usage{ active_snw_enrichment_wrapper( input_processed, pin_path, gset_list, enrichment_threshold, list_active_snw_genes, adj_method = "bonferroni", search_method = "GR", disable_parallel = FALSE, use_all_positives = FALSE, iterations = 10, n_processes = NULL, score_quan_thr = 0.8, sig_gene_thr = 0.02, saTemp0 = 1, saTemp1 = 0.01, saIter = 10000, gaPop = 400, gaIter = 200, gaThread = 5, gaCrossover = 1, gaMut = 0, grMaxDepth = 1, grSearchDepth = 1, grOverlap = 0.5, grSubNum = 1000, silent_option = TRUE ) } \arguments{ \item{input_processed}{processed input data frame} \item{pin_path}{path/to/PIN/file} \item{gset_list}{list for gene sets} \item{enrichment_threshold}{adjusted-p value threshold used when filtering enrichment results (default = 0.05)} \item{list_active_snw_genes}{boolean value indicating whether or not to report the non-significant active 
subnetwork genes for the active subnetwork which was enriched for the given term with the lowest p value (default = \code{FALSE})} \item{adj_method}{correction method to be used for adjusting p-values. (default = 'bonferroni')} \item{search_method}{algorithm to use when performing active subnetwork search. Options are greedy search (GR), simulated annealing (SA) or genetic algorithm (GA) (default = 'GR').} \item{disable_parallel}{boolean to indicate whether to disable parallel runs via \code{foreach} (default = FALSE)} \item{use_all_positives}{if TRUE: in GA, adds an individual with all positive nodes. In SA, initializes candidate solution with all positive nodes. (default = FALSE)} \item{iterations}{number of iterations for active subnetwork search and enrichment analyses (Default = 10)} \item{n_processes}{optional argument for specifying the number of processes used by foreach. If not specified, the function determines this automatically (Default = NULL. Gets set to 1 for Genetic Algorithm)} \item{score_quan_thr}{active subnetwork score quantile threshold. Must be between 0 and 1 or set to -1 for not filtering. (Default = 0.8)} \item{sig_gene_thr}{threshold for the minimum proportion of significant genes in the subnetwork (Default = 0.02). If the number of genes to use as threshold is calculated to be < 2 (e.g. 50 signif. genes x 0.01 = 0.5), the threshold number is set to 2} \item{saTemp0}{Initial temperature for SA (default = 1.0)} \item{saTemp1}{Final temperature for SA (default = 0.01)} \item{saIter}{Iteration number for SA (default = 10000)} \item{gaPop}{Population size for GA (default = 400)} \item{gaIter}{Iteration number for GA (default = 200)} \item{gaThread}{Number of threads to be used in GA (default = 5)} \item{gaCrossover}{Applies crossover with the given probability in GA (default = 1, i.e. always perform crossover)} \item{gaMut}{For GA, applies mutation with given mutation rate (default = 0, i.e.
mutation off)} \item{grMaxDepth}{Sets max depth in greedy search, 0 for no limit (default = 1)} \item{grSearchDepth}{Search depth in greedy search (default = 1)} \item{grOverlap}{Overlap threshold for results of greedy search (default = 0.5)} \item{grSubNum}{Number of subnetworks to be presented in the results (default = 1000)} \item{silent_option}{boolean value indicating whether to print the messages to the console (FALSE) or not (TRUE, this will print to a temp. file) during active subnetwork search (default = TRUE). This option was added because during parallel runs, the console messages are printed out of order.} } \value{ Data frame of combined pathfindR enrichment results } \description{ Wrapper for Active Subnetwork Search + Enrichment over Single/Multiple Iteration(s) } ================================================ FILE: man/active_snw_search.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/active_snw_search.R \name{active_snw_search} \alias{active_snw_search} \title{Perform Active Subnetwork Search} \usage{ active_snw_search( input_for_search, pin_name_path = "Biogrid", snws_file = "active_snws", dir_for_parallel_run = NULL, score_quan_thr = 0.8, sig_gene_thr = 0.02, search_method = "GR", seedForRandom = 1234, silent_option = TRUE, use_all_positives = FALSE, geneInitProbs = 0.1, saTemp0 = 1, saTemp1 = 0.01, saIter = 10000, gaPop = 400, gaIter = 10000, gaThread = 5, gaCrossover = 1, gaMut = 0, grMaxDepth = 1, grSearchDepth = 1, grOverlap = 0.5, grSubNum = 1000 ) } \arguments{ \item{input_for_search}{the input data that active subnetwork search uses. The input must be a data frame containing at least these 2 columns: \describe{ \item{GENE}{Gene Symbol} \item{P_VALUE}{p value obtained through a test, e.g. differential expression/methylation} }} \item{pin_name_path}{Name of the chosen PIN or absolute/path/to/PIN.sif.
If PIN name, must be one of c('Biogrid', 'STRING', 'GeneMania', 'IntAct', 'KEGG', 'mmu_STRING'). If path/to/PIN.sif, the file must comply with the PIN specifications. (Default = 'Biogrid')} \item{snws_file}{name for active subnetwork search output data \strong{without file extension} (default = 'active_snws')} \item{dir_for_parallel_run}{(previously created) directory for a parallel run iteration. Used in the wrapper function (see ?run_pathfindR) (Default = NULL)} \item{score_quan_thr}{active subnetwork score quantile threshold. Must be between 0 and 1 or set to -1 for not filtering. (Default = 0.8)} \item{sig_gene_thr}{threshold for the minimum proportion of significant genes in the subnetwork (Default = 0.02) If the number of genes to use as threshold is calculated to be < 2 (e.g. 50 signif. genes x 0.01 = 0.5), the threshold number is set to 2} \item{search_method}{algorithm to use when performing active subnetwork search. Options are greedy search (GR), simulated annealing (SA) or genetic algorithm (GA) for the search (default = 'GR').} \item{seedForRandom}{seed for reproducibility while running the java modules (applies for GR and SA)} \item{silent_option}{boolean value indicating whether to print the messages to the console (FALSE) or not (TRUE, this will print to a temp. file) during active subnetwork search (default = TRUE). This option was added because during parallel runs, the console messages get disorderly printed.} \item{use_all_positives}{if TRUE: in GA, adds an individual with all positive nodes. In SA, initializes candidate solution with all positive nodes. 
(default = FALSE)} \item{geneInitProbs}{For SA and GA, probability of adding a gene in initial solution (default = 0.1)} \item{saTemp0}{Initial temperature for SA (default = 1.0)} \item{saTemp1}{Final temperature for SA (default = 0.01)} \item{saIter}{Iteration number for SA (default = 10000)} \item{gaPop}{Population size for GA (default = 400)} \item{gaIter}{Iteration number for GA (default = 10000)} \item{gaThread}{Number of threads to be used in GA (default = 5)} \item{gaCrossover}{Applies crossover with the given probability in GA (default = 1, i.e. always perform crossover)} \item{gaMut}{For GA, applies mutation with given mutation rate (default = 0, i.e. mutation off)} \item{grMaxDepth}{Sets max depth in greedy search, 0 for no limit (default = 1)} \item{grSearchDepth}{Search depth in greedy search (default = 1)} \item{grOverlap}{Overlap threshold for results of greedy search (default = 0.5)} \item{grSubNum}{Number of subnetworks to be presented in the results (default = 1000)} } \value{ A list of genes in every identified active subnetwork that has a score greater than the `score_quan_thr`th quantile and that has at least `sig_gene_thr` affected genes.
} \description{ Perform Active Subnetwork Search } \examples{ \donttest{ processed_df <- example_pathfindR_input[1:15, -2] colnames(processed_df) <- c('GENE', 'P_VALUE') GR_snws <- active_snw_search( input_for_search = processed_df, pin_name_path = 'KEGG', search_method = 'GR', score_quan_thr = 0.8 ) # clean-up unlink('active_snw_search', recursive = TRUE) } } ================================================ FILE: man/annotate_term_genes.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/utility.R \name{annotate_term_genes} \alias{annotate_term_genes} \title{Annotate the Affected Genes in the Provided Enriched Terms} \usage{ annotate_term_genes( result_df, input_processed, genes_by_term = pathfindR.data::kegg_genes ) } \arguments{ \item{result_df}{data frame of enrichment results. The only must-have column is 'ID'.} \item{input_processed}{input data processed via \code{\link{input_processing}}} \item{genes_by_term}{List that contains genes for each gene set. Names of this list are gene set IDs (default = kegg_genes)} } \value{ The original data frame with two additional columns: \describe{ \item{Up_regulated}{the up-regulated genes in the input involved in the given term's gene set, comma-separated} \item{Down_regulated}{the down-regulated genes in the input involved in the given term's gene set, comma-separated} } } \description{ Function to annotate the involved affected (input) genes in each term. 
} \examples{ example_gene_data <- example_pathfindR_input colnames(example_gene_data) <- c('GENE', 'CHANGE', 'P_VALUE') annotated_result <- annotate_term_genes( result_df = example_pathfindR_output, input_processed = example_gene_data ) } ================================================ FILE: man/check_java_version.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/zzz.R \name{check_java_version} \alias{check_java_version} \title{Check Java Version} \usage{ check_java_version(version = NULL) } \arguments{ \item{version}{character vector containing the output of 'java -version'. If NULL, result of \code{\link{fetch_java_version}} is used (default = NULL)} } \value{ only parses and checks whether the java version is >= 1.8 } \description{ Check Java Version } \details{ this function was adapted from the CRAN package \code{sparklyr} } ================================================ FILE: man/cluster_enriched_terms.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/clustering.R \name{cluster_enriched_terms} \alias{cluster_enriched_terms} \title{Cluster Enriched Terms} \usage{ cluster_enriched_terms( enrichment_res, method = "hierarchical", plot_clusters_graph = TRUE, use_description = FALSE, use_active_snw_genes = FALSE, ... ) } \arguments{ \item{enrichment_res}{data frame of pathfindR enrichment results. Must-have columns are 'Term_Description' (if \code{use_description = TRUE}) or 'ID' (if \code{use_description = FALSE}), 'Down_regulated', and 'Up_regulated'. If \code{use_active_snw_genes = TRUE}, 'non_Signif_Snw_Genes' must also be provided.} \item{method}{Either 'hierarchical' or 'fuzzy'. 
Details of clustering are provided in the corresponding functions \code{\link{hierarchical_term_clustering}} and \code{\link{fuzzy_term_clustering}}} \item{plot_clusters_graph}{boolean value indicating whether or not to plot the graph diagram of clustering results (default = TRUE)} \item{use_description}{Boolean argument to indicate whether term descriptions (in the 'Term_Description' column) should be used. (default = \code{FALSE})} \item{use_active_snw_genes}{boolean to indicate whether or not to use non-input active subnetwork genes in the calculation of kappa statistics (default = FALSE, i.e. only use affected genes)} \item{...}{additional arguments for \code{\link{hierarchical_term_clustering}}, \code{\link{fuzzy_term_clustering}} and \code{\link{cluster_graph_vis}}. See documentation of these functions for more details.} } \value{ a data frame of clustering results. For 'hierarchical', the cluster assignments (Cluster) and whether the term is representative of its cluster (Status) are added as columns. For 'fuzzy', terms that are in multiple clusters are provided for each cluster. The cluster assignments (Cluster) and whether the term is representative of its cluster (Status) are added as columns. } \description{ Cluster Enriched Terms } \examples{ example_clustered <- cluster_enriched_terms( example_pathfindR_output[1:3, ], plot_clusters_graph = FALSE ) example_clustered <- cluster_enriched_terms( example_pathfindR_output[1:3, ], method = 'fuzzy', plot_clusters_graph = FALSE ) } \seealso{ See \code{\link{hierarchical_term_clustering}} for hierarchical clustering of enriched terms. See \code{\link{fuzzy_term_clustering}} for fuzzy clustering of enriched terms. See \code{\link{cluster_graph_vis}} for graph visualization of clustering. 
} ================================================ FILE: man/cluster_graph_vis.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/clustering.R \name{cluster_graph_vis} \alias{cluster_graph_vis} \title{Graph Visualization of Clustered Enriched Terms} \usage{ cluster_graph_vis( clu_obj, kappa_mat, enrichment_res, kappa_threshold = 0.35, use_description = FALSE, vertex.label.cex = 0.7, vertex.size.scaling = 2.5 ) } \arguments{ \item{clu_obj}{clustering result (either a matrix obtained via \code{\link{fuzzy_term_clustering}} or a vector obtained via \code{\link{hierarchical_term_clustering}})} \item{kappa_mat}{matrix of kappa statistics (output of \code{\link{create_kappa_matrix}})} \item{enrichment_res}{data frame of pathfindR enrichment results. Must-have columns are 'Term_Description' (if \code{use_description = TRUE}) or 'ID' (if \code{use_description = FALSE}), 'Down_regulated', and 'Up_regulated'. If \code{use_active_snw_genes = TRUE}, 'non_Signif_Snw_Genes' must also be provided.} \item{kappa_threshold}{threshold for kappa statistics, defining strong relation (default = 0.35)} \item{use_description}{Boolean argument to indicate whether term descriptions (in the 'Term_Description' column) should be used. (default = \code{FALSE})} \item{vertex.label.cex}{font size for vertex labels; it is interpreted as a multiplication factor of some device-dependent base font size (default = 0.7)} \item{vertex.size.scaling}{scaling factor for the node size (default = 2.5)} } \value{ Plots a graph diagram of clustering results. Each node is an enriched term from `enrichment_res`. Size of node corresponds to -log(lowest_p). Thickness of the edges between nodes corresponds to the kappa statistic between the two terms. Color of each node corresponds to distinct clusters. 
For fuzzy clustering, if a term is in multiple clusters, multiple colors are utilized. } \description{ Graph Visualization of Clustered Enriched Terms } \examples{ \dontrun{ cluster_graph_vis(clu_obj, kappa_mat, enrichment_res) } } ================================================ FILE: man/color_kegg_pathway.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/visualization.R \name{color_kegg_pathway} \alias{color_kegg_pathway} \title{Color hsa KEGG pathway} \usage{ color_kegg_pathway( pw_id, change_vec, scale_vals = TRUE, node_cols = NULL, legend.position = "top" ) } \arguments{ \item{pw_id}{hsa KEGG pathway id (e.g. hsa05012)} \item{change_vec}{vector of change values, names should be hsa KEGG gene ids} \item{scale_vals}{should change values be scaled? (default = \code{TRUE})} \item{node_cols}{low, middle and high color values for coloring the pathway nodes (default = \code{NULL}). If \code{node_cols=NULL}, the low, middle and high color are set as 'green', 'gray' and 'red'. 
If all change values are 1e6 (in case no changes are supplied, this dummy value is assigned by \code{\link{input_processing}}), only one color ('#F38F18' if NULL) is used.} \item{legend.position}{the default position of legends ("none", "left", "right", "bottom", "top", "inside")} } \value{ a ggplot object containing the colored KEGG pathway diagram visualization } \description{ Color hsa KEGG pathway } \examples{ \dontrun{ pw_id <- 'hsa00010' change_vec <- c(-2, 4, 6) names(change_vec) <- c('hsa:2821', 'hsa:226', 'hsa:229') result <- pathfindR:::color_kegg_pathway(pw_id, change_vec) } } ================================================ FILE: man/combine_pathfindR_results.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/comparison.R \name{combine_pathfindR_results} \alias{combine_pathfindR_results} \title{Combine 2 pathfindR Results} \usage{ combine_pathfindR_results(result_A, result_B, plot_common = TRUE) } \arguments{ \item{result_A}{data frame of first pathfindR enrichment results} \item{result_B}{data frame of second pathfindR enrichment results} \item{plot_common}{boolean to indicate whether or not to plot the term-gene graph of the common terms (default=\code{TRUE})} } \value{ Data frame of combined pathfindR enrichment results. 
Columns are: \describe{ \item{ID}{ID of the enriched term} \item{Term_Description}{Description of the enriched term} \item{Fold_Enrichment_A}{Fold enrichment value for the enriched term (Calculated using ONLY the input genes)} \item{occurrence_A}{the number of iterations that the given term was found to be enriched over all iterations} \item{lowest_p_A}{the lowest adjusted-p value of the given term over all iterations} \item{highest_p_A}{the highest adjusted-p value of the given term over all iterations} \item{Up_regulated_A}{the up-regulated genes in the input involved in the given term's gene set, comma-separated} \item{Down_regulated_A}{the down-regulated genes in the input involved in the given term's gene set, comma-separated} \item{Fold_Enrichment_B}{Fold enrichment value for the enriched term (Calculated using ONLY the input genes)} \item{occurrence_B}{the number of iterations that the given term was found to be enriched over all iterations} \item{lowest_p_B}{the lowest adjusted-p value of the given term over all iterations} \item{highest_p_B}{the highest adjusted-p value of the given term over all iterations} \item{Up_regulated_B}{the up-regulated genes in the input involved in the given term's gene set, comma-separated} \item{Down_regulated_B}{the down-regulated genes in the input involved in the given term's gene set, comma-separated} \item{combined_p}{the combined p value (via Fisher's method)} \item{status}{whether the term is found in both analyses ('common'), found only in the first ('A only') or found only in the second ('B only')} } By default, the function also displays the term-gene graph of the common terms. } \description{ Combine 2 pathfindR Results } \examples{ combined_results <- combine_pathfindR_results(example_pathfindR_output, example_comparison_output) } ================================================ FILE: man/combined_results_graph.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit 
documentation in R/comparison.R \name{combined_results_graph} \alias{combined_results_graph} \title{Combined Results Graph} \usage{ combined_results_graph( combined_df, selected_terms = "common", use_description = FALSE, layout = "stress", node_size = "num_genes" ) } \arguments{ \item{combined_df}{Data frame of combined pathfindR enrichment results} \item{selected_terms}{the vector of selected terms for creating the graph (either IDs or term descriptions). If set to \code{'common'}, all of the common terms are used. (default = 'common')} \item{use_description}{Boolean argument to indicate whether term descriptions (in the 'Term_Description' column) should be used. (default = \code{FALSE})} \item{layout}{The type of layout to create (see \code{\link[ggraph]{ggraph}} for details. Default = 'stress')} \item{node_size}{Argument to indicate whether to use number of significant genes ('num_genes') or the -log10(lowest p value) ('p_val') for adjusting the node sizes (default = 'num_genes')} } \value{ a \code{\link[ggraph]{ggraph}} object containing the combined term-gene graph. Each node corresponds to an enriched term (orange if common, different shades of blue otherwise), an up-regulated gene (green), a down-regulated gene (red) or a conflicting (i.e. up in one analysis, down in the other or vice versa) gene (gray). An edge between a term and a gene indicates that the given term involves the gene. Size of a term node is proportional to either the number of genes (if \code{node_size = 'num_genes'}) or the -log10(lowest p value) (if \code{node_size = 'p_val'}). 
} \description{ Combined Results Graph } \examples{ combined_results <- combine_pathfindR_results( example_pathfindR_output, example_comparison_output, plot_common = FALSE ) g <- combined_results_graph(combined_results, selected_terms = sample(combined_results$ID, 3)) } ================================================ FILE: man/configure_output_dir.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/utility.R \name{configure_output_dir} \alias{configure_output_dir} \title{Configure Output Directory Name} \usage{ configure_output_dir(output_dir = NULL) } \arguments{ \item{output_dir}{the directory to be created where the output and intermediate files are saved (default = \code{NULL}, a temporary directory is used)} } \value{ /path/to/output/dir } \description{ Configure Output Directory Name } ================================================ FILE: man/create_HTML_report.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/utility.R \name{create_HTML_report} \alias{create_HTML_report} \title{Create HTML Report of pathfindR Results} \usage{ create_HTML_report(input, input_processed, final_res, dir_for_report) } \arguments{ \item{input}{the input data that pathfindR uses. The input must be a data frame with three columns: \enumerate{ \item Gene Symbol (Gene Symbol) \item Change value, e.g. log(fold change) (OPTIONAL) \item p value, e.g. 
adjusted p value associated with differential expression }} \item{input_processed}{processed input data frame} \item{final_res}{final pathfindR result data frame} \item{dir_for_report}{directory to render the report in} } \description{ Create HTML Report of pathfindR Results } ================================================ FILE: man/create_kappa_matrix.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/clustering.R \name{create_kappa_matrix} \alias{create_kappa_matrix} \title{Create Kappa Statistics Matrix} \usage{ create_kappa_matrix( enrichment_res, use_description = FALSE, use_active_snw_genes = FALSE ) } \arguments{ \item{enrichment_res}{data frame of pathfindR enrichment results. Must-have columns are 'Term_Description' (if \code{use_description = TRUE}) or 'ID' (if \code{use_description = FALSE}), 'Down_regulated', and 'Up_regulated'. If \code{use_active_snw_genes = TRUE}, 'non_Signif_Snw_Genes' must also be provided.} \item{use_description}{Boolean argument to indicate whether term descriptions (in the 'Term_Description' column) should be used. (default = \code{FALSE})} \item{use_active_snw_genes}{boolean to indicate whether or not to use non-input active subnetwork genes in the calculation of kappa statistics (default = FALSE, i.e. only use affected genes)} } \value{ a matrix of kappa statistics between each term in the enrichment results. 
} \description{ Create Kappa Statistics Matrix } \examples{ sub_df <- example_pathfindR_output[1:3, ] create_kappa_matrix(sub_df) } ================================================ FILE: man/enrichment.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/enrichment.R \name{enrichment} \alias{enrichment} \title{Perform Enrichment Analysis for a Single Gene Set} \usage{ enrichment( input_genes, genes_by_term = pathfindR.data::kegg_genes, term_descriptions = pathfindR.data::kegg_descriptions, adj_method = "bonferroni", enrichment_threshold = 0.05, sig_genes_vec, background_genes ) } \arguments{ \item{input_genes}{The set of gene symbols to be used for enrichment analysis. In the scope of this package, these are genes that were identified for an active subnetwork} \item{genes_by_term}{List that contains genes for each gene set. Names of this list are gene set IDs (default = kegg_genes)} \item{term_descriptions}{Vector that contains term descriptions for the gene sets. Names of this vector are gene set IDs (default = kegg_descriptions)} \item{adj_method}{correction method to be used for adjusting p-values. (default = 'bonferroni')} \item{enrichment_threshold}{adjusted-p value threshold used when filtering enrichment results (default = 0.05)} \item{sig_genes_vec}{vector of significant gene symbols. In the scope of this package, these are the input genes that were used for active subnetwork search} \item{background_genes}{vector of background genes. 
In the scope of this package, the background genes are taken as all genes in the PIN (see \code{\link{enrichment_analyses}})} } \value{ A data frame that contains enrichment results } \description{ Perform Enrichment Analysis for a Single Gene Set } \examples{ enrichment( input_genes = c('PER1', 'PER2', 'CRY1', 'CREB1'), sig_genes_vec = 'PER1', background_genes = unlist(pathfindR.data::kegg_genes) ) } \seealso{ \code{\link[stats]{p.adjust}} for adjustment of p values. See \code{\link{run_pathfindR}} for the wrapper function of the pathfindR workflow. \code{\link{hyperg_test}} for the details on hypergeometric distribution-based hypothesis testing. } ================================================ FILE: man/enrichment_analyses.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/enrichment.R \name{enrichment_analyses} \alias{enrichment_analyses} \title{Perform Enrichment Analyses on the Input Subnetworks} \usage{ enrichment_analyses( snws, sig_genes_vec, pin_name_path = "Biogrid", genes_by_term = pathfindR.data::kegg_genes, term_descriptions = pathfindR.data::kegg_descriptions, adj_method = "bonferroni", enrichment_threshold = 0.05, list_active_snw_genes = FALSE ) } \arguments{ \item{snws}{a list of subnetwork genes (i.e., vectors of genes for each subnetwork)} \item{sig_genes_vec}{vector of significant gene symbols. In the scope of this package, these are the input genes that were used for active subnetwork search} \item{pin_name_path}{Name of the chosen PIN or absolute/path/to/PIN.sif. If PIN name, must be one of c('Biogrid', 'STRING', 'GeneMania', 'IntAct', 'KEGG', 'mmu_STRING'). If path/to/PIN.sif, the file must comply with the PIN specifications. (Default = 'Biogrid')} \item{genes_by_term}{List that contains genes for each gene set. Names of this list are gene set IDs (default = kegg_genes)} \item{term_descriptions}{Vector that contains term descriptions for the gene sets. 
Names of this vector are gene set IDs (default = kegg_descriptions)} \item{adj_method}{correction method to be used for adjusting p-values. (default = 'bonferroni')} \item{enrichment_threshold}{adjusted-p value threshold used when filtering enrichment results (default = 0.05)} \item{list_active_snw_genes}{boolean value indicating whether or not to report the non-significant active subnetwork genes for the active subnetwork which was enriched for the given term with the lowest p value (default = \code{FALSE})} } \value{ a dataframe of combined enrichment results. Columns are: \describe{ \item{ID}{ID of the enriched term} \item{Term_Description}{Description of the enriched term} \item{Fold_Enrichment}{Fold enrichment value for the enriched term} \item{p_value}{p value of enrichment} \item{adj_p}{adjusted p value of enrichment} \item{support}{the support (proportion of active subnetworks leading to enrichment over all subnetworks) for the gene set} \item{non_Signif_Snw_Genes (OPTIONAL)}{the non-significant active subnetwork genes, comma-separated} } } \description{ Perform Enrichment Analyses on the Input Subnetworks } \examples{ enr_res <- enrichment_analyses( snws = example_active_snws[1:2], sig_genes_vec = example_pathfindR_input$Gene.symbol[1:25], pin_name_path = 'KEGG' ) } \seealso{ \code{\link{enrichment}} for the enrichment analysis for a single gene set } ================================================ FILE: man/enrichment_chart.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/visualization.R \name{enrichment_chart} \alias{enrichment_chart} \title{Create Bubble Chart of Enrichment Results} \usage{ enrichment_chart( result_df, top_terms = 10, plot_by_cluster = FALSE, num_bubbles = 4, even_breaks = TRUE ) } \arguments{ \item{result_df}{a data frame that must contain the following columns: \describe{ \item{Term_Description}{Description of the enriched term} 
\item{Fold_Enrichment}{Fold enrichment value for the enriched term} \item{lowest_p}{the lowest adjusted-p value of the given term over all iterations} \item{Up_regulated}{the up-regulated genes in the input involved in the given term's gene set, comma-separated} \item{Down_regulated}{the down-regulated genes in the input involved in the given term's gene set, comma-separated} \item{Cluster(OPTIONAL)}{the cluster to which the enriched term is assigned} }} \item{top_terms}{number of top terms (according to the 'lowest_p' column) to plot (default = 10). If \code{plot_by_cluster = TRUE}, selects the top \code{top_terms} terms per each cluster. Set \code{top_terms = NULL} to plot for all terms. If the total number of terms is less than \code{top_terms}, all terms are plotted.} \item{plot_by_cluster}{boolean value indicating whether or not to group the enriched terms by cluster (works if \code{result_df} contains a 'Cluster' column).} \item{num_bubbles}{number of sizes displayed in the legend \code{# genes} (Default = 4)} \item{even_breaks}{whether or not to set even breaks for the number of sizes displayed in the legend \code{# genes}. If \code{TRUE} (default), sets equal breaks and the number of displayed bubbles may be different than the number set by \code{num_bubbles}. If the exact number set by \code{num_bubbles} is required, set this argument to \code{FALSE}} } \value{ a \code{\link[ggplot2]{ggplot2}} object containing the bubble chart. The x-axis corresponds to fold enrichment values while the y-axis indicates the enriched terms. Size of the bubble indicates the number of significant genes in the given enriched term. Color indicates the -log10(lowest-p) value. The closer the color is to red, the more significant the enrichment is. Optionally, if 'Cluster' is a column of \code{result_df} and \code{plot_by_cluster == TRUE}, the enriched terms are grouped by clusters. 
} \description{ This function is used to create a ggplot2 bubble chart displaying the enrichment results. } \examples{ g <- enrichment_chart(example_pathfindR_output) } ================================================ FILE: man/fetch_gene_set.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/utility.R \name{fetch_gene_set} \alias{fetch_gene_set} \title{Fetch Gene Set Objects} \usage{ fetch_gene_set( gene_sets = "KEGG", min_gset_size = 10, max_gset_size = 300, custom_genes = NULL, custom_descriptions = NULL ) } \arguments{ \item{gene_sets}{Name of the gene sets to be used for enrichment analysis. Available gene sets are 'KEGG', 'Reactome', 'BioCarta', 'GO-All', 'GO-BP', 'GO-CC', 'GO-MF', 'cell_markers', 'mmu_KEGG' or 'Custom'. If 'Custom', the arguments \code{custom_genes} and \code{custom_descriptions} must be specified. (Default = 'KEGG')} \item{min_gset_size}{minimum number of genes a term must contain (default = 10)} \item{max_gset_size}{maximum number of genes a term may contain (default = 300)} \item{custom_genes}{a list containing the genes involved in each custom term. Each element is a vector of gene symbols located in the given custom term. Names should correspond to the IDs of the custom terms.} \item{custom_descriptions}{A vector containing the descriptions for each custom term. Names of the vector should correspond to the IDs of the custom terms.} } \value{ a list containing 2 elements \describe{ \item{genes_by_term}{list of vectors of genes contained in each term} \item{term_descriptions}{vector of descriptions per each term} } } \description{ Function for obtaining the gene sets per term and the term descriptions to be used for enrichment analysis. 
} \examples{ KEGG_gset <- fetch_gene_set() GO_MF_gset <- fetch_gene_set('GO-MF', min_gset_size = 20, max_gset_size = 100) } ================================================ FILE: man/fetch_java_version.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/zzz.R \name{fetch_java_version} \alias{fetch_java_version} \title{Obtain Java Version} \usage{ fetch_java_version() } \value{ character vector containing the output of 'java -version' } \description{ Obtain Java Version } \details{ this function was adapted from the CRAN package \code{sparklyr} } ================================================ FILE: man/filterActiveSnws.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/active_snw_search.R \name{filterActiveSnws} \alias{filterActiveSnws} \title{Parse Active Subnetwork Search Output File and Filter the Subnetworks} \usage{ filterActiveSnws( active_snw_path, sig_genes_vec, score_quan_thr = 0.8, sig_gene_thr = 0.02 ) } \arguments{ \item{active_snw_path}{path to the output of an Active Subnetwork Search} \item{sig_genes_vec}{vector of significant gene symbols. In the scope of this package, these are the input genes that were used for active subnetwork search} \item{score_quan_thr}{active subnetwork score quantile threshold. Must be between 0 and 1 or set to -1 for not filtering. (Default = 0.8)} \item{sig_gene_thr}{threshold for the minimum proportion of significant genes in the subnetwork (Default = 0.02) If the number of genes to use as threshold is calculated to be < 2 (e.g. 50 signif. 
genes x 0.01 = 0.5), the threshold number is set to 2} } \value{ A list containing \code{subnetworks}: a list of genes in every active subnetwork that has a score greater than the \code{score_quan_thr}th quantile and that contains at least \code{sig_gene_thr} of significant genes, and \code{scores}: the score of each filtered active subnetwork } \description{ Parse Active Subnetwork Search Output File and Filter the Subnetworks } \examples{ path2snw_list <- system.file( 'extdata/resultActiveSubnetworkSearch.txt', package = 'pathfindR' ) filtered <- filterActiveSnws( active_snw_path = path2snw_list, sig_genes_vec = example_pathfindR_input$Gene.symbol ) } \seealso{ See \code{\link{run_pathfindR}} for the wrapper function of the pathfindR enrichment workflow } ================================================ FILE: man/fuzzy_term_clustering.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/clustering.R \name{fuzzy_term_clustering} \alias{fuzzy_term_clustering} \title{Heuristic Fuzzy Multiple-linkage Partitioning of Enriched Terms} \usage{ fuzzy_term_clustering( kappa_mat, enrichment_res, kappa_threshold = 0.35, use_description = FALSE ) } \arguments{ \item{kappa_mat}{matrix of kappa statistics (output of \code{\link{create_kappa_matrix}})} \item{enrichment_res}{data frame of pathfindR enrichment results. Must-have columns are 'Term_Description' (if \code{use_description = TRUE}) or 'ID' (if \code{use_description = FALSE}), 'Down_regulated', and 'Up_regulated'. If \code{use_active_snw_genes = TRUE}, 'non_Signif_Snw_Genes' must also be provided.} \item{kappa_threshold}{threshold for kappa statistics, defining strong relation (default = 0.35)} \item{use_description}{Boolean argument to indicate whether term descriptions (in the 'Term_Description' column) should be used. (default = \code{FALSE})} } \value{ a boolean matrix of cluster assignments. 
Each row corresponds to an enriched term, each column corresponds to a cluster. } \description{ Heuristic Fuzzy Multiple-linkage Partitioning of Enriched Terms } \details{ The fuzzy clustering algorithm was implemented based on: Huang DW, Sherman BT, Tan Q, et al. The DAVID Gene Functional Classification Tool: a novel biological module-centric algorithm to functionally analyze large gene lists. Genome Biol. 2007;8(9):R183. } \examples{ \dontrun{ fuzzy_term_clustering(kappa_mat, enrichment_res) fuzzy_term_clustering(kappa_mat, enrichment_res, kappa_threshold = 0.45) } } ================================================ FILE: man/get_biogrid_pin.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/data_generation.R \name{get_biogrid_pin} \alias{get_biogrid_pin} \title{Retrieve the Requested Release of Organism-specific BioGRID PIN} \usage{ get_biogrid_pin(org = "Homo_sapiens", path2pin, release = "latest") } \arguments{ \item{org}{organism name. BioGRID naming requires underscores for spaces so 'Homo sapiens' becomes 'Homo_sapiens', 'Mus musculus' becomes 'Mus_musculus' etc. See \url{https://wiki.thebiogrid.org/doku.php/statistics} for a full list of available organisms (default = 'Homo_sapiens')} \item{path2pin}{the path of the file to save the PIN data. By default, the PIN data is saved in a temporary file} \item{release}{the requested BioGRID release (default = 'latest')} } \value{ the path of the file in which the PIN data was saved. 
If \code{path2pin} was not supplied by the user, the PIN data is saved in a temporary file } \description{ Retrieve the Requested Release of Organism-specific BioGRID PIN } ================================================ FILE: man/get_gene_sets_list.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/data_generation.R \name{get_gene_sets_list} \alias{get_gene_sets_list} \title{Retrieve Organism-specific Gene Sets List} \usage{ get_gene_sets_list( source = "KEGG", org_code = "hsa", species = "Homo sapiens", db_species = "HS", collection, subcollection = NULL ) } \arguments{ \item{source}{As of this version, either 'KEGG', 'Reactome' or 'MSigDB' (default = 'KEGG')} \item{org_code}{(Used for 'KEGG' only) KEGG organism code for the selected organism. For a full list of all available organisms, see \url{https://www.genome.jp/kegg/catalog/org_list.html}} \item{species}{species name for output genes, such as Homo sapiens, Mus musculus, etc. See \code{\link[msigdbr]{msigdbr_species}} for all the species available in the msigdbr package.} \item{db_species}{Species abbreviation for the human or mouse databases ("HS" or "MM").} \item{collection}{collection. e.g., H, C1. (default = NULL, i.e. list all gene sets in collection). See \code{\link[msigdbr]{msigdbr_collections}} for all available options in the msigdbr package.} \item{subcollection}{sub-collection, such as CGP, BP, etc. (default = NULL, i.e. list all gene sets in collection). See \code{\link[msigdbr]{msigdbr_collections}} for all available options in the msigdbr package.} } \value{ A list containing 2 elements: \itemize{ \item{gene_sets - A list containing the genes involved in each gene set} \item{descriptions - A named vector containing the descriptions for each gene set} } For 'KEGG' and 'MSigDB', it is possible to choose a specific organism. 
For a full list of all available KEGG organisms, see \url{https://www.genome.jp/kegg/catalog/org_list.html}. See \code{\link[msigdbr]{msigdbr_species}} for all the species available in the msigdbr package used for obtaining 'MSigDB' gene sets. For Reactome, there is only one collection of pathway gene sets. } \description{ Retrieve Organism-specific Gene Sets List } ================================================ FILE: man/get_kegg_gsets.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/data_generation.R \name{get_kegg_gsets} \alias{get_kegg_gsets} \title{Retrieve Organism-specific KEGG Pathway Gene Sets} \usage{ get_kegg_gsets(org_code = "hsa") } \arguments{ \item{org_code}{KEGG organism code for the selected organism. For a full list of all available organisms, see \url{https://www.genome.jp/kegg/catalog/org_list.html}} } \value{ list containing 2 elements: \itemize{ \item{gene_sets - A list containing KEGG IDs for the genes involved in each KEGG pathway} \item{descriptions - A named vector containing the descriptions for each KEGG pathway} } } \description{ Retrieve Organism-specific KEGG Pathway Gene Sets } ================================================ FILE: man/get_mgsigdb_gsets.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/data_generation.R \name{get_mgsigdb_gsets} \alias{get_mgsigdb_gsets} \title{Retrieve Organism-specific MSigDB Gene Sets} \usage{ get_mgsigdb_gsets( species = "Homo sapiens", db_species = "HS", collection = NULL, subcollection = NULL ) } \arguments{ \item{species}{species name for output genes, such as Homo sapiens, Mus musculus, etc. See \code{\link[msigdbr]{msigdbr_species}} for all the species available in the msigdbr package.} \item{db_species}{Species abbreviation for the human or mouse databases ("HS" or "MM").} \item{collection}{collection. e.g., H, C1. 
(default = NULL, i.e. list all gene sets in collection). See \code{\link[msigdbr]{msigdbr_collections}} for all available options in the msigdbr package.} \item{subcollection}{sub-collection, such as CGP, BP, etc. (default = NULL, i.e. list all gene sets in collection). See \code{\link[msigdbr]{msigdbr_collections}} for all available options in the msigdbr package.} } \value{ Retrieves the MSigDB gene sets and returns a list containing 2 elements: \itemize{ \item{gene_sets - A list containing the genes involved in each of the selected MSigDB gene sets} \item{descriptions - A named vector containing the descriptions for each selected MSigDB gene set} } } \description{ Retrieve Organism-specific MSigDB Gene Sets } \details{ This function uses the function \code{\link[msigdbr]{msigdbr}} from the \code{msigdbr} package to retrieve the 'Molecular Signatures Database' (MSigDB) gene sets (Subramanian et al. 2005, Liberzon et al. 2015). Available collections are: H: hallmark gene sets, C1: positional gene sets, C2: curated gene sets, C3: motif gene sets, C4: computational gene sets, C5: GO gene sets, C6: oncogenic signatures and C7: immunologic signatures } ================================================ FILE: man/get_pin_file.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/data_generation.R \name{get_pin_file} \alias{get_pin_file} \title{Retrieve Organism-specific PIN data} \usage{ get_pin_file(source = "BioGRID", org = "Homo_sapiens", path2pin, ...) } \arguments{ \item{source}{As of this version, this function is implemented to get data from 'BioGRID' only. This argument (and this wrapper function) was implemented for future utility} \item{org}{organism name. BioGRID naming requires underscores for spaces so 'Homo sapiens' becomes 'Homo_sapiens', 'Mus musculus' becomes 'Mus_musculus' etc.
See \url{https://wiki.thebiogrid.org/doku.php/statistics} for a full list of available organisms (default = 'Homo_sapiens')} \item{path2pin}{the path of the file to save the PIN data. By default, the PIN data is saved in a temporary file} \item{...}{additional arguments for \code{\link{get_biogrid_pin}}} } \value{ the path of the file in which the PIN data was saved. If \code{path2pin} was not supplied by the user, the PIN data is saved in a temporary file } \description{ Retrieve Organism-specific PIN data } \examples{ \dontrun{ pin_path <- get_pin_file() } } ================================================ FILE: man/get_reactome_gsets.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/data_generation.R \name{get_reactome_gsets} \alias{get_reactome_gsets} \title{Retrieve Reactome Pathway Gene Sets} \usage{ get_reactome_gsets() } \value{ Gets the latest Reactome pathways gene sets in gmt format. Parses the gmt file and returns a list containing 2 elements: \itemize{ \item{gene_sets - A list containing the genes involved in each Reactome pathway} \item{descriptions - A named vector containing the descriptions for each Reactome pathway} } } \description{ Retrieve Reactome Pathway Gene Sets } ================================================ FILE: man/gset_list_from_gmt.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/data_generation.R \name{gset_list_from_gmt} \alias{gset_list_from_gmt} \title{Retrieve Gene Sets from GMT-format File} \usage{ gset_list_from_gmt(path2gmt, descriptions_idx = 2) } \arguments{ \item{path2gmt}{path to the gmt file} \item{descriptions_idx}{index for descriptions (default = 2)} } \value{ list containing 2 elements: \itemize{ \item{gene_sets - A list containing the genes involved in each gene set} \item{descriptions - A named vector containing the descriptions for each gene set} } } 
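\examples{
\dontrun{
# A minimal usage sketch, not taken from the package sources.
# 'path/to/gene_sets.gmt' is a hypothetical placeholder; replace it with
# the path to your own GMT-format file.
gsets <- gset_list_from_gmt('path/to/gene_sets.gmt')

# As described in the Value section, the result is expected to hold two
# elements: 'gene_sets' (genes per gene set) and 'descriptions' (named vector).
head(gsets$gene_sets, 2)
head(gsets$descriptions)
}
}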
\description{ Retrieve Gene Sets from GMT-format File } ================================================ FILE: man/hierarchical_term_clustering.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/clustering.R \name{hierarchical_term_clustering} \alias{hierarchical_term_clustering} \title{Hierarchical Clustering of Enriched Terms} \usage{ hierarchical_term_clustering( kappa_mat, enrichment_res, num_clusters = NULL, use_description = FALSE, clu_method = "average", plot_hmap = FALSE, plot_dend = TRUE ) } \arguments{ \item{kappa_mat}{matrix of kappa statistics (output of \code{\link{create_kappa_matrix}})} \item{enrichment_res}{data frame of pathfindR enrichment results. Must-have columns are 'Term_Description' (if \code{use_description = TRUE}) or 'ID' (if \code{use_description = FALSE}), 'Down_regulated', and 'Up_regulated'. If \code{use_active_snw_genes = TRUE}, 'non_Signif_Snw_Genes' must also be provided.} \item{num_clusters}{number of clusters to be formed (default = \code{NULL}). If \code{NULL}, the optimal number of clusters is determined as the number which yields the highest average silhouette width.} \item{use_description}{Boolean argument to indicate whether term descriptions (in the 'Term_Description' column) should be used. (default = \code{FALSE})} \item{clu_method}{the agglomeration method to be used (default = 'average', see \code{\link[stats]{hclust}})} \item{plot_hmap}{boolean to indicate whether to plot the kappa statistics clustering heatmap or not (default = FALSE)} \item{plot_dend}{boolean to indicate whether to plot the clustering dendrogram partitioned into the optimal number of clusters (default = TRUE)} } \value{ a vector of clusters for each enriched term in the enrichment results. 
} \description{ Hierarchical Clustering of Enriched Terms } \details{ The function initially performs hierarchical clustering of the enriched terms in \code{enrichment_res} using the kappa statistics (defining the distance as \code{1 - kappa_statistic}). Next, the clustering dendrogram is cut into k = 2, 3, ..., n - 1 clusters (where n is the number of terms). If \code{num_clusters} is not specified, the optimal number of clusters is determined as the k value which yields the highest average silhouette width. } \examples{ \dontrun{ hierarchical_term_clustering(kappa_mat, enrichment_res) hierarchical_term_clustering(kappa_mat, enrichment_res, clu_method = 'complete') } } ================================================ FILE: man/hyperg_test.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/enrichment.R \name{hyperg_test} \alias{hyperg_test} \title{Hypergeometric Distribution-based Hypothesis Testing} \usage{ hyperg_test(term_genes, chosen_genes, background_genes) } \arguments{ \item{term_genes}{vector of genes in the selected term gene set} \item{chosen_genes}{vector containing the set of input genes} \item{background_genes}{vector of background genes (i.e. universal set of genes in the experiment)} } \value{ the p-value as determined using the hypergeometric distribution. } \description{ Hypergeometric Distribution-based Hypothesis Testing } \details{ To determine whether the \code{chosen_genes} are enriched (compared to a background pool of genes) in the \code{term_genes}, the hypergeometric distribution is assumed and the appropriate p value (the value under the right tail) is calculated and returned.
} \examples{ hyperg_test(letters[1:5], letters[2:5], letters) hyperg_test(letters[1:5], letters[2:10], letters) hyperg_test(letters[1:5], letters[2:13], letters) } ================================================ FILE: man/input_processing.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/utility.R \name{input_processing} \alias{input_processing} \title{Process Input} \usage{ input_processing( input, p_val_threshold = 0.05, pin_name_path = "Biogrid", convert2alias = TRUE ) } \arguments{ \item{input}{the input data that pathfindR uses. The input must be a data frame with three columns: \enumerate{ \item Gene Symbol (Gene Symbol) \item Change value, e.g. log(fold change) (OPTIONAL) \item p value, e.g. adjusted p value associated with differential expression }} \item{p_val_threshold}{the p value threshold to use when filtering the input data frame. Must be a numeric value between 0 and 1. (default = 0.05)} \item{pin_name_path}{Name of the chosen PIN or absolute/path/to/PIN.sif. If PIN name, must be one of c('Biogrid', 'STRING', 'GeneMania', 'IntAct', 'KEGG', 'mmu_STRING'). If path/to/PIN.sif, the file must comply with the PIN specifications. (Default = 'Biogrid')} \item{convert2alias}{boolean to indicate whether or not to convert gene symbols in the input that are not found in the PIN to an alias symbol found in the PIN (default = TRUE) IMPORTANT NOTE: the conversion uses human gene symbols/alias symbols.} } \value{ This function first filters the input so that all p values are less than or equal to the threshold. Next, gene symbols that are not found in the PIN are identified. If aliases of these gene symbols are found in the PIN, the symbols are converted to the corresponding aliases. The resulting data frame containing the original gene symbols, the updated symbols, change values and p values is then returned.
} \description{ Process Input } \examples{ processed_df <- input_processing( input = example_pathfindR_input[1:5, ], pin_name_path = 'KEGG' ) processed_df <- input_processing( input = example_pathfindR_input[1:5, ], pin_name_path = 'KEGG', convert2alias = FALSE ) } \seealso{ See \code{\link{run_pathfindR}} for the wrapper function of the pathfindR workflow } ================================================ FILE: man/input_testing.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/utility.R \name{input_testing} \alias{input_testing} \title{Input Testing} \usage{ input_testing(input, p_val_threshold = 0.05) } \arguments{ \item{input}{the input data that pathfindR uses. The input must be a data frame with three columns: \enumerate{ \item Gene Symbol (Gene Symbol) \item Change value, e.g. log(fold change) (OPTIONAL) \item p value, e.g. adjusted p value associated with differential expression }} \item{p_val_threshold}{the p value threshold to use when filtering the input data frame. Must be a numeric value between 0 and 1. (default = 0.05)} } \value{ Only checks if the input and the threshold follow the required specifications.
} \description{ Input Testing } \examples{ input_testing(example_pathfindR_input, 0.05) } \seealso{ See \code{\link{run_pathfindR}} for the wrapper function of the pathfindR workflow } ================================================ FILE: man/isColor.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/visualization.R \name{isColor} \alias{isColor} \title{Check if value is a valid color} \usage{ isColor(x) } \arguments{ \item{x}{value} } \value{ TRUE if x is a valid color, otherwise FALSE } \description{ Check if value is a valid color } ================================================ FILE: man/pathfindr.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/pathfindr.R \docType{package} \name{pathfindR} \alias{pathfindR-package} \alias{pathfindR} \title{pathfindR: A package for Enrichment Analysis Utilizing Active Subnetworks} \description{ pathfindR is a tool for active-subnetwork-oriented gene set enrichment analysis. The main aim of the package is to identify active subnetworks in a protein-protein interaction network using a user-provided list of genes and associated p values then performing enrichment analyses on the identified subnetworks, discovering enriched terms (i.e. pathways, gene ontology, TF target gene sets etc.) that possibly underlie the phenotype of interest. } \details{ For analysis on non-Homo sapiens organisms, pathfindR offers utility functions for obtaining organism-specific PIN data and organism-specific gene sets data. pathfindR also offers functionalities to cluster the enriched terms and identify representative terms in each cluster, to score the enriched terms per sample and to visualize analysis results. 
} \seealso{ See \code{\link{run_pathfindR}} for details on the pathfindR active-subnetwork-oriented enrichment analysis See \code{\link{cluster_enriched_terms}} for details on methods of enriched terms clustering to define clusters of biologically-related terms See \code{\link{score_terms}} for details on agglomerated score calculation for enriched terms to investigate how a gene set is altered in a given sample (or in cases vs. controls) See \code{\link{term_gene_heatmap}} for details on visualization of the heatmap of enriched terms by involved genes See \code{\link{term_gene_graph}} for details on visualizing terms and term-related genes as a graph to determine the degree of overlap between the enriched terms by identifying shared and/or distinct significant genes See \code{\link{UpSet_plot}} for details on creating an UpSet plot of the enriched terms. See \code{\link{get_pin_file}} for obtaining organism-specific PIN data and \code{\link{get_gene_sets_list}} for obtaining organism-specific gene sets data } \author{ \strong{Maintainer}: Ege Ulgen \email{egeulgen@gmail.com} (\href{https://orcid.org/0000-0003-2090-3621}{ORCID}) [copyright holder] Authors: \itemize{ \item Ozan Ozisik \email{ozanytu@gmail.com} (\href{https://orcid.org/0000-0001-5980-8002}{ORCID}) } } ================================================ FILE: man/plot_scores.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/scoring.R \name{plot_scores} \alias{plot_scores} \title{Plot the Heatmap of Score Matrix of Enriched Terms per Sample} \usage{ plot_scores( score_matrix, cases = NULL, label_samples = TRUE, case_title = "Case", control_title = "Control", low = "green", mid = "black", high = "red" ) } \arguments{ \item{score_matrix}{Matrix of agglomerated enriched term scores per sample. 
Columns are samples, rows are enriched terms} \item{cases}{(Optional) A vector of sample names that are cases in the case/control experiment. (default = NULL)} \item{label_samples}{Boolean value to indicate whether or not to label the samples in the heatmap plot (default = TRUE)} \item{case_title}{Naming of the 'Case' group (as in \code{cases}) (default = 'Case')} \item{control_title}{Naming of the 'Control' group (default = 'Control')} \item{low}{a string indicating the color of 'low' values in the coloring gradient (default = 'green')} \item{mid}{a string indicating the color of 'mid' values in the coloring gradient (default = 'black')} \item{high}{a string indicating the color of 'high' values in the coloring gradient (default = 'red')} } \value{ A `ggplot2` object containing the heatmap plot. x-axis indicates the samples. y-axis indicates the enriched terms. 'Score' indicates the score of the term in a given sample. If \code{cases} are provided, the plot is divided into 2 facets, named by \code{case_title} and \code{control_title}. 
} \description{ Plot the Heatmap of Score Matrix of Enriched Terms per Sample } \examples{ score_matrix <- score_terms( example_pathfindR_output, example_experiment_matrix, plot_hmap = FALSE ) hmap <- plot_scores(score_matrix) } ================================================ FILE: man/process_pin.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/data_generation.R \name{process_pin} \alias{process_pin} \title{Process Data frame of Protein-protein Interactions} \usage{ process_pin(pin_df) } \arguments{ \item{pin_df}{data frame of protein-protein interactions with 2 columns: 'Interactor_A' and 'Interactor_B'} } \value{ processed PIN data frame (removes self-interactions and duplicated interactions) } \description{ Process Data frame of Protein-protein Interactions } ================================================ FILE: man/return_pin_path.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/utility.R \name{return_pin_path} \alias{return_pin_path} \title{Return The Path to Given Protein-Protein Interaction Network (PIN)} \usage{ return_pin_path(pin_name_path = "Biogrid") } \arguments{ \item{pin_name_path}{Name of the chosen PIN or absolute/path/to/PIN.sif. If PIN name, must be one of c('Biogrid', 'STRING', 'GeneMania', 'IntAct', 'KEGG', 'mmu_STRING'). If path/to/PIN.sif, the file must comply with the PIN specifications. (Default = 'Biogrid')} } \value{ The absolute path to the chosen PIN. } \description{ This function returns the absolute path/to/PIN.sif. The default PINs are 'Biogrid', 'STRING', 'GeneMania', 'IntAct', 'KEGG' and 'mmu_STRING'. The user can also use any other PIN by specifying the 'path/to/PIN.sif'. All PINs to be used in this package must be formatted as SIF files, i.e. they must have 3 columns with no header, no row names and be tab-separated.
Columns 1 and 3 must be interactors' gene symbols, column 2 must be a column with all rows consisting of 'pp'. } \examples{ \dontrun{ pin_path <- return_pin_path('GeneMania') } } \seealso{ See \code{\link{run_pathfindR}} for the wrapper function of the pathfindR workflow } ================================================ FILE: man/run_pathfindr.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/core.R \name{run_pathfindR} \alias{run_pathfindR} \title{Wrapper Function for pathfindR - Active-Subnetwork-Oriented Enrichment Workflow} \usage{ run_pathfindR( input, gene_sets = "KEGG", min_gset_size = 10, max_gset_size = 300, custom_genes = NULL, custom_descriptions = NULL, pin_name_path = "Biogrid", p_val_threshold = 0.05, enrichment_threshold = 0.05, convert2alias = TRUE, plot_enrichment_chart = TRUE, output_dir = NULL, list_active_snw_genes = FALSE, ... ) } \arguments{ \item{input}{the input data that pathfindR uses. The input must be a data frame with three columns: \enumerate{ \item Gene Symbol (Gene Symbol) \item Change value, e.g. log(fold change) (OPTIONAL) \item p value, e.g. adjusted p value associated with differential expression }} \item{gene_sets}{Name of the gene sets to be used for enrichment analysis. Available gene sets are 'KEGG', 'Reactome', 'BioCarta', 'GO-All', 'GO-BP', 'GO-CC', 'GO-MF', 'cell_markers', 'mmu_KEGG' or 'Custom'. If 'Custom', the arguments \code{custom_genes} and \code{custom_descriptions} must be specified. (Default = 'KEGG')} \item{min_gset_size}{minimum number of genes a term must contain (default = 10)} \item{max_gset_size}{maximum number of genes a term must contain (default = 300)} \item{custom_genes}{a list containing the genes involved in each custom term. Each element is a vector of gene symbols located in the given custom term. 
Names should correspond to the IDs of the custom terms.} \item{custom_descriptions}{A vector containing the descriptions for each custom term. Names of the vector should correspond to the IDs of the custom terms.} \item{pin_name_path}{Name of the chosen PIN or absolute/path/to/PIN.sif. If PIN name, must be one of c('Biogrid', 'STRING', 'GeneMania', 'IntAct', 'KEGG', 'mmu_STRING'). If path/to/PIN.sif, the file must comply with the PIN specifications. (Default = 'Biogrid')} \item{p_val_threshold}{the p value threshold to use when filtering the input data frame. Must be a numeric value between 0 and 1. (default = 0.05)} \item{enrichment_threshold}{adjusted-p value threshold used when filtering enrichment results (default = 0.05)} \item{convert2alias}{boolean to indicate whether or not to convert gene symbols in the input that are not found in the PIN to an alias symbol found in the PIN (default = TRUE) IMPORTANT NOTE: the conversion uses human gene symbols/alias symbols.} \item{plot_enrichment_chart}{boolean value. If TRUE, a bubble chart displaying the enrichment results is plotted. (default = TRUE)} \item{output_dir}{the directory to be created where the output and intermediate files are saved (default = \code{NULL}, a temporary directory is used)} \item{list_active_snw_genes}{boolean value indicating whether or not to report the non-significant active subnetwork genes for the active subnetwork which was enriched for the given term with the lowest p value (default = \code{FALSE})} \item{...}{additional arguments for \code{\link{active_snw_enrichment_wrapper}}} } \value{ Data frame of pathfindR enrichment results.
Columns are: \describe{ \item{ID}{ID of the enriched term} \item{Term_Description}{Description of the enriched term} \item{Fold_Enrichment}{Fold enrichment value for the enriched term (Calculated using ONLY the input genes)} \item{occurrence}{the number of iterations in which the given term was found to be enriched over all iterations} \item{support}{the median support (proportion of active subnetworks leading to enrichment within an iteration) over all iterations} \item{lowest_p}{the lowest adjusted-p value of the given term over all iterations} \item{highest_p}{the highest adjusted-p value of the given term over all iterations} \item{non_Signif_Snw_Genes (OPTIONAL)}{the non-significant active subnetwork genes, comma-separated} \item{Up_regulated}{the up-regulated genes (as determined by `change value` > 0, if the `change column` was provided) in the input involved in the given term's gene set, comma-separated. If the change column is not provided, all affected genes are listed here.} \item{Down_regulated}{the down-regulated genes (as determined by `change value` < 0, if the `change column` was provided) in the input involved in the given term's gene set, comma-separated} } The function also creates an HTML report with the pathfindR enrichment results linked to the visualizations of the enriched terms in addition to the table of converted gene symbols. This report can be found in '\code{output_dir}/results.html' under the current working directory. By default, a bubble chart of the top 10 enrichment results is plotted. The x-axis corresponds to fold enrichment values while the y-axis indicates the enriched terms. Sizes of the bubbles indicate the number of significant genes in the given terms. Color indicates the -log10(lowest-p) value; the redder it is, the more significant the enriched term is. See \code{\link{enrichment_chart}}.
} \description{ \code{run_pathfindR} is the wrapper function for the pathfindR workflow } \details{ This function takes in a data frame consisting of Gene Symbol, log-fold-change and adjusted-p values. After input testing, any gene symbols that are not in the PIN are converted to alias symbols if the alias is in the PIN. Next, active subnetwork search is performed. Enrichment analysis is performed using the genes in each of the active subnetworks. Terms with adjusted-p values higher than \code{enrichment_threshold} are discarded. The lowest adjusted-p value (over all subnetworks) for each term is kept. This process of active subnetwork search and enrichment is repeated for a selected number of \code{iterations}, which is done in parallel. Over all iterations, the lowest and the highest adjusted-p values, as well as the number of occurrences, are reported for each enriched term. } \section{Warning}{ Depending on the protein interaction network, the algorithm and the number of iterations you choose, the 'active subnetwork search + enrichment' component of \code{run_pathfindR} may take a long time to finish.
} \examples{ \dontrun{ run_pathfindR(example_pathfindR_input) } } \seealso{ \code{\link{input_testing}} for input testing, \code{\link{input_processing}} for input processing, \code{\link{active_snw_search}} for active subnetwork search and subnetwork filtering, \code{\link{enrichment_analyses}} for enrichment analysis (using the active subnetworks), \code{\link{summarize_enrichment_results}} for summarizing the active-subnetwork-oriented enrichment results, \code{\link{annotate_term_genes}} for annotation of affected genes in the given gene sets, \code{\link{visualize_terms}} for visualization of enriched terms, \code{\link{enrichment_chart}} for a visual summary of the pathfindR enrichment results, \code{\link[foreach]{foreach}} for details on parallel execution of looping constructs, \code{\link{cluster_enriched_terms}} for clustering the resulting enriched terms and partitioning into clusters. } ================================================ FILE: man/safe_get_content.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/data_generation.R \name{safe_get_content} \alias{safe_get_content} \title{Safely download and parse web content} \usage{ safe_get_content(url, ..., timeout_sec = 10) } \arguments{ \item{url}{Character string. The URL of the resource to download.} \item{...}{Additional arguments passed to \code{\link[httr]{GET}}.} \item{timeout_sec}{Numeric. Timeout in seconds for the request (default = 10).} } \value{ A character string containing the parsed content of the response (UTF-8 encoded). On failure, an error is raised with a clear message. } \description{ This helper function retrieves content from a given URL using \pkg{httr}. It ensures that common issues (e.g. no internet, timeouts, HTTP errors, or parsing errors) are handled gracefully with clear, informative error messages. } \details{ This function is intended for use inside package functions. 
For examples, vignettes, or tests, wrap calls in a connectivity check (e.g. using \code{http_error(HEAD(url))}) to avoid CRAN failures when the resource is temporarily unavailable. } \examples{ \dontrun{ # Retrieve the latest BioGRID release page result <- safe_get_content("https://downloads.thebiogrid.org/BioGRID/Latest-Release/") } } ================================================ FILE: man/score_terms.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/scoring.R \name{score_terms} \alias{score_terms} \title{Calculate Agglomerated Scores of Enriched Terms for Each Subject} \usage{ score_terms( enrichment_table, exp_mat, cases = NULL, use_description = FALSE, plot_hmap = TRUE, ... ) } \arguments{ \item{enrichment_table}{a data frame that must contain the 3 columns below: \describe{ \item{Term_Description}{Description of the enriched term (necessary if \code{use_description = TRUE})} \item{ID}{ID of the enriched term (necessary if \code{use_description = FALSE})} \item{Up_regulated}{the up-regulated genes in the input involved in the given term's gene set, comma-separated} \item{Down_regulated}{the down-regulated genes in the input involved in the given term's gene set, comma-separated} }} \item{exp_mat}{the experiment (e.g., gene expression/methylation) matrix. Columns are samples and rows are genes. Column names must contain sample names and row names must contain the gene symbols.} \item{cases}{(Optional) A vector of sample names that are cases in the case/control experiment. (default = NULL)} \item{use_description}{Boolean argument to indicate whether term descriptions (in the 'Term_Description' column) should be used. (default = \code{FALSE})} \item{plot_hmap}{Boolean value to indicate whether or not to draw the heatmap plot of the scores. 
(default = TRUE)} \item{...}{Additional arguments for \code{\link{plot_scores}} for aesthetics of the heatmap plot} } \value{ Matrix of agglomerated scores of each enriched term per sample. Columns are samples, rows are enriched terms. Optionally, displays a heatmap of this matrix. } \description{ Calculate Agglomerated Scores of Enriched Terms for Each Subject } \section{Conceptual Background}{ For an experiment matrix (containing expression, methylation, etc. values), the rows of which are genes and the columns of which are samples, we denote: \itemize{ \item E as a matrix of size \ifelse{html}{\out{m x n}}{\eqn{m \times n}} \item G as the set of all genes in the experiment \ifelse{html}{\out{G = E<sub>i.</sub>, i ∈ [1, m]}}{\eqn{G = E_{i\cdot}, \ \ i \in [1, m]}} \item S as the set of all samples in the experiment \ifelse{html}{\out{S = E<sub>.j</sub>, j ∈ [1, n]}}{\eqn{S = E_{\cdot j}, \ \ j \in [1, n]}} } We next define the gene score matrix GS (the standardized experiment matrix, also of size \ifelse{html}{\out{m x n}}{\eqn{m \times n}}) as: \ifelse{html}{\out{GS<sub>gs</sub> = (E<sub>gs</sub> - ē<sub>g</sub>) / s<sub>g</sub>}}{\eqn{GS_{gs} = \frac{E_{gs} - \bar{e_g}}{s_g}}} where \ifelse{html}{\out{g ∈ G}}{\eqn{g \in G}}, \ifelse{html}{\out{s ∈ S}}{\eqn{s \in S}}, \ifelse{html}{\out{ē<sub>g</sub>}}{\eqn{\bar{e_g}}} is the mean of all values for gene g and \ifelse{html}{\out{s<sub>g</sub>}}{\eqn{s_g}} is the standard deviation of all values for gene g. We next denote T to be a set of terms (where each \ifelse{html}{\out{t ∈ T}}{\eqn{t \in T}} is a set of term-related genes, i.e., \ifelse{html}{\out{t = \{g<sub>x</sub>, ..., g<sub>y</sub>\} ⊂ G}}{\eqn{t = \{g_x, ..., g_y\} \subset G}}) and finally define the agglomerated term scores matrix TS (where rows correspond to terms and columns correspond to samples s.t.
the matrix has size \ifelse{html}{\out{|T| x n}}{\eqn{|T| \times n}}) as: \ifelse{html}{\out{TS<sub>ts</sub> = 1/|t| ∑<sub>g ∈ t</sub> GS<sub>gs</sub>}}{\eqn{TS_{ts} = \frac{1}{|t|}\sum_{g \in t} GS_{gs}}}, where \ifelse{html}{\out{t ∈ T}}{\eqn{t \in T}} and \ifelse{html}{\out{s ∈ S}}{\eqn{s \in S}}. } \examples{ score_matrix <- score_terms( example_pathfindR_output, example_experiment_matrix, plot_hmap = FALSE ) } ================================================ FILE: man/single_iter_wrapper.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/utility.R \name{single_iter_wrapper} \alias{single_iter_wrapper} \title{Active Subnetwork Search + Enrichment Analysis Wrapper for a Single Iteration} \usage{ single_iter_wrapper( i = NULL, dirs, input_processed, pin_path, score_quan_thr, sig_gene_thr, search_method, silent_option, use_all_positives, geneInitProbs, saTemp0, saTemp1, saIter, gaPop, gaIter, gaThread, gaCrossover, gaMut, grMaxDepth, grSearchDepth, grOverlap, grSubNum, gset_list, adj_method, enrichment_threshold, list_active_snw_genes ) } \arguments{ \item{i}{current iteration index (default = \code{NULL})} \item{dirs}{vector of directories for parallel runs} \item{input_processed}{processed input data frame} \item{pin_path}{path/to/PIN/file} \item{score_quan_thr}{active subnetwork score quantile threshold. Must be between 0 and 1 or set to -1 for not filtering. (Default = 0.8)} \item{sig_gene_thr}{threshold for the minimum proportion of significant genes in the subnetwork (Default = 0.02) If the number of genes to use as threshold is calculated to be < 2 (e.g. 50 signif. genes x 0.01 = 0.5), the threshold number is set to 2} \item{search_method}{algorithm to use when performing active subnetwork search.
Options are greedy search (GR), simulated annealing (SA) or genetic algorithm (GA) for the search (default = 'GR').} \item{silent_option}{boolean value indicating whether to print the messages to the console (FALSE) or not (TRUE, this will print to a temp. file) during active subnetwork search (default = TRUE). This option was added because during parallel runs, the console messages are printed out of order.} \item{use_all_positives}{if TRUE: in GA, adds an individual with all positive nodes. In SA, initializes candidate solution with all positive nodes. (default = FALSE)} \item{geneInitProbs}{For SA and GA, probability of adding a gene in initial solution (default = 0.1)} \item{saTemp0}{Initial temperature for SA (default = 1.0)} \item{saTemp1}{Final temperature for SA (default = 0.01)} \item{saIter}{Iteration number for SA (default = 10000)} \item{gaPop}{Population size for GA (default = 400)} \item{gaIter}{Iteration number for GA (default = 200)} \item{gaThread}{Number of threads to be used in GA (default = 5)} \item{gaCrossover}{Applies crossover with the given probability in GA (default = 1, i.e. always perform crossover)} \item{gaMut}{For GA, applies mutation with given mutation rate (default = 0, i.e. mutation off)} \item{grMaxDepth}{Sets max depth in greedy search, 0 for no limit (default = 1)} \item{grSearchDepth}{Search depth in greedy search (default = 1)} \item{grOverlap}{Overlap threshold for results of greedy search (default = 0.5)} \item{grSubNum}{Number of subnetworks to be presented in the results (default = 1000)} \item{gset_list}{list for gene sets} \item{adj_method}{correction method to be used for adjusting p-values.
(default = 'bonferroni')} \item{enrichment_threshold}{adjusted-p value threshold used when filtering enrichment results (default = 0.05)} \item{list_active_snw_genes}{boolean value indicating whether or not to report the non-significant active subnetwork genes for the active subnetwork which was enriched for the given term with the lowest p value (default = \code{FALSE})} } \value{ Data frame of enrichment results using active subnetwork search results } \description{ Active Subnetwork Search + Enrichment Analysis Wrapper for a Single Iteration } ================================================ FILE: man/summarize_enrichment_results.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/enrichment.R \name{summarize_enrichment_results} \alias{summarize_enrichment_results} \title{Summarize Enrichment Results} \usage{ summarize_enrichment_results(enrichment_res, list_active_snw_genes = FALSE) } \arguments{ \item{enrichment_res}{a dataframe of combined enrichment results. Columns are: \describe{ \item{ID}{ID of the enriched term} \item{Term_Description}{Description of the enriched term} \item{Fold_Enrichment}{Fold enrichment value for the enriched term} \item{p_value}{p value of enrichment} \item{adj_p}{adjusted p value of enrichment} \item{non_Signif_Snw_Genes (OPTIONAL)}{the non-significant active subnetwork genes, comma-separated} }} \item{list_active_snw_genes}{boolean value indicating whether or not to report the non-significant active subnetwork genes for the active subnetwork which was enriched for the given term with the lowest p value (default = \code{FALSE})} } \value{ a dataframe of summarized enrichment results (over multiple iterations). 
Columns are: \describe{ \item{ID}{ID of the enriched term} \item{Term_Description}{Description of the enriched term} \item{Fold_Enrichment}{Fold enrichment value for the enriched term} \item{occurrence}{the number of iterations that the given term was found to be enriched over all iterations} \item{support}{the median support (proportion of active subnetworks leading to enrichment within an iteration) over all iterations} \item{lowest_p}{the lowest adjusted-p value of the given term over all iterations} \item{highest_p}{the highest adjusted-p value of the given term over all iterations} \item{non_Signif_Snw_Genes (OPTIONAL)}{the non-significant active subnetwork genes, comma-separated} } } \description{ Summarize Enrichment Results } \examples{ \dontrun{ summarize_enrichment_results(enrichment_res) } } ================================================ FILE: man/term_gene_graph.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/visualization.R \name{term_gene_graph} \alias{term_gene_graph} \title{Create Term-Gene Graph} \usage{ term_gene_graph( result_df, num_terms = 10, layout = "stress", use_description = FALSE, node_size = "num_genes", node_colors = c("#E5D7BF", "green", "red") ) } \arguments{ \item{result_df}{A dataframe of pathfindR results that must contain the following columns: \describe{ \item{Term_Description}{Description of the enriched term (necessary if \code{use_description = TRUE})} \item{ID}{ID of the enriched term (necessary if \code{use_description = FALSE})} \item{lowest_p}{the lowest adjusted-p value of the given term over all iterations} \item{Up_regulated}{the up-regulated genes in the input involved in the given term's gene set, comma-separated} \item{Down_regulated}{the down-regulated genes in the input involved in the given term's gene set, comma-separated} }} \item{num_terms}{Number of top enriched terms to use while creating the graph. 
Set to \code{NULL} to use all enriched terms (default = 10, i.e. top 10 terms)} \item{layout}{The type of layout to create (see \code{\link[ggraph]{ggraph}} for details. Default = 'stress')} \item{use_description}{Boolean argument to indicate whether term descriptions (in the 'Term_Description' column) should be used. (default = \code{FALSE})} \item{node_size}{Argument to indicate whether to use number of significant genes ('num_genes') or the -log10(lowest p value) ('p_val') for adjusting the node sizes (default = 'num_genes')} \item{node_colors}{vector of 3 colors to be used for coloring nodes (colors for term nodes, up, and down, respectively)} } \value{ a \code{\link[ggraph]{ggraph}} object containing the term-gene graph. Each node corresponds to an enriched term (beige), an up-regulated gene (green) or a down-regulated gene (red). An edge between a term and a gene indicates that the given term involves the gene. Size of a term node is proportional to either the number of genes (if \code{node_size = 'num_genes'}) or the -log10(lowest p value) (if \code{node_size = 'p_val'}). } \description{ Create Term-Gene Graph } \details{ This function (adapted from the Gene-Concept network visualization by the R package \code{enrichplot}) can be utilized to visualize which input genes are involved in the enriched terms as a graph. The term-gene graph shows the links between genes and biological terms and allows for the investigation of multiple terms to which significant genes are related. The graph also enables determination of the overlap between the enriched terms by identifying shared and distinct significant term-related genes. 
} \examples{ p <- term_gene_graph(example_pathfindR_output) p <- term_gene_graph(example_pathfindR_output, num_terms = 5) p <- term_gene_graph(example_pathfindR_output, node_size = 'p_val') } ================================================ FILE: man/term_gene_heatmap.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/visualization.R \name{term_gene_heatmap} \alias{term_gene_heatmap} \title{Create Terms by Genes Heatmap} \usage{ term_gene_heatmap( result_df, genes_df, num_terms = 10, use_description = FALSE, low = "red", mid = "black", high = "green", legend_title = "change", sort_terms_by_p = FALSE, ... ) } \arguments{ \item{result_df}{A dataframe of pathfindR results that must contain the following columns: \describe{ \item{Term_Description}{Description of the enriched term (necessary if \code{use_description = TRUE})} \item{ID}{ID of the enriched term (necessary if \code{use_description = FALSE})} \item{lowest_p}{the lowest adjusted-p value of the given term over all iterations} \item{Up_regulated}{the up-regulated genes in the input involved in the given term's gene set, comma-separated} \item{Down_regulated}{the down-regulated genes in the input involved in the given term's gene set, comma-separated} }} \item{genes_df}{the input data that was used with \code{\link{run_pathfindR}}. It must be a data frame with 3 columns: \enumerate{ \item Gene Symbol (Gene Symbol) \item Change value, e.g. log(fold change) (optional) \item p value, e.g. adjusted p value associated with differential expression } The change values in this data frame are used to color the affected genes} \item{num_terms}{Number of top enriched terms to use while creating the plot. Set to \code{NULL} to use all enriched terms (default = 10)} \item{use_description}{Boolean argument to indicate whether term descriptions (in the 'Term_Description' column) should be used. 
(default = \code{FALSE})} \item{low}{a string indicating the color of 'low' values in the coloring gradient (default = 'red')} \item{mid}{a string indicating the color of 'mid' values in the coloring gradient (default = 'black')} \item{high}{a string indicating the color of 'high' values in the coloring gradient (default = 'green')} \item{legend_title}{legend title (default = 'change')} \item{sort_terms_by_p}{boolean to indicate whether to sort terms by 'lowest_p' (\code{TRUE}) or by number of genes (\code{FALSE}) (default = \code{FALSE})} \item{...}{additional arguments for \code{\link{input_processing}} (used if \code{genes_df} is provided)} } \value{ a ggplot2 object of a heatmap where rows are enriched terms and columns are involved input genes. If \code{genes_df} is provided, colors of the tiles indicate the change values. } \description{ Create Terms by Genes Heatmap } \examples{ term_gene_heatmap(example_pathfindR_output, num_terms = 3) } ================================================ FILE: man/visualize_KEGG_diagram.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/visualization.R \name{visualize_KEGG_diagram} \alias{visualize_KEGG_diagram} \title{Visualize Human KEGG Pathways} \usage{ visualize_KEGG_diagram( kegg_pw_ids, input_processed, scale_vals = TRUE, node_cols = NULL, legend.position = "top" ) } \arguments{ \item{kegg_pw_ids}{KEGG ids of pathways to be colored and visualized} \item{input_processed}{input data processed via \code{\link{input_processing}}} \item{scale_vals}{should change values be scaled? (default = \code{TRUE})} \item{node_cols}{low, middle and high color values for coloring the pathway nodes (default = \code{NULL}). If \code{node_cols=NULL}, the low, middle and high color are set as 'green', 'gray' and 'red'. 
If all change values are 1e6 (in case no changes are supplied, this dummy value is assigned by \code{\link{input_processing}}), only one color ('#F38F18' if NULL) is used.} \item{legend.position}{the default position of legends ("none", "left", "right", "bottom", "top", "inside")} } \value{ Creates colored visualizations of the enriched human KEGG pathways and returns them as a list of ggplot objects, named by Term ID. } \description{ Visualize Human KEGG Pathways } \examples{ \dontrun{ input_processed <- data.frame( GENE = c("PKLR", "GPI", "CREB1", "INS"), CHANGE = c(1.5, -2, 3, 5) ) gg_list <- visualize_KEGG_diagram(c("hsa00010", "hsa04911"), input_processed) } } \seealso{ See \code{\link{visualize_terms}} for the wrapper function for creating enriched term diagrams. See \code{\link{run_pathfindR}} for the wrapper function of the pathfindR enrichment workflow. } ================================================ FILE: man/visualize_active_subnetworks.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/active_snw_search.R \name{visualize_active_subnetworks} \alias{visualize_active_subnetworks} \title{Visualize Active Subnetworks} \usage{ visualize_active_subnetworks( active_snw_path, genes_df, pin_name_path = "Biogrid", num_snws, layout = "stress", score_quan_thr = 0.8, sig_gene_thr = 0.02, ... ) } \arguments{ \item{active_snw_path}{path to the output of an Active Subnetwork Search} \item{genes_df}{the input data that was used with \code{\link{run_pathfindR}}. It must be a data frame with 3 columns: \enumerate{ \item Gene Symbol (Gene Symbol) \item Change value, e.g. log(fold change) (optional) \item p value, e.g. adjusted p value associated with differential expression } The change values in this data frame are used to color the affected genes} \item{pin_name_path}{Name of the chosen PIN or absolute/path/to/PIN.sif. 
If PIN name, must be one of c('Biogrid', 'STRING', 'GeneMania', 'IntAct', 'KEGG', 'mmu_STRING'). If path/to/PIN.sif, the file must comply with the PIN specifications. (Default = 'Biogrid')} \item{num_snws}{number of top subnetworks to be visualized (leave blank if you want to visualize all subnetworks)} \item{layout}{The type of layout to create (see \code{\link[ggraph]{ggraph}} for details. Default = 'stress')} \item{score_quan_thr}{active subnetwork score quantile threshold. Must be between 0 and 1 or set to -1 for not filtering. (Default = 0.8)} \item{sig_gene_thr}{threshold for the minimum proportion of significant genes in the subnetwork (Default = 0.02) If the number of genes to use as threshold is calculated to be < 2 (e.g. 50 signif. genes x 0.01 = 0.5), the threshold number is set to 2} \item{...}{additional arguments for \code{\link{input_processing}}} } \value{ a list of ggplot objects of graph visualizations of identified active subnetworks. Green nodes are down-regulated genes, reds are up-regulated genes and yellows are non-input genes } \description{ Visualize Active Subnetworks } \examples{ path2snw_list <- system.file( 'extdata/resultActiveSubnetworkSearch.txt', package = 'pathfindR' ) # visualize top 2 active subnetworks g_list <- visualize_active_subnetworks( active_snw_path = path2snw_list, genes_df = example_pathfindR_input[1:10, ], pin_name_path = 'KEGG', num_snws = 2 ) } ================================================ FILE: man/visualize_term_interactions.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/visualization.R \name{visualize_term_interactions} \alias{visualize_term_interactions} \title{Visualize Interactions of Genes Involved in the Given Enriched Terms} \usage{ visualize_term_interactions(result_df, pin_name_path, show_legend = TRUE) } \arguments{ \item{result_df}{Data frame of enrichment results. 
Must-have columns are: 'Term_Description', 'Up_regulated' and 'Down_regulated'} \item{pin_name_path}{Name of the chosen PIN or absolute/path/to/PIN.sif. If PIN name, must be one of c('Biogrid', 'STRING', 'GeneMania', 'IntAct', 'KEGG', 'mmu_STRING'). If path/to/PIN.sif, the file must comply with the PIN specifications.} \item{show_legend}{Boolean to indicate whether to display the legend (\code{TRUE}) or not (\code{FALSE}) (default: \code{TRUE})} } \value{ list of ggplot objects (named by Term ID) visualizing the interactions of genes involved in the given enriched terms (annotated in the \code{result_df}) in the PIN used for enrichment analysis (specified by \code{pin_name_path}). } \description{ Visualize Interactions of Genes Involved in the Given Enriched Terms } \details{ The following steps are performed for the visualization of interactions of genes involved for each enriched term: \enumerate{ \item shortest paths between all affected genes are determined (via \code{\link[igraph]{igraph}}) \item the nodes of all shortest paths are merged \item the PIN is subsetted using the merged nodes (genes) \item using the PIN subset, the graph showing the interactions is generated \item the final graph is visualized using \code{\link[igraph]{igraph}}, colored by change status (if provided) } } \examples{ \dontrun{ result_df <- example_pathfindR_output[1:2, ] gg_list <- visualize_term_interactions(result_df, pin_name_path = 'IntAct') } } \seealso{ See \code{\link{visualize_terms}} for the wrapper function for creating enriched term diagrams. See \code{\link{run_pathfindR}} for the wrapper function of the pathfindR enrichment workflow. 
} ================================================ FILE: man/visualize_terms.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/visualization.R \name{visualize_terms} \alias{visualize_terms} \title{Create Diagrams for Enriched Terms} \usage{ visualize_terms( result_df, input_processed = NULL, is_KEGG_result = TRUE, pin_name_path = "Biogrid", ... ) } \arguments{ \item{result_df}{Data frame of enrichment results. Must-have columns for KEGG human pathway diagrams (\code{is_KEGG_result = TRUE}) are: 'ID' and 'Term_Description'. Must-have columns for the rest are: 'Term_Description', 'Up_regulated' and 'Down_regulated'} \item{input_processed}{input data processed via \code{\link{input_processing}}, not necessary when \code{is_KEGG_result = FALSE}} \item{is_KEGG_result}{boolean to indicate whether KEGG gene sets were used for enrichment analysis or not (default = \code{TRUE})} \item{pin_name_path}{Name of the chosen PIN or absolute/path/to/PIN.sif. If PIN name, must be one of c('Biogrid', 'STRING', 'GeneMania', 'IntAct', 'KEGG', 'mmu_STRING'). If path/to/PIN.sif, the file must comply with the PIN specifications. (Default = 'Biogrid')} \item{...}{additional arguments for \code{\link{visualize_KEGG_diagram}} (used when \code{is_KEGG_result = TRUE}) or \code{\link{visualize_term_interactions}} (used when \code{is_KEGG_result = FALSE})} } \value{ Depending on the argument \code{is_KEGG_result}, creates visualization of interactions of genes involved in the list of enriched terms in \code{result_df}. Returns a list of ggplot objects named by Term ID. } \description{ Create Diagrams for Enriched Terms } \details{ For \code{is_KEGG_result = TRUE}, KEGG pathway diagrams are created, affected nodes colored by up/down regulation status. For other gene sets, interactions of affected genes are determined (via a shortest-path algorithm) and are visualized (colored by change status) using igraph. 
} \examples{ \dontrun{ input_processed <- data.frame( GENE = c("PARP1", "NDUFA1", "STX6", "SNAP23"), CHANGE = c(1.5, -2, 3, 5) ) result_df <- example_pathfindR_output[1:2, ] gg_list <- visualize_terms(result_df, input_processed) gg_list2 <- visualize_terms(result_df, is_KEGG_result = FALSE, pin_name_path = 'IntAct') } } \seealso{ See \code{\link{visualize_KEGG_diagram}} for the visualization function of KEGG diagrams. See \code{\link{visualize_term_interactions}} for the visualization function that generates diagrams showing the interactions of input genes in the PIN. See \code{\link{run_pathfindR}} for the wrapper function of the pathfindR workflow. } ================================================ FILE: renv/.gitignore ================================================ library/ local/ cellar/ lock/ python/ sandbox/ staging/ ================================================ FILE: renv/activate.R ================================================ local({ # the requested version of renv version <- "1.1.4" attr(version, "sha") <- NULL # the project directory project <- Sys.getenv("RENV_PROJECT") if (!nzchar(project)) project <- getwd() # use start-up diagnostics if enabled diagnostics <- Sys.getenv("RENV_STARTUP_DIAGNOSTICS", unset = "FALSE") if (diagnostics) { start <- Sys.time() profile <- tempfile("renv-startup-", fileext = ".Rprof") utils::Rprof(profile) on.exit({ utils::Rprof(NULL) elapsed <- signif(difftime(Sys.time(), start, units = "auto"), digits = 2L) writeLines(sprintf("- renv took %s to run the autoloader.", format(elapsed))) writeLines(sprintf("- Profile: %s", profile)) print(utils::summaryRprof(profile)) }, add = TRUE) } # figure out whether the autoloader is enabled enabled <- local({ # first, check config option override <- getOption("renv.config.autoloader.enabled") if (!is.null(override)) return(override) # if we're being run in a context where R_LIBS is already set, # don't load -- presumably we're being run as a sub-process and # the parent process 
has already set up library paths for us rcmd <- Sys.getenv("R_CMD", unset = NA) rlibs <- Sys.getenv("R_LIBS", unset = NA) if (!is.na(rlibs) && !is.na(rcmd)) return(FALSE) # next, check environment variables # prefer using the configuration one in the future envvars <- c( "RENV_CONFIG_AUTOLOADER_ENABLED", "RENV_AUTOLOADER_ENABLED", "RENV_ACTIVATE_PROJECT" ) for (envvar in envvars) { envval <- Sys.getenv(envvar, unset = NA) if (!is.na(envval)) return(tolower(envval) %in% c("true", "t", "1")) } # enable by default TRUE }) # bail if we're not enabled if (!enabled) { # if we're not enabled, we might still need to manually load # the user profile here profile <- Sys.getenv("R_PROFILE_USER", unset = "~/.Rprofile") if (file.exists(profile)) { cfg <- Sys.getenv("RENV_CONFIG_USER_PROFILE", unset = "TRUE") if (tolower(cfg) %in% c("true", "t", "1")) sys.source(profile, envir = globalenv()) } return(FALSE) } # avoid recursion if (identical(getOption("renv.autoloader.running"), TRUE)) { warning("ignoring recursive attempt to run renv autoloader") return(invisible(TRUE)) } # signal that we're loading renv during R startup options(renv.autoloader.running = TRUE) on.exit(options(renv.autoloader.running = NULL), add = TRUE) # signal that we've consented to use renv options(renv.consent = TRUE) # load the 'utils' package eagerly -- this ensures that renv shims, which # mask 'utils' packages, will come first on the search path library(utils, lib.loc = .Library) # unload renv if it's already been loaded if ("renv" %in% loadedNamespaces()) unloadNamespace("renv") # load bootstrap tools ansify <- function(text) { if (renv_ansify_enabled()) renv_ansify_enhanced(text) else renv_ansify_default(text) } renv_ansify_enabled <- function() { override <- Sys.getenv("RENV_ANSIFY_ENABLED", unset = NA) if (!is.na(override)) return(as.logical(override)) pane <- Sys.getenv("RSTUDIO_CHILD_PROCESS_PANE", unset = NA) if (identical(pane, "build")) return(FALSE) testthat <- Sys.getenv("TESTTHAT", unset = 
"false") if (tolower(testthat) %in% "true") return(FALSE) iderun <- Sys.getenv("R_CLI_HAS_HYPERLINK_IDE_RUN", unset = "false") if (tolower(iderun) %in% "false") return(FALSE) TRUE } renv_ansify_default <- function(text) { text } renv_ansify_enhanced <- function(text) { # R help links pattern <- "`\\?(renv::(?:[^`])+)`" replacement <- "`\033]8;;x-r-help:\\1\a?\\1\033]8;;\a`" text <- gsub(pattern, replacement, text, perl = TRUE) # runnable code pattern <- "`(renv::(?:[^`])+)`" replacement <- "`\033]8;;x-r-run:\\1\a\\1\033]8;;\a`" text <- gsub(pattern, replacement, text, perl = TRUE) # return ansified text text } renv_ansify_init <- function() { envir <- renv_envir_self() if (renv_ansify_enabled()) assign("ansify", renv_ansify_enhanced, envir = envir) else assign("ansify", renv_ansify_default, envir = envir) } `%||%` <- function(x, y) { if (is.null(x)) y else x } catf <- function(fmt, ..., appendLF = TRUE) { quiet <- getOption("renv.bootstrap.quiet", default = FALSE) if (quiet) return(invisible()) msg <- sprintf(fmt, ...) cat(msg, file = stdout(), sep = if (appendLF) "\n" else "") invisible(msg) } header <- function(label, ..., prefix = "#", suffix = "-", n = min(getOption("width"), 78)) { label <- sprintf(label, ...) 
n <- max(n - nchar(label) - nchar(prefix) - 2L, 8L) if (n <= 0) return(paste(prefix, label)) tail <- paste(rep.int(suffix, n), collapse = "") paste0(prefix, " ", label, " ", tail) } heredoc <- function(text, leave = 0) { # remove leading, trailing whitespace trimmed <- gsub("^\\s*\\n|\\n\\s*$", "", text) # split into lines lines <- strsplit(trimmed, "\n", fixed = TRUE)[[1L]] # compute common indent indent <- regexpr("[^[:space:]]", lines) common <- min(setdiff(indent, -1L)) - leave text <- paste(substring(lines, common), collapse = "\n") # substitute in ANSI links for executable renv code ansify(text) } bootstrap <- function(version, library) { friendly <- renv_bootstrap_version_friendly(version) section <- header(sprintf("Bootstrapping renv %s", friendly)) catf(section) # attempt to download renv catf("- Downloading renv ... ", appendLF = FALSE) withCallingHandlers( tarball <- renv_bootstrap_download(version), error = function(err) { catf("FAILED") stop("failed to download:\n", conditionMessage(err)) } ) catf("OK") on.exit(unlink(tarball), add = TRUE) # now attempt to install catf("- Installing renv ... 
", appendLF = FALSE) withCallingHandlers( status <- renv_bootstrap_install(version, tarball, library), error = function(err) { catf("FAILED") stop("failed to install:\n", conditionMessage(err)) } ) catf("OK") # add empty line to break up bootstrapping from normal output catf("") return(invisible()) } renv_bootstrap_tests_running <- function() { getOption("renv.tests.running", default = FALSE) } renv_bootstrap_repos <- function() { # get CRAN repository cran <- getOption("renv.repos.cran", "https://cloud.r-project.org") # check for repos override repos <- Sys.getenv("RENV_CONFIG_REPOS_OVERRIDE", unset = NA) if (!is.na(repos)) { # check for RSPM; if set, use a fallback repository for renv rspm <- Sys.getenv("RSPM", unset = NA) if (identical(rspm, repos)) repos <- c(RSPM = rspm, CRAN = cran) return(repos) } # check for lockfile repositories repos <- tryCatch(renv_bootstrap_repos_lockfile(), error = identity) if (!inherits(repos, "error") && length(repos)) return(repos) # retrieve current repos repos <- getOption("repos") # ensure @CRAN@ entries are resolved repos[repos == "@CRAN@"] <- cran # add in renv.bootstrap.repos if set default <- c(FALLBACK = "https://cloud.r-project.org") extra <- getOption("renv.bootstrap.repos", default = default) repos <- c(repos, extra) # remove duplicates that might've snuck in dupes <- duplicated(repos) | duplicated(names(repos)) repos[!dupes] } renv_bootstrap_repos_lockfile <- function() { lockpath <- Sys.getenv("RENV_PATHS_LOCKFILE", unset = "renv.lock") if (!file.exists(lockpath)) return(NULL) lockfile <- tryCatch(renv_json_read(lockpath), error = identity) if (inherits(lockfile, "error")) { warning(lockfile) return(NULL) } repos <- lockfile$R$Repositories if (length(repos) == 0) return(NULL) keys <- vapply(repos, `[[`, "Name", FUN.VALUE = character(1)) vals <- vapply(repos, `[[`, "URL", FUN.VALUE = character(1)) names(vals) <- keys return(vals) } renv_bootstrap_download <- function(version) { sha <- attr(version, "sha", exact = TRUE) 
methods <- if (!is.null(sha)) { # attempting to bootstrap a development version of renv c( function() renv_bootstrap_download_tarball(sha), function() renv_bootstrap_download_github(sha) ) } else { # attempting to bootstrap a release version of renv c( function() renv_bootstrap_download_tarball(version), function() renv_bootstrap_download_cran_latest(version), function() renv_bootstrap_download_cran_archive(version) ) } for (method in methods) { path <- tryCatch(method(), error = identity) if (is.character(path) && file.exists(path)) return(path) } stop("All download methods failed") } renv_bootstrap_download_impl <- function(url, destfile) { mode <- "wb" # https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17715 fixup <- Sys.info()[["sysname"]] == "Windows" && substring(url, 1L, 5L) == "file:" if (fixup) mode <- "w+b" args <- list( url = url, destfile = destfile, mode = mode, quiet = TRUE ) if ("headers" %in% names(formals(utils::download.file))) { headers <- renv_bootstrap_download_custom_headers(url) if (length(headers) && is.character(headers)) args$headers <- headers } do.call(utils::download.file, args) } renv_bootstrap_download_custom_headers <- function(url) { headers <- getOption("renv.download.headers") if (is.null(headers)) return(character()) if (!is.function(headers)) stopf("'renv.download.headers' is not a function") headers <- headers(url) if (length(headers) == 0L) return(character()) if (is.list(headers)) headers <- unlist(headers, recursive = FALSE, use.names = TRUE) ok <- is.character(headers) && is.character(names(headers)) && all(nzchar(names(headers))) if (!ok) stop("invocation of 'renv.download.headers' did not return a named character vector") headers } renv_bootstrap_download_cran_latest <- function(version) { spec <- renv_bootstrap_download_cran_latest_find(version) type <- spec$type repos <- spec$repos baseurl <- utils::contrib.url(repos = repos, type = type) ext <- if (identical(type, "source")) ".tar.gz" else if 
(Sys.info()[["sysname"]] == "Windows") ".zip" else ".tgz" name <- sprintf("renv_%s%s", version, ext) url <- paste(baseurl, name, sep = "/") destfile <- file.path(tempdir(), name) status <- tryCatch( renv_bootstrap_download_impl(url, destfile), condition = identity ) if (inherits(status, "condition")) return(FALSE) # report success and return destfile } renv_bootstrap_download_cran_latest_find <- function(version) { # check whether binaries are supported on this system binary <- getOption("renv.bootstrap.binary", default = TRUE) && !identical(.Platform$pkgType, "source") && !identical(getOption("pkgType"), "source") && Sys.info()[["sysname"]] %in% c("Darwin", "Windows") types <- c(if (binary) "binary", "source") # iterate over types + repositories for (type in types) { for (repos in renv_bootstrap_repos()) { # build arguments for utils::available.packages() call args <- list(type = type, repos = repos) # add custom headers if available -- note that # utils::available.packages() will pass this to download.file() if ("headers" %in% names(formals(utils::download.file))) { headers <- renv_bootstrap_download_custom_headers(repos) if (length(headers) && is.character(headers)) args$headers <- headers } # retrieve package database db <- tryCatch( as.data.frame( do.call(utils::available.packages, args), stringsAsFactors = FALSE ), error = identity ) if (inherits(db, "error")) next # check for compatible entry entry <- db[db$Package %in% "renv" & db$Version %in% version, ] if (nrow(entry) == 0) next # found it; return spec to caller spec <- list(entry = entry, type = type, repos = repos) return(spec) } } # if we got here, we failed to find renv fmt <- "renv %s is not available from your declared package repositories" stop(sprintf(fmt, version)) } renv_bootstrap_download_cran_archive <- function(version) { name <- sprintf("renv_%s.tar.gz", version) repos <- renv_bootstrap_repos() urls <- file.path(repos, "src/contrib/Archive/renv", name) destfile <- file.path(tempdir(), name) 
for (url in urls) { status <- tryCatch( renv_bootstrap_download_impl(url, destfile), condition = identity ) if (identical(status, 0L)) return(destfile) } return(FALSE) } renv_bootstrap_download_tarball <- function(version) { # if the user has provided the path to a tarball via # an environment variable, then use it tarball <- Sys.getenv("RENV_BOOTSTRAP_TARBALL", unset = NA) if (is.na(tarball)) return() # allow directories if (dir.exists(tarball)) { name <- sprintf("renv_%s.tar.gz", version) tarball <- file.path(tarball, name) } # bail if it doesn't exist if (!file.exists(tarball)) { # let the user know we weren't able to honour their request fmt <- "- RENV_BOOTSTRAP_TARBALL is set (%s) but does not exist." msg <- sprintf(fmt, tarball) warning(msg) # bail return() } catf("- Using local tarball '%s'.", tarball) tarball } renv_bootstrap_github_token <- function() { for (envvar in c("GITHUB_TOKEN", "GITHUB_PAT", "GH_TOKEN")) { envval <- Sys.getenv(envvar, unset = NA) if (!is.na(envval)) return(envval) } } renv_bootstrap_download_github <- function(version) { enabled <- Sys.getenv("RENV_BOOTSTRAP_FROM_GITHUB", unset = "TRUE") if (!identical(enabled, "TRUE")) return(FALSE) # prepare download options token <- renv_bootstrap_github_token() if (is.null(token)) token <- "" if (nzchar(Sys.which("curl")) && nzchar(token)) { fmt <- "--location --fail --header \"Authorization: token %s\"" extra <- sprintf(fmt, token) saved <- options("download.file.method", "download.file.extra") options(download.file.method = "curl", download.file.extra = extra) on.exit(do.call(base::options, saved), add = TRUE) } else if (nzchar(Sys.which("wget")) && nzchar(token)) { fmt <- "--header=\"Authorization: token %s\"" extra <- sprintf(fmt, token) saved <- options("download.file.method", "download.file.extra") options(download.file.method = "wget", download.file.extra = extra) on.exit(do.call(base::options, saved), add = TRUE) } url <- file.path("https://api.github.com/repos/rstudio/renv/tarball", 
version) name <- sprintf("renv_%s.tar.gz", version) destfile <- file.path(tempdir(), name) status <- tryCatch( renv_bootstrap_download_impl(url, destfile), condition = identity ) if (!identical(status, 0L)) return(FALSE) renv_bootstrap_download_augment(destfile) return(destfile) } # Add Sha to DESCRIPTION. This is stop gap until #890, after which we # can use renv::install() to fully capture metadata. renv_bootstrap_download_augment <- function(destfile) { sha <- renv_bootstrap_git_extract_sha1_tar(destfile) if (is.null(sha)) { return() } # Untar tempdir <- tempfile("renv-github-") on.exit(unlink(tempdir, recursive = TRUE), add = TRUE) untar(destfile, exdir = tempdir) pkgdir <- dir(tempdir, full.names = TRUE)[[1]] # Modify description desc_path <- file.path(pkgdir, "DESCRIPTION") desc_lines <- readLines(desc_path) remotes_fields <- c( "RemoteType: github", "RemoteHost: api.github.com", "RemoteRepo: renv", "RemoteUsername: rstudio", "RemotePkgRef: rstudio/renv", paste("RemoteRef: ", sha), paste("RemoteSha: ", sha) ) writeLines(c(desc_lines[desc_lines != ""], remotes_fields), con = desc_path) # Re-tar local({ old <- setwd(tempdir) on.exit(setwd(old), add = TRUE) tar(destfile, compression = "gzip") }) invisible() } # Extract the commit hash from a git archive. Git archives include the SHA1 # hash as the comment field of the tarball pax extended header # (see https://www.kernel.org/pub/software/scm/git/docs/git-archive.html) # For GitHub archives this should be the first header after the default one # (512 byte) header. 
renv_bootstrap_git_extract_sha1_tar <- function(bundle) { # open the bundle for reading # We use gzcon for everything because (from ?gzcon) # > Reading from a connection which does not supply a 'gzip' magic # > header is equivalent to reading from the original connection conn <- gzcon(file(bundle, open = "rb", raw = TRUE)) on.exit(close(conn)) # The default pax header is 512 bytes long and the first pax extended header # with the comment should be 51 bytes long # `52 comment=` (11 chars) + 40 byte SHA1 hash len <- 0x200 + 0x33 res <- rawToChar(readBin(conn, "raw", n = len)[0x201:len]) if (grepl("^52 comment=", res)) { sub("52 comment=", "", res) } else { NULL } } renv_bootstrap_install <- function(version, tarball, library) { # attempt to install it into project library dir.create(library, showWarnings = FALSE, recursive = TRUE) output <- renv_bootstrap_install_impl(library, tarball) # check for successful install status <- attr(output, "status") if (is.null(status) || identical(status, 0L)) return(status) # an error occurred; report it header <- "installation of renv failed" lines <- paste(rep.int("=", nchar(header)), collapse = "") text <- paste(c(header, lines, output), collapse = "\n") stop(text) } renv_bootstrap_install_impl <- function(library, tarball) { # invoke using system2 so we can capture and report output bin <- R.home("bin") exe <- if (Sys.info()[["sysname"]] == "Windows") "R.exe" else "R" R <- file.path(bin, exe) args <- c( "--vanilla", "CMD", "INSTALL", "--no-multiarch", "-l", shQuote(path.expand(library)), shQuote(path.expand(tarball)) ) system2(R, args, stdout = TRUE, stderr = TRUE) } renv_bootstrap_platform_prefix_default <- function() { # read version component version <- Sys.getenv("RENV_PATHS_VERSION", unset = "R-%v") # expand placeholders placeholders <- list( list("%v", format(getRversion()[1, 1:2])), list("%V", format(getRversion()[1, 1:3])) ) for (placeholder in placeholders) version <- gsub(placeholder[[1L]], placeholder[[2L]], version, 
fixed = TRUE) # include SVN revision for development versions of R # (to avoid sharing platform-specific artefacts with released versions of R) devel <- identical(R.version[["status"]], "Under development (unstable)") || identical(R.version[["nickname"]], "Unsuffered Consequences") if (devel) version <- paste(version, R.version[["svn rev"]], sep = "-r") version } renv_bootstrap_platform_prefix <- function() { # construct version prefix version <- renv_bootstrap_platform_prefix_default() # build list of path components components <- c(version, R.version$platform) # include prefix if provided by user prefix <- renv_bootstrap_platform_prefix_impl() if (!is.na(prefix) && nzchar(prefix)) components <- c(prefix, components) # build prefix paste(components, collapse = "/") } renv_bootstrap_platform_prefix_impl <- function() { # if an explicit prefix has been supplied, use it prefix <- Sys.getenv("RENV_PATHS_PREFIX", unset = NA) if (!is.na(prefix)) return(prefix) # if the user has requested an automatic prefix, generate it auto <- Sys.getenv("RENV_PATHS_PREFIX_AUTO", unset = NA) if (is.na(auto) && getRversion() >= "4.4.0") auto <- "TRUE" if (auto %in% c("TRUE", "True", "true", "1")) return(renv_bootstrap_platform_prefix_auto()) # empty string on failure "" } renv_bootstrap_platform_prefix_auto <- function() { prefix <- tryCatch(renv_bootstrap_platform_os(), error = identity) if (inherits(prefix, "error") || prefix %in% "unknown") { msg <- paste( "failed to infer current operating system", "please file a bug report at https://github.com/rstudio/renv/issues", sep = "; " ) warning(msg) } prefix } renv_bootstrap_platform_os <- function() { sysinfo <- Sys.info() sysname <- sysinfo[["sysname"]] # handle Windows + macOS up front if (sysname == "Windows") return("windows") else if (sysname == "Darwin") return("macos") # check for os-release files for (file in c("/etc/os-release", "/usr/lib/os-release")) if (file.exists(file)) return(renv_bootstrap_platform_os_via_os_release(file, 
sysinfo)) # check for redhat-release files if (file.exists("/etc/redhat-release")) return(renv_bootstrap_platform_os_via_redhat_release()) "unknown" } renv_bootstrap_platform_os_via_os_release <- function(file, sysinfo) { # read /etc/os-release release <- utils::read.table( file = file, sep = "=", quote = c("\"", "'"), col.names = c("Key", "Value"), comment.char = "#", stringsAsFactors = FALSE ) vars <- as.list(release$Value) names(vars) <- release$Key # get os name os <- tolower(sysinfo[["sysname"]]) # read id id <- "unknown" for (field in c("ID", "ID_LIKE")) { if (field %in% names(vars) && nzchar(vars[[field]])) { id <- vars[[field]] break } } # read version version <- "unknown" for (field in c("UBUNTU_CODENAME", "VERSION_CODENAME", "VERSION_ID", "BUILD_ID")) { if (field %in% names(vars) && nzchar(vars[[field]])) { version <- vars[[field]] break } } # join together paste(c(os, id, version), collapse = "-") } renv_bootstrap_platform_os_via_redhat_release <- function() { # read /etc/redhat-release contents <- readLines("/etc/redhat-release", warn = FALSE) # infer id id <- if (grepl("centos", contents, ignore.case = TRUE)) "centos" else if (grepl("redhat", contents, ignore.case = TRUE)) "redhat" else "unknown" # try to find a version component (very hacky) version <- "unknown" parts <- strsplit(contents, "[[:space:]]")[[1L]] for (part in parts) { nv <- tryCatch(numeric_version(part), error = identity) if (inherits(nv, "error")) next version <- nv[1, 1] break } paste(c("linux", id, version), collapse = "-") } renv_bootstrap_library_root_name <- function(project) { # use project name as-is if requested asis <- Sys.getenv("RENV_PATHS_LIBRARY_ROOT_ASIS", unset = "FALSE") if (asis) return(basename(project)) # otherwise, disambiguate based on project's path id <- substring(renv_bootstrap_hash_text(project), 1L, 8L) paste(basename(project), id, sep = "-") } renv_bootstrap_library_root <- function(project) { prefix <- renv_bootstrap_profile_prefix() path <- 
Sys.getenv("RENV_PATHS_LIBRARY", unset = NA) if (!is.na(path)) return(paste(c(path, prefix), collapse = "/")) path <- renv_bootstrap_library_root_impl(project) if (!is.null(path)) { name <- renv_bootstrap_library_root_name(project) return(paste(c(path, prefix, name), collapse = "/")) } renv_bootstrap_paths_renv("library", project = project) } renv_bootstrap_library_root_impl <- function(project) { root <- Sys.getenv("RENV_PATHS_LIBRARY_ROOT", unset = NA) if (!is.na(root)) return(root) type <- renv_bootstrap_project_type(project) if (identical(type, "package")) { userdir <- renv_bootstrap_user_dir() return(file.path(userdir, "library")) } } renv_bootstrap_validate_version <- function(version, description = NULL) { # resolve description file # # avoid passing lib.loc to `packageDescription()` below, since R will # use the loaded version of the package by default anyhow. note that # this function should only be called after 'renv' is loaded # https://github.com/rstudio/renv/issues/1625 description <- description %||% packageDescription("renv") # check whether requested version 'version' matches loaded version of renv sha <- attr(version, "sha", exact = TRUE) valid <- if (!is.null(sha)) renv_bootstrap_validate_version_dev(sha, description) else renv_bootstrap_validate_version_release(version, description) if (valid) return(TRUE) # the loaded version of renv doesn't match the requested version; # give the user instructions on how to proceed dev <- identical(description[["RemoteType"]], "github") remote <- if (dev) paste("rstudio/renv", description[["RemoteSha"]], sep = "@") else paste("renv", description[["Version"]], sep = "@") # display both loaded version + sha if available friendly <- renv_bootstrap_version_friendly( version = description[["Version"]], sha = if (dev) description[["RemoteSha"]] ) fmt <- heredoc(" renv %1$s was loaded from project library, but this project is configured to use renv %2$s. 
- Use `renv::record(\"%3$s\")` to record renv %1$s in the lockfile. - Use `renv::restore(packages = \"renv\")` to install renv %2$s into the project library. ") catf(fmt, friendly, renv_bootstrap_version_friendly(version), remote) FALSE } renv_bootstrap_validate_version_dev <- function(version, description) { expected <- description[["RemoteSha"]] if (!is.character(expected)) return(FALSE) pattern <- sprintf("^\\Q%s\\E", version) grepl(pattern, expected, perl = TRUE) } renv_bootstrap_validate_version_release <- function(version, description) { expected <- description[["Version"]] is.character(expected) && identical(expected, version) } renv_bootstrap_hash_text <- function(text) { hashfile <- tempfile("renv-hash-") on.exit(unlink(hashfile), add = TRUE) writeLines(text, con = hashfile) tools::md5sum(hashfile) } renv_bootstrap_load <- function(project, libpath, version) { # try to load renv from the project library if (!requireNamespace("renv", lib.loc = libpath, quietly = TRUE)) return(FALSE) # warn if the version of renv loaded does not match renv_bootstrap_validate_version(version) # execute renv load hooks, if any hooks <- getHook("renv::autoload") for (hook in hooks) if (is.function(hook)) tryCatch(hook(), error = warnify) # load the project renv::load(project) TRUE } renv_bootstrap_profile_load <- function(project) { # if RENV_PROFILE is already set, just use that profile <- Sys.getenv("RENV_PROFILE", unset = NA) if (!is.na(profile) && nzchar(profile)) return(profile) # check for a profile file (nothing to do if it doesn't exist) path <- renv_bootstrap_paths_renv("profile", profile = FALSE, project = project) if (!file.exists(path)) return(NULL) # read the profile, and set it if it exists contents <- readLines(path, warn = FALSE) if (length(contents) == 0L) return(NULL) # set RENV_PROFILE profile <- contents[[1L]] if (!profile %in% c("", "default")) Sys.setenv(RENV_PROFILE = profile) profile } renv_bootstrap_profile_prefix <- function() { profile <- 
renv_bootstrap_profile_get() if (!is.null(profile)) return(file.path("profiles", profile, "renv")) } renv_bootstrap_profile_get <- function() { profile <- Sys.getenv("RENV_PROFILE", unset = "") renv_bootstrap_profile_normalize(profile) } renv_bootstrap_profile_set <- function(profile) { profile <- renv_bootstrap_profile_normalize(profile) if (is.null(profile)) Sys.unsetenv("RENV_PROFILE") else Sys.setenv(RENV_PROFILE = profile) } renv_bootstrap_profile_normalize <- function(profile) { if (is.null(profile) || profile %in% c("", "default")) return(NULL) profile } renv_bootstrap_path_absolute <- function(path) { substr(path, 1L, 1L) %in% c("~", "/", "\\") || ( substr(path, 1L, 1L) %in% c(letters, LETTERS) && substr(path, 2L, 3L) %in% c(":/", ":\\") ) } renv_bootstrap_paths_renv <- function(..., profile = TRUE, project = NULL) { renv <- Sys.getenv("RENV_PATHS_RENV", unset = "renv") root <- if (renv_bootstrap_path_absolute(renv)) NULL else project prefix <- if (profile) renv_bootstrap_profile_prefix() components <- c(root, renv, prefix, ...) 
paste(components, collapse = "/") } renv_bootstrap_project_type <- function(path) { descpath <- file.path(path, "DESCRIPTION") if (!file.exists(descpath)) return("unknown") desc <- tryCatch( read.dcf(descpath, all = TRUE), error = identity ) if (inherits(desc, "error")) return("unknown") type <- desc$Type if (!is.null(type)) return(tolower(type)) package <- desc$Package if (!is.null(package)) return("package") "unknown" } renv_bootstrap_user_dir <- function() { dir <- renv_bootstrap_user_dir_impl() path.expand(chartr("\\", "/", dir)) } renv_bootstrap_user_dir_impl <- function() { # use local override if set override <- getOption("renv.userdir.override") if (!is.null(override)) return(override) # use R_user_dir if available tools <- asNamespace("tools") if (is.function(tools$R_user_dir)) return(tools$R_user_dir("renv", "cache")) # try using our own backfill for older versions of R envvars <- c("R_USER_CACHE_DIR", "XDG_CACHE_HOME") for (envvar in envvars) { root <- Sys.getenv(envvar, unset = NA) if (!is.na(root)) return(file.path(root, "R/renv")) } # use platform-specific default fallbacks if (Sys.info()[["sysname"]] == "Windows") file.path(Sys.getenv("LOCALAPPDATA"), "R/cache/R/renv") else if (Sys.info()[["sysname"]] == "Darwin") "~/Library/Caches/org.R-project.R/R/renv" else "~/.cache/R/renv" } renv_bootstrap_version_friendly <- function(version, shafmt = NULL, sha = NULL) { sha <- sha %||% attr(version, "sha", exact = TRUE) parts <- c(version, sprintf(shafmt %||% " [sha: %s]", substring(sha, 1L, 7L))) paste(parts, collapse = "") } renv_bootstrap_exec <- function(project, libpath, version) { if (!renv_bootstrap_load(project, libpath, version)) renv_bootstrap_run(project, libpath, version) } renv_bootstrap_run <- function(project, libpath, version) { # perform bootstrap bootstrap(version, libpath) # exit early if we're just testing bootstrap if (!is.na(Sys.getenv("RENV_BOOTSTRAP_INSTALL_ONLY", unset = NA))) return(TRUE) # try again to load if 
(requireNamespace("renv", lib.loc = libpath, quietly = TRUE)) { return(renv::load(project = project)) } # failed to download or load renv; warn the user msg <- c( "Failed to find an renv installation: the project will not be loaded.", "Use `renv::activate()` to re-initialize the project." ) warning(paste(msg, collapse = "\n"), call. = FALSE) } renv_json_read <- function(file = NULL, text = NULL) { jlerr <- NULL # if jsonlite is loaded, use that instead if ("jsonlite" %in% loadedNamespaces()) { json <- tryCatch(renv_json_read_jsonlite(file, text), error = identity) if (!inherits(json, "error")) return(json) jlerr <- json } # otherwise, fall back to the default JSON reader json <- tryCatch(renv_json_read_default(file, text), error = identity) if (!inherits(json, "error")) return(json) # report an error if (!is.null(jlerr)) stop(jlerr) else stop(json) } renv_json_read_jsonlite <- function(file = NULL, text = NULL) { text <- paste(text %||% readLines(file, warn = FALSE), collapse = "\n") jsonlite::fromJSON(txt = text, simplifyVector = FALSE) } renv_json_read_patterns <- function() { list( # objects list("{", "\t\n\tobject(\t\n\t", TRUE), list("}", "\t\n\t)\t\n\t", TRUE), # arrays list("[", "\t\n\tarray(\t\n\t", TRUE), list("]", "\n\t\n)\n\t\n", TRUE), # maps list(":", "\t\n\t=\t\n\t", TRUE), # newlines list("\\u000a", "\n", FALSE) ) } renv_json_read_envir <- function() { envir <- new.env(parent = emptyenv()) envir[["+"]] <- `+` envir[["-"]] <- `-` envir[["object"]] <- function(...) { result <- list(...) 
names(result) <- as.character(names(result)) result } envir[["array"]] <- list envir[["true"]] <- TRUE envir[["false"]] <- FALSE envir[["null"]] <- NULL envir } renv_json_read_remap <- function(object, patterns) { # repair names if necessary if (!is.null(names(object))) { nms <- names(object) for (pattern in patterns) nms <- gsub(pattern[[2L]], pattern[[1L]], nms, fixed = TRUE) names(object) <- nms } # repair strings if necessary if (is.character(object)) { for (pattern in patterns) object <- gsub(pattern[[2L]], pattern[[1L]], object, fixed = TRUE) } # recurse for other objects if (is.recursive(object)) for (i in seq_along(object)) object[i] <- list(renv_json_read_remap(object[[i]], patterns)) # return remapped object object } renv_json_read_default <- function(file = NULL, text = NULL) { # read json text text <- paste(text %||% readLines(file, warn = FALSE), collapse = "\n") # convert into something the R parser will understand patterns <- renv_json_read_patterns() transformed <- text for (pattern in patterns) transformed <- gsub(pattern[[1L]], pattern[[2L]], transformed, fixed = TRUE) # parse it rfile <- tempfile("renv-json-", fileext = ".R") on.exit(unlink(rfile), add = TRUE) writeLines(transformed, con = rfile) json <- parse(rfile, keep.source = FALSE, srcfile = NULL)[[1L]] # evaluate in safe environment result <- eval(json, envir = renv_json_read_envir()) # fix up strings if necessary -- do so only with reversible patterns patterns <- Filter(function(pattern) pattern[[3L]], patterns) renv_json_read_remap(result, patterns) } # load the renv profile, if any renv_bootstrap_profile_load(project) # construct path to library root root <- renv_bootstrap_library_root(project) # construct library prefix for platform prefix <- renv_bootstrap_platform_prefix() # construct full libpath libpath <- file.path(root, prefix) # run bootstrap code renv_bootstrap_exec(project, libpath, version) invisible() }) ================================================ FILE: 
renv/settings.json
================================================
{
  "bioconductor.version": "3.21",
  "external.libraries": [],
  "ignored.packages": [],
  "package.dependency.fields": [
    "Imports",
    "Depends",
    "LinkingTo"
  ],
  "ppm.enabled": null,
  "ppm.ignored.urls": [],
  "r.version": null,
  "snapshot.type": "explicit",
  "use.cache": true,
  "vcs.ignore.cellar": true,
  "vcs.ignore.library": true,
  "vcs.ignore.local": true,
  "vcs.manage.ignores": true
}

================================================
FILE: revdep/.gitignore
================================================
checks
library
checks.noindex
library.noindex
data.sqlite
*.html

================================================
FILE: revdep/email.yml
================================================
release_date: ???
rel_release_date: ???
my_news_url: ???
release_version: ???
release_details: ???

================================================
FILE: revdep/failures.md
================================================
*Wow, no problems at all. :)*

================================================
FILE: slides/cost_charme_school/demo_script.R
================================================
##################################################
## Project: pathfindR
## Script purpose: COST CHARME Summer Training
## School, Istanbul - pathfindR hands-on demonstration
## Date: Sep 1, 2019
## Author: Ege Ulgen
##################################################

# Installation ------------------------------------------------------------

# For the active subnetwork search component to work (i.e., in order to
# run pathfindR), the user must have Java installed and
# path/to/java must be in the PATH environment variable.
# For Windows users, to configure the PATH environment variable see:
# https://github.com/egeulgen/pathfindR/wiki/Installation#configuration-of-java-on-windows

install.packages("devtools") # if you have not installed the "devtools" package
devtools::install_github("egeulgen/pathfindR")

library(pathfindR)

# Enrichment Analysis -----------------------------------------------------

## demo input file = RA_input
?RA_input
dim(RA_input)
head(RA_input)

## demo runs
?run_pathfindR
# takes a while (use `visualize_pathways = FALSE` for faster runs)
RA_demo_out1 <- run_pathfindR(RA_input,
                              iterations = 1) # set the number of iterations to 1 for the demo
dim(RA_demo_out1)
head(RA_demo_out1)

# faster non-default run
RA_demo_out2 <- run_pathfindR(RA_input,
                              iterations = 1, # set the number of iterations to 1 for the demo
                              gene_sets = "BioCarta", # change from default ("KEGG")
                              pin_name_path = "GeneMania", # change from default ("Biogrid")
                              output = "DEMO_OUTPUT") # change output directory

# Pathway Clustering ------------------------------------------------------
## demo enrichment result file = RA_output
?RA_output
dim(RA_output)
head(RA_output)

?cluster_pathways
RA_demo_clu1 <- cluster_pathways(RA_output) # hierarchical (default)
RA_demo_clu2 <- cluster_pathways(RA_output, method = "fuzzy")

head(RA_demo_clu1)
head(RA_demo_clu2)

## Plot enrichment chart grouped by clusters
enrichment_chart(RA_demo_clu1, plot_by_cluster = TRUE)

## Example Output for the pathfindR Clustering Workflow
?RA_clustered

# Term-gene graph ---------------------------------------------------------
?term_gene_graph
### `options(stringsAsFactors = TRUE)` if `stringsAsFactors` is set as FALSE in .Rprofile
term_gene_graph(RA_output) # top 10 terms (default)

## Graph using representative pathways
RA_representative <- RA_demo_clu1[RA_demo_clu1$Status == "Representative", ]
term_gene_graph(RA_representative,
                num_terms = NULL, # to plot using all terms
                use_names = TRUE) # use pathway names instead of IDs

# Pathway Scoring ---------------------------------------------------------
## Expression matrix = RA_exp_mat
?RA_exp_mat

## Vector of "Case" IDs
cases <- c("GSM389703", "GSM389704", "GSM389706", "GSM389708",
           "GSM389711", "GSM389714", "GSM389716", "GSM389717",
           "GSM389719", "GSM389721", "GSM389722", "GSM389724",
           "GSM389726", "GSM389727", "GSM389730", "GSM389731",
           "GSM389733", "GSM389735")

?calculate_pw_scores
## Calculate pathway scores and plot heatmap for top 10 enriched pathways
score_matrix <- calculate_pw_scores(RA_output[1:10, ], RA_exp_mat, cases)

## Calculate pathway scores and plot heatmap for representative pathways
score_matrix <- calculate_pw_scores(RA_representative, RA_exp_mat, cases)

# also works when `cases` is not supplied
score_matrix <- calculate_pw_scores(RA_representative, RA_exp_mat)

================================================
FILE: tests/testthat/test-active_snw_search.R
================================================
## Tests for functions related to active subnetwork search - Aug 2023

# set up input data
input_data_frame <- example_pathfindR_input[1:10, c(1, 3)]
colnames(input_data_frame) <- c("GENE", "P_VALUE")

example_snws_len <- 1000
example_snw_output <- system.file("extdata", "resultActiveSubnetworkSearch.txt", package = "pathfindR")

mock_file_path <- function(...) {
  args <- list(...)
if (args[[1]] == "active_snw_search") { return(example_snw_output) } return(file.path(...)) } test_that("`active_snw_search()` -- returns a list object", { mockery::stub(active_snw_search, "dir.exists", TRUE) mockery::stub(active_snw_search, "file.exists", TRUE) mockery::stub(active_snw_search, "normalizePath", NULL) mockery::stub(active_snw_search, "system", NULL) mockery::stub(active_snw_search, "file.path", mock_file_path) mockery::stub(active_snw_search, "file.rename", NULL) # Expect > 0 active snws expect_message(snw_list <- active_snw_search(input_for_search = input_data_frame), "Found [1-9]\\d* active subnetworks") expect_is(snw_list, "list") expect_is(snw_list[[1]], "character") expect_true(length(snw_list) > 0) # Expect no active snws mockery::stub(active_snw_search, "filterActiveSnws", NULL) expect_message(snw_list <- active_snw_search(input_for_search = input_data_frame), "Found 0 active subnetworks") expect_identical(snw_list, list()) }) test_that("`active_snw_search()` -- `dir_for_parallel_run` arg is used when provided", { mockery::stub(active_snw_search, "dir.exists", TRUE) mockery::stub(active_snw_search, "file.exists", TRUE) mockery::stub(active_snw_search, "normalizePath", NULL) mockery::stub(active_snw_search, "system", NULL) mockery::stub(active_snw_search, "file.path", mock_file_path) mockery::stub(active_snw_search, "file.rename", NULL) m <- mockery::mock(NULL, cycle = TRUE) mockery::stub(active_snw_search, "setwd", m) res <- active_snw_search(input_for_search = input_data_frame, dir_for_parallel_run = tempdir()) mockery::expect_called(m, 2) }) test_that("`active_snw_search()` -- argument checks work", { # input_for_search expect_error(snw_list <- active_snw_search(input_for_search = list()), "`input_for_search` should be data frame") invalid_input <- input_data_frame colnames(invalid_input) <- c("A", "B") expect_error(snw_list <- active_snw_search(input_for_search = invalid_input), paste0("`input_for_search` should contain the columns ", 
paste(dQuote(c("GENE", "P_VALUE")), collapse = ","))) # snws_file expect_error(snw_list <- active_snw_search(input_for_search = input_data_frame, snws_file = "[/]"), "`snws_file` may be containing forbidden characters. Please change and try again") # search_method valid_mets <- c("GR", "SA", "GA") expect_error(active_snw_search(input_for_search = input_data_frame, search_method = "INVALID"), paste0("`search_method` should be one of ", paste(dQuote(valid_mets), collapse = ", "))) # silent_option expect_error(active_snw_search(input_for_search = input_data_frame, silent_option = "WRONG"), "`silent_option` should be either TRUE or FALSE") expect_error(active_snw_search(input_for_search = input_data_frame, use_all_positives = "INVALID"), "`use_all_positives` should be either TRUE or FALSE") }) test_that("`active_snw_search()` -- all search methods work", { skip_on_cran() ## GR expect_message(snw_list <- active_snw_search(input_for_search = input_data_frame, pin_name_path = "Biogrid", search_method = "GR", dir_for_parallel_run = tempdir(check = TRUE)), "Found [1-9]\\d* active subnetworks") expect_is(snw_list, "list") expect_is(snw_list[[1]], "character") skip("will test SA and GA if we can create a suitable (faster and non-empty) test case") ## SA expect_message(snw_list <- active_snw_search(input_for_search = input_data_frame, pin_name_path = "Biogrid", search_method = "SA", dir_for_parallel_run = tempdir(check = TRUE)), "Found [1-9]\\d* active subnetworks") expect_is(snw_list, "list") expect_is(snw_list[[1]], "character") ## GA expect_message(snw_list <- active_snw_search(input_for_search = input_data_frame, pin_name_path = "Biogrid", search_method = "GA", dir_for_parallel_run = tempdir(check = TRUE)), "Found [1-9]\\d* active subnetworks") expect_is(snw_list, "list") expect_is(snw_list[[1]], "character") }) test_that("`active_snw_search()` -- results are reproducible", { skip_on_cran() snw_lists <- list() seed_vals <- c(123, 123, 456) for (idx in 1:3) { seed <- 
seed_vals[idx] snw_lists[[idx]] <- active_snw_search(input_for_search = input_data_frame, seedForRandom = seed, dir_for_parallel_run = tempdir(check = TRUE)) } expect_identical(snw_lists[[1]], snw_lists[[2]]) expect_false(identical(snw_lists[[1]], snw_lists[[3]])) }) test_that("`filterActiveSnws()` -- returns expected list object", { snws_filtered <- filterActiveSnws(active_snw_path = example_snw_output, sig_genes_vec = input_data_frame$GENE) expect_is(snws_filtered, "list") expect_length(snws_filtered, 2) expect_is(snws_filtered$subnetworks, "list") expect_is(snws_filtered$scores, "numeric") expect_is(snws_filtered$subnetworks[[1]], "character") expect_true(length(snws_filtered$subnetworks) <= example_snws_len) # empty file case empty_path <- tempfile("empty", fileext = ".txt") file.create(empty_path) expect_null(suppressWarnings(filterActiveSnws(active_snw_path = empty_path, sig_genes_vec = input_data_frame$GENE))) }) test_that("`filterActiveSnws()` -- `score_quan_thr` works", { snws_filtered <- filterActiveSnws(active_snw_path = example_snw_output, sig_genes_vec = example_pathfindR_input$Gene.symbol, score_quan_thr = -1, sig_gene_thr = 0) expect_length(snws_filtered$subnetworks, example_snws_len) for (q_thr in seq(0.1, 1, by = 0.1)) { snws_filtered <- filterActiveSnws(active_snw_path = example_snw_output, sig_genes_vec = example_pathfindR_input$Gene.symbol, score_quan_thr = q_thr, sig_gene_thr = 0) exp_len <- example_snws_len * (1 - q_thr) expect_length(snws_filtered$subnetworks, as.integer(exp_len + 0.5)) } }) test_that("`filterActiveSnws()` -- `sig_gene_thr` works", { snws_filtered1 <- filterActiveSnws(active_snw_path = example_snw_output, sig_genes_vec = example_pathfindR_input$Gene.symbol, sig_gene_thr = 0.02, score_quan_thr = -1) snws_filtered2 <- filterActiveSnws(active_snw_path = example_snw_output, sig_genes_vec = example_pathfindR_input$Gene.symbol, sig_gene_thr = 0.1, score_quan_thr = -1) expect_true(length(snws_filtered2$subnetworks) < 
example_snws_len) expect_true(length(snws_filtered1$subnetworks) > length(snws_filtered2$subnetworks)) }) test_that("`filterActiveSnws()` -- argument checks work", { expect_error(filterActiveSnws(active_snw_path = "this/is/not/a/valid/path"), "The active subnetwork file does not exist! Check the `active_snw_path` argument") expect_error(filterActiveSnws(active_snw_path = example_snw_output, sig_genes_vec = list()), "`sig_genes_vec` should be a vector") expect_error(filterActiveSnws(active_snw_path = example_snw_output, sig_genes_vec = example_pathfindR_input$Gene.symbol, score_quan_thr = "INVALID"), "`score_quan_thr` should be numeric") expect_error(filterActiveSnws(active_snw_path = example_snw_output, sig_genes_vec = example_pathfindR_input$Gene.symbol, score_quan_thr = -2), "`score_quan_thr` should be in \\[0, 1\\] or -1 \\(if not filtering\\)") expect_error(filterActiveSnws(active_snw_path = example_snw_output, sig_genes_vec = example_pathfindR_input$Gene.symbol, score_quan_thr = 2), "`score_quan_thr` should be in \\[0, 1\\] or -1 \\(if not filtering\\)") expect_error(filterActiveSnws(active_snw_path = example_snw_output, sig_genes_vec = example_pathfindR_input$Gene.symbol, sig_gene_thr = "INVALID"), "`sig_gene_thr` should be numeric") expect_error(filterActiveSnws(active_snw_path = example_snw_output, sig_genes_vec = example_pathfindR_input$Gene.symbol, sig_gene_thr = -1), "`sig_gene_thr` should be in \\[0, 1\\]") }) test_that("`visualize_active_subnetworks()` -- returns list of ggraph objects", { # empty file case empty_path <- tempfile("empty", fileext = ".txt") file.create(empty_path) expect_null(visualize_active_subnetworks(active_snw_path = empty_path, genes_df = input_data_frame)) skip_on_cran() # default g_list <- visualize_active_subnetworks(example_snw_output, input_data_frame) expect_is(g_list, "list") expect_is(g_list[[1]], "ggraph") expect_true(length(g_list) <= example_snws_len) # set `num_snws` to larger than actual number g_list <- 
visualize_active_subnetworks(example_snw_output, input_data_frame, num_snws = 21) expect_is(g_list, "list") expect_is(g_list[[1]], "ggraph") expect_length(g_list, 13) }) ================================================ FILE: tests/testthat/test-clustering.R ================================================ ## Tests for enriched term clustering functions - Aug 2023 enrichment_res <- example_pathfindR_output[1:5, ] input_kappa_mat <- create_kappa_matrix(enrichment_res) test_that("`create_kappa_matrix()` -- creates kappa matrix", { input_df <- enrichment_res kappa_mat <- create_kappa_matrix(input_df) expect_true(isSymmetric.matrix(kappa_mat)) expect_true(all(kappa_mat >= 0 & kappa_mat <= 1 | kappa_mat >= -1 & kappa_mat <= 0)) expect_identical(colnames(kappa_mat), rownames(kappa_mat)) expect_identical(colnames(kappa_mat), input_df$ID) # zero length excluded input_df2 <- input_df input_df2$Down_regulated[1] <- input_df2$Up_regulated[1] <- "" kappa_mat2 <- create_kappa_matrix(input_df2) expect_true(isSymmetric.matrix(kappa_mat2)) expect_false(input_df2$ID[1] %in% colnames(kappa_mat2)) input_df$non_Signif_Snw_Genes <- c("GeneA, GeneB", "GeneA", "GeneC", "GeneB, GeneC", "") kappa_mat3 <- create_kappa_matrix(input_df, use_active_snw_genes = TRUE) expect_true(isSymmetric.matrix(kappa_mat3)) expect_true(!all(kappa_mat3 != kappa_mat)) }) test_that("`create_kappa_matrix()` -- argument checks works", { expect_error(create_kappa_matrix(example_pathfindR_output, use_description = "INVALID"), "`use_description` should be TRUE or FALSE") expect_error(create_kappa_matrix(example_pathfindR_output, use_active_snw_genes = "INVALID"), "`use_active_snw_genes` should be TRUE or FALSE") expect_error(create_kappa_matrix(list()), "`enrichment_res` should be a data frame of enrichment results") expect_error(create_kappa_matrix(example_pathfindR_output[1, ]), "`enrichment_res` should contain at least 2 rows") cr_cols <- function(use_description = FALSE, use_active_snw_genes = FALSE) { nec_cols 
<- c("Down_regulated", "Up_regulated") if (use_description) { nec_cols <- c("Term_Description", nec_cols) } else { nec_cols <- c("ID", nec_cols) } if (use_active_snw_genes) { nec_cols <- c(nec_cols, "non_Signif_Snw_Genes") } return(nec_cols) } # desc F nec_cols <- cr_cols() valid_res <- enrichment_res[, -2] expect_silent(create_kappa_matrix(valid_res)) invalid_res <- enrichment_res[, -1] expect_error(create_kappa_matrix(invalid_res), paste0("`enrichment_res` should contain all of ", paste(dQuote(nec_cols), collapse = ", "))) # desc T nec_cols <- cr_cols(use_description = TRUE) valid_res <- enrichment_res[, -1] expect_silent(create_kappa_matrix(valid_res, use_description = TRUE)) invalid_res <- enrichment_res[, -2] expect_error(create_kappa_matrix(invalid_res, use_description = TRUE), paste0("`enrichment_res` should contain all of ", paste(dQuote(nec_cols), collapse = ", "))) # snw_g T nec_cols <- cr_cols(use_active_snw_genes = TRUE) valid_res <- enrichment_res valid_res$non_Signif_Snw_Genes <- "" expect_silent(create_kappa_matrix(valid_res, use_active_snw_genes = TRUE)) expect_error(create_kappa_matrix(enrichment_res, use_active_snw_genes = TRUE), paste0("`enrichment_res` should contain all of ", paste(dQuote(nec_cols), collapse = ", "))) }) test_that("`hierarchical_term_clustering()` -- returns integer vector", { m <- mockery::mock(NULL, cycle = TRUE) mockery::stub(hierarchical_term_clustering, "graphics::plot", m) mockery::stub(hierarchical_term_clustering, "stats::heatmap", m) mockery::stub(hierarchical_term_clustering, "stats::rect.hclust", m) expected_message_regex <- "The maximum average silhouette width was -?(0\\.?\\d{0,2}|1) for k = \\d+ \n\n" expect_message(clu_res <- hierarchical_term_clustering(input_kappa_mat, enrichment_res, plot_hmap = TRUE, plot_dend = TRUE), expected_message_regex) expect_is(clu_res, "integer") expect_true(max(clu_res) <= nrow(input_kappa_mat)) expect_identical(rownames(input_kappa_mat), names(clu_res)) }) 
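
# Sketch (not part of the original test file): the message checked in the test
# above reports the k with the maximum average silhouette width. The helper
# below illustrates that selection rule on a kappa matrix. `sketch_best_k()` is
# a hypothetical name; treating `1 - kappa` as the distance, using average
# linkage, and computing widths with `cluster::silhouette()` are all
# assumptions made for illustration. The package itself obtains the widths
# via `fpc::cluster.stats()`, and its internals may differ.
sketch_best_k <- function(kappa_mat, kseq = 2:(nrow(kappa_mat) - 1)) {
  # k = 1 has no silhouette, so at least 3 terms are needed for the default kseq
  stopifnot(nrow(kappa_mat) >= 3)
  dist_obj <- stats::as.dist(1 - kappa_mat)
  tree <- stats::hclust(dist_obj, method = "average")
  avg_sil <- vapply(kseq, function(k) {
    memberships <- stats::cutree(tree, k = k)
    sil <- cluster::silhouette(memberships, dist_obj)
    mean(sil[, "sil_width"])
  }, numeric(1))
  # return the candidate k with the largest average silhouette width
  kseq[which.max(avg_sil)]
}

# Usage (toy data, not from the package): two tight pairs of terms
# toy_kappa <- matrix(c(1.0, 0.9, 0.1, 0.1,
#                       0.9, 1.0, 0.1, 0.1,
#                       0.1, 0.1, 1.0, 0.8,
#                       0.1, 0.1, 0.8, 1.0),
#                     nrow = 4, dimnames = list(letters[1:4], letters[1:4]))
# sketch_best_k(toy_kappa)
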
test_that("`hierarchical_term_clustering()` -- `num_clusters` works", { for (selected_num_clusters in seq_len(nrow(enrichment_res))) { expect_is(res <- hierarchical_term_clustering(input_kappa_mat, enrichment_res, num_clusters = selected_num_clusters, plot_hmap = FALSE, plot_dend = FALSE), "integer") expect_equal(max(res), selected_num_clusters) } }) test_that("`hierarchical_term_clustering()` -- `kseq` (sequence of number of clusters to try) is determined appropriately", { mockery::stub(hierarchical_term_clustering, "stats::hclust", NULL) mockery::stub(hierarchical_term_clustering, "isSymmetric.matrix", TRUE) mock_cutree <- function(tree, k, h = NULL) { return(k) } mockery::stub(hierarchical_term_clustering, "stats::cutree", mock_cutree) for (num_terms in c(3, 15, 153, 200, 204, 432)) { kmax <- max(num_terms%/%2, 2) num_expected_calls <- ifelse(kmax <= 20, kmax - 1, ifelse(kmax <= 100, 18 + kmax%/%10 - 1, 26 + kmax%/%50 - 1)) target_k <- ifelse(kmax <= 20, kmax, ifelse(kmax <= 100, round(kmax%/%10) * 10, round(kmax%/%50) * 50)) tmp_enr_res <- example_pathfindR_output[seq_len(num_terms), ] tmp_kappa_mat <- matrix(NA, nrow = num_terms, ncol = num_terms, dimnames = list(tmp_enr_res$ID, tmp_enr_res$ID)) silwidth_out_vec <- vector("list", num_expected_calls) for (idx in seq_len(num_expected_calls)) { if (idx == length(silwidth_out_vec)) { silwidth_out_vec[[idx]] <- list(avg.silwidth = 100) } else { silwidth_out_vec[[idx]] <- list(avg.silwidth = -100) } } mock_cluster.stats <- do.call(mockery::mock, silwidth_out_vec) mockery::stub(hierarchical_term_clustering, "fpc::cluster.stats", mock_cluster.stats) expected_message <- paste0("The maximum average silhouette width was 100 for k = ", target_k, " \n\n") expect_message(res_k <- hierarchical_term_clustering(tmp_kappa_mat, tmp_enr_res, plot_hmap = FALSE, plot_dend = FALSE), expected_message) expect_equal(res_k, target_k) mockery::expect_called(mock_cluster.stats, num_expected_calls) } }) 
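# Illustrative sketch (not part of the package): the call counts
# (`num_expected_calls`) and `target_k` values asserted in the `kseq` test above
# are consistent with a candidate sequence built as below.
# ASSUMPTION: `sketch_kseq()` is a hypothetical helper inferred only from the
# test arithmetic; the construction inside `hierarchical_term_clustering()` may
# differ.
sketch_kseq <- function(num_terms) {
    kmax <- max(num_terms %/% 2, 2)
    # every k from 2 up to min(kmax, 20)
    kseq <- 2:min(kmax, 20)
    # then multiples of 10, from 30 up to min(kmax, 100)
    if (kmax >= 30) {
        kseq <- c(kseq, seq(30, min(kmax, 100), by = 10))
    }
    # then multiples of 50, from 150 up to kmax
    if (kmax >= 150) {
        kseq <- c(kseq, seq(150, kmax, by = 50))
    }
    kseq
}
# For 153 terms (kmax = 76) this gives 2:20 followed by 30, 40, 50, 60, 70:
# 24 candidates with 70 as the last one, matching the values the test computes.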
test_that("`hierarchical_term_clustering()` -- argument checks work", { expect_error(hierarchical_term_clustering(kappa_mat = list(), enrichment_res = data.frame()), "`kappa_mat` should be a symmetric matrix") expect_error(hierarchical_term_clustering(kappa_mat = matrix(nrow = 1, ncol = 2), enrichment_res = data.frame()), "`kappa_mat` should be a symmetric matrix") mat <- matrix(nrow = 3, ncol = 3, dimnames = list(1:3, 1:3)) expect_error(hierarchical_term_clustering(kappa_mat = mat, enrichment_res = data.frame(ID = 4:5)), "All terms in `kappa_mat` should be present in `enrichment_res`") expect_error(hierarchical_term_clustering(kappa_mat = mat, enrichment_res = data.frame(ID = 1:3), plot_hmap = "INVALID"), "`plot_hmap` should be TRUE or FALSE") expect_error(hierarchical_term_clustering(kappa_mat = mat, enrichment_res = data.frame(ID = 1:3), plot_dend = "INVALID"), "`plot_dend` should be TRUE or FALSE") }) test_that("`fuzzy_term_clustering()` -- returns matrix of cluster memberships", { expect_is(res_mat <- fuzzy_term_clustering(create_kappa_matrix(example_pathfindR_output[1:25, ]), example_pathfindR_output[1:25, ], kappa_threshold = 0.1), "matrix") expect_true(is.logical(res_mat)) }) test_that("`fuzzy_term_clustering()` -- argument checks work", { expect_error(fuzzy_term_clustering(kappa_mat = list(), enrichment_res = data.frame()), "`kappa_mat` should be a symmetric matrix") expect_error(fuzzy_term_clustering(kappa_mat = matrix(nrow = 1, ncol = 2), enrichment_res = data.frame()), "`kappa_mat` should be a symmetric matrix") mat <- matrix(nrow = 3, ncol = 3, dimnames = list(1:3, 1:3)) expect_error(fuzzy_term_clustering(kappa_mat = mat, enrichment_res = data.frame(ID = 4:5)), "All terms in `kappa_mat` should be present in `enrichment_res`") expect_error(fuzzy_term_clustering(kappa_mat = mat, enrichment_res = data.frame(ID = 1:3), kappa_threshold = "INVALID"), "`kappa_threshold` should be numeric") expect_error(fuzzy_term_clustering(kappa_mat = mat, enrichment_res = 
data.frame(ID = 1:3), kappa_threshold = 1.5), "`kappa_threshold` should be at most 1 as kappa statistic is always <= 1") }) test_that("`cluster_graph_vis()` -- graph visualization of clusters works OK", { mockery::stub(hierarchical_term_clustering, "graphics::plot", NULL) mockery::stub(hierarchical_term_clustering, "stats::rect.hclust", NULL) mock_plot.igraph <- mockery::mock(NULL, cycle = TRUE) mockery::stub(cluster_graph_vis, "igraph::plot.igraph", mock_plot.igraph) ## use_description = FALSE for (clustering_func in c(hierarchical_term_clustering, fuzzy_term_clustering)) { clu_obj <- clustering_func(input_kappa_mat, enrichment_res) expect_silent(cluster_graph_vis(clu_obj, input_kappa_mat, enrichment_res)) } mockery::expect_called(mock_plot.igraph, 2) }) test_that("`cluster_graph_vis()` -- coloring of 'extra' clusters work", { mockery::stub(cluster_graph_vis, "igraph::plot.igraph", NULL) ### more than 41 clusters (number of colors available) selected_num_terms <- 45 clu_input_df <- example_pathfindR_output[seq_len(selected_num_terms), ] mock_kappa_mat <- matrix(0, nrow = selected_num_terms, ncol = selected_num_terms, dimnames = list(clu_input_df$ID, clu_input_df$ID)) # dummy hierarchical result hierarchical_clu_obj <- seq_len(selected_num_terms) names(hierarchical_clu_obj) <- clu_input_df$ID expect_silent(cluster_graph_vis(hierarchical_clu_obj, mock_kappa_mat, clu_input_df)) # dummy fuzzy result fuzzy_clu_obj <- matrix(FALSE, nrow = selected_num_terms, ncol = selected_num_terms, dimnames = list(clu_input_df$ID, seq_len(selected_num_terms))) diag(fuzzy_clu_obj) <- TRUE expect_silent(cluster_graph_vis(fuzzy_clu_obj, mock_kappa_mat, clu_input_df)) }) test_that("`cluster_graph_vis()` -- check errors are raised appropriately", { expect_error(cluster_graph_vis(list(), matrix(), data.frame(ID = 1)), "Invalid class for `clu_obj`!") # hierarchical - missing terms in kappa matrix clu_obj <- hierarchical_term_clustering(input_kappa_mat, enrichment_res, plot_dend = FALSE) 
expect_error(cluster_graph_vis(c(clu_obj, EXTRA = 1L), input_kappa_mat, enrichment_res), "Not all terms in `clu_obj` present in `kappa_mat`!") # fuzzy - missing terms in kappa matrix clu_obj <- fuzzy_term_clustering(input_kappa_mat, enrichment_res) expect_error(cluster_graph_vis(rbind(clu_obj, EXTRA = rep(FALSE, ncol(clu_obj))), input_kappa_mat, enrichment_res), "Not all terms in `clu_obj` present in `kappa_mat`!") }) test_that("`cluster_enriched_terms()` -- returns the input data frame with the additional columns `Cluster` and `Status`", { set.seed(123) num_clusters <- 3 available_clus <- seq_len(num_clusters) toy_hierarchical_clu_obj <- sample(available_clus, size = nrow(enrichment_res) - 1, replace = TRUE) missing_clu <- setdiff(available_clus, toy_hierarchical_clu_obj) toy_hierarchical_clu_obj <- c(toy_hierarchical_clu_obj, missing_clu) toy_hierarchical_clu_obj <- sample(toy_hierarchical_clu_obj) names(toy_hierarchical_clu_obj) <- enrichment_res$ID toy_fuzzy_clu_obj <- matrix(FALSE, nrow = nrow(enrichment_res), ncol = num_clusters, dimnames = list(enrichment_res$ID, available_clus)) for (row_idx in seq_len(nrow(toy_fuzzy_clu_obj))) { num_memberships <- sample(available_clus, 1) new_cols <- sample(available_clus, size = num_memberships) toy_fuzzy_clu_obj[row_idx, new_cols] <- TRUE } mock_doCall <- function(...) { arguments <- list(...) if (arguments[1] == "hierarchical_term_clustering") { return(toy_hierarchical_clu_obj) } if (arguments[1] == "fuzzy_term_clustering") { return(toy_fuzzy_clu_obj) } return(NULL) } mockery::stub(cluster_enriched_terms, "create_kappa_matrix", input_kappa_mat) mockery::stub(cluster_enriched_terms, "R.utils::doCall", mock_doCall) # hierarchical expect_is(h_clu_res <- cluster_enriched_terms(enrichment_res), "data.frame") expect_true(all(c("Cluster", "Status") %in% colnames(h_clu_res))) expect_equal(max(h_clu_res$Cluster), num_clusters) # expect to have same number of rep. 
terms as the number of clusters expect_equal(max(h_clu_res$Cluster), sum(h_clu_res$Status == "Representative")) ## fuzzy expect_is(fuzzy_clu_res <- cluster_enriched_terms(enrichment_res, method = "fuzzy"), "data.frame") expect_true(all(c("Cluster", "Status") %in% colnames(fuzzy_clu_res))) expect_true(max(fuzzy_clu_res$Cluster) <= sum(fuzzy_clu_res$Status == "Representative")) }) test_that("`cluster_enriched_terms()` argument checks work", { expect_error(cluster_enriched_terms(enrichment_res, method = "INVALID"), "the clustering `method` must either be \"hierarchical\" or \"fuzzy\"") expect_error(cluster_enriched_terms(enrichment_res, plot_clusters_graph = "INVALID"), "`plot_clusters_graph` must be logical!") }) ================================================ FILE: tests/testthat/test-comparison.R ================================================ ## Tests for functions related to comparison of pathfindR results - Aug 2023 input_df_A <- example_pathfindR_output[1:20, ] input_df_B <- example_comparison_output[1:20, ] test_that("`combine_pathfindR_results()` -- works as expected", { mock_graph <- mockery::mock(NULL) mock_plot <- mockery::mock(NULL) mockery::stub(combine_pathfindR_results, "combined_results_graph", mock_graph) mockery::stub(combine_pathfindR_results, "graphics::plot", mock_plot) expect_is(combined <- combine_pathfindR_results(input_df_A, input_df_B), "data.frame") expect_true(nrow(combined) <= nrow(input_df_A) + nrow(input_df_B)) mockery::expect_called(mock_plot, 1) mockery::expect_called(mock_graph, 1) }) combined_df <- combine_pathfindR_results(input_df_A, input_df_B, plot_common = FALSE) combined_df2 <- combined_df[combined_df$status != "common", ] test_that("`combined_results_graph()` -- produces a ggplot object using the correct data", { # Common Terms, default expect_is(p <- combined_results_graph(combined_df), "ggplot") expect_true(all(p$data$type %in% c("gene", "common term"))) expect_equal(sum(p$data$type == "common term"), 
sum(combined_df$status == "common")) # Selected 5 Terms sel_terms <- combined_df$ID[1:5] expect_is(p <- combined_results_graph(combined_df, selected_terms = sel_terms), "ggplot") expect_true(all(sel_terms %in% p$data$name)) # use_description = TRUE expect_is(p <- combined_results_graph(combined_df, use_description = TRUE), "ggplot") # node_size = 'p_val' expect_is(p <- combined_results_graph(combined_df, node_size = "p_val"), "ggplot") # errors when there are no common terms expect_error(combined_results_graph(combined_df2), "There are no common terms") }) test_that("`combined_results_graph()` -- argument checks work", { expect_error(combined_results_graph(combined_df, use_description = "INVALID"), "`use_description` must either be TRUE or FALSE!") val_node_size <- c("num_genes", "p_val") expect_error(combined_results_graph(combined_df, node_size = "INVALID"), paste0("`node_size` should be one of ", paste(dQuote(val_node_size), collapse = ", "))) expect_error(combined_results_graph(combined_df = "INVALID"), "`combined_df` should be a data frame") wrong_df <- combined_df[, -c(1, 2)] ID_column <- "ID" necessary_cols <- c(ID_column, "combined_p", "Up_regulated_A", "Down_regulated_A", "Up_regulated_B", "Down_regulated_B") expect_error(combined_results_graph(wrong_df, use_description = FALSE), paste(c("All of", paste(necessary_cols, collapse = ", "), "must be present in `results_df`!"), collapse = " ")) ID_column <- "Term_Description" necessary_cols <- c(ID_column, "combined_p", "Up_regulated_A", "Down_regulated_A", "Up_regulated_B", "Down_regulated_B") expect_error(combined_results_graph(wrong_df, use_description = TRUE), paste(c("All of", paste(necessary_cols, collapse = ", "), "must be present in `results_df`!"), collapse = " ")) expect_error(combined_results_graph(combined_df, selected_terms = "INVALID"), "None of the `selected_terms` are in the combined results!") }) ================================================ FILE: tests/testthat/test-core.R 
================================================ ## Tests for core function - Aug 2023 # set up input data input_data_frame <- example_pathfindR_input[1:10, c(1, 3)] colnames(input_data_frame) <- c("GENE", "P_VALUE") test_that("`run_pathfindR()` -- works as expected", { mock_fetch_gene_set <- mockery::mock(list(), cycle = TRUE) mock_return_pin_path <- mockery::mock("/path/to/some/PIN/SIF", cycle = TRUE) mock_input_processing <- mockery::mock(input_data_frame, cycle = TRUE) mock_active_snw_enrichment_wrapper <- mockery::mock(data.frame(), c()) mock_summarize_enrichment_results <- mockery::mock(data.frame()) mock_annotate_term_genes <- mockery::mock(example_pathfindR_output) mock_plot <- mockery::mock(NULL) mockery::stub(run_pathfindR, "fetch_gene_set", mock_fetch_gene_set) mockery::stub(run_pathfindR, "return_pin_path", mock_return_pin_path) mockery::stub(run_pathfindR, "input_processing", mock_input_processing) mockery::stub(run_pathfindR, "active_snw_enrichment_wrapper", mock_active_snw_enrichment_wrapper) mockery::stub(run_pathfindR, "summarize_enrichment_results", mock_summarize_enrichment_results) mockery::stub(run_pathfindR, "annotate_term_genes", mock_annotate_term_genes) mockery::stub(run_pathfindR, "graphics::plot", mock_plot) mockery::stub(run_pathfindR, "create_HTML_report", NULL) expected_messages <- paste(c("The input looks OK", "Plotting the enrichment bubble chart", paste(c(paste0("Found ", nrow(example_pathfindR_output), " enriched terms\n"), "You may run:", "- cluster_enriched_terms() for clustering enriched terms", "- visualize_terms() for visualizing enriched term diagrams\n"), collapse = "\n")), collapse = "|") # wrapper functions correctly - with output_dir provided out_dir <- file.path(tempdir(check = TRUE), "core_test") expect_message(res <- run_pathfindR(input_data_frame, output_dir = out_dir), expected_messages) expect_is(res, "data.frame") expect_identical(res, example_pathfindR_output) expect_true(dir.exists(out_dir)) 
mockery::expect_called(mock_fetch_gene_set, 1) mockery::expect_called(mock_return_pin_path, 1) mockery::expect_called(mock_input_processing, 1) mockery::expect_called(mock_active_snw_enrichment_wrapper, 1) mockery::expect_called(mock_summarize_enrichment_results, 1) mockery::expect_called(mock_annotate_term_genes, 1) mockery::expect_called(mock_plot, 1) # warning raised as expected when no results found expect_warning(res <- run_pathfindR(input_data_frame), "Did not find any enriched terms!") expect_identical(res, data.frame()) }) test_that("`run_pathfindR()` argument checks work", { expect_error(run_pathfindR(input_data_frame, plot_enrichment_chart = "INVALID"), "`plot_enrichment_chart` should be either TRUE or FALSE") expect_error(run_pathfindR(input_data_frame, list_active_snw_genes = "INVALID"), "`list_active_snw_genes` should be either TRUE or FALSE") }) ================================================ FILE: tests/testthat/test-data_generation.R ================================================ ## Tests for functions related to data generation - September 2025 library(httr) library(ggkegg) test_that("safe_get_content handles GET error via mocking", { fake_GET <- function(...) stop("Simulated connection failure") with_mocked_bindings( { expect_error( safe_get_content("http://example.com"), regexp = "Failed to retrieve resource" ) }, GET = fake_GET ) }) test_that("safe_get_content handles HTTP error via mocking", { fake_GET <- function(...) { structure( list(status_code = 500L), class = "response" ) } with_mocked_bindings( { expect_error( safe_get_content("http://example.com"), regexp = "unavailable" ) }, GET = fake_GET ) }) test_that("safe_get_content handles content parsing failure via mocking", { fake_GET <- function(...) { structure( list(status_code = 200L), class = "response" ) } fake_content <- function(...) 
stop("Simulated parsing failure") with_mocked_bindings( { expect_error( safe_get_content("http://example.com"), regexp = "Failed to parse content" ) }, GET = fake_GET, content = fake_content ) }) set.seed(123) gene_pool <- paste0("Gene", 1:100) toy_biogrid_pin <- data.frame(A = sample(gene_pool, 25), B = sample(gene_pool, 25)) colnames(toy_biogrid_pin) <- c("Official Symbol Interactor A", "Official Symbol Interactor B") test_that("`process_pin()` -- removes self-interactions and duplicated interactions", { input_pin_df <- toy_biogrid_pin colnames(input_pin_df) <- c("Interactor_A", "Interactor_B") input_pin_df <- rbind(input_pin_df, data.frame(Interactor_A = input_pin_df$Interactor_B[1:5], Interactor_B = input_pin_df$Interactor_A[1:5])) processed_df <- process_pin(input_pin_df) expect_true(nrow(processed_df) < nrow(input_pin_df)) }) test_that("`get_biogrid_pin()` -- returns a path to a valid PIN file", { mockery::stub(get_biogrid_pin, "utils::download.file", NULL) mockery::stub(get_biogrid_pin, "utils::unzip", list(Name = "BIOGRID-ORGANISM-Homo_sapiens-4.4.211.tab3.txt")) mockery::stub(get_biogrid_pin, "utils::read.delim", toy_biogrid_pin) expected_biogrid_pin_df <- toy_biogrid_pin colnames(expected_biogrid_pin_df) <- c("Interactor_A", "Interactor_B") expected_biogrid_pin_df <- process_pin(expected_biogrid_pin_df) expected_biogrid_pin_df <- data.frame(V1 = expected_biogrid_pin_df$Interactor_A, V2 = "pp", V3 = expected_biogrid_pin_df$Interactor_B) pin_path <- get_biogrid_pin(release = "4.4.211") pin_df <- read.delim(pin_path, header = FALSE) expect_true(ncol(pin_df) == 3) expect_true(all(pin_df[, 2] == "pp")) expect_identical(pin_df, expected_biogrid_pin_df) }) test_that("`get_biogrid_pin()` -- determines and downloads the latest version", { mockery::stub(get_biogrid_pin, "safe_get_content", "

BioGRID Release 3.5.183") mockery::stub(get_biogrid_pin, "utils::download.file", NULL) mockery::stub(get_biogrid_pin, "utils::unzip", list(Name = "BIOGRID-ORGANISM-Homo_sapiens-X.X.X.tab3.txt")) mockery::stub(get_biogrid_pin, "utils::read.delim", toy_biogrid_pin) expected_biogrid_pin_df <- toy_biogrid_pin colnames(expected_biogrid_pin_df) <- c("Interactor_A", "Interactor_B") expected_biogrid_pin_df <- process_pin(expected_biogrid_pin_df) expected_biogrid_pin_df <- data.frame(V1 = expected_biogrid_pin_df$Interactor_A, V2 = "pp", V3 = expected_biogrid_pin_df$Interactor_B) pin_path <- get_biogrid_pin() pin_df <- read.delim(pin_path, header = FALSE) expect_true(ncol(pin_df) == 3) expect_true(all(pin_df[, 2] == "pp")) expect_identical(pin_df, expected_biogrid_pin_df) }) test_that("`get_biogrid_pin()` -- error check works", { # invalid organism error expect_error(get_biogrid_pin(org = "Hsapiens"), paste("Hsapiens is not a valid Biogrid organism.", "Available organisms are listed on: https://wiki.thebiogrid.org/doku.php/statistics")) }) test_that("`get_pin_file()` -- works as expected", { with_mocked_bindings({ pin_path <- get_pin_file() expect_identical(pin_path, "/path/to/some/PIN/file") }, get_biogrid_pin = function(...) 
"/path/to/some/PIN/file", .package = "pathfindR") expect_error(get_pin_file(source = "STRING"), "As of this version, this function is implemented to get data from BioGRID only") }) test_that("`gset_list_from_gmt()` -- works as expected", { gmt_list <- list(GSA = sample(gene_pool, 80), GSB = sample(gene_pool, 100), GSC = sample(gene_pool, 33)) description_vec <- c(GSA = "gene set A", GSB = "gene set B", GSC = "gene set C") gmt_df <- c() for (gset in names(gmt_list)) { tmp <- c(gset, description_vec[gset]) tmp <- c(tmp, gmt_list[[gset]], rep("", 100 - length(gmt_list[[gset]]))) gmt_df <- rbind(gmt_df, tmp) } path2gmt <- tempfile() write.table(gmt_df, path2gmt, sep = "\t", col.names = FALSE, row.names = FALSE, quote = FALSE) expect_is(res <- gset_list_from_gmt(path2gmt), "list") expect_identical(res$gene_sets, gmt_list) expect_identical(res$descriptions, description_vec) }) test_that("`get_kegg_gsets()` -- works as expected", { skip_on_cran() mock_response <- "pathway1\tdescription\npathway2\tdescription2" mock_pw_graph1 <- igraph::graph_from_data_frame( data.frame(from = c("A", "B"), to = c("B", "C")), vertices = data.frame( name = c("A", "B", "C"), type = c("gene", "not_gene", "gene") ) ) mock_pw_graph2 <- igraph::graph_from_data_frame( data.frame(from = c("D", "F"), to = c("E", "G")), vertices = data.frame( name = c("D", "E", "F", "G"), type = c("gene", "gene", "not_gene", "gene") ) ) mock_pathway <- function(pid, ...) { if (pid == "pathway1") { return(mock_pw_graph1) } else if (pid == "pathway2") { return(mock_pw_graph2) } else { stop("Unknown pid") } } with_mocked_bindings( { expect_is(toy_eco_kegg <- pathfindR:::get_kegg_gsets(), "list") }, safe_get_content = function(...) 
mock_response, pathway = mock_pathway ) expect_length(toy_eco_kegg, 2) expect_true(all(names(toy_eco_kegg) == c("gene_sets", "descriptions"))) expect_true(all(names(toy_eco_kegg[["gene_sets"]]) %in% names(toy_eco_kegg[["descriptions"]]))) expect_length(toy_eco_kegg[["gene_sets"]], 2) expect_length(toy_eco_kegg[["descriptions"]], 2) expect_true(toy_eco_kegg[["descriptions"]]["pathway1"] == "description") expect_true(toy_eco_kegg[["descriptions"]]["pathway2"] == "description2") expect_identical(toy_eco_kegg[["gene_sets"]][["pathway1"]], c("A", "C")) expect_identical(toy_eco_kegg[["gene_sets"]][["pathway2"]], c("D", "E", "G")) }) test_that("`get_reactome_gsets()` -- works as expected", { skip_on_cran() pw1 <- "Pathway1" pw2 <- "Pathway2" desc1 <- "Description1" desc2 <- "Description2" genes1 <- c("GeneA", "GeneB") genes2 <- c("GeneC", "GeneD", "GeneE") gmt_content <- paste( c( paste(c(desc1, pw1, genes1), collapse = "\t"), paste(c(desc2, pw2, genes2), collapse = "\t") ), collapse = "\n" ) mockery::stub(get_reactome_gsets, "utils::download.file", NULL) unz_mock <- function(zipfile, filename, ...) 
{
        textConnection(gmt_content)
    }
    mockery::stub(get_reactome_gsets, "unz", unz_mock)

    expected_gsets <- list(genes1, genes2)
    names(expected_gsets) <- c(pw1, pw2)
    expect_descriptions <- c(desc1, desc2)
    names(expect_descriptions) <- c(pw1, pw2)

    expect_is(reactome <- get_reactome_gsets(), "list")
    expect_length(reactome, 2)
    expect_length(reactome$gene_sets, 2)
    expect_length(reactome$descriptions, 2)
    expect_equal(names(reactome$gene_sets), names(reactome$descriptions))
    expect_equal(reactome$gene_sets, expected_gsets)
    expect_equal(reactome$descriptions, expect_descriptions)
})

test_that("`get_mgsigdb_gsets()` -- works as expected", {
    toy_msigdb_df <- c()
    for (gs_idx in 1:5) {
        toy_msigdb_df <- rbind(toy_msigdb_df, data.frame(gene_symbol = sample(gene_pool, sample(25:75, 1)),
            gs_id = paste0("GS", gs_idx), gs_name = paste("Gene Set", gs_idx)))
    }
    mockery::stub(get_mgsigdb_gsets, "msigdbr::msigdbr", toy_msigdb_df)

    expect_is(res_msig_db <- get_mgsigdb_gsets(collection = "C1"), "list")
    expect_length(res_msig_db, 2)
    expect_true(all(names(res_msig_db) == c("gene_sets", "descriptions")))
    # compare gene set names against description names (the closing parenthesis of
    # `names()` must come before `%in%`, otherwise the check is vacuously TRUE)
    expect_true(all(names(res_msig_db[["gene_sets"]]) %in% names(res_msig_db[["descriptions"]])))
})

test_that("`get_gene_sets_list()` works", {
    expect_error(gsets <- get_gene_sets_list("Wiki"), "As of this version, this function is implemented to get data from KEGG, Reactome and MSigDB only")

    mockery::stub(get_gene_sets_list, "get_kegg_gsets", NULL)
    mockery::stub(get_gene_sets_list, "get_reactome_gsets", NULL)
    mockery::stub(get_gene_sets_list, "get_mgsigdb_gsets", NULL)

    expect_silent(kegg <- get_gene_sets_list(org_code = "vcn"))
    expect_message(rctm <- get_gene_sets_list("Reactome"))
    expect_silent(msig <- get_gene_sets_list("MSigDB", species = "Mus musculus", db_species = "MS",
        collection = "C3", subcollection = "MIR:MIR_Legacy"))
})

================================================
FILE: tests/testthat/test-enrichment.R
================================================
## Tests for functions related to
enrichment analyses - Aug 2023 set.seed(123) test_that("`hyperg_test()` -- returns an appropriate p value", { expect_is(tmp_p <- hyperg_test(term_genes = LETTERS[1:10], chosen_genes = LETTERS[2:5], background_genes = LETTERS), "numeric") expect_true(tmp_p >= 0 & tmp_p <= 1) expect_is(tmp_p2 <- hyperg_test(term_genes = LETTERS[1:4], chosen_genes = LETTERS[3:10], background_genes = LETTERS), "numeric") expect_true(tmp_p2 >= 0 & tmp_p2 <= 1) expect_true(tmp_p2 > tmp_p) }) test_that("`hyperg_test()` -- argument checks work", { expect_error(hyperg_test(term_genes = list()), "`term_genes` should be a vector") expect_error(hyperg_test(term_genes = LETTERS, chosen_genes = list()), "`chosen_genes` should be a vector") expect_error(hyperg_test(term_genes = LETTERS, chosen_genes = LETTERS[1:2], background_genes = list()), "`background_genes` should be a vector") expect_error(hyperg_test(term_genes = c(LETTERS, LETTERS), chosen_genes = LETTERS[1:3], background_genes = LETTERS), "`term_genes` cannot be larger than `background_genes`!") expect_error(hyperg_test(term_genes = LETTERS[1:10], chosen_genes = c(LETTERS, LETTERS), background_genes = LETTERS), "`chosen_genes` cannot be larger than `background_genes`!") }) test_that("`enrichment()` -- returns a data frame", { expected_num_significant <- 10 gsets <- example_pathfindR_output$ID[1:50] p_val_vec <- c(runif(expected_num_significant, min = 1e-05, max = 0.001), runif(length(gsets) - expected_num_significant, min = 0.05, max = 1)) names(p_val_vec) <- gsets mock_vapply <- mockery::mock(p_val_vec, 5, 2, cycle = TRUE) mockery::stub(enrichment, "vapply", mock_vapply) mockery::stub(enrichment, "base::setdiff", c("RPS6KA2", "HSPA2", "SCN4B", "PPP2R1B", "PTCH1", "CASP10", "TIRAP", "BEX3", "KIF5C", "TNFSF13B")) # default expect_is(enr_res <- enrichment(input_genes = example_pathfindR_input$Gene.symbol, sig_genes_vec = c("DummyGene"), background_genes = c("DummyGene")), "data.frame") expect_equal(nrow(enr_res), expected_num_significant) 
expect_true(any(enr_res$non_Signif_Snw_Genes != "")) expect_true(all(enr_res$Fold_Enrichment == 2.5)) # higher threshold - no filter expect_is(enr_res2 <- enrichment(input_genes = example_pathfindR_input$Gene.symbol, sig_genes_vec = c("DummyGene"), background_genes = c("DummyGene"), enrichment_threshold = 1), "data.frame") expect_equal(nrow(enr_res2), 50) expect_true(any(enr_res2$non_Signif_Snw_Genes != "")) # no enrichment case mockery::stub(enrichment, "stats::p.adjust", rep(1, 50)) expect_null(enr_res3 <- enrichment(input_genes = example_pathfindR_input$Gene.symbol, sig_genes_vec = c("DummyGene"), background_genes = c("DummyGene"))) }) test_that("`enrichment()` -- argument checks work", { tmp_input_genes <- example_pathfindR_input$Gene.symbol[1:6] tmp_sig_vec <- example_pathfindR_input$Gene.symbol[1:3] ## input genes expect_error(enrichment(input_genes = list(), sig_genes_vec = "PER1", background_genes = unlist(kegg_genes)), "`input_genes` should be a vector of gene symbols") ## gene sets data expect_error(enrichment(input_genes = tmp_input_genes, genes_by_term = "INVALID", sig_genes_vec = tmp_sig_vec, background_genes = unlist(kegg_genes)), "`genes_by_term` should be a list of term gene sets") expect_error(enrichment(input_genes = tmp_input_genes, genes_by_term = list(1:3), sig_genes_vec = tmp_sig_vec, background_genes = unlist(kegg_genes)), "`genes_by_term` should be a named list \\(names are gene set IDs\\)") expect_error(enrichment(input_genes = tmp_input_genes, term_descriptions = list(), sig_genes_vec = tmp_sig_vec, background_genes = unlist(kegg_genes)), "`term_descriptions` should be a vector of term gene descriptions") expect_error(enrichment(input_genes = tmp_input_genes, term_descriptions = 1:3, sig_genes_vec = tmp_sig_vec, background_genes = unlist(kegg_genes)), "`term_descriptions` should be a named vector \\(names are gene set IDs\\)") expect_error(enrichment(input_genes = tmp_input_genes, genes_by_term = list(A = 1:3), term_descriptions = c(A = 
        "a", B = "b"), sig_genes_vec = tmp_sig_vec, background_genes = unlist(kegg_genes)),
        "The lengths of `genes_by_term` and `term_descriptions` should be the same")

    expect_error(enrichment(input_genes = tmp_input_genes, genes_by_term = list(A = 1:3, X = 1:3),
        term_descriptions = c(A = "a", B = "b"), sig_genes_vec = tmp_sig_vec,
        background_genes = unlist(kegg_genes)),
        "The names of `genes_by_term` and `term_descriptions` should all be the same")

    ## enrichment threshold
    expect_error(enrichment(input_genes = tmp_input_genes, sig_genes_vec = tmp_sig_vec,
        background_genes = unlist(kegg_genes), enrichment_threshold = "INVALID"),
        "`enrichment_threshold` should be a numeric value between 0 and 1")

    expect_error(enrichment(input_genes = tmp_input_genes, sig_genes_vec = tmp_sig_vec,
        background_genes = unlist(kegg_genes), enrichment_threshold = -1),
        "`enrichment_threshold` should be between 0 and 1")

    ## signif. genes and background (universal set) genes
    expect_error(enrichment(input_genes = tmp_input_genes, sig_genes_vec = list(),
        background_genes = unlist(kegg_genes)), "`sig_genes_vec` should be a vector")

    expect_error(enrichment(input_genes = tmp_input_genes, sig_genes_vec = tmp_sig_vec,
        background_genes = list()), "`background_genes` should be a vector")
})

tmp_gset_genes <- kegg_genes[example_pathfindR_output$ID[order(example_pathfindR_output$support,
    decreasing = TRUE)[1:10]]]
tmp_gset_desc <- kegg_descriptions[names(tmp_gset_genes)]

all_iter_enr_res <- list(NULL, NULL, NULL)
subnw_start_idx <- 1:3
for (idx in seq_along(subnw_start_idx)) {
    j <- subnw_start_idx[idx]
    # `:` binds tighter than `+`, so `j:j + 2` selects only element `j + 2`;
    # parenthesize to take the three subnetworks j, j + 1 and j + 2
    res <- enrichment_analyses(snws = example_active_snws[j:(j + 2)], sig_genes_vec = example_pathfindR_input$Gene.symbol,
        genes_by_term = tmp_gset_genes, term_descriptions = tmp_gset_desc, list_active_snw_genes = TRUE)
    if (!is.null(res)) {
        all_iter_enr_res[[idx]] <- res
    }
}
combined_res <- do.call(rbind, all_iter_enr_res)

test_that("`enrichment_analyses()` -- returns a data frame", {
    toy_pin <- data.frame(V1 =
paste("Gene", sample(1:50, 10)), V2 = "pp", V3 = paste("Gene", sample(1:50, 10))) mockery::stub(enrichment_analyses, "return_pin_path", NULL) mockery::stub(enrichment_analyses, "utils::read.delim", toy_pin) mock_lapply <- mockery::mock(c(), all_iter_enr_res, cycle = TRUE) mockery::stub(enrichment_analyses, "lapply", mock_lapply) # default expect_is(enr_res1 <- enrichment_analyses(snws = example_active_snws[1:3], sig_genes_vec = example_pathfindR_input$Gene.symbol, list_active_snw_genes = FALSE), "data.frame") total <- sum(vapply(all_iter_enr_res, function(x) ifelse(is.null(x), 0, nrow(x)), 1)) expect_true(nrow(enr_res1) <= total) # list active snw genes expect_is(enr_res2 <- enrichment_analyses(snws = example_active_snws[1:3], sig_genes_vec = example_pathfindR_input$Gene.symbol, list_active_snw_genes = TRUE), "data.frame") expect_true(ncol(enr_res2) == ncol(enr_res1) + 1) }) test_that("`enrichment_analyses()` -- argument check works", { expect_error(enrichment_analyses(snws = example_active_snws, list_active_snw_genes = "INVALID"), "`list_active_snw_genes` should be either TRUE or FALSE") }) test_that("`summarize_enrichment_results()` -- returns summarized enrichment results", { # default expect_is(summ_res <- summarize_enrichment_results(enrichment_res = combined_res[, -6]), "data.frame") expect_equal(ncol(summ_res), 7) expect_false("non_Signif_Snw_Genes" %in% colnames(summ_res)) expect_true(nrow(summ_res) <= nrow(combined_res)) # list active snw genes expect_is(summ_res2 <- summarize_enrichment_results(enrichment_res = combined_res, list_active_snw_genes = TRUE), "data.frame") expect_equal(ncol(summ_res2), 8) expect_true("non_Signif_Snw_Genes" %in% colnames(summ_res2)) expect_true(nrow(summ_res2) <= nrow(combined_res)) }) test_that("`summarize_enrichment_results()` -- argument checks work", { expect_error(summarize_enrichment_results(enrichment_res = combined_res, list_active_snw_genes = "INVALID"), "`list_active_snw_genes` should be either TRUE or FALSE") 
expect_error(summarize_enrichment_results(enrichment_res = list()), "`enrichment_res` should be a data frame") # list_active_snw_genes = FALSE nec_cols <- c("ID", "Term_Description", "Fold_Enrichment", "p_value", "adj_p", "support") expect_error(summarize_enrichment_results(enrichment_res = data.frame()), paste0("`enrichment_res` should have exactly ", length(nec_cols), " columns")) tmp <- as.data.frame(matrix(nrow = 1, ncol = length(nec_cols), dimnames = list(NULL, letters[seq_along(nec_cols)]))) expect_error(summarize_enrichment_results(enrichment_res = tmp), paste0("`enrichment_res` should have column names ", paste(dQuote(nec_cols), collapse = ", "))) # list_active_snw_genes = TRUE nec_cols <- c("ID", "Term_Description", "Fold_Enrichment", "p_value", "adj_p", "support", "non_Signif_Snw_Genes") expect_error(summarize_enrichment_results(enrichment_res = data.frame(), list_active_snw_genes = TRUE), paste0("`enrichment_res` should have exactly ", length(nec_cols), " columns")) tmp <- as.data.frame(matrix(nrow = 1, ncol = length(nec_cols), dimnames = list(NULL, letters[seq_along(nec_cols)]))) expect_error(summarize_enrichment_results(enrichment_res = tmp, list_active_snw_genes = TRUE), paste0("`enrichment_res` should have column names ", paste(dQuote(nec_cols), collapse = ", "))) }) ================================================ FILE: tests/testthat/test-scoring.R ================================================ ## Tests for agglomerated term scoring functions - Jan 2024 test_that("`score_terms()` -- returns score matrix", { mockery::stub(score_terms, "graphics::plot", NULL) small_result <- example_pathfindR_output[1:3, ] expect_is(score_terms(enrichment_table = small_result, exp_mat = example_experiment_matrix, plot_hmap = FALSE), "matrix") expect_is(score_terms(enrichment_table = small_result, exp_mat = example_experiment_matrix, plot_hmap = TRUE), "matrix") expect_is(score_terms(enrichment_table = small_result, exp_mat = example_experiment_matrix, cases = 
colnames(example_experiment_matrix)[1:3], plot_hmap = TRUE), "matrix") }) test_that("`score_terms()` -- matches gene symbols correctly", { toy_result <- data.frame( ID = c("gset1", "gset2"), Term_Description = c("gset1", "gset2"), Up_regulated = "", Down_regulated = c( paste(paste0("Gene_", c(1, 3, 5)), collapse = ", "), paste(paste0("Gene_", c(6, 8)), collapse = ", ") ) ) toy_result2 <- data.frame( ID = c("gset1", "gset2"), Term_Description = c("gset1", "gset2"), Up_regulated = "", Down_regulated = c( paste(paste0("Dummy_", c(1, 3, 5)), collapse = ", "), paste(paste0("Gene_", c(6, 8)), collapse = ", ") ) ) toy_exp_mat <- matrix( rnorm(40), nrow = 10, ncol = 4, dimnames = list(paste0("gene_", 1:10), paste0("subject_", 1:4)) ) expect_is(res_mat <- score_terms(enrichment_table = toy_result, exp_mat = toy_exp_mat, plot_hmap = FALSE), "matrix") expect_equal(nrow(res_mat), 2) expect_is(res_mat <- score_terms(enrichment_table = toy_result2, exp_mat = toy_exp_mat, plot_hmap = FALSE), "matrix") expect_equal(nrow(res_mat), 1) }) test_that("`score_terms()` -- argument checks work", { expect_error(score_terms(enrichment_table = example_pathfindR_output, exp_mat = example_experiment_matrix, use_description = "INVALID"), "`use_description` should either be TRUE or FALSE") expect_error(score_terms(enrichment_table = example_pathfindR_output, exp_mat = example_experiment_matrix, plot_hmap = "INVALID"), "`plot_hmap` should either be TRUE or FALSE") expect_error(score_terms(enrichment_table = list(), exp_mat = example_experiment_matrix), "`enrichment_table` should be a data frame of enrichment results") tmp <- example_pathfindR_output[, -c(1, 2)] nec_cols <- c("ID", "Up_regulated", "Down_regulated") expect_error(score_terms(enrichment_table = tmp, exp_mat = example_experiment_matrix), paste0("`enrichment_table` should contain all of ", paste(dQuote(nec_cols), collapse = ", "))) nec_cols <- c("Term_Description", "Up_regulated", "Down_regulated") 
expect_error(score_terms(enrichment_table = tmp, exp_mat = example_experiment_matrix, use_description = TRUE), paste0("`enrichment_table` should contain all of ", paste(dQuote(nec_cols), collapse = ", "))) expect_error(score_terms(enrichment_table = example_pathfindR_output, exp_mat = list()), "`exp_mat` should be a matrix") expect_error(score_terms(enrichment_table = example_pathfindR_output, exp_mat = example_experiment_matrix, cases = list()), "`cases` should be a vector") expect_error(score_terms(enrichment_table = example_pathfindR_output, exp_mat = example_experiment_matrix, cases = LETTERS), "Missing `cases` in `exp_mat`") }) test_that("duplicated term descriptions test", { small_result <- example_pathfindR_output[1:2, ] small_result$Term_Description <- small_result$Term_Description[1] expect_is(score_terms(enrichment_table = small_result, exp_mat = example_experiment_matrix, use_description = TRUE, plot_hmap = FALSE), "matrix") }) test_that("`plot_scores()` -- creates term score heatmap ggplot object with correct labels", { score_mat <- score_terms(example_pathfindR_output[1:3, ], example_experiment_matrix, plot_hmap = FALSE) # default g <- plot_scores(score_mat) expect_is(g, "ggplot") labels <- ggplot2::get_labs(g) expect_identical(labels$fill, "Score") expect_identical(labels$x, "Sample") expect_identical(labels$y, "Term") # cases provided g <- plot_scores(score_mat, cases = colnames(score_mat)[1:3]) expect_is(g, "ggplot") labels <- ggplot2::get_labs(g) expect_identical(labels$fill, "Score") expect_identical(labels$x, "Sample") expect_identical(labels$y, "Term") # default - label_samples = FALSE g <- plot_scores(score_mat, label_samples = FALSE) expect_is(g, "ggplot") labels <- ggplot2::get_labs(g) expect_identical(labels$fill, "Score") expect_identical(labels$x, "Sample") expect_identical(labels$y, "Term") # cases provided - label_samples = FALSE g <- plot_scores(score_mat, cases = colnames(score_mat)[1:3], label_samples = FALSE) expect_is(g, "ggplot") 
labels <- ggplot2::get_labs(g) expect_identical(labels$fill, "Score") expect_identical(labels$x, "Sample") expect_identical(labels$y, "Term") }) test_that("`plot_scores()` -- argument checks work", { expect_error(plot_scores(score_matrix = c()), "`score_matrix` should be a matrix") expect_error(plot_scores(score_matrix = data.frame()), "`score_matrix` should be a matrix") expect_error(plot_scores(score_matrix = list()), "`score_matrix` should be a matrix") mat <- matrix(1, nrow = 3, ncol = 2, dimnames = list(paste0("T", 1:3), c("A", "B"))) expect_error(plot_scores(score_matrix = mat, cases = list()), "`cases` should be a vector") expect_error(plot_scores(score_matrix = mat, cases = c("A", "B", "C")), "Missing `cases` in `score_matrix`") expect_error(plot_scores(score_matrix = mat, label_samples = "INVALID"), "`label_samples` should be TRUE or FALSE") expect_error(plot_scores(score_matrix = mat, case_title = 1), "`case_title` should be a single character value") expect_error(plot_scores(score_matrix = mat, case_title = rep("z", 3)), "`case_title` should be a single character value") expect_error(plot_scores(score_matrix = mat, control_title = 1), "`control_title` should be a single character value") expect_error(plot_scores(score_matrix = mat, control_title = rep("z", 3)), "`control_title` should be a single character value") expect_error(plot_scores(score_matrix = mat, low = "")) expect_error(plot_scores(score_matrix = mat, mid = "")) expect_error(plot_scores(score_matrix = mat, high = "")) }) ================================================ FILE: tests/testthat/test-utility.R ================================================ ## Tests for various utility functions - Aug 2023 set.seed(123) test_that("`active_snw_enrichment_wrapper()` -- works as expected", { input_df <- example_pathfindR_input[, c(1, 3)] colnames(input_df) <- c("GENE", "P_VALUE") org_dir <- getwd() test_directory <- file.path(tempdir(check = TRUE), "snw_wrapper_test") dir.create(test_directory) 
setwd(test_directory) on.exit(setwd(org_dir)) on.exit(unlink(test_directory), add = TRUE) with_mocked_bindings({ expect_is(active_snw_enrichment_wrapper(input_processed = input_df, pin_path = "Biogrid", gset_list = list(), enrichment_threshold = 0.05, list_active_snw_genes = FALSE, iterations = 1), "data.frame") expect_is(active_snw_enrichment_wrapper(input_processed = input_df, pin_path = "Biogrid", gset_list = list(), enrichment_threshold = 0.05, list_active_snw_genes = FALSE, iterations = 2, disable_parallel = TRUE), "data.frame") expect_warning(active_snw_enrichment_wrapper(input_processed = input_df, pin_path = "Biogrid", gset_list = list(), enrichment_threshold = 0.05, list_active_snw_genes = FALSE, search_method = "GA", iterations = 2)) }, single_iter_wrapper = function(...) example_pathfindR_output, .package = "pathfindR") skip_on_cran() expect_is(active_snw_enrichment_wrapper(input_processed = input_df[1:10, ], pin_path = "Biogrid", gset_list = list(genes_by_term = kegg_genes[1:2], term_descriptions = kegg_descriptions[names(kegg_genes[1:2])]), enrichment_threshold = 0.05, list_active_snw_genes = FALSE, iterations = 2), "NULL") }) test_that("`active_snw_enrichment_wrapper()` -- argument checks work", { valid_mets <- c("GR", "SA", "GA") expect_error(active_snw_enrichment_wrapper(input_processed = input_processed, pin_path = pin_path, gset_list = list(), enrichment_threshold = 0.05, list_active_snw_genes = FALSE, search_method = "INVALID"), paste0("`search_method` should be one of ", paste(dQuote(valid_mets), collapse = ", "))) expect_error(active_snw_enrichment_wrapper(input_processed = input_processed, pin_path = pin_path, gset_list = list(), enrichment_threshold = 0.05, list_active_snw_genes = FALSE, use_all_positives = "INVALID"), "`use_all_positives` should be either TRUE or FALSE") expect_error(active_snw_enrichment_wrapper(input_processed = input_processed, pin_path = pin_path, gset_list = list(), enrichment_threshold = 0.05, list_active_snw_genes = 
FALSE, silent_option = "INVALID"), "`silent_option` should be either TRUE or FALSE") expect_error(active_snw_enrichment_wrapper(input_processed = input_processed, pin_path = pin_path, gset_list = list(), enrichment_threshold = 0.05, list_active_snw_genes = FALSE, disable_parallel = "INVALID"), "`disable_parallel` should be either TRUE or FALSE") expect_error(active_snw_enrichment_wrapper(input_processed = input_processed, pin_path = pin_path, gset_list = list(), enrichment_threshold = 0.05, list_active_snw_genes = FALSE, iterations = "INVALID"), "`iterations` should be a positive integer") expect_error(active_snw_enrichment_wrapper(input_processed = input_processed, pin_path = pin_path, gset_list = list(), enrichment_threshold = 0.05, list_active_snw_genes = FALSE, iterations = 0), "`iterations` should be >= 1") expect_error(active_snw_enrichment_wrapper(input_processed = input_processed, pin_path = pin_path, gset_list = list(), enrichment_threshold = 0.05, list_active_snw_genes = FALSE, n_processes = "INVALID"), "`n_processes` should be either NULL or a positive integer") expect_error(active_snw_enrichment_wrapper(input_processed = input_processed, pin_path = pin_path, gset_list = list(), enrichment_threshold = 0.05, list_active_snw_genes = FALSE, n_processes = 0), "`n_processes` should be > 1") }) test_that("`configure_output_dir()` -- works as expected", { expected_dir <- file.path(tempdir(), "test_pathfindR_results") mockery::stub(configure_output_dir, "file.path", expected_dir) expect_equal(configure_output_dir(), expected_dir) test_out_dir <- file.path(tempdir(), "TEST") for (i in 1:3) { actual_dir <- configure_output_dir(test_out_dir) dir_to_check <- test_out_dir if (i > 1) { dir_to_check <- paste0(dir_to_check, "(", i - 1, ")") } expect_equal(actual_dir, dir_to_check) dir.create(actual_dir) } }) test_that("`fetch_gene_set()` -- can fetch all gene set objects", { skip_on_cran() for (gset_name in c("KEGG", "mmu_KEGG", "Reactome", "BioCarta", "cell_markers", 
"GO-All", "GO-BP", "GO-CC", "GO-MF")) { expect_is(gset_obj <- fetch_gene_set(gene_sets = gset_name, min_gset_size = 10, max_gset_size = 300), "list") expect_is(gset_obj$genes_by_term, "list") expect_is(gset_obj$term_descriptions, "character") expect_true(length(gset_obj$genes_by_term) == length(gset_obj$term_descriptions)) tmp <- vapply(gset_obj$genes_by_term, length, 1L) expect_true(min(tmp) >= 10 & max(tmp) <= 300) } # Custom gset_obj <- fetch_gene_set(gene_sets = "Custom", min_gset_size = 20, max_gset_size = 200, custom_genes = kegg_genes, custom_descriptions = kegg_descriptions) expect_is(gset_obj$genes_by_term, "list") expect_is(gset_obj$term_descriptions, "character") expect_true(length(gset_obj$genes_by_term) == length(gset_obj$term_descriptions)) tmp <- vapply(gset_obj$genes_by_term, length, 1L) expect_true(min(tmp) >= 20 & max(tmp) <= 200) }) test_that("`create_HTML_report()` -- works a expected", { mock_render <- mockery::mock(NULL, cycle = TRUE) mockery::stub(create_HTML_report, "rmarkdown::render", mock_render) create_HTML_report(input = data.frame(), input_processed = data.frame(), final_res = data.frame(), dir_for_report = "/path/to/report/dir") mockery::expect_called(mock_render, 3) }) test_that("`fetch_gene_set()` -- min/max_gset_size args correctly filter gene sets", { skip_on_cran() min_max_pairs <- list(c(min = 10, max = 300), c(min = 50, max = 200)) num_of_terms_after_size_filtering <- c() for (idx in seq_along(min_max_pairs)) { cur_vals <- min_max_pairs[[idx]] expect_is(gset_obj <- fetch_gene_set(gene_sets = "KEGG", min_gset_size = cur_vals["min"], max_gset_size = cur_vals["max"]), "list") sizes_of_terms <- vapply(gset_obj$genes_by_term, length, 1L) expect_true(min(sizes_of_terms) >= cur_vals["min"] & max(sizes_of_terms) <= cur_vals["max"]) num_of_terms_after_size_filtering <- c(num_of_terms_after_size_filtering, length(gset_obj$genes_by_term)) } expect_true(num_of_terms_after_size_filtering[2] < num_of_terms_after_size_filtering[1]) }) 
test_that("`fetch_gene_set()` -- for 'Custom' gene set, check if the custom objects are provided", { expect_error(fetch_gene_set(gene_sets = "Custom"), "`custom_genes` and `custom_descriptions` must be provided if `gene_sets = \"Custom\"`") expect_error(fetch_gene_set(gene_sets = "Custom", custom_genes = kegg_genes), "`custom_genes` and `custom_descriptions` must be provided if `gene_sets = \"Custom\"`") expect_error(fetch_gene_set(gene_sets = "Custom", custom_descriptions = kegg_descriptions), "`custom_genes` and `custom_descriptions` must be provided if `gene_sets = \"Custom\"`") }) test_that("`fetch_gene_set()` -- argument checks work", { all_gs_opts <- c("KEGG", "Reactome", "BioCarta", "GO-All", "GO-BP", "GO-CC", "GO-MF", "cell_markers", "mmu_KEGG", "Custom") expect_error(fetch_gene_set(gene_sets = "INVALID"), paste0("`gene_sets` should be one of ", paste(dQuote(all_gs_opts), collapse = ", "))) expect_error(fetch_gene_set(min_gset_size = "INVALID"), "`min_gset_size` should be numeric") expect_error(fetch_gene_set(max_gset_size = "INVALID"), "`max_gset_size` should be numeric") expect_error(fetch_gene_set(gene_sets = "Custom", custom_genes = "INVALID", custom_descriptions = ""), "`custom_genes` should be a list of term gene sets") expect_error(fetch_gene_set(gene_sets = "Custom", custom_genes = list(), custom_descriptions = ""), "`custom_genes` should be a named list \\(names are gene set IDs\\)") expect_error(fetch_gene_set(gene_sets = "Custom", custom_genes = kegg_genes, custom_descriptions = list()), "`custom_descriptions` should be a vector of term gene descriptions") expect_error(fetch_gene_set(gene_sets = "Custom", custom_genes = kegg_genes, custom_descriptions = 1:3), "`custom_descriptions` should be a named vector \\(names are gene set IDs\\)") }) test_that("`return_pin_path()` -- returns the absolute path to PIN file", { mockery::stub(return_pin_path, "utils::getFromNamespace", list()) mockery::stub(return_pin_path, "lapply", list(data.frame(V1 = 
paste0("G", 1:10), V2 = "pp", V3 = paste0("G", 2:11)), data.frame(V1 = paste0("G", 3:5), V2 = "pp", V3 = paste0("G", 5:7)))) expect_silent(path2file <- return_pin_path("Biogrid")) expect_true(file.exists(path2file)) custom_pin <- read.delim(path2file, header = FALSE) custom_pin$V1 <- tolower(custom_pin$V1) custom_sif_path <- file.path(tempdir(check = TRUE), "tmp_PIN.sif") utils::write.table(custom_pin, custom_sif_path, sep = "\t", row.names = FALSE, col.names = FALSE, quote = FALSE) expect_silent(final_custom_path <- return_pin_path(custom_sif_path)) expect_true(file.exists(final_custom_path)) # convert to uppercase works upper_case_custom <- read.delim(final_custom_path, header = FALSE) expect_true(all(toupper(upper_case_custom[, 1]) == upper_case_custom[, 1])) expect_true(all(toupper(upper_case_custom[, 3]) == upper_case_custom[, 3])) # invalid custom PIN - wrong format invalid_sif_path <- system.file(paste0("extdata/MYC.txt"), package = "pathfindR") expect_error(return_pin_path(invalid_sif_path), "The PIN file must have 3 columns and be tab-separated") # invalid custom PIN - invalid second column invalid_sif_path <- file.path(tempdir(check = TRUE), "custom.sif") invalid_custom_sif <- data.frame(P1 = "X", pp = "INVALID", P2 = "Y") write.table(invalid_custom_sif, invalid_sif_path, sep = "\t", col.names = FALSE, row.names = FALSE) expect_error(return_pin_path(invalid_sif_path), "The second column of the PIN file must all be \"pp\" ") # invalid option valid_opts <- c("Biogrid", "STRING", "GeneMania", "IntAct", "KEGG", "mmu_STRING", "/path/to/custom/SIF") expect_error(return_pin_path("INVALID"), paste0("The chosen PIN must be one of:\n", paste(dQuote(valid_opts), collapse = ", "))) }) test_that("`input_testing()` -- works as expected", { expect_message(input_testing(input = example_pathfindR_input, p_val_threshold = 0.05), "The input looks OK") expect_error(input_testing(input = matrix(), p_val_threshold = 0.05), "the input is not a data frame") 
expect_error(input_testing(input = example_pathfindR_input[, 1, drop = FALSE], p_val_threshold = 0.05), "the input should have 2 or 3 columns") expect_error(input_testing(input = example_pathfindR_input[1, ], p_val_threshold = 0.05), "There must be at least 2 rows \\(genes\\) in the input data frame") expect_error(input_testing(input = example_pathfindR_input, p_val_threshold = "INVALID"), "`p_val_threshold` must be a numeric value between 0 and 1") expect_error(input_testing(input = example_pathfindR_input, p_val_threshold = -1), "`p_val_threshold` must be between 0 and 1") tmp <- example_pathfindR_input tmp$adj.P.Val <- NA expect_error(input_testing(input = tmp, p_val_threshold = 0.05), "p values cannot contain NA values") tmp <- example_pathfindR_input tmp$adj.P.Val <- "INVALID" expect_error(input_testing(input = tmp, p_val_threshold = 0.05), "p values must all be numeric") tmp <- example_pathfindR_input tmp$adj.P.Val[1] <- -1 expect_error(input_testing(input = tmp, p_val_threshold = 0.05), "p values must all be between 0 and 1") }) test_that("`input_processing()` -- works as expected", { input_df <- example_pathfindR_input[1:10, ] toy_PIN <- data.frame(V1 = sample(example_pathfindR_input$Gene.symbol, 100), V2 = "pp", V3 = sample(example_pathfindR_input$Gene.symbol, 100)) mockery::stub(input_processing, "return_pin_path", NULL) mockery::stub(input_processing, "utils::read.delim", toy_PIN) expect_is(processed_df <- input_processing(input_df), "data.frame") expect_true(ncol(processed_df) == 4) expect_true(nrow(processed_df) <= nrow(example_pathfindR_input)) # no change values provided input_df2 <- input_df[, -2] expect_is(processed_df2 <- suppressWarnings(input_processing(input_df2)), "data.frame") expect_true(ncol(processed_df2) == 4) expect_true(all(processed_df2$CHANGE == 1e+06)) toy_PIN2 <- rbind(toy_PIN, data.frame(V1 = c("SERPINA3", "ARHGAP17"), V2 = "pp", V3 = c("ACT", "GIG25"))) mockery::stub(input_processing, "utils::read.delim", toy_PIN2) # multiple 
mapping input_multimap <- input_df input_multimap$Gene.symbol[1] <- "GIG24" input_multimap$Gene.symbol[2] <- "ACT" input_multimap$Gene.symbol[3] <- "AACT" input_multimap$Gene.symbol[4] <- "GIG25" expect_is(processed_df3 <- input_processing(input_multimap), "data.frame") }) test_that("`input_processing()` -- errors and warnings work", { input_df <- example_pathfindR_input[1:10, ] toy_PIN <- data.frame(V1 = sample(input_df$Gene.symbol, 7), V2 = "pp", V3 = sample(input_df$Gene.symbol, 7)) mockery::stub(input_processing, "return_pin_path", NULL) mockery::stub(input_processing, "utils::read.delim", toy_PIN) input_df$Gene.symbol <- as.factor(input_df$Gene.symbol) expect_warning(input_processing(input_df, p_val_threshold = 0.05, pin_name_path = "Biogrid", convert2alias = TRUE), "The gene column was turned into character from factor.") expect_error(input_processing(example_pathfindR_input, p_val_threshold = 1e-100, pin_name_path = "Biogrid"), "No input p value is lower than the provided threshold \\(1e-100\\)") input_dup <- example_pathfindR_input[1:3, ] input_dup <- rbind(input_dup, input_dup[1, ]) expect_warning(input_processing(input_dup, p_val_threshold = 0.05, pin_name_path = "Biogrid"), "Duplicated genes found! The lowest p value for each gene was selected") low_sig_input <- example_pathfindR_input[1:3, ] low_sig_input$adj.P.Val <- 1e-15 expect_message(res <- input_processing(low_sig_input, p_val_threshold = 0.05, pin_name_path = "Biogrid"), "pathfindR cannot handle p values < 1e-13. 
These were changed to 1e-13") expect_true(all(res$P_VALUE == 1e-13)) invalid_genes_input <- low_sig_input invalid_genes_input$Gene.symbol <- paste0(LETTERS[seq_len(nrow(invalid_genes_input))], "INVALID") expect_error(input_processing(invalid_genes_input, p_val_threshold = 0.05, pin_name_path = "Biogrid"), "None of the genes were in the PIN\nPlease check your gene symbols") low_sig_input$Gene.symbol[1] <- "INVALID_A" low_sig_input$Gene.symbol[2] <- "INVALID_B" low_sig_input$Gene.symbol[3] <- toy_PIN$V1[1] expect_error(input_processing(low_sig_input, p_val_threshold = 0.05, pin_name_path = "Biogrid"), "After processing, 1 gene \\(or no genes\\) could be mapped to the PIN") expect_error(input_processing(low_sig_input, p_val_threshold = 0.05, pin_name_path = "Biogrid", convert2alias = "INVALID"), "`convert2alias` should be either TRUE or FALSE") }) example_gene_data <- example_pathfindR_input[1:10, ] colnames(example_gene_data) <- c("GENE", "CHANGE", "P_VALUE") tmp_res <- example_pathfindR_output[1:5, -c(7, 8)] test_that("`annotate_term_genes()` -- adds input genes for each term", { expect_is(annotated_result <- annotate_term_genes(result_df = tmp_res, input_processed = example_gene_data), "data.frame") expect_true("Up_regulated" %in% colnames(annotated_result) & "Down_regulated" %in% colnames(annotated_result)) expect_true(nrow(annotated_result) == nrow(tmp_res)) }) test_that("annotate_term_genes() -- argument checks work", { expect_error(annotate_term_genes(result_df = list(), input_processed = example_gene_data), "`result_df` should be a data frame") expect_error(annotate_term_genes(result_df = tmp_res[, -1], input_processed = example_gene_data), "`result_df` should contain an \"ID\" column") expect_error(annotate_term_genes(result_df = tmp_res, input_processed = list()), "`input_processed` should be a data frame") expect_error(annotate_term_genes(result_df = tmp_res, input_processed = example_gene_data[, -1]), "`input_processed` should contain the columns \"GENE\" 
and \"CHANGE\"") expect_error(annotate_term_genes(result_df = tmp_res, input_processed = example_gene_data, genes_by_term = "INVALID"), "`genes_by_term` should be a list of term gene sets") expect_error(annotate_term_genes(result_df = tmp_res, input_processed = example_gene_data, genes_by_term = list(1)), "`genes_by_term` should be a named list \\(names are gene set IDs\\)") }) ================================================ FILE: tests/testthat/test-visualization.R ================================================ ## Tests for functions related to various visualization functions - Apr 2024 single_result <- example_pathfindR_output[1, ] processed_input <- example_pathfindR_input[, c(1, 1, 2, 3)] colnames(processed_input) <- c("old_GENE", "GENE", "CHANGE", "P_VALUE") test_that("`visualize_terms()` -- calls the appropriate function", { mock_vis_kegg <- mockery::mock(NULL) mockery::stub(visualize_terms, "visualize_KEGG_diagram", mock_vis_kegg) expect_silent(visualize_terms(result_df = single_result, input_processed = data.frame(), is_KEGG_result = TRUE)) mockery::expect_called(mock_vis_kegg, 1) mock_vis_term_inter <- mockery::mock(NULL) mockery::stub(visualize_terms, "visualize_term_interactions", mock_vis_term_inter) expect_silent(visualize_terms(result_df = single_result, is_KEGG_result = FALSE)) mockery::expect_called(mock_vis_term_inter, 1) }) test_that("`visualize_terms()` -- argument checks work", { expect_error(visualize_terms(result_df = "INVALID"), "`result_df` should be a data frame") # is_KEGG_result = TRUE nec_cols <- "ID" expect_error(visualize_terms(single_result[, -1], is_KEGG_result = TRUE), paste0("`result_df` should contain the following columns: ", paste(dQuote(nec_cols), collapse = ", "))) # is_KEGG_result = FALSE nec_cols <- c("Term_Description", "Up_regulated", "Down_regulated") expect_error(visualize_terms(single_result[, -2], is_KEGG_result = FALSE), paste0("`result_df` should contain the following columns: ", paste(dQuote(nec_cols), collapse
= ", "))) expect_error(visualize_terms(result_df = single_result, is_KEGG_result = TRUE), "`input_processed` should be specified when `is_KEGG_result = TRUE`") expect_error(visualize_terms(result_df = single_result, is_KEGG_result = "INVALID"), "the argument `is_KEGG_result` should be either TRUE or FALSE") }) test_that("`visualize_term_interactions()` -- creates expected list of ggraph objects", { skip_on_cran() expect_is(res <- visualize_term_interactions(single_result, pin_name_path = "Biogrid"), "list") expect_is(res[[1]], "ggraph") tmp_res <- rbind(single_result, single_result) tmp_res$Term_Description[2] <- "SKIP" tmp_res$Up_regulated[2] <- "Gene1" tmp_res$Down_regulated[2] <- "" expect_message(res <- visualize_term_interactions(tmp_res, pin_name_path = "KEGG"), paste0("< 2 genes, skipping visualization of ", tmp_res$Term_Description[2])) # Non-empty non_Signif_Snw_Genes tmp_res <- single_result tmp_res$non_Signif_Snw_Genes <- example_pathfindR_output$Up_regulated[2] expect_is(res <- visualize_term_interactions(tmp_res, pin_name_path = "Biogrid"), "list") expect_is(res[[1]], "ggraph") }) test_that("`visualize_KEGG_diagram()` -- creates expected list of ggraph objects", { skip_on_cran() skip_if_not_installed("org.Hs.eg.db") expect_is(res <- visualize_KEGG_diagram(kegg_pw_ids = single_result$ID, input_processed = processed_input), "list") expect_is(res[[1]], "ggraph") constant_input <- processed_input constant_input$CHANGE <- 1e+06 expect_is(res <- visualize_KEGG_diagram(kegg_pw_ids = single_result$ID, input_processed = constant_input), "list") expect_is(res[[1]], "ggraph") }) test_that("`visualize_KEGG_diagram()` -- skips pathway if non-existent", { skip_on_cran() skip_if_not_installed("org.Hs.eg.db") temp_res <- example_pathfindR_output[1:2, ] temp_res$ID[2] <- "hsa12345" expect_is(res <- visualize_KEGG_diagram(kegg_pw_ids = temp_res$ID, input_processed = processed_input), "list") expect_is(res[[1]], "ggraph") expect_length(res, 1) })
test_that("`visualize_KEGG_diagram()` -- argument checks work", { expect_error(visualize_KEGG_diagram(kegg_pw_ids = list(), input_processed = processed_input), "`kegg_pw_ids` should be a vector of KEGG IDs") expect_error(visualize_KEGG_diagram(kegg_pw_ids = c("X", "Y", "Z"), input_processed = processed_input), "`kegg_pw_ids` should be a vector of valid hsa KEGG IDs") expect_error(visualize_KEGG_diagram(kegg_pw_ids = "abc12345", input_processed = list()), "`input_processed` should be a data frame") expect_error(visualize_KEGG_diagram(kegg_pw_ids = "abc12345", input_processed = processed_input[, -2]), paste0("`input_processed` should contain the following columns: ", paste(dQuote(c("GENE", "CHANGE")), collapse = ", "))) }) test_that("`color_kegg_pathway()` -- works as expected", { skip_on_cran() pw_id <- "hsa00010" change_vec <- c(-2, 4, 6) names(change_vec) <- c("hsa:2821", "hsa:226", "hsa:229") expect_is(result <- color_kegg_pathway(pw_id, change_vec), "ggraph") names(change_vec) <- rep("missing", 3) expect_is(result <- color_kegg_pathway(pw_id, change_vec), "NULL") }) test_that("`color_kegg_pathway()` -- exceptions are handled properly", { change_vec <- c(-2, 4, 6) names(change_vec) <- c("hsa:2821", "hsa:226", "hsa:229") expect_error(color_kegg_pathway(pw_id = "hsa03040", change_vec = change_vec, scale_vals = "INVALID"), "`scale_vals` should be logical") expect_error(color_kegg_pathway(pw_id = "hsa03040", change_vec = change_vec, node_cols = list()), "`node_cols` should be a vector of colors") expect_error(color_kegg_pathway(pw_id = "hsa03040", change_vec = change_vec, node_cols = rep("red", 4)), "the length of `node_cols` should be 3") expect_error(color_kegg_pathway(pw_id = "hsa03040", change_vec = change_vec, node_cols = c("red", "#FFFFFF", "INVALID")), "`node_cols` should be a vector of valid colors") skip_on_cran() constant_vec <- rep(1e+06, 3) names(constant_vec) <- c("hsa:2821", "hsa:226", "hsa:229") expect_silent(color_kegg_pathway(pw_id = "hsa03040", 
change_vec = change_vec, node_cols = c("red", "blue", "green"))) expect_message(color_kegg_pathway(pw_id = "hsa03040", change_vec = constant_vec, node_cols = c("red", "blue", "green"))) expect_null(suppressWarnings(color_kegg_pathway(pw_id = "hsa03040", change_vec = NULL))) expect_message(color_kegg_pathway(pw_id = "hsa11111", change_vec = c())) }) test_that("`enrichment_chart()` -- produces a ggplot object with correct labels", { # default - top 10 expect_is(g <- enrichment_chart(example_pathfindR_output), "ggplot") expect_equal(ggplot2::quo_name(g$mapping$x), "Fold_Enrichment") expect_equal(ggplot2::quo_name(g$mapping$y), "Term_Description") labels <- ggplot2::get_labs(g) expect_equal(labels$size, "# genes") expect_equal(labels$colour, expression(-log[10](p))) expect_equal(labels$x, "Fold Enrichment") expect_equal(labels$y, "Term_Description") # plot_by_cluster expect_is(g <- enrichment_chart(example_pathfindR_output_clustered, plot_by_cluster = TRUE), "ggplot") expect_equal(ggplot2::quo_name(g$mapping$x), "Fold_Enrichment") expect_equal(ggplot2::quo_name(g$mapping$y), "Term_Description") labels <- ggplot2::get_labs(g) expect_equal(labels$size, "# genes") expect_equal(labels$colour, expression(-log[10](p))) expect_equal(labels$x, "Fold Enrichment") expect_equal(labels$y, "Term_Description") # change top_terms expect_is(g <- enrichment_chart(example_pathfindR_output, top_terms = NULL), "ggplot") expect_equal(ggplot2::quo_name(g$mapping$x), "Fold_Enrichment") expect_equal(ggplot2::quo_name(g$mapping$y), "Term_Description") labels <- ggplot2::get_labs(g) expect_equal(labels$size, "# genes") expect_equal(labels$colour, expression(-log[10](p))) expect_equal(labels$x, "Fold Enrichment") expect_equal(labels$y, "Term_Description") expect_is(g <- enrichment_chart(example_pathfindR_output, top_terms = 1000), "ggplot") expect_equal(ggplot2::quo_name(g$mapping$x), "Fold_Enrichment") expect_equal(ggplot2::quo_name(g$mapping$y), "Term_Description") labels <-
ggplot2::get_labs(g) expect_equal(labels$size, "# genes") expect_equal(labels$colour, expression(-log[10](p))) expect_equal(labels$x, "Fold Enrichment") expect_equal(labels$y, "Term_Description") # change num_bubbles expect_is(g <- enrichment_chart(example_pathfindR_output_clustered, num_bubbles = 30), "ggplot") expect_equal(ggplot2::quo_name(g$mapping$x), "Fold_Enrichment") expect_equal(ggplot2::quo_name(g$mapping$y), "Term_Description") labels <- ggplot2::get_labs(g) expect_equal(labels$size, "# genes") expect_equal(labels$colour, expression(-log[10](p))) expect_equal(labels$x, "Fold Enrichment") expect_equal(labels$y, "Term_Description") # change even_breaks expect_is(g <- enrichment_chart(example_pathfindR_output_clustered, even_breaks = FALSE), "ggplot") expect_equal(ggplot2::quo_name(g$mapping$x), "Fold_Enrichment") expect_equal(ggplot2::quo_name(g$mapping$y), "Term_Description") labels <- ggplot2::get_labs(g) expect_equal(labels$size, "# genes") expect_equal(labels$colour, expression(-log[10](p))) expect_equal(labels$x, "Fold Enrichment") expect_equal(labels$y, "Term_Description") }) test_that("`enrichment_chart()` -- argument checks work", { necessary <- c("Term_Description", "Fold_Enrichment", "lowest_p", "Up_regulated", "Down_regulated") expect_error(enrichment_chart(example_pathfindR_output[, -2]), paste0("The input data frame must have the columns:\n", paste(necessary, collapse = ", "))) expect_error(enrichment_chart(example_pathfindR_output, plot_by_cluster = "INVALID"), "`plot_by_cluster` must be either TRUE or FALSE") expect_message(enrichment_chart(example_pathfindR_output, plot_by_cluster = TRUE), "For plotting by cluster, there must a column named `Cluster` in the input data frame!") expect_error(enrichment_chart(example_pathfindR_output, top_terms = "INVALID"), "`top_terms` must be either numeric or NULL") expect_error(enrichment_chart(example_pathfindR_output, top_terms = 0), "`top_terms` must be > 1") }) test_that("`term_gene_graph()` -- 
produces a ggplot object using the correct data", { # Top 10 (default) expect_is(p <- term_gene_graph(example_pathfindR_output), "ggplot") expect_equal(sum(p$data$type == "term"), 10) # Top 3 expect_is(p <- term_gene_graph(example_pathfindR_output, num_terms = 3), "ggplot") expect_equal(sum(p$data$type == "term"), 3) # All terms expect_is(p <- term_gene_graph(example_pathfindR_output[1:15, ], num_terms = NULL), "ggplot") expect_equal(sum(p$data$type == "term"), 15) # Top 1000, expect to plot top nrow(output) expect_is(p <- term_gene_graph(example_pathfindR_output[1:15, ], num_terms = 1000), "ggplot") expect_equal(sum(p$data$type == "term"), 15) # use_description = TRUE expect_is(p <- term_gene_graph(example_pathfindR_output, use_description = TRUE), "ggplot") expect_equal(sum(p$data$type == "term"), 10) # node_size = 'p_val' expect_is(p <- term_gene_graph(example_pathfindR_output, node_size = "p_val"), "ggplot") expect_equal(sum(p$data$type == "term"), 10) }) test_that("`term_gene_graph()` -- argument checks work", { expect_error(term_gene_graph(example_pathfindR_output, num_terms = "INVALID"), "`num_terms` must either be numeric or NULL!") expect_error(term_gene_graph(example_pathfindR_output, use_description = "INVALID"), "`use_description` must either be TRUE or FALSE!") val_node_size <- c("num_genes", "p_val") expect_error(term_gene_graph(example_pathfindR_output, node_size = "INVALID"), paste0("`node_size` should be one of ", paste(dQuote(val_node_size), collapse = ", "))) expect_error(term_gene_graph(result_df = "INVALID"), "`result_df` should be a data frame") wrong_df <- example_pathfindR_output[, -c(1, 2)] ID_column <- "ID" necessary_cols <- c(ID_column, "lowest_p", "Up_regulated", "Down_regulated") expect_error(term_gene_graph(wrong_df, use_description = FALSE), paste(c("All of", paste(necessary_cols, collapse = ", "), "must be present in `results_df`!"), collapse = " ")) ID_column <- "Term_Description" necessary_cols <- c(ID_column, "lowest_p", 
"Up_regulated", "Down_regulated") expect_error(term_gene_graph(wrong_df, use_description = TRUE), paste(c("All of", paste(necessary_cols, collapse = ", "), "must be present in `results_df`!"), collapse = " ")) expect_error(term_gene_graph(example_pathfindR_output, node_colors = list())) expect_error(term_gene_graph(example_pathfindR_output, node_colors = c(1, 2, 3))) expect_error(term_gene_graph(example_pathfindR_output, node_colors = c("red", "blue"))) }) test_that("`term_gene_heatmap()` -- produces a ggplot object using the correct data", { skip_on_cran() # Top 10 (default) expect_is(p <- term_gene_heatmap(example_pathfindR_output), "ggplot") expect_equal(length(unique(p$data$Enriched_Term)), 10) expect_true(all(p$data$Enriched_Term %in% example_pathfindR_output$ID)) # Top 3 expect_is(p <- term_gene_heatmap(example_pathfindR_output, num_terms = 3), "ggplot") expect_equal(length(unique(p$data$Enriched_Term)), 3) # No genes in 'Down_regulated' res_df <- example_pathfindR_output[1:3, ] res_df$Down_regulated <- "" expect_is(p <- term_gene_heatmap(res_df), "ggplot") expect_equal(length(unique(p$data$Enriched_Term)), 3) # No genes in 'Up_regulated' res_df <- example_pathfindR_output[1:3, ] res_df$Up_regulated <- "" expect_is(p <- term_gene_heatmap(res_df), "ggplot") expect_equal(length(unique(p$data$Enriched_Term)), 3) # All terms expect_is(p <- term_gene_heatmap(example_pathfindR_output[1:15, ], num_terms = NULL), "ggplot") expect_equal(length(unique(p$data$Enriched_Term)), 15) # Top 1000, expect to plot top nrow(output) expect_is(p <- term_gene_heatmap(example_pathfindR_output[1:15, ], num_terms = 1000), "ggplot") expect_equal(length(unique(p$data$Enriched_Term)), 15) # use_description = TRUE expect_is(p <- term_gene_heatmap(example_pathfindR_output, use_description = TRUE), "ggplot") expect_equal(length(unique(p$data$Enriched_Term)), 10) expect_true(all(p$data$Enriched_Term %in% example_pathfindR_output$Term_Description)) # genes_df supplied expect_is(p <- 
term_gene_heatmap(example_pathfindR_output[1:3, ], example_pathfindR_input), "ggplot") # genes_df supplied - without change column expect_is(p <- term_gene_heatmap(example_pathfindR_output[1:3, ], example_pathfindR_input[, -2]), "ggplot") # sort by lowest_p instead expect_is(p <- term_gene_heatmap(example_pathfindR_output[1:3, ], example_pathfindR_input, sort_terms_by_p = TRUE), "ggplot") }) test_that("`term_gene_heatmap()` -- argument checks work", { expect_error(term_gene_heatmap(result_df = example_pathfindR_output, use_description = "INVALID"), "`use_description` must either be TRUE or FALSE!") expect_error(term_gene_heatmap(result_df = "INVALID"), "`result_df` should be a data frame") wrong_df <- example_pathfindR_output[, -c(1, 2)] ID_column <- "ID" nec_cols <- c(ID_column, "lowest_p", "Up_regulated", "Down_regulated") expect_error(term_gene_heatmap(wrong_df, use_description = FALSE), paste0("`result_df` should have the following columns: ", paste(dQuote(nec_cols), collapse = ", "))) ID_column <- "Term_Description" nec_cols <- c(ID_column, "lowest_p", "Up_regulated", "Down_regulated") expect_error(term_gene_heatmap(wrong_df, use_description = TRUE), paste0("`result_df` should have the following columns: ", paste(dQuote(nec_cols), collapse = ", "))) expect_error(term_gene_heatmap(result_df = example_pathfindR_output, genes_df = "INVALID")) expect_error(term_gene_heatmap(result_df = example_pathfindR_output, num_terms = "INVALID"), "`num_terms` should be numeric or NULL") expect_error(term_gene_heatmap(result_df = example_pathfindR_output, num_terms = -1), "`num_terms` should be > 0 or NULL") expect_error(term_gene_heatmap(example_pathfindR_output, low = "")) expect_error(term_gene_heatmap(example_pathfindR_output, mid = "")) expect_error(term_gene_heatmap(example_pathfindR_output, high = "")) }) test_that("`UpSet_plot()` -- produces a ggplot object", { skip_on_cran() # Top 10 (default) expect_is(p <- UpSet_plot(example_pathfindR_output), "ggplot") # Top 3 
expect_is(p <- UpSet_plot(example_pathfindR_output, num_terms = 3), "ggplot") # All terms expect_is(p <- UpSet_plot(example_pathfindR_output[1:15, ], num_terms = NULL), "ggplot") # No genes in 'Down_regulated' res_df <- example_pathfindR_output res_df$Down_regulated <- "" expect_is(p <- UpSet_plot(res_df, num_terms = 3), "ggplot") # No genes in 'Up_regulated' res_df <- example_pathfindR_output res_df$Up_regulated <- "" expect_is(p <- UpSet_plot(res_df, num_terms = 3), "ggplot") # use_description = TRUE expect_is(p <- UpSet_plot(example_pathfindR_output, use_description = TRUE), "ggplot") # Other visualization types expect_is(p <- UpSet_plot(example_pathfindR_output[1:3, ], example_pathfindR_input[1:10, ]), "ggplot") expect_is(p <- UpSet_plot(example_pathfindR_output[1:3, ], example_pathfindR_input[1:10, ], method = "boxplot"), "ggplot") expect_is(p <- UpSet_plot(example_pathfindR_output[1:3, ], method = "barplot"), "ggplot") }) test_that("`UpSet_plot()` -- argument checks work", { expect_error(UpSet_plot(result_df = example_pathfindR_output, use_description = "INVALID"), "`use_description` must either be TRUE or FALSE!") expect_error(UpSet_plot(result_df = "INVALID"), "`result_df` should be a data frame") wrong_df <- example_pathfindR_output[, -c(1, 2)] ID_column <- "ID" nec_cols <- c(ID_column, "lowest_p", "Up_regulated", "Down_regulated") expect_error(UpSet_plot(wrong_df, use_description = FALSE), paste0("`result_df` should have the following columns: ", paste(dQuote(nec_cols), collapse = ", "))) ID_column <- "Term_Description" nec_cols <- c(ID_column, "lowest_p", "Up_regulated", "Down_regulated") expect_error(UpSet_plot(wrong_df, use_description = TRUE), paste0("`result_df` should have the following columns: ", paste(dQuote(nec_cols), collapse = ", "))) expect_error(UpSet_plot(result_df = example_pathfindR_output, genes_df = "INVALID")) expect_error(UpSet_plot(result_df = example_pathfindR_output, num_terms = "INVALID"), "`num_terms` should be numeric or NULL") 
expect_error(UpSet_plot(result_df = example_pathfindR_output, num_terms = -1), "`num_terms` should be > 0 or NULL") valid_opts <- c("heatmap", "boxplot", "barplot") expect_error(UpSet_plot(result_df = example_pathfindR_output, method = "INVALID"), paste("`method` should be one of`", paste(dQuote(valid_opts), collapse = ", "))) expect_error(UpSet_plot(result_df = example_pathfindR_output, method = "boxplot"), "For `method = boxplot`, you must provide `genes_df`") expect_error(UpSet_plot(example_pathfindR_output, low = "")) expect_error(UpSet_plot(example_pathfindR_output, mid = "")) expect_error(UpSet_plot(example_pathfindR_output, high = "")) }) test_that("`isColor()` -- identifies colors correctly", { expect_true(isColor("red")) expect_true(isColor("green")) expect_true(isColor("black")) expect_true(isColor("gray60")) expect_true(isColor("#E5D7BF")) expect_false(isColor("")) expect_false(isColor("a")) expect_false(isColor(FALSE)) expect_false(isColor(1)) expect_false(isColor(c())) expect_false(isColor(list())) }) ================================================ FILE: tests/testthat/test-zzz.R ================================================ ## Tests for functions related to java version check - Aug 2023 test_that("`fetch_java_version()` works as expected", { version_vec <- c("java version \"13.0.1\" 2019-10-15", "Java(TM) SE Runtime Environment (build 13.0.1+9)", "Java HotSpot(TM) 64-Bit Server VM (build 13.0.1+9, mixed mode, sharing)") mockery::stub(fetch_java_version, "Sys.getenv", "/path/to/java/home") mockery::stub(fetch_java_version, "file.exists", TRUE) mockery::stub(fetch_java_version, "system2", version_vec) # unix mockery::stub(fetch_java_version, "identical", FALSE) expect_equal(fetch_java_version(), version_vec) # windows mockery::stub(fetch_java_version, "identical", TRUE) expect_equal(fetch_java_version(), version_vec) mockery::stub(fetch_java_version, "system2", c()) expect_error(fetch_java_version()) mockery::stub(fetch_java_version, "file.exists", 
FALSE) expect_error(fetch_java_version()) mockery::stub(fetch_java_version, "Sys.getenv", NA) mockery::stub(fetch_java_version, "Sys.which", "path/to/java") mockery::stub(fetch_java_version, "system2", version_vec) expect_equal(fetch_java_version(), version_vec) }) test_that("`check_java_version()` works", { expect_null(check_java_version()) }) test_that("`check_java_version()` raises parsing error", { expect_error(check_java_version(c("version 1.8", "version 1.7")), "Java version detected but couldn't parse version from ") expect_error(check_java_version("version XXXX"), "Java version detected but couldn't parse version from: ") }) test_that("`check_java_version()` works with 1.8", { expect_null(check_java_version(c("java version \"1.8.0_144\"", "Java(TM) SE Runtime Environment (build 1.8.0_000-000)", "Java HotSpot(TM) 64-Bit Server VM (build 00.000-000, mixed mode)"))) }) test_that("`check_java_version()` works with 14", { expect_null(check_java_version(c("java version \"14\" 2020-03-17", "Java(TM) SE Runtime Environment (build 14+36-1461)", "Java HotSpot(TM) 64-Bit Server VM (build 14+36-1461, mixed mode, sharing)"))) }) test_that("`check_java_version()` fails with 1.7", { expect_error(check_java_version(c("java version \"1.7.0\"", "Java(TM) SE Runtime Environment (build 1.7.0_000-000)", "Java HotSpot(TM) 64-Bit Server VM (build 00.000-000, mixed mode)"))) }) ================================================ FILE: tests/testthat-active_snw.R ================================================ library(testthat) library(pathfindR) test_check("pathfindR", filter = "active_snw_search") ================================================ FILE: tests/testthat-clustering.R ================================================ library(testthat) library(pathfindR) test_check("pathfindR", filter = "clustering") ================================================ FILE: tests/testthat-comparison.R ================================================ library(testthat) library(pathfindR) 
test_check("pathfindR", filter = "comparison") ================================================ FILE: tests/testthat-core.R ================================================ library(testthat) library(pathfindR) test_check("pathfindR", filter = "core") ================================================ FILE: tests/testthat-data_generation.R ================================================ library(testthat) library(pathfindR) test_check("pathfindR", filter = "data_generation") ================================================ FILE: tests/testthat-enrichment.R ================================================ library(testthat) library(pathfindR) test_check("pathfindR", filter = "enrichment") ================================================ FILE: tests/testthat-scoring.R ================================================ library(testthat) library(pathfindR) test_check("pathfindR", filter = "scoring") ================================================ FILE: tests/testthat-utility.R ================================================ library(testthat) library(pathfindR) test_check("pathfindR", filter = "utility") ================================================ FILE: tests/testthat-visualization.R ================================================ library(testthat) library(pathfindR) test_check("pathfindR", filter = "visualization") ================================================ FILE: tests/testthat-zzz.R ================================================ library(testthat) library(pathfindR) test_check("pathfindR", filter = "zzz") ================================================ FILE: vignettes/.gitignore ================================================ *.html *.R ================================================ FILE: vignettes/comparing_results.Rmd ================================================ --- title: "Comparing Two pathfindR Results" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Comparing Two pathfindR Results} %\VignetteEngine{knitr::rmarkdown} 
%\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.width = 8, fig.height = 4, fig.align = "center" ) suppressPackageStartupMessages(library(pathfindR)) ``` The function `combine_pathfindR_results()` allows combination of two pathfindR active-subnetwork-oriented enrichment analysis results for investigating common and distinct terms between the groups. Below is an example for comparing results using two different rheumatoid arthritis-related data sets (`example_pathfindR_output` and `example_comparison_output`). ```{r compare2res} combined_df <- combine_pathfindR_results( result_A = example_pathfindR_output, result_B = example_comparison_output, plot_common = FALSE ) ``` By default, `combine_pathfindR_results()` plots the term-gene graph for the common terms in the combined results. For not plotting the graph, set `plot_common = FALSE`. The function `combined_results_graph()` can be used to create this graph (using only selected terms etc.) later on. By default, the function creates the graph using all common terms: ```{r compare_graph1} combined_results_graph(combined_df) ``` By supplying a vector of selected terms to the `selected_terms` arguments, you may plot the term-gene graph for the selected terms: ```{r compare_graph2, fig.width=8, fig.height=4} combined_results_graph( combined_df, selected_terms = c("hsa04144", "hsa04141", "hsa04140") ) ``` By default, `combined_results_graph()` creates the graph using term IDs. To use term descriptions instead, set `use_description = TRUE`: ```{r compare_graph3, eval=FALSE} combined_results_graph( combined_df, use_description = TRUE, selected_terms = combined_df$Term_Description[1:4] ) ``` For changing the layout of the graph (`"auto"` by default), you may use the `layout` argument. For changing how the sizes of the term nodes are determined, you may use the `node_size` argument. 
The options are `"num_genes"` (default) and `"p_val"` for using the number of significant genes in the term and the -log10(p) value of the term, respectively: ```{r compare_graph4, eval=FALSE} combined_results_graph( combined_df, selected_terms = c("hsa04144", "hsa04141", "hsa04140"), node_size = "p_val" ) ``` ================================================ FILE: vignettes/intro_vignette.Rmd ================================================ --- title: "Introduction to pathfindR" author: "Ege Ulgen" date: "`r Sys.Date()`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Introduction to pathfindR} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", eval = FALSE, fig.width = 7, fig.height = 7, fig.align = "center" ) suppressPackageStartupMessages(library(pathfindR)) ``` `pathfindR` is a tool for enrichment analysis via active subnetworks. The package also offers functionality to cluster the enriched terms and identify representative terms in each cluster, to score the enriched terms per sample and to visualize analysis results. The functionality suite of pathfindR is described in detail in _Ulgen E, Ozisik O, Sezerman OU. 2019. pathfindR: An R Package for Comprehensive Identification of Enriched Pathways in Omics Data Through Active Subnetworks. Front. Genet. [https://doi.org/10.3389/fgene.2019.00858](https://doi.org/10.3389/fgene.2019.00858)_ # Overview The observation that motivated us to develop `pathfindR` was that direct enrichment analysis of differential RNA/protein expression or DNA methylation results may not provide the researcher with the full picture. That is to say: enrichment analysis of only a list of significant genes alone may not be informative enough to explain the underlying disease mechanisms. 
Therefore, we considered leveraging interaction information from a protein-protein interaction network (PIN) to identify distinct active subnetworks and then perform enrichment analyses on these subnetworks. > An active subnetwork can be defined as a group of interconnected genes in a PIN that predominantly consists of significantly altered genes. In other words, active subnetworks define distinct disease-associated sets of interacting genes, whether discovered through the original analysis or discovered because of being in interaction with a significant gene. The active-subnetwork-oriented enrichment analysis approach of pathfindR can be summarized as follows: After the processed input genes and their associated p values are mapped onto the PIN, active subnetwork search is performed. The resulting active subnetworks are then filtered based on their scores and the number of significant genes they contain. This filtered list of active subnetworks is then used for enrichment analyses, i.e. using the genes in each of the active subnetworks, the significantly enriched terms (pathways/gene sets) are identified. Enriched terms with adjusted p values larger than the given threshold are discarded and the lowest adjusted p value (over all active subnetworks) for each term is kept. This process of `active subnetwork search + enrichment analyses` is repeated for a selected number of iterations, performed in parallel. For each significantly enriched term, the lowest and the highest adjusted-p values, as well as the number of occurrences over all iterations, are reported in the resulting data frame. This active-subnetwork-oriented enrichment approach is demonstrated in the section [Active-subnetwork-oriented Enrichment Analysis] of this vignette. The enrichment analysis usually yields a great number of enriched terms whose biological functions are related. 
Therefore, we implemented two clustering approaches using a pairwise distance matrix based on the kappa statistics between the enriched terms (as proposed by Huang et al. [^1]). Based on this distance metric, the user can perform either hierarchical (default) or fuzzy clustering of the enriched terms. Details of clustering and partitioning of enriched terms are presented in the [Clustering Enriched Terms] section of this vignette. Other functionality of pathfindR, also briefly described in this vignette, includes: - agglomerated score calculation for each term per sample (to investigate how a gene set is altered in a given sample) - visualization of terms and term-related genes as a graph (to determine the degree of overlap between the enriched terms by identifying shared and/or distinct significant genes) - pathfindR analysis with custom gene sets # Active-subnetwork-oriented Enrichment Analysis For convenience, we provide the wrapper function `run_pathfindR()` to be used for the active-subnetwork-oriented enrichment analysis. The input for this function must be a data frame consisting of the columns containing: `Gene Symbols`, `Change Values` (optional) and `p values`. The example data frame used in this vignette (`example_pathfindR_input`) is the dataset containing the differentially-expressed genes for the GEO dataset GSE15573 comparing 18 rheumatoid arthritis (RA) patients versus 15 healthy subjects. 
The first 6 rows of the example input data frame are displayed below: ```{r load_pkg, eval=TRUE} library(pathfindR) knitr::kable(head(example_pathfindR_input)) ``` > For a detailed step-by-step explanation and an unwrapped demonstration of the active-subnetwork-oriented enrichment analysis, see the vignette [Step-by-Step Execution of the pathfindR Enrichment Workflow](manual_execution.html) Executing the workflow is straightforward (but does typically take several minutes): ```{r run_pathfindR} output_df <- run_pathfindR(example_pathfindR_input) ``` ## Useful arguments This subsection demonstrates some (selected) useful arguments of `run_pathfindR()`. For a full list of arguments, see `?run_pathfindR` or visit [our GitHub wiki](https://github.com/egeulgen/pathfindR/wiki). ### Filtering Input Genes By default, `run_pathfindR()` uses the input genes with p-values < 0.05. To change this threshold, use `p_val_threshold`: ```{r change_input_thr} output_df <- run_pathfindR(example_pathfindR_input, p_val_threshold = 0.01) ``` ### Output Directory By default, `run_pathfindR()` creates a temporary directory for writing the output files, including active subnetwork search results and an HTML report. To set the output directory, use `output_dir`: ```{r change_out_dir} output_df <- run_pathfindR(example_pathfindR_input, output_dir = "this_is_my_output_directory") ``` This creates `"this_is_my_output_directory"` under the current working directory. In essence, this argument is treated as a path so it can be used to create the output directory anywhere. For example, to create the directory `"my_dir"` under `"~/Desktop"` and run the analysis there, you may run: ```{r change_out_dir2} output_df <- run_pathfindR(example_pathfindR_input, output_dir = "~/Desktop/my_dir") ``` > Note: If the output directory (e.g. `"my_dir"`) already exists, `run_pathfindR()` creates and works under `"my_dir(1)"`. If that also exists, it creates `"my_dir(2)"` and so on. 
This was intentionally implemented so that any previous pathfindR results are not overwritten. ### Gene Sets for Enrichment The active-subnetwork-oriented enrichment analyses can be performed on any gene sets (biological pathways, gene ontology terms, transcription factor target genes, miRNA target genes etc.). The available gene sets in pathfindR are "KEGG", "Reactome", "BioCarta", "GO-All", "GO-BP", "GO-CC" and "GO-MF" (all for Homo sapiens). For changing the default gene sets for enrichment analysis (hsa KEGG pathways), use the argument `gene_sets` ```{r change_gset1} output_df <- run_pathfindR(example_pathfindR_input, gene_sets = "GO-MF") ``` By default, `run_pathfindR()` filters the gene sets by including only the terms containing at least 10 and at most 300 genes. To change the default behavior, you may change `min_gset_size` and `max_gset_size`: ```{r change_gset2} ## Including more terms for enrichment analysis output_df <- run_pathfindR(example_pathfindR_input, gene_sets = "GO-MF", min_gset_size = 5, max_gset_size = 500 ) ``` > Note that increasing the number of terms for enrichment analysis may result in significantly longer run time. If the user prefers to use another gene set source, the `gene_sets` argument should be set to `"Custom"` and the custom gene sets (list) and the custom gene set descriptions (named vector) should be supplied via the arguments `custom_genes` and `custom_descriptions`, respectively. See `?fetch_gene_set` for more details and [Analysis with Custom Gene Sets] for a simple demonstration. For details on obtaining organism-specific Gene Sets and PIN data, see the vignette [Obtaining PIN and Gene Sets Data](obtain_data.html). ### Filtering Enriched Terms by Adjusted-p Values By default, `run_pathfindR()` adjusts the enrichment p values via the "bonferroni" method and filters the enriched terms by adjusted-p value < 0.05. 
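
To make the effect of the adjustment method concrete, here is a minimal sketch using base R's `stats::p.adjust()` on a handful of made-up raw p values (the term names and values are hypothetical and are not pathfindR output). It only illustrates how "bonferroni" and "fdr" adjustments differ before the 0.05 filter is applied; it is not the package's internal code.

```r
# Hypothetical raw enrichment p values for four terms (illustration only)
raw_p <- c(termA = 0.0004, termB = 0.004, termC = 0.02, termD = 0.2)

# Bonferroni multiplies each p value by the number of tests (capped at 1)
adj_bonf <- stats::p.adjust(raw_p, method = "bonferroni")

# "fdr" is the Benjamini-Hochberg adjustment, which is less conservative
adj_fdr <- stats::p.adjust(raw_p, method = "fdr")

# Terms that would pass an adjusted-p threshold of 0.05 under each method
kept_bonf <- names(adj_bonf)[adj_bonf < 0.05]
kept_fdr <- names(adj_fdr)[adj_fdr < 0.05]
```

With these toy values, the Bonferroni adjustment keeps two terms while the FDR adjustment keeps three, which shows why the choice of method changes how many enriched terms are reported.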
To change this adjustment method and the threshold, set `adj_method` and `enrichment_threshold`, respectively: ```{r change_enr_threshold} output_df <- run_pathfindR(example_pathfindR_input, adj_method = "fdr", enrichment_threshold = 0.01 ) ``` ### Protein-protein Interaction Network For the active subnetwork search process, a protein-protein interaction network (PIN) is used. `run_pathfindR()` maps the input genes onto this PIN and identifies active subnetworks which are then used for enrichment analyses. To change the default PIN ("Biogrid"), use the `pin_name_path` argument: ```{r change_PIN1} output_df <- run_pathfindR(example_pathfindR_input, pin_name_path = "IntAct") ``` The `pin_name_path` argument can be one of "Biogrid", "STRING", "GeneMania", "IntAct", "KEGG", "mmu_STRING" or it can be the path to a custom PIN file provided by the user. ```{r change_PIN2} # to use an external PIN of your choice output_df <- run_pathfindR(example_pathfindR_input, pin_name_path = "/path/to/myPIN.sif") ``` > NOTE: the PIN is also used for generating the background genes (in this case, all unique genes in the PIN) during hypergeometric-distribution-based tests in enrichment analyses. Therefore, a larger PIN will generally yield better results. ### Active Subnetwork Search Method Currently, there are three algorithms implemented in pathfindR for active subnetwork search: Greedy Algorithm (default, based on Ideker et al. [^2]), Simulated Annealing Algorithm (based on Ideker et al. [^2]) and Genetic Algorithm (based on Ozisik et al. [^3]). 
For a detailed discussion on which algorithm to use see [this wiki entry](https://github.com/egeulgen/pathfindR/wiki/Active-subnetwork-oriented-Enrichment-Documentation#selecting-the-active-subnetwork-search-algorithm) ```{r change_method} # for simulated annealing: output_df <- run_pathfindR(example_pathfindR_input, search_method = "SA") # for genetic algorithm: output_df <- run_pathfindR(example_pathfindR_input, search_method = "GA") ``` ### Other Arguments Because the active subnetwork search algorithms are stochastic, `run_pathfindR()` may be set to iterate the active subnetwork identification and enrichment steps multiple times (by default 1 time). To change this number, set `iterations`: ```{r change_n_iters} output_df <- run_pathfindR(example_pathfindR_input, iterations = 25) ``` `run_pathfindR()` uses a parallel loop (using the package `foreach`) for performing these iterations in parallel. By default, the number of processes to be used is determined automatically. To override, change `n_processes`: ``` {r change_n_proc} # if not set, `n_processes` defaults to (number of detected cores - 1) output_df <- run_pathfindR(example_pathfindR_input, iterations = 5, n_processes = 2) ``` ## Output ### Enriched Terms Data Frame `run_pathfindR()` returns a data frame of enriched terms. 
Columns are: - ID: ID of the enriched term - Term_Description: Description of the enriched term - Fold_Enrichment: Fold enrichment value for the enriched term (Calculated using ONLY the input genes) - occurrence: The number of iterations in which the given term was found to be enriched - lowest_p: the lowest adjusted-p value of the given term over all iterations - highest_p: the highest adjusted-p value of the given term over all iterations - non_Signif_Snw_Genes (OPTIONAL): the non-significant active subnetwork genes, comma-separated (controlled by `list_active_snw_genes`, default is `FALSE`) - Up_regulated: the up-regulated genes (as determined by `change value` > 0, if the `change column` was provided) in the input involved in the given term's gene set, comma-separated. If change column was not provided, all affected input genes are listed here. - Down_regulated: the down-regulated genes (as determined by `change value` < 0, if the `change column` was provided) in the input involved in the given term's gene set, comma-separated The first 2 rows of the output data frame of the example analysis on the rheumatoid arthritis gene-level differential expression input data (`example_pathfindR_input`) are shown below: ```{r example_out, eval=TRUE} knitr::kable(head(example_pathfindR_output, 2)) ``` By default, `run_pathfindR()` also produces a graphical summary of enrichment results for the top 10 enriched terms. You may disable plotting this chart by setting `plot_enrichment_chart=FALSE` and produce it later via the function `enrichment_chart()`: ```{r encrichment_plot_shown} # change number of top terms plotted (default = 10) enrichment_chart( result_df = example_pathfindR_output, top_terms = 15 ) ``` ### HTML Report (created when `output_dir` is set) The function also creates an HTML report `results.html` that is saved in the output directory if it's set. 
This report contains links to two other HTML files: **1. `enriched_terms.html`** This document contains the table of the active subnetwork-oriented enrichment results (same as the returned data frame). **2. `conversion_table.html`** This document contains the table of converted gene symbols. Columns are: - Old Symbol: the original gene symbol - Converted Symbol: the alias symbol that was found in the PIN - Change: the provided change value - p-value: the provided adjusted p value > During input processing, gene symbols that are not in the PIN are identified and excluded. For human genes, if aliases of these missing gene symbols are found in the PIN, these symbols are converted to the corresponding aliases (controlled by the argument `convert2alias`). This step is performed to best map the input data onto the PIN. The document contains a second table of genes for which **no interactions** were identified after checking for alias symbols (so these could not be used during the analysis). ## Enriched Term Diagrams For KEGG enrichment analyses, `visualize_terms()` can be used to generate KEGG pathway diagrams that are returned as a list of `ggraph` objects (using [`ggkegg`](https://github.com/noriakis/ggkegg)): ```{r KEGG_vis} input_processed <- input_processing(example_pathfindR_input) gg_list <- visualize_terms( result_df = example_pathfindR_output, input_processed = input_processed, is_KEGG_result = TRUE ) # this function returns a list of ggraph objects (named by Term ID) # save one of the plots as PDF image ggplot2::ggsave( "hsa04911_diagram.pdf", # path to output, format is determined by extension gg_list$hsa04911, # what to plot width = 5, # adjust width height = 5 # adjust height ) ``` Alternatively (i.e., for non-KEGG enrichment analyses), an interaction diagram per enriched term can also be generated via `visualize_terms()`. 
These diagrams are also returned as a list of `ggraph` objects: ```{r nonKEGG_viss} input_processed <- input_processing(example_pathfindR_input) gg_list <- visualize_terms( result_df = example_pathfindR_output, input_processed = input_processed, is_KEGG_result = FALSE, pin_name_path = "Biogrid" ) # this function returns a list of ggraph objects (named by Term ID) # save one of the plots as PDF image ggplot2::ggsave( "diabetic_cardiomyopathy_interactions.pdf", # path to output, format is determined by extension gg_list$hsa04911, # what to plot width = 10, # adjust width height = 6 # adjust height ) ``` # Clustering Enriched Terms The wrapper function `cluster_enriched_terms()` can be used to perform clustering of enriched terms and partitioning the terms into biologically-relevant groups. Clustering can be performed either via `hierarchical` or `fuzzy` method using the pairwise kappa statistics (a chance-corrected measure of co-occurrence between two sets of categorized data) matrix between all enriched terms. ## Hierarchical Clustering By default, `cluster_enriched_terms()` performs hierarchical clustering of the terms (using $1 - \kappa$ as the distance metric). Iterating over $2,3,...n$ clusters (where $n$ is the number of terms), `cluster_enriched_terms()` determines the optimal number of clusters by maximizing the average silhouette width, partitions the data into this optimal number of clusters and returns a data frame with cluster assignments. 
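
The idea behind this procedure can be sketched on toy data. The example below is only an illustration under stated assumptions: the term-by-gene membership matrix is made up, `cohen_kappa()` is a hypothetical helper implementing a plain Cohen's kappa on binary membership vectors, and average linkage is an arbitrary choice. It is not the code used by `cluster_enriched_terms()`; see `?create_kappa_matrix` and `?hierarchical_term_clustering` for the package's own implementation.

```r
# Toy term-by-gene membership matrix (hypothetical terms and genes)
membership <- rbind(
  termA = c(1, 1, 1, 1, 0, 0, 0, 0),
  termB = c(1, 1, 1, 0, 0, 0, 0, 0),
  termC = c(0, 0, 0, 0, 1, 1, 1, 1),
  termD = c(0, 0, 0, 0, 0, 1, 1, 1)
)
colnames(membership) <- paste0("g", 1:8)

# Hypothetical helper: Cohen's kappa between two binary membership vectors
cohen_kappa <- function(x, y) {
  observed <- mean(x == y)
  expected <- mean(x) * mean(y) + mean(1 - x) * mean(1 - y)
  (observed - expected) / (1 - expected)
}

# Pairwise kappa matrix between all terms
n_terms <- nrow(membership)
kappa_mat <- matrix(1, n_terms, n_terms,
  dimnames = list(rownames(membership), rownames(membership))
)
for (i in seq_len(n_terms)) {
  for (j in seq_len(n_terms)) {
    kappa_mat[i, j] <- cohen_kappa(membership[i, ], membership[j, ])
  }
}

# Hierarchical clustering using 1 - kappa as the distance
dist_mat <- stats::as.dist(1 - kappa_mat)
hc <- stats::hclust(dist_mat, method = "average")

# Pick the number of clusters that maximizes the average silhouette width.
# cluster::silhouette() is only defined for 2 <= k <= n - 1, so the sketch
# stops at n - 1 clusters.
candidate_k <- 2:(n_terms - 1)
avg_sil <- sapply(candidate_k, function(k) {
  assignment <- stats::cutree(hc, k = k)
  mean(cluster::silhouette(assignment, dist_mat)[, "sil_width"])
})
best_k <- candidate_k[which.max(avg_sil)]
clusters <- stats::cutree(hc, k = best_k)
```

On this toy input the two pairs of heavily overlapping terms end up in separate clusters, which is the behavior the silhouette criterion is meant to reward.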
```{r hierarchical0} example_pathfindR_output_clustered <- cluster_enriched_terms(example_pathfindR_output, plot_dend = FALSE, plot_clusters_graph = FALSE) ``` ```{r hierarchical1, eval=TRUE} ## First 2 rows of clustered data frame knitr::kable(head(example_pathfindR_output_clustered, 2)) ## The representative terms knitr::kable(example_pathfindR_output_clustered[example_pathfindR_output_clustered$Status == "Representative", ]) ``` After clustering, you may again plot the summary enrichment chart and display the enriched terms by clusters: ```{r hierarchical2, eval=TRUE} # plotting only selected clusters for better visualization selected_clusters <- subset(example_pathfindR_output_clustered, Cluster %in% 5:7) enrichment_chart(selected_clusters, plot_by_cluster = TRUE) ``` For details, see `?hierarchical_term_clustering` ## Heuristic Fuzzy Multiple-linkage Partitioning Alternatively, the `fuzzy` clustering method (as described by Huang et al.[^1]) can be used: ```{r fuzzy} clustered_fuzzy <- cluster_enriched_terms(example_pathfindR_output, method = "fuzzy") ``` For details, see `?fuzzy_term_clustering` # Aggregated Term Scores per Sample The function `score_terms()` can be used to calculate the agglomerated z score of each enriched term per sample. This allows the user to individually examine the scores and infer how a term is overall altered (activated or repressed) in a given sample or a group of samples. 
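
As a rough sketch of what a per-sample term score can look like, the example below standardizes each gene across samples and averages the resulting z scores over the genes of one term. The expression matrix, gene names and term membership are all made up, and the assumption that the term score is the mean of per-gene z scores should be checked against `?score_terms` before relying on it.

```r
# Hypothetical expression matrix: rows are genes, columns are samples
exp_mat <- matrix(
  c(
    1, 2, 3, 4,
    2, 2, 4, 4,
    5, 3, 1, 3
  ),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC"), c("S1", "S2", "S3", "S4"))
)

# z score of each gene across samples (scale() works column-wise, hence t())
gene_z <- t(scale(t(exp_mat)))

# Assumed term membership (illustration only)
term_genes <- c("geneA", "geneB")

# Term score per sample: mean z score over the term's genes
term_score <- colMeans(gene_z[term_genes, , drop = FALSE])
```

A positive score for a sample suggests that the term's genes are, on average, expressed above their across-sample mean in that sample, and a negative score suggests the opposite.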
```{r scores}
## Vector of "Case" IDs
cases <- c(
  "GSM389703", "GSM389704", "GSM389706", "GSM389708",
  "GSM389711", "GSM389714", "GSM389716", "GSM389717",
  "GSM389719", "GSM389721", "GSM389722", "GSM389724",
  "GSM389726", "GSM389727", "GSM389730", "GSM389731",
  "GSM389733", "GSM389735"
)

## Calculate scores for representative terms
## and plot heat map using term descriptions
representative_df <- example_pathfindR_output_clustered[example_pathfindR_output_clustered$Status == "Representative", ]
score_matrix <- score_terms(
  enrichment_table = representative_df,
  exp_mat = example_experiment_matrix,
  cases = cases,
  use_description = TRUE, # default FALSE
  label_samples = FALSE, # default = TRUE
  case_title = "RA", # default = "Case"
  control_title = "Healthy", # default = "Control"
  low = "#f7797d", # default = "green"
  mid = "#fffde4", # default = "black"
  high = "#1f4037" # default = "red"
)
```

# Comparison of 2 pathfindR Results

The function `combine_pathfindR_results()` allows combination of two pathfindR active-subnetwork-oriented enrichment analysis results for investigating common and distinct terms between the groups. Below is an example for comparing results using two different rheumatoid arthritis-related data sets (`example_pathfindR_output` and `example_comparison_output`).

```{r compare2res, eval=TRUE, fig.height=4, fig.width=8}
combined_df <- combine_pathfindR_results(
  result_A = example_pathfindR_output,
  result_B = example_comparison_output,
  plot_common = FALSE
)
```

For more details, see the vignette [Comparing Two pathfindR Results](comparing_results.html)

# Analysis with Custom Gene Sets

> As of v1.5, pathfindR offers utility functions for obtaining organism-specific PIN data and organism-specific gene sets data via `get_pin_file()` and `get_gene_sets_list()`, respectively.
See the vignette [Obtaining PIN and Gene Sets Data](obtain_data.html) for detailed information on how to gather PIN and gene sets data (for any organism of your choice) for use with pathfindR.

It is possible to use `run_pathfindR()` with custom gene sets (including gene sets for non-Homo-sapiens species). Here, we provide an example application of active-subnetwork-oriented enrichment analysis of the target genes of two transcription factors.

We first load and prepare the gene sets:

```{r custom_prep, eval=TRUE}
## CREB target genes
CREB_target_genes <- normalizePath(system.file("extdata/CREB.txt", package = "pathfindR"))
CREB_target_genes <- readLines(CREB_target_genes)[-c(1, 2)] # skip the first two lines

## MYC target genes
MYC_target_genes <- normalizePath(system.file("extdata/MYC.txt", package = "pathfindR"))
MYC_target_genes <- readLines(MYC_target_genes)[-c(1, 2)] # skip the first two lines

## Prep for use
custom_genes <- list(TF1 = CREB_target_genes, TF2 = MYC_target_genes)
custom_descriptions <- c(TF1 = "CREB target genes", TF2 = "MYC target genes")
```

We next prepare the example input data frame. Because of the way we choose genes, we expect significant enrichment for MYC targets (40 MYC target genes + 10 CREB target genes). Because this is only an example, we also assign each gene a random p value between 0.001 and 0.05.
```{r custom_input, eval=TRUE}
set.seed(123)

## Select 40 random genes from MYC gene sets and 10 from CREB gene sets
selected_genes <- sample(MYC_target_genes, 40)
selected_genes <- c(
  selected_genes,
  sample(CREB_target_genes, 10)
)

## Assign random p value between 0.001 and 0.05 for each selected gene
rand_p_vals <- sample(seq(0.001, 0.05, length.out = 5),
  size = length(selected_genes),
  replace = TRUE
)

example_pathfindR_input <- data.frame(
  Gene_symbol = selected_genes,
  p_val = rand_p_vals
)
knitr::kable(head(example_pathfindR_input))
```

Finally, we perform active-subnetwork-oriented enrichment analysis via `run_pathfindR()` using the custom genes as the gene sets:

```{r custom_run}
example_custom_genesets_result <- run_pathfindR(
  example_pathfindR_input,
  gene_sets = "Custom",
  custom_genes = custom_genes,
  custom_descriptions = custom_descriptions,
  min_gset_size = 1, # do not limit the gene set size for demo
  max_gset_size = Inf # do not limit the gene set size for demo
)

knitr::kable(example_custom_genesets_result)
```

```{r custom_result1, eval=TRUE, echo=FALSE}
knitr::kable(example_custom_genesets_result)
```

> It is also possible to run pathfindR using non-human organism annotation. See the vignette [pathfindR Analysis for non-Homo-sapiens organisms](non_hs_analysis.html)

[^1]: Huang DW, Sherman BT, Tan Q, et al. The DAVID Gene Functional Classification Tool: a novel biological module-centric algorithm to functionally analyze large gene lists. Genome Biol. 2007;8(9):R183.
[^2]: Ideker T, Ozier O, Schwikowski B, Siegel AF. Discovering regulatory and signalling circuits in molecular interaction networks. Bioinformatics. 2002;18 Suppl 1:S233-40.
[^3]: Ozisik O, Bakir-Gungor B, Diri B, Sezerman OU. Active Subnetwork GA: A Two Stage Genetic Algorithm Approach to Active Subnetwork Search. Current Bioinformatics. 2017; 12(4):320-8.
doi:10.2174/1574893611666160527100444

================================================
FILE: vignettes/manual_execution.Rmd
================================================
---
title: "Step-by-Step Execution of the pathfindR Enrichment Workflow"
author: "Ege Ulgen"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Step-by-Step Execution of the pathfindR Enrichment Workflow}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  eval = FALSE
)
```

This vignette walks through each step of the pathfindR active-subnetwork-oriented pathway enrichment analysis. For most purposes, the wrapper function `run_pathfindR()` can be used to perform this analysis from start to end. For users who wish to have further control over the enrichment workflow, this vignette will be more useful.

# Load the package and prepare the input data frame

We first need to load the package and the input data to be used for analysis. The input must be a data frame consisting of the following columns: `Gene Symbols`, `Change Values` (optional) and `p values`. The example data frame used in this vignette (`example_pathfindR_input`) is the dataset containing the differentially-expressed genes for the GEO dataset GSE15573 comparing 18 rheumatoid arthritis (RA) patients versus 15 healthy subjects.

```{r init_steps, eval=TRUE}
suppressPackageStartupMessages(library(pathfindR))
data(example_pathfindR_input)
head(example_pathfindR_input, 3)
```

# The protein-protein interaction network (PIN)

For the active subnetwork search process, we will need a protein-protein interaction network (PIN). pathfindR will map the input genes onto this PIN and identify active subnetworks which will then be used for enrichment analyses.
> An active subnetwork can be defined as a group of interconnected genes in a protein-protein interaction network (PIN) that predominantly consists of significantly altered genes. In other words, active subnetworks define distinct disease-associated sets of interacting genes, whether discovered through the original analysis or discovered because of being in interaction with a significant gene. The `pin_name_path` argument in all functions can be one of "Biogrid", "STRING", "GeneMania", "IntAct", "KEGG", "mmu_STRING" or it can be the path to a custom PIN file provided by the user. # Process input data We next need to process the input data for use in analysis via `input_processing()`: ```{r process} example_processed <- input_processing( input = example_pathfindR_input, # the input: in this case, differential expression results p_val_threshold = 0.05, # p value threshold to filter significant genes pin_name_path = "Biogrid", # the name of the PIN to use for active subnetwork search convert2alias = TRUE # boolean indicating whether or not to convert missing symbols to alias symbols in the PIN ) ``` > After checking that the data frame complies with the requirements, `input_processing()` filters the input so that genes with p values larger than `p_val_threshold` are excluded. Next, gene symbols that are not in the PIN are identified and excluded. For human genes, if aliases of these missing gene symbols are found in the PIN, these symbols are converted to the corresponding aliases (controlled by the argument `convert2alias`). This step is performed to best map the input data onto the PIN. 
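The two filtering steps described above can be sketched in base R as follows. This is a simplified illustration on toy data, not the implementation of `input_processing()`: the column names, the gene symbols and `toy_pin_genes` are hypothetical, and alias conversion is left out.

```r
# Toy input (hypothetical values) with gene symbols, change values and p values
toy_input <- data.frame(
  Gene_symbol = c("GENE_A", "GENE_B", "GENE_C", "GENE_D"),
  logFC = c(1.2, -0.8, 2.1, 0.3),
  adj_P = c(0.001, 0.20, 0.04, 0.03),
  stringsAsFactors = FALSE
)
# Hypothetical set of gene symbols present in the PIN
toy_pin_genes <- c("GENE_A", "GENE_C", "GENE_X")

# 1) exclude genes with p values larger than the threshold
p_val_threshold <- 0.05
filtered <- toy_input[toy_input$adj_P <= p_val_threshold, ]

# 2) exclude gene symbols that are not in the PIN
filtered <- filtered[filtered$Gene_symbol %in% toy_pin_genes, ]
filtered$Gene_symbol
```

In this toy case `GENE_D` passes the p value filter but is dropped in the second step; for human data this is the kind of gene that alias conversion (`convert2alias = TRUE`) may rescue.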
# Obtain Gene Set Data

We obtain the necessary gene sets for enrichment analyses using `fetch_gene_set()`:

```{r gene_set}
# using "BioCarta" as our gene sets for enrichment
biocarta_list <- fetch_gene_set(
  gene_sets = "BioCarta",
  min_gset_size = 10,
  max_gset_size = 300
)
biocarta_gsets <- biocarta_list[[1]]
biocarta_descriptions <- biocarta_list[[2]]
```

> The available gene sets in pathfindR are "KEGG", "Reactome", "BioCarta", "GO-All", "GO-BP", "GO-CC" and "GO-MF". If the user prefers to use another gene set source, the `gene_sets` argument should be set to `"Custom"` and the custom gene sets (list) and the custom gene set descriptions (named vector) should be supplied via the arguments `custom_genes` and `custom_descriptions`, respectively. See `?fetch_gene_set` for more details.

# Active Subnetwork Search and Enrichment Analyses

As outlined in the vignette [Introduction to pathfindR](intro_vignette.html), `run_pathfindR()` initially identifies and filters active subnetworks, then performs enrichment analyses on these subnetworks and summarizes the results.

To perform these steps manually, we utilize the function `active_snw_search()` for identifying and filtering active subnetworks and the function `enrichment_analyses()` for obtaining enriched terms using these subnetworks. Because the active subnetwork search algorithms are stochastic, we suggest iterating these subnetwork identification and enrichment steps multiple times (especially for "SA")[^1]:

[^1]: Here we are using a regular `for` loop. In the wrapper function `run_pathfindR()`, however, a parallel loop (via the package `foreach`) is used.
```{r snw_search} n_iter <- 10 ## number of iterations combined_res <- NULL ## to store the result of each iteration for (i in 1:n_iter) { ###### Active Subnetwork Search snws_file <- paste0("active_snws_", i) # Name of output file active_snws <- active_snw_search( input_for_search = example_processed, pin_name_path = "Biogrid", snws_file = snws_file, score_quan_thr = 0.8, # you may tweak these arguments for optimal filtering of subnetworks sig_gene_thr = 0.02, # you may tweak these arguments for optimal filtering of subnetworks search_method = "GR", # we suggest using GR seedForRandom = i # setting seed to ensure reproducibility per iteration ) ###### Enrichment Analyses current_res <- enrichment_analyses( snws = active_snws, sig_genes_vec = example_processed$GENE, pin_name_path = "Biogrid", genes_by_term = biocarta_gsets, term_descriptions = biocarta_descriptions, adj_method = "bonferroni", enrichment_threshold = 0.05, list_active_snw_genes = TRUE ) # listing the non-input active snw genes in output ###### Combine results via `rbind` combined_res <- rbind(combined_res, current_res) } ``` # Summary of Enrichment Results We next summarize the enrichment results (in `combined_res`) using `summarize_enrichment_results()` and annotate the involved significant (input) genes in each term using `annotate_term_genes()`. ```{r post_proc} ###### Summarize Combined Enrichment Results summarized_df <- summarize_enrichment_results(combined_res, list_active_snw_genes = TRUE ) ###### Annotate Affected Genes Involved in Each Enriched Term final_res <- annotate_term_genes( result_df = summarized_df, input_processed = example_processed, genes_by_term = biocarta_gsets ) ``` # Visualizations We can visualize each enriched term diagram using `visualize_terms()`. In this case, these will be graphs of interactions of pathway-involved genes for each pathway. See `?visualize_terms` for more details. 
```{r vis_pws}
visualize_terms(
  result_df = final_res,
  is_KEGG_result = FALSE, # boolean to indicate whether KEGG gene sets were used for enrichment analysis or not
  pin_name_path = "Biogrid"
)
```

We can also create a graphical summary of the top 10 enrichment results using `enrichment_chart()`:

```{r enr_chart}
enrichment_chart(final_res[1:10, ])
```

The x-axis corresponds to fold enrichment values while the y-axis indicates the enriched terms. The size of each bubble indicates the number of significant genes in the given enriched term. Color indicates the -log10(lowest-p) value. The closer the color is to red, the more significant the enrichment is.

================================================
FILE: vignettes/non_hs_analysis.Rmd
================================================
---
title: "pathfindR Analysis for non-Homo-sapiens organisms"
author: "Ege Ulgen"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{pathfindR Analysis for non-Homo-sapiens organisms}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 7, fig.height = 7, fig.align = "center",
  eval = FALSE
)
suppressPackageStartupMessages(library(pathfindR))
```

As mentioned in the vignette [Introduction to pathfindR](intro_vignette.html), enrichment analysis with pathfindR is not limited to the built-in data. The users are able to utilize custom protein-protein interaction networks (PINs) as well as custom gene sets. These abilities to use custom data naturally allow for performing pathfindR analysis on non-Homo-sapiens input data. In this vignette, we'll try to provide an overview of how pathfindR analysis using Mus musculus data can be performed.
# Preparation of Necessary Data

> As of v1.5, pathfindR offers utility functions for obtaining organism-specific PIN data and organism-specific gene sets data via `get_pin_file()` and `get_gene_sets_list()`, respectively. See the vignette [Obtaining PIN and Gene Sets Data](obtain_data.html) for detailed information on how to gather PIN and gene sets data (for any organism of your choice) for use with pathfindR.

For performing non-human active-subnetwork-oriented enrichment analysis, the user needs the following resources:

- organism-specific protein interaction network (PIN) data
- organism-specific gene sets data

After obtaining and processing these data for use, the user can run pathfindR using custom parameters.

> Important Note: Because the non-human organism-specific PIN will likely contain fewer interactions than the Homo sapiens PIN, pathfindR may yield fewer (or even no) enriched terms.

## Obtain Organism-specific Gene Sets

We can obtain the up-to-date M.musculus (KEGG identifier: mmu) KEGG Pathway Gene Sets using the function `get_gene_sets_list()`:

> If using another organism, all you have to do is to replace "mmu" with the KEGG organism code in the related arguments in this vignette.

```{r mmu_kegg}
gsets_list <- get_gene_sets_list(
  source = "KEGG",
  org_code = "mmu"
)
```

This returns a list containing 2 objects: `gene_sets`, containing the genes of each pathway, and `descriptions`, containing the description of each pathway.

The M.musculus KEGG gene set data `mmu_kegg_genes` and `mmu_kegg_descriptions` are already provided in pathfindR.
For other organisms, the user may wish to save the data as RDS files for future use:

```{r KEGG_save}
mmu_kegg_genes <- gsets_list$gene_sets
mmu_kegg_descriptions <- gsets_list$descriptions

## Save both as RDS files for later use
saveRDS(mmu_kegg_genes, "mmu_kegg_genes.RDS")
saveRDS(mmu_kegg_descriptions, "mmu_kegg_descriptions.RDS")
```

These can be later loaded via:

```{r KEGG_load}
mmu_kegg_genes <- readRDS("mmu_kegg_genes.RDS")
mmu_kegg_descriptions <- readRDS("mmu_kegg_descriptions.RDS")
```

> The function `get_gene_sets_list()` can also be used to obtain gene sets data from other sources. See the vignette [Obtaining PIN and Gene Sets Data](obtain_data.html) for more detail.

## Obtain Organism-specific Protein-protein Interaction Network

You may use the function `get_pin_file()` to obtain organism-specific BioGRID PIN data (see the vignette [Obtaining PIN and Gene Sets Data](obtain_data.html))

> Note that BioGRID PINs are smaller for non-H.sapiens organisms and this, in turn, results in fewer or no significantly enriched terms with pathfindR analysis.

Here, we demonstrate obtaining the organism-specific protein-protein interaction network (PIN) from [STRING](https://string-db.org/). You may choose the organism of your choice and find the PIN on the downloads page with the description "protein network data (scored links between proteins)". When processing, we recommend filtering the interactions using a link score threshold (e.g. 800).

Regardless of the resource, the raw PIN data should be processed to a SIF file, and each interactor should be specified by its gene symbol. The first 3 interactions from an example SIF file are provided below:

|         |   |        |
|:--------|:--|:-------|
|C2cd2    |pp |Ints2   |
|Apob     |pp |Gpt     |
|B4galnt1 |pp |Mettl1  |

Notice there are no headers and each line contains an interaction in the form `GeneA pp GeneB`, separated by tab (i.e. `\t`) with no row names and no column names.
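Before using a custom SIF file, it can be worth confirming that it has the layout described above. The following is a minimal sketch, assuming a tab-separated file without a header; the temporary file simply reuses the three example interactions, and the checks are our own suggestion rather than something pathfindR requires you to run.

```r
# Write the three example interactions to a temporary SIF file
sif_path <- tempfile(fileext = ".sif")
writeLines(
  c("C2cd2\tpp\tInts2", "Apob\tpp\tGpt", "B4galnt1\tpp\tMettl1"),
  sif_path
)

# Read it back: tab-separated, no header, no row names
sif <- read.delim(sif_path, header = FALSE, stringsAsFactors = FALSE)

# Basic sanity checks on the layout
has_three_columns <- ncol(sif) == 3
all_pp <- all(sif[[2]] == "pp") # interaction type column
no_self_interactions <- !any(sif[[1]] == sif[[3]])
```

If any of these checks is `FALSE` for your own file, the processing steps in the rest of this section show one way to bring the data into the expected form.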
Below we download and process the STRING PIN for use with pathfindR:

```{r process_PIN1}
## Downloading the STRING PIN file to tempdir
url <- "https://stringdb-static.org/download/protein.links.v11.0/10090.protein.links.v11.0.txt.gz"
path2file <- file.path(tempdir(check = TRUE), "STRING.txt.gz")
download.file(url, path2file)

## read STRING pin file
mmu_string_df <- read.table(path2file, header = TRUE)

## filter using combined_score cut-off value of 800
mmu_string_df <- mmu_string_df[mmu_string_df$combined_score >= 800, ]

## fix ids
mmu_string_pin <- data.frame(
  Interactor_A = sub("^10090\\.", "", mmu_string_df$protein1),
  Interactor_B = sub("^10090\\.", "", mmu_string_df$protein2)
)
head(mmu_string_pin, 2)
```

|Interactor_A       |Interactor_B       |
|:------------------|:------------------|
|ENSMUSP00000000001 |ENSMUSP00000017460 |
|ENSMUSP00000000001 |ENSMUSP00000039107 |

Since the interactors are Ensembl peptide IDs, we'll need to convert them to MGI symbols for use with pathfindR. This can be achieved via `biomaRt` or any other conversion method you prefer:

```{r process_PIN2, eval=FALSE}
library(biomaRt) # provides useMart() and getBM()

mmu_ensembl <- useMart("ensembl", dataset = "mmusculus_gene_ensembl")
converted <- getBM(
  attributes = c("ensembl_peptide_id", "mgi_symbol"),
  filters = "ensembl_peptide_id",
  values = unique(unlist(mmu_string_pin)),
  mart = mmu_ensembl
)
mmu_string_pin$Interactor_A <- converted$mgi_symbol[match(mmu_string_pin$Interactor_A, converted$ensembl_peptide_id)]
mmu_string_pin$Interactor_B <- converted$mgi_symbol[match(mmu_string_pin$Interactor_B, converted$ensembl_peptide_id)]
mmu_string_pin <- mmu_string_pin[!is.na(mmu_string_pin$Interactor_A) & !is.na(mmu_string_pin$Interactor_B), ]
mmu_string_pin <- mmu_string_pin[mmu_string_pin$Interactor_A != "" & mmu_string_pin$Interactor_B != "", ]

head(mmu_string_pin, 2)
```

| Interactor_A | Interactor_B |
|:------------:|:------------:|
|    Gnai3     |     Ppy      |
|    Gnai3     |     Ccr3     |

Next, we remove self interactions and any duplicated interactions, and format
the data frame as SIF:

```{r process_PIN3}
# remove self interactions
self_intr_cond <- mmu_string_pin$Interactor_A == mmu_string_pin$Interactor_B
mmu_string_pin <- mmu_string_pin[!self_intr_cond, ]

# remove duplicated interactions (including symmetric ones)
mmu_string_pin <- unique(t(apply(mmu_string_pin, 1, sort))) # this will return a matrix object

mmu_string_pin <- data.frame(
  A = mmu_string_pin[, 1],
  pp = "pp",
  B = mmu_string_pin[, 2]
)
```

Finally, we save the gene symbol PIN as a SIF file named "mmusculusPIN.sif" under the temporary directory (i.e. `tempdir()`):

```{r process_PIN4}
path2SIF <- file.path(tempdir(), "mmusculusPIN.sif")
write.table(mmu_string_pin,
  file = path2SIF,
  col.names = FALSE,
  row.names = FALSE,
  sep = "\t",
  quote = FALSE
)
path2SIF <- normalizePath(path2SIF)
```

We'll use this path to the custom sif for analysis with `run_pathfindR()`.

> The STRING Mus musculus PIN created above is available in pathfindR and can be used via setting `pin_name_path = "mmu_STRING"` in `run_pathfindR()`.

# Running pathfindR on non-Homo sapiens data

## Input Data

The data used in this vignette (`example_mmu_input`) is the data frame of differentially-expressed genes for the GEO dataset [GSE99393](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE99393). The RNA microarray experiment was performed to detail the global program of gene expression underlying polarization of myeloma-associated macrophages by CSF1R antibody treatment. The samples are 6 murine bone marrow derived macrophages co-cultured with myeloma cells (myeloma-associated macrophages), 3 of which were treated with CSF1R antibody (treatment group) and the rest were treated with control IgG antibody (control group). In `example_mmu_input`, 45 differentially-expressed genes with |logFC| >= 2 and FDR <= 0.05 are presented.
```{r mmu_input_df, eval=TRUE}
knitr::kable(head(example_mmu_input))
```

## Executing `run_pathfindR()`

After obtaining the necessary PIN and gene sets data, you can then perform pathfindR analysis by setting these arguments:

- `convert2alias = FALSE`: alias conversion only works on H.sapiens genes
- `pin_name_path = path2SIF`: as we're using a non-built-in PIN, we need to provide the path to the mmu sif file
- `gene_sets = "Custom"`: as we're using a non-built-in source for gene sets
- `custom_genes = mmu_kegg_genes`
- `custom_descriptions = mmu_kegg_descriptions`

```{r run}
example_mmu_output <- run_pathfindR(
  input = example_mmu_input,
  convert2alias = FALSE,
  gene_sets = "Custom",
  custom_genes = mmu_kegg_genes,
  custom_descriptions = mmu_kegg_descriptions,
  pin_name_path = path2SIF
)
```

```{r enr_chart, echo=FALSE, eval=TRUE}
enrichment_chart(example_mmu_output)
```

```{r output, eval=TRUE}
knitr::kable(example_mmu_output)
```

Because we used a very strict cut-off (|logFC| >= 2 and FDR <= 0.05), there were only 18 enriched KEGG pathways. However, the pathways identified here are significantly related to the pathways identified in the original publication by Wang et al.[^1].

[^1]: Wang Q, Lu Y, Li R, et al. Therapeutic effects of CSF1R-blocking antibodies in multiple myeloma. Leukemia. 2018;32(1):176-183.
## Built-in Mus musculus Data As aforementioned, for Mus musculus (only), we have provided the necessary PIN (`mmu_STRING`) and gene set data (`mmu_KEGG`) so you can also run: ```{r run2} example_mmu_output <- run_pathfindR( input = example_mmu_input, convert2alias = FALSE, gene_sets = "mmu_KEGG", pin_name_path = "mmu_STRING" ) ``` ================================================ FILE: vignettes/obtain_data.Rmd ================================================ --- title: "Obtaining PIN and Gene Sets Data" output: rmarkdown::html_vignette date: "`r Sys.Date()`" vignette: > %\VignetteIndexEntry{Obtaining PIN and Gene Sets Data} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", eval = FALSE ) ``` # Get PIN File For retrieving the PIN file for an organism of your choice, you may use the function `get_pin_file()`. As of this version, the only source for PIN data is "BioGRID". By default, the function downloads the PIN data from BioGRID and processes it, saves it in a temporary file and returns the path: ```{r} ## the default organism is "Homo_sapiens" path_to_pin_file <- get_pin_file() ``` You can retrieve the PIN data for the organism of your choice, by setting the `org` argument: ```{r} ## retrieving PIN data for "Gallus_gallus" path_to_pin_file <- get_pin_file(org = "Gallus_gallus") ``` You may also supply a `path/to/PIN/file` to save the PIN file for later use (in this case, the path you supply will be returned): ```{r} ## saving the "Homo_sapiens" PIN as "/path/to/PIN/file" path_to_pin_file <- get_pin_file(path2pin = "/path/to/PIN/file") ``` You may also retrieve a specific version of BioGRID via setting the `release` argument: ```{r} ## retrieving PIN data for "Mus_musculus" from BioGRID release 3.5.179 path_to_pin_file <- get_pin_file( org = "Mus_musculus", release = "3.5.179" ) ``` # Get Gene Sets List To retrieve organism-specific gene sets list, you may use the 
function `get_gene_sets_list()`. The available sources for gene sets are "KEGG", "Reactome" and "MSigDB". The function retrieves the gene sets data from the source and processes it into a list of two objects used by pathfindR for active-subnetwork-oriented enrichment analysis: 1. **gene_sets** A list containing the genes involved in each gene set 2. **descriptions** A named vector containing the descriptions for each gene set By default, `get_gene_sets_list()` obtains "KEGG" gene sets for "hsa". ## KEGG Pathway Gene Sets To obtain the gene sets list of the KEGG pathways for an organism of your choice, use the KEGG organism code for the selected organism. For a full list of all available organisms, see [here](https://www.genome.jp/kegg/catalog/org_list.html). ```{r} ## obtaining KEGG pathway gene sets for Rattus norvegicus (rno) gsets_list <- get_gene_sets_list(org_code = "rno") ``` ## Reactome Pathway Gene Sets For obtaining Reactome pathway gene sets, set the `source` argument to "Reactome". This downloads the most current Reactome pathways in gmt format and processes it into the list object that pathfindR uses: ```{r} gsets_list <- get_gene_sets_list(source = "Reactome") ``` For Reactome, there is only one collection of pathway gene sets. ## MSigDB Gene Sets Using `msigdbr`, `pathfindR` can retrieve all MSigDB gene sets. 
For this, set the `source` argument to "MSigDB" and the `collection` argument to the desired MSigDB collection (one of H, C1, C2, C3, C4, C5, C6, C7): ```{r} gsets_list <- get_gene_sets_list( source = "MSigDB", collection = "C2" ) ``` The default organism for MSigDB is "Homo sapiens", you may obtain the gene sets data for another organism by setting the `species` argument: ```{r} ## obtaining C5 gene sets data for "Drosophila melanogaster" gsets_list <- get_gene_sets_list( source = "MSigDB", species = "Drosophila melanogaster", collection = "C5" ) ``` ```{r, eval=TRUE} ## see msigdbr::msigdbr_species() for all available organisms msigdbr::msigdbr_species() ``` You may also obtain the gene sets for a subcollection by setting the `subcollection` argument: ```{r} ## obtaining C3 - MIR: microRNA targets gsets_list <- get_gene_sets_list( source = "MSigDB", collection = "C3", subcollection = "MIR" ) ``` ================================================ FILE: vignettes/visualization_vignette.Rmd ================================================ --- title: "Visualization of pathfindR Enrichment Results" output: rmarkdown::html_vignette date: "`r Sys.Date()`" vignette: > %\VignetteIndexEntry{Visualization of pathfindR Enrichment Results} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.width = 8, fig.height = 4, fig.align = "center" ) ``` ```{r setup} suppressPackageStartupMessages(library(pathfindR)) ``` `pathfindR` offers various functionality to visualize the enrichment results. In this vignette, I try to demonstrate these functionalities. ## `enrichment_chart()`: Bubble Chart of Enrichment Results `enrichment_chart` generates a bubble chart. The x-axis corresponds to fold enrichment values while the y-axis indicates the enriched terms. Size of the bubble indicates the number of significant genes in the given enriched term. Color indicates the -log10(lowest-p) value. 
The closer the color is to red, the more significant the enrichment is.

```{r enr_chart, eval=FALSE}
enrichment_chart(example_pathfindR_output)
```

By default, the bubble chart is generated for the top 10 terms. This can be controlled by the `top_terms` argument:

```{r enr_chart2, eval=FALSE}
## change top_terms
enrichment_chart(example_pathfindR_output, top_terms = 3)

## set null for displaying all terms
enrichment_chart(example_pathfindR_output, top_terms = NULL)
```

If the enrichment results were clustered, setting `plot_by_cluster = TRUE` groups the enriched terms by cluster:

```{r enr_chart3, fig.height=8, fig.width=8}
enrichment_chart(example_pathfindR_output_clustered, plot_by_cluster = TRUE)
```

See `?enrichment_chart` for more details.

## `visualize_terms()`: Enriched Term Diagrams

For KEGG enrichment analyses, `visualize_terms()` can be used to generate KEGG pathway diagrams that are returned as a list of `ggraph` objects (using [`ggkegg`](https://github.com/noriakis/ggkegg)):

```{r KEGG_vis, eval=FALSE}
input_processed <- input_processing(example_pathfindR_input)
gg_list <- visualize_terms(
  result_df = example_pathfindR_output,
  input_processed = input_processed,
  is_KEGG_result = TRUE
) # this function returns a list of ggraph objects (named by Term ID)

# save one of the plots as PDF image
ggplot2::ggsave(
  "hsa04911_diagram.pdf", # path to output, format is determined by extension
  gg_list$hsa04911, # what to plot
  width = 5, # adjust width
  height = 5 # adjust height
)
```

Alternatively (i.e., for other types of non-KEGG enrichment analyses), an interaction diagram per enriched term can be generated again via `visualize_terms()`.
These diagrams are also returned as a list of `ggraph` objects:

```{r nonKEGG_viss, eval=FALSE}
input_processed <- input_processing(example_pathfindR_input)
gg_list <- visualize_terms(
  result_df = example_pathfindR_output,
  input_processed = input_processed,
  is_KEGG_result = FALSE,
  pin_name_path = "Biogrid"
) # this function returns a list of ggraph objects (named by Term ID)

# save one of the plots as PDF image
ggplot2::ggsave(
  "diabetic_cardiomyopathy_interactions.pdf", # path to output, format is determined by extension
  gg_list$hsa04911, # what to plot
  width = 10, # adjust width
  height = 6 # adjust height
)
```

See `?visualize_terms` for more details.

## `term_gene_heatmap()`: Terms by Genes Heatmap

`term_gene_heatmap()` is used to create a heatmap where rows are enriched terms and columns are involved input genes. This heatmap allows visual identification of the input genes involved in the enriched terms, as well as the common or distinct genes between different terms.

```{r hmap}
term_gene_heatmap(example_pathfindR_output)
```

By default, the heatmap is generated for the top 10 terms. This can be controlled by the `num_terms` argument:

```{r hmap2, eval=FALSE}
term_gene_heatmap(example_pathfindR_output, num_terms = 3)

## set null for displaying all terms
term_gene_heatmap(example_pathfindR_output, num_terms = NULL)
```

By default, the term ids are used. For using full descriptions, set `use_description = TRUE`

```{r hmap3, eval=FALSE}
term_gene_heatmap(example_pathfindR_output, use_description = TRUE)
```

If the input data frame (same as in `run_pathfindR()`) is supplied, the tile colors indicate the change values:

```{r hmap4, eval=FALSE}
term_gene_heatmap(result_df = example_pathfindR_output, genes_df = example_pathfindR_input)
```

See `?term_gene_heatmap` for more details.
## `term_gene_graph()`: Term-Gene Graph The function `term_gene_graph()` (adapted from the Gene-Concept network visualization by the R package `enrichplot`) can be utilized to visualize which significant genes are involved in the enriched terms. The function creates the term-gene graph, displaying the connections between genes and biological terms (enriched pathways or gene sets). This allows for the investigation of multiple terms to which significant genes are related. The graph also enables determination of the degree of overlap between the enriched terms by identifying shared and/or distinct significant genes. By default, the function visualizes the term-gene graph for the top 10 enriched terms: ```{r term_gene1} term_gene_graph(example_pathfindR_output) ``` To plot all of the enriched terms in the enrichment results, set `num_terms = NULL` (not advised due to cluttered visualization): ```{r term_gene2, eval=FALSE} term_gene_graph(example_pathfindR_output, num_terms = NULL) ``` To plot using full term names (instead of IDs which is the default), set `use_description = TRUE`: ```{r term_gene3, eval=FALSE} term_gene_graph(example_pathfindR_output, num_terms = 3, use_description = TRUE) ``` By default the node sizes are plotted proportional to the number of genes a term contains (`num_genes`). To adjust node sizes using the $-log_{10}$(lowest p values), set `node_size = "p_val"`: ```{r term_gene4, eval=FALSE} term_gene_graph(example_pathfindR_output, num_terms = 3, node_size = "p_val") ``` See `?term_gene_graph` for more details. ## `UpSet_plot()`: UpSet Plots of Enriched Terms UpSet plots are plots of the intersections of sets as a matrix. `UpSet_plot()` creates a ggplot object of an UpSet plot where the x-axis is the UpSet plot of intersections of enriched terms. 
By default (`method = "heatmap"`), the main plot is a heatmap of genes at the corresponding intersections, colored by up/down regulation:

```{r upset1}
UpSet_plot(example_pathfindR_output)
```

If `genes_df` is provided, the heatmap tiles are colored by change values:

```{r upset2, eval=FALSE}
UpSet_plot(example_pathfindR_output, genes_df = example_pathfindR_input)
```

Again, you may change the number of top terms plotted via `num_terms` (default = 10):

```{r upset3, eval=FALSE}
UpSet_plot(example_pathfindR_output, num_terms = 5)
```

Again, to plot using full term names (instead of IDs which is the default), set `use_description = TRUE`:

```{r upset4, eval=FALSE}
UpSet_plot(example_pathfindR_output, use_description = TRUE)
```

If `method = "barplot"`, the main plot is a bar plot of the number of genes in the corresponding intersections:

```{r upset5, eval=FALSE}
UpSet_plot(example_pathfindR_output, method = "barplot")
```

If `method = "boxplot"` and if `genes_df` is provided, then the main plot displays the boxplots of change values of the genes within the corresponding intersections:

```{r upset6, eval=FALSE}
UpSet_plot(example_pathfindR_output, example_pathfindR_input, method = "boxplot")
```

See `?UpSet_plot` for more details.
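To clarify what an "intersection" means in an UpSet plot, the sketch below assigns each gene to the exact combination of terms it belongs to and counts the genes per combination. It is a base R illustration on hypothetical toy terms, independent of how `UpSet_plot()` is implemented.

```r
# Toy enriched terms and their genes (hypothetical)
toy_terms <- list(
  T1 = c("A", "B", "C"),
  T2 = c("B", "C", "D"),
  T3 = c("C", "E")
)

# gene-by-term membership matrix (TRUE if the gene is in the term)
all_genes <- sort(unique(unlist(toy_terms)))
membership <- sapply(toy_terms, function(term_genes) all_genes %in% term_genes)
rownames(membership) <- all_genes

# label each gene with the exact combination of terms containing it
combination <- apply(membership, 1, function(in_term) {
  paste(names(toy_terms)[in_term], collapse = "&")
})

# number of genes in each intersection (what the bar plot variant would show)
intersection_sizes <- table(combination)
```

Here gene `C` falls in the three-way intersection `T1&T2&T3`, whereas `A` is exclusive to `T1`; each gene is counted in exactly one combination.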