Repository: egeulgen/pathfindR
Branch: master
Commit: 7ce1330d6d16
Files: 138
Total size: 688.6 KB

Directory structure:
gitextract_46eozuid/

├── .Rbuildignore
├── .Rinstignore
├── .github/
│   ├── .gitignore
│   ├── ISSUE_TEMPLATE/
│   │   ├── bug_report.md
│   │   └── feature_request.md
│   └── workflows/
│       ├── R-CMD-check.yaml
│       ├── branch_naming_policy.yaml
│       ├── pkgdown.yaml
│       └── test-coverage.yaml
├── .gitignore
├── CODE_OF_CONDUCT.md
├── CONTRIBUTING.md
├── DESCRIPTION
├── LICENSE
├── LICENSE.md
├── NAMESPACE
├── NEWS.md
├── R/
│   ├── active_snw_search.R
│   ├── clustering.R
│   ├── comparison.R
│   ├── core.R
│   ├── data_generation.R
│   ├── enrichment.R
│   ├── pathfindr.R
│   ├── scoring.R
│   ├── utility.R
│   ├── visualization.R
│   └── zzz.R
├── README.Rmd
├── README.md
├── _pkgdown.yml
├── codecov.yml
├── cran-comments.md
├── inst/
│   ├── CITATION
│   ├── extdata/
│   │   ├── CREB.txt
│   │   ├── MYC.txt
│   │   └── resultActiveSubnetworkSearch.txt
│   ├── java/
│   │   └── ActiveSubnetworkSearch.jar
│   └── rmd/
│       ├── conversion_table.Rmd
│       ├── enriched_terms.Rmd
│       └── results.Rmd
├── java/
│   ├── ActiveSubnetworkSearchAlgorithms/
│   │   ├── ActiveSubnetworkSearch.java
│   │   ├── GAIndividual.java
│   │   ├── GeneticAlgorithm.java
│   │   ├── GreedySearch.java
│   │   └── SimulatedAnnealing.java
│   ├── ActiveSubnetworkSearchMisc/
│   │   ├── Gaussian.java
│   │   ├── ScoreCalculations.java
│   │   ├── Subnetwork.java
│   │   └── ZStatistics.java
│   ├── Application/
│   │   ├── AppActiveSubnetworkSearch.java
│   │   └── Parameters.java
│   ├── File/
│   │   ├── ExperimentFileReader.java
│   │   └── SIFReader.java
│   └── Network/
│       ├── Network.java
│       ├── Node.java
│       └── SubnetworkFinder.java
├── man/
│   ├── UpSet_plot.Rd
│   ├── active_snw_enrichment_wrapper.Rd
│   ├── active_snw_search.Rd
│   ├── annotate_term_genes.Rd
│   ├── check_java_version.Rd
│   ├── cluster_enriched_terms.Rd
│   ├── cluster_graph_vis.Rd
│   ├── color_kegg_pathway.Rd
│   ├── combine_pathfindR_results.Rd
│   ├── combined_results_graph.Rd
│   ├── configure_output_dir.Rd
│   ├── create_HTML_report.Rd
│   ├── create_kappa_matrix.Rd
│   ├── enrichment.Rd
│   ├── enrichment_analyses.Rd
│   ├── enrichment_chart.Rd
│   ├── fetch_gene_set.Rd
│   ├── fetch_java_version.Rd
│   ├── filterActiveSnws.Rd
│   ├── fuzzy_term_clustering.Rd
│   ├── get_biogrid_pin.Rd
│   ├── get_gene_sets_list.Rd
│   ├── get_kegg_gsets.Rd
│   ├── get_mgsigdb_gsets.Rd
│   ├── get_pin_file.Rd
│   ├── get_reactome_gsets.Rd
│   ├── gset_list_from_gmt.Rd
│   ├── hierarchical_term_clustering.Rd
│   ├── hyperg_test.Rd
│   ├── input_processing.Rd
│   ├── input_testing.Rd
│   ├── isColor.Rd
│   ├── pathfindr.Rd
│   ├── plot_scores.Rd
│   ├── process_pin.Rd
│   ├── return_pin_path.Rd
│   ├── run_pathfindr.Rd
│   ├── safe_get_content.Rd
│   ├── score_terms.Rd
│   ├── single_iter_wrapper.Rd
│   ├── summarize_enrichment_results.Rd
│   ├── term_gene_graph.Rd
│   ├── term_gene_heatmap.Rd
│   ├── visualize_KEGG_diagram.Rd
│   ├── visualize_active_subnetworks.Rd
│   ├── visualize_term_interactions.Rd
│   └── visualize_terms.Rd
├── renv/
│   ├── .gitignore
│   ├── activate.R
│   └── settings.json
├── revdep/
│   ├── .gitignore
│   ├── email.yml
│   └── failures.md
├── slides/
│   └── cost_charme_school/
│       └── demo_script.R
├── tests/
│   ├── testthat/
│   │   ├── test-active_snw_search.R
│   │   ├── test-clustering.R
│   │   ├── test-comparison.R
│   │   ├── test-core.R
│   │   ├── test-data_generation.R
│   │   ├── test-enrichment.R
│   │   ├── test-scoring.R
│   │   ├── test-utility.R
│   │   ├── test-visualization.R
│   │   └── test-zzz.R
│   ├── testthat-active_snw.R
│   ├── testthat-clustering.R
│   ├── testthat-comparison.R
│   ├── testthat-core.R
│   ├── testthat-data_generation.R
│   ├── testthat-enrichment.R
│   ├── testthat-scoring.R
│   ├── testthat-utility.R
│   ├── testthat-visualization.R
│   └── testthat-zzz.R
└── vignettes/
    ├── .gitignore
    ├── comparing_results.Rmd
    ├── intro_vignette.Rmd
    ├── manual_execution.Rmd
    ├── non_hs_analysis.Rmd
    ├── obtain_data.Rmd
    └── visualization_vignette.Rmd

================================================
FILE CONTENTS
================================================

================================================
FILE: .Rbuildignore
================================================
^renv$
^renv\.lock$
^slides$
^CODE_OF_CONDUCT\.md$
^CONTRIBUTING.md$
^\.github$
^Meta$
^doc$
^.*\.Rprofile$
^.*\.Rproj$
^\.Rproj\.user$
^data-raw$
^misc$
^README.md$
^\.travis\.yml$
^cran-comments\.md$
^CRAN-RELEASE$
^Dockerfile_dev$
^codecov\.yml$
^LICENSE\.md$
^README\.Rmd$
^docs$
^_pkgdown\.yml$
^pkgdown$
^revdep$
^CRAN-SUBMISSION$


================================================
FILE: .Rinstignore
================================================
^slides$
^java$
^misc_data$


================================================
FILE: .github/.gitignore
================================================
*.html


================================================
FILE: .github/ISSUE_TEMPLATE/bug_report.md
================================================
---
name: Bug report
about: Create a report to help us improve
title: ''
labels: 'bug'
assignees: ''

---

**Describe the bug**
A clear and concise description of what the bug is.

**To Reproduce**
Steps to reproduce the behavior:
1. Prepare input as '...'
2. Run the following function: '....'
3. See error

**Expected behavior**
A clear and concise description of what you expected to happen.

**Screenshots**
If applicable, add screenshots to help explain your problem.

**Desktop (please complete the following information):**
 - OS: [e.g. macOS, Windows, Linux]
 - Version [e.g. 10.14.5]

** R Session Information:**
Please provide the R session information (by running `sessionInfo()`)

**Additional context**
Add any other context about the problem here. While pathfindR is an R package, the active subnetwork search functionality is written in Java. If you suspect any issue regarding java please provide your Java version (by running `java --version`)


================================================
FILE: .github/ISSUE_TEMPLATE/feature_request.md
================================================
---
name: Feature request
about: Suggest an idea for this project
title: ''
labels: 'enhancement'
assignees: ''

---

**Is your feature request related to a problem? Please describe.**
A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]

**Describe the solution you'd like**
A clear and concise description of what you want to happen.

**Describe alternatives you've considered**
A clear and concise description of any alternative solutions or features you've considered.

**Additional context**
Add any other context or screenshots about the feature request here.


================================================
FILE: .github/workflows/R-CMD-check.yaml
================================================
# Workflow derived from https://github.com/r-lib/actions/tree/v2/examples
# Need help debugging build failures? Start at https://github.com/r-lib/actions#where-to-find-help
on:
  push:
    branches:
      - master
  pull_request:
    branches:
      - master

name: R-CMD-check

jobs:
  R-CMD-check:
    runs-on: ${{ matrix.config.os }}

    name: ${{ matrix.config.os }} (${{ matrix.config.r }})

    strategy:
      fail-fast: false
      matrix:
        config:
          - {os: macos-latest,   r: 'release'}
          - {os: windows-latest, r: 'release'}
          - {os: ubuntu-latest,   r: 'devel', http-user-agent: 'release'}
          - {os: ubuntu-latest,   r: 'release'}
          - {os: ubuntu-latest,   r: 'oldrel-1'}

    env:
      GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }}
      R_KEEP_PKG_SOURCE: yes

    steps:
      - uses: actions/checkout@v4

      - uses: r-lib/actions/setup-pandoc@v2

      - uses: r-lib/actions/setup-r@v2
        with:
          r-version: ${{ matrix.config.r }}
          http-user-agent: ${{ matrix.config.http-user-agent }}
          use-public-rspm: true

      - uses: r-lib/actions/setup-r-dependencies@v2
        with:
          extra-packages: any::rcmdcheck
          needs: check

      - uses: r-lib/actions/check-r-package@v2
        with:
          upload-snapshots: true
          build_args: 'c("--no-manual","--compact-vignettes=gs+qpdf")'


================================================
FILE: .github/workflows/branch_naming_policy.yaml
================================================
name: Branch Naming Policy Action

on:
  create:
  delete:
  pull_request:
    branches:
      - master

jobs:
  branch-naming-policy:
    runs-on: ubuntu-latest

    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Run Branch Naming Policy Action
        uses: nicklegan/github-repo-branch-naming-policy-action@v1.1.1
        if: github.ref_type == 'branch' || github.ref_type == 'pull_request'
        with:
          token: ${{ secrets.REPO_TOKEN }}
          regex: '^(feature|fix|docs|refactor|test|release|chore|experiment)\/[a-zA-Z0-9-]+$'
          delete: true


================================================
FILE: .github/workflows/pkgdown.yaml
================================================
# Workflow derived from https://github.com/r-lib/actions/tree/v2/examples
# Need help debugging build failures? Start at https://github.com/r-lib/actions#where-to-find-help
on:
  push:
    branches: [main, master]
  pull_request:
    branches: [main, master]
  release:
    types: [published]
  workflow_dispatch:

name: pkgdown

jobs:
  pkgdown:
    runs-on: ubuntu-latest
    # Only restrict concurrency for non-PR jobs
    concurrency:
      group: pkgdown-${{ github.event_name != 'pull_request' || github.run_id }}
    env:
      GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }}
    permissions:
      contents: write
    steps:
      - uses: actions/checkout@v4

      - uses: r-lib/actions/setup-pandoc@v2

      - uses: r-lib/actions/setup-r@v2
        with:
          use-public-rspm: true

      - uses: r-lib/actions/setup-r-dependencies@v2
        with:
          extra-packages: any::pkgdown, local::.
          needs: website

      - name: Build site
        run: pkgdown::build_site_github_pages(new_process = FALSE, install = FALSE)
        shell: Rscript {0}

      - name: Deploy to GitHub pages 🚀
        if: github.event_name != 'pull_request'
        uses: JamesIves/github-pages-deploy-action@v4.5.0
        with:
          clean: false
          branch: gh-pages
          folder: docs


================================================
FILE: .github/workflows/test-coverage.yaml
================================================
# Workflow derived from https://github.com/r-lib/actions/tree/v2/examples
# Need help debugging build failures? Start at https://github.com/r-lib/actions#where-to-find-help
on:
  push:
    branches: [main, master]
  pull_request:
    branches: [main, master]

name: test-coverage

jobs:
  test-coverage:
    runs-on: ubuntu-latest
    env:
      GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }}

    steps:
      - uses: actions/checkout@v4

      - uses: r-lib/actions/setup-r@v2
        with:
          use-public-rspm: true

      - uses: r-lib/actions/setup-r-dependencies@v2
        with:
          extra-packages: any::covr
          needs: coverage

      - name: Test coverage
        run: |
          covr::codecov(
            quiet = FALSE,
            clean = FALSE,
            install_path = file.path(normalizePath(Sys.getenv("RUNNER_TEMP"), winslash = "/"), "package")
          )
        shell: Rscript {0}

      - name: Show testthat output
        if: always()
        run: |
          ## --------------------------------------------------------------------
          find '${{ runner.temp }}/package' -name 'testthat.Rout*' -exec cat '{}' \; || true
        shell: bash

      - name: Upload test results
        if: failure()
        uses: actions/upload-artifact@v4
        with:
          name: coverage-test-failures
          path: ${{ runner.temp }}/package


================================================
FILE: .gitignore
================================================
Meta
doc
inst/doc
misc
data-raw
*.pptx

.Rprofile

*.DS_Store
*.Rproj
*.RData
*.Ruserdata
*.Rproj.user
*.Rhistory
.Rproj.user

docs


================================================
FILE: CODE_OF_CONDUCT.md
================================================
# Contributor Covenant Code of Conduct

## Our Pledge

In the interest of fostering an open and welcoming environment, we as
contributors and maintainers pledge to making participation in our project and
our community a harassment-free experience for everyone, regardless of age, body
size, disability, ethnicity, sex characteristics, gender identity and expression,
level of experience, education, socio-economic status, nationality, personal
appearance, race, religion, or sexual identity and orientation.

## Our Standards

Examples of behavior that contributes to creating a positive environment
include:

* Using welcoming and inclusive language
* Being respectful of differing viewpoints and experiences
* Gracefully accepting constructive criticism
* Focusing on what is best for the community
* Showing empathy towards other community members

Examples of unacceptable behavior by participants include:

* The use of sexualized language or imagery and unwelcome sexual attention or
 advances
* Trolling, insulting/derogatory comments, and personal or political attacks
* Public or private harassment
* Publishing others' private information, such as a physical or electronic
 address, without explicit permission
* Other conduct which could reasonably be considered inappropriate in a
 professional setting

## Our Responsibilities

Project maintainers are responsible for clarifying the standards of acceptable
behavior and are expected to take appropriate and fair corrective action in
response to any instances of unacceptable behavior.

Project maintainers have the right and responsibility to remove, edit, or
reject comments, commits, code, wiki edits, issues, and other contributions
that are not aligned to this Code of Conduct, or to ban temporarily or
permanently any contributor for other behaviors that they deem inappropriate,
threatening, offensive, or harmful.

## Scope

This Code of Conduct applies both within project spaces and in public spaces
when an individual is representing the project or its community. Examples of
representing a project or community include using an official project e-mail
address, posting via an official social media account, or acting as an appointed
representative at an online or offline event. Representation of a project may be
further defined and clarified by project maintainers.

## Enforcement

Instances of abusive, harassing, or otherwise unacceptable behavior may be
reported by contacting the project team at egeulgen@gmail.com. All
complaints will be reviewed and investigated and will result in a response that
is deemed necessary and appropriate to the circumstances. The project team is
obligated to maintain confidentiality with regard to the reporter of an incident.
Further details of specific enforcement policies may be posted separately.

Project maintainers who do not follow or enforce the Code of Conduct in good
faith may face temporary or permanent repercussions as determined by other
members of the project's leadership.

## Attribution

This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4,
available at https://www.contributor-covenant.org/version/1/4/code-of-conduct.html

[homepage]: https://www.contributor-covenant.org

For answers to common questions about this code of conduct, see
https://www.contributor-covenant.org/faq


================================================
FILE: CONTRIBUTING.md
================================================
# Contributing to pathfindR development

The goal of this guide is to help you in contributing to pathfindR. The guide is divided into two main pieces:

1. Filing a bug report or feature request in an issue.
1. Suggesting a change via a pull request.

Please note that pathfindR is released with a [Contributor Code of Conduct](CODE_OF_CONDUCT.md). By contributing to this project, 
you agree to abide by its terms.

## Issues

When filing an issue, the most important thing is to include a minimal 
reproducible example so that we can quickly verify the problem, and then figure 
out how to fix it. There are three things you need to include to make your 
example reproducible: required packages, data, code.

1.  **Packages** should be loaded at the top of the script, so it's easy to
    see which ones the example needs.
  
1.  The easiest way to include **data** is to use `dput()` to generate the R code 
    to recreate it. For example, to recreate the `mtcars` dataset in R,
    I'd perform the following steps:
  
       1. Run `dput(mtcars)` in R
       2. Copy the output
       3. In my reproducible script, type `mtcars <- ` then paste.
       
    But even better is if you can create a `data.frame()` with just a handful
    of rows and columns that still illustrates the problem.
    
    For more complex **data**, you can use `saveRDS()` to save the object and attach it with the issue.
  
1.  Spend a little bit of time ensuring that your **code** is easy for others to
    read:
  
    * make sure you've used spaces and your variable names are concise, but
      informative
  
    * use comments to indicate where your problem lies
  
    * do your best to remove everything that is not related to the problem.  
     The shorter your code is, the easier it is to understand.

You can check you have actually made a reproducible example by starting up a 
fresh R session and pasting your script in.

## Pull requests

To contribute a change to pathfindR, you follow these steps:

1. Create a branch in git and make your changes.
1. Push branch to github and issue pull request (PR).
1. Discuss the pull request.
1. Iterate until either we accept the PR or decide that it's not
   a good fit for pathfindR.

If you're not familiar with git or github, please start by reading <http://r-pkgs.had.co.nz/git.html>

## Branch Naming Conventions

We want to follow the branch following naming convention during development:

### Feature Development:
- Use the prefix `feature/` followed by a brief description of the feature.
- Example: `feature/add-new-method`, `feature/update-active-snw-search`

### Bug Fixes:
- Use the prefix `fix/` followed by a description of the fix or the issue number.
- Example: `fix/correct-typo`, `fix/#123`

### Documentation:
- Use the prefix `docs/` for updates exclusively in the documentation.
- Example: `docs/update-readme`, `docs/add-examples`

### Refactoring:
- Use `refactor/` when modifying the structure and organization of code without changing its external behavior.
- Example: `refactor/reorganize-tests`, `refactor/optimization-code`

### Testing:
- Use `test/` for changes related to testing only.
- Example: `test/add-unit-tests`, `test/expand-tests`

### Releases (for maintainers only):
- Use `release/` for preparing a new version release.
- Example: `release/v1.0.0`, `release/v2.0.0`

### Chore/Maintenance (mostly for maintainers):
- Use `chore/` for mundane tasks like updating dependencies or minor tasks that don't modify the source code.
- Example: `chore/update-packages`, `chore/license-update`

### Experimental:
- Use `experiment/` for experimental work that might not be merged into the `master`
- Example: `experiment/new-algorithm`, `exp/test-new-library`

# Attribution
This Contributing guide was adapted from [ggplot2](https://github.com/tidyverse/ggplot2)


================================================
FILE: DESCRIPTION
================================================
Package: pathfindR
Type: Package
Title: Enrichment Analysis Utilizing Active Subnetworks
Version: 2.7.0.9000
Authors@R: c(person("Ege", "Ulgen",
                    role = c("cre", "cph"), 
                    email = "egeulgen@gmail.com",
                    comment = c(ORCID = "0000-0003-2090-3621")),
             person("Ozan", "Ozisik",
                    role = "aut",
                    email = "ozanytu@gmail.com",
                    comment = c(ORCID = "0000-0001-5980-8002")))
Maintainer: Ege Ulgen <egeulgen@gmail.com>
Description: Enrichment analysis enables researchers to uncover mechanisms 
    underlying a phenotype. However, conventional methods for enrichment 
    analysis do not take into account protein-protein interaction information, 
    resulting in incomplete conclusions. 'pathfindR' is a tool for enrichment 
    analysis utilizing active subnetworks. The main function identifies active 
    subnetworks in a protein-protein interaction network using a user-provided 
    list of genes and associated p values. It then performs enrichment analyses 
    on the identified subnetworks, identifying enriched terms (i.e. pathways or, 
    more broadly, gene sets) that possibly underlie the phenotype of interest.
    'pathfindR' also offers functionalities to cluster the enriched terms and 
    identify representative terms in each cluster, to score the enriched terms 
    per sample and to visualize analysis results. The enrichment, clustering and 
    other methods implemented in 'pathfindR' are described in detail in 
    Ulgen E, Ozisik O, Sezerman OU. 2019. 'pathfindR': An R Package for 
    Comprehensive Identification of Enriched Pathways in Omics Data Through 
    Active Subnetworks. Front. Genet. <doi:10.3389/fgene.2019.00858>.
License: MIT + file LICENSE
URL: https://egeulgen.github.io/pathfindR/, https://github.com/egeulgen/pathfindR
BugReports: https://github.com/egeulgen/pathfindR/issues
Encoding: UTF-8
LazyData: true
SystemRequirements: Java (>= 8.0)
biocViews:
Imports: 
    DBI,
    AnnotationDbi,
    doParallel,
    foreach,
    rmarkdown,
    ggplot2,
    ggraph,
    ggupset,
    fpc,
    ggkegg (>= 1.4.0),
    grDevices,
    httr,
    igraph,
    R.utils,
    msigdbr (>= 24.1.0),
    knitr
Depends: R (>= 4.3.0),
    pathfindR.data (>= 2.0)
Suggests: 
    org.Hs.eg.db,
    testthat (>= 2.3.2),
    covr,
    mockery
RoxygenNote: 7.3.3
VignetteBuilder: knitr


================================================
FILE: LICENSE
================================================
YEAR: 2020
COPYRIGHT HOLDER: Ege Ulgen


================================================
FILE: LICENSE.md
================================================
# MIT License

Copyright (c) 2020 Ege Ulgen

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.


================================================
FILE: NAMESPACE
================================================
# Generated by roxygen2: do not edit by hand

export(UpSet_plot)
export(active_snw_search)
export(annotate_term_genes)
export(cluster_enriched_terms)
export(cluster_graph_vis)
export(combine_pathfindR_results)
export(combined_results_graph)
export(create_kappa_matrix)
export(enrichment)
export(enrichment_analyses)
export(enrichment_chart)
export(fetch_gene_set)
export(filterActiveSnws)
export(fuzzy_term_clustering)
export(get_gene_sets_list)
export(get_pin_file)
export(hierarchical_term_clustering)
export(hyperg_test)
export(input_processing)
export(input_testing)
export(plot_scores)
export(return_pin_path)
export(run_pathfindR)
export(score_terms)
export(summarize_enrichment_results)
export(term_gene_graph)
export(term_gene_heatmap)
export(visualize_KEGG_diagram)
export(visualize_active_subnetworks)
export(visualize_term_interactions)
export(visualize_terms)
import(doParallel)
import(foreach)
import(ggplot2)
import(ggraph)
import(graphics)
import(knitr)
import(parallel)
import(pathfindR.data)
import(rmarkdown)
importFrom(ggkegg,pathway)
importFrom(httr,GET)
importFrom(httr,content)
importFrom(httr,http_error)
importFrom(httr,status_code)
importFrom(httr,timeout)


================================================
FILE: NEWS.md
================================================
# pathfindR (development version)

# pathfindR 2.7.0
## Minor Changes and Bug Fixes
- Moved org.Hs.eg.db from "Imports" to "Suggests" per new CRAN policy. Relevant functions revert to default behaviour if the required package is not installed.

# pathfindR 2.6.0

## Minor Changes and Bug Fixes
- fixed missing argument issue in `get_gene_sets_list`(#230)
- refactored to introduce `safe_get_content` so that URL access issues are handled more gracefully

# pathfindR 2.5.1

## Minor Changes and Bug Fixes
- fixed NA values in kappa matrix generation that will cause error as part of the latest `igraph` update (#227)

# pathfindR 2.5.0

## Major Changes
- updated dependencies so that `pathfindR` depends on `msigdbr (>= 24.1.0)`
- added the new `db_species` argument to the `get_mgsigdb_gsets()` data generation function

## Minor Changes and Bug Fixes
- fixed test assertions that will break as part of the latest `ggplot2` update (#223)

# pathfindR 2.4.2

## Minor Changes and Bug Fixes

- fixed a bug in `visualize_KEGG_diagram()` where `ggkegg` was raising an error (#214)


# pathfindR 2.4.1

## Minor Changes and Bug Fixes

- fixed a bug regarding KEGG gene set fetching: removed the conversion functionality in `get_kegg_gsets()` which now returns KEGG IDs so that the user can convert the returned identifiers using a more appropriate tool (e.g. BioMart) should they wish

# pathfindR 2.4.0

## Major Changes

- implemented a new `color_kegg_pathway()` function using `ggkegg` to create colored KEGG pathway ggplot objects (instead of using `KEGGREST` to obtain the colored PNG files, which no longer works #169)
- renamed the `visualize_hsa_KEGG` function to `visualize_KEGG_diagram()` to reflect this is now able to handle KEGG pathway enrichment results from any organism
- updated the `visualize_terms()`, `visualize_term_interactions()` and `visualize_KEGG_diagram()` functions so that they now return a list of ggplot objects (named by term ID)
- updated the `get_kegg_gsets()` function to also use `ggkegg` for fetching genes per pathway data
- removed unneeded dependencies: `magick`, `KEGGgraph` and `KEGGREST`

## Minor Changes and Bug Fixes

- updated the `get_biogrid_pin()` function so that it can now determine the latest version and download/process it from BioGRID (via setting `release = "latest"`, which is now the default behavior) 

# pathfindR 2.3.1

## Minor Changes and Bug Fixes
- fixed a bug in the `UpSet_plot()` plot function regarding the interaction with `ggupset` package that was discovered in a reverse dependency check for `ggplot2 3.5.0` (#189)
- fixed gene symbol case mismatch issue in `score_terms()` (#186)
- applied enhancement suggestion from #184 to enable scale fill manual for `term_gene_graph()`

# pathfindR 2.3.0
## Major Changes
- reverted removal of `create_HTML_report()` so `run_pathfindR()` once again generates HTML reports

# pathfindR 2.2.0
## Minor Changes and Bug Fixes
- added the `disable_parallel` argument in `active_snw_enrichment_wrapper()` to be able to disable parallel runs via `foreach` 
- fixed the issue encountered on CentOS where `forech` wasn't loading `pathfindR` (#164)
- fixed a CRAN error due to a package documentation issue (#172)
- performed some refactoring and updated/improved all tests

# pathfindR 2.1.0
## Minor Changes and Bug Fixes
- removed `create_HTML_report()` so `run_pathfindR()` no longer generates a HTML report

# pathfindR 2.0.1

## Minor Changes and Bug Fixes
- added the `dir_for_report` argument in the internal function `create_HTML_report()` to fix test issues on CRAN

# pathfindR 2.0.0

## Major Changes
- updated the java active subnetwork search component and added the `seedForRandom` argument in `active_snw_search()`to ensure reproducibility. By default behavior, in `run_pathfindR()`, a seed is set for each iteration to produce reproducible results (#108)
- as the example input/output data were renamed for convenience in 'pathfindR.data' v2.0, 'pathfindR' now depends on pathfindR.data (>= 2.0) 
- refactored/simplified `run_pathfindR()`
- visualization enriched term diagrams are now NOT part of `run_pathfindR()`
- default behavior of `run_pathfindR()` is now to run in a temporary directory. The user can still set `output_dir` to run in a specified directory and also produce HTML reports
- in `hierarchical_term_clustering()`, update the sequence of number of clusters for which silhouette width is calculated for choosing the optimal number of clusters. This should speed up the function for cases with a large number of enriched terms
- updated the relevant vignettes to reflect the implemented changes

## Minor Changes and Bug Fixes
- fixed a minor issue in `return_pin_path()` where the PIN was not properly read (#157)

# pathfindR 1.6.4

## Minor Changes and Bug Fixes
- updated the alias selection function within `input_processing()` so that an alias that is not already present is selected
- updated the min-max scaling (controlled by `scale_vals`) in `color_kegg_pathway()`, the default is now `scale_vals=TRUE`
- updated the `term_gene_heatmap()` function so that legend title is shown and can be customized
- updated the `term_gene_heatmap()` function so that coloring is proper when no change values are provided in `genes_df`
- added the `sort_terms_by_p` argument to the `term_gene_heatmap()` function to enable sorting of terms by 'lowest_p'
- in visualization functions, made coloring of up-/down-regulated genes consistent (#126)
- added the `vertex.label.cex` and `vertex.size.scaling` arguments to `cluster_graph_vis()`
- added the `show_legend` argument to `visualize_term_interactions()` to toggle the legend


# pathfindR 1.6.3

## Minor Changes and Bug Fixes
- Fixed coloring issue in `color_kegg_pathway()`
- In `color_kegg_pathway()` the default value for `normalize_vals` is now `FALSE`


# pathfindR 1.6.2

## Major Changes
- fixed an issue in `get_kegg_gsets()` where empty result was returned for some organisms due to an error in parsing (#72)

## Minor Changes and Bug Fixes
- added `repel = TRUE` in `term_gene_graph()` and `combined_results_graph()` for better visualization of labels
- fixed minor issue in `enrichment_chart()` (#75)
- fixed minor issue in `visualize_term_interactions()`
- fixed issue in `get_biogrid_pin()` where the download method was set to `wget` (now set to `auto`, per #83)
- updated to using tab3 format for `get_biogrid_pin()` (if tab3 is available for the chosen release, otherwise tab2 format is used)
- updated the default version of PIN obtained by `get_biogrid_pin()` to '4.4.200'
- in `get_kegg_gsets()`, improved parsing of KEGG term descriptions so that no description is duplicated (#87)
- in `score_terms()`, if using descriptions, the ID is now appended for (any) duplicated term descriptions (#87)
- in `obtain_colored_url()`, swapped `bg_color` with `fg_color` due to an issue with `KEGGREST`
- added legend to `term_gene_heatmap()` (#95)
- in `get_biogrid_pin()`, the "download.file.method" from global options is used
- `combined_results_graph()` raises an error if there are no common terms in the combined data frame

# pathfindR 1.6.1

## Major Changes

- In `run_pathfindR()`, the default `iterations` was set back to 10 (the default for all other v1.x)

# pathfindR 1.6.0

## Major Changes
- In `run_pathfindR()`, as "GR" (the default active subnetwork search method) provides nearly identical results in each iteration, the default `iterations` is set to 1
- added the column 'support' (the proportion of active subnetworks leading to enrichment over all subnetworks) in the output
- updated the download URL in `get_biogrid_pin()` as BioGRID updated the URL for download

## Minor Changes and Bug Fixes
- changed old argument in the "Step-by-Step Execution of the pathfindR Enrichment Workflow" vignette
- fixed an issue in `visualize_term_interactions()` where the file name was too long, it was causing an error on Windows. Limited to 100 characters (#58)

# pathfindR 1.5.1

## Minor Changes and Bug Fixes
- Fixed issue in `check_java_version()` where java version 14 could not be parsed (#49)
- Fixed issue in `combined_results_graph()` where gene nodes were not colored correctly (#55)

# pathfindR 1.5.0

## Major Changes
- created separate package `pathfindR.data` for storing pathfindR data
- added the function `visualize_active_subnetworks()` for visualizing graphs of active subnetworks
- add the new vignette "Comparing Two pathfindR Results" that briefly describes how different pathfindR results can be compared
- added the functions `combine_pathfindR_results()` and `combined_results_graph()` for comparison of 2 pathfindR results and term-gene graph of the combined results, respectively
- added the function `get_pin_file()` for obtaining organism-specific PIN data (only from BioGRID for now)
- added the function `get_gene_sets_list()` for obtaining organism-specific gene sets list from KEGG, Reactome and MSigDB
- added the function `term_gene_heatmap()` to create heatmap visualizations of enriched terms and the involved input genes. Rows are enriched terms and columns are involved input genes. If `genes_df` is provided, colors of the tiles indicate the change values
- added the function `UpSet_plot()` to create UpSet plots of enriched terms
- added the human cell markers gene sets data `cell_markers_gsets` and `cell_markers_descriptions`

## Minor Changes and Bug Fixes
- fixed an issue regarding `parallel::makeCluster()` in `run_pathfindR()` (#45)
- fixed save-related issue in `download_kegg_png()` (#37, @rix133)
- added the output data `RA_comparison_output` of pathfindR results on another RA-related dataset (GSE84074)
- in `visualize_hsa_KEGG()`, fixed the issue where >1 entrez ids were returned for a gene symbol (the first one is kept)
- in `visualize_hsa_KEGG()`, implemented a tryCatch to avoid any issues when `KEGGREST::color.pathway.by.objects()` might fail (#28)
- in `visualize_hsa_KEGG()`, now limiting the number of genes passes onto `KEGGREST::color.pathway.by.objects()` to < 60 (because the KEGG API now limits the number?)
- changed default visualization in `term_gene_heatmap()` (i.e. when `genes_df` is not provided) to binary colored heatmap (by default, "green" and "red", controlled by `low` and `high`) by up-/down- regulation status
- update the vignette "pathfindR Analysis for non-Homo-sapiens organisms" to reflect new data generation functions `get_pin_file()` and `get_gene_sets_list()` and fixed a minor issue in the vignette (#46)

# pathfindR 1.4.2

## Minor Changes and Bug Fixes
- Fixed corner case in `create_kappa_matrix()` when `chance` is 1, the metric is turned into 0
- Fixed misused `class(.) == *` in `cluster_graph_vis()`

# pathfindR 1.4.1

## Major Changes
- Fixed error in DESCRIPTION: the Java version in SystemRequirements was corrected to "Java (>= 8.0)"
- The Java version is now checked

## Minor Changes and Bug Fixes
- Fixed behavior: when no input genes are present in the enriched hsa KEGG pathway, visualization of the pathway is now skipped
- Added the argument `max_to_plot` to `visualize_hsa_KEGG()` and to `run_pathfindR()`. This argument controls the number of pathways to be visualized (default is NULL, i.e. no filter). This was implemented not to slow down the runtime of `run_pathfindR()` as downloading the png files is slow.
- Fixed links to visualizations in `enriched_ters.Rmd`

# pathfindR 1.4.0

## Major Changes
- Replaced most occurrences of "pathway" to "term". This was adapted because "term" reflects the utility of the package better. The enrichment and clustering approaches work with any kind of gene set data (be it pathway gene sets, gene ontology gene sets, motif gene sets etc.) Accordingly:
  - `DESCRIPTION` was updated
  - The functions `annotate_pathway_DEGs()`, `calculate_pw_scores()`, `cluster_pathways()`, `fuzzy_pw_clustering()`, `hierarchical_pw_clustering()`, `visualize_pw_interactions()` and `visualize_pws()` were renamed to 
  `annotate_term_DEGs()`, `score_terms()`, `cluster_enriched_terms()`, `fuzzy_term_clustering()`, `hierarchical_term_clustering()`, `visualize_term_interactions()` and `visualize_terms()` respectively
  - The Rmd template file for the report `enriched_pathways.Rmd` was renamed to `enriched_terms.Rmd`
  - All the Rmd template files for the report were updated
  - Documentation of each function was updated accordingly
- Added the visualization function `term_gene_graph()`, which creates a graph of enriched terms - involved genes
- Made changes in `enrichment()` and `enrichment_analyses()` to get enrichment results faster
- Added the function `fetch_gene_set()` for obtaining gene set data more easily
- Terms in gene sets can now be filtered according to the number of genes a term contains (controlled by `min_gset_size`, `max_gset_size` in `fetch_gene_set()` and `run_pathfindR()`) 
- Added the argument `gaCrossover` during active subnetwork search which controls the probability of a crossover in GA (default = 1, i.e. always perform crossover)
- Added unit tests using `testthat`
- Updated all gene sets data
- Updated all RA example data
- The vignettes were updated
- Updated all PIN data
- Improved speed of kappa matrix calculation (`create_kappa_matrix()`)
- Added vignette for non-Homo-sapiens organisms
- Added Mus musculus (mmu) data:
  - `mmu_kegg_genes` & `mmu_kegg_descriptions`: mmu KEGG gene sets data
  - mmu STRING PIN
  - `myeloma_input` & `myeloma_output`: example mmu input and output data
- Added the STRING PIN (combined score >= 400)
- The argument `sig_gene_thr` in subnetwork filtering via `filterActiveSnws()` now serves the threshold proportion of significant genes in the active subnetwork. e.g., if there are 100 significant genes and `sig_gene_thr = 0.03`, subnetwork that contain at least 3 (100 x 0.03) significant genes will be accepted for further analysis
- Removed `pathview` dependency by implementing colored pathway diagram visualization function using `KEGGREST` and `KEGGgraph`

## Minor Changes and Bug Fixes
- In `hierarchical_term_clustering()`, redefined the distance measure as `1 - kappa statistic`
- Fixed minor issue in `cluster_graph_vis()` (during the calculations for additional node colors)
- Removed title from graph visualization of hierarchical clustering in `cluster_graph_vis()`
- In `active_snw_search()`, unnecessary warnings during active subnetwork search were removed
- Fixed minor issue in `enrichment_chart()`, supplying fuzzy clustered results no longer raises an error
- Added new checks in `input_testing()` and `input_processing()` to ensure that both the initial input data frame and the processed input data frame for active subnetwork search contain at least 2 genes (to fix the corner case encountered in issue #17)
- Fixed minor issue in `enrichment_chart()`, ensuring that bubble sizes displayed in the legend (proportional to # of DEGs) are integers
- In `enrichment_chart()`, added the arguments `num_bubbles` (default is 4) to control number of bubbles displayed in the legend and `even_breaks` (default is `TRUE`) to indicate if even increments of breaks are required
- Updated the logo
- Minor fix in `term_gene_graph()` (create the igraph object as an undirected graph for better auto layout)
- Minor fix in `visualize_term_interactions()`. The legend no longer displays "Non-input Active Snw. Genes" if they were not provided
- The argument `human_genes` in `run_pathfindR()` and `input_processing()` was renamed as `convert2alias`
- The gene symbols in the input data frame, the PIN and the gene sets are now turned into uppercase (for obtaining the best overlap)
- Added the argument `top_terms` to `enrichment_chart()`, controlling the number top enriched terms to plot (default is 10)
- Other minor bug/error fixes

# pathfindR 1.3.0

## Major Changes
- Separated the steps of the function `run_pathfindR` into individual functions: `active_snw_search`, `enrichment_analyses`, `summarize_enrichment_results`, `annotate_pathway_DEGs`, `visualize_pws`.
- renamed the function `pathmap` as `visualize_hsa_KEGG`, updated the function to produce different visualizations for inputs with binary change values (ordered) and no change values (the `input_processing` function, assigns a change value of 100 to all).
- Created new the visualization function `visualize_pw_interactions`, which creates PNG files visualizing the interactions (in the selected PIN) of genes involved in the given pathways.
- Added new vignette, describing the step-by-step execution of the pathfindR workflow
- Changed clustering metric to kappa statistic, created the new clustering related functions `create_kappa_matrix`, `hierarchical_pw_clustering`, `fuzzy_pw_clustering` and `cluster_pathways`.
- Implemented the new function `cluster_graph_vis` for visualizing graph diagrams of clustering results.

## Minor Changes and Bug Fixes
- Fixed the bug where the arguments `score_quan_thr` and `sig_gene_thr` for `run_pathfindR` were not being utilized.
- in `run_pathfindR`, added message at the end of run, reporting the number enriched pathways.
- the function `run_pathfindR` now creates a variable `org_dir` that is the "path/to/original/working/directory". `org_dir` is used in multiple functions to return to the original working directory if anything fails. This changes the previous behavior where if a function stopped with an error the directory was changed to "..", i.e. the parent directory. This change was adapted so that the user is returned to the original working directory if they supply a recursive output folder (`output_dir`, e.g. "./ALL_RESULTS/RESULT_A"). 
- in `input_processing`, added the argument `human_genes` to only perform alias symbol conversion when human gene symbols are provided. - Updated the Rmd files used to create the report HTML files
- Added the data for `GO-All`, all annotations in the GO database (BP+MF+CC)
- Updated the vignette `pathfindR - An R Package for Pathway Enrichment Analysis Utilizing Active Subnetworks` to reflect the new functionalities.

# pathfindR 1.2.3
## Minor Changes and Bug Fixes
- in the function `plot_scores`, added the argument `label_cases` to indicate whether or not to label the cases in the pathway scoring heatmap plot. Also added the argument `case_control_titles` which allows the user to change the default "Case" and "Control" headers. Also added the arguments `low` and `high` used to change the low and high end colors of the scoring color gradient.
- in the function `plot_scores`, reversed the color gradient to match the coloring scheme used by pathview (i.e. red for positive values, green for negative values)
- minor change in `parseActiveSnwSearch`, replaced `score_thr` by `score_quan_thr`. This was done so that the scoring filter for active subnetworks could be performed based on the distribution of the current active subnetworks and not using a constant empirical score value threshold.
- minor change in `parseActiveSnwSearch`, increased `sig_gene_thr` from 2 to 10 as we observed in most of the cases, this resulted in faster runs with comparable results.
- in `choose_clusters`, added the argument `p_val_threshold` to be used as p value threshold for filtering the enriched pathways prior to clustering.

# pathfindR 1.2.2

## Major Changes
- fixed issue related to the package `pathview`.
## Minor Changes and Bug Fixes
- in the function `choose_clusters`, added option to use pathway names instead of pathway ids when visualizing the clustering dendrogram and heatmap.

# pathfindR 1.2.1

## Major Changes
- Added the option to specify a custom gene set when using `run_pathfindR`. For this, the `gene_sets` argument should be set to "Custom" and `custom_genes` and `custom_pathways` should be provided.

## Minor Changes and Bug Fixes
- fixed minor bug in `calculate_pw_scores` where if there was one DEG, subsetting the experiment matrix failed
- added if condition to check if there were DEGs in `calculate_pw_scores`. If there is none, the pathway is skipped.
- in `calculate_pw_scores`, if `cases` are provided, the pathways are reordered before plotting the heat map and returning the matrix according to their activity in `cases`. This way, "up" pathways are grouped together, same for "down" pathways.
- in `calculate_pwd`, if a pathway has perfect overlap with other pathways, change the correlation value with 1 instead of NA.
- in `choose_clusters`, if `result_df` has less than 3 pathways, do not perform clustering.
- `run_pathfindR` checks whether the output directory (`output_dir`) already exists and if it exists, now appends "(1)" to `output_dir` and displays a warning message. This was implemented to prevent writing over existing results.
- in run `run_pathfindR`, recursive creation for the output directory (`output_dir`) is now supported.
- in run `run_pathfindR`, if no pathways are found, the function returns an empty data frame instead of raising an error.

# pathfindR 1.2

## Major Changes
- Implemented the (per subject) pathway scoring function `calculate_pw_scores` and the function to plot the heatmap of pathway scores per subject `plot_scores`.

- Added the `auto` parameter to `choose_clusters`. When `auto == TRUE` (default), the function chooses the optimal number of clusters `k` automatically, as the value which maximizes the average silhouette width. It then returns a data frame with the cluster assignments and the representative/member statuses of each pathway.

- Added the `Fold_Enrichment` column to the resulting data frame of `enrichment`, and as a corollary to the resulting data frame of `run_pathfindR`.

- Added the option `bubble` to plot a bubble chart displaying the enrichment results in `run_pathfindR` using the helper function `enrichment_chart`. To plot the bubble chart set `bubble = TRUE` in `run_pathfindR` or use `enrichment_chart(your_result_df)`. 

## Minor Changes and Bug Fixes
- Add the parameter `silent_option` to `run_pathfindR`. When `silent_option == TRUE` (default), the console outputs during active subnetwork search are printed to a file named "console_out.txt". If `silent_option == FALSE`, the output is printed on the screen. Default was set to `TRUE` because multiple console outputs are simultaneously printed when running in parallel.

- Added the `list_active_snw_genes` parameter to `run_pathfindR`. When `list_active_snw_genes == TRUE`, the function adds the column `non_DEG_Active_Snw_Genes`, which reports the non-DEG active subnetwork genes for the active subnetwork which was enriched for the given pathway with the lowest p value.

- Added the data `RA_clustered`, which is the example output of the clustering workflow.

- In the function, `run_pathfindR` added the option to specify the argument `output_dir` which specifies the directory to be created under the current working directory for storing the result HTML files. `output_dir` is "pathfindR_Results" by default.

- `run_pathfindR` now checks whether the output directory (`output_dir`) already exists and if it exists, stops and displays an error message. This was implemented to prevent writing over existing results.

- `genes_table.html` now contains a second table displaying the input gene symbols for which there were no interactions in the PIN.

# pathfindR 1.1

## Major changes
- Added the `gene_sets` option in `run_pathfindR` to chose between different gene sets. Available gene sets are `KEGG`, `Reactome`, `BioCarta` and Gene Ontology gene sets (`GO-BP`, `GO-CC` and `GO-MF`)
- `cluster_pathways` automatically recognizes the ID type and chooses the gene sets accordingly

## Minor Changes and Bug Fixes
- Fixed issue regarding p values < 1e-13. No active subnetworks were found when there were p values < 1e-13. These are now changed to 1e-13 in the function `input_processing`
- In `input_processing`, genes for which no interactions are found in the PIN are now removed before active subnetwork search
- Duplicated gene symbols no longer raise an error. If there are duplicated symbols, the lowest p value is chosen for each gene symbol in the function `input_processing`
- To prevent the formation of nested folders, by default and on errors, the function `run_pathfindR` returns to the user's working directory.
- Citation information are now provided for our BioRxiv pre-print


================================================
FILE: R/active_snw_search.R
================================================
#' Perform Active Subnetwork Search
#'
#' @param input_for_search input the input data that active subnetwork search uses. The input
#' must be a data frame containing at least these 2 columns: \describe{
#'   \item{GENE}{Gene Symbol}
#'   \item{P_VALUE}{p value obtained through a test, e.g. differential expression/methylation}
#' }
#' @inheritParams return_pin_path
#' @param snws_file name for active subnetwork search output data
#' \strong{without file extension} (default = 'active_snws')
#' @param dir_for_parallel_run (previously created) directory for a parallel run iteration.
#' Used in the wrapper function (see ?run_pathfindR) (Default = NULL)
#' @inheritParams filterActiveSnws
#' @param search_method algorithm to use when performing active subnetwork
#'  search. Options are greedy search (GR), simulated annealing (SA) or genetic
#'  algorithm (GA) for the search (default = 'GR').
#' @param seedForRandom seed for reproducibility while running the java modules (applies for GR and SA)
#' @param silent_option boolean value indicating whether to print the messages
#' to the console (FALSE) or not (TRUE, this will print to a temp. file) during
#' active subnetwork search (default = TRUE). This option was added because
#' during parallel runs, the console messages get disorderly printed.
#' @param use_all_positives if TRUE: in GA, adds an individual with all positive
#'  nodes. In SA, initializes candidate solution with all positive nodes. (default = FALSE)
#' @param geneInitProbs For SA and GA, probability of adding a gene in initial solution (default = 0.1)
#' @param saTemp0 Initial temperature for SA (default = 1.0)
#' @param saTemp1 Final temperature for SA (default = 0.01)
#' @param saIter Iteration number for SA (default = 10000)
#' @param gaPop Population size for GA (default = 400)
#' @param gaIter Iteration number for GA (default = 200)
#' @param gaThread Number of threads to be used in GA (default = 5)
#' @param gaCrossover Applies crossover with the given probability in GA (default = 1, i.e. always perform crossover)
#' @param gaMut For GA, applies mutation with given mutation rate (default = 0, i.e. mutation off)
#' @param grMaxDepth Sets max depth in greedy search, 0 for no limit (default = 1)
#' @param grSearchDepth Search depth in greedy search (default = 1)
#' @param grOverlap Overlap threshold for results of greedy search (default = 0.5)
#' @param grSubNum Number of subnetworks to be presented in the results (default = 1000)
#'
#' @return A list of genes in every identified active subnetwork that has a score greater than
#' the `score_quan_thr`th quantile and that has at least `sig_gene_thr` affected genes.
#'
#' @export
#'
#' @examples
#' \donttest{
#' processed_df <- example_pathfindR_input[1:15, -2]
#' colnames(processed_df) <- c('GENE', 'P_VALUE')
#' GR_snws <- active_snw_search(
#'   input_for_search = processed_df,
#'   pin_name_path = 'KEGG',
#'   search_method = 'GR',
#'   score_quan_thr = 0.8
#' )
#' # clean-up
#' unlink('active_snw_search', recursive = TRUE)
#' }
active_snw_search <- function(input_for_search, pin_name_path = "Biogrid", snws_file = "active_snws",
    dir_for_parallel_run = NULL, score_quan_thr = 0.8, sig_gene_thr = 0.02, search_method = "GR",
    seedForRandom = 1234, silent_option = TRUE, use_all_positives = FALSE, geneInitProbs = 0.1,
    saTemp0 = 1, saTemp1 = 0.01, saIter = 10000, gaPop = 400, gaIter = 10000, gaThread = 5,
    gaCrossover = 1, gaMut = 0, grMaxDepth = 1, grSearchDepth = 1, grOverlap = 0.5,
    grSubNum = 1000) {
    ############ Argument checks input_for_search
    if (!is.data.frame(input_for_search)) {
        stop("`input_for_search` should be data frame")
    }
    cnames <- c("GENE", "P_VALUE")
    if (any(!cnames %in% colnames(input_for_search))) {
        stop("`input_for_search` should contain the columns ", paste(dQuote(cnames),
            collapse = ","))
    }

    # pin_name_path (fetch pin path)
    pin_path <- return_pin_path(pin_name_path)

    # snws_file
    if (!suppressWarnings(file.create(file.path(tempdir(check = TRUE), snws_file)))) {
        stop("`snws_file` may be containing forbidden characters. Please change and try again")
    }

    # search_method
    valid_mets <- c("GR", "SA", "GA")
    if (!search_method %in% valid_mets) {
        stop("`search_method` should be one of ", paste(dQuote(valid_mets), collapse = ", "))
    }

    # silent_option
    if (!is.logical(silent_option)) {
        stop("`silent_option` should be either TRUE or FALSE")
    }

    # use_all_positives
    if (!is.logical(use_all_positives)) {
        stop("`use_all_positives` should be either TRUE or FALSE")
    }

    ############ Initial Steps If dir_for_parallel_run is provided, change
    ############ working dir to dir_for_parallel_run
    if (!is.null(dir_for_parallel_run)) {
        org_dir <- getwd()
        on.exit(setwd(org_dir))
        setwd(dir_for_parallel_run)
    }

    ## turn silent_option into shell argument
    tmp_out <- file.path(tempdir(check = TRUE), paste0("console_out_", snws_file,
        ".txt"))
    silent_option <- ifelse(silent_option, paste0(" > ", tmp_out), "")

    ## turn use_all_positives into the java argument
    use_all_positives <- ifelse(use_all_positives, " -useAllPositives", "")

    ## absolute path for active snw search jar
    active_search_jar_path <- system.file("java/ActiveSubnetworkSearch.jar", package = "pathfindR")

    ## create directory for active subnetworks
    if (!dir.exists("active_snw_search")) {
        dir.create("active_snw_search")
    }

    if (!file.exists("active_snw_search/input_for_search.txt")) {
        input_for_search$GENE <- base::toupper(input_for_search$GENE)
        utils::write.table(input_for_search[, c("GENE", "P_VALUE")], "active_snw_search/input_for_search.txt",
            col.names = FALSE, row.names = FALSE, quote = FALSE, sep = "\t")
    }

    input_path <- normalizePath("active_snw_search/input_for_search.txt")

    ############ Run active Subnetwork Search running Active Subnetwork Search
    system(paste0("java -Xss4m -jar \"", active_search_jar_path, "\"", " -sif=\"",
        pin_path, "\"", " -sig=\"", input_path, "\"", " -method=", search_method,
        " -seedForRandom=", seedForRandom, use_all_positives, " -saTemp0=", saTemp0,
        " -saTemp1=", saTemp1, " -saIter=", format(saIter, scientific = FALSE), " -geneInitProb=",
        geneInitProbs, " -gaPop=", gaPop, " -gaIter=", gaIter, " -gaThread=", gaThread,
        " -gaCrossover=", gaCrossover, " -gaMut=", gaMut, " -grMaxDepth=", grMaxDepth,
        " -grSearchDepth=", grSearchDepth, " -grOverlap=", grOverlap, " -grSubNum=",
        grSubNum, silent_option))

    snws_file <- file.path("active_snw_search", paste0(snws_file, ".txt"))
    file.rename(from = "resultActiveSubnetworkSearch.txt", to = snws_file)

    ############ Parse and filter active subnetworks
    filtered_snws <- filterActiveSnws(active_snw_path = snws_file, sig_genes_vec = input_for_search$GENE,
        score_quan_thr = score_quan_thr, sig_gene_thr = sig_gene_thr)

    if (is.null(filtered_snws)) {
        snws <- list()
    } else {
        snws <- filtered_snws$subnetworks
    }
    message(paste0("Found ", length(snws), " active subnetworks\n\n"))

    return(snws)
}

#' Parse Active Subnetwork Search Output File and Filter the Subnetworks
#'
#' @param active_snw_path path to the output of an Active Subnetwork Search
#' @param sig_genes_vec vector of significant gene symbols. In the scope of this
#'   package, these are the input genes that were used for active subnetwork search
#' @param score_quan_thr active subnetwork score quantile threshold. Must be
#' between 0 and 1 or set to -1 for not filtering. (Default = 0.8)
#' @param sig_gene_thr threshold for the minimum proportion of significant genes in
#' the subnetwork (Default = 0.02) If the number of genes to use as threshold is
#' calculated to be < 2 (e.g. 50 signif. genes x 0.01 = 0.5), the threshold number
#' is set to 2
#'
#' @return A list containing \code{subnetworks}: a list of of genes in every
#' active subnetwork that has a score greater than the \code{score_quan_thr}th
#' quantile and that contains at least \code{sig_gene_thr} of significant genes
#' and \code{scores} the score of each filtered active subnetwork
#' @export
#'
#' @seealso See \code{\link{run_pathfindR}} for the wrapper function of the
#'   pathfindR enrichment workflow
#'
#' @examples
#' path2snw_list <- system.file(
#'   'extdata/resultActiveSubnetworkSearch.txt',
#'   package = 'pathfindR'
#' )
#' filtered <- filterActiveSnws(
#'   active_snw_path = path2snw_list,
#'   sig_genes_vec = example_pathfindR_input$Gene.symbol
#' )
filterActiveSnws <- function(active_snw_path, sig_genes_vec, score_quan_thr = 0.8,
    sig_gene_thr = 0.02) {
    ## Arg. checks
    active_snw_path <- suppressWarnings(normalizePath(active_snw_path))

    if (!file.exists(active_snw_path)) {
        stop("The active subnetwork file does not exist! Check the `active_snw_path` argument")
    }

    if (!is.atomic(sig_genes_vec)) {
        stop("`sig_genes_vec` should be a vector")
    }

    if (!is.numeric(score_quan_thr)) {
        stop("`score_quan_thr` should be numeric")
    }
    if (score_quan_thr != -1 & (score_quan_thr > 1 | score_quan_thr < 0)) {
        stop("`score_quan_thr` should be in [0, 1] or -1 (if not filtering)")
    }

    if (!is.numeric(sig_gene_thr)) {
        stop("`sig_gene_thr` should be numeric")
    }
    if (sig_gene_thr < 0 | sig_gene_thr > 1) {
        stop("`sig_gene_thr` should be in [0, 1]")
    }

    output <- readLines(active_snw_path)

    if (length(output) == 0) {
        return(NULL)
    }

    score_vec <- c()
    subnetworks <- list()
    for (i in base::seq_len(length(output))) {
        snw <- output[[i]]

        snw <- unlist(strsplit(snw, "\\s"))

        score_vec <- c(score_vec, as.numeric(snw[1]))
        subnetworks[[i]] <- snw[-1]
    }

    # keep subnetworks with score over the 'score_quan_thr'th quantile
    if (score_quan_thr == -1) {
        score_thr <- min(score_vec) - 1
    } else {
        score_thr <- stats::quantile(score_vec, score_quan_thr)
    }
    cond <- as.numeric(score_vec) > as.numeric(score_thr)
    subnetworks <- subnetworks[cond]
    score_vec <- as.numeric(score_vec)[cond]

    # select subnetworks containing at least 'sig_gene_thr' of significant
    # genes
    snw_sig_counts <- vapply(subnetworks, function(snw_genes) {
        sum(base::toupper(snw_genes) %in% base::toupper(sig_genes_vec))
    }, 1)
    sig_gene_num_thr <- sig_gene_thr * length(sig_genes_vec)
    sig_gene_num_thr <- max(2, sig_gene_num_thr)
    cond <- (snw_sig_counts >= sig_gene_num_thr)
    subnetworks <- subnetworks[cond]
    score_vec <- score_vec[cond]

    return(list(subnetworks = subnetworks, scores = score_vec))
}

#' Visualize Active Subnetworks
#'
#' @inheritParams filterActiveSnws
#' @inheritParams term_gene_heatmap
#' @inheritParams return_pin_path
#' @param num_snws number of top subnetworks to be visualized (leave blank if
#' you want to visualize all subnetworks)
#' @inheritParams term_gene_graph
#' @param ... additional arguments for \code{\link{input_processing}}
#'
#' @return a list of ggplot objects of graph visualizations of identified active
#' subnetworks. Green nodes are down-regulated genes, reds are up-regulated genes
#' and yellows are non-input genes
#' @export
#'
#' @examples
#' path2snw_list <- system.file(
#'   'extdata/resultActiveSubnetworkSearch.txt',
#'   package = 'pathfindR'
#' )
#' # visualize top 2 active subnetworks
#' g_list <- visualize_active_subnetworks(
#'   active_snw_path = path2snw_list,
#'   genes_df = example_pathfindR_input[1:10, ],
#'   pin_name_path = 'KEGG',
#'   num_snws = 2
#' )
visualize_active_subnetworks <- function(active_snw_path, genes_df, pin_name_path = "Biogrid",
    num_snws, layout = "stress", score_quan_thr = 0.8, sig_gene_thr = 0.02, ...) {
    # process input data frame
    processed_input <- input_processing(genes_df, pin_name_path = pin_name_path,
        ...)

    # parse and filter active subnetworks
    active_snw_list <- filterActiveSnws(active_snw_path = active_snw_path, sig_genes_vec = processed_input$GENE,
        score_quan_thr = score_quan_thr, sig_gene_thr = sig_gene_thr)
    if (is.null(active_snw_list) | length(active_snw_list$scores) == 0) {
        return(NULL)
    }

    score_vec <- active_snw_list$scores
    subnetworks <- active_snw_list$subnetworks

    if (missing(num_snws)) {
        num_snws <- length(subnetworks)
    }

    if (num_snws > length(subnetworks)) {
        num_snws <- length(subnetworks)
    }

    # load PIN data load PIN
    pin_path <- return_pin_path(pin_name_path)
    pin <- utils::read.delim(file = pin_path, header = FALSE)
    pin$V2 <- NULL

    pin[, 1] <- base::toupper(pin[, 1])
    pin[, 2] <- base::toupper(pin[, 2])

    # create graphs
    graphs_list <- list()
    for (idx in seq_len(num_snws)) {
        snw <- subnetworks[[idx]]

        num_input_genes <- sum(processed_input$GENE %in% snw)
        perc_input_genes <- round(num_input_genes/length(processed_input$GENE) *
            100, 2)

        snw_interactions <- pin[pin[, 1] %in% snw & pin[, 2] %in% snw, ]
        g <- igraph::graph_from_data_frame(snw_interactions, directed = FALSE)
        cond_up_gene <- names(igraph::V(g)) %in% processed_input$GENE[processed_input$CHANGE >
            0]
        cond_down_gene <- names(igraph::V(g)) %in% processed_input$GENE[processed_input$CHANGE <
            0]
        igraph::V(g)$type <- ifelse(cond_up_gene, "up", ifelse(cond_down_gene, "down",
            "non-input"))

        igraph::V(g)$label.cex <- 0.5
        igraph::V(g)$frame.color <- "gray"
        igraph::V(g)$color <- ifelse(igraph::V(g)$type == "non-input", "#FFD500",
            ifelse(igraph::V(g)$type == "up", "#D2222D", "#35CD35"))

        color_lookup <- c(`#35CD35` = "down-regulated gene", `#D2222D` = "up-regulated gene",
            `#FFD500` = "non-input gene")


        p <- ggraph::ggraph(g, layout = layout)
        p <- p + ggraph::geom_edge_link(alpha = 0.8, colour = "darkgrey")
        p <- p + ggraph::geom_node_point(ggplot2::aes(color = .data$color), size = 2)
        p <- p + ggplot2::theme_void()
        p <- p + ggraph::geom_node_text(ggplot2::aes(label = .data$name), nudge_y = 0.2)
        p <- p + ggplot2::scale_colour_manual(values = unique(igraph::V(g)$color),
            name = NULL, labels = color_lookup[unique(igraph::V(g)$color)])
        p <- p + ggplot2::labs(title = paste0("Active Subnetwork #", idx), subtitle = paste0("Score=",
            round(score_vec[idx], 2), ", ", num_input_genes, "(", perc_input_genes,
            "%) input genes"))
        p <- p + ggplot2::theme(plot.title = ggplot2::element_text(hjust = 0.5),
            plot.subtitle = ggplot2::element_text(hjust = 0.5), legend.position = "bottom")
        graphs_list[[idx]] <- p
    }

    return(graphs_list)
}


================================================
FILE: R/clustering.R
================================================
#' Create Kappa Statistics Matrix
#'
#' @param enrichment_res data frame of pathfindR enrichment results. Must-have
#' columns are 'Term_Description' (if \code{use_description = TRUE}) or 'ID'
#' (if \code{use_description = FALSE}), 'Down_regulated', and 'Up_regulated'.
#' If \code{use_active_snw_genes = TRUE}, 'non_Signif_Snw_Genes' must also be
#' provided.
#' @param use_description Boolean argument to indicate whether term descriptions
#'  (in the 'Term_Description' column) should be used. (default = \code{FALSE})
#' @param use_active_snw_genes boolean to indicate whether or not to use
#' non-input active subnetwork genes in the calculation of kappa statistics
#' (default = FALSE, i.e. only use affected genes)
#'
#' @return a matrix of kappa statistics between each term in the
#' enrichment results.
#'
#' @export
#'
#' @examples
#' sub_df <- example_pathfindR_output[1:3, ]
#' create_kappa_matrix(sub_df)
create_kappa_matrix <- function(enrichment_res, use_description = FALSE, use_active_snw_genes = FALSE) {
    ### Argument checks
    if (!is.logical(use_description)) {
        stop("`use_description` should be TRUE or FALSE")
    }

    if (!is.logical(use_active_snw_genes)) {
        stop("`use_active_snw_genes` should be TRUE or FALSE")
    }

    if (!is.data.frame(enrichment_res)) {
        stop("`enrichment_res` should be a data frame of enrichment results")
    }
    if (nrow(enrichment_res) < 2) {
        stop("`enrichment_res` should contain at least 2 rows")
    }

    nec_cols <- c("Down_regulated", "Up_regulated")
    if (use_description) {
        nec_cols <- c("Term_Description", nec_cols)
    } else {
        nec_cols <- c("ID", nec_cols)
    }
    if (use_active_snw_genes) {
        nec_cols <- c(nec_cols, "non_Signif_Snw_Genes")
    }

    if (!all(nec_cols %in% colnames(enrichment_res))) {
        stop("`enrichment_res` should contain all of ", paste(dQuote(nec_cols), collapse = ", "))
    }

    ### Initial steps Column to use for gene set names
    chosen_id <- ifelse(use_description, which(colnames(enrichment_res) == "Term_Description"),
        which(colnames(enrichment_res) == "ID"))

    # list of genes
    down_idx <- which(colnames(enrichment_res) == "Down_regulated")
    up_idx <- which(colnames(enrichment_res) == "Up_regulated")

    genes_lists <- apply(enrichment_res, 1, function(x) {
        base::toupper(c(unlist(strsplit(as.character(x[up_idx]), ", ")), unlist(strsplit(as.character(x[down_idx]),
            ", "))))
    })

    if (use_active_snw_genes) {
        active_idx <- which(colnames(enrichment_res) == "non_Signif_Snw_Genes")

        genes_lists <- mapply(function(x, y) {
            c(x, unlist(strsplit(as.character(y), ", ")))
        }, genes_lists, enrichment_res[, active_idx])
    }

    # Exclude zero-length gene sets
    excluded_idx <- which(vapply(genes_lists, length, 1) == 0)
    if (length(excluded_idx) != 0) {
        genes_lists <- genes_lists[-excluded_idx]
        enrichment_res <- enrichment_res[-excluded_idx, ]
    }

    ### Create Kappa Matrix
    all_genes <- unique(unlist(genes_lists, use.names = FALSE))
    N <- nrow(enrichment_res)
    term_names <- enrichment_res[, chosen_id]

    kappa_mat <- matrix(0, nrow = N, ncol = N, dimnames = list(term_names, term_names))
    diag(kappa_mat) <- 1

    total <- length(all_genes)
    for (i in 1:(N - 1)) {
        for (j in (i + 1):N) {
            genes_i <- genes_lists[[i]]
            genes_j <- genes_lists[[j]]

            both <- length(intersect(genes_i, genes_j))
            term_i <- length(base::setdiff(genes_i, genes_j))
            term_j <- length(base::setdiff(genes_j, genes_i))
            no_terms <- total - sum(both, term_i, term_j)

            observed <- (both + no_terms)/total
            chance <- (both + term_i) * (both + term_j)
            chance <- chance + (term_j + no_terms) * (term_i + no_terms)
            chance <- chance/total^2
            kappa_mat[j, i] <- kappa_mat[i, j] <- (observed - chance)/(1 - chance)
        }
    }
    kappa_mat[is.na(kappa_mat)] <- 0
    return(kappa_mat)
}


#' Hierarchical Clustering of Enriched Terms
#'
#' @param kappa_mat matrix of kappa statistics (output of \code{\link{create_kappa_matrix}})
#' @inheritParams create_kappa_matrix
#' @param num_clusters number of clusters to be formed (default = \code{NULL}).
#' If \code{NULL}, the optimal number of clusters is determined as the number
#' which yields the highest average silhouette width.
#' @param clu_method the agglomeration method to be used
#' (default = 'average', see \code{\link[stats]{hclust}})
#' @param plot_hmap boolean to indicate whether to plot the kappa statistics
#' clustering heatmap or not (default = FALSE)
#' @param plot_dend boolean to indicate whether to plot the clustering
#' dendrogram partitioned into the optimal number of clusters (default = TRUE)
#'
#' @details The function initially performs hierarchical clustering
#' of the enriched terms in \code{enrichment_res} using the kappa statistics
#' (defining the distance as \code{1 - kappa_statistic}). Next,
#' the clustering dendrogram is cut into k = 2, 3, ..., n - 1 clusters
#' (where n is the number of terms). The optimal number of clusters is
#' determined as the k value which yields the highest average silhouette width.
#' (if \code{num_clusters} not specified)
#'
#' @return a vector of clusters for each enriched term in the enrichment results.
#' @export
#'
#' @examples
#' \dontrun{
#' hierarchical_term_clustering(kappa_mat, enrichment_res)
#' hierarchical_term_clustering(kappa_mat, enrichment_res, method = 'complete')
#' }
hierarchical_term_clustering <- function(kappa_mat, enrichment_res, num_clusters = NULL,
    use_description = FALSE, clu_method = "average", plot_hmap = FALSE, plot_dend = TRUE) {
    ### Set ID/Name index
    chosen_id <- ifelse(use_description, which(colnames(enrichment_res) == "Term_Description"),
        which(colnames(enrichment_res) == "ID"))

    ### Argument checks
    if (!isSymmetric.matrix(kappa_mat)) {
        stop("`kappa_mat` should be a symmetric matrix")
    }

    if (!all(colnames(kappa_mat) %in% enrichment_res[, chosen_id])) {
        stop("All terms in `kappa_mat` should be present in `enrichment_res`")
    }

    if (!is.logical(plot_hmap)) {
        stop("`plot_hmap` should be TRUE or FALSE")
    }

    if (!is.logical(plot_dend)) {
        stop("`plot_dend` should be TRUE or FALSE")
    }

    ### Add excluded (zero-length) genes
    kappa_mat2 <- kappa_mat
    cond <- !enrichment_res[, chosen_id] %in% rownames(kappa_mat2)
    outliers <- enrichment_res[cond, chosen_id]
    outliers_mat <- matrix(-1, nrow = nrow(kappa_mat2), ncol = length(outliers),
        dimnames = list(rownames(kappa_mat2), outliers))
    kappa_mat2 <- cbind(kappa_mat2, outliers_mat)
    outliers_mat <- matrix(-1, nrow = length(outliers), ncol = ncol(kappa_mat2),
        dimnames = list(outliers, colnames(kappa_mat2)))
    kappa_mat2 <- rbind(kappa_mat2, outliers_mat)

    ### Perform hierarchical clustering
    clu <- stats::hclust(stats::as.dist(1 - kappa_mat2), method = clu_method)

    if (plot_hmap) {
        stats::heatmap(kappa_mat2, distfun = function(x) stats::as.dist(1 - x), hclustfun = function(x) stats::hclust(x,
            method = clu_method))
    }

    ### Choose optimal k (if not specified)
    if (is.null(num_clusters)) {
        kmax <- max(nrow(kappa_mat2)%/%2, 2)

        # sequence of k (number of clusters) to try
        if (kmax <= 20) {
            kseq <- 2:kmax
        } else if (kmax <= 100) {
            kseq <- c(2:19, seq(20, kmax%/%10 * 10, 10))
        } else {
            kseq <- c(2:19, seq(20, 99, 10), seq(100, kmax%/%50 * 50, 50))
        }

        # calculate average silhouette width per k in sequence
        avg_sils <- c()
        for (k in kseq) {
            avg_sils <- c(avg_sils, fpc::cluster.stats(stats::as.dist(1 - kappa_mat2),
                stats::cutree(clu, k = k), silhouette = TRUE)$avg.silwidth)
        }

        k_opt <- kseq[which.max(avg_sils)]

        message(paste("The maximum average silhouette width was", round(max(avg_sils),
            2), "for k =", k_opt, "\n\n"))
    } else {
        k_opt <- num_clusters
    }


    if (plot_dend) {
        graphics::plot(clu)
        stats::rect.hclust(clu, k = k_opt)
    }

    clusters <- stats::cutree(clu, k = k_opt)

    return(clusters)
}

#' Heuristic Fuzzy Multiple-linkage Partitioning of Enriched Terms
#'
#' @inheritParams hierarchical_term_clustering
#' @inheritParams create_kappa_matrix
#' @param kappa_threshold threshold for kappa statistics, defining strong
#' relation (default = 0.35)
#'
#' @details The fuzzy clustering algorithm was implemented based on:
#' Huang DW, Sherman BT, Tan Q, et al. The DAVID Gene Functional
#' Classification Tool: a novel biological module-centric algorithm to
#' functionally analyze large gene lists. Genome Biol. 2007;8(9):R183.
#'
#' @return a boolean matrix of cluster assignments. Each row corresponds to an
#' enriched term, each column corresponds to a cluster.
#' @export
#'
#' @examples
#' \dontrun{
#' fuzzy_term_clustering(kappa_mat, enrichment_res)
#' fuzzy_term_clustering(kappa_mat, enrichment_res, kappa_threshold = 0.45)
#' }
fuzzy_term_clustering <- function(kappa_mat, enrichment_res, kappa_threshold = 0.35,
    use_description = FALSE) {
    ### Set ID/Name index
    chosen_id <- ifelse(use_description, which(colnames(enrichment_res) == "Term_Description"),
        which(colnames(enrichment_res) == "ID"))

    ### Argument checks
    if (!isSymmetric.matrix(kappa_mat)) {
        stop("`kappa_mat` should be a symmetric matrix")
    }

    if (!all(colnames(kappa_mat) %in% enrichment_res[, chosen_id])) {
        stop("All terms in `kappa_mat` should be present in `enrichment_res`")
    }

    if (!is.numeric(kappa_threshold)) {
        stop("`kappa_threshold` should be numeric")
    }

    if (kappa_threshold > 1) {
        stop("`kappa_threshold` should be at most 1 as kappa statistic is always <= 1")
    }

    ### Find Qualified Seeds
    qualified_seeds <- list()
    j <- 1
    for (i in base::seq_len(nrow(kappa_mat))) {
        current_term <- rownames(kappa_mat)[i]
        current_term_kappa <- kappa_mat[i, ]


        init_membership_cond <- current_term_kappa >= kappa_threshold
        if (sum(init_membership_cond) > 3) {
            related_terms <- names(current_term_kappa)[init_membership_cond]
            terms <- c(current_term, related_terms)
            related_kappa <- kappa_mat[rownames(kappa_mat) %in% terms, colnames(kappa_mat) %in%
                terms]
            diag(related_kappa) <- 0
            tight_relationship_cond <- sum(related_kappa >= kappa_threshold)/(nrow(related_kappa)^2) >=
                0.5

            if (tight_relationship_cond) {
                qualified_seeds[[j]] <- related_terms
                names(qualified_seeds)[j] <- current_term
                j <- j + 1
            }
        }
    }

    ### Fuzzy Clustering
    clusters <- unique(qualified_seeds)
    i <- 1
    j <- i + 1
    while (i < length(clusters)) {
        common_terms <- intersect(clusters[[i]], clusters[[j]])
        all_terms <- union(clusters[[i]], clusters[[j]])

        if (length(common_terms)/length(all_terms) > 0.5 & i != j) {
            clusters[[i]] <- all_terms
            clusters[[j]] <- NULL
            i <- 1
            j <- i + 1
        } else if (j < length(clusters)) {
            j <- j + 1
        } else {
            i <- i + 1
            j <- 1
        }
    }

    ### Find Outliers
    cond <- !enrichment_res[, chosen_id] %in% c(names(clusters), unlist(clusters))
    outliers <- enrichment_res[cond, chosen_id]
    for (outlier in outliers) {
        clusters[[outlier]] <- outlier
    }
    ### Return Cluster Matrix
    names(clusters) <- base::seq_len(length(clusters))

    cluster_mat <- matrix(FALSE, nrow = nrow(enrichment_res), ncol = length(clusters),
        dimnames = list(enrichment_res[, chosen_id], names(clusters)))
    for (clu in names(clusters)) {
        clu_terms <- clusters[[clu]]
        cluster_mat[clu_terms, clu] <- TRUE
    }

    return(cluster_mat)
}


#' Graph Visualization of Clustered Enriched Terms
#'
#' @param clu_obj clustering result (either a matrix obtained via
#' \code{\link{hierarchical_term_clustering}} or \code{\link{fuzzy_term_clustering}}
#' `fuzzy_term_clustering` or a vector obtained via `hierarchical_term_clustering`)
#' @inheritParams fuzzy_term_clustering
#' @param vertex.label.cex font size for vertex labels; it is interpreted as a multiplication factor of some device-dependent base font size (default = 0.7)
#' @param vertex.size.scaling scaling factor for the node size (default = 2.5)
#'
#' @return Plots a graph diagram of clustering results. Each node is an enriched term
#' from `enrichment_res`. Size of node corresponds to -log(lowest_p). Thickness
#' of the edges between nodes correspond to the kappa statistic between the two
#' terms. Color of each node corresponds to distinct clusters. For fuzzy
#' clustering, if a term is in multiple clusters, multiple colors are utilized.
#'
#' @export
#'
#' @examples
#' \dontrun{
#' cluster_graph_vis(clu_obj, kappa_mat, enrichment_res)
#' }
cluster_graph_vis <- function(clu_obj, kappa_mat, enrichment_res, kappa_threshold = 0.35,
    use_description = FALSE, vertex.label.cex = 0.7, vertex.size.scaling = 2.5) {
    ### Set ID/Name index
    chosen_id <- ifelse(use_description, which(colnames(enrichment_res) == "Term_Description"),
        which(colnames(enrichment_res) == "ID"))

    ### For coloring nodes
    all_cols <- c("#E41A1C", "#377EB8", "#4DAF4A", "#984EA3", "#FF7F00", "#FFFF33",
        "#A65628", "#F781BF", "#999999", "#66C2A5", "#FC8D62", "#8DA0CB", "#E78AC3",
        "#A6D854", "#FFD92F", "#E5C494", "#B3B3B3", "#8DD3C7", "#FFFFB3", "#BEBADA",
        "#FB8072", "#80B1D3", "#FDB462", "#B3DE69", "#FCCDE5", "#D9D9D9", "#BC80BD",
        "#CCEBC5", "#FFED6F", "#A6CEE3", "#1F78B4", "#B2DF8A", "#33A02C", "#FB9A99",
        "#E31A1C", "#FDBF6F", "#FF7F00", "#CAB2D6", "#6A3D9A", "#FFFF99", "#B15928")

    if (is.matrix(clu_obj)) {
        ### Argument checks
        if (!all(rownames(clu_obj) %in% colnames(kappa_mat))) {
            stop("Not all terms in `clu_obj` present in `kappa_mat`!")
        }

        ### Prep data Remove weak links
        kappa_mat2 <- kappa_mat
        diag(kappa_mat2) <- 0
        kappa_mat2 <- ifelse(kappa_mat2 < kappa_threshold, 0, kappa_mat2)

        # Add missing terms
        missing <- rownames(clu_obj)[!rownames(clu_obj) %in% colnames(kappa_mat2)]
        missing_mat <- matrix(0, nrow = nrow(kappa_mat2), ncol = length(missing),
            dimnames = list(rownames(kappa_mat2), missing))
        kappa_mat2 <- cbind(kappa_mat2, missing_mat)
        missing <- rownames(clu_obj)[!rownames(clu_obj) %in% rownames(kappa_mat2)]
        missing_mat <- matrix(0, nrow = length(missing), ncol = ncol(kappa_mat2),
            dimnames = list(missing, colnames(kappa_mat2)))
        kappa_mat2 <- rbind(kappa_mat2, missing_mat)

        ### Create Graph, Set Color, Size and Percentages
        values <- apply(clu_obj, 1, function(x) which(x))
        percs <- list()
        for (i in base::seq_len(length(values))) {
            percs[[i]] <- rep(1/length(values[[i]]), length(values[[i]]))
        }

        g <- igraph::graph_from_adjacency_matrix(kappa_mat2, weighted = TRUE)

        if (length(all_cols) < max(as.integer(colnames(clu_obj)))) {
            num_extra <- max(as.integer(colnames(clu_obj))) - length(all_cols)
            extra_colors <- grDevices::rainbow(num_extra)
            all_cols <- c(all_cols, extra_colors)
        }

        # Node shapes are either circle (single cluster) or pie (multiple
        # clusters)
        igraph::V(g)$shape <- ifelse(vapply(percs, length, 1) > 1, "pie", "circle")

        # Node colors are cluster memberships
        cols <- lapply(values, function(x) all_cols[x])
        igraph::V(g)$color <- vapply(cols, function(x) x[1], "")

        # Node sizes are -log(lowest_p)
        p_idx <- match(names(igraph::V(g)), enrichment_res[, chosen_id])
        transformed_p <- -log10(enrichment_res$lowest_p[p_idx])
        igraph::V(g)$size <- transformed_p * vertex.size.scaling

        ### Plot Graph
        igraph::plot.igraph(g, vertex.pie = percs, vertex.pie.color = cols, layout = igraph::layout_nicely(g),
            edge.curved = FALSE, vertex.label.dist = 0, vertex.label.color = "black",
            asp = 1, vertex.label.cex = vertex.label.cex, edge.width = igraph::E(g)$weight,
            edge.arrow.mode = 0)
    } else if (is.integer(clu_obj)) {
        ### Argument checks
        if (!all(names(clu_obj) %in% colnames(kappa_mat))) {
            stop("Not all terms in `clu_obj` present in `kappa_mat`!")
        }

        ### Prep data Remove weak links
        kappa_mat2 <- kappa_mat
        diag(kappa_mat2) <- 0
        kappa_mat2 <- ifelse(kappa_mat2 > kappa_threshold, kappa_mat2, 0)

        # Add missing terms
        missing <- names(clu_obj)[!names(clu_obj) %in% colnames(kappa_mat2)]
        missing_mat <- matrix(0, nrow = nrow(kappa_mat2), ncol = length(missing),
            dimnames = list(rownames(kappa_mat2), missing))
        kappa_mat2 <- cbind(kappa_mat2, missing_mat)
        missing <- names(clu_obj)[!names(clu_obj) %in% rownames(kappa_mat2)]
        missing_mat <- matrix(0, nrow = length(missing), ncol = ncol(kappa_mat2),
            dimnames = list(missing, colnames(kappa_mat2)))
        kappa_mat2 <- rbind(kappa_mat2, missing_mat)

        ### Create Graph, Set Colors and Sizes
        g <- igraph::graph_from_adjacency_matrix(kappa_mat2, weighted = TRUE)

        igraph::V(g)$Clu <- clu_obj[match(igraph::V(g)$name, names(clu_obj))]

        if (length(all_cols) < max(as.integer(igraph::V(g)$Clu))) {
            num_extra <- max(clu_obj) - length(all_cols)
            extra_colors <- grDevices::rainbow(num_extra)
            all_cols <- c(all_cols, extra_colors)
        }

        # Node colors are cluster memberships
        igraph::V(g)$color <- all_cols[as.integer(igraph::V(g)$Clu)]

        # Node sizes are -log(lowest_p)
        p_idx <- match(names(igraph::V(g)), enrichment_res[, chosen_id])
        transformed_p <- -log10(enrichment_res$lowest_p[p_idx])
        igraph::V(g)$size <- transformed_p * vertex.size.scaling

        ### Plot graph
        igraph::plot.igraph(g, layout = igraph::layout_nicely(g), edge.curved = FALSE,
            vertex.label.dist = 0, vertex.label.color = "black", asp = 0, vertex.label.cex = vertex.label.cex,
            edge.width = igraph::E(g)$weight, edge.arrow.mode = 0)
    } else {
        stop("Invalid class for `clu_obj`!")
    }
}

#' Cluster Enriched Terms
#'
#' @inheritParams create_kappa_matrix
#' @param method Either 'hierarchical' or 'fuzzy'. Details of clustering are
#' provided in the corresponding functions \code{\link{hierarchical_term_clustering}},
#' and \code{\link{fuzzy_term_clustering}}
#' @param plot_clusters_graph boolean value indicate whether or not to plot
#' the graph diagram of clustering results (default = TRUE)
#' @param ... additional arguments for \code{\link{hierarchical_term_clustering}},
#' \code{\link{fuzzy_term_clustering}} and \code{\link{cluster_graph_vis}}.
#' See documentation of these functions for more details.
#'
#'
#' @return a data frame of clustering results. For 'hierarchical', the cluster
#' assignments (Cluster) and whether the term is representative of its cluster
#' (Status) is added as columns. For 'fuzzy', terms that are in multiple
#' clusters are provided for each cluster. The cluster assignments (Cluster)
#' and whether the term is representative of its cluster (Status) is
#' added as columns.
#'
#' @export
#'
#' @examples
#' example_clustered <- cluster_enriched_terms(
#'   example_pathfindR_output[1:3, ],
#'   plot_clusters_graph = FALSE
#' )
#' example_clustered <- cluster_enriched_terms(
#'   example_pathfindR_output[1:3, ],
#'   method = 'fuzzy', plot_clusters_graph = FALSE
#' )
#' @seealso See \code{\link{hierarchical_term_clustering}} for hierarchical
#' clustering of enriched terms.
#' See \code{\link{fuzzy_term_clustering}} for fuzzy clustering of enriched terms.
#' See \code{\link{cluster_graph_vis}} for graph visualization of clustering.
cluster_enriched_terms <- function(enrichment_res, method = "hierarchical", plot_clusters_graph = TRUE,
    use_description = FALSE, use_active_snw_genes = FALSE, ...) {
    ### Argument Checks
    if (!method %in% c("hierarchical", "fuzzy")) {
        stop("the clustering `method` must either be \"hierarchical\" or \"fuzzy\"")
    }

    if (!is.logical(plot_clusters_graph)) {
        stop("`plot_clusters_graph` must be logical!")
    }

    ### Create Kappa Matrix
    kappa_mat <- create_kappa_matrix(enrichment_res = enrichment_res, use_description = use_description,
        use_active_snw_genes = use_active_snw_genes)
    kappa_mat[is.na(kappa_mat)] <- 0

    ### Cluster Terms
    if (method == "hierarchical") {
        clu_obj <- R.utils::doCall("hierarchical_term_clustering", kappa_mat = kappa_mat,
            enrichment_res = enrichment_res, use_description = use_description, ...)
    } else {
        clu_obj <- R.utils::doCall("fuzzy_term_clustering", kappa_mat = kappa_mat,
            enrichment_res = enrichment_res, use_description = use_description, ...)
    }

    ### Graph Visualization of Clusters
    if (plot_clusters_graph) {
        R.utils::doCall("cluster_graph_vis", clu_obj = clu_obj, kappa_mat = kappa_mat,
            enrichment_res = enrichment_res, use_description = use_description, ...)
    }

    ### Returned Data Frame with Cluster Information
    clustered_df <- enrichment_res

    ### Set ID/Name index
    chosen_id <- ifelse(use_description, which(colnames(enrichment_res) == "Term_Description"),
        which(colnames(enrichment_res) == "ID"))

    if (method == "hierarchical") {
        ### Assign Clusters and Representatives
        clu_idx <- match(clustered_df[, chosen_id], names(clu_obj))
        clustered_df$Cluster <- clu_obj[clu_idx]
        clustered_df <- clustered_df[order(clustered_df$Cluster, clustered_df$lowest_p,
            decreasing = FALSE), ]

        tmp <- tapply(clustered_df[, chosen_id], clustered_df$Cluster, function(x) x[1])
        stat_cond <- clustered_df[, chosen_id] %in% tmp
        clustered_df$Status <- ifelse(stat_cond, "Representative", "Member")
    } else {
        term_list <- list()
        for (term in rownames(clu_obj)) {
            term_list[[term]] <- which(clu_obj[term, ])
        }
        ### Assign Clusters and Representatives
        clustered_df2 <- c()
        for (i in base::seq_len(nrow(clustered_df))) {
            current_row <- clustered_df[i, ]
            current_clusters <- term_list[[current_row[, chosen_id]]]
            for (clu in current_clusters) {
                clustered_df2 <- rbind(clustered_df2, data.frame(current_row, Cluster = clu))
            }
        }

        clustered_df <- clustered_df2
        clustered_df <- clustered_df[order(clustered_df$Cluster, clustered_df$lowest_p,
            decreasing = FALSE), ]

        tmp <- tapply(clustered_df[, chosen_id], clustered_df$Cluster, function(x) x[1])
        stat_cond <- clustered_df[, chosen_id] %in% tmp
        clustered_df$Status <- ifelse(stat_cond, "Representative", "Member")
    }

    return(clustered_df)
}


================================================
FILE: R/comparison.R
================================================
#' Combine 2 pathfindR Results
#'
#' @param result_A data frame of first pathfindR enrichment results
#' @param result_B data frame of second pathfindR enrichment results
#' @param plot_common boolean to indicate whether or not to plot the term-gene
#' graph of the common terms (default=\code{TRUE})
#'
#' @return Data frame of combined pathfindR enrichment results. Columns are: \describe{
#'   \item{ID}{ID of the enriched term}
#'   \item{Term_Description}{Description of the enriched term}
#'   \item{Fold_Enrichment_A}{Fold enrichment value for the enriched term (Calculated using ONLY the input genes)}
#'   \item{occurrence_A}{the number of iterations that the given term was found to enriched over all iterations}
#'   \item{lowest_p_A}{the lowest adjusted-p value of the given term over all iterations}
#'   \item{highest_p_A}{the highest adjusted-p value of the given term over all iterations}
#'   \item{Up_regulated_A}{the up-regulated genes in the input involved in the given term's gene set, comma-separated}
#'   \item{Down_regulated_A}{the down-regulated genes in the input involved in the given term's gene set, comma-separated}
#'   \item{Fold_Enrichment_B}{Fold enrichment value for the enriched term (Calculated using ONLY the input genes)}
#'   \item{occurrence_B}{the number of iterations that the given term was found to enriched over all iterations}
#'   \item{lowest_p_B}{the lowest adjusted-p value of the given term over all iterations}
#'   \item{highest_p_B}{the highest adjusted-p value of the given term over all iterations}
#'   \item{Up_regulated_B}{the up-regulated genes in the input involved in the given term's gene set, comma-separated}
#'   \item{Down_regulated_B}{the down-regulated genes in the input involved in the given term's gene set, comma-separated}
#'   \item{combined_p}{the combined p value (via Fisher's method)}
#'   \item{status}{whether the term is found in both analyses ('common'), found only in the first ('A only') or found only in the second ('B only)}
#' }
#' By default, the function also displays the term-gene graph of the common terms
#'
#' @export
#'
#' @examples
#' combined_results <- combine_pathfindR_results(example_pathfindR_output, example_comparison_output)
combine_pathfindR_results <- function(result_A, result_B, plot_common = TRUE) {
    combined_df <- merge(result_A, result_B, by = c("ID", "Term_Description"), all = TRUE,
        suffixes = c("_A", "_B"))

    ### Calculate combined p values
    combined_df$combined_p <- NA
    for (i in seq_len(nrow(combined_df))) {
        p_vec <- c(combined_df$lowest_p_A[i], combined_df$lowest_p_B[i])
        p_vec <- p_vec[!is.na(p_vec)]
        combined_df$combined_p[i] <- stats::pchisq(q = sum(log(p_vec)) * -2, df = length(p_vec) *
            2, lower.tail = FALSE)
    }
    ### Indicate intersection status
    combined_df$status <- ifelse(is.na(combined_df$lowest_p_A), "B only", ifelse(is.na(combined_df$lowest_p_B),
        "A only", "common"))

    ### Plot graph common terms
    if (plot_common) {
        graphics::plot(combined_results_graph(combined_df))
    }

    message("You may run `combined_results_graph()` to create visualizations of combined term-gene graphs of selected terms")

    return(combined_df)
}


#' Combined Results Graph
#'
#' @param combined_df Data frame of combined pathfindR enrichment results
#' @param selected_terms the vector of selected terms for creating the graph
#' (either IDs or term descriptions). If set to \code{'common'}, all of the
#' common terms are used. (default = 'common')
#' @inheritParams term_gene_graph
#'
#' @return a  \code{\link[ggraph]{ggraph}} object containing the combined term-gene graph.
#'  Each node corresponds to an enriched term (orange if common, different shades of blue otherwise),
#'  an up-regulated gene (green), a down-regulated gene (red) or
#'  a conflicting (i.e. up in one analysis, down in the other or vice versa) gene
#'  (gray). An edge between a term and a gene indicates
#'  that the given term involves the gene. Size of a term node is proportional
#'  to either the number of genes (if \code{node_size = 'num_genes'}) or
#'  the -log10(lowest p value) (if \code{node_size = 'p_val'}).
#' @export
#'
#' @examples
#' combined_results <- combine_pathfindR_results(
#'   example_pathfindR_output,
#'   example_comparison_output,
#'   plot_common = FALSE
#' )
#' g <- combined_results_graph(combined_results, selected_terms = sample(combined_results$ID, 3))
combined_results_graph <- function(combined_df, selected_terms = "common", use_description = FALSE,
    layout = "stress", node_size = "num_genes") {
    ############ Argument Checks Check use_description is boolean
    if (!is.logical(use_description)) {
        stop("`use_description` must either be TRUE or FALSE!")
    }

    ### Set column for term labels
    ID_column <- ifelse(use_description, "Term_Description", "ID")

    ### Check node_size
    val_node_size <- c("num_genes", "p_val")
    if (!node_size %in% val_node_size) {
        stop("`node_size` should be one of ", paste(dQuote(val_node_size), collapse = ", "))
    }

    if (!is.data.frame(combined_df)) {
        stop("`combined_df` should be a data frame")
    }

    ### Check necessary columnns
    necessary_cols <- c(ID_column, "combined_p", "Up_regulated_A", "Down_regulated_A",
        "Up_regulated_B", "Down_regulated_B")

    if (!all(necessary_cols %in% colnames(combined_df))) {
        stop(paste(c("All of", paste(necessary_cols, collapse = ", "), "must be present in `results_df`!"),
            collapse = " "))
    }

    ############ Initial steps Filter for selected terms
    if (any(selected_terms == "common")) {
        if (!any(combined_df$status == "common")) {
            stop("There are no common terms")
        }
        combined_df <- combined_df[combined_df$status == "common", ]
    } else {
        if (!any(selected_terms %in% combined_df[, ID_column])) {
            stop("None of the `selected_terms` are in the combined results!")
        }
        combined_df <- combined_df[combined_df[, ID_column] %in% selected_terms,
            ]
    }

    ### Prep data frame for graph
    graph_df <- data.frame()
    for (i in base::seq_len(nrow(combined_df))) {
        up_genes <- c(unlist(strsplit(combined_df$Up_regulated_A[i], ", ")), unlist(strsplit(combined_df$Up_regulated_B[i],
            ", ")))
        down_genes <- c(unlist(strsplit(combined_df$Down_regulated_A[i], ", ")),
            unlist(strsplit(combined_df$Down_regulated_B[i], ", ")))
        genes <- c(up_genes, down_genes)
        genes <- genes[!is.na(genes)]
        for (gene in genes) {
            graph_df <- rbind(graph_df, data.frame(Term = combined_df[i, ID_column],
                Gene = gene))
        }
    }
    graph_df <- unique(graph_df)

    up_genes_A <- unlist(lapply(combined_df$Up_regulated_A, function(x) unlist(strsplit(x,
        ", "))))
    down_genes_A <- unlist(lapply(combined_df$Down_regulated_A, function(x) unlist(strsplit(x,
        ", "))))
    up_genes_B <- unlist(lapply(combined_df$Up_regulated_B, function(x) unlist(strsplit(x,
        ", "))))
    down_genes_B <- unlist(lapply(combined_df$Down_regulated_B, function(x) unlist(strsplit(x,
        ", "))))

    terms_A <- combined_df[!is.na(combined_df$lowest_p_A) & is.na(combined_df$lowest_p_B),
        ID_column]
    terms_B <- combined_df[is.na(combined_df$lowest_p_A) & !is.na(combined_df$lowest_p_B),
        ID_column]

    ############ Create graph object and plot create igraph object
    g <- igraph::graph_from_data_frame(graph_df, directed = FALSE)
    igraph::V(g)$type <- ifelse(names(igraph::V(g)) %in% terms_A, "A-only term",
        ifelse(names(igraph::V(g)) %in% terms_B, "B-only term", ifelse(names(igraph::V(g)) %in%
            combined_df[, ID_column], "common term", "gene")))

    # Adjust node sizes
    if (node_size == "num_genes") {
        sizes <- igraph::degree(g)
        sizes <- ifelse(grepl("term", igraph::V(g)$type), sizes, 2)
        size_label <- "# genes"
    } else {
        idx <- match(names(igraph::V(g)), combined_df[, ID_column])
        sizes <- -log10(combined_df$combined_p[idx])
        sizes[is.na(sizes)] <- 2
        size_label <- "-log10(p)"
    }
    igraph::V(g)$size <- sizes
    igraph::V(g)$label.cex <- 0.5
    igraph::V(g)$frame.color <- "gray"

    cond_up_A <- names(igraph::V(g)) %in% up_genes_A
    cond_up_B <- names(igraph::V(g)) %in% up_genes_B
    cond_down_A <- names(igraph::V(g)) %in% down_genes_A
    cond_down_B <- names(igraph::V(g)) %in% down_genes_B
    missing_A <- !cond_up_A & !cond_down_A
    missing_B <- !cond_up_B & !cond_down_B

    up_cond <- (cond_up_A & cond_up_B) | (missing_A & cond_up_B) | (cond_up_A & missing_B)
    down_cond <- (cond_down_A & cond_down_B) | (missing_A & cond_down_B) | (cond_down_A &
        missing_B)

    igraph::V(g)$for_coloring <- ifelse(igraph::V(g)$type == "common term", "Common term",
        ifelse(igraph::V(g)$type == "A-only term", "A-only term", ifelse(igraph::V(g)$type ==
            "B-only term", "B-only term", ifelse(up_cond, "Up gene", ifelse(down_cond,
            "Down gene", "Conflicting gene")))))

    ### Create graph
    create_graph <- function(g, for_coloring, size) {
        color_var <- ggplot2::enquo(for_coloring)
        size_var <- ggplot2::enquo(size)
        p <- ggraph::ggraph(g, layout = layout)
        p <- p + ggraph::geom_edge_link(alpha = 0.8, colour = "darkgrey")
        p <- p + ggraph::geom_node_point(ggplot2::aes(color = !!color_var, size = !!size_var))
        p <- p + ggplot2::scale_size(range = c(5, 10), breaks = round(seq(round(min(igraph::V(g)$size)),
            round(max(igraph::V(g)$size)), length.out = 4)), name = size_label)
        p <- p + ggplot2::theme_void()
        p <- p + suppressWarnings(ggraph::geom_node_text(ggplot2::aes(label = .data$name),
            nudge_y = 0.2, repel = TRUE, max.overlaps = 20))

        vertex_cols <- c(`Common term` = "#FCCA46", `A-only term` = "#9FB8AD", `B-only term` = "#619B8A",
            `Up gene` = "green", `Down gene` = "red", `Conflicting gene` = "gray")
        p <- p + ggplot2::scale_colour_manual(values = vertex_cols, name = NULL)
        p <- p + ggplot2::ggtitle("Combined Terms Graph")
        p <- p + ggplot2::theme(plot.title = ggplot2::element_text(hjust = 0.5))
        return(p)
    }

    return(create_graph(g, for_coloring, size))
}


================================================
FILE: R/core.R
================================================
#' Wrapper Function for pathfindR - Active-Subnetwork-Oriented Enrichment Workflow
#'
#' \code{run_pathfindR} is the wrapper function for the pathfindR workflow
#'
#' This function takes in a data frame consisting of Gene Symbol, log-fold-change
#' and adjusted-p values. After input testing, any gene symbols that are not in
#' the PIN are converted to alias symbols if the alias is in the PIN. Next,
#' active subnetwork search is performed. Enrichment analysis is
#' performed using the genes in each of the active subnetworks. Terms with
#' adjusted-p values lower than \code{enrichment_threshold} are discarded. The
#' lowest adjusted-p value (over all subnetworks) for each term is kept. This
#' process of active subnetwork search and enrichment is repeated  for a selected
#' number of \code{iterations}, which is done in parallel. Over all iterations,
#' the lowest and the highest adjusted-p values, as well as number of occurrences
#' are reported for each enriched term.
#'
#' @inheritParams input_processing
#' @inheritParams fetch_gene_set
#' @inheritParams enrichment_analyses
#' @param plot_enrichment_chart boolean value. If TRUE, a bubble chart displaying
#'  the enrichment results is plotted. (default = TRUE)
#' @param output_dir the directory to be created where the output and intermediate
#'  files are saved (default = \code{NULL}, a temporary directory is used)
#' @param ... additional arguments for \code{\link{active_snw_enrichment_wrapper}}
#'
#' @return Data frame of pathfindR enrichment results. Columns are: \describe{
#'   \item{ID}{ID of the enriched term}
#'   \item{Term_Description}{Description of the enriched term}
#'   \item{Fold_Enrichment}{Fold enrichment value for the enriched term (Calculated using ONLY the input genes)}
#'   \item{occurrence}{the number of iterations that the given term was found to enriched over all iterations}
#'   \item{support}{the median support (proportion of active subnetworks leading to enrichment within an iteration) over all iterations}
#'   \item{lowest_p}{the lowest adjusted-p value of the given term over all iterations}
#'   \item{highest_p}{the highest adjusted-p value of the given term over all iterations}
#'   \item{non_Signif_Snw_Genes (OPTIONAL)}{the non-significant active subnetwork genes, comma-separated}
#'   \item{Up_regulated}{the up-regulated genes (as determined by `change value` > 0, if the `change column` was provided) in the input involved in the given term's gene set, comma-separated. If change column not provided, all affected are listed here.}
#'   \item{Down_regulated}{the down-regulated genes (as determined by `change value` < 0, if the `change column` was provided) in the input involved in the given term's gene set, comma-separated}
#' }
#'  The function also creates an HTML report with the pathfindR enrichment
#'  results linked to the visualizations of the enriched terms in addition to
#'  the table of converted gene symbols. This report can be found in
#'  '\code{output_dir}/results.html' under the current working directory.
#'
#'  By default, a bubble chart of top 10 enrichment results are plotted. The x-axis
#'  corresponds to fold enrichment values while the y-axis indicates the enriched
#'  terms. Sizes of the bubbles indicate the number of significant genes in the given terms.
#'  Color indicates the -log10(lowest-p) value; the more red it is, the more
#'  significant the enriched term is. See \code{\link{enrichment_chart}}.
#'
#' @import knitr
#' @import rmarkdown
#' @import parallel
#' @import doParallel
#' @import foreach
#' @import graphics
#'
#' @export
#'
#' @section Warning: Especially depending on the protein interaction network,
#'  the algorithm and the number of iterations you choose, 'active subnetwork
#'  search + enrichment' component of \code{run_pathfindR} may take a long time to finish.
#'
#' @seealso
#' \code{\link{input_testing}} for input testing, \code{\link{input_processing}} for input processing,
#' \code{\link{active_snw_search}} for active subnetwork search and subnetwork filtering,
#' \code{\link{enrichment_analyses}} for enrichment analysis (using the active subnetworks),
#' \code{\link{summarize_enrichment_results}} for summarizing the active-subnetwork-oriented enrichment results,
#' \code{\link{annotate_term_genes}} for annotation of affected genes in the given gene sets,
#' \code{\link{visualize_terms}} for visualization of enriched terms,
#' \code{\link{enrichment_chart}} for a visual summary of the pathfindR enrichment results,
#' \code{\link[foreach]{foreach}} for details on parallel execution of looping constructs,
#' \code{\link{cluster_enriched_terms}} for clustering the resulting enriched terms and partitioning into clusters.
#'
#' @examples
#' \dontrun{
#' run_pathfindR(example_pathfindR_input)
#' }
run_pathfindR <- function(input, gene_sets = "KEGG", min_gset_size = 10, max_gset_size = 300,
    custom_genes = NULL, custom_descriptions = NULL, pin_name_path = "Biogrid", p_val_threshold = 0.05,
    enrichment_threshold = 0.05, convert2alias = TRUE, plot_enrichment_chart = TRUE,
    output_dir = NULL, list_active_snw_genes = FALSE, ...) {
    ############ Argument checks
    if (!is.logical(plot_enrichment_chart)) {
        stop("`plot_enrichment_chart` should be either TRUE or FALSE")
    }
    if (!is.logical(list_active_snw_genes)) {
        stop("`list_active_snw_genes` should be either TRUE or FALSE")
    }

    gset_list <- fetch_gene_set(gene_sets = gene_sets, min_gset_size = min_gset_size,
        max_gset_size = max_gset_size, custom_genes = custom_genes, custom_descriptions = custom_descriptions)

    ## absolute path to PIN
    pin_path <- return_pin_path(pin_name_path)

    ## create output dir
    output_dir_org <- output_dir
    output_dir <- configure_output_dir(output_dir)
    # on exit, set working directory back to original working directory
    org_dir <- getwd()
    on.exit(setwd(org_dir))
    # create and change working directory into the output directory
    dir.create(output_dir, recursive = TRUE)
    output_dir <- normalizePath(output_dir)
    setwd(output_dir)

    input_testing(input, p_val_threshold)

    input_processed <- input_processing(input, p_val_threshold, pin_path, convert2alias)

    combined_res <- active_snw_enrichment_wrapper(input_processed, pin_path, gset_list,
        enrichment_threshold, list_active_snw_genes, ...)
    setwd(output_dir)

    ## In case no enrichment was found
    if (is.null(combined_res)) {
        warning("Did not find any enriched terms!", call. = FALSE)
        return(data.frame())
    }

    final_res <- summarize_enrichment_results(combined_res, list_active_snw_genes)


    final_res <- annotate_term_genes(result_df = final_res, input_processed = input_processed,
        genes_by_term = gset_list$genes_by_term)

    if (!is.null(output_dir_org)) {
        create_HTML_report(input = input, input_processed = input_processed, final_res = final_res,
            dir_for_report = output_dir)
    }

    if (plot_enrichment_chart) {
        graphics::plot(enrichment_chart(result_df = final_res))
    }

    message(paste0("Found ", nrow(final_res), " enriched terms\n\n"))
    message("You may run:\n")
    message("- cluster_enriched_terms() for clustering enriched terms\n")
    message("- visualize_terms() for visualizing enriched term diagrams\n\n")

    return(final_res)
}


================================================
FILE: R/data_generation.R
================================================
#' Safely download and parse web content
#'
#' This helper function retrieves content from a given URL using \pkg{httr}.  
#' It ensures that common issues (e.g. no internet, timeouts, HTTP errors, 
#' or parsing errors) are handled gracefully with clear, informative error messages.  
#'
#' @param url Character string. The URL of the resource to download.
#' @param ... Additional arguments passed to \code{\link[httr]{GET}}.
#' @param timeout_sec Numeric. Timeout in seconds for the request (default = 10).
#'
#' @return A character string containing the parsed content of the response 
#'   (UTF-8 encoded). On failure, an error is raised with a clear message.
#'
#' @details
#' This function is intended for use inside package functions.  
#' For examples, vignettes, or tests, wrap calls in a connectivity check 
#' (e.g. using \code{http_error(HEAD(url))}) to avoid CRAN failures 
#' when the resource is temporarily unavailable.
#'
#' @examples
#' \dontrun{
#' # Retrieve the latest BioGRID release page
#' result <- safe_get_content("https://downloads.thebiogrid.org/BioGRID/Latest-Release/")
#' }
#' 
#' @importFrom httr GET timeout http_error status_code content
safe_get_content <- function(url, ..., timeout_sec = 10) {
  res <- tryCatch(
    {
      GET(url, timeout(timeout_sec), ...)
    },
    error = function(e) {
      stop("Failed to retrieve resource from ", url, 
           ". Error: ", conditionMessage(e), call. = FALSE)
    }
  )
  
  # Check HTTP status
  if (http_error(res)) {
    stop("The resource at ", url, " is unavailable. HTTP status: ",
         status_code(res), call. = FALSE)
  }
  
  # Return parsed content (default: text if HTML, raw if binary, etc.)
  content <- tryCatch(
    content(res, as = "text", encoding = "UTF-8"),
    error = function(e) {
      stop("Failed to parse content from ", url, 
           ". Error: ", conditionMessage(e), call. = FALSE)
    }
  )
  
  return(content)
}


#' Process Data frame of Protein-protein Interactions
#'
#' @param pin_df data frame of protein-protein interactions with 2 columns:
#' 'Interactor_A' and 'Interactor_B'
#'
#' @return processed PIN data frame (removes self-interactions and
#' duplicated interactions)
process_pin <- function(pin_df) {
    # remove self-interactions
    pin_df <- pin_df[pin_df$Interactor_A != pin_df$Interactor_B, ]

    # remove duplicated inteactions (including symmetric ones)
    pin_df <- unique(t(apply(pin_df, 1, sort)))

    pin_df <- as.data.frame(pin_df)
    colnames(pin_df) <- c("Interactor_A", "Interactor_B")
    return(pin_df)
}

#' Retrieve the Requested Release of Organism-specific BioGRID PIN
#'
#' @param org organism name. BioGRID naming requires underscores for spaces so
#' 'Homo sapiens' becomes 'Homo_sapiens', 'Mus musculus' becomes 'Mus_musculus'
#' etc. See \url{https://wiki.thebiogrid.org/doku.php/statistics} for a full
#' list of available organisms (default = 'Homo_sapiens')
#' @param path2pin the path of the file to save the PIN data. By default, the
#' PIN data is saved in a temporary file
#' @param release the requested BioGRID release (default = 'latest')
#'
#' @return the path of the file in which the PIN data was saved. If
#' \code{path2pin} was not supplied by the user, the PIN data is saved in a
#' temporary file
get_biogrid_pin <- function(org = "Homo_sapiens", path2pin, release = "latest") {
    # check organism name
    all_org_names <- c("Anopheles_gambiae_PEST", "Apis_mellifera", "Arabidopsis_thaliana_Columbia",
        "Bacillus_subtilis_168", "Bos_taurus", "Caenorhabditis_elegans", "Candida_albicans_SC5314",
        "Canis_familiaris", "Cavia_porcellus", "Chlamydomonas_reinhardtii", "Chlorocebus_sabaeus",
        "Cricetulus_griseus", "Danio_rerio", "Dictyostelium_discoideum_AX4", "Drosophila_melanogaster",
        "Emericella_nidulans_FGSC_A4", "Equus_caballus", "Escherichia_coli_K12_MC4100_BW2952",
        "Escherichia_coli_K12_MG1655", "Escherichia_coli_K12_W3110", "Escherichia_coli_K12",
        "Gallus_gallus", "Glycine_max", "Hepatitus_C_Virus", "Homo_sapiens", "Human_Herpesvirus_1",
        "Human_Herpesvirus_2", "Human_Herpesvirus_3", "Human_Herpesvirus_4", "Human_Herpesvirus_5",
        "Human_Herpesvirus_6A", "Human_Herpesvirus_6B", "Human_Herpesvirus_7", "Human_Herpesvirus_8",
        "Human_Immunodeficiency_Virus_1", "Human_Immunodeficiency_Virus_2", "Human_papillomavirus_10",
        "Human_papillomavirus_16", "Human_papillomavirus_6b", "Leishmania_major_Friedlin",
        "Macaca_mulatta", "Meleagris_gallopavo", "Mus_musculus", "Mycobacterium_tuberculosis_H37Rv",
        "Neurospora_crassa_OR74A", "Nicotiana_tomentosiformis", "Oryctolagus_cuniculus",
        "Oryza_sativa_Japonica", "Ovis_aries", "Pan_troglodytes", "Pediculus_humanus",
        "Plasmodium_falciparum_3D7", "Rattus_norvegicus", "Ricinus_communis", "Saccharomyces_cerevisiae_S288c",
        "Schizosaccharomyces_pombe_972h", "Selaginella_moellendorffii", "Simian_Immunodeficiency_Virus",
        "Simian_Virus_40", "Solanum_lycopersicum", "Solanum_tuberosum", "Streptococcus_pneumoniae_ATCCBAA255",
        "Strongylocentrotus_purpuratus", "Sus_scrofa", "Tobacco_Mosaic_Virus", "Ustilago_maydis_521",
        "Vaccinia_Virus", "Vitis_vinifera", "Xenopus_laevis", "Zea_mays")
    if (!org %in% all_org_names) {
        stop(paste(org, "is not a valid Biogrid organism.", "Available organisms are listed on: https://wiki.thebiogrid.org/doku.php/statistics"))
    }

    if (release == "latest") {
      result <- safe_get_content("https://downloads.thebiogrid.org/BioGRID/Latest-Release/")
      
      h2_matches <- regexpr("(?<=<h2>BioGRID Release\\s)(\\d\\.\\d\\.\\d+)", result, perl = TRUE)
      release <- regmatches(result, h2_matches)
    }

    # release directory for download
    rel_dir <- paste0("BIOGRID-", release)

    # choose tab2 vs. tab3
    tab_v <- ifelse(utils::compareVersion(release, "3.5.183") == -1, ".tab2", ".tab3")

    # download tab2 format organism files
    tmp <- tempfile()
    fname <- paste0("BIOGRID-ORGANISM-", release, tab_v)
    biogrid_url <- paste0("https://downloads.thebiogrid.org/Download/BioGRID/Release-Archive/",
        rel_dir, "/", fname, ".zip")
    utils::download.file(biogrid_url, tmp, method = getOption("download.file.method"),
        quiet = TRUE)

    # parse organism names
    all_org_files <- utils::unzip(tmp, list = TRUE)
    all_org_files$Organism <- sub("\\.tab\\d\\.txt", "", all_org_files$Name)
    all_org_files$Organism <- sub("BIOGRID-ORGANISM-", "", all_org_files$Organism)
    all_org_files$Organism <- sub("-.*\\d+$", "", all_org_files$Organism)

    org_file <- all_org_files$Name[all_org_files$Organism == org]

    # process and save organism PIN file
    biogrid_df <- utils::read.delim(unz(tmp, org_file), check.names = FALSE, colClasses = "character")
    biogrid_pin <- data.frame(Interactor_A = biogrid_df[, "Official Symbol Interactor A"],
        Interactor_B = biogrid_df[, "Official Symbol Interactor B"])
    biogrid_pin <- process_pin(biogrid_pin)

    final_pin <- data.frame(intA = biogrid_pin$Interactor_A, pp = "pp", intB = biogrid_pin$Interactor_B)

    if (missing(path2pin)) {
        path2pin <- tempfile()
    }
    utils::write.table(final_pin, path2pin, sep = "\t", row.names = FALSE, col.names = FALSE,
        quote = FALSE)
    return(path2pin)
}

#' Retrieve Organism-specific PIN data
#'
#' @param source As of this version, this function is implemented to get data
#' from 'BioGRID' only. This argument (and this wrapper function) was implemented
#' for future utility
#' @inheritParams get_biogrid_pin
#' @param ... additional arguments for \code{\link{get_biogrid_pin}}
#'
#' @return the path of the file in which the PIN data was saved. If
#' \code{path2pin} was not supplied by the user, the PIN data is saved in a
#' temporary file
#' @export
#'
#' @examples
#' \dontrun{
#' pin_path <- get_pin_file()
#' }
get_pin_file <- function(source = "BioGRID", org = "Homo_sapiens", path2pin, ...) {
    ## TODO
    if (source != "BioGRID") {
        stop("As of this version, this function is implemented to get data from BioGRID only")
    }

    path2pin <- get_biogrid_pin(org = org, path2pin = path2pin, ...)
    return(path2pin)
}

#' Retrieve Gene Sets from GMT-format File
#'
#' @param path2gmt path to the gmt file
#' @param descriptions_idx index for descriptions (default = 2)
#'
#' @return list containing 2 elements: \itemize{
#' \item{gene_sets - A list containing the genes involved in each gene set}
#' \item{descriptions - A named vector containing the descriptions for each gene set}
#' }
gset_list_from_gmt <- function(path2gmt, descriptions_idx = 2) {
    gset_names_idx <- ifelse(descriptions_idx == 2, 1, 2)
    gmt_lines <- readLines(path2gmt)

    ## Genes list
    genes_list <- lapply(gmt_lines, function(x) {
        x <- unlist(strsplit(x, "\t"))
        x <- unique(x[3:length(x)])
        x <- x[x != ""]
        return(x)
    })

    names(genes_list) <- vapply(gmt_lines, function(x) {
        x <- unlist(strsplit(x, "\t"))
        return(x[gset_names_idx])
    }, "a")

    ## Descriptions vector
    descriptions_vec <- vapply(gmt_lines, function(x) {
        x <- unlist(strsplit(x, "\t"))
        return(x[descriptions_idx])
    }, "a")

    names(descriptions_vec) <- names(genes_list)

    # remove empty gene sets (if any)
    genes_list <- genes_list[vapply(genes_list, length, 1) != 0]
    descriptions_vec <- descriptions_vec[names(genes_list)]

    return(list(gene_sets = genes_list, descriptions = descriptions_vec))
}

#' Retrieve Organism-specific KEGG Pathway Gene Sets
#'
#' @param org_code KEGG organism code for the selected organism. For a full list
#' of all available organisms, see \url{https://www.genome.jp/kegg/catalog/org_list.html}
#'
#' @return list containing 2 elements: \itemize{
#' \item{gene_sets - A list containing KEGG IDs for the genes involved in each KEGG pathway}
#' \item{descriptions - A named vector containing the descriptions for each KEGG pathway}
#' }
#' @importFrom ggkegg pathway
get_kegg_gsets <- function(org_code = "hsa") {

  message("Grab a cup of coffee, this will take a while...")

  all_pathways_url <- paste0("https://rest.kegg.jp/list/pathway/", org_code)
  all_pathways_result <- safe_get_content(all_pathways_url)
  parsed_all_pathways_result <- strsplit(all_pathways_result, "\n")[[1]]
  pathway_ids <- vapply(parsed_all_pathways_result, function(x) unlist(strsplit(x, "\t"))[1], "id")
  pathway_descriptons <- vapply(parsed_all_pathways_result, function(x) unlist(strsplit(x, "\t"))[2], "description")
  names(pathway_descriptons) <- pathway_ids

  genes_by_pathway <- lapply(pathway_ids, function(pw_id) {
    pathways_graph <- pathway(pid = pw_id, directory = tempdir(), use_cache = FALSE, return_tbl_graph = FALSE)
    all_pw_kegg_ids <- igraph::V(pathways_graph)$name[igraph::V(pathways_graph)$type == "gene"]
    all_pw_kegg_ids <- unlist(strsplit(all_pw_kegg_ids, " "))
    all_pw_kegg_ids <- unique(all_pw_kegg_ids)
    return(all_pw_kegg_ids)
  })

  names(genes_by_pathway) <- pathway_ids

  # remove empty gene sets (e.g. pure metabolic pathways)
  kegg_genes <- genes_by_pathway[vapply(genes_by_pathway, length, 1) != 0]

  kegg_descriptions <- pathway_descriptons
  kegg_descriptions <- sub(" & .*$", "", sub("-([^-]*)$", "&\\1", kegg_descriptions))
  kegg_descriptions <- kegg_descriptions[names(kegg_descriptions) %in% names(kegg_genes)]

  result <- list(gene_sets = kegg_genes, descriptions = kegg_descriptions)
  return(result)
}

#' Retrieve Reactome Pathway Gene Sets
#'
#' @return Gets the latest Reactome pathways gene sets in gmt format. Parses the
#' gmt file and returns a list containing 2 elements: \itemize{
#' \item{gene_sets - A list containing the genes involved in each Reactome pathway}
#' \item{descriptions - A named vector containing the descriptions for each Reactome pathway}
#' }
#'
get_reactome_gsets <- function() {
    tmp <- tempfile()
    reactome_url <- "https://reactome.org/download/current/ReactomePathways.gmt.zip"
    utils::download.file(reactome_url, tmp, method = getOption("download.file.method"))

    reactome_gmt <- unz(tmp, "ReactomePathways.gmt")
    result <- gset_list_from_gmt(reactome_gmt, descriptions_idx = 1)
    close(reactome_gmt)

    # fix illegal char(s)
    result$descriptions <- gsub("[^ -~]", "", result$descriptions)
    return(result)
}

#' Retrieve Organism-specific MSigDB Gene Sets
#'
#' @param species species name for output genes, such as Homo sapiens, Mus musculus, etc.
#' See \code{\link[msigdbr]{msigdbr_species}} for all the species available in
#' the msigdbr package.
#' @param db_species Species abbreviation for the human or mouse databases ("HS" or "MM").
#' @param collection collection. e.g., H, C1. (default = NULL,
#' i.e. list all gene sets in collection). 
#' See \code{\link[msigdbr]{msigdbr_collections}} for all available options
#' the msigdbr package.
#' @param subcollection sub-collection, such as CGP, BP, etc. (default = NULL,
#' i.e. list all gene sets in collection). 
#' See \code{\link[msigdbr]{msigdbr_collections}} for all available options
#' the msigdbr package.
#'
#' @return Retrieves the MSigDB gene sets and returns a list containing 2 elements: \itemize{
#' \item{gene_sets - A list containing the genes involved in each of the selected MSigDB gene sets}
#' \item{descriptions - A named vector containing the descriptions for each selected MSigDB gene set}
#' }
#'
#' @details this function utilizes the function \code{\link[msigdbr]{msigdbr}}
#' from the \code{msigdbr} package to retrieve the 'Molecular Signatures Database'
#' (MSigDB) gene sets (Subramanian et al. 2005 <doi:10.1073/pnas.0506580102>,
#' Liberzon et al. 2015 <doi:10.1016/j.cels.2015.12.004>).
#' Available collections are: H: hallmark gene sets, C1: positional gene sets,
#' C2: curated gene sets, C3: motif gene sets, C4: computational gene sets,
#' C5: GO gene sets, C6: oncogenic signatures and C7: immunologic signatures
get_mgsigdb_gsets <- function(species = "Homo sapiens", db_species = "HS", collection = NULL, subcollection = NULL) {
    msig_df <- msigdbr::msigdbr(
      species = species, 
      collection = collection, 
      subcollection = subcollection, 
      db_species = db_species
    )

    ### create gene sets list
    all_gs_ids <- unique(msig_df$gs_id)
    msig_gsets_list <- list()
    for (id in all_gs_ids) {
        sub_df <- msig_df[msig_df$gs_id == id, ]
        msig_gsets_list[[id]] <- unique(sub_df$gene_symbol)
    }
    ### create gene sets descriptions
    msig_gsets_descriptions <- msig_df[, c("gs_name", "gs_id")]
    msig_gsets_descriptions <- unique(msig_gsets_descriptions)
    tmp <- msig_gsets_descriptions$gs_id
    msig_gsets_descriptions <- msig_gsets_descriptions$gs_name
    names(msig_gsets_descriptions) <- tmp

    result <- list(gene_sets = msig_gsets_list, descriptions = msig_gsets_descriptions)
    return(result)
}

#' Retrieve Organism-specific Gene Sets List
#'
#' @param source As of this version, either 'KEGG', 'Reactome' or 'MSigDB' (default = 'KEGG')
#' @param org_code (Used for 'KEGG' only) KEGG organism code for the selected organism. For a full list
#' of all available organisms, see \url{https://www.genome.jp/kegg/catalog/org_list.html}
#' @inheritParams get_mgsigdb_gsets
#'
#' @return A list containing 2 elements: \itemize{
#' \item{gene_sets - A list containing the genes involved in each gene set}
#' \item{descriptions - A named vector containing the descriptions for each gene set}
#' }. For 'KEGG' and 'MSigDB', it is possible to choose a specific organism. For a full list
#' of all available KEGG organisms, see \url{https://www.genome.jp/kegg/catalog/org_list.html}.
#' See \code{\link[msigdbr]{msigdbr_species}} for all the species available in
#' the msigdbr package used for obtaining 'MSigDB' gene sets.
#' For Reactome, there is only one collection of pathway gene sets.
#' @export
#'
get_gene_sets_list <- function(source = "KEGG", org_code = "hsa", species = "Homo sapiens", 
                               db_species = "HS", collection, subcollection = NULL) {
    if (source == "KEGG") {
        return(get_kegg_gsets(org_code))
    } else if (source == "Reactome") {
        message("For Reactome, there is only one collection of pathway gene sets.")
        return(get_reactome_gsets())
    } else if (source == "MSigDB") {
        return(
          get_mgsigdb_gsets(
            species = species, 
            db_species= db_species, 
            collection = collection, 
            subcollection = subcollection
          )
        )
    } else {
        stop("As of this version, this function is implemented to get data from KEGG, Reactome and MSigDB only")
    }
}


================================================
FILE: R/enrichment.R
================================================
#' Hypergeometric Distribution-based Hypothesis Testing
#'
#' @param term_genes vector of genes in the selected term gene set
#' @param chosen_genes vector containing the set of input genes
#' @param background_genes vector of background genes (i.e. universal set of
#' genes in the experiment)
#'
#' @return the p-value as determined using the hypergeometric distribution.
#'
#' @details To determine whether the \code{chosen_genes} are enriched
#' (compared to a background pool of genes) in the \code{term_genes}, the
#' hypergeometric distribution is assumed and the appropriate p value
#' (the value under the right tail) is calculated and returned.
#'
#' @export
#'
#' @examples
#' hyperg_test(letters[1:5], letters[2:5], letters)
#' hyperg_test(letters[1:5], letters[2:10], letters)
#' hyperg_test(letters[1:5], letters[2:13], letters)
hyperg_test <- function(term_genes, chosen_genes, background_genes) {
    #### Argument checks
    if (!is.atomic(term_genes)) {
        stop("`term_genes` should be a vector")
    }
    if (!is.atomic(chosen_genes)) {
        stop("`chosen_genes` should be a vector")
    }
    if (!is.atomic(background_genes)) {
        stop("`background_genes` should be a vector")
    }

    if (length(term_genes) > length(background_genes)) {
        stop("`term_genes` cannot be larger than `background_genes`!")
    }
    if (length(chosen_genes) > length(background_genes)) {
        stop("`chosen_genes` cannot be larger than `background_genes`!")
    }

    #### Calculate p value
    term_genes_selected <- sum(chosen_genes %in% term_genes)
    term_genes_in_pool <- sum(term_genes %in% background_genes)
    tot_genes_in_pool <- length(background_genes)
    non_term_genes_in_pool <- tot_genes_in_pool - term_genes_in_pool
    num_selected_genes <- length(chosen_genes)

    p_val <- stats::phyper(term_genes_selected - 1, term_genes_in_pool, non_term_genes_in_pool,
        num_selected_genes, lower.tail = FALSE)
    return(p_val)
}

#' Perform Enrichment Analysis for a Single Gene Set
#'
#' @param input_genes The set of gene symbols to be used for enrichment
#'   analysis. In the scope of this package, these are genes that were
#'   identified for an active subnetwork
#' @param genes_by_term List that contains genes for each gene set. Names of
#'   this list are gene set IDs (default = kegg_genes)
#' @param term_descriptions Vector that contains term descriptions for the
#'   gene sets. Names of this vector are gene set IDs (default = kegg_descriptions)
#' @param adj_method correction method to be used for adjusting p-values.
#'   (default = 'bonferroni')
#' @param enrichment_threshold adjusted-p value threshold used when filtering
#'   enrichment results (default = 0.05)
#' @param sig_genes_vec vector of significant gene symbols. In the scope of this
#'   package, these are the input genes that were used for active subnetwork search
#' @param background_genes vector of background genes. In the scope of this package,
#'   the background genes are taken as all genes in the PIN
#'   (see \code{\link{enrichment_analyses}})
#'
#' @return A data frame that contains enrichment results
#' @export
#' @seealso \code{\link[stats]{p.adjust}} for adjustment of p values. See
#'   \code{\link{run_pathfindR}} for the wrapper function of the pathfindR
#'   workflow. \code{\link{hyperg_test}} for the details on hypergeometric
#'   distribution-based hypothesis testing.
#' @examples
#' enrichment(
#'   input_genes = c('PER1', 'PER2', 'CRY1', 'CREB1'),
#'   sig_genes_vec = 'PER1',
#'   background_genes = unlist(pathfindR.data::kegg_genes)
#' )
enrichment <- function(input_genes, genes_by_term = pathfindR.data::kegg_genes, term_descriptions = pathfindR.data::kegg_descriptions,
    adj_method = "bonferroni", enrichment_threshold = 0.05, sig_genes_vec, background_genes) {
    #### Argument checks input genes
    if (!is.atomic(input_genes)) {
        stop("`input_genes` should be a vector of gene symbols")
    }

    ## gene sets data
    if (!is.list(genes_by_term)) {
        stop("`genes_by_term` should be a list of term gene sets")
    }
    if (is.null(names(genes_by_term))) {
        stop("`genes_by_term` should be a named list (names are gene set IDs)")
    }

    if (!is.atomic(term_descriptions)) {
        stop("`term_descriptions` should be a vector of term gene descriptions")
    }
    if (is.null(names(term_descriptions))) {
        stop("`term_descriptions` should be a named vector (names are gene set IDs)")
    }

    if (length(genes_by_term) != length(term_descriptions)) {
        stop("The lengths of `genes_by_term` and `term_descriptions` should be the same")
    }
    if (any(names(genes_by_term) != names(term_descriptions))) {
        stop("The names of `genes_by_term` and `term_descriptions` should all be the same")
    }

    ## enrichment threshold
    if (!is.numeric(enrichment_threshold)) {
        stop("`enrichment_threshold` should be a numeric value between 0 and 1")
    }
    if (enrichment_threshold < 0 | enrichment_threshold > 1) {
        stop("`enrichment_threshold` should be between 0 and 1")
    }

    ## signif. genes and background (universal set) genes
    if (!is.atomic(sig_genes_vec)) {
        stop("`sig_genes_vec` should be a vector")
    }
    if (!is.atomic(background_genes)) {
        stop("`background_genes` should be a vector")
    }

    #### Obtain p values
    enrichment_res <- vapply(genes_by_term, hyperg_test, 0.1, input_genes, background_genes)
    enrichment_res <- as.data.frame(enrichment_res)
    colnames(enrichment_res) <- "p_value"

    # Adjust p values
    idx <- order(enrichment_res$p_value)
    enrichment_res <- enrichment_res[idx, , drop = FALSE]
    enrichment_res$adj_p <- stats::p.adjust(enrichment_res$p, method = adj_method)


    #### Filter by adj-p
    cond <- enrichment_res$adj_p <= enrichment_threshold
    # Empty case (if all adj-p > threshold)
    if (sum(cond) == 0) {
        return(NULL)
    }
    enrichment_res <- enrichment_res[cond, ]

    #### Add other columns Term IDs
    enrichment_res$ID <- rownames(enrichment_res)

    ## Term descriptions
    idx <- match(enrichment_res$ID, names(term_descriptions))
    enrichment_res$Term_Description <- term_descriptions[idx]

    # Fold enrinchment
    gset_for_fe <- genes_by_term[rownames(enrichment_res)]
    A <- vapply(gset_for_fe, function(gset) length(intersect(sig_genes_vec, gset)),
        1L)/length(sig_genes_vec)
    B <- vapply(gset_for_fe, function(gset) length(intersect(background_genes, gset)),
        1L)/length(background_genes)
    enrichment_res$Fold_Enrichment <- A/B

    # Non-significant Subnetwork Genes
    non_sig_snw_genes <- base::setdiff(input_genes, sig_genes_vec)
    for (i in base::seq_len(nrow(enrichment_res))) {
        tmp <- intersect(non_sig_snw_genes, genes_by_term[[enrichment_res$ID[i]]])
        enrichment_res$non_Signif_Snw_Genes[i] <- paste(tmp, collapse = ", ")
    }

    ## reorder columns
    to_order <- c("ID", "Term_Description", "Fold_Enrichment", "p_value", "adj_p",
        "non_Signif_Snw_Genes")
    enrichment_res <- enrichment_res[, to_order]

    return(enrichment_res)
}

#' Perform Enrichment Analyses on the Input Subnetworks
#'
#' @param snws a list of subnetwork genes (i.e., vectors of genes for each subnetwork)
#' @inheritParams enrichment
#' @inheritParams return_pin_path
#' @param list_active_snw_genes boolean value indicating whether or not to report
#' the non-significant active subnetwork genes for the active subnetwork which was enriched for
#' the given term with the lowest p value (default = \code{FALSE})
#'
#' @return a dataframe of combined enrichment results. Columns are: \describe{
#'   \item{ID}{ID of the enriched term}
#'   \item{Term_Description}{Description of the enriched term}
#'   \item{Fold_Enrichment}{Fold enrichment value for the enriched term}
#'   \item{p_value}{p value of enrichment}
#'   \item{adj_p}{adjusted p value of enrichment}
#'   \item{support}{the support (proportion of active subnetworks leading to enrichment over all subnetworks) for the gene set}
#'   \item{non_Signif_Snw_Genes (OPTIONAL)}{the non-significant active subnetwork genes, comma-separated}
#' }
#'
#' @export
#'
#' @seealso \code{\link{enrichment}} for the enrichment analysis for a single gene set
#'
#' @examples
#' enr_res <- enrichment_analyses(
#'   snws = example_active_snws[1:2],
#'   sig_genes_vec = example_pathfindR_input$Gene.symbol[1:25],
#'   pin_name_path = 'KEGG'
#' )
enrichment_analyses <- function(snws, sig_genes_vec, pin_name_path = "Biogrid", genes_by_term = pathfindR.data::kegg_genes,
    term_descriptions = pathfindR.data::kegg_descriptions, adj_method = "bonferroni",
    enrichment_threshold = 0.05, list_active_snw_genes = FALSE) {
    ### Argument check
    if (!is.logical(list_active_snw_genes)) {
        stop("`list_active_snw_genes` should be either TRUE or FALSE")
    }

    ### Load PIN Data
    pin_path <- return_pin_path(pin_name_path)
    pin <- utils::read.delim(file = pin_path, header = FALSE)

    background_genes <- unique(c(pin[, 1], pin[, 3]))

    # turn all to upper case for best match
    genes_by_term <- lapply(genes_by_term, base::toupper)
    sig_genes_vec <- base::toupper(sig_genes_vec)
    background_genes <- base::toupper(background_genes)

    ############ Enrichment per subnetwork
    enrichment_res <- lapply(snws, function(x) {
        enrichment(input_genes = base::toupper(x), genes_by_term = genes_by_term,
            term_descriptions = term_descriptions, adj_method = adj_method, enrichment_threshold = enrichment_threshold,
            sig_genes_vec = sig_genes_vec, background_genes = background_genes)
    })

    ### indices for snw.s
    if (length(enrichment_res) != 0) {
        for (i in seq_len(length(enrichment_res))) {
            if (!is.null(enrichment_res[[i]])) {
                enrichment_res[[i]]$snw_idx <- i
            }
        }
    }

    ############ Combine Enrichments Results for All Subnetworks
    enrichment_res <- Reduce(rbind, enrichment_res)

    ############ Process if non-empty
    if (!is.null(enrichment_res)) {
        ## calculate support values
        support <- tapply(enrichment_res$snw_idx, enrichment_res$ID, length)
        support <- support/length(snws)
        enrichment_res$support <- support[match(enrichment_res$ID, names(support))]
        enrichment_res$snw_idx <- NULL

        ## delete non_Signif_Snw_Genes if list_active_snw_genes == FALSE
        if (!list_active_snw_genes) {
            enrichment_res$non_Signif_Snw_Genes <- NULL
        }

        ## keep lowest p for each term
        idx <- order(enrichment_res$adj_p)
        enrichment_res <- enrichment_res[idx, ]
        enrichment_res <- enrichment_res[!duplicated(enrichment_res$ID), ]
    }
    return(enrichment_res)
}


#' Summarize Enrichment Results
#'
#' @param enrichment_res a dataframe of combined enrichment results. Columns are: \describe{
#'   \item{ID}{ID of the enriched term}
#'   \item{Term_Description}{Description of the enriched term}
#'   \item{Fold_Enrichment}{Fold enrichment value for the enriched term}
#'   \item{p_value}{p value of enrichment}
#'   \item{adj_p}{adjusted p value of enrichment}
#'   \item{non_Signif_Snw_Genes (OPTIONAL)}{the non-significant active subnetwork genes, comma-separated}
#' }
#' @inheritParams enrichment_analyses
#'
#' @return a dataframe of summarized enrichment results (over multiple iterations). Columns are: \describe{
#'   \item{ID}{ID of the enriched term}
#'   \item{Term_Description}{Description of the enriched term}
#'   \item{Fold_Enrichment}{Fold enrichment value for the enriched term}
#'   \item{occurrence}{the number of iterations that the given term was found to enriched over all iterations}
#'   \item{support}{the median support (proportion of active subnetworks leading to enrichment within an iteration) over all iterations}
#'   \item{lowest_p}{the lowest adjusted-p value of the given term over all iterations}
#'   \item{highest_p}{the highest adjusted-p value of the given term over all iterations}
#'   \item{non_Signif_Snw_Genes (OPTIONAL)}{the non-significant active subnetwork genes, comma-separated}
#' }
#' @export
#'
#' @examples
#' \dontrun{
#' summarize_enrichment_results(enrichment_res)
#' }
summarize_enrichment_results <- function(enrichment_res, list_active_snw_genes = FALSE) {
    message("## Processing the enrichment results over all iterations")

    ## Argument checks
    if (!is.logical(list_active_snw_genes)) {
        stop("`list_active_snw_genes` should be either TRUE or FALSE")
    }

    nec_cols <- c("ID", "Term_Description", "Fold_Enrichment", "p_value", "adj_p",
        "support")
    if (list_active_snw_genes) {
        nec_cols <- c(nec_cols, "non_Signif_Snw_Genes")
    }

    if (!is.data.frame(enrichment_res)) {
        stop("`enrichment_res` should be a data frame")
    }

    if (ncol(enrichment_res) != length(nec_cols)) {
        stop("`enrichment_res` should have exactly ", length(nec_cols), " columns")
    }

    if (!all(nec_cols %in% colnames(enrichment_res))) {
        stop("`enrichment_res` should have column names ", paste(dQuote(nec_cols),
            collapse = ", "))
    }

    ## Annotate lowest p, highest p, occurrence and median support
    final_res <- enrichment_res
    lowest_p <- tapply(enrichment_res$adj_p, enrichment_res$ID, min)
    highest_p <- tapply(enrichment_res$adj_p, enrichment_res$ID, max)
    occurrence <- tapply(enrichment_res$adj_p, enrichment_res$ID, length)
    support <- tapply(enrichment_res$support, enrichment_res$ID, stats::median)

    matched_idx <- match(final_res$ID, names(lowest_p))
    final_res$lowest_p <- as.numeric(lowest_p[matched_idx])

    matched_idx <- match(final_res$ID, names(highest_p))
    final_res$highest_p <- as.numeric(highest_p[matched_idx])

    matched_idx <- match(final_res$ID, names(occurrence))
    final_res$occurrence <- as.numeric(occurrence[matched_idx])

    matched_idx <- match(final_res$ID, names(support))
    final_res$support <- as.numeric(support[matched_idx])

    ## reorder columns
    keep <- c("ID", "Term_Description", "Fold_Enrichment", "occurrence", "support",
        "lowest_p", "highest_p")
    if (list_active_snw_genes) {
        keep <- c(keep, "non_Signif_Snw_Genes")
    }
    final_res <- final_res[, keep]

    ## keep data with lowest p-value over all iterations
    final_res <- final_res[order(final_res$lowest_p), ]
    final_res <- final_res[!duplicated(final_res$ID), ]
    rownames(final_res) <- NULL

    return(final_res)
}


================================================
FILE: R/pathfindr.R
================================================
#' pathfindR: A package for Enrichment Analysis Utilizing Active Subnetworks
#'
#' pathfindR is a tool for active-subnetwork-oriented gene set enrichment analysis.
#' The main aim of the package is to identify active subnetworks in a
#' protein-protein interaction network using a user-provided list of genes
#' and associated p values then performing enrichment analyses on the identified
#' subnetworks, discovering enriched terms (i.e. pathways, gene ontology, TF target
#' gene sets etc.) that possibly underlie the phenotype of interest.
#'
#' For analysis on non-Homo sapiens organisms, pathfindR offers utility functions
#' for obtaining organism-specific PIN data and organism-specific gene sets data.
#'
#' pathfindR also offers functionalities to cluster the enriched terms and
#' identify representative terms in each cluster, to score the enriched terms
#' per sample and to visualize analysis results.
#'
#'
#' @seealso See \code{\link{run_pathfindR}} for details on the pathfindR
#' active-subnetwork-oriented enrichment analysis
#' See \code{\link{cluster_enriched_terms}} for details on methods of enriched
#' terms clustering to define clusters of biologically-related terms
#' See \code{\link{score_terms}} for details on agglomerated score calculation
#' for enriched terms to investigate how a gene set is altered in a given sample
#' (or in cases vs. controls)
#' See \code{\link{term_gene_heatmap}} for details on visualization of the heatmap
#' of enriched terms by involved genes
#' See \code{\link{term_gene_graph}} for details on visualizing terms and
#' term-related genes as a graph to determine the degree of overlap between the
#' enriched terms by identifying shared and/or distinct significant genes
#' See \code{\link{UpSet_plot}} for details on creating an UpSet plot of the
#' enriched terms.
#' See \code{\link{get_pin_file}} for obtaining organism-specific PIN data and
#' \code{\link{get_gene_sets_list}} for obtaining organism-specific gene sets data
#' @import pathfindR.data
#' @name pathfindR
"_PACKAGE"

globalVariables(c("for_coloring", "size"))


================================================
FILE: R/scoring.R
================================================
#' Calculate Agglomerated Scores of Enriched Terms for Each Subject
#'
#' @param enrichment_table a data frame that must contain the 3 columns below: \describe{
#'   \item{Term_Description}{Description of the enriched term (necessary if \code{use_description = TRUE})}
#'   \item{ID}{ID of the enriched term (necessary if \code{use_description = FALSE})}
#'   \item{Up_regulated}{the up-regulated genes in the input involved in the given term's gene set, comma-separated}
#'   \item{Down_regulated}{the down-regulated genes in the input involved in the given term's gene set, comma-separated}
#' }
#' @param exp_mat the experiment (e.g., gene expression/methylation) matrix.
#' Columns are samples and rows are genes. Column names must contain sample
#' names and row names must contain the gene symbols.
#' @param cases (Optional) A vector of sample names that are cases in the
#' case/control experiment. (default = NULL)
#' @param use_description Boolean argument to indicate whether term descriptions
#'  (in the 'Term_Description' column) should be used. (default = \code{FALSE})
#' @param plot_hmap Boolean value to indicate whether or not to draw the
#' heatmap plot of the scores. (default = TRUE)
#' @param ... Additional arguments for \code{\link{plot_scores}} for aesthetics
#' of the heatmap plot
#'
#' @return Matrix of agglomerated scores of each enriched term per sample.
#' Columns are samples, rows are enriched terms. Optionally, displays a heatmap
#' of this matrix.
#'
#' @section Conceptual Background:
#' For an experiment matrix (containing expression, methylation, etc. values),
#' the rows of which are genes and the columns of which are samples,
#' we denote: \itemize{
#' \item E as a matrix of size \ifelse{html}{\out{m x n}}{\eqn{m \times n}}
#' \item G as the set of all genes in the experiment \ifelse{html}{\out{G = E<sub>i.</sub>,  i &#8712; [1, m]}}{\eqn{G = E_{i\cdot},  \ \ i \in [1, m]}}
#' \item S as the set of all samples in the experiment \ifelse{html}{\out{S = E<sub>.j</sub>,  i &#8712; [1, n]}}{\eqn{S = E_{j\cdot},  \ \ \in [1, n]}}
#' }
#'
#' We next define the gene score matrix GS (the standardized experiment matrix,
#' also of size \ifelse{html}{\out{m x n}}{\eqn{m \times n}}) as:
#'
#' \ifelse{html}{\out{GS<sub>gs</sub> = (E<sub>gs</sub> - &#x113;<sub>g</sub>) / s<sub>g</sub>}}{\eqn{GS_{gs} = \frac{E_{gs} - \bar{e_g}}{s_g}}}
#'
#' where \ifelse{html}{\out{g &#8712; G}}{\eqn{g \in G}}, \ifelse{html}{\out{s &#8712; S}}{\eqn{s \in S}},
#' \ifelse{html}{\out{&#x113;<sub>g</sub>}}{\eqn{\bar{e_g}}} is the mean of
#' all values for gene g and \ifelse{html}{\out{s<sub>g</sub>}}{\eqn{\bar{s_g}}}
#' is the standard deviation of all values for gene g.
#'
#' We next denote T to be a set of terms (where each \ifelse{html}{\out{t &#8712; T}}{\eqn{t \in T}}
#' is a set of term-related genes, i.e.,
#' \ifelse{html}{\out{t = \{g<sub>x</sub>, ..., g<sub>y</sub>\} &sub; G}}{\eqn{t = \{g_x, ..., g_y\} \subset G}})
#' and finally define the agglomerated term scores matrix TS (where rows
#' correspond to genes and columns corresponds to samples s.t. the matrix has size
#' \ifelse{html}{\out{|T| x n}}{\eqn{|T| \times n}}) as:
#'
#' \ifelse{html}{\out{TS<sub>ts</sub> = 1/|t| &#x2211; <sub>g &#8712; t</sub> GS<sub>gs</sub>}}{\eqn{TS_{ts} = \frac{1}{|t|}\sum_{g \in t} GS_{gs}}},
#' where \ifelse{html}{\out{t &#8712; T}}{\eqn{t \in T}} and \ifelse{html}{\out{s &#8712; S}}{\eqn{s \in S}}.
#'
#' @export
#'
#' @examples
#' score_matrix <- score_terms(
#'   example_pathfindR_output,
#'   example_experiment_matrix,
#'   plot_hmap = FALSE
#' )
score_terms <- function(enrichment_table, exp_mat, cases = NULL, use_description = FALSE,
    plot_hmap = TRUE, ...) {
    #### Argument Checks
    if (!is.logical(use_description)) {
        stop("`use_description` should either be TRUE or FALSE")
    }

    if (!is.logical(plot_hmap)) {
        stop("`plot_hmap` should either be TRUE or FALSE")
    }

    if (!is.data.frame(enrichment_table)) {
        stop("`enrichment_table` should be a data frame of enrichment results")
    }
    ID_column <- ifelse(use_description, "Term_Description", "ID")
    nec_cols <- c(ID_column, "Up_regulated", "Down_regulated")
    if (!all(nec_cols %in% colnames(enrichment_table))) {
        stop("`enrichment_table` should contain all of ", paste(dQuote(nec_cols),
            collapse = ", "))
    }

    if (!is.matrix(exp_mat)) {
        stop("`exp_mat` should be a matrix")
    }

    if (!is.null(cases)) {
        if (!is.atomic(cases)) {
            stop("`cases` should be a vector")
        }

        if (!all(cases %in% colnames(exp_mat))) {
            stop("Missing `cases` in `exp_mat`")
        }
    }

    ## fix duplicated term descriptions (if using description)
    if (use_description) {
        dup_desc <- enrichment_table$Term_Description[duplicated(enrichment_table$Term_Description)]

        tmp <- ifelse(enrichment_table$Term_Description %in% dup_desc, paste0(enrichment_table$Term_Description,
            "_", enrichment_table$ID), enrichment_table$Term_Description)
        enrichment_table$Term_Description <- tmp
    }

    #### Create score matrix
    all_scores_matrix <- c()
    for (i in base::seq_len(nrow(enrichment_table))) {
        # Get signif. genes
        up_genes <- enrichment_table$Up_regulated[i]
        down_genes <- enrichment_table$Down_regulated[i]
        up_genes <- unlist(strsplit(up_genes, ", "))
        down_genes <- unlist(strsplit(down_genes, ", "))

        genes <- c(up_genes, down_genes)

        # convert gene symbols to upper case for comparison
        genes <- toupper(genes)
        exp_mat_genes <- rownames(exp_mat)
        exp_mat_genes <- toupper(exp_mat_genes)

        # some genes may not be in exp. matrix
        genes <- genes[genes %in% exp_mat_genes]

        if (length(genes) != 0) {
            # subset exp. matrix to include only genes
            sub_mat <- exp_mat[exp_mat_genes %in% genes, , drop = FALSE]

            current_term_score_matrix <- c()
            for (gene in genes) {
                gene_vec <- sub_mat[toupper(rownames(sub_mat)) == gene, ]
                gene_vec <- as.numeric(gene_vec)
                names(gene_vec) <- colnames(sub_mat)

                # calculate mean and sd across samples
                gene_mean <- base::mean(gene_vec)
                gene_sd <- stats::sd(gene_vec)

                gene_scores <- vapply(gene_vec, function(x) (x - gene_mean)/gene_sd,
                  1.2)
                current_term_score_matrix <- rbind(current_term_score_matrix, gene_scores)
                rownames(current_term_score_matrix)[nrow(current_term_score_matrix)] <- gene
            }

            current_term_scores <- apply(current_term_score_matrix, 2, base::mean)
            all_scores_matrix <- rbind(all_scores_matrix, current_term_scores)
            rownames(all_scores_matrix)[nrow(all_scores_matrix)] <- enrichment_table[i,
                ID_column]
        }
    }

    if (!is.null(cases)) {
        ## order as cases, then controls
        match1 <- match(cases, colnames(all_scores_matrix))
        match2 <- setdiff(base::seq_len(ncol(all_scores_matrix)), match1)
        all_scores_matrix <- all_scores_matrix[, c(match1, match2)]
    }

    if (plot_hmap) {
        heatmap <- plot_scores(score_matrix = all_scores_matrix, cases = cases, ...)
        graphics::plot(heatmap)
    }

    return(all_scores_matrix)
}

#' Plot the Heatmap of Score Matrix of Enriched Terms per Sample
#'
#' @param score_matrix Matrix of agglomerated enriched term scores per sample. Columns are
#' samples, rows are enriched terms
#' @inheritParams score_terms
#' @param label_samples Boolean value to indicate whether or not to label the
#' samples in the heatmap plot (default = TRUE)
#' @param case_title Naming of the 'Case' group (as in \code{cases}) (default = 'Case')
#' @param control_title Naming of the 'Control' group (default = 'Control')
#' @param low a string indicating the color of 'low' values in the coloring gradient (default = 'green')
#' @param mid a string indicating the color of 'mid' values in the coloring gradient (default = 'black')
#' @param high a string indicating the color of 'high' values in the coloring gradient (default = 'red')
#'
#' @return A `ggplot2` object containing the heatmap plot. x-axis indicates
#' the samples. y-axis indicates the enriched terms. 'Score' indicates the
#' score of the term in a given sample. If \code{cases} are provided, the plot is
#' divided into 2 facets, named by \code{case_title} and \code{control_title}.
#'
#' @import ggplot2
#' @export
#'
#' @examples
#' score_matrix <- score_terms(
#'   example_pathfindR_output,
#'   example_experiment_matrix,
#'   plot_hmap = FALSE
#' )
#' hmap <- plot_scores(score_matrix)
plot_scores <- function(score_matrix, cases = NULL, label_samples = TRUE, case_title = "Case",
    control_title = "Control", low = "green", mid = "black", high = "red") {
    #### Argument Checks
    if (!is.matrix(score_matrix)) {
        stop("`score_matrix` should be a matrix")
    }

    if (!is.null(cases)) {
        if (!is.atomic(cases)) {
            stop("`cases` should be a vector")
        }

        if (!all(cases %in% colnames(score_matrix))) {
            stop("Missing `cases` in `score_matrix`")
        }
    }

    if (!is.logical(label_samples)) {
        stop("`label_samples` should be TRUE or FALSE")
    }

    if (!is.character(case_title) | length(case_title) != 1) {
        stop("`case_title` should be a single character value")
    }

    if (!is.character(control_title) | length(control_title) != 1) {
        stop("`control_title` should be a single character value")
    }

    if (!isColor(low)) {
      stop("`low` should be a valid color")
    }

    if (!isColor(mid)) {
      stop("`mid` should be a valid color")
    }

    if (!isColor(high)) {
      stop("`high` should be a valid color")
    }

    #### Create plot sort according to activity (up/down)
    if (!is.null(cases)) {
        tmp <- rowMeans(score_matrix[, cases, drop = FALSE])
        score_matrix <- score_matrix[c(which(tmp >= 0), which(tmp < 0)), ]
    }

    ## transform the matrix
    var_names <- list()
    var_names[["Term"]] <- factor(rownames(score_matrix), levels = rev(rownames(score_matrix)))
    var_names[["Sample"]] <- factor(colnames(score_matrix), levels = colnames(score_matrix))

    score_df <- expand.grid(var_names, KEEP.OUT.ATTRS = FALSE, stringsAsFactors = FALSE)
    scores <- as.vector(score_matrix)
    scores <- data.frame(scores)
    score_df <- cbind(score_df, scores)
    if (!is.null(cases)) {
        score_df$Type <- ifelse(score_df$Sample %in% cases, case_title, control_title)
        score_df$Type <- factor(score_df$Type, levels = c(case_title, control_title))
    }

    g <- ggplot2::ggplot(score_df, ggplot2::aes(x = .data$Sample, y = .data$Term))
    g <- g + ggplot2::geom_tile(ggplot2::aes(fill = .data$scores), color = "white")
    g <- g + ggplot2::scale_fill_gradient2(low = low, mid = mid, high = high)
    g <- g + ggplot2::theme(axis.title.x = ggplot2::element_blank(), axis.title.y = ggplot2::element_blank(),
        axis.text.x = ggplot2::element_text(angle = 45, hjust = 1), legend.title = ggplot2::element_text(size = 10),
        legend.text = ggplot2::element_text(size = 12))
    g <- g + ggplot2::labs(fill = "Score")
    if (!is.null(cases)) {
        g <- g + ggplot2::facet_grid(~Type, scales = "free_x", space = "free")
        g <- g + ggplot2::theme(strip.text.x = ggplot2::element_text(size = 12, face = "bold"))
    }
    if (!label_samples) {
        g <- g + ggplot2::theme(axis.text.x = ggplot2::element_blank(), axis.ticks.x = ggplot2::element_blank())
    }
    return(g)
}


================================================
FILE: R/utility.R
================================================
#' Active Subnetwork Search + Enrichment Analysis Wrapper for a Single Iteration
#'
#' @param i current iteration index (default = \code{NULL})
#' @param dirs vector of directories for parallel runs
#' @inheritParams active_snw_search
#' @inheritParams enrichment_analyses
#' @inheritParams active_snw_enrichment_wrapper
#'
#' @return Data frame of enrichment results using active subnetwork search results
single_iter_wrapper <- function(i = NULL, dirs, input_processed, pin_path, score_quan_thr,
    sig_gene_thr, search_method, silent_option, use_all_positives, geneInitProbs,
    saTemp0, saTemp1, saIter, gaPop, gaIter, gaThread, gaCrossover, gaMut, grMaxDepth,
    grSearchDepth, grOverlap, grSubNum, gset_list, adj_method, enrichment_threshold,
    list_active_snw_genes) {
    snws_file <- "active_snws"
    dir_for_parallel_run <- NULL
    if (!is.null(i)) {
        snws_file <- paste0("active_snws_", i)
        dir_for_parallel_run <- dirs[i]
    }
    snws <- active_snw_search(input_for_search = input_processed, pin_name_path = pin_path,
        snws_file = snws_file, dir_for_parallel_run = dir_for_parallel_run, score_quan_thr = score_quan_thr,
        sig_gene_thr = sig_gene_thr, search_method = search_method, seedForRandom = ifelse(is.null(i),
            1234, i), silent_option = silent_option, use_all_positives = use_all_positives,
        geneInitProbs = ifelse(!is.null(i), geneInitProbs[i], geneInitProbs), saTemp0 = saTemp0,
        saTemp1 = saTemp1, saIter = saIter, gaPop = gaPop, gaIter = gaIter, gaThread = gaThread,
        gaCrossover = gaCrossover, gaMut = gaMut, grMaxDepth = grMaxDepth, grSearchDepth = grSearchDepth,
        grOverlap = grOverlap, grSubNum = grSubNum)
    enrichment_res <- enrichment_analyses(snws = snws, sig_genes_vec = input_processed$GENE,
        pin_name_path = pin_path, genes_by_term = gset_list$genes_by_term, term_descriptions = gset_list$term_descriptions,
        adj_method = adj_method, enrichment_threshold = enrichment_threshold, list_active_snw_genes = list_active_snw_genes)
    return(enrichment_res)
}


#' Wrapper for Active Subnetwork Search + Enrichment over Single/Multiple Iteration(s)
#'
#' @param input_processed processed input data frame
#' @param pin_path path/to/PIN/file
#' @param gset_list list for gene sets
#' @param disable_parallel boolean to indicate whether to disable parallel runs
#'  via \code{foreach} (default = FALSE)
#' @inheritParams run_pathfindR
#' @inheritParams active_snw_search
#' @inheritParams enrichment_analyses
#' @param iterations number of iterations for active subnetwork search and
#'  enrichment analyses (Default = 10)
#' @param n_processes optional argument for specifying the number of processes
#'  used by foreach. If not specified, the function determines this
#'  automatically (Default == NULL. Gets set to 1 for Genetic Algorithm)
#'
#' @return Data frame of combined pathfindR enrichment results
active_snw_enrichment_wrapper <- function(input_processed, pin_path, gset_list, enrichment_threshold,
    list_active_snw_genes, adj_method = "bonferroni", search_method = "GR", disable_parallel = FALSE,
    use_all_positives = FALSE, iterations = 10, n_processes = NULL, score_quan_thr = 0.8,
    sig_gene_thr = 0.02, saTemp0 = 1, saTemp1 = 0.01, saIter = 10000, gaPop = 400,
    gaIter = 200, gaThread = 5, gaCrossover = 1, gaMut = 0, grMaxDepth = 1, grSearchDepth = 1,
    grOverlap = 0.5, grSubNum = 1000, silent_option = TRUE) {
    message("## Performing Active Subnetwork Search and Enrichment")
    ############ Argument checks Active Subnetwork Search Method
    valid_mets <- c("GR", "SA", "GA")
    if (!search_method %in% valid_mets) {
        stop("`search_method` should be one of ", paste(dQuote(valid_mets), collapse = ", "))
    }

    ## If search_method is GA, set iterations as 1
    if (search_method == "GA") {
        warning("`iterations` is set to 1 because `search_method = \"GA\"`", call. = FALSE)
        iterations <- 1
    }

    if (!is.null(n_processes)) {
        if (!is.numeric(n_processes)) {
            stop("`n_processes` should be either NULL or a positive integer")
        }
        if (n_processes < 1) {
            stop("`n_processes` should be > 1")
        }
    }

    # calculate the number of processes, if necessary
    if (is.null(n_processes)) {
        n_processes <- parallel::detectCores() - 1
    }

    ## If iterations < n_processes, set n_processes to iterations
    if (iterations < n_processes & iterations != 1) {
        message("`n_processes` is set to `iterations` because `iterations` < `n_processes`")
        n_processes <- iterations
    }

    if (!is.logical(use_all_positives)) {
        stop("`use_all_positives` should be either TRUE or FALSE")
    }

    if (!is.logical(silent_option)) {
        stop("`silent_option` should be either TRUE or FALSE")
    }

    if (!is.logical(disable_parallel)) {
        stop("`disable_parallel` should be either TRUE or FALSE")
    }

    if (!is.numeric(iterations)) {
        stop("`iterations` should be a positive integer")
    }
    if (iterations < 1) {
        stop("`iterations` should be >= 1")
    }

    geneInitProbs <- 0.1
    dirs <- c()
    if (iterations > 1) {
        geneInitProbs <- seq(from = 0.01, to = 0.2, length.out = iterations)

        for (i in base::seq_len(iterations)) {
            dir_i <- file.path("active_snw_searches", paste0("Iteration_", i))
            dir.create(dir_i, recursive = TRUE, showWarnings = FALSE)
            dirs <- c(dirs, dir_i)
        }
    }

    if (iterations == 1) {
        combined_res <- single_iter_wrapper(i = NULL, dirs, input_processed, pin_path,
            score_quan_thr, sig_gene_thr, search_method, silent_option, use_all_positives,
            geneInitProbs, saTemp0, saTemp1, saIter, gaPop, gaIter, gaThread, gaCrossover,
            gaMut, grMaxDepth, grSearchDepth, grOverlap, grSubNum, gset_list, adj_method,
            enrichment_threshold, list_active_snw_genes)
    } else {
        if (!disable_parallel) {
            cl <- parallel::makeCluster(n_processes, setup_strategy = "sequential")
            doParallel::registerDoParallel(cl)
            `%dopar%` <- foreach::`%dopar%`
            combined_res <- foreach::foreach(i = 1:iterations, .combine = rbind,
                .packages = "pathfindR") %dopar% {
                single_iter_wrapper(i, dirs, input_processed, pin_path, score_quan_thr,
                  sig_gene_thr, search_method, silent_option, use_all_positives,
                  geneInitProbs, saTemp0, saTemp1, saIter, gaPop, gaIter, gaThread,
                  gaCrossover, gaMut, grMaxDepth, grSearchDepth, grOverlap, grSubNum,
                  gset_list, adj_method, enrichment_threshold, list_active_snw_genes)
            }
            parallel::stopCluster(cl)
        } else {
            combined_res <- c()
            for (i in 1:iterations) {
                current_res <- single_iter_wrapper(i, dirs, score_quan_thr, sig_gene_thr,
                  search_method, silent_option, use_all_positives, geneInitProbs,
                  saTemp0, saTemp1, saIter, gaPop, gaIter, gaThread, gaCrossover,
                  gaMut, grMaxDepth, grSearchDepth, grOverlap, grSubNum, gset_list,
                  adj_method, enrichment_threshold, list_active_snw_genes)
                combined_res <- rbind(combined_res, current_res)
            }
        }
    }
    return(combined_res)
}


#' Configure Output Directory Name
#'
#' @inheritParams run_pathfindR
#'
#' @return /path/to/output/dir
configure_output_dir <- function(output_dir = NULL) {
    output_dir_init <- output_dir
    output_dir <- ifelse(is.null(output_dir), file.path(tempdir(check = TRUE), "pathfindR_results"),
        output_dir)
    dir_changed <- FALSE
    while (dir.exists(output_dir)) {
        output_dir <- sub("/$", "", output_dir)
        if (grepl("\\(\\d+\\)$", output_dir)) {
            output_dir <- unlist(strsplit(output_dir, "\\("))
            suffix <- as.numeric(sub("\\)", "", output_dir[2])) + 1
            output_dir <- paste0(output_dir[1], "(", suffix, ")")
        } else {
            output_dir <- paste0(output_dir, "(1)")
        }
        dir_changed <- TRUE
    }

    if (dir_changed & !is.null(output_dir_init)) {
        message(paste0("There is already a directory named \"", output_dir_init,
            "\".\nWriting the result to \"", output_dir, "\" not to overwrite any previous results."))
    }
    return(output_dir)
}

#' Create HTML Report of pathfindR Results
#'
#' @inheritParams run_pathfindR
#' @param input_processed processed input data frame
#' @param final_res final pathfindR result data frame
#' @param dir_for_report directory to render the report in
create_HTML_report <- function(input, input_processed, final_res, dir_for_report) {
    message("## Creating HTML report")
    rmarkdown::render(input = system.file("rmd", "results.Rmd", package = "pathfindR"),
        output_dir = dir_for_report)
    rmarkdown::render(input = system.file("rmd", "enriched_terms.Rmd", package = "pathfindR"),
        params = list(df = final_res), output_dir = dir_for_report)
    rmarkdown::render(input = system.file("rmd", "conversion_table.Rmd", package = "pathfindR"),
        params = list(df = input_processed, original_df = input), output_dir = dir_for_report)
}

#' Input Testing
#'
#' @param input the input data that pathfindR uses. The input must be a data
#'   frame with three columns: \enumerate{
#'   \item Gene Symbol (Gene Symbol)
#'   \item Change value, e.g. log(fold change) (OPTIONAL)
#'   \item p value, e.g. adjusted p value associated with differential expression
#' }
#' @param p_val_threshold the p value threshold to use when filtering
#'   the input data frame. Must a numeric value between 0 and 1. (default = 0.05)
#'
#' @return Only checks if the input and the threshold follows the required
#'   specifications.
#' @export
#' @seealso See \code{\link{run_pathfindR}} for the wrapper function of the
#'   pathfindR workflow
#' @examples
#' input_testing(example_pathfindR_input, 0.05)
input_testing <- function(input, p_val_threshold = 0.05) {
    message("## Testing input")

    if (!is.data.frame(input)) {
        stop("the input is not a data frame")
    }

    if (ncol(input) != 2 & ncol(input) != 3) {
        stop("the input should have 2 or 3 columns")
    }

    if (nrow(input) < 2) {
        stop("There must be at least 2 rows (genes) in the input data frame")
    }

    if (!is.numeric(p_val_threshold)) {
        stop("`p_val_threshold` must be a numeric value between 0 and 1")
    }

    if (p_val_threshold > 1 | p_val_threshold < 0) {
        stop("`p_val_threshold` must be between 0 and 1")
    }

    # if changes are provided, p vals are in col. 3, else in col. 2
    p_column <- ifelse(ncol(input) == 3, 3, 2)

    if (any(is.na(input[, p_column]))) {
        stop("p values cannot contain NA values")
    }

    if (!all(is.numeric(input[, p_column]))) {
        stop("p values must all be numeric")
    }

    if (any(input[, p_column] > 1 | input[, p_column] < 0)) {
        stop("p values must all be between 0 and 1")
    }

    message("The input looks OK")
}

#' Process Input
#' @inheritParams input_testing
#' @inheritParams active_snw_search
#' @inheritParams return_pin_path
#' @param convert2alias boolean to indicate whether or not to convert gene symbols
#' in the input that are not found in the PIN to an alias symbol found in the PIN
#' (default = TRUE) IMPORTANT NOTE: the conversion uses human gene symbols/alias symbols.
#'
#' @return This function first filters the input so that all p values are less
#'   than or equal to the threshold. Next, gene symbols that are not found in
#'   the PIN are identified. If aliases of these gene symbols are found in the
#'   PIN, the symbols are converted to the corresponding aliases. The
#'   resulting data frame containing the original gene symbols, the updated
#'   symbols, change values and p values is then returned.
#' @export
#'
#' @seealso See \code{\link{run_pathfindR}} for the wrapper function of the
#'   pathfindR workflow
#'
#' @examples
#' processed_df <- input_processing(
#'   input = example_pathfindR_input[1:5, ],
#'   pin_name_path = 'KEGG'
#' )
#' processed_df <- input_processing(
#'   input = example_pathfindR_input[1:5, ],
#'   pin_name_path = 'KEGG',
#'   convert2alias = FALSE
#' )
input_processing <- function(input, p_val_threshold = 0.05, pin_name_path = "Biogrid",
    convert2alias = TRUE) {
    message("## Processing input. Converting gene symbols,
          if necessary (and if human gene symbols provided)")

    if (!is.logical(convert2alias)) {
        stop("`convert2alias` should be either TRUE or FALSE")
    }

    pin_path <- return_pin_path(pin_name_path)

    if (ncol(input) == 2) {
        input <- data.frame(GENE = input[, 1], CHANGE = rep(1e+06, nrow(input)),
            P_VALUE = input[, 2])
    }

    colnames(input) <- c("GENE", "CHANGE", "P_VALUE")

    ## Turn GENE into character
    if (is.factor(input$GENE)) {
        warning("The gene column was turned into character from factor.", call. = FALSE)
        input$GENE <- as.character(input$GENE)
    }

    message("Number of genes provided in input: ", nrow(input))
    ## Discard larger than p-value threshold
    if (sum(input$P_VALUE <= p_val_threshold) == 0) {
        stop("No input p value is lower than the provided threshold (", p_val_threshold,
            ")")
    }
    input <- input[input$P_VALUE <= p_val_threshold, ]
    message("Number of genes in input after p-value filtering: ", nrow(input))

    ## Choose lowest p for each gene
    if (anyDuplicated(input$GENE)) {
        warning("Duplicated genes found! The lowest p value for each gene was selected",
            call. = FALSE)

        input <- input[order(input$P_VALUE, decreasing = FALSE), ]
        input <- input[!duplicated(input$GENE), ]
    }

    ## Fix p < 1e-13
    if (any(input$P_VALUE < 1e-13)) {
        message("pathfindR cannot handle p values < 1e-13. These were changed to 1e-13")
        input$P_VALUE <- ifelse(input$P_VALUE < 1e-13, 1e-13, input$P_VALUE)
    }

    ## load and prep pin
    pin <- utils::read.delim(file = pin_path, header = FALSE)

    ## Genes not in pin
    PIN_genes <- c(base::toupper(pin[, 1]), base::toupper(pin[, 3]))
    missing_symbols <- input$GENE[!base::toupper(input$GENE) %in% PIN_genes]
    non_missing_symbols <- input$GENE[base::toupper(input$GENE) %in% PIN_genes]

    
    if (convert2alias & !requireNamespace("org.Hs.eg.db", quietly = TRUE)) {
      message(
        "Package 'org.Hs.eg.db' is not installed; returning input genes unchanged.\n",
        "Install it with:\n",
        "  if (!requireNamespace('BiocManager', quietly = TRUE)) install.packages('BiocManager')\n",
        "  BiocManager::install('org.Hs.eg.db')"
      )
      convert2alias <- FALSE
    }
    
    
    if (convert2alias & length(missing_symbols) != 0) {
        ## use SQL to get alias table and gene_info table (contains the
        ## symbols) first open the database connection
        db_con <- org.Hs.eg.db::org.Hs.eg_dbconn()
        ## the SQL query
        sql_query <- "SELECT * FROM alias, gene_info WHERE alias._id == gene_info._id;"
        ## execute the query on the database
        hsa_alias_df <- DBI::dbGetQuery(db_con, sql_query)

        select_alias <- function(result, converted, idx) {
            while (idx > 0) {
                if (!result[idx] %in% c(converted[, 2], non_missing_symbols)) {
                  return(result[idx])
                }
                idx <- idx - 1
            }
            return("NOT_FOUND")
        }

        ## loop for getting all symbols
        converted <- c()
        for (i in base::seq_len(length(missing_symbols))) {
            result <- hsa_alias_df[hsa_alias_df$alias_symbol == missing_symbols[i],
                c("alias_symbol", "symbol")]
            result <- hsa_alias_df[hsa_alias_df$symbol %in% result$symbol, c("alias_symbol",
                "symbol")]
            result <- result$alias_symbol[base::toupper(result$alias_symbol) %in%
                PIN_genes]
            ## avoid duplicate entries
            to_add <- select_alias(result, converted, length(result))
            converted <- rbind(converted, c(missing_symbols[i], to_add))
        }

        ## Convert to appropriate symbol
        input$new_gene <- input$GENE
        input$new_gene[match(converted[, 1], input$new_gene)] <- converted[, 2]
    } else {
        input$new_gene <- ifelse(input$GENE %in% missing_symbols, "NOT_FOUND", input$GENE)
    }

    ## number and percent still missing
    n <- sum(input$new_gene == "NOT_FOUND")
    perc <- n/nrow(input) * 100

    if (n == nrow(input)) {
        stop("None of the genes were in the PIN\nPlease check your gene symbols")
    }

    ## Give out warning indicating the number of still missing
    if (n != 0) {
        message(paste0("Could not find any interactions for ", n, " (", round(perc,
            2), "%) genes in the PIN"))
    } else {
        message(paste0("Found interactions for all genes in the PIN"))
    }

    ## reorder columns
    input <- input[, c(1, 4, 2, 3)]
    colnames(input) <- c("old_GENE", "GENE", "CHANGE", "P_VALUE")

    input <- input[input$GENE != "NOT_FOUND", ]

    ## Keep lowest p value for duplicated genes
    input <- input[order(input$P_VALUE), ]
    input <- input[!duplicated(input$GENE), ]

    ## Check that at least two genes remain
    if (nrow(input) < 2) {
        stop("After processing, 1 gene (or no genes) could be mapped to the PIN")
    }

    message("Final number of genes in input: ", nrow(input))

    return(input)
}

#' Annotate the Affected Genes in the Provided Enriched Terms
#'
#' Function to annotate the involved affected (input) genes in each term.
#'
#' @param result_df data frame of enrichment results.
#'  The only must-have column is 'ID'.
#' @param input_processed input data processed via \code{\link{input_processing}}
#' @param genes_by_term List that contains genes for each gene set. Names of
#'   this list are gene set IDs (default = kegg_genes)
#'
#' @return The original data frame with two additional columns:  \describe{
#'   \item{Up_regulated}{the up-regulated genes in the input involved in the given term's gene set, comma-separated}
#'   \item{Down_regulated}{the down-regulated genes in the input involved in the given term's gene set, comma-separated}
#' }
#' @export
#'
#' @examples
#' example_gene_data <- example_pathfindR_input
#' colnames(example_gene_data) <- c('GENE', 'CHANGE', 'P_VALUE')
#'
#' annotated_result <- annotate_term_genes(
#'   result_df = example_pathfindR_output,
#'   input_processed = example_gene_data
#' )
annotate_term_genes <- function(result_df, input_processed, genes_by_term = pathfindR.data::kegg_genes) {
    message("## Annotating involved genes and visualizing enriched terms")
    ### Argument checks
    if (!is.data.frame(result_df)) {
        stop("`result_df` should be a data frame")
    }
    if (!"ID" %in% colnames(result_df)) {
        stop("`result_df` should contain an \"ID\" column")
    }

    if (!is.data.frame(input_processed)) {
        stop("`input_processed` should be a data frame")
    }
    if (!all(c("GENE", "CHANGE") %in% colnames(input_processed))) {
        stop("`input_processed` should contain the columns \"GENE\" and \"CHANGE\"")
    }

    if (!is.list(genes_by_term)) {
        stop("`genes_by_term` should be a list of term gene sets")
    }
    if (is.null(names(genes_by_term))) {
        stop("`genes_by_term` should be a named list (names are gene set IDs)")
    }

    ### Annotate up/down-regulated term-related genes Up/Down-regulated genes
    upreg <- base::toupper(input_processed$GENE[input_processed$CHANGE >= 0])
    downreg <- base::toupper(input_processed$GENE[input_processed$CHANGE < 0])

    ## Annotation
    annotated_df <- result_df
    annotated_df$Down_regulated <- annotated_df$Up_regulated <- NA
    for (i in base::seq_len(nrow(annotated_df))) {
        idx <- which(names(genes_by_term) == annotated_df$ID[i])
        temp <- genes_by_term[[idx]]
        annotated_df$Up_regulated[i] <- paste(temp[base::toupper(temp) %in% upreg],
            collapse = ", ")
        annotated_df$Down_regulated[i] <- paste(temp[base::toupper(temp) %in% downreg],
            collapse = ", ")
    }

    return(annotated_df)
}


#' Fetch Gene Set Objects
#'
#' Function for obtaining the gene sets per term and the term descriptions to
#' be used for enrichment analysis.
#'
#' @param gene_sets Name of the gene sets to be used for enrichment analysis.
#'  Available gene sets are 'KEGG', 'Reactome', 'BioCarta', 'GO-All',
#'  'GO-BP', 'GO-CC', 'GO-MF', 'cell_markers', 'mmu_KEGG' or 'Custom'.
#'  If 'Custom', the arguments \code{custom_genes} and \code{custom_descriptions}
#'  must be specified. (Default = 'KEGG')
#' @param min_gset_size minimum number of genes a term must contain (default = 10)
#' @param max_gset_size maximum number of genes a term must contain (default = 300)
#' @param custom_genes a list containing the genes involved in each custom
#'  term. Each element is a vector of gene symbols located in the given custom
#'  term. Names should correspond to the IDs of the custom terms.
#' @param custom_descriptions A vector containing the descriptions for each
#'  custom  term. Names of the vector should correspond to the IDs of the custom
#'  terms.
#'
#' @return a list containing 2 elements \describe{
#'   \item{genes_by_term}{list of vectors of genes contained in each term}
#'   \item{term_descriptions}{vector of descriptions per each term}
#' }
#'
#' @export
#'
#' @examples
#' KEGG_gset <- fetch_gene_set()
#' GO_MF_gset <- fetch_gene_set('GO-MF', min_gset_size = 20, max_gset_size = 100)
fetch_gene_set <- function(gene_sets = "KEGG", min_gset_size = 10, max_gset_size = 300,
    custom_genes = NULL, custom_descriptions = NULL) {
    ### Argument checks
    all_gs_opts <- c("KEGG", "Reactome", "BioCarta", "GO-All", "GO-BP", "GO-CC",
        "GO-MF", "cell_markers", "mmu_KEGG", "Custom")
    if (!gene_sets %in% all_gs_opts) {
        stop("`gene_sets` should be one of ", paste(dQuote(all_gs_opts), collapse = ", "))
    }

    if (!is.numeric(min_gset_size)) {
        stop("`min_gset_size` should be numeric")
    }
    if (!is.numeric(max_gset_size)) {
        stop("`max_gset_size` should be numeric")
    }


    ### Custom Gene Sets
    if (gene_sets == "Custom") {
        if (is.null(custom_genes) | is.null(custom_descriptions)) {
            stop("`custom_genes` and `custom_descriptions` must be provided if `gene_sets = \"Custom\"`")
        }

        if (!is.list(custom_genes)) {
            stop("`custom_genes` should be a list of term gene sets")
        }
        if (is.null(names(custom_genes))) {
            stop("`custom_genes` should be a named list (names are gene set IDs)")
        }

        if (!is.atomic(custom_descriptions)) {
            stop("`custom_descriptions` should be a vector of term gene descriptions")
        }
        if (is.null(names(custom_descriptions))) {
            stop("`custom_descriptions` should be a named vector (names are gene set IDs)")
        }

        # filter by size
        gset_lens <- vapply(custom_genes, length, 1)
        keep <- which(gset_lens >= min_gset_size & gset_lens <= max_gset_size)
        custom_genes <- custom_genes[keep]
        custom_descriptions <- custom_descriptions[names(custom_genes)]

        return(list(genes_by_term = custom_genes, term_descriptions = custom_descriptions))
    }

    ### Built-in Gene Sets GO gene sets
    if (grepl("^GO", gene_sets)) {
        genes_by_term <- pathfindR.data::go_all_genes

        GO_df <- pathfindR.data:::GO_all_terms_df
        term_descriptions <- GO_df$GO_term
        names(term_descriptions) <- GO_df$GO_ID

        if (gene_sets == "GO-BP") {
            tmp <- GO_df$GO_ID[GO_df$Category == "Process"]
            genes_by_term <- genes_by_term[tmp]
            term_descriptions <- term_descriptions[tmp]
        } else if (gene_sets == "GO-CC") {
            tmp <- GO_df$GO_ID[GO_df$Category == "Component"]
            genes_by_term <- genes_by_term[tmp]
            term_descriptions <- term_descriptions[tmp]
        } else if (gene_sets == "GO-MF") {
            tmp <- GO_df$GO_ID[GO_df$Category == "Function"]
            genes_by_term <- genes_by_term[tmp]
            term_descriptions <- term_descriptions[tmp]
        }

        ## non-GO (KEGG, Reactome, BioCarta, mmu_KEGG)
    } else {
        if (gene_sets == "KEGG") {
            genes_by_term <- pathfindR.data::kegg_genes
            term_descriptions <- pathfindR.data::kegg_descriptions
        } else if (gene_sets == "Reactome") {
            genes_by_term <- pathfindR.data::reactome_genes
            term_descriptions <- pathfindR.data::reactome_descriptions
        } else if (gene_sets == "BioCarta") {
            genes_by_term <- pathfindR.data::biocarta_genes
            term_descriptions <- pathfindR.data::biocarta_descriptions
        } else if (gene_sets == "mmu_KEGG") {
            genes_by_term <- pathfindR.data::mmu_kegg_genes
            term_descriptions <- pathfindR.data::mmu_kegg_descriptions
        } else {
            genes_by_term <- pathfindR.data::cell_markers_gsets
            term_descriptions <- pathfindR.data::cell_markers_descriptions
        }
    }

    # filter by size
    term_lens <- vapply(genes_by_term, length, 1)
    keep <- which(term_lens >= min_gset_size & term_lens <= max_gset_size)
    genes_by_term <- genes_by_term[keep]
    term_descriptions <- term_descriptions[names(genes_by_term)]

    return(list(genes_by_term = genes_by_term, term_descriptions = term_descriptions))
}

#' Return The Path to Given Protein-Protein Interaction Network (PIN)
#'
#' This function returns the absolute path/to/PIN.sif. While the default PINs are
#' 'Biogrid', 'STRING', 'GeneMania', 'IntAct', 'KEGG' and 'mmu_STRING'. The user can also
#' use any other PIN by specifying the 'path/to/PIN.sif'. All PINs to be used
#' in this package must formatted as SIF files: i.e. have 3 columns with no
#' header, no row names and be tab-separated. Columns 1 and 3 must be
#' interactors' gene symbols, column 2 must be a column with all
#' rows consisting of 'pp'.
#'
#' @param pin_name_path Name of the chosen PIN or absolute/path/to/PIN.sif. If PIN name,
#'   must be one of c('Biogrid', 'STRING', 'GeneMania', 'IntAct', 'KEGG', 'mmu_STRING'). If
#'   path/to/PIN.sif, the file must comply with the PIN specifications. (Default = 'Biogrid')
#'
#' @return The absolute path to chosen PIN.
#'
#' @export
#' @seealso See \code{\link{run_pathfindR}} for the wrapper function of the
#'   pathfindR workflow
#' @examples
#' \dontrun{
#' pin_path <- return_pin_path('GeneMania')
#' }
return_pin_path <- function(pin_name_path = "Biogrid") {
    ## Default PINs
    valid_opts <- c("Biogrid", "STRING", "GeneMania", "IntAct", "KEGG", "mmu_STRING",
        "/path/to/custom/SIF")
    if (pin_name_path %in% valid_opts[-length(valid_opts)]) {
        path <- file.path(tempdir(check = TRUE), paste0(pin_name_path, ".sif"))
        if (!file.exists(path)) {
            adj_list <- utils::getFromNamespace(paste0(tolower(pin_name_path), "_adj_list"),
                ns = "pathfindR.data")

            pin_df <- lapply(seq_along(adj_list), function(i, nm, val) {
                data.frame(base::toupper(nm[[i]]), "pp", base::toupper(val[[i]]))
            }, val = adj_list, nm = names(adj_list))
            pin_df <- base::do.call("rbind", pin_df)
            utils::write.table(pin_df, path, sep = "\t", row.names = FALSE, col.names = FALSE,
                quote = FALSE)
        }
        path <- normalizePath(path)

        ## Custom PIN
    } else if (file.exists(suppressWarnings(normalizePath(pin_name_path)))) {
        path <- normalizePath(pin_name_path)
        pin <- utils::read.delim(file = path, quote = "", header = FALSE)
        if (ncol(pin) != 3) {
            stop("The PIN file must have 3 columns and be tab-separated")
        }

        if (any(pin[, 2] != "pp")) {
            stop("The second column of the PIN file must all be \"pp\" ")
        }

        if (any(grepl("[a-z]", pin[, 1])) | any(grepl("[a-z]", pin[, 3]))) {
            pin[, 1] <- base::toupper(pin[, 1])
            pin[, 3] <- base::toupper(pin[, 3])

            path <- file.path(tempdir(check = TRUE), "custom_PIN.sif")
            utils::write.table(pin, path, sep = "\t", row.names = FALSE, col.names = FALSE,
                quote = FALSE)
            path <- normalizePath(path)
        }
    } else {
        stop("The chosen PIN must be one of:\n", paste(dQuote(valid_opts), collapse = ", "))
    }
    return(path)
}


================================================
FILE: R/visualization.R
================================================
#' Check if value is a valid color
#'
#' @param x value
#'
#' @return TRUE if x is a valid color, otherwise FALSE
isColor <- function(x) {
  if (!is.character(x) | length(x) != 1) {
    return(FALSE)
  }
  tryCatch(is.matrix(grDevices::col2rgb(x)), error = function(e) FALSE)
}


#' Create Diagrams for Enriched Terms
#'
#' @param result_df Data frame of enrichment results. Must-have columns for
#'  KEGG human pathway diagrams (\code{is_KEGG_result = TRUE}) are: 'ID' and 'Term_Description'.
#'  Must-have columns for the rest are: 'Term_Description', 'Up_regulated' and
#' 'Down_regulated'
#' @param input_processed input data processed via \code{\link{input_processing}},
#'  not necessary when \code{is_KEGG_result = FALSE}
#' @param is_KEGG_result boolean to indicate whether KEGG gene sets were used for
#'  enrichment analysis or not (default = \code{TRUE})
#' @inheritParams return_pin_path
#' @param ... additional arguments for \code{\link{visualize_KEGG_diagram}} (used
#' when \code{is_KEGG_result = TRUE}) or \code{\link{visualize_term_interactions}}
#' (used when \code{is_KEGG_result = FALSE})
#'
#' @return Depending on the argument \code{is_KEGG_result}, creates visualization of
#'  interactions of genes involved in the list of enriched terms in
#'  \code{result_df}. Returns a list of ggplot objects named by Term ID.
#'
#'
#' @details For \code{is_KEGG_result = TRUE}, KEGG pathway diagrams are created,
#' affected nodes colored by up/down regulation status.
#' For other gene sets, interactions of affected genes are determined (via a shortest-path
#' algorithm) and are visualized (colored by change status) using igraph.
#'
#'
#' @export
#'
#' @seealso See \code{\link{visualize_KEGG_diagram}} for the visualization function
#' of KEGG diagrams. See \code{\link{visualize_term_interactions}} for the
#' visualization function that generates diagrams showing the interactions of
#' input genes in the PIN. See \code{\link{run_pathfindR}} for the wrapper
#' function of the pathfindR workflow.
#'
#' @examples
#' \dontrun{
#' input_processed <- data.frame(
#'   GENE = c("PARP1", "NDUFA1", "STX6", "SNAP23"),
#'   CHANGE = c(1.5, -2, 3, 5)
#' )
#' result_df <- example_pathfindR_output[1:2, ]
#'
#' gg_list <- visualize_terms(result_df, input_processed)
#' gg_list2 <- visualize_terms(result_df, is_KEGG_result = FALSE, pin_name_path = 'IntAct')
#' }
visualize_terms <- function(
    result_df, input_processed = NULL, is_KEGG_result = TRUE, pin_name_path = "Biogrid", ...
) {
    ############ Argument Checks
    if (!is.data.frame(result_df)) {
        stop("`result_df` should be a data frame")
    }

    if (!is.logical(is_KEGG_result)) {
        stop("the argument `is_KEGG_result` should be either TRUE or FALSE")
    }

    if (is_KEGG_result) {
        nec_cols <- "ID"
    } else {
        nec_cols <- c("Term_Description", "Up_regulated", "Down_regulated")
    }
    if (!all(nec_cols %in% colnames(result_df))) {
        stop("`result_df` should contain the following columns: ", paste(dQuote(nec_cols),
            collapse = ", "))
    }

    if (is_KEGG_result) {
        if (is.null(input_processed)) {
            stop("`input_processed` should be specified when `is_KEGG_result = TRUE`")
        }
    }

  ############ Generate Diagrams
  if (is_KEGG_result) {
    visualize_KEGG_diagram(
      kegg_pw_ids = result_df$ID, input_processed = input_processed, ...
    )
  } else {
    visualize_term_interactions(
      result_df = result_df, pin_name_path = pin_name_path, ...
    )
  }
}

#' Visualize Interactions of Genes Involved in the Given Enriched Terms
#'
#' @param result_df Data frame of enrichment results. Must-have columns
#' are: 'Term_Description', 'Up_regulated' and 'Down_regulated'
#' @inheritParams return_pin_path
#' @param show_legend Boolean to indicate whether to display the legend (\code{TRUE})
#' or not (\code{FALSE}) (default: \code{TRUE})
#'
#' @return list of ggplot objects (named by Term ID) visualizing the interactions of genes involved
#' in the given enriched terms (annotated in the \code{result_df}) in the PIN used
#' for enrichment analysis (specified by \code{pin_name_path}).
#'
#' @details The following steps are performed for the visualization of interactions
#' of genes involved for each enriched term: \enumerate{
#'   \item shortest paths between all affected genes are determined (via \code{\link[igraph]{igraph}})
#'   \item the nodes of all shortest paths are merged
#'   \item the PIN is subsetted using the merged nodes (genes)
#'   \item using the PIN subset, the graph showing the interactions is generated
#'   \item the final graph is visualized using \code{\link[igraph]{igraph}}, colored by changed
#'   status (if provided)
#' }
#'
#' @export
#'
#' @seealso See \code{\link{visualize_terms}} for the wrapper function
#'   for creating enriched term diagrams. See \code{\link{run_pathfindR}} for the
#'   wrapper function of the pathfindR enrichment workflow.
#'
#' @examples
#' \dontrun{
#' result_df <- example_pathfindR_output[1:2, ]
#' gg_list <- visualize_term_interactions(result_df, pin_name_path = 'IntAct')
#' }
visualize_term_interactions <- function(result_df, pin_name_path, show_legend = TRUE) {
    ############ Initial Steps fix naming issue
    result_df$Term_Description <- gsub("\\/", "-", result_df$Term_Description)

    ## load PIN
    pin_path <- return_pin_path(pin_name_path)
    pin <- utils::read.delim(file = pin_path, header = FALSE)
    pin$V2 <- NULL

    pin[, 1] <- base::toupper(pin[, 1])
    pin[, 2] <- base::toupper(pin[, 2])

    ## pin graph
    pin_g <- igraph::graph_from_data_frame(pin, directed = FALSE)

    ############ Visualize interactions by enriched term
    pw_vis_list <- list()
    for (i in base::seq_len(nrow(result_df))) {
        current_row <- result_df[i, ]

        up_genes <- base::toupper(unlist(strsplit(current_row$Up_regulated, ", ")))
        down_genes <- base::toupper(unlist(strsplit(current_row$Down_regulated, ", ")))
        current_genes <- c(down_genes, up_genes)

        ## Add active snw genes if listed
        if (!is.null(result_df$non_Signif_Snw_Genes)) {
            snw_genes <- unlist(strsplit(current_row$non_Signif_Snw_Genes, ", "))
            snw_genes <- base::toupper(snw_genes)
            current_genes <- c(current_genes, snw_genes)
        } else {
            snw_genes <- NULL
        }

        if (length(current_genes) < 2) {
            message(paste0("< 2 genes, skipping visualization of ", current_row$Term_Description))
        } else {
            cat("Visualizing:", paste0("(", i, ")") , current_row$Term_Description, paste(rep(" ", 200),
                collapse = ""), "\r")

            ## Find genes without direct interaction
            cond1 <- pin$V1 %in% current_genes
            cond2 <- pin$V3 %in% current_genes
            direct_interactions <- pin[cond1 & cond2, ]
            tmp <- c(direct_interactions$V1, direct_interactions$V3)
            missing_genes <- current_genes[!current_genes %in% tmp]

            ## Find shortest path between genes without direct interaction and
            ## other current_genes
            s_path_genes <- c()
            for (gene in missing_genes) {
                tmp <- suppressWarnings(igraph::shortest_paths(pin_g, from = which(names(igraph::V(pin_g)) ==
                  gene), to = which(names(igraph::V(pin_g)) %in% current_genes),
                  output = "vpath"))
                tmp <- unique(unlist(lapply(tmp$vpath, function(x) names(x))))
                s_path_genes <- unique(c(s_path_genes, tmp))
            }

            final_genes <- unique(c(current_genes, s_path_genes))
            cond1 <- pin$V1 %in% final_genes
            cond2 <- pin$V3 %in% final_genes
            final_interactions <- pin[cond1 & cond2, ]
            g <- igraph::graph_from_data_frame(final_interactions, directed = FALSE)

            cond1 <- names(igraph::V(g)) %in% up_genes
            cond2 <- names(igraph::V(g)) %in% down_genes
            cond3 <- names(igraph::V(g)) %in% snw_genes
            node_type <- as.factor(ifelse(cond1, "up",
                                          ifelse(cond2, "down",
                                                 ifelse(cond3,
                                                        "interactor", "none"))))
            igraph::V(g)$type <- node_type

            node_colors <- c("green", "red", "blue", "gray")
            names(node_colors) <- c("up", "down", "interactor", "none")
            node_colors <- node_colors[levels(node_type)]

            type_descriptions <- c(
              none = "other", up = "up-regulated gene", down = "down-regulated gene", interactor = "interacting non-input gene"
            )
            type_descriptions <- type_descriptions[levels(node_type)]

            p <- ggraph::ggraph(g, layout = "stress")
            p <- p + ggraph::geom_edge_link(alpha = 0.8, colour = "darkgrey", linewidth = 0.5)
            p <- p + ggraph::geom_node_point(ggplot2::aes(color = .data$type), size = 5)
            p <- p + ggplot2::theme_void()
            p <- p + suppressWarnings(ggraph::geom_node_text(ggplot2::aes(label = .data$name),
                                                             nudge_y = 0.2, repel = TRUE, max.overlaps = 20))
            p <- p + ggplot2::scale_color_manual(values = node_colors, name = NULL,
                                                 labels = type_descriptions)
            p <- p + ggplot2::ggtitle(
              paste(current_row$Term_Description, "\n Involved Gene Interactions in", pin_name_path)
            )
            pw_vis_list[[current_row$ID]] <- p
        }
    }
    return(pw_vis_list)
}

#' Visualize Human KEGG Pathways
#'
#' @param kegg_pw_ids KEGG ids of pathways to be colored and visualized
#' @param input_processed input data processed via \code{\link{input_processing}}
#' @inheritParams color_kegg_pathway
#'
#' @return Creates colored visualizations of the enriched human KEGG pathways
#' and returns them as a list of ggplot objects, named by Term ID.
#'
#' @seealso See \code{\link{visualize_terms}} for the wrapper function for
#' creating enriched term diagrams. See \code{\link{run_pathfindR}} for the
#' wrapper function of the pathfindR enrichment workflow.
#'
#' @export
#'
#' @examples
#' \dontrun{
#' input_processed <- data.frame(
#'   GENE = c("PKLR", "GPI", "CREB1", "INS"),
#'   CHANGE = c(1.5, -2, 3, 5)
#' )
#' gg_list <- visualize_KEGG_diagram(c("hsa00010", "hsa04911"), input_processed)
#' }
visualize_KEGG_diagram <- function(
    kegg_pw_ids,
    input_processed,
    scale_vals = TRUE,
    node_cols = NULL,
    legend.position = "top"
) {
    message("This function utilises one functionality of `ggkegg`. For more options, visit https://github.com/noriakis/ggkegg")
    
    ############ Arg checks

    ### kegg_pw_ids
    if (!is.atomic(kegg_pw_ids)) {
        stop("`kegg_pw_ids` should be a vector of KEGG IDs")
    }
    if (!all(grepl("^[a-z]{3}[0-9]{5}$", kegg_pw_ids))) {
        stop("`kegg_pw_ids` should be a vector of valid hsa KEGG IDs")
    }

    ### input_processed
    if (!is.data.frame(input_processed)) {
        stop("`input_processed` should be a data frame")
    }

    nec_cols <- c("GENE", "CHANGE")
    if (!all(nec_cols %in% colnames(input_processed))) {
        stop("`input_processed` should contain the following columns: ", paste(dQuote(nec_cols),
            collapse = ", "))
    }
    
    if (!requireNamespace("org.Hs.eg.db", quietly = TRUE)) {
      message(
        "Package 'org.Hs.eg.db' is not installed; returning empty list.\n",
        "Install it with:\n",
        "  if (!requireNamespace('BiocManager', quietly = TRUE)) install.packages('BiocManager')\n",
        "  BiocManager::install('org.Hs.eg.db')"
      )
      return(list())
    }

    ############ Create change vector Convert gene symbols into NCBI gene IDs
    tmp <- AnnotationDbi::mget(input_processed$GENE, AnnotationDbi::revmap(org.Hs.eg.db::org.Hs.egSYMBOL),
        ifnotfound = NA)
    input_processed$EG_ID <- vapply(tmp, function(x) as.character(x[1]), "EGID")
    input_processed <- input_processed[!is.na(input_processed$EG_ID), ]

    ### A rule of thumb for the 'kegg' ID is entrezgene ID for eukaryote
    ### species
    input_processed$KEGG_ID <- paste0("hsa:", input_processed$EG_ID)

    ############ Fetch all pathway genes, create vector of change values and
    ############ Generate colored pathway diagrams for each pathway
    change_vec <- input_processed$CHANGE
    names(change_vec) <- input_processed$KEGG_ID

    cat("Generating pathway diagrams of", length(kegg_pw_ids), "KEGG pathways\n\n")
    pw_vis_list <- lapply(
      kegg_pw_ids,
      color_kegg_pathway,
      change_vec=change_vec,
      scale_vals = scale_vals,
      node_cols = node_cols,
      legend.position = legend.position
    )
    names(pw_vis_list) <- kegg_pw_ids

    return(pw_vis_list)
}

#' Color hsa KEGG pathway
#'
#' @param pw_id hsa KEGG pathway id (e.g. hsa05012)
#' @param change_vec vector of change values, names should be hsa KEGG gene ids
#' @param scale_vals should change values be scaled? (default = \code{TRUE})
#' @param node_cols low, middle and high color values for coloring the pathway nodes
#' (default = \code{NULL}). If \code{node_cols=NULL}, the low, middle and high color
#' are set as 'green', 'gray' and 'red'. If all change values are 1e6 (in case no
#' changes are supplied, this dummy value is assigned by
#' \code{\link{input_processing}}), only one color ('#F38F18' if NULL) is used.
#' @inheritParams ggplot2::theme
#'
#' @return a ggplot object containing the colored KEGG pathway diagram visualization
#'
#' @examples
#' \dontrun{
#' pw_id <- 'hsa00010'
#' change_vec <- c(-2, 4, 6)
#' names(change_vec) <- c('hsa:2821', 'hsa:226', 'hsa:229')
#' result <- pathfindR:::color_kegg_pathway(pw_id, change_vec)
#' }
color_kegg_pathway <- function(pw_id, change_vec, scale_vals = TRUE, node_cols = NULL, legend.position = "top") {
    ############ Arg checks
    if (!is.logical(scale_vals)) {
        stop("`scale_vals` should be logical")
    }

    ## check node_cols
    if (!is.null(node_cols)) {
        if (!is.atomic(node_cols)) {
            stop("`node_cols` should be a vector of colors")
        }

        if (!all(change_vec == 1e+06) & length(node_cols) != 3) {
            stop("the length of `node_cols` should be 3")
        }

        if (!all(vapply(node_cols, isColor, TRUE))) {
            stop("`node_cols` should be a vector of valid colors")
        }
    }
    ############ Set node palette if node_cols not supplied, use default
    ############ color(s)
    if (!is.null(node_cols)) {
        if (all(change_vec == 1e+06)) {
            message("all `change_vec` values are 1e6, using the first color in `node_cols`")
            low_col <- mid_col <- high_col <- node_cols[1]
        } else {
            low_col <- node_cols[1]
            mid_col <- node_cols[2]
            high_col <- node_cols[3]
        }
    } else if (all(change_vec == 1e+06)) {
        ## NO CHANGES SUPPLIED
        low_col <- mid_col <- high_col <- "#F38F18"
    } else {
        low_col <- "red"
        mid_col <- "gray"
        high_col <- "green"
    }

    ############ Assign the input change values to any corresponding pathway gene nodes
    # create pathway graph object and collect all pathway genes
    ggkegg_temp_dir <- file.path(tempdir(check = TRUE), "ggkegg")
    dir.create(ggkegg_temp_dir, showWarnings = FALSE)

    g <- tryCatch({
      ggkegg::pathway(pid = pw_id, directory = ggkegg_temp_dir)
    }, error = function(e) {
      message(paste("Cannot parse KEGG pathway for:", pw_id))
      message("Here's the original error message:")
      message(e$message)
      return(NULL)
    }, warning = function(w) {
      message(paste("Cannot parse KEGG pathway for:", pw_id))
      message("Here's the original error message:")
      message(w$message)
      return(NULL)
    })

    if (is.null(g)) {
      return(NULL)
    }

    gene_nodes <- names(igraph::V(g))[igraph::V(g)$type == "gene"]

    ## aggregate change values over all pathway gene nodes
    pw_vis_changes <- c()
    for (i in seq_len(length(gene_nodes))) {
        node_name <- gene_nodes[i]
        node <- unlist(strsplit(node_name, " "))
        cond <- names(change_vec) %in% node

        if (any(cond)) {
          node_val <- mean(change_vec[cond])
          names(node_val) <- node_name
          pw_vis_changes <- c(pw_vis_changes, node_val)
        }
    }
    ## if no input genes present in chosen pathway
    if (all(is.na(pw_vis_changes))) {
        return(NULL)
    }

    ############ Determine node colors
    ### scaling
    if (!all(pw_vis_changes == 1e+06) & scale_vals) {
      common_limit <- max(abs(pw_vis_changes))
      pw_vis_changes <- ifelse(pw_vis_changes < 0,
                               -abs(pw_vis_changes) / common_limit,
                               pw_vis_changes / common_limit)
    }


    ############ Create pathway diagram visualisation
    igraph::V(g)$change_value <- NA
    igraph::V(g)$change_value[match(names(pw_vis_changes), names(igraph::V(g)))] <- pw_vis_changes

    p <- ggraph::ggraph(g, layout="manual", x=igraph::V(g)$x, y=igraph::V(g)$y)
    p <- p + ggkegg::geom_node_rect(ggplot2::aes(filter = !is.na(.data$change_value), fill = .data$change_value))
    p <- p + ggkegg::overlay_raw_map(pw_id)
    p <- p + ggplot2::scale_fill_gradient2(low = low_col, mid = mid_col, high = high_col)
    p <- p + ggplot2::theme_void()
    p <- p + ggplot2::theme(
      legend.title = ggplot2::element_blank(),
      legend.position = legend.position
    )

    return(p)
}

#' Create Bubble Chart of Enrichment Results
#'
#' This function is used to create a ggplot2 bubble chart displaying the
#' enrichment results.
#'
#' @param result_df a data frame that must contain the following columns: \describe{
#'   \item{Term_Description}{Description of the enriched term}
#'   \item{Fold_Enrichment}{Fold enrichment value for the enriched term}
#'   \item{lowest_p}{the lowest adjusted-p value of the given term over all iterations}
#'   \item{Up_regulated}{the up-regulated genes in the input involved in the given term's gene set, comma-separated}
#'   \item{Down_regulated}{the down-regulated genes in the input involved in the given term's gene set, comma-separated}
#'   \item{Cluster(OPTIONAL)}{the cluster to which the enriched term is assigned}
#' }
#' @param top_terms number of top terms (according to the 'lowest_p' column)
#'  to plot (default = 10). If \code{plot_by_cluster = TRUE}, selects the top
#'  \code{top_terms} terms per each cluster. Set \code{top_terms = NULL} to plot
#'  for all terms.If the total number of terms is less than \code{top_terms},
#'  all terms are plotted.
#' @param plot_by_cluster boolean value indicating whether or not to group the
#'  enriched terms by cluster (works if \code{result_df} contains a
#'  'Cluster' column).
#' @param num_bubbles number of sizes displayed in the legend \code{# genes}
#'  (Default = 4)
#' @param even_breaks whether or not to set even breaks for the number of sizes
#'  displayed in the legend \code{# genes}. If \code{TRUE} (default), sets
#'  equal breaks and the number of displayed bubbles may be different than the
#'  number set by \code{num_bubbles}. If the exact number set by
#'  \code{num_bubbles} is required, set this argument to \code{FALSE}
#'
#' @return a \code{\link[ggplot2]{ggplot2}} object containing the bubble chart.
#' The x-axis corresponds to fold enrichment values while the y-axis indicates
#' the enriched terms. Size of the bubble indicates the number of significant
#' genes in the given enriched term. Color indicates the -log10(lowest-p) value.
#' The closer the color is to red, the more significant the enrichment is.
#' Optionally, if 'Cluster' is a column of \code{result_df} and
#' \code{plot_by_cluster == TRUE}, the enriched terms are grouped by clusters.
#'
#' @import ggplot2
#' @export
#'
#' @examples
#' g <- enrichment_chart(example_pathfindR_output)
enrichment_chart <- function(result_df, top_terms = 10, plot_by_cluster = FALSE,
    num_bubbles = 4, even_breaks = TRUE) {
    message("Plotting the enrichment bubble chart")
    necessary <- c("Term_Description", "Fold_Enrichment", "lowest_p", "Up_regulated",
        "Down_regulated")

    if (!all(necessary %in% colnames(result_df))) {
        stop("The input data frame must have the columns:\n", paste(necessary, collapse = ", "))
    }

    if (!is.logical(plot_by_cluster)) {
        stop("`plot_by_cluster` must be either TRUE or FALSE")
    }

    if (!is.numeric(top_terms) & !is.null(top_terms)) {
        stop("`top_terms` must be either numeric or NULL")
    }

    if (!is.null(top_terms)) {
        if (top_terms < 1) {
            stop("`top_terms` must be > 1")
        }
    }

    # sort by lowest adj.p
    result_df <- result_df[order(result_df$lowest_p), ]

    ## Filter for top_terms
    if (!is.null(top_terms)) {
        if (plot_by_cluster & "Cluster" %in% colnames(result_df)) {
            keep_ids <- tapply(result_df$ID, result_df$Cluster, function(x) {
                x[seq_len(min(top_terms, length(x)))]
            })
            keep_ids <- unlist(keep_ids)
            result_df <- result_df[result_df$ID %in% keep_ids, ]
        } else if (top_terms < nrow(result_df)) {
            result_df <- result_df[seq_len(top_terms), ]
        }
    }

    num_genes <- vapply(result_df$Up_regulated, function(x) length(unlist(strsplit(x,
        ", "))), 1)
    num_genes <- num_genes + vapply(result_df$Down_regulated, function(x) length(unlist(strsplit(x,
        ", "))), 1)

    result_df$Term_Description <- factor(result_df$Term_Description, levels = rev(unique(result_df$Term_Description)))

    log_p <- -log10(result_df$lowest_p)

    g <- ggplot2::ggplot(result_df, ggplot2::aes(.data$Fold_Enrichment, .data$Term_Description))
    g <- g + ggplot2::geom_point(ggplot2::aes(color = log_p, size = num_genes), na.rm = TRUE)
    g <- g + ggplot2::theme_bw()
    g <- g + ggplot2::theme(axis.text.x = ggplot2::element_text(size = 10), axis.text.y = ggplot2::element_text(size = 10),
        plot.title = ggplot2::element_blank())
    g <- g + ggplot2::xlab("Fold Enrichment")
    g <- g + ggplot2::theme(axis.title.y = ggplot2::element_blank())
    g <- g + ggplot2::labs(size = "# genes", color = expression(-log[10](p)))

    ## breaks for # genes
    if (max(num_genes) < num_bubbles) {
        g <- g + ggplot2::scale_size_continuous(breaks = seq(0, max(num_genes)))
    } else {
        if (even_breaks) {
            brks <- base::seq(0, max(num_genes), round(max(num_genes)/(num_bubbles +
                1)))
        } else {
            brks <- base::round(base::seq(0, max(num_genes), length.out = num_bubbles +
                1))
        }
        g <- g + ggplot2::scale_size_continuous(breaks = brks)
    }

    g <- g + ggplot2::scale_color_gradient(low = "#f5efef", high = "red")

    if (plot_by_cluster & "Cluster" %in% colnames(result_df)) {
        g <- g + ggplot2::facet_grid(result_df$Cluster ~ ., scales = "free_y", space = "free",
            drop = TRUE)
    } else if (plot_by_cluster) {
        message("For plotting by cluster, there must a column named `Cluster` in the input data frame!")
    }

    return(g)
}


#' Create Term-Gene Graph
#'
#' @param result_df A dataframe of pathfindR results that must contain the following columns: \describe{
#'   \item{Term_Description}{Description of the enriched term (necessary if \code{use_description = TRUE})}
#'   \item{ID}{ID of the enriched term (necessary if \code{use_description = FALSE})}
#'   \item{lowest_p}{the lowest adjusted-p value of the given term over all iterations}
#'   \item{Up_regulated}{the up-regulated genes in the input involved in the given term's gene set, comma-separated}
#'   \item{Down_regulated}{the down-regulated genes in the input involved in the given term's gene set, comma-separated}
#' }
#' @param num_terms Number of top enriched terms to use while creating the graph. Set to \code{NULL} to use
#'  all enriched terms (default = 10, i.e. top 10 terms)
#' @param layout The type of layout to create (see \code{\link[ggraph]{ggraph}} for details. Default = 'stress')
#' @param use_description Boolean argument to indicate whether term descriptions
#'  (in the 'Term_Description' column) should be used. (default = \code{FALSE})
#' @param node_size Argument to indicate whether to use number of significant genes ('num_genes')
#'  or the -log10(lowest p value) ('p_val') for adjusting the node sizes (default = 'num_genes')
#' @param node_colors vector of 3 colors to be used for coloring nodes (colors for term nodes, up, and down, respectively)
#'
#' @return a  \code{\link[ggraph]{ggraph}} object containing the term-gene graph.
#'  Each node corresponds to an enriched term (beige), an up-regulated gene (green)
#'  or a down-regulated gene (red). An edge between a term and a gene indicates
#'  that the given term involves the gene. Size of a term node is proportional
#'  to either the number of genes (if \code{node_size = 'num_genes'}) or
#'  the -log10(lowest p value) (if \code{node_size = 'p_val'}).
#'
#' @details This function (adapted from the Gene-Concept network visualization
#' by the R package \code{enrichplot}) can be utilized to visualize which input
#' genes are involved in the enriched terms as a graph. The term-gene graph
#' shows the links between genes and biological terms and allows for the
#' investigation of multiple terms to which significant genes are related. The
#' graph also enables determination of the overlap between the enriched terms
#' by identifying shared and distinct significant term-related genes.
#'
#' @import ggraph
#' @export
#'
#' @examples
#' p <- term_gene_graph(example_pathfindR_output)
#' p <- term_gene_graph(example_pathfindR_output, num_terms = 5)
#' p <- term_gene_graph(example_pathfindR_output, node_size = 'p_val')
term_gene_graph <- function(result_df, num_terms = 10, layout = "stress", use_description = FALSE,
    node_size = "num_genes", node_colors = c("#E5D7BF", "green", "red")) {
    ############ Argument Checks Check num_terms is NULL or numeric
    if (!is.numeric(num_terms) & !is.null(num_terms)) {
        stop("`num_terms` must either be numeric or NULL!")
    }

    ### Check use_description is boolean
    if (!is.logical(use_description)) {
        stop("`use_description` must either be TRUE or FALSE!")
    }

    ### Set column for term labels
    ID_column <- ifelse(use_description, "Term_Description", "ID")

    ### Check node_size
    val_node_size <- c("num_genes", "p_val")
    if (!node_size %in% val_node_size) {
        stop("`node_size` should be one of ", paste(dQuote(val_node_size), collapse = ", "))
    }

    if (!is.data.frame(result_df)) {
        stop("`result_df` should be a data frame")
    }

    ### Check necessary columnns
    necessary_cols <- c(ID_column, "lowest_p", "Up_regulated", "Down_regulated")

    if (!all(necessary_cols %in% colnames(result_df))) {
        stop(paste(c("All of", paste(necessary_cols, collapse = ", "), "must be present in `results_df`!"),
            collapse = " "))
    }

    if (!is.atomic(node_colors)) {
      stop("`node_colors` should be a vector of colors")
    }

    if (!all(vapply(node_colors, isColor, TRUE))) {
      stop("`node_colors` should be a vector of valid colors")
    }

    if (length(node_colors) != 3) {
      stop("`node_colors` must contain exactly 3 colors")
    }

    ############ Initial steps set num_terms to NULL if number of enriched
    ############ terms is smaller than num_terms
    if (!is.null(num_terms)) {
        if (nrow(result_df) < num_terms) {
            num_terms <- NULL
        }
    }

    ### Order and filter for top N genes
    result_df <- result_df[order(result_df$lowest_p, decreasing = FALSE), ]
    if (!is.null(num_terms)) {
        result_df <- result_df[1:num_terms, ]
    }

    ### Prep data frame for graph
    graph_df <- data.frame()
    for (i in base::seq_len(nrow(result_df))) {
        up_genes <- unlist(strsplit(result_df$Up_regulated[i], ", "))
        down_genes <- unlist(strsplit(result_df$Down_regulated[i], ", "))
        for (gene in c(up_genes, down_genes)) {
            graph_df <- rbind(graph_df, data.frame(Term = result_df[i, ID_column],
                Gene = gene))
        }
    }

    up_genes <- lapply(result_df$Up_regulated, function(x) unlist(strsplit(x, ", ")))
    up_genes <- unlist(up_genes)

    ############ Create graph object and plot create igraph object
    g <- igraph::graph_from_data_frame(graph_df, directed = FALSE)
    cond_term <- names(igraph::V(g)) %in% result_df[, ID_column]
    cond_up_gene <- names(igraph::V(g)) %in% up_genes

    node_type <-  ifelse(cond_term, "term", ifelse(cond_up_gene, "up", "down"))
    node_type <- factor(node_type, levels = c("term", "up", "down"))
    node_type <- droplevels(node_type)
    igraph::V(g)$type <- node_type

    type_descriptions <- c(term="enriched term", up="up-regulated gene", down="down-regulated gene")
    type_descriptions <- type_descriptions[levels(node_type)]

    names(node_colors) <- c("term", "up", "down")
    node_colors <- node_colors[levels(node_type)]

    # Adjust node sizes
    if (node_size == "num_genes") {
        sizes <- igraph::degree(g)
        sizes <- ifelse(igraph::V(g)$type == "term", sizes, 2)
        size_label <- "# genes"
    } else {
        idx <- match(names(igraph::V(g)), result_df[, ID_column])
        sizes <- -log10(result_df$lowest_p[idx])
        sizes[is.na(sizes)] <- 2
        size_label <- "-log10(p)"
    }
    igraph::V(g)$size <- sizes
    igraph::V(g)$label.cex <- 0.5
    igraph::V(g)$frame.color <- "gray"

    ### Create graph
    p <- ggraph::ggraph(g, layout = layout)
    p <- p + ggraph::geom_edge_link(alpha = 0.8, colour = "darkgrey")
    p <- p + ggraph::geom_node_point(ggplot2::aes(color = .data$type, size = .data$size))
    p <- p + ggplot2::scale_size(range = c(5, 10), breaks = round(seq(round(min(igraph::V(g)$size)),
        round(max(igraph::V(g)$size)), length.out = 4)), name = size_label)
    p <- p + ggplot2::theme_void()
    p <- p + suppressWarnings(ggraph::geom_node_text(ggplot2::aes(label = .data$name),
        nudge_y = 0.2, repel = TRUE, max.overlaps = 20))
    p <- p + ggplot2::scale_color_manual(values = node_colors, name = NULL,
        labels = type_descriptions)
    if (is.null(num_terms)) {
        p <- p + ggplot2::ggtitle("Term-Gene Graph")
    } else {
        p <- p + ggplot2::ggtitle("Term-Gene Graph", subtitle = paste(c("Top", num_terms,
            "terms"), collapse = " "))
    }

    p <- p + ggplot2::theme(plot.title = ggplot2::element_text(hjust = 0.5), plot.subtitle = ggplot2::element_text(hjust = 0.5))

    return(p)
}


#' Create Terms by Genes Heatmap
#'
#' @param result_df A dataframe of pathfindR results that must contain the following columns: \describe{
#'   \item{Term_Description}{Description of the enriched term (necessary if \code{use_description = TRUE})}
#'   \item{ID}{ID of the enriched term (necessary if \code{use_description = FALSE})}
#'   \item{lowest_p}{the highest adjusted-p value of the given term over all iterations}
#'   \item{Up_regulated}{the up-regulated genes in the input involved in the given term's gene set, comma-separated}
#'   \item{Down_regulated}{the down-regulated genes in the input involved in the given term's gene set, comma-separated}
#' }
#' @param genes_df the input data that was used with \code{\link{run_pathfindR}}.
#'   It must be a data frame with 3 columns: \enumerate{
#'   \item Gene Symbol (Gene Symbol)
#'   \item Change value, e.g. log(fold change) (optional)
#'   \item p value, e.g. adjusted p value associated with differential expression
#' } The change values in this data frame are used to color the affected genes
#' @param num_terms Number of top enriched terms to use while creating the plot. Set to \code{NULL} to use
#'  all enriched terms (default = 10)
#' @inheritParams term_gene_graph
#' @inheritParams plot_scores
#' @param legend_title legend title (default = 'change')
#' @param sort_terms_by_p boolean to indicate whether to sort terms by 'lowest_p'
#' (\code{TRUE}) or by number of genes (\code{FALSE}) (default = \code{FALSE})
#' @param ... additional arguments for \code{\link{input_processing}} (used if
#' \code{genes_df} is provided)
#'
#' @return a ggplot2 object of a heatmap where rows are enriched terms and
#' columns are involved input genes. If \code{genes_df} is provided, colors of
#' the tiles indicate the change values.
#' @export
#'
#' @examples
#' term_gene_heatmap(example_pathfindR_output, num_terms = 3)
term_gene_heatmap <- function(result_df, genes_df, num_terms = 10, use_description = FALSE,
    low = "red", mid = "black", high = "green", legend_title = "change", sort_terms_by_p = FALSE,
    ...) {
    ############ Arg checks
    if (!is.logical(use_description)) {
        stop("`use_description` must either be TRUE or FALSE!")
    }

    ### Set column for term labels
    ID_column <- ifelse(use_description, "Term_Description", "ID")

    if (!is.data.frame(result_df)) {
        stop("`result_df` should be a data frame")
    }

    nec_cols <- c(ID_column, "lowest_p", "Up_regulated", "Down_regulated")
    if (!all(nec_cols %in% colnames(result_df))) {
        stop("`result_df` should have the following columns: ", paste(dQuote(nec_cols),
            collapse = ", "))
    }

    if (!missing(genes_df)) {
        suppressMessages(input_testing(genes_df))
    }

    if (!is.null(num_terms)) {
        if (!is.numeric(num_terms)) {
            stop("`num_terms` should be numeric or NULL")
        }

        if (num_terms < 1) {
            stop("`num_terms` should be > 0 or NULL")
        }
    }

    if (!isColor(low)) {
      stop("`low` should be a valid color")
    }

    if (!isColor(mid)) {
      stop("`mid` should be a valid color")
    }

    if (!isColor(high)) {
      stop("`high` should be a valid color")
    }

    ############ Init prep steps
    result_df <- result_df[order(result_df$lowest_p), ]
    ### select num_terms genes
    if (!is.null(num_terms)) {
        if (num_terms < nrow(result_df)) {
            result_df <- result_df[1:num_terms, ]
        }
    }

    ### process input genes (if provided)
    if (!missing(genes_df)) {
        genes_df <- input_processing(input = genes_df, ...)
    }

    ### parse genes from enrichment results
    parse_genes <- function(vec, idx) {
        return(unname(unlist(strsplit(vec[idx], ", "))))
    }

    up_genes <- apply(result_df, 1, parse_genes, which(colnames(result_df) == "Up_regulated"))
    down_genes <- apply(result_df, 1, parse_genes, which(colnames(result_df) == "Down_regulated"))

    if (length(down_genes) == 0) {
        down_genes <- rep(NA, nrow(result_df))
    }
    if (length(up_genes) == 0) {
        up_genes <- rep(NA, nrow(result_df))
    }

    names(up_genes) <- names(down_genes) <- result_df[, ID_column]

    ############ Create terms-by-genes matrix and order
    all_genes <- unique(c(unlist(up_genes), unlist(down_genes)))
    all_genes <- all_genes[!is.na(all_genes)]
    all_terms <- result_df[, ID_column]

    term_genes_mat <- matrix(0, nrow = nrow(result_df), ncol = length(all_genes),
        dimnames = list(all_terms, all_genes))
    for (i in seq_len(nrow(term_genes_mat))) {
        current_term <- rownames(term_genes_mat)[i]
        current_genes <- c(up_genes[[current_term]], down_genes[[current_term]])
        current_genes <- current_genes[!is.na(current_genes)]
        term_genes_mat[i, match(current_genes, colnames(term_genes_mat))] <- 1
    }

    ### Order by column
    term_genes_mat <- term_genes_mat[, order(colSums(term_genes_mat), decreasing = TRUE)]

    ### Order by row
    ordering_func <- function(row) {
        n <- length(row)
        pow <- 2^-(0:(n - 1))
        return(row %*% pow)
    }
    term_genes_mat <- term_genes_mat[order(apply(term_genes_mat, 1, ordering_func),
        decreasing = TRUE), ]

    ### Transform the matrix
    var_names <- list()
    var_names[["Enriched_Term"]] <- factor(rownames(term_genes_mat), levels = rev(rownames(term_genes_mat)))
    var_names[["Symbol"]] <- factor(colnames(term_genes_mat), levels = colnames(term_genes_mat))


    term_genes_df <- expand.grid(var_names, KEEP.OUT.ATTRS = FALSE, stringsAsFactors = FALSE)
    value <- as.vector(term_genes_mat)
    value <- data.frame(value)
    term_genes_df <- cbind(term_genes_df, value)
    term_genes_df$value[term_genes_df$value == 0] <- NA

    bg_df <- expand.grid(Enriched_Term = all_terms, Symbol = all_genes)
    if (sort_terms_by_p) {
        bg_df$Enriched_Term <- factor(bg_df$Enriched_Term, levels = rev(result_df[,
            ID_column]))
    } else {
        bg_df$Enriched_Term <- factor(bg_df$Enriched_Term, levels = rev(rownames(term_genes_mat)))
    }


    bg_df$Symbol <- factor(bg_df$Symbol, levels = colnames(term_genes_mat))

    if (!missing(genes_df)) {
        for (i in seq_len(nrow(term_genes_df))) {
            if (!is.na(term_genes_df$value[i])) {
                if (all(genes_df$CHANGE == 1e+06)) {
                  term_genes_df$value[i] <- ifelse(term_genes_df$Symbol[i] %in% up_genes[[as.character(term_genes_df$Enriched_Term[i])]],
                    1, -1)
                } else {
                  term_genes_df$value[i] <- genes_df$CHANGE[genes_df$GENE == term_genes_df$Symbol[i]]
                }
            }
        }

        if (all(genes_df$CHANGE == 1e+06)) {
            term_genes_df$value <- factor(term_genes_df$value, levels = c(-1, 1))
        }
    } else {
        for (i in seq_len(nrow(term_genes_df))) {
            if (!is.na(term_genes_df$value[i])) {
                term_genes_df$value[i] <- ifelse(term_genes_df$Symbol[i] %in% unlist(up_genes),
                  "up", "down")
            }
        }
    }

    g <- ggplot2::ggplot(bg_df, ggplot2::aes(x = .data$Symbol, y = .data$Enriched_Term))
    g <- g + ggplot2::geom_tile(fill = "white", color = "white")
    g <- g + ggplot2::theme(axis.ticks.y = ggplot2::element_blank(), axis.text.x = ggplot2::element_text(angle = 90,
        hjust = 1), axis.text.y = ggplot2::element_text(colour = "#000000"), axis.title.x = ggplot2::element_blank(),
        axis.title.y = ggplot2::element_blank(), panel.grid.major.x = ggplot2::element_blank(),
        panel.grid.major.y = ggplot2::element_blank(), panel.grid.minor.x = ggplot2::element_blank(),
        panel.grid.minor.y = ggplot2::element_blank(), panel.background = ggplot2::element_rect(fill = "#ffffff"))
    g <- g + ggplot2::geom_tile(data = term_genes_df, ggplot2::aes(fill = .data$value),
        color = "gray60")
    if (!missing(genes_df)) {
        if (all(genes_df$CHANGE == 1e+06)) {
            g <- g + ggplot2::scale_fill_manual(values = c(low, high), na.value = "white",
                name = legend_title)
        } else {
            g <- g + ggplot2::scale_fill_gradient2(low = low, mid = mid, high = high,
                na.value = "white", name = legend_title)
        }
    } else {
        g <- g + ggplot2::scale_fill_manual(values = c(low, high), na.value = "white",
            name = legend_title)
    }
    return(g)
}


#' Create UpSet Plot of Enriched Terms
#'
#' @inheritParams term_gene_heatmap
#' @param method the option for producing the plot. Options include 'heatmap',
#' 'boxplot' and 'barplot'. (default = 'heatmap')
#'
#' @return UpSet plots are plots of the intersections of sets as a matrix. This
#' function creates a ggplot object of an UpSet plot where the x-axis is the
#' UpSet plot of intersections of enriched terms. By default (i.e.
#' \code{method = 'heatmap'}) the main plot is a heatmap of genes at the
#' corresponding intersections, colored by up/down regulation (if
#' \code{genes_df} is provided, colored by change values). If
#' \code{method = 'barplot'}, the main plot is bar plots of the number of genes
#' at the corresponding intersections. Finally, if \code{method = 'boxplot'} and
#' if \code{genes_df} is provided, then the main plot displays the boxplots of
#' change values of the genes at the corresponding intersections.
#' @export
#'
#' @examples
#' UpSet_plot(example_pathfindR_output)
UpSet_plot <- function(result_df, genes_df, num_terms = 10, method = "heatmap", use_description = FALSE,
    low = "red", mid = "black", high = "green", ...) {
    ############ Arg checks
    if (!is.logical(use_description)) {
        stop("`use_description` must either be TRUE or FALSE!")
    }

    ### Set column for term labels
    ID_column <- ifelse(use_description, "Term_Description", "ID")

    if (!is.data.frame(result_df)) {
        stop("`result_df` should be a data frame")
    }

    nec_cols <- c(ID_column, "lowest_p", "Up_regulated", "Down_regulated")
    if (!all(nec_cols %in% colnames(result_df))) {
        stop("`result_df` should have the following columns: ", paste(dQuote(nec_cols),
            collapse = ", "))
    }

    if (!missing(genes_df)) {
        suppressMessages(input_testing(genes_df))
    }

    if (!is.null(num_terms)) {
        if (!is.numeric(num_terms)) {
            stop("`num_terms` should be numeric or NULL")
        }

        if (num_terms < 1) {
            stop("`num_terms` should be > 0 or NULL")
        }
    }

    valid_opts <- c("heatmap", "boxplot", "barplot")
    if (!method %in% valid_opts) {
        stop("`method` should be one of` ", paste(dQuote(valid_opts), collapse = ", "))
    }

    if (!isColor(low)) {
      stop("`low` should be a valid color")
    }

    if (!isColor(mid)) {
      stop("`mid` should be a valid color")
    }

    if (!isColor(high)) {
      stop("`high` should be a valid color")
    }

    ########## Init prep steps
    result_df <- result_df[order(result_df$lowest_p), ]
    ### select num_terms genes
    if (!is.null(num_terms)) {
        if (num_terms < nrow(result_df)) {
            result_df <- result_df[1:num_terms, ]
        }
    }

    ### process input genes (if provided)
    if (!missing(genes_df)) {
        genes_df <- input_processing(input = genes_df, ...)
    }

    ### parse genes from enrichment results
    parse_genes <- function(vec, idx) {
        return(unname(unlist(strsplit(vec[idx], ", "))))
    }

    up_genes <- apply(result_df, 1, parse_genes, which(colnames(result_df) == "Up_regulated"))
    down_genes <- apply(result_df, 1, parse_genes, which(colnames(result_df) == "Down_regulated"))

    if (length(down_genes) == 0) {
        down_genes <- rep(NA, nrow(result_df))
    }
    if (length(up_genes) == 0) {
        up_genes <- rep(NA, nrow(result_df))
    }

    names(up_genes) <- names(down_genes) <- result_df[, ID_column]

    ############ Create terms-by-genes matrix and order
    all_genes <- unique(c(unlist(up_genes), unlist(down_genes)))
    all_terms <- result_df[, ID_column]

    term_genes_mat <- matrix(0, nrow = nrow(result_df), ncol = length(all_genes),
        dimnames = list(all_terms, all_genes))
    for (i in seq_len(nrow(term_genes_mat))) {
        current_term <- rownames(term_genes_mat)[i]
        current_genes <- c(up_genes[[current_term]], down_genes[[current_term]])
        term_genes_mat[i, match(current_genes, colnames(term_genes_mat))] <- 1
    }

    ### Transform the matrix
    var_names <- list()
    var_names[["Enriched_Term"]] <- factor(rownames(term_genes_mat), levels = rownames(term_genes_mat))
    var_names[["Symbol"]] <- factor(colnames(term_genes_mat), levels = colnames(term_genes_mat))


    term_genes_df <- expand.grid(var_names, KEEP.OUT.ATTRS = FALSE, stringsAsFactors = FALSE)
    value <- as.vector(term_genes_mat)
    value <- data.frame(value)
    term_genes_df <- cbind(term_genes_df, value)
    term_genes_df <- term_genes_df[term_genes_df$value != 0, ]

    ### Order according to frequencies
    term_genes_df$Enriched_Term <- factor(term_genes_df$Enriched_Term, levels = names(sort(table(term_genes_df$Enriched_Term),
        decreasing = TRUE)))
    term_genes_df$Symbol <- factor(term_genes_df$Symbol, levels = rev(names(sort(table(term_genes_df$Symbol)))))

    terms_lists <- rev(split(term_genes_df$Enriched_Term, term_genes_df$Symbol))

    plot_df <- data.frame(
      Gene = names(terms_lists),
      Up_Down = ifelse(names(terms_lists) %in% unlist(up_genes), "up", "down"),
      stringsAsFactors = FALSE
    )

    plot_df$Term <- terms_lists

    bg_df <- expand.grid(Gene = unique(plot_df$Gene), Term = unique(plot_df$Term))

    if (method == "heatmap") {
        g <- ggplot2::ggplot(bg_df, ggplot2::aes(x = .data$Term, y = .data$Gene))
        g <- g + ggplot2::geom_tile(fill = "white", color = "gray60")

        if (missing(genes_df)) {
            g <- g + ggplot2::geom_tile(data = plot_df, ggplot2::aes(x = .data$Term,
                y = .data$Gene, fill = .data$Up_Down), color = "gray60")
            g <- g + ggplot2::scale_fill_manual(values = c(low, high))
        } else {
            plot_df$Value <- genes_df$CHANGE[match(names(plot_df$Term), genes_df$GENE)]
            g <- g + ggplot2::geom_tile(data = plot_df, ggplot2::aes(x = .data$Term,
                y = .data$Gene, fill = .data$Value), color = "gray60")
            g <- g + ggplot2::scale_fill_gradient2(low = low, mid = mid, high = high)
        }
        g <- g + ggplot2::theme_minimal()
        g <- g + ggplot2::theme(axis.title = ggplot2::element_blank(), panel.grid.major = ggplot2::element_blank(),
            panel.grid.minor = ggplot2::element_blank(), legend.title = ggplot2::element_blank())
    } else if (method == "boxplot") {
        if (missing(genes_df)) {
            stop("For `method = boxplot`, you must provide `genes_df`")
        }

        plot_df$Value <- genes_df$CHANGE[match(names(plot_df$Term), genes_df$GENE)]
        g <- ggplot2::ggplot(plot_df, ggplot2::aes(x = .data$Term, y = .data$Value))
        g <- g + ggplot2::geom_boxplot()
        g <- g + ggplot2::geom_jitter(width = 0.1)
    } else {
        g <- ggplot2::ggplot(plot_df, ggplot2::aes(x = .data$Term))
        g <- g + ggplot2::geom_bar()
    }

    g <- g + ggupset::scale_x_upset(order_by = ifelse(missing(genes_df), "freq",
        "degree"), reverse = !missing(genes_df))
    return(g)
}


================================================
FILE: R/zzz.R
================================================
.onAttach <- function(libname, pkgname) {
    packageStartupMessage("##############################################################################
                        Welcome to pathfindR!

Please cite the article below if you use pathfindR in published reseach:

Ulgen E, Ozisik O, Sezerman OU. 2019. pathfindR: An R Package for Comprehensive
Identification of Enriched Pathways in Omics Data Through Active Subnetworks.
Front. Genet. doi:10.3389/fgene.2019.00858

##############################################################################")
    check_java_version()
}

#' Obtain Java Version
#'
#' @return character vector containing the output of 'java -version'
#'
#' @details this function was adapted from the CRAN package \code{sparklyr}
fetch_java_version <- function() {
    java_home <- Sys.getenv("JAVA_HOME", unset = NA)
    if (!is.na(java_home)) {
        java <- file.path(java_home, "bin", "java")
        if (identical(.Platform$OS.type, "windows")) {
            java <- paste0(java, ".exe")
        }
        if (!file.exists(java)) {
            java <- ""
        }
    } else {
        java <- Sys.which("java")
    }

    if (java == "") {
        stop("Java version not detected. Please download and install Java from ",
            dQuote("https://www.java.com/en/"))
    }

    version <- system2(java, "-version", stderr = TRUE, stdout = TRUE)
    if (length(version) < 1) {
        stop("Java version not detected. Please download and install Java from ",
            dQuote("https://www.java.com/en/"))
    }

    return(version)
}

#' Check Java Version
#'
#' @param version character vector containing the output of 'java -version'. If
#' NULL, result of \code{\link{fetch_java_version}} is used (default = NULL)
#'
#' @return only parses and checks whether the java version is >= 1.8
#'
#' @details this function was adapted from the CRAN package \code{sparklyr}
check_java_version <- function(version = NULL) {
    if (is.null(version)) {
        version <- fetch_java_version()
    }

    # find line with version info
    versionLine <- version[grepl("version", version)]
    if (length(versionLine) != 1) {
        stop("Java version detected but couldn't parse version from ", paste(version,
            collapse = " - "))
    }

    # transform to usable R version string
    vers_string <- strsplit(versionLine, "\\s+", perl = TRUE)[[1]]
    vers_string <- vers_string[grepl("[0-9]+\\.[0-9]+\\.[0-9]+", vers_string, perl = TRUE)]
    if (length(vers_string) != 1) {
        vers_string <- strsplit(versionLine, "\\s+", perl = TRUE)[[1]]
        vers_string <- vers_string[grepl("[0-9]+", vers_string, perl = TRUE)]
        vers_string <- vers_string[!grepl("-", vers_string)]

        if (length(vers_string) != 1) {
            stop("Java version detected but couldn't parse version from: ", versionLine)
        }
    }

    parsedVersion <- gsub("^\"|\"$", "", vers_string)
    parsedVersion <- gsub("_", ".", parsedVersion)
    parsedVersion <- gsub("[^0-9.]+", "", parsedVersion)

    # ensure Java 1.8 (8) or higher
    if (utils::compareVersion(parsedVersion, "1.8") < 0) {
        stop("Java version", parsedVersion, " detected but Java >=8 is required.
  Please download and install Java from ",
            dQuote("https://www.java.com/en/"))
    }
}


================================================
FILE: README.Rmd
================================================
---
output: github_document
---

<!-- README.md is generated from README.Rmd. Please edit that file -->

```{r, include=FALSE}
knitr::opts_chunk$set(collapse=TRUE,
                      comment="#>",
                      fig.path="inst/extdata/",
                      out.width="100%")
suppressPackageStartupMessages(library(pathfindR))
```

# <img src="https://github.com/egeulgen/pathfindR/blob/master/inst/extdata/logo.png?raw=true" align="left" height=150/> pathfindR: An R Package for Enrichment Analysis Utilizing Active Subnetworks

<!-- badges: start -->
[![R-CMD-check](https://github.com/egeulgen/pathfindR/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/egeulgen/pathfindR/actions/workflows/R-CMD-check.yaml)
[![codecov](https://codecov.io/gh/egeulgen/pathfindR/graph/badge.svg?token=8m9aPaXzNr)](https://codecov.io/gh/egeulgen/pathfindR)
[![CRAN version](https://www.r-pkg.org/badges/version/pathfindR)](https://cran.r-project.org/package=pathfindR)
[![CRAN total downloads](https://cranlogs.r-pkg.org/badges/grand-total/pathfindR)](https://cran.r-project.org/package=pathfindR)
[![install with bioconda](https://img.shields.io/badge/install%20with-bioconda-brightgreen.svg?style=flat)](http://bioconda.github.io/recipes/r-pathfindr/README.html)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/license/mit)
<!-- badges: end -->


# Overview

`pathfindR` is an R package for enrichment analysis via active subnetworks. The package also offers functionality to cluster the enriched terms and identify representative terms in each cluster, score the enriched terms per sample, and visualize analysis results. As of the latest version, the package also allows comparison of two pathfindR results.

The functionality suite of pathfindR is described in detail in _Ulgen E, Ozisik O, Sezerman OU. 2019. pathfindR: An R Package for Comprehensive Identification of Enriched Pathways in Omics Data Through Active Subnetworks. Front. Genet. [https://doi.org/10.3389/fgene.2019.00858](https://doi.org/10.3389/fgene.2019.00858)_

For detailed documentation, see [pathfindR's website](https://egeulgen.github.io/pathfindR/).

# Installation

- You can install the released version of pathfindR from CRAN via:

```{r installation1, eval=FALSE}
install.packages("pathfindR")
```

- Since version 2.1.0, you may also install `pathfindR` via conda:

```{bash conda, eval=FALSE}
conda install -c bioconda r-pathfindr
```

- Via [pak](https://pak.r-lib.org/) (this might be preferable given `pathfindR`'s Bioconductor dependencies):

```{r installation2, eval=FALSE}
install.packages("pak") # if you have not installed "pak"
pak::pkg_install("pathfindR")
```

- And the development version from GitHub via `devtools`:

```{r installation3, eval=FALSE}
install.packages("devtools") # if you have not installed "devtools"
devtools::install_github("egeulgen/pathfindR")
```

> **IMPORTANT NOTE**
> For the active subnetwork search component to work, the user must have [Java (>= 8.0)](https://www.java.com/en/download/) installed, and the path/to/java must be in the PATH environment variable.

We also have docker images available on [Docker Hub](https://hub.docker.com/repository/docker/egeulgen/pathfindr) and [GitHub](https://github.com/egeulgen/pathfindR/packages):

```{bash docker, eval=FALSE}
# pull image for the latest release
docker pull egeulgen/pathfindr:latest

# pull image for a specific version (e.g., 1.4.1)
docker pull egeulgen/pathfindr:1.4.1
```

Online app on superbio.ai: [https://app.superbio.ai/apps/111/](https://app.superbio.ai/apps/111/)

# Enrichment Analysis with pathfindR

![pathfindR Enrichment Workflow](https://github.com/egeulgen/pathfindR/blob/master/vignettes/pathfindr.png?raw=true "pathfindr Enrichment Workflow")

This workflow takes in a data frame consisting of "gene symbols", "change values" (optional), and "associated p-values":

```{r example_input, echo=FALSE}
tmp <- example_pathfindR_input[1:4, ]
tmp$logFC <- round(tmp$logFC,2)
tmp$adj.P.Val <- format(tmp$adj.P.Val, digits=2)
colnames(tmp) <- c("Gene_symbol", "logFC", "FDR_p")
knitr::kable(tmp, align=c("l", "c", "c"))
```

After input testing, any gene symbol that is not in the chosen protein-protein interaction network (PIN) is converted to an alias symbol if there is an alias that is found in the PIN. After mapping the input genes with the associated p-values onto the PIN, active subnetwork search is performed. The resulting active subnetworks are then filtered based on their scores and the number of significant genes they contain. 

> An active subnetwork can be defined as a group of interconnected genes in a protein-protein interaction network (PIN) that predominantly consists of significantly altered genes. In other words, active subnetworks define distinct disease-associated sets of interacting genes, whether discovered through the original analysis or discovered because of being in interaction with a significant gene.

These filtered lists of active subnetworks are then used for enrichment analyses, i.e., using the genes in each of the active subnetworks, the significantly enriched terms (pathways/gene sets) are identified. Enriched terms with adjusted p-values larger than the given threshold are discarded, and the lowest adjusted p-value (among all active subnetworks) for each term is kept. This process of `active subnetwork search + enrichment analyses` is repeated for a selected number of iterations, performed in parallel. Over all iterations, the lowest and the highest adjusted p-values, and the number of occurrences among all iterations are reported for each significantly enriched term.

This workflow can be run using the function `run_pathfindR()`:

```{r basic_usage, eval=FALSE}
library(pathfindR)
output_df <- run_pathfindR(input_df)
```

This wrapper function performs the active-subnetwork-oriented enrichment analysis, and returns a data frame of enriched terms:

![pathfindR Enrichment Chart](https://github.com/egeulgen/pathfindR/blob/master/vignettes/enrichment_chart.png?raw=true "Enrichment Chart")

Some useful arguments are:

```{r useful_args, eval=FALSE}
# set an output directory for saving active subnetworks and creating an HTML report 
# (default=NULL, sets a temporary directory)
output_df <- run_pathfindR(input_df, output_dir="/top/secret/results")

# change the gene sets used for analysis (default="KEGG")
output_df <- run_pathfindR(input_df, gene_sets="GO-MF")

# change the PIN for active subnetwork search (default=Biogrid)
output_df <- run_pathfindR(input_df, pin_name_path="IntAct")
# or use an external PIN of your choice
output_df <- run_pathfindR(input_df, pin_name_path="/path/to/my/PIN.sif")

# change the number of iterations (default=10)
output_df <- run_pathfindR(input_df, iterations=25) 

# report the non-significant active subnetwork genes (for later analyses)
output_df <- run_pathfindR(input_df, list_active_snw_genes=TRUE)
```

The available PINs are "Biogrid", "STRING", "GeneMania", "IntAct", "KEGG" and "mmu_STRING". The available gene sets are "KEGG", "Reactome", "BioCarta", "GO-All", "GO-BP", "GO-CC", "GO-MF", and "mmu_KEGG". You also use a custom PIN (see `?return_pin_path`) or a custom gene set (see `?fetch_gene_set`)

> As of the latest development version, pathfindR offers utility functions for obtaining organism-specific PIN data (for now, only BioGRID PINs) and organism-specific gene sets (KEGG and Reactome) data via `get_pin_file()` and `get_gene_sets_list()`, respectively.

# Clustering of the Enriched Terms

![Enriched Terms Clustering Workflow](https://github.com/egeulgen/pathfindR/blob/master/vignettes/term_clustering.png?raw=true "Enriched Terms Clustering Workflow")
The wrapper function for this workflow is `cluster_enriched_terms()`.

This workflow first calculates the pairwise kappa statistics between the enriched terms. The function then performs hierarchical clustering (by default), automatically determines the optimal number of clusters by maximizing the average silhouette width and returns a data frame with cluster assignments.

```{r clustering_h, eval=FALSE}
# default settings
clustered_df <- cluster_enriched_terms(output_df)

# display the heatmap of hierarchical clustering
clustered_df <- cluster_enriched_terms(output_df, plot_hmap=TRUE)

# display the dendrogram and automatically-determined clusters
clustered_df <- cluster_enriched_terms(output_df, plot_dend=TRUE)

# change agglomeration method (default="average") for hierarchical clustering
clustered_df <- cluster_enriched_terms(output_df, clu_method="centroid")
```

Alternatively, the `fuzzy` clustering method (as described in Huang DW, Sherman BT, Tan Q, et al. The DAVID Gene Functional Classification Tool: a novel biological module-centric algorithm to functionally analyze large gene lists. Genome Biol. 2007;8(9):R183.) can be used:

```{r clustering_f, eval=FALSE}
clustered_df_fuzzy <- cluster_enriched_terms(output_df, method="fuzzy")
```

# Visualization of Enrichment Results

## Enriched Term Diagrams

For H.sapiens KEGG enrichment analyses, `visualize_terms()` can be used to generate KEGG pathway diagrams as `ggraph` (inherits from `ggplot`) objects (using [`ggkegg`](https://github.com/noriakis/ggkegg)):

```{r KEGG_vis, eval=FALSE}
input_processed <- input_processing(example_pathfindR_input)
gg_list <- visualize_terms(
  result_df = example_pathfindR_output,
  input_processed = input_processed,
  is_KEGG_result = TRUE
)  # this function returns a list of ggraph objects (named by Term ID)

# save one of the plots as PDF image
ggplot2::ggsave(
  "hsa04911_diagram.pdf",   # path to output, format is determined by extension
  gg_list$hsa04911,         # what to plot
  width = 5                 # adjust width
  height = 5                # adjust height
) 
```

![KEGG Pathway Diagram](https://github.com/egeulgen/pathfindR/blob/master/vignettes/example_kegg_pathway_diagram.png?raw=true)

Alternatively (i.e., for other types of (non-KEGG) enrichment analyses), an interaction diagram per enriched term can be generated again via `visualize_terms()`. These diagrams are also returned as `ggraph` objects:

```{r nonKEGG_viss, eval=FALSE}
input_processed <- input_processing(example_pathfindR_input)
gg_list <- visualize_terms(
  result_df = example_pathfindR_output,
  input_processed = input_processed,
  is_KEGG_result = FALSE,
  pin_name_path = "Biogrid"
)  # this function returns a list of ggraph objects (named by Term ID)

# save one of the plots as PDF image
ggplot2::ggsave(
  "diabetic_cardiomyopathy_interactions.pdf",   # path to output, format is determined by extension
  gg_list$hsa04911,                             # what to plot
  width = 10                                    # adjust width
  height = 6                                    # adjust height
) 
```

![Interaction Diagram](https://github.com/egeulgen/pathfindR/blob/master/vignettes/example_interaction_vis.png?raw=true)

## Term-Gene Heatmap

The function `term_gene_heatmap()` can visualize the heatmap of enriched terms by the involved input genes. This heatmap allows visual identification of the input genes involved in the enriched terms, and the common or distinct genes between different terms. If the input data frame (same as in `run_pathfindR()`) is supplied, the tile colors indicate the change values.

![Term-Gene Heatmap](https://github.com/egeulgen/pathfindR/blob/master/vignettes/hmap.png?raw=true "Term-Gene Heatmap")

## Term-Gene Graph

The function `term_gene_graph()` (adapted from the Gene-Concept network visualization by the R package `enrichplot`) can be utilized to visualize which significant genes are involved in the enriched terms. The function creates the term-gene graph, displaying the connections between genes and biological terms (enriched pathways or gene sets). This allows for the investigation of multiple terms to which significant genes are related. The graph also enables the determination of the degree of overlap between the enriched terms by identifying shared and/or distinct significant genes.

![Term-Gene Graph](https://github.com/egeulgen/pathfindR/blob/master/vignettes/term_gene.png?raw=true "Term-Gene Graph")

## UpSet Plot

UpSet plots are plots of the intersections of sets as a matrix. This function creates a ggplot object of an UpSet plot where the x-axis is the UpSet plot of intersections of enriched terms. By default (i.e., `method="heatmap"`), the main plot is a heatmap of genes at the corresponding intersections, colored by up-/down-regulation (if `genes_df` is provided, colored by change values). If `method="barplot"`, the main plot is bar plots of the number of genes at the corresponding intersections. Finally, if `method="boxplot"` and `genes_df` is provided, then the main plot displays the boxplots of the genes' change values at the corresponding intersections.

![UpSet plot](https://github.com/egeulgen/pathfindR/blob/master/vignettes/upset.png?raw=true "UpSet Plot")

# Per Sample Enriched Term Scores

![Agglomerated Scores for all Enriched Terms per Sample](https://github.com/egeulgen/pathfindR/blob/master/vignettes/score_hmap.png?raw=true "Scoring per Sample")

The function `score_terms()` can be used to calculate the agglomerated z score of each enriched term per sample. This allows the user to examine the scores individually and infer how a term is overall altered (activated or repressed) in a given sample or a group of samples.

# Comparison of 2 pathfindR Results

The function `combine_pathfindR_results()` allows combining two pathfindR analysis results for investigating common and distinct terms between the groups. Below is an example for comparing two different results using rheumatoid arthritis-related data.

```{r compare2res, eval=FALSE}
combined_df <- combine_pathfindR_results(
  result_A=an_output_df, 
  result_B=another_output_df
)
```

By default, `combine_pathfindR_results()` plots the term-gene graph for the common terms in the combined results. The function `combined_results_graph()` can be used to create this graph (using only selected terms etc.) later on.

```{r compare_graph, eval=FALSE}
combined_results_graph(combined_df, selected_terms=c("hsa04144", "hsa04141", "hsa04140"))
```

![Combined Results Graph](https://github.com/egeulgen/pathfindR/blob/master/vignettes/combined_graph.png?raw=true "Combined Results Graph")


================================================
FILE: README.md
================================================

<!-- README.md is generated from README.Rmd. Please edit that file -->

# <img src="https://github.com/egeulgen/pathfindR/blob/master/inst/extdata/logo.png?raw=true" align="left" height=150/> pathfindR: An R Package for Enrichment Analysis Utilizing Active Subnetworks

<!-- badges: start -->

[![R-CMD-check](https://github.com/egeulgen/pathfindR/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/egeulgen/pathfindR/actions/workflows/R-CMD-check.yaml)
[![codecov](https://codecov.io/gh/egeulgen/pathfindR/graph/badge.svg?token=8m9aPaXzNr)](https://codecov.io/gh/egeulgen/pathfindR)
[![CRAN
version](https://www.r-pkg.org/badges/version/pathfindR)](https://cran.r-project.org/package=pathfindR)
[![CRAN total
downloads](https://cranlogs.r-pkg.org/badges/grand-total/pathfindR)](https://cran.r-project.org/package=pathfindR)
[![install with
bioconda](https://img.shields.io/badge/install%20with-bioconda-brightgreen.svg?style=flat)](http://bioconda.github.io/recipes/r-pathfindr/README.html)
[![License:
MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/license/mit)
<!-- badges: end -->

# Overview

`pathfindR` is an R package for enrichment analysis via active
subnetworks. The package also offers functionality to cluster the
enriched terms and identify representative terms in each cluster, score
the enriched terms per sample, and visualize analysis results. As of the
latest version, the package also allows comparison of two pathfindR
results.

The functionality suite of pathfindR is described in detail in *Ulgen E,
Ozisik O, Sezerman OU. 2019. pathfindR: An R Package for Comprehensive
Identification of Enriched Pathways in Omics Data Through Active
Subnetworks. Front. Genet. <https://doi.org/10.3389/fgene.2019.00858>*

For detailed documentation, see [pathfindR’s
website](https://egeulgen.github.io/pathfindR/).

# Installation

- You can install the released version of pathfindR from CRAN via:

``` r
install.packages("pathfindR")
```

- Since version 2.1.0, you may also install `pathfindR` via conda:

``` bash
conda install -c bioconda r-pathfindr
```

- Via [pak](https://pak.r-lib.org/) (this might be preferable given
  `pathfindR`’s Bioconductor dependencies):

``` r
install.packages("pak") # if you have not installed "pak"
pak::pkg_install("pathfindR")
```

- And the development version from GitHub via `devtools`:

``` r
install.packages("devtools") # if you have not installed "devtools"
devtools::install_github("egeulgen/pathfindR")
```

> **IMPORTANT NOTE** For the active subnetwork search component to work,
> the user must have [Java (\>= 8.0)](https://www.java.com/en/download/)
> installed, and the path/to/java must be in the PATH environment
> variable.

We also have docker images available on [Docker
Hub](https://hub.docker.com/repository/docker/egeulgen/pathfindr) and
[GitHub](https://github.com/egeulgen/pathfindR/packages):

``` bash
# pull image for the latest release
docker pull egeulgen/pathfindr:latest

# pull image for a specific version (e.g., 1.4.1)
docker pull egeulgen/pathfindr:1.4.1
```

Online app on superbio.ai: <https://app.superbio.ai/apps/111/>

# Enrichment Analysis with pathfindR

<figure>
<img
src="https://github.com/egeulgen/pathfindR/blob/master/vignettes/pathfindr.png?raw=true"
title="pathfindr Enrichment Workflow"
alt="pathfindR Enrichment Workflow" />
<figcaption aria-hidden="true">pathfindR Enrichment
Workflow</figcaption>
</figure>

This workflow takes in a data frame consisting of “gene symbols”,
“change values” (optional), and “associated p-values”:

| Gene_symbol | logFC |  FDR_p  |
|:------------|:-----:|:-------:|
| FAM110A     | -0.69 | 3.4e-06 |
| RNASE2      | 1.35  | 1.0e-05 |
| S100A8      | 1.54  | 3.5e-05 |
| S100A9      | 1.03  | 2.3e-04 |

After input testing, any gene symbol that is not in the chosen
protein-protein interaction network (PIN) is converted to an alias
symbol if there is an alias that is found in the PIN. After mapping the
input genes with the associated p-values onto the PIN, active subnetwork
search is performed. The resulting active subnetworks are then filtered
based on their scores and the number of significant genes they contain.

> An active subnetwork can be defined as a group of interconnected genes
> in a protein-protein interaction network (PIN) that predominantly
> consists of significantly altered genes. In other words, active
> subnetworks define distinct disease-associated sets of interacting
> genes, whether discovered through the original analysis or discovered
> because of being in interaction with a significant gene.

These filtered lists of active subnetworks are then used for enrichment
analyses, i.e., using the genes in each of the active subnetworks, the
significantly enriched terms (pathways/gene sets) are identified.
Enriched terms with adjusted p-values larger than the given threshold
are discarded, and the lowest adjusted p-value (among all active
subnetworks) for each term is kept. This process of
`active subnetwork search + enrichment analyses` is repeated for a
selected number of iterations, performed in parallel. Over all
iterations, the lowest and the highest adjusted p-values, and the number
of occurrences among all iterations are reported for each significantly
enriched term.

This workflow can be run using the function `run_pathfindR()`:

``` r
library(pathfindR)
output_df <- run_pathfindR(input_df)
```

This wrapper function performs the active-subnetwork-oriented enrichment
analysis, and returns a data frame of enriched terms:

<figure>
<img
src="https://github.com/egeulgen/pathfindR/blob/master/vignettes/enrichment_chart.png?raw=true"
title="Enrichment Chart" alt="pathfindR Enrichment Chart" />
<figcaption aria-hidden="true">pathfindR Enrichment Chart</figcaption>
</figure>

Some useful arguments are:

``` r
# set an output directory for saving active subnetworks and creating an HTML report 
# (default=NULL, sets a temporary directory)
output_df <- run_pathfindR(input_df, output_dir="/top/secret/results")

# change the gene sets used for analysis (default="KEGG")
output_df <- run_pathfindR(input_df, gene_sets="GO-MF")

# change the PIN for active subnetwork search (default=Biogrid)
output_df <- run_pathfindR(input_df, pin_name_path="IntAct")
# or use an external PIN of your choice
output_df <- run_pathfindR(input_df, pin_name_path="/path/to/my/PIN.sif")

# change the number of iterations (default=10)
output_df <- run_pathfindR(input_df, iterations=25) 

# report the non-significant active subnetwork genes (for later analyses)
output_df <- run_pathfindR(input_df, list_active_snw_genes=TRUE)
```

The available PINs are “Biogrid”, “STRING”, “GeneMania”, “IntAct”,
“KEGG” and “mmu_STRING”. The available gene sets are “KEGG”, “Reactome”,
“BioCarta”, “GO-All”, “GO-BP”, “GO-CC”, “GO-MF”, and “mmu_KEGG”. You
also use a custom PIN (see `?return_pin_path`) or a custom gene set (see
`?fetch_gene_set`)

> As of the latest development version, pathfindR offers utility
> functions for obtaining organism-specific PIN data (for now, only
> BioGRID PINs) and organism-specific gene sets (KEGG and Reactome) data
> via `get_pin_file()` and `get_gene_sets_list()`, respectively.

# Clustering of the Enriched Terms

![Enriched Terms Clustering
Workflow](https://github.com/egeulgen/pathfindR/blob/master/vignettes/term_clustering.png?raw=true "Enriched Terms Clustering Workflow")
The wrapper function for this workflow is `cluster_enriched_terms()`.

This workflow first calculates the pairwise kappa statistics between the
enriched terms. The function then performs hierarchical clustering (by
default), automatically determines the optimal number of clusters by
maximizing the average silhouette width and returns a data frame with
cluster assignments.

``` r
# default settings
clustered_df <- cluster_enriched_terms(output_df)

# display the heatmap of hierarchical clustering
clustered_df <- cluster_enriched_terms(output_df, plot_hmap=TRUE)

# display the dendrogram and automatically-determined clusters
clustered_df <- cluster_enriched_terms(output_df, plot_dend=TRUE)

# change agglomeration method (default="average") for hierarchical clustering
clustered_df <- cluster_enriched_terms(output_df, clu_method="centroid")
```

Alternatively, the `fuzzy` clustering method (as described in Huang DW,
Sherman BT, Tan Q, et al. The DAVID Gene Functional Classification Tool:
a novel biological module-centric algorithm to functionally analyze
large gene lists. Genome Biol. 2007;8(9):R183.) can be used:

``` r
clustered_df_fuzzy <- cluster_enriched_terms(output_df, method="fuzzy")
```

# Visualization of Enrichment Results

## Enriched Term Diagrams

For H.sapiens KEGG enrichment analyses, `visualize_terms()` can be used
to generate KEGG pathway diagrams as `ggraph` (inherits from `ggplot`)
objects (using [`ggkegg`](https://github.com/noriakis/ggkegg)):

``` r
input_processed <- input_processing(example_pathfindR_input)
gg_list <- visualize_terms(
  result_df = example_pathfindR_output,
  input_processed = input_processed,
  is_KEGG_result = TRUE
)  # this function returns a list of ggraph objects (named by Term ID)

# save one of the plots as PDF image
ggplot2::ggsave(
  "hsa04911_diagram.pdf",   # path to output, format is determined by extension
  gg_list$hsa04911,         # what to plot
  width = 5                 # adjust width
  height = 5                # adjust height
) 
```

<figure>
<img
src="https://github.com/egeulgen/pathfindR/blob/master/vignettes/example_kegg_pathway_diagram.png?raw=true"
alt="KEGG Pathway Diagram" />
<figcaption aria-hidden="true">KEGG Pathway Diagram</figcaption>
</figure>

Alternatively (i.e., for other types of (non-KEGG) enrichment analyses),
an interaction diagram per enriched term can be generated again via
`visualize_terms()`. These diagrams are also returned as `ggraph`
objects:

``` r
input_processed <- input_processing(example_pathfindR_input)
gg_list <- visualize_terms(
  result_df = example_pathfindR_output,
  input_processed = input_processed,
  is_KEGG_result = FALSE,
  pin_name_path = "Biogrid"
)  # this function returns a list of ggraph objects (named by Term ID)

# save one of the plots as PDF image
ggplot2::ggsave(
  "diabetic_cardiomyopathy_interactions.pdf",   # path to output, format is determined by extension
  gg_list$hsa04911,                             # what to plot
  width = 10                                    # adjust width
  height = 6                                    # adjust height
) 
```

<figure>
<img
src="https://github.com/egeulgen/pathfindR/blob/master/vignettes/example_interaction_vis.png?raw=true"
alt="Interaction Diagram" />
<figcaption aria-hidden="true">Interaction Diagram</figcaption>
</figure>

## Term-Gene Heatmap

The function `term_gene_heatmap()` can visualize the heatmap of enriched
terms by the involved input genes. This heatmap allows visual
identification of the input genes involved in the enriched terms, and
the common or distinct genes between different terms. If the input data
frame (same as in `run_pathfindR()`) is supplied, the tile colors
indicate the change values.

<figure>
<img
src="https://github.com/egeulgen/pathfindR/blob/master/vignettes/hmap.png?raw=true"
title="Term-Gene Heatmap" alt="Term-Gene Heatmap" />
<figcaption aria-hidden="true">Term-Gene Heatmap</figcaption>
</figure>

## Term-Gene Graph

The function `term_gene_graph()` (adapted from the Gene-Concept network
visualization by the R package `enrichplot`) can be utilized to
visualize which significant genes are involved in the enriched terms.
The function creates the term-gene graph, displaying the connections
between genes and biological terms (enriched pathways or gene sets).
This allows for the investigation of multiple terms to which significant
genes are related. The graph also enables the determination of the
degree of overlap between the enriched terms by identifying shared
and/or distinct significant genes.

<figure>
<img
src="https://github.com/egeulgen/pathfindR/blob/master/vignettes/term_gene.png?raw=true"
title="Term-Gene Graph" alt="Term-Gene Graph" />
<figcaption aria-hidden="true">Term-Gene Graph</figcaption>
</figure>

## UpSet Plot

UpSet plots are plots of the intersections of sets as a matrix. This
function creates a ggplot object of an UpSet plot where the x-axis is
the UpSet plot of intersections of enriched terms. By default (i.e.,
`method="heatmap"`), the main plot is a heatmap of genes at the
corresponding intersections, colored by up-/down-regulation (if
`genes_df` is provided, colored by change values). If
`method="barplot"`, the main plot is bar plots of the number of genes at
the corresponding intersections. Finally, if `method="boxplot"` and
`genes_df` is provided, then the main plot displays the boxplots of the
genes’ change values at the corresponding intersections.

<figure>
<img
src="https://github.com/egeulgen/pathfindR/blob/master/vignettes/upset.png?raw=true"
title="UpSet Plot" alt="UpSet plot" />
<figcaption aria-hidden="true">UpSet plot</figcaption>
</figure>

# Per Sample Enriched Term Scores

<figure>
<img
src="https://github.com/egeulgen/pathfindR/blob/master/vignettes/score_hmap.png?raw=true"
title="Scoring per Sample"
alt="Agglomerated Scores for all Enriched Terms per Sample" />
<figcaption aria-hidden="true">Agglomerated Scores for all Enriched
Terms per Sample</figcaption>
</figure>

The function `score_terms()` can be used to calculate the agglomerated z
score of each enriched term per sample. This allows the user to examine
the scores individually and infer how a term is overall altered
(activated or repressed) in a given sample or a group of samples.

# Comparison of 2 pathfindR Results

The function `combine_pathfindR_results()` allows combining two
pathfindR analysis results for investigating common and distinct terms
between the groups. Below is an example for comparing two different
results using rheumatoid arthritis-related data.

``` r
combined_df <- combine_pathfindR_results(
  result_A=an_output_df, 
  result_B=another_output_df
)
```

By default, `combine_pathfindR_results()` plots the term-gene graph for
the common terms in the combined results. The function
`combined_results_graph()` can be used to create this graph (using only
selected terms etc.) later on.

``` r
combined_results_graph(combined_df, selected_terms=c("hsa04144", "hsa04141", "hsa04140"))
```

<figure>
<img
src="https://github.com/egeulgen/pathfindR/blob/master/vignettes/combined_graph.png?raw=true"
title="Combined Results Graph" alt="Combined Results Graph" />
<figcaption aria-hidden="true">Combined Results Graph</figcaption>
</figure>


================================================
FILE: _pkgdown.yml
================================================
destination: docs
template:
  params:
    bootswatch: united
    docsearch:
      api_key: 7f13d388d59fea08d4add29291ea2896
      index_name: pathfindr
url: https://egeulgen.github.io/pathfindR/
news:
- one_page: false
reference:
  - title: "pathfindR"
    desc: >
      pathfindR package
    contents:
      - pathfindR
  - title: "Main functions"
    desc: >
      Main functions of pathfindR
    contents:
      - run_pathfindR
      - cluster_enriched_terms
      - enrichment_chart
      - score_terms
      - plot_scores
      - term_gene_heatmap
      - term_gene_graph
      - UpSet_plot
      - visualize_terms
      - cluster_graph_vis
      - visualize_active_subnetworks
  - title: "Data Generation"
    desc: >
      Functions to generate PIN and gene sets data
    contents:
      - get_pin_file
      - get_gene_sets_list
  - title: "Comparison of 2 pathfindR Results"
    desc: >
      Functions to compare 2 different pathfindR results
    contents:
      - combine_pathfindR_results
      - combined_results_graph
  - title: "Enrichment-related functions"
    desc: >
      Active subnetwork search- and Enrichment-related functions
    contents:
      - active_snw_enrichment_wrapper
      - single_iter_wrapper
      - active_snw_search
      - annotate_term_genes
      - enrichment
      - enrichment_analyses
      - fetch_gene_set
      - filterActiveSnws
      - hyperg_test
      - input_processing
      - input_testing
      - return_pin_path
      - summarize_enrichment_results
  - title: "Clustering-related functions"
    desc: >
      Clustering-related functions
    contents:
     - create_kappa_matrix
     - fuzzy_term_clustering
     - hierarchical_term_clustering
  - title: "Visualization functions"
    desc: >
      Visualization-related functions
    contents:
      - color_kegg_pathway
      - visualize_KEGG_diagram
      - visualize_term_interactions
  - title: "Misc. functions"
    desc: >
      Miscellaneous functions
    contents:
      - configure_output_dir
      - check_java_version
      - fetch_java_version
      - get_biogrid_pin
      - process_pin
      - get_kegg_gsets
      - get_reactome_gsets
      - get_mgsigdb_gsets
      - gset_list_from_gmt
      - create_HTML_report
      - isColor
      - safe_get_content


================================================
FILE: codecov.yml
================================================
comment: false

coverage:
  status:
    project:
      default:
        target: auto
        threshold: 1%
    patch:
      default:
        target: auto
        threshold: 1%


================================================
FILE: cran-comments.md
================================================
## Test environments
* local OS X 26.2, R 4.5.2
* macOS-latest (on GitHub-Actions), R 4.5.2
* windows-latest (on GitHub-Actions), R 4.5.2
* ubuntu-latest (on GitHub-Actions), R 4.5.2
* ubuntu-latest (on GitHub-Actions), R devel
* ubuntu-latest (on GitHub-Actions), R 4.4.3
* win-builder (devel and release)

## R CMD check results
  There were no ERRORs, WARNINGs or NOTEs.
  
  This is a minor release for 'pathfindR', fixing the CRAN errors due to strong
  dependencies on a package from Bioconductor data annotation repository. The
  package was moved to 'Suggests' and code was updated to conditionally execute
  if installed, raising an informative message if not.
  
## Downstream dependencies
  There are currently no downstream dependencies for this package.


================================================
FILE: inst/CITATION
================================================
citHeader("Please cite the article below if you use pathfindR in published reseach")

bibentry(
  bibtype = "Article",
	author = c(person("Ege","Ulgen"),
	           person("Ozan","Ozisik"),
	           person(c("Osman","Ugur"),"Sezerman")),
	title = "pathfindR: An R Package for Comprehensive Identification of Enriched Pathways in Omics Data Through Active Subnetworks",
	journal = "Frontiers in Genetics",
	volume = 10,
	year = 2019,
	pages = 858,


  url = "https://doi.org/10.3389/fgene.2019.00858",
	textVersion =
	"Ulgen E, Ozisik O, Sezerman OU. 2019. pathfindR: An R Package for Comprehensive Identification of Enriched Pathways in Omics Data Through Active Subnetworks. Front. Genet. https://doi.org/10.3389/fgene.2019.00858"
)


================================================
FILE: inst/extdata/CREB.txt
================================================
CREB_01
> Genes having at least one occurence of the transcription factor binding site V$CREB_01 (v7.4 TRANSFAC) in the regions spanning up to 4 kb around their transcription starting sites.
ABCE1
ABHD16A
ADAP1
ADCY8
ADNP
ADNP2
AFF4
AHI1
AKIRIN1
ALS2
ANAPC10
APPBP2
AREG
ARIH1
ARL4D
ASPHD1
ATF3
ATG5
ATL2
ATP6V0C
AVPI1
BNIP3L
BRAF
BRE
C11orf87
C12orf66
C1orf35
C3orf26
CALM2
CAMK2D
CBX8
CCBL1
CCDC148
CCNA2
CD2AP
CDC42
CDK2AP2
CDS1
CDX4
CENPE
CHGB
CHMP1B
CHMP2A
CHPF
CLDN6
CLDN7
CLSTN3
CNTROB
CREM
CRH
CSDC2
CTC1
CXCL16
CYLD
DAAM2
DCTN1
DDX19A
DDX28
DDX51
DEPDC4
DGUOK
DHX36
DIO2
DNAJC27
DNTTIP1
DUS2L
DUSP1
EEF2
ELAVL1
ELL2
ELOVL5
EPB41
EPHA2
FAM131A
FAM174A
FAM65A
FGF6
FLJ44313
FLT1
FOSB
FOXD3
G3BP2
GEM
GLI1
GNB4
GNL1
GPBP1
GPM6B
GPR3
GTF3C1
HAS1
HDX
HHIP
HIST1H4E
HOXC10
HS3ST2
HS3ST3A1
HSP90AB1
ID1
IFT20
IKBKB
ING4
INTS7
IRF2BPL
IRX4
IRX6
JUND
KCNA5
KCNF1
KCTD8
LDHA
LGR5
LMCD1
LTBP1
MAF
MAFF
MAOA
MAP1LC3A
MAP3K13
MAPK10
MBIP
MBNL2
MCAM
MITF
MLF2
MMGT1
MRGPRF
MRRF
MYL6
NCALD
NDUFA10
NDUFB2
NEUROD6
NF1
NOC4L
NOL4
NR2E1
NR4A2
NUP214
NUP98
NUPL2
OGDH
ORMDL2
OSBPL9
OSR1
PACRGL
PAFAH1B1
PAK1
PARD6A
PCSK1
PDLIM3
PDP1
PEG3
PER1
PFAS
PHACTR3
PITX2
PKP4
PLCD3
PLK4
PNMA3
PNMA6A
PNRC1
PPARGC1A
PPM1A
PPP1R15A
PPP2R2A
PRELID1
PRR3
PTPRU
RAB24
RAB25
RAB6A
RAD51C
RAI1
RALGAPA1P
RBKS
RBM18
RBMS2
RBP5
RCAN1
RCE1
RELB
RIPK4
RNF44
RNF7
RNMTL1
RPL41
RPRD1A
RPS29
RUNDC3A
RUSC1
RUSC1-AS1
SARNP
SCAMP5
SCG2
SDHB
SEMA4C
SENP2
SEZ6L2
SGIP1
SIK1
SLC18A2
SLC25A25
SLC35F5
SLC38A1
SNAP25
SPAG9
SPATA7
SREBF2
SRRM4
SST
ST13
STAT3
SULT4A1
SUV39H2
SYNGR3
SYT11
TEX14
TGIF2
TH
THADA
THOC1
TIPRL
TMEM147
TMEM39A
TMUB2
TNFAIP1
TP53INP2
TRAF4
TRAP1
TSC22D2
TSPAN7
TUBB2B
UBE2H
UCN
UMPS
USP48
VGF
VPS37B
WFDC3
WISP1
WNT10A
XPNPEP3
XRN2
YTHDC2
YWHAZ
ZBTB11
ZBTB37
ZC3H10
ZFAND2B
ZFYVE27
ZIM2
ZMYM2
ZMYND15
ZNF184
ZNF295
ZNF335
ZNF367
ZNF576
ZNF593
ZNF687


================================================
FILE: inst/extdata/MYC.txt
================================================
CACGTG_MYC_Q2
> Genes having at least one occurence of the highly conserved motif M2 CACGTG sites. The motif matches transcription factor binding site V$MYC_Q2 (v7.4 TRANSFAC).
AATF
ABCA1
ABCA2
ABCB6
ABCC4
ACACA
ACAN
ACAP3
ACLY
ACVR2B
ACY1
ADAM10
ADAM12
ADAMTS17
ADAMTS19
ADAMTS3
ADCK1
ADCY3
ADK
ADNP
ADPRHL1
ADSS
AEN
AFF4
AGMAT
AGRP
AIFM3
AK2
AK3
AKAP1
AKAP10
AKAP12
ALDH18A1
ALDH3B1
ALDH6A1
ALDOA
ALG1
ALX4
AMD1
AMMECR1L
AMPD2
ANAPC13
ANGPT2
ANKHD1
ANKHD1-EIF4EBP3
ANKRD12
ANKRD13B
ANKRD17
ANXA6
AP1S2
AP3D1
AP3M1
APEX1
APOA5
ARF6
ARHGAP12
ARHGAP17
ARHGAP20
ARHGAP44
ARL2
ARL3
ARMC6
ARMCX1
ARMCX2
ARMCX3
ARMCX4
ARMCX6
ARPC5
ARRDC3
ARRDC4
ARX
ASNA1
ASPHD1
ASPSCR1
ASS1
ATAD3A
ATAD3B
ATF1
ATF4
ATF7IP
ATG3
ATG5
ATIC
ATL2
ATOH8
ATP1B3
ATP5F1
ATP5G1
ATP6V0B
ATP6V1A
ATP6V1C1
ATP6V1H
ATP7A
ATXN3
ATXN7L2
AVP
AVPI1
B3GALT2
B3GALT6
B3GNT9
B4GALT2
BAI2
BARHL1
BATF2
BATF3
BAX
BCAS3
BCKDHA
BCL11B
BCL6
BCL7C
BCL9
BCL9L
BCOR
BDNF
BEND4
BEX1
BEX2
BFAR
BHLHA15
BHLHE40
BHLHE41
BLOC1S1
BMP2
BMP2K
BMP4
BMP6
BMP7
BOK
BRD2
BRDT
C10orf46
C11orf10
C12orf12
C12orf66
C14orf132
C15orf39
C16orf42
C16orf72
C17orf102
C19orf2
C19orf26
C19orf54
C19orf70
C1orf43
C1orf51
C1QBP
C20orf111
C20orf112
C21orf91
C2CD2L
C2CD4A
C2orf67
C5orf4
C5orf41
C6orf125
C6orf211
C9orf139
C9orf85
CA14
CACNA1D
CAD
CAMK2D
CAMK4
CAMKK1
CAMKK2
CAMKV
CANX
CBX5
CBX6
CCAR1
CCDC103
CCDC126
CCDC132
CCDC41
CCDC6
CCNYL1
CD164
CD3EAP
CDC14A
CDK20
CDK5R1
CDKL5
CDKN2C
CEACAM5
CEBPA
CEBPB
CELF1
CELSR3
CEP57
CEP63
CEP95
CFL1
CGN
CGREF1
CHD4
CHRM1
CHRNA7
CHST11
CIRH1A
CITED2
CLCN2
CLIP2
CLN3
CLTC
CMTM8
CNNM1
CNOT4
CNPPD1
CNPY2
CNPY3
CNST
COL25A1
COL2A1
COMMD3
COMMD8
COPS7A
COPZ1
CPEB4
CPT1A
CRMP1
CSDA
CSDE1
CSK
CSRP2BP
CTAGE5
CTBP2
CTIF
CTSA
CTSF
CUL5
CXorf41
CXorf56
CYP2D6
DAZL
DCAF11
DCAF13
DCHS1
DCTN4
DCTPP1
DCUN1D4
DDX18
DDX3X
DDX4
DDX5
DEPDC7
DHX35
DIABLO
DIAPH1
DIRC2
DKFZp547B139
DLC1
DLX1
DLX2
DNAAF1
DNAJB5
DNAJB9
DNMT3A
DNTTIP2
DOHH
DOLK
DOPEY1
DPAGT1
DRD1
DSCAM
DTNA
DUSP1
DUSP4
DUSP7
DVL2
DYM
DYRK1B
DZIP1
E2F3
E2F8
EBNA1BP2
EEF1B2
EEF1E1
EFNB1
EFTUD2
EGLN2
EIF2C2
EIF3A
EIF3B
EIF3E
EIF3J
EIF4A1
EIF4B
EIF4E
EIF4G1
ELAVL3
ELK1
ELOVL5
EME1
EN1
EN2
ENO3
ENOPH1
ENPP6
EPB41
EPB41L4B
EPC1
ERCC6
ERF
ESR2
ESRP2
ESRRA
ESYT1
ETV1
EVC2
EWSR1
EXOSC5
EYA4
FABP3
FADS3
FAF1
FAM108B1
FAM116A
FAM117A
FAM123B
FAM134A
FAM13B
FAM164C
FAM179B
FAM192A
FAM19A4
FAM76B
FARP1
FBL
FBXL19
FBXL19-AS1
FBXO33
FCHSD2
FEN1
FGF11
FGF14
FGF19
FGF6
FHOD1
FKBP10
FKBP11
FKBP5
FLJ45684
FLT3
FLVCR2
FMR1
FOSL1
FOXD3
FOXF2
FOXJ3
FOXO3
FOXRED2
FPGS
FRMD3
FXR1
FXYD2
FXYD6
G6PC3
GABARAP
GADD45B
GADD45G
GAPDH
GAR1
GATA4
GATA5
GCSH
GEMIN4
GGN
GIGYF2
GIT1
GJA1
GK
GLA
GLB1L
GLS2
GLYR1
GMFB
GNA13
GNAS
GNB2
GNL1
GNPTG
GPC3
GPD1
GPM6B
GPR176
GPRC5C
GPS1
GPX1
GRIN2A
GRK6
GSK3B
GTF2A1
GTF2H1
GTPBP1
GUCY1A2
GYG1
H2AFZ
H3F3A
HBP1
HEBP2
HERPUD1
HEXA
HHEX
HHIP
HIF1A
HIRA
HLX
HMGA1
HMGN2
HMHA1
HMOX1
HNRNPA1
HNRNPA3
HNRNPD
HNRNPF
HNRNPH2
HNRNPH3
HNRNPM
HNRPDL
HOOK2
HOXA1
HOXA11
HOXA3
HOXA4
HOXA7
HOXA9
HOXB4
HOXB5
HOXB7
HOXC11
HOXC12
HOXC13
HOXC5
HOXD10
HPCA
HPCAL4
HPS3
HPS5
HRH3
HRSP12
HS3ST3A1
HSD11B1L
HSDL1
HSP90AB1
HSPA4
HSPA9
HSPBAP1
HSPD1
HSPE1
HSPH1
HYAL2
ICAM5
IER5L
IFRD2
IGF1R
IGF2BP1
IGF2BP3
IGF2R
IGSF22
IKZF3
IL15RA
IL1RAPL1
ILF3
ILK
INSM1
IPO13
IPO4
IPO7
IQCG
IQGAP2
IRF9
IRS4
JAKMIP1
JOSD1
JPH1
KAT5
KAT6A
KBTBD2
KCMF1
KCNAB1
KCNE4
KCNH4
KCNK5
KCNN4
KCNQ5
KCTD15
KDM3A
KDM4C
KDM6A
KIAA0090
KIAA0586
KIAA0664
KIAA1033
KIAA1407
KIAA1539
KIAA1715
KIAA1737
KLF10
KLF11
KLF9
KLHDC3
KLHL28
KRTCAP2
LAMP1
LAP3
LCLAT1
LDHA
LEF1
LEPREL4
LHX9
LIG3
LIN28A
LMLN
LMNB1
LMX1A
LOC147727
LOC80054
LONP1
LONRF3
LPCAT4
LRFN4
LRP8
LRRC16B
LRRC48
LTBR
LYAR
LYPD1
LZTFL1
LZTS2
MACROD1
MAEL
MAFF
MANF
MAP7
MAPKAPK3
MAT2A
MAX
MBNL1
MCART1
MCM2
MCM8
MCOLN1
MDN1
MEA1
MED28
MEOX2
MEPCE
METAP1D
METTL11A
MFHAS1
MFSD5
MGC13053
MICAL2
MICU1
MIEN1
MINK1
MLL
MMP23A
MMP23B
MNT
MON1A
MORF4L2
MPP3
MPV17
MRM1
MRPL27
MRPL40
MRTO4
MTAP
MTCH2
MTHFD1
MTUS1
MXD3
MXD4
MYB
MYCL1
MYL12A
MYO19
NAA25
NAPA
NAT10
NCL
NCOA6
NDUFA7
NDUFAF4
NDUFS1
NEFM
NEK6
NET1
NEURL2
NEUROD1
NEUROD6
NFATC3
NFX1
NGFRAP1
NID1
NIT1
NKX2-2
NKX2-3
NLN
NMNAT2
NOL6
NOL8
NOP56
NOP58
NOTCH1
NPM1
NPTN
NPTX1
NR0B2
NR1D1
NR4A3
NR5A1
NR6A1
NRAS
NRIP3
NRXN1
NTHL1
NTN3
NUDC
NUDT11
NUP153
NUP62CL
NXPH1
ODC1
OGDHL
OGT
OLFM2
ONECUT1
OPRD1
OSGEP
OSR1
PA2G4
PABPC1
PABPC4
PAICS
PANK3
PATZ1
PAX6
PBRM1
PCDHA10
PDCD6IP
PDIA2
PDIA4
PDK2
PDP2
PDPR
PEPD
PER1
PES1
PFDN2
PFDN6
PFKFB3
PFN1
PHC3
PHF15
PHF17
PHF20L1
PIAS4
PICALM
PIGW
PIK3IP1
PITX3
PKN1
PLA2G4A
PLA2G6
PLAG1
PLAGL1
PLBD1
PLCG2
PLEKHA6
PNMAL1
POGK
POLA1
POLH
POLR2H
POLR2L
POLR3A
POLR3C
POLR3D
POLR3E
POP1
POU3F2
PPARGC1A
PPARGC1B
PPAT
PPCS
PPM1A
PPP1R1B
PPP1R3B
PPP1R3C
PPP1R9B
PPP2R2B
PPRC1
PRDM4
PRELID1
PRKCE
PRKCG
PRKCH
PRMT1
PROK2
PRPS1
PRPS2
PRR3
PRR7
PRRC2C
PSEN2
PSMB3
PSME3
PTCH1
PTGES2
PTMA
PUS1
PWP2
QTRT1
QTRTD1
RAB24
RAB2A
RAB30
RAB31
RAB3IL1
RABGAP1
RAD50
RAD9A
RALYL
RAMP2
RANBP1
RAPGEF6
RARG
RASD2
RASGRP2
RBBP4
RBBP6
RBBP7
RBM15B
RBM19
RBM3
RCL1
RCOR2
REEP3
RELB
REV1
REXO2
RFX4
RFX5
RGL1
RHEBL1
RHOBTB3
RLF
RNF115
RNF128
RNF145
RNF146
RNF219
RNF43
RNF44
RORC
RPA1
RPIA
RPL13A
RPL19
RPL22
RPL30
RPS11
RPS19
RPS2
RPS28
RPS6KA5
RPUSD4
RRAGB
RRAGC
RRP15
RRP8
RRS1
RSPO2
RSPRY1
RTN4
RTN4R
RTN4RL2
RUNX1T1
RUNX2
RUSC1
RXRB
SAE1
SAMD12
SASH3
SATB2
SC5DL
SCAMP3
SCFD2
SCRT2
SCYL1
SDC1
SEC11C
SEC23IP
SELS
SEMA3F
SEMA7A
SEPT3
SERBP1
SET
SEZ6L
SEZ6L2
SFXN2
SGK1
SGOL1
SGTB
SHMT1
SHMT2
SIGMAR1
SIRT1
SLC12A5
SLC12A6
SLC17A2
SLC17A9
SLC1A7
SLC20A1
SLC24A4
SLC25A32
SLC25A33
SLC25A37
SLC26A10
SLC26A2
SLC31A2
SLC33A1
SLC35A5
SLC36A1
SLC38A2
SLC38A5
SLC38A7
SLC39A11
SLC39A7
SLC43A1
SLC4A11
SLC6A15
SLC7A3
SLC7A5
SLC7A5P1
SLC9A5
SLCO1C1
SLCO5A1
SMAD2
SMC3
SMG1
SMYD4
SNCAIP
SNCB
SNN
SNRPA
SNTB2
SNX16
SNX2
SNX5
SNX8
SOCS2
SOCS5
SORD
SORL1
SOX12
SPATA2L
SPG21
SPIN2A
SPNS1
SPPL3
SRFBP1
SRP72
SSR1
ST6GAL1
STAT3
STC2
STK31
STMN1
STMN4
STT3B
STX6
STXBP2
SUCLG2
SUGP2
SUMF1
SUPT16H
SYBU
SYNCRIP
SYNRG
SYT3
SYT6
TAC1
TADA1
TAF6L
TAGLN2
TBC1D15
TBC1D5
TBL1X
TBL1Y
TBX4
TCEAL1
TCEAL3
TCEAL7
TCEAL8
TCERG1
TCF4
TCOF1
TDRD1
TEF
TESK2
TET2
TEX12
TFAP4
TFB2M
TFRC
TGFB2
TGFB3
TGIF2
THAP5
THUMPD2
TIAL1
TIMM10
TIMM50
TIMM8A
TIMM9
TLE3
TLE4
TLL1
TMED10
TMEM108
TMEM132E
TMEM146
TMEM47
TMEM86A
TMUB1
TNFRSF21
TNPO2
TNXB
TOM1
TOM1L2
TOP1
TOPORS
TPM2
TRAPPC8
TRIB1
TRIM3
TRIM37
TRIM46
TRIP10
TRMT2A
TRMT6
TRPM7
TSC1
TSC2
TSKU
TSPAN4
TSR2
TSSK3
TUBA4A
TUBA4B
TUG1
TXLNG
TXNDC12
U2AF2
UBA1
UBE2B
UBE2C
UBE4B
UBIAD1
UBR4
UBR5
UBXN1
UBXN10
UBXN6
UNC45B
USP15
USP2
USP31
USP34
USP36
UTP14A
UTP18
UTY
UVRAG
VARS2
VGF
VLDLR
VPS13A
VPS16
VPS26A
VPS33A
VPS37B
VWA2
WBP2
WDR17
WDR46
WDR65
WDR77
WEE1
WHSC1L1
WIPI1
WWP2
XPO1
XPO5
XRN2
YBX1
YBX2
YEATS2
YPEL5
ZADH2
ZBTB10
ZBTB11
ZBTB40
ZBTB49
ZBTB5
ZBTB8OS
ZC3H10
ZCCHC7
ZCWPW1
ZFP91
ZFYVE26
ZHX2
ZIC1
ZMYM6
ZMYND12
ZNF296
ZNF318
ZNF503
ZNF565
ZNF574
ZNF593
ZNF711
ZNF771
ZNF800
ZNHIT6
ZNRF2
ZZZ3


================================================
FILE: inst/extdata/resultActiveSubnetworkSearch.txt
================================================
91.13730033969254 ZRANB1 ALKBH5 HNRNPH1 EWSR1 PSMD7 CSNK2A2 TRRAP ZNF148 CBX3 NUP93 NUP214 CFL1 PFN1 ATPIF1 PDIA4 RMDN3 HNRNPD HNRNPR RPL31 RPL26 EIF4A3 SARNP YARS ACTG1 TFG MAGED1 DNAAF5 C16ORF58 MARC1 PGRMC2 PTRH2 ATP5J VDAC1 SF3B6 ILF3 SHMT1 APEH ATIC ECH1 GSTO1 TXN CDC37 FBXW4 PFDN5 PIK3R4 PARP1 PRPSAP1 PRPSAP2 COPS5 COPS7B ZZEF1 TCEB2 PCBP1 UBL5 CRKL SGK223 U2AF2 GMFG SF3B2 SRSF5 RPLP2 RPS24 HNRNPDL C12ORF65 MRPS18C SRP54 EIF2S3 BRD7 BRD9 CKAP4 ITGB1 ATP5I UBAC2 PDHA1 ECHS1 MDH2 MRPL49 DNAJA3 PYCR2 SLIRP CALM3 MYL6 MLH1 CKB ARIH2 SMARCA4 ZMYND8 KDM1A LEO1 MRPL33 AKR7A2 MTMR12 THEM6 IARS AIMP2 EEF2 GAPDH ANXA7 PDHB RCC2 BUD31 SON S100A9 RNPS1 ZNF207 DPY30 SRRM1 SRPK1 SRSF8 ERH PHRF1 DDX23 PTBP1 RANGAP1 UBE2I HMGN2 UTRN PHF19 HDAC1 RING1 KDM2B SIN3A THAP11 TNPO1 KCTD5 CYFIP1 GNB1 MTOR TUBB S100A8 RPA1 XPC LSM3 USP11 DYNC1H1 TSR2 DDX24 GMDS ALDH9A1 STK38 LRRFIP1 CLASP2 TRIP12 WRNIP1 ELP3 ARG1 HNRNPA1 HNRNPAB SNAP23 SEPT9 GORASP2 TXLNA NUP62 UBIAD1 EIF2B1 DDX54 TAF15 ARAF FAF2 SLC25A5 RBM14 WDR33 S100P RAB11FIP3 DDOST RRP1 ANKHD1 UQCRQ SLC25A3 SUGP2 QSOX2 ATP2A2 NOA1 GAK PUF60 DAZAP1 AASDHPPT CAPRIN1 NUFIP2 HMGB2 CNDP2 SLC9A3R1 PUS1 NRD1 VPS25 ZNF664 POLD2 MAT2B UVRAG KLF13 WDR11 CREB1 TAF4 ANAPC1 DNMT1 ABCF1 ACOT13 JAK1 PHACTR4 SLBP NKTR NOL9 SBF1 PSMG2 FBXO21 MLLT6 TIMM8B GTF2B HMCES PITPNB TAMM41 COMT SPECC1L DCK HNMT CRELD2 BCL9L DSTN GLRX CASP3 METAP1 GNE C2ORF68 CEP85 CNOT11 WDR4 
60.49402062461474 DDX58 FNBP1 MCEMP1 EWSR1 DDIT3 COPS7B FBXW4 PFDN5 ZNF148 TCEB2 S100A8 KDM1A SF3B2 SRSF5 PTBP1 RANGAP1 PDHA1 ECHS1 SLC9A3R1 KCTD5 GOT1 DNAJA3 MAP4K1 RNF34 EEF2 ATIC USP11 ELMSAN1 FOXJ3 WRNIP1 ELP3 IARS HNRNPAB ACTB ACTG1 CFL1 FAF2 TMEM43 PDIA4 TMEM131 SPG7 UBE2I PRPSAP1 BAG4 IST1 SNRPB BHLHE40 ADA YEATS2 MYL6B ABCF1 TICAM1 DDX54 GSE1 BTBD2 ZNF207 PHRF1 TAF4 GLTSCR1 IGBP1 CENPB GID8 MLLT6 KCTD20 GAK NUDT9 RETN SPOCK2 ZMIZ1 BLOC1S4 WDR59 UBL5 FBXL12 FAM43A PARP6 COMT KDM2B HK3 ZNF296 SARS FRMD8 MDM1 ST6GAL1 IMP3 C14ORF2 CD37 RTN3 TSPYL1 S100A12 SLC16A11 C6ORF136 RASAL3 GLRX FAM65A RUNX3 TEX261 TMCC3 
57.29788313236089 APP C12ORF65 PYCR2 HNRNPH1 DFFB COPS5 COPS7B PRPSAP2 CAPRIN1 RPL31 ADARB1 SRPK1 SF3B2 SF3B6 VDAC1 CLEC4D ANXA5 ACTB GSTO1 ECH1 DYNLT1 ERH PAXIP1 FUCA1 ETS1 UBE2I SRSF8 PTBP1 PDHB ITGB1 SLIRP ECHDC2 MRPL49 HNRNPA1 GAPDH CFL1 PCBP1 CDC37 BMX DDX47 WDR45B TUBB S100A8 EDEM1 MAT2B IARS AIMP2 GORASP2 RPUSD2 SEPT9 GAK HMGB2 IGBP1 ASPSCR1 ABCF1 PHACTR4 DDX54 NUDT9 ARHGAP17 COMT CD74 TMEM109 ZSCAN21 BUD31 RING1 MLLT6 HADH MED24 CENPB APEX2 LCP2 TBC1D9B ZFAND5 SLC41A3 BLOC1S1 ANKRD13D HMCES RASA3 TMA7 FRMD8 STK25 NDUFB3 UBE2G1 GIMAP6 TSPYL1 CRACR2B ZFP36 C1ORF174 BET1L RNF34 S1PR1 
54.15217634009142 M FNBP1 ANKHD1 EIF4A3 HNRNPR RPL26 SRSF5 RCC2 PDHB MDH2 CKB PFN1 BAG4 PGRMC2 GHITM SLC52A2 DNAJA3 KDM1A RNF34 RMDN3 STIM2 CKAP4 ARF1 RAB9A STX6 SCRIB UTRN ITGB1 UBAC2 FAF2 TMEM43 PDIA4 PIGT PAFAH1B1 DDOST SLC25A3 ATP2A2 RBM14 SNRPB TMEM131 DSTN SLIRP SLC25A5 RPS24 EIF2S3 DDX54 SNAP23 IKBKB PRKCQ PHACTR4 STX10 C16ORF58 GORASP2 SHMT1 WDR11 RTN3 THEM6 GAPDH AP2B1 HADH PUS1 JAK1 MED24 MIPEP MARCKSL1 SLC23A2 ALG9 SLC41A3 YARS WDR45B ELP3 KIAA0195 WDR4 C2CD2 NOL9 COMT MYOF PPP1R16B CNTNAP1 LRRC8C AHCYL2 ZDHHC8 RFTN1 SBF1 DDX47 RAD17 TM9SF4 BET1L KIAA0355 TELO2 TMCC3 BTN3A3 MDFIC 
38.16260765638983 RAB7A EDEM1 FAF2 RMDN3 STIM2 CKAP4 PIGT TCTN3 TMEM43 UBAC2 VDAC1 MYOF RPA1 ITGB1 STX6 SNAP23 RAB9A WDR59 HPS3 PIK3R4 TUBB COPS5 SLC25A3 ACTB U2AF2 CFL1 PDHA1 UQCRQ COX7A2 UVRAG RTN3 DDX54 GORASP2 STX10 DDOST ATP2A2 CBX3 TMEM131 IST1 HMGCL BLOC1S1 HEATR5B DSCR3 BET1L 
38.09303453097859 GOLGA2 FAM110A GATA2 KDM1A SIN3A GSE1 KDM2B UBE2I CDC37 TXLNA NUP62 PFN1 CKB ANKHD1 PDHA1 ECH1 RAB9A STX6 GORASP2 ATP6V1D DDX54 IFT20 PCBP1 PAXIP1 NCOA6 RNF214 EDEM1 PPP1R16B SNRPB LCP2 MLLT6 CCNC CCDC92 SUGP2 GLTSCR1 AFF4 
37.94922574840974 SEC61B RPL31 SRP54 CKAP4 RMDN3 ILF3 RPLP2 RPS24 EEF2 COPS5 SLC25A3 DDOST EDEM1 FAF2 TMEM43 PDIA4 PIGT TCTN3 PGRMC2 PTRH2 STIM2 CALM1 RTN3 RAB9A ARF1 ITGB1 SLC25A5 GORASP2 ATP2A2 TMEM109 MBOAT2 POFUT1 TMEM131 TNPO1 RANGAP1 AHCYL2 GNB1 IFT20 SON RPL39 TOR1A TMCC3 COX7C BTN3A3 COX7A2 MUTYH 
32.513228872394464 BCAR1 RPL26 MTOR RRAGD STK38 SHMT1 ABL1 CRKL UBL5 KIAA0355 UVRAG GNE PSMG2 MED24 CRLF3 HMGN2 IST1 HPS6 BMX SUMO3 NUFIP2 SLPI C14ORF2 GAK COX7A2 POLD2 
32.36264675233986 SMAD3 ZMIZ1 ETS1 SMARCA4 ACTB U2AF2 CTCF SIN3A PARP1 MED24 SMAD7 HDAC1 TGIF2 RUNX3 ARID1A NCOA6 RBM14 RPA1 ANXA7 ITGB1 PIAS3 NUP214 JAK1 HP SH2D2A ZCCHC14 DOCK9 GMDS 
32.029488486189834 ERBB2 FRS3 TMA7 ARF1 JAK1 ABL1 CRKL AP2B1 ACTB GAPDH CFL1 ILF3 CKB PTPRA CALM1 CLASP2 TNPO1 TXN CDC37 BMX SRPK1 STX6 ITGB1 IARS LCP2 SH2B3 ZMYND8 SH2D2A APLP1 
31.95879977723965 SRC IKBKB CDC37 BMX HNRNPA1 RPLP2 CFL1 PARP1 NCOA6 ABL1 MICAL1 ARHGAP17 EVL TUBB ANXA7 BTBD2 PTPRA CALM1 SLC25A3 FASLG SH2D2A CYSTM1 TCEB2 LHFPL2 RAB9A PHF19 ASAP1 S1PR1 
31.579541097028233 MMGT1 MYL6 MYL6B VDAC1 RMDN3 FAF2 TMEM43 TCTN3 ATP2A2 PDHA1 MDH2 ANXA1 ANXA5 TUBB GAPDH CLEC4D TNPO1 CKAP4 STX6 RAB9A C16ORF58 TMEM109 MYOF MED24 SF3B2 IST1 
30.99958176142714 DDX39A RBP7 MED29 CRLF3 FNTA VPS9D1 DDOST PARP1 ACTB UBE2I SARNP ANXA1 ILF3 SNRPB ERH LSM3 USP11 ALG9 QDPR IRF1 POLE4 GNE CRACR2B STIM2 C1ORF174 KLF2 ATP6V0E1 BCL11B APLP1 ZC3H7A 
30.976151623504936 PPP2CA FNBP1 TNPO1 HNRNPH1 PAXIP1 RPA1 ANAPC1 AIMP2 ATP6V1D MAP4K1 CARD11 IKBKB PPP2R3C ZNF217 ZFP36 NRD1 SON STK25 FAM43A INTS9 EP400 TMEM109 RBL2 IGBP1 
30.66887205839992 MKRN2 HNRNPR HNRNPDL HDAC1 TGIF2 CREB1 ZFP36 ALG13 ZC3H7A RNF214 CEP85 KIAA0355 ARF1 AP3D1 PARP1 MTOR ASAP1 CAPRIN1 IARS KDM1A NTNG2 TM9SF4 DYNC1H1 IFT20 PTPRA S1PR1 
30.549398058010485 NSP2 HNRNPR TFG ANXA1 RRS1 SRPK1 DDX24 NFATC3 HEATR5B PTRH2 AGK TXLNA SH3BP5L CCDC102A CCDC92 VDAC1 COX7A2 ATP2A2 ZC3H7A MARC1 SUGP2 ZFAND5 TTF2 SPECC1L EEF2K CLASP2 TIMM8B 
30.20892482298902 HRAS SCRIB ANXA7 ANXA1 MRPL49 FNTA MTOR STK38 ITGB1 ANAPC1 BTBD2 PTPRA PHACTR4 ACTG1 SNAP23 SEPT9 GORASP2 UBIAD1 MARCKSL1 SLC23A2 RALGDS NBEAL2 TLR6 SBF1 PHF19 DOCK9 S1PR1 MUTYH 
30.06034561405813 UBE3A PSMD7 CCDC92 SCRIB ACTB AIMP2 DSTN CKB COPS5 COPS7B TUBB GNE ANXA1 UTRN CALM1 DNMT1 MDM1 SUMO3 ECH1 DAZAP2 ZFAND2B FAF2 ZFAND5 JADE1 SGK223 IRF1 POLE4 
29.56177733780863 MEOX2 HNRNPD CCNC ARIH2 IKBKB WDR37 STX6 TSR2 ZNF330 MLLT6 ZMYND8 BUD31 TXLNA DSCR3 MOAP1 C1ORF174 METAP1 EIF4A3 MAGED1 DPH3 COX7A2 PSMB10 TJAP1 HK3 
29.534218339459187 AGO2 HNRNPR HNRNPD ILF3 CTCF PCBP1 COPS5 GAPDH HNRNPA1 HNRNPAB ACTB AP2B1 CNDP2 GEMIN4 TFG ZC3H7A RNF214 CEP85 ZFP36 KIAA0355 SAMD4B IGBP1 NCOA6 EEF2K 
29.433975525714484 BICD2 FAM110A NUP93 NUP62 NFATC3 PAFAH1B1 DYNC1H1 IST1 RANGAP1 PCBP1 ZC3H7A KIAA0355 MTOR TXLNG TTF2 TMEM131 SEPT9 ASAP1 
29.366908226555786 SMG7 RPL31 LEO1 KDM1A CSNK2A2 PSMD7 RING1 KDM2B ALG13 ZFP36 CEP85 RNF214 ZC3H7A SAMD4B KIAA0355 CAPRIN1 ZNF217 GSE1 U2AF2 PFN1 ARID1A PAXIP1 NCOA6 YEATS2 DDOST MLLT6 
29.116580283771295 SH3KBP1 CRKL MAP4K1 ABL1 AP2B1 RPA1 ELMSAN1 ARHGAP17 CKAP4 ASAP1 FRS3 SRRM1 ZFP36 DNMT1 UBE2I PHACTR4 SLC25A3 SEPT9 KLF13 RNF34 
28.74239366438648 MYO6 ACTB PTRH2 HNRNPA1 PCBP1 WDR33 COPS5 GEMIN4 SEPT9 PDHA1 CLEC4D CALM3 MYL6B CALM1 GAK STX6 RAB9A AP2B1 YEATS2 WRNIP1 USP11 EVL 
28.53371294563163 BRD1 SCRIB FEZ1 TELO2 KIAA0355 ZNF148 ZNF330 DNMT1 CBX3 WRNIP1 USP11 DDX24 JADE1 ING4 DPY30 BCL11B CTCF QSOX2 WDR37 GID8 ANAPC1 ZZEF1 DDX47 DNAJA3 
28.510823733459013 KRT31 FAM110A FRS3 GNE ZNF148 PSMG2 JOSD1 FNTA TXLNA TXLNG SNRPB BAG4 ALG13 RBM14 AIMP2 NUFIP2 ABL1 VPS9D1 CCNC KDM1A COMT ASPSCR1 MXD3 SLIRP 
28.482266596865543 YTHDC1 RPL31 ADARB1 SRPK1 ZNF330 LEO1 PARP1 CENPB PYHIN1 SNRPB HNRNPA1 EIF4A3 RNPS1 DDX23 PFDN5 ABL1 SOX13 STX6 POFUT1 ZNF317 SMARCAL1 FAF2 TLR6 PHF19 
28.430905333676385 MED23 TRRAP CSNK2A2 PSMD7 RING1 ATPIF1 SRRM1 PAXIP1 NCOA6 RBM14 PARP1 MED24 CCNC MED29 ETS1 EP400 KDM1A INTS9 SRP54 GEMIN4 AGK PDXP PSMB10 
28.361188225398905 SMARCA2 ACTB GAPDH SMARCA4 DDIT3 ARID1A GATA3 KDM1A GATA2 BRD7 BRD9 GLTSCR1 PARP1 ETS1 SIN3A HDAC1 IRF1 ACTG1 MYL6 ZNF212 JADE1 NUP62 
28.26035995500177 SEC16A RNF214 CEP85 ZFP36 ALG13 PFN1 HNRNPA1 ACTB PTRH2 AP2B1 GAK RBM14 WDR33 CLEC4D PDHA1 RAB9A STX6 ANKHD1 KIAA0355 SAMD4B IKBKB STK38 ELMSAN1 KCTD5 
28.187402251544047 DCTN1 PSMD7 DYNC1H1 PAFAH1B1 ACTB CDC37 FBXW4 PCBP1 ACTG1 ZC3H7A CAPRIN1 TTF2 SPECC1L DYNLT1 CLASP2 CCDC64 ANXA1 AGK UQCRQ MIPEP KCTD5 RBL2 
28.13225095313616 LNX1 DDIT3 CRACR2B ZNF148 PFDN5 IARS AIMP2 ILF3 EIF4A3 HNRNPH1 JOSD1 NUP62 DYNLT1 ABCA1 CEP85 HSBP1 LCP2 FBXL12 SH2D2A DCTD DOCK9 
27.913794256344993 NSP12 RNF214 CEP85 ZC3H7A ANKHD1 ZNF217 GSE1 FRMD8 ANAPC1 SNRPB PFDN5 EEF2 PDHA1 SLC25A5 ALDH9A1 HADH CRKL CCDC92 SPECC1L IFT20 
27.876520709497008 IKBKG DDIT3 PRKCQ IKBKB CDC37 MTOR PARP1 MLLT6 UBE2I ANXA1 RNF34 SEPT9 MAP4K1 CARD11 ACTG1 ABCA1 FRMD8 STK25 SRPK1 ZFAND5 AKR7A2 PFDN5 
27.728264803811395 CRK ACTB AP2B1 ABL1 CRKL MAP4K1 DNAJA3 TUBB SGK223 ASAP1 MICAL1 ARHGAP17 PHACTR4 PTPRA SH2D2A ATIC CNDP2 STK38 FASLG BUD31 NUFIP2 
27.650138213066064 HGS TFG MAGED1 DAZAP2 IST1 UBE2I BLOC1S1 ABCA1 PTRH2 TUBB STX6 RAB9A CLEC4D HNRNPDL IL2RB MIF4GD CDR2 ATP2A2 NUP62 
27.587134145670063 CALML3 CCDC102A CALM1 MYL6 MYL6B HMGB2 JOSD1 HMCES GAK AP2B1 DYNC1H1 PYHIN1 SRPK1 HNRNPA1 CFL1 ACTG1 INTS9 PIGT SPECC1L GOT1 BHLHE40 GNB1 
27.553956583637234 RAD50 HNRNPR ILF3 HNRNPA1 ACTB DYNC1H1 RCC2 TCEB2 MLH1 RPA1 PAXIP1 GATA3 PARP1 PYHIN1 USP11 DYNLT1 FNBP1 ZFAND2B ZC3H7A RBL2 
27.034416026855386 MAP3K7 CDC37 IKBKB PRKCQ CARD11 MAP4K1 COPS5 COPS7B MDH2 ANXA5 PDIA4 HMGB2 ECHS1 EEF2 ATIC VDAC1 SMAD7 MAGED1 DAZAP2 YEATS2 POLE4 MARCKSL1 
26.953813646267744 MDFI DCANP1 TSPYL1 EWSR1 CDC37 MAGED1 ILF3 SHMT1 CRY1 ZNF330 GNE GATA2 JOSD1 STX6 ZNF696 TOR1A NUFIP2 AASDHPPT SPG7 
26.652634059296886 PRRC2B TFG GATA3 GATA2 RNF214 CEP85 ZFP36 ALG13 BAG4 ANKHD1 ZC3H7A PCBP1 HNRNPA1 RBM14 CAPRIN1 SAMD4B KIAA0355 PYHIN1 USP11 CD74 DPY30 
26.60955502665256 SET RPL31 CTCF RPS24 DDX24 TSPYL1 ZNF330 ELMSAN1 CBX3 HMGB2 GAPDH ACTB MARCKSL1 CD81 HNRNPH1 PFN1 CALM1 DDX23 SUGP2 IRF1 ASAP1 
26.604841398775786 RBPMS TFG EWSR1 PCBP1 ILF3 RBM14 SNRPB BAG4 MAGED1 DAZAP2 BHLHE40 ALG13 GSE1 GATA2 ANKHD1 NUFIP2 SLIRP WDR54 ATP6V0E2 PPP1R16B SPG7 
26.559339681656965 LAMTOR5 TTF2 HMCES RRAGD RAB9A ITGB1 DSTN GORASP2 BLOC1S1 STK38 DBI HNRNPH1 SIN3A NCOA6 GAPDH CD74 KLHL6 DNAJA3 
26.53741759469747 MYL12A ACTB SLC25A3 VDAC1 PDHB COX7C PDHA1 HNRNPA1 COX7A2 LRRFIP1 MYL6B MYL6 HERC1 IST1 DNMT1 PAXIP1 IGBP1 
26.50678531012626 RYBP CSNK2A2 RING1 KDM2B GATA3 GATA2 HDAC1 UBE2I PARP1 MYOF SCRIB ABL1 AP3D1 CBX3 TCEB2 AGK ZNF207 IFT20 STK25 GNB1 
26.381565144143718 LYN FNBP1 FASLG SCRIB ARF1 ERH DYNLT1 SNAP23 SEPT9 DDX23 UTRN ITGB1 PHACTR4 EVL LCP2 MARCKSL1 ZDHHC8 CDC37 IRAK2 RFTN1 RASA3 
26.34167459781998 RCN1 FAM110A MYL6 TCTN3 PDIA4 ILF3 COPS5 SLC25A3 CLEC4D WDR45B CD74 RAB9A WRNIP1 KCTD5 SPECC1L FBXL12 
26.295698935954142 CYSRT1 FRS3 GNE GATA3 GATA2 RMDN3 HSBP1 JOSD1 ZNF696 ZNF330 ZNF319 ERH SNRPB FASLG NUFIP2 DNAJA3 RALGDS ZCCHC14 MXD3 SPG7 
26.116452991137777 GNB2 IL27RA VDAC1 CLEC4D GNB1 CD81 KCTD5 DYNLT1 PFDN5 BHLHE40 MTOR PDHB ANXA7 ANXA1 MED29 ZNF212 COMT 
25.911842498569833 METTL21B SMARCAL1 HERC1 TSPYL1 DNAAF5 TELO2 TBC1D9B NBEAL2 RAD17 IKBKB GAK MDH2 TRIP12 TTF2 SBF1 GEMIN4 ANAPC1 ANKHD1 PFDN5 IGBP1 
25.765880786853874 CBL ACTB GAPDH HNRNPD JAK1 ABL1 CRKL ASAP1 ARF1 DAZAP1 PFN1 CALM1 CD81 TUBB FRS3 PRKCQ PRPSAP2 SMAD7 IRF1 LCP2 
25.487196278902946 KAT2B TRRAP WDR37 YEATS2 ACTB RPA1 HNRNPD PARP1 ETS1 HMGN2 TAF15 SMAD7 KLF10 KLF13 IRF1 KLF2 
25.426630150517504 ATG9A EDEM1 FAF2 UBAC2 ZFAND2B HEATR5B STX6 GAK PARP1 AP3D1 WDR11 STX10 SYPL1 UVRAG PGRMC2 AP4B1 TJAP1 CAPRIN1 
25.382403590600777 FKBP5 PRKCQ IKBKB CDC37 PCBP1 HNRNPH1 SF3B2 MAP4K1 ACTG1 ANAPC1 HPS3 ALDH9A1 PDHB SMARCAL1 AP2B1 TTF2 DNAJC27 
25.306585932491917 UBQLN1 RPL31 RPL26 RPLP2 MTOR EDEM1 GNB1 HNRNPH1 MLLT6 IST1 TICAM1 ABCF1 GOT1 MIPEP GAPDH SLPI DAZAP2 ZFAND2B UBE2I H2AFJ 
25.270191207789683 MEX3B RUNX3 MAGED1 ZC3H7A RNF214 CNOT11 CEP85 KIAA0355 ALG13 SAMD4B ANKHD1 CAPRIN1 NUFIP2 U2AF2 ZCCHC14 AP2B1 GID8 
25.253942730998883 PCM1 TSPYL1 EWSR1 DDIT3 DDX23 CEP85 MED29 IFT20 TXLNA TXLNG ACTB TTF2 CCDC92 MAGED1 STX10 TBC1D31 ALG13 PPP2R3C 
25.247542609638433 SNX27 FNBP1 MTMR12 THEM6 JAK1 HEATR5B PAFAH1B1 WDR45B HP ORM1 SRSF8 STK38 LSM3 FAM65A OTULIN SBF1 KCTD5 ARID1A 
25.217045209311454 NDUFA4 GHITM ECH1 TAMM41 PDHA1 CLEC4D UQCRQ COX6A1 COX7C PDHB COX7A2 SLC25A3 ATP5J ATP5E AGK RFTN1 NDUFB3 
25.122699797674088 TANK CSNK2A2 TELO2 AIMP2 MDH2 HNRNPH1 ACTG1 PAFAH1B1 ZFAND2B IKBKB TXLNG TXLNA NUP62 RAB9A ZFP36 TBC1D31 TICAM1 PFDN5 
25.060602196492937 WDYHV1 CEP85 RASAL3 BTBD2 EIF2B1 AIMP2 ACTB ACTG1 EP400 MIF4GD SLPI CDR2 ACOT13 DYNLT1 PFDN5 HMGCL GMDS 
25.051732878129698 TFAP4 TTF2 PYHIN1 CTCF SRP54 HMGB2 SIN3A ARID1A BRD7 HDAC1 ELMSAN1 KCTD5 S100A8 TCEB2 YEATS2 WDR45B 
25.003358997383305 NSP10 SNRPB SON GSE1 ZNF217 PDHA1 SLC25A5 ZZEF1 THAP11 MYL6B HK3 ARID1A MED29 MLLT6 AASDHPPT MARCKSL1 AP2B1 WDR11 
24.954419596014894 TRIP6 FRS3 GNE RNF214 PCBP1 ILF3 COPS5 PARP1 GSE1 ZFP36 KIAA0355 KCTD5 FASLG S1PR1 
24.83451468168788 INF2 ACTB ACTG1 PFN1 CALM1 GAK STX6 RAB9A WDR59 HPS3 CKAP4 RMDN3 SBF1 HERC1 RAD17 CALM3 ERH SEPT9 
24.827734322394786 KLK10 GMFG HP ORM1 ORM2 CCNC S100A9 S100A8 GOT1 S100P PGLYRP1 HK3 CAMP MMP9 
24.766579216258826 CDK4 RUNX3 HDAC1 COPS5 GEMIN4 RPA1 ANXA7 APLP1 WDR33 TUBB CDC37 CDKN1C RBL2 SLBP GATA2 DAZAP2 
24.75634944872382 MYOD1 KDM1A TRRAP EP400 SMARCA4 ARID1A HDAC1 RING1 KDM2B ELMSAN1 BHLHE40 CREB1 GSE1 SMAD7 PAXIP1 CDKN1C 
24.697673358683844 STAT1 IL27RA JAK1 IL2RB SHMT1 APEH SMARCA4 COPS5 SARS U2AF2 HDAC1 UBE2I IRF1 MYL6B OTULIN HADH IGBP1 MTOR 
24.498781860019943 CEP135 ECH1 DNAAF5 MAGED1 GSE1 HDAC1 SRPK1 CEP85 ANXA7 PAFAH1B1 TXLNG TXLNA TBC1D31 TMEM131 ATP6V1D DBI 
24.426709546580838 POLR2B ZMYND8 KDM1A CCNC HNRNPD MED24 MED29 SNRPB TUBB CALM1 INTS9 MAP4K1 GTF2B JADE1 PHRF1 
24.3944898474767 BYSL HERC1 CSNK2A2 PYHIN1 SMARCA4 PARP1 AIMP2 ZNF330 TNPO1 CLEC4D COPS5 RPS24 KLHL6 TSR2 ZNF212 JADE1 BMX BHLHE40 
24.363399531551607 CEP128 CCDC102A ECH1 NFATC3 CEP85 ANXA1 PAFAH1B1 GID8 TXLNG TXLNA AP2B1 TTF2 TBC1D31 GNB1 DDX47 HDAC1 PPP2R3C 
24.35367728369922 CCR1 ORM1 HP ANXA1 ARG1 CSTA S100A9 S100A8 GDPD3 CD74 ANKRD22 CKAP4 S100P 
24.346665796907565 RELA CSNK2A2 ABCA1 ANXA1 PIAS3 HDAC1 PARP1 NCOA6 RBM14 SRF DNAJA3 MAP4K1 DYNC1H1 KDM1A IKBKB ING4 GTF2B NUFIP2 COMMD6 TIMM8B 
24.31928244629448 KRTAP10-8 FAM110A FRS3 ZSCAN21 ZNF330 MLLT6 GNE C2ORF68 UBL5 JOSD1 ZNF696 ZNF319 NUFIP2 RALGDS PDIA4 ASPSCR1 
24.296049906303015 PAX6 PAXIP1 NCOA6 GSE1 ZNF148 SMARCA4 CTCF ARID1A GLTSCR1 BRD7 DNMT1 ARIH2 EP400 RING1 KDM2B MLLT6 ELMSAN1 MARCKSL1 
24.16612736205139 LGALS3 MYL6 PAXIP1 TXN PYHIN1 PARP1 DDOST IGBP1 CKAP4 ITGB1 DSTN SLC25A5 SYPL1 GEMIN4 GOT1 PTPRA DBI 
24.05099156926739 ARID4B ZNF330 SIN3A CSNK2A2 HDAC1 SMARCA4 GATA2 ARID1A ETS1 PARP1 BCL11B TGIF2 MXD3 KLF10 SRPK1 DYNLT1 PAFAH1B1 
23.855333393527275 ATP6AP2 ATP6V0E1 RMDN3 PDIA4 TCTN3 CKAP4 CLEC4D RAB9A ATP6V1D UBAC2 GDPD3 IST1 S100A8 RANGAP1 S100P 
23.77178100370523 TLE1 ACTB ANXA7 HDAC1 GATA3 GATA2 SIN3A PARP1 ERH PFN1 ALG13 RUNX3 DAZAP2 SBF1 BTBD2 
23.74199242534983 KRT40 ZSCAN21 GNE C2ORF68 GATA2 KDM1A THAP11 JOSD1 ZNF696 TXLNA TXLNG ZNF319 ZNF317 SNRPB FASLG COMT SPG7 SLIRP 
23.738941897343484 PSMD4 AHCYL2 ACTB GAPDH MDH2 AIMP2 RPA1 XPC CBX3 AP2B1 ABL1 PSMB10 PSMD7 CCDC92 MED24 PDIA4 PTPRA S100P 
23.622306931199184 DNAJB11 ECH1 PAXIP1 EDEM1 PDIA4 TCTN3 CKAP4 BRD7 CLEC4D DDOST ADA CD74 SEPT9 MAT2B SIMC1 
23.575659537725052 CNOT3 KDM1A TRRAP ELP3 INTS9 ZFP36 CNOT11 TMEM131 RNF214 CEP85 ZC3H7A SAMD4B KIAA0355 CAPRIN1 SRRM1 DPY30 
23.570620483092753 AQP3 DNAAF5 C16ORF58 TELO2 TBC1D9B C14ORF2 TUBB HNRNPH1 WDR11 COX6A1 SLC25A3 SPG7 NDUFB3 AGPAT2 
23.441872793469827 NOTCH2 ZSCAN21 STX6 RAB9A CKAP4 ZNF696 ZNF317 E4F1 ZNF664 EEF2 PTPRA SIN3A KIAA0355 CRKL ZMYND8 ATP2A2 
23.176542122147268 TBK1 TFG CDC37 TXLNA TXLNG MTOR RRAGD EDEM1 MAP4K1 RNF214 SAMD4B ZFP36 KIAA0355 TICAM1 IRF1 S100P 
23.127158306389877 TRIM63 GATA3 RING1 ING4 JOSD1 CKB ILF3 RBM14 UBE2I PIAS3 SRF UBE2G1 KLHL36 AKR7A2 NUFIP2 PDHB NDUFA1 
23.094161882115614 MYO9B ACTB ACTG1 PFN1 CALM1 MYL6 CALM3 MYL6B PCBP1 BRD7 NUP214 MOAP1 ARF1 HLA-DPA1 
23.037291279391198 DYNLL2 EWSR1 PCBP1 HNRNPH1 PAXIP1 NCOA6 ETS1 TCEB2 DYNC1H1 PAFAH1B1 DYNLT1 KDM1A RNF214 CDR2 
22.950992094490754 PTGES3 EWSR1 SMARCAL1 CDC37 MAP4K1 ABL1 LEO1 ZNF330 NOL9 ELP3 WDR11 SARS CLEC4D TTF2 SBF1 SLIRP 
22.929939493409215 SVIL ACTB GAPDH DSTN PARP1 RNF144A HMGN2 SEPT9 MIF4GD PAXIP1 CALM3 CALM1 GNB1 DYNLT1 
22.922724785261654 EPB41L3 EIF4A3 SNRPB SRRM1 CSNK2A2 RNPS1 GID8 TAF4 DPY30 PAXIP1 EEF2K FAF2 RNF144A AP3D1 AFF4 ELMSAN1 
22.911035474216124 FHL3 FAM110A GATA2 AIMP2 SUGP2 ZCCHC14 ZFP36 SNRPB UBE2I CREB1 ANKHD1 TJAP1 SLIRP 
22.805753645852427 TNFAIP3 ANKRD13D TICAM1 RNF216 CARD11 IRAK2 PCBP1 UBE2I RCC2 FAF2 ATP2A2 IKBKB MTOR ALDH9A1 
22.803687847515796 PLS3 ACTB PAFAH1B1 ANXA5 EWSR1 SF3B2 PYHIN1 HNRNPH1 IARS SEPT9 PIGT AIF1 ALDH9A1 
22.688545400903262 NDUFA5 EWSR1 VDAC1 PDHB C12ORF65 SLIRP MDH2 DAZAP2 TCTN3 MAP4K1 DNAJA3 COX6A1 NDUFB3 NDUFA1 
22.677008170066358 POMK XYLT2 HDAC1 ATP5J ATP5E GHITM PGRMC2 TCTN3 PIGT DGAT1 RTN3 ATP2A2 DDOST PSMD7 DYNC1H1 QSOX2 GPR114 
22.64634993302088 IQGAP3 HERC1 IST1 KCTD5 TUBB BRD7 PARP1 MYL6 CALM3 MYL6B CALM1 BCL11B 
22.49127909969904 TRIM56 ZFP36 ALG13 ZC3H7A PCBP1 ILF3 PARP1 HNRNPH1 TICAM1 FAF2 ING4 ALDH9A1 
22.48205888708661 FOS DDIT3 SMARCA4 BRD7 ARID1A EP400 PAXIP1 NCOA6 ATP5I CALM1 MYL6B KDM2B ABL1 UBE2I ANKHD1 
22.435308433675367 ANXA7 SEMA5B ACTB GAPDH ANXA1 RPA1 PDHB COPS5 PCBP1 PSMB10 APLP1 EDEM1 SUMO3 CENPB 
22.412284808998795 VPS33A TELO2 PCBP1 SRPK1 TMCC3 AP3D1 ELP3 RAB9A TBC1D31 ALG13 TTC21B UVRAG CNOT11 ALDH9A1 
22.401184151315004 PDCD6 CRKL TFG IST1 BAG4 TXN PCBP1 PARP1 HNRNPH1 IGBP1 ANXA5 QDPR ARID1A 
22.39876501455635 ZUFSP ACTB GAPDH TUBB FNBP1 MTOR USP11 DYNC1H1 RPA1 ARG1 CSTA UTRN GATA2 DSCR3 CTCF POLE4 
22.350023063177872 NFIA KDM2B GATA3 PAXIP1 GATA2 GSE1 RPS24 RRS1 PARP1 ZNF148 ARID1A ELMSAN1 DPY30 SON CREB1 
22.241979826524467 MED17 TRRAP MED24 CCNC HNRNPD PARP1 SMARCA4 MED29 INTS9 WDR33 FBXO21 TBC1D31 
22.17787793942259 VAMP3 ZSCAN21 C12ORF65 RMDN3 STX2 SNAP23 STX6 STX10 RAB9A LHFPL2 MLH1 CD81 COMT SYPL1 BET1L 
22.1206301728684 PSMD2 PSMD7 CCDC92 APEH VPS9D1 HDAC1 U2AF2 CALM1 AP2B1 PARP1 AIMP2 FAF2 ZFAND2B SFSWAP 
22.06403313517576 LEMD3 ACTB PTRH2 CBX3 ZNF330 ILF3 RBM14 PARP1 CKAP4 RMDN3 TCTN3 STX6 RAB9A CLEC2D 
22.01178243677922 LRIF1 FEZ1 PFN1 PARP1 BRD7 CBX3 ANXA7 APLP1 ANXA1 S100A8 DSCR3 DYNLT1 SPG7 
21.99414021892283 GPR45 DNAAF5 C16ORF58 TELO2 MTOR TUBB TBC1D9B NBEAL2 C2CD2 JAK1 TM9SF4 GEMIN4 FBXL12 TNPO1 DCK SPG7 
21.964211839576638 KIF1B ACTB PAFAH1B1 UBE2I SCRIB HERC1 CSNK2A2 PCBP1 HMGB2 MIF4GD CALM3 CALM1 ZNF217 RAB9A 
21.91274218826615 TJP2 ACTB TTF2 PFN1 SRPK1 DDX23 ARHGAP17 FNBP1 SCRIB PIGT CD74 GTF2H5 GOT1 H2AFJ 
21.874809291831554 RPS27A RPL31 EWSR1 HNRNPA1 RPL26 E4F1 RPS24 PARP1 HDAC1 PAXIP1 DAZAP2 DOCK9 ZC3H7A CRY1 
21.744976403500328 TSC1 FAM110A CDR2 FRS3 PFN1 DAZAP1 SH2D2A EDEM1 ARF1 RAB9A STX6 IKBKB JAK1 KDM1A 
21.740965021284055 WTAP ACTB PDHA1 TUBB SLIRP TXN PYHIN1 THAP11 IFT20 SH3BP5L CA4 ANXA1 HNRNPH1 CSTA 
21.73765112608427 JAK3 CD81 GTF2B MAGED1 RNPS1 JAK1 IL2RB ANXA1 DCK CRY1 WDR45B AP3D1 DNAJA3 
21.730823127708998 RICTOR TSPYL1 MIF4GD RPL26 MTOR TELO2 HNRNPD IKBKB TUBB STK38 GNB1 ARF1 STX6 
21.71815736971932 DVL3 TFG TSPYL1 ZSCAN21 PCBP1 PHF19 KDM1A PLAGL2 ZNF696 PPP1R16B ZNF319 BHLHE40 DDX54 
21.7036238713553 ANKRD28 HNRNPR FNBP1 PFN1 RTN3 MAGED1 PRPSAP1 CTCF PAFAH1B1 LEO1 CD74 AP2B1 AGPAT2 
21.7026987332619 HMGN1 KLHL36 HMGN2 PARP1 XPC PFN1 ANXA5 CALM1 ATPIF1 PDIA4 JADE1 SUGP2 MAP4K1 LCP2 PLAGL2 
21.679038499034174 NPTN TSPYL1 DNAAF5 TELO2 MTOR NBEAL2 RAD17 HEATR5B GEMIN4 FBXL12 ATP2A2 TNPO1 SLC25A5 SIMC1 
21.665706703251033 LZTS2 SHMT1 CEP85 ZFP36 ALG13 WDR33 KIAA0355 SAMD4B DUSP5 STX6 BMX RANGAP1 CCNC SBF1 PHF19 SPG7 
21.657324992346748 CLTA ACTB AP2B1 GAK HEATR5B COPS5 TFG ARIH2 IST1 PYCR2 TMEM43 STX10 AP3D1 CLEC4D ELMSAN1 
21.61692909717344 PIAS4 ZNF330 PARP1 ETS1 UBE2I HDAC1 CBX3 YEATS2 SMAD7 HNRNPH1 TICAM1 SIMC1 
21.616067982348312 NIPSNAP1 ACTB COX7C VDAC1 EEF2 PDHB MDH2 CALM1 PDHA1 UQCRQ ARHGAP17 TCTN3 DBI 
21.573527581064123 SKP2 WDR59 CDKN1C COPS5 IKBKB MTOR SMAD7 MLH1 AP5B1 ANAPC1 SLC9A3R1 MMP9 RBL2 CRY1 PFDN5 
21.554456075506707 NOL10 ZNF330 PARP1 H2AFJ RPL31 ADARB1 IMP3 SRSF5 EIF4A3 RPS24 JADE1 DDX47 DDX23 
21.52275062308499 PSMC1 PSMD7 CCDC92 MIF4GD LEO1 KDM1A HDAC1 U2AF2 AKR7A2 PIAS3 ATP6V1D DPY30 CRY1 SFSWAP 
21.50193674447883 MMS19 PRPSAP1 COPS5 MAP4K1 ABCF1 EDEM1 ARF1 TMEM43 QSOX2 ORAOV1 ELP3 AP2B1 MLH1 RANGAP1 POLD2 MUTYH 
21.469832097503673 KLF15 RPL39 CTCF GSE1 ZNF148 ZNF330 BRD7 CAPRIN1 DDX24 KDM2B RING1 ZNF319 EP400 TAF4 
21.46565442294332 MAVS PCBP1 RMDN3 FAF2 RAB9A IKBKB MAP4K1 PTRH2 WRNIP1 ABL1 TICAM1 GNB1 RNF34 
21.44890130293635 EIF3I SRRM1 PAXIP1 PARP1 CTCF PYHIN1 HMGB2 TRIP12 EIF2S3 CD74 CD81 CAPRIN1 DDX47 
21.384501218150543 FTL SMARCAL1 TSPYL1 GATA3 FAM43A THAP11 DUSP5 AP3D1 ELP3 TBC1D31 SBF1 ASPSCR1 UQCRQ DDX47 SBK1 
21.362369535057887 SERPINB4 CCNC S100A9 PIGT S100A8 GOT1 COPS5 MED24 CCDC92 BMX UBAC2 HLA-DPA1 PFDN5 
21.336770871113693 PLSCR1 CRKL TFG EWSR1 ILF3 CRY1 DAZAP2 FRS3 SLPI MFNG NUFIP2 SPG7 ADCY7 
21.259060954991394 GRN AHCYL2 CRKL SLPI ZSCAN21 GNE FAM43A INTS9 ALG13 HK3 CRY1 NUFIP2 
21.22211596637735 GTF2E1 ZNF330 PARP1 EWSR1 GEMIN4 ERH DDX23 SRRM1 CBX3 GTF2B SARS IKBKB RPUSD2 GTF2H5 
21.171061474785397 SDCCAG3 KDM1A TTF2 YEATS2 CEP85 KIAA0355 SAMD4B NUP62 RBM14 HNRNPH1 PAFAH1B1 AP2B1 
21.148014995146898 UQCRB COX6A1 COX7C ACTB PSMB10 PDHB VDAC1 DDOST UQCRQ GNB1 SPECC1L CALM1 C14ORF2 
21.102952273421558 RNF11 ANKRD13D PSMD7 S100A9 S100A8 TXN RNF216 IKBKB MYOF PDXP UBE2G1 AP2B1 
21.085367560477668 RAI14 ACTB RPA1 VDAC1 FEZ1 PCBP1 CALM3 STX6 PACSIN1 ASAP1 SEPT9 
21.049384049475936 TPM3 CCDC102A CALM1 U2AF2 ACTB PDHA1 VDAC1 CKB SEPT9 PDIA4 IMP3 PAXIP1 KDM1A KLF10 
21.036711119741426 CTR9 PYHIN1 PARP1 LEO1 TCEB2 ZNF330 AFF4 DDX23 SNRPB AKR7A2 BMX TMEM131 TJAP1 
21.03524615609842 TNPO3 ST6GAL1 FNBP1 TNPO1 PARP1 RBM14 HNRNPH1 HDAC1 PTPRA MFNG C16ORF58 S1PR1 
21.03340396317452 PIGT IL1RN TCTN3 CSTA S100A9 S100A8 ARG1 PDIA4 CKAP4 GID8 CLEC2D 
20.973819332734642 AKAP9 KDM1A SF3B2 SNRPB RBM14 HDAC1 FNBP1 CALM1 PFN1 AP2B1 SAMD3 MAGED1 STK25 VPS9D1 
20.932353161307823 DYNC1I1 FAM110A PFDN5 GNB1 DYNLT1 PAFAH1B1 DYNC1H1 CCDC64 CALM1 ANXA7 
20.891372724232568 RAD51 RPL31 CTCF PFN1 EVL ABL1 CSTA BRD9 RPA1 ELMSAN1 EP400 UBE2I IGBP1 DNAJA3 
20.857089875489 TIPRL EIF4A3 TFG FUCA1 IARS FASLG ZC3H7A DYNLT1 CREB1 PUF60 XPNPEP1 IGBP1 
20.697447240584054 TNRC6A CEP85 ZFP36 ALG13 ZC3H7A CNOT11 KIAA0355 ARID1A ARIH2 NCOA6 TXLNG AP2B1 AFF4 KDM1A ANAPC1 TNPO1 
20.670589778379085 CCNA2 HELB RPA1 ELMSAN1 GATA2 PARP1 COPS5 CDKN1C PRKCQ SLBP CALM1 RBL2 
20.55594649417474 MAGEA9 RAD17 SUGP2 STX6 ELP3 TTF2 SARS ARHGAP17 CNOT11 DDX24 TAF4 
20.508258250814002 COPA SMARCAL1 RPA1 SF3B2 PYHIN1 USP11 PDHA1 SEPT9 CD74 WDR11 ZNF217 RTN3 BET1L 
20.461119166972882 PSMC4 PSMD7 CCDC92 RAD17 ZFAND2B PAFAH1B1 ZFAND5 TTF2 SEPT9 CRY1 USP11 PFDN5 
20.45361068395323 TPM4 ACTB CKB ILF3 COPS5 HNRNPH1 TELO2 CALM1 RNF144A RAB9A CLEC4D PACSIN1 KLF10 
20.37617733405232 SULT1C4 TTF2 SBF1 ANAPC1 MAGED1 WDR59 HPS3 HPS6 NBEAL2 ANKHD1 RAD17 WDR11 
20.3749152650239 CCDC88A SRSF8 PAXIP1 PTPRA CALM1 DYNC1H1 AP2B1 NUP62 ANXA7 ANXA1 STX6 S100P 
20.36680311684649 IPO4 MYL6 AIMP2 EEF2 PARP1 ZNF330 TNPO1 CLEC4D PAFAH1B1 ZC3H7A 
20.351428249792228 IQCB1 ACTB PAFAH1B1 SLC25A3 GAPDH TUBB CKB ECH1 CALM3 CALM1 GEMIN4 IRAK2 QDPR S100P 
20.342280376611587 BRPF3 JADE1 ING4 CSNK2A2 SIN3A DDX24 BAG4 CRACR2B CBX3 YEATS2 SUGP2 USP11 CENPB 
20.341650927239456 NDC80 HERC1 CEP85 HNRNPH1 TXLNG TXLNA AKR7A2 TBC1D31 ATP6V1D IFT20 NUFIP2 SPG7 
20.32731188478642 SNX6 ZMYND8 TSPYL1 CSNK2A2 HERC1 PFN1 CALM1 SHMT1 APEH FAM43A STX6 RAB9A 
20.302484292894984 CDK6 CDC37 UBE2I DDIT3 SLBP NFATC3 CDKN1C WDR33 ZFP36 CA4 KLF10 RBL2 LPIN1 
20.17529612617826 CERS2 ATP5J ATP5I ATPIF1 MICAL1 TCTN3 HNRNPH1 RFTN1 RAB9A SNAP23 TRIP12 PTPRA 
20.17008364092812 POLR2H CCNC MED29 CSNK2A2 PYHIN1 PARP1 INTS9 PHRF1 IKBKB GTF2B UVRAG 
20.16139702054575 EIF4E2 FAM110A CDR2 MAGED1 ZFP36 ZC3H7A RNF214 HNRNPD ARIH2 
20.153030956967186 ABI2 ACTB ACTG1 RNPS1 MED29 IFT20 NUP62 BAG4 ABL1 SNAP23 PFDN5 KIAA0355 
20.134840484107603 MED15 TRRAP MED24 CCNC HNRNPD EWSR1 NCOA6 IGBP1 MED29 ELP3 INTS9 MLLT6 
20.132178021128354 PLIN3 EIF4A3 CCNC PIAS3 VDAC1 C16ORF58 RAB9A GOT1 WDR4 ATIC TMEM43 PARP6 IGBP1 
20.098736137725858 UBE2D3 HERC1 RNF216 ARIH2 HNRNPH1 RING1 KLHL3 UBE2G1 STK25 RNF34 FBXL12 XPNPEP1 
20.096389228605318 TRIM54 CCNC FAM110A CDC37 UBE2I KDM1A CALM3 JOSD1 TJAP1 VPS9D1 
20.0779340675615 CEP162 KDM1A DNAJA3 PYCR2 RPA1 ACTB TTF2 YEATS2 TBC1D31 TXLNA TXLNG GAK 
20.05649072924457 CEP131 PFN1 ATP5J RNF214 CEP85 SAMD4B CALM3 AP2B1 KDM1A AKR7A2 TBC1D31 PPP2R3C 
20.042434173433982 ATP2B1 VDAC1 CKAP4 STX6 ITGB1 RAB9A CALM3 CALM1 HNRNPH1 ETS1 CLEC2D 
19.97396532413994 MIS12 HP AKR7A2 HERC1 CBX3 TJAP1 PTPRA CEP85 MRPL49 MED24 SF3B2 S100P 
19.97052951089356 OASL HNRNPR SRSF5 RRS1 SF3B2 DDX54 DDX24 HNRNPAB CSTA TM9SF4 GEMIN4 ZNF317 
19.96045703163332 MTUS2 FAM110A TFG JOSD1 ZNF696 RASAL3 RPA1 AFF4 SPG7 SLIRP 
19.92461931546622 EDC4 TRRAP WDR37 YEATS2 CRKL MRPS18C RPA1 AP2B1 PDHA1 ZFP36 BAG4 ABCF1 
19.91877567423529 KIF7 ZFP36 KIAA0355 SAMD4B TBC1D31 NRD1 PFN1 NUP62 TNFAIP6 RBM14 HNRNPH1 
19.910107659171956 C14ORF1 TFG ANXA1 ANXA7 PFN1 S100A8 SNRPB BTBD2 CREB1 CD74 
19.89655996319854 HOXA1 FRS3 SLPI GNE RALGDS UBL5 CRELD2 KDM1A ALG13 BAG4 SNRPB BUD31 
19.87378424312189 IRAK1 CDC37 IKBKB EEF2 MAGED1 TELO2 GAK IRAK2 HPS6 PFDN5 DNAJA3 CLEC2D 
19.842559749010086 RFC5 ZNF330 PARP1 PCBP1 RPA1 RAD17 ITGB1 GTF2B CCNC PPP1R16B 
19.822317856775523 RCOR3 KDM1A GATA3 GATA2 GSE1 ZMYND8 HDAC1 ZNF217 ALG13 PHF19 TXLNA CDR2 MXD3 
19.81517650315358 NUBP2 TTF2 TUBB BRD7 ARID1A S100A9 PIK3R4 UVRAG MRI1 IGBP1 FBXL12 
19.759847321825347 DNAJB6 RRS1 PARP1 DNMT1 HNRNPH1 NOL9 RPUSD2 BAG4 CAPRIN1 DYNLT1 GAPDH ASAP1 
19.756906236101734 ATP6V1B2 ACTB TFG PYHIN1 RMDN3 STX6 RAB9A ATP6V1D COX6A1 EEF2 IARS ATP2A2 
19.72322397634765 S100A2 THAP11 ILF3 GAPDH COPS5 TCEB2 PIGT MED29 CCDC102A PAXIP1 CDR2 ARHGAP17 
19.716122884436086 COX5B COX6A1 COX7C PDHA1 DNAJA3 PARP1 PSMD7 CDR2 BHLHE40 RPLP2 AP2B1 
19.68259725695752 SMURF2 PIAS3 UBE2I PARP1 AIMP2 RBM14 RUNX3 SMAD7 DAZAP2 SLC25A5 UBE2G1 
19.678378445752724 ARHGAP17 WDR49 COMT ACTB CKB PFN1 SLC9A3R1 FNBP1 EWSR1 ABL1 PACSIN1 
19.644446990927882 LASP1 ACTB U2AF2 COPS5 TFG DAZAP2 BHLHE40 BAG4 PFN1 CD81 SH2D2A DNMT1 
19.571676583754964 FOXK1 AHCYL2 SCRIB RNPS1 ZNF330 SIN3A HDAC1 RBL2 EEF2 SRF POLD2 
19.564399741835985 GOLGA6L9 FAM110A CDC37 RNPS1 ZNF696 GTF2B TJAP1 STK25 SLIRP 
19.56270124649877 ARL13B SCRIB STX6 ITGB1 RAB9A PHACTR4 UBE2I TMEM43 APEH SLFN13 GEMIN4 STK25 CMTM7 
19.561747471319276 SMAD9 KDM1A DNAJA3 ACTB DSTN E4F1 SMAD7 MED24 MED29 DPY30 TRIP12 METAP1 
19.55623251819622 CCDC101 TRRAP WDR37 YEATS2 ACTB ACTG1 PAXIP1 NUP62 TNPO1 BUD31 
19.509816900695025 MYH14 ECH1 KLF10 MYL6 MYL6B CLEC4D DNMT1 HDAC1 USP11 PSMB10 
19.487152882623747 TF CD81 HP ORM1 FNBP1 GOT1 ECHS1 RMDN3 PGRMC2 SRPK1 PIGT 
19.485295819026575 MBIP TRRAP WDR37 EWSR1 CALM1 YEATS2 ACTB POLE4 CCNC S100P ETS1 TXLNA MOAP1 
19.482044221883875 CCT8L2 TTF2 HMCES THAP11 YEATS2 CNOT11 RAD17 IKBKB LEO1 SIMC1 TRIP12 
19.46407309046348 FAHD1 CFL1 PCBP1 DAZAP1 MDH2 ANXA1 GOT1 PUS1 PAXIP1 MIPEP NUDT9 
19.455382939166878 CLIC1 ACTB U2AF2 EWSR1 HNRNPH1 ADA PDHB CLEC4D TOR1A NUP62 
19.438550206093833 MKRN3 FAM110A MIF4GD SLPI MAGED1 APEX2 UBE2I SON VPS9D1 
19.40573560655779 MTA3 ZNF296 BCL11B ZMYND8 KDM1A GATA3 GSE1 SIN3A PARP1 HDAC1 BAG4 DPY30 
19.397856450624943 KANK2 PARP1 SEPT9 PLAGL2 AP2B1 ZNF212 ZFP36 KIAA0355 TXLNA GEMIN4 C2ORF68 
19.39526008815284 NCOR2 GATA3 GATA2 SMARCA4 SIN3A WDR59 HDAC1 UBE2I SRF NCOA6 NRD1 ALG13 
19.35512084754303 LGALS1 ACTB U2AF2 SNRPB GEMIN4 PCBP1 IGBP1 SEPT9 PTPRA ITGB1 POLE4 
19.331616418935166 EFEMP1 FAM110A PDIA4 ANXA5 SLPI ZSCAN21 BAG4 NUFIP2 E4F1 ZNF696 
19.321135702619927 PSMD3 ACTB GAPDH U2AF2 SNRPB DYNC1H1 PSMD7 CCDC92 ZFAND2B CD74 GNB1 CRY1 
19.30691752661284 KRT8 NUP93 NUP62 CDR2 CEP85 FAF2 ARIH2 ANXA1 TBC1D31 ALG13 LCP2 
19.303735537022803 NVL ZNF330 ILF3 CRY1 RPL31 ADARB1 UBE2I PARP1 PYHIN1 RPS24 TTF2 DDX23 
19.300485699250302 YKT6 RMDN3 CKAP4 TCTN3 STX6 RAB9A HNRNPH1 DBI MDH2 FNTA GOT1 BET1L 
19.260791230499745 CCND2 ACTB RPA1 DYNLT1 PFDN5 SPECC1L CALM3 ABL1 CBX3 CALM1 CDKN1C RBL2 
19.26052864274564 MALL CLEC10A MCEMP1 CLDN9 EWSR1 SFSWAP CKAP4 PGRMC2 ATP5J UBE2I DAGLA WDR33 RFTN1 
19.22841133690489 PAPD5 ZNF330 SMARCA4 PARP1 RPL31 ADARB1 SRSF5 SF3B2 ZCCHC14 TCTN3 
19.21438172969777 CREB1 ATF1 FEZ1 WDR59 SMARCA4 ETS1 UBE2I HDAC1 TTF2 ABL1 CRY1 TAF4 
19.213160396405982 XRN1 CALM1 QSOX2 RPA1 HNRNPA1 MLH1 POLD2 ZFP36 ALG13 ZC3H7A ATP5G2 DDX24 
19.193591153777238 L3MBTL2 ZNF296 TSPYL1 TBC1D9B THAP11 HDAC1 RING1 KDM2B CBX3 ZFP36 PFDN5 IRF1 
19.17647111909451 KDM6A ETS1 PAXIP1 GATA3 GATA2 SMARCA4 SRF NCOA6 DPY30 YEATS2 FNTA 
19.170996167169086 CBLB CD81 HMGB2 CRKL ABL1 ASAP1 TICAM1 GORASP2 CRY1 CARD11 DSCR3 
19.156378056606947 TBC1D22B CDR2 ARF1 RAB9A STX6 GDPD3 S100A9 S100A8 S100P 
19.152962556050998 BANP AHCYL2 PRKCH LAP2 PARP1 UBE2I HDAC1 SNRPB STK38 PDHB 
19.148752393108914 TNIP1 CDC37 IKBKB TXLNA DAZAP2 PIGT AP2B1 GTF2B TTF2 CARD11 HDAC1 
19.147769346738343 VPS4B RPL31 NUP214 UBE2I IST1 VPS9D1 CSNK2A2 TUBB UVRAG CALM1 SPOCK2 IGBP1 
19.131949136717967 TOP3A TTF2 COX7A2 PDHA1 RPA1 RAD17 PARP1 MLH1 ECHDC2 NAA38 MARCKSL1 
19.13076243711087 SPPL2B GHITM SLC52A2 PGRMC2 CDR2 PIGT CLEC2D TMEM43 QSOX2 DGAT1 GPR114 TOR1A UQCRQ 
19.12347919588782 LMO2 DDIT3 GATA3 GATA2 AIMP2 DAZAP2 MED24 UBE2I YEATS2 NUP62 
19.121618184043303 UPK1A ACTB COX7A2 ALG9 S100A9 S100A8 SLC52A2 CKAP4 GDPD3 
19.118161453979877 LSM2 ACTB CDC37 ILF3 SNRPB SF3B2 LSM3 NAA38 GEMIN4 DDX23 ARHGAP17 BUD31 
19.114296101095956 HIVEP1 ETS1 PARP1 CDC37 MAGED1 BHLHE40 CREB1 GATA3 GATA2 KLF10 
19.113472505120797 CEP19 PUS1 SCRIB ANAPC1 RABL2B PPP2R3C SRPK1 INTS9 HMGB2 CREB1 AP3D1 TRIP12 
19.096368846649064 CRYAA KDM1A TRRAP GSE1 NCF4 CEP85 DCK GORASP2 PAFAH1B1 RAB24 OTULIN 
19.094817039065568 TXN2 HADH TXN MLH1 AP2B1 ILF3 PPP1R16B DCTD DDIT3 CRACR2B ANXA1 C2ORF50 
19.057268865970883 GPR17 XYLT2 DNAAF5 TBC1D9B PIK3R4 MTOR HEATR5B DGAT1 GEMIN4 ATP2A2 NDUFB3 SIMC1 
19.045182507189242 TECR ACTB SLC25A3 VDAC1 DDOST PGRMC2 PDHA1 CTCF PYHIN1 PAXIP1 ZC3H7A GORASP2 
19.03050236564008 RBM12 CSNK2A2 PCBP1 HNRNPA1 SNRPB DAZAP1 IGBP1 ALG13 GNE GATA3 BTBD2 
19.024373962959444 ACTBL2 ACTB RNASE3 RPA1 CLEC4D ACTG1 CFL1 ADA BRD7 COPS5 FAM65A CALM1 PHF19 
19.005401735020634 TSC2 HERC1 EDEM1 ARF1 ARIH2 SRSF5 CALM1 RAB9A RRAGD GAPDH OTULIN 
18.995803666174993 RMND1 COX6A1 PUS1 MFNG PYCR2 SLIRP ECHS1 TMEM43 TMEM109 TMEM5 
18.96026247853289 SIRT2 SARS CALM1 KCTD5 RUNX3 ANAPC1 MOAP1 RMDN3 PARP1 SLC25A5 
18.931216172323577 TOPBP1 ZNF330 SMARCA4 PARP1 RPA1 SF3B2 MUTYH ABL1 ACTG1 ARID1A TELO2 
18.880273131018253 MARCH2 ANKRD13D IKBKB HADH S100A9 S100A8 ALDH9A1 STX6 RMDN3 
18.8634091464663 CASP8 TICAM1 CARD11 RALGDS QDPR FASLG IKBKB MTOR PARP1 RNF34 AIF1 
18.859793921498206 UBE2L3 HERC1 RNF216 DAZAP1 ARIH2 MDH2 PARP1 RNF144A TRIP12 SMAD7 
18.8583894898744 PIAS2 ZNF330 PARP1 UBE2I SUMO3 CREB1 TXLNA ZNF319 PAXIP1 ADA TSR2 
18.85500475832615 SNAPIN BLOC1S4 BLOC1S1 TOR1A DOCK9 SNAP23 RAB9A HMGB2 NUP62 S100P 
18.831240465115705 UBXN1 CD81 MAGED1 FAF2 PRPSAP2 BTBD2 UBE2I MAP4K1 PAFAH1B1 IGBP1 RPA1 
18.82384352352349 MIB1 ECH1 DNAJA3 MAGED1 RRS1 TMCC3 AP2B1 TBC1D31 GEMIN4 YARS RANGAP1 NUFIP2 
18.771864505001417 NDUFB10 ACTB SLC25A3 VDAC1 PDHB COX7C MARC1 NDUFB3 NDUFA1 DNAJA3 
18.757705337947726 HOOK1 MIF4GD MYL6 ARF1 HNRNPH1 RBM14 UBE2I DYNC1H1 AP4B1 IFT20 
18.687600223483518 ARFIP2 CSNK2A2 SARS CALM1 EWSR1 C16ORF58 IFT20 NUP62 ARF1 DDOST STX6 
18.654617427831408 ZBTB48 EIF4A3 HNRNPH1 CTCF DDX54 MRPS18C ZNF317 ABL1 DDX24 CENPB 
18.648278262136973 DAPK1 SARNP SH2D2A LRRFIP1 CALM1 PARP1 ABL1 APEX2 DSTN EEF2 CAPRIN1 CNTNAP1 
18.625191815768023 CCDC53 KDM1A ACTG1 TXLNA NUP62 IFT20 BLOC1S4 BLOC1S1 HSBP1 CKAP4 
18.613707031693966 ATL3 RTN3 ATP5E GPR114 GOT1 EDEM1 DDOST ITGB1 UBL5 TMEM109 
18.608491932005382 HNRNPLL PAXIP1 MAGED1 BHLHE40 UBE2I PTBP1 PCBP1 WDR33 CALM1 H2AFJ 
18.56529717198281 ANP32B AHCYL2 CD81 EWSR1 CALM1 ANXA5 PFN1 ATPIF1 PDIA4 PARP1 RPA1 DNAJC27 
18.54525598807852 PDE4DIP KDM1A MIF4GD S100A9 ZZEF1 CALM1 ASAP1 GDPD3 ZNF696 
18.51730224032609 ATP6V0A1 TFG ATP6V0E1 ATP6V1D PTRH2 MARCKSL1 GPR114 RMDN3 SLC25A3 TCTN3 AGPAT2 
18.515290446059403 ACTN1 TXN DYNLT1 FAF2 PARP1 CKB CD81 ITGB1 CLEC4D ARID1A 
18.490950401234517 SLC2A1 TELO2 PIK3R4 PARP1 DDOST TMEM43 UBE2I RAB9A AGK GEMIN4 SPG7 AGPAT2 
18.480387369752872 NSF PTRH2 GORASP2 STX6 STX2 SNAP23 HNRNPAB RAB9A STX10 ASPSCR1 BET1L RAB24 
18.435480920682263 UBQLN4 MIF4GD TXN RNPS1 DAZAP2 ATPIF1 ANKRD13D MOAP1 STK25 HDAC1 MLLT6 
18.42097874642018 EIF5B JADE1 ETS1 PAXIP1 SRRM1 BRD7 ZNF207 ANXA5 CALM1 PYHIN1 
18.41752972112872 ASNS ZFP36 PDIA4 PAFAH1B1 COPS5 IARS USP11 GLRX CD74 XPNPEP1 
18.41218533710208 XPO7 ZNF330 NUP93 UBE2I NUP62 IFT20 DSCR3 USP11 AP2B1 
18.396392430993856 ARF5 GHITM RAB11FIP3 HNRNPH1 DAZAP1 ARF1 AP3D1 FAF2 MRI1 GOT1 GORASP2 
18.37044446705495 WIZ EWSR1 PARP1 PRPSAP1 ZNF330 CBX3 HDAC1 GATA3 GATA2 PYHIN1 
18.339639700863625 RHOV SCRIB MYL6B MYOF HNRNPH1 RAB9A ITGB1 MTOR SNAP23 RASA3 
18.328622114707695 SEC23B EIF4A3 SRPK1 CLEC4D ITGB1 DDOST ARF1 DAZAP1 SUMO3 KIAA0355 
18.328519961304476 ZNF24 ZNF330 ZSCAN21 RPL31 CBX3 PARP1 UBE2I USP11 PPP1R16B APLP1 PRKCQ SFSWAP 
18.321824014256123 DDIT4L EIF4A3 TXN BAG4 LSM3 CALM3 CALM1 DBI PUF60 CRLF3 
18.301390330688754 CLINT1 ACTB RPA1 EWSR1 CALM1 GAK STX6 RAB9A ARF1 AP2B1 TFG PFDN5 
18.242067302980896 CHCHD4 ILF3 SON TXN CDR2 GOT1 ECHS1 WDR54 MED29 CRELD2 RAB9A 
18.23829564467257 RIPK1 HNRNPR PARP1 RPA1 ANXA1 ITGB1 TICAM1 RNF216 IKBKB TAF4 
18.23828550624874 ERGIC3 TCTN3 TMEM109 ANKRD13D CD81 HP UBIAD1 C16ORF58 GPR114 BUD31 
18.20184529809352 STX1A CD81 TXLNA CDC37 UBE2I STX2 SNAP23 STX6 STX10 CDR2 CMTM7 ZNF696 
18.198325271132198 HARS2 IMP3 SARS TXN C12ORF65 PDHA1 MDH2 PFDN5 GTF2H5 NUDT9 
18.187868803564555 NFATC2IP ZNF330 PARP1 HNRNPH1 CBX3 SUMO3 RPL26 TMA7 CALM1 
18.18178613831558 ELF2 ZNF330 ZNF148 TRRAP EP400 CBX3 PARP1 CALM1 DYNLT1 
18.177610063533027 VAMP2 ILF3 SYPL1 CALM3 CALM1 SNAP23 SEPT9 STX6 STX2 RAB9A 
18.174546799272033 PIP CSNK2A2 RING1 USP11 S100A9 TOR1A NFATC3 RAD17 ZC3H7A PPP2R3C 
18.14954325162262 DEPDC1B FAM110A NUP93 RAB9A ANAPC1 SMARCA4 TCTN3 RANGAP1 
18.148689022614946 STX11 KDM1A FAM110A CDR2 ANKHD1 RNPS1 SNAP23 STX6 STX10 HSBP1 
18.07293777403692 CDC20 TRRAP PAXIP1 ANAPC1 CDR2 CTCF HDAC1 COPS5 TUBB IST1 RNF34 
18.06537352829875 ATP1B3 ALG9 EWSR1 RPLP2 RPL26 VDAC1 EEF2 ATP5J PAXIP1 TM9SF4 RTN3 CNTNAP1 
18.058430968208235 FH GOT1 ECHS1 MDH2 DAZAP1 ALDH9A1 ATIC PAFAH1B1 XPNPEP1 KLHL6 
18.054029816552436 ATR TELO2 XPC RPA1 RAD17 ABL1 CREB1 HDAC1 E4F1 IFT20 
18.043557300899707 HSF2BP FAM110A CDC37 MAGED1 UBE2I HSBP1 SEPT9 SPG7 
18.035869441777375 DNM1 ACTB UBE2I PARP1 FNBP1 IST1 PACSIN1 ASAP1 ANKRD22 
18.014966775295967 HCK ACTB CDC37 IRAK2 CRKL ABL1 FRS3 PRPSAP1 PRPSAP2 FASLG RASA3 
17.999239629722133 CCNDBP1 FAM110A IMP3 PAXIP1 COPS5 STK25 SPG7 PSMB10 ZNF696 
17.997378770344223 RUFY1 COMT FNTA PFN1 ANKHD1 TELO2 BMX AP3D1 TUBB PUF60 DYNC1H1 RAB9A 
17.992518965392787 HCST DNAAF5 JAK1 OTULIN TAMM41 AP5B1 GEMIN4 PTPRA LRRC8C MTOR SIMC1 
17.98383233661214 VMA21 ATP6V0E1 CLEC4D EDEM1 RMDN3 DAGLA SEPT9 TCTN3 LHFPL2 
17.976748177489497 TAB1 HNRNPR CDC37 IKBKB MAGED1 MICAL1 CKB CFL1 CARD11 SMAD7 
17.95016160990202 ZNF48 ADARB1 RRS1 RPLP2 ILF3 DDX54 DDX24 SRRM1 ZNF317 TXLNG 
17.946900157130642 TLX2 RPL39 KDM2B CSNK2A2 DDX54 DDX24 PAXIP1 GSE1 ARID1A ELMSAN1 
17.92469569460188 GOLGB1 KDM1A UBE2I CLEC4D PDHA1 ASAP1 ARF1 RAB9A STX6 GAK EWSR1 PFN1 
17.92296645714955 STX12 EWSR1 VDAC1 ABCA1 TM9SF4 STX10 STX6 STX2 SNAP23 RAB9A TXLNA EIF2B1 
17.916064893873326 ASPM FEZ1 COPS7B MOAP1 CALM3 CALM1 MXD3 CENPB HLA-DPA1 KIAA0355 MTOR 
17.907111849332058 KEAP1 DDIT3 BCL11B PYHIN1 USP11 BRD7 EEF2 GAPDH IRF1 IKBKB KLHL3 LSM3 C2ORF68 
17.895459931767437 INPPL1 TICAM1 PFN1 ABL1 SUMO3 ANXA1 TUBB HNRNPH1 SLC25A5 SAMD4B 
17.89301132233023 RAB11FIP1 TFG NUP62 DYNLT1 RPA1 ASAP1 STX6 RAB9A S100P 
17.87551032688699 RAMP3 GHITM C16ORF58 PYCR2 C2CD2 TUBB PTRH2 CKAP4 SPG7 AGPAT2 FBXL12 
17.842736731254284 PNMA1 ACTB FAM110A SNRPB BTBD2 ZNF148 MOAP1 VPS9D1 SPG7 SLIRP 
17.825485321672883 PPM1B EIF4A3 VDAC1 S100A8 ABL1 IKBKB ANXA1 GAK ECHS1 S1PR1 
17.790638061159452 GRAP2 ZMYND8 ZNF319 ZSCAN21 ANAPC1 MAGED1 BAG4 ABL1 FASLG MAP4K1 LCP2 
17.74976522305623 MAP1LC3A TBC1D9B MOAP1 SRPK1 HNRNPH1 PARP1 TUBB VPS9D1 AP2B1 
17.749404315652754 RMND5A KDM1A RPL31 GEMIN4 PIGT GID8 ZCCHC14 ARID1A PAPD7 USP11 PDHB 
17.731747104066947 CTIF AHCYL2 NUP62 NUP214 TUBB MIF4GD SLBP AP2B1 
17.731701247949797 PAFAH1B1 HSP90AA4P ACTB DYNC1H1 DYNLT1 CLASP2 ACTG1 ATIC GAK GID8 FNTA PDIA4 
17.72957967176349 AMFR TXN EDEM1 FAF2 UBAC2 CALM1 PTPRA UBE2G1 THEM6 ATP2A2 
17.72781688378375 ITCH EWSR1 DAZAP1 ARID1A IKBKB UBE2G1 RAB9A PACSIN1 SMAD7 KLF10 
17.70929939195824 MED8 CCNC MED29 MED24 EWSR1 GAPDH INTS9 HDAC1 TCEB2 BRD9 
17.687433724996705 PAPSS1 CNDP2 ALKBH5 ARHGAP17 DBI ATIC PAFAH1B1 PARP1 EVL 
17.68495250384136 NPRL2 CFL1 PCBP1 ANXA7 RRAGD WDR59 RAD17 ARF1 ARIH2 
17.67703921298801 VPS18 SRRM1 UVRAG EDEM1 TJAP1 IFT20 AP3D1 RAB9A 
17.67040654526595 ALAS1 ACTG1 EP400 C12ORF65 PDHA1 MDH2 SLIRP ECHDC2 HMGCL AGK 
17.668826609745818 INCA1 TFG DAZAP2 PFDN5 SNRPB LSM3 FRS3 GTF2H5 DSCR3 ASPSCR1 METAP1 
17.66739858163018 ALB ARIH2 HNRNPA1 HP KCTD5 S100A8 GOT1 EEF2K FOXJ3 
17.640103658590974 VAMP5 COMT IMP3 SLC9A3R1 SNAP23 STX6 STX2 STX10 TNPO1 BET1L PTRH2 SLIRP 
17.621481945212242 ZYX ACTB U2AF2 EVL PUF60 CCDC92 IMP3 KLHL6 APEH ALG13 
17.61515943652794 VASP ACTB ACTG1 PHACTR4 EVL PFN1 ALG13 FRS3 HNRNPH1 SF3B2 
17.605763039296725 MINOS1 COX7C PDHA1 UQCRQ PDHB C14ORF2 MED24 LEO1 
17.5843438987046 DCP1A TTF2 CRKL KIAA0355 ZFP36 CEP85 SAMD4B PAXIP1 BAG4 YARS 
17.579609036915 TBL1XR1 KDM1A GATA3 GATA2 HDAC1 ACTB UBE2I HNRNPD WDR59 ARF1 ABCF1 
17.568996843252716 NCOA3 ETS1 ALG13 EWSR1 HNRNPD ABL1 GATA3 GATA2 NFATC3 IKBKB 
17.548439981710015 GOPC CCDC102A DDIT3 PTRH2 STX6 RAB9A CLEC4D ASPSCR1 NUP62 CLASP2 
17.52217877989635 RALA ANKRD13D ACTB MARCKSL1 ILF3 CTCF AKR7A2 ATP5J CALM3 CALM1 ARF1 
17.507233647365528 SH3GLB1 SARNP UBE2I RANGAP1 ARHGAP17 DSCR3 IRAK2 EDEM1 SIN3A GORASP2 
17.505892555040685 SPAG9 FNBP1 HDAC1 U2AF2 HNRNPA1 DDX23 LRRFIP1 MAP4K1 RAB9A ABL1 
17.502402023603658 EGFL8 ORM1 HP MMP9 HNRNPH1 S100A9 TIMM8B RMDN3 
17.498376830412813 SCAMP2 RTN3 TCTN3 RAB9A STX6 SNAP23 ATP5J GNB1 ARIH2 LRRC8C 
17.485076415876733 KMT2B ZNF330 CBX3 YEATS2 PAXIP1 NCOA6 DPY30 ANXA7 ANXA1 
17.461767487524938 TRIM41 CSNK2A2 RNPS1 ZSCAN21 HNRNPH1 PLAGL2 ZNF696 ZNF319 UBE2I ANKHD1 
17.45267074083423 UBE2O CSNK2A2 NOL9 ZNF330 AFF4 DDX54 DUSP5 E4F1 SMAD7 CLASP2 
17.42279531113533 LGALS8 APEH PDIA4 ANXA5 ANXA1 HMGB2 ITGB1 TXN PTPRA LRRC8C 
17.402842583915255 TRIM66 TELO2 KIAA0355 ALG13 GSE1 JAK1 GLTSCR1 MLH1 USP11 DNAJA3 
17.391114150212022 RNH1 RNASE3 PYHIN1 PARP1 U2AF2 THAP11 ZNF696 DSTN FBXL12 
17.3763304532466 ZNF644 ZNF330 PARP1 HNRNPR CBX3 GATA2 RPS24 TCTN3 
17.376138466020933 GLUL CCNC PIAS3 UBE2I WDR59 ASPSCR1 UBAC2 PIGT ZFAND2B ARIH2 HNRNPH1 TAF15 
17.37573524227005 PPP6R1 TTF2 TUBB PFDN5 PRPSAP1 FAM43A MED24 S1PR1 CD74 
17.354649530663025 RBM4B IMP3 RRS1 SUGP2 CTCF SON DDX54 DDOST DDX24 SFSWAP 
17.33265038164944 LAMTOR2 ILF3 RRAGD MTOR RAB9A BLOC1S1 PDHA1 RING1 
17.27176419434903 CEP70 CDC37 ZSCAN21 GSE1 KDM1A ZNF148 DAZAP2 ZNF696 PPP1R16B STK25 
17.264800467104696 COMMD4 ARHGAP17 DSCR3 COMMD6 NUP62 SF3B2 ATP6V1D APEX2 
17.216393478640946 ZNF622 ZNF330 PAXIP1 PCBP1 WDR33 PYHIN1 HMGB2 ATP6V1D CKAP4 
17.195372681096107 MGME1 SOX13 CDC37 PDHA1 MDH2 DDX54 MIPEP HMGCL RAB24 
17.162938734881806 NDUFB9 ACTB COX7C VDAC1 COX7A2 ILF3 FEZ1 NDUFB3 NDUFA1 SPG7 
17.16143327826983 USP5 ACTG1 MLH1 RNF216 COPS5 IKBKB HNRNPA1 PARP1 DAZAP2 OTULIN 
17.159812883321646 TAF6L TRRAP WDR37 YEATS2 XYLT2 HDAC1 PARP1 PFDN5 HLA-DPA1 
17.153055678075496 CDKL3 TTF2 HNRNPH1 GNB1 SRPK1 GDPD3 
17.148203603584427 EPHA2 CDC37 EEF2 RNF144A ANXA1 STX6 ITGB1 RAB9A AP2B1 NUDT9 
17.141995823664153 FBXL19 CCNC S100A9 MED29 MED24 COX7A2 ATPIF1 HNRNPH1 
17.128740863187105 OSBPL8 CSNK2A2 TCTN3 CKAP4 STX6 RAB9A CLEC4D CD74 OTULIN GPR114 
17.125577970599593 NOC4L RPL31 ADARB1 RRS1 BRD7 ILF3 DNAJA3 BHLHE40 DAZAP2 ZNF148 
17.12409034019394 PRKACB HP TBC1D31 ERH PARP1 DDX23 RNF216 ARF1 DFFB RNF34 
17.117328586616598 NUP54 WDR59 IFT20 NUP62 HDAC1 KIAA0355 LEO1 AP2B1 
17.10271946575575 CETN2 TTF2 XPC PARP1 HNRNPD CALM3 CALM1 GTF2H5 
17.08270282018985 FGB HP ORM1 GOT1 AIMP2 PIGT ANXA7 RING1 XPNPEP1 
17.00733208733818 PLEC PRKCH PCBP1 HNRNPA1 PDHA1 ETS1 MOAP1 GOT1 DYNLT1 
17.00724666278472 SLC25A4 ACTB BRD7 SLC25A5 VDAC1 COPS5 CLEC2D FAF2 RAB9A HLA-DPA1 
16.996327473340266 HIPK2 DAZAP2 SUMO3 UBE2I PARP1 ABL1 HNRNPH1 CBX3 
16.9899152942407 ZBTB9 ZNF330 PARP1 UBE2I TSPYL1 GATA3 GATA2 MED29 TSR2 
16.983958281243105 NEDD4L DAZAP2 IKBKB HNRNPH1 DDX54 APEX2 ABL1 SMAD7 SLIRP 
16.960100644517627 MTX1 GHITM ACTB HNRNPD COX7A2 SRRM1 C16ORF58 HLA-DPA1 CKAP4 DDOST 
16.934699250434406 TRIM9 ACTB PTRH2 RTN3 CEP85 ZC3H7A PSMG2 QDPR GNB1 EVL 
16.900108799281384 CHD1 ZNF330 PARP1 HDAC1 CBX3 PYHIN1 BMX DOCK9 DUSP5 DDX23 
16.884286848921455 SLC25A12 GHITM ATPIF1 PAFAH1B1 TM9SF4 BRD9 BRD7 AGK COX6A1 COX7A2 TIMM8B 
16.86944207121782 CXCR4 DNAAF5 JAK1 PTRH2 TM9SF4 WDR11 OTULIN DGAT1 SPG7 LRRC8C MTOR 
16.84947160782626 TOMM34 ZFAND5 ACTB CDC37 HNRNPD MRPS18C ATP6V1D SLC16A11 CALM1 PLAGL2 
16.84853546956054 KRTAP13-3 MIPEP ZNF330 UBL5 ZNF319 PFDN5 GNE GATA2 
16.825851174065047 NEK7 HADH SBF1 NUP214 SLC9A3R1 DPH3 EEF2 CTCF ANXA5 AP3D1 
16.81076736954127 ARL1 ACTB PARP1 EP400 TJAP1 ITGB1 BET1L GORASP2 
16.803924091631114 IRF3 WRNIP1 HNRNPH1 EWSR1 MAP4K1 OTULIN TICAM1 TELO2 RBL2 S1PR1 
16.7953850502838 PUM2 ZFP36 ALG13 ZC3H7A RNF214 NUFIP2 SRSF8 HNRNPH1 
16.784404873309732 NAMPT CD81 HNRNPH1 U2AF2 HNRNPA1 CLEC4D EDEM1 LEO1 
16.783732196123356 C17ORF59 BLOC1S1 CDR2 IFT20 NUP62 UQCRQ HSBP1 RAB9A 
16.780217354392438 GABARAPL2 GIMAP6 VDAC1 TXN TBC1D9B PTPRA HNRNPD TSR2 
16.777324656444506 CXXC1 DNMT1 PAXIP1 NCOA6 DPY30 YEATS2 HNRNPH1 H2AFJ 
16.765373188894834 RYK NUP93 CDC37 PARP1 PTPRA F5 DYNC1H1 AGK 
16.763517232608326 MEAF6 TRRAP EP400 PARP1 JADE1 ING4 DPY30 ANKHD1 
16.72863180277571 SEC24A TFG EWSR1 ILF3 IGBP1 CLEC4D DSTN ARF1 RAB9A CREB1 
16.722669347904347 PCBD1 CCDC102A KLF13 RBL2 GORASP2 TFF3 BRD7 COPS5 RPS24 YEATS2 
16.722118550242232 A2M ACTB CDC37 FBXW4 FBXL12 ANXA7 ALKBH5 PAXIP1 RETN 
16.711617896269008 APOA1 CD81 HP ORM1 DGAT1 ABCA1 GOT1 QDPR TMEM43 U2AF2 
16.710504842795768 FARP1 NUP214 CDC37 SNRPB LSM3 RNPS1 MYL6B DYNLT1 
16.70870300017079 DDX19A NUP62 NUP214 ARID1A MIF4GD CLEC4D MRI1 RAB9A 
16.706562591021132 PIK3R2 ACTB BRD7 RRS1 TUBB USP11 CRKL ABL1 EDEM1 RNF144A 
16.70276799682795 RAB3GAP1 FEZ1 CKAP4 RAB9A GNE PAXIP1 HNRNPH1 AP2B1 
16.69880931582648 GNAI2 ACTB U2AF2 HNRNPH1 CD81 GNB1 CYSTM1 RASA3 
16.69640197293929 SYDE1 FAM110A PYCR2 HNRNPH1 CALM1 WDR33 
16.690827286074203 PLEKHF2 FRMD8 RTN3 MRI1 PRPSAP1 BRD7 PACSIN1 AIMP2 DAZAP2 
16.67892069653526 GID8 SPON1 RNPS1 CALM1 PIGT HNRNPH1 PAFAH1B1 PGRMC2 
16.67603733156661 DUSP14 TFG MRPS18C WDR59 PYHIN1 PYCR2 OTULIN BAG4 ANKHD1 GOT1 
16.673868630948487 PAK4 ACTB DYNC1H1 RCC2 TCEB2 ABL1 SRPK1 STX6 RAB9A 
16.67009208289545 AZGP1 CCNC CSNK2A2 S100A8 ORM1 ARG1 TAF15 PSMB10 
16.66879271203885 POC1A BLOC1S4 AP4B1 TMCC3 MDM1 HNRNPH1 U2AF2 PFDN5 CKB 
16.66335891305705 ALDH3A1 PIGT ARG1 CSTA SRF TXN GOT1 FBXL12 
16.642778478791808 ARFGEF1 HNRNPDL DPY30 NUP62 HDAC1 STX6 RAB9A CLEC2D 
16.634845113728268 TMOD3 ACTB PAFAH1B1 ACTG1 CFL1 LRRFIP1 CALM1 SF3B2 CLEC4D ERH 
16.61727412110016 C11ORF58 MYL6B MAP4K1 DUSP5 PARP1 EWSR1 RANGAP1 CBX3 BMX 
16.610766286640867 ERC1 KDM1A CRLF3 ACTB PFN1 NUP62 EWSR1 BMX WDR4 IKBKB 
16.596994217730888 PHF20L1 KDM1A DNMT1 HDAC1 JADE1 YEATS2 XPNPEP1 CALM1 
16.59598111004374 FAM134C XYLT2 ATP5J DAZAP2 RTN3 UBIAD1 C16ORF58 THEM6 CRY1 CMTM7 
16.590846226017806 C3ORF52 SLC23A2 AGPAT2 CLEC10A MARC1 RMDN3 THEM6 UBAC2 KIAA0195 YEATS2 DAGLA 
16.58544993837413 PTCH1 CSNK2A2 MYL6 TCTN3 TCEB2 DYNC1H1 CD74 TMEM131 SPG7 
16.584833823961432 USP25 TMEM43 SUMO3 UBE2I ANXA1 GAPDH ASPSCR1 WRNIP1 EDEM1 SON 
16.583654088755168 EIF3L HNRNPR CAPRIN1 CSNK2A2 CTCF CD74 EIF2S3 CD81 TTC21B BMX 
16.56847792951107 ILVBL GHITM GEMIN4 TBC1D9B UQCRQ SLC25A3 FAF2 TCTN3 WDR45B MTOR 
16.555636001206043 CEP97 PFN1 CALM1 AP2B1 RNF214 BTBD2 CALM3 SNAP23 KIAA0355 
16.547371349390563 ACAD11 CCNC MED29 MED24 SLBP STK25 CAPRIN1 HNRNPH1 
16.547185004519505 LRRK1 CDC37 MAP4K1 GAK S100A8 SH2D2A ABL1 ECHS1 ASAP1 
16.546676792140822 DAB1 DAZAP2 MAGED1 BHLHE40 CRKL APLP1 SNRPB PAFAH1B1 CLASP2 
16.53908828708178 PACSIN2 ACTG1 PDIA4 EWSR1 ARIH2 PACSIN1 ASAP1 FASLG STX6 SBK1 
16.525608114567383 SGTA ACTB U2AF2 CALM1 DSTN SLPI EDEM1 RNF144A TFF3 AASDHPPT 
16.51864160126772 FAM129B SHMT1 GOT1 PTPRA HNRNPH1 PUF60 STX6 RAB9A 
16.517706094980625 TXNDC5 JMJD8 TXN WDR59 TCTN3 PDIA4 ZNF207 PTPRA 
16.493822462293114 HMGXB4 ZNF330 SIN3A CBX3 ZNF296 JADE1 UBE2I UBL5 
16.47752056472647 MBNL1 ORM1 HP HNRNPH1 HNRNPA1 PGRMC2 CKAP4 ANKRD22 
16.473890123608427 TINF2 HNMT ACTB GAPDH TUBB ANXA5 ADA CKB SARS PAXIP1 
16.471585542795882 CTNND1 EIF4A3 ACTB AIMP2 ACTG1 RNF144A STX6 RAB9A PTPRA 
16.460484931002252 BAIAP2 KDM1A ACTB ACTG1 PFN1 NRD1 SLC9A3R1 PTPRA CRY1 STX6 
16.445225865674477 RUNX1T1 ETS1 ZFP36 HDAC1 DNMT1 SIN3A GSE1 CDR2 
16.429460003974558 FGFR1OP PIK3R4 ELP3 WRNIP1 ABL1 TBC1D31 TXLNA SPECC1L RABL2B PPP2R3C 
16.421336304176734 ASAP1 ARHGAP9 CRKL ARF1 RAB11FIP3 PACSIN1 PDHA1 ASPSCR1 
16.412935077952277 HIST1H2BO ZNF330 XPC PARP1 H2AFJ ING4 JADE1 CD81 
16.399303849552872 VPS33B TBC1D31 UVRAG RMDN3 ZC3H7A DYNLT1 COMMD6 STX6 AP3D1 
16.396794686763364 SP110 CSNK2A2 RUNX3 CBX3 RMDN3 AFF4 ANXA7 CENPB 
16.36998139259394 NUDCD3 KDM1A UBE2I PARP1 TUBB DYNLT1 PAFAH1B1 DYNC1H1 KLHL3 KLHL6 
16.36720488365212 TAB2 PIAS3 UBE2I HDAC1 SIN3A SMAD7 EDEM1 IKBKB HNRNPH1 
16.349906534066797 TSPAN4 ITGB1 CD81 HNRNPH1 FRS3 GNE CLEC2D 
16.34006896549489 FOXH1 TFG MAGED1 DAZAP2 IARS SCRIB COX7A2 YARS 
16.316466337999003 GNAS ACTB CD81 GNB1 CLEC4D RMDN3 CALM1 GTF2H5 ELMSAN1 
16.296003760182465 USP21 FAM110A UTRN GATA3 FUCA1 BHLHE40 JOSD1 RNF144A 
16.2852731495118 EPC1 TRRAP ELP3 EP400 ACTB HNRNPH1 SRF 
16.284772617464178 SNX3 EIF4A3 TFG PAXIP1 GOT1 DAZAP1 IGBP1 RAB9A 
16.264200043157967 ATPAF2 HMGN2 PPP1R16B MIF4GD CLEC4D EWSR1 DDIT3 MICAL1 ECHS1 HMGCL 
16.247728033207103 HLA-B IL27RA ST6GAL1 TELO2 EDEM1 FAF2 PTRH2 WDR11 CLEC4D C16ORF58 
16.2442081272534 DUSP16 CEP85 CNOT11 KIAA0355 DYNC1H1 STK38 ANAPC1 HPS3 RNF34 KCTD20 
16.23614677408253 PIP4K2C AHCYL2 ACTG1 CSNK2A2 RNPS1 SCRIB DDX23 AFF4 GOT1 EDEM1 
16.23464525562317 CBWD1 TTF2 TUBB FAM43A WDR59 C2CD2 DDX47 METAP1 
16.216856238419606 GIPC1 NUP93 EWSR1 DDIT3 GEMIN4 HNRNPH1 PFN1 DOCK9 
16.20290775512864 SCLT1 KDM1A BLOC1S1 IFT20 CALM1 PHACTR4 DYNC1H1 HSBP1 COMMD6 
16.196967540870038 WEE1 CSNK2A2 EEF2 PCBP1 PAXIP1 GATA3 ADA ING4 MTOR 
16.187348851315615 LINC00839 PTBP1 SEPT9 RPLP2 QSOX2 PARP6 PTPRA CALM1 
16.17987945542773 LURAP1 DOCK10 CCDC92 SBF1 TSPYL1 UTRN HSBP1 U2AF2 ELP3 
16.176639755202395 ACADVL GOT1 INTS9 EEF2K PARP1 MDH2 DNAJA3 HMGCL 
16.175080066253535 WDR46 ADARB1 RRS1 SRSF5 DDX24 TSPYL1 CSNK2A2 ZNF317 H2AFJ 
16.161542765064663 H2AFJ PNMAL1 HNRNPD ING4 U2AF2 PARP1 TRIP12 SFSWAP 
16.160956607601783 PRDX6 VDAC1 PAXIP1 ORM2 ANXA1 PARP1 RANGAP1 GSTO1 
16.158663697549933 MED19 GTF2B CCNC MED29 MED24 PARP1 AFF4 DCK 
16.132902748781305 SLC6A15 KLHL36 PTPRA CKAP4 PTRH2 STX6 RAB9A AGPAT2 KIAA0355 AP2B1 
16.111030568296503 EXOC3 IFT20 PTPRA SUMO3 STX6 RAB9A MLH1 METAP1 
16.079466892295642 TGFBR1 CSNK2A2 IKBKB AP2B1 FNTA PPP1R16B USP11 SMAD7 BTBD2 FBXL12 
16.05080252324916 GDI2 ACTB U2AF2 RRS1 PARP1 TUBB RAB24 RAB9A 
16.049780308592744 MYOG ZCCHC14 FAM110A SRF SMARCA4 MLH1 
16.04629491970636 ZBTB11 ZNF330 RPL31 ADARB1 RPL26 SRSF5 E4F1 SMAD7 
16.032447140002745 TMEM120B COMT ATP5J C16ORF58 TCTN3 STX2 CD74 CLEC2D 
16.0317118160904 NEK6 KCTD5 NUP93 CDC37 BHLHE40 FOXJ3 
16.02310707129147 STAM2 IST1 CLEC4D DAZAP2 JAK1 PARP1 LCP2 STX6 
16.003244726448333 POLE3 APEH YEATS2 POLE4 RAD17 RPA1 POLD2 ARF1 WDR45B 
15.979654706376744 CCNH CCNC PTBP1 PDHB RPA1 TFG BCL11B GTF2H5 
15.97746926960866 MELK FAM110A TXN SMAD7 PSMG2 
15.968991066438525 KRT38 KDM1A CCDC102A FAM110A TXLNA TXLNG THAP11 CREB1 
15.952221449386027 NDEL1 ACTB PAFAH1B1 DYNC1H1 DYNLT1 RBM14 AIMP2 UTRN TUBB SFSWAP 
15.940494341495826 NCAPD2 PSMD7 PAXIP1 PARP1 SNRPB SF3B2 HDAC1 CD74 
15.940093981638165 MCL1 VDAC1 BAG4 EDEM1 DBI SEPT9 ARG1 TXLNG S1PR1 
15.940000041295681 RNF5 S100A9 EDEM1 S100A8 C16ORF58 AGPAT2 
15.897820053267852 RAB11FIP5 JMJD8 NUP93 RAB9A STX6 ERH RNPS1 ECHS1 
15.896089608622601 CLU PAXIP1 ORM1 HP HLA-DPA1 KDM1A TOR1A MMP9 KIAA0355 
15.884184984777965 RIOK2 JADE1 GHITM CSNK2A2 PYHIN1 PARP1 EWSR1 SRPK1 H2AFJ 
15.883232700331392 TNIK ACTG1 PAFAH1B1 DYNC1H1 FNTA CCDC92 GSE1 IKBKB 
15.87882932139874 ZCRB1 RPL31 ADARB1 RPL26 CTCF SRSF5 RRS1 SF3B2 SF3B6 SNRPB 
15.860555368201913 IQGAP2 ACTB PARP1 MYL6 CALM1 IKBKB MAP4K1 CD74 
15.85554818844333 ARFIP1 ACTB PFN1 ECH1 RAB9A ARF1 DYNLT1 SARS SHMT1 QDPR 
15.849079847984552 PSMD5 ZMYND8 PSMD7 CCDC92 BTBD2 SON PARP1 PSMB10 
15.839002914621174 SLC39A7 ARF1 ARIH2 DAZAP1 HNRNPH1 TCTN3 HDAC1 NUDT9 
15.825626024309042 ROR2 ALG13 MAGED1 DAZAP2 BHLHE40 PTPRA STX6 RAB9A 
15.806671809834366 PLEKHA5 CCDC92 PFN1 CALM1 STX6 RAB9A SH3BP5L DYNLT1 SLC25A5 
15.787227371909902 C19ORF52 GEMIN4 CRACR2B ATP6V1D PDHA1 FAF2 PARP1 AGK 
15.76946839274223 PIK3R3 WRNIP1 FRS3 RBP7 SH2D2A MICAL1 ORM1 RING1 
15.761089433737174 VPS11 KDM1A CCDC92 SCRIB TICAM1 CSTA UVRAG AP3D1 RAB9A 
15.750710665271558 ENV PSMD7 VDAC1 ATP2A2 CKAP4 TOR1A CALM1 TMEM43 EVL TIMM8B 
15.73976617927808 BRPF1 ZNF330 ZMYND8 CSNK2A2 RING1 JADE1 ING4 DYNLT1 DDX23 
15.73751855547109 CDK19 CCNC MED29 MED24 CTCF SMARCA4 CDC37 STK38 
15.729987427345261 PUS7 ZNF330 CBX3 ACOT13 EWSR1 RAD17 DUSP5 S100P 
15.727534655297669 WAS CRKL FNBP1 LCP2 PACSIN1 UBE2I RPA1 
15.723468157074976 COMMD1 TTF2 DSCR3 COMMD6 NUP62 HMGB2 TCEB2 
15.721644696551733 USP47 FNBP1 HDAC1 LRRFIP1 IST1 PUS1 DUSP5 PRPSAP2 
15.701963920129039 HIST2H2BF ZNF330 CD81 HMGN2 SRSF5 PRPSAP2 C2ORF68 GID8 
15.701341664354155 SMN2 GEMIN4 SNRPB SRSF5 BRD7 MAGED1 SNU13 PDHA1 
15.69748755538086 IKBKAP APEH PFN1 DPH3 BMX IKBKB SF3B2 DYNC1H1 ELP3 ZNF217 
15.694961512113478 SYT2 AHCYL2 COPS5 COPS7B EIF2B1 CALM1 AFF4 CENPB LEO1 
15.68962213204478 NDUFB1 C14ORF2 MED24 EWSR1 NDUFB3 NDUFA1 LEO1 
15.676376867247177 TSPAN15 PIGT QSOX2 HNRNPH1 TM9SF4 DGAT1 RTN3 THEM6 S1PR1 LRRC8C 
15.66053620538008 CIDEB COX7C C14ORF2 DFFB HNRNPH1 ATP5I DNAJA3 
15.649891697563719 ATP2B4 SLC41A3 ATP5J CALM3 CALM1 TMEM5 LHFPL2 RAB9A 
15.641630683709483 TAF3 ZNF330 PARP1 CTCF RPL31 DDX23 MICAL1 BHLHE40 TAF4 
15.637244950327485 PTPN21 AHCYL2 AKR7A2 CSNK2A2 DDX54 DSCR3 BMX GID8 CEP85 
15.625029081217942 TUBB8 IL27RA FAM43A TCTN3 TUBB COPS5 GNB1 PFDN5 
15.624669221963435 USP18 IKBKB HERC1 MYL6 CARD11 UVRAG EEF2 
15.588648143243995 POC5 DDX47 DDOST RNF214 DYNC1H1 CALM1 AGK CALM3 
15.577116519930607 SRP1 DAZAP2 ACTB ACTG1 S100A9 S100A8 
15.572086061903365 PPP4R2 GEMIN4 DSTN CAPRIN1 IKBKB SF3B2 SF3B6 ELMSAN1 IGBP1 MARCKSL1 
15.56827998584919 NAGK STIM2 TXN GOT1 PIGT HNRNPH1 C2ORF68 XPNPEP1 
15.562532280822358 OIP5 THAP11 ING4 MED29 PLAGL2 KDM1A SNRPB NUP62 
15.559594736957242 RNF111 SUMO3 UBE2I CREB1 TSPYL1 CSNK2A2 XPC AP2B1 SMAD7 
15.557429362211144 CSTF1 DOCK10 RBM14 SNRPB HNRNPA1 DNAAF5 ALDH9A1 PAFAH1B1 
15.553688819971903 PLEKHG4 TFG ZNF319 GATA3 ZNF148 ARF1 ZCCHC14 SLIRP METAP1 
15.542170910855692 CBFB ETS1 ARIH2 TCEB2 RBM14 ANXA1 RUNX3 BCL11B TAF4 
15.536452490355899 F2RL1 CD81 GEMIN4 COPS5 TELO2 TBC1D9B ATP5J C16ORF58 ATP2A2 
15.535961294810384 HLA-C IL27RA ST6GAL1 PIGT PDIA4 CLEC4D CKAP4 FAF2 MTMR12 STX10 
15.534114358260656 BCL2 GHITM VDAC1 RTN3 MOAP1 UBE2I HNRNPD PARP1 IKBKB 
15.533404440158645 CLIC4 CRKL HADH CLEC4D LSM3 ADA HNRNPA1 PARP1 
15.520544839745105 ATXN10 MAGED1 GATA3 ABCA1 SMARCA4 PTPRA TNPO1 GNB1 
15.496123999634769 CREB3L1 TRRAP DDIT3 ORM2 COMT UBIAD1 C16ORF58 CMTM7 
15.491293611880717 WLS FAF2 RAB9A TCTN3 PARP1 HNRNPH1 
15.461829961702833 FBXO38 TRRAP EP400 YEATS2 TAF4 RING1 USP11 ZNF217 
15.450550316999559 SMG9 ZC3H7A RNF214 SARS PFDN5 PRPSAP1 PPP2R3C 
15.426384951050304 FOXRED2 MIPEP TNFAIP6 DPY30 HERC1 EDEM1 PDIA4 VDAC1 RAB9A 
15.425917716032867 SRGAP2 ACTB PFN1 FAM110A CALM3 FASLG S100P 
15.422690579936049 VPS45 COPS5 HNRNPD ARIH2 STX10 STX6 RAB9A AP2B1 
15.418415901594125 DST ACTB COPS7B CALM1 STX6 RAB9A TXLNA EIF2S3 DYNLT1 
15.415364741229673 PFKM APEH SARS CALM1 PCBP1 SUMO3 ELMSAN1 
15.4099128323473 SETD2 SCRIB RNPS1 SNRPB RPA1 DDX23 PACSIN1 WDR37 
15.406980960943029 MAD2L1 ATP5E TUBB APEX2 MLH1 TSR2 ANAPC1 YEATS2 STK25 
15.406252299971804 LACTB CCDC102A CALM1 MDH2 PDHB ECH1 PAXIP1 MARC1 
15.402711277620153 MYCBP2 TBC1D9B CALM1 CD81 CFL1 CRY1 USP11 PUF60 
15.396935924275263 ECM1 CCNC FRS3 GNE SRPK1 FASLG IL2RB 
15.37730311625595 SUFU ZNF330 ZFP36 RUNX3 SLC41A3 EDEM1 NFATC3 
15.372741331014076 EYA2 AKR7A2 TFG GSE1 THEM6 CEP85 PIGT ARID1A 
15.349175895187445 CARD10 KDM1A GSE1 CDC37 TMEM43 JOSD1 HNRNPH1 
15.341469127898563 GNL3L ZNF330 BRD7 RPL31 RPS24 PYHIN1 PAFAH1B1 OTULIN S100P 
15.341311908693433 FGG PIGT KDM1A HP RING1 NUP93 BLOC1S1 GOT1 
15.332547492732672 COPB1 SRPK1 SF3B2 ARF1 ZC3H7A CD74 KCTD5 
15.326547410839261 PC ILF3 BRD7 SLC25A3 GOT1 DYNLT1 MDH2 FOXJ3 
15.314352339322406 SLX4 SUMO3 UBE2I PARP1 RPA1 TSPYL1 GID8 
15.311255187290874 OXSR1 AP5B1 ARHGAP17 HNRNPH1 TCEB2 DYNC1H1 
15.30554778340895 MSH3 ZNF330 PARP1 RPA1 CLEC4D MLH1 ORAOV1 
15.294569698171255 ATP6V1E1 ZFP36 ATP6V1D AIMP2 XPNPEP1 RMDN3 TMEM43 ARID1A 
15.287075643354028 GLI4 ZNF317 RRS1 CTCF NOA1 DDX24 RETN CENPB ZNF696 
15.257357234458235 RGS20 TXLNA TXLNG ZNF664 FAF2 FASLG CRY1 NUFIP2 ZNF696 
15.236179671414925 OPRM1 ATP6V0E1 DNAAF5 GEMIN4 COPS5 CALM1 ARF1 NDUFB3 
15.23524860800903 PATZ1 PPP1R16B USP11 ILF3 HNRNPH1 SCRIB DAZAP2 PFDN5 
15.227784128141206 CENPM VPS25 GNE MICAL1 COPS5 PAFAH1B1 PDXP LEO1 
15.204726404679072 C11ORF30 ETS1 NUP214 CSNK2A2 HDAC1 DDIT3 SON DYNLT1 
15.158855629279438 FBF1 PSMD7 SCRIB TXLNA TXLNG IFT20 CEP85 AGK FBXL12 
15.156024313376316 VWCE FBXO21 HLA-DPA1 ZSCAN21 NUFIP2 E4F1 ZNF664 ZNF696 
15.146391307382599 RAB11B HADH RAB11FIP3 SH3BP5L ARF1 BTN3A3 RPA1 CCDC64 
15.14304136761394 ENTHD2 CCNC DSCR3 AP4B1 LSM3 SF3B2 TNPO1 MAT2B 
15.14243275966237 XPOT NUP93 WDR11 RAB9A NUP62 PTPRA 
15.133795975681004 NCOA1 GTF2B ETS1 SMARCA4 COPS5 ARID1A SRF NCOA6 
15.132902760384578 CEP120 TTF2 TBC1D31 ECH1 ANAPC1 YEATS2 CNOT11 CKB 
15.128644851715702 CENPV ADARB1 ILF3 C12ORF65 PARP1 PYHIN1 FNTA CLEC4D 
15.1035351275964 UBA5 APEH HADH SARS ATP6V1D PARP6 GSTO1 ECHS1 
15.102492137094542 MRTO4 RPL31 ADARB1 SRPK1 HNRNPH1 DDX24 PYHIN1 ARID1A 
15.101646578516238 RALBP1 GSE1 HDAC1 RNPS1 AP2B1 SEPT9 
15.09225683574621 NARS PUS1 ABL1 PARP1 GOT1 IARS ANXA5 
15.091045033968834 NUMB FRS3 GAK STX6 RAB9A AP2B1 PTPRA IGBP1 MTOR 
15.088913125212214 WASF2 APEH ACTB ACTG1 EVL PFN1 ABL1 UTRN HSBP1 
15.081391671513344 KIF18B ZNF330 PARP1 AHCYL2 NUP62 IMP3 
15.078345229678735 TACC3 TTF2 SLBP ASPSCR1 NUP62 EVL MLLT6 
15.073728626475999 ELK4 ACTB ACTG1 H2AFJ ERH SRF PRPSAP1 PRPSAP2 
15.073648740277468 BAZ1B ZNF330 SMARCA4 PYHIN1 PARP1 HMGN2 CBX3 ARID1A 
15.06765795657382 REEP6 CDC37 RTN3 ATP5J GPR114 IFT20 TMEM43 GORASP2 
15.061554428939509 MBD3L1 ZNF296 UBL5 SIN3A HDAC1 FRS3 SNRPB MLH1 
15.04254796183253 EPN2 DAZAP2 PARP1 TXLNA RNF214 STX6 AP2B1 
15.020521594219765 SPINT2 CLEC4D TBC1D9B MARC1 THAP11 PTPRA CNTNAP1 ZNF696 
15.011061405772034 NDUFA13 ATP5J MDH2 ATP6V1D MTMR12 NDUFB3 NDUFA1 
15.009003725914887 B3GAT3 XYLT2 CLEC2D DNAAF5 HNRNPD CKAP4 STX6 
14.998436277651132 BCCIP ACTB MARCKSL1 CDC37 PSMB10 HMGB2 ALDH9A1 DDX24 
14.990558598267253 IPO9 AIMP2 ACTG1 FAF2 ZNF330 TNPO1 H2AFJ S1PR1 
14.98464710176754 SGOL1 KLHL36 TBC1D31 ECH1 CBX3 PAFAH1B1 MLLT6 HNRNPAB 
14.983710838573753 GLMN KDM1A IL27RA ST6GAL1 CDC37 MFNG C16ORF58 TCEB2 
14.972520526277455 MCMBP PSMD7 DYNC1H1 TCEB2 RPA1 DDIT3 ATP6V1D DUSP5 
14.970841334464218 FAM168A TFG EWSR1 HNRNPH1 DAZAP2 OTULIN RNF216 
14.955353056539847 GCD7 RTN3 UBE2I PUF60 AIMP2 HSBP1 DNAJA3 ZNF212 
14.95426536391778 RAB10 CFL1 MDH2 CALM1 CLEC4D RPA1 ITGB1 MICAL1 
14.951051209501884 RASSF10 FAM110A ANXA1 CSTA PCBP1 
14.93673947379863 ATXN1L ALG13 GATA2 ANKHD1 SUGP2 DAZAP2 GORASP2 S100P 
14.927319768291202 HMGCL MIPEP GTF2B MS4A7 FBXO21 GSTO1 
14.92509537396442 CORO1A DDX24 U2AF2 TJAP1 IFT20 TMCC3 
14.920797341901133 SUCLG1 ACTB GSTO1 DOCK10 ANXA5 MDH2 ECHS1 SEPT9 
14.908251005516775 PFN2 ACTB ACTG1 EVL GEMIN4 SIN3A WDR33 HNRNPH1 
14.90449333655689 ZNF609 ALG13 GATA3 GATA2 HDAC1 PIGT HNRNPH1 TAF15 
14.903698021848014 ZNF687 ZMYND8 TSPYL1 CSNK2A2 HDAC1 PYHIN1 CENPB DDX24 
14.895259445445555 TAF1D ADARB1 HNRNPH1 ILF3 RNPS1 CSNK2A2 FEZ1 CENPB 
14.893436067265139 ATP2C1 FAF2 RMDN3 HLA-DPA1 OTULIN GORASP2 S1PR1 CLEC2D AGK 
14.886596050997753 RABL6 SMARCA4 PARP1 CAPRIN1 MED29 STX6 RAB9A 
14.871411459595244 PRKAA2 CDR2 AIMP2 ZNF212 STIM2 DNMT1 RASAL3 UBE2I 
14.868943292226202 EMILIN1 TTF2 NUP214 PUS1 IFT20 WDR11 
14.851084135537645 PAK2 SCRIB ABL1 EEF2 CKB GAPDH RPA1 PAFAH1B1 
14.850405335794173 ISY1 CCNC SRRM1 SNRPB UBE2I BUD31 MLH1 POLE4 
14.835281063429976 GNA13 CD81 GNB1 MRPS18C UTRN PHF19 RNF144A S1PR1 
14.821536815769784 MED31 CCNC MED29 MED24 ERH DAZAP2 ANXA7 
14.815821244038327 USP54 ZNF148 PIGT ZCCHC14 ZFP36 KIAA0355 SNRPB BHLHE40 
14.809417337935024 PTPRF ACTB GAPDH ATP6V0E1 DNAAF5 PTPRA SEPT9 
14.7879076406152 NSFL1C KDM1A DYNLT1 ASPSCR1 DBI ATIC GMDS PARP6 AP2B1 
14.777756102430113 INTS7 PAXIP1 SLC9A3R1 HNRNPD FAF2 SUMO3 HLA-DPA1 
14.748978630850628 CACNA1A RPL31 TELO2 MOAP1 CALM3 TCTN3 PUF60 TAF15 
14.746549495008853 SAAL1 ST6GAL1 PTPRA TMEM43 QSOX2 CKAP4 PTRH2 S1PR1 SPG7 
14.744273321890732 TTYH1 XYLT2 ALG9 KIAA0195 TAMM41 DGAT1 THEM6 DAGLA AGPAT2 LRRC8C 
14.742989722545698 CCDC120 MAGED1 DAZAP2 HNRNPH1 GID8 CDR2 PLAGL2 
14.730963231649827 DNAJC11 PITPNB RNF214 RMDN3 TCTN3 ANXA7 
14.726287590291909 IRF2BP2 TRRAP SCRIB PARP1 SIN3A KLF10 NUDT9 
14.707477862874384 NDUFA10 PDHA1 UQCRQ RMDN3 NDUFB3 NDUFA1 ELMSAN1 
14.706109044870798 ZDHHC17 CSTA CNDP2 ZFP36 IFT20 STK25 SNAP23 EVL 
14.699001765592229 CIRBP PAXIP1 COPS5 HNRNPA1 ARIH2 CAPRIN1 DOCK9 SRF 
14.698104690140756 TOM1L2 SIN3A IGBP1 CREB1 DSTN RAB9A CALM3 CALM1 
14.664765953545581 VDR RUNX3 STK38 SRPK1 NCOA6 MED24 
14.656220157241687 HIRA UBE2I HNRNPD RPA1 PYHIN1 SMARCA4 CTCF CSTA 
14.638638682988514 SNX12 DAZAP1 HNRNPH1 EIF4A3 ELMSAN1 IGBP1 
14.636167734860132 MTPAP C12ORF65 PDHA1 CALM1 SLIRP STK25 DYNLT1 LEO1 
14.626340866270665 APOD CLEC10A ZMYND8 CCNC ATP6V0E1 TOR1A KIAA0355 CLASP2 
14.615614289920204 SMG1 EIF4A3 ZC3H7A RNF214 TELO2 EEF2 
14.614446791960203 AP1G1 COMT AIMP2 ARF1 CREB1 HEATR5B AP2B1 PAFAH1B1 
14.604407722606613 TACSTD2 GEMIN4 TBC1D9B EIF2B1 TNPO1 MED24 HNRNPAB MTOR 
14.599528345408455 MET CLEC4D ITGB1 SH2D2A PCBP1 SH2B3 
14.59321985832173 RINT1 FAM110A STX6 CKAP4 TXLNA RBL2 
14.585510872544091 TMEM184A NRD1 DGAT1 UQCRQ HNRNPH1 AASDHPPT SPG7 AGPAT2 
14.576893732970479 CHD8 GTF2B CBX3 CTCF U2AF2 DYNLT1 CREB1 AP3D1 
14.569988940649422 EIF2B5 S100A9 OPN3 EIF2S3 CD74 EIF2B1 
14.560632203976718 S100A4 TTC21B ZZEF1 PARP1 GAK DNAJA3 HSBP1 IGBP1 
14.551894346955736 MACF1 ACTB PFN1 SNRPB RBM14 GOT1 CBX3 ASAP1 STX6 
14.535508606477173 ARHGAP18 IST1 BAG4 S100A9 MRI1 MIPEP GAK AP2B1 
14.522276717124278 AMOTL2 DDIT3 CDC37 NUP62 RASAL3 CDR2 ANXA1 
14.509699238226958 ENKD1 MIF4GD CDR2 HNRNPH1 MLH1 PPP2R3C S100P 
14.503235710060022 ZMYM4 KDM1A ZSCAN21 CBX3 PARP1 CTCF PYHIN1 TUBB 
14.500830484484682 MED21 GTF2B CCNC MED29 MED24 CSNK2A2 CKAP4 PTRH2 
14.497762528559528 FTH1 GATA3 BAG4 PCBP1 PACSIN1 TNPO1 BRD7 
14.497167754290938 FAM91A1 KCTD5 TUBB BRD7 TNFAIP6 STX6 RAB9A 
14.494305343661955 SGTB SLPI EDEM1 MTOR RNF144A EIF2B1 DDX24 
14.488715022648904 HOXB9 AHCYL2 ING4 PARP1 PFDN5 S100P CKLF 
14.4870544863434 LDLRAD1 ALG9 DNAAF5 JMJD8 PIGT ZNF696 CNTNAP1 CMTM7 
14.486934274192736 UCHL1 PYHIN1 DNMT1 ILF3 COPS5 CLEC4D UBE2I TMEM5 
14.475595830125046 TRAF1 ECH1 EWSR1 GORASP2 GATA2 IKBKB JOSD1 TICAM1 NUFIP2 
14.464025817044417 CMTM5 SLC23A2 MCEMP1 TMEM43 TUBB DGAT1 C16ORF58 PPP2R3C 
14.443172774533613 SV2A NDUFB3 SPG7 WDR11 RAB9A CALM1 CLEC2D 
14.441202817255288 DNAJC6 KDM1A HDAC1 COX6A1 HK3 GAK AP2B1 WDR11 
14.425176356594495 HOXC9 AHCYL2 PIGT ZFP36 MAGED1 HNRNPH1 S100P 
14.42281411997591 BLOC1S2 CCNC ACTB BLOC1S4 BLOC1S1 STX2 IFT20 
14.417781482273915 MOB4 MYL6 CALM1 PAXIP1 RMDN3 STK25 HDAC1 
14.407430941776948 GMPS AIMP2 ANXA5 GSTO1 PARP1 CRY1 AGPAT2 
14.406138254204377 ZNF777 ADARB1 RPL26 MRPS18C SRSF5 NKTR SCRIB ZNF212 
14.40113789512277 ATXN2L PCBP1 NUFIP2 ZC3H7A AIMP2 ADA 
14.395977640558531 LAGE3 CNDP2 RNF214 HK3 TCEB2 WDR11 
14.392059312497357 EPB41L1 KCTD5 COPS5 RNPS1 PTPRA EEF2K AFF4 
14.391582558401 MNDA RRS1 CAPRIN1 DDX47 CTCF DDX54 DDX24 NOL9 
14.387737267370053 GLO1 ATIC CLEC4D PARP1 EWSR1 HNRNPA1 IGBP1 QDPR 
14.362025728634679 DEPDC5 ARF1 DAZAP1 HNRNPD RRAGD WDR59 
14.358813622571114 CD6 ZMYND8 CSNK2A2 SBF1 PAXIP1 DDX47 XPNPEP1 
14.357825919578593 CCP110 CALM3 CALM1 AP2B1 SNAP23 DYNLT1 TUBB SAMD4B 
14.345373302536832 EFHD1 ACTB CLEC4D S100A8 PLAGL2 CALM3 
14.328541690386054 ZNF655 ZNF330 IMP3 CDC37 DSCR3 VPS9D1 EVL 
14.318998225813118 WBP11 ZNF330 RPL31 EWSR1 DDX23 C16ORF58 PYHIN1 ARF1 H2AFJ 
14.31874569276878 ATRIP HMGN2 ST6GAL1 TXLNA TELO2 IFT20 RAD17 RPA1 
14.317280296589368 SLX1B ANXA1 THAP11 GEMIN4 SLIRP CALM1 S100A8 
14.31538049992126 FOXR2 TRRAP EP400 CCNC EIF2B1 PSMG2 
14.306306111268016 PTDSS2 HNRNPR ILF3 H2AFJ TCTN3 S1PR1 CLEC2D 
14.305368161660532 LMO3 AIMP2 HNRNPH1 BRD7 BHLHE40 SAMD3 
14.29858331837413 LENG1 EWSR1 CDR2 IARS UBL5 H2AFJ KLF10 
14.29801629683905 EXOC4 IFT20 DYNLT1 RPA1 HNRNPH1 STX6 
14.296864617112503 HLA-DPB1 C2CD2 HLA-DPA1 STIM2 TBC1D9B CD74 ZZEF1 GID8 
14.29534476861751 CEP72 AHCYL2 TBC1D31 SOX13 UBE2I HNRNPH1 PPP2R3C 
14.294423010960298 SGPL1 CSNK2A2 VDAC1 CD74 TCTN3 TM9SF4 BRD7 UQCRQ 
14.282586431773723 PRDM1 KDM1A HDAC1 GSE1 KDM2B IRF1 AGK BMX 
14.271988577978682 PDCD7 FBXO21 NUP62 FEZ1 SNRPB LEO1 CALM1 
14.266049355687565 TCL1B ZNF330 BAG4 HPS3 HPS6 MED29 MED24 
14.262926785345126 TGM1 CCNC THAP11 PIGT TCTN3 JOSD1 GOT1 TAF15 
14.239321973041342 COPG1 TELO2 MAGED1 ARF1 AP3D1 CD74 GEMIN4 SF3B2 
14.231903724696291 H1FNT MRPS18C E4F1 ZNF317 CTCF MRPL33 MRPL49 SF3B2 
14.214787339325838 IPO8 ZNF330 ZSCAN21 TNPO1 DUSP5 S1PR1 SLIRP ZNF696 
14.21234697151638 GPR35 CD81 GHITM VDAC1 VPS25 RTN3 C16ORF58 ATPIF1 KIAA0195 
14.210070749500918 TMEM31 PSMD7 CCDC92 UBAC2 FAF2 ZFAND2B RNF144A ANKRD13D CMTM7 
14.203511802765265 ATP6V0C MCEMP1 ERH ATP5J ATPIF1 TCTN3 CLEC2D 
14.199553554519772 CNP CLEC4D HPS6 CALM1 TUBB BRD7 RAB9A 
14.19942877252465 C6ORF165 MIF4GD MAGED1 GSE1 SAMD3 KLHL3 
14.1666517561191 SERTAD3 CCNC TXN UBL5 IMP3 SNRPB 
14.16425244951171 NDUFB7 ZNF330 CCNC UQCRQ CKAP4 NDUFB3 NDUFA1 
14.159542622530301 ADD3 ACTB ACTG1 UBE2I DYNLT1 ASAP1 STX6 RAB9A 
14.157796306634246 AHNAK2 CSNK2A2 PYCR2 PFN1 UBIAD1 STX6 GOT1 MIPEP 
14.148986131305842 SLC22A4 TAMM41 ALG9 DGAT1 UQCRQ C14ORF2 PTRH2 SPG7 
14.142585503153851 TFAP2A CITED4 RBM14 PARP1 EWSR1 UBE2I PYHIN1 AP2B1 
14.126401601073487 FAM9B KDM2B ECH1 SLPI CDC37 BLOC1S1 
14.11698360628591 C8ORF33 ADARB1 RRS1 ILF3 COPS5 SF3B2 PYHIN1 FEZ1 CRELD2 
14.09670937979197 PKN1 CD81 PCBP1 TUBB CDR2 PLAGL2 SLIRP 
14.096460685304157 ATP1A3 VDAC1 SLC25A3 AGK TMEM5 AGPAT2 CLEC2D S1PR1 
14.09091431891004 TMEM9B C16ORF58 STX10 STX6 HNRNPH1 CLEC2D 
14.080928100266034 TSHR CD81 ACTB HNRNPA1 SCRIB TUBB JAK1 CRELD2 
14.07677455964096 L3MBTL3 ZNF330 DNMT1 KDM1A CRLF3 HDAC1 SNRPB 
14.074186567554621 CD97 XYLT2 CLEC2D ADARB1 HNRNPD NDUFB3 
14.063389780930601 VIPR1 GHITM PFN1 ATPIF1 C16ORF58 C14ORF2 AGPAT2 TM9SF4 
14.047697404324515 LHFPL5 CD81 UBIAD1 C16ORF58 RMDN3 DAGLA CMTM7 CD37 
14.019767573935857 TRIP10 ARHGAP17 ASAP1 SBK1 CAPRIN1 INTS9 
14.016244020808331 TRIM67 TAMM41 VDAC1 U2AF2 EVL QDPR CEP85 ZCCHC14 
13.97106267306708 CENPF FNTA PAFAH1B1 TXLNA PAXIP1 RMDN3 HNRNPAB 
13.96512240617107 POU6F2 CCNC CDC37 ZNF319 ZNF148 BHLHE40 VPS9D1 
13.9620968727965 PEG10 COMT ACTB UBE2I RBM14 HNRNPH1 GATA3 AP3D1 
13.960547509257054 HSDL2 PDHA1 MDH2 MIPEP TCTN3 HDAC1 PSMB10 
13.95221521815776 CSGALNACT1 TRRAP EP400 DNMT1 STX6 
13.937941140827986 TM9SF3 FAF2 TCTN3 HNRNPH1 S1PR1 CLEC2D 
13.901465512151681 GTF3C2 RNPS1 PARP1 RNF144A RPA1 STX6 
13.891768487992056 SHBG APEH ACTB DSTN SRF MXD4 
13.889394629042139 MB21D1 RRS1 CTCF ANXA1 ANXA5 MYOF CENPB IST1 TXLNA 
13.88879957377773 BICD1 CRKL NFATC3 CKB TBC1D31 IST1 SH3BP5L DBI 
13.888014641070207 GORAB ARF1 CD81 CRKL KCTD5 BET1L CMTM7 
13.866536928577545 ARID5A MAGED1 DAZAP2 BAG4 SH2D2A ALDH9A1 PLAGL2 
13.864841176114938 CENPB ZNF331 PYHIN1 PARP1 MXD3 ANXA7 
13.844141838513725 DCTN5 ACTB DYNC1H1 ACTG1 FNBP1 TXN 
13.840392818732811 FAU CD81 RRS1 RPS24 CTCF COPS5 PTBP1 HMGB2 
13.830190341568729 TMEM30A IL27RA CLEC4D MARC1 HLA-DPA1 HNRNPH1 CLEC2D 
13.824976426695509 PJA1 MAGED1 FAF2 DDX54 JOSD1 LCP2 
13.821307537057125 KRTAP13-2 GEMIN4 GNE GATA3 HNRNPH1 PFDN5 
13.82091863045577 HK2 FNTA VDAC1 HK3 ARIH2 PARP1 OTULIN 
13.801978899167814 EEA1 ABL1 PARP1 MLLT6 HSBP1 STX6 RAB9A 
13.801231239852132 ZNF768 ZNF330 TNPO1 EWSR1 SRSF5 ADARB1 SCRIB 
13.795015218110475 POTEF EWSR1 TAF15 ADA HLA-DPA1 TUBB GID8 S1PR1 
13.786997962041116 RER1 ATP5I TOR1A TCTN3 PGRMC2 HLA-DPA1 GORASP2 S1PR1 
13.779166064238144 RBCK1 CARD11 ITGB1 RAB9A ZFAND2B UBE2G1 OTULIN 
13.766438796468979 TP53BP2 ACTB CCDC92 ANXA1 KDM1A DYNLT1 TXLNA RASAL3 
13.757358903316197 TEKT4 GSE1 HNRNPH1 CRY1 PLAGL2 
13.754981062989167 CD63 CD81 ITGB1 ALG13 TMEM43 SRF 
13.749187665037981 CPVL SCRIB SARNP STK25 SEPT9 QSOX2 DYNLT1 
13.747331086519326 UBR1 BTBD2 BRD7 HNRNPA1 EDEM1 FAF2 CLEC2D 
13.74524726650152 WHAMMP3 HERC1 BLOC1S1 CDR2 ANKHD1 WDR37 TXLNG 
13.721591975028716 GABARAPL1 TMEM131 RTN3 TBC1D9B SRPK1 HNRNPH1 WDR11 
13.717543038006806 TMX2 TLR5 TM9SF4 CD81 SRSF5 C16ORF58 CMTM7 
13.715612245385882 NF1 PFN1 CALM1 SMARCA4 ECHDC2 FAF2 HLA-DPA1 CA4 
13.709079796406828 SEC23IP ACTB DYNC1H1 RCC2 DSTN ARF1 ARIH2 RAB9A 
13.704506994994043 TARS2 ZFP36 SYPL1 CDC37 C12ORF65 PDHA1 NAA38 
13.704167533836676 ZNF408 ZNF330 ZMYND8 ZNF317 UBE2I CENPB ZNF212 
13.702505984133895 ACSL4 EDEM1 TJAP1 HNRNPD BRD7 
13.699840286172384 BRAF FNTA CDC37 PARP1 MYOF AP2B1 S100P 
13.696836524005418 SOHLH1 TRRAP TTF2 WDR59 HPS3 PFDN5 
13.677485930740957 GATAD1 ETS1 SLBP E4F1 HDAC1 SIN3A 
13.667274027745199 RFX1 XYLT2 HDAC1 CBX3 PYHIN1 MAGED1 ABL1 
13.658056312246414 POLR2L PHRF1 IMP3 SNRPB ARG1 MED29 
13.650040380902949 GDI1 RCC2 ALDH9A1 RAB24 RAB9A 
13.646802418884919 BLVRB ORM1 RPL31 GMDS GOT1 MDH2 
13.643682598516305 TTC4 CDC37 S100A9 BTBD2 HDAC1 TXLNG 
13.639634140063906 ALAD WDR4 CCNC GOT1 ANKRD22 PARP1 
13.630042136985038 RAB6B SBF1 CALM1 ARF1 RING1 RPA1 CCDC64 
13.629012356509211 USP4 RNPS1 IKBKB ANXA7 RBL2 USP11 
13.625784239264714 SPANXN2 TSPYL1 DDX24 WDR37 RNF214 WDR54 
13.6136308433884 CTNNA3 FAM110A GNB1 PLAGL2 
13.612013975419917 CCR2 RTN3 CD81 SLC41A3 NTNG2 GOT1 SRSF5 
13.611979105100254 B4GALT3 XYLT2 GHITM MFNG PDIA4 ATPIF1 ARF1 HLA-DPA1 
13.598885776510313 PRDX4 TTF2 PDIA4 EWSR1 PYCR2 CRY1 
13.59083307231909 GAB2 CRKL FAM110A STK38 
13.56740927457904 LCN2 CCNC PDIA4 LY96 IRAK2 HNRNPA1 GID8 
13.5608554918248 MED29 STEAP3 CCNC MED24 IFT20 EIF2B1 
13.533374477148104 CDSN JADE1 CCNC GOT1 ARG1 USP11 DYNC1H1 
13.508573726716403 LIN37 ZNF296 BCL11B DPY30 SIN3A RBL2 EWSR1 CDR2 
13.502656594965496 ATF4 GTF2B HK3 CKAP4 DDIT3 ZNF212 
13.49274644723814 TRAF4 KDM1A UBE2I TICAM1 FRS3 SNRPB CEP85 GORASP2 PLAGL2 
13.491692514103148 SLC7A1 GHITM COMT DGAT1 RMDN3 CKAP4 RNF144A LEO1 
13.48802351253996 UBN2 ETS1 EWSR1 CSTA CBX3 PYHIN1 CALM3 
13.47847287643377 JAK2 CALM1 JOSD1 JAK1 IL2RB DNAJA3 
13.474314317996262 CCDC132 KDM1A SF3B2 THAP11 PPP2R3C STX6 RAB9A 
13.464653568121086 WIPF2 EIF4A3 ACTB ACTG1 ZFAND5 TXLNG HNRNPDL 
13.456927709278041 MYO15B CCNC ZCCHC14 NUP62 DNAJA3 
13.444037441348483 SH3PXD2A FAM110A FASLG HNRNPH1 
13.437785550896827 MPHOSPH8 ZNF330 ILF3 TRRAP IMP3 ERH 
13.432047486363093 PRKAR1B CDC37 TUBB PRPSAP2 FAM43A EEF2K POLE4 
13.428297600536807 SHARPIN PSMD7 GSE1 IKBKB WDR11 RAB9A OTULIN 
13.421121466466964 CDK15 CDC37 RBM14 ANKRD22 ORM1 AFF4 
13.407415681291369 TTI1 TRRAP TELO2 MTOR HERC1 S1PR1 AGK 
13.405334299167512 MTMR2 MTMR12 SBF1 SRSF5 COX7A2 XPNPEP1 
13.403087901237766 ZMYND19 GSE1 PIGT GID8 TUBB GORASP2 MLLT6 
13.391244179507627 JADE2 JADE1 ING4 CSNK2A2 GORASP2 PIK3R4 PACSIN1 
13.374122882590308 MIS18A DYNLT1 TXLNA TXLNG RMDN3 IFT20 AIMP2 
13.365715297242449 TYMP PIGT CCNC HNRNPH1 EEF2 GAPDH GOT1 
13.360718218774418 CTDSP1 FAM110A HNRNPH1 HMGCL 
13.358229961689704 SLC18A2 DNAAF5 TELO2 TMEM109 JAK1 OTULIN 
13.357981301769028 NCAPD3 RPL31 RPS24 PAXIP1 CALM1 BRD7 CD74 
13.35630284483614 PBX2 WDR4 AFF4 CREB1 HNRNPH1 STK25 
13.353031096243674 PBXIP1 ZMYND8 XYLT2 CCNC GHITM RMDN3 STX6 
13.348757149020487 RNF170 FBXO21 TMEM109 S100A9 TM9SF4 
13.345235734581498 SEPT8 JADE1 SEPT9 PUS1 CALM1 
13.33819730113847 HIST1H2BN CD81 EWSR1 HNRNPD ING4 TMA7 
13.337093616838812 TYMS ATIC TXN DCK WDR45B IGBP1 GOT1 
13.327583169015186 STK3 NRD1 TSPYL1 ORM2 ZC3H7A 
13.324058562454878 CEP290 TBC1D31 ECH1 CALM3 CALM1 DDX47 KLHL3 
13.317805848742136 CYCS HADH VDAC1 RBL2 HNRNPH1 SFSWAP 
13.297351255303914 PPP1R13B ZFP36 TXLNA NUP62 RASAL3 MLH1 
13.294333181493785 CCDC153 AIMP2 GORASP2 NUP62 DDIT3 CDR2 
13.283856007317798 UBE2V1 TXN CALM1 DAZAP2 ARIH2 RNF144A ECHS1 
13.282617980893415 CRX CCNC TFG ARIH2 PPP1R16B PSMB10 MLLT6 AASDHPPT 
13.271614009938997 KRTAP19-7 GEMIN4 GNE BHLHE40 DAZAP2 MIPEP 
13.263949834634003 PPP3R1 PDHB CDR2 TMCC3 CALM3 CALM1 
13.262703645254199 SOCS3 JAK1 ABL1 IL2RB SH2D2A TCEB2 
13.259067590988048 LLGL1 SMARCAL1 PTPRA USP11 BRD7 STX6 RAB9A 
13.25659015229679 DHFR CDC37 TBC1D9B HPS3 RAD17 HNRNPH1 
13.252482265608082 SCGN SNAP23 STX6 STX10 GNE CNOT11 
13.248991289368847 SLC1A1 EWSR1 HNRNPH1 JAK1 SPG7 AGPAT2 TM9SF4 
13.248113296095777 COL4A3BP PARP1 RANGAP1 RTN3 STX2 TAF15 IFT20 
13.241093402902848 HSPA14 GMFG SEPT9 PAXIP1 ATP5E 
13.240083163653946 CCT6B PAXIP1 WDR37 FBXW4 GNB1 HDAC1 
13.228444875590714 MAP3K4 ACTB CDC37 ACTG1 HNRNPD ZFP36 
13.216398320570889 SLAIN2 AKR7A2 HERC1 PFN1 SRPK1 AP2B1 
13.215651343412931 MAP7D3 ZC3H7A DYNLT1 RMDN3 CALM3 CALM1 
13.215389239964676 FAM9C CCNC HSBP1 BLOC1S4 BLOC1S1 STX6 
13.213196304257522 SPCS2 ILF3 HNRNPH1 TCTN3 CKAP4 TM9SF4 
13.200638477233342 XPNPEP1 DMGDH ATP6V1D WDR4 GORASP2 PUF60 
13.184059276239408 KANSL1 KDM1A DPY30 YEATS2 CTCF NUP62 CDR2 
13.176651081413965 HYPK TXLNA PCBP1 YEATS2 CALM1 C2ORF68 
13.17122202292975 GJB1 COMT CD37 HNRNPH1 LHFPL2 CALM1 
13.170005366190786 GPATCH2L KDM1A CSNK2A2 TSPYL1 S100A9 PLAGL2 
13.169484808803535 ARL8A ITGB1 GHITM ZSCAN21 SPECC1L TBC1D9B PUF60 
13.16597190748735 BRAT1 PSMD7 PARP1 MTOR GPR114 INTS9 
13.164523600068346 THRB ACTB UBE2I PARP1 NCOA6 HNRNPH1 
13.16251984874304 FAM103A1 DAZAP2 MAGED1 HNRNPH1 COPS5 BAG4 
13.161692169169015 RPAP3 CDC37 MAP4K1 TELO2 TM9SF4 KDM1A SBF1 ANAPC1 
13.158157055788442 LAMTOR3 RRAGD RAB9A UQCRQ PTRH2 PUF60 
13.156424827525322 NUS1 ACTB TCTN3 HLA-DPA1 HNRNPH1 GPR114 
13.149885514145168 CHRAC1 PARP1 XPC DCK RNF214 GOT1 
13.14716381117448 MARCH4 SARS S100A9 MARC1 PTPRA KIAA0195 
13.134130184306802 GPANK1 MAGED1 HNRNPH1 BAG4 PFDN5 AP2B1 
13.13142106532119 PTOV1 XYLT2 HNRNPH1 HDAC1 FEZ1 TCEB2 
13.117255451103132 TNKS ZMYND8 FNBP1 SH3BP5L GMDS 
13.113329452889042 TGS1 LSM10 NCOA6 SCRIB CRY1 SNRPB ZZEF1 
13.113201706727365 DOK4 AHCYL2 KDM1A GSE1 PFDN5 RBP7 
13.107951416852334 ZMIZ1 SETD4 ETS1 PIAS3 NFATC3 HNRNPD 
13.107464984593593 POLA1 HELB RPA1 PARP1 POLD2 POLE4 
13.104390783741206 ITGA5 CD81 EWSR1 ITGB1 SPECC1L MMP9 AP2B1 
13.096750788301103 SFSWAP IFI27L1 JADE1 H2AFJ EWSR1 DDX23 RNPS1 
13.086658428799515 STXBP1 SHMT1 ARHGAP17 CALM1 ECHS1 STX2 
13.085397863770039 CD9 ITGB1 CD81 HNRNPH1 CTCF GOT1 
13.085257214666772 HLA-A HLA-DPA1 STX6 RAB9A CLEC4D BUD31 
13.077978333835484 TRIM68 TFG ARHGAP17 TCTN3 BTN3A3 
13.075748556786317 TTC26 HK3 HERC1 FAM43A IFT20 
13.049909587928354 KCTD3 FAM110A MRI1 SLC9A3R1 ASPSCR1 COPS5 
13.046845438203533 LIMK2 CFL1 PGLYRP1 CDC37 EEF2 ATP5E 
13.046763940182355 RAB2B TTF2 FNTA PARP1 RAB9A 
13.03344498415573 SERINC1 ILF3 HNRNPH1 GPR114 IFT20 
13.029195669599432 SCAF4 SNRPB HNRNPA1 RPA1 RNPS1 GATA3 RRS1 
13.026763292070262 MAPKAP1 FAM110A STK38 MTOR 
13.024327958966104 CSPP1 TUBB PFDN5 RNF214 KIAA0355 SAMD4B 
13.01838079645147 PRKD2 TMA7 ANXA5 MDH2 HNRNPH1 TAF15 
13.0157735733304 KIF20B H2AFJ CRY1 PAFAH1B1 SF3B2 MXD3 
13.01101756437445 RFWD2 ETS1 COPS7B BTBD2 COPS5 TCEB2 
13.007214252958244 EMP1 MS4A7 CD37 HNRNPH1 ATP6V0E1 CLDN9 
13.005440618023538 PPP1R32 TFG PFDN5 FRS3 HNRNPH1 BAG4 
12.988936406052147 HABP4 JADE1 PPP1R16B PIAS3 UBE2I DDX24 H2AFJ 
12.978918475784493 RNF7 ACTB ARID1A TBC1D31 COPS7B TCEB2 ARIH2 
12.971137736057823 DARS2 FNTA PDHA1 DNAJA3 AIMP2 NCOA6 
12.966808167658233 SORBS3 IMP3 TXLNA HNRNPDL MICAL1 FASLG SIN3A 
12.966605312803864 USP32 DNAAF5 ANKHD1 RNF144A RAB9A TRIP12 
12.96451131317066 IVL JADE1 GOT1 PIGT CSTA GID8 HNRNPA1 
12.961051788435846 DKK2 AHCYL2 SIN3A S100A9 CENPB ZZEF1 
12.959552251138502 LTBP4 CRKL DYNLT1 NUFIP2 CRY1 E4F1 
12.956586808934262 HERC1 ZNF571 HMGN2 ARF1 ABL1 EDEM1 ECHS1 
12.954635932397242 RRAGA ACTB DYNLT1 RRAGD RAB9A EDEM1 
12.945627735256437 UBE2H JADE1 RNF216 GOT1 RNF144A RNF34 
12.9440695459552 PBX4 ZNF330 KDM1A HNRNPH1 IFT20 
12.93878407132058 AARS EIF4A3 EEF2K CLEC4D IARS 
12.935428040539033 DDX3Y CD81 PCBP1 RNF34 USP11 
12.934316738857081 TRIM44 TSPYL1 DDX24 TXLNA TXLNG TCEB2 
12.934102737694474 ALS2CR11 CDR2 AP2B1 CNOT11 DNAJA3 HNRNPD 
12.932418498968426 BCL10 IKBKB PRKCQ CARD11 TMEM43 SLC9A3R1 OTULIN 
12.924767511259562 PLN MS4A7 ATP2A2 ATP6V0E1 RTN3 STX2 CLEC2D 
12.9235721491963 PLEKHG6 ARF1 ACTB NUP214 ANXA1 HNRNPH1 
12.919752197400918 PI4KA CALM1 MAP4K1 RAB9A S100A9 C2ORF50 
12.916441192163163 C10ORF12 ZNF330 CBX3 EWSR1 NBEAL2 PHF19 
12.908988763675014 PPEF1 SLC25A3 ATP2A2 HERC1 DNAJA3 CALM1 
12.903730181520613 BMPR2 HNRNPR FRS3 C16ORF58 SMAD7 DYNLT1 
12.885906947284935 MYO9A MYL6B PFN1 CKB BRD7 SAMD3 
12.884937420498597 DHDH RNF214 SBF1 WDR59 TUBB 
12.873702188085337 SNAPC5 BLOC1S1 NUP62 DDIT3 
12.865712783640276 DDX60 DDX47 ERH MAGED1 CD74 EIF2S3 
12.853292077019905 KIAA1683 ANAPC1 PUF60 WDR59 CALM3 CALM1 
12.853099550283657 PLD2 ACTB GAPDH TUBB MTOR BAG4 ARF1 DYNLT1 
12.852934090492212 USP13 TXN ATP6V1D DAZAP2 FAF2 
12.830495263923973 RPS6KB1 ABL1 MTOR USP11 RCC2 EEF2K 
12.828456460380654 HNF1A ACTB KDM2B CBX3 PAXIP1 CALM3 
12.826934446399154 PPP4R1 HNRNPR HDAC1 SF3B2 IKBKB MED24 
12.824445031052672 MYD88 LRRFIP1 TLR5 TXN IRAK2 
12.823421172458008 SIPA1L1 ACTB FAM110A USP11 SAMD4B 
12.82186380945007 IL13RA2 DNAAF5 DSCR3 BAG4 JAK1 SLC25A5 
12.820345143346154 PTPN13 PTPRA FASLG DFFB PARP1 STX6 
12.816851565002347 RAB6A CLEC4D RPA1 ARF1 RAB9A CCDC64 
12.814317302079758 ARHGAP12 ACTB ACTG1 PHACTR4 ASAP1 SRPK1 
12.81099531415499 COX17 COMT CFL1 HADH TXN HMGCL 
12.80892213645707 APRT TUBB CLEC4D PSMB10 MLH1 NUDT9 ADA 
12.790211875026548 MCCC1 ECH1 DYNLT1 DYNC1H1 MIPEP ARF1 NAA38 PFDN5 
12.789876745089192 RHBDD2 COMT EDEM1 SLC25A3 TCTN3 DYNC1H1 
12.77675866043283 ANKRD26 TBC1D31 SPECC1L PFN1 ARID1A STX6 RAB9A 
12.770199511572732 RBM14-RBM4 FAF2 WDR4 H2AFJ WDR37 ZSCAN21 NOL9 
12.764765140446867 HOXA5 AHCYL2 HLA-DPA1 DDIT3 NUFIP2 ANKHD1 S100P 
12.754224269979586 CMSS1 RPL31 RPL26 SRSF5 PACSIN1 S1PR1 
12.748445344565313 HAL DOCK10 VDAC1 UBAC2 GOT1 PIGT PFDN5 HNRNPA1 
12.740096980426063 CRELD1 GPR114 RMDN3 NUFIP2 NDUFB3 SPG7 
12.735076839334619 SERPINA12 CCNC TCTN3 SEC22C DYNC1H1 FOXJ3 
12.728647880300256 MARCH1 JAK1 VPS9D1 STX10 STX6 PTPRA 
12.727834860530066 HSF2 NUP62 ANXA1 UBE2I BAG4 DDIT3 
12.726983515534382 TRAPPC9 IKBKB TFG GATA2 STX6 NFATC3 
12.724901894157206 GRB10 ABL1 WDR59 NOA1 DDX24 USP11 
12.724426265432443 CHPT1 ITGB1 TCTN3 ALG9 TMEM5 CLEC2D 
12.719092908622864 USP2 CRACR2B GEMIN4 CARD11 SMAD7 CRY1 
12.718395161851076 COG4 FAM110A HDAC1 RAB9A 
12.708187742157715 EPB41 RNPS1 C2CD2 EEF2K ECHS1 STX6 DYNC1H1 
12.704714419736337 GSC2 ANXA5 CALM1 ORAOV1 GTF2H5 TSR2 CALM3 
12.7011091195529 FAM90A1 CCNC CDR2 PFDN5 PLAGL2 MLH1 ZNF212 
12.697441082988284 PIP5K1A AHCYL2 PARP1 CENPB HNRNPH1 
12.691140626850947 DCAF13 PAXIP1 RPA1 DDX47 H2AFJ NUFIP2 
12.688499355686796 GDE1 DAZAP2 TMEM109 HNRNPA1 UBAC2 XPNPEP1 
12.686206415923946 ND4 CTCF GHITM NDUFB3 NDUFA1 ATP5J 
12.682908113727049 ZACN ALG9 HNRNPH1 SLC41A3 LRRC8C EDEM1 
12.680036932203036 SH3GL2 ACTG1 ERH PAXIP1 LCP2 CALM1 
12.678600229641985 FAM96B PAXIP1 SON MAP4K1 PAFAH1B1 ELP3 
12.674693113804508 SLC12A4 ILF3 HNRNPH1 TMEM43 STX6 
12.673664676371178 TRIML2 NUP62 CCDC92 EIF2B1 CDR2 
12.671467107829773 PDAP1 KLHL36 HNRNPH1 U2AF2 CD74 
12.669975911174232 TSTA3 MAT2B GOT1 ANXA5 DSTN HNRNPH1 
12.666857891707162 FKBP9 APEH COX6A1 COX7C ACTB XPNPEP1 
12.665860155802582 SLC25A15 HLA-DPA1 PARP1 DNAJA3 CLEC2D 
12.656567185543286 AAK1 ACTB DYNC1H1 PSMD7 DYNLT1 AP2B1 GAK PFN1 
12.65642255762714 UQCC1 SBF1 TMEM131 MRPS18C CDR2 PDHA1 ECHDC2 
12.65629440309535 PKP4 SCRIB STX6 PAXIP1 HNRNPH1 
12.645129520941639 ZNF460 ZNF330 AHCYL2 MRPS18C NOA1 CENPB AP3D1 
12.644129103560928 SPAG5 TSPYL1 PFN1 COPS7B IFT20 
12.642632034060462 LPAR1 XYLT2 PIGT CKAP4 TM9SF4 USP11 NDUFB3 
12.635815251543947 ISLR TCTN3 CRELD2 SLC25A5 SPG7 MTOR 
12.628541127803826 RPAP2 PHRF1 ACTB PFN1 TNFAIP6 MED24 
12.623554075986746 NT5C1A GDPD3 CDC37 S100A9 
12.62063534157844 GNA11 CD81 ACTB GNB1 DYNLT1 SLC9A3R1 
12.618230071219948 MAPK9 SHMT1 NFATC3 STK25 HDAC1 XPNPEP1 
12.617570886483609 ZKSCAN8 SCRIB MRPS18C NOA1 ZSCAN21 THAP11 CENPB 
12.613417030843538 VCPIP1 ACTB AP2B1 PFN1 AP5B1 ASPSCR1 FAF2 
12.613334695584571 ZNF800 HDAC1 ADARB1 SRSF5 FBXO21 TM9SF4 
12.611438562677618 STON2 ACTB AP2B1 RNPS1 GAK ABL1 TOR1A 
12.609903236700042 MKS1 ECH1 RNF34 DDX24 KIAA0355 SAMD4B 
12.609795980454468 TCHP ACTB ZFP36 KIAA0355 CDR2 TUBB S100P 
12.604479713492816 RBBP8 TRRAP PAXIP1 UBE2I RBL2 ZNF217 
12.592559776499902 FAM195B DNMT1 ZFP36 ZC3H7A PPP1R16B 
12.591383524260609 SCYL1 ACTB SMARCA4 ARID1A BTBD2 ARF1 CKAP4 DDX23 
12.58717338010275 TNFRSF1A HNRNPD JAK1 UBE2I BAG4 IKBKB FASLG 
12.580664715623618 TARBP2 ADARB1 ILF3 CAPRIN1 SRP54 MAP4K1 TGIF2 
12.57379066574236 LCP1 ACTB U2AF2 CNDP2 AIF1 GOT1 
12.570138223956597 TMEM55B FAF2 RAB9A TBC1D9B DFFB NFATC3 
12.562996964214928 AP3S1 CD81 BMX AP3D1 DUSP5 SPG7 AGK 
12.560524205460064 LSG1 CLEC4D CKAP4 PTRH2 RAB9A PYHIN1 HMGB2 
12.553869938314373 ZNF592 ZNF330 ZMYND8 CSNK2A2 HDAC1 SCRIB 
12.553081935183153 UBAC1 PSMD7 HMGN2 RNASE3 VPS25 RNF216 DAZAP2 
12.552451473258257 C12ORF5 PIGT ANXA1 GSTO1 HMGCL PCBP1 
12.548327648987696 SLC2A8 C2CD2 AGPAT2 CLEC2D AP2B1 IARS 
12.53804980163159 ATP2A3 GHITM RMDN3 S1PR1 CLEC2D ATP2A2 SPG7 
12.529918255732493 SEC22A CLEC10A RNF144A GORASP2 STX2 CD74 CLEC2D 
12.525940122846919 TSPYL6 MYL6 DDX24 CENPB WDR59 
12.51773125485418 NDUFA3 HLA-DPA1 NDUFB3 NDUFA1 INTS9 
12.51578089830613 TPCN2 HK3 STX6 AP3D1 WDR11 PTPRA 
12.514743611966564 MINA C1ORF174 FNTA DDX54 ILF3 RNPS1 
12.509899874891586 SPEN CCNC BAG4 CBX3 HDAC1 PYHIN1 DYNLT1 
12.499346134667679 NCF1 NCF4 FASLG ACTB RCC2 EEF2 
12.496025391401774 CNN2 CD81 ACTB U2AF2 ANXA1 MAT2B 
12.49596597572004 FOXO3 TRRAP IKBKB HNRNPH1 RAD17 
12.491226176417404 TMEM87A ATP5J STX6 RAB9A TMEM131 CLEC4D 
12.485730559703105 OBFC1 CDC37 SPECC1L MED29 MED24 BRD7 
12.482763634260044 PALLD ACTB DSTN PFN1 SARS WDR4 IGBP1 
12.481136837564845 HIC1 DNMT1 PIAS3 UBE2I BCL11B ARID1A PHF19 
12.474570293288217 SSH3 ZNF330 HERC1 ASAP1 RPA1 
12.473312988369914 TMEM14B STX6 UBIAD1 COMT SRSF5 SFSWAP 
12.46851020946378 EXOSC6 ZFP36 MIF4GD GSE1 ARIH2 
12.463193349393734 CENPK TBC1D31 SPECC1L PAFAH1B1 LEO1 COPS5 
12.45830242512749 SH3BP4 FAM110A PARP1 
12.45049650858817 CCR4 CNTNAP1 ATP2A2 GHITM NUP62 CLASP2 
12.44981455040277 RERE ZMYND8 HDAC1 DPY30 YEATS2 ALG13 
12.433511011713414 MAGEA11 DOCK10 ILF3 EWSR1 MXD3 TRIM51 
12.432649061412166 DRD2 GHITM SLC22A17 RTN3 CALM1 CALM3 
12.423309769548975 NRBP1 TTF2 COPS5 TCEB2 DYNLT1 C2ORF50 
12.421925887481848 MGMT ACTB ANXA1 CSNK2A2 DDX24 PRKCH 
12.415612251227067 MAPRE3 PIK3R4 TTF2 TMEM131 STIM2 SPECC1L CLASP2 
12.41291968955834 NDUFB2 NDUFA1 MTOR MED24 HNRNPH1 
12.408369891568181 MS4A4A AGPAT2 C16ORF58 DNAAF5 CMTM7 ATP6V0E1 
12.407344019827867 TUBA3C CD81 FEZ1 AP4B1 FAF2 DYNLT1 
12.404306998260306 TM4SF5 STX10 STX6 TMEM43 PTRH2 CYSTM1 
12.397162827527717 PRSS8 IL27RA NDUFB3 AGK IARS 
12.390609640362428 PPARD KDM1A HDAC1 TTF2 TELO2 RAD17 
12.38887783516027 TRIM39 KDM1A CCNC MOAP1 CBX3 UBL5 
12.386457352604717 NOS1AP SCRIB RNPS1 AP2B1 STX6 PLAGL2 
12.381746816361211 WDR73 DAZAP2 ANXA7 ARID1A INTS9 
12.38076494716938 ARHGEF7 SCRIB CALM1 ATPIF1 MYL6B IGBP1 
12.378400783097414 ABCB10 TCTN3 HLA-DPA1 PDHA1 DNAJA3 
12.375722651846571 MPO S100A8 IRAK2 SRPK1 
12.371429326997916 SACM1L RTN3 COPS5 PARP1 DDOST TM9SF4 S1PR1 
12.37096064853776 MOSPD2 CLEC4D GPR114 RMDN3 FASLG TCTN3 
12.367453301762332 CD44 CD74 PHRF1 GOT1 ITGB1 
12.361978858350152 GPRIN1 ACTB HNRNPD S1PR1 STX6 PTPRA 
12.360881360859432 PPP1R8 SARS HNRNPH1 U2AF2 SNRPB 
12.357507784676356 EGLN1 ING4 RUNX3 PRPSAP1 RMDN3 
12.350049781616514 FBXW8 MYL6 CALM1 MAP4K1 MMP9 PFDN5 
12.345024404752504 VPS26A APEH VPS25 CRLF3 IFT20 EIF2S3 
12.338932731569418 CDKN1B SLC9A3R1 ABL1 UBE2I COPS5 ORM2 IRF1 


================================================
FILE: inst/rmd/conversion_table.Rmd
================================================
---
title: '`r logo_path <- system.file("extdata", "logo.png", package = "pathfindR"); knitr::opts_chunk$set(out.width="15%"); knitr::include_graphics(logo_path)` pathfindR - Converted Genes and Genes without Interactions' 
date: "`r format(Sys.time(), '%d %B, %Y')`"
output: html_document
params:
  df: ""
  original_df: ""
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = FALSE)
```

## Table of Converted Gene Symbols
```{r converted_tbl, table1, comment=NA}
genes_df <- params$df
colnames(genes_df) <- c("Old Symbol", "Converted Symbol", "Change", "p-value")
genes_df <- genes_df[genes_df[, 1] != genes_df[, 2], ]
knitr::kable(genes_df, align = "c", table.caption.prefix ="")
```

## Table of Genes without Interactions (not found in the PIN)
```{r gene_wo_interaction, table2, comment=NA}
org_df <- params$original_df
missing_df <- org_df[!org_df[, 1] %in% params$df[, 1], ]
knitr::kable(missing_df, align = "c", table.caption.prefix ="")
```


================================================
FILE: inst/rmd/enriched_terms.Rmd
================================================
---
title: '`r logo_path <- system.file("extdata", "logo.png", package = "pathfindR"); knitr::opts_chunk$set(out.width="15%"); knitr::include_graphics(logo_path)` pathfindR - All Enriched Terms' 
output: html_document
params:
  df: ""
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

```{r table, echo = F, comment=NA}
result_df <- params$df
result_df$lowest_p <- format(result_df$lowest_p, digits = 2)
result_df$highest_p <- format(result_df$highest_p, digits = 2)

create_link <- function(text, link)
  return(paste0("[", text, "]", "(", link, ")"))

knitr::kable(result_df, align = "c", table.caption.prefix ="")
```


================================================
FILE: inst/rmd/results.Rmd
================================================
---
title: '`r logo_path <- system.file("extdata", "logo.png", package = "pathfindR"); knitr::opts_chunk$set(out.width="15%"); knitr::include_graphics(logo_path)` pathfindR - Results'
date: "`r format(Sys.time(), '%d %B, %Y')`"
output: html_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

pathfindR-Enrichment results are presented below:

## [All terms found to be enriched](./enriched_terms.html)
A table that lists all terms found to be enriched as well as lists of up- or down-regulated genes for each term. If it was requested, the term descriptions are linked to the visualizations of these terms, where affected color genes are colored by change values (if provided).

## [Tables of genes with converted gene symbols and genes without interactions](./conversion_table.html)
- A table listing the genes whose symbols (Old Symbol) were converted to aliases (Converted Symbol) that were in the protein-protein interaction network.
- A table listing the input genes for which no interactions in the PIN were found (after the aliases were also checked).


================================================
FILE: java/ActiveSubnetworkSearchAlgorithms/ActiveSubnetworkSearch.java
================================================
package ActiveSubnetworkSearchAlgorithms;

import ActiveSubnetworkSearchMisc.ScoreCalculations;
import ActiveSubnetworkSearchMisc.Subnetwork;
import Application.AppActiveSubnetworkSearch;
import Application.Parameters;
import File.ExperimentFileReader;
import File.SIFReader;
import Network.Network;
import Network.Node;
import java.io.BufferedWriter;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.logging.Level;
import java.util.logging.Logger;

/**
 *
 * @author Ozan Ozisik
 */
public class ActiveSubnetworkSearch {
    
    /**
     * scoreCalculations and network are used in other classes
     */
    public static ScoreCalculations scoreCalculations;
    public static Network network;
    public static ArrayList<Node> networkNodeList;
    
    public static void activeSubnetworkSearch(){
        
        network=SIFReader.readSIF(Parameters.sifPath);
        if(network==null){
            Logger.getLogger(AppActiveSubnetworkSearch.class.getName()).log(Level.SEVERE, "SIF file could not be loaded");
            System.exit(0);
        }
        networkNodeList=network.getNodeList();
        
        ArrayList<SimpleEntry<String, Double>> namePValuePairList=ExperimentFileReader.readExperimentFile(Parameters.experimentFilePath);
        if(namePValuePairList==null){
            Logger.getLogger(AppActiveSubnetworkSearch.class.getName()).log(Level.SEVERE, "Experiment file could not be loaded");
            System.exit(0);
        }
        
        scoreCalculations=new ScoreCalculations(namePValuePairList);
        
        ArrayList<Subnetwork> subnetworkList;
        
        if(Parameters.useSAorGAorGR==Parameters.SearchMethod.GA){
            GeneticAlgorithm geneticAlgorithm=new GeneticAlgorithm();
            subnetworkList=geneticAlgorithm.geneticAlgorithm();
        }else if(Parameters.useSAorGAorGR==Parameters.SearchMethod.SA){
            SimulatedAnnealing simulatedAnnealing=new SimulatedAnnealing();
            subnetworkList=simulatedAnnealing.simulatedAnnealing();
        }else{
            GreedySearch greedySearch=new GreedySearch();
            subnetworkList=greedySearch.greedySearch();
        }
        
        try {
            BufferedWriter bw=new BufferedWriter(new FileWriter(Parameters.resultFilePath));
            for(Subnetwork subnetwork:subnetworkList){
                if(subnetwork.getScore()>0){
                    bw.write(subnetwork.getScore()+" ");
//                    bw.write(subnetwork.numberOfNodes()+" ");
                    for(Node node:subnetwork.getNodeList()){
                        bw.write(node.getName()+" ");
                    }
                    bw.newLine();
                }
            }
            bw.close();
        } catch (FileNotFoundException ex) {
            Logger.getLogger(ActiveSubnetworkSearch.class.getName()).log(Level.SEVERE, null, ex);
        } catch (IOException ex) {
            Logger.getLogger(ActiveSubnetworkSearch.class.getName()).log(Level.SEVERE, null, ex);
        }
        
    }
    
}


================================================
FILE: java/ActiveSubnetworkSearchAlgorithms/GAIndividual.java
================================================
package ActiveSubnetworkSearchAlgorithms;

import ActiveSubnetworkSearchMisc.Subnetwork;
import Network.Node;
import Network.SubnetworkFinder;
import java.text.DecimalFormat;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.Iterator;

/**
 *
 * @author Ozan Ozisik
 */
public class GAIndividual implements Comparable<Object>{
    
    private ArrayList<Boolean> representationBoolean;
    private ArrayList<Node> networkNodeList;
    private HashSet<Node> nodesOnSet;
    private ArrayList<Subnetwork> subnetworkList;
    
    public GAIndividual(HashSet<Node> nodesOnSet){
        this.nodesOnSet=nodesOnSet;
        this.networkNodeList=ActiveSubnetworkSearch.networkNodeList;
        representationBoolean=new ArrayList<>();
        for(Node node:networkNodeList){
            if(nodesOnSet.contains(node)){
                representationBoolean.add(Boolean.TRUE);
            }else{
                representationBoolean.add(Boolean.FALSE);
            }
        }
        subnetworkList=(new SubnetworkFinder()).findSubnetworksDFS(nodesOnSet);
        Collections.sort(subnetworkList,Collections.reverseOrder());
    }
    
    public GAIndividual(ArrayList<Boolean> representationBoolean){
        this.representationBoolean=representationBoolean;
        this.networkNodeList=ActiveSubnetworkSearch.networkNodeList;
        nodesOnSet=new HashSet<>();
        for(int i=0;i<representationBoolean.size();i++){
            if(representationBoolean.get(i)){
                nodesOnSet.add(networkNodeList.get(i));
            }
        }
        subnetworkList=(new SubnetworkFinder()).findSubnetworksDFS(nodesOnSet);
        Collections.sort(subnetworkList,Collections.reverseOrder());
    }

//    @Override
//    public int compareTo(Object o) {
//        int result=0; 
//        result=(int)Math.signum(this.getScore()-((GAIndividual)o).getScore());
//        return result;
//    }

    @Override
    public int compareTo(Object o) {
        int result=0; 
        
        boolean decision = false;

        Iterator<Subnetwork> ownIt = this.subnetworkList.iterator();
        Iterator<Subnetwork> otherIt = ((GAIndividual)o).getSubnetworkList().iterator();

        while (!decision && (ownIt.hasNext() && otherIt.hasNext())) {
            Subnetwork subnetworkOwn=ownIt.next();
            Subnetwork subnetworkOther=otherIt.next();
            if (subnetworkOwn.getScore() > subnetworkOther.getScore()) {
                result=1;
                decision = true;
            }else if (subnetworkOwn.getScore() < subnetworkOther.getScore()) {
                result=-1;
                decision = true;
            }
        }

        if(!decision){
            //Gives advantage to the individual with more subnetworks
            //when one individual's subnetwork set is a subset of other's
            if(ownIt.hasNext()){
                result=1;
            }else if(otherIt.hasNext()){
                result=-1;
            }
        }
        
        return result;
    }

    public ArrayList<Boolean> getRepresentationBoolean() {
        return representationBoolean;
    }

    public ArrayList<Node> getNetworkNodeList() {
        return networkNodeList;
    }

    public HashSet<Node> getNodesOnSet() {
        return nodesOnSet;
    }

    public ArrayList<Subnetwork> getSubnetworkList() {
        return subnetworkList;
    }
    
    public Subnetwork getHighestScoringSubnetwork(){
        return subnetworkList.get(0);
    }
    
    
    /**
     * @return score of highest scoring subnetwork in the individual
     */
    public double getScore(){
        if(subnetworkList.isEmpty()){
            return 0;
        }else{
            return subnetworkList.get(0).getScore();
        }
    }
    
    public String toString(){
        String str="";
        for(Subnetwork subnetwork:subnetworkList){
            if(subnetwork.numberOfNodes()>1){
                str+=subnetwork.numberOfNodes()+" ";
                str+=(new DecimalFormat("###.##")).format(subnetwork.getScore())+", ";
            }
        }
        return str;
    }
}


================================================
FILE: java/ActiveSubnetworkSearchAlgorithms/GeneticAlgorithm.java
================================================
package ActiveSubnetworkSearchAlgorithms;

import ActiveSubnetworkSearchMisc.ScoreCalculations;
import ActiveSubnetworkSearchMisc.Subnetwork;
import Application.Parameters;
import Network.*;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Random;
import java.util.concurrent.ThreadLocalRandom;
import java.util.logging.Level;
import java.util.logging.Logger;


/**
 *
 * @author Ozan Ozisik
 */

enum SelectionType {RANKSELECTION, ROULETTEWHEEL}; 
enum CrossoverType {SINGLEPOINT, MULTIPOINT, UNIFORM};

public class GeneticAlgorithm {
    
    ScoreCalculations scoreCalculations;
    ArrayList<Node> networkNodeList;
    Random random;
    int populationSize;
    
     
    public ArrayList<Subnetwork> geneticAlgorithm(){
        scoreCalculations=ActiveSubnetworkSearch.scoreCalculations;
        networkNodeList = ActiveSubnetworkSearch.networkNodeList;
        populationSize=Parameters.ga_populationSize;
        ArrayList<GAIndividual> population=new ArrayList<>();
        random=new Random(Parameters.seedForRandom);
        
        initializePopulation(population, populationSize);
        
        printSituation(population);
        
        int addRandomIndividualCount=0;

        boolean running=true;
        int iter=0;
        GAIndividual lastBestIndividual=population.get(0);
        int lastBestRepeatNumber=0;
        while(running){
            
            /**
             * New population created
             */
            ArrayList<GAIndividual> newPopulation=createNewPopulation(population
                    , SelectionType.RANKSELECTION, CrossoverType.UNIFORM);
            
            /**
             * After each 10 steps 10% of the population (worst scoring part)
             * is replaced with random individuals
             */
            if(addRandomIndividualCount==10){
                for(int i=1;i<=(int)Parameters.ga_populationSize*0.1;i++){
                    newPopulation.set(newPopulation.size()-i, createRandomGAIndividual());
                }
                addRandomIndividualCount=0;
            }
            
            
            /**
             * Best scoring individuals are checked to prevent score decrement
             * in the next population. 
             * There is a possibility that this may overwrite one of
             * the randomly added individuals above. It is not a big deal.
            */
            if(Parameters.ga_Elitism){
                if(newPopulation.get(0).compareTo(population.get(0))<0){
                    newPopulation.set(newPopulation.size()-1, population.get(0));
                }
            }
            
            Collections.sort(newPopulation,Collections.reverseOrder());
            
            population=newPopulation;
            
            System.out.println("New Population, iter="+iter);
            
            printSituation(population);
            
            //TODO: Be careful about GAIndividuals with 0 subnetworks, you may 
            //consider adding empty subnetwork with score 0 in subnetwork finder
            //when no subnetwork found
            
            addRandomIndividualCount++;
            
            iter++;
            
            if(population.get(0).compareTo(lastBestIndividual)==0){
                lastBestRepeatNumber++;
            }else{
                lastBestIndividual=population.get(0);
                lastBestRepeatNumber=0;
            }
            
            if(lastBestRepeatNumber>=50){
                running=false;
                System.out.println("The score did not improve in 50 steps");
            }
            if(iter>=Parameters.ga_totalIterations){
                running=false;
            }
            
        }
        
        return population.get(0).getSubnetworkList();

    }
    
    private void printSituation(ArrayList<GAIndividual> population){
        for (int i = 0; i < 10; i++) {
            System.out.println(population.get(i).toString());
        }
    }
    
    
    private void initializePopulation(ArrayList<GAIndividual> population, int populationSize){
        for (int i = 0; i < populationSize; i++) {
            population.add(createRandomGAIndividual());
        }

        /**
         * Creates a GAIndividual that contains all the genes that have 
         * positive scores
         */ 
        if(Parameters.startWithAllPositiveZScoreNodes){
            ArrayList<Boolean> individualPositiveZ=new ArrayList<>();
            for(int i=0;i<networkNodeList.size();i++){
                Node node=networkNodeList.get(i);
                individualPositiveZ.add(scoreCalculations.getZScore(node) > 0);
            }
            population.set(populationSize-1, new GAIndividual(individualPositiveZ));
        }
        
        Collections.sort(population,Collections.reverseOrder());
    }
    
    private GAIndividual createRandomGAIndividual(){
        ArrayList<Boolean> individual = new ArrayList<>();
        for (int i = 0; i < networkNodeList.size(); i++) {
            individual.add(random.nextDouble()<Parameters.geneInitialAdditionProbability);
        }
        return new GAIndividual(individual);
    }
    
    private ArrayList<GAIndividual> createNewPopulation(ArrayList<GAIndividual> population, SelectionType selectionType, CrossoverType crossoverType){
        ArrayList<GAIndividual> newPopulation=new ArrayList<>();
        
        long start=System.nanoTime();
        ArrayList<Thread> threads=new ArrayList();
        for(int i=0;i<Parameters.ga_threadNumber;i++){
            Thread thread=new Thread(new NewPopulationFactory(population, 
                    newPopulation, selectionType, crossoverType));
            threads.add(thread);
            thread.start();
        }
        
        for(Thread thread:threads){
            try {
                thread.join();
            } catch (InterruptedException ex) {
                Logger.getLogger(GeneticAlgorithm.class.getName()).log(Level.SEVERE, null, ex);
            }
        }
        long stop=System.nanoTime();
        System.out.println("Time:"+(stop-start)/1000000);

        
        Collections.sort(newPopulation,Collections.reverseOrder());
        return newPopulation;
    }
    
}

class NewPopulationFactory implements Runnable{

    ArrayList<GAIndividual> newPopulation;
    ArrayList<GAIndividual> population;
    SelectionType selectionType;
    CrossoverType crossoverType;
    
    public NewPopulationFactory(ArrayList<GAIndividual> population, ArrayList<GAIndividual> newPopulation, SelectionType selectionType, CrossoverType crossoverType){
        this.population=population;
        this.newPopulation=newPopulation;
        this.selectionType=selectionType;
        this.crossoverType=crossoverType;
    }
    
    @Override
    public void run() {
        while(newPopulation.size()<population.size()){
            GAIndividual[] parents = selection(population);
            GAIndividual[] offsprings = crossoverAndMutation(parents[0], parents[1]);

            synchronized(newPopulation){
                if(newPopulation.size()<population.size()){
                    for(int i=0;i<offsprings.length;i++){
                        newPopulation.add(offsprings[i]);
                    }
                }
            }
        }
    }
    
    private GAIndividual[] selection(ArrayList<GAIndividual> population) {
        double totalWeight = 0;
        double weights[] = new double[population.size()];

        if (selectionType == SelectionType.RANKSELECTION) {
            for (int i = 0; i < population.size(); i++) {
                weights[i] = population.size()-i;
                totalWeight = totalWeight + weights[i];
            }
        } else {//SelectionType.ROULETTEWHEEL, individuals who have the same 
            //highest scoring subnetwork will have the same weight
            for (int i = 0; i < population.size(); i++) {
                weights[i] = population.get(i).getScore();
                totalWeight = totalWeight + weights[i];
            }
        }

        GAIndividual[] parents=new GAIndividual[2];
        for (int i = 0; i < 2; i++) {
            int randomIndex = -1;
            double rand = ThreadLocalRandom.current().nextDouble() * totalWeight;
            int rr = 0;
            while ((rr < population.size()) && (randomIndex == -1)) {
                rand = rand - weights[rr];
                if (rand <= 0.0) {
                    randomIndex = rr;
                }
                rr++;
            }
            parents[i]=population.get(randomIndex);
        }
        
        return parents;
    }
    
    private GAIndividual[] crossoverAndMutation(GAIndividual parent1, GAIndividual parent2){
        ArrayList<Boolean> parent1Boolean=parent1.getRepresentationBoolean();
        ArrayList<Boolean> parent2Boolean=parent2.getRepresentationBoolean();
        ArrayList<Boolean> child1Boolean=new ArrayList<>();
        ArrayList<Boolean> child2Boolean=new ArrayList<>();
        
        /**
         * Crossover
         */
        if(ThreadLocalRandom.current().nextDouble()<Parameters.ga_crossoverRate){
            
            if(crossoverType==CrossoverType.SINGLEPOINT){
                int crossoverPoint=ThreadLocalRandom.current().nextInt(parent1Boolean.size());
                for(int i=0;i<crossoverPoint;i++){
                    child1Boolean.add(parent1Boolean.get(i));
                    child2Boolean.add(parent2Boolean.get(i));
                }
                for(int i=crossoverPoint;i<parent1Boolean.size();i++){
                    child1Boolean.add(parent2Boolean.get(i));
                    child2Boolean.add(parent1Boolean.get(i));
                }
            }else if(crossoverType==CrossoverType.MULTIPOINT){
                int flag=0, count=0;
                for(int i=0;i<parent1Boolean.size();i++){
                    if(flag==0){
                        child1Boolean.add(parent1Boolean.get(i));
                        child2Boolean.add(parent2Boolean.get(i));
                    }else{
                        child1Boolean.add(parent2Boolean.get(i));
                        child2Boolean.add(parent1Boolean.get(i));
                    }
                    count++;
                    if(count==10){
                        count=0;
                        flag=(flag+1)%2;
                    }             
                }
            }else{//UNIFORM
                for(int i=0;i<parent1Boolean.size();i++){
                    if(ThreadLocalRandom.current().nextBoolean()){
                        child1Boolean.add(parent1Boolean.get(i));
                        child2Boolean.add(parent2Boolean.get(i));
                    }else{
                        child1Boolean.add(parent2Boolean.get(i));
                        child2Boolean.add(parent1Boolean.get(i));
                    }
                }
            }
        }
        
        /**
         * Mutation
         */ 
        if(Parameters.ga_mutationRate>0){
            for(int i=0;i<child1Boolean.size();i++){
                if(ThreadLocalRandom.current().nextDouble()<Parameters.ga_mutationRate){
                    child1Boolean.set(i, !child1Boolean.get(i));
                }
                if(ThreadLocalRandom.current().nextDouble()<Parameters.ga_mutationRate){
                    child2Boolean.set(i, !child2Boolean.get(i));
                }
            }
        }
        
        GAIndividual[] offsprings = new GAIndividual[2];
        offsprings[0] = new GAIndividual(child1Boolean);
        offsprings[1] = new GAIndividual(child2Boolean);
        
        return offsprings;
    }
    
}

================================================
FILE: java/ActiveSubnetworkSearchAlgorithms/GreedySearch.java
================================================
package ActiveSubnetworkSearchAlgorithms;

import ActiveSubnetworkSearchMisc.Subnetwork;
import Application.Parameters;
import Network.Network;
import Network.Node;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedList;

/**
 *
 * @author Ozan Ozisik adapted from https://github.com/idekerlab/jActiveModules
 *
 */
public class GreedySearch {

    int max_depth, search_depth;
    HashMap<Node, Subnetwork> node2BestComponent;
    /**
     * Track the best score generated from the current starting point
     */
    double bestScore;

    /**
     * Map from a node to the number of nodes which are dependent on this node
     * for connectivity into the graph
     */
    HashMap<Node, Integer> node2DependentCount;

    /**
     * Map from a node to it's predecessor in the search tree When we remove
     * this node, that predecessor may be optionally added to the list of
     * removable nodes, dependending if it has any other predecessors
     */
    HashMap<Node, Node> node2Predecessor;

    /**
     * Lets us know if we need to repeat the greedy search from a new starting
     * point
     */
    boolean greedyDone;
    /**
     * Determines which nodes are within max depth of the starting point
     */
    HashSet<Node> withinMaxDepth;
    ArrayList<Node> nodeList;
    Network graph;

    public ArrayList<Subnetwork> greedySearch() {
        this.max_depth = Parameters.gr_maxDepth;
        this.search_depth = Parameters.gr_searchDepth;

        node2BestComponent = new HashMap<Node, Subnetwork>();
        nodeList = ActiveSubnetworkSearch.networkNodeList;
        graph = ActiveSubnetworkSearch.network;

        int percent=0;
        for(int nodeNo=0;nodeNo<nodeList.size();nodeNo++){
            Node seed=nodeList.get(nodeNo);
            int newPercent=(100*nodeNo)/nodeList.size();
            if(newPercent>percent){
                percent=newPercent;
                System.out.println(percent+"% of seeds checked");
            }
            
            
            withinMaxDepth = new HashSet();
            
            /**
             * determine which nodes are within max-depth
             * of this starting node and add them to a hash set
             * so we can easily identify them
             * if the user doesn't wish to limit the maximum
             * depth, just add every node into the max depth
             * hash, thus all nodes are accepted as possible
             * additions
             */
            if (max_depth==0) {
                for(Node node:nodeList){
                    withinMaxDepth.add(node);
                }
            } else {
                /**
                 * recursively find the nodes within a max depth
                 */ 
                initializeMaxDepth(seed, max_depth);
            }
            
            // set the neighborhood of nodes to initially be only
            // the single node we are starting the search from
            ArrayList<Node> nodeListForSubnetwork=new ArrayList();
            nodeListForSubnetwork.add(seed);
            Subnetwork component = new Subnetwork(nodeListForSubnetwork);
            
            // make sure that the seed is never added to the list of removables
            node2DependentCount = new HashMap();
            node2Predecessor = new HashMap();
            node2DependentCount.put(seed, 1);
            // we don't need to make a predecessor entry for the seed,
            // since it should never be added to the list of removable nodes
            HashSet<Node> removableNodes = new HashSet();
            bestScore = Double.NEGATIVE_INFINITY;
            runGreedySearchRecursive(search_depth, component, seed, removableNodes);            
            runGreedyRemovalSearch(component, removableNodes);
            
            for(Node node:component.getNodeList()){
                Subnetwork oldBest = node2BestComponent.get(node);
                if (oldBest == null || oldBest.getScore() < component.getScore()) {
                    node2BestComponent.put(node, component);
                }
            }
            
        }
        System.out.println("100%");
        
        ArrayList<Subnetwork> subnetworkList=new ArrayList<Subnetwork>(node2BestComponent.values());
        Collections.sort(subnetworkList,Collections.reverseOrder());
        
        System.out.println("Filtering");
        
        System.out.println("Subnetwork number"+subnetworkList.size());
        for(int i=subnetworkList.size()-1;i>=0;i--){
            if(subnetworkList.get(i).getScore()<=0){
                subnetworkList.remove(i);
            }
        }
        System.out.println("Subnetwork number"+subnetworkList.size());
        for(int i=subnetworkList.size()-1;i>=0;i--){
            if(subnetworkList.get(i).numberOfNodes()<2){
                subnetworkList.remove(i);
            }
        }
        System.out.println("Subnetwork number"+subnetworkList.size());
        
        return filterSubnetworkList(subnetworkList);
        
    }
    
    /**
     * Takes sorted subnetworkList and filters subnetworks using overlap threshold
     * @param subnetworkList 
     */
    private ArrayList<Subnetwork> filterSubnetworkList(ArrayList<Subnetwork> subnetworkList){
        ArrayList<Subnetwork> filteredSubnetworkList=new ArrayList<>();
        ArrayList<Subnetwork> subnetworkListToBeDeleted=new ArrayList<>();
        
        int percent=0;
        int i=0;
        while(i<subnetworkList.size()-1 && filteredSubnetworkList.size()<Parameters.gr_subnetworkNum){
            int newPercent=(100*i)/subnetworkList.size();
            if(newPercent>percent){
                percent=newPercent;
                System.out.println(percent+"% of subnetworks checked, "+(filteredSubnetworkList.size()+1) + " filtered subnetworks in the list");
            }
            Subnetwork subnetwork1=subnetworkList.get(i);
            if (!subnetworkListToBeDeleted.contains(subnetwork1)) {
                filteredSubnetworkList.add(subnetwork1);
                for (int j = i + 1; j < subnetworkList.size(); j++) {
                    Subnetwork subnetwork2 = subnetworkList.get(j);
                    if (!subnetworkListToBeDeleted.contains(subnetwork2)) {
                        int common = 0;
                        for (Node node1 : subnetwork1.getNodeList()) {
                            if(subnetwork2.contains(node1)){
                                common++;
                            }
                        }
                        int size;
                        if(subnetwork1.numberOfNodes()<subnetwork2.numberOfNodes()){
                            size=subnetwork1.numberOfNodes();
                        }else{
                            size=subnetwork2.numberOfNodes();
                        }
                        double overlap=common/((double)(size));
                        if(overlap>Parameters.gr_overlapThreshold){
                            //subnetwork2 is added because it has lower score
                            //subnetworkList is sorted
                            subnetworkListToBeDeleted.add(subnetwork2);
                        }
                    }
                }
            }
            i++;
        }
        System.out.println("100%");
        //subnetworkList.removeAll(subnetworkListToBeDeleted);
        return filteredSubnetworkList;
    }
    
    /**
     * Recursively find the nodes within a max depth
     */
    private void initializeMaxDepth(Node current, int depth) {
        withinMaxDepth.add(current);
        if (depth > 0) {
            for (Node neighbor : graph.getNeighborSet(current)) {
                if (!withinMaxDepth.contains(neighbor)) {
                    initializeMaxDepth(neighbor, depth - 1);
                }
            }
        }
    }


    /**
     * Recursive greedy search function. Called from greedySearch() to a
     * recursive set of calls to greedily identify high scoring networks. The
     * idea for this search is that we make a recursive call for each addition
     * of a node from the neighborhood. At each stage we check to see if we have
     * found a higher scoring network, and if so, store it in one of the global
     * variables. You know how in the Wonder Twins, one of them turned into an
     * elephant and the other turned into a bucket of water? This function is
     * like the elephant.
     *
     * @param depth The remaining depth allowed for this greed search.
     * @param component The current component we are branching from.
     * @param lastAdded The last node added.
     * @param removableNodes Nodes that can be removed.
     */
    private boolean runGreedySearchRecursive(int depth, Subnetwork component,
            Node lastAdded, HashSet<Node> removableNodes) {
        boolean improved = false;
        // score this component, check and see if the global top scores should
        // be updated, if we have found a better score, then return true
        if (component.getScore() > bestScore) {
            depth = search_depth;
            improved = true;
            bestScore = component.getScore();
        }

        if (depth > 0) {
            // if depth > 0, otherwise we are out of depth and the recursive
            // calls will end
            // Get an iterator of nodes which are next to the
            boolean anyCallImproved = false;
            removableNodes.remove(lastAdded);
            int dependentCount = 0;
            for(Node newNeighbor:graph.getNeighborSet(lastAdded)){
                //this node is only a new neighbor if it is not currently
                // in the component.
                if (withinMaxDepth.contains(newNeighbor)
                        && !component.contains(newNeighbor)) {
                    component.addNode(newNeighbor);
                    removableNodes.add(newNeighbor);
                    boolean thisCallImproved = runGreedySearchRecursive(
                            depth - 1, component, newNeighbor, removableNodes);
                    if (!thisCallImproved) {
                        component.removeNode(newNeighbor);
                        removableNodes.remove(newNeighbor);
                    }else {
                        dependentCount += 1;
                        anyCallImproved = true;
                        node2Predecessor.put(newNeighbor, lastAdded);
                    }
                }
            }
            improved = improved | anyCallImproved;
            if (dependentCount > 0) {
                removableNodes.remove(lastAdded);
                node2DependentCount.put(lastAdded, dependentCount);
            }

        }
        return improved;
    }

    private void runGreedyRemovalSearch(Subnetwork component, HashSet removableNodes) {
        LinkedList list = new LinkedList(removableNodes);
        while (!list.isEmpty()) {
            Node current = (Node) list.removeFirst();
            component.removeNode(current);
            double score = component.getScore();
            if (score > bestScore) {
                bestScore = score;
                Node predecessor = (Node) node2Predecessor.get(current);
                int dependentCount = node2DependentCount.get(predecessor);
                dependentCount -= 1;
                if (dependentCount == 0) {
                    removableNodes.add(predecessor);
                }else {
                    node2DependentCount.put(predecessor, dependentCount);
                }
            }else {
                component.addNode(current);
            }
        }
    }
}


================================================
FILE: java/ActiveSubnetworkSearchAlgorithms/SimulatedAnnealing.java
================================================
package ActiveSubnetworkSearchAlgorithms;

import ActiveSubnetworkSearchMisc.Subnetwork;
import ActiveSubnetworkSearchMisc.ScoreCalculations;
import Application.Parameters;
import Network.*;
import java.text.DecimalFormat;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Random;

/**
 *
 * @author Ozan Ozisik Some code parts are
 * adapted from  https://github.com/idekerlab/jActiveModules
 *
 * Simulated Annealing for active subnetwork search.
 *
 * Notes: In code from idekerLab, all nodes are "on" in the beginning, which
 * cost lots of iterations to clean up. Here, if the related parameter is set, 
 * all the nodes with positive z-scores are on, others are off; else
 * they are set randomly. Keeping change was default behavior when
 * score was not improved in any subnetwork in the list and randomness did not
 * reject the change. I changed default to false, although this situation can
 * occur rarely.
 */
public class SimulatedAnnealing {

    /**
     * 
     * @param network
     * @param scoreCalculations 
     */
    public ArrayList<Subnetwork> simulatedAnnealing() {
        
        Random rand = new Random(Parameters.seedForRandom);
        
        
        Network network=ActiveSubnetworkSearch.network;
        ScoreCalculations scoreCalculations=ActiveSubnetworkSearch.scoreCalculations;
                
        ArrayList<Node> nodeList = ActiveSubnetworkSearch.networkNodeList;

        HashSet<Node> nodesOnSet = new HashSet<>();
        HashSet<Node> nodesOffSet = new HashSet<>(nodeList);

        if(Parameters.startWithAllPositiveZScoreNodes){
            for (Node node : nodesOffSet) {
                if (scoreCalculations.getZScore(node) > 0) {
                    nodesOnSet.add(node);
                }
            }
        }else{
            for (Node node : nodesOffSet) {
                if(rand.nextDouble()<Parameters.geneInitialAdditionProbability){
                    nodesOnSet.add(node);
                }
            }
        }
        nodesOffSet.removeAll(nodesOnSet);

        SubnetworkFinder subnetworkFinder = new SubnetworkFinder();

        ArrayList<Subnetwork> subnetworkList = subnetworkFinder.findSubnetworksDFS(nodesOnSet);

        //subnetworkList.sort((subnetwork1, subnetwork2) -> - (int)Math.signum(subnetwork1.getScore()-subnetwork2.getScore()));
        Collections.sort(subnetworkList, Collections.reverseOrder());
        for (int i = 0; i < subnetworkList.size(); i++) {
            if (subnetworkList.get(i).numberOfNodes() > 1) {
                System.out.print(subnetworkList.get(i).numberOfNodes() + " " + (new DecimalFormat("###.##")).format(subnetworkList.get(i).getScore()) + ", ");
            }
        }
        System.out.println("");
        
        
        double initialTemperature = Parameters.sa_initialTemperature;
        double finalTemperature = Parameters.sa_finalTemperature;
        int totalIterations = Parameters.sa_totalIterations;

        double T = initialTemperature;
        double temp_step = 1 - Math.pow((finalTemperature / initialTemperature), (1.0 / totalIterations));
        

        System.out.println("Percentage of finished job, node number and score of modules that have more than one node are as follows:");
        int percent=0;
        System.out.println("0%");
        //TODO: There should be another stop mechanism, not only iteration number
        for (int iteration = 0; iteration < totalIterations; iteration++) {
            int newPercent=(100*iteration)/totalIterations;
            if(newPercent>percent){
                percent=newPercent;
                System.out.println(percent+"% ");
                printSituation(subnetworkList);
            }
            
            
            Node node = nodeList.get(rand.nextInt(nodeList.size()));

            toggleNodeState(nodesOnSet, nodesOffSet, node);

            ArrayList<Subnetwork> newSubnetworkList = subnetworkFinder.findSubnetworksDFS(nodesOnSet);
            //newSubnetworkList.sort((subnetwork1, subnetwork2) -> -(int)Math.signum(subnetwork1.getScore()-subnetwork2.getScore()));
            Collections.sort(newSubnetworkList,Collections.reverseOrder());

            boolean decision = false;
            boolean keep = false;//was true in IdekerLab code

            Iterator<Subnetwork> oldIt = subnetworkList.iterator();
            Iterator<Subnetwork> newIt = newSubnetworkList.iterator();

            
            //Note: There is a higher chance of accepting a change
            while (!decision && (newIt.hasNext() && oldIt.hasNext())) {
                Subnetwork subnetworkOld=oldIt.next();
                Subnetwork subnetworkNew=newIt.next();
                double delta = subnetworkNew.getScore() - subnetworkOld.getScore();
                if (delta > .001) {
                    keep = true;
                    decision = true;
                }else if (rand.nextDouble() > Math.exp(delta / T)) {
                    keep = false;
                    decision = true;
                }
            }

            if (keep) {
                subnetworkList = newSubnetworkList;
            } else {
                toggleNodeState(nodesOnSet, nodesOffSet, node);
            }

            T = T * (1 - temp_step);
        }
        System.out.println("100%");
        printSituation(subnetworkList);
        
        return subnetworkList;
    }

    /**
     * Moves node from nodesOnSet to nodesOff set or vice versa.
     * @param nodesOnSet
     * @param nodesOffSet
     * @param node 
     */
    public void toggleNodeState(HashSet<Node> nodesOnSet, HashSet<Node> nodesOffSet, Node node) {
        if (nodesOnSet.contains(node)) {
            nodesOnSet.remove(node);
            nodesOffSet.add(node);
        } else {
            nodesOffSet.remove(node);
            nodesOnSet.add(node);
        }
    }
    
    public void printSituation(ArrayList<Subnetwork> subnetworkList){
        for (int i = 0; i < subnetworkList.size(); i++) {
            if (subnetworkList.get(i).numberOfNodes() > 1) {
                System.out.print(subnetworkList.get(i).numberOfNodes() + " " + (new DecimalFormat("###.##")).format(subnetworkList.get(i).getScore()) + ", ");
            }
        }
        System.out.println("");
    }
            
}


================================================
FILE: java/ActiveSubnetworkSearchMisc/Gaussian.java
================================================
package ActiveSubnetworkSearchMisc;

/******************************************************************************
 *
 *  https://introcs.cs.princeton.edu/java/22library/Gaussian.java.html
 * 
 *  Function to compute the Gaussian pdf (probability density function)
 *  and the Gaussian cdf (cumulative density function)
 *
 *  % java Gaussian 820 1019 209
 *  0.17050966869132111
 *
 *  % java Gaussian 1500 1019 209
 *  0.9893164837383883
 *
 *  % java Gaussian 1500 1025 231
 *  0.9801220907365489
 *
 *  The approximation is accurate to absolute error less than 8 * 10^(-16).
 *  Reference: Evaluating the Normal Distribution by George Marsaglia.
 *  http://www.jstatsoft.org/v11/a04/paper
 *
 ******************************************************************************/

public class Gaussian {

    // return pdf(x) = standard Gaussian pdf
    public static double pdf(double x) {
        return Math.exp(-x*x / 2) / Math.sqrt(2 * Math.PI);
    }

    // return pdf(x, mu, signma) = Gaussian pdf with mean mu and stddev sigma
    public static double pdf(double x, double mu, double sigma) {
        return pdf((x - mu) / sigma) / sigma;
    }

    // return cdf(z) = standard Gaussian cdf using Taylor approximation
    public static double cdf(double z) {
        if (z < -8.0) return 0.0;
        if (z >  8.0) return 1.0;
        double sum = 0.0, term = z;
        for (int i = 3; sum + term != sum; i += 2) {
            sum  = sum + term;
            term = term * z * z / i;
        }
        return 0.5 + sum * pdf(z);
    }

    // return cdf(z, mu, sigma) = Gaussian cdf with mean mu and stddev sigma
    public static double cdf(double z, double mu, double sigma) {
        return cdf((z - mu) / sigma);
    } 

    // Compute z such that cdf(z) = y via bisection search
    public static double inverseCDF(double y) {
        return inverseCDF(y, 0.00000001, -8, 8);
    } 

    // bisection search
    private static double inverseCDF(double y, double delta, double lo, double hi) {
        double mid = lo + (hi - lo) / 2;
        if (hi - lo < delta) return mid;
        if (cdf(mid) > y) return inverseCDF(y, delta, lo, mid);
        else              return inverseCDF(y, delta, mid, hi);
    }

}


================================================
FILE: java/ActiveSubnetworkSearchMisc/ScoreCalculations.java
================================================
package ActiveSubnetworkSearchMisc;

import Application.Parameters;
import ActiveSubnetworkSearchAlgorithms.ActiveSubnetworkSearch;
import Network.*;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.AbstractMap.SimpleEntry;
import java.util.Collections;
import java.util.Random;
import java.util.logging.Level;
import java.util.logging.Logger;

/**
 *
 * @author Ozan Ozisik
 * Some code parts from https://github.com/idekerlab/jActiveModules
 * 
 * Everything related to score calculation is in this class.
 * 
 * As Monte Carlo approach includes randomness, the score calibrated by this
 * approach will be different in each run. 
 * 
 */
public class ScoreCalculations {

    private HashMap<Node, Double> nodeToPValueMap;
    private HashMap<Node, Double> nodeToZScoreMap;
    private final ArrayList<Node> networkNodeList;
    private double[] samplingScoreMeans;
    private double[] samplingScoreStds;
    private double[] samplingScoreMins;
    private double[] samplingScoreMaxs;
    
    private double MIN_SIG = 0.0000000000001;
    private double MAX_SIG = 1 - MIN_SIG;

    public ScoreCalculations(ArrayList<SimpleEntry<String, Double>> namePValuePairList) {
        this.networkNodeList=ActiveSubnetworkSearch.networkNodeList;
        fillNodeToPValueMap(namePValuePairList);
        process();
    }

    private void fillNodeToPValueMap(ArrayList<SimpleEntry<String, Double>> namePValuePairList) {
        nodeToPValueMap = new HashMap<Node, Double>();
        int geneFromExperimentNotExisingInNetwork = 0;
        for (SimpleEntry<String, Double> entry : namePValuePairList) {
            Node node = new Node(entry.getKey());
            if (networkNodeList.contains(node)) {
                double pValue = entry.getValue();
                if(pValue<MIN_SIG){
                    pValue=MIN_SIG;
                }else if(pValue>MAX_SIG){
                    pValue=MAX_SIG;
                }
                double existingPValue=nodeToPValueMap.get(node) == null ? 1 : nodeToPValueMap.get(node);
                if (pValue < existingPValue) {
                    nodeToPValueMap.put(node, pValue);
                }
            } else {
                geneFromExperimentNotExisingInNetwork++;
            }
        }
        System.out.println(nodeToPValueMap);
        if(geneFromExperimentNotExisingInNetwork>0){
            Logger.getLogger(ScoreCalculations.class.getName()).log(Level.WARNING, "{0} genes in experiment file does not exist in the network", geneFromExperimentNotExisingInNetwork);
        }
        
        //Assign p-value to genes that do not exist in the experiment file.
        for (Node node : networkNodeList) {
            if (!nodeToPValueMap.containsKey(node)) {
                nodeToPValueMap.put(node, Parameters.pForNonSignificantNodes);
            }
        }
    }

    public void process() {
        boolean tmpPenaltyForSize=Parameters.penaltyForSize;
        Parameters.penaltyForSize=false;
        calculateZScores();
        calculateMeanAndStdForMonteCarlo();
        Parameters.penaltyForSize=tmpPenaltyForSize;
    }

    public Double getPValue(Node node) {
        return nodeToPValueMap.get(node);
    }
    
    public Double getZScore(Node node) {
        return nodeToZScoreMap.get(node);
    }

    private void calculateZScores() {
        nodeToZScoreMap=new HashMap<Node, Double>();
        for (Node node : networkNodeList) {
            double pValue = nodeToPValueMap.get(node);
            nodeToZScoreMap.put(node, ZStatistics.oneMinusNormalCDFInverse(pValue));
        }
    }
    
    
    private void calculateMeanAndStdForMonteCarlo() {
        int numberOfNodes = networkNodeList.size();
        samplingScoreMeans = new double[numberOfNodes+1];//0th position is not used
        samplingScoreStds = new double[numberOfNodes+1];//0th position is not used
        samplingScoreMins = new double[numberOfNodes+1];//0th position is not used
        samplingScoreMaxs = new double[numberOfNodes+1];//0th position is not used
        
        double[] samplingScoreSums=new double[numberOfNodes+1];//0th position is not used
        double[] samplingScoreSquareSums=new double[numberOfNodes+1];//0th position is not used
                
        for (int i = 0; i < numberOfNodes+1; i++) {
            samplingScoreSums[i] = 0;
            samplingScoreSquareSums[i] = 0;
            samplingScoreMins[i]=Double.MAX_VALUE;
            samplingScoreMaxs[i]=Double.MIN_VALUE;
        }
        int numberOfTrials=2000;
        
        ArrayList<Node> nodeListForSampling=new ArrayList<>(networkNodeList);
        ArrayList<Node> significantNodesList=new ArrayList<>();
        ArrayList<Node> nonsignificantNodesList=new ArrayList<>();
        for(Node node:networkNodeList){
            if(nodeToZScoreMap.get(node)>0){
                significantNodesList.add(node);
            }else{
                nonsignificantNodesList.add(node);
            }
        }
//        System.out.println(""+significantNodesList.size()+" "+nonsignificantNodesList.size());
        
        Random random=new Random(Parameters.seedForRandom);
        

        for (int trial = 0; trial < numberOfTrials; trial++) {
//            long start=System.nanoTime();

            Collections.shuffle(nodeListForSampling, random);

            //These code can be used to first add significant nodes and start 
            //sampling with positive scored nodes
//            Collections.shuffle(significantNodesList, random);
//            Collections.shuffle(nonsignificantNodesList, random);
//            nodeListForSampling.clear();
//            nodeListForSampling.addAll(significantNodesList);
//            nodeListForSampling.addAll(nonsignificantNodesList);
            
            
            double zSum=0;
            int numberOfNodesInSubnetwork=0;
            for(Node node:nodeListForSampling){
                zSum=zSum+nodeToZScoreMap.get(node);
                numberOfNodesInSubnetwork++;
                double score=ScoreCalculations.this.calculateScoreOfSubnetwork(numberOfNodesInSubnetwork,zSum,false);
                samplingScoreSums[numberOfNodesInSubnetwork]+=score;
                samplingScoreSquareSums[numberOfNodesInSubnetwork]+=score*score;
                
                if(score<samplingScoreMins[numberOfNodesInSubnetwork]){
                    samplingScoreMins[numberOfNodesInSubnetwork]=score;
                }
                if(score>samplingScoreMaxs[numberOfNodesInSubnetwork]){
                    samplingScoreMaxs[numberOfNodesInSubnetwork]=score;
                }
                
            }
//            long stop=System.nanoTime();
//            System.out.println((stop-start)/1000);//ms
        }
        
        for(int i=1;i<=numberOfNodes;i++){
            samplingScoreMeans[i]=samplingScoreSums[i]/numberOfTrials;
            
            /**
             * var = SUM((x-xmean)^2) / N 
             * var = SUM(x^2 - 2*xmean*x + xmean^2)/N
             * var = SUM(x^2)/N - (2*xmean*SUM(x))/N + (N*xmean^2)/N 
             * var = SUM(x^2 )/N - 2*xmean^2 + xmean^2
             * var = SUM(x^2 )/N - xmean^2
             */
            samplingScoreStds[i]=samplingScoreSquareSums[i]/numberOfTrials - samplingScoreMeans[i]*samplingScoreMeans[i];
            samplingScoreStds[i]=Math.sqrt(samplingScoreStds[i]+0.0000001);
        }
        
    }
    
    /**
     * Calculates score of subnetwork. Returns zero for one node subnetworks.
     * @param subnetwork
     * @param subnetworkScoreNormalization
     * @return 
     */
    public double calculateScoreOfSubnetwork(Subnetwork subnetwork, boolean subnetworkScoreNormalization) {
        return ScoreCalculations.this.calculateScoreOfSubnetwork(subnetwork.getNodeList(), subnetworkScoreNormalization);
    }
    
    /**
     * Calculates score using node list. Returns zero for one node subnetworks.
     * @param nodeList
     * @param subnetworkScoreNormalization
     * @return 
     */
    public double calculateScoreOfSubnetwork(ArrayList<Node> nodeList, boolean subnetworkScoreNormalization) {
        int numberOfNodes=nodeList.size();
        double zSum=0;
        for(Node node:nodeList){
            zSum=zSum+nodeToZScoreMap.get(node);
        }
        return ScoreCalculations.this.calculateScoreOfSubnetwork(numberOfNodes, zSum, subnetworkScoreNormalization);
    }
    
    /**
     * Calculates score using z score sum and number of nodes. 
     * Returns zero for one node subnetworks.
     * @param numberOfNodes
     * @param zSum
     * @param subnetworkScoreNormalization
     * @return 
     */
    public double calculateScoreOfSubnetwork(int numberOfNodes, double zSum, boolean subnetworkScoreNormalization) {
        if(numberOfNodes==1){
            return 0;
        }
        double score=zSum/Math.sqrt(numberOfNodes);
        if(subnetworkScoreNormalization){
            score=normalizeScore(score, numberOfNodes);
        }
        if(Parameters.penaltyForSize){
            score=penaltyForSize(score, numberOfNodes);
        }
        return score;
    }
    
    private double normalizeScore(double score, int numberOfNodes){
        return (score-samplingScoreMeans[numberOfNodes])/samplingScoreStds[numberOfNodes];
    }
    
    private double penaltyForSize(double score, int numberOfNodes){
        score=score*Gaussian.cdf(score, 100, 30)*1000;
        return score;
    }
    

}


================================================
FILE: java/ActiveSubnetworkSearchMisc/Subnetwork.java
================================================
package ActiveSubnetworkSearchMisc;

import Network.Network;
import ActiveSubnetworkSearchAlgorithms.ActiveSubnetworkSearch;
import Network.Node;
import java.util.ArrayList;
import java.util.HashSet;

/**
 *
 * @author Ozan Ozisik
 */
public class Subnetwork implements Comparable<Object> {
    
    private Network network;
    private ArrayList<Node> nodeList;
    private ScoreCalculations scoreCalculations;
    private double score;
    private double zSum;
    private HashSet<Node> neighborSet;
    private ArrayList<Node> neighborList;
    
    public Subnetwork(ArrayList<Node> nodeList){
        this.nodeList=nodeList;
        this.scoreCalculations=ActiveSubnetworkSearch.scoreCalculations;
        neighborSet=new HashSet<>();
        neighborList=new ArrayList();
        network = ActiveSubnetworkSearch.network;
        
        zSum=0;
        for(Node node:nodeList){
            zSum=zSum+scoreCalculations.getZScore(node);
        }
        this.score=scoreCalculations.calculateScoreOfSubnetwork(nodeList.size(), zSum, true);
    }
    
    //TODO It may be better to return a copy of the list or Collections.unmodifiableList(nodeList), here and in other private collection returning areas
    public ArrayList<Node> getNodeList(){
        return nodeList;
    }
    
    public HashSet<Node> getNeighborSet(){
        if(neighborSet.isEmpty()){
            extractNeighborSet();
        }
        return neighborSet;
    }
    public ArrayList<Node> getNeighborList(){
        if(neighborSet.isEmpty()){
            extractNeighborSet();
        }
        return neighborList;
    }
    
    
    public int numberOfNodes(){
        return nodeList.size();
    }
    
    public double getScore(){
        return score;
    }

    @Override
    public int compareTo(Object o) {
        return (int)Math.signum(this.getScore()-((Subnetwork)o).getScore());
    }
    
    public boolean contains(Node node){
        return nodeList.contains(node);
    }
    
    public void addNode(Node node){
        nodeList.add(node);
        zSum=zSum+scoreCalculations.getZScore(node);
        this.score=scoreCalculations.calculateScoreOfSubnetwork(nodeList.size(), zSum, true);
        
        //neighborSet is cleared for reextraction in case of need.
        //It could also be updated here.
        neighborSet.clear();
    }
    
    public void removeNode(Node node){
        if(nodeList.contains(node)){
            nodeList.remove(node);
            zSum=zSum-scoreCalculations.getZScore(node);
            this.score=scoreCalculations.calculateScoreOfSubnetwork(nodeList.size(), zSum, true);
            
            //neighborSet is cleared for reextraction in case of need.
            //It could also be updated here.
            neighborSet.clear();
        }
    }
    
    private void extractNeighborSet(){
        neighborSet.clear();
        for(Node node:nodeList){
            neighborSet.addAll(network.getNeighborSet(node));
        }
        neighborSet.removeAll(nodeList);
        neighborList.addAll(neighborSet);
    }
}


================================================
FILE: java/ActiveSubnetworkSearchMisc/ZStatistics.java
================================================
package ActiveSubnetworkSearchMisc;

/**
 *
 * @author Ozan Ozisik
 * adapted from  https://github.com/idekerlab/jActiveModules
 */
public class ZStatistics {
    
    public static double oneMinusNormalCDFInverse(double p) {
        if (p <= 0.5) {
            if (p > 0) {
                return oneMinusNormalCDFInversePLT5(p);
            } else {
                return Double.POSITIVE_INFINITY;
            }
        } else if (p < 1) {
            return -oneMinusNormalCDFInversePLT5(1 - p);
        } else {
            return Double.NEGATIVE_INFINITY;
        }
    }
    
    //from 26.2.23, page 933, Handbook of Mathematical Functions, NBS, 1964
    //Requires 0 < p <= 0.5
    private static double oneMinusNormalCDFInversePLT5(double p) {

        double t, temp;

        if (p < 0) {
            throw new IllegalArgumentException("oneMinusNormalCDFInversePLT5 called with negative p\n");
        } else if (p > 0.5) {
            throw new IllegalArgumentException("oneMinusNormalCDFInversePLT5 called with p > 0.5\n");
        } else {
            t = Math.sqrt(-2 * Math.log(p));
            temp = 2.515517 + 0.802853 * t + 0.010328 * t * t;
            temp = t - temp / (1 + 1.432788 * t + 0.189269 * t * t + 0.001308 * t * t * t);
            return temp;
        }
    }
    
    
}


================================================
FILE: java/Application/AppActiveSubnetworkSearch.java
================================================
package Application;

import ActiveSubnetworkSearchAlgorithms.ActiveSubnetworkSearch;
import java.util.logging.Level;
import java.util.logging.Logger;

/**
 *
 * @author Ozan Ozisik
 */
public class AppActiveSubnetworkSearch {

    /**
     * @param args the command line arguments
     */
    public static void main(String[] args) {

        try{
            processArguments(args);
        }catch(Exception e){
            Logger.getLogger(AppActiveSubnetworkSearch.class.getName()).log(Level.SEVERE, "Please check the arguments");
            System.exit(0);
        }        
        ActiveSubnetworkSearch.activeSubnetworkSearch();
        
    }
    
    public static void processArguments(String[] args) throws Exception {
        String helpText;
        helpText = "Options of the application are\n"
                + "-sif=<path>            \tuses the given interaction file\n"
                + "-sig=<path>            \tuses the given experiment file (gene p-value pairs)\n"
                + "-method=[GR|SA|GA]     \truns greedy search, simulated annealing or genetic algorithm for the search (default GR)\n"
                + "-useAllPositives       \tif used adds an individual with all positive nodes in GA, initializes candidate solution with all positive nodes in SA (default false)\n"
                + "-geneInitProb=<value>  \tprobability of adding a gene in inital solution for SA and GA (default 0.1)\n"
                + "-saTemp0=<value>       \tinitial temperature for SA (default 1.0)\n"
                + "-saTemp1=<value>       \tfinal temperature for SA (default 0.01)\n"
                + "-saIter=<value>        \titeration number for SA (default 10000)\n"
                + "-gaPop=<value>         \tpopulation size for GA (default 400)\n"
                + "-gaIter=<value>        \titeration number for GA (default 200)\n"
                + "-gaThread=<value>      \tnumber of threads to be used in GA (default 5)\n"
                + "-gaCrossover=<value>   \tapplies crossover with given probability (default 1)\n"
                + "-gaMut=<value>         \tapplies mutation with given rate (default 0)\n"
                + "-grMaxDepth=<value>    \tsets max depth in greedy search, 0 for no limit (default 1)\n"
                + "-grSearchDepth=<value> \tsets search depth in greedy search (default 1)\n"
                + "-grOverlap=<value>     \tsets overlap threshold for results of greedy search (default 0.5)\n"
                + "-grSubNum=<value>      \tsets number of subnetworks to be presented in the results (default 1000)\n"
                + "-seedForRandom=<value>      \tsets the seed for random number generators, useful for reproducibility (default 1234)\n";
        if(args.length==0 || args[0].equals("-h") || args[0].equals("help") || args[0].equals("-help")){
            System.out.println(helpText);
        }else{
            for(int i=0;i<args.length;i++){
                System.out.println(""+args[i]);
            }
            for(int i=0;i<args.length;i++){
                String[] str=args[i].split("=");
                String argType=str[0];
                String value="";
                if(str.length>1){
                    value=str[1];
                }
                
                switch(argType){
                    case "-sif":Parameters.sifPath=value;break;
                    case "-sig":Parameters.experimentFilePath=value;break;
                    case "-method":Parameters.useSAorGAorGR=Parameters.SearchMethod.valueOf(value);break;
                    case "-useAllPositives":Parameters.startWithAllPositiveZScoreNodes=true;break;
                    case "-geneInitProb":Parameters.geneInitialAdditionProbability=Double.parseDouble(value);break;
                    case "-saTemp0":Parameters.sa_initialTemperature=Double.parseDouble(value);break;
                    case "-saTemp1":Parameters.sa_finalTemperature=Double.parseDouble(value);break;
                    case "-saIter":Parameters.sa_totalIterations=Integer.parseInt(value);break;
                    case "-gaPop":Parameters.ga_populationSize=Integer.parseInt(value);break;
                    case "-gaIter":Parameters.ga_totalIterations=Integer.parseInt(value);break;
                    case "-gaThread":Parameters.ga_threadNumber=Integer.parseInt(value);break;
                    case "-gaMut":Parameters.ga_mutationRate=Double.parseDouble(value);break;
                    case "-grMaxDepth":Parameters.gr_maxDepth=Integer.parseInt(value);break;
                    case "-grSearchDepth":Parameters.gr_searchDepth=Integer.parseInt(value);break;
                    case "-grOverlap":Parameters.gr_overlapThreshold=Double.parseDouble(value);break;
                    case "-grSubNum":Parameters.gr_subnetworkNum=Integer.parseInt(value);break;
                    case "-seedForRandom":Parameters.seedForRandom=Integer.parseInt(value);break;
                    default:System.out.println("Unknown argument: "+argType);
                }
            }
        }
    }
}


================================================
FILE: java/Application/Parameters.java
================================================
package Application;

/**
 *
 * @author Ozan Ozisik
 */

public class Parameters {
    public static String sifPath="BIOGRID-ORGANISM-Homo_sapiens-3.4.155.OzCleaned.sif";
    public static String experimentFilePath="Behcet_jp_GWASPvalue.txt";
    public static String resultFilePath="resultActiveSubnetworkSearch.txt";
    
    public enum SearchMethod{GR, SA, GA};
    public static SearchMethod useSAorGAorGR=SearchMethod.GR;//(default GR)
    public static boolean startWithAllPositiveZScoreNodes=false;//(default false)
    public static double geneInitialAdditionProbability=0.1;//(default 0.1)
    
    public static boolean penaltyForSize=false;
    public static double pForNonSignificantNodes=0.5;//0.9999999999999
    
    public static double sa_initialTemperature=1.0;//(default 1.0)
    public static double sa_finalTemperature=0.01;//(default 0.01)
    public static int sa_totalIterations=10000;//(default 10000)
    
    
    public static int ga_populationSize=400;//(default 400)
    public static int ga_totalIterations=200;//(default 200)
    public static int ga_threadNumber=5;//(default 5)
    public static double ga_crossoverRate=1;
    public static double ga_mutationRate=0.0;
    public static boolean ga_Elitism=true;
    
    public static int gr_maxDepth=1;//(default 1)
    public static int gr_searchDepth=1;//(default 1)
    public static double gr_overlapThreshold=0.5;//(default 0.5)
    public static double gr_subnetworkNum=1000;//(default 1000)
    
    public static int seedForRandom=1234;
    
}

================================================
FILE: java/File/ExperimentFileReader.java
================================================
package File;

import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.util.logging.Level;
import java.util.logging.Logger;
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;

/**
 *
 * @author Ozan Ozisik
 */
public class ExperimentFileReader {
    
    public static ArrayList<SimpleEntry<String, Double>> readExperimentFile(String path){
        
        try {
            //A list of pairs is used to allow multiple p-values for the same gene.
            //These multiple p-values situation is not business of this class
            ArrayList<SimpleEntry<String, Double>> namePValuePairList=new ArrayList<>();
            
            BufferedReader bufReader=new BufferedReader(new FileReader(path));
                        
            String line;
            String[] strArr;
            int lineNo=1;
            while ((line = bufReader.readLine()) != null) {
                strArr=line.split("[ \\t]");
                if(strArr.length==2){
                    try{
                        namePValuePairList.add(new SimpleEntry<>(strArr[0].toUpperCase(), Double.parseDouble(strArr[1])));
                    }catch(NumberFormatException nfe){
                        Logger.getLogger(ExperimentFileReader.class.getName()).log(Level.WARNING, "Unexpected number format in experiment file line {0}, discarded", lineNo);
                    }
                }else{
                    Logger.getLogger(ExperimentFileReader.class.getName()).log(Level.WARNING, "Unexpected column number in experiment file line {0}, discarded", lineNo);
                }
                lineNo++;
            }
            return namePValuePairList;
            
        } catch (FileNotFoundException ex) {
            Logger.getLogger(ExperimentFileReader.class.getName()).log(Level.SEVERE, "Experiment file not found", ex);
            return null;
        } catch (IOException ex) {
            Logger.getLogger(ExperimentFileReader.class.getName()).log(Level.SEVERE, null, ex);
            return null;
        }
        
    }
            
}


================================================
FILE: java/File/SIFReader.java
================================================
package File;

import Network.Network;
import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.util.logging.Level;
import java.util.logging.Logger;

/**
 *
 * @author Ozan Ozisik
 */
public class SIFReader {
    
    public static Network readSIF(String path){
    
        try {
            int columnNumber;
            
            BufferedReader bufReader=new BufferedReader(new FileReader(path));
            Network network=new Network();
            
            bufReader.mark(300);
            String line;
            line = bufReader.readLine();
            String[] strArr=line.split("[ \\t]");
            if(strArr.length==2){
                columnNumber=2;
            }else if(strArr.length==3){
                columnNumber=3;
            }else{
                Logger.getLogger(SIFReader.class.getName()).log(Level.SEVERE, "SIF file must have 2 or 3 columns");
                return null;
            }
            
            bufReader.reset();
            
            int lineNo=1;
            while ((line = bufReader.readLine()) != null) {
                strArr=line.split("[ \\t]");
                if(strArr.length==columnNumber){
                    String strNode1, strNode2;
                    if(columnNumber==2){
                        strNode1=strArr[0];
                        strNode2=strArr[1];
                    }else{
                        strNode1=strArr[0];
                        strNode2=strArr[2];
                    }
                    strNode1=strNode1.toUpperCase();
                    strNode2=strNode2.toUpperCase();
                    network.addInteraction(strNode1,strNode2);
                }else{
                    Logger.getLogger(SIFReader.class.getName()).log(Level.WARNING, "Unexpected column number in SIF line {0}, discarded", lineNo);
                }
                lineNo++;
            }
            return network;
            
        } catch (FileNotFoundException ex) {
            Logger.getLogger(SIFReader.class.getName()).log(Level.SEVERE, "SIF file not found", ex);
            return null;
        } catch (IOException ex) {
            Logger.getLogger(SIFReader.class.getName()).log(Level.SEVERE, null, ex);
            return null;
        }
        
    }
}


================================================
FILE: java/Network/Network.java
================================================
package Network;

import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.logging.Level;
import java.util.logging.Logger;

/**
 *
 * @author Ozan Ozisik
 */
public class Network {
    
    private HashMap<Node, HashSet<Node>> adjacency;
    private boolean selfInteractionWarningGiven=false;

    public Network() {
        adjacency=new HashMap<Node, HashSet<Node>>();
    }
    
    public void addInteraction(String strNode1, String strNode2){
        Node node1=new Node(strNode1);
        Node node2=new Node(strNode2);
        addInteraction(node1, node2);
    }
    
    public void addInteraction(Node node1, Node node2){
        if(node1.equals(node2)){
            if(!selfInteractionWarningGiven){
                Logger.getLogger(Network.class.getName()).log(Level.WARNING, "Self interactions are discarded.");
                selfInteractionWarningGiven=true;
            }
        }else{
            if(adjacency.get(node1)==null){
            adjacency.put(node1, new HashSet<Node>());
            }
            if(adjacency.get(node2)==null){
                adjacency.put(node2, new HashSet<Node>());
            }
            adjacency.get(node1).add(node2);
            adjacency.get(node2).add(node1);
        }
        
    }
    
    public HashSet<Node> getNeighborSet(Node node){
        return adjacency.get(node);
    }
    
    public ArrayList<Node> getNodeList(){
        return new ArrayList<>(adjacency.keySet());
    }
    
    public boolean areAdjacent(Node node1, Node node2){
        return adjacency.get(node1).contains(node2);
    }
    
    public int getNumberOfNodes(){
        return adjacency.keySet().size();
    }
    
    public int getNumberOfInteractions(){
        int interactionNumber=0;
        ArrayList<Node> nodeList=getNodeList();
        for(int i=0;i<nodeList.size()-1;i++){
            for(int j=i+1;j<nodeList.size();j++){
                if(areAdjacent(nodeList.get(i), nodeList.get(j))){
                    interactionNumber++;
                }
            }
        }
        return interactionNumber;
    }
}


================================================
FILE: java/Network/Node.java
================================================
package Network;

import java.util.Objects;

/**
 *
 * @author Ozan Ozisik
 */
public class Node {
    private final String name;
    
    public Node(String name){
        this.name=name;
    }
    
    public String getName(){
        return name;
    }
    
    @Override
    public String toString(){
        return getName();
    }
    
    @Override
    public int hashCode() {
        return name.hashCode();
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) {
            return true;
        }
        if (obj == null) {
            return false;
        }
        if (getClass() != obj.getClass()) {
            return false;
        }
        final Node other = (Node) obj;
        if (!Objects.equals(this.name, other.name)) {
            return false;
        }
        return true;
    }

    
}


================================================
FILE: java/Network/SubnetworkFinder.java
================================================
package Network;

import ActiveSubnetworkSearchAlgorithms.ActiveSubnetworkSearch;
import ActiveSubnetworkSearchMisc.Subnetwork;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedList;


/**
 *
 * @author Ozan Ozisik
 */
public class SubnetworkFinder {
    Network network;
    HashSet<Node> nodesOnSet;
    HashSet<Node> reached;
    
    public SubnetworkFinder(){
        network=ActiveSubnetworkSearch.network;
    }
    
    
    /**
     * Finds the connected subnetworks of the given nodes using depth first
     * search. This method may return empty ArrayList, this should be handled in the 
     * calling methods.
     * @param nodesOnSet
     * @return 
     */ 
    public ArrayList<Subnetwork> findSubnetworksDFS(HashSet<Node> nodesOnSet){
        this.nodesOnSet=nodesOnSet;
        ArrayList<Subnetwork> subnetworkList=new ArrayList<Subnetwork>();
        reached=new HashSet<>(2 * nodesOnSet.size());
        
        for(Node node:nodesOnSet){
            if(!reached.contains(node)){
                ArrayList<Node> subnetworkNodeList = new ArrayList<>();
                search(node, subnetworkNodeList);
                subnetworkList.add(new Subnetwork(subnetworkNodeList));
            }
        }
        return subnetworkList;
    }
    private void search(Node node, ArrayList<Node> subnetworkNodeList){
        reached.add(node);
        subnetworkNodeList.add(node);
        HashSet<Node> neighborNodesSet=network.getNeighborSet(node);
        for(Node neighborNode:neighborNodesSet){
            if((nodesOnSet.contains(neighborNode))&&(!reached.contains(neighborNode))){
                search(neighborNode, subnetworkNodeList);
            }
        }
    }
    
    
    public ArrayList<Subnetwork> findSubnetworksDFSNonRecursive(HashSet<Node> nodesOnSet){
        ArrayList<Subnetwork> subnetworkList=new ArrayList<Subnetwork>();
        HashSet<Node> reached=new HashSet<>(2 * nodesOnSet.size());
        
        for(Node node:nodesOnSet){
            if(!reached.contains(node)){
                ArrayList<Node> subnetworkNodeList = new ArrayList<>();
                LinkedList<Node> nodesToBeChecked=new LinkedList<>();
                nodesToBeChecked.add(node);
                while(!nodesToBeChecked.isEmpty()){
                    Node curNode=nodesToBeChecked.pop();
                    if(!reached.contains(curNode)){
                        reached.add(curNode);
                        subnetworkNodeList.add(curNode);
                        for(Node neighborNode:network.getNeighborSet(curNode)){
                            if((nodesOnSet.contains(neighborNode))&&(!reached.contains(neighborNode))){
                                nodesToBeChecked.push(neighborNode);
                            }
                        }
                    }
                }
                subnetworkList.add(new Subnetwork(subnetworkNodeList));
            }
        }
        return subnetworkList;
    }
    
    
    public ArrayList<Subnetwork> findSubnetworksBFS(HashSet<Node> nodesOnSet){
        ArrayList<Subnetwork> subnetworkList=new ArrayList<Subnetwork>();
        HashSet<Node> reached=new HashSet<>(2 * nodesOnSet.size());
        
        for(Node node:nodesOnSet){
            if(!reached.contains(node)){
                ArrayList<Node> subnetworkNodeList = new ArrayList<>();
                LinkedList<Node> nodesToBeChecked=new LinkedList<>();
                nodesToBeChecked.add(node);
                while(!nodesToBeChecked.isEmpty()){
                    Node curNode=nodesToBeChecked.pop();
                    if(!reached.contains(curNode)){
                        reached.add(curNode);
                        subnetworkNodeList.add(curNode);
                        for(Node neighborNode:network.getNeighborSet(curNode)){
                            if((nodesOnSet.contains(neighborNode))&&(!reached.contains(neighborNode))){
                                nodesToBeChecked.add(neighborNode);
                            }
                        }
                    }
                }
                subnetworkList.add(new Subnetwork(subnetworkNodeList));
            }
        }
        return subnetworkList;
    }
}


================================================
FILE: man/UpSet_plot.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/visualization.R
\name{UpSet_plot}
\alias{UpSet_plot}
\title{Create UpSet Plot of Enriched Terms}
\usage{
UpSet_plot(
  result_df,
  genes_df,
  num_terms = 10,
  method = "heatmap",
  use_description = FALSE,
  low = "red",
  mid = "black",
  high = "green",
  ...
)
}
\arguments{
\item{result_df}{A dataframe of pathfindR results that must contain the following columns: \describe{
  \item{Term_Description}{Description of the enriched term (necessary if \code{use_description = TRUE})}
  \item{ID}{ID of the enriched term (necessary if \code{use_description = FALSE})}
  \item{lowest_p}{the highest adjusted-p value of the given term over all iterations}
  \item{Up_regulated}{the up-regulated genes in the input involved in the given term's gene set, comma-separated}
  \item{Down_regulated}{the down-regulated genes in the input involved in the given term's gene set, comma-separated}
}}

\item{genes_df}{the input data that was used with \code{\link{run_pathfindR}}.
  It must be a data frame with 3 columns: \enumerate{
  \item Gene Symbol (Gene Symbol)
  \item Change value, e.g. log(fold change) (optional)
  \item p value, e.g. adjusted p value associated with differential expression
} The change values in this data frame are used to color the affected genes}

\item{num_terms}{Number of top enriched terms to use while creating the plot. Set to \code{NULL} to use
all enriched terms (default = 10)}

\item{method}{the option for producing the plot. Options include 'heatmap',
'boxplot' and 'barplot'. (default = 'heatmap')}

\item{use_description}{Boolean argument to indicate whether term descriptions
(in the 'Term_Description' column) should be used. (default = \code{FALSE})}

\item{low}{a string indicating the color of 'low' values in the coloring gradient (default = 'green')}

\item{mid}{a string indicating the color of 'mid' values in the coloring gradient (default = 'black')}

\item{high}{a string indicating the color of 'high' values in the coloring gradient (default = 'red')}

\item{...}{additional arguments for \code{\link{input_processing}} (used if
\code{genes_df} is provided)}
}
\value{
UpSet plots are plots of the intersections of sets as a matrix. This
function creates a ggplot object of an UpSet plot where the x-axis is the
UpSet plot of intersections of enriched terms. By default (i.e.
\code{method = 'heatmap'}) the main plot is a heatmap of genes at the
corresponding intersections, colored by up/down regulation (if
\code{genes_df} is provided, colored by change values). If
\code{method = 'barplot'}, the main plot is bar plots of the number of genes
at the corresponding intersections. Finally, if \code{method = 'boxplot'} and
if \code{genes_df} is provided, then the main plot displays the boxplots of
change values of the genes at the corresponding intersections.
}
\description{
Create UpSet Plot of Enriched Terms
}
\examples{
UpSet_plot(example_pathfindR_output)
}


================================================
FILE: man/active_snw_enrichment_wrapper.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/utility.R
\name{active_snw_enrichment_wrapper}
\alias{active_snw_enrichment_wrapper}
\title{Wrapper for Active Subnetwork Search + Enrichment over Single/Multiple Iteration(s)}
\usage{
active_snw_enrichment_wrapper(
  input_processed,
  pin_path,
  gset_list,
  enrichment_threshold,
  list_active_snw_genes,
  adj_method = "bonferroni",
  search_method = "GR",
  disable_parallel = FALSE,
  use_all_positives = FALSE,
  iterations = 10,
  n_processes = NULL,
  score_quan_thr = 0.8,
  sig_gene_thr = 0.02,
  saTemp0 = 1,
  saTemp1 = 0.01,
  saIter = 10000,
  gaPop = 400,
  gaIter = 200,
  gaThread = 5,
  gaCrossover = 1,
  gaMut = 0,
  grMaxDepth = 1,
  grSearchDepth = 1,
  grOverlap = 0.5,
  grSubNum = 1000,
  silent_option = TRUE
)
}
\arguments{
\item{input_processed}{processed input data frame}

\item{pin_path}{path/to/PIN/file}

\item{gset_list}{list for gene sets}

\item{enrichment_threshold}{adjusted-p value threshold used when filtering
enrichment results (default = 0.05)}

\item{list_active_snw_genes}{boolean value indicating whether or not to report
the non-significant active subnetwork genes for the active subnetwork which was enriched for
the given term with the lowest p value (default = \code{FALSE})}

\item{adj_method}{correction method to be used for adjusting p-values.
(default = 'bonferroni')}

\item{search_method}{algorithm to use when performing active subnetwork
search. Options are greedy search (GR), simulated annealing (SA) or genetic
algorithm (GA) for the search (default = 'GR').}

\item{disable_parallel}{boolean to indicate whether to disable parallel runs
via \code{foreach} (default = FALSE)}

\item{use_all_positives}{if TRUE: in GA, adds an individual with all positive
nodes. In SA, initializes candidate solution with all positive nodes. (default = FALSE)}

\item{iterations}{number of iterations for active subnetwork search and
enrichment analyses (Default = 10)}

\item{n_processes}{optional argument for specifying the number of processes
used by foreach. If not specified, the function determines this
automatically (Default == NULL. Gets set to 1 for Genetic Algorithm)}

\item{score_quan_thr}{active subnetwork score quantile threshold. Must be
between 0 and 1 or set to -1 for not filtering. (Default = 0.8)}

\item{sig_gene_thr}{threshold for the minimum proportion of significant genes in
the subnetwork (Default = 0.02) If the number of genes to use as threshold is
calculated to be < 2 (e.g. 50 signif. genes x 0.01 = 0.5), the threshold number
is set to 2}

\item{saTemp0}{Initial temperature for SA (default = 1.0)}

\item{saTemp1}{Final temperature for SA (default = 0.01)}

\item{saIter}{Iteration number for SA (default = 10000)}

\item{gaPop}{Population size for GA (default = 400)}

\item{gaIter}{Iteration number for GA (default = 200)}

\item{gaThread}{Number of threads to be used in GA (default = 5)}

\item{gaCrossover}{Applies crossover with the given probability in GA (default = 1, i.e. always perform crossover)}

\item{gaMut}{For GA, applies mutation with given mutation rate (default = 0, i.e. mutation off)}

\item{grMaxDepth}{Sets max depth in greedy search, 0 for no limit (default = 1)}

\item{grSearchDepth}{Search depth in greedy search (default = 1)}

\item{grOverlap}{Overlap threshold for results of greedy search (default = 0.5)}

\item{grSubNum}{Number of subnetworks to be presented in the results (default = 1000)}

\item{silent_option}{boolean value indicating whether to print the messages
to the console (FALSE) or not (TRUE, this will print to a temp. file) during
active subnetwork search (default = TRUE). This option was added because
during parallel runs, the console messages get disorderly printed.}
}
\value{
Data frame of combined pathfindR enrichment results
}
\description{
Wrapper for Active Subnetwork Search + Enrichment over Single/Multiple Iteration(s)
}


================================================
FILE: man/active_snw_search.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/active_snw_search.R
\name{active_snw_search}
\alias{active_snw_search}
\title{Perform Active Subnetwork Search}
\usage{
active_snw_search(
  input_for_search,
  pin_name_path = "Biogrid",
  snws_file = "active_snws",
  dir_for_parallel_run = NULL,
  score_quan_thr = 0.8,
  sig_gene_thr = 0.02,
  search_method = "GR",
  seedForRandom = 1234,
  silent_option = TRUE,
  use_all_positives = FALSE,
  geneInitProbs = 0.1,
  saTemp0 = 1,
  saTemp1 = 0.01,
  saIter = 10000,
  gaPop = 400,
  gaIter = 10000,
  gaThread = 5,
  gaCrossover = 1,
  gaMut = 0,
  grMaxDepth = 1,
  grSearchDepth = 1,
  grOverlap = 0.5,
  grSubNum = 1000
)
}
\arguments{
\item{input_for_search}{input the input data that active subnetwork search uses. The input
must be a data frame containing at least these 2 columns: \describe{
  \item{GENE}{Gene Symbol}
  \item{P_VALUE}{p value obtained through a test, e.g. differential expression/methylation}
}}

\item{pin_name_path}{Name of the chosen PIN or absolute/path/to/PIN.sif. If PIN name,
must be one of c('Biogrid', 'STRING', 'GeneMania', 'IntAct', 'KEGG', 'mmu_STRING'). If
path/to/PIN.sif, the file must comply with the PIN specifications. (Default = 'Biogrid')}

\item{snws_file}{name for active subnetwork search output data
\strong{without file extension} (default = 'active_snws')}

\item{dir_for_parallel_run}{(previously created) directory for a parallel run iteration.
Used in the wrapper function (see ?run_pathfindR) (Default = NULL)}

\item{score_quan_thr}{active subnetwork score quantile threshold. Must be
between 0 and 1 or set to -1 for not filtering. (Default = 0.8)}

\item{sig_gene_thr}{threshold for the minimum proportion of significant genes in
the subnetwork (Default = 0.02) If the number of genes to use as threshold is
calculated to be < 2 (e.g. 50 signif. genes x 0.01 = 0.5), the threshold number
is set to 2}

\item{search_method}{algorithm to use when performing active subnetwork
search. Options are greedy search (GR), simulated annealing (SA) or genetic
algorithm (GA) for the search (default = 'GR').}

\item{seedForRandom}{seed for reproducibility while running the java modules (applies for GR and SA)}

\item{silent_option}{boolean value indicating whether to print the messages
to the console (FALSE) or not (TRUE, this will print to a temp. file) during
active subnetwork search (default = TRUE). This option was added because
during parallel runs, the console messages get disorderly printed.}

\item{use_all_positives}{if TRUE: in GA, adds an individual with all positive
nodes. In SA, initializes candidate solution with all positive nodes. (default = FALSE)}

\item{geneInitProbs}{For SA and GA, probability of adding a gene in initial solution (default = 0.1)}

\item{saTemp0}{Initial temperature for SA (default = 1.0)}

\item{saTemp1}{Final temperature for SA (default = 0.01)}

\item{saIter}{Iteration number for SA (default = 10000)}

\item{gaPop}{Population size for GA (default = 400)}

\item{gaIter}{Iteration number for GA (default = 200)}

\item{gaThread}{Number of threads to be used in GA (default = 5)}

\item{gaCrossover}{Applies crossover with the given probability in GA (default = 1, i.e. always perform crossover)}

\item{gaMut}{For GA, applies mutation with given mutation rate (default = 0, i.e. mutation off)}

\item{grMaxDepth}{Sets max depth in greedy search, 0 for no limit (default = 1)}

\item{grSearchDepth}{Search depth in greedy search (default = 1)}

\item{grOverlap}{Overlap threshold for results of greedy search (default = 0.5)}

\item{grSubNum}{Number of subnetworks to be presented in the results (default = 1000)}
}
\value{
A list of genes in every identified active subnetwork that has a score greater than
the `score_quan_thr`th quantile and that has at least `sig_gene_thr` affected genes.
}
\description{
Perform Active Subnetwork Search
}
\examples{
\donttest{
processed_df <- example_pathfindR_input[1:15, -2]
colnames(processed_df) <- c('GENE', 'P_VALUE')
GR_snws <- active_snw_search(
  input_for_search = processed_df,
  pin_name_path = 'KEGG',
  search_method = 'GR',
  score_quan_thr = 0.8
)
# clean-up
unlink('active_snw_search', recursive = TRUE)
}
}


================================================
FILE: man/annotate_term_genes.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/utility.R
\name{annotate_term_genes}
\alias{annotate_term_genes}
\title{Annotate the Affected Genes in the Provided Enriched Terms}
\usage{
annotate_term_genes(
  result_df,
  input_processed,
  genes_by_term = pathfindR.data::kegg_genes
)
}
\arguments{
\item{result_df}{data frame of enrichment results.
The only must-have column is 'ID'.}

\item{input_processed}{input data processed via \code{\link{input_processing}}}

\item{genes_by_term}{List that contains genes for each gene set. Names of
this list are gene set IDs (default = kegg_genes)}
}
\value{
The original data frame with two additional columns:  \describe{
  \item{Up_regulated}{the up-regulated genes in the input involved in the given term's gene set, comma-separated}
  \item{Down_regulated}{the down-regulated genes in the input involved in the given term's gene set, comma-separated}
}
}
\description{
Function to annotate the involved affected (input) genes in each term.
}
\examples{
example_gene_data <- example_pathfindR_input
colnames(example_gene_data) <- c('GENE', 'CHANGE', 'P_VALUE')

annotated_result <- annotate_term_genes(
  result_df = example_pathfindR_output,
  input_processed = example_gene_data
)
}


================================================
FILE: man/check_java_version.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/zzz.R
\name{check_java_version}
\alias{check_java_version}
\title{Check Java Version}
\usage{
check_java_version(version = NULL)
}
\arguments{
\item{version}{character vector containing the output of 'java -version'. If
NULL, result of \code{\link{fetch_java_version}} is used (default = NULL)}
}
\value{
only parses and checks whether the java version is >= 1.8
}
\description{
Check Java Version
}
\details{
this function was adapted from the CRAN package \code{sparklyr}
}


================================================
FILE: man/cluster_enriched_terms.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/clustering.R
\name{cluster_enriched_terms}
\alias{cluster_enriched_terms}
\title{Cluster Enriched Terms}
\usage{
cluster_enriched_terms(
  enrichment_res,
  method = "hierarchical",
  plot_clusters_graph = TRUE,
  use_description = FALSE,
  use_active_snw_genes = FALSE,
  ...
)
}
\arguments{
\item{enrichment_res}{data frame of pathfindR enrichment results. Must-have
columns are 'Term_Description' (if \code{use_description = TRUE}) or 'ID'
(if \code{use_description = FALSE}), 'Down_regulated', and 'Up_regulated'.
If \code{use_active_snw_genes = TRUE}, 'non_Signif_Snw_Genes' must also be
provided.}

\item{method}{Either 'hierarchical' or 'fuzzy'. Details of clustering are
provided in the corresponding functions \code{\link{hierarchical_term_clustering}},
and \code{\link{fuzzy_term_clustering}}}

\item{plot_clusters_graph}{boolean value indicate whether or not to plot
the graph diagram of clustering results (default = TRUE)}

\item{use_description}{Boolean argument to indicate whether term descriptions
(in the 'Term_Description' column) should be used. (default = \code{FALSE})}

\item{use_active_snw_genes}{boolean to indicate whether or not to use
non-input active subnetwork genes in the calculation of kappa statistics
(default = FALSE, i.e. only use affected genes)}

\item{...}{additional arguments for \code{\link{hierarchical_term_clustering}},
\code{\link{fuzzy_term_clustering}} and \code{\link{cluster_graph_vis}}.
See documentation of these functions for more details.}
}
\value{
a data frame of clustering results. For 'hierarchical', the cluster
assignments (Cluster) and whether the term is representative of its cluster
(Status) is added as columns. For 'fuzzy', terms that are in multiple
clusters are provided for each cluster. The cluster assignments (Cluster)
and whether the term is representative of its cluster (Status) is
added as columns.
}
\description{
Cluster Enriched Terms
}
\examples{
example_clustered <- cluster_enriched_terms(
  example_pathfindR_output[1:3, ],
  plot_clusters_graph = FALSE
)
example_clustered <- cluster_enriched_terms(
  example_pathfindR_output[1:3, ],
  method = 'fuzzy', plot_clusters_graph = FALSE
)
}
\seealso{
See \code{\link{hierarchical_term_clustering}} for hierarchical
clustering of enriched terms.
See \code{\link{fuzzy_term_clustering}} for fuzzy clustering of enriched terms.
See \code{\link{cluster_graph_vis}} for graph visualization of clustering.
}


================================================
FILE: man/cluster_graph_vis.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/clustering.R
\name{cluster_graph_vis}
\alias{cluster_graph_vis}
\title{Graph Visualization of Clustered Enriched Terms}
\usage{
cluster_graph_vis(
  clu_obj,
  kappa_mat,
  enrichment_res,
  kappa_threshold = 0.35,
  use_description = FALSE,
  vertex.label.cex = 0.7,
  vertex.size.scaling = 2.5
)
}
\arguments{
\item{clu_obj}{clustering result (either a matrix obtained via
\code{\link{hierarchical_term_clustering}} or \code{\link{fuzzy_term_clustering}}
`fuzzy_term_clustering` or a vector obtained via `hierarchical_term_clustering`)}

\item{kappa_mat}{matrix of kappa statistics (output of \code{\link{create_kappa_matrix}})}

\item{enrichment_res}{data frame of pathfindR enrichment results. Must-have
columns are 'Term_Description' (if \code{use_description = TRUE}) or 'ID'
(if \code{use_description = FALSE}), 'Down_regulated', and 'Up_regulated'.
If \code{use_active_snw_genes = TRUE}, 'non_Signif_Snw_Genes' must also be
provided.}

\item{kappa_threshold}{threshold for kappa statistics, defining strong
relation (default = 0.35)}

\item{use_description}{Boolean argument to indicate whether term descriptions
(in the 'Term_Description' column) should be used. (default = \code{FALSE})}

\item{vertex.label.cex}{font size for vertex labels; it is interpreted as a multiplication factor of some device-dependent base font size (default = 0.7)}

\item{vertex.size.scaling}{scaling factor for the node size (default = 2.5)}
}
\value{
Plots a graph diagram of clustering results. Each node is an enriched term
from `enrichment_res`. Size of node corresponds to -log(lowest_p). Thickness
of the edges between nodes correspond to the kappa statistic between the two
terms. Color of each node corresponds to distinct clusters. For fuzzy
clustering, if a term is in multiple clusters, multiple colors are utilized.
}
\description{
Graph Visualization of Clustered Enriched Terms
}
\examples{
\dontrun{
cluster_graph_vis(clu_obj, kappa_mat, enrichment_res)
}
}


================================================
FILE: man/color_kegg_pathway.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/visualization.R
\name{color_kegg_pathway}
\alias{color_kegg_pathway}
\title{Color hsa KEGG pathway}
\usage{
color_kegg_pathway(
  pw_id,
  change_vec,
  scale_vals = TRUE,
  node_cols = NULL,
  legend.position = "top"
)
}
\arguments{
\item{pw_id}{hsa KEGG pathway id (e.g. hsa05012)}

\item{change_vec}{vector of change values, names should be hsa KEGG gene ids}

\item{scale_vals}{should change values be scaled? (default = \code{TRUE})}

\item{node_cols}{low, middle and high color values for coloring the pathway nodes
(default = \code{NULL}). If \code{node_cols=NULL}, the low, middle and high color
are set as 'green', 'gray' and 'red'. If all change values are 1e6 (in case no
changes are supplied, this dummy value is assigned by
\code{\link{input_processing}}), only one color ('#F38F18' if NULL) is used.}

\item{legend.position}{the default position of legends ("none", "left",
"right", "bottom", "top", "inside")}
}
\value{
a ggplot object containing the colored KEGG pathway diagram visualization
}
\description{
Color hsa KEGG pathway
}
\examples{
\dontrun{
pw_id <- 'hsa00010'
change_vec <- c(-2, 4, 6)
names(change_vec) <- c('hsa:2821', 'hsa:226', 'hsa:229')
result <- pathfindR:::color_kegg_pathway(pw_id, change_vec)
}
}


================================================
FILE: man/combine_pathfindR_results.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/comparison.R
\name{combine_pathfindR_results}
\alias{combine_pathfindR_results}
\title{Combine 2 pathfindR Results}
\usage{
combine_pathfindR_results(result_A, result_B, plot_common = TRUE)
}
\arguments{
\item{result_A}{data frame of first pathfindR enrichment results}

\item{result_B}{data frame of second pathfindR enrichment results}

\item{plot_common}{boolean to indicate whether or not to plot the term-gene
graph of the common terms (default=\code{TRUE})}
}
\value{
Data frame of combined pathfindR enrichment results. Columns are: \describe{
  \item{ID}{ID of the enriched term}
  \item{Term_Description}{Description of the enriched term}
  \item{Fold_Enrichment_A}{Fold enrichment value for the enriched term (Calculated using ONLY the input genes)}
  \item{occurrence_A}{the number of iterations that the given term was found to enriched over all iterations}
  \item{lowest_p_A}{the lowest adjusted-p value of the given term over all iterations}
  \item{highest_p_A}{the highest adjusted-p value of the given term over all iterations}
  \item{Up_regulated_A}{the up-regulated genes in the input involved in the given term's gene set, comma-separated}
  \item{Down_regulated_A}{the down-regulated genes in the input involved in the given term's gene set, comma-separated}
  \item{Fold_Enrichment_B}{Fold enrichment value for the enriched term (Calculated using ONLY the input genes)}
  \item{occurrence_B}{the number of iterations that the given term was found to enriched over all iterations}
  \item{lowest_p_B}{the lowest adjusted-p value of the given term over all iterations}
  \item{highest_p_B}{the highest adjusted-p value of the given term over all iterations}
  \item{Up_regulated_B}{the up-regulated genes in the input involved in the given term's gene set, comma-separated}
  \item{Down_regulated_B}{the down-regulated genes in the input involved in the given term's gene set, comma-separated}
  \item{combined_p}{the combined p value (via Fisher's method)}
  \item{status}{whether the term is found in both analyses ('common'), found only in the first ('A only') or found only in the second ('B only)}
}
By default, the function also displays the term-gene graph of the common terms
}
\description{
Combine 2 pathfindR Results
}
\examples{
combined_results <- combine_pathfindR_results(example_pathfindR_output, example_comparison_output)
}


================================================
FILE: man/combined_results_graph.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/comparison.R
\name{combined_results_graph}
\alias{combined_results_graph}
\title{Combined Results Graph}
\usage{
combined_results_graph(
  combined_df,
  selected_terms = "common",
  use_description = FALSE,
  layout = "stress",
  node_size = "num_genes"
)
}
\arguments{
\item{combined_df}{Data frame of combined pathfindR enrichment results}

\item{selected_terms}{the vector of selected terms for creating the graph
(either IDs or term descriptions). If set to \code{'common'}, all of the
common terms are used. (default = 'common')}

\item{use_description}{Boolean argument to indicate whether term descriptions
(in the 'Term_Description' column) should be used. (default = \code{FALSE})}

\item{layout}{The type of layout to create (see \code{\link[ggraph]{ggraph}} for details. Default = 'stress')}

\item{node_size}{Argument to indicate whether to use number of significant genes ('num_genes')
or the -log10(lowest p value) ('p_val') for adjusting the node sizes (default = 'num_genes')}
}
\value{
a  \code{\link[ggraph]{ggraph}} object containing the combined term-gene graph.
 Each node corresponds to an enriched term (orange if common, different shades of blue otherwise),
 an up-regulated gene (green), a down-regulated gene (red) or
 a conflicting (i.e. up in one analysis, down in the other or vice versa) gene
 (gray). An edge between a term and a gene indicates
 that the given term involves the gene. Size of a term node is proportional
 to either the number of genes (if \code{node_size = 'num_genes'}) or
 the -log10(lowest p value) (if \code{node_size = 'p_val'}).
}
\description{
Combined Results Graph
}
\examples{
combined_results <- combine_pathfindR_results(
  example_pathfindR_output,
  example_comparison_output,
  plot_common = FALSE
)
g <- combined_results_graph(combined_results, selected_terms = sample(combined_results$ID, 3))
}


================================================
FILE: man/configure_output_dir.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/utility.R
\name{configure_output_dir}
\alias{configure_output_dir}
\title{Configure Output Directory Name}
\usage{
configure_output_dir(output_dir = NULL)
}
\arguments{
\item{output_dir}{the directory to be created where the output and intermediate
files are saved (default = \code{NULL}, a temporary directory is used)}
}
\value{
/path/to/output/dir
}
\description{
Configure Output Directory Name
}


================================================
FILE: man/create_HTML_report.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/utility.R
\name{create_HTML_report}
\alias{create_HTML_report}
\title{Create HTML Report of pathfindR Results}
\usage{
create_HTML_report(input, input_processed, final_res, dir_for_report)
}
\arguments{
\item{input}{the input data that pathfindR uses. The input must be a data
  frame with three columns: \enumerate{
  \item Gene Symbol (Gene Symbol)
  \item Change value, e.g. log(fold change) (OPTIONAL)
  \item p value, e.g. adjusted p value associated with differential expression
}}

\item{input_processed}{processed input data frame}

\item{final_res}{final pathfindR result data frame}

\item{dir_for_report}{directory to render the report in}
}
\description{
Create HTML Report of pathfindR Results
}


================================================
FILE: man/create_kappa_matrix.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/clustering.R
\name{create_kappa_matrix}
\alias{create_kappa_matrix}
\title{Create Kappa Statistics Matrix}
\usage{
create_kappa_matrix(
  enrichment_res,
  use_description = FALSE,
  use_active_snw_genes = FALSE
)
}
\arguments{
\item{enrichment_res}{data frame of pathfindR enrichment results. Must-have
columns are 'Term_Description' (if \code{use_description = TRUE}) or 'ID'
(if \code{use_description = FALSE}), 'Down_regulated', and 'Up_regulated'.
If \code{use_active_snw_genes = TRUE}, 'non_Signif_Snw_Genes' must also be
provided.}

\item{use_description}{Boolean argument to indicate whether term descriptions
(in the 'Term_Description' column) should be used. (default = \code{FALSE})}

\item{use_active_snw_genes}{boolean to indicate whether or not to use
non-input active subnetwork genes in the calculation of kappa statistics
(default = FALSE, i.e. only use affected genes)}
}
\value{
a matrix of kappa statistics between each term in the
enrichment results.
}
\description{
Create Kappa Statistics Matrix
}
\examples{
sub_df <- example_pathfindR_output[1:3, ]
create_kappa_matrix(sub_df)
}


================================================
FILE: man/enrichment.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/enrichment.R
\name{enrichment}
\alias{enrichment}
\title{Perform Enrichment Analysis for a Single Gene Set}
\usage{
enrichment(
  input_genes,
  genes_by_term = pathfindR.data::kegg_genes,
  term_descriptions = pathfindR.data::kegg_descriptions,
  adj_method = "bonferroni",
  enrichment_threshold = 0.05,
  sig_genes_vec,
  background_genes
)
}
\arguments{
\item{input_genes}{The set of gene symbols to be used for enrichment
analysis. In the scope of this package, these are genes that were
identified for an active subnetwork}

\item{genes_by_term}{List that contains genes for each gene set. Names of
this list are gene set IDs (default = kegg_genes)}

\item{term_descriptions}{Vector that contains term descriptions for the
gene sets. Names of this vector are gene set IDs (default = kegg_descriptions)}

\item{adj_method}{correction method to be used for adjusting p-values.
(default = 'bonferroni')}

\item{enrichment_threshold}{adjusted-p value threshold used when filtering
enrichment results (default = 0.05)}

\item{sig_genes_vec}{vector of significant gene symbols. In the scope of this
package, these are the input genes that were used for active subnetwork search}

\item{background_genes}{vector of background genes. In the scope of this package,
the background genes are taken as all genes in the PIN
(see \code{\link{enrichment_analyses}})}
}
\value{
A data frame that contains enrichment results
}
\description{
Perform Enrichment Analysis for a Single Gene Set
}
\examples{
enrichment(
  input_genes = c('PER1', 'PER2', 'CRY1', 'CREB1'),
  sig_genes_vec = 'PER1',
  background_genes = unlist(pathfindR.data::kegg_genes)
)
}
\seealso{
\code{\link[stats]{p.adjust}} for adjustment of p values. See
  \code{\link{run_pathfindR}} for the wrapper function of the pathfindR
  workflow. \code{\link{hyperg_test}} for the details on hypergeometric
  distribution-based hypothesis testing.
}


================================================
FILE: man/enrichment_analyses.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/enrichment.R
\name{enrichment_analyses}
\alias{enrichment_analyses}
\title{Perform Enrichment Analyses on the Input Subnetworks}
\usage{
enrichment_analyses(
  snws,
  sig_genes_vec,
  pin_name_path = "Biogrid",
  genes_by_term = pathfindR.data::kegg_genes,
  term_descriptions = pathfindR.data::kegg_descriptions,
  adj_method = "bonferroni",
  enrichment_threshold = 0.05,
  list_active_snw_genes = FALSE
)
}
\arguments{
\item{snws}{a list of subnetwork genes (i.e., vectors of genes for each subnetwork)}

\item{sig_genes_vec}{vector of significant gene symbols. In the scope of this
package, these are the input genes that were used for active subnetwork search}

\item{pin_name_path}{Name of the chosen PIN or absolute/path/to/PIN.sif. If PIN name,
must be one of c('Biogrid', 'STRING', 'GeneMania', 'IntAct', 'KEGG', 'mmu_STRING'). If
path/to/PIN.sif, the file must comply with the PIN specifications. (Default = 'Biogrid')}

\item{genes_by_term}{List that contains genes for each gene set. Names of
this list are gene set IDs (default = kegg_genes)}

\item{term_descriptions}{Vector that contains term descriptions for the
gene sets. Names of this vector are gene set IDs (default = kegg_descriptions)}

\item{adj_method}{correction method to be used for adjusting p-values.
(default = 'bonferroni')}

\item{enrichment_threshold}{adjusted-p value threshold used when filtering
enrichment results (default = 0.05)}

\item{list_active_snw_genes}{boolean value indicating whether or not to report
the non-significant active subnetwork genes for the active subnetwork which was enriched for
the given term with the lowest p value (default = \code{FALSE})}
}
\value{
a dataframe of combined enrichment results. Columns are: \describe{
  \item{ID}{ID of the enriched term}
  \item{Term_Description}{Description of the enriched term}
  \item{Fold_Enrichment}{Fold enrichment value for the enriched term}
  \item{p_value}{p value of enrichment}
  \item{adj_p}{adjusted p value of enrichment}
  \item{support}{the support (proportion of active subnetworks leading to enrichment over all subnetworks) for the gene set}
  \item{non_Signif_Snw_Genes (OPTIONAL)}{the non-significant active subnetwork genes, comma-separated}
}
}
\description{
Perform Enrichment Analyses on the Input Subnetworks
}
\examples{
enr_res <- enrichment_analyses(
  snws = example_active_snws[1:2],
  sig_genes_vec = example_pathfindR_input$Gene.symbol[1:25],
  pin_name_path = 'KEGG'
)
}
\seealso{
\code{\link{enrichment}} for the enrichment analysis for a single gene set
}


================================================
FILE: man/enrichment_chart.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/visualization.R
\name{enrichment_chart}
\alias{enrichment_chart}
\title{Create Bubble Chart of Enrichment Results}
\usage{
enrichment_chart(
  result_df,
  top_terms = 10,
  plot_by_cluster = FALSE,
  num_bubbles = 4,
  even_breaks = TRUE
)
}
\arguments{
\item{result_df}{a data frame that must contain the following columns: \describe{
  \item{Term_Description}{Description of the enriched term}
  \item{Fold_Enrichment}{Fold enrichment value for the enriched term}
  \item{lowest_p}{the lowest adjusted-p value of the given term over all iterations}
  \item{Up_regulated}{the up-regulated genes in the input involved in the given term's gene set, comma-separated}
  \item{Down_regulated}{the down-regulated genes in the input involved in the given term's gene set, comma-separated}
  \item{Cluster(OPTIONAL)}{the cluster to which the enriched term is assigned}
}}

\item{top_terms}{number of top terms (according to the 'lowest_p' column)
to plot (default = 10). If \code{plot_by_cluster = TRUE}, selects the top
\code{top_terms} terms per each cluster. Set \code{top_terms = NULL} to plot
for all terms.If the total number of terms is less than \code{top_terms},
all terms are plotted.}

\item{plot_by_cluster}{boolean value indicating whether or not to group the
enriched terms by cluster (works if \code{result_df} contains a
'Cluster' column).}

\item{num_bubbles}{number of sizes displayed in the legend \code{# genes}
(Default = 4)}

\item{even_breaks}{whether or not to set even breaks for the number of sizes
displayed in the legend \code{# genes}. If \code{TRUE} (default), sets
equal breaks and the number of displayed bubbles may be different than the
number set by \code{num_bubbles}. If the exact number set by
\code{num_bubbles} is required, set this argument to \code{FALSE}}
}
\value{
a \code{\link[ggplot2]{ggplot2}} object containing the bubble chart.
The x-axis corresponds to fold enrichment values while the y-axis indicates
the enriched terms. Size of the bubble indicates the number of significant
genes in the given enriched term. Color indicates the -log10(lowest-p) value.
The closer the color is to red, the more significant the enrichment is.
Optionally, if 'Cluster' is a column of \code{result_df} and
\code{plot_by_cluster == TRUE}, the enriched terms are grouped by clusters.
}
\description{
This function is used to create a ggplot2 bubble chart displaying the
enrichment results.
}
\examples{
g <- enrichment_chart(example_pathfindR_output)
}


================================================
FILE: man/fetch_gene_set.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/utility.R
\name{fetch_gene_set}
\alias{fetch_gene_set}
\title{Fetch Gene Set Objects}
\usage{
fetch_gene_set(
  gene_sets = "KEGG",
  min_gset_size = 10,
  max_gset_size = 300,
  custom_genes = NULL,
  custom_descriptions = NULL
)
}
\arguments{
\item{gene_sets}{Name of the gene sets to be used for enrichment analysis.
Available gene sets are 'KEGG', 'Reactome', 'BioCarta', 'GO-All',
'GO-BP', 'GO-CC', 'GO-MF', 'cell_markers', 'mmu_KEGG' or 'Custom'.
If 'Custom', the arguments \code{custom_genes} and \code{custom_descriptions}
must be specified. (Default = 'KEGG')}

\item{min_gset_size}{minimum number of genes a term must contain (default = 10)}

\item{max_gset_size}{maximum number of genes a term must contain (default = 300)}

\item{custom_genes}{a list containing the genes involved in each custom
term. Each element is a vector of gene symbols located in the given custom
term. Names should correspond to the IDs of the custom terms.}

\item{custom_descriptions}{A vector containing the descriptions for each
custom  term. Names of the vector should correspond to the IDs of the custom
terms.}
}
\value{
a list containing 2 elements \describe{
  \item{genes_by_term}{list of vectors of genes contained in each term}
  \item{term_descriptions}{vector of descriptions per each term}
}
}
\description{
Function for obtaining the gene sets per term and the term descriptions to
be used for enrichment analysis.
}
\examples{
KEGG_gset <- fetch_gene_set()
GO_MF_gset <- fetch_gene_set('GO-MF', min_gset_size = 20, max_gset_size = 100)
}


================================================
FILE: man/fetch_java_version.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/zzz.R
\name{fetch_java_version}
\alias{fetch_java_version}
\title{Obtain Java Version}
\usage{
fetch_java_version()
}
\value{
character vector containing the output of 'java -version'
}
\description{
Obtain Java Version
}
\details{
this function was adapted from the CRAN package \code{sparklyr}
}


================================================
FILE: man/filterActiveSnws.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/active_snw_search.R
\name{filterActiveSnws}
\alias{filterActiveSnws}
\title{Parse Active Subnetwork Search Output File and Filter the Subnetworks}
\usage{
filterActiveSnws(
  active_snw_path,
  sig_genes_vec,
  score_quan_thr = 0.8,
  sig_gene_thr = 0.02
)
}
\arguments{
\item{active_snw_path}{path to the output of an Active Subnetwork Search}

\item{sig_genes_vec}{vector of significant gene symbols. In the scope of this
package, these are the input genes that were used for active subnetwork search}

\item{score_quan_thr}{active subnetwork score quantile threshold. Must be
between 0 and 1 or set to -1 for not filtering. (Default = 0.8)}

\item{sig_gene_thr}{threshold for the minimum proportion of significant genes in
the subnetwork (Default = 0.02) If the number of genes to use as threshold is
calculated to be < 2 (e.g. 50 signif. genes x 0.01 = 0.5), the threshold number
is set to 2}
}
\value{
A list containing \code{subnetworks}: a list of of genes in every
active subnetwork that has a score greater than the \code{score_quan_thr}th
quantile and that contains at least \code{sig_gene_thr} of significant genes
and \code{scores} the score of each filtered active subnetwork
}
\description{
Parse Active Subnetwork Search Output File and Filter the Subnetworks
}
\examples{
path2snw_list <- system.file(
  'extdata/resultActiveSubnetworkSearch.txt',
  package = 'pathfindR'
)
filtered <- filterActiveSnws(
  active_snw_path = path2snw_list,
  sig_genes_vec = example_pathfindR_input$Gene.symbol
)
}
\seealso{
See \code{\link{run_pathfindR}} for the wrapper function of the
  pathfindR enrichment workflow
}


================================================
FILE: man/fuzzy_term_clustering.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/clustering.R
\name{fuzzy_term_clustering}
\alias{fuzzy_term_clustering}
\title{Heuristic Fuzzy Multiple-linkage Partitioning of Enriched Terms}
\usage{
fuzzy_term_clustering(
  kappa_mat,
  enrichment_res,
  kappa_threshold = 0.35,
  use_description = FALSE
)
}
\arguments{
\item{kappa_mat}{matrix of kappa statistics (output of \code{\link{create_kappa_matrix}})}

\item{enrichment_res}{data frame of pathfindR enrichment results. Must-have
columns are 'Term_Description' (if \code{use_description = TRUE}) or 'ID'
(if \code{use_description = FALSE}), 'Down_regulated', and 'Up_regulated'.
If \code{use_active_snw_genes = TRUE}, 'non_Signif_Snw_Genes' must also be
provided.}

\item{kappa_threshold}{threshold for kappa statistics, defining strong
relation (default = 0.35)}

\item{use_description}{Boolean argument to indicate whether term descriptions
(in the 'Term_Description' column) should be used. (default = \code{FALSE})}
}
\value{
a boolean matrix of cluster assignments. Each row corresponds to an
enriched term, each column corresponds to a cluster.
}
\description{
Heuristic Fuzzy Multiple-linkage Partitioning of Enriched Terms
}
\details{
The fuzzy clustering algorithm was implemented based on:
Huang DW, Sherman BT, Tan Q, et al. The DAVID Gene Functional
Classification Tool: a novel biological module-centric algorithm to
functionally analyze large gene lists. Genome Biol. 2007;8(9):R183.
}
\examples{
\dontrun{
fuzzy_term_clustering(kappa_mat, enrichment_res)
fuzzy_term_clustering(kappa_mat, enrichment_res, kappa_threshold = 0.45)
}
}


================================================
FILE: man/get_biogrid_pin.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/data_generation.R
\name{get_biogrid_pin}
\alias{get_biogrid_pin}
\title{Retrieve the Requested Release of Organism-specific BioGRID PIN}
\usage{
get_biogrid_pin(org = "Homo_sapiens", path2pin, release = "latest")
}
\arguments{
\item{org}{organism name. BioGRID naming requires underscores for spaces so
'Homo sapiens' becomes 'Homo_sapiens', 'Mus musculus' becomes 'Mus_musculus'
etc. See \url{https://wiki.thebiogrid.org/doku.php/statistics} for a full
list of available organisms (default = 'Homo_sapiens')}

\item{path2pin}{the path of the file to save the PIN data. By default, the
PIN data is saved in a temporary file}

\item{release}{the requested BioGRID release (default = 'latest')}
}
\value{
the path of the file in which the PIN data was saved. If
\code{path2pin} was not supplied by the user, the PIN data is saved in a
temporary file
}
\description{
Retrieve the Requested Release of Organism-specific BioGRID PIN
}


================================================
FILE: man/get_gene_sets_list.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/data_generation.R
\name{get_gene_sets_list}
\alias{get_gene_sets_list}
\title{Retrieve Organism-specific Gene Sets List}
\usage{
get_gene_sets_list(
  source = "KEGG",
  org_code = "hsa",
  species = "Homo sapiens",
  db_species = "HS",
  collection,
  subcollection = NULL
)
}
\arguments{
\item{source}{As of this version, either 'KEGG', 'Reactome' or 'MSigDB' (default = 'KEGG')}

\item{org_code}{(Used for 'KEGG' only) KEGG organism code for the selected organism. For a full list
of all available organisms, see \url{https://www.genome.jp/kegg/catalog/org_list.html}}

\item{species}{species name for output genes, such as Homo sapiens, Mus musculus, etc.
See \code{\link[msigdbr]{msigdbr_species}} for all the species available in
the msigdbr package.}

\item{db_species}{Species abbreviation for the human or mouse databases ("HS" or "MM").}

\item{collection}{collection. e.g., H, C1. (default = NULL,
i.e. list all gene sets in collection). 
See \code{\link[msigdbr]{msigdbr_collections}} for all available options
the msigdbr package.}

\item{subcollection}{sub-collection, such as CGP, BP, etc. (default = NULL,
i.e. list all gene sets in collection). 
See \code{\link[msigdbr]{msigdbr_collections}} for all available options
the msigdbr package.}
}
\value{
A list containing 2 elements: \itemize{
\item{gene_sets - A list containing the genes involved in each gene set}
\item{descriptions - A named vector containing the descriptions for each gene set}
}. For 'KEGG' and 'MSigDB', it is possible to choose a specific organism. For a full list
of all available KEGG organisms, see \url{https://www.genome.jp/kegg/catalog/org_list.html}.
See \code{\link[msigdbr]{msigdbr_species}} for all the species available in
the msigdbr package used for obtaining 'MSigDB' gene sets.
For Reactome, there is only one collection of pathway gene sets.
}
\description{
Retrieve Organism-specific Gene Sets List
}


================================================
FILE: man/get_kegg_gsets.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/data_generation.R
\name{get_kegg_gsets}
\alias{get_kegg_gsets}
\title{Retrieve Organism-specific KEGG Pathway Gene Sets}
\usage{
get_kegg_gsets(org_code = "hsa")
}
\arguments{
\item{org_code}{KEGG organism code for the selected organism. For a full list
of all available organisms, see \url{https://www.genome.jp/kegg/catalog/org_list.html}}
}
\value{
list containing 2 elements: \itemize{
\item{gene_sets - A list containing KEGG IDs for the genes involved in each KEGG pathway}
\item{descriptions - A named vector containing the descriptions for each KEGG pathway}
}
}
\description{
Retrieve Organism-specific KEGG Pathway Gene Sets
}


================================================
FILE: man/get_mgsigdb_gsets.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/data_generation.R
\name{get_mgsigdb_gsets}
\alias{get_mgsigdb_gsets}
\title{Retrieve Organism-specific MSigDB Gene Sets}
\usage{
get_mgsigdb_gsets(
  species = "Homo sapiens",
  db_species = "HS",
  collection = NULL,
  subcollection = NULL
)
}
\arguments{
\item{species}{species name for output genes, such as Homo sapiens, Mus musculus, etc.
See \code{\link[msigdbr]{msigdbr_species}} for all the species available in
the msigdbr package.}

\item{db_species}{Species abbreviation for the human or mouse databases ("HS" or "MM").}

\item{collection}{collection. e.g., H, C1. (default = NULL,
i.e. list all gene sets in collection). 
See \code{\link[msigdbr]{msigdbr_collections}} for all available options
the msigdbr package.}

\item{subcollection}{sub-collection, such as CGP, BP, etc. (default = NULL,
i.e. list all gene sets in collection). 
See \code{\link[msigdbr]{msigdbr_collections}} for all available options
the msigdbr package.}
}
\value{
Retrieves the MSigDB gene sets and returns a list containing 2 elements: \itemize{
\item{gene_sets - A list containing the genes involved in each of the selected MSigDB gene sets}
\item{descriptions - A named vector containing the descriptions for each selected MSigDB gene set}
}
}
\description{
Retrieve Organism-specific MSigDB Gene Sets
}
\details{
this function utilizes the function \code{\link[msigdbr]{msigdbr}}
from the \code{msigdbr} package to retrieve the 'Molecular Signatures Database'
(MSigDB) gene sets (Subramanian et al. 2005 <doi:10.1073/pnas.0506580102>,
Liberzon et al. 2015 <doi:10.1016/j.cels.2015.12.004>).
Available collections are: H: hallmark gene sets, C1: positional gene sets,
C2: curated gene sets, C3: motif gene sets, C4: computational gene sets,
C5: GO gene sets, C6: oncogenic signatures and C7: immunologic signatures
}


================================================
FILE: man/get_pin_file.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/data_generation.R
\name{get_pin_file}
\alias{get_pin_file}
\title{Retrieve Organism-specific PIN data}
\usage{
get_pin_file(source = "BioGRID", org = "Homo_sapiens", path2pin, ...)
}
\arguments{
\item{source}{As of this version, this function is implemented to get data
from 'BioGRID' only. This argument (and this wrapper function) was implemented
for future utility}

\item{org}{organism name. BioGRID naming requires underscores for spaces so
'Homo sapiens' becomes 'Homo_sapiens', 'Mus musculus' becomes 'Mus_musculus'
etc. See \url{https://wiki.thebiogrid.org/doku.php/statistics} for a full
list of available organisms (default = 'Homo_sapiens')}

\item{path2pin}{the path of the file to save the PIN data. By default, the
PIN data is saved in a temporary file}

\item{...}{additional arguments for \code{\link{get_biogrid_pin}}}
}
\value{
the path of the file in which the PIN data was saved. If
\code{path2pin} was not supplied by the user, the PIN data is saved in a
temporary file
}
\description{
Retrieve Organism-specific PIN data
}
\examples{
\dontrun{
pin_path <- get_pin_file()
}
}


================================================
FILE: man/get_reactome_gsets.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/data_generation.R
\name{get_reactome_gsets}
\alias{get_reactome_gsets}
\title{Retrieve Reactome Pathway Gene Sets}
\usage{
get_reactome_gsets()
}
\value{
Gets the latest Reactome pathways gene sets in gmt format. Parses the
gmt file and returns a list containing 2 elements: \itemize{
\item{gene_sets - A list containing the genes involved in each Reactome pathway}
\item{descriptions - A named vector containing the descriptions for each Reactome pathway}
}
}
\description{
Retrieve Reactome Pathway Gene Sets
}


================================================
FILE: man/gset_list_from_gmt.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/data_generation.R
\name{gset_list_from_gmt}
\alias{gset_list_from_gmt}
\title{Retrieve Gene Sets from GMT-format File}
\usage{
gset_list_from_gmt(path2gmt, descriptions_idx = 2)
}
\arguments{
\item{path2gmt}{path to the gmt file}

\item{descriptions_idx}{index for descriptions (default = 2)}
}
\value{
list containing 2 elements: \itemize{
\item{gene_sets - A list containing the genes involved in each gene set}
\item{descriptions - A named vector containing the descriptions for each gene set}
}
}
\description{
Retrieve Gene Sets from GMT-format File
}


================================================
FILE: man/hierarchical_term_clustering.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/clustering.R
\name{hierarchical_term_clustering}
\alias{hierarchical_term_clustering}
\title{Hierarchical Clustering of Enriched Terms}
\usage{
hierarchical_term_clustering(
  kappa_mat,
  enrichment_res,
  num_clusters = NULL,
  use_description = FALSE,
  clu_method = "average",
  plot_hmap = FALSE,
  plot_dend = TRUE
)
}
\arguments{
\item{kappa_mat}{matrix of kappa statistics (output of \code{\link{create_kappa_matrix}})}

\item{enrichment_res}{data frame of pathfindR enrichment results. Must-have
columns are 'Term_Description' (if \code{use_description = TRUE}) or 'ID'
(if \code{use_description = FALSE}), 'Down_regulated', and 'Up_regulated'.
If \code{use_active_snw_genes = TRUE}, 'non_Signif_Snw_Genes' must also be
provided.}

\item{num_clusters}{number of clusters to be formed (default = \code{NULL}).
If \code{NULL}, the optimal number of clusters is determined as the number
which yields the highest average silhouette width.}

\item{use_description}{Boolean argument to indicate whether term descriptions
(in the 'Term_Description' column) should be used. (default = \code{FALSE})}

\item{clu_method}{the agglomeration method to be used
(default = 'average', see \code{\link[stats]{hclust}})}

\item{plot_hmap}{boolean to indicate whether to plot the kappa statistics
clustering heatmap or not (default = FALSE)}

\item{plot_dend}{boolean to indicate whether to plot the clustering
dendrogram partitioned into the optimal number of clusters (default = TRUE)}
}
\value{
a vector of clusters for each enriched term in the enrichment results.
}
\description{
Hierarchical Clustering of Enriched Terms
}
\details{
The function initially performs hierarchical clustering
of the enriched terms in \code{enrichment_res} using the kappa statistics
(defining the distance as \code{1 - kappa_statistic}). Next,
the clustering dendrogram is cut into k = 2, 3, ..., n - 1 clusters
(where n is the number of terms). The optimal number of clusters is
determined as the k value which yields the highest average silhouette width.
(if \code{num_clusters} not specified)
}
\examples{
\dontrun{
hierarchical_term_clustering(kappa_mat, enrichment_res)
hierarchical_term_clustering(kappa_mat, enrichment_res, method = 'complete')
}
}


================================================
FILE: man/hyperg_test.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/enrichment.R
\name{hyperg_test}
\alias{hyperg_test}
\title{Hypergeometric Distribution-based Hypothesis Testing}
\usage{
hyperg_test(term_genes, chosen_genes, background_genes)
}
\arguments{
\item{term_genes}{vector of genes in the selected term gene set}

\item{chosen_genes}{vector containing the set of input genes}

\item{background_genes}{vector of background genes (i.e. universal set of
genes in the experiment)}
}
\value{
the p-value as determined using the hypergeometric distribution.
}
\description{
Hypergeometric Distribution-based Hypothesis Testing
}
\details{
To determine whether the \code{chosen_genes} are enriched
(compared to a background pool of genes) in the \code{term_genes}, the
hypergeometric distribution is assumed and the appropriate p value
(the value under the right tail) is calculated and returned.
}
\examples{
hyperg_test(letters[1:5], letters[2:5], letters)
hyperg_test(letters[1:5], letters[2:10], letters)
hyperg_test(letters[1:5], letters[2:13], letters)
}


================================================
FILE: man/input_processing.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/utility.R
\name{input_processing}
\alias{input_processing}
\title{Process Input}
\usage{
input_processing(
  input,
  p_val_threshold = 0.05,
  pin_name_path = "Biogrid",
  convert2alias = TRUE
)
}
\arguments{
\item{input}{the input data that pathfindR uses. The input must be a data
  frame with three columns: \enumerate{
  \item Gene Symbol (Gene Symbol)
  \item Change value, e.g. log(fold change) (OPTIONAL)
  \item p value, e.g. adjusted p value associated with differential expression
}}

\item{p_val_threshold}{the p value threshold to use when filtering
the input data frame. Must a numeric value between 0 and 1. (default = 0.05)}

\item{pin_name_path}{Name of the chosen PIN or absolute/path/to/PIN.sif. If PIN name,
must be one of c('Biogrid', 'STRING', 'GeneMania', 'IntAct', 'KEGG', 'mmu_STRING'). If
path/to/PIN.sif, the file must comply with the PIN specifications. (Default = 'Biogrid')}

\item{convert2alias}{boolean to indicate whether or not to convert gene symbols
in the input that are not found in the PIN to an alias symbol found in the PIN
(default = TRUE) IMPORTANT NOTE: the conversion uses human gene symbols/alias symbols.}
}
\value{
This function first filters the input so that all p values are less
  than or equal to the threshold. Next, gene symbols that are not found in
  the PIN are identified. If aliases of these gene symbols are found in the
  PIN, the symbols are converted to the corresponding aliases. The
  resulting data frame containing the original gene symbols, the updated
  symbols, change values and p values is then returned.
}
\description{
Process Input
}
\examples{
processed_df <- input_processing(
  input = example_pathfindR_input[1:5, ],
  pin_name_path = 'KEGG'
)
processed_df <- input_processing(
  input = example_pathfindR_input[1:5, ],
  pin_name_path = 'KEGG',
  convert2alias = FALSE
)
}
\seealso{
See \code{\link{run_pathfindR}} for the wrapper function of the
  pathfindR workflow
}


================================================
FILE: man/input_testing.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/utility.R
\name{input_testing}
\alias{input_testing}
\title{Input Testing}
\usage{
input_testing(input, p_val_threshold = 0.05)
}
\arguments{
\item{input}{the input data that pathfindR uses. The input must be a data
  frame with three columns: \enumerate{
  \item Gene Symbol (Gene Symbol)
  \item Change value, e.g. log(fold change) (OPTIONAL)
  \item p value, e.g. adjusted p value associated with differential expression
}}

\item{p_val_threshold}{the p value threshold to use when filtering
the input data frame. Must a numeric value between 0 and 1. (default = 0.05)}
}
\value{
Only checks if the input and the threshold follows the required
  specifications.
}
\description{
Input Testing
}
\examples{
input_testing(example_pathfindR_input, 0.05)
}
\seealso{
See \code{\link{run_pathfindR}} for the wrapper function of the
  pathfindR workflow
}


================================================
FILE: man/isColor.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/visualization.R
\name{isColor}
\alias{isColor}
\title{Check if value is a valid color}
\usage{
isColor(x)
}
\arguments{
\item{x}{value}
}
\value{
TRUE if x is a valid color, otherwise FALSE
}
\description{
Check if value is a valid color
}


================================================
FILE: man/pathfindr.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/pathfindr.R
\docType{package}
\name{pathfindR}
\alias{pathfindR-package}
\alias{pathfindR}
\title{pathfindR: A package for Enrichment Analysis Utilizing Active Subnetworks}
\description{
pathfindR is a tool for active-subnetwork-oriented gene set enrichment analysis.
The main aim of the package is to identify active subnetworks in a
protein-protein interaction network using a user-provided list of genes
and associated p values then performing enrichment analyses on the identified
subnetworks, discovering enriched terms (i.e. pathways, gene ontology, TF target
gene sets etc.) that possibly underlie the phenotype of interest.
}
\details{
For analysis on non-Homo sapiens organisms, pathfindR offers utility functions
for obtaining organism-specific PIN data and organism-specific gene sets data.

pathfindR also offers functionalities to cluster the enriched terms and
identify representative terms in each cluster, to score the enriched terms
per sample and to visualize analysis results.
}
\seealso{
See \code{\link{run_pathfindR}} for details on the pathfindR
active-subnetwork-oriented enrichment analysis
See \code{\link{cluster_enriched_terms}} for details on methods of enriched
terms clustering to define clusters of biologically-related terms
See \code{\link{score_terms}} for details on agglomerated score calculation
for enriched terms to investigate how a gene set is altered in a given sample
(or in cases vs. controls)
See \code{\link{term_gene_heatmap}} for details on visualization of the heatmap
of enriched terms by involved genes
See \code{\link{term_gene_graph}} for details on visualizing terms and
term-related genes as a graph to determine the degree of overlap between the
enriched terms by identifying shared and/or distinct significant genes
See \code{\link{UpSet_plot}} for details on creating an UpSet plot of the
enriched terms.
See \code{\link{get_pin_file}} for obtaining organism-specific PIN data and
\code{\link{get_gene_sets_list}} for obtaining organism-specific gene sets data
}
\author{
\strong{Maintainer}: Ege Ulgen \email{egeulgen@gmail.com} (\href{https://orcid.org/0000-0003-2090-3621}{ORCID}) [copyright holder]

Authors:
\itemize{
  \item Ozan Ozisik \email{ozanytu@gmail.com} (\href{https://orcid.org/0000-0001-5980-8002}{ORCID})
}

}


================================================
FILE: man/plot_scores.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/scoring.R
\name{plot_scores}
\alias{plot_scores}
\title{Plot the Heatmap of Score Matrix of Enriched Terms per Sample}
\usage{
plot_scores(
  score_matrix,
  cases = NULL,
  label_samples = TRUE,
  case_title = "Case",
  control_title = "Control",
  low = "green",
  mid = "black",
  high = "red"
)
}
\arguments{
\item{score_matrix}{Matrix of agglomerated enriched term scores per sample. Columns are
samples, rows are enriched terms}

\item{cases}{(Optional) A vector of sample names that are cases in the
case/control experiment. (default = NULL)}

\item{label_samples}{Boolean value to indicate whether or not to label the
samples in the heatmap plot (default = TRUE)}

\item{case_title}{Naming of the 'Case' group (as in \code{cases}) (default = 'Case')}

\item{control_title}{Naming of the 'Control' group (default = 'Control')}

\item{low}{a string indicating the color of 'low' values in the coloring gradient (default = 'green')}

\item{mid}{a string indicating the color of 'mid' values in the coloring gradient (default = 'black')}

\item{high}{a string indicating the color of 'high' values in the coloring gradient (default = 'red')}
}
\value{
A `ggplot2` object containing the heatmap plot. x-axis indicates
the samples. y-axis indicates the enriched terms. 'Score' indicates the
score of the term in a given sample. If \code{cases} are provided, the plot is
divided into 2 facets, named by \code{case_title} and \code{control_title}.
}
\description{
Plot the Heatmap of Score Matrix of Enriched Terms per Sample
}
\examples{
score_matrix <- score_terms(
  example_pathfindR_output,
  example_experiment_matrix,
  plot_hmap = FALSE
)
hmap <- plot_scores(score_matrix)
}


================================================
FILE: man/process_pin.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/data_generation.R
\name{process_pin}
\alias{process_pin}
\title{Process Data frame of Protein-protein Interactions}
\usage{
process_pin(pin_df)
}
\arguments{
\item{pin_df}{data frame of protein-protein interactions with 2 columns:
'Interactor_A' and 'Interactor_B'}
}
\value{
processed PIN data frame (removes self-interactions and
duplicated interactions)
}
\description{
Process Data frame of Protein-protein Interactions
}


================================================
FILE: man/return_pin_path.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/utility.R
\name{return_pin_path}
\alias{return_pin_path}
\title{Return The Path to Given Protein-Protein Interaction Network (PIN)}
\usage{
return_pin_path(pin_name_path = "Biogrid")
}
\arguments{
\item{pin_name_path}{Name of the chosen PIN or absolute/path/to/PIN.sif. If PIN name,
must be one of c('Biogrid', 'STRING', 'GeneMania', 'IntAct', 'KEGG', 'mmu_STRING'). If
path/to/PIN.sif, the file must comply with the PIN specifications. (Default = 'Biogrid')}
}
\value{
The absolute path to chosen PIN.
}
\description{
This function returns the absolute path/to/PIN.sif. While the default PINs are
'Biogrid', 'STRING', 'GeneMania', 'IntAct', 'KEGG' and 'mmu_STRING'. The user can also
use any other PIN by specifying the 'path/to/PIN.sif'. All PINs to be used
in this package must formatted as SIF files: i.e. have 3 columns with no
header, no row names and be tab-separated. Columns 1 and 3 must be
interactors' gene symbols, column 2 must be a column with all
rows consisting of 'pp'.
}
\examples{
\dontrun{
pin_path <- return_pin_path('GeneMania')
}
}
\seealso{
See \code{\link{run_pathfindR}} for the wrapper function of the
  pathfindR workflow
}


================================================
FILE: man/run_pathfindr.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/core.R
\name{run_pathfindR}
\alias{run_pathfindR}
\title{Wrapper Function for pathfindR - Active-Subnetwork-Oriented Enrichment Workflow}
\usage{
run_pathfindR(
  input,
  gene_sets = "KEGG",
  min_gset_size = 10,
  max_gset_size = 300,
  custom_genes = NULL,
  custom_descriptions = NULL,
  pin_name_path = "Biogrid",
  p_val_threshold = 0.05,
  enrichment_threshold = 0.05,
  convert2alias = TRUE,
  plot_enrichment_chart = TRUE,
  output_dir = NULL,
  list_active_snw_genes = FALSE,
  ...
)
}
\arguments{
\item{input}{the input data that pathfindR uses. The input must be a data
  frame with three columns: \enumerate{
  \item Gene Symbol (Gene Symbol)
  \item Change value, e.g. log(fold change) (OPTIONAL)
  \item p value, e.g. adjusted p value associated with differential expression
}}

\item{gene_sets}{Name of the gene sets to be used for enrichment analysis.
Available gene sets are 'KEGG', 'Reactome', 'BioCarta', 'GO-All',
'GO-BP', 'GO-CC', 'GO-MF', 'cell_markers', 'mmu_KEGG' or 'Custom'.
If 'Custom', the arguments \code{custom_genes} and \code{custom_descriptions}
must be specified. (Default = 'KEGG')}

\item{min_gset_size}{minimum number of genes a term must contain (default = 10)}

\item{max_gset_size}{maximum number of genes a term must contain (default = 300)}

\item{custom_genes}{a list containing the genes involved in each custom
term. Each element is a vector of gene symbols located in the given custom
term. Names should correspond to the IDs of the custom terms.}

\item{custom_descriptions}{A vector containing the descriptions for each
custom  term. Names of the vector should correspond to the IDs of the custom
terms.}

\item{pin_name_path}{Name of the chosen PIN or absolute/path/to/PIN.sif. If PIN name,
must be one of c('Biogrid', 'STRING', 'GeneMania', 'IntAct', 'KEGG', 'mmu_STRING'). If
path/to/PIN.sif, the file must comply with the PIN specifications. (Default = 'Biogrid')}

\item{p_val_threshold}{the p value threshold to use when filtering
the input data frame. Must a numeric value between 0 and 1. (default = 0.05)}

\item{enrichment_threshold}{adjusted-p value threshold used when filtering
enrichment results (default = 0.05)}

\item{convert2alias}{boolean to indicate whether or not to convert gene symbols
in the input that are not found in the PIN to an alias symbol found in the PIN
(default = TRUE) IMPORTANT NOTE: the conversion uses human gene symbols/alias symbols.}

\item{plot_enrichment_chart}{boolean value. If TRUE, a bubble chart displaying
the enrichment results is plotted. (default = TRUE)}

\item{output_dir}{the directory to be created where the output and intermediate
files are saved (default = \code{NULL}, a temporary directory is used)}

\item{list_active_snw_genes}{boolean value indicating whether or not to report
the non-significant active subnetwork genes for the active subnetwork which was enriched for
the given term with the lowest p value (default = \code{FALSE})}

\item{...}{additional arguments for \code{\link{active_snw_enrichment_wrapper}}}
}
\value{
Data frame of pathfindR enrichment results. Columns are: \describe{
  \item{ID}{ID of the enriched term}
  \item{Term_Description}{Description of the enriched term}
  \item{Fold_Enrichment}{Fold enrichment value for the enriched term (Calculated using ONLY the input genes)}
  \item{occurrence}{the number of iterations that the given term was found to enriched over all iterations}
  \item{support}{the median support (proportion of active subnetworks leading to enrichment within an iteration) over all iterations}
  \item{lowest_p}{the lowest adjusted-p value of the given term over all iterations}
  \item{highest_p}{the highest adjusted-p value of the given term over all iterations}
  \item{non_Signif_Snw_Genes (OPTIONAL)}{the non-significant active subnetwork genes, comma-separated}
  \item{Up_regulated}{the up-regulated genes (as determined by `change value` > 0, if the `change column` was provided) in the input involved in the given term's gene set, comma-separated. If change column not provided, all affected are listed here.}
  \item{Down_regulated}{the down-regulated genes (as determined by `change value` < 0, if the `change column` was provided) in the input involved in the given term's gene set, comma-separated}
}
 The function also creates an HTML report with the pathfindR enrichment
 results linked to the visualizations of the enriched terms in addition to
 the table of converted gene symbols. This report can be found in
 '\code{output_dir}/results.html' under the current working directory.

 By default, a bubble chart of top 10 enrichment results are plotted. The x-axis
 corresponds to fold enrichment values while the y-axis indicates the enriched
 terms. Sizes of the bubbles indicate the number of significant genes in the given terms.
 Color indicates the -log10(lowest-p) value; the more red it is, the more
 significant the enriched term is. See \code{\link{enrichment_chart}}.
}
\description{
\code{run_pathfindR} is the wrapper function for the pathfindR workflow
}
\details{
This function takes in a data frame consisting of Gene Symbol, log-fold-change
and adjusted-p values. After input testing, any gene symbols that are not in
the PIN are converted to alias symbols if the alias is in the PIN. Next,
active subnetwork search is performed. Enrichment analysis is
performed using the genes in each of the active subnetworks. Terms with
adjusted-p values lower than \code{enrichment_threshold} are discarded. The
lowest adjusted-p value (over all subnetworks) for each term is kept. This
process of active subnetwork search and enrichment is repeated  for a selected
number of \code{iterations}, which is done in parallel. Over all iterations,
the lowest and the highest adjusted-p values, as well as number of occurrences
are reported for each enriched term.
}
\section{Warning}{
 Especially depending on the protein interaction network,
 the algorithm and the number of iterations you choose, 'active subnetwork
 search + enrichment' component of \code{run_pathfindR} may take a long time to finish.
}

\examples{
\dontrun{
run_pathfindR(example_pathfindR_input)
}
}
\seealso{
\code{\link{input_testing}} for input testing, \code{\link{input_processing}} for input processing,
\code{\link{active_snw_search}} for active subnetwork search and subnetwork filtering,
\code{\link{enrichment_analyses}} for enrichment analysis (using the active subnetworks),
\code{\link{summarize_enrichment_results}} for summarizing the active-subnetwork-oriented enrichment results,
\code{\link{annotate_term_genes}} for annotation of affected genes in the given gene sets,
\code{\link{visualize_terms}} for visualization of enriched terms,
\code{\link{enrichment_chart}} for a visual summary of the pathfindR enrichment results,
\code{\link[foreach]{foreach}} for details on parallel execution of looping constructs,
\code{\link{cluster_enriched_terms}} for clustering the resulting enriched terms and partitioning into clusters.
}


================================================
FILE: man/safe_get_content.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/data_generation.R
\name{safe_get_content}
\alias{safe_get_content}
\title{Safely download and parse web content}
\usage{
safe_get_content(url, ..., timeout_sec = 10)
}
\arguments{
\item{url}{Character string. The URL of the resource to download.}

\item{...}{Additional arguments passed to \code{\link[httr]{GET}}.}

\item{timeout_sec}{Numeric. Timeout in seconds for the request (default = 10).}
}
\value{
A character string containing the parsed content of the response 
  (UTF-8 encoded). On failure, an error is raised with a clear message.
}
\description{
This helper function retrieves content from a given URL using \pkg{httr}.  
It ensures that common issues (e.g. no internet, timeouts, HTTP errors, 
or parsing errors) are handled gracefully with clear, informative error messages.
}
\details{
This function is intended for use inside package functions.  
For examples, vignettes, or tests, wrap calls in a connectivity check 
(e.g. using \code{http_error(HEAD(url))}) to avoid CRAN failures 
when the resource is temporarily unavailable.
}
\examples{
\dontrun{
# Retrieve the latest BioGRID release page
result <- safe_get_content("https://downloads.thebiogrid.org/BioGRID/Latest-Release/")
}

}


================================================
FILE: man/score_terms.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/scoring.R
\name{score_terms}
\alias{score_terms}
\title{Calculate Agglomerated Scores of Enriched Terms for Each Subject}
\usage{
score_terms(
  enrichment_table,
  exp_mat,
  cases = NULL,
  use_description = FALSE,
  plot_hmap = TRUE,
  ...
)
}
\arguments{
\item{enrichment_table}{a data frame that must contain the 3 columns below: \describe{
  \item{Term_Description}{Description of the enriched term (necessary if \code{use_description = TRUE})}
  \item{ID}{ID of the enriched term (necessary if \code{use_description = FALSE})}
  \item{Up_regulated}{the up-regulated genes in the input involved in the given term's gene set, comma-separated}
  \item{Down_regulated}{the down-regulated genes in the input involved in the given term's gene set, comma-separated}
}}

\item{exp_mat}{the experiment (e.g., gene expression/methylation) matrix.
Columns are samples and rows are genes. Column names must contain sample
names and row names must contain the gene symbols.}

\item{cases}{(Optional) A vector of sample names that are cases in the
case/control experiment. (default = NULL)}

\item{use_description}{Boolean argument to indicate whether term descriptions
(in the 'Term_Description' column) should be used. (default = \code{FALSE})}

\item{plot_hmap}{Boolean value to indicate whether or not to draw the
heatmap plot of the scores. (default = TRUE)}

\item{...}{Additional arguments for \code{\link{plot_scores}} for aesthetics
of the heatmap plot}
}
\value{
Matrix of agglomerated scores of each enriched term per sample.
Columns are samples, rows are enriched terms. Optionally, displays a heatmap
of this matrix.
}
\description{
Calculate Agglomerated Scores of Enriched Terms for Each Subject
}
\section{Conceptual Background}{

For an experiment matrix (containing expression, methylation, etc. values),
the rows of which are genes and the columns of which are samples,
we denote: \itemize{
\item E as a matrix of size \ifelse{html}{\out{m x n}}{\eqn{m \times n}}
\item G as the set of all genes in the experiment \ifelse{html}{\out{G = E<sub>i.</sub>,  i &#8712; [1, m]}}{\eqn{G = E_{i\cdot},  \ \ i \in [1, m]}}
\item S as the set of all samples in the experiment \ifelse{html}{\out{S = E<sub>.j</sub>,  i &#8712; [1, n]}}{\eqn{S = E_{j\cdot},  \ \ \in [1, n]}}
}

We next define the gene score matrix GS (the standardized experiment matrix,
also of size \ifelse{html}{\out{m x n}}{\eqn{m \times n}}) as:

\ifelse{html}{\out{GS<sub>gs</sub> = (E<sub>gs</sub> - &#x113;<sub>g</sub>) / s<sub>g</sub>}}{\eqn{GS_{gs} = \frac{E_{gs} - \bar{e_g}}{s_g}}}

where \ifelse{html}{\out{g &#8712; G}}{\eqn{g \in G}}, \ifelse{html}{\out{s &#8712; S}}{\eqn{s \in S}},
\ifelse{html}{\out{&#x113;<sub>g</sub>}}{\eqn{\bar{e_g}}} is the mean of
all values for gene g and \ifelse{html}{\out{s<sub>g</sub>}}{\eqn{\bar{s_g}}}
is the standard deviation of all values for gene g.

We next denote T to be a set of terms (where each \ifelse{html}{\out{t &#8712; T}}{\eqn{t \in T}}
is a set of term-related genes, i.e.,
\ifelse{html}{\out{t = \{g<sub>x</sub>, ..., g<sub>y</sub>\} &sub; G}}{\eqn{t = \{g_x, ..., g_y\} \subset G}})
and finally define the agglomerated term scores matrix TS (where rows
correspond to genes and columns corresponds to samples s.t. the matrix has size
\ifelse{html}{\out{|T| x n}}{\eqn{|T| \times n}}) as:

\ifelse{html}{\out{TS<sub>ts</sub> = 1/|t| &#x2211; <sub>g &#8712; t</sub> GS<sub>gs</sub>}}{\eqn{TS_{ts} = \frac{1}{|t|}\sum_{g \in t} GS_{gs}}},
where \ifelse{html}{\out{t &#8712; T}}{\eqn{t \in T}} and \ifelse{html}{\out{s &#8712; S}}{\eqn{s \in S}}.
}

\examples{
score_matrix <- score_terms(
  example_pathfindR_output,
  example_experiment_matrix,
  plot_hmap = FALSE
)
}


================================================
FILE: man/single_iter_wrapper.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/utility.R
\name{single_iter_wrapper}
\alias{single_iter_wrapper}
\title{Active Subnetwork Search + Enrichment Analysis Wrapper for a Single Iteration}
\usage{
single_iter_wrapper(
  i = NULL,
  dirs,
  input_processed,
  pin_path,
  score_quan_thr,
  sig_gene_thr,
  search_method,
  silent_option,
  use_all_positives,
  geneInitProbs,
  saTemp0,
  saTemp1,
  saIter,
  gaPop,
  gaIter,
  gaThread,
  gaCrossover,
  gaMut,
  grMaxDepth,
  grSearchDepth,
  grOverlap,
  grSubNum,
  gset_list,
  adj_method,
  enrichment_threshold,
  list_active_snw_genes
)
}
\arguments{
\item{i}{current iteration index (default = \code{NULL})}

\item{dirs}{vector of directories for parallel runs}

\item{input_processed}{processed input data frame}

\item{pin_path}{path/to/PIN/file}

\item{score_quan_thr}{active subnetwork score quantile threshold. Must be
between 0 and 1 or set to -1 for not filtering. (Default = 0.8)}

\item{sig_gene_thr}{threshold for the minimum proportion of significant genes in
the subnetwork (Default = 0.02) If the number of genes to use as threshold is
calculated to be < 2 (e.g. 50 signif. genes x 0.01 = 0.5), the threshold number
is set to 2}

\item{search_method}{algorithm to use when performing active subnetwork
search. Options are greedy search (GR), simulated annealing (SA) or genetic
algorithm (GA) for the search (default = 'GR').}

\item{silent_option}{boolean value indicating whether to print the messages
to the console (FALSE) or not (TRUE, this will print to a temp. file) during
active subnetwork search (default = TRUE). This option was added because
during parallel runs, the console messages get disorderly printed.}

\item{use_all_positives}{if TRUE: in GA, adds an individual with all positive
nodes. In SA, initializes candidate solution with all positive nodes. (default = FALSE)}

\item{geneInitProbs}{For SA and GA, probability of adding a gene in initial solution (default = 0.1)}

\item{saTemp0}{Initial temperature for SA (default = 1.0)}

\item{saTemp1}{Final temperature for SA (default = 0.01)}

\item{saIter}{Iteration number for SA (default = 10000)}

\item{gaPop}{Population size for GA (default = 400)}

\item{gaIter}{Iteration number for GA (default = 200)}

\item{gaThread}{Number of threads to be used in GA (default = 5)}

\item{gaCrossover}{Applies crossover with the given probability in GA (default = 1, i.e. always perform crossover)}

\item{gaMut}{For GA, applies mutation with given mutation rate (default = 0, i.e. mutation off)}

\item{grMaxDepth}{Sets max depth in greedy search, 0 for no limit (default = 1)}

\item{grSearchDepth}{Search depth in greedy search (default = 1)}

\item{grOverlap}{Overlap threshold for results of greedy search (default = 0.5)}

\item{grSubNum}{Number of subnetworks to be presented in the results (default = 1000)}

\item{gset_list}{list for gene sets}

\item{adj_method}{correction method to be used for adjusting p-values.
(default = 'bonferroni')}

\item{enrichment_threshold}{adjusted-p value threshold used when filtering
enrichment results (default = 0.05)}

\item{list_active_snw_genes}{boolean value indicating whether or not to report
the non-significant active subnetwork genes for the active subnetwork which was enriched for
the given term with the lowest p value (default = \code{FALSE})}
}
\value{
Data frame of enrichment results using active subnetwork search results
}
\description{
Active Subnetwork Search + Enrichment Analysis Wrapper for a Single Iteration
}


================================================
FILE: man/summarize_enrichment_results.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/enrichment.R
\name{summarize_enrichment_results}
\alias{summarize_enrichment_results}
\title{Summarize Enrichment Results}
\usage{
summarize_enrichment_results(enrichment_res, list_active_snw_genes = FALSE)
}
\arguments{
\item{enrichment_res}{a dataframe of combined enrichment results. Columns are: \describe{
  \item{ID}{ID of the enriched term}
  \item{Term_Description}{Description of the enriched term}
  \item{Fold_Enrichment}{Fold enrichment value for the enriched term}
  \item{p_value}{p value of enrichment}
  \item{adj_p}{adjusted p value of enrichment}
  \item{non_Signif_Snw_Genes (OPTIONAL)}{the non-significant active subnetwork genes, comma-separated}
}}

\item{list_active_snw_genes}{boolean value indicating whether or not to report
the non-significant active subnetwork genes for the active subnetwork which was enriched for
the given term with the lowest p value (default = \code{FALSE})}
}
\value{
a dataframe of summarized enrichment results (over multiple iterations). Columns are: \describe{
  \item{ID}{ID of the enriched term}
  \item{Term_Description}{Description of the enriched term}
  \item{Fold_Enrichment}{Fold enrichment value for the enriched term}
  \item{occurrence}{the number of iterations that the given term was found to enriched over all iterations}
  \item{support}{the median support (proportion of active subnetworks leading to enrichment within an iteration) over all iterations}
  \item{lowest_p}{the lowest adjusted-p value of the given term over all iterations}
  \item{highest_p}{the highest adjusted-p value of the given term over all iterations}
  \item{non_Signif_Snw_Genes (OPTIONAL)}{the non-significant active subnetwork genes, comma-separated}
}
}
\description{
Summarize Enrichment Results
}
\examples{
\dontrun{
summarize_enrichment_results(enrichment_res)
}
}


================================================
FILE: man/term_gene_graph.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/visualization.R
\name{term_gene_graph}
\alias{term_gene_graph}
\title{Create Term-Gene Graph}
\usage{
term_gene_graph(
  result_df,
  num_terms = 10,
  layout = "stress",
  use_description = FALSE,
  node_size = "num_genes",
  node_colors = c("#E5D7BF", "green", "red")
)
}
\arguments{
\item{result_df}{A dataframe of pathfindR results that must contain the following columns: \describe{
  \item{Term_Description}{Description of the enriched term (necessary if \code{use_description = TRUE})}
  \item{ID}{ID of the enriched term (necessary if \code{use_description = FALSE})}
  \item{lowest_p}{the lowest adjusted-p value of the given term over all iterations}
  \item{Up_regulated}{the up-regulated genes in the input involved in the given term's gene set, comma-separated}
  \item{Down_regulated}{the down-regulated genes in the input involved in the given term's gene set, comma-separated}
}}

\item{num_terms}{Number of top enriched terms to use while creating the graph. Set to \code{NULL} to use
all enriched terms (default = 10, i.e. top 10 terms)}

\item{layout}{The type of layout to create (see \code{\link[ggraph]{ggraph}} for details. Default = 'stress')}

\item{use_description}{Boolean argument to indicate whether term descriptions
(in the 'Term_Description' column) should be used. (default = \code{FALSE})}

\item{node_size}{Argument to indicate whether to use number of significant genes ('num_genes')
or the -log10(lowest p value) ('p_val') for adjusting the node sizes (default = 'num_genes')}

\item{node_colors}{vector of 3 colors to be used for coloring nodes (colors for term nodes, up, and down, respectively)}
}
\value{
a  \code{\link[ggraph]{ggraph}} object containing the term-gene graph.
 Each node corresponds to an enriched term (beige), an up-regulated gene (green)
 or a down-regulated gene (red). An edge between a term and a gene indicates
 that the given term involves the gene. Size of a term node is proportional
 to either the number of genes (if \code{node_size = 'num_genes'}) or
 the -log10(lowest p value) (if \code{node_size = 'p_val'}).
}
\description{
Create Term-Gene Graph
}
\details{
This function (adapted from the Gene-Concept network visualization
by the R package \code{enrichplot}) can be utilized to visualize which input
genes are involved in the enriched terms as a graph. The term-gene graph
shows the links between genes and biological terms and allows for the
investigation of multiple terms to which significant genes are related. The
graph also enables determination of the overlap between the enriched terms
by identifying shared and distinct significant term-related genes.
}
\examples{
p <- term_gene_graph(example_pathfindR_output)
p <- term_gene_graph(example_pathfindR_output, num_terms = 5)
p <- term_gene_graph(example_pathfindR_output, node_size = 'p_val')
}


================================================
FILE: man/term_gene_heatmap.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/visualization.R
\name{term_gene_heatmap}
\alias{term_gene_heatmap}
\title{Create Terms by Genes Heatmap}
\usage{
term_gene_heatmap(
  result_df,
  genes_df,
  num_terms = 10,
  use_description = FALSE,
  low = "red",
  mid = "black",
  high = "green",
  legend_title = "change",
  sort_terms_by_p = FALSE,
  ...
)
}
\arguments{
\item{result_df}{A dataframe of pathfindR results that must contain the following columns: \describe{
  \item{Term_Description}{Description of the enriched term (necessary if \code{use_description = TRUE})}
  \item{ID}{ID of the enriched term (necessary if \code{use_description = FALSE})}
  \item{lowest_p}{the highest adjusted-p value of the given term over all iterations}
  \item{Up_regulated}{the up-regulated genes in the input involved in the given term's gene set, comma-separated}
  \item{Down_regulated}{the down-regulated genes in the input involved in the given term's gene set, comma-separated}
}}

\item{genes_df}{the input data that was used with \code{\link{run_pathfindR}}.
  It must be a data frame with 3 columns: \enumerate{
  \item Gene Symbol (Gene Symbol)
  \item Change value, e.g. log(fold change) (optional)
  \item p value, e.g. adjusted p value associated with differential expression
} The change values in this data frame are used to color the affected genes}

\item{num_terms}{Number of top enriched terms to use while creating the plot. Set to \code{NULL} to use
all enriched terms (default = 10)}

\item{use_description}{Boolean argument to indicate whether term descriptions
(in the 'Term_Description' column) should be used. (default = \code{FALSE})}

\item{low}{a string indicating the color of 'low' values in the coloring gradient (default = 'green')}

\item{mid}{a string indicating the color of 'mid' values in the coloring gradient (default = 'black')}

\item{high}{a string indicating the color of 'high' values in the coloring gradient (default = 'red')}

\item{legend_title}{legend title (default = 'change')}

\item{sort_terms_by_p}{boolean to indicate whether to sort terms by 'lowest_p'
(\code{TRUE}) or by number of genes (\code{FALSE}) (default = \code{FALSE})}

\item{...}{additional arguments for \code{\link{input_processing}} (used if
\code{genes_df} is provided)}
}
\value{
a ggplot2 object of a heatmap where rows are enriched terms and
columns are involved input genes. If \code{genes_df} is provided, colors of
the tiles indicate the change values.
}
\description{
Create Terms by Genes Heatmap
}
\examples{
term_gene_heatmap(example_pathfindR_output, num_terms = 3)
}


================================================
FILE: man/visualize_KEGG_diagram.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/visualization.R
\name{visualize_KEGG_diagram}
\alias{visualize_KEGG_diagram}
\title{Visualize Human KEGG Pathways}
\usage{
visualize_KEGG_diagram(
  kegg_pw_ids,
  input_processed,
  scale_vals = TRUE,
  node_cols = NULL,
  legend.position = "top"
)
}
\arguments{
\item{kegg_pw_ids}{KEGG ids of pathways to be colored and visualized}

\item{input_processed}{input data processed via \code{\link{input_processing}}}

\item{scale_vals}{should change values be scaled? (default = \code{TRUE})}

\item{node_cols}{low, middle and high color values for coloring the pathway nodes
(default = \code{NULL}). If \code{node_cols=NULL}, the low, middle and high color
are set as 'green', 'gray' and 'red'. If all change values are 1e6 (in case no
changes are supplied, this dummy value is assigned by
\code{\link{input_processing}}), only one color ('#F38F18' if NULL) is used.}

\item{legend.position}{the default position of legends ("none", "left",
"right", "bottom", "top", "inside")}
}
\value{
Creates colored visualizations of the enriched human KEGG pathways
and returns them as a list of ggplot objects, named by Term ID.
}
\description{
Visualize Human KEGG Pathways
}
\examples{
\dontrun{
input_processed <- data.frame(
  GENE = c("PKLR", "GPI", "CREB1", "INS"),
  CHANGE = c(1.5, -2, 3, 5)
)
gg_list <- visualize_KEGG_diagram(c("hsa00010", "hsa04911"), input_processed)
}
}
\seealso{
See \code{\link{visualize_terms}} for the wrapper function for
creating enriched term diagrams. See \code{\link{run_pathfindR}} for the
wrapper function of the pathfindR enrichment workflow.
}


================================================
FILE: man/visualize_active_subnetworks.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/active_snw_search.R
\name{visualize_active_subnetworks}
\alias{visualize_active_subnetworks}
\title{Visualize Active Subnetworks}
\usage{
visualize_active_subnetworks(
  active_snw_path,
  genes_df,
  pin_name_path = "Biogrid",
  num_snws,
  layout = "stress",
  score_quan_thr = 0.8,
  sig_gene_thr = 0.02,
  ...
)
}
\arguments{
\item{active_snw_path}{path to the output of an Active Subnetwork Search}

\item{genes_df}{the input data that was used with \code{\link{run_pathfindR}}.
  It must be a data frame with 3 columns: \enumerate{
  \item Gene Symbol (Gene Symbol)
  \item Change value, e.g. log(fold change) (optional)
  \item p value, e.g. adjusted p value associated with differential expression
} The change values in this data frame are used to color the affected genes}

\item{pin_name_path}{Name of the chosen PIN or absolute/path/to/PIN.sif. If PIN name,
must be one of c('Biogrid', 'STRING', 'GeneMania', 'IntAct', 'KEGG', 'mmu_STRING'). If
path/to/PIN.sif, the file must comply with the PIN specifications. (Default = 'Biogrid')}

\item{num_snws}{number of top subnetworks to be visualized (leave blank if
you want to visualize all subnetworks)}

\item{layout}{The type of layout to create (see \code{\link[ggraph]{ggraph}} for details. Default = 'stress')}

\item{score_quan_thr}{active subnetwork score quantile threshold. Must be
between 0 and 1 or set to -1 for not filtering. (Default = 0.8)}

\item{sig_gene_thr}{threshold for the minimum proportion of significant genes in
the subnetwork (Default = 0.02) If the number of genes to use as threshold is
calculated to be < 2 (e.g. 50 signif. genes x 0.01 = 0.5), the threshold number
is set to 2}

\item{...}{additional arguments for \code{\link{input_processing}}}
}
\value{
a list of ggplot objects of graph visualizations of identified active
subnetworks. Green nodes are down-regulated genes, reds are up-regulated genes
and yellows are non-input genes
}
\description{
Visualize Active Subnetworks
}
\examples{
path2snw_list <- system.file(
  'extdata/resultActiveSubnetworkSearch.txt',
  package = 'pathfindR'
)
# visualize top 2 active subnetworks
g_list <- visualize_active_subnetworks(
  active_snw_path = path2snw_list,
  genes_df = example_pathfindR_input[1:10, ],
  pin_name_path = 'KEGG',
  num_snws = 2
)
}


================================================
FILE: man/visualize_term_interactions.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/visualization.R
\name{visualize_term_interactions}
\alias{visualize_term_interactions}
\title{Visualize Interactions of Genes Involved in the Given Enriched Terms}
\usage{
visualize_term_interactions(result_df, pin_name_path, show_legend = TRUE)
}
\arguments{
\item{result_df}{Data frame of enrichment results. Must-have columns
are: 'Term_Description', 'Up_regulated' and 'Down_regulated'}

\item{pin_name_path}{Name of the chosen PIN or absolute/path/to/PIN.sif. If PIN name,
must be one of c('Biogrid', 'STRING', 'GeneMania', 'IntAct', 'KEGG', 'mmu_STRING'). If
path/to/PIN.sif, the file must comply with the PIN specifications. (Default = 'Biogrid')}

\item{show_legend}{Boolean to indicate whether to display the legend (\code{TRUE})
or not (\code{FALSE}) (default: \code{TRUE})}
}
\value{
list of ggplot objects (named by Term ID) visualizing the interactions of genes involved
in the given enriched terms (annotated in the \code{result_df}) in the PIN used
for enrichment analysis (specified by \code{pin_name_path}).
}
\description{
Visualize Interactions of Genes Involved in the Given Enriched Terms
}
\details{
The following steps are performed for the visualization of interactions
of genes involved for each enriched term: \enumerate{
  \item shortest paths between all affected genes are determined (via \code{\link[igraph]{igraph}})
  \item the nodes of all shortest paths are merged
  \item the PIN is subsetted using the merged nodes (genes)
  \item using the PIN subset, the graph showing the interactions is generated
  \item the final graph is visualized using \code{\link[igraph]{igraph}}, colored by changed
  status (if provided)
}
}
\examples{
\dontrun{
result_df <- example_pathfindR_output[1:2, ]
gg_list <- visualize_term_interactions(result_df, pin_name_path = 'IntAct')
}
}
\seealso{
See \code{\link{visualize_terms}} for the wrapper function
  for creating enriched term diagrams. See \code{\link{run_pathfindR}} for the
  wrapper function of the pathfindR enrichment workflow.
}


================================================
FILE: man/visualize_terms.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/visualization.R
\name{visualize_terms}
\alias{visualize_terms}
\title{Create Diagrams for Enriched Terms}
\usage{
visualize_terms(
  result_df,
  input_processed = NULL,
  is_KEGG_result = TRUE,
  pin_name_path = "Biogrid",
  ...
)
}
\arguments{
\item{result_df}{Data frame of enrichment results. Must-have columns for
 KEGG human pathway diagrams (\code{is_KEGG_result = TRUE}) are: 'ID' and 'Term_Description'.
 Must-have columns for the rest are: 'Term_Description', 'Up_regulated' and
'Down_regulated'}

\item{input_processed}{input data processed via \code{\link{input_processing}},
not necessary when \code{is_KEGG_result = FALSE}}

\item{is_KEGG_result}{boolean to indicate whether KEGG gene sets were used for
enrichment analysis or not (default = \code{TRUE})}

\item{pin_name_path}{Name of the chosen PIN or absolute/path/to/PIN.sif. If PIN name,
must be one of c('Biogrid', 'STRING', 'GeneMania', 'IntAct', 'KEGG', 'mmu_STRING'). If
path/to/PIN.sif, the file must comply with the PIN specifications. (Default = 'Biogrid')}

\item{...}{additional arguments for \code{\link{visualize_KEGG_diagram}} (used
when \code{is_KEGG_result = TRUE}) or \code{\link{visualize_term_interactions}}
(used when \code{is_KEGG_result = FALSE})}
}
\value{
Depending on the argument \code{is_KEGG_result}, creates visualization of
 interactions of genes involved in the list of enriched terms in
 \code{result_df}. Returns a list of ggplot objects named by Term ID.
}
\description{
Create Diagrams for Enriched Terms
}
\details{
For \code{is_KEGG_result = TRUE}, KEGG pathway diagrams are created,
affected nodes colored by up/down regulation status.
For other gene sets, interactions of affected genes are determined (via a shortest-path
algorithm) and are visualized (colored by change status) using igraph.
}
\examples{
\dontrun{
input_processed <- data.frame(
  GENE = c("PARP1", "NDUFA1", "STX6", "SNAP23"),
  CHANGE = c(1.5, -2, 3, 5)
)
result_df <- example_pathfindR_output[1:2, ]

gg_list <- visualize_terms(result_df, input_processed)
gg_list2 <- visualize_terms(result_df, is_KEGG_result = FALSE, pin_name_path = 'IntAct')
}
}
\seealso{
See \code{\link{visualize_KEGG_diagram}} for the visualization function
of KEGG diagrams. See \code{\link{visualize_term_interactions}} for the
visualization function that generates diagrams showing the interactions of
input genes in the PIN. See \code{\link{run_pathfindR}} for the wrapper
function of the pathfindR workflow.
}


================================================
FILE: renv/.gitignore
================================================
library/
local/
cellar/
lock/
python/
sandbox/
staging/


================================================
FILE: renv/activate.R
================================================

local({

  # the requested version of renv
  version <- "1.1.4"
  attr(version, "sha") <- NULL

  # the project directory
  project <- Sys.getenv("RENV_PROJECT")
  if (!nzchar(project))
    project <- getwd()

  # use start-up diagnostics if enabled
  diagnostics <- Sys.getenv("RENV_STARTUP_DIAGNOSTICS", unset = "FALSE")
  if (diagnostics) {
    start <- Sys.time()
    profile <- tempfile("renv-startup-", fileext = ".Rprof")
    utils::Rprof(profile)
    on.exit({
      utils::Rprof(NULL)
      elapsed <- signif(difftime(Sys.time(), start, units = "auto"), digits = 2L)
      writeLines(sprintf("- renv took %s to run the autoloader.", format(elapsed)))
      writeLines(sprintf("- Profile: %s", profile))
      print(utils::summaryRprof(profile))
    }, add = TRUE)
  }

  # figure out whether the autoloader is enabled
  enabled <- local({

    # first, check config option
    override <- getOption("renv.config.autoloader.enabled")
    if (!is.null(override))
      return(override)

    # if we're being run in a context where R_LIBS is already set,
    # don't load -- presumably we're being run as a sub-process and
    # the parent process has already set up library paths for us
    rcmd <- Sys.getenv("R_CMD", unset = NA)
    rlibs <- Sys.getenv("R_LIBS", unset = NA)
    if (!is.na(rlibs) && !is.na(rcmd))
      return(FALSE)

    # next, check environment variables
    # prefer using the configuration one in the future
    envvars <- c(
      "RENV_CONFIG_AUTOLOADER_ENABLED",
      "RENV_AUTOLOADER_ENABLED",
      "RENV_ACTIVATE_PROJECT"
    )

    for (envvar in envvars) {
      envval <- Sys.getenv(envvar, unset = NA)
      if (!is.na(envval))
        return(tolower(envval) %in% c("true", "t", "1"))
    }

    # enable by default
    TRUE

  })

  # bail if we're not enabled
  if (!enabled) {

    # if we're not enabled, we might still need to manually load
    # the user profile here
    profile <- Sys.getenv("R_PROFILE_USER", unset = "~/.Rprofile")
    if (file.exists(profile)) {
      cfg <- Sys.getenv("RENV_CONFIG_USER_PROFILE", unset = "TRUE")
      if (tolower(cfg) %in% c("true", "t", "1"))
        sys.source(profile, envir = globalenv())
    }

    return(FALSE)

  }

  # avoid recursion
  if (identical(getOption("renv.autoloader.running"), TRUE)) {
    warning("ignoring recursive attempt to run renv autoloader")
    return(invisible(TRUE))
  }

  # signal that we're loading renv during R startup
  options(renv.autoloader.running = TRUE)
  on.exit(options(renv.autoloader.running = NULL), add = TRUE)

  # signal that we've consented to use renv
  options(renv.consent = TRUE)

  # load the 'utils' package eagerly -- this ensures that renv shims, which
  # mask 'utils' packages, will come first on the search path
  library(utils, lib.loc = .Library)

  # unload renv if it's already been loaded
  if ("renv" %in% loadedNamespaces())
    unloadNamespace("renv")

  # load bootstrap tools   
  ansify <- function(text) {
    if (renv_ansify_enabled())
      renv_ansify_enhanced(text)
    else
      renv_ansify_default(text)
  }
  
  renv_ansify_enabled <- function() {
  
    override <- Sys.getenv("RENV_ANSIFY_ENABLED", unset = NA)
    if (!is.na(override))
      return(as.logical(override))
  
    pane <- Sys.getenv("RSTUDIO_CHILD_PROCESS_PANE", unset = NA)
    if (identical(pane, "build"))
      return(FALSE)
  
    testthat <- Sys.getenv("TESTTHAT", unset = "false")
    if (tolower(testthat) %in% "true")
      return(FALSE)
  
    iderun <- Sys.getenv("R_CLI_HAS_HYPERLINK_IDE_RUN", unset = "false")
    if (tolower(iderun) %in% "false")
      return(FALSE)
  
    TRUE
  
  }
  
  renv_ansify_default <- function(text) {
    text
  }
  
  renv_ansify_enhanced <- function(text) {
  
    # R help links
    pattern <- "`\\?(renv::(?:[^`])+)`"
    replacement <- "`\033]8;;x-r-help:\\1\a?\\1\033]8;;\a`"
    text <- gsub(pattern, replacement, text, perl = TRUE)
  
    # runnable code
    pattern <- "`(renv::(?:[^`])+)`"
    replacement <- "`\033]8;;x-r-run:\\1\a\\1\033]8;;\a`"
    text <- gsub(pattern, replacement, text, perl = TRUE)
  
    # return ansified text
    text
  
  }
  
  renv_ansify_init <- function() {
  
    envir <- renv_envir_self()
    if (renv_ansify_enabled())
      assign("ansify", renv_ansify_enhanced, envir = envir)
    else
      assign("ansify", renv_ansify_default, envir = envir)
  
  }
  
  `%||%` <- function(x, y) {
    if (is.null(x)) y else x
  }
  
  catf <- function(fmt, ..., appendLF = TRUE) {
  
    quiet <- getOption("renv.bootstrap.quiet", default = FALSE)
    if (quiet)
      return(invisible())
  
    msg <- sprintf(fmt, ...)
    cat(msg, file = stdout(), sep = if (appendLF) "\n" else "")
  
    invisible(msg)
  
  }
  
  header <- function(label,
                     ...,
                     prefix = "#",
                     suffix = "-",
                     n = min(getOption("width"), 78))
  {
    label <- sprintf(label, ...)
    n <- max(n - nchar(label) - nchar(prefix) - 2L, 8L)
    if (n <= 0)
      return(paste(prefix, label))
  
    tail <- paste(rep.int(suffix, n), collapse = "")
    paste0(prefix, " ", label, " ", tail)
  
  }
  
  heredoc <- function(text, leave = 0) {
  
    # remove leading, trailing whitespace
    trimmed <- gsub("^\\s*\\n|\\n\\s*$", "", text)
  
    # split into lines
    lines <- strsplit(trimmed, "\n", fixed = TRUE)[[1L]]
  
    # compute common indent
    indent <- regexpr("[^[:space:]]", lines)
    common <- min(setdiff(indent, -1L)) - leave
    text <- paste(substring(lines, common), collapse = "\n")
  
    # substitute in ANSI links for executable renv code
    ansify(text)
  
  }
  
  bootstrap <- function(version, library) {
  
    friendly <- renv_bootstrap_version_friendly(version)
    section <- header(sprintf("Bootstrapping renv %s", friendly))
    catf(section)
  
    # attempt to download renv
    catf("- Downloading renv ... ", appendLF = FALSE)
    withCallingHandlers(
      tarball <- renv_bootstrap_download(version),
      error = function(err) {
        catf("FAILED")
        stop("failed to download:\n", conditionMessage(err))
      }
    )
    catf("OK")
    on.exit(unlink(tarball), add = TRUE)
  
    # now attempt to install
    catf("- Installing renv  ... ", appendLF = FALSE)
    withCallingHandlers(
      status <- renv_bootstrap_install(version, tarball, library),
      error = function(err) {
        catf("FAILED")
        stop("failed to install:\n", conditionMessage(err))
      }
    )
    catf("OK")
  
    # add empty line to break up bootstrapping from normal output
    catf("")
  
    return(invisible())
  }
  
  renv_bootstrap_tests_running <- function() {
    getOption("renv.tests.running", default = FALSE)
  }
  
  renv_bootstrap_repos <- function() {
  
    # get CRAN repository
    cran <- getOption("renv.repos.cran", "https://cloud.r-project.org")
  
    # check for repos override
    repos <- Sys.getenv("RENV_CONFIG_REPOS_OVERRIDE", unset = NA)
    if (!is.na(repos)) {
  
      # check for RSPM; if set, use a fallback repository for renv
      rspm <- Sys.getenv("RSPM", unset = NA)
      if (identical(rspm, repos))
        repos <- c(RSPM = rspm, CRAN = cran)
  
      return(repos)
  
    }
  
    # check for lockfile repositories
    repos <- tryCatch(renv_bootstrap_repos_lockfile(), error = identity)
    if (!inherits(repos, "error") && length(repos))
      return(repos)
  
    # retrieve current repos
    repos <- getOption("repos")
  
    # ensure @CRAN@ entries are resolved
    repos[repos == "@CRAN@"] <- cran
  
    # add in renv.bootstrap.repos if set
    default <- c(FALLBACK = "https://cloud.r-project.org")
    extra <- getOption("renv.bootstrap.repos", default = default)
    repos <- c(repos, extra)
  
    # remove duplicates that might've snuck in
    dupes <- duplicated(repos) | duplicated(names(repos))
    repos[!dupes]
  
  }
  
  renv_bootstrap_repos_lockfile <- function() {
  
    lockpath <- Sys.getenv("RENV_PATHS_LOCKFILE", unset = "renv.lock")
    if (!file.exists(lockpath))
      return(NULL)
  
    lockfile <- tryCatch(renv_json_read(lockpath), error = identity)
    if (inherits(lockfile, "error")) {
      warning(lockfile)
      return(NULL)
    }
  
    repos <- lockfile$R$Repositories
    if (length(repos) == 0)
      return(NULL)
  
    keys <- vapply(repos, `[[`, "Name", FUN.VALUE = character(1))
    vals <- vapply(repos, `[[`, "URL", FUN.VALUE = character(1))
    names(vals) <- keys
  
    return(vals)
  
  }
  
  renv_bootstrap_download <- function(version) {
  
    sha <- attr(version, "sha", exact = TRUE)
  
    methods <- if (!is.null(sha)) {
  
      # attempting to bootstrap a development version of renv
      c(
        function() renv_bootstrap_download_tarball(sha),
        function() renv_bootstrap_download_github(sha)
      )
  
    } else {
  
      # attempting to bootstrap a release version of renv
      c(
        function() renv_bootstrap_download_tarball(version),
        function() renv_bootstrap_download_cran_latest(version),
        function() renv_bootstrap_download_cran_archive(version)
      )
  
    }
  
    for (method in methods) {
      path <- tryCatch(method(), error = identity)
      if (is.character(path) && file.exists(path))
        return(path)
    }
  
    stop("All download methods failed")
  
  }
  
  renv_bootstrap_download_impl <- function(url, destfile) {
  
    mode <- "wb"
  
    # https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17715
    fixup <-
      Sys.info()[["sysname"]] == "Windows" &&
      substring(url, 1L, 5L) == "file:"
  
    if (fixup)
      mode <- "w+b"
  
    args <- list(
      url      = url,
      destfile = destfile,
      mode     = mode,
      quiet    = TRUE
    )
  
    if ("headers" %in% names(formals(utils::download.file))) {
      headers <- renv_bootstrap_download_custom_headers(url)
      if (length(headers) && is.character(headers))
        args$headers <- headers
    }
  
    do.call(utils::download.file, args)
  
  }
  
  renv_bootstrap_download_custom_headers <- function(url) {
  
    headers <- getOption("renv.download.headers")
    if (is.null(headers))
      return(character())
  
    if (!is.function(headers))
      stopf("'renv.download.headers' is not a function")
  
    headers <- headers(url)
    if (length(headers) == 0L)
      return(character())
  
    if (is.list(headers))
      headers <- unlist(headers, recursive = FALSE, use.names = TRUE)
  
    ok <-
      is.character(headers) &&
      is.character(names(headers)) &&
      all(nzchar(names(headers)))
  
    if (!ok)
      stop("invocation of 'renv.download.headers' did not return a named character vector")
  
    headers
  
  }
  
  renv_bootstrap_download_cran_latest <- function(version) {
  
    spec <- renv_bootstrap_download_cran_latest_find(version)
    type  <- spec$type
    repos <- spec$repos
  
    baseurl <- utils::contrib.url(repos = repos, type = type)
    ext <- if (identical(type, "source"))
      ".tar.gz"
    else if (Sys.info()[["sysname"]] == "Windows")
      ".zip"
    else
      ".tgz"
    name <- sprintf("renv_%s%s", version, ext)
    url <- paste(baseurl, name, sep = "/")
  
    destfile <- file.path(tempdir(), name)
    status <- tryCatch(
      renv_bootstrap_download_impl(url, destfile),
      condition = identity
    )
  
    if (inherits(status, "condition"))
      return(FALSE)
  
    # report success and return
    destfile
  
  }
  
  renv_bootstrap_download_cran_latest_find <- function(version) {
  
    # check whether binaries are supported on this system
    binary <-
      getOption("renv.bootstrap.binary", default = TRUE) &&
      !identical(.Platform$pkgType, "source") &&
      !identical(getOption("pkgType"), "source") &&
      Sys.info()[["sysname"]] %in% c("Darwin", "Windows")
  
    types <- c(if (binary) "binary", "source")
  
    # iterate over types + repositories
    for (type in types) {
      for (repos in renv_bootstrap_repos()) {
  
        # build arguments for utils::available.packages() call
        args <- list(type = type, repos = repos)
  
        # add custom headers if available -- note that
        # utils::available.packages() will pass this to download.file()
        if ("headers" %in% names(formals(utils::download.file))) {
          headers <- renv_bootstrap_download_custom_headers(repos)
          if (length(headers) && is.character(headers))
            args$headers <- headers
        }
  
        # retrieve package database
        db <- tryCatch(
          as.data.frame(
            do.call(utils::available.packages, args),
            stringsAsFactors = FALSE
          ),
          error = identity
        )
  
        if (inherits(db, "error"))
          next
  
        # check for compatible entry
        entry <- db[db$Package %in% "renv" & db$Version %in% version, ]
        if (nrow(entry) == 0)
          next
  
        # found it; return spec to caller
        spec <- list(entry = entry, type = type, repos = repos)
        return(spec)
  
      }
    }
  
    # if we got here, we failed to find renv
    fmt <- "renv %s is not available from your declared package repositories"
    stop(sprintf(fmt, version))
  
  }
  
  renv_bootstrap_download_cran_archive <- function(version) {
  
    name <- sprintf("renv_%s.tar.gz", version)
    repos <- renv_bootstrap_repos()
    urls <- file.path(repos, "src/contrib/Archive/renv", name)
    destfile <- file.path(tempdir(), name)
  
    for (url in urls) {
  
      status <- tryCatch(
        renv_bootstrap_download_impl(url, destfile),
        condition = identity
      )
  
      if (identical(status, 0L))
        return(destfile)
  
    }
  
    return(FALSE)
  
  }
  
  renv_bootstrap_download_tarball <- function(version) {
  
    # if the user has provided the path to a tarball via
    # an environment variable, then use it
    tarball <- Sys.getenv("RENV_BOOTSTRAP_TARBALL", unset = NA)
    if (is.na(tarball))
      return()
  
    # allow directories
    if (dir.exists(tarball)) {
      name <- sprintf("renv_%s.tar.gz", version)
      tarball <- file.path(tarball, name)
    }
  
    # bail if it doesn't exist
    if (!file.exists(tarball)) {
  
      # let the user know we weren't able to honour their request
      fmt <- "- RENV_BOOTSTRAP_TARBALL is set (%s) but does not exist."
      msg <- sprintf(fmt, tarball)
      warning(msg)
  
      # bail
      return()
  
    }
  
    catf("- Using local tarball '%s'.", tarball)
    tarball
  
  }
  
  renv_bootstrap_github_token <- function() {
    for (envvar in c("GITHUB_TOKEN", "GITHUB_PAT", "GH_TOKEN")) {
      envval <- Sys.getenv(envvar, unset = NA)
      if (!is.na(envval))
        return(envval)
    }
  }
  
  renv_bootstrap_download_github <- function(version) {
  
    enabled <- Sys.getenv("RENV_BOOTSTRAP_FROM_GITHUB", unset = "TRUE")
    if (!identical(enabled, "TRUE"))
      return(FALSE)
  
    # prepare download options
    token <- renv_bootstrap_github_token()
    if (is.null(token))
      token <- ""
  
    if (nzchar(Sys.which("curl")) && nzchar(token)) {
      fmt <- "--location --fail --header \"Authorization: token %s\""
      extra <- sprintf(fmt, token)
      saved <- options("download.file.method", "download.file.extra")
      options(download.file.method = "curl", download.file.extra = extra)
      on.exit(do.call(base::options, saved), add = TRUE)
    } else if (nzchar(Sys.which("wget")) && nzchar(token)) {
      fmt <- "--header=\"Authorization: token %s\""
      extra <- sprintf(fmt, token)
      saved <- options("download.file.method", "download.file.extra")
      options(download.file.method = "wget", download.file.extra = extra)
      on.exit(do.call(base::options, saved), add = TRUE)
    }
  
    url <- file.path("https://api.github.com/repos/rstudio/renv/tarball", version)
    name <- sprintf("renv_%s.tar.gz", version)
    destfile <- file.path(tempdir(), name)
  
    status <- tryCatch(
      renv_bootstrap_download_impl(url, destfile),
      condition = identity
    )
  
    if (!identical(status, 0L))
      return(FALSE)
  
    renv_bootstrap_download_augment(destfile)
  
    return(destfile)
  
  }
  
  # Add Sha to DESCRIPTION. This is stop gap until #890, after which we
  # can use renv::install() to fully capture metadata.
  renv_bootstrap_download_augment <- function(destfile) {
    sha <- renv_bootstrap_git_extract_sha1_tar(destfile)
    if (is.null(sha)) {
      return()
    }
  
    # Untar
    tempdir <- tempfile("renv-github-")
    on.exit(unlink(tempdir, recursive = TRUE), add = TRUE)
    untar(destfile, exdir = tempdir)
    pkgdir <- dir(tempdir, full.names = TRUE)[[1]]
  
    # Modify description
    desc_path <- file.path(pkgdir, "DESCRIPTION")
    desc_lines <- readLines(desc_path)
    remotes_fields <- c(
      "RemoteType: github",
      "RemoteHost: api.github.com",
      "RemoteRepo: renv",
      "RemoteUsername: rstudio",
      "RemotePkgRef: rstudio/renv",
      paste("RemoteRef: ", sha),
      paste("RemoteSha: ", sha)
    )
    writeLines(c(desc_lines[desc_lines != ""], remotes_fields), con = desc_path)
  
    # Re-tar
    local({
      old <- setwd(tempdir)
      on.exit(setwd(old), add = TRUE)
  
      tar(destfile, compression = "gzip")
    })
    invisible()
  }
  
  # Extract the commit hash from a git archive. Git archives include the SHA1
  # hash as the comment field of the tarball pax extended header
  # (see https://www.kernel.org/pub/software/scm/git/docs/git-archive.html)
  # For GitHub archives this should be the first header after the default one
  # (512 byte) header.
  renv_bootstrap_git_extract_sha1_tar <- function(bundle) {
  
    # open the bundle for reading
    # We use gzcon for everything because (from ?gzcon)
    # > Reading from a connection which does not supply a 'gzip' magic
    # > header is equivalent to reading from the original connection
    conn <- gzcon(file(bundle, open = "rb", raw = TRUE))
    on.exit(close(conn))
  
    # The default pax header is 512 bytes long and the first pax extended header
    # with the comment should be 51 bytes long
    # `52 comment=` (11 chars) + 40 byte SHA1 hash
    len <- 0x200 + 0x33
    res <- rawToChar(readBin(conn, "raw", n = len)[0x201:len])
  
    if (grepl("^52 comment=", res)) {
      sub("52 comment=", "", res)
    } else {
      NULL
    }
  }
  
  renv_bootstrap_install <- function(version, tarball, library) {
  
    # attempt to install it into project library
    dir.create(library, showWarnings = FALSE, recursive = TRUE)
    output <- renv_bootstrap_install_impl(library, tarball)
  
    # check for successful install
    status <- attr(output, "status")
    if (is.null(status) || identical(status, 0L))
      return(status)
  
    # an error occurred; report it
    header <- "installation of renv failed"
    lines <- paste(rep.int("=", nchar(header)), collapse = "")
    text <- paste(c(header, lines, output), collapse = "\n")
    stop(text)
  
  }
  
  renv_bootstrap_install_impl <- function(library, tarball) {
  
    # invoke using system2 so we can capture and report output
    bin <- R.home("bin")
    exe <- if (Sys.info()[["sysname"]] == "Windows") "R.exe" else "R"
    R <- file.path(bin, exe)
  
    args <- c(
      "--vanilla", "CMD", "INSTALL", "--no-multiarch",
      "-l", shQuote(path.expand(library)),
      shQuote(path.expand(tarball))
    )
  
    system2(R, args, stdout = TRUE, stderr = TRUE)
  
  }
  
  renv_bootstrap_platform_prefix_default <- function() {
  
    # read version component
    version <- Sys.getenv("RENV_PATHS_VERSION", unset = "R-%v")
  
    # expand placeholders
    placeholders <- list(
      list("%v", format(getRversion()[1, 1:2])),
      list("%V", format(getRversion()[1, 1:3]))
    )
  
    for (placeholder in placeholders)
      version <- gsub(placeholder[[1L]], placeholder[[2L]], version, fixed = TRUE)
  
    # include SVN revision for development versions of R
    # (to avoid sharing platform-specific artefacts with released versions of R)
    devel <-
      identical(R.version[["status"]],   "Under development (unstable)") ||
      identical(R.version[["nickname"]], "Unsuffered Consequences")
  
    if (devel)
      version <- paste(version, R.version[["svn rev"]], sep = "-r")
  
    version
  
  }
  
  renv_bootstrap_platform_prefix <- function() {
  
    # construct version prefix
    version <- renv_bootstrap_platform_prefix_default()
  
    # build list of path components
    components <- c(version, R.version$platform)
  
    # include prefix if provided by user
    prefix <- renv_bootstrap_platform_prefix_impl()
    if (!is.na(prefix) && nzchar(prefix))
      components <- c(prefix, components)
  
    # build prefix
    paste(components, collapse = "/")
  
  }
  
  renv_bootstrap_platform_prefix_impl <- function() {
  
    # if an explicit prefix has been supplied, use it
    prefix <- Sys.getenv("RENV_PATHS_PREFIX", unset = NA)
    if (!is.na(prefix))
      return(prefix)
  
    # if the user has requested an automatic prefix, generate it
    auto <- Sys.getenv("RENV_PATHS_PREFIX_AUTO", unset = NA)
    if (is.na(auto) && getRversion() >= "4.4.0")
      auto <- "TRUE"
  
    if (auto %in% c("TRUE", "True", "true", "1"))
      return(renv_bootstrap_platform_prefix_auto())
  
    # empty string on failure
    ""
  
  }
  
  renv_bootstrap_platform_prefix_auto <- function() {
  
    prefix <- tryCatch(renv_bootstrap_platform_os(), error = identity)
    if (inherits(prefix, "error") || prefix %in% "unknown") {
  
      msg <- paste(
        "failed to infer current operating system",
        "please file a bug report at https://github.com/rstudio/renv/issues",
        sep = "; "
      )
  
      warning(msg)
  
    }
  
    prefix
  
  }
  
  renv_bootstrap_platform_os <- function() {
  
    sysinfo <- Sys.info()
    sysname <- sysinfo[["sysname"]]
  
    # handle Windows + macOS up front
    if (sysname == "Windows")
      return("windows")
    else if (sysname == "Darwin")
      return("macos")
  
    # check for os-release files
    for (file in c("/etc/os-release", "/usr/lib/os-release"))
      if (file.exists(file))
        return(renv_bootstrap_platform_os_via_os_release(file, sysinfo))
  
    # check for redhat-release files
    if (file.exists("/etc/redhat-release"))
      return(renv_bootstrap_platform_os_via_redhat_release())
  
    "unknown"
  
  }
  
  renv_bootstrap_platform_os_via_os_release <- function(file, sysinfo) {
  
    # read /etc/os-release
    release <- utils::read.table(
      file             = file,
      sep              = "=",
      quote            = c("\"", "'"),
      col.names        = c("Key", "Value"),
      comment.char     = "#",
      stringsAsFactors = FALSE
    )
  
    vars <- as.list(release$Value)
    names(vars) <- release$Key
  
    # get os name
    os <- tolower(sysinfo[["sysname"]])
  
    # read id
    id <- "unknown"
    for (field in c("ID", "ID_LIKE")) {
      if (field %in% names(vars) && nzchar(vars[[field]])) {
        id <- vars[[field]]
        break
      }
    }
  
    # read version
    version <- "unknown"
    for (field in c("UBUNTU_CODENAME", "VERSION_CODENAME", "VERSION_ID", "BUILD_ID")) {
      if (field %in% names(vars) && nzchar(vars[[field]])) {
        version <- vars[[field]]
        break
      }
    }
  
    # join together
    paste(c(os, id, version), collapse = "-")
  
  }
  
  renv_bootstrap_platform_os_via_redhat_release <- function() {
  
    # read /etc/redhat-release
    contents <- readLines("/etc/redhat-release", warn = FALSE)
  
    # infer id
    id <- if (grepl("centos", contents, ignore.case = TRUE))
      "centos"
    else if (grepl("redhat", contents, ignore.case = TRUE))
      "redhat"
    else
      "unknown"
  
    # try to find a version component (very hacky)
    version <- "unknown"
  
    parts <- strsplit(contents, "[[:space:]]")[[1L]]
    for (part in parts) {
  
      nv <- tryCatch(numeric_version(part), error = identity)
      if (inherits(nv, "error"))
        next
  
      version <- nv[1, 1]
      break
  
    }
  
    paste(c("linux", id, version), collapse = "-")
  
  }
  
  renv_bootstrap_library_root_name <- function(project) {
  
    # use project name as-is if requested
    asis <- Sys.getenv("RENV_PATHS_LIBRARY_ROOT_ASIS", unset = "FALSE")
    if (asis)
      return(basename(project))
  
    # otherwise, disambiguate based on project's path
    id <- substring(renv_bootstrap_hash_text(project), 1L, 8L)
    paste(basename(project), id, sep = "-")
  
  }
  
  renv_bootstrap_library_root <- function(project) {
  
    prefix <- renv_bootstrap_profile_prefix()
  
    path <- Sys.getenv("RENV_PATHS_LIBRARY", unset = NA)
    if (!is.na(path))
      return(paste(c(path, prefix), collapse = "/"))
  
    path <- renv_bootstrap_library_root_impl(project)
    if (!is.null(path)) {
      name <- renv_bootstrap_library_root_name(project)
      return(paste(c(path, prefix, name), collapse = "/"))
    }
  
    renv_bootstrap_paths_renv("library", project = project)
  
  }
  
  renv_bootstrap_library_root_impl <- function(project) {
  
    root <- Sys.getenv("RENV_PATHS_LIBRARY_ROOT", unset = NA)
    if (!is.na(root))
      return(root)
  
    type <- renv_bootstrap_project_type(project)
    if (identical(type, "package")) {
      userdir <- renv_bootstrap_user_dir()
      return(file.path(userdir, "library"))
    }
  
  }
  
  renv_bootstrap_validate_version <- function(version, description = NULL) {
  
    # resolve description file
    #
    # avoid passing lib.loc to `packageDescription()` below, since R will
    # use the loaded version of the package by default anyhow. note that
    # this function should only be called after 'renv' is loaded
    # https://github.com/rstudio/renv/issues/1625
    description <- description %||% packageDescription("renv")
  
    # check whether requested version 'version' matches loaded version of renv
    sha <- attr(version, "sha", exact = TRUE)
    valid <- if (!is.null(sha))
      renv_bootstrap_validate_version_dev(sha, description)
    else
      renv_bootstrap_validate_version_release(version, description)
  
    if (valid)
      return(TRUE)
  
    # the loaded version of renv doesn't match the requested version;
    # give the user instructions on how to proceed
    dev <- identical(description[["RemoteType"]], "github")
    remote <- if (dev)
      paste("rstudio/renv", description[["RemoteSha"]], sep = "@")
    else
      paste("renv", description[["Version"]], sep = "@")
  
    # display both loaded version + sha if available
    friendly <- renv_bootstrap_version_friendly(
      version = description[["Version"]],
      sha     = if (dev) description[["RemoteSha"]]
    )
  
    fmt <- heredoc("
      renv %1$s was loaded from project library, but this project is configured to use renv %2$s.
      - Use `renv::record(\"%3$s\")` to record renv %1$s in the lockfile.
      - Use `renv::restore(packages = \"renv\")` to install renv %2$s into the project library.
    ")
    catf(fmt, friendly, renv_bootstrap_version_friendly(version), remote)
  
    FALSE
  
  }
  
  renv_bootstrap_validate_version_dev <- function(version, description) {
  
    expected <- description[["RemoteSha"]]
    if (!is.character(expected))
      return(FALSE)
  
    pattern <- sprintf("^\\Q%s\\E", version)
    grepl(pattern, expected, perl = TRUE)
  
  }
  
  renv_bootstrap_validate_version_release <- function(version, description) {
    expected <- description[["Version"]]
    is.character(expected) && identical(expected, version)
  }
  
  renv_bootstrap_hash_text <- function(text) {
  
    hashfile <- tempfile("renv-hash-")
    on.exit(unlink(hashfile), add = TRUE)
  
    writeLines(text, con = hashfile)
    tools::md5sum(hashfile)
  
  }
  
  renv_bootstrap_load <- function(project, libpath, version) {
  
    # try to load renv from the project library
    if (!requireNamespace("renv", lib.loc = libpath, quietly = TRUE))
      return(FALSE)
  
    # warn if the version of renv loaded does not match
    renv_bootstrap_validate_version(version)
  
    # execute renv load hooks, if any
    hooks <- getHook("renv::autoload")
    for (hook in hooks)
      if (is.function(hook))
        tryCatch(hook(), error = warnify)
  
    # load the project
    renv::load(project)
  
    TRUE
  
  }
  
  renv_bootstrap_profile_load <- function(project) {
  
    # if RENV_PROFILE is already set, just use that
    profile <- Sys.getenv("RENV_PROFILE", unset = NA)
    if (!is.na(profile) && nzchar(profile))
      return(profile)
  
    # check for a profile file (nothing to do if it doesn't exist)
    path <- renv_bootstrap_paths_renv("profile", profile = FALSE, project = project)
    if (!file.exists(path))
      return(NULL)
  
    # read the profile, and set it if it exists
    contents <- readLines(path, warn = FALSE)
    if (length(contents) == 0L)
      return(NULL)
  
    # set RENV_PROFILE
    profile <- contents[[1L]]
    if (!profile %in% c("", "default"))
      Sys.setenv(RENV_PROFILE = profile)
  
    profile
  
  }
  
  renv_bootstrap_profile_prefix <- function() {
    profile <- renv_bootstrap_profile_get()
    if (!is.null(profile))
      return(file.path("profiles", profile, "renv"))
  }
  
  renv_bootstrap_profile_get <- function() {
    profile <- Sys.getenv("RENV_PROFILE", unset = "")
    renv_bootstrap_profile_normalize(profile)
  }
  
  renv_bootstrap_profile_set <- function(profile) {
    profile <- renv_bootstrap_profile_normalize(profile)
    if (is.null(profile))
      Sys.unsetenv("RENV_PROFILE")
    else
      Sys.setenv(RENV_PROFILE = profile)
  }
  
  renv_bootstrap_profile_normalize <- function(profile) {
  
    if (is.null(profile) || profile %in% c("", "default"))
      return(NULL)
  
    profile
  
  }
  
  renv_bootstrap_path_absolute <- function(path) {
  
    substr(path, 1L, 1L) %in% c("~", "/", "\\") || (
      substr(path, 1L, 1L) %in% c(letters, LETTERS) &&
      substr(path, 2L, 3L) %in% c(":/", ":\\")
    )
  
  }
  
  renv_bootstrap_paths_renv <- function(..., profile = TRUE, project = NULL) {
    renv <- Sys.getenv("RENV_PATHS_RENV", unset = "renv")
    root <- if (renv_bootstrap_path_absolute(renv)) NULL else project
    prefix <- if (profile) renv_bootstrap_profile_prefix()
    components <- c(root, renv, prefix, ...)
    paste(components, collapse = "/")
  }
  
  renv_bootstrap_project_type <- function(path) {
  
    descpath <- file.path(path, "DESCRIPTION")
    if (!file.exists(descpath))
      return("unknown")
  
    desc <- tryCatch(
      read.dcf(descpath, all = TRUE),
      error = identity
    )
  
    if (inherits(desc, "error"))
      return("unknown")
  
    type <- desc$Type
    if (!is.null(type))
      return(tolower(type))
  
    package <- desc$Package
    if (!is.null(package))
      return("package")
  
    "unknown"
  
  }
  
  renv_bootstrap_user_dir <- function() {
    dir <- renv_bootstrap_user_dir_impl()
    path.expand(chartr("\\", "/", dir))
  }
  
  renv_bootstrap_user_dir_impl <- function() {
  
    # use local override if set
    override <- getOption("renv.userdir.override")
    if (!is.null(override))
      return(override)
  
    # use R_user_dir if available
    tools <- asNamespace("tools")
    if (is.function(tools$R_user_dir))
      return(tools$R_user_dir("renv", "cache"))
  
    # try using our own backfill for older versions of R
    envvars <- c("R_USER_CACHE_DIR", "XDG_CACHE_HOME")
    for (envvar in envvars) {
      root <- Sys.getenv(envvar, unset = NA)
      if (!is.na(root))
        return(file.path(root, "R/renv"))
    }
  
    # use platform-specific default fallbacks
    if (Sys.info()[["sysname"]] == "Windows")
      file.path(Sys.getenv("LOCALAPPDATA"), "R/cache/R/renv")
    else if (Sys.info()[["sysname"]] == "Darwin")
      "~/Library/Caches/org.R-project.R/R/renv"
    else
      "~/.cache/R/renv"
  
  }
  
  renv_bootstrap_version_friendly <- function(version, shafmt = NULL, sha = NULL) {
    sha <- sha %||% attr(version, "sha", exact = TRUE)
    parts <- c(version, sprintf(shafmt %||% " [sha: %s]", substring(sha, 1L, 7L)))
    paste(parts, collapse = "")
  }
  
  renv_bootstrap_exec <- function(project, libpath, version) {
    if (!renv_bootstrap_load(project, libpath, version))
      renv_bootstrap_run(project, libpath, version)
  }
  
  renv_bootstrap_run <- function(project, libpath, version) {
  
    # perform bootstrap
    bootstrap(version, libpath)
  
    # exit early if we're just testing bootstrap
    if (!is.na(Sys.getenv("RENV_BOOTSTRAP_INSTALL_ONLY", unset = NA)))
      return(TRUE)
  
    # try again to load
    if (requireNamespace("renv", lib.loc = libpath, quietly = TRUE)) {
      return(renv::load(project = project))
    }
  
    # failed to download or load renv; warn the user
    msg <- c(
      "Failed to find an renv installation: the project will not be loaded.",
      "Use `renv::activate()` to re-initialize the project."
    )
  
    warning(paste(msg, collapse = "\n"), call. = FALSE)
  
  }
  
  renv_json_read <- function(file = NULL, text = NULL) {
  
    jlerr <- NULL
  
    # if jsonlite is loaded, use that instead
    if ("jsonlite" %in% loadedNamespaces()) {
  
      json <- tryCatch(renv_json_read_jsonlite(file, text), error = identity)
      if (!inherits(json, "error"))
        return(json)
  
      jlerr <- json
  
    }
  
    # otherwise, fall back to the default JSON reader
    json <- tryCatch(renv_json_read_default(file, text), error = identity)
    if (!inherits(json, "error"))
      return(json)
  
    # report an error
    if (!is.null(jlerr))
      stop(jlerr)
    else
      stop(json)
  
  }
  
  renv_json_read_jsonlite <- function(file = NULL, text = NULL) {
    text <- paste(text %||% readLines(file, warn = FALSE), collapse = "\n")
    jsonlite::fromJSON(txt = text, simplifyVector = FALSE)
  }
  
  renv_json_read_patterns <- function() {
  
    list(
  
      # objects
      list("{", "\t\n\tobject(\t\n\t", TRUE),
      list("}", "\t\n\t)\t\n\t",       TRUE),
  
      # arrays
      list("[", "\t\n\tarray(\t\n\t", TRUE),
      list("]", "\n\t\n)\n\t\n",      TRUE),
  
      # maps
      list(":", "\t\n\t=\t\n\t", TRUE),
  
      # newlines
      list("\\u000a", "\n", FALSE)
  
    )
  
  }
  
  renv_json_read_envir <- function() {
  
    envir <- new.env(parent = emptyenv())
  
    envir[["+"]] <- `+`
    envir[["-"]] <- `-`
  
    envir[["object"]] <- function(...) {
      result <- list(...)
      names(result) <- as.character(names(result))
      result
    }
  
    envir[["array"]] <- list
  
    envir[["true"]]  <- TRUE
    envir[["false"]] <- FALSE
    envir[["null"]]  <- NULL
  
    envir
  
  }
  
  renv_json_read_remap <- function(object, patterns) {
  
    # repair names if necessary
    if (!is.null(names(object))) {
  
      nms <- names(object)
      for (pattern in patterns)
        nms <- gsub(pattern[[2L]], pattern[[1L]], nms, fixed = TRUE)
      names(object) <- nms
  
    }
  
    # repair strings if necessary
    if (is.character(object)) {
      for (pattern in patterns)
        object <- gsub(pattern[[2L]], pattern[[1L]], object, fixed = TRUE)
    }
  
    # recurse for other objects
    if (is.recursive(object))
      for (i in seq_along(object))
        object[i] <- list(renv_json_read_remap(object[[i]], patterns))
  
    # return remapped object
    object
  
  }
  
  renv_json_read_default <- function(file = NULL, text = NULL) {
  
    # read json text
    text <- paste(text %||% readLines(file, warn = FALSE), collapse = "\n")
  
    # convert into something the R parser will understand
    patterns <- renv_json_read_patterns()
    transformed <- text
    for (pattern in patterns)
      transformed <- gsub(pattern[[1L]], pattern[[2L]], transformed, fixed = TRUE)
  
    # parse it
    rfile <- tempfile("renv-json-", fileext = ".R")
    on.exit(unlink(rfile), add = TRUE)
    writeLines(transformed, con = rfile)
    json <- parse(rfile, keep.source = FALSE, srcfile = NULL)[[1L]]
  
    # evaluate in safe environment
    result <- eval(json, envir = renv_json_read_envir())
  
    # fix up strings if necessary -- do so only with reversible patterns
    patterns <- Filter(function(pattern) pattern[[3L]], patterns)
    renv_json_read_remap(result, patterns)
  
  }
  

  # load the renv profile, if any
  renv_bootstrap_profile_load(project)

  # construct path to library root
  root <- renv_bootstrap_library_root(project)

  # construct library prefix for platform
  prefix <- renv_bootstrap_platform_prefix()

  # construct full libpath
  libpath <- file.path(root, prefix)

  # run bootstrap code
  renv_bootstrap_exec(project, libpath, version)

  invisible()

})


================================================
FILE: renv/settings.json
================================================
{
  "bioconductor.version": "3.21",
  "external.libraries": [],
  "ignored.packages": [],
  "package.dependency.fields": [
    "Imports",
    "Depends",
    "LinkingTo"
  ],
  "ppm.enabled": null,
  "ppm.ignored.urls": [],
  "r.version": null,
  "snapshot.type": "explicit",
  "use.cache": true,
  "vcs.ignore.cellar": true,
  "vcs.ignore.library": true,
  "vcs.ignore.local": true,
  "vcs.manage.ignores": true
}


================================================
FILE: revdep/.gitignore
================================================
checks
library
checks.noindex
library.noindex
data.sqlite
*.html


================================================
FILE: revdep/email.yml
================================================
release_date: ???
rel_release_date: ???
my_news_url: ???
release_version: ???
release_details: ???


================================================
FILE: revdep/failures.md
================================================
*Wow, no problems at all. :)*

================================================
FILE: slides/cost_charme_school/demo_script.R
================================================
##################################################
## Project: pathfindR
## Script purpose: COST CHARME Summer Training
## School, Istanbul - pathfindR hands-on demonstration
## Date: Sep 1, 2019
## Author: Ege Ulgen
##################################################

# Installation ------------------------------------------------------------
# For the active subnetwork search component to work(i.e., in order to
# run pathfindR), the user must have Java installed and
# path/to/java must be in the PATH environment variable.
# For Windows users, to configure the PATH environment variable see:
# https://github.com/egeulgen/pathfindR/wiki/Installation#configuration-of-java-on-windows

install.packages("devtools") # if you have not installed "devtools" package
devtools::install_github("egeulgen/pathfindR")

library(pathfindR)

# Enrichment Analysis -----------------------------------------------------
## demo input file = RA_input
?RA_input
dim(RA_input)
head(RA_input)

## demo runs
?run_pathfindR

# takes a while (use `visualize_pathways = FALSE` for faster runs)
RA_demo_out1 <- run_pathfindR(RA_input,
                         iterations = 1) # change number of iter.s to 1 for demo
dim(RA_demo_out1)
head(RA_demo_out1)

# faster non-default run
RA_demo_out2 <- run_pathfindR(RA_input,
                              iterations = 1, # change number of iter.s to 1 for demo
                              gene_sets = "BioCarta", # change from default ("KEGG")
                              pin_name_path = "GeneMania", # change from default ("Biogrid")
                              output = "DEMO_OUTPUT") # change output directory

# Pathway Clustering ------------------------------------------------------
## demo enrichment result file = RA_output
?RA_output
dim(RA_output)
head(RA_output)

?cluster_pathways
RA_demo_clu1 <- cluster_pathways(RA_output) # hierarchical (default)
RA_demo_clu2 <- cluster_pathways(RA_output,
                                 method = "fuzzy")

head(RA_demo_clu1)
head(RA_demo_clu2)

## Plot enrichment chart grouped by clusters
enrichment_chart(RA_demo_clu1,
                 plot_by_cluster = TRUE)

## Example Output for the pathfindR Clustering Workflow
?RA_clustered

# Term-gene graph ---------------------------------------------------------
?term_gene_graph
### `options(stringsAsFactors = TRUE)` if `stringsAsFactors` is set as FALSE in .Rprofile

term_gene_graph(RA_output) # top 10 terms(default)

## Graph using representative pathways
RA_representative <- RA_demo_clu1[RA_demo_clu1$Status == "Representative", ]
term_gene_graph(RA_representative,
                num_terms = NULL, # to plot using all terms
                use_names = TRUE) # use pw names instead of IDs

# Pathway Scoring ---------------------------------------------------------
## Expression matrix = RA_exp_mat
?RA_exp_mat

## Vector of "Case" IDs
cases <- c("GSM389703", "GSM389704", "GSM389706", "GSM389708", "GSM389711",
           "GSM389714", "GSM389716", "GSM389717", "GSM389719", "GSM389721",
           "GSM389722", "GSM389724", "GSM389726", "GSM389727", "GSM389730",
           "GSM389731", "GSM389733", "GSM389735")

?calculate_pw_scores

## Calculate pathway scores and plot heatmap for top 10 enriched pathways
score_matrix <- calculate_pw_scores(RA_output[1:10, ],
                                    RA_exp_mat,
                                    cases)

## Calculate pathway scores and plot heatmap for representative patways
score_matrix <- calculate_pw_scores(RA_representative,
                                    RA_exp_mat,
                                    cases)

# works if cases are not supplied as well
score_matrix <- calculate_pw_scores(RA_representative,
                                    RA_exp_mat)


================================================
FILE: tests/testthat/test-active_snw_search.R
================================================
## Tests for functions related to active subnetwork search - Aug 2023

# set up input data
input_data_frame <- example_pathfindR_input[1:10, c(1, 3)]
colnames(input_data_frame) <- c("GENE", "P_VALUE")

example_snws_len <- 1000
example_snw_output <- system.file("extdata", "resultActiveSubnetworkSearch.txt",
    package = "pathfindR")
mock_file_path <- function(...) {
    args <- list(...)
    if (args[[1]] == "active_snw_search") {
        return(example_snw_output)
    }
    return(file.path(...))
}

test_that("`active_snw_search()` -- returns a list object", {
    mockery::stub(active_snw_search, "dir.exists", TRUE)
    mockery::stub(active_snw_search, "file.exists", TRUE)
    mockery::stub(active_snw_search, "normalizePath", NULL)
    mockery::stub(active_snw_search, "system", NULL)
    mockery::stub(active_snw_search, "file.path", mock_file_path)
    mockery::stub(active_snw_search, "file.rename", NULL)

    # Expect > 0 active snws
    expect_message(snw_list <- active_snw_search(input_for_search = input_data_frame),
        "Found [1-9]\\d* active subnetworks")
    expect_is(snw_list, "list")
    expect_is(snw_list[[1]], "character")
    expect_true(length(snw_list) > 0)

    # Expect no active snws
    mockery::stub(active_snw_search, "filterActiveSnws", NULL)
    expect_message(snw_list <- active_snw_search(input_for_search = input_data_frame),
        "Found 0 active subnetworks")
    expect_identical(snw_list, list())
})

test_that("`active_snw_search()` -- `dir_for_parallel_run` arg is used when provided",
    {
        mockery::stub(active_snw_search, "dir.exists", TRUE)
        mockery::stub(active_snw_search, "file.exists", TRUE)
        mockery::stub(active_snw_search, "normalizePath", NULL)
        mockery::stub(active_snw_search, "system", NULL)
        mockery::stub(active_snw_search, "file.path", mock_file_path)
        mockery::stub(active_snw_search, "file.rename", NULL)

        m <- mockery::mock(NULL, cycle = TRUE)
        mockery::stub(active_snw_search, "setwd", m)
        res <- active_snw_search(input_for_search = input_data_frame, dir_for_parallel_run = tempdir())
        mockery::expect_called(m, 2)
    })

test_that("`active_snw_search()` -- argument checks work", {
    # input_for_search
    expect_error(snw_list <- active_snw_search(input_for_search = list()), "`input_for_search` should be data frame")

    invalid_input <- input_data_frame
    colnames(invalid_input) <- c("A", "B")
    expect_error(snw_list <- active_snw_search(input_for_search = invalid_input),
        paste0("`input_for_search` should contain the columns ", paste(dQuote(c("GENE",
            "P_VALUE")), collapse = ",")))

    # snws_file
    expect_error(snw_list <- active_snw_search(input_for_search = input_data_frame,
        snws_file = "[/]"), "`snws_file` may be containing forbidden characters. Please change and try again")

    # search_method
    valid_mets <- c("GR", "SA", "GA")
    expect_error(active_snw_search(input_for_search = input_data_frame, search_method = "INVALID"),
        paste0("`search_method` should be one of ", paste(dQuote(valid_mets), collapse = ", ")))

    # silent_option
    expect_error(active_snw_search(input_for_search = input_data_frame, silent_option = "WRONG"),
        "`silent_option` should be either TRUE or FALSE")

    expect_error(active_snw_search(input_for_search = input_data_frame, use_all_positives = "INVALID"),
        "`use_all_positives` should be either TRUE or FALSE")
})

test_that("`active_snw_search()` -- all search methods work", {
    skip_on_cran()
    ## GR
    expect_message(snw_list <- active_snw_search(input_for_search = input_data_frame,
        pin_name_path = "Biogrid", search_method = "GR", dir_for_parallel_run = tempdir(check = TRUE)),
        "Found [1-9]\\d* active subnetworks")
    expect_is(snw_list, "list")
    expect_is(snw_list[[1]], "character")

    skip("will test SA and GA if we can create a suitable (faster and non-empty) test case")
    ## SA
    expect_message(snw_list <- active_snw_search(input_for_search = input_data_frame,
        pin_name_path = "Biogrid", search_method = "SA", dir_for_parallel_run = tempdir(check = TRUE)),
        "Found [1-9]\\d* active subnetworks")
    expect_is(snw_list, "list")
    expect_is(snw_list[[1]], "character")

    ## GA
    expect_message(snw_list <- active_snw_search(input_for_search = input_data_frame,
        pin_name_path = "Biogrid", search_method = "GA", dir_for_parallel_run = tempdir(check = TRUE)),
        "Found [1-9]\\d* active subnetworks")
    expect_is(snw_list, "list")
    expect_is(snw_list[[1]], "character")
})

test_that("`active_snw_search()` -- results are reproducible", {
    skip_on_cran()
    snw_lists <- list()
    seed_vals <- c(123, 123, 456)
    for (idx in 1:3) {
        seed <- seed_vals[idx]
        snw_lists[[idx]] <- active_snw_search(input_for_search = input_data_frame,
            seedForRandom = seed, dir_for_parallel_run = tempdir(check = TRUE))
    }
    expect_identical(snw_lists[[1]], snw_lists[[2]])
    expect_false(identical(snw_lists[[1]], snw_lists[[3]]))
})


test_that("`filterActiveSnws()` -- returns expected list object", {
    snws_filtered <- filterActiveSnws(active_snw_path = example_snw_output, sig_genes_vec = input_data_frame$GENE)
    expect_is(snws_filtered, "list")
    expect_length(snws_filtered, 2)
    expect_is(snws_filtered$subnetworks, "list")
    expect_is(snws_filtered$scores, "numeric")

    expect_is(snws_filtered$subnetworks[[1]], "character")
    expect_true(length(snws_filtered$subnetworks) <= example_snws_len)

    # empty file case
    empty_path <- tempfile("empty", fileext = ".txt")
    file.create(empty_path)
    expect_null(suppressWarnings(filterActiveSnws(active_snw_path = empty_path, sig_genes_vec = input_data_frame$GENE)))
})

test_that("`filterActiveSnws()` -- `score_quan_thr` works", {
    snws_filtered <- filterActiveSnws(active_snw_path = example_snw_output, sig_genes_vec = example_pathfindR_input$Gene.symbol,
        score_quan_thr = -1, sig_gene_thr = 0)
    expect_length(snws_filtered$subnetworks, example_snws_len)

    for (q_thr in seq(0.1, 1, by = 0.1)) {
        snws_filtered <- filterActiveSnws(active_snw_path = example_snw_output, sig_genes_vec = example_pathfindR_input$Gene.symbol,
            score_quan_thr = q_thr, sig_gene_thr = 0)
        exp_len <- example_snws_len * (1 - q_thr)
        expect_length(snws_filtered$subnetworks, as.integer(exp_len + 0.5))
    }
})

test_that("`filterActiveSnws()` -- `sig_gene_thr` works", {
    snws_filtered1 <- filterActiveSnws(active_snw_path = example_snw_output, sig_genes_vec = example_pathfindR_input$Gene.symbol,
        sig_gene_thr = 0.02, score_quan_thr = -1)
    snws_filtered2 <- filterActiveSnws(active_snw_path = example_snw_output, sig_genes_vec = example_pathfindR_input$Gene.symbol,
        sig_gene_thr = 0.1, score_quan_thr = -1)

    expect_true(length(snws_filtered2$subnetworks) < example_snws_len)
    expect_true(length(snws_filtered1$subnetworks) > length(snws_filtered2$subnetworks))
})

test_that("`filterActiveSnws()` -- argument checks work", {
    expect_error(filterActiveSnws(active_snw_path = "this/is/not/a/valid/path"),
        "The active subnetwork file does not exist! Check the `active_snw_path` argument")

    expect_error(filterActiveSnws(active_snw_path = example_snw_output, sig_genes_vec = list()),
        "`sig_genes_vec` should be a vector")

    expect_error(filterActiveSnws(active_snw_path = example_snw_output, sig_genes_vec = example_pathfindR_input$Gene.symbol,
        score_quan_thr = "INVALID"), "`score_quan_thr` should be numeric")
    expect_error(filterActiveSnws(active_snw_path = example_snw_output, sig_genes_vec = example_pathfindR_input$Gene.symbol,
        score_quan_thr = -2), "`score_quan_thr` should be in \\[0, 1\\] or -1 \\(if not filtering\\)")
    expect_error(filterActiveSnws(active_snw_path = example_snw_output, sig_genes_vec = example_pathfindR_input$Gene.symbol,
        score_quan_thr = 2), "`score_quan_thr` should be in \\[0, 1\\] or -1 \\(if not filtering\\)")

    expect_error(filterActiveSnws(active_snw_path = example_snw_output, sig_genes_vec = example_pathfindR_input$Gene.symbol,
        sig_gene_thr = "INVALID"), "`sig_gene_thr` should be numeric")
    expect_error(filterActiveSnws(active_snw_path = example_snw_output, sig_genes_vec = example_pathfindR_input$Gene.symbol,
        sig_gene_thr = -1), "`sig_gene_thr` should be in \\[0, 1\\]")
})

test_that("`visualize_active_subnetworks()` -- returns list of ggraph objects", {
    # empty file case
    empty_path <- tempfile("empty", fileext = ".txt")
    file.create(empty_path)
    expect_null(visualize_active_subnetworks(active_snw_path = empty_path, genes_df = input_data_frame))

    skip_on_cran()
    # default
    g_list <- visualize_active_subnetworks(example_snw_output, input_data_frame)
    expect_is(g_list, "list")
    expect_is(g_list[[1]], "ggraph")
    expect_true(length(g_list) <= example_snws_len)

    # set `num_snws` to larger than actual number
    g_list <- visualize_active_subnetworks(example_snw_output, input_data_frame,
        num_snws = 21)
    expect_is(g_list, "list")
    expect_is(g_list[[1]], "ggraph")
    expect_length(g_list, 13)
})


================================================
FILE: tests/testthat/test-clustering.R
================================================
## Tests for enriched term clustering functions - Aug 2023

enrichment_res <- example_pathfindR_output[1:5, ]
input_kappa_mat <- create_kappa_matrix(enrichment_res)

test_that("`create_kappa_matrix()` -- creates kappa matrix", {
    input_df <- enrichment_res
    kappa_mat <- create_kappa_matrix(input_df)
    expect_true(isSymmetric.matrix(kappa_mat))
    expect_true(all(kappa_mat >= 0 & kappa_mat <= 1 | kappa_mat >= -1 & kappa_mat <=
        0))
    expect_identical(colnames(kappa_mat), rownames(kappa_mat))
    expect_identical(colnames(kappa_mat), input_df$ID)

    # zero length excluded
    input_df2 <- input_df
    input_df2$Down_regulated[1] <- input_df2$Up_regulated[1] <- ""
    kappa_mat2 <- create_kappa_matrix(input_df2)
    expect_true(isSymmetric.matrix(kappa_mat2))
    expect_false(input_df2$ID[1] %in% colnames(kappa_mat2))

    input_df$non_Signif_Snw_Genes <- c("GeneA, GeneB", "GeneA", "GeneC", "GeneB, GeneC",
        "")
    kappa_mat3 <- create_kappa_matrix(input_df, use_active_snw_genes = TRUE)
    expect_true(isSymmetric.matrix(kappa_mat3))
    expect_true(!all(kappa_mat3 != kappa_mat))
})

test_that("`create_kappa_matrix()` -- argument checks works", {
    expect_error(create_kappa_matrix(example_pathfindR_output, use_description = "INVALID"),
        "`use_description` should be TRUE or FALSE")
    expect_error(create_kappa_matrix(example_pathfindR_output, use_active_snw_genes = "INVALID"),
        "`use_active_snw_genes` should be TRUE or FALSE")
    expect_error(create_kappa_matrix(list()), "`enrichment_res` should be a data frame of enrichment results")
    expect_error(create_kappa_matrix(example_pathfindR_output[1, ]), "`enrichment_res` should contain at least 2 rows")

    cr_cols <- function(use_description = FALSE, use_active_snw_genes = FALSE) {
        nec_cols <- c("Down_regulated", "Up_regulated")
        if (use_description) {
            nec_cols <- c("Term_Description", nec_cols)
        } else {
            nec_cols <- c("ID", nec_cols)
        }
        if (use_active_snw_genes) {
            nec_cols <- c(nec_cols, "non_Signif_Snw_Genes")
        }
        return(nec_cols)
    }

    # desc F
    nec_cols <- cr_cols()
    valid_res <- enrichment_res[, -2]
    expect_silent(create_kappa_matrix(valid_res))
    invalid_res <- enrichment_res[, -1]
    expect_error(create_kappa_matrix(invalid_res), paste0("`enrichment_res` should contain all of ",
        paste(dQuote(nec_cols), collapse = ", ")))
    # desc T
    nec_cols <- cr_cols(use_description = TRUE)
    valid_res <- enrichment_res[, -1]
    expect_silent(create_kappa_matrix(valid_res, use_description = TRUE))
    invalid_res <- enrichment_res[, -2]
    expect_error(create_kappa_matrix(invalid_res, use_description = TRUE), paste0("`enrichment_res` should contain all of ",
        paste(dQuote(nec_cols), collapse = ", ")))
    # snw_g T
    nec_cols <- cr_cols(use_active_snw_genes = TRUE)
    valid_res <- enrichment_res
    valid_res$non_Signif_Snw_Genes <- ""
    expect_silent(create_kappa_matrix(valid_res, use_active_snw_genes = TRUE))
    expect_error(create_kappa_matrix(enrichment_res, use_active_snw_genes = TRUE),
        paste0("`enrichment_res` should contain all of ", paste(dQuote(nec_cols),
            collapse = ", ")))
})

test_that("`hierarchical_term_clustering()` -- returns integer vector", {
    m <- mockery::mock(NULL, cycle = TRUE)
    mockery::stub(hierarchical_term_clustering, "graphics::plot", m)
    mockery::stub(hierarchical_term_clustering, "stats::heatmap", m)
    mockery::stub(hierarchical_term_clustering, "stats::rect.hclust", m)

    expected_message_regex <- "The maximum average silhouette width was -?(0\\.?\\d{0,2}|1) for k = \\d+ \n\n"
    expect_message(clu_res <- hierarchical_term_clustering(input_kappa_mat, enrichment_res,
        plot_hmap = TRUE, plot_dend = TRUE), expected_message_regex)
    expect_is(clu_res, "integer")
    expect_true(max(clu_res) <= nrow(input_kappa_mat))
    expect_identical(rownames(input_kappa_mat), names(clu_res))
})

test_that("`hierarchical_term_clustering()` -- `num_clusters` works", {
    for (selected_num_clusters in seq_len(nrow(enrichment_res))) {
        expect_is(res <- hierarchical_term_clustering(input_kappa_mat, enrichment_res,
            num_clusters = selected_num_clusters, plot_hmap = FALSE, plot_dend = FALSE),
            "integer")
        expect_equal(max(res), selected_num_clusters)
    }
})

test_that("`hierarchical_term_clustering()` -- `kseq` (sequence of number of clusters to try) is determined appropriately",
    {
        mockery::stub(hierarchical_term_clustering, "stats::hclust", NULL)
        mockery::stub(hierarchical_term_clustering, "isSymmetric.matrix", TRUE)

        mock_cutree <- function(tree, k, h = NULL) {
            return(k)
        }
        mockery::stub(hierarchical_term_clustering, "stats::cutree", mock_cutree)

        for (num_terms in c(3, 15, 153, 200, 204, 432)) {
            kmax <- max(num_terms%/%2, 2)
            num_expected_calls <- ifelse(kmax <= 20, kmax - 1, ifelse(kmax <= 100,
                18 + kmax%/%10 - 1, 26 + kmax%/%50 - 1))
            target_k <- ifelse(kmax <= 20, kmax, ifelse(kmax <= 100, round(kmax%/%10) *
                10, round(kmax%/%50) * 50))

            tmp_enr_res <- example_pathfindR_output[seq_len(num_terms), ]
            tmp_kappa_mat <- matrix(NA, nrow = num_terms, ncol = num_terms, dimnames = list(tmp_enr_res$ID,
                tmp_enr_res$ID))

            silwidth_out_vec <- vector("list", num_expected_calls)
            for (idx in seq_len(num_expected_calls)) {
                if (idx == length(silwidth_out_vec)) {
                  silwidth_out_vec[[idx]] <- list(avg.silwidth = 100)
                } else {
                  silwidth_out_vec[[idx]] <- list(avg.silwidth = -100)
                }
            }
            mock_cluster.stats <- do.call(mockery::mock, silwidth_out_vec)
            mockery::stub(hierarchical_term_clustering, "fpc::cluster.stats", mock_cluster.stats)

            expected_message <- paste0("The maximum average silhouette width was 100 for k = ",
                target_k, " \n\n")
            expect_message(res_k <- hierarchical_term_clustering(tmp_kappa_mat, tmp_enr_res,
                plot_hmap = FALSE, plot_dend = FALSE), expected_message)
            expect_equal(res_k, target_k)
            mockery::expect_called(mock_cluster.stats, num_expected_calls)
        }
    })

test_that("`hierarchical_term_clustering()` -- argument checks work", {
    expect_error(hierarchical_term_clustering(kappa_mat = list(), enrichment_res = data.frame()),
        "`kappa_mat` should be a symmetric matrix")
    expect_error(hierarchical_term_clustering(kappa_mat = matrix(nrow = 1, ncol = 2),
        enrichment_res = data.frame()), "`kappa_mat` should be a symmetric matrix")

    mat <- matrix(nrow = 3, ncol = 3, dimnames = list(1:3, 1:3))
    expect_error(hierarchical_term_clustering(kappa_mat = mat, enrichment_res = data.frame(ID = 4:5)),
        "All terms in `kappa_mat` should be present in `enrichment_res`")
    expect_error(hierarchical_term_clustering(kappa_mat = mat, enrichment_res = data.frame(ID = 1:3),
        plot_hmap = "INVALID"), "`plot_hmap` should be TRUE or FALSE")
    expect_error(hierarchical_term_clustering(kappa_mat = mat, enrichment_res = data.frame(ID = 1:3),
        plot_dend = "INVALID"), "`plot_dend` should be TRUE or FALSE")
})

test_that("`fuzzy_term_clustering()` -- returns matrix of cluster memberships", {
    expect_is(res_mat <- fuzzy_term_clustering(create_kappa_matrix(example_pathfindR_output[1:25,
        ]), example_pathfindR_output[1:25, ], kappa_threshold = 0.1), "matrix")
    expect_true(is.logical(res_mat))
})

test_that("`fuzzy_term_clustering()` -- argument checks work", {
    expect_error(fuzzy_term_clustering(kappa_mat = list(), enrichment_res = data.frame()),
        "`kappa_mat` should be a symmetric matrix")
    expect_error(fuzzy_term_clustering(kappa_mat = matrix(nrow = 1, ncol = 2), enrichment_res = data.frame()),
        "`kappa_mat` should be a symmetric matrix")

    mat <- matrix(nrow = 3, ncol = 3, dimnames = list(1:3, 1:3))
    expect_error(fuzzy_term_clustering(kappa_mat = mat, enrichment_res = data.frame(ID = 4:5)),
        "All terms in `kappa_mat` should be present in `enrichment_res`")
    expect_error(fuzzy_term_clustering(kappa_mat = mat, enrichment_res = data.frame(ID = 1:3),
        kappa_threshold = "INVALID"), "`kappa_threshold` should be numeric")
    expect_error(fuzzy_term_clustering(kappa_mat = mat, enrichment_res = data.frame(ID = 1:3),
        kappa_threshold = 1.5), "`kappa_threshold` should be at most 1 as kappa statistic is always <= 1")
})

test_that("`cluster_graph_vis()` -- graph visualization of clusters works OK", {
    mockery::stub(hierarchical_term_clustering, "graphics::plot", NULL)
    mockery::stub(hierarchical_term_clustering, "stats::rect.hclust", NULL)
    mock_plot.igraph <- mockery::mock(NULL, cycle = TRUE)
    mockery::stub(cluster_graph_vis, "igraph::plot.igraph", mock_plot.igraph)
    ## use_description = FALSE
    for (clustering_func in c(hierarchical_term_clustering, fuzzy_term_clustering)) {
        clu_obj <- clustering_func(input_kappa_mat, enrichment_res)
        expect_silent(cluster_graph_vis(clu_obj, input_kappa_mat, enrichment_res))
    }
    mockery::expect_called(mock_plot.igraph, 2)
})

test_that("`cluster_graph_vis()` -- coloring of 'extra' clusters work", {
    mockery::stub(cluster_graph_vis, "igraph::plot.igraph", NULL)
    ### more than 41 clusters (number of colors available)
    selected_num_terms <- 45
    clu_input_df <- example_pathfindR_output[seq_len(selected_num_terms), ]
    mock_kappa_mat <- matrix(0, nrow = selected_num_terms, ncol = selected_num_terms,
        dimnames = list(clu_input_df$ID, clu_input_df$ID))

    # dummy hierarchical result
    hierarchical_clu_obj <- seq_len(selected_num_terms)
    names(hierarchical_clu_obj) <- clu_input_df$ID
    expect_silent(cluster_graph_vis(hierarchical_clu_obj, mock_kappa_mat, clu_input_df))

    # dummy fuzzy result
    fuzzy_clu_obj <- matrix(FALSE, nrow = selected_num_terms, ncol = selected_num_terms,
        dimnames = list(clu_input_df$ID, seq_len(selected_num_terms)))
    diag(fuzzy_clu_obj) <- TRUE
    expect_silent(cluster_graph_vis(fuzzy_clu_obj, mock_kappa_mat, clu_input_df))
})

test_that("`cluster_graph_vis()` -- check errors are raised appropriately", {
    expect_error(cluster_graph_vis(list(), matrix(), data.frame(ID = 1)), "Invalid class for `clu_obj`!")

    # hierarchical - missing terms in kappa matrix
    clu_obj <- hierarchical_term_clustering(input_kappa_mat, enrichment_res, plot_dend = FALSE)
    expect_error(cluster_graph_vis(c(clu_obj, EXTRA = 1L), input_kappa_mat, enrichment_res),
        "Not all terms in `clu_obj` present in `kappa_mat`!")

    # fuzzy - missing terms in kappa matrix
    clu_obj <- fuzzy_term_clustering(input_kappa_mat, enrichment_res)
    expect_error(cluster_graph_vis(rbind(clu_obj, EXTRA = rep(FALSE, ncol(clu_obj))),
        input_kappa_mat, enrichment_res), "Not all terms in `clu_obj` present in `kappa_mat`!")
})

test_that("`cluster_enriched_terms()` -- returns the input data frame
          with the additional columns `Cluster` and `Status`",
    {
        set.seed(123)
        num_clusters <- 3
        available_clus <- seq_len(num_clusters)
        toy_hierarchical_clu_obj <- sample(available_clus, size = nrow(enrichment_res) -
            1, replace = TRUE)
        missing_clu <- setdiff(available_clus, toy_hierarchical_clu_obj)
        toy_hierarchical_clu_obj <- c(toy_hierarchical_clu_obj, missing_clu)
        toy_hierarchical_clu_obj <- sample(toy_hierarchical_clu_obj)
        names(toy_hierarchical_clu_obj) <- enrichment_res$ID

        toy_fuzzy_clu_obj <- matrix(FALSE, nrow = nrow(enrichment_res), ncol = num_clusters,
            dimnames = list(enrichment_res$ID, available_clus))
        for (row_idx in seq_len(nrow(toy_fuzzy_clu_obj))) {
            num_memberships <- sample(available_clus, 1)
            new_cols <- sample(available_clus, size = num_memberships)
            toy_fuzzy_clu_obj[row_idx, new_cols] <- TRUE
        }

        mock_doCall <- function(...) {
            arguments <- list(...)
            if (arguments[1] == "hierarchical_term_clustering") {
                return(toy_hierarchical_clu_obj)
            }
            if (arguments[1] == "fuzzy_term_clustering") {
                return(toy_fuzzy_clu_obj)
            }
            return(NULL)
        }

        mockery::stub(cluster_enriched_terms, "create_kappa_matrix", input_kappa_mat)
        mockery::stub(cluster_enriched_terms, "R.utils::doCall", mock_doCall)

        # hierarchical
        expect_is(h_clu_res <- cluster_enriched_terms(enrichment_res), "data.frame")
        expect_true(all(c("Cluster", "Status") %in% colnames(h_clu_res)))
        expect_equal(max(h_clu_res$Cluster), num_clusters)
        # expect to have same number of rep. terms as the number of clusters
        expect_equal(max(h_clu_res$Cluster), sum(h_clu_res$Status == "Representative"))

        ## fuzzy
        expect_is(fuzzy_clu_res <- cluster_enriched_terms(enrichment_res, method = "fuzzy"),
            "data.frame")
        expect_true(all(c("Cluster", "Status") %in% colnames(fuzzy_clu_res)))
        expect_true(max(fuzzy_clu_res$Cluster) <= sum(fuzzy_clu_res$Status == "Representative"))
    })

test_that("`cluster_enriched_terms()` argument checks work", {
    expect_error(cluster_enriched_terms(enrichment_res, method = "INVALID"), "the clustering `method` must either be \"hierarchical\" or \"fuzzy\"")
    expect_error(cluster_enriched_terms(enrichment_res, plot_clusters_graph = "INVALID"),
        "`plot_clusters_graph` must be logical!")
})


================================================
FILE: tests/testthat/test-comparison.R
================================================
## Tests for functions related to comparison of pathfindR results - Aug 2023

input_df_A <- example_pathfindR_output[1:20, ]
input_df_B <- example_comparison_output[1:20, ]

test_that("`combine_pathfindR_results()` -- works as expected", {
    mock_graph <- mockery::mock(NULL)
    mock_plot <- mockery::mock(NULL)
    mockery::stub(combine_pathfindR_results, "combined_results_graph", mock_graph)
    mockery::stub(combine_pathfindR_results, "graphics::plot", mock_plot)
    expect_is(combined <- combine_pathfindR_results(input_df_A, input_df_B), "data.frame")
    expect_true(nrow(combined) <= nrow(input_df_A) + nrow(input_df_B))
    mockery::expect_called(mock_plot, 1)
    mockery::expect_called(mock_graph, 1)
})


combined_df <- combine_pathfindR_results(input_df_A, input_df_B, plot_common = FALSE)
combined_df2 <- combined_df[combined_df$status != "common", ]

test_that("`combined_results_graph()` -- produces a ggplot object using the correct data",
    {
        # Common Terms, default
        expect_is(p <- combined_results_graph(combined_df), "ggplot")
        expect_true(all(p$data$type %in% c("gene", "common term")))
        expect_equal(sum(p$data$type == "common term"), sum(combined_df$status ==
            "common"))

        # Selected 5 Terms
        sel_terms <- combined_df$ID[1:5]
        expect_is(p <- combined_results_graph(combined_df, selected_terms = sel_terms),
            "ggplot")
        expect_true(all(sel_terms %in% p$data$name))

        # use_description = TRUE
        expect_is(p <- combined_results_graph(combined_df, use_description = TRUE),
            "ggplot")

        # node_size = 'p_val'
        expect_is(p <- combined_results_graph(combined_df, node_size = "p_val"),
            "ggplot")

        # errors when there are no common terms
        expect_error(combined_results_graph(combined_df2), "There are no common terms")
    })

test_that("`combined_results_graph()` -- argument checks work", {
    expect_error(combined_results_graph(combined_df, use_description = "INVALID"),
        "`use_description` must either be TRUE or FALSE!")

    val_node_size <- c("num_genes", "p_val")
    expect_error(combined_results_graph(combined_df, node_size = "INVALID"), paste0("`node_size` should be one of ",
        paste(dQuote(val_node_size), collapse = ", ")))

    expect_error(combined_results_graph(combined_df = "INVALID"), "`combined_df` should be a data frame")

    wrong_df <- combined_df[, -c(1, 2)]
    ID_column <- "ID"
    necessary_cols <- c(ID_column, "combined_p", "Up_regulated_A", "Down_regulated_A",
        "Up_regulated_B", "Down_regulated_B")
    expect_error(combined_results_graph(wrong_df, use_description = FALSE), paste(c("All of",
        paste(necessary_cols, collapse = ", "), "must be present in `results_df`!"),
        collapse = " "))

    ID_column <- "Term_Description"
    necessary_cols <- c(ID_column, "combined_p", "Up_regulated_A", "Down_regulated_A",
        "Up_regulated_B", "Down_regulated_B")
    expect_error(combined_results_graph(wrong_df, use_description = TRUE), paste(c("All of",
        paste(necessary_cols, collapse = ", "), "must be present in `results_df`!"),
        collapse = " "))

    expect_error(combined_results_graph(combined_df, selected_terms = "INVALID"),
        "None of the `selected_terms` are in the combined results!")
})


================================================
FILE: tests/testthat/test-core.R
================================================
## Tests for core function - Aug 2023

# set up input data
input_data_frame <- example_pathfindR_input[1:10, c(1, 3)]
colnames(input_data_frame) <- c("GENE", "P_VALUE")

test_that("`run_pathfindR()` -- works as expected", {
    mock_fetch_gene_set <- mockery::mock(list(), cycle = TRUE)
    mock_return_pin_path <- mockery::mock("/path/to/some/PIN/SIF", cycle = TRUE)
    mock_input_processing <- mockery::mock(input_data_frame, cycle = TRUE)
    mock_active_snw_enrichment_wrapper <- mockery::mock(data.frame(), c())
    mock_summarize_enrichment_results <- mockery::mock(data.frame())
    mock_annotate_term_genes <- mockery::mock(example_pathfindR_output)
    mock_plot <- mockery::mock(NULL)

    mockery::stub(run_pathfindR, "fetch_gene_set", mock_fetch_gene_set)
    mockery::stub(run_pathfindR, "return_pin_path", mock_return_pin_path)
    mockery::stub(run_pathfindR, "input_processing", mock_input_processing)
    mockery::stub(run_pathfindR, "active_snw_enrichment_wrapper", mock_active_snw_enrichment_wrapper)
    mockery::stub(run_pathfindR, "summarize_enrichment_results", mock_summarize_enrichment_results)
    mockery::stub(run_pathfindR, "annotate_term_genes", mock_annotate_term_genes)
    mockery::stub(run_pathfindR, "graphics::plot", mock_plot)
    mockery::stub(run_pathfindR, "create_HTML_report", NULL)

    expected_messages <- paste(c("The input looks OK", "Plotting the enrichment bubble chart",
        paste(c(paste0("Found ", nrow(example_pathfindR_output), " enriched terms\n"),
            "You may run:", "- cluster_enriched_terms() for clustering enriched terms",
            "- visualize_terms() for visualizing enriched term diagrams\n"), collapse = "\n")),
        collapse = "|")
    # wrapper functions correctly - with output_dir provided
    out_dir <- file.path(tempdir(check = TRUE), "core_test")
    expect_message(res <- run_pathfindR(input_data_frame, output_dir = out_dir),
        expected_messages)
    expect_is(res, "data.frame")
    expect_identical(res, example_pathfindR_output)
    expect_true(dir.exists(out_dir))
    mockery::expect_called(mock_fetch_gene_set, 1)
    mockery::expect_called(mock_return_pin_path, 1)
    mockery::expect_called(mock_input_processing, 1)
    mockery::expect_called(mock_active_snw_enrichment_wrapper, 1)
    mockery::expect_called(mock_summarize_enrichment_results, 1)
    mockery::expect_called(mock_annotate_term_genes, 1)
    mockery::expect_called(mock_plot, 1)

    # warning raised as expected when no results found
    expect_warning(res <- run_pathfindR(input_data_frame), "Did not find any enriched terms!")
    expect_identical(res, data.frame())
})

test_that("`run_pathfindR()` argument checks work", {
    expect_error(run_pathfindR(input_data_frame, plot_enrichment_chart = "INVALID"),
        "`plot_enrichment_chart` should be either TRUE or FALSE")
    expect_error(run_pathfindR(input_data_frame, list_active_snw_genes = "INVALID"),
        "`list_active_snw_genes` should be either TRUE or FALSE")
})


================================================
FILE: tests/testthat/test-data_generation.R
================================================
## Tests for functions related to data generation - September 2025
library(httr)
library(ggkegg)

test_that("safe_get_content handles GET error via mocking", {
  fake_GET <- function(...) stop("Simulated connection failure")
  
  with_mocked_bindings(
    {
      expect_error(
        safe_get_content("http://example.com"),
        regexp = "Failed to retrieve resource"
      )
    },
    GET = fake_GET
  )
})

test_that("safe_get_content handles HTTP error via mocking", {
  fake_GET <- function(...) {
    structure(
      list(status_code = 500L),
      class = "response"
    )
  }
  
  with_mocked_bindings(
    {
      expect_error(
        safe_get_content("http://example.com"),
        regexp = "unavailable"
      )
    },
    GET = fake_GET
  )
})

test_that("safe_get_content handles content parsing failure via mocking", {
  fake_GET <- function(...) {
    structure(
      list(status_code = 200L),
      class = "response"
    )
  }
  
  fake_content <- function(...) stop("Simulated parsing failure")
  
  with_mocked_bindings(
    {
      expect_error(
        safe_get_content("http://example.com"),
        regexp = "Failed to parse content"
      )
    },
    GET = fake_GET,
    content = fake_content
  )
})


set.seed(123)
gene_pool <- paste0("Gene", 1:100)
toy_biogrid_pin <- data.frame(A = sample(gene_pool, 25), B = sample(gene_pool, 25))
colnames(toy_biogrid_pin) <- c("Official Symbol Interactor A", "Official Symbol Interactor B")

test_that("`process_pin()` -- removes self-interactions and duplicated interactions",
    {
        input_pin_df <- toy_biogrid_pin
        colnames(input_pin_df) <- c("Interactor_A", "Interactor_B")
        input_pin_df <- rbind(input_pin_df, data.frame(Interactor_A = input_pin_df$Interactor_B[1:5],
            Interactor_B = input_pin_df$Interactor_A[1:5]))

        processed_df <- process_pin(input_pin_df)

        expect_true(nrow(processed_df) < nrow(input_pin_df))
    })

test_that("`get_biogrid_pin()` -- returns a path to a valid PIN file", {
    mockery::stub(get_biogrid_pin, "utils::download.file", NULL)
    mockery::stub(get_biogrid_pin, "utils::unzip", list(Name = "BIOGRID-ORGANISM-Homo_sapiens-4.4.211.tab3.txt"))
    mockery::stub(get_biogrid_pin, "utils::read.delim", toy_biogrid_pin)

    expected_biogrid_pin_df <- toy_biogrid_pin
    colnames(expected_biogrid_pin_df) <- c("Interactor_A", "Interactor_B")
    expected_biogrid_pin_df <- process_pin(expected_biogrid_pin_df)
    expected_biogrid_pin_df <- data.frame(V1 = expected_biogrid_pin_df$Interactor_A,
        V2 = "pp", V3 = expected_biogrid_pin_df$Interactor_B)

    pin_path <- get_biogrid_pin(release = "4.4.211")
    pin_df <- read.delim(pin_path, header = FALSE)
    expect_true(ncol(pin_df) == 3)
    expect_true(all(pin_df[, 2] == "pp"))
    expect_identical(pin_df, expected_biogrid_pin_df)
})

test_that("`get_biogrid_pin()` -- determines and downloads the latest version", {
  mockery::stub(get_biogrid_pin, "safe_get_content", "<h2>BioGRID Release 3.5.183")
  mockery::stub(get_biogrid_pin, "utils::download.file", NULL)
  mockery::stub(get_biogrid_pin, "utils::unzip", list(Name = "BIOGRID-ORGANISM-Homo_sapiens-X.X.X.tab3.txt"))
  mockery::stub(get_biogrid_pin, "utils::read.delim", toy_biogrid_pin)

  expected_biogrid_pin_df <- toy_biogrid_pin
  colnames(expected_biogrid_pin_df) <- c("Interactor_A", "Interactor_B")
  expected_biogrid_pin_df <- process_pin(expected_biogrid_pin_df)
  expected_biogrid_pin_df <- data.frame(V1 = expected_biogrid_pin_df$Interactor_A,
                                        V2 = "pp", V3 = expected_biogrid_pin_df$Interactor_B)

  pin_path <- get_biogrid_pin()
  pin_df <- read.delim(pin_path, header = FALSE)
  expect_true(ncol(pin_df) == 3)
  expect_true(all(pin_df[, 2] == "pp"))
  expect_identical(pin_df, expected_biogrid_pin_df)
})

test_that("`get_biogrid_pin()` -- error check works", {
    # invalid organism error
    expect_error(get_biogrid_pin(org = "Hsapiens"), paste("Hsapiens is not a valid Biogrid organism.",
        "Available organisms are listed on: https://wiki.thebiogrid.org/doku.php/statistics"))
})

test_that("`get_pin_file()` -- works as expected", {
    with_mocked_bindings({
        pin_path <- get_pin_file()
        expect_identical(pin_path, "/path/to/some/PIN/file")
    }, get_biogrid_pin = function(...) "/path/to/some/PIN/file", .package = "pathfindR")

    expect_error(get_pin_file(source = "STRING"), "As of this version, this function is implemented to get data from BioGRID only")
})

test_that("`gset_list_from_gmt()` -- works as expected", {
    gmt_list <- list(GSA = sample(gene_pool, 80), GSB = sample(gene_pool, 100), GSC = sample(gene_pool,
        33))
    description_vec <- c(GSA = "gene set A", GSB = "gene set B", GSC = "gene set C")

    gmt_df <- c()
    for (gset in names(gmt_list)) {
        tmp <- c(gset, description_vec[gset])
        tmp <- c(tmp, gmt_list[[gset]], rep("", 100 - length(gmt_list[[gset]])))
        gmt_df <- rbind(gmt_df, tmp)
    }

    path2gmt <- tempfile()
    write.table(gmt_df, path2gmt, sep = "\t", col.names = FALSE, row.names = FALSE,
        quote = FALSE)

    expect_is(res <- gset_list_from_gmt(path2gmt), "list")
    expect_identical(res$gene_sets, gmt_list)
    expect_identical(res$descriptions, description_vec)
})


test_that("`get_kegg_gsets()` -- works as expected", {
  skip_on_cran()
  mock_response <- "pathway1\tdescription\npathway2\tdescription2"
  
  mock_pw_graph1 <- igraph::graph_from_data_frame(
    data.frame(from = c("A", "B"), to = c("B", "C")),
    vertices = data.frame(
      name = c("A", "B", "C"),
      type = c("gene", "not_gene", "gene")
    )
  )
  
  mock_pw_graph2 <- igraph::graph_from_data_frame(
    data.frame(from = c("D", "F"), to = c("E", "G")),
    vertices = data.frame(
      name = c("D", "E", "F", "G"),
      type = c("gene", "gene", "not_gene", "gene")
    )
  )
  
  mock_pathway <- function(pid, ...) {
    if (pid == "pathway1") {
      return(mock_pw_graph1)
    } else if (pid == "pathway2") {
      return(mock_pw_graph2)
    } else {
      stop("Unknown pid")
    }
  }
  
  with_mocked_bindings(
    {
      expect_is(toy_eco_kegg <- pathfindR:::get_kegg_gsets(), "list")
    },
    safe_get_content = function(...) mock_response,
    pathway = mock_pathway
  )

  expect_length(toy_eco_kegg, 2)
  expect_true(all(names(toy_eco_kegg) == c("gene_sets", "descriptions")))
  expect_true(all(names(toy_eco_kegg[["gene_sets"]]) %in% names(toy_eco_kegg[["descriptions"]])))
  expect_length(toy_eco_kegg[["gene_sets"]], 2)
  expect_length(toy_eco_kegg[["descriptions"]], 2)

  expect_true(toy_eco_kegg[["descriptions"]]["pathway1"] == "description")
  expect_true(toy_eco_kegg[["descriptions"]]["pathway2"] == "description2")

  expect_identical(toy_eco_kegg[["gene_sets"]][["pathway1"]], c("A", "C"))
  expect_identical(toy_eco_kegg[["gene_sets"]][["pathway2"]], c("D", "E", "G"))
})

test_that("`get_reactome_gsets()` -- works as expected", {
  skip_on_cran()

  pw1 <- "Pathway1"
  pw2 <- "Pathway2"
  desc1 <- "Description1"
  desc2 <- "Description2"
  genes1 <- c("GeneA", "GeneB")
  genes2 <- c("GeneC", "GeneD", "GeneE")

  gmt_content <- paste(
    c(
      paste(c(desc1, pw1, genes1), collapse = "\t"),
      paste(c(desc2, pw2, genes2), collapse = "\t")
    ),
    collapse = "\n"
  )

  mockery::stub(get_reactome_gsets, "utils::download.file", NULL)

  unz_mock <- function(zipfile, filename, ...) {
    textConnection(gmt_content)
  }
  mockery::stub(get_reactome_gsets, "unz", unz_mock)

  expected_gsets <- list(genes1, genes2)
  names(expected_gsets) <- c(pw1, pw2)
  expect_descriptions <- c(desc1, desc2)
  names(expect_descriptions) <- c(pw1, pw2)

  expect_is(reactome <- get_reactome_gsets(), "list")
  expect_length(reactome, 2)
  expect_length(reactome$gene_sets, 2)
  expect_length(reactome$descriptions, 2)
  expect_equal(names(reactome$gene_sets), names(reactome$descriptions))
  expect_equal(reactome$gene_sets, expected_gsets)
  expect_equal(reactome$descriptions, expect_descriptions)
})

test_that("`get_mgsigdb_gsets()` -- works as expected", {
    toy_msigdb_df <- c()
    for (gs_idx in 1:5) {
        toy_msigdb_df <- rbind(toy_msigdb_df, data.frame(gene_symbol = sample(gene_pool,
            sample(25:75, 1)), gs_id = paste0("GS", gs_idx), gs_name = paste("Gene Set",
            gs_idx)))
    }
    mockery::stub(get_mgsigdb_gsets, "msigdbr::msigdbr", toy_msigdb_df)

    expect_is(res_msig_db <- get_mgsigdb_gsets(collection = "C1"), "list")
    expect_length(res_msig_db, 2)
    expect_true(all(names(res_msig_db) == c("gene_sets", "descriptions")))
    expect_true(all(names(res_msig_db[["gene_sets"]] %in% names(res_msig_db[["descriptions"]]))))
})

test_that("`get_gene_sets_list()` works", {
    expect_error(gsets <- get_gene_sets_list("Wiki"), "As of this version, this function is implemented to get data from KEGG, Reactome and MSigDB only")

    mockery::stub(get_gene_sets_list, "get_kegg_gsets", NULL)
    mockery::stub(get_gene_sets_list, "get_reactome_gsets", NULL)
    mockery::stub(get_gene_sets_list, "get_mgsigdb_gsets", NULL)
    expect_silent(kegg <- get_gene_sets_list(org_code = "vcn"))
    expect_message(rctm <- get_gene_sets_list("Reactome"))
    expect_silent(msig <- get_gene_sets_list("MSigDB", species = "Mus musculus", db_species = "MS",
        collection = "C3", subcollection = "MIR:MIR_Legacy"))
})


================================================
FILE: tests/testthat/test-enrichment.R
================================================
## Tests for functions related to enrichment analyses - Aug 2023
set.seed(123)

test_that("`hyperg_test()` -- returns an appropriate p value", {
    expect_is(tmp_p <- hyperg_test(term_genes = LETTERS[1:10], chosen_genes = LETTERS[2:5],
        background_genes = LETTERS), "numeric")
    expect_true(tmp_p >= 0 & tmp_p <= 1)

    expect_is(tmp_p2 <- hyperg_test(term_genes = LETTERS[1:4], chosen_genes = LETTERS[3:10],
        background_genes = LETTERS), "numeric")
    expect_true(tmp_p2 >= 0 & tmp_p2 <= 1)
    expect_true(tmp_p2 > tmp_p)
})

test_that("`hyperg_test()` -- argument checks work", {
    expect_error(hyperg_test(term_genes = list()), "`term_genes` should be a vector")
    expect_error(hyperg_test(term_genes = LETTERS, chosen_genes = list()), "`chosen_genes` should be a vector")
    expect_error(hyperg_test(term_genes = LETTERS, chosen_genes = LETTERS[1:2], background_genes = list()),
        "`background_genes` should be a vector")
    expect_error(hyperg_test(term_genes = c(LETTERS, LETTERS), chosen_genes = LETTERS[1:3],
        background_genes = LETTERS), "`term_genes` cannot be larger than `background_genes`!")
    expect_error(hyperg_test(term_genes = LETTERS[1:10], chosen_genes = c(LETTERS,
        LETTERS), background_genes = LETTERS), "`chosen_genes` cannot be larger than `background_genes`!")
})

test_that("`enrichment()` -- returns a data frame", {
    expected_num_significant <- 10
    gsets <- example_pathfindR_output$ID[1:50]
    p_val_vec <- c(runif(expected_num_significant, min = 1e-05, max = 0.001), runif(length(gsets) -
        expected_num_significant, min = 0.05, max = 1))
    names(p_val_vec) <- gsets
    mock_vapply <- mockery::mock(p_val_vec, 5, 2, cycle = TRUE)
    mockery::stub(enrichment, "vapply", mock_vapply)
    mockery::stub(enrichment, "base::setdiff", c("RPS6KA2", "HSPA2", "SCN4B", "PPP2R1B",
        "PTCH1", "CASP10", "TIRAP", "BEX3", "KIF5C", "TNFSF13B"))

    # default
    expect_is(enr_res <- enrichment(input_genes = example_pathfindR_input$Gene.symbol,
        sig_genes_vec = c("DummyGene"), background_genes = c("DummyGene")), "data.frame")
    expect_equal(nrow(enr_res), expected_num_significant)
    expect_true(any(enr_res$non_Signif_Snw_Genes != ""))
    expect_true(all(enr_res$Fold_Enrichment == 2.5))

    # higher threshold - no filter
    expect_is(enr_res2 <- enrichment(input_genes = example_pathfindR_input$Gene.symbol,
        sig_genes_vec = c("DummyGene"), background_genes = c("DummyGene"), enrichment_threshold = 1), "data.frame")
    expect_equal(nrow(enr_res2), 50)
    expect_true(any(enr_res2$non_Signif_Snw_Genes != ""))

    # no enrichment case
    mockery::stub(enrichment, "stats::p.adjust", rep(1, 50))
    expect_null(enr_res3 <- enrichment(input_genes = example_pathfindR_input$Gene.symbol,
        sig_genes_vec = c("DummyGene"), background_genes = c("DummyGene")))
})

test_that("`enrichment()` -- argument checks work", {
    tmp_input_genes <- example_pathfindR_input$Gene.symbol[1:6]
    tmp_sig_vec <- example_pathfindR_input$Gene.symbol[1:3]
    ## input genes
    expect_error(enrichment(input_genes = list(), sig_genes_vec = "PER1", background_genes = unlist(kegg_genes)),
        "`input_genes` should be a vector of gene symbols")

    ## gene sets data
    expect_error(enrichment(input_genes = tmp_input_genes, genes_by_term = "INVALID",
        sig_genes_vec = tmp_sig_vec, background_genes = unlist(kegg_genes)), "`genes_by_term` should be a list of term gene sets")
    expect_error(enrichment(input_genes = tmp_input_genes, genes_by_term = list(1:3),
        sig_genes_vec = tmp_sig_vec, background_genes = unlist(kegg_genes)), "`genes_by_term` should be a named list \\(names are gene set IDs\\)")

    expect_error(enrichment(input_genes = tmp_input_genes, term_descriptions = list(),
        sig_genes_vec = tmp_sig_vec, background_genes = unlist(kegg_genes)), "`term_descriptions` should be a vector of term gene descriptions")
    expect_error(enrichment(input_genes = tmp_input_genes, term_descriptions = 1:3,
        sig_genes_vec = tmp_sig_vec, background_genes = unlist(kegg_genes)), "`term_descriptions` should be a named vector \\(names are gene set IDs\\)")

    expect_error(enrichment(input_genes = tmp_input_genes, genes_by_term = list(A = 1:3),
        term_descriptions = c(A = "a", B = "b"), sig_genes_vec = tmp_sig_vec, background_genes = unlist(kegg_genes)),
        "The lengths of `genes_by_term` and `term_descriptions` should be the same")
    expect_error(enrichment(input_genes = tmp_input_genes, genes_by_term = list(A = 1:3,
        X = 1:3), term_descriptions = c(A = "a", B = "b"), sig_genes_vec = tmp_sig_vec,
        background_genes = unlist(kegg_genes)), "The names of `genes_by_term` and `term_descriptions` should all be the same")

    ## enrichment threshold
    expect_error(enrichment(input_genes = tmp_input_genes, sig_genes_vec = tmp_sig_vec,
        background_genes = unlist(kegg_genes), enrichment_threshold = "INVALID"),
        "`enrichment_threshold` should be a numeric value between 0 and 1")

    expect_error(enrichment(input_genes = tmp_input_genes, sig_genes_vec = tmp_sig_vec,
        background_genes = unlist(kegg_genes), enrichment_threshold = -1), "`enrichment_threshold` should be between 0 and 1")

    ## signif. genes and background (universal set) genes
    expect_error(enrichment(input_genes = tmp_input_genes, sig_genes_vec = list(),
        background_genes = unlist(kegg_genes)), "`sig_genes_vec` should be a vector")
    expect_error(enrichment(input_genes = tmp_input_genes, sig_genes_vec = tmp_sig_vec,
        background_genes = list()), "`background_genes` should be a vector")
})

tmp_gset_genes <- kegg_genes[example_pathfindR_output$ID[order(example_pathfindR_output$support,
    decreasing = TRUE)[1:10]]]
tmp_gset_desc <- kegg_descriptions[names(tmp_gset_genes)]

all_iter_enr_res <- list(NULL, NULL, NULL)
subnw_start_idx <- 1:3
for (idx in seq_along(subnw_start_idx)) {
    j <- subnw_start_idx[idx]
    res <- enrichment_analyses(snws = example_active_snws[j:j + 2], sig_genes_vec = example_pathfindR_input$Gene.symbol,
        genes_by_term = tmp_gset_genes, term_descriptions = tmp_gset_desc, list_active_snw_genes = TRUE)
    if (!is.null(res)) {
        all_iter_enr_res[[idx]] <- res
    }
}

combined_res <- do.call(rbind, all_iter_enr_res)

test_that("`enrichment_analyses()` -- returns a data frame", {
    toy_pin <- data.frame(V1 = paste("Gene", sample(1:50, 10)), V2 = "pp", V3 = paste("Gene",
        sample(1:50, 10)))
    mockery::stub(enrichment_analyses, "return_pin_path", NULL)
    mockery::stub(enrichment_analyses, "utils::read.delim", toy_pin)

    mock_lapply <- mockery::mock(c(), all_iter_enr_res, cycle = TRUE)
    mockery::stub(enrichment_analyses, "lapply", mock_lapply)

    # default
    expect_is(enr_res1 <- enrichment_analyses(snws = example_active_snws[1:3], sig_genes_vec = example_pathfindR_input$Gene.symbol,
        list_active_snw_genes = FALSE), "data.frame")
    total <- sum(vapply(all_iter_enr_res, function(x) ifelse(is.null(x), 0, nrow(x)),
        1))
    expect_true(nrow(enr_res1) <= total)

    # list active snw genes
    expect_is(enr_res2 <- enrichment_analyses(snws = example_active_snws[1:3], sig_genes_vec = example_pathfindR_input$Gene.symbol,
        list_active_snw_genes = TRUE), "data.frame")
    expect_true(ncol(enr_res2) == ncol(enr_res1) + 1)
})

test_that("`enrichment_analyses()` -- argument check works", {
    expect_error(enrichment_analyses(snws = example_active_snws, list_active_snw_genes = "INVALID"),
        "`list_active_snw_genes` should be either TRUE or FALSE")
})

test_that("`summarize_enrichment_results()` -- returns summarized enrichment results",
    {
        # default
        expect_is(summ_res <- summarize_enrichment_results(enrichment_res = combined_res[,
            -6]), "data.frame")
        expect_equal(ncol(summ_res), 7)
        expect_false("non_Signif_Snw_Genes" %in% colnames(summ_res))
        expect_true(nrow(summ_res) <= nrow(combined_res))

        # list active snw genes
        expect_is(summ_res2 <- summarize_enrichment_results(enrichment_res = combined_res,
            list_active_snw_genes = TRUE), "data.frame")
        expect_equal(ncol(summ_res2), 8)
        expect_true("non_Signif_Snw_Genes" %in% colnames(summ_res2))
        expect_true(nrow(summ_res2) <= nrow(combined_res))
    })

test_that("`summarize_enrichment_results()` -- argument checks work", {
    expect_error(summarize_enrichment_results(enrichment_res = combined_res, list_active_snw_genes = "INVALID"),
        "`list_active_snw_genes` should be either TRUE or FALSE")

    expect_error(summarize_enrichment_results(enrichment_res = list()), "`enrichment_res` should be a data frame")

    # list_active_snw_genes = FALSE
    nec_cols <- c("ID", "Term_Description", "Fold_Enrichment", "p_value", "adj_p",
        "support")

    expect_error(summarize_enrichment_results(enrichment_res = data.frame()), paste0("`enrichment_res` should have exactly ",
        length(nec_cols), " columns"))

    tmp <- as.data.frame(matrix(nrow = 1, ncol = length(nec_cols), dimnames = list(NULL,
        letters[seq_along(nec_cols)])))
    expect_error(summarize_enrichment_results(enrichment_res = tmp), paste0("`enrichment_res` should have column names ",
        paste(dQuote(nec_cols), collapse = ", ")))

    # list_active_snw_genes = TRUE
    nec_cols <- c("ID", "Term_Description", "Fold_Enrichment", "p_value", "adj_p",
        "support", "non_Signif_Snw_Genes")

    expect_error(summarize_enrichment_results(enrichment_res = data.frame(), list_active_snw_genes = TRUE),
        paste0("`enrichment_res` should have exactly ", length(nec_cols), " columns"))

    tmp <- as.data.frame(matrix(nrow = 1, ncol = length(nec_cols), dimnames = list(NULL,
        letters[seq_along(nec_cols)])))
    expect_error(summarize_enrichment_results(enrichment_res = tmp, list_active_snw_genes = TRUE),
        paste0("`enrichment_res` should have column names ", paste(dQuote(nec_cols),
            collapse = ", ")))
})


================================================
FILE: tests/testthat/test-scoring.R
================================================
## Tests for agglomerated term scoring functions - Jan 2024

test_that("`score_terms()` -- returns score matrix", {
    mockery::stub(score_terms, "graphics::plot", NULL)

    small_result <- example_pathfindR_output[1:3, ]
    expect_is(score_terms(enrichment_table = small_result, exp_mat = example_experiment_matrix,
        plot_hmap = FALSE), "matrix")
    expect_is(score_terms(enrichment_table = small_result, exp_mat = example_experiment_matrix,
        plot_hmap = TRUE), "matrix")
    expect_is(score_terms(enrichment_table = small_result, exp_mat = example_experiment_matrix,
        cases = colnames(example_experiment_matrix)[1:3], plot_hmap = TRUE), "matrix")
})

test_that("`score_terms()` -- matches gene symbols correctly", {
  toy_result <- data.frame(
    ID = c("gset1", "gset2"),
    Term_Description = c("gset1", "gset2"),
    Up_regulated = "",
    Down_regulated = c(
      paste(paste0("Gene_", c(1, 3, 5)), collapse = ", "),
      paste(paste0("Gene_", c(6, 8)), collapse = ", ")
    )
  )
  toy_result2 <- data.frame(
    ID = c("gset1", "gset2"),
    Term_Description = c("gset1", "gset2"),
    Up_regulated = "",
    Down_regulated = c(
      paste(paste0("Dummy_", c(1, 3, 5)), collapse = ", "),
      paste(paste0("Gene_", c(6, 8)), collapse = ", ")
    )
  )
  toy_exp_mat <- matrix(
    rnorm(40), nrow = 10, ncol = 4, dimnames = list(paste0("gene_", 1:10), paste0("subject_", 1:4))
  )
  expect_is(res_mat <- score_terms(enrichment_table = toy_result, exp_mat = toy_exp_mat,
                        plot_hmap = FALSE), "matrix")
  expect_equal(nrow(res_mat), 2)


  expect_is(res_mat <- score_terms(enrichment_table = toy_result2, exp_mat = toy_exp_mat,
                                   plot_hmap = FALSE), "matrix")
  expect_equal(nrow(res_mat), 1)
})

test_that("`score_terms()` -- argument checks work", {
    expect_error(score_terms(enrichment_table = example_pathfindR_output, exp_mat = example_experiment_matrix,
        use_description = "INVALID"), "`use_description` should either be TRUE or FALSE")

    expect_error(score_terms(enrichment_table = example_pathfindR_output, exp_mat = example_experiment_matrix,
        plot_hmap = "INVALID"), "`plot_hmap` should either be TRUE or FALSE")

    expect_error(score_terms(enrichment_table = list(), exp_mat = example_experiment_matrix),
        "`enrichment_table` should be a data frame of enrichment results")

    tmp <- example_pathfindR_output[, -c(1, 2)]
    nec_cols <- c("ID", "Up_regulated", "Down_regulated")
    expect_error(score_terms(enrichment_table = tmp, exp_mat = example_experiment_matrix),
        paste0("`enrichment_table` should contain all of ", paste(dQuote(nec_cols),
            collapse = ", ")))
    nec_cols <- c("Term_Description", "Up_regulated", "Down_regulated")
    expect_error(score_terms(enrichment_table = tmp, exp_mat = example_experiment_matrix,
        use_description = TRUE), paste0("`enrichment_table` should contain all of ",
        paste(dQuote(nec_cols), collapse = ", ")))

    expect_error(score_terms(enrichment_table = example_pathfindR_output, exp_mat = list()),
        "`exp_mat` should be a matrix")

    expect_error(score_terms(enrichment_table = example_pathfindR_output, exp_mat = example_experiment_matrix,
        cases = list()), "`cases` should be a vector")
    expect_error(score_terms(enrichment_table = example_pathfindR_output, exp_mat = example_experiment_matrix,
        cases = LETTERS), "Missing `cases` in `exp_mat`")
})

test_that("duplicated term descriptions test", {
    small_result <- example_pathfindR_output[1:2, ]
    small_result$Term_Description <- small_result$Term_Description[1]
    expect_is(score_terms(enrichment_table = small_result, exp_mat = example_experiment_matrix,
        use_description = TRUE, plot_hmap = FALSE), "matrix")
})

test_that("`plot_scores()` -- creates term score heatmap ggplot object with correct labels",
    {
        score_mat <- score_terms(example_pathfindR_output[1:3, ], example_experiment_matrix,
            plot_hmap = FALSE)

        # default
        g <- plot_scores(score_mat)
        expect_is(g, "ggplot")
        
        labels <- ggplot2::get_labs(g)
        expect_identical(labels$fill, "Score")
        expect_identical(labels$x, "Sample")
        expect_identical(labels$y, "Term")

        # cases provided
        g <- plot_scores(score_mat, cases = colnames(score_mat)[1:3])
        expect_is(g, "ggplot")
        
        labels <- ggplot2::get_labs(g)
        expect_identical(labels$fill, "Score")
        expect_identical(labels$x, "Sample")
        expect_identical(labels$y, "Term")

        # default - label_samples = FALSE
        g <- plot_scores(score_mat, label_samples = FALSE)
        expect_is(g, "ggplot")
        
        labels <- ggplot2::get_labs(g)
        expect_identical(labels$fill, "Score")
        expect_identical(labels$x, "Sample")
        expect_identical(labels$y, "Term")

        # cases provided - label_samples = FALSE
        g <- plot_scores(score_mat, cases = colnames(score_mat)[1:3], label_samples = FALSE)
        expect_is(g, "ggplot")
        
        labels <- ggplot2::get_labs(g)
        expect_identical(labels$fill, "Score")
        expect_identical(labels$x, "Sample")
        expect_identical(labels$y, "Term")
    })

test_that("`plot_scores()` -- argument checks work", {
    expect_error(plot_scores(score_matrix = c()), "`score_matrix` should be a matrix")
    expect_error(plot_scores(score_matrix = data.frame()), "`score_matrix` should be a matrix")
    expect_error(plot_scores(score_matrix = list()), "`score_matrix` should be a matrix")

    mat <- matrix(1, nrow = 3, ncol = 2, dimnames = list(paste0("T", 1:3), c("A",
        "B")))

    expect_error(plot_scores(score_matrix = mat, cases = list()), "`cases` should be a vector")
    expect_error(plot_scores(score_matrix = mat, cases = c("A", "B", "C")), "Missing `cases` in `score_matrix`")

    expect_error(plot_scores(score_matrix = mat, label_samples = "INVALID"), "`label_samples` should be TRUE or FALSE")

    expect_error(plot_scores(score_matrix = mat, case_title = 1), "`case_title` should be a single character value")
    expect_error(plot_scores(score_matrix = mat, case_title = rep("z", 3)), "`case_title` should be a single character value")

    expect_error(plot_scores(score_matrix = mat, control_title = 1), "`control_title` should be a single character value")
    expect_error(plot_scores(score_matrix = mat, control_title = rep("z", 3)), "`control_title` should be a single character value")

    expect_error(plot_scores(score_matrix = mat, low = ""))
    expect_error(plot_scores(score_matrix = mat, mid = ""))
    expect_error(plot_scores(score_matrix = mat, high = ""))
})


================================================
FILE: tests/testthat/test-utility.R
================================================
## Tests for various utility functions - Aug 2023

set.seed(123)

test_that("`active_snw_enrichment_wrapper()` -- works as expected", {
    input_df <- example_pathfindR_input[, c(1, 3)]
    colnames(input_df) <- c("GENE", "P_VALUE")

    org_dir <- getwd()
    test_directory <- file.path(tempdir(check = TRUE), "snw_wrapper_test")
    dir.create(test_directory)
    setwd(test_directory)
    on.exit(setwd(org_dir))
    on.exit(unlink(test_directory), add = TRUE)

    with_mocked_bindings({
        expect_is(active_snw_enrichment_wrapper(input_processed = input_df, pin_path = "Biogrid",
            gset_list = list(), enrichment_threshold = 0.05, list_active_snw_genes = FALSE,
            iterations = 1), "data.frame")

        expect_is(active_snw_enrichment_wrapper(input_processed = input_df, pin_path = "Biogrid",
            gset_list = list(), enrichment_threshold = 0.05, list_active_snw_genes = FALSE,
            iterations = 2, disable_parallel = TRUE), "data.frame")

        expect_warning(active_snw_enrichment_wrapper(input_processed = input_df,
            pin_path = "Biogrid", gset_list = list(), enrichment_threshold = 0.05,
            list_active_snw_genes = FALSE, search_method = "GA", iterations = 2))
    }, single_iter_wrapper = function(...) example_pathfindR_output, .package = "pathfindR")

    skip_on_cran()
    expect_is(active_snw_enrichment_wrapper(input_processed = input_df[1:10, ], pin_path = "Biogrid",
        gset_list = list(genes_by_term = kegg_genes[1:2], term_descriptions = kegg_descriptions[names(kegg_genes[1:2])]),
        enrichment_threshold = 0.05, list_active_snw_genes = FALSE, iterations = 2),
        "NULL")
})

test_that("`active_snw_enrichment_wrapper()` -- argument checks work", {
    valid_mets <- c("GR", "SA", "GA")
    expect_error(active_snw_enrichment_wrapper(input_processed = input_processed,
        pin_path = pin_path, gset_list = list(), enrichment_threshold = 0.05, list_active_snw_genes = FALSE,
        search_method = "INVALID"), paste0("`search_method` should be one of ", paste(dQuote(valid_mets),
        collapse = ", ")))

    expect_error(active_snw_enrichment_wrapper(input_processed = input_processed,
        pin_path = pin_path, gset_list = list(), enrichment_threshold = 0.05, list_active_snw_genes = FALSE,
        use_all_positives = "INVALID"), "`use_all_positives` should be either TRUE or FALSE")

    expect_error(active_snw_enrichment_wrapper(input_processed = input_processed,
        pin_path = pin_path, gset_list = list(), enrichment_threshold = 0.05, list_active_snw_genes = FALSE,
        silent_option = "INVALID"), "`silent_option` should be either TRUE or FALSE")

    expect_error(active_snw_enrichment_wrapper(input_processed = input_processed,
        pin_path = pin_path, gset_list = list(), enrichment_threshold = 0.05, list_active_snw_genes = FALSE,
        disable_parallel = "INVALID"), "`disable_parallel` should be either TRUE or FALSE")

    expect_error(active_snw_enrichment_wrapper(input_processed = input_processed,
        pin_path = pin_path, gset_list = list(), enrichment_threshold = 0.05, list_active_snw_genes = FALSE,
        iterations = "INVALID"), "`iterations` should be a positive integer")

    expect_error(active_snw_enrichment_wrapper(input_processed = input_processed,
        pin_path = pin_path, gset_list = list(), enrichment_threshold = 0.05, list_active_snw_genes = FALSE,
        iterations = 0), "`iterations` should be >= 1")

    expect_error(active_snw_enrichment_wrapper(input_processed = input_processed,
        pin_path = pin_path, gset_list = list(), enrichment_threshold = 0.05, list_active_snw_genes = FALSE,
        n_processes = "INVALID"), "`n_processes` should be either NULL or a positive integer")

    expect_error(active_snw_enrichment_wrapper(input_processed = input_processed,
        pin_path = pin_path, gset_list = list(), enrichment_threshold = 0.05, list_active_snw_genes = FALSE,
        n_processes = 0), "`n_processes` should be > 1")
})

test_that("`configure_output_dir()` -- works as expected", {
    expected_dir <- file.path(tempdir(), "test_pathfindR_results")
    mockery::stub(configure_output_dir, "file.path", expected_dir)
    expect_equal(configure_output_dir(), expected_dir)

    test_out_dir <- file.path(tempdir(), "TEST")
    for (i in 1:3) {
        actual_dir <- configure_output_dir(test_out_dir)
        dir_to_check <- test_out_dir
        if (i > 1) {
            dir_to_check <- paste0(dir_to_check, "(", i - 1, ")")
        }
        expect_equal(actual_dir, dir_to_check)
        dir.create(actual_dir)
    }
})

test_that("`fetch_gene_set()` -- can fetch all gene set objects", {
    skip_on_cran()
    for (gset_name in c("KEGG", "mmu_KEGG", "Reactome", "BioCarta", "cell_markers",
        "GO-All", "GO-BP", "GO-CC", "GO-MF")) {
        expect_is(gset_obj <- fetch_gene_set(gene_sets = gset_name, min_gset_size = 10,
            max_gset_size = 300), "list")
        expect_is(gset_obj$genes_by_term, "list")
        expect_is(gset_obj$term_descriptions, "character")
        expect_true(length(gset_obj$genes_by_term) == length(gset_obj$term_descriptions))
        tmp <- vapply(gset_obj$genes_by_term, length, 1L)
        expect_true(min(tmp) >= 10 & max(tmp) <= 300)
    }
    # Custom
    gset_obj <- fetch_gene_set(gene_sets = "Custom", min_gset_size = 20, max_gset_size = 200,
        custom_genes = kegg_genes, custom_descriptions = kegg_descriptions)
    expect_is(gset_obj$genes_by_term, "list")
    expect_is(gset_obj$term_descriptions, "character")
    expect_true(length(gset_obj$genes_by_term) == length(gset_obj$term_descriptions))
    tmp <- vapply(gset_obj$genes_by_term, length, 1L)
    expect_true(min(tmp) >= 20 & max(tmp) <= 200)
})

test_that("`create_HTML_report()` -- works a expected", {
    mock_render <- mockery::mock(NULL, cycle = TRUE)
    mockery::stub(create_HTML_report, "rmarkdown::render", mock_render)

    create_HTML_report(input = data.frame(), input_processed = data.frame(), final_res = data.frame(),
        dir_for_report = "/path/to/report/dir")
    mockery::expect_called(mock_render, 3)
})

test_that("`fetch_gene_set()` -- min/max_gset_size args correctly filter gene sets",
    {
        skip_on_cran()
        min_max_pairs <- list(c(min = 10, max = 300), c(min = 50, max = 200))
        num_of_terms_after_size_filtering <- c()
        for (idx in seq_along(min_max_pairs)) {
            cur_vals <- min_max_pairs[[idx]]
            expect_is(gset_obj <- fetch_gene_set(gene_sets = "KEGG", min_gset_size = cur_vals["min"],
                max_gset_size = cur_vals["max"]), "list")
            sizes_of_terms <- vapply(gset_obj$genes_by_term, length, 1L)
            expect_true(min(sizes_of_terms) >= cur_vals["min"] & max(sizes_of_terms) <=
                cur_vals["max"])
            num_of_terms_after_size_filtering <- c(num_of_terms_after_size_filtering,
                length(gset_obj$genes_by_term))
        }

        expect_true(num_of_terms_after_size_filtering[2] < num_of_terms_after_size_filtering[1])
    })

test_that("`fetch_gene_set()` -- for 'Custom' gene set, check if the custom objects are provided",
    {
        expect_error(fetch_gene_set(gene_sets = "Custom"), "`custom_genes` and `custom_descriptions` must be provided if `gene_sets = \"Custom\"`")
        expect_error(fetch_gene_set(gene_sets = "Custom", custom_genes = kegg_genes),
            "`custom_genes` and `custom_descriptions` must be provided if `gene_sets = \"Custom\"`")
        expect_error(fetch_gene_set(gene_sets = "Custom", custom_descriptions = kegg_descriptions),
            "`custom_genes` and `custom_descriptions` must be provided if `gene_sets = \"Custom\"`")
    })

test_that("`fetch_gene_set()` -- argument checks work", {
    all_gs_opts <- c("KEGG", "Reactome", "BioCarta", "GO-All", "GO-BP", "GO-CC",
        "GO-MF", "cell_markers", "mmu_KEGG", "Custom")
    expect_error(fetch_gene_set(gene_sets = "INVALID"), paste0("`gene_sets` should be one of ",
        paste(dQuote(all_gs_opts), collapse = ", ")))

    expect_error(fetch_gene_set(min_gset_size = "INVALID"), "`min_gset_size` should be numeric")

    expect_error(fetch_gene_set(max_gset_size = "INVALID"), "`max_gset_size` should be numeric")

    expect_error(fetch_gene_set(gene_sets = "Custom", custom_genes = "INVALID", custom_descriptions = ""),
        "`custom_genes` should be a list of term gene sets")
    expect_error(fetch_gene_set(gene_sets = "Custom", custom_genes = list(), custom_descriptions = ""),
        "`custom_genes` should be a named list \\(names are gene set IDs\\)")

    expect_error(fetch_gene_set(gene_sets = "Custom", custom_genes = kegg_genes,
        custom_descriptions = list()), "`custom_descriptions` should be a vector of term gene descriptions")
    expect_error(fetch_gene_set(gene_sets = "Custom", custom_genes = kegg_genes,
        custom_descriptions = 1:3), "`custom_descriptions` should be a named vector \\(names are gene set IDs\\)")
})

test_that("`return_pin_path()` -- returns the absolute path to PIN file", {
    mockery::stub(return_pin_path, "utils::getFromNamespace", list())
    mockery::stub(return_pin_path, "lapply", list(data.frame(V1 = paste0("G", 1:10),
        V2 = "pp", V3 = paste0("G", 2:11)), data.frame(V1 = paste0("G", 3:5), V2 = "pp",
        V3 = paste0("G", 5:7))))
    expect_silent(path2file <- return_pin_path("Biogrid"))
    expect_true(file.exists(path2file))

    custom_pin <- read.delim(path2file, header = FALSE)
    custom_pin$V1 <- tolower(custom_pin$V1)
    custom_sif_path <- file.path(tempdir(check = TRUE), "tmp_PIN.sif")
    utils::write.table(custom_pin, custom_sif_path, sep = "\t", row.names = FALSE,
        col.names = FALSE, quote = FALSE)
    expect_silent(final_custom_path <- return_pin_path(custom_sif_path))
    expect_true(file.exists(final_custom_path))

    # convert to uppercase works
    upper_case_custom <- read.delim(final_custom_path, header = FALSE)
    expect_true(all(toupper(upper_case_custom[, 1]) == upper_case_custom[, 1]))
    expect_true(all(toupper(upper_case_custom[, 3]) == upper_case_custom[, 3]))


    # invalid custom PIN - wrong format
    invalid_sif_path <- system.file(paste0("extdata/MYC.txt"), package = "pathfindR")
    expect_error(return_pin_path(invalid_sif_path), "The PIN file must have 3 columns and be tab-separated")

    # invalid custom PIN - invalid second column
    invalid_sif_path <- file.path(tempdir(check = TRUE), "custom.sif")
    invalid_custom_sif <- data.frame(P1 = "X", pp = "INVALID", P2 = "Y")
    write.table(invalid_custom_sif, invalid_sif_path, sep = "\t", col.names = FALSE,
        row.names = FALSE)
    expect_error(return_pin_path(invalid_sif_path), "The second column of the PIN file must all be \"pp\" ")

    # invalid option
    valid_opts <- c("Biogrid", "STRING", "GeneMania", "IntAct", "KEGG", "mmu_STRING",
        "/path/to/custom/SIF")
    expect_error(return_pin_path("INVALID"), paste0("The chosen PIN must be one of:\n",
        paste(dQuote(valid_opts), collapse = ", ")))
})

test_that("`input_testing()` -- works as expected", {
    expect_message(input_testing(input = example_pathfindR_input, p_val_threshold = 0.05),
        "The input looks OK")

    expect_error(input_testing(input = matrix(), p_val_threshold = 0.05), "the input is not a data frame")

    expect_error(input_testing(input = example_pathfindR_input[, 1, drop = FALSE],
        p_val_threshold = 0.05), "the input should have 2 or 3 columns")

    expect_error(input_testing(input = example_pathfindR_input[1, ], p_val_threshold = 0.05),
        "There must be at least 2 rows \\(genes\\) in the input data frame")

    expect_error(input_testing(input = example_pathfindR_input, p_val_threshold = "INVALID"),
        "`p_val_threshold` must be a numeric value between 0 and 1")

    expect_error(input_testing(input = example_pathfindR_input, p_val_threshold = -1),
        "`p_val_threshold` must be between 0 and 1")

    tmp <- example_pathfindR_input
    tmp$adj.P.Val <- NA
    expect_error(input_testing(input = tmp, p_val_threshold = 0.05), "p values cannot contain NA values")

    tmp <- example_pathfindR_input
    tmp$adj.P.Val <- "INVALID"
    expect_error(input_testing(input = tmp, p_val_threshold = 0.05), "p values must all be numeric")

    tmp <- example_pathfindR_input
    tmp$adj.P.Val[1] <- -1
    expect_error(input_testing(input = tmp, p_val_threshold = 0.05), "p values must all be between 0 and 1")
})

test_that("`input_processing()` -- works as expected", {
    input_df <- example_pathfindR_input[1:10, ]
    toy_PIN <- data.frame(V1 = sample(example_pathfindR_input$Gene.symbol, 100),
        V2 = "pp", V3 = sample(example_pathfindR_input$Gene.symbol, 100))
    mockery::stub(input_processing, "return_pin_path", NULL)
    mockery::stub(input_processing, "utils::read.delim", toy_PIN)

    expect_is(processed_df <- input_processing(input_df), "data.frame")
    expect_true(ncol(processed_df) == 4)
    expect_true(nrow(processed_df) <= nrow(example_pathfindR_input))

    # no change values provided
    input_df2 <- input_df[, -2]
    expect_is(processed_df2 <- suppressWarnings(input_processing(input_df2)), "data.frame")
    expect_true(ncol(processed_df2) == 4)
    expect_true(all(processed_df2$CHANGE == 1e+06))

    toy_PIN2 <- rbind(toy_PIN, data.frame(V1 = c("SERPINA3", "ARHGAP17"), V2 = "pp",
        V3 = c("ACT", "GIG25")))
    mockery::stub(input_processing, "utils::read.delim", toy_PIN2)

    # multiple mapping
    input_multimap <- input_df
    input_multimap$Gene.symbol[1] <- "GIG24"
    input_multimap$Gene.symbol[2] <- "ACT"
    input_multimap$Gene.symbol[3] <- "AACT"
    input_multimap$Gene.symbol[4] <- "GIG25"
    expect_is(processed_df3 <- input_processing(input_multimap), "data.frame")
})

test_that("`input_processing()` -- errors and warnings work", {
    input_df <- example_pathfindR_input[1:10, ]

    toy_PIN <- data.frame(V1 = sample(input_df$Gene.symbol, 7), V2 = "pp", V3 = sample(input_df$Gene.symbol,
        7))
    mockery::stub(input_processing, "return_pin_path", NULL)
    mockery::stub(input_processing, "utils::read.delim", toy_PIN)

    input_df$Gene.symbol <- as.factor(input_df$Gene.symbol)
    expect_warning(input_processing(input_df, p_val_threshold = 0.05, pin_name_path = "Biogrid",
        convert2alias = TRUE), "The gene column was turned into character from factor.")

    expect_error(input_processing(example_pathfindR_input, p_val_threshold = 1e-100,
        pin_name_path = "Biogrid"), "No input p value is lower than the provided threshold \\(1e-100\\)")

    input_dup <- example_pathfindR_input[1:3, ]
    input_dup <- rbind(input_dup, input_dup[1, ])
    expect_warning(input_processing(input_dup, p_val_threshold = 0.05, pin_name_path = "Biogrid"),
        "Duplicated genes found! The lowest p value for each gene was selected")

    low_sig_input <- example_pathfindR_input[1:3, ]
    low_sig_input$adj.P.Val <- 1e-15
    expect_message(res <- input_processing(low_sig_input, p_val_threshold = 0.05,
        pin_name_path = "Biogrid"), "pathfindR cannot handle p values < 1e-13. These were changed to 1e-13")
    expect_true(all(res$P_VALUE == 1e-13))

    invalid_genes_input <- low_sig_input
    invalid_genes_input$Gene.symbol <- paste0(LETTERS[seq_len(nrow(invalid_genes_input))],
        "INVALID")
    expect_error(input_processing(invalid_genes_input, p_val_threshold = 0.05, pin_name_path = "Biogrid"),
        "None of the genes were in the PIN\nPlease check your gene symbols")

    low_sig_input$Gene.symbol[1] <- "INVALID_A"
    low_sig_input$Gene.symbol[2] <- "INVALID_B"
    low_sig_input$Gene.symbol[3] <- toy_PIN$V1[1]
    expect_error(input_processing(low_sig_input, p_val_threshold = 0.05, pin_name_path = "Biogrid"),
        "After processing, 1 gene \\(or no genes\\) could be mapped to the PIN")

    expect_error(input_processing(low_sig_input, p_val_threshold = 0.05, pin_name_path = "Biogrid",
        convert2alias = "INVALID"), "`convert2alias` should be either TRUE or FALSE")
})

example_gene_data <- example_pathfindR_input[1:10, ]
colnames(example_gene_data) <- c("GENE", "CHANGE", "P_VALUE")
tmp_res <- example_pathfindR_output[1:5, -c(7, 8)]

test_that("`annotate_term_genes()` -- adds input genes for each term", {
    expect_is(annotated_result <- annotate_term_genes(result_df = tmp_res, input_processed = example_gene_data),
        "data.frame")
    expect_true("Up_regulated" %in% colnames(annotated_result) & "Down_regulated" %in%
        colnames(annotated_result))
    expect_true(nrow(annotated_result) == nrow(tmp_res))
})

test_that("annotate_term_genes() -- argument checks work", {
    expect_error(annotate_term_genes(result_df = list(), input_processed = example_gene_data),
        "`result_df` should be a data frame")
    expect_error(annotate_term_genes(result_df = tmp_res[, -1], input_processed = example_gene_data),
        "`result_df` should contain an \"ID\" column")

    expect_error(annotate_term_genes(result_df = tmp_res, input_processed = list()),
        "`input_processed` should be a data frame")
    expect_error(annotate_term_genes(result_df = tmp_res, input_processed = example_gene_data[,
        -1]), "`input_processed` should contain the columns \"GENE\" and \"CHANGE\"")


    expect_error(annotate_term_genes(result_df = tmp_res, input_processed = example_gene_data,
        genes_by_term = "INVALID"), "`genes_by_term` should be a list of term gene sets")
    expect_error(annotate_term_genes(result_df = tmp_res, input_processed = example_gene_data,
        genes_by_term = list(1)), "`genes_by_term` should be a named list \\(names are gene set IDs\\)")
})


================================================
FILE: tests/testthat/test-visualization.R
================================================
## Tests for functions related to various visualization functions - Apr 2024

single_result <- example_pathfindR_output[1, ]
processed_input <- example_pathfindR_input[, c(1, 1, 2, 3)]
colnames(processed_input) <- c("old_GENE", "GENE", "CHANGE", "P_VALUE")

test_that("`visualize_terms()` -- calls the appropriate function", {
    mock_vis_kegg <- mockery::mock(NULL)
    mockery::stub(visualize_terms, "visualize_KEGG_diagram", mock_vis_kegg)
    expect_silent(visualize_terms(result_df = single_result, input_processed = data.frame(),
        is_KEGG_result = TRUE))
    mockery::expect_called(mock_vis_kegg, 1)

    mock_vis_term_inter <- mockery::mock(NULL)
    mockery::stub(visualize_terms, "visualize_term_interactions", mock_vis_term_inter)
    expect_silent(visualize_terms(result_df = single_result, is_KEGG_result = FALSE))
    mockery::expect_called(mock_vis_term_inter, 1)
})

test_that("`visualize_terms()` -- argumment checks work", {
    expect_error(visualize_terms(result_df = "INVALID"), "`result_df` should be a data frame")

    # is_KEGG_result = TRUE
    nec_cols <- "ID"
    expect_error(visualize_terms(single_result[, -1], is_KEGG_result = TRUE), paste0("`result_df` should contain the following columns: ",
        paste(dQuote(nec_cols), collapse = ", ")))
    # is_KEGG_result = FALSE
    nec_cols <- c("Term_Description", "Up_regulated", "Down_regulated")
    expect_error(visualize_terms(single_result[, -2], is_KEGG_result = FALSE), paste0("`result_df` should contain the following columns: ",
        paste(dQuote(nec_cols), collapse = ", ")))

    expect_error(visualize_terms(result_df = single_result, is_KEGG_result = TRUE), "`input_processed` should be specified when `is_KEGG_result = TRUE`")

    expect_error(visualize_terms(result_df = single_result, is_KEGG_result = "INVALID"),
        "the argument `is_KEGG_result` should be either TRUE or FALSE")
})

test_that("`visualize_term_interactions()` -- creates expected list of ggraph objects", {
    skip_on_cran()
    expect_is(res <- visualize_term_interactions(single_result, pin_name_path = "Biogrid"), "list")
    expect_is(res[[1]], "ggraph")

    tmp_res <- rbind(single_result, single_result)
    tmp_res$Term_Description[2] <- "SKIP"
    tmp_res$Up_regulated[2] <- "Gene1"
    tmp_res$Down_regulated[2] <- ""
    expect_message(res <- visualize_term_interactions(tmp_res, pin_name_path = "KEGG"),
        paste0("< 2 genes, skipping visualization of ", tmp_res$Term_Description[2]))

    # Non-empty non_Signif_Snw_Genes
    tmp_res <- single_result
    tmp_res$non_Signif_Snw_Genes <- example_pathfindR_output$Up_regulated[2]
    expect_is(res <- visualize_term_interactions(tmp_res, pin_name_path = "Biogrid"), "list")
    expect_is(res[[1]], "ggraph")
})

test_that("`visualize_KEGG_diagram()` -- creates expected list of ggraph objects", {
    skip_on_cran()
    skip_if_not_installed("org.Hs.eg.db")
    
    expect_is(res <- visualize_KEGG_diagram(kegg_pw_ids = single_result$ID, input_processed = processed_input), "list")
    expect_is(res[[1]], "ggraph")

    constant_input <- processed_input
    constant_input$CHANGE <- 1e+06
    expect_is(visualize_KEGG_diagram(kegg_pw_ids = single_result$ID, input_processed = constant_input), "list")
    expect_is(res[[1]], "ggraph")
})

test_that("`visualize_KEGG_diagram()` -- skips pathway if non-existent", {
    skip_on_cran()
    skip_if_not_installed("org.Hs.eg.db")
    temp_res <- example_pathfindR_output[1:2, ]
    temp_res$ID[2] <- "hsa12345"

    expect_is(res <- visualize_KEGG_diagram(kegg_pw_ids = temp_res$ID, input_processed = processed_input), "list")
    expect_is(res[[1]], "ggraph")
    expect_length(expect_is, 1)
})

test_that("`visualize_KEGG_diagram()` -- argument checks work", {
    expect_error(visualize_KEGG_diagram(kegg_pw_ids = list(), input_processed = processed_input),
        "`kegg_pw_ids` should be a vector of KEGG IDs")
    expect_error(visualize_KEGG_diagram(kegg_pw_ids = c("X", "Y", "Z"), input_processed = processed_input),
        "`kegg_pw_ids` should be a vector of valid hsa KEGG IDs")

    expect_error(visualize_KEGG_diagram(kegg_pw_ids = "abc12345", input_processed = list()),
        "`input_processed` should be a data frame")
    expect_error(visualize_KEGG_diagram(kegg_pw_ids = "abc12345", input_processed = processed_input[,
        -2]), paste0("`input_processed` should contain the following columns: ",
        paste(dQuote(c("GENE", "CHANGE")), collapse = ", ")))
})

test_that("`color_kegg_pathway()` -- works as expected", {
  skip_on_cran()

  pw_id <- "hsa00010"
  change_vec <- c(-2, 4, 6)
  names(change_vec) <- c("hsa:2821", "hsa:226", "hsa:229")

  expect_is(result <- color_kegg_pathway(pw_id, change_vec), "ggraph")

  names(change_vec) <- rep("missing", 3)
  expect_is(result <- color_kegg_pathway(pw_id, change_vec), "NULL")
})

test_that("`color_kegg_pathway()` -- exceptions are handled properly", {
    change_vec <- c(-2, 4, 6)
    names(change_vec) <- c("hsa:2821", "hsa:226", "hsa:229")

    expect_error(color_kegg_pathway(pw_id = "hsa03040", change_vec = change_vec,
        scale_vals = "INVALID"), "`scale_vals` should be logical")
    expect_error(color_kegg_pathway(pw_id = "hsa03040", change_vec = change_vec,
        node_cols = list()), "`node_cols` should be a vector of colors")
    expect_error(color_kegg_pathway(pw_id = "hsa03040", change_vec = change_vec,
        node_cols = rep("red", 4)), "the length of `node_cols` should be 3")
    expect_error(color_kegg_pathway(pw_id = "hsa03040", change_vec = change_vec,
        node_cols = c("red", "#FFFFFF", "INVALID")), "`node_cols` should be a vector of valid colors")

    skip_on_cran()

    constant_vec <- rep(1e+06, 3)
    names(constant_vec) <- c("hsa:2821", "hsa:226", "hsa:229")

    expect_silent(color_kegg_pathway(pw_id = "hsa03040", change_vec = change_vec,
        node_cols = c("red", "blue", "green")))
    expect_message(color_kegg_pathway(pw_id = "hsa03040", change_vec = constant_vec,
        node_cols = c("red", "blue", "green")))

    expect_null(suppressWarnings(color_kegg_pathway(pw_id = "hsa03040", change_vec = NULL)))
    expect_message(color_kegg_pathway(pw_id = "hsa11111", change_vec = c()))
})

test_that("`enrichment_chart()` -- produces a ggplot object with correct labels",
    {
        # default - top 10
        expect_is(g <- enrichment_chart(example_pathfindR_output), "ggplot")
        expect_equal(ggplot2::quo_name(g$mapping$x), "Fold_Enrichment")
        expect_equal(ggplot2::quo_name(g$mapping$y), "Term_Description")
        
        labels <- ggplot2::get_labs(g)
        expect_equal(labels$size, "# genes")
        expect_equal(labels$colour, expression(-log[10](p)))
        expect_equal(labels$x, "Fold Enrichment")
        expect_equal(labels$y, "Term_Description")

        # plot_by_cluster
        expect_is(g <- enrichment_chart(example_pathfindR_output_clustered, plot_by_cluster = TRUE),
            "ggplot")
        expect_equal(ggplot2::quo_name(g$mapping$x), "Fold_Enrichment")
        expect_equal(ggplot2::quo_name(g$mapping$y), "Term_Description")
        
        labels <- ggplot2::get_labs(g)
        expect_equal(labels$size, "# genes")
        expect_equal(labels$colour, expression(-log[10](p)))
        expect_equal(labels$x, "Fold Enrichment")
        expect_equal(labels$y, "Term_Description")

        # chang top_terms
        expect_is(g <- enrichment_chart(example_pathfindR_output, top_terms = NULL),
            "ggplot")
        expect_equal(ggplot2::quo_name(g$mapping$x), "Fold_Enrichment")
        expect_equal(ggplot2::quo_name(g$mapping$y), "Term_Description")
        
        labels <- ggplot2::get_labs(g)
        expect_equal(labels$size, "# genes")
        expect_equal(labels$colour, expression(-log[10](p)))
        expect_equal(labels$x, "Fold Enrichment")
        expect_equal(labels$y, "Term_Description")

        expect_is(g <- enrichment_chart(example_pathfindR_output, top_terms = 1000),
            "ggplot")
        expect_equal(ggplot2::quo_name(g$mapping$x), "Fold_Enrichment")
        expect_equal(ggplot2::quo_name(g$mapping$y), "Term_Description")
        
        labels <- ggplot2::get_labs(g)
        expect_equal(labels$size, "# genes")
        expect_equal(labels$colour, expression(-log[10](p)))
        expect_equal(labels$x, "Fold Enrichment")
        expect_equal(labels$y, "Term_Description")

        # change num_bubbles
        expect_is(g <- enrichment_chart(example_pathfindR_output_clustered, num_bubbles = 30),
            "ggplot")
        expect_equal(ggplot2::quo_name(g$mapping$x), "Fold_Enrichment")
        expect_equal(ggplot2::quo_name(g$mapping$y), "Term_Description")
        
        labels <- ggplot2::get_labs(g)
        expect_equal(labels$size, "# genes")
        expect_equal(labels$colour, expression(-log[10](p)))
        expect_equal(labels$x, "Fold Enrichment")
        expect_equal(labels$y, "Term_Description")

        # change even_breaks
        expect_is(g <- enrichment_chart(example_pathfindR_output_clustered, even_breaks = FALSE),
            "ggplot")
        expect_equal(ggplot2::quo_name(g$mapping$x), "Fold_Enrichment")
        expect_equal(ggplot2::quo_name(g$mapping$y), "Term_Description")
        
        labels <- ggplot2::get_labs(g)
        expect_equal(labels$size, "# genes")
        expect_equal(labels$colour, expression(-log[10](p)))
        expect_equal(labels$x, "Fold Enrichment")
        expect_equal(labels$y, "Term_Description")
    })

test_that("`enrichment_chart()` -- argument checks work", {
    necessary <- c("Term_Description", "Fold_Enrichment", "lowest_p", "Up_regulated",
        "Down_regulated")
    expect_error(enrichment_chart(example_pathfindR_output[, -2]), paste0("The input data frame must have the columns:\n",
        paste(necessary, collapse = ", ")))

    expect_error(enrichment_chart(example_pathfindR_output, plot_by_cluster = "INVALID"),
        "`plot_by_cluster` must be either TRUE or FALSE")

    expect_message(enrichment_chart(example_pathfindR_output, plot_by_cluster = TRUE),
        "For plotting by cluster, there must a column named `Cluster` in the input data frame!")

    expect_error(enrichment_chart(example_pathfindR_output, top_terms = "INVALID"),
        "`top_terms` must be either numeric or NULL")

    expect_error(enrichment_chart(example_pathfindR_output, top_terms = 0), "`top_terms` must be > 1")
})

test_that("`term_gene_graph()` -- produces a ggplot object using the correct data",
    {
        # Top 10 (default)
        expect_is(p <- term_gene_graph(example_pathfindR_output), "ggplot")
        expect_equal(sum(p$data$type == "term"), 10)

        # Top 3
        expect_is(p <- term_gene_graph(example_pathfindR_output, num_terms = 3),
            "ggplot")
        expect_equal(sum(p$data$type == "term"), 3)

        # All terms
        expect_is(p <- term_gene_graph(example_pathfindR_output[1:15, ], num_terms = NULL),
            "ggplot")
        expect_equal(sum(p$data$type == "term"), 15)

        # Top 1000, expect to plot top nrow(output)
        expect_is(p <- term_gene_graph(example_pathfindR_output[1:15, ], num_terms = 1000),
            "ggplot")
        expect_equal(sum(p$data$type == "term"), 15)

        # use_description = TRUE
        expect_is(p <- term_gene_graph(example_pathfindR_output, use_description = TRUE),
            "ggplot")
        expect_equal(sum(p$data$type == "term"), 10)

        # node_size = 'p_val'
        expect_is(p <- term_gene_graph(example_pathfindR_output, node_size = "p_val"),
            "ggplot")
        expect_equal(sum(p$data$type == "term"), 10)
    })

test_that("`term_gene_graph()` -- argument checks work", {
    expect_error(term_gene_graph(example_pathfindR_output, num_terms = "INVALID"),
        "`num_terms` must either be numeric or NULL!")

    expect_error(term_gene_graph(example_pathfindR_output, use_description = "INVALID"),
        "`use_description` must either be TRUE or FALSE!")

    val_node_size <- c("num_genes", "p_val")
    expect_error(term_gene_graph(example_pathfindR_output, node_size = "INVALID"),
        paste0("`node_size` should be one of ", paste(dQuote(val_node_size), collapse = ", ")))

    expect_error(term_gene_graph(result_df = "INVALID"), "`result_df` should be a data frame")

    wrong_df <- example_pathfindR_output[, -c(1, 2)]
    ID_column <- "ID"
    necessary_cols <- c(ID_column, "lowest_p", "Up_regulated", "Down_regulated")
    expect_error(term_gene_graph(wrong_df, use_description = FALSE), paste(c("All of",
        paste(necessary_cols, collapse = ", "), "must be present in `results_df`!"),
        collapse = " "))

    ID_column <- "Term_Description"
    necessary_cols <- c(ID_column, "lowest_p", "Up_regulated", "Down_regulated")
    expect_error(term_gene_graph(wrong_df, use_description = TRUE), paste(c("All of",
        paste(necessary_cols, collapse = ", "), "must be present in `results_df`!"),
        collapse = " "))

    expect_error(term_gene_graph(example_pathfindR_output, node_colors = list()))
    expect_error(term_gene_graph(example_pathfindR_output, node_colors = c(1, 2, 3)))
    expect_error(term_gene_graph(example_pathfindR_output, node_colors = c("red", "blue")))
})

test_that("`term_gene_heatmap()` -- produces a ggplot object using the correct data",
    {
        skip_on_cran()
        # Top 10 (default)
        expect_is(p <- term_gene_heatmap(example_pathfindR_output), "ggplot")
        expect_equal(length(unique(p$data$Enriched_Term)), 10)
        expect_true(all(p$data$Enriched_Term %in% example_pathfindR_output$ID))

        # Top 3
        expect_is(p <- term_gene_heatmap(example_pathfindR_output, num_terms = 3),
            "ggplot")
        expect_equal(length(unique(p$data$Enriched_Term)), 3)

        # No genes in 'Down_regulated'
        res_df <- example_pathfindR_output[1:3, ]
        res_df$Down_regulated <- ""
        expect_is(p <- term_gene_heatmap(res_df), "ggplot")
        expect_equal(length(unique(p$data$Enriched_Term)), 3)

        # No genes in 'Up_regulated'
        res_df <- example_pathfindR_output[1:3, ]
        res_df$Up_regulated <- ""
        expect_is(p <- term_gene_heatmap(res_df), "ggplot")
        expect_equal(length(unique(p$data$Enriched_Term)), 3)

        # All terms
        expect_is(p <- term_gene_heatmap(example_pathfindR_output[1:15, ], num_terms = NULL),
            "ggplot")
        expect_equal(length(unique(p$data$Enriched_Term)), 15)

        # Top 1000, expect to plot top nrow(output)
        expect_is(p <- term_gene_heatmap(example_pathfindR_output[1:15, ], num_terms = 1000),
            "ggplot")
        expect_equal(length(unique(p$data$Enriched_Term)), 15)

        # use_description = TRUE
        expect_is(p <- term_gene_heatmap(example_pathfindR_output, use_description = TRUE),
            "ggplot")
        expect_equal(length(unique(p$data$Enriched_Term)), 10)
        expect_true(all(p$data$Enriched_Term %in% example_pathfindR_output$Term_Description))

        # genes_df supplied
        expect_is(p <- term_gene_heatmap(example_pathfindR_output[1:3, ], example_pathfindR_input),
            "ggplot")

        # genes_df supplied - wihout change column
        expect_is(p <- term_gene_heatmap(example_pathfindR_output[1:3, ], example_pathfindR_input[,
            -2]), "ggplot")

        # sort by lowest_p instead
        expect_is(p <- term_gene_heatmap(example_pathfindR_output[1:3, ], example_pathfindR_input,
            sort_terms_by_p = TRUE), "ggplot")
    })

test_that("`term_gene_graph()` -- argument checks work", {
    expect_error(term_gene_heatmap(result_df = example_pathfindR_output, use_description = "INVALID"),
        "`use_description` must either be TRUE or FALSE!")

    expect_error(term_gene_heatmap(result_df = "INVALID"), "`result_df` should be a data frame")

    wrong_df <- example_pathfindR_output[, -c(1, 2)]
    ID_column <- "ID"
    nec_cols <- c(ID_column, "lowest_p", "Up_regulated", "Down_regulated")
    expect_error(term_gene_heatmap(wrong_df, use_description = FALSE), paste0("`result_df` should have the following columns: ",
        paste(dQuote(nec_cols), collapse = ", ")))

    ID_column <- "Term_Description"
    nec_cols <- c(ID_column, "lowest_p", "Up_regulated", "Down_regulated")
    expect_error(term_gene_heatmap(wrong_df, use_description = TRUE), paste0("`result_df` should have the following columns: ",
        paste(dQuote(nec_cols), collapse = ", ")))

    expect_error(term_gene_heatmap(result_df = example_pathfindR_output, genes_df = "INVALID"))

    expect_error(term_gene_heatmap(result_df = example_pathfindR_output, num_terms = "INVALID"),
        "`num_terms` should be numeric or NULL")

    expect_error(term_gene_heatmap(result_df = example_pathfindR_output, num_terms = -1),
        "`num_terms` should be > 0 or NULL")

    expect_error(term_gene_heatmap(example_pathfindR_output, low = ""))
    expect_error(term_gene_heatmap(example_pathfindR_output, mid = ""))
    expect_error(term_gene_heatmap(example_pathfindR_output, high = ""))
})

test_that("`UpSet_plot()` -- produces a ggplot object", {
    skip_on_cran()
    # Top 10 (default)
    expect_is(p <- UpSet_plot(example_pathfindR_output), "ggplot")

    # Top 3
    expect_is(p <- UpSet_plot(example_pathfindR_output, num_terms = 3), "ggplot")

    # All terms
    expect_is(p <- UpSet_plot(example_pathfindR_output[1:15, ], num_terms = NULL),
        "ggplot")

    # No genes in 'Down_regulated'
    res_df <- example_pathfindR_output
    res_df$Down_regulated <- ""
    expect_is(p <- UpSet_plot(res_df, num_terms = 3), "ggplot")

    # No genes in 'Up_regulated'
    res_df <- example_pathfindR_output
    res_df$Up_regulated <- ""
    expect_is(p <- UpSet_plot(res_df, num_terms = 3), "ggplot")

    # use_description = TRUE
    expect_is(p <- UpSet_plot(example_pathfindR_output, use_description = TRUE),
        "ggplot")

    # Other visualization types
    expect_is(p <- UpSet_plot(example_pathfindR_output[1:3, ], example_pathfindR_input[1:10,
        ]), "ggplot")
    expect_is(p <- UpSet_plot(example_pathfindR_output[1:3, ], example_pathfindR_input[1:10,
        ], method = "boxplot"), "ggplot")
    expect_is(p <- UpSet_plot(example_pathfindR_output[1:3, ], method = "barplot"),
        "ggplot")
})

test_that("`UpSet_plot()` -- argument checks work", {
    expect_error(UpSet_plot(result_df = example_pathfindR_output, use_description = "INVALID"),
        "`use_description` must either be TRUE or FALSE!")

    expect_error(UpSet_plot(result_df = "INVALID"), "`result_df` should be a data frame")

    wrong_df <- example_pathfindR_output[, -c(1, 2)]
    ID_column <- "ID"
    nec_cols <- c(ID_column, "lowest_p", "Up_regulated", "Down_regulated")
    expect_error(UpSet_plot(wrong_df, use_description = FALSE), paste0("`result_df` should have the following columns: ",
        paste(dQuote(nec_cols), collapse = ", ")))

    ID_column <- "Term_Description"
    nec_cols <- c(ID_column, "lowest_p", "Up_regulated", "Down_regulated")
    expect_error(UpSet_plot(wrong_df, use_description = TRUE), paste0("`result_df` should have the following columns: ",
        paste(dQuote(nec_cols), collapse = ", ")))

    expect_error(UpSet_plot(result_df = example_pathfindR_output, genes_df = "INVALID"))

    expect_error(UpSet_plot(result_df = example_pathfindR_output, num_terms = "INVALID"),
        "`num_terms` should be numeric or NULL")

    expect_error(UpSet_plot(result_df = example_pathfindR_output, num_terms = -1),
        "`num_terms` should be > 0 or NULL")

    valid_opts <- c("heatmap", "boxplot", "barplot")
    expect_error(UpSet_plot(result_df = example_pathfindR_output, method = "INVALID"),
        paste("`method` should be one of`", paste(dQuote(valid_opts), collapse = ", ")))

    expect_error(UpSet_plot(result_df = example_pathfindR_output, method = "boxplot"),
        "For `method = boxplot`, you must provide `genes_df`")

    expect_error(UpSet_plot(example_pathfindR_output, low = ""))
    expect_error(UpSet_plot(example_pathfindR_output, mid = ""))
    expect_error(UpSet_plot(example_pathfindR_output, high = ""))
})

test_that("`isColor()` -- identifies colors correctly", {
  expect_true(isColor("red"))
  expect_true(isColor("green"))
  expect_true(isColor("black"))
  expect_true(isColor("gray60"))
  expect_true(isColor("#E5D7BF"))

  expect_false(isColor(""))
  expect_false(isColor("a"))
  expect_false(isColor(FALSE))
  expect_false(isColor(1))
  expect_false(isColor(c()))
  expect_false(isColor(list()))
})


================================================
FILE: tests/testthat/test-zzz.R
================================================
## Tests for functions related to java version check - Aug 2023

test_that("`fetch_java_version()` works as expected", {
    version_vec <- c("java version \"13.0.1\" 2019-10-15", "Java(TM) SE Runtime Environment (build 13.0.1+9)",
        "Java HotSpot(TM) 64-Bit Server VM (build 13.0.1+9, mixed mode, sharing)")

    mockery::stub(fetch_java_version, "Sys.getenv", "/path/to/java/home")
    mockery::stub(fetch_java_version, "file.exists", TRUE)
    mockery::stub(fetch_java_version, "system2", version_vec)

    # unix
    mockery::stub(fetch_java_version, "identical", FALSE)
    expect_equal(fetch_java_version(), version_vec)
    # windows
    mockery::stub(fetch_java_version, "identical", TRUE)
    expect_equal(fetch_java_version(), version_vec)

    mockery::stub(fetch_java_version, "system2", c())
    expect_error(fetch_java_version())

    mockery::stub(fetch_java_version, "file.exists", FALSE)
    expect_error(fetch_java_version())

    mockery::stub(fetch_java_version, "Sys.getenv", NA)
    mockery::stub(fetch_java_version, "Sys.which", "path/to/java")
    mockery::stub(fetch_java_version, "system2", version_vec)
    expect_equal(fetch_java_version(), version_vec)
})

test_that("`check_java_version()` works", {
    expect_null(check_java_version())
})

test_that("`check_java_version()` raises parsing error", {
    expect_error(check_java_version(c("version 1.8", "version 1.7")), "Java version detected but couldn't parse version from ")
    expect_error(check_java_version("version XXXX"), "Java version detected but couldn't parse version from: ")
})

test_that("`check_java_version()` works with 1.8", {
    expect_null(check_java_version(c("java version \"1.8.0_144\"", "Java(TM) SE Runtime Environment (build 1.8.0_000-000)",
        "Java HotSpot(TM) 64-Bit Server VM (build 00.000-000, mixed mode)")))
})

test_that("`check_java_version()` works with 14", {
    expect_null(check_java_version(c("java version \"14\" 2020-03-17", "Java(TM) SE Runtime Environment (build 14+36-1461)",
        "Java HotSpot(TM) 64-Bit Server VM (build 14+36-1461, mixed mode, sharing)")))
})

test_that("`check_java_version()` fails with 1.7", {
    expect_error(check_java_version(c("java version \"1.7.0\"", "Java(TM) SE Runtime Environment (build 1.7.0_000-000)",
        "Java HotSpot(TM) 64-Bit Server VM (build 00.000-000, mixed mode)")))
})


================================================
FILE: tests/testthat-active_snw.R
================================================
library(testthat)
library(pathfindR)

test_check("pathfindR", filter = "active_snw_search")


================================================
FILE: tests/testthat-clustering.R
================================================
library(testthat)
library(pathfindR)

test_check("pathfindR", filter = "clustering")


================================================
FILE: tests/testthat-comparison.R
================================================
library(testthat)
library(pathfindR)

test_check("pathfindR", filter = "comparison")


================================================
FILE: tests/testthat-core.R
================================================
library(testthat)
library(pathfindR)

test_check("pathfindR", filter = "core")


================================================
FILE: tests/testthat-data_generation.R
================================================
library(testthat)
library(pathfindR)

test_check("pathfindR", filter = "data_generation")


================================================
FILE: tests/testthat-enrichment.R
================================================
library(testthat)
library(pathfindR)

test_check("pathfindR", filter = "enrichment")


================================================
FILE: tests/testthat-scoring.R
================================================
library(testthat)
library(pathfindR)

test_check("pathfindR", filter = "scoring")


================================================
FILE: tests/testthat-utility.R
================================================
library(testthat)
library(pathfindR)

test_check("pathfindR", filter = "utility")


================================================
FILE: tests/testthat-visualization.R
================================================
library(testthat)
library(pathfindR)

test_check("pathfindR", filter = "visualization")


================================================
FILE: tests/testthat-zzz.R
================================================
library(testthat)
library(pathfindR)

test_check("pathfindR", filter = "zzz")


================================================
FILE: vignettes/.gitignore
================================================
*.html
*.R


================================================
FILE: vignettes/comparing_results.Rmd
================================================
---
title: "Comparing Two pathfindR Results"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Comparing Two pathfindR Results}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 8, fig.height = 4, fig.align = "center"
)
suppressPackageStartupMessages(library(pathfindR))
```

The function `combine_pathfindR_results()` allows combination of two pathfindR active-subnetwork-oriented enrichment analysis results for investigating common and distinct terms between the groups. Below is an example for comparing results using two different rheumatoid arthritis-related data sets (`example_pathfindR_output` and `example_comparison_output`).

```{r compare2res}
combined_df <- combine_pathfindR_results(
  result_A = example_pathfindR_output,
  result_B = example_comparison_output,
  plot_common = FALSE
)
```

By default, `combine_pathfindR_results()` plots the term-gene graph for the common terms in the combined results. For not plotting the graph, set `plot_common = FALSE`. 

The function `combined_results_graph()` can be used to create this graph (using only selected terms etc.) later on. By default, the function creates the graph using all common terms:

```{r compare_graph1}
combined_results_graph(combined_df)
```

By supplying a vector of selected terms to the `selected_terms` arguments, you may plot the term-gene graph for the selected terms:

```{r compare_graph2, fig.width=8, fig.height=4}
combined_results_graph(
  combined_df,
  selected_terms = c("hsa04144", "hsa04141", "hsa04140")
)
```

By default, `combined_results_graph()` creates the graph using term IDs. To use term descriptions instead, set `use_description = TRUE`: 

```{r compare_graph3, eval=FALSE}
combined_results_graph(
  combined_df,
  use_description = TRUE,
  selected_terms = combined_df$Term_Description[1:4]
)
```

For changing the layout of the graph (`"auto"` by default), you may use the `layout` argument.

For changing how the sizes of the term nodes are determined, you may use the `node_size` argument. The options are `"num_genes"` (default) and `"p_val"` for using the number of significant genes in the term and the -log10(p) value of the term, respectively:

```{r compare_graph4, eval=FALSE}
combined_results_graph(
  combined_df,
  selected_terms = c("hsa04144", "hsa04141", "hsa04140"),
  node_size = "p_val"
)
```


================================================
FILE: vignettes/intro_vignette.Rmd
================================================
---
title: "Introduction to pathfindR"
author: "Ege Ulgen"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Introduction to pathfindR}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  eval = FALSE,
  fig.width = 7, fig.height = 7, fig.align = "center"
)
suppressPackageStartupMessages(library(pathfindR))
```

`pathfindR` is a tool for enrichment analysis via active subnetworks. The package also offers functionality to cluster the enriched terms and identify representative terms in each cluster, to score the enriched terms per sample and to visualize analysis results.
    
The functionality suite of pathfindR is described in detail in _Ulgen E, Ozisik O, Sezerman OU. 2019. pathfindR: An R Package for Comprehensive Identification of Enriched Pathways in Omics Data Through Active Subnetworks. Front. Genet. [https://doi.org/10.3389/fgene.2019.00858](https://doi.org/10.3389/fgene.2019.00858)_

# Overview

The observation that motivated us to develop `pathfindR` was that direct enrichment analysis of differential RNA/protein expression or DNA methylation results may not provide the researcher with the full picture. That is to say: enrichment analysis of only a list of significant genes alone may not be informative enough to explain the underlying disease mechanisms. Therefore, we considered leveraging interaction information from a protein-protein interaction network (PIN) to identify distinct active subnetworks and then perform enrichment analyses on these subnetworks.

> An active subnetwork can be defined as a group of interconnected genes in a PIN that predominantly consists of significantly altered genes. In other words, active subnetworks define distinct disease-associated sets of interacting genes, whether discovered through the original analysis or discovered because of being in interaction with a significant gene.

The active-subnetwork-oriented enrichment analysis approach of pathfindR can be summarized as follows: Mapping the input genes with the associated p values onto the PIN (after processing the input), active subnetwork search is performed. The resulting active subnetworks are then filtered based on their scores and the number of significant genes they contain. This filtered list of active subnetworks are then used for enrichment analyses, i.e. using the genes in each of the active subnetworks, the significantly enriched terms (pathways/gene sets) are identified. Enriched terms with adjusted p values larger than the given threshold are discarded and the lowest adjusted p value (over all active subnetworks) for each term is kept. This process of `active subnetwork search + enrichment analyses` is repeated for a selected number of iterations, performed in parallel. Over all iterations, the lowest and the highest adjusted-p values, as well as number of occurrences over all iterations are reported for each significantly enriched term in the resulting data frame. This active-subnetwork-oriented enrichment approach is demonstrated in the section [Active-subnetwork-oriented Enrichment Analysis] of this vignette.

The enrichment analysis usually yields a great number of enriched terms whose biological functions are related. Therefore, we implemented two clustering approaches using a pairwise distance matrix based on the kappa statistics between the enriched terms (as proposed by Huang et al. [^1]). Based on this distance metric, the user can perform either hierarchical (default) or fuzzy clustering of the enriched terms. Details of clustering and partitioning of enriched terms are presented in the [Clustering Enriched Terms] section of this vignette.

Other functionality of pathfindR includes:

- agglomerated score calculation per each term (to investigate how a gene set is altered in a given sample)
- visualization of terms and term-related genes as a graph (to determine the degree of overlap between the enriched terms by identifying shared and/or distinct significant genes) and
- pathfindR analysis with custom gene sets are also briefly described.

# Active-subnetwork-oriented Enrichment Analysis

<img src="./pathfindr.png" style="max-width:100%;" />

For convenience, we provide  the wrapper function `run_pathfindR()` to be used for the active-subnetwork-oriented enrichment analysis. The input for this function must be a data frame consisting of the columns containing: `Gene Symbols`, `Change Values` (optional) and `p values`. The example data frame used in this vignette (`example_pathfindR_input`) is the dataset containing the differentially-expressed genes for the GEO dataset GSE15573 comparing 18 rheumatoid arthritis (RA) patients versus 15 healthy subjects.

The first 6 rows of the example input data frame are displayed below:

```{r load_pkg, eval=TRUE}
library(pathfindR)
knitr::kable(head(example_pathfindR_input))
```

> For a detailed step-by-step explanation and an unwrapped demonstration of the active-subnetwork-oriented enrichment analysis, see the vignette [Step-by-Step Execution of the pathfindR Enrichment Workflow](manual_execution.html)

Executing the workflow is straightforward (but does typically take several minutes):

```{r run_pathfindR}
output_df <- run_pathfindR(example_pathfindR_input)
```

## Useful arguments

This subsection demonstrates some (selected) useful arguments of `run_pathfindR()`. For a full list of arguments, see `?run_pathfindR` or visit [our GitHub wiki](https://github.com/egeulgen/pathfindR/wiki).

### Filtering Input Genes

By default, `run_pathfindR()` uses the input genes with p-values < 0.05. To change this threshold, use `p_val_threshold`:

```{r change_input_thr}
output_df <- run_pathfindR(example_pathfindR_input, p_val_threshold = 0.01)
```

### Output Directory

By default, `run_pathfindR()` creates a temporary directory for writing the output files, including active subnetwork search results and a HTML report. To set the output directory, use `output_dir`:

```{r change_out_dir}
output_df <- run_pathfindR(example_pathfindR_input, output_dir = "this_is_my_output_directory")
```

This creates `"this_is_my_output_directory"` under the current working directory.

In essence, this argument is treated as a path so it can be used to create the output directory anywhere. For example, to create the directory `"my_dir"` under `"~/Desktop"` and run the analysis there, you may run:

```{r change_out_dir2}
output_df <- run_pathfindR(example_pathfindR_input, output_dir = "~/Desktop/my_dir")
```

> Note: If the output directory (e.g. `"my_dir"`) already exists, `run_pathfindR()` creates and works under `"my_dir(1)"`. If that exists also exists, it creates `"my_dir(2)"` and so on. This was intentionally implemented so that any previous pathfindR results are not overwritten.

### Gene Sets for Enrichment

The active-subnetwork-oriented enrichment analyses can be performed on any gene sets (biological pathways, gene ontology terms, transcription factor target genes, miRNA target genes etc.). The available gene sets in pathfindR are "KEGG", "Reactome", "BioCarta", "GO-All", "GO-BP", "GO-CC" and "GO-MF" (all for Homo sapiens). For changing the default gene sets for enrichment analysis (hsa KEGG pathways), use the argument `gene_sets`

```{r change_gset1}
output_df <- run_pathfindR(example_pathfindR_input, gene_sets = "GO-MF")
```

By default, `run_pathfindR()` filters the gene sets by including only the terms containing at least 10 and at most 300 genes. To change the default behavior, you may change `min_gset_size` and `max_gset_size`:

```{r change_gset2}
## Including more terms for enrichment analysis
output_df <- run_pathfindR(example_pathfindR_input,
  gene_sets = "GO-MF",
  min_gset_size = 5,
  max_gset_size = 500
)
```

> Note that increasing the number of terms for enrichment analysis may result in significantly longer run time.

If the user prefers to use another gene set source, the `gene_sets` argument should be set to `"Custom"` and the custom gene sets (list) and the custom gene set descriptions (named vector) should be supplied via the arguments `custom_genes` and `custom_descriptions`, respectively. See `?fetch_gene_set` for more details and [Analysis with Custom Gene Sets] for a simple demonstration.

For details on obtaining organism-specific Gene Sets and PIN data, see the vignette [Obtaining PIN and Gene Sets Data](obtain_data.html).

### Filtering Enriched Terms by Adjusted-p Values

By default, `run_pathfindR()` adjusts the enrichment p values via the "bonferroni" method and filters the enriched terms by adjusted-p value < 0.05. To change this adjustment method and the threshold, set `adj_method` and `enrichment_threshold`, respectively:

```{r change_enr_threshold}
output_df <- run_pathfindR(example_pathfindR_input,
  adj_method = "fdr",
  enrichment_threshold = 0.01
)
```

### Protein-protein Interaction Network

For the active subnetwork search process, a protein-protein interaction network (PIN) is used. `run_pathfindR()` maps the input genes onto this PIN and identifies active subnetworks which are then be used for enrichment analyses. To change the default PIN ("Biogrid"), use the `pin_name_path` argument:

```{r change_PIN1}
output_df <- run_pathfindR(example_pathfindR_input, pin_name_path = "IntAct")
```

The `pin_name_path` argument can be one of "Biogrid", "STRING", "GeneMania", "IntAct", "KEGG", "mmu_STRING" or it can be the path to a custom PIN file provided by the user.

```{r change_PIN2}
# to use an external PIN of your choice
output_df <- run_pathfindR(example_pathfindR_input, pin_name_path = "/path/to/myPIN.sif")
```

> NOTE: the PIN is also used for generating the background genes (in this case, all unique genes in the PIN) during hypergeometric-distribution-based tests in enrichment analyses. Therefore, a large PIN will generally result in better results.

### Active Subnetwork Search Method

Currently, there are three algorithms implemented in pathfindR for active subnetwork search: Greedy Algorithm (default, based on Ideker et al. [^2]), Simulated Annealing Algorithm (based on Ideker et al. [^2]) and Genetic Algorithm (based on Ozisik et al. [^3]). For a detailed discussion on which algorithm to use see [this wiki entry](https://github.com/egeulgen/pathfindR/wiki/Active-subnetwork-oriented-Enrichment-Documentation#selecting-the-active-subnetwork-search-algorithm)

```{r change_method}
# for simulated annealing:
output_df <- run_pathfindR(example_pathfindR_input, search_method = "SA")
# for genetic algorithm:
output_df <- run_pathfindR(example_pathfindR_input, search_method = "GA")
```

### Other Arguments 

Because the active subnetwork search algorithms are stochastic, `run_pathfindR()` may be set to iterate the active subnetwork identification and enrichment steps multiple times (by default 1 time). To change this number, set `iterations`:

```{r change_n_iters}
output_df <- run_pathfindR(example_pathfindR_input, iterations = 25)
```

`run_pathfindR()` uses a parallel loop (using the package `foreach`) for performing these iterations in parallel. By default, the number of processes to be used is determined automatically. To override, change `n_processes`:

``` {r change_n_proc}
# if not set, `n_processes` defaults to (number of detected cores - 1)
output_df <- run_pathfindR(example_pathfindR_input, iterations = 5, n_processes = 2)
```

## Output

### Enriched Terms Data Frame
`run_pathfindR()` returns a data frame of enriched terms. Columns are:

- ID: ID of the enriched term
- Term_Description: Description of the enriched term
- Fold_Enrichment: Fold enrichment value for the enriched term (Calculated using ONLY the input genes)
- occurrence: The number of iterations that the given term was found to enriched over all iterations
- lowest_p: the lowest adjusted-p value of the given term over all iterations
- highest_p: the highest adjusted-p value of the given term over all iterations
- non_Signif_Snw_Genes (OPTIONAL): the non-significant active subnetwork genes, comma-separated (controlled by `list_active_snw_genes`, default is `FALSE`)
- Up_regulated: the up-regulated genes (as determined by `change value` > 0, if the `change column` was provided) in the input involved in the given term's gene set, comma-separated. If change column was not provided, all affected input genes are listed here.
- Down_regulated: the down-regulated genes (as determined by `change value` < 0, if the `change column` was provided) in the input involved in the given term's gene set, comma-separated

The first 2 rows of the output data frame of the example analysis on the rheumatoid arthritis gene-level differential expression input data (`example_pathfindR_input`) is shown below:

```{r example_out, eval=TRUE}
knitr::kable(head(example_pathfindR_output, 2))
```

By default, `run_pathfindR()` also produces a graphical summary of enrichment results for top 10 enriched terms, which can also be later produced by `enrichment_chart()`:

<img src="./enrichment_chart.png" style="max-width:100%;" />

You may also disable plotting this chart by setting `plot_enrichment_chart=FALSE` and later produce this plot via the function `enrichment_chart()`:

```{r encrichment_plot_shown}
# change number of top terms plotted (default = 10)
enrichment_chart(
  result_df = example_pathfindR_output,
  top_terms = 15
)
```

### HTML Report (created when `output_dir` is set)

The function also creates an HTML report `results.html` that is saved in the output directory if it's set. This report contains links to two other HTML files:

**1. `enriched_terms.html`**

This document contains the table of the active subnetwork-oriented enrichment results (same as the returned data frame).

**2. `conversion_table.html`**

This document contains the table of converted gene symbols. Columns are:

- Old Symbol: the original gene symbol
- Converted Symbol: the alias symbol that was found in the PIN
- Change: the provided change value
- p-value: the provided adjusted p value

> During input processing, gene symbols that are not in the PIN are identified and excluded. For human genes, if aliases of these missing gene symbols are found in the PIN, these symbols are converted to the corresponding aliases (controlled by the argument `convert2alias`). This step is performed to best map the input data onto the PIN.

The document contains a second table of genes for which **no interactions** were identified after checking for alias symbols (so these could not be used during the analysis).

## Enriched Term Diagrams

For KEGG enrichment analyses, `visualize_terms()` can be used to generate KEGG pathway diagrams that are returned as a list of `ggraph` objects (using [`ggkegg`](https://github.com/noriakis/ggkegg))::

```{r KEGG_vis}
input_processed <- input_processing(example_pathfindR_input)
gg_list <- visualize_terms(
  result_df = example_pathfindR_output,
  input_processed = input_processed,
  is_KEGG_result = TRUE
)  # this function returns a list of ggraph objects (named by Term ID)

# save one of the plots as PDF image
ggplot2::ggsave(
  "hsa04911_diagram.pdf",   # path to output, format is determined by extension
  gg_list$hsa04911,         # what to plot
  width = 5,                 # adjust width
  height = 5                # adjust height
) 
```

Alternatively (i.e., for other types of non-KEGG enrichment analyses), an interaction diagram per enriched term can be generated again via `visualize_terms()`. These diagrams are also returned as a list of `ggraph` objects:

```{r nonKEGG_viss}
input_processed <- input_processing(example_pathfindR_input)
gg_list <- visualize_terms(
  result_df = example_pathfindR_output,
  input_processed = input_processed,
  is_KEGG_result = FALSE,
  pin_name_path = "Biogrid"
)  # this function returns a list of ggraph objects (named by Term ID)

# save one of the plots as PDF image
ggplot2::ggsave(
  "diabetic_cardiomyopathy_interactions.pdf",   # path to output, format is determined by extension
  gg_list$hsa04911,                             # what to plot
  width = 10,                                    # adjust width
  height = 6                                    # adjust height
) 
```

# Clustering Enriched Terms

<img src="./term_clustering.png" style="max-width:100%;" />

The wrapper function `cluster_enriched_terms()` can be used to perform clustering of enriched terms and partitioning the terms into biologically-relevant groups. Clustering can be performed either via `hierarchical` or `fuzzy` method using the pairwise kappa statistics (a chance-corrected measure of co-occurrence between two sets of categorized data) matrix between all enriched terms.

## Hierarchical Clustering

By default, `cluster_enriched_terms()` performs hierarchical clustering of the terms (using $1 - \kappa$ as the distance metric). Iterating over $2,3,...n$ clusters (where $n$ is the number of terms), `cluster_enriched_terms()` determines the optimal number of clusters by maximizing the average silhouette width, partitions the data into this optimal number of clusters and returns a data frame with cluster assignments.

```{r hierarchical0}
example_pathfindR_output_clustered <- cluster_enriched_terms(example_pathfindR_output, plot_dend = FALSE, plot_clusters_graph = FALSE)
```

```{r hierarchical1, eval=TRUE}
## First 2 rows of clustered data frame
knitr::kable(head(example_pathfindR_output_clustered, 2))
## The representative terms
knitr::kable(example_pathfindR_output_clustered[example_pathfindR_output_clustered$Status == "Representative", ])
```

After clustering, you may again plot the summary enrichment chart and display the enriched terms by clusters:

```{r hierarchical2, eval=TRUE}
# plotting only selected clusters for better visualization
selected_clusters <- subset(example_pathfindR_output_clustered, Cluster %in% 5:7)
enrichment_chart(selected_clusters, plot_by_cluster = TRUE)
```

For details, see `?hierarchical_term_clustering`

## Heuristic Fuzzy Multiple-linkage Partitioning

Alternatively, the `fuzzy` clustering method (as described by Huang et al.[^1]) can be used:

```{r fuzzy}
clustered_fuzzy <- cluster_enriched_terms(example_pathfindR_output, method = "fuzzy")
```

For details, see `?fuzzy_term_clustering`

# Aggregated Term Scores per Sample

The function `score_terms()` can be used to calculate the agglomerated z score of each enriched term per sample. This allows the user to individually examine the scores and infer how a term is overall altered (activated or repressed) in a given sample or a group of samples.

```{r scores}
## Vector of "Case" IDs
cases <- c(
  "GSM389703", "GSM389704", "GSM389706", "GSM389708",
  "GSM389711", "GSM389714", "GSM389716", "GSM389717",
  "GSM389719", "GSM389721", "GSM389722", "GSM389724",
  "GSM389726", "GSM389727", "GSM389730", "GSM389731",
  "GSM389733", "GSM389735"
)

## Calculate scores for representative terms
## and plot heat map using term descriptions
representative_df <- example_pathfindR_output_clustered[example_pathfindR_output_clustered$Status == "Representative", ]
score_matrix <- score_terms(
  enrichment_table = representative_df,
  exp_mat = example_experiment_matrix,
  cases = cases,
  use_description = TRUE, # default FALSE
  label_samples = FALSE, # default = TRUE
  case_title = "RA", # default = "Case"
  control_title = "Healthy", # default = "Control"
  low = "#f7797d", # default = "green"
  mid = "#fffde4", # default = "black"
  high = "#1f4037" # default = "red"
)
```
<img src="./score_hmap.png" style="max-width:100%;" />


# Comparison of 2 pathfindR Results

The function `combine_pathfindR_results()` allows combination of two pathfindR active-subnetwork-oriented enrichment analysis results for investigating common and distinct terms between the groups. Below is an example for comparing results using two different rheumatoid arthritis-related data sets(`example_pathfindR_output` and `example_comparison_output`).

```{r compare2res, eval=TRUE, fig.height=4, fig.width=8}
combined_df <- combine_pathfindR_results(
  result_A = example_pathfindR_output,
  result_B = example_comparison_output,
  plot_common = FALSE
)
```

For more details, see the vignette [Comparing Two pathfindR Results](comparing_results.html)

# Analysis with Custom Gene Sets

> As of v1.5, pathfindR offers utility functions for obtaining organism-specific PIN data and organism-specific gene sets data via `get_pin_file()` and `get_gene_sets_list()`, respectively. See the vignette [Obtaining PIN and Gene Sets Data](obtain_data.html) for detailed information on how to gather PIN and gene sets data (for any organism of your choice) for use with pathfindR.

It is possible to use `run_pathfindR()` with custom gene sets (including gene sets for non-Homo-sapiens species). Here, we provide an example application of active-subnetwork-oriented enrichment analysis of the target genes of two transcription factors.

We first load and prepare the gene sets:

```{r custom_prep, eval=TRUE}
## CREB target genes
CREB_target_genes <- normalizePath(system.file("extdata/CREB.txt", package = "pathfindR"))
CREB_target_genes <- readLines(CREB_target_genes)[-c(1, 2)] # skip the first two lines

## MYC target genes
MYC_target_genes <- normalizePath(system.file("extdata/MYC.txt", package = "pathfindR"))
MYC_target_genes <- readLines(MYC_target_genes)[-c(1, 2)] # skip the first two lines

## Prep for use
custom_genes <- list(TF1 = CREB_target_genes, TF2 = MYC_target_genes)
custom_descriptions <- c(TF1 = "CREB target genes", TF2 = "MYC target genes")
```

We next prepare the example input data frame. Because of the way we choose genes, we expect significant enrichment for MYC targets (40 MYC target genes + 10 CREB target genes). Because this is only an example, we also assign each genes random p-values between 0.001 and 0.05.

```{r custom_input, eval=TRUE}
set.seed(123)

## Select 40 random genes from MYC gene sets and 10 from CREB gene sets
selected_genes <- sample(MYC_target_genes, 40)
selected_genes <- c(
  selected_genes,
  sample(CREB_target_genes, 10)
)

## Assign random p value between 0.001 and 0.05 for each selected gene
rand_p_vals <- sample(seq(0.001, 0.05, length.out = 5),
  size = length(selected_genes),
  replace = TRUE
)

example_pathfindR_input <- data.frame(
  Gene_symbol = selected_genes,
  p_val = rand_p_vals
)
knitr::kable(head(example_pathfindR_input))
```

Finally, we perform active-subnetwork-oriented enrichment analysis via `run_pathfindR()` using the custom genes as the gene sets:

```{r custom_run}
example_custom_genesets_result <- run_pathfindR(
  example_pathfindR_input,
  gene_sets = "Custom",
  custom_genes = custom_genes,
  custom_descriptions = custom_descriptions,
  min_gset_size = 1,   # do not limit the gene set size for demo
  max_gset_size = Inf, # do not limit the gene set size for demo
)

knitr::kable(example_custom_genesets_result)
```

```{r custom_result1, eval=TRUE, echo=FALSE}
knitr::kable(example_custom_genesets_result)
```

> It is also possible to run pathfindR using non-human organism annotation. See the vignette [pathfindR Analysis for non-Homo-sapiens organisms](non_hs_analysis.html)

[^1]: Huang DW, Sherman BT, Tan Q, et al. The DAVID Gene Functional Classification Tool: a novel biological module-centric algorithm to functionally analyze large gene lists. Genome Biol. 2007;8(9):R183.

[^2]: Ideker T, Ozier O, Schwikowski B, Siegel AF. Discovering regulatory and signalling circuits in molecular interaction networks. Bioinformatics. 2002;18 Suppl 1:S233-40.

[^3]: Ozisik O, Bakir-Gungor B, Diri B, Sezerman OU. Active Subnetwork GA: A Two Stage Genetic Algorithm Approach to Active Subnetwork Search. Current Bioinformatics. 2017; 12(4):320-8. \doi 10.2174/1574893611666160527100444


================================================
FILE: vignettes/manual_execution.Rmd
================================================
---
title: "Step-by-Step Execution of the pathfindR Enrichment Workflow"
author: "Ege Ulgen"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Step-by-Step Execution of the pathfindR Enrichment Workflow}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  eval = FALSE
)
```

This vignette walks through each step of the pathfindR active-subnetwork-oriented pathway enrichment analysis. For most purposes, the wrapper function `run_pathfindR()` can be used to perform this analysis from start to end. For users who wish to have further control over the enrichment workflow, this vignette will be more useful.


<img src="./pathfindr.png" style="max-width:100%;" />

# Load the package and prepare the input data frame

We first need to load the package and the input data to be used for analysis. The input must be a data frame consisting of the following columns: `Gene Symbols`, `Change Values` (optional) and `p values`. The example data frame used in this vignette (`example_pathfindR_input`) is the dataset containing the differentially-expressed genes for the GEO dataset GSE15573 comparing 18 rheumatoid arthritis (RA) patients versus 15 healthy subjects.

```{r init_steps, eval=TRUE}
suppressPackageStartupMessages(library(pathfindR))
data(example_pathfindR_input)
head(example_pathfindR_input, 3)
```

# The protein-protein interaction network (PIN)

For the active subnetwork search process, we will need a protein-protein interaction network (PIN). pathfindR will map the input genes onto this PIN and identify active subnetworks which will then be used for enrichment analyses.

> An active subnetwork can be defined as a group of interconnected genes in a protein-protein interaction network (PIN) that predominantly consists of significantly altered genes. In other words, active subnetworks define distinct disease-associated sets of interacting genes, whether discovered through the original analysis or discovered because of being in interaction with a significant gene.

The `pin_name_path` argument in all functions can be one of "Biogrid", "STRING", "GeneMania", "IntAct", "KEGG", "mmu_STRING" or it can be the path to a custom PIN file provided by the user.

# Process input data

We next need to process the input data for use in analysis via `input_processing()`:

```{r process}
example_processed <- input_processing(
  input = example_pathfindR_input, # the input: in this case, differential expression results
  p_val_threshold = 0.05, # p value threshold to filter significant genes
  pin_name_path = "Biogrid", # the name of the PIN to use for active subnetwork search
  convert2alias = TRUE # boolean indicating whether or not to convert missing symbols to alias symbols in the PIN
)
```

> After checking that the data frame complies with the requirements, `input_processing()` filters the input so that genes with p values larger than `p_val_threshold` are excluded. Next, gene symbols that are not in the PIN are identified and excluded. For human genes, if aliases of these missing gene symbols are found in the PIN, these symbols are converted to the corresponding aliases (controlled by the argument `convert2alias`). This step is performed to best map the input data onto the PIN.

# Obtain Gene Set Data

We obtain the necessary gene sets for enrichment analyses using `fetch_gene_set()`:

``` {r gene_set}
# using "BioCarta" as our gene sets for enrichment
biocarta_list <- fetch_gene_set(
  gene_sets = "BioCarta",
  min_gset_size = 10,
  max_gset_size = 300
)
biocarta_gsets <- biocarta_list[[1]]
biocarta_descriptions <- biocarta_list[[2]]
```

> The available gene sets in pathfindR are "KEGG", "Reactome", "BioCarta", "GO-All", "GO-BP", "GO-CC" and "GO-MF". If the user prefers to use another gene set source, the `gene_sets` argument should be set to `"Custom"` and the custom gene sets (list) and the custom gene set descriptions (named vector) should be supplied via the arguments `custom_genes` and `custom_descriptions`, respectively. See `?fetch_gene_set` for more details.


# Active Subnetwork Search and Enrichment Analyses

As outlined in the vignette [Introduction to pathfindR](intro_vignette.html), `run_pathfindR()` initially identifies and filters active subnetworks, then performs enrichment analyses on these subnetworks and summarize the results.

To perform these steps manually, we utilize the function `active_snw_search()` for identifying and filtering active subnetworks and the function `enrichment_analyses()` for obtaining enriched terms using these subnetworks. Because the active subnetwork search algorithms are stochastic, we suggest iterating these subnetwork identification and enrichment steps multiple times (especially for "SA")[^1]:

[^1]: Here we are using a regular `for` loop. In the wrapper function `run_pathfindR()`, however, a parallel loop (via the package `foreach`) is used.

```{r snw_search}
n_iter <- 10 ## number of iterations
combined_res <- NULL ## to store the result of each iteration

for (i in 1:n_iter) {
  ###### Active Subnetwork Search
  snws_file <- paste0("active_snws_", i) # Name of output file
  active_snws <- active_snw_search(
    input_for_search = example_processed,
    pin_name_path = "Biogrid",
    snws_file = snws_file,
    score_quan_thr = 0.8, # you may tweak these arguments for optimal filtering of subnetworks
    sig_gene_thr = 0.02, # you may tweak these arguments for optimal filtering of subnetworks
    search_method = "GR", # we suggest using GR
    seedForRandom = i # setting seed to ensure reproducibility per iteration
  )

  ###### Enrichment Analyses
  current_res <- enrichment_analyses(
    snws = active_snws,
    sig_genes_vec = example_processed$GENE,
    pin_name_path = "Biogrid",
    genes_by_term = biocarta_gsets,
    term_descriptions = biocarta_descriptions,
    adj_method = "bonferroni",
    enrichment_threshold = 0.05,
    list_active_snw_genes = TRUE
  ) # listing the non-input active snw genes in output

  ###### Combine results via `rbind`
  combined_res <- rbind(combined_res, current_res)
}
```

# Summary of Enrichment Results

We next summarize the enrichment results (in `combined_res`) using `summarize_enrichment_results()` and annotate the involved significant (input) genes in each term using `annotate_term_genes()`.

```{r post_proc}
###### Summarize Combined Enrichment Results
summarized_df <- summarize_enrichment_results(combined_res,
  list_active_snw_genes = TRUE
)

###### Annotate Affected Genes Involved in Each Enriched Term
final_res <- annotate_term_genes(
  result_df = summarized_df,
  input_processed = example_processed,
  genes_by_term = biocarta_gsets
)
```

# Visualizations

We can visualize each enriched term diagram using `visualize_terms()`. In this case, these will be graphs of interactions of pathway-involved genes for each pathway. See `?visualize_terms` for more details.

```{r vis_pws}
visualize_terms(
  result_df = final_res,
  hsa_KEGG = FALSE, # boolean to indicate whether human KEGG gene sets were used for enrichment analysis or not
  pin_name_path = "Biogrid"
)
```

<img src="./example_interaction_vis.png" style="max-width:100%;" />

We can also create a graphical summary of the top 10 enrichment results using `enrichment_chart()`:

```{r enr_chart}
enrichment_chart(final_res[1:10, ])
```

<img src="./man_enrichment_chart.png" style="max-width:100%;" />

The x-axis corresponds to fold enrichment values while the y-axis indicates the enriched terms. The size of each bubble indicates the number of significant genes in the given enriched term. Color indicates the -log10(lowest-p) value. The closer the color is to red, the more significant the enrichment is.


================================================
FILE: vignettes/non_hs_analysis.Rmd
================================================
---
title: "pathfindR Analysis for non-Homo-sapiens organisms"
author: "Ege Ulgen"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{pathfindR Analysis for non-Homo-sapiens organisms}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 7, fig.height = 7, fig.align = "center",
  eval = FALSE
)
suppressPackageStartupMessages(library(pathfindR))
```

As mentioned in the vignette [Introduction to pathfindR](intro_vignette.html), enrichment analysis with pathfindR is not limited to the built-in data. The users are able to utilize custom protein-protein interaction networks (PINs) as well as custom gene sets. These abilities to use custom data naturally allow for performing pathfindR analysis on non-Homo-sapiens input data. In this vignette, we'll try to provide an overview of how pathfindR analysis using Mus musculus data can be performed. 

# Preparation of Necessary Data

> As of v1.5, pathfindR offers utility functions for obtaining organism-specific PIN data and organism-specific gene sets data via `get_pin_file()` and `get_gene_sets_list()`, respectively. See the vignette [Obtaining PIN and Gene Sets Data](obtain_data.html) for detailed information on how to gather PIN and gene sets data (for any organism of your choice) for use with pathfindR.

For performing non-human active-subnetwork-oriented enrichment analysis, the user needs the following resources:

- organism-specific protein interaction network (PIN) data
- organism-specific gene sets data

After obtaining and processing these data for use, the user can run pathfindR using custom parameters.

> Important Note: Because the non-human organism-specific PIN will likely contain less interactions than the Homo sapiens PIN, pathfindR may result in less (or even no) enriched terms.

## Obtain Organism-specific Gene Sets 

We can obtain the up-to-date M.musculus (KEGG identifier: mmu) KEGG Pathway Gene Sets using the function `get_gene_sets_list()`:

> If using another organism, all you have to do is to replace "mmu" with the KEGG organism code in the related arguments in this vignette.

```{r mmu_kegg}
gsets_list <- get_gene_sets_list(
  source = "KEGG",
  org_code = "mmu"
)
```

This returns a list containing 2 objects named: `gene_sets` containing sets of genes of each pathway and `desriptions` containing the description of each pathway. 

The M.musculus KEGG gene set data `mmu_kegg_genes` and `mmu_kegg_descriptions` are already provided in pathfindR. For other organisms, the user may wish to save the data as RDS files for future use:

```{r KEGG_save}
mmu_kegg_genes <- gsets_list$gene_sets
mmu_kegg_descriptions <- gsets_list$descriptions

## Save both as RDS files for later use
saveRDS(mmu_kegg_genes, "mmu_kegg_genes.RDS")
saveRDS(mmu_kegg_descriptions, "mmu_kegg_descriptions.RDS")
```

These can be later loaded via:

```{r KEGG_load}
mmu_kegg_genes <- readRDS("mmu_kegg_genes.RDS")
mmu_kegg_descriptions <- readRDS("mmu_kegg_descriptions.RDS")
```


> The function `get_gene_sets_list()` can also be used to obtain gene sets data from other sources. See the vignette [Obtaining PIN and Gene Sets Data](obtain_data.html) for more detail.


## Obtain Organism-specific Protein-protein Interaction Network

You may use the function `get_pin_file()` to obtain organism-specific BioGRID PIN data (see the vignette [Obtaining PIN and Gene Sets Data](obtain_data.html))

> Note that BioGRID PINs are smaller for non-H.sapiens organisms and this, in turn, results in less or no significantly enriched terms with pathfindR analysis.

Here, we demonstrate obtaining the organism-specific protein-protein interaction network (PIN) from [STRING](https://string-db.org/). You may choose the organism of your choice and find the PIN on the downloads page with the description "protein network data (scored links between proteins)". When processing, we recommend filtering the interactions using a link score threshold (e.g. 800).


Regardless of the resource, the raw PIN data should be processed to a SIF file, each interactor should be specified with their gene symbols. The first 3 interactions from an example SIF file is provided below:

|         |   |        |
|:--------|:--|:-------|
|C2cd2    |pp |Ints2   |
|Apob     |pp |Gpt     |
|B4galnt1 |pp |Mettl1  |

Notice there are no headers and each line contains an interaction in the form `GeneA pp GeneB`, separated by tab (i.e. `\t`) with no row names and no column names.

Below we download process the STRING PIN for use with pathfindR:

```{r process_PIN1}
## Downloading the STRING PIN file to tempdir
url <- "https://stringdb-static.org/download/protein.links.v11.0/10090.protein.links.v11.0.txt.gz"
path2file <- file.path(tempdir(check = TRUE), "STRING.txt.gz")
download.file(url, path2file)

## read STRING pin file
mmu_string_df <- read.table(path2file, header = TRUE)

## filter using combined_score cut-off value of 800
mmu_string_df <- mmu_string_df[mmu_string_df$combined_score >= 800, ]

## fix ids
mmu_string_pin <- data.frame(
  Interactor_A = sub("^10090\\.", "", mmu_string_df$protein1),
  Interactor_B = sub("^10090\\.", "", mmu_string_df$protein2)
)
head(mmu_string_pin, 2)
```

|Interactor_A       |Interactor_B       |
|:------------------|:------------------|
|ENSMUSP00000000001 |ENSMUSP00000017460 |
|ENSMUSP00000000001 |ENSMUSP00000039107 |

Since the interactors are Ensembl peptide IDs, we'll need to convert them to MGI symbols for use with pathfindR. This can be achieved via `biomaRt` or any other conversion method you prefer:

```{r process_PIN2, eval=FALSE}
# library(biomaRt)

mmu_ensembl <- useMart("ensembl", dataset = "mmusculus_gene_ensembl")

converted <- getBM(
  attributes = c("ensembl_peptide_id", "mgi_symbol"),
  filters = "ensembl_peptide_id",
  values = unique(unlist(mmu_string_pin)),
  mart = mmu_ensembl
)
mmu_string_pin$Interactor_A <- converted$mgi_symbol[match(mmu_string_pin$Interactor_A, converted$ensembl_peptide_id)]
mmu_string_pin$Interactor_B <- converted$mgi_symbol[match(mmu_string_pin$Interactor_B, converted$ensembl_peptide_id)]
mmu_string_pin <- mmu_string_pin[!is.na(mmu_string_pin$Interactor_A) & !is.na(mmu_string_pin$Interactor_B), ]
mmu_string_pin <- mmu_string_pin[mmu_string_pin$Interactor_A != "" & mmu_string_pin$Interactor_B != "", ]

head(mmu_string_pin, 2)
```

| Interactor_A | Interactor_B |
|:------------:|:------------:|
|    Gnai3     |     Ppy      |
|    Gnai3     |     Ccr3     |

Next, we remove self interactions and any duplicated interactions, format the data frame as SIF:

```{r process_PIN3}
# remove self interactions
self_intr_cond <- mmu_string_pin$Interactor_A == mmu_string_pin$Interactor_B
mmu_string_pin <- mmu_string_pin[!self_intr_cond, ]

# remove duplicated inteactions (including symmetric ones)
mmu_string_pin <- unique(t(apply(mmu_string_pin, 1, sort))) # this will return a matrix object

mmu_string_pin <- data.frame(
  A = mmu_string_pin[, 1],
  pp = "pp",
  B = mmu_string_pin[, 2]
)
```

Finally, we save the gene symbol PIN as a SIF file named "mmusculusPIN.sif" under the temporary directory (i.e. `tempdir()`):

```{r process_PIN4}
path2SIF <- file.path(tempdir(), "mmusculusPIN.sif")
write.table(mmu_string_pin,
  file = path2SIF,
  col.names = FALSE,
  row.names = FALSE,
  sep = "\t",
  quote = FALSE
)
path2SIF <- normalizePath(path2SIF)
```

We'll use this path to the custom sif for analysis with `run_pathfindR()`.

>The STRING Mus musculus PIN created above is available in pathfindR and can be used via setting `pin_name_path = "mmu_STRING"` in `run_pathfindR()`.

# Running pathfindR on non-Homo sapiens data

## Input Data

The data used in this vignette (`example_mmu_input`) is the data frame of differentially-expressed genes along for the GEO dataset [GSE99393](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE99393). The RNA microarray experiment was perform to detail the global program of gene expression underlying polarization of myeloma-associated macrophages by CSF1R antibody treatment. The samples are 6 murine bone marrow derived macrophages co-cultured with myeloma cells (myeloma-associated macrophages), 3 of which were treated with CSF1R antibody (treatment group) and the rest were treated with control IgG antibody (control group). In `example_mmu_input`, 45 differentially-expressed genes with |logFC| >= 2 and FDR <= 0.05 are presented.

```{r mmu_input_df, eval=TRUE}
knitr::kable(head(example_mmu_input))
```


## Executing `run_pathfindR()`

After obtaining the necessary PIN and gene sets data, you can then perform pathfindR analysis by setting these arguments:
- `convert2alias = FALSE`: alias conversion only works on H.sapiens genes
- `pin_name_path = path2SIF`: as we're using a non-built-in PIN, we need to provide the path to the mmu sif file
- `gene_sets = "Custom`: as we're using a non-built-in source for gene sets
- `custom_genes = mmu_kegg_genes`
- `custom_descriptions = mmu_kegg_descriptions`

```{r run}
example_mmu_output <- run_pathfindR(
  input = example_mmu_input,
  convert2alias = FALSE,
  gene_sets = "Custom",
  custom_genes = mmu_kegg_genes,
  custom_descriptions = mmu_kegg_descriptions,
  pin_name_path = path2SIF
)
```

```{r enr_chart, echo=FALSE, eval=TRUE}
enrichment_chart(example_mmu_output)
```

```{r output, eval=TRUE}
knitr::kable(example_mmu_output)
```

Because we used a very strict cut-off (logFC >= 2 + FDR <= 0.05), there were only 18 enriched KEGG pathways. However, the pathways identified here are significantly related to the pathways identified in the original publication by Wang et al.[^1].

[^1]: Wang Q, Lu Y, Li R, et al. Therapeutic effects of CSF1R-blocking antibodies in multiple myeloma. Leukemia. 2018;32(1):176-183.

## Built-in Mus musculus Data

As aforementioned, for Mus musculus (only), we have provided the necessary PIN (`mmu_STRING`) and gene set data (`mmu_KEGG`) so you can also run:

```{r run2}
example_mmu_output <- run_pathfindR(
  input = example_mmu_input,
  convert2alias = FALSE,
  gene_sets = "mmu_KEGG",
  pin_name_path = "mmu_STRING"
)
```


================================================
FILE: vignettes/obtain_data.Rmd
================================================
---
title: "Obtaining PIN and Gene Sets Data"
output: rmarkdown::html_vignette
date: "`r Sys.Date()`"
vignette: >
  %\VignetteIndexEntry{Obtaining PIN and Gene Sets Data}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  eval = FALSE
)
```

# Get PIN File

For retrieving the PIN file for an organism of your choice, you may use the function `get_pin_file()`. As of this version, the only source for PIN data is "BioGRID".

By default, the function downloads the PIN data from BioGRID and processes it, saves it in a temporary file and returns the path: 

```{r}
## the default organism is "Homo_sapiens"
path_to_pin_file <- get_pin_file()
```

You can retrieve the PIN data for the organism of your choice, by setting the `org` argument:

```{r}
## retrieving PIN data for "Gallus_gallus"
path_to_pin_file <- get_pin_file(org = "Gallus_gallus")
```

You may also supply a `path/to/PIN/file` to save the PIN file for later use (in this case, the path you supply will be returned):

```{r}
## saving the "Homo_sapiens" PIN as "/path/to/PIN/file"
path_to_pin_file <- get_pin_file(path2pin = "/path/to/PIN/file")
```

You may also retrieve a specific version of BioGRID via setting the `release` argument:

```{r}
## retrieving PIN data for "Mus_musculus" from BioGRID release 3.5.179
path_to_pin_file <- get_pin_file(
  org = "Mus_musculus",
  release = "3.5.179"
)
```

# Get Gene Sets List

To retrieve organism-specific gene sets list, you may use the function `get_gene_sets_list()`. The available sources for gene sets are "KEGG", "Reactome" and "MSigDB". The function retrieves the gene sets data from the source and processes it into a list of two objects used by pathfindR for active-subnetwork-oriented enrichment analysis:
1. **gene_sets** A list containing the genes involved in each gene set
2. **descriptions** A named vector containing the descriptions for each gene set

By default, `get_gene_sets_list()` obtains "KEGG" gene sets for "hsa".

## KEGG Pathway Gene Sets

To obtain the gene sets list of the KEGG pathways for an organism of your choice, use the KEGG organism code for the selected organism. For a full list of all available organisms, see [here](https://www.genome.jp/kegg/catalog/org_list.html).

```{r}
## obtaining KEGG pathway gene sets for Rattus norvegicus (rno)
gsets_list <- get_gene_sets_list(org_code = "rno")
```

## Reactome Pathway Gene Sets

For obtaining Reactome pathway gene sets, set the `source` argument to "Reactome". This downloads the most current Reactome pathways in gmt format and processes it into the list object that pathfindR uses:

```{r}
gsets_list <- get_gene_sets_list(source = "Reactome")
```

For Reactome, there is only one collection of pathway gene sets.

## MSigDB Gene Sets

Using `msigdbr`, `pathfindR` can retrieve all MSigDB gene sets. For this, set the `source` argument to "MSigDB" and the `collection` argument to the desired MSigDB collection (one of H, C1, C2, C3, C4, C5, C6, C7):

```{r}
gsets_list <- get_gene_sets_list(
  source = "MSigDB",
  collection = "C2"
)
```

The default organism for MSigDB is "Homo sapiens", you may obtain the gene sets data for another organism by setting the `species` argument:

```{r}
## obtaining C5 gene sets data for "Drosophila melanogaster"
gsets_list <- get_gene_sets_list(
  source = "MSigDB",
  species = "Drosophila melanogaster",
  collection = "C5"
)
```

```{r, eval=TRUE}
## see msigdbr::msigdbr_species() for all available organisms
msigdbr::msigdbr_species()
```

You may also obtain the gene sets for a subcollection by setting the `subcollection` argument:

```{r}
## obtaining C3 - MIR: microRNA targets
gsets_list <- get_gene_sets_list(
  source = "MSigDB",
  collection = "C3",
  subcollection = "MIR"
)
```


================================================
FILE: vignettes/visualization_vignette.Rmd
================================================
---
title: "Visualization of pathfindR Enrichment Results"
output: rmarkdown::html_vignette
date: "`r Sys.Date()`"
vignette: >
  %\VignetteIndexEntry{Visualization of pathfindR Enrichment Results}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 8, fig.height = 4, fig.align = "center"
)
```
```{r setup}
suppressPackageStartupMessages(library(pathfindR))
```

`pathfindR` offers various functionality to visualize the enrichment results. In this vignette, I try to demonstrate these functionalities.

## `enrichment_chart()`: Bubble Chart of Enrichment Results

`enrichment_chart` generates a bubble chart. The x-axis corresponds to fold enrichment values while the y-axis indicates the enriched terms. Size of the bubble indicates the number of significant genes in the given enriched term. Color indicates the -log10(lowest-p) value. The closer the color is to red, the more significant the enrichment is.

```{r enr_chart, eval=FALSE}
enrichment_chart(example_pathfindR_output)
```

<img src="./enrichment_chart.png" style="max-width:100%;" />

By default, the bubble chart is generated for the top 10 terms. This can be controlled by the `top_terms` argument:

```{r enr_chart2, eval=FALSE}
## change top_terms
enrichment_chart(example_pathfindR_output, top_terms = 3)

## set null for displaying all terms
enrichment_chart(example_pathfindR_output, top_terms = NULL)
```

If the enrichment results were clustered, setting `plot_by_cluster == TRUE` will result in the enriched terms to be grouped by clusters:

```{r enr_chart3, fig.height=8, fig.width=8}
enrichment_chart(example_pathfindR_output_clustered, plot_by_cluster = TRUE)
```

See `?enrichment_chart` for more details.

## `visualize_terms()`: Enriched Term Diagrams

For KEGG enrichment analyses, `visualize_terms()` can be used to generate KEGG pathway diagrams that are returned as a list of `ggraph` objects (using [`ggkegg`](https://github.com/noriakis/ggkegg))::

```{r KEGG_vis, eval=FALSE}
input_processed <- input_processing(example_pathfindR_input)
gg_list <- visualize_terms(
  result_df = example_pathfindR_output,
  input_processed = input_processed,
  is_KEGG_result = TRUE
)  # this function returns a list of ggraph objects (named by Term ID)

# save one of the plots as PDF image
ggplot2::ggsave(
  "hsa04911_diagram.pdf",   # path to output, format is determined by extension
  gg_list$hsa04911,         # what to plot
  width = 5                 # adjust width
  height = 5                # adjust height
) 
```

<img src="./example_kegg_pathway_diagram.png" style="max-width:100%;" />

Alternatively (i.e., for other types of non-KEGG enrichment analyses), an interaction diagram per enriched term can be generated again via `visualize_terms()`. These diagrams are also returned as a list of `ggraph` objects:

```{r nonKEGG_viss, eval=FALSE}
input_processed <- input_processing(example_pathfindR_input)
gg_list <- visualize_terms(
  result_df = example_pathfindR_output,
  input_processed = input_processed,
  is_KEGG_result = FALSE,
  pin_name_path = "Biogrid"
)  # this function returns a list of ggraph objects (named by Term ID)

# save one of the plots as PDF image
ggplot2::ggsave(
  "diabetic_cardiomyopathy_interactions.pdf",   # path to output, format is determined by extension
  gg_list$hsa04911,                             # what to plot
  width = 10                                    # adjust width
  height = 6                                    # adjust height
) 
```

<img src="./example_interaction_vis.png" style="max-width:100%;" />

See `?visualize_terms` for more details.

## `term_gene_heatmap()`: Terms by Genes Heatmap

`term_gene_heatmap()` is used to create a heatmap where rows are enriched terms and columns are involved input genes. This heatmap allows visual identification of the input genes involved in the enriched terms, as well as the common or distinct genes between different terms. 

```{r hmap}
term_gene_heatmap(example_pathfindR_output)
```

By default, the heatmap is generated for the top 10 terms. This can be controlled by the `num_terms` argument:

```{r hmap2, eval=FALSE}
term_gene_heatmap(example_pathfindR_output, num_terms = 3)

## set null for displaying all terms
term_gene_heatmap(example_pathfindR_output, num_terms = NULL)
```

By default, the term ids are used. For using full descriptions, set `use_description = TRUE`

```{r hmap3, eval=FALSE}
term_gene_heatmap(example_pathfindR_output, use_description = TRUE)
```

If the input data frame (same as in `run_pathfindR()`) is supplied, the tile colors indicate the change values:

```{r hmap4, eval=FALSE}
term_gene_heatmap(result_df = example_pathfindR_output, genes_df = example_pathfindR_input)
```

<img src="./hmap.png" style="max-width:100%;" />

See `?term_gene_heatmap` for more details.

## `term_gene_graph()`: Term-Gene Graph

The function `term_gene_graph()` (adapted from the Gene-Concept network visualization by the R package `enrichplot`) can be utilized to visualize which significant genes are involved in the enriched terms. The function creates the term-gene graph, displaying the connections between genes and biological terms (enriched pathways or gene sets). This allows for the investigation of multiple terms to which significant genes are related. The graph also enables determination of the degree of overlap between the enriched terms by identifying shared and/or distinct significant genes.
By default, the function visualizes the term-gene graph for the top 10 enriched terms:

```{r term_gene1}
term_gene_graph(example_pathfindR_output)
```

To plot all of the enriched terms in the enrichment results, set `num_terms = NULL` (not advised due to cluttered visualization):

```{r term_gene2, eval=FALSE}
term_gene_graph(example_pathfindR_output, num_terms = NULL)
```

To plot using full term names (instead of IDs which is the default), set `use_description = TRUE`:

```{r term_gene3, eval=FALSE}
term_gene_graph(example_pathfindR_output, num_terms = 3, use_description = TRUE)
```

<img src="./term_gene.png" style="max-width:100%;" />

By default the node sizes are plotted proportional to the number of genes a term contains (`num_genes`). To adjust node sizes using the $-log_{10}$(lowest p values), set `node_size = "p_val"`:

```{r term_gene4, eval=FALSE}
term_gene_graph(example_pathfindR_output, num_terms = 3, node_size = "p_val")
```

See `?term_gene_graph` for more details.

## `UpSet_plot()`: UpSet Plots of Enriched Terms

UpSet plots are plots of the intersections of sets as a matrix. `UpSet_plot()` creates a ggplot object of an UpSet plot where the x-axis is the UpSet plot of intersections of enriched terms. By default (`method = "heatmap"`), the main plot is a heatmap of genes at the corresponding intersections, colored by up/down regulation:

```{r upset1}
UpSet_plot(example_pathfindR_output)
```

If genes_df is provided, the heatmap tiles are colored by change values:

```{r upset2, eval=FALSE}
UpSet_plot(example_pathfindR_output, genes_df = example_pathfindR_input)
```

<img src="./upset.png" style="max-width:100%;" />

Again, you may change the number of top terms plotted via `num_terms` (default = 10):

```{r upset3, eval=FALSE}
UpSet_plot(example_pathfindR_output, num_terms = 5)
```

Again, to plot using full term names (instead of IDs which is the default), set `use_description = TRUE`:

```{r upset4, eval=FALSE}
UpSet_plot(example_pathfindR_output, use_description = TRUE)
```

If `method = "barplot"`, the main plot is a bar plots of the number of genes in the corresponding intersections:

```{r upset5, eval=FALSE}
UpSet_plot(example_pathfindR_output, method = "barplot")
```

If `method = "boxplot"` and if `genes_df` is provided, then the main plot displays the boxplots of change values of the genes within the corresponding intersections:

```{r upset6, eval=FALSE}
UpSet_plot(example_pathfindR_output, example_pathfindR_input, method = "boxplot")
```

See `?UpSet_plot` for more details.