Full Code of egeulgen/pathfindR for AI

master 7ce1330d6d16 cached

138 files

688.6 KB

207.2k tokens

101 symbols

1 requests

Download .txt

Showing preview only (729K chars total). Download the full file or copy to clipboard to get everything.

Repository: egeulgen/pathfindR
Branch: master
Commit: 7ce1330d6d16
Files: 138
Total size: 688.6 KB

Directory structure:
gitextract_46eozuid/

├── .Rbuildignore
├── .Rinstignore
├── .github/
│   ├── .gitignore
│   ├── ISSUE_TEMPLATE/
│   │   ├── bug_report.md
│   │   └── feature_request.md
│   └── workflows/
│       ├── R-CMD-check.yaml
│       ├── branch_naming_policy.yaml
│       ├── pkgdown.yaml
│       └── test-coverage.yaml
├── .gitignore
├── CODE_OF_CONDUCT.md
├── CONTRIBUTING.md
├── DESCRIPTION
├── LICENSE
├── LICENSE.md
├── NAMESPACE
├── NEWS.md
├── R/
│   ├── active_snw_search.R
│   ├── clustering.R
│   ├── comparison.R
│   ├── core.R
│   ├── data_generation.R
│   ├── enrichment.R
│   ├── pathfindr.R
│   ├── scoring.R
│   ├── utility.R
│   ├── visualization.R
│   └── zzz.R
├── README.Rmd
├── README.md
├── _pkgdown.yml
├── codecov.yml
├── cran-comments.md
├── inst/
│   ├── CITATION
│   ├── extdata/
│   │   ├── CREB.txt
│   │   ├── MYC.txt
│   │   └── resultActiveSubnetworkSearch.txt
│   ├── java/
│   │   └── ActiveSubnetworkSearch.jar
│   └── rmd/
│       ├── conversion_table.Rmd
│       ├── enriched_terms.Rmd
│       └── results.Rmd
├── java/
│   ├── ActiveSubnetworkSearchAlgorithms/
│   │   ├── ActiveSubnetworkSearch.java
│   │   ├── GAIndividual.java
│   │   ├── GeneticAlgorithm.java
│   │   ├── GreedySearch.java
│   │   └── SimulatedAnnealing.java
│   ├── ActiveSubnetworkSearchMisc/
│   │   ├── Gaussian.java
│   │   ├── ScoreCalculations.java
│   │   ├── Subnetwork.java
│   │   └── ZStatistics.java
│   ├── Application/
│   │   ├── AppActiveSubnetworkSearch.java
│   │   └── Parameters.java
│   ├── File/
│   │   ├── ExperimentFileReader.java
│   │   └── SIFReader.java
│   └── Network/
│       ├── Network.java
│       ├── Node.java
│       └── SubnetworkFinder.java
├── man/
│   ├── UpSet_plot.Rd
│   ├── active_snw_enrichment_wrapper.Rd
│   ├── active_snw_search.Rd
│   ├── annotate_term_genes.Rd
│   ├── check_java_version.Rd
│   ├── cluster_enriched_terms.Rd
│   ├── cluster_graph_vis.Rd
│   ├── color_kegg_pathway.Rd
│   ├── combine_pathfindR_results.Rd
│   ├── combined_results_graph.Rd
│   ├── configure_output_dir.Rd
│   ├── create_HTML_report.Rd
│   ├── create_kappa_matrix.Rd
│   ├── enrichment.Rd
│   ├── enrichment_analyses.Rd
│   ├── enrichment_chart.Rd
│   ├── fetch_gene_set.Rd
│   ├── fetch_java_version.Rd
│   ├── filterActiveSnws.Rd
│   ├── fuzzy_term_clustering.Rd
│   ├── get_biogrid_pin.Rd
│   ├── get_gene_sets_list.Rd
│   ├── get_kegg_gsets.Rd
│   ├── get_mgsigdb_gsets.Rd
│   ├── get_pin_file.Rd
│   ├── get_reactome_gsets.Rd
│   ├── gset_list_from_gmt.Rd
│   ├── hierarchical_term_clustering.Rd
│   ├── hyperg_test.Rd
│   ├── input_processing.Rd
│   ├── input_testing.Rd
│   ├── isColor.Rd
│   ├── pathfindr.Rd
│   ├── plot_scores.Rd
│   ├── process_pin.Rd
│   ├── return_pin_path.Rd
│   ├── run_pathfindr.Rd
│   ├── safe_get_content.Rd
│   ├── score_terms.Rd
│   ├── single_iter_wrapper.Rd
│   ├── summarize_enrichment_results.Rd
│   ├── term_gene_graph.Rd
│   ├── term_gene_heatmap.Rd
│   ├── visualize_KEGG_diagram.Rd
│   ├── visualize_active_subnetworks.Rd
│   ├── visualize_term_interactions.Rd
│   └── visualize_terms.Rd
├── renv/
│   ├── .gitignore
│   ├── activate.R
│   └── settings.json
├── revdep/
│   ├── .gitignore
│   ├── email.yml
│   └── failures.md
├── slides/
│   └── cost_charme_school/
│       └── demo_script.R
├── tests/
│   ├── testthat/
│   │   ├── test-active_snw_search.R
│   │   ├── test-clustering.R
│   │   ├── test-comparison.R
│   │   ├── test-core.R
│   │   ├── test-data_generation.R
│   │   ├── test-enrichment.R
│   │   ├── test-scoring.R
│   │   ├── test-utility.R
│   │   ├── test-visualization.R
│   │   └── test-zzz.R
│   ├── testthat-active_snw.R
│   ├── testthat-clustering.R
│   ├── testthat-comparison.R
│   ├── testthat-core.R
│   ├── testthat-data_generation.R
│   ├── testthat-enrichment.R
│   ├── testthat-scoring.R
│   ├── testthat-utility.R
│   ├── testthat-visualization.R
│   └── testthat-zzz.R
└── vignettes/
    ├── .gitignore
    ├── comparing_results.Rmd
    ├── intro_vignette.Rmd
    ├── manual_execution.Rmd
    ├── non_hs_analysis.Rmd
    ├── obtain_data.Rmd
    └── visualization_vignette.Rmd

================================================
FILE CONTENTS
================================================

================================================
FILE: .Rbuildignore
================================================
^renv$
^renv\.lock$
^slides$
^CODE_OF_CONDUCT\.md$
^CONTRIBUTING.md$
^\.github$
^Meta$
^doc$
^.*\.Rprofile$
^.*\.Rproj$
^\.Rproj\.user$
^data-raw$
^misc$
^README.md$
^\.travis\.yml$
^cran-comments\.md$
^CRAN-RELEASE$
^Dockerfile_dev$
^codecov\.yml$
^LICENSE\.md$
^README\.Rmd$
^docs$
^_pkgdown\.yml$
^pkgdown$
^revdep$
^CRAN-SUBMISSION$


================================================
FILE: .Rinstignore
================================================
^slides$
^java$
^misc_data$


================================================
FILE: .github/.gitignore
================================================
*.html


================================================
FILE: .github/ISSUE_TEMPLATE/bug_report.md
================================================
---
name: Bug report
about: Create a report to help us improve
title: ''
labels: 'bug'
assignees: ''

---

**Describe the bug**
A clear and concise description of what the bug is.

**To Reproduce**
Steps to reproduce the behavior:
1. Prepare input as '...'
2. Run the following function: '....'
3. See error

**Expected behavior**
A clear and concise description of what you expected to happen.

**Screenshots**
If applicable, add screenshots to help explain your problem.

**Desktop (please complete the following information):**
 - OS: [e.g. macOS, Windows, Linux]
 - Version [e.g. 10.14.5]

** R Session Information:**
Please provide the R session information (by running `sessionInfo()`)

**Additional context**
Add any other context about the problem here. While pathfindR is an R package, the active subnetwork search functionality is written in Java. If you suspect any issue regarding java please provide your Java version (by running `java --version`)


================================================
FILE: .github/ISSUE_TEMPLATE/feature_request.md
================================================
---
name: Feature request
about: Suggest an idea for this project
title: ''
labels: 'enhancement'
assignees: ''

---

**Is your feature request related to a problem? Please describe.**
A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]

**Describe the solution you'd like**
A clear and concise description of what you want to happen.

**Describe alternatives you've considered**
A clear and concise description of any alternative solutions or features you've considered.

**Additional context**
Add any other context or screenshots about the feature request here.


================================================
FILE: .github/workflows/R-CMD-check.yaml
================================================
# Workflow derived from https://github.com/r-lib/actions/tree/v2/examples
# Need help debugging build failures? Start at https://github.com/r-lib/actions#where-to-find-help
on:
  push:
    branches:
      - master
  pull_request:
    branches:
      - master

name: R-CMD-check

jobs:
  R-CMD-check:
    runs-on: ${{ matrix.config.os }}

    name: ${{ matrix.config.os }} (${{ matrix.config.r }})

    strategy:
      fail-fast: false
      matrix:
        config:
          - {os: macos-latest,   r: 'release'}
          - {os: windows-latest, r: 'release'}
          - {os: ubuntu-latest,   r: 'devel', http-user-agent: 'release'}
          - {os: ubuntu-latest,   r: 'release'}
          - {os: ubuntu-latest,   r: 'oldrel-1'}

    env:
      GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }}
      R_KEEP_PKG_SOURCE: yes

    steps:
      - uses: actions/checkout@v4

      - uses: r-lib/actions/setup-pandoc@v2

      - uses: r-lib/actions/setup-r@v2
        with:
          r-version: ${{ matrix.config.r }}
          http-user-agent: ${{ matrix.config.http-user-agent }}
          use-public-rspm: true

      - uses: r-lib/actions/setup-r-dependencies@v2
        with:
          extra-packages: any::rcmdcheck
          needs: check

      - uses: r-lib/actions/check-r-package@v2
        with:
          upload-snapshots: true
          build_args: 'c("--no-manual","--compact-vignettes=gs+qpdf")'


================================================
FILE: .github/workflows/branch_naming_policy.yaml
================================================
name: Branch Naming Policy Action

on:
  create:
  delete:
  pull_request:
    branches:
      - master

jobs:
  branch-naming-policy:
    runs-on: ubuntu-latest

    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Run Branch Naming Policy Action
        uses: nicklegan/github-repo-branch-naming-policy-action@v1.1.1
        if: github.ref_type == 'branch' || github.ref_type == 'pull_request'
        with:
          token: ${{ secrets.REPO_TOKEN }}
          regex: '^(feature|fix|docs|refactor|test|release|chore|experiment)\/[a-zA-Z0-9-]+$'
          delete: true


================================================
FILE: .github/workflows/pkgdown.yaml
================================================
# Workflow derived from https://github.com/r-lib/actions/tree/v2/examples
# Need help debugging build failures? Start at https://github.com/r-lib/actions#where-to-find-help
on:
  push:
    branches: [main, master]
  pull_request:
    branches: [main, master]
  release:
    types: [published]
  workflow_dispatch:

name: pkgdown

jobs:
  pkgdown:
    runs-on: ubuntu-latest
    # Only restrict concurrency for non-PR jobs
    concurrency:
      group: pkgdown-${{ github.event_name != 'pull_request' || github.run_id }}
    env:
      GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }}
    permissions:
      contents: write
    steps:
      - uses: actions/checkout@v4

      - uses: r-lib/actions/setup-pandoc@v2

      - uses: r-lib/actions/setup-r@v2
        with:
          use-public-rspm: true

      - uses: r-lib/actions/setup-r-dependencies@v2
        with:
          extra-packages: any::pkgdown, local::.
          needs: website

      - name: Build site
        run: pkgdown::build_site_github_pages(new_process = FALSE, install = FALSE)
        shell: Rscript {0}

      - name: Deploy to GitHub pages 🚀
        if: github.event_name != 'pull_request'
        uses: JamesIves/github-pages-deploy-action@v4.5.0
        with:
          clean: false
          branch: gh-pages
          folder: docs


================================================
FILE: .github/workflows/test-coverage.yaml
================================================
# Workflow derived from https://github.com/r-lib/actions/tree/v2/examples
# Need help debugging build failures? Start at https://github.com/r-lib/actions#where-to-find-help
on:
  push:
    branches: [main, master]
  pull_request:
    branches: [main, master]

name: test-coverage

jobs:
  test-coverage:
    runs-on: ubuntu-latest
    env:
      GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }}

    steps:
      - uses: actions/checkout@v4

      - uses: r-lib/actions/setup-r@v2
        with:
          use-public-rspm: true

      - uses: r-lib/actions/setup-r-dependencies@v2
        with:
          extra-packages: any::covr
          needs: coverage

      - name: Test coverage
        run: |
          covr::codecov(
            quiet = FALSE,
            clean = FALSE,
            install_path = file.path(normalizePath(Sys.getenv("RUNNER_TEMP"), winslash = "/"), "package")
          )
        shell: Rscript {0}

      - name: Show testthat output
        if: always()
        run: |
          ## --------------------------------------------------------------------
          find '${{ runner.temp }}/package' -name 'testthat.Rout*' -exec cat '{}' \; || true
        shell: bash

      - name: Upload test results
        if: failure()
        uses: actions/upload-artifact@v4
        with:
          name: coverage-test-failures
          path: ${{ runner.temp }}/package


================================================
FILE: .gitignore
================================================
Meta
doc
inst/doc
misc
data-raw
*.pptx

.Rprofile

*.DS_Store
*.Rproj
*.RData
*.Ruserdata
*.Rproj.user
*.Rhistory
.Rproj.user

docs


================================================
FILE: CODE_OF_CONDUCT.md
================================================
# Contributor Covenant Code of Conduct

## Our Pledge

In the interest of fostering an open and welcoming environment, we as
contributors and maintainers pledge to making participation in our project and
our community a harassment-free experience for everyone, regardless of age, body
size, disability, ethnicity, sex characteristics, gender identity and expression,
level of experience, education, socio-economic status, nationality, personal
appearance, race, religion, or sexual identity and orientation.

## Our Standards

Examples of behavior that contributes to creating a positive environment
include:

* Using welcoming and inclusive language
* Being respectful of differing viewpoints and experiences
* Gracefully accepting constructive criticism
* Focusing on what is best for the community
* Showing empathy towards other community members

Examples of unacceptable behavior by participants include:

* The use of sexualized language or imagery and unwelcome sexual attention or
 advances
* Trolling, insulting/derogatory comments, and personal or political attacks
* Public or private harassment
* Publishing others' private information, such as a physical or electronic
 address, without explicit permission
* Other conduct which could reasonably be considered inappropriate in a
 professional setting

## Our Responsibilities

Project maintainers are responsible for clarifying the standards of acceptable
behavior and are expected to take appropriate and fair corrective action in
response to any instances of unacceptable behavior.

Project maintainers have the right and responsibility to remove, edit, or
reject comments, commits, code, wiki edits, issues, and other contributions
that are not aligned to this Code of Conduct, or to ban temporarily or
permanently any contributor for other behaviors that they deem inappropriate,
threatening, offensive, or harmful.

## Scope

This Code of Conduct applies both within project spaces and in public spaces
when an individual is representing the project or its community. Examples of
representing a project or community include using an official project e-mail
address, posting via an official social media account, or acting as an appointed
representative at an online or offline event. Representation of a project may be
further defined and clarified by project maintainers.

## Enforcement

Instances of abusive, harassing, or otherwise unacceptable behavior may be
reported by contacting the project team at egeulgen@gmail.com. All
complaints will be reviewed and investigated and will result in a response that
is deemed necessary and appropriate to the circumstances. The project team is
obligated to maintain confidentiality with regard to the reporter of an incident.
Further details of specific enforcement policies may be posted separately.

Project maintainers who do not follow or enforce the Code of Conduct in good
faith may face temporary or permanent repercussions as determined by other
members of the project's leadership.

## Attribution

This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4,
available at https://www.contributor-covenant.org/version/1/4/code-of-conduct.html

[homepage]: https://www.contributor-covenant.org

For answers to common questions about this code of conduct, see
https://www.contributor-covenant.org/faq


================================================
FILE: CONTRIBUTING.md
================================================
# Contributing to pathfindR development

The goal of this guide is to help you in contributing to pathfindR. The guide is divided into two main pieces:

1. Filing a bug report or feature request in an issue.
1. Suggesting a change via a pull request.

Please note that pathfindR is released with a [Contributor Code of Conduct](CODE_OF_CONDUCT.md). By contributing to this project, 
you agree to abide by its terms.

## Issues

When filing an issue, the most important thing is to include a minimal 
reproducible example so that we can quickly verify the problem, and then figure 
out how to fix it. There are three things you need to include to make your 
example reproducible: required packages, data, code.

1.  **Packages** should be loaded at the top of the script, so it's easy to
    see which ones the example needs.
  
1.  The easiest way to include **data** is to use `dput()` to generate the R code 
    to recreate it. For example, to recreate the `mtcars` dataset in R,
    I'd perform the following steps:
  
       1. Run `dput(mtcars)` in R
       2. Copy the output
       3. In my reproducible script, type `mtcars <- ` then paste.
       
    But even better is if you can create a `data.frame()` with just a handful
    of rows and columns that still illustrates the problem.
    
    For more complex **data**, you can use `saveRDS()` to save the object and attach it with the issue.
  
1.  Spend a little bit of time ensuring that your **code** is easy for others to
    read:
  
    * make sure you've used spaces and your variable names are concise, but
      informative
  
    * use comments to indicate where your problem lies
  
    * do your best to remove everything that is not related to the problem.  
     The shorter your code is, the easier it is to understand.

You can check you have actually made a reproducible example by starting up a 
fresh R session and pasting your script in.

## Pull requests

To contribute a change to pathfindR, you follow these steps:

1. Create a branch in git and make your changes.
1. Push branch to github and issue pull request (PR).
1. Discuss the pull request.
1. Iterate until either we accept the PR or decide that it's not
   a good fit for pathfindR.

If you're not familiar with git or github, please start by reading <http://r-pkgs.had.co.nz/git.html>

## Branch Naming Conventions

We want to follow the branch following naming convention during development:

### Feature Development:
- Use the prefix `feature/` followed by a brief description of the feature.
- Example: `feature/add-new-method`, `feature/update-active-snw-search`

### Bug Fixes:
- Use the prefix `fix/` followed by a description of the fix or the issue number.
- Example: `fix/correct-typo`, `fix/#123`

### Documentation:
- Use the prefix `docs/` for updates exclusively in the documentation.
- Example: `docs/update-readme`, `docs/add-examples`

### Refactoring:
- Use `refactor/` when modifying the structure and organization of code without changing its external behavior.
- Example: `refactor/reorganize-tests`, `refactor/optimization-code`

### Testing:
- Use `test/` for changes related to testing only.
- Example: `test/add-unit-tests`, `test/expand-tests`

### Releases (for maintainers only):
- Use `release/` for preparing a new version release.
- Example: `release/v1.0.0`, `release/v2.0.0`

### Chore/Maintenance (mostly for maintainers):
- Use `chore/` for mundane tasks like updating dependencies or minor tasks that don't modify the source code.
- Example: `chore/update-packages`, `chore/license-update`

### Experimental:
- Use `experiment/` for experimental work that might not be merged into the `master`
- Example: `experiment/new-algorithm`, `exp/test-new-library`

# Attribution
This Contributing guide was adapted from [ggplot2](https://github.com/tidyverse/ggplot2)


================================================
FILE: DESCRIPTION
================================================
Package: pathfindR
Type: Package
Title: Enrichment Analysis Utilizing Active Subnetworks
Version: 2.7.0.9000
Authors@R: c(person("Ege", "Ulgen",
                    role = c("cre", "cph"), 
                    email = "egeulgen@gmail.com",
                    comment = c(ORCID = "0000-0003-2090-3621")),
             person("Ozan", "Ozisik",
                    role = "aut",
                    email = "ozanytu@gmail.com",
                    comment = c(ORCID = "0000-0001-5980-8002")))
Maintainer: Ege Ulgen <egeulgen@gmail.com>
Description: Enrichment analysis enables researchers to uncover mechanisms 
    underlying a phenotype. However, conventional methods for enrichment 
    analysis do not take into account protein-protein interaction information, 
    resulting in incomplete conclusions. 'pathfindR' is a tool for enrichment 
    analysis utilizing active subnetworks. The main function identifies active 
    subnetworks in a protein-protein interaction network using a user-provided 
    list of genes and associated p values. It then performs enrichment analyses 
    on the identified subnetworks, identifying enriched terms (i.e. pathways or, 
    more broadly, gene sets) that possibly underlie the phenotype of interest.
    'pathfindR' also offers functionalities to cluster the enriched terms and 
    identify representative terms in each cluster, to score the enriched terms 
    per sample and to visualize analysis results. The enrichment, clustering and 
    other methods implemented in 'pathfindR' are described in detail in 
    Ulgen E, Ozisik O, Sezerman OU. 2019. 'pathfindR': An R Package for 
    Comprehensive Identification of Enriched Pathways in Omics Data Through 
    Active Subnetworks. Front. Genet. <doi:10.3389/fgene.2019.00858>.
License: MIT + file LICENSE
URL: https://egeulgen.github.io/pathfindR/, https://github.com/egeulgen/pathfindR
BugReports: https://github.com/egeulgen/pathfindR/issues
Encoding: UTF-8
LazyData: true
SystemRequirements: Java (>= 8.0)
biocViews:
Imports: 
    DBI,
    AnnotationDbi,
    doParallel,
    foreach,
    rmarkdown,
    ggplot2,
    ggraph,
    ggupset,
    fpc,
    ggkegg (>= 1.4.0),
    grDevices,
    httr,
    igraph,
    R.utils,
    msigdbr (>= 24.1.0),
    knitr
Depends: R (>= 4.3.0),
    pathfindR.data (>= 2.0)
Suggests: 
    org.Hs.eg.db,
    testthat (>= 2.3.2),
    covr,
    mockery
RoxygenNote: 7.3.3
VignetteBuilder: knitr


================================================
FILE: LICENSE
================================================
YEAR: 2020
COPYRIGHT HOLDER: Ege Ulgen


================================================
FILE: LICENSE.md
================================================
# MIT License

Copyright (c) 2020 Ege Ulgen

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.


================================================
FILE: NAMESPACE
================================================
# Generated by roxygen2: do not edit by hand

export(UpSet_plot)
export(active_snw_search)
export(annotate_term_genes)
export(cluster_enriched_terms)
export(cluster_graph_vis)
export(combine_pathfindR_results)
export(combined_results_graph)
export(create_kappa_matrix)
export(enrichment)
export(enrichment_analyses)
export(enrichment_chart)
export(fetch_gene_set)
export(filterActiveSnws)
export(fuzzy_term_clustering)
export(get_gene_sets_list)
export(get_pin_file)
export(hierarchical_term_clustering)
export(hyperg_test)
export(input_processing)
export(input_testing)
export(plot_scores)
export(return_pin_path)
export(run_pathfindR)
export(score_terms)
export(summarize_enrichment_results)
export(term_gene_graph)
export(term_gene_heatmap)
export(visualize_KEGG_diagram)
export(visualize_active_subnetworks)
export(visualize_term_interactions)
export(visualize_terms)
import(doParallel)
import(foreach)
import(ggplot2)
import(ggraph)
import(graphics)
import(knitr)
import(parallel)
import(pathfindR.data)
import(rmarkdown)
importFrom(ggkegg,pathway)
importFrom(httr,GET)
importFrom(httr,content)
importFrom(httr,http_error)
importFrom(httr,status_code)
importFrom(httr,timeout)


================================================
FILE: NEWS.md
================================================
# pathfindR (development version)

# pathfindR 2.7.0
## Minor Changes and Bug Fixes
- Moved org.Hs.eg.db from "Imports" to "Suggests" per new CRAN policy. Relevant functions revert to default behaviour if the required package is not installed.

# pathfindR 2.6.0

## Minor Changes and Bug Fixes
- fixed missing argument issue in `get_gene_sets_list`(#230)
- refactored to introduce `safe_get_content` so that URL access issues are handled more gracefully

# pathfindR 2.5.1

## Minor Changes and Bug Fixes
- fixed NA values in kappa matrix generation that will cause error as part of the latest `igraph` update (#227)

# pathfindR 2.5.0

## Major Changes
- updated dependencies so that `pathfindR` depends on `msigdbr (>= 24.1.0)`
- added the new `db_species` argument to the `get_mgsigdb_gsets()` data generation function

## Minor Changes and Bug Fixes
- fixed test assertions that will break as part of the latest `ggplot2` update (#223)

# pathfindR 2.4.2

## Minor Changes and Bug Fixes

- fixed a bug in `visualize_KEGG_diagram()` where `ggkegg` was raising an error (#214)


# pathfindR 2.4.1

## Minor Changes and Bug Fixes

- fixed a bug regarding KEGG gene set fetching: removed the conversion functionality in `get_kegg_gsets()` which now returns KEGG IDs so that the user can convert the returned identifiers using a more appropriate tool (e.g. BioMart) should they wish

# pathfindR 2.4.0

## Major Changes

- implemented a new `color_kegg_pathway()` function using `ggkegg` to create colored KEGG pathway ggplot objects (instead of using `KEGGREST` to obtain the colored PNG files, which no longer works #169)
- renamed the `visualize_hsa_KEGG` function to `visualize_KEGG_diagram()` to reflect this is now able to handle KEGG pathway enrichment results from any organism
- updated the `visualize_terms()`, `visualize_term_interactions()` and `visualize_KEGG_diagram()` functions so that they now return a list of ggplot objects (named by term ID)
- updated the `get_kegg_gsets()` function to also use `ggkegg` for fetching genes per pathway data
- removed unneeded dependencies: `magick`, `KEGGgraph` and `KEGGREST`

## Minor Changes and Bug Fixes

- updated the `get_biogrid_pin()` function so that it can now determine the latest version and download/process it from BioGRID (via setting `release = "latest"`, which is now the default behavior) 

# pathfindR 2.3.1

## Minor Changes and Bug Fixes
- fixed a bug in the `UpSet_plot()` plot function regarding the interaction with `ggupset` package that was discovered in a reverse dependency check for `ggplot2 3.5.0` (#189)
- fixed gene symbol case mismatch issue in `score_terms()` (#186)
- applied enhancement suggestion from #184 to enable scale fill manual for `term_gene_graph()`

# pathfindR 2.3.0
## Major Changes
- reverted removal of `create_HTML_report()` so `run_pathfindR()` once again generates HTML reports

# pathfindR 2.2.0
## Minor Changes and Bug Fixes
- added the `disable_parallel` argument in `active_snw_enrichment_wrapper()` to be able to disable parallel runs via `foreach` 
- fixed the issue encountered on CentOS where `forech` wasn't loading `pathfindR` (#164)
- fixed a CRAN error due to a package documentation issue (#172)
- performed some refactoring and updated/improved all tests

# pathfindR 2.1.0
## Minor Changes and Bug Fixes
- removed `create_HTML_report()` so `run_pathfindR()` no longer generates a HTML report

# pathfindR 2.0.1

## Minor Changes and Bug Fixes
- added the `dir_for_report` argument in the internal function `create_HTML_report()` to fix test issues on CRAN

# pathfindR 2.0.0

## Major Changes
- updated the java active subnetwork search component and added the `seedForRandom` argument in `active_snw_search()`to ensure reproducibility. By default behavior, in `run_pathfindR()`, a seed is set for each iteration to produce reproducible results (#108)
- as the example input/output data were renamed for convenience in 'pathfindR.data' v2.0, 'pathfindR' now depends on pathfindR.data (>= 2.0) 
- refactored/simplified `run_pathfindR()`
- visualization enriched term diagrams are now NOT part of `run_pathfindR()`
- default behavior of `run_pathfindR()` is now to run in a temporary directory. The user can still set `output_dir` to run in a specified directory and also produce HTML reports
- in `hierarchical_term_clustering()`, update the sequence of number of clusters for which silhouette width is calculated for choosing the optimal number of clusters. This should speed up the function for cases with a large number of enriched terms
- updated the relevant vignettes to reflect the implemented changes

## Minor Changes and Bug Fixes
- fixed a minor issue in `return_pin_path()` where the PIN was not properly read (#157)

# pathfindR 1.6.4

## Minor Changes and Bug Fixes
- updated the alias selection function within `input_processing()` so that an alias that is not already present is selected
- updated the min-max scaling (controlled by `scale_vals`) in `color_kegg_pathway()`, the default is now `scale_vals=TRUE`
- updated the `term_gene_heatmap()` function so that legend title is shown and can be customized
- updated the `term_gene_heatmap()` function so that coloring is proper when no change values are provided in `genes_df`
- added the `sort_terms_by_p` argument to the `term_gene_heatmap()` function to enable sorting of terms by 'lowest_p'
- in visualization functions, made coloring of up-/down-regulated genes consistent (#126)
- added the `vertex.label.cex` and `vertex.size.scaling` arguments to `cluster_graph_vis()`
- added the `show_legend` argument to `visualize_term_interactions()` to toggle the legend


# pathfindR 1.6.3

## Minor Changes and Bug Fixes
- Fixed coloring issue in `color_kegg_pathway()`
- In `color_kegg_pathway()` the default value for `normalize_vals` is now `FALSE`


# pathfindR 1.6.2

## Major Changes
- fixed an issue in `get_kegg_gsets()` where empty result was returned for some organisms due to an error in parsing (#72)

## Minor Changes and Bug Fixes
- added `repel = TRUE` in `term_gene_graph()` and `combined_results_graph()` for better visualization of labels
- fixed minor issue in `enrichment_chart()` (#75)
- fixed minor issue in `visualize_term_interactions()`
- fixed issue in `get_biogrid_pin()` where the download method was set to `wget` (now set to `auto`, per #83)
- updated to using tab3 format for `get_biogrid_pin()` (if tab3 is available for the chosen release, otherwise tab2 format is used)
- updated the default version of PIN obtained by `get_biogrid_pin()` to '4.4.200'
- in `get_kegg_gsets()`, improved parsing of KEGG term descriptions so that no description is duplicated (#87)
- in `score_terms()`, if using descriptions, the ID is now appended for (any) duplicated term descriptions (#87)
- in `obtain_colored_url()`, swapped `bg_color` with `fg_color` due to an issue with `KEGGREST`
- added legend to `term_gene_heatmap()` (#95)
- in `get_biogrid_pin()`, the "download.file.method" from global options is used
- `combined_results_graph()` raises an error if there are no common terms in the combined data frame

# pathfindR 1.6.1

## Major Changes

- In `run_pathfindR()`, the default `iterations` was set back to 10 (the default for all other v1.x)

# pathfindR 1.6.0

## Major Changes
- In `run_pathfindR()`, as "GR" (the default active subnetwork search method) provides nearly identical results in each iteration, the default `iterations` is set to 1
- added the column 'support' (the proportion of active subnetworks leading to enrichment over all subnetworks) in the output
- updated the download URL in `get_biogrid_pin()` as BioGRID updated the URL for download

## Minor Changes and Bug Fixes
- changed old argument in the "Step-by-Step Execution of the pathfindR Enrichment Workflow" vignette
- fixed an issue in `visualize_term_interactions()` where the file name was too long, it was causing an error on Windows. Limited to 100 characters (#58)

# pathfindR 1.5.1

## Minor Changes and Bug Fixes
- Fixed issue in `check_java_version()` where java version 14 could not be parsed (#49)
- Fixed issue in `combined_results_graph()` where gene nodes were not colored correctly (#55)

# pathfindR 1.5.0

## Major Changes
- created separate package `pathfindR.data` for storing pathfindR data
- added the function `visualize_active_subnetworks()` for visualizing graphs of active subnetworks
- add the new vignette "Comparing Two pathfindR Results" that briefly describes how different pathfindR results can be compared
- added the functions `combine_pathfindR_results()` and `combined_results_graph()` for comparison of 2 pathfindR results and term-gene graph of the combined results, respectively
- added the function `get_pin_file()` for obtaining organism-specific PIN data (only from BioGRID for now)
- added the function `get_gene_sets_list()` for obtaining organism-specific gene sets list from KEGG, Reactome and MSigDB
- added the function `term_gene_heatmap()` to create heatmap visualizations of enriched terms and the involved input genes. Rows are enriched terms and columns are involved input genes. If `genes_df` is provided, colors of the tiles indicate the change values
- added the function `UpSet_plot()` to create UpSet plots of enriched terms
- added the human cell markers gene sets data `cell_markers_gsets` and `cell_markers_descriptions`

## Minor Changes and Bug Fixes
- fixed an issue regarding `parallel::makeCluster()` in `run_pathfindR()` (#45)
- fixed save-related issue in `download_kegg_png()` (#37, @rix133)
- added the output data `RA_comparison_output` of pathfindR results on another RA-related dataset (GSE84074)
- in `visualize_hsa_KEGG()`, fixed the issue where >1 entrez ids were returned for a gene symbol (the first one is kept)
- in `visualize_hsa_KEGG()`, implemented a tryCatch to avoid any issues when `KEGGREST::color.pathway.by.objects()` might fail (#28)
- in `visualize_hsa_KEGG()`, now limiting the number of genes passes onto `KEGGREST::color.pathway.by.objects()` to < 60 (because the KEGG API now limits the number?)
- changed default visualization in `term_gene_heatmap()` (i.e. when `genes_df` is not provided) to binary colored heatmap (by default, "green" and "red", controlled by `low` and `high`) by up-/down- regulation status
- update the vignette "pathfindR Analysis for non-Homo-sapiens organisms" to reflect new data generation functions `get_pin_file()` and `get_gene_sets_list()` and fixed a minor issue in the vignette (#46)

# pathfindR 1.4.2

## Minor Changes and Bug Fixes
- Fixed corner case in `create_kappa_matrix()` when `chance` is 1, the metric is turned into 0
- Fixed misused `class(.) == *` in `cluster_graph_vis()`

# pathfindR 1.4.1

## Major Changes
- Fixed error in DESCRIPTION: the Java version in SystemRequirements was corrected to "Java (>= 8.0)"
- The Java version is now checked

## Minor Changes and Bug Fixes
- Fixed behavior: when no input genes are present in the enriched hsa KEGG pathway, visualization of the pathway is now skipped
- Added the argument `max_to_plot` to `visualize_hsa_KEGG()` and to `run_pathfindR()`. This argument controls the number of pathways to be visualized (default is NULL, i.e. no filter). This was implemented not to slow down the runtime of `run_pathfindR()` as downloading the png files is slow.
- Fixed links to visualizations in `enriched_ters.Rmd`

# pathfindR 1.4.0

## Major Changes
- Replaced most occurrences of "pathway" to "term". This was adapted because "term" reflects the utility of the package better. The enrichment and clustering approaches work with any kind of gene set data (be it pathway gene sets, gene ontology gene sets, motif gene sets etc.) Accordingly:
  - `DESCRIPTION` was updated
  - The functions `annotate_pathway_DEGs()`, `calculate_pw_scores()`, `cluster_pathways()`, `fuzzy_pw_clustering()`, `hierarchical_pw_clustering()`, `visualize_pw_interactions()` and `visualize_pws()` were renamed to 
  `annotate_term_DEGs()`, `score_terms()`, `cluster_enriched_terms()`, `fuzzy_term_clustering()`, `hierarchical_term_clustering()`, `visualize_term_interactions()` and `visualize_terms()` respectively
  - The Rmd template file for the report `enriched_pathways.Rmd` was renamed to `enriched_terms.Rmd`
  - All the Rmd template files for the report were updated
  - Documentation of each function was updated accordingly
- Added the visualization function `term_gene_graph()`, which creates a graph of enriched terms - involved genes
- Made changes in `enrichment()` and `enrichment_analyses()` to get enrichment results faster
- Added the function `fetch_gene_set()` for obtaining gene set data more easily
- Terms in gene sets can now be filtered according to the number of genes a term contains (controlled by `min_gset_size`, `max_gset_size` in `fetch_gene_set()` and `run_pathfindR()`) 
- Added the argument `gaCrossover` during active subnetwork search which controls the probability of a crossover in GA (default = 1, i.e. always perform crossover)
- Added unit tests using `testthat`
- Updated all gene sets data
- Updated all RA example data
- The vignettes were updated
- Updated all PIN data
- Improved speed of kappa matrix calculation (`create_kappa_matrix()`)
- Added vignette for non-Homo-sapiens organisms
- Added Mus musculus (mmu) data:
  - `mmu_kegg_genes` & `mmu_kegg_descriptions`: mmu KEGG gene sets data
  - mmu STRING PIN
  - `myeloma_input` & `myeloma_output`: example mmu input and output data
- Added the STRING PIN (combined score >= 400)
- The argument `sig_gene_thr` in subnetwork filtering via `filterActiveSnws()` now serves the threshold proportion of significant genes in the active subnetwork. e.g., if there are 100 significant genes and `sig_gene_thr = 0.03`, subnetwork that contain at least 3 (100 x 0.03) significant genes will be accepted for further analysis
- Removed `pathview` dependency by implementing colored pathway diagram visualization function using `KEGGREST` and `KEGGgraph`

## Minor Changes and Bug Fixes
- In `hierarchical_term_clustering()`, redefined the distance measure as `1 - kappa statistic`
- Fixed minor issue in `cluster_graph_vis()` (during the calculations for additional node colors)
- Removed title from graph visualization of hierarchical clustering in `cluster_graph_vis()`
- In `active_snw_search()`, unnecessary warnings during active subnetwork search were removed
- Fixed minor issue in `enrichment_chart()`, supplying fuzzy clustered results no longer raises an error
- Added new checks in `input_testing()` and `input_processing()` to ensure that both the initial input data frame and the processed input data frame for active subnetwork search contain at least 2 genes (to fix the corner case encountered in issue #17)
- Fixed minor issue in `enrichment_chart()`, ensuring that bubble sizes displayed in the legend (proportional to # of DEGs) are integers
- In `enrichment_chart()`, added the arguments `num_bubbles` (default is 4) to control number of bubbles displayed in the legend and `even_breaks` (default is `TRUE`) to indicate if even increments of breaks are required
- Updated the logo
- Minor fix in `term_gene_graph()` (create the igraph object as an undirected graph for better auto layout)
- Minor fix in `visualize_term_interactions()`. The legend no longer displays "Non-input Active Snw. Genes" if they were not provided
- The argument `human_genes` in `run_pathfindR()` and `input_processing()` was renamed as `convert2alias`
- The gene symbols in the input data frame, the PIN and the gene sets are now turned into uppercase (for obtaining the best overlap)
- Added the argument `top_terms` to `enrichment_chart()`, controlling the number top enriched terms to plot (default is 10)
- Other minor bug/error fixes

# pathfindR 1.3.0

## Major Changes
- Separated the steps of the function `run_pathfindR` into individual functions: `active_snw_search`, `enrichment_analyses`, `summarize_enrichment_results`, `annotate_pathway_DEGs`, `visualize_pws`.
- renamed the function `pathmap` as `visualize_hsa_KEGG`, updated the function to produce different visualizations for inputs with binary change values (ordered) and no change values (the `input_processing` function, assigns a change value of 100 to all).
- Created new the visualization function `visualize_pw_interactions`, which creates PNG files visualizing the interactions (in the selected PIN) of genes involved in the given pathways.
- Added new vignette, describing the step-by-step execution of the pathfindR workflow
- Changed clustering metric to kappa statistic, created the new clustering related functions `create_kappa_matrix`, `hierarchical_pw_clustering`, `fuzzy_pw_clustering` and `cluster_pathways`.
- Implemented the new function `cluster_graph_vis` for visualizing graph diagrams of clustering results.

## Minor Changes and Bug Fixes
- Fixed the bug where the arguments `score_quan_thr` and `sig_gene_thr` for `run_pathfindR` were not being utilized.
- in `run_pathfindR`, added message at the end of run, reporting the number enriched pathways.
- the function `run_pathfindR` now creates a variable `org_dir` that is the "path/to/original/working/directory". `org_dir` is used in multiple functions to return to the original working directory if anything fails. This changes the previous behavior where if a function stopped with an error the directory was changed to "..", i.e. the parent directory. This change was adapted so that the user is returned to the original working directory if they supply a recursive output folder (`output_dir`, e.g. "./ALL_RESULTS/RESULT_A"). 
- in `input_processing`, added the argument `human_genes` to only perform alias symbol conversion when human gene symbols are provided. - Updated the Rmd files used to create the report HTML files
- Added the data for `GO-All`, all annotations in the GO database (BP+MF+CC)
- Updated the vignette `pathfindR - An R Package for Pathway Enrichment Analysis Utilizing Active Subnetworks` to reflect the new functionalities.

# pathfindR 1.2.3
## Minor Changes and Bug Fixes
- in the function `plot_scores`, added the argument `label_cases` to indicate whether or not to label the cases in the pathway scoring heatmap plot. Also added the argument `case_control_titles` which allows the user to change the default "Case" and "Control" headers. Also added the arguments `low` and `high` used to change the low and high end colors of the scoring color gradient.
- in the function `plot_scores`, reversed the color gradient to match the coloring scheme used by pathview (i.e. red for positive values, green for negative values)
- minor change in `parseActiveSnwSearch`, replaced `score_thr` by `score_quan_thr`. This was done so that the scoring filter for active subnetworks could be performed based on the distribution of the current active subnetworks and not using a constant empirical score value threshold.
- minor change in `parseActiveSnwSearch`, increased `sig_gene_thr` from 2 to 10 as we observed in most of the cases, this resulted in faster runs with comparable results.
- in `choose_clusters`, added the argument `p_val_threshold` to be used as p value threshold for filtering the enriched pathways prior to clustering.

# pathfindR 1.2.2

## Major Changes
- fixed issue related to the package `pathview`.
## Minor Changes and Bug Fixes
- in the function `choose_clusters`, added option to use pathway names instead of pathway ids when visualizing the clustering dendrogram and heatmap.

# pathfindR 1.2.1

## Major Changes
- Added the option to specify a custom gene set when using `run_pathfindR`. For this, the `gene_sets` argument should be set to "Custom" and `custom_genes` and `custom_pathways` should be provided.

## Minor Changes and Bug Fixes
- fixed minor bug in `calculate_pw_scores` where if there was one DEG, subsetting the experiment matrix failed
- added if condition to check if there were DEGs in `calculate_pw_scores`. If there is none, the pathway is skipped.
- in `calculate_pw_scores`, if `cases` are provided, the pathways are reordered before plotting the heat map and returning the matrix according to their activity in `cases`. This way, "up" pathways are grouped together, same for "down" pathways.
- in `calculate_pwd`, if a pathway has perfect overlap with other pathways, change the correlation value with 1 instead of NA.
- in `choose_clusters`, if `result_df` has less than 3 pathways, do not perform clustering.
- `run_pathfindR` checks whether the output directory (`output_dir`) already exists and if it exists, now appends "(1)" to `output_dir` and displays a warning message. This was implemented to prevent writing over existing results.
- in run `run_pathfindR`, recursive creation for the output directory (`output_dir`) is now supported.
- in run `run_pathfindR`, if no pathways are found, the function returns an empty data frame instead of raising an error.

# pathfindR 1.2

## Major Changes
- Implemented the (per subject) pathway scoring function `calculate_pw_scores` and the function to plot the heatmap of pathway scores per subject `plot_scores`.

- Added the `auto` parameter to `choose_clusters`. When `auto == TRUE` (default), the function chooses the optimal number of clusters `k` automatically, as the value which maximizes the average silhouette width. It then returns a data frame with the cluster assignments and the representative/member statuses of each pathway.

- Added the `Fold_Enrichment` column to the resulting data frame of `enrichment`, and as a corollary to the resulting data frame of `run_pathfindR`.

- Added the option `bubble` to plot a bubble chart displaying the enrichment results in `run_pathfindR` using the helper function `enrichment_chart`. To plot the bubble chart set `bubble = TRUE` in `run_pathfindR` or use `enrichment_chart(your_result_df)`. 

## Minor Changes and Bug Fixes
- Add the parameter `silent_option` to `run_pathfindR`. When `silent_option == TRUE` (default), the console outputs during active subnetwork search are printed to a file named "console_out.txt". If `silent_option == FALSE`, the output is printed on the screen. Default was set to `TRUE` because multiple console outputs are simultaneously printed when running in parallel.

- Added the `list_active_snw_genes` parameter to `run_pathfindR`. When `list_active_snw_genes == TRUE`, the function adds the column `non_DEG_Active_Snw_Genes`, which reports the non-DEG active subnetwork genes for the active subnetwork which was enriched for the given pathway with the lowest p value.

- Added the data `RA_clustered`, which is the example output of the clustering workflow.

- In the function, `run_pathfindR` added the option to specify the argument `output_dir` which specifies the directory to be created under the current working directory for storing the result HTML files. `output_dir` is "pathfindR_Results" by default.

- `run_pathfindR` now checks whether the output directory (`output_dir`) already exists and if it exists, stops and displays an error message. This was implemented to prevent writing over existing results.

- `genes_table.html` now contains a second table displaying the input gene symbols for which there were no interactions in the PIN.

# pathfindR 1.1

## Major changes
- Added the `gene_sets` option in `run_pathfindR` to chose between different gene sets. Available gene sets are `KEGG`, `Reactome`, `BioCarta` and Gene Ontology gene sets (`GO-BP`, `GO-CC` and `GO-MF`)
- `cluster_pathways` automatically recognizes the ID type and chooses the gene sets accordingly

## Minor Changes and Bug Fixes
- Fixed issue regarding p values < 1e-13. No active subnetworks were found when there were p values < 1e-13. These are now changed to 1e-13 in the function `input_processing`
- In `input_processing`, genes for which no interactions are found in the PIN are now removed before active subnetwork search
- Duplicated gene symbols no longer raise an error. If there are duplicated symbols, the lowest p value is chosen for each gene symbol in the function `input_processing`
- To prevent the formation of nested folders, by default and on errors, the function `run_pathfindR` returns to the user's working directory.
- Citation information are now provided for our BioRxiv pre-print


================================================
FILE: R/active_snw_search.R
================================================
#' Perform Active Subnetwork Search
#'
#' @param input_for_search input the input data that active subnetwork search uses. The input
#' must be a data frame containing at least these 2 columns: \describe{
#'   \item{GENE}{Gene Symbol}
#'   \item{P_VALUE}{p value obtained through a test, e.g. differential expression/methylation}
#' }
#' @inheritParams return_pin_path
#' @param snws_file name for active subnetwork search output data
#' \strong{without file extension} (default = 'active_snws')
#' @param dir_for_parallel_run (previously created) directory for a parallel run iteration.
#' Used in the wrapper function (see ?run_pathfindR) (Default = NULL)
#' @inheritParams filterActiveSnws
#' @param search_method algorithm to use when performing active subnetwork
#'  search. Options are greedy search (GR), simulated annealing (SA) or genetic
#'  algorithm (GA) for the search (default = 'GR').
#' @param seedForRandom seed for reproducibility while running the java modules (applies for GR and SA)
#' @param silent_option boolean value indicating whether to print the messages
#' to the console (FALSE) or not (TRUE, this will print to a temp. file) during
#' active subnetwork search (default = TRUE). This option was added because
#' during parallel runs, the console messages get disorderly printed.
#' @param use_all_positives if TRUE: in GA, adds an individual with all positive
#'  nodes. In SA, initializes candidate solution with all positive nodes. (default = FALSE)
#' @param geneInitProbs For SA and GA, probability of adding a gene in initial solution (default = 0.1)
#' @param saTemp0 Initial temperature for SA (default = 1.0)
#' @param saTemp1 Final temperature for SA (default = 0.01)
#' @param saIter Iteration number for SA (default = 10000)
#' @param gaPop Population size for GA (default = 400)
#' @param gaIter Iteration number for GA (default = 200)
#' @param gaThread Number of threads to be used in GA (default = 5)
#' @param gaCrossover Applies crossover with the given probability in GA (default = 1, i.e. always perform crossover)
#' @param gaMut For GA, applies mutation with given mutation rate (default = 0, i.e. mutation off)
#' @param grMaxDepth Sets max depth in greedy search, 0 for no limit (default = 1)
#' @param grSearchDepth Search depth in greedy search (default = 1)
#' @param grOverlap Overlap threshold for results of greedy search (default = 0.5)
#' @param grSubNum Number of subnetworks to be presented in the results (default = 1000)
#'
#' @return A list of genes in every identified active subnetwork that has a score greater than
#' the `score_quan_thr`th quantile and that has at least `sig_gene_thr` affected genes.
#'
#' @export
#'
#' @examples
#' \donttest{
#' processed_df <- example_pathfindR_input[1:15, -2]
#' colnames(processed_df) <- c('GENE', 'P_VALUE')
#' GR_snws <- active_snw_search(
#'   input_for_search = processed_df,
#'   pin_name_path = 'KEGG',
#'   search_method = 'GR',
#'   score_quan_thr = 0.8
#' )
#' # clean-up
#' unlink('active_snw_search', recursive = TRUE)
#' }
active_snw_search <- function(input_for_search, pin_name_path = "Biogrid", snws_file = "active_snws",
    dir_for_parallel_run = NULL, score_quan_thr = 0.8, sig_gene_thr = 0.02, search_method = "GR",
    seedForRandom = 1234, silent_option = TRUE, use_all_positives = FALSE, geneInitProbs = 0.1,
    saTemp0 = 1, saTemp1 = 0.01, saIter = 10000, gaPop = 400, gaIter = 10000, gaThread = 5,
    gaCrossover = 1, gaMut = 0, grMaxDepth = 1, grSearchDepth = 1, grOverlap = 0.5,
    grSubNum = 1000) {
    ############ Argument checks input_for_search
    if (!is.data.frame(input_for_search)) {
        stop("`input_for_search` should be data frame")
    }
    cnames <- c("GENE", "P_VALUE")
    if (any(!cnames %in% colnames(input_for_search))) {
        stop("`input_for_search` should contain the columns ", paste(dQuote(cnames),
            collapse = ","))
    }

    # pin_name_path (fetch pin path)
    pin_path <- return_pin_path(pin_name_path)

    # snws_file
    if (!suppressWarnings(file.create(file.path(tempdir(check = TRUE), snws_file)))) {
        stop("`snws_file` may be containing forbidden characters. Please change and try again")
    }

    # search_method
    valid_mets <- c("GR", "SA", "GA")
    if (!search_method %in% valid_mets) {
        stop("`search_method` should be one of ", paste(dQuote(valid_mets), collapse = ", "))
    }

    # silent_option
    if (!is.logical(silent_option)) {
        stop("`silent_option` should be either TRUE or FALSE")
    }

    # use_all_positives
    if (!is.logical(use_all_positives)) {
        stop("`use_all_positives` should be either TRUE or FALSE")
    }

    ############ Initial Steps If dir_for_parallel_run is provided, change
    ############ working dir to dir_for_parallel_run
    if (!is.null(dir_for_parallel_run)) {
        org_dir <- getwd()
        on.exit(setwd(org_dir))
        setwd(dir_for_parallel_run)
    }

    ## turn silent_option into shell argument
    tmp_out <- file.path(tempdir(check = TRUE), paste0("console_out_", snws_file,
        ".txt"))
    silent_option <- ifelse(silent_option, paste0(" > ", tmp_out), "")

    ## turn use_all_positives into the java argument
    use_all_positives <- ifelse(use_all_positives, " -useAllPositives", "")

    ## absolute path for active snw search jar
    active_search_jar_path <- system.file("java/ActiveSubnetworkSearch.jar", package = "pathfindR")

    ## create directory for active subnetworks
    if (!dir.exists("active_snw_search")) {
        dir.create("active_snw_search")
    }

    if (!file.exists("active_snw_search/input_for_search.txt")) {
        input_for_search$GENE <- base::toupper(input_for_search$GENE)
        utils::write.table(input_for_search[, c("GENE", "P_VALUE")], "active_snw_search/input_for_search.txt",
            col.names = FALSE, row.names = FALSE, quote = FALSE, sep = "\t")
    }

    input_path <- normalizePath("active_snw_search/input_for_search.txt")

    ############ Run active Subnetwork Search running Active Subnetwork Search
    system(paste0("java -Xss4m -jar \"", active_search_jar_path, "\"", " -sif=\"",
        pin_path, "\"", " -sig=\"", input_path, "\"", " -method=", search_method,
        " -seedForRandom=", seedForRandom, use_all_positives, " -saTemp0=", saTemp0,
        " -saTemp1=", saTemp1, " -saIter=", format(saIter, scientific = FALSE), " -geneInitProb=",
        geneInitProbs, " -gaPop=", gaPop, " -gaIter=", gaIter, " -gaThread=", gaThread,
        " -gaCrossover=", gaCrossover, " -gaMut=", gaMut, " -grMaxDepth=", grMaxDepth,
        " -grSearchDepth=", grSearchDepth, " -grOverlap=", grOverlap, " -grSubNum=",
        grSubNum, silent_option))

    snws_file <- file.path("active_snw_search", paste0(snws_file, ".txt"))
    file.rename(from = "resultActiveSubnetworkSearch.txt", to = snws_file)

    ############ Parse and filter active subnetworks
    filtered_snws <- filterActiveSnws(active_snw_path = snws_file, sig_genes_vec = input_for_search$GENE,
        score_quan_thr = score_quan_thr, sig_gene_thr = sig_gene_thr)

    if (is.null(filtered_snws)) {
        snws <- list()
    } else {
        snws <- filtered_snws$subnetworks
    }
    message(paste0("Found ", length(snws), " active subnetworks\n\n"))

    return(snws)
}

#' Parse Active Subnetwork Search Output File and Filter the Subnetworks
#'
#' @param active_snw_path path to the output of an Active Subnetwork Search
#' @param sig_genes_vec vector of significant gene symbols. In the scope of this
#'   package, these are the input genes that were used for active subnetwork search
#' @param score_quan_thr active subnetwork score quantile threshold. Must be
#' between 0 and 1 or set to -1 for not filtering. (Default = 0.8)
#' @param sig_gene_thr threshold for the minimum proportion of significant genes in
#' the subnetwork (Default = 0.02) If the number of genes to use as threshold is
#' calculated to be < 2 (e.g. 50 signif. genes x 0.01 = 0.5), the threshold number
#' is set to 2
#'
#' @return A list containing \code{subnetworks}: a list of of genes in every
#' active subnetwork that has a score greater than the \code{score_quan_thr}th
#' quantile and that contains at least \code{sig_gene_thr} of significant genes
#' and \code{scores} the score of each filtered active subnetwork
#' @export
#'
#' @seealso See \code{\link{run_pathfindR}} for the wrapper function of the
#'   pathfindR enrichment workflow
#'
#' @examples
#' path2snw_list <- system.file(
#'   'extdata/resultActiveSubnetworkSearch.txt',
#'   package = 'pathfindR'
#' )
#' filtered <- filterActiveSnws(
#'   active_snw_path = path2snw_list,
#'   sig_genes_vec = example_pathfindR_input$Gene.symbol
#' )
filterActiveSnws <- function(active_snw_path, sig_genes_vec, score_quan_thr = 0.8,
    sig_gene_thr = 0.02) {
    ## Arg. checks
    active_snw_path <- suppressWarnings(normalizePath(active_snw_path))

    if (!file.exists(active_snw_path)) {
        stop("The active subnetwork file does not exist! Check the `active_snw_path` argument")
    }

    if (!is.atomic(sig_genes_vec)) {
        stop("`sig_genes_vec` should be a vector")
    }

    if (!is.numeric(score_quan_thr)) {
        stop("`score_quan_thr` should be numeric")
    }
    if (score_quan_thr != -1 & (score_quan_thr > 1 | score_quan_thr < 0)) {
        stop("`score_quan_thr` should be in [0, 1] or -1 (if not filtering)")
    }

    if (!is.numeric(sig_gene_thr)) {
        stop("`sig_gene_thr` should be numeric")
    }
    if (sig_gene_thr < 0 | sig_gene_thr > 1) {
        stop("`sig_gene_thr` should be in [0, 1]")
    }

    output <- readLines(active_snw_path)

    if (length(output) == 0) {
        return(NULL)
    }

    score_vec <- c()
    subnetworks <- list()
    for (i in base::seq_len(length(output))) {
        snw <- output[[i]]

        snw <- unlist(strsplit(snw, "\\s"))

        score_vec <- c(score_vec, as.numeric(snw[1]))
        subnetworks[[i]] <- snw[-1]
    }

    # keep subnetworks with score over the 'score_quan_thr'th quantile
    if (score_quan_thr == -1) {
        score_thr <- min(score_vec) - 1
    } else {
        score_thr <- stats::quantile(score_vec, score_quan_thr)
    }
    cond <- as.numeric(score_vec) > as.numeric(score_thr)
    subnetworks <- subnetworks[cond]
    score_vec <- as.numeric(score_vec)[cond]

    # select subnetworks containing at least 'sig_gene_thr' of significant
    # genes
    snw_sig_counts <- vapply(subnetworks, function(snw_genes) {
        sum(base::toupper(snw_genes) %in% base::toupper(sig_genes_vec))
    }, 1)
    sig_gene_num_thr <- sig_gene_thr * length(sig_genes_vec)
    sig_gene_num_thr <- max(2, sig_gene_num_thr)
    cond <- (snw_sig_counts >= sig_gene_num_thr)
    subnetworks <- subnetworks[cond]
    score_vec <- score_vec[cond]

    return(list(subnetworks = subnetworks, scores = score_vec))
}

#' Visualize Active Subnetworks
#'
#' @inheritParams filterActiveSnws
#' @inheritParams term_gene_heatmap
#' @inheritParams return_pin_path
#' @param num_snws number of top subnetworks to be visualized (leave blank if
#' you want to visualize all subnetworks)
#' @inheritParams term_gene_graph
#' @param ... additional arguments for \code{\link{input_processing}}
#'
#' @return a list of ggplot objects of graph visualizations of identified active
#' subnetworks. Green nodes are down-regulated genes, reds are up-regulated genes
#' and yellows are non-input genes
#' @export
#'
#' @examples
#' path2snw_list <- system.file(
#'   'extdata/resultActiveSubnetworkSearch.txt',
#'   package = 'pathfindR'
#' )
#' # visualize top 2 active subnetworks
#' g_list <- visualize_active_subnetworks(
#'   active_snw_path = path2snw_list,
#'   genes_df = example_pathfindR_input[1:10, ],
#'   pin_name_path = 'KEGG',
#'   num_snws = 2
#' )
visualize_active_subnetworks <- function(active_snw_path, genes_df, pin_name_path = "Biogrid",
    num_snws, layout = "stress", score_quan_thr = 0.8, sig_gene_thr = 0.02, ...) {
    # process input data frame
    processed_input <- input_processing(genes_df, pin_name_path = pin_name_path,
        ...)

    # parse and filter active subnetworks
    active_snw_list <- filterActiveSnws(active_snw_path = active_snw_path, sig_genes_vec = processed_input$GENE,
        score_quan_thr = score_quan_thr, sig_gene_thr = sig_gene_thr)
    if (is.null(active_snw_list) | length(active_snw_list$scores) == 0) {
        return(NULL)
    }

    score_vec <- active_snw_list$scores
    subnetworks <- active_snw_list$subnetworks

    if (missing(num_snws)) {
        num_snws <- length(subnetworks)
    }

    if (num_snws > length(subnetworks)) {
        num_snws <- length(subnetworks)
    }

    # load PIN data load PIN
    pin_path <- return_pin_path(pin_name_path)
    pin <- utils::read.delim(file = pin_path, header = FALSE)
    pin$V2 <- NULL

    pin[, 1] <- base::toupper(pin[, 1])
    pin[, 2] <- base::toupper(pin[, 2])

    # create graphs
    graphs_list <- list()
    for (idx in seq_len(num_snws)) {
        snw <- subnetworks[[idx]]

        num_input_genes <- sum(processed_input$GENE %in% snw)
        perc_input_genes <- round(num_input_genes/length(processed_input$GENE) *
            100, 2)

        snw_interactions <- pin[pin[, 1] %in% snw & pin[, 2] %in% snw, ]
        g <- igraph::graph_from_data_frame(snw_interactions, directed = FALSE)
        cond_up_gene <- names(igraph::V(g)) %in% processed_input$GENE[processed_input$CHANGE >
            0]
        cond_down_gene <- names(igraph::V(g)) %in% processed_input$GENE[processed_input$CHANGE <
            0]
        igraph::V(g)$type <- ifelse(cond_up_gene, "up", ifelse(cond_down_gene, "down",
            "non-input"))

        igraph::V(g)$label.cex <- 0.5
        igraph::V(g)$frame.color <- "gray"
        igraph::V(g)$color <- ifelse(igraph::V(g)$type == "non-input", "#FFD500",
            ifelse(igraph::V(g)$type == "up", "#D2222D", "#35CD35"))

        color_lookup <- c(`#35CD35` = "down-regulated gene", `#D2222D` = "up-regulated gene",
            `#FFD500` = "non-input gene")



        p <- ggraph::ggraph(g, layout = layout)
        p <- p + ggraph::geom_edge_link(alpha = 0.8, colour = "darkgrey")
        p <- p + ggraph::geom_node_point(ggplot2::aes(color = .data$color), size = 2)
        p <- p + ggplot2::theme_void()
        p <- p + ggraph::geom_node_text(ggplot2::aes(label = .data$name), nudge_y = 0.2)
        p <- p + ggplot2::scale_colour_manual(values = unique(igraph::V(g)$color),
            name = NULL, labels = color_lookup[unique(igraph::V(g)$color)])
        p <- p + ggplot2::labs(title = paste0("Active Subnetwork #", idx), subtitle = paste0("Score=",
            round(score_vec[idx], 2), ", ", num_input_genes, "(", perc_input_genes,
            "%) input genes"))
        p <- p + ggplot2::theme(plot.title = ggplot2::element_text(hjust = 0.5),
            plot.subtitle = ggplot2::element_text(hjust = 0.5), legend.position = "bottom")
        graphs_list[[idx]] <- p
    }

    return(graphs_list)
}


================================================
FILE: R/clustering.R
================================================
#' Create Kappa Statistics Matrix
#'
#' @param enrichment_res data frame of pathfindR enrichment results. Must-have
#' columns are 'Term_Description' (if \code{use_description = TRUE}) or 'ID'
#' (if \code{use_description = FALSE}), 'Down_regulated', and 'Up_regulated'.
#' If \code{use_active_snw_genes = TRUE}, 'non_Signif_Snw_Genes' must also be
#' provided.
#' @param use_description Boolean argument to indicate whether term descriptions
#'  (in the 'Term_Description' column) should be used. (default = \code{FALSE})
#' @param use_active_snw_genes boolean to indicate whether or not to use
#' non-input active subnetwork genes in the calculation of kappa statistics
#' (default = FALSE, i.e. only use affected genes)
#'
#' @return a matrix of kappa statistics between each term in the
#' enrichment results.
#'
#' @export
#'
#' @examples
#' sub_df <- example_pathfindR_output[1:3, ]
#' create_kappa_matrix(sub_df)
create_kappa_matrix <- function(enrichment_res, use_description = FALSE, use_active_snw_genes = FALSE) {
    ### Argument checks
    if (!is.logical(use_description)) {
        stop("`use_description` should be TRUE or FALSE")
    }

    if (!is.logical(use_active_snw_genes)) {
        stop("`use_active_snw_genes` should be TRUE or FALSE")
    }

    if (!is.data.frame(enrichment_res)) {
        stop("`enrichment_res` should be a data frame of enrichment results")
    }
    if (nrow(enrichment_res) < 2) {
        stop("`enrichment_res` should contain at least 2 rows")
    }

    nec_cols <- c("Down_regulated", "Up_regulated")
    if (use_description) {
        nec_cols <- c("Term_Description", nec_cols)
    } else {
        nec_cols <- c("ID", nec_cols)
    }
    if (use_active_snw_genes) {
        nec_cols <- c(nec_cols, "non_Signif_Snw_Genes")
    }

    if (!all(nec_cols %in% colnames(enrichment_res))) {
        stop("`enrichment_res` should contain all of ", paste(dQuote(nec_cols), collapse = ", "))
    }

    ### Initial steps Column to use for gene set names
    chosen_id <- ifelse(use_description, which(colnames(enrichment_res) == "Term_Description"),
        which(colnames(enrichment_res) == "ID"))

    # list of genes
    down_idx <- which(colnames(enrichment_res) == "Down_regulated")
    up_idx <- which(colnames(enrichment_res) == "Up_regulated")

    genes_lists <- apply(enrichment_res, 1, function(x) {
        base::toupper(c(unlist(strsplit(as.character(x[up_idx]), ", ")), unlist(strsplit(as.character(x[down_idx]),
            ", "))))
    })

    if (use_active_snw_genes) {
        active_idx <- which(colnames(enrichment_res) == "non_Signif_Snw_Genes")

        genes_lists <- mapply(function(x, y) {
            c(x, unlist(strsplit(as.character(y), ", ")))
        }, genes_lists, enrichment_res[, active_idx])
    }

    # Exclude zero-length gene sets
    excluded_idx <- which(vapply(genes_lists, length, 1) == 0)
    if (length(excluded_idx) != 0) {
        genes_lists <- genes_lists[-excluded_idx]
        enrichment_res <- enrichment_res[-excluded_idx, ]
    }

    ### Create Kappa Matrix
    all_genes <- unique(unlist(genes_lists, use.names = FALSE))
    N <- nrow(enrichment_res)
    term_names <- enrichment_res[, chosen_id]

    kappa_mat <- matrix(0, nrow = N, ncol = N, dimnames = list(term_names, term_names))
    diag(kappa_mat) <- 1

    total <- length(all_genes)
    for (i in 1:(N - 1)) {
        for (j in (i + 1):N) {
            genes_i <- genes_lists[[i]]
            genes_j <- genes_lists[[j]]

            both <- length(intersect(genes_i, genes_j))
            term_i <- length(base::setdiff(genes_i, genes_j))
            term_j <- length(base::setdiff(genes_j, genes_i))
            no_terms <- total - sum(both, term_i, term_j)

            observed <- (both + no_terms)/total
            chance <- (both + term_i) * (both + term_j)
            chance <- chance + (term_j + no_terms) * (term_i + no_terms)
            chance <- chance/total^2
            kappa_mat[j, i] <- kappa_mat[i, j] <- (observed - chance)/(1 - chance)
        }
    }
    kappa_mat[is.na(kappa_mat)] <- 0
    return(kappa_mat)
}


#' Hierarchical Clustering of Enriched Terms
#'
#' @param kappa_mat matrix of kappa statistics (output of \code{\link{create_kappa_matrix}})
#' @inheritParams create_kappa_matrix
#' @param num_clusters number of clusters to be formed (default = \code{NULL}).
#' If \code{NULL}, the optimal number of clusters is determined as the number
#' which yields the highest average silhouette width.
#' @param clu_method the agglomeration method to be used
#' (default = 'average', see \code{\link[stats]{hclust}})
#' @param plot_hmap boolean to indicate whether to plot the kappa statistics
#' clustering heatmap or not (default = FALSE)
#' @param plot_dend boolean to indicate whether to plot the clustering
#' dendrogram partitioned into the optimal number of clusters (default = TRUE)
#'
#' @details The function initially performs hierarchical clustering
#' of the enriched terms in \code{enrichment_res} using the kappa statistics
#' (defining the distance as \code{1 - kappa_statistic}). Next,
#' the clustering dendrogram is cut into k = 2, 3, ..., n - 1 clusters
#' (where n is the number of terms). The optimal number of clusters is
#' determined as the k value which yields the highest average silhouette width.
#' (if \code{num_clusters} not specified)
#'
#' @return a vector of clusters for each enriched term in the enrichment results.
#' @export
#'
#' @examples
#' \dontrun{
#' hierarchical_term_clustering(kappa_mat, enrichment_res)
#' hierarchical_term_clustering(kappa_mat, enrichment_res, method = 'complete')
#' }
hierarchical_term_clustering <- function(kappa_mat, enrichment_res, num_clusters = NULL,
    use_description = FALSE, clu_method = "average", plot_hmap = FALSE, plot_dend = TRUE) {
    ### Set ID/Name index
    chosen_id <- ifelse(use_description, which(colnames(enrichment_res) == "Term_Description"),
        which(colnames(enrichment_res) == "ID"))

    ### Argument checks
    if (!isSymmetric.matrix(kappa_mat)) {
        stop("`kappa_mat` should be a symmetric matrix")
    }

    if (!all(colnames(kappa_mat) %in% enrichment_res[, chosen_id])) {
        stop("All terms in `kappa_mat` should be present in `enrichment_res`")
    }

    if (!is.logical(plot_hmap)) {
        stop("`plot_hmap` should be TRUE or FALSE")
    }

    if (!is.logical(plot_dend)) {
        stop("`plot_dend` should be TRUE or FALSE")
    }

    ### Add excluded (zero-length) genes
    kappa_mat2 <- kappa_mat
    cond <- !enrichment_res[, chosen_id] %in% rownames(kappa_mat2)
    outliers <- enrichment_res[cond, chosen_id]
    outliers_mat <- matrix(-1, nrow = nrow(kappa_mat2), ncol = length(outliers),
        dimnames = list(rownames(kappa_mat2), outliers))
    kappa_mat2 <- cbind(kappa_mat2, outliers_mat)
    outliers_mat <- matrix(-1, nrow = length(outliers), ncol = ncol(kappa_mat2),
        dimnames = list(outliers, colnames(kappa_mat2)))
    kappa_mat2 <- rbind(kappa_mat2, outliers_mat)

    ### Perform hierarchical clustering
    clu <- stats::hclust(stats::as.dist(1 - kappa_mat2), method = clu_method)

    if (plot_hmap) {
        stats::heatmap(kappa_mat2, distfun = function(x) stats::as.dist(1 - x), hclustfun = function(x) stats::hclust(x,
            method = clu_method))
    }

    ### Choose optimal k (if not specified)
    if (is.null(num_clusters)) {
        kmax <- max(nrow(kappa_mat2)%/%2, 2)

        # sequence of k (number of clusters) to try
        if (kmax <= 20) {
            kseq <- 2:kmax
        } else if (kmax <= 100) {
            kseq <- c(2:19, seq(20, kmax%/%10 * 10, 10))
        } else {
            kseq <- c(2:19, seq(20, 99, 10), seq(100, kmax%/%50 * 50, 50))
        }

        # calculate average silhouette width per k in sequence
        avg_sils <- c()
        for (k in kseq) {
            avg_sils <- c(avg_sils, fpc::cluster.stats(stats::as.dist(1 - kappa_mat2),
                stats::cutree(clu, k = k), silhouette = TRUE)$avg.silwidth)
        }

        k_opt <- kseq[which.max(avg_sils)]

        message(paste("The maximum average silhouette width was", round(max(avg_sils),
            2), "for k =", k_opt, "\n\n"))
    } else {
        k_opt <- num_clusters
    }


    if (plot_dend) {
        graphics::plot(clu)
        stats::rect.hclust(clu, k = k_opt)
    }

    clusters <- stats::cutree(clu, k = k_opt)

    return(clusters)
}

#' Heuristic Fuzzy Multiple-linkage Partitioning of Enriched Terms
#'
#' @inheritParams hierarchical_term_clustering
#' @inheritParams create_kappa_matrix
#' @param kappa_threshold threshold for kappa statistics, defining strong
#' relation (default = 0.35)
#'
#' @details The fuzzy clustering algorithm was implemented based on:
#' Huang DW, Sherman BT, Tan Q, et al. The DAVID Gene Functional
#' Classification Tool: a novel biological module-centric algorithm to
#' functionally analyze large gene lists. Genome Biol. 2007;8(9):R183.
#'
#' @return a boolean matrix of cluster assignments. Each row corresponds to an
#' enriched term, each column corresponds to a cluster.
#' @export
#'
#' @examples
#' \dontrun{
#' fuzzy_term_clustering(kappa_mat, enrichment_res)
#' fuzzy_term_clustering(kappa_mat, enrichment_res, kappa_threshold = 0.45)
#' }
fuzzy_term_clustering <- function(kappa_mat, enrichment_res, kappa_threshold = 0.35,
    use_description = FALSE) {
    ### Set ID/Name index
    chosen_id <- ifelse(use_description, which(colnames(enrichment_res) == "Term_Description"),
        which(colnames(enrichment_res) == "ID"))

    ### Argument checks
    if (!isSymmetric.matrix(kappa_mat)) {
        stop("`kappa_mat` should be a symmetric matrix")
    }

    if (!all(colnames(kappa_mat) %in% enrichment_res[, chosen_id])) {
        stop("All terms in `kappa_mat` should be present in `enrichment_res`")
    }

    if (!is.numeric(kappa_threshold)) {
        stop("`kappa_threshold` should be numeric")
    }

    if (kappa_threshold > 1) {
        stop("`kappa_threshold` should be at most 1 as kappa statistic is always <= 1")
    }

    ### Find Qualified Seeds
    qualified_seeds <- list()
    j <- 1
    for (i in base::seq_len(nrow(kappa_mat))) {
        current_term <- rownames(kappa_mat)[i]
        current_term_kappa <- kappa_mat[i, ]


        init_membership_cond <- current_term_kappa >= kappa_threshold
        if (sum(init_membership_cond) > 3) {
            related_terms <- names(current_term_kappa)[init_membership_cond]
            terms <- c(current_term, related_terms)
            related_kappa <- kappa_mat[rownames(kappa_mat) %in% terms, colnames(kappa_mat) %in%
                terms]
            diag(related_kappa) <- 0
            tight_relationship_cond <- sum(related_kappa >= kappa_threshold)/(nrow(related_kappa)^2) >=
                0.5

            if (tight_relationship_cond) {
                qualified_seeds[[j]] <- related_terms
                names(qualified_seeds)[j] <- current_term
                j <- j + 1
            }
        }
    }

    ### Fuzzy Clustering
    clusters <- unique(qualified_seeds)
    i <- 1
    j <- i + 1
    while (i < length(clusters)) {
        common_terms <- intersect(clusters[[i]], clusters[[j]])
        all_terms <- union(clusters[[i]], clusters[[j]])

        if (length(common_terms)/length(all_terms) > 0.5 & i != j) {
            clusters[[i]] <- all_terms
            clusters[[j]] <- NULL
            i <- 1
            j <- i + 1
        } else if (j < length(clusters)) {
            j <- j + 1
        } else {
            i <- i + 1
            j <- 1
        }
    }

    ### Find Outliers
    cond <- !enrichment_res[, chosen_id] %in% c(names(clusters), unlist(clusters))
    outliers <- enrichment_res[cond, chosen_id]
    for (outlier in outliers) {
        clusters[[outlier]] <- outlier
    }
    ### Return Cluster Matrix
    names(clusters) <- base::seq_len(length(clusters))

    cluster_mat <- matrix(FALSE, nrow = nrow(enrichment_res), ncol = length(clusters),
        dimnames = list(enrichment_res[, chosen_id], names(clusters)))
    for (clu in names(clusters)) {
        clu_terms <- clusters[[clu]]
        cluster_mat[clu_terms, clu] <- TRUE
    }

    return(cluster_mat)
}


#' Graph Visualization of Clustered Enriched Terms
#'
#' @param clu_obj clustering result (either a matrix obtained via
#' \code{\link{hierarchical_term_clustering}} or \code{\link{fuzzy_term_clustering}}
#' `fuzzy_term_clustering` or a vector obtained via `hierarchical_term_clustering`)
#' @inheritParams fuzzy_term_clustering
#' @param vertex.label.cex font size for vertex labels; it is interpreted as a multiplication factor of some device-dependent base font size (default = 0.7)
#' @param vertex.size.scaling scaling factor for the node size (default = 2.5)
#'
#' @return Plots a graph diagram of clustering results. Each node is an enriched term
#' from `enrichment_res`. Size of node corresponds to -log(lowest_p). Thickness
#' of the edges between nodes correspond to the kappa statistic between the two
#' terms. Color of each node corresponds to distinct clusters. For fuzzy
#' clustering, if a term is in multiple clusters, multiple colors are utilized.
#'
#' @export
#'
#' @examples
#' \dontrun{
#' cluster_graph_vis(clu_obj, kappa_mat, enrichment_res)
#' }
cluster_graph_vis <- function(clu_obj, kappa_mat, enrichment_res, kappa_threshold = 0.35,
    use_description = FALSE, vertex.label.cex = 0.7, vertex.size.scaling = 2.5) {
    ### Set ID/Name index
    chosen_id <- ifelse(use_description, which(colnames(enrichment_res) == "Term_Description"),
        which(colnames(enrichment_res) == "ID"))

    ### For coloring nodes
    all_cols <- c("#E41A1C", "#377EB8", "#4DAF4A", "#984EA3", "#FF7F00", "#FFFF33",
        "#A65628", "#F781BF", "#999999", "#66C2A5", "#FC8D62", "#8DA0CB", "#E78AC3",
        "#A6D854", "#FFD92F", "#E5C494", "#B3B3B3", "#8DD3C7", "#FFFFB3", "#BEBADA",
        "#FB8072", "#80B1D3", "#FDB462", "#B3DE69", "#FCCDE5", "#D9D9D9", "#BC80BD",
        "#CCEBC5", "#FFED6F", "#A6CEE3", "#1F78B4", "#B2DF8A", "#33A02C", "#FB9A99",
        "#E31A1C", "#FDBF6F", "#FF7F00", "#CAB2D6", "#6A3D9A", "#FFFF99", "#B15928")

    if (is.matrix(clu_obj)) {
        ### Argument checks
        if (!all(rownames(clu_obj) %in% colnames(kappa_mat))) {
            stop("Not all terms in `clu_obj` present in `kappa_mat`!")
        }

        ### Prep data Remove weak links
        kappa_mat2 <- kappa_mat
        diag(kappa_mat2) <- 0
        kappa_mat2 <- ifelse(kappa_mat2 < kappa_threshold, 0, kappa_mat2)

        # Add missing terms
        missing <- rownames(clu_obj)[!rownames(clu_obj) %in% colnames(kappa_mat2)]
        missing_mat <- matrix(0, nrow = nrow(kappa_mat2), ncol = length(missing),
            dimnames = list(rownames(kappa_mat2), missing))
        kappa_mat2 <- cbind(kappa_mat2, missing_mat)
        missing <- rownames(clu_obj)[!rownames(clu_obj) %in% rownames(kappa_mat2)]
        missing_mat <- matrix(0, nrow = length(missing), ncol = ncol(kappa_mat2),
            dimnames = list(missing, colnames(kappa_mat2)))
        kappa_mat2 <- rbind(kappa_mat2, missing_mat)

        ### Create Graph, Set Color, Size and Percentages
        values <- apply(clu_obj, 1, function(x) which(x))
        percs <- list()
        for (i in base::seq_len(length(values))) {
            percs[[i]] <- rep(1/length(values[[i]]), length(values[[i]]))
        }

        g <- igraph::graph_from_adjacency_matrix(kappa_mat2, weighted = TRUE)

        if (length(all_cols) < max(as.integer(colnames(clu_obj)))) {
            num_extra <- max(as.integer(colnames(clu_obj))) - length(all_cols)
            extra_colors <- grDevices::rainbow(num_extra)
            all_cols <- c(all_cols, extra_colors)
        }

        # Node shapes are either circle (single cluster) or pie (multiple
        # clusters)
        igraph::V(g)$shape <- ifelse(vapply(percs, length, 1) > 1, "pie", "circle")

        # Node colors are cluster memberships
        cols <- lapply(values, function(x) all_cols[x])
        igraph::V(g)$color <- vapply(cols, function(x) x[1], "")

        # Node sizes are -log(lowest_p)
        p_idx <- match(names(igraph::V(g)), enrichment_res[, chosen_id])
        transformed_p <- -log10(enrichment_res$lowest_p[p_idx])
        igraph::V(g)$size <- transformed_p * vertex.size.scaling

        ### Plot Graph
        igraph::plot.igraph(g, vertex.pie = percs, vertex.pie.color = cols, layout = igraph::layout_nicely(g),
            edge.curved = FALSE, vertex.label.dist = 0, vertex.label.color = "black",
            asp = 1, vertex.label.cex = vertex.label.cex, edge.width = igraph::E(g)$weight,
            edge.arrow.mode = 0)
    } else if (is.integer(clu_obj)) {
        ### Argument checks
        if (!all(names(clu_obj) %in% colnames(kappa_mat))) {
            stop("Not all terms in `clu_obj` present in `kappa_mat`!")
        }

        ### Prep data Remove weak links
        kappa_mat2 <- kappa_mat
        diag(kappa_mat2) <- 0
        kappa_mat2 <- ifelse(kappa_mat2 > kappa_threshold, kappa_mat2, 0)

        # Add missing terms
        missing <- names(clu_obj)[!names(clu_obj) %in% colnames(kappa_mat2)]
        missing_mat <- matrix(0, nrow = nrow(kappa_mat2), ncol = length(missing),
            dimnames = list(rownames(kappa_mat2), missing))
        kappa_mat2 <- cbind(kappa_mat2, missing_mat)
        missing <- names(clu_obj)[!names(clu_obj) %in% rownames(kappa_mat2)]
        missing_mat <- matrix(0, nrow = length(missing), ncol = ncol(kappa_mat2),
            dimnames = list(missing, colnames(kappa_mat2)))
        kappa_mat2 <- rbind(kappa_mat2, missing_mat)

        ### Create Graph, Set Colors and Sizes
        g <- igraph::graph_from_adjacency_matrix(kappa_mat2, weighted = TRUE)

        igraph::V(g)$Clu <- clu_obj[match(igraph::V(g)$name, names(clu_obj))]

        if (length(all_cols) < max(as.integer(igraph::V(g)$Clu))) {
            num_extra <- max(clu_obj) - length(all_cols)
            extra_colors <- grDevices::rainbow(num_extra)
            all_cols <- c(all_cols, extra_colors)
        }

        # Node colors are cluster memberships
        igraph::V(g)$color <- all_cols[as.integer(igraph::V(g)$Clu)]

        # Node sizes are -log(lowest_p)
        p_idx <- match(names(igraph::V(g)), enrichment_res[, chosen_id])
        transformed_p <- -log10(enrichment_res$lowest_p[p_idx])
        igraph::V(g)$size <- transformed_p * vertex.size.scaling

        ### Plot graph
        igraph::plot.igraph(g, layout = igraph::layout_nicely(g), edge.curved = FALSE,
            vertex.label.dist = 0, vertex.label.color = "black", asp = 0, vertex.label.cex = vertex.label.cex,
            edge.width = igraph::E(g)$weight, edge.arrow.mode = 0)
    } else {
        stop("Invalid class for `clu_obj`!")
    }
}

#' Cluster Enriched Terms
#'
#' @inheritParams create_kappa_matrix
#' @param method Either 'hierarchical' or 'fuzzy'. Details of clustering are
#' provided in the corresponding functions \code{\link{hierarchical_term_clustering}},
#' and \code{\link{fuzzy_term_clustering}}
#' @param plot_clusters_graph boolean value indicate whether or not to plot
#' the graph diagram of clustering results (default = TRUE)
#' @param ... additional arguments for \code{\link{hierarchical_term_clustering}},
#' \code{\link{fuzzy_term_clustering}} and \code{\link{cluster_graph_vis}}.
#' See documentation of these functions for more details.
#'
#'
#' @return a data frame of clustering results. For 'hierarchical', the cluster
#' assignments (Cluster) and whether the term is representative of its cluster
#' (Status) is added as columns. For 'fuzzy', terms that are in multiple
#' clusters are provided for each cluster. The cluster assignments (Cluster)
#' and whether the term is representative of its cluster (Status) is
#' added as columns.
#'
#' @export
#'
#' @examples
#' example_clustered <- cluster_enriched_terms(
#'   example_pathfindR_output[1:3, ],
#'   plot_clusters_graph = FALSE
#' )
#' example_clustered <- cluster_enriched_terms(
#'   example_pathfindR_output[1:3, ],
#'   method = 'fuzzy', plot_clusters_graph = FALSE
#' )
#' @seealso See \code{\link{hierarchical_term_clustering}} for hierarchical
#' clustering of enriched terms.
#' See \code{\link{fuzzy_term_clustering}} for fuzzy clustering of enriched terms.
#' See \code{\link{cluster_graph_vis}} for graph visualization of clustering.
cluster_enriched_terms <- function(enrichment_res, method = "hierarchical", plot_clusters_graph = TRUE,
    use_description = FALSE, use_active_snw_genes = FALSE, ...) {
    ### Argument Checks
    if (!method %in% c("hierarchical", "fuzzy")) {
        stop("the clustering `method` must either be \"hierarchical\" or \"fuzzy\"")
    }

    if (!is.logical(plot_clusters_graph)) {
        stop("`plot_clusters_graph` must be logical!")
    }

    ### Create Kappa Matrix
    kappa_mat <- create_kappa_matrix(enrichment_res = enrichment_res, use_description = use_description,
        use_active_snw_genes = use_active_snw_genes)
    kappa_mat[is.na(kappa_mat)] <- 0

    ### Cluster Terms
    if (method == "hierarchical") {
        clu_obj <- R.utils::doCall("hierarchical_term_clustering", kappa_mat = kappa_mat,
            enrichment_res = enrichment_res, use_description = use_description, ...)
    } else {
        clu_obj <- R.utils::doCall("fuzzy_term_clustering", kappa_mat = kappa_mat,
            enrichment_res = enrichment_res, use_description = use_description, ...)
    }

    ### Graph Visualization of Clusters
    if (plot_clusters_graph) {
        R.utils::doCall("cluster_graph_vis", clu_obj = clu_obj, kappa_mat = kappa_mat,
            enrichment_res = enrichment_res, use_description = use_description, ...)
    }

    ### Returned Data Frame with Cluster Information
    clustered_df <- enrichment_res

    ### Set ID/Name index
    chosen_id <- ifelse(use_description, which(colnames(enrichment_res) == "Term_Description"),
        which(colnames(enrichment_res) == "ID"))

    if (method == "hierarchical") {
        ### Assign Clusters and Representatives
        clu_idx <- match(clustered_df[, chosen_id], names(clu_obj))
        clustered_df$Cluster <- clu_obj[clu_idx]
        clustered_df <- clustered_df[order(clustered_df$Cluster, clustered_df$lowest_p,
            decreasing = FALSE), ]

        tmp <- tapply(clustered_df[, chosen_id], clustered_df$Cluster, function(x) x[1])
        stat_cond <- clustered_df[, chosen_id] %in% tmp
        clustered_df$Status <- ifelse(stat_cond, "Representative", "Member")
    } else {
        term_list <- list()
        for (term in rownames(clu_obj)) {
            term_list[[term]] <- which(clu_obj[term, ])
        }
        ### Assign Clusters and Representatives
        clustered_df2 <- c()
        for (i in base::seq_len(nrow(clustered_df))) {
            current_row <- clustered_df[i, ]
            current_clusters <- term_list[[current_row[, chosen_id]]]
            for (clu in current_clusters) {
                clustered_df2 <- rbind(clustered_df2, data.frame(current_row, Cluster = clu))
            }
        }

        clustered_df <- clustered_df2
        clustered_df <- clustered_df[order(clustered_df$Cluster, clustered_df$lowest_p,
            decreasing = FALSE), ]

        tmp <- tapply(clustered_df[, chosen_id], clustered_df$Cluster, function(x) x[1])
        stat_cond <- clustered_df[, chosen_id] %in% tmp
        clustered_df$Status <- ifelse(stat_cond, "Representative", "Member")
    }

    return(clustered_df)
}


================================================
FILE: R/comparison.R
================================================
#' Combine 2 pathfindR Results
#'
#' @param result_A data frame of first pathfindR enrichment results
#' @param result_B data frame of second pathfindR enrichment results
#' @param plot_common boolean to indicate whether or not to plot the term-gene
#' graph of the common terms (default=\code{TRUE})
#'
#' @return Data frame of combined pathfindR enrichment results. Columns are: \describe{
#'   \item{ID}{ID of the enriched term}
#'   \item{Term_Description}{Description of the enriched term}
#'   \item{Fold_Enrichment_A}{Fold enrichment value for the enriched term (Calculated using ONLY the input genes)}
#'   \item{occurrence_A}{the number of iterations that the given term was found to enriched over all iterations}
#'   \item{lowest_p_A}{the lowest adjusted-p value of the given term over all iterations}
#'   \item{highest_p_A}{the highest adjusted-p value of the given term over all iterations}
#'   \item{Up_regulated_A}{the up-regulated genes in the input involved in the given term's gene set, comma-separated}
#'   \item{Down_regulated_A}{the down-regulated genes in the input involved in the given term's gene set, comma-separated}
#'   \item{Fold_Enrichment_B}{Fold enrichment value for the enriched term (Calculated using ONLY the input genes)}
#'   \item{occurrence_B}{the number of iterations that the given term was found to enriched over all iterations}
#'   \item{lowest_p_B}{the lowest adjusted-p value of the given term over all iterations}
#'   \item{highest_p_B}{the highest adjusted-p value of the given term over all iterations}
#'   \item{Up_regulated_B}{the up-regulated genes in the input involved in the given term's gene set, comma-separated}
#'   \item{Down_regulated_B}{the down-regulated genes in the input involved in the given term's gene set, comma-separated}
#'   \item{combined_p}{the combined p value (via Fisher's method)}
#'   \item{status}{whether the term is found in both analyses ('common'), found only in the first ('A only') or found only in the second ('B only)}
#' }
#' By default, the function also displays the term-gene graph of the common terms
#'
#' @export
#'
#' @examples
#' combined_results <- combine_pathfindR_results(example_pathfindR_output, example_comparison_output)
combine_pathfindR_results <- function(result_A, result_B, plot_common = TRUE) {
    combined_df <- merge(result_A, result_B, by = c("ID", "Term_Description"), all = TRUE,
        suffixes = c("_A", "_B"))

    ### Calculate combined p values
    combined_df$combined_p <- NA
    for (i in seq_len(nrow(combined_df))) {
        p_vec <- c(combined_df$lowest_p_A[i], combined_df$lowest_p_B[i])
        p_vec <- p_vec[!is.na(p_vec)]
        combined_df$combined_p[i] <- stats::pchisq(q = sum(log(p_vec)) * -2, df = length(p_vec) *
            2, lower.tail = FALSE)
    }
    ### Indicate intersection status
    combined_df$status <- ifelse(is.na(combined_df$lowest_p_A), "B only", ifelse(is.na(combined_df$lowest_p_B),
        "A only", "common"))

    ### Plot graph common terms
    if (plot_common) {
        graphics::plot(combined_results_graph(combined_df))
    }

    message("You may run `combined_results_graph()` to create visualizations of combined term-gene graphs of selected terms")

    return(combined_df)
}



#' Combined Results Graph
#'
#' @param combined_df Data frame of combined pathfindR enrichment results
#' @param selected_terms the vector of selected terms for creating the graph
#' (either IDs or term descriptions). If set to \code{'common'}, all of the
#' common terms are used. (default = 'common')
#' @inheritParams term_gene_graph
#'
#' @return a  \code{\link[ggraph]{ggraph}} object containing the combined term-gene graph.
#'  Each node corresponds to an enriched term (orange if common, different shades of blue otherwise),
#'  an up-regulated gene (green), a down-regulated gene (red) or
#'  a conflicting (i.e. up in one analysis, down in the other or vice versa) gene
#'  (gray). An edge between a term and a gene indicates
#'  that the given term involves the gene. Size of a term node is proportional
#'  to either the number of genes (if \code{node_size = 'num_genes'}) or
#'  the -log10(lowest p value) (if \code{node_size = 'p_val'}).
#' @export
#'
#' @examples
#' combined_results <- combine_pathfindR_results(
#'   example_pathfindR_output,
#'   example_comparison_output,
#'   plot_common = FALSE
#' )
#' g <- combined_results_graph(combined_results, selected_terms = sample(combined_results$ID, 3))
combined_results_graph <- function(combined_df, selected_terms = "common", use_description = FALSE,
    layout = "stress", node_size = "num_genes") {
    ############ Argument Checks Check use_description is boolean
    if (!is.logical(use_description)) {
        stop("`use_description` must either be TRUE or FALSE!")
    }

    ### Set column for term labels
    ID_column <- ifelse(use_description, "Term_Description", "ID")

    ### Check node_size
    val_node_size <- c("num_genes", "p_val")
    if (!node_size %in% val_node_size) {
        stop("`node_size` should be one of ", paste(dQuote(val_node_size), collapse = ", "))
    }

    if (!is.data.frame(combined_df)) {
        stop("`combined_df` should be a data frame")
    }

    ### Check necessary columnns
    necessary_cols <- c(ID_column, "combined_p", "Up_regulated_A", "Down_regulated_A",
        "Up_regulated_B", "Down_regulated_B")

    if (!all(necessary_cols %in% colnames(combined_df))) {
        stop(paste(c("All of", paste(necessary_cols, collapse = ", "), "must be present in `results_df`!"),
            collapse = " "))
    }

    ############ Initial steps Filter for selected terms
    if (any(selected_terms == "common")) {
        if (!any(combined_df$status == "common")) {
            stop("There are no common terms")
        }
        combined_df <- combined_df[combined_df$status == "common", ]
    } else {
        if (!any(selected_terms %in% combined_df[, ID_column])) {
            stop("None of the `selected_terms` are in the combined results!")
        }
        combined_df <- combined_df[combined_df[, ID_column] %in% selected_terms,
            ]
    }

    ### Prep data frame for graph
    graph_df <- data.frame()
    for (i in base::seq_len(nrow(combined_df))) {
        up_genes <- c(unlist(strsplit(combined_df$Up_regulated_A[i], ", ")), unlist(strsplit(combined_df$Up_regulated_B[i],
            ", ")))
        down_genes <- c(unlist(strsplit(combined_df$Down_regulated_A[i], ", ")),
            unlist(strsplit(combined_df$Down_regulated_B[i], ", ")))
        genes <- c(up_genes, down_genes)
        genes <- genes[!is.na(genes)]
        for (gene in genes) {
            graph_df <- rbind(graph_df, data.frame(Term = combined_df[i, ID_column],
                Gene = gene))
        }
    }
    graph_df <- unique(graph_df)

    up_genes_A <- unlist(lapply(combined_df$Up_regulated_A, function(x) unlist(strsplit(x,
        ", "))))
    down_genes_A <- unlist(lapply(combined_df$Down_regulated_A, function(x) unlist(strsplit(x,
        ", "))))
    up_genes_B <- unlist(lapply(combined_df$Up_regulated_B, function(x) unlist(strsplit(x,
        ", "))))
    down_genes_B <- unlist(lapply(combined_df$Down_regulated_B, function(x) unlist(strsplit(x,
        ", "))))

    terms_A <- combined_df[!is.na(combined_df$lowest_p_A) & is.na(combined_df$lowest_p_B),
        ID_column]
    terms_B <- combined_df[is.na(combined_df$lowest_p_A) & !is.na(combined_df$lowest_p_B),
        ID_column]

    ############ Create graph object and plot create igraph object
    g <- igraph::graph_from_data_frame(graph_df, directed = FALSE)
    igraph::V(g)$type <- ifelse(names(igraph::V(g)) %in% terms_A, "A-only term",
        ifelse(names(igraph::V(g)) %in% terms_B, "B-only term", ifelse(names(igraph::V(g)) %in%
            combined_df[, ID_column], "common term", "gene")))

    # Adjust node sizes
    if (node_size == "num_genes") {
        sizes <- igraph::degree(g)
        sizes <- ifelse(grepl("term", igraph::V(g)$type), sizes, 2)
        size_label <- "# genes"
    } else {
        idx <- match(names(igraph::V(g)), combined_df[, ID_column])
        sizes <- -log10(combined_df$combined_p[idx])
        sizes[is.na(sizes)] <- 2
        size_label <- "-log10(p)"
    }
    igraph::V(g)$size <- sizes
    igraph::V(g)$label.cex <- 0.5
    igraph::V(g)$frame.color <- "gray"

    cond_up_A <- names(igraph::V(g)) %in% up_genes_A
    cond_up_B <- names(igraph::V(g)) %in% up_genes_B
    cond_down_A <- names(igraph::V(g)) %in% down_genes_A
    cond_down_B <- names(igraph::V(g)) %in% down_genes_B
    missing_A <- !cond_up_A & !cond_down_A
    missing_B <- !cond_up_B & !cond_down_B

    up_cond <- (cond_up_A & cond_up_B) | (missing_A & cond_up_B) | (cond_up_A & missing_B)
    down_cond <- (cond_down_A & cond_down_B) | (missing_A & cond_down_B) | (cond_down_A &
        missing_B)

    igraph::V(g)$for_coloring <- ifelse(igraph::V(g)$type == "common term", "Common term",
        ifelse(igraph::V(g)$type == "A-only term", "A-only term", ifelse(igraph::V(g)$type ==
            "B-only term", "B-only term", ifelse(up_cond, "Up gene", ifelse(down_cond,
            "Down gene", "Conflicting gene")))))

    ### Create graph
    create_graph <- function(g, for_coloring, size) {
        color_var <- ggplot2::enquo(for_coloring)
        size_var <- ggplot2::enquo(size)
        p <- ggraph::ggraph(g, layout = layout)
        p <- p + ggraph::geom_edge_link(alpha = 0.8, colour = "darkgrey")
        p <- p + ggraph::geom_node_point(ggplot2::aes(color = !!color_var, size = !!size_var))
        p <- p + ggplot2::scale_size(range = c(5, 10), breaks = round(seq(round(min(igraph::V(g)$size)),
            round(max(igraph::V(g)$size)), length.out = 4)), name = size_label)
        p <- p + ggplot2::theme_void()
        p <- p + suppressWarnings(ggraph::geom_node_text(ggplot2::aes(label = .data$name),
            nudge_y = 0.2, repel = TRUE, max.overlaps = 20))

        vertex_cols <- c(`Common term` = "#FCCA46", `A-only term` = "#9FB8AD", `B-only term` = "#619B8A",
            `Up gene` = "green", `Down gene` = "red", `Conflicting gene` = "gray")
        p <- p + ggplot2::scale_colour_manual(values = vertex_cols, name = NULL)
        p <- p + ggplot2::ggtitle("Combined Terms Graph")
        p <- p + ggplot2::theme(plot.title = ggplot2::element_text(hjust = 0.5))
        return(p)
    }

    return(create_graph(g, for_coloring, size))
}


================================================
FILE: R/core.R
================================================
#' Wrapper Function for pathfindR - Active-Subnetwork-Oriented Enrichment Workflow
#'
#' \code{run_pathfindR} is the wrapper function for the pathfindR workflow
#'
#' This function takes in a data frame consisting of Gene Symbol, log-fold-change
#' and adjusted-p values. After input testing, any gene symbols that are not in
#' the PIN are converted to alias symbols if the alias is in the PIN. Next,
#' active subnetwork search is performed. Enrichment analysis is
#' performed using the genes in each of the active subnetworks. Terms with
#' adjusted-p values lower than \code{enrichment_threshold} are discarded. The
#' lowest adjusted-p value (over all subnetworks) for each term is kept. This
#' process of active subnetwork search and enrichment is repeated  for a selected
#' number of \code{iterations}, which is done in parallel. Over all iterations,
#' the lowest and the highest adjusted-p values, as well as number of occurrences
#' are reported for each enriched term.
#'
#' @inheritParams input_processing
#' @inheritParams fetch_gene_set
#' @inheritParams enrichment_analyses
#' @param plot_enrichment_chart boolean value. If TRUE, a bubble chart displaying
#'  the enrichment results is plotted. (default = TRUE)
#' @param output_dir the directory to be created where the output and intermediate
#'  files are saved (default = \code{NULL}, a temporary directory is used)
#' @param ... additional arguments for \code{\link{active_snw_enrichment_wrapper}}
#'
#' @return Data frame of pathfindR enrichment results. Columns are: \describe{
#'   \item{ID}{ID of the enriched term}
#'   \item{Term_Description}{Description of the enriched term}
#'   \item{Fold_Enrichment}{Fold enrichment value for the enriched term (Calculated using ONLY the input genes)}
#'   \item{occurrence}{the number of iterations that the given term was found to enriched over all iterations}
#'   \item{support}{the median support (proportion of active subnetworks leading to enrichment within an iteration) over all iterations}
#'   \item{lowest_p}{the lowest adjusted-p value of the given term over all iterations}
#'   \item{highest_p}{the highest adjusted-p value of the given term over all iterations}
#'   \item{non_Signif_Snw_Genes (OPTIONAL)}{the non-significant active subnetwork genes, comma-separated}
#'   \item{Up_regulated}{the up-regulated genes (as determined by `change value` > 0, if the `change column` was provided) in the input involved in the given term's gene set, comma-separated. If change column not provided, all affected are listed here.}
#'   \item{Down_regulated}{the down-regulated genes (as determined by `change value` < 0, if the `change column` was provided) in the input involved in the given term's gene set, comma-separated}
#' }
#'  The function also creates an HTML report with the pathfindR enrichment
#'  results linked to the visualizations of the enriched terms in addition to
#'  the table of converted gene symbols. This report can be found in
#'  '\code{output_dir}/results.html' under the current working directory.
#'
#'  By default, a bubble chart of top 10 enrichment results are plotted. The x-axis
#'  corresponds to fold enrichment values while the y-axis indicates the enriched
#'  terms. Sizes of the bubbles indicate the number of significant genes in the given terms.
#'  Color indicates the -log10(lowest-p) value; the more red it is, the more
#'  significant the enriched term is. See \code{\link{enrichment_chart}}.
#'
#' @import knitr
#' @import rmarkdown
#' @import parallel
#' @import doParallel
#' @import foreach
#' @import graphics
#'
#' @export
#'
#' @section Warning: Especially depending on the protein interaction network,
#'  the algorithm and the number of iterations you choose, 'active subnetwork
#'  search + enrichment' component of \code{run_pathfindR} may take a long time to finish.
#'
#' @seealso
#' \code{\link{input_testing}} for input testing, \code{\link{input_processing}} for input processing,
#' \code{\link{active_snw_search}} for active subnetwork search and subnetwork filtering,
#' \code{\link{enrichment_analyses}} for enrichment analysis (using the active subnetworks),
#' \code{\link{summarize_enrichment_results}} for summarizing the active-subnetwork-oriented enrichment results,
#' \code{\link{annotate_term_genes}} for annotation of affected genes in the given gene sets,
#' \code{\link{visualize_terms}} for visualization of enriched terms,
#' \code{\link{enrichment_chart}} for a visual summary of the pathfindR enrichment results,
#' \code{\link[foreach]{foreach}} for details on parallel execution of looping constructs,
#' \code{\link{cluster_enriched_terms}} for clustering the resulting enriched terms and partitioning into clusters.
#'
#' @examples
#' \dontrun{
#' run_pathfindR(example_pathfindR_input)
#' }
run_pathfindR <- function(input, gene_sets = "KEGG", min_gset_size = 10, max_gset_size = 300,
    custom_genes = NULL, custom_descriptions = NULL, pin_name_path = "Biogrid", p_val_threshold = 0.05,
    enrichment_threshold = 0.05, convert2alias = TRUE, plot_enrichment_chart = TRUE,
    output_dir = NULL, list_active_snw_genes = FALSE, ...) {
    ############ Argument checks
    if (!is.logical(plot_enrichment_chart)) {
        stop("`plot_enrichment_chart` should be either TRUE or FALSE")
    }
    if (!is.logical(list_active_snw_genes)) {
        stop("`list_active_snw_genes` should be either TRUE or FALSE")
    }

    gset_list <- fetch_gene_set(gene_sets = gene_sets, min_gset_size = min_gset_size,
        max_gset_size = max_gset_size, custom_genes = custom_genes, custom_descriptions = custom_descriptions)

    ## absolute path to PIN
    pin_path <- return_pin_path(pin_name_path)

    ## create output dir
    output_dir_org <- output_dir
    output_dir <- configure_output_dir(output_dir)
    # on exit, set working directory back to original working directory
    org_dir <- getwd()
    on.exit(setwd(org_dir))
    # create and change working directory into the output directory
    dir.create(output_dir, recursive = TRUE)
    output_dir <- normalizePath(output_dir)
    setwd(output_dir)

    input_testing(input, p_val_threshold)

    input_processed <- input_processing(input, p_val_threshold, pin_path, convert2alias)

    combined_res <- active_snw_enrichment_wrapper(input_processed, pin_path, gset_list,
        enrichment_threshold, list_active_snw_genes, ...)
    setwd(output_dir)

    ## In case no enrichment was found
    if (is.null(combined_res)) {
        warning("Did not find any enriched terms!", call. = FALSE)
        return(data.frame())
    }

    final_res <- summarize_enrichment_results(combined_res, list_active_snw_genes)


    final_res <- annotate_term_genes(result_df = final_res, input_processed = input_processed,
        genes_by_term = gset_list$genes_by_term)

    if (!is.null(output_dir_org)) {
        create_HTML_report(input = input, input_processed = input_processed, final_res = final_res,
            dir_for_report = output_dir)
    }

    if (plot_enrichment_chart) {
        graphics::plot(enrichment_chart(result_df = final_res))
    }

    message(paste0("Found ", nrow(final_res), " enriched terms\n\n"))
    message("You may run:\n")
    message("- cluster_enriched_terms() for clustering enriched terms\n")
    message("- visualize_terms() for visualizing enriched term diagrams\n\n")

    return(final_res)
}


================================================
FILE: R/data_generation.R
================================================
#' Safely download and parse web content
#'
#' This helper function retrieves content from a given URL using \pkg{httr}.  
#' It ensures that common issues (e.g. no internet, timeouts, HTTP errors, 
#' or parsing errors) are handled gracefully with clear, informative error messages.  
#'
#' @param url Character string. The URL of the resource to download.
#' @param ... Additional arguments passed to \code{\link[httr]{GET}}.
#' @param timeout_sec Numeric. Timeout in seconds for the request (default = 10).
#'
#' @return A character string containing the parsed content of the response 
#'   (UTF-8 encoded). On failure, an error is raised with a clear message.
#'
#' @details
#' This function is intended for use inside package functions.  
#' For examples, vignettes, or tests, wrap calls in a connectivity check 
#' (e.g. using \code{http_error(HEAD(url))}) to avoid CRAN failures 
#' when the resource is temporarily unavailable.
#'
#' @examples
#' \dontrun{
#' # Retrieve the latest BioGRID release page
#' result <- safe_get_content("https://downloads.thebiogrid.org/BioGRID/Latest-Release/")
#' }
#' 
#' @importFrom httr GET timeout http_error status_code content
safe_get_content <- function(url, ..., timeout_sec = 10) {
  res <- tryCatch(
    {
      GET(url, timeout(timeout_sec), ...)
    },
    error = function(e) {
      stop("Failed to retrieve resource from ", url, 
           ". Error: ", conditionMessage(e), call. = FALSE)
    }
  )
  
  # Check HTTP status
  if (http_error(res)) {
    stop("The resource at ", url, " is unavailable. HTTP status: ",
         status_code(res), call. = FALSE)
  }
  
  # Return parsed content (default: text if HTML, raw if binary, etc.)
  content <- tryCatch(
    content(res, as = "text", encoding = "UTF-8"),
    error = function(e) {
      stop("Failed to parse content from ", url, 
           ". Error: ", conditionMessage(e), call. = FALSE)
    }
  )
  
  return(content)
}


#' Process Data frame of Protein-protein Interactions
#'
#' @param pin_df data frame of protein-protein interactions with 2 columns:
#' 'Interactor_A' and 'Interactor_B'
#'
#' @return processed PIN data frame (removes self-interactions and
#' duplicated interactions)
process_pin <- function(pin_df) {
    # remove self-interactions
    pin_df <- pin_df[pin_df$Interactor_A != pin_df$Interactor_B, ]

    # remove duplicated inteactions (including symmetric ones)
    pin_df <- unique(t(apply(pin_df, 1, sort)))

    pin_df <- as.data.frame(pin_df)
    colnames(pin_df) <- c("Interactor_A", "Interactor_B")
    return(pin_df)
}

#' Retrieve the Requested Release of Organism-specific BioGRID PIN
#'
#' @param org organism name. BioGRID naming requires underscores for spaces so
#' 'Homo sapiens' becomes 'Homo_sapiens', 'Mus musculus' becomes 'Mus_musculus'
#' etc. See \url{https://wiki.thebiogrid.org/doku.php/statistics} for a full
#' list of available organisms (default = 'Homo_sapiens')
#' @param path2pin the path of the file to save the PIN data. By default, the
#' PIN data is saved in a temporary file
#' @param release the requested BioGRID release (default = 'latest')
#'
#' @return the path of the file in which the PIN data was saved. If
#' \code{path2pin} was not supplied by the user, the PIN data is saved in a
#' temporary file
get_biogrid_pin <- function(org = "Homo_sapiens", path2pin, release = "latest") {
    # check organism name
    all_org_names <- c("Anopheles_gambiae_PEST", "Apis_mellifera", "Arabidopsis_thaliana_Columbia",
        "Bacillus_subtilis_168", "Bos_taurus", "Caenorhabditis_elegans", "Candida_albicans_SC5314",
        "Canis_familiaris", "Cavia_porcellus", "Chlamydomonas_reinhardtii", "Chlorocebus_sabaeus",
        "Cricetulus_griseus", "Danio_rerio", "Dictyostelium_discoideum_AX4", "Drosophila_melanogaster",
        "Emericella_nidulans_FGSC_A4", "Equus_caballus", "Escherichia_coli_K12_MC4100_BW2952",
        "Escherichia_coli_K12_MG1655", "Escherichia_coli_K12_W3110", "Escherichia_coli_K12",
        "Gallus_gallus", "Glycine_max", "Hepatitus_C_Virus", "Homo_sapiens", "Human_Herpesvirus_1",
        "Human_Herpesvirus_2", "Human_Herpesvirus_3", "Human_Herpesvirus_4", "Human_Herpesvirus_5",
        "Human_Herpesvirus_6A", "Human_Herpesvirus_6B", "Human_Herpesvirus_7", "Human_Herpesvirus_8",
        "Human_Immunodeficiency_Virus_1", "Human_Immunodeficiency_Virus_2", "Human_papillomavirus_10",
        "Human_papillomavirus_16", "Human_papillomavirus_6b", "Leishmania_major_Friedlin",
        "Macaca_mulatta", "Meleagris_gallopavo", "Mus_musculus", "Mycobacterium_tuberculosis_H37Rv",
        "Neurospora_crassa_OR74A", "Nicotiana_tomentosiformis", "Oryctolagus_cuniculus",
        "Oryza_sativa_Japonica", "Ovis_aries", "Pan_troglodytes", "Pediculus_humanus",
        "Plasmodium_falciparum_3D7", "Rattus_norvegicus", "Ricinus_communis", "Saccharomyces_cerevisiae_S288c",
        "Schizosaccharomyces_pombe_972h", "Selaginella_moellendorffii", "Simian_Immunodeficiency_Virus",
        "Simian_Virus_40", "Solanum_lycopersicum", "Solanum_tuberosum", "Streptococcus_pneumoniae_ATCCBAA255",
        "Strongylocentrotus_purpuratus", "Sus_scrofa", "Tobacco_Mosaic_Virus", "Ustilago_maydis_521",
        "Vaccinia_Virus", "Vitis_vinifera", "Xenopus_laevis", "Zea_mays")
    if (!org %in% all_org_names) {
        stop(paste(org, "is not a valid Biogrid organism.", "Available organisms are listed on: https://wiki.thebiogrid.org/doku.php/statistics"))
    }

    if (release == "latest") {
      result <- safe_get_content("https://downloads.thebiogrid.org/BioGRID/Latest-Release/")
      
      h2_matches <- regexpr("(?<=<h2>BioGRID Release\\s)(\\d\\.\\d\\.\\d+)", result, perl = TRUE)
      release <- regmatches(result, h2_matches)
    }

    # release directory for download
    rel_dir <- paste0("BIOGRID-", release)

    # choose tab2 vs. tab3
    tab_v <- ifelse(utils::compareVersion(release, "3.5.183") == -1, ".tab2", ".tab3")

    # download tab2 format organism files
    tmp <- tempfile()
    fname <- paste0("BIOGRID-ORGANISM-", release, tab_v)
    biogrid_url <- paste0("https://downloads.thebiogrid.org/Download/BioGRID/Release-Archive/",
        rel_dir, "/", fname, ".zip")
    utils::download.file(biogrid_url, tmp, method = getOption("download.file.method"),
        quiet = TRUE)

    # parse organism names
    all_org_files <- utils::unzip(tmp, list = TRUE)
    all_org_files$Organism <- sub("\\.tab\\d\\.txt", "", all_org_files$Name)
    all_org_files$Organism <- sub("BIOGRID-ORGANISM-", "", all_org_files$Organism)
    all_org_files$Organism <- sub("-.*\\d+$", "", all_org_files$Organism)

    org_file <- all_org_files$Name[all_org_files$Organism == org]

    # process and save organism PIN file
    biogrid_df <- utils::read.delim(unz(tmp, org_file), check.names = FALSE, colClasses = "character")
    biogrid_pin <- data.frame(Interactor_A = biogrid_df[, "Official Symbol Interactor A"],
        Interactor_B = biogrid_df[, "Official Symbol Interactor B"])
    biogrid_pin <- process_pin(biogrid_pin)

    final_pin <- data.frame(intA = biogrid_pin$Interactor_A, pp = "pp", intB = biogrid_pin$Interactor_B)

    if (missing(path2pin)) {
        path2pin <- tempfile()
    }
    utils::write.table(final_pin, path2pin, sep = "\t", row.names = FALSE, col.names = FALSE,
        quote = FALSE)
    return(path2pin)
}

#' Retrieve Organism-specific PIN data
#'
#' @param source As of this version, this function is implemented to get data
#' from 'BioGRID' only. This argument (and this wrapper function) was implemented
#' for future utility
#' @inheritParams get_biogrid_pin
#' @param ... additional arguments for \code{\link{get_biogrid_pin}}
#'
#' @return the path of the file in which the PIN data was saved. If
#' \code{path2pin} was not supplied by the user, the PIN data is saved in a
#' temporary file
#' @export
#'
#' @examples
#' \dontrun{
#' pin_path <- get_pin_file()
#' }
get_pin_file <- function(source = "BioGRID", org = "Homo_sapiens", path2pin, ...) {
    ## TODO
    if (source != "BioGRID") {
        stop("As of this version, this function is implemented to get data from BioGRID only")
    }

    path2pin <- get_biogrid_pin(org = org, path2pin = path2pin, ...)
    return(path2pin)
}

#' Retrieve Gene Sets from GMT-format File
#'
#' @param path2gmt path to the gmt file
#' @param descriptions_idx index for descriptions (default = 2)
#'
#' @return list containing 2 elements: \itemize{
#' \item{gene_sets - A list containing the genes involved in each gene set}
#' \item{descriptions - A named vector containing the descriptions for each gene set}
#' }
gset_list_from_gmt <- function(path2gmt, descriptions_idx = 2) {
    gset_names_idx <- ifelse(descriptions_idx == 2, 1, 2)
    gmt_lines <- readLines(path2gmt)

    ## Genes list
    genes_list <- lapply(gmt_lines, function(x) {
        x <- unlist(strsplit(x, "\t"))
        x <- unique(x[3:length(x)])
        x <- x[x != ""]
        return(x)
    })

    names(genes_list) <- vapply(gmt_lines, function(x) {
        x <- unlist(strsplit(x, "\t"))
        return(x[gset_names_idx])
    }, "a")

    ## Descriptions vector
    descriptions_vec <- vapply(gmt_lines, function(x) {
        x <- unlist(strsplit(x, "\t"))
        return(x[descriptions_idx])
    }, "a")

    names(descriptions_vec) <- names(genes_list)

    # remove empty gene sets (if any)
    genes_list <- genes_list[vapply(genes_list, length, 1) != 0]
    descriptions_vec <- descriptions_vec[names(genes_list)]

    return(list(gene_sets = genes_list, descriptions = descriptions_vec))
}

#' Retrieve Organism-specific KEGG Pathway Gene Sets
#'
#' @param org_code KEGG organism code for the selected organism. For a full list
#' of all available organisms, see \url{https://www.genome.jp/kegg/catalog/org_list.html}
#'
#' @return list containing 2 elements: \itemize{
#' \item{gene_sets - A list containing KEGG IDs for the genes involved in each KEGG pathway}
#' \item{descriptions - A named vector containing the descriptions for each KEGG pathway}
#' }
#' @importFrom ggkegg pathway
get_kegg_gsets <- function(org_code = "hsa") {

  message("Grab a cup of coffee, this will take a while...")

  all_pathways_url <- paste0("https://rest.kegg.jp/list/pathway/", org_code)
  all_pathways_result <- safe_get_content(all_pathways_url)
  parsed_all_pathways_result <- strsplit(all_pathways_result, "\n")[[1]]
  pathway_ids <- vapply(parsed_all_pathways_result, function(x) unlist(strsplit(x, "\t"))[1], "id")
  pathway_descriptons <- vapply(parsed_all_pathways_result, function(x) unlist(strsplit(x, "\t"))[2], "description")
  names(pathway_descriptons) <- pathway_ids

  genes_by_pathway <- lapply(pathway_ids, function(pw_id) {
    pathways_graph <- pathway(pid = pw_id, directory = tempdir(), use_cache = FALSE, return_tbl_graph = FALSE)
    all_pw_kegg_ids <- igraph::V(pathways_graph)$name[igraph::V(pathways_graph)$type == "gene"]
    all_pw_kegg_ids <- unlist(strsplit(all_pw_kegg_ids, " "))
    all_pw_kegg_ids <- unique(all_pw_kegg_ids)
    return(all_pw_kegg_ids)
  })

  names(genes_by_pathway) <- pathway_ids

  # remove empty gene sets (e.g. pure metabolic pathways)
  kegg_genes <- genes_by_pathway[vapply(genes_by_pathway, length, 1) != 0]

  kegg_descriptions <- pathway_descriptons
  kegg_descriptions <- sub(" & .*$", "", sub("-([^-]*)$", "&\\1", kegg_descriptions))
  kegg_descriptions <- kegg_descriptions[names(kegg_descriptions) %in% names(kegg_genes)]

  result <- list(gene_sets = kegg_genes, descriptions = kegg_descriptions)
  return(result)
}

#' Retrieve Reactome Pathway Gene Sets
#'
#' @return Gets the latest Reactome pathways gene sets in gmt format. Parses the
#' gmt file and returns a list containing 2 elements: \itemize{
#' \item{gene_sets - A list containing the genes involved in each Reactome pathway}
#' \item{descriptions - A named vector containing the descriptions for each Reactome pathway}
#' }
#'
get_reactome_gsets <- function() {
    tmp <- tempfile()
    reactome_url <- "https://reactome.org/download/current/ReactomePathways.gmt.zip"
    utils::download.file(reactome_url, tmp, method = getOption("download.file.method"))

    reactome_gmt <- unz(tmp, "ReactomePathways.gmt")
    result <- gset_list_from_gmt(reactome_gmt, descriptions_idx = 1)
    close(reactome_gmt)

    # fix illegal char(s)
    result$descriptions <- gsub("[^ -~]", "", result$descriptions)
    return(result)
}

#' Retrieve Organism-specific MSigDB Gene Sets
#'
#' @param species species name for output genes, such as Homo sapiens, Mus musculus, etc.
#' See \code{\link[msigdbr]{msigdbr_species}} for all the species available in
#' the msigdbr package.
#' @param db_species Species abbreviation for the human or mouse databases ("HS" or "MM").
#' @param collection collection. e.g., H, C1. (default = NULL,
#' i.e. list all gene sets in collection). 
#' See \code{\link[msigdbr]{msigdbr_collections}} for all available options
#' the msigdbr package.
#' @param subcollection sub-collection, such as CGP, BP, etc. (default = NULL,
#' i.e. list all gene sets in collection). 
#' See \code{\link[msigdbr]{msigdbr_collections}} for all available options
#' the msigdbr package.
#'
#' @return Retrieves the MSigDB gene sets and returns a list containing 2 elements: \itemize{
#' \item{gene_sets - A list containing the genes involved in each of the selected MSigDB gene sets}
#' \item{descriptions - A named vector containing the descriptions for each selected MSigDB gene set}
#' }
#'
#' @details this function utilizes the function \code{\link[msigdbr]{msigdbr}}
#' from the \code{msigdbr} package to retrieve the 'Molecular Signatures Database'
#' (MSigDB) gene sets (Subramanian et al. 2005 <doi:10.1073/pnas.0506580102>,
#' Liberzon et al. 2015 <doi:10.1016/j.cels.2015.12.004>).
#' Available collections are: H: hallmark gene sets, C1: positional gene sets,
#' C2: curated gene sets, C3: motif gene sets, C4: computational gene sets,
#' C5: GO gene sets, C6: oncogenic signatures and C7: immunologic signatures
get_mgsigdb_gsets <- function(species = "Homo sapiens", db_species = "HS", collection = NULL, subcollection = NULL) {
    msig_df <- msigdbr::msigdbr(
      species = species, 
      collection = collection, 
      subcollection = subcollection, 
      db_species = db_species
    )

    ### create gene sets list
    all_gs_ids <- unique(msig_df$gs_id)
    msig_gsets_list <- list()
    for (id in all_gs_ids) {
        sub_df <- msig_df[msig_df$gs_id == id, ]
        msig_gsets_list[[id]] <- unique(sub_df$gene_symbol)
    }
    ### create gene sets descriptions
    msig_gsets_descriptions <- msig_df[, c("gs_name", "gs_id")]
    msig_gsets_descriptions <- unique(msig_gsets_descriptions)
    tmp <- msig_gsets_descriptions$gs_id
    msig_gsets_descriptions <- msig_gsets_descriptions$gs_name
    names(msig_gsets_descriptions) <- tmp

    result <- list(gene_sets = msig_gsets_list, descriptions = msig_gsets_descriptions)
    return(result)
}

#' Retrieve Organism-specific Gene Sets List
#'
#' @param source As of this version, either 'KEGG', 'Reactome' or 'MSigDB' (default = 'KEGG')
#' @param org_code (Used for 'KEGG' only) KEGG organism code for the selected organism. For a full list
#' of all available organisms, see \url{https://www.genome.jp/kegg/catalog/org_list.html}
#' @inheritParams get_mgsigdb_gsets
#'
#' @return A list containing 2 elements: \itemize{
#' \item{gene_sets - A list containing the genes involved in each gene set}
#' \item{descriptions - A named vector containing the descriptions for each gene set}
#' }. For 'KEGG' and 'MSigDB', it is possible to choose a specific organism. For a full list
#' of all available KEGG organisms, see \url{https://www.genome.jp/kegg/catalog/org_list.html}.
#' See \code{\link[msigdbr]{msigdbr_species}} for all the species available in
#' the msigdbr package used for obtaining 'MSigDB' gene sets.
#' For Reactome, there is only one collection of pathway gene sets.
#' @export
#'
get_gene_sets_list <- function(source = "KEGG", org_code = "hsa", species = "Homo sapiens", 
                               db_species = "HS", collection, subcollection = NULL) {
    if (source == "KEGG") {
        return(get_kegg_gsets(org_code))
    } else if (source == "Reactome") {
        message("For Reactome, there is only one collection of pathway gene sets.")
        return(get_reactome_gsets())
    } else if (source == "MSigDB") {
        return(
          get_mgsigdb_gsets(
            species = species, 
            db_species= db_species, 
            collection = collection, 
            subcollection = subcollection
          )
        )
    } else {
        stop("As of this version, this function is implemented to get data from KEGG, Reactome and MSigDB only")
    }
}


================================================
FILE: R/enrichment.R
================================================
#' Hypergeometric Distribution-based Hypothesis Testing
#'
#' @param term_genes vector of genes in the selected term gene set
#' @param chosen_genes vector containing the set of input genes
#' @param background_genes vector of background genes (i.e. universal set of
#' genes in the experiment)
#'
#' @return the p-value as determined using the hypergeometric distribution.
#'
#' @details To determine whether the \code{chosen_genes} are enriched
#' (compared to a background pool of genes) in the \code{term_genes}, the
#' hypergeometric distribution is assumed and the appropriate p value
#' (the value under the right tail) is calculated and returned.
#'
#' @export
#'
#' @examples
#' hyperg_test(letters[1:5], letters[2:5], letters)
#' hyperg_test(letters[1:5], letters[2:10], letters)
#' hyperg_test(letters[1:5], letters[2:13], letters)
hyperg_test <- function(term_genes, chosen_genes, background_genes) {
    #### Argument checks
    if (!is.atomic(term_genes)) {
        stop("`term_genes` should be a vector")
    }
    if (!is.atomic(chosen_genes)) {
        stop("`chosen_genes` should be a vector")
    }
    if (!is.atomic(background_genes)) {
        stop("`background_genes` should be a vector")
    }

    if (length(term_genes) > length(background_genes)) {
        stop("`term_genes` cannot be larger than `background_genes`!")
    }
    if (length(chosen_genes) > length(background_genes)) {
        stop("`chosen_genes` cannot be larger than `background_genes`!")
    }

    #### Calculate p value
    term_genes_selected <- sum(chosen_genes %in% term_genes)
    term_genes_in_pool <- sum(term_genes %in% background_genes)
    tot_genes_in_pool <- length(background_genes)
    non_term_genes_in_pool <- tot_genes_in_pool - term_genes_in_pool
    num_selected_genes <- length(chosen_genes)

    p_val <- stats::phyper(term_genes_selected - 1, term_genes_in_pool, non_term_genes_in_pool,
        num_selected_genes, lower.tail = FALSE)
    return(p_val)
}

#' Perform Enrichment Analysis for a Single Gene Set
#'
#' @param input_genes The set of gene symbols to be used for enrichment
#'   analysis. In the scope of this package, these are genes that were
#'   identified for an active subnetwork
#' @param genes_by_term List that contains genes for each gene set. Names of
#'   this list are gene set IDs (default = kegg_genes)
#' @param term_descriptions Vector that contains term descriptions for the
#'   gene sets. Names of this vector are gene set IDs (default = kegg_descriptions)
#' @param adj_method correction method to be used for adjusting p-values.
#'   (default = 'bonferroni')
#' @param enrichment_threshold adjusted-p value threshold used when filtering
#'   enrichment results (default = 0.05)
#' @param sig_genes_vec vector of significant gene symbols. In the scope of this
#'   package, these are the input genes that were used for active subnetwork search
#' @param background_genes vector of background genes. In the scope of this package,
#'   the background genes are taken as all genes in the PIN
#'   (see \code{\link{enrichment_analyses}})
#'
#' @return A data frame that contains enrichment results
#' @export
#' @seealso \code{\link[stats]{p.adjust}} for adjustment of p values. See
#'   \code{\link{run_pathfindR}} for the wrapper function of the pathfindR
#'   workflow. \code{\link{hyperg_test}} for the details on hypergeometric
#'   distribution-based hypothesis testing.
#' @examples
#' enrichment(
#'   input_genes = c('PER1', 'PER2', 'CRY1', 'CREB1'),
#'   sig_genes_vec = 'PER1',
#'   background_genes = unlist(pathfindR.data::kegg_genes)
#' )
enrichment <- function(input_genes, genes_by_term = pathfindR.data::kegg_genes, term_descriptions = pathfindR.data::kegg_descriptions,
    adj_method = "bonferroni", enrichment_threshold = 0.05, sig_genes_vec, background_genes) {
    #### Argument checks input genes
    if (!is.atomic(input_genes)) {
        stop("`input_genes` should be a vector of gene symbols")
    }

    ## gene sets data
    if (!is.list(genes_by_term)) {
        stop("`genes_by_term` should be a list of term gene sets")
    }
    if (is.null(names(genes_by_term))) {
        stop("`genes_by_term` should be a named list (names are gene set IDs)")
    }

    if (!is.atomic(term_descriptions)) {
        stop("`term_descriptions` should be a vector of term gene descriptions")
    }
    if (is.null(names(term_descriptions))) {
        stop("`term_descriptions` should be a named vector (names are gene set IDs)")
    }

    if (length(genes_by_term) != length(term_descriptions)) {
        stop("The lengths of `genes_by_term` and `term_descriptions` should be the same")
    }
    if (any(names(genes_by_term) != names(term_descriptions))) {
        stop("The names of `genes_by_term` and `term_descriptions` should all be the same")
    }

    ## enrichment threshold
    if (!is.numeric(enrichment_threshold)) {
        stop("`enrichment_threshold` should be a numeric value between 0 and 1")
    }
    if (enrichment_threshold < 0 | enrichment_threshold > 1) {
        stop("`enrichment_threshold` should be between 0 and 1")
    }

    ## signif. genes and background (universal set) genes
    if (!is.atomic(sig_genes_vec)) {
        stop("`sig_genes_vec` should be a vector")
    }
    if (!is.atomic(background_genes)) {
        stop("`background_genes` should be a vector")
    }

    #### Obtain p values
    enrichment_res <- vapply(genes_by_term, hyperg_test, 0.1, input_genes, background_genes)
    enrichment_res <- as.data.frame(enrichment_res)
    colnames(enrichment_res) <- "p_value"

    # Adjust p values
    idx <- order(enrichment_res$p_value)
    enrichment_res <- enrichment_res[idx, , drop = FALSE]
    enrichment_res$adj_p <- stats::p.adjust(enrichment_res$p, method = adj_method)


    #### Filter by adj-p
    cond <- enrichment_res$adj_p <= enrichment_threshold
    # Empty case (if all adj-p > threshold)
    if (sum(cond) == 0) {
        return(NULL)
    }
    enrichment_res <- enrichment_res[cond, ]

    #### Add other columns Term IDs
    enrichment_res$ID <- rownames(enrichment_res)

    ## Term descriptions
    idx <- match(enrichment_res$ID, names(term_descriptions))
    enrichment_res$Term_Description <- term_descriptions[idx]

    # Fold enrinchment
    gset_for_fe <- genes_by_term[rownames(enrichment_res)]
    A <- vapply(gset_for_fe, function(gset) length(intersect(sig_genes_vec, gset)),
        1L)/length(sig_genes_vec)
    B <- vapply(gset_for_fe, function(gset) length(intersect(background_genes, gset)),
        1L)/length(background_genes)
    enrichment_res$Fold_Enrichment <- A/B

    # Non-significant Subnetwork Genes
    non_sig_snw_genes <- base::setdiff(input_genes, sig_genes_vec)
    for (i in base::seq_len(nrow(enrichment_res))) {
        tmp <- intersect(non_sig_snw_genes, genes_by_term[[enrichment_res$ID[i]]])
        enrichment_res$non_Signif_Snw_Genes[i] <- paste(tmp, collapse = ", ")
    }

    ## reorder columns
    to_order <- c("ID", "Term_Description", "Fold_Enrichment", "p_value", "adj_p",
        "non_Signif_Snw_Genes")
    enrichment_res <- enrichment_res[, to_order]

    return(enrichment_res)
}

#' Perform Enrichment Analyses on the Input Subnetworks
#'
#' @param snws a list of subnetwork genes (i.e., vectors of genes for each subnetwork)
#' @inheritParams enrichment
#' @inheritParams return_pin_path
#' @param list_active_snw_genes boolean value indicating whether or not to report
#' the non-significant active subnetwork genes for the active subnetwork which was enriched for
#' the given term with the lowest p value (default = \code{FALSE})
#'
#' @return a dataframe of combined enrichment results. Columns are: \describe{
#'   \item{ID}{ID of the enriched term}
#'   \item{Term_Description}{Description of the enriched term}
#'   \item{Fold_Enrichment}{Fold enrichment value for the enriched term}
#'   \item{p_value}{p value of enrichment}
#'   \item{adj_p}{adjusted p value of enrichment}
#'   \item{support}{the support (proportion of active subnetworks leading to enrichment over all subnetworks) for the gene set}
#'   \item{non_Signif_Snw_Genes (OPTIONAL)}{the non-significant active subnetwork genes, comma-separated}
#' }
#'
#' @export
#'
#' @seealso \code{\link{enrichment}} for the enrichment analysis for a single gene set
#'
#' @examples
#' enr_res <- enrichment_analyses(
#'   snws = example_active_snws[1:2],
#'   sig_genes_vec = example_pathfindR_input$Gene.symbol[1:25],
#'   pin_name_path = 'KEGG'
#' )
enrichment_analyses <- function(snws, sig_genes_vec, pin_name_path = "Biogrid", genes_by_term = pathfindR.data::kegg_genes,
    term_descriptions = pathfindR.data::kegg_descriptions, adj_method = "bonferroni",
    enrichment_threshold = 0.05, list_active_snw_genes = FALSE) {
    ### Argument check
    if (!is.logical(list_active_snw_genes)) {
        stop("`list_active_snw_genes` should be either TRUE or FALSE")
    }

    ### Load PIN Data
    pin_path <- return_pin_path(pin_name_path)
    pin <- utils::read.delim(file = pin_path, header = FALSE)

    background_genes <- unique(c(pin[, 1], pin[, 3]))

    # turn all to upper case for best match
    genes_by_term <- lapply(genes_by_term, base::toupper)
    sig_genes_vec <- base::toupper(sig_genes_vec)
    background_genes <- base::toupper(background_genes)

    ############ Enrichment per subnetwork
    enrichment_res <- lapply(snws, function(x) {
        enrichment(input_genes = base::toupper(x), genes_by_term = genes_by_term,
            term_descriptions = term_descriptions, adj_method = adj_method, enrichment_threshold = enrichment_threshold,
            sig_genes_vec = sig_genes_vec, background_genes = background_genes)
    })

    ### indices for snw.s
    if (length(enrichment_res) != 0) {
        for (i in seq_len(length(enrichment_res))) {
            if (!is.null(enrichment_res[[i]])) {
                enrichment_res[[i]]$snw_idx <- i
            }
        }
    }

    ############ Combine Enrichments Results for All Subnetworks
    enrichment_res <- Reduce(rbind, enrichment_res)

    ############ Process if non-empty
    if (!is.null(enrichment_res)) {
        ## calculate support values
        support <- tapply(enrichment_res$snw_idx, enrichment_res$ID, length)
        support <- support/length(snws)
        enrichment_res$support <- support[match(enrichment_res$ID, names(support))]
        enrichment_res$snw_idx <- NULL

        ## delete non_Signif_Snw_Genes if list_active_snw_genes == FALSE
        if (!list_active_snw_genes) {
            enrichment_res$non_Signif_Snw_Genes <- NULL
        }

        ## keep lowest p for each term
        idx <- order(enrichment_res$adj_p)
        enrichment_res <- enrichment_res[idx, ]
        enrichment_res <- enrichment_res[!duplicated(enrichment_res$ID), ]
    }
    return(enrichment_res)
}


#' Summarize Enrichment Results
#'
#' @param enrichment_res a dataframe of combined enrichment results. Columns are: \describe{
#'   \item{ID}{ID of the enriched term}
#'   \item{Term_Description}{Description of the enriched term}
#'   \item{Fold_Enrichment}{Fold enrichment value for the enriched term}
#'   \item{p_value}{p value of enrichment}
#'   \item{adj_p}{adjusted p value of enrichment}
#'   \item{non_Signif_Snw_Genes (OPTIONAL)}{the non-significant active subnetwork genes, comma-separated}
#' }
#' @inheritParams enrichment_analyses
#'
#' @return a dataframe of summarized enrichment results (over multiple iterations). Columns are: \describe{
#'   \item{ID}{ID of the enriched term}
#'   \item{Term_Description}{Description of the enriched term}
#'   \item{Fold_Enrichment}{Fold enrichment value for the enriched term}
#'   \item{occurrence}{the number of iterations that the given term was found to enriched over all iterations}
#'   \item{support}{the median support (proportion of active subnetworks leading to enrichment within an iteration) over all iterations}
#'   \item{lowest_p}{the lowest adjusted-p value of the given term over all iterations}
#'   \item{highest_p}{the highest adjusted-p value of the given term over all iterations}
#'   \item{non_Signif_Snw_Genes (OPTIONAL)}{the non-significant active subnetwork genes, comma-separated}
#' }
#' @export
#'
#' @examples
#' \dontrun{
#' summarize_enrichment_results(enrichment_res)
#' }
summarize_enrichment_results <- function(enrichment_res, list_active_snw_genes = FALSE) {
    message("## Processing the enrichment results over all iterations")

    ## Argument checks
    if (!is.logical(list_active_snw_genes)) {
        stop("`list_active_snw_genes` should be either TRUE or FALSE")
    }

    nec_cols <- c("ID", "Term_Description", "Fold_Enrichment", "p_value", "adj_p",
        "support")
    if (list_active_snw_genes) {
        nec_cols <- c(nec_cols, "non_Signif_Snw_Genes")
    }

    if (!is.data.frame(enrichment_res)) {
        stop("`enrichment_res` should be a data frame")
    }

    if (ncol(enrichment_res) != length(nec_cols)) {
        stop("`enrichment_res` should have exactly ", length(nec_cols), " columns")
    }

    if (!all(nec_cols %in% colnames(enrichment_res))) {
        stop("`enrichment_res` should have column names ", paste(dQuote(nec_cols),
            collapse = ", "))
    }

    ## Annotate lowest p, highest p, occurrence and median support
    final_res <- enrichment_res
    lowest_p <- tapply(enrichment_res$adj_p, enrichment_res$ID, min)
    highest_p <- tapply(enrichment_res$adj_p, enrichment_res$ID, max)
    occurrence <- tapply(enrichment_res$adj_p, enrichment_res$ID, length)
    support <- tapply(enrichment_res$support, enrichment_res$ID, stats::median)

    matched_idx <- match(final_res$ID, names(lowest_p))
    final_res$lowest_p <- as.numeric(lowest_p[matched_idx])

    matched_idx <- match(final_res$ID, names(highest_p))
    final_res$highest_p <- as.numeric(highest_p[matched_idx])

    matched_idx <- match(final_res$ID, names(occurrence))
    final_res$occurrence <- as.numeric(occurrence[matched_idx])

    matched_idx <- match(final_res$ID, names(support))
    final_res$support <- as.numeric(support[matched_idx])

    ## reorder columns
    keep <- c("ID", "Term_Description", "Fold_Enrichment", "occurrence", "support",
        "lowest_p", "highest_p")
    if (list_active_snw_genes) {
        keep <- c(keep, "non_Signif_Snw_Genes")
    }
    final_res <- final_res[, keep]

    ## keep data with lowest p-value over all iterations
    final_res <- final_res[order(final_res$lowest_p), ]
    final_res <- final_res[!duplicated(final_res$ID), ]
    rownames(final_res) <- NULL

    return(final_res)
}


================================================
FILE: R/pathfindr.R
================================================
#' pathfindR: A package for Enrichment Analysis Utilizing Active Subnetworks
#'
#' pathfindR is a tool for active-subnetwork-oriented gene set enrichment analysis.
#' The main aim of the package is to identify active subnetworks in a
#' protein-protein interaction network using a user-provided list of genes
#' and associated p values then performing enrichment analyses on the identified
#' subnetworks, discovering enriched terms (i.e. pathways, gene ontology, TF target
#' gene sets etc.) that possibly underlie the phenotype of interest.
#'
#' For analysis on non-Homo sapiens organisms, pathfindR offers utility functions
#' for obtaining organism-specific PIN data and organism-specific gene sets data.
#'
#' pathfindR also offers functionalities to cluster the enriched terms and
#' identify representative terms in each cluster, to score the enriched terms
#' per sample and to visualize analysis results.
#'
#'
#' @seealso See \code{\link{run_pathfindR}} for details on the pathfindR
#' active-subnetwork-oriented enrichment analysis
#' See \code{\link{cluster_enriched_terms}} for details on methods of enriched
#' terms clustering to define clusters of biologically-related terms
#' See \code{\link{score_terms}} for details on agglomerated score calculation
#' for enriched terms to investigate how a gene set is altered in a given sample
#' (or in cases vs. controls)
#' See \code{\link{term_gene_heatmap}} for details on visualization of the heatmap
#' of enriched terms by involved genes
#' See \code{\link{term_gene_graph}} for details on visualizing terms and
#' term-related genes as a graph to determine the degree of overlap between the
#' enriched terms by identifying shared and/or distinct significant genes
#' See \code{\link{UpSet_plot}} for details on creating an UpSet plot of the
#' enriched terms.
#' See \code{\link{get_pin_file}} for obtaining organism-specific PIN data and
#' \code{\link{get_gene_sets_list}} for obtaining organism-specific gene sets data
#' @import pathfindR.data
#' @name pathfindR
"_PACKAGE"

globalVariables(c("for_coloring", "size"))


================================================
FILE: R/scoring.R
================================================
#' Calculate Agglomerated Scores of Enriched Terms for Each Subject
#'
#' @param enrichment_table a data frame that must contain the 3 columns below: \describe{
#'   \item{Term_Description}{Description of the enriched term (necessary if \code{use_description = TRUE})}
#'   \item{ID}{ID of the enriched term (necessary if \code{use_description = FALSE})}
#'   \item{Up_regulated}{the up-regulated genes in the input involved in the given term's gene set, comma-separated}
#'   \item{Down_regulated}{the down-regulated genes in the input involved in the given term's gene set, comma-separated}
#' }
#' @param exp_mat the experiment (e.g., gene expression/methylation) matrix.
#' Columns are samples and rows are genes. Column names must contain sample
#' names and row names must contain the gene symbols.
#' @param cases (Optional) A vector of sample names that are cases in the
#' case/control experiment. (default = NULL)
#' @param use_description Boolean argument to indicate whether term descriptions
#'  (in the 'Term_Description' column) should be used. (default = \code{FALSE})
#' @param plot_hmap Boolean value to indicate whether or not to draw the
#' heatmap plot of the scores. (default = TRUE)
#' @param ... Additional arguments for \code{\link{plot_scores}} for aesthetics
#' of the heatmap plot
#'
#' @return Matrix of agglomerated scores of each enriched term per sample.
#' Columns are samples, rows are enriched terms. Optionally, displays a heatmap
#' of this matrix.
#'
#' @section Conceptual Background:
#' For an experiment matrix (containing expression, methylation, etc. values),
#' the rows of which are genes and the columns of which are samples,
#' we denote: \itemize{
#' \item E as a matrix of size \ifelse{html}{\out{m x n}}{\eqn{m \times n}}
#' \item G as the set of all genes in the experiment \ifelse{html}{\out{G = E<sub>i.</sub>,  i &#8712; [1, m]}}{\eqn{G = E_{i\cdot},  \ \ i \in [1, m]}}
#' \item S as the set of all samples in the experiment \ifelse{html}{\out{S = E<sub>.j</sub>,  i &#8712; [1, n]}}{\eqn{S = E_{j\cdot},  \ \ \in [1, n]}}
#' }
#'
#' We next define the gene score matrix GS (the standardized experiment matrix,
#' also of size \ifelse{html}{\out{m x n}}{\eqn{m \times n}}) as:
#'
#' \ifelse{html}{\out{GS<sub>gs</sub> = (E<sub>gs</sub> - &#x113;<sub>g</sub>) / s<sub>g</sub>}}{\eqn{GS_{gs} = \frac{E_{gs} - \bar{e_g}}{s_g}}}
#'
#' where \ifelse{html}{\out{g &#8712; G}}{\eqn{g \in G}}, \ifelse{html}{\out{s &#8712; S}}{\eqn{s \in S}},
#' \ifelse{html}{\out{&#x113;<sub>g</sub>}}{\eqn{\bar{e_g}}} is the mean of
#' all values for gene g and \ifelse{html}{\out{s<sub>g</sub>}}{\eqn{\bar{s_g}}}
#' is the standard deviation of all values for gene g.
#'
#' We next denote T to be a set of terms (where each \ifelse{html}{\out{t &#8712; T}}{\eqn{t \in T}}
#' is a set of term-related genes, i.e.,
#' \ifelse{html}{\out{t = \{g<sub>x</sub>, ..., g<sub>y</sub>\} &sub; G}}{\eqn{t = \{g_x, ..., g_y\} \subset G}})
#' and finally define the agglomerated term scores matrix TS (where rows
#' correspond to genes and columns corresponds to samples s.t. the matrix has size
#' \ifelse{html}{\out{|T| x n}}{\eqn{|T| \times n}}) as:
#'
#' \ifelse{html}{\out{TS<sub>ts</sub> = 1/|t| &#x2211; <sub>g &#8712; t</sub> GS<sub>gs</sub>}}{\eqn{TS_{ts} = \frac{1}{|t|}\sum_{g \in t} GS_{gs}}},
#' where \ifelse{html}{\out{t &#8712; T}}{\eqn{t \in T}} and \ifelse{html}{\out{s &#8712; S}}{\eqn{s \in S}}.
#'
#' @export
#'
#' @examples
#' score_matrix <- score_terms(
#'   example_pathfindR_output,
#'   example_experiment_matrix,
#'   plot_hmap = FALSE
#' )
score_terms <- function(enrichment_table, exp_mat, cases = NULL, use_description = FALSE,
    plot_hmap = TRUE, ...) {
    #### Argument Checks
    if (!is.logical(use_description)) {
        stop("`use_description` should either be TRUE or FALSE")
    }

    if (!is.logical(plot_hmap)) {
        stop("`plot_hmap` should either be TRUE or FALSE")
    }

    if (!is.data.frame(enrichment_table)) {
        stop("`enrichment_table` should be a data frame of enrichment results")
    }
    ID_column <- ifelse(use_description, "Term_Description", "ID")
    nec_cols <- c(ID_column, "Up_regulated", "Down_regulated")
    if (!all(nec_cols %in% colnames(enrichment_table))) {
        stop("`enrichment_table` should contain all of ", paste(dQuote(nec_cols),
            collapse = ", "))
    }

    if (!is.matrix(exp_mat)) {
        stop("`exp_mat` should be a matrix")
    }

    if (!is.null(cases)) {
        if (!is.atomic(cases)) {
            stop("`cases` should be a vector")
        }

        if (!all(cases %in% colnames(exp_mat))) {
            stop("Missing `cases` in `exp_mat`")
        }
    }

    ## fix duplicated term descriptions (if using description)
    if (use_description) {
        dup_desc <- enrichment_table$Term_Description[duplicated(enrichment_table$Term_Description)]

        tmp <- ifelse(enrichment_table$Term_Description %in% dup_desc, paste0(enrichment_table$Term_Description,
            "_", enrichment_table$ID), enrichment_table$Term_Description)
        enrichment_table$Term_Description <- tmp
    }

    #### Create score matrix
    all_scores_matrix <- c()
    for (i in base::seq_len(nrow(enrichment_table))) {
        # Get signif. genes
        up_genes <- enrichment_table$Up_regulated[i]
        down_genes <- enrichment_table$Down_regulated[i]
        up_genes <- unlist(strsplit(up_genes, ", "))
        down_genes <- unlist(strsplit(down_genes, ", "))

        genes <- c(up_genes, down_genes)

        # convert gene symbols to upper case for comparison
        genes <- toupper(genes)
        exp_mat_genes <- rownames(exp_mat)
        exp_mat_genes <- toupper(exp_mat_genes)

        # some genes may not be in exp. matrix
        genes <- genes[genes %in% exp_mat_genes]

        if (length(genes) != 0) {
            # subset exp. matrix to include only genes
            sub_mat <- exp_mat[exp_mat_genes %in% genes, , drop = FALSE]

            current_term_score_matrix <- c()
            for (gene in genes) {
                gene_vec <- sub_mat[toupper(rownames(sub_mat)) == gene, ]
                gene_vec <- as.numeric(gene_vec)
                names(gene_vec) <- colnames(sub_mat)

                # calculate mean and sd across samples
                gene_mean <- base::mean(gene_vec)
                gene_sd <- stats::sd(gene_vec)

                gene_scores <- vapply(gene_vec, function(x) (x - gene_mean)/gene_sd,
                  1.2)
                current_term_score_matrix <- rbind(current_term_score_matrix, gene_scores)
                rownames(current_term_score_matrix)[nrow(current_term_score_matrix)] <- gene
            }

            current_term_scores <- apply(current_term_score_matrix, 2, base::mean)
            all_scores_matrix <- rbind(all_scores_matrix, current_term_scores)
            rownames(all_scores_matrix)[nrow(all_scores_matrix)] <- enrichment_table[i,
                ID_column]
        }
    }

    if (!is.null(cases)) {
        ## order as cases, then controls
        match1 <- match(cases, colnames(all_scores_matrix))
        match2 <- setdiff(base::seq_len(ncol(all_scores_matrix)), match1)
        all_scores_matrix <- all_scores_matrix[, c(match1, match2)]
    }

    if (plot_hmap) {
        heatmap <- plot_scores(score_matrix = all_scores_matrix, cases = cases, ...)
        graphics::plot(heatmap)
    }

    return(all_scores_matrix)
}

#' Plot the Heatmap of Score Matrix of Enriched Terms per Sample
#'
#' @param score_matrix Matrix of agglomerated enriched term scores per sample. Columns are
#' samples, rows are enriched terms
#' @inheritParams score_terms
#' @param label_samples Boolean value to indicate whether or not to label the
#' samples in the heatmap plot (default = TRUE)
#' @param case_title Naming of the 'Case' group (as in \code{cases}) (default = 'Case')
#' @param control_title Naming of the 'Control' group (default = 'Control')
#' @param low a string indicating the color of 'low' values in the coloring gradient (default = 'green')
#' @param mid a string indicating the color of 'mid' values in the coloring gradient (default = 'black')
#' @param high a string indicating the color of 'high' values in the coloring gradient (default = 'red')
#'
#' @return A `ggplot2` object containing the heatmap plot. x-axis indicates
#' the samples. y-axis indicates the enriched terms. 'Score' indicates the
#' score of the term in a given sample. If \code{cases} are provided, the plot is
#' divided into 2 facets, named by \code{case_title} and \code{control_title}.
#'
#' @import ggplot2
#' @export
#'
#' @examples
#' score_matrix <- score_terms(
#'   example_pathfindR_output,
#'   example_experiment_matrix,
#'   plot_hmap = FALSE
#' )
#' hmap <- plot_scores(score_matrix)
plot_scores <- function(score_matrix, cases = NULL, label_samples = TRUE, case_title = "Case",
    control_title = "Control", low = "green", mid = "black", high = "red") {
    #### Argument Checks
    if (!is.matrix(score_matrix)) {
        stop("`score_matrix` should be a matrix")
    }

    if (!is.null(cases)) {
        if (!is.atomic(cases)) {
            stop("`cases` should be a vector")
        }

        if (!all(cases %in% colnames(score_matrix))) {
            stop("Missing `cases` in `score_matrix`")
        }
    }

    if (!is.logical(label_samples)) {
        stop("`label_samples` should be TRUE or FALSE")
    }

    if (!is.character(case_title) | length(case_title) != 1) {
        stop("`case_title` should be a single character value")
    }

    if (!is.character(control_title) | length(control_title) != 1) {
        stop("`control_title` should be a single character value")
    }

    if (!isColor(low)) {
      stop("`low` should be a valid color")
    }

    if (!isColor(mid)) {
      stop("`mid` should be a valid color")
    }

    if (!isColor(high)) {
      stop("`high` should be a valid color")
    }

    #### Create plot sort according to activity (up/down)
    if (!is.null(cases)) {
        tmp <- rowMeans(score_matrix[, cases, drop = FALSE])
        score_matrix <- score_matrix[c(which(tmp >= 0), which(tmp < 0)), ]
    }

    ## transform the matrix
    var_names <- list()
    var_names[["Term"]] <- factor(rownames(score_matrix), levels = rev(rownames(score_matrix)))
    var_names[["Sample"]] <- factor(colnames(score_matrix), levels = colnames(score_matrix))

    score_df <- expand.grid(var_names, KEEP.OUT.ATTRS = FALSE, stringsAsFactors = FALSE)
    scores <- as.vector(score_matrix)
    scores <- data.frame(scores)
    score_df <- cbind(score_df, scores)
    if (!is.null(cases)) {
        score_df$Type <- ifelse(score_df$Sample %in% cases, case_title, control_title)
        score_df$Type <- factor(score_df$Type, levels = c(case_title, control_title))
    }

    g <- ggplot2::ggplot(score_df, ggplot2::aes(x = .data$Sample, y = .data$Term))
    g <- g + ggplot2::geom_tile(ggplot2::aes(fill = .data$scores), color = "white")
    g <- g + ggplot2::scale_fill_gradient2(low = low, mid = mid, high = high)
    g <- g + ggplot2::theme(axis.title.x = ggplot2::element_blank(), axis.title.y = ggplot2::element_blank(),
        axis.text.x = ggplot2::element_text(angle = 45, hjust = 1), legend.title = ggplot2::element_text(size = 10),
        legend.text = ggplot2::element_text(size = 12))
    g <- g + ggplot2::labs(fill = "Score")
    if (!is.null(cases)) {
        g <- g + ggplot2::facet_grid(~Type, scales = "free_x", space = "free")
        g <- g + ggplot2::theme(strip.text.x = ggplot2::element_text(size = 12, face = "bold"))
    }
    if (!label_samples) {
        g <- g + ggplot2::theme(axis.text.x = ggplot2::element_blank(), axis.ticks.x = ggplot2::element_blank())
    }
    return(g)
}


================================================
FILE: R/utility.R
================================================
#' Active Subnetwork Search + Enrichment Analysis Wrapper for a Single Iteration
#'
#' @param i current iteration index (default = \code{NULL})
#' @param dirs vector of directories for parallel runs
#' @inheritParams active_snw_search
#' @inheritParams enrichment_analyses
#' @inheritParams active_snw_enrichment_wrapper
#'
#' @return Data frame of enrichment results using active subnetwork search results
single_iter_wrapper <- function(i = NULL, dirs, input_processed, pin_path, score_quan_thr,
    sig_gene_thr, search_method, silent_option, use_all_positives, geneInitProbs,
    saTemp0, saTemp1, saIter, gaPop, gaIter, gaThread, gaCrossover, gaMut, grMaxDepth,
    grSearchDepth, grOverlap, grSubNum, gset_list, adj_method, enrichment_threshold,
    list_active_snw_genes) {
    snws_file <- "active_snws"
    dir_for_parallel_run <- NULL
    if (!is.null(i)) {
        snws_file <- paste0("active_snws_", i)
        dir_for_parallel_run <- dirs[i]
    }
    snws <- active_snw_search(input_for_search = input_processed, pin_name_path = pin_path,
        snws_file = snws_file, dir_for_parallel_run = dir_for_parallel_run, score_quan_thr = score_quan_thr,
        sig_gene_thr = sig_gene_thr, search_method = search_method, seedForRandom = ifelse(is.null(i),
            1234, i), silent_option = silent_option, use_all_positives = use_all_positives,
        geneInitProbs = ifelse(!is.null(i), geneInitProbs[i], geneInitProbs), saTemp0 = saTemp0,
        saTemp1 = saTemp1, saIter = saIter, gaPop = gaPop, gaIter = gaIter, gaThread = gaThread,
        gaCrossover = gaCrossover, gaMut = gaMut, grMaxDepth = grMaxDepth, grSearchDepth = grSearchDepth,
        grOverlap = grOverlap, grSubNum = grSubNum)
    enrichment_res <- enrichment_analyses(snws = snws, sig_genes_vec = input_processed$GENE,
        pin_name_path = pin_path, genes_by_term = gset_list$genes_by_term, term_descriptions = gset_list$term_descriptions,
        adj_method = adj_method, enrichment_threshold = enrichment_threshold, list_active_snw_genes = list_active_snw_genes)
    return(enrichment_res)
}



#' Wrapper for Active Subnetwork Search + Enrichment over Single/Multiple Iteration(s)
#'
#' @param input_processed processed input data frame
#' @param pin_path path/to/PIN/file
#' @param gset_list list for gene sets
#' @param disable_parallel boolean to indicate whether to disable parallel runs
#'  via \code{foreach} (default = FALSE)
#' @inheritParams run_pathfindR
#' @inheritParams active_snw_search
#' @inheritParams enrichment_analyses
#' @param iterations number of iterations for active subnetwork search and
#'  enrichment analyses (Default = 10)
#' @param n_processes optional argument for specifying the number of processes
#'  used by foreach. If not specified, the function determines this
#'  automatically (Default == NULL. Gets set to 1 for Genetic Algorithm)
#'
#' @return Data frame of combined pathfindR enrichment results
active_snw_enrichment_wrapper <- function(input_processed, pin_path, gset_list, enrichment_threshold,
    list_active_snw_genes, adj_method = "bonferroni", search_method = "GR", disable_parallel = FALSE,
    use_all_positives = FALSE, iterations = 10, n_processes = NULL, score_quan_thr = 0.8,
    sig_gene_thr = 0.02, saTemp0 = 1, saTemp1 = 0.01, saIter = 10000, gaPop = 400,
    gaIter = 200, gaThread = 5, gaCrossover = 1, gaMut = 0, grMaxDepth = 1, grSearchDepth = 1,
    grOverlap = 0.5, grSubNum = 1000, silent_option = TRUE) {
    message("## Performing Active Subnetwork Search and Enrichment")
    ############ Argument checks Active Subnetwork Search Method
    valid_mets <- c("GR", "SA", "GA")
    if (!search_method %in% valid_mets) {
        stop("`search_method` should be one of ", paste(dQuote(valid_mets), collapse = ", "))
    }

    ## If search_method is GA, set iterations as 1
    if (search_method == "GA") {
        warning("`iterations` is set to 1 because `search_method = \"GA\"`", call. = FALSE)
        iterations <- 1
    }

    if (!is.null(n_processes)) {
        if (!is.numeric(n_processes)) {
            stop("`n_processes` should be either NULL or a positive integer")
        }
        if (n_processes < 1) {
            stop("`n_processes` should be > 1")
        }
    }

    # calculate the number of processes, if necessary
    if (is.null(n_processes)) {
        n_processes <- parallel::detectCores() - 1
    }

    ## If iterations < n_processes, set n_processes to iterations
    if (iterations < n_processes & iterations != 1) {
        message("`n_processes` is set to `iterations` because `iterations` < `n_processes`")
        n_processes <- iterations
    }

    if (!is.logical(use_all_positives)) {
        stop("`use_all_positives` should be either TRUE or FALSE")
    }

    if (!is.logical(silent_option)) {
        stop("`silent_option` should be either TRUE or FALSE")
    }

    if (!is.logical(disable_parallel)) {
        stop("`disable_parallel` should be either TRUE or FALSE")
    }

    if (!is.numeric(iterations)) {
        stop("`iterations` should be a positive integer")
    }
    if (iterations < 1) {
        stop("`iterations` should be >= 1")
    }

    geneInitProbs <- 0.1
    dirs <- c()
    if (iterations > 1) {
        geneInitProbs <- seq(from = 0.01, to = 0.2, length.out = iterations)

        for (i in base::seq_len(iterations)) {
            dir_i <- file.path("active_snw_searches", paste0("Iteration_", i))
            dir.create(dir_i, recursive = TRUE, showWarnings = FALSE)
            dirs <- c(dirs, dir_i)
        }
    }

    if (iterations == 1) {
        combined_res <- single_iter_wrapper(i = NULL, dirs, input_processed, pin_path,
            score_quan_thr, sig_gene_thr, search_method, silent_option, use_all_positives,
            geneInitProbs, saTemp0, saTemp1, saIter, gaPop, gaIter, gaThread, gaCrossover,
            gaMut, grMaxDepth, grSearchDepth, grOverlap, grSubNum, gset_list, adj_method,
            enrichment_threshold, list_active_snw_genes)
    } else {
        if (!disable_parallel) {
            cl <- parallel::makeCluster(n_processes, setup_strategy = "sequential")
            doParallel::registerDoParallel(cl)
            `%dopar%` <- foreach::`%dopar%`
            combined_res <- foreach::foreach(i = 1:iterations, .combine = rbind,
                .packages = "pathfindR") %dopar% {
                single_iter_wrapper(i, dirs, input_processed, pin_path, score_quan_thr,
                  sig_gene_thr, search_method, silent_option, use_all_positives,
                  geneInitProbs, saTemp0, saTemp1, saIter, gaPop, gaIter, gaThread,
                  gaCrossover, gaMut, grMaxDepth, grSearchDepth, grOverlap, grSubNum,
                  gset_list, adj_method, enrichment_threshold, list_active_snw_genes)
            }
            parallel::stopCluster(cl)
        } else {
            combined_res <- c()
            for (i in 1:iterations) {
                current_res <- single_iter_wrapper(i, dirs, score_quan_thr, sig_gene_thr,
                  search_method, silent_option, use_all_positives, geneInitProbs,
                  saTemp0, saTemp1, saIter, gaPop, gaIter, gaThread, gaCrossover,
                  gaMut, grMaxDepth, grSearchDepth, grOverlap, grSubNum, gset_list,
                  adj_method, enrichment_threshold, list_active_snw_genes)
                combined_res <- rbind(combined_res, current_res)
            }
        }
    }
    return(combined_res)
}



#' Configure Output Directory Name
#'
#' @inheritParams run_pathfindR
#'
#' @return /path/to/output/dir
configure_output_dir <- function(output_dir = NULL) {
    output_dir_init <- output_dir
    output_dir <- ifelse(is.null(output_dir), file.path(tempdir(check = TRUE), "pathfindR_results"),
        output_dir)
    dir_changed <- FALSE
    while (dir.exists(output_dir)) {
        output_dir <- sub("/$", "", output_dir)
        if (grepl("\\(\\d+\\)$", output_dir)) {
            output_dir <- unlist(strsplit(output_dir, "\\("))
            suffix <- as.numeric(sub("\\)", "", output_dir[2])) + 1
            output_dir <- paste0(output_dir[1], "(", suffix, ")")
        } else {
            output_dir <- paste0(output_dir, "(1)")
        }
        dir_changed <- TRUE
    }

    if (dir_changed & !is.null(output_dir_init)) {
        message(paste0("There is already a directory named \"", output_dir_init,
            "\".\nWriting the result to \"", output_dir, "\" not to overwrite any previous results."))
    }
    return(output_dir)
}

#' Create HTML Report of pathfindR Results
#'
#' @inheritParams run_pathfindR
#' @param input_processed processed input data frame
#' @param final_res final pathfindR result data frame
#' @param dir_for_report directory to render the report in
create_HTML_report <- function(input, input_processed, final_res, dir_for_report) {
    message("## Creating HTML report")
    rmarkdown::render(input = system.file("rmd", "results.Rmd", package = "pathfindR"),
        output_dir = dir_for_report)
    rmarkdown::render(input = system.file("rmd", "enriched_terms.Rmd", package = "pathfindR"),
        params = list(df = final_res), output_dir = dir_for_report)
    rmarkdown::render(input = system.file("rmd", "conversion_table.Rmd", package = "pathfindR"),
        params = list(df = input_processed, original_df = input), output_dir = dir_for_report)
}

#' Input Testing
#'
#' @param input the input data that pathfindR uses. The input must be a data
#'   frame with three columns: \enumerate{
#'   \item Gene Symbol (Gene Symbol)
#'   \item Change value, e.g. log(fold change) (OPTIONAL)
#'   \item p value, e.g. adjusted p value associated with differential expression
#' }
#' @param p_val_threshold the p value threshold to use when filtering
#'   the input data frame. Must a numeric value between 0 and 1. (default = 0.05)
#'
#' @return Only checks if the input and the threshold follows the required
#'   specifications.
#' @export
#' @seealso See \code{\link{run_pathfindR}} for the wrapper function of the
#'   pathfindR workflow
#' @examples
#' input_testing(example_pathfindR_input, 0.05)
input_testing <- function(input, p_val_threshold = 0.05) {
    message("## Testing input")

    if (!is.data.frame(input)) {
        stop("the input is not a data frame")
    }

    if (ncol(input) != 2 & ncol(input) != 3) {
        stop("the input should have 2 or 3 columns")
    }

    if (nrow(input) < 2) {
        stop("There must be at least 2 rows (genes) in the input data frame")
    }

    if (!is.numeric(p_val_threshold)) {
        stop("`p_val_threshold` must be a numeric value between 0 and 1")
    }

    if (p_val_threshold > 1 | p_val_threshold < 0) {
        stop("`p_val_threshold` must be between 0 and 1")
    }

    # if changes are provided, p vals are in col. 3, else in col. 2
    p_column <- ifelse(ncol(input) == 3, 3, 2)

    if (any(is.na(input[, p_column]))) {
        stop("p values cannot contain NA values")
    }

    if (!all(is.numeric(input[, p_column]))) {
        stop("p values must all be numeric")
    }

    if (any(input[, p_column] > 1 | input[, p_column] < 0)) {
        stop("p values must all be between 0 and 1")
    }

    message("The input looks OK")
}

#' Process Input
#' @inheritParams input_testing
#' @inheritParams active_snw_search
#' @inheritParams return_pin_path
#' @param convert2alias boolean to indicate whether or not to convert gene symbols
#' in the input that are not found in the PIN to an alias symbol found in the PIN
#' (default = TRUE) IMPORTANT NOTE: the conversion uses human gene symbols/alias symbols.
#'
#' @return This function first filters the input so that all p values are less
#'   than or equal to the threshold. Next, gene symbols that are not found in
#'   the PIN are identified. If aliases of these gene symbols are found in the
#'   PIN, the symbols are converted to the corresponding aliases. The
#'   resulting data frame containing the original gene symbols, the updated
#'   symbols, change values and p values is then returned.
#' @export
#'
#' @seealso See \code{\link{run_pathfindR}} for the wrapper function of the
#'   pathfindR workflow
#'
#' @examples
#' processed_df <- input_processing(
#'   input = example_pathfindR_input[1:5, ],
#'   pin_name_path = 'KEGG'
#' )
#' processed_df <- input_processing(
#'   input = example_pathfindR_input[1:5, ],
#'   pin_name_path = 'KEGG',
#'   convert2alias = FALSE
#' )
input_processing <- function(input, p_val_threshold = 0.05, pin_name_path = "Biogrid",
    convert2alias = TRUE) {
    message("## Processing input. Converting gene symbols,
          if necessary (and if human gene symbols provided)")

    if (!is.logical(convert2alias)) {
        stop("`convert2alias` should be either TRUE or FALSE")
    }

    pin_path <- return_pin_path(pin_name_path)

    if (ncol(input) == 2) {
        input <- data.frame(GENE = input[, 1], CHANGE = rep(1e+06, nrow(input)),
            P_VALUE = input[, 2])
    }

    colnames(input) <- c("GENE", "CHANGE", "P_VALUE")

    ## Turn GENE into character
    if (is.factor(input$GENE)) {
        warning("The gene column was turned into character from factor.", call. = FALSE)
        input$GENE <- as.character(input$GENE)
    }

    message("Number of genes provided in input: ", nrow(input))
    ## Discard larger than p-value threshold
    if (sum(input$P_VALUE <= p_val_threshold) == 0) {
        stop("No input p value is lower than the provided threshold (", p_val_threshold,
            ")")
    }
    input <- input[input$P_VALUE <= p_val_threshold, ]
    message("Number of genes in input after p-value filtering: ", nrow(input))

    ## Choose lowest p for each gene
    if (anyDuplicated(input$GENE)) {
        warning("Duplicated genes found! The lowest p value for each gene was selected",
            call. = FALSE)

        input <- input[order(input$P_VALUE, decreasing = FALSE), ]
        input <- input[!duplicated(input$GENE), ]
    }

    ## Fix p < 1e-13
    if (any(input$P_VALUE < 1e-13)) {
        message("pathfindR cannot handle p values < 1e-13. These were changed to 1e-13")
        input$P_VALUE <- ifelse(input$P_VALUE < 1e-13, 1e-13, input$P_VALUE)
    }

    ## load and prep pin
    pin <- utils::read.delim(file = pin_path, header = FALSE)

    ## Genes not in pin
    PIN_genes <- c(base::toupper(pin[, 1]), base::toupper(pin[, 3]))
    missing_symbols <- input$GENE[!base::toupper(input$GENE) %in% PIN_genes]
    non_missing_symbols <- input$GENE[base::toupper(input$GENE) %in% PIN_genes]

    
    if (convert2alias & !requireNamespace("org.Hs.eg.db", quietly = TRUE)) {
      message(
        "Package 'org.Hs.eg.db' is not installed; returning input genes unchanged.\n",
        "Install it with:\n",
        "  if (!requireNamespace('BiocManager', quietly = TRUE)) install.packages('BiocManager')\n",
        "  BiocManager::install('org.Hs.eg.db')"
      )
      convert2alias <- FALSE
    }
    
    
    if (convert2alias & length(missing_symbols) != 0) {
        ## use SQL to get alias table and gene_info table (contains the
        ## symbols) first open the database connection
        db_con <- org.Hs.eg.db::org.Hs.eg_dbconn()
        ## the SQL query
        sql_query <- "SELECT * FROM alias, gene_info WHERE alias._id == gene_info._id;"
        ## execute the query on the database
        hsa_alias_df <- DBI::dbGetQuery(db_con, sql_query)

        select_alias <- function(result, converted, idx) {
            while (idx > 0) {
                if (!result[idx] %in% c(converted[, 2], non_missing_symbols)) {
                  return(result[idx])
                }
                idx <- idx - 1
            }
            return("NOT_FOUND")
        }

        ## loop for getting all symbols
        converted <- c()
        for (i in base::seq_len(length(missing_symbols))) {
            result <- hsa_alias_df[hsa_alias_df$alias_symbol == missing_symbols[i],
                c("alias_symbol", "symbol")]
            result <- hsa_alias_df[hsa_alias_df$symbol %in% result$symbol, c("alias_symbol",
                "symbol")]
            result <- result$alias_symbol[base::toupper(result$alias_symbol) %in%
                PIN_genes]
            ## avoid duplicate entries
            to_add <- select_alias(result, converted, length(result))
            converted <- rbind(converted, c(missing_symbols[i], to_add))
        }

        ## Convert to appropriate symbol
        input$new_gene <- input$GENE
        input$new_gene[match(converted[, 1], input$new_gene)] <- converted[, 2]
    } else {
        input$new_gene <- ifelse(input$GENE %in% missing_symbols, "NOT_FOUND", input$GENE)
    }

    ## number and percent still missing
    n <- sum(input$new_gene == "NOT_FOUND")
    perc <- n/nrow(input) * 100

    if (n == nrow(input)) {
        stop("None of the genes were in the PIN\nPlease check your gene symbols")
    }

    ## Give out warning indicating the number of still missing
    if (n != 0) {
        message(paste0("Could not find any interactions for ", n, " (", round(perc,
            2), "%) genes in the PIN"))
    } else {
        message(paste0("Found interactions for all genes in the PIN"))
    }

    ## reorder columns
    input <- input[, c(1, 4, 2, 3)]
    colnames(input) <- c("old_GENE", "GENE", "CHANGE", "P_VALUE")

    input <- input[input$GENE != "NOT_FOUND", ]

    ## Keep lowest p value for duplicated genes
    input <- input[order(input$P_VALUE), ]
    input <- input[!duplicated(input$GENE), ]

    ## Check that at least two genes remain
    if (nrow(input) < 2) {
        stop("After processing, 1 gene (or no genes) could be mapped to the PIN")
    }

    message("Final number of genes in input: ", nrow(input))

    return(input)
}

#' Annotate the Affected Genes in the Provided Enriched Terms
#'
#' Function to annotate the involved affected (input) genes in each term.
#'
#' @param result_df data frame of enrichment results.
#'  The only must-have column is 'ID'.
#' @param input_processed input data processed via \code{\link{input_processing}}
#' @param genes_by_term List that contains genes for each gene set. Names of
#'   this list are gene set IDs (default = kegg_genes)
#'
#' @return The original data frame with two additional columns:  \describe{
#'   \item{Up_regulated}{the up-regulated genes in the input involved in the given term's gene set, comma-separated}
#'   \item{Down_regulated}{the down-regulated genes in the input involved in the given term's gene set, comma-separated}
#' }
#' @export
#'
#' @examples
#' example_gene_data <- example_pathfindR_input
#' colnames(example_gene_data) <- c('GENE', 'CHANGE', 'P_VALUE')
#'
#' annotated_result <- annotate_term_genes(
#'   result_df = example_pathfindR_output,
#'   input_processed = example_gene_data
#' )
annotate_term_genes <- function(result_df, input_processed, genes_by_term = pathfindR.data::kegg_genes) {
    message("## Annotating involved genes and visualizing enriched terms")
    ### Argument checks
    if (!is.data.frame(result_df)) {
        stop("`result_df` should be a data frame")
    }
    if (!"ID" %in% colnames(result_df)) {
        stop("`result_df` should contain an \"ID\" column")
    }

    if (!is.data.frame(input_processed)) {
        stop("`input_processed` should be a data frame")
    }
    if (!all(c("GENE", "CHANGE") %in% colnames(input_processed))) {
        stop("`input_processed` should contain the columns \"GENE\" and \"CHANGE\"")
    }

    if (!is.list(genes_by_term)) {
        stop("`genes_by_term` should be a list of term gene sets")
    }
    if (is.null(names(genes_by_term))) {
        stop("`genes_by_term` should be a named list (names are gene set IDs)")
    }

    ### Annotate up/down-regulated term-related genes Up/Down-regulated genes
    upreg <- base::toupper(input_processed$GENE[input_processed$CHANGE >= 0])
    downreg <- base::toupper(input_processed$GENE[input_processed$CHANGE < 0])

    ## Annotation
    annotated_df <- result_df
    annotated_df$Down_regulated <- annotated_df$Up_regulated <- NA
    for (i in base::seq_len(nrow(annotated_df))) {
        idx <- which(names(genes_by_term) == annotated_df$ID[i])
        temp <- genes_by_term[[idx]]
        annotated_df$Up_regulated[i] <- paste(temp[base::toupper(temp) %in% upreg],
            collapse = ", ")
        annotated_df$Down_regulated[i] <- paste(temp[base::toupper(temp) %in% downreg],
            collapse = ", ")
    }

    return(annotated_df)
}


#' Fetch Gene Set Objects
#'
#' Function for obtaining the gene sets per term and the term descriptions to
#' be used for enrichment analysis.
#'
#' @param gene_sets Name of the gene sets to be used for enrichment analysis.
#'  Available gene sets are 'KEGG', 'Reactome', 'BioCarta', 'GO-All',
#'  'GO-BP', 'GO-CC', 'GO-MF', 'cell_markers', 'mmu_KEGG' or 'Custom'.
#'  If 'Custom', the arguments \code{custom_genes} and \code{custom_descriptions}
#'  must be specified. (Default = 'KEGG')
#' @param min_gset_size minimum number of genes a term must contain (default = 10)
#' @param max_gset_size maximum number of genes a term must contain (default = 300)
#' @param custom_genes a list containing the genes involved in each custom
#'  term. Each element is a vector of gene symbols located in the given custom
#'  term. Names should correspond to the IDs of the custom terms.
#' @param custom_descriptions A vector containing the descriptions for each
#'  custom  term. Names of the vector should correspond to the IDs of the custom
#'  terms.
#'
#' @return a list containing 2 elements \describe{
#'   \item{genes_by_term}{list of vectors of genes contained in each term}
#'   \item{term_descriptions}{vector of descriptions per each term}
#' }
#'
#' @export
#'
#' @examples
#' KEGG_gset <- fetch_gene_set()
#' GO_MF_gset <- fetch_gene_set('GO-MF', min_gset_size = 20, max_gset_size = 100)
fetch_gene_set <- function(gene_sets = "KEGG", min_gset_size = 10, max_gset_size = 300,
    custom_genes = NULL, custom_descriptions = NULL) {
    ### Argument checks
    all_gs_opts <- c("KEGG", "Reactome", "BioCarta", "GO-All", "GO-BP", "GO-CC",
        "GO-MF", "cell_markers", "mmu_KEGG", "Custom")
    if (!gene_sets %in% all_gs_opts) {
        stop("`gene_sets` should be one of ", paste(dQuote(all_gs_opts), collapse = ", "))
    }

    if (!is.numeric(min_gset_size)) {
        stop("`min_gset_size` should be numeric")
    }
    if (!is.numeric(max_gset_size)) {
        stop("`max_gset_size` should be numeric")
    }


    ### Custom Gene Sets
    if (gene_sets == "Custom") {
        if (is.null(custom_genes) | is.null(custom_descriptions)) {
            stop("`custom_genes` and `custom_descriptions` must be provided if `gene_sets = \"Custom\"`")
        }

        if (!is.list(custom_genes)) {
            stop("`custom_genes` should be a list of term gene sets")
        }
        if (is.null(names(custom_genes))) {
            stop("`custom_genes` should be a named list (names are gene set IDs)")
        }

        if (!is.atomic(custom_descriptions)) {
            stop("`custom_descriptions` should be a vector of term gene descriptions")
        }
        if (is.null(names(custom_descriptions))) {
            stop("`custom_descriptions` should be a named vector (names are gene set IDs)")
        }

        # filter by size
        gset_lens <- vapply(custom_genes, length, 1)
        keep <- which(gset_lens >= min_gset_size & gset_lens <= max_gset_size)
        custom_genes <- custom_genes[keep]
        custom_descriptions <- custom_descriptions[names(custom_genes)]

        return(list(genes_by_term = custom_genes, term_descriptions = custom_descriptions))
    }

    ### Built-in Gene Sets GO gene sets
    if (grepl("^GO", gene_sets)) {
        genes_by_term <- pathfindR.data::go_all_genes

        GO_df <- pathfindR.data:::GO_all_terms_df
        term_descriptions <- GO_df$GO_term
        names(term_descriptions) <- GO_df$GO_ID

        if (gene_sets == "GO-BP") {
            tmp <- GO_df$GO_ID[GO_df$Category == "Process"]
            genes_by_term <- genes_by_term[tmp]
            term_descriptions <- term_descriptions[tmp]
        } else if (gene_sets == "GO-CC") {
            tmp <- GO_df$GO_ID[GO_df$Category == "Component"]
            genes_by_term <- genes_by_term[tmp]
            term_descriptions <- term_descriptions[tmp]
        } else if (gene_sets == "GO-MF") {
            tmp <- GO_df$GO_ID[GO_df$Category == "Function"]
            genes_by_term <- genes_by_term[tmp]
            term_descriptions <- term_descriptions[tmp]
        }

        ## non-GO (KEGG, Reactome, BioCarta, mmu_KEGG)
    } else {
        if (gene_sets == "KEGG") {
            genes_by_term <- pathfindR.data::kegg_genes
            term_descriptions <- pathfindR.data::kegg_descriptions
        } else if (gene_sets == "Reactome") {
            genes_by_term <- pathfindR.data::reactome_genes
            term_descriptions <- pathfindR.data::reactome_descriptions
        } else if (gene_sets == "BioCarta") {
            genes_by_term <- pathfindR.data::biocarta_genes
            term_descriptions <- pathfindR.data::biocarta_descriptions
        } else if (gene_sets == "mmu_KEGG") {
            genes_by_term <- pathfindR.data::mmu_kegg_genes
            term_descriptions <- pathfindR.data::mmu_kegg_descriptions
        } else {
            genes_by_term <- pathfindR.data::cell_markers_gsets
            term_descriptions <- pathfindR.data::cell_markers_descriptions
        }
    }

    # filter by size
    term_lens <- vapply(genes_by_term, length, 1)
    keep <- which(term_lens >= min_gset_size & term_lens <= max_gset_size)
    genes_by_term <- genes_by_term[keep]
    term_descriptions <- term_descriptions[names(genes_by_term)]

    return(list(genes_by_term = genes_by_term, term_descriptions = term_descriptions))
}

#' Return The Path to Given Protein-Protein Interaction Network (PIN)
#'
#' This function returns the absolute path/to/PIN.sif. While the default PINs are
#' 'Biogrid', 'STRING', 'GeneMania', 'IntAct', 'KEGG' and 'mmu_STRING'. The user can also
#' use any other PIN by specifying the 'path/to/PIN.sif'. All PINs to be used
#' in this package must formatted as SIF files: i.e. have 3 columns with no
#' header, no row names and be tab-separated. Columns 1 and 3 must be
#' interactors' gene symbols, column 2 must be a column with all
#' rows consisting of 'pp'.
#'
#' @param pin_name_path Name of the chosen PIN or absolute/path/to/PIN.sif. If PIN name,
#'   must be one of c('Biogrid', 'STRING', 'GeneMania', 'IntAct', 'KEGG', 'mmu_STRING'). If
#'   path/to/PIN.sif, the file must comply with the PIN specifications. (Default = 'Biogrid')
#'
#' @return The absolute path to chosen PIN.
#'
#' @export
#' @seealso See \code{\link{run_pathfindR}} for the wrapper function of the
#'   pathfindR workflow
#' @examples
#' \dontrun{
#' pin_path <- return_pin_path('GeneMania')
#' }
return_pin_path <- function(pin_name_path = "Biogrid") {
    ## Default PINs
    valid_opts <- c("Biogrid", "STRING", "GeneMania", "IntAct", "KEGG", "mmu_STRING",
        "/path/to/custom/SIF")
    if (pin_name_path %in% valid_opts[-length(valid_opts)]) {
        path <- file.path(tempdir(check = TRUE), paste0(pin_name_path, ".sif"))
        if (!file.exists(path)) {
            adj_list <- utils::getFromNamespace(paste0(tolower(pin_name_path), "_adj_list"),
                ns = "pathfindR.data")

            pin_df <- lapply(seq_along(adj_list), function(i, nm, val) {
                data.frame(base::toupper(nm[[i]]), "pp", base::toupper(val[[i]]))
            }, val = adj_list, nm = names(adj_list))
            pin_df <- base::do.call("rbind", pin_df)
            utils::write.table(pin_df, path, sep = "\t", row.names = FALSE, col.names = FALSE,
                quote = FALSE)
        }
        path <- normalizePath(path)

        ## Custom PIN
    } else if (file.exists(suppressWarnings(normalizePath(pin_name_path)))) {
        path <- normalizePath(pin_name_path)
        pin <- utils::read.delim(file = path, quote = "", header = FALSE)
        if (ncol(pin) != 3) {
            stop("The PIN file must have 3 columns and be tab-separated")
        }

        if (any(pin[, 2] != "pp")) {
            stop("The second column of the PIN file must all be \"pp\" ")
        }

        if (any(grepl("[a-z]", pin[, 1])) | any(grepl("[a-z]", pin[, 3]))) {
            pin[, 1] <- base::toupper(pin[, 1])
            pin[, 3] <- base::toupper(pin[, 3])

            path <- file.path(tempdir(check = TRUE), "custom_PIN.sif")
            utils::write.table(pin, path, sep = "\t", row.names = FALSE, col.names = FALSE,
                quote = FALSE)
            path <- normalizePath(path)
        }
    } else {
        stop("The chosen PIN must be one of:\n", paste(dQuote(valid_opts), collapse = ", "))
    }
    return(path)
}


================================================
FILE: R/visualization.R
================================================
#' Check if value is a valid color
#'
#' @param x value
#'
#' @return TRUE if x is a valid color, otherwise FALSE
isColor <- function(x) {
  if (!is.character(x) | length(x) != 1) {
    return(FALSE)
  }
  tryCatch(is.matrix(grDevices::col2rgb(x)), error = function(e) FALSE)
}


#' Create Diagrams for Enriched Terms
#'
#' @param result_df Data frame of enrichment results. Must-have columns for
#'  KEGG human pathway diagrams (\code{is_KEGG_result = TRUE}) are: 'ID' and 'Term_Description'.
#'  Must-have columns for the rest are: 'Term_Description', 'Up_regulated' and
#' 'Down_regulated'
#' @param input_processed input data processed via \code{\link{input_processing}},
#'  not necessary when \code{is_KEGG_result = FALSE}
#' @param is_KEGG_result boolean to indicate whether KEGG gene sets were used for
#'  enrichment analysis or not (default = \code{TRUE})
#' @inheritParams return_pin_path
#' @param ... additional arguments for \code{\link{visualize_KEGG_diagram}} (used
#' when \code{is_KEGG_result = TRUE}) or \code{\link{visualize_term_interactions}}
#' (used when \code{is_KEGG_result = FALSE})
#'
#' @return Depending on the argument \code{is_KEGG_result}, creates visualization of
#'  interactions of genes involved in the list of enriched terms in
#'  \code{result_df}. Returns a list of ggplot objects named by Term ID.
#'
#'
#' @details For \code{is_KEGG_result = TRUE}, KEGG pathway diagrams are created,
#' affected nodes colored by up/down regulation status.
#' For other gene sets, interactions of affected genes are determined (via a shortest-path
#' algorithm) and are visualized (colored by change status) using igraph.
#'
#'
#' @export
#'
#' @seealso See \code{\link{visualize_KEGG_diagram}} for the visualization function
#' of KEGG diagrams. See \code{\link{visualize_term_interactions}} for the
#' visualization function that generates diagrams showing the interactions of
#' input genes in the PIN. See \code{\link{run_pathfindR}} for the wrapper
#' function of the pathfindR workflow.
#'
#' @examples
#' \dontrun{
#' input_processed <- data.frame(
#'   GENE = c("PARP1", "NDUFA1", "STX6", "SNAP23"),
#'   CHANGE = c(1.5, -2, 3, 5)
#' )
#' result_df <- example_pathfindR_output[1:2, ]
#'
#' gg_list <- visualize_terms(result_df, input_processed)
#' gg_list2 <- visualize_terms(result_df, is_KEGG_result = FALSE, pin_name_path = 'IntAct')
#' }
visualize_terms <- function(
    result_df, input_processed = NULL, is_KEGG_result = TRUE, pin_name_path = "Biogrid", ...
) {
    ############ Argument Checks
    if (!is.data.frame(result_df)) {
        stop("`result_df` should be a data frame")
    }

    if (!is.logical(is_KEGG_result)) {
        stop("the argument `is_KEGG_result` should be either TRUE or FALSE")
    }

    if (is_KEGG_result) {
        nec_cols <- "ID"
    } else {
        nec_cols <- c("Term_Description", "Up_regulated", "Down_regulated")
    }
    if (!all(nec_cols %in% colnames(result_df))) {
        stop("`result_df` should contain the following columns: ", paste(dQuote(nec_cols),
            collapse = ", "))
    }

    if (is_KEGG_result) {
        if (is.null(input_processed)) {
            stop("`input_processed` should be specified when `is_KEGG_result = TRUE`")
        }
    }

  ############ Generate Diagrams
  if (is_KEGG_result) {
    visualize_KEGG_diagram(
      kegg_pw_ids = result_df$ID, input_processed = input_processed, ...
    )
  } else {
    visualize_term_interactions(
      result_df = result_df, pin_name_path = pin_name_path, ...
    )
  }
}

#' Visualize Interactions of Genes Involved in the Given Enriched Terms
#'
#' @param result_df Data frame of enrichment results. Must-have columns
#' are: 'Term_Description', 'Up_regulated' and 'Down_regulated'
#' @inheritParams return_pin_path
#' @param show_legend Boolean to indicate whether to display the legend (\code{TRUE})
#' or not (\code{FALSE}) (default: \code{TRUE})
#'
#' @return list of ggplot objects (named by Term ID) visualizing the interactions of genes involved
#' in the given enriched terms (annotated in the \code{result_df}) in the PIN used
#' for enrichment analysis (specified by \code{pin_name_path}).
#'
#' @details The following steps are performed for the visualization of interactions
#' of genes involved for each enriched term: \enumerate{
#'   \item shortest paths between all affected genes are determined (via \code{\link[igraph]{igraph}})
#'   \item the nodes of all shortest paths are merged
#'   \item the PIN is subsetted using the merged nodes (genes)
#'   \item using the PIN subset, the graph showing the interactions is generated
#'   \item the final graph is visualized using \code{\link[igraph]{igraph}}, colored by changed
#'   status (if provided)
#' }
#'
#' @export
#'
#' @seealso See \code{\link{visualize_terms}} for the wrapper function
#'   for creating enriched term diagrams. See \code{\link{run_pathfindR}} for the
#'   wrapper function of the pathfindR enrichment workflow.
#'
#' @examples
#' \dontrun{
#' result_df <- example_pathfindR_output[1:2, ]
#' gg_list <- visualize_term_interactions(result_df, pin_name_path = 'IntAct')
#' }
visualize_term_interactions <- function(result_df, pin_name_path, show_legend = TRUE) {
    ############ Initial Steps fix naming issue
    result_df$Term_Description <- gsub("\\/", "-", result_df$Term_Description)

    ## load PIN
    pin_path <- return_pin_path(pin_name_path)
    pin <- utils::read.delim(file = pin_path, header = FALSE)
    pin$V2 <- NULL

    pin[, 1] <- base::toupper(pin[, 1])
    pin[, 2] <- base::toupper(pin[, 2])

    ## pin graph
    pin_g <- igraph::graph_from_data_frame(pin, directed = FALSE)

    ############ Visualize interactions by enriched term
    pw_vis_list <- list()
    for (i in base::seq_len(nrow(result_df))) {
        current_row <- result_df[i, ]

        up_genes <- base::toupper(unlist(strsplit(current_row$Up_regulated, ", ")))
        down_genes <- base::toupper(unlist(strsplit(current_row$Down_regulated, ", ")))
        current_genes <- c(down_genes, up_genes)

        ## Add active snw genes if listed
        if (!is.null(result_df$non_Signif_Snw_Genes)) {
            snw_genes <- unlist(strsplit(current_row$non_Signif_Snw_Genes, ", "))
            snw_genes <- base::toupper(snw_genes)
            current_genes <- c(current_genes, snw_genes)
        } else {
            snw_genes <- NULL
        }

        if (length(current_genes) < 2) {
            message(paste0("< 2 genes, skipping visualization of ", current_row$Term_Description))
        } else {
            cat("Visualizing:", paste0("(", i, ")") , current_row$Term_Description, paste(rep(" ", 200),
                collapse = ""), "\r")

            ## Find genes without direct interaction
            cond1 <- pin$V1 %in% current_genes
            cond2 <- pin$V3 %in% current_genes
            direct_interactions <- pin[cond1 & cond2, ]
            tmp <- c(direct_interactions$V1, direct_interactions$V3)
            missing_genes <- current_genes[!current_genes %in% tmp]

            ## Find shortest path between genes without direct interaction and
            ## other current_genes
            s_path_genes <- c()
            for (gene in missing_genes) {
                tmp <- suppressWarnings(igraph::shortest_paths(pin_g, from = which(names(igraph::V(pin_g)) ==
                  gene), to = which(names(igraph::V(pin_g)) %in% current_genes),
                  output = "vpath"))
                tmp <- unique(unlist(lapply(tmp$vpath, function(x) names(x))))
                s_path_genes <- unique(c(s_path_genes, tmp))
            }

            final_genes <- unique(c(current_genes, s_path_genes))
            cond1 <- pin$V1 %in% final_genes
            cond2 <- pin$V3 %in% final_genes
            final_interactions <- pin[cond1 & cond2, ]
            g <- igraph::graph_from_data_frame(final_interactions, directed = FALSE)

            cond1 <- names(igraph::V(g)) %in% up_genes
            cond2 <- names(igraph::V(g)) %in% down_genes
            cond3 <- names(igraph::V(g)) %in% snw_genes
            node_type <- as.factor(ifelse(cond1, "up",
                                          ifelse(cond2, "down",
                                                 ifelse(cond3,
                                                        "interactor", "none"))))
            igraph::V(g)$type <- node_type

            node_colors <- c("green", "red", "blue", "gray")
            names(node_colors) <- c("up", "down", "interactor", "none")
            node_colors <- node_colors[levels(node_type)]

            type_descriptions <- c(
              none = "other", up = "up-regulated gene", down = "down-regulated gene", interactor = "interacting non-input gene"
            )
            type_descriptions <- type_descriptions[levels(node_type)]

            p <- ggraph::ggraph(g, layout = "stress")
            p <- p + ggraph::geom_edge_link(alpha = 0.8, colour = "darkgrey", linewidth = 0.5)
            p <- p + ggraph::geom_node_point(ggplot2::aes(color = .data$type), size = 5)
            p <- p + ggplot2::theme_void()
            p <- p + suppressWarnings(ggraph::geom_node_text(ggplot2::aes(label = .data$name),
                                                             nudge_y = 0.2, repel = TRUE, max.overlaps = 20))
            p <- p + ggplot2::scale_color_manual(values = node_colors, name = NULL,
                                                 labels = type_descriptions)
            p <- p + ggplot2::ggtitle(
              paste(current_row$Term_Description, "\n Involved Gene Interactions in", pin_name_path)
            )
            pw_vis_list[[current_row$ID]] <- p
        }
    }
    return(pw_vis_list)
}

#' Visualize Human KEGG Pathways
#'
#' @param kegg_pw_ids KEGG ids of pathways to be colored and visualized
#' @param input_processed input data processed via \code{\link{input_processing}}
#' @inheritParams color_kegg_pathway
#'
#' @return Creates colored visualizations of the enriched human KEGG pathways
#' and returns them as a list of ggplot objects, named by Term ID.
#'
#' @seealso See \code{\link{visualize_terms}} for the wrapper function for
#' creating enriched term diagrams. See \code{\link{run_pathfindR}} for the
#' wrapper function of the pathfindR enrichment workflow.
#'
#' @export
#'
#' @examples
#' \dontrun{
#' input_processed <- data.frame(
#'   GENE = c("PKLR", "GPI", "CREB1", "INS"),
#'   CHANGE = c(1.5, -2, 3, 5)
#' )
#' gg_list <- visualize_KEGG_diagram(c("hsa00010", "hsa04911"), input_processed)
#' }
visualize_KEGG_diagram <- function(
    kegg_pw_ids,
    input_processed,
    scale_vals = TRUE,
    node_cols = NULL,
    legend.position = "top"
) {
    message("This function utilises one functionality of `ggkegg`. For more options, visit https://github.com/noriakis/ggkegg")
    
    ############ Arg checks

    ### kegg_pw_ids
    if (!is.atomic(kegg_pw_ids)) {
        stop("`kegg_pw_ids` should be a vector of KEGG IDs")
    }
    if (!all(grepl("^[a-z]{3}[0-9]{5}$", kegg_pw_ids))) {
        stop("`kegg_pw_ids` should be a vector of valid hsa KEGG IDs")
    }

    ### input_processed
    if (!is.data.frame(input_processed)) {
        stop("`input_processed` should be a data frame")
    }

    nec_cols <- c("GENE", "CHANGE")
    if (!all(nec_cols %in% colnames(input_processed))) {
        stop("`input_processed` should contain the following columns: ", paste(dQuote(nec_cols),
            collapse = ", "))
    }
    
    if (!requireNamespace("org.Hs.eg.db", quietly = TRUE)) {
      message(
        "Package 'org.Hs.eg.db' is not installed; returning empty list.\n",
        "Install it with:\n",
        "  if (!requireNamespace('BiocManager', quietly = TRUE)) install.packages('BiocManager')\n",
        "  BiocManager::install('org.Hs.eg.db')"
      )
      return(list())
    }

    ############ Create change vector Convert gene symbols into NCBI gene IDs
    tmp <- AnnotationDbi::mget(input_processed$GENE, AnnotationDbi::revmap(org.Hs.eg.db::org.Hs.egSYMBOL),
        ifnotfound = NA)
    input_processed$EG_ID <- vapply(tmp, function(x) as.character(x[1]), "EGID")
    input_processed <- input_processed[!is.na(input_processed$EG_ID), ]

    ### A rule of thumb for the 'kegg' ID is entrezgene ID for eukaryote
    ### species
    input_processed$KEGG_ID <- paste0("hsa:", input_processed$EG_ID)

    ############ Fetch all pathway genes, create vector of change values and
    ############ Generate colored pathway diagrams for each pathway
    change_vec <- input_processed$CHANGE
    names(change_vec) <- input_processed$KEGG_ID

    cat("Generating pathway diagrams of", length(kegg_pw_ids), "KEGG pathways\n\n")
    pw_vis_list <- lapply(
      kegg_pw_ids,
      color_kegg_pathway,
      change_vec=change_vec,
      scale_vals = scale_vals,
      node_cols = node_cols,
      legend.position = legend.position
    )
    names(pw_vis_list) <- kegg_pw_ids

    return(pw_vis_list)
}

#' Color hsa KEGG pathway
#'
#' @param pw_id hsa KEGG pathway id (e.g. hsa05012)
#' @param change_vec vector of change values, names should be hsa KEGG gene ids
#' @param scale_vals should change values be scaled? (default = \code{TRUE})
#' @param node_cols low, middle and high color values for coloring the pathway nodes
#' (default = \code{NULL}). If \code{node_cols=NULL}, the low, middle and high color
#' are set as 'green', 'gray' and 'red'. If all change values are 1e6 (in case no
#' changes are supplied, this dummy value is assigned by
#' \code{\link{input_processing}}), only one color ('#F38F18' if NULL) is used.
#' @inheritParams ggplot2::theme
#'
#' @return a ggplot object containing the colored KEGG pathway diagram visualization
#'
#' @examples
#' \dontrun{
#' pw_id <- 'hsa00010'
#' change_vec <- c(-2, 4, 6)
#' names(change_vec) <- c('hsa:2821', 'hsa:226', 'hsa:229')
#' result <- pathfindR:::color_kegg_pathway(pw_id, change_vec)
#' }
color_kegg_pathway <- function(pw_id, change_vec, scale_vals = TRUE, node_cols = NULL, legend.position = "top") {
    ############ Arg checks
    if (!is.logical(scale_vals)) {
        stop("`scale_vals` should be logical")
    }

    ## check node_cols
    if (!is.null(node_cols)) {
        if (!is.atomic(node_cols)) {
            stop("`node_cols` should be a vector of colors")
        }

        if (!all(change_vec == 1e+06) & length(node_cols) != 3) {
            stop("the length of `node_cols` should be 3")
        }

        if (!all(vapply(node_cols, isColor, TRUE))) {
            stop("`node_cols` should be a vector of valid colors")
        }
    }
    ############ Set node palette if node_cols not supplied, use default
    ############ color(s)
    if (!is.null(node_cols)) {
        if (all(change_vec == 1e+06)) {
            message("all `change_vec` values are 1e6, using the first color in `node_cols`")
            low_col <- mid_col <- high_col <- node_cols[1]
        } else {
            low_col <- node_cols[1]
            mid_col <- node_cols[2]
            high_col <- node_cols[3]
        }
    } else if (all(change_vec == 1e+06)) {
        ## NO CHANGES SUPPLIED
        low_col <- mid_col <- high_col <- "#F38F18"
    } else {
        low_col <- "red"
        mid_col <- "gray"
        high_col <- "green"
    }

    ############ Assign the input change values to any corresponding pathway gene nodes
    # create pathway graph object and collect all pathway genes
    ggkegg_temp_dir <- file.path(tempdir(check = TRUE), "ggkegg")
    dir.create(ggkegg_temp_dir, showWarnings = FALSE)

    g <- tryCatch({
      ggkegg::pathway(pid = pw_id, directory = ggkegg_temp_dir)
    }, error = function(e) {
      message(paste("Cannot parse KEGG pathway for:", pw_id))
      message("Here's the original error message:")
      message(e$message)
      return(NULL)
    }, warning = function(w) {
      message(paste("Cannot parse KEGG pathway for:", pw_id))
      message("Here's the original error message:")
      message(w$message)
      return(NULL)
    })

    if (is.null(g)) {
      return(NULL)
    }

    gene_nodes <- names(igraph::V(g))[igraph::V(g)$type == "gene"]

    ## aggregate change values over all pathway gene nodes
    pw_vis_changes <- c()
    for (i in seq_len(length(gene_nodes))) {
        node_name <- gene_nodes[i]
        node <- unlist(strsplit(node_name, " "))
        cond <- names(change_vec) %in% node

        if (any(cond)) {
          node_val <- mean(change_vec[cond])
          names(node_val) <- node_name
          pw_vis_changes <- c(pw_vis_changes, node_val)
        }
    }
    ## if no input genes present in chosen pathway
    if (all(is.na(pw_vis_changes))) {
        return(NULL)
    }

    ############ Determine node colors
    ### scaling
    if (!all(pw_vis_changes == 1e+06) & scale_vals) {
      common_limit <- max(abs(pw_vis_changes))
      pw_vis_changes <- ifelse(pw_vis_changes < 0,
                               -abs(pw_vis_changes) / common_limit,
                               pw_vis_changes / common_limit)
    }


    ############ Create pathway diagram visualisation
    igraph::V(g)$change_value <- NA
    igraph::V(g)$change_value[match(names(pw_vis_changes), names(igraph::V(g)))] <- pw_vis_changes

    p <- ggraph::ggraph(g, layout="manual", x=igraph::V(g)$x, y=igraph::V(g)$y)
    p <- p + ggkegg::geom_node_rect(ggplot2::aes(filter = !is.na(.data$change_value), fill = .data$change_value))
    p <- p + ggkegg::overlay_raw_map(pw_id)
    p <- p + ggplot2::scale_fill_gradient2(low = low_col, mid = mid_col, high = high_col)
    p <- p + ggplot2::theme_void()
    p <- p + ggplot2::theme(
      legend.title = ggplot2::element_blank(),
      legend.position = legend.position
    )

    return(p)
}

#' Create Bubble Chart of Enrichment Results
#'
#' This function is used to create a ggplot2 bubble chart displaying the
#' enrichment results.
#'
#' @param result_df a data frame that must contain the following columns: \describe{
#'   \item{Term_Description}{Description of the enriched term}
#'   \item{Fold_Enrichment}{Fold enrichment value for the enriched term}
#'   \item{lowest_p}{the lowest adjusted-p value of the given term over all iterations}
#'   \item{Up_regulated}{the up-regulated genes in the input involved in the given term's gene set, comma-separated}
#'   \item{Down_regulated}{the down-regulated genes in the input involved in the given term's gene set, comma-separated}
#'   \item{Cluster(OPTIONAL)}{the cluster to which the enriched term is assigned}
#' }
#' @param top_terms number of top terms (according to the 'lowest_p' column)
#'  to plot (default = 10). If \code{plot_by_cluster = TRUE}, selects the top
#'  \code{top_terms} terms per each cluster. Set \code{top_terms = NULL} to plot
#'  for all terms.If the total number of terms is less than \code{top_terms},
#'  all terms are plotted.
#' @param plot_by_cluster boolean value indicating whether or not to group the
#'  enriched terms by cluster (works if \code{result_df} contains a
#'  'Cluster' column).
#' @param num_bubbles number of sizes displayed in the legend \code{# genes}
#'  (Default = 4)
#' @param even_breaks whether or not to set even breaks for the number of sizes
#'  displayed in the legend \code{# genes}. If \code{TRUE} (default), sets
#'  equal breaks and the number of displayed bubbles may be different than the
#'  number set by \code{num_bubbles}. If the exact number set by
#'  \code{num_bubbles} is required, set this argument to \code{FALSE}
#'
#' @return a \code{\link[ggplot2]{ggplot2}} object containing the bubble chart.
#' The x-axis corresponds to fold enrichment values while the y-axis indicates
#' the enriched terms. Size of the bubble indicates the number of significant
#' genes in the given enriched term. Color indicates the -log10(lowest-p) value.
#' The closer the color is to red, the more significant the enrichment is.
#' Optionally, if 'Cluster' is a column of \code{result_df} and
#' \code{plot_by_cluster == TRUE}, the enriched terms are grouped by clusters.
#'
#' @import ggplot2
#' @export
#'
#' @examples
#' g <- enrichment_chart(example_pathfindR_output)
enrichment_chart <- function(result_df, top_terms = 10, plot_by_cluster = FALSE,
    num_bubbles = 4, even_breaks = TRUE) {
    message("Plotting the enrichment bubble chart")
    necessary <- c("Term_Description", "Fold_Enrichment", "lowest_p", "Up_regulated",
        "Down_regulated")

    if (!all(necessary %in% colnames(result_df))) {
        stop("The input data frame must have the columns:\n", paste(necessary, collapse = ", "))
    }

    if (!is.logical(plot_by_cluster)) {
        stop("`plot_by_cluster` must be either TRUE or FALSE")
    }

    if (!is.numeric(top_terms) & !is.null(top_terms)) {
        stop("`top_terms` must be either numeric or NULL")
    }

    if (!is.null(top_terms)) {
        if (top_terms < 1) {
            stop("`top_terms` must be > 1")
        }
    }

    # sort by lowest adj.p
    result_df <- result_df[order(result_df$lowest_p), ]

    ## Filter for top_terms
    if (!is.null(top_terms)) {
        if (plot_by_cluster & "Cluster" %in% colnames(result_df)) {
            keep_ids <- tapply(result_df$ID, result_df$Cluster, function(x) {
                x[seq_len(min(top_terms, length(x)))]
            })
            keep_ids <- unlist(keep_ids)
            result_df <- result_df[result_df$ID %in% keep_ids, ]
        } else if (top_terms < nrow(result_df)) {
            result_df <- result_df[seq_len(top_terms), ]
        }
    }

    num_genes <- vapply(result_df$Up_regulated, function(x) length(unlist(strsplit(x,
        ", "))), 1)
    num_genes <- num_genes + vapply(result_df$Down_regulated, function(x) length(unlist(strsplit(x,
        ", "))), 1)

    result_df$Term_Description <- factor(result_df$Term_Description, levels = rev(unique(result_df$Term_Description)))

    log_p <- -log10(result_df$lowest_p)

    g <- ggplot2::ggplot(result_df, ggplot2::aes(.data$Fold_Enrichment, .data$Term_Description))
    g <- g + ggplot2::geom_point(ggplot2::aes(color = log_p, size = num_genes), na.rm = TRUE)
    g <- g + ggplot2::theme_bw()
    g <- g + ggplot2::theme(axis.text.x = ggplot2::element_text(size = 10), axis.text.y = ggplot2::element_text(size = 10),
        plot.title = ggplot2::element_blank())
    g <- g + ggplot2::xlab("Fold Enrichment")
    g <- g + ggplot2::theme(axis.title.y = ggplot2::element_blank())
    g <- g + ggplot2::labs(size = "# genes", color = expression(-log[10](p)))

    ## breaks for # genes
    if (max(num_genes) < num_bubbles) {
        g <- g + ggplot2::scale_size_continuous(breaks = seq(0, max(num_genes)))
    } else {
        if (even_breaks) {
            brks <- base::seq(0, max(num_genes), round(max(num_genes)/(num_bubbles +
                1)))
        } else {
            brks <- base::round(base::seq(0, max(num_genes), length.out = num_bubbles +
                1))
        }
        g <- g + ggplot2::scale_size_continuous(breaks = brks)
    }

    g <- g + ggplot2::scale_color_gradient(low = "#f5efef", high = "red")

    if (plot_by_cluster & "Cluster" %in% colnames(result_df)) {
        g <- g + ggplot2::facet_grid(result_df$Cluster ~ ., scales = "free_y", space = "free",
            drop = TRUE)
    } else if (plot_by_cluster) {
        message("For plotting by cluster

Download .txt

gitextract_46eozuid/

├── .Rbuildignore
├── .Rinstignore
├── .github/
│   ├── .gitignore
│   ├── ISSUE_TEMPLATE/
│   │   ├── bug_report.md
│   │   └── feature_request.md
│   └── workflows/
│       ├── R-CMD-check.yaml
│       ├── branch_naming_policy.yaml
│       ├── pkgdown.yaml
│       └── test-coverage.yaml
├── .gitignore
├── CODE_OF_CONDUCT.md
├── CONTRIBUTING.md
├── DESCRIPTION
├── LICENSE
├── LICENSE.md
├── NAMESPACE
├── NEWS.md
├── R/
│   ├── active_snw_search.R
│   ├── clustering.R
│   ├── comparison.R
│   ├── core.R
│   ├── data_generation.R
│   ├── enrichment.R
│   ├── pathfindr.R
│   ├── scoring.R
│   ├── utility.R
│   ├── visualization.R
│   └── zzz.R
├── README.Rmd
├── README.md
├── _pkgdown.yml
├── codecov.yml
├── cran-comments.md
├── inst/
│   ├── CITATION
│   ├── extdata/
│   │   ├── CREB.txt
│   │   ├── MYC.txt
│   │   └── resultActiveSubnetworkSearch.txt
│   ├── java/
│   │   └── ActiveSubnetworkSearch.jar
│   └── rmd/
│       ├── conversion_table.Rmd
│       ├── enriched_terms.Rmd
│       └── results.Rmd
├── java/
│   ├── ActiveSubnetworkSearchAlgorithms/
│   │   ├── ActiveSubnetworkSearch.java
│   │   ├── GAIndividual.java
│   │   ├── GeneticAlgorithm.java
│   │   ├── GreedySearch.java
│   │   └── SimulatedAnnealing.java
│   ├── ActiveSubnetworkSearchMisc/
│   │   ├── Gaussian.java
│   │   ├── ScoreCalculations.java
│   │   ├── Subnetwork.java
│   │   └── ZStatistics.java
│   ├── Application/
│   │   ├── AppActiveSubnetworkSearch.java
│   │   └── Parameters.java
│   ├── File/
│   │   ├── ExperimentFileReader.java
│   │   └── SIFReader.java
│   └── Network/
│       ├── Network.java
│       ├── Node.java
│       └── SubnetworkFinder.java
├── man/
│   ├── UpSet_plot.Rd
│   ├── active_snw_enrichment_wrapper.Rd
│   ├── active_snw_search.Rd
│   ├── annotate_term_genes.Rd
│   ├── check_java_version.Rd
│   ├── cluster_enriched_terms.Rd
│   ├── cluster_graph_vis.Rd
│   ├── color_kegg_pathway.Rd
│   ├── combine_pathfindR_results.Rd
│   ├── combined_results_graph.Rd
│   ├── configure_output_dir.Rd
│   ├── create_HTML_report.Rd
│   ├── create_kappa_matrix.Rd
│   ├── enrichment.Rd
│   ├── enrichment_analyses.Rd
│   ├── enrichment_chart.Rd
│   ├── fetch_gene_set.Rd
│   ├── fetch_java_version.Rd
│   ├── filterActiveSnws.Rd
│   ├── fuzzy_term_clustering.Rd
│   ├── get_biogrid_pin.Rd
│   ├── get_gene_sets_list.Rd
│   ├── get_kegg_gsets.Rd
│   ├── get_mgsigdb_gsets.Rd
│   ├── get_pin_file.Rd
│   ├── get_reactome_gsets.Rd
│   ├── gset_list_from_gmt.Rd
│   ├── hierarchical_term_clustering.Rd
│   ├── hyperg_test.Rd
│   ├── input_processing.Rd
│   ├── input_testing.Rd
│   ├── isColor.Rd
│   ├── pathfindr.Rd
│   ├── plot_scores.Rd
│   ├── process_pin.Rd
│   ├── return_pin_path.Rd
│   ├── run_pathfindr.Rd
│   ├── safe_get_content.Rd
│   ├── score_terms.Rd
│   ├── single_iter_wrapper.Rd
│   ├── summarize_enrichment_results.Rd
│   ├── term_gene_graph.Rd
│   ├── term_gene_heatmap.Rd
│   ├── visualize_KEGG_diagram.Rd
│   ├── visualize_active_subnetworks.Rd
│   ├── visualize_term_interactions.Rd
│   └── visualize_terms.Rd
├── renv/
│   ├── .gitignore
│   ├── activate.R
│   └── settings.json
├── revdep/
│   ├── .gitignore
│   ├── email.yml
│   └── failures.md
├── slides/
│   └── cost_charme_school/
│       └── demo_script.R
├── tests/
│   ├── testthat/
│   │   ├── test-active_snw_search.R
│   │   ├── test-clustering.R
│   │   ├── test-comparison.R
│   │   ├── test-core.R
│   │   ├── test-data_generation.R
│   │   ├── test-enrichment.R
│   │   ├── test-scoring.R
│   │   ├── test-utility.R
│   │   ├── test-visualization.R
│   │   └── test-zzz.R
│   ├── testthat-active_snw.R
│   ├── testthat-clustering.R
│   ├── testthat-comparison.R
│   ├── testthat-core.R
│   ├── testthat-data_generation.R
│   ├── testthat-enrichment.R
│   ├── testthat-scoring.R
│   ├── testthat-utility.R
│   ├── testthat-visualization.R
│   └── testthat-zzz.R
└── vignettes/
    ├── .gitignore
    ├── comparing_results.Rmd
    ├── intro_vignette.Rmd
    ├── manual_execution.Rmd
    ├── non_hs_analysis.Rmd
    ├── obtain_data.Rmd
    └── visualization_vignette.Rmd

Download .txt

SYMBOL INDEX (101 symbols across 16 files)

FILE: java/ActiveSubnetworkSearchAlgorithms/ActiveSubnetworkSearch.java
  class ActiveSubnetworkSearch (line 24) | public class ActiveSubnetworkSearch {
    method activeSubnetworkSearch (line 33) | public static void activeSubnetworkSearch(){

FILE: java/ActiveSubnetworkSearchAlgorithms/GAIndividual.java
  class GAIndividual (line 16) | public class GAIndividual implements Comparable<Object>{
    method GAIndividual (line 23) | public GAIndividual(HashSet<Node> nodesOnSet){
    method GAIndividual (line 38) | public GAIndividual(ArrayList<Boolean> representationBoolean){
    method compareTo (line 58) | @Override
    method getRepresentationBoolean (line 92) | public ArrayList<Boolean> getRepresentationBoolean() {
    method getNetworkNodeList (line 96) | public ArrayList<Node> getNetworkNodeList() {
    method getNodesOnSet (line 100) | public HashSet<Node> getNodesOnSet() {
    method getSubnetworkList (line 104) | public ArrayList<Subnetwork> getSubnetworkList() {
    method getHighestScoringSubnetwork (line 108) | public Subnetwork getHighestScoringSubnetwork(){
    method getScore (line 116) | public double getScore(){
    method toString (line 124) | public String toString(){

FILE: java/ActiveSubnetworkSearchAlgorithms/GeneticAlgorithm.java
  type SelectionType (line 20) | enum SelectionType {RANKSELECTION, ROULETTEWHEEL}
  type CrossoverType (line 21) | enum CrossoverType {SINGLEPOINT, MULTIPOINT, UNIFORM}
  class GeneticAlgorithm (line 23) | public class GeneticAlgorithm {
    method geneticAlgorithm (line 32) | public ArrayList<Subnetwork> geneticAlgorithm(){
    method printSituation (line 118) | private void printSituation(ArrayList<GAIndividual> population){
    method initializePopulation (line 125) | private void initializePopulation(ArrayList<GAIndividual> population, ...
    method createRandomGAIndividual (line 146) | private GAIndividual createRandomGAIndividual(){
    method createNewPopulation (line 154) | private ArrayList<GAIndividual> createNewPopulation(ArrayList<GAIndivi...
  class NewPopulationFactory (line 185) | class NewPopulationFactory implements Runnable{
    method NewPopulationFactory (line 192) | public NewPopulationFactory(ArrayList<GAIndividual> population, ArrayL...
    method run (line 199) | @Override
    method selection (line 215) | private GAIndividual[] selection(ArrayList<GAIndividual> population) {
    method crossoverAndMutation (line 250) | private GAIndividual[] crossoverAndMutation(GAIndividual parent1, GAIn...

FILE: java/ActiveSubnetworkSearchAlgorithms/GreedySearch.java
  class GreedySearch (line 18) | public class GreedySearch {
    method greedySearch (line 52) | public ArrayList<Subnetwork> greedySearch() {
    method filterSubnetworkList (line 146) | private ArrayList<Subnetwork> filterSubnetworkList(ArrayList<Subnetwor...
    method initializeMaxDepth (line 195) | private void initializeMaxDepth(Node current, int depth) {
    method runGreedySearchRecursive (line 222) | private boolean runGreedySearchRecursive(int depth, Subnetwork component,
    method runGreedyRemovalSearch (line 269) | private void runGreedyRemovalSearch(Subnetwork component, HashSet remo...

FILE: java/ActiveSubnetworkSearchAlgorithms/SimulatedAnnealing.java
  class SimulatedAnnealing (line 29) | public class SimulatedAnnealing {
    method simulatedAnnealing (line 36) | public ArrayList<Subnetwork> simulatedAnnealing() {
    method toggleNodeState (line 149) | public void toggleNodeState(HashSet<Node> nodesOnSet, HashSet<Node> no...
    method printSituation (line 159) | public void printSituation(ArrayList<Subnetwork> subnetworkList){

FILE: java/ActiveSubnetworkSearchMisc/Gaussian.java
  class Gaussian (line 25) | public class Gaussian {
    method pdf (line 28) | public static double pdf(double x) {
    method pdf (line 33) | public static double pdf(double x, double mu, double sigma) {
    method cdf (line 38) | public static double cdf(double z) {
    method cdf (line 50) | public static double cdf(double z, double mu, double sigma) {
    method inverseCDF (line 55) | public static double inverseCDF(double y) {
    method inverseCDF (line 60) | private static double inverseCDF(double y, double delta, double lo, do...

FILE: java/ActiveSubnetworkSearchMisc/ScoreCalculations.java
  class ScoreCalculations (line 25) | public class ScoreCalculations {
    method ScoreCalculations (line 38) | public ScoreCalculations(ArrayList<SimpleEntry<String, Double>> namePV...
    method fillNodeToPValueMap (line 44) | private void fillNodeToPValueMap(ArrayList<SimpleEntry<String, Double>...
    method process (line 77) | public void process() {
    method getPValue (line 85) | public Double getPValue(Node node) {
    method getZScore (line 89) | public Double getZScore(Node node) {
    method calculateZScores (line 93) | private void calculateZScores() {
    method calculateMeanAndStdForMonteCarlo (line 102) | private void calculateMeanAndStdForMonteCarlo() {
    method calculateScoreOfSubnetwork (line 192) | public double calculateScoreOfSubnetwork(Subnetwork subnetwork, boolea...
    method calculateScoreOfSubnetwork (line 202) | public double calculateScoreOfSubnetwork(ArrayList<Node> nodeList, boo...
    method calculateScoreOfSubnetwork (line 219) | public double calculateScoreOfSubnetwork(int numberOfNodes, double zSu...
    method normalizeScore (line 233) | private double normalizeScore(double score, int numberOfNodes){
    method penaltyForSize (line 237) | private double penaltyForSize(double score, int numberOfNodes){

FILE: java/ActiveSubnetworkSearchMisc/Subnetwork.java
  class Subnetwork (line 13) | public class Subnetwork implements Comparable<Object> {
    method Subnetwork (line 23) | public Subnetwork(ArrayList<Node> nodeList){
    method getNodeList (line 38) | public ArrayList<Node> getNodeList(){
    method getNeighborSet (line 42) | public HashSet<Node> getNeighborSet(){
    method getNeighborList (line 48) | public ArrayList<Node> getNeighborList(){
    method numberOfNodes (line 56) | public int numberOfNodes(){
    method getScore (line 60) | public double getScore(){
    method compareTo (line 64) | @Override
    method contains (line 69) | public boolean contains(Node node){
    method addNode (line 73) | public void addNode(Node node){
    method removeNode (line 83) | public void removeNode(Node node){
    method extractNeighborSet (line 95) | private void extractNeighborSet(){

FILE: java/ActiveSubnetworkSearchMisc/ZStatistics.java
  class ZStatistics (line 8) | public class ZStatistics {
    method oneMinusNormalCDFInverse (line 10) | public static double oneMinusNormalCDFInverse(double p) {
    method oneMinusNormalCDFInversePLT5 (line 26) | private static double oneMinusNormalCDFInversePLT5(double p) {

FILE: java/Application/AppActiveSubnetworkSearch.java
  class AppActiveSubnetworkSearch (line 11) | public class AppActiveSubnetworkSearch {
    method main (line 16) | public static void main(String[] args) {
    method processArguments (line 28) | public static void processArguments(String[] args) throws Exception {

FILE: java/Application/Parameters.java
  class Parameters (line 8) | public class Parameters {
    type SearchMethod (line 13) | public enum SearchMethod{GR, SA, GA}

FILE: java/File/ExperimentFileReader.java
  class ExperimentFileReader (line 16) | public class ExperimentFileReader {
    method readExperimentFile (line 18) | public static ArrayList<SimpleEntry<String, Double>> readExperimentFil...

FILE: java/File/SIFReader.java
  class SIFReader (line 15) | public class SIFReader {
    method readSIF (line 17) | public static Network readSIF(String path){

FILE: java/Network/Network.java
  class Network (line 13) | public class Network {
    method Network (line 18) | public Network() {
    method addInteraction (line 22) | public void addInteraction(String strNode1, String strNode2){
    method addInteraction (line 28) | public void addInteraction(Node node1, Node node2){
    method getNeighborSet (line 47) | public HashSet<Node> getNeighborSet(Node node){
    method getNodeList (line 51) | public ArrayList<Node> getNodeList(){
    method areAdjacent (line 55) | public boolean areAdjacent(Node node1, Node node2){
    method getNumberOfNodes (line 59) | public int getNumberOfNodes(){
    method getNumberOfInteractions (line 63) | public int getNumberOfInteractions(){

FILE: java/Network/Node.java
  class Node (line 9) | public class Node {
    method Node (line 12) | public Node(String name){
    method getName (line 16) | public String getName(){
    method toString (line 20) | @Override
    method hashCode (line 25) | @Override
    method equals (line 30) | @Override

FILE: java/Network/SubnetworkFinder.java
  class SubnetworkFinder (line 14) | public class SubnetworkFinder {
    method SubnetworkFinder (line 19) | public SubnetworkFinder(){
    method findSubnetworksDFS (line 31) | public ArrayList<Subnetwork> findSubnetworksDFS(HashSet<Node> nodesOnS...
    method search (line 45) | private void search(Node node, ArrayList<Node> subnetworkNodeList){
    method findSubnetworksDFSNonRecursive (line 57) | public ArrayList<Subnetwork> findSubnetworksDFSNonRecursive(HashSet<No...
    method findSubnetworksBFS (line 87) | public ArrayList<Subnetwork> findSubnetworksBFS(HashSet<Node> nodesOnS...

Download .json

Condensed preview — 138 files, each showing path, character count, and a content snippet. Download the .json file or copy for the full structured content (738K chars).

[
  {
    "path": ".Rbuildignore",
    "chars": 337,
    "preview": "^renv$\n^renv\\.lock$\n^slides$\n^CODE_OF_CONDUCT\\.md$\n^CONTRIBUTING.md$\n^\\.github$\n^Meta$\n^doc$\n^.*\\.Rprofile$\n^.*\\.Rproj$\n"
  },
  {
    "path": ".Rinstignore",
    "chars": 28,
    "preview": "^slides$\n^java$\n^misc_data$\n"
  },
  {
    "path": ".github/.gitignore",
    "chars": 7,
    "preview": "*.html\n"
  },
  {
    "path": ".github/ISSUE_TEMPLATE/bug_report.md",
    "chars": 961,
    "preview": "---\nname: Bug report\nabout: Create a report to help us improve\ntitle: ''\nlabels: 'bug'\nassignees: ''\n\n---\n\n**Describe th"
  },
  {
    "path": ".github/ISSUE_TEMPLATE/feature_request.md",
    "chars": 606,
    "preview": "---\nname: Feature request\nabout: Suggest an idea for this project\ntitle: ''\nlabels: 'enhancement'\nassignees: ''\n\n---\n\n**"
  },
  {
    "path": ".github/workflows/R-CMD-check.yaml",
    "chars": 1397,
    "preview": "# Workflow derived from https://github.com/r-lib/actions/tree/v2/examples\n# Need help debugging build failures? Start at"
  },
  {
    "path": ".github/workflows/branch_naming_policy.yaml",
    "chars": 600,
    "preview": "name: Branch Naming Policy Action\n\non:\n  create:\n  delete:\n  pull_request:\n    branches:\n      - master\n\njobs:\n  branch-"
  },
  {
    "path": ".github/workflows/pkgdown.yaml",
    "chars": 1301,
    "preview": "# Workflow derived from https://github.com/r-lib/actions/tree/v2/examples\n# Need help debugging build failures? Start at"
  },
  {
    "path": ".github/workflows/test-coverage.yaml",
    "chars": 1376,
    "preview": "# Workflow derived from https://github.com/r-lib/actions/tree/v2/examples\n# Need help debugging build failures? Start at"
  },
  {
    "path": ".gitignore",
    "chars": 132,
    "preview": "Meta\ndoc\ninst/doc\nmisc\ndata-raw\n*.pptx\n\n.Rprofile\n\n*.DS_Store\n*.Rproj\n*.RData\n*.Ruserdata\n*.Rproj.user\n*.Rhistory\n.Rproj"
  },
  {
    "path": "CODE_OF_CONDUCT.md",
    "chars": 3350,
    "preview": "# Contributor Covenant Code of Conduct\n\n## Our Pledge\n\nIn the interest of fostering an open and welcoming environment, w"
  },
  {
    "path": "CONTRIBUTING.md",
    "chars": 3841,
    "preview": "# Contributing to pathfindR development\n\nThe goal of this guide is to help you in contributing to pathfindR. The guide i"
  },
  {
    "path": "DESCRIPTION",
    "chars": 2428,
    "preview": "Package: pathfindR\nType: Package\nTitle: Enrichment Analysis Utilizing Active Subnetworks\nVersion: 2.7.0.9000\nAuthors@R: "
  },
  {
    "path": "LICENSE",
    "chars": 39,
    "preview": "YEAR: 2020\nCOPYRIGHT HOLDER: Ege Ulgen\n"
  },
  {
    "path": "LICENSE.md",
    "chars": 1068,
    "preview": "# MIT License\n\nCopyright (c) 2020 Ege Ulgen\n\nPermission is hereby granted, free of charge, to any person obtaining a cop"
  },
  {
    "path": "NAMESPACE",
    "chars": 1182,
    "preview": "# Generated by roxygen2: do not edit by hand\n\nexport(UpSet_plot)\nexport(active_snw_search)\nexport(annotate_term_genes)\ne"
  },
  {
    "path": "NEWS.md",
    "chars": 24299,
    "preview": "# pathfindR (development version)\n\n# pathfindR 2.7.0\n## Minor Changes and Bug Fixes\n- Moved org.Hs.eg.db from \"Imports\" "
  },
  {
    "path": "R/active_snw_search.R",
    "chars": 15011,
    "preview": "#' Perform Active Subnetwork Search\n#'\n#' @param input_for_search input the input data that active subnetwork search use"
  },
  {
    "path": "R/clustering.R",
    "chars": 23532,
    "preview": "#' Create Kappa Statistics Matrix\n#'\n#' @param enrichment_res data frame of pathfindR enrichment results. Must-have\n#' c"
  },
  {
    "path": "R/comparison.R",
    "chars": 10433,
    "preview": "#' Combine 2 pathfindR Results\n#'\n#' @param result_A data frame of first pathfindR enrichment results\n#' @param result_B"
  },
  {
    "path": "R/core.R",
    "chars": 7386,
    "preview": "#' Wrapper Function for pathfindR - Active-Subnetwork-Oriented Enrichment Workflow\n#'\n#' \\code{run_pathfindR} is the wra"
  },
  {
    "path": "R/data_generation.R",
    "chars": 16748,
    "preview": "#' Safely download and parse web content\n#'\n#' This helper function retrieves content from a given URL using \\pkg{httr}."
  },
  {
    "path": "R/enrichment.R",
    "chars": 14584,
    "preview": "#' Hypergeometric Distribution-based Hypothesis Testing\n#'\n#' @param term_genes vector of genes in the selected term gen"
  },
  {
    "path": "R/pathfindr.R",
    "chars": 2090,
    "preview": "#' pathfindR: A package for Enrichment Analysis Utilizing Active Subnetworks\n#'\n#' pathfindR is a tool for active-subnet"
  },
  {
    "path": "R/scoring.R",
    "chars": 11761,
    "preview": "#' Calculate Agglomerated Scores of Enriched Terms for Each Subject\n#'\n#' @param enrichment_table a data frame that must"
  },
  {
    "path": "R/utility.R",
    "chars": 28824,
    "preview": "#' Active Subnetwork Search + Enrichment Analysis Wrapper for a Single Iteration\n#'\n#' @param i current iteration index "
  },
  {
    "path": "R/visualization.R",
    "chars": 46380,
    "preview": "#' Check if value is a valid color\n#'\n#' @param x value\n#'\n#' @return TRUE if x is a valid color, otherwise FALSE\nisColo"
  },
  {
    "path": "R/zzz.R",
    "chars": 3309,
    "preview": ".onAttach <- function(libname, pkgname) {\n    packageStartupMessage(\"###################################################"
  },
  {
    "path": "README.Rmd",
    "chars": 14414,
    "preview": "---\noutput: github_document\n---\n\n<!-- README.md is generated from README.Rmd. Please edit that file -->\n\n```{r, include="
  },
  {
    "path": "README.md",
    "chars": 14693,
    "preview": "\n<!-- README.md is generated from README.Rmd. Please edit that file -->\n\n# <img src=\"https://github.com/egeulgen/pathfin"
  },
  {
    "path": "_pkgdown.yml",
    "chars": 2281,
    "preview": "destination: docs\ntemplate:\n  params:\n    bootswatch: united\n    docsearch:\n      api_key: 7f13d388d59fea08d4add29291ea2"
  },
  {
    "path": "codecov.yml",
    "chars": 176,
    "preview": "comment: false\n\ncoverage:\n  status:\n    project:\n      default:\n        target: auto\n        threshold: 1%\n    patch:\n  "
  },
  {
    "path": "cran-comments.md",
    "chars": 767,
    "preview": "## Test environments\n* local OS X 26.2, R 4.5.2\n* macOS-latest (on GitHub-Actions), R 4.5.2\n* windows-latest (on GitHub-"
  },
  {
    "path": "inst/CITATION",
    "chars": 738,
    "preview": "citHeader(\"Please cite the article below if you use pathfindR in published reseach\")\n\nbibentry(\n  bibtype = \"Article\",\n\t"
  },
  {
    "path": "inst/extdata/CREB.txt",
    "chars": 1835,
    "preview": "CREB_01\n> Genes having at least one occurence of the transcription factor binding site V$CREB_01 (v7.4 TRANSFAC) in the "
  },
  {
    "path": "inst/extdata/MYC.txt",
    "chars": 6747,
    "preview": "CACGTG_MYC_Q2\n> Genes having at least one occurence of the highly conserved motif M2 CACGTG sites. The motif matches tra"
  },
  {
    "path": "inst/extdata/resultActiveSubnetworkSearch.txt",
    "chars": 81236,
    "preview": "91.13730033969254 ZRANB1 ALKBH5 HNRNPH1 EWSR1 PSMD7 CSNK2A2 TRRAP ZNF148 CBX3 NUP93 NUP214 CFL1 PFN1 ATPIF1 PDIA4 RMDN3 "
  },
  {
    "path": "inst/rmd/conversion_table.Rmd",
    "chars": 961,
    "preview": "---\ntitle: '`r logo_path <- system.file(\"extdata\", \"logo.png\", package = \"pathfindR\"); knitr::opts_chunk$set(out.width=\""
  },
  {
    "path": "inst/rmd/enriched_terms.Rmd",
    "chars": 644,
    "preview": "---\ntitle: '`r logo_path <- system.file(\"extdata\", \"logo.png\", package = \"pathfindR\"); knitr::opts_chunk$set(out.width=\""
  },
  {
    "path": "inst/rmd/results.Rmd",
    "chars": 1090,
    "preview": "---\ntitle: '`r logo_path <- system.file(\"extdata\", \"logo.png\", package = \"pathfindR\"); knitr::opts_chunk$set(out.width=\""
  },
  {
    "path": "java/ActiveSubnetworkSearchAlgorithms/ActiveSubnetworkSearch.java",
    "chars": 3153,
    "preview": "package ActiveSubnetworkSearchAlgorithms;\n\nimport ActiveSubnetworkSearchMisc.ScoreCalculations;\nimport ActiveSubnetworkS"
  },
  {
    "path": "java/ActiveSubnetworkSearchAlgorithms/GAIndividual.java",
    "chars": 4131,
    "preview": "package ActiveSubnetworkSearchAlgorithms;\n\nimport ActiveSubnetworkSearchMisc.Subnetwork;\nimport Network.Node;\nimport Net"
  },
  {
    "path": "java/ActiveSubnetworkSearchAlgorithms/GeneticAlgorithm.java",
    "chars": 11690,
    "preview": "package ActiveSubnetworkSearchAlgorithms;\n\nimport ActiveSubnetworkSearchMisc.ScoreCalculations;\nimport ActiveSubnetworkS"
  },
  {
    "path": "java/ActiveSubnetworkSearchAlgorithms/GreedySearch.java",
    "chars": 11669,
    "preview": "package ActiveSubnetworkSearchAlgorithms;\n\nimport ActiveSubnetworkSearchMisc.Subnetwork;\nimport Application.Parameters;\n"
  },
  {
    "path": "java/ActiveSubnetworkSearchAlgorithms/SimulatedAnnealing.java",
    "chars": 6358,
    "preview": "package ActiveSubnetworkSearchAlgorithms;\n\nimport ActiveSubnetworkSearchMisc.Subnetwork;\nimport ActiveSubnetworkSearchMi"
  },
  {
    "path": "java/ActiveSubnetworkSearchMisc/Gaussian.java",
    "chars": 2226,
    "preview": "package ActiveSubnetworkSearchMisc;\n\n/******************************************************************************\n *\n"
  },
  {
    "path": "java/ActiveSubnetworkSearchMisc/ScoreCalculations.java",
    "chars": 9441,
    "preview": "package ActiveSubnetworkSearchMisc;\n\nimport Application.Parameters;\nimport ActiveSubnetworkSearchAlgorithms.ActiveSubnet"
  },
  {
    "path": "java/ActiveSubnetworkSearchMisc/Subnetwork.java",
    "chars": 3060,
    "preview": "package ActiveSubnetworkSearchMisc;\n\nimport Network.Network;\nimport ActiveSubnetworkSearchAlgorithms.ActiveSubnetworkSea"
  },
  {
    "path": "java/ActiveSubnetworkSearchMisc/ZStatistics.java",
    "chars": 1308,
    "preview": "package ActiveSubnetworkSearchMisc;\n\n/**\n *\n * @author Ozan Ozisik\n * adapted from  https://github.com/idekerlab/jActive"
  },
  {
    "path": "java/Application/AppActiveSubnetworkSearch.java",
    "chars": 5042,
    "preview": "package Application;\n\nimport ActiveSubnetworkSearchAlgorithms.ActiveSubnetworkSearch;\nimport java.util.logging.Level;\nim"
  },
  {
    "path": "java/Application/Parameters.java",
    "chars": 1537,
    "preview": "package Application;\n\n/**\n *\n * @author Ozan Ozisik\n */\n\npublic class Parameters {\n    public static String sifPath=\"BIO"
  },
  {
    "path": "java/File/ExperimentFileReader.java",
    "chars": 2141,
    "preview": "package File;\n\nimport java.io.BufferedReader;\nimport java.io.FileNotFoundException;\nimport java.io.FileReader;\nimport ja"
  },
  {
    "path": "java/File/SIFReader.java",
    "chars": 2334,
    "preview": "package File;\n\nimport Network.Network;\nimport java.io.BufferedReader;\nimport java.io.FileNotFoundException;\nimport java."
  },
  {
    "path": "java/Network/Network.java",
    "chars": 2116,
    "preview": "package Network;\n\nimport java.util.ArrayList;\nimport java.util.HashMap;\nimport java.util.HashSet;\nimport java.util.loggi"
  },
  {
    "path": "java/Network/Node.java",
    "chars": 851,
    "preview": "package Network;\n\nimport java.util.Objects;\n\n/**\n *\n * @author Ozan Ozisik\n */\npublic class Node {\n    private final Str"
  },
  {
    "path": "java/Network/SubnetworkFinder.java",
    "chars": 4234,
    "preview": "package Network;\n\nimport ActiveSubnetworkSearchAlgorithms.ActiveSubnetworkSearch;\nimport ActiveSubnetworkSearchMisc.Subn"
  },
  {
    "path": "man/UpSet_plot.Rd",
    "chars": 2996,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/visualization.R\n\\name{UpSet_plot}\n\\alias{U"
  },
  {
    "path": "man/active_snw_enrichment_wrapper.Rd",
    "chars": 3946,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/utility.R\n\\name{active_snw_enrichment_wrap"
  },
  {
    "path": "man/active_snw_search.Rd",
    "chars": 4246,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/active_snw_search.R\n\\name{active_snw_searc"
  },
  {
    "path": "man/annotate_term_genes.Rd",
    "chars": 1266,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/utility.R\n\\name{annotate_term_genes}\n\\alia"
  },
  {
    "path": "man/check_java_version.Rd",
    "chars": 554,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/zzz.R\n\\name{check_java_version}\n\\alias{che"
  },
  {
    "path": "man/cluster_enriched_terms.Rd",
    "chars": 2512,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/clustering.R\n\\name{cluster_enriched_terms}"
  },
  {
    "path": "man/cluster_graph_vis.Rd",
    "chars": 2041,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/clustering.R\n\\name{cluster_graph_vis}\n\\ali"
  },
  {
    "path": "man/color_kegg_pathway.Rd",
    "chars": 1316,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/visualization.R\n\\name{color_kegg_pathway}\n"
  },
  {
    "path": "man/combine_pathfindR_results.Rd",
    "chars": 2442,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/comparison.R\n\\name{combine_pathfindR_resul"
  },
  {
    "path": "man/combined_results_graph.Rd",
    "chars": 1939,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/comparison.R\n\\name{combined_results_graph}"
  },
  {
    "path": "man/configure_output_dir.Rd",
    "chars": 479,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/utility.R\n\\name{configure_output_dir}\n\\ali"
  },
  {
    "path": "man/create_HTML_report.Rd",
    "chars": 787,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/utility.R\n\\name{create_HTML_report}\n\\alias"
  },
  {
    "path": "man/create_kappa_matrix.Rd",
    "chars": 1182,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/clustering.R\n\\name{create_kappa_matrix}\n\\a"
  },
  {
    "path": "man/enrichment.Rd",
    "chars": 1980,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/enrichment.R\n\\name{enrichment}\n\\alias{enri"
  },
  {
    "path": "man/enrichment_analyses.Rd",
    "chars": 2625,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/enrichment.R\n\\name{enrichment_analyses}\n\\a"
  },
  {
    "path": "man/enrichment_chart.Rd",
    "chars": 2557,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/visualization.R\n\\name{enrichment_chart}\n\\a"
  },
  {
    "path": "man/fetch_gene_set.Rd",
    "chars": 1620,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/utility.R\n\\name{fetch_gene_set}\n\\alias{fet"
  },
  {
    "path": "man/fetch_java_version.Rd",
    "chars": 376,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/zzz.R\n\\name{fetch_java_version}\n\\alias{fet"
  },
  {
    "path": "man/filterActiveSnws.Rd",
    "chars": 1699,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/active_snw_search.R\n\\name{filterActiveSnws"
  },
  {
    "path": "man/fuzzy_term_clustering.Rd",
    "chars": 1637,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/clustering.R\n\\name{fuzzy_term_clustering}\n"
  },
  {
    "path": "man/get_biogrid_pin.Rd",
    "chars": 1008,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/data_generation.R\n\\name{get_biogrid_pin}\n\\"
  },
  {
    "path": "man/get_gene_sets_list.Rd",
    "chars": 1985,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/data_generation.R\n\\name{get_gene_sets_list"
  },
  {
    "path": "man/get_kegg_gsets.Rd",
    "chars": 715,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/data_generation.R\n\\name{get_kegg_gsets}\n\\a"
  },
  {
    "path": "man/get_mgsigdb_gsets.Rd",
    "chars": 1886,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/data_generation.R\n\\name{get_mgsigdb_gsets}"
  },
  {
    "path": "man/get_pin_file.Rd",
    "chars": 1175,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/data_generation.R\n\\name{get_pin_file}\n\\ali"
  },
  {
    "path": "man/get_reactome_gsets.Rd",
    "chars": 591,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/data_generation.R\n\\name{get_reactome_gsets"
  },
  {
    "path": "man/gset_list_from_gmt.Rd",
    "chars": 635,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/data_generation.R\n\\name{gset_list_from_gmt"
  },
  {
    "path": "man/hierarchical_term_clustering.Rd",
    "chars": 2310,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/clustering.R\n\\name{hierarchical_term_clust"
  },
  {
    "path": "man/hyperg_test.Rd",
    "chars": 1075,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/enrichment.R\n\\name{hyperg_test}\n\\alias{hyp"
  },
  {
    "path": "man/input_processing.Rd",
    "chars": 2029,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/utility.R\n\\name{input_processing}\n\\alias{i"
  },
  {
    "path": "man/input_testing.Rd",
    "chars": 930,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/utility.R\n\\name{input_testing}\n\\alias{inpu"
  },
  {
    "path": "man/isColor.Rd",
    "chars": 318,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/visualization.R\n\\name{isColor}\n\\alias{isCo"
  },
  {
    "path": "man/pathfindr.Rd",
    "chars": 2365,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/pathfindr.R\n\\docType{package}\n\\name{pathfi"
  },
  {
    "path": "man/plot_scores.Rd",
    "chars": 1761,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/scoring.R\n\\name{plot_scores}\n\\alias{plot_s"
  },
  {
    "path": "man/process_pin.Rd",
    "chars": 504,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/data_generation.R\n\\name{process_pin}\n\\alia"
  },
  {
    "path": "man/return_pin_path.Rd",
    "chars": 1230,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/utility.R\n\\name{return_pin_path}\n\\alias{re"
  },
  {
    "path": "man/run_pathfindr.Rd",
    "chars": 7079,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/core.R\n\\name{run_pathfindR}\n\\alias{run_pat"
  },
  {
    "path": "man/safe_get_content.Rd",
    "chars": 1285,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/data_generation.R\n\\name{safe_get_content}\n"
  },
  {
    "path": "man/score_terms.Rd",
    "chars": 3779,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/scoring.R\n\\name{score_terms}\n\\alias{score_"
  },
  {
    "path": "man/single_iter_wrapper.Rd",
    "chars": 3558,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/utility.R\n\\name{single_iter_wrapper}\n\\alia"
  },
  {
    "path": "man/summarize_enrichment_results.Rd",
    "chars": 1897,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/enrichment.R\n\\name{summarize_enrichment_re"
  },
  {
    "path": "man/term_gene_graph.Rd",
    "chars": 2908,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/visualization.R\n\\name{term_gene_graph}\n\\al"
  },
  {
    "path": "man/term_gene_heatmap.Rd",
    "chars": 2632,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/visualization.R\n\\name{term_gene_heatmap}\n\\"
  },
  {
    "path": "man/visualize_KEGG_diagram.Rd",
    "chars": 1654,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/visualization.R\n\\name{visualize_KEGG_diagr"
  },
  {
    "path": "man/visualize_active_subnetworks.Rd",
    "chars": 2369,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/active_snw_search.R\n\\name{visualize_active"
  },
  {
    "path": "man/visualize_term_interactions.Rd",
    "chars": 2088,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/visualization.R\n\\name{visualize_term_inter"
  },
  {
    "path": "man/visualize_terms.Rd",
    "chars": 2544,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/visualization.R\n\\name{visualize_terms}\n\\al"
  },
  {
    "path": "renv/.gitignore",
    "chars": 56,
    "preview": "library/\nlocal/\ncellar/\nlock/\npython/\nsandbox/\nstaging/\n"
  },
  {
    "path": "renv/activate.R",
    "chars": 36745,
    "preview": "\nlocal({\n\n  # the requested version of renv\n  version <- \"1.1.4\"\n  attr(version, \"sha\") <- NULL\n\n  # the project directo"
  },
  {
    "path": "renv/settings.json",
    "chars": 414,
    "preview": "{\n  \"bioconductor.version\": \"3.21\",\n  \"external.libraries\": [],\n  \"ignored.packages\": [],\n  \"package.dependency.fields\":"
  },
  {
    "path": "revdep/.gitignore",
    "chars": 65,
    "preview": "checks\nlibrary\nchecks.noindex\nlibrary.noindex\ndata.sqlite\n*.html\n"
  },
  {
    "path": "revdep/email.yml",
    "chars": 99,
    "preview": "release_date: ???\nrel_release_date: ???\nmy_news_url: ???\nrelease_version: ???\nrelease_details: ???\n"
  },
  {
    "path": "revdep/failures.md",
    "chars": 29,
    "preview": "*Wow, no problems at all. :)*"
  },
  {
    "path": "slides/cost_charme_school/demo_script.R",
    "chars": 3763,
    "preview": "##################################################\n## Project: pathfindR\n## Script purpose: COST CHARME Summer Training\n"
  },
  {
    "path": "tests/testthat/test-active_snw_search.R",
    "chars": 9285,
    "preview": "## Tests for functions related to active subnetwork search - Aug 2023\n\n# set up input data\ninput_data_frame <- example_p"
  },
  {
    "path": "tests/testthat/test-clustering.R",
    "chars": 13846,
    "preview": "## Tests for enriched term clustering functions - Aug 2023\n\nenrichment_res <- example_pathfindR_output[1:5, ]\ninput_kapp"
  },
  {
    "path": "tests/testthat/test-comparison.R",
    "chars": 3356,
    "preview": "## Tests for functions related to comparison of pathfindR results - Aug 2023\n\ninput_df_A <- example_pathfindR_output[1:2"
  },
  {
    "path": "tests/testthat/test-core.R",
    "chars": 3007,
    "preview": "## Tests for core function - Aug 2023\n\n# set up input data\ninput_data_frame <- example_pathfindR_input[1:10, c(1, 3)]\nco"
  },
  {
    "path": "tests/testthat/test-data_generation.R",
    "chars": 9412,
    "preview": "## Tests for functions related to data generation - September 2025\nlibrary(httr)\nlibrary(ggkegg)\n\ntest_that(\"safe_get_co"
  },
  {
    "path": "tests/testthat/test-enrichment.R",
    "chars": 10096,
    "preview": "## Tests for functions related to enrichment analyses - Aug 2023\nset.seed(123)\n\ntest_that(\"`hyperg_test()` -- returns an"
  },
  {
    "path": "tests/testthat/test-scoring.R",
    "chars": 6751,
    "preview": "## Tests for agglomerated term scoring functions - Jan 2024\n\ntest_that(\"`score_terms()` -- returns score matrix\", {\n    "
  },
  {
    "path": "tests/testthat/test-utility.R",
    "chars": 17718,
    "preview": "## Tests for various utility functions - Aug 2023\n\nset.seed(123)\n\ntest_that(\"`active_snw_enrichment_wrapper()` -- works "
  },
  {
    "path": "tests/testthat/test-visualization.R",
    "chars": 20579,
    "preview": "## Tests for functions related to various visualization functions - Apr 2024\n\nsingle_result <- example_pathfindR_output["
  },
  {
    "path": "tests/testthat/test-zzz.R",
    "chars": 2363,
    "preview": "## Tests for functions related to java version check - Aug 2023\n\ntest_that(\"`fetch_java_version()` works as expected\", {"
  },
  {
    "path": "tests/testthat-active_snw.R",
    "chars": 92,
    "preview": "library(testthat)\nlibrary(pathfindR)\n\ntest_check(\"pathfindR\", filter = \"active_snw_search\")\n"
  },
  {
    "path": "tests/testthat-clustering.R",
    "chars": 85,
    "preview": "library(testthat)\nlibrary(pathfindR)\n\ntest_check(\"pathfindR\", filter = \"clustering\")\n"
  },
  {
    "path": "tests/testthat-comparison.R",
    "chars": 85,
    "preview": "library(testthat)\nlibrary(pathfindR)\n\ntest_check(\"pathfindR\", filter = \"comparison\")\n"
  },
  {
    "path": "tests/testthat-core.R",
    "chars": 79,
    "preview": "library(testthat)\nlibrary(pathfindR)\n\ntest_check(\"pathfindR\", filter = \"core\")\n"
  },
  {
    "path": "tests/testthat-data_generation.R",
    "chars": 90,
    "preview": "library(testthat)\nlibrary(pathfindR)\n\ntest_check(\"pathfindR\", filter = \"data_generation\")\n"
  },
  {
    "path": "tests/testthat-enrichment.R",
    "chars": 85,
    "preview": "library(testthat)\nlibrary(pathfindR)\n\ntest_check(\"pathfindR\", filter = \"enrichment\")\n"
  },
  {
    "path": "tests/testthat-scoring.R",
    "chars": 82,
    "preview": "library(testthat)\nlibrary(pathfindR)\n\ntest_check(\"pathfindR\", filter = \"scoring\")\n"
  },
  {
    "path": "tests/testthat-utility.R",
    "chars": 82,
    "preview": "library(testthat)\nlibrary(pathfindR)\n\ntest_check(\"pathfindR\", filter = \"utility\")\n"
  },
  {
    "path": "tests/testthat-visualization.R",
    "chars": 88,
    "preview": "library(testthat)\nlibrary(pathfindR)\n\ntest_check(\"pathfindR\", filter = \"visualization\")\n"
  },
  {
    "path": "tests/testthat-zzz.R",
    "chars": 78,
    "preview": "library(testthat)\nlibrary(pathfindR)\n\ntest_check(\"pathfindR\", filter = \"zzz\")\n"
  },
  {
    "path": "vignettes/.gitignore",
    "chars": 11,
    "preview": "*.html\n*.R\n"
  },
  {
    "path": "vignettes/comparing_results.Rmd",
    "chars": 2458,
    "preview": "---\ntitle: \"Comparing Two pathfindR Results\"\noutput: rmarkdown::html_vignette\nvignette: >\n  %\\VignetteIndexEntry{Compari"
  },
  {
    "path": "vignettes/intro_vignette.Rmd",
    "chars": 23878,
    "preview": "---\ntitle: \"Introduction to pathfindR\"\nauthor: \"Ege Ulgen\"\ndate: \"`r Sys.Date()`\"\noutput: rmarkdown::html_vignette\nvigne"
  },
  {
    "path": "vignettes/manual_execution.Rmd",
    "chars": 7826,
    "preview": "---\ntitle: \"Step-by-Step Execution of the pathfindR Enrichment Workflow\"\nauthor: \"Ege Ulgen\"\ndate: \"`r Sys.Date()`\"\noutp"
  },
  {
    "path": "vignettes/non_hs_analysis.Rmd",
    "chars": 10189,
    "preview": "---\ntitle: \"pathfindR Analysis for non-Homo-sapiens organisms\"\nauthor: \"Ege Ulgen\"\ndate: \"`r Sys.Date()`\"\noutput: rmarkd"
  },
  {
    "path": "vignettes/obtain_data.Rmd",
    "chars": 3847,
    "preview": "---\ntitle: \"Obtaining PIN and Gene Sets Data\"\noutput: rmarkdown::html_vignette\ndate: \"`r Sys.Date()`\"\nvignette: >\n  %\\Vi"
  },
  {
    "path": "vignettes/visualization_vignette.Rmd",
    "chars": 8089,
    "preview": "---\ntitle: \"Visualization of pathfindR Enrichment Results\"\noutput: rmarkdown::html_vignette\ndate: \"`r Sys.Date()`\"\nvigne"
  }
]

// ... and 1 more files (download for full content)

About this extraction

This page contains the full source code of the egeulgen/pathfindR GitHub repository, extracted and formatted as plain text for AI agents and large language models (LLMs). The extraction includes 138 files (688.6 KB), approximately 207.2k tokens, and a symbol index with 101 extracted functions, classes, methods, constants, and types. Use this with OpenClaw, Claude, ChatGPT, Cursor, Windsurf, or any other AI tool that accepts text input. You can copy the full output to your clipboard or download it as a .txt file.

Extracted by GitExtract — free GitHub repo to text converter for AI. Built by Nikandr Surkov.

Extract another repo