Repository: yiluheihei/microbiomeMarker
Branch: devel
Commit: 66fc685c97be
Files: 179
Total size: 680.9 KB

Directory structure:
gitextract_few3gk_j/

├── .Rbuildignore
├── .gitattributes
├── .github/
│   ├── .gitignore
│   ├── ISSUE_TEMPLATE/
│   │   └── issue_template.md
│   └── workflows/
│       ├── check-bioc.yml
│       └── pkgdown.yaml
├── .gitignore
├── DESCRIPTION
├── LICENSE.md
├── NAMESPACE
├── NEWS.md
├── R/
│   ├── AllClasses.R
│   ├── AllGenerics.R
│   ├── DA-aldex.R
│   ├── DA-all.R
│   ├── DA-ancom.R
│   ├── DA-ancombc.R
│   ├── DA-comparing.R
│   ├── DA-deseq2.R
│   ├── DA-edgeR.R
│   ├── DA-lefse.R
│   ├── DA-limma-voom.R
│   ├── DA-metagenomeSeq.R
│   ├── DA-simple-statistic.R
│   ├── DA-sl.R
│   ├── DA-test-multiple-groups.R
│   ├── DA-test-two-groups.R
│   ├── abundances-methods.R
│   ├── aggregate-taxa.R
│   ├── assignment-methods.R
│   ├── confounder.R
│   ├── data.R
│   ├── extract-methods.R
│   ├── import-biobakery-lefse_in.R
│   ├── import-dada2.R
│   ├── import-picrust2.R
│   ├── import-qiime2.R
│   ├── lefse-utilities.R
│   ├── microbiomeMarker.R
│   ├── normalization.R
│   ├── plot-abundance.R
│   ├── plot-cladogram.R
│   ├── plot-comparing.R
│   ├── plot-effect-size.R
│   ├── plot-heatmap.R
│   ├── plot-postHocTest.R
│   ├── plot-sl-roc.R
│   ├── post-hoc-test.R
│   ├── reexports.R
│   ├── subset-marker.R
│   ├── summarize-taxa.R
│   ├── sysdata.rda
│   ├── test-utilities.R
│   ├── transform.R
│   └── utilities.R
├── README.Rmd
├── README.md
├── _pkgdown.yml
├── codecov.yml
├── data/
│   ├── caporaso.rda
│   ├── cid_ying.rda
│   ├── ecam.rda
│   ├── enterotypes_arumugam.rda
│   ├── kostic_crc.rda
│   ├── oxygen.rda
│   ├── pediatric_ibd.rda
│   └── spontaneous_colitis.rda
├── data-raw/
│   ├── available_ranks.R
│   └── data.R
├── inst/
│   ├── CITATION
│   └── extdata/
│       ├── dada2_samdata.txt
│       ├── dada2_seqtab.rds
│       ├── dada2_taxtab.rds
│       ├── picrust2_metadata.tsv
│       ├── refseq.qza
│       ├── sample-metadata.tsv
│       ├── table.qza
│       ├── taxonomy.qza
│       └── tree.qza
├── man/
│   ├── abundances-methods.Rd
│   ├── aggregate_taxa.Rd
│   ├── assign-marker_table.Rd
│   ├── assign-otu_table.Rd
│   ├── compare_DA.Rd
│   ├── confounder.Rd
│   ├── data-caporaso.Rd
│   ├── data-cid_ying.Rd
│   ├── data-ecam.Rd
│   ├── data-enterotypes_arumugam.Rd
│   ├── data-kostic_crc.Rd
│   ├── data-oxygen.Rd
│   ├── data-pediatric_ibd.Rd
│   ├── data-spontaneous_colitis.Rd
│   ├── effect_size-plot.Rd
│   ├── extract-methods.Rd
│   ├── extract_posthoc_res.Rd
│   ├── figures/
│   │   └── sticker.R
│   ├── get_treedata_phyloseq.Rd
│   ├── import_dada2.Rd
│   ├── import_picrust2.Rd
│   ├── import_qiime2.Rd
│   ├── marker_table-class.Rd
│   ├── marker_table-methods.Rd
│   ├── microbiomeMarker-class.Rd
│   ├── microbiomeMarker-package.Rd
│   ├── microbiomeMarker.Rd
│   ├── nmarker-methods.Rd
│   ├── normalize-methods.Rd
│   ├── phyloseq2DESeq2.Rd
│   ├── phyloseq2edgeR.Rd
│   ├── phyloseq2metagenomeSeq.Rd
│   ├── plot.compareDA.Rd
│   ├── plot_abundance.Rd
│   ├── plot_cladogram.Rd
│   ├── plot_heatmap.Rd
│   ├── plot_postHocTest.Rd
│   ├── plot_sl_roc.Rd
│   ├── postHocTest-class.Rd
│   ├── postHocTest.Rd
│   ├── reexports.Rd
│   ├── run_aldex.Rd
│   ├── run_ancom.Rd
│   ├── run_ancombc.Rd
│   ├── run_deseq2.Rd
│   ├── run_edger.Rd
│   ├── run_lefse.Rd
│   ├── run_limma_voom.Rd
│   ├── run_marker.Rd
│   ├── run_metagenomeseq.Rd
│   ├── run_posthoc_test.Rd
│   ├── run_simple_stat.Rd
│   ├── run_sl.Rd
│   ├── run_test_multiple_groups.Rd
│   ├── run_test_two_groups.Rd
│   ├── subset_marker.Rd
│   ├── summarize_taxa.Rd
│   ├── summary.compareDA.Rd
│   └── transform_abundances.Rd
├── tests/
│   ├── testthat/
│   │   ├── _snaps/
│   │   │   ├── ancom.md
│   │   │   ├── edgeR.md
│   │   │   ├── lefse.md
│   │   │   ├── limma-voom.md
│   │   │   ├── multiple-groups-test.md
│   │   │   └── two-group-test.md
│   │   ├── data/
│   │   │   ├── ancom-zero.csv
│   │   │   ├── ancom-zero_neg_lb.csv
│   │   │   ├── data_tax_duplicate.rds
│   │   │   └── generate_cladogram_annotation.rds
│   │   ├── test-abundances.R
│   │   ├── test-aldex.R
│   │   ├── test-ancom.R
│   │   ├── test-ancombc.R
│   │   ├── test-assignment.R
│   │   ├── test-barplot.R
│   │   ├── test-comparing.R
│   │   ├── test-confounder.R
│   │   ├── test-edgeR.R
│   │   ├── test-extract.R
│   │   ├── test-import-picrust2.R
│   │   ├── test-import-qiime2.R
│   │   ├── test-lefse-input.R
│   │   ├── test-lefse.R
│   │   ├── test-limma-voom.R
│   │   ├── test-metagenomeSeq.R
│   │   ├── test-microbiomeMaker-methods.R
│   │   ├── test-microbiomeMarker-class.R
│   │   ├── test-multiple-groups-test.R
│   │   ├── test-normalization.R
│   │   ├── test-sl.R
│   │   ├── test-summarize-tax.R
│   │   ├── test-transform.R
│   │   ├── test-two-group-test.R
│   │   ├── test-utilities.R
│   │   ├── test_cladogram.R
│   │   └── test_fix_duplicate_tax.R
│   └── testthat.R
└── vignettes/
    ├── .gitignore
    ├── microbiomeMarker-vignette.Rmd
    └── vignette.bib

================================================
FILE CONTENTS
================================================

================================================
FILE: .Rbuildignore
================================================
^microbiomeMarker\.Rproj$
^\.Rproj\.user$
^test\.R$
^README\.Rmd$
^data-raw$
^lefse$
^dev_test$
^\.github/workflows/R-CMD-check\.yaml$
^\.github$
^LICENSE\.md$
^codecov\.yml$
^_pkgdown\.yml$
^docs$
^pkgdown$
^man/figures/micribiome.png$


================================================
FILE: .gitattributes
================================================
* text=auto eol=lf


================================================
FILE: .github/.gitignore
================================================
*.html


================================================
FILE: .github/ISSUE_TEMPLATE/issue_template.md
================================================
Please briefly describe your problem, what output actually happened, and what
output you expect.

Please provide a minimal reproducible example. For more details on how to make
a great minimal reproducible example, see https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example and
https://www.tidyverse.org/help/#reprex.

```
Brief description of the problem

# insert minimal reproducible example here
```



================================================
FILE: .github/workflows/check-bioc.yml
================================================
## Read more about GitHub Actions and the features of this GitHub Actions workflow
## at https://lcolladotor.github.io/biocthis/articles/biocthis.html#use_bioc_github_action
##
## For more details, check the biocthis developer notes vignette at
## https://lcolladotor.github.io/biocthis/articles/biocthis_dev_notes.html
##
## You can add this workflow to other packages using:
## > biocthis::use_bioc_github_action()
##
## Using GitHub Actions exposes you to many details about how R packages are
## compiled and installed in several operating systems.
## If you need help, please follow the steps listed at
## https://github.com/r-lib/actions#where-to-find-help
##
## If you found an issue specific to biocthis's GHA workflow, please report it
## with the information that will make it easier for others to help you.
## Thank you!

## Acronyms:
## * GHA: GitHub Action
## * OS: operating system

on:
  push:
  pull_request:

name: R-CMD-check-bioc

## These environment variables control whether to run GHA code later on that is
## specific to testthat, covr, and pkgdown.
##
## If you need to clear the cache of packages, update the number inside
## cache-version as discussed at https://github.com/r-lib/actions/issues/86.
## Note that you can always run a GHA test without the cache by using the word
## "/nocache" in the commit message.
env:
  has_testthat: 'true'
  run_covr: 'true'
  run_pkgdown: 'true'
  has_RUnit: 'false'
  cache-version: 'cache-v1'
  run_docker: 'false'

jobs:
  build-check:
    runs-on: ${{ matrix.config.os }}
    name: ${{ matrix.config.os }} (${{ matrix.config.r }})
    container: ${{ matrix.config.cont }}
    ## Environment variables unique to this job.

    strategy:
      fail-fast: false
      matrix:
        config:
          - { os: ubuntu-latest, r: '4.2', bioc: '3.15', cont: "bioconductor/bioconductor_docker:RELEASE_3_15", rspm: "https://packagemanager.rstudio.com/cran/__linux__/focal/latest" }
          - { os: macOS-latest, r: '4.2', bioc: '3.15'}
          - { os: windows-latest, r: '4.2', bioc: '3.15'}
          ## Check https://github.com/r-lib/actions/tree/master/examples
          ## for examples using the http-user-agent
    env:
      R_REMOTES_NO_ERRORS_FROM_WARNINGS: true
      RSPM: ${{ matrix.config.rspm }}
      NOT_CRAN: true
      TZ: UTC
      GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
      GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }}

    steps:

      ## Set the R library to the directory matching the
      ## R packages cache step further below when running on Docker (Linux).
      - name: Set R Library home on Linux
        if: runner.os == 'Linux'
        run: |
          mkdir /__w/_temp/Library
          echo ".libPaths('/__w/_temp/Library')" > ~/.Rprofile

      ## Most of these steps are the same as the ones in
      ## https://github.com/r-lib/actions/blob/master/examples/check-standard.yaml
      ## If they update their steps, we will also need to update ours.
      - name: Checkout Repository
        uses: actions/checkout@v2

      ## R is already included in the Bioconductor docker images
      - name: Setup R from r-lib
        if: runner.os != 'Linux'
        uses: r-lib/actions/setup-r@master
        with:
          r-version: ${{ matrix.config.r }}
          http-user-agent: ${{ matrix.config.http-user-agent }}

      ## pandoc is already included in the Bioconductor docker images
      - name: Setup pandoc from r-lib
        if: runner.os != 'Linux'
        uses: r-lib/actions/setup-pandoc@master

      - name: Query dependencies
        run: |
          install.packages('remotes')
          saveRDS(remotes::dev_package_deps(dependencies = TRUE), ".github/depends.Rds", version = 2)
        shell: Rscript {0}

      - name: Restore R package cache
        if: "!contains(github.event.head_commit.message, '/nocache') && runner.os != 'Linux'"
        uses: actions/cache@v2
        with:
          path: ${{ env.R_LIBS_USER }}
          key: ${{ env.cache-version }}-${{ runner.os }}-biocversion-RELEASE_3_15-r-4.2-${{ hashFiles('.github/depends.Rds') }}
          restore-keys: ${{ env.cache-version }}-${{ runner.os }}-biocversion-RELEASE_3_15-r-4.2-

      - name: Cache R packages on Linux
        if: "!contains(github.event.head_commit.message, '/nocache') && runner.os == 'Linux' "
        uses: actions/cache@v2
        with:
          path: /home/runner/work/_temp/Library
          key: ${{ env.cache-version }}-${{ runner.os }}-biocversion-RELEASE_3_15-r-4.2-${{ hashFiles('.github/depends.Rds') }}
          restore-keys: ${{ env.cache-version }}-${{ runner.os }}-biocversion-RELEASE_3_15-r-4.2-

      - name: Install Linux system dependencies
        if: runner.os == 'Linux'
        run: |
          sysreqs=$(Rscript -e 'cat("apt-get update -y && apt-get install -y", paste(gsub("apt-get install -y ", "", remotes::system_requirements("ubuntu", "20.04")), collapse = " "))')
          echo $sysreqs
          sudo -s eval "$sysreqs"

      - name: Install macOS system dependencies
        if: matrix.config.os == 'macOS-latest'
        run: |
          ## Enable installing XML from source if needed
          brew install libxml2
          echo "XML_CONFIG=/usr/local/opt/libxml2/bin/xml2-config" >> $GITHUB_ENV

          ## Required to install magick as noted at
          ## https://github.com/r-lib/usethis/commit/f1f1e0d10c1ebc75fd4c18fa7e2de4551fd9978f#diff-9bfee71065492f63457918efcd912cf2
          brew install imagemagick@6

          ## For textshaping, required by ragg, and required by pkgdown
          brew install harfbuzz fribidi

          ## For installing usethis's dependency gert
          brew install libgit2

          ## Required for tcltk
          brew install xquartz --cask

      - name: Install Windows system dependencies
        if: runner.os == 'Windows'
        run: |
          ## Edit below if you have any Windows system dependencies
        shell: Rscript {0}

      - name: Install BiocManager
        run: |
          message(paste('****', Sys.time(), 'installing BiocManager ****'))
          remotes::install_cran("BiocManager")
        shell: Rscript {0}

      - name: Set BiocVersion
        run: |
          BiocManager::install(version = "${{ matrix.config.bioc }}", ask = FALSE, force = TRUE)
        shell: Rscript {0}

      - name: Install dependencies pass 1
        run: |
          ## Try installing the package dependencies in steps. First the local
          ## dependencies, then any remaining dependencies to avoid the
          ## issues described at
          ## https://stat.ethz.ch/pipermail/bioc-devel/2020-April/016675.html
          ## https://github.com/r-lib/remotes/issues/296
          ## Ideally, all dependencies should get installed in the first pass.

          ## Set the repos source depending on the OS
          ## Alternatively use https://storage.googleapis.com/bioconductor_docker/packages/
          ## though based on https://bit.ly/bioc2021-package-binaries
          ## the Azure link will be the main one going forward.
          gha_repos <- if(
              .Platform$OS.type == "unix" && Sys.info()["sysname"] != "Darwin"
          ) c(
              "AnVIL" = "https://bioconductordocker.blob.core.windows.net/packages/3.15/bioc",
              BiocManager::repositories()
              ) else BiocManager::repositories()

          ## For running the checks
          message(paste('****', Sys.time(), 'installing rcmdcheck and BiocCheck ****'))
          install.packages(c("rcmdcheck", "BiocCheck"), repos = gha_repos)

          ## Pass #1 at installing dependencies
          ## This pass uses AnVIL-powered fast binaries
          ## details at https://github.com/nturaga/bioc2021-bioconductor-binaries
          ## The speed gains only apply to the docker builds.
          message(paste('****', Sys.time(), 'pass number 1 at installing dependencies: local dependencies ****'))
          remotes::install_local(dependencies = TRUE, repos = gha_repos, build_vignettes = FALSE, upgrade = TRUE)
        continue-on-error: true
        shell: Rscript {0}

      - name: Install dependencies pass 2
        run: |
          ## Pass #2 at installing dependencies
          ## This pass does not use AnVIL and will thus update any packages
          ## that have since been updated in Bioconductor
          message(paste('****', Sys.time(), 'pass number 2 at installing dependencies: any remaining dependencies ****'))
          remotes::install_local(dependencies = TRUE, repos = BiocManager::repositories(), build_vignettes = TRUE, upgrade = TRUE, force = TRUE)
        shell: Rscript {0}

      - name: Install BiocGenerics
        if:  env.has_RUnit == 'true'
        run: |
          ## Install BiocGenerics
          BiocManager::install("BiocGenerics")
        shell: Rscript {0}

      - name: Install covr
        if: github.ref == 'refs/heads/master' && env.run_covr == 'true' && runner.os == 'Linux'
        run: |
          remotes::install_cran("covr")
        shell: Rscript {0}

      - name: Install pkgdown
        if: github.ref == 'refs/heads/master' && env.run_pkgdown == 'true' && runner.os == 'Linux'
        run: |
          remotes::install_cran("pkgdown")
        shell: Rscript {0}

      - name: Session info
        run: |
          options(width = 100)
          pkgs <- installed.packages()[, "Package"]
          sessioninfo::session_info(pkgs, include_base = TRUE)
        shell: Rscript {0}

      - name: Run CMD check
        env:
          _R_CHECK_CRAN_INCOMING_: false
          DISPLAY: 99.0
        run: |
          options(crayon.enabled = TRUE)
          rcmdcheck::rcmdcheck(
              args = c("--no-manual", "--no-vignettes", "--timings"),
              build_args = c("--no-manual", "--keep-empty-dirs", "--no-resave-data"),
              error_on = "warning",
              check_dir = "check"
          )
        shell: Rscript {0}

      ## Might need to add this to the if:  && runner.os == 'Linux'
      - name: Reveal testthat details
        if:  env.has_testthat == 'true'
        run: find . -name testthat.Rout -exec cat '{}' ';'

      - name: Run RUnit tests
        if:  env.has_RUnit == 'true'
        run: |
          BiocGenerics:::testPackage()
        shell: Rscript {0}

      - name: Run BiocCheck
        env:
          DISPLAY: 99.0
        run: |
          BiocCheck::BiocCheck(
              dir('check', 'tar.gz$', full.names = TRUE),
              `quit-with-status` = TRUE,
              `no-check-R-ver` = TRUE,
              `no-check-bioc-help` = TRUE
          )
        shell: Rscript {0}

      - name: Test coverage
        if: github.ref == 'refs/heads/master' && env.run_covr == 'true' && runner.os == 'Linux'
        run: |
          covr::codecov()
        shell: Rscript {0}

      - name: Install package
        if: github.ref == 'refs/heads/master' && env.run_pkgdown == 'true' && runner.os == 'Linux'
        run: R CMD INSTALL .

      - name: Build and deploy pkgdown site
        if: github.ref == 'refs/heads/master' && env.run_pkgdown == 'true' && runner.os == 'Linux'
        run: |
          git config --local user.name "$GITHUB_ACTOR"
          git config --local user.email "$GITHUB_ACTOR@users.noreply.github.com"
          Rscript -e "pkgdown::deploy_to_branch(new_process = FALSE)"
        shell: bash {0}
        ## Note that you need to run pkgdown::deploy_to_branch(new_process = FALSE)
        ## at least once locally before this will work. This creates the gh-pages
        ## branch (erasing anything you haven't version controlled!) and
        ## makes the git history recognizable by pkgdown.

      - name: Upload check results
        if: failure()
        uses: actions/upload-artifact@master
        with:
          name: ${{ runner.os }}-biocversion-RELEASE_3_15-r-4.2-results
          path: check

        ## Note that DOCKER_PASSWORD is really a token for your dockerhub
        ## account, not your actual dockerhub account password.
        ## This comes from
        ## https://seandavi.github.io/BuildABiocWorkshop/articles/HOWTO_BUILD_WORKSHOP.html#6-add-secrets-to-github-repo
        ## Check https://github.com/docker/build-push-action/tree/releases/v1
        ## for more details.
      - uses: docker/build-push-action@v1
        if: "!contains(github.event.head_commit.message, '/nodocker') && env.run_docker == 'true' && runner.os == 'Linux' "
        with:
          username: ${{ secrets.DOCKER_USERNAME }}
          password: ${{ secrets.DOCKER_PASSWORD }}
          repository: yiluheihei/microbiomemarker
          tag_with_ref: true
          tag_with_sha: true
          tags: latest


================================================
FILE: .github/workflows/pkgdown.yaml
================================================
on:
  push:
    branches:
      - main
      - master
    tags:
      - '*'

name: pkgdown

jobs:
  pkgdown:
    runs-on: macOS-latest
    env:
      GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }}
    steps:
      - uses: actions/checkout@v2

      - uses: r-lib/actions/setup-r@v1

      - uses: r-lib/actions/setup-pandoc@v1

      - name: Install XQuartz # .onLoad failed in loadNamespace() for 'Cairo'
        run: brew install xquartz --cask

      - name: Query dependencies
        run: |
          install.packages('remotes')
          saveRDS(remotes::dev_package_deps(dependencies = TRUE), ".github/depends.Rds", version = 2)
          writeLines(sprintf("R-%i.%i", getRversion()$major, getRversion()$minor), ".github/R-version")
        shell: Rscript {0}

      - name: Restore R package cache
        uses: actions/cache@v2
        with:
          path: ${{ env.R_LIBS_USER }}
          key: ${{ runner.os }}-${{ hashFiles('.github/R-version') }}-1-${{ hashFiles('.github/depends.Rds') }}
          restore-keys: ${{ runner.os }}-${{ hashFiles('.github/R-version') }}-1-

      - name: Install dependencies
        run: |
          remotes::install_deps(dependencies = TRUE)
          install.packages("pkgdown", type = "binary")
        shell: Rscript {0}

      - name: Install package
        run: R CMD INSTALL .

      - name: Deploy package
        run: |
          git config --local user.email "actions@github.com"
          git config --local user.name "GitHub Actions"
          Rscript -e 'pkgdown::deploy_to_branch(new_process = FALSE)'


================================================
FILE: .gitignore
================================================
.RData
__MACOSX
docs
inst/doc
.Rproj.user
*.Rproj


================================================
FILE: DESCRIPTION
================================================
Package: microbiomeMarker
Title: microbiome biomarker analysis toolkit
Version: 1.13.2
Authors@R: 
    person(given = "Yang",
           family = "Cao",
           role = c("aut", "cre"),
           email = "caoyang.name@gmail.com")
Description: To date, a number of methods have been developed for microbiome
    marker discovery based on metagenomic profiles, e.g. LEfSe. However, each of
    these methods has its own advantages and disadvantages, and none of them is
    considered standard or universal. Moreover, different programs may be
    developed using different programming languages, or even for different
    operating systems. Here, we have developed an all-in-one R package,
    microbiomeMarker, that integrates commonly used differential analysis
    methods as well as three machine learning-based approaches (logistic
    regression, random forest, and support vector machine) to
    facilitate the identification of microbiome markers.
License: GPL-3
biocViews: Metagenomics, Microbiome, DifferentialExpression
URL: https://github.com/yiluheihei/microbiomeMarker
BugReports: https://github.com/yiluheihei/microbiomeMarker/issues
Depends: R (>= 4.1.0)
Imports: 
    dplyr,
    phyloseq,
    magrittr,
    purrr,
    MASS,
    utils,
    ggplot2,
    tibble,
    rlang,
    stats,
    coin,
    ggtree,
    tidytree,
    methods,
    IRanges,
    tidyr,
    patchwork,
    ggsignif,
    metagenomeSeq,
    DESeq2,
    edgeR,
    BiocGenerics,
    Biostrings,
    yaml,
    biomformat,
    S4Vectors,
    Biobase,
    ComplexHeatmap,
    ANCOMBC,
    caret,
    limma,
    ALDEx2,
    multtest,
    plotROC,
    vegan,
    pROC,
    BiocParallel
Encoding: UTF-8
RoxygenNote: 7.3.2
Roxygen: list(markdown = TRUE)
Suggests: 
    testthat,
    covr,
    glmnet,
    Matrix,
    kernlab,
    e1071,
    ranger,
    knitr,
    rmarkdown,
    BiocStyle,
    withr,
    microbiome
VignetteBuilder: knitr
Config/testthat/edition: 3


================================================
FILE: LICENSE.md
================================================
GNU General Public License
==========================

_Version 3, 29 June 2007_  
_Copyright © 2007 Free Software Foundation, Inc. &lt;<http://fsf.org/>&gt;_

Everyone is permitted to copy and distribute verbatim copies of this license
document, but changing it is not allowed.

## Preamble

The GNU General Public License is a free, copyleft license for software and other
kinds of works.

The licenses for most software and other practical works are designed to take away
your freedom to share and change the works. By contrast, the GNU General Public
License is intended to guarantee your freedom to share and change all versions of a
program--to make sure it remains free software for all its users. We, the Free
Software Foundation, use the GNU General Public License for most of our software; it
applies also to any other work released this way by its authors. You can apply it to
your programs, too.

When we speak of free software, we are referring to freedom, not price. Our General
Public Licenses are designed to make sure that you have the freedom to distribute
copies of free software (and charge for them if you wish), that you receive source
code or can get it if you want it, that you can change the software or use pieces of
it in new free programs, and that you know you can do these things.

To protect your rights, we need to prevent others from denying you these rights or
asking you to surrender the rights. Therefore, you have certain responsibilities if
you distribute copies of the software, or if you modify it: responsibilities to
respect the freedom of others.

For example, if you distribute copies of such a program, whether gratis or for a fee,
you must pass on to the recipients the same freedoms that you received. You must make
sure that they, too, receive or can get the source code. And you must show them these
terms so they know their rights.

Developers that use the GNU GPL protect your rights with two steps: **(1)** assert
copyright on the software, and **(2)** offer you this License giving you legal permission
to copy, distribute and/or modify it.

For the developers' and authors' protection, the GPL clearly explains that there is
no warranty for this free software. For both users' and authors' sake, the GPL
requires that modified versions be marked as changed, so that their problems will not
be attributed erroneously to authors of previous versions.

Some devices are designed to deny users access to install or run modified versions of
the software inside them, although the manufacturer can do so. This is fundamentally
incompatible with the aim of protecting users' freedom to change the software. The
systematic pattern of such abuse occurs in the area of products for individuals to
use, which is precisely where it is most unacceptable. Therefore, we have designed
this version of the GPL to prohibit the practice for those products. If such problems
arise substantially in other domains, we stand ready to extend this provision to
those domains in future versions of the GPL, as needed to protect the freedom of
users.

Finally, every program is threatened constantly by software patents. States should
not allow patents to restrict development and use of software on general-purpose
computers, but in those that do, we wish to avoid the special danger that patents
applied to a free program could make it effectively proprietary. To prevent this, the
GPL assures that patents cannot be used to render the program non-free.

The precise terms and conditions for copying, distribution and modification follow.

## TERMS AND CONDITIONS

### 0. Definitions

“This License” refers to version 3 of the GNU General Public License.

“Copyright” also means copyright-like laws that apply to other kinds of
works, such as semiconductor masks.

“The Program” refers to any copyrightable work licensed under this
License. Each licensee is addressed as “you”. “Licensees” and
“recipients” may be individuals or organizations.

To “modify” a work means to copy from or adapt all or part of the work in
a fashion requiring copyright permission, other than the making of an exact copy. The
resulting work is called a “modified version” of the earlier work or a
work “based on” the earlier work.

A “covered work” means either the unmodified Program or a work based on
the Program.

To “propagate” a work means to do anything with it that, without
permission, would make you directly or secondarily liable for infringement under
applicable copyright law, except executing it on a computer or modifying a private
copy. Propagation includes copying, distribution (with or without modification),
making available to the public, and in some countries other activities as well.

To “convey” a work means any kind of propagation that enables other
parties to make or receive copies. Mere interaction with a user through a computer
network, with no transfer of a copy, is not conveying.

An interactive user interface displays “Appropriate Legal Notices” to the
extent that it includes a convenient and prominently visible feature that **(1)**
displays an appropriate copyright notice, and **(2)** tells the user that there is no
warranty for the work (except to the extent that warranties are provided), that
licensees may convey the work under this License, and how to view a copy of this
License. If the interface presents a list of user commands or options, such as a
menu, a prominent item in the list meets this criterion.

### 1. Source Code

The “source code” for a work means the preferred form of the work for
making modifications to it. “Object code” means any non-source form of a
work.

A “Standard Interface” means an interface that either is an official
standard defined by a recognized standards body, or, in the case of interfaces
specified for a particular programming language, one that is widely used among
developers working in that language.

The “System Libraries” of an executable work include anything, other than
the work as a whole, that **(a)** is included in the normal form of packaging a Major
Component, but which is not part of that Major Component, and **(b)** serves only to
enable use of the work with that Major Component, or to implement a Standard
Interface for which an implementation is available to the public in source code form.
A “Major Component”, in this context, means a major essential component
(kernel, window system, and so on) of the specific operating system (if any) on which
the executable work runs, or a compiler used to produce the work, or an object code
interpreter used to run it.

The “Corresponding Source” for a work in object code form means all the
source code needed to generate, install, and (for an executable work) run the object
code and to modify the work, including scripts to control those activities. However,
it does not include the work's System Libraries, or general-purpose tools or
generally available free programs which are used unmodified in performing those
activities but which are not part of the work. For example, Corresponding Source
includes interface definition files associated with source files for the work, and
the source code for shared libraries and dynamically linked subprograms that the work
is specifically designed to require, such as by intimate data communication or
control flow between those subprograms and other parts of the work.

The Corresponding Source need not include anything that users can regenerate
automatically from other parts of the Corresponding Source.

The Corresponding Source for a work in source code form is that same work.

### 2. Basic Permissions

All rights granted under this License are granted for the term of copyright on the
Program, and are irrevocable provided the stated conditions are met. This License
explicitly affirms your unlimited permission to run the unmodified Program. The
output from running a covered work is covered by this License only if the output,
given its content, constitutes a covered work. This License acknowledges your rights
of fair use or other equivalent, as provided by copyright law.

You may make, run and propagate covered works that you do not convey, without
conditions so long as your license otherwise remains in force. You may convey covered
works to others for the sole purpose of having them make modifications exclusively
for you, or provide you with facilities for running those works, provided that you
comply with the terms of this License in conveying all material for which you do not
control copyright. Those thus making or running the covered works for you must do so
exclusively on your behalf, under your direction and control, on terms that prohibit
them from making any copies of your copyrighted material outside their relationship
with you.

Conveying under any other circumstances is permitted solely under the conditions
stated below. Sublicensing is not allowed; section 10 makes it unnecessary.

### 3. Protecting Users' Legal Rights From Anti-Circumvention Law

No covered work shall be deemed part of an effective technological measure under any
applicable law fulfilling obligations under article 11 of the WIPO copyright treaty
adopted on 20 December 1996, or similar laws prohibiting or restricting circumvention
of such measures.

When you convey a covered work, you waive any legal power to forbid circumvention of
technological measures to the extent such circumvention is effected by exercising
rights under this License with respect to the covered work, and you disclaim any
intention to limit operation or modification of the work as a means of enforcing,
against the work's users, your or third parties' legal rights to forbid circumvention
of technological measures.

### 4. Conveying Verbatim Copies

You may convey verbatim copies of the Program's source code as you receive it, in any
medium, provided that you conspicuously and appropriately publish on each copy an
appropriate copyright notice; keep intact all notices stating that this License and
any non-permissive terms added in accord with section 7 apply to the code; keep
intact all notices of the absence of any warranty; and give all recipients a copy of
this License along with the Program.

You may charge any price or no price for each copy that you convey, and you may offer
support or warranty protection for a fee.

### 5. Conveying Modified Source Versions

You may convey a work based on the Program, or the modifications to produce it from
the Program, in the form of source code under the terms of section 4, provided that
you also meet all of these conditions:

* **a)** The work must carry prominent notices stating that you modified it, and giving a
relevant date.
* **b)** The work must carry prominent notices stating that it is released under this
License and any conditions added under section 7. This requirement modifies the
requirement in section 4 to “keep intact all notices”.
* **c)** You must license the entire work, as a whole, under this License to anyone who
comes into possession of a copy. This License will therefore apply, along with any
applicable section 7 additional terms, to the whole of the work, and all its parts,
regardless of how they are packaged. This License gives no permission to license the
work in any other way, but it does not invalidate such permission if you have
separately received it.
* **d)** If the work has interactive user interfaces, each must display Appropriate Legal
Notices; however, if the Program has interactive interfaces that do not display
Appropriate Legal Notices, your work need not make them do so.

A compilation of a covered work with other separate and independent works, which are
not by their nature extensions of the covered work, and which are not combined with
it such as to form a larger program, in or on a volume of a storage or distribution
medium, is called an “aggregate” if the compilation and its resulting
copyright are not used to limit the access or legal rights of the compilation's users
beyond what the individual works permit. Inclusion of a covered work in an aggregate
does not cause this License to apply to the other parts of the aggregate.

### 6. Conveying Non-Source Forms

You may convey a covered work in object code form under the terms of sections 4 and
5, provided that you also convey the machine-readable Corresponding Source under the
terms of this License, in one of these ways:

* **a)** Convey the object code in, or embodied in, a physical product (including a
physical distribution medium), accompanied by the Corresponding Source fixed on a
durable physical medium customarily used for software interchange.
* **b)** Convey the object code in, or embodied in, a physical product (including a
physical distribution medium), accompanied by a written offer, valid for at least
three years and valid for as long as you offer spare parts or customer support for
that product model, to give anyone who possesses the object code either **(1)** a copy of
the Corresponding Source for all the software in the product that is covered by this
License, on a durable physical medium customarily used for software interchange, for
a price no more than your reasonable cost of physically performing this conveying of
source, or **(2)** access to copy the Corresponding Source from a network server at no
charge.
* **c)** Convey individual copies of the object code with a copy of the written offer to
provide the Corresponding Source. This alternative is allowed only occasionally and
noncommercially, and only if you received the object code with such an offer, in
accord with subsection 6b.
* **d)** Convey the object code by offering access from a designated place (gratis or for
a charge), and offer equivalent access to the Corresponding Source in the same way
through the same place at no further charge. You need not require recipients to copy
the Corresponding Source along with the object code. If the place to copy the object
code is a network server, the Corresponding Source may be on a different server
(operated by you or a third party) that supports equivalent copying facilities,
provided you maintain clear directions next to the object code saying where to find
the Corresponding Source. Regardless of what server hosts the Corresponding Source,
you remain obligated to ensure that it is available for as long as needed to satisfy
these requirements.
* **e)** Convey the object code using peer-to-peer transmission, provided you inform
other peers where the object code and Corresponding Source of the work are being
offered to the general public at no charge under subsection 6d.

A separable portion of the object code, whose source code is excluded from the
Corresponding Source as a System Library, need not be included in conveying the
object code work.

A “User Product” is either **(1)** a “consumer product”, which
means any tangible personal property which is normally used for personal, family, or
household purposes, or **(2)** anything designed or sold for incorporation into a
dwelling. In determining whether a product is a consumer product, doubtful cases
shall be resolved in favor of coverage. For a particular product received by a
particular user, “normally used” refers to a typical or common use of
that class of product, regardless of the status of the particular user or of the way
in which the particular user actually uses, or expects or is expected to use, the
product. A product is a consumer product regardless of whether the product has
substantial commercial, industrial or non-consumer uses, unless such uses represent
the only significant mode of use of the product.

“Installation Information” for a User Product means any methods,
procedures, authorization keys, or other information required to install and execute
modified versions of a covered work in that User Product from a modified version of
its Corresponding Source. The information must suffice to ensure that the continued
functioning of the modified object code is in no case prevented or interfered with
solely because modification has been made.

If you convey an object code work under this section in, or with, or specifically for
use in, a User Product, and the conveying occurs as part of a transaction in which
the right of possession and use of the User Product is transferred to the recipient
in perpetuity or for a fixed term (regardless of how the transaction is
characterized), the Corresponding Source conveyed under this section must be
accompanied by the Installation Information. But this requirement does not apply if
neither you nor any third party retains the ability to install modified object code
on the User Product (for example, the work has been installed in ROM).

The requirement to provide Installation Information does not include a requirement to
continue to provide support service, warranty, or updates for a work that has been
modified or installed by the recipient, or for the User Product in which it has been
modified or installed. Access to a network may be denied when the modification itself
materially and adversely affects the operation of the network or violates the rules
and protocols for communication across the network.

Corresponding Source conveyed, and Installation Information provided, in accord with
this section must be in a format that is publicly documented (and with an
implementation available to the public in source code form), and must require no
special password or key for unpacking, reading or copying.

### 7. Additional Terms

“Additional permissions” are terms that supplement the terms of this
License by making exceptions from one or more of its conditions. Additional
permissions that are applicable to the entire Program shall be treated as though they
were included in this License, to the extent that they are valid under applicable
law. If additional permissions apply only to part of the Program, that part may be
used separately under those permissions, but the entire Program remains governed by
this License without regard to the additional permissions.

When you convey a copy of a covered work, you may at your option remove any
additional permissions from that copy, or from any part of it. (Additional
permissions may be written to require their own removal in certain cases when you
modify the work.) You may place additional permissions on material, added by you to a
covered work, for which you have or can give appropriate copyright permission.

Notwithstanding any other provision of this License, for material you add to a
covered work, you may (if authorized by the copyright holders of that material)
supplement the terms of this License with terms:

* **a)** Disclaiming warranty or limiting liability differently from the terms of
sections 15 and 16 of this License; or
* **b)** Requiring preservation of specified reasonable legal notices or author
attributions in that material or in the Appropriate Legal Notices displayed by works
containing it; or
* **c)** Prohibiting misrepresentation of the origin of that material, or requiring that
modified versions of such material be marked in reasonable ways as different from the
original version; or
* **d)** Limiting the use for publicity purposes of names of licensors or authors of the
material; or
* **e)** Declining to grant rights under trademark law for use of some trade names,
trademarks, or service marks; or
* **f)** Requiring indemnification of licensors and authors of that material by anyone
who conveys the material (or modified versions of it) with contractual assumptions of
liability to the recipient, for any liability that these contractual assumptions
directly impose on those licensors and authors.

All other non-permissive additional terms are considered “further
restrictions” within the meaning of section 10. If the Program as you received
it, or any part of it, contains a notice stating that it is governed by this License
along with a term that is a further restriction, you may remove that term. If a
license document contains a further restriction but permits relicensing or conveying
under this License, you may add to a covered work material governed by the terms of
that license document, provided that the further restriction does not survive such
relicensing or conveying.

If you add terms to a covered work in accord with this section, you must place, in
the relevant source files, a statement of the additional terms that apply to those
files, or a notice indicating where to find the applicable terms.

Additional terms, permissive or non-permissive, may be stated in the form of a
separately written license, or stated as exceptions; the above requirements apply
either way.

### 8. Termination

You may not propagate or modify a covered work except as expressly provided under
this License. Any attempt otherwise to propagate or modify it is void, and will
automatically terminate your rights under this License (including any patent licenses
granted under the third paragraph of section 11).

However, if you cease all violation of this License, then your license from a
particular copyright holder is reinstated **(a)** provisionally, unless and until the
copyright holder explicitly and finally terminates your license, and **(b)** permanently,
if the copyright holder fails to notify you of the violation by some reasonable means
prior to 60 days after the cessation.

Moreover, your license from a particular copyright holder is reinstated permanently
if the copyright holder notifies you of the violation by some reasonable means, this
is the first time you have received notice of violation of this License (for any
work) from that copyright holder, and you cure the violation prior to 30 days after
your receipt of the notice.

Termination of your rights under this section does not terminate the licenses of
parties who have received copies or rights from you under this License. If your
rights have been terminated and not permanently reinstated, you do not qualify to
receive new licenses for the same material under section 10.

### 9. Acceptance Not Required for Having Copies

You are not required to accept this License in order to receive or run a copy of the
Program. Ancillary propagation of a covered work occurring solely as a consequence of
using peer-to-peer transmission to receive a copy likewise does not require
acceptance. However, nothing other than this License grants you permission to
propagate or modify any covered work. These actions infringe copyright if you do not
accept this License. Therefore, by modifying or propagating a covered work, you
indicate your acceptance of this License to do so.

### 10. Automatic Licensing of Downstream Recipients

Each time you convey a covered work, the recipient automatically receives a license
from the original licensors, to run, modify and propagate that work, subject to this
License. You are not responsible for enforcing compliance by third parties with this
License.

An “entity transaction” is a transaction transferring control of an
organization, or substantially all assets of one, or subdividing an organization, or
merging organizations. If propagation of a covered work results from an entity
transaction, each party to that transaction who receives a copy of the work also
receives whatever licenses to the work the party's predecessor in interest had or
could give under the previous paragraph, plus a right to possession of the
Corresponding Source of the work from the predecessor in interest, if the predecessor
has it or can get it with reasonable efforts.

You may not impose any further restrictions on the exercise of the rights granted or
affirmed under this License. For example, you may not impose a license fee, royalty,
or other charge for exercise of rights granted under this License, and you may not
initiate litigation (including a cross-claim or counterclaim in a lawsuit) alleging
that any patent claim is infringed by making, using, selling, offering for sale, or
importing the Program or any portion of it.

### 11. Patents

A “contributor” is a copyright holder who authorizes use under this
License of the Program or a work on which the Program is based. The work thus
licensed is called the contributor's “contributor version”.

A contributor's “essential patent claims” are all patent claims owned or
controlled by the contributor, whether already acquired or hereafter acquired, that
would be infringed by some manner, permitted by this License, of making, using, or
selling its contributor version, but do not include claims that would be infringed
only as a consequence of further modification of the contributor version. For
purposes of this definition, “control” includes the right to grant patent
sublicenses in a manner consistent with the requirements of this License.

Each contributor grants you a non-exclusive, worldwide, royalty-free patent license
under the contributor's essential patent claims, to make, use, sell, offer for sale,
import and otherwise run, modify and propagate the contents of its contributor
version.

In the following three paragraphs, a “patent license” is any express
agreement or commitment, however denominated, not to enforce a patent (such as an
express permission to practice a patent or covenant not to sue for patent
infringement). To “grant” such a patent license to a party means to make
such an agreement or commitment not to enforce a patent against the party.

If you convey a covered work, knowingly relying on a patent license, and the
Corresponding Source of the work is not available for anyone to copy, free of charge
and under the terms of this License, through a publicly available network server or
other readily accessible means, then you must either **(1)** cause the Corresponding
Source to be so available, or **(2)** arrange to deprive yourself of the benefit of the
patent license for this particular work, or **(3)** arrange, in a manner consistent with
the requirements of this License, to extend the patent license to downstream
recipients. “Knowingly relying” means you have actual knowledge that, but
for the patent license, your conveying the covered work in a country, or your
recipient's use of the covered work in a country, would infringe one or more
identifiable patents in that country that you have reason to believe are valid.

If, pursuant to or in connection with a single transaction or arrangement, you
convey, or propagate by procuring conveyance of, a covered work, and grant a patent
license to some of the parties receiving the covered work authorizing them to use,
propagate, modify or convey a specific copy of the covered work, then the patent
license you grant is automatically extended to all recipients of the covered work and
works based on it.

A patent license is “discriminatory” if it does not include within the
scope of its coverage, prohibits the exercise of, or is conditioned on the
non-exercise of one or more of the rights that are specifically granted under this
License. You may not convey a covered work if you are a party to an arrangement with
a third party that is in the business of distributing software, under which you make
payment to the third party based on the extent of your activity of conveying the
work, and under which the third party grants, to any of the parties who would receive
the covered work from you, a discriminatory patent license **(a)** in connection with
copies of the covered work conveyed by you (or copies made from those copies), or **(b)**
primarily for and in connection with specific products or compilations that contain
the covered work, unless you entered into that arrangement, or that patent license
was granted, prior to 28 March 2007.

Nothing in this License shall be construed as excluding or limiting any implied
license or other defenses to infringement that may otherwise be available to you
under applicable patent law.

### 12. No Surrender of Others' Freedom

If conditions are imposed on you (whether by court order, agreement or otherwise)
that contradict the conditions of this License, they do not excuse you from the
conditions of this License. If you cannot convey a covered work so as to satisfy
simultaneously your obligations under this License and any other pertinent
obligations, then as a consequence you may not convey it at all. For example, if you
agree to terms that obligate you to collect a royalty for further conveying from
those to whom you convey the Program, the only way you could satisfy both those terms
and this License would be to refrain entirely from conveying the Program.

### 13. Use with the GNU Affero General Public License

Notwithstanding any other provision of this License, you have permission to link or
combine any covered work with a work licensed under version 3 of the GNU Affero
General Public License into a single combined work, and to convey the resulting work.
The terms of this License will continue to apply to the part which is the covered
work, but the special requirements of the GNU Affero General Public License, section
13, concerning interaction through a network will apply to the combination as such.

### 14. Revised Versions of this License

The Free Software Foundation may publish revised and/or new versions of the GNU
General Public License from time to time. Such new versions will be similar in spirit
to the present version, but may differ in detail to address new problems or concerns.

Each version is given a distinguishing version number. If the Program specifies that
a certain numbered version of the GNU General Public License “or any later
version” applies to it, you have the option of following the terms and
conditions either of that numbered version or of any later version published by the
Free Software Foundation. If the Program does not specify a version number of the GNU
General Public License, you may choose any version ever published by the Free
Software Foundation.

If the Program specifies that a proxy can decide which future versions of the GNU
General Public License can be used, that proxy's public statement of acceptance of a
version permanently authorizes you to choose that version for the Program.

Later license versions may give you additional or different permissions. However, no
additional obligations are imposed on any author or copyright holder as a result of
your choosing to follow a later version.

### 15. Disclaimer of Warranty

THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW.
EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES
PROVIDE THE PROGRAM “AS IS” WITHOUT WARRANTY OF ANY KIND, EITHER
EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE
QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE PROGRAM PROVE
DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION.

### 16. Limitation of Liability

IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY
COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES AND/OR CONVEYS THE PROGRAM AS
PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL,
INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE
PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE
OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE
WITH ANY OTHER PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE
POSSIBILITY OF SUCH DAMAGES.

### 17. Interpretation of Sections 15 and 16

If the disclaimer of warranty and limitation of liability provided above cannot be
given local legal effect according to their terms, reviewing courts shall apply local
law that most closely approximates an absolute waiver of all civil liability in
connection with the Program, unless a warranty or assumption of liability accompanies
a copy of the Program in return for a fee.

_END OF TERMS AND CONDITIONS_

## How to Apply These Terms to Your New Programs

If you develop a new program, and you want it to be of the greatest possible use to
the public, the best way to achieve this is to make it free software which everyone
can redistribute and change under these terms.

To do so, attach the following notices to the program. It is safest to attach them
to the start of each source file to most effectively state the exclusion of warranty;
and each file should have at least the “copyright” line and a pointer to
where the full notice is found.

    <one line to give the program's name and a brief idea of what it does.>
    Copyright (C) 2020 Yang Cao

    This program is free software: you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation, either version 3 of the License, or
    (at your option) any later version.

    This program is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along with this program.  If not, see <http://www.gnu.org/licenses/>.

Also add information on how to contact you by electronic and paper mail.

If the program does terminal interaction, make it output a short notice like this
when it starts in an interactive mode:

    microbiomeMarker Copyright (C) 2020 Yang Cao
    This program comes with ABSOLUTELY NO WARRANTY; for details type 'show w'.
    This is free software, and you are welcome to redistribute it
    under certain conditions; type 'show c' for details.

The hypothetical commands `show w` and `show c` should show the appropriate parts of
the General Public License. Of course, your program's commands might be different;
for a GUI interface, you would use an “about box”.

You should also get your employer (if you work as a programmer) or school, if any, to
sign a “copyright disclaimer” for the program, if necessary. For more
information on this, and how to apply and follow the GNU GPL, see
&lt;<http://www.gnu.org/licenses/>&gt;.

The GNU General Public License does not permit incorporating your program into
proprietary programs. If your program is a subroutine library, you may consider it
more useful to permit linking proprietary applications with the library. If this is
what you want to do, use the GNU Lesser General Public License instead of this
License. But first, please read
&lt;<http://www.gnu.org/philosophy/why-not-lgpl.html>&gt;.


================================================
FILE: NAMESPACE
================================================
# Generated by roxygen2: do not edit by hand

S3method(plot,compareDA)
S3method(summary,compareDA)
export("%>%")
export("marker_table<-")
export(abundances)
export(aggregate_taxa)
export(compare_DA)
export(confounder)
export(extract_posthoc_res)
export(import_biom)
export(import_dada2)
export(import_mothur)
export(import_picrust2)
export(import_qiime)
export(import_qiime2)
export(marker_table)
export(microbiomeMarker)
export(nmarker)
export(norm_clr)
export(norm_cpm)
export(norm_css)
export(norm_rarefy)
export(norm_rle)
export(norm_tmm)
export(norm_tss)
export(nsamples)
export(ntaxa)
export(otu_table)
export(otu_table2metagenomeSeq)
export(phyloseq2DESeq2)
export(phyloseq2edgeR)
export(phyloseq2metagenomeSeq)
export(plot_abundance)
export(plot_cladogram)
export(plot_ef_bar)
export(plot_ef_dot)
export(plot_heatmap)
export(plot_postHocTest)
export(plot_sl_roc)
export(postHocTest)
export(run_aldex)
export(run_ancom)
export(run_ancombc)
export(run_deseq2)
export(run_edger)
export(run_lefse)
export(run_limma_voom)
export(run_marker)
export(run_metagenomeseq)
export(run_posthoc_test)
export(run_simple_stat)
export(run_sl)
export(run_test_multiple_groups)
export(run_test_two_groups)
export(sample_data)
export(sample_names)
export(subset_marker)
export(summarize_taxa)
export(tax_table)
export(taxa_names)
export(transform_abundances)
exportClasses(marker_table)
exportClasses(microbiomeMarker)
exportClasses(postHocTest)
exportMethods("[")
exportMethods(normalize)
exportMethods(nsamples)
exportMethods(ntaxa)
exportMethods(otu_table)
exportMethods(sample_data)
exportMethods(sample_names)
exportMethods(show)
exportMethods(tax_table)
exportMethods(taxa_names)
importClassesFrom(IRanges,DataFrameList)
importClassesFrom(phyloseq,otu_table)
importClassesFrom(phyloseq,phyloseq)
importClassesFrom(phyloseq,taxonomyTable)
importFrom(ANCOMBC,ancombc)
importFrom(Biobase,"pData<-")
importFrom(Biobase,AnnotatedDataFrame)
importFrom(Biobase,pData)
importFrom(ComplexHeatmap,Heatmap)
importFrom(ComplexHeatmap,HeatmapAnnotation)
importFrom(DESeq2,"dispersions<-")
importFrom(IRanges,DataFrameList)
importFrom(biomformat,biom_data)
importFrom(biomformat,read_biom)
importFrom(dplyr,"%>%")
importFrom(dplyr,arrange)
importFrom(dplyr,bind_cols)
importFrom(dplyr,bind_rows)
importFrom(dplyr,desc)
importFrom(dplyr,everything)
importFrom(dplyr,filter)
importFrom(dplyr,group_by)
importFrom(dplyr,group_modify)
importFrom(dplyr,group_split)
importFrom(dplyr,mutate)
importFrom(dplyr,rowwise)
importFrom(dplyr,select)
importFrom(dplyr,slice)
importFrom(dplyr,summarise)
importFrom(dplyr,ungroup)
importFrom(ggplot2,aes)
importFrom(ggplot2,aes_)
importFrom(ggplot2,aes_string)
importFrom(ggplot2,annotate)
importFrom(ggplot2,coord_equal)
importFrom(ggplot2,element_blank)
importFrom(ggplot2,element_text)
importFrom(ggplot2,facet_wrap)
importFrom(ggplot2,geom_boxplot)
importFrom(ggplot2,geom_col)
importFrom(ggplot2,geom_errorbar)
importFrom(ggplot2,geom_point)
importFrom(ggplot2,geom_rect)
importFrom(ggplot2,ggplot)
importFrom(ggplot2,guide_axis)
importFrom(ggplot2,guide_legend)
importFrom(ggplot2,guides)
importFrom(ggplot2,labeller)
importFrom(ggplot2,labs)
importFrom(ggplot2,scale_shape_manual)
importFrom(ggplot2,scale_x_continuous)
importFrom(ggplot2,scale_y_discrete)
importFrom(ggplot2,theme)
importFrom(ggplot2,theme_bw)
importFrom(ggtree,geom_cladelabel)
importFrom(ggtree,geom_hilight)
importFrom(ggtree,geom_point2)
importFrom(ggtree,ggtree)
importFrom(magrittr,"%>%")
importFrom(metagenomeSeq,"normFactors<-")
importFrom(metagenomeSeq,MRcounts)
importFrom(metagenomeSeq,cumNorm)
importFrom(metagenomeSeq,cumNormStatFast)
importFrom(metagenomeSeq,newMRexperiment)
importFrom(methods,setClass)
importFrom(methods,setGeneric)
importFrom(methods,setMethod)
importFrom(phyloseq,"otu_table<-")
importFrom(phyloseq,"sample_data<-")
importFrom(phyloseq,"tax_table<-")
importFrom(phyloseq,"taxa_are_rows<-")
importFrom(phyloseq,"taxa_names<-")
importFrom(phyloseq,`otu_table<-`)
importFrom(phyloseq,import_biom)
importFrom(phyloseq,import_mothur)
importFrom(phyloseq,import_qiime)
importFrom(phyloseq,merge_phyloseq)
importFrom(phyloseq,nsamples)
importFrom(phyloseq,ntaxa)
importFrom(phyloseq,otu_table)
importFrom(phyloseq,phy_tree)
importFrom(phyloseq,phyloseq)
importFrom(phyloseq,prune_samples)
importFrom(phyloseq,prune_taxa)
importFrom(phyloseq,rank_names)
importFrom(phyloseq,rarefy_even_depth)
importFrom(phyloseq,read_tree)
importFrom(phyloseq,refseq)
importFrom(phyloseq,sample_data)
importFrom(phyloseq,sample_names)
importFrom(phyloseq,sample_sums)
importFrom(phyloseq,t)
importFrom(phyloseq,tax_glom)
importFrom(phyloseq,tax_table)
importFrom(phyloseq,taxa_are_rows)
importFrom(phyloseq,taxa_names)
importFrom(phyloseq,taxa_sums)
importFrom(phyloseq,transform_sample_counts)
importFrom(plotROC,calc_auc)
importFrom(plotROC,geom_roc)
importFrom(plotROC,style_roc)
importFrom(purrr,map_dbl)
importFrom(purrr,pmap_chr)
importFrom(purrr,pmap_dbl)
importFrom(rlang,.data)
importFrom(stats,TukeyHSD)
importFrom(stats,anova)
importFrom(stats,aov)
importFrom(stats,coef)
importFrom(stats,dnorm)
importFrom(stats,drop1)
importFrom(stats,fisher.test)
importFrom(stats,formula)
importFrom(stats,glm)
importFrom(stats,kruskal.test)
importFrom(stats,lm)
importFrom(stats,median)
importFrom(stats,model.matrix)
importFrom(stats,model.tables)
importFrom(stats,na.omit)
importFrom(stats,p.adjust)
importFrom(stats,pairwise.table)
importFrom(stats,pf)
importFrom(stats,pnorm)
importFrom(stats,psignrank)
importFrom(stats,pt)
importFrom(stats,ptukey)
importFrom(stats,pwilcox)
importFrom(stats,qf)
importFrom(stats,qtukey)
importFrom(stats,quantile)
importFrom(stats,relevel)
importFrom(stats,residuals)
importFrom(stats,sd)
importFrom(stats,var)
importFrom(stats,wilcox.test)
importFrom(tidytree,treedata)
importFrom(utils,read.delim)
importFrom(vegan,cca)
importFrom(yaml,read_yaml)
importMethodsFrom(BiocGenerics,"sizeFactors<-")
importMethodsFrom(BiocGenerics,counts)
importMethodsFrom(BiocGenerics,normalize)
importMethodsFrom(S4Vectors,mcols)
importMethodsFrom(phyloseq,"otu_table<-")
importMethodsFrom(phyloseq,nsamples)
importMethodsFrom(phyloseq,ntaxa)
importMethodsFrom(phyloseq,otu_table)
importMethodsFrom(phyloseq,sample_data)
importMethodsFrom(phyloseq,sample_names)
importMethodsFrom(phyloseq,t)
importMethodsFrom(phyloseq,tax_table)
importMethodsFrom(phyloseq,taxa_names)


================================================
FILE: NEWS.md
================================================
# microbiomeMarker 1.3.2

+ fix error on subgroup in lefse, #62, #55

# microbiomeMarker 1.3.1 (2022-05-26)

+ Development version on Bioconductor.

# microbiomeMarker 1.2.1 (2022-05-26)

+ Confounder analysis.
+ Comparison of different methods.

# microbiomeMarker 1.2.0 (2022-04-27) 

+ Released on Bioconductor 3.15.

# microbiomeMarker 1.1.2 (2022-03-07)

+ Development version on Bioconductor
+ Use the 3rd edition of testthat to fix test error (use `expect_snapshot()`
rather than `expect_known_output()`).
+ Add two new arguments in `plot_heatmap()`, `scale_by_row` and
`annotation_col`, to improve heatmap visualization, #52.
+ Set slot `marker_table` to `NULL` if no marker was identified.
+ Add new import function `import_picrust2()` to import the predicted functional
table from PICRUSt2; all DA functions now support PICRUSt2 output data.
+ Keep color consistent between legend and plot in cladogram, #42. 
+ Add a new argument `clade_label_font_size` in `plot_cladogram()` to specify 
font size of clade label, #49.

# microbiomeMarker 1.1.1 (2021-03-07)

+ Add a parameter `only_marker` in `plot_cladogram()` to specify whether to show
only the markers or all features in the cladogram.
+ Fix a bug in `run_test_multiple_groups()`: wrong group names for enriched
groups (2021-10-12, #48).
+ Fix a bug in `plot_abundance()`: wrong variable name of the effect size in
`marker_table` (2021-10-17, #47).

# microbiomeMarker 1.0.0 (2021-10-27)

+ Released on Bioconductor.

# microbiomeMarker 0.99.1 (2021-10-11)

+ Accepted by Bioconductor.

# microbiomeMarker 0.99.0 (2021-09-01)

+ Submitted to Bioconductor


================================================
FILE: R/AllClasses.R
================================================
# marker_table class ------------------------------------------------------

#' The S4 class for storing microbiome marker information
#'
#' This class inherits from `data.frame`. Rows represent the microbiome
#' markers and variables represent features of the markers.
#'
#' @name marker_table-class
#' @aliases marker_table-class
#' @field names,row.names a character vector, inherited from the input
#' data.frame
#' @field .data a list, each element corresponding to a column of the
#' input data.frame
#' @field .S3Class character, the S3 class `marker_table` inherited from:
#' "`data.frame`"
#' @author Yang Cao
#' @exportClass marker_table
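#' @examples
#' # A minimal sketch of building a `marker_table` from a data.frame with the
#' # `marker_table()` generic; the values below are made up for illustration.
#' mt <- marker_table(data.frame(
#'     feature = c("speciesA", "speciesB"),
#'     enrich_group = c("groupA", "groupB"),
#'     ef_logFC = c(-2, 2),
#'     pvalue = c(0.01, 0.01),
#'     padj = c(0.01, 0.01)
#' ))
#' mt
#' # The validity check is expected to reject a table without the `feature`
#' # column; the error message is captured rather than raised here.
#' tryCatch(
#'     marker_table(data.frame(pvalue = 0.01)),
#'     error = function(e) conditionMessage(e)
#' )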
setClass("marker_table", contains = "data.frame")

# validator of marker_table
validity_marker_table <- function(object) {
    msg <- NULL
    if (!"feature" %in% names(object)) {
        msg <- c(
            msg,
            "marker table must contain variable `feature`: the name of marker"
        )
    }
    if (any(dim(object) == 0)) {
        msg <- c(msg, "marker table must have non-zero dimensions")
    }

    if (length(msg)) {
        return(msg)
    } else {
        return(TRUE)
    }
}

setValidity("marker_table", validity_marker_table)

################################################################################
# A class may be defined as the union of other classes; that is, as a virtual
# class defined as a superclass of several other classes. This is a way of
# dealing with the expected scenarios in which one or more of the slots are not
# available, in which case NULL will be used instead.
################################################################################
#' @importClassesFrom phyloseq taxonomyTable
#' @keywords internal
setClassUnion("marker_tableOrNULL", c("marker_table", "NULL"))
#' @keywords internal
setClassUnion("taxonomyTableOrNULL", c("taxonomyTable", "NULL"))
#' @keywords internal
setClassUnion("characterOrNULL", c("character", "NULL"))
#' @keywords internal
setClassUnion("numericOrNULL", c("numeric", "NULL"))

# microbiomeMarker class --------------------------------------------------

#' The main class for microbiomeMarker data
#'
#' `microbiomeMarker-class` inherits from [`phyloseq::phyloseq-class`] and
#' adds three slots, `marker_table`, `norm_method` and `diff_method`, to save
#' the differential analysis results. It provides a seamless interface with
#' **phyloseq**, which makes **microbiomeMarker** simple and easy to use. For
#' more details, see the documentation of [`phyloseq::phyloseq-class`].
#' @name microbiomeMarker-class
#' @aliases microbiomeMarker-class
#' @importClassesFrom phyloseq phyloseq
#' @slot marker_table a data.frame, a [`marker_table-class`] object.
#' @slot norm_method character, method used to normalize the input `phyloseq`
#'   object.
#' @slot diff_method character, method used for marker identification.
#' @seealso [`phyloseq::phyloseq-class`], [`marker_table-class`],
#' [summarize_taxa()]
#' @exportClass microbiomeMarker
#' @return a [`microbiomeMarker-class`] object.
`microbiomeMarker-class` <- setClass("microbiomeMarker",
    slots = c(
        marker_table = "marker_tableOrNULL",
        norm_method = "characterOrNULL",
        diff_method = "characterOrNULL"
    ),
    contains = "phyloseq",
    prototype = list(
        marker_table = NULL,
        norm_method = NULL,
        diff_method = NULL
    )
)

#' Build microbiomeMarker-class objects
#'
#' This is the constructor to build a [`microbiomeMarker-class`] object; don't
#' use the `new()` constructor directly.
#' @param marker_table a [`marker_table-class`] object, the result of the
#'   differential analysis.
#' @param norm_method character, method used to normalize the input `phyloseq`
#'   object.
#' @param diff_method character, method used for microbiome marker
#'   identification.
#' @param ... arguments passed to [phyloseq::phyloseq()]
#' @seealso [phyloseq::phyloseq()]
#' @name microbiomeMarker
#' @export
#' @return  a [`microbiomeMarker-class`] object.
#' @examples
#' microbiomeMarker(
#'     marker_table = marker_table(data.frame(
#'         feature = c("speciesA", "speciesB"),
#'         enrich_group = c("groupA", "groupB"),
#'         ef_logFC = c(-2, 2),
#'         pvalue = c(0.01, 0.01),
#'         padj = c(0.01, 0.01),
#'         row.names = c("marker1", "marker2")
#'     )),
#'     norm_method = "TSS",
#'     diff_method = "DESeq2",
#'     otu_table = otu_table(matrix(
#'         c(4, 1, 1, 4),
#'         nrow = 2, byrow = TRUE,
#'         dimnames = list(c("speciesA", "speciesB"), c("sample1", "sample2"))
#'     ),
#'     taxa_are_rows = TRUE
#'     ),
#'     tax_table = tax_table(matrix(
#'         c("speciesA", "speciesB"),
#'         nrow = 2,
#'         dimnames = list(c("speciesA", "speciesB"), "Species")
#'     )),
#'     sam_data = sample_data(data.frame(
#'         group = c("groupA", "groupB"),
#'         row.names = c("sample1", "sample2")
#'     ))
#' )
microbiomeMarker <- function(marker_table = NULL,
    norm_method = NULL,
    diff_method = NULL,
    ...) {
    ps_slots <- list(...)
    ps_component_cls <- vapply(ps_slots, class, character(1))
    if (!"otu_table" %in% ps_component_cls) {
        stop("otu_table is required")
    }
    if (!"taxonomyTable" %in% ps_component_cls) {
        stop("tax_table is required")
    }

    # set the row names of marker_table to "marker1", "marker2", ...
    if (!is.null(marker_table)) {
        rownames(marker_table) <- paste0("marker", seq_len(nrow(marker_table)))
    }

    new(
        "microbiomeMarker",
        marker_table = marker_table,
        norm_method = norm_method,
        diff_method = diff_method,
        ...
    )
}

# validity for microbiomeMarker, at least contains two slots: otu_table,
#  tax_table
#' @importMethodsFrom phyloseq taxa_names
validity_microbiomeMarker <- function(object) {
    msg <- NULL
    otu <- object@otu_table
    tax <- object@tax_table
    marker <- object@marker_table
    norm_method <- object@norm_method
    diff_method <- object@diff_method

    # summarized taxa
    if (is.null(tax)) {
        msg <- c(msg, "tax_table is required")
    }

    if (is.null(otu)) {
        msg <- c(msg, "otu_table is required")
    }

    # marker in marker_table must be contained in tax_table
    if (!is.null(marker) && !is.null(tax) &&
        !all(marker$feature %in% tax@.Data[, 1])) {
        msg <- c(msg, "marker in marker_table must be contained in tax")
    }

    if (!is.null(otu) && !is.null(tax) && nrow(otu) != nrow(tax)) {
        msg <- c(
            msg,
            "nrow of `otu_table` must be equal to the length of `tax_table()`"
        )
    }

    if (!is.null(tax) && !is.null(marker) && nrow(marker) > nrow(tax)) {
        msg <- c(
            msg,
            paste0(
                "The number of different feature must be smaller than the",
                " total number of feature"
            )
        )
    }

    if (length(msg)) {
        return(msg)
    } else {
        return(TRUE)
    }
}

setValidity("microbiomeMarker", validity_microbiomeMarker)

# postHocTest  class ------------------------------------------------------

#' The postHocTest class, representing the results of post-hoc tests among
#' multiple groups
#'
#' @slot result  a [`IRanges::DataFrameList-class`], each `DataFrame` consists
#' of five variables:
#' * `comparisons`: character, specify which two groups to test (the group names
#'   are separated by "-")
#' * `diff_mean`: numeric, difference in mean abundances
#' * `pvalue`: numeric, p values
#' * `ci_lower` and `ci_upper`: numeric, lower and upper confidence interval of
#'   difference in mean abundances
#' @slot abundance abundance of each feature in each group
#' @slot conf_level confidence level
#' @slot method method used for post-hoc test
#' @slot method_str method illustration
#' @name postHocTest-class
#' @aliases postHocTest-class
#' @author Yang Cao
#' @exportClass postHocTest
#' @importClassesFrom IRanges DataFrameList
#' @return a [`postHocTest-class`] object.
setClass("postHocTest",
    slots = c(
        result = "DataFrameList",
        abundance = "data.frame",
        conf_level = "numeric",
        method = "character",
        method_str = "character"
    ),
    prototype = list(
        result = NULL,
        conf_level = NULL,
        method = NULL,
        method_str = NULL
    )
)

# validity for postHocTest
validity_postHocTest <- function(object) {
    msg <- NULL
    conf_level <- object@conf_level
    if (!is.numeric(conf_level) || conf_level < 0 || conf_level > 1) {
        msg <- c(
            msg,
            "conf_level must be in the range [0, 1]"
        )
    }

    method <- object@method
    if (!method %in% 
            c("tukey", "games_howell", "scheffe", "welch_uncorrected")) {
        msg <- c(
            msg,
            paste(
                "method must be one of tukey, games_howell, scheffe or",
                "welch_uncorrected"
            )
        )
    }

    if (length(msg)) {
        return(msg)
    } else {
        return(TRUE)
    }
}

setValidity("postHocTest", validity_postHocTest)


#' Build postHocTest object
#'
#' This function is used to create a `postHocTest` object, and is intended for
#' developers only.
#'
#' @param result a [`IRanges::SimpleDFrameList-class`] object.
#' @param abundance data.frame.
#' @param conf_level numeric, confidence level.
#' @param method character, method for posthoc test.
#' @param method_str character, illustrates which method is used for posthoc
#'   test.
#' @return a [`postHocTest-class`] object.
#' @export
#' @examples
#' require(IRanges)
#' pht <- postHocTest(
#'     result = DataFrameList(
#'         featureA = DataFrame(
#'             comparisons = c("group2-group1", 
#'                 "group3-group1", 
#'                 "group3-group2"),
#'             diff_mean = runif(3),
#'             pvalue = rep(0.01, 3),
#'             ci_lower = rep(0.01, 3),
#'             ci_upper = rep(0.011, 3)
#'         ),
#'         featureB = DataFrame(
#'             comparisons = c("group2-group1", 
#'                 "group3-group1", 
#'                 "group3-group2"),
#'             diff_mean = runif(3),
#'             pvalue = rep(0.01, 3),
#'             ci_lower = rep(0.01, 3),
#'             ci_upper = rep(0.011, 3)
#'         )
#'     ),
#'     abundance = data.frame(
#'         featureA = runif(3),
#'         featureB = runif(3),
#'         group = c("group1", "group2", "group3")
#'     )
#' )
#' pht
postHocTest <- function(result,
    abundance,
    conf_level = 0.95,
    method = "tukey",
    method_str =
        paste(
            "Posthoc multiple comparisons of means: ",
            method
        )) {
    new(
        "postHocTest",
        result = result,
        abundance = abundance,
        conf_level = conf_level,
        method = method,
        method_str = method_str
    )
}


================================================
FILE: R/AllGenerics.R
================================================
# marker_table class -----------------------------------------------------------

#' Build or access the marker_table
#'
#' This is the recommended function for both building and accessing microbiome
#' marker table ([`marker_table-class`]).
#' @param object an object among the set of classes defined by the
#' microbiomeMarker package that contain `marker_table`
#' @export
#' @rdname marker_table-methods
#' @return a [`marker_table-class`] object.
#' @examples
#' data(enterotypes_arumugam)
#' mm <- run_limma_voom(
#'     enterotypes_arumugam,
#'     "Enterotype",
#'     contrast = c("Enterotype 3", "Enterotype 2"),
#'     pvalue_cutoff = 0.05,
#'     p_adjust = "fdr"
#' )
#' marker_table(mm)
setGeneric(
    "marker_table",
    function(object) standardGeneric("marker_table")
)

# build marker_table from data.frame
#' @aliases marker_table,data.frame-method
#' @rdname marker_table-methods
setMethod("marker_table", "data.frame", function(object) {
    mt <- new("marker_table", object)
    row.names(mt) <- paste0("marker", seq_len(nrow(object)))

    mt
})


# access the marker_table of a microbiomeMarker-class object
#' @rdname marker_table-methods
#' @aliases marker_table,microbiomeMarker-method

setMethod("marker_table", "microbiomeMarker", function(object) {
    object@marker_table
})


# Assign marker_table -----------------------------------------------------
#' Assign marker_table to `object`
#'
#' This function replaces the `marker_table` slot of `object` with `value`.
#'
#' @param object a [`microbiomeMarker-class`] object to modify.
#' @param value new value to replace the `marker_table` slot of `object`.
#'   Either a `marker_table-class` object or a `data.frame` that can be
#'   coerced into `marker_table-class`.
#' @export
#' @rdname assign-marker_table
#' @aliases assign-marker_table marker_table<-
#' @return a [`microbiomeMarker-class`] object.
#' @examples
#' data(enterotypes_arumugam)
#' mm <- run_limma_voom(
#'     enterotypes_arumugam,
#'     "Enterotype",
#'     contrast = c("Enterotype 3", "Enterotype 2"),
#'     pvalue_cutoff = 0.1,
#'     p_adjust = "fdr"
#' )
#' mm_marker <- marker_table(mm)
#' mm_marker
#' marker_table(mm) <- mm_marker[1:2, ]
#' marker_table(mm)
"marker_table<-" <- function(object, value) {
    if (!inherits(value, "marker_table") && !is.null(value)) {
        value <- marker_table(value)
    }

    # pass `sam_data` along as well so the sample metadata is not dropped
    microbiomeMarker(
        marker_table = value,
        norm_method = object@norm_method,
        diff_method = object@diff_method,
        otu_table = object@otu_table,
        tax_table = object@tax_table,
        sam_data = object@sam_data,
        phy_tree = object@phy_tree,
        refseq = object@refseq
    )
}
# microbiomeMarker class ------------------------------------------------------

# modified from the show method of phyloseq
# https://github.com/joey711/phyloseq/blob/master/R/show-methods.R#L47-L82
#' @rdname microbiomeMarker-class
#' @param object a `microbiomeMarker-class` object
#' @export
setMethod("show", "microbiomeMarker", function(object) {
    cat("microbiomeMarker-class inherited from phyloseq-class", fill = TRUE)
    norm <- object@norm_method
    if (!is.null(norm)) {
        if (grepl("per-sample normalized", norm)) {
            norm <- gsub(".*to ", "", norm)
            cat(
                "normalization: per-sample to value [", norm, "]",
                fill = TRUE
            )
        } else {
            cat(
                "normalization method:              [", norm, "]",
                fill = TRUE
            )
        }
    }

    if (!is.null(object@diff_method)) {
        cat(
            "microbiome marker identity method: [",
            object@diff_method,
            "]",
            fill = TRUE
        )
    }
    
    if (!is.null(object@marker_table)) {
        cat(
            "marker_table() Marker Table:       [",
            nrow(object@marker_table), "microbiome markers with",
            ncol(object@marker_table), "variables ]",
            fill = TRUE
        )
    } else {
        cat(
            "marker_table() Marker Table:       [",
            "no microbiome markers were identified ]",
            fill = TRUE
        )
    }

    # print otu_table (always there).
    cat(
        "otu_table()    OTU Table:          [",
        ntaxa(otu_table(object)), "taxa and ",
        nsamples(otu_table(object)), "samples ]",
        fill = TRUE
    )

    # print Sample Data if there
    if (!is.null(sample_data(object, FALSE))) {
        cat(
            "sample_data()  Sample Data:        [", dim(sample_data(object))[1],
            "samples by ", dim(sample_data(object))[2],
            "sample variables ]",
            fill = TRUE
        )
    }

    # print tax Tab if there
    if (!is.null(tax_table(object, FALSE))) {
        cat(
            "tax_table()    Taxonomy Table:     [", dim(tax_table(object))[1],
            "taxa by", dim(tax_table(object))[2],
            "taxonomic ranks ]",
            fill = TRUE
        )
    }

    # print tree if there
    if (!is.null(phy_tree(object, FALSE))) {
        cat(
            "phy_tree()    Phylogenetic Tree:   [", ntaxa(phy_tree(object)),
            "tips and", phy_tree(object)$Nnode,
            "internal nodes ]",
            fill = TRUE
        )
    }

    # print refseq summary if there
    if (!is.null(refseq(object, FALSE))) {
        cat(
            "refseq()      ", class(refseq(object))[1],
            ":         [", ntaxa(refseq(object)),
            " reference sequences ]",
            fill = TRUE
        )
    }
})

# get the number of markers -----------------------------------------------

#' Get the number of microbiome markers
#' @param object a [`microbiomeMarker-class`] or [`marker_table-class`] object
#' @docType methods
#' @rdname nmarker-methods
#' @return an integer, the number of microbiome markers
#' @export
#' @examples
#' mt <- marker_table(data.frame(
#'     feature = c("speciesA", "speciesB"),
#'     enrich_group = c("groupA", "groupB"),
#'     ef_logFC = c(-2, 2),
#'     pvalue = c(0.01, 0.01),
#'     padj = c(0.01, 0.01),
#'     row.names = c("marker1", "marker2")
#' ))
#' nmarker(mt)
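#' # A sketch of the `microbiomeMarker` method, under the assumption that an
#' # object built without a `marker_table` (the constructor default is `NULL`)
#' # reports zero markers.
#' mm <- microbiomeMarker(
#'     otu_table = otu_table(
#'         matrix(
#'             c(4, 1, 1, 4),
#'             nrow = 2, byrow = TRUE,
#'             dimnames = list(
#'                 c("speciesA", "speciesB"),
#'                 c("sample1", "sample2")
#'             )
#'         ),
#'         taxa_are_rows = TRUE
#'     ),
#'     tax_table = tax_table(matrix(
#'         c("speciesA", "speciesB"),
#'         nrow = 2,
#'         dimnames = list(c("speciesA", "speciesB"), "Species")
#'     ))
#' )
#' nmarker(mm)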
setGeneric("nmarker", function(object) standardGeneric("nmarker"))

#' @rdname nmarker-methods
#' @aliases nmarker,microbiomeMarker-method
setMethod("nmarker", "microbiomeMarker", function(object) {
    marker <- marker_table(object)
    ifelse(is.null(marker), 0L,  nrow(marker))
})

#' @rdname nmarker-methods
#' @aliases nmarker,marker_table-method
setMethod("nmarker", "marker_table", function(object) {
    ifelse(is.null(object), 0L,  nrow(object))
})

# postHocTest class -------------------------------------------------------
#' @rdname postHocTest-class
#' @aliases show,postHocTest-method
#' @param object a `postHocTest-class` object
#' @export
setMethod("show", "postHocTest", function(object) {
    cat("postHocTest-class object", fill = TRUE)
    result <- object@result
    var_mean <- c(
        "pair groups to test which separated by '-'",
        "difference in mean proportions",
        "post hoc test p values",
        "lower confidence interval",
        "upper confidence interval"
    )
    cat(
        "Pairwise test result of", length(result), " features, ",
        "DataFrameList object, each DataFrame has five variables:\n       ",
        paste0(
            names(result[[1]]),
            c("    : ", ": ", "        : ", " : ", " : "),
            var_mean,
            collapse = "        ",
            "\n"
        )
    )
    cat(
        "Posthoc multiple comparisons of means",
        " using ", object@method, " method",
        fill = TRUE
    )
})


================================================
FILE: R/DA-aldex.R
================================================
#' Perform differential analysis using ALDEx2
#'
#' @param ps a [`phyloseq::phyloseq-class`] object
#' @param group character, the variable to set the group
#' @param taxa_rank character to specify taxonomic rank to perform
#'   differential analysis on. Should be one of
#'   `phyloseq::rank_names(phyloseq)`, or "all" means to summarize the taxa by
#'   the top taxa ranks (`summarize_taxa(ps, level = rank_names(ps)[1])`), or
#'   "none" means perform differential analysis on the original taxa
#'   (`taxa_names(phyloseq)`, e.g., OTU or ASV).
#' @param transform character, the methods used to transform the microbial
#'   abundance. See [`transform_abundances()`] for more details. The
#'   options include:
#'   * "identity", return the original data without any transformation
#'     (default).
#'   * "log10", the transformation is `log10(object)`, and if the data contains
#'     zeros the transformation is `log10(1 + object)`.
#'   * "log10p", the transformation is `log10(1 + object)`.
#' @param norm the methods used to normalize the microbial abundance data. See
#'   [`normalize()`] for more details.
#'   Options include:
#'   * "none": do not normalize.
#'   * "rarefy": random subsampling counts to the smallest library size in the
#'     data set.
#'   * "TSS": total sum scaling, also referred to as "relative abundance", the
#'     abundances were normalized by dividing the corresponding sample library
#'     size.
#'   * "TMM": trimmed mean of m-values. First, a sample is chosen as reference.
#'     The scaling factor is then derived using a weighted trimmed mean over the
#'     differences of the log-transformed gene-count fold-change between the
#'     sample and the reference.
#'   * "RLE", relative log expression, RLE uses a pseudo-reference calculated
#'     using the geometric mean of the gene-specific abundances over all
#'     samples. The scaling factors are then calculated as the median of the
#'     gene counts ratios between the samples and the reference.
#'   * "CSS": cumulative sum scaling, calculates scaling factors as the
#'     cumulative sum of gene abundances up to a data-derived threshold.
#'   * "CLR": centered log-ratio normalization.
#'   * "CPM": per-sample normalization of the sum of the values to 1e+06.
#' @param norm_para arguments passed to specific normalization methods
#' @param method test method, options include: "t.test" and "wilcox.test"
#'   for two groups comparison,  "kruskal" and "glm_anova" for multiple groups
#'   comparison.
#' @param p_adjust method for multiple test correction, default `none`,
#' for more details see [stats::p.adjust].
#' @param pvalue_cutoff cutoff of p value, default 0.05.
#' @param mc_samples integer, the number of Monte Carlo samples to use for
#'   underlying distributions estimation, 128 is usually sufficient.
#' @param denom character string, specifies which features are used as the
#'   denominator for the geometric mean calculation. Options are:
#'   * "all", with all features.
#'   * "iqlr", accounts for data with systematic variation and centers the
#'    features on the set of features whose variance lies between the lower
#'    and upper quartiles of variance.
#'   * "zero", a more extreme case where there are many non-zero features in
#'     one condition but many zeros in another. In this case the geometric mean
#'     of each group is calculated using the set of per-group non-zero features.
#'   * "lvha", with housekeeping features.
#' @param paired logical, whether to perform paired tests, only works for
#'   methods "t.test" and "wilcox.test".
#' @export
#' @references Fernandes, A.D., Reid, J.N., Macklaim, J.M. et al. Unifying the
#'   analysis of high-throughput sequencing datasets: characterizing RNA-seq,
#'   16S rRNA gene sequencing and selective growth experiments by compositional
#'   data analysis. Microbiome 2, 15 (2014).
#' @seealso [`ALDEx2::aldex()`]
#' @return a [`microbiomeMarker-class`] object.
#' @examples
#' data(enterotypes_arumugam)
#' ps <- phyloseq::subset_samples(
#'     enterotypes_arumugam,
#'     Enterotype %in% c("Enterotype 3", "Enterotype 2")
#' )
#' run_aldex(ps, group = "Enterotype")
run_aldex <- function(ps,
    group,
    taxa_rank = "all",
    transform = c("identity", "log10", "log10p"),
    norm = "none",
    norm_para = list(),
    method = c(
        "t.test", "wilcox.test",
        "kruskal", "glm_anova"
    ),
    p_adjust = c(
        "none", "fdr", "bonferroni", "holm",
        "hochberg", "hommel", "BH", "BY"
    ),
    pvalue_cutoff = 0.05,
    mc_samples = 128,
    denom = c("all", "iqlr", "zero", "lvha"),
    paired = FALSE) {
    stopifnot(inherits(ps, "phyloseq"))
    ps <- check_rank_names(ps) %>%
        check_taxa_rank(taxa_rank)

    denom <- match.arg(denom, c("all", "iqlr", "zero", "lvha"))
    p_adjust <- match.arg(
        p_adjust,
        c(
            "none", "fdr", "bonferroni", "holm",
            "hochberg", "hommel", "BH", "BY"
        )
    )

    # trans method as argument test in ALDEx2::aldex
    method <- match.arg(
        method,
        c("t.test", "wilcox.test", "kruskal", "glm_anova")
    )
    if (method %in% c("t.test", "wilcox.test")) {
        test_method <- "t"
    } else {
        test_method <- "kw"
    }

    # check whether group is valid, write a function
    sample_meta <- sample_data(ps)
    meta_nms <- names(sample_meta)
    if (!group %in% meta_nms) {
        stop(
            group, " are not contained in the `sample_data` of `ps`",
            call. = FALSE
        )
    }

    transform <- match.arg(transform, c("identity", "log10", "log10p"))

    # preprocess phyloseq object
    ps <- preprocess_ps(ps)
    ps <- transform_abundances(ps, transform = transform)

    # normalize the data
    norm_para <- c(norm_para, method = norm, object = list(ps))
    ps_normed <- do.call(normalize, norm_para)
    ps_summarized <- pre_ps_taxa_rank(ps_normed, taxa_rank)
    groups <- sample_meta[[group]]
    abd <- abundances(ps_summarized, norm = TRUE)

    test_fun <- if (test_method == "t") aldex_t else aldex_kw
    test_para <- list(
        reads = abd,
        conditions = groups,
        method = method,
        mc_samples = mc_samples,
        denom = denom,
        p_adjust = p_adjust
    )
    if (test_method == "t") {
        test_para <- c(test_para, paired = paired)
    }

    test_out <- tryCatch(
        do.call(test_fun, test_para),
        error = function(e) e
    )

    # check whether counts are integers
    if (inherits(test_out, "error") &&
        conditionMessage(test_out) == "not all reads are integers") {
        warning(
            "Not all reads are integers, the reads are ceiled to integers.\n",
            "   Raw reads is recommended from the ALDEx2 paper.",
            call. = FALSE
        )
        test_para$reads <- ceiling(abd)
        test_out <- do.call(test_fun, test_para)
    }

    sig_feature <- dplyr::filter(test_out, .data$padj <= pvalue_cutoff)
    marker <- return_marker(sig_feature, test_out)

    feature <- test_out$feature
    tax <- matrix(feature) %>%
        tax_table()
    row.names(tax) <- row.names(abd)

    mm <- microbiomeMarker(
        marker_table = marker,
        norm_method = get_norm_method(norm),
        diff_method = paste0("ALDEx2_", method),
        sam_data = sample_data(ps_summarized),
        otu_table = otu_table(abd, taxa_are_rows = TRUE),
        tax_table = tax
    )

    mm
}



# aldex t test, wilcox test
# In the original version of ALDEx2, each p value is corrected using the
# Benjamini-Hochberg method. Here, we add a new argument `p_adjust` so that
# other correction methods are supported as well.
aldex_t <- function(reads,
    conditions,
    mc_samples,
    method = c("t.test", "wilcox.test"),
    denom = c("all", "iqlr", "zero", "lvha"),
    p_adjust = c(
        "none", "fdr", "bonferroni", "holm",
        "hochberg", "hommel", "BH", "BY"
    ),
    paired = FALSE) {
    method <- match.arg(method, c("t.test", "wilcox.test"))
    denom <- match.arg(denom, c("all", "iqlr", "zero", "lvha"))
    p_adjust <- match.arg(
        p_adjust,
        c(
            "none", "fdr", "bonferroni", "holm",
            "hochberg", "hommel", "BH", "BY"
        )
    )
    conditions <- as.factor(conditions)

    if (!inherits(reads, "aldex.clr")) {
        reads_clr <- ALDEx2::aldex.clr(
            reads = reads,
            conds = as.character(conditions),
            mc.samples = mc_samples,
            denom = denom
        )
        feature <- row.names(reads)
    } else {
        reads_clr <- reads
        feature <- row.names(reads@reads)
    }

    mc_instance <- reads_clr@analysisData
    mc_instance_ldf <- convert_instance(mc_instance, mc_samples)

    if (method == "t.test") {
        pvalue <- purrr::map_dfc(
            mc_instance_ldf,
            t_fast,
            group = conditions, paired = paired
        )
    } else {
        pvalue <- purrr::map_dfc(
            mc_instance_ldf,
            wilcox_fast,
            group = conditions, paired = paired
        )
    }

    padj_greater <- purrr::map_dfc(
        pvalue,
        \(x) p.adjust(2 * x, method = p_adjust)
    )
    padj_less <- purrr::map_dfc(
        pvalue,
        \(x) p.adjust(2 * (1 - x), method = p_adjust)
    )

    # making this into a two-sided test
    pvalue_greater <- 2 * pvalue
    pvalue_less <- 2 * (1 - pvalue)

    # making sure the max p-value is 1
    pvalue_greater <- apply(pvalue_greater, c(1, 2), \(x) min(x, 1))
    pvalue_less <- apply(pvalue_less, c(1, 2), \(x) min(x, 1))

    # get the expected value of p value and adjusted p value
    e_pvalue <- cbind(rowMeans(pvalue_greater), rowMeans(pvalue_less)) |>
        apply(1, min)
    e_padj <- cbind(rowMeans(padj_greater), rowMeans(padj_less)) |>
        apply(1, min)

    # effect size
    ef <- ALDEx2::aldex.effect(
        reads_clr,
        include.sample.summary = FALSE,
        verbose = FALSE
    )
    # enrich group
    cds <- gsub("rab.win.", "", names(ef)[2:3])
    ef <- ef$effect
    enrich_group <- ifelse(ef > 0, cds[1], cds[2])

    res <- data.frame(
        feature = feature,
        enrich_group = enrich_group,
        ef_aldex = ef,
        pvalue = e_pvalue,
        padj = e_padj
    )

    res
}
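
# Illustration only, not called anywhere in the package: a minimal sketch of
# how `aldex_t()` above condenses per-instance p values into one expected
# two-sided p value per feature. It assumes (not confirmed in this file) that
# `t_fast()` and `wilcox_fast()` return one-sided p values, and it takes a
# plain numeric matrix instead of the tibble built by `purrr::map_dfc()`.
# The name `sketch_expected_two_sided_pvalue` is hypothetical.
sketch_expected_two_sided_pvalue <- function(p_one_sided) {
    # p_one_sided: numeric matrix, features in rows, Monte Carlo instances
    # in columns
    p_greater <- pmin(2 * p_one_sided, 1)
    p_less <- pmin(2 * (1 - p_one_sided), 1)
    # average over instances, then keep the smaller tail for each feature
    pmin(rowMeans(p_greater), rowMeans(p_less))
}
# For a feature with one-sided p values 0.01 and 0.02 over two instances this
# gives min(mean(c(0.02, 0.04)), mean(c(1, 1))), i.e. 0.03.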

# aldex kruskal-wallis test and glm anova statistics
#' @importFrom stats kruskal.test glm drop1
aldex_kw <- function(reads,
    conditions,
    method = c("kruskal", "glm_anova"),
    mc_samples = 128,
    denom = c("all", "iqlr", "zero", "lvha"),
    p_adjust = c(
        "none", "fdr", "bonferroni", "holm",
        "hochberg", "hommel", "BH", "BY"
    )) {
    method <- match.arg(method, c("kruskal", "glm_anova"))
    denom <- match.arg(denom, c("all", "iqlr", "zero", "lvha"))
    p_adjust <- match.arg(
        p_adjust,
        c(
            "none", "fdr", "bonferroni", "holm",
            "hochberg", "hommel", "BH", "BY"
        )
    )
    conditions <- as.factor(conditions)

    if (!inherits(reads, "aldex.clr")) {
        reads_clr <- ALDEx2::aldex.clr(
            reads = reads,
            conds = conditions,
            mc.samples = mc_samples,
            denom = denom
        )
        feature <- row.names(reads)
    } else {
        reads_clr <- reads
        feature <- row.names(reads@reads)
    }

    mc_instance <- reads_clr@analysisData
    # convert mc_instance to a list of data frame, each element represents a mc
    # sample for all samples.
    mc_instance_ldf <- convert_instance(mc_instance, mc_samples)

    if (method == "kruskal") {
        pvalue <- purrr::map_dfc(
            mc_instance_ldf,
            function(x) {
                apply(
                    x, 1,
                    function(y) {
                        stats::kruskal.test(y, g = factor(conditions))[[3]]
                    }
                )
            }
        )
    } else {
        pvalue <- purrr::map_dfc(
            mc_instance_ldf,
            function(x) {
                apply(
                    x, 1,
                    function(y) {
                        stats::glm(as.numeric(y) ~ factor(conditions)) %>%
                            stats::drop1(test = "Chisq") %>%
                            purrr::pluck(5, 2)
                    }
                )
            }
        )
    }

    padj <- purrr::map_dfc(pvalue, p.adjust, method = p_adjust)
    e_pvalue <- rowMeans(pvalue)
    e_padj <- rowMeans(padj)

    # f statistic
    ef_F_statistic <- purrr::map_dfc(
        mc_instance_ldf,
        function(x) {
            apply(
                x, 1,
                function(y) {
                    summary(stats::aov(y ~ factor(conditions)))[[1]]$`F value`[1]
                }
            )
        }
    ) %>%
        rowMeans()

    enrich_group <- get_aldex_kwglm_enrich_group(mc_instance_ldf, conditions)

    res <- data.frame(
        feature = feature,
        enrich_group = enrich_group,
        ef_F_statistic = ef_F_statistic,
        pvalue = e_pvalue,
        padj = e_padj
    )

    res
}

# enriched group for kw and glm anova
get_aldex_kwglm_enrich_group <- function(mc_instance_ldf, conditions) {
    instance_split <- purrr::map(
        mc_instance_ldf,
        ~ split(data.frame(t(.x)), conditions)
    )
    instance_mean <- purrr::map(
        instance_split,
        ~ purrr::map_dfc(.x, colMeans)
    )
    instance_mean <- Reduce("+", instance_mean)
    max_idx <- apply(instance_mean, 1, which.max)
    enrich_group <- names(instance_mean)[max_idx]

    enrich_group
}

# Each element of the mc instances of a clr object holds all instances of one
# sample. This function converts the mc instances to a list of data frames in
# which each element represents one mc instance for all samples.
convert_instance <- function(mc_instance, mc_samples) {
    mc_instance_ldf <- purrr::map(
        seq.int(mc_samples),
        function(x) {
            res <- purrr::map_dfc(mc_instance, function(y) y[, x])
            names(res) <- names(mc_instance)
            res
        }
    )

    mc_instance_ldf
}
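
# Illustrative sketch (not part of the package API): the reshaping performed by
# `convert_instance()`, rewritten with base R so the data layout is easy to see.
# The helper name `sketch_convert_instance` is hypothetical, and the input is
# assumed to be a named list of feature-by-MC-draw matrices, one per sample.
sketch_convert_instance <- function(mc_instance, mc_samples) {
    lapply(seq_len(mc_samples), function(k) {
        # column k of each matrix is the k-th Monte Carlo draw of that sample
        res <- as.data.frame(lapply(mc_instance, function(m) m[, k]))
        names(res) <- names(mc_instance)
        res
    })
}
# Usage on toy data (two samples, three features, two draws per sample):
# toy <- list(s1 = matrix(1:6, nrow = 3), s2 = matrix(7:12, nrow = 3))
# sketch_convert_instance(toy, 2)[[2]]$s2 # second draw of sample s2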


# fast test function modified from ALDEx2::t.fast
#' @importFrom stats pt
t_fast <- function(x, group, paired = FALSE) {
    grp1 <- group == unique(group)[1]
    grp2 <- group == unique(group)[2]
    n1 <- sum(grp1)
    n2 <- sum(grp2)

    if (paired) {
        # Order pairs for the mt.teststat function
        if (n1 != n2) stop("Cannot pair uneven groups.")
        idx1 <- which(grp1)
        idx2 <- which(grp2)
        paired_order <- unlist(
            lapply(
                seq_along(idx1),
                function(i) c(idx1[i], idx2[i])
            )
        )

        t <- multtest::mt.teststat(
            x[, paired_order],
            as.numeric(grp1)[paired_order],
            test = "pairt",
            nonpara = "n"
        )
        df <- length(idx1) - 1
    } else {
        t <- multtest::mt.teststat(x,
            as.numeric(grp1),
            test = "t",
            nonpara = "n"
        )
        s1 <- apply(x[, grp1], 1, sd)
        s2 <- apply(x[, grp2], 1, sd)
        df <- ((s1^2 / n1 + s2^2 / n2)^2) / ((s1^2 / n1)^2 / (n1 - 1) +
                (s2^2 / n2)^2 / (n2 - 1))
    }

    return(pt(t, df = df, lower.tail = FALSE))
}

# wilcox.fast function replaces wilcox.test
#  * runs much faster
#  * uses exact distribution for ties!
#    * this differs from ?wilcox.test
#  * optional paired test
#    * equivalent to wilcox.test(..., correct = FALSE)
#  * uses multtest
#' @importFrom stats psignrank pnorm pwilcox wilcox.test
wilcox_fast <- function(x, group, paired = FALSE) {
    stopifnot(ncol(x) == length(group))
    grp1 <- group == unique(group)[1]
    grp2 <- group == unique(group)[2]
    n1 <- sum(grp1)
    n2 <- sum(grp2)

    # Check for ties in i-th Monte-Carlo instance
    xt <- t(x)
    if (paired) {
        any_ties <- any(
            apply(xt[grp1, ] - xt[grp2, ], 2, function(y) length(unique(y))) !=
                ncol(x) / 2
        )
    } else {
        any_ties <- any(
            apply(xt, 2, function(y) length(unique(y))) != ncol(x)
        )
    }

    # Ties trigger slower, safer wilcox.test function
    if (any_ties) {
        res <- apply(
            xt, 2,
            function(i) {
                wilcox.test(
                    i[grp1], i[grp2],
                    paired = paired,
                    alternative = "greater",
                    correct = FALSE
                )$p.value
            }
        )

        return(res)
    }

    if (paired) {
        if (n1 != n2) stop("Cannot pair uneven groups.")
        x_diff <- xt[grp1, ] - xt[grp2, ]
        v <- apply(x_diff, 2, function(y) sum(rank(abs(y))[y > 0]))
        topscore <- (n1 * (n1 + 1)) / 2
        if (sum(grp1) < 50) {
            # as per wilcox test, use exact -- ASSUMES NO TIES!!
            res <- psignrank(v - 1, n1, lower.tail = FALSE)
        } else { # Use normal approximation
            v_std <- (v - topscore / 2) /
                sqrt(n1 * (n1 + 1) * (2 * n1 + 1) / 24)
            res <- pnorm(v_std, lower.tail = FALSE)
        }
    } else {
        w_std <- multtest::mt.teststat(x, as.numeric(grp1), test = "wilcoxon")
        if (sum(grp1) < 50 && sum(grp2) < 50) {
            # as per wilcox test, use exact -- ASSUMES NO TIES!!
            w_var <- sqrt((n1 * n2) * (n1 + n2 + 1) / 12)
            w <- w_std * w_var + (n1 * n2) / 2
            res <- pwilcox(w - 1, n1, n2, lower.tail = FALSE)
        } else { # Use normal approximation
            res <- pnorm(w_std, lower.tail = FALSE)
        }
    }

    res
}


================================================
FILE: R/DA-all.R
================================================
#' Find markers (differentially expressed metagenomic features)
#'
#' `run_marker` is a wrapper of all differential analysis functions.
#'
#' @param ps a [`phyloseq::phyloseq-class`] object
#' @param group character, the variable to set the group
#' @param da_method character to specify the differential analysis method. The
#'   options include:
#'   * "lefse", linear discriminant analysis (LDA) effect size (LEfSe) method,
#'     for more details see [`run_lefse()`].
#'   * "simple_t", "simple_welch", "simple_white", "simple_kruskal",
#'     and "simple_anova", simple statistic methods; "simple_t", "simple_welch"
#'     and "simple_white" for two groups comparison; "simple_kruskal", and
#'     "simple_anova" for multiple groups comparison. For more details see
#'     [`run_simple_stat()`].
#'   * "edger", see [`run_edger()`].
#'   * "deseq2", see [`run_deseq2()`].
#'   * "metagenomeseq", differential expression analysis based on the
#'     Zero-inflated Log-Normal mixture model or Zero-inflated Gaussian mixture
#'     model using metagenomeSeq, see [`run_metagenomeseq()`].
#'   * "ancom", see [`run_ancom()`].
#'   * "ancombc", differential analysis of compositions of microbiomes with
#'     bias correction, see [`run_ancombc()`].
#'   * "aldex", see [`run_aldex()`].
#'   * "limma_voom", see [`run_limma_voom()`].
#'   * "sl_lr", "sl_rf", and "sl_svm", three supervised learning (SL) methods:
#'     logistic regression (lr), random forest (rf), or support vector machine
#'     (svm). For more details see [`run_sl()`].
#' @param taxa_rank character to specify taxonomic rank to perform
#'   differential analysis on. Should be one of
#'   `phyloseq::rank_names(phyloseq)`, or "all" means to summarize the taxa by
#'   the top taxa ranks (`summarize_taxa(ps, level = rank_names(ps)[1])`), or
#'   "none" means perform differential analysis on the original taxa
#'   (`taxa_names(phyloseq)`, e.g., OTU or ASV).
#' @param transform character, the methods used to transform the microbial
#'   abundance. See [`transform_abundances()`] for more details. The
#'   options include:
#'   * "identity", return the original data without any transformation
#'     (default).
#'   * "log10", the transformation is `log10(object)`, and if the data contains
#'     zeros the transformation is `log10(1 + object)`.
#'   * "log10p", the transformation is `log10(1 + object)`.
#' @param norm the methods used to normalize the microbial abundance data. See
#'   [`normalize()`] for more details.
#'   Options include:
#'   * "none": do not normalize.
#'   * "rarefy": random subsampling counts to the smallest library size in the
#'     data set.
#'   * "TSS": total sum scaling, also referred to as "relative abundance", the
#'     abundances were normalized by dividing the corresponding sample library
#'     size.
#'   * "TMM": trimmed mean of m-values. First, a sample is chosen as reference.
#'     The scaling factor is then derived using a weighted trimmed mean over
#'     the differences of the log-transformed gene-count fold-change between
#'     the sample and the reference.
#'   * "RLE", relative log expression, RLE uses a pseudo-reference calculated
#'     using the geometric mean of the gene-specific abundances over all
#'     samples. The scaling factors are then calculated as the median of the
#'     gene counts ratios between the samples and the reference.
#'   * "CSS": cumulative sum scaling, calculates scaling factors as the
#'     cumulative sum of gene abundances up to a data-derived threshold.
#'   * "CLR": centered log-ratio normalization.
#'   * "CPM": per-sample normalization of the sum of the values to 1e+06.
#' @param norm_para arguments passed to specific normalization methods
#' @param p_adjust method for multiple test correction, default `none`,
#'   for more details see [stats::p.adjust].
#' @param pvalue_cutoff numeric, p value cutoff, default 0.05.
#' @param ... extra arguments passed to the corresponding differential analysis
#'   functions, e.g. [`run_lefse()`].
#' @return a [`microbiomeMarker-class`] object.
#' @details This function is only a wrapper of all differential analysis
#'   functions. We recommend using the corresponding function directly, since
#'   it has better default argument settings.
#' @export
#' @seealso [`run_lefse()`],[`run_simple_stat()`],[`run_test_two_groups()`],
#'   [`run_test_multiple_groups()`],[`run_edger()`],[`run_deseq2()`],
#'   [`run_metagenomeseq()`],[`run_ancom()`],[`run_ancombc()`],[`run_aldex()`],
#'   [`run_limma_voom()`],[`run_sl()`]
run_marker <- function(ps,
    group,
    da_method = c(
        "lefse", "simple_t", "simple_welch",
        "simple_white", "simple_kruskal",
        "simple_anova", "edger", "deseq2",
        "metagenomeseq", "ancom", "ancombc", "aldex",
        "limma_voom", "sl_lr", "sl_rf", "sl_svm"
    ),
    taxa_rank = "all",
    transform = c("identity", "log10", "log10p"),
    norm = "none",
    norm_para = list(),
    p_adjust = c(
        "none", "fdr", "bonferroni", "holm",
        "hochberg", "hommel", "BH", "BY"
    ),
    pvalue_cutoff = 0.05,
    ...) {
    stopifnot(inherits(ps, "phyloseq"))

    transform <- match.arg(transform, c("identity", "log10", "log10p"))
    da_method <- match.arg(
        da_method,
        c(
            "lefse", "simple_t", "simple_welch",
            "simple_white", "simple_kruskal", "simple_anova",
            "edger", "deseq2", "metagenomeseq", "ancom",
            "ancombc", "aldex", "limma_voom",
            "sl_lr", "sl_rf", "sl_svm"
        )
    )
    p_adjust <- match.arg(
        p_adjust,
        c(
            "none", "fdr", "bonferroni", "holm",
            "hochberg", "hommel", "BH", "BY"
        )
    )

    # group
    sample_meta <- sample_data(ps)
    if (!group %in% names(sample_meta)) {
        stop("`group` must be a variable in the sample metadata", call. = FALSE)
    }
    groups <- sample_meta[[group]]
    n_group <- length(unique(groups))
    if (n_group == 1) {
        stop("at least two groups required", call. = FALSE)
    }

    para <- c(
        list(...),
        ps = ps,
        group = group,
        taxa_rank = taxa_rank,
        transform = transform,
        norm = norm,
        # wrap in list() so c() keeps `norm_para` as one argument
        norm_para = list(norm_para),
        p_adjust = p_adjust,
        pvalue_cutoff = pvalue_cutoff
    )

    test_fun <- switch(da_method,
        lefse = run_lefse,
        edger = run_edger,
        metagenomeseq = run_metagenomeseq,
        deseq2 = run_deseq2,
        ancom = run_ancom,
        ancombc = run_ancombc,
        aldex = run_aldex,
        limma_voom = run_limma_voom
    )

    if (da_method == "lefse") {
        para <- c(
            list(...),
            ps = ps,
            class = group,
            taxa_rank = taxa_rank,
            transform = transform,
            norm = norm,
            # wrap in list() so c() keeps `norm_para` as one argument
            norm_para = list(norm_para)
        )
    }
    if (da_method %in% c(
        "simple_t", "simple_welch",
        "simple_white", "simple_kruskal", "simple_anova"
    )) {
        test_method <- switch(da_method,
            simple_t = "t.test",
            simple_welch = "welch.test",
            simple_white = "white.test",
            simple_anova = "anova",
            simple_kruskal = "kruskal"
        )
        para <- c(para, method = test_method)
        test_fun <- run_simple_stat
    }
    if (da_method %in% c("sl_lr", "sl_rf", "sl_svm")) { # sl methods
        sl_method <- gsub("sl_", "", da_method) %>%
            toupper()
        para <- c(
            list(...),
            ps = ps,
            group = group,
            taxa_rank = taxa_rank,
            transform = transform,
            norm = norm,
            # wrap in list() so c() keeps `norm_para` as one argument
            norm_para = list(norm_para),
            method = sl_method
        )
        test_fun <- run_sl
    }
    mm <- do.call(test_fun, para)

    mm
}


================================================
FILE: R/DA-ancom.R
================================================
#' Perform differential analysis using ANCOM
#'
#' Perform significance tests by comparing the pairwise log ratios between all
#' features.
#'
#' @param ps a \code{\link[phyloseq]{phyloseq-class}} object.
#' @param group character, the variable to set the group.
#' @param confounders character vector, the confounding variables to adjust
#'   for. Default `character(0)`, indicating no confounding variable.
#' @param taxa_rank character to specify taxonomic rank to perform
#'   differential analysis on. Should be one of
#'   `phyloseq::rank_names(phyloseq)`, or "all" means to summarize the taxa by
#'   the top taxa ranks (`summarize_taxa(ps, level = rank_names(ps)[1])`), or
#'   "none" means perform differential analysis on the original taxa
#'   (`taxa_names(phyloseq)`, e.g., OTU or ASV).
#' @param transform character, the methods used to transform the microbial
#'   abundance. See [`transform_abundances()`] for more details. The
#'   options include:
#'   * "identity", return the original data without any transformation.
#'   * "log10", the transformation is `log10(object)`, and if the data contains
#'     zeros the transformation is `log10(1 + object)`.
#'   * "log10p", the transformation is `log10(1 + object)`.
#' @param norm the methods used to normalize the microbial abundance data. See
#'   [`normalize()`] for more details.
#'   Options include:
#'   * "none": do not normalize.
#'   * "rarefy": random subsampling counts to the smallest library size in the
#'     data set.
#'   * "TSS": total sum scaling, also referred to as "relative abundance", the
#'     abundances were normalized by dividing the corresponding sample library
#'     size.
#'   * "TMM": trimmed mean of m-values. First, a sample is chosen as reference.
#'     The scaling factor is then derived using a weighted trimmed mean over
#'     the differences of the log-transformed gene-count fold-change between
#'     the sample and the reference.
#'   * "RLE", relative log expression, RLE uses a pseudo-reference calculated
#'     using the geometric mean of the gene-specific abundances over all
#'     samples. The scaling factors are then calculated as the median of the
#'     gene counts ratios between the samples and the reference.
#'   * "CSS": cumulative sum scaling, calculates scaling factors as the
#'     cumulative sum of gene abundances up to a data-derived threshold.
#'   * "CLR": centered log-ratio normalization.
#'   * "CPM": per-sample normalization of the sum of the values to 1e+06.
#' @param norm_para named `list`, other arguments passed to specific
#'   normalization methods. Most users will not need to pass any additional
#'   arguments here.
#' @param p_adjust method for multiple test correction, default `none`,
#'   for more details see [stats::p.adjust].
#' @param pvalue_cutoff significance level for each of the statistical tests,
#'   default 0.05.
#' @param W_cutoff lower bound for the proportion for the W-statistic, default
#'   0.75.
#'
#' @details
#' In an experiment with only two treatments, this tests the following
#' hypothesis for feature \eqn{i}:
#'
#' \deqn{H_{0i}: E(log(\mu_i^1)) =  E(log(\mu_i^2))}
#'
#' where \eqn{\mu_i^1} and \eqn{\mu_i^2} are the mean abundances for feature
#' \eqn{i} in the two groups.
#'
#' The developers of this method recommend the following significance tests:
#' if there are 2 groups, use the non-parametric Wilcoxon rank sum test
#' [`stats::wilcox.test()`]. If there are more than 2 groups, use the
#' non-parametric [`stats::kruskal.test()`] or one-way ANOVA [`stats::aov()`].
#'
#' @return a [microbiomeMarker-class] object, in which the `slot` of
#' `marker_table` contains four variables:
#' * `feature`, significantly different features.
#' * `enrich_group`, the group in which the feature is enriched.
#' * `effect_size`, difference of CLR means for two groups, or F statistic for
#'   more than two groups.
#' * `W`, the W-statistic, the number of features that a single feature is
#'   tested to be significantly different against.
#'
#' @references Mandal et al. "Analysis of composition of microbiomes: a novel
#' method for studying microbial composition", Microbial Ecology in Health
#' & Disease, (2015), 26.
#' @author Huang Lin, Yang Cao
#' @export
#' @examples
#' \donttest{
#' data(enterotypes_arumugam)
#' ps <- phyloseq::subset_samples(
#'     enterotypes_arumugam,
#'     Enterotype %in% c("Enterotype 3", "Enterotype 2")
#' )
#' run_ancom(ps, group = "Enterotype")
#' }
run_ancom <- function(ps,
    group,
    confounders = character(0),
    taxa_rank = "all",
    transform = c("identity", "log10", "log10p"),
    norm = "TSS",
    norm_para = list(),
    p_adjust = c(
        "none", "fdr", "bonferroni", "holm",
        "hochberg", "hommel", "BH", "BY"
    ),
    pvalue_cutoff = 0.05,
    W_cutoff = 0.75) {
    stopifnot(inherits(ps, "phyloseq"))
    transform <- match.arg(transform, c("identity", "log10", "log10p"))
    p_adjust <- match.arg(
        p_adjust,
        c(
            "none", "fdr", "bonferroni", "holm",
            "hochberg", "hommel", "BH", "BY"
        )
    )

    ps <- check_rank_names(ps) %>%
        check_taxa_rank(taxa_rank)

    if (length(confounders)) {
        confounders <- check_confounder(ps, group, confounders)
    }

    # check whether group is valid, write a function
    meta <- sample_data(ps)
    meta_nms <- names(meta)
    groups <- meta[[group]]
    groups <- make.names(groups)

    if (!is.factor(groups)) {
        groups <- factor(groups)
    }
    sample_data(ps)[[group]] <- groups
    lvl <- levels(groups)
    n_lvl <- length(lvl)

    if (!length(confounders)) {
        # choose the test function with if/else rather than vectorised ifelse()
        tfun <- if (n_lvl > 2) stats::kruskal.test else stats::wilcox.test
        fml <- paste("x ~ ", group)
    } else {
        tfun <- stats::aov
        fml <- paste("x ~ ", group, "+",  paste(confounders, collapse = " + "))
    }

    # preprocess phyloseq object
    ps <- preprocess_ps(ps)
    ps <- transform_abundances(ps, transform = transform)

    # normalize the data
    norm_para <- c(norm_para, method = norm, object = list(ps))
    ps_normed <- do.call(normalize, norm_para)
    ps_summarized <- pre_ps_taxa_rank(ps_normed, taxa_rank)
    feature_table <- abundances(ps_summarized, norm = TRUE)

    # effect size: CLR mean_difference or aov f statistic
    feature_table_clr <- norm_clr(
        otu_table(feature_table, taxa_are_rows = TRUE)
    )
    feature_table_clr <- data.frame(t(feature_table_clr))
    ef <- vapply(
        feature_table_clr,
        calc_ef_md_f,
        FUN.VALUE = 0.0,
        group = groups
    )

    # enrich_group
    group_enriched <- vapply(
        feature_table_clr,
        get_ancom_enrich_group,
        FUN.VALUE = character(1),
        group = groups
    )

    # ANCOM requires log transformation
    feature_table <- log(as.matrix(feature_table) + 1)
    n_taxa <- nrow(feature_table)
    taxa_id <- row.names(feature_table)
    n_samp <- ncol(feature_table)

    # Calculate the p-value for each pairwise comparison of taxa.
    # para group is just for the main var in the formula
    test_var_dat <- data.frame(groups)
    names(test_var_dat) <- group
    if (length(confounders)) {
        # `[[` takes a single name, so copy the confounders one at a time
        for (cf in confounders) {
            test_var_dat[[cf]] <- meta[[cf]]
        }
    }
    p <- calc_ancom_pmat(
        feature_table,
        test_var_dat,
        tfun,
        fml
    )

    # Multiple comparisons correction.
    p_adjusted <- vapply(
        data.frame(p),
        p.adjust,
        FUN.VALUE = numeric(n_taxa),
        method = p_adjust
    )

    # Calculate the W statistic of ANCOM.
    # For each taxon, count the number of q-values < pvalue_cutoff.
    W <- apply(p_adjusted, 2, function(x) sum(x < pvalue_cutoff))

    # Organize outputs
    out_comp <- data.frame(
        feature = taxa_id,
        enrich_group = group_enriched,
        ef = ef,
        W = W,
        row.names = NULL,
        check.names = FALSE
    )
    # Declare a taxon to be differentially abundant based on the quantile of W
    # statistic. We perform (n_taxa - 1) hypothesis tests on each taxon, so
    # the maximum number of rejections is (n_taxa - 1).
    sig_out <- out_comp[out_comp$W > W_cutoff * (n_taxa - 1), ]
    if (n_lvl == 2) {
        names(sig_out)[3] <- "ef_CLR_diff_mean"
    } else {
        names(sig_out)[3] <- "ef_CLR_F_statistic"
    }

    marker <- return_marker(sig_out, out_comp)
    tax <- matrix(taxa_id) %>%
        tax_table()
    row.names(tax) <- row.names(feature_table)

    mm <- microbiomeMarker(
        marker_table = marker,
        norm_method = get_norm_method(norm),
        diff_method = "ANCOM",
        otu_table = otu_table(feature_table, taxa_are_rows = TRUE),
        sam_data = sample_data(ps_normed),
        tax_table = tax
    )

    mm
}
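
# Illustrative sketch (not part of the package API): how the W statistic and
# the `W_cutoff` rule used in `run_ancom()` behave on a small matrix of
# adjusted pairwise p-values. The helper name `sketch_ancom_w` and the toy
# matrix in the usage note are assumptions made for illustration only.
sketch_ancom_w <- function(p_adjusted, pvalue_cutoff = 0.05, W_cutoff = 0.75) {
    n_taxa <- nrow(p_adjusted)
    # W: for each taxon, the number of log ratios that test as significant
    W <- apply(p_adjusted, 2, function(x) sum(x < pvalue_cutoff))
    data.frame(
        feature = colnames(p_adjusted),
        W = unname(W),
        # each taxon takes part in (n_taxa - 1) tests, hence the scaling
        significant = unname(W > W_cutoff * (n_taxa - 1)),
        row.names = NULL
    )
}
# Usage: taxon "a" differs from every other taxon, the rest differ only from it.
# p <- matrix(0.5, 4, 4, dimnames = list(letters[1:4], letters[1:4]))
# p["a", ] <- 0.01; p[, "a"] <- 0.01; diag(p) <- 1
# sketch_ancom_w(p)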

#' Calculates pairwise pvalues between all features
#' @param feature_table matrix-like, logged feature table.
#' @param test_var_dat data.frame, variables data (sample meta data)
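#' @param test function, the test used to determine the p value of each log
#'   ratio, one of [`stats::aov()`], [`stats::wilcox.test()`] or
#'   [`stats::kruskal.test()`].
#' @param fml character, the model formula passed to `test`, with `x` as the
#'   response (the log ratio).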
#' @references
#' github/biocore/scikit-bio/blob/master/skbio/stats/composition.py#L811
#' @noRd
calc_ancom_pmat <- function(feature_table, test_var_dat, test, fml) {

    taxas <- row.names(feature_table)
    feature_table <- data.frame(t(feature_table))
    taxa_n <- ncol(feature_table)
    p <- matrix(NA, nrow = taxa_n, ncol = taxa_n)
    row.names(p) <- taxas
    colnames(p) <- taxas


    for (i in seq_len(taxa_n - 1)) {
        new_table <- -(feature_table[(i + 1):taxa_n] - feature_table[[i]])
        p[-(seq_len(i)), i] <- vapply(
            new_table,
            calc_ancom_p,
            FUN.VALUE = numeric(1),
            test_var_dat = test_var_dat, test = test, fml = fml
        )
    }

    # Complete the p-value matrix.
    # What we got from above iterations is a lower triangle matrix of p-values.
    p[upper.tri(p)] <- t(p)[upper.tri(p)]
    diag(p) <- 1 # let p-values on diagonal equal to 1
    p[is.na(p)] <- 1 # let p-values of NA equal to 1

    p
}

#' calculate the p value of a pair-wise log ratio
#' @param log_ratio a numeric vector, a pair-wise log ratio.
#' @param test_var_dat data.frame, variables data (sample meta data); its first
#'   column is the group variable.
#' @param test function, the test used to determine the p value of the log
#'   ratio, one of [`stats::aov()`], [`stats::wilcox.test()`] or
#'   [`stats::kruskal.test()`].
#' @param fml character, the model formula passed to `test`, with `x` as the
#'   response (the log ratio).
#' @noRd
calc_ancom_p <- function(log_ratio, test_var_dat, test, fml) {
    # first var is the target var (main var)
    group <- names(test_var_dat)[1]
    test_dat <- cbind(x = log_ratio, test_var_dat)
    fml <- stats::formula(fml)
    if (identical(test, stats::aov)) {
        fit <- test(fml,
            data = test_dat,
            na.action = na.omit
        )
        p <- summary(fit)[[1]][group, "Pr(>F)"]
    } else {
        suppressWarnings(p <- test(fml, data = test_dat)$p.value)
    }

    p
}


#' Identify structural zeros
#' from "FrederickHuangLin/ANCOMBC/R/get_struc_zero.R"
#'
#' @author Huang Lin, Yang Cao
#' @noRd
get_struc_zero <- function(ps, group, neg_lb) {
    stopifnot(inherits(ps, "phyloseq"))
    stopifnot(is.logical(neg_lb))
    stopifnot(length(group) == 1 && is.character(group))

    meta_tab <- sample_data(ps)
    check_var_in_meta(group, meta_tab)
    groups <- factor(meta_tab[[group]])

    feature_tab <- as(otu_table(ps), "matrix")
    present_tab <- feature_tab
    present_tab[is.na(present_tab)] <- 0
    present_tab[present_tab != 0] <- 1
    n_taxa <- nrow(feature_tab)
    n_group <- nlevels(groups)

    p_hat <- matrix(NA, nrow = n_taxa, ncol = n_group)
    rownames(p_hat) <- rownames(feature_tab)
    colnames(p_hat) <- levels(groups)
    samp_size <- p_hat

    for (i in seq_len(n_taxa)) {
        p_hat[i, ] <- tapply(
            present_tab[i, ],
            groups,
            function(x) mean(x, na.rm = TRUE)
        )
        samp_size[i, ] <- tapply(
            feature_tab[i, ],
            groups,
            function(x) length(x[!is.na(x)])
        )
    }

    p_hat_lo <- p_hat - 1.96 * sqrt(p_hat * (1 - p_hat) / samp_size)

    zero_ind <- p_hat == 0

    if (neg_lb) {
        zero_ind[p_hat_lo <= 0] <- TRUE
    }
    colnames(zero_ind) <- paste0(
        "structural_zero (", group, " = ", colnames(zero_ind), ")"
    )

    data.frame(zero_ind)
}
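
# Illustrative sketch (not part of the package API): the structural-zero rule
# applied by `get_struc_zero()` to a single taxon. A group is flagged when the
# taxon is never observed in it, or, with `neg_lb = TRUE`, when the lower bound
# of the 95% normal-approximation interval of its prevalence is not positive.
# The helper name `sketch_struc_zero` and the toy data are hypothetical.
sketch_struc_zero <- function(present, groups, neg_lb = FALSE) {
    by_group <- split(present, factor(groups))
    p_hat <- vapply(by_group, mean, FUN.VALUE = numeric(1))
    samp_size <- lengths(by_group)
    # lower bound of the interval for the per-group prevalence
    p_hat_lo <- p_hat - 1.96 * sqrt(p_hat * (1 - p_hat) / samp_size)
    zero_ind <- p_hat == 0
    if (neg_lb) zero_ind[p_hat_lo <= 0] <- TRUE
    zero_ind
}
# Usage: absent in A, seen in 1 of 4 samples in B, seen in all samples in C.
# present <- c(0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1)
# groups <- rep(c("A", "B", "C"), each = 4)
# sketch_struc_zero(present, groups, neg_lb = TRUE)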

#' enrich group for ancom
#' TODO: rewrite this function later by splitting get_feature_enrich_group
#' into two functions: enrich_group and log max mean
#' @noRd
get_ancom_enrich_group <- function(feature_abd, group) {
    abd_split <- split(feature_abd, group)
    abd_mean_group <- vapply(abd_split, mean, FUN.VALUE = 0.0)
    enrich_group <- names(abd_split)[which.max(abd_mean_group)]

    enrich_group
}

#' preprocess feature data using methods of ANCOM-II
#' @noRd
#' @importFrom stats dnorm lm na.omit quantile residuals sd
preprocess_ancom <- function(feature_table,
    meta_data,
    sample_var,
    lib_cut,
    neg_lb,
    group = NULL,
    out_cut = 0.05,
    zero_cut = 0.90) {

    feature_table <- data.frame(feature_table, check.names = FALSE)
    meta_data <- data.frame(meta_data, check.names = FALSE)
    # Drop unused levels
    meta_data[] <- lapply(
        meta_data,
        function(x) if (is.factor(x)) factor(x) else x
    )
    # Match sample IDs between metadata and feature table
    sample_ID <- intersect(meta_data[, sample_var], colnames(feature_table))
    feature_table <- feature_table[, sample_ID]
    meta_data <- meta_data[match(sample_ID, meta_data[, sample_var]), ]

    # 1. Identify outliers within each taxon
    if (!is.null(group)) {
        groups <- meta_data[, group]
        z <- feature_table + 1 # Add pseudo-count (1)
        f <- log(z)
        f[f == 0] <- NA
        f <- colMeans(f, na.rm = TRUE)
        f_fit <- lm(f ~ groups)
        e <- rep(0, length(f))
        e[!is.na(groups)] <- residuals(f_fit)
        y <- t(t(z) - e)

        outlier_check <- function(x) {
            # Fitting the mixture model using the algorithm of Peddada, S. Das,
            # and JT Gene Hwang (2002)
            mu1 <- quantile(x, 0.25, na.rm = TRUE)
            mu2 <- quantile(x, 0.75, na.rm = TRUE)
            sigma1 <- quantile(x, 0.75, na.rm = TRUE) -
                quantile(x, 0.25, na.rm = TRUE)
            sigma2 <- sigma1
            pi <- 0.75
            n <- length(x)
            epsilon <- 100
            tol <- 1e-5
            score <- pi * dnorm(x, mean = mu1, sd = sigma1) /
                ((1 - pi) * dnorm(x, mean = mu2, sd = sigma2))
            while (epsilon > tol) {
                grp1_ind <- (score >= 1)
                mu1_new <- mean(x[grp1_ind])
                mu2_new <- mean(x[!grp1_ind])
                sigma1_new <- sd(x[grp1_ind])
                if (is.na(sigma1_new)) sigma1_new <- 0
                sigma2_new <- sd(x[!grp1_ind])
                if (is.na(sigma2_new)) sigma2_new <- 0
                pi_new <- sum(grp1_ind) / n

                para <- c(mu1_new, mu2_new, sigma1_new, sigma2_new, pi_new)
                if (any(is.na(para))) break

                score <- pi_new * dnorm(x, mean = mu1_new, sd = sigma1_new) /
                    ((1 - pi_new) * dnorm(x, mean = mu2_new, sd = sigma2_new))

                epsilon <- sqrt(
                    (mu1 - mu1_new)^2 +
                        (mu2 - mu2_new)^2 +
                        (sigma1 - sigma1_new)^2 +
                        (sigma2 - sigma2_new)^2 +
                        (pi - pi_new)^2
                )
                mu1 <- mu1_new
                mu2 <- mu2_new
                sigma1 <- sigma1_new
                sigma2 <- sigma2_new
                pi <- pi_new
            }

            if (mu1 + 1.96 * sigma1 < mu2 - 1.96 * sigma2) {
                if (pi < out_cut) {
                    out_ind <- grp1_ind
                } else if (pi > 1 - out_cut) {
                    out_ind <- (!grp1_ind)
                } else {
                    out_ind <- rep(FALSE, n)
                }
            } else {
                out_ind <- rep(FALSE, n)
            }
            return(out_ind)
        }
        out_ind <- matrix(
            FALSE,
            nrow = nrow(feature_table),
            ncol = ncol(feature_table)
        )
        out_ind[, !is.na(groups)] <- t(apply(
            y, 1,
            function(i) {
                unlist(tapply(i, groups, function(j) outlier_check(j)))
            }
        ))

        feature_table[out_ind] <- NA
    }

    # 2. Discard taxa with a proportion of zeros >= zero_cut
    zero_prop <- apply(
        feature_table, 1,
        function(x) sum(x == 0, na.rm = TRUE) / length(x[!is.na(x)])
    )
    taxa_del <- which(zero_prop >= zero_cut)
    if (length(taxa_del) > 0) {
        feature_table <- feature_table[-taxa_del, ]
    }

    # 3. Discard samples with library size < lib_cut
    lib_size <- colSums(feature_table, na.rm = TRUE)
    if (any(lib_size < lib_cut)) {
        subj_del <- which(lib_size < lib_cut)
        feature_table <- feature_table[, -subj_del]
        meta_data <- meta_data[-subj_del, ]
    }

    # 4. Identify taxa with structural zeros
    if (!is.null(group)) {
        groups <- factor(meta_data[, group])
        present_table <- as.matrix(feature_table)
        present_table[is.na(present_table)] <- 0
        present_table[present_table != 0] <- 1

        p_hat <- t(apply(
            present_table, 1,
            function(x) {
                unlist(tapply(x, groups, function(y) mean(y, na.rm = TRUE)))
            }
        ))
        samp_size <- t(apply(
            feature_table, 1,
            function(x) {
                unlist(tapply(x, groups, function(y) length(y[!is.na(y)])))
            }
        ))
        p_hat_lo <- p_hat - 1.96 * sqrt(p_hat * (1 - p_hat) / samp_size)

        struc_zero <- (p_hat == 0) * 1
        # Optionally also classify a taxon as a structural zero in a group
        # when the lower confidence bound of its presence proportion is <= 0
        if (neg_lb) struc_zero[p_hat_lo <= 0] <- 1
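        # Worked example with illustrative numbers (not from any dataset):
        # for p_hat = 0.05 and samp_size = 20, the lower bound is
        #   0.05 - 1.96 * sqrt(0.05 * 0.95 / 20) ~= 0.05 - 0.0955 = -0.0455,
        # so this taxon/group pair is flagged only when `neg_lb = TRUE`.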

        # Entries considered to be structural zeros are set to be 0s
        struc_ind <- struc_zero[, groups]
        feature_table <- feature_table * (1 - struc_ind)

        colnames(struc_zero) <- paste0(
            "structural_zero (",
            colnames(struc_zero),
            ")"
        )
    } else {
        struc_zero <- NULL
    }

    # 5. Return results
    res <- list(
        feature_table = feature_table,
        meta_data = meta_data,
        structure_zeros = struc_zero
    )

    res
}


================================================
FILE: R/DA-ancombc.R
================================================
#' Differential analysis of compositions of microbiomes with bias correction
#' (ANCOM-BC).
#'
#' Differential abundance analysis for microbial absolute abundance data. This
#' function is a wrapper of [`ANCOMBC::ancombc()`].
#'
#' @param ps  a [`phyloseq::phyloseq-class`] object, which consists of a feature
#'   table, a sample metadata and a taxonomy table.
#' @param group the name of the group variable in metadata. Specifying
#'   `group` is required for detecting structural zeros and performing
#'   global test.
#' @param confounders character vector, the confounding variables to be adjusted.
#'   default `character(0)`, indicating no confounding variable.
#' @param contrast this parameter is only used for a two-group comparison when
#'   the group variable has more than two levels. See Details below.
#' @param taxa_rank character to specify taxonomic rank to perform
#'   differential analysis on. Should be one of
#'   `phyloseq::rank_names(phyloseq)`, or "all" means to summarize the taxa by
#'   the top taxa ranks (`summarize_taxa(ps, level = rank_names(ps)[1])`), or
#'   "none" means perform differential analysis on the original taxa
#'   (`taxa_names(phyloseq)`, e.g., OTU or ASV).
#' @param transform character, the methods used to transform the microbial
#'   abundance. See [`transform_abundances()`] for more details. The
#'   options include:
#'   * "identity", return the original data without any transformation
#'     (default).
#'   * "log10", the transformation is `log10(object)`, and if the data contains
#'     zeros the transformation is `log10(1 + object)`.
#'   * "log10p", the transformation is `log10(1 + object)`.
#' @param norm the methods used to normalize the microbial abundance data. See
#'   [`normalize()`] for more details.
#'   Options include:
#'   * "none": do not normalize.
#'   * "rarefy": random subsampling counts to the smallest library size in the
#'     data set.
#'   * "TSS": total sum scaling, also referred to as "relative abundance", the
#'     abundances were normalized by dividing the corresponding sample library
#'     size.
#'   * "TMM": trimmed mean of m-values. First, a sample is chosen as reference.
#'     The scaling factor is then derived using a weighted trimmed mean over the
#'     differences of the log-transformed gene-count fold-change between the
#'     sample and the reference.
#'   * "RLE", relative log expression, RLE uses a pseudo-reference calculated
#'     using the geometric mean of the gene-specific abundances over all
#'     samples. The scaling factors are then calculated as the median of the
#'     gene counts ratios between the samples and the reference.
#'   * "CSS": cumulative sum scaling, calculates scaling factors as the
#'     cumulative sum of gene abundances up to a data-derived threshold.
#'   * "CLR": centered log-ratio normalization.
#'   * "CPM": per-sample normalization of the sum of the values to 1e+06.
#' @param norm_para  named `list`. other arguments passed to specific
#'   normalization methods.  Most users will not need to pass any additional
#'   arguments here.
#' @param p_adjust method to adjust p-values by. Default is "none".
#'   Options include "holm", "hochberg", "hommel", "bonferroni", "BH", "BY",
#'   "fdr", "none". See [`stats::p.adjust()`] for more details.
#' @param prv_cut a numerical fraction between 0 and 1. Taxa with prevalences
#'   less than `prv_cut` will be excluded in the analysis. Default
#'   is 0.10.
#' @param lib_cut a numerical threshold for filtering samples based on library
#'   sizes. Samples with library sizes less than `lib_cut` will be excluded
#'   in the analysis. Default is 0, i.e. do not filter any sample.
#' @param struc_zero whether to detect structural zeros. Default is FALSE.
#' @param neg_lb whether to classify a taxon as a structural zero in the
#'   corresponding study group using its asymptotic lower bound.
#'   Default is FALSE.
#' @param tol the iteration convergence tolerance for the E-M algorithm.
#'   Default is 1e-05.
#' @param max_iter the maximum number of iterations for the E-M algorithm.
#'   Default is 100.
#' @param conserve whether to use a conservative variance estimate of
#'   the test statistic. It is recommended if the sample size is small and/or
#'   the number of differentially abundant taxa is believed to be large.
#'   Default is FALSE.
#' @param pvalue_cutoff level of significance. Default is 0.05.
#'
#' @details
#' `contrast` must be a character vector of length two or `NULL` (default). It
#' only needs to be set manually for a two-group comparison when there are
#' more than two groups. The order determines the direction of the comparison:
#' the first element specifies the reference group (control), i.e. the
#' denominator of the fold change, and the second element is the numerator of
#' the fold change. Otherwise, users do not need to set this parameter (leave
#' it as the default `NULL`): if there are two groups, the first level of the
#' group variable is used as the reference group; if there are more than two
#' groups, an ANOVA-like test is performed to find markers that differ in any
#' of the groups.
#'
#' @references
#' Lin, Huang, and Shyamal Das Peddada. "Analysis of compositions of microbiomes
#' with bias correction." Nature communications 11.1 (2020): 1-11.
#'
#' @seealso [`ANCOMBC::ancombc`]
#'
#' @importFrom ANCOMBC ancombc
#' @importFrom stats relevel
#' @export
#' @return a [`microbiomeMarker-class`] object.
#' @examples
#' data(enterotypes_arumugam)
#' ps <- phyloseq::subset_samples(
#'     enterotypes_arumugam,
#'     Enterotype %in% c("Enterotype 3", "Enterotype 2")
#' )
#' if (requireNamespace("microbiome", quietly = TRUE)) {
#'     run_ancombc(ps, group = "Enterotype")
#' } else {
#'     message("The 'microbiome' package is not installed, please install it to use this example")
#' }
run_ancombc <- function(ps,
    group,
    confounders = character(0),
    contrast = NULL,
    taxa_rank = "all",
    transform = c("identity", "log10", "log10p"),
    norm = "none",
    norm_para = list(),
    p_adjust = c(
        "none", "fdr", "bonferroni", "holm",
        "hochberg", "hommel", "BH", "BY"
    ),
    prv_cut = 0.1,
    lib_cut = 0,
    struc_zero = FALSE,
    neg_lb = FALSE,
    tol = 1e-05,
    max_iter = 100,
    conserve = FALSE,
    pvalue_cutoff = 0.05) {
    stopifnot(inherits(ps, "phyloseq"))
    ps <- check_rank_names(ps) %>%
        check_taxa_rank(taxa_rank)

    if (length(confounders)) {
        confounders <- check_confounder(ps, group, confounders)
    }

    # If a sample contains missing values for any variable specified in the
    # formula, its sampling fraction estimate will be NA, since the sampling
    # fraction is not estimable in the presence of missing values.
    # Remove these samples.
    fml_char <- ifelse(length(confounders),
                       paste(c(confounders, group), collapse = " + "),
                       group)
    # fml_char <- paste(c(confounders, group), collapse = " + ")
    # fml <- stats::as.formula(paste("~", fml_char))
    # vars_fml <- all.vars(fml)
    for (var in c(confounders, group)) {
        ps <- remove_na_samples(ps, var)
    }

    # check whether group is valid, write a function
    meta <- sample_data(ps)
    meta_nms <- names(meta)
    groups <- meta[[group]]
    groups <- make.names(groups)
    if (!is.null(contrast)) {
        contrast <- make.names(contrast)
    }
    if (!is.factor(groups)) {
        groups <- factor(groups)
    }
    groups <- set_lvl(groups, contrast)
    sample_data(ps)[[group]] <- groups
    lvl <- levels(groups)
    n_lvl <- length(lvl)

    contrast <- check_contrast(contrast)

    transform <- match.arg(transform, c("identity", "log10", "log10p"))
    p_adjust <- match.arg(
        p_adjust,
        c(
            "none", "fdr", "bonferroni", "holm",
            "hochberg", "hommel", "BH", "BY"
        )
    )

    # set the reference level for pair-wise comparison from multiple groups
    # if (!is.null(contrast) && n_lvl > 2) {
    #     groups <- relevel(groups, ref = contrast[1])
    #     sample_data(ps)[[group]] <- groups
    # }


    # preprocess phyloseq object
    ps <- preprocess_ps(ps)
    ps <- transform_abundances(ps, transform = transform)

    # normalize the data
    norm_para <- c(norm_para, method = norm, object = list(ps))
    ps_normed <- do.call(normalize, norm_para)
    ps_summarized <- pre_ps_taxa_rank(ps_normed, taxa_rank)

    global <- n_lvl > 2
    # ancombc differential abundance analysis

    if (taxa_rank == "all") {
        ancombc_taxa_rank <- rank_names(ps_summarized)[1]
    } else {
        ancombc_taxa_rank <- taxa_rank
    }

    ancombc_out <- ANCOMBC::ancombc(
        ps_summarized,
        tax_level = ancombc_taxa_rank,
        formula = fml_char,
        p_adj_method = p_adjust,
        prv_cut = prv_cut,
        lib_cut = lib_cut,
        group = group,
        struc_zero = struc_zero,
        neg_lb = neg_lb,
        tol = tol,
        max_iter = max_iter,
        conserve = conserve,
        alpha = pvalue_cutoff,
        global = global
    )

    # multiple-group comparison will be performed while the group
    # variable has > 2 levels
    keep_var <- c("W", "p_val", "q_val", "diff_abn")
    if (n_lvl > 2) {
        # ANCOM-BC global test to determine taxa that are differentially
        # abundant between three or more groups of multiple samples.
        # global result to marker_table
        if (is.null(contrast)) {
            mtab <- ancombc_out$res_global
        } else {
            exp_lvl <- paste0(group, contrast[2])
            ancombc_res <- ancombc_out$res
            mtab <- lapply(keep_var, function(x) ancombc_res[[x]][exp_lvl])
            mtab <- do.call(cbind, mtab)
        }
    } else {
        ancombc_out_res <- ancombc_out$res
        # drop intercept
        # In the previous version of ancombc (Bioc 3.15), taxa names were
        # saved as row names, while in the current version they are saved as
        # the first column. Remove the taxon and intercept columns, and use
        # the taxa names as row names.
        ancombc_out_res <- lapply(
            ancombc_out_res,
            function(x) {
                new_x <- x[-1:-2]
                rownames(new_x) <- x[[1]]
                new_x
            }
        )
        mtab <- do.call(
            cbind,
            ancombc_out_res[c("W", "p_val", "q_val", "diff_abn")]
        )
    }
    names(mtab) <- keep_var

    # determine enrich group based on coefficients
    # drop taxa and intercept
    cf <- ancombc_out$res$lfc[-1:-2]
    if (n_lvl > 2) {
        if (!is.null(contrast)) {
            cf <- cf[exp_lvl]
            enrich_group <- ifelse(cf[[1]] > 0, contrast[2], contrast[1])
        } else {
            cf <- cbind(0, cf)
            enrich_group <- lvl[apply(cf, 1, which.max)]
        }
    } else {
        enrich_group <- ifelse(cf[[1]] > 0, lvl[2], lvl[1])
    }

    # # enriched group
    enrich_abd <- get_ancombc_enrich_group(ps_summarized, ancombc_out, group)
    norm_abd <- enrich_abd$abd

    mtab <- cbind(feature = row.names(mtab), mtab, enrich_group)
    mtab_sig <- mtab[mtab$diff_abn, ]
    mtab_sig <- mtab_sig[c("feature", "enrich_group", "W", "p_val", "q_val")]
    names(mtab_sig) <- c("feature", "enrich_group", "ef_W", "pvalue", "padj")
    marker <- return_marker(mtab_sig, mtab)
    marker <- microbiomeMarker(
        marker_table = marker,
        norm_method = get_norm_method(norm),
        diff_method = "ancombc",
        sam_data = sample_data(ps_summarized),
        otu_table = otu_table(norm_abd, taxa_are_rows = TRUE),
        tax_table = tax_table(ps_summarized)
    )

    marker
}

get_ancombc_enrich_group <- function(ps, ancombc_out, group) {
    samp_frac <- ancombc_out$samp_frac
    # As shown in the ancombc vignette: if it contains missing values for any
    # variable specified in the formula, the corresponding sampling fraction
    # estimate for this sample will return NA since the sampling fraction is
    # not estimable with the presence of missing values.
    # Replace NA with 0
    samp_frac[is.na(samp_frac)] <- 0

    # Add pseudo-count (1) to avoid taking the log of 0
    log_abd <- log(abundances(ps, norm = TRUE) + 1)
    # Adjust the log observed abundances
    log_abd_adj <- sweep(log_abd, 2, samp_frac)

    groups <- sample_data(ps)[[group]]
    # remove groups with NA
    na_idx <- is.na(groups)
    log_abd_adj <- log_abd_adj[, !na_idx]
    groups <- groups[!na_idx]

    # mean absolute abundance
    abd_mean <- by(t(log_abd_adj), groups, colMeans)
    abd_mean <- do.call(cbind, abd_mean)
    idx_enrich <- apply(abd_mean, 1, which.max)
    group_enrich <- colnames(abd_mean)[idx_enrich]
    group_enrich <- data.frame(
        feature = rownames(abd_mean),
        enrich_group = group_enrich
    )

    list(abd = exp(log_abd_adj), group_enrich = group_enrich)
}


================================================
FILE: R/DA-comparing.R
================================================
# The module of comparing differential analysis is inspired from DAtest
# https://github.com/Russel88/DAtest
# If you use this function please cite the original paper:
# Russel, Jakob, et al. "DAtest: a framework for choosing differential abundance
# or expression method." BioRxiv (2018): 241802.

#' Comparing the results of differential analysis methods by Empirical power
#' and False Discovery Rate
#'
#' Calculating power, false discovery rates, false positive rates and AUC
#' (area under the receiver operating characteristic (ROC) curve)
#' for various DA methods.
#'
#' @param ps,group,taxa_rank main arguments of all differential analysis
#'   methods. `ps`: a [`phyloseq::phyloseq-class`] object; `group`: character,
#'   the grouping variable, must be one of the variables in the sample
#'   metadata; `taxa_rank`: character, taxonomic rank, please note that **since
#'   the abundance table is spiked in the lowest level, only
#'   `taxa_rank = "none"` is allowed**.
#' @param methods character vector, differential analysis methods to be
#'   compared, available methods are "aldex", "ancom", "ancombc", "deseq2",
#'   "edger", "lefse", "limma_voom", "metagenomeseq", "simple_stat".
#' @param args named list, which is used to set the extra arguments of the
#'   differential analysis methods, so the names must be contained in `methods`.
#'   For more see details below.
#' @param n_rep integer, number of times to run the differential analyses.
#' @param effect_size numeric, the effect size for the spike-ins. Default 5.
#' @param k numeric vector of length 3, number of features to spike in each
#'   tertile (lower, mid, upper), e.g. `k=c(5,10,15)` means 5 features spiked
#'   in low abundance tertile, 10 features spiked in mid abundance tertile and
#'   15 features spiked in high abundance tertile. Default `NULL`, which will
#'   spike 2 percent of the total amount of features in each tertile (a total
#'   of 6 percent), but minimum c(5,5,5).
#' @param relative logical, whether to rescale the total number of individuals
#'   observed for each sample to the original level after spike-in. Default
#'   `TRUE`.
#' @param BPPARAM [`BiocParallel::BiocParallelParam`] instance defining the
#'   parallel back-end.
#'
#' @details
#' To support different arguments for a given DA method, `args` accepts a
#' list of lists of lists, e.g.
#' `args = list(lefse = list(list(norm = "CPM"), list(norm = "TSS")))`,
#' which compares different `norm` arguments for the lefse analysis.
#'
#' For `taxa_rank`, only `taxa_rank = "none"` is supported, if this argument is
#' not "none", it will be forced to "none" internally.
#'
#'
#' @return a `compareDA` object, which contains a two-length list of:
#'   - `metrics`: `data.frame`, AUC, FPR, FDR and power (spike detection rate)
#'     for each run.
#'   - `mm`: differential analysis results.
#'
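#' @examples
#' \dontrun{
#' # A minimal, hedged sketch rather than a tested recipe: it assumes that
#' # `run_lefse()` and `run_edger()` accept `ps`, `group` and `taxa_rank`,
#' # and uses a serial back-end and a small `n_rep` only to keep the run
#' # short.
#' data(enterotypes_arumugam)
#' ps <- phyloseq::subset_samples(
#'     enterotypes_arumugam,
#'     Enterotype %in% c("Enterotype 3", "Enterotype 2")
#' )
#' cmp <- compare_DA(
#'     ps,
#'     group = "Enterotype",
#'     taxa_rank = "none",
#'     methods = c("lefse", "edger"),
#'     n_rep = 5,
#'     BPPARAM = BiocParallel::SerialParam()
#' )
#' summary(cmp)
#' }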
#' @importFrom phyloseq `otu_table<-`
#' @importFrom stats median
#' @export
compare_DA <- function(ps,
                       group,
                       taxa_rank = "none",
                       methods,
                       args = list(),
                       n_rep = 20,
                       effect_size = 5,
                       k = NULL,
                       relative = TRUE,
                       BPPARAM = BiocParallel::SnowParam(progressbar = TRUE)) {
    stopifnot(inherits(ps, "phyloseq"))
    # check methods
    avlb_methods <- c("aldex", "ancom", "ancombc", "deseq2", "edger", "lefse",
                      "limma_voom", "metagenomeseq", "simple_stat")
    out_methods <- setdiff(methods, avlb_methods)
    if (length(out_methods)) {
        stop("methods ", paste(out_methods, collapse = ", "),
             " not available. \n",
             "Please check your `methods`.\n",
             paste(strwrap(paste("Available methods:",
                           paste(avlb_methods, collapse = ", ")),
                           width = 0.9 * getOption("width")),
                   collapse = paste("\n", space(nchar("Available methods:")))),
             ".\n",
             call. = FALSE)
    }

    ps_var_name <- deparse(substitute(ps))

    # support for different arguments for a DA method, list of list of list
    # e.g. args = list(lefse = list(list(norm = "CPM"), list(norm = "TSS")))
    # different norm arguments for lefse analysis
    new_args <- generate_compare_args(methods, args)
    methods <- new_args$methods
    args <- new_args$args

    meta <- sample_data(ps)
    groups <- meta[[group]] |> factor()
    n_lvl <- nlevels(groups)

    if (n_lvl == 2) {
        if ("test_multiple_groups" %in% methods) {
            warning("There are two categories of the variable of interest ",
                    group, ", method `test_multiple_groups` is dropped.")
            methods <- setdiff(methods, "test_multiple_groups")
        }
    } else if (n_lvl >= 3) {
        if ("test_two_groups" %in% methods) {
            warning("There are more than two categories of the variable of ",
                    "interest ", group,
                    ", method `test_two_groups` is dropped.")
            methods <- setdiff(methods, "test_two_groups")
        }
    } else {
        stop("Only one category of the variable of interest ", group, ".")
    }

    # taxa_rank must be "none"
    if (taxa_rank != "none") {
        warning("since the abundance table is spiked in the lowest level, ",
             "`taxa_rank` was forced set as 'none'",
             call. = FALSE)
        taxa_rank <- "none"
    }

    # taxa_ranks <- vapply(args, `[[`, "taxa_rank", FUN.VALUE = character(1))
    # wrong_taxa_rank <- taxa_ranks != "none"
    # if (any(wrong_taxa_rank)) {
    #     warning("Set `taxa_rank` of all methods to 'none'")
    #     for (i in which(wrong_taxa_rank)) {
    #         args[[i]]$taxa_rank <- "none"
    #     }
    # }

    count_tab <- otu_table(ps)
    features <- rownames(count_tab)

    # spike in, differential features
    n_feature <- nrow(count_tab)

    if (is.null(k)) {
        k <- rep(round(n_feature * 0.02), 3)
        if (sum(k) < 15) {
            k <- rep(5, 3)
        }
    }
    if (sum(k) == n_feature) {
        stop("Set to spike all features, can't calculate FDR or AUC",
             call. = FALSE)
    }
    if (sum(k) > n_feature) {
        stop("Set to spike more features than are present in the data",
             call. = FALSE)
    }
    if (sum(k) < 15 && sum(k) >= 10 && n_rep <= 10) {
        warning("Few features are spiked, increase `k` or set `n_rep` to ",
                "more than 10 to ensure proper estimation of AUC and FPR",
                call. = FALSE)
    }
    if (sum(k) < 10 && sum(k) >= 5 && n_rep <= 20) {
        warning("Few features are spiked, increase `k` or set `n_rep` to ",
                "more than 20 to ensure proper estimation of AUC and FPR",
                call. = FALSE)
    }
    if (sum(k) < 5 && n_rep <= 50) {
        warning("Very few features are spiked, increase `k` or set `n_rep` ",
                "to more than 50 to ensure proper estimation of AUC and FPR",
                call. = FALSE)
    }
    if (sum(k) > n_feature/2) {
        warning("Set to spike more than half of the features, ",
                "which might give unreliable estimates")
    }

    # if(verbose) cat("Spikeing...\n")
    # shuffle predictor
    # predictor <- sample_data(ps)[[group[[1]]]]
    rands <- lapply(seq_len(n_rep), \(x) sample(groups))
    # spikeins
    spikeds <- lapply(rands,
                      \(x) spikein(count_tab, x, effect_size, k, relative))
    count_tabs <- lapply(spikeds, `[[`, 1)
    spiked_features <- lapply(spikeds, `[[`, 2)
    spiked_features <- rep(spiked_features, each = length(methods))

    # spiked phyloseq objects
    generate_spiked_ps <- function(spiked_count, rand, group, ps) {
        otu_table(ps) <- otu_table(spiked_count, taxa_are_rows = TRUE)
        meta <- sample_data(ps)
        meta[[group]] <- rand
        sample_data(ps) <- meta

        ps
    }
    pss <- mapply(generate_spiked_ps,
                  count_tabs, rands,
                  MoreArgs = list(group = group, ps = ps))
    pss <- rep(pss, each = length(methods))
    # rep methods
    rep_methods <- rep(methods, n_rep)
    all_methods <- methods

    # fun for performance metrics of DA methods
    for (item in names(args)) {
        args[[item]]$group <- group
        args[[item]]$taxa_rank <- taxa_rank
    }

    # For ancombc, using bpmapply raises an error:
    # `argument "formula" is missing, with no default`. This error may be due
    # to the ANCOMBC package. We first run the ancombc method sequentially,
    # then run the other methods in parallel, and finally bind the results.
    ancombc_md_idx <- grepl("ancombc", rep_methods, fixed = TRUE)
    if (any(ancombc_md_idx)) {
        ancombc_mds <- rep_methods[ancombc_md_idx]
        ancombc_args <- args[ancombc_md_idx]
        ancombc_pss <- pss[ancombc_md_idx]
        ancombc_spiked_features <- spiked_features[ancombc_md_idx]

        methods <- rep_methods[!ancombc_md_idx]
        args <- args[!ancombc_md_idx]
        pss <- pss[!ancombc_md_idx]
        spiked_features <- spiked_features[!ancombc_md_idx]
    }

    calc_da_metrics <- function(ps, method, args, features,
                                spiked_features, ps_var_name,
                                effect_size) {
        args <- args[[method]]
        args$ps <- ps
        fun <- paste0("run_", method)
        # remove number suffix for different args for a certain method
        fun <- gsub("(.*)_\\d+$", "\\1", fun)
        tm <- system.time(mm <- do.call(fun, args))

        marker <- data.frame(marker_table(mm))

        # pseudo p-value of ancom
        if (method == "ancom" && !is.null(marker)) {
            w <- marker$W
            cf <- 0.05 * min(w)
        }

        if (!is.null(marker)) {
            spiked <- rep("no", nrow(marker))
            spiked[marker$feature %in% spiked_features] <- "yes"
            marker$spiked <- spiked
        }

        # confusion matrix
        n_pos <- ifelse(is.null(marker), 0, nrow(marker))
        n_feature <- length(features)
        # features not reported as markers (predicted negatives)
        neg_feature <- setdiff(features, marker$feature)
        n_neg <- n_feature - n_pos
        # for effect_size = 1
        true_neg <- n_neg
        true_pos <- 0
        false_neg <- 0
        # for effect size != 1
        if (effect_size != 1) {
            true_pos <- ifelse(is.null(marker), 0, sum(marker$spiked == "yes"))
            false_neg <- sum(neg_feature %in% spiked_features)
        }
        false_pos <- n_pos - true_pos
        true_neg <- n_neg - false_neg

        # fpr
        fpr <- ifelse((false_pos + true_neg) != 0,
                      false_pos / (false_pos + true_neg),
                      0)

        # sdr
        # sum(k) == length(spiked_features)
        sdr <- true_pos / length(spiked_features)

        # auc
        if (effect_size != 1) {
            test_roc <- NULL
            tryCatch(test_roc <- pROC::roc(
                as.numeric(marker$feature %in% spiked_features) ~ marker$pvalue,
                auc = TRUE,
                direction = ">",
                quiet = TRUE), error = function(e) NULL)
            auc <- ifelse(is.null(test_roc), 0.5, as.numeric(test_roc$auc))
        } else {
            auc <- 0.5
        }

        # fdr
        fdr <- ifelse(n_pos != 0, false_pos / n_pos, 0)

        # create call
        cmd_args <- args
        cmd_args$ps <- ps_var_name
        cmd_args$fun <- fun
        # reorder
        args_nms <- names(cmd_args)
        head_nms <- c("fun", "ps", "group", "taxa_rank")
        new_nms <- c(head_nms, setdiff(args_nms, head_nms))
        cmd_args <- cmd_args[new_nms]
        cmd_chr <- deparse1(as.call(cmd_args))
        cmd_chr <- gsub(paste0("\"(", fun, ")\""), "\\1", cmd_chr)

        metrics <- data.frame(auc = auc,
                              fpr = fpr,
                              fdr = fdr,
                              power = sdr,
                              # method = gsub("(.*)_\\d+$", "\\1", method),
                              method = method,
                              call = cmd_chr,
                              time_min = round(tm[3] / 60, 4))
        rownames(metrics) <- NULL

        list(metrics = metrics, mm = mm)
    }

    if (any(ancombc_md_idx)) {
        ancombc_out <- mapply(calc_da_metrics,
                              ps = ancombc_pss,
                              method = ancombc_mds,
                              spiked_features = ancombc_spiked_features,
                              MoreArgs = list(args = ancombc_args,
                                              features = features,
                                              ps_var_name = ps_var_name,
                                              effect_size = effect_size),
                              SIMPLIFY = FALSE)
    }
    if (!all(ancombc_md_idx)) {
        bp_out <- BiocParallel::bpmapply(calc_da_metrics,
                               ps = pss,
                               method = methods,
                               spiked_features = spiked_features,
                               MoreArgs = list(args = args,
                                               features = features,
                                               ps_var_name = ps_var_name,
                                               effect_size = effect_size),
                               BPPARAM = BPPARAM,
                               SIMPLIFY = FALSE)
    }

    if (all(ancombc_md_idx)) {
        da_out <- ancombc_out
    } else {
        if (any(ancombc_md_idx)) {
            da_out <- c(ancombc_out, bp_out)
        } else {
            da_out <- bp_out
        }
    }


    # order the out, to the original order
    idx <- order(c(which(ancombc_md_idx), which(!ancombc_md_idx)))
    da_out <- da_out[idx]

    da_metrics <- do.call(rbind, lapply(da_out, `[[`, "metrics"))
    da_metrics$run <- rep(seq_len(n_rep), each = length(all_methods))
    mms <- lapply(da_out, `[[`, "mm")

    # detail <- list(n_feature = n_feature,
    #                n_sample = ncol(count_tab),
    #                effect_size = effect_size,
    #                spike = paste0(c("Low:","Mid:","High:"), k,
    #                               collapse = ", "))

    out <- list(metrics = da_metrics, mm = mms)
    class(out) <- "compareDA"

    out
}


#' Summarize the comparison results of differential analysis methods
#'
#' @param object a `compareDA` object, output from [`compare_DA()`].
#' @param sort character string specifying sort method. Possibilities are
#'   "score" which is calculated as \eqn{(auc - 0.5) * power - fdr}, "auc" for
#'   area under the ROC curve, "fpr" for false positive rate, "power" for
#'   empirical power.
#' @param boot logical, whether to use bootstrap for confidence limits of the
#'   score, default `TRUE`. Recommended to be `TRUE` unless `n_rep` is larger
#'   than 100 in [`compare_DA()`].
#' @param boot_n integer, number of bootstraps, default 1000L.
#' @param prob two length numeric vector, confidence limits for score, default
#'   `c(0.05, 0.95)`.
#' @param ... extra arguments affecting the summary produced.
#' @return a `data.frame` containing measurements for differential analysis
#'    methods:
#'    - `call`: differential analysis commands.
#'    - `auc`: area under curve of ROC.
#'    - `fpr`: false positive rate
#'    - `power`: empirical power.
#'    - `fdr`: false discovery rate.
#'    - `score`: score which is calculated as \eqn{(auc - 0.5) * power - fdr}.
#'    - `score_*`: confidence limits of score.
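#' @examples
#' # A toy illustration of the score using made-up metrics (not real
#' # results); the `compareDA` object is built by hand only to show the
#' # layout of the summary, and the call strings are hypothetical.
#' toy_metrics <- data.frame(
#'     auc = c(0.9, 0.8, 0.7, 0.6),
#'     fpr = c(0.05, 0.05, 0.10, 0.10),
#'     fdr = c(0, 0.1, 0.2, 0.2),
#'     power = c(1, 0.8, 0.6, 0.5),
#'     call = rep(c("run_a(ps)", "run_b(ps)"), each = 2)
#' )
#' toy <- structure(
#'     list(metrics = toy_metrics, mm = list()),
#'     class = "compareDA"
#' )
#' # median-based score of run_a: (0.85 - 0.5) * 0.9 - 0.05 = 0.265
#' summary(toy, boot = FALSE)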
#' @export
summary.compareDA <- function(object,
                              sort = c("score", "auc", "fpr", "power"),
                              boot = TRUE,
                              boot_n = 1000L,
                              prob = c(0.05, 0.95),
                              ...) {
    stopifnot(inherits(object, "compareDA"))
    sort <- match.arg(sort, c("score", "auc", "fpr", "power"))

    # medians
    metrics <- object$metrics
    calls <- metrics$call
    new_metrics <- metrics[c('auc', 'fpr', 'power', 'fdr')]
    metric_med <- stats::aggregate(new_metrics,
                                   by = list(call = calls),
                                   FUN = median)
    # score
    metric_med$score <- (metric_med$auc - 0.5) * metric_med$power -
        metric_med$fdr
    # interval
    new_metrics$score <- (new_metrics$auc - 0.5) * new_metrics$power -
        new_metrics$fdr
    metrics$score <- (metrics$auc - 0.5) * metrics$power - metrics$fdr

    if (boot) {
        boots <- dplyr::group_by(metrics, call) |>
            dplyr::group_modify(~ .x[sample(rownames(.x),
                                            boot_n,
                                            replace = TRUE),
                                    ])
        score_cl <- stats::aggregate(score ~ call,
                                  data = boots,
                                  FUN = function(x)
                                      stats::quantile(x, probs = prob))
    } else {
        score_cl <- stats::aggregate(score ~ call,
                                     data = metrics,
                                     FUN = \(x) stats::quantile(x, probs = prob))
    }
    score_cl <- data.frame(call = score_cl$call,
                           score_cl$score[, 1],
                           score_cl$score[, 2])
    names(score_cl) <- c("call", paste0("score_", prob))
    out <- merge(metric_med, score_cl, by = "call")
    # reorder by decreasing score
    out <- out[order(out$score,
                     out[[paste0("score_", prob[1])]],
                     out[[paste0("score_", prob[2])]],
                     decreasing = TRUE),
               ]
    if (out$score[1] <= 0) {
        warning("Best score is <= 0.\n",
                "You might need to preprocess your data or ",
                "re-run with a higher effect size.",
                call. = FALSE)
    }

    # mat <- vector("logical", nrow(out))
    # for (i in seq_len(nrow(out))) {
    #     mat[i] <- out$score[i] >= out[[paste0("score_", prob[1])]][1]
    # }
    # out$` ` <- " "
    # out[mat,]$` ` <- "*"


    if (sort == "auc") {
        out <- out[order(out$auc, decreasing = TRUE), ]
    }
    if (sort == "fpr") {
        out <- out[order(out$fpr, decreasing = FALSE), ]
    }
    if (sort == "power") {
        out <- out[order(out$power, decreasing = TRUE), ]
    }

    out
}
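
# Illustration only (not part of the package): a minimal sketch of how the
# score reported by summary.compareDA() is derived, i.e. per-call medians of
# auc, power and fdr combined as (auc - 0.5) * power - fdr. The `metrics`
# data frame and the call names `method_a` / `method_b` are made-up toy values.

```r
# Toy metrics for two hypothetical DA calls, three runs each.
metrics <- data.frame(
    call = rep(c("method_a", "method_b"), each = 3),
    auc = c(0.9, 0.8, 0.7, 0.6, 0.5, 0.55),
    power = c(0.5, 0.6, 0.4, 0.2, 0.1, 0.3),
    fdr = c(0.1, 0.0, 0.2, 0.3, 0.5, 0.4)
)

# Median of each metric per call.
med <- stats::aggregate(
    metrics[c("auc", "power", "fdr")],
    by = list(call = metrics$call),
    FUN = stats::median
)

# The score is computed from the medians, then calls are ranked by it.
med$score <- (med$auc - 0.5) * med$power - med$fdr
med <- med[order(med$score, decreasing = TRUE), ]
print(med)
```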

# compare_DA <- function(...,
#                        n_rep = 20,
#                        effect_size = 5,
#                        k = NULL,
#                        n_core = parallel::detectCores() -1,
#                        check_core = TRUE,
#                        relative = TRUE,
#                        verbose = TRUE) {
#     if (check_core) {
#         if (n_core > 20) {
#             ANSWER <- readline(paste("You are about to run compareDA using",
#                                      n_core,
#                                      "cores. Enter y to proceed "))
#             if (ANSWER != "y") {
#                 stop("Process aborted")
#             }
#         }
#     }
#
#     exp_chrs <- list(...)
#     t_start <- proc.time()
#     calls <- lapply(exp_chrs, \(x) standardise_call(str2lang(x)))
#
#     # extract the ps object and target variable
#     pss <- lapply(calls, `[[`, "ps")
#     group <- lapply(calls, `[[`, "group")
#     # all the ps and target variable must be the same
#     if (sum(duplicated(pss)) != (length(pss) - 1)) {
#         stop("`ps` objects in DA analysis must be the same")
#     }
#     if (sum(duplicated(group)) != (length(group) - 1)) {
#         stop("`group` var in all DA analysis must be the same")
#     }
#
#     # DA methods comparison only support for taxa_rank = "none", since
#     #  the abundance table is spiked in the lowest level
#     # full_paras <- lapply(calls, \(x) formals(match.fun(x[[1]])))
#     taxa_ranks <- lapply(calls, `[[`, "taxa_rank")
#     if (sum(duplicated(taxa_ranks)) != (length(group) - 1)) {
#         stop("`taxa_rank` objects in all DA analysis must be the same")
#     }
#     if (!is.null(taxa_ranks[[1]]) && taxa_ranks[[1]] != "none") {
#         stop("since the abundance table is spiked in the lowest level, ",
#              "`taxa_rank` must be 'none'",
#              call. = FALSE)
#     }
#
#      if (verbose) {
#         message("Comparing differential methods may take a long time")
#         message("Running on ", n_core, " cores")
#     }
#
#     # differential analysis functions
#     funs <- lapply(calls, `[[`, 1)
#     funs_chr <- vapply(funs, as.character, FUN.VALUE = character(1))
#
#     ps <- eval(pss[[1]], envir = parent.frame())
#     count_tab <- otu_table(ps)
#     features <- rownames(count_tab)
#
#     # spike in differential features
#     if (is.null(k)) {
#         k <- rep(round(nrow(count_tab) * 0.02), 3)
#         if (sum(k) < 15) {
#             k <- rep(5, 3)
#         }
#     }
#     n_feature <- nrow(count_tab)
#     if (sum(k) == n_feature) {
#         stop("Set to spike all features, can't calculate FDR or AUC",
#              call. = FALSE)
#     }
#     if (sum(k) > n_feature) {
#         stop("Set to spike more features than are present in the data",
#              call. = FALSE)
#     }
#     if (sum(k) < 15 && sum(k) >= 10 && n_rep <= 10) {
#         warning("Few features are spiked, increase `k` or set `n_rep` to ",
#                 "more than 10 to ensure proper estimation of AUC and FPR",
#                 call. = FALSE)
#     }
#     if (sum(k) < 10 && sum(k) >= 5 && n_rep <= 20) {
#         warning("Few features are spiked, increase `k` or set `n_rep` to ",
#                 "more than 20 to ensure proper estimation of AUC and FPR",
#                 call. = FALSE)
#     }
#     if (sum(k) < 5 && n_rep <= 50) {
#         warning("Very few features are spiked, increase `k` or set `n_rep` to ",
#                 "more than 50 to ensure proper estimation of AUC and FPR",
#                 call. = FALSE)
#     }
#     if (sum(k) > n_feature/2) {
#         warning("Set to spike more than half of the features, ",
#                 "which might give unreliable estimates")
#     }
#
#     if (verbose) cat("Spiking...\n")
#     # shuffle predictor
#     predictor <- sample_data(ps)[[group[[1]]]]
#     rands <- lapply(seq_len(n_rep), \(x) sample(predictor))
#
#     # spikeins
#     spikeds <- lapply(rands,
#                       \(x) spikein(count_tab, x, effect_size, k, relative))
#     count_tabs <- lapply(spikeds, `[[`, 1)
#
#     if (verbose) {
#         cat(paste("Testing", length(exp_chrs),
#                   "methods", n_rep, "times each ...\n"))
#     }
#
#     # progress bar
#     # da_par <- paste(rep(seq_len(n_rep), each = 2),
#     #                funs_chr, sep = "-")
#     cmds <- rep(exp_chrs, n_rep)
#     run_no <- rep(seq_len(n_rep), each = length(exp_chrs))
#     pb <- utils::txtProgressBar(max = length(cmds), style = 3)
#     progress <- function(n) setTxtProgressBar(pb, n)
#     opts <- list(progress = progress)
#
#     # config parallel
#     if (n_core == 1) {
#         foreach::registerDoSEQ()
#     } else {
#         cl <- parallel::makeCluster(n_core)
#         doSNOW::registerDoSNOW(cl)
#         on.exit(parallel::stopCluster(cl))
#     }
#
#     # run the DA analysis in parallel
#     res <- foreach::foreach(exp_chr = cmds, i = run_no,
#                             .export = c("otu_table", "otu_table<-", funs_chr),
#                             .options.snow = opts) %dopar% {
#         t1_sub <- proc.time()
#         # construct new ps with spiked feature abundance table
#         new_count_tab <- count_tabs[[i]]
#         otu_table(ps) <- otu_table(new_count_tab, taxa_are_rows = TRUE)
#         res_sub <- eval(str2expression(exp_chr),
#                         list(ps = ps),
#                         enclos = parent.frame())
#
#         run_time_sub <- (proc.time() - t1_sub)[3]
#         return(list(res_sub = res_sub, run_time_sub = run_time_sub))
#     }
#     run_times <- lapply(res, `[[`, "run_time_sub")
#     da_res <- lapply(res, `[[`, "res_sub")
#
#     n_da <- length(exp_chrs)
#     r <- NULL
#     final_res <- foreach::foreach(r = seq_len(n_rep)) %do% {
#         da_sub <- da_res[(1 + (r - 1) * n_da):(r * n_da)]
#         curr_cmds <- cmds[(1 + (r - 1) * n_da):(r * n_da)]
#         curr_spiked_features <- spikeds[[r]][[2]]
#         # insert spiked column
#         rsp <- NULL
#         da_sub <- foreach::foreach(rsp = seq_along(da_sub)) %do% {
#             tmp <- da_sub[[rsp]]
#             tmp_marker <- data.frame(marker_table(tmp))
#
#             # pseudo p-value of ancom
#             if (grepl("ancom(", curr_cmds[rsp], fixed = TRUE)) {
#                 w <- tmp_marker$W
#                 cf <- 0.05 * min(w)
#                 tmp_marker$pvalue <- (1 / w) * cf
#             }
#
#             tmp_spiked <- rep("no", nrow(tmp_marker))
#             tmp_spiked[tmp_marker$feature %in% curr_spiked_features] <- "yes"
#             tmp_marker$spiked <- tmp_spiked
#             return(tmp_marker)
#         }
#
#         # confusion matrix
#         n_pos <- vapply(da_sub, nrow, FUN.VALUE = integer(1))
#         neg_feature <- lapply(da_sub, \(x) setdiff(features, x$feature))
#         n_neg <- n_feature - n_pos # vapply(neg_feature, length, integer(1))
#         # for effect_size = 1
#         true_neg <- n_neg
#         true_pos <- 0
#         false_neg <- 0
#         # for effect_size != 1
#         if (effect_size != 1) {
#             true_pos <- vapply(da_sub,
#                                \(x) sum(x$spiked == "yes"),
#                                FUN.VALUE = integer(1))
#             false_neg <- vapply(neg_feature,
#                                \(x) sum(x %in% curr_spiked_features),
#                                FUN.VALUE = integer(1))
#         }
#         false_pos <- n_pos - true_pos
#         true_neg <- n_neg - false_neg
#
#         # FPR: false positive rate
#         fprs <- vapply(seq_along(da_sub),
#                        \(x) ifelse((false_pos[x] + true_neg[x]) != 0,
#                                    false_pos[x] / (false_pos[x] + true_neg[x]),
#                                    0),
#                        FUN.VALUE = numeric(1))
#
#         # sdr: spike detection rate
#         sdrs <- vapply(true_pos, \(x) x / sum(k), FUN.VALUE = numeric(1))
#
#         # auc
#         aucs <- vapply(da_sub, \(x) {
#             if (effect_size != 1) {
#                 test_roc <- NULL
#                 spiked_idx <- as.numeric(x$feature %in% curr_spiked_features)
#                 tryCatch(
#                     test_roc <- pROC::roc(spiked_idx ~ x$pvalue,
#                                           auc = TRUE,
#                                           direction = ">",
#                                           quiet = TRUE),
#                     error = function(e) NULL)
#                     res <- ifelse(is.null(test_roc),
#                                   0.5,
#                                   as.numeric(test_roc$auc))
#             } else {
#                 res <- 0.5
#             }
#
#             res
#         }, FUN.VALUE = numeric(1))
#
#         # fdrs
#         fdrs <- vapply(seq_along(da_sub),
#                        \(x) ifelse(n_pos[x] != 0, false_pos[x] / n_pos[x], 0),
#                        FUN.VALUE = numeric(1))
#
#         df_combine <- data.frame(call = curr_cmds,
#                                  AUC = aucs,
#                                  FPR = fprs,
#                                  FDR = fdrs,
#                                  Power = sdrs,
#                                  run = r)
#         rownames(df_combine) <- NULL
#
#         return(list(df_combine, da_sub))
#     }
#
#     out_res <- do.call(rbind, lapply(final_res, `[[`, 1))
#     out_res_marker <- lapply(final_res, `[[`, 2)
#
#     # running time
#     run_secs <- (proc.time() - t_start)[3]
#     if ((run_secs)/60/60 > 1) {
#         run_time <- paste(round((run_secs)/60/60, 2), "Hours")
#     } else {
#         run_time <- paste(round((run_secs)/60,2),"Minutes")
#     }
#
#     out_detail <- data.frame(n_feature = nrow(count_tab),
#                              n_sample = ncol(count_tab),
#                              run_time = run_time,
#                              effect_size = effect_size,
#                              spiked = paste0(c("Low:","Mid:","High:"), k,
#                                              collapse = ", "))
#     out_detail <- as.data.frame(t(out_detail))
#     names(out_detail) <- NULL
#
#     # run times
#     run_times <- data.frame(DA = cmds,
#                             minutes = round(unlist(run_times) / 60, 4))
#
#     out <- list(res = out_res,
#                 marker = out_res_marker,
#                 detail = out_detail,
#                 run_time = run_times)
#
#     out
# }


# spike in features
spikein <- function(count_tab,
                    predictor,
                    effect_size = 2,
                    k,
                    relative = TRUE) {
    if (effect_size < 0) {
        stop("Effect size should be positive")
    }

    spike_method <- ifelse(effect_size == 1, "none", "mult")
    if (is.null(rownames(count_tab))) {
        rownames(count_tab) <- seq_len(nrow(count_tab))
    }
    count_tab <- as.data.frame(count_tab)
    predictor <- as.numeric(as.factor(predictor)) - 1

    # Choose Features to spike
    propcount <- sweep(count_tab, 2, colSums(count_tab), "/")
    # propcount <- apply(count_tab, 2, function(x) x/sum(x))
    count_abundances <- sort(rowSums(propcount)/ncol(propcount))

    # Only spike Features present in cases (except if predictor is numeric)
    case_count_tab <- count_tab[
        rowSums(count_tab[, predictor == 1]) > 0, predictor == 1]
    approved_count_abundances <- count_abundances[
        names(count_abundances) %in% row.names(case_count_tab)]

    # Which to spike in each tertile
    lower_tert <- names(approved_count_abundances[
        approved_count_abundances < quantile(approved_count_abundances,1/3)])
    mid_tert <- names(approved_count_abundances[
      approved_count_abundances >= quantile(approved_count_abundances,1/3) &
          approved_count_abundances < quantile(approved_count_abundances,2/3)])
    upper_tert <- names(approved_count_abundances[
        approved_count_abundances >= quantile(approved_count_abundances,2/3)])

    spike_features <- c(sample(lower_tert, k[1]),
                        sample(mid_tert, k[2]),
                        sample(upper_tert,k[3]))
    spike_feature_index <- which(row.names(count_tab) %in% spike_features)

    # Spike Features by multiplication
    old_sums <- colSums(count_tab)

    if (spike_method == "mult"){
        count_tab[spike_feature_index, predictor==1] <-
            count_tab[spike_feature_index, predictor==1] * effect_size
    }

    # Rescale to original sample sums
    new_sums <- colSums(count_tab)
    if (relative) {
        count_tab <- round(sweep(count_tab, 2, old_sums/new_sums, "*"))
    }

    list(count_tab, spike_features)
}
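
# Illustration only (not part of the package): a minimal sketch of the
# multiplicative spike-in followed by rescaling to the original sample sums,
# as done in spikein() when `relative = TRUE`. The toy matrix, the names
# `counts`, `is_case`, `spiked` and `rescaled`, and the choice of "f1" as the
# spiked feature are assumptions; spikein() itself samples features at random
# from abundance tertiles.

```r
# Three features by four samples; every sample sums to 100.
counts <- matrix(
    c(10, 20, 30, 40,
      5,  5,  5,  5,
      85, 75, 65, 55),
    nrow = 3, byrow = TRUE,
    dimnames = list(c("f1", "f2", "f3"), paste0("s", 1:4))
)
is_case <- c(FALSE, FALSE, TRUE, TRUE)
effect_size <- 2

old_sums <- colSums(counts)

# Multiply the spiked feature in case samples only.
spiked <- counts
spiked["f1", is_case] <- spiked["f1", is_case] * effect_size

# Rescale each sample back to its original library size, then round.
new_sums <- colSums(spiked)
rescaled <- round(sweep(spiked, 2, old_sums / new_sums, "*"))
print(rescaled)
```

# Rescaling keeps library sizes comparable, so the spike shows up as a change
# in relative abundance rather than as a change in sequencing depth.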


# from pryr: Standardise a function call
standardise_call <- function(call, env = parent.frame()) {
    stopifnot(is.call(call))
    f <- eval(call[[1]], env)
    if (is.primitive(f)) {
        return(call)
    }

    return(match.call(f, call))
}
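
# Illustration only (not part of the package): standardise_call() relies on
# match.call() to turn positional arguments into named ones, which is what
# allows arguments such as `ps` or `group` to be looked up by name. The
# function `f` and the symbol `my_ps` below are hypothetical.

```r
# A stand-in for a DA function with the usual leading arguments.
f <- function(ps, group, taxa_rank = "all") NULL

# An unevaluated call that passes its arguments positionally.
cl <- quote(f(my_ps, "Enterotype"))

# match.call() names the supplied arguments; it does not fill in defaults.
std <- match.call(f, cl)
print(std)
```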

# To let compare_DA() run a given DA method with several argument sets,
# `args` may be a list of list of list,
# e.g. args = list(lefse = list(list(norm = "CPM"), list(norm = "TSS")))
# compares different `norm` arguments for the lefse analysis. So we need to
# flatten `args` and extend the methods for DA analysis accordingly:
# methods = c("lefse_1", "lefse_2"),
# args = list(lefse_1 = list(norm = "CPM"), lefse_2 = list(norm = "TSS"))
#
# For a method with no args provided, set it to list(), e.g. list(ancom = list()).
generate_compare_args <- function(methods, args) {
    # check args
    args_nms <- names(args)
    if (length(args)) {
        out_args <- setdiff(args_nms, methods)
        if (length(out_args)) {
            stop("names of `args` must be contained in `methods`.\n",
                paste(out_args, collapse = ", "), " in names of `args` ",
                "does not match DA methods",
                call. = FALSE)
        }
    }

    # create args list for each method
    method_no_args <- setdiff(methods, args_nms)
    for (i in seq_along(method_no_args)) {
        args[[method_no_args[i]]] <- list()
    }

    new_args <- list()
    n_arg <- vector("integer", length(args))
    for (i in seq_along(args)) {
        curr_arg <- args[i]
        if (purrr::pluck_depth(curr_arg) > 4) {
            stop("`args` must be a 'list of list', or a ",
                 "'list of list of list' to support different arguments ",
                 "for a certain DA method")
        }
        if (purrr::pluck_depth(curr_arg) == 4) {
            curr_arg <- unlist(curr_arg, recursive = FALSE)
            names(curr_arg) <- paste(names(args)[i],
                                     seq_along(curr_arg),
                                     sep = "_")

        }
        new_args <- c(new_args, curr_arg)
        n_arg[i] <- length(curr_arg)
    }
    methods <- rep(methods, times = n_arg)
    methods_suffix <- lapply(n_arg, \(x) {
        if (x > 1) {
            as.character(paste0("_", seq_len(x)))
        } else {
            ""
        }
    }) |> unlist()
    methods <- paste(methods, methods_suffix, sep = "")

    return(list(methods = methods, args = new_args))
}
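
# Illustration only (not part of the package): a base R sketch of the
# flattening idea behind generate_compare_args(). The helper name
# `flatten_method_args` is hypothetical, and it detects "several argument
# sets" by checking that every element is itself a list, which is a
# simplification of the depth check done with purrr::pluck_depth() above.

```r
flatten_method_args <- function(args) {
    methods <- character(0)
    flat <- list()
    for (nm in names(args)) {
        arg_sets <- args[[nm]]
        # A method maps either to one argument list or to a list of them.
        # Caveat: a single argument list whose values are all lists would be
        # misread as several argument sets by this simplified check.
        is_multi <- length(arg_sets) > 0 &&
            all(vapply(arg_sets, is.list, logical(1)))
        if (!is_multi) {
            arg_sets <- list(arg_sets)
        }
        n <- length(arg_sets)
        # Suffix the method name only when it is run more than once.
        suffix <- if (n > 1) paste0("_", seq_len(n)) else ""
        labels <- paste0(nm, suffix)
        names(arg_sets) <- labels
        methods <- c(methods, labels)
        flat <- c(flat, arg_sets)
    }
    list(methods = methods, args = flat)
}

res <- flatten_method_args(list(
    lefse = list(list(norm = "CPM"), list(norm = "TSS")),
    ancom = list()
))
print(res$methods)
```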


================================================
FILE: R/DA-deseq2.R
================================================
# In the vignette of DESeq2:
# The values in the matrix should be un-normalized counts or estimated counts
# of sequencing reads (for single-end RNA-seq) or fragments (for paired-end
# RNA-seq). The RNA-seq workflow describes multiple techniques for preparing
# such count matrices. It is important to provide count matrices as input for
# DESeq2’s statistical model (Love, Huber, and Anders 2014) to hold, as only
# the count values allow assessing the measurement precision correctly. The
# DESeq2 model internally corrects for library size, so transformed or
# normalized values such as counts scaled by library size should not be used
# as input.
#
# DESeq2 contrast: https://github.com/tavareshugo/tutorial_DESeq2_contrasts
#
# reference source code:
# from biocore/qiime/blob/master/qiime/support_files/R/DESeq2_nbinom.r
# https://github.com/hbctraining/DGE_workshop/blob/master/schedule/1.5-day.md
#
#
## p value and logFC from LRT
# From https://hbctraining.github.io/DGE_workshop/lessons/08_DGE_LRT.html
# https://support.bioconductor.org/p/133804/#133856
# By default the Wald test is used to generate the results table, but DESeq2
# also offers the LRT which is used to identify any genes that show change in
# expression across the different levels. The LRT is comparing the full model
# to the reduced model to identify significant genes. The p-values are
# determined solely by the difference in deviance between the "full" and
# "reduced" model formula (not log2 fold changes).
#
# The log2 fold changes in the LRT results are calculated using the Wald test
# (two groups comparison).
#
#
#
#' Perform DESeq differential analysis
#'
#' Differential expression analysis based on the Negative Binomial distribution
#' using **DESeq2**.
#'
#' @param ps  a [`phyloseq::phyloseq-class`] object.
#' @param group character, the variable used to set the groups; must be one of
#'   the variables in the sample metadata.
#' @param confounders character vector, the confounding variables to be adjusted.
#'   default `character(0)`, indicating no confounding variable.
#' @param contrast this parameter is only used for a two groups comparison
#'   when there are multiple groups. For more, please see the details below.
#' @param taxa_rank character to specify taxonomic rank to perform
#'   differential analysis on. Should be one of
#'   `phyloseq::rank_names(phyloseq)`, or "all" means to summarize the taxa by
#'   the top taxa ranks (`summarize_taxa(ps, level = rank_names(ps)[1])`), or
#'   "none" means perform differential analysis on the original taxa
#'   (`taxa_names(phyloseq)`, e.g., OTU or ASV).
#' @param transform character, the methods used to transform the microbial
#'   abundance. See [`transform_abundances()`] for more details. The
#'   options include:
#'   * "identity", return the original data without any transformation
#'     (default).
#'   * "log10", the transformation is `log10(object)`, and if the data contains
#'     zeros the transformation is `log10(1 + object)`.
#'   * "log10p", the transformation is `log10(1 + object)`.
#' @param norm the methods used to normalize the microbial abundance data. See
#'   [`normalize()`] for more details.
#'   Options include:
#'   * "none": do not normalize.
#'   * "rarefy": random subsampling counts to the smallest library size in the
#'     data set.
#'   * "TMM": trimmed mean of m-values. First, a sample is chosen as reference.
#'     The scaling factor is then derived using a weighted trimmed mean over the
#'     differences of the log-transformed gene-count fold-change between the
#'     sample and the reference.
#'   * "RLE", relative log expression, RLE uses a pseudo-reference calculated
#'     using the geometric mean of the gene-specific abundances over all
#'     samples. The scaling factors are then calculated as the median of the
#'     gene counts ratios between the samples and the reference.
#'   * "CSS": cumulative sum scaling, calculates scaling factors as the
#'     cumulative sum of gene abundances up to a data-derived threshold.
#' @param norm_para arguments passed to specific normalization methods. Most
#'   users will not need to pass any additional arguments here.
#' @param fitType,sfType,betaPrior,modelMatrixType,useT,minmu these six
#'   parameters are inherited from [`DESeq2::DESeq()`].
#'   - `fitType`, either "parametric", "local", "mean", or "glmGamPoi" for the
#'     type of fitting of dispersions to the mean intensity.
#'   - `sfType`, either "ratio", "poscounts", or "iterate" for the type of size
#'     factor estimation. We recommend to use "poscounts".
#'   - `betaPrior`, whether or not to put a zero-mean normal prior on the
#'     non-intercept coefficients.
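#'   - `modelMatrixType`, either "standard" or "expanded", which describes how
#'     the model matrix of the GLM formula is formed.
#'   - `useT`, logical, whether to use a t-distribution as the null
#'     distribution of the Wald statistics; with the default `FALSE` they are
#'     assumed to follow a standard Normal.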
#'   - `minmu`, lower bound on the estimated count for fitting gene-wise
#'     dispersion.
#'
#'   For more details, see [`DESeq2::DESeq()`]. Most users will not need to
#'   set these arguments (just use the defaults).
#'
#' @param p_adjust method for multiple test correction, default `none`, for
#'   more details see [stats::p.adjust].
#' @param pvalue_cutoff numeric, p value cutoff, default 0.05.
#' @param ... extra parameters passed to [`DESeq2::DESeq()`].
#'
#' @details
#' **Note**: DESeq2 requires raw (un-normalized) counts as input, as only
#' count values allow assessing the measurement precision correctly.
#' For more details see the vignette of DESeq2 (`vignette("DESeq2")`).
#'
#' Thus, this function only supports "none", "rarefy", "RLE", "CSS", and
#' "TMM" normalization methods. We strongly recommend using the "RLE" method
#' (default normalization method in the DESeq2 package). The other
#' normalization methods are intended for expert users and for comparisons
#' among different normalization methods.
#'
#' For two groups comparison, this function utilizes the Wald test (defined by
#' [`DESeq2::nbinomWaldTest()`]) for hypothesis testing. A Wald test statistic
#' is computed along with the probability (p-value) of observing a test
#' statistic at least as extreme as the observed value under the null
#' hypothesis. `contrast` is used to specify which two groups to compare. The
#' order of the names determines the direction of fold change that is reported.
#'
#' A likelihood ratio test (LRT) is used to identify the features that changed
#' significantly across all the different levels for multiple groups
#' comparisons. The LRT identifies the significant features by comparing the
#' full model to the reduced model. It tests whether the term removed in the
#' reduced model explains a significant amount of variation in the data.
#'
#' `contrast` must be a two length character or `NULL` (default). It is only
#' required to set manually for two groups comparison when there are multiple
#' groups. The order determines the direction of comparison, the first element
#' is used to specify the reference group (control). This means that the first
#' element is the denominator for the fold change, and the second element is
#' the numerator. Otherwise, users do not need to set this parameter (leave it
#' as the default `NULL`): if there are two groups, the first level of groups
#' will be set as the reference group; if there are multiple groups, an
#' ANOVA-like test is performed to find markers that differ among any of the
#' groups.
#'
#' @export
#' @return a [`microbiomeMarker-class`] object.
#' @seealso [`DESeq2::results()`],[`DESeq2::DESeq()`]
#' @importFrom stats formula coef
#' @importFrom DESeq2 dispersions<-
#' @importMethodsFrom S4Vectors mcols
#' @importMethodsFrom BiocGenerics sizeFactors<- counts
#' @references
#' Love, Michael I., Wolfgang Huber, and Simon Anders. "Moderated estimation
#' of fold change and dispersion for RNA-seq data with DESeq2." Genome
#' biology 15.12 (2014): 1-21.
#' @examples
#' data(enterotypes_arumugam)
#' ps <- phyloseq::subset_samples(
#'     enterotypes_arumugam,
#'     Enterotype %in% c("Enterotype 3", "Enterotype 2")) %>%
#'     phyloseq::subset_taxa(Phylum %in% c("Firmicutes"))
#' run_deseq2(ps, group = "Enterotype")
run_deseq2 <- function(ps,
    group,
    confounders = character(0),
    contrast = NULL,
    taxa_rank = "all",
    norm = "RLE",
    norm_para = list(),
    transform = c("identity", "log10", "log10p"),
    # test = c("Wald", "LRT"),
    fitType = c("parametric", "local", "mean", "glmGamPoi"),
    sfType = "poscounts",
    betaPrior = FALSE,
    modelMatrixType,
    useT = FALSE,
    minmu = ifelse(fitType == "glmGamPoi", 1e-06, 0.5),
    p_adjust = c(
        "none", "fdr", "bonferroni", "holm",
        "hochberg", "hommel", "BH", "BY"
    ),
    pvalue_cutoff = 0.05,
    ...) {
    ps <- check_rank_names(ps) %>%
        check_taxa_rank(taxa_rank)
    
    norm_methods <- c("none", "rarefy", "RLE", "CSS", "TMM")
    if (!norm %in% norm_methods) {
        stop(
            "`norm` must be one of 'none', 'rarefy', 'RLE', 'CSS', or 'TMM'",
            call. = FALSE
        )
    }
    
    if (length(confounders)) {
        confounders <- check_confounder(ps, group, confounders)
    }

    # groups
    meta <- sample_data(ps)
    groups <- meta[[group]]
    groups <- make.names(groups)
    if (!is.null(contrast)) {
        contrast <- make.names(contrast)
    }
    if (!is.factor(groups)) {
        groups <- factor(groups)
    }
    groups <- set_lvl(groups, contrast)
    sample_data(ps)[[group]] <- groups
    lvl <- levels(groups)
    n_lvl <- length(lvl)

    if (n_lvl < 2) {
        stop("Differential analysis requires at least two groups.")
    }

    # contrast, test method, name of effect size
    if (n_lvl == 2) { # two groups
        if (!is.null(contrast)) {
            warning(
                "`contrast` is ignored, you do not need to set it",
                call. = FALSE
            )
        }
        contrast_new <- c(group, lvl[2], lvl[1])
    } else {
        if (!is.null(contrast)) {
            if (!is.character(contrast) || length(contrast) != 2) {
                stop("`contrast` must be a two length character", call. = FALSE)
            }
            idx <- match(contrast, lvl, nomatch = 0L)
            if (!all(idx)) {
                stop(
                    "all elements of `contrast` must be contained in `groups`",
                    call. = FALSE
                )
            }
            contrast_new <- c(group, contrast[2], contrast[1])
        }
    }
    test <- ifelse(n_lvl > 2 && is.null(contrast), "LRT", "Wald")
    ef_name <- ifelse(test == "Wald", "logFC", "F")

    fitType <- match.arg(fitType, c("parametric", "local", "mean", "glmGamPoi"))
    transform <- match.arg(transform, c("identity", "log10", "log10p"))
    p_adjust <- match.arg(
        p_adjust,
        c(
            "none", "fdr", "bonferroni", "holm",
            "hochberg", "hommel", "BH", "BY"
        )
    )

    if (!sfType %in% c("ratio", "poscounts", "iterate")) {
        stop("`sfType` must be one of poscounts, ratio, or iterate")
    }

    # preprocess phyloseq object
    ps <- preprocess_ps(ps)
    ps <- transform_abundances(ps, transform = transform)

    # prenormalize the data
    norm_para <- c(norm_para, method = norm, object = list(ps))
    ps_normed <- do.call(normalize, norm_para)
    ps_summarized <- pre_ps_taxa_rank(ps_normed, taxa_rank)
    
    if (!length(confounders)) {
        dsg <- formula(paste("~", group))
    } else {
        dsg <- formula(paste(
            "~", 
            paste(c(confounders, group), collapse = " + ")
        ))
    }
    
    dds_summarized <- phyloseq2DESeq2(
        ps_summarized,
        design = dsg
    )
    nf <- get_norm_factors(ps_normed)
    if (!is.null(nf)) {
        sizeFactors(dds_summarized) <- nf
    }


    # error: all gene-wise dispersion estimates are within 2 orders of magnitude
    # from the minimum value, which indicates that the counts are not
    # overdispersed
    #
    # If dispersion values are less than 1e-6  (minimal value is 1e-8),
    # it would be problematic to fit a dispersion trend in DESeq2.
    # The reason for a minimal value, is that for a given row of the count 
    # matrix, the maximum likelihood estimate can tend to 0 (and so we have a 
    # rule to stop after 1e-8)
    # https://support.bioconductor.org/p/63845/
    # https://support.bioconductor.org/p/122757/
    # from biocore/qiime/blob/master/qiime/support_files/R/DESeq2_nbinom.r

    # LRT is used to analyze all levels of a factor at once, and the p values
    # are determined solely by the difference in deviance between the "full"
    # and "reduced" model formula (not log2 fold changes). Only the Wald
    # method is used for pair-wise comparison. Thus, for pair-wise comparison,
    # we use the Wald test. Moreover, you can set the argument `test` in
    # `results()` when extracting the results from LRT, and it is equivalent
    # to the Wald test.
    #
    # However, even though there are fold changes present in the results of
    # LRT, they are not directly associated with the actual hypothesis test
    # (they are actually determined by the arguments `name` or `contrast`).
    if (test == "Wald") { # two groups comparison
        res_deseq <- try(
            DESeq2::DESeq(
                dds_summarized,
                test = test,
                fitType = fitType,
                sfType = sfType,
                quiet = TRUE,
                betaPrior = betaPrior,
                modelMatrixType = modelMatrixType,
                useT = useT,
                minmu = minmu,
                ...
            ),
            silent = TRUE
        )

        if (inherits(res_deseq, "try-error") && fitType != "local") {
            warning("data is not overdispersed, try `fitType = 'local'`")
            res_deseq <- try(
                DESeq2::DESeq(
                    dds_summarized,
                    test = test,
                    fitType = "local",
                    sfType = sfType,
                    quiet = TRUE,
                    betaPrior = betaPrior,
                    modelMatrixType = modelMatrixType,
                    useT = useT,
                    minmu = minmu,
                    ...
                ),
                silent = TRUE
            )
        }

        if (inherits(res_deseq, "try-error") && fitType != "mean") {
            warning("data is not overdispersed, try `fitType = 'mean'`")
            res_deseq <- try(
                DESeq2::DESeq(
                    dds_summarized,
                    test = test,
                    fitType = "mean",
                    sfType = sfType,
                    quiet = TRUE,
                    betaPrior = betaPrior,
                    modelMatrixType = modelMatrixType,
                    useT = useT,
                    minmu = minmu,
                    ...
                ),
                silent = TRUE
            )
        }
        if (inherits(res_deseq, "try-error")) {
            warning(
                "data is not overdispersed, use gene-wise estimates ",
                "as final estimates"
            )
            dds_summarized <- DESeq2::estimateDispersionsGeneEst(dds_summarized)
            DESeq2::dispersions(dds_summarized) <- 
                mcols(dds_summarized)$dispGeneEst

            dds_summarized <- DESeq2::nbinomWaldTest(
                dds_summarized,
                betaPrior = betaPrior,
                quiet = TRUE,
                modelMatrixType = modelMatrixType,
                useT = useT,
                minmu = minmu
            )
        } else {
            dds_summarized <- res_deseq
        }

        res <- DESeq2::results(
            object = dds_summarized,
            contrast = contrast_new,
            pAdjustMethod = p_adjust
        )
        # rename log2FoldChange to logFC, use base R rather than dplyr::rename
        names(res)[names(res) == "log2FoldChange"] <- "logFC"
    } else {
        dds_summarized <- DESeq2::DESeq(
            dds_summarized,
            test = test,
            fitType = fitType,
            sfType = sfType,
            quiet = TRUE,
            minmu = minmu,
            reduced = ~1,
            ...
        )
        res <- DESeq2::results(
            object = dds_summarized,
            pAdjustMethod = p_adjust
        )
    }

    # Why are some p values NA?
    # By default, `DESeq2::results()` performs independent filtering to select
    # a set of genes for multiple test correction that maximizes the number of
    # adjusted p values below a given critical value alpha (0.1 by default).
    # The adjusted p values of genes that do not pass the filter threshold
    # are set to NA.
    # In addition, `results()` by default assigns a p value of NA to genes
    # containing count outliers, as identified using Cook's distance.
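    #
    # A hedged sketch, not used by this function: both default behaviours
    # can be switched off through arguments of `DESeq2::results()` if the NA
    # values are unwanted (whether that is appropriate depends on the data):
    #   DESeq2::results(
    #       dds_summarized,
    #       independentFiltering = FALSE, # skip independent filtering
    #       cooksCutoff = FALSE           # skip Cook's distance flagging
    #   )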

    # normalized counts
    counts_normalized <- DESeq2::counts(dds_summarized, normalized = TRUE)

    # one way anova f statistic for LRT
    if (test == "LRT") {
        temp_count <- data.frame(t(counts_normalized))
        f_stat <- vapply(
            temp_count,
            calc_ef_md_f,
            FUN.VALUE = 0.0,
            group = groups
        )
        res[["F"]] <- f_stat
    }

    res <- data.frame(res)
    # enrich group
    if (test == "Wald") {
        enrich_group <- ifelse(res$logFC > 0, contrast_new[2], contrast_new[3])
    } else {
        cf <- coef(dds_summarized)
        
        # extract the coefficients of the variable of interest
        target_idx <- grepl(group, colnames(cf))
        cf <- cf[, target_idx]
        # the reference group is absorbed into the intercept, so prepend a
        # column of 0 as its coefficient (the first column)
        cf <- cbind(0, cf)
        enrich_idx <- apply(
            cf, 1,
            function(x) ifelse(any(is.na(x)), NA, which.max(x))
        )
        enrich_group <- lvl[enrich_idx]
        enrich_group <- enrich_group[match(row.names(res), row.names(cf))]
    }
    res$enrich_group <- enrich_group
    # order according to padj
    res_ordered <- res[order(res$padj), ]
    # filter sig feature
    padj <- res_ordered$padj
    res_ordered <- cbind(feature = row.names(res_ordered), res_ordered)
    # reorder columns: feature, enrich_group, other columns
    other_col <- setdiff(names(res_ordered), c("feature", "enrich_group"))
    res_ordered <- res_ordered[, c("feature", "enrich_group", other_col)]
    # row names in the form of marker*
    row.names(res_ordered) <- paste0("marker", seq_len(nrow(res_ordered)))

    # only keep five variables: feature, enrich_group, effect_size (logFC),
    # pvalue, and padj
    keep_var <- c("feature", "enrich_group", ef_name, "pvalue", "padj")
    res_ordered <- res_ordered[keep_var]
    names(res_ordered)[3] <- paste0("ef_", ef_name)
    sig_res <-  res_ordered[!is.na(padj) & padj < pvalue_cutoff, ]
    marker <- return_marker(sig_res, res_ordered)

    marker <- microbiomeMarker(
        marker_table = marker,
        # if no pre-calculated size factors, DESeq2 will calculate the
        # size factors internally, so the norm method should be RLE
        norm_method = ifelse(is.null(nf), "RLE", get_norm_method(norm)),
        diff_method = paste0("DESeq2: ", test),
        sam_data = sample_data(ps_normed),
        tax_table = tax_table(ps_summarized),
        otu_table = otu_table(counts_normalized, taxa_are_rows = TRUE)
    )

    marker
}

#' Convert `phyloseq-class` object to `DESeqDataSet-class` object
#'
#' This function converts a [`phyloseq::phyloseq-class`] object to a
#' [`DESeq2::DESeqDataSet-class`] object, which can then be tested using
#' [`DESeq2::DESeq()`].
#'
#' @param ps the [`phyloseq::phyloseq-class`] object to convert, which must
#'   have a [`phyloseq::sample_data()`] component.
#' @param design a `formula` or `matrix`. The formula expresses how the counts
#'   for each gene depend on the variables in colData. Many R formulas are
#'   valid, including designs with multiple variables, e.g.,
#'   `~ group + condition`. This argument is passed to
#'   [`DESeq2::DESeqDataSetFromMatrix()`].
#' @param ... additional arguments passed to
#'   [`DESeq2::DESeqDataSetFromMatrix()`]. Most users will not need to pass any
#'   additional arguments here.
#' @export
#' @return a [`DESeq2::DESeqDataSet-class`] object.
#' @seealso [`DESeq2::DESeqDataSetFromMatrix()`],[`DESeq2::DESeq()`]
#' @examples
#' data(caporaso)
#' phyloseq2DESeq2(caporaso, ~SampleType)
phyloseq2DESeq2 <- function(ps, design, ...) {
    stopifnot(inherits(ps, "phyloseq"))
    ps <- keep_taxa_in_rows(ps)

    # sample data
    samp <- sample_data(ps, errorIfNULL = FALSE)
    if (is.null(samp)) {
        stop(
            "`sample_data` of `ps` is required,",
            " for specifying experimental design.",
            call. = FALSE
        )
    }
    # count data
    ct <- as(otu_table(ps), "matrix")

    # DESeq2 requires raw counts, which means the counts must be integers
    dds <- tryCatch(
        DESeq2::DESeqDataSetFromMatrix(
            countData = ct,
            colData = data.frame(samp),
            design = design,
            ...
        ),
        error = function(e) e
    )
    if (inherits(dds, "error") &&
        conditionMessage(dds) == "some values in assay are not integers") {
        warning(
            "Some counts are non-integers, they are rounded to integers.\n",
            "Raw count is recommended for reliable results for deseq2 method.",
            call. = FALSE
        )
        dds <- DESeq2::DESeqDataSetFromMatrix(
            countData = round(ct),
            colData = data.frame(samp),
            design = design,
            ...
        )
    }


    dds
}

# Modified from `DESeq2::estimateSizeFactorsForMatrix()`.
#
# For `estimateSizeFactors()`,
# `sizeFactors(estimateSizeFactors(dds, type = "poscounts"))` is identical to
# `sizeFactors(estimateSizeFactors(dds, geoMeans = geoMeans))`.
#
# However, the original `DESeq2::estimateSizeFactorsForMatrix()` does not
# stabilize the size factors to have a geometric mean of 1 when
# `type = "poscounts"`. This modified version also stabilizes the size factors
# in that case, so that
# `estimateSizeFactorsForMatrix(counts(diagdds2), geoMeans = geoMeans)` equals
# `estimateSizeFactorsForMatrix(counts(diagdds2), type = "poscounts")`.
estimateSizeFactorsForMatrix <- function(counts,
    locfunc = stats::median,
    geoMeans,
    controlGenes,
    type = c("ratio", "poscounts")) {
    type <- match.arg(type, c("ratio", "poscounts"))
    if (missing(geoMeans)) {
        incomingGeoMeans <- FALSE
        if (type == "ratio") {
            loggeomeans <- rowMeans(log(counts))
        } else if (type == "poscounts") {
            lc <- log(counts)
            lc[!is.finite(lc)] <- 0
            loggeomeans <- rowMeans(lc)
            allZero <- rowSums(counts) == 0
            loggeomeans[allZero] <- -Inf
        }
    } else {
        incomingGeoMeans <- TRUE
        if (length(geoMeans) != nrow(counts)) {
            stop("geoMeans should be as long as the number of rows of counts")
        }
        loggeomeans <- log(geoMeans)
    }
    if (all(is.infinite(loggeomeans))) {
        stop(
            "every gene contains at least one zero, ",
            "cannot compute log geometric means",
            call. = FALSE
        )
    }
    sf <- if (missing(controlGenes)) {
        apply(counts, 2, function(cnts) {
            exp(locfunc((log(cnts) - loggeomeans)[
                is.finite(loggeomeans) & cnts > 0]))
        })
    } else {
        if (!(is.numeric(controlGenes) | is.logical(controlGenes))) {
            stop("controlGenes should be either a numeric or logical vector")
        }
        loggeomeansSub <- loggeomeans[controlGenes]
        apply(
            counts[controlGenes, , drop = FALSE], 2,
            function(cnts) {
                idx <- is.finite(loggeomeansSub) & cnts > 0
                exp(locfunc((log(cnts) - loggeomeansSub)[idx]))
            }
        )
    }
    if (incomingGeoMeans | type == "poscounts") {
        # stabilize size factors to have geometric mean of 1
        sf <- sf / exp(mean(log(sf)))
    }
    sf
}
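
# A minimal usage sketch (illustrative only; `m` is a hypothetical toy count
# matrix with taxa in rows and samples in columns, not data from this package):
#   m <- matrix(c(10, 0, 5, 20, 8, 0, 30, 12, 4), nrow = 3)
#   sf <- estimateSizeFactorsForMatrix(m, type = "poscounts")
#   # the stabilization step rescales `sf` to a geometric mean of 1, so
#   # `exp(mean(log(sf)))` is expected to be 1 up to floating point error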


================================================
FILE: R/DA-edgeR.R
================================================
#' Perform differential analysis using edgeR
#'
#' Differential expression analysis based on the Negative Binomial distribution
#' using **edgeR**.
#'
#' @param ps a [`phyloseq::phyloseq-class`] object.
#' @param group character, the variable to set the group, must be one of
#'   the variables in the sample metadata.
#' @param confounders character vector, the confounding variables to be adjusted.
#'   default `character(0)`, indicating no confounding variable.
#' @param contrast this parameter is only used for a two-group comparison when
#'   there are multiple groups. For more information, see the details below.
#' @param taxa_rank character to specify taxonomic rank to perform
#'   differential analysis on. Should be one of
#'   `phyloseq::rank_names(phyloseq)`, or "all" means to summarize the taxa by
#'   the top taxa ranks (`summarize_taxa(ps, level = rank_names(ps)[1])`), or
#'   "none" means perform differential analysis on the original taxa
#'   (`taxa_names(phyloseq)`, e.g., OTU or ASV).
#' @param method character, used for differential analysis, please see details
#'   below for more info.
#' @param transform character, the methods used to transform the microbial
#'   abundance. See [`transform_abundances()`] for more details. The
#'   options include:
#'   * "identity", return the original data without any transformation
#'     (default).
#'   * "log10", the transformation is `log10(object)`, and if the data contains
#'     zeros the transformation is `log10(1 + object)`.
#'   * "log10p", the transformation is `log10(1 + object)`.
#' @param norm the methods used to normalize the microbial abundance data. See
#'   [`normalize()`] for more details.
#'   Options include:
#'   * "none": do not normalize.
#'   * "rarefy": random subsampling counts to the smallest library size in the
#'     data set.
#'   * "TSS": total sum scaling, also referred to as "relative abundance", the
#'     abundances were normalized by dividing the corresponding sample library
#'     size.
#'   * "TMM": trimmed mean of m-values. First, a sample is chosen as reference.
#'     The scaling factor is then derived using a weighted trimmed mean over
#'     the differences of the log-transformed gene-count fold-change between
#'     the sample and the reference.
#'   * "RLE": relative log expression, RLE uses a pseudo-reference calculated
#'     using the geometric mean of the gene-specific abundances over all
#'     samples. The scaling factors are then calculated as the median of the
#'     gene counts ratios between the samples and the reference.
#'   * "CSS": cumulative sum scaling, calculates scaling factors as the
#'     cumulative sum of gene abundances up to a data-derived threshold.
#'   * "CLR": centered log-ratio normalization.
#'   * "CPM": per-sample normalization of the sum of the values to 1e+06.
#' @param norm_para arguments passed to specific normalization methods. Most
#'   users will not need to pass any additional arguments here.
#' @param disp_para additional arguments passed to [`edgeR::estimateDisp()`]
#'   used for dispersions estimation. Most users will not need to pass any
#'   additional arguments here.
#' @param p_adjust method for multiple test correction, default `none`,
#'   for more details see [stats::p.adjust].
#' @param pvalue_cutoff numeric, p value cutoff, default 0.05
#' @param ... extra arguments passed to the model. See [`edgeR::glmQLFit()`]
#'   and [`edgeR::glmFit()`] for more details.
#' @return  a [`microbiomeMarker-class`] object.
#'
#' @details
#' **Note** that edgeR is designed to work with actual counts. This means that
#' the counts do not need to be transformed before they are passed to edgeR.
#'
#' There are two test methods for differential analysis in **edgeR**,
#' likelihood ratio test (LRT) and quasi-likelihood F-test (QLFT). The QLFT
#' method is recommended as it allows stricter error rate control by
#' accounting for the uncertainty in dispersion estimation.
#'
#' `contrast` must be a character vector of length two or `NULL` (default). It
#' only needs to be set manually for a two-group comparison when there are
#' multiple groups. The order determines the direction of comparison: the first
#' element specifies the reference group (control), i.e. the denominator of
#' the fold change, and the second element specifies the group compared
#' against the reference (the numerator of the fold change). Otherwise, users
#' do not need to set this parameter (leave it as the default `NULL`): if
#' there are two groups, the first level of groups will be set as the
#' reference group; if there are multiple groups, an ANOVA-like test will be
#' performed to find markers that differ in any of the groups.
#'
#' @export
#' @seealso [`edgeR::glmFit()`],[`edgeR::glmQLFit(

Condensed preview — 179 files, each showing path, character count, and a content snippet. Download the .json file or copy for the full structured content (736K chars).

[
  {
    "path": ".Rbuildignore",
    "chars": 237,
    "preview": "^microbiomeMarker\\.Rproj$\n^\\.Rproj\\.user$\n^test\\.R$\n^README\\.Rmd$\n^data-raw$\n^lefse$\n^dev_test$\n^\\.github/workflows/R-CM"
  },
  {
    "path": ".gitattributes",
    "chars": 10,
    "preview": "* text=lf\n"
  },
  {
    "path": ".github/.gitignore",
    "chars": 7,
    "preview": "*.html\n"
  },
  {
    "path": ".github/ISSUE_TEMPLATE/issue_template.md",
    "chars": 436,
    "preview": "Please briefly describe your problem, what output actually happend, and what \noutput you expect.\n\nPlease provide a minim"
  },
  {
    "path": ".github/workflows/check-bioc.yml",
    "chars": 12648,
    "preview": "## Read more about GitHub actions the features of this GitHub Actions workflow\n## at https://lcolladotor.github.io/bioct"
  },
  {
    "path": ".github/workflows/pkgdown.yaml",
    "chars": 1554,
    "preview": "on:\n  push:\n    branches:\n      - main\n      - master\n    tags:\n      -'*'\n\nname: pkgdown\n\njobs:\n  pkgdown:\n    runs-on:"
  },
  {
    "path": ".gitignore",
    "chars": 50,
    "preview": ".RData\n__MACOSX\ndocs\ninst/doc\n.Rproj.user\n*.Rproj\n"
  },
  {
    "path": "DESCRIPTION",
    "chars": 1971,
    "preview": "Package: microbiomeMarker\nTitle: microbiome biomarker analysis toolkit\nVersion: 1.13.2\nAuthors@R: \n    person(given = \"Y"
  },
  {
    "path": "LICENSE.md",
    "chars": 34723,
    "preview": "GNU General Public License\n==========================\n\n_Version 3, 29 June 2007_  \n_Copyright © 2007 Free Software Found"
  },
  {
    "path": "NAMESPACE",
    "chars": 6391,
    "preview": "# Generated by roxygen2: do not edit by hand\n\nS3method(plot,compareDA)\nS3method(summary,compareDA)\nexport(\"%>%\")\nexport("
  },
  {
    "path": "NEWS.md",
    "chars": 1596,
    "preview": "# microbiomeMarker 1.3.2\n\n+ fix error on subgroup in lefse, #62, #55\n\n# microbiomeMarker 1.3.1 (2022-05-26)\n\n+ Developme"
  },
  {
    "path": "R/AllClasses.R",
    "chars": 10840,
    "preview": "# marker_table class ------------------------------------------------------\n\n#' The S4 class for storing microbiome mark"
  },
  {
    "path": "R/AllGenerics.R",
    "chars": 7627,
    "preview": "# marker_table class -----------------------------------------------------------\n\n#' Build or access the marker_table\n#'"
  },
  {
    "path": "R/DA-aldex.R",
    "chars": 17544,
    "preview": "#' Perform differential analysis using ALDEx2\n#'\n#' @param ps a [`phyloseq::phyloseq-class`] object\n#' @param group char"
  },
  {
    "path": "R/DA-all.R",
    "chars": 7793,
    "preview": "#' Find makers (differentially expressed metagenomic features)\n#'\n#' `run_marker` is a wrapper of all differential analy"
  },
  {
    "path": "R/DA-ancom.R",
    "chars": 18549,
    "preview": "#' Perform differential analysis using ANCOM\n#'\n#' Perform significant test by comparing the pairwise log ratios between"
  },
  {
    "path": "R/DA-ancombc.R",
    "chars": 13005,
    "preview": "#' Differential analysis of compositions of microbiomes with bias correction\n#' (ANCOM-BC).\n#'\n#' Differential abundance"
  },
  {
    "path": "R/DA-comparing.R",
    "chars": 33548,
    "preview": "# The module of comparing differential analysis is inspired from DAtest\n# https://github.com/Russel88/DAtest\n# If you us"
  },
  {
    "path": "R/DA-deseq2.R",
    "chars": 24252,
    "preview": "# In the vignette of DESeq2:\n# The values in the matrix should be un-normalized counts or estimated counts\n# of sequenci"
  },
  {
    "path": "R/DA-edgeR.R",
    "chars": 13700,
    "preview": "#' Perform differential analysis using edgeR\n#'\n#' Differential expression analysis based on the Negative Binomial distr"
  },
  {
    "path": "R/DA-lefse.R",
    "chars": 9312,
    "preview": "#' Liner discriminant analysis (LDA) effect size (LEFSe) analysis\n#'\n#' Perform Metagenomic LEFSe analysis based on phyl"
  },
  {
    "path": "R/DA-limma-voom.R",
    "chars": 8765,
    "preview": "#' Differential analysis using limma-voom\n#'\n#' @param ps  ps a [`phyloseq::phyloseq-class`] object.\n#' @param group  ch"
  },
  {
    "path": "R/DA-metagenomeSeq.R",
    "chars": 17358,
    "preview": "# We recommend fitFeatureModel over fitZig. MRcoefs, MRtable and MRfulltable\n# are useful summary tables of the model ou"
  },
  {
    "path": "R/DA-simple-statistic.R",
    "chars": 7616,
    "preview": "#' Simple statistical analysis of metagenomic profiles\n#'\n#' Perform simple statistical analysis of metagenomic profiles"
  },
  {
    "path": "R/DA-sl.R",
    "chars": 9558,
    "preview": "#' Identify biomarkers using supervised leaning (SL) methods\n#'\n#' Identify biomarkers using logistic regression, random"
  },
  {
    "path": "R/DA-test-multiple-groups.R",
    "chars": 8817,
    "preview": "#' Statistical test for multiple groups\n#'\n#' @param ps a [`phyloseq::phyloseq-class`] object\n#' @param group character,"
  },
  {
    "path": "R/DA-test-two-groups.R",
    "chars": 22654,
    "preview": "#' Statistical test between two groups\n#'\n#' @param ps a [`phyloseq::phyloseq-class`] object\n#' @param group character, "
  },
  {
    "path": "R/abundances-methods.R",
    "chars": 2762,
    "preview": "# This function is inspired from microbiome::abundances\n\n#' Extract taxa abundances\n#'\n#' Extract taxa abundances from p"
  },
  {
    "path": "R/aggregate-taxa.R",
    "chars": 5079,
    "preview": "## Note: This function is copied from package microbiome\n\n#' @title Aggregate Taxa\n#' @description Summarize phyloseq da"
  },
  {
    "path": "R/assignment-methods.R",
    "chars": 1846,
    "preview": "#' Assign a new OTU table\n#'\n#' Assign a new OTU table in microbiomeMarker object\n#' @param x [`microbiomeMarker-class`]"
  },
  {
    "path": "R/confounder.R",
    "chars": 2504,
    "preview": "## reference\n# https://github.com/biomedbigdata/namco/blob/647d3108a281eb0e36af31c44f5bf38d0c70c07d/app/R/utils.R#L480-L"
  },
  {
    "path": "R/data.R",
    "chars": 4515,
    "preview": "#' 16S rRNA data from \"Moving pictures of the human microbiome\"\n#'\n#' 16S read counts and phylogenetic tree file of 34 I"
  },
  {
    "path": "R/extract-methods.R",
    "chars": 527,
    "preview": "#' Extract `marker_table` object\n#'\n#' Operators acting on `marker_table` to extract parts.\n#'\n#' @name [\n#' @aliases [,"
  },
  {
    "path": "R/import-biobakery-lefse_in.R",
    "chars": 2986,
    "preview": "#' @title Import function to read the tab-delimited input file of biobakery\n#' lefse\n#'\n#' @description For biobakey lef"
  },
  {
    "path": "R/import-dada2.R",
    "chars": 2965,
    "preview": "# This function is modified from import_dada2.R in MicrobiotaProcess\n# https://github.com/YuLab-SMU/MicrobiotaProcess/bl"
  },
  {
    "path": "R/import-picrust2.R",
    "chars": 3178,
    "preview": "#' Import function to read the output of picrust2 as phyloseq object\n#' \n#' Import the output of picrust2 into phyloseq "
  },
  {
    "path": "R/import-qiime2.R",
    "chars": 7908,
    "preview": "# This function is modified from import_qiime2.R in MicrobiotaProcess\n# https://github.com/YuLab-SMU/MicrobiotaProcess/b"
  },
  {
    "path": "R/lefse-utilities.R",
    "chars": 15149,
    "preview": "# enrich group of the feature ---------------------------------------------\n\n#' get the mean abundances of each class fo"
  },
  {
    "path": "R/microbiomeMarker.R",
    "chars": 717,
    "preview": "#' microbiomeMarker: A package for microbiome biomarker discovery\n#'\n#' The microboimeMarker package provides several me"
  },
  {
    "path": "R/normalization.R",
    "chars": 17965,
    "preview": "#' Normalize the microbial abundance data\n#'\n#' It is critical to normalize the feature table to eliminate any bias due "
  },
  {
    "path": "R/plot-abundance.R",
    "chars": 2536,
    "preview": "#' plot the abundances of markers\n#'\n#' @inheritParams plot_ef_bar\n#' @param group character, the variable to set the gr"
  },
  {
    "path": "R/plot-cladogram.R",
    "chars": 13711,
    "preview": "##  codes for cladogram plot are modified from microbiomeViz\n##  https://github.com/lch14forever/microbiomeViz\n\n#' @titl"
  },
  {
    "path": "R/plot-comparing.R",
    "chars": 2299,
    "preview": "#' Plotting DA comparing result\n#'\n#' @param x an `compareDA` object, output from [`compare_DA()`].\n#' @param sort chara"
  },
  {
    "path": "R/plot-effect-size.R",
    "chars": 7143,
    "preview": "#' bar and dot plot of effect size of microbiomeMarker data\n#'\n#' bar and dot plot of effect size microbiomeMarker data."
  },
  {
    "path": "R/plot-heatmap.R",
    "chars": 4530,
    "preview": "#' Heatmap of microbiome marker\n#'\n#' Display the microbiome marker using heatmap, in which rows represents the\n#' marke"
  },
  {
    "path": "R/plot-postHocTest.R",
    "chars": 4617,
    "preview": "#' `postHocTest` plot\n#'\n#' Visualize the result of post-hoc test using ggplot2\n#'\n#' @param pht a [`postHocTest-class`]"
  },
  {
    "path": "R/plot-sl-roc.R",
    "chars": 2577,
    "preview": "#' ROC curve of microbiome marker from supervised learning methods\n#'\n#' Show the ROC curve of the microbiome marker cal"
  },
  {
    "path": "R/post-hoc-test.R",
    "chars": 13019,
    "preview": "# post hoc test -----------------------------------------------------------\n\n#' Post hoc pairwise comparisons for multip"
  },
  {
    "path": "R/reexports.R",
    "chars": 1214,
    "preview": "#' @importMethodsFrom phyloseq ntaxa\n#' @importFrom phyloseq ntaxa\n#' @export\n#' @exportMethod ntaxa\nphyloseq::ntaxa\n\n#'"
  },
  {
    "path": "R/subset-marker.R",
    "chars": 1018,
    "preview": "#' Subset microbiome markers\n#'\n#' Subset markers based on an expression related to the columns and values\n#' within the"
  },
  {
    "path": "R/summarize-taxa.R",
    "chars": 4841,
    "preview": "#' Summarize taxa into a taxonomic level within each sample\n#'\n#' Provides summary information of the representation of "
  },
  {
    "path": "R/test-utilities.R",
    "chars": 157,
    "preview": "# transpose otu_table and then convert it to data.frame\n#' @importMethodsFrom phyloseq t\ntranspose_and_2df <- function(o"
  },
  {
    "path": "R/transform.R",
    "chars": 1935,
    "preview": "#' Transform the taxa abundances in `otu_table` sample by sample\n#'\n#' Transform the taxa abundances in `otu_table` samp"
  },
  {
    "path": "R/utilities.R",
    "chars": 17761,
    "preview": "#' check whether tax abundance table is summarized or not\n#' @noRd\ncheck_tax_summarize <- function(ps) {\n    taxa <- row"
  },
  {
    "path": "README.Rmd",
    "chars": 11254,
    "preview": "---\ntitle:: R package for microbiome biomarker discovery\noutput: github_document\n---\n\n<!-- README.md is generated from R"
  },
  {
    "path": "README.md",
    "chars": 11598,
    "preview": "\r\n<!-- README.md is generated from README.Rmd. Please edit that file -->\r\n\r\n# microbiomeMarker\r\n\r\n<a href='https://githu"
  },
  {
    "path": "_pkgdown.yml",
    "chars": 1565,
    "preview": "reference:\n  - title: \"Data import\"\n    desc: >\n      Functions for importing external data and converting other R objec"
  },
  {
    "path": "codecov.yml",
    "chars": 237,
    "preview": "comment: false\n\n# disable project and patch check\ncoverage:\n  status:\n    project:\n      default:\n        enabled: false"
  },
  {
    "path": "data-raw/available_ranks.R",
    "chars": 315,
    "preview": "# available taxonomic ranks, Summarize represents summarized tax\navailable_ranks <- c(\n    \"Kingdom\", \"Phylum\", \"Class\","
  },
  {
    "path": "data-raw/data.R",
    "chars": 8412,
    "preview": "library(MicrobiomeAnalystR)\nlibrary(phyloseq)\nlibrary(magrittr)\n# Human Moving Picture from MicrobiomeAnalyst server ---"
  },
  {
    "path": "inst/CITATION",
    "chars": 912,
    "preview": "citHeader(\"To cite microbiomeMarker in publications use:\")\n\ncitEntry(\n    entry  = \"article\",\n    title  = \"microbiomeMa"
  },
  {
    "path": "inst/extdata/dada2_samdata.txt",
    "chars": 397,
    "preview": "Subject\tGender\tDay\tWhen\nF3D0\t3\tF\t0\tEarly\nF3D1\t3\tF\t1\tEarly\nF3D141\t3\tF\t141\tLate\nF3D142\t3\tF\t142\tLate\nF3D143\t3\tF\t143\tLate\nF3"
  },
  {
    "path": "inst/extdata/picrust2_metadata.tsv",
    "chars": 547,
    "preview": "SampleID\tFacility\tGenotype\n100CHE6KO\tPaloAlto\tKO\n101CHE6WT\tPaloAlto\tWT\n102CHE6WT\tPaloAlto\tWT\n103CHE6KO\tPaloAlto\tKO\n104CH"
  },
  {
    "path": "inst/extdata/sample-metadata.tsv",
    "chars": 2094,
    "preview": "sample-id\tbarcode-sequence\tbody-site\tyear\tmonth\tday\tsubject\treported-antibiotic-usage\tdays-since-experiment-start\r\n#q2:t"
  },
  {
    "path": "man/abundances-methods.Rd",
    "chars": 1644,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/abundances-methods.R\n\\docType{methods}\n\\na"
  },
  {
    "path": "man/aggregate_taxa.Rd",
    "chars": 1018,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/aggregate-taxa.R\n\\name{aggregate_taxa}\n\\al"
  },
  {
    "path": "man/assign-marker_table.Rd",
    "chars": 987,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/AllGenerics.R\n\\name{marker_table<-}\n\\alias"
  },
  {
    "path": "man/assign-otu_table.Rd",
    "chars": 871,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/assignment-methods.R\n\\docType{methods}\n\\na"
  },
  {
    "path": "man/compare_DA.Rd",
    "chars": 2968,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/DA-comparing.R\n\\name{compare_DA}\n\\alias{co"
  },
  {
    "path": "man/confounder.Rd",
    "chars": 1255,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/confounder.R\n\\name{confounder}\n\\alias{conf"
  },
  {
    "path": "man/data-caporaso.Rd",
    "chars": 780,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/data.R\n\\docType{data}\n\\name{data-caporaso}"
  },
  {
    "path": "man/data-cid_ying.Rd",
    "chars": 764,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/data.R\n\\docType{data}\n\\name{data-cid_ying}"
  },
  {
    "path": "man/data-ecam.Rd",
    "chars": 877,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/data.R\n\\docType{data}\n\\name{data-ecam}\n\\al"
  },
  {
    "path": "man/data-enterotypes_arumugam.Rd",
    "chars": 598,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/data.R\n\\docType{data}\n\\name{data-enterotyp"
  },
  {
    "path": "man/data-kostic_crc.Rd",
    "chars": 697,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/data.R\n\\docType{data}\n\\name{data-kostic_cr"
  },
  {
    "path": "man/data-oxygen.Rd",
    "chars": 577,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/data.R\n\\docType{data}\n\\name{data-oxygen}\n\\"
  },
  {
    "path": "man/data-pediatric_ibd.Rd",
    "chars": 528,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/data.R\n\\docType{data}\n\\name{data-pediatric"
  },
  {
    "path": "man/data-spontaneous_colitis.Rd",
    "chars": 677,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/data.R\n\\docType{data}\n\\name{data-spontaneo"
  },
  {
    "path": "man/effect_size-plot.Rd",
    "chars": 1252,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/plot-effect-size.R\n\\name{plot_ef_bar}\n\\ali"
  },
  {
    "path": "man/extract-methods.Rd",
    "chars": 631,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/extract-methods.R\n\\name{[}\n\\alias{[}\n\\alia"
  },
  {
    "path": "man/extract_posthoc_res.Rd",
    "chars": 1489,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/post-hoc-test.R\n\\name{extract_posthoc_res}"
  },
  {
    "path": "man/figures/sticker.R",
    "chars": 224,
    "preview": "imgurl <- \"man/figures/microbiome.png\"\nsticker(\n  subplot = imgurl,\n  package=\"microbiomeMarker\",\n  p_size=14, s_x=1, s_"
  },
  {
    "path": "man/get_treedata_phyloseq.Rd",
    "chars": 597,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/plot-cladogram.R\n\\name{get_treedata_phylos"
  },
  {
    "path": "man/import_dada2.Rd",
    "chars": 1961,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/import-dada2.R\n\\name{import_dada2}\n\\alias{"
  },
  {
    "path": "man/import_picrust2.Rd",
    "chars": 1995,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/import-picrust2.R\n\\name{import_picrust2}\n\\"
  },
  {
    "path": "man/import_qiime2.Rd",
    "chars": 1577,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/import-qiime2.R\n\\name{import_qiime2}\n\\alia"
  },
  {
    "path": "man/marker_table-class.Rd",
    "chars": 714,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/AllClasses.R\n\\docType{class}\n\\name{marker_"
  },
  {
    "path": "man/marker_table-methods.Rd",
    "chars": 940,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/AllGenerics.R\n\\name{marker_table}\n\\alias{m"
  },
  {
    "path": "man/microbiomeMarker-class.Rd",
    "chars": 1338,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/AllClasses.R, R/AllGenerics.R\n\\docType{cla"
  },
  {
    "path": "man/microbiomeMarker-package.Rd",
    "chars": 652,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/microbiomeMarker.R\n\\docType{package}\n\\name"
  },
  {
    "path": "man/microbiomeMarker.Rd",
    "chars": 1761,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/AllClasses.R\n\\name{microbiomeMarker}\n\\alia"
  },
  {
    "path": "man/nmarker-methods.Rd",
    "chars": 858,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/AllGenerics.R\n\\docType{methods}\n\\name{nmar"
  },
  {
    "path": "man/normalize-methods.Rd",
    "chars": 5985,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/normalization.R\n\\name{normalize,phyloseq-m"
  },
  {
    "path": "man/phyloseq2DESeq2.Rd",
    "chars": 1343,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/DA-deseq2.R\n\\name{phyloseq2DESeq2}\n\\alias{"
  },
  {
    "path": "man/phyloseq2edgeR.Rd",
    "chars": 879,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/DA-edgeR.R\n\\name{phyloseq2edgeR}\n\\alias{ph"
  },
  {
    "path": "man/phyloseq2metagenomeSeq.Rd",
    "chars": 1535,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/DA-metagenomeSeq.R\n\\name{phyloseq2metageno"
  },
  {
    "path": "man/plot.compareDA.Rd",
    "chars": 818,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/plot-comparing.R\n\\name{plot.compareDA}\n\\al"
  },
  {
    "path": "man/plot_abundance.Rd",
    "chars": 1098,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/plot-abundance.R\n\\name{plot_abundance}\n\\al"
  },
  {
    "path": "man/plot_cladogram.Rd",
    "chars": 2579,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/plot-cladogram.R\n\\name{plot_cladogram}\n\\al"
  },
  {
    "path": "man/plot_heatmap.Rd",
    "chars": 2675,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/plot-heatmap.R\n\\name{plot_heatmap}\n\\alias{"
  },
  {
    "path": "man/plot_postHocTest.Rd",
    "chars": 991,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/plot-postHocTest.R\n\\name{plot_postHocTest}"
  },
  {
    "path": "man/plot_sl_roc.Rd",
    "chars": 1032,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/plot-sl-roc.R\n\\name{plot_sl_roc}\n\\alias{pl"
  },
  {
    "path": "man/postHocTest-class.Rd",
    "chars": 1364,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/AllClasses.R, R/AllGenerics.R\n\\docType{cla"
  },
  {
    "path": "man/postHocTest.Rd",
    "chars": 1631,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/AllClasses.R\n\\name{postHocTest}\n\\alias{pos"
  },
  {
    "path": "man/reexports.Rd",
    "chars": 1099,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/reexports.R\n\\docType{import}\n\\name{reexpor"
  },
  {
    "path": "man/run_aldex.Rd",
    "chars": 4726,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/DA-aldex.R\n\\name{run_aldex}\n\\alias{run_ald"
  },
  {
    "path": "man/run_ancom.Rd",
    "chars": 4919,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/DA-ancom.R\n\\name{run_ancom}\n\\alias{run_anc"
  },
  {
    "path": "man/run_ancombc.Rd",
    "chars": 6337,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/DA-ancombc.R\n\\name{run_ancombc}\n\\alias{run"
  },
  {
    "path": "man/run_deseq2.Rd",
    "chars": 7187,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/DA-deseq2.R\n\\name{run_deseq2}\n\\alias{run_d"
  },
  {
    "path": "man/run_edger.Rd",
    "chars": 6030,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/DA-edgeR.R\n\\name{run_edger}\n\\alias{run_edg"
  },
  {
    "path": "man/run_lefse.Rd",
    "chars": 4988,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/DA-lefse.R\n\\name{run_lefse}\n\\alias{run_lef"
  },
  {
    "path": "man/run_limma_voom.Rd",
    "chars": 4648,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/DA-limma-voom.R\n\\name{run_limma_voom}\n\\ali"
  },
  {
    "path": "man/run_marker.Rd",
    "chars": 5625,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/DA-all.R\n\\name{run_marker}\n\\alias{run_mark"
  },
  {
    "path": "man/run_metagenomeseq.Rd",
    "chars": 6625,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/DA-metagenomeSeq.R\n\\name{run_metagenomeseq"
  },
  {
    "path": "man/run_posthoc_test.Rd",
    "chars": 3266,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/post-hoc-test.R\n\\name{run_posthoc_test}\n\\a"
  },
  {
    "path": "man/run_simple_stat.Rd",
    "chars": 4606,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/DA-simple-statistic.R\n\\name{run_simple_sta"
  },
  {
    "path": "man/run_sl.Rd",
    "chars": 5104,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/DA-sl.R\n\\name{run_sl}\n\\alias{run_sl}\n\\titl"
  },
  {
    "path": "man/run_test_multiple_groups.Rd",
    "chars": 3846,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/DA-test-multiple-groups.R\n\\name{run_test_m"
  },
  {
    "path": "man/run_test_two_groups.Rd",
    "chars": 3982,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/DA-test-two-groups.R\n\\name{run_test_two_gr"
  },
  {
    "path": "man/subset_marker.Rd",
    "chars": 844,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/subset-marker.R\n\\name{subset_marker}\n\\alia"
  },
  {
    "path": "man/summarize_taxa.Rd",
    "chars": 986,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/summarize-taxa.R\n\\name{summarize_taxa}\n\\al"
  },
  {
    "path": "man/summary.compareDA.Rd",
    "chars": 1664,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/DA-comparing.R\n\\name{summary.compareDA}\n\\a"
  },
  {
    "path": "man/transform_abundances.Rd",
    "chars": 1324,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/transform.R\n\\name{transform_abundances}\n\\a"
  },
  {
    "path": "tests/testthat/_snaps/ancom.md",
    "chars": 563,
    "preview": "# ancom result\n\n    Code\n      print(head(curr_marker), digits = 5)\n    Output\n                            feature enric"
  },
  {
    "path": "tests/testthat/_snaps/edgeR.md",
    "chars": 7936,
    "preview": "# result of edger\n\n    Code\n      print(marker_table(mm_edger), digits = 5)\n    Output\n                                 "
  },
  {
    "path": "tests/testthat/_snaps/lefse.md",
    "chars": 3239,
    "preview": "# lefse output of oxygen\n\n    Code\n      mm_lefse\n    Output\n      microbiomeMarker-class inherited from phyloseq-class\n"
  },
  {
    "path": "tests/testthat/_snaps/limma-voom.md",
    "chars": 504,
    "preview": "# limma voom\n\n    Code\n      print(marker_table(mm_lv), digits = 5)\n    Output\n                                        f"
  },
  {
    "path": "tests/testthat/_snaps/multiple-groups-test.md",
    "chars": 1523,
    "preview": "# test post hoc test result\n\n    Code\n      print(res_test, digits = 5)\n    Output\n                                     "
  },
  {
    "path": "tests/testthat/_snaps/two-group-test.md",
    "chars": 1036,
    "preview": "# test two group result\n\n    Code\n      mm_welch\n    Output\n      microbiomeMarker-class inherited from phyloseq-class\n "
  },
  {
    "path": "tests/testthat/data/ancom-zero.csv",
    "chars": 985,
    "preview": "\"\",\"structural_zero..delivery...Cesarean.\",\"structural_zero..delivery...Vaginal.\"\n\"sp1\",FALSE,FALSE\n\"sp2\",FALSE,FALSE\n\"s"
  },
  {
    "path": "tests/testthat/data/ancom-zero_neg_lb.csv",
    "chars": 985,
    "preview": "\"\",\"structural_zero..delivery...Cesarean.\",\"structural_zero..delivery...Vaginal.\"\n\"sp1\",FALSE,FALSE\n\"sp2\",FALSE,FALSE\n\"s"
  },
  {
    "path": "tests/testthat/test-abundances.R",
    "chars": 1052,
    "preview": "ps <- phyloseq::phyloseq(\n    otu_table = otu_table(\n        matrix(\n            sample(100, 40),\n            nrow = 2,\n"
  },
  {
    "path": "tests/testthat/test-aldex.R",
    "chars": 3817,
    "preview": "test_that(\"convert mc instances\", {\n    instance <- list(\n        sample1 = data.frame(inst1 = runif(10), inst2 = runif("
  },
  {
    "path": "tests/testthat/test-ancom.R",
    "chars": 1183,
    "preview": "if (FALSE) {\n    zero_neg_lb <- ANCOMBC:::get_struc_zero(\n        data.frame(otu_table(ecam)),\n        as(sample_data(ec"
  },
  {
    "path": "tests/testthat/test-ancombc.R",
    "chars": 3137,
    "preview": "test_that(\"ancombc works correctly\", {\n    if (FALSE) {\n        # result of ancombc package, example from the package vi"
  },
  {
    "path": "tests/testthat/test-assignment.R",
    "chars": 865,
    "preview": "marker <- marker_table(\n    data.frame(\n        feature = paste0(\"sp\", 1:5),\n        enrich_group = c(\"cr\", \"er\", \"cr\", "
  },
  {
    "path": "tests/testthat/test-barplot.R",
    "chars": 3914,
    "preview": "test_that(\"feature label in bar plot\", {\n    feature <- \"Bacteria|Bacteroidetes|Bacteroidia|Bacteroidales|Bacteroidaceae"
  },
  {
    "path": "tests/testthat/test-comparing.R",
    "chars": 1659,
    "preview": "test_that(\"comparing da methods\", {\n    data(ecam)\n    expect_error(compare_DA(ecam, \"delivery\", methods = c(\"test\", \"an"
  },
  {
    "path": "tests/testthat/test-confounder.R",
    "chars": 1224,
    "preview": "data(\"caporaso\")\ndata(\"pediatric_ibd\")\ntest_that(\"check confounding variables\", {\n    expect_error(\n        check_confou"
  },
  {
    "path": "tests/testthat/test-edgeR.R",
    "chars": 248,
    "preview": "test_that(\"result of edger\", {\n    data(pediatric_ibd)\n    mm_edger <- run_edger(\n        pediatric_ibd,\n        \"Class\""
  },
  {
    "path": "tests/testthat/test-extract.R",
    "chars": 333,
    "preview": "test_that(\"extract methods\", {\n    marker <- marker_table(\n        data.frame(\n            feature = paste0(\"sp\", 1:5),\n"
  },
  {
    "path": "tests/testthat/test-import-picrust2.R",
    "chars": 427,
    "preview": "test_that(\"import_picrust2 works\", {\n    sam_tab <- system.file(\n        \"extdata\", \"picrust2_metadata.tsv\",\n        pac"
  },
  {
    "path": "tests/testthat/test-import-qiime2.R",
    "chars": 373,
    "preview": "test_that(\"whether the row.names of feature table is dna sequence or not\", {\n    expect_false(is_dna_seq(\"3597a2689efaf5"
  },
  {
    "path": "tests/testthat/test-lefse-input.R",
    "chars": 1128,
    "preview": "test_that(\"add missing levels: keep abundance is lower than 1\", {\n    data(oxygen)\n    oxygen_feature <- otu_table(oxyge"
  },
  {
    "path": "tests/testthat/test-lefse.R",
    "chars": 759,
    "preview": "# lefse - lda\ndata(kostic_crc)\nkostic_crc_small <- phyloseq::subset_taxa(\n    kostic_crc,\n    Phylum == \"Firmicutes\"\n)\nm"
  },
  {
    "path": "tests/testthat/test-limma-voom.R",
    "chars": 465,
    "preview": "test_that(\"limma voom\", {\n    data(enterotypes_arumugam)\n    enterotype <- phyloseq::subset_samples(\n        enterotypes"
  },
  {
    "path": "tests/testthat/test-metagenomeSeq.R",
    "chars": 1294,
    "preview": "test_that(\"result of metagenomeSeq\", {\n    ps <- phyloseq::phyloseq(\n        otu_table = otu_table(\n            matrix(\n"
  },
  {
    "path": "tests/testthat/test-microbiomeMaker-methods.R",
    "chars": 1159,
    "preview": "test_that(\"nmarker method\", {\n    mm <- microbiomeMarker(\n        marker_table = marker_table(data.frame(\n            fe"
  },
  {
    "path": "tests/testthat/test-microbiomeMarker-class.R",
    "chars": 2159,
    "preview": "test_that(\"microbiomeMarker constructor\", {\n    marker1 <- marker_table(\n        data.frame(\n            feature = paste"
  },
  {
    "path": "tests/testthat/test-multiple-groups-test.R",
    "chars": 1929,
    "preview": "data(enterotypes_arumugam)\nenterotype <- phyloseq::subset_samples(\n    enterotypes_arumugam,\n    Enterotype %in% c(\"Ente"
  },
  {
    "path": "tests/testthat/test-normalization.R",
    "chars": 6837,
    "preview": "ct <- as(otu_table(pediatric_ibd), \"matrix\")\ngm_mean <- function(x, na.rm = TRUE) {\n    exp(sum(log(x[x > 0]), na.rm = n"
  },
  {
    "path": "tests/testthat/test-sl.R",
    "chars": 606,
    "preview": "test_that(\"supervised machine learning method workds properly\", {\n    data(enterotypes_arumugam)\n    ps_small <- phylose"
  },
  {
    "path": "tests/testthat/test-summarize-tax.R",
    "chars": 1365,
    "preview": "ps <- phyloseq::phyloseq(\n    otu_table = otu_table(\n        matrix(\n            sample(100, 40),\n            nrow = 4,\n"
  },
  {
    "path": "tests/testthat/test-transform.R",
    "chars": 1516,
    "preview": "data(enterotypes_arumugam)\nps_t <- transform_abundances(enterotypes_arumugam, \"log10p\")\notutable_t <- transform_abundanc"
  },
  {
    "path": "tests/testthat/test-two-group-test.R",
    "chars": 665,
    "preview": "test_that(\"ration\", {\n    abd1 <- rep(0, 6)\n    abd2 <- rep(0, 6)\n    expect_equal(calc_ratio(abd1, abd2), 0)\n\n    abd1 "
  },
  {
    "path": "tests/testthat/test-utilities.R",
    "chars": 10243,
    "preview": "test_that(\"check upper first letter\", {\n    expect_equal(\n        upper_firstletter(c(\"abc\", \"ABC\", \"Abc\")),\n        c(\""
  },
  {
    "path": "tests/testthat/test_cladogram.R",
    "chars": 828,
    "preview": "test_that(\"Generate unique id for short annotation label\", {\n    uid <- get_unique_id(500)\n    expect_equal(uid[26], \"z\""
  },
  {
    "path": "tests/testthat/test_fix_duplicate_tax.R",
    "chars": 719,
    "preview": "test_that(\"fix duplicate tax\", {\n    ps <- readRDS(\"data/data_tax_duplicate.rds\")\n    ps_fixed <- fix_duplicate_tax(ps)\n"
  },
  {
    "path": "tests/testthat.R",
    "chars": 76,
    "preview": "library(testthat)\nlibrary(microbiomeMarker)\n\ntest_check(\"microbiomeMarker\")\n"
  },
  {
    "path": "vignettes/.gitignore",
    "chars": 11,
    "preview": "*.html\n*.R\n"
  },
  {
    "path": "vignettes/microbiomeMarker-vignette.Rmd",
    "chars": 21442,
    "preview": "---\ntitle: \"Tools for microbiome marker identification\"\nauthor: \n  - name: Yang  Cao\n    affiliation: Department of Envi"
  },
  {
    "path": "vignettes/vignette.bib",
    "chars": 5716,
    "preview": "@article{gilbert2018current,\n  title={Current understanding of the human microbiome},\n  author={Gilbert, Jack A and Blas"
  }
]

// ... and 17 more files (download for full content)

About this extraction

This page contains the full source code of the yiluheihei/microbiomeMarker GitHub repository, extracted and formatted as plain text. The extraction includes 179 files (680.9 KB), approximately 195.3k tokens.
