Repository: YuLab-SMU/ggmsa
Branch: devel
Commit: 956078ed388a
Files: 108
Total size: 300.1 KB

Directory structure:
gitextract_gj8qs7tf/

├── .Rbuildignore
├── .gitignore
├── CONDUCT.md
├── DESCRIPTION
├── Makefile
├── NAMESPACE
├── NEWS.md
├── R/
│   ├── AllClasses.R
│   ├── SeqBundles.R
│   ├── ancestor_seq.R
│   ├── arc.R
│   ├── available.R
│   ├── clustal.R
│   ├── color_by_conservation.R
│   ├── color_else.R
│   ├── cons.R
│   ├── data.R
│   ├── dms.R
│   ├── facet_msa.R
│   ├── geom_GC.R
│   ├── geom_asterisk.R
│   ├── geom_msa.R
│   ├── geom_msaBar.R
│   ├── geom_seed.R
│   ├── ggmaf.R
│   ├── ggmsa.R
│   ├── import-functions.R
│   ├── method-plot.R
│   ├── method-show.R
│   ├── methods-diff.R
│   ├── methods-ggplot_add.R
│   ├── msa_data.R
│   ├── pp_interactive.R
│   ├── prepare_fasta.R
│   ├── read_maf.R
│   ├── seqdiff.R
│   ├── seqlogo.R
│   ├── simplot.R
│   ├── sysdata.rda
│   ├── theme_msa.R
│   └── zzz.R
├── README.Rmd
├── README.md
├── inst/
│   ├── CITATION
│   └── extdata/
│       ├── GVariation/
│       │   ├── A.Mont.fas
│       │   ├── B.Oz.fas
│       │   ├── C.Wilga5.fas
│       │   └── sample_alignment.fa
│       ├── Gram-negative_AKL.fasta
│       ├── Gram-positive_AKL.fasta
│       ├── LeaderRepeat_All.fa
│       ├── Rfam/
│       │   ├── RF00458.fasta
│       │   ├── RF03120.fasta
│       │   └── RF03120_SS.txt
│       ├── TP53_genes.xlsx
│       ├── sample.fasta
│       ├── seedSample.fa
│       ├── sequence-link-tree.fasta
│       └── tp53.fa
├── man/
│   ├── GVariation.Rd
│   ├── Gram-negative_AKL.fasta.Rd
│   ├── Gram-positive_AKL.fasta.Rd
│   ├── LeaderRepeat_All.fa.Rd
│   ├── Rfam.Rd
│   ├── TP53_genes.xlsx.Rd
│   ├── adjust_ally.Rd
│   ├── assign_dms.Rd
│   ├── available_colors.Rd
│   ├── available_fonts.Rd
│   ├── available_msa.Rd
│   ├── extract_seq.Rd
│   ├── facet_msa.Rd
│   ├── geom_GC.Rd
│   ├── geom_helix.Rd
│   ├── geom_msa.Rd
│   ├── geom_msaBar.Rd
│   ├── geom_seed.Rd
│   ├── geom_seqlogo.Rd
│   ├── ggSeqBundle.Rd
│   ├── gghelix.Rd
│   ├── ggmaf.Rd
│   ├── ggmsa.Rd
│   ├── merge_seq.Rd
│   ├── plot-methods.Rd
│   ├── readSSfile.Rd
│   ├── read_maf.Rd
│   ├── reset_pos.Rd
│   ├── sample.fasta.Rd
│   ├── seedSample.fa.Rd
│   ├── seqdiff.Rd
│   ├── seqlogo.Rd
│   ├── sequence-link-tree.fasta.Rd
│   ├── show-methods.Rd
│   ├── simplify_hdata.Rd
│   ├── simplot.Rd
│   ├── theme_msa.Rd
│   ├── tidy_hdata.Rd
│   ├── tidy_maf_df.Rd
│   ├── tidy_msa.Rd
│   ├── tp53.fa.Rd
│   └── treeMSA_plot.Rd
├── tests/
│   ├── testthat/
│   │   ├── test-main.R
│   │   ├── test-msa_data.R
│   │   └── test-tidy_msa.R
│   └── testthat.R
└── vignettes/
    ├── .gitignore
    ├── ggmsa.Rmd
    └── ggmsa.bib

================================================
FILE CONTENTS
================================================

================================================
FILE: .Rbuildignore
================================================
^.*\.Rproj$
^\.Rproj\.user$
Makefile
README.md
README_files
README.Rmd
^_pkgdown\.yml$
^docs$
^pkgdown$
logo.png
CONDUCT.md


================================================
FILE: .gitignore
================================================
.Rproj.user
.Rhistory
.RData
.Renviron
.DS_Store
inst/doc
ggmsa.Rproj
ggmsa.Rcheck
.git
docs/
pkgdown/


================================================
FILE: CONDUCT.md
================================================
# Contributor Code of Conduct

As contributors and maintainers of this project, we pledge to respect all people who 
contribute through reporting issues, posting feature requests, updating documentation,
submitting pull requests or patches, and other activities.

We are committed to making participation in this project a harassment-free experience for
everyone, regardless of level of experience, gender, gender identity and expression,
sexual orientation, disability, personal appearance, body size, race, ethnicity, age, or religion.

Examples of unacceptable behavior by participants include the use of sexual language or
imagery, derogatory comments or personal attacks, trolling, public or private harassment,
insults, or other unprofessional conduct.

Project maintainers have the right and responsibility to remove, edit, or reject comments,
commits, code, wiki edits, issues, and other contributions that are not aligned to this 
Code of Conduct. Project maintainers who do not follow the Code of Conduct may be removed 
from the project team.

Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by 
opening an issue or contacting one or more of the project maintainers.

This Code of Conduct is adapted from the Contributor Covenant 
(http://contributor-covenant.org), version 1.0.0, available at 
http://contributor-covenant.org/version/1/0/0/


================================================
FILE: DESCRIPTION
================================================
Package: ggmsa
Title: Plot Multiple Sequence Alignment using 'ggplot2'
Version: 1.19.0
Authors@R: c(person("Guangchuang", "Yu", email = "guangchuangyu@gmail.com", role = c("aut", "cre","ths"), comment = c(ORCID = "0000-0002-6485-8781")),
             person("Lang", "Zhou",      email = "nyzhoulang@gmail.com",    role = "aut"),
             person("Shuangbin", "Xu",   email = "xshuangbin@163.com",      role = "ctb"),
             person("Huina", "Huang",    email = "1185796994@qq.com",       role = "ctb"))
Description: A visual exploration tool for multiple sequence alignment 
    and associated data. Supports MSA of DNA, RNA, and protein sequences 
    using 'ggplot2'. Multiple sequence alignments can easily be combined 
    with other 'ggplot2' plots, such as phylogenetic trees visualized by 
    'ggtree', boxplots, genome maps and so on. Additional features include 
    visualization of sequence logos, sequence bundles and RNA secondary 
    structures, and detection of sequence recombination.
Depends: R (>= 4.1.0)
Imports:
    Biostrings, 
    ggplot2,
    magrittr,
    tidyr,
    utils,
    stats,
    aplot,
    RColorBrewer,
    ggfun (>= 0.2.0),
    ggforce,
    dplyr,
    R4RNA,
    grDevices,
    seqmagick,
    grid,
    methods,
    ggtree (>= 1.17.1)
Suggests:
    ggtreeExtra,
    ape,
    cowplot,
    knitr,
    rmarkdown,
    readxl,
    ggnewscale,
    kableExtra,
    gggenes,
    statebins,
    prettydoc,
    testthat (>= 3.0.0),
    yulab.utils
License: Artistic-2.0
Encoding: UTF-8
URL: https://doi.org/10.1093/bib/bbac222 (paper), https://www.amazon.com/Integration-Manipulation-Visualization-Phylogenetic-Computational-ebook/dp/B0B5NLZR1Z/ (book)
BugReports: https://github.com/YuLab-SMU/ggmsa/issues
biocViews: Software, Visualization, Alignment, Annotation, MultipleSequenceAlignment
RoxygenNote: 7.3.2
VignetteBuilder: knitr
Config/testthat/edition: 3


================================================
FILE: Makefile
================================================
PKGNAME := $(shell sed -n "s/Package: *\([^ ]*\)/\1/p" DESCRIPTION)
PKGVERS := $(shell sed -n "s/Version: *\([^ ]*\)/\1/p" DESCRIPTION)
PKGSRC  := $(shell basename `pwd`)
BIOCVER := RELEASE_3_23


all: rd check clean

alldocs: rd readme mkdocs

rd:
	Rscript -e 'roxygen2::roxygenise(".")'

readme:
	Rscript -e 'rmarkdown::render("README.Rmd")'

readme2:
	Rscript -e 'rmarkdown::render("README.Rmd", "html_document")'

build:
	# cd ..;\
	# R CMD build $(PKGSRC)
	Rscript -e 'devtools::build()'

build2:
	cd ..;\
	R CMD build --no-build-vignettes $(PKGSRC)

install:
	cd ..;\
	R CMD INSTALL $(PKGNAME)_$(PKGVERS).tar.gz

check: #build
	#cd ..;\
	#Rscript -e 'rcmdcheck::rcmdcheck("$(PKGNAME)_$(PKGVERS).tar.gz")'
	Rscript -e 'devtools::check()'

check2: build
	cd ..;\
	R CMD check $(PKGNAME)_$(PKGVERS).tar.gz

bioccheck:
	cd ..;\
	Rscript -e 'BiocCheck::BiocCheck("$(PKGNAME)_$(PKGVERS).tar.gz")'
	
gpcheck:
	Rscript -e 'goodpractice::gp()'

clean:
	cd ..;\
	$(RM) -r $(PKGNAME).Rcheck/

gitmaintain:
	git gc --auto;\
	git prune -v;\
	git fsck --full

rmrelease:
	git branch -D $(BIOCVER)

release:
	git checkout $(BIOCVER);\
	git fetch --all

update:
	git fetch --all;\
	git checkout devel;\
	git merge upstream/devel;\
	git merge origin/devel;\


push: 
	git push upstream devel;\
	git push origin devel

biocinit:
	git remote add upstream git@git.bioconductor.org:packages/$(PKGNAME).git;\
	git fetch --all

================================================
FILE: NAMESPACE
================================================
# Generated by roxygen2: do not edit by hand

S3method(diff,SeqDiff)
S3method(ggplot_add,GCcontent)
S3method(ggplot_add,facet_msa)
S3method(ggplot_add,msaBar)
S3method(ggplot_add,nucleotideeHelix)
S3method(ggplot_add,seed)
S3method(ggplot_add,seqlogo)
export(adjust_ally)
export(assign_dms)
export(available_colors)
export(available_fonts)
export(available_msa)
export(extract_seq)
export(facet_msa)
export(geom_GC)
export(geom_helix)
export(geom_msa)
export(geom_msaBar)
export(geom_seed)
export(geom_seqlogo)
export(ggSeqBundle)
export(gghelix)
export(ggmaf)
export(ggmsa)
export(merge_seq)
export(readSSfile)
export(read_maf)
export(reset_pos)
export(seqdiff)
export(seqlogo)
export(simplify_hdata)
export(simplot)
export(theme_msa)
export(tidy_hdata)
export(tidy_maf_df)
export(tidy_msa)
export(treeMSA_plot)
exportMethods(plot)
exportMethods(show)
importClassesFrom(Biostrings,BStringSet)
importFrom(Biostrings,AAStringSet)
importFrom(Biostrings,DNAStringSet)
importFrom(Biostrings,RNAStringSet)
importFrom(Biostrings,readBStringSet)
importFrom(Biostrings,readDNAStringSet)
importFrom(Biostrings,toString)
importFrom(Biostrings,width)
importFrom(R4RNA,as.helix)
importFrom(R4RNA,collapseHelix)
importFrom(R4RNA,expandHelix)
importFrom(R4RNA,readBpseq)
importFrom(R4RNA,readConnect)
importFrom(R4RNA,readHelix)
importFrom(R4RNA,readVienna)
importFrom(RColorBrewer,brewer.pal)
importFrom(aplot,insert_top)
importFrom(aplot,plot_list)
importFrom(dplyr,group_by)
importFrom(dplyr,group_by_)
importFrom(dplyr,n)
importFrom(dplyr,select)
importFrom(dplyr,summarize)
importFrom(dplyr,summarize_)
importFrom(ggforce,geom_arc)
importFrom(ggfun,geom_xspline)
importFrom(ggplot2,Geom)
importFrom(ggplot2,aes)
importFrom(ggplot2,aes_)
importFrom(ggplot2,coord_cartesian)
importFrom(ggplot2,coord_fixed)
importFrom(ggplot2,draw_key_polygon)
importFrom(ggplot2,element_blank)
importFrom(ggplot2,element_line)
importFrom(ggplot2,element_text)
importFrom(ggplot2,facet_wrap)
importFrom(ggplot2,geom_area)
importFrom(ggplot2,geom_blank)
importFrom(ggplot2,geom_col)
importFrom(ggplot2,geom_line)
importFrom(ggplot2,geom_point)
importFrom(ggplot2,geom_polygon)
importFrom(ggplot2,geom_ribbon)
importFrom(ggplot2,geom_segment)
importFrom(ggplot2,geom_smooth)
importFrom(ggplot2,geom_text)
importFrom(ggplot2,geom_tile)
importFrom(ggplot2,ggplot)
importFrom(ggplot2,ggplot_add)
importFrom(ggplot2,ggplot_build)
importFrom(ggplot2,ggplot_gtable)
importFrom(ggplot2,ggproto)
importFrom(ggplot2,ggtitle)
importFrom(ggplot2,labs)
importFrom(ggplot2,layer)
importFrom(ggplot2,scale_color_manual)
importFrom(ggplot2,scale_fill_gradientn)
importFrom(ggplot2,scale_fill_manual)
importFrom(ggplot2,scale_x_continuous)
importFrom(ggplot2,scale_y_continuous)
importFrom(ggplot2,theme)
importFrom(ggplot2,theme_bw)
importFrom(ggplot2,theme_minimal)
importFrom(ggplot2,theme_void)
importFrom(ggplot2,xlab)
importFrom(ggplot2,xlim)
importFrom(ggplot2,ylab)
importFrom(ggtree,geom_facet)
importFrom(ggtree,geom_tiplab)
importFrom(grDevices,colorRampPalette)
importFrom(grid,arrow)
importFrom(grid,gTree)
importFrom(grid,gpar)
importFrom(grid,polygonGrob)
importFrom(grid,unit)
importFrom(grid,unit.pmax)
importFrom(magrittr,"%<>%")
importFrom(magrittr,"%>%")
importFrom(methods,missingArg)
importFrom(methods,new)
importFrom(methods,show)
importFrom(seqmagick,fa_read)
importFrom(stats,setNames)
importFrom(tidyr,gather)
importFrom(utils,getFromNamespace)
importFrom(utils,globalVariables)
importFrom(utils,modifyList)
importFrom(utils,packageDescription)
importFrom(utils,read.delim)


================================================
FILE: NEWS.md
================================================
# ggmsa 1.18.0

+ Bioconductor RELEASE_3_23 (2026-04-29, Wed)

# ggmsa 1.16.0

+ Bioconductor RELEASE_3_22 (2025-11-01, Sat)

# ggmsa 1.15.1

+ replace `ggalt::geom_xspline()` with `ggfun::geom_xspline()` (2025-07-12, Sat)

# ggmsa 1.3.3

+ calling `\dontrun{}` for examples on `ggmsa()`

# ggmsa 1.3.2

+ bugfix: `geom_msaBar` conservation layer incorrectly aligned, issues#34 (2022-5-13, Fri)

# ggmsa 1.3.1

+ A new feature--selects ancestral sequence on Tree-MSA plot `treeMSA_plot` (2022-4-14, Thu)
+ A new feature--visualization of genome alignment `ggmaf` (2022-4-14, Thu)
+ A test feature--visualization protein-protein interactive (2022-4-14, Thu)
+ updated the way smooth is invoked on `simplot` (2022-01-03, Mon)

# ggmsa 1.1.4

+ added smoothed curve on simplot. (2021-12-17, Fri)

# ggmsa 1.1.3

+ fixed the typo in "posHighligthed" and renamed it from camelCase to
  snake_case "position_highlight" (2021-12-13, Mon)

# ggmsa 1.1.2

+ fixed the assignment error on line 155 of 'seqlogo.R'

# ggmsa 1.1.1

+ fixed error: using `||` instead of `|` on line 110 of geom_msa.R


# ggmsa 0.99.0 or 0.99.x
(Prepare for submission to `Bioconductor`, 2021-09-22 Wed)

+ 0.99.1 update DESCRIPTION and NEWS files (2021-09-28, Tue)
+ 0.99.2 add documentation for raw data in inst/extdata and clean up code (2021-09-29, Wed)
+ 0.99.3 remove some vignettes from master (build on the gh-pages branch) (2021-10-1, Fri)
+ 0.99.4 remove 'stringr' package from 'Imports' (2021-10-11, Mon)
+ 0.99.5 make the consensus_views compatible with ggtreeExtra and add package description. (2021-10-21, Thu)

# ggmsa 0.0.10 

+ update default color schemes in  lower part of the SeqDiff plot (2021-08-20, Fri)

# ggmsa 0.0.9

+ import R4RNA to fix R check (2021-08-03, Tue)

# ggmsa 0.0.8

+ bugfix: fix variable names error in color_scheme. (2021-07-29, Thu)
+ The migration of sequence recombination functionality from `seqcombo` package. (2021-07-20, Tue)


# ggmsa 0.0.7

+ added `gghelix()` and `geom_helix()`. (2021-04-1, Thu)
+ added option to show the fill legend. (2021-03-23, Tue)
+ added an error message to remind that "sequences must have unique names". (2021-03-18, Thu)
+ added `ggSeqBundle()` to plot Sequence Bundles for MSAs based on `ggplot2` (2021-03-18, Thu)

# ggmsa 0.0.6

+ supports linking `ggtreeExtra`. (2021-01-21, Thu)
+ bugfix: reversed sequence in 'tree + geom_facet(font)' . (2021-01-21, Thu)
+ bugfix: partitioning error when the sequence starting point greater than 1. (2021-01-21, Thu)
+ bugfix: generates continuous x-axis labels for each panel. (2021-01-21, Thu)
+ supports customize colors `custom_color`. (2020-12-28, Mon)

# ggmsa 0.0.5

+ added a new view called `by_conservation`. (2020-12-22, Tue)
+ added a new color scheme `Hydrophobicity` and a new parameter `border`. (2020-12-21, Mon)
+ rewrote the function `facet_msa()`. (2020-12-03, Thu)
+ bugfix: tree + geom_facet(geom_msa()) does not work. (2020-12-03, Thu)
+ added a new function `geom_msaBar()`. (2020-12-03, Thu)
+ added a new parameter `ignore_gaps` used in consensus views. (2020-10-09, Fri)
+ debug in consensus views (2020-10-05, Mon)
+ added consensus views (2020-9-30, Wed)
+ added new colors `LETTER` and `CN6` provided by ShixiangWang.[issues#8](https://github.com/YuLab-SMU/ggmsa/issues/8)

# ggmsa 0.0.4

+ fixed warning message in **msa_data.R** (2020-4-26, Sun)
+ added ggplot_add methods for `geom_*()` (2020-4-24, Fri)
+ added a parameter `seq_name` in `ggmsa()` (2020-4-23, Thu)
+ added a new function `facet_msa()` --> break down the MSA (2020-4-17, Fri)
+ added a parameter `posHighlighted` in `ggmsa()` (2020-4-17, Fri)
+ created a new layer `geom_asterisk()` to optimize `geom_seed()` (2020-4-11, Sat)
+ added new functions `available_colors()`, `available_fonts()` and `available_msa()` (2020-3-30, Thu)
+ added a new function `geom_seed()` --> highlight the seed region in miRNA sequences (2020-3-27, Fri)
+ added a new function `ggmotif()`--> plot sequence motifs independently (2020-3-23, Tue)
+ added a Monospaced Font `DroidSansMono` (2020-3-23, Mon)

# ggmsa 0.0.3

+ release of v=0.0.3 (2020-03-16, Mon)
+ added a new function `geom_GC()` --> plot GC content in MSA (2020-02-28, Fri)
+ added a new function `geom_seqlogo()` --> plot sequence motifs in MSA (2020-02-14, Fri)
+ used a proportional scaling algorithm (2020-01-08, Wed)


# ggmsa 0.0.2

+ support plot sequence logo (2019-12-25, Wed)
+ added three fonts: `helvetical`, `times_new_roman`, `mono` (2019-12-21, Sat)
+ ~~added three fonts:`serif_font`, `Montserrat_font`, `roboto_font` (2019-12-17, Tue)~~
+ added internal outline polygons (2019-12-15, Sun)
+ bug fixed of `tidy_msa`
+ import `seqmagick` for parsing fasta 
+ `tidy_msa` for converting msa file/object to tidy data frame (2019-12-09, Mon)

 
# ggmsa 0.0.1

+ initial CRAN release (2019-10-17, Thu) 
+ removed from CRAN on 2021-08-17


================================================
FILE: R/AllClasses.R
================================================
setClass("SeqDiff",
         representation = representation(
                          file = "character",
                          sequence = "BStringSet",
                          reference = "numeric",
                          diff = "data.frame"
                          )
        )


================================================
FILE: R/SeqBundles.R
================================================
##' Plot sequence bundles for MSAs based on 'ggplot2'
##'
##'
##' @title ggSeqBundle
##' @importFrom ggfun geom_xspline
##' @param msa Multiple sequence alignment file (FASTA) or object 
##' representing either nucleotide or peptide sequences. Also accepts
##' multiple MSA files,
##' e.g. msa = c("Gram-negative_AKL.fasta", "Gram-positive_AKL.fasta").
##' @param line_width The width of bundles at each site, default is 0.3.
##' @param line_thickness The thickness of bundles at each site, default is 0.3.
##' @param line_high The vertical offset (height) of bundles at each site, 
##' default is 0.
##' @param spline_shape A numeric vector of values between -1 and 1, which 
##' control the shape of the spline relative to the control points.
##' @param size A numeric vector of values between 0 and 1, 
##' which control the size of each line.
##' @param alpha A numeric vector of values between 0 and 1, 
##' which control the alpha of each line.
##' @param bundle_color The colors of the sequence bundles, one per MSA,
##' e.g. bundle_color = c("#2ba0f5","#424242").
##' @param lev_molecule Reassigns the y-axis to display 
##' letter-coded amino acids/nucleotides arranged by physicochemical 
##' properties or another ordering, e.g. amino acid hydrophobicity: 
##' lev_molecule = c("-","A", "V", "L", "I", "P", "F", "W", "M", 
##'    "G", "S","T", "C", "Y", "N", "Q", "D", "E", "K","R", "H").
##' @return ggplot object
##' @export
##' @examples
##' aln <- system.file("extdata", "Gram-negative_AKL.fasta", package = "ggmsa")
##' ggSeqBundle(aln)
##' @author Lang Zhou
ggSeqBundle <- function(msa,
                        line_width = 0.3,
                        line_thickness = 0.3,
                        line_high = 0,
                        spline_shape = 0.3,
                        size = 0.5,
                        alpha = 0.2,
                        bundle_color = c("#2ba0f5","#424242"),
                        lev_molecule = c("-", "A", "V", "L", "I", "P", 
                                         "F", "W", "M", "G", "S","T", 
                                         "C", "Y", "N", "Q", "D", "E", 
                                         "K", "R", "H")
                        ) {
    if(length(msa) > length(bundle_color)) {
      stop("Each MSA group should be assigned a bundle color!!")
    }

    df <- lapply(seq_along(msa), function(i){
        df_aa <- tidy_msa(msa[[i]])
        df_aa$name <- as.character(df_aa$name)
        df_aa$group <- i
        df_aa
    })%>% do.call("rbind",.)

    dd <- adjustMSA(df_msa = df,
                    lev_molecule = lev_molecule,
                    line_width = line_width,
                    line_thickness = line_thickness,
                    line_high = line_high,
                    bundle_color = bundle_color
                    )

    mapping <- aes(x = position_adj, y = y_adj, 
                   group=name, color = I(bundle_color))
    ggplot(data = dd, mapping = mapping) +
        geom_xspline(shape = spline_shape, linewidth = size, alpha = alpha) +
            theme_bundles(df = df, lev_molecule = lev_molecule)

}



adjustMSA <- function(df_msa, lev_molecule, line_width, 
                      line_thickness, bundle_color, line_high) {
    data_scale <- lapply(nrow(df_msa) %>% seq_len(), function(i) {
        d <- df_msa[i,]
        d[2,] <-  d[1,]
        d[1,"position_adj"] <- d[1,"position"] - line_width
        d[2,"position_adj"] <- d[2,"position"] + line_width
        d
    }) %>% do.call("rbind",.)

    data_scale$y <- factor(data_scale$character, levels = lev_molecule) %>%
      as.numeric()

    data_adj <- lapply(data_scale$group %>% unique, function(g) {
        data_group <- data_scale[data_scale$group == g,]
        thickness <- line_thickness / factor(data_group$name) %>% 
          as.numeric %>%
          max
        
        dd_adj <- lapply(unique(data_group$position), function(i){
            df_pos <- data_group[data_group$position == i,]
            lapply(unique(df_pos$y), function(j){
                df_y <- df_pos[df_pos$y == j,]
                thick_lev <- df_y$name %>% factor %>% as.numeric - 1
                df_y$y_adj <- df_y$y - 0.4 + line_high + thickness * 
                              thick_lev + line_thickness * (g - 1)
                df_y
            }) %>% do.call("rbind",.)
        }) %>% do.call("rbind",.)
        dd_adj$bundle_color <- bundle_color[[g]]
        dd_adj
    }) %>% do.call("rbind",.)
    return(data_adj)
}
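
## A minimal base-R sketch of the first step in adjustMSA() above: every
## residue row is duplicated, and the two copies are shifted to
## position - line_width and position + line_width, so each residue becomes a
## short horizontal segment for the spline to pass through.
## NOTE: `sketch_expand_positions` is a hypothetical helper used only for
## illustration; it is not part of the ggmsa API and assumes `df_msa` has a
## numeric `position` column.
sketch_expand_positions <- function(df_msa, line_width = 0.3) {
    # repeat each row twice, keeping the two copies adjacent
    idx <- rep(seq_len(nrow(df_msa)), each = 2)
    out <- df_msa[idx, , drop = FALSE]
    # the first copy moves left, the second copy moves right
    out$position_adj <- out$position +
        rep(c(-line_width, line_width), times = nrow(df_msa))
    rownames(out) <- NULL
    out
}
## Illustrative call:
## sketch_expand_positions(data.frame(name = "s1", position = 1:2,
##                                    character = c("A", "V")))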

##' @importFrom ggplot2 element_line
theme_bundles <- function(df, lev_molecule){
    break_y <- factor(lev_molecule, levels = lev_molecule) %>% as.numeric
    minor_y <- c(break_y + 0.5, break_y - 0.5) %>% unique
    break_x <- max(df$position) %>% seq_len
    minor_x <- c(break_x + 0.5, break_x - 0.5) %>% unique

    list(
        ylab(NULL),
        xlab("Position number"),
        scale_x_continuous(breaks = break_x, 
                           labels = break_x, 
                           minor_breaks = minor_x),
        scale_y_continuous(breaks = break_y, 
                           labels = lev_molecule, 
                           minor_breaks = minor_y),
        theme(panel.grid.minor.y = element_line(color = "#e8e0e0", linewidth = 0.4),
              axis.line.x = element_line(color = "gray60", linewidth = 0.8),
              panel.grid.major = element_blank(),
              axis.ticks.y = element_blank(),
              panel.background = element_blank())
  )
}







================================================
FILE: R/ancestor_seq.R
================================================
##' Plot a Tree-MSA plot
##'
##'
##' 'treeMSA_plot()' automatically re-arranges the MSA data according to 
##' the tree structure.
##' @title treeMSA_plot
##' @param p_tree tree view
##' @param tidymsa_df tidy MSA data 
##' @param ancestral_node vector of internal nodes in the tree whose 
##' "ancestral sequences" are displayed. ancestral_node = "none" hides 
##' all ancestral sequences; ancestral_node = "all" shows all ancestral 
##' sequences.
##' @param sub logical value. If TRUE, the ancestral sequences of all 
##' internal nodes in the subtree of each given node are displayed as well.
##' @param panel panel name for plot of MSA data
##' @param font font families, possible values are 'helvetical', 'mono', 
##' 'DroidSansMono' and 'TimesNewRoman'. Defaults to NULL, 
##' in which case only the background tiles are plotted.
##' @param color a color scheme. One of 'Clustal', 'Chemistry_AA', 
##' 'Shapely_AA', 'Zappo_AA', 'Taylor_AA', 'LETTER', 'CN6', 'Chemistry_NT', 
##' 'Shapely_NT', 'Zappo_NT', 'Taylor_NT'. Defaults to 'Chemistry_AA'.
##' @param seq_colname the column name of the MSA in tree$data
##' @param ... additional parameters for 'geom_msa'
##' @export
##' @importFrom ggtree geom_facet
##' @return ggplot object 
##' @author Lang Zhou
treeMSA_plot <- function(p_tree, 
                         tidymsa_df, 
                         ancestral_node = "none", 
                         sub = FALSE,
                         panel = "MSA",
                         font = NULL,
                         color = "Chemistry_AA",
                         seq_colname = NULL,
                         ...) {
  
  # identical() keeps the test scalar even when ancestral_node is a vector
  show_ancestor <- !identical(ancestral_node, "none")
  
  if(show_ancestor && is.null(seq_colname)) {
    stop("Please specify the column name of the MSA in tree$data via 'seq_colname'!")
  } 
  
  
  if(show_ancestor) {
    p_tree <- adjust_ally(p_tree, node = ancestral_node, 
                          sub = sub,
                          seq_colname = seq_colname)
    
    tidymsa_df <- extract_seq(p_tree, 
                              seq_colname = seq_colname)
  }
  
  p <- p_tree + geom_facet(geom = geom_msa, 
                      data = tidymsa_df,  
                      panel = panel,
                      font = font, 
                      color = color,
                      ...)
  
  if(!show_ancestor) {
    p <- p + geom_tiplab(offset = 0.002)
  }
  
  p
}

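##' Adjust the tree branch positions after assigning ancestor nodes
##'
##' @title adjust_ally
##' @param tree ggtree object
##' @param node internal node(s) in the tree, or "all"
##' @param sub logical value. If TRUE, all internal nodes in the subtree 
##' of each given node are treated as ancestor nodes as well.
##' @param seq_colname the column name of the MSA in tree$data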
##' @importFrom ggtree geom_tiplab
##' @importFrom ggplot2 aes_
##' @importFrom utils getFromNamespace
##' @return tree
##' @export
##' @author Lang Zhou

adjust_ally <- function(tree, node, sub = FALSE, seq_colname = "mol_seq") {
  getSubtree <- getFromNamespace("getSubtree", "ggtree")
  
  if(identical(node, "all")){
    d <- tree$data
    ancestor_n <- d[!d$isTip & !is.na(d[,seq_colname][[1]]),"node"][[1]]
  }else {
    
    if(sub){
      ancestor_n <- lapply(node, function(i) {
        sub_tree <- getSubtree(tree,node = i)
        sub_ancestor <- sub_tree[!sub_tree$isTip,]
        ancestor_n <- sub_ancestor$node
        return(ancestor_n)
      })%>% unlist %>% unique
    }else {
      ancestor_n <- node
    }
    
  }
  
  for (i in ancestor_n) {
    tree <- adjust_treey(tree = tree, node = i)
  }
  
  tree$data$node_color <- "black"
  tree$data[tree$data$node %in% ancestor_n,"node_color"] <- "red"
  tree <- tree + geom_tiplab(aes_(color = ~I(node_color)),offset = 0.002)
  return(tree)
}

##' Extract ancestor sequences from tree data
##'
##' @title extract_seq
##' @param tree_adjust ggtree object
##' @param seq_colname the column name of the MSA in tree$data
##' @return tidy data frame of the sequences, as produced by tidy_msa()
##' @export
##' @author Lang Zhou
extract_seq <- function(tree_adjust, seq_colname = "mol_seq") {
  data <- tree_adjust$data
  seq <- data[data$isTip,seq_colname][[1]]
  names(seq) <- data[data$isTip,]$label
  tidy <- tidy_msa(seq)
  return(tidy)
}


adjust_treey <- function(tree, node) {
  tree$data$isTip[tree$data$node == node] <- TRUE
  tree$data$label[tree$data$node == node] <- 
    tree$data$name[tree$data$node == node]
  
  y_ancestor <- tree$data$y[tree$data$node == node]
  tree$data$y[tree$data$y > y_ancestor] <- 
    tree$data$y[tree$data$y > y_ancestor] + 1
  tree$data$y[tree$data$node == node] <- 
    tree$data$y[tree$data$node == node] %>% ceiling
  return(tree)
}
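
## A minimal sketch of the y-coordinate bookkeeping in adjust_treey() above,
## on a bare numeric vector instead of a ggtree object: every row above the
## chosen internal node is pushed up by one, and the node itself is rounded up
## into the freed row, so the ancestral sequence gets its own line in the MSA
## panel. This assumes the internal node sits between two tip rows (a
## fractional y); `sketch_insert_row` is a hypothetical name for illustration.
sketch_insert_row <- function(y, i) {
    y_anc <- y[i]
    above <- y > y_anc         # rows drawn above the ancestor
    y[above] <- y[above] + 1   # shift them up to open a gap
    y[i] <- ceiling(y[i])      # move the ancestor into the gap
    y
}
## e.g. four tips at y = 1..4 plus an internal node at y = 2.5:
## sketch_insert_row(c(1, 2, 3, 4, 2.5), i = 5) gives 1, 2, 4, 5, 3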










================================================
FILE: R/arc.R
================================================
##' Plots nucleotide secondary structure as helices in an arc diagram
##'
##' @title gghelix
##' @param helix_data a data frame of nucleotide secondary structure, 
##' as read from a file by readSSfile().
##' @param overlap Logical. If TRUE, two structure data sets named 'known' 
##' and 'predicted' must be given (e.g. helix_data = list(known = data1, 
##'                                              predicted = data2)). 
##' Predicted helices that are known are plotted on top, predicted helices 
##' that are not known on the bottom, and finally unpredicted helices 
##' on top in black.
##' @param color_by rule used to generate colors for helices, 
##' one of "length" and "value".
##' @return ggplot object
##' @export
##' @examples
##' RF03120 <- system.file("extdata/Rfam/RF03120_SS.txt", package="ggmsa")
##' helix_data <- readSSfile(RF03120, type = "Vienna")
##' gghelix(helix_data)
##' @author Lang Zhou
gghelix <- function(helix_data, color_by = "length",overlap = FALSE){
    if(is.data.frame(helix_data)) {
        helix_tidy <- tidy_helix(helix_data, color_by = color_by)
    }else {
        helix_tidy <- tidy_list_helix(helix_data, color_by = color_by)
    }
    ly <- layer_helix(helix_data = helix_tidy, overlap = overlap)
    p <- ggplot() + ly + theme_helix()
    return(p)
}

##' The layer of helix plot
##'
##' @title geom_helix
##' @param helix_data a data frame of nucleotide secondary structure, 
##' as read from a file by readSSfile().
##' @param overlap Logical. If TRUE, two structure data sets named 'known' 
##' and 'predicted' must be given (e.g. helix_data = list(known = data1, 
##'                                              predicted = data2)). 
##' Predicted helices that are known are plotted on top, 
##' predicted helices that are not known on the bottom, and finally 
##' unpredicted helices on top in black.
##' @param color_by rule used to generate colors for helices, 
##' one of "length" and "value".
##' @param ... additional parameter
##' @return ggplot2 layers
##' @export
##' @examples
##' RF03120 <- system.file("extdata/Rfam/RF03120_SS.txt", package="ggmsa")
##' RF03120_fas <- system.file("extdata/Rfam/RF03120.fasta", package="ggmsa")
##' SS <- readSSfile(RF03120, type = "Vienna")
##' ggmsa(RF03120_fas, font = NULL, border = NA, 
##'      color = "Chemistry_NT", seq_name = FALSE) +
##' geom_helix(SS)
##' @author Lang Zhou
geom_helix <- function(helix_data, color_by = "length", overlap = FALSE,  ...) {
  structure(list(helix_data = helix_data,
                 color_by = color_by,
                 overlap = overlap),
            class = "nucleotideeHelix")
}

##' Read secondary structure file
##'
##' @title readSSfile
##' @importFrom utils read.delim
##' @param file A text file of secondary structure in one of the formats 
##' listed under 'type'
##' @param type file type, one of "Helix", "Connect", "Vienna" and "Bpseq"
##' @return data frame
##' @importFrom R4RNA readHelix
##' @importFrom R4RNA readConnect
##' @importFrom R4RNA readVienna
##' @importFrom R4RNA readBpseq
##' @importFrom R4RNA expandHelix
##' @importFrom R4RNA collapseHelix
##' @export
##' @examples
##' RF03120 <- system.file("extdata/Rfam/RF03120_SS.txt", package="ggmsa")
##' helix_data <- readSSfile(RF03120, type = "Vienna")
##' @author Lang Zhou
readSSfile <- function(file, type = NULL) {
    type <- match.arg(type, c("Helix", "Connect", "Vienna", "Bpseq"))
    load_data <- switch(type,
                        Helix = readHelix(file),
                        Connect = readConnect(file),
                        Vienna = readVienna(file),
                        Bpseq = readBpseq(file))

    data <- collapseHelix(load_data)
    return(data)

}

tidy_list_helix <- function(helix_data, color_by = "length"){
  known <- tidy_helix(helix_data$known, color_by = color_by)
  predicted <-  tidy_helix(helix_data$predicted, color_by = color_by)
  return(list(known = known, predicted = predicted))
}

tidy_helix <- function(helix_data, color_by = "length"){
    helix_data <- color_helix(helix_data, color = color_by)
    names(helix_data)[c(1,2)] <- c("from","to")
    helix_data$x0 <- (helix_data$to + helix_data$from)/2
    helix_data$r <- (helix_data$to - helix_data$from)/2
    return(helix_data)
}
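
## Illustrative sketch (not part of ggmsa): the arc geometry computed by
## tidy_helix(), assuming base pairs are given as plain numeric positions.
## Each pair (from, to) is drawn as a half circle centred midway between the
## two positions, with a radius of half their distance. The name
## `demo_arc_geometry` is hypothetical and used only for this example.
demo_arc_geometry <- function(from, to) {
    data.frame(from = from,
               to = to,
               x0 = (to + from) / 2,  # centre of the arc on the x axis
               r = (to - from) / 2)   # radius of the arc
}
## e.g. demo_arc_geometry(from = c(1, 5), to = c(11, 8))
## gives x0 = 6, 6.5 and r = 5, 1.5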

color_helix <- function(helix_data, color){
    #color <- match.arg(color, c("length", "value"))
    if(color == "length"){
        data_color <- colorBy_length(helix_data)
    }else if(color == "value") {
        data_color <- colorBy_value(helix_data)
    }else {
      helix_data$col <- color
      data_color <- helix_data
    }
      data <- expandHelix(data_color)
      return(data)
}

colorBy_length <- function(helix_data){
    pal_length <- colorRampPalette(brewer.pal(name = "Paired", n = 12))
    helix_data$col <- nrow(helix_data) %>% pal_length()
    return(helix_data)
}

colorBy_value <- function(helix_data){
    pal_value <- colorRampPalette(rev(brewer.pal(name = "Blues", n = 4)))
    helix_data$col <- nrow(helix_data) %>% pal_value()
    return(helix_data)
}

##' @importFrom ggforce geom_arc
layer_helix <- function(helix_data, overlap = FALSE, seq_numbers = 0){
    mapping_above <- aes_(x0 = ~x0, 
                          y0 = ~(seq_numbers + 0.5), 
                          r = ~r, start = ~1.5*pi, 
                          end = ~2.5*pi)
    mapping_below <- aes_(x0 = ~x0, 
                          y0 = ~(-0.5), 
                          r = ~r, start = ~pi/2, 
                          end = ~1.5*pi)
    if(seq_numbers > 0) {
        mapping_below <- modifyList(mapping_below, aes_(y0 = ~0))
    }
    if(is.list(helix_data) & "col" %in% names(helix_data[[2]])) {
        mapping_above <- modifyList(mapping_above, aes_(color = ~I(col)))
        mapping_below <- modifyList(mapping_below, aes_(color = ~I(col)))
      }

    if(overlap) {
        if(!is.list(helix_data)| length(helix_data) != 2){
            stop("Overlapping structures must be input as a list of
                 2 helix data frames.
                 (e.g. helix_data = list(known = data1, predicted = data2))")
        }
        if(!names(helix_data) %in% c("known", "predicted") %>% all) {
            stop("helix_data names must be 'known' and 'predicted'. 
                 (e.g. helix_data = list(known = data1, predicted = data2))")
        }

        overlap_data <- overlap_helix(known = helix_data[["known"]],
                                      predicted = helix_data[["predicted"]])

        if (overlap_data[["above_justknown"]] %>% nrow == 0){
            ly_up <- geom_arc(data = overlap_data[["above_both"]],
                              mapping = mapping_above)
            ly_below <- geom_arc(data = overlap_data[["below"]], 
                                 mapping = mapping_below)
            return(list(ly_up, ly_below))

        }else {
            ly_up <- geom_arc(data = overlap_data[["above_both"]],
                              mapping = mapping_above)
            ly_up_justknown <- 
              geom_arc(data = overlap_data[["above_justknown"]], 
                       mapping = mapping_above, 
                       color = "black")
            
            ly_below <- geom_arc(data = overlap_data[["below"]], 
                                 mapping = mapping_below)
            return(list(ly_up, ly_up_justknown, ly_below))
        }

    }else {#overlap = FALSE
        if(is.list(helix_data) & length(helix_data) == 2) {
            if(!"col" %in% names(helix_data[["known"]])) {
                mapping_below <- modifyList(mapping_below, 
                                            aes_(color = I("#8fce5e")))
            }
            ly_up <- geom_arc(data = helix_data[["known"]], 
                              mapping = mapping_below)
            ly_below <- geom_arc(data = helix_data[["predicted"]], 
                                 mapping = mapping_above)
            return(list(ly_up, ly_below))

        }else if(is.data.frame(helix_data)){
            if("col" %in% names(helix_data)){
                mapping_above <- modifyList(mapping_above, 
                                            aes_(color = ~I(col)))
            }
            ly_arc <- geom_arc(data = helix_data, mapping = mapping_above)
            return(ly_arc)
        }else {
            stop("Only a data frame or a list of 2 helix data frames is allowed.
                 e.g. helix_data = data or 
                 helix_data = list(known = data1, predicted = data2)")
        }
    }
}

overlap_helix <- function(known, predicted){
    if(!c("from", "to") %in% names(known) %>% all) {
        stop("'known' must be an output of 'readSSfile()'")
    }
    if(!c("from", "to") %in% names(predicted) %>% all) {
        stop("'predicted' must be an output of 'readSSfile()'")
    }

    known$heli <- paste0(known$from, "t",known$to)
    predicted$heli <- paste0(predicted$from, "t", predicted$to)

    below <- predicted[!predicted$heli %in% known$heli,] #predicted & not known
    above_both <- predicted[predicted$heli %in% known$heli,] #predicted & known
    above_justknown <- known[!known$heli %in% above_both$heli,] #unpredicted & known

    return(list(below = below,
                above_both = above_both,
                above_justknown = above_justknown))
}
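
## Illustrative sketch (not part of ggmsa): how overlap_helix() partitions base
## pairs into three groups. The two small data frames and the name
## `demo_overlap` are assumptions made for this example only.
demo_overlap <- function() {
    known <- data.frame(from = c(1, 2, 3), to = c(20, 19, 18))
    predicted <- data.frame(from = c(1, 2, 7), to = c(20, 19, 12))
    # a base pair is identified by the string "<from>t<to>"
    key_known <- paste0(known$from, "t", known$to)
    key_pred <- paste0(predicted$from, "t", predicted$to)
    list(below = predicted[!key_pred %in% key_known, ],        # predicted only
         above_both = predicted[key_pred %in% key_known, ],    # in both sets
         above_justknown = known[!key_known %in% key_pred, ])  # known only
}
## Here pairs 1-20 and 2-19 land in 'above_both', pair 7-12 in 'below'
## and pair 3-18 in 'above_justknown'.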

##' @importFrom ggplot2 theme_void
##' @importFrom ggplot2 element_text
##' @importFrom grid arrow
theme_helix <- function(){
    list(theme_void(),
         scale_y_continuous(breaks = 0),
         coord_fixed(),
         theme(panel.grid.major.y = element_line(size = 1, arrow = arrow(length = unit(0.3, 'cm'))),
               panel.grid.major.x = element_line(color = "#eaeaea", size = 0.4),
               axis.text.x = element_text())
         )
  }






================================================
FILE: R/available.R
================================================
##' This function lists font families currently available 
##' that can be used by 'ggmsa'
##'
##'
##' @title List Font Families currently available
##' @return No return value. The available font family names are
##' printed as a message.
##' @examples available_fonts()
##' @export
##' @author Lang Zhou
available_fonts <- function(){
    message("font families currently available:" )
    font <- paste(names(font_fam), collapse = ' ')
    message(font, "\n")
}

##' This function lists color schemes currently available that
##'  can be used by 'ggmsa'
##'
##'
##' @title List Color Schemes currently available
##' @return No return value. The available color schemes are
##' printed as a message.
##' @examples available_colors()
##' @export
##' @author Lang Zhou
available_colors <- function(){
    message("1.color schemes for nucleotide sequences currently available:")
    color_nt <- paste(names(scheme_NT), collapse = ' ')
    message(color_nt, "\n")
    
    message("2.color schemes for AA sequences currently available:")
    color_aa <- paste(names(scheme_AA), collapse = ' ')
    message("Clustal ", color_aa, "\n")
}

##' This function lists MSA objects currently available that
##'  can be used by 'ggmsa'
##'
##'
##' @title List MSA objects currently available
##' @return No return value. The supported MSA inputs are
##' printed as a message.
##' @examples available_msa()
##' @export
##' @author Lang Zhou
available_msa <- function(){
    message("1.files currently available:")
    message(".fasta",'\n')

    message("2.XStringSet objects from 'Biostrings' package:")
    mes <- paste(supported_msa_class[!grepl("bin", supported_msa_class)],
                 collapse = ' ')
    message(mes, '\n')

    message("3.bin objects:")
    mes_bin <- paste(supported_msa_class[grepl("bin", supported_msa_class)],
                     collapse = ' ')
    message(mes_bin, '\n')

}



================================================
FILE: R/clustal.R
================================================
##'  The Clustal color scheme. The algorithm assigns colors
##'   to each column of a multiple sequence alignment.
##'
##' @param y sequence alignment with data frame, generated by tidy_msa().
##' @keywords clustal
##' @noRd
color_Clustal <- function(y) {
    char_freq <- lapply(split(y, y$position), function(x) table(x$character))
    col_convert <- lapply(char_freq, function(seq_column) {
        ##The white as the background
        clustal <- rep("#ffffff", length(seq_column)) 
        names(clustal) <- names(seq_column)
        r <- seq_column/sum(seq_column)
        for (pos in seq_along(seq_column)) {
            char <- names(seq_column)[pos]
            i <- grep(char, scheme_clustal$re_position)
            for (j in i) {
                if (scheme_clustal$type[j] == "combined"){
                    rr <- sum(r[strsplit(scheme_clustal$re_gp[j], '')[[1]]], 
                              na.rm = TRUE)
                    if (rr > scheme_clustal$thred[j]) {
                        clustal[pos] <- scheme_clustal$colour[j]}
                    } else{
                        rr1<-r[strsplit(scheme_clustal$re_gp[j], ',')[[1]]]
                        if (any(rr1> scheme_clustal$thred[j],na.rm = TRUE) ) {
                            clustal[pos] <- scheme_clustal$colour[j]}
                    }
                break
            }
        }
        return(clustal)
    })

    yy <- split(y, y$position)
    lapply(names(yy), function(n) {
        d <- yy[[n]]
        col <- col_convert[[n]]
        d$color <- col[d$character]
        return(d)
    }) %>% do.call('rbind', .)
}


================================================
FILE: R/color_by_conservation.R
================================================
color_increment <- function(conservation_visibility){
    lapply(seq_len(nrow(conservation_visibility)), function(i){
        color_ramp <- 
            colorRampPalette(colors = 
                                 c(conservation_visibility[i,"color"], 
                                   "#ffffff"))
        
        color_change <- 
            rev(color_ramp(100))[conservation_visibility[i,"visibility"]]
        return(color_change)
        }) %>% unlist 

}


color_visibility <- function(y){
    #options(digits = 2)
    #on.exit()
    conser_data <- bar_data(y)
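    #round the ratio itself: %>% binds tighter than /, so piping into
    #round() would only round the denominator
    conser_data$visibility <- 
        round(conser_data$Freq / length(levels(y[[1]])), 2)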
    conser_data$visibility <- conser_data$visibility * 100
    names(conser_data)[3] <- "position"
    y_filter <- y[c(-1,-3)] 
    conser_ready <- merge(conser_data, y_filter)
    y$color <- color_increment(conser_ready)
    return(y)
}
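
## Illustrative sketch (not part of ggmsa): the fading idea behind
## color_increment(). A 100-step ramp runs from white to the base colour and a
## conservation percentage (1 to 100) picks the step, so weakly conserved
## columns come out paler. The name `demo_fade` is hypothetical.
demo_fade <- function(base_color, percent) {
    ramp <- grDevices::colorRampPalette(c("#ffffff", base_color))(100)
    ramp[percent]
}
## e.g. demo_fade("#ff0000", 100) is the full base colour, while
## demo_fade("#ff0000", 1) is white.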


================================================
FILE: R/color_else.R
================================================
##'  Assigning colors to sequence alignment.
##'
##'
##' @param y sequence alignment with data frame, generated by tidy_msa().
##' @param color a color scheme. One of 'Clustal', 'Chemistry_AA', 
##' 'Shapely_AA', 'Zappo_AA', 'Taylor_AA', 'LETTER', 'CN6', 'Chemistry_NT',
##'  'Shapely_NT', 'Zappo_NT', 'Taylor_NT'. Default is 'Chemistry_AA'.
##' @param custom_color A data frame with two columns named "names" 
##' and "color", used to customize the color scheme.
##' @noRd
color_scheme <- function(y, color = "Chemistry_AA", custom_color = NULL) {
    if (!is.null(custom_color)){
        #Convert to character so that factor columns do not interfere
        custom_color[["names"]] <- as.character(custom_color[["names"]]) 
        #`$col` partially matches a column named "color", "colors" or "colours"
        custom_color[["color"]] <- as.character(custom_color$col)
        row.names(custom_color) <- custom_color[["names"]]
        scheme_AA$custom_color <- 
            custom_color[row.names(scheme_AA), "color"] %>% as.character()
        y$color <- scheme_AA[y$character, "custom_color"]
    }else{
        if(grepl("NT", color)){
            y$color <- scheme_NT[y$character, color]
        } else{
            y$color <- scheme_AA[y$character, color]
        }
    }
    return(y)
}




================================================
FILE: R/cons.R
================================================
##' Remove the colors of characters that should not be highlighted, 
##' according to the consensus sequence (only used in the consensus views).
##'
##' @param y a data frame, sequence alignment with specified color.
##' @param consensus the consensus sequence which can be called by 
##' get_consensus().
##' @param disagreement a logical value. Displays characters that 
##' disagree with the consensus (excludes ambiguous disagreements).
##' @param ref a character string. Specifying the reference sequence
##'  which should be one of input sequences when 'consensus_views' is TRUE.
##' @keywords tidy_color
##' @noRd
tidy_color <- function(y, consensus, disagreement, ref) {
    c <- lapply(unique(y$position), function(i) {
        msa_cloumn <- y[y$position == i, ]
        if(!is.null(ref)) {
            if ('label' %in% names(msa_cloumn)) { ##work for ggtreeExtra
                msa_cloumn <- msa_cloumn[!msa_cloumn$label == ref, ]
            }else{
                msa_cloumn <- msa_cloumn[!msa_cloumn$name == ref, ]
                }
           
        }
        #Get consensus char.
        cons_char <- consensus[consensus$position == i, "character"] 
        
        #Compare the characters of the current position(i) 
        #to the consensus char.
        logic <- msa_cloumn$character == cons_char 
        #Cleaning colors according to the 'logic'.
        if(cons_char == "X") {
            msa_cloumn$color <- NA
        }
        if(disagreement){
            msa_cloumn[logic, "color"] <- NA
        }else{
            msa_cloumn[!logic, "color"] <- NA
        }
        msa_cloumn
    }) %>% do.call("rbind", .)
    return(c)
}

##' calling the consensus sequence.
##'
##' @param tidy sequence alignment with data frame, generated by tidy_msa().
##' @param ignore_gaps a logical value. When selected TRUE, gaps in 
##' column are treated as if that row didn't exist.
##' @param ref a character string. Specifying the reference sequence 
##' which should be one of input sequences when 'consensus_views' is TRUE.
##' @keywords get_consensus
##' @noRd
get_consensus <- function(tidy, ignore_gaps = FALSE, ref = NULL) {
    if(!is.null(ref)) {
        if(ignore_gaps) {
            warning("The argument 'ignore_gaps' is ",
                    "invalid when 'ref' is specified!")
        }
        if ('label' %in% names(tidy)) { ##work for ggtreeExtra
            ref <- match.arg(ref, levels(factor(tidy$label)))
            cons <- tidy[tidy$label == ref,]
        }else {
            ref <- match.arg(ref, levels(tidy$name))
            cons <- tidy[tidy$name == ref,]
        }
        return(cons)
    }
    #Iterate through each column
    cons <- lapply(unique(tidy$position), function(i) { 
        msa_cloumn <- tidy[tidy$position == i, ]
        cons <- data.frame(position = i)
        if(ignore_gaps) {
            msa_cloumn <- msa_cloumn[!msa_cloumn$character %in% "-",]
        }
        #Gets the highest frequency characters
        fre <- table(msa_cloumn$character) %>% data.frame
        max_element <- fre[fre[2] == max(fre[2]),]
        max_number <-  max_element %>% nrow
        if(max_number == 1) {
            cons$character <- max_element[1,1]
        }else {
            cons$character <- "X"
        }
        cons
        }) %>% do.call("rbind", .)

        cons$name = "Consensus"
        cons$character <- as.character(cons$character) #debug 'as.character'
        return(cons)
}
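
## Illustrative sketch (not part of ggmsa): the majority rule applied by
## get_consensus() to a single alignment column. The most frequent character
## wins and a tie yields "X". The name `demo_column_consensus` is hypothetical,
## and the input is assumed to be a non-empty character vector.
demo_column_consensus <- function(chars, ignore_gaps = FALSE) {
    if (ignore_gaps) {
        chars <- chars[chars != "-"]  # treat gap rows as if they did not exist
    }
    freq <- table(chars)
    top <- names(freq)[freq == max(freq)]
    if (length(top) == 1) top else "X"
}
## e.g. demo_column_consensus(c("A", "A", "G")) returns "A",
## while demo_column_consensus(c("A", "G")) returns "X"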


order_name <- function(name, order = NULL, 
                       consensus_views = FALSE,
                       ref = NULL) {
    name_uni <- unique(name)
    if(is.null(ref)){
        #placed 'consensus' at the top
        name_expect <- name_uni[!name_uni %in% "Consensus"] %>%
            rev %>% 
            as.character
        name <- factor(name, levels = c(name_expect, "Consensus"))
    }else {
        name_expect <- name_uni[!name_uni %in% ref] %>%
            rev %>%
            as.character
        name <- factor(name, levels = c(name_expect, ref))
    }

    return(name)
}



================================================
FILE: R/data.R
================================================
#' A sample data used in ggmsa
#'
#' A dataset containing the alignment sequences of 
#' the phenylalanine hydroxylase protein (PH4H) 
#' within nine species
#'
#'
#' @docType data
#' @keywords datasets
#' @name sample.fasta
#' @format A MSA fasta with 9 sequences and 456 positions.
NULL



#' GVariation
#'
#' A folder containing 4 MSA files as a sample
#' data set to identify the sequence recombination event.
#'
#' \itemize{
#'   \item A.Mont.fas MSA with sequences of 'Mont' and 'CF_YL21'
#'   \item B.Oz.fas MSA with sequences of 'Oz' and 'CF_YL21'
#'   \item C.Wilga5.fas MSA with sequences of 'Wilga5' and 'CF_YL21'
#'   \item sample_alignment.fa MSA with sequences of 'Mont', 'CF_YL21', 
#'   'Oz', and 'Wilga5'
#' }
#' @docType data
#' @keywords datasets
#' @name GVariation
#' @format a folder 
#' @source \url{https://link.springer.com/article/10.1007/s11540-015-9307-3}
NULL



#' Rfam
#'
#' A folder containing seed alignment sequences and 
#' corresponding consensus RNA secondary structure. 
#'
#' \itemize{
#'   \item RF00458.fasta seed alignment sequences of Cripavirus internal 
#'   ribosome entry site (IRES)
#'   \item RF03120.fasta seed alignment sequences of Sarbecovirus 5'UTR
#'   \item RF03120_SS.txt consensus RNA secondary structure of 
#'   Sarbecovirus 5'UTR
#'  
#' }
#' @docType data
#' @keywords datasets
#' @name Rfam
#' @format a folder 
#' @source \url{https://rfam.xfam.org/}
NULL



#' Gram-negative_AKL
#'
#' Amino acids in the adenylate kinase lid (AKL) domain
#' from Gram-negative bacteria. 
#'
#' @docType data
#' @keywords datasets
#' @name Gram-negative_AKL.fasta
#' @format A MSA fasta with 100 sequences and 36 positions.
#' @source \url{http://biovis.net/year/2013/info/redesign-contest}
NULL



#' Gram-positive_AKL
#'
#' Amino acids in the adenylate kinase lid (AKL) domain
#' from Gram-positive bacteria. 
#'
#' @docType data
#' @keywords datasets
#' @name Gram-positive_AKL.fasta
#' @format A MSA fasta with 100 sequences and 36 positions.
#' @source \url{http://biovis.net/year/2013/info/redesign-contest}
NULL



#' A sample DNA alignment sequences
#'
#' DNA alignment sequences with 24 sequences and 56 positions.
#'
#'
#' @docType data
#' @keywords datasets
#' @name LeaderRepeat_All.fa
#' @format A MSA fasta 
NULL



#' microRNA data used in ggmsa
#'
#' FASTA-format mature miRNA sequences 
#' from miRBase
#'
#'
#' @docType data
#' @keywords datasets
#' @name seedSample.fa
#' @format A MSA fasta with 6 sequences and 22 positions.
#' @source \url{https://www.mirbase.org/ftp.shtml}
NULL



#' sequence-link-tree
#'
#' Alignment sequences used to demonstrate circular MSA layout
#'
#' @docType data
#' @keywords datasets
#' @name sequence-link-tree.fasta
#' @format A MSA fasta with 28 sequences and 480 positions.
NULL



#' TP53 MSA
#'
#' Alignment sequences used to show graphical combination
#'
#' @docType data
#' @keywords datasets
#' @name tp53.fa
#' @format A MSA fasta with 5 sequences and 404 positions.
NULL



#' genome locus
#'
#' The local genome map shows the 30000 sites around the TP53 gene.
#'
#' @docType data
#' @keywords datasets
#' @name TP53_genes.xlsx
#' @format xlsx
NULL



================================================
FILE: R/dms.R
================================================
##' assign dms value to alignments.
##'
##' @title assign_dms
##' @param x data frame from tidy_msa()
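##' @param dms dms data frame with the columns 'site_RBD', 'wildtype', 
##' 'mutation_RBD' and 'bind_avg'
##' @return a data frame: 'x' with added 'mutation' and 'bind_avg' columns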
##' @export
##' @author Lang Zhou

assign_dms <- function(x, dms) {
    dms_value <- lapply(unique(x$position), function(i) {
        xx <- x[x$position == i,]
        dmss <- dms[dms$site_RBD == i,]
        
        wt <- unique(dmss[,"wildtype"])
        xx$mutation <- paste0(wt, xx$position, xx$character)
        xx$bind_avg  <- lapply(seq_along(xx$mutation),function(j) {
            bind_avg <- dmss[dmss$mutation_RBD %in% xx[j,"mutation"],"bind_avg"]
            return(bind_avg)
        }) %>% unlist
        
        return(xx)
    }) %>% do.call("rbind",.)
    return(dms_value )
}









================================================
FILE: R/facet_msa.R
================================================
##' The MSA is split into fields (facets) of the width that you set.
##'
##' @title segment MSA
##' @param field a numeric vector of the field size.
##' @return ggplot layers
##' @examples
##' library(ggplot2)
##' f <- system.file("extdata/sample.fasta", package="ggmsa")
##' # 2 fields
##' ggmsa(f, end = 120, font = NULL, color="Chemistry_AA") + 
##'   facet_msa(field = 60)
##' # 3 fields
##' ggmsa(f, end = 120, font = NULL,  color="Chemistry_AA") + 
##'   facet_msa(field = 40)
##' @export
##' @author Lang Zhou
facet_msa <- function(field) {
    structure(list(field = field),
              class = "facet_msa"
              )
}

facet_data <- function(msaData, field) {

    if(min(msaData$position) > 1){
        pos_reset <- msaData$position - min(msaData$position)
        pos_reset[pos_reset == 0] <- 1
    }else {
        pos_reset <- msaData$position
    }
    msaData$facet <- pos_reset %/% field


    msaData[(pos_reset %% field) == 0, "facet"] <- 
        msaData[(pos_reset %% field) == 0, "facet"] - 1

    return(msaData)

}
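
## Illustrative sketch (not part of ggmsa): the facet index arithmetic of
## facet_data() for positions that start at 1. Integer division assigns the
## field, and exact multiples of `field` are pulled back by one so that they
## close the current field instead of opening the next. The name
## `demo_facet_index` is hypothetical.
demo_facet_index <- function(position, field) {
    facet <- position %/% field
    on_boundary <- position %% field == 0
    facet[on_boundary] <- facet[on_boundary] - 1
    facet
}
## e.g. with field = 60, positions 1, 59 and 60 fall in facet 0 and
## positions 61 and 120 fall in facet 1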









================================================
FILE: R/geom_GC.R
================================================
##' Multiple sequence alignment layer for ggplot2. 
##' It plots points showing the GC content.
##'
##' @title geom_GC
##' @param show.legend logical. Should this layer be included in the legends?
##' @return a ggplot layer
##' @examples
##' #plot GC content
##' f <- system.file("extdata/LeaderRepeat_All.fa", package="ggmsa")
##' ggmsa(f, font = NULL, color="Chemistry_NT") + geom_GC()
##' @export
##' @author Lang Zhou
geom_GC <- function(show.legend = FALSE) {
    structure(list(show.legend = show.legend),
              class = "GCcontent")
}


geom_GC1 <- function(tidyData, show.legend = FALSE){
    tidy <- tidyData
    #tidy <- tidy_msa(msa = msa, start = start, end = end)
    GC_pos <- getOption("GC_pos")

    GC <- content_GC(tidy)
    GC <-GC[GC$character == "GC",]
    col_num <- levels(factor(tidy$position))
    col_len <- length(col_num) + GC_pos
    ly_GC <- geom_point(data = GC,
                        mapping = aes_(x = ~col_len, 
                                       y = ~ypos, 
                                       size = ~fre),
                        color = "#51a6e9", 
                        na.rm = TRUE, 
                        show.legend = show.legend)
    return(ly_GC)
}

##' Get the GC content of each sequence.
##'
##' @title content_GC
##' @param data  Multiple aligned sequence files or objects
##'  for representing nucleotide sequences
##' @return A data frame
##' @noRd
##' @author Lang Zhou
content_GC<- function(data){
    tidy <- data
    tidy$name <- factor(tidy$name, levels = unique(tidy$name))
    tidy$ypos <- as.numeric(tidy$name)
    seq_num <- unique(tidy$ypos)
    lchar_num <- lapply(seq_num, function(j){
        clo <- tidy[tidy$ypos == j, ]
        y <- prop.table(table(clo$character))
        y["GC"] <- y["G"] + y["C"]
        num <-setNames(rep(0,5), c("A", "T", "G", "C", "GC"))
        num[names(y)] <- y
        return(num)
    })

    char_num <- do.call(rbind,lchar_num)
    char_num <- as.data.frame(char_num)
    char_num["ypos"] =  seq_num
    char_num2 <- gather(char_num,character,fre, "A", "T", "C","G","GC")
    return(char_num2)
}
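
## Illustrative sketch (not part of ggmsa): the GC fraction of one aligned
## sequence, given as a vector of single characters. As in content_GC(), gaps
## stay in the denominator. Unlike content_GC(), this version returns a number
## even when "G" or "C" is absent. The name `demo_gc_fraction` is hypothetical.
demo_gc_fraction <- function(chars) {
    mean(chars %in% c("G", "C"))
}
## e.g. demo_gc_fraction(strsplit("GGCA-", "")[[1]]) is 3 out of 5 characters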





================================================
FILE: R/geom_asterisk.R
================================================
##' a ggplot2 layer of asterisk as a polygon
##'
##'
##' @title a ggplot2 layer of asterisk as a polygon
##' @param mapping aes mapping
##' @param data a data frame
##' @param stat the statistical transformation to use on the data 
##' for this layer, as a string.
##' @param position position adjustment, either as a string, 
##' or the result of a call to a position adjustment function.
##' @param na.rm a logical value
##' @param show.legend a logical value
##' @param inherit.aes a logical value
##' @param ... additional parameters
##' @importFrom ggplot2 layer
##' @return ggplot2 layer
## @export
##' @noRd
##' @author Lang Zhou
##' @examples
##' #library(ggplot2)
##' #ggplot(mtcars, aes(mpg, disp)) + geom_asterisk()
geom_asterisk <- function(mapping = NULL, 
                          data = NULL, 
                          stat = "identity",
                          position = "identity", 
                          na.rm = FALSE, 
                          show.legend = NA,
                          inherit.aes = TRUE, ...) {

  layer(geom = Geomasterisk, 
        mapping = mapping, 
        data = data, 
        stat = stat,
        position = position,
        show.legend = show.legend,
        inherit.aes = inherit.aes,
        params = list(na.rm = na.rm, ...))
}

##' @importFrom grid polygonGrob
##' @importFrom grid gpar
SeedStar <- function(x = NULL , y = NULL) {

    char_width <- getOption("asterisk_width")
    char_scale_2 <- getOption("char_scale_2")

    x_width <- char_scale_2 * diff(range(star$y))
    star$x = star$x * x_width/diff(range(star$x))

    char_scale <- diff(range(star$x))/diff(range(star$y))
    star$x = star$x * (char_width * char_scale)/diff(range(star$x))
    star$y = star$y * char_width/diff(range(star$y))

    star$x = star$x - min(star$x)  - (char_width * char_scale)/2 + x
    star$y = star$y - min(star$y)  - char_width/2 + y

    polygonGrob(star$x, star$y, gp = gpar(fill = "black") )

}


##' @importFrom ggplot2 ggproto
##' @importFrom ggplot2 Geom
##' @importFrom ggplot2 draw_key_polygon
##' @importFrom ggplot2 aes
##' @importFrom grid gTree
Geomasterisk <- ggproto("Geomasterisk", Geom,
                         required_aes = c("x", "y"),
                         default_aes = aes(fill = "black"),
                         draw_key = draw_key_polygon,

                         draw_panel = function(data, panel_params, coord) {
                             data <- coord$transform(data, panel_params)
                             grobs <- lapply(seq_len(nrow(data)), function(i) {
                                          SeedStar(data$x[i], data$y[i])
                                      })
                             class(grobs) <- "gList"
                             ggplot2:::ggname("geom_asterisk", 
                                              gTree(children = grobs))
                         }

)






================================================
FILE: R/geom_msa.R
================================================
##' Multiple sequence alignment layer for ggplot2. 
##' It creates background tiles with/without sequence characters.
##'
##' @title geom_msa
##' @param data sequence alignment with data frame, generated by tidy_msa().
##' @param font font families, possible values are 'helvetical', 'mono', 
##' 'DroidSansMono' and 'TimesNewRoman'. Default is 'helvetical'.
##' If font = NULL, only plot the background tile.
##' @param mapping aes mapping
##' @param color A color scheme. One of 'Clustal', 'Chemistry_AA', 'Shapely_AA',
##'  'Zappo_AA', 'Taylor_AA', 'LETTER', 'CN6', 'Chemistry_NT', 'Shapely_NT', 
##'  'Zappo_NT', 'Taylor_NT'. Default is 'Chemistry_AA'.
##' @param custom_color A data frame with two columns named "names" and 
##' "color", used to customize the color scheme.
##' @param char_width a numeric vector. Specifying the character width in 
##' the range of 0 to 1. Default is 0.9.
##' @param by_conservation a logical value. The most conserved regions have
##'  the brightest colors.
##' @param none_bg a logical value indicating whether the background 
##' tiles should be omitted. Default is FALSE.
##' @param position_highlight A numeric vector of the positions that
##'  need to be highlighted.
##' @param seq_name a logical value indicating whether sequence names
##'  should be displayed. Default is 'NULL', which displays the sequence
##'  names when 'font = NULL' and hides them when a font is set.
##'  If 'seq_name = TRUE' the sequence names are always displayed.
##'  If 'seq_name = FALSE' the sequence names are never displayed.
##' @param border a character string. The border color.
##' @param consensus_views a logical value indicating whether to enable
##'  the consensus view.
##' @param use_dot a logical value. Displays characters as dots instead of
##'  fading their color in the consensus view.
##' @param disagreement a logical value. Displays characters that disagree
##'  with the consensus (excludes ambiguous disagreements).
##' @param ignore_gaps a logical value. When selected TRUE, 
##' gaps in column are treated as if that row didn't exist.
##' @param ref a character string. Specifying the reference sequence
##'  which should be one of input sequences when 'consensus_views' is TRUE.
##' @param position Position adjustment, either as a string, or
##'  the result of a call to a position adjustment function,
##' default is 'identity' meaning 'position_identity()'.
##' @param show.legend logical. Should this layer be included in the legends?
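##' @param dms logical. If TRUE, the tiles are filled by the 'bind_avg' 
##' column of the data.
##' @param position_color logical. If TRUE, the tiles are filled by the 
##' 'pos_color' column of the data.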
##' @param ... additional parameter
##' @return A list
##' @importFrom ggplot2 scale_fill_manual
##' @importFrom utils modifyList
##' @export
##' @examples
##' library(ggplot2)
##' aln <- system.file("extdata", "sample.fasta", package = "ggmsa")
##' tidy_aln <- tidy_msa(aln, start = 150, end = 170)
##' ggplot() + geom_msa(data = tidy_aln, font = NULL) + coord_fixed()
##' @author Guangchuang Yu, Lang Zhou
geom_msa <- function(data, font = "helvetical",
                     mapping = NULL,
                     color = "Chemistry_AA",
                     custom_color = NULL,
                     char_width = 0.9,
                     none_bg = FALSE,
                     by_conservation = FALSE,
                     position_highlight = NULL,
                     seq_name = NULL,
                     border = NULL,
                     consensus_views = FALSE,
                     use_dot = FALSE,
                     disagreement = TRUE,
                     ignore_gaps = FALSE,
                     ref = NULL,
                     position = "identity",
                     show.legend = FALSE,
                     dms = FALSE,
                     position_color = FALSE,
                     ... ) {

    data <- msa_data(data,
                     font = font,
                     color = color,
                     custom_color = custom_color,
                     char_width = char_width,
                     by_conservation = by_conservation,
                     consensus_views  = consensus_views,
                     use_dot = use_dot,
                     disagreement = disagreement,
                     ignore_gaps = ignore_gaps,
                     ref = ref)

    #legend work
    xx <- data[,c("character","color")] %>% unique()
    xx <- xx[!is.na(xx$color),]
    labs <- lapply(unique(xx$color) %>% seq_along, function(i) {
        cols <- unique(xx$color)[i]
        dup_char <- xx[xx$color == cols, "character"]
        lab <- paste0(dup_char, collapse = ",")
    }) %>% do.call("rbind",.) %>% as.vector()

    cols <- xx$color %>% unique()
    names(cols) <- cols
    sacle_tile_cols <- scale_fill_manual(values = cols,
                                         breaks = cols,
                                         labels = labs)


    bg_data <- data

    #work to ggtreeExtra
    if (is.null(mapping)) {
        mapping <- aes_(x = ~position, y = ~name, fill = ~I(color))
    }
    
    #dms color work
    if (dms) {
        mapping <- modifyList(mapping, aes_(fill = ~bind_avg))
    }
    if (position_color) {
        mapping <- modifyList(mapping, aes_(fill = ~I(pos_color)))
    }
    
    
    #'seq_name' work
    if (!isTRUE(seq_name)) {
        if ('y' %in% colnames(data) || isFALSE(seq_name) ) {
            y <- as.numeric(bg_data$name)
            mapping <- modifyList(mapping, aes_(y = ~y)) #"~y" is seq numbers
        }
    }

    #'position_highlight' work
    if (!is.null(position_highlight)) {
        none_bg = TRUE
        bg_data <- bg_data[bg_data$position %in% position_highlight,]
        bg_data$postion <- as.factor(bg_data$position)
        mapping <- modifyList(mapping, aes_(x = ~position, 
                                            fill = ~color, 
                                            width = 1))
    }

    #'border' work
    border_color <- if (is.null(border)) 'grey' else border
    ly_bg <- geom_tile(mapping = mapping, data = bg_data, color = border_color, 
                       inherit.aes = FALSE, position = position, 
                       show.legend = show.legend)

    if (!all(c("yy", "order", "group") %in% colnames(data))) {
        if(position_color) {
            return(list(ly_bg))
        }else{
            return(list(ly_bg, sacle_tile_cols))
        }
    }

    if ('y' %in% colnames(data)) {
        data$yy = data$yy - as.numeric(data$name) + data$y
    }

    label_mapping <- aes_(x = ~x, y = ~yy, group = ~group)

    # use_dot work
    if (consensus_views && !use_dot) {
        if(show.legend) {
            stop("legends can't be shown in the consensus view!")
        }
        label_mapping <- modifyList(label_mapping, aes_(fill = ~I(font_color)))
    }
    ly_label <- geom_polygon(mapping = label_mapping, data = data, 
                             inherit.aes = FALSE, position = position)

    #'none_bg' work
    if (none_bg && is.null(position_highlight)) {
        return(ly_label)
    }

    if(consensus_views) {
        return(list(ly_bg, ly_label))
    }else {
        if(position_color){
            return(list(ly_bg, ly_label))
        }else{
            return(list(ly_bg, ly_label, sacle_tile_cols))
        }
    }

}
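
# Illustrative sketch (not part of the package API): the "legend work" step
# above groups the characters that share a fill color into one legend label.
# The helper name `sketch_legend_labels` and the toy input in the usage note
# are hypothetical; only base R is used.
sketch_legend_labels <- function(xx) {
    # keep unique character/color pairs and drop characters without a color
    xx <- unique(xx[!is.na(xx$color), c("character", "color")])
    cols <- unique(xx$color)
    # one comma-separated label per color, named by that color
    vapply(cols, function(col) {
        paste0(xx$character[xx$color == col], collapse = ",")
    }, character(1))
}
# Usage, assuming a data frame with 'character' and 'color' columns:
# toy <- data.frame(character = c("A", "G", "C", "-"),
#                   color = c("#ff0000", "#ff0000", "#0000ff", NA))
# sketch_legend_labels(toy)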



================================================
FILE: R/geom_msaBar.R
================================================
##' Multiple sequence alignment layer for ggplot2.
##' It plots a sequence conservation bar.
##'
##' @title geom_msaBar
##'
##' @return A list
##' @examples
##' #plot multiple sequence alignment and conservation bar.
##' f <- system.file("extdata/sample.fasta", package="ggmsa")
##' ggmsa(f, 221, 280, font = NULL, seq_name = TRUE) + geom_msaBar()
##' @export
##' @author Lang Zhou
geom_msaBar <- function() {
    structure(list(),
              class = "msaBar")
}

##' @importFrom ggplot2 geom_col
ly_bar <- function(tidy){
    data <- bar_data(tidy)
    mapping <- aes_(x = ~pos, y = ~Freq, fill = ~Freq)
    ly_bar <- geom_col(data = data, 
                       mapping = mapping, 
                       width = 1, 
                       show.legend = FALSE)
    return(ly_bar)
}


##' Get the bar data
##'
##' @title bar_data
##' @param tidy A tidy alignment data frame with 'position' and 'character' 
##' columns, e.g. the output of msa2tidy().
##' @return A data frame
##' @noRd
##' @author Lang Zhou
bar_data <- function(tidy){
    character_position <- unique(tidy$position)
    conservation_score <- lapply(character_position, function(j) {
        column_data <- tidy[tidy$position == j, ]
        character_frequency <- table(column_data$character) %>% as.data.frame
        max_frequency <- character_frequency[character_frequency[2] ==
                                                 max(character_frequency[2]),]
        max_frequency$Var1 <- as.character(max_frequency$Var1)
        #on ties, keep only the first of the most frequent characters
        max_frequency[1,]
    }) %>% do.call("rbind", .)
    conservation_score["pos"] <- character_position
    return(conservation_score)
}
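
# Illustrative sketch (not part of the package API): the score computed by
# bar_data() above is the count of the most frequent character in each
# alignment column. `sketch_bar_data` and its toy alignment are hypothetical
# and use base R only.
sketch_bar_data <- function() {
    toy <- data.frame(name = rep(c("s1", "s2", "s3"), times = 2),
                      position = rep(1:2, each = 3),
                      character = c("A", "A", "G", "C", "C", "C"),
                      stringsAsFactors = FALSE)
    pos <- unique(toy$position)
    # count of the most frequent character in each column
    freq <- vapply(pos, function(j) {
        max(table(toy$character[toy$position == j]))
    }, integer(1))
    data.frame(pos = pos, Freq = freq)
}
# In this toy alignment, column 1 (A, A, G) scores 2 and column 2 (C, C, C)
# scores 3.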


================================================
FILE: R/geom_seed.R
================================================
##' Highlighting the seed in miRNA sequences
##'
##'
##' @title geom_seed
##' @param seed a character string specifying the miRNA seed sequence,
##'  e.g. 'GAGGUAG'.
##' @param star a logical value indicating whether asterisks should 
##' be displayed.
##' @return a ggplot layer
##' @author Lang Zhou
##' @examples
##' miRNA_sequences <- system.file("extdata/seedSample.fa", package="ggmsa")
##' ggmsa(miRNA_sequences, font = 'DroidSansMono', 
##'       color = "Chemistry_NT", none_bg = TRUE) +
##' geom_seed(seed = "GAGGUAG", star = FALSE)
##' ggmsa(miRNA_sequences, font = 'DroidSansMono', 
##'       color = "Chemistry_NT") +
##' geom_seed(seed = "GAGGUAG", star = TRUE)
##' @export
geom_seed <- function(seed, star = FALSE) {
    structure(list(seed = seed,
                   star = star),
              class = "seed")
}


geom_seed1 <- function(tidyData, seed, star) {
    get_asteriskScale(tidyData)
    tidyData$y <- as.numeric(tidyData$name)
    seq_first <- tidyData[tidyData$y == 1,]
    char <- seq_first$character
    char <- paste(char, collapse = "")
    seedPos <- as.integer(regexpr(seed, char)) # start position of seed region
    if (seedPos == -1) {
        stop("the seed '", seed, "' was not found in the first sequence.")
    }
    seedLen <- nchar(seed) # length of seed region
    numSeq <- max(tidyData$y) # number of sequences
    shadingLen <- getOption("shadingLen") #shading width
    shading_alpha <- getOption("shading_alpha")

    x <- seedPos - .5 #the x coordinate of the lower left corner
    y <- 1 - .5 - shadingLen #the y coordinate of the lower left corner
    yy <- numSeq + .5 + shadingLen #the y coordinate of the top right corner
    xx <- x + seedLen #the x coordinate of the top right corner

    shadingData <- data.frame(x = c(x, x, xx, xx),
                    y = c(y, yy, yy, y),
                    t = c('a', 'a', 'a','a'))
    starData <- data.frame(star_x = seq(seedPos, length.out = nchar(seed)),
                            star_y = rep(y, times = nchar(seed)))

    if(isTRUE(star)) {
        ly_star <- geom_asterisk(data = starData, 
                                 aes_(x = ~star_x, y = ~star_y))
        return(ly_star)
    }

    mapping <- aes_(x= ~x, y= ~y, group= ~t, fill = ~I('#bebebe'))
    ly_seed <- geom_polygon(data = shadingData, 
                            mapping = mapping, 
                            alpha = shading_alpha)
    return(ly_seed)
}
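
# Illustrative sketch (not part of the package API): how the corners of the
# seed shading box in geom_seed1() follow from the seed position. The helper
# `sketch_seed_box` is hypothetical. It assumes a literal (fixed = TRUE)
# match, whereas geom_seed1() passes the seed to regexpr() as a pattern, and
# it takes the shading width as an argument instead of reading
# getOption("shadingLen").
sketch_seed_box <- function(first_seq, seed, n_seq, shading_len = 0) {
    start <- as.integer(regexpr(seed, first_seq, fixed = TRUE))
    if (start == -1L) {
        stop("seed not found in the first sequence")
    }
    x_left <- start - 0.5               # left edge of the first seed tile
    x_right <- x_left + nchar(seed)     # right edge of the last seed tile
    y_bottom <- 1 - 0.5 - shading_len   # below the first sequence
    y_top <- n_seq + 0.5 + shading_len  # above the last sequence
    data.frame(x = c(x_left, x_left, x_right, x_right),
               y = c(y_bottom, y_top, y_top, y_bottom))
}
# Usage: sketch_seed_box("UGAGGUAGUAGG", "GAGGUAG", n_seq = 3)
# The seed starts at position 2, so the box spans x = 1.5 to 8.5 and
# y = 0.5 to 3.5.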


get_asteriskScale <- function(tidyData) {
    m <- max(tidyData$position)
    seq_name <- factor(tidyData$name, levels = unique(tidyData$name))
    n <- max(as.numeric(seq_name))
    char_scale <- diff(range(star$x))/diff(range(star$y))
    char_scale_2 <- char_scale * 3/2 * n/m

    return(options("char_scale_2" = char_scale_2))

}


================================================
FILE: R/ggmaf.R
================================================
##' Plot MAF alignment blocks.
##'
##' @title ggmaf 
##' @param data a tidy MAF data frame. You can get it from tidy_maf_df().
##' @param ref character, the name of the reference genome, 
##' e.g. "hg38.chr1_KI270707v1_random".
##' @param block_start a number (> 0). The first block to plot.
##' @param block_end a number (< max block). The last block to plot.
##' @param facet_field a number. The number of blocks in each facet panel.
##' @param heights a numeric vector of length two. The height proportion 
##' between the "Genomic location" panel (upper) and the "Alignment" panel 
##' (lower). Default: c(0.4, 0.6).
##' @param facet_heights a numeric vector. The height proportion of the facets.
##' @return ggplot object
##' @export
##' @author Lang Zhou
ggmaf <- function(data, 
                  ref, 
                  block_start = NULL, 
                  block_end = NULL, 
                  facet_field = NULL, 
                  heights = c(0.4,0.6),
                  facet_heights = NULL) {
  
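  ##fall back to the full block range when the limits are not given
  if (is.null(block_start)) block_start <- min(data$block_number)
  if (is.null(block_end)) block_end <- max(data$block_number)
  d <- data[data$block_number %in% seq(block_start, block_end),]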
  
  if(is.null(facet_field)) {
    maf_p <- maf_plot(d = d, ref = ref)
    p <- plot_list(gglist = maf_p, heights = heights)
    return(p)
  }else {
    d <- facet_maf(mafData = d, field = facet_field)
    p_ls <- lapply(unique(d$facet), function(i) {
      facet_d <- d[d$facet == i,]
      maf_p <- maf_plot(d = facet_d, ref = ref)
      pp <- plot_list(gglist = maf_p, heights = heights)
      return(pp)
    })
    p <- plot_list(gglist = p_ls, ncol =  1, heights = facet_heights)
    return(p)
  }
}


##' tidy MAF data frame 
##'
##' @title tidy_maf_df
##' @param maf_df a MAF data frame. You can get it from read_maf().
##' @param ref character, the name of the reference genome, 
##' e.g. "hg38.chr1_KI270707v1_random".
##' @return data frame
##' @export
##' @author Lang Zhou
tidy_maf_df <- function(maf_df,ref) {
  ##add ref position to other genome
  block_num <- unique(maf_df$block)
  tidy_df <- lapply(block_num, function(i) {
    x <- maf_df[maf_df$block == i,]
    x$ref_start <- x[x$src == ref, "start"]
    x$ref_end <- x[x$src == ref, "end_gap"]
    return(x)
  })%>% do.call("rbind", .)
  
  tidy_df$block_number <- factor(tidy_df$block, levels = 
                                   unique(tidy_df$block)) %>% as.numeric
  tidy_df$bs <- paste0(tidy_df$src,"-",tidy_df$block) 
  tidy_df$merge_y <- factor(tidy_df$src) %>% as.numeric
  tidy_df$label <- paste0("B",tidy_df$block_number)
  tidy_df <- order_aln(tidy_df,ref)
  return(tidy_df)
  
}


#put the reference sequence first in each block; adds a new column "y"
order_aln <- function(tidy_df, ref) {
    block_num <- unique(tidy_df$block)
    lev <- sapply(block_num, function(i) {
        x <- tidy_df[tidy_df$block == i,]
        order <- c(ref, x$src[!x$src %in% ref]) 
        
        lev <- paste0(order, "-",x$block)  
        return(lev)
    })%>% unlist %>% rev
    tidy_df$y <- factor(tidy_df$bs,levels = lev) %>% as.numeric
    return(tidy_df)
}
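
# Illustrative sketch (not part of the package API): the level ordering that
# order_aln() above builds. Within each block the reference comes first, and
# the whole vector is then reversed, so the reference of the first block gets
# the largest y value. The helper `sketch_block_levels` and the genome names
# in the usage note are hypothetical; only base R is used.
sketch_block_levels <- function(src, block, ref) {
    lev <- unlist(lapply(unique(block), function(b) {
        s <- src[block == b]
        # reference first, then the other sources of this block
        paste0(c(ref, s[s != ref]), "-", b)
    }))
    rev(lev)
}
# Usage:
# sketch_block_levels(src = c("mm10", "hg38", "rn6", "hg38"),
#                     block = c(1, 1, 2, 2), ref = "hg38")
# returns "rn6-2" "hg38-2" "mm10-1" "hg38-1"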

##' @importFrom utils getFromNamespace
##' @importFrom ggplot2 aes_
##' @importFrom ggplot2 geom_text
maf_plot <- function(d, ref, 
                     positive_color = "#a9c9d4",
                     negative_color = "#ffa389") {
  geom_rrect <- getFromNamespace("geom_rrect","statebins")
  ##plot the lower panel
  p_maf_aln <- ggplot(data = d) + 
    geom_rrect(mapping=aes_(xmin =~ ref_start,
                            xmax =~ ref_end,
                            ymin =~ y - 0.3,
                            ymax =~ y + 0.3,
                            fill =~ strand)) +
    geom_rrect(data = d,
               mapping=aes_(xmin =~ ref_start,
                            xmax =~ ref_end,
                            ymin =~ max(y) + 1 - 0.3,
                            ymax =~ max(y) + 1 + 0.3),
               fill = "#a9c9d4",color = "black") +
    scale_y_continuous(breaks = c(d$y,max(d$y + 1)),labels = c(d$bs, ref)) +
    scale_fill_manual(breaks = c("+","-"),
                      values = c(positive_color,negative_color)) +
    theme_void() +
    theme(axis.text.x = element_text(),
          axis.text.y = element_text(),
          panel.grid.minor.y = element_blank(),
          panel.grid.major.y = element_line(color = "grey"))
  
  ##plot the upper panel
  aim <- d[d$src != ref, ]
  p_maf_genomePos <- ggplot(data = aim) + 
    geom_rrect(mapping = aes_(xmin =~ start,
                              xmax =~ end_gap,
                              ymin =~ merge_y - 0.3,
                              ymax =~ merge_y + 0.3,
                              fill =~ strand),
               color = "black", 
               size = 0.5, 
               alpha = 0.8, 
               show.legend = FALSE) + 
    scale_y_continuous(breaks = unique(aim$merge_y),
                       labels = unique(aim$src)) +
    scale_fill_manual(breaks = c("+","-"),
                      values = c(positive_color,negative_color)) +
    theme_void() + theme(panel.grid.major.y = element_line(color = "grey"),
                         axis.text.x = element_text(),
                         axis.text.y = element_text(),
                         strip.text = element_blank()) + 
    geom_text(aes_(x =~ (start + end_gap)/2, 
                   y =~ merge_y,label =~ label), 
              size = 3) +
    facet_wrap(~src, scales = "free", ncol = 1)
  return(list(p_maf_genomePos, p_maf_aln))
}

#assign facet number to blocks
facet_maf <- function(mafData, field) {
    
    if(min(mafData$block_number) > 1){
        pos_reset <- mafData$block_number - min(mafData$block_number) + 1
        #pos_reset[pos_reset == 0] <- 1
    }else {
        pos_reset <- mafData$block_number
    }
    mafData$facet <- pos_reset %/% field
    
    mafData[(pos_reset %% field) == 0, "facet"] <-
        mafData[(pos_reset %% field) == 0, "facet"] - 1
    
    return(mafData)
}
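
# Illustrative sketch (not part of the package API): facet_maf() above puts
# consecutive blocks into facets of `field` blocks each, numbered from 0.
# The integer-divide-then-correct steps are equivalent to the closed form
# below. The helper `sketch_facet_assignment` is hypothetical and uses base
# R only.
sketch_facet_assignment <- function(block_number, field) {
    # renumber so that the first plotted block is 1
    pos_reset <- block_number - min(block_number) + 1
    # blocks 1..field -> facet 0, field+1..2*field -> facet 1, and so on
    (pos_reset - 1) %/% field
}
# Usage: sketch_facet_assignment(3:9, field = 3)
# returns 0 0 0 1 1 1 2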













================================================
FILE: R/ggmsa.R
================================================
##' Plot multiple sequence alignment using ggplot2 with multiple color schemes 
##' supported.
##'
##'
##' @title ggmsa
##' @param msa Multiple aligned sequence files or objects representing either 
##' nucleotide sequences or AA sequences.
##' @param start a numeric vector. Start position to plot.
##' @param end a numeric vector. End position to plot.
##' @param font font families, possible values are 'helvetical', 'mono', 
##' 'DroidSansMono' and 'TimesNewRoman'. Default is 'helvetical'. 
##' If font = NULL, only the background tiles are plotted.
##' @param color a color scheme. One of 'Clustal', 'Chemistry_AA', 
##' 'Shapely_AA', 'Zappo_AA', 'Taylor_AA', 'LETTER', 'CN6', 'Chemistry_NT', 
##' 'Shapely_NT', 'Zappo_NT', 'Taylor_NT', 'Hydrophobicity'. 
##' Default is 'Chemistry_AA'.
##' @param custom_color A data frame with two columns called "names" and 
##' "color". Customizes the color scheme.
##' @param char_width a numeric value. Specifies the character width in 
##' the range of 0 to 1. Default is 0.9.
##' @param by_conservation a logical value. The most conserved regions have 
##' the brightest colors.
##' @param none_bg a logical value indicating whether the background tiles 
##' should be omitted so that only the characters are drawn. Default is FALSE.
##' @param position_highlight A numeric vector of the positions that need 
##' to be highlighted.
##' @param seq_name a logical value indicating whether sequence names 
##' should be displayed. Default is 'NULL', which means the sequence 
##' names are displayed when 'font = NULL' but not when a font is given. 
##' If 'seq_name = TRUE' the sequence names are always displayed. 
##' If 'seq_name = FALSE' the sequence names are never displayed.
##' @param border a character string. The border color.
##' @param consensus_views a logical value that enables the consensus view.
##' @param use_dot a logical value. Displays characters as dots instead 
##' of fading their color in the consensus view.
##' @param disagreement a logical value. Displays characters that 
##' disagree with the consensus (excludes ambiguous disagreements).
##' @param ignore_gaps a logical value. If TRUE, gaps in a column 
##' are treated as if that row didn't exist.
##' @param ref a character string. Specifies the reference sequence, which 
##' should be one of the input sequences, when 'consensus_views' is TRUE.
##' @param show.legend logical. Should this layer be included in the legends?
##' @return ggplot object
##' @importFrom tidyr gather
##' @importFrom ggplot2 ggplot
##' @importFrom ggplot2 aes_
##' @importFrom ggplot2 theme
##' @importFrom ggplot2 theme_minimal
##' @importFrom ggplot2 geom_tile
##' @importFrom ggplot2 geom_polygon
##' @importFrom ggplot2 xlab
##' @importFrom ggplot2 ylab
##' @importFrom ggplot2 coord_fixed
##' @importFrom ggplot2 geom_point
##' @importFrom ggplot2 element_blank
##' @importFrom magrittr %>%
##' @importFrom stats setNames
##' @importFrom grid unit
##' @examples
##' #plot multiple sequences by loading fasta format
##' fasta <- system.file("extdata", "sample.fasta", package = "ggmsa")
##' ggmsa(fasta, 164, 213, color="Chemistry_AA")
##'
##'\dontrun{
##' #XMultipleAlignment objects can be used as input in the 'ggmsa'
##' AAMultipleAlignment <- Biostrings::readAAMultipleAlignment(fasta)
##' ggmsa(AAMultipleAlignment, 164, 213, color="Chemistry_AA")
##'
##' #XStringSet objects can be used as input in the 'ggmsa'
##' AAStringSet <- Biostrings::readAAStringSet(fasta)
##' ggmsa(AAStringSet, 164, 213, color="Chemistry_AA")
##'
##' #Xbin objects from 'seqmagick' can be used as input in the 'ggmsa'
##' AAbin <- seqmagick::fa_read(fasta)
##' ggmsa(AAbin, 164, 213, color="Chemistry_AA")
##' }
##' @export
##' @author Guangchuang Yu
ggmsa <- function(msa,
                  start = NULL,
                  end = NULL,
                  font = "helvetical",
                  color = "Chemistry_AA",
                  custom_color = NULL,
                  char_width = 0.9,
                  none_bg = FALSE,
                  by_conservation = FALSE,
                  position_highlight = NULL,
                  seq_name = NULL,
                  border = NULL,
                  consensus_views = FALSE,
                  use_dot = FALSE,
                  disagreement = TRUE,
                  ignore_gaps = FALSE,
                  ref = NULL,
                  show.legend = FALSE) {

    data <- tidy_msa(msa, start = start, end = end)

    ggplot() + geom_msa(data, font = font,
                        color = color,
                        custom_color = custom_color,
                        char_width = char_width,
                        none_bg = none_bg,
                        by_conservation = by_conservation,
                        position_highlight = position_highlight,
                        seq_name = seq_name,
                        border = border,
                        consensus_views = consensus_views,
                        use_dot = use_dot,
                        disagreement = disagreement,
                        ignore_gaps = ignore_gaps,
                        ref = ref,
                        show.legend = show.legend) +
               theme_msa()

}








================================================
FILE: R/import-functions.R
================================================
##' @importFrom utils globalVariables
globalVariables(".")
globalVariables("fre") #geom_GC.R:
globalVariables("read.delim") #arc.R
globalVariables(c("name", "position_adj", "y_adj")) #SeqBundles.R






================================================
FILE: R/method-plot.R
================================================
##' plot method for SeqDiff object
##'
##' @name plot
##' @rdname plot-methods
##' @exportMethod plot
##' @aliases plot,SeqDiff,ANY-method
##' @docType methods
##' @param x SeqDiff object
##' @param width bin width
##' @param title plot title
##' @param xlab xlab
##' @param by one of 'bar' and 'area'
##' @param fill fill color of upper part of the plot
##' @param colors color of lower part of the plot
##' @param xlim limits of x-axis
##' @return plot
##' @importFrom ggplot2 ggtitle
##' @importFrom ggplot2 xlim
##' @importFrom ggplot2 ggplot_gtable
##' @importFrom ggplot2 ggplot_build
##' @importFrom grid unit.pmax
##' @importFrom aplot plot_list
##' @author Guangchuang Yu
##' @examples
##' fas <- list.files(system.file("extdata", "GVariation", package="ggmsa"),
##'                   pattern="fas", full.names=TRUE)
##' x1 <- seqdiff(fas[1], reference=1)
##' plot(x1)
setMethod("plot", signature(x="SeqDiff"),
          function(x, width=50, title="auto",
                   xlab = "Nucleotide Position",
                   by="bar", fill="firebrick",
                   colors=c(A="#ff6d6d", C="#769dcc", G="#f2be3c", T="#74ce98"),
                   xlim = NULL) {
              nn <- names(x@sequence)
              if (is.null(title) || is.na(title)) {
                  title <- ""
              } else if (title == "auto") {
                  title <- paste(nn[-x@reference], 
                                 "nucleotide differences relative to", 
                                 nn[x@reference])
              }

              p1 <- plot_difference_count(x@diff, width, by=by, fill=fill) + 
                  ggtitle(title)
              p2 <- plot_difference(x@diff, colors=colors, xlab=xlab)

              if (!is.null(xlim)) {
                  p1 <- p1 + xlim(xlim)
                  p2 <- p2 + xlim(xlim)
              }

              plot_list(p1, p2, ncol=1, heights=c(.7, .4))
          }
          )



##' @importFrom ggplot2 ggplot
##' @importFrom ggplot2 aes_
##' @importFrom ggplot2 geom_segment
##' @importFrom ggplot2 xlab
##' @importFrom ggplot2 ylab
##' @importFrom ggplot2 scale_y_continuous
##' @importFrom ggplot2 theme_minimal
##' @importFrom ggplot2 theme
##' @importFrom ggplot2 element_blank
##' @importFrom ggplot2 scale_color_manual
plot_difference <- function(x, colors, xlab="Nucleotide Position") {
    x$difference <-  x$difference %>% toupper
    yy <- 4:1
    names(yy) <- c("A", "C", "G", "T")
    x$y <- yy[x$difference]
    n <- sum(is.na(x$y))
    if (n > 0) {
        message(n, " sites contain deletions or ambiguous bases, ",
                "which will be ignored in current implementation...")
    }
    x <- x[!is.na(x$y),]
    p <- ggplot(x, aes_(x=~position, y=~y, color=~difference))

    p + geom_segment(aes_(x=~position, xend=~position, y=~y, yend=~y+.8)) +
        xlab(xlab) + ylab(NULL) +
        scale_y_continuous(breaks=yy, labels=names(yy)) +
        theme_minimal() +
        theme(legend.position="none")+
        theme(axis.text.x=element_blank(), axis.ticks.x = element_blank()) +
        scale_color_manual(values=colors)
}

##' @importFrom ggplot2 geom_col
##' @importFrom ggplot2 geom_area
##' @importFrom ggplot2 theme_bw
plot_difference_count <- function(x, width, by = 'bar', fill='red') {
    by <- match.arg(by, c("bar", "area"))
    if (by == 'bar') {
        geom <- geom_col(fill=fill, width=width)
        keep0 <- FALSE
    } else if (by == "area") {
        geom <- geom_area(fill=fill)
        keep0 <- TRUE
    }
    d <- nucleotide_difference_count(x, width, keep0)
    p <- ggplot(d, aes_(x=~position, y=~count))
    p + geom + xlab(NULL) + ylab("Difference") + theme_bw()
}



================================================
FILE: R/method-show.R
================================================
##' show method
##'
##'
##' @name show
##' @docType methods
##' @rdname show-methods
##' @title show method
##' @param object SeqDiff object
##' @return message
##' @importFrom methods show
##' @exportMethod show
##' @aliases SeqDiff-class
##'   show,SeqDiff-method
##' @usage show(object)
##' @examples
##' fas <- list.files(system.file("extdata", "GVariation", package="ggmsa"),
##'                   pattern="fas", full.names=TRUE)
##' x1 <- seqdiff(fas[1], reference=1)
##' x1
setMethod("show",signature(object="SeqDiff"),
          function(object) {
              message("sequence differences of ", 
                      paste0(names(object@sequence), collapse=" and "), 
                      '\n')
              d <- object@diff$difference %>% table %>% as.data.frame
              message(sum(d$Freq), " ", "sites differ:\n")
              freq <- d[,2]
              names(freq) <- d[,1]
              print(freq)
          })


================================================
FILE: R/methods-diff.R
================================================
##' @method diff SeqDiff
##' @export
diff.SeqDiff <- function(x, ...) {
    x@diff
}




================================================
FILE: R/methods-ggplot_add.R
================================================
##' @method ggplot_add seqlogo
##' @export
ggplot_add.seqlogo <- function(object, plot, object_name) {
    msaData <- plot$layers[[1]]$data
    logo_tidyData <- msa2tidy(msaData)
    logo_font <- object$font
    logo_color <- object[["color"]]
    adaptive <- object$adaptive
    top <- object$top
    logo_custom_color <- object[["custom_color"]]
    show.legend <- object$show.legend

    ly_logo <- geom_logo(data  = logo_tidyData, 
                         font = logo_font, 
                         color = logo_color,
                         adaptive = adaptive, 
                         top = top, 
                         custom_color = logo_custom_color, 
                         show.legend = show.legend)
    ggplot_add(ly_logo, plot, object_name)
}

##' @method ggplot_add seed
##' @export
ggplot_add.seed <- function(object, plot, object_name) {
    msaData <- plot$layers[[1]]$data
    seed_tidyData <- msa2tidy(msaData)
    seed <- object$seed
    star <- object$star

    ly <- geom_seed1(seed_tidyData, seed, star)

    ggplot_add(ly, plot, object_name)
}



##' @method ggplot_add GCcontent
##' @export
ggplot_add.GCcontent <- function(object, plot, object_name) {
    msaData <- plot$layers[[1]]$data
    show.legend <- object$show.legend
    GC_tidyData <- msa2tidy(msaData)

    ly <- geom_GC1(GC_tidyData, show.legend = show.legend )

    ggplot_add(ly, plot, object_name)
}


##' @importFrom ggplot2 facet_wrap
##' @importFrom ggplot2 ggplot_add
##' @importFrom ggplot2 scale_x_continuous
##' @importFrom ggplot2 coord_cartesian
##' @importFrom ggplot2 geom_blank
##' @method ggplot_add facet_msa
##' @export
ggplot_add.facet_msa <- function(object, plot, object_name){
    msaData <- plot$layers[[1]]$data
    field <- object$field
    facetData <- facet_data(msaData, field)

    ##update data
    plot$layers[[1]]$data <- facetData #ly_bg
    if (length(plot$layers) > 1){
        plot$layers[[2]]$data <- facetData #ly_label
    }

    region <- diff(range(facetData$position))
    xl_scale <- facet_scale(facetData, field)

    if (region %% field == 0) {
        plot + facet_wrap(.~facet, ncol = 1, scales = "free_x") +
            scale_x_continuous(expand = c(0,0), 
                               breaks = xl_scale, 
                               labels = xl_scale) +
            coord_cartesian()
    }else {
        max_pos <- facetData$position %>% max
        min_pos <- facetData$position %>% min
        max_facet <- facetData$facet %>% max
        minpos_maxfacet <- facetData[facetData$facet == 
                                         max_facet,"position"] %>% min
        expand_pos <-  (region %/% field + 1) * field + min_pos

        dummy <- data.frame(x = c(minpos_maxfacet, expand_pos), 
                            facet = max_facet)
        plot +
            facet_wrap(.~facet, ncol = 1, scales = "free_x") +
            geom_blank(aes_(x = ~x), dummy, inherit.aes = FALSE) +
            scale_x_continuous(expand = c(0,0), 
                               breaks = xl_scale, 
                               labels = xl_scale) +
            coord_cartesian()
    }

}

##' @method ggplot_add msaBar
##' @importFrom aplot insert_top
##' @importFrom ggplot2 coord_cartesian
##' @export
ggplot_add.msaBar <- function(object, plot, object_name){
    msaData <- plot$layers[[1]]$data
    bar_tidyData <- msa2tidy(msaData)
    ly <- ly_bar(bar_tidyData)

    p_bar <- ggplot() + ly + bar_theme(bar_tidyData)
    plot <- plot + coord_cartesian()
    p_bar %>% insert_top(plot, height = 3)
}


##' @method ggplot_add nucleotideeHelix
##' @export
ggplot_add.nucleotideeHelix <- function(object, plot, object_name){
    msa_data <- plot$layers[[1]]$data
    tidy_data <- msa2tidy(msa_data)
    seq_numbers <- levels(tidy_data$name) %>% length

    helix_data <- object$helix_data
    color_by <- object$color_by
    overlap <- object$overlap

    if(is.data.frame(helix_data)) {
        helix_tidy <- tidy_helix(helix_data, color_by = color_by)
    }else {
        helix_tidy <- tidy_list_helix(helix_data, color_by = color_by)
    }
    ly <- layer_helix(helix_data = helix_tidy, 
                      overlap = overlap, 
                      seq_numbers = seq_numbers)
    ggplot_add(ly, plot, object_name)
}


================================================
FILE: R/msa_data.R
================================================
##' This function takes a tidy sequence alignment and assigns a color to 
##' each molecule (amino acid or nucleotide) according to the selected 
##' color scheme.
##'
##'
##' @title msa_data
##' @param tidymsa sequence alignment as a data frame, generated by tidy_msa().
##' @param font font families, possible values are 'helvetical', 'mono', 
##' 'DroidSansMono' and 'TimesNewRoman'. Default is 'helvetical'. 
##' If you specify font = NULL, only the background box will be printed.
##' @param color A color scheme. One of 'Clustal', 'Chemistry_AA', 
##' 'Shapely_AA', 'Zappo_AA', 'Taylor_AA', 'LETTER', 'CN6', 'Chemistry_NT', 
##' 'Shapely_NT', 'Zappo_NT', 'Taylor_NT', 'Hydrophobicity'. 
##' Default is 'Chemistry_AA'.
##' @param custom_color A data frame with two columns called "names" and 
##' "color". Customizes the color scheme.
##' @param char_width a numeric value. Specifies the character 
##' width in the range of 0 to 1. Default is 0.9.
##' @param by_conservation a logical value. The most conserved 
##' regions have the brightest colors.
##' @param consensus_views a logical value that enables the consensus view.
##' @param use_dot a logical value. Displays characters as dots 
##' instead of fading their color in the consensus view.
##' @param disagreement a logical value. Displays characters that 
##' disagree with the consensus (excludes ambiguous disagreements).
##' @param ignore_gaps a logical value. If TRUE, gaps 
##' in a column are treated as if that row didn't exist.
##' @param ref a character string. Specifies the reference sequence, 
##' which should be one of the input sequences, when 'consensus_views' is TRUE.
##' @return A data frame
##' @examples
##' fasta <- system.file("extdata/sample.fasta", package="ggmsa")
##' tidy <- tidy_msa(fasta, start = 20, end = 120)
##' data <- msa_data(tidy, 
##'                  font = "helvetical", 
##'                  color = 'Chemistry_AA' )
## @export
##' @noRd
##' @author Guangchuang Yu, Lang Zhou
msa_data <- function(tidymsa, font = "helvetical",
                     color = "Chemistry_AA",
                     custom_color = NULL,
                     char_width = 0.9,
                     by_conservation = FALSE,
                     consensus_views = FALSE,
                     use_dot = FALSE,
                     disagreement = TRUE,
                     ignore_gaps = FALSE,
                     ref = NULL) {

    if (is.null(custom_color)) {
        color <- match.arg(color, c("Clustal", "Chemistry_AA", "Shapely_AA", 
                                    "Zappo_AA", "Taylor_AA","Chemistry_NT",
                                    "Shapely_NT", "Zappo_NT", "Taylor_NT", 
                                    "LETTER", "CN6", "Hydrophobicity" ))
    }
    y <- tidymsa

    ## add color
    if (color == "Clustal"){
        y <- color_Clustal(y)
    }else {
        if (consensus_views) {
            consensus <- get_consensus(y, #extract a consensus/ref sequence
                                       ignore_gaps = ignore_gaps, 
                                       ref = ref)
            tc <- color_scheme(y, color) %>% #assigning color for other seq.
                  tidy_color(consensus, disagreement, ref = ref)# tidy colors
            
            y <- color_scheme(consensus, color) %>% #assigning color for con/ref
                 rbind(tc) #add consensus sequence

            if (use_dot){
                y[is.na(y$color), "character"] <- "."
            }else {
                y$font_color <- "#000000"
                y[is.na(y$color), "font_color"] <- "#aaacaf"
                y[is.na(y$color), "color"] <- "#ffffff"
            }

        }else {
            y <- color_scheme(y, color, custom_color)
        }
    }

    if (by_conservation){
        y <- color_visibility(y)
    }


    if (is.null(font)) {
        return(y)
    }

    ## calling internal polygons
    font_f <- font_fam[[font]]
    
    #debug using'as.character()'
    data_sp <- font_f[as.character(unique(y$character))]

    ## To adapt to tree data
    if (!'name' %in% names(y) && !consensus_views) {
        if ('label' %in% names(y)) {
            names(y)[names(y) == 'label'] <- "name"
        }else {
            stop("unknown sequence name...")
        }
    }

    if(!is.factor(y$name) && !consensus_views){
        lev <- unique(data.frame(y[,c("name","y")]))
        
        # y is the order of the nodes in the tree
        lev <- lev[order(lev$y), "name"] 
        y$name <- factor(y$name, levels = lev)
    } else if(consensus_views) {
        y$name <- order_name(y$name, 
                             consensus_views = consensus_views, 
                             ref = ref)
    }
    y$ypos <- as.numeric(y$name)

    # for ggtreeExtra
    if ("new_position" %in% colnames(y)) {
        scale_n <- 5 * length(unique(y$name))/diff(range(y$new_position))
        char_width <- char_width * 
          diff(range(y$new_position))/diff(range(y$position))
    }

    yy <- lapply(seq_len(nrow(y)), function(i) {
        d <- y[i, ]
        dd <- data_sp[[d$character]]
        if(d$character == "."){ # '.' without zooming
          if ("new_position" %in% colnames(d)){
              dd$x <- dd$x - min(dd$x) + d$new_position - diff(range(dd$x))/2
          }else{
              dd$x <- dd$x - min(dd$x) + d$position - diff(range(dd$x))/2
          }
          dd$y <- dd$y - min(dd$y) + d$ypos - diff(range(dd$y))/2
        }else {# other characters
            char_scale <- diff(range(dd$x))/diff(range(dd$y))#equal proportion
            #y_width = char_width, x-width scaled proportionally
            if(diff(range(dd$x)) <= diff(range(dd$y))) {
                dd$x <- dd$x * (char_width * char_scale)/diff(range(dd$x))
                # for ggtreeExtra
                if ("new_position" %in% colnames(d)){
                    dd$y <- (dd$y * char_width)/diff(range(dd$y)) * scale_n
                    dd$x <- dd$x - min(dd$x) + d$new_position - 
                      (char_width * char_scale)/2
                    dd$y <- dd$y - min(dd$y) + d$ypos - scale_n * char_width/2
                }else{
                    dd$y <- (dd$y * char_width)/diff(range(dd$y))
                    dd$x <- dd$x - min(dd$x) + d$position - 
                      (char_width * char_scale)/2
                    dd$y <- dd$y - min(dd$y) + d$ypos - char_width/2
                }
            }else{#x_width = char_width, y-width scaled proportionally                                       
                dd$x <- dd$x * char_width/diff(range(dd$x))
                # for ggtreeExtra
                if ("new_position" %in% colnames(d)){
                    dd$y <- dd$y * 
                      char_width/(diff(range(dd$y)) * char_scale) * scale_n
                    dd$x <- dd$x - min(dd$x) + d$new_position - char_width/2
                    dd$y <- dd$y - min(dd$y) + d$ypos - 
                      (scale_n * char_width/char_scale)/2
                }else{
                    dd$y <- dd$y * char_width/(diff(range(dd$y)) * char_scale)
                    dd$x <- dd$x - min(dd$x) + d$position - char_width/2
                    dd$y <- dd$y - min(dd$y) + d$ypos - 
                      (char_width/char_scale)/2
                }
            }
        }
        cn <- colnames(d)
        cn <- cn[!cn %in% c('x','y', 'ypos')]
        for (nn in cn) {
            dd[[nn]] <- d[[nn]]
        }

        dd$group <- paste0("V", d$position, "L", d$ypos)
        return(dd)
    })

    ydf <- do.call(rbind, yy)
    colnames(ydf)[colnames(ydf) == 'y'] <- 'yy'
    ydf$y <- as.numeric(ydf$name)
    ydf <- cbind(label = ydf$name, ydf)
    return(ydf)
}

##' Convert msa file/object to tidy data frame.
##'
##'
##' @title tidy_msa
##' @param msa multiple sequence alignment file or sequence object in 
##' DNAStringSet, RNAStringSet, AAStringSet, BStringSet, DNAMultipleAlignment, 
##' RNAMultipleAlignment, AAMultipleAlignment, DNAbin or AAbin
##' @param start start position to extract subset of alignment
##' @param end end position to extract subset of alignment
##' @return tibble data frame
##' @export
##' @examples
##' fasta <- system.file("extdata", "sample.fasta", package = "ggmsa")
##' aln <- tidy_msa(msa = fasta, start = 10, end = 100)
##' @author Guangchuang Yu
tidy_msa <- function(msa, start = NULL, end = NULL) {
    if(inherits(msa, "character") && length(msa) > 1) {
        aln <- msa
    }else {
        aln <- prepare_msa(msa)
    }
    alnmat <- lapply(seq_along(aln), function(i) {
        ##Preventing function collisions
        base::strsplit(as.character(aln[[i]]), '')[[1]]
    }) %>% do.call('rbind', .)
    ## for DNAbin and AAbin
    alndf <- as.data.frame(alnmat, stringsAsFactors = FALSE)

    if(unique(names(aln)) %>% length == length(aln)) {
        alndf$name = names(aln)
    }else{
      stop("Sequences must have unique names")
    }
    cn = colnames(alndf)
    cn <- cn[!cn %in% "name"]
    df <- gather(alndf, "position", "character", cn)

    y <- df
    y$position = as.numeric(sub("V", "", y$position))
    y$character = toupper(y$character)

    y$name = factor(y$name, levels=rev(names(aln)))


    if (is.null(start)) start <- min(y$position)
    if (is.null(end)) end <- max(y$position)

    y <- y[y$position >=start & y$position <= end, ]

    return(y)
}





##' This function converts the msa_data to the tidy data.
##'
##' @param msaData sequence alignment data generated by msa_data().
##' @noRd
msa2tidy <- function(msaData) {
  if ("order" %in% names(msaData)) {
    msaData <- msaData[msaData$order == 1,]
  }
  df_tidy <- data.frame(name = msaData$name,
                        position = msaData$position,
                        character = msaData$character)
  df_tidy$character <- as.character(df_tidy$character)

  return(df_tidy)
}




================================================
FILE: R/pp_interactive.R
================================================

make_gap <- function(gap, previous_seq) {
    gap_df <- previous_seq[rep(1, each=gap),] 
    gap_start <- max(previous_seq$position) + 1
    gap_df$position <- gap_start : (gap_start + gap - 1 )
    gap_df$character <- "-"
    
    if("pos_previous"  %in% names(gap_df)) {
        gap_df$pos_previous <- 0
    }
    
    return(gap_df)
}

##' merge two MSA
##'
##' @title merge_seq
##' @param previous_seq previous MSA
##' @param subsequent_seq subsequent MSA
##' @param gap gap length
##' @param adjust_name logical value. merge seq name or not
##' @return tidy MSA data frame
##' @export
##' @author Lang Zhou
merge_seq <- function(previous_seq, gap, subsequent_seq, adjust_name = TRUE) {
    
    name_pre <- levels(previous_seq$name)
    name_subse <- levels(subsequent_seq$name)
    
    if(length(name_pre) != length(name_subse)) {
        stop("The number of sequences in previous_seq and subsequent_seq is inconsistent")
    }
    
    gap_df <- make_gap(gap = gap, previous_seq = previous_seq)
    subsequent_seq$position <- 
        subsequent_seq$position - min(subsequent_seq$position) + 1
    subsequent_seq$position <- 
        subsequent_seq$position + max(previous_seq$position) + gap
    
    t_merge <- rbind(previous_seq,gap_df,subsequent_seq)
    
    if (adjust_name) {
        rownames(t_merge) <- seq(nrow(t_merge))
        names(t_merge)[1] <- "name_previous"
        t_merge$name <- ""
        
        for(i in seq(length(name_pre))) {
            t_merge[t_merge$name_previous %in% c(name_pre[i], name_subse[i]),"name"] <- 
                paste0(name_pre[i],"-", name_subse[i])
        }
        t_merge$name <- factor(t_merge$name)
    }
    return(t_merge)
}


##' tidy protein-protein interaction position data
##'
##' @title tidy_hdata
##' @param gap gap length
##' @param inter protein-protein interaction position data
##' @param previous_seq previous MSA
##' @param subsequent_seq subsequent MSA
##' @importFrom R4RNA as.helix
##' @return helix data
##' @export
##' @author Lang Zhou
tidy_hdata <- function(gap, inter, previous_seq,subsequent_seq) {
    inter$j <- inter$Res.no..2 - 
        min(subsequent_seq$position) + 
        max(previous_seq$position) + gap + 1
    hdata <- data.frame(i = inter$Res.no.1, 
                        j = inter$j,
                        length = 1, 
                        value = NA, 
                        colour = "blue")
    hdata <- as.helix(hdata)
    return(hdata)
}

##' reset MSA position
##'
##' @title reset_pos
##' @param seq_df MSA data
##' @return data frame
##' @export
##' @author Lang Zhou
reset_pos <- function(seq_df) {
    names(seq_df)[2] <- "pos_previous"
    seq_df$position <- ""
    
    ## seq_along() stays correct when only one unique position exists
    uni <- unique(seq_df$pos_previous)
    for(i in seq_along(uni)) {
        seq_df[seq_df$pos_previous == uni[i],"position"] <- i
    }
    
    seq_df$position <- as.numeric(seq_df$position)
    return(seq_df)
    
}

##' reset hdata data position
##'
##' @title simplify_hdata 
##' @param hdata data from tidy_hdata()
##' @param sim_msa MSA data frame
##' @return data frame
##' @export
##' @author Lang Zhou
simplify_hdata <- function(hdata, sim_msa) {
    
    new_hdata <- lapply(seq_len(nrow(hdata)), function(a) {
        n <- hdata[a,]
        n$pre_i <- n$i
        n$i <- sim_msa[sim_msa$pos_previous == n$i,"position"] %>% unique
        return(n)
    }) %>% do.call("rbind",.)
    
    new_hdata <- lapply(seq_len(nrow(new_hdata)), function(a) {
        n <- new_hdata[a,]
        n$pre_j <- n$j
        n$j <- sim_msa[sim_msa$pos_previous == n$j,"position"] %>% unique
        return(n)
    }) %>% do.call("rbind",.)
    
    new_hdata <- as.helix(new_hdata)
    
    return(new_hdata)
    
}











================================================
FILE: R/prepare_fasta.R
================================================
##' preparing multiple sequence alignment
##'
##' This function supports both NT and AA sequences. It supports multiple 
##' input formats such as "DNAStringSet", "BStringSet", "AAStringSet", 
##' "DNAbin", "AAbin" and a filepath.
##' @title prepare_msa
##' @param msa a multiple sequence alignment file or object
##' @return BStringSet based object
##' @importFrom Biostrings DNAStringSet
##' @importFrom Biostrings RNAStringSet
##' @importFrom Biostrings AAStringSet
##' @importFrom methods missingArg
##' @importFrom seqmagick fa_read
## @export
##' @author Lang Zhou and Guangchuang Yu
##' @noRd
prepare_msa <- function(msa) {
    if (missingArg(msa)) {
        stop("no input...")
    } else if (inherits(msa, "character")) {
        msa <- fa_read(msa)
    } else if (!inherits(msa, supported_msa_class)) {
        stop("multiple sequence alignment object not supported...")
    }

    ## class() may return more than one class; dispatch on the first
    res <- switch(class(msa)[1],
                  DNAbin = DNAbin2DNAStringSet(msa),
                  AAbin = AAbin2AAStringSet(msa),
                  DNAMultipleAlignment = DNAStringSet(msa),
                  RNAMultipleAlignment = RNAStringSet(msa),
                  AAMultipleAlignment = AAStringSet(msa),
                  msa ## DNAStringSet, RNAStringSet, AAStringSet, BStringSet
                  )
    return(res)
}


DNAbin2DNAStringSet <- function(msa) {
    seqs <- vapply(seq_along(msa),
                   function(i) paste0(as.character(msa[i]) %>% unlist, 
                                      collapse=''),
                   character(1))
    names(seqs) <- names(msa)

    switch(class(msa),
           DNAbin = DNAStringSet(seqs),
           AAbin = AAStringSet(seqs))
}

AAbin2AAStringSet <- DNAbin2DNAStringSet



supported_msa_class <- c("DNAStringSet",  
                         "RNAStringSet", 
                         "AAStringSet", 
                         "BStringSet",
                         "DNAMultipleAlignment", 
                         "RNAMultipleAlignment", 
                         "AAMultipleAlignment",
                         "DNAbin", 
                         "AAbin")





================================================
FILE: R/read_maf.R
================================================
##' read 'multiple alignment format'(MAF) file
##'
##' @title read_maf
##' @param multiple_alignment_format a multiple alignment format(MAF) file
##' @return data frame
##' @export
##' @author Lang Zhou
read_maf <- function(multiple_alignment_format) {
    
    line <- readLines(multiple_alignment_format)
    head <- sapply(line, function(i) substring(i,1,1))
    rm(line)# 'line' in names(heads) 
    
    #remove header/comment lines (those starting with "#");
    #logical indexing also works when the file has no such lines
    head <- head[head != "#"]
    
    #split block
    blank <- which(head == "")
    block_ls <- lapply(seq_along(blank), function(i) {
        if (blank[i] == min(blank)) {
            x <- names(head)[1:blank[i]]
        }else {
            x <- names(head)[blank[i-1]:blank[i]]
        }
        return(x)
    })
    names(block_ls) <- paste0("block_",seq(length(block_ls)))
    
    #extract lines starting with "s"
    s_block <- lapply(seq(length(block_ls)), function(i) {
        blocki <- block_ls[[i]]
        line_s <- blocki[sapply(blocki, function(j) substring(j,1,1))  == "s"] 
    }) 
    names(s_block) <- names(block_ls)
    
    #get a MAF df
    s_name <- c("type", "src", "start", "size", "strand", 'src_size', "text")
    seq_df <-lapply(seq(length(s_block)), function(i) {
        
        blocki <- s_block[[i]]
        seq_df <- lapply(seq(length(blocki)), function(j) {
            x <- blocki[[j]]
            #extract all columns
            x <- strsplit(x, " ") %>% unlist 
            x1 <- x[sapply(x, nchar) > 0]
            #convert to data frame
            seq <- t(as.matrix(x1)) %>% as.data.frame()
            names(seq) <- s_name
            seq[,c("start","size",'src_size')] <- 
                seq[,c("start","size",'src_size')] %>%as.numeric()
            
            seq$size_gap <- nchar(seq$text)
            seq$end <- seq$start + seq$size
            seq$end_gap <- seq$start + seq$size_gap
            seq$block <- names(s_block[i])
            return(seq)
        })%>% do.call("rbind", .)
        return(seq_df)
        
    }) %>% do.call("rbind", .)
}


================================================
FILE: R/seqdiff.R
================================================

##' calculate difference of two aligned sequences
##'
##'
##' @title seqdiff
##' @param fasta fasta file
##' @param reference which sequence serve as reference, 1 or 2
##' @return SeqDiff object
##' @export
##' @importFrom Biostrings readBStringSet
##' @importClassesFrom Biostrings BStringSet
##' @importFrom methods new
##' @author guangchuang yu
##' @examples
##' fas <- list.files(system.file("extdata", "GVariation", package="ggmsa"),
##'                   pattern="fas", full.names=TRUE)
##' seqdiff(fas[1], reference=1)
seqdiff <- function(fasta, reference=1) {
    sequence <- readBStringSet(fasta)
    if (length(sequence) != 2 || length(unique(width(sequence))) != 1) {
        stop("fasta should contain 2 aligned sequences of equal width...")
    }
    diff <- nucleotide_difference(sequence, reference)
    new("SeqDiff",
        file = fasta,
        sequence = sequence,
        reference = reference,
        diff = diff)
}

##' @importFrom magrittr %>%
##' @importFrom Biostrings toString
##' @importFrom Biostrings width
nucleotide_difference <- function(x, reference=1) {
    n <- width(x[1])
    nn <- seq_len(n)
    s1 <- x[1] %>% toString %>% substring(nn, nn)
    s2 <- x[2] %>% toString %>% substring(nn, nn)

    pos <- which(s1 != s2)
    if (reference == 1) {
        diff <- s2[pos]
    } else {
        diff <- s1[pos]
    }

    return(data.frame(position = pos,
                      difference = diff,
                      stringsAsFactors = FALSE))
}




##' @importFrom dplyr group_by
##' @importFrom dplyr summarize
##' @importFrom dplyr select
##' @importFrom dplyr n
nucleotide_difference_count <- function(x, width=50, keep0=FALSE) {
    n <- max(x$position)
    bin <- rep(seq_len(ceiling(n/width)), each=width)
    position <- c(seq_len(n)[!duplicated(bin)], n)
    x$bin <- bin[x$position]
    y <- x %>% group_by(bin) %>%
        summarize(position=min(position), count = n()) %>%
        select(-bin)
    y$position <- position[findInterval(y$position, position)]
    if (keep0) {
        itv <- seq(1, n, width)
        yy <- data.frame(position = itv[!itv %in% y$position],
                         count = 0)
        y <- rbind(y, yy)
        y <- y[order(y$position, decreasing=FALSE),]
    }
    return(y)
}



================================================
FILE: R/seqlogo.R
================================================
##' plot sequence logo for MSA based on 'ggplot2'
##'
##' @title seqlogo
##' @param msa Multiple sequence alignment file or object for representing 
##' either nucleotide sequences or peptide sequences.
##' @param start Start position to plot.
##' @param end End position to plot.
##' @param font font families, possible values are 'helvetical', 'mono', 
##' 'DroidSansMono' and 'TimesNewRoman'. Default is 'DroidSansMono'. 
##' If font=NULL, only the background tiles are drawn.
##' @param color A color scheme. One of 'Clustal', 'Chemistry_AA', 
##' 'Shapely_AA', 'Zappo_AA', 'Taylor_AA', 'LETTER', 'CN6', 'Chemistry_NT', 
##' 'Shapely_NT', 'Zappo_NT', 'Taylor_NT'. Default is 'Chemistry_AA'.
##' @param custom_color A data frame with two columns called "names" and 
##' "color". Customizes the color scheme.
##' @param adaptive A logical value indicating whether the overall height of 
##' seqlogo corresponds to the number of sequences. If FALSE, the overall 
##' height of seqlogo is fixed at 4.
##' @param top A logical value. If TRUE, seqlogo is aligned to the top of MSA.
##' @return ggplot object
##' @examples
##' #plot sequence motif independently
##' nt_sequence <- system.file("extdata", "LeaderRepeat_All.fa", 
##'                            package = "ggmsa")
##' seqlogo(nt_sequence, color = "Chemistry_NT")
##' @export
##' @author Lang Zhou
seqlogo <- function(msa, 
                    start = NULL, 
                    end = NULL, 
                    font = "DroidSansMono", 
                    color = "Chemistry_AA", 
                    adaptive = FALSE, 
                    top = FALSE, 
                    custom_color = NULL) {
  
    data <- tidy_msa(msa, start = start, end = end)
    ggplot() + geom_logo(data, 
                         font = font, 
                         color = color, 
                         adaptive = adaptive, 
                         top = top, 
                         custom_color = custom_color) +
        theme_minimal() + xlab(NULL) + ylab(NULL) +
        theme(legend.position = 'none') + 
        theme(panel.grid = element_blank(), axis.text.y = element_blank()) +
        coord_fixed()
}

##' Multiple sequence alignment layer for ggplot2. It plots sequence motifs.
##'
##' @title geom_seqlogo
##' @param font font families, possible values are 'helvetical', 'mono', 
##' 'DroidSansMono' and 'TimesNewRoman'. Default is 'DroidSansMono'.
##' @param color A color scheme. One of 'Clustal', 'Chemistry_AA', 
##' 'Shapely_AA', 'Zappo_AA', 'Taylor_AA', 'LETTER', 'CN6', 'Chemistry_NT', 
##' 'Shapely_NT', 'Zappo_NT', 'Taylor_NT'. Default is 'Chemistry_AA'.
##' @param custom_color A data frame with two columns called "names" and 
##' "color". Customizes the color scheme.
##' @param adaptive A logical value indicating whether the overall height 
##' of seqlogo corresponds to the number of sequences. If FALSE, the 
##' overall height of seqlogo is fixed at 4.
##' @param top A logical value. If TRUE, seqlogo is aligned to the top of MSA.
##' @param show.legend logical. Should this layer be included in the legends?
##' @param ... additional parameter
##' @return A list
##' @examples
##' #plot multiple sequence alignment and sequence motifs
##' f <- system.file("extdata/LeaderRepeat_All.fa", package="ggmsa")
##' ggmsa(f,font = NULL,color = "Chemistry_NT") + geom_seqlogo()
##' @export
##' @author Lang Zhou
geom_seqlogo <- function(font = "DroidSansMono", color = "Chemistry_AA", 
                         adaptive = TRUE, top = TRUE, custom_color = NULL, 
                         show.legend = FALSE, ...) {
    structure(list(font = font,
                   color = color,
                   adaptive = adaptive,
                   top = top,
                   custom_color = custom_color,
                   show.legend = show.legend),
              class = "seqlogo")
}


geom_logo <- function(data, font = "DroidSansMono", color = "Chemistry_AA",
                      adaptive = FALSE, top = TRUE, custom_color = NULL, 
                      show.legend = FALSE, ...) {
    mapping  <- aes_(x = ~logo_x, 
                     y = ~logo_y,  
                     group = ~group, 
                     fill = ~I(color))
    logo_data <- seqlogo_data(data, font = font, color = color, 
                              adaptive = adaptive, top = top, 
                              custom_color = custom_color)

    ly_logo <- geom_polygon(mapping = mapping, data = logo_data, 
                            inherit.aes = FALSE, show.legend = show.legend)
    return(ly_logo)
}

seqlogo_data <- function(data, font = "DroidSansMono", 
                         color = "Chemistry_AA", adaptive = FALSE, 
                         top = TRUE, custom_color = NULL){
    tidy <- data

    if (color == "Clustal") {
        tidy <- color_Clustal(tidy)
    } else{
        tidy <- color_scheme(tidy, color, custom_color)
    }

    if (adaptive) {
        seq_number  <-  as.character(unique(tidy[[1]]))
        total_heigh <- length(seq_number) / 6
    } else {
        total_heigh <- 4
    }

    #total_heigh <- getOption("total_heigh")
    logo_width <- getOption("logo_width")
    ## alignment column positions present in the data
    col_num <- as.numeric(levels(factor(tidy$position))) 
    moti_da <- lapply(col_num, function(j){
        ## Calculate the char frequency in each column
        clo <- tidy[tidy$position == j, ] 
        fre <- prop.table(table(clo$character))
        ## total_heigh is the overall height; each char gets a share of it
        ## proportional to its frequency.
        ywidth <- sort(total_heigh * fre ) 
        ## calling color scheme
        column_char_color <- data.frame(unique(clo[c("character", "color")])) 
        font_f <- font_fam[[font]]
        motif_char <- font_f[names(ywidth)]
        ds_ <- lapply(seq_along(motif_char), function(i){
            ds_ <- motif_char[[i]]
            names(ds_)[names(ds_) == "x"] <- "logo_x"
            names(ds_)[names(ds_) == "y"] <- "logo_y"
            ds_$char <- names(motif_char[i])
            #width = .9
            ds_$logo_x <- ds_$logo_x * logo_width/diff(range(ds_$logo_x)) 
            #height = overall height * frequency
            ds_$logo_y <- ds_$logo_y * ywidth[[i]]/diff(range(ds_$logo_y))
            ymotif <- sum(ywidth[0:(i - 1)]) # summed height of chars below
            #  move the char to its column and to its place in the stack
            ds_$logo_x <- ds_$logo_x - min(ds_$logo_x) - logo_width/2 + j
            ds_$logo_y <- ds_$logo_y - min(ds_$logo_y) - ywidth[[i]]/2 + 
                          ymotif + ywidth[[i]]/2
            if (top) {
              ds_$logo_y <- ds_$logo_y + nrow(tidy[tidy$position == j, ]) + .5
            }
            ## ds_$logo_y - min(ds_$logo_y) - ywidth[[i]]/2: centered at zero
            ## + ymotif: summed height of the chars below the current char
            ## + ywidth[[i]]/2: half the height of the current char
            ds_$group <- paste0("P", j, '-', "Char", names(motif_char[i]))
            ds_$color <- column_char_color[column_char_color$character == 
                                           unique(ds_$char), "color"]
            return(ds_)
         })
        ds <- do.call(rbind, ds_)
        return(ds)
  })
    moti_da <- do.call(rbind, moti_da)
    moti_da$name <- as.character(tidy[1,1])
    other_cn <- names(moti_da)[!names(moti_da) == 'name']
    moti_da <- moti_da[c("name", other_cn)]
    add_col <- tidy[,!names(tidy) %in% names(moti_da)]
    moti_da <- cbind(add_col[1,], moti_da, row.names = NULL)
    return(moti_da)
}




















================================================
FILE: R/simplot.R
================================================
##' Sequence similarity plot
##'
##'
##' @title simplot
##' @param file alignment fasta file
##' @param query query sequence
##' @param window sliding window size (bp)
##' @param step step size to slide the window (bp)
##' @param group whether to group sequences (e.g. for "A-seq1, A-seq2, B-seq1 
##' and B-seq2", use sep = "-" and id = 1 to divide sequences into groups A 
##' and B)
##' @param id position to extract id for grouping; only works if group = TRUE
##' @param sep separator to split sequence name; only works if group = TRUE
##' @param sd whether to display the standard deviation of 
##' similarity within each group; only works if group = TRUE
##' @param smooth FALSE (default) or TRUE; whether to display a smoothed spline.
##' @param smooth_params a list of parameters passed to geom_smooth
##' (default: smooth_params = list(method = "loess", se = FALSE))
##' @return ggplot object
##' @importFrom Biostrings readDNAStringSet
##' @importFrom ggplot2 aes_
##' @importFrom ggplot2 geom_line
##' @importFrom ggplot2 ggtitle
##' @importFrom ggplot2 geom_ribbon
##' @importFrom ggplot2 geom_smooth
##' @importFrom magrittr %<>%
##' @importFrom dplyr group_by_
##' @importFrom dplyr summarize_
##' @export
##' @author guangchuang yu
##' @examples
##' fas <- system.file("extdata/GVariation/sample_alignment.fa", 
##'                     package="ggmsa")
##' simplot(fas, 'CF_YL21')
simplot <- function(file, 
                    query, 
                    window=200, 
                    step=20, 
                    group=FALSE, 
                    id, 
                    sep, 
                    sd=FALSE,
                    smooth = FALSE,
                    smooth_params = list(method = "loess", 
                                         se = FALSE)) {
    aln <- readDNAStringSet(file)
    nn <- names(aln)
    if (group) {
        g <- vapply(strsplit(nn, sep), function(x) x[id], character(1))
    }

    idx <- which(nn != query)
    w <- width(aln[query])
    start <- seq(1, w, by=step)
    end <- start + window - 1
    start <- start[end <= w]
    end <- end[end <= w]
    res <- lapply(idx, function(i) {
        x <- toCharacter(aln[i]) == toCharacter(aln[query])
        pos <- round((start+end)/2)
        sim <- vapply(seq_along(start), function(j) {
            mean(x[start[j]:end[j]])
        }, numeric(1))

        y <- data.frame(sequence=nn[i], position = pos, similarity = sim)
        if(group) {
            y$group <- g[i]
        }
        return(y)
    }) %>% do.call(rbind, .)

    if (group) {
        res %<>% group_by_(~position, ~group) %>%
            summarize_(msim=~mean(similarity), sd=~sd(similarity))
    }


    if (group) {
        p <- ggplot(res, aes_(x=~position, y=~msim, group=~group))
        if (sd) p <- p + geom_ribbon(aes_(ymin=~msim-sd, 
                                          ymax=~msim+sd, 
                                          fill=~group), alpha=.25)
        if (smooth) {
            smooth_layer <- do.call(geom_smooth, 
                                    smooth_params)
            p <- p + smooth_layer
        } else {
            p <- p + geom_line(aes_(color=~group))
        }
        
        
    } else {
        mapping = aes_(x=~position, 
                       y=~similarity,
                       group=~sequence, 
                       color=~sequence)
        p <- ggplot(res, mapping = mapping) 
        
        if (smooth) {
            smooth_layer <- do.call(geom_smooth, 
                                    smooth_params)
            p <- p + smooth_layer
            
        } else {
            p <- p + geom_line()
        }
    }

    p + xlab("Nucleotide Position") + ylab("Similarity (%)") +
        ggtitle(paste("Sequence similarities compared to", query)) +
        theme_minimal() +
        theme(legend.title=element_blank()) 
}


toCharacter <- function(x) {
    unlist(strsplit(toString(x),""))
}




================================================
FILE: R/theme_msa.R
================================================
##' Theme for ggmsa.
##'
##' @title theme_msa
##' @importFrom ggplot2 theme_minimal
##' @importFrom ggplot2 labs
##' @export
##' @author Lang Zhou
theme_msa <- function(){
  list(
    xlab(NULL),
    ylab(NULL),
    labs(fill = "Fills"),
    coord_fixed(),
    scale_x_continuous(expand = c(0,0)),
    theme_minimal() +
        theme(
            strip.text = element_blank(),
            panel.spacing.y = unit(.4, "in"),
            panel.grid = element_blank())
  )
}


##' @importFrom grDevices colorRampPalette
##' @importFrom RColorBrewer brewer.pal
##' @importFrom ggplot2 coord_cartesian
##' @importFrom ggplot2 scale_x_continuous
##' @importFrom ggplot2 scale_y_continuous
##' @importFrom ggplot2 scale_fill_gradientn
bar_theme <- function(tidy){
    data <- bar_data(tidy)
    color_palettes <- colorRampPalette(brewer.pal(n = 9, 
                                                  name = "Blues")[c(4:7)])
    list(
        xlab(NULL),
        ylab("consensus"),
        scale_x_continuous(breaks = data[[3]], 
                           labels = data[[1]],
                           expand = c(0,0)),
        scale_y_continuous(breaks = NULL),
        scale_fill_gradientn(colours = color_palettes(100)),
        theme_minimal() +
            theme(panel.grid.minor.x = element_blank(), 
                  panel.grid.major.x = element_blank())
        )
}

facet_scale <- function(facetData, field) {
    facet0_pos <- facetData[facetData$facet == 0,"position"]
    msa_start <- min(facet0_pos)
    
    ## x labels of facet 0
    facet0_xl_scale <- pretty(min(facet0_pos):max(facet0_pos)) 
    
    ## assign the start position to the first label
    facet0_xl_scale[1] <- msa_start 
    xl_scale <- facet0_xl_scale
    for(i in max(facetData$facet) %>% seq_len) {
        scale_i <- facet0_xl_scale + field * i
        if(msa_start > 1) scale_i[1] <- scale_i[1] + 1
        #print(scale_i)
        xl_scale <- xl_scale %>% c(scale_i)
    }
    max_pos <- facetData$position %>% max
    xl_scale <- xl_scale[xl_scale <= max_pos]
    return(xl_scale)
}






================================================
FILE: R/zzz.R
================================================
#' @importFrom utils packageDescription
.onAttach <- function(libname, pkgname){
    #options(total_heigh = 4)
    options(logo_width = 0.9)
    options(asterisk_width = .03)
    options(GC_pos = 2)
    options(shadingLen = .5)
    options(shading_alpha = .3)
    
    pkgVersion <- packageDescription(pkgname, fields="Version")
    msg <- paste0(pkgname, " v", pkgVersion, "  ",
                  "Document: http://yulab-smu.top/ggmsa/", "\n\n")
    citation <- paste0("If you use ", pkgname,
                       " in published research, please cite:\n",
                       "L Zhou, T Feng, S Xu, F Gao, TT Lam, Q Wang, T Wu, ",
                       "H Huang, L Zhan, L Li, Y Guan, Z Dai*, G Yu* ",
                       "ggmsa: a visual exploration tool for multiple sequence alignment and associated data. ",
                       "Briefings in Bioinformatics. DOI:10.1093/bib/bbac222")
    packageStartupMessage(paste0(msg, citation))
    
}

================================================
FILE: README.Rmd
================================================
---
output: 
  md_document:
    variant: gfm
html_preview: TRUE
---
<!-- README.md is generated from README.Rmd. Please edit that file -->

```{r, include = FALSE}
knitr::opts_chunk$set(
  fig.path = "man/figures/REAMED-",
  message = FALSE,
  warning = FALSE
)
```
# ggmsa: a visual exploration tool for multiple sequence alignment and associated data <img src="man/figures/logo.png" height="140" align="right" />

```{r echo=FALSE, results="hide", message=FALSE}
library(badger)
```

```{r, echo = FALSE, results='asis'}
cat(
	badge_devel("YuLab-SMU/ggmsa", "blue"),
	badge_lifecycle("experimental", "orange"),
	badge_license("Artistic-2.0")
)
```
<!-- badges: start -->
<!-- [![CRAN_Release_Badge](https://www.r-pkg.org/badges/version-ago/ggmsa)](https://cran.r-project.org/package=ggmsa)-->
<!-- [![CRAN_Download_Badge](https://cranlogs.r-pkg.org/badges/grand-total/ggmsa?color=green)](https://cran.r-project.org/package=ggmsa)-->
<!-- badges: end -->


`ggmsa` is designed for visualization and annotation of multiple sequence alignments. It implements functions that make visualizing publication-quality multiple sequence alignments (protein/DNA/RNA) in R simple and powerful.

For details, please visit <http://yulab-smu.top/ggmsa/>


##  :hammer: Installation

Install the released version from `Bioconductor`:

```{r eval=FALSE}
if (!requireNamespace("BiocManager", quietly=TRUE))
    install.packages("BiocManager")
## BiocManager::install("BiocUpgrade") ## you may need this
BiocManager::install("ggmsa")
```

Alternatively, you can install the development version from GitHub using `devtools`:

```{r eval=FALSE}
if (!requireNamespace("devtools", quietly=TRUE))
    install.packages("devtools")
devtools::install_github("YuLab-SMU/ggmsa")
```
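
After either installation route, a quick check that the package loads can be useful. The sketch below is only an illustration: it relies on base R plus the package name, and the variable name `ggmsa_installed` is hypothetical. ggmsa prints a version and citation banner when attached, which `suppressPackageStartupMessages()` hides.

```r
# Check whether ggmsa can be loaded without attaching it first.
ggmsa_installed <- requireNamespace("ggmsa", quietly = TRUE)

if (ggmsa_installed) {
  # Attach quietly; the startup banner (version + citation) is suppressed.
  suppressPackageStartupMessages(library(ggmsa))
  message("ggmsa version: ", as.character(utils::packageVersion("ggmsa")))
} else {
  message("ggmsa is not installed; run one of the installation commands above.")
}
```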


##  :bulb: Quick Example 

```{r fig.height = 2.5, fig.width = 11, message=FALSE, warning=FALSE, dpi=300}
library(ggmsa)
protein_sequences <- system.file("extdata", "sample.fasta", package = "ggmsa")
ggmsa(protein_sequences, start = 221, end = 280, char_width = 0.5, seq_name = TRUE) + geom_seqlogo() + geom_msaBar()
```
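
As a variation on the example above, the following sketch first checks that the bundled sample file was found and then picks a color scheme and font explicitly. It assumes that `"Clustal"` and `"DroidSansMono"` appear in the listings printed by `available_colors()` and `available_fonts()` for your installed version; the variable names `msa_path` and `msa_plot` are illustrative only.

```r
# system.file() returns "" (not an error) when the package or file is missing,
# so test the path before handing it to ggmsa().
msa_path <- system.file("extdata", "sample.fasta", package = "ggmsa")
msa_plot <- NULL

if (nzchar(msa_path)) {
  library(ggmsa)

  # Print the color schemes and fonts known to the installed version.
  available_colors()
  available_fonts()

  # Assumption: "Clustal" and "DroidSansMono" are valid names in this version.
  msa_plot <- ggmsa(msa_path, start = 221, end = 280,
                    color = "Clustal", font = "DroidSansMono",
                    char_width = 0.5, seq_name = TRUE)
} else {
  message("sample.fasta not found; is ggmsa installed?")
}
```

Print `msa_plot` to draw the alignment. Layers such as `geom_seqlogo()` or `geom_msaBar()` can be added with `+`, as in the example above.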


##  :books: Learn more

Check out the guides for learning everything there is to know about all the different features:

- [Getting Started](https://yulab-smu.github.io/ggmsa/articles/ggmsa.html)
- [Annotations](https://yulab-smu.github.io/ggmsa/articles/guides/Annotations.html)
- [Color Schemes and Font Families](https://yulab-smu.github.io/ggmsa/articles/guides/Color_schemes_And_Font_Families.html)
- [Theme](https://yulab-smu.github.io/ggmsa/articles/guides/MSA_theme.html)
- [Other Modules](https://yulab-smu.github.io/ggmsa/articles/guides/Other_Modules.html)
- [View Modes](https://yulab-smu.github.io/ggmsa/articles/guides/View_modes.html)

##  :runner: Author

- [Guangchuang Yu](https://guangchuangyu.github.io)  Professor, PI 
- [Lang Zhou](https://github.com/nyzhoulang)  Master's Student
- [Shuangbin Xu](https://github.com/xiangpin)  PhD Student

**YuLab**  <https://yulab-smu.top/>

**Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University**



## :sparkling_heart: Contributing

We welcome any contributions! By participating in this project you agree to abide
by the terms outlined in the [Contributor Code of Conduct](https://github.com/YuLab-SMU/ggmsa/blob/master/CONDUCT.md).




================================================
FILE: README.md
================================================
<!-- README.md is generated from README.Rmd. Please edit that file -->

# ggmsa: a visual exploration tool for multiple sequence alignment and associated data <img src="man/figures/logo.png" height="140" align="right" />

[![](https://img.shields.io/badge/devel%20version-1.3.2-blue.svg)](https://github.com/YuLab-SMU/ggmsa)
[![](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](https://lifecycle.r-lib.org/articles/stages.html#experimental)
[![License:
Artistic-2.0](https://img.shields.io/badge/license-Artistic--2.0-blue.svg)](https://cran.r-project.org/web/licenses/Artistic-2.0)
<!-- badges: start -->
<!-- [![CRAN_Release_Badge](https://www.r-pkg.org/badges/version-ago/ggmsa)](https://cran.r-project.org/package=ggmsa)-->
<!-- [![CRAN_Download_Badge](https://cranlogs.r-pkg.org/badges/grand-total/ggmsa?color=green)](https://cran.r-project.org/package=ggmsa)-->
<!-- badges: end -->

`ggmsa` is designed for the visualization and annotation of multiple
sequence alignments. It provides a simple yet powerful way to draw
publication-quality multiple sequence alignments (protein/DNA/RNA) in R.

For details, please visit <http://yulab-smu.top/ggmsa/>

## :hammer: Installation

Install the released version from `Bioconductor`:

``` r
if (!requireNamespace("BiocManager", quietly=TRUE))
    install.packages("BiocManager")
## BiocManager::install("BiocUpgrade") ## you may need this
BiocManager::install("ggmsa")
```

Alternatively, you can install the development version from GitHub using
`devtools`:

``` r
if (!requireNamespace("devtools", quietly=TRUE))
    install.packages("devtools")
devtools::install_github("YuLab-SMU/ggmsa")
```

## :bulb: Quick Example

``` r
library(ggmsa)
protein_sequences <- system.file("extdata", "sample.fasta", package = "ggmsa")
ggmsa(protein_sequences, start = 221, end = 280, char_width = 0.5, seq_name = TRUE) + geom_seqlogo() + geom_msaBar()
```

![](man/figures/REAMED-unnamed-chunk-6-1.png)<!-- -->

## :books: Learn more

Check out the guides for learning everything there is to know about all
the different features:

-   [Getting
    Started](https://yulab-smu.github.io/ggmsa/articles/ggmsa.html)
-   [Annotations](https://yulab-smu.github.io/ggmsa/articles/guides/Annotations.html)
-   [Color Schemes and Font
    Families](https://yulab-smu.github.io/ggmsa/articles/guides/Color_schemes_And_Font_Families.html)
-   [Theme](https://yulab-smu.github.io/ggmsa/articles/guides/MSA_theme.html)
-   [Other
    Modules](https://yulab-smu.github.io/ggmsa/articles/guides/Other_Modules.html)
-   [View
    Modes](https://yulab-smu.github.io/ggmsa/articles/guides/View_modes.html)

## :runner: Author

-   [Guangchuang Yu](https://guangchuangyu.github.io) Professor, PI
-   [Lang Zhou](https://github.com/nyzhoulang) Master’s Student
-   [Shuangbin Xu](https://github.com/xiangpin) PhD Student

**YuLab** <https://yulab-smu.top/>

**Department of Bioinformatics, School of Basic Medical Sciences,
Southern Medical University**

## :sparkling_heart: Contributing

We welcome any contributions! By participating in this project you agree
to abide by the terms outlined in the [Contributor Code of
Conduct](https://github.com/YuLab-SMU/ggmsa/blob/master/CONDUCT.md).


================================================
FILE: inst/CITATION
================================================
citHeader("To cite ggmsa in publications use:")

citEntry(
    entry  = "book",
    title = "Data Integration, Manipulation and Visualization of Phylogenetic Trees",
    author = person("Guangchuang", "Yu"),
    publisher = "Chapman and Hall/{CRC}",
    year = "2022",
    edition = "1st edition",
    url = "https://www.amazon.com/Integration-Manipulation-Visualization-Phylogenetic-Computational-ebook/dp/B0B5NLZR1Z/",
    textVersion = paste("Guangchuang Yu. (2022).",
                        "Data Integration, Manipulation and Visualization of Phylogenetic Trees (1st edition).",
                        "Chapman and Hall/CRC.")   
)


citEntry(
    entry  = "article",
    title  = "ggmsa: a visual exploration tool for multiple sequence alignment and associated data",
    author = personList(
        as.person("Lang Zhou"),
        as.person("Tingze Feng"),
        as.person("Shuangbin Xu"),
        as.person("Fangluan Gao"),
        as.person("Tommy T Lam"),
        as.person("Qianwen Wang"),
        as.person("Tianzhi Wu"),
        as.person("Huina Huang"),
        as.person("Li Zhan"),
        as.person("Lin Li"),
        as.person("Yi Guan"),
        as.person("Zehan Dai"),
        as.person("Guangchuang Yu")
        ),
    journal = "Briefings in Bioinformatics",
    volume  = "23",
    issue   = "4",
    year    = "2022",
    month   = "06",
    ISSN    = "1467-5463",
    doi     = "10.1093/bib/bbac222",
    PMID    = "35671504",
    url     = "https://academic.oup.com/bib/article-abstract/23/4/bbac222/6603927",
    textVersion = paste("L Zhou, T Feng, S Xu, F Gao, TT Lam, Q Wang, T Wu, H Huang, L Zhan, L Li, Y Guan, Z Dai, G Yu.",
                        "ggmsa: a visual exploration tool for multiple sequence alignment and associated data.",
                        "Briefings in Bioinformatics. 2022, 23(4):bbac222. doi:10.1093/bib/bbac222")
)

================================================
FILE: inst/extdata/GVariation/A.Mont.fas
================================================
>Mont
ATGGCAACTTACACATCAACAATCCAGTTTGGTTCCATTGAATGCAAACTTCCATACTCACCCGCTCCTTTTGGGCTAGTTGCGGGGAAACGAGAAGTTTTAACCACCACTGACCCCTTCGCAAGTTTGGAGATGCAGCTTAGTGCGCGATTACGAAGGCAAGAGTTTGCAACTATTCGAACATCCAAGAATGGTACTTGCATGTATCGATACAAGACTGATGTCCAGATTGCGCGCATTCAAAAGAAGCGCGAGGAAAGAGAAAGAGAGGAATATAATTTCCAAATGGCTGCGTCAAGTGTTGTGTCGAAGATCACTATTGCTGGTGGAGAGCCACCTTCAAAACTTGAATCACAAGTGCGGAGGGGTGTCATCCACACAACTCCAAGGATGCGCACAGCAAAAACATATCACACGCCAAAGCTGACAGAGGGACAAATGAACCACCTTATCAAGCAGGTGAAGCAAATTATGTCAACCAAAGGAGGGTCTGTTCAACTGATTAGCAAGAAAAGTACTCATGTTCACTATAAAGAAGTTTTGGGATCACATCGCGCAGTTGTTTGCACTGCACATATGAGAGGTTTACGAAAGAGAGTGGACTTTCGGTGTGATAAATGGACCGTTGTGCGTCTACAGCATCTCGCCAGGACGGACAAGTGGACTAACCAAGTTCGTGCTACTGATCTACGCAAGGGCGATAGTGGAGTTATATTGAGTAATACTAATCTCAAAGGAAACTTTGGGAGAAGCTCGGAGGGCCTATTCATAGTGCGTGGGTCGCACGAAGGAAAAATCTATGATGCACGTTCCAAGGTTACTCAAGGGGTTATGGATTCAATGGTTCAGTTCTCAAGCGCTGAAAGCTTTTGGAAGGGATTGGACGGCAATTGGGCACAAATGAGATATCCTACAGATCATACATGTGTGGCAGGCTTACCAGTTGAAGACTGTGGCAGAGTTGCAGCGATAATGACACACAGTATTTTACCGTGCTATAAGATAACCTGCCCTACCTGTGCCCAACAATATGCCAACTTGCCAGCCAGTGACTTACTTAAGATATTACACAAGCACGCAAGTGATGGTCTAAATCGATTGGGGGCAGACAAAGATCGCTTTGTGCATGTCAAAAAGTTCTTGACAATCTTAGAGCACTTAACTGAACCGGTTGATCTGAGTCTAGAAATTTTCAATGAAGTATTCAAGTCTATAGGGGAGAAGCAACAATCACCTTTCAAAAACCTGAATATTCTGAATAATTTCTTTTTGAAAGGAAAGGAAAATACAGCTCGTGAATGGCAGGTGGCTCAATTAAGCTTACTTGAATTGGCAAGATTCCAAAAGAACAGAACGGATAATATCAAGAAAGGAGACATCTCGTTCTTTAGGAATAAACTATCTGCAAAAGCAAATTGGAACTTGTATCTGTCATGTGATAACCAGCTGGATAAGAATGCAAACTTCCTGTGGGGACAGAGGGAATATCATGCTAAGCGATTTTTCTCGAACTATTTCGAGGAAATTGATCCAGCGAAGGGCTATTCAGCATACGAAAATCGTTTGCATCCGAATGGGACAAGAAAACTTGCAATTGGAAACCTAATTGTACCACTTGATCTGGCTGAGTTTAGGCGGAAGATGAAAGGTGATTATAAAAGACAGCCAGGGGTGAGTAAGAAGTGCACGAGCTCGAAGGATGGAAACTACGTGTATCCCTGTTGTTGCACTACACTTGATGATGGCTCAGCTGTTGAGTCAACATTTTACCCGCCAACTAAGAAGCACCTCGTAATAGGTAATAGTGGCGACCAAAAGTATGTTGACTTACCAAAAGGGAATTCTGAGATGTTATATATTGCCAGGCAAGGCTTCTGTTACATTAACATTTTTCTCGCGATGTTGATTAACATTAGTGAGGAAGATGCAAAGGATTTCACTAAGAAGGTTCGTGACATGTGTGTGCCAAAGCTTGGAACCTGGCCAACCATGATGGATTT
GGCTACAACTTGTGCTCAAATGAAAATATTCTACCCTGATGTTCATGATGCAGAACTGCCTAGAATACTAGTCGATCACGAAACGCAGACATGCCATGTGGTTGACTCGTTTGGCTCACAAACAACTGGGTATCATATTTTGAAAGCATCTAGCGTGTCCCAACTTATTTTGTTTGCTAATGATGAGTTGGAGTCTGACATTAAGCACTATAGAGTTGGTGGTATTCCTGGAGCATGCCCTGAGCTTGGGTCCACAATATCACCTTTTAGAGAAGGAGGAATCATAATGTCTGAGTCAGCAGCGCTAAAACTGCTCCTAAAGGGAATTTTTAGGCCCAAAGTGATGAAGCAATTGCTACTGGATGAACCATATTTGCTCATTTTATCGATATTATCTCCTGGTATACTTATGGCCATGTACAACAATGGGATATTTGAGTTAGCGGTGAAGTTGTGGATCAATGAGAAACAATCTATAGCCATGATAGCATCGTTATTGTCCGCCTTGGCTTTACGAGTGTCAGCAGCAGAAACACTCGTTGCACAGAGGATTATAATTGACACGGCAGCAACAGATCTTCTCGATGCTACGTGTGATGGATTCAACTTACATCTAACATATCCCACTGCACTCATGGTGTTGCAAGTTGTTAAGAACAGAAATGAATGTGATGATACGTTGTTTAAAGCAGGTTTTTCACATTACAACATGAGTGTCGTGCAGATTATGGAAAAAAATTATCTAAGCCTCTTGGGCGATGCTTGGAAAGATTTAACCTGGCGAGAAAAATTATCCGCAACATGGCACTCATACAAAGCAAAGCGCTCTATCACTCAGTTCATAAAACCCATAGGCAAAGCAGATTTAAAAGGGTTGTACAACATATCACCGCAAGCATTCTTGGGTCAGGGCGTACAGAGAGTCAAAGGCACCGCCTCAGGGTTGAATGAGCGACTCAATAATTATATCAATACTAAGTGTGTAAATATTTCATCCTTTTTCATTCGTAGAATTTTCCGGCGCTTGCCAACTTTTGTAACTTTCATTAATTCATTATTAGTTATTAGTATGCTAACTAGTGTAGTAGCAGTGTGTCAAGCAATAATTCTAGATCAAAGGAAGTATAGAAAAGAAATTGAGTTGATGCAGATTGAGAAGAATGAAATTGTTTGTATGGAGTTGTATGCGAGTCTGCAGCGCAAACTTGAGCGTGAATTCACATGGGATGAATATATGGAATATTTGAAATCTGTGAATCCCCAGATAGTTCAATTCGCGCAAGCTCAAATGGAAGAATATAATGTGCGACATCAGCGCTCCACACCAGGTGTTAAGAATTTAGAGCAGGTGGTAGCATTTATAACTCTAATTATCATGATGTTTGATGCTGAAAGGAGCGACTGTGTATTTAAGACTCTCAACAAATTCAAAGGCATCGTTTCTTCAATGGATCATGAAGTTAGACACCAGTCCTTGGATGATGTAATCAAGAATTTCGATGAAAGGAACGAAGTTATTGATTTTGAACTAAATGAGGATACAATTAAAACATCATCAGTGTTGGACACAAAGTTTAGCGACTGGTGGGATCGGCAAATCCAAATGGGACACACACTTCCCCATTATAGAACTGAGGGACACTTCATGGAATTCACAAGGGCAACTGCTGTACAAGTGGCCAACGACATCGCGCATAGTGAGCACCTAGACTTTCTAGTGAGGGGAGCTGTTGGGTCTGGAAAATCTACTGGACTGCCTGTCCATCTCAGTGCAGCTGGATCTGTGCTTTTGATAGAACCAACTCGACCACTTGCAGAAAACGTGTTCAAGCAATTATCCAGTGAACCGTTTTTCAAGAAGCCAACACTGCGCATGCGAGGAAATAGTGTGTTTGGTTCCTCTCCAATCTCCATTATGACTAGCGGCTTTGCGTTGCACTACTATGCTAATAATCGCTCTCAGCTAACTCAGTTTAATTTCATAATTTTTGATGAATGTC
ATGTTTTAGATCCTTCTGCAATGGCATTTCGTAGCTTGTTAAGTGTGTATCACCAAACATGCAAAGTGTTAAAGGTGTCAGCCACTCCAGTGGGAAGGGAGGTCGAGTTCACAACACAACAACCAGTTAAATTGGTGGTTGAGGATACACTTTCATTCCAATCTTTTGTTGATGCGCAAGGCTCAAAAACCAATGCTGACGTAGTTCAGCATGGTTCGAACATACTCGTGTATGTGTCGAGTTACAATGAAGTGGATACATTAGCCAAGCTTCTAACAGATAGGAATATGATAGTCTCAAAAGTTGATGGCAGAACAATGAAGCACGGATGCTTAGAAATTGTAACGAAAGGGACTAGTGCAAAGCCACATTTTGTCGTAGCAACCAACATTATTGAAAATGGAGTAACTTTAGATATAGATGTAGTTGTAGATTTTGGGCTTAAAGTCTCACCGTTTTTAGATATTGACAATAGGAGCATAGCATACAATAAGATTAGTGTTAGCTATGGAGAAAGAATTCAGAGGTTGGGCCGTGTTGGGCGCTTTAAGAAGGGAGTGGCATTGCGTATTGGACACACCGAAAAGGGAATTATTGAGATTCCAAGTATGATTGCTAGTGAAGCTGCGCTTGCGTGCTTTGCATACAATTTGCCAGTAATGACAGGGGGTGTTTCAACTAGCCTCATTGGCAATTGTACTGTTCGTCAAGTTAAAACTATGCAACAATTTGAGCTGAGTCCATTCTTTATACAAAATTTTGTTGCCCATGATGGATCAATGCATCCTGTCATACATGACATTCTTAAGAAGTATAAACTGCGAGATTGTATGACGCCCTTGTGTGATCAATCCATACCTTACAGAGCCTCAAGCACTTGGTTGTCTGTTAGTGAGTACGAACGACTCGGAGTGGTTTTGGACATTCCAAAACAGATCAAGATTGCATTCCACATCAAGGATATCCCTCCTAAGTTGCATGAAATGCTTTGGGAAACAGTTATCAAATATAAGGATGTTTGTTTGTTTCCAAGTATTCGGGCTTCATCCATTAGCAAAATTGCATACACACTGCGCACTGATCTTTTTGCAATTCCCAGAACCCTAATTCTAGTTGAAAGATTGCTCGAGGAGGAACGAGTGAAACAGAGTCAATTCAGAAGTCTCATTGATGAAGGATGCTCAAGCATGTTTTCAATTGTTAATTTAACAAACACTCTTAGAGCTAGATATGCAAAGGATTACACTGCAGAAAACATACAGAAGCTCGAGAAAGTGAGGAGTCAGTTAAAGGAGTTCTCAAATTTAAATGGCTCTGCATGCGAGGAGAACTTAATGAAGAGGTATGAATCTCTACAGTTTGTGCATCATCAAGCAACAACTGCACTCGCAAAGGATTTGAAGTTGAAAGGAGTTTGGAAGAAGTCATTAGTTGTGCAGGACTTAATCATAGCGGGTGCCGTTGCTATTGGTGGAATAGGGCTCATCTATAGTTGGTTTACTCAATCAGTTGAAACTGTGTCTCACCAGGGCAAGAACAAATCCAAAAGAATTCAAGCATTGAAGTTTCGACACGCCCGCGATAAGAGGGCTGGTTTTGAAATTGATAACAATGATGATACAATAGAAGAATTCTTTGGATCTGCATACAGGAAGAAGGGAAAAGGTAAAGGCACCACTGTTGGTATGGGCAAGTCAAGCAGGAGGTTTGTTAATATGTATGGATTTGACCCAACAGAATATTCATTCATCCAGTTCGTTGATCCGCTCACTGGAGCTCAAATTGAAGAGAACGTCTATGCTGATATTAGAGACATCCAAGAGCGCTTTAGTGATGTCCGCAAGAAAATGGTAGAGGATGATGAAATCGAATTGCAAGCATTGGGCAGCAACACAATCATTCATGCTTACTTCAGGAAGGATTGGTCTGACAAGGCTCTAAAAATTGATTTGATGCCACACAACCCACTCAAAATCTGTGATAAATCGAATGGCATTGCT
AAGTTTCCTGAAAGAGAACTTGAGTTGAGGCAAACTGGGCCAGCAACAGAGGTTGATGTGAAAGACATTCCAAAACAGGAAGTGGAGCATGAAGCCAAATCACTCATGAGAGGTTTAAGGGATTTCAATCCAATTGCTCAAACAGTTTGCAGAGTAAAAGTGTCTGTTGAATATGGAACGTCTGAAATGTATGGGTTCGGTTTTGGTGCGTATATTATAGTAAACCACCATCTATTCAAGAGTTTCAATGGATCCATGGAAGTGCGATCAATGCATGGAACATTCAGAGTGAAGAATTTGCATAGCCTGAGCGTTTTACCGATCAAAGGCAGAGACATTATCATCATAAAGATGCCAAAGGATTTCCCTGTTTTCCCACAAAAACTGCACTTCCGAGCTCCAGTGCAGAATGAGAGGATTTGTTTGGTTGGAACTAATTTTCAAGAAAAACATGCATCATCAATCATCACAGAAACGAGTACTACATACAATGTACCGGGCAGCACTTTTTGGAAGCATTGGATTGAAACAAATGATGGGCATTGTGGATTACCAGTAGTGAGTACAGCTGATGGATGTCTAGTTGGAATACACAGCTTGGCGAATAATGTGCAAACCACGAATTATTATTCAGCCTTTGATGAGGATTTTGAAAGTAAGTATCTCCGAACTGATGAGCATAATGAGTGGACCAAATCGTGGGTATATAACCCAGATACTGTGTTGTGGGGTCCATTGAAGCTCAAAGAGAGTACCCCTAAAGGCCTGTTTAAGACAACAAAACTTGTACAGGATTTAATTGATCATGATGTTGTTGTAGAGCAAGCTAAACATTCTGCGTGGATGTATGAGGCTCTAACAGGGAATTTGCAAGCTGTGGCGACAATGAAGAGTCAGCTAGTGACAAAGCACGTGGTCAAAGGGGAGTGTCGGCACTTCAAAGAGTTCTTAACTGTGGATTCGGAAGCAGAAGCTTTCTTCAGGCCTTTGATGGATGCTTATGGGAAGAGCTTGTTAAATAGAGAAGCATATATAAAGGACATAATGAAATACTCAAAGCCTATTGATGTTGGAATAGTAGACTGTGATGCTTTTGAAGAGGCTATCAATAGGGTTATCATTTATCTGCAAGTACATGGCTTCCAGAAATGCAATTACATCACCGATGAGCAGGAAATTTTCAAAGCTCTCAATATGAAAGCTGCTGTCGGGGCTATGTATGGAGGCAAGAAGAAAGACTACTTCGAGCATTTTACTGAGGCGGATAAAGAGGAAATTGTTATGCAAAGTTGCTTACGATTGTACAAGGGCTCACTTGGCATATGGAATGGATCATTGAAAGCAGAACTTCGGTGCAAAGAGAAGATACTTGCAAATAAGACAAGGACATTCACTGCTGCACCTTTAGATACTCTACTGGGTGGGAAGGTGTGCGTTGATGATTTTAATAATCAATTCTACTCAAAGAACATTGAATGCTGCTGGACTGTTGGAATGACTAAGTTTTATGGAGGTTGGGACAAATTGCTTCGGCGTCTACCTGAAAATTGGGTGTACTGCGATGCCGATGGTTCACAATTCGATAGTTCACTCACCCCATACCTAATTAATGCTGTTCTCATCATCAGAAGCACATACATGGAAGATTGGGACTTGGGGTTGCAAATGTTGCGCAATTTGTACACAGAAATAATTTACACACCAATCTCAACTCCAGATGGAACAATTGTCAAGAAGTTTAGAGGTAATAATAGCGGTCAACCTTCTACCGTTGTGGATAATTCTCTCATGGTTGTTCTTGCTATGCATTACGCTCTCATTAAGGAGTGCGTTGAGTTTGAAGAAATCGACAGCACGTGTGTATTCTTTGTTAATGGTGATGACTTATTGATTGCTGTGAATCCGGAGAAAGAGAGCATTCTCGATAGAATGTCACAACATTTCTCAGATCTTGGTTTGAACTATGATTTTTCGTCGAGAACAAGAAGGAAGGAGGAATT
GTGGTTCATGTCCCATAGAGGCCTGCTAATTGAGGGTATGTACGTGCCAAAGCTTGAAGAAGAGAGAATTGTATCCATTCTGCAATGGGATAGGGCTGATCTGCCAGAGCACAGATTAGAAGCGATTTGTGCAGCAATGATAGAATCCTGGGGTTATTTTGAGTTAACGCACCAAATTAGGAGATTCTACTCATGGTTGTTACAACAGCAACCTTTTTCAACGATAGCACAGGAAGGAAAAGCTCCATACATAGCGAGCATGGCATTGAAGAAGCTGTACATGAATAGGACAGTAGATGAGGAGGAACTGAAGGCTTTCACTGAAATGATGGTTGCCTTGGATGACGAATTTGAGTGCGATACTTATGAAGTGCACCATCAAGGAAATGACACAATCGATGCAGGAGGAAGCACTAAGAAGGATGCAAAACAAGAGCAAGGTAGCATTCAACCAAATCTCAACAAGGAAAAGGAAAAGGACGTGAATGTTGGAACATCTGGAACTCATACTGTGCCACGAATTAAAGCTATCACGTCCAAAATGAGAATGCCTAAGAGTAAAGGTGCAACTGTACTAAATTTGGAACACTTACTCGAGTATGCTCCACAGCAAATTGACATCTCAAATACTCGAGCAACTCAATCACAGTTTGATACGTGGTATGAAGCAGTACAACTTGCATACGACATAGGAGAAACTGAAATGCCAACTGTGATGAATGGGCTTATGGTTTGGTGCATTGAAAATGGAACCTCGCCAAACATCAACGGAGTTTGGGTTATGATGGATGGAGATGAACAAGTCGAATACCCACTGAAACCAATCGTTGAGAATGCAAAACCAACACTTAGGCAAATCATGGCACATTTCTCAGATGTTGCAGAAGCGTATATAGAAATGCGCAACAAAAAGGAACCATATATGCCACGATATGGTTTAGTTCGTAATCTGCGCGATGGAAGTCTGGCTCGCTATGCTTTTGACTTTTATGAAGTTACATCACGGACACCAGTGAGGGCTAGAGAGGCACACATTCAAATGAAGGCCGCAGCTTTAAAATCAGCTCAATCTCGACTTTTCGGATTGGATGGTGGCATTAGTACACAAGAGGAAAACACAGAGAGGCACACCACCGAGGATGTTTCTCCAAGTATGCATACTCTACTTGGAGTGAAGAACATGTGA
>CF_YL21
ATGGCAATCTACACGTCAACAATCTGTTTCGGTTCGTTTGAATGCAAGCTACCATACTCACCCGCCTCTTGCGGGCATATTGTGAAGGAACGAGAAGTGCTGGCTTCCATCGATCCTTTCGCAGATCTGGAAACACAACTTAGTGCACGATTGCTCAAGCAAGAATATGCTACTGTTCGTGTGCTCAAGAACGGTACTCTTACGTACCGATACAAGACTGATGCCCAGATAACGCGCATCCAGAAGAGACTGGAAAGGAAGGATAGGGAAGAATACCACTTCCAGATGGCAGCTCCTAGTATTGTGTCAAAGATCACTATTGCTGGTGGAGAGCCACCTTCAAAACTTGAATCACAAGTGCGGAGGGGTGTCATCCACACAACTCCAAGGATGCGCACAGCAAAAACATATCACACGCCAAAGCTGACAGAGGGACAAATGAACCACCTTATCAAGCAGGTGAAGCAAATTATGTCAACCAAAGGAGGGTCTGTTCAACTGATTAGCAAGAAAAGTACCCATGTTCACTATAAAGAAGTTTTGGGATCACATCGCGCAGTTGTTTGCACTGCACATATGAGAGGTTTACGAAAGAGAGTGGACTTTCGGTGTGATAAATGGACCGTTGTGCGTCTACAGCATCTCGCCAGGACGGATAAGTGGAATAACCAAGTTCGTGTTACTGATCTACGCAAGGGCGATAGTGGAGTTATATTGAGTAATACTAATCTCAAAGGAAACTTTGGGAGAAGCTCGGAGGGCATATTCATAGTGCGTGGGTCGCACGAAGGAAAAATCTATGATGCACGTTCCAAGGTTACTCAAGGGGTTATGGATTCAATGGTTCAGTTCTCAAGCGCTGAAAGCTTTTGGAAGGGATTGGACGGCAATTGGGCACAAATGAGATATCCTACAGATCATACATGTGTGGCAGGCTTACCAGTTGAAGACTGTGGCAGAGTTGCAGCGATAATGACACACAGTATTTTACCGTGCTATAAGATAACCTGCCCTACCTGTGCCCAACAATATGCCAACTTGCCAGCCAGTGACTTACTTAAGATATTACACAAGCACGCAAGTGATGGTTTAAATCGATTGGGGGCAGACAAAGATCGCTTTGTGCATGTCAAAAAGTTCTTGACAATCTTAGAGCACTTAACTGAACCGGTTGATCTGAGTCTAGAAATTTTCAATGAAGTATTCAAGTCTATAGGGGAGAAGCAACAATCACCTTTCAAAAACCTGAATATTCTGAATAATTTCTTTTTAAAAGGAAAGGAAAATACAGCTCGTGAATGGCAGGTGGCTCAATTAAGCTTACTTGAATTGGCAAGATTCCAAAAGAACAGAACGGATAATATCAAGAAAGTAGACATCTCGTTCTTTAGGAATAAACTATCTGCCAAAGCAAATTGGAACTTGTATCTGTCATGTGATAACCAGCTGGATAAGAATGCAAACTTCCTGTGGGGACAGAGGGAATATCATGCTAAGCGATTTTTCTCGAACTATTTCGAGGAAATTGATCCAGCGAAGGGCTATTCAGCATACGAAAATCGTTTGCATCCGAATGGGACAAGAAAACTTGCAATTGGAAACCTAATTGTACCACTTGATCTGGCTGAGTTTAGGCGGAAGATGAAAGGTGATTTTAAAAGACAGCCAGGGGTGAGTAAGAAGTGCACGAGCTCGAAGGATGGAAACTACGTGTATCCCTGTTGTTGCACTACACTTGATGATGGCTCAGCTGTTGAATCAACATTTTACCCGCCAACTAAGAAGCGCCTCGTAATAGGTAATAGTGGCGACCAAAAGTATGTTGACTTACCAAAAGGGAATTCTGAGATGTTATATATTGCCAGGCAAGGCTTCTGTTACATTAACATTTTCCTCGCGATGTTGATTAACATTAGTGAGGAAGATGCAAAGGATTTCACTAAGAAGGTTCGTGACATGTGTGTGCCAAAGCTCGGAACCTGGCCAACCATGATGGATCT
GGCTACAACTTGTGCTCAAATGAAAATATTCTACCCTGATGTTCATGATGCAGAACTGCCTAGAATACTAGTCGATCACGAAACGCAGACATGCCATGTGGTTGACTCGTTTGGCTCACAAACAACTGGGTATCATATTTTGAAAGCATCTAGCGTGTCCCAACTTATTTTGTTTGCTAATGATGAGTTGGAGTCTGACATTAAGCACTATAGAGTTGGTGGTATTCCTAATGCATGCCCTGAACTTGGGTCCACAATATCACCTTTCAGAGAAGGAGGAGTTATAATGTCTGAGTCGGCAGCGCTGAAACTGCTTTTGAAGGGAATTTTTAGACCTAAGGTGATGAGACAGTTGCTGTTAGATGAGCCTTACCTGTTGATTCTATCAATATTATCCCCTGGCATACTGATGGCTATGTATAATAATGGGATTTTTGAACTTGCGGTGAGGTTGTGGATTAATGAGAAGCAATCCATAGCGATGATAGCATCGCTACTATCAGCTTTAGCCCTACGAGTGTCAGCGGCAGAAACACTCGTCGCACAGAGGATTATAATTGATGCTGCAGCTACAGACCTCCTCGATGCTACGTGTGACGGGTTCAACCTGCATCTAACGTACCCCACTGCATTGATGGTGTTGCAAGTTGTTAAGAATAGAAATGAATGTGATGATACTCTATTCAAGGCGGGTTTTCCAAGTTACAACACGAGCGTCGTACAGATTATGGAAAAAAATTATCTAAATCTCTTGAACGATGCTTGGAAAGATTTAACTTGGCGGGAAAAATTTTCCGCAACATGGTACTCATACAGAGCAAAACGCTCTATCACTCGGTACATAAAACCCACAGGAAGGGCAGATTTGAAAGGGTTATACAACATATCACCACAAGCATTCTTGGGCCGAAGCGTCCAGGTGGTCAAAGGCACTGCCTCAGGATTGAGCGAGCGATTTAATAATTACTTCAATACTAAGTGTGTAAATATTTCATCCTTTTTCATTCGTAGAATCTTTAGGCGTTTGCCAACTTTCGTCACTTTTGTTAACTCATTATTAGTTATTAGTATGTTAACTAGTGTAGTGGCAGTGTGTCAGGCAATAATTTTAGATCAGAAGAAGTATAGGAGAGAGATCGAGTTGATGCAGATAGAGAAGAATGAGATTGTCTGCATGGAGCTATACGCAAGTTTACAGCGCAAACTTGAACGCGATTTCACATGGGATGAGTACATTGAGTATTTGAAATCAGTGAACCCTCAGATAGTTCAATTTGCTCAAGCGCAGATGGAAGAATATGATGTGCGACACCAGCGTTCCACACCAGGTGTTAAAAATTTGGAACAAGTGGTAGCATTTATGGCTTTAGTCATCATGGTGTTCGATGCTGAAAGGAGTGACTGCGTGTTCAAAACTCTCAATAAATTTAAGGGTGTCCTTTCCTCACTGGACCATGAAGTTAGACATCAGTCCTTGGATGATGTGATCAAGAATTTTGATGAGAGGAATGAGACTATTGATTTTGAATTGAGTGAGGACACAATTCGAACATCATCAGTGCTAGATACAAAGTTTAGTGATTGGTGGGACCGACAAATCCAAATGGGACATACACTTCCACATTACAGAACCGAGGGGCACTTCATGGAATTTACAAGGGCAACTGCTGTTCAAGTGGCTAATGACATTGCCCACAGCGAACACCTAGACTTTCTAGTAAGGGGAGCTGTTGGGTCTGGAAAGTCAACTGGGTTGCCTGTTCATCTTAGTGTAGCTGGATCTGTGCTTTTAATTGAGCCAACGCGGCCACTAACGGAGAACGTTTTCAAACAGCTATCTAGTGAACCATTCTTCAAGAAGCCAACACTGCGTATGCGTGGAAATAGTATATTTGGCTCTTCCCCAATCTCCGTTATGACTAGCGGGTTTGCGCTACACTACTTCGCCAATAATCGCTCTCAATTAGCTCAGTTCAATTTTGTAATATTTGATGAGTGCC
ATGTTCTGGATCCTTCCGCAATGGCGTTCCGCAGTCTGCTGAGTGTTTATCATCAAGCATGTAAAGTATTAAAAGTGTCAGCTACTCCAGTGGGAAGAGAGGTTGAATTCACAACACAGCAGCCAGTCAAGTTAATAGTGGAAGACACACTGTCTTTCCAATCATTTGTTGATGCACAAGGTTCTAAAACTAATGCTGATGTTGTCCAGTTTGGTTCAAACATACTTGTTTACGTGTCGAGCTACAATGAAGTTGACAACTTGGCTAAGCTCCTAACAGATAAGAATATGATGGTCACAAAGGTTGATGGCAGAACAATGAAGCACGGTTGCCTAGAAATTGTCACAAAAGGAACCAGTGCGAGACCACATTTTGTTGTAGCAACCAACATAATTGAGAATGGAGTGACTTTGGACATAGATGTAGTAGTGGATTTTGGGTTGAAAGTCTCACCGTTCTTGGACATTGACAATAGGAGCATTGCTTACAATAAGGTGAGTGTTAGTTATGGTGAGAGAATTCAAAGGCTGGGTCGTGTTGGACGCTTCAAGAAAGGAGTAGCATTGCGCATTGGACACACTGAGAAGGGAATTATCGAAATTCCAAGCATGATCGCTACTGAGGCGGCTCTTGCTTGCTTTGCATATAACTTACCAGTGATGACAGGAGGCGTCTCAACTAGTCTGATTGGCAATTGTACTGTGCGCCAGGTCAAAACAATGCAGCAATTTGAATTGAGCCCCTTCTTTATCCAGAATTTCGTTGCTCATGATGGATCAATGCATCCTGTCATACATGACATTCTCAAAAAGTATAAACTCCGAGATTGTATGACACCTTTGTGCGATCAGTCTATACCATACAGGGCATCGAGCACTTGGTTATCGGTTAGTGAATATGAGCGACTTGGAGTGGCTTTAGAAATTCCAAAGCAAGTCAAAATTGCATTCCATATCAAAGAGATCCCTCCTAAGCTCCACGAAATGCTTTGGGAAACGGTTGTCAAGTACAAAGACGTTTGCTTATTTCCAAGCATTCGAGCATCGTCCATCAGCAAAATCGCATACACATTGCGTACAGACCTCTTCGCCATCCCAAGAACTCTAATATTGGTGGAGAGATTGCTTGAAGAGGAGCGAGTGAAGCAGAGCCAATTCAGAAGTCTCATCGATGAAGGGTGCTCAAGCATGTTTTCAATTGTCAACTTGACCAACACTCTCAGGGCCAGATATGCAAAAGATTACACCGCAGAGAACATACAAAAACTTGAGAAAGTGAGAAGTCAATTAAAAGAATTCTCAAATTTGGATGGTTCTGCATGTGAGGAAAATTTAATAAAGAGGTATGAGTCTTTGCAGTTCGTTCACCACCAAGCTGCGACGTCACTTGCAAAGGATCTCAAGTTGAAGGGGACTTGGAAGAAGTCATTAGTGGCTAAAGACTTGATCATAGCAGGCGCTGTTGCAATTGGTGGAATAGGACTCATATATAGTTGGTTCACACAATCAGTTGAGACTGTGTCTCACCAAGGGAAAAATAAATCCAAAAGAATCCAAGCCTTGAAGTTTCGCCATGCTCGTGACAAAAGGGCTGGCTTTGAAATTGACAACAATGATGACACAATAGAGGAATTCTTTGGATCTGCATACAGGAAGAAGGGAAAAGGTAAAGGTACCACAGTTGGTATGGGCAAGTCAAGCAGGAGGTTCATTAACATGTACGGGTTTGACCCGACAGAGTACTCATTCATCCAATTCGTTGATCCACTCACTGGGGCACAAATAGAAGAGAATGTCTATGCTGACATTAGAGATGTTCAAGAAAGATTTAGTGAGGTGCGAAAGAAAATGGTTGAGAATGATGATATTGAAATGCAAGCCTTGGGTAGTAACACGACCATACATGCATACTTTAGGAAAGATTGGTCTGACAAAGCTTTGAAGATTGATTTAATGCCACATAACCCACTCAAAGTTTGTGACAAAACAAATGGCATTGCC
AAACTTCCTGAGAGAGAGTTCGAACTAAGGCAGACTGGGCCAGCTGTAGAAGTCGACGTGAAGGACATACCAGCGCAGGAAGTGGAGCATGAAGCCAAATCGCTCATGAGAGGCTTGAGAGACTTCAACCCAATTGCTCAAACAGTTTGTAGGCTGAAAGTATCTGTTGAATATGGGACATCAGAGATGTACGGTTTTGGATTTGGAGCATACATAATAGCGAACCACCATTTGTTTAGGAGTTACAATGGTTCCATGGAGGTGCGATCCATGCATGGTACATTCAGGGTGAAGAATCTACACAGTTTGAGCGTTCTGCCAATTAAAGGCAGGGACATCATCCTCATCAAAATGCCGAAGGATTTCCCTGTTTTTCCACAGAAATTGCATTTCCGAGCTCCTACACAGAATGAAAGAGTTTGTTTAGTTGGGACCAACTTTCAGGAGAAGTATGCATCGTCGATCATCACAGAAACAAGCACTACTTACAATATACCAGGCAGCACATTCTGGAAGCATTGGATTGAAACAGATAATGGACATTGTGGACTACCAGTGGTGAGCACAGCCGATGGATGTCTAGTCGGAATTCATAGTTTGGCAAACAATGCACACACCACGAACTACTACTCAGCCTTCGATGAAGATTTTGAAAGCAAGTACCTCCGAACCAATGAGCACAATGAATGGGTCAAATCTTGGGTTTATAATCCAGACACAGTGTTGTGGGGCCCGTTGAAGCTTAAAGATAGCACTCCCAAAGGGTTATTCAAAACAACAAAGCTTGTGCAAGATCTAATCGATCATGATGAAGTCGCGGAGCAAGCTAAGCACTCTGCGTGGATGTTTGAAGCCTTGACAGGAAATTTGCAAGCTGTCGCAACAATGAAGAGTCAATTAGTAACCAAGCATGTAGTTAAAGGAGAGTGTCGACACTTCAAAGAATTCCTGACTGTGGATGCAGAAGCAGAGGCATTCTTCAGGCCTTTGATGGATGCATATGGGAAAAGCTTGCTGAATAGAGATGCGTACATCAAAGACATAATGAAGTACTCAAAACCTATAGATGTTGGTATCGTGGACTGTGATGCATTTGAAGAAGCCATCAATAGGGTTATCATCTACTTGCAAGTGCACGGCTTCAAGAAGTGTGCATATGTCACTGACGAGCAAGAAATTTTCAAAGCGCTCAACATGAAAGCTGCAGTCGGAGCCATGTATGGTGGCAAAAAGAAAGACTATTTTGAGCATTTCACTGATGCAGATAAGGAAGAAATAGTCATGCAAAGCTGTCTGCGACTGTATAAAGGCTTGCTCGGCATTTGGAACGGATCGTTGAAGGCAGAGCTCCGGTGTAAGGAAAAGATACTTGCAAACAAGACGAGGACGTTCACTGCTGCACCTCTAGACACTTTGCTGGGTGGTAAAGTGTGTGTTGATGACTTCAATAATCAATTTTATTCAAAGAATATTGAATGCTGTTGGACAGTTGGGATGACTAAGTTTTATGGTGGTTGGGATAAACTGCTTCGGCGTTTACCTGAGAATTGGGTATACTGTGATGCCGATGGCTCACAGTTTGATAGTTCACTAACTCCATACTTAATCAATGCTGTTCTCACCATCAGAAGCACATACATGGAAGACTGGGATGTGGGGTTGCAAATGCTGCGTAATTTATACACTGAGATTATTTACACACCTATCTCAACTCCAGATGGAACAATTGTTAAGAAGTTCAGAGGAAATAACAGTGGTCAGCCTTCTACTGTTGTGGACAACTCTCTTATGGTTGTCCTTGCCATGCACTATGCTCTCATCAAAGAATGCATTGAGTTTGAAGAGATTGACAGCACGTGCGTGTTCTTCGTCAATGGTGATGATTTGCTGATTGCCGTGAATCCGGATAAAGAGGGCATTCTTGATAGATTGTCACAGCACTTCTCAGATCTTGGTTTAAATTATGATTTTTCGTCAAGAACAAGAAATAAGGAAGAGTT
ATGGTTTATGTCTCATAGAGGCCTACTGATTGAGGGCATGTACGTGCCGAAACTTGAAGAAGAAAGGATTGTGTCCATTCTCCAATGGGACAGAGCAGACTTGGCTGAACACAGGCTTGAGGCGATTTGCGCAGCTATGATAGAGTCCTGGGGTTATTCTGAACTAACACACCAAATTAGAAGATTCTACTCATGGTTATTGCAACAGCAACCTTTTGCAACAATAGCGCAGGAAGGGAAGGCTCCTTATATAGCAAGCATGGCATTAAGGAAACTGTATATGGATAGAGCTGTGGATGAGGAAGAGCTGAGAGCCTTCACTGAAATGATGGTCGCATTAGACGATGAGTTTGAGTGTGACTCTTATGAAGTACACCACCAGGGAAACGATACAATCGATGCAGGAGGAAGCAGCAAGAAAGATGCAAGACCAGAGCAAGGCAGCATCCAGTCAAACCCGAACAAAGGAAAAGATAAGGATGTGAATGCTGGTACATCTGGGACACATACTGTGCCGAGAATCAAGGCTATCACGTCCAAAATGAGAATGCCCAAAAGCAAAGGAGCAACCGTGCTAAACTTAGAACACTTGCTTGAGTATGCTCCACAACAAATTGATATTTCAAATACTCGGGCAACTCAATCACAGTTTGATACGTGGTATGAGGCAGTGCGGATGGCATACGACATAGGAGAAACTGAGATGCCAACTGTGATGAATGGGCTTATGGTTTGGTGCATTGAAAATGGAACCTCGCCAAATGTCAACGGAGTCTGGGTTATGATGGATGGGGATGAACAAGTCGAGTACCCGTTGAAACCAATCGTTGAGAATGCAAAACCAACCCTTAGGCAAATCATGGCACATTTCTCAGATGTTGCAGAAGCGTATATAGAAATGCGCAACAAAAAGGAACCATATATGCCACGATATGGTTTAATTCGAAATCTGCGGGATGTGGGTTTAGCGCGTTATGCCTTTGACTTTTATGAGGTCACATCACGAACACCAGTGAGGGCTAGGGAAGCGCACATTCAAATGAAGGCCGCAGCATTGAAATCAGCCCAACCTCGACTTTTCGGGTTGGACGGTGGCATCAGTACACAAGAGGAGAACACAGAGAGGCACACCGCCGAGGATGTCTCTCCAAGTATGCATACTCTACTTGGAGTCAAAAACATGTGA

================================================
FILE: inst/extdata/GVariation/B.Oz.fas
================================================
>Oz
ATGGCAACTTACATGTCAACAATCTGTTTCGGTTCGTTTGAATGCAAGCTACCATACTCACCCGCTTCTTGCGGGCATATTGTGAAGGAGCGAGAAGTGCTGGCTTCCGTTGATCCTTTCGCAGATCTGGAAACACAACTTAGTGCACGATTGCTCAAGCAAGAATATGCTACTGTTCGTGTGCTCAAGAACGGTACTCTTACTTACCGATACAAAACTGATGCCCAGATAACGCGCATTCAGAAGAAACTGGAGAGGAAGGATAGGGAAGAATATCACTTCCAGATGGCCGCTCCTAGTATTGTGTCAAAAATTACAATAGCTGGTGGAGATCCTCCATCAAAGTCTGAGCCACAAGCACCAAGAGGGATCATTCATACAACTCCAAGGGTGCGTAAAGTCAAGACACGTCCCATAATAAAGTTGACAGAAGGCCAGATGAATCATCTCATTAAGCAGGTGAAGCAGATTATGTCGGAGAAGAGAGGGTCTGTCCACTTAATTAGTAAGAAGACCACTCATGTTCAATATAAGGAGATACTTGGAGCAACTCGCGCAGCGGTTCGAACTGCACATATGATGGGTTTGCGACGGAGAGTGGACTTCCGATGTGATATGTGGACAGTCGGACTTTTGCAACGTCTCGCTCGGACGGACAAATGGTCCAATCAAGTCCGCACTATCAACATACGAAGGGGTGATAGTGGAGTCATTTTGAACACAAAAAGCCTCAAAGGCCACTTTGGTAGAAGTTCAGGAGACTTGTTCATAGTGCGTGGATCACACGAAGGGAAATTGTACGATGCACGTTCTAGAGTTACTCAGAGTGTTTTGAACTCAATGATCCAGTTTTCGAATGCTGATAATTTTTGGAAGGGTCTAGACGGTAATTGGGCACAACTGAGATATCCTTCGGATCACACATGTGTAGCTGGTTTACCTGTCGAAGATTGTGGTAGAGTTGCTGCATTGATGGCACACAGTATCCTCCCGTGCTACAAGATAACCTGCCCCACCTGTGCTCAACAGTATGCCAGCTTGCCGGTTAGCGATCTGTTTAAGCTGTTGCATAAACATGCGAGAGATGGTTTGAACCGATTGGGAGCGGATAAAGACCGGTTTATACATGTTAATAAGTTCTTGATAGCGTTAGAGCATCTAACTGAACCGGTGGATTTGAATCTCGAGCTTTTCAATGAGATATTTAAATCCATAGGGGAGAAGCAGCAAGCACCGTTCAAGAATTTAAATGTCTTAAATAATTTCTTCCTGAAAGGAAAAGAAAATACAGCTCATGAATGGCAAGTGGCTCAATTGAGTTTGCTCGAATTAGCAAGGTTCCAGAAGAATAGAACTGATAACATCAAGAAAGGTGATATATCTTTCTTCAGAAATAAATTATCTGCCAAGGCAAACTGGAATCTGTATTTGTCGTGCGACAACCAGTTGGATAAAAATGCAAATTTTCTGTGGGGACAAAGGGAGTATCATGCTAAGCGGTTTTTCTCAAACTTCTTTGAGGAAATTGATCCAGCAAAGGGATACTCAGCATATGAAATCCGCAAGCATCCAAATGGAACAAGGAAGCTCTCAATTGGTAACTTAGTTGTCCCACTTGATTTAGCTGAGTTTAGGCAGAAGATGAAAGGTGACTATAGGAAACAACCAGGAGTTAGCAGAAAGTGCACGAGTTCGAAAGATGGTAATTATGTGTATCCCTGTTGTTGCACAACACTTGATGATGGTTCAGCTATTGAATCAACATTCTATCCACCAACCAAAAAGCACCTTGTAATAGGCAATAGCGGTGACCAAAAATTTGTTGATTTACCAAAAGGGGATTCGGAGATGTTATACATTGCCAAGCAGGGTTATTGTTATATCAACGTGTTTCTTGCAATGCTTATTAACATTAGCGAGGAGGATGCAAAGGATTTCACAAAGAAAGTTCGCGACATGTGTGTGCCAAAGCTTGGAACCTGGCCAACTATGATGGATTT
GGCGACCACTTGTGCTCAAATGAGAATATTCTATCCTGACGTGCATGATGCAGAGCTGCCTAGAATATTGGTTGACCATGACACTCAAACGTGTCACGTGGTTGACTCATTTGGCTCGCAAACAACTGGATATCATATTCTAAAAGCATCCAGCGTGTCTCAACTTATCTTGTTTGCAAATGATGAATTAGAATCTGATATAAAACATTATAGAGTTGGTGGTGTTCCTAATGCATGCCCTGAACTTGGGTCCACAATATCACCTTTCAGAGAAGGAGGAGTTATAATGTCTGAGTCGGCAGCGCTGAAACTGCTTTTAAAGGGAATTTTTAGACCTAAGGTGATGAGACAGTTGCTGTTAGATGAGCCTTACCTGTTGATTTTATCAATATTATCTCCTGGCATACTGATGGCTATGTATAATAATGGGATTTTTGAACTTGCGGTAAGGTTGTGGATTAATGAGAAACAATCCATAGCTATGATAGCATCGCTACTATCAGCTTTAGCCCTACGAGTGTCAGCGGCAGAAACACTCGTCGCACAGAGGATTATCATTGATGCTGCAGCTACAGACCTCCTTGATGCTACGTGTGATGGGTTCAACCTACATCTAACGTACCCCACTGCATTGATGGTGTTGCAAGTTGTTAAGAATAGAAATGAATGTGATGATACCCTATTCAAGGCGGGTTTTTCAAGTTACAACACGAGCGTCGTACAGATTATGGAAAAAAATTATCTAAATCTCTTGAACGATGCTTGGAAAGATTTAACTTGGCGGGAAAAATTATCCGCAACATGGTACTCATACAGAGCAAAACGCTCTATCACTCGGTACATAAAACCCACAGGAAGGGCAGATTTGAAAGGGTTATACAACATATCACCACAAGCATTCTTGGGCCGAAGCGCCCAGGTGGTCAAAGGTACTGCCTCAGGATTGAGTGAGCGATTTAATAATTATTTCAATACTAAGTGTGTAAATATTTCATCCTTTTTCATTCGTAGAATCTTTAGGCGTTTGCCAACTTTCGTCACTTTTGTTAACTCATTATTAGTTATTAGTATGTTAACTAGCGTAGTGGCAGTGTGTCAGGCAATAATTTTAGATCAGAGGAAATATAGGAGAGAAATCGAGTTGATGCAGATAGAGAAGAATGAGATTGTCTGCATGGAGCTATATGCAAGTTTACAGCGCAAACTTGAACGCGATTTCACATGGGATGAGTACATTGAGTATTTGAAATCAGTAAACCCTCAGATAGTTCAGTTTGCTCAAGCGCAGATGGAAGAATATGATGTGCGACACCAGCGTTCCACACCAGGTGTTAAAAATTTGGAACAAGTGGTAGCATTTATGGCTTTAGTCATTATGGTGTTCGATGCTGAAAGGAGTGATTGCGTGTTCAAAACTCTCAATAAATTTAAGGGTGTCCTTTCCTCAATGGACTATGAAGTTAGACATCAGTCCTTAGACGATGTGATCAAGAATTTTGATGAGAGGAATGAGATTATTGATTTTGAATTGAGTGAGGACACAATTCGAACATCATCAGTGCTAGATACAAAGTTTAGTGATTGGTGGGACCGACAAATCCAGATGGGACATACACTTCCACATTACAGAACCGAGGGGCACTTCATGGAATTCACAAGAGCAACTGCTGTCCAAGTGGCTAATGACATTGCCCATAGCGAACACCTAGACTTTCTAGTAAGGGGAGCTGTTGGGTCTGGAAAGTCAACTGGGTTACCTGTTCATCTTAGTGTAGCCGGATCTGTGCTTTTAATTGAACCAACGCGACCACTAGCGGAGAACGTTTTCAAACAGCTATCTAGTGAACCATTCTTCAAGAAGCCAACACTGCGTATGCGTGGAAATAGTATATTTGGCTCTTCTCCAATCTCCGTCATGACTAGCGGATTTGCGCTACACTACTTCGCCAATAATCGCTCTCAATTAGCTCAGTTCAACTTTGTAATATTTGATGAGTGCC
ATGTTCTGGATCCTTCCGCAATGGCGTTCCGCAGTCTGCTGAGTGTTTATCATCAAGCATGCAAAGTATTAAAAGTGTCAGCTACTCCAGTGGGAAGAGAGGTTGAATTTACAACACAGCAACCAGTCAAGTTAATAGTGGAGGACACACTGTCTTTCCAATCATTTGTTGATGCACAAGGTTCTAAAACTAATGCTGATGTTGTTCAGTTTGGTTCAAACGTACTTGTGTACGTGTCGAGCTACAATGAAGTTGATACCTTGGCTAAGCTCCTAACAGACAAGAATATGATGGTCACAAAGGTTGATGGCAGAACAATGAAGCACGGTTGCCTAGAAATTGTCACAAAAGGAACCAGTGCGAGACCACATTTTGTTGTAGCAACCAACATAATTGAGAATGGAGTGACTTTGGACATAGACGTGGTTGTAGATTTTGGGTTAAAAGTCTCACCGTTCTTGGACATTGACAATAGGAGCATTGCTTACAATAAGGTGAGTGTTAGCTATGGTGAGAGAATTCAAAGGCTGGGTCGTGTTGGACGCTTCAAGAAAGGAGTAGCATTGCGCATTGGACACACTGAGAAGGGAATTATTGAAATTCCAAGCATGATCGCTACAGAGGCAGCTCTTGCTTGCTTTGCATATAACTTACCAGTGATGACAGGAGGCGTCTCAACTAGTCTGATTGGCAATTGTACTGTGCGCCAAGTTAAAACAATGCAGCAATTTGAATTGAGTCCCTTCTTTATCCAGAATTTCGTTGCCCATGATGGATCAATGCATCCTGTCATACATGACATTCTTAAAAAGTATAAACTTCGAGATTGTATGACACCTTTGTGCGATCAGTCTATACCATACAGGGCATCGAGCACTTGGTTATCGGTTAGTGAATATGAGCGACTTGGAGTGGCCTTAGAAATTCCAAAGCAAGCCAAAATTGCATTCCATATCAAAGAGATCCCTCCTAAGCTCCACGAAATGCTTTGGGAAACGGTTGTCAAGTACAAAGACGTTTGCTTATTTCCAAGCATTCGAGCATCGTCCATCAGCAAAATCGCATACACATTGCGTACAGACCTCTTCGCCATCCCAAGAACTCTAATATTGGTGGAGAGACTGCTTGAAGAGGAGCGAGTGAAGCAGAGCCAATTCAGAAGTCTCATCGATGAAGGATGCTCAAGCATGTTTTCAATTGTCAACCTGACAAACACTCTCAGAGCTAGATATGCAAAAGATTACACCGCAGAGAACATACAAAAACTTGAGAAAGTGAGAAGTCAATTGAAAGAATTCTCAAATTTGGATGGTTCTGCATGTGAGGAAAATTTAATAAAGAGGTATGAGTCTTTGCAGTTCGTTCATCACCAAGCTGCGACGTCACTTGCAAAGGATCTCAAGTTAAAGGGGACTTGGAAGAAGTCATTAGTGGCCAAAGACTTGATCATAGCAGGCGCTGTTGCAATTGGTGGAATAGGACTCATATATAGTTGGTTCACACAATCAGTTGAGACTGTGTCTCACCAAGGGAAAAATAAATCCAAAAGAATTCAAGCCTTGAAGTTTCGCCATGCTCGTGACAAAAGGGCTGGCTTTGAAATTGACAACAATGATGACACAATAGAGGAATTCTTTGGATCTGCATACAGGAAAAAGGGAAAAGGTAAAGGTACCACAGTTGGTATGGGCAAGTCAAGCAGGAGGTTCATCAACATGTATGGGTTTGATCCAACAGAGTACTCATTCATCCAATTCGTTGATCCACTCACTGGGGCGCAAATAGAAGAGAATGTCTATGCTGACATTAGAGATATTCAAGAGAGATTTAGTGAAGTGCGAAAGAAAATGGTTGAGAATGATGACATTGAAATGCAAGCCTTGGGTGGTAACACGACCATACATGCATACTTTAGGAAAGATTGGTCTGACAAAGCTTTGAAGATTGATTTAATGCCACATAATCCACTCAAAGTTTGTGACAAGACAAATGGCATTGCC
AAATTTCCTGAGAGAGAGCTCGAACTAAGGCAGACTGGGCCAGCTGTAGAAGTCGATGTGAAGGACATACCAGCACAGGAAGTGGAGCATGAAGCTAAATCGCTCATGAGAGGCTTGAGAGACTTCAACCCAATTGCCCAAACAGTTTGTAGGCTGAAAGTATCTGTTGAATATGGGACATCAGAGATGTACGGTTTTGGATTTGGAGCATACATAATAGCGAACCACCATTTGTTCAGGAGTTACAATGGTTCCATGGAGGTGCGATCCATGCACGGTACATTCAGGGTGAAGAATCTACACAGTTTGAGCGTTCTGCCAATTAAAGGTAGGGATATCATCCTCATCAAAATGCCGAAAGATTTCCCTGTCTTTCCACAGAAATTGCATTTCCGAGCTCCTACACAGAATGAAAGAGTTTGTTTAGTTGGAACCAACTTTCAGGAGAAGTATGCATCGTCGATCATCACAGAAACAAGCACCACTTACAATATACCAGGCAGCACATTCTGGAAGCATTGGATTGAAACAGATAATGGACATTGTGGATTACCAGTGGTGAGCACCACCGATGGATGTCTAGTCGGAATTCACAGTTTGGCAAACAACAAACACACCACGAACTACTACTCAGCCTTCGATGAAGATTTTGAAAGCAAGTATCTCCGAACCAATGAGCACAATGAATGGGTCAAGTCTTGGATTTATAATCCAGACACAGTGTTGTGGGGCCCGTTGAAACTTAAAGACAGCACTCCCAAAGGATTATTCAAAACAACAAAGCTTGTGCAAGATCTAATCGATCATGATGTAGTGGTGGAGCAAGCTAAGCACTCTGCGTGGATGTTTGAAGCCTTGACAGGAAATTTGCAAGCTGTCGCAACAATGAAGAGCCAATTAGTAACCAAGCATGTAGTTAAAGGAGAGTGTCGACACTTCAAAGAATTCCTGACTGTGGATGCAGAAGCAGAGGCATTCTTCAGGCCTTTGATGGATGCGTATGGGAAAAGCTTGCTGAATAGAGATGCATACATCAAGGACATAATGAAGTATTCAAAACCTATAGATGTTGGTATCGTGGACTGTGATGCATTTGAGGAAGCCATCAATAGGGTTATCATCTACCTGCAAGTGCACGGCTTCAAGAAGTGCGCATACGTCACTGACGAGCAAGAAATTTTCAAAGCGCTCAACATGAAAGCTGCAGTTGGAGCCATGTATGGTGGCAAAAAGAAAGACTATTTTGAGCATTTCACTGATGCAGATAAGGAAGAAATAGTCATGCAAAGCTGTCTGCGATTGTATAAAGGCTTGCTTGGCATTTGGAATGGATCATTGAAGGCAGAGCTCCGGTGTAAGGAAAAGATACTTGCAAATAAGACGAGGACATTCACTGCTGCACCTTTAGACACTTTGCTGGGTGGTAAAGTGTGTGTTGATGATTTCAATAATCAATTTTATTCAAAGAATATTGAATGCTGTTGGACGGTTGGGATGACTAAGTTTTATGGTGGTTGGGATAAACTGCTGCGGCGTTTACCTGAGAATTGGGTATACTGTGATGCCGATGGCTCACAGTTTGATAGTTCACTAACTCCATACTTAATCAATGCTGTTCTCACCATCAGAAGCACATACATGGAAGATTGGGATGTGGGGTTGCAAATGTTGCGCAATTTATACACTGAGATTGTTTACACACCTATTTCAACTCCAGATGGAACAATTGTTAAGAAGTTCAGAGGAAATAACAGTGGTCAGCCTTCTACTGTTGTGGACAACTCTCTTATGGTCGTCCTTGCCATGCACTATGCTCTCATCAAAGAATGCATTGAGTTTGAAGAGATTGACAGCACGTGCGTGTTCTTTGTCAATGGTGATGATTTGCTGATTGCTGTGAATCCGGATAAAGAGGGCATTCTTGACAGATTGTCACAACACTTCTCAGATCTTGGTTTGAATTATGATTTCTCGTCAAGAACAAGAAATAAGGAGGAATT
GTGGTTTATGTCTCATAGAGGCCTACTGATTGAGGGCATGTACGTGCCGAAACTTGAAGAAGAAAGGATTGTGTCCATTCTCCAATGGGACAGAGCAGACTTGGCTGAACACAGGCTTGAGGCGATTTGCGCAGCTATGATAGAGTCCTGGGGTTATTCTGAACTAACACACCAAATCAGGAGATTCTACTCATGGTTATTGCAACAGCAACCCTTTGCAACAATAGCGCAGGAAGGGAAGGCTCCTTATATAGCAAGCATGGCATTAAGGAAATTGTATATGGATAGGGCTGTGGATGAGGAAGAGCTGAGAGCCTTCACTGAAATGATGGTCGCATTAGACGATGAGTTTGAATTTGACTCTTATGAAGTACACCATCAAGCAAATGACACAATCGATGCAGGAGGAAGCAGCAAGAAAGATGCAAGACCGGAGCAAGGCAGCATCCAGTCAAACCCGAACAAAGGAAAAGATAAGGATGTGAATGCTGGTACATCTGGGACACATACTGTGCCGAGAATCAAGGCTATCACGTCCAAAATGAGAATGCCCAAAAGCAAGGGAGCAACCGTGCTAAACCTAGAACACTTGCTTGAGTATGCTCCACAACAAATTGATATTTCAAATACTCGGGCAACTCAATCACAGTTTGATACGTGGTATGAGGCAGTGCGGATGGCATACGACATAGGAGAAACTGAGATGCCAACTGTGATGAATGGGCTTATGGTTTGGTGCATTGAAAATGGAACCTCGCCAAATGTCAACGGAGTTTGGGTTATGATGGATGGGAATGAACAAGTCGAGTACCCGTTGAAACCAATCGTTGAGAATGCAAAACCAACCCTTAGGCAAATCATGGCACATTTCTCAGATGTTGCAGAAGCGTATATAGAAATGCGCAACAAAAAGGAACCATATATGCCACGATATGGTTTAATTCGAAATCTGCGGGATGTGGGTTTAGCGCGTTATGCCTTTGACTTTTATGAGGTCACATCACGAACACCAGTGAGGGCTAGGGAAGCGCACATTCAAATGAAGGCCGCAGCATTGAAATCAGCCCAACCTCGACTTTTCGGGTTGGACGGTGGCATCAGTACACAAGAGGAGAACACAGAGAGGCACACCACCGAGGATGTCTCTCCAAGTATGCATACTCTACTTGGAGTCAAGAACATGTGA
>CF_YL21
ATGGCAATCTACACGTCAACAATCTGTTTCGGTTCGTTTGAATGCAAGCTACCATACTCACCCGCCTCTTGCGGGCATATTGTGAAGGAACGAGAAGTGCTGGCTTCCATCGATCCTTTCGCAGATCTGGAAACACAACTTAGTGCACGATTGCTCAAGCAAGAATATGCTACTGTTCGTGTGCTCAAGAACGGTACTCTTACGTACCGATACAAGACTGATGCCCAGATAACGCGCATCCAGAAGAGACTGGAAAGGAAGGATAGGGAAGAATACCACTTCCAGATGGCAGCTCCTAGTATTGTGTCAAAGATCACTATTGCTGGTGGAGAGCCACCTTCAAAACTTGAATCACAAGTGCGGAGGGGTGTCATCCACACAACTCCAAGGATGCGCACAGCAAAAACATATCACACGCCAAAGCTGACAGAGGGACAAATGAACCACCTTATCAAGCAGGTGAAGCAAATTATGTCAACCAAAGGAGGGTCTGTTCAACTGATTAGCAAGAAAAGTACCCATGTTCACTATAAAGAAGTTTTGGGATCACATCGCGCAGTTGTTTGCACTGCACATATGAGAGGTTTACGAAAGAGAGTGGACTTTCGGTGTGATAAATGGACCGTTGTGCGTCTACAGCATCTCGCCAGGACGGATAAGTGGAATAACCAAGTTCGTGTTACTGATCTACGCAAGGGCGATAGTGGAGTTATATTGAGTAATACTAATCTCAAAGGAAACTTTGGGAGAAGCTCGGAGGGCATATTCATAGTGCGTGGGTCGCACGAAGGAAAAATCTATGATGCACGTTCCAAGGTTACTCAAGGGGTTATGGATTCAATGGTTCAGTTCTCAAGCGCTGAAAGCTTTTGGAAGGGATTGGACGGCAATTGGGCACAAATGAGATATCCTACAGATCATACATGTGTGGCAGGCTTACCAGTTGAAGACTGTGGCAGAGTTGCAGCGATAATGACACACAGTATTTTACCGTGCTATAAGATAACCTGCCCTACCTGTGCCCAACAATATGCCAACTTGCCAGCCAGTGACTTACTTAAGATATTACACAAGCACGCAAGTGATGGTTTAAATCGATTGGGGGCAGACAAAGATCGCTTTGTGCATGTCAAAAAGTTCTTGACAATCTTAGAGCACTTAACTGAACCGGTTGATCTGAGTCTAGAAATTTTCAATGAAGTATTCAAGTCTATAGGGGAGAAGCAACAATCACCTTTCAAAAACCTGAATATTCTGAATAATTTCTTTTTAAAAGGAAAGGAAAATACAGCTCGTGAATGGCAGGTGGCTCAATTAAGCTTACTTGAATTGGCAAGATTCCAAAAGAACAGAACGGATAATATCAAGAAAGTAGACATCTCGTTCTTTAGGAATAAACTATCTGCCAAAGCAAATTGGAACTTGTATCTGTCATGTGATAACCAGCTGGATAAGAATGCAAACTTCCTGTGGGGACAGAGGGAATATCATGCTAAGCGATTTTTCTCGAACTATTTCGAGGAAATTGATCCAGCGAAGGGCTATTCAGCATACGAAAATCGTTTGCATCCGAATGGGACAAGAAAACTTGCAATTGGAAACCTAATTGTACCACTTGATCTGGCTGAGTTTAGGCGGAAGATGAAAGGTGATTTTAAAAGACAGCCAGGGGTGAGTAAGAAGTGCACGAGCTCGAAGGATGGAAACTACGTGTATCCCTGTTGTTGCACTACACTTGATGATGGCTCAGCTGTTGAATCAACATTTTACCCGCCAACTAAGAAGCGCCTCGTAATAGGTAATAGTGGCGACCAAAAGTATGTTGACTTACCAAAAGGGAATTCTGAGATGTTATATATTGCCAGGCAAGGCTTCTGTTACATTAACATTTTCCTCGCGATGTTGATTAACATTAGTGAGGAAGATGCAAAGGATTTCACTAAGAAGGTTCGTGACATGTGTGTGCCAAAGCTCGGAACCTGGCCAACCATGATGGATCT
GGCTACAACTTGTGCTCAAATGAAAATATTCTACCCTGATGTTCATGATGCAGAACTGCCTAGAATACTAGTCGATCACGAAACGCAGACATGCCATGTGGTTGACTCGTTTGGCTCACAAACAACTGGGTATCATATTTTGAAAGCATCTAGCGTGTCCCAACTTATTTTGTTTGCTAATGATGAGTTGGAGTCTGACATTAAGCACTATAGAGTTGGTGGTATTCCTAATGCATGCCCTGAACTTGGGTCCACAATATCACCTTTCAGAGAAGGAGGAGTTATAATGTCTGAGTCGGCAGCGCTGAAACTGCTTTTGAAGGGAATTTTTAGACCTAAGGTGATGAGACAGTTGCTGTTAGATGAGCCTTACCTGTTGATTCTATCAATATTATCCCCTGGCATACTGATGGCTATGTATAATAATGGGATTTTTGAACTTGCGGTGAGGTTGTGGATTAATGAGAAGCAATCCATAGCGATGATAGCATCGCTACTATCAGCTTTAGCCCTACGAGTGTCAGCGGCAGAAACACTCGTCGCACAGAGGATTATAATTGATGCTGCAGCTACAGACCTCCTCGATGCTACGTGTGACGGGTTCAACCTGCATCTAACGTACCCCACTGCATTGATGGTGTTGCAAGTTGTTAAGAATAGAAATGAATGTGATGATACTCTATTCAAGGCGGGTTTTCCAAGTTACAACACGAGCGTCGTACAGATTATGGAAAAAAATTATCTAAATCTCTTGAACGATGCTTGGAAAGATTTAACTTGGCGGGAAAAATTTTCCGCAACATGGTACTCATACAGAGCAAAACGCTCTATCACTCGGTACATAAAACCCACAGGAAGGGCAGATTTGAAAGGGTTATACAACATATCACCACAAGCATTCTTGGGCCGAAGCGTCCAGGTGGTCAAAGGCACTGCCTCAGGATTGAGCGAGCGATTTAATAATTACTTCAATACTAAGTGTGTAAATATTTCATCCTTTTTCATTCGTAGAATCTTTAGGCGTTTGCCAACTTTCGTCACTTTTGTTAACTCATTATTAGTTATTAGTATGTTAACTAGTGTAGTGGCAGTGTGTCAGGCAATAATTTTAGATCAGAAGAAGTATAGGAGAGAGATCGAGTTGATGCAGATAGAGAAGAATGAGATTGTCTGCATGGAGCTATACGCAAGTTTACAGCGCAAACTTGAACGCGATTTCACATGGGATGAGTACATTGAGTATTTGAAATCAGTGAACCCTCAGATAGTTCAATTTGCTCAAGCGCAGATGGAAGAATATGATGTGCGACACCAGCGTTCCACACCAGGTGTTAAAAATTTGGAACAAGTGGTAGCATTTATGGCTTTAGTCATCATGGTGTTCGATGCTGAAAGGAGTGACTGCGTGTTCAAAACTCTCAATAAATTTAAGGGTGTCCTTTCCTCACTGGACCATGAAGTTAGACATCAGTCCTTGGATGATGTGATCAAGAATTTTGATGAGAGGAATGAGACTATTGATTTTGAATTGAGTGAGGACACAATTCGAACATCATCAGTGCTAGATACAAAGTTTAGTGATTGGTGGGACCGACAAATCCAAATGGGACATACACTTCCACATTACAGAACCGAGGGGCACTTCATGGAATTTACAAGGGCAACTGCTGTTCAAGTGGCTAATGACATTGCCCACAGCGAACACCTAGACTTTCTAGTAAGGGGAGCTGTTGGGTCTGGAAAGTCAACTGGGTTGCCTGTTCATCTTAGTGTAGCTGGATCTGTGCTTTTAATTGAGCCAACGCGGCCACTAACGGAGAACGTTTTCAAACAGCTATCTAGTGAACCATTCTTCAAGAAGCCAACACTGCGTATGCGTGGAAATAGTATATTTGGCTCTTCCCCAATCTCCGTTATGACTAGCGGGTTTGCGCTACACTACTTCGCCAATAATCGCTCTCAATTAGCTCAGTTCAATTTTGTAATATTTGATGAGTGCC
ATGTTCTGGATCCTTCCGCAATGGCGTTCCGCAGTCTGCTGAGTGTTTATCATCAAGCATGTAAAGTATTAAAAGTGTCAGCTACTCCAGTGGGAAGAGAGGTTGAATTCACAACACAGCAGCCAGTCAAGTTAATAGTGGAAGACACACTGTCTTTCCAATCATTTGTTGATGCACAAGGTTCTAAAACTAATGCTGATGTTGTCCAGTTTGGTTCAAACATACTTGTTTACGTGTCGAGCTACAATGAAGTTGACAACTTGGCTAAGCTCCTAACAGATAAGAATATGATGGTCACAAAGGTTGATGGCAGAACAATGAAGCACGGTTGCCTAGAAATTGTCACAAAAGGAACCAGTGCGAGACCACATTTTGTTGTAGCAACCAACATAATTGAGAATGGAGTGACTTTGGACATAGATGTAGTAGTGGATTTTGGGTTGAAAGTCTCACCGTTCTTGGACATTGACAATAGGAGCATTGCTTACAATAAGGTGAGTGTTAGTTATGGTGAGAGAATTCAAAGGCTGGGTCGTGTTGGACGCTTCAAGAAAGGAGTAGCATTGCGCATTGGACACACTGAGAAGGGAATTATCGAAATTCCAAGCATGATCGCTACTGAGGCGGCTCTTGCTTGCTTTGCATATAACTTACCAGTGATGACAGGAGGCGTCTCAACTAGTCTGATTGGCAATTGTACTGTGCGCCAGGTCAAAACAATGCAGCAATTTGAATTGAGCCCCTTCTTTATCCAGAATTTCGTTGCTCATGATGGATCAATGCATCCTGTCATACATGACATTCTCAAAAAGTATAAACTCCGAGATTGTATGACACCTTTGTGCGATCAGTCTATACCATACAGGGCATCGAGCACTTGGTTATCGGTTAGTGAATATGAGCGACTTGGAGTGGCTTTAGAAATTCCAAAGCAAGTCAAAATTGCATTCCATATCAAAGAGATCCCTCCTAAGCTCCACGAAATGCTTTGGGAAACGGTTGTCAAGTACAAAGACGTTTGCTTATTTCCAAGCATTCGAGCATCGTCCATCAGCAAAATCGCATACACATTGCGTACAGACCTCTTCGCCATCCCAAGAACTCTAATATTGGTGGAGAGATTGCTTGAAGAGGAGCGAGTGAAGCAGAGCCAATTCAGAAGTCTCATCGATGAAGGGTGCTCAAGCATGTTTTCAATTGTCAACTTGACCAACACTCTCAGGGCCAGATATGCAAAAGATTACACCGCAGAGAACATACAAAAACTTGAGAAAGTGAGAAGTCAATTAAAAGAATTCTCAAATTTGGATGGTTCTGCATGTGAGGAAAATTTAATAAAGAGGTATGAGTCTTTGCAGTTCGTTCACCACCAAGCTGCGACGTCACTTGCAAAGGATCTCAAGTTGAAGGGGACTTGGAAGAAGTCATTAGTGGCTAAAGACTTGATCATAGCAGGCGCTGTTGCAATTGGTGGAATAGGACTCATATATAGTTGGTTCACACAATCAGTTGAGACTGTGTCTCACCAAGGGAAAAATAAATCCAAAAGAATCCAAGCCTTGAAGTTTCGCCATGCTCGTGACAAAAGGGCTGGCTTTGAAATTGACAACAATGATGACACAATAGAGGAATTCTTTGGATCTGCATACAGGAAGAAGGGAAAAGGTAAAGGTACCACAGTTGGTATGGGCAAGTCAAGCAGGAGGTTCATTAACATGTACGGGTTTGACCCGACAGAGTACTCATTCATCCAATTCGTTGATCCACTCACTGGGGCACAAATAGAAGAGAATGTCTATGCTGACATTAGAGATGTTCAAGAAAGATTTAGTGAGGTGCGAAAGAAAATGGTTGAGAATGATGATATTGAAATGCAAGCCTTGGGTAGTAACACGACCATACATGCATACTTTAGGAAAGATTGGTCTGACAAAGCTTTGAAGATTGATTTAATGCCACATAACCCACTCAAAGTTTGTGACAAAACAAATGGCATTGCC
AAACTTCCTGAGAGAGAGTTCGAACTAAGGCAGACTGGGCCAGCTGTAGAAGTCGACGTGAAGGACATACCAGCGCAGGAAGTGGAGCATGAAGCCAAATCGCTCATGAGAGGCTTGAGAGACTTCAACCCAATTGCTCAAACAGTTTGTAGGCTGAAAGTATCTGTTGAATATGGGACATCAGAGATGTACGGTTTTGGATTTGGAGCATACATAATAGCGAACCACCATTTGTTTAGGAGTTACAATGGTTCCATGGAGGTGCGATCCATGCATGGTACATTCAGGGTGAAGAATCTACACAGTTTGAGCGTTCTGCCAATTAAAGGCAGGGACATCATCCTCATCAAAATGCCGAAGGATTTCCCTGTTTTTCCACAGAAATTGCATTTCCGAGCTCCTACACAGAATGAAAGAGTTTGTTTAGTTGGGACCAACTTTCAGGAGAAGTATGCATCGTCGATCATCACAGAAACAAGCACTACTTACAATATACCAGGCAGCACATTCTGGAAGCATTGGATTGAAACAGATAATGGACATTGTGGACTACCAGTGGTGAGCACAGCCGATGGATGTCTAGTCGGAATTCATAGTTTGGCAAACAATGCACACACCACGAACTACTACTCAGCCTTCGATGAAGATTTTGAAAGCAAGTACCTCCGAACCAATGAGCACAATGAATGGGTCAAATCTTGGGTTTATAATCCAGACACAGTGTTGTGGGGCCCGTTGAAGCTTAAAGATAGCACTCCCAAAGGGTTATTCAAAACAACAAAGCTTGTGCAAGATCTAATCGATCATGATGAAGTCGCGGAGCAAGCTAAGCACTCTGCGTGGATGTTTGAAGCCTTGACAGGAAATTTGCAAGCTGTCGCAACAATGAAGAGTCAATTAGTAACCAAGCATGTAGTTAAAGGAGAGTGTCGACACTTCAAAGAATTCCTGACTGTGGATGCAGAAGCAGAGGCATTCTTCAGGCCTTTGATGGATGCATATGGGAAAAGCTTGCTGAATAGAGATGCGTACATCAAAGACATAATGAAGTACTCAAAACCTATAGATGTTGGTATCGTGGACTGTGATGCATTTGAAGAAGCCATCAATAGGGTTATCATCTACTTGCAAGTGCACGGCTTCAAGAAGTGTGCATATGTCACTGACGAGCAAGAAATTTTCAAAGCGCTCAACATGAAAGCTGCAGTCGGAGCCATGTATGGTGGCAAAAAGAAAGACTATTTTGAGCATTTCACTGATGCAGATAAGGAAGAAATAGTCATGCAAAGCTGTCTGCGACTGTATAAAGGCTTGCTCGGCATTTGGAACGGATCGTTGAAGGCAGAGCTCCGGTGTAAGGAAAAGATACTTGCAAACAAGACGAGGACGTTCACTGCTGCACCTCTAGACACTTTGCTGGGTGGTAAAGTGTGTGTTGATGACTTCAATAATCAATTTTATTCAAAGAATATTGAATGCTGTTGGACAGTTGGGATGACTAAGTTTTATGGTGGTTGGGATAAACTGCTTCGGCGTTTACCTGAGAATTGGGTATACTGTGATGCCGATGGCTCACAGTTTGATAGTTCACTAACTCCATACTTAATCAATGCTGTTCTCACCATCAGAAGCACATACATGGAAGACTGGGATGTGGGGTTGCAAATGCTGCGTAATTTATACACTGAGATTATTTACACACCTATCTCAACTCCAGATGGAACAATTGTTAAGAAGTTCAGAGGAAATAACAGTGGTCAGCCTTCTACTGTTGTGGACAACTCTCTTATGGTTGTCCTTGCCATGCACTATGCTCTCATCAAAGAATGCATTGAGTTTGAAGAGATTGACAGCACGTGCGTGTTCTTCGTCAATGGTGATGATTTGCTGATTGCCGTGAATCCGGATAAAGAGGGCATTCTTGATAGATTGTCACAGCACTTCTCAGATCTTGGTTTAAATTATGATTTTTCGTCAAGAACAAGAAATAAGGAAGAGTT
ATGGTTTATGTCTCATAGAGGCCTACTGATTGAGGGCATGTACGTGCCGAAACTTGAAGAAGAAAGGATTGTGTCCATTCTCCAATGGGACAGAGCAGACTTGGCTGAACACAGGCTTGAGGCGATTTGCGCAGCTATGATAGAGTCCTGGGGTTATTCTGAACTAACACACCAAATTAGAAGATTCTACTCATGGTTATTGCAACAGCAACCTTTTGCAACAATAGCGCAGGAAGGGAAGGCTCCTTATATAGCAAGCATGGCATTAAGGAAACTGTATATGGATAGAGCTGTGGATGAGGAAGAGCTGAGAGCCTTCACTGAAATGATGGTCGCATTAGACGATGAGTTTGAGTGTGACTCTTATGAAGTACACCACCAGGGAAACGATACAATCGATGCAGGAGGAAGCAGCAAGAAAGATGCAAGACCAGAGCAAGGCAGCATCCAGTCAAACCCGAACAAAGGAAAAGATAAGGATGTGAATGCTGGTACATCTGGGACACATACTGTGCCGAGAATCAAGGCTATCACGTCCAAAATGAGAATGCCCAAAAGCAAAGGAGCAACCGTGCTAAACTTAGAACACTTGCTTGAGTATGCTCCACAACAAATTGATATTTCAAATACTCGGGCAACTCAATCACAGTTTGATACGTGGTATGAGGCAGTGCGGATGGCATACGACATAGGAGAAACTGAGATGCCAACTGTGATGAATGGGCTTATGGTTTGGTGCATTGAAAATGGAACCTCGCCAAATGTCAACGGAGTCTGGGTTATGATGGATGGGGATGAACAAGTCGAGTACCCGTTGAAACCAATCGTTGAGAATGCAAAACCAACCCTTAGGCAAATCATGGCACATTTCTCAGATGTTGCAGAAGCGTATATAGAAATGCGCAACAAAAAGGAACCATATATGCCACGATATGGTTTAATTCGAAATCTGCGGGATGTGGGTTTAGCGCGTTATGCCTTTGACTTTTATGAGGTCACATCACGAACACCAGTGAGGGCTAGGGAAGCGCACATTCAAATGAAGGCCGCAGCATTGAAATCAGCCCAACCTCGACTTTTCGGGTTGGACGGTGGCATCAGTACACAAGAGGAGAACACAGAGAGGCACACCGCCGAGGATGTCTCTCCAAGTATGCATACTCTACTTGGAGTCAAAAACATGTGA

================================================
FILE: inst/extdata/GVariation/C.Wilga5.fas
================================================
>Wilga5
ATGGCAACTTACATGTCAACAATCTGTTTCGGTTCGTTTGAATGCAAGCTACCATACTCACCCGCCTCTTGCGGGCATATTGTGAAGGAACGAGAAGTGCTGGCTTCCGTTGATCCTTTCGCAGATCTGGAAACACAACTTAGTGCACGATTGCTCAAGCAAGAATATGCTACTGTTCGTGTGCTCAAGAACGGTACTCTTACGTACCGATACAAGACTGATGCCCAGATAACGCGCATCCAGAAGAAACTGGAAAGGAAGGATAGGGAAGAATATCACTTCCAGATGGCAGCTCCTAGTATTGTGTCAAAGATCACTATTGCTGGTGGAGAGCCACCTTCAAAACTTGAATCACAAGTGCGGAGGGGTGTCATCCACACAACTCCAAGGATGCGCACAGCAAAAACATATCACACGCCAAAGTTGACAGAGGGACAAATGAACCACCTTATCAAGCAGGTGAAGCAAATTATGTCAACCAAAGGAGGGTCTGTTCAACTGATTAGCAAGAAAAGTACCCATGTTCACTATAAAGAAGTTTTGGGATCACATCGCGCAGTTGTTTGCACTGCACATATGAGAGGTTTACGAAAGAGAGTGGACTTTCGGTGTGATAAATGGACCGTTGTGCGTCTACAGCATCTCGCCAGGACGGACAAGTGGAATAACCAAGTTCGTGCTACTGATCTACGCAAGGGCGATAGTGGAGTTATATTGAGTAATACTAATCTCAAAGGAAGCTTTGGGAGAAGCTCGGAGGGCATATTCATAGTGCGTGGGTCGCACGAAGGAAAAATCTATGATGCACGTTCCAAGGTTACTCAAGGGGTTATGGATTCAATGGTTCAGTTCTCAAGCGCTGAAAGCTTCTGGAAGGGATTGGACGGCAATTGGGCACAAATGAGATATCCTACAGATCATACATGTGTGGCAGGCTTACCAGTTGAAGACTGCGGCAGAGTTGCAGCGATAATGACACACAGTATTTTACCGTGCTATAAGATAACCTGCCCTACCTGTGCCCAACAATATGCCAATTTGCCAGCCAGTGACTTACTTAAGATATTACACAAGCACGCAAGTGATGGTTTAAATCGATTGGGGGCAGACAAAGATCGCTTTGTGCACGTCAAAAAGTTCTTGACAATCTTAGAGCACTTAACTGAACCGGTTGATCTGGGTCTAGAAATTTTCAATGAAGTATTCAAGTCTATAGGGGAGAAGCAACAATCACCTTTCAAAAACCTGAATATTCTGAATAATTTCTTTTTGAAAGGAAAAGAAAATACAGCTCGTGAATGGCAGGTGGCTCAATTAGGCTTACTTGAATGGGCAAGATTCCAAAAGAACAGAACGGATAATATCAAGAAAGGAGACATCTCGTTCTTTAGGAATAAACTATCTGCCAAAGCAAATTGGAACTTGTATCTGTCATGTGATAACCAGCTGGATAAGAATGCAAACTTCCTGTGGGGACAGAGGGAATATCATGCTAAGCGATTTTTCTCGAACTATTTCGAGGAAATTGATCCAGCGAAGGGCTATTCAGCATATGAAAATCGTTTGCATCCGAATGGGACAAGAAAACTTGCAATTGGAAACCTAATTGTACCACTTGATCTGGCTGAGTTTAGGCGGAAGATGAAAGGTGATTATAAAAGACAGCCAGGGGTGAGTAAGAAGTGCACGAGCTCGAAGGATGGAAACTACGTGTATCCCTGTTGTTGCACTACACTTGATGATGGCTCAGCTGTTGAATCAACATTTTACCCGCCAACTAAGAAGCACCTTGTAATAGGTAATAGTGGCGACCAAAAGTATGTTGACTTACCAAAAGGGAATTCTGAGATGTTATATATTGCCAGGCAAGGCTTTTGTTACATTAACATTTTCCTCGCGATGTTGATTAACATTAGTGAGGAAGATGCAAAGGATTTCACTAAGAAGGTTCGTGACATGTGTGTGCCAAAGCTTGGAACCTGGCCAACCATGATGGATCT
GGCTACAACTTGTGCTCAAATGAAAATATTCTACCCTGATGTTCATGATGCAGAACTGCCTAGAATACTAGTCGATCACGAAACGCAGACATGCCATGTGGTTGACTCGTTTGGCTCACAAACAACTGGGTATCATATTTTGAAAGCATCTAGCGTGTCCCAACTTATTTTGTTTGCTAATGATGAGTTGGAGTCTGACATTAAACATTATAGAGTTGGTGGTGTTCCTAATGCATGCCCTGAACTTGGGTCCACAATATCACCTTTCAGAGAAGGAGGAGTTATAATGTCTGAGTCGGCAGCTCTGAAACTGCTTTTGAAGGGAATTTTTAGACCTAAGGTGATGAGACAGTTGTTGTTAGATGAGCCTTATCTGTTGATTCTATCAATATTATCCCCTGGCATACTGATGGCTATGTACAATAATGGGATTTTTGAACTTGCGGTAAGGTTGTGGATTAATGAGAAACAATCCATAGCTATGATAGCATCGCTACTATCAGCTTTAGCCCTACGAGTGTCAGCGGCAGAAACACTCGTCGCACAGAGGATTATAATTGATACTGCAGCTACAGATCTCCTTGATGCTACGTGCGATGGGTTCAACCTACATCTAACGTACCCCACTGCGTTGATGGTGTTGCAAGTTGTTAAGAATAGAAATGAATGTGATGATACCCTATTCAAGGCGGGTTTTCCAAGTTACAACACGAGCGTCGTACAGATCATGGAAAAAAATTATCTAAATCTCTTAAACGATGCTTGGAAAGATTTAACTTGGCGGGAAAAATTATCCGCAACATGGTACTCATACAGAGCAAAACGCTCTATCACTCGGTACATAAAACCCACAGGAAGGGCAGATTTGAAAGGGTTATACAACATATCACCACAAGCATTCTTGGGCCGAGGCGCCAAGGTGGTCAAAGGCACTGCCTCAGGATTGTGCGAGCGATTTAATAATTATTTCTACACTAAGTGTGTAAATATTTCATCCTTTTTCATTCGTAGAATCTTTAGGCGTTTGCCAACTTTCGTCACTTTTGTTAACTCATTATTAGTTATTAGTATGTTAACCAGCGTAGTGGCAGTCTGTCAGGCAATAATTTTAGATCAGAGGAAGTATAGGAGAGAAATCGAGTTGATGCAGATAGAGAAGAATGAGATCGTCTGCATGGAGCTATATGCAAGTTTACAGCGCAAACTTGAACGCGATTTCACATGGGATGAGTACATTGAGTATTTGAAGTCAGTAAACCCTCAGATAGTTCAGTTTGCTCAAGCGCAGATGGAAGAATATGATGTGCGACACCAGCGTTCCACACCAGGTGTTAAAAATTTGGAACAAGTGGTAGCATTTATGGCTTTAGTCATCATGGTGTTCGATGCTGAAAGGAGTGATTGCGTGTTCAAAACTCTCAATAAATTTAAGGGTGTCCTTTCCTCACTGGACCATGAAGTTAGACATCAGTCCTTAGACGATGTGATCAAGAATTTTGATGAGAGGAATGAGGTTATTGATTTTGAATTGAGTGAGGACACAATTCGAACATCATCAGTGCTAGATACAAAGTTTAGTGATTGGTGGGACCGACAAATCCAGATGGGGCATACACTTCCACATTACAGAACCGAGGGGCACTTCATGGAATTCACAAGAGCAACTGCTGTCCAAGTGGCTAATGACATTGCCCATAGCGAACACCTAGACTTTCTAGTAAGGGGAGCTGTTGGGTCTGGAAAGTCAACTGGGTTGCCTGTTCATCTTAGTGTAGCCGGATCTGTGCTTTTAATTGAACCAACGCGACCACTAGCGGAGAACGTTTTCAAACAGCTATCTAGTGAACCATTCTTCAAGAAGCCAACACTGCGTATGCGTGGAAATAGTATATTTGGCTCTTCTCCAATCTCCGTCATGACTAGCGGATTTGCGCTACACTACTTCGCCAATAATCGCTCTCAATTAGCTCAGTTCAACTTTGTAATATTTGATGAGTGCC
ATGTTCTGGATCCTTCCGCAATGGCGTTCCGCAGTCTGCTGAGTGTTTATCATCAAGCATGCAAAGTATTAAAAGTGTCAGCTACTCCAGTGGGAAGAGAGGTTGAATTTACAACACAGCAACCAGTCAAGTTAATAGTGGAGGACACACTGTCTTTTCTATCATTTGTTGATGCACAAGGTTCTAAAACTAATGCTGATGTTGTTCAGTTTGGTTCAAACGTACTTGTGTACGTGTCGAGCTACAATGAAGTTGATACCTTGGCTAAGCTCCTAACAGACAAGAATATGATGGTCACAAAAGTTGATGGCAGAACAATGAAGCACGGTTGCCTAGAAATTGTCACAAAAGGAACCAGTGCGAGACCACATTTTGTTGTAGCAACCAACATAATTGAGAATGGAGTGACTTTGGACATAGACGTGGTTGTAGATTTTGGGTTGAAAGTCTCACCGTTCTTGGACATTGACAATAGGAGCATTGCTTACAATAAGGTGAGTGTTAGCTATGGTGAGAGAATTCAAAGGCTGGGTCGTGTTGGACGCTTCAAGAAAGGAGTAGCATTGCGCATTGGACACACTGAGAAGGGAATTATTGAAATTCCAAGCATGATCGCTACAGAGGCGGCTCTTGCTTGCTTTGCATATAACTTGCCAGTGATGACAGGAGGCGTCTCAACTAGTCTGATTGGCAATTGTACTGTGCGCCAAGTTAAAACAATGCAGCAATTTGAATTGAGTCCCTTCTTTATCCAGAATTTTGTTGCCCATGATGGATCAATGCATCCTGTCATACATGACATTCTTAAAAAGTATAAACTTCGAGATTGTATGACACCTTTGTGCGATCAGTCTATACCATACAGGGCATCGAGCACTTGGTTATCGGTTAGTGAATATGAGCGACTTGGAGTGGCCTTAGAAATTCCAAAGCAAGTCAAAATTGCATTCCATATCAAAGAGATCCCTCCTAAGCTCCATGAAATGCTTTGGGAAACGGTTGTCAAGTACAAAGACGTTTGCTTATTTCCAAGCATTCGAGCATCGTCCATCAGCAAAATCGCATACACATTGCGTACAGACCTTTTCGCCATCCCAAGAACTCTAATATTGGTGGAGAGACTGCTTGAAGAGGAGCGAGTGAAGCAGAGCCAATTCAGAAGTCTCATCGACGAAGGATGCTCAAGCATGTTTTCAATTGTCAACTTGACAAACACTCTCAGAGCTAGATATGCAAAAGATTACACCGCAGAGAACATACAAAAACTTGAGAAAGTGAGAAGTCAATTGAAAGAATTCTCAAATTTGGATGGTTCTGCATGTGAGGAAAATTTAATAAAGAGGTATGAGTCTTTGCAGTTCGTTCATCACCAAGCTGCGACGTCACTTGCAAAGGATCTCAAGTTGAAGGGGACTTGGAAGAAGTCATTAGTGGCCAAAGACTTGATCATAGCAGGCGCTGTTGCAATTGGTGGAATAGGACTCATATATAGTTGGTTCACACAATCAGTTGAGACTGTGTCTCACCAAGGGAAAAATAAATCCAAAAGAATCCAAGCCTTGAAGTTTCGCCATGCTCGTGACAAAAGGGCTGGCTTTGAAATTGACAACAATGATGACACAATAGAGGAATTCTTTGGATCTGCATACAGGAAAAAGGGAAAAGGTAAAGGTACCACAGTTGGTATGGGCAAGTCAAGCAGGAGGTTCATCAACATGTATGGGTTTGACCCAACAGAGTACTCATTCATCCAATTCGTTGATCCACTCACTGGGGCGCAAATAGAAGAGAATGTCTATGCTGACATTAGAGATATTCAAGAGAGATTTAGTGAAGTGCGAAAGAAAATGGTTGAGAATGATGACATTGAAATGCAAGCCTTGGGCAGTAACACGACCATACATGCATACTTCAGGAAAGATTGGTCTGACAAAGCTTTGAAGATTGATTTAATGCCACATAATCCACTCAAAGTTTGTGACAAGACAAATGGCATTGCC
AAATTTCCTGAGAGAGAACTCGAACTAAGGCAGACTGGGCCAGCTGTAGAAGTCGACGTGAAGGACATACCAGCACAGGAGGTGGAGCATGAAGCTAAATCGCTCATGAGAGGCTTGAGAGACTTCAACCCAATTGCCCAAACAGTTTGTAGGCTGAAAGTATCTGTTGAATATGGGACATCAGAGATGTACGGTTTTGGATTTGGAGCGTACATAATAGCGAACCACCATTTGTTCAGGAGTTACAATGGTTCCATGGAGGTGCGATCCATGCACGGTACATTCAGGGTGAAGAATCTACACAGTTTGAGTGTTCTGCCAATTAAAGGTAGGGATATCATCCTCATCAAAATGCCGAAAGATTTCCCTGTCTTTCCACAGAAATTGCATTTCCGAGCTCCTACACAGAATGAAAGAGTTTGTTTAGTTGGAACCAACTTTCAGGAGAAGTATGCATCGTCGATCATCACAGAAGGAGGCACCACTTACAATATACCAGGCAGCACATTCTGGAAGCATTGGATTGAAACAGATAATGGACATTGTGGACTACCAGTGGTGAGCACCACCGATGGATGTCTAGTCGGAATTCACAGTTTGGCAAACAACAAACACACCACGAATTACTACTCAGCCTTCGATGAAGATTTTGAAAGCAAGTATCTCCGAACCAATGAGCACAATGAATGGGTCAAGTCTTGGATTTATAATCCAGACACAGTATTGTGGGGCCCGTTGAAACTTAAAGACAGCACTCCCAAAGGATTATTCAAAACAACAAAGCTTGTGCAAGATCTAATTGATCATGATGAAGTGGTGGAGCAAGCTAAGCACTCTGCGTGGATGTTTGAAGCCTTGACAGGAAATTTGCAAGCTGTCGCAACAATGAAGAGCCAATTAGTAACCAAGCATGTAGTTAAAGGAGAGTGTCGACACTTCAAAGAATTCCTGACTGTGGATGCAGAAGCAGAGGCATTCTTCAGGCCTTTGATGGATGCGTATGGGAAAAGTTTGCTGAATAGAGATGCATACATCAAGGACATAATGAAGTATTCAAAACCTATAGATGTTGGTATCGTGGACTGTGATGCATTTGAGGAAGCCATCAATAGGGTTATCATCTACCTGCAAGTGCACGGCTTCAAGAAGTGCGCATACGTTACTGACGAGCAAGAAATTTTTAAAGCGCTCAACATGAAAGCTGCAGTCGGAGCCATGTATGGTGGCAAAAAGAAAGACTATTTTGAGCATTTCACTGATGCAGATAAGGAAGAAATAGTCATGCAAAGCTGTCTGCGATTGTATAAAGGCTTGCTTGGCATTTGGAATGGATCATTGAAGGCAGAGCTCCGGTGTAAGGAAAAGATACTTGCAAATAAGACGAGGACATTCACTGCTGCACCTTTAGACACTTTGCTGGGTGGTAAAGTGTGTGTTGATGACTTCAATAATCAATTTTATTCAAAGAATATTGAATGCTGTTGGACGGTTGGGATGACAAAGTTTTATGGTGGTTGGGATAAACTGCTGCGGCGTTTACCTGAGAATTGGGTTTACTGTGATGCCGATGGCTCACAGTTTGATAGTTCACTAACTCCATACTTAATCAATGCAGTTCTCACCATTAGAAGCACATACATGGAAGATTGGGATGTGGGGTTGCAAATGTTGCGCAATTTATACACTGAGATTGTTTACACACCTATTTCAACTCCAGATGGAACAATTGTTAAGAAGTTCAGAGGGAATAACAGTGGTCAGCCTTCTACTGTTGTGGACAACTCTCTTATGGTCGTCCTTGCCATGCACTATGCTCTCATCAAAGAATGCATTGAGTTTGAAGAGATTGACAGCACGTGCGTGTTCTTTGTCAATGGTGATGATTTGCTGATTGCTGTGAATCCGGATAAAGAGGGCATTCTTGACAGATTGTCACAACACTTCTCAGATCTTGGTTTGAATTATGATTTCTCGTCAAGAACAAGAAATAAGGAGGAGTT
GTGGTTTATGTCTCATAGAGGCCTACTGATTGAGGCAATGTACGTGCCGAAACTTGAAGAAGAAAGGATTGTGTCCATTCTCCAATGGGACAGAGCAGACTTGGCTGAACATAGGCTTGAGGCGATTTGCGCAGCTATGATAGAGTCCTGGGGTTATTCTGAACTAACACACCAAATCAGGAGATTCTACTCATGGTTATTGCAACAGCAACCCTTTGCAACAATAGCGCAGGAAGGGAAGGCTCCTTATATAGCAAGCATGGCATTAAGGAAATTGTATATGGATAGGGCTGTGGATGAGGAAGAGCTGAGAGCCTTCACTGAAATGATGGTCGCATTAGATGATGAGTTTGAATTTGACTCTTATGAAGTATACCATCAAGCAAATGACACAATCGATGCAGGAGAAAGCAGCAAGAAAGATGCAAGACCAGAGCAAGGCAGCATCCAGTCAAACCCGAACAAAGGAAAAGATAAGGATGTGAATGCTGGTACATCTGGGACACATACTGTGCCGAGAATCAAGGCTATCACGTCCAAAATGAGGATGCCCAAAAGCAAGGGAGCAACCGTGCTAAACTTAGAACACTTGCTTGAGTATGCTCCACAACAAATTGATATTTCAAATACTCGGGCAACTCAATCACAGTTTGATACGTGGTATGAGGCAGTGCGGATGGCATACGACATAGGAGAAACTGAGATGCCAACTGTGATGAATGGGCTTATGGTTTGGTGCATTGAAAATGGAACCTCGCCAAATGTCAACGGAGTTTGGGTTATGATGGATGGGAATGAACAAGTCGAGTACCCGTTGAAACCAATCGTTGAGAATGCAAAACCAACCCTTAGGCAAATCATGGCACATTTCTCAGATGTTGCAGAAGCGTATATAGAAATGCGCAACAAAAAGGAACCATATATGCCACGATATGGTTTAATTCGAAATCTGCGGGATGTGGGTTTAGCGCGTTATGCCTTTGACTTTTATGAGGTCACATCACGAACACCAGTGAGGGCTAGGGAAGCGCACATTCAAATGAAGGCCGCAGCATTGAAATCAGCCCAACCTCGACTTTTCGGGTTGGACGGTGGCATCAGTACACAAGAGGAGAACACAGAGAGGCACACCACCGAGGATGTCTCTCCAAGTATGCATACTCTACTTGGAGTCAAGAACATGTGA
>CF_YL21
ATGGCAATCTACACGTCAACAATCTGTTTCGGTTCGTTTGAATGCAAGCTACCATACTCACCCGCCTCTTGCGGGCATATTGTGAAGGAACGAGAAGTGCTGGCTTCCATCGATCCTTTCGCAGATCTGGAAACACAACTTAGTGCACGATTGCTCAAGCAAGAATATGCTACTGTTCGTGTGCTCAAGAACGGTACTCTTACGTACCGATACAAGACTGATGCCCAGATAACGCGCATCCAGAAGAGACTGGAAAGGAAGGATAGGGAAGAATACCACTTCCAGATGGCAGCTCCTAGTATTGTGTCAAAGATCACTATTGCTGGTGGAGAGCCACCTTCAAAACTTGAATCACAAGTGCGGAGGGGTGTCATCCACACAACTCCAAGGATGCGCACAGCAAAAACATATCACACGCCAAAGCTGACAGAGGGACAAATGAACCACCTTATCAAGCAGGTGAAGCAAATTATGTCAACCAAAGGAGGGTCTGTTCAACTGATTAGCAAGAAAAGTACCCATGTTCACTATAAAGAAGTTTTGGGATCACATCGCGCAGTTGTTTGCACTGCACATATGAGAGGTTTACGAAAGAGAGTGGACTTTCGGTGTGATAAATGGACCGTTGTGCGTCTACAGCATCTCGCCAGGACGGATAAGTGGAATAACCAAGTTCGTGTTACTGATCTACGCAAGGGCGATAGTGGAGTTATATTGAGTAATACTAATCTCAAAGGAAACTTTGGGAGAAGCTCGGAGGGCATATTCATAGTGCGTGGGTCGCACGAAGGAAAAATCTATGATGCACGTTCCAAGGTTACTCAAGGGGTTATGGATTCAATGGTTCAGTTCTCAAGCGCTGAAAGCTTTTGGAAGGGATTGGACGGCAATTGGGCACAAATGAGATATCCTACAGATCATACATGTGTGGCAGGCTTACCAGTTGAAGACTGTGGCAGAGTTGCAGCGATAATGACACACAGTATTTTACCGTGCTATAAGATAACCTGCCCTACCTGTGCCCAACAATATGCCAACTTGCCAGCCAGTGACTTACTTAAGATATTACACAAGCACGCAAGTGATGGTTTAAATCGATTGGGGGCAGACAAAGATCGCTTTGTGCATGTCAAAAAGTTCTTGACAATCTTAGAGCACTTAACTGAACCGGTTGATCTGAGTCTAGAAATTTTCAATGAAGTATTCAAGTCTATAGGGGAGAAGCAACAATCACCTTTCAAAAACCTGAATATTCTGAATAATTTCTTTTTAAAAGGAAAGGAAAATACAGCTCGTGAATGGCAGGTGGCTCAATTAAGCTTACTTGAATTGGCAAGATTCCAAAAGAACAGAACGGATAATATCAAGAAAGTAGACATCTCGTTCTTTAGGAATAAACTATCTGCCAAAGCAAATTGGAACTTGTATCTGTCATGTGATAACCAGCTGGATAAGAATGCAAACTTCCTGTGGGGACAGAGGGAATATCATGCTAAGCGATTTTTCTCGAACTATTTCGAGGAAATTGATCCAGCGAAGGGCTATTCAGCATACGAAAATCGTTTGCATCCGAATGGGACAAGAAAACTTGCAATTGGAAACCTAATTGTACCACTTGATCTGGCTGAGTTTAGGCGGAAGATGAAAGGTGATTTTAAAAGACAGCCAGGGGTGAGTAAGAAGTGCACGAGCTCGAAGGATGGAAACTACGTGTATCCCTGTTGTTGCACTACACTTGATGATGGCTCAGCTGTTGAATCAACATTTTACCCGCCAACTAAGAAGCGCCTCGTAATAGGTAATAGTGGCGACCAAAAGTATGTTGACTTACCAAAAGGGAATTCTGAGATGTTATATATTGCCAGGCAAGGCTTCTGTTACATTAACATTTTCCTCGCGATGTTGATTAACATTAGTGAGGAAGATGCAAAGGATTTCACTAAGAAGGTTCGTGACATGTGTGTGCCAAAGCTCGGAACCTGGCCAACCATGATGGATCT
GGCTACAACTTGTGCTCAAATGAAAATATTCTACCCTGATGTTCATGATGCAGAACTGCCTAGAATACTAGTCGATCACGAAACGCAGACATGCCATGTGGTTGACTCGTTTGGCTCACAAACAACTGGGTATCATATTTTGAAAGCATCTAGCGTGTCCCAACTTATTTTGTTTGCTAATGATGAGTTGGAGTCTGACATTAAGCACTATAGAGTTGGTGGTATTCCTAATGCATGCCCTGAACTTGGGTCCACAATATCACCTTTCAGAGAAGGAGGAGTTATAATGTCTGAGTCGGCAGCGCTGAAACTGCTTTTGAAGGGAATTTTTAGACCTAAGGTGATGAGACAGTTGCTGTTAGATGAGCCTTACCTGTTGATTCTATCAATATTATCCCCTGGCATACTGATGGCTATGTATAATAATGGGATTTTTGAACTTGCGGTGAGGTTGTGGATTAATGAGAAGCAATCCATAGCGATGATAGCATCGCTACTATCAGCTTTAGCCCTACGAGTGTCAGCGGCAGAAACACTCGTCGCACAGAGGATTATAATTGATGCTGCAGCTACAGACCTCCTCGATGCTACGTGTGACGGGTTCAACCTGCATCTAACGTACCCCACTGCATTGATGGTGTTGCAAGTTGTTAAGAATAGAAATGAATGTGATGATACTCTATTCAAGGCGGGTTTTCCAAGTTACAACACGAGCGTCGTACAGATTATGGAAAAAAATTATCTAAATCTCTTGAACGATGCTTGGAAAGATTTAACTTGGCGGGAAAAATTTTCCGCAACATGGTACTCATACAGAGCAAAACGCTCTATCACTCGGTACATAAAACCCACAGGAAGGGCAGATTTGAAAGGGTTATACAACATATCACCACAAGCATTCTTGGGCCGAAGCGTCCAGGTGGTCAAAGGCACTGCCTCAGGATTGAGCGAGCGATTTAATAATTACTTCAATACTAAGTGTGTAAATATTTCATCCTTTTTCATTCGTAGAATCTTTAGGCGTTTGCCAACTTTCGTCACTTTTGTTAACTCATTATTAGTTATTAGTATGTTAACTAGTGTAGTGGCAGTGTGTCAGGCAATAATTTTAGATCAGAAGAAGTATAGGAGAGAGATCGAGTTGATGCAGATAGAGAAGAATGAGATTGTCTGCATGGAGCTATACGCAAGTTTACAGCGCAAACTTGAACGCGATTTCACATGGGATGAGTACATTGAGTATTTGAAATCAGTGAACCCTCAGATAGTTCAATTTGCTCAAGCGCAGATGGAAGAATATGATGTGCGACACCAGCGTTCCACACCAGGTGTTAAAAATTTGGAACAAGTGGTAGCATTTATGGCTTTAGTCATCATGGTGTTCGATGCTGAAAGGAGTGACTGCGTGTTCAAAACTCTCAATAAATTTAAGGGTGTCCTTTCCTCACTGGACCATGAAGTTAGACATCAGTCCTTGGATGATGTGATCAAGAATTTTGATGAGAGGAATGAGACTATTGATTTTGAATTGAGTGAGGACACAATTCGAACATCATCAGTGCTAGATACAAAGTTTAGTGATTGGTGGGACCGACAAATCCAAATGGGACATACACTTCCACATTACAGAACCGAGGGGCACTTCATGGAATTTACAAGGGCAACTGCTGTTCAAGTGGCTAATGACATTGCCCACAGCGAACACCTAGACTTTCTAGTAAGGGGAGCTGTTGGGTCTGGAAAGTCAACTGGGTTGCCTGTTCATCTTAGTGTAGCTGGATCTGTGCTTTTAATTGAGCCAACGCGGCCACTAACGGAGAACGTTTTCAAACAGCTATCTAGTGAACCATTCTTCAAGAAGCCAACACTGCGTATGCGTGGAAATAGTATATTTGGCTCTTCCCCAATCTCCGTTATGACTAGCGGGTTTGCGCTACACTACTTCGCCAATAATCGCTCTCAATTAGCTCAGTTCAATTTTGTAATATTTGATGAGTGCC
ATGTTCTGGATCCTTCCGCAATGGCGTTCCGCAGTCTGCTGAGTGTTTATCATCAAGCATGTAAAGTATTAAAAGTGTCAGCTACTCCAGTGGGAAGAGAGGTTGAATTCACAACACAGCAGCCAGTCAAGTTAATAGTGGAAGACACACTGTCTTTCCAATCATTTGTTGATGCACAAGGTTCTAAAACTAATGCTGATGTTGTCCAGTTTGGTTCAAACATACTTGTTTACGTGTCGAGCTACAATGAAGTTGACAACTTGGCTAAGCTCCTAACAGATAAGAATATGATGGTCACAAAGGTTGATGGCAGAACAATGAAGCACGGTTGCCTAGAAATTGTCACAAAAGGAACCAGTGCGAGACCACATTTTGTTGTAGCAACCAACATAATTGAGAATGGAGTGACTTTGGACATAGATGTAGTAGTGGATTTTGGGTTGAAAGTCTCACCGTTCTTGGACATTGACAATAGGAGCATTGCTTACAATAAGGTGAGTGTTAGTTATGGTGAGAGAATTCAAAGGCTGGGTCGTGTTGGACGCTTCAAGAAAGGAGTAGCATTGCGCATTGGACACACTGAGAAGGGAATTATCGAAATTCCAAGCATGATCGCTACTGAGGCGGCTCTTGCTTGCTTTGCATATAACTTACCAGTGATGACAGGAGGCGTCTCAACTAGTCTGATTGGCAATTGTACTGTGCGCCAGGTCAAAACAATGCAGCAATTTGAATTGAGCCCCTTCTTTATCCAGAATTTCGTTGCTCATGATGGATCAATGCATCCTGTCATACATGACATTCTCAAAAAGTATAAACTCCGAGATTGTATGACACCTTTGTGCGATCAGTCTATACCATACAGGGCATCGAGCACTTGGTTATCGGTTAGTGAATATGAGCGACTTGGAGTGGCTTTAGAAATTCCAAAGCAAGTCAAAATTGCATTCCATATCAAAGAGATCCCTCCTAAGCTCCACGAAATGCTTTGGGAAACGGTTGTCAAGTACAAAGACGTTTGCTTATTTCCAAGCATTCGAGCATCGTCCATCAGCAAAATCGCATACACATTGCGTACAGACCTCTTCGCCATCCCAAGAACTCTAATATTGGTGGAGAGATTGCTTGAAGAGGAGCGAGTGAAGCAGAGCCAATTCAGAAGTCTCATCGATGAAGGGTGCTCAAGCATGTTTTCAATTGTCAACTTGACCAACACTCTCAGGGCCAGATATGCAAAAGATTACACCGCAGAGAACATACAAAAACTTGAGAAAGTGAGAAGTCAATTAAAAGAATTCTCAAATTTGGATGGTTCTGCATGTGAGGAAAATTTAATAAAGAGGTATGAGTCTTTGCAGTTCGTTCACCACCAAGCTGCGACGTCACTTGCAAAGGATCTCAAGTTGAAGGGGACTTGGAAGAAGTCATTAGTGGCTAAAGACTTGATCATAGCAGGCGCTGTTGCAATTGGTGGAATAGGACTCATATATAGTTGGTTCACACAATCAGTTGAGACTGTGTCTCACCAAGGGAAAAATAAATCCAAAAGAATCCAAGCCTTGAAGTTTCGCCATGCTCGTGACAAAAGGGCTGGCTTTGAAATTGACAACAATGATGACACAATAGAGGAATTCTTTGGATCTGCATACAGGAAGAAGGGAAAAGGTAAAGGTACCACAGTTGGTATGGGCAAGTCAAGCAGGAGGTTCATTAACATGTACGGGTTTGACCCGACAGAGTACTCATTCATCCAATTCGTTGATCCACTCACTGGGGCACAAATAGAAGAGAATGTCTATGCTGACATTAGAGATGTTCAAGAAAGATTTAGTGAGGTGCGAAAGAAAATGGTTGAGAATGATGATATTGAAATGCAAGCCTTGGGTAGTAACACGACCATACATGCATACTTTAGGAAAGATTGGTCTGACAAAGCTTTGAAGATTGATTTAATGCCACATAACCCACTCAAAGTTTGTGACAAAACAAATGGCATTGCC
AAACTTCCTGAGAGAGAGTTCGAACTAAGGCAGACTGGGCCAGCTGTAGAAGTCGACGTGAAGGACATACCAGCGCAGGAAGTGGAGCATGAAGCCAAATCGCTCATGAGAGGCTTGAGAGACTTCAACCCAATTGCTCAAACAGTTTGTAGGCTGAAAGTATCTGTTGAATATGGGACATCAGAGATGTACGGTTTTGGATTTGGAGCATACATAATAGCGAACCACCATTTGTTTAGGAGTTACAATGGTTCCATGGAGGTGCGATCCATGCATGGTACATTCAGGGTGAAGAATCTACACAGTTTGAGCGTTCTGCCAATTAAAGGCAGGGACATCATCCTCATCAAAATGCCGAAGGATTTCCCTGTTTTTCCACAGAAATTGCATTTCCGAGCTCCTACACAGAATGAAAGAGTTTGTTTAGTTGGGACCAACTTTCAGGAGAAGTATGCATCGTCGATCATCACAGAAACAAGCACTACTTACAATATACCAGGCAGCACATTCTGGAAGCATTGGATTGAAACAGATAATGGACATTGTGGACTACCAGTGGTGAGCACAGCCGATGGATGTCTAGTCGGAATTCATAGTTTGGCAAACAATGCACACACCACGAACTACTACTCAGCCTTCGATGAAGATTTTGAAAGCAAGTACCTCCGAACCAATGAGCACAATGAATGGGTCAAATCTTGGGTTTATAATCCAGACACAGTGTTGTGGGGCCCGTTGAAGCTTAAAGATAGCACTCCCAAAGGGTTATTCAAAACAACAAAGCTTGTGCAAGATCTAATCGATCATGATGAAGTCGCGGAGCAAGCTAAGCACTCTGCGTGGATGTTTGAAGCCTTGACAGGAAATTTGCAAGCTGTCGCAACAATGAAGAGTCAATTAGTAACCAAGCATGTAGTTAAAGGAGAGTGTCGACACTTCAAAGAATTCCTGACTGTGGATGCAGAAGCAGAGGCATTCTTCAGGCCTTTGATGGATGCATATGGGAAAAGCTTGCTGAATAGAGATGCGTACATCAAAGACATAATGAAGTACTCAAAACCTATAGATGTTGGTATCGTGGACTGTGATGCATTTGAAGAAGCCATCAATAGGGTTATCATCTACTTGCAAGTGCACGGCTTCAAGAAGTGTGCATATGTCACTGACGAGCAAGAAATTTTCAAAGCGCTCAACATGAAAGCTGCAGTCGGAGCCATGTATGGTGGCAAAAAGAAAGACTATTTTGAGCATTTCACTGATGCAGATAAGGAAGAAATAGTCATGCAAAGCTGTCTGCGACTGTATAAAGGCTTGCTCGGCATTTGGAACGGATCGTTGAAGGCAGAGCTCCGGTGTAAGGAAAAGATACTTGCAAACAAGACGAGGACGTTCACTGCTGCACCTCTAGACACTTTGCTGGGTGGTAAAGTGTGTGTTGATGACTTCAATAATCAATTTTATTCAAAGAATATTGAATGCTGTTGGACAGTTGGGATGACTAAGTTTTATGGTGGTTGGGATAAACTGCTTCGGCGTTTACCTGAGAATTGGGTATACTGTGATGCCGATGGCTCACAGTTTGATAGTTCACTAACTCCATACTTAATCAATGCTGTTCTCACCATCAGAAGCACATACATGGAAGACTGGGATGTGGGGTTGCAAATGCTGCGTAATTTATACACTGAGATTATTTACACACCTATCTCAACTCCAGATGGAACAATTGTTAAGAAGTTCAGAGGAAATAACAGTGGTCAGCCTTCTACTGTTGTGGACAACTCTCTTATGGTTGTCCTTGCCATGCACTATGCTCTCATCAAAGAATGCATTGAGTTTGAAGAGATTGACAGCACGTGCGTGTTCTTCGTCAATGGTGATGATTTGCTGATTGCCGTGAATCCGGATAAAGAGGGCATTCTTGATAGATTGTCACAGCACTTCTCAGATCTTGGTTTAAATTATGATTTTTCGTCAAGAACAAGAAATAAGGAAGAGTT
ATGGTTTATGTCTCATAGAGGCCTACTGATTGAGGGCATGTACGTGCCGAAACTTGAAGAAGAAAGGATTGTGTCCATTCTCCAATGGGACAGAGCAGACTTGGCTGAACACAGGCTTGAGGCGATTTGCGCAGCTATGATAGAGTCCTGGGGTTATTCTGAACTAACACACCAAATTAGAAGATTCTACTCATGGTTATTGCAACAGCAACCTTTTGCAACAATAGCGCAGGAAGGGAAGGCTCCTTATATAGCAAGCATGGCATTAAGGAAACTGTATATGGATAGAGCTGTGGATGAGGAAGAGCTGAGAGCCTTCACTGAAATGATGGTCGCATTAGACGATGAGTTTGAGTGTGACTCTTATGAAGTACACCACCAGGGAAACGATACAATCGATGCAGGAGGAAGCAGCAAGAAAGATGCAAGACCAGAGCAAGGCAGCATCCAGTCAAACCCGAACAAAGGAAAAGATAAGGATGTGAATGCTGGTACATCTGGGACACATACTGTGCCGAGAATCAAGGCTATCACGTCCAAAATGAGAATGCCCAAAAGCAAAGGAGCAACCGTGCTAAACTTAGAACACTTGCTTGAGTATGCTCCACAACAAATTGATATTTCAAATACTCGGGCAACTCAATCACAGTTTGATACGTGGTATGAGGCAGTGCGGATGGCATACGACATAGGAGAAACTGAGATGCCAACTGTGATGAATGGGCTTATGGTTTGGTGCATTGAAAATGGAACCTCGCCAAATGTCAACGGAGTCTGGGTTATGATGGATGGGGATGAACAAGTCGAGTACCCGTTGAAACCAATCGTTGAGAATGCAAAACCAACCCTTAGGCAAATCATGGCACATTTCTCAGATGTTGCAGAAGCGTATATAGAAATGCGCAACAAAAAGGAACCATATATGCCACGATATGGTTTAATTCGAAATCTGCGGGATGTGGGTTTAGCGCGTTATGCCTTTGACTTTTATGAGGTCACATCACGAACACCAGTGAGGGCTAGGGAAGCGCACATTCAAATGAAGGCCGCAGCATTGAAATCAGCCCAACCTCGACTTTTCGGGTTGGACGGTGGCATCAGTACACAAGAGGAGAACACAGAGAGGCACACCGCCGAGGATGTCTCTCCAAGTATGCATACTCTACTTGGAGTCAAAAACATGTGA

================================================
FILE: inst/extdata/GVariation/sample_alignment.fa
================================================
>Mont
ATGGCAACTTACACATCAACAATCCAGTTTGGTTCCATTGAATGCAAACTTCCATACTCACCCGCTCCTTTTGGGCTAGT
TGCGGGGAAACGAGAAGTTTTAACCACCACTGACCCCTTCGCAAGTTTGGAGATGCAGCTTAGTGCGCGATTACGAAGGC
AAGAGTTTGCAACTATTCGAACATCCAAGAATGGTACTTGCATGTATCGATACAAGACTGATGTCCAGATTGCGCGCATT
CAAAAGAAGCGCGAGGAAAGAGAAAGAGAGGAATATAATTTCCAAATGGCTGCGTCAAGTGTTGTGTCGAAGATCACTAT
TGCTGGTGGAGAGCCACCTTCAAAACTTGAATCACAAGTGCGGAGGGGTGTCATCCACACAACTCCAAGGATGCGCACAG
CAAAAACATATCACACGCCAAAGCTGACAGAGGGACAAATGAACCACCTTATCAAGCAGGTGAAGCAAATTATGTCAACC
AAAGGAGGGTCTGTTCAACTGATTAGCAAGAAAAGTACTCATGTTCACTATAAAGAAGTTTTGGGATCACATCGCGCAGT
TGTTTGCACTGCACATATGAGAGGTTTACGAAAGAGAGTGGACTTTCGGTGTGATAAATGGACCGTTGTGCGTCTACAGC
ATCTCGCCAGGACGGACAAGTGGACTAACCAAGTTCGTGCTACTGATCTACGCAAGGGCGATAGTGGAGTTATATTGAGT
AATACTAATCTCAAAGGAAACTTTGGGAGAAGCTCGGAGGGCCTATTCATAGTGCGTGGGTCGCACGAAGGAAAAATCTA
TGATGCACGTTCCAAGGTTACTCAAGGGGTTATGGATTCAATGGTTCAGTTCTCAAGCGCTGAAAGCTTTTGGAAGGGAT
TGGACGGCAATTGGGCACAAATGAGATATCCTACAGATCATACATGTGTGGCAGGCTTACCAGTTGAAGACTGTGGCAGA
GTTGCAGCGATAATGACACACAGTATTTTACCGTGCTATAAGATAACCTGCCCTACCTGTGCCCAACAATATGCCAACTT
GCCAGCCAGTGACTTACTTAAGATATTACACAAGCACGCAAGTGATGGTCTAAATCGATTGGGGGCAGACAAAGATCGCT
TTGTGCATGTCAAAAAGTTCTTGACAATCTTAGAGCACTTAACTGAACCGGTTGATCTGAGTCTAGAAATTTTCAATGAA
GTATTCAAGTCTATAGGGGAGAAGCAACAATCACCTTTCAAAAACCTGAATATTCTGAATAATTTCTTTTTGAAAGGAAA
GGAAAATACAGCTCGTGAATGGCAGGTGGCTCAATTAAGCTTACTTGAATTGGCAAGATTCCAAAAGAACAGAACGGATA
ATATCAAGAAAGGAGACATCTCGTTCTTTAGGAATAAACTATCTGCAAAAGCAAATTGGAACTTGTATCTGTCATGTGAT
AACCAGCTGGATAAGAATGCAAACTTCCTGTGGGGACAGAGGGAATATCATGCTAAGCGATTTTTCTCGAACTATTTCGA
GGAAATTGATCCAGCGAAGGGCTATTCAGCATACGAAAATCGTTTGCATCCGAATGGGACAAGAAAACTTGCAATTGGAA
ACCTAATTGTACCACTTGATCTGGCTGAGTTTAGGCGGAAGATGAAAGGTGATTATAAAAGACAGCCAGGGGTGAGTAAG
AAGTGCACGAGCTCGAAGGATGGAAACTACGTGTATCCCTGTTGTTGCACTACACTTGATGATGGCTCAGCTGTTGAGTC
AACATTTTACCCGCCAACTAAGAAGCACCTCGTAATAGGTAATAGTGGCGACCAAAAGTATGTTGACTTACCAAAAGGGA
ATTCTGAGATGTTATATATTGCCAGGCAAGGCTTCTGTTACATTAACATTTTTCTCGCGATGTTGATTAACATTAGTGAG
GAAGATGCAAAGGATTTCACTAAGAAGGTTCGTGACATGTGTGTGCCAAAGCTTGGAACCTGGCCAACCATGATGGATTT
GGCTACAACTTGTGCTCAAATGAAAATATTCTACCCTGATGTTCATGATGCAGAACTGCCTAGAATACTAGTCGATCACG
AAACGCAGACATGCCATGTGGTTGACTCGTTTGGCTCACAAACAACTGGGTATCATATTTTGAAAGCATCTAGCGTGTCC
CAACTTATTTTGTTTGCTAATGATGAGTTGGAGTCTGACATTAAGCACTATAGAGTTGGTGGTATTCCTGGAGCATGCCC
TGAGCTTGGGTCCACAATATCACCTTTTAGAGAAGGAGGAATCATAATGTCTGAGTCAGCAGCGCTAAAACTGCTCCTAA
AGGGAATTTTTAGGCCCAAAGTGATGAAGCAATTGCTACTGGATGAACCATATTTGCTCATTTTATCGATATTATCTCCT
GGTATACTTATGGCCATGTACAACAATGGGATATTTGAGTTAGCGGTGAAGTTGTGGATCAATGAGAAACAATCTATAGC
CATGATAGCATCGTTATTGTCCGCCTTGGCTTTACGAGTGTCAGCAGCAGAAACACTCGTTGCACAGAGGATTATAATTG
ACACGGCAGCAACAGATCTTCTCGATGCTACGTGTGATGGATTCAACTTACATCTAACATATCCCACTGCACTCATGGTG
TTGCAAGTTGTTAAGAACAGAAATGAATGTGATGATACGTTGTTTAAAGCAGGTTTTTCACATTACAACATGAGTGTCGT
GCAGATTATGGAAAAAAATTATCTAAGCCTCTTGGGCGATGCTTGGAAAGATTTAACCTGGCGAGAAAAATTATCCGCAA
CATGGCACTCATACAAAGCAAAGCGCTCTATCACTCAGTTCATAAAACCCATAGGCAAAGCAGATTTAAAAGGGTTGTAC
AACATATCACCGCAAGCATTCTTGGGTCAGGGCGTACAGAGAGTCAAAGGCACCGCCTCAGGGTTGAATGAGCGACTCAA
TAATTATATCAATACTAAGTGTGTAAATATTTCATCCTTTTTCATTCGTAGAATTTTCCGGCGCTTGCCAACTTTTGTAA
CTTTCATTAATTCATTATTAGTTATTAGTATGCTAACTAGTGTAGTAGCAGTGTGTCAAGCAATAATTCTAGATCAAAGG
AAGTATAGAAAAGAAATTGAGTTGATGCAGATTGAGAAGAATGAAATTGTTTGTATGGAGTTGTATGCGAGTCTGCAGCG
CAAACTTGAGCGTGAATTCACATGGGATGAATATATGGAATATTTGAAATCTGTGAATCCCCAGATAGTTCAATTCGCGC
AAGCTCAAATGGAAGAATATAATGTGCGACATCAGCGCTCCACACCAGGTGTTAAGAATTTAGAGCAGGTGGTAGCATTT
ATAACTCTAATTATCATGATGTTTGATGCTGAAAGGAGCGACTGTGTATTTAAGACTCTCAACAAATTCAAAGGCATCGT
TTCTTCAATGGATCATGAAGTTAGACACCAGTCCTTGGATGATGTAATCAAGAATTTCGATGAAAGGAACGAAGTTATTG
ATTTTGAACTAAATGAGGATACAATTAAAACATCATCAGTGTTGGACACAAAGTTTAGCGACTGGTGGGATCGGCAAATC
CAAATGGGACACACACTTCCCCATTATAGAACTGAGGGACACTTCATGGAATTCACAAGGGCAACTGCTGTACAAGTGGC
CAACGACATCGCGCATAGTGAGCACCTAGACTTTCTAGTGAGGGGAGCTGTTGGGTCTGGAAAATCTACTGGACTGCCTG
TCCATCTCAGTGCAGCTGGATCTGTGCTTTTGATAGAACCAACTCGACCACTTGCAGAAAACGTGTTCAAGCAATTATCC
AGTGAACCGTTTTTCAAGAAGCCAACACTGCGCATGCGAGGAAATAGTGTGTTTGGTTCCTCTCCAATCTCCATTATGAC
TAGCGGCTTTGCGTTGCACTACTATGCTAATAATCGCTCTCAGCTAACTCAGTTTAATTTCATAATTTTTGATGAATGTC
ATGTTTTAGATCCTTCTGCAATGGCATTTCGTAGCTTGTTAAGTGTGTATCACCAAACATGCAAAGTGTTAAAGGTGTCA
GCCACTCCAGTGGGAAGGGAGGTCGAGTTCACAACACAACAACCAGTTAAATTGGTGGTTGAGGATACACTTTCATTCCA
ATCTTTTGTTGATGCGCAAGGCTCAAAAACCAATGCTGACGTAGTTCAGCATGGTTCGAACATACTCGTGTATGTGTCGA
GTTACAATGAAGTGGATACATTAGCCAAGCTTCTAACAGATAGGAATATGATAGTCTCAAAAGTTGATGGCAGAACAATG
AAGCACGGATGCTTAGAAATTGTAACGAAAGGGACTAGTGCAAAGCCACATTTTGTCGTAGCAACCAACATTATTGAAAA
TGGAGTAACTTTAGATATAGATGTAGTTGTAGATTTTGGGCTTAAAGTCTCACCGTTTTTAGATATTGACAATAGGAGCA
TAGCATACAATAAGATTAGTGTTAGCTATGGAGAAAGAATTCAGAGGTTGGGCCGTGTTGGGCGCTTTAAGAAGGGAGTG
GCATTGCGTATTGGACACACCGAAAAGGGAATTATTGAGATTCCAAGTATGATTGCTAGTGAAGCTGCGCTTGCGTGCTT
TGCATACAATTTGCCAGTAATGACAGGGGGTGTTTCAACTAGCCTCATTGGCAATTGTACTGTTCGTCAAGTTAAAACTA
TGCAACAATTTGAGCTGAGTCCATTCTTTATACAAAATTTTGTTGCCCATGATGGATCAATGCATCCTGTCATACATGAC
ATTCTTAAGAAGTATAAACTGCGAGATTGTATGACGCCCTTGTGTGATCAATCCATACCTTACAGAGCCTCAAGCACTTG
GTTGTCTGTTAGTGAGTACGAACGACTCGGAGTGGTTTTGGACATTCCAAAACAGATCAAGATTGCATTCCACATCAAGG
ATATCCCTCCTAAGTTGCATGAAATGCTTTGGGAAACAGTTATCAAATATAAGGATGTTTGTTTGTTTCCAAGTATTCGG
GCTTCATCCATTAGCAAAATTGCATACACACTGCGCACTGATCTTTTTGCAATTCCCAGAACCCTAATTCTAGTTGAAAG
ATTGCTCGAGGAGGAACGAGTGAAACAGAGTCAATTCAGAAGTCTCATTGATGAAGGATGCTCAAGCATGTTTTCAATTG
TTAATTTAACAAACACTCTTAGAGCTAGATATGCAAAGGATTACACTGCAGAAAACATACAGAAGCTCGAGAAAGTGAGG
AGTCAGTTAAAGGAGTTCTCAAATTTAAATGGCTCTGCATGCGAGGAGAACTTAATGAAGAGGTATGAATCTCTACAGTT
TGTGCATCATCAAGCAACAACTGCACTCGCAAAGGATTTGAAGTTGAAAGGAGTTTGGAAGAAGTCATTAGTTGTGCAGG
ACTTAATCATAGCGGGTGCCGTTGCTATTGGTGGAATAGGGCTCATCTATAGTTGGTTTACTCAATCAGTTGAAACTGTG
TCTCACCAGGGCAAGAACAAATCCAAAAGAATTCAAGCATTGAAGTTTCGACACGCCCGCGATAAGAGGGCTGGTTTTGA
AATTGATAACAATGATGATACAATAGAAGAATTCTTTGGATCTGCATACAGGAAGAAGGGAAAAGGTAAAGGCACCACTG
TTGGTATGGGCAAGTCAAGCAGGAGGTTTGTTAATATGTATGGATTTGACCCAACAGAATATTCATTCATCCAGTTCGTT
GATCCGCTCACTGGAGCTCAAATTGAAGAGAACGTCTATGCTGATATTAGAGACATCCAAGAGCGCTTTAGTGATGTCCG
CAAGAAAATGGTAGAGGATGATGAAATCGAATTGCAAGCATTGGGCAGCAACACAATCATTCATGCTTACTTCAGGAAGG
ATTGGTCTGACAAGGCTCTAAAAATTGATTTGATGCCACACAACCCACTCAAAATCTGTGATAAATCGAATGGCATTGCT
AAGTTTCCTGAAAGAGAACTTGAGTTGAGGCAAACTGGGCCAGCAACAGAGGTTGATGTGAAAGACATTCCAAAACAGGA
AGTGGAGCATGAAGCCAAATCACTCATGAGAGGTTTAAGGGATTTCAATCCAATTGCTCAAACAGTTTGCAGAGTAAAAG
TGTCTGTTGAATATGGAACGTCTGAAATGTATGGGTTCGGTTTTGGTGCGTATATTATAGTAAACCACCATCTATTCAAG
AGTTTCAATGGATCCATGGAAGTGCGATCAATGCATGGAACATTCAGAGTGAAGAATTTGCATAGCCTGAGCGTTTTACC
GATCAAAGGCAGAGACATTATCATCATAAAGATGCCAAAGGATTTCCCTGTTTTCCCACAAAAACTGCACTTCCGAGCTC
CAGTGCAGAATGAGAGGATTTGTTTGGTTGGAACTAATTTTCAAGAAAAACATGCATCATCAATCATCACAGAAACGAGT
ACTACATACAATGTACCGGGCAGCACTTTTTGGAAGCATTGGATTGAAACAAATGATGGGCATTGTGGATTACCAGTAGT
GAGTACAGCTGATGGATGTCTAGTTGGAATACACAGCTTGGCGAATAATGTGCAAACCACGAATTATTATTCAGCCTTTG
ATGAGGATTTTGAAAGTAAGTATCTCCGAACTGATGAGCATAATGAGTGGACCAAATCGTGGGTATATAACCCAGATACT
GTGTTGTGGGGTCCATTGAAGCTCAAAGAGAGTACCCCTAAAGGCCTGTTTAAGACAACAAAACTTGTACAGGATTTAAT
TGATCATGATGTTGTTGTAGAGCAAGCTAAACATTCTGCGTGGATGTATGAGGCTCTAACAGGGAATTTGCAAGCTGTGG
CGACAATGAAGAGTCAGCTAGTGACAAAGCACGTGGTCAAAGGGGAGTGTCGGCACTTCAAAGAGTTCTTAACTGTGGAT
TCGGAAGCAGAAGCTTTCTTCAGGCCTTTGATGGATGCTTATGGGAAGAGCTTGTTAAATAGAGAAGCATATATAAAGGA
CATAATGAAATACTCAAAGCCTATTGATGTTGGAATAGTAGACTGTGATGCTTTTGAAGAGGCTATCAATAGGGTTATCA
TTTATCTGCAAGTACATGGCTTCCAGAAATGCAATTACATCACCGATGAGCAGGAAATTTTCAAAGCTCTCAATATGAAA
GCTGCTGTCGGGGCTATGTATGGAGGCAAGAAGAAAGACTACTTCGAGCATTTTACTGAGGCGGATAAAGAGGAAATTGT
TATGCAAAGTTGCTTACGATTGTACAAGGGCTCACTTGGCATATGGAATGGATCATTGAAAGCAGAACTTCGGTGCAAAG
AGAAGATACTTGCAAATAAGACAAGGACATTCACTGCTGCACCTTTAGATACTCTACTGGGTGGGAAGGTGTGCGTTGAT
GATTTTAATAATCAATTCTACTCAAAGAACATTGAATGCTGCTGGACTGTTGGAATGACTAAGTTTTATGGAGGTTGGGA
CAAATTGCTTCGGCGTCTACCTGAAAATTGGGTGTACTGCGATGCCGATGGTTCACAATTCGATAGTTCACTCACCCCAT
ACCTAATTAATGCTGTTCTCATCATCAGAAGCACATACATGGAAGATTGGGACTTGGGGTTGCAAATGTTGCGCAATTTG
TACACAGAAATAATTTACACACCAATCTCAACTCCAGATGGAACAATTGTCAAGAAGTTTAGAGGTAATAATAGCGGTCA
ACCTTCTACCGTTGTGGATAATTCTCTCATGGTTGTTCTTGCTATGCATTACGCTCTCATTAAGGAGTGCGTTGAGTTTG
AAGAAATCGACAGCACGTGTGTATTCTTTGTTAATGGTGATGACTTATTGATTGCTGTGAATCCGGAGAAAGAGAGCATT
CTCGATAGAATGTCACAACATTTCTCAGATCTTGGTTTGAACTATGATTTTTCGTCGAGAACAAGAAGGAAGGAGGAATT
GTGGTTCATGTCCCATAGAGGCCTGCTAATTGAGGGTATGTACGTGCCAAAGCTTGAAGAAGAGAGAATTGTATCCATTC
TGCAATGGGATAGGGCTGATCTGCCAGAGCACAGATTAGAAGCGATTTGTGCAGCAATGATAGAATCCTGGGGTTATTTT
GAGTTAACGCACCAAATTAGGAGATTCTACTCATGGTTGTTACAACAGCAACCTTTTTCAACGATAGCACAGGAAGGAAA
AGCTCCATACATAGCGAGCATGGCATTGAAGAAGCTGTACATGAATAGGACAGTAGATGAGGAGGAACTGAAGGCTTTCA
CTGAAATGATGGTTGCCTTGGATGACGAATTTGAGTGCGATACTTATGAAGTGCACCATCAAGGAAATGACACAATCGAT
GCAGGAGGAAGCACTAAGAAGGATGCAAAACAAGAGCAAGGTAGCATTCAACCAAATCTCAACAAGGAAAAGGAAAAGGA
CGTGAATGTTGGAACATCTGGAACTCATACTGTGCCACGAATTAAAGCTATCACGTCCAAAATGAGAATGCCTAAGAGTA
AAGGTGCAACTGTACTAAATTTGGAACACTTACTCGAGTATGCTCCACAGCAAATTGACATCTCAAATACTCGAGCAACT
CAATCACAGTTTGATACGTGGTATGAAGCAGTACAACTTGCATACGACATAGGAGAAACTGAAATGCCAACTGTGATGAA
TGGGCTTATGGTTTGGTGCATTGAAAATGGAACCTCGCCAAACATCAACGGAGTTTGGGTTATGATGGATGGAGATGAAC
AAGTCGAATACCCACTGAAACCAATCGTTGAGAATGCAAAACCAACACTTAGGCAAATCATGGCACATTTCTCAGATGTT
GCAGAAGCGTATATAGAAATGCGCAACAAAAAGGAACCATATATGCCACGATATGGTTTAGTTCGTAATCTGCGCGATGG
AAGTCTGGCTCGCTATGCTTTTGACTTTTATGAAGTTACATCACGGACACCAGTGAGGGCTAGAGAGGCACACATTCAAA
TGAAGGCCGCAGCTTTAAAATCAGCTCAATCTCGACTTTTCGGATTGGATGGTGGCATTAGTACACAAGAGGAAAACACA
GAGAGGCACACCACCGAGGATGTTTCTCCAAGTATGCATACTCTACTTGGAGTGAAGAACATGTGA
>CF_YL21
ATGGCAATCTACACGTCAACAATCTGTTTCGGTTCGTTTGAATGCAAGCTACCATACTCACCCGCCTCTTGCGGGCATAT
TGTGAAGGAACGAGAAGTGCTGGCTTCCATCGATCCTTTCGCAGATCTGGAAACACAACTTAGTGCACGATTGCTCAAGC
AAGAATATGCTACTGTTCGTGTGCTCAAGAACGGTACTCTTACGTACCGATACAAGACTGATGCCCAGATAACGCGCATC
CAGAAGAGACTGGAAAGGAAGGATAGGGAAGAATACCACTTCCAGATGGCAGCTCCTAGTATTGTGTCAAAGATCACTAT
TGCTGGTGGAGAGCCACCTTCAAAACTTGAATCACAAGTGCGGAGGGGTGTCATCCACACAACTCCAAGGATGCGCACAG
CAAAAACATATCACACGCCAAAGCTGACAGAGGGACAAATGAACCACCTTATCAAGCAGGTGAAGCAAATTATGTCAACC
AAAGGAGGGTCTGTTCAACTGATTAGCAAGAAAAGTACCCATGTTCACTATAAAGAAGTTTTGGGATCACATCGCGCAGT
TGTTTGCACTGCACATATGAGAGGTTTACGAAAGAGAGTGGACTTTCGGTGTGATAAATGGACCGTTGTGCGTCTACAGC
ATCTCGCCAGGACGGATAAGTGGAATAACCAAGTTCGTGTTACTGATCTACGCAAGGGCGATAGTGGAGTTATATTGAGT
AATACTAATCTCAAAGGAAACTTTGGGAGAAGCTCGGAGGGCATATTCATAGTGCGTGGGTCGCACGAAGGAAAAATCTA
TGATGCACGTTCCAAGGTTACTCAAGGGGTTATGGATTCAATGGTTCAGTTCTCAAGCGCTGAAAGCTTTTGGAAGGGAT
TGGACGGCAATTGGGCACAAATGAGATATCCTACAGATCATACATGTGTGGCAGGCTTACCAGTTGAAGACTGTGGCAGA
GTTGCAGCGATAATGACACACAGTATTTTACCGTGCTATAAGATAACCTGCCCTACCTGTGCCCAACAATATGCCAACTT
GCCAGCCAGTGACTTACTTAAGATATTACACAAGCACGCAAGTGATGGTTTAAATCGATTGGGGGCAGACAAAGATCGCT
TTGTGCATGTCAAAAAGTTCTTGACAATCTTAGAGCACTTAACTGAACCGGTTGATCTGAGTCTAGAAATTTTCAATGAA
GTATTCAAGTCTATAGGGGAGAAGCAACAATCACCTTTCAAAAACCTGAATATTCTGAATAATTTCTTTTTAAAAGGAAA
GGAAAATACAGCTCGTGAATGGCAGGTGGCTCAATTAAGCTTACTTGAATTGGCAAGATTCCAAAAGAACAGAACGGATA
ATATCAAGAAAGTAGACATCTCGTTCTTTAGGAAT
Condensed preview of 108 files, each showing path, character count, and a content snippet (the full structured content is 322K chars).
[
  {
    "path": ".Rbuildignore",
    "chars": 124,
    "preview": "^.*\\.Rproj$\n^\\.Rproj\\.user$\nMakefile\nREADME.md\nREADME_files\nREADME.Rmd\n^_pkgdown\\.yml$\n^docs$\n^pkgdown$\nlogo.png\nCONDUCT"
  },
  {
    "path": ".gitignore",
    "chars": 103,
    "preview": ".Rproj.user\n.Rhistory\n.RData\n.Renviron\n.DS_Store\ninst/doc\nggmsa.Rproj\nggmsa.Rcheck\n.git\ndocs/\npkgdown/\n"
  },
  {
    "path": "CONDUCT.md",
    "chars": 1389,
    "preview": "# Contributor Code of Conduct\n\nAs contributors and maintainers of this project, we pledge to respect all people who \ncon"
  },
  {
    "path": "DESCRIPTION",
    "chars": 1876,
    "preview": "Package: ggmsa\nTitle: Plot Multiple Sequence Alignment using 'ggplot2'\nVersion: 1.19.0\nAuthors@R: c(person(\"Guangchuang\""
  },
  {
    "path": "Makefile",
    "chars": 1409,
    "preview": "PKGNAME := $(shell sed -n \"s/Package: *\\([^ ]*\\)/\\1/p\" DESCRIPTION)\nPKGVERS := $(shell sed -n \"s/Version: *\\([^ ]*\\)/\\1/"
  },
  {
    "path": "NAMESPACE",
    "chars": 3555,
    "preview": "# Generated by roxygen2: do not edit by hand\n\nS3method(diff,SeqDiff)\nS3method(ggplot_add,GCcontent)\nS3method(ggplot_add,"
  },
  {
    "path": "NEWS.md",
    "chars": 4861,
    "preview": "# ggmsa 1.18.0\n\n+ Bioconductor RELEASE_3_23 (2026-04-29, Wed)\n\n# ggmsa 1.16.0\n\n+ Bioconductor RELEASE_3_22 (2025-11-01, "
  },
  {
    "path": "R/AllClasses.R",
    "chars": 292,
    "preview": "setClass(\"SeqDiff\",\n         representation = representation(\n                          file = \"character\",\n            "
  },
  {
    "path": "R/SeqBundles.R",
    "chars": 5438,
    "preview": "##'  plot Sequence Bundles for MSA based 'ggolot2'\n##'\n##'\n##' @title ggSeqBundle\n##' @importFrom ggfun geom_xspline\n##'"
  },
  {
    "path": "R/ancestor_seq.R",
    "chars": 4462,
    "preview": "##' plot Tree-MSA plot\n##'\n##'\n##' 'treeMSA_plot()' automatically re-arranges the MSA data according to \n##' the tree st"
  },
  {
    "path": "R/arc.R",
    "chars": 9728,
    "preview": "##'  Plots nucleltide secondary structure as helices in arc diagram\n##'\n##' @title gghelix\n##' @param helix_data a data "
  },
  {
    "path": "R/available.R",
    "chars": 1807,
    "preview": "##' This function lists font families currently available \n##' that can be used by 'ggmsa'\n##'\n##'\n##' @title List Font "
  },
  {
    "path": "R/clustal.R",
    "chars": 1582,
    "preview": "##'  A color scheme of Culstal. The algorithm to assign colors\n##'   for Multiple Sequence.\n##'\n##' @param y sequence al"
  },
  {
    "path": "R/color_by_conservation.R",
    "chars": 900,
    "preview": "color_increment <- function(conservation_visibility){\n    lapply(seq_len(nrow(conservation_visibility)), function(i){\n  "
  },
  {
    "path": "R/color_else.R",
    "chars": 1269,
    "preview": "##'  Assigning colors to sequence alignment.\r\n##'\r\n##'\r\n##' @param y sequence alignment with data frame, generated by ti"
  },
  {
    "path": "R/cons.R",
    "chars": 4118,
    "preview": "##' cleaning the needless sequences' color according to the \r\n##' consensus sequence (only used in the consensus views)."
  },
  {
    "path": "R/data.R",
    "chars": 3173,
    "preview": "#' A sample data used in ggmsa\n#'\n#' A dataset containing the alignment sequences of \n#' the phenylalanine hydroxylase p"
  },
  {
    "path": "R/dms.R",
    "chars": 749,
    "preview": "##' assign dms value to alignments.\n##'\n##' @title assign_dms\n##' @param x data frame from tidy_msa()\n##' @param dms dms"
  },
  {
    "path": "R/facet_msa.R",
    "chars": 1031,
    "preview": "##' The MSA would be plot in a field that you set.\n\n##' @title segment MSA\n##' @param field a numeric vector of the fiel"
  },
  {
    "path": "R/geom_GC.R",
    "chars": 2072,
    "preview": "##' Multiple sequence alignment layer for ggplot2. It plot points of GC content.\n\n##' @title geom_GC\n##' @param show.leg"
  },
  {
    "path": "R/geom_asterisk.R",
    "chars": 2894,
    "preview": "##' a ggplot2 layer of asterisk as a polygon\n##'\n##'\n##' @title a ggplot2 layer of asterisk as a polygon\n##' @param mapp"
  },
  {
    "path": "R/geom_msa.R",
    "chars": 7621,
    "preview": "##' Multiple sequence alignment layer for ggplot2. \r\n##' It creates background tiles with/without sequence characters.\r\n"
  },
  {
    "path": "R/geom_msaBar.R",
    "chars": 1734,
    "preview": "##' Multiple sequence alignment layer for ggplot2.\n##'  It plot sequence conservation bar.\n\n##' @title geom_msaBar\n\n##' "
  },
  {
    "path": "R/geom_seed.R",
    "chars": 2836,
    "preview": "##' Highlighting the seed in miRNA sequences\r\n##'\r\n##'\r\n##' @title geom_seed\r\n##' @param seed a character string.Specify"
  },
  {
    "path": "R/ggmaf.R",
    "chars": 5783,
    "preview": "##' plot MAF\n##'\n##' @title ggmaf \n##' @param data a tidy MAF data frame.You can get it by tidy_maf_df() \n##' @param ref"
  },
  {
    "path": "R/ggmsa.R",
    "chars": 5334,
    "preview": "##' Plot multiple sequence alignment using ggplot2 with multiple color schemes \r\n##' supported.\r\n##'\r\n##'\r\n##' @title gg"
  },
  {
    "path": "R/import-functions.R",
    "chars": 201,
    "preview": "##' @importFrom utils globalVariables\nglobalVariables(\".\")\nglobalVariables(\"fre\") #geom_GC.R:\nglobalVariables(\"read.deli"
  },
  {
    "path": "R/method-plot.R",
    "chars": 3664,
    "preview": "##' plot method for SeqDiff object\n##'\n##' @name plot\n##' @rdname plot-methods\n##' @exportMethod plot\n##' @aliases plot,"
  },
  {
    "path": "R/method-show.R",
    "chars": 938,
    "preview": "##' show method\n##'\n##'\n##' @name show\n##' @docType methods\n##' @rdname show-methods\n##' @title show method\n##' @param o"
  },
  {
    "path": "R/methods-diff.R",
    "chars": 87,
    "preview": "##' @method diff SeqDiff\n##' @export\ndiff.SeqDiff <- function(x, ...) {\n    x@diff\n}\n\n\n"
  },
  {
    "path": "R/methods-ggplot_add.R",
    "chars": 4414,
    "preview": "##' @method ggplot_add seqlogo\r\n##' @export\r\nggplot_add.seqlogo <- function(object, plot, object_name) {\r\n    msaData <-"
  },
  {
    "path": "R/msa_data.R",
    "chars": 10076,
    "preview": "##' This function parses FASTA files or other sequence objects. \r\n##' And assign color to each molecule (amino acid or n"
  },
  {
    "path": "R/pp_interactive.R",
    "chars": 3716,
    "preview": "\nmake_gap <- function(gap, previous_seq) {\n    gap_df <- previous_seq[rep(1, each=gap),] \n    gap_start <- max(previous_"
  },
  {
    "path": "R/prepare_fasta.R",
    "chars": 2107,
    "preview": "##' preparing multiple sequence alignment\n##'\n##' This function supports both NT or AA sequences; It supports multiple \n"
  },
  {
    "path": "R/read_maf.R",
    "chars": 2034,
    "preview": "##' read 'multiple alignment format'(MAF) file\n##'\n##' @title read_maf\n##' @param multiple_alignment_format a multiple a"
  },
  {
    "path": "R/seqdiff.R",
    "chars": 2230,
    "preview": "\n##' calculate difference of two aligned sequences\n##'\n##'\n##' @title seqdiff\n##' @param fasta fasta file\n##' @param ref"
  },
  {
    "path": "R/seqlogo.R",
    "chars": 7473,
    "preview": "##' plot sequence logo for MSA based 'ggolot2'\n\n##' @title seqlogo\n##' @param msa Multiple sequence alignment file or ob"
  },
  {
    "path": "R/simplot.R",
    "chars": 3909,
    "preview": "##' Sequence similarity plot\n##'\n##'\n##' @title simplot\n##' @param file alignment fast file\n##' @param query query seque"
  },
  {
    "path": "R/theme_msa.R",
    "chars": 2068,
    "preview": "##' Theme for ggmsa.\n##'\n##' @title theme_msa\n##' @importFrom ggplot2 theme_minimal\n##' @importFrom ggplot2 labs\n##' @ex"
  },
  {
    "path": "R/zzz.R",
    "chars": 956,
    "preview": "#' @importFrom utils packageDescription\n.onAttach <- function(libname, pkgname){\n    #options(total_heigh = 4)\n    optio"
  },
  {
    "path": "README.Rmd",
    "chars": 3304,
    "preview": "---\noutput: \n  md_document:\n    variant: gfm\nhtml_preview: TRUE\n---\n<!-- README.md is generated from README.Rmd. Please "
  },
  {
    "path": "README.md",
    "chars": 3226,
    "preview": "<!-- README.md is generated from README.Rmd. Please edit that file -->\n\n# ggmsa:a visual exploration tool for multiple s"
  },
  {
    "path": "inst/CITATION",
    "chars": 1857,
    "preview": "citHeader(\"To cite ggmsa in publications use:\")\n\ncitEntry(\n    entry  = \"book\",\n    title = \"Data Integration, Manipulat"
  },
  {
    "path": "inst/extdata/GVariation/A.Mont.fas",
    "chars": 18388,
    "preview": ">Mont\nATGGCAACTTACACATCAACAATCCAGTTTGGTTCCATTGAATGCAAACTTCCATACTCACCCGCTCCTTTTGGGCTAGTTGCGGGGAAACGAGAAGTTTTAACCACCACTGAC"
  },
  {
    "path": "inst/extdata/GVariation/B.Oz.fas",
    "chars": 18386,
    "preview": ">Oz\nATGGCAACTTACATGTCAACAATCTGTTTCGGTTCGTTTGAATGCAAGCTACCATACTCACCCGCTTCTTGCGGGCATATTGTGAAGGAGCGAGAAGTGCTGGCTTCCGTTGATCC"
  },
  {
    "path": "inst/extdata/GVariation/C.Wilga5.fas",
    "chars": 18390,
    "preview": ">Wilga5\nATGGCAACTTACATGTCAACAATCTGTTTCGGTTCGTTTGAATGCAAGCTACCATACTCACCCGCCTCTTGCGGGCATATTGTGAAGGAACGAGAAGTGCTGGCTTCCGTTG"
  },
  {
    "path": "inst/extdata/GVariation/sample_alignment.fa",
    "chars": 37231,
    "preview": ">Mont\nATGGCAACTTACACATCAACAATCCAGTTTGGTTCCATTGAATGCAAACTTCCATACTCACCCGCTCCTTTTGGGCTAGT\nTGCGGGGAAACGAGAAGTTTTAACCACCACTGA"
  },
  {
    "path": "inst/extdata/Gram-negative_AKL.fasta",
    "chars": 6900,
    "preview": ">Random_Gram-negative_AKL_gjtez\nRWTHLASGRTYNYKFNPPKQYGKDDITGEDLIQRED\n>Random_Gram-negative_AKL_dibhu\nRWTHLNSGRTYHYKFNPPK"
  },
  {
    "path": "inst/extdata/Gram-positive_AKL.fasta",
    "chars": 6900,
    "preview": ">Random_Gram-positive_AKL_pjxgp\nRRTCVGCGTAFNYVMEPPKKEGICDACGGKLVVRDD\n>Random_Gram-positive_AKL_essyp\nRRTCVGCGTAFNYVMEPPK"
  },
  {
    "path": "inst/extdata/LeaderRepeat_All.fa",
    "chars": 1677,
    "preview": ">Ain_RyC-MR95\nATCCGTTGATCAAATTTGAGGTTTGAGAGATATGTAAATTCAAAGGATAATCAAAC\n>Asp_D21\nATCCGTTGATCAAATTTGAGGTTTGAGAGATATGTAAATT"
  },
  {
    "path": "inst/extdata/Rfam/RF00458.fasta",
    "chars": 1630,
    "preview": ">AF178440.1/5925-6123\nUUGACUAUGUGAUCUUGCUUUCG----UAAUAAAAUUCUGUACAUAAAAGUCGAAAGUAUUGCUAUAGUUAAGGUUGCGCUUGCCUAUUUAGGCAUAC"
  },
  {
    "path": "inst/extdata/Rfam/RF03120.fasta",
    "chars": 6212,
    "preview": ">KU973692.1/1-298\nAUAUUAGGUUUUUACCUACCCAGGAA--AAGCCAACCAA-CCUCGAUCUCUUGUAGAUCUGUUCUCUAAACGA-ACUUUAAAA------UCUGUGUAGCUGU"
  },
  {
    "path": "inst/extdata/Rfam/RF03120_SS.txt",
    "chars": 318,
    "preview": ">RF03120\n......<<<<<<<.<<<....>>>>>..>>>>>...........<<<<<.....>>>>>.<<<<.......>>.>>..............<<<<<<<<.<<.<<<<.<<<."
  },
  {
    "path": "inst/extdata/sample.fasta",
    "chars": 4377,
    "preview": ">PH4H_Rattus_norvegicus\nMAAVVLENGVLSRKLSDFGQETSYIEDNSNQNGAISLIFSLKEEVGALAKVLRLFEENDINLTHIESRPSRLNKDEYEFF\nTYLDKRTKPVLGSII"
  },
  {
    "path": "inst/extdata/seedSample.fa",
    "chars": 444,
    "preview": ">hsa-let-7a-5p MIMAT0000062 Homo sapiens let-7a-5p\nUGAGGUAGUAGGUUGUAUAGUU\n>hsa-let-7b-5p MIMAT0000063 Homo sapiens let-7"
  },
  {
    "path": "inst/extdata/sequence-link-tree.fasta",
    "chars": 14113,
    "preview": ">Phy000B0HV_NEUCR\nM-----GIGSATLG-----------------------------------SRIPTPVLVARAVVSSSDGK-----DC--VA\nNPNLCEKP-VGGSQLTVPIVL"
  },
  {
    "path": "inst/extdata/tp53.fa",
    "chars": 2099,
    "preview": ">Homo_sapiens\n----MDDLMLSP-------DDIEQWFTED-----------------PGPDEAPRMPEAAPPVAPAPA---------APTPAAPAPAPSWPLSSSVPSQKTYQGSYG"
  },
  {
    "path": "man/GVariation.Rd",
    "chars": 688,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/data.R\n\\docType{data}\n\\name{GVariation}\n\\a"
  },
  {
    "path": "man/Gram-negative_AKL.fasta.Rd",
    "chars": 433,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/data.R\n\\docType{data}\n\\name{Gram-negative_"
  },
  {
    "path": "man/Gram-positive_AKL.fasta.Rd",
    "chars": 433,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/data.R\n\\docType{data}\n\\name{Gram-positive_"
  },
  {
    "path": "man/LeaderRepeat_All.fa.Rd",
    "chars": 314,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/data.R\n\\docType{data}\n\\name{LeaderRepeat_A"
  },
  {
    "path": "man/Rfam.Rd",
    "chars": 609,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/data.R\n\\docType{data}\n\\name{Rfam}\n\\alias{R"
  },
  {
    "path": "man/TP53_genes.xlsx.Rd",
    "chars": 284,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/data.R\n\\docType{data}\n\\name{TP53_genes.xls"
  },
  {
    "path": "man/adjust_ally.Rd",
    "chars": 496,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/ancestor_seq.R\n\\name{adjust_ally}\n\\alias{a"
  },
  {
    "path": "man/assign_dms.Rd",
    "chars": 332,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/dms.R\n\\name{assign_dms}\n\\alias{assign_dms}"
  },
  {
    "path": "man/available_colors.Rd",
    "chars": 423,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/available.R\n\\name{available_colors}\n\\alias"
  },
  {
    "path": "man/available_fonts.Rd",
    "chars": 423,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/available.R\n\\name{available_fonts}\n\\alias{"
  },
  {
    "path": "man/available_msa.Rd",
    "chars": 401,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/available.R\n\\name{available_msa}\n\\alias{av"
  },
  {
    "path": "man/extract_seq.Rd",
    "chars": 411,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/ancestor_seq.R\n\\name{extract_seq}\n\\alias{e"
  },
  {
    "path": "man/facet_msa.Rd",
    "chars": 619,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/facet_msa.R\n\\name{facet_msa}\n\\alias{facet_"
  },
  {
    "path": "man/geom_GC.Rd",
    "chars": 553,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/geom_GC.R\n\\name{geom_GC}\n\\alias{geom_GC}\n\\"
  },
  {
    "path": "man/geom_helix.Rd",
    "chars": 1259,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/arc.R\n\\name{geom_helix}\n\\alias{geom_helix}"
  },
  {
    "path": "man/geom_msa.Rd",
    "chars": 3399,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/geom_msa.R\n\\name{geom_msa}\n\\alias{geom_msa"
  },
  {
    "path": "man/geom_msaBar.Rd",
    "chars": 504,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/geom_msaBar.R\n\\name{geom_msaBar}\n\\alias{ge"
  },
  {
    "path": "man/geom_seed.Rd",
    "chars": 809,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/geom_seed.R\n\\name{geom_seed}\n\\alias{geom_s"
  },
  {
    "path": "man/geom_seqlogo.Rd",
    "chars": 1457,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/seqlogo.R\n\\name{geom_seqlogo}\n\\alias{geom_"
  },
  {
    "path": "man/ggSeqBundle.Rd",
    "chars": 1877,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/SeqBundles.R\n\\name{ggSeqBundle}\n\\alias{ggS"
  },
  {
    "path": "man/gghelix.Rd",
    "chars": 1094,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/arc.R\n\\name{gghelix}\n\\alias{gghelix}\n\\titl"
  },
  {
    "path": "man/ggmaf.Rd",
    "chars": 925,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/ggmaf.R\n\\name{ggmaf}\n\\alias{ggmaf}\n\\title{"
  },
  {
    "path": "man/ggmsa.Rd",
    "chars": 3566,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/ggmsa.R\n\\name{ggmsa}\n\\alias{ggmsa}\n\\title{"
  },
  {
    "path": "man/merge_seq.Rd",
    "chars": 473,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/pp_interactive.R\n\\name{merge_seq}\n\\alias{m"
  },
  {
    "path": "man/plot-methods.Rd",
    "chars": 984,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/method-plot.R\n\\docType{methods}\n\\name{plot"
  },
  {
    "path": "man/readSSfile.Rd",
    "chars": 532,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/arc.R\n\\name{readSSfile}\n\\alias{readSSfile}"
  },
  {
    "path": "man/read_maf.Rd",
    "chars": 372,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/read_maf.R\n\\name{read_maf}\n\\alias{read_maf"
  },
  {
    "path": "man/reset_pos.Rd",
    "chars": 291,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/pp_interactive.R\n\\name{reset_pos}\n\\alias{r"
  },
  {
    "path": "man/sample.fasta.Rd",
    "chars": 386,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/data.R\n\\docType{data}\n\\name{sample.fasta}\n"
  },
  {
    "path": "man/seedSample.fa.Rd",
    "chars": 386,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/data.R\n\\docType{data}\n\\name{seedSample.fa}"
  },
  {
    "path": "man/seqdiff.Rd",
    "chars": 553,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/seqdiff.R\n\\name{seqdiff}\n\\alias{seqdiff}\n\\"
  },
  {
    "path": "man/seqlogo.Rd",
    "chars": 1569,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/seqlogo.R\n\\name{seqlogo}\n\\alias{seqlogo}\n\\"
  },
  {
    "path": "man/sequence-link-tree.fasta.Rd",
    "chars": 347,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/data.R\n\\docType{data}\n\\name{sequence-link-"
  },
  {
    "path": "man/show-methods.Rd",
    "chars": 492,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/method-show.R\n\\docType{methods}\n\\name{show"
  },
  {
    "path": "man/simplify_hdata.Rd",
    "chars": 371,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/pp_interactive.R\n\\name{simplify_hdata}\n\\al"
  },
  {
    "path": "man/simplot.Rd",
    "chars": 1319,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/simplot.R\n\\name{simplot}\n\\alias{simplot}\n\\"
  },
  {
    "path": "man/theme_msa.Rd",
    "chars": 219,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/theme_msa.R\n\\name{theme_msa}\n\\alias{theme_"
  },
  {
    "path": "man/tidy_hdata.Rd",
    "chars": 487,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/pp_interactive.R\n\\name{tidy_hdata}\n\\alias{"
  },
  {
    "path": "man/tidy_maf_df.Rd",
    "chars": 421,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/ggmaf.R\n\\name{tidy_maf_df}\n\\alias{tidy_maf"
  },
  {
    "path": "man/tidy_msa.Rd",
    "chars": 770,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/msa_data.R\n\\name{tidy_msa}\n\\alias{tidy_msa"
  },
  {
    "path": "man/tp53.fa.Rd",
    "chars": 300,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/data.R\n\\docType{data}\n\\name{tp53.fa}\n\\alia"
  },
  {
    "path": "man/treeMSA_plot.Rd",
    "chars": 1440,
    "preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/ancestor_seq.R\n\\name{treeMSA_plot}\n\\alias{"
  },
  {
    "path": "tests/testthat/test-main.R",
    "chars": 304,
    "preview": "library(ggmsa)\nlibrary(ggplot2)\n\n\ntest_that(\"check whether `ggmsa` create a `ggplot` object\", {\n    p <- ggmsa(msa = sys"
  },
  {
    "path": "tests/testthat/test-msa_data.R",
    "chars": 883,
    "preview": "\n\nlibrary(ggmsa)\n\nmsa <- system.file(\"extdata\", \"sample.fasta\", package = \"ggmsa\")\ntidymsa <- tidy_msa(msa, 10, 20)\n\n\nte"
  },
  {
    "path": "tests/testthat/test-tidy_msa.R",
    "chars": 1208,
    "preview": "\n\nlibrary(ggmsa)\nlibrary(Biostrings)\n\nmsa <- system.file(\"extdata\", \"sample.fasta\", package = \"ggmsa\")\ntidy_names <- c(\""
  },
  {
    "path": "tests/testthat.R",
    "chars": 54,
    "preview": "library(testthat)\nlibrary(ggmsa)\n\ntest_check(\"ggmsa\")\n"
  },
  {
    "path": "vignettes/.gitignore",
    "chars": 98,
    "preview": "Annotations.Rmd\nColor_schemes_And_Font_Families.Rmd\nMSA_theme.Rmd\nOther_Modules.Rmd\nView_modes.Rmd"
  },
  {
    "path": "vignettes/ggmsa.Rmd",
    "chars": 4967,
    "preview": "---\ntitle: \"ggmsa-Getting Started\"\nauthor: \"GuangChuang Yu and Lang Zhou\"\noutput:\n  prettydoc::html_pretty:\n    toc: fal"
  },
  {
    "path": "vignettes/ggmsa.bib",
    "chars": 1272,
    "preview": "@article{Taylor1997Residual,\n         title={Residual colours: a proposal for aminochromography.},\n         author={Tayl"
  }
]

// ... and 2 more files (download for full content)

