Repository: YuLab-SMU/ggmsa
Branch: devel
Commit: 956078ed388a
Files: 108
Total size: 300.1 KB
Directory structure:
gitextract_gj8qs7tf/
├── .Rbuildignore
├── .gitignore
├── CONDUCT.md
├── DESCRIPTION
├── Makefile
├── NAMESPACE
├── NEWS.md
├── R/
│ ├── AllClasses.R
│ ├── SeqBundles.R
│ ├── ancestor_seq.R
│ ├── arc.R
│ ├── available.R
│ ├── clustal.R
│ ├── color_by_conservation.R
│ ├── color_else.R
│ ├── cons.R
│ ├── data.R
│ ├── dms.R
│ ├── facet_msa.R
│ ├── geom_GC.R
│ ├── geom_asterisk.R
│ ├── geom_msa.R
│ ├── geom_msaBar.R
│ ├── geom_seed.R
│ ├── ggmaf.R
│ ├── ggmsa.R
│ ├── import-functions.R
│ ├── method-plot.R
│ ├── method-show.R
│ ├── methods-diff.R
│ ├── methods-ggplot_add.R
│ ├── msa_data.R
│ ├── pp_interactive.R
│ ├── prepare_fasta.R
│ ├── read_maf.R
│ ├── seqdiff.R
│ ├── seqlogo.R
│ ├── simplot.R
│ ├── sysdata.rda
│ ├── theme_msa.R
│ └── zzz.R
├── README.Rmd
├── README.md
├── inst/
│ ├── CITATION
│ └── extdata/
│ ├── GVariation/
│ │ ├── A.Mont.fas
│ │ ├── B.Oz.fas
│ │ ├── C.Wilga5.fas
│ │ └── sample_alignment.fa
│ ├── Gram-negative_AKL.fasta
│ ├── Gram-positive_AKL.fasta
│ ├── LeaderRepeat_All.fa
│ ├── Rfam/
│ │ ├── RF00458.fasta
│ │ ├── RF03120.fasta
│ │ └── RF03120_SS.txt
│ ├── TP53_genes.xlsx
│ ├── sample.fasta
│ ├── seedSample.fa
│ ├── sequence-link-tree.fasta
│ └── tp53.fa
├── man/
│ ├── GVariation.Rd
│ ├── Gram-negative_AKL.fasta.Rd
│ ├── Gram-positive_AKL.fasta.Rd
│ ├── LeaderRepeat_All.fa.Rd
│ ├── Rfam.Rd
│ ├── TP53_genes.xlsx.Rd
│ ├── adjust_ally.Rd
│ ├── assign_dms.Rd
│ ├── available_colors.Rd
│ ├── available_fonts.Rd
│ ├── available_msa.Rd
│ ├── extract_seq.Rd
│ ├── facet_msa.Rd
│ ├── geom_GC.Rd
│ ├── geom_helix.Rd
│ ├── geom_msa.Rd
│ ├── geom_msaBar.Rd
│ ├── geom_seed.Rd
│ ├── geom_seqlogo.Rd
│ ├── ggSeqBundle.Rd
│ ├── gghelix.Rd
│ ├── ggmaf.Rd
│ ├── ggmsa.Rd
│ ├── merge_seq.Rd
│ ├── plot-methods.Rd
│ ├── readSSfile.Rd
│ ├── read_maf.Rd
│ ├── reset_pos.Rd
│ ├── sample.fasta.Rd
│ ├── seedSample.fa.Rd
│ ├── seqdiff.Rd
│ ├── seqlogo.Rd
│ ├── sequence-link-tree.fasta.Rd
│ ├── show-methods.Rd
│ ├── simplify_hdata.Rd
│ ├── simplot.Rd
│ ├── theme_msa.Rd
│ ├── tidy_hdata.Rd
│ ├── tidy_maf_df.Rd
│ ├── tidy_msa.Rd
│ ├── tp53.fa.Rd
│ └── treeMSA_plot.Rd
├── tests/
│ ├── testthat/
│ │ ├── test-main.R
│ │ ├── test-msa_data.R
│ │ └── test-tidy_msa.R
│ └── testthat.R
└── vignettes/
├── .gitignore
├── ggmsa.Rmd
└── ggmsa.bib
================================================
FILE CONTENTS
================================================
================================================
FILE: .Rbuildignore
================================================
^.*\.Rproj$
^\.Rproj\.user$
Makefile
README.md
README_files
README.Rmd
^_pkgdown\.yml$
^docs$
^pkgdown$
logo.png
CONDUCT.md
================================================
FILE: .gitignore
================================================
.Rproj.user
.Rhistory
.RData
.Renviron
.DS_Store
inst/doc
ggmsa.Rproj
ggmsa.Rcheck
.git
docs/
pkgdown/
================================================
FILE: CONDUCT.md
================================================
# Contributor Code of Conduct
As contributors and maintainers of this project, we pledge to respect all people who
contribute through reporting issues, posting feature requests, updating documentation,
submitting pull requests or patches, and other activities.
We are committed to making participation in this project a harassment-free experience for
everyone, regardless of level of experience, gender, gender identity and expression,
sexual orientation, disability, personal appearance, body size, race, ethnicity, age, or religion.
Examples of unacceptable behavior by participants include the use of sexual language or
imagery, derogatory comments or personal attacks, trolling, public or private harassment,
insults, or other unprofessional conduct.
Project maintainers have the right and responsibility to remove, edit, or reject comments,
commits, code, wiki edits, issues, and other contributions that are not aligned to this
Code of Conduct. Project maintainers who do not follow the Code of Conduct may be removed
from the project team.
Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by
opening an issue or contacting one or more of the project maintainers.
This Code of Conduct is adapted from the Contributor Covenant
(http://contributor-covenant.org), version 1.0.0, available at
http://contributor-covenant.org/version/1/0/0/
================================================
FILE: DESCRIPTION
================================================
Package: ggmsa
Title: Plot Multiple Sequence Alignment using 'ggplot2'
Version: 1.19.0
Authors@R: c(person("Guangchuang", "Yu", email = "guangchuangyu@gmail.com", role = c("aut", "cre","ths"), comment = c(ORCID = "0000-0002-6485-8781")),
person("Lang", "Zhou", email = "nyzhoulang@gmail.com", role = "aut"),
person("Shuangbin", "Xu", email = "xshuangbin@163.com", role = "ctb"),
person("Huina", "Huang", email = "1185796994@qq.com", role = "ctb"))
Description: A visual exploration tool for multiple sequence alignment
and associated data. Supports MSA of DNA, RNA, and protein sequences
using 'ggplot2'. Multiple sequence alignment can easily be combined
with other 'ggplot2' plots, such as a phylogenetic tree visualized by
'ggtree', boxplot, genome map and so on. More features: visualization
of sequence logos, sequence bundles, RNA secondary structures and detection
of sequence recombinations.
Depends: R (>= 4.1.0)
Imports:
Biostrings,
ggplot2,
magrittr,
tidyr,
utils,
stats,
aplot,
RColorBrewer,
ggfun (>= 0.2.0),
ggforce,
dplyr,
R4RNA,
grDevices,
seqmagick,
grid,
methods,
ggtree (>= 1.17.1)
Suggests:
ggtreeExtra,
ape,
cowplot,
knitr,
rmarkdown,
readxl,
ggnewscale,
kableExtra,
gggenes,
statebins,
prettydoc,
testthat (>= 3.0.0),
yulab.utils
License: Artistic-2.0
Encoding: UTF-8
URL: https://doi.org/10.1093/bib/bbac222 (paper), https://www.amazon.com/Integration-Manipulation-Visualization-Phylogenetic-Computational-ebook/dp/B0B5NLZR1Z/ (book)
BugReports: https://github.com/YuLab-SMU/ggmsa/issues
biocViews: Software, Visualization, Alignment, Annotation, MultipleSequenceAlignment
RoxygenNote: 7.3.2
VignetteBuilder: knitr
Config/testthat/edition: 3
================================================
FILE: Makefile
================================================
PKGNAME := $(shell sed -n "s/Package: *\([^ ]*\)/\1/p" DESCRIPTION)
PKGVERS := $(shell sed -n "s/Version: *\([^ ]*\)/\1/p" DESCRIPTION)
PKGSRC := $(shell basename `pwd`)
BIOCVER := RELEASE_3_23
all: rd check clean
alldocs: rd readme mkdocs
rd:
Rscript -e 'roxygen2::roxygenise(".")'
readme:
Rscript -e 'rmarkdown::render("README.Rmd")'
readme2:
Rscript -e 'rmarkdown::render("README.Rmd", "html_document")'
build:
# cd ..;\
# R CMD build $(PKGSRC)
Rscript -e 'devtools::build()'
build2:
cd ..;\
R CMD build --no-build-vignettes $(PKGSRC)
install:
cd ..;\
R CMD INSTALL $(PKGNAME)_$(PKGVERS).tar.gz
check: #build
#cd ..;\
#Rscript -e 'rcmdcheck::rcmdcheck("$(PKGNAME)_$(PKGVERS).tar.gz")'
Rscript -e 'devtools::check()'
check2: build
cd ..;\
R CMD check $(PKGNAME)_$(PKGVERS).tar.gz
bioccheck:
cd ..;\
Rscript -e 'BiocCheck::BiocCheck("$(PKGNAME)_$(PKGVERS).tar.gz")'
gpcheck:
Rscript -e 'goodpractice::gp()'
clean:
cd ..;\
$(RM) -r $(PKGNAME).Rcheck/
gitmaintain:
git gc --auto;\
git prune -v;\
git fsck --full
rmrelease:
git branch -D $(BIOCVER)
release:
git checkout $(BIOCVER);\
git fetch --all
update:
git fetch --all;\
git checkout devel;\
git merge upstream/devel;\
git merge origin/devel;\
push:
git push upstream devel;\
git push origin devel
biocinit:
git remote add upstream git@git.bioconductor.org:packages/$(PKGNAME).git;\
git fetch --all
================================================
FILE: NAMESPACE
================================================
# Generated by roxygen2: do not edit by hand
S3method(diff,SeqDiff)
S3method(ggplot_add,GCcontent)
S3method(ggplot_add,facet_msa)
S3method(ggplot_add,msaBar)
S3method(ggplot_add,nucleotideeHelix)
S3method(ggplot_add,seed)
S3method(ggplot_add,seqlogo)
export(adjust_ally)
export(assign_dms)
export(available_colors)
export(available_fonts)
export(available_msa)
export(extract_seq)
export(facet_msa)
export(geom_GC)
export(geom_helix)
export(geom_msa)
export(geom_msaBar)
export(geom_seed)
export(geom_seqlogo)
export(ggSeqBundle)
export(gghelix)
export(ggmaf)
export(ggmsa)
export(merge_seq)
export(readSSfile)
export(read_maf)
export(reset_pos)
export(seqdiff)
export(seqlogo)
export(simplify_hdata)
export(simplot)
export(theme_msa)
export(tidy_hdata)
export(tidy_maf_df)
export(tidy_msa)
export(treeMSA_plot)
exportMethods(plot)
exportMethods(show)
importClassesFrom(Biostrings,BStringSet)
importFrom(Biostrings,AAStringSet)
importFrom(Biostrings,DNAStringSet)
importFrom(Biostrings,RNAStringSet)
importFrom(Biostrings,readBStringSet)
importFrom(Biostrings,readDNAStringSet)
importFrom(Biostrings,toString)
importFrom(Biostrings,width)
importFrom(R4RNA,as.helix)
importFrom(R4RNA,collapseHelix)
importFrom(R4RNA,expandHelix)
importFrom(R4RNA,readBpseq)
importFrom(R4RNA,readConnect)
importFrom(R4RNA,readHelix)
importFrom(R4RNA,readVienna)
importFrom(RColorBrewer,brewer.pal)
importFrom(aplot,insert_top)
importFrom(aplot,plot_list)
importFrom(dplyr,group_by)
importFrom(dplyr,group_by_)
importFrom(dplyr,n)
importFrom(dplyr,select)
importFrom(dplyr,summarize)
importFrom(dplyr,summarize_)
importFrom(ggforce,geom_arc)
importFrom(ggfun,geom_xspline)
importFrom(ggplot2,Geom)
importFrom(ggplot2,aes)
importFrom(ggplot2,aes_)
importFrom(ggplot2,coord_cartesian)
importFrom(ggplot2,coord_fixed)
importFrom(ggplot2,draw_key_polygon)
importFrom(ggplot2,element_blank)
importFrom(ggplot2,element_line)
importFrom(ggplot2,element_text)
importFrom(ggplot2,facet_wrap)
importFrom(ggplot2,geom_area)
importFrom(ggplot2,geom_blank)
importFrom(ggplot2,geom_col)
importFrom(ggplot2,geom_line)
importFrom(ggplot2,geom_point)
importFrom(ggplot2,geom_polygon)
importFrom(ggplot2,geom_ribbon)
importFrom(ggplot2,geom_segment)
importFrom(ggplot2,geom_smooth)
importFrom(ggplot2,geom_text)
importFrom(ggplot2,geom_tile)
importFrom(ggplot2,ggplot)
importFrom(ggplot2,ggplot_add)
importFrom(ggplot2,ggplot_build)
importFrom(ggplot2,ggplot_gtable)
importFrom(ggplot2,ggproto)
importFrom(ggplot2,ggtitle)
importFrom(ggplot2,labs)
importFrom(ggplot2,layer)
importFrom(ggplot2,scale_color_manual)
importFrom(ggplot2,scale_fill_gradientn)
importFrom(ggplot2,scale_fill_manual)
importFrom(ggplot2,scale_x_continuous)
importFrom(ggplot2,scale_y_continuous)
importFrom(ggplot2,theme)
importFrom(ggplot2,theme_bw)
importFrom(ggplot2,theme_minimal)
importFrom(ggplot2,theme_void)
importFrom(ggplot2,xlab)
importFrom(ggplot2,xlim)
importFrom(ggplot2,ylab)
importFrom(ggtree,geom_facet)
importFrom(ggtree,geom_tiplab)
importFrom(grDevices,colorRampPalette)
importFrom(grid,arrow)
importFrom(grid,gTree)
importFrom(grid,gpar)
importFrom(grid,polygonGrob)
importFrom(grid,unit)
importFrom(grid,unit.pmax)
importFrom(magrittr,"%<>%")
importFrom(magrittr,"%>%")
importFrom(methods,missingArg)
importFrom(methods,new)
importFrom(methods,show)
importFrom(seqmagick,fa_read)
importFrom(stats,setNames)
importFrom(tidyr,gather)
importFrom(utils,getFromNamespace)
importFrom(utils,globalVariables)
importFrom(utils,modifyList)
importFrom(utils,packageDescription)
importFrom(utils,read.delim)
================================================
FILE: NEWS.md
================================================
# ggmsa 1.18.0
+ Bioconductor RELEASE_3_23 (2026-04-29, Wed)
# ggmsa 1.16.0
+ Bioconductor RELEASE_3_22 (2025-11-01, Sat)
# ggmsa 1.15.1
+ replace `ggalt::geom_xspline()` with `ggfun::geom_xspline()` (2017-07-12, Sat)
# ggmsa 1.3.3
+ calling `\dontrun{}` for examples on `ggmsa()`
# ggmsa 1.3.2
+ bugfix: `geom_msaBar` conservation layer incorrectly aligned issues#34(2022-5-13, Fri)
# ggmsa 1.3.1
+ A new feature--selects ancestral sequence on Tree-MSA plot `treeMSA_plot` (2022-4-14, Thu)
+ A new feature--visualization of genome alignment `ggmaf` (2022-4-14, Thu)
+ A test feature--visualization protein-protein interactive (2022-4-14, Thu)
+ updated the way smooth is invoked on simplot(2022-01-03, Mon)
# ggmsa 1.1.4
added smoothed curve on simplot.(2021-12-17, Fri)
# ggmsa 1.1.3
fixed the typo in "posHighligthed", and changed it to
snake_case "position_highlight" from camelCase "posHighligthed" (2021-12-13, Mon)
# ggmsa 1.1.2
fixed the assignment error on line 155 'seqlogo.R'
# ggmsa 1.1.1
fixed error: using `||` instead of `|` on line 110 of geom_msa.R
# ggmsa 0.99.0 or 0.99.x
(Prepare for submission to `Bioconductor`, 2021-09-22 Wed)
+ 0.99.1 update DESCRIPTION and NEWS files (2021-09-28, Tue)
+ 0.99.2 add documentation for raw data in inst/extdata and clean up code (2021-09-29, Wed)
+ 0.99.3 remove some vignettes from master (build on the gh-pages branch) (2021-10-1, Fri)
+ 0.99.4 remove 'stringr' package from 'Imports' (2021-10-11, Mon)
+ 0.99.5 make the consensus_views compatible ggtreeExtra and add package description. (2021-10-21, Thu)
# ggmsa 0.0.10
+ update default color schemes in lower part of the SeqDiff plot (2021-08-20, Fri)
# ggmsa 0.0.9
+ import R4RNA to fix R check (2021-08-03, Tue)
# ggmsa 0.0.8
+ bugfix: fix variable names error in color_scheme. (2021-07-29, Thu)
+ The migration of sequence recombination functionality from `seqcombo` package. (2021-07-20, Tue)
# ggmsa 0.0.7
+ added `gghelix()` and `geom_helix()`.(2021-04-1, Thu)
+ added option to show the fill legend.(2021-03-23, Tue)
+ added an error message to remind that "sequences must have unique names".(2021-03-18, Thu)
+ added `ggSeqBundle()` to plot Sequence Bundles for MSAs based on `ggplot2` (2021-03-18, Thu)
# ggmsa 0.0.6
+ supports linking `ggtreeExtra`. (2021-01-21, Thu)
+ bugfix: reversed sequence in 'tree + geom_facet(font)' . (2021-01-21, Thu)
+ bugfix: partitioning error when the sequence starting point greater than 1. (2021-01-21, Thu)
+ bugfix: generates continuous x-axis labels for each panel. (2021-01-21, Thu)
+ supports customize colors `custom_color`. (2020-12-28, Mon)
# ggmsa 0.0.5
+ added a new view called `by_conservation`.(2020-12-22, Tue)
+ added a new color scheme `Hydrophobicity` and a new parameter `border`.(2020-12-21, Mon)
+ rewrite the function `facet_msa()`.(2020-12-03, Thu)
+ Debug: tree + geom_facet(geom_msa()) does not work.(2020-12-03, Thu)
+ added a new function `geom_msaBar()`.(2020-12-03, Thu)
+ added a new parameter `ignore_gaps` used in consensus views.(2020-10-09, Fri)
+ debug in consensus views (2020-10-05, Mon)
+ added consensus views (2020-9-30, Wed)
+ added new colors `LETTER` and `CN6` provided by ShixiangWang.[issues#8](https://github.com/YuLab-SMU/ggmsa/issues/8)
# ggmsa 0.0.4
+ fixed warning message in **msa_data.R** (2020-4-26, Sun)
+ added ggplot_add methods for `geom_*()` (2020-4-24, Fri)
+ added a parameter `seq_name` in `ggmsa()` (2020-4-23, Thu)
+ added a new function `facet_msa()` --> break down the MSA (2020-4-17, Fri)
+ added a parameter `posHighlighted` in `ggmsa()` (2020-4-17, Fri)
+ created a new layer `geom_asterisk()` to optimize `geom_seed()` (2020-4-11, Sat)
+ added new functions `available_colors()`, `available_fonts()` and `available_msa()` (2020-3-30, Thu)
+ added a new function `geom_seed()` --> highlight the seed region in miRNA sequences (2020-3-27, Fri)
+ added a new function `ggmotif()`--> plot sequence motifs independently (2020-3-23, Tue)
+ added a Monospaced Font `DroidSansMono` (2020-3-23, Mon)
# ggmsa 0.0.3
+ release of v=0.0.3 (2020-03-16, Mon)
+ added a new function `geom_GC()` --> plot GC content in MSA (2020-02-28, Fri)
+ added a new function `geom_seqlogo()` --> plot sequence motifs in MSA (2020-02-14, Fri)
+ used a proportional scaling algorithm (2020-01-08, Wed)
# ggmsa 0.0.2
+ support plot sequence logo (2019-12-25, Wed)
+ added three fonts:`helvetical`, `times_new_roman`, `mono` (2019-12-21, Sat)
+ ~~added three fonts:`serif_font`, `Montserrat_font`, `roboto_font` (2019-12-17, Tue)~~
+ added internal outline polygons (2019-12-15, Sun)
+ bug fixed of `tidy_msa`
+ import `seqmagick` for parsing fasta
+ `tidy_msa` for converting msa file/object to tidy data frame (2019-12-09, Mon)
# ggmsa 0.0.1
+ initial CRAN release (2019-10-17, Thu)
+ removed from CRAN on 2021-08-17
================================================
FILE: R/AllClasses.R
================================================
setClass("SeqDiff",
representation = representation(
file = "character",
sequence = "BStringSet",
reference = "numeric",
diff = "data.frame"
)
)
================================================
FILE: R/SeqBundles.R
================================================
##' plot Sequence Bundles for MSA based on 'ggplot2'
##'
##'
##' @title ggSeqBundle
##' @importFrom ggfun geom_xspline
##' @param msa Multiple sequence alignment file (FASTA) or object
##' representing either nucleotide sequences or peptide sequences. Also accepts
##' multiple MSA files,
##' e.g. msa = c("Gram-negative_AKL.fasta", "Gram-positive_AKL.fasta").
##' @param line_width The width of bundles at each site, default is 0.3.
##' @param line_thickness The thickness of bundles at each site, default is 0.3.
##' @param line_high The height of bundles at each site, default is 0.
##' @param spline_shape A numeric vector of values between -1 and 1, which
##' control the shape of the spline relative to the control points.
##' @param size A numeric vector of values between 0 and 1,
##' which control the size of each line.
##' @param alpha A numeric vector of values between 0 and 1,
##' which control the alpha of each line.
##' @param bundle_color The colors of each sequence bundle,
##' e.g. bundle_color = c("#2ba0f5","#424242").
##' @param lev_molecule Reassigns the Y-axis to display
##' letter-coded amino acids/nucleotides arranged by physicochemical
##' properties or another ordering, e.g. amino acid hydrophobicity:
##' lev_molecule = c("-","A", "V", "L", "I", "P", "F", "W", "M",
##' "G", "S","T", "C", "Y", "N", "Q", "D", "E", "K","R", "H").
##' @return ggplot object
##' @export
##' @examples
##' aln <- system.file("extdata", "Gram-negative_AKL.fasta", package = "ggmsa")
##' ggSeqBundle(aln)
##' @author Lang Zhou
ggSeqBundle <- function(msa,
line_width = 0.3,
line_thickness = 0.3,
line_high = 0,
spline_shape = 0.3,
size = 0.5,
alpha = 0.2,
bundle_color = c("#2ba0f5","#424242"),
lev_molecule = c("-", "A", "V", "L", "I", "P",
"F", "W", "M", "G", "S","T",
"C", "Y", "N", "Q", "D", "E",
"K", "R", "H")
) {
if(length(msa) > length(bundle_color)) {
stop("Each MSA group should be assigned a bundle color!!")
}
df <- lapply(seq_along(msa), function(i){
df_aa <- tidy_msa(msa[[i]])
df_aa$name <- as.character(df_aa$name)
df_aa$group <- i
df_aa
})%>% do.call("rbind",.)
dd <- adjustMSA(df_msa = df,
lev_molecule = lev_molecule,
line_width = line_width,
line_thickness = line_thickness,
line_high = line_high,
bundle_color = bundle_color
)
mapping <- aes(x = position_adj, y = y_adj,
group=name, color = I(bundle_color))
ggplot(data = dd, mapping = mapping) +
geom_xspline(shape = spline_shape, linewidth = size, alpha = alpha) +
theme_bundles(df = df, lev_molecule = lev_molecule)
}
adjustMSA <- function(df_msa, lev_molecule, line_width,
line_thickness, bundle_color, line_high) {
data_scale <- lapply(nrow(df_msa) %>% seq_len(), function(i) {
d <- df_msa[i,]
d[2,] <- d[1,]
d[1,"position_adj"] <- d[1,"position"] - line_width
d[2,"position_adj"] <- d[2,"position"] + line_width
d
}) %>% do.call("rbind",.)
data_scale$y <- factor(data_scale$character, levels = lev_molecule) %>%
as.numeric()
data_adj <- lapply(data_scale$group %>% unique, function(g) {
data_group <- data_scale[data_scale$group == g,]
thickness <- line_thickness / factor(data_group$name) %>%
as.numeric %>%
max
dd_adj <- lapply(unique(data_group$position), function(i){
df_pos <- data_group[data_group$position == i,]
lapply(unique(df_pos$y), function(j){
df_y <- df_pos[df_pos$y == j,]
thick_lev <- df_y$name %>% factor %>% as.numeric - 1
df_y$y_adj <- df_y$y - 0.4 + line_high + thickness *
thick_lev + line_thickness * (g - 1)
df_y
}) %>% do.call("rbind",.)
}) %>% do.call("rbind",.)
dd_adj$bundle_color <- bundle_color[[g]]
dd_adj
}) %>% do.call("rbind",.)
return(data_adj)
}
##' @importFrom ggplot2 element_line
theme_bundles <- function(df, lev_molecule){
break_y <- factor(lev_molecule, levels = lev_molecule) %>% as.numeric
minor_y <- c(break_y + 0.5, break_y - 0.5) %>% unique
break_x <- max(df$position) %>% seq_len
minor_x <- c(break_x + 0.5, break_x - 0.5) %>% unique
list(
ylab(NULL),
xlab("Position number"),
scale_x_continuous(breaks = break_x,
labels = break_x,
minor_breaks = minor_x),
scale_y_continuous(breaks = break_y,
labels = lev_molecule,
minor_breaks = minor_y),
theme(panel.grid.minor.y = element_line(color = "#e8e0e0", linewidth = 0.4),
axis.line.x = element_line(color = "gray60", linewidth = 0.8),
panel.grid.major = element_blank(),
axis.ticks.y = element_blank(),
panel.background = element_blank())
)
}
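## Illustrative sketch (not part of the ggmsa API): adjustMSA() above turns
## every residue into two spline control points, one at position - line_width
## and one at position + line_width, so each bundle runs flat across a site
## before bending toward the next one. The hypothetical helper below
## reproduces only that x-axis step in vectorised base R; the name
## sketch_bundle_points is an assumption made for this example.
sketch_bundle_points <- function(position, line_width = 0.3) {
    # two rows per residue: left and right edge of the site
    pos <- rep(position, each = 2)
    offset <- rep(c(-line_width, line_width), times = length(position))
    data.frame(position = pos, position_adj = pos + offset)
}
## Usage: sketch_bundle_points(c(1, 2)) yields four rows, two per position.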
================================================
FILE: R/ancestor_seq.R
================================================
##' plot Tree-MSA plot
##'
##'
##' 'treeMSA_plot()' automatically re-arranges the MSA data according to
##' the tree structure,
##' @title treeMSA_plot
##' @param p_tree tree view
##' @param tidymsa_df tidy MSA data
##' @param ancestral_node vector of internal nodes in the tree whose
##' "ancestral sequences" are displayed. ancestral_node = "none" hides
##' all ancestral sequences; ancestral_node = "all" shows all ancestral
##' sequences.
##' @param sub logical value. Whether to display a subset of ancestral sequences.
##' @param panel panel name for plot of MSA data
##' @param font font family, possible values are 'helvetical', 'mono',
##' 'DroidSansMono' and 'TimesNewRoman'. Default is NULL, which
##' plots only the background tile.
##' @param color a color scheme. One of 'Clustal', 'Chemistry_AA',
##' 'Shapely_AA', 'Zappo_AA', 'Taylor_AA', 'LETTER', 'CN6', 'Chemistry_NT',
##' 'Shapely_NT', 'Zappo_NT', 'Taylor_NT'. Default is 'Chemistry_AA'.
##' @param seq_colname the colname of MSA on tree$data
##' @param ... additional parameters for 'geom_msa'
##' @export
##' @importFrom ggtree geom_facet
##' @return ggplot object
##' @author Lang Zhou
treeMSA_plot <- function(p_tree,
tidymsa_df,
ancestral_node = "none",
sub = FALSE,
panel = "MSA",
font = NULL,
color = "Chemistry_AA",
seq_colname = NULL,
...) {
# identical() keeps the condition length-one even when ancestral_node
# is a vector of several internal nodes
if(!identical(ancestral_node, "none") && is.null(seq_colname)) {
stop("Please assign the column name of the MSA in tree$data via the 'seq_colname' argument.")
}
if(!identical(ancestral_node, "none")) {
p_tree <- adjust_ally(p_tree, node = ancestral_node,
sub = sub,
seq_colname = seq_colname)
tidymsa_df <- extract_seq(p_tree,
seq_colname = seq_colname)
}
p <- p_tree + geom_facet(geom = geom_msa,
data = tidymsa_df,
panel = panel,
font = font,
color = color,
...)
if(identical(ancestral_node, "none")) {
p <- p + geom_tiplab(offset = 0.002)
}
p
}
##' adjust the tree branch position after assigning ancestor node
##'
##' @title adjust_ally
##' @param tree ggtree object
##' @param node internal node in tree
##' @param sub logical value.
##' @param seq_colname the colname of MSA on tree$data
##' @importFrom ggtree geom_tiplab
##' @importFrom ggplot2 aes_
##' @importFrom utils getFromNamespace
##' @return tree
##' @export
##' @author Lang Zhou
adjust_ally <- function(tree, node, sub = FALSE, seq_colname = "mol_seq") {
getSubtree <- getFromNamespace("getSubtree", "ggtree")
if(identical(node, "all")){
d <- tree$data
ancestor_n <- d[!d$isTip & !is.na(d[,seq_colname][[1]]),"node"][[1]]
}else {
if(sub){
ancestor_n <- lapply(node, function(i) {
sub_tree <- getSubtree(tree,node = i)
sub_ancestor <- sub_tree[!sub_tree$isTip,]
ancestor_n <- sub_ancestor$node
return(ancestor_n)
})%>% unlist %>% unique
}else {
ancestor_n <- node
}
}
for (i in ancestor_n) {
tree <- adjust_treey(tree = tree, node = i)
}
tree$data$node_color <- "black"
tree$data[tree$data$node %in% ancestor_n,"node_color"] <- "red"
tree <- tree + geom_tiplab(aes_(color = ~I(node_color)),offset = 0.002)
return(tree)
}
##' extract ancestor sequence from tree data
##'
##' @title extract_seq
##' @param tree_adjust ggtree object
##' @param seq_colname the colname of MSA on tree$data
##' @return character
##' @export
##' @author Lang Zhou
extract_seq <- function(tree_adjust, seq_colname = "mol_seq") {
data <- tree_adjust$data
seq <- data[data$isTip,seq_colname][[1]]
names(seq) <- data[data$isTip,]$label
tidy <- tidy_msa(seq)
return(tidy)
}
adjust_treey <- function(tree, node) {
tree$data$isTip[tree$data$node == node] <- TRUE
tree$data$label[tree$data$node == node] <-
tree$data$name[tree$data$node == node]
y_ancenstor <- tree$data$y[tree$data$node == node]
tree$data$y[tree$data$y > y_ancenstor] <-
tree$data$y[tree$data$y > y_ancenstor] + 1
tree$data$y[tree$data$node == node] <-
tree$data$y[tree$data$node == node] %>% ceiling
return(tree)
}
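## Illustrative sketch (not part of the ggmsa API): adjust_treey() above makes
## room for an ancestral sequence by moving every node that sits higher than
## the chosen internal node up by one row, then snapping the internal node
## itself to the next integer row. The hypothetical helper below applies the
## same arithmetic to a plain numeric vector of y coordinates; its name and
## arguments are assumptions made for this example.
sketch_shift_y <- function(y, i) {
    y_ancestor <- y[i]
    above <- y > y_ancestor
    y[above] <- y[above] + 1   # open a free row above the ancestor
    y[i] <- ceiling(y[i])      # place the ancestor on that row
    y
}
## Example: tips at y = 1, 2, 3 and an internal node (4th element) at y = 1.5:
## sketch_shift_y(c(1, 2, 3, 1.5), 4) returns c(1, 3, 4, 2).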
================================================
FILE: R/arc.R
================================================
##' Plots nucleotide secondary structure as helices in an arc diagram
##'
##' @title gghelix
##' @param helix_data a data frame of nucleotide secondary structure,
##' as read by readSSfile().
##' @param overlap Logical. If TRUE, two structure data sets named 'known'
##' and 'predicted' must be given (e.g. helix_data = list(known = data1,
##' predicted = data2)).
##' Predicted helices that are known are plotted on top, predicted helices that
##' are not known on the bottom, and known but unpredicted helices
##' on top in black.
##' @param color_by rule used to generate colors for helices,
##' one of "length" and "value"
##' @return ggplot object
##' @export
##' @examples
##' RF03120 <- system.file("extdata/Rfam/RF03120_SS.txt", package="ggmsa")
##' helix_data <- readSSfile(RF03120, type = "Vienna")
##' gghelix(helix_data)
##' @author Lang Zhou
gghelix <- function(helix_data, color_by = "length",overlap = FALSE){
if(is.data.frame(helix_data)) {
helix_tidy <- tidy_helix(helix_data, color_by = color_by)
}else {
helix_tidy <- tidy_list_helix(helix_data, color_by = color_by)
}
ly <- layer_helix(helix_data = helix_tidy, overlap = overlap)
p <- ggplot() + ly + theme_helix()
return(p)
}
##' The layer of helix plot
##'
##' @title geom_helix
##' @param helix_data a data frame of nucleotide secondary structure,
##' as read by readSSfile().
##' @param overlap Logical. If TRUE, two structure data sets named 'known'
##' and 'predicted' must be given (e.g. helix_data = list(known = data1,
##' predicted = data2)).
##' Predicted helices that are known are plotted on top,
##' predicted helices that are not known on the bottom, and known but
##' unpredicted helices on top in black.
##' @param color_by rule used to generate colors for helices,
##' one of "length" and "value"
##' @param ... additional parameter
##' @return ggplot2 layers
##' @export
##' @examples
##' RF03120 <- system.file("extdata/Rfam/RF03120_SS.txt", package="ggmsa")
##' RF03120_fas <- system.file("extdata/Rfam/RF03120.fasta", package="ggmsa")
##' SS <- readSSfile(RF03120, type = "Vienna")
##' ggmsa(RF03120_fas, font = NULL,border = NA,
##' color = "Chemistry_NT", seq_name = FALSE) +
##' geom_helix(SS)
##' @author Lang Zhou
geom_helix <- function(helix_data, color_by = "length", overlap = FALSE, ...) {
structure(list(helix_data = helix_data,
color_by = color_by,
overlap = overlap),
class = "nucleotideeHelix")
}
##' Read secondary structure file
##'
##' @title readSSfile
##' @importFrom utils read.delim
##' @param file A text file of secondary structure in the format given by 'type'
##' @param type file type. One of "Helix", "Connect", "Vienna" and "Bpseq"
##' @return data frame
##' @importFrom R4RNA readHelix
##' @importFrom R4RNA readConnect
##' @importFrom R4RNA readVienna
##' @importFrom R4RNA readBpseq
##' @importFrom R4RNA expandHelix
##' @importFrom R4RNA collapseHelix
##' @export
##' @examples
##' RF03120 <- system.file("extdata/Rfam/RF03120_SS.txt", package="ggmsa")
##' helix_data <- readSSfile(RF03120, type = "Vienna")
##' @author Lang Zhou
readSSfile <- function(file, type = NULL) {
type <- match.arg(type, c("Helix", "Connect", "Vienna", "Bpseq"))
load_data <- switch(type,
Helix = readHelix(file),
Connect = readConnect(file),
Vienna = readVienna(file),
Bpseq = readBpseq(file)) # read the Bpseq file; expandHelix() expects helix data, not a path
data <- collapseHelix(load_data)
return(data)
}
tidy_list_helix <- function(helix_data, color_by = "length"){
known <- tidy_helix(helix_data$known, color_by = color_by)
predicted <- tidy_helix(helix_data$predicted, color_by = color_by)
return(list(known = known, predicted = predicted))
}
tidy_helix <- function(helix_data, color_by = "length"){
helix_data <- color_helix(helix_data, color = color_by)
names(helix_data)[c(1,2)] <- c("from","to")
helix_data$x0 <- (helix_data$to + helix_data$from)/2
helix_data$r <- (helix_data$to - helix_data$from)/2
return(helix_data)
}
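## Illustrative sketch (not part of the ggmsa API): tidy_helix() above
## describes each base pair (from, to) as a half circle with centre
## x0 = (to + from) / 2 and radius r = (to - from) / 2, so the arc starts at
## 'from' and ends at 'to'. The hypothetical helper below, whose name is an
## assumption for this example, recomputes both ends from x0 and r.
sketch_arc_geometry <- function(from, to) {
    x0 <- (to + from) / 2   # arc centre on the sequence axis
    r <- (to - from) / 2    # arc radius
    data.frame(from = from, to = to, x0 = x0, r = r,
               left = x0 - r, right = x0 + r)
}
## Usage: sketch_arc_geometry(c(1, 5), c(10, 8)); the 'left' and 'right'
## columns should match 'from' and 'to'.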
color_helix <- function(helix_data, color){
#color <- match.arg(color, c("length", "value"))
if(color == "length"){
data_color <- colorBy_length(helix_data)
}else if(color == "value") {
data_color <- colorBy_value(helix_data)
}else {
helix_data$col <- color
data_color <- helix_data
}
data <- expandHelix(data_color)
return(data)
}
colorBy_length <- function(helix_data){
pal_lenght <- colorRampPalette(brewer.pal(name = "Paired", n = 12))
helix_data$col <- nrow(helix_data) %>% pal_lenght()
return(helix_data)
}
colorBy_value <- function(helix_data){
pal_value <- colorRampPalette(rev(brewer.pal(name = "Blues", n = 4)))
helix_data$col <- nrow(helix_data) %>% pal_value()
return(helix_data)
}
##' @importFrom ggforce geom_arc
layer_helix <- function(helix_data, overlap = FALSE, seq_numbers = 0){
mapping_above <- aes_(x0 = ~x0,
y0 = ~(seq_numbers + 0.5),
r = ~r, start = ~1.5*pi,
end = ~2.5*pi)
mapping_below <- aes_(x0 = ~x0,
y0 = ~(-0.5),
r = ~r, start = ~pi/2,
end = ~1.5*pi)
if(seq_numbers > 0) {
mapping_below <- modifyList(mapping_below, aes_(y0 = ~0))
}
if(is.list(helix_data) && "col" %in% names(helix_data[[2]])) {
mapping_above <- modifyList(mapping_above, aes_(color = ~I(col)))
mapping_below <- modifyList(mapping_below, aes_(color = ~I(col)))
}
if(overlap) {
if(!is.list(helix_data) || length(helix_data) != 2){
stop("Overlapping structures must input a list with
2 helix data.
(eg: helix_data = list(known = data1, predicted = data2))")
}
if(!names(helix_data) %in% c("known", "predicted") %>% all) {
stop("helix_data names must be 'known' and 'predicted'.
(eg: helix_data = list(known = data1, predicted = data2))")
}
overlap_data <- overlap_helix(known = helix_data[["known"]],
predicted = helix_data[["predicted"]])
if (overlap_data[["above_justknown"]] %>% nrow == 0){
ly_up <- geom_arc(data = overlap_data[["above_both"]],
mapping = mapping_above)
ly_below <- geom_arc(data = overlap_data[["below"]],
mapping = mapping_below)
return(list(ly_up, ly_below))
}else {
ly_up <- geom_arc(data = overlap_data[["above_both"]],
mapping = mapping_above)
ly_up_justknown <-
geom_arc(data = overlap_data[["above_justknown"]],
mapping = mapping_above,
color = "black")
ly_below <- geom_arc(data = overlap_data[["below"]],
mapping = mapping_below)
return(list(ly_up, ly_up_justknown, ly_below))
}
}else {#overlap = FALSE
if(is.list(helix_data) && length(helix_data) == 2) {
if(!"col" %in% names(helix_data[["known"]])) {
mapping_below <- modifyList(mapping_below,
aes_(color = I("#8fce5e")))
}
ly_up <- geom_arc(data = helix_data[["known"]],
mapping = mapping_below)
ly_below <- geom_arc(data = helix_data[["predicted"]],
mapping = mapping_above)
return(list(ly_up, ly_below))
}else if(is.data.frame(helix_data)){
if("col" %in% names(helix_data)){
mapping_above <- modifyList(mapping_above,
aes_(color = ~I(col)))
}
ly_arc <- geom_arc(data = helix_data, mapping = mapping_above)
return(ly_arc)
}else {
stop("Only a data frame or a list of 2 helix data frames is allowed.
eg: helix_data = data or
helix_data = list(known = data1, predicted = data2)")
}
}
}
overlap_helix <- function(known, predicted){
if(!c("from", "to") %in% names(known) %>% all) {
stop("'known' must be an output from 'readSSfile()'")
}
if(!c("from", "to") %in% names(predicted) %>% all) {
stop("'predicted' must be an output from 'readSSfile()'")
}
known$heli <- paste0(known$from, "t",known$to)
predicted$heli <- paste0(predicted$from, "t", predicted$to)
below <- predicted[!predicted$heli %in% known$heli,] #predicted & not known
above_both <- predicted[predicted$heli %in% known$heli,] #predicted & known
above_justknown <- known[!known$heli %in% above_both$heli,] #unpredicted & known
return(list(below = below,
above_both = above_both,
above_justknown = above_justknown))
}
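## Illustrative sketch (not part of the package API): the three-way split
## that overlap_helix() performs, shown on toy data. The helper name
## `sketch_classify_helix` and the toy coordinates below are assumptions made
## for this example only; real input is expected to come from readSSfile().
sketch_classify_helix <- function(known, predicted) {
    # Build a "from-to" key per base pair so the two sets can be compared.
    key_known <- paste0(known$from, "t", known$to)
    key_pred <- paste0(predicted$from, "t", predicted$to)
    list(below = predicted[!key_pred %in% key_known, ],          # predicted only
         above_both = predicted[key_pred %in% key_known, ],      # in both sets
         above_justknown = known[!key_known %in% key_pred, ])    # known only
}
## Possible usage:
## known <- data.frame(from = c(1, 5), to = c(20, 15))
## predicted <- data.frame(from = c(1, 7), to = c(20, 12))
## sketch_classify_helix(known, predicted)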
##' @importFrom ggplot2 theme_void
##' @importFrom ggplot2 element_text
##' @importFrom grid arrow
theme_helix <- function(){
list(theme_void(),
scale_y_continuous(breaks = 0),
coord_fixed(),
theme(panel.grid.major.y = element_line(size = 1, arrow = arrow(length = unit(0.3, 'cm'))),
panel.grid.major.x = element_line(color = "#eaeaea", size = 0.4),
axis.text.x = element_text())
)
}
================================================
FILE: R/available.R
================================================
##' This function lists font families currently available
##' that can be used by 'ggmsa'
##'
##'
##' @title List Font Families currently available
##' @return No return value; the available font family names are
##' printed as a message
##' @examples available_fonts()
##' @export
##' @author Lang Zhou
available_fonts <- function(){
message("font families currently available:" )
font <- paste(names(font_fam), collapse = ' ')
message(font, "\n")
}
##' This function lists color schemes currently available that
##' can be used by 'ggmsa'
##'
##'
##' @title List Color Schemes currently available
##' @return No return value; the available color schemes are
##' printed as a message
##' @examples available_colors()
##' @export
##' @author Lang Zhou
available_colors <- function(){
message("1.color schemes for nucleotide sequences currently available:")
color_nt <- paste(names(scheme_NT), collapse = ' ')
message(color_nt, "\n")
message("2.color schemes for AA sequences currently available:")
color_aa <- paste(names(scheme_AA), collapse = ' ')
message("Clustal ", color_aa, "\n")
}
##' This function lists MSA objects currently available that
##' can be used by 'ggmsa'
##'
##'
##' @title List MSA objects currently available
##' @return No return value; the supported inputs are
##' printed as a message
##' @examples available_msa()
##' @export
##' @author Lang Zhou
available_msa <- function(){
message("1.files currently available:")
message(".fasta",'\n')
message("2.XStringSet objects from 'Biostrings' package:")
mes <- paste(supported_msa_class[!grepl("bin", supported_msa_class)],
collapse = ' ')
message(mes, '\n')
message("3.bin objects:")
mes_bin <- paste(supported_msa_class[grepl("bin", supported_msa_class)],
collapse = ' ')
message(mes_bin, '\n')
}
================================================
FILE: R/clustal.R
================================================
##' The Clustal color scheme. The algorithm that assigns colors
##' to a multiple sequence alignment.
##'
##' @param y sequence alignment with data frame, generated by tidy_msa().
##' @keywords clustal
##' @noRd
color_Clustal <- function(y) {
char_freq <- lapply(split(y, y$position), function(x) table(x$character))
col_convert <- lapply(char_freq, function(seq_column) {
##The white as the background
clustal <- rep("#ffffff", length(seq_column))
names(clustal) <- names(seq_column)
r <- seq_column/sum(seq_column)
for (pos in seq_along(seq_column)) {
char <- names(seq_column)[pos]
i <- grep(char, scheme_clustal$re_position)
for (j in i) {
if (scheme_clustal$type[j] == "combined"){
rr <- sum(r[strsplit(scheme_clustal$re_gp[j], '')[[1]]],
na.rm = TRUE)
if (rr > scheme_clustal$thred[j]) {
clustal[pos] <- scheme_clustal$colour[j]}
} else{
rr1<-r[strsplit(scheme_clustal$re_gp[j], ',')[[1]]]
if (any(rr1> scheme_clustal$thred[j],na.rm = TRUE) ) {
clustal[pos] <- scheme_clustal$colour[j]}
}
break
}
}
return(clustal)
})
yy <- split(y, y$position)
lapply(names(yy), function(n) {
d <- yy[[n]]
col <- col_convert[[n]]
d$color <- col[d$character]
return(d)
}) %>% do.call('rbind', .)
}
================================================
FILE: R/color_by_conservation.R
================================================
color_increment <- function(conservation_visibility){
lapply(seq_len(nrow(conservation_visibility)), function(i){
color_ramp <-
colorRampPalette(colors =
c(conservation_visibility[i,"color"],
"#ffffff"))
color_change <-
rev(color_ramp(100))[conservation_visibility[i,"visibility"]]
return(color_change)
}) %>% unlist
}
color_visibility <- function(y){
#options(digits = 2)
#on.exit()
conser_data <- bar_data(y)
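#round the ratio itself: '%>%' binds tighter than '/', so piping into
#round() would only round the denominator
conser_data$visibility <-
round(conser_data$Freq / length(levels(y[[1]])), 2)
conser_data$visibility <- round(conser_data$visibility * 100)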
names(conser_data)[3] <- "position"
y_filter <- y[c(-1,-3)]
conser_ready <- merge(conser_data, y_filter)
y$color <- color_increment(conser_ready)
return(y)
}
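## Illustrative sketch (hypothetical helper, not part of the package API):
## fading a base color toward white in proportion to column conservation,
## which is the idea behind color_increment(). The function name and the
## clamping of the index to 1..100 are assumptions made for this example.
sketch_fade_color <- function(base_color, conservation) {
    # 100 steps from white (low conservation) to the base color (high).
    ramp <- grDevices::colorRampPalette(c("#ffffff", base_color))(100)
    # Convert the 0..1 conservation fraction to a valid 1..100 index.
    idx <- pmax(1, pmin(100, round(conservation * 100)))
    ramp[idx]
}
## Possible usage:
## sketch_fade_color("#ff0000", c(0.25, 0.5, 1))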
================================================
FILE: R/color_else.R
================================================
##' Assigning colors to sequence alignment.
##'
##'
##' @param y sequence alignment with data frame, generated by tidy_msa().
##' @param color a color scheme. One of 'Clustal', 'Chemistry_AA',
##' 'Shapely_AA', 'Zappo_AA', 'Taylor_AA', 'LETTER', 'CN6', 'Chemistry_NT',
##' 'Shapely_NT', 'Zappo_NT', 'Taylor_NT'. Default is 'Chemistry_AA'.
##' @param custom_color A data frame with two columns named "names"
##' and "color", used to customize the color scheme.
##' @noRd
color_scheme <- function(y, color = "Chemistry_AA", custom_color = NULL) {
if (!is.null(custom_color)){
#Convert to character so that factor levels do not interfere
custom_color[["names"]] <- as.character(custom_color[["names"]])
#'$col' partially matches a column named "color", "colors" or "colours"
custom_color[["color"]] <- as.character(custom_color$col)
row.names(custom_color) <- custom_color[["names"]]
scheme_AA$custom_color <-
custom_color[row.names(scheme_AA), "color"] %>% as.character()
y$color <- scheme_AA[y$character, "custom_color"]
}else{
if(grepl("NT", color)){
y$color <- scheme_NT[y$character, color]
} else{
y$color <- scheme_AA[y$character, color]
}
}
return(y)
}
================================================
FILE: R/cons.R
================================================
##' Remove the colors of characters that should not be shown, according
##' to the consensus sequence (only used in the consensus views).
##'
##' @param y a data frame, sequence alignment with specified color.
##' @param consensus the consensus sequence, as returned by
##' get_consensus().
##' @param disagreement a logical value. Displays characters that
##' disagree with the consensus (excluding ambiguous disagreements).
##' @param ref a character string. Specifying the reference sequence
##' which should be one of input sequences when 'consensus_views' is TRUE.
##' @keywords tidy_color
##' @noRd
tidy_color <- function(y, consensus, disagreement, ref) {
c <- lapply(unique(y$position), function(i) {
msa_cloumn <- y[y$position == i, ]
if(!is.null(ref)) {
if ('label' %in% names(msa_cloumn)) { ##work for ggtreeExtra
msa_cloumn <- msa_cloumn[!msa_cloumn$label == ref, ]
}else{
msa_cloumn <- msa_cloumn[!msa_cloumn$name == ref, ]
}
}
#Get consensus char.
cons_char <- consensus[consensus$position == i, "character"]
#Compare the characters of the current position(i)
#to the consensus char.
logic <- msa_cloumn$character == cons_char
#Cleaning colors according to the 'logic'.
if(cons_char == "X") {
msa_cloumn$color <- NA
}
if(disagreement){
msa_cloumn[logic, "color"] <- NA
}else{
msa_cloumn[!logic, "color"] <- NA
}
msa_cloumn
}) %>% do.call("rbind", .)
return(c)
}
##' calling the consensus sequence.
##'
##' @param tidy sequence alignment with data frame, generated by tidy_msa().
##' @param ignore_gaps a logical value. When TRUE, gaps in a
##' column are treated as if that row didn't exist.
##' @param ref a character string. Specifying the reference sequence
##' which should be one of input sequences when 'consensus_views' is TRUE.
##' @keywords get_consensus
##' @noRd
get_consensus <- function(tidy, ignore_gaps = FALSE, ref = NULL) {
if(!is.null(ref)) {
if(ignore_gaps) {
warning("The argument 'ignore_gaps' is ",
"ignored when 'ref' is specified!")
}
if ('label' %in% names(tidy)) { ##work for ggtreeExtra
ref <- match.arg(ref, levels(factor(tidy$label)))
cons <- tidy[tidy$label == ref,]
}else {
ref <- match.arg(ref, levels(tidy$name))
cons <- tidy[tidy$name == ref,]
}
return(cons)
}
#Iterate through each column
cons <- lapply(unique(tidy$position), function(i) {
msa_cloumn <- tidy[tidy$position == i, ]
cons <- data.frame(position = i)
if(ignore_gaps) {
msa_cloumn <- msa_cloumn[!msa_cloumn$character %in% "-",]
}
#Gets the highest frequency characters
fre <- table(msa_cloumn$character) %>% data.frame
max_element <- fre[fre[2] == max(fre[2]),]
max_number <- max_element %>% nrow
if(max_number == 1) {
cons$character <- max_element[1,1]
}else {
cons$character <- "X"
}
cons
}) %>% do.call("rbind", .)
cons$name <- "Consensus"
#table() yields a factor, so convert the consensus back to character
cons$character <- as.character(cons$character)
return(cons)
}
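## Illustrative sketch (hypothetical helper, not part of the package API):
## the per-column rule used by get_consensus() when no reference is given.
## The most frequent character wins, and a tie yields the ambiguity code "X".
## The function name and the plain character-vector input are assumptions
## made for this example.
sketch_column_consensus <- function(chars, ignore_gaps = FALSE) {
    if (ignore_gaps) chars <- chars[chars != "-"]
    if (length(chars) == 0) return("X")   # nothing left to vote on
    freq <- table(chars)
    top <- names(freq)[freq == max(freq)]
    if (length(top) == 1) top else "X"
}
## Possible usage:
## sketch_column_consensus(c("A", "A", "G"))
## sketch_column_consensus(c("-", "-", "A"), ignore_gaps = TRUE)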
order_name <- function(name, order = NULL,
consensus_views = FALSE,
ref = NULL) {
name_uni <- unique(name)
if(is.null(ref)){
#placed 'consensus' at the top
name_expect <- name_uni[!name_uni %in% "Consensus"] %>%
rev %>%
as.character
name <- factor(name, levels = c(name_expect, "Consensus"))
}else {
name_expect <- name_uni[!name_uni %in% ref] %>%
rev %>%
as.character
name <- factor(name, levels = c(name_expect, ref))
}
return(name)
}
================================================
FILE: R/data.R
================================================
#' A sample data used in ggmsa
#'
#' A dataset containing the alignment sequences of
#' the phenylalanine hydroxylase protein (PH4H)
#' within nine species
#'
#'
#' @docType data
#' @keywords datasets
#' @name sample.fasta
#' @format A MSA fasta with 9 sequences and 456 positions.
NULL
#' GVariation
#'
#' A folder containing 4 MSA files as a sample
#' data set to identify the sequence recombination event.
#'
#' \itemize{
#' \item A.Mont.fas MSA with sequences of 'Mont' and 'CF_YL21'
#' \item B.Oz.fas MSA with sequences of 'Oz' and 'CF_YL21'
#' \item C.Wilga5.fas MSA with sequences of 'Wilga5' and 'CF_YL21'
#' \item sample_alignment.fa MSA with sequences of 'Mont', 'CF_YL21',
#' 'Oz', and 'Wilga5'
#' }
#' @docType data
#' @keywords datasets
#' @name GVariation
#' @format a folder
#' @source \url{https://link.springer.com/article/10.1007/s11540-015-9307-3}
NULL
#' Rfam
#'
#' A folder containing seed alignment sequences and
#' corresponding consensus RNA secondary structure.
#'
#' \itemize{
#' \item RF00458.fasta seed alignment sequences of Cripavirus internal
#' ribosome entry site (IRES)
#' \item RF03120.fasta seed alignment sequences of Sarbecovirus 5'UTR
#' \item RF03120_SS.txt consensus RNA secondary structure of
#' Sarbecovirus 5'UTR
#'
#' }
#' @docType data
#' @keywords datasets
#' @name Rfam
#' @format a folder
#' @source \url{https://rfam.xfam.org/}
NULL
#' Gram-negative_AKL
#'
#' Amino acids in the adenylate kinase lid (AKL) domain
#' from Gram-negative bacteria.
#'
#' @docType data
#' @keywords datasets
#' @name Gram-negative_AKL.fasta
#' @format A MSA fasta with 100 sequences and 36 positions.
#' @source \url{http://biovis.net/year/2013/info/redesign-contest}
NULL
#' Gram-positive_AKL
#'
#' Amino acids in the adenylate kinase lid (AKL) domain
#' from Gram-positive bacteria.
#'
#' @docType data
#' @keywords datasets
#' @name Gram-positive_AKL.fasta
#' @format A MSA fasta with 100 sequences and 36 positions.
#' @source \url{http://biovis.net/year/2013/info/redesign-contest}
NULL
#' Sample DNA alignment sequences
#'
#' DNA alignment sequences with 24 sequences and 56 positions.
#'
#'
#' @docType data
#' @keywords datasets
#' @name LeaderRepeat_All.fa
#' @format A MSA fasta
NULL
#' microRNA data used in ggmsa
#'
#' Fasta format sequences of mature miRNA sequences
#' from miRBase
#'
#'
#' @docType data
#' @keywords datasets
#' @name seedSample.fa
#' @format A MSA fasta with 6 sequences and 22 positions.
#' @source \url{https://www.mirbase.org/ftp.shtml}
NULL
#' sequence-link-tree
#'
#' Alignment sequences used to demonstrate circular MSA layout
#'
#' @docType data
#' @keywords datasets
#' @name sequence-link-tree.fasta
#' @format A MSA fasta with 28 sequences and 480 positions.
NULL
#' TP53 MSA
#'
#' Alignment sequences used to demonstrate graphical combination
#'
#' @docType data
#' @keywords datasets
#' @name tp53.fa
#' @format A MSA fasta with 5 sequences and 404 positions.
NULL
#' genome locus
#'
#' The local genome map shows the 30000 sites around the TP53 gene.
#'
#' @docType data
#' @keywords datasets
#' @name TP53_genes.xlsx
#' @format xlsx
NULL
================================================
FILE: R/dms.R
================================================
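##' Assign dms values to alignments.
##'
##' @title assign_dms
##' @param x data frame from tidy_msa()
##' @param dms dms data frame with the columns 'site_RBD', 'wildtype',
##' 'mutation_RBD' and 'bind_avg'
##' @return a data frame: 'x' with added 'mutation' and 'bind_avg' columns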
##' @export
##' @author Lang Zhou
assign_dms <- function(x, dms) {
dms_value <- lapply(unique(x$position), function(i) {
xx <- x[x$position == i,]
dmss <- dms[dms$site_RBD == i,]
wt <- unique(dmss[,"wildtype"])
xx$mutation <- paste0(wt, xx$position, xx$character)
xx$bind_avg <- lapply(seq_along(xx$mutation),function(j) {
bind_avg <- dmss[dmss$mutation_RBD %in% xx[j,"mutation"],"bind_avg"]
return(bind_avg)
}) %>% unlist
return(xx)
}) %>% do.call("rbind",.)
return(dms_value )
}
================================================
FILE: R/facet_msa.R
================================================
##' The MSA is split into panels (fields) of the width that you set.
##' @title segment MSA
##' @param field a numeric value, the number of positions in each field.
##' @return ggplot layers
##' @examples
##' library(ggplot2)
##' f <- system.file("extdata/sample.fasta", package="ggmsa")
##' # 2 fields
##' ggmsa(f, end = 120, font = NULL, color="Chemistry_AA") +
##' facet_msa(field = 60)
##' # 3 fields
##' ggmsa(f, end = 120, font = NULL, color="Chemistry_AA") +
##' facet_msa(field = 40)
##' @export
##' @author Lang Zhou
facet_msa <- function(field) {
structure(list(field = field),
class = "facet_msa"
)
}
facet_data <- function(msaData, field) {
if(min(msaData$position) > 1){
#shift positions so that the first plotted column is 1
pos_reset <- msaData$position - min(msaData$position) + 1
}else {
pos_reset <- msaData$position
}
msaData$facet <- pos_reset %/% field
msaData[(pos_reset %% field) == 0, "facet"] <-
msaData[(pos_reset %% field) == 0, "facet"] - 1
return(msaData)
}
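## Illustrative sketch (hypothetical helper, not part of the package API):
## assigning alignment columns to facets, assuming every facet should hold
## exactly `field` consecutive columns counted from the first plotted
## position. The function name is an assumption made for this example.
sketch_facet_index <- function(position, field) {
    # Zero-based offset from the first plotted column, then integer division.
    offset <- position - min(position)
    offset %/% field
}
## Possible usage:
## table(sketch_facet_index(150:269, field = 60))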
================================================
FILE: R/geom_GC.R
================================================
##' Multiple sequence alignment layer for ggplot2. It plots points showing
##' the GC content.
##' @title geom_GC
##' @param show.legend logical. Should this layer be included in the legends?
##' @return a ggplot layer
##' @examples
##' #plot GC content
##' f <- system.file("extdata/LeaderRepeat_All.fa", package="ggmsa")
##' ggmsa(f, font = NULL, color="Chemistry_NT") + geom_GC()
##' @export
##' @author Lang Zhou
geom_GC <- function(show.legend = FALSE) {
structure(list(show.legend = show.legend),
class = "GCcontent")
}
geom_GC1 <- function(tidyData, show.legend = FALSE){
tidy <- tidyData
#tidy <- tidy_msa(msa = msa, start = start, end = end)
GC_pos <- getOption("GC_pos")
GC <- content_GC(tidy)
GC <-GC[GC$character == "GC",]
col_num <- levels(factor(tidy$position))
col_len <- length(col_num) + GC_pos
ly_GC <- geom_point(data = GC,
mapping = aes_(x = ~col_len,
y = ~ypos,
size = ~fre),
color = "#51a6e9",
na.rm = TRUE,
show.legend = show.legend)
return(ly_GC)
}
##' get GC content
##' @title content_GC
##' @param data Multiple aligned sequence files or objects
##' for representing nucleotide sequences
##' @return A data frame
##' @noRd
##' @author Lang Zhou
content_GC<- function(data){
tidy <- data
tidy$name <- factor(tidy$name, levels = unique(tidy$name))
tidy$ypos <- as.numeric(tidy$name)
seq_num <- unique(tidy$ypos)
lchar_num <- lapply(seq_num, function(j){
clo <- tidy[tidy$ypos == j, ]
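y <- prop.table(table(clo$character))
num <- setNames(rep(0,5), c("A", "T", "G", "C", "GC"))
#keep only A/T/G/C so that every row has the same five columns
keep <- intersect(names(y), names(num))
num[keep] <- y[keep]
#a missing G or C counts as 0 instead of turning the sum into NA
num["GC"] <- num["G"] + num["C"]
return(num)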
})
char_num <- do.call(rbind,lchar_num)
char_num <- as.data.frame(char_num)
char_num["ypos"] = seq_num
char_num2 <- gather(char_num,character,fre, "A", "T", "C","G","GC")
return(char_num2)
}
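## Illustrative sketch (hypothetical helper, not part of the package API):
## the GC fraction of a single aligned sequence, computed over all of its
## characters (gaps included in the denominator, as prop.table() does in
## content_GC()). The function name is an assumption made for this example.
sketch_gc_fraction <- function(chars) {
    mean(toupper(chars) %in% c("G", "C"))
}
## Possible usage:
## sketch_gc_fraction(strsplit("GGCATA-C", "")[[1]])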
================================================
FILE: R/geom_asterisk.R
================================================
##' a ggplot2 layer of asterisk as a polygon
##'
##'
##' @title a ggplot2 layer of asterisk as a polygon
##' @param mapping aes mapping
##' @param data a data frame
##' @param stat the statistical transformation to use on the data
##' for this layer, as a string.
##' @param position position adjustment, either as a string,
##' or the result of a call to a position adjustment function.
##' @param na.rm a logical value
##' @param show.legend a logical value
##' @param inherit.aes a logical value
##' @param ... additional parameters
##' @importFrom ggplot2 layer
##' @return ggplot2 layer
## @export
##' @noRd
##' @author Lang Zhou
##' @examples
##' #library(ggplot2)
##' #ggplot(mtcars, aes(mpg, disp)) + geom_asterisk()
geom_asterisk <- function(mapping = NULL,
data = NULL,
stat = "identity",
position = "identity",
na.rm = FALSE,
show.legend = NA,
inherit.aes = TRUE, ...) {
layer(geom = Geomasterisk,
mapping = mapping,
data = data,
stat = stat,
position = position,
show.legend = show.legend,
inherit.aes = inherit.aes,
params = list(na.rm = na.rm, ...))
}
##' @importFrom grid polygonGrob
##' @importFrom grid gpar
SeedStar <- function(x = NULL , y = NULL) {
char_width <- getOption("asterisk_width")
char_scale_2 <- getOption("char_scale_2")
x_width <- char_scale_2 * diff(range(star$y))
star$x = star$x * x_width/diff(range(star$x))
char_scale <- diff(range(star$x))/diff(range(star$y))
star$x = star$x * (char_width * char_scale)/diff(range(star$x))
star$y = star$y * char_width/diff(range(star$y))
star$x = star$x - min(star$x) - (char_width * char_scale)/2 + x
star$y = star$y - min(star$y) - char_width/2 + y
polygonGrob(star$x, star$y, gp = gpar(fill = "black") )
}
##' @importFrom ggplot2 ggproto
##' @importFrom ggplot2 Geom
##' @importFrom ggplot2 draw_key_polygon
##' @importFrom ggplot2 aes
##' @importFrom grid gTree
Geomasterisk <- ggproto("Geomasterisk", Geom,
required_aes = c("x", "y"),
default_aes = aes(fill = "black"),
draw_key = draw_key_polygon,
draw_panel = function(data, panel_params, coord) {
data <- coord$transform(data, panel_params)
grobs <- lapply(seq_len(nrow(data)), function(i) {
SeedStar(data$x[i], data$y[i])
})
class(grobs) <- "gList"
ggplot2:::ggname("geom_asterisk",
gTree(children = grobs))
}
)
================================================
FILE: R/geom_msa.R
================================================
##' Multiple sequence alignment layer for ggplot2.
##' It creates background tiles with/without sequence characters.
##'
##' @title geom_msa
##' @param data sequence alignment with data frame, generated by tidy_msa().
##' @param font font families, possible values are 'helvetical', 'mono',
##' 'DroidSansMono', and 'TimesNewRoman'. Default is 'helvetical'.
##' If font = NULL, only the background tiles are plotted.
##' @param mapping aes mapping
##' @param color A color scheme. One of 'Clustal', 'Chemistry_AA', 'Shapely_AA',
##' 'Zappo_AA', 'Taylor_AA', 'LETTER', 'CN6', 'Chemistry_NT', 'Shapely_NT',
##' 'Zappo_NT', 'Taylor_NT'. Default is 'Chemistry_AA'.
##' @param custom_color A data frame with two columns named "names" and
##' "color", used to customize the color scheme.
##' @param char_width a numeric value. Specifying the character width in
##' the range of 0 to 1. Default is 0.9.
##' @param by_conservation a logical value. The most conserved regions have
##' the brightest colors.
##' @param none_bg a logical value. If TRUE, the background tiles are
##' not drawn. Default is FALSE.
##' @param position_highlight A numeric vector of the positions that
##' need to be highlighted.
##' @param seq_name a logical value indicating whether sequence names
##' should be displayed. Default is 'NULL', which displays the
##' sequence names when 'font = NULL' and hides them when a font is
##' specified. If 'seq_name = TRUE' the sequence names are always
##' displayed. If 'seq_name = FALSE' the sequence names are never
##' displayed.
##' @param border a character string. The border color.
##' @param consensus_views a logical value that enables consensus views.
##' @param use_dot a logical value. Displays characters as dots instead of
##' fading their color in the consensus view.
##' @param disagreement a logical value. Displays characters that disagree
##' with the consensus (excluding ambiguous disagreements).
##' @param ignore_gaps a logical value. When TRUE,
##' gaps in a column are treated as if that row didn't exist.
##' @param ref a character string. Specifying the reference sequence
##' which should be one of input sequences when 'consensus_views' is TRUE.
##' @param position Position adjustment, either as a string, or
##' the result of a call to a position adjustment function,
##' default is 'identity' meaning 'position_identity()'.
##' @param show.legend logical. Should this layer be included in the legends?
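##' @param dms logical. If TRUE, the tile fill is mapped to the 'bind_avg'
##' column of the data (see assign_dms()).
##' @param position_color logical. If TRUE, the tile fill is taken from the
##' 'pos_color' column of the data.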
##' @param ... additional parameter
##' @return A list
##' @importFrom ggplot2 scale_fill_manual
##' @importFrom utils modifyList
##' @export
##' @examples
##' library(ggplot2)
##' aln <- system.file("extdata", "sample.fasta", package = "ggmsa")
##' tidy_aln <- tidy_msa(aln, start = 150, end = 170)
##' ggplot() + geom_msa(data = tidy_aln, font = NULL) + coord_fixed()
##' @author Guangchuang Yu, Lang Zhou
geom_msa <- function(data, font = "helvetical",
mapping = NULL,
color = "Chemistry_AA",
custom_color = NULL,
char_width = 0.9,
none_bg = FALSE,
by_conservation = FALSE,
position_highlight = NULL,
seq_name = NULL,
border = NULL,
consensus_views = FALSE,
use_dot = FALSE,
disagreement = TRUE,
ignore_gaps = FALSE,
ref = NULL,
position = "identity",
show.legend = FALSE,
dms = FALSE,
position_color = FALSE,
... ) {
data <- msa_data(data,
font = font,
color = color,
custom_color = custom_color,
char_width = char_width,
by_conservation = by_conservation,
consensus_views = consensus_views,
use_dot = use_dot,
disagreement = disagreement,
ignore_gaps = ignore_gaps,
ref = ref)
#legend work
xx <- data[,c("character","color")] %>% unique()
xx <- xx[!is.na(xx$color),]
labs <- lapply(unique(xx$color) %>% seq_along, function(i) {
cols <- unique(xx$color)[i]
dup_char <- xx[xx$color == cols, "character"]
lab <- paste0(dup_char, collapse = ",")
}) %>% do.call("rbind",.) %>% as.vector()
cols <- xx$color %>% unique()
names(cols) <- cols
sacle_tile_cols <- scale_fill_manual(values = cols,
breaks = cols,
labels = labs)
bg_data <- data
#work to ggtreeExtra
if (is.null(mapping)) {
mapping <- aes_(x = ~position, y = ~name, fill = ~I(color))
}
#dms color work
if (dms) {
mapping <- modifyList(mapping, aes_(fill = ~bind_avg))
}
if (position_color) {
mapping <- modifyList(mapping, aes_(fill = ~I(pos_color)))
}
#'seq_name' work
if (!isTRUE(seq_name)) {
if ('y' %in% colnames(data) || isFALSE(seq_name) ) {
y <- as.numeric(bg_data$name)
mapping <- modifyList(mapping, aes_(y = ~y)) #"~y" is seq numbers
}
}
#'position_highlight' work
if (!is.null(position_highlight)) {
none_bg = TRUE
bg_data <- bg_data[bg_data$position %in% position_highlight,]
bg_data$postion <- as.factor(bg_data$position)
mapping <- modifyList(mapping, aes_(x = ~position,
fill = ~color,
width = 1))
}
#'border' work
if(is.null(border)){
ly_bg <- geom_tile(mapping = mapping, data = bg_data, color = 'grey',
inherit.aes = FALSE, position = position,
show.legend = show.legend)
}else{
ly_bg <- geom_tile(mapping = mapping, data = bg_data, color = border,
inherit.aes = FALSE, position = position,
show.legend = show.legend)
}
if (!all(c("yy", "order", "group") %in% colnames(data))) {
if(position_color) {
return(list(ly_bg))
}else{
return(list(ly_bg, sacle_tile_cols))
}
}
if ('y' %in% colnames(data)) {
data$yy = data$yy - as.numeric(data$name) + data$y
}
label_mapping <- aes_(x = ~x, y = ~yy, group = ~group)
# use_dot work
if (consensus_views && !use_dot) {
if(show.legend) {
stop("legends can't be shown in the consensus view!")
}
label_mapping <- modifyList(label_mapping, aes_(fill = ~I(font_color)))
}
ly_label <- geom_polygon(mapping = label_mapping, data = data,
inherit.aes = FALSE, position = position)
#'none_bg' work
if (none_bg & is.null(position_highlight)) {
return(ly_label)
}
if(consensus_views) {
return(list(ly_bg, ly_label))
}else {
if(position_color){
return(list(ly_bg, ly_label))
}else{
return(list(ly_bg, ly_label, sacle_tile_cols))
}
}
}
================================================
FILE: R/geom_msaBar.R
================================================
##' Multiple sequence alignment layer for ggplot2.
##' It plots a sequence conservation bar.
##' @title geom_msaBar
##' @return A list
##' @examples
##' #plot multiple sequence alignment and conservation bar.
##' f <- system.file("extdata/sample.fasta", package="ggmsa")
##' ggmsa(f, 221, 280, font = NULL, seq_name = TRUE) + geom_msaBar()
##' @export
##' @author Lang Zhou
geom_msaBar <- function() {
structure(list(),
class = "msaBar")
}
##' @importFrom ggplot2 geom_col
ly_bar <- function(tidy){
data <- bar_data(tidy)
mapping <- aes_(x = ~pos, y = ~Freq, fill = ~Freq)
ly_bar <- geom_col(data = data,
mapping = mapping,
width = 1,
show.legend = FALSE)
return(ly_bar)
}
##' get bar data
##' @title bar_data
##' @param tidy Multiple aligned sequence files or
##' object for representing nucleotide sequences
##' @return A data frame
##' @noRd
##' @author Lang Zhou
bar_data <- function(tidy){
character_position <- unique(tidy$position)
conservation_score <- lapply(character_position, function(j) {
cloumn_data <- tidy[tidy$position == j, ]
character_frequency <- table(cloumn_data$character) %>% as.data.frame
max_frequency <- character_frequency[character_frequency[2] ==
max(character_frequency[2]),]
max_frequency$Var1 <- as.character(max_frequency$Var1)
#on ties, keep the first of the most frequent characters
max_frequency[1,]
}) %>% do.call("rbind", .)
conservation_score["pos"] <- character_position
return(conservation_score)
}
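## Illustrative sketch (hypothetical helper, not part of the package API):
## the conservation score that bar_data() derives for each column, i.e. the
## count of the most frequent character at that position. The function name
## and the minimal two-column input are assumptions made for this example.
sketch_conservation <- function(tidy) {
    columns <- split(as.character(tidy$character), tidy$position)
    vapply(columns, function(col) as.integer(max(table(col))), integer(1))
}
## Possible usage:
## toy <- data.frame(position = c(1, 1, 1, 2, 2, 2),
##                   character = c("A", "A", "G", "C", "T", "G"))
## sketch_conservation(toy)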
================================================
FILE: R/geom_seed.R
================================================
##' Highlighting the seed in miRNA sequences
##'
##'
##' @title geom_seed
##' @param seed a character string. Specifying the miRNA seed sequence
##' like 'GAGGUAG'.
##' @param star a logical value indicating whether asterisks should
##' be displayed.
##' @return a ggplot layer
##' @author Lang Zhou
##' @examples
##' miRNA_sequences <- system.file("extdata/seedSample.fa", package="ggmsa")
##' ggmsa(miRNA_sequences, font = 'DroidSansMono',
##' color = "Chemistry_NT", none_bg = TRUE) +
##' geom_seed(seed = "GAGGUAG", star = FALSE)
##' ggmsa(miRNA_sequences, font = 'DroidSansMono',
##' color = "Chemistry_NT") +
##' geom_seed(seed = "GAGGUAG", star = TRUE)
##' @export
geom_seed <- function(seed, star = FALSE) {
structure(list(seed = seed,
star = star),
class = "seed")
}
geom_seed1 <- function(tidyData, seed, star) {
get_asteriskScale(tidyData)
tidyData$y <- as.numeric(tidyData$name)
seq_first <- tidyData[tidyData$y == 1,]
char <- seq_first$character
char <- paste(char, collapse = "")
seedPos <- regexpr(seed,char)
#locate <- str_locate(char, seed)
#df_locate <- as.data.frame(locate)
#seedPos <- df_locate$start # start position of seed region
seedLen <- nchar(seed) # length of seed region
numSeq <- max(tidyData$y) # number of sequences
shadingLen <- getOption("shadingLen") #shading width
shading_alpha <- getOption("shading_alpha")
x <- seedPos - .5 #the x coordinate of the lower left corner
y <- 1 - .5 - shadingLen #the y coordinate of the lower left corner
yy <- numSeq + .5 + shadingLen #the y coordinate of the top right corner
xx <- x + seedLen #the x coordinate of the top right corner
shadingData <- data.frame(x = c(x, x, xx, xx),
y = c(y, yy, yy, y),
t = c('a', 'a', 'a','a'))
starData <- data.frame(star_x = seq(seedPos, length.out = nchar(seed)),
star_y = rep(y, times = nchar(seed)))
if(isTRUE(star)) {
ly_star <- geom_asterisk(data = starData,
aes_(x = ~star_x, y = ~star_y))
return(ly_star)
}
mapping <- aes_(x= ~x, y= ~y, group= ~t, fill = ~I('#bebebe'))
ly_seed <- geom_polygon(data = shadingData,
mapping = mapping,
alpha = shading_alpha)
return(ly_seed)
}
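## Illustrative sketch (hypothetical helper, not part of the package API):
## how the shaded seed box in geom_seed1() is positioned. The seed is located
## in the first sequence, then the box spans the seed horizontally and all
## sequences vertically, with tiles centered on integer coordinates. The
## function name, the `pad` argument (standing in for the "shadingLen"
## option) and the literal string match are assumptions for this example.
sketch_seed_box <- function(sequence, seed, n_seq, pad = 0) {
    start <- as.integer(regexpr(seed, sequence, fixed = TRUE))
    if (start == -1) stop("seed not found in the sequence")
    c(xmin = start - 0.5,
      xmax = start - 0.5 + nchar(seed),
      ymin = 0.5 - pad,
      ymax = n_seq + 0.5 + pad)
}
## Possible usage:
## sketch_seed_box("UGAGGUAGUAGG", "GAGGUAG", n_seq = 6)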
get_asteriskScale <- function(tidyData) {
m <- max(tidyData$position)
seq_name <- factor(tidyData$name, levels = unique(tidyData$name))
n <- max(as.numeric(seq_name))
char_scale <- diff(range(star$x))/diff(range(star$y))
char_scale_2 <- char_scale * 3/2 * n/m
return(options("char_scale_2" = char_scale_2))
}
================================================
FILE: R/ggmaf.R
================================================
##' plot MAF
##'
##' @title ggmaf
##' @param data a tidy MAF data frame. You can get it from tidy_maf_df().
##' @param ref character, the name of the reference genome,
##' e.g. "hg38.chr1_KI270707v1_random"
##' @param block_start a numeric value (> 0). The first block to plot.
##' @param block_end a numeric value (< max block). The last block to plot.
##' @param facet_field a numeric value. The number of blocks in a facet panel.
##' @param heights a numeric vector of length two. The plot proportion between
##' the "Genomic location" panel (upper) and the "Alignment" panel (lower).
##' Default: c(0.4, 0.6)
##' @param facet_heights a numeric vector. The facet proportions.
##' @return ggplot object
##' @export
##' @author Lang Zhou
ggmaf <- function(data,
ref,
block_start = NULL,
block_end = NULL,
facet_field = NULL,
heights = c(0.4,0.6),
facet_heights = NULL) {
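#fall back to the full block range, since NULL:NULL would raise an error
if(is.null(block_start)) block_start <- min(data$block_number)
if(is.null(block_end)) block_end <- max(data$block_number)
d <- data[data$block_number %in% c(block_start : block_end),]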
if(is.null(facet_field)) {
maf_p <- maf_plot(d = d, ref = ref)
p <- plot_list(gglist = maf_p, heights = heights)
return(p)
}else {
d <- facet_maf(mafData = d, field = facet_field)
p_ls <- lapply(unique(d$facet), function(i) {
facet_d <- d[d$facet == i,]
maf_p <- maf_plot(d = facet_d, ref = ref)
pp <- plot_list(gglist = maf_p, heights = heights)
return(pp)
})
p <- plot_list(gglist = p_ls, ncol = 1, heights = facet_heights)
return(p)
}
}
##' tidy MAF data frame
##'
##' @title tidy_maf_df
##' @param maf_df a MAF data frame. You can get it from read_maf().
##' @param ref character, the name of reference genome.
##' eg:"hg38.chr1_KI270707v1_random"
##' @return data frame
##' @export
##' @author Lang Zhou
tidy_maf_df <- function(maf_df,ref) {
##add ref position to other genome
block_num <- unique(maf_df$block)
tidy_df <- lapply(block_num, function(i) {
x <- maf_df[maf_df$block == i,]
x$ref_start <- x[x$src == ref, "start"]
x$ref_end <- x[x$src == ref, "end_gap"]
return(x)
})%>% do.call("rbind", .)
tidy_df$block_number <- factor(tidy_df$block, levels =
unique(tidy_df$block)) %>% as.numeric
tidy_df$bs <- paste0(tidy_df$src,"-",tidy_df$block)
tidy_df$merge_y <- factor(tidy_df$src) %>% as.numeric
tidy_df$label <- paste0("B",tidy_df$block_number)
tidy_df <- order_aln(tidy_df,ref)
return(tidy_df)
}
#put the ref sequence the first in each block, new col "y"
order_aln <- function(tidy_df, ref) {
block_num <- unique(tidy_df$block)
lev <- sapply(block_num, function(i) {
x <- tidy_df[tidy_df$block == i,]
order <- c(ref, x$src[!x$src %in% ref])
lev <- paste0(order, "-",x$block)
return(lev)
})%>% unlist %>% rev
tidy_df$y <- factor(tidy_df$bs,levels = lev) %>% as.numeric
return(tidy_df)
}
##' @importFrom utils getFromNamespace
##' @importFrom ggplot2 aes_
##' @importFrom ggplot2 geom_text
maf_plot <- function(d, ref,
positive_color = "#a9c9d4",
negative_color = "#ffa389") {
geom_rrect <- getFromNamespace("geom_rrect","statebins")
## lower panel: alignment blocks drawn on reference coordinates
p_maf_aln <- ggplot(data = d) +
geom_rrect(mapping=aes_(xmin =~ ref_start,
xmax =~ ref_end,
ymin =~ y - 0.3,
ymax =~ y + 0.3,
fill =~ strand)) +
geom_rrect(data = d,
mapping=aes_(xmin =~ ref_start,
xmax =~ ref_end,
ymin =~ max(y) + 1 - 0.3,
ymax =~ max(y) + 1 + 0.3),
fill = "#a9c9d4",color = "black") +
scale_y_continuous(breaks = c(d$y,max(d$y + 1)),labels = c(d$bs, ref)) +
scale_fill_manual(breaks = c("+","-"),
values = c(positive_color,negative_color)) +
theme_void() +
theme(axis.text.x = element_text(),
axis.text.y = element_text(),
panel.grid.minor.y = element_blank(),
panel.grid.major.y = element_line(color = "grey"))
## upper panel: block positions on each non-reference genome
aim <- d[d$src != ref, ]
p_maf_genomePos <- ggplot(data = aim) +
geom_rrect(mapping = aes_(xmin =~ start,
xmax =~ end_gap,
ymin =~ merge_y - 0.3,
ymax =~ merge_y + 0.3,
fill =~ strand),
color = "black",
size = 0.5,
alpha = 0.8,
show.legend = FALSE) +
scale_y_continuous(breaks = unique(aim$merge_y),
labels = unique(aim$src)) +
scale_fill_manual(breaks = c("+","-"),
values = c(positive_color,negative_color)) +
theme_void() + theme(panel.grid.major.y = element_line(color = "grey"),
axis.text.x = element_text(),
axis.text.y = element_text(),
strip.text = element_blank()) +
geom_text(aes_(x =~ (start + end_gap)/2,
y =~ merge_y,label =~ label),
size = 3) +
facet_wrap(~src, scales = "free", ncol = 1)
return(list(p_maf_genomePos, p_maf_aln))
}
#assign facet number to blocks
facet_maf <- function(mafData, field) {
if(min(mafData$block_number) > 1){
pos_reset <- mafData$block_number - min(mafData$block_number) + 1
#pos_reset[pos_reset == 0] <- 1
}else {
pos_reset <- mafData$block_number
}
mafData$facet <- pos_reset %/% field
mafData[(pos_reset %% field) == 0, "facet"] <-
mafData[(pos_reset %% field) == 0, "facet"] - 1
return(mafData)
}
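# Illustrative sketch, not part of the package API: assuming block numbers are
# positive integers, the facet assignment in facet_maf() appears to reduce to
# a zero-based integer division after shifting the smallest block number to 0.
# The helper name `.facet_index_sketch` is hypothetical.
.facet_index_sketch <- function(block_number, field) {
    # shift so the first block is 0, then cut into runs of `field` blocks
    (block_number - min(block_number)) %/% field
}
# Usage: .facet_index_sketch(1:6, field = 3) gives 0 0 0 1 1 1,
# i.e. blocks 1-3 fall in facet 0 and blocks 4-6 in facet 1.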
================================================
FILE: R/ggmsa.R
================================================
##' Plot multiple sequence alignment using ggplot2 with multiple color schemes
##' supported.
##'
##'
##' @title ggmsa
##' @param msa Multiple aligned sequence files or objects representing either
##' nucleotide sequences or AA sequences.
##' @param start a numeric vector. Start position to plot.
##' @param end a numeric vector. End position to plot.
##' @param font font families, possible values are 'helvetical', 'mono',
##' 'DroidSansMono' and 'TimesNewRoman'. Default is 'helvetical'.
##' If font = NULL, only the background tiles are plotted.
##' @param color a color scheme. One of 'Clustal', 'Chemistry_AA',
##' 'Shapely_AA', 'Zappo_AA', 'Taylor_AA', 'LETTER', 'CN6', 'Chemistry_NT',
##' 'Shapely_NT', 'Zappo_NT', 'Taylor_NT'. Default is 'Chemistry_AA'.
##' @param custom_color a data frame with two columns named "names" and
##' "color", used to customize the color scheme.
##' @param char_width a numeric value specifying the character width in
##' the range of 0 to 1. Default is 0.9.
##' @param by_conservation a logical value. The most conserved regions have
##' the brightest colors.
##' @param none_bg a logical value indicating whether background should be
##' displayed. Default is FALSE.
##' @param position_highlight a numeric vector of the positions to be
##' highlighted.
##' @param seq_name a logical value indicating whether sequence names
##' should be displayed. Default is NULL, in which case sequence names
##' are displayed when 'font = NULL' but not when a font is specified.
##' If 'seq_name = TRUE', sequence names are always displayed;
##' if 'seq_name = FALSE', they are never displayed.
##' @param border a character string. The border color.
##' @param consensus_views a logical value indicating whether to enable
##' the consensus view.
##' @param use_dot a logical value. Displays characters as dots instead
##' of fading their color in the consensus view.
##' @param disagreement a logical value. Displays characters that
##' disagree with the consensus (excludes ambiguous disagreements).
##' @param ignore_gaps a logical value. If TRUE, gaps in a column
##' are treated as if that row didn't exist.
##' @param ref a character string specifying the reference sequence, which
##' should be one of the input sequences when 'consensus_views' is TRUE.
##' @param show.legend logical. Should this layer be included in the legends?
##' @return ggplot object
##' @importFrom tidyr gather
##' @importFrom ggplot2 ggplot
##' @importFrom ggplot2 aes_
##' @importFrom ggplot2 theme
##' @importFrom ggplot2 theme_minimal
##' @importFrom ggplot2 geom_tile
##' @importFrom ggplot2 geom_polygon
##' @importFrom ggplot2 xlab
##' @importFrom ggplot2 ylab
##' @importFrom ggplot2 coord_fixed
##' @importFrom ggplot2 geom_point
##' @importFrom ggplot2 element_blank
##' @importFrom magrittr %>%
##' @importFrom stats setNames
##' @importFrom grid unit
##' @examples
##' #plot multiple sequences by loading fasta format
##' fasta <- system.file("extdata", "sample.fasta", package = "ggmsa")
##' ggmsa(fasta, 164, 213, color="Chemistry_AA")
##'
##'\dontrun{
##' #XMultipleAlignment objects can be used as input in the 'ggmsa'
##' AAMultipleAlignment <- Biostrings::readAAMultipleAlignment(fasta)
##' ggmsa(AAMultipleAlignment, 164, 213, color="Chemistry_AA")
##'
##' #XStringSet objects can be used as input in the 'ggmsa'
##' AAStringSet <- Biostrings::readAAStringSet(fasta)
##' ggmsa(AAStringSet, 164, 213, color="Chemistry_AA")
##'
##' #Xbin objects from 'seqmagick' can be used as input in the 'ggmsa'
##' AAbin <- seqmagick::fa_read(fasta)
##' ggmsa(AAbin, 164, 213, color="Chemistry_AA")
##' }
##' @export
##' @author Guangchuang Yu
ggmsa <- function(msa,
start = NULL,
end = NULL,
font = "helvetical",
color = "Chemistry_AA",
custom_color = NULL,
char_width = 0.9,
none_bg = FALSE,
by_conservation = FALSE,
position_highlight = NULL,
seq_name = NULL,
border = NULL,
consensus_views = FALSE,
use_dot = FALSE,
disagreement = TRUE,
ignore_gaps = FALSE,
ref = NULL,
show.legend = FALSE) {
data <- tidy_msa(msa, start = start, end = end)
ggplot() + geom_msa(data, font = font,
color = color,
custom_color = custom_color,
char_width = char_width,
none_bg = none_bg,
by_conservation = by_conservation,
position_highlight = position_highlight,
seq_name = seq_name,
border = border,
consensus_views = consensus_views,
use_dot = use_dot,
disagreement = disagreement,
ignore_gaps = ignore_gaps,
ref = ref,
show.legend = show.legend) +
theme_msa()
}
================================================
FILE: R/import-functions.R
================================================
##' @importFrom utils globalVariables
globalVariables(".")
globalVariables("fre") #geom_GC.R:
globalVariables("read.delim") #arc.R
globalVariables(c("name", "position_adj", "y_adj")) #SeqBundles.R
================================================
FILE: R/method-plot.R
================================================
##' plot method for SeqDiff object
##'
##' @name plot
##' @rdname plot-methods
##' @exportMethod plot
##' @aliases plot,SeqDiff,ANY-method
##' @docType methods
##' @param x SeqDiff object
##' @param width bin width
##' @param title plot title
##' @param xlab xlab
##' @param by one of 'bar' and 'area'
##' @param fill fill color of upper part of the plot
##' @param colors color of lower part of the plot
##' @param xlim limits of x-axis
##' @return plot
##' @importFrom ggplot2 ggtitle
##' @importFrom ggplot2 xlim
##' @importFrom ggplot2 ggplot_gtable
##' @importFrom ggplot2 ggplot_build
##' @importFrom grid unit.pmax
##' @importFrom aplot plot_list
##' @author guangchuang yu
##' @examples
##' fas <- list.files(system.file("extdata", "GVariation", package="ggmsa"),
##' pattern="fas", full.names=TRUE)
##' x1 <- seqdiff(fas[1], reference=1)
##' plot(x1)
setMethod("plot", signature(x="SeqDiff"),
function(x, width=50, title="auto",
xlab = "Nucleotide Position",
by="bar", fill="firebrick",
colors=c(A="#ff6d6d", C="#769dcc", G="#f2be3c", T="#74ce98"),
xlim = NULL) {
nn <- names(x@sequence)
if (is.null(title) || is.na(title)) {
title <- ""
} else if (title == "auto") {
title <- paste(nn[-x@reference],
"nucleotide differences relative to",
nn[x@reference])
}
p1 <- plot_difference_count(x@diff, width, by=by, fill=fill) +
ggtitle(title)
p2 <- plot_difference(x@diff, colors=colors, xlab)
if (!is.null(xlim)) {
p1 <- p1 + xlim(xlim)
p2 <- p2 + xlim(xlim)
}
plot_list(p1, p2, ncol=1, heights=c(.7, .4))
}
)
##' @importFrom ggplot2 ggplot
##' @importFrom ggplot2 aes_
##' @importFrom ggplot2 geom_segment
##' @importFrom ggplot2 xlab
##' @importFrom ggplot2 ylab
##' @importFrom ggplot2 scale_y_continuous
##' @importFrom ggplot2 theme_minimal
##' @importFrom ggplot2 theme
##' @importFrom ggplot2 element_blank
##' @importFrom ggplot2 scale_color_manual
plot_difference <- function(x, colors, xlab="Nucleotide Position") {
x$difference <- x$difference %>% toupper
yy = 4:1
names(yy) = c("A", "C", "G", "T")
x$y <- yy[x$difference]
n <- sum(is.na(x$y))
if (n > 0) {
message(n, " sites contain deletions or ambiguous bases, ",
"which will be ignored in current implementation...")
}
x <- x[!is.na(x$y),]
p <- ggplot(x, aes_(x=~position, y=~y, color=~difference))
p + geom_segment(aes_(x=~position, xend=~position, y=~y, yend=~y+.8)) +
xlab(xlab) + ylab(NULL) +
scale_y_continuous(breaks=yy, labels=names(yy)) +
theme_minimal() +
theme(legend.position="none")+
theme(axis.text.x=element_blank(), axis.ticks.x = element_blank()) +
scale_color_manual(values=colors)
}
##' @importFrom ggplot2 geom_col
##' @importFrom ggplot2 geom_area
##' @importFrom ggplot2 theme_bw
plot_difference_count <- function(x, width, by = 'bar', fill='red') {
by <- match.arg(by, c("bar", "area"))
if (by == 'bar') {
geom <- geom_col(fill=fill, width=width)
keep0 <- FALSE
} else if (by == "area") {
geom <- geom_area(fill=fill)
keep0 <- TRUE
}
d <- nucleotide_difference_count(x, width, keep0)
p <- ggplot(d, aes_(x=~position, y=~count))
p + geom + xlab(NULL) + ylab("Difference") + theme_bw()
}
================================================
FILE: R/method-show.R
================================================
##' show method
##'
##'
##' @name show
##' @docType methods
##' @rdname show-methods
##' @title show method
##' @param object SeqDiff object
##' @return message
##' @importFrom methods show
##' @exportMethod show
##' @aliases SeqDiff-class
##' show,SeqDiff-method
##' @usage show(object)
##' @examples
##' fas <- list.files(system.file("extdata", "GVariation", package="ggmsa"),
##' pattern="fas", full.names=TRUE)
##' x1 <- seqdiff(fas[1], reference=1)
##' x1
setMethod("show",signature(object="SeqDiff"),
function(object) {
message("sequence differences of ",
paste0(names(object@sequence), collapse=" and "),
'\n')
d <- object@diff$difference %>% table %>% as.data.frame
message(sum(d$Freq), " ", "sites differ:\n")
freq <- d[,2]
names(freq) <- d[,1]
print(freq)
})
================================================
FILE: R/methods-diff.R
================================================
##' @method diff SeqDiff
##' @export
diff.SeqDiff <- function(x, ...) {
x@diff
}
================================================
FILE: R/methods-ggplot_add.R
================================================
##' @method ggplot_add seqlogo
##' @export
ggplot_add.seqlogo <- function(object, plot, object_name) {
msaData <- plot$layers[[1]]$data
logo_tidyData <- msa2tidy(msaData)
logo_font <- object$font
logo_color <- object[["color"]]
adaptive <- object$adaptive
top <- object$top
logo_custom_color <- object[["custom_color"]]
show.legend <- object$show.legend
ly_logo <- geom_logo(data = logo_tidyData,
font = logo_font,
color = logo_color,
adaptive = adaptive,
top = top,
custom_color = logo_custom_color,
show.legend = show.legend)
ggplot_add(ly_logo, plot, object_name)
}
##' @method ggplot_add seed
##' @export
ggplot_add.seed <- function(object, plot, object_name) {
msaData <- plot$layers[[1]]$data
seed_tidyData <- msa2tidy(msaData)
seed <- object$seed
star <- object$star
ly <- geom_seed1(seed_tidyData, seed, star)
ggplot_add(ly, plot, object_name)
}
##' @method ggplot_add GCcontent
##' @export
ggplot_add.GCcontent <- function(object, plot, object_name) {
msaData <- plot$layers[[1]]$data
show.legend <- object$show.legend
GC_tidyData <- msa2tidy(msaData)
ly <- geom_GC1(GC_tidyData, show.legend = show.legend )
ggplot_add(ly, plot, object_name)
}
##' @importFrom ggplot2 facet_wrap
##' @importFrom ggplot2 ggplot_add
##' @importFrom ggplot2 scale_x_continuous
##' @importFrom ggplot2 coord_cartesian
##' @importFrom ggplot2 geom_blank
##' @method ggplot_add facet_msa
##' @export
ggplot_add.facet_msa <- function(object, plot, object_name){
msaData <- plot$layers[[1]]$data
field <- object$field
facetData <- facet_data(msaData, field)
##update data
plot$layers[[1]]$data <- facetData #ly_bg
if (length(plot$layers) > 1){
plot$layers[[2]]$data <- facetData #ly_label
}
region <- diff(range(facetData$position))
xl_scale <- facet_scale(facetData, field)
if (region %% field == 0) {
plot + facet_wrap(.~facet, ncol = 1, scales = "free_x") +
scale_x_continuous(expand = c(0,0),
breaks = xl_scale,
labels = xl_scale) +
coord_cartesian()
}else {
max_pos <- facetData$position %>% max
min_pos <- facetData$position %>% min
max_facet <- facetData$facet %>% max
minpos_maxfacet <- facetData[facetData$facet ==
max_facet,"position"] %>% min
expand_pos <- (region %/% field + 1) * field + min_pos
dummy <- data.frame(x = c(minpos_maxfacet, expand_pos),
facet = max_facet)
plot +
facet_wrap(.~facet, ncol = 1, scales = "free_x") +
geom_blank(aes_(x = ~x), dummy, inherit.aes = FALSE) +
scale_x_continuous(expand = c(0,0),
breaks = xl_scale,
labels = xl_scale) +
coord_cartesian()
}
}
##' @method ggplot_add msaBar
##' @importFrom aplot insert_top
##' @importFrom ggplot2 coord_cartesian
##' @export
ggplot_add.msaBar <- function(object, plot, object_name){
msaData <- plot$layers[[1]]$data
bar_tidyData <- msa2tidy(msaData)
ly <- ly_bar(bar_tidyData)
p_bar <- ggplot() + ly + bar_theme(bar_tidyData)
plot <- plot + coord_cartesian()
p_bar %>% insert_top(plot, height = 3)
}
##' @method ggplot_add nucleotideeHelix
##' @export
ggplot_add.nucleotideeHelix <- function(object, plot, object_name){
msa_data <- plot$layers[[1]]$data
tidy_data <- msa2tidy(msa_data)
seq_numbers <- levels(tidy_data$name) %>% length
helix_data <- object$helix_data
color_by <- object$color_by
overlap <- object$overlap
if(is.data.frame(helix_data)) {
helix_tidy <- tidy_helix(helix_data, color_by = color_by)
}else {
helix_tidy <- tidy_list_helix(helix_data, color_by = color_by)
}
ly <- layer_helix(helix_data = helix_tidy,
overlap = overlap,
seq_numbers = seq_numbers)
ggplot_add(ly, plot, object_name)
}
================================================
FILE: R/msa_data.R
================================================
##' This function takes a tidy alignment data frame and assigns a color to
##' each residue (amino acid or nucleotide) according to the selected
##' color scheme.
##'
##'
##' @title msa_data
##' @param tidymsa sequence alignment as a data frame, generated by tidy_msa().
##' @param font font families, possible values are 'helvetical', 'mono',
##' 'DroidSansMono' and 'TimesNewRoman'. Default is 'helvetical'.
##' If font = NULL, only the background box will be drawn.
##' @param color a color scheme. One of 'Clustal', 'Chemistry_AA',
##' 'Shapely_AA', 'Zappo_AA', 'Taylor_AA', 'LETTER', 'CN6', 'Chemistry_NT',
##' 'Shapely_NT', 'Zappo_NT', 'Taylor_NT', 'Hydrophobicity'.
##' Default is 'Chemistry_AA'.
##' @param custom_color a data frame with two columns named "names" and
##' "color", used to customize the color scheme.
##' @param char_width a numeric value specifying the character
##' width in the range of 0 to 1. Default is 0.9.
##' @param by_conservation a logical value. The most conserved
##' regions have the brightest colors.
##' @param consensus_views a logical value indicating whether to enable
##' the consensus view.
##' @param use_dot a logical value. Displays characters as dots
##' instead of fading their color in the consensus view.
##' @param disagreement a logical value. Displays characters that
##' disagree with the consensus (excludes ambiguous disagreements).
##' @param ignore_gaps a logical value. If TRUE, gaps
##' in a column are treated as if that row didn't exist.
##' @param ref a character string specifying the reference sequence,
##' which should be one of the input sequences when 'consensus_views' is TRUE.
##' @return A data frame
##' @examples
##' fasta <- system.file("extdata/sample.fasta", package="ggmsa")
##' aln <- tidy_msa(fasta, start = 20, end = 120)
##' data <- msa_data(aln,
##' font = "helvetical",
##' color = 'Chemistry_AA' )
## @export
##' @noRd
##' @author Guangchuang Yu, Lang Zhou
msa_data <- function(tidymsa, font = "helvetical",
color = "Chemistry_AA",
custom_color = NULL,
char_width = 0.9,
by_conservation = FALSE,
consensus_views = FALSE,
use_dot = FALSE,
disagreement = TRUE,
ignore_gaps = FALSE,
ref = NULL) {
if (is.null(custom_color)) {
color <- match.arg(color, c("Clustal", "Chemistry_AA", "Shapely_AA",
"Zappo_AA", "Taylor_AA","Chemistry_NT",
"Shapely_NT", "Zappo_NT", "Taylor_NT",
"LETTER", "CN6", "Hydrophobicity" ))
}
y <- tidymsa
## add color
if (color == "Clustal"){
y <- color_Clustal(y)
}else {
if (consensus_views) {
consensus <- get_consensus(y, #extract a consensus/ref sequence
ignore_gaps = ignore_gaps,
ref = ref)
tc <- color_scheme(y, color) %>% #assigning color for other seq.
tidy_color(consensus, disagreement, ref = ref)# tidy colors
y <- color_scheme(consensus, color) %>% #assigning color for con/ref
rbind(tc) #add consensus sequence
if (use_dot){
y[is.na(y$color), "character"] <- "."
}else {
y$font_color <- "#000000"
y[is.na(y$color), "font_color"] <- "#aaacaf"
y[is.na(y$color), "color"] <- "#ffffff"
}
}else {
y <- color_scheme(y, color, custom_color)
}
}
if (by_conservation){
y <- color_visibility(y)
}
if (is.null(font)) {
return(y)
}
## calling internal polygons
font_f <- font_fam[[font]]
#debug using'as.character()'
data_sp <- font_f[as.character(unique(y$character))]
## To adapt to tree data
if (!'name' %in% names(y) && !consensus_views) {
if ('label' %in% names(y)) {
names(y)[names(y) == 'label'] <- "name"
}else {
stop("unknown sequence name...")
}
}
if(!is.factor(y$name) && !consensus_views){
lev <- unique(data.frame(y[,c("name","y")]))
# y is the order of the nodes in the tree
lev <- lev[order(lev$y), "name"]
y$name <- factor(y$name, levels = lev)
} else if(consensus_views) {
y$name <- order_name(y$name,
consensus_views = consensus_views,
ref = ref)
}
y$ypos <- as.numeric(y$name)
# for ggtreeExtra
if ("new_position" %in% colnames(y)) {
scale_n <- 5 * length(unique(y$name))/diff(range(y$new_position))
char_width <- char_width *
diff(range(y$new_position))/diff(range(y$position))
}
yy <- lapply(seq_len(nrow(y)), function(i) {
d <- y[i, ]
dd <- data_sp[[d$character]]
if(d$character == "."){ # '.' without zooming
if ("new_position" %in% colnames(d)){
dd$x <- dd$x - min(dd$x) + d$new_position - diff(range(dd$x))/2
}else{
dd$x <- dd$x - min(dd$x) + d$position - diff(range(dd$x))/2
}
dd$y <- dd$y - min(dd$y) + d$ypos - diff(range(dd$y))/2
}else {# other characters
char_scale <- diff(range(dd$x))/diff(range(dd$y))#equal proportion
#y_width = char_width, x-width scaled proportionally
if(diff(range(dd$x)) <= diff(range(dd$y))) {
dd$x <- dd$x * (char_width * char_scale)/diff(range(dd$x))
# for ggtreeExtra
if ("new_position" %in% colnames(d)){
dd$y <- (dd$y * char_width)/diff(range(dd$y)) * scale_n
dd$x <- dd$x - min(dd$x) + d$new_position -
(char_width * char_scale)/2
dd$y <- dd$y - min(dd$y) + d$ypos - scale_n * char_width/2
}else{
dd$y <- (dd$y * char_width)/diff(range(dd$y))
dd$x <- dd$x - min(dd$x) + d$position -
(char_width * char_scale)/2
dd$y <- dd$y - min(dd$y) + d$ypos - char_width/2
}
}else{#x_width = char_width, y-width scaled proportionally
dd$x <- dd$x * char_width/diff(range(dd$x))
# for ggtreeExtra
if ("new_position" %in% colnames(d)){
dd$y <- dd$y *
char_width/(diff(range(dd$y)) * char_scale) * scale_n
dd$x <- dd$x - min(dd$x) + d$new_position - char_width/2
dd$y <- dd$y - min(dd$y) + d$ypos -
(scale_n * char_width/char_scale)/2
}else{
dd$y <- dd$y * char_width/(diff(range(dd$y)) * char_scale)
dd$x <- dd$x - min(dd$x) + d$position - char_width/2
dd$y <- dd$y - min(dd$y) + d$ypos -
(char_width/char_scale)/2
}
}
}
cn <- colnames(d)
cn <- cn[!cn %in% c('x','y', 'ypos')]
for (nn in cn) {
dd[[nn]] <- d[[nn]]
}
dd$group <- paste0("V", d$position, "L", d$ypos)
return(dd)
})
ydf <- do.call(rbind, yy)
colnames(ydf)[colnames(ydf) == 'y'] <- 'yy'
ydf$y <- as.numeric(ydf$name)
ydf <- cbind(label = ydf$name, ydf)
return(ydf)
}
##' Convert msa file/object to tidy data frame.
##'
##'
##' @title tidy_msa
##' @param msa multiple sequence alignment file or sequence object in
##' DNAStringSet, RNAStringSet, AAStringSet, BStringSet, DNAMultipleAlignment,
##' RNAMultipleAlignment, AAMultipleAlignment, DNAbin or AAbin
##' @param start start position to extract subset of alignment
##' @param end end position to extract subset of alignment
##' @return tibble data frame
##' @export
##' @examples
##' fasta <- system.file("extdata", "sample.fasta", package = "ggmsa")
##' aln <- tidy_msa(msa = fasta, start = 10, end = 100)
##' @author Guangchuang Yu
tidy_msa <- function(msa, start = NULL, end = NULL) {
if(inherits(msa, "character") && length(msa) > 1) {
aln <- msa
}else {
aln <- prepare_msa(msa)
}
alnmat <- lapply(seq_along(aln), function(i) {
##Preventing function collisions
base::strsplit(as.character(aln[[i]]), '')[[1]]
}) %>% do.call('rbind', .)
## for DNAbin and AAbin
alndf <- as.data.frame(alnmat, stringsAsFactors = FALSE)
if(unique(names(aln)) %>% length == length(aln)) {
alndf$name = names(aln)
}else{
stop("Sequences must have unique names")
}
cn = colnames(alndf)
cn <- cn[!cn %in% "name"]
df <- gather(alndf, "position", "character", cn)
y <- df
y$position = as.numeric(sub("V", "", y$position))
y$character = toupper(y$character)
y$name = factor(y$name, levels=rev(names(aln)))
if (is.null(start)) start <- min(y$position)
if (is.null(end)) end <- max(y$position)
y <- y[y$position >=start & y$position <= end, ]
return(y)
}
##' This function converts the msa_data to the tidy data.
##'
##' @param msaData sequence alignment data generated by msa_data().
##' @noRd
msa2tidy <- function(msaData) {
if ("order" %in% names(msaData)) {
msaData <- msaData[msaData$order == 1,]
}
df_tidy <- data.frame(name = msaData$name,
position = msaData$position,
character = msaData$character)
df_tidy$character <- as.character(df_tidy$character)
return(df_tidy)
}
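# Illustrative sketch, not part of the package API: the long ("tidy") layout
# produced by tidy_msa() can be pictured with base R alone. It assumes the
# input is a named character vector of equal-length aligned sequences; the
# helper name `.tidy_msa_sketch` is hypothetical, and the row order differs
# from the one tidyr::gather() yields in tidy_msa().
.tidy_msa_sketch <- function(aln) {
    chars <- strsplit(unname(aln), "")      # one character vector per sequence
    n <- lengths(chars)
    stopifnot(length(unique(n)) == 1)       # aligned sequences share one width
    data.frame(name = rep(names(aln), each = n[1]),
               position = rep(seq_len(n[1]), times = length(aln)),
               character = toupper(unlist(chars)),
               stringsAsFactors = FALSE)
}
# Usage: .tidy_msa_sketch(c(s1 = "ac-g", s2 = "acTg")) returns 8 rows,
# one per (sequence, column) pair, with upper-cased characters.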
================================================
FILE: R/pp_interactive.R
================================================
make_gap <- function(gap, previous_seq) {
gap_df <- previous_seq[rep(1, each=gap),]
gap_start <- max(previous_seq$position) + 1
gap_df$position <- gap_start : (gap_start + gap - 1 )
gap_df$character <- "-"
if("pos_previous" %in% names(gap_df)) {
gap_df$pos_previous <- 0
}
return(gap_df)
}
##' merge two MSA
##'
##' @title merge_seq
##' @param previous_seq previous MSA
##' @param subsequent_seq subsequent MSA
##' @param gap gap length
##' @param adjust_name logical value, whether to merge the sequence names
##' @return tidy MSA data frame
##' @export
##' @author Lang Zhou
merge_seq <- function(previous_seq, gap, subsequent_seq, adjust_name = TRUE) {
name_pre <- levels(previous_seq$name)
name_subse <- levels(subsequent_seq$name)
if(length(name_pre) != length(name_subse)) {
stop("previous_seq and subsequent_seq must contain the same number of sequences")
}
gap_df <- make_gap(gap = gap, previous_seq = previous_seq)
subsequent_seq$position <-
subsequent_seq$position - min(subsequent_seq$position) + 1
subsequent_seq$position <-
subsequent_seq$position + max(previous_seq$position) + gap
t_merge <- rbind(previous_seq,gap_df,subsequent_seq)
if (adjust_name) {
rownames(t_merge) <- seq(nrow(t_merge))
names(t_merge)[1] <- "name_previous"
t_merge$name <- ""
for(i in seq(length(name_pre))) {
t_merge[t_merge$name_previous %in% c(name_pre[i], name_subse[i]),"name"] <-
paste0(name_pre[i],"-", name_subse[i])
}
t_merge$name <- factor(t_merge$name)
}
return(t_merge)
}
##' tidy protein-protein interaction position data
##'
##' @title tidy_hdata
##' @param gap gap length
##' @param inter protein-protein interaction position data
##' @param previous_seq previous MSA
##' @param subsequent_seq subsequent MSA
##' @importFrom R4RNA as.helix
##' @return helix data
##' @export
##' @author Lang Zhou
tidy_hdata <- function(gap, inter, previous_seq,subsequent_seq) {
inter$j <- inter$Res.no..2 -
min(subsequent_seq$position) +
max(previous_seq$position) + gap + 1
hdata <- data.frame(i = inter$Res.no.1,
j = inter$j,
length = 1,
value = NA,
colour = "blue")
hdata <- as.helix(hdata)
return(hdata)
}
##' reset MSA position
##'
##' @title reset_pos
##' @param seq_df MSA data
##' @return data frame
##' @export
##' @author Lang Zhou
reset_pos <- function(seq_df) {
names(seq_df)[2] <- "pos_previous"
seq_df$position <- ""
uni <- unique(seq_df$pos_previous)
# seq_along() stays correct even when only one unique position exists
for(i in seq_along(uni)) {
seq_df[seq_df$pos_previous == uni[i],"position"] <- i
}
seq_df$position <- as.numeric(seq_df$position)
return(seq_df)
}
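# Illustrative sketch, not part of the package API: renumbering positions by
# order of first appearance, as reset_pos() does, can also be expressed with
# base::match(). The helper name `.renumber_sketch` is hypothetical.
.renumber_sketch <- function(pos) {
    # each value is replaced by the index of its first occurrence among
    # the unique values
    match(pos, unique(pos))
}
# Usage: .renumber_sketch(c(5, 5, 9, 12, 9)) gives 1 1 2 3 2.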
##' reset hdata data position
##'
##' @title simplify_hdata
##' @param hdata data from tidy_hdata()
##' @param sim_msa MSA data frame
##' @return data frame
##' @export
##' @author Lang Zhou
simplify_hdata <- function(hdata, sim_msa) {
new_hdata <- lapply(seq(nrow(hdata)), function(a) {
n <- hdata[a,]
n$pre_i <- n$i
n$i <- sim_msa[sim_msa$pos_previous == n$i,"position"] %>% unique
return(n)
}) %>% do.call("rbind",.)
new_hdata <- lapply(seq(nrow(new_hdata)), function(a) {
n <- new_hdata[a,]
n$pre_j <- n$j
n$j <- sim_msa[sim_msa$pos_previous == n$j,"position"] %>% unique
return(n)
}) %>% do.call("rbind",.)
new_hdata <- as.helix(new_hdata)
return(new_hdata)
}
================================================
FILE: R/prepare_fasta.R
================================================
##' preparing multiple sequence alignment
##'
##' This function supports both NT and AA sequences. It supports multiple
##' input formats such as "DNAStringSet", "BStringSet", "AAStringSet",
##' "DNAbin", "AAbin" and a file path.
##' @title prepare_msa
##' @param msa a multiple sequence alignment file or object
##' @return BStringSet based object
##' @importFrom Biostrings DNAStringSet
##' @importFrom Biostrings RNAStringSet
##' @importFrom Biostrings AAStringSet
##' @importFrom methods missingArg
##' @importFrom seqmagick fa_read
## @export
##' @author Lang Zhou and Guangchuang Yu
##' @noRd
prepare_msa <- function(msa) {
if (missingArg(msa)) {
stop("no input...")
} else if (inherits(msa, "character")) {
msa <- fa_read(msa)
} else if (!inherits(msa, supported_msa_class)) {
stop("multiple sequence alignment object not supported...")
}
res <- switch(class(msa),
DNAbin = DNAbin2DNAStringSet(msa),
AAbin = AAbin2AAStringSet(msa),
DNAMultipleAlignment = DNAStringSet(msa),
RNAMultipleAlignment = RNAStringSet(msa),
AAMultipleAlignment = AAStringSet(msa),
msa ## DNAStringSet, RNAStringSet, AAStringSet, BStringSet
)
return(res)
}
DNAbin2DNAStringSet <- function(msa) {
seqs <- vapply(seq_along(msa),
function(i) paste0(as.character(msa[i]) %>% unlist,
collapse=''),
character(1))
names(seqs) <- names(msa)
switch(class(msa),
DNAbin = DNAStringSet(seqs),
AAbin = AAStringSet(seqs))
}
AAbin2AAStringSet <- DNAbin2DNAStringSet
supported_msa_class <- c("DNAStringSet",
"RNAStringSet",
"AAStringSet",
"BStringSet",
"DNAMultipleAlignment",
"RNAMultipleAlignment",
"AAMultipleAlignment",
"DNAbin",
"AAbin")
================================================
FILE: R/read_maf.R
================================================
##' read 'multiple alignment format'(MAF) file
##'
##' @title read_maf
##' @param multiple_alignment_format a multiple alignment format(MAF) file
##' @return data frame
##' @export
##' @author Lang Zhou
read_maf <- function(multiple_alignment_format) {
line <- readLines(multiple_alignment_format)
head <- sapply(line, function(i) substring(i,1,1))
rm(line) # the original lines are kept as names(head)
#remove header/comment lines starting with "#"
head <- head[head != "#"]
#split block
blank <- which(head == "")
block_ls <- lapply(seq(blank), function(i) {
if (blank[i] == min(blank)) {
x <- names(head)[1:blank[i]]
}else {
x <- names(head)[blank[i-1]:blank[i]]
}
return(x)
})
names(block_ls) <- paste0("block_",seq(length(block_ls)))
#extract lines starting with "s"
s_block <- lapply(seq(length(block_ls)), function(i) {
blocki <- block_ls[[i]]
line_s <- blocki[sapply(blocki, function(j) substring(j,1,1)) == "s"]
})
names(s_block) <- names(block_ls)
#get a MAF df
s_name <- c("type", "src", "start", "size", "strand", 'src_size', "text")
seq_df <-lapply(seq(length(s_block)), function(i) {
blocki <- s_block[[i]]
seq_df <- lapply(seq(length(blocki)), function(j) {
x <- blocki[[j]]
#extract all columns
x <- strsplit(x, " ") %>% unlist
x1 <- x[sapply(x, nchar) > 0]
#convert to data frame
seq <- t(as.matrix(x1)) %>% as.data.frame()
names(seq) <- s_name
seq[,c("start","size",'src_size')] <-
seq[,c("start","size",'src_size')] %>%as.numeric()
seq$size_gap <- nchar(seq$text)
seq$end <- seq$start + seq$size
seq$end_gap <- seq$start + seq$size_gap
seq$block <- names(s_block[i])
return(seq)
})%>% do.call("rbind", .)
return(seq_df)
}) %>% do.call("rbind", .)
return(seq_df)
}
================================================
FILE: R/seqdiff.R
================================================
##' calculate difference of two aligned sequences
##'
##'
##' @title seqdiff
##' @param fasta fasta file
##' @param reference which sequence serves as the reference, 1 or 2
##' @return SeqDiff object
##' @export
##' @importFrom Biostrings readBStringSet
##' @importClassesFrom Biostrings BStringSet
##' @importFrom methods new
##' @author guangchuang yu
##' @examples
##' fas <- list.files(system.file("extdata", "GVariation", package="ggmsa"),
##' pattern="fas", full.names=TRUE)
##' seqdiff(fas[1], reference=1)
seqdiff <- function(fasta, reference=1) {
sequence <- readBStringSet(fasta)
if (length(sequence) != 2 || length(unique(width(sequence))) != 1) {
stop("fasta should contain 2 aligned sequences...")
}
diff <- nucleotide_difference(sequence, reference)
new("SeqDiff",
file = fasta,
sequence = sequence,
reference = reference,
diff = diff)
}
##' @importFrom magrittr %>%
##' @importFrom Biostrings toString
##' @importFrom Biostrings width
nucleotide_difference <- function(x, reference=1) {
n <- width(x[1])
nn <- seq_len(n)
s1 <- x[1] %>% toString %>% substring(nn, nn)
s2 <- x[2] %>% toString %>% substring(nn, nn)
pos <- which(s1 != s2)
if (reference == 1) {
diff <- s2[pos]
} else {
diff <- s1[pos]
}
return(data.frame(position = pos,
difference = diff,
stringsAsFactors = FALSE))
}
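# Illustrative sketch, not part of the package API: the position-wise
# comparison in nucleotide_difference() can be reproduced on plain strings
# with base R. It assumes two equal-length aligned strings with the first
# one as the reference; the helper name `.seq_diff_sketch` is hypothetical.
.seq_diff_sketch <- function(s1, s2) {
    a <- strsplit(s1, "")[[1]]
    b <- strsplit(s2, "")[[1]]
    stopifnot(length(a) == length(b))
    pos <- which(a != b)                    # columns where the two differ
    data.frame(position = pos,
               difference = b[pos],         # character in the non-reference
               stringsAsFactors = FALSE)
}
# Usage: .seq_diff_sketch("ACGT", "AGGA") reports positions 2 and 4 with
# differences "G" and "A".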
##' @importFrom dplyr group_by
##' @importFrom dplyr summarize
##' @importFrom dplyr select
##' @importFrom dplyr n
nucleotide_difference_count <- function(x, width=50, keep0=FALSE) {
n <- max(x$position)
bin <- rep(seq_len(ceiling(n/width)), each=width)
position <- c(seq_len(n)[!duplicated(bin)], n)
x$bin <- bin[x$position]
y <- x %>% group_by(bin) %>%
summarize(position=min(position), count = n()) %>%
select(-bin)
y$position <- position[findInterval(y$position, position)]
if (keep0) {
itv <- seq(1, n, width)
yy <- data.frame(position = itv[!itv %in% y$position],
count = 0)
y <- rbind(y, yy)
y <- y[order(y$position, decreasing=FALSE),]
}
return(y)
}
================================================
FILE: R/seqlogo.R
================================================
##' Plot a sequence logo for an MSA based on 'ggplot2'
##' @title seqlogo
##' @param msa Multiple sequence alignment file or object for representing
##' either nucleotide sequences or peptide sequences.
##' @param start Start position to plot.
##' @param end End position to plot.
##' @param font font families, possible values are 'helvetical', 'mono',
##' 'DroidSansMono' and 'TimesNewRoman'. Default is 'DroidSansMono'.
##' If font = NULL, only the background tiles are drawn.
##' @param color A color scheme. One of 'Clustal', 'Chemistry_AA',
##' 'Shapely_AA', 'Zappo_AA', 'Taylor_AA', 'LETTER', 'CN6', 'Chemistry_NT',
##' 'Shapely_NT', 'Zappo_NT', 'Taylor_NT'. Default is 'Chemistry_AA'.
##' @param custom_color A data frame with two columns named "names" and
##' "color", used to customize the color scheme.
##' @param adaptive A logical value indicating whether the overall height of
##' the seqlogo corresponds to the number of sequences. If FALSE, the
##' overall height of the seqlogo is fixed at 4.
##' @param top A logical value. If TRUE, seqlogo is aligned to the top of MSA.
##' @return ggplot object
##' @examples
##' #plot sequence motif independently
##' nt_sequence <- system.file("extdata", "LeaderRepeat_All.fa",
##' package = "ggmsa")
##' seqlogo(nt_sequence, color = "Chemistry_NT")
##' @export
##' @author Lang Zhou
seqlogo <- function(msa,
start = NULL,
end = NULL,
font = "DroidSansMono",
color = "Chemistry_AA",
adaptive = FALSE,
top = FALSE,
custom_color = NULL) {
data <- tidy_msa(msa, start = start, end = end)
ggplot() + geom_logo(data,
font = font,
color = color,
adaptive = adaptive,
top = top,
custom_color = custom_color) +
theme_minimal() + xlab(NULL) + ylab(NULL) +
theme(legend.position = 'none') +
theme(panel.grid = element_blank(), axis.text.y = element_blank()) +
coord_fixed()
}
##' Multiple sequence alignment layer for ggplot2. It plots sequence motifs.
##' @title geom_seqlogo
##' @param font Font family. Possible values are 'helvetical', 'mono',
##' 'DroidSansMono' and 'TimesNewRoman'. Default is 'DroidSansMono'.
##' @param color A color scheme. One of 'Clustal', 'Chemistry_AA',
##' 'Shapely_AA', 'Zappo_AA', 'Taylor_AA', 'LETTER', 'CN6', 'Chemistry_NT',
##' 'Shapely_NT', 'Zappo_NT', 'Taylor_NT'. Default is 'Chemistry_AA'.
##' @param custom_color A data frame with two columns called "names" and
##' "color", used to customize the color scheme.
##' @param adaptive A logical value indicating whether the overall height
##' of seqlogo corresponds to the number of sequences. If FALSE, the
##' overall height of seqlogo is fixed at 4.
##' @param top A logical value. If TRUE, seqlogo is aligned to the top of MSA.
##' @param show.legend logical. Should this layer be included in the legends?
##' @param ... additional parameter
##' @return A list
##' @examples
##' #plot multiple sequence alignment and sequence motifs
##' f <- system.file("extdata/LeaderRepeat_All.fa", package="ggmsa")
##' ggmsa(f,font = NULL,color = "Chemistry_NT") + geom_seqlogo()
##' @export
##' @author Lang Zhou
geom_seqlogo <- function(font = "DroidSansMono", color = "Chemistry_AA",
adaptive = TRUE, top = TRUE, custom_color = NULL,
show.legend = FALSE, ...) {
structure(list(font = font,
color = color,
adaptive = adaptive,
top = top,
custom_color = custom_color,
show.legend = show.legend),
class = "seqlogo")
}
geom_logo <- function(data, font = "DroidSansMono", color = "Chemistry_AA",
adaptive = FALSE, top = TRUE, custom_color = NULL,
show.legend = FALSE, ...) {
mapping <- aes_(x = ~logo_x,
y = ~logo_y,
group = ~group,
fill = ~I(color))
logo_data <- seqlogo_data(data, font = font, color = color,
adaptive = adaptive, top = top,
custom_color = custom_color)
ly_logo <- geom_polygon(mapping = mapping, data = logo_data,
inherit.aes = FALSE, show.legend = show.legend)
return(ly_logo)
}
seqlogo_data <- function(data, font = "DroidSansMono",
color = "Chemistry_AA", adaptive = FALSE,
top = TRUE, custom_color = NULL){
tidy <- data
if (color == "Clustal") {
tidy <- color_Clustal(tidy)
} else{
tidy <- color_scheme(tidy, color, custom_color)
}
if (adaptive) {
seq_number <- as.character(unique(tidy[[1]]))
total_heigh <- length(seq_number) / 6
} else {
total_heigh <- 4
}
#total_heigh <- getOption("total_heigh")
logo_width <- getOption("logo_width")
## alignment positions (columns) present in the data
col_num <- as.numeric(levels(factor(tidy$position)))
moti_da <- lapply(col_num, function(j){
## Calculate the char frequency in each column
clo <- tidy[tidy$position == j, ]
fre <- prop.table(table(clo$character))
## total_heigh is the overall height; each char gets a share of it
## proportional to its frequency in the column.
ywidth <- sort(total_heigh * fre )
## calling color scheme
column_char_color <- data.frame(unique(clo[c("character", "color")]))
font_f <- font_fam[[font]]
motif_char <- font_f[names(ywidth)]
ds_ <- lapply(seq_along(motif_char), function(i){
ds_ <- motif_char[[i]]
names(ds_)[names(ds_) == "x"] <- "logo_x"
names(ds_)[names(ds_) == "y"] <- "logo_y"
ds_$char <- names(motif_char[i])
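## rescale the glyph width to logo_width (set to 0.9 in .onAttach)
ds_$logo_x <- ds_$logo_x * logo_width/diff(range(ds_$logo_x))
## glyph height = overall height * frequency
ds_$logo_y <- ds_$logo_y * ywidth[[i]]/diff(range(ds_$logo_y))
## summed height of the chars stacked below the current one
ymotif <- sum(ywidth[0:(i - 1)])
## move the char horizontally so that it is centered on column j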
ds_$logo_x <- ds_$logo_x - min(ds_$logo_x) - logo_width/2 + j
ds_$logo_y <- ds_$logo_y - min(ds_$logo_y) - ywidth[[i]]/2 +
ymotif + ywidth[[i]]/2
if (top) {
ds_$logo_y <- ds_$logo_y + nrow(tidy[tidy$position == j, ]) + .5
}
## How logo_y is positioned above:
## logo_y - min(logo_y) - ywidth[[i]]/2: centered at zero
## + ymotif: summed height of the chars below the current one
## + ywidth[[i]]/2: half the height of the current char
ds_$group <- paste0("P", j, '-', "Char", names(motif_char[i]))
ds_$color <- column_char_color[column_char_color$character ==
unique(ds_$char), "color"]
return(ds_)
})
ds <- do.call(rbind, ds_)
return(ds)
})
moti_da <- do.call(rbind, moti_da)
moti_da$name <- as.character(tidy[1,1])
other_cn <- names(moti_da)[!names(moti_da) == 'name']
moti_da <- moti_da[c("name", other_cn)]
add_col <- tidy[,!names(tidy) %in% names(moti_da)]
moti_da <- cbind(add_col[1,], moti_da, row.names = NULL)
return(moti_da)
}
================================================
FILE: R/simplot.R
================================================
##' Sequence similarity plot
##'
##'
##' @title simplot
##' @param file alignment FASTA file
##' @param query name of the query sequence
##' @param window sliding window size (bp)
##' @param step step size to slide the window (bp)
##' @param group whether to group sequences (e.g. for "A-seq1, A-seq2, B-seq1
##' and B-seq2", use sep = "-" and id = 1 to divide sequences into groups A
##' and B)
##' @param id position to extract id for grouping; only works if group = TRUE
##' @param sep separator to split sequence name; only works if group = TRUE
##' @param sd whether to display the standard deviation of
##' similarity within each group; only works if group = TRUE
##' @param smooth FALSE (default) or TRUE; whether to display a smoothed curve.
##' @param smooth_params a list of parameters passed to geom_smooth
##' (default: smooth_params = list(method = "loess", se = FALSE))
##' @return ggplot object
##' @importFrom Biostrings readDNAStringSet
##' @importFrom ggplot2 aes_
##' @importFrom ggplot2 geom_line
##' @importFrom ggplot2 ggtitle
##' @importFrom ggplot2 geom_ribbon
##' @importFrom ggplot2 geom_smooth
##' @importFrom magrittr %<>%
##' @importFrom dplyr group_by_
##' @importFrom dplyr summarize_
##' @export
##' @author guangchuang yu
##' @examples
##' fas <- system.file("extdata/GVariation/sample_alignment.fa",
##' package="ggmsa")
##' simplot(fas, 'CF_YL21')
simplot <- function(file,
query,
window=200,
step=20,
group=FALSE,
id,
sep,
sd=FALSE,
smooth = FALSE,
smooth_params = list(method = "loess",
se = FALSE)) {
aln <- readDNAStringSet(file)
nn <- names(aln)
if (group) {
g <- vapply(strsplit(nn, sep), function(x) x[id], character(1))
}
idx <- which(nn != query)
w <- width(aln[query])
start <- seq(1, w, by=step)
end <- start + window - 1
start <- start[end <= w]
end <- end[end <= w]
res <- lapply(idx, function(i) {
x <- toCharacter(aln[i]) == toCharacter(aln[query])
pos <- round((start+end)/2)
sim <- vapply(seq_along(start), function(j) {
mean(x[start[j]:end[j]])
}, numeric(1))
y <- data.frame(sequence=nn[i], position = pos, similarity = sim)
if(group) {
y$group <- g[i]
}
return(y)
}) %>% do.call(rbind, .)
if (group) {
res %<>% group_by_(~position, ~group) %>%
summarize_(msim=~mean(similarity), sd=~sd(similarity))
}
if (group) {
p <- ggplot(res, aes_(x=~position, y=~msim, group=~group))
if (sd) p <- p + geom_ribbon(aes_(ymin=~msim-sd,
ymax=~msim+sd,
fill=~group), alpha=.25)
if (smooth) {
smooth_layer <- do.call(geom_smooth,
smooth_params)
p <- p + smooth_layer
} else {
p <- p + geom_line(aes_(color=~group))
}
} else {
mapping <- aes_(x=~position,
y=~similarity,
group=~sequence,
color=~sequence)
p <- ggplot(res, mapping = mapping)
if (smooth) {
smooth_layer <- do.call(geom_smooth,
smooth_params)
p <- p + smooth_layer
} else {
p <- p + geom_line()
}
}
p + xlab("Nucleotide Position") + ylab("Similarity (%)") +
ggtitle(paste("Sequence similarities compared to", query)) +
theme_minimal() +
theme(legend.title=element_blank())
}
toCharacter <- function(x) {
unlist(strsplit(toString(x),""))
}
================================================
FILE: R/theme_msa.R
================================================
##' Theme for ggmsa.
##'
##' @title theme_msa
##' @importFrom ggplot2 theme_minimal
##' @importFrom ggplot2 labs
##' @export
##' @author Lang Zhou
theme_msa <- function(){
list(
xlab(NULL),
ylab(NULL),
labs(fill = "Fills"),
coord_fixed(),
scale_x_continuous(expand = c(0,0)),
theme_minimal() +
theme(
strip.text = element_blank(),
panel.spacing.y = unit(.4, "in"),
panel.grid = element_blank())
)
}
##' @importFrom grDevices colorRampPalette
##' @importFrom RColorBrewer brewer.pal
##' @importFrom ggplot2 coord_cartesian
##' @importFrom ggplot2 scale_x_continuous
##' @importFrom ggplot2 scale_y_continuous
##' @importFrom ggplot2 scale_fill_gradientn
bar_theme <- function(tidy){
data <- bar_data(tidy)
color_palettes <- colorRampPalette(brewer.pal(n = 9,
name = "Blues")[c(4:7)])
list(
xlab(NULL),
ylab("consensus"),
scale_x_continuous(breaks = data[[3]],
labels = data[[1]],
expand = c(0,0)),
scale_y_continuous(breaks = NULL),
scale_fill_gradientn(colours = color_palettes(100)),
theme_minimal() +
theme(panel.grid.minor.x = element_blank(),
panel.grid.major.x = element_blank())
)
}
facet_scale <- function(facetData, field) {
facet0_pos <- facetData[facetData$facet == 0,"position"]
msa_start <- min(facet0_pos)
## x labels of facet 0
facet0_xl_scale <- pretty(min(facet0_pos):max(facet0_pos))
## assign the start position to the first label
facet0_xl_scale[1] <- msa_start
xl_scale <- facet0_xl_scale
for(i in max(facetData$facet) %>% seq_len) {
scale_i <- facet0_xl_scale + field * i
if(msa_start > 1) scale_i[1] <- scale_i[1] + 1
#print(scale_i)
xl_scale <- xl_scale %>% c(scale_i)
}
max_pos <- facetData$position %>% max
xl_scale <- xl_scale[xl_scale <= max_pos]
return(xl_scale)
}
================================================
FILE: R/zzz.R
================================================
#' @importFrom utils packageDescription
.onAttach <- function(libname, pkgname){
#options(total_heigh = 4)
options(logo_width = 0.9)
options(asterisk_width = .03)
options(GC_pos = 2)
options(shadingLen = .5)
options(shading_alpha = .3)
pkgVersion <- packageDescription(pkgname, fields="Version")
msg <- paste0(pkgname, " v", pkgVersion, " ",
"Document: http://yulab-smu.top/ggmsa/", "\n\n")
citation <- paste0("If you use ", pkgname,
" in published research, please cite:\n",
"L Zhou, T Feng, S Xu, F Gao, TT Lam, Q Wang, T Wu, ",
"H Huang, L Zhan, L Li, Y Guan, Z Dai*, G Yu* ",
"ggmsa: a visual exploration tool for multiple sequence alignment and associated data. ",
"Briefings in Bioinformatics. DOI:10.1093/bib/bbac222")
packageStartupMessage(paste0(msg, citation))
}
================================================
FILE: README.Rmd
================================================
---
output:
md_document:
variant: gfm
html_preview: TRUE
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
knitr::opts_chunk$set(
fig.path = "man/figures/REAMED-",
message = FALSE,
warning = FALSE
)
```
# ggmsa: a visual exploration tool for multiple sequence alignment and associated data <img src="man/figures/logo.png" height="140" align="right" />
```{r echo=FALSE, results="hide", message=FALSE}
library(badger)
```
```{r, echo = FALSE, results='asis'}
cat(
badge_devel("YuLab-SMU/ggmsa", "blue"),
badge_lifecycle("experimental", "orange"),
badge_license("Artistic-2.0")
)
```
<!-- badges: start -->
<!-- [](https://cran.r-project.org/package=ggmsa)-->
<!-- [](https://cran.r-project.org/package=ggmsa)-->
<!-- badges: end -->
`ggmsa` is designed for the visualization and annotation of multiple sequence alignments. It provides functions that make producing publication-quality visualizations of multiple sequence alignments (protein/DNA/RNA) in R simple and powerful.
For details, please visit <http://yulab-smu.top/ggmsa/>
## :hammer: Installation
Install the released version from `Bioconductor`:
```{r eval=FALSE}
if (!requireNamespace("BiocManager", quietly=TRUE))
install.packages("BiocManager")
## BiocManager::install("BiocUpgrade") ## you may need this
BiocManager::install("ggmsa")
```
Alternatively, you can grab the development version from GitHub using devtools:
```{r eval=FALSE}
if (!requireNamespace("devtools", quietly=TRUE))
install.packages("devtools")
devtools::install_github("YuLab-SMU/ggmsa")
```
## :bulb: Quick Example
```{r fig.height = 2.5, fig.width = 11, message=FALSE, warning=FALSE, dpi=300}
library(ggmsa)
protein_sequences <- system.file("extdata", "sample.fasta", package = "ggmsa")
ggmsa(protein_sequences, start = 221, end = 280, char_width = 0.5, seq_name = TRUE) + geom_seqlogo() + geom_msaBar()
```
## :books: Learn more
Check out the guides for learning everything there is to know about all the different features:
- [Getting Started](https://yulab-smu.github.io/ggmsa/articles/ggmsa.html)
- [Annotations](https://yulab-smu.github.io/ggmsa/articles/guides/Annotations.html)
- [Color Schemes and Font Families](https://yulab-smu.github.io/ggmsa/articles/guides/Color_schemes_And_Font_Families.html)
- [Theme](https://yulab-smu.github.io/ggmsa/articles/guides/MSA_theme.html)
- [Other Modules](https://yulab-smu.github.io/ggmsa/articles/guides/Other_Modules.html)
- [View Modes](https://yulab-smu.github.io/ggmsa/articles/guides/View_modes.html)
## :runner: Author
- [Guangchuang Yu](https://guangchuangyu.github.io) Professor, PI
- [Lang Zhou](https://github.com/nyzhoulang) Master's Student
- [Shuangbin Xu](https://github.com/xiangpin) PhD Student
**YuLab** <https://yulab-smu.top/>
**Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University**
## :sparkling_heart: Contributing
We welcome any contributions! By participating in this project you agree to abide
by the terms outlined in the [Contributor Code of Conduct](https://github.com/YuLab-SMU/ggmsa/blob/master/CONDUCT.md).
================================================
FILE: README.md
================================================
<!-- README.md is generated from README.Rmd. Please edit that file -->
# ggmsa: a visual exploration tool for multiple sequence alignment and associated data <img src="man/figures/logo.png" height="140" align="right" />
[](https://github.com/YuLab-SMU/ggmsa)
[](https://lifecycle.r-lib.org/articles/stages.html#experimental)
[](https://cran.r-project.org/web/licenses/Artistic-2.0)
<!-- badges: start -->
<!-- [](https://cran.r-project.org/package=ggmsa)-->
<!-- [](https://cran.r-project.org/package=ggmsa)-->
<!-- badges: end -->
`ggmsa` is designed for the visualization and annotation of multiple
sequence alignments. It provides functions that make producing
publication-quality visualizations of multiple sequence alignments
(protein/DNA/RNA) in R simple and powerful.
For details, please visit <http://yulab-smu.top/ggmsa/>
## :hammer: Installation
Install the released version from `Bioconductor`:
``` r
if (!requireNamespace("BiocManager", quietly=TRUE))
install.packages("BiocManager")
## BiocManager::install("BiocUpgrade") ## you may need this
BiocManager::install("ggmsa")
```
Alternatively, you can grab the development version from GitHub using
devtools:
``` r
if (!requireNamespace("devtools", quietly=TRUE))
install.packages("devtools")
devtools::install_github("YuLab-SMU/ggmsa")
```
## :bulb: Quick Example
``` r
library(ggmsa)
protein_sequences <- system.file("extdata", "sample.fasta", package = "ggmsa")
ggmsa(protein_sequences, start = 221, end = 280, char_width = 0.5, seq_name = TRUE) + geom_seqlogo() + geom_msaBar()
```
<!-- -->
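The quick example adds a sequence logo with `geom_seqlogo()`. As a rough illustration of how letter heights in such a logo can be derived, the sketch below scales each character's frequency in a column by an overall height, which is the idea used in the package's internal `seqlogo_data()`. The helper `logo_heights()`, its `total_height` argument, and the toy alignment are hypothetical and not part of ggmsa; only base R is assumed.

```r
# Hypothetical helper (not part of ggmsa): per-column letter heights for a logo.
# Assumes `seqs` is a character vector of aligned, equal-length sequences.
logo_heights <- function(seqs, total_height = 4) {
  stopifnot(length(unique(nchar(seqs))) == 1)
  # One row per sequence, one column per alignment position.
  chars <- do.call(rbind, strsplit(seqs, ""))
  lapply(seq_len(ncol(chars)), function(j) {
    # Relative frequency of each character in column j.
    freq <- prop.table(table(chars[, j]))
    # Scale by the overall height and sort ascending, so the most
    # frequent character is stacked last (on top).
    sort(total_height * freq)
  })
}

toy <- c("ACGT", "ACGA", "ATGA", "ACCA")  # toy alignment, made up for this sketch
h <- logo_heights(toy)
h[[2]]  # heights of the characters found in column 2
```

In ggmsa itself the overall height is 4 when `adaptive = FALSE` and the number of sequences divided by 6 when `adaptive = TRUE`.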
## :books: Learn more
Check out the guides for learning everything there is to know about all
the different features:
- [Getting
Started](https://yulab-smu.github.io/ggmsa/articles/ggmsa.html)
- [Annotations](https://yulab-smu.github.io/ggmsa/articles/guides/Annotations.html)
- [Color Schemes and Font
Families](https://yulab-smu.github.io/ggmsa/articles/guides/Color_schemes_And_Font_Families.html)
- [Theme](https://yulab-smu.github.io/ggmsa/articles/guides/MSA_theme.html)
- [Other
Modules](https://yulab-smu.github.io/ggmsa/articles/guides/Other_Modules.html)
- [View
Modes](https://yulab-smu.github.io/ggmsa/articles/guides/View_modes.html)
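
The package also exports `simplot()`, which plots the similarity of each sequence to a query sequence in sliding windows (defaults `window = 200`, `step = 20`). A minimal base-R sketch of that windowed calculation for a single pair of aligned sequences is given here. The helper `window_similarity()` and the toy sequences are hypothetical and not part of ggmsa; the real function reads a FASTA file and returns a ggplot object.

```r
# Hypothetical helper (not part of ggmsa): windowed identity between two
# aligned sequences, following the window/step scheme of simplot().
window_similarity <- function(query, target, window = 200, step = 20) {
  stopifnot(nchar(query) == nchar(target))
  q  <- strsplit(query, "")[[1]]
  tg <- strsplit(target, "")[[1]]
  same <- q == tg                  # TRUE where the two sequences agree
  w <- length(q)
  start <- seq(1, w, by = step)
  end <- start + window - 1
  keep <- end <= w                 # drop windows that run past the alignment
  start <- start[keep]
  end <- end[keep]
  data.frame(
    position = round((start + end) / 2),   # window midpoint
    similarity = vapply(seq_along(start), function(j) {
      mean(same[start[j]:end[j]])          # fraction of matching sites
    }, numeric(1))
  )
}

# Toy sequences, made up for this sketch.
res <- window_similarity("ACGTACGTAC", "ACGTTCGTAA", window = 4, step = 2)
res
```

Similarity here is a fraction between 0 and 1, one value per complete window.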
## :runner: Author
- [Guangchuang Yu](https://guangchuangyu.github.io) Professor, PI
- [Lang Zhou](https://github.com/nyzhoulang) Master’s Student
- [Shuangbin Xu](https://github.com/xiangpin) PhD Student
**YuLab** <https://yulab-smu.top/>
**Department of Bioinformatics, School of Basic Medical Sciences,
Southern Medical University**
## :sparkling_heart: Contributing
We welcome any contributions! By participating in this project you agree
to abide by the terms outlined in the [Contributor Code of
Conduct](https://github.com/YuLab-SMU/ggmsa/blob/master/CONDUCT.md).
================================================
FILE: inst/CITATION
================================================
citHeader("To cite ggmsa in publications use:")
citEntry(
entry = "book",
title = "Data Integration, Manipulation and Visualization of Phylogenetic Trees",
author = person("Guangchuang", "Yu"),
publisher = "Chapman and Hall/{CRC}",
year = "2022",
edition = "1st edition",
url = "https://www.amazon.com/Integration-Manipulation-Visualization-Phylogenetic-Computational-ebook/dp/B0B5NLZR1Z/",
textVersion = paste("Guangchuang Yu. (2022).",
"Data Integration, Manipulation and Visualization of Phylogenetic Trees (1st edition).",
"Chapman and Hall/CRC.")
)
citEntry(
entry = "article",
title = "ggmsa: a visual exploration tool for multiple sequence alignment and associated data",
author = personList(
as.person("Lang Zhou"),
as.person("Tingze Feng"),
as.person("Shuangbin Xu"),
as.person("Fangluan Gao"),
as.person("Tommy T Lam"),
as.person("Qianwen Wang"),
as.person("Tianzhi Wu"),
as.person("Huina Huang"),
as.person("Li Zhan"),
as.person("Lin Li"),
as.person("Yi Guan"),
as.person("Zehan Dai"),
as.person("Guangchuang Yu")
),
journal = "BRIEFINGS IN BIOINFORMATICS",
volume = "23",
issue = "4",
year = "2022",
month = "06",
ISSN = "1467-5463",
doi = "10.1093/bib/bbac222",
PMID = "35671504",
url = "https://academic.oup.com/bib/article-abstract/23/4/bbac222/6603927",
textVersion = paste("L Zhou, T Feng, S Xu, F Gao, TT Lam, Q Wang, T Wu, H Huang, L Zhan, L Li, Y Guan, Z Dai, G Yu.",
"ggmsa: a visual exploration tool for multiple sequence alignment and associated data.",
"Briefings in Bioinformatics. 2022, 23(4):bbac222. 10.1093/bib/bbac222")
)
================================================
FILE: inst/extdata/GVariation/A.Mont.fas
================================================
>Mont
ATGGCAACTTACACATCAACAATCCAGTTTGGTTCCATTGAATGCAAACTTCCATACTCACCCGCTCCTTTTGGGCTAGTTGCGGGGAAACGAGAAGTTTTAACCACCACTGACCCCTTCGCAAGTTTGGAGATGCAGCTTAGTGCGCGATTACGAAGGCAAGAGTTTGCAACTATTCGAACATCCAAGAATGGTACTTGCATGTATCGATACAAGACTGATGTCCAGATTGCGCGCATTCAAAAGAAGCGCGAGGAAAGAGAAAGAGAGGAATATAATTTCCAAATGGCTGCGTCAAGTGTTGTGTCGAAGATCACTATTGCTGGTGGAGAGCCACCTTCAAAACTTGAATCACAAGTGCGGAGGGGTGTCATCCACACAACTCCAAGGATGCGCACAGCAAAAACATATCACACGCCAAAGCTGACAGAGGGACAAATGAACCACCTTATCAAGCAGGTGAAGCAAATTATGTCAACCAAAGGAGGGTCTGTTCAACTGATTAGCAAGAAAAGTACTCATGTTCACTATAAAGAAGTTTTGGGATCACATCGCGCAGTTGTTTGCACTGCACATATGAGAGGTTTACGAAAGAGAGTGGACTTTCGGTGTGATAAATGGACCGTTGTGCGTCTACAGCATCTCGCCAGGACGGACAAGTGGACTAACCAAGTTCGTGCTACTGATCTACGCAAGGGCGATAGTGGAGTTATATTGAGTAATACTAATCTCAAAGGAAACTTTGGGAGAAGCTCGGAGGGCCTATTCATAGTGCGTGGGTCGCACGAAGGAAAAATCTATGATGCACGTTCCAAGGTTACTCAAGGGGTTATGGATTCAATGGTTCAGTTCTCAAGCGCTGAAAGCTTTTGGAAGGGATTGGACGGCAATTGGGCACAAATGAGATATCCTACAGATCATACATGTGTGGCAGGCTTACCAGTTGAAGACTGTGGCAGAGTTGCAGCGATAATGACACACAGTATTTTACCGTGCTATAAGATAACCTGCCCTACCTGTGCCCAACAATATGCCAACTTGCCAGCCAGTGACTTACTTAAGATATTACACAAGCACGCAAGTGATGGTCTAAATCGATTGGGGGCAGACAAAGATCGCTTTGTGCATGTCAAAAAGTTCTTGACAATCTTAGAGCACTTAACTGAACCGGTTGATCTGAGTCTAGAAATTTTCAATGAAGTATTCAAGTCTATAGGGGAGAAGCAACAATCACCTTTCAAAAACCTGAATATTCTGAATAATTTCTTTTTGAAAGGAAAGGAAAATACAGCTCGTGAATGGCAGGTGGCTCAATTAAGCTTACTTGAATTGGCAAGATTCCAAAAGAACAGAACGGATAATATCAAGAAAGGAGACATCTCGTTCTTTAGGAATAAACTATCTGCAAAAGCAAATTGGAACTTGTATCTGTCATGTGATAACCAGCTGGATAAGAATGCAAACTTCCTGTGGGGACAGAGGGAATATCATGCTAAGCGATTTTTCTCGAACTATTTCGAGGAAATTGATCCAGCGAAGGGCTATTCAGCATACGAAAATCGTTTGCATCCGAATGGGACAAGAAAACTTGCAATTGGAAACCTAATTGTACCACTTGATCTGGCTGAGTTTAGGCGGAAGATGAAAGGTGATTATAAAAGACAGCCAGGGGTGAGTAAGAAGTGCACGAGCTCGAAGGATGGAAACTACGTGTATCCCTGTTGTTGCACTACACTTGATGATGGCTCAGCTGTTGAGTCAACATTTTACCCGCCAACTAAGAAGCACCTCGTAATAGGTAATAGTGGCGACCAAAAGTATGTTGACTTACCAAAAGGGAATTCTGAGATGTTATATATTGCCAGGCAAGGCTTCTGTTACATTAACATTTTTCTCGCGATGTTGATTAACATTAGTGAGGAAGATGCAAAGGATTTCACTAAGAAGGTTCGTGACATGTGTGTGCCAAAGCTTGGAACCTGGCCAACCATGATGGATTT
GGCTACAACTTGTGCTCAAATGAAAATATTCTACCCTGATGTTCATGATGCAGAACTGCCTAGAATACTAGTCGATCACGAAACGCAGACATGCCATGTGGTTGACTCGTTTGGCTCACAAACAACTGGGTATCATATTTTGAAAGCATCTAGCGTGTCCCAACTTATTTTGTTTGCTAATGATGAGTTGGAGTCTGACATTAAGCACTATAGAGTTGGTGGTATTCCTGGAGCATGCCCTGAGCTTGGGTCCACAATATCACCTTTTAGAGAAGGAGGAATCATAATGTCTGAGTCAGCAGCGCTAAAACTGCTCCTAAAGGGAATTTTTAGGCCCAAAGTGATGAAGCAATTGCTACTGGATGAACCATATTTGCTCATTTTATCGATATTATCTCCTGGTATACTTATGGCCATGTACAACAATGGGATATTTGAGTTAGCGGTGAAGTTGTGGATCAATGAGAAACAATCTATAGCCATGATAGCATCGTTATTGTCCGCCTTGGCTTTACGAGTGTCAGCAGCAGAAACACTCGTTGCACAGAGGATTATAATTGACACGGCAGCAACAGATCTTCTCGATGCTACGTGTGATGGATTCAACTTACATCTAACATATCCCACTGCACTCATGGTGTTGCAAGTTGTTAAGAACAGAAATGAATGTGATGATACGTTGTTTAAAGCAGGTTTTTCACATTACAACATGAGTGTCGTGCAGATTATGGAAAAAAATTATCTAAGCCTCTTGGGCGATGCTTGGAAAGATTTAACCTGGCGAGAAAAATTATCCGCAACATGGCACTCATACAAAGCAAAGCGCTCTATCACTCAGTTCATAAAACCCATAGGCAAAGCAGATTTAAAAGGGTTGTACAACATATCACCGCAAGCATTCTTGGGTCAGGGCGTACAGAGAGTCAAAGGCACCGCCTCAGGGTTGAATGAGCGACTCAATAATTATATCAATACTAAGTGTGTAAATATTTCATCCTTTTTCATTCGTAGAATTTTCCGGCGCTTGCCAACTTTTGTAACTTTCATTAATTCATTATTAGTTATTAGTATGCTAACTAGTGTAGTAGCAGTGTGTCAAGCAATAATTCTAGATCAAAGGAAGTATAGAAAAGAAATTGAGTTGATGCAGATTGAGAAGAATGAAATTGTTTGTATGGAGTTGTATGCGAGTCTGCAGCGCAAACTTGAGCGTGAATTCACATGGGATGAATATATGGAATATTTGAAATCTGTGAATCCCCAGATAGTTCAATTCGCGCAAGCTCAAATGGAAGAATATAATGTGCGACATCAGCGCTCCACACCAGGTGTTAAGAATTTAGAGCAGGTGGTAGCATTTATAACTCTAATTATCATGATGTTTGATGCTGAAAGGAGCGACTGTGTATTTAAGACTCTCAACAAATTCAAAGGCATCGTTTCTTCAATGGATCATGAAGTTAGACACCAGTCCTTGGATGATGTAATCAAGAATTTCGATGAAAGGAACGAAGTTATTGATTTTGAACTAAATGAGGATACAATTAAAACATCATCAGTGTTGGACACAAAGTTTAGCGACTGGTGGGATCGGCAAATCCAAATGGGACACACACTTCCCCATTATAGAACTGAGGGACACTTCATGGAATTCACAAGGGCAACTGCTGTACAAGTGGCCAACGACATCGCGCATAGTGAGCACCTAGACTTTCTAGTGAGGGGAGCTGTTGGGTCTGGAAAATCTACTGGACTGCCTGTCCATCTCAGTGCAGCTGGATCTGTGCTTTTGATAGAACCAACTCGACCACTTGCAGAAAACGTGTTCAAGCAATTATCCAGTGAACCGTTTTTCAAGAAGCCAACACTGCGCATGCGAGGAAATAGTGTGTTTGGTTCCTCTCCAATCTCCATTATGACTAGCGGCTTTGCGTTGCACTACTATGCTAATAATCGCTCTCAGCTAACTCAGTTTAATTTCATAATTTTTGATGAATGTC
ATGTTTTAGATCCTTCTGCAATGGCATTTCGTAGCTTGTTAAGTGTGTATCACCAAACATGCAAAGTGTTAAAGGTGTCAGCCACTCCAGTGGGAAGGGAGGTCGAGTTCACAACACAACAACCAGTTAAATTGGTGGTTGAGGATACACTTTCATTCCAATCTTTTGTTGATGCGCAAGGCTCAAAAACCAATGCTGACGTAGTTCAGCATGGTTCGAACATACTCGTGTATGTGTCGAGTTACAATGAAGTGGATACATTAGCCAAGCTTCTAACAGATAGGAATATGATAGTCTCAAAAGTTGATGGCAGAACAATGAAGCACGGATGCTTAGAAATTGTAACGAAAGGGACTAGTGCAAAGCCACATTTTGTCGTAGCAACCAACATTATTGAAAATGGAGTAACTTTAGATATAGATGTAGTTGTAGATTTTGGGCTTAAAGTCTCACCGTTTTTAGATATTGACAATAGGAGCATAGCATACAATAAGATTAGTGTTAGCTATGGAGAAAGAATTCAGAGGTTGGGCCGTGTTGGGCGCTTTAAGAAGGGAGTGGCATTGCGTATTGGACACACCGAAAAGGGAATTATTGAGATTCCAAGTATGATTGCTAGTGAAGCTGCGCTTGCGTGCTTTGCATACAATTTGCCAGTAATGACAGGGGGTGTTTCAACTAGCCTCATTGGCAATTGTACTGTTCGTCAAGTTAAAACTATGCAACAATTTGAGCTGAGTCCATTCTTTATACAAAATTTTGTTGCCCATGATGGATCAATGCATCCTGTCATACATGACATTCTTAAGAAGTATAAACTGCGAGATTGTATGACGCCCTTGTGTGATCAATCCATACCTTACAGAGCCTCAAGCACTTGGTTGTCTGTTAGTGAGTACGAACGACTCGGAGTGGTTTTGGACATTCCAAAACAGATCAAGATTGCATTCCACATCAAGGATATCCCTCCTAAGTTGCATGAAATGCTTTGGGAAACAGTTATCAAATATAAGGATGTTTGTTTGTTTCCAAGTATTCGGGCTTCATCCATTAGCAAAATTGCATACACACTGCGCACTGATCTTTTTGCAATTCCCAGAACCCTAATTCTAGTTGAAAGATTGCTCGAGGAGGAACGAGTGAAACAGAGTCAATTCAGAAGTCTCATTGATGAAGGATGCTCAAGCATGTTTTCAATTGTTAATTTAACAAACACTCTTAGAGCTAGATATGCAAAGGATTACACTGCAGAAAACATACAGAAGCTCGAGAAAGTGAGGAGTCAGTTAAAGGAGTTCTCAAATTTAAATGGCTCTGCATGCGAGGAGAACTTAATGAAGAGGTATGAATCTCTACAGTTTGTGCATCATCAAGCAACAACTGCACTCGCAAAGGATTTGAAGTTGAAAGGAGTTTGGAAGAAGTCATTAGTTGTGCAGGACTTAATCATAGCGGGTGCCGTTGCTATTGGTGGAATAGGGCTCATCTATAGTTGGTTTACTCAATCAGTTGAAACTGTGTCTCACCAGGGCAAGAACAAATCCAAAAGAATTCAAGCATTGAAGTTTCGACACGCCCGCGATAAGAGGGCTGGTTTTGAAATTGATAACAATGATGATACAATAGAAGAATTCTTTGGATCTGCATACAGGAAGAAGGGAAAAGGTAAAGGCACCACTGTTGGTATGGGCAAGTCAAGCAGGAGGTTTGTTAATATGTATGGATTTGACCCAACAGAATATTCATTCATCCAGTTCGTTGATCCGCTCACTGGAGCTCAAATTGAAGAGAACGTCTATGCTGATATTAGAGACATCCAAGAGCGCTTTAGTGATGTCCGCAAGAAAATGGTAGAGGATGATGAAATCGAATTGCAAGCATTGGGCAGCAACACAATCATTCATGCTTACTTCAGGAAGGATTGGTCTGACAAGGCTCTAAAAATTGATTTGATGCCACACAACCCACTCAAAATCTGTGATAAATCGAATGGCATTGCT
AAGTTTCCTGAAAGAGAACTTGAGTTGAGGCAAACTGGGCCAGCAACAGAGGTTGATGTGAAAGACATTCCAAAACAGGAAGTGGAGCATGAAGCCAAATCACTCATGAGAGGTTTAAGGGATTTCAATCCAATTGCTCAAACAGTTTGCAGAGTAAAAGTGTCTGTTGAATATGGAACGTCTGAAATGTATGGGTTCGGTTTTGGTGCGTATATTATAGTAAACCACCATCTATTCAAGAGTTTCAATGGATCCATGGAAGTGCGATCAATGCATGGAACATTCAGAGTGAAGAATTTGCATAGCCTGAGCGTTTTACCGATCAAAGGCAGAGACATTATCATCATAAAGATGCCAAAGGATTTCCCTGTTTTCCCACAAAAACTGCACTTCCGAGCTCCAGTGCAGAATGAGAGGATTTGTTTGGTTGGAACTAATTTTCAAGAAAAACATGCATCATCAATCATCACAGAAACGAGTACTACATACAATGTACCGGGCAGCACTTTTTGGAAGCATTGGATTGAAACAAATGATGGGCATTGTGGATTACCAGTAGTGAGTACAGCTGATGGATGTCTAGTTGGAATACACAGCTTGGCGAATAATGTGCAAACCACGAATTATTATTCAGCCTTTGATGAGGATTTTGAAAGTAAGTATCTCCGAACTGATGAGCATAATGAGTGGACCAAATCGTGGGTATATAACCCAGATACTGTGTTGTGGGGTCCATTGAAGCTCAAAGAGAGTACCCCTAAAGGCCTGTTTAAGACAACAAAACTTGTACAGGATTTAATTGATCATGATGTTGTTGTAGAGCAAGCTAAACATTCTGCGTGGATGTATGAGGCTCTAACAGGGAATTTGCAAGCTGTGGCGACAATGAAGAGTCAGCTAGTGACAAAGCACGTGGTCAAAGGGGAGTGTCGGCACTTCAAAGAGTTCTTAACTGTGGATTCGGAAGCAGAAGCTTTCTTCAGGCCTTTGATGGATGCTTATGGGAAGAGCTTGTTAAATAGAGAAGCATATATAAAGGACATAATGAAATACTCAAAGCCTATTGATGTTGGAATAGTAGACTGTGATGCTTTTGAAGAGGCTATCAATAGGGTTATCATTTATCTGCAAGTACATGGCTTCCAGAAATGCAATTACATCACCGATGAGCAGGAAATTTTCAAAGCTCTCAATATGAAAGCTGCTGTCGGGGCTATGTATGGAGGCAAGAAGAAAGACTACTTCGAGCATTTTACTGAGGCGGATAAAGAGGAAATTGTTATGCAAAGTTGCTTACGATTGTACAAGGGCTCACTTGGCATATGGAATGGATCATTGAAAGCAGAACTTCGGTGCAAAGAGAAGATACTTGCAAATAAGACAAGGACATTCACTGCTGCACCTTTAGATACTCTACTGGGTGGGAAGGTGTGCGTTGATGATTTTAATAATCAATTCTACTCAAAGAACATTGAATGCTGCTGGACTGTTGGAATGACTAAGTTTTATGGAGGTTGGGACAAATTGCTTCGGCGTCTACCTGAAAATTGGGTGTACTGCGATGCCGATGGTTCACAATTCGATAGTTCACTCACCCCATACCTAATTAATGCTGTTCTCATCATCAGAAGCACATACATGGAAGATTGGGACTTGGGGTTGCAAATGTTGCGCAATTTGTACACAGAAATAATTTACACACCAATCTCAACTCCAGATGGAACAATTGTCAAGAAGTTTAGAGGTAATAATAGCGGTCAACCTTCTACCGTTGTGGATAATTCTCTCATGGTTGTTCTTGCTATGCATTACGCTCTCATTAAGGAGTGCGTTGAGTTTGAAGAAATCGACAGCACGTGTGTATTCTTTGTTAATGGTGATGACTTATTGATTGCTGTGAATCCGGAGAAAGAGAGCATTCTCGATAGAATGTCACAACATTTCTCAGATCTTGGTTTGAACTATGATTTTTCGTCGAGAACAAGAAGGAAGGAGGAATT
GTGGTTCATGTCCCATAGAGGCCTGCTAATTGAGGGTATGTACGTGCCAAAGCTTGAAGAAGAGAGAATTGTATCCATTCTGCAATGGGATAGGGCTGATCTGCCAGAGCACAGATTAGAAGCGATTTGTGCAGCAATGATAGAATCCTGGGGTTATTTTGAGTTAACGCACCAAATTAGGAGATTCTACTCATGGTTGTTACAACAGCAACCTTTTTCAACGATAGCACAGGAAGGAAAAGCTCCATACATAGCGAGCATGGCATTGAAGAAGCTGTACATGAATAGGACAGTAGATGAGGAGGAACTGAAGGCTTTCACTGAAATGATGGTTGCCTTGGATGACGAATTTGAGTGCGATACTTATGAAGTGCACCATCAAGGAAATGACACAATCGATGCAGGAGGAAGCACTAAGAAGGATGCAAAACAAGAGCAAGGTAGCATTCAACCAAATCTCAACAAGGAAAAGGAAAAGGACGTGAATGTTGGAACATCTGGAACTCATACTGTGCCACGAATTAAAGCTATCACGTCCAAAATGAGAATGCCTAAGAGTAAAGGTGCAACTGTACTAAATTTGGAACACTTACTCGAGTATGCTCCACAGCAAATTGACATCTCAAATACTCGAGCAACTCAATCACAGTTTGATACGTGGTATGAAGCAGTACAACTTGCATACGACATAGGAGAAACTGAAATGCCAACTGTGATGAATGGGCTTATGGTTTGGTGCATTGAAAATGGAACCTCGCCAAACATCAACGGAGTTTGGGTTATGATGGATGGAGATGAACAAGTCGAATACCCACTGAAACCAATCGTTGAGAATGCAAAACCAACACTTAGGCAAATCATGGCACATTTCTCAGATGTTGCAGAAGCGTATATAGAAATGCGCAACAAAAAGGAACCATATATGCCACGATATGGTTTAGTTCGTAATCTGCGCGATGGAAGTCTGGCTCGCTATGCTTTTGACTTTTATGAAGTTACATCACGGACACCAGTGAGGGCTAGAGAGGCACACATTCAAATGAAGGCCGCAGCTTTAAAATCAGCTCAATCTCGACTTTTCGGATTGGATGGTGGCATTAGTACACAAGAGGAAAACACAGAGAGGCACACCACCGAGGATGTTTCTCCAAGTATGCATACTCTACTTGGAGTGAAGAACATGTGA
>CF_YL21
ATGGCAATCTACACGTCAACAATCTGTTTCGGTTCGTTTGAATGCAAGCTACCATACTCACCCGCCTCTTGCGGGCATATTGTGAAGGAACGAGAAGTGCTGGCTTCCATCGATCCTTTCGCAGATCTGGAAACACAACTTAGTGCACGATTGCTCAAGCAAGAATATGCTACTGTTCGTGTGCTCAAGAACGGTACTCTTACGTACCGATACAAGACTGATGCCCAGATAACGCGCATCCAGAAGAGACTGGAAAGGAAGGATAGGGAAGAATACCACTTCCAGATGGCAGCTCCTAGTATTGTGTCAAAGATCACTATTGCTGGTGGAGAGCCACCTTCAAAACTTGAATCACAAGTGCGGAGGGGTGTCATCCACACAACTCCAAGGATGCGCACAGCAAAAACATATCACACGCCAAAGCTGACAGAGGGACAAATGAACCACCTTATCAAGCAGGTGAAGCAAATTATGTCAACCAAAGGAGGGTCTGTTCAACTGATTAGCAAGAAAAGTACCCATGTTCACTATAAAGAAGTTTTGGGATCACATCGCGCAGTTGTTTGCACTGCACATATGAGAGGTTTACGAAAGAGAGTGGACTTTCGGTGTGATAAATGGACCGTTGTGCGTCTACAGCATCTCGCCAGGACGGATAAGTGGAATAACCAAGTTCGTGTTACTGATCTACGCAAGGGCGATAGTGGAGTTATATTGAGTAATACTAATCTCAAAGGAAACTTTGGGAGAAGCTCGGAGGGCATATTCATAGTGCGTGGGTCGCACGAAGGAAAAATCTATGATGCACGTTCCAAGGTTACTCAAGGGGTTATGGATTCAATGGTTCAGTTCTCAAGCGCTGAAAGCTTTTGGAAGGGATTGGACGGCAATTGGGCACAAATGAGATATCCTACAGATCATACATGTGTGGCAGGCTTACCAGTTGAAGACTGTGGCAGAGTTGCAGCGATAATGACACACAGTATTTTACCGTGCTATAAGATAACCTGCCCTACCTGTGCCCAACAATATGCCAACTTGCCAGCCAGTGACTTACTTAAGATATTACACAAGCACGCAAGTGATGGTTTAAATCGATTGGGGGCAGACAAAGATCGCTTTGTGCATGTCAAAAAGTTCTTGACAATCTTAGAGCACTTAACTGAACCGGTTGATCTGAGTCTAGAAATTTTCAATGAAGTATTCAAGTCTATAGGGGAGAAGCAACAATCACCTTTCAAAAACCTGAATATTCTGAATAATTTCTTTTTAAAAGGAAAGGAAAATACAGCTCGTGAATGGCAGGTGGCTCAATTAAGCTTACTTGAATTGGCAAGATTCCAAAAGAACAGAACGGATAATATCAAGAAAGTAGACATCTCGTTCTTTAGGAATAAACTATCTGCCAAAGCAAATTGGAACTTGTATCTGTCATGTGATAACCAGCTGGATAAGAATGCAAACTTCCTGTGGGGACAGAGGGAATATCATGCTAAGCGATTTTTCTCGAACTATTTCGAGGAAATTGATCCAGCGAAGGGCTATTCAGCATACGAAAATCGTTTGCATCCGAATGGGACAAGAAAACTTGCAATTGGAAACCTAATTGTACCACTTGATCTGGCTGAGTTTAGGCGGAAGATGAAAGGTGATTTTAAAAGACAGCCAGGGGTGAGTAAGAAGTGCACGAGCTCGAAGGATGGAAACTACGTGTATCCCTGTTGTTGCACTACACTTGATGATGGCTCAGCTGTTGAATCAACATTTTACCCGCCAACTAAGAAGCGCCTCGTAATAGGTAATAGTGGCGACCAAAAGTATGTTGACTTACCAAAAGGGAATTCTGAGATGTTATATATTGCCAGGCAAGGCTTCTGTTACATTAACATTTTCCTCGCGATGTTGATTAACATTAGTGAGGAAGATGCAAAGGATTTCACTAAGAAGGTTCGTGACATGTGTGTGCCAAAGCTCGGAACCTGGCCAACCATGATGGATCT
GGCTACAACTTGTGCTCAAATGAAAATATTCTACCCTGATGTTCATGATGCAGAACTGCCTAGAATACTAGTCGATCACGAAACGCAGACATGCCATGTGGTTGACTCGTTTGGCTCACAAACAACTGGGTATCATATTTTGAAAGCATCTAGCGTGTCCCAACTTATTTTGTTTGCTAATGATGAGTTGGAGTCTGACATTAAGCACTATAGAGTTGGTGGTATTCCTAATGCATGCCCTGAACTTGGGTCCACAATATCACCTTTCAGAGAAGGAGGAGTTATAATGTCTGAGTCGGCAGCGCTGAAACTGCTTTTGAAGGGAATTTTTAGACCTAAGGTGATGAGACAGTTGCTGTTAGATGAGCCTTACCTGTTGATTCTATCAATATTATCCCCTGGCATACTGATGGCTATGTATAATAATGGGATTTTTGAACTTGCGGTGAGGTTGTGGATTAATGAGAAGCAATCCATAGCGATGATAGCATCGCTACTATCAGCTTTAGCCCTACGAGTGTCAGCGGCAGAAACACTCGTCGCACAGAGGATTATAATTGATGCTGCAGCTACAGACCTCCTCGATGCTACGTGTGACGGGTTCAACCTGCATCTAACGTACCCCACTGCATTGATGGTGTTGCAAGTTGTTAAGAATAGAAATGAATGTGATGATACTCTATTCAAGGCGGGTTTTCCAAGTTACAACACGAGCGTCGTACAGATTATGGAAAAAAATTATCTAAATCTCTTGAACGATGCTTGGAAAGATTTAACTTGGCGGGAAAAATTTTCCGCAACATGGTACTCATACAGAGCAAAACGCTCTATCACTCGGTACATAAAACCCACAGGAAGGGCAGATTTGAAAGGGTTATACAACATATCACCACAAGCATTCTTGGGCCGAAGCGTCCAGGTGGTCAAAGGCACTGCCTCAGGATTGAGCGAGCGATTTAATAATTACTTCAATACTAAGTGTGTAAATATTTCATCCTTTTTCATTCGTAGAATCTTTAGGCGTTTGCCAACTTTCGTCACTTTTGTTAACTCATTATTAGTTATTAGTATGTTAACTAGTGTAGTGGCAGTGTGTCAGGCAATAATTTTAGATCAGAAGAAGTATAGGAGAGAGATCGAGTTGATGCAGATAGAGAAGAATGAGATTGTCTGCATGGAGCTATACGCAAGTTTACAGCGCAAACTTGAACGCGATTTCACATGGGATGAGTACATTGAGTATTTGAAATCAGTGAACCCTCAGATAGTTCAATTTGCTCAAGCGCAGATGGAAGAATATGATGTGCGACACCAGCGTTCCACACCAGGTGTTAAAAATTTGGAACAAGTGGTAGCATTTATGGCTTTAGTCATCATGGTGTTCGATGCTGAAAGGAGTGACTGCGTGTTCAAAACTCTCAATAAATTTAAGGGTGTCCTTTCCTCACTGGACCATGAAGTTAGACATCAGTCCTTGGATGATGTGATCAAGAATTTTGATGAGAGGAATGAGACTATTGATTTTGAATTGAGTGAGGACACAATTCGAACATCATCAGTGCTAGATACAAAGTTTAGTGATTGGTGGGACCGACAAATCCAAATGGGACATACACTTCCACATTACAGAACCGAGGGGCACTTCATGGAATTTACAAGGGCAACTGCTGTTCAAGTGGCTAATGACATTGCCCACAGCGAACACCTAGACTTTCTAGTAAGGGGAGCTGTTGGGTCTGGAAAGTCAACTGGGTTGCCTGTTCATCTTAGTGTAGCTGGATCTGTGCTTTTAATTGAGCCAACGCGGCCACTAACGGAGAACGTTTTCAAACAGCTATCTAGTGAACCATTCTTCAAGAAGCCAACACTGCGTATGCGTGGAAATAGTATATTTGGCTCTTCCCCAATCTCCGTTATGACTAGCGGGTTTGCGCTACACTACTTCGCCAATAATCGCTCTCAATTAGCTCAGTTCAATTTTGTAATATTTGATGAGTGCC
ATGTTCTGGATCCTTCCGCAATGGCGTTCCGCAGTCTGCTGAGTGTTTATCATCAAGCATGTAAAGTATTAAAAGTGTCAGCTACTCCAGTGGGAAGAGAGGTTGAATTCACAACACAGCAGCCAGTCAAGTTAATAGTGGAAGACACACTGTCTTTCCAATCATTTGTTGATGCACAAGGTTCTAAAACTAATGCTGATGTTGTCCAGTTTGGTTCAAACATACTTGTTTACGTGTCGAGCTACAATGAAGTTGACAACTTGGCTAAGCTCCTAACAGATAAGAATATGATGGTCACAAAGGTTGATGGCAGAACAATGAAGCACGGTTGCCTAGAAATTGTCACAAAAGGAACCAGTGCGAGACCACATTTTGTTGTAGCAACCAACATAATTGAGAATGGAGTGACTTTGGACATAGATGTAGTAGTGGATTTTGGGTTGAAAGTCTCACCGTTCTTGGACATTGACAATAGGAGCATTGCTTACAATAAGGTGAGTGTTAGTTATGGTGAGAGAATTCAAAGGCTGGGTCGTGTTGGACGCTTCAAGAAAGGAGTAGCATTGCGCATTGGACACACTGAGAAGGGAATTATCGAAATTCCAAGCATGATCGCTACTGAGGCGGCTCTTGCTTGCTTTGCATATAACTTACCAGTGATGACAGGAGGCGTCTCAACTAGTCTGATTGGCAATTGTACTGTGCGCCAGGTCAAAACAATGCAGCAATTTGAATTGAGCCCCTTCTTTATCCAGAATTTCGTTGCTCATGATGGATCAATGCATCCTGTCATACATGACATTCTCAAAAAGTATAAACTCCGAGATTGTATGACACCTTTGTGCGATCAGTCTATACCATACAGGGCATCGAGCACTTGGTTATCGGTTAGTGAATATGAGCGACTTGGAGTGGCTTTAGAAATTCCAAAGCAAGTCAAAATTGCATTCCATATCAAAGAGATCCCTCCTAAGCTCCACGAAATGCTTTGGGAAACGGTTGTCAAGTACAAAGACGTTTGCTTATTTCCAAGCATTCGAGCATCGTCCATCAGCAAAATCGCATACACATTGCGTACAGACCTCTTCGCCATCCCAAGAACTCTAATATTGGTGGAGAGATTGCTTGAAGAGGAGCGAGTGAAGCAGAGCCAATTCAGAAGTCTCATCGATGAAGGGTGCTCAAGCATGTTTTCAATTGTCAACTTGACCAACACTCTCAGGGCCAGATATGCAAAAGATTACACCGCAGAGAACATACAAAAACTTGAGAAAGTGAGAAGTCAATTAAAAGAATTCTCAAATTTGGATGGTTCTGCATGTGAGGAAAATTTAATAAAGAGGTATGAGTCTTTGCAGTTCGTTCACCACCAAGCTGCGACGTCACTTGCAAAGGATCTCAAGTTGAAGGGGACTTGGAAGAAGTCATTAGTGGCTAAAGACTTGATCATAGCAGGCGCTGTTGCAATTGGTGGAATAGGACTCATATATAGTTGGTTCACACAATCAGTTGAGACTGTGTCTCACCAAGGGAAAAATAAATCCAAAAGAATCCAAGCCTTGAAGTTTCGCCATGCTCGTGACAAAAGGGCTGGCTTTGAAATTGACAACAATGATGACACAATAGAGGAATTCTTTGGATCTGCATACAGGAAGAAGGGAAAAGGTAAAGGTACCACAGTTGGTATGGGCAAGTCAAGCAGGAGGTTCATTAACATGTACGGGTTTGACCCGACAGAGTACTCATTCATCCAATTCGTTGATCCACTCACTGGGGCACAAATAGAAGAGAATGTCTATGCTGACATTAGAGATGTTCAAGAAAGATTTAGTGAGGTGCGAAAGAAAATGGTTGAGAATGATGATATTGAAATGCAAGCCTTGGGTAGTAACACGACCATACATGCATACTTTAGGAAAGATTGGTCTGACAAAGCTTTGAAGATTGATTTAATGCCACATAACCCACTCAAAGTTTGTGACAAAACAAATGGCATTGCC
AAACTTCCTGAGAGAGAGTTCGAACTAAGGCAGACTGGGCCAGCTGTAGAAGTCGACGTGAAGGACATACCAGCGCAGGAAGTGGAGCATGAAGCCAAATCGCTCATGAGAGGCTTGAGAGACTTCAACCCAATTGCTCAAACAGTTTGTAGGCTGAAAGTATCTGTTGAATATGGGACATCAGAGATGTACGGTTTTGGATTTGGAGCATACATAATAGCGAACCACCATTTGTTTAGGAGTTACAATGGTTCCATGGAGGTGCGATCCATGCATGGTACATTCAGGGTGAAGAATCTACACAGTTTGAGCGTTCTGCCAATTAAAGGCAGGGACATCATCCTCATCAAAATGCCGAAGGATTTCCCTGTTTTTCCACAGAAATTGCATTTCCGAGCTCCTACACAGAATGAAAGAGTTTGTTTAGTTGGGACCAACTTTCAGGAGAAGTATGCATCGTCGATCATCACAGAAACAAGCACTACTTACAATATACCAGGCAGCACATTCTGGAAGCATTGGATTGAAACAGATAATGGACATTGTGGACTACCAGTGGTGAGCACAGCCGATGGATGTCTAGTCGGAATTCATAGTTTGGCAAACAATGCACACACCACGAACTACTACTCAGCCTTCGATGAAGATTTTGAAAGCAAGTACCTCCGAACCAATGAGCACAATGAATGGGTCAAATCTTGGGTTTATAATCCAGACACAGTGTTGTGGGGCCCGTTGAAGCTTAAAGATAGCACTCCCAAAGGGTTATTCAAAACAACAAAGCTTGTGCAAGATCTAATCGATCATGATGAAGTCGCGGAGCAAGCTAAGCACTCTGCGTGGATGTTTGAAGCCTTGACAGGAAATTTGCAAGCTGTCGCAACAATGAAGAGTCAATTAGTAACCAAGCATGTAGTTAAAGGAGAGTGTCGACACTTCAAAGAATTCCTGACTGTGGATGCAGAAGCAGAGGCATTCTTCAGGCCTTTGATGGATGCATATGGGAAAAGCTTGCTGAATAGAGATGCGTACATCAAAGACATAATGAAGTACTCAAAACCTATAGATGTTGGTATCGTGGACTGTGATGCATTTGAAGAAGCCATCAATAGGGTTATCATCTACTTGCAAGTGCACGGCTTCAAGAAGTGTGCATATGTCACTGACGAGCAAGAAATTTTCAAAGCGCTCAACATGAAAGCTGCAGTCGGAGCCATGTATGGTGGCAAAAAGAAAGACTATTTTGAGCATTTCACTGATGCAGATAAGGAAGAAATAGTCATGCAAAGCTGTCTGCGACTGTATAAAGGCTTGCTCGGCATTTGGAACGGATCGTTGAAGGCAGAGCTCCGGTGTAAGGAAAAGATACTTGCAAACAAGACGAGGACGTTCACTGCTGCACCTCTAGACACTTTGCTGGGTGGTAAAGTGTGTGTTGATGACTTCAATAATCAATTTTATTCAAAGAATATTGAATGCTGTTGGACAGTTGGGATGACTAAGTTTTATGGTGGTTGGGATAAACTGCTTCGGCGTTTACCTGAGAATTGGGTATACTGTGATGCCGATGGCTCACAGTTTGATAGTTCACTAACTCCATACTTAATCAATGCTGTTCTCACCATCAGAAGCACATACATGGAAGACTGGGATGTGGGGTTGCAAATGCTGCGTAATTTATACACTGAGATTATTTACACACCTATCTCAACTCCAGATGGAACAATTGTTAAGAAGTTCAGAGGAAATAACAGTGGTCAGCCTTCTACTGTTGTGGACAACTCTCTTATGGTTGTCCTTGCCATGCACTATGCTCTCATCAAAGAATGCATTGAGTTTGAAGAGATTGACAGCACGTGCGTGTTCTTCGTCAATGGTGATGATTTGCTGATTGCCGTGAATCCGGATAAAGAGGGCATTCTTGATAGATTGTCACAGCACTTCTCAGATCTTGGTTTAAATTATGATTTTTCGTCAAGAACAAGAAATAAGGAAGAGTT
ATGGTTTATGTCTCATAGAGGCCTACTGATTGAGGGCATGTACGTGCCGAAACTTGAAGAAGAAAGGATTGTGTCCATTCTCCAATGGGACAGAGCAGACTTGGCTGAACACAGGCTTGAGGCGATTTGCGCAGCTATGATAGAGTCCTGGGGTTATTCTGAACTAACACACCAAATTAGAAGATTCTACTCATGGTTATTGCAACAGCAACCTTTTGCAACAATAGCGCAGGAAGGGAAGGCTCCTTATATAGCAAGCATGGCATTAAGGAAACTGTATATGGATAGAGCTGTGGATGAGGAAGAGCTGAGAGCCTTCACTGAAATGATGGTCGCATTAGACGATGAGTTTGAGTGTGACTCTTATGAAGTACACCACCAGGGAAACGATACAATCGATGCAGGAGGAAGCAGCAAGAAAGATGCAAGACCAGAGCAAGGCAGCATCCAGTCAAACCCGAACAAAGGAAAAGATAAGGATGTGAATGCTGGTACATCTGGGACACATACTGTGCCGAGAATCAAGGCTATCACGTCCAAAATGAGAATGCCCAAAAGCAAAGGAGCAACCGTGCTAAACTTAGAACACTTGCTTGAGTATGCTCCACAACAAATTGATATTTCAAATACTCGGGCAACTCAATCACAGTTTGATACGTGGTATGAGGCAGTGCGGATGGCATACGACATAGGAGAAACTGAGATGCCAACTGTGATGAATGGGCTTATGGTTTGGTGCATTGAAAATGGAACCTCGCCAAATGTCAACGGAGTCTGGGTTATGATGGATGGGGATGAACAAGTCGAGTACCCGTTGAAACCAATCGTTGAGAATGCAAAACCAACCCTTAGGCAAATCATGGCACATTTCTCAGATGTTGCAGAAGCGTATATAGAAATGCGCAACAAAAAGGAACCATATATGCCACGATATGGTTTAATTCGAAATCTGCGGGATGTGGGTTTAGCGCGTTATGCCTTTGACTTTTATGAGGTCACATCACGAACACCAGTGAGGGCTAGGGAAGCGCACATTCAAATGAAGGCCGCAGCATTGAAATCAGCCCAACCTCGACTTTTCGGGTTGGACGGTGGCATCAGTACACAAGAGGAGAACACAGAGAGGCACACCGCCGAGGATGTCTCTCCAAGTATGCATACTCTACTTGGAGTCAAAAACATGTGA
================================================
FILE: inst/extdata/GVariation/B.Oz.fas
================================================
>Oz
ATGGCAACTTACATGTCAACAATCTGTTTCGGTTCGTTTGAATGCAAGCTACCATACTCACCCGCTTCTTGCGGGCATATTGTGAAGGAGCGAGAAGTGCTGGCTTCCGTTGATCCTTTCGCAGATCTGGAAACACAACTTAGTGCACGATTGCTCAAGCAAGAATATGCTACTGTTCGTGTGCTCAAGAACGGTACTCTTACTTACCGATACAAAACTGATGCCCAGATAACGCGCATTCAGAAGAAACTGGAGAGGAAGGATAGGGAAGAATATCACTTCCAGATGGCCGCTCCTAGTATTGTGTCAAAAATTACAATAGCTGGTGGAGATCCTCCATCAAAGTCTGAGCCACAAGCACCAAGAGGGATCATTCATACAACTCCAAGGGTGCGTAAAGTCAAGACACGTCCCATAATAAAGTTGACAGAAGGCCAGATGAATCATCTCATTAAGCAGGTGAAGCAGATTATGTCGGAGAAGAGAGGGTCTGTCCACTTAATTAGTAAGAAGACCACTCATGTTCAATATAAGGAGATACTTGGAGCAACTCGCGCAGCGGTTCGAACTGCACATATGATGGGTTTGCGACGGAGAGTGGACTTCCGATGTGATATGTGGACAGTCGGACTTTTGCAACGTCTCGCTCGGACGGACAAATGGTCCAATCAAGTCCGCACTATCAACATACGAAGGGGTGATAGTGGAGTCATTTTGAACACAAAAAGCCTCAAAGGCCACTTTGGTAGAAGTTCAGGAGACTTGTTCATAGTGCGTGGATCACACGAAGGGAAATTGTACGATGCACGTTCTAGAGTTACTCAGAGTGTTTTGAACTCAATGATCCAGTTTTCGAATGCTGATAATTTTTGGAAGGGTCTAGACGGTAATTGGGCACAACTGAGATATCCTTCGGATCACACATGTGTAGCTGGTTTACCTGTCGAAGATTGTGGTAGAGTTGCTGCATTGATGGCACACAGTATCCTCCCGTGCTACAAGATAACCTGCCCCACCTGTGCTCAACAGTATGCCAGCTTGCCGGTTAGCGATCTGTTTAAGCTGTTGCATAAACATGCGAGAGATGGTTTGAACCGATTGGGAGCGGATAAAGACCGGTTTATACATGTTAATAAGTTCTTGATAGCGTTAGAGCATCTAACTGAACCGGTGGATTTGAATCTCGAGCTTTTCAATGAGATATTTAAATCCATAGGGGAGAAGCAGCAAGCACCGTTCAAGAATTTAAATGTCTTAAATAATTTCTTCCTGAAAGGAAAAGAAAATACAGCTCATGAATGGCAAGTGGCTCAATTGAGTTTGCTCGAATTAGCAAGGTTCCAGAAGAATAGAACTGATAACATCAAGAAAGGTGATATATCTTTCTTCAGAAATAAATTATCTGCCAAGGCAAACTGGAATCTGTATTTGTCGTGCGACAACCAGTTGGATAAAAATGCAAATTTTCTGTGGGGACAAAGGGAGTATCATGCTAAGCGGTTTTTCTCAAACTTCTTTGAGGAAATTGATCCAGCAAAGGGATACTCAGCATATGAAATCCGCAAGCATCCAAATGGAACAAGGAAGCTCTCAATTGGTAACTTAGTTGTCCCACTTGATTTAGCTGAGTTTAGGCAGAAGATGAAAGGTGACTATAGGAAACAACCAGGAGTTAGCAGAAAGTGCACGAGTTCGAAAGATGGTAATTATGTGTATCCCTGTTGTTGCACAACACTTGATGATGGTTCAGCTATTGAATCAACATTCTATCCACCAACCAAAAAGCACCTTGTAATAGGCAATAGCGGTGACCAAAAATTTGTTGATTTACCAAAAGGGGATTCGGAGATGTTATACATTGCCAAGCAGGGTTATTGTTATATCAACGTGTTTCTTGCAATGCTTATTAACATTAGCGAGGAGGATGCAAAGGATTTCACAAAGAAAGTTCGCGACATGTGTGTGCCAAAGCTTGGAACCTGGCCAACTATGATGGATTT
GGCGACCACTTGTGCTCAAATGAGAATATTCTATCCTGACGTGCATGATGCAGAGCTGCCTAGAATATTGGTTGACCATGACACTCAAACGTGTCACGTGGTTGACTCATTTGGCTCGCAAACAACTGGATATCATATTCTAAAAGCATCCAGCGTGTCTCAACTTATCTTGTTTGCAAATGATGAATTAGAATCTGATATAAAACATTATAGAGTTGGTGGTGTTCCTAATGCATGCCCTGAACTTGGGTCCACAATATCACCTTTCAGAGAAGGAGGAGTTATAATGTCTGAGTCGGCAGCGCTGAAACTGCTTTTAAAGGGAATTTTTAGACCTAAGGTGATGAGACAGTTGCTGTTAGATGAGCCTTACCTGTTGATTTTATCAATATTATCTCCTGGCATACTGATGGCTATGTATAATAATGGGATTTTTGAACTTGCGGTAAGGTTGTGGATTAATGAGAAACAATCCATAGCTATGATAGCATCGCTACTATCAGCTTTAGCCCTACGAGTGTCAGCGGCAGAAACACTCGTCGCACAGAGGATTATCATTGATGCTGCAGCTACAGACCTCCTTGATGCTACGTGTGATGGGTTCAACCTACATCTAACGTACCCCACTGCATTGATGGTGTTGCAAGTTGTTAAGAATAGAAATGAATGTGATGATACCCTATTCAAGGCGGGTTTTTCAAGTTACAACACGAGCGTCGTACAGATTATGGAAAAAAATTATCTAAATCTCTTGAACGATGCTTGGAAAGATTTAACTTGGCGGGAAAAATTATCCGCAACATGGTACTCATACAGAGCAAAACGCTCTATCACTCGGTACATAAAACCCACAGGAAGGGCAGATTTGAAAGGGTTATACAACATATCACCACAAGCATTCTTGGGCCGAAGCGCCCAGGTGGTCAAAGGTACTGCCTCAGGATTGAGTGAGCGATTTAATAATTATTTCAATACTAAGTGTGTAAATATTTCATCCTTTTTCATTCGTAGAATCTTTAGGCGTTTGCCAACTTTCGTCACTTTTGTTAACTCATTATTAGTTATTAGTATGTTAACTAGCGTAGTGGCAGTGTGTCAGGCAATAATTTTAGATCAGAGGAAATATAGGAGAGAAATCGAGTTGATGCAGATAGAGAAGAATGAGATTGTCTGCATGGAGCTATATGCAAGTTTACAGCGCAAACTTGAACGCGATTTCACATGGGATGAGTACATTGAGTATTTGAAATCAGTAAACCCTCAGATAGTTCAGTTTGCTCAAGCGCAGATGGAAGAATATGATGTGCGACACCAGCGTTCCACACCAGGTGTTAAAAATTTGGAACAAGTGGTAGCATTTATGGCTTTAGTCATTATGGTGTTCGATGCTGAAAGGAGTGATTGCGTGTTCAAAACTCTCAATAAATTTAAGGGTGTCCTTTCCTCAATGGACTATGAAGTTAGACATCAGTCCTTAGACGATGTGATCAAGAATTTTGATGAGAGGAATGAGATTATTGATTTTGAATTGAGTGAGGACACAATTCGAACATCATCAGTGCTAGATACAAAGTTTAGTGATTGGTGGGACCGACAAATCCAGATGGGACATACACTTCCACATTACAGAACCGAGGGGCACTTCATGGAATTCACAAGAGCAACTGCTGTCCAAGTGGCTAATGACATTGCCCATAGCGAACACCTAGACTTTCTAGTAAGGGGAGCTGTTGGGTCTGGAAAGTCAACTGGGTTACCTGTTCATCTTAGTGTAGCCGGATCTGTGCTTTTAATTGAACCAACGCGACCACTAGCGGAGAACGTTTTCAAACAGCTATCTAGTGAACCATTCTTCAAGAAGCCAACACTGCGTATGCGTGGAAATAGTATATTTGGCTCTTCTCCAATCTCCGTCATGACTAGCGGATTTGCGCTACACTACTTCGCCAATAATCGCTCTCAATTAGCTCAGTTCAACTTTGTAATATTTGATGAGTGCC
ATGTTCTGGATCCTTCCGCAATGGCGTTCCGCAGTCTGCTGAGTGTTTATCATCAAGCATGCAAAGTATTAAAAGTGTCAGCTACTCCAGTGGGAAGAGAGGTTGAATTTACAACACAGCAACCAGTCAAGTTAATAGTGGAGGACACACTGTCTTTCCAATCATTTGTTGATGCACAAGGTTCTAAAACTAATGCTGATGTTGTTCAGTTTGGTTCAAACGTACTTGTGTACGTGTCGAGCTACAATGAAGTTGATACCTTGGCTAAGCTCCTAACAGACAAGAATATGATGGTCACAAAGGTTGATGGCAGAACAATGAAGCACGGTTGCCTAGAAATTGTCACAAAAGGAACCAGTGCGAGACCACATTTTGTTGTAGCAACCAACATAATTGAGAATGGAGTGACTTTGGACATAGACGTGGTTGTAGATTTTGGGTTAAAAGTCTCACCGTTCTTGGACATTGACAATAGGAGCATTGCTTACAATAAGGTGAGTGTTAGCTATGGTGAGAGAATTCAAAGGCTGGGTCGTGTTGGACGCTTCAAGAAAGGAGTAGCATTGCGCATTGGACACACTGAGAAGGGAATTATTGAAATTCCAAGCATGATCGCTACAGAGGCAGCTCTTGCTTGCTTTGCATATAACTTACCAGTGATGACAGGAGGCGTCTCAACTAGTCTGATTGGCAATTGTACTGTGCGCCAAGTTAAAACAATGCAGCAATTTGAATTGAGTCCCTTCTTTATCCAGAATTTCGTTGCCCATGATGGATCAATGCATCCTGTCATACATGACATTCTTAAAAAGTATAAACTTCGAGATTGTATGACACCTTTGTGCGATCAGTCTATACCATACAGGGCATCGAGCACTTGGTTATCGGTTAGTGAATATGAGCGACTTGGAGTGGCCTTAGAAATTCCAAAGCAAGCCAAAATTGCATTCCATATCAAAGAGATCCCTCCTAAGCTCCACGAAATGCTTTGGGAAACGGTTGTCAAGTACAAAGACGTTTGCTTATTTCCAAGCATTCGAGCATCGTCCATCAGCAAAATCGCATACACATTGCGTACAGACCTCTTCGCCATCCCAAGAACTCTAATATTGGTGGAGAGACTGCTTGAAGAGGAGCGAGTGAAGCAGAGCCAATTCAGAAGTCTCATCGATGAAGGATGCTCAAGCATGTTTTCAATTGTCAACCTGACAAACACTCTCAGAGCTAGATATGCAAAAGATTACACCGCAGAGAACATACAAAAACTTGAGAAAGTGAGAAGTCAATTGAAAGAATTCTCAAATTTGGATGGTTCTGCATGTGAGGAAAATTTAATAAAGAGGTATGAGTCTTTGCAGTTCGTTCATCACCAAGCTGCGACGTCACTTGCAAAGGATCTCAAGTTAAAGGGGACTTGGAAGAAGTCATTAGTGGCCAAAGACTTGATCATAGCAGGCGCTGTTGCAATTGGTGGAATAGGACTCATATATAGTTGGTTCACACAATCAGTTGAGACTGTGTCTCACCAAGGGAAAAATAAATCCAAAAGAATTCAAGCCTTGAAGTTTCGCCATGCTCGTGACAAAAGGGCTGGCTTTGAAATTGACAACAATGATGACACAATAGAGGAATTCTTTGGATCTGCATACAGGAAAAAGGGAAAAGGTAAAGGTACCACAGTTGGTATGGGCAAGTCAAGCAGGAGGTTCATCAACATGTATGGGTTTGATCCAACAGAGTACTCATTCATCCAATTCGTTGATCCACTCACTGGGGCGCAAATAGAAGAGAATGTCTATGCTGACATTAGAGATATTCAAGAGAGATTTAGTGAAGTGCGAAAGAAAATGGTTGAGAATGATGACATTGAAATGCAAGCCTTGGGTGGTAACACGACCATACATGCATACTTTAGGAAAGATTGGTCTGACAAAGCTTTGAAGATTGATTTAATGCCACATAATCCACTCAAAGTTTGTGACAAGACAAATGGCATTGCC
AAATTTCCTGAGAGAGAGCTCGAACTAAGGCAGACTGGGCCAGCTGTAGAAGTCGATGTGAAGGACATACCAGCACAGGAAGTGGAGCATGAAGCTAAATCGCTCATGAGAGGCTTGAGAGACTTCAACCCAATTGCCCAAACAGTTTGTAGGCTGAAAGTATCTGTTGAATATGGGACATCAGAGATGTACGGTTTTGGATTTGGAGCATACATAATAGCGAACCACCATTTGTTCAGGAGTTACAATGGTTCCATGGAGGTGCGATCCATGCACGGTACATTCAGGGTGAAGAATCTACACAGTTTGAGCGTTCTGCCAATTAAAGGTAGGGATATCATCCTCATCAAAATGCCGAAAGATTTCCCTGTCTTTCCACAGAAATTGCATTTCCGAGCTCCTACACAGAATGAAAGAGTTTGTTTAGTTGGAACCAACTTTCAGGAGAAGTATGCATCGTCGATCATCACAGAAACAAGCACCACTTACAATATACCAGGCAGCACATTCTGGAAGCATTGGATTGAAACAGATAATGGACATTGTGGATTACCAGTGGTGAGCACCACCGATGGATGTCTAGTCGGAATTCACAGTTTGGCAAACAACAAACACACCACGAACTACTACTCAGCCTTCGATGAAGATTTTGAAAGCAAGTATCTCCGAACCAATGAGCACAATGAATGGGTCAAGTCTTGGATTTATAATCCAGACACAGTGTTGTGGGGCCCGTTGAAACTTAAAGACAGCACTCCCAAAGGATTATTCAAAACAACAAAGCTTGTGCAAGATCTAATCGATCATGATGTAGTGGTGGAGCAAGCTAAGCACTCTGCGTGGATGTTTGAAGCCTTGACAGGAAATTTGCAAGCTGTCGCAACAATGAAGAGCCAATTAGTAACCAAGCATGTAGTTAAAGGAGAGTGTCGACACTTCAAAGAATTCCTGACTGTGGATGCAGAAGCAGAGGCATTCTTCAGGCCTTTGATGGATGCGTATGGGAAAAGCTTGCTGAATAGAGATGCATACATCAAGGACATAATGAAGTATTCAAAACCTATAGATGTTGGTATCGTGGACTGTGATGCATTTGAGGAAGCCATCAATAGGGTTATCATCTACCTGCAAGTGCACGGCTTCAAGAAGTGCGCATACGTCACTGACGAGCAAGAAATTTTCAAAGCGCTCAACATGAAAGCTGCAGTTGGAGCCATGTATGGTGGCAAAAAGAAAGACTATTTTGAGCATTTCACTGATGCAGATAAGGAAGAAATAGTCATGCAAAGCTGTCTGCGATTGTATAAAGGCTTGCTTGGCATTTGGAATGGATCATTGAAGGCAGAGCTCCGGTGTAAGGAAAAGATACTTGCAAATAAGACGAGGACATTCACTGCTGCACCTTTAGACACTTTGCTGGGTGGTAAAGTGTGTGTTGATGATTTCAATAATCAATTTTATTCAAAGAATATTGAATGCTGTTGGACGGTTGGGATGACTAAGTTTTATGGTGGTTGGGATAAACTGCTGCGGCGTTTACCTGAGAATTGGGTATACTGTGATGCCGATGGCTCACAGTTTGATAGTTCACTAACTCCATACTTAATCAATGCTGTTCTCACCATCAGAAGCACATACATGGAAGATTGGGATGTGGGGTTGCAAATGTTGCGCAATTTATACACTGAGATTGTTTACACACCTATTTCAACTCCAGATGGAACAATTGTTAAGAAGTTCAGAGGAAATAACAGTGGTCAGCCTTCTACTGTTGTGGACAACTCTCTTATGGTCGTCCTTGCCATGCACTATGCTCTCATCAAAGAATGCATTGAGTTTGAAGAGATTGACAGCACGTGCGTGTTCTTTGTCAATGGTGATGATTTGCTGATTGCTGTGAATCCGGATAAAGAGGGCATTCTTGACAGATTGTCACAACACTTCTCAGATCTTGGTTTGAATTATGATTTCTCGTCAAGAACAAGAAATAAGGAGGAATT
GTGGTTTATGTCTCATAGAGGCCTACTGATTGAGGGCATGTACGTGCCGAAACTTGAAGAAGAAAGGATTGTGTCCATTCTCCAATGGGACAGAGCAGACTTGGCTGAACACAGGCTTGAGGCGATTTGCGCAGCTATGATAGAGTCCTGGGGTTATTCTGAACTAACACACCAAATCAGGAGATTCTACTCATGGTTATTGCAACAGCAACCCTTTGCAACAATAGCGCAGGAAGGGAAGGCTCCTTATATAGCAAGCATGGCATTAAGGAAATTGTATATGGATAGGGCTGTGGATGAGGAAGAGCTGAGAGCCTTCACTGAAATGATGGTCGCATTAGACGATGAGTTTGAATTTGACTCTTATGAAGTACACCATCAAGCAAATGACACAATCGATGCAGGAGGAAGCAGCAAGAAAGATGCAAGACCGGAGCAAGGCAGCATCCAGTCAAACCCGAACAAAGGAAAAGATAAGGATGTGAATGCTGGTACATCTGGGACACATACTGTGCCGAGAATCAAGGCTATCACGTCCAAAATGAGAATGCCCAAAAGCAAGGGAGCAACCGTGCTAAACCTAGAACACTTGCTTGAGTATGCTCCACAACAAATTGATATTTCAAATACTCGGGCAACTCAATCACAGTTTGATACGTGGTATGAGGCAGTGCGGATGGCATACGACATAGGAGAAACTGAGATGCCAACTGTGATGAATGGGCTTATGGTTTGGTGCATTGAAAATGGAACCTCGCCAAATGTCAACGGAGTTTGGGTTATGATGGATGGGAATGAACAAGTCGAGTACCCGTTGAAACCAATCGTTGAGAATGCAAAACCAACCCTTAGGCAAATCATGGCACATTTCTCAGATGTTGCAGAAGCGTATATAGAAATGCGCAACAAAAAGGAACCATATATGCCACGATATGGTTTAATTCGAAATCTGCGGGATGTGGGTTTAGCGCGTTATGCCTTTGACTTTTATGAGGTCACATCACGAACACCAGTGAGGGCTAGGGAAGCGCACATTCAAATGAAGGCCGCAGCATTGAAATCAGCCCAACCTCGACTTTTCGGGTTGGACGGTGGCATCAGTACACAAGAGGAGAACACAGAGAGGCACACCACCGAGGATGTCTCTCCAAGTATGCATACTCTACTTGGAGTCAAGAACATGTGA
>CF_YL21
ATGGCAATCTACACGTCAACAATCTGTTTCGGTTCGTTTGAATGCAAGCTACCATACTCACCCGCCTCTTGCGGGCATATTGTGAAGGAACGAGAAGTGCTGGCTTCCATCGATCCTTTCGCAGATCTGGAAACACAACTTAGTGCACGATTGCTCAAGCAAGAATATGCTACTGTTCGTGTGCTCAAGAACGGTACTCTTACGTACCGATACAAGACTGATGCCCAGATAACGCGCATCCAGAAGAGACTGGAAAGGAAGGATAGGGAAGAATACCACTTCCAGATGGCAGCTCCTAGTATTGTGTCAAAGATCACTATTGCTGGTGGAGAGCCACCTTCAAAACTTGAATCACAAGTGCGGAGGGGTGTCATCCACACAACTCCAAGGATGCGCACAGCAAAAACATATCACACGCCAAAGCTGACAGAGGGACAAATGAACCACCTTATCAAGCAGGTGAAGCAAATTATGTCAACCAAAGGAGGGTCTGTTCAACTGATTAGCAAGAAAAGTACCCATGTTCACTATAAAGAAGTTTTGGGATCACATCGCGCAGTTGTTTGCACTGCACATATGAGAGGTTTACGAAAGAGAGTGGACTTTCGGTGTGATAAATGGACCGTTGTGCGTCTACAGCATCTCGCCAGGACGGATAAGTGGAATAACCAAGTTCGTGTTACTGATCTACGCAAGGGCGATAGTGGAGTTATATTGAGTAATACTAATCTCAAAGGAAACTTTGGGAGAAGCTCGGAGGGCATATTCATAGTGCGTGGGTCGCACGAAGGAAAAATCTATGATGCACGTTCCAAGGTTACTCAAGGGGTTATGGATTCAATGGTTCAGTTCTCAAGCGCTGAAAGCTTTTGGAAGGGATTGGACGGCAATTGGGCACAAATGAGATATCCTACAGATCATACATGTGTGGCAGGCTTACCAGTTGAAGACTGTGGCAGAGTTGCAGCGATAATGACACACAGTATTTTACCGTGCTATAAGATAACCTGCCCTACCTGTGCCCAACAATATGCCAACTTGCCAGCCAGTGACTTACTTAAGATATTACACAAGCACGCAAGTGATGGTTTAAATCGATTGGGGGCAGACAAAGATCGCTTTGTGCATGTCAAAAAGTTCTTGACAATCTTAGAGCACTTAACTGAACCGGTTGATCTGAGTCTAGAAATTTTCAATGAAGTATTCAAGTCTATAGGGGAGAAGCAACAATCACCTTTCAAAAACCTGAATATTCTGAATAATTTCTTTTTAAAAGGAAAGGAAAATACAGCTCGTGAATGGCAGGTGGCTCAATTAAGCTTACTTGAATTGGCAAGATTCCAAAAGAACAGAACGGATAATATCAAGAAAGTAGACATCTCGTTCTTTAGGAATAAACTATCTGCCAAAGCAAATTGGAACTTGTATCTGTCATGTGATAACCAGCTGGATAAGAATGCAAACTTCCTGTGGGGACAGAGGGAATATCATGCTAAGCGATTTTTCTCGAACTATTTCGAGGAAATTGATCCAGCGAAGGGCTATTCAGCATACGAAAATCGTTTGCATCCGAATGGGACAAGAAAACTTGCAATTGGAAACCTAATTGTACCACTTGATCTGGCTGAGTTTAGGCGGAAGATGAAAGGTGATTTTAAAAGACAGCCAGGGGTGAGTAAGAAGTGCACGAGCTCGAAGGATGGAAACTACGTGTATCCCTGTTGTTGCACTACACTTGATGATGGCTCAGCTGTTGAATCAACATTTTACCCGCCAACTAAGAAGCGCCTCGTAATAGGTAATAGTGGCGACCAAAAGTATGTTGACTTACCAAAAGGGAATTCTGAGATGTTATATATTGCCAGGCAAGGCTTCTGTTACATTAACATTTTCCTCGCGATGTTGATTAACATTAGTGAGGAAGATGCAAAGGATTTCACTAAGAAGGTTCGTGACATGTGTGTGCCAAAGCTCGGAACCTGGCCAACCATGATGGATCT
GGCTACAACTTGTGCTCAAATGAAAATATTCTACCCTGATGTTCATGATGCAGAACTGCCTAGAATACTAGTCGATCACGAAACGCAGACATGCCATGTGGTTGACTCGTTTGGCTCACAAACAACTGGGTATCATATTTTGAAAGCATCTAGCGTGTCCCAACTTATTTTGTTTGCTAATGATGAGTTGGAGTCTGACATTAAGCACTATAGAGTTGGTGGTATTCCTAATGCATGCCCTGAACTTGGGTCCACAATATCACCTTTCAGAGAAGGAGGAGTTATAATGTCTGAGTCGGCAGCGCTGAAACTGCTTTTGAAGGGAATTTTTAGACCTAAGGTGATGAGACAGTTGCTGTTAGATGAGCCTTACCTGTTGATTCTATCAATATTATCCCCTGGCATACTGATGGCTATGTATAATAATGGGATTTTTGAACTTGCGGTGAGGTTGTGGATTAATGAGAAGCAATCCATAGCGATGATAGCATCGCTACTATCAGCTTTAGCCCTACGAGTGTCAGCGGCAGAAACACTCGTCGCACAGAGGATTATAATTGATGCTGCAGCTACAGACCTCCTCGATGCTACGTGTGACGGGTTCAACCTGCATCTAACGTACCCCACTGCATTGATGGTGTTGCAAGTTGTTAAGAATAGAAATGAATGTGATGATACTCTATTCAAGGCGGGTTTTCCAAGTTACAACACGAGCGTCGTACAGATTATGGAAAAAAATTATCTAAATCTCTTGAACGATGCTTGGAAAGATTTAACTTGGCGGGAAAAATTTTCCGCAACATGGTACTCATACAGAGCAAAACGCTCTATCACTCGGTACATAAAACCCACAGGAAGGGCAGATTTGAAAGGGTTATACAACATATCACCACAAGCATTCTTGGGCCGAAGCGTCCAGGTGGTCAAAGGCACTGCCTCAGGATTGAGCGAGCGATTTAATAATTACTTCAATACTAAGTGTGTAAATATTTCATCCTTTTTCATTCGTAGAATCTTTAGGCGTTTGCCAACTTTCGTCACTTTTGTTAACTCATTATTAGTTATTAGTATGTTAACTAGTGTAGTGGCAGTGTGTCAGGCAATAATTTTAGATCAGAAGAAGTATAGGAGAGAGATCGAGTTGATGCAGATAGAGAAGAATGAGATTGTCTGCATGGAGCTATACGCAAGTTTACAGCGCAAACTTGAACGCGATTTCACATGGGATGAGTACATTGAGTATTTGAAATCAGTGAACCCTCAGATAGTTCAATTTGCTCAAGCGCAGATGGAAGAATATGATGTGCGACACCAGCGTTCCACACCAGGTGTTAAAAATTTGGAACAAGTGGTAGCATTTATGGCTTTAGTCATCATGGTGTTCGATGCTGAAAGGAGTGACTGCGTGTTCAAAACTCTCAATAAATTTAAGGGTGTCCTTTCCTCACTGGACCATGAAGTTAGACATCAGTCCTTGGATGATGTGATCAAGAATTTTGATGAGAGGAATGAGACTATTGATTTTGAATTGAGTGAGGACACAATTCGAACATCATCAGTGCTAGATACAAAGTTTAGTGATTGGTGGGACCGACAAATCCAAATGGGACATACACTTCCACATTACAGAACCGAGGGGCACTTCATGGAATTTACAAGGGCAACTGCTGTTCAAGTGGCTAATGACATTGCCCACAGCGAACACCTAGACTTTCTAGTAAGGGGAGCTGTTGGGTCTGGAAAGTCAACTGGGTTGCCTGTTCATCTTAGTGTAGCTGGATCTGTGCTTTTAATTGAGCCAACGCGGCCACTAACGGAGAACGTTTTCAAACAGCTATCTAGTGAACCATTCTTCAAGAAGCCAACACTGCGTATGCGTGGAAATAGTATATTTGGCTCTTCCCCAATCTCCGTTATGACTAGCGGGTTTGCGCTACACTACTTCGCCAATAATCGCTCTCAATTAGCTCAGTTCAATTTTGTAATATTTGATGAGTGCC
ATGTTCTGGATCCTTCCGCAATGGCGTTCCGCAGTCTGCTGAGTGTTTATCATCAAGCATGTAAAGTATTAAAAGTGTCAGCTACTCCAGTGGGAAGAGAGGTTGAATTCACAACACAGCAGCCAGTCAAGTTAATAGTGGAAGACACACTGTCTTTCCAATCATTTGTTGATGCACAAGGTTCTAAAACTAATGCTGATGTTGTCCAGTTTGGTTCAAACATACTTGTTTACGTGTCGAGCTACAATGAAGTTGACAACTTGGCTAAGCTCCTAACAGATAAGAATATGATGGTCACAAAGGTTGATGGCAGAACAATGAAGCACGGTTGCCTAGAAATTGTCACAAAAGGAACCAGTGCGAGACCACATTTTGTTGTAGCAACCAACATAATTGAGAATGGAGTGACTTTGGACATAGATGTAGTAGTGGATTTTGGGTTGAAAGTCTCACCGTTCTTGGACATTGACAATAGGAGCATTGCTTACAATAAGGTGAGTGTTAGTTATGGTGAGAGAATTCAAAGGCTGGGTCGTGTTGGACGCTTCAAGAAAGGAGTAGCATTGCGCATTGGACACACTGAGAAGGGAATTATCGAAATTCCAAGCATGATCGCTACTGAGGCGGCTCTTGCTTGCTTTGCATATAACTTACCAGTGATGACAGGAGGCGTCTCAACTAGTCTGATTGGCAATTGTACTGTGCGCCAGGTCAAAACAATGCAGCAATTTGAATTGAGCCCCTTCTTTATCCAGAATTTCGTTGCTCATGATGGATCAATGCATCCTGTCATACATGACATTCTCAAAAAGTATAAACTCCGAGATTGTATGACACCTTTGTGCGATCAGTCTATACCATACAGGGCATCGAGCACTTGGTTATCGGTTAGTGAATATGAGCGACTTGGAGTGGCTTTAGAAATTCCAAAGCAAGTCAAAATTGCATTCCATATCAAAGAGATCCCTCCTAAGCTCCACGAAATGCTTTGGGAAACGGTTGTCAAGTACAAAGACGTTTGCTTATTTCCAAGCATTCGAGCATCGTCCATCAGCAAAATCGCATACACATTGCGTACAGACCTCTTCGCCATCCCAAGAACTCTAATATTGGTGGAGAGATTGCTTGAAGAGGAGCGAGTGAAGCAGAGCCAATTCAGAAGTCTCATCGATGAAGGGTGCTCAAGCATGTTTTCAATTGTCAACTTGACCAACACTCTCAGGGCCAGATATGCAAAAGATTACACCGCAGAGAACATACAAAAACTTGAGAAAGTGAGAAGTCAATTAAAAGAATTCTCAAATTTGGATGGTTCTGCATGTGAGGAAAATTTAATAAAGAGGTATGAGTCTTTGCAGTTCGTTCACCACCAAGCTGCGACGTCACTTGCAAAGGATCTCAAGTTGAAGGGGACTTGGAAGAAGTCATTAGTGGCTAAAGACTTGATCATAGCAGGCGCTGTTGCAATTGGTGGAATAGGACTCATATATAGTTGGTTCACACAATCAGTTGAGACTGTGTCTCACCAAGGGAAAAATAAATCCAAAAGAATCCAAGCCTTGAAGTTTCGCCATGCTCGTGACAAAAGGGCTGGCTTTGAAATTGACAACAATGATGACACAATAGAGGAATTCTTTGGATCTGCATACAGGAAGAAGGGAAAAGGTAAAGGTACCACAGTTGGTATGGGCAAGTCAAGCAGGAGGTTCATTAACATGTACGGGTTTGACCCGACAGAGTACTCATTCATCCAATTCGTTGATCCACTCACTGGGGCACAAATAGAAGAGAATGTCTATGCTGACATTAGAGATGTTCAAGAAAGATTTAGTGAGGTGCGAAAGAAAATGGTTGAGAATGATGATATTGAAATGCAAGCCTTGGGTAGTAACACGACCATACATGCATACTTTAGGAAAGATTGGTCTGACAAAGCTTTGAAGATTGATTTAATGCCACATAACCCACTCAAAGTTTGTGACAAAACAAATGGCATTGCC
AAACTTCCTGAGAGAGAGTTCGAACTAAGGCAGACTGGGCCAGCTGTAGAAGTCGACGTGAAGGACATACCAGCGCAGGAAGTGGAGCATGAAGCCAAATCGCTCATGAGAGGCTTGAGAGACTTCAACCCAATTGCTCAAACAGTTTGTAGGCTGAAAGTATCTGTTGAATATGGGACATCAGAGATGTACGGTTTTGGATTTGGAGCATACATAATAGCGAACCACCATTTGTTTAGGAGTTACAATGGTTCCATGGAGGTGCGATCCATGCATGGTACATTCAGGGTGAAGAATCTACACAGTTTGAGCGTTCTGCCAATTAAAGGCAGGGACATCATCCTCATCAAAATGCCGAAGGATTTCCCTGTTTTTCCACAGAAATTGCATTTCCGAGCTCCTACACAGAATGAAAGAGTTTGTTTAGTTGGGACCAACTTTCAGGAGAAGTATGCATCGTCGATCATCACAGAAACAAGCACTACTTACAATATACCAGGCAGCACATTCTGGAAGCATTGGATTGAAACAGATAATGGACATTGTGGACTACCAGTGGTGAGCACAGCCGATGGATGTCTAGTCGGAATTCATAGTTTGGCAAACAATGCACACACCACGAACTACTACTCAGCCTTCGATGAAGATTTTGAAAGCAAGTACCTCCGAACCAATGAGCACAATGAATGGGTCAAATCTTGGGTTTATAATCCAGACACAGTGTTGTGGGGCCCGTTGAAGCTTAAAGATAGCACTCCCAAAGGGTTATTCAAAACAACAAAGCTTGTGCAAGATCTAATCGATCATGATGAAGTCGCGGAGCAAGCTAAGCACTCTGCGTGGATGTTTGAAGCCTTGACAGGAAATTTGCAAGCTGTCGCAACAATGAAGAGTCAATTAGTAACCAAGCATGTAGTTAAAGGAGAGTGTCGACACTTCAAAGAATTCCTGACTGTGGATGCAGAAGCAGAGGCATTCTTCAGGCCTTTGATGGATGCATATGGGAAAAGCTTGCTGAATAGAGATGCGTACATCAAAGACATAATGAAGTACTCAAAACCTATAGATGTTGGTATCGTGGACTGTGATGCATTTGAAGAAGCCATCAATAGGGTTATCATCTACTTGCAAGTGCACGGCTTCAAGAAGTGTGCATATGTCACTGACGAGCAAGAAATTTTCAAAGCGCTCAACATGAAAGCTGCAGTCGGAGCCATGTATGGTGGCAAAAAGAAAGACTATTTTGAGCATTTCACTGATGCAGATAAGGAAGAAATAGTCATGCAAAGCTGTCTGCGACTGTATAAAGGCTTGCTCGGCATTTGGAACGGATCGTTGAAGGCAGAGCTCCGGTGTAAGGAAAAGATACTTGCAAACAAGACGAGGACGTTCACTGCTGCACCTCTAGACACTTTGCTGGGTGGTAAAGTGTGTGTTGATGACTTCAATAATCAATTTTATTCAAAGAATATTGAATGCTGTTGGACAGTTGGGATGACTAAGTTTTATGGTGGTTGGGATAAACTGCTTCGGCGTTTACCTGAGAATTGGGTATACTGTGATGCCGATGGCTCACAGTTTGATAGTTCACTAACTCCATACTTAATCAATGCTGTTCTCACCATCAGAAGCACATACATGGAAGACTGGGATGTGGGGTTGCAAATGCTGCGTAATTTATACACTGAGATTATTTACACACCTATCTCAACTCCAGATGGAACAATTGTTAAGAAGTTCAGAGGAAATAACAGTGGTCAGCCTTCTACTGTTGTGGACAACTCTCTTATGGTTGTCCTTGCCATGCACTATGCTCTCATCAAAGAATGCATTGAGTTTGAAGAGATTGACAGCACGTGCGTGTTCTTCGTCAATGGTGATGATTTGCTGATTGCCGTGAATCCGGATAAAGAGGGCATTCTTGATAGATTGTCACAGCACTTCTCAGATCTTGGTTTAAATTATGATTTTTCGTCAAGAACAAGAAATAAGGAAGAGTT
ATGGTTTATGTCTCATAGAGGCCTACTGATTGAGGGCATGTACGTGCCGAAACTTGAAGAAGAAAGGATTGTGTCCATTCTCCAATGGGACAGAGCAGACTTGGCTGAACACAGGCTTGAGGCGATTTGCGCAGCTATGATAGAGTCCTGGGGTTATTCTGAACTAACACACCAAATTAGAAGATTCTACTCATGGTTATTGCAACAGCAACCTTTTGCAACAATAGCGCAGGAAGGGAAGGCTCCTTATATAGCAAGCATGGCATTAAGGAAACTGTATATGGATAGAGCTGTGGATGAGGAAGAGCTGAGAGCCTTCACTGAAATGATGGTCGCATTAGACGATGAGTTTGAGTGTGACTCTTATGAAGTACACCACCAGGGAAACGATACAATCGATGCAGGAGGAAGCAGCAAGAAAGATGCAAGACCAGAGCAAGGCAGCATCCAGTCAAACCCGAACAAAGGAAAAGATAAGGATGTGAATGCTGGTACATCTGGGACACATACTGTGCCGAGAATCAAGGCTATCACGTCCAAAATGAGAATGCCCAAAAGCAAAGGAGCAACCGTGCTAAACTTAGAACACTTGCTTGAGTATGCTCCACAACAAATTGATATTTCAAATACTCGGGCAACTCAATCACAGTTTGATACGTGGTATGAGGCAGTGCGGATGGCATACGACATAGGAGAAACTGAGATGCCAACTGTGATGAATGGGCTTATGGTTTGGTGCATTGAAAATGGAACCTCGCCAAATGTCAACGGAGTCTGGGTTATGATGGATGGGGATGAACAAGTCGAGTACCCGTTGAAACCAATCGTTGAGAATGCAAAACCAACCCTTAGGCAAATCATGGCACATTTCTCAGATGTTGCAGAAGCGTATATAGAAATGCGCAACAAAAAGGAACCATATATGCCACGATATGGTTTAATTCGAAATCTGCGGGATGTGGGTTTAGCGCGTTATGCCTTTGACTTTTATGAGGTCACATCACGAACACCAGTGAGGGCTAGGGAAGCGCACATTCAAATGAAGGCCGCAGCATTGAAATCAGCCCAACCTCGACTTTTCGGGTTGGACGGTGGCATCAGTACACAAGAGGAGAACACAGAGAGGCACACCGCCGAGGATGTCTCTCCAAGTATGCATACTCTACTTGGAGTCAAAAACATGTGA
================================================
FILE: inst/extdata/GVariation/C.Wilga5.fas
================================================
>Wilga5
ATGGCAACTTACATGTCAACAATCTGTTTCGGTTCGTTTGAATGCAAGCTACCATACTCACCCGCCTCTTGCGGGCATATTGTGAAGGAACGAGAAGTGCTGGCTTCCGTTGATCCTTTCGCAGATCTGGAAACACAACTTAGTGCACGATTGCTCAAGCAAGAATATGCTACTGTTCGTGTGCTCAAGAACGGTACTCTTACGTACCGATACAAGACTGATGCCCAGATAACGCGCATCCAGAAGAAACTGGAAAGGAAGGATAGGGAAGAATATCACTTCCAGATGGCAGCTCCTAGTATTGTGTCAAAGATCACTATTGCTGGTGGAGAGCCACCTTCAAAACTTGAATCACAAGTGCGGAGGGGTGTCATCCACACAACTCCAAGGATGCGCACAGCAAAAACATATCACACGCCAAAGTTGACAGAGGGACAAATGAACCACCTTATCAAGCAGGTGAAGCAAATTATGTCAACCAAAGGAGGGTCTGTTCAACTGATTAGCAAGAAAAGTACCCATGTTCACTATAAAGAAGTTTTGGGATCACATCGCGCAGTTGTTTGCACTGCACATATGAGAGGTTTACGAAAGAGAGTGGACTTTCGGTGTGATAAATGGACCGTTGTGCGTCTACAGCATCTCGCCAGGACGGACAAGTGGAATAACCAAGTTCGTGCTACTGATCTACGCAAGGGCGATAGTGGAGTTATATTGAGTAATACTAATCTCAAAGGAAGCTTTGGGAGAAGCTCGGAGGGCATATTCATAGTGCGTGGGTCGCACGAAGGAAAAATCTATGATGCACGTTCCAAGGTTACTCAAGGGGTTATGGATTCAATGGTTCAGTTCTCAAGCGCTGAAAGCTTCTGGAAGGGATTGGACGGCAATTGGGCACAAATGAGATATCCTACAGATCATACATGTGTGGCAGGCTTACCAGTTGAAGACTGCGGCAGAGTTGCAGCGATAATGACACACAGTATTTTACCGTGCTATAAGATAACCTGCCCTACCTGTGCCCAACAATATGCCAATTTGCCAGCCAGTGACTTACTTAAGATATTACACAAGCACGCAAGTGATGGTTTAAATCGATTGGGGGCAGACAAAGATCGCTTTGTGCACGTCAAAAAGTTCTTGACAATCTTAGAGCACTTAACTGAACCGGTTGATCTGGGTCTAGAAATTTTCAATGAAGTATTCAAGTCTATAGGGGAGAAGCAACAATCACCTTTCAAAAACCTGAATATTCTGAATAATTTCTTTTTGAAAGGAAAAGAAAATACAGCTCGTGAATGGCAGGTGGCTCAATTAGGCTTACTTGAATGGGCAAGATTCCAAAAGAACAGAACGGATAATATCAAGAAAGGAGACATCTCGTTCTTTAGGAATAAACTATCTGCCAAAGCAAATTGGAACTTGTATCTGTCATGTGATAACCAGCTGGATAAGAATGCAAACTTCCTGTGGGGACAGAGGGAATATCATGCTAAGCGATTTTTCTCGAACTATTTCGAGGAAATTGATCCAGCGAAGGGCTATTCAGCATATGAAAATCGTTTGCATCCGAATGGGACAAGAAAACTTGCAATTGGAAACCTAATTGTACCACTTGATCTGGCTGAGTTTAGGCGGAAGATGAAAGGTGATTATAAAAGACAGCCAGGGGTGAGTAAGAAGTGCACGAGCTCGAAGGATGGAAACTACGTGTATCCCTGTTGTTGCACTACACTTGATGATGGCTCAGCTGTTGAATCAACATTTTACCCGCCAACTAAGAAGCACCTTGTAATAGGTAATAGTGGCGACCAAAAGTATGTTGACTTACCAAAAGGGAATTCTGAGATGTTATATATTGCCAGGCAAGGCTTTTGTTACATTAACATTTTCCTCGCGATGTTGATTAACATTAGTGAGGAAGATGCAAAGGATTTCACTAAGAAGGTTCGTGACATGTGTGTGCCAAAGCTTGGAACCTGGCCAACCATGATGGATCT
GGCTACAACTTGTGCTCAAATGAAAATATTCTACCCTGATGTTCATGATGCAGAACTGCCTAGAATACTAGTCGATCACGAAACGCAGACATGCCATGTGGTTGACTCGTTTGGCTCACAAACAACTGGGTATCATATTTTGAAAGCATCTAGCGTGTCCCAACTTATTTTGTTTGCTAATGATGAGTTGGAGTCTGACATTAAACATTATAGAGTTGGTGGTGTTCCTAATGCATGCCCTGAACTTGGGTCCACAATATCACCTTTCAGAGAAGGAGGAGTTATAATGTCTGAGTCGGCAGCTCTGAAACTGCTTTTGAAGGGAATTTTTAGACCTAAGGTGATGAGACAGTTGTTGTTAGATGAGCCTTATCTGTTGATTCTATCAATATTATCCCCTGGCATACTGATGGCTATGTACAATAATGGGATTTTTGAACTTGCGGTAAGGTTGTGGATTAATGAGAAACAATCCATAGCTATGATAGCATCGCTACTATCAGCTTTAGCCCTACGAGTGTCAGCGGCAGAAACACTCGTCGCACAGAGGATTATAATTGATACTGCAGCTACAGATCTCCTTGATGCTACGTGCGATGGGTTCAACCTACATCTAACGTACCCCACTGCGTTGATGGTGTTGCAAGTTGTTAAGAATAGAAATGAATGTGATGATACCCTATTCAAGGCGGGTTTTCCAAGTTACAACACGAGCGTCGTACAGATCATGGAAAAAAATTATCTAAATCTCTTAAACGATGCTTGGAAAGATTTAACTTGGCGGGAAAAATTATCCGCAACATGGTACTCATACAGAGCAAAACGCTCTATCACTCGGTACATAAAACCCACAGGAAGGGCAGATTTGAAAGGGTTATACAACATATCACCACAAGCATTCTTGGGCCGAGGCGCCAAGGTGGTCAAAGGCACTGCCTCAGGATTGTGCGAGCGATTTAATAATTATTTCTACACTAAGTGTGTAAATATTTCATCCTTTTTCATTCGTAGAATCTTTAGGCGTTTGCCAACTTTCGTCACTTTTGTTAACTCATTATTAGTTATTAGTATGTTAACCAGCGTAGTGGCAGTCTGTCAGGCAATAATTTTAGATCAGAGGAAGTATAGGAGAGAAATCGAGTTGATGCAGATAGAGAAGAATGAGATCGTCTGCATGGAGCTATATGCAAGTTTACAGCGCAAACTTGAACGCGATTTCACATGGGATGAGTACATTGAGTATTTGAAGTCAGTAAACCCTCAGATAGTTCAGTTTGCTCAAGCGCAGATGGAAGAATATGATGTGCGACACCAGCGTTCCACACCAGGTGTTAAAAATTTGGAACAAGTGGTAGCATTTATGGCTTTAGTCATCATGGTGTTCGATGCTGAAAGGAGTGATTGCGTGTTCAAAACTCTCAATAAATTTAAGGGTGTCCTTTCCTCACTGGACCATGAAGTTAGACATCAGTCCTTAGACGATGTGATCAAGAATTTTGATGAGAGGAATGAGGTTATTGATTTTGAATTGAGTGAGGACACAATTCGAACATCATCAGTGCTAGATACAAAGTTTAGTGATTGGTGGGACCGACAAATCCAGATGGGGCATACACTTCCACATTACAGAACCGAGGGGCACTTCATGGAATTCACAAGAGCAACTGCTGTCCAAGTGGCTAATGACATTGCCCATAGCGAACACCTAGACTTTCTAGTAAGGGGAGCTGTTGGGTCTGGAAAGTCAACTGGGTTGCCTGTTCATCTTAGTGTAGCCGGATCTGTGCTTTTAATTGAACCAACGCGACCACTAGCGGAGAACGTTTTCAAACAGCTATCTAGTGAACCATTCTTCAAGAAGCCAACACTGCGTATGCGTGGAAATAGTATATTTGGCTCTTCTCCAATCTCCGTCATGACTAGCGGATTTGCGCTACACTACTTCGCCAATAATCGCTCTCAATTAGCTCAGTTCAACTTTGTAATATTTGATGAGTGCC
ATGTTCTGGATCCTTCCGCAATGGCGTTCCGCAGTCTGCTGAGTGTTTATCATCAAGCATGCAAAGTATTAAAAGTGTCAGCTACTCCAGTGGGAAGAGAGGTTGAATTTACAACACAGCAACCAGTCAAGTTAATAGTGGAGGACACACTGTCTTTTCTATCATTTGTTGATGCACAAGGTTCTAAAACTAATGCTGATGTTGTTCAGTTTGGTTCAAACGTACTTGTGTACGTGTCGAGCTACAATGAAGTTGATACCTTGGCTAAGCTCCTAACAGACAAGAATATGATGGTCACAAAAGTTGATGGCAGAACAATGAAGCACGGTTGCCTAGAAATTGTCACAAAAGGAACCAGTGCGAGACCACATTTTGTTGTAGCAACCAACATAATTGAGAATGGAGTGACTTTGGACATAGACGTGGTTGTAGATTTTGGGTTGAAAGTCTCACCGTTCTTGGACATTGACAATAGGAGCATTGCTTACAATAAGGTGAGTGTTAGCTATGGTGAGAGAATTCAAAGGCTGGGTCGTGTTGGACGCTTCAAGAAAGGAGTAGCATTGCGCATTGGACACACTGAGAAGGGAATTATTGAAATTCCAAGCATGATCGCTACAGAGGCGGCTCTTGCTTGCTTTGCATATAACTTGCCAGTGATGACAGGAGGCGTCTCAACTAGTCTGATTGGCAATTGTACTGTGCGCCAAGTTAAAACAATGCAGCAATTTGAATTGAGTCCCTTCTTTATCCAGAATTTTGTTGCCCATGATGGATCAATGCATCCTGTCATACATGACATTCTTAAAAAGTATAAACTTCGAGATTGTATGACACCTTTGTGCGATCAGTCTATACCATACAGGGCATCGAGCACTTGGTTATCGGTTAGTGAATATGAGCGACTTGGAGTGGCCTTAGAAATTCCAAAGCAAGTCAAAATTGCATTCCATATCAAAGAGATCCCTCCTAAGCTCCATGAAATGCTTTGGGAAACGGTTGTCAAGTACAAAGACGTTTGCTTATTTCCAAGCATTCGAGCATCGTCCATCAGCAAAATCGCATACACATTGCGTACAGACCTTTTCGCCATCCCAAGAACTCTAATATTGGTGGAGAGACTGCTTGAAGAGGAGCGAGTGAAGCAGAGCCAATTCAGAAGTCTCATCGACGAAGGATGCTCAAGCATGTTTTCAATTGTCAACTTGACAAACACTCTCAGAGCTAGATATGCAAAAGATTACACCGCAGAGAACATACAAAAACTTGAGAAAGTGAGAAGTCAATTGAAAGAATTCTCAAATTTGGATGGTTCTGCATGTGAGGAAAATTTAATAAAGAGGTATGAGTCTTTGCAGTTCGTTCATCACCAAGCTGCGACGTCACTTGCAAAGGATCTCAAGTTGAAGGGGACTTGGAAGAAGTCATTAGTGGCCAAAGACTTGATCATAGCAGGCGCTGTTGCAATTGGTGGAATAGGACTCATATATAGTTGGTTCACACAATCAGTTGAGACTGTGTCTCACCAAGGGAAAAATAAATCCAAAAGAATCCAAGCCTTGAAGTTTCGCCATGCTCGTGACAAAAGGGCTGGCTTTGAAATTGACAACAATGATGACACAATAGAGGAATTCTTTGGATCTGCATACAGGAAAAAGGGAAAAGGTAAAGGTACCACAGTTGGTATGGGCAAGTCAAGCAGGAGGTTCATCAACATGTATGGGTTTGACCCAACAGAGTACTCATTCATCCAATTCGTTGATCCACTCACTGGGGCGCAAATAGAAGAGAATGTCTATGCTGACATTAGAGATATTCAAGAGAGATTTAGTGAAGTGCGAAAGAAAATGGTTGAGAATGATGACATTGAAATGCAAGCCTTGGGCAGTAACACGACCATACATGCATACTTCAGGAAAGATTGGTCTGACAAAGCTTTGAAGATTGATTTAATGCCACATAATCCACTCAAAGTTTGTGACAAGACAAATGGCATTGCC
AAATTTCCTGAGAGAGAACTCGAACTAAGGCAGACTGGGCCAGCTGTAGAAGTCGACGTGAAGGACATACCAGCACAGGAGGTGGAGCATGAAGCTAAATCGCTCATGAGAGGCTTGAGAGACTTCAACCCAATTGCCCAAACAGTTTGTAGGCTGAAAGTATCTGTTGAATATGGGACATCAGAGATGTACGGTTTTGGATTTGGAGCGTACATAATAGCGAACCACCATTTGTTCAGGAGTTACAATGGTTCCATGGAGGTGCGATCCATGCACGGTACATTCAGGGTGAAGAATCTACACAGTTTGAGTGTTCTGCCAATTAAAGGTAGGGATATCATCCTCATCAAAATGCCGAAAGATTTCCCTGTCTTTCCACAGAAATTGCATTTCCGAGCTCCTACACAGAATGAAAGAGTTTGTTTAGTTGGAACCAACTTTCAGGAGAAGTATGCATCGTCGATCATCACAGAAGGAGGCACCACTTACAATATACCAGGCAGCACATTCTGGAAGCATTGGATTGAAACAGATAATGGACATTGTGGACTACCAGTGGTGAGCACCACCGATGGATGTCTAGTCGGAATTCACAGTTTGGCAAACAACAAACACACCACGAATTACTACTCAGCCTTCGATGAAGATTTTGAAAGCAAGTATCTCCGAACCAATGAGCACAATGAATGGGTCAAGTCTTGGATTTATAATCCAGACACAGTATTGTGGGGCCCGTTGAAACTTAAAGACAGCACTCCCAAAGGATTATTCAAAACAACAAAGCTTGTGCAAGATCTAATTGATCATGATGAAGTGGTGGAGCAAGCTAAGCACTCTGCGTGGATGTTTGAAGCCTTGACAGGAAATTTGCAAGCTGTCGCAACAATGAAGAGCCAATTAGTAACCAAGCATGTAGTTAAAGGAGAGTGTCGACACTTCAAAGAATTCCTGACTGTGGATGCAGAAGCAGAGGCATTCTTCAGGCCTTTGATGGATGCGTATGGGAAAAGTTTGCTGAATAGAGATGCATACATCAAGGACATAATGAAGTATTCAAAACCTATAGATGTTGGTATCGTGGACTGTGATGCATTTGAGGAAGCCATCAATAGGGTTATCATCTACCTGCAAGTGCACGGCTTCAAGAAGTGCGCATACGTTACTGACGAGCAAGAAATTTTTAAAGCGCTCAACATGAAAGCTGCAGTCGGAGCCATGTATGGTGGCAAAAAGAAAGACTATTTTGAGCATTTCACTGATGCAGATAAGGAAGAAATAGTCATGCAAAGCTGTCTGCGATTGTATAAAGGCTTGCTTGGCATTTGGAATGGATCATTGAAGGCAGAGCTCCGGTGTAAGGAAAAGATACTTGCAAATAAGACGAGGACATTCACTGCTGCACCTTTAGACACTTTGCTGGGTGGTAAAGTGTGTGTTGATGACTTCAATAATCAATTTTATTCAAAGAATATTGAATGCTGTTGGACGGTTGGGATGACAAAGTTTTATGGTGGTTGGGATAAACTGCTGCGGCGTTTACCTGAGAATTGGGTTTACTGTGATGCCGATGGCTCACAGTTTGATAGTTCACTAACTCCATACTTAATCAATGCAGTTCTCACCATTAGAAGCACATACATGGAAGATTGGGATGTGGGGTTGCAAATGTTGCGCAATTTATACACTGAGATTGTTTACACACCTATTTCAACTCCAGATGGAACAATTGTTAAGAAGTTCAGAGGGAATAACAGTGGTCAGCCTTCTACTGTTGTGGACAACTCTCTTATGGTCGTCCTTGCCATGCACTATGCTCTCATCAAAGAATGCATTGAGTTTGAAGAGATTGACAGCACGTGCGTGTTCTTTGTCAATGGTGATGATTTGCTGATTGCTGTGAATCCGGATAAAGAGGGCATTCTTGACAGATTGTCACAACACTTCTCAGATCTTGGTTTGAATTATGATTTCTCGTCAAGAACAAGAAATAAGGAGGAGTT
GTGGTTTATGTCTCATAGAGGCCTACTGATTGAGGCAATGTACGTGCCGAAACTTGAAGAAGAAAGGATTGTGTCCATTCTCCAATGGGACAGAGCAGACTTGGCTGAACATAGGCTTGAGGCGATTTGCGCAGCTATGATAGAGTCCTGGGGTTATTCTGAACTAACACACCAAATCAGGAGATTCTACTCATGGTTATTGCAACAGCAACCCTTTGCAACAATAGCGCAGGAAGGGAAGGCTCCTTATATAGCAAGCATGGCATTAAGGAAATTGTATATGGATAGGGCTGTGGATGAGGAAGAGCTGAGAGCCTTCACTGAAATGATGGTCGCATTAGATGATGAGTTTGAATTTGACTCTTATGAAGTATACCATCAAGCAAATGACACAATCGATGCAGGAGAAAGCAGCAAGAAAGATGCAAGACCAGAGCAAGGCAGCATCCAGTCAAACCCGAACAAAGGAAAAGATAAGGATGTGAATGCTGGTACATCTGGGACACATACTGTGCCGAGAATCAAGGCTATCACGTCCAAAATGAGGATGCCCAAAAGCAAGGGAGCAACCGTGCTAAACTTAGAACACTTGCTTGAGTATGCTCCACAACAAATTGATATTTCAAATACTCGGGCAACTCAATCACAGTTTGATACGTGGTATGAGGCAGTGCGGATGGCATACGACATAGGAGAAACTGAGATGCCAACTGTGATGAATGGGCTTATGGTTTGGTGCATTGAAAATGGAACCTCGCCAAATGTCAACGGAGTTTGGGTTATGATGGATGGGAATGAACAAGTCGAGTACCCGTTGAAACCAATCGTTGAGAATGCAAAACCAACCCTTAGGCAAATCATGGCACATTTCTCAGATGTTGCAGAAGCGTATATAGAAATGCGCAACAAAAAGGAACCATATATGCCACGATATGGTTTAATTCGAAATCTGCGGGATGTGGGTTTAGCGCGTTATGCCTTTGACTTTTATGAGGTCACATCACGAACACCAGTGAGGGCTAGGGAAGCGCACATTCAAATGAAGGCCGCAGCATTGAAATCAGCCCAACCTCGACTTTTCGGGTTGGACGGTGGCATCAGTACACAAGAGGAGAACACAGAGAGGCACACCACCGAGGATGTCTCTCCAAGTATGCATACTCTACTTGGAGTCAAGAACATGTGA
>CF_YL21
ATGGCAATCTACACGTCAACAATCTGTTTCGGTTCGTTTGAATGCAAGCTACCATACTCACCCGCCTCTTGCGGGCATATTGTGAAGGAACGAGAAGTGCTGGCTTCCATCGATCCTTTCGCAGATCTGGAAACACAACTTAGTGCACGATTGCTCAAGCAAGAATATGCTACTGTTCGTGTGCTCAAGAACGGTACTCTTACGTACCGATACAAGACTGATGCCCAGATAACGCGCATCCAGAAGAGACTGGAAAGGAAGGATAGGGAAGAATACCACTTCCAGATGGCAGCTCCTAGTATTGTGTCAAAGATCACTATTGCTGGTGGAGAGCCACCTTCAAAACTTGAATCACAAGTGCGGAGGGGTGTCATCCACACAACTCCAAGGATGCGCACAGCAAAAACATATCACACGCCAAAGCTGACAGAGGGACAAATGAACCACCTTATCAAGCAGGTGAAGCAAATTATGTCAACCAAAGGAGGGTCTGTTCAACTGATTAGCAAGAAAAGTACCCATGTTCACTATAAAGAAGTTTTGGGATCACATCGCGCAGTTGTTTGCACTGCACATATGAGAGGTTTACGAAAGAGAGTGGACTTTCGGTGTGATAAATGGACCGTTGTGCGTCTACAGCATCTCGCCAGGACGGATAAGTGGAATAACCAAGTTCGTGTTACTGATCTACGCAAGGGCGATAGTGGAGTTATATTGAGTAATACTAATCTCAAAGGAAACTTTGGGAGAAGCTCGGAGGGCATATTCATAGTGCGTGGGTCGCACGAAGGAAAAATCTATGATGCACGTTCCAAGGTTACTCAAGGGGTTATGGATTCAATGGTTCAGTTCTCAAGCGCTGAAAGCTTTTGGAAGGGATTGGACGGCAATTGGGCACAAATGAGATATCCTACAGATCATACATGTGTGGCAGGCTTACCAGTTGAAGACTGTGGCAGAGTTGCAGCGATAATGACACACAGTATTTTACCGTGCTATAAGATAACCTGCCCTACCTGTGCCCAACAATATGCCAACTTGCCAGCCAGTGACTTACTTAAGATATTACACAAGCACGCAAGTGATGGTTTAAATCGATTGGGGGCAGACAAAGATCGCTTTGTGCATGTCAAAAAGTTCTTGACAATCTTAGAGCACTTAACTGAACCGGTTGATCTGAGTCTAGAAATTTTCAATGAAGTATTCAAGTCTATAGGGGAGAAGCAACAATCACCTTTCAAAAACCTGAATATTCTGAATAATTTCTTTTTAAAAGGAAAGGAAAATACAGCTCGTGAATGGCAGGTGGCTCAATTAAGCTTACTTGAATTGGCAAGATTCCAAAAGAACAGAACGGATAATATCAAGAAAGTAGACATCTCGTTCTTTAGGAATAAACTATCTGCCAAAGCAAATTGGAACTTGTATCTGTCATGTGATAACCAGCTGGATAAGAATGCAAACTTCCTGTGGGGACAGAGGGAATATCATGCTAAGCGATTTTTCTCGAACTATTTCGAGGAAATTGATCCAGCGAAGGGCTATTCAGCATACGAAAATCGTTTGCATCCGAATGGGACAAGAAAACTTGCAATTGGAAACCTAATTGTACCACTTGATCTGGCTGAGTTTAGGCGGAAGATGAAAGGTGATTTTAAAAGACAGCCAGGGGTGAGTAAGAAGTGCACGAGCTCGAAGGATGGAAACTACGTGTATCCCTGTTGTTGCACTACACTTGATGATGGCTCAGCTGTTGAATCAACATTTTACCCGCCAACTAAGAAGCGCCTCGTAATAGGTAATAGTGGCGACCAAAAGTATGTTGACTTACCAAAAGGGAATTCTGAGATGTTATATATTGCCAGGCAAGGCTTCTGTTACATTAACATTTTCCTCGCGATGTTGATTAACATTAGTGAGGAAGATGCAAAGGATTTCACTAAGAAGGTTCGTGACATGTGTGTGCCAAAGCTCGGAACCTGGCCAACCATGATGGATCT
GGCTACAACTTGTGCTCAAATGAAAATATTCTACCCTGATGTTCATGATGCAGAACTGCCTAGAATACTAGTCGATCACGAAACGCAGACATGCCATGTGGTTGACTCGTTTGGCTCACAAACAACTGGGTATCATATTTTGAAAGCATCTAGCGTGTCCCAACTTATTTTGTTTGCTAATGATGAGTTGGAGTCTGACATTAAGCACTATAGAGTTGGTGGTATTCCTAATGCATGCCCTGAACTTGGGTCCACAATATCACCTTTCAGAGAAGGAGGAGTTATAATGTCTGAGTCGGCAGCGCTGAAACTGCTTTTGAAGGGAATTTTTAGACCTAAGGTGATGAGACAGTTGCTGTTAGATGAGCCTTACCTGTTGATTCTATCAATATTATCCCCTGGCATACTGATGGCTATGTATAATAATGGGATTTTTGAACTTGCGGTGAGGTTGTGGATTAATGAGAAGCAATCCATAGCGATGATAGCATCGCTACTATCAGCTTTAGCCCTACGAGTGTCAGCGGCAGAAACACTCGTCGCACAGAGGATTATAATTGATGCTGCAGCTACAGACCTCCTCGATGCTACGTGTGACGGGTTCAACCTGCATCTAACGTACCCCACTGCATTGATGGTGTTGCAAGTTGTTAAGAATAGAAATGAATGTGATGATACTCTATTCAAGGCGGGTTTTCCAAGTTACAACACGAGCGTCGTACAGATTATGGAAAAAAATTATCTAAATCTCTTGAACGATGCTTGGAAAGATTTAACTTGGCGGGAAAAATTTTCCGCAACATGGTACTCATACAGAGCAAAACGCTCTATCACTCGGTACATAAAACCCACAGGAAGGGCAGATTTGAAAGGGTTATACAACATATCACCACAAGCATTCTTGGGCCGAAGCGTCCAGGTGGTCAAAGGCACTGCCTCAGGATTGAGCGAGCGATTTAATAATTACTTCAATACTAAGTGTGTAAATATTTCATCCTTTTTCATTCGTAGAATCTTTAGGCGTTTGCCAACTTTCGTCACTTTTGTTAACTCATTATTAGTTATTAGTATGTTAACTAGTGTAGTGGCAGTGTGTCAGGCAATAATTTTAGATCAGAAGAAGTATAGGAGAGAGATCGAGTTGATGCAGATAGAGAAGAATGAGATTGTCTGCATGGAGCTATACGCAAGTTTACAGCGCAAACTTGAACGCGATTTCACATGGGATGAGTACATTGAGTATTTGAAATCAGTGAACCCTCAGATAGTTCAATTTGCTCAAGCGCAGATGGAAGAATATGATGTGCGACACCAGCGTTCCACACCAGGTGTTAAAAATTTGGAACAAGTGGTAGCATTTATGGCTTTAGTCATCATGGTGTTCGATGCTGAAAGGAGTGACTGCGTGTTCAAAACTCTCAATAAATTTAAGGGTGTCCTTTCCTCACTGGACCATGAAGTTAGACATCAGTCCTTGGATGATGTGATCAAGAATTTTGATGAGAGGAATGAGACTATTGATTTTGAATTGAGTGAGGACACAATTCGAACATCATCAGTGCTAGATACAAAGTTTAGTGATTGGTGGGACCGACAAATCCAAATGGGACATACACTTCCACATTACAGAACCGAGGGGCACTTCATGGAATTTACAAGGGCAACTGCTGTTCAAGTGGCTAATGACATTGCCCACAGCGAACACCTAGACTTTCTAGTAAGGGGAGCTGTTGGGTCTGGAAAGTCAACTGGGTTGCCTGTTCATCTTAGTGTAGCTGGATCTGTGCTTTTAATTGAGCCAACGCGGCCACTAACGGAGAACGTTTTCAAACAGCTATCTAGTGAACCATTCTTCAAGAAGCCAACACTGCGTATGCGTGGAAATAGTATATTTGGCTCTTCCCCAATCTCCGTTATGACTAGCGGGTTTGCGCTACACTACTTCGCCAATAATCGCTCTCAATTAGCTCAGTTCAATTTTGTAATATTTGATGAGTGCC
ATGTTCTGGATCCTTCCGCAATGGCGTTCCGCAGTCTGCTGAGTGTTTATCATCAAGCATGTAAAGTATTAAAAGTGTCAGCTACTCCAGTGGGAAGAGAGGTTGAATTCACAACACAGCAGCCAGTCAAGTTAATAGTGGAAGACACACTGTCTTTCCAATCATTTGTTGATGCACAAGGTTCTAAAACTAATGCTGATGTTGTCCAGTTTGGTTCAAACATACTTGTTTACGTGTCGAGCTACAATGAAGTTGACAACTTGGCTAAGCTCCTAACAGATAAGAATATGATGGTCACAAAGGTTGATGGCAGAACAATGAAGCACGGTTGCCTAGAAATTGTCACAAAAGGAACCAGTGCGAGACCACATTTTGTTGTAGCAACCAACATAATTGAGAATGGAGTGACTTTGGACATAGATGTAGTAGTGGATTTTGGGTTGAAAGTCTCACCGTTCTTGGACATTGACAATAGGAGCATTGCTTACAATAAGGTGAGTGTTAGTTATGGTGAGAGAATTCAAAGGCTGGGTCGTGTTGGACGCTTCAAGAAAGGAGTAGCATTGCGCATTGGACACACTGAGAAGGGAATTATCGAAATTCCAAGCATGATCGCTACTGAGGCGGCTCTTGCTTGCTTTGCATATAACTTACCAGTGATGACAGGAGGCGTCTCAACTAGTCTGATTGGCAATTGTACTGTGCGCCAGGTCAAAACAATGCAGCAATTTGAATTGAGCCCCTTCTTTATCCAGAATTTCGTTGCTCATGATGGATCAATGCATCCTGTCATACATGACATTCTCAAAAAGTATAAACTCCGAGATTGTATGACACCTTTGTGCGATCAGTCTATACCATACAGGGCATCGAGCACTTGGTTATCGGTTAGTGAATATGAGCGACTTGGAGTGGCTTTAGAAATTCCAAAGCAAGTCAAAATTGCATTCCATATCAAAGAGATCCCTCCTAAGCTCCACGAAATGCTTTGGGAAACGGTTGTCAAGTACAAAGACGTTTGCTTATTTCCAAGCATTCGAGCATCGTCCATCAGCAAAATCGCATACACATTGCGTACAGACCTCTTCGCCATCCCAAGAACTCTAATATTGGTGGAGAGATTGCTTGAAGAGGAGCGAGTGAAGCAGAGCCAATTCAGAAGTCTCATCGATGAAGGGTGCTCAAGCATGTTTTCAATTGTCAACTTGACCAACACTCTCAGGGCCAGATATGCAAAAGATTACACCGCAGAGAACATACAAAAACTTGAGAAAGTGAGAAGTCAATTAAAAGAATTCTCAAATTTGGATGGTTCTGCATGTGAGGAAAATTTAATAAAGAGGTATGAGTCTTTGCAGTTCGTTCACCACCAAGCTGCGACGTCACTTGCAAAGGATCTCAAGTTGAAGGGGACTTGGAAGAAGTCATTAGTGGCTAAAGACTTGATCATAGCAGGCGCTGTTGCAATTGGTGGAATAGGACTCATATATAGTTGGTTCACACAATCAGTTGAGACTGTGTCTCACCAAGGGAAAAATAAATCCAAAAGAATCCAAGCCTTGAAGTTTCGCCATGCTCGTGACAAAAGGGCTGGCTTTGAAATTGACAACAATGATGACACAATAGAGGAATTCTTTGGATCTGCATACAGGAAGAAGGGAAAAGGTAAAGGTACCACAGTTGGTATGGGCAAGTCAAGCAGGAGGTTCATTAACATGTACGGGTTTGACCCGACAGAGTACTCATTCATCCAATTCGTTGATCCACTCACTGGGGCACAAATAGAAGAGAATGTCTATGCTGACATTAGAGATGTTCAAGAAAGATTTAGTGAGGTGCGAAAGAAAATGGTTGAGAATGATGATATTGAAATGCAAGCCTTGGGTAGTAACACGACCATACATGCATACTTTAGGAAAGATTGGTCTGACAAAGCTTTGAAGATTGATTTAATGCCACATAACCCACTCAAAGTTTGTGACAAAACAAATGGCATTGCC
AAACTTCCTGAGAGAGAGTTCGAACTAAGGCAGACTGGGCCAGCTGTAGAAGTCGACGTGAAGGACATACCAGCGCAGGAAGTGGAGCATGAAGCCAAATCGCTCATGAGAGGCTTGAGAGACTTCAACCCAATTGCTCAAACAGTTTGTAGGCTGAAAGTATCTGTTGAATATGGGACATCAGAGATGTACGGTTTTGGATTTGGAGCATACATAATAGCGAACCACCATTTGTTTAGGAGTTACAATGGTTCCATGGAGGTGCGATCCATGCATGGTACATTCAGGGTGAAGAATCTACACAGTTTGAGCGTTCTGCCAATTAAAGGCAGGGACATCATCCTCATCAAAATGCCGAAGGATTTCCCTGTTTTTCCACAGAAATTGCATTTCCGAGCTCCTACACAGAATGAAAGAGTTTGTTTAGTTGGGACCAACTTTCAGGAGAAGTATGCATCGTCGATCATCACAGAAACAAGCACTACTTACAATATACCAGGCAGCACATTCTGGAAGCATTGGATTGAAACAGATAATGGACATTGTGGACTACCAGTGGTGAGCACAGCCGATGGATGTCTAGTCGGAATTCATAGTTTGGCAAACAATGCACACACCACGAACTACTACTCAGCCTTCGATGAAGATTTTGAAAGCAAGTACCTCCGAACCAATGAGCACAATGAATGGGTCAAATCTTGGGTTTATAATCCAGACACAGTGTTGTGGGGCCCGTTGAAGCTTAAAGATAGCACTCCCAAAGGGTTATTCAAAACAACAAAGCTTGTGCAAGATCTAATCGATCATGATGAAGTCGCGGAGCAAGCTAAGCACTCTGCGTGGATGTTTGAAGCCTTGACAGGAAATTTGCAAGCTGTCGCAACAATGAAGAGTCAATTAGTAACCAAGCATGTAGTTAAAGGAGAGTGTCGACACTTCAAAGAATTCCTGACTGTGGATGCAGAAGCAGAGGCATTCTTCAGGCCTTTGATGGATGCATATGGGAAAAGCTTGCTGAATAGAGATGCGTACATCAAAGACATAATGAAGTACTCAAAACCTATAGATGTTGGTATCGTGGACTGTGATGCATTTGAAGAAGCCATCAATAGGGTTATCATCTACTTGCAAGTGCACGGCTTCAAGAAGTGTGCATATGTCACTGACGAGCAAGAAATTTTCAAAGCGCTCAACATGAAAGCTGCAGTCGGAGCCATGTATGGTGGCAAAAAGAAAGACTATTTTGAGCATTTCACTGATGCAGATAAGGAAGAAATAGTCATGCAAAGCTGTCTGCGACTGTATAAAGGCTTGCTCGGCATTTGGAACGGATCGTTGAAGGCAGAGCTCCGGTGTAAGGAAAAGATACTTGCAAACAAGACGAGGACGTTCACTGCTGCACCTCTAGACACTTTGCTGGGTGGTAAAGTGTGTGTTGATGACTTCAATAATCAATTTTATTCAAAGAATATTGAATGCTGTTGGACAGTTGGGATGACTAAGTTTTATGGTGGTTGGGATAAACTGCTTCGGCGTTTACCTGAGAATTGGGTATACTGTGATGCCGATGGCTCACAGTTTGATAGTTCACTAACTCCATACTTAATCAATGCTGTTCTCACCATCAGAAGCACATACATGGAAGACTGGGATGTGGGGTTGCAAATGCTGCGTAATTTATACACTGAGATTATTTACACACCTATCTCAACTCCAGATGGAACAATTGTTAAGAAGTTCAGAGGAAATAACAGTGGTCAGCCTTCTACTGTTGTGGACAACTCTCTTATGGTTGTCCTTGCCATGCACTATGCTCTCATCAAAGAATGCATTGAGTTTGAAGAGATTGACAGCACGTGCGTGTTCTTCGTCAATGGTGATGATTTGCTGATTGCCGTGAATCCGGATAAAGAGGGCATTCTTGATAGATTGTCACAGCACTTCTCAGATCTTGGTTTAAATTATGATTTTTCGTCAAGAACAAGAAATAAGGAAGAGTT
ATGGTTTATGTCTCATAGAGGCCTACTGATTGAGGGCATGTACGTGCCGAAACTTGAAGAAGAAAGGATTGTGTCCATTCTCCAATGGGACAGAGCAGACTTGGCTGAACACAGGCTTGAGGCGATTTGCGCAGCTATGATAGAGTCCTGGGGTTATTCTGAACTAACACACCAAATTAGAAGATTCTACTCATGGTTATTGCAACAGCAACCTTTTGCAACAATAGCGCAGGAAGGGAAGGCTCCTTATATAGCAAGCATGGCATTAAGGAAACTGTATATGGATAGAGCTGTGGATGAGGAAGAGCTGAGAGCCTTCACTGAAATGATGGTCGCATTAGACGATGAGTTTGAGTGTGACTCTTATGAAGTACACCACCAGGGAAACGATACAATCGATGCAGGAGGAAGCAGCAAGAAAGATGCAAGACCAGAGCAAGGCAGCATCCAGTCAAACCCGAACAAAGGAAAAGATAAGGATGTGAATGCTGGTACATCTGGGACACATACTGTGCCGAGAATCAAGGCTATCACGTCCAAAATGAGAATGCCCAAAAGCAAAGGAGCAACCGTGCTAAACTTAGAACACTTGCTTGAGTATGCTCCACAACAAATTGATATTTCAAATACTCGGGCAACTCAATCACAGTTTGATACGTGGTATGAGGCAGTGCGGATGGCATACGACATAGGAGAAACTGAGATGCCAACTGTGATGAATGGGCTTATGGTTTGGTGCATTGAAAATGGAACCTCGCCAAATGTCAACGGAGTCTGGGTTATGATGGATGGGGATGAACAAGTCGAGTACCCGTTGAAACCAATCGTTGAGAATGCAAAACCAACCCTTAGGCAAATCATGGCACATTTCTCAGATGTTGCAGAAGCGTATATAGAAATGCGCAACAAAAAGGAACCATATATGCCACGATATGGTTTAATTCGAAATCTGCGGGATGTGGGTTTAGCGCGTTATGCCTTTGACTTTTATGAGGTCACATCACGAACACCAGTGAGGGCTAGGGAAGCGCACATTCAAATGAAGGCCGCAGCATTGAAATCAGCCCAACCTCGACTTTTCGGGTTGGACGGTGGCATCAGTACACAAGAGGAGAACACAGAGAGGCACACCGCCGAGGATGTCTCTCCAAGTATGCATACTCTACTTGGAGTCAAAAACATGTGA
================================================
FILE: inst/extdata/GVariation/sample_alignment.fa
================================================
>Mont
ATGGCAACTTACACATCAACAATCCAGTTTGGTTCCATTGAATGCAAACTTCCATACTCACCCGCTCCTTTTGGGCTAGT
TGCGGGGAAACGAGAAGTTTTAACCACCACTGACCCCTTCGCAAGTTTGGAGATGCAGCTTAGTGCGCGATTACGAAGGC
AAGAGTTTGCAACTATTCGAACATCCAAGAATGGTACTTGCATGTATCGATACAAGACTGATGTCCAGATTGCGCGCATT
CAAAAGAAGCGCGAGGAAAGAGAAAGAGAGGAATATAATTTCCAAATGGCTGCGTCAAGTGTTGTGTCGAAGATCACTAT
TGCTGGTGGAGAGCCACCTTCAAAACTTGAATCACAAGTGCGGAGGGGTGTCATCCACACAACTCCAAGGATGCGCACAG
CAAAAACATATCACACGCCAAAGCTGACAGAGGGACAAATGAACCACCTTATCAAGCAGGTGAAGCAAATTATGTCAACC
AAAGGAGGGTCTGTTCAACTGATTAGCAAGAAAAGTACTCATGTTCACTATAAAGAAGTTTTGGGATCACATCGCGCAGT
TGTTTGCACTGCACATATGAGAGGTTTACGAAAGAGAGTGGACTTTCGGTGTGATAAATGGACCGTTGTGCGTCTACAGC
ATCTCGCCAGGACGGACAAGTGGACTAACCAAGTTCGTGCTACTGATCTACGCAAGGGCGATAGTGGAGTTATATTGAGT
AATACTAATCTCAAAGGAAACTTTGGGAGAAGCTCGGAGGGCCTATTCATAGTGCGTGGGTCGCACGAAGGAAAAATCTA
TGATGCACGTTCCAAGGTTACTCAAGGGGTTATGGATTCAATGGTTCAGTTCTCAAGCGCTGAAAGCTTTTGGAAGGGAT
TGGACGGCAATTGGGCACAAATGAGATATCCTACAGATCATACATGTGTGGCAGGCTTACCAGTTGAAGACTGTGGCAGA
GTTGCAGCGATAATGACACACAGTATTTTACCGTGCTATAAGATAACCTGCCCTACCTGTGCCCAACAATATGCCAACTT
GCCAGCCAGTGACTTACTTAAGATATTACACAAGCACGCAAGTGATGGTCTAAATCGATTGGGGGCAGACAAAGATCGCT
TTGTGCATGTCAAAAAGTTCTTGACAATCTTAGAGCACTTAACTGAACCGGTTGATCTGAGTCTAGAAATTTTCAATGAA
GTATTCAAGTCTATAGGGGAGAAGCAACAATCACCTTTCAAAAACCTGAATATTCTGAATAATTTCTTTTTGAAAGGAAA
GGAAAATACAGCTCGTGAATGGCAGGTGGCTCAATTAAGCTTACTTGAATTGGCAAGATTCCAAAAGAACAGAACGGATA
ATATCAAGAAAGGAGACATCTCGTTCTTTAGGAATAAACTATCTGCAAAAGCAAATTGGAACTTGTATCTGTCATGTGAT
AACCAGCTGGATAAGAATGCAAACTTCCTGTGGGGACAGAGGGAATATCATGCTAAGCGATTTTTCTCGAACTATTTCGA
GGAAATTGATCCAGCGAAGGGCTATTCAGCATACGAAAATCGTTTGCATCCGAATGGGACAAGAAAACTTGCAATTGGAA
ACCTAATTGTACCACTTGATCTGGCTGAGTTTAGGCGGAAGATGAAAGGTGATTATAAAAGACAGCCAGGGGTGAGTAAG
AAGTGCACGAGCTCGAAGGATGGAAACTACGTGTATCCCTGTTGTTGCACTACACTTGATGATGGCTCAGCTGTTGAGTC
AACATTTTACCCGCCAACTAAGAAGCACCTCGTAATAGGTAATAGTGGCGACCAAAAGTATGTTGACTTACCAAAAGGGA
ATTCTGAGATGTTATATATTGCCAGGCAAGGCTTCTGTTACATTAACATTTTTCTCGCGATGTTGATTAACATTAGTGAG
GAAGATGCAAAGGATTTCACTAAGAAGGTTCGTGACATGTGTGTGCCAAAGCTTGGAACCTGGCCAACCATGATGGATTT
GGCTACAACTTGTGCTCAAATGAAAATATTCTACCCTGATGTTCATGATGCAGAACTGCCTAGAATACTAGTCGATCACG
AAACGCAGACATGCCATGTGGTTGACTCGTTTGGCTCACAAACAACTGGGTATCATATTTTGAAAGCATCTAGCGTGTCC
CAACTTATTTTGTTTGCTAATGATGAGTTGGAGTCTGACATTAAGCACTATAGAGTTGGTGGTATTCCTGGAGCATGCCC
TGAGCTTGGGTCCACAATATCACCTTTTAGAGAAGGAGGAATCATAATGTCTGAGTCAGCAGCGCTAAAACTGCTCCTAA
AGGGAATTTTTAGGCCCAAAGTGATGAAGCAATTGCTACTGGATGAACCATATTTGCTCATTTTATCGATATTATCTCCT
GGTATACTTATGGCCATGTACAACAATGGGATATTTGAGTTAGCGGTGAAGTTGTGGATCAATGAGAAACAATCTATAGC
CATGATAGCATCGTTATTGTCCGCCTTGGCTTTACGAGTGTCAGCAGCAGAAACACTCGTTGCACAGAGGATTATAATTG
ACACGGCAGCAACAGATCTTCTCGATGCTACGTGTGATGGATTCAACTTACATCTAACATATCCCACTGCACTCATGGTG
TTGCAAGTTGTTAAGAACAGAAATGAATGTGATGATACGTTGTTTAAAGCAGGTTTTTCACATTACAACATGAGTGTCGT
GCAGATTATGGAAAAAAATTATCTAAGCCTCTTGGGCGATGCTTGGAAAGATTTAACCTGGCGAGAAAAATTATCCGCAA
CATGGCACTCATACAAAGCAAAGCGCTCTATCACTCAGTTCATAAAACCCATAGGCAAAGCAGATTTAAAAGGGTTGTAC
AACATATCACCGCAAGCATTCTTGGGTCAGGGCGTACAGAGAGTCAAAGGCACCGCCTCAGGGTTGAATGAGCGACTCAA
TAATTATATCAATACTAAGTGTGTAAATATTTCATCCTTTTTCATTCGTAGAATTTTCCGGCGCTTGCCAACTTTTGTAA
CTTTCATTAATTCATTATTAGTTATTAGTATGCTAACTAGTGTAGTAGCAGTGTGTCAAGCAATAATTCTAGATCAAAGG
AAGTATAGAAAAGAAATTGAGTTGATGCAGATTGAGAAGAATGAAATTGTTTGTATGGAGTTGTATGCGAGTCTGCAGCG
CAAACTTGAGCGTGAATTCACATGGGATGAATATATGGAATATTTGAAATCTGTGAATCCCCAGATAGTTCAATTCGCGC
AAGCTCAAATGGAAGAATATAATGTGCGACATCAGCGCTCCACACCAGGTGTTAAGAATTTAGAGCAGGTGGTAGCATTT
ATAACTCTAATTATCATGATGTTTGATGCTGAAAGGAGCGACTGTGTATTTAAGACTCTCAACAAATTCAAAGGCATCGT
TTCTTCAATGGATCATGAAGTTAGACACCAGTCCTTGGATGATGTAATCAAGAATTTCGATGAAAGGAACGAAGTTATTG
ATTTTGAACTAAATGAGGATACAATTAAAACATCATCAGTGTTGGACACAAAGTTTAGCGACTGGTGGGATCGGCAAATC
CAAATGGGACACACACTTCCCCATTATAGAACTGAGGGACACTTCATGGAATTCACAAGGGCAACTGCTGTACAAGTGGC
CAACGACATCGCGCATAGTGAGCACCTAGACTTTCTAGTGAGGGGAGCTGTTGGGTCTGGAAAATCTACTGGACTGCCTG
TCCATCTCAGTGCAGCTGGATCTGTGCTTTTGATAGAACCAACTCGACCACTTGCAGAAAACGTGTTCAAGCAATTATCC
AGTGAACCGTTTTTCAAGAAGCCAACACTGCGCATGCGAGGAAATAGTGTGTTTGGTTCCTCTCCAATCTCCATTATGAC
TAGCGGCTTTGCGTTGCACTACTATGCTAATAATCGCTCTCAGCTAACTCAGTTTAATTTCATAATTTTTGATGAATGTC
ATGTTTTAGATCCTTCTGCAATGGCATTTCGTAGCTTGTTAAGTGTGTATCACCAAACATGCAAAGTGTTAAAGGTGTCA
GCCACTCCAGTGGGAAGGGAGGTCGAGTTCACAACACAACAACCAGTTAAATTGGTGGTTGAGGATACACTTTCATTCCA
ATCTTTTGTTGATGCGCAAGGCTCAAAAACCAATGCTGACGTAGTTCAGCATGGTTCGAACATACTCGTGTATGTGTCGA
GTTACAATGAAGTGGATACATTAGCCAAGCTTCTAACAGATAGGAATATGATAGTCTCAAAAGTTGATGGCAGAACAATG
AAGCACGGATGCTTAGAAATTGTAACGAAAGGGACTAGTGCAAAGCCACATTTTGTCGTAGCAACCAACATTATTGAAAA
TGGAGTAACTTTAGATATAGATGTAGTTGTAGATTTTGGGCTTAAAGTCTCACCGTTTTTAGATATTGACAATAGGAGCA
TAGCATACAATAAGATTAGTGTTAGCTATGGAGAAAGAATTCAGAGGTTGGGCCGTGTTGGGCGCTTTAAGAAGGGAGTG
GCATTGCGTATTGGACACACCGAAAAGGGAATTATTGAGATTCCAAGTATGATTGCTAGTGAAGCTGCGCTTGCGTGCTT
TGCATACAATTTGCCAGTAATGACAGGGGGTGTTTCAACTAGCCTCATTGGCAATTGTACTGTTCGTCAAGTTAAAACTA
TGCAACAATTTGAGCTGAGTCCATTCTTTATACAAAATTTTGTTGCCCATGATGGATCAATGCATCCTGTCATACATGAC
ATTCTTAAGAAGTATAAACTGCGAGATTGTATGACGCCCTTGTGTGATCAATCCATACCTTACAGAGCCTCAAGCACTTG
GTTGTCTGTTAGTGAGTACGAACGACTCGGAGTGGTTTTGGACATTCCAAAACAGATCAAGATTGCATTCCACATCAAGG
ATATCCCTCCTAAGTTGCATGAAATGCTTTGGGAAACAGTTATCAAATATAAGGATGTTTGTTTGTTTCCAAGTATTCGG
GCTTCATCCATTAGCAAAATTGCATACACACTGCGCACTGATCTTTTTGCAATTCCCAGAACCCTAATTCTAGTTGAAAG
ATTGCTCGAGGAGGAACGAGTGAAACAGAGTCAATTCAGAAGTCTCATTGATGAAGGATGCTCAAGCATGTTTTCAATTG
TTAATTTAACAAACACTCTTAGAGCTAGATATGCAAAGGATTACACTGCAGAAAACATACAGAAGCTCGAGAAAGTGAGG
AGTCAGTTAAAGGAGTTCTCAAATTTAAATGGCTCTGCATGCGAGGAGAACTTAATGAAGAGGTATGAATCTCTACAGTT
TGTGCATCATCAAGCAACAACTGCACTCGCAAAGGATTTGAAGTTGAAAGGAGTTTGGAAGAAGTCATTAGTTGTGCAGG
ACTTAATCATAGCGGGTGCCGTTGCTATTGGTGGAATAGGGCTCATCTATAGTTGGTTTACTCAATCAGTTGAAACTGTG
TCTCACCAGGGCAAGAACAAATCCAAAAGAATTCAAGCATTGAAGTTTCGACACGCCCGCGATAAGAGGGCTGGTTTTGA
AATTGATAACAATGATGATACAATAGAAGAATTCTTTGGATCTGCATACAGGAAGAAGGGAAAAGGTAAAGGCACCACTG
TTGGTATGGGCAAGTCAAGCAGGAGGTTTGTTAATATGTATGGATTTGACCCAACAGAATATTCATTCATCCAGTTCGTT
GATCCGCTCACTGGAGCTCAAATTGAAGAGAACGTCTATGCTGATATTAGAGACATCCAAGAGCGCTTTAGTGATGTCCG
CAAGAAAATGGTAGAGGATGATGAAATCGAATTGCAAGCATTGGGCAGCAACACAATCATTCATGCTTACTTCAGGAAGG
ATTGGTCTGACAAGGCTCTAAAAATTGATTTGATGCCACACAACCCACTCAAAATCTGTGATAAATCGAATGGCATTGCT
AAGTTTCCTGAAAGAGAACTTGAGTTGAGGCAAACTGGGCCAGCAACAGAGGTTGATGTGAAAGACATTCCAAAACAGGA
AGTGGAGCATGAAGCCAAATCACTCATGAGAGGTTTAAGGGATTTCAATCCAATTGCTCAAACAGTTTGCAGAGTAAAAG
TGTCTGTTGAATATGGAACGTCTGAAATGTATGGGTTCGGTTTTGGTGCGTATATTATAGTAAACCACCATCTATTCAAG
AGTTTCAATGGATCCATGGAAGTGCGATCAATGCATGGAACATTCAGAGTGAAGAATTTGCATAGCCTGAGCGTTTTACC
GATCAAAGGCAGAGACATTATCATCATAAAGATGCCAAAGGATTTCCCTGTTTTCCCACAAAAACTGCACTTCCGAGCTC
CAGTGCAGAATGAGAGGATTTGTTTGGTTGGAACTAATTTTCAAGAAAAACATGCATCATCAATCATCACAGAAACGAGT
ACTACATACAATGTACCGGGCAGCACTTTTTGGAAGCATTGGATTGAAACAAATGATGGGCATTGTGGATTACCAGTAGT
GAGTACAGCTGATGGATGTCTAGTTGGAATACACAGCTTGGCGAATAATGTGCAAACCACGAATTATTATTCAGCCTTTG
ATGAGGATTTTGAAAGTAAGTATCTCCGAACTGATGAGCATAATGAGTGGACCAAATCGTGGGTATATAACCCAGATACT
GTGTTGTGGGGTCCATTGAAGCTCAAAGAGAGTACCCCTAAAGGCCTGTTTAAGACAACAAAACTTGTACAGGATTTAAT
TGATCATGATGTTGTTGTAGAGCAAGCTAAACATTCTGCGTGGATGTATGAGGCTCTAACAGGGAATTTGCAAGCTGTGG
CGACAATGAAGAGTCAGCTAGTGACAAAGCACGTGGTCAAAGGGGAGTGTCGGCACTTCAAAGAGTTCTTAACTGTGGAT
TCGGAAGCAGAAGCTTTCTTCAGGCCTTTGATGGATGCTTATGGGAAGAGCTTGTTAAATAGAGAAGCATATATAAAGGA
CATAATGAAATACTCAAAGCCTATTGATGTTGGAATAGTAGACTGTGATGCTTTTGAAGAGGCTATCAATAGGGTTATCA
TTTATCTGCAAGTACATGGCTTCCAGAAATGCAATTACATCACCGATGAGCAGGAAATTTTCAAAGCTCTCAATATGAAA
GCTGCTGTCGGGGCTATGTATGGAGGCAAGAAGAAAGACTACTTCGAGCATTTTACTGAGGCGGATAAAGAGGAAATTGT
TATGCAAAGTTGCTTACGATTGTACAAGGGCTCACTTGGCATATGGAATGGATCATTGAAAGCAGAACTTCGGTGCAAAG
AGAAGATACTTGCAAATAAGACAAGGACATTCACTGCTGCACCTTTAGATACTCTACTGGGTGGGAAGGTGTGCGTTGAT
GATTTTAATAATCAATTCTACTCAAAGAACATTGAATGCTGCTGGACTGTTGGAATGACTAAGTTTTATGGAGGTTGGGA
CAAATTGCTTCGGCGTCTACCTGAAAATTGGGTGTACTGCGATGCCGATGGTTCACAATTCGATAGTTCACTCACCCCAT
ACCTAATTAATGCTGTTCTCATCATCAGAAGCACATACATGGAAGATTGGGACTTGGGGTTGCAAATGTTGCGCAATTTG
TACACAGAAATAATTTACACACCAATCTCAACTCCAGATGGAACAATTGTCAAGAAGTTTAGAGGTAATAATAGCGGTCA
ACCTTCTACCGTTGTGGATAATTCTCTCATGGTTGTTCTTGCTATGCATTACGCTCTCATTAAGGAGTGCGTTGAGTTTG
AAGAAATCGACAGCACGTGTGTATTCTTTGTTAATGGTGATGACTTATTGATTGCTGTGAATCCGGAGAAAGAGAGCATT
CTCGATAGAATGTCACAACATTTCTCAGATCTTGGTTTGAACTATGATTTTTCGTCGAGAACAAGAAGGAAGGAGGAATT
GTGGTTCATGTCCCATAGAGGCCTGCTAATTGAGGGTATGTACGTGCCAAAGCTTGAAGAAGAGAGAATTGTATCCATTC
TGCAATGGGATAGGGCTGATCTGCCAGAGCACAGATTAGAAGCGATTTGTGCAGCAATGATAGAATCCTGGGGTTATTTT
GAGTTAACGCACCAAATTAGGAGATTCTACTCATGGTTGTTACAACAGCAACCTTTTTCAACGATAGCACAGGAAGGAAA
AGCTCCATACATAGCGAGCATGGCATTGAAGAAGCTGTACATGAATAGGACAGTAGATGAGGAGGAACTGAAGGCTTTCA
CTGAAATGATGGTTGCCTTGGATGACGAATTTGAGTGCGATACTTATGAAGTGCACCATCAAGGAAATGACACAATCGAT
GCAGGAGGAAGCACTAAGAAGGATGCAAAACAAGAGCAAGGTAGCATTCAACCAAATCTCAACAAGGAAAAGGAAAAGGA
CGTGAATGTTGGAACATCTGGAACTCATACTGTGCCACGAATTAAAGCTATCACGTCCAAAATGAGAATGCCTAAGAGTA
AAGGTGCAACTGTACTAAATTTGGAACACTTACTCGAGTATGCTCCACAGCAAATTGACATCTCAAATACTCGAGCAACT
CAATCACAGTTTGATACGTGGTATGAAGCAGTACAACTTGCATACGACATAGGAGAAACTGAAATGCCAACTGTGATGAA
TGGGCTTATGGTTTGGTGCATTGAAAATGGAACCTCGCCAAACATCAACGGAGTTTGGGTTATGATGGATGGAGATGAAC
AAGTCGAATACCCACTGAAACCAATCGTTGAGAATGCAAAACCAACACTTAGGCAAATCATGGCACATTTCTCAGATGTT
GCAGAAGCGTATATAGAAATGCGCAACAAAAAGGAACCATATATGCCACGATATGGTTTAGTTCGTAATCTGCGCGATGG
AAGTCTGGCTCGCTATGCTTTTGACTTTTATGAAGTTACATCACGGACACCAGTGAGGGCTAGAGAGGCACACATTCAAA
TGAAGGCCGCAGCTTTAAAATCAGCTCAATCTCGACTTTTCGGATTGGATGGTGGCATTAGTACACAAGAGGAAAACACA
GAGAGGCACACCACCGAGGATGTTTCTCCAAGTATGCATACTCTACTTGGAGTGAAGAACATGTGA
>CF_YL21
ATGGCAATCTACACGTCAACAATCTGTTTCGGTTCGTTTGAATGCAAGCTACCATACTCACCCGCCTCTTGCGGGCATAT
TGTGAAGGAACGAGAAGTGCTGGCTTCCATCGATCCTTTCGCAGATCTGGAAACACAACTTAGTGCACGATTGCTCAAGC
AAGAATATGCTACTGTTCGTGTGCTCAAGAACGGTACTCTTACGTACCGATACAAGACTGATGCCCAGATAACGCGCATC
CAGAAGAGACTGGAAAGGAAGGATAGGGAAGAATACCACTTCCAGATGGCAGCTCCTAGTATTGTGTCAAAGATCACTAT
TGCTGGTGGAGAGCCACCTTCAAAACTTGAATCACAAGTGCGGAGGGGTGTCATCCACACAACTCCAAGGATGCGCACAG
CAAAAACATATCACACGCCAAAGCTGACAGAGGGACAAATGAACCACCTTATCAAGCAGGTGAAGCAAATTATGTCAACC
AAAGGAGGGTCTGTTCAACTGATTAGCAAGAAAAGTACCCATGTTCACTATAAAGAAGTTTTGGGATCACATCGCGCAGT
TGTTTGCACTGCACATATGAGAGGTTTACGAAAGAGAGTGGACTTTCGGTGTGATAAATGGACCGTTGTGCGTCTACAGC
ATCTCGCCAGGACGGATAAGTGGAATAACCAAGTTCGTGTTACTGATCTACGCAAGGGCGATAGTGGAGTTATATTGAGT
AATACTAATCTCAAAGGAAACTTTGGGAGAAGCTCGGAGGGCATATTCATAGTGCGTGGGTCGCACGAAGGAAAAATCTA
TGATGCACGTTCCAAGGTTACTCAAGGGGTTATGGATTCAATGGTTCAGTTCTCAAGCGCTGAAAGCTTTTGGAAGGGAT
TGGACGGCAATTGGGCACAAATGAGATATCCTACAGATCATACATGTGTGGCAGGCTTACCAGTTGAAGACTGTGGCAGA
GTTGCAGCGATAATGACACACAGTATTTTACCGTGCTATAAGATAACCTGCCCTACCTGTGCCCAACAATATGCCAACTT
GCCAGCCAGTGACTTACTTAAGATATTACACAAGCACGCAAGTGATGGTTTAAATCGATTGGGGGCAGACAAAGATCGCT
TTGTGCATGTCAAAAAGTTCTTGACAATCTTAGAGCACTTAACTGAACCGGTTGATCTGAGTCTAGAAATTTTCAATGAA
GTATTCAAGTCTATAGGGGAGAAGCAACAATCACCTTTCAAAAACCTGAATATTCTGAATAATTTCTTTTTAAAAGGAAA
GGAAAATACAGCTCGTGAATGGCAGGTGGCTCAATTAAGCTTACTTGAATTGGCAAGATTCCAAAAGAACAGAACGGATA
ATATCAAGAAAGTAGACATCTCGTTCTTTAGGAAT
Condensed preview: 108 files, each showing path, character count, and a content snippet (322K chars of structured content in full).
[
{
"path": ".Rbuildignore",
"chars": 124,
"preview": "^.*\\.Rproj$\n^\\.Rproj\\.user$\nMakefile\nREADME.md\nREADME_files\nREADME.Rmd\n^_pkgdown\\.yml$\n^docs$\n^pkgdown$\nlogo.png\nCONDUCT"
},
{
"path": ".gitignore",
"chars": 103,
"preview": ".Rproj.user\n.Rhistory\n.RData\n.Renviron\n.DS_Store\ninst/doc\nggmsa.Rproj\nggmsa.Rcheck\n.git\ndocs/\npkgdown/\n"
},
{
"path": "CONDUCT.md",
"chars": 1389,
"preview": "# Contributor Code of Conduct\n\nAs contributors and maintainers of this project, we pledge to respect all people who \ncon"
},
{
"path": "DESCRIPTION",
"chars": 1876,
"preview": "Package: ggmsa\nTitle: Plot Multiple Sequence Alignment using 'ggplot2'\nVersion: 1.19.0\nAuthors@R: c(person(\"Guangchuang\""
},
{
"path": "Makefile",
"chars": 1409,
"preview": "PKGNAME := $(shell sed -n \"s/Package: *\\([^ ]*\\)/\\1/p\" DESCRIPTION)\nPKGVERS := $(shell sed -n \"s/Version: *\\([^ ]*\\)/\\1/"
},
{
"path": "NAMESPACE",
"chars": 3555,
"preview": "# Generated by roxygen2: do not edit by hand\n\nS3method(diff,SeqDiff)\nS3method(ggplot_add,GCcontent)\nS3method(ggplot_add,"
},
{
"path": "NEWS.md",
"chars": 4861,
"preview": "# ggmsa 1.18.0\n\n+ Bioconductor RELEASE_3_23 (2026-04-29, Wed)\n\n# ggmsa 1.16.0\n\n+ Bioconductor RELEASE_3_22 (2025-11-01, "
},
{
"path": "R/AllClasses.R",
"chars": 292,
"preview": "setClass(\"SeqDiff\",\n representation = representation(\n file = \"character\",\n "
},
{
"path": "R/SeqBundles.R",
"chars": 5438,
"preview": "##' plot Sequence Bundles for MSA based 'ggolot2'\n##'\n##'\n##' @title ggSeqBundle\n##' @importFrom ggfun geom_xspline\n##'"
},
{
"path": "R/ancestor_seq.R",
"chars": 4462,
"preview": "##' plot Tree-MSA plot\n##'\n##'\n##' 'treeMSA_plot()' automatically re-arranges the MSA data according to \n##' the tree st"
},
{
"path": "R/arc.R",
"chars": 9728,
"preview": "##' Plots nucleltide secondary structure as helices in arc diagram\n##'\n##' @title gghelix\n##' @param helix_data a data "
},
{
"path": "R/available.R",
"chars": 1807,
"preview": "##' This function lists font families currently available \n##' that can be used by 'ggmsa'\n##'\n##'\n##' @title List Font "
},
{
"path": "R/clustal.R",
"chars": 1582,
"preview": "##' A color scheme of Culstal. The algorithm to assign colors\n##' for Multiple Sequence.\n##'\n##' @param y sequence al"
},
{
"path": "R/color_by_conservation.R",
"chars": 900,
"preview": "color_increment <- function(conservation_visibility){\n lapply(seq_len(nrow(conservation_visibility)), function(i){\n "
},
{
"path": "R/color_else.R",
"chars": 1269,
"preview": "##' Assigning colors to sequence alignment.\r\n##'\r\n##'\r\n##' @param y sequence alignment with data frame, generated by ti"
},
{
"path": "R/cons.R",
"chars": 4118,
"preview": "##' cleaning the needless sequences' color according to the \r\n##' consensus sequence (only used in the consensus views)."
},
{
"path": "R/data.R",
"chars": 3173,
"preview": "#' A sample data used in ggmsa\n#'\n#' A dataset containing the alignment sequences of \n#' the phenylalanine hydroxylase p"
},
{
"path": "R/dms.R",
"chars": 749,
"preview": "##' assign dms value to alignments.\n##'\n##' @title assign_dms\n##' @param x data frame from tidy_msa()\n##' @param dms dms"
},
{
"path": "R/facet_msa.R",
"chars": 1031,
"preview": "##' The MSA would be plot in a field that you set.\n\n##' @title segment MSA\n##' @param field a numeric vector of the fiel"
},
{
"path": "R/geom_GC.R",
"chars": 2072,
"preview": "##' Multiple sequence alignment layer for ggplot2. It plot points of GC content.\n\n##' @title geom_GC\n##' @param show.leg"
},
{
"path": "R/geom_asterisk.R",
"chars": 2894,
"preview": "##' a ggplot2 layer of asterisk as a polygon\n##'\n##'\n##' @title a ggplot2 layer of asterisk as a polygon\n##' @param mapp"
},
{
"path": "R/geom_msa.R",
"chars": 7621,
"preview": "##' Multiple sequence alignment layer for ggplot2. \r\n##' It creates background tiles with/without sequence characters.\r\n"
},
{
"path": "R/geom_msaBar.R",
"chars": 1734,
"preview": "##' Multiple sequence alignment layer for ggplot2.\n##' It plot sequence conservation bar.\n\n##' @title geom_msaBar\n\n##' "
},
{
"path": "R/geom_seed.R",
"chars": 2836,
"preview": "##' Highlighting the seed in miRNA sequences\r\n##'\r\n##'\r\n##' @title geom_seed\r\n##' @param seed a character string.Specify"
},
{
"path": "R/ggmaf.R",
"chars": 5783,
"preview": "##' plot MAF\n##'\n##' @title ggmaf \n##' @param data a tidy MAF data frame.You can get it by tidy_maf_df() \n##' @param ref"
},
{
"path": "R/ggmsa.R",
"chars": 5334,
"preview": "##' Plot multiple sequence alignment using ggplot2 with multiple color schemes \r\n##' supported.\r\n##'\r\n##'\r\n##' @title gg"
},
{
"path": "R/import-functions.R",
"chars": 201,
"preview": "##' @importFrom utils globalVariables\nglobalVariables(\".\")\nglobalVariables(\"fre\") #geom_GC.R:\nglobalVariables(\"read.deli"
},
{
"path": "R/method-plot.R",
"chars": 3664,
"preview": "##' plot method for SeqDiff object\n##'\n##' @name plot\n##' @rdname plot-methods\n##' @exportMethod plot\n##' @aliases plot,"
},
{
"path": "R/method-show.R",
"chars": 938,
"preview": "##' show method\n##'\n##'\n##' @name show\n##' @docType methods\n##' @rdname show-methods\n##' @title show method\n##' @param o"
},
{
"path": "R/methods-diff.R",
"chars": 87,
"preview": "##' @method diff SeqDiff\n##' @export\ndiff.SeqDiff <- function(x, ...) {\n x@diff\n}\n\n\n"
},
{
"path": "R/methods-ggplot_add.R",
"chars": 4414,
"preview": "##' @method ggplot_add seqlogo\r\n##' @export\r\nggplot_add.seqlogo <- function(object, plot, object_name) {\r\n msaData <-"
},
{
"path": "R/msa_data.R",
"chars": 10076,
"preview": "##' This function parses FASTA files or other sequence objects. \r\n##' And assign color to each molecule (amino acid or n"
},
{
"path": "R/pp_interactive.R",
"chars": 3716,
"preview": "\nmake_gap <- function(gap, previous_seq) {\n gap_df <- previous_seq[rep(1, each=gap),] \n gap_start <- max(previous_"
},
{
"path": "R/prepare_fasta.R",
"chars": 2107,
"preview": "##' preparing multiple sequence alignment\n##'\n##' This function supports both NT or AA sequences; It supports multiple \n"
},
{
"path": "R/read_maf.R",
"chars": 2034,
"preview": "##' read 'multiple alignment format'(MAF) file\n##'\n##' @title read_maf\n##' @param multiple_alignment_format a multiple a"
},
{
"path": "R/seqdiff.R",
"chars": 2230,
"preview": "\n##' calculate difference of two aligned sequences\n##'\n##'\n##' @title seqdiff\n##' @param fasta fasta file\n##' @param ref"
},
{
"path": "R/seqlogo.R",
"chars": 7473,
"preview": "##' plot sequence logo for MSA based 'ggolot2'\n\n##' @title seqlogo\n##' @param msa Multiple sequence alignment file or ob"
},
{
"path": "R/simplot.R",
"chars": 3909,
"preview": "##' Sequence similarity plot\n##'\n##'\n##' @title simplot\n##' @param file alignment fast file\n##' @param query query seque"
},
{
"path": "R/theme_msa.R",
"chars": 2068,
"preview": "##' Theme for ggmsa.\n##'\n##' @title theme_msa\n##' @importFrom ggplot2 theme_minimal\n##' @importFrom ggplot2 labs\n##' @ex"
},
{
"path": "R/zzz.R",
"chars": 956,
"preview": "#' @importFrom utils packageDescription\n.onAttach <- function(libname, pkgname){\n #options(total_heigh = 4)\n optio"
},
{
"path": "README.Rmd",
"chars": 3304,
"preview": "---\noutput: \n md_document:\n variant: gfm\nhtml_preview: TRUE\n---\n<!-- README.md is generated from README.Rmd. Please "
},
{
"path": "README.md",
"chars": 3226,
"preview": "<!-- README.md is generated from README.Rmd. Please edit that file -->\n\n# ggmsa:a visual exploration tool for multiple s"
},
{
"path": "inst/CITATION",
"chars": 1857,
"preview": "citHeader(\"To cite ggmsa in publications use:\")\n\ncitEntry(\n entry = \"book\",\n title = \"Data Integration, Manipulat"
},
{
"path": "inst/extdata/GVariation/A.Mont.fas",
"chars": 18388,
"preview": ">Mont\nATGGCAACTTACACATCAACAATCCAGTTTGGTTCCATTGAATGCAAACTTCCATACTCACCCGCTCCTTTTGGGCTAGTTGCGGGGAAACGAGAAGTTTTAACCACCACTGAC"
},
{
"path": "inst/extdata/GVariation/B.Oz.fas",
"chars": 18386,
"preview": ">Oz\nATGGCAACTTACATGTCAACAATCTGTTTCGGTTCGTTTGAATGCAAGCTACCATACTCACCCGCTTCTTGCGGGCATATTGTGAAGGAGCGAGAAGTGCTGGCTTCCGTTGATCC"
},
{
"path": "inst/extdata/GVariation/C.Wilga5.fas",
"chars": 18390,
"preview": ">Wilga5\nATGGCAACTTACATGTCAACAATCTGTTTCGGTTCGTTTGAATGCAAGCTACCATACTCACCCGCCTCTTGCGGGCATATTGTGAAGGAACGAGAAGTGCTGGCTTCCGTTG"
},
{
"path": "inst/extdata/GVariation/sample_alignment.fa",
"chars": 37231,
"preview": ">Mont\nATGGCAACTTACACATCAACAATCCAGTTTGGTTCCATTGAATGCAAACTTCCATACTCACCCGCTCCTTTTGGGCTAGT\nTGCGGGGAAACGAGAAGTTTTAACCACCACTGA"
},
{
"path": "inst/extdata/Gram-negative_AKL.fasta",
"chars": 6900,
"preview": ">Random_Gram-negative_AKL_gjtez\nRWTHLASGRTYNYKFNPPKQYGKDDITGEDLIQRED\n>Random_Gram-negative_AKL_dibhu\nRWTHLNSGRTYHYKFNPPK"
},
{
"path": "inst/extdata/Gram-positive_AKL.fasta",
"chars": 6900,
"preview": ">Random_Gram-positive_AKL_pjxgp\nRRTCVGCGTAFNYVMEPPKKEGICDACGGKLVVRDD\n>Random_Gram-positive_AKL_essyp\nRRTCVGCGTAFNYVMEPPK"
},
{
"path": "inst/extdata/LeaderRepeat_All.fa",
"chars": 1677,
"preview": ">Ain_RyC-MR95\nATCCGTTGATCAAATTTGAGGTTTGAGAGATATGTAAATTCAAAGGATAATCAAAC\n>Asp_D21\nATCCGTTGATCAAATTTGAGGTTTGAGAGATATGTAAATT"
},
{
"path": "inst/extdata/Rfam/RF00458.fasta",
"chars": 1630,
"preview": ">AF178440.1/5925-6123\nUUGACUAUGUGAUCUUGCUUUCG----UAAUAAAAUUCUGUACAUAAAAGUCGAAAGUAUUGCUAUAGUUAAGGUUGCGCUUGCCUAUUUAGGCAUAC"
},
{
"path": "inst/extdata/Rfam/RF03120.fasta",
"chars": 6212,
"preview": ">KU973692.1/1-298\nAUAUUAGGUUUUUACCUACCCAGGAA--AAGCCAACCAA-CCUCGAUCUCUUGUAGAUCUGUUCUCUAAACGA-ACUUUAAAA------UCUGUGUAGCUGU"
},
{
"path": "inst/extdata/Rfam/RF03120_SS.txt",
"chars": 318,
"preview": ">RF03120\n......<<<<<<<.<<<....>>>>>..>>>>>...........<<<<<.....>>>>>.<<<<.......>>.>>..............<<<<<<<<.<<.<<<<.<<<."
},
{
"path": "inst/extdata/sample.fasta",
"chars": 4377,
"preview": ">PH4H_Rattus_norvegicus\nMAAVVLENGVLSRKLSDFGQETSYIEDNSNQNGAISLIFSLKEEVGALAKVLRLFEENDINLTHIESRPSRLNKDEYEFF\nTYLDKRTKPVLGSII"
},
{
"path": "inst/extdata/seedSample.fa",
"chars": 444,
"preview": ">hsa-let-7a-5p MIMAT0000062 Homo sapiens let-7a-5p\nUGAGGUAGUAGGUUGUAUAGUU\n>hsa-let-7b-5p MIMAT0000063 Homo sapiens let-7"
},
{
"path": "inst/extdata/sequence-link-tree.fasta",
"chars": 14113,
"preview": ">Phy000B0HV_NEUCR\nM-----GIGSATLG-----------------------------------SRIPTPVLVARAVVSSSDGK-----DC--VA\nNPNLCEKP-VGGSQLTVPIVL"
},
{
"path": "inst/extdata/tp53.fa",
"chars": 2099,
"preview": ">Homo_sapiens\n----MDDLMLSP-------DDIEQWFTED-----------------PGPDEAPRMPEAAPPVAPAPA---------APTPAAPAPAPSWPLSSSVPSQKTYQGSYG"
},
{
"path": "man/GVariation.Rd",
"chars": 688,
"preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/data.R\n\\docType{data}\n\\name{GVariation}\n\\a"
},
{
"path": "man/Gram-negative_AKL.fasta.Rd",
"chars": 433,
"preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/data.R\n\\docType{data}\n\\name{Gram-negative_"
},
{
"path": "man/Gram-positive_AKL.fasta.Rd",
"chars": 433,
"preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/data.R\n\\docType{data}\n\\name{Gram-positive_"
},
{
"path": "man/LeaderRepeat_All.fa.Rd",
"chars": 314,
"preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/data.R\n\\docType{data}\n\\name{LeaderRepeat_A"
},
{
"path": "man/Rfam.Rd",
"chars": 609,
"preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/data.R\n\\docType{data}\n\\name{Rfam}\n\\alias{R"
},
{
"path": "man/TP53_genes.xlsx.Rd",
"chars": 284,
"preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/data.R\n\\docType{data}\n\\name{TP53_genes.xls"
},
{
"path": "man/adjust_ally.Rd",
"chars": 496,
"preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/ancestor_seq.R\n\\name{adjust_ally}\n\\alias{a"
},
{
"path": "man/assign_dms.Rd",
"chars": 332,
"preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/dms.R\n\\name{assign_dms}\n\\alias{assign_dms}"
},
{
"path": "man/available_colors.Rd",
"chars": 423,
"preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/available.R\n\\name{available_colors}\n\\alias"
},
{
"path": "man/available_fonts.Rd",
"chars": 423,
"preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/available.R\n\\name{available_fonts}\n\\alias{"
},
{
"path": "man/available_msa.Rd",
"chars": 401,
"preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/available.R\n\\name{available_msa}\n\\alias{av"
},
{
"path": "man/extract_seq.Rd",
"chars": 411,
"preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/ancestor_seq.R\n\\name{extract_seq}\n\\alias{e"
},
{
"path": "man/facet_msa.Rd",
"chars": 619,
"preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/facet_msa.R\n\\name{facet_msa}\n\\alias{facet_"
},
{
"path": "man/geom_GC.Rd",
"chars": 553,
"preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/geom_GC.R\n\\name{geom_GC}\n\\alias{geom_GC}\n\\"
},
{
"path": "man/geom_helix.Rd",
"chars": 1259,
"preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/arc.R\n\\name{geom_helix}\n\\alias{geom_helix}"
},
{
"path": "man/geom_msa.Rd",
"chars": 3399,
"preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/geom_msa.R\n\\name{geom_msa}\n\\alias{geom_msa"
},
{
"path": "man/geom_msaBar.Rd",
"chars": 504,
"preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/geom_msaBar.R\n\\name{geom_msaBar}\n\\alias{ge"
},
{
"path": "man/geom_seed.Rd",
"chars": 809,
"preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/geom_seed.R\n\\name{geom_seed}\n\\alias{geom_s"
},
{
"path": "man/geom_seqlogo.Rd",
"chars": 1457,
"preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/seqlogo.R\n\\name{geom_seqlogo}\n\\alias{geom_"
},
{
"path": "man/ggSeqBundle.Rd",
"chars": 1877,
"preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/SeqBundles.R\n\\name{ggSeqBundle}\n\\alias{ggS"
},
{
"path": "man/gghelix.Rd",
"chars": 1094,
"preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/arc.R\n\\name{gghelix}\n\\alias{gghelix}\n\\titl"
},
{
"path": "man/ggmaf.Rd",
"chars": 925,
"preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/ggmaf.R\n\\name{ggmaf}\n\\alias{ggmaf}\n\\title{"
},
{
"path": "man/ggmsa.Rd",
"chars": 3566,
"preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/ggmsa.R\n\\name{ggmsa}\n\\alias{ggmsa}\n\\title{"
},
{
"path": "man/merge_seq.Rd",
"chars": 473,
"preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/pp_interactive.R\n\\name{merge_seq}\n\\alias{m"
},
{
"path": "man/plot-methods.Rd",
"chars": 984,
"preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/method-plot.R\n\\docType{methods}\n\\name{plot"
},
{
"path": "man/readSSfile.Rd",
"chars": 532,
"preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/arc.R\n\\name{readSSfile}\n\\alias{readSSfile}"
},
{
"path": "man/read_maf.Rd",
"chars": 372,
"preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/read_maf.R\n\\name{read_maf}\n\\alias{read_maf"
},
{
"path": "man/reset_pos.Rd",
"chars": 291,
"preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/pp_interactive.R\n\\name{reset_pos}\n\\alias{r"
},
{
"path": "man/sample.fasta.Rd",
"chars": 386,
"preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/data.R\n\\docType{data}\n\\name{sample.fasta}\n"
},
{
"path": "man/seedSample.fa.Rd",
"chars": 386,
"preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/data.R\n\\docType{data}\n\\name{seedSample.fa}"
},
{
"path": "man/seqdiff.Rd",
"chars": 553,
"preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/seqdiff.R\n\\name{seqdiff}\n\\alias{seqdiff}\n\\"
},
{
"path": "man/seqlogo.Rd",
"chars": 1569,
"preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/seqlogo.R\n\\name{seqlogo}\n\\alias{seqlogo}\n\\"
},
{
"path": "man/sequence-link-tree.fasta.Rd",
"chars": 347,
"preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/data.R\n\\docType{data}\n\\name{sequence-link-"
},
{
"path": "man/show-methods.Rd",
"chars": 492,
"preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/method-show.R\n\\docType{methods}\n\\name{show"
},
{
"path": "man/simplify_hdata.Rd",
"chars": 371,
"preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/pp_interactive.R\n\\name{simplify_hdata}\n\\al"
},
{
"path": "man/simplot.Rd",
"chars": 1319,
"preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/simplot.R\n\\name{simplot}\n\\alias{simplot}\n\\"
},
{
"path": "man/theme_msa.Rd",
"chars": 219,
"preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/theme_msa.R\n\\name{theme_msa}\n\\alias{theme_"
},
{
"path": "man/tidy_hdata.Rd",
"chars": 487,
"preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/pp_interactive.R\n\\name{tidy_hdata}\n\\alias{"
},
{
"path": "man/tidy_maf_df.Rd",
"chars": 421,
"preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/ggmaf.R\n\\name{tidy_maf_df}\n\\alias{tidy_maf"
},
{
"path": "man/tidy_msa.Rd",
"chars": 770,
"preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/msa_data.R\n\\name{tidy_msa}\n\\alias{tidy_msa"
},
{
"path": "man/tp53.fa.Rd",
"chars": 300,
"preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/data.R\n\\docType{data}\n\\name{tp53.fa}\n\\alia"
},
{
"path": "man/treeMSA_plot.Rd",
"chars": 1440,
"preview": "% Generated by roxygen2: do not edit by hand\n% Please edit documentation in R/ancestor_seq.R\n\\name{treeMSA_plot}\n\\alias{"
},
{
"path": "tests/testthat/test-main.R",
"chars": 304,
"preview": "library(ggmsa)\nlibrary(ggplot2)\n\n\ntest_that(\"check whether `ggmsa` create a `ggplot` object\", {\n p <- ggmsa(msa = sys"
},
{
"path": "tests/testthat/test-msa_data.R",
"chars": 883,
"preview": "\n\nlibrary(ggmsa)\n\nmsa <- system.file(\"extdata\", \"sample.fasta\", package = \"ggmsa\")\ntidymsa <- tidy_msa(msa, 10, 20)\n\n\nte"
},
{
"path": "tests/testthat/test-tidy_msa.R",
"chars": 1208,
"preview": "\n\nlibrary(ggmsa)\nlibrary(Biostrings)\n\nmsa <- system.file(\"extdata\", \"sample.fasta\", package = \"ggmsa\")\ntidy_names <- c(\""
},
{
"path": "tests/testthat.R",
"chars": 54,
"preview": "library(testthat)\nlibrary(ggmsa)\n\ntest_check(\"ggmsa\")\n"
},
{
"path": "vignettes/.gitignore",
"chars": 98,
"preview": "Annotations.Rmd\nColor_schemes_And_Font_Families.Rmd\nMSA_theme.Rmd\nOther_Modules.Rmd\nView_modes.Rmd"
},
{
"path": "vignettes/ggmsa.Rmd",
"chars": 4967,
"preview": "---\ntitle: \"ggmsa-Getting Started\"\nauthor: \"GuangChuang Yu and Lang Zhou\"\noutput:\n prettydoc::html_pretty:\n toc: fal"
},
{
"path": "vignettes/ggmsa.bib",
"chars": 1272,
"preview": "@article{Taylor1997Residual,\n title={Residual colours: a proposal for aminochromography.},\n author={Tayl"
}
]
// ... and 2 more files
About this extraction
This extraction of the YuLab-SMU/ggmsa GitHub repository includes 108 files (300.1 KB), approximately 117.4k tokens.