Repository: YuLab-SMU/ggmsa
Branch: devel
Commit: 956078ed388a
Files: 108
Total size: 300.1 KB
Directory structure:
gitextract_gj8qs7tf/
├── .Rbuildignore
├── .gitignore
├── CONDUCT.md
├── DESCRIPTION
├── Makefile
├── NAMESPACE
├── NEWS.md
├── R/
│ ├── AllClasses.R
│ ├── SeqBundles.R
│ ├── ancestor_seq.R
│ ├── arc.R
│ ├── available.R
│ ├── clustal.R
│ ├── color_by_conservation.R
│ ├── color_else.R
│ ├── cons.R
│ ├── data.R
│ ├── dms.R
│ ├── facet_msa.R
│ ├── geom_GC.R
│ ├── geom_asterisk.R
│ ├── geom_msa.R
│ ├── geom_msaBar.R
│ ├── geom_seed.R
│ ├── ggmaf.R
│ ├── ggmsa.R
│ ├── import-functions.R
│ ├── method-plot.R
│ ├── method-show.R
│ ├── methods-diff.R
│ ├── methods-ggplot_add.R
│ ├── msa_data.R
│ ├── pp_interactive.R
│ ├── prepare_fasta.R
│ ├── read_maf.R
│ ├── seqdiff.R
│ ├── seqlogo.R
│ ├── simplot.R
│ ├── sysdata.rda
│ ├── theme_msa.R
│ └── zzz.R
├── README.Rmd
├── README.md
├── inst/
│ ├── CITATION
│ └── extdata/
│ ├── GVariation/
│ │ ├── A.Mont.fas
│ │ ├── B.Oz.fas
│ │ ├── C.Wilga5.fas
│ │ └── sample_alignment.fa
│ ├── Gram-negative_AKL.fasta
│ ├── Gram-positive_AKL.fasta
│ ├── LeaderRepeat_All.fa
│ ├── Rfam/
│ │ ├── RF00458.fasta
│ │ ├── RF03120.fasta
│ │ └── RF03120_SS.txt
│ ├── TP53_genes.xlsx
│ ├── sample.fasta
│ ├── seedSample.fa
│ ├── sequence-link-tree.fasta
│ └── tp53.fa
├── man/
│ ├── GVariation.Rd
│ ├── Gram-negative_AKL.fasta.Rd
│ ├── Gram-positive_AKL.fasta.Rd
│ ├── LeaderRepeat_All.fa.Rd
│ ├── Rfam.Rd
│ ├── TP53_genes.xlsx.Rd
│ ├── adjust_ally.Rd
│ ├── assign_dms.Rd
│ ├── available_colors.Rd
│ ├── available_fonts.Rd
│ ├── available_msa.Rd
│ ├── extract_seq.Rd
│ ├── facet_msa.Rd
│ ├── geom_GC.Rd
│ ├── geom_helix.Rd
│ ├── geom_msa.Rd
│ ├── geom_msaBar.Rd
│ ├── geom_seed.Rd
│ ├── geom_seqlogo.Rd
│ ├── ggSeqBundle.Rd
│ ├── gghelix.Rd
│ ├── ggmaf.Rd
│ ├── ggmsa.Rd
│ ├── merge_seq.Rd
│ ├── plot-methods.Rd
│ ├── readSSfile.Rd
│ ├── read_maf.Rd
│ ├── reset_pos.Rd
│ ├── sample.fasta.Rd
│ ├── seedSample.fa.Rd
│ ├── seqdiff.Rd
│ ├── seqlogo.Rd
│ ├── sequence-link-tree.fasta.Rd
│ ├── show-methods.Rd
│ ├── simplify_hdata.Rd
│ ├── simplot.Rd
│ ├── theme_msa.Rd
│ ├── tidy_hdata.Rd
│ ├── tidy_maf_df.Rd
│ ├── tidy_msa.Rd
│ ├── tp53.fa.Rd
│ └── treeMSA_plot.Rd
├── tests/
│ ├── testthat/
│ │ ├── test-main.R
│ │ ├── test-msa_data.R
│ │ └── test-tidy_msa.R
│ └── testthat.R
└── vignettes/
├── .gitignore
├── ggmsa.Rmd
└── ggmsa.bib
================================================
FILE CONTENTS
================================================
================================================
FILE: .Rbuildignore
================================================
^.*\.Rproj$
^\.Rproj\.user$
Makefile
README.md
README_files
README.Rmd
^_pkgdown\.yml$
^docs$
^pkgdown$
logo.png
CONDUCT.md
================================================
FILE: .gitignore
================================================
.Rproj.user
.Rhistory
.RData
.Renviron
.DS_Store
inst/doc
ggmsa.Rproj
ggmsa.Rcheck
.git
docs/
pkgdown/
================================================
FILE: CONDUCT.md
================================================
# Contributor Code of Conduct
As contributors and maintainers of this project, we pledge to respect all people who
contribute through reporting issues, posting feature requests, updating documentation,
submitting pull requests or patches, and other activities.
We are committed to making participation in this project a harassment-free experience for
everyone, regardless of level of experience, gender, gender identity and expression,
sexual orientation, disability, personal appearance, body size, race, ethnicity, age, or religion.
Examples of unacceptable behavior by participants include the use of sexual language or
imagery, derogatory comments or personal attacks, trolling, public or private harassment,
insults, or other unprofessional conduct.
Project maintainers have the right and responsibility to remove, edit, or reject comments,
commits, code, wiki edits, issues, and other contributions that are not aligned to this
Code of Conduct. Project maintainers who do not follow the Code of Conduct may be removed
from the project team.
Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by
opening an issue or contacting one or more of the project maintainers.
This Code of Conduct is adapted from the Contributor Covenant
(http://contributor-covenant.org), version 1.0.0, available at
http://contributor-covenant.org/version/1/0/0/
================================================
FILE: DESCRIPTION
================================================
Package: ggmsa
Title: Plot Multiple Sequence Alignment using 'ggplot2'
Version: 1.19.0
Authors@R: c(person("Guangchuang", "Yu", email = "guangchuangyu@gmail.com", role = c("aut", "cre","ths"), comment = c(ORCID = "0000-0002-6485-8781")),
person("Lang", "Zhou", email = "nyzhoulang@gmail.com", role = "aut"),
person("Shuangbin", "Xu", email = "xshuangbin@163.com", role = "ctb"),
person("Huina", "Huang", email = "1185796994@qq.com", role = "ctb"))
Description: A visual exploration tool for multiple sequence alignment
and associated data. Supports MSA of DNA, RNA, and protein sequences
using 'ggplot2'. Multiple sequence alignment can easily be combined
with other 'ggplot2' plots, such as phylogenetic trees visualized by
'ggtree', boxplot, genome map and so on. More features: visualization
of sequence logos, sequence bundles, RNA secondary structures and detection
of sequence recombinations.
Depends: R (>= 4.1.0)
Imports:
Biostrings,
ggplot2,
magrittr,
tidyr,
utils,
stats,
aplot,
RColorBrewer,
ggfun (>= 0.2.0),
ggforce,
dplyr,
R4RNA,
grDevices,
seqmagick,
grid,
methods,
ggtree (>= 1.17.1)
Suggests:
ggtreeExtra,
ape,
cowplot,
knitr,
rmarkdown,
readxl,
ggnewscale,
kableExtra,
gggenes,
statebins,
prettydoc,
testthat (>= 3.0.0),
yulab.utils
License: Artistic-2.0
Encoding: UTF-8
URL: https://doi.org/10.1093/bib/bbac222 (paper), https://www.amazon.com/Integration-Manipulation-Visualization-Phylogenetic-Computational-ebook/dp/B0B5NLZR1Z/ (book)
BugReports: https://github.com/YuLab-SMU/ggmsa/issues
biocViews: Software, Visualization, Alignment, Annotation, MultipleSequenceAlignment
RoxygenNote: 7.3.2
VignetteBuilder: knitr
Config/testthat/edition: 3
================================================
FILE: Makefile
================================================
PKGNAME := $(shell sed -n "s/Package: *\([^ ]*\)/\1/p" DESCRIPTION)
PKGVERS := $(shell sed -n "s/Version: *\([^ ]*\)/\1/p" DESCRIPTION)
PKGSRC := $(shell basename `pwd`)
BIOCVER := RELEASE_3_23
all: rd check clean
alldocs: rd readme mkdocs
rd:
Rscript -e 'roxygen2::roxygenise(".")'
readme:
Rscript -e 'rmarkdown::render("README.Rmd")'
readme2:
Rscript -e 'rmarkdown::render("README.Rmd", "html_document")'
build:
# cd ..;\
# R CMD build $(PKGSRC)
Rscript -e 'devtools::build()'
build2:
cd ..;\
R CMD build --no-build-vignettes $(PKGSRC)
install:
cd ..;\
R CMD INSTALL $(PKGNAME)_$(PKGVERS).tar.gz
check: #build
#cd ..;\
#Rscript -e 'rcmdcheck::rcmdcheck("$(PKGNAME)_$(PKGVERS).tar.gz")'
Rscript -e 'devtools::check()'
check2: build
cd ..;\
R CMD check $(PKGNAME)_$(PKGVERS).tar.gz
bioccheck:
cd ..;\
Rscript -e 'BiocCheck::BiocCheck("$(PKGNAME)_$(PKGVERS).tar.gz")'
gpcheck:
Rscript -e 'goodpractice::gp()'
clean:
cd ..;\
$(RM) -r $(PKGNAME).Rcheck/
gitmaintain:
git gc --auto;\
git prune -v;\
git fsck --full
rmrelease:
git branch -D $(BIOCVER)
release:
git checkout $(BIOCVER);\
git fetch --all
update:
git fetch --all;\
git checkout devel;\
git merge upstream/devel;\
git merge origin/devel
push:
git push upstream devel;\
git push origin devel
biocinit:
git remote add upstream git@git.bioconductor.org:packages/$(PKGNAME).git;\
git fetch --all
================================================
FILE: NAMESPACE
================================================
# Generated by roxygen2: do not edit by hand
S3method(diff,SeqDiff)
S3method(ggplot_add,GCcontent)
S3method(ggplot_add,facet_msa)
S3method(ggplot_add,msaBar)
S3method(ggplot_add,nucleotideeHelix)
S3method(ggplot_add,seed)
S3method(ggplot_add,seqlogo)
export(adjust_ally)
export(assign_dms)
export(available_colors)
export(available_fonts)
export(available_msa)
export(extract_seq)
export(facet_msa)
export(geom_GC)
export(geom_helix)
export(geom_msa)
export(geom_msaBar)
export(geom_seed)
export(geom_seqlogo)
export(ggSeqBundle)
export(gghelix)
export(ggmaf)
export(ggmsa)
export(merge_seq)
export(readSSfile)
export(read_maf)
export(reset_pos)
export(seqdiff)
export(seqlogo)
export(simplify_hdata)
export(simplot)
export(theme_msa)
export(tidy_hdata)
export(tidy_maf_df)
export(tidy_msa)
export(treeMSA_plot)
exportMethods(plot)
exportMethods(show)
importClassesFrom(Biostrings,BStringSet)
importFrom(Biostrings,AAStringSet)
importFrom(Biostrings,DNAStringSet)
importFrom(Biostrings,RNAStringSet)
importFrom(Biostrings,readBStringSet)
importFrom(Biostrings,readDNAStringSet)
importFrom(Biostrings,toString)
importFrom(Biostrings,width)
importFrom(R4RNA,as.helix)
importFrom(R4RNA,collapseHelix)
importFrom(R4RNA,expandHelix)
importFrom(R4RNA,readBpseq)
importFrom(R4RNA,readConnect)
importFrom(R4RNA,readHelix)
importFrom(R4RNA,readVienna)
importFrom(RColorBrewer,brewer.pal)
importFrom(aplot,insert_top)
importFrom(aplot,plot_list)
importFrom(dplyr,group_by)
importFrom(dplyr,group_by_)
importFrom(dplyr,n)
importFrom(dplyr,select)
importFrom(dplyr,summarize)
importFrom(dplyr,summarize_)
importFrom(ggforce,geom_arc)
importFrom(ggfun,geom_xspline)
importFrom(ggplot2,Geom)
importFrom(ggplot2,aes)
importFrom(ggplot2,aes_)
importFrom(ggplot2,coord_cartesian)
importFrom(ggplot2,coord_fixed)
importFrom(ggplot2,draw_key_polygon)
importFrom(ggplot2,element_blank)
importFrom(ggplot2,element_line)
importFrom(ggplot2,element_text)
importFrom(ggplot2,facet_wrap)
importFrom(ggplot2,geom_area)
importFrom(ggplot2,geom_blank)
importFrom(ggplot2,geom_col)
importFrom(ggplot2,geom_line)
importFrom(ggplot2,geom_point)
importFrom(ggplot2,geom_polygon)
importFrom(ggplot2,geom_ribbon)
importFrom(ggplot2,geom_segment)
importFrom(ggplot2,geom_smooth)
importFrom(ggplot2,geom_text)
importFrom(ggplot2,geom_tile)
importFrom(ggplot2,ggplot)
importFrom(ggplot2,ggplot_add)
importFrom(ggplot2,ggplot_build)
importFrom(ggplot2,ggplot_gtable)
importFrom(ggplot2,ggproto)
importFrom(ggplot2,ggtitle)
importFrom(ggplot2,labs)
importFrom(ggplot2,layer)
importFrom(ggplot2,scale_color_manual)
importFrom(ggplot2,scale_fill_gradientn)
importFrom(ggplot2,scale_fill_manual)
importFrom(ggplot2,scale_x_continuous)
importFrom(ggplot2,scale_y_continuous)
importFrom(ggplot2,theme)
importFrom(ggplot2,theme_bw)
importFrom(ggplot2,theme_minimal)
importFrom(ggplot2,theme_void)
importFrom(ggplot2,xlab)
importFrom(ggplot2,xlim)
importFrom(ggplot2,ylab)
importFrom(ggtree,geom_facet)
importFrom(ggtree,geom_tiplab)
importFrom(grDevices,colorRampPalette)
importFrom(grid,arrow)
importFrom(grid,gTree)
importFrom(grid,gpar)
importFrom(grid,polygonGrob)
importFrom(grid,unit)
importFrom(grid,unit.pmax)
importFrom(magrittr,"%<>%")
importFrom(magrittr,"%>%")
importFrom(methods,missingArg)
importFrom(methods,new)
importFrom(methods,show)
importFrom(seqmagick,fa_read)
importFrom(stats,setNames)
importFrom(tidyr,gather)
importFrom(utils,getFromNamespace)
importFrom(utils,globalVariables)
importFrom(utils,modifyList)
importFrom(utils,packageDescription)
importFrom(utils,read.delim)
================================================
FILE: NEWS.md
================================================
# ggmsa 1.18.0
+ Bioconductor RELEASE_3_23 (2026-04-29, Wed)
# ggmsa 1.16.0
+ Bioconductor RELEASE_3_22 (2025-11-01, Sat)
# ggmsa 1.15.1
+ replace `ggalt::geom_xspline()` with `ggfun::geom_xspline()` (2025-07-12, Sat)
# ggmsa 1.3.3
+ calling `\dontrun{}` for examples on `ggmsa()`
# ggmsa 1.3.2
+ bugfix: `geom_msaBar` conservation layer was incorrectly aligned, issues#34 (2022-5-13, Fri)
# ggmsa 1.3.1
+ A new feature--selects ancestral sequence on Tree-MSA plot `treeMSA_plot` (2022-4-14, Thu)
+ A new feature--visualization of genome alignment `ggmaf` (2022-4-14, Thu)
+ A test feature--visualization of protein-protein interactions (2022-4-14, Thu)
+ updated the way smooth is invoked on simplot (2022-01-03, Mon)
# ggmsa 1.1.4
added a smoothed curve to simplot. (2021-12-17, Fri)
# ggmsa 1.1.3
fixed the typo in "posHighligthed", and changed it to
snake_case "position_highlight" from camelCase "posHighligthed" (2021-12-13, Mon)
# ggmsa 1.1.2
fixed the assignment error on line 155 of 'seqlogo.R'
# ggmsa 1.1.1
fixed error: using `||` instead of `|` on line 110 of geom_msa.R
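For readers unfamiliar with the distinction behind this fix, here is a minimal base R sketch of how `|` and `||` differ inside a condition. The variable `x` and the comparison are hypothetical and chosen only for illustration; the actual condition in geom_msa.R is not reproduced here.

```r
# Hypothetical argument; not the real code from geom_msa.R.
x <- NULL

# `|` is element-wise: both sides are evaluated and the result takes the
# length of the operands, which is zero here because `x == "none"` is
# logical(0).
elementwise <- is.null(x) | x == "none"
length(elementwise)   # 0, so `if (elementwise)` would stop with an error

# `||` returns a single TRUE/FALSE and short-circuits: the right-hand side
# is not evaluated once the left-hand side is TRUE.
scalar <- is.null(x) || x == "none"

if (scalar) message("x is unset")
```

In an `if()` condition the scalar form is usually the safer choice, since it always yields exactly one logical value.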
# ggmsa 0.99.0 or 0.99.x
(Prepare for submission to `Bioconductor`, 2021-09-22 Wed)
+ 0.99.1 update DESCRIPTION and NEWS files (2021-09-28, Tue)
+ 0.99.2 add documentation for raw data in inst/extdata and clean up code (2021-09-29, Wed)
+ 0.99.3 remove some vignettes from master (build on the gh-pages branch) (2021-10-1, Fri)
+ 0.99.4 remove 'stringr' package from 'Imports' (2021-10-11, Mon)
+ 0.99.5 make the consensus_views compatible with ggtreeExtra and add package description. (2021-10-21, Thu)
# ggmsa 0.0.10
+ update default color schemes in lower part of the SeqDiff plot (2021-08-20, Fri)
# ggmsa 0.0.9
+ import R4RNA to fix R check (2021-08-03, Tue)
# ggmsa 0.0.8
+ bugfix: fix variable names error in color_scheme. (2021-07-29, Thu)
+ The migration of sequence recombination functionality from `seqcombo` package. (2021-07-20, Tue)
# ggmsa 0.0.7
+ added `gghelix()` and `geom_helix()`. (2021-04-1, Thu)
+ added option to show the fill legend. (2021-03-23, Tue)
+ added an error message to remind that "sequences must have unique names". (2021-03-18, Thu)
+ added `ggSeqBundle()` to plot Sequence Bundles for MSAs based on `ggplot2` (2021-03-18, Thu)
# ggmsa 0.0.6
+ supports linking `ggtreeExtra`. (2021-01-21, Thu)
+ bugfix: reversed sequence in 'tree + geom_facet(font)' . (2021-01-21, Thu)
+ bugfix: partitioning error when the sequence starting point greater than 1. (2021-01-21, Thu)
+ bugfix: generates continuous x-axis labels for each panel. (2021-01-21, Thu)
+ supports custom colors via `custom_color`. (2020-12-28, Mon)
# ggmsa 0.0.5
+ added a new view called `by_conservation`. (2020-12-22, Tue)
+ added a new color scheme `Hydrophobicity` and a new parameter `border`. (2020-12-21, Mon)
+ rewrite the function `facet_msa()`. (2020-12-03, Thu)
+ Debug: tree + geom_facet(geom_msa()) does not work. (2020-12-03, Thu)
+ added a new function `geom_msaBar()`. (2020-12-03, Thu)
+ added a new parameter `ignore_gaps` used in consensus views. (2020-10-09, Fri)
+ debug in consensus views (2020-10-05, Mon)
+ added consensus views (2020-9-30, Wed)
+ added new colors `LETTER` and `CN6` provided by ShixiangWang.[issues#8](https://github.com/YuLab-SMU/ggmsa/issues/8)
# ggmsa 0.0.4
+ fixed warning message in **msa_data.R** (2020-4-26, Sun)
+ added ggplot_add methods for `geom_*()` (2020-4-24, Fri)
+ added a parameter `seq_name` in `ggmsa()` (2020-4-23, Thu)
+ added a new function `facet_msa()` --> break down the MSA (2020-4-17, Fri)
+ added a parameter `posHighlighted` in `ggmsa()` (2020-4-17, Fri)
+ created a new layer `geom_asterisk()` to optimize `geom_seed()` (2020-4-11, Sat)
+ added new functions `available_colors()`, `available_fonts()` and `available_msa()` (2020-3-30, Thu)
+ added a new function `geom_seed()` --> highlight the seed region in miRNA sequences (2020-3-27, Fri)
+ added a new function `ggmotif()`--> plot sequence motifs independently (2020-3-23, Tue)
+ added a Monospaced Font `DroidSansMono` (2020-3-23, Mon)
# ggmsa 0.0.3
+ release of v=0.0.3 (2020-03-16, Mon)
+ added a new function `geom_GC()` --> plot GC content in MSA (2020-02-28, Fri)
+ added a new function `geom_seqlogo()` --> plot sequence motifs in MSA (2020-02-14, Fri)
+ used a proportional scaling algorithm (2020-01-08, Wed)
# ggmsa 0.0.2
+ support plot sequence logo (2019-12-25, Wed)
+ added three fonts: `helvetical`, `times_new_roman`, `mono` (2019-12-21, Sat)
+ ~~added three fonts:`serif_font`, `Montserrat_font`, `roboto_font` (2019-12-17, Tue)~~
+ added internal outline polygons (2019-12-15, Sun)
+ bug fixed of `tidy_msa`
+ import `seqmagick` for parsing fasta
+ `tidy_msa` for converting msa file/object to tidy data frame (2019-12-09, Mon)
# ggmsa 0.0.1
+ initial CRAN release (2019-10-17, Thu)
+ removed from CRAN on 2021-08-17
================================================
FILE: R/AllClasses.R
================================================
setClass("SeqDiff",
representation = representation(
file = "character",
sequence = "BStringSet",
reference = "numeric",
diff = "data.frame"
)
)
================================================
FILE: R/SeqBundles.R
================================================
##' plot Sequence Bundles for MSA based on 'ggplot2'
##'
##'
##' @title ggSeqBundle
##' @importFrom ggfun geom_xspline
##' @param msa Multiple sequence alignment file (FASTA) or object
##' representing either nucleotide sequences or peptide sequences. Also accepts
##' multiple MSA files,
##' e.g. msa = c("Gram-negative_AKL.fasta", "Gram-positive_AKL.fasta").
##' @param line_width The width of bundles at each site, default is 0.3.
##' @param line_thickness The thickness of bundles at each site, default is 0.3.
##' @param line_high The height of bundles at each site, default is 0.
##' @param spline_shape A numeric vector of values between -1 and 1, which
##' control the shape of the spline relative to the control points.
##' @param size A numeric vector of values between 0 and 1,
##' which control the size of each line.
##' @param alpha A numeric vector of values between 0 and 1,
##' which control the alpha of each line.
##' @param bundle_color The colors of each sequence bundle,
##' e.g. bundle_color = c("#2ba0f5","#424242").
##' @param lev_molecule Reassigns the Y-axis and displays
##' letter-coded amino acids/nucleotides arranged by physicochemical
##' properties or other criteria, e.g. by amino acid hydrophobicity:
##' lev_molecule = c("-","A", "V", "L", "I", "P", "F", "W", "M",
##' "G", "S","T", "C", "Y", "N", "Q", "D", "E", "K","R", "H").
##' @return ggplot object
##' @export
##' @examples
##' aln <- system.file("extdata", "Gram-negative_AKL.fasta", package = "ggmsa")
##' ggSeqBundle(aln)
##' @author Lang Zhou
ggSeqBundle <- function(msa,
line_width = 0.3,
line_thickness = 0.3,
line_high = 0,
spline_shape = 0.3,
size = 0.5,
alpha = 0.2,
bundle_color = c("#2ba0f5","#424242"),
lev_molecule = c("-", "A", "V", "L", "I", "P",
"F", "W", "M", "G", "S","T",
"C", "Y", "N", "Q", "D", "E",
"K", "R", "H")
) {
if(length(msa) > length(bundle_color)) {
stop("Each MSA group should be assigned a bundle color!!")
}
df <- lapply(seq_along(msa), function(i){
df_aa <- tidy_msa(msa[[i]])
df_aa$name <- as.character(df_aa$name)
df_aa$group <- i
df_aa
})%>% do.call("rbind",.)
dd <- adjustMSA(df_msa = df,
lev_molecule = lev_molecule,
line_width = line_width,
line_thickness = line_thickness,
line_high = line_high,
bundle_color = bundle_color
)
mapping <- aes(x = position_adj, y = y_adj,
group=name, color = I(bundle_color))
ggplot(data = dd, mapping = mapping) +
geom_xspline(shape = spline_shape, linewidth = size, alpha = alpha) +
theme_bundles(df = df, lev_molecule = lev_molecule)
}
adjustMSA <- function(df_msa, lev_molecule, line_width,
line_thickness, bundle_color, line_high) {
data_scale <- lapply(nrow(df_msa) %>% seq_len(), function(i) {
d <- df_msa[i,]
d[2,] <- d[1,]
d[1,"position_adj"] <- d[1,"position"] - line_width
d[2,"position_adj"] <- d[2,"position"] + line_width
d
}) %>% do.call("rbind",.)
data_scale$y <- factor(data_scale$character, levels = lev_molecule) %>%
as.numeric()
data_adj <- lapply(data_scale$group %>% unique, function(g) {
data_group <- data_scale[data_scale$group == g,]
thickness <- line_thickness / factor(data_group$name) %>%
as.numeric %>%
max
dd_adj <- lapply(unique(data_group$position), function(i){
df_pos <- data_group[data_group$position == i,]
lapply(unique(df_pos$y), function(j){
df_y <- df_pos[df_pos$y == j,]
thick_lev <- df_y$name %>% factor %>% as.numeric - 1
df_y$y_adj <- df_y$y - 0.4 + line_high + thickness *
thick_lev + line_thickness * (g - 1)
df_y
}) %>% do.call("rbind",.)
}) %>% do.call("rbind",.)
dd_adj$bundle_color <- bundle_color[[g]]
dd_adj
}) %>% do.call("rbind",.)
return(data_adj)
}
##' @importFrom ggplot2 element_line
theme_bundles <- function(df, lev_molecule){
break_y <- factor(lev_molecule, levels = lev_molecule) %>% as.numeric
minor_y <- c(break_y + 0.5, break_y - 0.5) %>% unique
break_x <- max(df$position) %>% seq_len
minor_x <- c(break_x + 0.5, break_x - 0.5) %>% unique
list(
ylab(NULL),
xlab("Position number"),
scale_x_continuous(breaks = break_x,
labels = break_x,
minor_breaks = minor_x),
scale_y_continuous(breaks = break_y,
labels = lev_molecule,
minor_breaks = minor_y),
theme(panel.grid.minor.y = element_line(color = "#e8e0e0", linewidth = 0.4),
axis.line.x = element_line(color = "gray60", linewidth = 0.8),
panel.grid.major = element_blank(),
axis.ticks.y = element_blank(),
panel.background = element_blank())
)
}
================================================
FILE: R/ancestor_seq.R
================================================
##' plot Tree-MSA plot
##'
##'
##' 'treeMSA_plot()' automatically re-arranges the MSA data according to
##' the tree structure.
##' @title treeMSA_plot
##' @param p_tree tree view
##' @param tidymsa_df tidy MSA data
##' @param ancestral_node vector, internal node(s) in the tree. Assign an
##' internal node to display its "ancestral sequence". ancestral_node = "none"
##' hides all ancestral sequences; ancestral_node = "all" shows all ancestral
##' sequences.
##' @param sub logical value. Displaying a subset of ancestral sequences or not.
##' @param panel panel name for plot of MSA data
##' @param font font family; possible values are 'helvetical', 'mono',
##' 'DroidSansMono' and 'TimesNewRoman'. Default is NULL,
##' which plots only the background tiles.
##' @param color a color scheme. One of 'Clustal', 'Chemistry_AA',
##' 'Shapely_AA', 'Zappo_AA', 'Taylor_AA', 'LETTER', 'CN6', 'Chemistry_NT',
##' 'Shapely_NT', 'Zappo_NT', 'Taylor_NT'. Default is 'Chemistry_AA'.
##' @param seq_colname the colname of MSA on tree$data
##' @param ... additional parameters for 'geom_msa'
##' @export
##' @importFrom ggtree geom_facet
##' @return ggplot object
##' @author Lang Zhou
treeMSA_plot <- function(p_tree,
tidymsa_df,
ancestral_node = "none",
sub = FALSE,
panel = "MSA",
font = NULL,
color = "Chemistry_AA",
seq_colname = NULL,
...) {
# 'ancestral_node' may be a vector of nodes, so test it with identical()
# to keep every condition length-one.
show_ancestor <- !identical(ancestral_node, "none")
if(show_ancestor && is.null(seq_colname)) {
stop("please assign the colname of MSA on tree$data via the argument 'seq_colname'!")
}
if(show_ancestor) {
p_tree <- adjust_ally(p_tree, node = ancestral_node,
sub = sub,
seq_colname = seq_colname)
tidymsa_df <- extract_seq(p_tree,
seq_colname = seq_colname)
}
p <- p_tree + geom_facet(geom = geom_msa,
data = tidymsa_df,
panel = panel,
font = font,
color = color,
...)
if(!show_ancestor) {
p <- p + geom_tiplab(offset = 0.002)
}
p
}
##' adjust the tree branch position after assigning ancestor node
##'
##' @title adjust_ally
##' @param tree ggtree object
##' @param node internal node in tree
##' @param sub logical value.
##' @param seq_colname the colname of MSA on tree$data
##' @importFrom ggtree geom_tiplab
##' @importFrom ggplot2 aes_
##' @importFrom utils getFromNamespace
##' @return tree
##' @export
##' @author Lang Zhou
adjust_ally <- function(tree, node, sub = FALSE, seq_colname = "mol_seq") {
getSubtree <- getFromNamespace("getSubtree", "ggtree")
if(identical(node, "all")){
d <- tree$data
ancestor_n <- d[!d$isTip & !is.na(d[,seq_colname][[1]]),"node"][[1]]
}else {
if(sub){
ancestor_n <- lapply(node, function(i) {
sub_tree <- getSubtree(tree,node = i)
sub_ancestor <- sub_tree[!sub_tree$isTip,]
ancestor_n <- sub_ancestor$node
return(ancestor_n)
})%>% unlist %>% unique
}else {
ancestor_n <- node
}
}
for (i in ancestor_n) {
tree <- adjust_treey(tree = tree, node = i)
}
tree$data$node_color <- "black"
tree$data[tree$data$node %in% ancestor_n,"node_color"] <- "red"
tree <- tree + geom_tiplab(aes_(color = ~I(node_color)),offset = 0.002)
return(tree)
}
##' extract ancestor sequence from tree data
##'
##' @title extract_seq
##' @param tree_adjust ggtree object
##' @param seq_colname the colname of MSA on tree$data
##' @return a tidy data frame of the sequences, as produced by tidy_msa()
##' @export
##' @author Lang Zhou
extract_seq <- function(tree_adjust, seq_colname = "mol_seq") {
data <- tree_adjust$data
seq <- data[data$isTip,seq_colname][[1]]
names(seq) <- data[data$isTip,]$label
tidy <- tidy_msa(seq)
return(tidy)
}
adjust_treey <- function(tree, node) {
tree$data$isTip[tree$data$node == node] <- TRUE
tree$data$label[tree$data$node == node] <-
tree$data$name[tree$data$node == node]
y_ancestor <- tree$data$y[tree$data$node == node]
tree$data$y[tree$data$y > y_ancestor] <-
tree$data$y[tree$data$y > y_ancestor] + 1
tree$data$y[tree$data$node == node] <-
tree$data$y[tree$data$node == node] %>% ceiling
return(tree)
}
================================================
FILE: R/arc.R
================================================
##' Plots nucleotide secondary structure as helices in an arc diagram
##'
##' @title gghelix
##' @param helix_data a data frame of nucleotide secondary structure,
##' as read from a structure file by readSSfile().
##' @param overlap Logical. If TRUE, two structure data sets named 'known'
##' and 'predicted' must be given (e.g. helix_data = list(known = data1,
##' predicted = data2)).
##' Predicted helices that are known are plotted on top, predicted helices
##' that are not known on the bottom, and unpredicted (known-only) helices
##' on top in black.
##' @param color_by rule used to generate colors for helices (by integer
##' counts or by value ranges); one of "length" and "value"
##' @return ggplot object
##' @export
##' @examples
##' RF03120 <- system.file("extdata/Rfam/RF03120_SS.txt", package="ggmsa")
##' helix_data <- readSSfile(RF03120, type = "Vienna")
##' gghelix(helix_data)
##' @author Lang Zhou
gghelix <- function(helix_data, color_by = "length",overlap = FALSE){
if(is.data.frame(helix_data)) {
helix_tidy <- tidy_helix(helix_data, color_by = color_by)
}else {
helix_tidy <- tidy_list_helix(helix_data, color_by = color_by)
}
ly <- layer_helix(helix_data = helix_tidy, overlap = overlap)
p <- ggplot() + ly + theme_helix()
return(p)
}
##' The layer of helix plot
##'
##' @title geom_helix
##' @param helix_data a data frame of nucleotide secondary structure,
##' as read from a structure file by readSSfile().
##' @param overlap Logical. If TRUE, two structure data sets named 'known'
##' and 'predicted' must be given (e.g. helix_data = list(known = data1,
##' predicted = data2)).
##' Predicted helices that are known are plotted on top,
##' predicted helices that are not known on the bottom, and
##' unpredicted (known-only) helices on top in black.
##' @param color_by rule used to generate colors for helices (by integer
##' counts or by value ranges); one of "length" and "value"
##' @param ... additional parameter
##' @return ggplot2 layers
##' @export
##' @examples
##' RF03120 <- system.file("extdata/Rfam/RF03120_SS.txt", package="ggmsa")
##' RF03120_fas <- system.file("extdata/Rfam/RF03120.fasta", package="ggmsa")
##' SS <- readSSfile(RF03120, type = "Vienna")
##' ggmsa(RF03120_fas, font = NULL,border = NA,
##' color = "Chemistry_NT", seq_name = FALSE) +
##' geom_helix(SS)
##' @author Lang Zhou
geom_helix <- function(helix_data, color_by = "length", overlap = FALSE, ...) {
structure(list(helix_data = helix_data,
color_by = color_by,
overlap = overlap),
class = "nucleotideeHelix")
}
##' Read secondary structure file
##'
##' @title readSSfile
##' @importFrom utils read.delim
##' @param file A secondary structure text file in the format given by 'type'
##' @param type file type. One of "Helix", "Connect", "Vienna" and "Bpseq"
##' @return data frame
##' @importFrom R4RNA readHelix
##' @importFrom R4RNA readConnect
##' @importFrom R4RNA readVienna
##' @importFrom R4RNA readBpseq
##' @importFrom R4RNA expandHelix
##' @importFrom R4RNA collapseHelix
##' @export
##' @examples
##' RF03120 <- system.file("extdata/Rfam/RF03120_SS.txt", package="ggmsa")
##' helix_data <- readSSfile(RF03120, type = "Vienna")
##' @author Lang Zhou
readSSfile <- function(file, type = NULL) {
type <- match.arg(type, c("Helix", "Connect", "Vienna", "Bpseq"))
load_data <- switch(type,
Helix = readHelix(file),
Connect = readConnect(file),
Vienna = readVienna(file),
Bpseq = readBpseq(file))
data <- collapseHelix(load_data)
return(data)
}
tidy_list_helix <- function(helix_data, color_by = "length"){
known <- tidy_helix(helix_data$known, color_by = color_by)
predicted <- tidy_helix(helix_data$predicted, color_by = color_by)
return(list(known = known, predicted = predicted))
}
tidy_helix <- function(helix_data, color_by = "length"){
helix_data <- color_helix(helix_data, color = color_by)
names(helix_data)[c(1,2)] <- c("from","to")
helix_data$x0 <- (helix_data$to + helix_data$from)/2
helix_data$r <- (helix_data$to - helix_data$from)/2
return(helix_data)
}
color_helix <- function(helix_data, color){
#color <- match.arg(color, c("length", "value"))
if(color == "length"){
data_color <- colorBy_length(helix_data)
}else if(color == "value") {
data_color <- colorBy_value(helix_data)
}else {
helix_data$col <- color
data_color <- helix_data
}
data <- expandHelix(data_color)
return(data)
}
colorBy_length <- function(helix_data){
pal_length <- colorRampPalette(brewer.pal(name = "Paired", n = 12))
helix_data$col <- nrow(helix_data) %>% pal_length()
return(helix_data)
}
colorBy_value <- function(helix_data){
pal_value <- colorRampPalette(rev(brewer.pal(name = "Blues", n = 4)))
helix_data$col <- nrow(helix_data) %>% pal_value()
return(helix_data)
}
##' @importFrom ggforce geom_arc
layer_helix <- function(helix_data, overlap = FALSE, seq_numbers = 0){
mapping_above <- aes_(x0 = ~x0,
y0 = ~(seq_numbers + 0.5),
r = ~r, start = ~1.5*pi,
end = ~2.5*pi)
mapping_below <- aes_(x0 = ~x0,
y0 = ~(-0.5),
r = ~r, start = ~pi/2,
end = ~1.5*pi)
if(seq_numbers > 0) {
mapping_below <- modifyList(mapping_below, aes_(y0 = ~0))
}
if(is.list(helix_data) && "col" %in% names(helix_data[[2]])) {
mapping_above <- modifyList(mapping_above, aes_(color = ~I(col)))
mapping_below <- modifyList(mapping_below, aes_(color = ~I(col)))
}
if(overlap) {
if(!is.list(helix_data) || length(helix_data) != 2){
stop("Overlapping structures require a list of
2 helix data sets.
(e.g. helix_data = list(known = data1, predicted = data2))")
}
if(!names(helix_data) %in% c("known", "predicted") %>% all) {
stop("helix_data names must be 'known' and 'predicted'.
(e.g. helix_data = list(known = data1, predicted = data2))")
}
overlap_data <- overlap_helix(known = helix_data[["known"]],
predicted = helix_data[["predicted"]])
if (overlap_data[["above_justknown"]] %>% nrow == 0){
ly_up <- geom_arc(data = overlap_data[["above_both"]],
mapping = mapping_above)
ly_below <- geom_arc(data = overlap_data[["below"]],
mapping = mapping_below)
return(list(ly_up, ly_below))
}else {
ly_up <- geom_arc(data = overlap_data[["above_both"]],
mapping = mapping_above)
ly_up_justknown <-
geom_arc(data = overlap_data[["above_justknown"]],
mapping = mapping_above,
color = "black")
ly_below <- geom_arc(data = overlap_data[["below"]],
mapping = mapping_below)
return(list(ly_up, ly_up_justknown, ly_below))
}
}else {#overlap = FALSE
if(is.list(helix_data) && length(helix_data) == 2) {
if(!"col" %in% names(helix_data[["known"]])) {
mapping_below <- modifyList(mapping_below,
aes_(color = I("#8fce5e")))
}
ly_up <- geom_arc(data = helix_data[["known"]],
mapping = mapping_below)
ly_below <- geom_arc(data = helix_data[["predicted"]],
mapping = mapping_above)
return(list(ly_up, ly_below))
}else if(is.data.frame(helix_data)){
if("col" %in% names(helix_data)){
mapping_above <- modifyList(mapping_above,
aes_(color = ~I(col)))
}
ly_arc <- geom_arc(data = helix_data, mapping = mapping_above)
return(ly_arc)
}else {
stop("Only a data frame or a list of 2 helix data sets is allowed.
e.g. helix_data = data or
helix_data = list(known = data1, predicted = data2)")
}
}
}
overlap_helix <- function(known, predicted){
if(!c("from", "to") %in% names(known) %>% all) {
stop("'known' must be an output from 'readSSfile()'")
}
if(!c("from", "to") %in% names(predicted) %>% all) {
stop("'predicted' must be an output from 'readSSfile()'")
}
known$heli <- paste0(known$from, "t",known$to)
predicted$heli <- paste0(predicted$from, "t", predicted$to)
below <- predicted[!predicted$heli %in% known$heli,] #predicted & not known
above_both <- predicted[predicted$heli %in% known$heli,] #predicted & known
above_justknown <- known[!known$heli %in% above_both$heli,] #unpredicted & known
return(list(below = below,
above_both = above_both,
above_justknown = above_justknown))
}
##' @importFrom ggplot2 theme_void
##' @importFrom ggplot2 element_text
##' @importFrom grid arrow
theme_helix <- function(){
list(theme_void(),
scale_y_continuous(breaks = 0),
coord_fixed(),
theme(panel.grid.major.y = element_line(size = 1, arrow = arrow(length = unit(0.3, 'cm'))),
panel.grid.major.x = element_line(color = "#eaeaea", size = 0.4),
axis.text.x = element_text())
)
}
================================================
FILE: R/available.R
================================================
##' This function lists font families currently available
##' that can be used by 'ggmsa'
##'
##'
##' @title List Font Families currently available
##' @return NULL, invisibly. The available font family names are
##' printed as a message.
##' @examples available_fonts()
##' @export
##' @author Lang Zhou
available_fonts <- function(){
message("font families currently available:" )
font <- paste(names(font_fam), collapse = ' ')
message(font, "\n")
}
##' This function lists color schemes currently available that
##' can be used by 'ggmsa'
##'
##'
##' @title List Color Schemes currently available
##' @return NULL, invisibly. The available color schemes are
##' printed as a message.
##' @examples available_colors()
##' @export
##' @author Lang Zhou
available_colors <- function(){
message("1.color schemes for nucleotide sequences currently available:")
color_nt <- paste(names(scheme_NT), collapse = ' ')
message(color_nt, "\n")
message("2.color schemes for AA sequences currently available:")
color_aa <- paste(names(scheme_AA), collapse = ' ')
message("Clustal ", color_aa, "\n")
}
##' This function lists MSA objects currently available that
##' can be used by 'ggmsa'
##'
##'
##' @title List MSA objects currently available
##' @return NULL, invisibly. The supported MSA objects are
##' printed as a message.
##' @examples available_msa()
##' @export
##' @author Lang Zhou
available_msa <- function(){
message("1.files currently available:")
message(".fasta",'\n')
message("2.XStringSet objects from 'Biostrings' package:")
mes <- paste(supported_msa_class[!grepl("bin", supported_msa_class)],
collapse = ' ')
message(mes, '\n')
message("3.bin objects:")
mes_bin <- paste(supported_msa_class[grepl("bin", supported_msa_class)],
collapse = ' ')
message(mes_bin, '\n')
}
================================================
FILE: R/clustal.R
================================================
##' The Clustal color scheme. Assigns colors to a multiple sequence
##' alignment column by column, based on the character frequencies.
##'
##' @param y sequence alignment with data frame, generated by tidy_msa().
##' @keywords clustal
##' @noRd
color_Clustal <- function(y) {
char_freq <- lapply(split(y, y$position), function(x) table(x$character))
col_convert <- lapply(char_freq, function(seq_column) {
##The white as the background
clustal <- rep("#ffffff", length(seq_column))
names(clustal) <- names(seq_column)
r <- seq_column/sum(seq_column)
for (pos in seq_along(seq_column)) {
char <- names(seq_column)[pos]
#match literally, so characters such as '.' or '*' are not read as regex
i <- grep(char, scheme_clustal$re_position, fixed = TRUE)
for (j in i) {
if (scheme_clustal$type[j] == "combined"){
rr <- sum(r[strsplit(scheme_clustal$re_gp[j], '')[[1]]],
na.rm = TRUE)
if (rr > scheme_clustal$thred[j]) {
clustal[pos] <- scheme_clustal$colour[j]}
} else{
rr1<-r[strsplit(scheme_clustal$re_gp[j], ',')[[1]]]
if (any(rr1> scheme_clustal$thred[j],na.rm = TRUE) ) {
clustal[pos] <- scheme_clustal$colour[j]}
}
break #only the first matching rule is evaluated for each character
}
}
return(clustal)
})
yy <- split(y, y$position)
lapply(names(yy), function(n) {
d <- yy[[n]]
col <- col_convert[[n]]
d$color <- col[d$character]
return(d)
}) %>% do.call('rbind', .)
}
================================================
FILE: R/color_by_conservation.R
================================================
color_increment <- function(conservation_visibility){
lapply(seq_len(nrow(conservation_visibility)), function(i){
color_ramp <-
colorRampPalette(colors =
c(conservation_visibility[i,"color"],
"#ffffff"))
color_change <-
rev(color_ramp(100))[conservation_visibility[i,"visibility"]]
return(color_change)
}) %>% unlist
}
color_visibility <- function(y){
#options(digits = 2)
#on.exit()
conser_data <- bar_data(y)
#'%>%' binds tighter than '/', so round the whole ratio explicitly;
#the result is an integer percentage used as an index into the color ramp
conser_data$visibility <-
round(conser_data$Freq / length(levels(y[[1]])) * 100)
names(conser_data)[3] <- "position"
y_filter <- y[c(-1,-3)]
conser_ready <- merge(conser_data, y_filter)
y$color <- color_increment(conser_ready)
return(y)
}
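## Illustrative sketch, not part of the ggmsa API: the fading rule used by
## color_increment() above, reduced to a single color. A 100-step ramp runs
## from the scheme color to white, and a conservation percentage (1 to 100)
## picks the step, so 100 keeps the full color and low values approach white.
## The helper name `sketch_fade_color` is hypothetical.
sketch_fade_color <- function(color, percent) {
    ramp <- grDevices::colorRampPalette(colors = c(color, "#ffffff"))
    rev(ramp(100))[percent]
}
## Usage (assumed inputs):
## sketch_fade_color("#ff0000", 100)  # the full scheme color
## sketch_fade_color("#ff0000", 1)    # white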
================================================
FILE: R/color_else.R
================================================
##' Assigning colors to sequence alignment.
##'
##'
##' @param y sequence alignment with data frame, generated by tidy_msa().
##' @param color a color scheme. One of 'Clustal', 'Chemistry_AA',
##' 'Shapely_AA', 'Zappo_AA', 'Taylor_AA', 'LETTER', 'CN6', 'Chemistry_NT',
##' 'Shapely_NT', 'Zappo_NT', 'Taylor_NT'. Default is 'Chemistry_AA'.
##' @param custom_color A data frame with two columns called "names"
##' and "color". Customizes the color scheme.
##' @noRd
color_scheme <- function(y, color = "Chemistry_AA", custom_color = NULL) {
if (!is.null(custom_color)){
#Convert factors to character to avoid factor interference
custom_color[["names"]] <- as.character(custom_color[["names"]])
#'$col' partially matches a column named "color" or "colour(s)"
custom_color[["color"]] <- as.character(custom_color$col)
row.names(custom_color) <- custom_color[["names"]]
scheme_AA$custom_color <-
custom_color[row.names(scheme_AA), "color"] %>% as.character()
y$color <- scheme_AA[y$character, "custom_color"]
}else{
if(grepl("NT", color)){
y$color <- scheme_NT[y$character, color]
} else{
y$color <- scheme_AA[y$character, color]
}
}
return(y)
}
================================================
FILE: R/cons.R
================================================
##' Clears the colors of characters according to the
##' consensus sequence (only used in the consensus views).
##'
##' @param y a data frame, sequence alignment with specified color.
##' @param consensus the consensus sequence, as returned by
##' get_consensus().
##' @param disagreement a logical value. Displays characters that
##' disagree with the consensus (excludes ambiguous disagreements).
##' @param ref a character string. Specifying the reference sequence
##' which should be one of input sequences when 'consensus_views' is TRUE.
##' @keywords tidy_color
##' @noRd
tidy_color <- function(y, consensus, disagreement, ref) {
c <- lapply(unique(y$position), function(i) {
msa_cloumn <- y[y$position == i, ]
if(!is.null(ref)) {
if ('label' %in% names(msa_cloumn)) { ##work for ggtreeExtra
msa_cloumn <- msa_cloumn[!msa_cloumn$label == ref, ]
}else{
msa_cloumn <- msa_cloumn[!msa_cloumn$name == ref, ]
}
}
#Get consensus char.
cons_char <- consensus[consensus$position == i, "character"]
#Compare the characters of the current position(i)
#to the consensus char.
logic <- msa_cloumn$character == cons_char
#Cleaning colors according to the 'logic'.
if(cons_char == "X") {
msa_cloumn$color <- NA
}
if(disagreement){
msa_cloumn[logic, "color"] <- NA
}else{
msa_cloumn[!logic, "color"] <- NA
}
msa_cloumn
}) %>% do.call("rbind", .)
return(c)
}
##' calling the consensus sequence.
##'
##' @param tidy sequence alignment with data frame, generated by tidy_msa().
##' @param ignore_gaps a logical value. When selected TRUE, gaps in
##' column are treated as if that row didn't exist.
##' @param ref a character string. Specifying the reference sequence
##' which should be one of input sequences when 'consensus_views' is TRUE.
##' @keywords get_consensus
##' @noRd
get_consensus <- function(tidy, ignore_gaps = FALSE, ref = NULL) {
if(!is.null(ref)) {
if(ignore_gaps) {
warning("The argument 'ignore_gaps' is ignored ",
"when 'ref' is specified!")
}
if ('label' %in% names(tidy)) { ##work for ggtreeExtra
ref <- match.arg(ref, levels(factor(tidy$label)))
cons <- tidy[tidy$label == ref,]
}else {
ref <- match.arg(ref, levels(tidy$name))
cons <- tidy[tidy$name == ref,]
}
return(cons)
}
#Iterate through each column
cons <- lapply(unique(tidy$position), function(i) {
msa_cloumn <- tidy[tidy$position == i, ]
cons <- data.frame(position = i)
if(ignore_gaps) {
msa_cloumn <- msa_cloumn[!msa_cloumn$character %in% "-",]
}
#Gets the highest frequency characters
fre <- table(msa_cloumn$character) %>% data.frame
max_element <- fre[fre[2] == max(fre[2]),]
max_number <- max_element %>% nrow
if(max_number == 1) {
cons$character <- max_element[1,1]
}else {
cons$character <- "X"
}
cons
}) %>% do.call("rbind", .)
cons$name <- "Consensus"
#'character' is a factor at this point; convert it to plain character
cons$character <- as.character(cons$character)
return(cons)
}
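## Illustrative sketch, not part of the ggmsa API: the per-column majority
## vote used by get_consensus() above, reduced to one alignment column given
## as a character vector. A tie between the most frequent characters yields
## "X". The helper name `sketch_column_consensus` and the early return for an
## empty column are assumptions made for this example.
sketch_column_consensus <- function(chars, ignore_gaps = FALSE) {
    if (ignore_gaps) {
        chars <- chars[chars != "-"]
    }
    if (length(chars) == 0) {
        return("X")
    }
    freq <- table(chars)
    top <- names(freq)[freq == max(freq)]
    if (length(top) == 1) top else "X"
}
## Usage (assumed inputs):
## sketch_column_consensus(c("A", "A", "G", "-"))                   # "A"
## sketch_column_consensus(c("A", "G"))                             # "X"
## sketch_column_consensus(c("-", "-", "A"), ignore_gaps = TRUE)    # "A"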
order_name <- function(name, order = NULL,
consensus_views = FALSE,
ref = NULL) {
name_uni <- unique(name)
if(is.null(ref)){
#placed 'consensus' at the top
name_expect <- name_uni[!name_uni %in% "Consensus"] %>%
rev %>%
as.character
name <- factor(name, levels = c(name_expect, "Consensus"))
}else {
name_expect <- name_uni[!name_uni %in% ref] %>%
rev %>%
as.character
name <- factor(name, levels = c(name_expect, ref))
}
return(name)
}
================================================
FILE: R/data.R
================================================
#' Sample data used in ggmsa
#'
#' A dataset containing the aligned sequences of
#' the phenylalanine hydroxylase protein (PH4H)
#' from nine species
#'
#'
#' @docType data
#' @keywords datasets
#' @name sample.fasta
#' @format A MSA fasta with 9 sequences and 456 positions.
NULL
#' GVariation
#'
#' A folder containing 4 MSA files as a sample
#' data set to identify the sequence recombination event.
#'
#' \itemize{
#' \item A.Mont.fas MSA with sequences of 'Mont' and 'CF_YL21'
#' \item B.Oz.fas MSA with sequences of 'Oz' and 'CF_YL21'
#' \item C.Wilga5.fas MSA with sequences of 'Wilga5' and 'CF_YL21'
#' \item sample_alignment.fa MSA with sequences of 'Mont', 'CF_YL21',
#' 'Oz', and 'Wilga5'
#' }
#' @docType data
#' @keywords datasets
#' @name GVariation
#' @format a folder
#' @source \url{https://link.springer.com/article/10.1007/s11540-015-9307-3}
NULL
#' Rfam
#'
#' A folder containing seed alignment sequences and
#' corresponding consensus RNA secondary structure.
#'
#' \itemize{
#' \item RF00458.fasta seed alignment sequences of Cripavirus internal
#' ribosome entry site (IRES)
#' \item RF03120.fasta seed alignment sequences of Sarbecovirus 5'UTR
#' \item RF03120_SS.txt consensus RNA secondary structure of
#' Sarbecovirus 5'UTR
#'
#' }
#' @docType data
#' @keywords datasets
#' @name Rfam
#' @format a folder
#' @source \url{https://rfam.xfam.org/}
NULL
#' Gram-negative_AKL
#'
#' Amino acids in the adenylate kinase lid (AKL) domain
#' from Gram-negative bacteria.
#'
#' @docType data
#' @keywords datasets
#' @name Gram-negative_AKL.fasta
#' @format A MSA fasta with 100 sequences and 36 positions.
#' @source \url{http://biovis.net/year/2013/info/redesign-contest}
NULL
#' Gram-positive_AKL
#'
#' Amino acids in the adenylate kinase lid (AKL) domain
#' from Gram-positive bacteria.
#'
#' @docType data
#' @keywords datasets
#' @name Gram-positive_AKL.fasta
#' @format A MSA fasta with 100 sequences and 36 positions.
#' @source \url{http://biovis.net/year/2013/info/redesign-contest}
NULL
#' A sample DNA sequence alignment
#'
#' DNA alignment sequences with 24 sequences and 56 positions.
#'
#'
#' @docType data
#' @keywords datasets
#' @name LeaderRepeat_All.fa
#' @format A MSA fasta
NULL
#' microRNA data used in ggmsa
#'
#' FASTA format sequences of mature miRNA sequences
#' from miRBase
#'
#'
#' @docType data
#' @keywords datasets
#' @name seedSample.fa
#' @format A MSA fasta with 6 sequences and 22 positions.
#' @source \url{https://www.mirbase.org/ftp.shtml}
NULL
#' sequence-link-tree
#'
#' Alignment sequences used to demonstrate circular MSA layout
#'
#' @docType data
#' @keywords datasets
#' @name sequence-link-tree.fasta
#' @format A MSA fasta with 28 sequences and 480 positions.
NULL
#' TP53 MSA
#'
#' Aligned sequences used to demonstrate graphical combination
#'
#' @docType data
#' @keywords datasets
#' @name tp53.fa
#' @format A MSA fasta with 5 sequences and 404 positions.
NULL
#' genome locus
#'
#' The local genome map shows the 30000 sites around the TP53 gene.
#'
#' @docType data
#' @keywords datasets
#' @name TP53_genes.xlsx
#' @format xlsx
NULL
================================================
FILE: R/dms.R
================================================
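##' Assigns dms values to an alignment.
##'
##' @title assign_dms
##' @param x data frame from tidy_msa()
##' @param dms dms data frame with the columns 'site_RBD', 'wildtype',
##' 'mutation_RBD' and 'bind_avg'
##' @return a data frame: 'x' with the added columns 'mutation' and 'bind_avg'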
##' @export
##' @author Lang Zhou
assign_dms <- function(x, dms) {
dms_value <- lapply(unique(x$position), function(i) {
xx <- x[x$position == i,]
dmss <- dms[dms$site_RBD == i,]
wt <- unique(dmss[,"wildtype"])
xx$mutation <- paste0(wt, xx$position, xx$character)
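xx$bind_avg <- lapply(seq_along(xx$mutation),function(j) {
bind_avg <- dmss[dmss$mutation_RBD %in% xx[j,"mutation"],"bind_avg"]
#keep one value per row: NA when the mutation is absent from 'dms'
if (length(bind_avg) == 0) {
bind_avg <- NA
}
return(bind_avg)
}) %>% unlist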
return(xx)
}) %>% do.call("rbind",.)
return(dms_value )
}
================================================
FILE: R/facet_msa.R
================================================
##' Splits the MSA plot into fields (panels) of the width that you set.
##' @title segment MSA
##' @param field a number. The number of alignment columns in each field.
##' @return ggplot layers
##' @examples
##' library(ggplot2)
##' f <- system.file("extdata/sample.fasta", package="ggmsa")
##' # 2 fields
##' ggmsa(f, end = 120, font = NULL, color="Chemistry_AA") +
##' facet_msa(field = 60)
##' # 3 fields
##' ggmsa(f, end = 120, font = NULL, color="Chemistry_AA") +
##' facet_msa(field = 40)
##' @export
##' @author Lang Zhou
facet_msa <- function(field) {
structure(list(field = field),
class = "facet_msa"
)
}
facet_data <- function(msaData, field) {
if(min(msaData$position) > 1){
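#shift the positions so that the first column is 1; mapping both offset
#0 and offset 1 to 1 would put one extra column in the first field
pos_reset <- msaData$position - min(msaData$position) + 1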
}else {
pos_reset <- msaData$position
}
msaData$facet <- pos_reset %/% field
msaData[(pos_reset %% field) == 0, "facet"] <-
msaData[(pos_reset %% field) == 0, "facet"] - 1
return(msaData)
}
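## Illustrative sketch, not part of the ggmsa API: fixed-width binning of
## alignment columns into zero-based field indices, so that every field
## holds `field` consecutive columns. For alignments that start at
## position 1 this gives the same indices as the integer division plus
## modulo correction in facet_data() above. The helper name
## `sketch_facet_index` is hypothetical.
sketch_facet_index <- function(position, field) {
    (position - min(position)) %/% field
}
## Usage (assumed inputs):
## sketch_facet_index(1:6, field = 3)    # 0 0 0 1 1 1
## sketch_facet_index(10:15, field = 3)  # 0 0 0 1 1 1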
================================================
FILE: R/geom_GC.R
================================================
##' Multiple sequence alignment layer for ggplot2. It plots points showing
##' the GC content.
##' @title geom_GC
##' @param show.legend logical. Should this layer be included in the legends?
##' @return a ggplot layer
##' @examples
##' #plot GC content
##' f <- system.file("extdata/LeaderRepeat_All.fa", package="ggmsa")
##' ggmsa(f, font = NULL, color="Chemistry_NT") + geom_GC()
##' @export
##' @author Lang Zhou
geom_GC <- function(show.legend = FALSE) {
structure(list(show.legend = show.legend),
class = "GCcontent")
}
geom_GC1 <- function(tidyData, show.legend = FALSE){
tidy <- tidyData
#tidy <- tidy_msa(msa = msa, start = start, end = end)
GC_pos <- getOption("GC_pos")
GC <- content_GC(tidy)
GC <-GC[GC$character == "GC",]
col_num <- levels(factor(tidy$position))
col_len <- length(col_num) + GC_pos
ly_GC <- geom_point(data = GC,
mapping = aes_(x = ~col_len,
y = ~ypos,
size = ~fre),
color = "#51a6e9",
na.rm = TRUE,
show.legend = show.legend)
return(ly_GC)
}
##' get GC content
##' @title content_GC
##' @param data Multiple aligned sequence files or objects
##' for representing nucleotide sequences
##' @return A data frame
##' @noRd
##' @author Lang Zhou
content_GC<- function(data){
tidy <- data
tidy$name <- factor(tidy$name, levels = unique(tidy$name))
tidy$ypos <- as.numeric(tidy$name)
seq_num <- unique(tidy$ypos)
lchar_num <- lapply(seq_num, function(j){
clo <- tidy[tidy$ypos == j, ]
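y <- prop.table(table(clo$character))
num <- setNames(rep(0,5), c("A", "T", "G", "C", "GC"))
#copy only the four bases, so gaps or ambiguous characters cannot add
#extra entries and a missing G or C counts as 0 rather than NA
base <- intersect(names(y), c("A", "T", "G", "C"))
num[base] <- y[base]
num["GC"] <- num["G"] + num["C"]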
return(num)
})
char_num <- do.call(rbind,lchar_num)
char_num <- as.data.frame(char_num)
char_num["ypos"] = seq_num
char_num2 <- gather(char_num,character,fre, "A", "T", "C","G","GC")
return(char_num2)
}
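## Illustrative sketch, not part of the ggmsa API: the GC fraction that
## content_GC() above reports per sequence, computed for one sequence given
## as a character vector. As with prop.table() above, every character
## (gaps included) counts towards the denominator. The helper name
## `sketch_gc_fraction` is hypothetical.
sketch_gc_fraction <- function(seq_chars) {
    sum(seq_chars %in% c("G", "C")) / length(seq_chars)
}
## Usage (assumed inputs):
## sketch_gc_fraction(c("G", "C", "A", "T"))  # 0.5
## sketch_gc_fraction(c("A", "T", "-", "A"))  # 0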
================================================
FILE: R/geom_asterisk.R
================================================
##' a ggplot2 layer of asterisk as a polygon
##'
##'
##' @title a ggplot2 layer of asterisk as a polygon
##' @param mapping aes mapping
##' @param data a data frame
##' @param stat the statistical transformation to use on the data
##' for this layer, as a string.
##' @param position position adjustment, either as a string,
##' or the result of a call to a position adjustment function.
##' @param na.rm a logical value
##' @param show.legend a logical value
##' @param inherit.aes a logical value
##' @param ... additional parameters
##' @importFrom ggplot2 layer
##' @return ggplot2 layer
## @export
##' @noRd
##' @author Lang Zhou
##' @examples
##' #library(ggplot2)
##' #ggplot(mtcars, aes(mpg, disp)) + geom_asterisk()
geom_asterisk <- function(mapping = NULL,
data = NULL,
stat = "identity",
position = "identity",
na.rm = FALSE,
show.legend = NA,
inherit.aes = TRUE, ...) {
layer(geom = Geomasterisk,
mapping = mapping,
data = data,
stat = stat,
position = position,
show.legend = show.legend,
inherit.aes = inherit.aes,
params = list(na.rm = na.rm, ...))
}
##' @importFrom grid polygonGrob
##' @importFrom grid gpar
SeedStar <- function(x = NULL , y = NULL) {
char_width <- getOption("asterisk_width")
char_scale_2 <- getOption("char_scale_2")
x_width <- char_scale_2 * diff(range(star$y))
star$x = star$x * x_width/diff(range(star$x))
char_scale <- diff(range(star$x))/diff(range(star$y))
star$x = star$x * (char_width * char_scale)/diff(range(star$x))
star$y = star$y * char_width/diff(range(star$y))
star$x = star$x - min(star$x) - (char_width * char_scale)/2 + x
star$y = star$y - min(star$y) - char_width/2 + y
polygonGrob(star$x, star$y, gp = gpar(fill = "black") )
}
##' @importFrom ggplot2 ggproto
##' @importFrom ggplot2 Geom
##' @importFrom ggplot2 draw_key_polygon
##' @importFrom ggplot2 aes
##' @importFrom grid gTree
Geomasterisk <- ggproto("Geomasterisk", Geom,
required_aes = c("x", "y"),
default_aes = aes(fill = "black"),
draw_key = draw_key_polygon,
draw_panel = function(data, panel_params, coord) {
data <- coord$transform(data, panel_params)
grobs <- lapply(seq_len(nrow(data)), function(i) {
SeedStar(data$x[i], data$y[i])
})
class(grobs) <- "gList"
ggplot2:::ggname("geom_asterisk",
gTree(children = grobs))
}
)
================================================
FILE: R/geom_msa.R
================================================
##' Multiple sequence alignment layer for ggplot2.
##' It creates background tiles with/without sequence characters.
##'
##' @title geom_msa
##' @param data sequence alignment with data frame, generated by tidy_msa().
##' @param font font families, possible values are 'helvetical', 'mono',
##' 'DroidSansMono' and 'TimesNewRoman'. Default is 'helvetical'.
##' If font = NULL, only the background tiles are plotted.
##' @param mapping aes mapping
##' @param color A color scheme. One of 'Clustal', 'Chemistry_AA', 'Shapely_AA',
##' 'Zappo_AA', 'Taylor_AA', 'LETTER', 'CN6', 'Chemistry_NT', 'Shapely_NT',
##' 'Zappo_NT', 'Taylor_NT'. Default is 'Chemistry_AA'.
##' @param custom_color A data frame with two columns called "names" and
##' "color". Customizes the color scheme.
##' @param char_width a numeric value. Specifies the character width in
##' the range of 0 to 1. Default is 0.9.
##' @param by_conservation a logical value. The most conserved regions have
##' the brightest colors.
##' @param none_bg a logical value indicating whether the background
##' tiles should be omitted. Default is FALSE.
##' @param position_highlight A numeric vector of the position that
##' need to be highlighted.
##' @param seq_name a logical value indicating whether sequence names
##' should be displayed. Default is 'NULL', which displays the sequence
##' names when 'font = NULL' and hides them when a font is set.
##' If 'seq_name = TRUE' the sequence names are always displayed.
##' If 'seq_name = FALSE' the sequence names are never displayed.
##' @param border a character string. The border color.
##' @param consensus_views a logical value indicating whether to use
##' the consensus view.
##' @param use_dot a logical value. Displays characters as dots instead of
##' fading their color in the consensus view.
##' @param disagreement a logical value. Displays characters that disagree
##' with the consensus (excludes ambiguous disagreements).
##' @param ignore_gaps a logical value. When selected TRUE,
##' gaps in column are treated as if that row didn't exist.
##' @param ref a character string. Specifying the reference sequence
##' which should be one of input sequences when 'consensus_views' is TRUE.
##' @param position Position adjustment, either as a string, or
##' the result of a call to a position adjustment function,
##' default is 'identity' meaning 'position_identity()'.
##' @param show.legend logical. Should this layer be included in the legends?
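##' @param dms a logical value. If TRUE, tiles are filled by the 'bind_avg'
##' column of 'data' (see assign_dms()).
##' @param position_color a logical value. If TRUE, tiles are filled by the
##' 'pos_color' column of 'data'.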
##' @param ... additional parameter
##' @return A list
##' @importFrom ggplot2 scale_fill_manual
##' @importFrom utils modifyList
##' @export
##' @examples
##' library(ggplot2)
##' aln <- system.file("extdata", "sample.fasta", package = "ggmsa")
##' tidy_aln <- tidy_msa(aln, start = 150, end = 170)
##' ggplot() + geom_msa(data = tidy_aln, font = NULL) + coord_fixed()
##' @author Guangchuang Yu, Lang Zhou
geom_msa <- function(data, font = "helvetical",
mapping = NULL,
color = "Chemistry_AA",
custom_color = NULL,
char_width = 0.9,
none_bg = FALSE,
by_conservation = FALSE,
position_highlight = NULL,
seq_name = NULL,
border = NULL,
consensus_views = FALSE,
use_dot = FALSE,
disagreement = TRUE,
ignore_gaps = FALSE,
ref = NULL,
position = "identity",
show.legend = FALSE,
dms = FALSE,
position_color = FALSE,
... ) {
data <- msa_data(data,
font = font,
color = color,
custom_color = custom_color,
char_width = char_width,
by_conservation = by_conservation,
consensus_views = consensus_views,
use_dot = use_dot,
disagreement = disagreement,
ignore_gaps = ignore_gaps,
ref = ref)
#legend work
xx <- data[,c("character","color")] %>% unique()
xx <- xx[!is.na(xx$color),]
labs <- lapply(unique(xx$color) %>% seq_along, function(i) {
cols <- unique(xx$color)[i]
dup_char <- xx[xx$color == cols, "character"]
lab <- paste0(dup_char, collapse = ",")
}) %>% do.call("rbind",.) %>% as.vector()
cols <- xx$color %>% unique()
names(cols) <- cols
sacle_tile_cols <- scale_fill_manual(values = cols,
breaks = cols,
labels = labs)
bg_data <- data
#work to ggtreeExtra
if (is.null(mapping)) {
mapping <- aes_(x = ~position, y = ~name, fill = ~I(color))
}
#dms color work
if (dms) {
mapping <- modifyList(mapping, aes_(fill = ~bind_avg))
}
if (position_color) {
mapping <- modifyList(mapping, aes_(fill = ~I(pos_color)))
}
#'seq_name' work
if (!isTRUE(seq_name)) {
if ('y' %in% colnames(data) || isFALSE(seq_name) ) {
y <- as.numeric(bg_data$name)
mapping <- modifyList(mapping, aes_(y = ~y)) #"~y" is seq numbers
}
}
#'position_highlight' work
if (!is.null(position_highlight)) {
none_bg = TRUE
bg_data <- bg_data[bg_data$position %in% position_highlight,]
bg_data$postion <- as.factor(bg_data$position)
mapping <- modifyList(mapping, aes_(x = ~position,
fill = ~color,
width = 1))
}
#'border' work
if(is.null(border)){
ly_bg <- geom_tile(mapping = mapping, data = bg_data, color = 'grey',
inherit.aes = FALSE, position = position,
show.legend = show.legend)
}else{
ly_bg <- geom_tile(mapping = mapping, data = bg_data, color = border,
inherit.aes = FALSE, position = position,
show.legend = show.legend)
}
if (!all(c("yy", "order", "group") %in% colnames(data))) {
if(position_color) {
return(list(ly_bg))
}else{
return(list(ly_bg, sacle_tile_cols))
}
}
if ('y' %in% colnames(data)) {
data$yy = data$yy - as.numeric(data$name) + data$y
}
label_mapping <- aes_(x = ~x, y = ~yy, group = ~group)
# use_dot work
if (consensus_views && !use_dot) {
if(show.legend) {
stop("legends can't be shown in the consensus view!")
}
label_mapping <- modifyList(label_mapping, aes_(fill = ~I(font_color)))
}
ly_label <- geom_polygon(mapping = label_mapping, data = data,
inherit.aes = FALSE, position = position)
#'none_bg' work
if (none_bg && is.null(position_highlight)) {
return(ly_label)
}
if(consensus_views) {
return(list(ly_bg, ly_label))
}else {
if(position_color){
return(list(ly_bg, ly_label))
}else{
return(list(ly_bg, ly_label, sacle_tile_cols))
}
}
}
================================================
FILE: R/geom_msaBar.R
================================================
##' Multiple sequence alignment layer for ggplot2.
##' It plots the sequence conservation bar.
##' @title geom_msaBar
##' @return A list
##' @examples
##' #plot multiple sequence alignment and conservation bar.
##' f <- system.file("extdata/sample.fasta", package="ggmsa")
##' ggmsa(f, 221, 280, font = NULL, seq_name = TRUE) + geom_msaBar()
##' @export
##' @author Lang Zhou
geom_msaBar <- function() {
structure(list(),
class = "msaBar")
}
##' @importFrom ggplot2 geom_col
ly_bar <- function(tidy){
data <- bar_data(tidy)
mapping <- aes_(x = ~pos, y = ~Freq, fill = ~Freq)
ly_bar <- geom_col(data = data,
mapping = mapping,
width = 1,
show.legend = FALSE)
return(ly_bar)
}
##' get bar data
##' @title bar_data
##' @param tidy sequence alignment with data frame, generated by tidy_msa().
##' @return A data frame
##' @noRd
##' @author Lang Zhou
bar_data <- function(tidy){
character_position <- unique(tidy$position)
conservation_score <- lapply(character_position, function(j) {
cloumn_data <- tidy[tidy$position == j, ]
character_frequency <- table(cloumn_data$character) %>% as.data.frame
max_frequency <- character_frequency[character_frequency[2] ==
max(character_frequency[2]),]
max_frequency$Var1 <- as.character(max_frequency$Var1)
#keep a single row per column; ties are resolved by taking the first
max_frequency[1,]
}) %>% do.call("rbind", .)
conservation_score["pos"] <- character_position
return(conservation_score)
}
================================================
FILE: R/geom_seed.R
================================================
##' Highlighting the seed in miRNA sequences
##'
##'
##' @title geom_seed
##' @param seed a character string. Specifies the miRNA seed sequence,
##' like 'GAGGUAG'.
##' @param star a logical value indicating whether asterisks should
##' be displayed.
##' @return a ggplot layer
##' @author Lang Zhou
##' @examples
##' miRNA_sequences <- system.file("extdata/seedSample.fa", package="ggmsa")
##' ggmsa(miRNA_sequences, font = 'DroidSansMono',
##' color = "Chemistry_NT", none_bg = TRUE) +
##' geom_seed(seed = "GAGGUAG", star = FALSE)
##' ggmsa(miRNA_sequences, font = 'DroidSansMono',
##' color = "Chemistry_NT") +
##' geom_seed(seed = "GAGGUAG", star = TRUE)
##' @export
geom_seed <- function(seed, star = FALSE) {
structure(list(seed = seed,
star = star),
class = "seed")
}
geom_seed1 <- function(tidyData, seed, star) {
get_asteriskScale(tidyData)
tidyData$y <- as.numeric(tidyData$name)
seq_first <- tidyData[tidyData$y == 1,]
char <- seq_first$character
char <- paste(char, collapse = "")
seedPos <- regexpr(seed,char)
#locate <- str_locate(char, seed)
#df_locate <- as.data.frame(locate)
#seedPos <- df_locate$start # start position of seed region
seedLen <- nchar(seed) # length of seed region
numSeq <- max(tidyData$y) # number of sequences
shadingLen <- getOption("shadingLen") #shading width
shading_alpha <- getOption("shading_alpha")
x <- seedPos - .5 #the x coordinate of the lower left corner
y <- 1 - .5 - shadingLen #the y coordinate of the lower left corner
yy <- numSeq + .5 + shadingLen # #the y coordinate of the top right corner
xx <- x + seedLen #the x coordinate of the top right corner
shadingData <- data.frame(x = c(x, x, xx, xx),
y = c(y, yy, yy, y),
t = c('a', 'a', 'a','a'))
starData <- data.frame(star_x = seq(seedPos, length.out = nchar(seed)),
star_y = rep(y, times = nchar(seed)))
if(isTRUE(star)) {
ly_star <- geom_asterisk(data = starData,
aes_(x = ~star_x, y = ~star_y))
return(ly_star)
}
mapping <- aes_(x= ~x, y= ~y, group= ~t, fill = ~I('#bebebe'))
ly_seed <- geom_polygon(data = shadingData,
mapping = mapping,
alpha = shading_alpha)
return(ly_seed)
}
get_asteriskScale <- function(tidyData) {
m <- max(tidyData$position)
seq_name <- factor(tidyData$name, levels = unique(tidyData$name))
n <- max(as.numeric(seq_name))
char_scale <- diff(range(star$x))/diff(range(star$y))
char_scale_2 <- char_scale * 3/2 * n/m
return(options("char_scale_2" = char_scale_2))
}
================================================
FILE: R/ggmaf.R
================================================
##' plot MAF
##'
##' @title ggmaf
##' @param data a tidy MAF data frame. You can get it by tidy_maf_df()
##' @param ref character, the name of reference genome.
##' eg:"hg38.chr1_KI270707v1_random"
##' @param block_start a number (> 0). The first block to plot.
##' @param block_end a number (< max block). The last block to plot.
##' @param facet_field a number. The number of blocks in each facet panel.
##' @param heights a numeric vector of length two. The plot proportion between
##' the "Genomic location" panel (top) and the "Alignment" panel (bottom).
##' Default: c(0.4, 0.6)
##' @param facet_heights a numeric vector. The facet proportions.
##' @return ggplot object
##' @export
##' @author Lang Zhou
ggmaf <- function(data,
ref,
block_start = NULL,
block_end = NULL,
facet_field = NULL,
heights = c(0.4,0.6),
facet_heights = NULL) {
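#'NULL:NULL' is an error, so fall back to the full block range
if(is.null(block_start)) block_start <- min(data$block_number)
if(is.null(block_end)) block_end <- max(data$block_number)
d <- data[data$block_number %in% c(block_start : block_end),]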
if(is.null(facet_field)) {
maf_p <- maf_plot(d = d, ref = ref)
p <- plot_list(gglist = maf_p, heights = heights)
return(p)
}else {
d <- facet_maf(mafData = d, field = facet_field)
p_ls <- lapply(unique(d$facet), function(i) {
facet_d <- d[d$facet == i,]
maf_p <- maf_plot(d = facet_d, ref = ref)
pp <- plot_list(gglist = maf_p, heights = heights)
return(pp)
})
p <- plot_list(gglist = p_ls, ncol = 1, heights = facet_heights)
return(p)
}
}
##' tidy MAF data frame
##'
##' @title tidy_maf_df
##' @param maf_df a MAF data frame. You can get it by read_maf()
##' @param ref character, the name of reference genome.
##' eg:"hg38.chr1_KI270707v1_random"
##' @return data frame
##' @export
##' @author Lang Zhou
tidy_maf_df <- function(maf_df,ref) {
##add ref position to other genome
block_num <- unique(maf_df$block)
tidy_df <- lapply(block_num, function(i) {
x <- maf_df[maf_df$block == i,]
x$ref_start <- x[x$src == ref, "start"]
x$ref_end <- x[x$src == ref, "end_gap"]
return(x)
})%>% do.call("rbind", .)
tidy_df$block_number <- factor(tidy_df$block, levels =
unique(tidy_df$block)) %>% as.numeric
tidy_df$bs <- paste0(tidy_df$src,"-",tidy_df$block)
tidy_df$merge_y <- factor(tidy_df$src) %>% as.numeric
tidy_df$label <- paste0("B",tidy_df$block_number)
tidy_df <- order_aln(tidy_df,ref)
return(tidy_df)
}
#put the ref sequence the first in each block, new col "y"
order_aln <- function(tidy_df, ref) {
block_num <- unique(tidy_df$block)
lev <- sapply(block_num, function(i) {
x <- tidy_df[tidy_df$block == i,]
order <- c(ref, x$src[!x$src %in% ref])
lev <- paste0(order, "-",x$block)
return(lev)
})%>% unlist %>% rev
tidy_df$y <- factor(tidy_df$bs,levels = lev) %>% as.numeric
return(tidy_df)
}
##' @importFrom utils getFromNamespace
##' @importFrom ggplot2 aes_
##' @importFrom ggplot2 geom_text
maf_plot <- function(d, ref,
positive_color = "#a9c9d4",
negative_color = "#ffa389") {
geom_rrect <- getFromNamespace("geom_rrect","statebins")
##plot the lower panel (alignment)
p_maf_aln <- ggplot(data = d) +
geom_rrect(mapping=aes_(xmin =~ ref_start,
xmax =~ ref_end,
ymin =~ y - 0.3,
ymax =~ y + 0.3,
fill =~ strand)) +
geom_rrect(data = d,
mapping=aes_(xmin =~ ref_start,
xmax =~ ref_end,
ymin =~ max(y) + 1 - 0.3,
ymax =~ max(y) + 1 + 0.3),
fill = "#a9c9d4",color = "black") +
scale_y_continuous(breaks = c(d$y,max(d$y + 1)),labels = c(d$bs, ref)) +
scale_fill_manual(breaks = c("+","-"),
values = c(positive_color,negative_color)) +
theme_void() +
theme(axis.text.x = element_text(),
axis.text.y = element_text(),
panel.grid.minor.y = element_blank(),
panel.grid.major.y = element_line(color = "grey"))
## upper panel: block positions in each non-reference genome
aim <- d[d$src != ref, ]
p_maf_genomePos <- ggplot(data = aim) +
geom_rrect(mapping = aes_(xmin =~ start,
xmax =~ end_gap,
ymin =~ merge_y - 0.3,
ymax =~ merge_y + 0.3,
fill =~ strand),
color = "black",
size = 0.5,
alpha = 0.8,
show.legend = FALSE) +
scale_y_continuous(breaks = unique(aim$merge_y),
labels = unique(aim$src)) +
scale_fill_manual(breaks = c("+","-"),
values = c(positive_color,negative_color)) +
theme_void() + theme(panel.grid.major.y = element_line(color = "grey"),
axis.text.x = element_text(),
axis.text.y = element_text(),
strip.text = element_blank()) +
geom_text(aes_(x =~ (start + end_gap)/2,
y =~ merge_y,label =~ label),
size = 3) +
facet_wrap(~src, scales = "free", ncol = 1)
return(list(p_maf_genomePos, p_maf_aln))
}
#assign facet number to blocks
facet_maf <- function(mafData, field) {
if(min(mafData$block_number) > 1){
pos_reset <- mafData$block_number - min(mafData$block_number) + 1
#pos_reset[pos_reset == 0] <- 1
}else {
pos_reset <- mafData$block_number
}
mafData$facet <- pos_reset %/% field
mafData[(pos_reset %% field) == 0, "facet"] <-
mafData[(pos_reset %% field) == 0, "facet"] - 1
return(mafData)
}
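## Worked example (not run): a minimal sketch of the grouping arithmetic above,
## using a hypothetical block range 4:10 and field = 3. Block numbers are first
## shifted to start at 1, then every `field` consecutive blocks share a facet.
if (FALSE) {
block_number <- 4:10 # hypothetical block numbers
field <- 3
pos_reset <- block_number - min(block_number) + 1 # 1 2 3 4 5 6 7
facet <- pos_reset %/% field # 0 0 1 1 1 2 2
full <- (pos_reset %% field) == 0 # last block of each full group
facet[full] <- facet[full] - 1
facet # 0 0 0 1 1 1 2
}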
================================================
FILE: R/ggmsa.R
================================================
##' Plot multiple sequence alignment using ggplot2 with multiple color schemes
##' supported.
##'
##'
##' @title ggmsa
##' @param msa Multiple aligned sequence files or objects representing either
##' nucleotide sequences or AA sequences.
##' @param start a numeric vector. Start position to plot.
##' @param end a numeric vector. End position to plot.
##' @param font font family. Possible values are 'helvetical', 'mono',
##' 'DroidSansMono' and 'TimesNewRoman'. Defaults to 'helvetical'.
##' If font = NULL, only the background tiles are plotted.
##' @param color a color scheme. One of 'Clustal', 'Chemistry_AA',
##' 'Shapely_AA', 'Zappo_AA', 'Taylor_AA', 'LETTER', 'CN6', 'Chemistry_NT',
##' 'Shapely_NT', 'Zappo_NT', 'Taylor_NT'. Defaults to 'Chemistry_AA'.
##' @param custom_color a data frame with two columns named "names" and
##' "color", used to customize the color scheme.
##' @param char_width a numeric value specifying the character width in
##' the range of 0 to 1. Defaults to 0.9.
##' @param by_conservation a logical value. The most conserved regions have
##' the brightest colors.
##' @param none_bg a logical value. If TRUE, the background is not
##' displayed. Defaults to FALSE.
##' @param position_highlight a numeric vector of the positions to be
##' highlighted.
##' @param seq_name a logical value indicating whether sequence names
##' should be displayed. Defaults to NULL, in which case sequence names
##' are displayed when 'font = NULL' and hidden when a font is specified.
##' If 'seq_name = TRUE', sequence names are always displayed. If
##' 'seq_name = FALSE', sequence names are never displayed.
##' @param border a character string. The border color.
##' @param consensus_views a logical value. If TRUE, consensus views are
##' enabled.
##' @param use_dot a logical value. Displays characters as dots instead
##' of fading their color in the consensus view.
##' @param disagreement a logical value. Displays characters that
##' disagree with the consensus (excluding ambiguous disagreements).
##' @param ignore_gaps a logical value. If TRUE, gaps in a column
##' are treated as if that row did not exist.
##' @param ref a character string specifying the reference sequence, which
##' should be one of the input sequences when 'consensus_views' is TRUE.
##' @param show.legend logical. Should this layer be included in the legends?
##' @return ggplot object
##' @importFrom tidyr gather
##' @importFrom ggplot2 ggplot
##' @importFrom ggplot2 aes_
##' @importFrom ggplot2 theme
##' @importFrom ggplot2 theme_minimal
##' @importFrom ggplot2 geom_tile
##' @importFrom ggplot2 geom_polygon
##' @importFrom ggplot2 xlab
##' @importFrom ggplot2 ylab
##' @importFrom ggplot2 coord_fixed
##' @importFrom ggplot2 geom_point
##' @importFrom ggplot2 element_blank
##' @importFrom magrittr %>%
##' @importFrom stats setNames
##' @importFrom grid unit
##' @examples
##' #plot multiple sequences by loading fasta format
##' fasta <- system.file("extdata", "sample.fasta", package = "ggmsa")
##' ggmsa(fasta, 164, 213, color="Chemistry_AA")
##'
##'\dontrun{
##' #XMultipleAlignment objects can be used as input in the 'ggmsa'
##' AAMultipleAlignment <- Biostrings::readAAMultipleAlignment(fasta)
##' ggmsa(AAMultipleAlignment, 164, 213, color="Chemistry_AA")
##'
##' #XStringSet objects can be used as input in the 'ggmsa'
##' AAStringSet <- Biostrings::readAAStringSet(fasta)
##' ggmsa(AAStringSet, 164, 213, color="Chemistry_AA")
##'
##' #Xbin objects from 'seqmagick' can be used as input in the 'ggmsa'
##' AAbin <- seqmagick::fa_read(fasta)
##' ggmsa(AAbin, 164, 213, color="Chemistry_AA")
##' }
##' @export
##' @author Guangchuang Yu
ggmsa <- function(msa,
start = NULL,
end = NULL,
font = "helvetical",
color = "Chemistry_AA",
custom_color = NULL,
char_width = 0.9,
none_bg = FALSE,
by_conservation = FALSE,
position_highlight = NULL,
seq_name = NULL,
border = NULL,
consensus_views = FALSE,
use_dot = FALSE,
disagreement = TRUE,
ignore_gaps = FALSE,
ref = NULL,
show.legend = FALSE) {
data <- tidy_msa(msa, start = start, end = end)
ggplot() + geom_msa(data, font = font,
color = color,
custom_color = custom_color,
char_width = char_width,
none_bg = none_bg,
by_conservation = by_conservation,
position_highlight = position_highlight,
seq_name = seq_name,
border = border,
consensus_views = consensus_views,
use_dot = use_dot,
disagreement = disagreement,
ignore_gaps = ignore_gaps,
ref = ref,
show.legend = show.legend) +
theme_msa()
}
================================================
FILE: R/import-functions.R
================================================
##' @importFrom utils globalVariables
globalVariables(".")
globalVariables("fre") #geom_GC.R:
globalVariables("read.delim") #arc.R
globalVariables(c("name", "position_adj", "y_adj")) #SeqBundles.R
================================================
FILE: R/method-plot.R
================================================
##' plot method for SeqDiff object
##'
##' @name plot
##' @rdname plot-methods
##' @exportMethod plot
##' @aliases plot,SeqDiff,ANY-method
##' @docType methods
##' @param x SeqDiff object
##' @param width bin width
##' @param title plot title
##' @param xlab xlab
##' @param by one of 'bar' and 'area'
##' @param fill fill color of upper part of the plot
##' @param colors color of lower part of the plot
##' @param xlim limits of x-axis
##' @return plot
##' @importFrom ggplot2 ggtitle
##' @importFrom ggplot2 xlim
##' @importFrom ggplot2 ggplot_gtable
##' @importFrom ggplot2 ggplot_build
##' @importFrom grid unit.pmax
##' @importFrom aplot plot_list
##' @author guangchuang yu
##' @examples
##' fas <- list.files(system.file("extdata", "GVariation", package="ggmsa"),
##' pattern="fas", full.names=TRUE)
##' x1 <- seqdiff(fas[1], reference=1)
##' plot(x1)
setMethod("plot", signature(x="SeqDiff"),
function(x, width=50, title="auto",
xlab = "Nucleotide Position",
by="bar", fill="firebrick",
colors=c(A="#ff6d6d", C="#769dcc", G="#f2be3c", T="#74ce98"),
xlim = NULL) {
nn <- names(x@sequence)
if (is.null(title) || is.na(title)) {
title <- ""
} else if (title == "auto") {
title <- paste(nn[-x@reference],
"nucleotide differences relative to",
nn[x@reference])
}
p1 <- plot_difference_count(x@diff, width, by=by, fill=fill) +
ggtitle(title)
p2 <- plot_difference(x@diff, colors=colors, xlab)
if (!is.null(xlim)) {
p1 <- p1 + xlim(xlim)
p2 <- p2 + xlim(xlim)
}
plot_list(p1, p2, ncol=1, heights=c(.7, .4))
}
)
##' @importFrom ggplot2 ggplot
##' @importFrom ggplot2 aes_
##' @importFrom ggplot2 geom_segment
##' @importFrom ggplot2 xlab
##' @importFrom ggplot2 ylab
##' @importFrom ggplot2 scale_y_continuous
##' @importFrom ggplot2 theme_minimal
##' @importFrom ggplot2 theme
##' @importFrom ggplot2 element_blank
##' @importFrom ggplot2 scale_color_manual
plot_difference <- function(x, colors, xlab="Nucleotide Position") {
x$difference <- x$difference %>% toupper
yy = 4:1
names(yy) = c("A", "C", "G", "T")
x$y <- yy[x$difference]
n <- sum(is.na(x$y))
if (n > 0) {
message(n, " sites contain deletions or ambiguous bases, ",
"which will be ignored in current implementation...")
}
x <- x[!is.na(x$y),]
p <- ggplot(x, aes_(x=~position, y=~y, color=~difference))
p + geom_segment(aes_(x=~position, xend=~position, y=~y, yend=~y+.8)) +
xlab(xlab) + ylab(NULL) +
scale_y_continuous(breaks=yy, labels=names(yy)) +
theme_minimal() +
theme(legend.position="none")+
theme(axis.text.x=element_blank(), axis.ticks.x = element_blank()) +
scale_color_manual(values=colors)
}
##' @importFrom ggplot2 geom_col
##' @importFrom ggplot2 geom_area
##' @importFrom ggplot2 theme_bw
plot_difference_count <- function(x, width, by = 'bar', fill='red') {
by <- match.arg(by, c("bar", "area"))
if (by == 'bar') {
geom <- geom_col(fill=fill, width=width)
keep0 <- FALSE
} else if (by == "area") {
geom <- geom_area(fill=fill)
keep0 <- TRUE
}
d <- nucleotide_difference_count(x, width, keep0)
p <- ggplot(d, aes_(x=~position, y=~count))
p + geom + xlab(NULL) + ylab("Difference") + theme_bw()
}
================================================
FILE: R/method-show.R
================================================
##' show method
##'
##'
##' @name show
##' @docType methods
##' @rdname show-methods
##' @title show method
##' @param object SeqDiff object
##' @return message
##' @importFrom methods show
##' @exportMethod show
##' @aliases SeqDiff-class
##' show,SeqDiff-method
##' @usage show(object)
##' @examples
##' fas <- list.files(system.file("extdata", "GVariation", package="ggmsa"),
##' pattern="fas", full.names=TRUE)
##' x1 <- seqdiff(fas[1], reference=1)
##' x1
setMethod("show",signature(object="SeqDiff"),
function(object) {
message("sequence differences of ",
paste0(names(object@sequence), collapse=" and "),
'\n')
d <- object@diff$difference %>% table %>% as.data.frame
message(sum(d$Freq), " ", "sites differ:\n")
freq <- d[,2]
names(freq) <- d[,1]
print(freq)
})
================================================
FILE: R/methods-diff.R
================================================
##' @method diff SeqDiff
##' @export
diff.SeqDiff <- function(x, ...) {
x@diff
}
================================================
FILE: R/methods-ggplot_add.R
================================================
##' @method ggplot_add seqlogo
##' @export
ggplot_add.seqlogo <- function(object, plot, object_name) {
msaData <- plot$layers[[1]]$data
logo_tidyData <- msa2tidy(msaData)
logo_font <- object$font
logo_color <- object[["color"]]
adaptive <- object$adaptive
top <- object$top
logo_custom_color <- object[["custom_color"]]
show.legend <- object$show.legend
ly_logo <- geom_logo(data = logo_tidyData,
font = logo_font,
color = logo_color,
adaptive = adaptive,
top = top,
custom_color = logo_custom_color,
show.legend = show.legend)
ggplot_add(ly_logo, plot, object_name)
}
##' @method ggplot_add seed
##' @export
ggplot_add.seed <- function(object, plot, object_name) {
msaData <- plot$layers[[1]]$data
seed_tidyData <- msa2tidy(msaData)
seed <- object$seed
star <- object$star
ly <- geom_seed1(seed_tidyData, seed, star)
ggplot_add(ly, plot, object_name)
}
##' @method ggplot_add GCcontent
##' @export
ggplot_add.GCcontent <- function(object, plot, object_name) {
msaData <- plot$layers[[1]]$data
show.legend <- object$show.legend
GC_tidyData <- msa2tidy(msaData)
ly <- geom_GC1(GC_tidyData, show.legend = show.legend )
ggplot_add(ly, plot, object_name)
}
##' @importFrom ggplot2 facet_wrap
##' @importFrom ggplot2 ggplot_add
##' @importFrom ggplot2 scale_x_continuous
##' @importFrom ggplot2 coord_cartesian
##' @importFrom ggplot2 geom_blank
##' @method ggplot_add facet_msa
##' @export
ggplot_add.facet_msa <- function(object, plot, object_name){
msaData <- plot$layers[[1]]$data
field <- object$field
facetData <- facet_data(msaData, field)
##update data
plot$layers[[1]]$data <- facetData #ly_bg
if (length(plot$layers) > 1){
plot$layers[[2]]$data <- facetData #ly_label
}
region <- diff(range(facetData$position))
xl_scale <- facet_scale(facetData, field)
if (region %% field == 0) {
plot + facet_wrap(.~facet, ncol = 1, scales = "free_x") +
scale_x_continuous(expand = c(0,0),
breaks = xl_scale,
labels = xl_scale) +
coord_cartesian()
}else {
max_pos <- facetData$position %>% max
min_pos <- facetData$position %>% min
max_facet <- facetData$facet %>% max
minpos_maxfacet <- facetData[facetData$facet ==
max_facet,"position"] %>% min
expand_pos <- (region %/% field + 1) * field + min_pos
dummy <- data.frame(x = c(minpos_maxfacet, expand_pos),
facet = max_facet)
plot +
facet_wrap(.~facet, ncol = 1, scales = "free_x") +
geom_blank(aes_(x = ~x), dummy, inherit.aes = FALSE) +
scale_x_continuous(expand = c(0,0),
breaks = xl_scale,
labels = xl_scale) +
coord_cartesian()
}
}
##' @method ggplot_add msaBar
##' @importFrom aplot insert_top
##' @importFrom ggplot2 coord_cartesian
##' @export
ggplot_add.msaBar <- function(object, plot, object_name){
msaData <- plot$layers[[1]]$data
bar_tidyData <- msa2tidy(msaData)
## build the bar layer once and reuse it
ly <- ly_bar(bar_tidyData)
p_bar <- ggplot() + ly + bar_theme(bar_tidyData)
plot <- plot + coord_cartesian()
p_bar %>% insert_top(plot, height = 3)
}
##' @method ggplot_add nucleotideeHelix
##' @export
ggplot_add.nucleotideeHelix <- function(object, plot, object_name){
msa_data <- plot$layers[[1]]$data
tidy_data <- msa2tidy(msa_data)
seq_numbers <- levels(tidy_data$name) %>% length
helix_data <- object$helix_data
color_by <- object$color_by
overlap <- object$overlap
if(is.data.frame(helix_data)) {
helix_tidy <- tidy_helix(helix_data, color_by = color_by)
}else {
helix_tidy <- tidy_list_helix(helix_data, color_by = color_by)
}
ly <- layer_helix(helix_data = helix_tidy,
overlap = overlap,
seq_numbers = seq_numbers)
ggplot_add(ly, plot, object_name)
}
================================================
FILE: R/msa_data.R
================================================
##' This function takes a tidy alignment data frame and assigns a color to
##' each molecule (amino acid or nucleotide) according to the selected
##' color scheme.
##'
##'
##' @title msa_data
##' @param tidymsa sequence alignment with data frame, generated by tidy_msa().
##' @param font font family. Possible values are 'helvetical', 'mono',
##' 'DroidSansMono' and 'TimesNewRoman'. Defaults to 'helvetical'.
##' If font = NULL, only the background box is drawn.
##' @param color a color scheme. One of 'Clustal', 'Chemistry_AA',
##' 'Shapely_AA', 'Zappo_AA', 'Taylor_AA', 'LETTER', 'CN6', 'Chemistry_NT',
##' 'Shapely_NT', 'Zappo_NT', 'Taylor_NT'. Defaults to 'Chemistry_AA'.
##' @param custom_color a data frame with two columns named "names" and
##' "color", used to customize the color scheme.
##' @param order a vector specifying the sequence order.
##' @param char_width a numeric value specifying the character
##' width in the range of 0 to 1. Defaults to 0.9.
##' @param by_conservation a logical value. The most conserved
##' regions have the brightest colors.
##' @param consensus_views a logical value. If TRUE, consensus views are
##' enabled.
##' @param use_dot a logical value. Displays characters as dots
##' instead of fading their color in the consensus view.
##' @param disagreement a logical value. Displays characters that
##' disagree with the consensus (excluding ambiguous disagreements).
##' @param ignore_gaps a logical value. If TRUE, gaps
##' in a column are treated as if that row did not exist.
##' @param ref a character string specifying the reference sequence,
##' which should be one of the input sequences when 'consensus_views' is TRUE.
##' @return A data frame
##' @examples
##' fasta <- system.file("extdata/sample.fasta", package="ggmsa")
##' tidy <- tidy_msa(fasta, start = 20, end = 120)
##' data <- msa_data(tidy,
##' font = "helvetical",
##' color = 'Chemistry_AA' )
## @export
##' @noRd
##' @author Guangchuang Yu, Lang Zhou
msa_data <- function(tidymsa, font = "helvetical",
color = "Chemistry_AA",
custom_color = NULL,
char_width = 0.9,
by_conservation = FALSE,
consensus_views = FALSE,
use_dot = FALSE,
disagreement = TRUE,
ignore_gaps = FALSE,
ref = NULL) {
if (is.null(custom_color)) {
color <- match.arg(color, c("Clustal", "Chemistry_AA", "Shapely_AA",
"Zappo_AA", "Taylor_AA","Chemistry_NT",
"Shapely_NT", "Zappo_NT", "Taylor_NT",
"LETTER", "CN6", "Hydrophobicity" ))
}
y <- tidymsa
## add color
if (color == "Clustal"){
y <- color_Clustal(y)
}else {
if (consensus_views) {
consensus <- get_consensus(y, #extract a consensus/ref sequence
ignore_gaps = ignore_gaps,
ref = ref)
tc <- color_scheme(y, color) %>% #assigning color for other seq.
tidy_color(consensus, disagreement, ref = ref)# tidy colors
y <- color_scheme(consensus, color) %>% #assigning color for con/ref
rbind(tc) #add consensus sequence
if (use_dot){
y[is.na(y$color), "character"] <- "."
}else {
y$font_color <- "#000000"
y[is.na(y$color), "font_color"] <- "#aaacaf"
y[is.na(y$color), "color"] <- "#ffffff"
}
}else {
y <- color_scheme(y, color, custom_color)
}
}
if (by_conservation){
y <- color_visibility(y)
}
if (is.null(font)) {
return(y)
}
## calling internal polygons
font_f <- font_fam[[font]]
#debug using'as.character()'
data_sp <- font_f[as.character(unique(y$character))]
## To adapt to tree data
if (!'name' %in% names(y) && !consensus_views) {
if ('label' %in% names(y)) {
names(y)[names(y) == 'label'] <- "name"
}else {
stop("unknown sequence name...")
}
}
if(!is.factor(y$name) && !consensus_views){
lev <- unique(data.frame(y[,c("name","y")]))
# y is the order of the nodes in the tree
lev <- lev[order(lev$y), "name"]
y$name <- factor(y$name, levels = lev)
} else if(consensus_views) {
y$name <- order_name(y$name,
consensus_views = consensus_views,
ref = ref)
}
y$ypos <- as.numeric(y$name)
# for ggtreeExtra
if ("new_position" %in% colnames(y)) {
scale_n <- 5 * length(unique(y$name))/diff(range(y$new_position))
char_width <- char_width *
diff(range(y$new_position))/diff(range(y$position))
}
yy <- lapply(seq_len(nrow(y)), function(i) {
d <- y[i, ]
dd <- data_sp[[d$character]]
if(d$character == "."){ # '.' without zooming
if ("new_position" %in% colnames(d)){
dd$x <- dd$x - min(dd$x) + d$new_position - diff(range(dd$x))/2
}else{
dd$x <- dd$x - min(dd$x) + d$position - diff(range(dd$x))/2
}
dd$y <- dd$y - min(dd$y) + d$ypos - diff(range(dd$y))/2
}else {# other characters
char_scale <- diff(range(dd$x))/diff(range(dd$y))#equal proportion
#y_width = char_width, x-width scaled proportionally
if(diff(range(dd$x)) <= diff(range(dd$y))) {
dd$x <- dd$x * (char_width * char_scale)/diff(range(dd$x))
# for ggtreeExtra
if ("new_position" %in% colnames(d)){
dd$y <- (dd$y * char_width)/diff(range(dd$y)) * scale_n
dd$x <- dd$x - min(dd$x) + d$new_position -
(char_width * char_scale)/2
dd$y <- dd$y - min(dd$y) + d$ypos - scale_n * char_width/2
}else{
dd$y <- (dd$y * char_width)/diff(range(dd$y))
dd$x <- dd$x - min(dd$x) + d$position -
(char_width * char_scale)/2
dd$y <- dd$y - min(dd$y) + d$ypos - char_width/2
}
}else{#x_width = char_width, y-width scaled proportionally
dd$x <- dd$x * char_width/diff(range(dd$x))
# for ggtreeExtra
if ("new_position" %in% colnames(d)){
dd$y <- dd$y *
char_width/(diff(range(dd$y)) * char_scale) * scale_n
dd$x <- dd$x - min(dd$x) + d$new_position - char_width/2
dd$y <- dd$y - min(dd$y) + d$ypos -
(scale_n * char_width/char_scale)/2
}else{
dd$y <- dd$y * char_width/(diff(range(dd$y)) * char_scale)
dd$x <- dd$x - min(dd$x) + d$position - char_width/2
dd$y <- dd$y - min(dd$y) + d$ypos -
(char_width/char_scale)/2
}
}
}
cn <- colnames(d)
cn <- cn[!cn %in% c('x','y', 'ypos')]
for (nn in cn) {
dd[[nn]] <- d[[nn]]
}
dd$group <- paste0("V", d$position, "L", d$ypos)
return(dd)
})
ydf <- do.call(rbind, yy)
colnames(ydf)[colnames(ydf) == 'y'] <- 'yy'
ydf$y <- as.numeric(ydf$name)
ydf <- cbind(label = ydf$name, ydf)
return(ydf)
}
##' Convert msa file/object to tidy data frame.
##'
##'
##' @title tidy_msa
##' @param msa multiple sequence alignment file or sequence object in
##' DNAStringSet, RNAStringSet, AAStringSet, BStringSet, DNAMultipleAlignment,
##' RNAMultipleAlignment, AAMultipleAlignment, DNAbin or AAbin
##' @param start start position to extract subset of alignment
##' @param end end position to extract subset of alignment
##' @return tibble data frame
##' @export
##' @examples
##' fasta <- system.file("extdata", "sample.fasta", package = "ggmsa")
##' aln <- tidy_msa(msa = fasta, start = 10, end = 100)
##' @author Guangchuang Yu
tidy_msa <- function(msa, start = NULL, end = NULL) {
if(inherits(msa, "character") && length(msa) > 1) {
aln <- msa
}else {
aln <- prepare_msa(msa)
}
alnmat <- lapply(seq_along(aln), function(i) {
##Preventing function collisions
base::strsplit(as.character(aln[[i]]), '')[[1]]
}) %>% do.call('rbind', .)
## for DNAbin and AAbin
alndf <- as.data.frame(alnmat, stringsAsFactors = FALSE)
if(unique(names(aln)) %>% length == length(aln)) {
alndf$name = names(aln)
}else{
stop("Sequences must have unique names")
}
cn = colnames(alndf)
cn <- cn[!cn %in% "name"]
df <- gather(alndf, "position", "character", cn)
y <- df
y$position = as.numeric(sub("V", "", y$position))
y$character = toupper(y$character)
y$name = factor(y$name, levels=rev(names(aln)))
if (is.null(start)) start <- min(y$position)
if (is.null(end)) end <- max(y$position)
y <- y[y$position >=start & y$position <= end, ]
return(y)
}
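## Usage sketch (not run): besides files and the supported sequence objects,
## the first branch of tidy_msa() also accepts a character vector of aligned
## sequences with length > 1. The sequences and names below are made up for
## illustration; names must be unique, otherwise tidy_msa() stops.
if (FALSE) {
aln <- c(seq1 = "ACGT-A", seq2 = "ACGTTA")
tidy <- tidy_msa(aln, start = 2, end = 4)
## one row per sequence and alignment column, with the columns
## name, position and character
tidy[tidy$position == 3, ]
}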
##' This function converts the msa_data to the tidy data.
##'
##' @param msaData sequence alignment data generated by msa_data().
##' @noRd
msa2tidy <- function(msaData) {
if ("order" %in% names(msaData)) {
msaData <- msaData[msaData$order == 1,]
}
df_tidy <- data.frame(name = msaData$name,
position = msaData$position,
character = msaData$character)
df_tidy$character <- as.character(df_tidy$character)
return(df_tidy)
}
================================================
FILE: R/pp_interactive.R
================================================
make_gap <- function(gap, previous_seq) {
gap_df <- previous_seq[rep(1, each=gap),]
gap_start <- max(previous_seq$position) + 1
gap_df$position <- gap_start : (gap_start + gap - 1 )
gap_df$character <- "-"
if("pos_previous" %in% names(gap_df)) {
gap_df$pos_previous <- 0
}
return(gap_df)
}
##' merge two MSA
##'
##' @title merge_seq
##' @param previous_seq previous MSA
##' @param subsequent_seq subsequent MSA
##' @param gap gap length
##' @param adjust_name a logical value. If TRUE, the names of paired
##' sequences are merged.
##' @return tidy MSA data frame
##' @export
##' @author Lang Zhou
merge_seq <- function(previous_seq, gap, subsequent_seq, adjust_name = TRUE) {
name_pre <- levels(previous_seq$name)
name_subse <- levels(subsequent_seq$name)
if(length(name_pre) != length(name_subse)) {
stop("previous_seq and subsequent_seq must contain the same number of sequences")
}
gap_df <- make_gap(gap = gap, previous_seq = previous_seq)
subsequent_seq$position <-
subsequent_seq$position - min(subsequent_seq$position) + 1
subsequent_seq$position <-
subsequent_seq$position + max(previous_seq$position) + gap
t_merge <- rbind(previous_seq,gap_df,subsequent_seq)
if (adjust_name) {
rownames(t_merge) <- seq(nrow(t_merge))
names(t_merge)[1] <- "name_previous"
t_merge$name <- ""
for(i in seq(length(name_pre))) {
t_merge[t_merge$name_previous %in% c(name_pre[i], name_subse[i]),"name"] <-
paste0(name_pre[i],"-", name_subse[i])
}
t_merge$name <- factor(t_merge$name)
}
return(t_merge)
}
##' tidy protein-protein interaction position data
##'
##' @title tidy_hdata
##' @param gap gap length
##' @param inter protein-protein interaction position data
##' @param previous_seq previous MSA
##' @param subsequent_seq subsequent MSA
##' @importFrom R4RNA as.helix
##' @return helix data
##' @export
##' @author Lang Zhou
tidy_hdata <- function(gap, inter, previous_seq,subsequent_seq) {
inter$j <- inter$Res.no..2 -
min(subsequent_seq$position) +
max(previous_seq$position) + gap + 1
hdata <- data.frame(i = inter$Res.no.1,
j = inter$j,
length = 1,
value = NA,
colour = "blue")
hdata <- as.helix(hdata)
return(hdata)
}
##' reset MSA position
##'
##' @title reset_pos
##' @param seq_df MSA data
##' @return data frame
##' @export
##' @author Lang Zhou
reset_pos <- function(seq_df) {
names(seq_df)[2] <- "pos_previous"
seq_df$position <- ""
for(i in unique(seq_df$pos_previous)%>% seq) {
uni <- unique(seq_df$pos_previous)
seq_df[seq_df$pos_previous == uni[i],"position"] <- i
}
seq_df$position <- as.numeric(seq_df$position)
return(seq_df)
}
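## Worked example (not run): a sketch assuming a tidy MSA data frame whose
## second column is `position`, since reset_pos() relies on that column order.
## The toy data are hypothetical; positions 5, 9 and 12 are renumbered 1, 2
## and 3, and the original values are kept in `pos_previous`.
if (FALSE) {
toy <- data.frame(name = rep(c("a", "b"), times = 3),
position = rep(c(5, 9, 12), each = 2),
character = "A")
reset_pos(toy)$position # 1 1 2 2 3 3
}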
##' reset hdata data position
##'
##' @title simplify_hdata
##' @param hdata data from tidy_hdata()
##' @param sim_msa MSA data frame
##' @return data frame
##' @export
##' @author Lang Zhou
simplify_hdata <- function(hdata, sim_msa) {
new_hdata <- lapply(seq(nrow(hdata)), function(a) {
n <- hdata[a,]
n$pre_i <- n$i
n$i <- sim_msa[sim_msa$pos_previous == n$i,"position"] %>% unique
return(n)
}) %>% do.call("rbind",.)
new_hdata <- lapply(seq(nrow(new_hdata)), function(a) {
n <- new_hdata[a,]
n$pre_j <- n$j
n$j <- sim_msa[sim_msa$pos_previous == n$j,"position"] %>% unique
return(n)
}) %>% do.call("rbind",.)
new_hdata <- as.helix(new_hdata)
return(new_hdata)
}
================================================
FILE: R/prepare_fasta.R
================================================
##' preparing multiple sequence alignment
##'
##' This function supports both NT and AA sequences. It supports multiple
##' input formats such as "DNAStringSet", "BStringSet", "AAStringSet",
##' "DNAbin", "AAbin" and a file path.
##' @title prepare_msa
##' @param msa a multiple sequence alignment file or object
##' @return BStringSet based object
##' @importFrom Biostrings DNAStringSet
##' @importFrom Biostrings RNAStringSet
##' @importFrom Biostrings AAStringSet
##' @importFrom methods missingArg
##' @importFrom seqmagick fa_read
## @export
##' @author Lang Zhou and Guangchuang Yu
##' @noRd
prepare_msa <- function(msa) {
if (missingArg(msa)) {
stop("no input...")
} else if (inherits(msa, "character")) {
msa <- fa_read(msa)
} else if (!class(msa) %in% supported_msa_class) {
stop("multiple sequence alignment object is not supported...")
}
res <- switch(class(msa),
DNAbin = DNAbin2DNAStringSet(msa),
AAbin = AAbin2AAStringSet(msa),
DNAMultipleAlignment = DNAStringSet(msa),
RNAMultipleAlignment = RNAStringSet(msa),
AAMultipleAlignment = AAStringSet(msa),
msa ## DNAStringSet, RNAStringSet, AAStringSet, BStringSet
)
return(res)
}
DNAbin2DNAStringSet <- function(msa) {
seqs <- vapply(seq_along(msa),
function(i) paste0(as.character(msa[i]) %>% unlist,
collapse=''),
character(1))
names(seqs) <- names(msa)
switch(class(msa),
DNAbin = DNAStringSet(seqs),
AAbin = AAStringSet(seqs))
}
AAbin2AAStringSet <- DNAbin2DNAStringSet
supported_msa_class <- c("DNAStringSet",
"RNAStringSet",
"AAStringSet",
"BStringSet",
"DNAMultipleAlignment",
"RNAMultipleAlignment",
"AAMultipleAlignment",
"DNAbin",
"AAbin")
================================================
FILE: R/read_maf.R
================================================
##' read 'multiple alignment format'(MAF) file
##'
##' @title read_maf
##' @param multiple_alignment_format a multiple alignment format(MAF) file
##' @return data frame
##' @export
##' @author Lang Zhou
read_maf <- function(multiple_alignment_format) {
line <- readLines(multiple_alignment_format)
head <- sapply(line, function(i) substring(i,1,1))
rm(line) # the raw lines are kept as names(head)
#remove header/comment lines (also safe when the file has none)
head <- head[head != "#"]
#split block
blank <- which(head == "")
block_ls <- lapply(seq_along(blank), function(i) {
if (blank[i] == min(blank)) {
x <- names(head)[1:blank[i]]
}else {
x <- names(head)[blank[i-1]:blank[i]]
}
return(x)
})
names(block_ls) <- paste0("block_",seq(length(block_ls)))
#extract lines starting with "s"
s_block <- lapply(seq(length(block_ls)), function(i) {
blocki <- block_ls[[i]]
line_s <- blocki[sapply(blocki, function(j) substring(j,1,1)) == "s"]
})
names(s_block) <- names(block_ls)
#get a MAF df
s_name <- c("type", "src", "start", "size", "strand", 'src_size', "text")
seq_df <-lapply(seq(length(s_block)), function(i) {
blocki <- s_block[[i]]
seq_df <- lapply(seq(length(blocki)), function(j) {
x <- blocki[[j]]
#extract all columns
x <- strsplit(x, " ") %>% unlist
x1 <- x[sapply(x, nchar) > 0]
#convert to data frame
seq <- t(as.matrix(x1)) %>% as.data.frame()
names(seq) <- s_name
seq[,c("start","size",'src_size')] <-
seq[,c("start","size",'src_size')] %>%as.numeric()
seq$size_gap <- nchar(seq$text)
seq$end <- seq$start + seq$size
seq$end_gap <- seq$start + seq$size_gap
seq$block <- names(s_block[i])
return(seq)
})%>% do.call("rbind", .)
return(seq_df)
}) %>% do.call("rbind", .)
}
================================================
FILE: R/seqdiff.R
================================================
##' calculate difference of two aligned sequences
##'
##'
##' @title seqdiff
##' @param fasta fasta file
##' @param reference which sequence serve as reference, 1 or 2
##' @return SeqDiff object
##' @export
##' @importFrom Biostrings readBStringSet
##' @importClassesFrom Biostrings BStringSet
##' @importFrom methods new
##' @author guangchuang yu
##' @examples
##' fas <- list.files(system.file("extdata", "GVariation", package="ggmsa"),
##' pattern="fas", full.names=TRUE)
##' seqdiff(fas[1], reference=1)
seqdiff <- function(fasta, reference=1) {
sequence <- readBStringSet(fasta)
if (length(sequence) != 2 || length(unique(width(sequence))) != 1) {
stop("fasta should contain 2 aligned sequences of equal length...")
}
diff <- nucleotide_difference(sequence, reference)
new("SeqDiff",
file = fasta,
sequence = sequence,
reference = reference,
diff = diff)
}
##' @importFrom magrittr %>%
##' @importFrom Biostrings toString
##' @importFrom Biostrings width
nucleotide_difference <- function(x, reference=1) {
n <- width(x[1])
nn <- seq_len(n)
s1 <- x[1] %>% toString %>% substring(nn, nn)
s2 <- x[2] %>% toString %>% substring(nn, nn)
pos <- which(s1 != s2)
if (reference == 1) {
diff <- s2[pos]
} else {
diff <- s1[pos]
}
return(data.frame(position = pos,
difference = diff,
stringsAsFactors = FALSE))
}
##' @importFrom dplyr group_by
##' @importFrom dplyr summarize
##' @importFrom dplyr select
##' @importFrom dplyr n
nucleotide_difference_count <- function(x, width=50, keep0=FALSE) {
n <- max(x$position)
bin <- rep(seq_len(ceiling(n/width)), each=width)
position <- c(seq_len(n)[!duplicated(bin)], n)
x$bin <- bin[x$position]
y <- x %>% group_by(bin) %>%
summarize(position=min(position), count = n()) %>%
select(-bin)
y$position <- position[findInterval(y$position, position)]
if (keep0) {
itv <- seq(1, n, width)
yy <- data.frame(position = itv[!itv %in% y$position],
count = 0)
y <- rbind(y, yy)
y <- y[order(y$position, decreasing=FALSE),]
}
return(y)
}
================================================
FILE: R/seqlogo.R
================================================
##' plot sequence logo for MSA based on 'ggplot2'
##' @title seqlogo
##' @param msa Multiple sequence alignment file or object for representing
##' either nucleotide sequences or peptide sequences.
##' @param start Start position to plot.
##' @param end End position to plot.
##' @param font font family. Possible values are 'helvetical', 'mono',
##' 'DroidSansMono' and 'TimesNewRoman'. Defaults to 'DroidSansMono'.
##' If font = NULL, only the background tiles are drawn.
##' @param color a color scheme. One of 'Clustal', 'Chemistry_AA',
##' 'Shapely_AA', 'Zappo_AA', 'Taylor_AA', 'LETTER', 'CN6', 'Chemistry_NT',
##' 'Shapely_NT', 'Zappo_NT', 'Taylor_NT'. Defaults to 'Chemistry_AA'.
##' @param custom_color a data frame with two columns named "names" and
##' "color", used to customize the color scheme.
##' @param adaptive a logical value indicating whether the overall height of
##' the seqlogo corresponds to the number of sequences. If FALSE, the
##' overall height of the seqlogo is fixed at 4.
##' @param top a logical value. If TRUE, seqlogo is aligned to the top of MSA.
##' @return ggplot object
##' @examples
##' #plot sequence motif independently
##' nt_sequence <- system.file("extdata", "LeaderRepeat_All.fa",
##' package = "ggmsa")
##' seqlogo(nt_sequence, color = "Chemistry_NT")
##' @export
##' @author Lang Zhou
seqlogo <- function(msa,
start = NULL,
end = NULL,
font = "DroidSansMono",
color = "Chemistry_AA",
adaptive = FALSE,
top = FALSE,
custom_color = NULL) {
data <- tidy_msa(msa, start = start, end = end)
ggplot() + geom_logo(data,
font = font,
color = color,
adaptive = adaptive,
top = top,
custom_color = custom_color) +
theme_minimal() + xlab(NULL) + ylab(NULL) +
theme(legend.position = 'none') +
theme(panel.grid = element_blank(), axis.text.y = element_blank()) +
coord_fixed()
}
##' Multiple sequence alignment layer for ggplot2. It plots sequence motifs.
##' @title geom_seqlogo
##' @param font font families, possible values are 'helvetical', 'mono',
##' and 'DroidSansMono', 'TimesNewRoman'. Default is 'DroidSansMono'.
##' @param color A color scheme. One of 'Clustal', 'Chemistry_AA',
##' 'Shapely_AA', 'Zappo_AA', 'Taylor_AA', 'LETTER', 'CN6', 'Chemistry_NT',
##' 'Shapely_NT', 'Zappo_NT', 'Taylor_NT'. Default is 'Chemistry_AA'.
##' @param custom_color A data frame with two columns called "names" and
##' "color". Customizes the color scheme.
##' @param adaptive A logical value indicating whether the overall height
##' of seqlogo corresponds to the number of sequences. If FALSE, the
##' overall height of seqlogo is fixed at 4.
##' @param top A logical value. If TRUE, seqlogo is aligned to the top of MSA.
##' @param show.legend logical. Should this layer be included in the legends?
##' @param ... additional parameter
##' @return A list
##' @examples
##' #plot multiple sequence alignment and sequence motifs
##' f <- system.file("extdata/LeaderRepeat_All.fa", package="ggmsa")
##' ggmsa(f,font = NULL,color = "Chemistry_NT") + geom_seqlogo()
##' @export
##' @author Lang Zhou
geom_seqlogo <- function(font = "DroidSansMono", color = "Chemistry_AA",
adaptive = TRUE, top = TRUE, custom_color = NULL,
show.legend = FALSE, ...) {
structure(list(font = font,
color = color,
adaptive = adaptive,
top = top,
custom_color = custom_color,
show.legend = show.legend),
class = "seqlogo")
}
geom_logo <- function(data, font = "DroidSansMono", color = "Chemistry_AA",
adaptive = FALSE, top = TRUE, custom_color = NULL,
show.legend = FALSE, ...) {
mapping <- aes_(x = ~logo_x,
y = ~logo_y,
group = ~group,
fill = ~I(color))
logo_data <- seqlogo_data(data, font = font, color = color,
adaptive = adaptive, top = top,
custom_color = custom_color)
ly_logo <- geom_polygon(mapping = mapping, data = logo_data,
inherit.aes = FALSE, show.legend = show.legend)
return(ly_logo)
}
seqlogo_data <- function(data, font = "DroidSansMono",
color = "Chemistry_AA", adaptive = FALSE,
top = TRUE, custom_color = NULL){
tidy <- data
if (color == "Clustal") {
tidy <- color_Clustal(tidy)
} else{
tidy <- color_scheme(tidy, color, custom_color)
}
if (adaptive) {
seq_number <- as.character(unique(tidy[[1]]))
total_heigh <- length(seq_number) / 6
} else {
total_heigh <- 4
}
#total_heigh <- getOption("total_heigh")
logo_width <- getOption("logo_width")
## collect the alignment positions (columns) to draw
col_num <- as.numeric(levels(factor(tidy$position)))
moti_da <- lapply(col_num, function(j){
## Calculate the char frequency in each column
clo <- tidy[tidy$position == j, ]
fre <- prop.table(table(clo$character))
## total_heigh is the overall height; each char gets a share of it
## proportional to its frequency in the column.
ywidth <- sort(total_heigh * fre )
## calling color scheme
column_char_color <- data.frame(unique(clo[c("character", "color")]))
font_f <- font_fam[[font]]
motif_char <- font_f[names(ywidth)]
ds_ <- lapply(seq_along(motif_char), function(i){
ds_ <- motif_char[[i]]
names(ds_)[names(ds_) == "x"] <- "logo_x"
names(ds_)[names(ds_) == "y"] <- "logo_y"
ds_$char <- names(motif_char[i])
#width = .9
ds_$logo_x <- ds_$logo_x * logo_width/diff(range(ds_$logo_x))
#height = overall height * frequency
ds_$logo_y <- ds_$logo_y * ywidth[[i]]/diff(range(ds_$logo_y))
ymotif <- sum(ywidth[0:(i - 1)]) # summed height of the chars stacked below
# moving char horizontally
ds_$logo_x <- ds_$logo_x - min(ds_$logo_x) - logo_width/2 + j
ds_$logo_y <- ds_$logo_y - min(ds_$logo_y) - ywidth[[i]]/2 +
ymotif + ywidth[[i]]/2
if (top) {
ds_$logo_y <- ds_$logo_y + nrow(tidy[tidy$position == j, ]) + .5
}
## ds_$y - min(ds_$y) - ywidth[[i]]/2: Centered at zero
## + ymotif: summed height of the chars stacked below the current char
## + ywidth[[i]]/2: half the height of the current char
ds_$group <- paste0("P", j, '-', "Char", names(motif_char[i]))
ds_$color <- column_char_color[column_char_color$character ==
unique(ds_$char), "color"]
return(ds_)
})
ds <- do.call(rbind, ds_)
return(ds)
})
moti_da <- do.call(rbind, moti_da)
moti_da$name <- as.character(tidy[1,1])
other_cn <- names(moti_da)[!names(moti_da) == 'name']
moti_da <- moti_da[c("name", other_cn)]
add_col <- tidy[,!names(tidy) %in% names(moti_da)]
moti_da <- cbind(add_col[1,], moti_da, row.names = NULL)
return(moti_da)
}
================================================
FILE: R/simplot.R
================================================
##' Sequence similarity plot
##'
##'
##' @title simplot
##' @param file alignment fasta file
##' @param query query sequence
##' @param window sliding window size (bp)
##' @param step step size to slide the window (bp)
##' @param group whether to group sequences (e.g. for "A-seq1, A-seq2, B-seq1
##' and B-seq2", use sep = "-" and id = 1 to divide the sequences into groups
##' A and B)
##' @param id position to extract id for grouping; only works if group = TRUE
##' @param sep separator to split sequence name; only works if group = TRUE
##' @param sd whether to display the standard deviation of
##' similarity within each group; only works if group = TRUE
##' @param smooth FALSE (default) or TRUE; whether to display a smoothed spline.
##' @param smooth_params a list of parameters passed to geom_smooth
##' (default: smooth_params = list(method = "loess", se = FALSE))
##' @return ggplot object
##' @importFrom Biostrings readDNAStringSet
##' @importFrom ggplot2 aes_
##' @importFrom ggplot2 geom_line
##' @importFrom ggplot2 ggtitle
##' @importFrom ggplot2 geom_ribbon
##' @importFrom ggplot2 geom_smooth
##' @importFrom magrittr %<>%
##' @importFrom dplyr group_by_
##' @importFrom dplyr summarize_
##' @export
##' @author guangchuang yu
##' @examples
##' fas <- system.file("extdata/GVariation/sample_alignment.fa",
##' package="ggmsa")
##' simplot(fas, 'CF_YL21')
simplot <- function(file,
query,
window=200,
step=20,
group=FALSE,
id,
sep,
sd=FALSE,
smooth = FALSE,
smooth_params = list(method = "loess",
se = FALSE)) {
aln <- readDNAStringSet(file)
nn <- names(aln)
if (group) {
g <- vapply(strsplit(nn, sep), function(x) x[id], character(1))
}
idx <- which(nn != query)
w <- width(aln[query])
start <- seq(1, w, by=step)
end <- start + window - 1
start <- start[end <= w]
end <- end[end <= w]
res <- lapply(idx, function(i) {
x <- toCharacter(aln[i]) == toCharacter(aln[query])
pos <- round((start+end)/2)
sim <- vapply(seq_along(start), function(j) {
mean(x[start[j]:end[j]])
}, numeric(1))
y <- data.frame(sequence=nn[i], position = pos, similarity = sim)
if(group) {
y$group <- g[i]
}
return(y)
}) %>% do.call(rbind, .)
if (group) {
res %<>% group_by_(~position, ~group) %>%
summarize_(msim=~mean(similarity), sd=~sd(similarity))
}
if (group) {
p <- ggplot(res, aes_(x=~position, y=~msim, group=~group))
if (sd) p <- p + geom_ribbon(aes_(ymin=~msim-sd,
ymax=~msim+sd,
fill=~group), alpha=.25)
if (smooth) {
smooth_layer <- do.call(geom_smooth,
smooth_params)
p <- p + smooth_layer
} else {
p <- p + geom_line(aes_(color=~group))
}
} else {
mapping = aes_(x=~position,
y=~similarity,
group=~sequence,
color=~sequence)
p <- ggplot(res, mapping = mapping)
if (smooth) {
smooth_layer <- do.call(geom_smooth,
smooth_params)
p <- p + smooth_layer
} else {
p <- p + geom_line()
}
}
p + xlab("Nucleotide Position") + ylab("Similarity (%)") +
ggtitle(paste("Sequence similarities compared to", query)) +
theme_minimal() +
theme(legend.title=element_blank())
}
toCharacter <- function(x) {
unlist(strsplit(toString(x),""))
}
================================================
FILE: R/theme_msa.R
================================================
##' Theme for ggmsa.
##'
##' @title theme_msa
##' @importFrom ggplot2 theme_minimal
##' @importFrom ggplot2 labs
##' @export
##' @author Lang Zhou
theme_msa <- function(){
list(
xlab(NULL),
ylab(NULL),
labs(fill = "Fills"),
coord_fixed(),
scale_x_continuous(expand = c(0,0)),
theme_minimal() +
theme(
strip.text = element_blank(),
panel.spacing.y = unit(.4, "in"),
panel.grid = element_blank())
)
}
##' @importFrom grDevices colorRampPalette
##' @importFrom RColorBrewer brewer.pal
##' @importFrom ggplot2 coord_cartesian
##' @importFrom ggplot2 scale_x_continuous
##' @importFrom ggplot2 scale_y_continuous
##' @importFrom ggplot2 scale_fill_gradientn
bar_theme <- function(tidy){
data <- bar_data(tidy)
color_palettes <- colorRampPalette(brewer.pal(n = 9,
name = "Blues")[c(4:7)])
list(
xlab(NULL),
ylab("consensus"),
scale_x_continuous(breaks = data[[3]],
labels = data[[1]],
expand = c(0,0)),
scale_y_continuous(breaks = NULL),
scale_fill_gradientn(colours = color_palettes(100)),
theme_minimal() +
theme(panel.grid.minor.x = element_blank(),
panel.grid.major.x = element_blank())
)
}
facet_scale <- function(facetData, field) {
facet0_pos <- facetData[facetData$facet == 0,"position"]
msa_start <- min(facet0_pos)
## x labels of facet 0
facet0_xl_scale <- pretty(min(facet0_pos):max(facet0_pos))
## assign the start position to the first label
facet0_xl_scale[1] <- msa_start
xl_scale <- facet0_xl_scale
for(i in max(facetData$facet) %>% seq_len) {
scale_i <- facet0_xl_scale + field * i
if(msa_start > 1) scale_i[1] <- scale_i[1] + 1
#print(scale_i)
xl_scale <- xl_scale %>% c(scale_i)
}
max_pos <- facetData$position %>% max
xl_scale <- xl_scale[xl_scale <= max_pos]
return(xl_scale)
}
================================================
FILE: R/zzz.R
================================================
#' @importFrom utils packageDescription
.onAttach <- function(libname, pkgname){
#options(total_heigh = 4)
options(logo_width = 0.9)
options(asterisk_width = .03)
options(GC_pos = 2)
options(shadingLen = .5)
options(shading_alpha = .3)
pkgVersion <- packageDescription(pkgname, fields="Version")
msg <- paste0(pkgname, " v", pkgVersion, " ",
"Document: http://yulab-smu.top/ggmsa/", "\n\n")
citation <- paste0("If you use ", pkgname,
" in published research, please cite:\n",
"L Zhou, T Feng, S Xu, F Gao, TT Lam, Q Wang, T Wu, ",
"H Huang, L Zhan, L Li, Y Guan, Z Dai*, G Yu* ",
"ggmsa: a visual exploration tool for multiple sequence alignment and associated data. ",
"Briefings in Bioinformatics. DOI:10.1093/bib/bbac222")
packageStartupMessage(paste0(msg, citation))
}
================================================
FILE: README.Rmd
================================================
---
output:
md_document:
variant: gfm
html_preview: TRUE
---
```{r, include = FALSE}
knitr::opts_chunk$set(
fig.path = "man/figures/REAMED-",
message = FALSE,
warning = FALSE
)
```
# ggmsa: a visual exploration tool for multiple sequence alignment and associated data
```{r echo=FALSE, results="hide", message=FALSE}
library(badger)
```
```{r, echo = FALSE, results='asis'}
cat(
badge_devel("YuLab-SMU/ggmsa", "blue"),
badge_lifecycle("experimental", "orange"),
badge_license("Artistic-2.0")
)
```
`ggmsa` is designed for visualization and annotation of multiple sequence alignments. It implements functions that make visualizing publication-quality multiple sequence alignments (protein/DNA/RNA) in R simple yet powerful.
For details, please visit
## :hammer: Installation
The released version from `Bioconductor`
```{r eval=FALSE}
if (!requireNamespace("BiocManager", quietly=TRUE))
install.packages("BiocManager")
## BiocManager::install("BiocUpgrade") ## you may need this
BiocManager::install("ggmsa")
```
Alternatively, you can grab the development version from GitHub using devtools:
```{r eval=FALSE}
if (!requireNamespace("devtools", quietly=TRUE))
install.packages("devtools")
devtools::install_github("YuLab-SMU/ggmsa")
```
## :bulb: Quick Example
```{r fig.height = 2.5, fig.width = 11, message=FALSE, warning=FALSE, dpi=300}
library(ggmsa)
protein_sequences <- system.file("extdata", "sample.fasta", package = "ggmsa")
ggmsa(protein_sequences, start = 221, end = 280, char_width = 0.5, seq_name = TRUE) + geom_seqlogo() + geom_msaBar()
```
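Beyond the combined view above, a sequence logo can also be drawn on its own. The following is a minimal sketch adapted from the documented `seqlogo()` example; it assumes the bundled `LeaderRepeat_All.fa` nucleotide alignment is installed with the package, and the variable names are illustrative only.

```r
library(ggmsa)

# Bundled nucleotide alignment shipped in inst/extdata (assumed to be installed)
nt_sequence <- system.file("extdata", "LeaderRepeat_All.fa", package = "ggmsa")

# Draw a stand-alone sequence logo using a nucleotide color scheme
p <- seqlogo(nt_sequence, color = "Chemistry_NT")
p
```

Since `seqlogo()` returns a ggplot object, further ggplot2 layers and themes can be added to `p` in the usual way.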
## :books: Learn more
Check out the guides for learning everything there is to know about all the different features:
- [Getting Started](https://yulab-smu.github.io/ggmsa/articles/ggmsa.html)
- [Annotations](https://yulab-smu.github.io/ggmsa/articles/guides/Annotations.html)
- [Color Schemes and Font Families](https://yulab-smu.github.io/ggmsa/articles/guides/Color_schemes_And_Font_Families.html)
- [Theme](https://yulab-smu.github.io/ggmsa/articles/guides/MSA_theme.html)
- [Other Modules](https://yulab-smu.github.io/ggmsa/articles/guides/Other_Modules.html)
- [View Modes](https://yulab-smu.github.io/ggmsa/articles/guides/View_modes.html)
## :runner: Author
- [Guangchuang Yu](https://guangchuangyu.github.io) Professor, PI
- [Lang Zhou](https://github.com/nyzhoulang) Master's Student
- [Shuangbin Xu](https://github.com/xiangpin) PhD Student
**YuLab**
**Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University**
## :sparkling_heart: Contributing
We welcome any contributions! By participating in this project you agree to abide
by the terms outlined in the [Contributor Code of Conduct](https://github.com/YuLab-SMU/ggmsa/blob/master/CONDUCT.md).
================================================
FILE: README.md
================================================
# ggmsa: a visual exploration tool for multiple sequence alignment and associated data
[devel version](https://github.com/YuLab-SMU/ggmsa)
[lifecycle: experimental](https://lifecycle.r-lib.org/articles/stages.html#experimental)
[license: Artistic-2.0](https://cran.r-project.org/web/licenses/Artistic-2.0)
`ggmsa` is designed for visualization and annotation of multiple
sequence alignments. It implements functions that make visualizing
publication-quality multiple sequence alignments (protein/DNA/RNA) in R
simple yet powerful.
For details, please visit
## :hammer: Installation
The released version from `Bioconductor`
``` r
if (!requireNamespace("BiocManager", quietly=TRUE))
install.packages("BiocManager")
## BiocManager::install("BiocUpgrade") ## you may need this
BiocManager::install("ggmsa")
```
Alternatively, you can grab the development version from GitHub using
devtools:
``` r
if (!requireNamespace("devtools", quietly=TRUE))
install.packages("devtools")
devtools::install_github("YuLab-SMU/ggmsa")
```
## :bulb: Quick Example
``` r
library(ggmsa)
protein_sequences <- system.file("extdata", "sample.fasta", package = "ggmsa")
ggmsa(protein_sequences, start = 221, end = 280, char_width = 0.5, seq_name = TRUE) + geom_seqlogo() + geom_msaBar()
```
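The package also exports `simplot()` for sequence similarity plots. The sketch below is adapted from that function's documented example and assumes the bundled `sample_alignment.fa` alignment with `CF_YL21` as the query sequence; `window` and `step` are written out at their default values purely for illustration.

``` r
library(ggmsa)

# Bundled alignment shipped in inst/extdata/GVariation (assumed to be installed)
fas <- system.file("extdata/GVariation/sample_alignment.fa", package = "ggmsa")

# Similarity of every other sequence to the query, computed in sliding
# windows of 200 positions moved in steps of 20 positions
p <- simplot(fas, "CF_YL21", window = 200, step = 20)
p
```

Smaller windows give a more detailed but noisier curve; setting `smooth = TRUE` draws a smoothed line instead of the raw one.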

## :books: Learn more
Check out the guides for learning everything there is to know about all
the different features:
- [Getting
Started](https://yulab-smu.github.io/ggmsa/articles/ggmsa.html)
- [Annotations](https://yulab-smu.github.io/ggmsa/articles/guides/Annotations.html)
- [Color Schemes and Font
Families](https://yulab-smu.github.io/ggmsa/articles/guides/Color_schemes_And_Font_Families.html)
- [Theme](https://yulab-smu.github.io/ggmsa/articles/guides/MSA_theme.html)
- [Other
Modules](https://yulab-smu.github.io/ggmsa/articles/guides/Other_Modules.html)
- [View
Modes](https://yulab-smu.github.io/ggmsa/articles/guides/View_modes.html)
## :runner: Author
- [Guangchuang Yu](https://guangchuangyu.github.io) Professor, PI
- [Lang Zhou](https://github.com/nyzhoulang) Master’s Student
- [Shuangbin Xu](https://github.com/xiangpin) PhD Student
**YuLab**
**Department of Bioinformatics, School of Basic Medical Sciences,
Southern Medical University**
## :sparkling_heart: Contributing
We welcome any contributions! By participating in this project you agree
to abide by the terms outlined in the [Contributor Code of
Conduct](https://github.com/YuLab-SMU/ggmsa/blob/master/CONDUCT.md).
================================================
FILE: inst/CITATION
================================================
citHeader("To cite ggmsa in publications use:")
citEntry(
entry = "book",
title = "Data Integration, Manipulation and Visualization of Phylogenetic Trees",
author = person("Guangchuang", "Yu"),
publisher = "Chapman and Hall/{CRC}",
year = "2022",
edition = "1st edition",
url = "https://www.amazon.com/Integration-Manipulation-Visualization-Phylogenetic-Computational-ebook/dp/B0B5NLZR1Z/",
textVersion = paste("Guangchuang Yu. (2022).",
"Data Integration, Manipulation and Visualization of Phylogenetic Trees (1st edition).",
"Chapman and Hall/CRC.")
)
citEntry(
entry = "article",
title = "ggmsa: a visual exploration tool for multiple sequence alignment and associated data",
author = personList(
as.person("Lang Zhou"),
as.person("Tingze Feng"),
as.person("Shuangbin Xu"),
as.person("Fangluan Gao"),
as.person("Tommy T Lam"),
as.person("Qianwen Wang"),
as.person("Tianzhi Wu"),
as.person("Huina Huang"),
as.person("Li Zhan"),
as.person("Lin Li"),
as.person("Yi Guan"),
as.person("Zehan Dai"),
as.person("Guangchuang Yu")
),
journal = "BRIEFINGS IN BIOINFORMATICS",
volume = "23",
issue = "4",
year = "2022",
month = "06",
ISSN = "1467-5463",
doi = "10.1093/bib/bbac222",
PMID = "35671504",
url = "https://academic.oup.com/bib/article-abstract/23/4/bbac222/6603927",
textVersion = paste("L Zhou, T Feng, S Xu, F Gao, TT Lam, Q Wang, T Wu, H Huang, L Zhan, L Li, Y Guan, Z Dai, G Yu.",
"ggmsa: a visual exploration tool for multiple sequence alignment and associated data.",
"Briefings in Bioinformatics. 2022, 23(4):bbac222. 10.1093/bib/bbac222")
)
================================================
FILE: inst/extdata/GVariation/A.Mont.fas
================================================
>Mont
ATGGCAACTTACACATCAACAATCCAGTTTGGTTCCATTGAATGCAAACTTCCATACTCACCCGCTCCTTTTGGGCTAGTTGCGGGGAAACGAGAAGTTTTAACCACCACTGACCCCTTCGCAAGTTTGGAGATGCAGCTTAGTGCGCGATTACGAAGGCAAGAGTTTGCAACTATTCGAACATCCAAGAATGGTACTTGCATGTATCGATACAAGACTGATGTCCAGATTGCGCGCATTCAAAAGAAGCGCGAGGAAAGAGAAAGAGAGGAATATAATTTCCAAATGGCTGCGTCAAGTGTTGTGTCGAAGATCACTATTGCTGGTGGAGAGCCACCTTCAAAACTTGAATCACAAGTGCGGAGGGGTGTCATCCACACAACTCCAAGGATGCGCACAGCAAAAACATATCACACGCCAAAGCTGACAGAGGGACAAATGAACCACCTTATCAAGCAGGTGAAGCAAATTATGTCAACCAAAGGAGGGTCTGTTCAACTGATTAGCAAGAAAAGTACTCATGTTCACTATAAAGAAGTTTTGGGATCACATCGCGCAGTTGTTTGCACTGCACATATGAGAGGTTTACGAAAGAGAGTGGACTTTCGGTGTGATAAATGGACCGTTGTGCGTCTACAGCATCTCGCCAGGACGGACAAGTGGACTAACCAAGTTCGTGCTACTGATCTACGCAAGGGCGATAGTGGAGTTATATTGAGTAATACTAATCTCAAAGGAAACTTTGGGAGAAGCTCGGAGGGCCTATTCATAGTGCGTGGGTCGCACGAAGGAAAAATCTATGATGCACGTTCCAAGGTTACTCAAGGGGTTATGGATTCAATGGTTCAGTTCTCAAGCGCTGAAAGCTTTTGGAAGGGATTGGACGGCAATTGGGCACAAATGAGATATCCTACAGATCATACATGTGTGGCAGGCTTACCAGTTGAAGACTGTGGCAGAGTTGCAGCGATAATGACACACAGTATTTTACCGTGCTATAAGATAACCTGCCCTACCTGTGCCCAACAATATGCCAACTTGCCAGCCAGTGACTTACTTAAGATATTACACAAGCACGCAAGTGATGGTCTAAATCGATTGGGGGCAGACAAAGATCGCTTTGTGCATGTCAAAAAGTTCTTGACAATCTTAGAGCACTTAACTGAACCGGTTGATCTGAGTCTAGAAATTTTCAATGAAGTATTCAAGTCTATAGGGGAGAAGCAACAATCACCTTTCAAAAACCTGAATATTCTGAATAATTTCTTTTTGAAAGGAAAGGAAAATACAGCTCGTGAATGGCAGGTGGCTCAATTAAGCTTACTTGAATTGGCAAGATTCCAAAAGAACAGAACGGATAATATCAAGAAAGGAGACATCTCGTTCTTTAGGAATAAACTATCTGCAAAAGCAAATTGGAACTTGTATCTGTCATGTGATAACCAGCTGGATAAGAATGCAAACTTCCTGTGGGGACAGAGGGAATATCATGCTAAGCGATTTTTCTCGAACTATTTCGAGGAAATTGATCCAGCGAAGGGCTATTCAGCATACGAAAATCGTTTGCATCCGAATGGGACAAGAAAACTTGCAATTGGAAACCTAATTGTACCACTTGATCTGGCTGAGTTTAGGCGGAAGATGAAAGGTGATTATAAAAGACAGCCAGGGGTGAGTAAGAAGTGCACGAGCTCGAAGGATGGAAACTACGTGTATCCCTGTTGTTGCACTACACTTGATGATGGCTCAGCTGTTGAGTCAACATTTTACCCGCCAACTAAGAAGCACCTCGTAATAGGTAATAGTGGCGACCAAAAGTATGTTGACTTACCAAAAGGGAATTCTGAGATGTTATATATTGCCAGGCAAGGCTTCTGTTACATTAACATTTTTCTCGCGATGTTGATTAACATTAGTGAGGAAGATGCAAAGGATTTCACTAAGAAGGTTCGTGACATGTGTGTGCCAAAGCTTGGAACCTGGCCAACCATGATGGATTT
GGCTACAACTTGTGCTCAAATGAAAATATTCTACCCTGATGTTCATGATGCAGAACTGCCTAGAATACTAGTCGATCACGAAACGCAGACATGCCATGTGGTTGACTCGTTTGGCTCACAAACAACTGGGTATCATATTTTGAAAGCATCTAGCGTGTCCCAACTTATTTTGTTTGCTAATGATGAGTTGGAGTCTGACATTAAGCACTATAGAGTTGGTGGTATTCCTGGAGCATGCCCTGAGCTTGGGTCCACAATATCACCTTTTAGAGAAGGAGGAATCATAATGTCTGAGTCAGCAGCGCTAAAACTGCTCCTAAAGGGAATTTTTAGGCCCAAAGTGATGAAGCAATTGCTACTGGATGAACCATATTTGCTCATTTTATCGATATTATCTCCTGGTATACTTATGGCCATGTACAACAATGGGATATTTGAGTTAGCGGTGAAGTTGTGGATCAATGAGAAACAATCTATAGCCATGATAGCATCGTTATTGTCCGCCTTGGCTTTACGAGTGTCAGCAGCAGAAACACTCGTTGCACAGAGGATTATAATTGACACGGCAGCAACAGATCTTCTCGATGCTACGTGTGATGGATTCAACTTACATCTAACATATCCCACTGCACTCATGGTGTTGCAAGTTGTTAAGAACAGAAATGAATGTGATGATACGTTGTTTAAAGCAGGTTTTTCACATTACAACATGAGTGTCGTGCAGATTATGGAAAAAAATTATCTAAGCCTCTTGGGCGATGCTTGGAAAGATTTAACCTGGCGAGAAAAATTATCCGCAACATGGCACTCATACAAAGCAAAGCGCTCTATCACTCAGTTCATAAAACCCATAGGCAAAGCAGATTTAAAAGGGTTGTACAACATATCACCGCAAGCATTCTTGGGTCAGGGCGTACAGAGAGTCAAAGGCACCGCCTCAGGGTTGAATGAGCGACTCAATAATTATATCAATACTAAGTGTGTAAATATTTCATCCTTTTTCATTCGTAGAATTTTCCGGCGCTTGCCAACTTTTGTAACTTTCATTAATTCATTATTAGTTATTAGTATGCTAACTAGTGTAGTAGCAGTGTGTCAAGCAATAATTCTAGATCAAAGGAAGTATAGAAAAGAAATTGAGTTGATGCAGATTGAGAAGAATGAAATTGTTTGTATGGAGTTGTATGCGAGTCTGCAGCGCAAACTTGAGCGTGAATTCACATGGGATGAATATATGGAATATTTGAAATCTGTGAATCCCCAGATAGTTCAATTCGCGCAAGCTCAAATGGAAGAATATAATGTGCGACATCAGCGCTCCACACCAGGTGTTAAGAATTTAGAGCAGGTGGTAGCATTTATAACTCTAATTATCATGATGTTTGATGCTGAAAGGAGCGACTGTGTATTTAAGACTCTCAACAAATTCAAAGGCATCGTTTCTTCAATGGATCATGAAGTTAGACACCAGTCCTTGGATGATGTAATCAAGAATTTCGATGAAAGGAACGAAGTTATTGATTTTGAACTAAATGAGGATACAATTAAAACATCATCAGTGTTGGACACAAAGTTTAGCGACTGGTGGGATCGGCAAATCCAAATGGGACACACACTTCCCCATTATAGAACTGAGGGACACTTCATGGAATTCACAAGGGCAACTGCTGTACAAGTGGCCAACGACATCGCGCATAGTGAGCACCTAGACTTTCTAGTGAGGGGAGCTGTTGGGTCTGGAAAATCTACTGGACTGCCTGTCCATCTCAGTGCAGCTGGATCTGTGCTTTTGATAGAACCAACTCGACCACTTGCAGAAAACGTGTTCAAGCAATTATCCAGTGAACCGTTTTTCAAGAAGCCAACACTGCGCATGCGAGGAAATAGTGTGTTTGGTTCCTCTCCAATCTCCATTATGACTAGCGGCTTTGCGTTGCACTACTATGCTAATAATCGCTCTCAGCTAACTCAGTTTAATTTCATAATTTTTGATGAATGTC
ATGTTTTAGATCCTTCTGCAATGGCATTTCGTAGCTTGTTAAGTGTGTATCACCAAACATGCAAAGTGTTAAAGGTGTCAGCCACTCCAGTGGGAAGGGAGGTCGAGTTCACAACACAACAACCAGTTAAATTGGTGGTTGAGGATACACTTTCATTCCAATCTTTTGTTGATGCGCAAGGCTCAAAAACCAATGCTGACGTAGTTCAGCATGGTTCGAACATACTCGTGTATGTGTCGAGTTACAATGAAGTGGATACATTAGCCAAGCTTCTAACAGATAGGAATATGATAGTCTCAAAAGTTGATGGCAGAACAATGAAGCACGGATGCTTAGAAATTGTAACGAAAGGGACTAGTGCAAAGCCACATTTTGTCGTAGCAACCAACATTATTGAAAATGGAGTAACTTTAGATATAGATGTAGTTGTAGATTTTGGGCTTAAAGTCTCACCGTTTTTAGATATTGACAATAGGAGCATAGCATACAATAAGATTAGTGTTAGCTATGGAGAAAGAATTCAGAGGTTGGGCCGTGTTGGGCGCTTTAAGAAGGGAGTGGCATTGCGTATTGGACACACCGAAAAGGGAATTATTGAGATTCCAAGTATGATTGCTAGTGAAGCTGCGCTTGCGTGCTTTGCATACAATTTGCCAGTAATGACAGGGGGTGTTTCAACTAGCCTCATTGGCAATTGTACTGTTCGTCAAGTTAAAACTATGCAACAATTTGAGCTGAGTCCATTCTTTATACAAAATTTTGTTGCCCATGATGGATCAATGCATCCTGTCATACATGACATTCTTAAGAAGTATAAACTGCGAGATTGTATGACGCCCTTGTGTGATCAATCCATACCTTACAGAGCCTCAAGCACTTGGTTGTCTGTTAGTGAGTACGAACGACTCGGAGTGGTTTTGGACATTCCAAAACAGATCAAGATTGCATTCCACATCAAGGATATCCCTCCTAAGTTGCATGAAATGCTTTGGGAAACAGTTATCAAATATAAGGATGTTTGTTTGTTTCCAAGTATTCGGGCTTCATCCATTAGCAAAATTGCATACACACTGCGCACTGATCTTTTTGCAATTCCCAGAACCCTAATTCTAGTTGAAAGATTGCTCGAGGAGGAACGAGTGAAACAGAGTCAATTCAGAAGTCTCATTGATGAAGGATGCTCAAGCATGTTTTCAATTGTTAATTTAACAAACACTCTTAGAGCTAGATATGCAAAGGATTACACTGCAGAAAACATACAGAAGCTCGAGAAAGTGAGGAGTCAGTTAAAGGAGTTCTCAAATTTAAATGGCTCTGCATGCGAGGAGAACTTAATGAAGAGGTATGAATCTCTACAGTTTGTGCATCATCAAGCAACAACTGCACTCGCAAAGGATTTGAAGTTGAAAGGAGTTTGGAAGAAGTCATTAGTTGTGCAGGACTTAATCATAGCGGGTGCCGTTGCTATTGGTGGAATAGGGCTCATCTATAGTTGGTTTACTCAATCAGTTGAAACTGTGTCTCACCAGGGCAAGAACAAATCCAAAAGAATTCAAGCATTGAAGTTTCGACACGCCCGCGATAAGAGGGCTGGTTTTGAAATTGATAACAATGATGATACAATAGAAGAATTCTTTGGATCTGCATACAGGAAGAAGGGAAAAGGTAAAGGCACCACTGTTGGTATGGGCAAGTCAAGCAGGAGGTTTGTTAATATGTATGGATTTGACCCAACAGAATATTCATTCATCCAGTTCGTTGATCCGCTCACTGGAGCTCAAATTGAAGAGAACGTCTATGCTGATATTAGAGACATCCAAGAGCGCTTTAGTGATGTCCGCAAGAAAATGGTAGAGGATGATGAAATCGAATTGCAAGCATTGGGCAGCAACACAATCATTCATGCTTACTTCAGGAAGGATTGGTCTGACAAGGCTCTAAAAATTGATTTGATGCCACACAACCCACTCAAAATCTGTGATAAATCGAATGGCATTGCT
AAGTTTCCTGAAAGAGAACTTGAGTTGAGGCAAACTGGGCCAGCAACAGAGGTTGATGTGAAAGACATTCCAAAACAGGAAGTGGAGCATGAAGCCAAATCACTCATGAGAGGTTTAAGGGATTTCAATCCAATTGCTCAAACAGTTTGCAGAGTAAAAGTGTCTGTTGAATATGGAACGTCTGAAATGTATGGGTTCGGTTTTGGTGCGTATATTATAGTAAACCACCATCTATTCAAGAGTTTCAATGGATCCATGGAAGTGCGATCAATGCATGGAACATTCAGAGTGAAGAATTTGCATAGCCTGAGCGTTTTACCGATCAAAGGCAGAGACATTATCATCATAAAGATGCCAAAGGATTTCCCTGTTTTCCCACAAAAACTGCACTTCCGAGCTCCAGTGCAGAATGAGAGGATTTGTTTGGTTGGAACTAATTTTCAAGAAAAACATGCATCATCAATCATCACAGAAACGAGTACTACATACAATGTACCGGGCAGCACTTTTTGGAAGCATTGGATTGAAACAAATGATGGGCATTGTGGATTACCAGTAGTGAGTACAGCTGATGGATGTCTAGTTGGAATACACAGCTTGGCGAATAATGTGCAAACCACGAATTATTATTCAGCCTTTGATGAGGATTTTGAAAGTAAGTATCTCCGAACTGATGAGCATAATGAGTGGACCAAATCGTGGGTATATAACCCAGATACTGTGTTGTGGGGTCCATTGAAGCTCAAAGAGAGTACCCCTAAAGGCCTGTTTAAGACAACAAAACTTGTACAGGATTTAATTGATCATGATGTTGTTGTAGAGCAAGCTAAACATTCTGCGTGGATGTATGAGGCTCTAACAGGGAATTTGCAAGCTGTGGCGACAATGAAGAGTCAGCTAGTGACAAAGCACGTGGTCAAAGGGGAGTGTCGGCACTTCAAAGAGTTCTTAACTGTGGATTCGGAAGCAGAAGCTTTCTTCAGGCCTTTGATGGATGCTTATGGGAAGAGCTTGTTAAATAGAGAAGCATATATAAAGGACATAATGAAATACTCAAAGCCTATTGATGTTGGAATAGTAGACTGTGATGCTTTTGAAGAGGCTATCAATAGGGTTATCATTTATCTGCAAGTACATGGCTTCCAGAAATGCAATTACATCACCGATGAGCAGGAAATTTTCAAAGCTCTCAATATGAAAGCTGCTGTCGGGGCTATGTATGGAGGCAAGAAGAAAGACTACTTCGAGCATTTTACTGAGGCGGATAAAGAGGAAATTGTTATGCAAAGTTGCTTACGATTGTACAAGGGCTCACTTGGCATATGGAATGGATCATTGAAAGCAGAACTTCGGTGCAAAGAGAAGATACTTGCAAATAAGACAAGGACATTCACTGCTGCACCTTTAGATACTCTACTGGGTGGGAAGGTGTGCGTTGATGATTTTAATAATCAATTCTACTCAAAGAACATTGAATGCTGCTGGACTGTTGGAATGACTAAGTTTTATGGAGGTTGGGACAAATTGCTTCGGCGTCTACCTGAAAATTGGGTGTACTGCGATGCCGATGGTTCACAATTCGATAGTTCACTCACCCCATACCTAATTAATGCTGTTCTCATCATCAGAAGCACATACATGGAAGATTGGGACTTGGGGTTGCAAATGTTGCGCAATTTGTACACAGAAATAATTTACACACCAATCTCAACTCCAGATGGAACAATTGTCAAGAAGTTTAGAGGTAATAATAGCGGTCAACCTTCTACCGTTGTGGATAATTCTCTCATGGTTGTTCTTGCTATGCATTACGCTCTCATTAAGGAGTGCGTTGAGTTTGAAGAAATCGACAGCACGTGTGTATTCTTTGTTAATGGTGATGACTTATTGATTGCTGTGAATCCGGAGAAAGAGAGCATTCTCGATAGAATGTCACAACATTTCTCAGATCTTGGTTTGAACTATGATTTTTCGTCGAGAACAAGAAGGAAGGAGGAATT
GTGGTTCATGTCCCATAGAGGCCTGCTAATTGAGGGTATGTACGTGCCAAAGCTTGAAGAAGAGAGAATTGTATCCATTCTGCAATGGGATAGGGCTGATCTGCCAGAGCACAGATTAGAAGCGATTTGTGCAGCAATGATAGAATCCTGGGGTTATTTTGAGTTAACGCACCAAATTAGGAGATTCTACTCATGGTTGTTACAACAGCAACCTTTTTCAACGATAGCACAGGAAGGAAAAGCTCCATACATAGCGAGCATGGCATTGAAGAAGCTGTACATGAATAGGACAGTAGATGAGGAGGAACTGAAGGCTTTCACTGAAATGATGGTTGCCTTGGATGACGAATTTGAGTGCGATACTTATGAAGTGCACCATCAAGGAAATGACACAATCGATGCAGGAGGAAGCACTAAGAAGGATGCAAAACAAGAGCAAGGTAGCATTCAACCAAATCTCAACAAGGAAAAGGAAAAGGACGTGAATGTTGGAACATCTGGAACTCATACTGTGCCACGAATTAAAGCTATCACGTCCAAAATGAGAATGCCTAAGAGTAAAGGTGCAACTGTACTAAATTTGGAACACTTACTCGAGTATGCTCCACAGCAAATTGACATCTCAAATACTCGAGCAACTCAATCACAGTTTGATACGTGGTATGAAGCAGTACAACTTGCATACGACATAGGAGAAACTGAAATGCCAACTGTGATGAATGGGCTTATGGTTTGGTGCATTGAAAATGGAACCTCGCCAAACATCAACGGAGTTTGGGTTATGATGGATGGAGATGAACAAGTCGAATACCCACTGAAACCAATCGTTGAGAATGCAAAACCAACACTTAGGCAAATCATGGCACATTTCTCAGATGTTGCAGAAGCGTATATAGAAATGCGCAACAAAAAGGAACCATATATGCCACGATATGGTTTAGTTCGTAATCTGCGCGATGGAAGTCTGGCTCGCTATGCTTTTGACTTTTATGAAGTTACATCACGGACACCAGTGAGGGCTAGAGAGGCACACATTCAAATGAAGGCCGCAGCTTTAAAATCAGCTCAATCTCGACTTTTCGGATTGGATGGTGGCATTAGTACACAAGAGGAAAACACAGAGAGGCACACCACCGAGGATGTTTCTCCAAGTATGCATACTCTACTTGGAGTGAAGAACATGTGA
>CF_YL21
ATGGCAATCTACACGTCAACAATCTGTTTCGGTTCGTTTGAATGCAAGCTACCATACTCACCCGCCTCTTGCGGGCATATTGTGAAGGAACGAGAAGTGCTGGCTTCCATCGATCCTTTCGCAGATCTGGAAACACAACTTAGTGCACGATTGCTCAAGCAAGAATATGCTACTGTTCGTGTGCTCAAGAACGGTACTCTTACGTACCGATACAAGACTGATGCCCAGATAACGCGCATCCAGAAGAGACTGGAAAGGAAGGATAGGGAAGAATACCACTTCCAGATGGCAGCTCCTAGTATTGTGTCAAAGATCACTATTGCTGGTGGAGAGCCACCTTCAAAACTTGAATCACAAGTGCGGAGGGGTGTCATCCACACAACTCCAAGGATGCGCACAGCAAAAACATATCACACGCCAAAGCTGACAGAGGGACAAATGAACCACCTTATCAAGCAGGTGAAGCAAATTATGTCAACCAAAGGAGGGTCTGTTCAACTGATTAGCAAGAAAAGTACCCATGTTCACTATAAAGAAGTTTTGGGATCACATCGCGCAGTTGTTTGCACTGCACATATGAGAGGTTTACGAAAGAGAGTGGACTTTCGGTGTGATAAATGGACCGTTGTGCGTCTACAGCATCTCGCCAGGACGGATAAGTGGAATAACCAAGTTCGTGTTACTGATCTACGCAAGGGCGATAGTGGAGTTATATTGAGTAATACTAATCTCAAAGGAAACTTTGGGAGAAGCTCGGAGGGCATATTCATAGTGCGTGGGTCGCACGAAGGAAAAATCTATGATGCACGTTCCAAGGTTACTCAAGGGGTTATGGATTCAATGGTTCAGTTCTCAAGCGCTGAAAGCTTTTGGAAGGGATTGGACGGCAATTGGGCACAAATGAGATATCCTACAGATCATACATGTGTGGCAGGCTTACCAGTTGAAGACTGTGGCAGAGTTGCAGCGATAATGACACACAGTATTTTACCGTGCTATAAGATAACCTGCCCTACCTGTGCCCAACAATATGCCAACTTGCCAGCCAGTGACTTACTTAAGATATTACACAAGCACGCAAGTGATGGTTTAAATCGATTGGGGGCAGACAAAGATCGCTTTGTGCATGTCAAAAAGTTCTTGACAATCTTAGAGCACTTAACTGAACCGGTTGATCTGAGTCTAGAAATTTTCAATGAAGTATTCAAGTCTATAGGGGAGAAGCAACAATCACCTTTCAAAAACCTGAATATTCTGAATAATTTCTTTTTAAAAGGAAAGGAAAATACAGCTCGTGAATGGCAGGTGGCTCAATTAAGCTTACTTGAATTGGCAAGATTCCAAAAGAACAGAACGGATAATATCAAGAAAGTAGACATCTCGTTCTTTAGGAATAAACTATCTGCCAAAGCAAATTGGAACTTGTATCTGTCATGTGATAACCAGCTGGATAAGAATGCAAACTTCCTGTGGGGACAGAGGGAATATCATGCTAAGCGATTTTTCTCGAACTATTTCGAGGAAATTGATCCAGCGAAGGGCTATTCAGCATACGAAAATCGTTTGCATCCGAATGGGACAAGAAAACTTGCAATTGGAAACCTAATTGTACCACTTGATCTGGCTGAGTTTAGGCGGAAGATGAAAGGTGATTTTAAAAGACAGCCAGGGGTGAGTAAGAAGTGCACGAGCTCGAAGGATGGAAACTACGTGTATCCCTGTTGTTGCACTACACTTGATGATGGCTCAGCTGTTGAATCAACATTTTACCCGCCAACTAAGAAGCGCCTCGTAATAGGTAATAGTGGCGACCAAAAGTATGTTGACTTACCAAAAGGGAATTCTGAGATGTTATATATTGCCAGGCAAGGCTTCTGTTACATTAACATTTTCCTCGCGATGTTGATTAACATTAGTGAGGAAGATGCAAAGGATTTCACTAAGAAGGTTCGTGACATGTGTGTGCCAAAGCTCGGAACCTGGCCAACCATGATGGATCT
GGCTACAACTTGTGCTCAAATGAAAATATTCTACCCTGATGTTCATGATGCAGAACTGCCTAGAATACTAGTCGATCACGAAACGCAGACATGCCATGTGGTTGACTCGTTTGGCTCACAAACAACTGGGTATCATATTTTGAAAGCATCTAGCGTGTCCCAACTTATTTTGTTTGCTAATGATGAGTTGGAGTCTGACATTAAGCACTATAGAGTTGGTGGTATTCCTAATGCATGCCCTGAACTTGGGTCCACAATATCACCTTTCAGAGAAGGAGGAGTTATAATGTCTGAGTCGGCAGCGCTGAAACTGCTTTTGAAGGGAATTTTTAGACCTAAGGTGATGAGACAGTTGCTGTTAGATGAGCCTTACCTGTTGATTCTATCAATATTATCCCCTGGCATACTGATGGCTATGTATAATAATGGGATTTTTGAACTTGCGGTGAGGTTGTGGATTAATGAGAAGCAATCCATAGCGATGATAGCATCGCTACTATCAGCTTTAGCCCTACGAGTGTCAGCGGCAGAAACACTCGTCGCACAGAGGATTATAATTGATGCTGCAGCTACAGACCTCCTCGATGCTACGTGTGACGGGTTCAACCTGCATCTAACGTACCCCACTGCATTGATGGTGTTGCAAGTTGTTAAGAATAGAAATGAATGTGATGATACTCTATTCAAGGCGGGTTTTCCAAGTTACAACACGAGCGTCGTACAGATTATGGAAAAAAATTATCTAAATCTCTTGAACGATGCTTGGAAAGATTTAACTTGGCGGGAAAAATTTTCCGCAACATGGTACTCATACAGAGCAAAACGCTCTATCACTCGGTACATAAAACCCACAGGAAGGGCAGATTTGAAAGGGTTATACAACATATCACCACAAGCATTCTTGGGCCGAAGCGTCCAGGTGGTCAAAGGCACTGCCTCAGGATTGAGCGAGCGATTTAATAATTACTTCAATACTAAGTGTGTAAATATTTCATCCTTTTTCATTCGTAGAATCTTTAGGCGTTTGCCAACTTTCGTCACTTTTGTTAACTCATTATTAGTTATTAGTATGTTAACTAGTGTAGTGGCAGTGTGTCAGGCAATAATTTTAGATCAGAAGAAGTATAGGAGAGAGATCGAGTTGATGCAGATAGAGAAGAATGAGATTGTCTGCATGGAGCTATACGCAAGTTTACAGCGCAAACTTGAACGCGATTTCACATGGGATGAGTACATTGAGTATTTGAAATCAGTGAACCCTCAGATAGTTCAATTTGCTCAAGCGCAGATGGAAGAATATGATGTGCGACACCAGCGTTCCACACCAGGTGTTAAAAATTTGGAACAAGTGGTAGCATTTATGGCTTTAGTCATCATGGTGTTCGATGCTGAAAGGAGTGACTGCGTGTTCAAAACTCTCAATAAATTTAAGGGTGTCCTTTCCTCACTGGACCATGAAGTTAGACATCAGTCCTTGGATGATGTGATCAAGAATTTTGATGAGAGGAATGAGACTATTGATTTTGAATTGAGTGAGGACACAATTCGAACATCATCAGTGCTAGATACAAAGTTTAGTGATTGGTGGGACCGACAAATCCAAATGGGACATACACTTCCACATTACAGAACCGAGGGGCACTTCATGGAATTTACAAGGGCAACTGCTGTTCAAGTGGCTAATGACATTGCCCACAGCGAACACCTAGACTTTCTAGTAAGGGGAGCTGTTGGGTCTGGAAAGTCAACTGGGTTGCCTGTTCATCTTAGTGTAGCTGGATCTGTGCTTTTAATTGAGCCAACGCGGCCACTAACGGAGAACGTTTTCAAACAGCTATCTAGTGAACCATTCTTCAAGAAGCCAACACTGCGTATGCGTGGAAATAGTATATTTGGCTCTTCCCCAATCTCCGTTATGACTAGCGGGTTTGCGCTACACTACTTCGCCAATAATCGCTCTCAATTAGCTCAGTTCAATTTTGTAATATTTGATGAGTGCC
ATGTTCTGGATCCTTCCGCAATGGCGTTCCGCAGTCTGCTGAGTGTTTATCATCAAGCATGTAAAGTATTAAAAGTGTCAGCTACTCCAGTGGGAAGAGAGGTTGAATTCACAACACAGCAGCCAGTCAAGTTAATAGTGGAAGACACACTGTCTTTCCAATCATTTGTTGATGCACAAGGTTCTAAAACTAATGCTGATGTTGTCCAGTTTGGTTCAAACATACTTGTTTACGTGTCGAGCTACAATGAAGTTGACAACTTGGCTAAGCTCCTAACAGATAAGAATATGATGGTCACAAAGGTTGATGGCAGAACAATGAAGCACGGTTGCCTAGAAATTGTCACAAAAGGAACCAGTGCGAGACCACATTTTGTTGTAGCAACCAACATAATTGAGAATGGAGTGACTTTGGACATAGATGTAGTAGTGGATTTTGGGTTGAAAGTCTCACCGTTCTTGGACATTGACAATAGGAGCATTGCTTACAATAAGGTGAGTGTTAGTTATGGTGAGAGAATTCAAAGGCTGGGTCGTGTTGGACGCTTCAAGAAAGGAGTAGCATTGCGCATTGGACACACTGAGAAGGGAATTATCGAAATTCCAAGCATGATCGCTACTGAGGCGGCTCTTGCTTGCTTTGCATATAACTTACCAGTGATGACAGGAGGCGTCTCAACTAGTCTGATTGGCAATTGTACTGTGCGCCAGGTCAAAACAATGCAGCAATTTGAATTGAGCCCCTTCTTTATCCAGAATTTCGTTGCTCATGATGGATCAATGCATCCTGTCATACATGACATTCTCAAAAAGTATAAACTCCGAGATTGTATGACACCTTTGTGCGATCAGTCTATACCATACAGGGCATCGAGCACTTGGTTATCGGTTAGTGAATATGAGCGACTTGGAGTGGCTTTAGAAATTCCAAAGCAAGTCAAAATTGCATTCCATATCAAAGAGATCCCTCCTAAGCTCCACGAAATGCTTTGGGAAACGGTTGTCAAGTACAAAGACGTTTGCTTATTTCCAAGCATTCGAGCATCGTCCATCAGCAAAATCGCATACACATTGCGTACAGACCTCTTCGCCATCCCAAGAACTCTAATATTGGTGGAGAGATTGCTTGAAGAGGAGCGAGTGAAGCAGAGCCAATTCAGAAGTCTCATCGATGAAGGGTGCTCAAGCATGTTTTCAATTGTCAACTTGACCAACACTCTCAGGGCCAGATATGCAAAAGATTACACCGCAGAGAACATACAAAAACTTGAGAAAGTGAGAAGTCAATTAAAAGAATTCTCAAATTTGGATGGTTCTGCATGTGAGGAAAATTTAATAAAGAGGTATGAGTCTTTGCAGTTCGTTCACCACCAAGCTGCGACGTCACTTGCAAAGGATCTCAAGTTGAAGGGGACTTGGAAGAAGTCATTAGTGGCTAAAGACTTGATCATAGCAGGCGCTGTTGCAATTGGTGGAATAGGACTCATATATAGTTGGTTCACACAATCAGTTGAGACTGTGTCTCACCAAGGGAAAAATAAATCCAAAAGAATCCAAGCCTTGAAGTTTCGCCATGCTCGTGACAAAAGGGCTGGCTTTGAAATTGACAACAATGATGACACAATAGAGGAATTCTTTGGATCTGCATACAGGAAGAAGGGAAAAGGTAAAGGTACCACAGTTGGTATGGGCAAGTCAAGCAGGAGGTTCATTAACATGTACGGGTTTGACCCGACAGAGTACTCATTCATCCAATTCGTTGATCCACTCACTGGGGCACAAATAGAAGAGAATGTCTATGCTGACATTAGAGATGTTCAAGAAAGATTTAGTGAGGTGCGAAAGAAAATGGTTGAGAATGATGATATTGAAATGCAAGCCTTGGGTAGTAACACGACCATACATGCATACTTTAGGAAAGATTGGTCTGACAAAGCTTTGAAGATTGATTTAATGCCACATAACCCACTCAAAGTTTGTGACAAAACAAATGGCATTGCC
AAACTTCCTGAGAGAGAGTTCGAACTAAGGCAGACTGGGCCAGCTGTAGAAGTCGACGTGAAGGACATACCAGCGCAGGAAGTGGAGCATGAAGCCAAATCGCTCATGAGAGGCTTGAGAGACTTCAACCCAATTGCTCAAACAGTTTGTAGGCTGAAAGTATCTGTTGAATATGGGACATCAGAGATGTACGGTTTTGGATTTGGAGCATACATAATAGCGAACCACCATTTGTTTAGGAGTTACAATGGTTCCATGGAGGTGCGATCCATGCATGGTACATTCAGGGTGAAGAATCTACACAGTTTGAGCGTTCTGCCAATTAAAGGCAGGGACATCATCCTCATCAAAATGCCGAAGGATTTCCCTGTTTTTCCACAGAAATTGCATTTCCGAGCTCCTACACAGAATGAAAGAGTTTGTTTAGTTGGGACCAACTTTCAGGAGAAGTATGCATCGTCGATCATCACAGAAACAAGCACTACTTACAATATACCAGGCAGCACATTCTGGAAGCATTGGATTGAAACAGATAATGGACATTGTGGACTACCAGTGGTGAGCACAGCCGATGGATGTCTAGTCGGAATTCATAGTTTGGCAAACAATGCACACACCACGAACTACTACTCAGCCTTCGATGAAGATTTTGAAAGCAAGTACCTCCGAACCAATGAGCACAATGAATGGGTCAAATCTTGGGTTTATAATCCAGACACAGTGTTGTGGGGCCCGTTGAAGCTTAAAGATAGCACTCCCAAAGGGTTATTCAAAACAACAAAGCTTGTGCAAGATCTAATCGATCATGATGAAGTCGCGGAGCAAGCTAAGCACTCTGCGTGGATGTTTGAAGCCTTGACAGGAAATTTGCAAGCTGTCGCAACAATGAAGAGTCAATTAGTAACCAAGCATGTAGTTAAAGGAGAGTGTCGACACTTCAAAGAATTCCTGACTGTGGATGCAGAAGCAGAGGCATTCTTCAGGCCTTTGATGGATGCATATGGGAAAAGCTTGCTGAATAGAGATGCGTACATCAAAGACATAATGAAGTACTCAAAACCTATAGATGTTGGTATCGTGGACTGTGATGCATTTGAAGAAGCCATCAATAGGGTTATCATCTACTTGCAAGTGCACGGCTTCAAGAAGTGTGCATATGTCACTGACGAGCAAGAAATTTTCAAAGCGCTCAACATGAAAGCTGCAGTCGGAGCCATGTATGGTGGCAAAAAGAAAGACTATTTTGAGCATTTCACTGATGCAGATAAGGAAGAAATAGTCATGCAAAGCTGTCTGCGACTGTATAAAGGCTTGCTCGGCATTTGGAACGGATCGTTGAAGGCAGAGCTCCGGTGTAAGGAAAAGATACTTGCAAACAAGACGAGGACGTTCACTGCTGCACCTCTAGACACTTTGCTGGGTGGTAAAGTGTGTGTTGATGACTTCAATAATCAATTTTATTCAAAGAATATTGAATGCTGTTGGACAGTTGGGATGACTAAGTTTTATGGTGGTTGGGATAAACTGCTTCGGCGTTTACCTGAGAATTGGGTATACTGTGATGCCGATGGCTCACAGTTTGATAGTTCACTAACTCCATACTTAATCAATGCTGTTCTCACCATCAGAAGCACATACATGGAAGACTGGGATGTGGGGTTGCAAATGCTGCGTAATTTATACACTGAGATTATTTACACACCTATCTCAACTCCAGATGGAACAATTGTTAAGAAGTTCAGAGGAAATAACAGTGGTCAGCCTTCTACTGTTGTGGACAACTCTCTTATGGTTGTCCTTGCCATGCACTATGCTCTCATCAAAGAATGCATTGAGTTTGAAGAGATTGACAGCACGTGCGTGTTCTTCGTCAATGGTGATGATTTGCTGATTGCCGTGAATCCGGATAAAGAGGGCATTCTTGATAGATTGTCACAGCACTTCTCAGATCTTGGTTTAAATTATGATTTTTCGTCAAGAACAAGAAATAAGGAAGAGTT
ATGGTTTATGTCTCATAGAGGCCTACTGATTGAGGGCATGTACGTGCCGAAACTTGAAGAAGAAAGGATTGTGTCCATTCTCCAATGGGACAGAGCAGACTTGGCTGAACACAGGCTTGAGGCGATTTGCGCAGCTATGATAGAGTCCTGGGGTTATTCTGAACTAACACACCAAATTAGAAGATTCTACTCATGGTTATTGCAACAGCAACCTTTTGCAACAATAGCGCAGGAAGGGAAGGCTCCTTATATAGCAAGCATGGCATTAAGGAAACTGTATATGGATAGAGCTGTGGATGAGGAAGAGCTGAGAGCCTTCACTGAAATGATGGTCGCATTAGACGATGAGTTTGAGTGTGACTCTTATGAAGTACACCACCAGGGAAACGATACAATCGATGCAGGAGGAAGCAGCAAGAAAGATGCAAGACCAGAGCAAGGCAGCATCCAGTCAAACCCGAACAAAGGAAAAGATAAGGATGTGAATGCTGGTACATCTGGGACACATACTGTGCCGAGAATCAAGGCTATCACGTCCAAAATGAGAATGCCCAAAAGCAAAGGAGCAACCGTGCTAAACTTAGAACACTTGCTTGAGTATGCTCCACAACAAATTGATATTTCAAATACTCGGGCAACTCAATCACAGTTTGATACGTGGTATGAGGCAGTGCGGATGGCATACGACATAGGAGAAACTGAGATGCCAACTGTGATGAATGGGCTTATGGTTTGGTGCATTGAAAATGGAACCTCGCCAAATGTCAACGGAGTCTGGGTTATGATGGATGGGGATGAACAAGTCGAGTACCCGTTGAAACCAATCGTTGAGAATGCAAAACCAACCCTTAGGCAAATCATGGCACATTTCTCAGATGTTGCAGAAGCGTATATAGAAATGCGCAACAAAAAGGAACCATATATGCCACGATATGGTTTAATTCGAAATCTGCGGGATGTGGGTTTAGCGCGTTATGCCTTTGACTTTTATGAGGTCACATCACGAACACCAGTGAGGGCTAGGGAAGCGCACATTCAAATGAAGGCCGCAGCATTGAAATCAGCCCAACCTCGACTTTTCGGGTTGGACGGTGGCATCAGTACACAAGAGGAGAACACAGAGAGGCACACCGCCGAGGATGTCTCTCCAAGTATGCATACTCTACTTGGAGTCAAAAACATGTGA
================================================
FILE: inst/extdata/GVariation/B.Oz.fas
================================================
>Oz
ATGGCAACTTACATGTCAACAATCTGTTTCGGTTCGTTTGAATGCAAGCTACCATACTCACCCGCTTCTTGCGGGCATATTGTGAAGGAGCGAGAAGTGCTGGCTTCCGTTGATCCTTTCGCAGATCTGGAAACACAACTTAGTGCACGATTGCTCAAGCAAGAATATGCTACTGTTCGTGTGCTCAAGAACGGTACTCTTACTTACCGATACAAAACTGATGCCCAGATAACGCGCATTCAGAAGAAACTGGAGAGGAAGGATAGGGAAGAATATCACTTCCAGATGGCCGCTCCTAGTATTGTGTCAAAAATTACAATAGCTGGTGGAGATCCTCCATCAAAGTCTGAGCCACAAGCACCAAGAGGGATCATTCATACAACTCCAAGGGTGCGTAAAGTCAAGACACGTCCCATAATAAAGTTGACAGAAGGCCAGATGAATCATCTCATTAAGCAGGTGAAGCAGATTATGTCGGAGAAGAGAGGGTCTGTCCACTTAATTAGTAAGAAGACCACTCATGTTCAATATAAGGAGATACTTGGAGCAACTCGCGCAGCGGTTCGAACTGCACATATGATGGGTTTGCGACGGAGAGTGGACTTCCGATGTGATATGTGGACAGTCGGACTTTTGCAACGTCTCGCTCGGACGGACAAATGGTCCAATCAAGTCCGCACTATCAACATACGAAGGGGTGATAGTGGAGTCATTTTGAACACAAAAAGCCTCAAAGGCCACTTTGGTAGAAGTTCAGGAGACTTGTTCATAGTGCGTGGATCACACGAAGGGAAATTGTACGATGCACGTTCTAGAGTTACTCAGAGTGTTTTGAACTCAATGATCCAGTTTTCGAATGCTGATAATTTTTGGAAGGGTCTAGACGGTAATTGGGCACAACTGAGATATCCTTCGGATCACACATGTGTAGCTGGTTTACCTGTCGAAGATTGTGGTAGAGTTGCTGCATTGATGGCACACAGTATCCTCCCGTGCTACAAGATAACCTGCCCCACCTGTGCTCAACAGTATGCCAGCTTGCCGGTTAGCGATCTGTTTAAGCTGTTGCATAAACATGCGAGAGATGGTTTGAACCGATTGGGAGCGGATAAAGACCGGTTTATACATGTTAATAAGTTCTTGATAGCGTTAGAGCATCTAACTGAACCGGTGGATTTGAATCTCGAGCTTTTCAATGAGATATTTAAATCCATAGGGGAGAAGCAGCAAGCACCGTTCAAGAATTTAAATGTCTTAAATAATTTCTTCCTGAAAGGAAAAGAAAATACAGCTCATGAATGGCAAGTGGCTCAATTGAGTTTGCTCGAATTAGCAAGGTTCCAGAAGAATAGAACTGATAACATCAAGAAAGGTGATATATCTTTCTTCAGAAATAAATTATCTGCCAAGGCAAACTGGAATCTGTATTTGTCGTGCGACAACCAGTTGGATAAAAATGCAAATTTTCTGTGGGGACAAAGGGAGTATCATGCTAAGCGGTTTTTCTCAAACTTCTTTGAGGAAATTGATCCAGCAAAGGGATACTCAGCATATGAAATCCGCAAGCATCCAAATGGAACAAGGAAGCTCTCAATTGGTAACTTAGTTGTCCCACTTGATTTAGCTGAGTTTAGGCAGAAGATGAAAGGTGACTATAGGAAACAACCAGGAGTTAGCAGAAAGTGCACGAGTTCGAAAGATGGTAATTATGTGTATCCCTGTTGTTGCACAACACTTGATGATGGTTCAGCTATTGAATCAACATTCTATCCACCAACCAAAAAGCACCTTGTAATAGGCAATAGCGGTGACCAAAAATTTGTTGATTTACCAAAAGGGGATTCGGAGATGTTATACATTGCCAAGCAGGGTTATTGTTATATCAACGTGTTTCTTGCAATGCTTATTAACATTAGCGAGGAGGATGCAAAGGATTTCACAAAGAAAGTTCGCGACATGTGTGTGCCAAAGCTTGGAACCTGGCCAACTATGATGGATTT
GGCGACCACTTGTGCTCAAATGAGAATATTCTATCCTGACGTGCATGATGCAGAGCTGCCTAGAATATTGGTTGACCATGACACTCAAACGTGTCACGTGGTTGACTCATTTGGCTCGCAAACAACTGGATATCATATTCTAAAAGCATCCAGCGTGTCTCAACTTATCTTGTTTGCAAATGATGAATTAGAATCTGATATAAAACATTATAGAGTTGGTGGTGTTCCTAATGCATGCCCTGAACTTGGGTCCACAATATCACCTTTCAGAGAAGGAGGAGTTATAATGTCTGAGTCGGCAGCGCTGAAACTGCTTTTAAAGGGAATTTTTAGACCTAAGGTGATGAGACAGTTGCTGTTAGATGAGCCTTACCTGTTGATTTTATCAATATTATCTCCTGGCATACTGATGGCTATGTATAATAATGGGATTTTTGAACTTGCGGTAAGGTTGTGGATTAATGAGAAACAATCCATAGCTATGATAGCATCGCTACTATCAGCTTTAGCCCTACGAGTGTCAGCGGCAGAAACACTCGTCGCACAGAGGATTATCATTGATGCTGCAGCTACAGACCTCCTTGATGCTACGTGTGATGGGTTCAACCTACATCTAACGTACCCCACTGCATTGATGGTGTTGCAAGTTGTTAAGAATAGAAATGAATGTGATGATACCCTATTCAAGGCGGGTTTTTCAAGTTACAACACGAGCGTCGTACAGATTATGGAAAAAAATTATCTAAATCTCTTGAACGATGCTTGGAAAGATTTAACTTGGCGGGAAAAATTATCCGCAACATGGTACTCATACAGAGCAAAACGCTCTATCACTCGGTACATAAAACCCACAGGAAGGGCAGATTTGAAAGGGTTATACAACATATCACCACAAGCATTCTTGGGCCGAAGCGCCCAGGTGGTCAAAGGTACTGCCTCAGGATTGAGTGAGCGATTTAATAATTATTTCAATACTAAGTGTGTAAATATTTCATCCTTTTTCATTCGTAGAATCTTTAGGCGTTTGCCAACTTTCGTCACTTTTGTTAACTCATTATTAGTTATTAGTATGTTAACTAGCGTAGTGGCAGTGTGTCAGGCAATAATTTTAGATCAGAGGAAATATAGGAGAGAAATCGAGTTGATGCAGATAGAGAAGAATGAGATTGTCTGCATGGAGCTATATGCAAGTTTACAGCGCAAACTTGAACGCGATTTCACATGGGATGAGTACATTGAGTATTTGAAATCAGTAAACCCTCAGATAGTTCAGTTTGCTCAAGCGCAGATGGAAGAATATGATGTGCGACACCAGCGTTCCACACCAGGTGTTAAAAATTTGGAACAAGTGGTAGCATTTATGGCTTTAGTCATTATGGTGTTCGATGCTGAAAGGAGTGATTGCGTGTTCAAAACTCTCAATAAATTTAAGGGTGTCCTTTCCTCAATGGACTATGAAGTTAGACATCAGTCCTTAGACGATGTGATCAAGAATTTTGATGAGAGGAATGAGATTATTGATTTTGAATTGAGTGAGGACACAATTCGAACATCATCAGTGCTAGATACAAAGTTTAGTGATTGGTGGGACCGACAAATCCAGATGGGACATACACTTCCACATTACAGAACCGAGGGGCACTTCATGGAATTCACAAGAGCAACTGCTGTCCAAGTGGCTAATGACATTGCCCATAGCGAACACCTAGACTTTCTAGTAAGGGGAGCTGTTGGGTCTGGAAAGTCAACTGGGTTACCTGTTCATCTTAGTGTAGCCGGATCTGTGCTTTTAATTGAACCAACGCGACCACTAGCGGAGAACGTTTTCAAACAGCTATCTAGTGAACCATTCTTCAAGAAGCCAACACTGCGTATGCGTGGAAATAGTATATTTGGCTCTTCTCCAATCTCCGTCATGACTAGCGGATTTGCGCTACACTACTTCGCCAATAATCGCTCTCAATTAGCTCAGTTCAACTTTGTAATATTTGATGAGTGCC
ATGTTCTGGATCCTTCCGCAATGGCGTTCCGCAGTCTGCTGAGTGTTTATCATCAAGCATGCAAAGTATTAAAAGTGTCAGCTACTCCAGTGGGAAGAGAGGTTGAATTTACAACACAGCAACCAGTCAAGTTAATAGTGGAGGACACACTGTCTTTCCAATCATTTGTTGATGCACAAGGTTCTAAAACTAATGCTGATGTTGTTCAGTTTGGTTCAAACGTACTTGTGTACGTGTCGAGCTACAATGAAGTTGATACCTTGGCTAAGCTCCTAACAGACAAGAATATGATGGTCACAAAGGTTGATGGCAGAACAATGAAGCACGGTTGCCTAGAAATTGTCACAAAAGGAACCAGTGCGAGACCACATTTTGTTGTAGCAACCAACATAATTGAGAATGGAGTGACTTTGGACATAGACGTGGTTGTAGATTTTGGGTTAAAAGTCTCACCGTTCTTGGACATTGACAATAGGAGCATTGCTTACAATAAGGTGAGTGTTAGCTATGGTGAGAGAATTCAAAGGCTGGGTCGTGTTGGACGCTTCAAGAAAGGAGTAGCATTGCGCATTGGACACACTGAGAAGGGAATTATTGAAATTCCAAGCATGATCGCTACAGAGGCAGCTCTTGCTTGCTTTGCATATAACTTACCAGTGATGACAGGAGGCGTCTCAACTAGTCTGATTGGCAATTGTACTGTGCGCCAAGTTAAAACAATGCAGCAATTTGAATTGAGTCCCTTCTTTATCCAGAATTTCGTTGCCCATGATGGATCAATGCATCCTGTCATACATGACATTCTTAAAAAGTATAAACTTCGAGATTGTATGACACCTTTGTGCGATCAGTCTATACCATACAGGGCATCGAGCACTTGGTTATCGGTTAGTGAATATGAGCGACTTGGAGTGGCCTTAGAAATTCCAAAGCAAGCCAAAATTGCATTCCATATCAAAGAGATCCCTCCTAAGCTCCACGAAATGCTTTGGGAAACGGTTGTCAAGTACAAAGACGTTTGCTTATTTCCAAGCATTCGAGCATCGTCCATCAGCAAAATCGCATACACATTGCGTACAGACCTCTTCGCCATCCCAAGAACTCTAATATTGGTGGAGAGACTGCTTGAAGAGGAGCGAGTGAAGCAGAGCCAATTCAGAAGTCTCATCGATGAAGGATGCTCAAGCATGTTTTCAATTGTCAACCTGACAAACACTCTCAGAGCTAGATATGCAAAAGATTACACCGCAGAGAACATACAAAAACTTGAGAAAGTGAGAAGTCAATTGAAAGAATTCTCAAATTTGGATGGTTCTGCATGTGAGGAAAATTTAATAAAGAGGTATGAGTCTTTGCAGTTCGTTCATCACCAAGCTGCGACGTCACTTGCAAAGGATCTCAAGTTAAAGGGGACTTGGAAGAAGTCATTAGTGGCCAAAGACTTGATCATAGCAGGCGCTGTTGCAATTGGTGGAATAGGACTCATATATAGTTGGTTCACACAATCAGTTGAGACTGTGTCTCACCAAGGGAAAAATAAATCCAAAAGAATTCAAGCCTTGAAGTTTCGCCATGCTCGTGACAAAAGGGCTGGCTTTGAAATTGACAACAATGATGACACAATAGAGGAATTCTTTGGATCTGCATACAGGAAAAAGGGAAAAGGTAAAGGTACCACAGTTGGTATGGGCAAGTCAAGCAGGAGGTTCATCAACATGTATGGGTTTGATCCAACAGAGTACTCATTCATCCAATTCGTTGATCCACTCACTGGGGCGCAAATAGAAGAGAATGTCTATGCTGACATTAGAGATATTCAAGAGAGATTTAGTGAAGTGCGAAAGAAAATGGTTGAGAATGATGACATTGAAATGCAAGCCTTGGGTGGTAACACGACCATACATGCATACTTTAGGAAAGATTGGTCTGACAAAGCTTTGAAGATTGATTTAATGCCACATAATCCACTCAAAGTTTGTGACAAGACAAATGGCATTGCC
AAATTTCCTGAGAGAGAGCTCGAACTAAGGCAGACTGGGCCAGCTGTAGAAGTCGATGTGAAGGACATACCAGCACAGGAAGTGGAGCATGAAGCTAAATCGCTCATGAGAGGCTTGAGAGACTTCAACCCAATTGCCCAAACAGTTTGTAGGCTGAAAGTATCTGTTGAATATGGGACATCAGAGATGTACGGTTTTGGATTTGGAGCATACATAATAGCGAACCACCATTTGTTCAGGAGTTACAATGGTTCCATGGAGGTGCGATCCATGCACGGTACATTCAGGGTGAAGAATCTACACAGTTTGAGCGTTCTGCCAATTAAAGGTAGGGATATCATCCTCATCAAAATGCCGAAAGATTTCCCTGTCTTTCCACAGAAATTGCATTTCCGAGCTCCTACACAGAATGAAAGAGTTTGTTTAGTTGGAACCAACTTTCAGGAGAAGTATGCATCGTCGATCATCACAGAAACAAGCACCACTTACAATATACCAGGCAGCACATTCTGGAAGCATTGGATTGAAACAGATAATGGACATTGTGGATTACCAGTGGTGAGCACCACCGATGGATGTCTAGTCGGAATTCACAGTTTGGCAAACAACAAACACACCACGAACTACTACTCAGCCTTCGATGAAGATTTTGAAAGCAAGTATCTCCGAACCAATGAGCACAATGAATGGGTCAAGTCTTGGATTTATAATCCAGACACAGTGTTGTGGGGCCCGTTGAAACTTAAAGACAGCACTCCCAAAGGATTATTCAAAACAACAAAGCTTGTGCAAGATCTAATCGATCATGATGTAGTGGTGGAGCAAGCTAAGCACTCTGCGTGGATGTTTGAAGCCTTGACAGGAAATTTGCAAGCTGTCGCAACAATGAAGAGCCAATTAGTAACCAAGCATGTAGTTAAAGGAGAGTGTCGACACTTCAAAGAATTCCTGACTGTGGATGCAGAAGCAGAGGCATTCTTCAGGCCTTTGATGGATGCGTATGGGAAAAGCTTGCTGAATAGAGATGCATACATCAAGGACATAATGAAGTATTCAAAACCTATAGATGTTGGTATCGTGGACTGTGATGCATTTGAGGAAGCCATCAATAGGGTTATCATCTACCTGCAAGTGCACGGCTTCAAGAAGTGCGCATACGTCACTGACGAGCAAGAAATTTTCAAAGCGCTCAACATGAAAGCTGCAGTTGGAGCCATGTATGGTGGCAAAAAGAAAGACTATTTTGAGCATTTCACTGATGCAGATAAGGAAGAAATAGTCATGCAAAGCTGTCTGCGATTGTATAAAGGCTTGCTTGGCATTTGGAATGGATCATTGAAGGCAGAGCTCCGGTGTAAGGAAAAGATACTTGCAAATAAGACGAGGACATTCACTGCTGCACCTTTAGACACTTTGCTGGGTGGTAAAGTGTGTGTTGATGATTTCAATAATCAATTTTATTCAAAGAATATTGAATGCTGTTGGACGGTTGGGATGACTAAGTTTTATGGTGGTTGGGATAAACTGCTGCGGCGTTTACCTGAGAATTGGGTATACTGTGATGCCGATGGCTCACAGTTTGATAGTTCACTAACTCCATACTTAATCAATGCTGTTCTCACCATCAGAAGCACATACATGGAAGATTGGGATGTGGGGTTGCAAATGTTGCGCAATTTATACACTGAGATTGTTTACACACCTATTTCAACTCCAGATGGAACAATTGTTAAGAAGTTCAGAGGAAATAACAGTGGTCAGCCTTCTACTGTTGTGGACAACTCTCTTATGGTCGTCCTTGCCATGCACTATGCTCTCATCAAAGAATGCATTGAGTTTGAAGAGATTGACAGCACGTGCGTGTTCTTTGTCAATGGTGATGATTTGCTGATTGCTGTGAATCCGGATAAAGAGGGCATTCTTGACAGATTGTCACAACACTTCTCAGATCTTGGTTTGAATTATGATTTCTCGTCAAGAACAAGAAATAAGGAGGAATT
GTGGTTTATGTCTCATAGAGGCCTACTGATTGAGGGCATGTACGTGCCGAAACTTGAAGAAGAAAGGATTGTGTCCATTCTCCAATGGGACAGAGCAGACTTGGCTGAACACAGGCTTGAGGCGATTTGCGCAGCTATGATAGAGTCCTGGGGTTATTCTGAACTAACACACCAAATCAGGAGATTCTACTCATGGTTATTGCAACAGCAACCCTTTGCAACAATAGCGCAGGAAGGGAAGGCTCCTTATATAGCAAGCATGGCATTAAGGAAATTGTATATGGATAGGGCTGTGGATGAGGAAGAGCTGAGAGCCTTCACTGAAATGATGGTCGCATTAGACGATGAGTTTGAATTTGACTCTTATGAAGTACACCATCAAGCAAATGACACAATCGATGCAGGAGGAAGCAGCAAGAAAGATGCAAGACCGGAGCAAGGCAGCATCCAGTCAAACCCGAACAAAGGAAAAGATAAGGATGTGAATGCTGGTACATCTGGGACACATACTGTGCCGAGAATCAAGGCTATCACGTCCAAAATGAGAATGCCCAAAAGCAAGGGAGCAACCGTGCTAAACCTAGAACACTTGCTTGAGTATGCTCCACAACAAATTGATATTTCAAATACTCGGGCAACTCAATCACAGTTTGATACGTGGTATGAGGCAGTGCGGATGGCATACGACATAGGAGAAACTGAGATGCCAACTGTGATGAATGGGCTTATGGTTTGGTGCATTGAAAATGGAACCTCGCCAAATGTCAACGGAGTTTGGGTTATGATGGATGGGAATGAACAAGTCGAGTACCCGTTGAAACCAATCGTTGAGAATGCAAAACCAACCCTTAGGCAAATCATGGCACATTTCTCAGATGTTGCAGAAGCGTATATAGAAATGCGCAACAAAAAGGAACCATATATGCCACGATATGGTTTAATTCGAAATCTGCGGGATGTGGGTTTAGCGCGTTATGCCTTTGACTTTTATGAGGTCACATCACGAACACCAGTGAGGGCTAGGGAAGCGCACATTCAAATGAAGGCCGCAGCATTGAAATCAGCCCAACCTCGACTTTTCGGGTTGGACGGTGGCATCAGTACACAAGAGGAGAACACAGAGAGGCACACCACCGAGGATGTCTCTCCAAGTATGCATACTCTACTTGGAGTCAAGAACATGTGA
>CF_YL21
ATGGCAATCTACACGTCAACAATCTGTTTCGGTTCGTTTGAATGCAAGCTACCATACTCACCCGCCTCTTGCGGGCATATTGTGAAGGAACGAGAAGTGCTGGCTTCCATCGATCCTTTCGCAGATCTGGAAACACAACTTAGTGCACGATTGCTCAAGCAAGAATATGCTACTGTTCGTGTGCTCAAGAACGGTACTCTTACGTACCGATACAAGACTGATGCCCAGATAACGCGCATCCAGAAGAGACTGGAAAGGAAGGATAGGGAAGAATACCACTTCCAGATGGCAGCTCCTAGTATTGTGTCAAAGATCACTATTGCTGGTGGAGAGCCACCTTCAAAACTTGAATCACAAGTGCGGAGGGGTGTCATCCACACAACTCCAAGGATGCGCACAGCAAAAACATATCACACGCCAAAGCTGACAGAGGGACAAATGAACCACCTTATCAAGCAGGTGAAGCAAATTATGTCAACCAAAGGAGGGTCTGTTCAACTGATTAGCAAGAAAAGTACCCATGTTCACTATAAAGAAGTTTTGGGATCACATCGCGCAGTTGTTTGCACTGCACATATGAGAGGTTTACGAAAGAGAGTGGACTTTCGGTGTGATAAATGGACCGTTGTGCGTCTACAGCATCTCGCCAGGACGGATAAGTGGAATAACCAAGTTCGTGTTACTGATCTACGCAAGGGCGATAGTGGAGTTATATTGAGTAATACTAATCTCAAAGGAAACTTTGGGAGAAGCTCGGAGGGCATATTCATAGTGCGTGGGTCGCACGAAGGAAAAATCTATGATGCACGTTCCAAGGTTACTCAAGGGGTTATGGATTCAATGGTTCAGTTCTCAAGCGCTGAAAGCTTTTGGAAGGGATTGGACGGCAATTGGGCACAAATGAGATATCCTACAGATCATACATGTGTGGCAGGCTTACCAGTTGAAGACTGTGGCAGAGTTGCAGCGATAATGACACACAGTATTTTACCGTGCTATAAGATAACCTGCCCTACCTGTGCCCAACAATATGCCAACTTGCCAGCCAGTGACTTACTTAAGATATTACACAAGCACGCAAGTGATGGTTTAAATCGATTGGGGGCAGACAAAGATCGCTTTGTGCATGTCAAAAAGTTCTTGACAATCTTAGAGCACTTAACTGAACCGGTTGATCTGAGTCTAGAAATTTTCAATGAAGTATTCAAGTCTATAGGGGAGAAGCAACAATCACCTTTCAAAAACCTGAATATTCTGAATAATTTCTTTTTAAAAGGAAAGGAAAATACAGCTCGTGAATGGCAGGTGGCTCAATTAAGCTTACTTGAATTGGCAAGATTCCAAAAGAACAGAACGGATAATATCAAGAAAGTAGACATCTCGTTCTTTAGGAATAAACTATCTGCCAAAGCAAATTGGAACTTGTATCTGTCATGTGATAACCAGCTGGATAAGAATGCAAACTTCCTGTGGGGACAGAGGGAATATCATGCTAAGCGATTTTTCTCGAACTATTTCGAGGAAATTGATCCAGCGAAGGGCTATTCAGCATACGAAAATCGTTTGCATCCGAATGGGACAAGAAAACTTGCAATTGGAAACCTAATTGTACCACTTGATCTGGCTGAGTTTAGGCGGAAGATGAAAGGTGATTTTAAAAGACAGCCAGGGGTGAGTAAGAAGTGCACGAGCTCGAAGGATGGAAACTACGTGTATCCCTGTTGTTGCACTACACTTGATGATGGCTCAGCTGTTGAATCAACATTTTACCCGCCAACTAAGAAGCGCCTCGTAATAGGTAATAGTGGCGACCAAAAGTATGTTGACTTACCAAAAGGGAATTCTGAGATGTTATATATTGCCAGGCAAGGCTTCTGTTACATTAACATTTTCCTCGCGATGTTGATTAACATTAGTGAGGAAGATGCAAAGGATTTCACTAAGAAGGTTCGTGACATGTGTGTGCCAAAGCTCGGAACCTGGCCAACCATGATGGATCT
GGCTACAACTTGTGCTCAAATGAAAATATTCTACCCTGATGTTCATGATGCAGAACTGCCTAGAATACTAGTCGATCACGAAACGCAGACATGCCATGTGGTTGACTCGTTTGGCTCACAAACAACTGGGTATCATATTTTGAAAGCATCTAGCGTGTCCCAACTTATTTTGTTTGCTAATGATGAGTTGGAGTCTGACATTAAGCACTATAGAGTTGGTGGTATTCCTAATGCATGCCCTGAACTTGGGTCCACAATATCACCTTTCAGAGAAGGAGGAGTTATAATGTCTGAGTCGGCAGCGCTGAAACTGCTTTTGAAGGGAATTTTTAGACCTAAGGTGATGAGACAGTTGCTGTTAGATGAGCCTTACCTGTTGATTCTATCAATATTATCCCCTGGCATACTGATGGCTATGTATAATAATGGGATTTTTGAACTTGCGGTGAGGTTGTGGATTAATGAGAAGCAATCCATAGCGATGATAGCATCGCTACTATCAGCTTTAGCCCTACGAGTGTCAGCGGCAGAAACACTCGTCGCACAGAGGATTATAATTGATGCTGCAGCTACAGACCTCCTCGATGCTACGTGTGACGGGTTCAACCTGCATCTAACGTACCCCACTGCATTGATGGTGTTGCAAGTTGTTAAGAATAGAAATGAATGTGATGATACTCTATTCAAGGCGGGTTTTCCAAGTTACAACACGAGCGTCGTACAGATTATGGAAAAAAATTATCTAAATCTCTTGAACGATGCTTGGAAAGATTTAACTTGGCGGGAAAAATTTTCCGCAACATGGTACTCATACAGAGCAAAACGCTCTATCACTCGGTACATAAAACCCACAGGAAGGGCAGATTTGAAAGGGTTATACAACATATCACCACAAGCATTCTTGGGCCGAAGCGTCCAGGTGGTCAAAGGCACTGCCTCAGGATTGAGCGAGCGATTTAATAATTACTTCAATACTAAGTGTGTAAATATTTCATCCTTTTTCATTCGTAGAATCTTTAGGCGTTTGCCAACTTTCGTCACTTTTGTTAACTCATTATTAGTTATTAGTATGTTAACTAGTGTAGTGGCAGTGTGTCAGGCAATAATTTTAGATCAGAAGAAGTATAGGAGAGAGATCGAGTTGATGCAGATAGAGAAGAATGAGATTGTCTGCATGGAGCTATACGCAAGTTTACAGCGCAAACTTGAACGCGATTTCACATGGGATGAGTACATTGAGTATTTGAAATCAGTGAACCCTCAGATAGTTCAATTTGCTCAAGCGCAGATGGAAGAATATGATGTGCGACACCAGCGTTCCACACCAGGTGTTAAAAATTTGGAACAAGTGGTAGCATTTATGGCTTTAGTCATCATGGTGTTCGATGCTGAAAGGAGTGACTGCGTGTTCAAAACTCTCAATAAATTTAAGGGTGTCCTTTCCTCACTGGACCATGAAGTTAGACATCAGTCCTTGGATGATGTGATCAAGAATTTTGATGAGAGGAATGAGACTATTGATTTTGAATTGAGTGAGGACACAATTCGAACATCATCAGTGCTAGATACAAAGTTTAGTGATTGGTGGGACCGACAAATCCAAATGGGACATACACTTCCACATTACAGAACCGAGGGGCACTTCATGGAATTTACAAGGGCAACTGCTGTTCAAGTGGCTAATGACATTGCCCACAGCGAACACCTAGACTTTCTAGTAAGGGGAGCTGTTGGGTCTGGAAAGTCAACTGGGTTGCCTGTTCATCTTAGTGTAGCTGGATCTGTGCTTTTAATTGAGCCAACGCGGCCACTAACGGAGAACGTTTTCAAACAGCTATCTAGTGAACCATTCTTCAAGAAGCCAACACTGCGTATGCGTGGAAATAGTATATTTGGCTCTTCCCCAATCTCCGTTATGACTAGCGGGTTTGCGCTACACTACTTCGCCAATAATCGCTCTCAATTAGCTCAGTTCAATTTTGTAATATTTGATGAGTGCC
ATGTTCTGGATCCTTCCGCAATGGCGTTCCGCAGTCTGCTGAGTGTTTATCATCAAGCATGTAAAGTATTAAAAGTGTCAGCTACTCCAGTGGGAAGAGAGGTTGAATTCACAACACAGCAGCCAGTCAAGTTAATAGTGGAAGACACACTGTCTTTCCAATCATTTGTTGATGCACAAGGTTCTAAAACTAATGCTGATGTTGTCCAGTTTGGTTCAAACATACTTGTTTACGTGTCGAGCTACAATGAAGTTGACAACTTGGCTAAGCTCCTAACAGATAAGAATATGATGGTCACAAAGGTTGATGGCAGAACAATGAAGCACGGTTGCCTAGAAATTGTCACAAAAGGAACCAGTGCGAGACCACATTTTGTTGTAGCAACCAACATAATTGAGAATGGAGTGACTTTGGACATAGATGTAGTAGTGGATTTTGGGTTGAAAGTCTCACCGTTCTTGGACATTGACAATAGGAGCATTGCTTACAATAAGGTGAGTGTTAGTTATGGTGAGAGAATTCAAAGGCTGGGTCGTGTTGGACGCTTCAAGAAAGGAGTAGCATTGCGCATTGGACACACTGAGAAGGGAATTATCGAAATTCCAAGCATGATCGCTACTGAGGCGGCTCTTGCTTGCTTTGCATATAACTTACCAGTGATGACAGGAGGCGTCTCAACTAGTCTGATTGGCAATTGTACTGTGCGCCAGGTCAAAACAATGCAGCAATTTGAATTGAGCCCCTTCTTTATCCAGAATTTCGTTGCTCATGATGGATCAATGCATCCTGTCATACATGACATTCTCAAAAAGTATAAACTCCGAGATTGTATGACACCTTTGTGCGATCAGTCTATACCATACAGGGCATCGAGCACTTGGTTATCGGTTAGTGAATATGAGCGACTTGGAGTGGCTTTAGAAATTCCAAAGCAAGTCAAAATTGCATTCCATATCAAAGAGATCCCTCCTAAGCTCCACGAAATGCTTTGGGAAACGGTTGTCAAGTACAAAGACGTTTGCTTATTTCCAAGCATTCGAGCATCGTCCATCAGCAAAATCGCATACACATTGCGTACAGACCTCTTCGCCATCCCAAGAACTCTAATATTGGTGGAGAGATTGCTTGAAGAGGAGCGAGTGAAGCAGAGCCAATTCAGAAGTCTCATCGATGAAGGGTGCTCAAGCATGTTTTCAATTGTCAACTTGACCAACACTCTCAGGGCCAGATATGCAAAAGATTACACCGCAGAGAACATACAAAAACTTGAGAAAGTGAGAAGTCAATTAAAAGAATTCTCAAATTTGGATGGTTCTGCATGTGAGGAAAATTTAATAAAGAGGTATGAGTCTTTGCAGTTCGTTCACCACCAAGCTGCGACGTCACTTGCAAAGGATCTCAAGTTGAAGGGGACTTGGAAGAAGTCATTAGTGGCTAAAGACTTGATCATAGCAGGCGCTGTTGCAATTGGTGGAATAGGACTCATATATAGTTGGTTCACACAATCAGTTGAGACTGTGTCTCACCAAGGGAAAAATAAATCCAAAAGAATCCAAGCCTTGAAGTTTCGCCATGCTCGTGACAAAAGGGCTGGCTTTGAAATTGACAACAATGATGACACAATAGAGGAATTCTTTGGATCTGCATACAGGAAGAAGGGAAAAGGTAAAGGTACCACAGTTGGTATGGGCAAGTCAAGCAGGAGGTTCATTAACATGTACGGGTTTGACCCGACAGAGTACTCATTCATCCAATTCGTTGATCCACTCACTGGGGCACAAATAGAAGAGAATGTCTATGCTGACATTAGAGATGTTCAAGAAAGATTTAGTGAGGTGCGAAAGAAAATGGTTGAGAATGATGATATTGAAATGCAAGCCTTGGGTAGTAACACGACCATACATGCATACTTTAGGAAAGATTGGTCTGACAAAGCTTTGAAGATTGATTTAATGCCACATAACCCACTCAAAGTTTGTGACAAAACAAATGGCATTGCC
AAACTTCCTGAGAGAGAGTTCGAACTAAGGCAGACTGGGCCAGCTGTAGAAGTCGACGTGAAGGACATACCAGCGCAGGAAGTGGAGCATGAAGCCAAATCGCTCATGAGAGGCTTGAGAGACTTCAACCCAATTGCTCAAACAGTTTGTAGGCTGAAAGTATCTGTTGAATATGGGACATCAGAGATGTACGGTTTTGGATTTGGAGCATACATAATAGCGAACCACCATTTGTTTAGGAGTTACAATGGTTCCATGGAGGTGCGATCCATGCATGGTACATTCAGGGTGAAGAATCTACACAGTTTGAGCGTTCTGCCAATTAAAGGCAGGGACATCATCCTCATCAAAATGCCGAAGGATTTCCCTGTTTTTCCACAGAAATTGCATTTCCGAGCTCCTACACAGAATGAAAGAGTTTGTTTAGTTGGGACCAACTTTCAGGAGAAGTATGCATCGTCGATCATCACAGAAACAAGCACTACTTACAATATACCAGGCAGCACATTCTGGAAGCATTGGATTGAAACAGATAATGGACATTGTGGACTACCAGTGGTGAGCACAGCCGATGGATGTCTAGTCGGAATTCATAGTTTGGCAAACAATGCACACACCACGAACTACTACTCAGCCTTCGATGAAGATTTTGAAAGCAAGTACCTCCGAACCAATGAGCACAATGAATGGGTCAAATCTTGGGTTTATAATCCAGACACAGTGTTGTGGGGCCCGTTGAAGCTTAAAGATAGCACTCCCAAAGGGTTATTCAAAACAACAAAGCTTGTGCAAGATCTAATCGATCATGATGAAGTCGCGGAGCAAGCTAAGCACTCTGCGTGGATGTTTGAAGCCTTGACAGGAAATTTGCAAGCTGTCGCAACAATGAAGAGTCAATTAGTAACCAAGCATGTAGTTAAAGGAGAGTGTCGACACTTCAAAGAATTCCTGACTGTGGATGCAGAAGCAGAGGCATTCTTCAGGCCTTTGATGGATGCATATGGGAAAAGCTTGCTGAATAGAGATGCGTACATCAAAGACATAATGAAGTACTCAAAACCTATAGATGTTGGTATCGTGGACTGTGATGCATTTGAAGAAGCCATCAATAGGGTTATCATCTACTTGCAAGTGCACGGCTTCAAGAAGTGTGCATATGTCACTGACGAGCAAGAAATTTTCAAAGCGCTCAACATGAAAGCTGCAGTCGGAGCCATGTATGGTGGCAAAAAGAAAGACTATTTTGAGCATTTCACTGATGCAGATAAGGAAGAAATAGTCATGCAAAGCTGTCTGCGACTGTATAAAGGCTTGCTCGGCATTTGGAACGGATCGTTGAAGGCAGAGCTCCGGTGTAAGGAAAAGATACTTGCAAACAAGACGAGGACGTTCACTGCTGCACCTCTAGACACTTTGCTGGGTGGTAAAGTGTGTGTTGATGACTTCAATAATCAATTTTATTCAAAGAATATTGAATGCTGTTGGACAGTTGGGATGACTAAGTTTTATGGTGGTTGGGATAAACTGCTTCGGCGTTTACCTGAGAATTGGGTATACTGTGATGCCGATGGCTCACAGTTTGATAGTTCACTAACTCCATACTTAATCAATGCTGTTCTCACCATCAGAAGCACATACATGGAAGACTGGGATGTGGGGTTGCAAATGCTGCGTAATTTATACACTGAGATTATTTACACACCTATCTCAACTCCAGATGGAACAATTGTTAAGAAGTTCAGAGGAAATAACAGTGGTCAGCCTTCTACTGTTGTGGACAACTCTCTTATGGTTGTCCTTGCCATGCACTATGCTCTCATCAAAGAATGCATTGAGTTTGAAGAGATTGACAGCACGTGCGTGTTCTTCGTCAATGGTGATGATTTGCTGATTGCCGTGAATCCGGATAAAGAGGGCATTCTTGATAGATTGTCACAGCACTTCTCAGATCTTGGTTTAAATTATGATTTTTCGTCAAGAACAAGAAATAAGGAAGAGTT
ATGGTTTATGTCTCATAGAGGCCTACTGATTGAGGGCATGTACGTGCCGAAACTTGAAGAAGAAAGGATTGTGTCCATTCTCCAATGGGACAGAGCAGACTTGGCTGAACACAGGCTTGAGGCGATTTGCGCAGCTATGATAGAGTCCTGGGGTTATTCTGAACTAACACACCAAATTAGAAGATTCTACTCATGGTTATTGCAACAGCAACCTTTTGCAACAATAGCGCAGGAAGGGAAGGCTCCTTATATAGCAAGCATGGCATTAAGGAAACTGTATATGGATAGAGCTGTGGATGAGGAAGAGCTGAGAGCCTTCACTGAAATGATGGTCGCATTAGACGATGAGTTTGAGTGTGACTCTTATGAAGTACACCACCAGGGAAACGATACAATCGATGCAGGAGGAAGCAGCAAGAAAGATGCAAGACCAGAGCAAGGCAGCATCCAGTCAAACCCGAACAAAGGAAAAGATAAGGATGTGAATGCTGGTACATCTGGGACACATACTGTGCCGAGAATCAAGGCTATCACGTCCAAAATGAGAATGCCCAAAAGCAAAGGAGCAACCGTGCTAAACTTAGAACACTTGCTTGAGTATGCTCCACAACAAATTGATATTTCAAATACTCGGGCAACTCAATCACAGTTTGATACGTGGTATGAGGCAGTGCGGATGGCATACGACATAGGAGAAACTGAGATGCCAACTGTGATGAATGGGCTTATGGTTTGGTGCATTGAAAATGGAACCTCGCCAAATGTCAACGGAGTCTGGGTTATGATGGATGGGGATGAACAAGTCGAGTACCCGTTGAAACCAATCGTTGAGAATGCAAAACCAACCCTTAGGCAAATCATGGCACATTTCTCAGATGTTGCAGAAGCGTATATAGAAATGCGCAACAAAAAGGAACCATATATGCCACGATATGGTTTAATTCGAAATCTGCGGGATGTGGGTTTAGCGCGTTATGCCTTTGACTTTTATGAGGTCACATCACGAACACCAGTGAGGGCTAGGGAAGCGCACATTCAAATGAAGGCCGCAGCATTGAAATCAGCCCAACCTCGACTTTTCGGGTTGGACGGTGGCATCAGTACACAAGAGGAGAACACAGAGAGGCACACCGCCGAGGATGTCTCTCCAAGTATGCATACTCTACTTGGAGTCAAAAACATGTGA
================================================
FILE: inst/extdata/GVariation/C.Wilga5.fas
================================================
>Wilga5
ATGGCAACTTACATGTCAACAATCTGTTTCGGTTCGTTTGAATGCAAGCTACCATACTCACCCGCCTCTTGCGGGCATATTGTGAAGGAACGAGAAGTGCTGGCTTCCGTTGATCCTTTCGCAGATCTGGAAACACAACTTAGTGCACGATTGCTCAAGCAAGAATATGCTACTGTTCGTGTGCTCAAGAACGGTACTCTTACGTACCGATACAAGACTGATGCCCAGATAACGCGCATCCAGAAGAAACTGGAAAGGAAGGATAGGGAAGAATATCACTTCCAGATGGCAGCTCCTAGTATTGTGTCAAAGATCACTATTGCTGGTGGAGAGCCACCTTCAAAACTTGAATCACAAGTGCGGAGGGGTGTCATCCACACAACTCCAAGGATGCGCACAGCAAAAACATATCACACGCCAAAGTTGACAGAGGGACAAATGAACCACCTTATCAAGCAGGTGAAGCAAATTATGTCAACCAAAGGAGGGTCTGTTCAACTGATTAGCAAGAAAAGTACCCATGTTCACTATAAAGAAGTTTTGGGATCACATCGCGCAGTTGTTTGCACTGCACATATGAGAGGTTTACGAAAGAGAGTGGACTTTCGGTGTGATAAATGGACCGTTGTGCGTCTACAGCATCTCGCCAGGACGGACAAGTGGAATAACCAAGTTCGTGCTACTGATCTACGCAAGGGCGATAGTGGAGTTATATTGAGTAATACTAATCTCAAAGGAAGCTTTGGGAGAAGCTCGGAGGGCATATTCATAGTGCGTGGGTCGCACGAAGGAAAAATCTATGATGCACGTTCCAAGGTTACTCAAGGGGTTATGGATTCAATGGTTCAGTTCTCAAGCGCTGAAAGCTTCTGGAAGGGATTGGACGGCAATTGGGCACAAATGAGATATCCTACAGATCATACATGTGTGGCAGGCTTACCAGTTGAAGACTGCGGCAGAGTTGCAGCGATAATGACACACAGTATTTTACCGTGCTATAAGATAACCTGCCCTACCTGTGCCCAACAATATGCCAATTTGCCAGCCAGTGACTTACTTAAGATATTACACAAGCACGCAAGTGATGGTTTAAATCGATTGGGGGCAGACAAAGATCGCTTTGTGCACGTCAAAAAGTTCTTGACAATCTTAGAGCACTTAACTGAACCGGTTGATCTGGGTCTAGAAATTTTCAATGAAGTATTCAAGTCTATAGGGGAGAAGCAACAATCACCTTTCAAAAACCTGAATATTCTGAATAATTTCTTTTTGAAAGGAAAAGAAAATACAGCTCGTGAATGGCAGGTGGCTCAATTAGGCTTACTTGAATGGGCAAGATTCCAAAAGAACAGAACGGATAATATCAAGAAAGGAGACATCTCGTTCTTTAGGAATAAACTATCTGCCAAAGCAAATTGGAACTTGTATCTGTCATGTGATAACCAGCTGGATAAGAATGCAAACTTCCTGTGGGGACAGAGGGAATATCATGCTAAGCGATTTTTCTCGAACTATTTCGAGGAAATTGATCCAGCGAAGGGCTATTCAGCATATGAAAATCGTTTGCATCCGAATGGGACAAGAAAACTTGCAATTGGAAACCTAATTGTACCACTTGATCTGGCTGAGTTTAGGCGGAAGATGAAAGGTGATTATAAAAGACAGCCAGGGGTGAGTAAGAAGTGCACGAGCTCGAAGGATGGAAACTACGTGTATCCCTGTTGTTGCACTACACTTGATGATGGCTCAGCTGTTGAATCAACATTTTACCCGCCAACTAAGAAGCACCTTGTAATAGGTAATAGTGGCGACCAAAAGTATGTTGACTTACCAAAAGGGAATTCTGAGATGTTATATATTGCCAGGCAAGGCTTTTGTTACATTAACATTTTCCTCGCGATGTTGATTAACATTAGTGAGGAAGATGCAAAGGATTTCACTAAGAAGGTTCGTGACATGTGTGTGCCAAAGCTTGGAACCTGGCCAACCATGATGGATCT
GGCTACAACTTGTGCTCAAATGAAAATATTCTACCCTGATGTTCATGATGCAGAACTGCCTAGAATACTAGTCGATCACGAAACGCAGACATGCCATGTGGTTGACTCGTTTGGCTCACAAACAACTGGGTATCATATTTTGAAAGCATCTAGCGTGTCCCAACTTATTTTGTTTGCTAATGATGAGTTGGAGTCTGACATTAAACATTATAGAGTTGGTGGTGTTCCTAATGCATGCCCTGAACTTGGGTCCACAATATCACCTTTCAGAGAAGGAGGAGTTATAATGTCTGAGTCGGCAGCTCTGAAACTGCTTTTGAAGGGAATTTTTAGACCTAAGGTGATGAGACAGTTGTTGTTAGATGAGCCTTATCTGTTGATTCTATCAATATTATCCCCTGGCATACTGATGGCTATGTACAATAATGGGATTTTTGAACTTGCGGTAAGGTTGTGGATTAATGAGAAACAATCCATAGCTATGATAGCATCGCTACTATCAGCTTTAGCCCTACGAGTGTCAGCGGCAGAAACACTCGTCGCACAGAGGATTATAATTGATACTGCAGCTACAGATCTCCTTGATGCTACGTGCGATGGGTTCAACCTACATCTAACGTACCCCACTGCGTTGATGGTGTTGCAAGTTGTTAAGAATAGAAATGAATGTGATGATACCCTATTCAAGGCGGGTTTTCCAAGTTACAACACGAGCGTCGTACAGATCATGGAAAAAAATTATCTAAATCTCTTAAACGATGCTTGGAAAGATTTAACTTGGCGGGAAAAATTATCCGCAACATGGTACTCATACAGAGCAAAACGCTCTATCACTCGGTACATAAAACCCACAGGAAGGGCAGATTTGAAAGGGTTATACAACATATCACCACAAGCATTCTTGGGCCGAGGCGCCAAGGTGGTCAAAGGCACTGCCTCAGGATTGTGCGAGCGATTTAATAATTATTTCTACACTAAGTGTGTAAATATTTCATCCTTTTTCATTCGTAGAATCTTTAGGCGTTTGCCAACTTTCGTCACTTTTGTTAACTCATTATTAGTTATTAGTATGTTAACCAGCGTAGTGGCAGTCTGTCAGGCAATAATTTTAGATCAGAGGAAGTATAGGAGAGAAATCGAGTTGATGCAGATAGAGAAGAATGAGATCGTCTGCATGGAGCTATATGCAAGTTTACAGCGCAAACTTGAACGCGATTTCACATGGGATGAGTACATTGAGTATTTGAAGTCAGTAAACCCTCAGATAGTTCAGTTTGCTCAAGCGCAGATGGAAGAATATGATGTGCGACACCAGCGTTCCACACCAGGTGTTAAAAATTTGGAACAAGTGGTAGCATTTATGGCTTTAGTCATCATGGTGTTCGATGCTGAAAGGAGTGATTGCGTGTTCAAAACTCTCAATAAATTTAAGGGTGTCCTTTCCTCACTGGACCATGAAGTTAGACATCAGTCCTTAGACGATGTGATCAAGAATTTTGATGAGAGGAATGAGGTTATTGATTTTGAATTGAGTGAGGACACAATTCGAACATCATCAGTGCTAGATACAAAGTTTAGTGATTGGTGGGACCGACAAATCCAGATGGGGCATACACTTCCACATTACAGAACCGAGGGGCACTTCATGGAATTCACAAGAGCAACTGCTGTCCAAGTGGCTAATGACATTGCCCATAGCGAACACCTAGACTTTCTAGTAAGGGGAGCTGTTGGGTCTGGAAAGTCAACTGGGTTGCCTGTTCATCTTAGTGTAGCCGGATCTGTGCTTTTAATTGAACCAACGCGACCACTAGCGGAGAACGTTTTCAAACAGCTATCTAGTGAACCATTCTTCAAGAAGCCAACACTGCGTATGCGTGGAAATAGTATATTTGGCTCTTCTCCAATCTCCGTCATGACTAGCGGATTTGCGCTACACTACTTCGCCAATAATCGCTCTCAATTAGCTCAGTTCAACTTTGTAATATTTGATGAGTGCC
ATGTTCTGGATCCTTCCGCAATGGCGTTCCGCAGTCTGCTGAGTGTTTATCATCAAGCATGCAAAGTATTAAAAGTGTCAGCTACTCCAGTGGGAAGAGAGGTTGAATTTACAACACAGCAACCAGTCAAGTTAATAGTGGAGGACACACTGTCTTTTCTATCATTTGTTGATGCACAAGGTTCTAAAACTAATGCTGATGTTGTTCAGTTTGGTTCAAACGTACTTGTGTACGTGTCGAGCTACAATGAAGTTGATACCTTGGCTAAGCTCCTAACAGACAAGAATATGATGGTCACAAAAGTTGATGGCAGAACAATGAAGCACGGTTGCCTAGAAATTGTCACAAAAGGAACCAGTGCGAGACCACATTTTGTTGTAGCAACCAACATAATTGAGAATGGAGTGACTTTGGACATAGACGTGGTTGTAGATTTTGGGTTGAAAGTCTCACCGTTCTTGGACATTGACAATAGGAGCATTGCTTACAATAAGGTGAGTGTTAGCTATGGTGAGAGAATTCAAAGGCTGGGTCGTGTTGGACGCTTCAAGAAAGGAGTAGCATTGCGCATTGGACACACTGAGAAGGGAATTATTGAAATTCCAAGCATGATCGCTACAGAGGCGGCTCTTGCTTGCTTTGCATATAACTTGCCAGTGATGACAGGAGGCGTCTCAACTAGTCTGATTGGCAATTGTACTGTGCGCCAAGTTAAAACAATGCAGCAATTTGAATTGAGTCCCTTCTTTATCCAGAATTTTGTTGCCCATGATGGATCAATGCATCCTGTCATACATGACATTCTTAAAAAGTATAAACTTCGAGATTGTATGACACCTTTGTGCGATCAGTCTATACCATACAGGGCATCGAGCACTTGGTTATCGGTTAGTGAATATGAGCGACTTGGAGTGGCCTTAGAAATTCCAAAGCAAGTCAAAATTGCATTCCATATCAAAGAGATCCCTCCTAAGCTCCATGAAATGCTTTGGGAAACGGTTGTCAAGTACAAAGACGTTTGCTTATTTCCAAGCATTCGAGCATCGTCCATCAGCAAAATCGCATACACATTGCGTACAGACCTTTTCGCCATCCCAAGAACTCTAATATTGGTGGAGAGACTGCTTGAAGAGGAGCGAGTGAAGCAGAGCCAATTCAGAAGTCTCATCGACGAAGGATGCTCAAGCATGTTTTCAATTGTCAACTTGACAAACACTCTCAGAGCTAGATATGCAAAAGATTACACCGCAGAGAACATACAAAAACTTGAGAAAGTGAGAAGTCAATTGAAAGAATTCTCAAATTTGGATGGTTCTGCATGTGAGGAAAATTTAATAAAGAGGTATGAGTCTTTGCAGTTCGTTCATCACCAAGCTGCGACGTCACTTGCAAAGGATCTCAAGTTGAAGGGGACTTGGAAGAAGTCATTAGTGGCCAAAGACTTGATCATAGCAGGCGCTGTTGCAATTGGTGGAATAGGACTCATATATAGTTGGTTCACACAATCAGTTGAGACTGTGTCTCACCAAGGGAAAAATAAATCCAAAAGAATCCAAGCCTTGAAGTTTCGCCATGCTCGTGACAAAAGGGCTGGCTTTGAAATTGACAACAATGATGACACAATAGAGGAATTCTTTGGATCTGCATACAGGAAAAAGGGAAAAGGTAAAGGTACCACAGTTGGTATGGGCAAGTCAAGCAGGAGGTTCATCAACATGTATGGGTTTGACCCAACAGAGTACTCATTCATCCAATTCGTTGATCCACTCACTGGGGCGCAAATAGAAGAGAATGTCTATGCTGACATTAGAGATATTCAAGAGAGATTTAGTGAAGTGCGAAAGAAAATGGTTGAGAATGATGACATTGAAATGCAAGCCTTGGGCAGTAACACGACCATACATGCATACTTCAGGAAAGATTGGTCTGACAAAGCTTTGAAGATTGATTTAATGCCACATAATCCACTCAAAGTTTGTGACAAGACAAATGGCATTGCC
AAATTTCCTGAGAGAGAACTCGAACTAAGGCAGACTGGGCCAGCTGTAGAAGTCGACGTGAAGGACATACCAGCACAGGAGGTGGAGCATGAAGCTAAATCGCTCATGAGAGGCTTGAGAGACTTCAACCCAATTGCCCAAACAGTTTGTAGGCTGAAAGTATCTGTTGAATATGGGACATCAGAGATGTACGGTTTTGGATTTGGAGCGTACATAATAGCGAACCACCATTTGTTCAGGAGTTACAATGGTTCCATGGAGGTGCGATCCATGCACGGTACATTCAGGGTGAAGAATCTACACAGTTTGAGTGTTCTGCCAATTAAAGGTAGGGATATCATCCTCATCAAAATGCCGAAAGATTTCCCTGTCTTTCCACAGAAATTGCATTTCCGAGCTCCTACACAGAATGAAAGAGTTTGTTTAGTTGGAACCAACTTTCAGGAGAAGTATGCATCGTCGATCATCACAGAAGGAGGCACCACTTACAATATACCAGGCAGCACATTCTGGAAGCATTGGATTGAAACAGATAATGGACATTGTGGACTACCAGTGGTGAGCACCACCGATGGATGTCTAGTCGGAATTCACAGTTTGGCAAACAACAAACACACCACGAATTACTACTCAGCCTTCGATGAAGATTTTGAAAGCAAGTATCTCCGAACCAATGAGCACAATGAATGGGTCAAGTCTTGGATTTATAATCCAGACACAGTATTGTGGGGCCCGTTGAAACTTAAAGACAGCACTCCCAAAGGATTATTCAAAACAACAAAGCTTGTGCAAGATCTAATTGATCATGATGAAGTGGTGGAGCAAGCTAAGCACTCTGCGTGGATGTTTGAAGCCTTGACAGGAAATTTGCAAGCTGTCGCAACAATGAAGAGCCAATTAGTAACCAAGCATGTAGTTAAAGGAGAGTGTCGACACTTCAAAGAATTCCTGACTGTGGATGCAGAAGCAGAGGCATTCTTCAGGCCTTTGATGGATGCGTATGGGAAAAGTTTGCTGAATAGAGATGCATACATCAAGGACATAATGAAGTATTCAAAACCTATAGATGTTGGTATCGTGGACTGTGATGCATTTGAGGAAGCCATCAATAGGGTTATCATCTACCTGCAAGTGCACGGCTTCAAGAAGTGCGCATACGTTACTGACGAGCAAGAAATTTTTAAAGCGCTCAACATGAAAGCTGCAGTCGGAGCCATGTATGGTGGCAAAAAGAAAGACTATTTTGAGCATTTCACTGATGCAGATAAGGAAGAAATAGTCATGCAAAGCTGTCTGCGATTGTATAAAGGCTTGCTTGGCATTTGGAATGGATCATTGAAGGCAGAGCTCCGGTGTAAGGAAAAGATACTTGCAAATAAGACGAGGACATTCACTGCTGCACCTTTAGACACTTTGCTGGGTGGTAAAGTGTGTGTTGATGACTTCAATAATCAATTTTATTCAAAGAATATTGAATGCTGTTGGACGGTTGGGATGACAAAGTTTTATGGTGGTTGGGATAAACTGCTGCGGCGTTTACCTGAGAATTGGGTTTACTGTGATGCCGATGGCTCACAGTTTGATAGTTCACTAACTCCATACTTAATCAATGCAGTTCTCACCATTAGAAGCACATACATGGAAGATTGGGATGTGGGGTTGCAAATGTTGCGCAATTTATACACTGAGATTGTTTACACACCTATTTCAACTCCAGATGGAACAATTGTTAAGAAGTTCAGAGGGAATAACAGTGGTCAGCCTTCTACTGTTGTGGACAACTCTCTTATGGTCGTCCTTGCCATGCACTATGCTCTCATCAAAGAATGCATTGAGTTTGAAGAGATTGACAGCACGTGCGTGTTCTTTGTCAATGGTGATGATTTGCTGATTGCTGTGAATCCGGATAAAGAGGGCATTCTTGACAGATTGTCACAACACTTCTCAGATCTTGGTTTGAATTATGATTTCTCGTCAAGAACAAGAAATAAGGAGGAGTT
GTGGTTTATGTCTCATAGAGGCCTACTGATTGAGGCAATGTACGTGCCGAAACTTGAAGAAGAAAGGATTGTGTCCATTCTCCAATGGGACAGAGCAGACTTGGCTGAACATAGGCTTGAGGCGATTTGCGCAGCTATGATAGAGTCCTGGGGTTATTCTGAACTAACACACCAAATCAGGAGATTCTACTCATGGTTATTGCAACAGCAACCCTTTGCAACAATAGCGCAGGAAGGGAAGGCTCCTTATATAGCAAGCATGGCATTAAGGAAATTGTATATGGATAGGGCTGTGGATGAGGAAGAGCTGAGAGCCTTCACTGAAATGATGGTCGCATTAGATGATGAGTTTGAATTTGACTCTTATGAAGTATACCATCAAGCAAATGACACAATCGATGCAGGAGAAAGCAGCAAGAAAGATGCAAGACCAGAGCAAGGCAGCATCCAGTCAAACCCGAACAAAGGAAAAGATAAGGATGTGAATGCTGGTACATCTGGGACACATACTGTGCCGAGAATCAAGGCTATCACGTCCAAAATGAGGATGCCCAAAAGCAAGGGAGCAACCGTGCTAAACTTAGAACACTTGCTTGAGTATGCTCCACAACAAATTGATATTTCAAATACTCGGGCAACTCAATCACAGTTTGATACGTGGTATGAGGCAGTGCGGATGGCATACGACATAGGAGAAACTGAGATGCCAACTGTGATGAATGGGCTTATGGTTTGGTGCATTGAAAATGGAACCTCGCCAAATGTCAACGGAGTTTGGGTTATGATGGATGGGAATGAACAAGTCGAGTACCCGTTGAAACCAATCGTTGAGAATGCAAAACCAACCCTTAGGCAAATCATGGCACATTTCTCAGATGTTGCAGAAGCGTATATAGAAATGCGCAACAAAAAGGAACCATATATGCCACGATATGGTTTAATTCGAAATCTGCGGGATGTGGGTTTAGCGCGTTATGCCTTTGACTTTTATGAGGTCACATCACGAACACCAGTGAGGGCTAGGGAAGCGCACATTCAAATGAAGGCCGCAGCATTGAAATCAGCCCAACCTCGACTTTTCGGGTTGGACGGTGGCATCAGTACACAAGAGGAGAACACAGAGAGGCACACCACCGAGGATGTCTCTCCAAGTATGCATACTCTACTTGGAGTCAAGAACATGTGA
>CF_YL21
ATGGCAATCTACACGTCAACAATCTGTTTCGGTTCGTTTGAATGCAAGCTACCATACTCACCCGCCTCTTGCGGGCATATTGTGAAGGAACGAGAAGTGCTGGCTTCCATCGATCCTTTCGCAGATCTGGAAACACAACTTAGTGCACGATTGCTCAAGCAAGAATATGCTACTGTTCGTGTGCTCAAGAACGGTACTCTTACGTACCGATACAAGACTGATGCCCAGATAACGCGCATCCAGAAGAGACTGGAAAGGAAGGATAGGGAAGAATACCACTTCCAGATGGCAGCTCCTAGTATTGTGTCAAAGATCACTATTGCTGGTGGAGAGCCACCTTCAAAACTTGAATCACAAGTGCGGAGGGGTGTCATCCACACAACTCCAAGGATGCGCACAGCAAAAACATATCACACGCCAAAGCTGACAGAGGGACAAATGAACCACCTTATCAAGCAGGTGAAGCAAATTATGTCAACCAAAGGAGGGTCTGTTCAACTGATTAGCAAGAAAAGTACCCATGTTCACTATAAAGAAGTTTTGGGATCACATCGCGCAGTTGTTTGCACTGCACATATGAGAGGTTTACGAAAGAGAGTGGACTTTCGGTGTGATAAATGGACCGTTGTGCGTCTACAGCATCTCGCCAGGACGGATAAGTGGAATAACCAAGTTCGTGTTACTGATCTACGCAAGGGCGATAGTGGAGTTATATTGAGTAATACTAATCTCAAAGGAAACTTTGGGAGAAGCTCGGAGGGCATATTCATAGTGCGTGGGTCGCACGAAGGAAAAATCTATGATGCACGTTCCAAGGTTACTCAAGGGGTTATGGATTCAATGGTTCAGTTCTCAAGCGCTGAAAGCTTTTGGAAGGGATTGGACGGCAATTGGGCACAAATGAGATATCCTACAGATCATACATGTGTGGCAGGCTTACCAGTTGAAGACTGTGGCAGAGTTGCAGCGATAATGACACACAGTATTTTACCGTGCTATAAGATAACCTGCCCTACCTGTGCCCAACAATATGCCAACTTGCCAGCCAGTGACTTACTTAAGATATTACACAAGCACGCAAGTGATGGTTTAAATCGATTGGGGGCAGACAAAGATCGCTTTGTGCATGTCAAAAAGTTCTTGACAATCTTAGAGCACTTAACTGAACCGGTTGATCTGAGTCTAGAAATTTTCAATGAAGTATTCAAGTCTATAGGGGAGAAGCAACAATCACCTTTCAAAAACCTGAATATTCTGAATAATTTCTTTTTAAAAGGAAAGGAAAATACAGCTCGTGAATGGCAGGTGGCTCAATTAAGCTTACTTGAATTGGCAAGATTCCAAAAGAACAGAACGGATAATATCAAGAAAGTAGACATCTCGTTCTTTAGGAATAAACTATCTGCCAAAGCAAATTGGAACTTGTATCTGTCATGTGATAACCAGCTGGATAAGAATGCAAACTTCCTGTGGGGACAGAGGGAATATCATGCTAAGCGATTTTTCTCGAACTATTTCGAGGAAATTGATCCAGCGAAGGGCTATTCAGCATACGAAAATCGTTTGCATCCGAATGGGACAAGAAAACTTGCAATTGGAAACCTAATTGTACCACTTGATCTGGCTGAGTTTAGGCGGAAGATGAAAGGTGATTTTAAAAGACAGCCAGGGGTGAGTAAGAAGTGCACGAGCTCGAAGGATGGAAACTACGTGTATCCCTGTTGTTGCACTACACTTGATGATGGCTCAGCTGTTGAATCAACATTTTACCCGCCAACTAAGAAGCGCCTCGTAATAGGTAATAGTGGCGACCAAAAGTATGTTGACTTACCAAAAGGGAATTCTGAGATGTTATATATTGCCAGGCAAGGCTTCTGTTACATTAACATTTTCCTCGCGATGTTGATTAACATTAGTGAGGAAGATGCAAAGGATTTCACTAAGAAGGTTCGTGACATGTGTGTGCCAAAGCTCGGAACCTGGCCAACCATGATGGATCT
GGCTACAACTTGTGCTCAAATGAAAATATTCTACCCTGATGTTCATGATGCAGAACTGCCTAGAATACTAGTCGATCACGAAACGCAGACATGCCATGTGGTTGACTCGTTTGGCTCACAAACAACTGGGTATCATATTTTGAAAGCATCTAGCGTGTCCCAACTTATTTTGTTTGCTAATGATGAGTTGGAGTCTGACATTAAGCACTATAGAGTTGGTGGTATTCCTAATGCATGCCCTGAACTTGGGTCCACAATATCACCTTTCAGAGAAGGAGGAGTTATAATGTCTGAGTCGGCAGCGCTGAAACTGCTTTTGAAGGGAATTTTTAGACCTAAGGTGATGAGACAGTTGCTGTTAGATGAGCCTTACCTGTTGATTCTATCAATATTATCCCCTGGCATACTGATGGCTATGTATAATAATGGGATTTTTGAACTTGCGGTGAGGTTGTGGATTAATGAGAAGCAATCCATAGCGATGATAGCATCGCTACTATCAGCTTTAGCCCTACGAGTGTCAGCGGCAGAAACACTCGTCGCACAGAGGATTATAATTGATGCTGCAGCTACAGACCTCCTCGATGCTACGTGTGACGGGTTCAACCTGCATCTAACGTACCCCACTGCATTGATGGTGTTGCAAGTTGTTAAGAATAGAAATGAATGTGATGATACTCTATTCAAGGCGGGTTTTCCAAGTTACAACACGAGCGTCGTACAGATTATGGAAAAAAATTATCTAAATCTCTTGAACGATGCTTGGAAAGATTTAACTTGGCGGGAAAAATTTTCCGCAACATGGTACTCATACAGAGCAAAACGCTCTATCACTCGGTACATAAAACCCACAGGAAGGGCAGATTTGAAAGGGTTATACAACATATCACCACAAGCATTCTTGGGCCGAAGCGTCCAGGTGGTCAAAGGCACTGCCTCAGGATTGAGCGAGCGATTTAATAATTACTTCAATACTAAGTGTGTAAATATTTCATCCTTTTTCATTCGTAGAATCTTTAGGCGTTTGCCAACTTTCGTCACTTTTGTTAACTCATTATTAGTTATTAGTATGTTAACTAGTGTAGTGGCAGTGTGTCAGGCAATAATTTTAGATCAGAAGAAGTATAGGAGAGAGATCGAGTTGATGCAGATAGAGAAGAATGAGATTGTCTGCATGGAGCTATACGCAAGTTTACAGCGCAAACTTGAACGCGATTTCACATGGGATGAGTACATTGAGTATTTGAAATCAGTGAACCCTCAGATAGTTCAATTTGCTCAAGCGCAGATGGAAGAATATGATGTGCGACACCAGCGTTCCACACCAGGTGTTAAAAATTTGGAACAAGTGGTAGCATTTATGGCTTTAGTCATCATGGTGTTCGATGCTGAAAGGAGTGACTGCGTGTTCAAAACTCTCAATAAATTTAAGGGTGTCCTTTCCTCACTGGACCATGAAGTTAGACATCAGTCCTTGGATGATGTGATCAAGAATTTTGATGAGAGGAATGAGACTATTGATTTTGAATTGAGTGAGGACACAATTCGAACATCATCAGTGCTAGATACAAAGTTTAGTGATTGGTGGGACCGACAAATCCAAATGGGACATACACTTCCACATTACAGAACCGAGGGGCACTTCATGGAATTTACAAGGGCAACTGCTGTTCAAGTGGCTAATGACATTGCCCACAGCGAACACCTAGACTTTCTAGTAAGGGGAGCTGTTGGGTCTGGAAAGTCAACTGGGTTGCCTGTTCATCTTAGTGTAGCTGGATCTGTGCTTTTAATTGAGCCAACGCGGCCACTAACGGAGAACGTTTTCAAACAGCTATCTAGTGAACCATTCTTCAAGAAGCCAACACTGCGTATGCGTGGAAATAGTATATTTGGCTCTTCCCCAATCTCCGTTATGACTAGCGGGTTTGCGCTACACTACTTCGCCAATAATCGCTCTCAATTAGCTCAGTTCAATTTTGTAATATTTGATGAGTGCC
ATGTTCTGGATCCTTCCGCAATGGCGTTCCGCAGTCTGCTGAGTGTTTATCATCAAGCATGTAAAGTATTAAAAGTGTCAGCTACTCCAGTGGGAAGAGAGGTTGAATTCACAACACAGCAGCCAGTCAAGTTAATAGTGGAAGACACACTGTCTTTCCAATCATTTGTTGATGCACAAGGTTCTAAAACTAATGCTGATGTTGTCCAGTTTGGTTCAAACATACTTGTTTACGTGTCGAGCTACAATGAAGTTGACAACTTGGCTAAGCTCCTAACAGATAAGAATATGATGGTCACAAAGGTTGATGGCAGAACAATGAAGCACGGTTGCCTAGAAATTGTCACAAAAGGAACCAGTGCGAGACCACATTTTGTTGTAGCAACCAACATAATTGAGAATGGAGTGACTTTGGACATAGATGTAGTAGTGGATTTTGGGTTGAAAGTCTCACCGTTCTTGGACATTGACAATAGGAGCATTGCTTACAATAAGGTGAGTGTTAGTTATGGTGAGAGAATTCAAAGGCTGGGTCGTGTTGGACGCTTCAAGAAAGGAGTAGCATTGCGCATTGGACACACTGAGAAGGGAATTATCGAAATTCCAAGCATGATCGCTACTGAGGCGGCTCTTGCTTGCTTTGCATATAACTTACCAGTGATGACAGGAGGCGTCTCAACTAGTCTGATTGGCAATTGTACTGTGCGCCAGGTCAAAACAATGCAGCAATTTGAATTGAGCCCCTTCTTTATCCAGAATTTCGTTGCTCATGATGGATCAATGCATCCTGTCATACATGACATTCTCAAAAAGTATAAACTCCGAGATTGTATGACACCTTTGTGCGATCAGTCTATACCATACAGGGCATCGAGCACTTGGTTATCGGTTAGTGAATATGAGCGACTTGGAGTGGCTTTAGAAATTCCAAAGCAAGTCAAAATTGCATTCCATATCAAAGAGATCCCTCCTAAGCTCCACGAAATGCTTTGGGAAACGGTTGTCAAGTACAAAGACGTTTGCTTATTTCCAAGCATTCGAGCATCGTCCATCAGCAAAATCGCATACACATTGCGTACAGACCTCTTCGCCATCCCAAGAACTCTAATATTGGTGGAGAGATTGCTTGAAGAGGAGCGAGTGAAGCAGAGCCAATTCAGAAGTCTCATCGATGAAGGGTGCTCAAGCATGTTTTCAATTGTCAACTTGACCAACACTCTCAGGGCCAGATATGCAAAAGATTACACCGCAGAGAACATACAAAAACTTGAGAAAGTGAGAAGTCAATTAAAAGAATTCTCAAATTTGGATGGTTCTGCATGTGAGGAAAATTTAATAAAGAGGTATGAGTCTTTGCAGTTCGTTCACCACCAAGCTGCGACGTCACTTGCAAAGGATCTCAAGTTGAAGGGGACTTGGAAGAAGTCATTAGTGGCTAAAGACTTGATCATAGCAGGCGCTGTTGCAATTGGTGGAATAGGACTCATATATAGTTGGTTCACACAATCAGTTGAGACTGTGTCTCACCAAGGGAAAAATAAATCCAAAAGAATCCAAGCCTTGAAGTTTCGCCATGCTCGTGACAAAAGGGCTGGCTTTGAAATTGACAACAATGATGACACAATAGAGGAATTCTTTGGATCTGCATACAGGAAGAAGGGAAAAGGTAAAGGTACCACAGTTGGTATGGGCAAGTCAAGCAGGAGGTTCATTAACATGTACGGGTTTGACCCGACAGAGTACTCATTCATCCAATTCGTTGATCCACTCACTGGGGCACAAATAGAAGAGAATGTCTATGCTGACATTAGAGATGTTCAAGAAAGATTTAGTGAGGTGCGAAAGAAAATGGTTGAGAATGATGATATTGAAATGCAAGCCTTGGGTAGTAACACGACCATACATGCATACTTTAGGAAAGATTGGTCTGACAAAGCTTTGAAGATTGATTTAATGCCACATAACCCACTCAAAGTTTGTGACAAAACAAATGGCATTGCC
AAACTTCCTGAGAGAGAGTTCGAACTAAGGCAGACTGGGCCAGCTGTAGAAGTCGACGTGAAGGACATACCAGCGCAGGAAGTGGAGCATGAAGCCAAATCGCTCATGAGAGGCTTGAGAGACTTCAACCCAATTGCTCAAACAGTTTGTAGGCTGAAAGTATCTGTTGAATATGGGACATCAGAGATGTACGGTTTTGGATTTGGAGCATACATAATAGCGAACCACCATTTGTTTAGGAGTTACAATGGTTCCATGGAGGTGCGATCCATGCATGGTACATTCAGGGTGAAGAATCTACACAGTTTGAGCGTTCTGCCAATTAAAGGCAGGGACATCATCCTCATCAAAATGCCGAAGGATTTCCCTGTTTTTCCACAGAAATTGCATTTCCGAGCTCCTACACAGAATGAAAGAGTTTGTTTAGTTGGGACCAACTTTCAGGAGAAGTATGCATCGTCGATCATCACAGAAACAAGCACTACTTACAATATACCAGGCAGCACATTCTGGAAGCATTGGATTGAAACAGATAATGGACATTGTGGACTACCAGTGGTGAGCACAGCCGATGGATGTCTAGTCGGAATTCATAGTTTGGCAAACAATGCACACACCACGAACTACTACTCAGCCTTCGATGAAGATTTTGAAAGCAAGTACCTCCGAACCAATGAGCACAATGAATGGGTCAAATCTTGGGTTTATAATCCAGACACAGTGTTGTGGGGCCCGTTGAAGCTTAAAGATAGCACTCCCAAAGGGTTATTCAAAACAACAAAGCTTGTGCAAGATCTAATCGATCATGATGAAGTCGCGGAGCAAGCTAAGCACTCTGCGTGGATGTTTGAAGCCTTGACAGGAAATTTGCAAGCTGTCGCAACAATGAAGAGTCAATTAGTAACCAAGCATGTAGTTAAAGGAGAGTGTCGACACTTCAAAGAATTCCTGACTGTGGATGCAGAAGCAGAGGCATTCTTCAGGCCTTTGATGGATGCATATGGGAAAAGCTTGCTGAATAGAGATGCGTACATCAAAGACATAATGAAGTACTCAAAACCTATAGATGTTGGTATCGTGGACTGTGATGCATTTGAAGAAGCCATCAATAGGGTTATCATCTACTTGCAAGTGCACGGCTTCAAGAAGTGTGCATATGTCACTGACGAGCAAGAAATTTTCAAAGCGCTCAACATGAAAGCTGCAGTCGGAGCCATGTATGGTGGCAAAAAGAAAGACTATTTTGAGCATTTCACTGATGCAGATAAGGAAGAAATAGTCATGCAAAGCTGTCTGCGACTGTATAAAGGCTTGCTCGGCATTTGGAACGGATCGTTGAAGGCAGAGCTCCGGTGTAAGGAAAAGATACTTGCAAACAAGACGAGGACGTTCACTGCTGCACCTCTAGACACTTTGCTGGGTGGTAAAGTGTGTGTTGATGACTTCAATAATCAATTTTATTCAAAGAATATTGAATGCTGTTGGACAGTTGGGATGACTAAGTTTTATGGTGGTTGGGATAAACTGCTTCGGCGTTTACCTGAGAATTGGGTATACTGTGATGCCGATGGCTCACAGTTTGATAGTTCACTAACTCCATACTTAATCAATGCTGTTCTCACCATCAGAAGCACATACATGGAAGACTGGGATGTGGGGTTGCAAATGCTGCGTAATTTATACACTGAGATTATTTACACACCTATCTCAACTCCAGATGGAACAATTGTTAAGAAGTTCAGAGGAAATAACAGTGGTCAGCCTTCTACTGTTGTGGACAACTCTCTTATGGTTGTCCTTGCCATGCACTATGCTCTCATCAAAGAATGCATTGAGTTTGAAGAGATTGACAGCACGTGCGTGTTCTTCGTCAATGGTGATGATTTGCTGATTGCCGTGAATCCGGATAAAGAGGGCATTCTTGATAGATTGTCACAGCACTTCTCAGATCTTGGTTTAAATTATGATTTTTCGTCAAGAACAAGAAATAAGGAAGAGTT
ATGGTTTATGTCTCATAGAGGCCTACTGATTGAGGGCATGTACGTGCCGAAACTTGAAGAAGAAAGGATTGTGTCCATTCTCCAATGGGACAGAGCAGACTTGGCTGAACACAGGCTTGAGGCGATTTGCGCAGCTATGATAGAGTCCTGGGGTTATTCTGAACTAACACACCAAATTAGAAGATTCTACTCATGGTTATTGCAACAGCAACCTTTTGCAACAATAGCGCAGGAAGGGAAGGCTCCTTATATAGCAAGCATGGCATTAAGGAAACTGTATATGGATAGAGCTGTGGATGAGGAAGAGCTGAGAGCCTTCACTGAAATGATGGTCGCATTAGACGATGAGTTTGAGTGTGACTCTTATGAAGTACACCACCAGGGAAACGATACAATCGATGCAGGAGGAAGCAGCAAGAAAGATGCAAGACCAGAGCAAGGCAGCATCCAGTCAAACCCGAACAAAGGAAAAGATAAGGATGTGAATGCTGGTACATCTGGGACACATACTGTGCCGAGAATCAAGGCTATCACGTCCAAAATGAGAATGCCCAAAAGCAAAGGAGCAACCGTGCTAAACTTAGAACACTTGCTTGAGTATGCTCCACAACAAATTGATATTTCAAATACTCGGGCAACTCAATCACAGTTTGATACGTGGTATGAGGCAGTGCGGATGGCATACGACATAGGAGAAACTGAGATGCCAACTGTGATGAATGGGCTTATGGTTTGGTGCATTGAAAATGGAACCTCGCCAAATGTCAACGGAGTCTGGGTTATGATGGATGGGGATGAACAAGTCGAGTACCCGTTGAAACCAATCGTTGAGAATGCAAAACCAACCCTTAGGCAAATCATGGCACATTTCTCAGATGTTGCAGAAGCGTATATAGAAATGCGCAACAAAAAGGAACCATATATGCCACGATATGGTTTAATTCGAAATCTGCGGGATGTGGGTTTAGCGCGTTATGCCTTTGACTTTTATGAGGTCACATCACGAACACCAGTGAGGGCTAGGGAAGCGCACATTCAAATGAAGGCCGCAGCATTGAAATCAGCCCAACCTCGACTTTTCGGGTTGGACGGTGGCATCAGTACACAAGAGGAGAACACAGAGAGGCACACCGCCGAGGATGTCTCTCCAAGTATGCATACTCTACTTGGAGTCAAAAACATGTGA
================================================
FILE: inst/extdata/GVariation/sample_alignment.fa
================================================
>Mont
ATGGCAACTTACACATCAACAATCCAGTTTGGTTCCATTGAATGCAAACTTCCATACTCACCCGCTCCTTTTGGGCTAGT
TGCGGGGAAACGAGAAGTTTTAACCACCACTGACCCCTTCGCAAGTTTGGAGATGCAGCTTAGTGCGCGATTACGAAGGC
AAGAGTTTGCAACTATTCGAACATCCAAGAATGGTACTTGCATGTATCGATACAAGACTGATGTCCAGATTGCGCGCATT
CAAAAGAAGCGCGAGGAAAGAGAAAGAGAGGAATATAATTTCCAAATGGCTGCGTCAAGTGTTGTGTCGAAGATCACTAT
TGCTGGTGGAGAGCCACCTTCAAAACTTGAATCACAAGTGCGGAGGGGTGTCATCCACACAACTCCAAGGATGCGCACAG
CAAAAACATATCACACGCCAAAGCTGACAGAGGGACAAATGAACCACCTTATCAAGCAGGTGAAGCAAATTATGTCAACC
AAAGGAGGGTCTGTTCAACTGATTAGCAAGAAAAGTACTCATGTTCACTATAAAGAAGTTTTGGGATCACATCGCGCAGT
TGTTTGCACTGCACATATGAGAGGTTTACGAAAGAGAGTGGACTTTCGGTGTGATAAATGGACCGTTGTGCGTCTACAGC
ATCTCGCCAGGACGGACAAGTGGACTAACCAAGTTCGTGCTACTGATCTACGCAAGGGCGATAGTGGAGTTATATTGAGT
AATACTAATCTCAAAGGAAACTTTGGGAGAAGCTCGGAGGGCCTATTCATAGTGCGTGGGTCGCACGAAGGAAAAATCTA
TGATGCACGTTCCAAGGTTACTCAAGGGGTTATGGATTCAATGGTTCAGTTCTCAAGCGCTGAAAGCTTTTGGAAGGGAT
TGGACGGCAATTGGGCACAAATGAGATATCCTACAGATCATACATGTGTGGCAGGCTTACCAGTTGAAGACTGTGGCAGA
GTTGCAGCGATAATGACACACAGTATTTTACCGTGCTATAAGATAACCTGCCCTACCTGTGCCCAACAATATGCCAACTT
GCCAGCCAGTGACTTACTTAAGATATTACACAAGCACGCAAGTGATGGTCTAAATCGATTGGGGGCAGACAAAGATCGCT
TTGTGCATGTCAAAAAGTTCTTGACAATCTTAGAGCACTTAACTGAACCGGTTGATCTGAGTCTAGAAATTTTCAATGAA
GTATTCAAGTCTATAGGGGAGAAGCAACAATCACCTTTCAAAAACCTGAATATTCTGAATAATTTCTTTTTGAAAGGAAA
GGAAAATACAGCTCGTGAATGGCAGGTGGCTCAATTAAGCTTACTTGAATTGGCAAGATTCCAAAAGAACAGAACGGATA
ATATCAAGAAAGGAGACATCTCGTTCTTTAGGAATAAACTATCTGCAAAAGCAAATTGGAACTTGTATCTGTCATGTGAT
AACCAGCTGGATAAGAATGCAAACTTCCTGTGGGGACAGAGGGAATATCATGCTAAGCGATTTTTCTCGAACTATTTCGA
GGAAATTGATCCAGCGAAGGGCTATTCAGCATACGAAAATCGTTTGCATCCGAATGGGACAAGAAAACTTGCAATTGGAA
ACCTAATTGTACCACTTGATCTGGCTGAGTTTAGGCGGAAGATGAAAGGTGATTATAAAAGACAGCCAGGGGTGAGTAAG
AAGTGCACGAGCTCGAAGGATGGAAACTACGTGTATCCCTGTTGTTGCACTACACTTGATGATGGCTCAGCTGTTGAGTC
AACATTTTACCCGCCAACTAAGAAGCACCTCGTAATAGGTAATAGTGGCGACCAAAAGTATGTTGACTTACCAAAAGGGA
ATTCTGAGATGTTATATATTGCCAGGCAAGGCTTCTGTTACATTAACATTTTTCTCGCGATGTTGATTAACATTAGTGAG
GAAGATGCAAAGGATTTCACTAAGAAGGTTCGTGACATGTGTGTGCCAAAGCTTGGAACCTGGCCAACCATGATGGATTT
GGCTACAACTTGTGCTCAAATGAAAATATTCTACCCTGATGTTCATGATGCAGAACTGCCTAGAATACTAGTCGATCACG
AAACGCAGACATGCCATGTGGTTGACTCGTTTGGCTCACAAACAACTGGGTATCATATTTTGAAAGCATCTAGCGTGTCC
CAACTTATTTTGTTTGCTAATGATGAGTTGGAGTCTGACATTAAGCACTATAGAGTTGGTGGTATTCCTGGAGCATGCCC
TGAGCTTGGGTCCACAATATCACCTTTTAGAGAAGGAGGAATCATAATGTCTGAGTCAGCAGCGCTAAAACTGCTCCTAA
AGGGAATTTTTAGGCCCAAAGTGATGAAGCAATTGCTACTGGATGAACCATATTTGCTCATTTTATCGATATTATCTCCT
GGTATACTTATGGCCATGTACAACAATGGGATATTTGAGTTAGCGGTGAAGTTGTGGATCAATGAGAAACAATCTATAGC
CATGATAGCATCGTTATTGTCCGCCTTGGCTTTACGAGTGTCAGCAGCAGAAACACTCGTTGCACAGAGGATTATAATTG
ACACGGCAGCAACAGATCTTCTCGATGCTACGTGTGATGGATTCAACTTACATCTAACATATCCCACTGCACTCATGGTG
TTGCAAGTTGTTAAGAACAGAAATGAATGTGATGATACGTTGTTTAAAGCAGGTTTTTCACATTACAACATGAGTGTCGT
GCAGATTATGGAAAAAAATTATCTAAGCCTCTTGGGCGATGCTTGGAAAGATTTAACCTGGCGAGAAAAATTATCCGCAA
CATGGCACTCATACAAAGCAAAGCGCTCTATCACTCAGTTCATAAAACCCATAGGCAAAGCAGATTTAAAAGGGTTGTAC
AACATATCACCGCAAGCATTCTTGGGTCAGGGCGTACAGAGAGTCAAAGGCACCGCCTCAGGGTTGAATGAGCGACTCAA
TAATTATATCAATACTAAGTGTGTAAATATTTCATCCTTTTTCATTCGTAGAATTTTCCGGCGCTTGCCAACTTTTGTAA
CTTTCATTAATTCATTATTAGTTATTAGTATGCTAACTAGTGTAGTAGCAGTGTGTCAAGCAATAATTCTAGATCAAAGG
AAGTATAGAAAAGAAATTGAGTTGATGCAGATTGAGAAGAATGAAATTGTTTGTATGGAGTTGTATGCGAGTCTGCAGCG
CAAACTTGAGCGTGAATTCACATGGGATGAATATATGGAATATTTGAAATCTGTGAATCCCCAGATAGTTCAATTCGCGC
AAGCTCAAATGGAAGAATATAATGTGCGACATCAGCGCTCCACACCAGGTGTTAAGAATTTAGAGCAGGTGGTAGCATTT
ATAACTCTAATTATCATGATGTTTGATGCTGAAAGGAGCGACTGTGTATTTAAGACTCTCAACAAATTCAAAGGCATCGT
TTCTTCAATGGATCATGAAGTTAGACACCAGTCCTTGGATGATGTAATCAAGAATTTCGATGAAAGGAACGAAGTTATTG
ATTTTGAACTAAATGAGGATACAATTAAAACATCATCAGTGTTGGACACAAAGTTTAGCGACTGGTGGGATCGGCAAATC
CAAATGGGACACACACTTCCCCATTATAGAACTGAGGGACACTTCATGGAATTCACAAGGGCAACTGCTGTACAAGTGGC
CAACGACATCGCGCATAGTGAGCACCTAGACTTTCTAGTGAGGGGAGCTGTTGGGTCTGGAAAATCTACTGGACTGCCTG
TCCATCTCAGTGCAGCTGGATCTGTGCTTTTGATAGAACCAACTCGACCACTTGCAGAAAACGTGTTCAAGCAATTATCC
AGTGAACCGTTTTTCAAGAAGCCAACACTGCGCATGCGAGGAAATAGTGTGTTTGGTTCCTCTCCAATCTCCATTATGAC
TAGCGGCTTTGCGTTGCACTACTATGCTAATAATCGCTCTCAGCTAACTCAGTTTAATTTCATAATTTTTGATGAATGTC
ATGTTTTAGATCCTTCTGCAATGGCATTTCGTAGCTTGTTAAGTGTGTATCACCAAACATGCAAAGTGTTAAAGGTGTCA
GCCACTCCAGTGGGAAGGGAGGTCGAGTTCACAACACAACAACCAGTTAAATTGGTGGTTGAGGATACACTTTCATTCCA
ATCTTTTGTTGATGCGCAAGGCTCAAAAACCAATGCTGACGTAGTTCAGCATGGTTCGAACATACTCGTGTATGTGTCGA
GTTACAATGAAGTGGATACATTAGCCAAGCTTCTAACAGATAGGAATATGATAGTCTCAAAAGTTGATGGCAGAACAATG
AAGCACGGATGCTTAGAAATTGTAACGAAAGGGACTAGTGCAAAGCCACATTTTGTCGTAGCAACCAACATTATTGAAAA
TGGAGTAACTTTAGATATAGATGTAGTTGTAGATTTTGGGCTTAAAGTCTCACCGTTTTTAGATATTGACAATAGGAGCA
TAGCATACAATAAGATTAGTGTTAGCTATGGAGAAAGAATTCAGAGGTTGGGCCGTGTTGGGCGCTTTAAGAAGGGAGTG
GCATTGCGTATTGGACACACCGAAAAGGGAATTATTGAGATTCCAAGTATGATTGCTAGTGAAGCTGCGCTTGCGTGCTT
TGCATACAATTTGCCAGTAATGACAGGGGGTGTTTCAACTAGCCTCATTGGCAATTGTACTGTTCGTCAAGTTAAAACTA
TGCAACAATTTGAGCTGAGTCCATTCTTTATACAAAATTTTGTTGCCCATGATGGATCAATGCATCCTGTCATACATGAC
ATTCTTAAGAAGTATAAACTGCGAGATTGTATGACGCCCTTGTGTGATCAATCCATACCTTACAGAGCCTCAAGCACTTG
GTTGTCTGTTAGTGAGTACGAACGACTCGGAGTGGTTTTGGACATTCCAAAACAGATCAAGATTGCATTCCACATCAAGG
ATATCCCTCCTAAGTTGCATGAAATGCTTTGGGAAACAGTTATCAAATATAAGGATGTTTGTTTGTTTCCAAGTATTCGG
GCTTCATCCATTAGCAAAATTGCATACACACTGCGCACTGATCTTTTTGCAATTCCCAGAACCCTAATTCTAGTTGAAAG
ATTGCTCGAGGAGGAACGAGTGAAACAGAGTCAATTCAGAAGTCTCATTGATGAAGGATGCTCAAGCATGTTTTCAATTG
TTAATTTAACAAACACTCTTAGAGCTAGATATGCAAAGGATTACACTGCAGAAAACATACAGAAGCTCGAGAAAGTGAGG
AGTCAGTTAAAGGAGTTCTCAAATTTAAATGGCTCTGCATGCGAGGAGAACTTAATGAAGAGGTATGAATCTCTACAGTT
TGTGCATCATCAAGCAACAACTGCACTCGCAAAGGATTTGAAGTTGAAAGGAGTTTGGAAGAAGTCATTAGTTGTGCAGG
ACTTAATCATAGCGGGTGCCGTTGCTATTGGTGGAATAGGGCTCATCTATAGTTGGTTTACTCAATCAGTTGAAACTGTG
TCTCACCAGGGCAAGAACAAATCCAAAAGAATTCAAGCATTGAAGTTTCGACACGCCCGCGATAAGAGGGCTGGTTTTGA
AATTGATAACAATGATGATACAATAGAAGAATTCTTTGGATCTGCATACAGGAAGAAGGGAAAAGGTAAAGGCACCACTG
TTGGTATGGGCAAGTCAAGCAGGAGGTTTGTTAATATGTATGGATTTGACCCAACAGAATATTCATTCATCCAGTTCGTT
GATCCGCTCACTGGAGCTCAAATTGAAGAGAACGTCTATGCTGATATTAGAGACATCCAAGAGCGCTTTAGTGATGTCCG
CAAGAAAATGGTAGAGGATGATGAAATCGAATTGCAAGCATTGGGCAGCAACACAATCATTCATGCTTACTTCAGGAAGG
ATTGGTCTGACAAGGCTCTAAAAATTGATTTGATGCCACACAACCCACTCAAAATCTGTGATAAATCGAATGGCATTGCT
AAGTTTCCTGAAAGAGAACTTGAGTTGAGGCAAACTGGGCCAGCAACAGAGGTTGATGTGAAAGACATTCCAAAACAGGA
AGTGGAGCATGAAGCCAAATCACTCATGAGAGGTTTAAGGGATTTCAATCCAATTGCTCAAACAGTTTGCAGAGTAAAAG
TGTCTGTTGAATATGGAACGTCTGAAATGTATGGGTTCGGTTTTGGTGCGTATATTATAGTAAACCACCATCTATTCAAG
AGTTTCAATGGATCCATGGAAGTGCGATCAATGCATGGAACATTCAGAGTGAAGAATTTGCATAGCCTGAGCGTTTTACC
GATCAAAGGCAGAGACATTATCATCATAAAGATGCCAAAGGATTTCCCTGTTTTCCCACAAAAACTGCACTTCCGAGCTC
CAGTGCAGAATGAGAGGATTTGTTTGGTTGGAACTAATTTTCAAGAAAAACATGCATCATCAATCATCACAGAAACGAGT
ACTACATACAATGTACCGGGCAGCACTTTTTGGAAGCATTGGATTGAAACAAATGATGGGCATTGTGGATTACCAGTAGT
GAGTACAGCTGATGGATGTCTAGTTGGAATACACAGCTTGGCGAATAATGTGCAAACCACGAATTATTATTCAGCCTTTG
ATGAGGATTTTGAAAGTAAGTATCTCCGAACTGATGAGCATAATGAGTGGACCAAATCGTGGGTATATAACCCAGATACT
GTGTTGTGGGGTCCATTGAAGCTCAAAGAGAGTACCCCTAAAGGCCTGTTTAAGACAACAAAACTTGTACAGGATTTAAT
TGATCATGATGTTGTTGTAGAGCAAGCTAAACATTCTGCGTGGATGTATGAGGCTCTAACAGGGAATTTGCAAGCTGTGG
CGACAATGAAGAGTCAGCTAGTGACAAAGCACGTGGTCAAAGGGGAGTGTCGGCACTTCAAAGAGTTCTTAACTGTGGAT
TCGGAAGCAGAAGCTTTCTTCAGGCCTTTGATGGATGCTTATGGGAAGAGCTTGTTAAATAGAGAAGCATATATAAAGGA
CATAATGAAATACTCAAAGCCTATTGATGTTGGAATAGTAGACTGTGATGCTTTTGAAGAGGCTATCAATAGGGTTATCA
TTTATCTGCAAGTACATGGCTTCCAGAAATGCAATTACATCACCGATGAGCAGGAAATTTTCAAAGCTCTCAATATGAAA
GCTGCTGTCGGGGCTATGTATGGAGGCAAGAAGAAAGACTACTTCGAGCATTTTACTGAGGCGGATAAAGAGGAAATTGT
TATGCAAAGTTGCTTACGATTGTACAAGGGCTCACTTGGCATATGGAATGGATCATTGAAAGCAGAACTTCGGTGCAAAG
AGAAGATACTTGCAAATAAGACAAGGACATTCACTGCTGCACCTTTAGATACTCTACTGGGTGGGAAGGTGTGCGTTGAT
GATTTTAATAATCAATTCTACTCAAAGAACATTGAATGCTGCTGGACTGTTGGAATGACTAAGTTTTATGGAGGTTGGGA
CAAATTGCTTCGGCGTCTACCTGAAAATTGGGTGTACTGCGATGCCGATGGTTCACAATTCGATAGTTCACTCACCCCAT
ACCTAATTAATGCTGTTCTCATCATCAGAAGCACATACATGGAAGATTGGGACTTGGGGTTGCAAATGTTGCGCAATTTG
TACACAGAAATAATTTACACACCAATCTCAACTCCAGATGGAACAATTGTCAAGAAGTTTAGAGGTAATAATAGCGGTCA
ACCTTCTACCGTTGTGGATAATTCTCTCATGGTTGTTCTTGCTATGCATTACGCTCTCATTAAGGAGTGCGTTGAGTTTG
AAGAAATCGACAGCACGTGTGTATTCTTTGTTAATGGTGATGACTTATTGATTGCTGTGAATCCGGAGAAAGAGAGCATT
CTCGATAGAATGTCACAACATTTCTCAGATCTTGGTTTGAACTATGATTTTTCGTCGAGAACAAGAAGGAAGGAGGAATT
GTGGTTCATGTCCCATAGAGGCCTGCTAATTGAGGGTATGTACGTGCCAAAGCTTGAAGAAGAGAGAATTGTATCCATTC
TGCAATGGGATAGGGCTGATCTGCCAGAGCACAGATTAGAAGCGATTTGTGCAGCAATGATAGAATCCTGGGGTTATTTT
GAGTTAACGCACCAAATTAGGAGATTCTACTCATGGTTGTTACAACAGCAACCTTTTTCAACGATAGCACAGGAAGGAAA
AGCTCCATACATAGCGAGCATGGCATTGAAGAAGCTGTACATGAATAGGACAGTAGATGAGGAGGAACTGAAGGCTTTCA
CTGAAATGATGGTTGCCTTGGATGACGAATTTGAGTGCGATACTTATGAAGTGCACCATCAAGGAAATGACACAATCGAT
GCAGGAGGAAGCACTAAGAAGGATGCAAAACAAGAGCAAGGTAGCATTCAACCAAATCTCAACAAGGAAAAGGAAAAGGA
CGTGAATGTTGGAACATCTGGAACTCATACTGTGCCACGAATTAAAGCTATCACGTCCAAAATGAGAATGCCTAAGAGTA
AAGGTGCAACTGTACTAAATTTGGAACACTTACTCGAGTATGCTCCACAGCAAATTGACATCTCAAATACTCGAGCAACT
CAATCACAGTTTGATACGTGGTATGAAGCAGTACAACTTGCATACGACATAGGAGAAACTGAAATGCCAACTGTGATGAA
TGGGCTTATGGTTTGGTGCATTGAAAATGGAACCTCGCCAAACATCAACGGAGTTTGGGTTATGATGGATGGAGATGAAC
AAGTCGAATACCCACTGAAACCAATCGTTGAGAATGCAAAACCAACACTTAGGCAAATCATGGCACATTTCTCAGATGTT
GCAGAAGCGTATATAGAAATGCGCAACAAAAAGGAACCATATATGCCACGATATGGTTTAGTTCGTAATCTGCGCGATGG
AAGTCTGGCTCGCTATGCTTTTGACTTTTATGAAGTTACATCACGGACACCAGTGAGGGCTAGAGAGGCACACATTCAAA
TGAAGGCCGCAGCTTTAAAATCAGCTCAATCTCGACTTTTCGGATTGGATGGTGGCATTAGTACACAAGAGGAAAACACA
GAGAGGCACACCACCGAGGATGTTTCTCCAAGTATGCATACTCTACTTGGAGTGAAGAACATGTGA
>CF_YL21
ATGGCAATCTACACGTCAACAATCTGTTTCGGTTCGTTTGAATGCAAGCTACCATACTCACCCGCCTCTTGCGGGCATAT
TGTGAAGGAACGAGAAGTGCTGGCTTCCATCGATCCTTTCGCAGATCTGGAAACACAACTTAGTGCACGATTGCTCAAGC
AAGAATATGCTACTGTTCGTGTGCTCAAGAACGGTACTCTTACGTACCGATACAAGACTGATGCCCAGATAACGCGCATC
CAGAAGAGACTGGAAAGGAAGGATAGGGAAGAATACCACTTCCAGATGGCAGCTCCTAGTATTGTGTCAAAGATCACTAT
TGCTGGTGGAGAGCCACCTTCAAAACTTGAATCACAAGTGCGGAGGGGTGTCATCCACACAACTCCAAGGATGCGCACAG
CAAAAACATATCACACGCCAAAGCTGACAGAGGGACAAATGAACCACCTTATCAAGCAGGTGAAGCAAATTATGTCAACC
AAAGGAGGGTCTGTTCAACTGATTAGCAAGAAAAGTACCCATGTTCACTATAAAGAAGTTTTGGGATCACATCGCGCAGT
TGTTTGCACTGCACATATGAGAGGTTTACGAAAGAGAGTGGACTTTCGGTGTGATAAATGGACCGTTGTGCGTCTACAGC
ATCTCGCCAGGACGGATAAGTGGAATAACCAAGTTCGTGTTACTGATCTACGCAAGGGCGATAGTGGAGTTATATTGAGT
AATACTAATCTCAAAGGAAACTTTGGGAGAAGCTCGGAGGGCATATTCATAGTGCGTGGGTCGCACGAAGGAAAAATCTA
TGATGCACGTTCCAAGGTTACTCAAGGGGTTATGGATTCAATGGTTCAGTTCTCAAGCGCTGAAAGCTTTTGGAAGGGAT
TGGACGGCAATTGGGCACAAATGAGATATCCTACAGATCATACATGTGTGGCAGGCTTACCAGTTGAAGACTGTGGCAGA
GTTGCAGCGATAATGACACACAGTATTTTACCGTGCTATAAGATAACCTGCCCTACCTGTGCCCAACAATATGCCAACTT
GCCAGCCAGTGACTTACTTAAGATATTACACAAGCACGCAAGTGATGGTTTAAATCGATTGGGGGCAGACAAAGATCGCT
TTGTGCATGTCAAAAAGTTCTTGACAATCTTAGAGCACTTAACTGAACCGGTTGATCTGAGTCTAGAAATTTTCAATGAA
GTATTCAAGTCTATAGGGGAGAAGCAACAATCACCTTTCAAAAACCTGAATATTCTGAATAATTTCTTTTTAAAAGGAAA
GGAAAATACAGCTCGTGAATGGCAGGTGGCTCAATTAAGCTTACTTGAATTGGCAAGATTCCAAAAGAACAGAACGGATA
ATATCAAGAAAGTAGACATCTCGTTCTTTAGGAATAAACTATCTGCCAAAGCAAATTGGAACTTGTATCTGTCATGTGAT
AACCAGCTGGATAAGAATGCAAACTTCCTGTGGGGACAGAGGGAATATCATGCTAAGCGATTTTTCTCGAACTATTTCGA
GGAAATTGATCCAGCGAAGGGCTATTCAGCATACGAAAATCGTTTGCATCCGAATGGGACAAGAAAACTTGCAATTGGAA
ACCTAATTGTACCACTTGATCTGGCTGAGTTTAGGCGGAAGATGAAAGGTGATTTTAAAAGACAGCCAGGGGTGAGTAAG
AAGTGCACGAGCTCGAAGGATGGAAACTACGTGTATCCCTGTTGTTGCACTACACTTGATGATGGCTCAGCTGTTGAATC
AACATTTTACCCGCCAACTAAGAAGCGCCTCGTAATAGGTAATAGTGGCGACCAAAAGTATGTTGACTTACCAAAAGGGA
ATTCTGAGATGTTATATATTGCCAGGCAAGGCTTCTGTTACATTAACATTTTCCTCGCGATGTTGATTAACATTAGTGAG
GAAGATGCAAAGGATTTCACTAAGAAGGTTCGTGACATGTGTGTGCCAAAGCTCGGAACCTGGCCAACCATGATGGATCT
GGCTACAACTTGTGCTCAAATGAAAATATTCTACCCTGATGTTCATGATGCAGAACTGCCTAGAATACTAGTCGATCACG
AAACGCAGACATGCCATGTGGTTGACTCGTTTGGCTCACAAACAACTGGGTATCATATTTTGAAAGCATCTAGCGTGTCC
CAACTTATTTTGTTTGCTAATGATGAGTTGGAGTCTGACATTAAGCACTATAGAGTTGGTGGTATTCCTAATGCATGCCC
TGAACTTGGGTCCACAATATCACCTTTCAGAGAAGGAGGAGTTATAATGTCTGAGTCGGCAGCGCTGAAACTGCTTTTGA
AGGGAATTTTTAGACCTAAGGTGATGAGACAGTTGCTGTTAGATGAGCCTTACCTGTTGATTCTATCAATATTATCCCCT
GGCATACTGATGGCTATGTATAATAATGGGATTTTTGAACTTGCGGTGAGGTTGTGGATTAATGAGAAGCAATCCATAGC
GATGATAGCATCGCTACTATCAGCTTTAGCCCTACGAGTGTCAGCGGCAGAAACACTCGTCGCACAGAGGATTATAATTG
ATGCTGCAGCTACAGACCTCCTCGATGCTACGTGTGACGGGTTCAACCTGCATCTAACGTACCCCACTGCATTGATGGTG
TTGCAAGTTGTTAAGAATAGAAATGAATGTGATGATACTCTATTCAAGGCGGGTTTTCCAAGTTACAACACGAGCGTCGT
ACAGATTATGGAAAAAAATTATCTAAATCTCTTGAACGATGCTTGGAAAGATTTAACTTGGCGGGAAAAATTTTCCGCAA
CATGGTACTCATACAGAGCAAAACGCTCTATCACTCGGTACATAAAACCCACAGGAAGGGCAGATTTGAAAGGGTTATAC
AACATATCACCACAAGCATTCTTGGGCCGAAGCGTCCAGGTGGTCAAAGGCACTGCCTCAGGATTGAGCGAGCGATTTAA
TAATTACTTCAATACTAAGTGTGTAAATATTTCATCCTTTTTCATTCGTAGAATCTTTAGGCGTTTGCCAACTTTCGTCA
CTTTTGTTAACTCATTATTAGTTATTAGTATGTTAACTAGTGTAGTGGCAGTGTGTCAGGCAATAATTTTAGATCAGAAG
AAGTATAGGAGAGAGATCGAGTTGATGCAGATAGAGAAGAATGAGATTGTCTGCATGGAGCTATACGCAAGTTTACAGCG
CAAACTTGAACGCGATTTCACATGGGATGAGTACATTGAGTATTTGAAATCAGTGAACCCTCAGATAGTTCAATTTGCTC
AAGCGCAGATGGAAGAATATGATGTGCGACACCAGCGTTCCACACCAGGTGTTAAAAATTTGGAACAAGTGGTAGCATTT
ATGGCTTTAGTCATCATGGTGTTCGATGCTGAAAGGAGTGACTGCGTGTTCAAAACTCTCAATAAATTTAAGGGTGTCCT
TTCCTCACTGGACCATGAAGTTAGACATCAGTCCTTGGATGATGTGATCAAGAATTTTGATGAGAGGAATGAGACTATTG
ATTTTGAATTGAGTGAGGACACAATTCGAACATCATCAGTGCTAGATACAAAGTTTAGTGATTGGTGGGACCGACAAATC
CAAATGGGACATACACTTCCACATTACAGAACCGAGGGGCACTTCATGGAATTTACAAGGGCAACTGCTGTTCAAGTGGC
TAATGACATTGCCCACAGCGAACACCTAGACTTTCTAGTAAGGGGAGCTGTTGGGTCTGGAAAGTCAACTGGGTTGCCTG
TTCATCTTAGTGTAGCTGGATCTGTGCTTTTAATTGAGCCAACGCGGCCACTAACGGAGAACGTTTTCAAACAGCTATCT
AGTGAACCATTCTTCAAGAAGCCAACACTGCGTATGCGTGGAAATAGTATATTTGGCTCTTCCCCAATCTCCGTTATGAC
TAGCGGGTTTGCGCTACACTACTTCGCCAATAATCGCTCTCAATTAGCTCAGTTCAATTTTGTAATATTTGATGAGTGCC
ATGTTCTGGATCCTTCCGCAATGGCGTTCCGCAGTCTGCTGAGTGTTTATCATCAAGCATGTAAAGTATTAAAAGTGTCA
GCTACTCCAGTGGGAAGAGAGGTTGAATTCACAACACAGCAGCCAGTCAAGTTAATAGTGGAAGACACACTGTCTTTCCA
ATCATTTGTTGATGCACAAGGTTCTAAAACTAATGCTGATGTTGTCCAGTTTGGTTCAAACATACTTGTTTACGTGTCGA
GCTACAATGAAGTTGACAACTTGGCTAAGCTCCTAACAGATAAGAATATGATGGTCACAAAGGTTGATGGCAGAACAATG
AAGCACGGTTGCCTAGAAATTGTCACAAAAGGAACCAGTGCGAGACCACATTTTGTTGTAGCAACCAACATAATTGAGAA
TGGAGTGACTTTGGACATAGATGTAGTAGTGGATTTTGGGTTGAAAGTCTCACCGTTCTTGGACATTGACAATAGGAGCA
TTGCTTACAATAAGGTGAGTGTTAGTTATGGTGAGAGAATTCAAAGGCTGGGTCGTGTTGGACGCTTCAAGAAAGGAGTA
GCATTGCGCATTGGACACACTGAGAAGGGAATTATCGAAATTCCAAGCATGATCGCTACTGAGGCGGCTCTTGCTTGCTT
TGCATATAACTTACCAGTGATGACAGGAGGCGTCTCAACTAGTCTGATTGGCAATTGTACTGTGCGCCAGGTCAAAACAA
TGCAGCAATTTGAATTGAGCCCCTTCTTTATCCAGAATTTCGTTGCTCATGATGGATCAATGCATCCTGTCATACATGAC
ATTCTCAAAAAGTATAAACTCCGAGATTGTATGACACCTTTGTGCGATCAGTCTATACCATACAGGGCATCGAGCACTTG
GTTATCGGTTAGTGAATATGAGCGACTTGGAGTGGCTTTAGAAATTCCAAAGCAAGTCAAAATTGCATTCCATATCAAAG
AGATCCCTCCTAAGCTCCACGAAATGCTTTGGGAAACGGTTGTCAAGTACAAAGACGTTTGCTTATTTCCAAGCATTCGA
GCATCGTCCATCAGCAAAATCGCATACACATTGCGTACAGACCTCTTCGCCATCCCAAGAACTCTAATATTGGTGGAGAG
ATTGCTTGAAGAGGAGCGAGTGAAGCAGAGCCAATTCAGAAGTCTCATCGATGAAGGGTGCTCAAGCATGTTTTCAATTG
TCAACTTGACCAACACTCTCAGGGCCAGATATGCAAAAGATTACACCGCAGAGAACATACAAAAACTTGAGAAAGTGAGA
AGTCAATTAAAAGAATTCTCAAATTTGGATGGTTCTGCATGTGAGGAAAATTTAATAAAGAGGTATGAGTCTTTGCAGTT
CGTTCACCACCAAGCTGCGACGTCACTTGCAAAGGATCTCAAGTTGAAGGGGACTTGGAAGAAGTCATTAGTGGCTAAAG
ACTTGATCATAGCAGGCGCTGTTGCAATTGGTGGAATAGGACTCATATATAGTTGGTTCACACAATCAGTTGAGACTGTG
TCTCACCAAGGGAAAAATAAATCCAAAAGAATCCAAGCCTTGAAGTTTCGCCATGCTCGTGACAAAAGGGCTGGCTTTGA
AATTGACAACAATGATGACACAATAGAGGAATTCTTTGGATCTGCATACAGGAAGAAGGGAAAAGGTAAAGGTACCACAG
TTGGTATGGGCAAGTCAAGCAGGAGGTTCATTAACATGTACGGGTTTGACCCGACAGAGTACTCATTCATCCAATTCGTT
GATCCACTCACTGGGGCACAAATAGAAGAGAATGTCTATGCTGACATTAGAGATGTTCAAGAAAGATTTAGTGAGGTGCG
AAAGAAAATGGTTGAGAATGATGATATTGAAATGCAAGCCTTGGGTAGTAACACGACCATACATGCATACTTTAGGAAAG
ATTGGTCTGACAAAGCTTTGAAGATTGATTTAATGCCACATAACCCACTCAAAGTTTGTGACAAAACAAATGGCATTGCC
AAACTTCCTGAGAGAGAGTTCGAACTAAGGCAGACTGGGCCAGCTGTAGAAGTCGACGTGAAGGACATACCAGCGCAGGA
AGTGGAGCATGAAGCCAAATCGCTCATGAGAGGCTTGAGAGACTTCAACCCAATTGCTCAAACAGTTTGTAGGCTGAAAG
TATCTGTTGAATATGGGACATCAGAGATGTACGGTTTTGGATTTGGAGCATACATAATAGCGAACCACCATTTGTTTAGG
AGTTACAATGGTTCCATGGAGGTGCGATCCATGCATGGTACATTCAGGGTGAAGAATCTACACAGTTTGAGCGTTCTGCC
AATTAAAGGCAGGGACATCATCCTCATCAAAATGCCGAAGGATTTCCCTGTTTTTCCACAGAAATTGCATTTCCGAGCTC
CTACACAGAATGAAAGAGTTTGTTTAGTTGGGACCAACTTTCAGGAGAAGTATGCATCGTCGATCATCACAGAAACAAGC
ACTACTTACAATATACCAGGCAGCACATTCTGGAAGCATTGGATTGAAACAGATAATGGACATTGTGGACTACCAGTGGT
GAGCACAGCCGATGGATGTCTAGTCGGAATTCATAGTTTGGCAAACAATGCACACACCACGAACTACTACTCAGCCTTCG
ATGAAGATTTTGAAAGCAAGTACCTCCGAACCAATGAGCACAATGAATGGGTCAAATCTTGGGTTTATAATCCAGACACA
GTGTTGTGGGGCCCGTTGAAGCTTAAAGATAGCACTCCCAAAGGGTTATTCAAAACAACAAAGCTTGTGCAAGATCTAAT
CGATCATGATGAAGTCGCGGAGCAAGCTAAGCACTCTGCGTGGATGTTTGAAGCCTTGACAGGAAATTTGCAAGCTGTCG
CAACAATGAAGAGTCAATTAGTAACCAAGCATGTAGTTAAAGGAGAGTGTCGACACTTCAAAGAATTCCTGACTGTGGAT
GCAGAAGCAGAGGCATTCTTCAGGCCTTTGATGGATGCATATGGGAAAAGCTTGCTGAATAGAGATGCGTACATCAAAGA
CATAATGAAGTACTCAAAACCTATAGATGTTGGTATCGTGGACTGTGATGCATTTGAAGAAGCCATCAATAGGGTTATCA
TCTACTTGCAAGTGCACGGCTTCAAGAAGTGTGCATATGTCACTGACGAGCAAGAAATTTTCAAAGCGCTCAACATGAAA
GCTGCAGTCGGAGCCATGTATGGTGGCAAAAAGAAAGACTATTTTGAGCATTTCACTGATGCAGATAAGGAAGAAATAGT
CATGCAAAGCTGTCTGCGACTGTATAAAGGCTTGCTCGGCATTTGGAACGGATCGTTGAAGGCAGAGCTCCGGTGTAAGG
AAAAGATACTTGCAAACAAGACGAGGACGTTCACTGCTGCACCTCTAGACACTTTGCTGGGTGGTAAAGTGTGTGTTGAT
GACTTCAATAATCAATTTTATTCAAAGAATATTGAATGCTGTTGGACAGTTGGGATGACTAAGTTTTATGGTGGTTGGGA
TAAACTGCTTCGGCGTTTACCTGAGAATTGGGTATACTGTGATGCCGATGGCTCACAGTTTGATAGTTCACTAACTCCAT
ACTTAATCAATGCTGTTCTCACCATCAGAAGCACATACATGGAAGACTGGGATGTGGGGTTGCAAATGCTGCGTAATTTA
TACACTGAGATTATTTACACACCTATCTCAACTCCAGATGGAACAATTGTTAAGAAGTTCAGAGGAAATAACAGTGGTCA
GCCTTCTACTGTTGTGGACAACTCTCTTATGGTTGTCCTTGCCATGCACTATGCTCTCATCAAAGAATGCATTGAGTTTG
AAGAGATTGACAGCACGTGCGTGTTCTTCGTCAATGGTGATGATTTGCTGATTGCCGTGAATCCGGATAAAGAGGGCATT
CTTGATAGATTGTCACAGCACTTCTCAGATCTTGGTTTAAATTATGATTTTTCGTCAAGAACAAGAAATAAGGAAGAGTT
ATGGTTTATGTCTCATAGAGGCCTACTGATTGAGGGCATGTACGTGCCGAAACTTGAAGAAGAAAGGATTGTGTCCATTC
TCCAATGGGACAGAGCAGACTTGGCTGAACACAGGCTTGAGGCGATTTGCGCAGCTATGATAGAGTCCTGGGGTTATTCT
GAACTAACACACCAAATTAGAAGATTCTACTCATGGTTATTGCAACAGCAACCTTTTGCAACAATAGCGCAGGAAGGGAA
GGCTCCTTATATAGCAAGCATGGCATTAAGGAAACTGTATATGGATAGAGCTGTGGATGAGGAAGAGCTGAGAGCCTTCA
CTGAAATGATGGTCGCATTAGACGATGAGTTTGAGTGTGACTCTTATGAAGTACACCACCAGGGAAACGATACAATCGAT
GCAGGAGGAAGCAGCAAGAAAGATGCAAGACCAGAGCAAGGCAGCATCCAGTCAAACCCGAACAAAGGAAAAGATAAGGA
TGTGAATGCTGGTACATCTGGGACACATACTGTGCCGAGAATCAAGGCTATCACGTCCAAAATGAGAATGCCCAAAAGCA
AAGGAGCAACCGTGCTAAACTTAGAACACTTGCTTGAGTATGCTCCACAACAAATTGATATTTCAAATACTCGGGCAACT
CAATCACAGTTTGATACGTGGTATGAGGCAGTGCGGATGGCATACGACATAGGAGAAACTGAGATGCCAACTGTGATGAA
TGGGCTTATGGTTTGGTGCATTGAAAATGGAACCTCGCCAAATGTCAACGGAGTCTGGGTTATGATGGATGGGGATGAAC
AAGTCGAGTACCCGTTGAAACCAATCGTTGAGAATGCAAAACCAACCCTTAGGCAAATCATGGCACATTTCTCAGATGTT
GCAGAAGCGTATATAGAAATGCGCAACAAAAAGGAACCATATATGCCACGATATGGTTTAATTCGAAATCTGCGGGATGT
GGGTTTAGCGCGTTATGCCTTTGACTTTTATGAGGTCACATCACGAACACCAGTGAGGGCTAGGGAAGCGCACATTCAAA
TGAAGGCCGCAGCATTGAAATCAGCCCAACCTCGACTTTTCGGGTTGGACGGTGGCATCAGTACACAAGAGGAGAACACA
GAGAGGCACACCGCCGAGGATGTCTCTCCAAGTATGCATACTCTACTTGGAGTCAAAAACATGTGA
>Oz
ATGGCAACTTACATGTCAACAATCTGTTTCGGTTCGTTTGAATGCAAGCTACCATACTCACCCGCTTCTTGCGGGCATAT
TGTGAAGGAGCGAGAAGTGCTGGCTTCCGTTGATCCTTTCGCAGATCTGGAAACACAACTTAGTGCACGATTGCTCAAGC
AAGAATATGCTACTGTTCGTGTGCTCAAGAACGGTACTCTTACTTACCGATACAAAACTGATGCCCAGATAACGCGCATT
CAGAAGAAACTGGAGAGGAAGGATAGGGAAGAATATCACTTCCAGATGGCCGCTCCTAGTATTGTGTCAAAAATTACAAT
AGCTGGTGGAGATCCTCCATCAAAGTCTGAGCCACAAGCACCAAGAGGGATCATTCATACAACTCCAAGGGTGCGTAAAG
TCAAGACACGTCCCATAATAAAGTTGACAGAAGGCCAGATGAATCATCTCATTAAGCAGGTGAAGCAGATTATGTCGGAG
AAGAGAGGGTCTGTCCACTTAATTAGTAAGAAGACCACTCATGTTCAATATAAGGAGATACTTGGAGCAACTCGCGCAGC
GGTTCGAACTGCACATATGATGGGTTTGCGACGGAGAGTGGACTTCCGATGTGATATGTGGACAGTCGGACTTTTGCAAC
GTCTCGCTCGGACGGACAAATGGTCCAATCAAGTCCGCACTATCAACATACGAAGGGGTGATAGTGGAGTCATTTTGAAC
ACAAAAAGCCTCAAAGGCCACTTTGGTAGAAGTTCAGGAGACTTGTTCATAGTGCGTGGATCACACGAAGGGAAATTGTA
CGATGCACGTTCTAGAGTTACTCAGAGTGTTTTGAACTCAATGATCCAGTTTTCGAATGCTGATAATTTTTGGAAGGGTC
TAGACGGTAATTGGGCACAACTGAGATATCCTTCGGATCACACATGTGTAGCTGGTTTACCTGTCGAAGATTGTGGTAGA
GTTGCTGCATTGATGGCACACAGTATCCTCCCGTGCTACAAGATAACCTGCCCCACCTGTGCTCAACAGTATGCCAGCTT
GCCGGTTAGCGATCTGTTTAAGCTGTTGCATAAACATGCGAGAGATGGTTTGAACCGATTGGGAGCGGATAAAGACCGGT
TTATACATGTTAATAAGTTCTTGATAGCGTTAGAGCATCTAACTGAACCGGTGGATTTGAATCTCGAGCTTTTCAATGAG
ATATTTAAATCCATAGGGGAGAAGCAGCAAGCACCGTTCAAGAATTTAAATGTCTTAAATAATTTCTTCCTGAAAGGAAA
AGAAAATACAGCTCATGAATGGCAAGTGGCTCAATTGAGTTTGCTCGAATTAGCAAGGTTCCAGAAGAATAGAACTGATA
ACATCAAGAAAGGTGATATATCTTTCTTCAGAAATAAATTATCTGCCAAGGCAAACTGGAATCTGTATTTGTCGTGCGAC
AACCAGTTGGATAAAAATGCAAATTTTCTGTGGGGACAAAGGGAGTATCATGCTAAGCGGTTTTTCTCAAACTTCTTTGA
GGAAATTGATCCAGCAAAGGGATACTCAGCATATGAAATCCGCAAGCATCCAAATGGAACAAGGAAGCTCTCAATTGGTA
ACTTAGTTGTCCCACTTGATTTAGCTGAGTTTAGGCAGAAGATGAAAGGTGACTATAGGAAACAACCAGGAGTTAGCAGA
AAGTGCACGAGTTCGAAAGATGGTAATTATGTGTATCCCTGTTGTTGCACAACACTTGATGATGGTTCAGCTATTGAATC
AACATTCTATCCACCAACCAAAAAGCACCTTGTAATAGGCAATAGCGGTGACCAAAAATTTGTTGATTTACCAAAAGGGG
ATTCGGAGATGTTATACATTGCCAAGCAGGGTTATTGTTATATCAACGTGTTTCTTGCAATGCTTATTAACATTAGCGAG
GAGGATGCAAAGGATTTCACAAAGAAAGTTCGCGACATGTGTGTGCCAAAGCTTGGAACCTGGCCAACTATGATGGATTT
GGCGACCACTTGTGCTCAAATGAGAATATTCTATCCTGACGTGCATGATGCAGAGCTGCCTAGAATATTGGTTGACCATG
ACACTCAAACGTGTCACGTGGTTGACTCATTTGGCTCGCAAACAACTGGATATCATATTCTAAAAGCATCCAGCGTGTCT
CAACTTATCTTGTTTGCAAATGATGAATTAGAATCTGATATAAAACATTATAGAGTTGGTGGTGTTCCTAATGCATGCCC
TGAACTTGGGTCCACAATATCACCTTTCAGAGAAGGAGGAGTTATAATGTCTGAGTCGGCAGCGCTGAAACTGCTTTTAA
AGGGAATTTTTAGACCTAAGGTGATGAGACAGTTGCTGTTAGATGAGCCTTACCTGTTGATTTTATCAATATTATCTCCT
GGCATACTGATGGCTATGTATAATAATGGGATTTTTGAACTTGCGGTAAGGTTGTGGATTAATGAGAAACAATCCATAGC
TATGATAGCATCGCTACTATCAGCTTTAGCCCTACGAGTGTCAGCGGCAGAAACACTCGTCGCACAGAGGATTATCATTG
ATGCTGCAGCTACAGACCTCCTTGATGCTACGTGTGATGGGTTCAACCTACATCTAACGTACCCCACTGCATTGATGGTG
TTGCAAGTTGTTAAGAATAGAAATGAATGTGATGATACCCTATTCAAGGCGGGTTTTTCAAGTTACAACACGAGCGTCGT
ACAGATTATGGAAAAAAATTATCTAAATCTCTTGAACGATGCTTGGAAAGATTTAACTTGGCGGGAAAAATTATCCGCAA
CATGGTACTCATACAGAGCAAAACGCTCTATCACTCGGTACATAAAACCCACAGGAAGGGCAGATTTGAAAGGGTTATAC
AACATATCACCACAAGCATTCTTGGGCCGAAGCGCCCAGGTGGTCAAAGGTACTGCCTCAGGATTGAGTGAGCGATTTAA
TAATTATTTCAATACTAAGTGTGTAAATATTTCATCCTTTTTCATTCGTAGAATCTTTAGGCGTTTGCCAACTTTCGTCA
CTTTTGTTAACTCATTATTAGTTATTAGTATGTTAACTAGCGTAGTGGCAGTGTGTCAGGCAATAATTTTAGATCAGAGG
AAATATAGGAGAGAAATCGAGTTGATGCAGATAGAGAAGAATGAGATTGTCTGCATGGAGCTATATGCAAGTTTACAGCG
CAAACTTGAACGCGATTTCACATGGGATGAGTACATTGAGTATTTGAAATCAGTAAACCCTCAGATAGTTCAGTTTGCTC
AAGCGCAGATGGAAGAATATGATGTGCGACACCAGCGTTCCACACCAGGTGTTAAAAATTTGGAACAAGTGGTAGCATTT
ATGGCTTTAGTCATTATGGTGTTCGATGCTGAAAGGAGTGATTGCGTGTTCAAAACTCTCAATAAATTTAAGGGTGTCCT
TTCCTCAATGGACTATGAAGTTAGACATCAGTCCTTAGACGATGTGATCAAGAATTTTGATGAGAGGAATGAGATTATTG
ATTTTGAATTGAGTGAGGACACAATTCGAACATCATCAGTGCTAGATACAAAGTTTAGTGATTGGTGGGACCGACAAATC
CAGATGGGACATACACTTCCACATTACAGAACCGAGGGGCACTTCATGGAATTCACAAGAGCAACTGCTGTCCAAGTGGC
TAATGACATTGCCCATAGCGAACACCTAGACTTTCTAGTAAGGGGAGCTGTTGGGTCTGGAAAGTCAACTGGGTTACCTG
TTCATCTTAGTGTAGCCGGATCTGTGCTTTTAATTGAACCAACGCGACCACTAGCGGAGAACGTTTTCAAACAGCTATCT
AGTGAACCATTCTTCAAGAAGCCAACACTGCGTATGCGTGGAAATAGTATATTTGGCTCTTCTCCAATCTCCGTCATGAC
TAGCGGATTTGCGCTACACTACTTCGCCAATAATCGCTCTCAATTAGCTCAGTTCAACTTTGTAATATTTGATGAGTGCC
ATGTTCTGGATCCTTCCGCAATGGCGTTCCGCAGTCTGCTGAGTGTTTATCATCAAGCATGCAAAGTATTAAAAGTGTCA
GCTACTCCAGTGGGAAGAGAGGTTGAATTTACAACACAGCAACCAGTCAAGTTAATAGTGGAGGACACACTGTCTTTCCA
ATCATTTGTTGATGCACAAGGTTCTAAAACTAATGCTGATGTTGTTCAGTTTGGTTCAAACGTACTTGTGTACGTGTCGA
GCTACAATGAAGTTGATACCTTGGCTAAGCTCCTAACAGACAAGAATATGATGGTCACAAAGGTTGATGGCAGAACAATG
AAGCACGGTTGCCTAGAAATTGTCACAAAAGGAACCAGTGCGAGACCACATTTTGTTGTAGCAACCAACATAATTGAGAA
TGGAGTGACTTTGGACATAGACGTGGTTGTAGATTTTGGGTTAAAAGTCTCACCGTTCTTGGACATTGACAATAGGAGCA
TTGCTTACAATAAGGTGAGTGTTAGCTATGGTGAGAGAATTCAAAGGCTGGGTCGTGTTGGACGCTTCAAGAAAGGAGTA
GCATTGCGCATTGGACACACTGAGAAGGGAATTATTGAAATTCCAAGCATGATCGCTACAGAGGCAGCTCTTGCTTGCTT
TGCATATAACTTACCAGTGATGACAGGAGGCGTCTCAACTAGTCTGATTGGCAATTGTACTGTGCGCCAAGTTAAAACAA
TGCAGCAATTTGAATTGAGTCCCTTCTTTATCCAGAATTTCGTTGCCCATGATGGATCAATGCATCCTGTCATACATGAC
ATTCTTAAAAAGTATAAACTTCGAGATTGTATGACACCTTTGTGCGATCAGTCTATACCATACAGGGCATCGAGCACTTG
GTTATCGGTTAGTGAATATGAGCGACTTGGAGTGGCCTTAGAAATTCCAAAGCAAGCCAAAATTGCATTCCATATCAAAG
AGATCCCTCCTAAGCTCCACGAAATGCTTTGGGAAACGGTTGTCAAGTACAAAGACGTTTGCTTATTTCCAAGCATTCGA
GCATCGTCCATCAGCAAAATCGCATACACATTGCGTACAGACCTCTTCGCCATCCCAAGAACTCTAATATTGGTGGAGAG
ACTGCTTGAAGAGGAGCGAGTGAAGCAGAGCCAATTCAGAAGTCTCATCGATGAAGGATGCTCAAGCATGTTTTCAATTG
TCAACCTGACAAACACTCTCAGAGCTAGATATGCAAAAGATTACACCGCAGAGAACATACAAAAACTTGAGAAAGTGAGA
AGTCAATTGAAAGAATTCTCAAATTTGGATGGTTCTGCATGTGAGGAAAATTTAATAAAGAGGTATGAGTCTTTGCAGTT
CGTTCATCACCAAGCTGCGACGTCACTTGCAAAGGATCTCAAGTTAAAGGGGACTTGGAAGAAGTCATTAGTGGCCAAAG
ACTTGATCATAGCAGGCGCTGTTGCAATTGGTGGAATAGGACTCATATATAGTTGGTTCACACAATCAGTTGAGACTGTG
TCTCACCAAGGGAAAAATAAATCCAAAAGAATTCAAGCCTTGAAGTTTCGCCATGCTCGTGACAAAAGGGCTGGCTTTGA
AATTGACAACAATGATGACACAATAGAGGAATTCTTTGGATCTGCATACAGGAAAAAGGGAAAAGGTAAAGGTACCACAG
TTGGTATGGGCAAGTCAAGCAGGAGGTTCATCAACATGTATGGGTTTGATCCAACAGAGTACTCATTCATCCAATTCGTT
GATCCACTCACTGGGGCGCAAATAGAAGAGAATGTCTATGCTGACATTAGAGATATTCAAGAGAGATTTAGTGAAGTGCG
AAAGAAAATGGTTGAGAATGATGACATTGAAATGCAAGCCTTGGGTGGTAACACGACCATACATGCATACTTTAGGAAAG
ATTGGTCTGACAAAGCTTTGAAGATTGATTTAATGCCACATAATCCACTCAAAGTTTGTGACAAGACAAATGGCATTGCC
AAATTTCCTGAGAGAGAGCTCGAACTAAGGCAGACTGGGCCAGCTGTAGAAGTCGATGTGAAGGACATACCAGCACAGGA
AGTGGAGCATGAAGCTAAATCGCTCATGAGAGGCTTGAGAGACTTCAACCCAATTGCCCAAACAGTTTGTAGGCTGAAAG
TATCTGTTGAATATGGGACATCAGAGATGTACGGTTTTGGATTTGGAGCATACATAATAGCGAACCACCATTTGTTCAGG
AGTTACAATGGTTCCATGGAGGTGCGATCCATGCACGGTACATTCAGGGTGAAGAATCTACACAGTTTGAGCGTTCTGCC
AATTAAAGGTAGGGATATCATCCTCATCAAAATGCCGAAAGATTTCCCTGTCTTTCCACAGAAATTGCATTTCCGAGCTC
CTACACAGAATGAAAGAGTTTGTTTAGTTGGAACCAACTTTCAGGAGAAGTATGCATCGTCGATCATCACAGAAACAAGC
ACCACTTACAATATACCAGGCAGCACATTCTGGAAGCATTGGATTGAAACAGATAATGGACATTGTGGATTACCAGTGGT
GAGCACCACCGATGGATGTCTAGTCGGAATTCACAGTTTGGCAAACAACAAACACACCACGAACTACTACTCAGCCTTCG
ATGAAGATTTTGAAAGCAAGTATCTCCGAACCAATGAGCACAATGAATGGGTCAAGTCTTGGATTTATAATCCAGACACA
GTGTTGTGGGGCCCGTTGAAACTTAAAGACAGCACTCCCAAAGGATTATTCAAAACAACAAAGCTTGTGCAAGATCTAAT
CGATCATGATGTAGTGGTGGAGCAAGCTAAGCACTCTGCGTGGATGTTTGAAGCCTTGACAGGAAATTTGCAAGCTGTCG
CAACAATGAAGAGCCAATTAGTAACCAAGCATGTAGTTAAAGGAGAGTGTCGACACTTCAAAGAATTCCTGACTGTGGAT
GCAGAAGCAGAGGCATTCTTCAGGCCTTTGATGGATGCGTATGGGAAAAGCTTGCTGAATAGAGATGCATACATCAAGGA
CATAATGAAGTATTCAAAACCTATAGATGTTGGTATCGTGGACTGTGATGCATTTGAGGAAGCCATCAATAGGGTTATCA
TCTACCTGCAAGTGCACGGCTTCAAGAAGTGCGCATACGTCACTGACGAGCAAGAAATTTTCAAAGCGCTCAACATGAAA
GCTGCAGTTGGAGCCATGTATGGTGGCAAAAAGAAAGACTATTTTGAGCATTTCACTGATGCAGATAAGGAAGAAATAGT
CATGCAAAGCTGTCTGCGATTGTATAAAGGCTTGCTTGGCATTTGGAATGGATCATTGAAGGCAGAGCTCCGGTGTAAGG
AAAAGATACTTGCAAATAAGACGAGGACATTCACTGCTGCACCTTTAGACACTTTGCTGGGTGGTAAAGTGTGTGTTGAT
GATTTCAATAATCAATTTTATTCAAAGAATATTGAATGCTGTTGGACGGTTGGGATGACTAAGTTTTATGGTGGTTGGGA
TAAACTGCTGCGGCGTTTACCTGAGAATTGGGTATACTGTGATGCCGATGGCTCACAGTTTGATAGTTCACTAACTCCAT
ACTTAATCAATGCTGTTCTCACCATCAGAAGCACATACATGGAAGATTGGGATGTGGGGTTGCAAATGTTGCGCAATTTA
TACACTGAGATTGTTTACACACCTATTTCAACTCCAGATGGAACAATTGTTAAGAAGTTCAGAGGAAATAACAGTGGTCA
GCCTTCTACTGTTGTGGACAACTCTCTTATGGTCGTCCTTGCCATGCACTATGCTCTCATCAAAGAATGCATTGAGTTTG
AAGAGATTGACAGCACGTGCGTGTTCTTTGTCAATGGTGATGATTTGCTGATTGCTGTGAATCCGGATAAAGAGGGCATT
CTTGACAGATTGTCACAACACTTCTCAGATCTTGGTTTGAATTATGATTTCTCGTCAAGAACAAGAAATAAGGAGGAATT
GTGGTTTATGTCTCATAGAGGCCTACTGATTGAGGGCATGTACGTGCCGAAACTTGAAGAAGAAAGGATTGTGTCCATTC
TCCAATGGGACAGAGCAGACTTGGCTGAACACAGGCTTGAGGCGATTTGCGCAGCTATGATAGAGTCCTGGGGTTATTCT
GAACTAACACACCAAATCAGGAGATTCTACTCATGGTTATTGCAACAGCAACCCTTTGCAACAATAGCGCAGGAAGGGAA
GGCTCCTTATATAGCAAGCATGGCATTAAGGAAATTGTATATGGATAGGGCTGTGGATGAGGAAGAGCTGAGAGCCTTCA
CTGAAATGATGGTCGCATTAGACGATGAGTTTGAATTTGACTCTTATGAAGTACACCATCAAGCAAATGACACAATCGAT
GCAGGAGGAAGCAGCAAGAAAGATGCAAGACCGGAGCAAGGCAGCATCCAGTCAAACCCGAACAAAGGAAAAGATAAGGA
TGTGAATGCTGGTACATCTGGGACACATACTGTGCCGAGAATCAAGGCTATCACGTCCAAAATGAGAATGCCCAAAAGCA
AGGGAGCAACCGTGCTAAACCTAGAACACTTGCTTGAGTATGCTCCACAACAAATTGATATTTCAAATACTCGGGCAACT
CAATCACAGTTTGATACGTGGTATGAGGCAGTGCGGATGGCATACGACATAGGAGAAACTGAGATGCCAACTGTGATGAA
TGGGCTTATGGTTTGGTGCATTGAAAATGGAACCTCGCCAAATGTCAACGGAGTTTGGGTTATGATGGATGGGAATGAAC
AAGTCGAGTACCCGTTGAAACCAATCGTTGAGAATGCAAAACCAACCCTTAGGCAAATCATGGCACATTTCTCAGATGTT
GCAGAAGCGTATATAGAAATGCGCAACAAAAAGGAACCATATATGCCACGATATGGTTTAATTCGAAATCTGCGGGATGT
GGGTTTAGCGCGTTATGCCTTTGACTTTTATGAGGTCACATCACGAACACCAGTGAGGGCTAGGGAAGCGCACATTCAAA
TGAAGGCCGCAGCATTGAAATCAGCCCAACCTCGACTTTTCGGGTTGGACGGTGGCATCAGTACACAAGAGGAGAACACA
GAGAGGCACACCACCGAGGATGTCTCTCCAAGTATGCATACTCTACTTGGAGTCAAGAACATGTGA
>Wilga5
ATGGCAACTTACATGTCAACAATCTGTTTCGGTTCGTTTGAATGCAAGCTACCATACTCACCCGCCTCTTGCGGGCATAT
TGTGAAGGAACGAGAAGTGCTGGCTTCCGTTGATCCTTTCGCAGATCTGGAAACACAACTTAGTGCACGATTGCTCAAGC
AAGAATATGCTACTGTTCGTGTGCTCAAGAACGGTACTCTTACGTACCGATACAAGACTGATGCCCAGATAACGCGCATC
CAGAAGAAACTGGAAAGGAAGGATAGGGAAGAATATCACTTCCAGATGGCAGCTCCTAGTATTGTGTCAAAGATCACTAT
TGCTGGTGGAGAGCCACCTTCAAAACTTGAATCACAAGTGCGGAGGGGTGTCATCCACACAACTCCAAGGATGCGCACAG
CAAAAACATATCACACGCCAAAGTTGACAGAGGGACAAATGAACCACCTTATCAAGCAGGTGAAGCAAATTATGTCAACC
AAAGGAGGGTCTGTTCAACTGATTAGCAAGAAAAGTACCCATGTTCACTATAAAGAAGTTTTGGGATCACATCGCGCAGT
TGTTTGCACTGCACATATGAGAGGTTTACGAAAGAGAGTGGACTTTCGGTGTGATAAATGGACCGTTGTGCGTCTACAGC
ATCTCGCCAGGACGGACAAGTGGAATAACCAAGTTCGTGCTACTGATCTACGCAAGGGCGATAGTGGAGTTATATTGAGT
AATACTAATCTCAAAGGAAGCTTTGGGAGAAGCTCGGAGGGCATATTCATAGTGCGTGGGTCGCACGAAGGAAAAATCTA
TGATGCACGTTCCAAGGTTACTCAAGGGGTTATGGATTCAATGGTTCAGTTCTCAAGCGCTGAAAGCTTCTGGAAGGGAT
TGGACGGCAATTGGGCACAAATGAGATATCCTACAGATCATACATGTGTGGCAGGCTTACCAGTTGAAGACTGCGGCAGA
GTTGCAGCGATAATGACACACAGTATTTTACCGTGCTATAAGATAACCTGCCCTACCTGTGCCCAACAATATGCCAATTT
GCCAGCCAGTGACTTACTTAAGATATTACACAAGCACGCAAGTGATGGTTTAAATCGATTGGGGGCAGACAAAGATCGCT
TTGTGCACGTCAAAAAGTTCTTGACAATCTTAGAGCACTTAACTGAACCGGTTGATCTGGGTCTAGAAATTTTCAATGAA
GTATTCAAGTCTATAGGGGAGAAGCAACAATCACCTTTCAAAAACCTGAATATTCTGAATAATTTCTTTTTGAAAGGAAA
AGAAAATACAGCTCGTGAATGGCAGGTGGCTCAATTAGGCTTACTTGAATGGGCAAGATTCCAAAAGAACAGAACGGATA
ATATCAAGAAAGGAGACATCTCGTTCTTTAGGAATAAACTATCTGCCAAAGCAAATTGGAACTTGTATCTGTCATGTGAT
AACCAGCTGGATAAGAATGCAAACTTCCTGTGGGGACAGAGGGAATATCATGCTAAGCGATTTTTCTCGAACTATTTCGA
GGAAATTGATCCAGCGAAGGGCTATTCAGCATATGAAAATCGTTTGCATCCGAATGGGACAAGAAAACTTGCAATTGGAA
ACCTAATTGTACCACTTGATCTGGCTGAGTTTAGGCGGAAGATGAAAGGTGATTATAAAAGACAGCCAGGGGTGAGTAAG
AAGTGCACGAGCTCGAAGGATGGAAACTACGTGTATCCCTGTTGTTGCACTACACTTGATGATGGCTCAGCTGTTGAATC
AACATTTTACCCGCCAACTAAGAAGCACCTTGTAATAGGTAATAGTGGCGACCAAAAGTATGTTGACTTACCAAAAGGGA
ATTCTGAGATGTTATATATTGCCAGGCAAGGCTTTTGTTACATTAACATTTTCCTCGCGATGTTGATTAACATTAGTGAG
GAAGATGCAAAGGATTTCACTAAGAAGGTTCGTGACATGTGTGTGCCAAAGCTTGGAACCTGGCCAACCATGATGGATCT
GGCTACAACTTGTGCTCAAATGAAAATATTCTACCCTGATGTTCATGATGCAGAACTGCCTAGAATACTAGTCGATCACG
AAACGCAGACATGCCATGTGGTTGACTCGTTTGGCTCACAAACAACTGGGTATCATATTTTGAAAGCATCTAGCGTGTCC
CAACTTATTTTGTTTGCTAATGATGAGTTGGAGTCTGACATTAAACATTATAGAGTTGGTGGTGTTCCTAATGCATGCCC
TGAACTTGGGTCCACAATATCACCTTTCAGAGAAGGAGGAGTTATAATGTCTGAGTCGGCAGCTCTGAAACTGCTTTTGA
AGGGAATTTTTAGACCTAAGGTGATGAGACAGTTGTTGTTAGATGAGCCTTATCTGTTGATTCTATCAATATTATCCCCT
GGCATACTGATGGCTATGTACAATAATGGGATTTTTGAACTTGCGGTAAGGTTGTGGATTAATGAGAAACAATCCATAGC
TATGATAGCATCGCTACTATCAGCTTTAGCCCTACGAGTGTCAGCGGCAGAAACACTCGTCGCACAGAGGATTATAATTG
ATACTGCAGCTACAGATCTCCTTGATGCTACGTGCGATGGGTTCAACCTACATCTAACGTACCCCACTGCGTTGATGGTG
TTGCAAGTTGTTAAGAATAGAAATGAATGTGATGATACCCTATTCAAGGCGGGTTTTCCAAGTTACAACACGAGCGTCGT
ACAGATCATGGAAAAAAATTATCTAAATCTCTTAAACGATGCTTGGAAAGATTTAACTTGGCGGGAAAAATTATCCGCAA
CATGGTACTCATACAGAGCAAAACGCTCTATCACTCGGTACATAAAACCCACAGGAAGGGCAGATTTGAAAGGGTTATAC
AACATATCACCACAAGCATTCTTGGGCCGAGGCGCCAAGGTGGTCAAAGGCACTGCCTCAGGATTGTGCGAGCGATTTAA
TAATTATTTCTACACTAAGTGTGTAAATATTTCATCCTTTTTCATTCGTAGAATCTTTAGGCGTTTGCCAACTTTCGTCA
CTTTTGTTAACTCATTATTAGTTATTAGTATGTTAACCAGCGTAGTGGCAGTCTGTCAGGCAATAATTTTAGATCAGAGG
AAGTATAGGAGAGAAATCGAGTTGATGCAGATAGAGAAGAATGAGATCGTCTGCATGGAGCTATATGCAAGTTTACAGCG
CAAACTTGAACGCGATTTCACATGGGATGAGTACATTGAGTATTTGAAGTCAGTAAACCCTCAGATAGTTCAGTTTGCTC
AAGCGCAGATGGAAGAATATGATGTGCGACACCAGCGTTCCACACCAGGTGTTAAAAATTTGGAACAAGTGGTAGCATTT
ATGGCTTTAGTCATCATGGTGTTCGATGCTGAAAGGAGTGATTGCGTGTTCAAAACTCTCAATAAATTTAAGGGTGTCCT
TTCCTCACTGGACCATGAAGTTAGACATCAGTCCTTAGACGATGTGATCAAGAATTTTGATGAGAGGAATGAGGTTATTG
ATTTTGAATTGAGTGAGGACACAATTCGAACATCATCAGTGCTAGATACAAAGTTTAGTGATTGGTGGGACCGACAAATC
CAGATGGGGCATACACTTCCACATTACAGAACCGAGGGGCACTTCATGGAATTCACAAGAGCAACTGCTGTCCAAGTGGC
TAATGACATTGCCCATAGCGAACACCTAGACTTTCTAGTAAGGGGAGCTGTTGGGTCTGGAAAGTCAACTGGGTTGCCTG
TTCATCTTAGTGTAGCCGGATCTGTGCTTTTAATTGAACCAACGCGACCACTAGCGGAGAACGTTTTCAAACAGCTATCT
AGTGAACCATTCTTCAAGAAGCCAACACTGCGTATGCGTGGAAATAGTATATTTGGCTCTTCTCCAATCTCCGTCATGAC
TAGCGGATTTGCGCTACACTACTTCGCCAATAATCGCTCTCAATTAGCTCAGTTCAACTTTGTAATATTTGATGAGTGCC
ATGTTCTGGATCCTTCCGCAATGGCGTTCCGCAGTCTGCTGAGTGTTTATCATCAAGCATGCAAAGTATTAAAAGTGTCA
GCTACTCCAGTGGGAAGAGAGGTTGAATTTACAACACAGCAACCAGTCAAGTTAATAGTGGAGGACACACTGTCTTTTCT
ATCATTTGTTGATGCACAAGGTTCTAAAACTAATGCTGATGTTGTTCAGTTTGGTTCAAACGTACTTGTGTACGTGTCGA
GCTACAATGAAGTTGATACCTTGGCTAAGCTCCTAACAGACAAGAATATGATGGTCACAAAAGTTGATGGCAGAACAATG
AAGCACGGTTGCCTAGAAATTGTCACAAAAGGAACCAGTGCGAGACCACATTTTGTTGTAGCAACCAACATAATTGAGAA
TGGAGTGACTTTGGACATAGACGTGGTTGTAGATTTTGGGTTGAAAGTCTCACCGTTCTTGGACATTGACAATAGGAGCA
TTGCTTACAATAAGGTGAGTGTTAGCTATGGTGAGAGAATTCAAAGGCTGGGTCGTGTTGGACGCTTCAAGAAAGGAGTA
GCATTGCGCATTGGACACACTGAGAAGGGAATTATTGAAATTCCAAGCATGATCGCTACAGAGGCGGCTCTTGCTTGCTT
TGCATATAACTTGCCAGTGATGACAGGAGGCGTCTCAACTAGTCTGATTGGCAATTGTACTGTGCGCCAAGTTAAAACAA
TGCAGCAATTTGAATTGAGTCCCTTCTTTATCCAGAATTTTGTTGCCCATGATGGATCAATGCATCCTGTCATACATGAC
ATTCTTAAAAAGTATAAACTTCGAGATTGTATGACACCTTTGTGCGATCAGTCTATACCATACAGGGCATCGAGCACTTG
GTTATCGGTTAGTGAATATGAGCGACTTGGAGTGGCCTTAGAAATTCCAAAGCAAGTCAAAATTGCATTCCATATCAAAG
AGATCCCTCCTAAGCTCCATGAAATGCTTTGGGAAACGGTTGTCAAGTACAAAGACGTTTGCTTATTTCCAAGCATTCGA
GCATCGTCCATCAGCAAAATCGCATACACATTGCGTACAGACCTTTTCGCCATCCCAAGAACTCTAATATTGGTGGAGAG
ACTGCTTGAAGAGGAGCGAGTGAAGCAGAGCCAATTCAGAAGTCTCATCGACGAAGGATGCTCAAGCATGTTTTCAATTG
TCAACTTGACAAACACTCTCAGAGCTAGATATGCAAAAGATTACACCGCAGAGAACATACAAAAACTTGAGAAAGTGAGA
AGTCAATTGAAAGAATTCTCAAATTTGGATGGTTCTGCATGTGAGGAAAATTTAATAAAGAGGTATGAGTCTTTGCAGTT
CGTTCATCACCAAGCTGCGACGTCACTTGCAAAGGATCTCAAGTTGAAGGGGACTTGGAAGAAGTCATTAGTGGCCAAAG
ACTTGATCATAGCAGGCGCTGTTGCAATTGGTGGAATAGGACTCATATATAGTTGGTTCACACAATCAGTTGAGACTGTG
TCTCACCAAGGGAAAAATAAATCCAAAAGAATCCAAGCCTTGAAGTTTCGCCATGCTCGTGACAAAAGGGCTGGCTTTGA
AATTGACAACAATGATGACACAATAGAGGAATTCTTTGGATCTGCATACAGGAAAAAGGGAAAAGGTAAAGGTACCACAG
TTGGTATGGGCAAGTCAAGCAGGAGGTTCATCAACATGTATGGGTTTGACCCAACAGAGTACTCATTCATCCAATTCGTT
GATCCACTCACTGGGGCGCAAATAGAAGAGAATGTCTATGCTGACATTAGAGATATTCAAGAGAGATTTAGTGAAGTGCG
AAAGAAAATGGTTGAGAATGATGACATTGAAATGCAAGCCTTGGGCAGTAACACGACCATACATGCATACTTCAGGAAAG
ATTGGTCTGACAAAGCTTTGAAGATTGATTTAATGCCACATAATCCACTCAAAGTTTGTGACAAGACAAATGGCATTGCC
AAATTTCCTGAGAGAGAACTCGAACTAAGGCAGACTGGGCCAGCTGTAGAAGTCGACGTGAAGGACATACCAGCACAGGA
GGTGGAGCATGAAGCTAAATCGCTCATGAGAGGCTTGAGAGACTTCAACCCAATTGCCCAAACAGTTTGTAGGCTGAAAG
TATCTGTTGAATATGGGACATCAGAGATGTACGGTTTTGGATTTGGAGCGTACATAATAGCGAACCACCATTTGTTCAGG
AGTTACAATGGTTCCATGGAGGTGCGATCCATGCACGGTACATTCAGGGTGAAGAATCTACACAGTTTGAGTGTTCTGCC
AATTAAAGGTAGGGATATCATCCTCATCAAAATGCCGAAAGATTTCCCTGTCTTTCCACAGAAATTGCATTTCCGAGCTC
CTACACAGAATGAAAGAGTTTGTTTAGTTGGAACCAACTTTCAGGAGAAGTATGCATCGTCGATCATCACAGAAGGAGGC
ACCACTTACAATATACCAGGCAGCACATTCTGGAAGCATTGGATTGAAACAGATAATGGACATTGTGGACTACCAGTGGT
GAGCACCACCGATGGATGTCTAGTCGGAATTCACAGTTTGGCAAACAACAAACACACCACGAATTACTACTCAGCCTTCG
ATGAAGATTTTGAAAGCAAGTATCTCCGAACCAATGAGCACAATGAATGGGTCAAGTCTTGGATTTATAATCCAGACACA
GTATTGTGGGGCCCGTTGAAACTTAAAGACAGCACTCCCAAAGGATTATTCAAAACAACAAAGCTTGTGCAAGATCTAAT
TGATCATGATGAAGTGGTGGAGCAAGCTAAGCACTCTGCGTGGATGTTTGAAGCCTTGACAGGAAATTTGCAAGCTGTCG
CAACAATGAAGAGCCAATTAGTAACCAAGCATGTAGTTAAAGGAGAGTGTCGACACTTCAAAGAATTCCTGACTGTGGAT
GCAGAAGCAGAGGCATTCTTCAGGCCTTTGATGGATGCGTATGGGAAAAGTTTGCTGAATAGAGATGCATACATCAAGGA
CATAATGAAGTATTCAAAACCTATAGATGTTGGTATCGTGGACTGTGATGCATTTGAGGAAGCCATCAATAGGGTTATCA
TCTACCTGCAAGTGCACGGCTTCAAGAAGTGCGCATACGTTACTGACGAGCAAGAAATTTTTAAAGCGCTCAACATGAAA
GCTGCAGTCGGAGCCATGTATGGTGGCAAAAAGAAAGACTATTTTGAGCATTTCACTGATGCAGATAAGGAAGAAATAGT
CATGCAAAGCTGTCTGCGATTGTATAAAGGCTTGCTTGGCATTTGGAATGGATCATTGAAGGCAGAGCTCCGGTGTAAGG
AAAAGATACTTGCAAATAAGACGAGGACATTCACTGCTGCACCTTTAGACACTTTGCTGGGTGGTAAAGTGTGTGTTGAT
GACTTCAATAATCAATTTTATTCAAAGAATATTGAATGCTGTTGGACGGTTGGGATGACAAAGTTTTATGGTGGTTGGGA
TAAACTGCTGCGGCGTTTACCTGAGAATTGGGTTTACTGTGATGCCGATGGCTCACAGTTTGATAGTTCACTAACTCCAT
ACTTAATCAATGCAGTTCTCACCATTAGAAGCACATACATGGAAGATTGGGATGTGGGGTTGCAAATGTTGCGCAATTTA
TACACTGAGATTGTTTACACACCTATTTCAACTCCAGATGGAACAATTGTTAAGAAGTTCAGAGGGAATAACAGTGGTCA
GCCTTCTACTGTTGTGGACAACTCTCTTATGGTCGTCCTTGCCATGCACTATGCTCTCATCAAAGAATGCATTGAGTTTG
AAGAGATTGACAGCACGTGCGTGTTCTTTGTCAATGGTGATGATTTGCTGATTGCTGTGAATCCGGATAAAGAGGGCATT
CTTGACAGATTGTCACAACACTTCTCAGATCTTGGTTTGAATTATGATTTCTCGTCAAGAACAAGAAATAAGGAGGAGTT
GTGGTTTATGTCTCATAGAGGCCTACTGATTGAGGCAATGTACGTGCCGAAACTTGAAGAAGAAAGGATTGTGTCCATTC
TCCAATGGGACAGAGCAGACTTGGCTGAACATAGGCTTGAGGCGATTTGCGCAGCTATGATAGAGTCCTGGGGTTATTCT
GAACTAACACACCAAATCAGGAGATTCTACTCATGGTTATTGCAACAGCAACCCTTTGCAACAATAGCGCAGGAAGGGAA
GGCTCCTTATATAGCAAGCATGGCATTAAGGAAATTGTATATGGATAGGGCTGTGGATGAGGAAGAGCTGAGAGCCTTCA
CTGAAATGATGGTCGCATTAGATGATGAGTTTGAATTTGACTCTTATGAAGTATACCATCAAGCAAATGACACAATCGAT
GCAGGAGAAAGCAGCAAGAAAGATGCAAGACCAGAGCAAGGCAGCATCCAGTCAAACCCGAACAAAGGAAAAGATAAGGA
TGTGAATGCTGGTACATCTGGGACACATACTGTGCCGAGAATCAAGGCTATCACGTCCAAAATGAGGATGCCCAAAAGCA
AGGGAGCAACCGTGCTAAACTTAGAACACTTGCTTGAGTATGCTCCACAACAAATTGATATTTCAAATACTCGGGCAACT
CAATCACAGTTTGATACGTGGTATGAGGCAGTGCGGATGGCATACGACATAGGAGAAACTGAGATGCCAACTGTGATGAA
TGGGCTTATGGTTTGGTGCATTGAAAATGGAACCTCGCCAAATGTCAACGGAGTTTGGGTTATGATGGATGGGAATGAAC
AAGTCGAGTACCCGTTGAAACCAATCGTTGAGAATGCAAAACCAACCCTTAGGCAAATCATGGCACATTTCTCAGATGTT
GCAGAAGCGTATATAGAAATGCGCAACAAAAAGGAACCATATATGCCACGATATGGTTTAATTCGAAATCTGCGGGATGT
GGGTTTAGCGCGTTATGCCTTTGACTTTTATGAGGTCACATCACGAACACCAGTGAGGGCTAGGGAAGCGCACATTCAAA
TGAAGGCCGCAGCATTGAAATCAGCCCAACCTCGACTTTTCGGGTTGGACGGTGGCATCAGTACACAAGAGGAGAACACA
GAGAGGCACACCACCGAGGATGTCTCTCCAAGTATGCATACTCTACTTGGAGTCAAGAACATGTGA
================================================
FILE: inst/extdata/Gram-negative_AKL.fasta
================================================
>Random_Gram-negative_AKL_gjtez
RWTHLASGRTYNYKFNPPKQYGKDDITGEDLIQRED
>Random_Gram-negative_AKL_dibhu
RWTHLNSGRTYHYKFNPPKVHGVDDVTGEPLVQRED
>Random_Gram-negative_AKL_elirp
RWTHLASGRTYNYKFNPPKQYGKDDITGEDLIQRED
>Random_Gram-negative_AKL_dnjtf
RWTHLASGRTYNYKFNPPKQYGKDDITGEDLIQRED
>Random_Gram-negative_AKL_qzcvn
RLIHQPSGRSYHEEFNPPKEPMKDDVTGEPLIRRSD
>Random_Gram-negative_AKL_mqvro
RRVHPGSGRVYHVVYNPPKVEGKDDETGEELIVRAD
>Random_Gram-negative_AKL_qjvxv
RRVHPASGRIYHLVHNPPEVDGVDDATGEMLIQRDD
>Random_Gram-negative_AKL_mlmcf
RRVHAPSGRVYHVKFNPPKVEGKDDVTGEELTTRKD
>Random_Gram-negative_AKL_bfqnk
RWVHEPSGRVYNTDFNAPKVPGKDDITGEPLTQRQD
>Random_Gram-negative_AKL_kvcas
RWVHEPSGRVYNTDFNVPKVPGKDDVTGEPLTQRQD
>Random_Gram-negative_AKL_xrbtp
RRAHLPSGRTYHVVYNPPKVEGKDDVTGEDLVIRED
>Random_Gram-negative_AKL_yggsb
RRAHLPSGRTYHVVYNPPKVEGKDDVTGEDLVIRED
>Random_Gram-negative_AKL_wntes
RRVHQPSGRSYHIVYNPPKTEGKDDVTGEDLIIRAD
>Random_Gram-negative_AKL_gbdos
RRVHQPSGRSYHIVYNPPKTEGKDDVTGEDLIIRAD
>Random_Gram-negative_AKL_lhrmd
RRVHQPSGRSYHIVYNPPKTEGKDDVTGEDLIIRAD
>Random_Gram-negative_AKL_zhrxk
RRVHQPSGRSYHIVYNPPKTEGKDDVTGEDLIIRAD
>Random_Gram-negative_AKL_taozi
RRVHQPSGRSYHIVYNPPKTEGKDDVTGEDLIIRAD
>Random_Gram-negative_AKL_reram
RRVHQPSGRSYHIVYNPPKTEGKDDVTGEDLIIRAD
>Random_Gram-negative_AKL_rukmd
RRVHQPSGRSYHIVYNPPKTEGKDDVTGEDLIIRAD
>Random_Gram-negative_AKL_dfkbq
RRVHQPSGRSYHIVYNPPKTEGKDDVTGEDLIIRAD
>Random_Gram-negative_AKL_dvomf
RRVHQPSGRSYHIIYNPPKTEGKDDVTGEDLIIRAD
>Random_Gram-negative_AKL_vgzym
RWIHPSSGRVYNLDFNPPQVQGIDDITGEPLVQQED
>Random_Gram-negative_AKL_ptlzq
RLFHPGSGRVYHKVTNPPKKPMTDDITGEPLIIRKD
>Random_Gram-negative_AKL_lgmpt
RLFHPGSGRTYHTKFNPPKVPMKDDQTGEDLIVRKD
>Random_Gram-negative_AKL_stqhz
RRAHLPSGRTYHVVYNPPKVEGKDDVTGEDLVIRED
>Random_Gram-negative_AKL_leceq
RRAHLPSGRTYHVVYNPPKVEGKDDVTGEDLVIRED
>Random_Gram-negative_AKL_arqwq
RWVHEPSGRVYNTDFNAPKVPGKDDVTGEPLTQRED
>Random_Gram-negative_AKL_edhmf
RRVHPGSGRSYHVKFNPPKVEGKDDVTGEPLVQRDD
>Random_Gram-negative_AKL_jefev
RRVHPGSGRVYHVVFNPPKVEGKDDVTGEDLAIRPD
>Random_Gram-negative_AKL_mgvft
RRTHPASGRTYHVKFNPPKVDGKDDVTGEPLIQRDD
>Random_Gram-negative_AKL_pdjwi
RWIHPSSGRSYHTKFAPPKVPGVDDVTGEPLIQRKD
>Random_Gram-negative_AKL_hbdlm
RWIHPSSGRSYHTKFAPPKTPGLDDVTGEPLIQRKD
>Random_Gram-negative_AKL_qinsk
RRAHLPSGRTYHVVYNPPKVEGKDDVTGEDLVIRED
>Random_Gram-negative_AKL_xiszr
RRVHAPSGRVYHVKFNPPKVEGKDDVTDEELTTRKD
>Random_Gram-negative_AKL_tsjls
RRAHLPSGRTYHVVYNPPKVEGKDDVTGEDLVIRED
>Random_Gram-negative_AKL_ivaqd
RRAHLPSGRTYHVVYNPPKVEGKDDVTGEDLVIRED
>Random_Gram-negative_AKL_uceun
RRVHESSGRIYHVKYDPPKVEDKDNETGEALIQRED
>Random_Gram-negative_AKL_yfnqy
RRVHPASGRVYHTEHNPPKVAGKDDVTGEELIQRED
>Random_Gram-negative_AKL_fquul
RRVHPASGRVYHTEHNPPKVAGKDDVTGEELIQRED
>Random_Gram-negative_AKL_hrvsw
RWVHVPSGRVYNLDYNPPKVPFKDDVTGEPLSKRED
>Random_Gram-negative_AKL_wdkfx
RRVHAPSGRVYHVKFNPPKVEGKDDVTGEELTTRKD
>Random_Gram-negative_AKL_lpxmt
RRVHAPSGRVYHVKFNPPKVEGKDDVTGEELTTRKD
>Random_Gram-negative_AKL_bkmgo
RRVHAPSGRVYHVKFNPPKVEGKDDVTGEELTTRKD
>Random_Gram-negative_AKL_rqtgn
RRVHAPSGRVYHVKFNPPKVAGKDDVTGEELTTRKD
>Random_Gram-negative_AKL_fzfio
RRVHAPSGRVYHVKFNPPKVEGKDDVTGEXLTTRKD
>Random_Gram-negative_AKL_ptxxd
RRVHAPSGRVYHVKFNPPKVEGKDDVTGEELTTRKD
>Random_Gram-negative_AKL_fmdzi
RRVHVASGRTYHVKYNPPKNEGKDDETGEPLIQRDD
>Random_Gram-negative_AKL_ehnfi
RRAHLPSGRTYHSVYNPPKEEGKDDITGEELVVRDD
>Random_Gram-negative_AKL_gwaom
RRVHPGSGRVYHIKHNPPKEEGKDDETGEELVIRPD
>Random_Gram-negative_AKL_ngobh
RRAHLPSGRTYHNVYNPPKEEGKDDITGEELVVRDD
>Random_Gram-negative_AKL_jgpqr
RRVHPESGRIYHTVYNPPKVEGKDDETGEDLVQRPD
>Random_Gram-negative_AKL_jvlnt
RRVHPGSGRIYHVEHNPPKVEGVDDETGEALVHRDD
>Random_Gram-negative_AKL_dnrym
RRVHEASGRVYHVMHNPPKESGIDDITGEPLIQRDD
>Random_Gram-negative_AKL_omfoc
RRVHPGSGRVYHRIHNPPTLDDRDDLTGEPLVQRDD
>Random_Gram-negative_AKL_gnjvq
RRVHPGSGRVYHVVYNPSKVEGKDDVTGEDLIIRDD
>Random_Gram-negative_AKL_xapht
RRMHPASGRNYHIIFNPPKVEGKDDATGEDLIQRED
>Random_Gram-negative_AKL_vhtcj
RWYHLKSGRIYHTLYNPPLTAGKDDDTGEPLEQ---
>Random_Gram-negative_AKL_jnrhr
RWVHKSSGRTYHEVFRPPRTPGKDDVTGEDLHQRPD
>Random_Gram-negative_AKL_kmyvp
RWIHKPSGRTYHEVFRPPKTPGKDDITGEDLYQRPD
>Random_Gram-negative_AKL_bbbbb
RWYHPKSGRIYHTFYNPPLNAGKDDYTGEPLVQ---
>Random_Gram-negative_AKL_obouo
RRIHPASGRTYHTKFNPPKVADKDDVTGEPLITRTD
>Random_Gram-negative_AKL_ellkt
RRTHPASGRTYHVKFNPPKVEGKDDVTGEPLVQRDD
>Random_Gram-negative_AKL_sxldp
RRVHQASGRSYHIVYNPPKVEGKDDVTGEDLIIRAD
>Random_Gram-negative_AKL_sckku
RRVHQASGRSYHIVYNPPKVEGKDDVTGEDLIIRAD
>Random_Gram-negative_AKL_fbqzv
RRVHQTSGRSYHIVYNPPKVEGKDDVTGEDLIIRAD
>Random_Gram-negative_AKL_ypuig
RRVHQTSGRSYHIVYNPPKVEGKDDVTGEDLIIRAD
>Random_Gram-negative_AKL_ltbjc
RRVHQASGRSYHIVYNPPKVEGKDDVTGEDLIIRAD
>Random_Gram-negative_AKL_wmjap
RRVHQASGRSYHIVYNPPKVEGKDDVTGEDLIIRAD
>Random_Gram-negative_AKL_oxood
RRVHQASGRSYHIVYNPPKVEGKDDVTGEDLIIRAD
>Random_Gram-negative_AKL_ddrjb
RWVHEPSGRVYNDTFNAPQVPGRDDVTGEPLVRRPD
>Random_Gram-negative_AKL_nwqma
RWIHPASGRSYHTKFAPPKVEGKDDFTGEPLIKRKD
>Random_Gram-negative_AKL_dtzyf
RRVHPASGRVYHTEHNPPKVAGKDDETGEELIQRED
>Random_Gram-negative_AKL_whwzb
RWIHPSSGRTYHTKFAPPKVSGVDDVTGEPLIQRKD
>Random_Gram-negative_AKL_dmvij
RWVHVPSGRVYNLDYNPPKVPFKDDITGEPLTKRSD
>Random_Gram-negative_AKL_xtwaf
RYVHLPSGRIYSLDYNPPKVPFKDDVTGEDLVKRED
>Random_Gram-negative_AKL_iyejp
RRTHPASGRTYHVKFNPPKVEGKDDVTGEPLVQRDD
>Random_Gram-negative_AKL_cbxjs
RLIHKPSGRIYHKIFNPPKTPFKDDITNEPLIQRED
>Random_Gram-negative_AKL_oglie
RRAHLPSGRTYHTVYNPPKEEGKDDVTGEELVVRDD
>Random_Gram-negative_AKL_jqtgo
RRTHPASGRTYHVKFNPPKVEGKDDVTGEPLVQRDD
>Random_Gram-negative_AKL_prkvf
RRQHPGSGRVYHLKYNPPKQEGLDDETGEPLIQRDD
>Random_Gram-negative_AKL_aincb
RRTHPASGRTYHVKFNPPKVEGKDDVTGEPLVQRDD
>Random_Gram-negative_AKL_whyuk
RRAHLPSGRTYHVVYNPPKVEGKDDVTGEDLVIRED
>Random_Gram-negative_AKL_tbpgo
RRAHLPSGRTYHVVYNPPKVEGKDDVTGEDLVIRED
>Random_Gram-negative_AKL_lkebr
RRAHLPSGRTYHVVYNPPKVEGKDDVTGEDLVVRDD
>Random_Gram-negative_AKL_npiwv
RRAHLPSGRTYHVVYNPPKVEGKDDVTGEDLVVRDD
>Random_Gram-negative_AKL_zzajl
RRVHAASGRVYHVKFNPPKVEDKDDVTGEELTIRKD
>Random_Gram-negative_AKL_pwhal
RRAHLASGRTYHVVYNPPKVEGKDDVTGEDLVVRDD
>Random_Gram-negative_AKL_hcuqd
RRTHPASGRTYHVKFNPPKQEGIDDITGEPLVQRDD
>Random_Gram-negative_AKL_pswng
RWVHAPSGRVYNTQFNAPKEPGKDDVTGEPLVQRAD
>Random_Gram-negative_AKL_eueyh
RWVHAPSGRVYNTTFHAPKVAGLDDITGEKLTKRPD
>Random_Gram-negative_AKL_iplvh
RRAHLPSGRTYHVVYNPPKVEGKDDVTGEDLVIRED
>Random_Gram-negative_AKL_ykocu
RRAHLPSGRTYHVVYNPPKVEGKDDVTGEDLVIRED
>Random_Gram-negative_AKL_pzokl
RYVHVPSGRVYNLQYNPPKVPGLDDITGEPLTKRLD
>Random_Gram-negative_AKL_dpucn
RLVHEPSGRVYHMTSKPPKVPMRDDITNEPLTQRKD
>Random_Gram-negative_AKL_hcasp
RRVHVASGRTYHVKYNPPKTEGVDDETGEPLIQRDD
>Random_Gram-negative_AKL_ynuts
RWVHAPSGRVYNTTFHAPKVPGLDDITGEKLTKRPD
>Random_Gram-negative_AKL_kfbqi
RRIHPASGRTYHTKFNPPKVADKDDVTGEPLITRTD
>Random_Gram-negative_AKL_fbphm
RRIHPASGRTYHTKFNPPKVADKDDVTGEPLITRTD
>Random_Gram-negative_AKL_xrebl
RRIHPASGRTYHTKFNPPKVADKDDVTGEPLITRTD
>Random_Gram-negative_AKL_snboh
RRVHQPSGRTYHVVYNPPKVEGKDDVTGEDLIIRQD
================================================
FILE: inst/extdata/Gram-positive_AKL.fasta
================================================
>Random_Gram-positive_AKL_pjxgp
RRTCVGCGTAFNYVMEPPKKEGICDACGGKLVVRDD
>Random_Gram-positive_AKL_essyp
RRTCVGCGTAFNYVMEPPKKEGICDACGGKLVVRDD
>Random_Gram-positive_AKL_lopeh
RRIHEASGRVYHVVFNPPKKSGVDDETGDQLLQRED
>Random_Gram-positive_AKL_mzuep
RRICRSCGATYHIHFNPPAQAGICDKCGGELYQRAD
>Random_Gram-positive_AKL_pjycw
RLYCPNCGETYHVSWKPPRKPGVCDNCGSRLVRRRD
>Random_Gram-positive_AKL_tmsgs
RRICARCGAIYHVKYMPPKIPGICDKCGGPLVQRRD
>Random_Gram-positive_AKL_byrtv
RRICQSCGGIFNIYTLPTKEKGICDLCKGSLYQRKD
>Random_Gram-positive_AKL_hynwj
RRICKSCGGIFNIYTLPTKEKEICDLCKGILYQRKD
>Random_Gram-positive_AKL_ycsho
RRICKECGATYHLEFNPPAKADVCDKCGGELYQRSD
>Random_Gram-positive_AKL_kgtzr
RWIHPSSGRVYNLDFNPPQVQED-------------
>Random_Gram-positive_AKL_tmtym
RRLDPVTGKIYHLKYSPPENEEIAS----RLTQRFD
>Random_Gram-positive_AKL_diswt
RYICPKCGRVYNLLFNPPKNDLRCDDDGTPLIRRSD
>Random_Gram-positive_AKL_mgvxi
RRLCPNCQRTYHILFAPPKKDSLCDYCSVQLVQRAD
>Random_Gram-positive_AKL_cfbwo
RRICKTCGASYHLVFNPPAEEGKCDKDGGELYTRAD
>Random_Gram-positive_AKL_rmuqg
RYTCGNCGAGYHDDFKKPKVEGTCDDCGEQMKRRAD
>Random_Gram-positive_AKL_ejowt
RSTCGSCGEVYNDITKPIPQDGKCTKCGGEFKRRAD
>Random_Gram-positive_AKL_qzdve
RFTCGGCGEGYHDSFKQPQAMGTCDKCGGEFKRRAD
>Random_Gram-positive_AKL_yiwft
RYSCGSCGAVYHDDTKPTKVEGVCDVCGSDLRRRAD
>Random_Gram-positive_AKL_ubfcu
RSTCAACGEGYHDSFKQPARAGTCDKCGGEFKRRPD
>Random_Gram-positive_AKL_yqxwz
RSTCGNCGEVYHDVTKPQPADGKCEKCGADFKRRAD
>Random_Gram-positive_AKL_vvrnb
RSTCGNCGEVYHDVTKPQPADGKCEKCGADFKRRAD
>Random_Gram-positive_AKL_kcxvy
RSTCANCGEVYHDETKPIPADGKCSVCGGEFKRRAD
>Random_Gram-positive_AKL_qimxn
RYTCGGCGEGYHDSFKTPAVAGVCDKCSGDMQRRPD
>Random_Gram-positive_AKL_gooyo
RYTCGGCGEGYHDSFKVPSVEGTCDKCGGEMKRRAD
>Random_Gram-positive_AKL_zgspg
RSTCGNCGEVYNDMTKPWPADGKCAKCGSDVRRRAD
>Random_Gram-positive_AKL_fbyaj
RFSCANCGALYHDTANPPAKEGVCDVCHSEFKRRPD
>Random_Gram-positive_AKL_spvav
RSTCGGCGEVYHDETKPWPEDGKCTNCGSEVKRRTD
>Random_Gram-positive_AKL_elbro
RYTCGGCGEGYHDSFKQPAVAGTCDKCGSNMTRRAD
>Random_Gram-positive_AKL_oxdgk
RRLCSGCGLDYNLIHHRPQVIDQCDVCGAPLTQRAD
>Random_Gram-positive_AKL_fxbao
RFTCGDCGEGYHDTFKTPKVADTCDNCGANMTRRAD
>Random_Gram-positive_AKL_siwfh
RYTCAGCGEGYHDSFKQPAVEGKCDKCGGEMTRRAD
>Random_Gram-positive_AKL_riiin
RSTCGGCGEVYHDETKPWAADGKCTNCGSDVKRRAD
>Random_Gram-positive_AKL_eposf
RRMCGQCGRSWHVEFNPTRVEGICDTCAGSLHQRED
>Random_Gram-positive_AKL_klpbd
RRIHLSSGRSYHIEFNPPRVEGKDDLSGEDLIQRED
>Random_Gram-positive_AKL_txwex
RRTDPLTGTIYHLKYNPPPEDDT-------------
>Random_Gram-positive_AKL_lyfma
RRSCPDCGFVYNIKMDPPKVDGVCDKCGCPLITRKD
>Random_Gram-positive_AKL_vlwew
RRSCPDCGFVYNIKMDPPKLDGVCDKCGCPLITRKD
>Random_Gram-positive_AKL_xzjec
RRLDPVTGRIYNLKSDPPSPDVVDR-----------
>Random_Gram-positive_AKL_zyvla
RWVHKASGRSYHATFNPPKSLKAC------------
>Random_Gram-positive_AKL_xxxrz
RYTCAKCGAGYHDKFQQPKVAGTCDSCGGEFTRRAD
>Random_Gram-positive_AKL_hlisn
RFTCAACGEGYHDHFKQAAVAGTCDKCGGDFRRRPD
>Random_Gram-positive_AKL_xeeqt
RYSCGNCGAVYHDETKPTKVEGVCDVCGSDLRRRAD
>Random_Gram-positive_AKL_yuxgl
RRLDPVTGRIYHLKYSPPENEEIAA----RLTQRFD
>Random_Gram-positive_AKL_gzzla
RRICRSCGASYHVLFNKPAIEGRCNACGGELYQRSD
>Random_Gram-positive_AKL_lrjxz
RRICESCGTTYHLVFNPPKVEGICDIDGGKLYQRED
>Random_Gram-positive_AKL_subqs
RRFCPNCKAGFHIDFMPSSKGNICDKCGTELITRKD
>Random_Gram-positive_AKL_rqgxp
RRACVDCGATYHLVYAPTKEEGICDKCGGGLILRDD
>Random_Gram-positive_AKL_ffzzd
RYTCANCGAGYHDTFKQPKIEGVCDECGSEFKRRPD
>Random_Gram-positive_AKL_ytofs
RRTSKVTGKIYHIKFNPPVDEKEED-----LVQRAD
>Random_Gram-positive_AKL_oygvi
RQTCKTCGSTYNIYYFPSKHPNVCDDCGGKLYQRSD
>Random_Gram-positive_AKL_cyvfv
RFICRNCGATYHKLYNAPKVEGTCDVCGHEFYQRDD
>Random_Gram-positive_AKL_mwtjn
RRACVGCGATYHVVYNPTKEEGTCDTCGGELIVRDD
>Random_Gram-positive_AKL_vaoec
RRACLKCGATYHIVYAAPKVENVCDTCGENLVLRDD
>Random_Gram-positive_AKL_gxjsl
RRACVGCGATYHLVYAPTKTEGICDVCGKELILRDD
>Random_Gram-positive_AKL_nyalo
RYTCGGCGEGYHDSFKMPNVAGICDKCGGEMKRRAD
>Random_Gram-positive_AKL_unitb
RSTCGGCGEVYNDITKPWPADGKCAKCGSDVKRRAD
>Random_Gram-positive_AKL_cgcxi
RRVHEGSGRIYHVKYDPPKTEGKDDETGDALIQRED
>Random_Gram-positive_AKL_umndp
RRVHAPSGRVYHTVYNPPKVAGKDNETGDELTIRVD
>Random_Gram-positive_AKL_diikj
RFTCGGCGEGYHDSFKPTDKPGICDACGGDMKRRAD
>Random_Gram-positive_AKL_uwyra
RSTCAGCGEVYNDITKPIPADGICPKCGGEFKRRAD
>Random_Gram-positive_AKL_nlnlr
RRQDPETGAIYHLKFNPPADEAVLA----RLVQRKD
>Random_Gram-positive_AKL_easkj
RRVCSHCGTPFHLESNPPKKPDVCDVCGGELIERDD
>Random_Gram-positive_AKL_bzgzv
RQNCRKCGEIYNKLFMPSKVEGVCDKCGGELFQRPD
>Random_Gram-positive_AKL_rdkpc
RRICESCGTTYHLVFNPPKVEGICDIDGGKLYQRED
>Random_Gram-positive_AKL_etmff
RRMCKECGATYHILFNPPTKADQCDKCGGQLYQRDD
>Random_Gram-positive_AKL_ynwrz
RRICESCGTTYHLVFNPPKVEGICDIDGGKLYQRED
>Random_Gram-positive_AKL_fxmpj
RRACVDCGATYHIVYAPTEKEDVCDKCGGSLILRDD
>Random_Gram-positive_AKL_wkyro
RFICRNCGTTYHKLYNAPKVEGTCDVCGHEFYQRDD
>Random_Gram-positive_AKL_yemlt
RQTCKTCGATYNIYYFPSKHPNICDDCGGKLYQRSD
>Random_Gram-positive_AKL_ekvfi
RRLDPVTGRIYHLKYSPPENEEIAA----RLTQRFD
>Random_Gram-positive_AKL_quwnk
RRACVGCGATYHIVYNPTKVEGKCDVCSSDLILRDD
>Random_Gram-positive_AKL_qrxch
RRVCEKCGATYHLLYKKPKAEGVCDICGGTLIQRKD
>Random_Gram-positive_AKL_qrppk
RRICESCGTTYHLVFNPPKVEGICDIDGGKLYQRED
>Random_Gram-positive_AKL_fnlju
RRICKECGATYHLEFNPPAKADVCDKCGGELYQRSD
>Random_Gram-positive_AKL_kvakt
RRICKECGATYHLEFNPPAKADVCDKCGGKLYQRSD
>Random_Gram-positive_AKL_rcuhp
RRICKECGATYHLEFNPPAKADVCDKCGGELYQRSD
>Random_Gram-positive_AKL_tbwwv
RRICKECGATYHLEFNPPAKADVCDKCGGELYQRSD
>Random_Gram-positive_AKL_acvdx
RRICKECGATYHLEFNPPAKADVCDKCGGELYQRSD
>Random_Gram-positive_AKL_wojzq
RRICKECGATYHLEFNPPAKADVCDKCGGELYQRSD
>Random_Gram-positive_AKL_jipjw
RRICKECGATYHLEFNPPAKADVCDKCSGELYQRSD
>Random_Gram-positive_AKL_hrayu
RRICKECGATYHLEFNPPAKADVCDKCGGELYQRSD
>Random_Gram-positive_AKL_kgavl
RRICKECGATYHLEFNPPAKADVCDKCGGELYQRSD
>Random_Gram-positive_AKL_pusim
RRICKECGATYHLEFNPPAKADVCDKCGGELYQRSD
>Random_Gram-positive_AKL_vwdds
RRICKECGATYHLEFNPPAKADVCDKCGGELYQRSD
>Random_Gram-positive_AKL_tkoth
RRICKECGATYHLEFNPPATADVCDKCGGELYQRSD
>Random_Gram-positive_AKL_hcbyk
RRICKECGATYHLEFNPPAKADVCDKCGGELYQRSD
>Random_Gram-positive_AKL_ahjnk
RRICKECGATYHLEFNPPAKADVCDKCGGELYQRSD
>Random_Gram-positive_AKL_dryqn
RRICKECGATYHLEFNAPAKADVCDKCGGELYQRSD
>Random_Gram-positive_AKL_bkgrl
RRICKECGATYHLEFNPPAKADVCDKCGGELYQRSD
>Random_Gram-positive_AKL_piqkt
RRICKECGATYHLEFNAPAKADVCDKCGGELYQRSD
>Random_Gram-positive_AKL_xmjgd
RRICKECGATYHLEFNAPAKADVCDKCGGKLYQRSD
>Random_Gram-positive_AKL_fmrku
RRICKECGATYHLEFNPPANADVCDKCGGDLYQRSD
>Random_Gram-positive_AKL_kiqtd
RRICKECGATYHLEFNPPANADVCDKCGGDLYQRSD
>Random_Gram-positive_AKL_awqsq
RRICKECGATYHLEFNPPANADVCDKCGGDLYQRSD
>Random_Gram-positive_AKL_cjuqw
RRICKECGATYHLEFNPPANADVCDKCGGDLYQRSD
>Random_Gram-positive_AKL_etzsu
RRICKECGATYHLEFNPPANADVCDKCGGDLYQRSD
>Random_Gram-positive_AKL_ndrpd
RRICKECGATYHLEFNPPANADVCDKCGGDLYQRSD
>Random_Gram-positive_AKL_taebr
RRICKECGATYHLEFNPPAKADVCDKCGGELYQRSD
>Random_Gram-positive_AKL_qvzlz
RRICKECGATYHLEFNPPAKADVCDKCGGELYQRSD
>Random_Gram-positive_AKL_yhzrz
RRICKECGATYHLEFNPPAKADVCDKCGGELYQRSD
================================================
FILE: inst/extdata/LeaderRepeat_All.fa
================================================
>Ain_RyC-MR95
ATCCGTTGATCAAATTTGAGGTTTGAGAGATATGTAAATTCAAAGGATAATCAAAC
>Asp_D21
ATCCGTTGATCAAATTTGAGGTTTGAGAGATATGTAAATTCAAAGGATAATCAAAC
>Bbi_S17
AGGAATCCTTAAGGCTATCGGTTTCAGATGCCTGTCAGATCAATGACTTTGACCAC
>Bca_FSLF6-1037
ACAAAATCGACGCATTTGAGGTTTTAGAGCTGTGTTAAATTGAATGGTATTAAAAC
>Bfi_16/4
GTTGGAAATTTGGATTTGACGTTTTAGTACCCGGGAAAATTAAGTGATTGGAAAAC
>Bpe_CAG:437
TTTGGATAACATGATTTGGTATTTTAGTACCTGAACAAATTACGTGACTGTAAAAC
>Bsp_AC2005
TTTATCATACTATATTTGGTGTTTTAGTACCTAGAGAAATTAAGTGATTAGAAAAC
>Bth_DSM20171
ACAAAATTTCATTGTTTGAGGTTTTAGAGCTGTGTTAAATTGAATGGTATTAAAAC
>Cgl_PW2
GAGAAGATTTTGATCCAATGGTTTTGGAGCAGTGTCGTTCTGACTGGTAATCCAAC
>Cla_DSM_14151
ATGGCTCTCTAAAATTTGAGGTTTTAGACCAGTGTAATTTTAGAGAGTAGTAAAAC
>Cma_M35/04/3
TTTAAATATTACAATTTAAGGTTCTTGTACTTTCTAGATTTTCATATTAGTAAAAC
>Cmi_DSM15897
ATTGGATTTTTGAATTTGAGGTTTTAGGGTTATGTTATTTTGAACTGAATTAAAAC
>Csp_CAG:230
CGATTATATTTGAATTTGATATTTTAGTACCTGAAAGAATTGAGTTATCGTAAAAC
>Csp_ZWU0011
TACGTTATAATGAAATTGACATTTTGGTACTCTCGCATCTTTTGGTATAAGGAAAC
>Dlo_AGR2136
CGGCGAGAACCGGATTTGAGGTTTGAGAGTCTTGTTAATACGGAAGGATTTTAAAC
>Edo_DSM3991
TATGTTAAAATATGTTTGAGGTTTTGTTACCATATGGATTTTTGCTAGATTAAGAC
>Efa_1141733
GGAAAAATTTTTTCTGCGAGGTTTTAGAGCTATGCTGATTTGAATGCTTCCAAAAC
>Efa_D32_1
GAAAAAAATAATTCTCCGAGGTTTTAGAGTCATGTTGTTTAGAATGGTACCAAAAC
>Efa_OG1RF_1
GAAAAAAATAATTCTCCGAGGTTTTAGAGTCATGTTGTTTAGAATGGTACCAAAAC
>Efa_TX0012
TTTCAAATTTTAAATTTGAGGTTTTTGTACTCTCAATAATTTCTTATCAGTAAAAC
>Efa_TX0012_2
TTTCAAATTTTAAATTTGAGGTTTTTGTACTCTCAATAATTTCTTATCAGTAAAAC
>Eit_DSM15952
AGATAAAAAATATCTGCGAGGTTTTAGAGCTATGTTGAATCGAATGCTTCCAAAAC
>Emu_QU25_DNA
GAAAAAATTTTTTCTACGAGGTTTTAGAGCTATGTTGAATTGAATGCTTCCAAAAC
>Eph_ATCCBAA-412
AGAAAGAAAATGGCTGCGAGGTTTTAGAGCCATGTTGAATTGAATGCTTCCAAAAC
================================================
FILE: inst/extdata/Rfam/RF00458.fasta
================================================
>AF178440.1/5925-6123
UUGACUAUGUGAUCUUGCUUUCG----UAAUAAAAUUCUGUACAUAAAAGUCGAAAGUAUUGCUAUAGUUAAGGUUGCGCUUGCCUAUUUAGGCAUACUUCUCAGGAUGGCGCG-UUGCAGUCCAA-CAAG-AUCCAGGGACUGUACAGAAUUUUCC-UAUACCUCGAGUCGGGUUU-GGAA--UCUAAGGUUGACUCGCUGUAAAUAAU
>AB017037.1/6286-6484
GAAAAUGUGUGAUCUGAUUAGAAG--UAAGAAAAUUCCUAG-UUAUAAUAUUUUUAAUACUGCUACAUUUUU-AAGACCCUUAGUUAUUUAGCUUUACCGCCCAGGAUGGGGUG-CAGCGUUCCUG-CAA-UAUCCAGGGCAC--CUAGGUGCAGCCUUGUAGUUUUAGUGGACUUUAGGCU--AAAGAAUUUCACUAGCAAAUAAUAAU
>AF014388.1/6078-6278
GUUAAGAUGUGAUCUUGCUUCCUU--AUACAAUUUUGAGAGGUUAAUAAGAAGGAAGUAGUGCUAUCUUAAU-AAUUAGGUUAACUAUUUAGUUUUACUGUUCAGGAUGCCUAU-UGGCAGCCCCA-UAA-UAUCCAGGACAC-CCUCUCUGCUUCUUAUAUGAUUAGGUUGUCAUUUAGAA--UAAGAAAAUAACCUGCUAACUUUCAA
>AF218039.1/6028-6228
GCAAAAAUGUGAUCUUGCUUGUAA--AUACAAUUUUGAGAGGUUAAUAAAUUACAAGUAGUGCUAUUUUUGU-AUUUAGGUUAGCUAUUUAGCUUUACGUUCCAGGAUGCCUAG-UGGCAGCCCCA-CAA-UAUCCAGGAAGC-CCUCUCUGCGGUUUUUCAGAUUAGGUAGUCGAAAAACC--UAAGAAAUUUACCUGCUACAUUUCAA
>AF183905.1/5647-5848
CCAACAAUGUGAUCUUGCUUGCGGA-GGCAAAAUUUGCACAGUAUAAAAUCUGCAAGUAGUGCUAUUGUUGG-AAUCACCGUACCUAUUUAGGUUUACGCUCCAAGAUCGGUGGAUAGCAGCCCUAUCAA-UAUCUAGGAGAA-CUGUGCU-AUGUUUAGAAGAUUAGGUAGUCUCUAAACA---GAACAAUUUACCUGCUGAACAAAUU
>AB006531.1/6003-6204
CUGACUAUGUGAUCUUAUUAAAAUUAGGUUAAAUUUCGAGGUUAAAAAUAGUUUUAAUAUUGCUAUAGUCUU-AGAGGUCUUGUAUAUUUAUACUUACCACACAAGAUGGACCG-GAGCAGCCCUC-CAA-UAUCUAGUGUAC--CCUCGUGCUCGCUCAAACAUUAAGUGGUGUUGUGCGA--AAAGAAUCUCACUUCAAGAAAAAGAA
>AF022937.1/6935-7121
AGUGUUGUGUGAUCUUGCGCGAU-------AAAUGCUGACG---UGAAAACGUUGCGUAUUGCUACAACACU-----UGGUUAGCUAUUUAGCUUUACUAAUCAAGACGCCGUC-GUGCAGCCCAC-AAAA-GUCUAGAUA----CGUCACAGGAGAGCAUACGCUAGGUCGCGUUGACUAUCCUUAUAUAU-GACCUGCAAAUAUAAAC
================================================
FILE: inst/extdata/Rfam/RF03120.fasta
================================================
>KU973692.1/1-298
AUAUUAGGUUUUUACCUACCCAGGAA--AAGCCAACCAA-CCUCGAUCUCUUGUAGAUCUGUUCUCUAAACGA-ACUUUAAAA------UCUGUGUAGCUGUCGCUCGGCUGCAUGCCUAGUGCACCUACGCAGUAUAAACAAUAAUAAACUUUACUGUCGUUGACAAGAAACGAGUAACUCGUCCCUCUUCUGCAGACUGCUUACGGUUUCGUCCGUGUUGCAGUCGAUCAUCAGCAUACCUAGGUUUCGUCCGGGUGUGACCGAAAGGUAAGAUGGAGAGCCUUGUUCUUGGAAUCAACGAGAAAA
>DQ071615.1/1-298
AUAUUAGGUUUUUACCUACCCAGGAA--AAGCCAACCAA-CCUCGAUCUCUUGUAGAUCUGUUCUCUAAACGA-ACUUUAAAA------UCUGUGUAGCUGUCGCUUGGCUGCAUGCCUAGUGCACCUACGCAGUAUAAACAAUAAUAAACUCUACUGUCGUUGACAAGAAACGAGUAACUCGUCCCUCUUCUGCAGACUGCUUACGGUUUCGUCCGUGUUGCAGUCGAUCAUCAGCAUACCUAGGUUUCGUCCGGGUGUGACCGAGAGGUAAGAUGGAGAGCCUUGUUCUUGGAAUCAACGAGAAAA
>KF367457.1/1-298
AUAUUAGGUUUUUACCUACCCAGGAA--AAGCCAACCAA-CCCCGAUCUCUUGUAGAUCUGUUCUCUAAACGA-ACUUUAAAA------UCUGUGUAGCUGUCGCUCGGCUGCAUGCCUAGUGCACCUACGCAGUAUAAACAAUAAUAAAUUUUACUGUCGUUGACAAGAAACGAGUAACUCGUCCCUCUUCUGCAGACUGCUUACGGUUUCGUCCGUGUUGCAGUCGAUCAUCAGCAUACCUAGGUUUCGUCCGGGUGUGACCGAAAGGUAAGAUGGAGAGCCUUGUUCUUGGUGUCAACGAGAAAA
>MK211377.1/1-296
--UUUAGGUUUUUACCUACCCAGGAA--AAGCCAACCAA-CCUCGAUCUCUUGUAGAUCUGUUCUCUAAACGA-ACUUUAAAA------UCUGUGUAGCUGUCGCUCGGCUGCAUGCCUAGUGCACCUACGCAGUAUAAACAAUAAUAAAUUUUACUGUCGUUGACAAGAAACGAGUAACUCGUCCCUCUUCUGCAGACUGCUUACGGUUUCGUCCGUGUUGCAGUCGAUCAUCAGCAUACCUAGGUUUCGUCCGGGUGUGACCGAAAGGUAAGAUGGAGAGCCUUGUUCUUGGUGUCAACGAGAAAA
>MK062184.1/1-299
AUAUUAGGUUUUUACCUACCCAGGAA--AAGCCAACCAA-CCUCGAUCUCUUGUAGAUCUGUUCUCUAAUGGUCGCUUUAAAA------UCUGUGUAGCUGUCGCUCGGCUGCAUGCCUAGUGCACCUACGCAGUAUAAACAAUAAUAAAUUUUACUGUCGUUGACAAGAAACGAGUAACUCGUCCCUCUUCUGCAGACUGCUUACGGUUUCGUCCGUGUUGCAGUCGAUCAUCAGCAUACCUAGGUUUCGUCCGGGUGUGACCGAAAGGUAAGAUGGAGAGCCUUGUUCUUGGUGUCAACGAGAAAA
>AY559082.1/1-297
AAAUUUUGUUU-CAUCUAUACAGGAA--AAGCCAACCAA-CCUCGAUCUCUUGUAGAUCUGUUCUCUAAACGA-ACUUUAAAA------UCUGUGUAGCUGUCGCUCGGCUGCAUGCCUAGUGCACCUACGCAGUAUAAACAAUAAUAAAUUUUACUGUCGUUGACAAGAAACGAGUAACUCGUCCCUCUUCUGCAGACUGCUUACGGUUUCGUCCGUGUUGCAGUCGAUCAUCAGCAUACCUAGGUUUCGUCCGGGUGUGACCGAAAGGUAAGAUGGAGAGCCUUGUUCUUGGUGUCAACGAGAAAA
>DQ412043.1/1-294
AUAUUAGGUUUUUACCUACCCAGGAA--AAGCCAACCAA-CCUCGAUCUCUUGUAGAUCUGUUCUUUAAACGA-ACUUUAAAA------UCUGUGUGGCUGUCGCUUGGCUGUAUGCCUAGUGCACCUACACAGUAUAAA---UAAU-AACUUUACUGUCGUUGACAAGAAACGGGUAACUCGUCCUUCUUCUGCAGACUGCUUACGGUUUCGUCCGUGUUGUAGUCGAUCAUCAGCAUACCUAGGUUUCGUCCGGGUGUGACCGAAAGGUAAGAUGGAGAGCCUUGUUCUUGGUGUCAACGAGAAAA
>KP886809.1/1-297
AUAUUAGGUUUUUACCUACCCAGGAA--AAGCCAACCAA-CCUCGAUCUCUUGUAGAUCUGUUCUUUAAACGA-ACUUUAAAA------UCUGUGUGGCUGUCGCUCGGCUGUAUGCCUAGUGCACCUACACAGUAUAAAUAUUAAU-AACUUUACUGUCGUUGACAAGAAACGAGUAACUCGUCCUUCUUCUGCAGACUGCUUACGGUUUCGUCCGUGUUGCAGUCGAUCAUCAGCAUACCUAGGUUCCGUCCGGGUGUGACCGAAAGGUAAGAUGGAGAGCCUUGUUCUUGGUGUCAACGAGAAAA
>DQ022305.2/1-295
--GUUAGGUUUUUACCUACCCAGGAA--AAGCCAACCAA-CCUUGAUCUCUUGUAGAUCUGUUCUCUAAACGA-ACUUUAAAA------UCUGUGUGGCUGUCGCUCGGCUGCAUGCCUAGCGCACCUACGCAGUAUAAAUAUUAAU-AACUUUACUGUCGUUGACAAGAAACGAGUAACUCGUCCCUCUUCUGCAGACUGCUUACGGUUUCGUCCGUGUUGCAGUCGAUCAUCAGCAUACCUAGGUUUCGUCCGGGUGUGACCGAAAGGUAAGAUGGAGAGCCUUGUUCUUGGUGUCAAUGAGAAAA
>MK211374.1/1-294
---UUAGGUUUUUACCUACCCAGGAA--AAGCCAACCAA-CUUCGAUCUCUUGUAGAUCUGUUCUCUAAACGA-ACUUUAAAA------UCUGUGUGGCUGUCGCUUGGCUGUAUGCCUAGUGCACCUACGCAGUACAAAUAUUAAU-AACUCUAUUGUCGUUGACAAGAAACGAGUAACUCGUCCUUCUUCUGCAGACUGCUUACGGUUUCGUCCGUGUCGCAGUCGAUCAUCAGCAUACCUAGGUUUCGUCCGGGUGUGACCGAAAGGUAAGAUGGAGAGCCUUGUUCUUGGUGUCAACGAGAAAA
>DQ648857.1/1-297
AUAUUAGGUUUUUACCUACCCAGGAA--AAGCCAACCAA-UCUCGAUCUCUUGUAGAUCUGUUCUCUAAACGA-ACUUUAAAA------UCUGUGUAGCUGUCGCUCGGCUGCAUGCCUAGUGCACUUACGCAGUAUAAAUAUUAAU-AACUUUACUGUCGCUGACUGGAUACGAGUAACUCGUCCUUCUUCUGCAGACUGCUUACGGUUUCGUCCGUGUUGCAGUCGAUCAUCAGCAUACCUAGGUUUCGUCCGGGUGUGACCGAAAGGUAAGAUGGAGAGCCUUGCUCUUGGUGUCAGCGAGAAAA
>KF569996.1/1-305
AUAUUAGGUUUUUACCUACCCAGGAA--AAGCCAACCAACCCUCGAUCUCUUGUAGAUCUGUUCUCUAAACGA-ACUUUAAAAGCAUUCUCUGUGUAGCUGUCGCUCGGCUGCAUGCCUAGUGCACCUACGCAGUAUAAACAAUAAUAAACUUUACUGUCGUUGACAAGAAACGAGUAACUCGUCCCCCUUCUGCAGACUGCUUGCGGUUUCGUCCGUGUUGCAGUCGAUCAUCAGCAUACCUAGGUUCCGUCCGGGUGUGACCGAAAGGUAAGAUGGAGAGCCUUGUUCUUGGAAUCAACGAGAAAA
>MT163718.1/1-299
UUGGUUGGUUUAUACCUUCSCAGGUAACAAACCAACCAACUUUCGAUCUCUUGUAGAUCUGUUCUCUAAACGA-ACUUUAAAA------UCUGUGUGGCUGUCACUCGGCUGCAUGCUUAGUGCACUCACGCAGUAUAAUUAAUAACUAA--UUACUGUCGUUGACAGGACACGAGUAACUCGUCUAUCUUCUGCAGGCUGCUUACGGUUUCGUCCGUGUUGCAGCCGAUCAUCAGCACAUCUAGGUUUCGUCCGGGUGUGACCGAAAGGUAAGAUGGAGAGCCUUGUCCCUGGUUUCAACGAGAAAA
>MT344963.1/1-299
AUUAAAGGUUUAUACCUUCCCAGGUAACAAACCAACCAACUUUCGAUCUCUUGUAGAUCUGUUCUCUAAACGA-ACUUUAAAA------UCUGUGUGGCUGUCACUCGGCUGCAUGCUUAGUGCACUCACGCAGUAUAAUUAAUAACUAA--UUACUGUCGUUGACAGGACACGAGUAACUCGUCUAUCUUCUGCAGGCUGCUUACGGUUUCKUCCGUGUUGCAGCCGAUCAUCAGCACAUCUAGGUUUUGUCCGGGUGUGACCGAAAGGUAAGAUGGAGAGCCUUGUCCCUGGUUUCAACGAGAAAA
>MT019530.1/1-299
AUUAAAGGUUUAUACCUUCCCAGGUAACAAACCAACCAACUUUCGAUCUCUUGUAGAUCUGUUCUCUAAACGA-ACUUUAAAA------UCUGUGUGGCUGUCACUCGGCAGCAUGCCGAGUGCAGCCACACAGUAUAAUUAAUAACUAA--UUACUGUCGUUGACAGGACACGAGUAACUCGUCUAUCUUCUGCAGGCUGCUUACGGUUUCGUCCGUGUUGCAGCCGAUCAUCAGCACAUCUAGGUUUCGUCCGGGUGUGACCGAAAGGUAAGAUGGAGAGCCUUGUCCCUGGUUUCAACGAGAAAA
>MT263421.1/1-296
AU---AGGUUUAUACCUUCCCAGGUAACAAACCHUUHAACUUUCGAUCUCUUGUAGAUCUGUUCUCUAAACGA-ACUUUAAAA------UCUGUGUGGCUGUCACUCGGCUGCAUGCUUAGUGCACUCACGCAGUAUAAUUAAUAACUAA--UUACUGUCGUUGACAGGACACGAGUAACUCGUCUAUCUUCUGCAGGCUGCUUACGGUUUCGUCCGUGUUGCAGCCGAUCAUCAGCACAUCUAGGUUUCGUCCGGGUGUGACCGAAAGGUAAGAUGGAGAGCCUUGUCCCUGGUUUCAACGAGAAAA
>MT345869.1/1-293
------GGUUUAUACCUUCCCAGGUAACAAUCCAWUCAACUUUCGAUCUCUUGUAGAUCUGUUCUCUAAACGA-ACUUUAAAA------UCUGUGUGGCUGUCACUCGGCUGCAUGCUUAGUGCACUCACGCAGUAUAAUUAAUAACUAA--UUACUGUCGUUGACAGGACACGAGUAACUCGUCUAUCUUCUGCAGGCUGCUUACGGUUUCGUCCGUGUUGCAGCCGAUCAUCAGCACAUCUAGGUUUUGUCCGGGUGUGACCGAAAGGUAAGAUGGAGAGCCUUGUCCCUGGUUUCAACGAGAAAA
>MT345841.1/1-293
------CUYYUAUACCUUCCCAGGUAACAAACCHWYCAACUUUCGAUCUCUUGUAGAUCUGUUCUCUAAACGA-ACUUUAAAA------UCUGUGUGGCUGUCACUCGGCUGCAUGCUUAGUGCACUCACGCAGUAUAAUUAAUAACUAA--UUACUGUCGUUGACAGGACACGAGUAACUCGUCUAUCUUCUGCAGGCUGCUUACGGUUUCGUCCGUGUUGCAGCCGAUCAUCAGCACAUCUAGGUUUCGUCCGGGUGUGACCGAAAGGUAAGAUGGAGAGCCUUGUCCCUGGUUUCAACGAGAAAA
>MG772934.1/1-298
AUAUUAGGUUUUUACCUUCCCAGGUAACAAACCAACUAACUCUCGAUCUCUUGUAGAUCUGUUCUCUAAACGA-ACUUUAAAA------UCUGUGUGACUGUCACUUAGCUGCAUGCUUAGUGCACUCACGCAGUUUAAUUA-UAAUUAA--UUACUGUCGUUGACAGGACACGAGUAACUCGUCUAUCUUCUGCAGGUUGCUUACGGUUUCGUCCGUGUUGCAGCCGAUCAUCAGCAUACCUUGGUUUCGUCCGGGUGUGACCGAGAGGUAAGAUGGAGAGCCUUGUCCCUGGUUUCAACGAGAAAA
================================================
FILE: inst/extdata/Rfam/RF03120_SS.txt
================================================
>RF03120
......<<<<<<<.<<<....>>>>>..>>>>>...........<<<<<.....>>>>>.<<<<.......>>.>>..............<<<<<<<<.<<.<<<<.<<<.....>>>.>>>>>>.>>>>>>>>........................((((((((((((.(((((...(((.(((.((((<<<..<<<<<<.<<<<<......>>>>>..>>>>>>......>>><<<<<<<.<<......>>>>>>>>><<<....>>>)))).)))))).))))))))))...))))))).....
================================================
FILE: inst/extdata/sample.fasta
================================================
>PH4H_Rattus_norvegicus
MAAVVLENGVLSRKLSDFGQETSYIEDNSNQNGAISLIFSLKEEVGALAKVLRLFEENDINLTHIESRPSRLNKDEYEFF
TYLDKRTKPVLGSIIKSLRNDIGATVHELSRDKEKNTVPWFPRTIQELDRFANQILSYGAELDADHPGFKDPVYRARRKQ
FADIAYNYRHGQPIPRVEYTEEEKQTWGTVFRTLKALYKTHACYEHNHIFPLLEKYCGFREDNIPQLEDVSQFLQTCTGF
RLRPVAGLLSSRDFLGGLAFRVFHCTQYIRHGSKPMYTPEPDICHELLGHVPLFSDRSFAQFSQEIG-LASLGAPDEYIE
KLATIYWFTVEFGLCKEG-DSIKAYGAGLLSSFGELQYCLSD-KPKLLPLELEKTACQEYSVTEFQPLYYVAESFSDAKE
KVRTFAATIPRPFSVRYDPYTQRVEVLDNTQQLKILADSINSEVGILCNALQKIKS
>PH4H_Mus_musculus
MAAVVLENGVLSRKLSDFGQETSYIEDNSNQNGAVSLIFSLKEEVGALAKVLRLFEENEINLTHIESRPSRLNKDEYEFF
TYLDKRSKPVLGSIIKSLRNDIGATVHELSRDKEKNTVPWFPRTIQELDRFANQILSYGAELDADHPGFKDPVYRARRKQ
FADIAYNYRHGQPIPRVEYTEEERKTWGTVFRTLKALYKTHACYEHNHIFPLLEKYCGFREDNIPQLEDVSQFLQTCTGF
RLRPVAGLLSSRDFLGGLAFRVFHCTQYIRHGSKPMYTPEPDICHELLGHVPLFSDRSFAQFSQEIG-LASLGAPDEYIE
KLATIYWFTVEFGLCKEG-DSIKAYGAGLLSSFGELQYCLSD-KPKLLPLELEKTACQEYTVTEFQPLYYVAESFNDAKE
KVRTFAATIPRPFSVRYDPYTQRVEVLDNTQQLKILADSINSEVGILCHALQKIKS
>PH4H_Homo_sapiens
MSTAVLENPGLGRKLSDFGQETSYIEDNCNQNGAISLIFSLKEEVGALAKVLRLFEENDVNLTHIESRPSRLKKDEYEFF
THLDKRSLPALTNIIKILRHDIGATVHELSRDKKKDTVPWFPRTIQELDRFANQILSYGAELDADHPGFKDPVYRARRKQ
FADIAYNYRHGQPIPRVEYMEEEKKTWGTVFKTLKSLYKTHACYEYNHIFPLLEKYCGFHEDNIPQLEDVSQFLQTCTGF
RLRPVAGLLSSRDFLGGLAFRVFHCTQYIRHGSKPMYTPEPDICHELLGHVPLFSDRSFAQFSQEIG-LASLGAPDEYIE
KLATIYWFTVEFGLCKQG-DSIKAYGAGLLSSFGELQYCLSE-KPKLLPLELEKTAIQNYTVTEFQPLYYVAESFNDAKE
KVRNFAATIPRPFSVRYDPYTQRIEVLDNTQQLKILADSINSEIGILCSALQKIK-
>PH4H_Bos_taurus
MSALVLESRALGRKLSDFGQETSYIEGNSDQN-AVSLIFSLKEEVGALARVLRLFEENDINLTHIESRPSRLRKDEYEFF
TNLDQRSVPALANIIKILRHDIGATVHELSRDKKKDTVPWFPRTIQELDNFANQVLSYGAELDADHPGFKDPVYRARRKQ
FADIAYNYRHGQPIPRVEYTEEEKKTWGTVFRTLKSLYKTHACYEHNHIFPLLEKYCGFREDNIPQLEEVSQFLQSCTGF
RLRPVAGLLSSRDFLGGLAFRVFHCTQYIRHGSKPMYTPEPDICHELLGHVPLFSDRSFAQFSQEIG-LASLGAPDEYIE
KLATIYWFTVEFGLCKQG-DSIKAYGAGLLSSFGELQYCLSD-KPKLLPLELEKTAVQEYTITEFQPLYYVAESFNDAKE
KVRNFAATIPRPFSVHYDPYTQRIEVLDNTQQLKILADSISSEVEILCSALQKLK-
>PH4H_Chromobacterium_violaceum
--------------------------------------------------------------------------------
----------------------------------------------------MNDRADFVVPD-----ITTRKNVGLSHD
AN------DFTLPQPLDRYSAEDHATWATLYQRQCKLLPGRACDEFMEGL----ERLEVDADRVPDFNKLNQKLMAATGW
KIVAVPGLIPDDVFFEHLANRRFPVTWWLREPHQLDYLQEPDVFHDLFGHVPLLINPVFADYLEAYGKGGVKAKALGALP
MLARLYWYTVEFGLINTP-AGMRIYGAGILSSKSESIYCLDSASPNRVGFDLMRIMNTRYRIDTFQKTYFVIDSFKQLFD
ATA-PDFAPLYLQLADAQPWGAGDVAPDDLVLNAGDRQGWADTEDV----------
>PH4H_Ralstonia_solanacearum
--------------------------------------------------------------------------------
-----------------------------------------------MAIATPTSAAPTPAPAGFTGTLTDKLREQFAEG
LDGQTLRPDFTMEQPVHRYTAADHATWRTLYDRQEALLPGRACDEFLQGL----STLGMSREGVPSFDRLNETLMRATGW
QIVAVPGLVPDEVFFEHLANRRFPASWWMRRPDQLDYLQEPDGFHDIFGHVPLLINPVFADYMQAYGQGGLKAARLGALD
MLARLYWYTVEFGLIRTP-AGLRIYGAGIVSSKSESVYALDSASPNRIGFDVHRIMRTRYRIDTFQKTYFVIDSFEQLFD
ATR-PDFTPLYEALGTLPTFGAGDVVDGDAVLNAGTREGWADTADI----------
>PH4H_Caulobacter_crescentus
--------------------------------------------------------------------------------
----------------------------------------------------MSG---------------DGLSNGPPPG
AR-----PDWTIDQGWETYTQAEHDVWITLYERQTDMLHGRACDEFMRGL----DALDLHRSGIPDFARINEELKRLTGW
TVVAVPGLVPDDVFFDHLANRRFPAGQFIRKPHELDYLQEPDIFHDVFGHVPMLTDPVFADYMQAYGEGGRRALGLGRLA
NLARLYWYTVEFGLMNTP-AGLRIYGAGIVSSRTESIFALDDPSPNRIGFDLERVMRTLYRIDDFQQVYFVIDSIQTLQE
VTL-RDFGAIYERLASVSDIGVAEIVPGDAVLTRGT-QAYATAGGRLAGAAAG---
>PH4H_Pseudomonas_aeruginosa
--------------------------------------------------------------------------------
----------------------------------------------------------------------MKTTQYVARQ
PD----------DNGFIHYPETEHQVWNTLITRQLKVIEGRACQEYLDGI----EQLGLPHERIPQLDEINRVLQATTGW
RVARVPALIPFQTFFELLASQQFPVATFIRTPEELDYLQEPDIFHEIFGHCPLLTNPWFAEFTHTYGKLGLKASKE-ERV
FLARLYWMTIEFGLVETD-QGKRIYGGGILSSPKETVYSLSD-EPLHQAFNPLEAMRTPYRIDILQPLYFVLPDLKRLFQ
LAQ-EDIMALVHEAMRLG-LHAPLFPPKQAA-------------------------
>PH4H_Rhizobium_loti
--------------------------------------------------------------------------------
----------------------------------------------------MSVAEYAR----------DCAAQGLRGD
YS--VCRADFTVAQDYD-YSDEEQAVWRTLCDRQTKLTRKLAHHSYLDGV----EKLGL-LDRIPDFEDVSTKLRKLTGW
EIIAVPGLIPAAPFFDHLANRRFPVTNWLRTRQELDYIVEPDMFHDFFGHVPVLSQPVFADFMQMYGKKAGDIIALGGDE
MITRLYWYTAEYGLVQEAGQPLKAFGAGLMSSFTELQFAVEGKDAHHVPFDLETVMRTGYEIDKFQRAYFVLPSFDALRD
AFQTADFEAIVARRKDQKALDPATV-------------------------------
================================================
FILE: inst/extdata/seedSample.fa
================================================
>hsa-let-7a-5p MIMAT0000062 Homo sapiens let-7a-5p
UGAGGUAGUAGGUUGUAUAGUU
>hsa-let-7b-5p MIMAT0000063 Homo sapiens let-7b-5p
UGAGGUAGUAGGUUGUGUGGUU
>hsa-let-7c-5p MIMAT0000064 Homo sapiens let-7c-5p
UGAGGUAGUAGGUUGUAUGGUU
>hsa-let-7d-5p MIMAT0000065 Homo sapiens let-7d-5p
AGAGGUAGUAGGUUGCAUAGUU
>hsa-let-7e-5p MIMAT0000066 Homo sapiens let-7e-5p
UGAGGUAGGAGGUUGUAUAGUU
>hsa-let-7f-5p MIMAT0000067 Homo sapiens let-7f-5p
UGAGGUAGUAGAUUGUAUAGUU
================================================
FILE: inst/extdata/sequence-link-tree.fasta
================================================
>Phy000B0HV_NEUCR
M-----GIGSATLG-----------------------------------SRIPTPVLVARAVVSSSDGK-----DC--VA
NPNLCEKP-VGGSQLTVPIVLGLW----------------RNMKKLAAEEAHDPHKSLDFGLDENM-----------GKA
KGRNMAG------EKDGNGSRFHAHQMSMDMNLSSPYLLPPDAH-GSQSSLNSLARTL-NPQDDPFRPVTQYTASDAASV
KSMP-R-----GTD-----------R-------GPGG------PFRGPPPRQGSMP-RSPEPTHA---RPGNG----PRP
PRI-SVQD------------P---SSNA-TS-D--NE-TS----------------------D----------------S
ERTLT-GSPRELHAATHK-------------------DGVKPPA-SPSQPISPANP-AV---------------------
>Phy000FCLK_ASPCL
--------------------------------------------------------------------------------
-------------------------------------------------------------------------------M
-------------REAEKGNPMHAKGMSLDIV-PSPYLLPPGLH-GSRESLHSLSRSV-IGDDDKYRHATSFL-GDNASV
RSQP-R---G-YHDDAMTYSR-SQ-S------K-VS------M--R-DDMNQGLLQ------------NAQRMSR--SSP
PL-YNTPPDGGSVHSPVGQD------------------------------------------R-----------------
GQDSG-LQLNLPRSLSPVHI-----------------PGFNGSR-GPSPV-----P-TS-PE------GNDDKLPS----
>Phy000FJDH_ASPFL
MH---YHHRHQT--HQDIHMV-VRSPP-RRPDI-VPRHRLP------YLV-PEPPTFVKRDSDPS--------QTCSAGD
TSSKCEKPTSTTTTTTLPVVLGAVVPILCAV-IVLIYLHRRNVRKLRSEDANDKHRSLDFGLDLEP-TG----GGNA--M
R-------Q----TEKSNGSYNHNKGISLDIG-PSPYLLPPGLH-GSRDSLHSLSRSI--GGDDKYRHATSFL-GDNASV
RSQS-R---G-AQDDAPSFTG-SA-R------K-AA------L---GDDMKQGLLG------------NAQRMSR--SSP
PL-YISPGEDGA-HVQVDPI------------------------------------------A-----------------
QPDHG-FQFELPRSPSPVLI-----------------PGAPSTK-ESITP-----TNNV-DK------------------
>Phy000FQ5O_ASPFU
MH---HHHQHLHFPRHGIHLA-VRSPP-RRPDI-VPRDRVP------LLVGTEDPTLVKRVPSTSTTSTA--STRCPEGD
TSSACEKYTNSSSTTTLPIVLGAVIPIVCAI-IVLFYLHRRNVKKLRQEDANDKHKSLDFGLDLEP-RA----GSKP--M
-------------TQAEKGSNMHSKGMSLDIG-QSPYLLPPGLH-GSRESLHSLSRSI-IGDDDKYRHASSFL-GDNASV
RSQP-R---G-FHDETSAFSR-SQ-S------K-AS------L--RGDDMNQGLLQ------------NAQRMSR--SSP
PL-YNAPSDGGSSHSPRGQG------------------------------------------N-----------------
GQDMG-LQLNLPRSLSPVHI-----------------PGVNGSR-GTSPA-----P-GGHAD------GSEDISSS----
>Phy000G05U_EMENI
MH---RHQQHQH--RHGKYLG-ARFAP-VEPAL-MPRNRPP------YLLMPEAPTLVKREPMPTTDSGR--VETCSPGD
NSARCEKNTSTASNTTLPVVLGAVIPIVCAI-IVLIFLHRRNVKKLRNEDANDKHKSLDFGMDLAP-SG----GRSG--M
Q-------E---------KGSHHMKGISLDIG-PSPYMLPPSIR-GSKDSLNSLPRTI-LADDDKYRHAHTYFSTDAQSI
RSQR-R-----VHDDAASVAG-ST-R------R-GA------F---GDEMNQGLLG------------NAQRISR--SSP
PL-YNPPEPTAGRAQ----P------------------------------------------Q-----------------
VQDAG-FELSLPRSPSPVHV-----------------SGLTSIN-ESTTE-----TGRE-AN------------------
>Phy000GDP6_ASPNG
M--------------------------------------------------RETPTLARREPLPSTDSSS------ASSS
TASSGTKPTSTLTTTTLPVVLGAVVPIVIAI-GILLYLHRRNVKKLRNEDANDKHKSLDFGLDLAP-TN----GAVP--M
Q-------Q----AEKTDRNAAHNKGISLDIG-PSPYLLPPGLH-NSRESLASMSRSIGDGDDDKYRHVGSFL-GDNSSL
RSHS-R---G-PHDDAASFTG-ST-R------R-AA------L---GDDMNQGLLR------------NAQRMSR--SSP
PL-YKTSSGDRNVQSPASSD------------------------------------------H-----------------
EHDHG-FQLDLPRSPSPVHV-----------------PGMAISE-PHT-------TSNE-VG------FAGDHAVTETSA
>Phy000HD5X_BOTFU
MADHQRLANIVRLARRV----P--LAE-AAAED-IGNIASIL------KMSLPDPVLMVRSATTSAAASS---STCAADD
TSAACEKP-VGPSAYTLPVILGIVIPVGGAI-ILFTILQRRYMKKAREEDLNDPTKNMDFGMGRIS-R--T------AGG
ESG-I--S---N-FDDEKGGAVRTRQMSLDLGGKSPYLLPPELH-NSRESLHSLSRTI-HSNEDPYRPVHEAV---GGSI
RSKQ-------GRNGSSIMTESSA-A--------PSK-----MYDAGSPDGQGLLS------------NAAAMSR--TTP
PSTGSSPP------------P---KSNS--I-----P-P-----------------------------------------
-ANMP-AEPKQAESPQNVARKGL--------------PGNFRPQ-DRFPTAMPVPM-PYP-------------DRESYAG
>Phy000IAZP_COCIM
MA------------RHTYRDP--SRLV-SRALA-IPVERSI------ILTALEPPSLVKRNPADAASSSSVPTKTCGPDD
TTGVCTRPVNSTTTLTLPIVLGAVIPLTCAF-IAFFFLHRRHVKKLRLEDANDKHKSLDFGLDFVP-SG----SNNNRRG
NGGNG-P------SMAEKSTRRGGHGVSMDLTLNSPYLLPPGLH-GSHESIHSLSRSL-HGEDDKYRHASAFPTGDSGSI
RSCS-PSFKRGGDD-ASSHNSPSS-K------Y-PY----------GDDMNQHLLK------------NAQRMSR--SPP
AI-ELDPIESDLGHPPHHA--------------------T----------------------A-----VSASE------S
GNTTF-HGRSELTVPTAVSS-----------------HGDRSSS-SSSER-----DDSV-LR---------KS-------
>Phy000KG2Q_MAGGR
MVGVTVHEGEYHLGSRM----P--VMA-RDAST-PAL-QIAA------DGPGFFKRLVARQSSDD----------CVNGE
PSNLCEKP-VTSQTLALPIALGVTIPLVALV-VMLIWLHRKNVRRQRQEDANDPHKSLDFGLDMGP------------GK
RKS-K--L---F-GGEKLGGGPHNRQISMDMNLSSPYLLPPNMQ-NSRESIHSLAKTL--HNEDPYRHITQYNASDAGSL
RSYK-A---G-GMD-------------------RPIG-----PKITVPTSRKGSLQATSPTSTIGSVPPRYEASQ---DD
YV-KPPPP------------A---ALK---S-P--TQ-DS----------------------TPYPDDKSGP-------L
ATVMP-SVP-EIQEPKPASLSK----E-SS---QAPS---LAAV-PPSSPLTISAP-EI---------------------
>Phy000ODBJ_SCLSC
MEDHQRLANIVRLARRV----P--LAE-AAAED-IDNIASIL------KMAVPDRVIMGRSSTTTSSTSS---STCAADD
TSAACEKP-IGPSAYTLPVILGIVIPVAGAI-ILFTILQRRYTKRAREEDANDPTKNMDFGMGRIS-R--T------AGG
ESS-I--S---N-FDDEKGDSGRPRQMSLDLGGKSPYLLPPELQ-DSRESLHSLSKTI-HQNEDPYRPVHEAV--GAASI
RSKQ-------GRNGSSILSASTV-A--------PSG-----MNDTGSPDGQGLLS------------NAAAMSR--TTP
PTAGFNPP------------P---RSNS--I-----P-P-----------------------------------------
-AKMP-EEPRQSP-EQNVDKKGP--------------PGNFRPQ-NGFSSTRSIPM-PFL-------------DWESYAG
>Phy000PFY6_UNCRE
MA------------RHAFQPA--SGLV-PRALA-IPLDRSI------LLTSLDHPSHVKRSPAATASSSAAATTSCGPND
TTGICTRPVSSTTTMTLPIVLGAAIPITCAI-IAFFFLHRRHVKKLRLEDANDKHKSLDFGLDFVP-SG----SNNNKRG
NGGNG-G------LMGEKSTRQRAHGVSLDLTMGNPYLLPPVSM-GSHESIHSLSKSL-HGGDDKYRHAAAFPSSENRF-
--------------------------------------------------RHSVLQ------------PTNPLA---SEP
RS-PLSPPGRNELTKLKQQ--------------------L----------------------------------------
------------------------------------------------DK-----EQSV-LR---------KS-------
>Phy00201Y5_COCHE
M-------------------------------------------------------------------------------
--------------------CATTVPVVGIA-VVLAFLHRRNKQKLREEDQRDKYKSNDWGMEGVIPK---------TSK
KGG-P-EM---S-ISEKEISGGHDRGLSIETG--SPYILPPGLH-GSRESFHSLSRST-HDPHDPYGPVAFLR--DDQSL
RSH----GPY-KGETNSVYT--A-SS-------SGT---------KKEGLQAGLLQ------------NAQRMST--SAP
VR-GESLS------------P---DSTR--SPD--SK-FAEAGIPLSPLNPRYEPEAPA-----AAPAPAPA-------P
AHAAP-VASKPTDVP-TI----------------------SIPE-PQVTEKQV---------------------------
>Phy00208KX_MYCGR
MY---IPRA------------EDS-----------R-VQRMV------DGAAAGLRIVARSL------A-------ERAE
SNSKEDTPNDRMKVQNIGIALGVIIPIGGAI-IVLTYLHRRHVKRQRVEDMNDPHKSLDFGLEGLG-SMPPQAPKKSRRG
KKGPE-MIV-TDFGGPTAHPSKRGHGMSLDLGVPSPYLLPAGLQ-GSKESIHSMSRN--YDEHDPYRSVAMM--RPSGET
DRF---R----GDDKGSVYSMSTG-N------R-SA------L--PQD--RASLIA------------NARPMS---ITP
SK-RSDPATSHPSTPADVSP------------------------------------------R-----------------
DSHSPISRTRSPLAKLSVDE-----------------TAIAEKQLEPLPS-----P-PTVPE------VALMMPPP-RKS
>Phy0020GNV_PYRTR
MP---HSHHLHHMRHQL----R--HDN-QLGSP-ITGSKTMH------VFERATRVLVARAESS-----------C-TND
SDPGCTKP---TQVPTMAIALAVIVPIVGVS-IVLCFLHRRNKRKLAEEDSKDQYKSNDWGMEGVA-K---------TNK
KKR-P-EM---S-LSEKDAGGGHDRGLSIEAG--SPYILPVGLH-GSRESFHSLSRSQ-HDPHDPYGPVAFLK--DDQST
RGSSVRGGPY-RNETGSVYT--T-SS-------SGT---------RKEGLQAGLLQ------------NAQRMST--SNP
VR-GDSLS------------P---VSTS--SPD--TK-FPDPGIPLSPLNPRFENQSPI-----SPPAASPS-------P
S-------IKPNSVP-TI----------------------SIPE-PGVTEKQV---------------------------
>Phy0022J75_CRYPA
MDEMLARRNGHLMGPRI----P--IGR-RVAAV--AE-DTSV------EASTPPSHVVGRSSSSTSDASSST-ATCSSSS
ASNTCEKP--TSTSIAGEISIGIAVPMAIIFICVLIYFHRRNLKRQAAEDRDPHHRSLDFGLGDTS-S----------GK
SKR-K-SM-----LGLGGEKSKHPRGLSIDMNLSSPYLLPEHVQ-GSRESMNSLAKTL-HQADDPYRPITKYM-SETGSV
GSLE-K---N-GRYTPSVMTASTK-RVSRQSYANPM------SPALQQPLRQNSYP-KSPLTPSAA--------------
----SSVT------------A---VETDIST-P--TAAKE----------------------PTVPEDGPMPPPQC---D
LPPLP-VVP-EIRQPAPVAQRGAA----REPVMQEHEEELDLPD-FSNNSKRESAD-EL---------------------
>Phy0022OIS_VERA1
MAATAFNGNGYRMGSRI----H--VRT-AEPTHEDAA-L----------LRSPGPVIAARKEC---------------DP
DHPDCEAPAVKPQTLI--IALSVVIPIVAIM-SILYYLHRRGIKKQRMEEASDPTMSLDFGINDDK-M---------GRG
GKRKS-VF---R-EKMLNLDPKHRAQVSMDMNLSSPYLLPPALQ-GSKQSLHSLARNL-HDDDDPYRPVNQYG-SEVGSI
RSFRPEK--E-GRAGSSVYTGSTE-R-------GSSL------HSRTHPPRQNSLP-KPPPLT-A---DPFATPTGARTP
QLETSPIS------------P---TGGS--------L-PH----------------------AIIPEIGTVSYAEDFDDS
NRNLP-HVP-DVTQPAPVAQRDARRVSSGASQSSWNEPAAQFPD-PAAHQVHNAAP-TL---------------------
>Phy003AMS0_602072
M--------------------------------------------------AETPTLARREPLPSTESSS-------SSS
SSSSETKPTSTLTTTTLPVVLGAVIPVVIAI-AILLYLHRRNVKKLRNEDANDKHKSLDFGLDLAP-T-----GAKP--M
Q-------Q----AEKLDRNAAHNKGVSLDIG-PSPYLLPPGLH-NSRESLSSLSRSIGDGDDDKYRHVGSFL-GDNASL
RSHS-R---G-PQDDASSFTG-ST-R------R-GA------L---GDDMNQGLLR------------NAQRMSR--SSP
PL-YTIPSGDRNVQSPASSD------------------------------------------H-----------------
ERDPG-FQLDLPRSPSPVHV-----------------PGMTISE-PTNSM-----TSNE-PE------FSGVHANTENSA
>Phy003BKXA_GIBZE
MGLTHYH--DQ----------R--ADIGQGASS-ISQ-KMAS------SSSHIFRRLARRENC----------------K
DDNSCAQS-SVSNS--------LVLPIVVAI-I--------NMKKQMLEDAHDPHKSLDFGLGDEG-G---------AKK
SAR-R-SI---FMGGGEKTLAHKPSQLSMDMNLSSPYLLPPGLQ-ESRESLNSLAKSLGNDNQDPYQYVAAITQSETGSL
RSFNPK---D-SHSRNTKFNSPRN---------SGKP-----GSLKMPPSRMNSLP-ETPVSATESRVDPFGTPKM--PA
PA-HPAKS------------P---FDS---E-KDAFH-PA----------------------PIVPEIGVVSD-------
FDEKN-AVP-SVQQPPIARSKT----------------------------------------------------------
>Phy003BOHC_AJECA
-----------------------------------------------MQIPPPPPTLARRHVVPK---------------
TPPEDARD----LLVMLPLPLYPYIPLTIAI-LVLVFLHRRHIRKLRSEDANDKHKSLDFGLDVVP-SG------NKKRG
RGRKG-G-MEMTTADAEKSVRRNDRGLSMDITMTSPYLLPPALN-GSHDSLHSLSRSV-HADDDRYRTATAFSAGDNSSM
RSFT-SNLKP-FPDDSVSFTGMSS-R------H-AP------P---GDEMHANLLR------------NAQRMSR--ASP
PP-GTATHSIGSSQSHRSPPR---KLTT--PT------PN----------------------I-----VS----------
DRSGI-HSPD---------------------------------------R-----SLAP-KSISTPGSELRKS-------
>Phy003DGO9_PENCH
MP---HAHH-------AGLVMRNH----VRRDV-IPPHRLPFLVPSTSSIATELPSLVARAE-----------AS-----
TTVTGEKPTSNLTTTVLPVVLGAGVPILCAI-VVLIVLHRRHVKKLLREDAMDKHKSLDFGMDTVG-PA----TRRK--G
P------------GMPPMSEPTHTKGLSLDVG---PYLMPPGLK-NSPESLRSMSI-----DDDKYRPATA-------SI
RSYP-R---------GSRFEG-------------------------ADDGNSGLLQ------------NAQRMSR--SSP
PL-YSSPIESHGRSLDQHND------------------------------------------Y-----------------
L-----GEVPGVTHPPAAQQ-----------------PGMAIGS-PNANRIPSPEP-LP--------------HLDSSLG
>Phy003PHXT_PENMQ
MS--HRHGMHHHVRRHI------PEDP-VQLES-VPLEPAP--------TISEAPSVIRRTSSATST--------C----
TGSSCETTSSSNLVNTLPVVLGVVIPVVLAI-AVLLFLHRRHVRKLRQEDANDKHKSLDFGMEVVR-AG----GGK----
-------------ANPEMGEKPHKHGMSLDI-ISSPYLLPPGLH-GSKESLRSLSKVI-SPDDDKYRLGLAAQ-SDTASL
RSYR-SHPRM-GQDDASSFRG-ST-R------H-GP------L---PDDMNQGLLQ------------NASRMSR--SPP
VD-ATSPLSVNHTIHEEQFD------------------------------------------H-----------------
PRTVG-NQSPIRQAESPPMA--------------------KSPK-NHVSP-----DHSG-QG---------DE-------
>Phy003PVXT_TALSN
-M--PHRHGIHHVHRRN------AENL-IKLES-LPLKPAP--------TISEPPSVVRRASSETST--------C----
SGASCEKSSSSGLVNTLPVVLGVVIPVVAAI-IVLLILHRRHVRKLRQEDANDKHKSLDFGMEVVR-AG----GGN----
---P---------KQPEMGEKPHKHGMSLDI-IGSPYLLPPGLH-NSKESLRSLTKVI-SVEDDKYRVAAQ---SDTASL
RSHR-T---M-GNDDASSFGG-ST-R------H-GP------I---PDDMNQGLLQ------------NASRMSR--SPP
VD-ASSPLSVSQTIHEEPFD------------------------------------------H-----------------
SNAMR-NQSQNHQAVDS-HM-----------------PPEDLPK-NHSSP-----APSG-PG---------DE-------
>Phy003PZPF_FUSOX
MGIAHYE--GARLR-------P--RTNIEDVSS-ASQ-NGVA------LSSSIFRRLVTRENC----------------Q
DTDSCAAA-SANTNLVVPIVVAIVVPIVLIA-IFLYYLHRKNMKRQMLEDANDPHKSLDFGLDGA--G---------GKK
SAR-R-SL---FMGGGEKGLNHKPSQLSMDMNLSSPYLLPPGLQ-ESRESLNSLAKSLGNDNQDPYHP------------
----------------------RN---------SGKP-----GSMKMPPSRMNSLP-ETPVSATDSKVDPFGTPKA--PA
PT-HQPNS------------H---FDE-----KDGFQ-PT----------------------AIIPEIGVVSD-------
FDEKR-DGA-SVQPPPAVRSKT----------------------------------------------------------
>Phy003QBJJ_PENDI
MS---HAHH-------AGLVMRNH----VRRDV-IPANRVPIFVPS-LSVATQLPTLVARSE------------S-----
EPTSGPKATSNLATTVFPIVFGAGIPIFCAL-IILVVLHRRQVKKLVREDAMDKHKSLDFGLDTVG-PA----TRRK--G
A-------K----GMPPMSEHNHTKGLSLDVG---PYLLPPGLQ-HSTDSLRSMSI-----DDGKYRPATA-------SI
RSNS-R---------NSKYGG-------------------------TDDGNSGLLQ------------NAQRIPR--SSP
PL-CSPIEPRARSPLNQHDD------------------------------------------Y-----------------
I-----GQVPEVTHPPAVHQ-----------------PGMAIGS-PNTNRIPSPEP-LP--------------HVDSSSG
>Phy0043OCA_COLGM
MASASFSANGYVMGSRI----P--IRD-VNPINMTPT-PASP-------IRIASRIIGARDE------------QC--TG
SATLCEKP-VDPASLTLPITLGVTIPIVGAL-FLLYYFHRRNMRRQAQEDATDPNRGLDFGLGDAP-I--D------KGG
KKRKS-LM---FREKGMGIETNKQRQLSMDMNLSSPYLLPPGLQ-SSRESLNSLARTL-HNEADPYRPVYASS--DAGSI
YTKT-------TSR-----------R-------GSSMTGRTTMTQNTLPPRQTSLP-RPPPAT-A---DPLGASR--SGS
PSL-PPTS------------P---AIR---S-P--LV-AE----------------------PVIPQIETVP-------S
GSSLP-QIP-DVPEPEPVAQRGL--------------PGNSRPS-PGHPTILEARE-PE---------------------
>Phy0043W64_36779
M-AGVAEAGSYRMSGRI----P--IVR-RNASG-VEA-LDVP-------QPDQTRPLVARESID-----------C-TGE
NANLCEKP-YGANSLGVPIALGVAIPIVALL-GVVFWLHRRNIKKQRSEEANDPHKSLDFGLGDGS-R--G------SKG
GKRKS-AF---FGGGGAEKASHRNNQLSMDMNLSSPYLLPPSAQAGSRESLHSLARTL-HGNEDPYSPVYQ--QSDARSM
RSTK-K---G-SRDD-------YN---------GPSG-----PGLSVPPSRKSSFP-TSPTSPVTSIPPRYEASK---DE
VT-PPPPA------------HSPGQAN---F-P--LN-DT----------------------SPYPNDHQLDA------H
GVSMP-AVP-ELQEPAQAKMPS-------------SP---RFPL-P----------------------------------
>Phy00443NV_MAGO7
MVGVTVHEGEYHLGSRM----P--VMA-RDAST-PAL-QIAA------DGPGFFKRLVARQSSDD----------CVNGE
PSNLCEKP-VTSQTLALPIALGVTIPLVALV-VMLIWLHRKNVRRQRQEDANDPHKSLDFGLDMGP------------GK
RKS-K--L---F-GGEKLGGGPHNRQISMDMNLSSPYLLPPNMQ-NSRESIHSLAKTL--HNEDPYRHITQYNASDAGSL
RSYK-A---G-GMD-------------------RPIG-----PKITVPTSRKGSLQATSPTSTIGSVPPRYEASQ---DD
YV-KPPPP------------A---ALK---S-P--TQ-DS----------------------TPYPDDKSGP-------L
ATVMP-SVP-EIQEPKPASLSK----E-SS---QAPS---LAAV-PPSSPLTISAP-EI--------------------A
>Phy0044G80_PHANO
M-------------HHL----R--RDA-QMAAS-TSATHTL--------VDRASRVLVARTT-------------C-TND
SDPGCTKP---TQVPTIAIALAAIVPVVGLL-IVLVFLHRRNQKKLAAEDAKDKYKSMDFGMGGAG-K---------KNK
-GG-P-EM---SITEKDIRGGAHSRGISLEGG--NPYILPVGLH-GSRESFHSLSRSQ-NDPHDPYRPVTFLR-NDNQSI
RSQS-RG--Y-GHDNGSLYTTRTMSS-------GGT---------QRNRMGDGLLN------------NAQRMST--SRP
MR-SESLS------------P---DSTT--SPD--VK-FPEQNIALSPLNPRFEGEPLAMPATELPHSRTPP-------S
A-------SSPPNVP-II----------------------AVPA-PAAAKPEI---------------------------
================================================
FILE: inst/extdata/tp53.fa
================================================
>Homo_sapiens
----MDDLMLSP-------DDIEQWFTED-----------------PGPDEAPRMPEAAPPVAPAPA---------APTPAAPAPAPSWPLSSSVPSQKTYQGSYGFRLGFLHSGTAKSVTCTYSPALNKMFCQLAKTCPVQLWVDSTPPPGTRVRAMAIYKQSQHMTEVVRRCPHHERCSD-SDGLAPPQHLIRVEGNLRVEYLDDRNTFRHSVVVPYEPPEVGSDCTTIHYNYMCNSSCMGGMNRRPILTIITLEDSSGNLLGRNSFEVRVCACPGRDRRTEEENLRKKGEPHHELPPGS-TKRALPNNTSSS---PQPKKKP----LDGEYFTLQIRGRERFEMFRELNEALELKDAQAGKEPGGSRAHSSHLKSKKGQSTS---RHKKLMFKTEG-PDSD
>Sus_scrofa
MEESQSELGVEPPLSQETFSDLWKLLPENNLLSSELSL-AAVNDLLLSP-VTNWLDENPDDASRVPAPPA----ATAPAPAAPAPATSWPLSSFVPSQKTYPGSYDFRLGFLHSGTAKSVTCTYSPALNKLFCQLAKTCPVQLWVSSPPPPGTRVRAMAIYKKSEYMTEVVRRCPHHERSSDYSDGLAPPQHLIRVEGNLRAEYLDDRNTFRHSVVVPYEPPEVGSDCTTIHYNFMCNSSCMGGMNRRPILTIITLEDASGNLLGRNSFEVRVCACPGRDRRTEEENFLKKGQSCPEPPPGS-TKRALPTSTSSS---PVQKKKP----LDGEYFTLQIRGRERFEMFRELNDALELKDAQTARESGENRAHSSHLKSKKGQSPS---RHKKPMFKREG-PDSD
>Rattus_norvegicus
MEDSQSDMSIELPLSQETFSCLWKLLPPDDILPTTATGSPNSMEDLFLPQDVAELLEGPEEALQVSAPAAQEPGTEAPAPVAPASATPWPLSSSVPSQKTYQGNYGFHLGFLQSGTAKSVMCTYSISLNKLFCQLAKTCPVQLWVTSTPPPGTRVRAMAIYKKSQHMTEVVRRCPHHERCSD-GDGLAPPQHLIRVEGNPYAEYLDDRQTFRHSVVVPYEPPEVGSDYTTIHYKYMCNSSCMGGMNRRPILTIITLEDSSGNLLGRDSFEVRVCACPGRDRRTEEENFRKKEEHCPELPPGS-AKRALPTSTSSS---PQQKKKP----LDGEYFTLKIRGRERFEMFRELNEALELKDARAAEESGDSRAHSSYPKTKKGQSTS---RHKKPMIKKVG-PDSD
>Equus_caballus
MEETQTELGIEPPLSQETFSDLWKLLPENNVLSPDLS--PAVNNLLLSPDVVNWLDEGPDEAPRMPA---------APAPLAPAPATSWPLSSFVPSQKTYPGCYGFRLGFLNSGTAKSVTCTYSPTLNKLFCQLAKTCPVQLLVSSPPPPGTRVRAMAIYKKSEFMTEVVRRCPHHERCSDSSDGLAPPQHLIRVEGNLRAEYLEDRNTFRHSVVVPYEPPEVGSDCTTIHYNFMCNSSCMGGMNRRPILTIITLEDSSGNLLGRNSFEVRVCACPGRDRRTEEENFRKKEEPCPEPPPRS-TKRVLSSNTSSS---PPQKKKP----LDGEYFTLQIRGRERFEMFRELNEALELKDAQTGKEPGGSKAHSSHLKSKKGQSTS---SHKKLIFKREG-PDSD
>Danio_rerio
MAQNDSQE----------FAELWEKNLIS-----------------IQPPGGGSCWDIINDEEYLPGSFDPN--FFENVLEEQPQPSTLPPTSTVPETSDYPGDHGFRLRFPQSGTAKSVTCTYSPDLNKLFCQLAKTCPVQMVVDVAPPQGSVVRATAIYKKSEHVAEVVRRCPHHERTPD-GDNLAPAGHLIRVEGNQRANYREDNITLRHSVFVPYEAPQLGAEWTTVLLNYMCNSSCMGGMNRRPILTIITLETQEGQLLGRRSFEVRVCACPGRDRKTEESNFKKDQETKTMAKTTTGTKRSLVKESSSATLRPEGSKKAKGSSSDEEIFTLQVRGRERYEILKKLNDSLELSDVVPASDAEKYRQKFMTKNKKENRESSEPKQGKKLMVKDEGRSDSD
================================================
FILE: man/GVariation.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/data.R
\docType{data}
\name{GVariation}
\alias{GVariation}
\title{GVariation}
\format{
a folder
}
\source{
\url{https://link.springer.com/article/10.1007/s11540-015-9307-3}
}
\description{
A folder containing 4 MSA files as a sample
data set for identifying sequence recombination events.
}
\details{
\itemize{
\item A.Mont.fas MSA with sequences of 'Mont' and 'CF_YL21'
\item B.Oz.fas MSA with sequences of 'Oz' and 'CF_YL21'
\item C.Wilga5.fas MSA with sequences of 'Wilga5' and 'CF_YL21'
\item sample_alignment.fa MSA with sequences of 'Mont', 'CF_YL21',
'Oz', and 'Wilga5'
}
}
\keyword{datasets}
================================================
FILE: man/Gram-negative_AKL.fasta.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/data.R
\docType{data}
\name{Gram-negative_AKL.fasta}
\alias{Gram-negative_AKL.fasta}
\title{Gram-negative_AKL}
\format{
An MSA FASTA file with 100 sequences and 36 positions.
}
\source{
\url{http://biovis.net/year/2013/info/redesign-contest}
}
\description{
Amino acids in the adenylate kinase lid (AKL) domain
from Gram-negative bacteria.
}
\keyword{datasets}
================================================
FILE: man/Gram-positive_AKL.fasta.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/data.R
\docType{data}
\name{Gram-positive_AKL.fasta}
\alias{Gram-positive_AKL.fasta}
\title{Gram-positive_AKL}
\format{
An MSA FASTA file with 100 sequences and 36 positions.
}
\source{
\url{http://biovis.net/year/2013/info/redesign-contest}
}
\description{
Amino acids in the adenylate kinase lid (AKL) domain
from Gram-positive bacteria.
}
\keyword{datasets}
================================================
FILE: man/LeaderRepeat_All.fa.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/data.R
\docType{data}
\name{LeaderRepeat_All.fa}
\alias{LeaderRepeat_All.fa}
\title{Sample DNA alignment sequences}
\format{
An MSA FASTA file
}
\description{
A DNA alignment with 24 sequences and 56 positions.
}
\keyword{datasets}
================================================
FILE: man/Rfam.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/data.R
\docType{data}
\name{Rfam}
\alias{Rfam}
\title{Rfam}
\format{
a folder
}
\source{
\url{https://rfam.xfam.org/}
}
\description{
A folder containing seed alignment sequences and
corresponding consensus RNA secondary structure.
}
\details{
\itemize{
\item RF00458.fasta seed alignment sequences of Cripavirus internal
ribosome entry site (IRES)
\item RF03120.fasta seed alignment sequences of Sarbecovirus 5'UTR
\item RF03120_SS.txt consensus RNA secondary structure of
Sarbecovirus 5'UTR
}
}
\keyword{datasets}
================================================
FILE: man/TP53_genes.xlsx.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/data.R
\docType{data}
\name{TP53_genes.xlsx}
\alias{TP53_genes.xlsx}
\title{genome locus}
\format{
xlsx
}
\description{
A local genome map showing the 30,000 sites around the TP53 gene.
}
\keyword{datasets}
================================================
FILE: man/adjust_ally.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/ancestor_seq.R
\name{adjust_ally}
\alias{adjust_ally}
\title{adjust_ally}
\usage{
adjust_ally(tree, node, sub = FALSE, seq_colname = "mol_seq")
}
\arguments{
\item{tree}{ggtree object}
\item{node}{internal node in tree}
\item{sub}{logical value.}
\item{seq_colname}{the column name of the MSA in tree$data}
}
\value{
tree
}
\description{
Adjust the tree branch positions after assigning an ancestor node.
}
\author{
Lang Zhou
}
================================================
FILE: man/assign_dms.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/dms.R
\name{assign_dms}
\alias{assign_dms}
\title{assign_dms}
\usage{
assign_dms(x, dms)
}
\arguments{
\item{x}{data frame from tidy_msa()}
\item{dms}{dms data frame}
}
\value{
tree
}
\description{
Assign dms values to alignments.
}
\author{
Lang Zhou
}
================================================
FILE: man/available_colors.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/available.R
\name{available_colors}
\alias{available_colors}
\title{List Color Schemes currently available}
\usage{
available_colors()
}
\value{
A character vector of available color schemes
}
\description{
This function lists color schemes currently available that
can be used by 'ggmsa'
}
\examples{
available_colors()
}
\author{
Lang Zhou
}
================================================
FILE: man/available_fonts.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/available.R
\name{available_fonts}
\alias{available_fonts}
\title{List Font Families currently available}
\usage{
available_fonts()
}
\value{
A character vector of available font family names
}
\description{
This function lists font families currently available
that can be used by 'ggmsa'
}
\examples{
available_fonts()
}
\author{
Lang Zhou
}
================================================
FILE: man/available_msa.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/available.R
\name{available_msa}
\alias{available_msa}
\title{List MSA objects currently available}
\usage{
available_msa()
}
\value{
A character vector of available objects
}
\description{
This function lists MSA objects currently available that
can be used by 'ggmsa'
}
\examples{
available_msa()
}
\author{
Lang Zhou
}
================================================
FILE: man/extract_seq.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/ancestor_seq.R
\name{extract_seq}
\alias{extract_seq}
\title{extract_seq}
\usage{
extract_seq(tree_adjust, seq_colname = "mol_seq")
}
\arguments{
\item{tree_adjust}{ggtree object}
\item{seq_colname}{the colname of MSA on tree$data}
}
\value{
character
}
\description{
extract ancestor sequence from tree data
}
\author{
Lang Zhou
}
================================================
FILE: man/facet_msa.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/facet_msa.R
\name{facet_msa}
\alias{facet_msa}
\title{segment MSA}
\usage{
facet_msa(field)
}
\arguments{
\item{field}{a numeric vector of the field size.}
}
\value{
ggplot layers
}
\description{
Splits the MSA plot into panels (fields) of the width you set.
}
\examples{
library(ggplot2)
f <- system.file("extdata/sample.fasta", package="ggmsa")
# 2 fields
ggmsa(f, end = 120, font = NULL, color="Chemistry_AA") +
facet_msa(field = 60)
# 3 fields
ggmsa(f, end = 120, font = NULL, color="Chemistry_AA") +
facet_msa(field = 40)
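# A rough sketch (an assumption inferred from the two calls above, not the
# package's internal code): when positions 1..end are shown, the number of
# panels is presumably ceiling(width / field). n_fields() is a hypothetical
# helper used only for this illustration.
n_fields <- function(width, field) ceiling(width / field)
n_fields(120, 60)
n_fields(120, 40)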
}
\author{
Lang Zhou
}
================================================
FILE: man/geom_GC.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/geom_GC.R
\name{geom_GC}
\alias{geom_GC}
\title{geom_GC}
\usage{
geom_GC(show.legend = FALSE)
}
\arguments{
\item{show.legend}{logical. Should this layer be included in the legends?}
}
\value{
a ggplot layer
}
\description{
Multiple sequence alignment layer for ggplot2. It plots GC content as points.
}
\examples{
#plot GC content
f <- system.file("extdata/LeaderRepeat_All.fa", package="ggmsa")
ggmsa(f, font = NULL, color="Chemistry_NT") + geom_GC()
}
\author{
Lang Zhou
}
================================================
FILE: man/geom_helix.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/arc.R
\name{geom_helix}
\alias{geom_helix}
\title{geom_helix}
\usage{
geom_helix(helix_data, color_by = "length", overlap = FALSE, ...)
}
\arguments{
\item{helix_data}{a data frame of nucleotide secondary structure,
as read from a file by readSSfile().}
\item{color_by}{the rule used to color helices, either by integer counts
or by value ranges. One of "length" and "value".}
\item{overlap}{a logical value. If TRUE, two structure data sets named
"known" and "predicted" must be given
(e.g. helix_data = list(known = data1, predicted = data2)).
Predicted helices that are known are plotted on top,
predicted helices that are not known on the bottom, and finally
unpredicted helices on top in black.}
\item{...}{additional parameter}
}
\value{
ggplot2 layers
}
\description{
The layer of helix plot
}
\examples{
RF03120 <- system.file("extdata/Rfam/RF03120_SS.txt", package="ggmsa")
RF03120_fas <- system.file("extdata/Rfam/RF03120.fasta", package="ggmsa")
SS <- readSSfile(RF03120, type = "Vienna")
ggmsa(RF03120_fas, font = NULL,border = NA,
color = "Chemistry_NT", seq_name = FALSE) +
geom_helix(SS)
}
\author{
Lang Zhou
}
================================================
FILE: man/geom_msa.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/geom_msa.R
\name{geom_msa}
\alias{geom_msa}
\title{geom_msa}
\usage{
geom_msa(
data,
font = "helvetical",
mapping = NULL,
color = "Chemistry_AA",
custom_color = NULL,
char_width = 0.9,
none_bg = FALSE,
by_conservation = FALSE,
position_highlight = NULL,
seq_name = NULL,
border = NULL,
consensus_views = FALSE,
use_dot = FALSE,
disagreement = TRUE,
ignore_gaps = FALSE,
ref = NULL,
position = "identity",
show.legend = FALSE,
dms = FALSE,
position_color = FALSE,
...
)
}
\arguments{
\item{data}{sequence alignment with data frame, generated by tidy_msa().}
\item{font}{font families, possible values are 'helvetical', 'mono',
'DroidSansMono' and 'TimesNewRoman'. Defaults to 'helvetical'.
If font = NULL, only the background tiles are plotted.}
\item{mapping}{aes mapping.}
\item{color}{a color scheme. One of 'Clustal', 'Chemistry_AA', 'Shapely_AA',
'Zappo_AA', 'Taylor_AA', 'LETTER', 'CN6', 'Chemistry_NT', 'Shapely_NT',
'Zappo_NT', 'Taylor_NT'. Defaults to 'Chemistry_AA'.}
\item{custom_color}{a data frame with two columns called "names" and
"color", used to customize the color scheme.}
\item{char_width}{a numeric vector specifying the character width in
the range of 0 to 1. Defaults to 0.9.}
\item{none_bg}{a logical value indicating whether the background
should be hidden (TRUE) or displayed (FALSE). Defaults to FALSE.}
\item{by_conservation}{a logical value. The most conserved regions have
the brightest colors.}
\item{position_highlight}{a numeric vector of the positions that
need to be highlighted.}
\item{seq_name}{a logical value indicating whether sequence names
should be displayed. Defaults to NULL, in which case sequence names are
displayed when 'font = NULL' and hidden when a font is set.
If 'seq_name = TRUE', sequence names are always displayed.
If 'seq_name = FALSE', sequence names are never displayed.}
\item{border}{a character string. The border color.}
\item{consensus_views}{a logical value. If TRUE, the consensus view
is enabled.}
\item{use_dot}{a logical value. Displays characters as dots instead of
fading their color in the consensus view.}
\item{disagreement}{a logical value. Displays characters that disagree
with the consensus (excluding ambiguous disagreements).}
\item{ignore_gaps}{a logical value. If TRUE,
gaps in a column are treated as if that row didn't exist.}
\item{ref}{a character string specifying the reference sequence,
which should be one of the input sequences when 'consensus_views' is TRUE.}
\item{position}{Position adjustment, either as a string, or
the result of a call to a position adjustment function,
default is 'identity' meaning 'position_identity()'.}
\item{show.legend}{logical. Should this layer be included in the legends?}
\item{dms}{logical.}
\item{position_color}{logical.}
\item{...}{additional parameter}
}
\value{
A list
}
\description{
Multiple sequence alignment layer for ggplot2.
It creates background tiles with/without sequence characters.
}
\examples{
library(ggplot2)
aln <- system.file("extdata", "sample.fasta", package = "ggmsa")
tidy_aln <- tidy_msa(aln, start = 150, end = 170)
ggplot() + geom_msa(data = tidy_aln, font = NULL) + coord_fixed()
}
\author{
Guangchuang Yu, Lang Zhou
seq_name' work
position_highlight' work
border' work
none_bg' work
}
================================================
FILE: man/geom_msaBar.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/geom_msaBar.R
\name{geom_msaBar}
\alias{geom_msaBar}
\title{geom_msaBar}
\usage{
geom_msaBar()
}
\value{
A list
}
\description{
Multiple sequence alignment layer for ggplot2.
It plots a sequence conservation bar.
}
\examples{
#plot multiple sequence alignment and conservation bar.
f <- system.file("extdata/sample.fasta", package="ggmsa")
ggmsa(f, 221, 280, font = NULL, seq_name = TRUE) + geom_msaBar()
}
\author{
Lang Zhou
}
================================================
FILE: man/geom_seed.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/geom_seed.R
\name{geom_seed}
\alias{geom_seed}
\title{geom_seed}
\usage{
geom_seed(seed, star = FALSE)
}
\arguments{
\item{seed}{a character string specifying the miRNA seed sequence,
like 'GAGGUAG'.}
\item{star}{a logical value indicating whether asterisks should
be displayed.}
}
\value{
a ggplot layer
}
\description{
Highlighting the seed in miRNA sequences
}
\examples{
miRNA_sequences <- system.file("extdata/seedSample.fa", package="ggmsa")
ggmsa(miRNA_sequences, font = 'DroidSansMono',
color = "Chemistry_NT", none_bg = TRUE) +
geom_seed(seed = "GAGGUAG", star = FALSE)
ggmsa(miRNA_sequences, font = 'DroidSansMono',
color = "Chemistry_NT") +
geom_seed(seed = "GAGGUAG", star = TRUE)
}
\author{
Lang Zhou
}
================================================
FILE: man/geom_seqlogo.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/seqlogo.R
\name{geom_seqlogo}
\alias{geom_seqlogo}
\title{geom_seqlogo}
\usage{
geom_seqlogo(
font = "DroidSansMono",
color = "Chemistry_AA",
adaptive = TRUE,
top = TRUE,
custom_color = NULL,
show.legend = FALSE,
...
)
}
\arguments{
\item{font}{font families, possible values are 'helvetical', 'mono',
'DroidSansMono' and 'TimesNewRoman'. Defaults to 'DroidSansMono'.}
\item{color}{a color scheme. One of 'Clustal', 'Chemistry_AA',
'Shapely_AA', 'Zappo_AA', 'Taylor_AA', 'LETTER', 'CN6', 'Chemistry_NT',
'Shapely_NT', 'Zappo_NT', 'Taylor_NT'. Defaults to 'Chemistry_AA'.}
\item{adaptive}{a logical value indicating whether the overall height
of the seqlogo corresponds to the number of sequences. If FALSE,
the overall height of the seqlogo is fixed at 4.}
\item{top}{a logical value. If TRUE, seqlogo is aligned to the top of MSA.}
\item{custom_color}{a data frame with two columns called "names" and
"color", used to customize the color scheme.}
\item{show.legend}{logical. Should this layer be included in the legends?}
\item{...}{additional parameter}
}
\value{
A list
}
\description{
Multiple sequence alignment layer for ggplot2. It plots sequence motifs.
}
\examples{
#plot multiple sequence alignment and sequence motifs
f <- system.file("extdata/LeaderRepeat_All.fa", package="ggmsa")
ggmsa(f,font = NULL,color = "Chemistry_NT") + geom_seqlogo()
}
\author{
Lang Zhou
}
================================================
FILE: man/ggSeqBundle.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/SeqBundles.R
\name{ggSeqBundle}
\alias{ggSeqBundle}
\title{ggSeqBundle}
\usage{
ggSeqBundle(
msa,
line_width = 0.3,
line_thickness = 0.3,
line_high = 0,
spline_shape = 0.3,
size = 0.5,
alpha = 0.2,
bundle_color = c("#2ba0f5", "#424242"),
lev_molecule = c("-", "A", "V", "L", "I", "P", "F", "W", "M", "G", "S", "T", "C", "Y",
"N", "Q", "D", "E", "K", "R", "H")
)
}
\arguments{
\item{msa}{Multiple sequence alignment file (FASTA) or object
representing either nucleotide sequences or peptide sequences. Multiple
MSA files are also accepted,
e.g. msa = c("Gram-negative_AKL.fasta", "Gram-positive_AKL.fasta").}
\item{line_width}{The width of bundles at each site, default is 0.3.}
\item{line_thickness}{The thickness of bundles at each site, default is 0.3.}
\item{line_high}{The height of bundles at each site, default is 0.}
\item{spline_shape}{A numeric vector of values between -1 and 1, which
controls the shape of the spline relative to the control points.}
\item{size}{A numeric vector of values between 0 and 1,
which controls the size of each line.}
\item{alpha}{A numeric vector of values between 0 and 1,
which controls the alpha of each line.}
\item{bundle_color}{The colors of each sequence bundle,
e.g. bundle_color = c("#2ba0f5","#424242").}
\item{lev_molecule}{Reassigns the Y-axis and displays
letter-coded amino acids/nucleotides arranged by physicochemical
properties or other criteria, e.g. amino acids by hydrophobicity:
lev_molecule = c("-","A", "V", "L", "I", "P", "F", "W", "M",
"G", "S","T", "C", "Y", "N", "Q", "D", "E", "K","R", "H").}
}
\value{
ggplot object
}
\description{
plot Sequence Bundles for MSA based on 'ggplot2'
}
\examples{
aln <- system.file("extdata", "Gram-negative_AKL.fasta", package = "ggmsa")
ggSeqBundle(aln)
}
\author{
Lang Zhou
}
================================================
FILE: man/gghelix.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/arc.R
\name{gghelix}
\alias{gghelix}
\title{gghelix}
\usage{
gghelix(helix_data, color_by = "length", overlap = FALSE)
}
\arguments{
\item{helix_data}{a data frame of nucleotide secondary structure,
as read from a file by readSSfile().}
\item{color_by}{the rule used to color helices, either by integer counts
or by value ranges. One of "length" and "value".}
\item{overlap}{a logical value. If TRUE, two structure data sets named
"known" and "predicted" must be given
(e.g. helix_data = list(known = data1, predicted = data2)).
Predicted helices that are known are plotted on top, predicted helices
that are not known on the bottom, and finally unpredicted helices
on top in black.}
}
\value{
ggplot object
}
\description{
Plots nucleotide secondary structure as helices in an arc diagram
}
\examples{
RF03120 <- system.file("extdata/Rfam/RF03120_SS.txt", package="ggmsa")
helix_data <- readSSfile(RF03120, type = "Vienna")
gghelix(helix_data)
}
\author{
Lang Zhou
}
================================================
FILE: man/ggmaf.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/ggmaf.R
\name{ggmaf}
\alias{ggmaf}
\title{ggmaf}
\usage{
ggmaf(
data,
ref,
block_start = NULL,
block_end = NULL,
facet_field = NULL,
heights = c(0.4, 0.6),
facet_heights = NULL
)
}
\arguments{
\item{data}{a tidy MAF data frame, as returned by tidy_maf_df().}
\item{ref}{character, the name of the reference genome,
e.g. "hg38.chr1_KI270707v1_random".}
\item{block_start}{a numeric vector (> 0). The first block to plot.}
\item{block_end}{a numeric vector (< max block). The last block to plot.}
\item{facet_field}{a numeric vector. The field in a facet panel.}
\item{heights}{a numeric vector of length two. The plot proportion between
the "Genomic location" panel (top) and the "Alignment" panel (bottom).
Default: c(0.4, 0.6).}
\item{facet_heights}{a numeric vector. The facet proportion.}
}
\value{
ggplot object
}
\description{
plot MAF
}
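\examples{
\dontrun{
# A minimal sketch of the workflow implied by the argument descriptions:
# read a MAF file, tidy it against a reference genome, then plot it.
# "path/to/alignment.maf" is a placeholder (no MAF file is assumed to ship
# with the package); the reference name is the one quoted under 'ref'.
maf_df <- read_maf("path/to/alignment.maf")
ref <- "hg38.chr1_KI270707v1_random"
tidy_df <- tidy_maf_df(maf_df, ref = ref)
ggmaf(tidy_df, ref = ref)
}
}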
\author{
Lang Zhou
}
================================================
FILE: man/ggmsa.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/ggmsa.R
\name{ggmsa}
\alias{ggmsa}
\title{ggmsa}
\usage{
ggmsa(
msa,
start = NULL,
end = NULL,
font = "helvetical",
color = "Chemistry_AA",
custom_color = NULL,
char_width = 0.9,
none_bg = FALSE,
by_conservation = FALSE,
position_highlight = NULL,
seq_name = NULL,
border = NULL,
consensus_views = FALSE,
use_dot = FALSE,
disagreement = TRUE,
ignore_gaps = FALSE,
ref = NULL,
show.legend = FALSE
)
}
\arguments{
\item{msa}{Multiple aligned sequence files or objects representing either
nucleotide sequences or AA sequences.}
\item{start}{a numeric vector. Start position to plot.}
\item{end}{a numeric vector. End position to plot.}
\item{font}{font families, possible values are 'helvetical', 'mono',
'DroidSansMono' and 'TimesNewRoman'. Defaults to 'helvetical'.
If font = NULL, only the background tiles are plotted.}
\item{color}{a color scheme. One of 'Clustal', 'Chemistry_AA',
'Shapely_AA', 'Zappo_AA', 'Taylor_AA', 'LETTER', 'CN6', 'Chemistry_NT',
'Shapely_NT', 'Zappo_NT', 'Taylor_NT'. Defaults to 'Chemistry_AA'.}
\item{custom_color}{a data frame with two columns called "names" and
"color", used to customize the color scheme.}
\item{char_width}{a numeric vector specifying the character width in
the range of 0 to 1. Defaults to 0.9.}
\item{none_bg}{a logical value indicating whether the background should be
hidden (TRUE) or displayed (FALSE). Defaults to FALSE.}
\item{by_conservation}{a logical value. The most conserved regions have
the brightest colors.}
\item{position_highlight}{a numeric vector of the positions that need to be
highlighted.}
\item{seq_name}{a logical value indicating whether sequence names
should be displayed. Defaults to NULL, in which case sequence names are
displayed when 'font = NULL' and hidden when a font is set.
If 'seq_name = TRUE', sequence names are always displayed.
If 'seq_name = FALSE', sequence names are never displayed.}
\item{border}{a character string. The border color.}
\item{consensus_views}{a logical value. If TRUE, the consensus view
is enabled.}
\item{use_dot}{a logical value. Displays characters as dots instead
of fading their color in the consensus view.}
\item{disagreement}{a logical value. Displays characters that
disagree with the consensus (excluding ambiguous disagreements).}
\item{ignore_gaps}{a logical value. If TRUE, gaps in a column
are treated as if that row didn't exist.}
\item{ref}{a character string specifying the reference sequence, which
should be one of the input sequences when 'consensus_views' is TRUE.}
\item{show.legend}{logical. Should this layer be included in the legends?}
}
\value{
ggplot object
}
\description{
Plot multiple sequence alignment using ggplot2 with multiple color schemes
supported.
}
\examples{
#plot multiple sequences by loading fasta format
fasta <- system.file("extdata", "sample.fasta", package = "ggmsa")
ggmsa(fasta, 164, 213, color="Chemistry_AA")
\dontrun{
#XMultipleAlignment objects can be used as input in the 'ggmsa'
AAMultipleAlignment <- Biostrings::readAAMultipleAlignment(fasta)
ggmsa(AAMultipleAlignment, 164, 213, color="Chemistry_AA")
#XStringSet objects can be used as input in the 'ggmsa'
AAStringSet <- Biostrings::readAAStringSet(fasta)
ggmsa(AAStringSet, 164, 213, color="Chemistry_AA")
#Xbin objects from 'seqmagick' can be used as input in the 'ggmsa'
AAbin <- seqmagick::fa_read(fasta)
ggmsa(AAbin, 164, 213, color="Chemistry_AA")
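#A custom color scheme can be supplied as a data frame with the columns
#"names" and "color" (see 'custom_color'). The palette below is purely
#illustrative and is not a scheme shipped with 'ggmsa'.
my_custom <- data.frame(names = c(LETTERS, "-"),
                        color = grDevices::rainbow(27),
                        stringsAsFactors = FALSE)
ggmsa(fasta, 164, 213, custom_color = my_custom)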
}
}
\author{
Guangchuang Yu
}
================================================
FILE: man/merge_seq.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/pp_interactive.R
\name{merge_seq}
\alias{merge_seq}
\title{merge_seq}
\usage{
merge_seq(previous_seq, gap, subsequent_seq, adjust_name = TRUE)
}
\arguments{
\item{previous_seq}{previous MSA}
\item{gap}{gap length}
\item{subsequent_seq}{subsequent MSA}
\item{adjust_name}{logical value. merge seq name or not}
}
\value{
tidy MSA data frame
}
\description{
merge two MSA
}
\author{
Lang Zhou
}
================================================
FILE: man/plot-methods.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/method-plot.R
\docType{methods}
\name{plot}
\alias{plot}
\alias{plot,SeqDiff,ANY-method}
\title{plot method for SeqDiff object}
\usage{
\S4method{plot}{SeqDiff,ANY}(
x,
width = 50,
title = "auto",
xlab = "Nucleotide Position",
by = "bar",
fill = "firebrick",
colors = c(A = "#ff6d6d", C = "#769dcc", G = "#f2be3c", T = "#74ce98"),
xlim = NULL
)
}
\arguments{
\item{x}{SeqDiff object}
\item{width}{bin width}
\item{title}{plot title}
\item{xlab}{xlab}
\item{by}{one of 'bar' and 'area'}
\item{fill}{fill color of upper part of the plot}
\item{colors}{color of lower part of the plot}
\item{xlim}{limits of x-axis}
}
\value{
plot
}
\description{
plot method for SeqDiff object
}
\examples{
fas <- list.files(system.file("extdata", "GVariation", package="ggmsa"),
pattern="fas", full.names=TRUE)
x1 <- seqdiff(fas[1], reference=1)
plot(x1)
}
\author{
Guangchuang Yu
}
================================================
FILE: man/readSSfile.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/arc.R
\name{readSSfile}
\alias{readSSfile}
\title{readSSfile}
\usage{
readSSfile(file, type = NULL)
}
\arguments{
\item{file}{A secondary structure text file in one of the formats
listed under 'type'}
\item{type}{file type. One of "Helix", "Connect", "Vienna" and "Bpseq"}
}
\value{
data frame
}
\description{
Read secondary structure file
}
\examples{
RF03120 <- system.file("extdata/Rfam/RF03120_SS.txt", package="ggmsa")
helix_data <- readSSfile(RF03120, type = "Vienna")
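# For orientation only: "Vienna" refers to dot-bracket notation, in which
# matching parentheses mark paired bases. pair_dot_bracket() is a
# hypothetical base-R sketch of that pairing rule, not the parser used by
# readSSfile(); it assumes a well-formed string with balanced parentheses.
pair_dot_bracket <- function(db) {
  chars <- strsplit(db, "")[[1]]
  open <- integer(0)  # stack of unmatched "(" positions
  i <- integer(0)     # 5' partner of each pair
  j <- integer(0)     # 3' partner of each pair
  for (k in seq_along(chars)) {
    if (chars[k] == "(") {
      open <- c(open, k)
    } else if (chars[k] == ")") {
      i <- c(i, open[length(open)])
      j <- c(j, k)
      open <- open[-length(open)]
    }
  }
  data.frame(i = i, j = j)
}
pair_dot_bracket("((..))")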
}
\author{
Lang Zhou
}
================================================
FILE: man/read_maf.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/read_maf.R
\name{read_maf}
\alias{read_maf}
\title{read_maf}
\usage{
read_maf(multiple_alignment_format)
}
\arguments{
\item{multiple_alignment_format}{a multiple alignment format(MAF) file}
}
\value{
data frame
}
\description{
read 'multiple alignment format'(MAF) file
}
\author{
Lang Zhou
}
================================================
FILE: man/reset_pos.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/pp_interactive.R
\name{reset_pos}
\alias{reset_pos}
\title{reset_pos}
\usage{
reset_pos(seq_df)
}
\arguments{
\item{seq_df}{MSA data}
}
\value{
data frame
}
\description{
reset MSA position
}
\author{
Lang Zhou
}
================================================
FILE: man/sample.fasta.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/data.R
\docType{data}
\name{sample.fasta}
\alias{sample.fasta}
\title{A sample data used in ggmsa}
\format{
A MSA fasta with 9 sequences and 456 positions.
}
\description{
A dataset containing the alignment sequences of
the phenylalanine hydroxylase protein (PH4H)
within nine species
}
\keyword{datasets}
================================================
FILE: man/seedSample.fa.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/data.R
\docType{data}
\name{seedSample.fa}
\alias{seedSample.fa}
\title{microRNA data used in ggmsa}
\format{
A MSA fasta with 6 sequences and 22 positions.
}
\source{
\url{https://www.mirbase.org/ftp.shtml}
}
\description{
Fasta format sequences of mature miRNA sequences
from miRBase
}
\keyword{datasets}
================================================
FILE: man/seqdiff.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/seqdiff.R
\name{seqdiff}
\alias{seqdiff}
\title{seqdiff}
\usage{
seqdiff(fasta, reference = 1)
}
\arguments{
\item{fasta}{fasta file}
\item{reference}{which sequence serve as reference, 1 or 2}
}
\value{
SeqDiff object
}
\description{
calculate difference of two aligned sequences
}
\examples{
fas <- list.files(system.file("extdata", "GVariation", package="ggmsa"),
pattern="fas", full.names=TRUE)
seqdiff(fas[1], reference=1)
}
\author{
Guangchuang Yu
}
================================================
FILE: man/seqlogo.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/seqlogo.R
\name{seqlogo}
\alias{seqlogo}
\title{seqlogo}
\usage{
seqlogo(
msa,
start = NULL,
end = NULL,
font = "DroidSansMono",
color = "Chemistry_AA",
adaptive = FALSE,
top = FALSE,
custom_color = NULL
)
}
\arguments{
\item{msa}{Multiple sequence alignment file or object for representing
either nucleotide sequences or peptide sequences.}
\item{start}{Start position to plot.}
\item{end}{End position to plot.}
\item{font}{font families, possible values are 'helvetical', 'mono',
'DroidSansMono' and 'TimesNewRoman'. Defaults to 'DroidSansMono'.
If font = NULL, only the background tiles are drawn.}
\item{color}{a color scheme. One of 'Clustal', 'Chemistry_AA',
'Shapely_AA', 'Zappo_AA', 'Taylor_AA', 'LETTER', 'CN6', 'Chemistry_NT',
'Shapely_NT', 'Zappo_NT', 'Taylor_NT'. Defaults to 'Chemistry_AA'.}
\item{adaptive}{a logical value indicating whether the overall height of
the seqlogo corresponds to the number of sequences. If FALSE, the overall
height of the seqlogo is fixed at 4.}
\item{top}{a logical value. If TRUE, seqlogo is aligned to the top of MSA.}
\item{custom_color}{a data frame with two columns called "names" and
"color", used to customize the color scheme.}
}
\value{
ggplot object
}
\description{
plot sequence logo for MSA based on 'ggplot2'
}
\examples{
#plot sequence motif independently
nt_sequence <- system.file("extdata", "LeaderRepeat_All.fa",
package = "ggmsa")
seqlogo(nt_sequence, color = "Chemistry_NT")
}
\author{
Lang Zhou
}
================================================
FILE: man/sequence-link-tree.fasta.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/data.R
\docType{data}
\name{sequence-link-tree.fasta}
\alias{sequence-link-tree.fasta}
\title{sequence-link-tree}
\format{
A MSA fasta with 28 sequences and 480 positions.
}
\description{
Alignment sequences used to demonstrate circular MSA layout
}
\keyword{datasets}
================================================
FILE: man/show-methods.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/method-show.R
\docType{methods}
\name{show}
\alias{show}
\alias{SeqDiff-class}
\alias{show,SeqDiff-method}
\title{show method}
\usage{
show(object)
}
\arguments{
\item{object}{SeqDiff object}
}
\value{
message
}
\description{
show method
}
\examples{
fas <- list.files(system.file("extdata", "GVariation", package="ggmsa"),
pattern="fas", full.names=TRUE)
x1 <- seqdiff(fas[1], reference=1)
x1
}
================================================
FILE: man/simplify_hdata.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/pp_interactive.R
\name{simplify_hdata}
\alias{simplify_hdata}
\title{simplify_hdata}
\usage{
simplify_hdata(hdata, sim_msa)
}
\arguments{
\item{hdata}{data from tidy_hdata()}
\item{sim_msa}{MSA data frame}
}
\value{
data frame
}
\description{
reset hdata data position
}
\author{
Lang Zhou
}
================================================
FILE: man/simplot.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/simplot.R
\name{simplot}
\alias{simplot}
\title{simplot}
\usage{
simplot(
file,
query,
window = 200,
step = 20,
group = FALSE,
id,
sep,
sd = FALSE,
smooth = FALSE,
smooth_params = list(method = "loess", se = FALSE)
)
}
\arguments{
\item{file}{alignment FASTA file}
\item{query}{query sequence}
\item{window}{sliding window size (bp)}
\item{step}{step size to slide the window (bp)}
\item{group}{whether to group sequences (e.g. for "A-seq1, A-seq-2, B-seq1
and B-seq2", use sep = "-" and id = 1 to divide the sequences into groups
A and B)}
\item{id}{position to extract id for grouping; only works if group = TRUE}
\item{sep}{separator to split sequence name; only works if group = TRUE}
\item{sd}{whether to display the standard deviation of
similarity within each group; only works if group = TRUE}
\item{smooth}{FALSE (default) or TRUE; whether to display a smoothed spline.}
\item{smooth_params}{a list of parameters passed to geom_smooth
(default: smooth_params = list(method = "loess", se = FALSE))}
}
\value{
ggplot object
}
\description{
Sequence similarity plot
}
\examples{
fas <- system.file("extdata/GVariation/sample_alignment.fa",
package="ggmsa")
simplot(fas, 'CF_YL21')
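# Illustration only (not simplot()'s implementation): how 'window' and 'step'
# define a sliding window. window_identity() is a hypothetical helper that
# computes the fraction of identical positions between two aligned sequences
# in each window; it assumes both sequences have equal length and are at
# least 'window' characters long.
window_identity <- function(query, subject, window = 200, step = 20) {
  q <- strsplit(query, "")[[1]]
  s <- strsplit(subject, "")[[1]]
  starts <- seq(1, length(q) - window + 1, by = step)
  similarity <- vapply(starts, function(i) {
    idx <- i:(i + window - 1)
    mean(q[idx] == s[idx])
  }, numeric(1))
  data.frame(position = starts, similarity = similarity)
}
window_identity("ACGTACGTAC", "ACGTTCGTAC", window = 4, step = 2)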
}
\author{
Guangchuang Yu
}
================================================
FILE: man/theme_msa.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/theme_msa.R
\name{theme_msa}
\alias{theme_msa}
\title{theme_msa}
\usage{
theme_msa()
}
\description{
Theme for ggmsa.
}
\author{
Lang Zhou
}
================================================
FILE: man/tidy_hdata.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/pp_interactive.R
\name{tidy_hdata}
\alias{tidy_hdata}
\title{tidy_hdata}
\usage{
tidy_hdata(gap, inter, previous_seq, subsequent_seq)
}
\arguments{
\item{gap}{gap length}
\item{inter}{protein-protein interactive position data}
\item{previous_seq}{previous MSA}
\item{subsequent_seq}{subsequent MSA}
}
\value{
helix data
}
\description{
tidy protein-protein interactive position data
}
\author{
Lang Zhou
}
================================================
FILE: man/tidy_maf_df.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/ggmaf.R
\name{tidy_maf_df}
\alias{tidy_maf_df}
\title{tidy_maf_df}
\usage{
tidy_maf_df(maf_df, ref)
}
\arguments{
\item{maf_df}{a MAF data frame, as returned by read_maf().}
\item{ref}{character, the name of the reference genome,
e.g. "hg38.chr1_KI270707v1_random".}
}
\value{
data frame
}
\description{
tidy MAF data frame
}
\author{
Lang Zhou
}
================================================
FILE: man/tidy_msa.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/msa_data.R
\name{tidy_msa}
\alias{tidy_msa}
\title{tidy_msa}
\usage{
tidy_msa(msa, start = NULL, end = NULL)
}
\arguments{
\item{msa}{multiple sequence alignment file or sequence object in
DNAStringSet, RNAStringSet, AAStringSet, BStringSet, DNAMultipleAlignment,
RNAMultipleAlignment, AAMultipleAlignment, DNAbin or AAbin}
\item{start}{start position to extract subset of alignment}
\item{end}{end position to extract subset of alignment}
}
\value{
tibble data frame
}
\description{
Convert msa file/object to tidy data frame.
}
\examples{
fasta <- system.file("extdata", "sample.fasta", package = "ggmsa")
aln <- tidy_msa(msa = fasta, start = 10, end = 100)
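# The result is a long-format table with one row per sequence and position
# (columns "name", "position" and "character"). The toy data below is a
# hypothetical base-R construction of the same layout from a named character
# vector, shown for illustration only; it is not how tidy_msa() is written.
seqs <- c(seq1 = "MK-V", seq2 = "MKAV")
chars <- strsplit(seqs, "")
toy <- data.frame(name = rep(names(seqs), lengths(chars)),
                  position = sequence(lengths(chars)),
                  character = unlist(chars, use.names = FALSE))
toy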
}
\author{
Guangchuang Yu
}
================================================
FILE: man/tp53.fa.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/data.R
\docType{data}
\name{tp53.fa}
\alias{tp53.fa}
\title{TP53 MSA}
\format{
A MSA fasta with 5 sequences and 404 positions.
}
\description{
Alignment sequences used to demonstrate graphical combination
}
\keyword{datasets}
================================================
FILE: man/treeMSA_plot.Rd
================================================
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/ancestor_seq.R
\name{treeMSA_plot}
\alias{treeMSA_plot}
\title{treeMSA_plot}
\usage{
treeMSA_plot(
p_tree,
tidymsa_df,
ancestral_node = "none",
sub = FALSE,
panel = "MSA",
font = NULL,
color = "Chemistry_AA",
seq_colname = NULL,
...
)
}
\arguments{
\item{p_tree}{tree view}
\item{tidymsa_df}{tidy MSA data}
\item{ancestral_node}{vector, internal node in tree. Assigns an internal
node whose "ancestral sequences" are displayed. If ancestral_node = "none",
all ancestral sequences are hidden; if ancestral_node = "all", all ancestral
sequences are shown.}
\item{sub}{logical value. Whether to display a subset of ancestral sequences.}
\item{panel}{panel name for plot of MSA data}
\item{font}{font families, possible values are 'helvetical', 'mono',
'DroidSansMono' and 'TimesNewRoman'. Defaults to NULL, in which case
only the background tiles are plotted.}
\item{color}{a color scheme. One of 'Clustal', 'Chemistry_AA',
'Shapely_AA', 'Zappo_AA', 'Taylor_AA', 'LETTER', 'CN6', 'Chemistry_NT',
'Shapely_NT', 'Zappo_NT', 'Taylor_NT'. Defaults to 'Chemistry_AA'.}
\item{seq_colname}{the colname of MSA on tree$data}
\item{...}{additional parameters for 'geom_msa'}
}
\value{
ggplot object
}
\description{
plot Tree-MSA plot
}
\details{
'treeMSA_plot()' automatically re-arranges the MSA data according to
the tree structure.
}
\author{
Lang Zhou
}
================================================
FILE: tests/testthat/test-main.R
================================================
library(ggmsa)
library(ggplot2)
test_that("check whether `ggmsa` creates a `ggplot` object", {
    # Plot a short window of the bundled sample alignment (tiles only)
    p <- ggmsa(msa = system.file("extdata", "sample.fasta", package = "ggmsa"),
               start = 10,
               end = 20,
               font = NULL)
    expect_true(is.ggplot(p))
})
================================================
FILE: tests/testthat/test-msa_data.R
================================================
library(ggmsa)
msa <- system.file("extdata", "sample.fasta", package = "ggmsa")
tidymsa <- tidy_msa(msa, 10, 20)
test_that("check msaData integrity when using `font`", {
    msaData <- msa_data(tidymsa)
    msaFull_names <- c("label",
                       "x",
                       "yy",
                       "order",
                       "name",
                       "position",
                       "character",
                       "color",
                       "group",
                       "y")
    expect_true(is.data.frame(msaData))
    expect_named(msaData, msaFull_names)
})
test_that("check msaData integrity when using `font = NULL`", {
    msaData <- msa_data(tidymsa, font = NULL)
    msaFull_names <- c("name", "position", "character", "color")
    expect_true(is.data.frame(msaData))
    expect_named(msaData, msaFull_names)
})
================================================
FILE: tests/testthat/test-tidy_msa.R
================================================
library(ggmsa)
library(Biostrings)
msa <- system.file("extdata", "sample.fasta", package = "ggmsa")
tidy_names <- c("name", "position", "character")
test_that("tidy FASTA format by tidy_msa", {
    fasta_tidy <- tidy_msa(msa, 10, 20)
    expect_true(is.data.frame(fasta_tidy))
    expect_named(fasta_tidy, tidy_names)
})
test_that("tidy Biostrings objects by tidy_msa", {
    AAMultipleAlignment <- readAAMultipleAlignment(msa)
    expect_s4_class(AAMultipleAlignment, "AAMultipleAlignment")
    AAStringSet <- readAAStringSet(msa)
    expect_s4_class(AAStringSet, "AAStringSet")
    AAMultipleAlignment_tidy <- tidy_msa(AAMultipleAlignment, 10, 20)
    AAStringSet_tidy <- tidy_msa(AAStringSet, 10, 20)
    expect_true(is.data.frame(AAMultipleAlignment_tidy))
    expect_named(AAMultipleAlignment_tidy, tidy_names)
    expect_true(is.data.frame(AAStringSet_tidy))
    expect_named(AAStringSet_tidy, tidy_names)
})
test_that("tidy AAbin objects by tidy_msa", {
    AAbin <- ape::read.FASTA(msa, "AA")
    expect_s3_class(AAbin, "AAbin")
    AAbin_tidy <- tidy_msa(AAbin, 10, 20)
    expect_true(is.data.frame(AAbin_tidy))
    expect_named(AAbin_tidy, tidy_names)
})
================================================
FILE: tests/testthat.R
================================================
library(testthat)
library(ggmsa)
test_check("ggmsa")
================================================
FILE: vignettes/.gitignore
================================================
Annotations.Rmd
Color_schemes_And_Font_Families.Rmd
MSA_theme.Rmd
Other_Modules.Rmd
View_modes.Rmd
================================================
FILE: vignettes/ggmsa.Rmd
================================================
---
title: "ggmsa-Getting Started"
author: "GuangChuang Yu and Lang Zhou"
output:
  prettydoc::html_pretty:
    toc: false
    theme: cayman
    highlight: github
  pdf_document:
    toc: true
date: "`r Sys.Date()`"
bibliography: ggmsa.bib
vignette: >
  %\VignetteIndexEntry{ggmsa}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
```{r, include = FALSE}
knitr::opts_chunk$set(
    collapse = TRUE,
    comment = "#>"
)
# Packages -------------------------------------------------------------------
library(ggmsa)
library(ggplot2)
library(yulab.utils)
```
# Install package
```{r eval = FALSE}
# Install BiocManager only if it is missing, without attaching it
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("ggmsa")
```
# Introduction
ggmsa is a package designed to plot multiple sequence alignments.
It implements functions that make visualizing publication-quality
multiple sequence alignments (protein/DNA/RNA) in R simple and
powerful. It uses a modular design to annotate sequence alignments
and accepts other data sets so that diagrams can be combined.
In this tutorial, we’ll work through the basics of using ggmsa.
```{r results="hide", message=FALSE, warning=FALSE}
library(ggmsa)
```
```{r echo=FALSE, out.width='50%'}
knitr::include_graphics("man/figures/workflow.png")
```
# Importing MSA data
We’ll start by importing some example data to use throughout this
tutorial. Besides FASTA files, several R objects can also be used
as input. `available_msa()` lists the MSA objects
currently supported.
```{r warning=FALSE}
available_msa()
protein_sequences <- system.file("extdata", "sample.fasta",
                                 package = "ggmsa")
miRNA_sequences <- system.file("extdata", "seedSample.fa",
                               package = "ggmsa")
nt_sequences <- system.file("extdata", "LeaderRepeat_All.fa",
                            package = "ggmsa")
```
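As a minimal sketch of using an R object instead of a file path, the chunk below reads the same sample FASTA file with Biostrings and passes the resulting `AAStringSet` to `ggmsa()` (introduced in the next section). It assumes the Biostrings package is installed; the names `protein_set` and `p_obj` are only illustrative.

```{r eval = FALSE}
library(ggmsa)
library(Biostrings)

# Path to the sample alignment shipped with ggmsa
protein_sequences <- system.file("extdata", "sample.fasta",
                                 package = "ggmsa")

# Read the alignment into an AAStringSet object (illustrative name)
protein_set <- readAAStringSet(protein_sequences)

# Pass the object, rather than the file path, to ggmsa()
p_obj <- ggmsa(protein_set, 300, 350, color = "Clustal",
               font = "DroidSansMono", char_width = 0.5, seq_name = TRUE)
p_obj
```

The call is otherwise identical to the file-based one, so switching between the two kinds of input should only require changing the first argument.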
# Basic use: MSA Visualization
The simplest way to use ggmsa:
```{r fig.height = 2, fig.width = 10, warning=FALSE}
ggmsa(protein_sequences, 300, 350, color = "Clustal",
      font = "DroidSansMono", char_width = 0.5, seq_name = TRUE)
```
## Color Schemes
Several predefined color schemes for rendering MSAs are shipped
with the package. Use `available_colors()` to list the color
schemes currently available. Note that the schemes for amino acids
(protein) and nucleotides (DNA/RNA) have different names.
```{r warning=FALSE}
available_colors()
```
```{r echo=FALSE, out.width = '50%'}
knitr::include_graphics("man/figures/schemes.png")
```
## Font
Several predefined fonts are shipped with ggmsa.
Use `available_fonts()` to list the fonts currently available.
```{r warning=FALSE}
available_fonts()
```
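As a small sketch of how these two settings combine, the chunk below redraws the same region with a different color scheme and font. The particular pairing of `"Chemistry_AA"` and `"TimesNewRoman"` is an arbitrary choice for illustration, and the name `p_styled` is hypothetical; any names reported by `available_colors()` and `available_fonts()` should work the same way.

```{r eval = FALSE}
library(ggmsa)

# Sample protein alignment shipped with ggmsa
protein_sequences <- system.file("extdata", "sample.fasta",
                                 package = "ggmsa")

# Same region as in the basic example, with another scheme and font
p_styled <- ggmsa(protein_sequences, 300, 350, color = "Chemistry_AA",
                  font = "TimesNewRoman", char_width = 0.5, seq_name = TRUE)
p_styled
```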
# MSA Annotation
ggmsa supports annotations for MSAs. As in ggplot2, annotations
are implemented as `geom` layers that users add with `+`,
like this: `ggmsa() + geom_*()`.
Automatically generated annotations containing colored
labels and symbols are overlaid on MSAs to indicate
potentially conserved or divergent regions.
For example, visualizing a multiple sequence alignment
with a **sequence logo** and a **bar chart**:
```{r fig.height = 2.5, fig.width = 11, warning = FALSE, message = FALSE}
ggmsa(protein_sequences, 221, 280, seq_name = TRUE, char_width = 0.5) +
    geom_seqlogo(color = "Chemistry_AA") + geom_msaBar()
```
The following table lists the annotation layers supported by ggmsa:
```{r echo=FALSE, results='asis', warning=FALSE, message=FALSE}
library(kableExtra)
x <- "geom_seqlogo()\tgeometric layer\tautomatically generated sequence logos for an MSA\n
geom_GC()\tannotation module\tshows GC content with a bubble chart\n
geom_seed()\tannotation module\thighlights seed region on miRNA sequences\n
geom_msaBar()\tannotation module\tshows sequence conservation as a bar chart\n
geom_helix()\tannotation module\tdepicts RNA secondary structure as arc diagrams (needs extra data)\n
"
xx <- strsplit(x, "\n\n")[[1]]
y <- strsplit(xx, "\t") %>% do.call("rbind", .)
y <- as.data.frame(y, stringsAsFactors = FALSE)
colnames(y) <- c("Annotation modules", "Type", "Description")
knitr::kable(y, align = "l", booktabs = TRUE, escape = TRUE) %>%
kable_styling(latex_options = c("striped", "hold_position", "scale_down"))
```
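Two of the layers in the table can be sketched with the nucleotide example files shipped in the package. The chunk below is an illustration under assumptions rather than a definitive recipe: the seed motif `"GAGGUAG"` and the `star = TRUE` setting are assumed to suit `seedSample.fa`, and the names `p_seed` and `p_gc` are hypothetical.

```{r eval = FALSE}
library(ggmsa)

# Nucleotide example files shipped with ggmsa
miRNA_sequences <- system.file("extdata", "seedSample.fa",
                               package = "ggmsa")
nt_sequences <- system.file("extdata", "LeaderRepeat_All.fa",
                            package = "ggmsa")

# Highlight an assumed seed motif on the miRNA alignment
p_seed <- ggmsa(miRNA_sequences, char_width = 0.5, color = "Chemistry_NT") +
    geom_seed(seed = "GAGGUAG", star = TRUE)

# Add a GC-content bubble chart to a nucleotide alignment (tiles only)
p_gc <- ggmsa(nt_sequences, font = NULL, color = "Chemistry_NT") +
    geom_GC()

p_seed
p_gc
```

Both results are built by adding a layer to `ggmsa()` with `+`, the same pattern as the sequence logo and bar chart example.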
# Learn more
Check out the following guides to learn more about the features of ggmsa:
- [Getting Started](https://yulab-smu.top/ggmsa/articles/ggmsa.html)
- [Annotations](https://yulab-smu.top/ggmsa/articles/Annotations.html)
- [Color Schemes and Font Families](https://yulab-smu.top/ggmsa/articles/Color_schemes_And_Font_Families.html)
- [Theme](https://yulab-smu.top/ggmsa/articles/guides/MSA_theme.html)
- [Other Modules](https://yulab-smu.top/ggmsa/articles/Other_Modules.html)
- [View Modes](https://yulab-smu.top/ggmsa/articles/View_modes.html)
# Session Info
```{r echo = FALSE}
sessionInfo()
```
================================================
FILE: vignettes/ggmsa.bib
================================================
@article{Taylor1997Residual,
title={Residual colours: a proposal for aminochromography.},
author={Taylor, W R},
journal={Protein Eng},
volume={10},
number={7},
pages={743-746},
year={1997},
}
@article{Waterhouse2009Jalview,
title={Jalview Version 2--a multiple sequence alignment editor and analysis workbench},
author={Waterhouse, A. M. and Procter, J. B. and Martin, D. M. and Clamp, M and Barton, G. J.},
journal={Bioinformatics},
volume={25},
number={9},
pages={1189--1191},
year={2009},
}
@article{yu2017ggtree,
title={ggtree: an R package for visualization and annotation of phylogenetic trees with their covariates and other associated data},
author={Yu, Guangchuang and Smith, David K and Zhu, Huachen and Guan, Yi and Lam, Tommy Tsanyuk},
journal={Methods in Ecology and Evolution},
volume={8},
number={1},
pages={28--36},
year={2017}
}
@article{Wagih2017ggseqlogo,
title={ggseqlogo: a versatile R package for drawing sequence logos},
author={Wagih, Omar},
journal={Bioinformatics},
volume={33},
number={22},
pages={3645--3647},
year={2017},
}