Repository: YuLab-SMU/ggmsa Branch: devel Commit: 956078ed388a Files: 108 Total size: 300.1 KB Directory structure: gitextract_gj8qs7tf/ ├── .Rbuildignore ├── .gitignore ├── CONDUCT.md ├── DESCRIPTION ├── Makefile ├── NAMESPACE ├── NEWS.md ├── R/ │ ├── AllClasses.R │ ├── SeqBundles.R │ ├── ancestor_seq.R │ ├── arc.R │ ├── available.R │ ├── clustal.R │ ├── color_by_conservation.R │ ├── color_else.R │ ├── cons.R │ ├── data.R │ ├── dms.R │ ├── facet_msa.R │ ├── geom_GC.R │ ├── geom_asterisk.R │ ├── geom_msa.R │ ├── geom_msaBar.R │ ├── geom_seed.R │ ├── ggmaf.R │ ├── ggmsa.R │ ├── import-functions.R │ ├── method-plot.R │ ├── method-show.R │ ├── methods-diff.R │ ├── methods-ggplot_add.R │ ├── msa_data.R │ ├── pp_interactive.R │ ├── prepare_fasta.R │ ├── read_maf.R │ ├── seqdiff.R │ ├── seqlogo.R │ ├── simplot.R │ ├── sysdata.rda │ ├── theme_msa.R │ └── zzz.R ├── README.Rmd ├── README.md ├── inst/ │ ├── CITATION │ └── extdata/ │ ├── GVariation/ │ │ ├── A.Mont.fas │ │ ├── B.Oz.fas │ │ ├── C.Wilga5.fas │ │ └── sample_alignment.fa │ ├── Gram-negative_AKL.fasta │ ├── Gram-positive_AKL.fasta │ ├── LeaderRepeat_All.fa │ ├── Rfam/ │ │ ├── RF00458.fasta │ │ ├── RF03120.fasta │ │ └── RF03120_SS.txt │ ├── TP53_genes.xlsx │ ├── sample.fasta │ ├── seedSample.fa │ ├── sequence-link-tree.fasta │ └── tp53.fa ├── man/ │ ├── GVariation.Rd │ ├── Gram-negative_AKL.fasta.Rd │ ├── Gram-positive_AKL.fasta.Rd │ ├── LeaderRepeat_All.fa.Rd │ ├── Rfam.Rd │ ├── TP53_genes.xlsx.Rd │ ├── adjust_ally.Rd │ ├── assign_dms.Rd │ ├── available_colors.Rd │ ├── available_fonts.Rd │ ├── available_msa.Rd │ ├── extract_seq.Rd │ ├── facet_msa.Rd │ ├── geom_GC.Rd │ ├── geom_helix.Rd │ ├── geom_msa.Rd │ ├── geom_msaBar.Rd │ ├── geom_seed.Rd │ ├── geom_seqlogo.Rd │ ├── ggSeqBundle.Rd │ ├── gghelix.Rd │ ├── ggmaf.Rd │ ├── ggmsa.Rd │ ├── merge_seq.Rd │ ├── plot-methods.Rd │ ├── readSSfile.Rd │ ├── read_maf.Rd │ ├── reset_pos.Rd │ ├── sample.fasta.Rd │ ├── seedSample.fa.Rd │ ├── seqdiff.Rd │ ├── seqlogo.Rd │ ├── 
sequence-link-tree.fasta.Rd │ ├── show-methods.Rd │ ├── simplify_hdata.Rd │ ├── simplot.Rd │ ├── theme_msa.Rd │ ├── tidy_hdata.Rd │ ├── tidy_maf_df.Rd │ ├── tidy_msa.Rd │ ├── tp53.fa.Rd │ └── treeMSA_plot.Rd ├── tests/ │ ├── testthat/ │ │ ├── test-main.R │ │ ├── test-msa_data.R │ │ └── test-tidy_msa.R │ └── testthat.R └── vignettes/ ├── .gitignore ├── ggmsa.Rmd └── ggmsa.bib ================================================ FILE CONTENTS ================================================ ================================================ FILE: .Rbuildignore ================================================ ^.*\.Rproj$ ^\.Rproj\.user$ Makefile README.md README_files README.Rmd ^_pkgdown\.yml$ ^docs$ ^pkgdown$ logo.png CONDUCT.md ================================================ FILE: .gitignore ================================================ .Rproj.user .Rhistory .RData .Renviron .DS_Store inst/doc ggmsa.Rproj ggmsa.Rcheck .git docs/ pkgdown/ ================================================ FILE: CONDUCT.md ================================================ # Contributor Code of Conduct As contributors and maintainers of this project, we pledge to respect all people who contribute through reporting issues, posting feature requests, updating documentation, submitting pull requests or patches, and other activities. We are committed to making participation in this project a harassment-free experience for everyone, regardless of level of experience, gender, gender identity and expression, sexual orientation, disability, personal appearance, body size, race, ethnicity, age, or religion. Examples of unacceptable behavior by participants include the use of sexual language or imagery, derogatory comments or personal attacks, trolling, public or private harassment, insults, or other unprofessional conduct. 
Project maintainers have the right and responsibility to remove, edit, or reject comments, commits, code, wiki edits, issues, and other contributions that are not aligned to this Code of Conduct. Project maintainers who do not follow the Code of Conduct may be removed from the project team. Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by opening an issue or contacting one or more of the project maintainers. This Code of Conduct is adapted from the Contributor Covenant (http://contributor-covenant.org), version 1.0.0, available at http://contributor-covenant.org/version/1/0/0/ ================================================ FILE: DESCRIPTION ================================================ Package: ggmsa Title: Plot Multiple Sequence Alignment using 'ggplot2' Version: 1.19.0 Authors@R: c(person("Guangchuang", "Yu", email = "guangchuangyu@gmail.com", role = c("aut", "cre","ths"), comment = c(ORCID = "0000-0002-6485-8781")), person("Lang", "Zhou", email = "nyzhoulang@gmail.com", role = "aut"), person("Shuangbin", "Xu", email = "xshuangbin@163.com", role = "ctb"), person("Huina", "Huang", email = "1185796994@qq.com", role = "ctb")) Description: A visual exploration tool for multiple sequence alignment and associated data. Supports MSA of DNA, RNA, and protein sequences using 'ggplot2'. Multiple sequence alignment can easily be combined with other 'ggplot2' plots, such as phylogenetic tree Visualized by 'ggtree', boxplot, genome map and so on. More features: visualization of sequence logos, sequence bundles, RNA secondary structures and detection of sequence recombinations. 
Depends: R (>= 4.1.0) Imports: Biostrings, ggplot2, magrittr, tidyr, utils, stats, aplot, RColorBrewer, ggfun (>= 0.2.0), ggforce, dplyr, R4RNA, grDevices, seqmagick, grid, methods, ggtree (>= 1.17.1) Suggests: ggtreeExtra, ape, cowplot, knitr, rmarkdown, readxl, ggnewscale, kableExtra, gggenes, statebins, prettydoc, testthat (>= 3.0.0), yulab.utils License: Artistic-2.0 Encoding: UTF-8 URL: https://doi.org/10.1093/bib/bbac222(paper), https://www.amazon.com/Integration-Manipulation-Visualization-Phylogenetic-Computational-ebook/dp/B0B5NLZR1Z/ (book) BugReports: https://github.com/YuLab-SMU/ggmsa/issues biocViews: Software, Visualization, Alignment, Annotation, MultipleSequenceAlignment RoxygenNote: 7.3.2 VignetteBuilder: knitr Config/testthat/edition: 3 ================================================ FILE: Makefile ================================================ PKGNAME := $(shell sed -n "s/Package: *\([^ ]*\)/\1/p" DESCRIPTION) PKGVERS := $(shell sed -n "s/Version: *\([^ ]*\)/\1/p" DESCRIPTION) PKGSRC := $(shell basename `pwd`) BIOCVER := RELEASE_3_23 all: rd check clean alldocs: rd readme mkdocs rd: Rscript -e 'roxygen2::roxygenise(".")' readme: Rscript -e 'rmarkdown::render("README.Rmd")' readme2: Rscript -e 'rmarkdown::render("README.Rmd", "html_document")' build: # cd ..;\ # R CMD build $(PKGSRC) Rscript -e 'devtools::build()' build2: cd ..;\ R CMD build --no-build-vignettes $(PKGSRC) install: cd ..;\ R CMD INSTALL $(PKGNAME)_$(PKGVERS).tar.gz check: #build #cd ..;\ #Rscript -e 'rcmdcheck::rcmdcheck("$(PKGNAME)_$(PKGVERS).tar.gz")' Rscript -e 'devtools::check()' check2: build cd ..;\ R CMD check $(PKGNAME)_$(PKGVERS).tar.gz bioccheck: cd ..;\ Rscript -e 'BiocCheck::BiocCheck("$(PKGNAME)_$(PKGVERS).tar.gz")' gpcheck: Rscript -e 'goodpractice::gp()' clean: cd ..;\ $(RM) -r $(PKGNAME).Rcheck/ gitmaintain: git gc --auto;\ git prune -v;\ git fsck --full rmrelease: git branch -D $(BIOCVER) release: git checkout $(BIOCVER);\ git fetch --all update: git fetch 
--all;\ git checkout devel;\ git merge upstream/devel;\ git merge origin/devel;\ push: git push upstream devel;\ git push origin devel biocinit: git remote add upstream git@git.bioconductor.org:packages/$(PKGNAME).git;\ git fetch --all ================================================ FILE: NAMESPACE ================================================ # Generated by roxygen2: do not edit by hand S3method(diff,SeqDiff) S3method(ggplot_add,GCcontent) S3method(ggplot_add,facet_msa) S3method(ggplot_add,msaBar) S3method(ggplot_add,nucleotideeHelix) S3method(ggplot_add,seed) S3method(ggplot_add,seqlogo) export(adjust_ally) export(assign_dms) export(available_colors) export(available_fonts) export(available_msa) export(extract_seq) export(facet_msa) export(geom_GC) export(geom_helix) export(geom_msa) export(geom_msaBar) export(geom_seed) export(geom_seqlogo) export(ggSeqBundle) export(gghelix) export(ggmaf) export(ggmsa) export(merge_seq) export(readSSfile) export(read_maf) export(reset_pos) export(seqdiff) export(seqlogo) export(simplify_hdata) export(simplot) export(theme_msa) export(tidy_hdata) export(tidy_maf_df) export(tidy_msa) export(treeMSA_plot) exportMethods(plot) exportMethods(show) importClassesFrom(Biostrings,BStringSet) importFrom(Biostrings,AAStringSet) importFrom(Biostrings,DNAStringSet) importFrom(Biostrings,RNAStringSet) importFrom(Biostrings,readBStringSet) importFrom(Biostrings,readDNAStringSet) importFrom(Biostrings,toString) importFrom(Biostrings,width) importFrom(R4RNA,as.helix) importFrom(R4RNA,collapseHelix) importFrom(R4RNA,expandHelix) importFrom(R4RNA,readBpseq) importFrom(R4RNA,readConnect) importFrom(R4RNA,readHelix) importFrom(R4RNA,readVienna) importFrom(RColorBrewer,brewer.pal) importFrom(aplot,insert_top) importFrom(aplot,plot_list) importFrom(dplyr,group_by) importFrom(dplyr,group_by_) importFrom(dplyr,n) importFrom(dplyr,select) importFrom(dplyr,summarize) importFrom(dplyr,summarize_) importFrom(ggforce,geom_arc) 
importFrom(ggfun,geom_xspline) importFrom(ggplot2,Geom) importFrom(ggplot2,aes) importFrom(ggplot2,aes_) importFrom(ggplot2,coord_cartesian) importFrom(ggplot2,coord_fixed) importFrom(ggplot2,draw_key_polygon) importFrom(ggplot2,element_blank) importFrom(ggplot2,element_line) importFrom(ggplot2,element_text) importFrom(ggplot2,facet_wrap) importFrom(ggplot2,geom_area) importFrom(ggplot2,geom_blank) importFrom(ggplot2,geom_col) importFrom(ggplot2,geom_line) importFrom(ggplot2,geom_point) importFrom(ggplot2,geom_polygon) importFrom(ggplot2,geom_ribbon) importFrom(ggplot2,geom_segment) importFrom(ggplot2,geom_smooth) importFrom(ggplot2,geom_text) importFrom(ggplot2,geom_tile) importFrom(ggplot2,ggplot) importFrom(ggplot2,ggplot_add) importFrom(ggplot2,ggplot_build) importFrom(ggplot2,ggplot_gtable) importFrom(ggplot2,ggproto) importFrom(ggplot2,ggtitle) importFrom(ggplot2,labs) importFrom(ggplot2,layer) importFrom(ggplot2,scale_color_manual) importFrom(ggplot2,scale_fill_gradientn) importFrom(ggplot2,scale_fill_manual) importFrom(ggplot2,scale_x_continuous) importFrom(ggplot2,scale_y_continuous) importFrom(ggplot2,theme) importFrom(ggplot2,theme_bw) importFrom(ggplot2,theme_minimal) importFrom(ggplot2,theme_void) importFrom(ggplot2,xlab) importFrom(ggplot2,xlim) importFrom(ggplot2,ylab) importFrom(ggtree,geom_facet) importFrom(ggtree,geom_tiplab) importFrom(grDevices,colorRampPalette) importFrom(grid,arrow) importFrom(grid,gTree) importFrom(grid,gpar) importFrom(grid,polygonGrob) importFrom(grid,unit) importFrom(grid,unit.pmax) importFrom(magrittr,"%<>%") importFrom(magrittr,"%>%") importFrom(methods,missingArg) importFrom(methods,new) importFrom(methods,show) importFrom(seqmagick,fa_read) importFrom(stats,setNames) importFrom(tidyr,gather) importFrom(utils,getFromNamespace) importFrom(utils,globalVariables) importFrom(utils,modifyList) importFrom(utils,packageDescription) importFrom(utils,read.delim) ================================================ FILE: NEWS.md 
================================================ # ggmsa 1.18.0 + Bioconductor RELEASE_3_23 (2026-04-29, Wed) # ggmsa 1.16.0 + Bioconductor RELEASE_3_22 (2025-11-01, Sat) # ggmsa 1.15.1 + replace `ggalt::geom_xspline()` with `ggfun::geom_xspline()` (2017-07-12, Sat) # ggmsa 1.3.3 + calling `\dontrun{}` for examples on `ggmsa()` # ggmsa 1.3.2 + bugfix: `geom_msaBar` conservation layer incorrectly aligned issues#34(2022-5-13, Fri) # ggmsa 1.3.1 + A new feature--selects ancestral sequence on Tree-MSA plot `treeMSA_plot` (2022-4-14, Thu) + A new feature--visualization of genome alignment `ggmaf` (2022-4-14, Thu) + A test feature--visualization protein-protein interactive (2022-4-14, Thu) + updated the way smooth is invoked on simplot(2022-01-03, Mon) # ggmsa 1.1.4 added smoothed curve on simplot.(2021-12-17, Fri) # ggmsa 1.1.3 fixed the typo in "posHighligthed", and changed it to snake_case "position_highlight" from camelCase "posHighligthed" (2021-12-13, Mon) # ggmsa 1.1.2 fixed the assignment error on line 155 'seqlogo.R' # ggmsa 1.1.1 fixed error: using `||` instead of `|` on 110 lines in geom_msa.R # ggmsa 0.99.0 or 0.99.x (Prepare for submission to `Bioconductor`, 2021-09-22 Wed) + 0.99.1 update DESCRIPTION and NEWS files (2021-09-28, Tue) + 0.99.2 add documentation for row data in extdata/inst and clean up code (2021-09-29, Wed) + 0.99.3 remove some vignettes from master (build on the gh-pages branch) (2021-10-1, Fri) + 0.99.4 remove 'stringr' package from 'Imports' (2021-10-11, Mon) + 0.99.5 make the consensus_views compatible ggtreeExtra and add package description. (2021-10-21, Thu) # ggmsa 0.0.10 + update default color schemes in lower part of the SeqDiff plot (2021-08-20, Fri) # ggmsa 0.0.9 + import R4RNA to fix R check (2021-08-03, Tue) # ggmsa 0.0.8 + bugfix: fix variable names error in color_scheme. (2021-07-29, Thu) + The migration of sequence recombination functionality from `seqcombo` package. 
(2021-07-20, Tue) # ggmsa 0.0.7 + added `gghelix()` and `geom_helix()`.(2021-04-1, Thu) + added option to show the fill legend.(2021-03-23, Tue) + added an error message to remind that "sequences must have unique names".(2021-03-18, Thu) + added `ggSeqBundle()` to plot Sequence Bundles for MSAs based on `ggplot2` (2021-03-18, Thu) # ggmsa 0.0.6 + supports linking `ggtreeExtra`. (2021-01-21, Thu) + bugfix: reversed sequence in 'tree + geom_facet(font)' . (2021-01-21, Thu) + bugfix: partitioning error when the sequence starting point is greater than 1. (2021-01-21, Thu) + bugfix: generates continuous x-axis labels for each panel. (2021-01-21, Thu) + supports custom colors `custom_color`. (2020-12-28, Mon) # ggmsa 0.0.5 + added a new view called `by_conservation`.(2020-12-22, Tue) + added a new color scheme `Hydrophobicity` and a new parameter `border`.(2020-12-21, Mon) + rewrite the function `facet_msa()`.(2020-12-03, Thu) + Debug: tree + geom_facet(geom_msa()) does not work.(2020-12-03, Thu) + added a new function `geom_msaBar()`.(2020-12-03, Thu) + added a new parameter `ignore_gaps` used in consensus views.(2020-10-09, Fri) + debug in consensus views (2020-10-05, Mon) + added consensus views (2020-9-30, Wed) + added new colors `LETTER` and `CN6` provided by ShixiangWang.[issues#8](https://github.com/YuLab-SMU/ggmsa/issues/8) # ggmsa 0.0.4 + fixed warning message in **msa_data.R** (2020-4-26, Sun) + added ggplot_add methods for `geom_*()` (2020-4-24, Fri) + added a parameter `seq_name` in `ggmsa()` (2020-4-23, Thu) + added a new function `facet_msa()` --> break down the MSA (2020-4-17, Fri) + added a parameter `posHighlighted` in `ggmsa()` (2020-4-17, Fri) + created a new layer `geom_asterisk()` to optimize `geom_seed()` (2020-4-11, Sat) + added new functions `available_colors()`, `available_fonts()` and `available_msa()` (2020-3-30, Thu) + added a new function `geom_seed()` --> highlight the seed region in miRNA sequences (2020-3-27, Fri) + added a new function 
`ggmotif()`--> plot sequence motifs independently (2020-3-23, Tue) + added a Monospaced Font `DroidSansMono` (2020-3-23, Mon) # ggmsa 0.0.3 + release of v=0.0.3 (2020-03-16, Mon) + added a new function `geom_GC()` --> plot GC content in MSA (2020-02-28, Fri) + added a new function `geom_seqlogo()` --> plot sequence motifs in MSA (2020-02-14, Fri) + used a proportional scaling algorithm (2020-01-08, Wed) # ggmsa 0.0.2 + support plot sequence logo (2019-12-25, Wed) + added three fonts:`helvetical`, `times_new_roman`, `mono` (2019-12-21, Sat) + ~~added three fonts:`serif_font`, `Montserrat_font`, `roboto_font` (2019-12-17, Tue)~~ + added internal outline polygons (2019-12-15, Sun) + bug fixed of `tidy_msa` + import `seqmagick` for parsing fasta + `tidy_msa` for converting msa file/object to tidy data frame (2019-12-09, Mon) # ggmsa 0.0.1 + initial CRAN release (2019-10-17, Thu) + removed from CRAN on 2021-08-17 ================================================ FILE: R/AllClasses.R ================================================ setClass("SeqDiff", representation = representation( file = "character", sequence = "BStringSet", reference = "numeric", diff = "data.frame" ) ) ================================================ FILE: R/SeqBundles.R ================================================ ##' plot Sequence Bundles for MSA based on 'ggplot2' ##' ##' ##' @title ggSeqBundle ##' @importFrom ggfun geom_xspline ##' @param msa Multiple sequence alignment file(FASTA) or object for ##' representing either nucleotide sequences or peptide sequences. Also receives ##' multiple MSA files. ##' eg:msa = c("Gram-negative_AKL.fasta", "Gram-positive_AKL.fasta"). ##' @param line_width The width of bundles at each site, default is 0.3. ##' @param line_thickness The thickness of bundles at each site, default is 0.3. ##' @param line_high The height of bundles at each site, default is 0. 
##' @param spline_shape A numeric vector of values between -1 and 1, which ##' control the shape of the spline relative to the control points. ##' @param size A numeric vector of values between 0 and 1, ##' which control the size of each lines. ##' @param alpha A numeric vector of values between 0 and 1, ##' which control the alpha of each lines. ##' @param bundle_color The colors of each sequence bundles. ##' eg: bundle_color = c("#2ba0f5","#424242"). ##' @param lev_molecule Reassigning the Y-axis and displaying ##' letter-coded amino acids/nucleotides arranged by physiochemical ##' properties or others.eg:amino acids hydrophobicity ##' lev_molecule = c("-","A", "V", "L", "I", "P", "F", "W", "M", ##' "G", "S","T", "C", "Y", "N", "Q", "D", "E", "K","R", "H"). ##' @return ggplot object ##' @export ##' @examples ##' aln <- system.file("extdata", "Gram-negative_AKL.fasta", package = "ggmsa") ##' ggSeqBundle(aln) ##' @author Lang Zhou ggSeqBundle <- function(msa, line_width = 0.3, line_thickness = 0.3, line_high = 0, spline_shape = 0.3, size = 0.5, alpha = 0.2, bundle_color = c("#2ba0f5","#424242"), lev_molecule = c("-", "A", "V", "L", "I", "P", "F", "W", "M", "G", "S","T", "C", "Y", "N", "Q", "D", "E", "K", "R", "H") ) { if(length(msa) > length(bundle_color)) { stop("Each MSA group should be assigned a bundle color!!") } df <- lapply(seq_along(msa), function(i){ df_aa <- tidy_msa(msa[[i]]) df_aa$name <- as.character(df_aa$name) df_aa$group <- i df_aa })%>% do.call("rbind",.) 
dd <- adjustMSA(df_msa = df, lev_molecule = lev_molecule, line_width = line_width, line_thickness = line_thickness, line_high = line_high, bundle_color = bundle_color ) mapping <- aes(x = position_adj, y = y_adj, group=name, color = I(bundle_color)) ggplot(data = dd, mapping = mapping) + geom_xspline(shape = spline_shape, linewidth = size, alpha = alpha) + theme_bundles(df = df, lev_molecule = lev_molecule) } adjustMSA <- function(df_msa, lev_molecule, line_width, line_thickness, bundle_color, line_high) { data_scale <- lapply(nrow(df_msa) %>% seq_len(), function(i) { d <- df_msa[i,] d[2,] <- d[1,] d[1,"position_adj"] <- d[1,"position"] - line_width d[2,"position_adj"] <- d[2,"position"] + line_width d }) %>% do.call("rbind",.) data_scale$y <- factor(data_scale$character, levels = lev_molecule) %>% as.numeric() data_adj <- lapply(data_scale$group %>% unique, function(g) { data_group <- data_scale[data_scale$group == g,] thickness <- line_thickness / factor(data_group$name) %>% as.numeric %>% max dd_adj <- lapply(unique(data_group$position), function(i){ df_pos <- data_group[data_group$position == i,] lapply(unique(df_pos$y), function(j){ df_y <- df_pos[df_pos$y == j,] thick_lev <- df_y$name %>% factor %>% as.numeric - 1 df_y$y_adj <- df_y$y - 0.4 + line_high + thickness * thick_lev + line_thickness * (g - 1) df_y }) %>% do.call("rbind",.) }) %>% do.call("rbind",.) dd_adj$bundle_color <- bundle_color[[g]] dd_adj }) %>% do.call("rbind",.) 
return(data_adj) } ##' @importFrom ggplot2 element_line theme_bundles <- function(df, lev_molecule){ break_y <- factor(lev_molecule, levels = lev_molecule) %>% as.numeric minor_y <- c(break_y + 0.5, break_y - 0.5) %>% unique break_x <- max(df$position) %>% seq_len minor_x <- c(break_x + 0.5, break_x - 0.5) %>% unique list( ylab(NULL), xlab("Position number"), scale_x_continuous(breaks = break_x, labels = break_x, minor_breaks = minor_x), scale_y_continuous(breaks = break_y, labels = lev_molecule, minor_breaks = minor_y), theme(panel.grid.minor.y = element_line(color = "#e8e0e0", linewidth = 0.4), axis.line.x = element_line(color = "gray60", linewidth = 0.8), panel.grid.major = element_blank(), axis.ticks.y = element_blank(), panel.background = element_blank()) ) } ================================================ FILE: R/ancestor_seq.R ================================================ ##' plot Tree-MSA plot ##' ##' ##' 'treeMSA_plot()' automatically re-arranges the MSA data according to ##' the tree structure, ##' @title treeMSA_plot ##' @param p_tree tree view ##' @param tidymsa_df tidy MSA data ##' @param ancestral_node vector, internal node in tree. Assigning a internal ##' node to display "ancestral sequences",If ancestral_node = "none" hides ##' all ancestral sequences, if ancestral_node = "all" shows all ancestral ##' sequences. ##' @param sub logical value. Displaying a subset of ancestral sequences or not. ##' @param panel panel name for plot of MSA data ##' @param font font families, possible values are 'helvetical', 'mono', and ##' 'DroidSansMono', 'TimesNewRoman'. Defaults is 'helvetical'. ##' If font = NULL, only plot the background tile. ##' @param color a Color scheme. One of 'Clustal', 'Chemistry_AA', ##' 'Shapely_AA', 'Zappo_AA', 'Taylor_AA', 'LETTER', 'CN6', 'Chemistry_NT', ##' 'Shapely_NT', 'Zappo_NT', 'Taylor_NT'. Defaults is 'Chemistry_AA'. ##' @param seq_colname the colname of MSA on tree$data ##' @param ... 
additional parameters for 'geom_msa' ##' @export ##' @importFrom ggtree geom_facet ##' @return ggplot object ##' @author Lang Zhou treeMSA_plot <- function(p_tree, tidymsa_df, ancestral_node = "none", sub = FALSE, panel = "MSA", font = NULL, color = "Chemistry_AA", seq_colname = NULL, ...) { if(!ancestral_node == "none" && is.null(seq_colname)) { stop("pls assign the colname of MSA on tree$data by arguments 'seq_colname'!") } if(!ancestral_node == "none") { p_tree <- adjust_ally(p_tree, node = ancestral_node, sub = sub, seq_colname = seq_colname) tidymsa_df <- extract_seq(p_tree, seq_colname = seq_colname) } p <- p_tree + geom_facet(geom = geom_msa, data = tidymsa_df, panel = panel, font = font, color = color, ...) if(ancestral_node == "none") { p <- p + geom_tiplab(offset = 0.002) } p } ##' adjust the tree branch position after assigning ancestor node ##' ##' @title adjust_ally ##' @param tree ggtree object ##' @param node internal node in tree ##' @param sub logical value. ##' @param seq_colname the colname of MSA on tree$data ##' @importFrom ggtree geom_tiplab ##' @importFrom ggplot2 aes_ ##' @importFrom utils getFromNamespace ##' @return tree ##' @export ##' @author Lang Zhou adjust_ally <- function(tree, node, sub = FALSE, seq_colname = "mol_seq") { getSubtree <- getFromNamespace("getSubtree", "ggtree") if(node == "all"){ d <- tree$data ancestor_n <- d[!d$isTip & !is.na(d[,seq_colname][[1]]),"node"][[1]] }else { if(sub){ ancestor_n <- lapply(node, function(i) { sub_tree <- getSubtree(tree,node = i) sub_ancestor <- sub_tree[!sub_tree$isTip,] ancestor_n <- sub_ancestor$node return(ancestor_n) })%>% unlist %>% unique }else { ancestor_n <- node } } for (i in ancestor_n) { tree <- adjust_treey(tree = tree, node = i) } tree$data$node_color <- "black" tree$data[tree$data$node %in% ancestor_n,"node_color"] <- "red" tree <- tree + geom_tiplab(aes_(color = ~I(node_color)),offset = 0.002) return(tree) } ##' extract ancestor sequence from tree data ##' ##' @title 
extract_seq ##' @param tree_adjust ggtree object ##' @param seq_colname the colname of MSA on tree$data ##' @return a tidy data frame of the tip sequences, as produced by tidy_msa() ##' @export ##' @author Lang Zhou extract_seq <- function(tree_adjust, seq_colname = "mol_seq") { data <- tree_adjust$data seq <- data[data$isTip,seq_colname][[1]] names(seq) <- data[data$isTip,]$label tidy <- tidy_msa(seq) return(tidy) } adjust_treey <- function(tree, node) { tree$data$isTip[tree$data$node == node] <- TRUE tree$data$label[tree$data$node == node] <- tree$data$name[tree$data$node == node] y_ancestor <- tree$data$y[tree$data$node == node] tree$data$y[tree$data$y > y_ancestor] <- tree$data$y[tree$data$y > y_ancestor] + 1 tree$data$y[tree$data$node == node] <- tree$data$y[tree$data$node == node] %>% ceiling return(tree) } ================================================ FILE: R/arc.R ================================================ ##' Plots nucleotide secondary structure as helices in arc diagram ##' ##' @title gghelix ##' @param helix_data a data frame of nucleotide secondary structure, ##' as read from a file by readSSfile(). ##' @param overlap Logical. If TRUE, two structure data sets named 'known' ##' and 'predicted' must be given (eg: helix_data = list(known = data1, ##' predicted = data2)), ##' plots the predicted helices that are known on top, predicted helices that ##' are not known on the bottom, and finally plots unpredicted helices ##' on top in black. 
##' @param color_by generate colors for helices by various rules, ##' including integer counts and value ranges; one of "length" or "value" ##' @return ggplot object ##' @export ##' @examples ##' RF03120 <- system.file("extdata/Rfam/RF03120_SS.txt", package="ggmsa") ##' helix_data <- readSSfile(RF03120, type = "Vienna") ##' gghelix(helix_data) ##' @author Lang Zhou gghelix <- function(helix_data, color_by = "length",overlap = FALSE){ if(is.data.frame(helix_data)) { helix_tidy <- tidy_helix(helix_data, color_by = color_by) }else { helix_tidy <- tidy_list_helix(helix_data, color_by = color_by) } ly <- layer_helix(helix_data = helix_tidy, overlap = overlap) p <- ggplot() + ly + theme_helix() return(p) } ##' The layer of helix plot ##' ##' @title geom_helix ##' @param helix_data a data frame of nucleotide secondary structure, ##' as read from a file by readSSfile(). ##' @param overlap Logical. If TRUE, two structure data sets named 'known' ##' and 'predicted' must be given (eg: helix_data = list(known = data1, ##' predicted = data2)), ##' plots the predicted helices that are known on top, ##' predicted helices that are not known on the bottom, and finally plots ##' unpredicted helices on top in black. ##' @param color_by generate colors for helices by various rules, ##' including integer counts and value ranges; one of "length" or "value" ##' @param ... additional parameter ##' @return ggplot2 layers ##' @export ##' @examples ##' RF03120 <- system.file("extdata/Rfam/RF03120_SS.txt", package="ggmsa") ##'RF03120_fas <- system.file("extdata/Rfam/RF03120.fasta", package="ggmsa") ##'SS <- readSSfile(RF03120, type = "Vienna") ##'ggmsa(RF03120_fas, font = NULL,border = NA, ##' color = "Chemistry_NT", seq_name = FALSE) + ##'geom_helix(SS) ##' @author Lang Zhou geom_helix <- function(helix_data, color_by = "length", overlap = FALSE, ...) 
{ structure(list(helix_data = helix_data, color_by = color_by, overlap = overlap), class = "nucleotideeHelix") } ##' Read secondary structure file ##' ##' @title readSSfile ##' @importFrom utils read.delim ##' @param file A secondary structure text file in the format given by 'type' ##' @param type file type. one of "Helix", "Connect", "Vienna" and "Bpseq" ##' @return data frame ##' @importFrom R4RNA readHelix ##' @importFrom R4RNA readConnect ##' @importFrom R4RNA readVienna ##' @importFrom R4RNA readBpseq ##' @importFrom R4RNA expandHelix ##' @importFrom R4RNA collapseHelix ##' @export ##' @examples ##' RF03120 <- system.file("extdata/Rfam/RF03120_SS.txt", package="ggmsa") ##' helix_data <- readSSfile(RF03120, type = "Vienna") ##' @author Lang Zhou readSSfile <- function(file, type = NULL) { type <- match.arg(type, c("Helix", "Connect", "Vienna", "Bpseq")) load_data <- switch(type, Helix = readHelix(file), Connect = readConnect(file), Vienna = readVienna(file), Bpseq = readBpseq(file)) data <- collapseHelix(load_data) return(data) } tidy_list_helix <- function(helix_data, color_by = "length"){ known <- tidy_helix(helix_data$known, color_by = color_by) predicted <- tidy_helix(helix_data$predicted, color_by = color_by) return(list(known = known, predicted = predicted)) } tidy_helix <- function(helix_data, color_by = "length"){ helix_data <- color_helix(helix_data, color = color_by) names(helix_data)[c(1,2)] <- c("from","to") helix_data$x0 <- (helix_data$to + helix_data$from)/2 helix_data$r <- (helix_data$to - helix_data$from)/2 return(helix_data) } color_helix <- function(helix_data, color){ #color <- match.arg(color, c("length", "value")) if(color == "length"){ data_color <- colorBy_length(helix_data) }else if(color == "value") { data_color <- colorBy_value(helix_data) }else { helix_data$col <- color data_color <- helix_data } data <- expandHelix(data_color) return(data) } colorBy_length <- function(helix_data){ pal_lenght <- colorRampPalette(brewer.pal(name = "Paired", n = 12)) helix_data$col 
<- nrow(helix_data) %>% pal_lenght() return(helix_data) } colorBy_value <- function(helix_data){ pal_value <- colorRampPalette(rev(brewer.pal(name = "Blues", n = 4))) helix_data$col <- nrow(helix_data) %>% pal_value() return(helix_data) } ##' @importFrom ggforce geom_arc layer_helix <- function(helix_data, overlap = FALSE, seq_numbers = 0){ mapping_above <- aes_(x0 = ~x0, y0 = ~(seq_numbers + 0.5), r = ~r, start = ~1.5*pi, end = ~2.5*pi) mapping_below <- aes_(x0 = ~x0, y0 = ~(-0.5), r = ~r, start = ~pi/2, end = ~1.5*pi) if(seq_numbers > 0) { mapping_below <- modifyList(mapping_below, aes_(y0 = ~0)) } if(is.list(helix_data) & "col" %in% names(helix_data[[2]])) { mapping_above <- modifyList(mapping_above, aes_(color = ~I(col))) mapping_below <- modifyList(mapping_below, aes_(color = ~I(col))) } if(overlap) { if(!is.list(helix_data)| length(helix_data) != 2){ stop("Overlapping structures must input a list with 2 helix data. (eg: heilx_data = list(known = data1, predicted = data2)") } if(!names(helix_data) %in% c("known", "predicted") %>% all) { stop("helix_data names must be 'known' and 'predicted'. 
(eg: heilx_data = list(known = data1, predicted = data2)") } overlap_data <- overlap_helix(known = helix_data[["known"]], predicted = helix_data[["predicted"]]) if (overlap_data[["above_justknown"]] %>% nrow == 0){ ly_up <- geom_arc(data = overlap_data[["above_both"]], mapping = mapping_above) ly_below <- geom_arc(data = overlap_data[["below"]], mapping = mapping_below) return(list(ly_up, ly_below)) }else { ly_up <- geom_arc(data = overlap_data[["above_both"]], mapping = mapping_above) ly_up_justknown <- geom_arc(data = overlap_data[["above_justknown"]], mapping = mapping_above, color = "black") ly_below <- geom_arc(data = overlap_data[["below"]], mapping = mapping_below) return(list(ly_up, ly_up_justknown, ly_below)) } }else {#overlap = FALSE if(is.list(helix_data) & length(helix_data) == 2) { if(!"col" %in% names(helix_data[["known"]])) { mapping_below <- modifyList(mapping_below, aes_(color = I("#8fce5e"))) } ly_up <- geom_arc(data = helix_data[["known"]], mapping = mapping_below) ly_below <- geom_arc(data = helix_data[["predicted"]], mapping = mapping_above) return(list(ly_up, ly_below)) }else if(is.data.frame(helix_data)){ if("col" %in% names(helix_data)){ mapping_above <- modifyList(mapping_above, aes_(color = ~I(col))) } ly_arc <- geom_arc(data = helix_data, mapping = mapping_above) return(ly_arc) }else { stop("Only a data frame or a list with 2 of helix data are allowed. 
eg: helix_data = data or helix_data = list(known = data1, predicted = data2)") } } } overlap_helix <- function(known, predicted){ if(!c("from", "to") %in% names(known) %>% all) { stop("'known' must be an output from 'readSSfile()'") } if(!c("from", "to") %in% names(predicted) %>% all) { stop("'predicted' must be an output from 'readSSfile()'") } known$heli <- paste0(known$from, "t",known$to) predicted$heli <- paste0(predicted$from, "t", predicted$to) below <- predicted[!predicted$heli %in% known$heli,] #predicted & not known above_both <- predicted[predicted$heli %in% known$heli,] #predicted & known above_justknown <- known[!known$heli %in% above_both$heli,] #unpredicted & known return(list(below = below, above_both = above_both, above_justknown = above_justknown)) } ##' @importFrom ggplot2 theme_void ##' @importFrom ggplot2 element_text ##' @importFrom grid arrow theme_helix <- function(){ list(theme_void(), scale_y_continuous(breaks = 0), coord_fixed(), theme(panel.grid.major.y = element_line(linewidth = 1, arrow = arrow(length = unit(0.3, 'cm'))), panel.grid.major.x = element_line(color = "#eaeaea", linewidth = 0.4), axis.text.x = element_text()) ) } ================================================ FILE: R/available.R ================================================ ##' This function lists font families currently available ##' that can be used by 'ggmsa' ##' ##' ##' @title List Font Families currently available ##' @return No return value; called for its side effect of printing the ##' available font family names as a message ##' @examples available_fonts() ##' @export ##' @author Lang Zhou available_fonts <- function(){ message("font families currently available:" ) font <- paste(names(font_fam), collapse = ' ') message(font, "\n") } ##' This function lists color schemes currently available that ##' can be used by 'ggmsa' ##' ##' ##' @title List Color Schemes currently available ##' @return No return value; called for its side effect of printing the ##' available color schemes as a message ##' @examples available_colors() ##' @export ##' @author Lang Zhou available_colors <- 
function(){
    message("1.color schemes for nucleotide sequences currently available:")
    color_nt <- paste(names(scheme_NT), collapse = ' ')
    message(color_nt, "\n")
    message("2.color schemes for AA sequences currently available:")
    color_aa <- paste(names(scheme_AA), collapse = ' ')
    #separate 'Clustal' from the first scheme name with a space
    message("Clustal ", color_aa, "\n")
}

##' This function lists MSA objects currently available that
##' can be used by 'ggmsa'
##'
##'
##' @title List MSA objects currently available
##' @return No return value; the supported input types are
##' printed with message()
##' @examples available_msa()
##' @export
##' @author Lang Zhou
available_msa <- function(){
    message("1.files currently available:")
    message(".fasta",'\n')
    message("2.XStringSet objects from 'Biostrings' package:")
    mes <- paste(supported_msa_class[!grepl("bin", supported_msa_class)],
                 collapse = ' ')
    message(mes, '\n')
    message("3.bin objects:")
    mes_bin <- paste(supported_msa_class[grepl("bin", supported_msa_class)],
                     collapse = ' ')
    message(mes_bin, '\n')
}


================================================
FILE: R/clustal.R
================================================
##' The Clustal color scheme. The algorithm that assigns colors
##' to a multiple sequence alignment.
##'
##' @param y sequence alignment as a data frame, generated by tidy_msa().
##' @keywords clustal
##' @noRd
color_Clustal <- function(y) {
    char_freq <- lapply(split(y, y$position), function(x) table(x$character))
    col_convert <- lapply(char_freq, function(seq_column) {
        ##The white as the background
        clustal <- rep("#ffffff", length(seq_column))
        names(clustal) <- names(seq_column)
        r <- seq_column/sum(seq_column)
        for (pos in seq_along(seq_column)) {
            char <- names(seq_column)[pos]
            i <- grep(char, scheme_clustal$re_position)
            for (j in i) {
                if (scheme_clustal$type[j] == "combined"){
                    rr <- sum(r[strsplit(scheme_clustal$re_gp[j], '')[[1]]], na.rm = TRUE)
                    if (rr > scheme_clustal$thred[j]) {
                        clustal[pos] <- scheme_clustal$colour[j]}
                } else{
                    rr1<-r[strsplit(scheme_clustal$re_gp[j], ',')[[1]]]
                    if (any(rr1> scheme_clustal$thred[j],na.rm = TRUE) ) {
                        clustal[pos] <- scheme_clustal$colour[j]}
                }
                break
            }
        }
        return(clustal)
    })
    yy <- split(y, y$position)
    lapply(names(yy), function(n) {
        d <- yy[[n]]
        col <- col_convert[[n]]
        d$color <- col[d$character]
        return(d)
    }) %>% do.call('rbind', .)
}


================================================
FILE: R/color_by_conservation.R
================================================
color_increment <- function(conservation_visibility){
    lapply(seq_len(nrow(conservation_visibility)), function(i){
        color_ramp <- colorRampPalette(colors = c(conservation_visibility[i,"color"], "#ffffff"))
        color_change <- rev(color_ramp(100))[conservation_visibility[i,"visibility"]]
        return(color_change)
    }) %>% unlist
}

color_visibility <- function(y){
    #options(digits = 2)
    #on.exit()
    conser_data <- bar_data(y)
    #NOTE: %>% binds tighter than /, so round(2) is applied to the
    #number of levels only, not to the ratio
    conser_data$visibility <- conser_data$Freq / length(levels(y[[1]])) %>% round(2)
    conser_data$visibility <- conser_data$visibility * 100
    names(conser_data)[3] <- "position"
    y_filter <- y[c(-1,-3)]
    conser_ready <- merge(conser_data, y_filter)
    y$color <- color_increment(conser_ready)
    return(y)
}


================================================
FILE: R/color_else.R
================================================
##' Assigning colors to sequence alignment.
##'
##'
##' @param y sequence alignment as a data frame, generated by tidy_msa().
##' @param color a Color scheme. One of 'Clustal', 'Chemistry_AA',
##' 'Shapely_AA', 'Zappo_AA', 'Taylor_AA', 'LETTER', 'CN6', 'Chemistry_NT',
##' 'Shapely_NT', 'Zappo_NT', 'Taylor_NT'. Default is 'Chemistry_AA'.
##' @param custom_color A data frame with two columns called "names"
##' and "color". Customizes the color scheme.
##' @noRd
color_scheme <- function(y, color = "Chemistry_AA", custom_color = NULL) {
    if (!is.null(custom_color)){
        #Elimination factor interference
        custom_color[["names"]] <- as.character(custom_color[["names"]])
        #Fuzzy matching the string "colors" or "colours"
        custom_color[["color"]] <- as.character(custom_color$col)
        row.names(custom_color) <- custom_color[["names"]]
        scheme_AA$custom_color <- custom_color[row.names(scheme_AA), "color"] %>%
            as.character()
        y$color <- scheme_AA[y$character, "custom_color"]
    }else{
        if(grepl("NT", color)){
            y$color <- scheme_NT[y$character, color]
        } else{
            y$color <- scheme_AA[y$character, color]
        }
    }
    return(y)
}


================================================
FILE: R/cons.R
================================================
##' cleaning the needless sequences' color according to the
##' consensus sequence (only used in the consensus views).
##'
##' @param y a data frame, sequence alignment with specified color.
##' @param consensus the consensus sequence which can be called by
##' get_consensus().
##' @param disagreement a logical value. Displays characters that
##' disagree with the consensus (excludes ambiguous disagreements).
##' @param ref a character string. Specifying the reference sequence
##' which should be one of input sequences when 'consensus_views' is TRUE.
##' @keywords tidy_color
##' @noRd
tidy_color <- function(y, consensus, disagreement, ref) {
    c <- lapply(unique(y$position), function(i) {
        msa_cloumn <- y[y$position == i, ]
        if(!is.null(ref)) {
            if ('label' %in% names(msa_cloumn)) { ##work for ggtreeExtra
                msa_cloumn <- msa_cloumn[!msa_cloumn$label == ref, ]
            }else{
                msa_cloumn <- msa_cloumn[!msa_cloumn$name == ref, ]
            }
        }
        #Get consensus char.
        cons_char <- consensus[consensus$position == i, "character"]
        #Compare the characters of the current position(i)
        #to the consensus char.
        logic <- msa_cloumn$character == cons_char
        #Cleaning colors according to the 'logic'.
        if(cons_char == "X") {
            msa_cloumn$color <- NA
        }
        if(disagreement){
            msa_cloumn[logic, "color"] <- NA
        }else{
            msa_cloumn[!logic, "color"] <- NA
        }
        msa_cloumn
    }) %>% do.call("rbind", .)
    return(c)
}

##' calling the consensus sequence.
##'
##' @param tidy sequence alignment as a data frame, generated by tidy_msa().
##' @param ignore_gaps a logical value. When selected TRUE, gaps in
##' column are treated as if that row didn't exist.
##' @param ref a character string. Specifying the reference sequence
##' which should be one of input sequences when 'consensus_views' is TRUE.
##' @keywords get_consensus
##' @noRd
get_consensus <- function(tidy, ignore_gaps = FALSE, ref = NULL) {
    if(!is.null(ref)) {
        if(ignore_gaps) {
            warning("The argument 'ignore_gaps' is invalid when 'ref' is specified!")
        }
        if ('label' %in% names(tidy)) { ##work for ggtreeExtra
            ref <- match.arg(ref, levels(factor(tidy$label)))
            cons <- tidy[tidy$label == ref,]
        }else {
            ref <- match.arg(ref, levels(tidy$name))
            cons <- tidy[tidy$name == ref,]
        }
        return(cons)
    }
    #Iterate through each column
    cons <- lapply(unique(tidy$position), function(i) {
        msa_cloumn <- tidy[tidy$position == i, ]
        cons <- data.frame(position = i)
        if(ignore_gaps) {
            msa_cloumn <- msa_cloumn[!msa_cloumn$character %in% "-",]
        }
        #Gets the highest frequency characters
        fre <- table(msa_cloumn$character) %>% data.frame
        max_element <- fre[fre[2] == max(fre[2]),]
        max_number <- max_element %>% nrow
        if(max_number == 1) {
            cons$character <- max_element[1,1]
        }else {
            cons$character <- "X"
        }
        cons
    }) %>% do.call("rbind", .)
    cons$name = "Consensus"
    cons$character <- as.character(cons$character) #debug 'as.character'
    return(cons)
}

order_name <- function(name, order = NULL, consensus_views = FALSE, ref = NULL) {
    name_uni <- unique(name)
    if(is.null(ref)){
        #placed 'consensus' at the top
        name_expect <- name_uni[!name_uni %in% "Consensus"] %>% rev %>% as.character
        name <- factor(name, levels = c(name_expect, "Consensus"))
    }else {
        name_expect <- name_uni[!name_uni %in% ref] %>% rev %>% as.character
        name <- factor(name, levels = c(name_expect, ref))
    }
    return(name)
}


================================================
FILE: R/data.R
================================================
#' Sample data used in ggmsa
#'
#' A dataset containing the alignment sequences of
#' the phenylalanine hydroxylase protein (PH4H)
#' within nine species
#'
#'
#' @docType data
#' @keywords datasets
#' @name sample.fasta
#' @format A MSA fasta with 9 sequences and 456 positions.
NULL

#' GVariation
#'
#' A folder containing 4 MSA files as a sample
#' data set to identify the sequence recombination event.
#'
#' \itemize{
#'   \item A.Mont.fas MSA with sequences of 'Mont' and 'CF_YL21'
#'   \item B.Oz.fas MSA with sequences of 'Oz' and 'CF_YL21'
#'   \item C.Wilga5.fas MSA with sequences of 'Wilga5' and 'CF_YL21'
#'   \item sample_alignment.fa MSA with sequences of 'Mont', 'CF_YL21',
#'   'Oz', and 'Wilga5'
#' }
#' @docType data
#' @keywords datasets
#' @name GVariation
#' @format a folder
#' @source \url{https://link.springer.com/article/10.1007/s11540-015-9307-3}
NULL

#' Rfam
#'
#' A folder containing seed alignment sequences and
#' corresponding consensus RNA secondary structure.
#'
#' \itemize{
#'   \item RF00458.fasta seed alignment sequences of Cripavirus internal
#'   ribosome entry site (IRES)
#'   \item RF03120.fasta seed alignment sequences of Sarbecovirus 5'UTR
#'   \item RF03120_SS.txt consensus RNA secondary structure of
#'   Sarbecovirus 5'UTR
#'
#' }
#' @docType data
#' @keywords datasets
#' @name Rfam
#' @format a folder
#' @source \url{https://rfam.xfam.org/}
NULL

#' Gram-negative_AKL
#'
#' Amino acids in the adenylate kinase lid (AKL) domain
#' from Gram-negative bacteria.
#'
#' @docType data
#' @keywords datasets
#' @name Gram-negative_AKL.fasta
#' @format A MSA fasta with 100 sequences and 36 positions.
#' @source \url{http://biovis.net/year/2013/info/redesign-contest}
NULL

#' Gram-positive_AKL
#'
#' Amino acids in the adenylate kinase lid (AKL) domain
#' from Gram-positive bacteria.
#'
#' @docType data
#' @keywords datasets
#' @name Gram-positive_AKL.fasta
#' @format A MSA fasta with 100 sequences and 36 positions.
#' @source \url{http://biovis.net/year/2013/info/redesign-contest}
NULL

#' Sample DNA alignment sequences
#'
#' DNA alignment sequences with 24 sequences and 56 positions.
#'
#'
#' @docType data
#' @keywords datasets
#' @name LeaderRepeat_All.fa
#' @format A MSA fasta
NULL

#' microRNA data used in ggmsa
#'
#' Fasta format sequences of mature miRNA sequences
#' from miRBase
#'
#'
#' @docType data
#' @keywords datasets
#' @name seedSample.fa
#' @format A MSA fasta with 6 sequences and 22 positions.
#' @source \url{https://www.mirbase.org/ftp.shtml}
NULL

#' sequence-link-tree
#'
#' Alignment sequences used to demonstrate circular MSA layout
#'
#' @docType data
#' @keywords datasets
#' @name sequence-link-tree.fasta
#' @format A MSA fasta with 28 sequences and 480 positions.
NULL

#' TP53 MSA
#'
#' Alignment sequences used to show graphical combination
#'
#' @docType data
#' @keywords datasets
#' @name tp53.fa
#' @format A MSA fasta with 5 sequences and 404 positions.
NULL

#' genome locus
#'
#' The local genome map shows the 30000 sites around the TP53 gene.
#'
#' @docType data
#' @keywords datasets
#' @name TP53_genes.xlsx
#' @format xlsx
NULL


================================================
FILE: R/dms.R
================================================
##' assign dms value to alignments.
##'
##' @title assign_dms
##' @param x data frame from tidy_msa()
##' @param dms dms data frame
##' @return a data frame: 'x' with the added columns 'mutation' and 'bind_avg'
##' @export
##' @author Lang Zhou
assign_dms <- function(x, dms) {
    dms_value <- lapply(unique(x$position), function(i) {
        xx <- x[x$position == i,]
        dmss <- dms[dms$site_RBD == i,]
        wt <- unique(dmss[,"wildtype"])
        xx$mutation <- paste0(wt, xx$position, xx$character)
        xx$bind_avg <- lapply(seq_along(xx$mutation),function(j) {
            bind_avg <- dmss[dmss$mutation_RBD %in% xx[j,"mutation"],"bind_avg"]
            return(bind_avg)
        }) %>% unlist
        return(xx)
    }) %>% do.call("rbind",.)
    return(dms_value )
}


================================================
FILE: R/facet_msa.R
================================================
##' The MSA is split into panels (fields) of the size that you set.
##' @title segment MSA
##' @param field a numeric vector of the field size.
##' @return ggplot layers
##' @examples
##' library(ggplot2)
##' f <- system.file("extdata/sample.fasta", package="ggmsa")
##' # 2 fields
##' ggmsa(f, end = 120, font = NULL, color="Chemistry_AA") +
##'     facet_msa(field = 60)
##' # 3 fields
##' ggmsa(f, end = 120, font = NULL, color="Chemistry_AA") +
##'     facet_msa(field = 40)
##' @export
##' @author Lang Zhou
facet_msa <- function(field) {
    structure(list(field = field),
              class = "facet_msa" )
}

facet_data <- function(msaData, field) {
    if(min(msaData$position) > 1){
        pos_reset <- msaData$position - min(msaData$position)
        pos_reset[pos_reset == 0] <- 1
    }else {
        pos_reset <- msaData$position
    }
    msaData$facet <- pos_reset %/% field
    msaData[(pos_reset %% field) == 0, "facet"] <-
        msaData[(pos_reset %% field) == 0, "facet"] - 1
    return(msaData)
}


================================================
FILE: R/geom_GC.R
================================================
##' Multiple sequence alignment layer for ggplot2. It plots points of GC content.
##' @title geom_GC
##' @param show.legend logical. Should this layer be included in the legends?
##' @return a ggplot layer
##' @examples
##' #plot GC content
##' f <- system.file("extdata/LeaderRepeat_All.fa", package="ggmsa")
##' ggmsa(f, font = NULL, color="Chemistry_NT") + geom_GC()
##' @export
##' @author Lang Zhou
geom_GC <- function(show.legend = FALSE) {
    structure(list(show.legend = show.legend), class = "GCcontent")
}

geom_GC1 <- function(tidyData, show.legend = FALSE){
    tidy <- tidyData
    #tidy <- tidy_msa(msa = msa, start = start, end = end)
    GC_pos <- getOption("GC_pos")
    GC <- content_GC(tidy)
    GC <-GC[GC$character == "GC",]
    col_num <- levels(factor(tidy$position))
    col_len <- length(col_num) + GC_pos
    ly_GC <- geom_point(data = GC,
                        mapping = aes_(x = ~col_len, y = ~ypos, size = ~fre),
                        color = "#51a6e9",
                        na.rm = TRUE,
                        show.legend = show.legend)
    return(ly_GC)
}

##' get GC content
##' @title content_GC
##' @param data Multiple aligned sequence files or objects
##' for representing nucleotide sequences
##' @return A data frame
##' @noRd
##' @author Lang Zhou
content_GC<- function(data){
    tidy <- data
    tidy$name <- factor(tidy$name, levels = unique(tidy$name))
    tidy$ypos <- as.numeric(tidy$name)
    seq_num <- unique(tidy$ypos)
    lchar_num <- lapply(seq_num, function(j){
        clo <- tidy[tidy$ypos == j, ]
        y <- prop.table(table(clo$character))
        y["GC"] <- y["G"] + y["C"]
        num <-setNames(rep(0,5), c("A", "T", "G", "C", "GC"))
        num[names(y)] <- y
        return(num)
    })
    char_num <- do.call(rbind,lchar_num)
    char_num <- as.data.frame(char_num)
    char_num["ypos"] = seq_num
    char_num2 <- gather(char_num,character,fre, "A", "T", "C","G","GC")
    return(char_num2)
}


================================================
FILE: R/geom_asterisk.R
================================================
##' a ggplot2 layer of asterisk as a polygon
##'
##'
##' @title a ggplot2 layer of asterisk as a polygon
##' @param mapping aes mapping
##' @param data a data frame
##' @param stat the statistical transformation to use on the data
##' for this layer, as a string.
##' @param position position adjustment, either as a string,
##' or the result of a call to a position adjustment function.
##' @param na.rm a logical value
##' @param show.legend a logical value
##' @param inherit.aes a logical value
##' @param ... additional parameters
##' @importFrom ggplot2 layer
##' @return ggplot2 layer
## @export
##' @noRd
##' @author Lang Zhou
##' @examples
##' #library(ggplot2)
##' #ggplot(mtcars, aes(mpg, disp)) + geom_asterisk()
geom_asterisk <- function(mapping = NULL, data = NULL, stat = "identity",
                          position = "identity", na.rm = FALSE,
                          show.legend = NA, inherit.aes = TRUE, ...) {
    layer(geom = Geomasterisk,
          mapping = mapping,
          data = data,
          stat = stat,
          position = position,
          show.legend = show.legend,
          inherit.aes = inherit.aes,
          params = list(na.rm = na.rm, ...))
}

##' @importFrom grid polygonGrob
##' @importFrom grid gpar
SeedStar <- function(x = NULL , y = NULL) {
    char_width <- getOption("asterisk_width")
    char_scale_2 <- getOption("char_scale_2")
    x_width <- char_scale_2 * diff(range(star$y))
    star$x = star$x * x_width/diff(range(star$x))
    char_scale <- diff(range(star$x))/diff(range(star$y))
    star$x = star$x * (char_width * char_scale)/diff(range(star$x))
    star$y = star$y * char_width/diff(range(star$y))
    star$x = star$x - min(star$x) - (char_width * char_scale)/2 + x
    star$y = star$y - min(star$y) - char_width/2 + y
    polygonGrob(star$x, star$y, gp = gpar(fill = "black") )
}

##' @importFrom ggplot2 ggproto
##' @importFrom ggplot2 Geom
##' @importFrom ggplot2 draw_key_polygon
##' @importFrom ggplot2 aes
##' @importFrom grid gTree
Geomasterisk <- ggproto("Geomasterisk", Geom,
                        required_aes = c("x", "y"),
                        default_aes = aes(fill = "black"),
                        draw_key = draw_key_polygon,
                        draw_panel = function(data, panel_params, coord) {
                            data <- coord$transform(data, panel_params)
                            grobs <- lapply(seq_len(nrow(data)), function(i) {
                                SeedStar(data$x[i], data$y[i])
                            })
                            class(grobs) <- "gList"
                            ggplot2:::ggname("geom_asterisk",
                                             gTree(children = grobs))
                        }
)
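
## Illustrative sketch, not part of ggmsa's API.
## The rescaling in SeedStar() above can be read as: fit a polygon into a box
## of a given height, give it a chosen width/height ratio, and centre it on
## (x, y). The helper below is a minimal base-R restatement of that idea.
## The name 'rescale_polygon_sketch' and all of its arguments are assumptions
## made for illustration; nothing in the package calls it.
rescale_polygon_sketch <- function(px, py, height, aspect = NULL,
                                   cx = 0, cy = 0) {
    ## by default keep the polygon's own width/height ratio
    if (is.null(aspect)) {
        aspect <- diff(range(px)) / diff(range(py))
    }
    width <- height * aspect
    ## map x onto [cx - width/2, cx + width/2]
    x_new <- (px - min(px)) / diff(range(px)) * width - width / 2 + cx
    ## map y onto [cy - height/2, cy + height/2]
    y_new <- (py - min(py)) / diff(range(py)) * height - height / 2 + cy
    list(x = x_new, y = y_new)
}
## Usage with an assumed toy input: a unit square fitted into a box of
## height 0.5 centred on (3, 2).
## sq <- rescale_polygon_sketch(c(0, 1, 1, 0), c(0, 0, 1, 1),
##                              height = 0.5, cx = 3, cy = 2)
## In SeedStar() the corresponding inputs would be 'asterisk_width' as the
## height and 'char_scale_2' as the aspect ratio.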

================================================
FILE: R/geom_msa.R
================================================
##' Multiple sequence alignment layer for ggplot2.
##' It creates background tiles with/without sequence characters.
##'
##' @title geom_msa
##' @param data sequence alignment as a data frame, generated by tidy_msa().
##' @param font font families, possible values are 'helvetical', 'mono',
##' 'DroidSansMono' and 'TimesNewRoman'. Default is 'helvetical'.
##' If font = NULL, only plot the background tile.
##' @param mapping aes mapping
##' @param color A Color scheme. One of 'Clustal', 'Chemistry_AA', 'Shapely_AA',
##' 'Zappo_AA', 'Taylor_AA', 'LETTER', 'CN6', 'Chemistry_NT', 'Shapely_NT',
##' 'Zappo_NT', 'Taylor_NT'. Default is 'Chemistry_AA'.
##' @param custom_color A data frame with two columns called "names" and
##' "color". Customizes the color scheme.
##' @param char_width a numeric vector. Specifying the character width in
##' the range of 0 to 1. Default is 0.9.
##' @param by_conservation a logical value. The most conserved regions have
##' the brightest colors.
##' @param none_bg a logical value indicating whether the background
##' tiles should be hidden. Default is FALSE.
##' @param position_highlight A numeric vector of the positions that
##' need to be highlighted.
##' @param seq_name a logical value indicating whether sequence names
##' should be displayed. The default is 'NULL', which means that sequence
##' names are displayed when 'font = NULL' but not when a font is set
##' ('font = char'). If 'seq_name = TRUE' the sequence name will
##' be displayed in any case. If 'seq_name = FALSE' the sequence name will not
##' be displayed under any circumstances.
##' @param border a character string. The border color.
##' @param consensus_views a logical value indicating whether to enable
##' consensus views.
##' @param use_dot a logical value. Displays characters as dots instead of
##' fading their color in the consensus view.
##' @param disagreement a logical value.
##' Displays characters that disagree
##' with the consensus (excludes ambiguous disagreements).
##' @param ignore_gaps a logical value. When selected TRUE,
##' gaps in column are treated as if that row didn't exist.
##' @param ref a character string. Specifying the reference sequence
##' which should be one of input sequences when 'consensus_views' is TRUE.
##' @param position Position adjustment, either as a string, or
##' the result of a call to a position adjustment function,
##' default is 'identity' meaning 'position_identity()'.
##' @param show.legend logical. Should this layer be included in the legends?
##' @param dms logical.
##' @param position_color logical.
##' @param ... additional parameter
##' @return A list
##' @importFrom ggplot2 scale_fill_manual
##' @importFrom utils modifyList
##' @export
##' @examples
##' library(ggplot2)
##' aln <- system.file("extdata", "sample.fasta", package = "ggmsa")
##' tidy_aln <- tidy_msa(aln, start = 150, end = 170)
##' ggplot() + geom_msa(data = tidy_aln, font = NULL) + coord_fixed()
##' @author Guangchuang Yu, Lang Zhou
geom_msa <- function(data,
                     font = "helvetical",
                     mapping = NULL,
                     color = "Chemistry_AA",
                     custom_color = NULL,
                     char_width = 0.9,
                     none_bg = FALSE,
                     by_conservation = FALSE,
                     position_highlight = NULL,
                     seq_name = NULL,
                     border = NULL,
                     consensus_views = FALSE,
                     use_dot = FALSE,
                     disagreement = TRUE,
                     ignore_gaps = FALSE,
                     ref = NULL,
                     position = "identity",
                     show.legend = FALSE,
                     dms = FALSE,
                     position_color = FALSE,
                     ...
                     ) {
    data <- msa_data(data,
                     font = font,
                     color = color,
                     custom_color = custom_color,
                     char_width = char_width,
                     by_conservation = by_conservation,
                     consensus_views = consensus_views,
                     use_dot = use_dot,
                     disagreement = disagreement,
                     ignore_gaps = ignore_gaps,
                     ref = ref)

    #legend work
    xx <- data[,c("character","color")] %>% unique()
    xx <- xx[!is.na(xx$color),]
    labs <- lapply(unique(xx$color) %>% seq_along, function(i) {
        cols <- unique(xx$color)[i]
        dup_char <- xx[xx$color == cols, "character"]
        lab <- paste0(dup_char, collapse = ",")
    }) %>% do.call("rbind",.) %>% as.vector()
    cols <- xx$color %>% unique()
    names(cols) <- cols
    sacle_tile_cols <- scale_fill_manual(values = cols,
                                         breaks = cols,
                                         labels = labs)

    bg_data <- data
    #work to ggtreeExtra
    if (is.null(mapping)) {
        mapping <- aes_(x = ~position, y = ~name, fill = ~I(color))
    }

    #dms color work
    if (dms) {
        mapping <- modifyList(mapping, aes_(fill = ~bind_avg))
    }

    if (position_color) {
        mapping <- modifyList(mapping, aes_(fill = ~I(pos_color)))
    }

    #'seq_name' work
    if (!isTRUE(seq_name)) {
        if ('y' %in% colnames(data) || isFALSE(seq_name) ) {
            y <- as.numeric(bg_data$name)
            mapping <- modifyList(mapping, aes_(y = ~y)) #"~y" is seq numbers
        }
    }

    #'position_highlight' work
    if (!is.null(position_highlight)) {
        none_bg = TRUE
        bg_data <- bg_data[bg_data$position %in% position_highlight,]
        bg_data$postion <- as.factor(bg_data$position)
        mapping <- modifyList(mapping, aes_(x = ~position,
                                            fill = ~color,
                                            width = 1))
    }

    #'border' work
    if(is.null(border)){
        ly_bg <- geom_tile(mapping = mapping,
                           data = bg_data,
                           color = 'grey',
                           inherit.aes = FALSE,
                           position = position,
                           show.legend = show.legend)
    }else{
        ly_bg <- geom_tile(mapping = mapping,
                           data = bg_data,
                           color = border,
                           inherit.aes = FALSE,
                           position = position,
                           show.legend = show.legend)
    }

    if (!all(c("yy", "order", "group") %in% colnames(data))) {
        if(position_color) {
            return(list(ly_bg))
        }else{
            return(list(ly_bg, sacle_tile_cols))
        }
    }

    if ('y' %in% colnames(data)) {
        data$yy = data$yy - as.numeric(data$name) + data$y
    }

    label_mapping <- aes_(x = ~x, y = ~yy, group = ~group)

    # use_dot work
    if (consensus_views && !use_dot) {
        if(show.legend) {
            stop("legends can't be shown in the consensus view!")
        }
        label_mapping <- modifyList(label_mapping, aes_(fill = ~I(font_color)))
    }

    ly_label <- geom_polygon(mapping = label_mapping,
                             data = data,
                             inherit.aes = FALSE,
                             position = position)

    #'none_bg' work
    if (none_bg && is.null(position_highlight)) {
        return(ly_label)
    }

    if(consensus_views) {
        return(list(ly_bg, ly_label))
    }else {
        if(position_color){
            return(list(ly_bg, ly_label))
        }else{
            return(list(ly_bg, ly_label, sacle_tile_cols))
        }
    }
}


================================================
FILE: R/geom_msaBar.R
================================================
##' Multiple sequence alignment layer for ggplot2.
##' It plots a sequence conservation bar.
##' @title geom_msaBar
##' @return A list
##' @examples
##' #plot multiple sequence alignment and conservation bar.
##' f <- system.file("extdata/sample.fasta", package="ggmsa")
##' ggmsa(f, 221, 280, font = NULL, seq_name = TRUE) + geom_msaBar()
##' @export
##' @author Lang Zhou
geom_msaBar <- function() {
    structure(list(), class = "msaBar")
}

##' @importFrom ggplot2 geom_col
ly_bar <- function(tidy){
    data <- bar_data(tidy)
    mapping <- aes_(x = ~pos, y = ~Freq, fill = ~Freq)
    ly_bar <- geom_col(data = data,
                       mapping = mapping,
                       width = 1,
                       show.legend = FALSE)
    return(ly_bar)
}

##' get bar data
##' @title bar_data
##' @param tidy Multiple aligned sequence files or
##' object for representing nucleotide sequences
##' @return A data frame
##' @noRd
##' @author Lang Zhou
bar_data <- function(tidy){
    character_position <- unique(tidy$position)
    conservation_score <- lapply(character_position, function(j) {
        cloumn_data <- tidy[tidy$position == j, ]
        character_frequency <- table(cloumn_data$character) %>% as.data.frame
        max_frequency <- character_frequency[character_frequency[2] == max(character_frequency[2]),]
        max_frequency$Var1 <- as.character(max_frequency$Var1)
        #on ties, keep the first of the most frequent characters
        max_frequency[1,]
    }) %>% do.call("rbind", .)
    conservation_score["pos"] <- character_position
    return(conservation_score)
}


================================================
FILE: R/geom_seed.R
================================================
##' Highlighting the seed in miRNA sequences
##'
##'
##' @title geom_seed
##' @param seed a character string. Specifying the miRNA seed sequence
##' like 'GAGGUAG'.
##' @param star a logical value indicating whether asterisks should
##' be displayed.
##' @return a ggplot layer
##' @author Lang Zhou
##' @examples
##' miRNA_sequences <- system.file("extdata/seedSample.fa", package="ggmsa")
##' ggmsa(miRNA_sequences, font = 'DroidSansMono',
##'       color = "Chemistry_NT", none_bg = TRUE) +
##'     geom_seed(seed = "GAGGUAG", star = FALSE)
##' ggmsa(miRNA_sequences, font = 'DroidSansMono',
##'       color = "Chemistry_NT") +
##'     geom_seed(seed = "GAGGUAG", star = TRUE)
##' @export
geom_seed <- function(seed, star = FALSE) {
    structure(list(seed = seed, star = star), class = "seed")
}

geom_seed1 <- function(tidyData, seed, star) {
    get_asteriskScale(tidyData)
    tidyData$y <- as.numeric(tidyData$name)
    seq_first <- tidyData[tidyData$y == 1,]
    char <- seq_first$character
    char <- paste(char, collapse = "")
    seedPos <- regexpr(seed,char)
    #locate <- str_locate(char, seed)
    #df_locate <- as.data.frame(locate)
    #seedPos <- df_locate$start # start position of seed region
    seedLen <- nchar(seed) # length of seed region
    numSeq <- max(tidyData$y) # number of sequences
    shadingLen <- getOption("shadingLen") #shading width
    shading_alpha <- getOption("shading_alpha")
    x <- seedPos - .5 #the x coordinate of the lower left corner
    y <- 1 - .5 - shadingLen #the y coordinate of the lower left corner
    yy <- numSeq + .5 + shadingLen # #the y coordinate of the top right corner
    xx <- x + seedLen #the x coordinate of the top right corner
    shadingData <- data.frame(x = c(x, x, xx, xx), y = c(y,
                                                         yy, yy, y),
                              t = c('a', 'a', 'a','a'))
    starData <- data.frame(star_x = seq(seedPos, length.out = nchar(seed)),
                           star_y = rep(y, times = nchar(seed)))
    if(isTRUE(star)) {
        ly_star <- geom_asterisk(data = starData,
                                 aes_(x = ~star_x, y = ~star_y))
        return(ly_star)
    }
    mapping <- aes_(x= ~x, y= ~y, group= ~t, fill = ~I('#bebebe'))
    ly_seed <- geom_polygon(data = shadingData,
                            mapping = mapping,
                            alpha = shading_alpha)
    return(ly_seed)
}

get_asteriskScale <- function(tidyData) {
    m <- max(tidyData$position)
    seq_name <- factor(tidyData$name, levels = unique(tidyData$name))
    n <- max(as.numeric(seq_name))
    char_scale <- diff(range(star$x))/diff(range(star$y))
    char_scale_2 <- char_scale * 3/2 * n/m
    return(options("char_scale_2" = char_scale_2))
}


================================================
FILE: R/ggmaf.R
================================================
##' plot MAF
##'
##' @title ggmaf
##' @param data a tidy MAF data frame. You can get it with tidy_maf_df()
##' @param ref character, the name of reference genome.
##' eg:"hg38.chr1_KI270707v1_random"
##' @param block_start a numeric vector(>0). The start block to plot.
##' @param block_end a numeric vector(< max block). The end block to plot.
##' @param facet_field a numeric vector. The field in a facet panel.
##' @param heights a numeric vector of length two. The plot proportion between
##' the "Genomic location" panel (upper) and the "Alignment" panel (lower).
##' Default: c(0.4,0.6)
##' @param facet_heights a numeric vector. The facet proportions.
##' @return ggplot object
##' @export
##' @author Lang Zhou
ggmaf <- function(data,
                  ref,
                  block_start = NULL,
                  block_end = NULL,
                  facet_field = NULL,
                  heights = c(0.4,0.6),
                  facet_heights = NULL) {
    #fall back to the full block range when no limits are given;
    #'NULL : NULL' would otherwise raise an error
    if (is.null(block_start)) block_start <- min(data$block_number)
    if (is.null(block_end)) block_end <- max(data$block_number)
    d <- data[data$block_number %in% c(block_start : block_end),]
    if(is.null(facet_field)) {
        maf_p <- maf_plot(d = d, ref = ref)
        p <- plot_list(gglist = maf_p, heights = heights)
        return(p)
    }else {
        d <- facet_maf(mafData = d, field = facet_field)
        p_ls <- lapply(unique(d$facet), function(i) {
            facet_d <- d[d$facet == i,]
            maf_p <- maf_plot(d = facet_d, ref = ref)
            pp <- plot_list(gglist = maf_p, heights = heights)
            return(pp)
        })
        p <- plot_list(gglist = p_ls, ncol = 1, heights = facet_heights)
        return(p)
    }
}

##' tidy MAF data frame
##'
##' @title tidy_maf_df
##' @param maf_df a MAF data frame. You can get it with read_maf()
##' @param ref character, the name of reference genome.
##' eg:"hg38.chr1_KI270707v1_random"
##' @return data frame
##' @export
##' @author Lang Zhou
tidy_maf_df <- function(maf_df,ref) {
    ##add ref position to other genome
    block_num <- unique(maf_df$block)
    tidy_df <- lapply(block_num, function(i) {
        x <- maf_df[maf_df$block == i,]
        x$ref_start <- x[x$src == ref, "start"]
        x$ref_end <- x[x$src == ref, "end_gap"]
        return(x)
    })%>% do.call("rbind", .)
    tidy_df$block_number <- factor(tidy_df$block,
                                   levels = unique(tidy_df$block)) %>% as.numeric
    tidy_df$bs <- paste0(tidy_df$src,"-",tidy_df$block)
    tidy_df$merge_y <- factor(tidy_df$src) %>% as.numeric
    tidy_df$label <- paste0("B",tidy_df$block_number)
    tidy_df <- order_aln(tidy_df,ref)
    return(tidy_df)
}

#put the ref sequence first in each block, new col "y"
order_aln <- function(tidy_df, ref) {
    block_num <- unique(tidy_df$block)
    lev <- sapply(block_num, function(i) {
        x <- tidy_df[tidy_df$block == i,]
        order <- c(ref, x$src[!x$src %in% ref])
        lev <- paste0(order, "-",x$block)
        return(lev)
    })%>% unlist %>% rev
    tidy_df$y <- factor(tidy_df$bs,levels = lev) %>% as.numeric
    return(tidy_df)
}

##' @importFrom utils getFromNamespace
##' @importFrom ggplot2 aes_
##' @importFrom ggplot2 geom_text
maf_plot <- function(d,
                     ref,
                     positive_color = "#a9c9d4",
                     negative_color = "#ffa389") {
    geom_rrect <- getFromNamespace("geom_rrect","statebins")
    ##plot the lower panel
    p_maf_aln <- ggplot(data = d) +
        geom_rrect(mapping=aes_(xmin =~ ref_start,
                                xmax =~ ref_end,
                                ymin =~ y - 0.3,
                                ymax =~ y + 0.3,
                                fill =~ strand)) +
        geom_rrect(data = d,
                   mapping=aes_(xmin =~ ref_start,
                                xmax =~ ref_end,
                                ymin =~ max(y) + 1 - 0.3,
                                ymax =~ max(y) + 1 + 0.3),
                   fill = "#a9c9d4",color = "black") +
        scale_y_continuous(breaks = c(d$y,max(d$y + 1)),labels = c(d$bs, ref)) +
        scale_fill_manual(breaks = c("+","-"),
                          values = c(positive_color,negative_color)) +
        theme_void() +
        theme(axis.text.x = element_text(),
              axis.text.y = element_text(),
              panel.grid.minor.y = element_blank(),
              panel.grid.major.y = element_line(color = "grey"))

    ##plot the upper panel
    aim <- d[d$src != ref, ]
    p_maf_genomePos <- ggplot(data = aim) +
        geom_rrect(mapping = aes_(xmin =~ start,
                                  xmax =~ end_gap,
                                  ymin =~ merge_y - 0.3,
                                  ymax =~ merge_y + 0.3,
                                  fill =~ strand),
                   color = "black", size = 0.5, alpha = 0.8,
                   show.legend = FALSE) +
        scale_y_continuous(breaks = unique(aim$merge_y),
                           labels = unique(aim$src)) +
        scale_fill_manual(breaks = c("+","-"),
                          values =
                              c(positive_color,negative_color)) +
        theme_void() +
        theme(panel.grid.major.y = element_line(color = "grey"),
              axis.text.x = element_text(),
              axis.text.y = element_text(),
              strip.text = element_blank()) +
        geom_text(aes_(x =~ (start + end_gap)/2,
                       y =~ merge_y,label =~ label), size = 3) +
        facet_wrap(~src, scales = "free", ncol = 1)

    return(list(p_maf_genomePos, p_maf_aln))
}

#assign facet number to blocks
facet_maf <- function(mafData, field) {
    if(min(mafData$block_number) > 1){
        pos_reset <- mafData$block_number - min(mafData$block_number) + 1
        #pos_reset[pos_reset == 0] <- 1
    }else {
        pos_reset <- mafData$block_number
    }
    mafData$facet <- pos_reset %/% field
    mafData[(pos_reset %% field) == 0, "facet"] <-
        mafData[(pos_reset %% field) == 0, "facet"] - 1
    return(mafData)
}


================================================
FILE: R/ggmsa.R
================================================
##' Plot multiple sequence alignment using ggplot2 with multiple color schemes
##' supported.
##'
##'
##' @title ggmsa
##' @param msa Multiple aligned sequence files or objects representing either
##' nucleotide sequences or AA sequences.
##' @param start a numeric vector. Start position to plot.
##' @param end a numeric vector. End position to plot.
##' @param font font families, possible values are 'helvetical', 'mono',
##' 'DroidSansMono' and 'TimesNewRoman'. Default is 'helvetical'.
##' If font = NULL, only plot the background tile.
##' @param color a Color scheme. One of 'Clustal', 'Chemistry_AA',
##' 'Shapely_AA', 'Zappo_AA', 'Taylor_AA', 'LETTER', 'CN6', 'Chemistry_NT',
##' 'Shapely_NT', 'Zappo_NT', 'Taylor_NT'. Default is 'Chemistry_AA'.
##' @param custom_color A data frame with two columns called "names" and
##' "color". Customizes the color scheme.
##' @param char_width a numeric vector. Specifying the character width in
##' the range of 0 to 1. Default is 0.9.
##' @param by_conservation a logical value. The most conserved regions have
##' the brightest colors.
##' @param none_bg a logical value. If TRUE, the background tiles are not
##' drawn and only the characters are shown. Default is FALSE.
##' @param position_highlight A numeric vector of the positions that need to be
##' highlighted.
##' @param seq_name a logical value indicating whether sequence names
##' should be displayed. Default is 'NULL', which means that the
##' sequence names are displayed when 'font = NULL' but not when a
##' font is specified. If 'seq_name = TRUE' the sequence names are
##' always displayed. If 'seq_name = FALSE' the sequence names are
##' never displayed.
##' @param border a character string. The border color.
##' @param consensus_views a logical value that enables consensus views.
##' @param use_dot a logical value. Displays characters as dots instead
##' of fading their color in the consensus view.
##' @param disagreement a logical value. Displays characters that
##' disagree with the consensus (excludes ambiguous disagreements).
##' @param ignore_gaps a logical value. When TRUE, gaps in a column
##' are treated as if that row didn't exist.
##' @param ref a character string. Specifying the reference sequence, which
##' should be one of the input sequences when 'consensus_views' is TRUE.
##' @param show.legend logical. Should this layer be included in the legends?
##' @return ggplot object ##' @importFrom tidyr gather ##' @importFrom ggplot2 ggplot ##' @importFrom ggplot2 aes_ ##' @importFrom ggplot2 theme ##' @importFrom ggplot2 theme_minimal ##' @importFrom ggplot2 geom_tile ##' @importFrom ggplot2 geom_polygon ##' @importFrom ggplot2 xlab ##' @importFrom ggplot2 ylab ##' @importFrom ggplot2 coord_fixed ##' @importFrom ggplot2 geom_point ##' @importFrom ggplot2 element_blank ##' @importFrom magrittr %>% ##' @importFrom stats setNames ##' @importFrom grid unit ##' @examples ##' #plot multiple sequences by loading fasta format ##' fasta <- system.file("extdata", "sample.fasta", package = "ggmsa") ##' ggmsa(fasta, 164, 213, color="Chemistry_AA") ##' ##'\dontrun{ ##' #XMultipleAlignment objects can be used as input in the 'ggmsa' ##' AAMultipleAlignment <- Biostrings::readAAMultipleAlignment(fasta) ##' ggmsa(AAMultipleAlignment, 164, 213, color="Chemistry_AA") ##' ##' #XStringSet objects can be used as input in the 'ggmsa' ##' AAStringSet <- Biostrings::readAAStringSet(fasta) ##' ggmsa(AAStringSet, 164, 213, color="Chemistry_AA") ##' ##' #Xbin objects from 'seqmagick' can be used as input in the 'ggmsa' ##' AAbin <- seqmagick::fa_read(fasta) ##' ggmsa(AAbin, 164, 213, color="Chemistry_AA") ##' } ##' @export ##' @author Guangchuang Yu ggmsa <- function(msa, start = NULL, end = NULL, font = "helvetical", color = "Chemistry_AA", custom_color = NULL, char_width = 0.9, none_bg = FALSE, by_conservation = FALSE, position_highlight = NULL, seq_name = NULL, border = NULL, consensus_views = FALSE, use_dot = FALSE, disagreement = TRUE, ignore_gaps = FALSE, ref = NULL, show.legend = FALSE) { data <- tidy_msa(msa, start = start, end = end) ggplot() + geom_msa(data, font = font, color = color, custom_color = custom_color, char_width = char_width, none_bg = none_bg, by_conservation = by_conservation, position_highlight = position_highlight, seq_name = seq_name, border = border, consensus_views = consensus_views, use_dot = use_dot, 
disagreement = disagreement, ignore_gaps = ignore_gaps, ref = ref, show.legend = show.legend) + theme_msa() } ================================================ FILE: R/import-functions.R ================================================ ##' @importFrom utils globalVariables globalVariables(".") globalVariables("fre") #geom_GC.R: globalVariables("read.delim") #arc.R globalVariables(c("name", "position_adj", "y_adj")) #SeqBundles.R ================================================ FILE: R/method-plot.R ================================================ ##' plot method for SeqDiff object ##' ##' @name plot ##' @rdname plot-methods ##' @exportMethod plot ##' @aliases plot,SeqDiff,ANY-method ##' @docType methods ##' @param x SeqDiff object ##' @param width bin width ##' @param title plot title ##' @param xlab xlab ##' @param by one of 'bar' and 'area' ##' @param fill fill color of upper part of the plot ##' @param colors color of lower part of the plot ##' @param xlim limits of x-axis ##' @return plot ##' @importFrom ggplot2 ggtitle ##' @importFrom ggplot2 xlim ##' @importFrom ggplot2 ggplot_gtable ##' @importFrom ggplot2 ggplot_build ##' @importFrom grid unit.pmax ##' @importFrom aplot plot_list ##' @author guangchuang yu ##' @examples ##' fas <- list.files(system.file("extdata", "GVariation", package="ggmsa"), ##' pattern="fas", full.names=TRUE) ##' x1 <- seqdiff(fas[1], reference=1) ##' plot(x1) setMethod("plot", signature(x="SeqDiff"), function(x, width=50, title="auto", xlab = "Nucleotide Position", by="bar", fill="firebrick", colors=c(A="#ff6d6d", C="#769dcc", G="#f2be3c", T="#74ce98"), xlim = NULL) { nn <- names(x@sequence) if (is.null(title) || is.na(title)) { title <- "" } else if (title == "auto") { title <- paste(nn[-x@reference], "nucelotide differences relative to", nn[x@reference]) } p1 <- plot_difference_count(x@diff, width, by=by, fill=fill) + ggtitle(title) p2 <- plot_difference(x@diff, colors=colors, xlab) if (!is.null(xlim)) { p1 <- p1 + xlim(xlim) p2 <- 
p2 + xlim(xlim) } plot_list(p1, p2, ncol=1, heights=c(.7, .4)) } ) ##' @importFrom ggplot2 ggplot ##' @importFrom ggplot2 aes_ ##' @importFrom ggplot2 geom_segment ##' @importFrom ggplot2 xlab ##' @importFrom ggplot2 ylab ##' @importFrom ggplot2 scale_y_continuous ##' @importFrom ggplot2 theme_minimal ##' @importFrom ggplot2 theme ##' @importFrom ggplot2 element_blank ##' @importFrom ggplot2 scale_color_manual plot_difference <- function(x, colors, xlab="Nucleotide Position") { x$difference <- x$difference %>% toupper yy = 4:1 names(yy) = c("A", "C", "G", "T") x$y <- yy[x$difference] n <- sum(is.na(x$y)) if (n > 0) { message(n, " sites contain deletions or ambiguous bases, which will be ignored in current implementation...") } x <- x[!is.na(x$y),] p <- ggplot(x, aes_(x=~position, y=~y, color=~difference)) p + geom_segment(aes_(x=~position, xend=~position, y=~y, yend=~y+.8)) + xlab(xlab) + ylab(NULL) + scale_y_continuous(breaks=yy, labels=names(yy)) + theme_minimal() + theme(legend.position="none")+ theme(axis.text.x=element_blank(), axis.ticks.x = element_blank()) + scale_color_manual(values=colors) } ##' @importFrom ggplot2 geom_col ##' @importFrom ggplot2 geom_area ##' @importFrom ggplot2 theme_bw plot_difference_count <- function(x, width, by = 'bar', fill='red') { by <- match.arg(by, c("bar", "area")) if (by == 'bar') { geom <- geom_col(fill=fill, width=width) keep0 <- FALSE } else if (by == "area") { geom <- geom_area(fill=fill) keep0 <- TRUE } d <- nucleotide_difference_count(x, width, keep0) p <- ggplot(d, aes_(x=~position, y=~count)) p + geom + xlab(NULL) + ylab("Difference") + theme_bw() } ================================================ FILE: R/method-show.R ================================================ ##' show method ##' ##' ##' @name show ##' @docType methods ##' @rdname show-methods ##' @title show method ##' @param object SeqDiff object ##' @return message ##' @importFrom methods show ##' @exportMethod show ##' @aliases SeqDiff-class ##' 
##' show,SeqDiff-method
##' @usage show(object)
##' @examples
##' fas <- list.files(system.file("extdata", "GVariation", package="ggmsa"),
##'                   pattern="fas", full.names=TRUE)
##' x1 <- seqdiff(fas[1], reference=1)
##' x1
setMethod("show",signature(object="SeqDiff"),
          function(object) {
              ## message() concatenates its arguments without a separator,
              ## so the trailing space after "of" is needed
              message("sequence differences of ",
                      paste0(names(object@sequence), collapse=" and "), '\n')
              d <- object@diff$difference %>% table %>% as.data.frame
              message(sum(d$Freq), " ", "sites differ:\n")
              freq <- d[,2]
              names(freq) <- d[,1]
              print(freq)
          })

================================================
FILE: R/methods-diff.R
================================================
##' @method diff SeqDiff
##' @export
diff.SeqDiff <- function(x, ...) {
    x@diff
}

================================================
FILE: R/methods-ggplot_add.R
================================================
##' @method ggplot_add seqlogo
##' @export
ggplot_add.seqlogo <- function(object, plot, object_name) {
    msaData <- plot$layers[[1]]$data
    logo_tidyData <- msa2tidy(msaData)
    logo_font <- object$font
    logo_color <- object[["color"]]
    adaptive <- object$adaptive
    top <- object$top
    logo_custom_color <- object[["custom_color"]]
    show.legend <- object$show.legend
    ly_logo <- geom_logo(data = logo_tidyData,
                         font = logo_font,
                         color = logo_color,
                         adaptive = adaptive,
                         top = top,
                         custom_color = logo_custom_color,
                         show.legend = show.legend)
    ggplot_add(ly_logo, plot, object_name)
}

##' @method ggplot_add seed
##' @export
ggplot_add.seed <- function(object, plot, object_name) {
    msaData <- plot$layers[[1]]$data
    seed_tidyData <- msa2tidy(msaData)
    seed <- object$seed
    star <- object$star
    ly <- geom_seed1(seed_tidyData, seed, star)
    ggplot_add(ly, plot, object_name)
}

##' @method ggplot_add GCcontent
##' @export
ggplot_add.GCcontent <- function(object, plot, object_name) {
    msaData <- plot$layers[[1]]$data
    show.legend <- object$show.legend
    GC_tidyData <- msa2tidy(msaData)
    ly <- geom_GC1(GC_tidyData, show.legend = show.legend )
    ggplot_add(ly, plot,
object_name) } ##' @importFrom ggplot2 facet_wrap ##' @importFrom ggplot2 ggplot_add ##' @importFrom ggplot2 scale_x_continuous ##' @importFrom ggplot2 coord_cartesian ##' @importFrom ggplot2 geom_blank ##' @method ggplot_add facet_msa ##' @export ggplot_add.facet_msa <- function(object, plot, object_name){ msaData <- plot$layers[[1]]$data field <- object$field facetData <- facet_data(msaData, field) ##update data plot$layers[[1]]$data <- facetData #ly_bg if (length(plot$layers) > 1){ plot$layers[[2]]$data <- facetData #ly_label } region <- diff(range(facetData$position)) xl_scale <- facet_scale(facetData, field) if (region %% field == 0) { plot + facet_wrap(.~facet, ncol = 1, scales = "free_x") + scale_x_continuous(expand = c(0,0), breaks = xl_scale, labels = xl_scale) + coord_cartesian() }else { max_pos <- facetData$position %>% max min_pos <- facetData$position %>% min max_facet <- facetData$facet %>% max minpos_maxfacet <- facetData[facetData$facet == max_facet,"position"] %>% min expand_pos <- (region %/% field + 1) * field + min_pos dummy <- data.frame(x = c(minpos_maxfacet, expand_pos), facet = max_facet) plot + facet_wrap(.~facet, ncol = 1, scales = "free_x") + geom_blank(aes_(x = ~x), dummy, inherit.aes = FALSE) + scale_x_continuous(expand = c(0,0), breaks = xl_scale, labels = xl_scale) + coord_cartesian() } } ##' @method ggplot_add msaBar ##' @importFrom aplot insert_top ##' @importFrom ggplot2 coord_cartesian ##' @export ggplot_add.msaBar <- function(object, plot, object_name){ msaData <- plot$layers[[1]]$data bar_tidyData <- msa2tidy(msaData) ly <- ly_bar(bar_tidyData) p_bar <- ggplot() + ly_bar(bar_tidyData) + bar_theme(bar_tidyData) plot <- plot + coord_cartesian() p_bar %>% insert_top(plot, height = 3) } ##' @method ggplot_add nucleotideeHelix ##' @export ggplot_add.nucleotideeHelix <- function(object, plot, object_name){ msa_data <- plot$layers[[1]]$data tidy_data <- msa2tidy(msa_data) seq_numbers <- levels(tidy_data$name) %>% length helix_data <- 
        object$helix_data
    color_by <- object$color_by
    overlap <- object$overlap
    if(is.data.frame(helix_data)) {
        helix_tidy <- tidy_helix(helix_data, color_by = color_by)
    }else {
        helix_tidy <- tidy_list_helix(helix_data, color_by = color_by)
    }
    ly <- layer_helix(helix_data = helix_tidy,
                      overlap = overlap,
                      seq_numbers = seq_numbers)
    ggplot_add(ly, plot, object_name)
}

================================================
FILE: R/msa_data.R
================================================
##' This function takes a tidy sequence alignment (see tidy_msa()) and
##' assigns a color to each molecule (amino acid or nucleotide) according to
##' the selected color scheme.
##'
##'
##' @title msa_data
##' @param tidymsa sequence alignment as a data frame, generated by tidy_msa().
##' @param font font families, possible values are 'helvetical', 'mono',
##' and 'DroidSansMono', 'TimesNewRoman'. Default is 'helvetical'.
##' If you specify font = NULL, only the background box will be printed.
##' @param color A color scheme. One of 'Clustal', 'Chemistry_AA',
##' 'Shapely_AA', 'Zappo_AA', 'Taylor_AA', 'LETTER', 'CN6', 'Chemistry_NT',
##' 'Shapely_NT', 'Zappo_NT', 'Taylor_NT'. Default is 'Chemistry_AA'.
##' @param custom_color A data frame with two columns called "names" and
##' "color". Customizes the color scheme.
##' @param order a vector specifying the order of the sequences.
##' @param char_width a numeric vector. Specifying the character
##' width in the range of 0 to 1. Default is 0.9.
##' @param by_conservation a logical value. The most conserved
##' regions have the brightest colors.
##' @param consensus_views a logical value that enables consensus views.
##' @param use_dot a logical value. Displays characters as dots
##' instead of fading their color in the consensus view.
##' @param disagreement a logical value. Displays characters that
##' disagree with the consensus (excludes ambiguous disagreements).
##' @param ignore_gaps a logical value.
When selected TRUE, gaps ##' in column are treated as if that row didn't exist. ##' @param ref a character string. Specifying the reference sequence ##' which should be one of input sequences when 'consensus_views' is TRUE. ##' @return A data frame ##' @examples ##' fasta <- system.file("extdata/sample.fasta", package="ggmsa") ##' data <- msa_data(fasta, 20, 120, ##' font = "helvetical", ##' color = 'Chemistry_AA' ) ## @export ##' @noRd ##' @author Guangchuang Yu, Lang Zhou msa_data <- function(tidymsa, font = "helvetical", color = "Chemistry_AA", custom_color = NULL, char_width = 0.9, by_conservation = FALSE, consensus_views = FALSE, use_dot = FALSE, disagreement = TRUE, ignore_gaps = FALSE, ref = NULL) { if (is.null(custom_color)) { color <- match.arg(color, c("Clustal", "Chemistry_AA", "Shapely_AA", "Zappo_AA", "Taylor_AA","Chemistry_NT", "Shapely_NT", "Zappo_NT", "Taylor_NT", "LETTER", "CN6", "Hydrophobicity" )) } y <- tidymsa ## add color if (color == "Clustal"){ y <- color_Clustal(y) }else { if (consensus_views) { consensus <- get_consensus(y, #extract a consensus/ref sequence ignore_gaps = ignore_gaps, ref = ref) tc <- color_scheme(y, color) %>% #assigning color for other seq. tidy_color(consensus, disagreement, ref = ref)# tidy colors y <- color_scheme(consensus, color) %>% #assigning color for con/ref rbind(tc) #add consensus sequence if (use_dot){ y[is.na(y$color), "character"] <- "." 
}else { y$font_color <- "#000000" y[is.na(y$color), "font_color"] <- "#aaacaf" y[is.na(y$color), "color"] <- "#ffffff" } }else { y <- color_scheme(y, color, custom_color) } } if (by_conservation){ y <- color_visibility(y) } if (is.null(font)) { return(y) } ## calling internal polygons font_f <- font_fam[[font]] #debug using'as.character()' data_sp <- font_f[as.character(unique(y$character))] ## To adapt to tree data if (!'name' %in% names(y) & !consensus_views) { if ('label' %in% names(y)) { names(y)[names(y) == 'label'] <- "name" }else { stop("unknown sequence name...") } } if(!is.factor(y$name) & !consensus_views){ lev <- unique(data.frame(y[,c("name","y")])) # y is the order of the nodes in the tree lev <- lev[order(lev$y), "name"] y$name <- factor(y$name, levels = lev) } else if(consensus_views) { y$name <- order_name(y$name, consensus_views = consensus_views, ref = ref) } y$ypos <- as.numeric(y$name) # for ggtreeExtra if ("new_position" %in% colnames(y)) { scale_n <- 5 * length(unique(y$name))/diff(range(y$new_position)) char_width <- char_width * diff(range(y$new_position))/diff(range(y$position)) } yy <- lapply(seq_len(nrow(y)), function(i) { d <- y[i, ] dd <- data_sp[[d$character]] if(d$character == "."){ # '.' 
without zooming if ("new_position" %in% colnames(d)){ dd$x <- dd$x - min(dd$x) + d$new_position - diff(range(dd$x))/2 }else{ dd$x <- dd$x - min(dd$x) + d$position - diff(range(dd$x))/2 } dd$y <- dd$y - min(dd$y) + d$ypos - diff(range(dd$y))/2 }else {# other characters char_scale <- diff(range(dd$x))/diff(range(dd$y))#equal proportion #y_width = char_width, x-width scaled proportionally if(diff(range(dd$x)) <= diff(range(dd$y))) { dd$x <- dd$x * (char_width * char_scale)/diff(range(dd$x)) # for ggtreeExtra if ("new_position" %in% colnames(d)){ dd$y <- (dd$y * char_width)/diff(range(dd$y)) * scale_n dd$x <- dd$x - min(dd$x) + d$new_position - (char_width * char_scale)/2 dd$y <- dd$y - min(dd$y) + d$ypos - scale_n * char_width/2 }else{ dd$y <- (dd$y * char_width)/diff(range(dd$y)) dd$x <- dd$x - min(dd$x) + d$position - (char_width * char_scale)/2 dd$y <- dd$y - min(dd$y) + d$ypos - char_width/2 } }else{#x_width = char_width, y-width scaled proportionally dd$x <- dd$x * char_width/diff(range(dd$x)) # for ggtreeExtra if ("new_position" %in% colnames(d)){ dd$y <- dd$y * char_width/(diff(range(dd$y)) * char_scale) * scale_n dd$x <- dd$x - min(dd$x) + d$new_position - char_width/2 dd$y <- dd$y - min(dd$y) + d$ypos - (scale_n * char_width/char_scale)/2 }else{ dd$y <- dd$y * char_width/(diff(range(dd$y)) * char_scale) dd$x <- dd$x - min(dd$x) + d$position - char_width/2 dd$y <- dd$y - min(dd$y) + d$ypos - (char_width/char_scale)/2 } } } cn <- colnames(d) cn <- cn[!cn %in% c('x','y', 'ypos')] for (nn in cn) { dd[[nn]] <- d[[nn]] } dd$group <- paste0("V", d$position, "L", d$ypos) return(dd) }) ydf <- do.call(rbind, yy) colnames(ydf)[colnames(ydf) == 'y'] <- 'yy' ydf$y <- as.numeric(ydf$name) ydf <- cbind(label = ydf$name, ydf) return(ydf) } ##' Convert msa file/object to tidy data frame. 
##'
##'
##' @title tidy_msa
##' @param msa multiple sequence alignment file or sequence object in
##' DNAStringSet, RNAStringSet, AAStringSet, BStringSet, DNAMultipleAlignment,
##' RNAMultipleAlignment, AAMultipleAlignment, DNAbin or AAbin
##' @param start start position to extract subset of alignment
##' @param end end position to extract subset of alignment
##' @return a tidy data frame with the columns name, position and character
##' @export
##' @examples
##' fasta <- system.file("extdata", "sample.fasta", package = "ggmsa")
##' aln <- tidy_msa(msa = fasta, start = 10, end = 100)
##' @author Guangchuang Yu
tidy_msa <- function(msa, start = NULL, end = NULL) {
    if(inherits(msa, "character") && length(msa) > 1) {
        aln <- msa
    }else {
        aln <- prepare_msa(msa)
    }
    alnmat <- lapply(seq_along(aln), function(i) {
        ##Preventing function collisions
        base::strsplit(as.character(aln[[i]]), '')[[1]]
    }) %>% do.call('rbind', .)
    ## for DNAbin and AAbin
    alndf <- as.data.frame(alnmat, stringsAsFactors = FALSE)

    if(unique(names(aln)) %>% length == length(aln)) {
        alndf$name = names(aln)
    }else{
        stop("Sequences must have unique names")
    }
    cn = colnames(alndf)
    cn <- cn[!cn %in% "name"]
    ## wide (one column per alignment position) to long format
    df <- gather(alndf, "position", "character", cn)

    y <- df
    y$position = as.numeric(sub("V", "", y$position))
    y$character = toupper(y$character)

    y$name = factor(y$name, levels=rev(names(aln)))

    if (is.null(start)) start <- min(y$position)
    if (is.null(end)) end <- max(y$position)

    y <- y[y$position >=start & y$position <= end, ]

    return(y)
}

##' This function converts the msa_data to the tidy data.
##'
##' @param msaData sequence alignment data generated by msa_data().
##' @noRd msa2tidy <- function(msaData) { if ("order" %in% names(msaData)) { msaData <- msaData[msaData$order == 1,] } df_tidy <- data.frame(name = msaData$name, position = msaData$position, character = msaData$character) df_tidy$character <- as.character(df_tidy$character) return(df_tidy) } ================================================ FILE: R/pp_interactive.R ================================================ make_gap <- function(gap, previous_seq) { gap_df <- previous_seq[rep(1, each=gap),] gap_start <- max(previous_seq$position) + 1 gap_df$position <- gap_start : (gap_start + gap - 1 ) gap_df$character <- "-" if("pos_previous" %in% names(gap_df)) { gap_df$pos_previous <- 0 } return(gap_df) } ##' merge two MSA ##' ##' @title merge_seq ##' @param previous_seq previous MSA ##' @param subsequent_seq subsequent MSA ##' @param gap gap length ##' @param adjust_name logical value. merge seq name or not ##' @return tidy MSA data frame ##' @export ##' @author Lang Zhou merge_seq <- function(previous_seq, gap, subsequent_seq, adjust_name = TRUE) { name_pre <- levels(previous_seq$name) name_subse <- levels(subsequent_seq$name) if(length(name_pre) != length(name_subse)) { stop("The sequences number of previous_seq and subsequent_seq is inconsistent") } gap_df <- make_gap(gap = gap, previous_seq = previous_seq) subsequent_seq$position <- subsequent_seq$position - min(subsequent_seq$position) + 1 subsequent_seq$position <- subsequent_seq$position + max(previous_seq$position) + gap t_merge <- rbind(previous_seq,gap_df,subsequent_seq) if (adjust_name) { rownames(t_merge) <- seq(nrow(t_merge)) names(t_merge)[1] <- "name_previous" t_merge$name <- "" for(i in seq(length(name_pre))) { t_merge[t_merge$name_previous %in% c(name_pre[i], name_subse[i]),"name"] <- paste0(name_pre[i],"-", name_subse[i]) } t_merge$name <- factor(t_merge$name) } return(t_merge) } ##' tidy protein-protein interactive position data ##' ##' @title tidy_hdata ##' @param gap gap length ##' @param inter 
protein-protein interactive position data ##' @param previous_seq previous MSA ##' @param subsequent_seq subsequent MSA ##' @importFrom R4RNA as.helix ##' @return helix data ##' @export ##' @author Lang Zhou tidy_hdata <- function(gap, inter, previous_seq,subsequent_seq) { inter$j <- inter$Res.no..2 - min(subsequent_seq$position) + max(previous_seq$position) + gap + 1 hdata <- data.frame(i = inter$Res.no.1, j = inter$j, length = 1, value = NA, colour = "blue") hdata <- as.helix(hdata) return(hdata) } ##' reset MSA position ##' ##' @title reset_pos ##' @param seq_df MSA data ##' @return data frame ##' @export ##' @author Lang Zhou reset_pos <- function(seq_df) { names(seq_df)[2] <- "pos_previous" seq_df$position <- "" for(i in unique(seq_df$pos_previous)%>% seq) { uni <- unique(seq_df$pos_previous) seq_df[seq_df$pos_previous == uni[i],"position"] <- i } seq_df$position <- as.numeric(seq_df$position) return(seq_df) } ##' reset hdata data position ##' ##' @title simplify_hdata ##' @param hdata data from tidy_hdata() ##' @param sim_msa MSA data frame ##' @return data frame ##' @export ##' @author Lang Zhou simplify_hdata <- function(hdata, sim_msa) { new_hdata <- lapply(seq(nrow(hdata)), function(a) { n <- hdata[a,] n$pre_i <- n$i n$i <- sim_msa[sim_msa$pos_previous == n$i,"position"] %>% unique return(n) }) %>% do.call("rbind",.) new_hdata <- lapply(seq(nrow(new_hdata)), function(a) { n <- new_hdata[a,] n$pre_j <- n$j n$j <- sim_msa[sim_msa$pos_previous == n$j,"position"] %>% unique return(n) }) %>% do.call("rbind",.) new_hdata <- as.helix(new_hdata) return(new_hdata) } ================================================ FILE: R/prepare_fasta.R ================================================ ##' preparing multiple sequence alignment ##' ##' This function supports both NT or AA sequences; It supports multiple ##' input formats such as "DNAStringSet", "BStringSet", "AAStringSet", ##' DNAbin", "AAbin" and a filepath. 
##' @title prepare_msa
##' @param msa a multiple sequence alignment file or object
##' @return BStringSet based object
##' @importFrom Biostrings DNAStringSet
##' @importFrom Biostrings RNAStringSet
##' @importFrom Biostrings AAStringSet
##' @importFrom methods missingArg
##' @importFrom seqmagick fa_read
## @export
##' @author Lang Zhou and Guangchuang Yu
##' @noRd
prepare_msa <- function(msa) {
    if (missingArg(msa)) {
        stop("no input...")
    } else if (inherits(msa, "character")) {
        msa <- fa_read(msa)
    } else if (!any(class(msa) %in% supported_msa_class)) {
        ## any() keeps the condition length-one even when class(msa)
        ## returns several classes (e.g. for a matrix)
        stop("multiple sequence alignment object not supported...")
    }

    res <- switch(class(msa),
                  DNAbin = DNAbin2DNAStringSet(msa),
                  AAbin = AAbin2AAStringSet(msa),
                  DNAMultipleAlignment = DNAStringSet(msa),
                  RNAMultipleAlignment = RNAStringSet(msa),
                  AAMultipleAlignment = AAStringSet(msa),
                  msa ## DNAStringSet, RNAStringSet, AAStringSet, BStringSet
                  )
    return(res)
}

DNAbin2DNAStringSet <- function(msa) {
    seqs <- vapply(seq_along(msa),
                   function(i) paste0(as.character(msa[i]) %>% unlist,
                                      collapse=''),
                   character(1))
    names(seqs) <- names(msa)

    switch(class(msa),
           DNAbin = DNAStringSet(seqs),
           AAbin = AAStringSet(seqs))
}

AAbin2AAStringSet <- DNAbin2DNAStringSet

supported_msa_class <- c("DNAStringSet",
                         "RNAStringSet",
                         "AAStringSet",
                         "BStringSet",
                         "DNAMultipleAlignment",
                         "RNAMultipleAlignment",
                         "AAMultipleAlignment",
                         "DNAbin",
                         "AAbin")

================================================
FILE: R/read_maf.R
================================================
##' read 'multiple alignment format' (MAF) file
##'
##' @title read_maf
##' @param multiple_alignment_format a multiple alignment format (MAF) file
##' @return data frame
##' @export
##' @author Lang Zhou
read_maf <- function(multiple_alignment_format) {
    line <- readLines(multiple_alignment_format)
    head <- sapply(line, function(i) substring(i,1,1))
    rm(line) # the full lines are kept as names(head)

    #remove header
    head <- head[-seq(which(head == "#"))]

    #split block
    blank <- which(head == "")
    block_ls <- lapply(seq(blank),
function(i) { if (blank[i] == min(blank)) { x <- names(head)[1:blank[i]] }else { x <- names(head)[blank[i-1]:blank[i]] } return(x) }) names(block_ls) <- paste0("block_",seq(length(block_ls))) #extra lines starting with "s" s_block <- lapply(seq(length(block_ls)), function(i) { blocki <- block_ls[[i]] line_s <- blocki[sapply(blocki, function(j) substring(j,1,1)) == "s"] }) names(s_block) <- names(block_ls) #get a MAF df s_name <- c("type", "src", "start", "size", "strand", 'src_size', "text") seq_df <-lapply(seq(length(s_block)), function(i) { blocki <- s_block[[i]] seq_df <- lapply(seq(length(blocki)), function(j) { x <- blocki[[j]] #extra all columns x <- strsplit(x, " ") %>% unlist x1 <- x[sapply(x, nchar) > 0] #convert to data frame seq <- t(as.matrix(x1)) %>% as.data.frame() names(seq) <- s_name seq[,c("start","size",'src_size')] <- seq[,c("start","size",'src_size')] %>%as.numeric() seq$size_gap <- nchar(seq$text) seq$end <- seq$start + seq$size seq$end_gap <- seq$start + seq$size_gap seq$block <- names(s_block[i]) return(seq) })%>% do.call("rbind", .) return(seq_df) }) %>% do.call("rbind", .) 
}

================================================
FILE: R/seqdiff.R
================================================
##' calculate difference of two aligned sequences
##'
##'
##' @title seqdiff
##' @param fasta fasta file
##' @param reference which sequence serves as reference, 1 or 2
##' @return SeqDiff object
##' @export
##' @importFrom Biostrings readBStringSet
##' @importClassesFrom Biostrings BStringSet
##' @importFrom methods new
##' @author guangchuang yu
##' @examples
##' fas <- list.files(system.file("extdata", "GVariation", package="ggmsa"),
##'                   pattern="fas", full.names=TRUE)
##' seqdiff(fas[1], reference=1)
seqdiff <- function(fasta, reference=1) {
    sequence <- readBStringSet(fasta)
    ## exactly two sequences of equal width are required; the previous test
    ## (length != 2 && length(width) != 1) let a single sequence through
    if (length(sequence) != 2 || length(unique(width(sequence))) != 1) {
        stop("fasta should contain 2 aligned sequences...")
    }

    diff <- nucleotide_difference(sequence, reference)

    new("SeqDiff",
        file = fasta,
        sequence = sequence,
        reference = reference,
        diff = diff)
}

##' @importFrom magrittr %>%
##' @importFrom Biostrings toString
##' @importFrom Biostrings width
nucleotide_difference <- function(x, reference=1) {
    n <- width(x[1])
    nn <- seq_len(n)
    s1 <- x[1] %>% toString %>% substring(nn, nn)
    s2 <- x[2] %>% toString %>% substring(nn, nn)
    pos <- which(s1 != s2)

    if (reference == 1) {
        diff <- s2[pos]
    } else {
        diff <- s1[pos]
    }

    return(data.frame(position = pos,
                      difference = diff,
                      stringsAsFactors = FALSE))
}

##' @importFrom dplyr group_by
##' @importFrom dplyr summarize
##' @importFrom dplyr select
##' @importFrom dplyr n
nucleotide_difference_count <- function(x, width=50, keep0=FALSE) {
    n <- max(x$position)
    bin <- rep(seq_len(ceiling(n/width)), each=width)
    position <- c(seq_len(n)[!duplicated(bin)], n)
    ## use the full column name instead of relying on partial matching of x$pos
    x$bin <- bin[x$position]
    y <- x %>% group_by(bin) %>%
        summarize(position=min(position), count = n()) %>%
        select(-bin)
    y$position <- position[findInterval(y$position, position)]
    if (keep0) {
        itv <- seq(1, n, width)
        yy <- data.frame(position = itv[!itv %in% y$position],
                         count = 0)
        y <- rbind(y, yy)
        y
            <- y[order(y$position, decreasing=FALSE),]
    }
    return(y)
}

================================================
FILE: R/seqlogo.R
================================================
##' plot sequence logo for MSA based on 'ggplot2'
##' @title seqlogo
##' @param msa Multiple sequence alignment file or object for representing
##' either nucleotide sequences or peptide sequences.
##' @param start Start position to plot.
##' @param end End position to plot.
##' @param font font families, possible values are 'helvetical', 'mono', and
##' 'DroidSansMono', 'TimesNewRoman'. Default is 'DroidSansMono'.
##' If font=NULL, only the background tiles are drawn.
##' @param color A color scheme. One of 'Clustal', 'Chemistry_AA',
##' 'Shapely_AA', 'Zappo_AA', 'Taylor_AA', 'LETTER', 'CN6', 'Chemistry_NT',
##' 'Shapely_NT', 'Zappo_NT', 'Taylor_NT'. Default is 'Chemistry_AA'.
##' @param custom_color A data frame with two columns called "names" and
##' "color". Customizes the color scheme.
##' @param adaptive A logical value indicating whether the overall height of
##' seqlogo corresponds to the number of sequences. If FALSE, the overall
##' height of seqlogo is fixed at 4.
##' @param top A logical value. If TRUE, seqlogo is aligned to the top of MSA.
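##' @details
##' A minimal sketch of how letter heights follow from column frequencies,
##' assuming the fixed overall height of 4 that applies when
##' \code{adaptive = FALSE}. The column below is made-up illustration data,
##' not taken from the package:
##' \preformatted{
##' column <- c("A", "A", "A", "A", "C", "C", "G", "T")
##' fre <- prop.table(table(column))  # share of each character in the column
##' sort(4 * fre)                     # letter heights, smallest first
##' }
##' Here G and T each get a height of 0.5, C gets 1 and A gets 2. The letters
##' are stacked from the bottom in that order, so the most frequent
##' character ends up on top.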
##' @return ggplot object
##' @examples
##' #plot sequence motif independently
##' nt_sequence <- system.file("extdata", "LeaderRepeat_All.fa",
##'                            package = "ggmsa")
##' seqlogo(nt_sequence, color = "Chemistry_NT")
##' @export
##' @author Lang Zhou
seqlogo <- function(msa,
                    start = NULL,
                    end = NULL,
                    font = "DroidSansMono",
                    color = "Chemistry_AA",
                    adaptive = FALSE,
                    top = FALSE,
                    custom_color = NULL) {

    data <- tidy_msa(msa, start = start, end = end)

    ggplot() +
        geom_logo(data,
                  font = font,
                  color = color,
                  adaptive = adaptive,
                  top = top,
                  custom_color = custom_color) +
        theme_minimal() +
        xlab(NULL) +
        ylab(NULL) +
        theme(legend.position = 'none') +
        theme(panel.grid = element_blank(),
              axis.text.y = element_blank()) +
        coord_fixed()
}

##' Multiple sequence alignment layer for ggplot2. It plots sequence motifs.
##' @title geom_seqlogo
##' @param font font families, possible values are 'helvetical', 'mono',
##' and 'DroidSansMono', 'TimesNewRoman'. Default is 'DroidSansMono'.
##' @param color A color scheme. One of 'Clustal', 'Chemistry_AA',
##' 'Shapely_AA', 'Zappo_AA', 'Taylor_AA', 'LETTER', 'CN6', 'Chemistry_NT',
##' 'Shapely_NT', 'Zappo_NT', 'Taylor_NT'. Default is 'Chemistry_AA'.
##' @param custom_color A data frame with two columns called "names" and
##' "color". Customizes the color scheme.
##' @param adaptive A logical value indicating whether the overall height
##' of seqlogo corresponds to the number of sequences. If FALSE, the
##' overall height of seqlogo is fixed at 4.
##' @param top A logical value. If TRUE, seqlogo is aligned to the top of MSA.
##' @param show.legend logical. Should this layer be included in the legends?
##' @param ...
additional parameter ##' @return A list ##' @examples ##' #plot multiple sequence alignment and sequence motifs ##' f <- system.file("extdata/LeaderRepeat_All.fa", package="ggmsa") ##' ggmsa(f,font = NULL,color = "Chemistry_NT") + geom_seqlogo() ##' @export ##' @author Lang Zhou geom_seqlogo <- function(font = "DroidSansMono", color = "Chemistry_AA", adaptive = TRUE, top = TRUE, custom_color = NULL, show.legend = FALSE, ...) { structure(list(font = font, color = color, adaptive = adaptive, top = top, custom_color = custom_color, show.legend = show.legend), class = "seqlogo") } geom_logo <- function(data, font = "DroidSansMono", color = "Chemistry_AA", adaptive = FALSE, top = TRUE, custom_color = NULL, show.legend = FALSE, ...) { mapping <- aes_(x = ~logo_x, y = ~logo_y, group = ~group, fill = ~I(color)) logo_data <- seqlogo_data(data, font = font, color = color, adaptive = adaptive, top = top, custom_color = custom_color) ly_logo <- geom_polygon(mapping = mapping, data = logo_data, inherit.aes = FALSE, show.legend = show.legend) return(ly_logo) } seqlogo_data <- function(data, font = "DroidSansMono", color = "Chemistry_AA", adaptive = FALSE, top = TRUE, custom_color = NULL){ tidy <- data if (color == "Clustal") { tidy <- color_Clustal(tidy) } else{ tidy <- color_scheme(tidy, color, custom_color) } if (adaptive) { seq_number <- as.character(unique(tidy[[1]])) total_heigh <- length(seq_number) / 6 } else { total_heigh <- 4 } #total_heigh <- getOption("total_heigh") logo_width <- getOption("logo_width") ## assign the start postion to the first label col_num <- as.numeric(levels(factor(tidy$position))) moti_da <- lapply(col_num, function(j){ ## Calculate the char frequency in each column clo <- tidy[tidy$position == j, ] fre <- prop.table(table(clo$character)) ## total_heigh is overall hight, the height of each char is assigned. 
ywidth <- sort(total_heigh * fre ) ## calling color scheme column_char_color <- data.frame(unique(clo[c("character", "color")])) font_f <- font_fam[[font]] motif_char <- font_f[names(ywidth)] ds_ <- lapply(seq_along(motif_char), function(i){ ds_ <- motif_char[[i]] names(ds_)[names(ds_) == "x"] <- "logo_x" names(ds_)[names(ds_) == "y"] <- "logo_y" ds_$char <- names(motif_char[i]) #width = .9 ds_$logo_x <- ds_$logo_x * logo_width/diff(range(ds_$logo_x)) #height = overall height * frequency ds_$logo_y <- ds_$logo_y * ywidth[[i]]/diff(range(ds_$logo_y)) ymotif <- sum(ywidth[0:(i - 1)]) # summed height of the chars below # moving char horizontally ds_$logo_x <- ds_$logo_x - min(ds_$logo_x) - logo_width/2 + j ds_$logo_y <- ds_$logo_y - min(ds_$logo_y) - ywidth[[i]]/2 + ymotif + ywidth[[i]]/2 if (top) { ds_$logo_y <- ds_$logo_y + nrow(tidy[tidy$position == j, ]) + .5 } ## ds_$y - min(ds_$y) - ywidth[[i]]/2: Centered at zero ## + ymotif: summed height of the chars below the current char ## + ywidth[[i]]/2: half the height of the current char ds_$group <- paste0("P", j, '-', "Char", names(motif_char[i])) ds_$color <- column_char_color[column_char_color$character == unique(ds_$char), "color"] return(ds_) }) ds <- do.call(rbind, ds_) return(ds) }) moti_da <- do.call(rbind, moti_da) moti_da$name <- as.character(tidy[1,1]) other_cn <- names(moti_da)[!names(moti_da) == 'name'] moti_da <- moti_da[c("name", other_cn)] add_col <- tidy[,!names(tidy) %in% names(moti_da)] moti_da <- cbind(add_col[1,], moti_da, row.names = NULL) return(moti_da) } ================================================ FILE: R/simplot.R ================================================ ##' Sequence similarity plot ##' ##' ##' @title simplot ##' @param file alignment fasta file ##' @param query query sequence ##' @param window sliding window size (bp) ##' @param step step size to slide the window (bp) ##' @param group whether to group sequences (e.g. 
for "A-seq1,A-seq-2,B-seq1 and ##' B-seq2", using sep = "-" and id = 1 to divide sequences into groups A and ##' B) ##' @param id position to extract id for grouping; only works if group = TRUE ##' @param sep separator to split sequence name; only works if group = TRUE ##' @param sd whether to display the standard deviation of ##' similarity within each group; only works if group = TRUE ##' @param smooth FALSE (default) or TRUE; whether to display a smoothed spline. ##' @param smooth_params a list of parameters passed to geom_smooth ##' (default: smooth_params = list(method = "loess", se = FALSE)) ##' @return ggplot object ##' @importFrom Biostrings readDNAStringSet ##' @importFrom ggplot2 aes_ ##' @importFrom ggplot2 geom_line ##' @importFrom ggplot2 ggtitle ##' @importFrom ggplot2 geom_ribbon ##' @importFrom ggplot2 geom_smooth ##' @importFrom magrittr %<>% ##' @importFrom dplyr group_by_ ##' @importFrom dplyr summarize_ ##' @export ##' @author Guangchuang Yu ##' @examples ##' fas <- system.file("extdata/GVariation/sample_alignment.fa", ##' package="ggmsa") ##' simplot(fas, 'CF_YL21') simplot <- function(file, query, window=200, step=20, group=FALSE, id, sep, sd=FALSE, smooth = FALSE, smooth_params = list(method = "loess", se = FALSE)) { aln <- readDNAStringSet(file) nn <- names(aln) if (group) { g <- vapply(strsplit(nn, sep), function(x) x[id], character(1)) } idx <- which(nn != query) w <- width(aln[query]) start <- seq(1, w, by=step) end <- start + window - 1 start <- start[end <= w] end <- end[end <= w] res <- lapply(idx, function(i) { x <- toCharacter(aln[i]) == toCharacter(aln[query]) pos <- round((start+end)/2) sim <- vapply(seq_along(start), function(j) { mean(x[start[j]:end[j]]) }, numeric(1)) y <- data.frame(sequence=nn[i], position = pos, similarity = sim) if(group) { y$group <- g[i] } return(y) }) %>% do.call(rbind, .) 
if (group) { res %<>% group_by_(~position, ~group) %>% summarize_(msim=~mean(similarity), sd=~sd(similarity)) } if (group) { p <- ggplot(res, aes_(x=~position, y=~msim, group=~group)) if (sd) p <- p + geom_ribbon(aes_(ymin=~msim-sd, ymax=~msim+sd, fill=~group), alpha=.25) if (smooth) { smooth_layer <- do.call(geom_smooth, smooth_params) p <- p + smooth_layer } else { p <- p + geom_line(aes_(color=~group)) } } else { mapping <- aes_(x=~position, y=~similarity, group=~sequence, color=~sequence) p <- ggplot(res, mapping = mapping) if (smooth) { smooth_layer <- do.call(geom_smooth, smooth_params) p <- p + smooth_layer } else { p <- p + geom_line() } } p + xlab("Nucleotide Position") + ylab("Similarity (%)") + ggtitle(paste("Sequence similarities compared to", query)) + theme_minimal() + theme(legend.title=element_blank()) } toCharacter <- function(x) { unlist(strsplit(toString(x),"")) } ================================================ FILE: R/theme_msa.R ================================================ ##' Theme for ggmsa. 
##' ##' @title theme_msa ##' @importFrom ggplot2 theme_minimal ##' @importFrom ggplot2 labs ##' @export ##' @author Lang Zhou theme_msa <- function(){ list( xlab(NULL), ylab(NULL), labs(fill = "Fills"), coord_fixed(), scale_x_continuous(expand = c(0,0)), theme_minimal() + theme( strip.text = element_blank(), panel.spacing.y = unit(.4, "in"), panel.grid = element_blank()) ) } ##' @importFrom grDevices colorRampPalette ##' @importFrom RColorBrewer brewer.pal ##' @importFrom ggplot2 coord_cartesian ##' @importFrom ggplot2 scale_x_continuous ##' @importFrom ggplot2 scale_y_continuous ##' @importFrom ggplot2 scale_fill_gradientn bar_theme <- function(tidy){ data <- bar_data(tidy) color_palettes <- colorRampPalette(brewer.pal(n = 9, name = "Blues")[c(4:7)]) list( xlab(NULL), ylab("consensus"), scale_x_continuous(breaks = data[[3]], labels = data[[1]], expand = c(0,0)), scale_y_continuous(breaks = NULL), scale_fill_gradientn(colours = color_palettes(100)), theme_minimal() + theme(panel.grid.minor.x = element_blank(), panel.grid.major.x = element_blank()) ) } facet_scale <- function(facetData, field) { facet0_pos <- facetData[facetData$facet == 0,"position"] msa_start <- min(facet0_pos) ## x labels of facet 0 facet0_xl_scale <- pretty(min(facet0_pos):max(facet0_pos)) ## assign the start position to the first label facet0_xl_scale[1] <- msa_start xl_scale <- facet0_xl_scale for(i in max(facetData$facet) %>% seq_len) { scale_i <- facet0_xl_scale + field * i if(msa_start > 1) scale_i[1] <- scale_i[1] + 1 #print(scale_i) xl_scale <- xl_scale %>% c(scale_i) } max_pos <- facetData$position %>% max xl_scale <- xl_scale[xl_scale <= max_pos] return(xl_scale) } ================================================ FILE: R/zzz.R ================================================ #' @importFrom utils packageDescription .onAttach <- function(libname, pkgname){ #options(total_heigh = 4) options(logo_width = 0.9) options(asterisk_width = .03) options(GC_pos = 2) options(shadingLen = .5) 
options(shading_alpha = .3) pkgVersion <- packageDescription(pkgname, fields="Version") msg <- paste0(pkgname, " v", pkgVersion, " ", "Document: http://yulab-smu.top/ggmsa/", "\n\n") citation <- paste0("If you use ", pkgname, " in published research, please cite:\n", "L Zhou, T Feng, S Xu, F Gao, TT Lam, Q Wang, T Wu, ", "H Huang, L Zhan, L Li, Y Guan, Z Dai*, G Yu* ", "ggmsa: a visual exploration tool for multiple sequence alignment and associated data. ", "Briefings in Bioinformatics. DOI:10.1093/bib/bbac222") packageStartupMessage(paste0(msg, citation)) } ================================================ FILE: README.Rmd ================================================ --- output: md_document: variant: gfm html_preview: TRUE --- ```{r, include = FALSE} knitr::opts_chunk$set( fig.path = "man/figures/REAMED-", message = FALSE, warning = FALSE ) ``` # ggmsa: a visual exploration tool for multiple sequence alignment and associated data ```{r echo=FALSE, results="hide", message=FALSE} library(badger) ``` ```{r, echo = FALSE, results='asis'} cat( badge_devel("YuLab-SMU/ggmsa", "blue"), badge_lifecycle("experimental", "orange"), badge_license("Artistic-2.0") ) ``` `ggmsa` is designed for the visualization and annotation of multiple sequence alignments. It implements functions that make visualizing publication-quality multiple sequence alignments (protein/DNA/RNA) in R simple and powerful. 
For details, please visit ## :hammer: Installation The released version from `Bioconductor` ```{r eval=FALSE} if (!requireNamespace("BiocManager", quietly=TRUE)) install.packages("BiocManager") ## BiocManager::install("BiocUpgrade") ## you may need this BiocManager::install("ggmsa") ``` Alternatively, you can grab the development version from github using devtools: ```{r eval=FALSE} if (!requireNamespace("devtools", quietly=TRUE)) install.packages("devtools") devtools::install_github("YuLab-SMU/ggmsa") ``` ## :bulb: Quick Example ```{r fig.height = 2.5, fig.width = 11, message=FALSE, warning=FALSE, dpi=300} library(ggmsa) protein_sequences <- system.file("extdata", "sample.fasta", package = "ggmsa") ggmsa(protein_sequences, start = 221, end = 280, char_width = 0.5, seq_name = TRUE) + geom_seqlogo() + geom_msaBar() ``` ## :books: Learn more Check out the guides for learning everything there is to know about all the different features: - [Getting Started](https://yulab-smu.github.io/ggmsa/articles/ggmsa.html) - [Annotations](https://yulab-smu.github.io/ggmsa/articles/guides/Annotations.html) - [Color Schemes and Font Families](https://yulab-smu.github.io/ggmsa/articles/guides/Color_schemes_And_Font_Families.html) - [Theme](https://yulab-smu.github.io/ggmsa/articles/guides/MSA_theme.html) - [Other Modules](https://yulab-smu.github.io/ggmsa/articles/guides/Other_Modules.html) - [View Modes](https://yulab-smu.github.io/ggmsa/articles/guides/View_modes.html) ## :runner: Author - [Guangchuang Yu](https://guangchuangyu.github.io) Professor, PI - [Lang Zhou](https://github.com/nyzhoulang) Master's Student - [Shuangbin Xu](https://github.com/xiangpin) PhD Student **YuLab** **Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University** ## :sparkling_heart: Contributing We welcome any contributions! 
By participating in this project you agree to abide by the terms outlined in the [Contributor Code of Conduct](https://github.com/YuLab-SMU/ggmsa/blob/master/CONDUCT.md). ================================================ FILE: README.md ================================================ # ggmsa: a visual exploration tool for multiple sequence alignment and associated data [![](https://img.shields.io/badge/devel%20version-1.3.2-blue.svg)](https://github.com/YuLab-SMU/ggmsa) [![](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](https://lifecycle.r-lib.org/articles/stages.html#experimental) [![License: Artistic-2.0](https://img.shields.io/badge/license-Artistic--2.0-blue.svg)](https://cran.r-project.org/web/licenses/Artistic-2.0) `ggmsa` is designed for the visualization and annotation of multiple sequence alignments. It implements functions that make visualizing publication-quality multiple sequence alignments (protein/DNA/RNA) in R simple and powerful. For details, please visit ## :hammer: Installation The released version from `Bioconductor` ``` r if (!requireNamespace("BiocManager", quietly=TRUE)) install.packages("BiocManager") ## BiocManager::install("BiocUpgrade") ## you may need this BiocManager::install("ggmsa") ``` Alternatively, you can grab the development version from github using devtools: ``` r if (!requireNamespace("devtools", quietly=TRUE)) install.packages("devtools") devtools::install_github("YuLab-SMU/ggmsa") ``` ## :bulb: Quick Example ``` r library(ggmsa) protein_sequences <- system.file("extdata", "sample.fasta", package = "ggmsa") ggmsa(protein_sequences, start = 221, end = 280, char_width = 0.5, seq_name = TRUE) + geom_seqlogo() + geom_msaBar() ``` ![](man/figures/REAMED-unnamed-chunk-6-1.png) ## :books: Learn more Check out the guides for learning everything there is to know about all the different features: - [Getting Started](https://yulab-smu.github.io/ggmsa/articles/ggmsa.html) - 
[Annotations](https://yulab-smu.github.io/ggmsa/articles/guides/Annotations.html) - [Color Schemes and Font Families](https://yulab-smu.github.io/ggmsa/articles/guides/Color_schemes_And_Font_Families.html) - [Theme](https://yulab-smu.github.io/ggmsa/articles/guides/MSA_theme.html) - [Other Modules](https://yulab-smu.github.io/ggmsa/articles/guides/Other_Modules.html) - [View Modes](https://yulab-smu.github.io/ggmsa/articles/guides/View_modes.html) ## :runner: Author - [Guangchuang Yu](https://guangchuangyu.github.io) Professor, PI - [Lang Zhou](https://github.com/nyzhoulang) Master’s Student - [Shuangbin Xu](https://github.com/xiangpin) PhD Student **YuLab** **Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University** ## :sparkling_heart: Contributing We welcome any contributions! By participating in this project you agree to abide by the terms outlined in the [Contributor Code of Conduct](https://github.com/YuLab-SMU/ggmsa/blob/master/CONDUCT.md). ================================================ FILE: inst/CITATION ================================================ citHeader("To cite ggmsa in publications use:") citEntry( entry = "book", title = "Data Integration, Manipulation and Visualization of Phylogenetic Trees", author = person("Guangchuang", "Yu"), publisher = "Chapman and Hall/{CRC}", year = "2022", edition = "1st edition", url = "https://www.amazon.com/Integration-Manipulation-Visualization-Phylogenetic-Computational-ebook/dp/B0B5NLZR1Z/", textVersion = paste("Guangchuang Yu. 
(2022).", "Data Integration, Manipulation and Visualization of Phylogenetic Trees (1st edition).", "Chapman and Hall/CRC.") ) citEntry( entry = "article", title = "ggmsa: a visual exploration tool for multiple sequence alignment and associated data", author = personList( as.person("Lang Zhou"), as.person("Tingze Feng"), as.person("Shuangbin Xu"), as.person("Fangluan Gao"), as.person("Tommy T Lam"), as.person("Qianwen Wang"), as.person("Tianzhi Wu"), as.person("Huina Huang"), as.person("Li Zhan"), as.person("Lin Li"), as.person("Yi Guan"), as.person("Zehan Dai"), as.person("Guangchuang Yu") ), journal = "BRIEFINGS IN BIOINFORMATICS", volume = "23", issue = "4", year = "2022", month = "06", ISSN = "1467-5463", doi = "10.1093/bib/bbac222", PMID = "35671504", url = "https://academic.oup.com/bib/article-abstract/23/4/bbac222/6603927", textVersion = paste("L Zhou, T Feng, S Xu, F Gao, TT Lam, Q Wang, T Wu, H Huang, L Zhan, L Li, Y Guan, Z Dai, G Yu.", "ggmsa: a visual exploration tool for multiple sequence alignment and associated data.", "Briefings in Bioinformatics. 2022, 23(4):bbac222. 
10.1093/bib/bbac222") ) ================================================ FILE: inst/extdata/GVariation/A.Mont.fas ================================================ >Mont ATGGCAACTTACACATCAACAATCCAGTTTGGTTCCATTGAATGCAAACTTCCATACTCACCCGCTCCTTTTGGGCTAGTTGCGGGGAAACGAGAAGTTTTAACCACCACTGACCCCTTCGCAAGTTTGGAGATGCAGCTTAGTGCGCGATTACGAAGGCAAGAGTTTGCAACTATTCGAACATCCAAGAATGGTACTTGCATGTATCGATACAAGACTGATGTCCAGATTGCGCGCATTCAAAAGAAGCGCGAGGAAAGAGAAAGAGAGGAATATAATTTCCAAATGGCTGCGTCAAGTGTTGTGTCGAAGATCACTATTGCTGGTGGAGAGCCACCTTCAAAACTTGAATCACAAGTGCGGAGGGGTGTCATCCACACAACTCCAAGGATGCGCACAGCAAAAACATATCACACGCCAAAGCTGACAGAGGGACAAATGAACCACCTTATCAAGCAGGTGAAGCAAATTATGTCAACCAAAGGAGGGTCTGTTCAACTGATTAGCAAGAAAAGTACTCATGTTCACTATAAAGAAGTTTTGGGATCACATCGCGCAGTTGTTTGCACTGCACATATGAGAGGTTTACGAAAGAGAGTGGACTTTCGGTGTGATAAATGGACCGTTGTGCGTCTACAGCATCTCGCCAGGACGGACAAGTGGACTAACCAAGTTCGTGCTACTGATCTACGCAAGGGCGATAGTGGAGTTATATTGAGTAATACTAATCTCAAAGGAAACTTTGGGAGAAGCTCGGAGGGCCTATTCATAGTGCGTGGGTCGCACGAAGGAAAAATCTATGATGCACGTTCCAAGGTTACTCAAGGGGTTATGGATTCAATGGTTCAGTTCTCAAGCGCTGAAAGCTTTTGGAAGGGATTGGACGGCAATTGGGCACAAATGAGATATCCTACAGATCATACATGTGTGGCAGGCTTACCAGTTGAAGACTGTGGCAGAGTTGCAGCGATAATGACACACAGTATTTTACCGTGCTATAAGATAACCTGCCCTACCTGTGCCCAACAATATGCCAACTTGCCAGCCAGTGACTTACTTAAGATATTACACAAGCACGCAAGTGATGGTCTAAATCGATTGGGGGCAGACAAAGATCGCTTTGTGCATGTCAAAAAGTTCTTGACAATCTTAGAGCACTTAACTGAACCGGTTGATCTGAGTCTAGAAATTTTCAATGAAGTATTCAAGTCTATAGGGGAGAAGCAACAATCACCTTTCAAAAACCTGAATATTCTGAATAATTTCTTTTTGAAAGGAAAGGAAAATACAGCTCGTGAATGGCAGGTGGCTCAATTAAGCTTACTTGAATTGGCAAGATTCCAAAAGAACAGAACGGATAATATCAAGAAAGGAGACATCTCGTTCTTTAGGAATAAACTATCTGCAAAAGCAAATTGGAACTTGTATCTGTCATGTGATAACCAGCTGGATAAGAATGCAAACTTCCTGTGGGGACAGAGGGAATATCATGCTAAGCGATTTTTCTCGAACTATTTCGAGGAAATTGATCCAGCGAAGGGCTATTCAGCATACGAAAATCGTTTGCATCCGAATGGGACAAGAAAACTTGCAATTGGAAACCTAATTGTACCACTTGATCTGGCTGAGTTTAGGCGGAAGATGAAAGGTGATTATAAAAGACAGCCAGGGGTGAGTAAGAAGTGCACGAGCTCGAAGGATGGAAACTACGTGTATCCCTGTTGTTGCACTACACTTGATGATGGCTCAGCTGTTGAGTCAACATTTTACCCGCCAACTAAGAAGCACCTCGTAATAGGTAATAGTGGCGACCAAAAGTATGTTGACTTAC
CAAAAGGGAATTCTGAGATGTTATATATTGCCAGGCAAGGCTTCTGTTACATTAACATTTTTCTCGCGATGTTGATTAACATTAGTGAGGAAGATGCAAAGGATTTCACTAAGAAGGTTCGTGACATGTGTGTGCCAAAGCTTGGAACCTGGCCAACCATGATGGATTTGGCTACAACTTGTGCTCAAATGAAAATATTCTACCCTGATGTTCATGATGCAGAACTGCCTAGAATACTAGTCGATCACGAAACGCAGACATGCCATGTGGTTGACTCGTTTGGCTCACAAACAACTGGGTATCATATTTTGAAAGCATCTAGCGTGTCCCAACTTATTTTGTTTGCTAATGATGAGTTGGAGTCTGACATTAAGCACTATAGAGTTGGTGGTATTCCTGGAGCATGCCCTGAGCTTGGGTCCACAATATCACCTTTTAGAGAAGGAGGAATCATAATGTCTGAGTCAGCAGCGCTAAAACTGCTCCTAAAGGGAATTTTTAGGCCCAAAGTGATGAAGCAATTGCTACTGGATGAACCATATTTGCTCATTTTATCGATATTATCTCCTGGTATACTTATGGCCATGTACAACAATGGGATATTTGAGTTAGCGGTGAAGTTGTGGATCAATGAGAAACAATCTATAGCCATGATAGCATCGTTATTGTCCGCCTTGGCTTTACGAGTGTCAGCAGCAGAAACACTCGTTGCACAGAGGATTATAATTGACACGGCAGCAACAGATCTTCTCGATGCTACGTGTGATGGATTCAACTTACATCTAACATATCCCACTGCACTCATGGTGTTGCAAGTTGTTAAGAACAGAAATGAATGTGATGATACGTTGTTTAAAGCAGGTTTTTCACATTACAACATGAGTGTCGTGCAGATTATGGAAAAAAATTATCTAAGCCTCTTGGGCGATGCTTGGAAAGATTTAACCTGGCGAGAAAAATTATCCGCAACATGGCACTCATACAAAGCAAAGCGCTCTATCACTCAGTTCATAAAACCCATAGGCAAAGCAGATTTAAAAGGGTTGTACAACATATCACCGCAAGCATTCTTGGGTCAGGGCGTACAGAGAGTCAAAGGCACCGCCTCAGGGTTGAATGAGCGACTCAATAATTATATCAATACTAAGTGTGTAAATATTTCATCCTTTTTCATTCGTAGAATTTTCCGGCGCTTGCCAACTTTTGTAACTTTCATTAATTCATTATTAGTTATTAGTATGCTAACTAGTGTAGTAGCAGTGTGTCAAGCAATAATTCTAGATCAAAGGAAGTATAGAAAAGAAATTGAGTTGATGCAGATTGAGAAGAATGAAATTGTTTGTATGGAGTTGTATGCGAGTCTGCAGCGCAAACTTGAGCGTGAATTCACATGGGATGAATATATGGAATATTTGAAATCTGTGAATCCCCAGATAGTTCAATTCGCGCAAGCTCAAATGGAAGAATATAATGTGCGACATCAGCGCTCCACACCAGGTGTTAAGAATTTAGAGCAGGTGGTAGCATTTATAACTCTAATTATCATGATGTTTGATGCTGAAAGGAGCGACTGTGTATTTAAGACTCTCAACAAATTCAAAGGCATCGTTTCTTCAATGGATCATGAAGTTAGACACCAGTCCTTGGATGATGTAATCAAGAATTTCGATGAAAGGAACGAAGTTATTGATTTTGAACTAAATGAGGATACAATTAAAACATCATCAGTGTTGGACACAAAGTTTAGCGACTGGTGGGATCGGCAAATCCAAATGGGACACACACTTCCCCATTATAGAACTGAGGGACACTTCATGGAATTCACAAGGGCAACTGCTGTACAAGTGGCCAACGACATCGCGCATAGTGAGCACCTAGACTTTCTAGTGAGGGGAGCTGTTGGGTCTGGAAAATCTACTGGACTGCCTGTCCATCTCAGTGCAGCTGGATCTGTGCTTTTGATAGAACCAACTCGACCACTTGCAGAAAACGTGTTCAAG
CAATTATCCAGTGAACCGTTTTTCAAGAAGCCAACACTGCGCATGCGAGGAAATAGTGTGTTTGGTTCCTCTCCAATCTCCATTATGACTAGCGGCTTTGCGTTGCACTACTATGCTAATAATCGCTCTCAGCTAACTCAGTTTAATTTCATAATTTTTGATGAATGTCATGTTTTAGATCCTTCTGCAATGGCATTTCGTAGCTTGTTAAGTGTGTATCACCAAACATGCAAAGTGTTAAAGGTGTCAGCCACTCCAGTGGGAAGGGAGGTCGAGTTCACAACACAACAACCAGTTAAATTGGTGGTTGAGGATACACTTTCATTCCAATCTTTTGTTGATGCGCAAGGCTCAAAAACCAATGCTGACGTAGTTCAGCATGGTTCGAACATACTCGTGTATGTGTCGAGTTACAATGAAGTGGATACATTAGCCAAGCTTCTAACAGATAGGAATATGATAGTCTCAAAAGTTGATGGCAGAACAATGAAGCACGGATGCTTAGAAATTGTAACGAAAGGGACTAGTGCAAAGCCACATTTTGTCGTAGCAACCAACATTATTGAAAATGGAGTAACTTTAGATATAGATGTAGTTGTAGATTTTGGGCTTAAAGTCTCACCGTTTTTAGATATTGACAATAGGAGCATAGCATACAATAAGATTAGTGTTAGCTATGGAGAAAGAATTCAGAGGTTGGGCCGTGTTGGGCGCTTTAAGAAGGGAGTGGCATTGCGTATTGGACACACCGAAAAGGGAATTATTGAGATTCCAAGTATGATTGCTAGTGAAGCTGCGCTTGCGTGCTTTGCATACAATTTGCCAGTAATGACAGGGGGTGTTTCAACTAGCCTCATTGGCAATTGTACTGTTCGTCAAGTTAAAACTATGCAACAATTTGAGCTGAGTCCATTCTTTATACAAAATTTTGTTGCCCATGATGGATCAATGCATCCTGTCATACATGACATTCTTAAGAAGTATAAACTGCGAGATTGTATGACGCCCTTGTGTGATCAATCCATACCTTACAGAGCCTCAAGCACTTGGTTGTCTGTTAGTGAGTACGAACGACTCGGAGTGGTTTTGGACATTCCAAAACAGATCAAGATTGCATTCCACATCAAGGATATCCCTCCTAAGTTGCATGAAATGCTTTGGGAAACAGTTATCAAATATAAGGATGTTTGTTTGTTTCCAAGTATTCGGGCTTCATCCATTAGCAAAATTGCATACACACTGCGCACTGATCTTTTTGCAATTCCCAGAACCCTAATTCTAGTTGAAAGATTGCTCGAGGAGGAACGAGTGAAACAGAGTCAATTCAGAAGTCTCATTGATGAAGGATGCTCAAGCATGTTTTCAATTGTTAATTTAACAAACACTCTTAGAGCTAGATATGCAAAGGATTACACTGCAGAAAACATACAGAAGCTCGAGAAAGTGAGGAGTCAGTTAAAGGAGTTCTCAAATTTAAATGGCTCTGCATGCGAGGAGAACTTAATGAAGAGGTATGAATCTCTACAGTTTGTGCATCATCAAGCAACAACTGCACTCGCAAAGGATTTGAAGTTGAAAGGAGTTTGGAAGAAGTCATTAGTTGTGCAGGACTTAATCATAGCGGGTGCCGTTGCTATTGGTGGAATAGGGCTCATCTATAGTTGGTTTACTCAATCAGTTGAAACTGTGTCTCACCAGGGCAAGAACAAATCCAAAAGAATTCAAGCATTGAAGTTTCGACACGCCCGCGATAAGAGGGCTGGTTTTGAAATTGATAACAATGATGATACAATAGAAGAATTCTTTGGATCTGCATACAGGAAGAAGGGAAAAGGTAAAGGCACCACTGTTGGTATGGGCAAGTCAAGCAGGAGGTTTGTTAATATGTATGGATTTGACCCAACAGAATATTCATTCATCCAGTTCGTTGATCCGCTCACTGGAGCTCAAATTGAAGAGAACGTCTATGCTGATATTAGAGACATCCAAGAGCGCTTTAG
TGATGTCCGCAAGAAAATGGTAGAGGATGATGAAATCGAATTGCAAGCATTGGGCAGCAACACAATCATTCATGCTTACTTCAGGAAGGATTGGTCTGACAAGGCTCTAAAAATTGATTTGATGCCACACAACCCACTCAAAATCTGTGATAAATCGAATGGCATTGCTAAGTTTCCTGAAAGAGAACTTGAGTTGAGGCAAACTGGGCCAGCAACAGAGGTTGATGTGAAAGACATTCCAAAACAGGAAGTGGAGCATGAAGCCAAATCACTCATGAGAGGTTTAAGGGATTTCAATCCAATTGCTCAAACAGTTTGCAGAGTAAAAGTGTCTGTTGAATATGGAACGTCTGAAATGTATGGGTTCGGTTTTGGTGCGTATATTATAGTAAACCACCATCTATTCAAGAGTTTCAATGGATCCATGGAAGTGCGATCAATGCATGGAACATTCAGAGTGAAGAATTTGCATAGCCTGAGCGTTTTACCGATCAAAGGCAGAGACATTATCATCATAAAGATGCCAAAGGATTTCCCTGTTTTCCCACAAAAACTGCACTTCCGAGCTCCAGTGCAGAATGAGAGGATTTGTTTGGTTGGAACTAATTTTCAAGAAAAACATGCATCATCAATCATCACAGAAACGAGTACTACATACAATGTACCGGGCAGCACTTTTTGGAAGCATTGGATTGAAACAAATGATGGGCATTGTGGATTACCAGTAGTGAGTACAGCTGATGGATGTCTAGTTGGAATACACAGCTTGGCGAATAATGTGCAAACCACGAATTATTATTCAGCCTTTGATGAGGATTTTGAAAGTAAGTATCTCCGAACTGATGAGCATAATGAGTGGACCAAATCGTGGGTATATAACCCAGATACTGTGTTGTGGGGTCCATTGAAGCTCAAAGAGAGTACCCCTAAAGGCCTGTTTAAGACAACAAAACTTGTACAGGATTTAATTGATCATGATGTTGTTGTAGAGCAAGCTAAACATTCTGCGTGGATGTATGAGGCTCTAACAGGGAATTTGCAAGCTGTGGCGACAATGAAGAGTCAGCTAGTGACAAAGCACGTGGTCAAAGGGGAGTGTCGGCACTTCAAAGAGTTCTTAACTGTGGATTCGGAAGCAGAAGCTTTCTTCAGGCCTTTGATGGATGCTTATGGGAAGAGCTTGTTAAATAGAGAAGCATATATAAAGGACATAATGAAATACTCAAAGCCTATTGATGTTGGAATAGTAGACTGTGATGCTTTTGAAGAGGCTATCAATAGGGTTATCATTTATCTGCAAGTACATGGCTTCCAGAAATGCAATTACATCACCGATGAGCAGGAAATTTTCAAAGCTCTCAATATGAAAGCTGCTGTCGGGGCTATGTATGGAGGCAAGAAGAAAGACTACTTCGAGCATTTTACTGAGGCGGATAAAGAGGAAATTGTTATGCAAAGTTGCTTACGATTGTACAAGGGCTCACTTGGCATATGGAATGGATCATTGAAAGCAGAACTTCGGTGCAAAGAGAAGATACTTGCAAATAAGACAAGGACATTCACTGCTGCACCTTTAGATACTCTACTGGGTGGGAAGGTGTGCGTTGATGATTTTAATAATCAATTCTACTCAAAGAACATTGAATGCTGCTGGACTGTTGGAATGACTAAGTTTTATGGAGGTTGGGACAAATTGCTTCGGCGTCTACCTGAAAATTGGGTGTACTGCGATGCCGATGGTTCACAATTCGATAGTTCACTCACCCCATACCTAATTAATGCTGTTCTCATCATCAGAAGCACATACATGGAAGATTGGGACTTGGGGTTGCAAATGTTGCGCAATTTGTACACAGAAATAATTTACACACCAATCTCAACTCCAGATGGAACAATTGTCAAGAAGTTTAGAGGTAATAATAGCGGTCAACCTTCTACCGTTGTGGATAATTCTCTCATGGTTGTTCTTGCTATGCATTACGCTCTCATTAAGGAGTGCG
TTGAGTTTGAAGAAATCGACAGCACGTGTGTATTCTTTGTTAATGGTGATGACTTATTGATTGCTGTGAATCCGGAGAAAGAGAGCATTCTCGATAGAATGTCACAACATTTCTCAGATCTTGGTTTGAACTATGATTTTTCGTCGAGAACAAGAAGGAAGGAGGAATTGTGGTTCATGTCCCATAGAGGCCTGCTAATTGAGGGTATGTACGTGCCAAAGCTTGAAGAAGAGAGAATTGTATCCATTCTGCAATGGGATAGGGCTGATCTGCCAGAGCACAGATTAGAAGCGATTTGTGCAGCAATGATAGAATCCTGGGGTTATTTTGAGTTAACGCACCAAATTAGGAGATTCTACTCATGGTTGTTACAACAGCAACCTTTTTCAACGATAGCACAGGAAGGAAAAGCTCCATACATAGCGAGCATGGCATTGAAGAAGCTGTACATGAATAGGACAGTAGATGAGGAGGAACTGAAGGCTTTCACTGAAATGATGGTTGCCTTGGATGACGAATTTGAGTGCGATACTTATGAAGTGCACCATCAAGGAAATGACACAATCGATGCAGGAGGAAGCACTAAGAAGGATGCAAAACAAGAGCAAGGTAGCATTCAACCAAATCTCAACAAGGAAAAGGAAAAGGACGTGAATGTTGGAACATCTGGAACTCATACTGTGCCACGAATTAAAGCTATCACGTCCAAAATGAGAATGCCTAAGAGTAAAGGTGCAACTGTACTAAATTTGGAACACTTACTCGAGTATGCTCCACAGCAAATTGACATCTCAAATACTCGAGCAACTCAATCACAGTTTGATACGTGGTATGAAGCAGTACAACTTGCATACGACATAGGAGAAACTGAAATGCCAACTGTGATGAATGGGCTTATGGTTTGGTGCATTGAAAATGGAACCTCGCCAAACATCAACGGAGTTTGGGTTATGATGGATGGAGATGAACAAGTCGAATACCCACTGAAACCAATCGTTGAGAATGCAAAACCAACACTTAGGCAAATCATGGCACATTTCTCAGATGTTGCAGAAGCGTATATAGAAATGCGCAACAAAAAGGAACCATATATGCCACGATATGGTTTAGTTCGTAATCTGCGCGATGGAAGTCTGGCTCGCTATGCTTTTGACTTTTATGAAGTTACATCACGGACACCAGTGAGGGCTAGAGAGGCACACATTCAAATGAAGGCCGCAGCTTTAAAATCAGCTCAATCTCGACTTTTCGGATTGGATGGTGGCATTAGTACACAAGAGGAAAACACAGAGAGGCACACCACCGAGGATGTTTCTCCAAGTATGCATACTCTACTTGGAGTGAAGAACATGTGA >CF_YL21 
ATGGCAATCTACACGTCAACAATCTGTTTCGGTTCGTTTGAATGCAAGCTACCATACTCACCCGCCTCTTGCGGGCATATTGTGAAGGAACGAGAAGTGCTGGCTTCCATCGATCCTTTCGCAGATCTGGAAACACAACTTAGTGCACGATTGCTCAAGCAAGAATATGCTACTGTTCGTGTGCTCAAGAACGGTACTCTTACGTACCGATACAAGACTGATGCCCAGATAACGCGCATCCAGAAGAGACTGGAAAGGAAGGATAGGGAAGAATACCACTTCCAGATGGCAGCTCCTAGTATTGTGTCAAAGATCACTATTGCTGGTGGAGAGCCACCTTCAAAACTTGAATCACAAGTGCGGAGGGGTGTCATCCACACAACTCCAAGGATGCGCACAGCAAAAACATATCACACGCCAAAGCTGACAGAGGGACAAATGAACCACCTTATCAAGCAGGTGAAGCAAATTATGTCAACCAAAGGAGGGTCTGTTCAACTGATTAGCAAGAAAAGTACCCATGTTCACTATAAAGAAGTTTTGGGATCACATCGCGCAGTTGTTTGCACTGCACATATGAGAGGTTTACGAAAGAGAGTGGACTTTCGGTGTGATAAATGGACCGTTGTGCGTCTACAGCATCTCGCCAGGACGGATAAGTGGAATAACCAAGTTCGTGTTACTGATCTACGCAAGGGCGATAGTGGAGTTATATTGAGTAATACTAATCTCAAAGGAAACTTTGGGAGAAGCTCGGAGGGCATATTCATAGTGCGTGGGTCGCACGAAGGAAAAATCTATGATGCACGTTCCAAGGTTACTCAAGGGGTTATGGATTCAATGGTTCAGTTCTCAAGCGCTGAAAGCTTTTGGAAGGGATTGGACGGCAATTGGGCACAAATGAGATATCCTACAGATCATACATGTGTGGCAGGCTTACCAGTTGAAGACTGTGGCAGAGTTGCAGCGATAATGACACACAGTATTTTACCGTGCTATAAGATAACCTGCCCTACCTGTGCCCAACAATATGCCAACTTGCCAGCCAGTGACTTACTTAAGATATTACACAAGCACGCAAGTGATGGTTTAAATCGATTGGGGGCAGACAAAGATCGCTTTGTGCATGTCAAAAAGTTCTTGACAATCTTAGAGCACTTAACTGAACCGGTTGATCTGAGTCTAGAAATTTTCAATGAAGTATTCAAGTCTATAGGGGAGAAGCAACAATCACCTTTCAAAAACCTGAATATTCTGAATAATTTCTTTTTAAAAGGAAAGGAAAATACAGCTCGTGAATGGCAGGTGGCTCAATTAAGCTTACTTGAATTGGCAAGATTCCAAAAGAACAGAACGGATAATATCAAGAAAGTAGACATCTCGTTCTTTAGGAATAAACTATCTGCCAAAGCAAATTGGAACTTGTATCTGTCATGTGATAACCAGCTGGATAAGAATGCAAACTTCCTGTGGGGACAGAGGGAATATCATGCTAAGCGATTTTTCTCGAACTATTTCGAGGAAATTGATCCAGCGAAGGGCTATTCAGCATACGAAAATCGTTTGCATCCGAATGGGACAAGAAAACTTGCAATTGGAAACCTAATTGTACCACTTGATCTGGCTGAGTTTAGGCGGAAGATGAAAGGTGATTTTAAAAGACAGCCAGGGGTGAGTAAGAAGTGCACGAGCTCGAAGGATGGAAACTACGTGTATCCCTGTTGTTGCACTACACTTGATGATGGCTCAGCTGTTGAATCAACATTTTACCCGCCAACTAAGAAGCGCCTCGTAATAGGTAATAGTGGCGACCAAAAGTATGTTGACTTACCAAAAGGGAATTCTGAGATGTTATATATTGCCAGGCAAGGCTTCTGTTACATTAACATTTTCCTCGCGATGTTGATTAACATTAGTGAGGAAGATGCAAAGGATTTCACTAAGAAGGTTCGTGACATGTGTGTGCCAAAGCTCGGAACCTGGCCAACCATGATGGATCT
GGCTACAACTTGTGCTCAAATGAAAATATTCTACCCTGATGTTCATGATGCAGAACTGCCTAGAATACTAGTCGATCACGAAACGCAGACATGCCATGTGGTTGACTCGTTTGGCTCACAAACAACTGGGTATCATATTTTGAAAGCATCTAGCGTGTCCCAACTTATTTTGTTTGCTAATGATGAGTTGGAGTCTGACATTAAGCACTATAGAGTTGGTGGTATTCCTAATGCATGCCCTGAACTTGGGTCCACAATATCACCTTTCAGAGAAGGAGGAGTTATAATGTCTGAGTCGGCAGCGCTGAAACTGCTTTTGAAGGGAATTTTTAGACCTAAGGTGATGAGACAGTTGCTGTTAGATGAGCCTTACCTGTTGATTCTATCAATATTATCCCCTGGCATACTGATGGCTATGTATAATAATGGGATTTTTGAACTTGCGGTGAGGTTGTGGATTAATGAGAAGCAATCCATAGCGATGATAGCATCGCTACTATCAGCTTTAGCCCTACGAGTGTCAGCGGCAGAAACACTCGTCGCACAGAGGATTATAATTGATGCTGCAGCTACAGACCTCCTCGATGCTACGTGTGACGGGTTCAACCTGCATCTAACGTACCCCACTGCATTGATGGTGTTGCAAGTTGTTAAGAATAGAAATGAATGTGATGATACTCTATTCAAGGCGGGTTTTCCAAGTTACAACACGAGCGTCGTACAGATTATGGAAAAAAATTATCTAAATCTCTTGAACGATGCTTGGAAAGATTTAACTTGGCGGGAAAAATTTTCCGCAACATGGTACTCATACAGAGCAAAACGCTCTATCACTCGGTACATAAAACCCACAGGAAGGGCAGATTTGAAAGGGTTATACAACATATCACCACAAGCATTCTTGGGCCGAAGCGTCCAGGTGGTCAAAGGCACTGCCTCAGGATTGAGCGAGCGATTTAATAATTACTTCAATACTAAGTGTGTAAATATTTCATCCTTTTTCATTCGTAGAATCTTTAGGCGTTTGCCAACTTTCGTCACTTTTGTTAACTCATTATTAGTTATTAGTATGTTAACTAGTGTAGTGGCAGTGTGTCAGGCAATAATTTTAGATCAGAAGAAGTATAGGAGAGAGATCGAGTTGATGCAGATAGAGAAGAATGAGATTGTCTGCATGGAGCTATACGCAAGTTTACAGCGCAAACTTGAACGCGATTTCACATGGGATGAGTACATTGAGTATTTGAAATCAGTGAACCCTCAGATAGTTCAATTTGCTCAAGCGCAGATGGAAGAATATGATGTGCGACACCAGCGTTCCACACCAGGTGTTAAAAATTTGGAACAAGTGGTAGCATTTATGGCTTTAGTCATCATGGTGTTCGATGCTGAAAGGAGTGACTGCGTGTTCAAAACTCTCAATAAATTTAAGGGTGTCCTTTCCTCACTGGACCATGAAGTTAGACATCAGTCCTTGGATGATGTGATCAAGAATTTTGATGAGAGGAATGAGACTATTGATTTTGAATTGAGTGAGGACACAATTCGAACATCATCAGTGCTAGATACAAAGTTTAGTGATTGGTGGGACCGACAAATCCAAATGGGACATACACTTCCACATTACAGAACCGAGGGGCACTTCATGGAATTTACAAGGGCAACTGCTGTTCAAGTGGCTAATGACATTGCCCACAGCGAACACCTAGACTTTCTAGTAAGGGGAGCTGTTGGGTCTGGAAAGTCAACTGGGTTGCCTGTTCATCTTAGTGTAGCTGGATCTGTGCTTTTAATTGAGCCAACGCGGCCACTAACGGAGAACGTTTTCAAACAGCTATCTAGTGAACCATTCTTCAAGAAGCCAACACTGCGTATGCGTGGAAATAGTATATTTGGCTCTTCCCCAATCTCCGTTATGACTAGCGGGTTTGCGCTACACTACTTCGCCAATAATCGCTCTCAATTAGCTCAGTTCAATTTTGTAATATTTGATGAGTGCC
ATGTTCTGGATCCTTCCGCAATGGCGTTCCGCAGTCTGCTGAGTGTTTATCATCAAGCATGTAAAGTATTAAAAGTGTCAGCTACTCCAGTGGGAAGAGAGGTTGAATTCACAACACAGCAGCCAGTCAAGTTAATAGTGGAAGACACACTGTCTTTCCAATCATTTGTTGATGCACAAGGTTCTAAAACTAATGCTGATGTTGTCCAGTTTGGTTCAAACATACTTGTTTACGTGTCGAGCTACAATGAAGTTGACAACTTGGCTAAGCTCCTAACAGATAAGAATATGATGGTCACAAAGGTTGATGGCAGAACAATGAAGCACGGTTGCCTAGAAATTGTCACAAAAGGAACCAGTGCGAGACCACATTTTGTTGTAGCAACCAACATAATTGAGAATGGAGTGACTTTGGACATAGATGTAGTAGTGGATTTTGGGTTGAAAGTCTCACCGTTCTTGGACATTGACAATAGGAGCATTGCTTACAATAAGGTGAGTGTTAGTTATGGTGAGAGAATTCAAAGGCTGGGTCGTGTTGGACGCTTCAAGAAAGGAGTAGCATTGCGCATTGGACACACTGAGAAGGGAATTATCGAAATTCCAAGCATGATCGCTACTGAGGCGGCTCTTGCTTGCTTTGCATATAACTTACCAGTGATGACAGGAGGCGTCTCAACTAGTCTGATTGGCAATTGTACTGTGCGCCAGGTCAAAACAATGCAGCAATTTGAATTGAGCCCCTTCTTTATCCAGAATTTCGTTGCTCATGATGGATCAATGCATCCTGTCATACATGACATTCTCAAAAAGTATAAACTCCGAGATTGTATGACACCTTTGTGCGATCAGTCTATACCATACAGGGCATCGAGCACTTGGTTATCGGTTAGTGAATATGAGCGACTTGGAGTGGCTTTAGAAATTCCAAAGCAAGTCAAAATTGCATTCCATATCAAAGAGATCCCTCCTAAGCTCCACGAAATGCTTTGGGAAACGGTTGTCAAGTACAAAGACGTTTGCTTATTTCCAAGCATTCGAGCATCGTCCATCAGCAAAATCGCATACACATTGCGTACAGACCTCTTCGCCATCCCAAGAACTCTAATATTGGTGGAGAGATTGCTTGAAGAGGAGCGAGTGAAGCAGAGCCAATTCAGAAGTCTCATCGATGAAGGGTGCTCAAGCATGTTTTCAATTGTCAACTTGACCAACACTCTCAGGGCCAGATATGCAAAAGATTACACCGCAGAGAACATACAAAAACTTGAGAAAGTGAGAAGTCAATTAAAAGAATTCTCAAATTTGGATGGTTCTGCATGTGAGGAAAATTTAATAAAGAGGTATGAGTCTTTGCAGTTCGTTCACCACCAAGCTGCGACGTCACTTGCAAAGGATCTCAAGTTGAAGGGGACTTGGAAGAAGTCATTAGTGGCTAAAGACTTGATCATAGCAGGCGCTGTTGCAATTGGTGGAATAGGACTCATATATAGTTGGTTCACACAATCAGTTGAGACTGTGTCTCACCAAGGGAAAAATAAATCCAAAAGAATCCAAGCCTTGAAGTTTCGCCATGCTCGTGACAAAAGGGCTGGCTTTGAAATTGACAACAATGATGACACAATAGAGGAATTCTTTGGATCTGCATACAGGAAGAAGGGAAAAGGTAAAGGTACCACAGTTGGTATGGGCAAGTCAAGCAGGAGGTTCATTAACATGTACGGGTTTGACCCGACAGAGTACTCATTCATCCAATTCGTTGATCCACTCACTGGGGCACAAATAGAAGAGAATGTCTATGCTGACATTAGAGATGTTCAAGAAAGATTTAGTGAGGTGCGAAAGAAAATGGTTGAGAATGATGATATTGAAATGCAAGCCTTGGGTAGTAACACGACCATACATGCATACTTTAGGAAAGATTGGTCTGACAAAGCTTTGAAGATTGATTTAATGCCACATAACCCACTCAAAGTTTGTGACAAAACAAATGGCATTGCC
AAACTTCCTGAGAGAGAGTTCGAACTAAGGCAGACTGGGCCAGCTGTAGAAGTCGACGTGAAGGACATACCAGCGCAGGAAGTGGAGCATGAAGCCAAATCGCTCATGAGAGGCTTGAGAGACTTCAACCCAATTGCTCAAACAGTTTGTAGGCTGAAAGTATCTGTTGAATATGGGACATCAGAGATGTACGGTTTTGGATTTGGAGCATACATAATAGCGAACCACCATTTGTTTAGGAGTTACAATGGTTCCATGGAGGTGCGATCCATGCATGGTACATTCAGGGTGAAGAATCTACACAGTTTGAGCGTTCTGCCAATTAAAGGCAGGGACATCATCCTCATCAAAATGCCGAAGGATTTCCCTGTTTTTCCACAGAAATTGCATTTCCGAGCTCCTACACAGAATGAAAGAGTTTGTTTAGTTGGGACCAACTTTCAGGAGAAGTATGCATCGTCGATCATCACAGAAACAAGCACTACTTACAATATACCAGGCAGCACATTCTGGAAGCATTGGATTGAAACAGATAATGGACATTGTGGACTACCAGTGGTGAGCACAGCCGATGGATGTCTAGTCGGAATTCATAGTTTGGCAAACAATGCACACACCACGAACTACTACTCAGCCTTCGATGAAGATTTTGAAAGCAAGTACCTCCGAACCAATGAGCACAATGAATGGGTCAAATCTTGGGTTTATAATCCAGACACAGTGTTGTGGGGCCCGTTGAAGCTTAAAGATAGCACTCCCAAAGGGTTATTCAAAACAACAAAGCTTGTGCAAGATCTAATCGATCATGATGAAGTCGCGGAGCAAGCTAAGCACTCTGCGTGGATGTTTGAAGCCTTGACAGGAAATTTGCAAGCTGTCGCAACAATGAAGAGTCAATTAGTAACCAAGCATGTAGTTAAAGGAGAGTGTCGACACTTCAAAGAATTCCTGACTGTGGATGCAGAAGCAGAGGCATTCTTCAGGCCTTTGATGGATGCATATGGGAAAAGCTTGCTGAATAGAGATGCGTACATCAAAGACATAATGAAGTACTCAAAACCTATAGATGTTGGTATCGTGGACTGTGATGCATTTGAAGAAGCCATCAATAGGGTTATCATCTACTTGCAAGTGCACGGCTTCAAGAAGTGTGCATATGTCACTGACGAGCAAGAAATTTTCAAAGCGCTCAACATGAAAGCTGCAGTCGGAGCCATGTATGGTGGCAAAAAGAAAGACTATTTTGAGCATTTCACTGATGCAGATAAGGAAGAAATAGTCATGCAAAGCTGTCTGCGACTGTATAAAGGCTTGCTCGGCATTTGGAACGGATCGTTGAAGGCAGAGCTCCGGTGTAAGGAAAAGATACTTGCAAACAAGACGAGGACGTTCACTGCTGCACCTCTAGACACTTTGCTGGGTGGTAAAGTGTGTGTTGATGACTTCAATAATCAATTTTATTCAAAGAATATTGAATGCTGTTGGACAGTTGGGATGACTAAGTTTTATGGTGGTTGGGATAAACTGCTTCGGCGTTTACCTGAGAATTGGGTATACTGTGATGCCGATGGCTCACAGTTTGATAGTTCACTAACTCCATACTTAATCAATGCTGTTCTCACCATCAGAAGCACATACATGGAAGACTGGGATGTGGGGTTGCAAATGCTGCGTAATTTATACACTGAGATTATTTACACACCTATCTCAACTCCAGATGGAACAATTGTTAAGAAGTTCAGAGGAAATAACAGTGGTCAGCCTTCTACTGTTGTGGACAACTCTCTTATGGTTGTCCTTGCCATGCACTATGCTCTCATCAAAGAATGCATTGAGTTTGAAGAGATTGACAGCACGTGCGTGTTCTTCGTCAATGGTGATGATTTGCTGATTGCCGTGAATCCGGATAAAGAGGGCATTCTTGATAGATTGTCACAGCACTTCTCAGATCTTGGTTTAAATTATGATTTTTCGTCAAGAACAAGAAATAAGGAAGAGTT
ATGGTTTATGTCTCATAGAGGCCTACTGATTGAGGGCATGTACGTGCCGAAACTTGAAGAAGAAAGGATTGTGTCCATTCTCCAATGGGACAGAGCAGACTTGGCTGAACACAGGCTTGAGGCGATTTGCGCAGCTATGATAGAGTCCTGGGGTTATTCTGAACTAACACACCAAATTAGAAGATTCTACTCATGGTTATTGCAACAGCAACCTTTTGCAACAATAGCGCAGGAAGGGAAGGCTCCTTATATAGCAAGCATGGCATTAAGGAAACTGTATATGGATAGAGCTGTGGATGAGGAAGAGCTGAGAGCCTTCACTGAAATGATGGTCGCATTAGACGATGAGTTTGAGTGTGACTCTTATGAAGTACACCACCAGGGAAACGATACAATCGATGCAGGAGGAAGCAGCAAGAAAGATGCAAGACCAGAGCAAGGCAGCATCCAGTCAAACCCGAACAAAGGAAAAGATAAGGATGTGAATGCTGGTACATCTGGGACACATACTGTGCCGAGAATCAAGGCTATCACGTCCAAAATGAGAATGCCCAAAAGCAAAGGAGCAACCGTGCTAAACTTAGAACACTTGCTTGAGTATGCTCCACAACAAATTGATATTTCAAATACTCGGGCAACTCAATCACAGTTTGATACGTGGTATGAGGCAGTGCGGATGGCATACGACATAGGAGAAACTGAGATGCCAACTGTGATGAATGGGCTTATGGTTTGGTGCATTGAAAATGGAACCTCGCCAAATGTCAACGGAGTCTGGGTTATGATGGATGGGGATGAACAAGTCGAGTACCCGTTGAAACCAATCGTTGAGAATGCAAAACCAACCCTTAGGCAAATCATGGCACATTTCTCAGATGTTGCAGAAGCGTATATAGAAATGCGCAACAAAAAGGAACCATATATGCCACGATATGGTTTAATTCGAAATCTGCGGGATGTGGGTTTAGCGCGTTATGCCTTTGACTTTTATGAGGTCACATCACGAACACCAGTGAGGGCTAGGGAAGCGCACATTCAAATGAAGGCCGCAGCATTGAAATCAGCCCAACCTCGACTTTTCGGGTTGGACGGTGGCATCAGTACACAAGAGGAGAACACAGAGAGGCACACCGCCGAGGATGTCTCTCCAAGTATGCATACTCTACTTGGAGTCAAAAACATGTGA ================================================ FILE: inst/extdata/GVariation/B.Oz.fas ================================================ >Oz 
ATGGCAACTTACATGTCAACAATCTGTTTCGGTTCGTTTGAATGCAAGCTACCATACTCACCCGCTTCTTGCGGGCATATTGTGAAGGAGCGAGAAGTGCTGGCTTCCGTTGATCCTTTCGCAGATCTGGAAACACAACTTAGTGCACGATTGCTCAAGCAAGAATATGCTACTGTTCGTGTGCTCAAGAACGGTACTCTTACTTACCGATACAAAACTGATGCCCAGATAACGCGCATTCAGAAGAAACTGGAGAGGAAGGATAGGGAAGAATATCACTTCCAGATGGCCGCTCCTAGTATTGTGTCAAAAATTACAATAGCTGGTGGAGATCCTCCATCAAAGTCTGAGCCACAAGCACCAAGAGGGATCATTCATACAACTCCAAGGGTGCGTAAAGTCAAGACACGTCCCATAATAAAGTTGACAGAAGGCCAGATGAATCATCTCATTAAGCAGGTGAAGCAGATTATGTCGGAGAAGAGAGGGTCTGTCCACTTAATTAGTAAGAAGACCACTCATGTTCAATATAAGGAGATACTTGGAGCAACTCGCGCAGCGGTTCGAACTGCACATATGATGGGTTTGCGACGGAGAGTGGACTTCCGATGTGATATGTGGACAGTCGGACTTTTGCAACGTCTCGCTCGGACGGACAAATGGTCCAATCAAGTCCGCACTATCAACATACGAAGGGGTGATAGTGGAGTCATTTTGAACACAAAAAGCCTCAAAGGCCACTTTGGTAGAAGTTCAGGAGACTTGTTCATAGTGCGTGGATCACACGAAGGGAAATTGTACGATGCACGTTCTAGAGTTACTCAGAGTGTTTTGAACTCAATGATCCAGTTTTCGAATGCTGATAATTTTTGGAAGGGTCTAGACGGTAATTGGGCACAACTGAGATATCCTTCGGATCACACATGTGTAGCTGGTTTACCTGTCGAAGATTGTGGTAGAGTTGCTGCATTGATGGCACACAGTATCCTCCCGTGCTACAAGATAACCTGCCCCACCTGTGCTCAACAGTATGCCAGCTTGCCGGTTAGCGATCTGTTTAAGCTGTTGCATAAACATGCGAGAGATGGTTTGAACCGATTGGGAGCGGATAAAGACCGGTTTATACATGTTAATAAGTTCTTGATAGCGTTAGAGCATCTAACTGAACCGGTGGATTTGAATCTCGAGCTTTTCAATGAGATATTTAAATCCATAGGGGAGAAGCAGCAAGCACCGTTCAAGAATTTAAATGTCTTAAATAATTTCTTCCTGAAAGGAAAAGAAAATACAGCTCATGAATGGCAAGTGGCTCAATTGAGTTTGCTCGAATTAGCAAGGTTCCAGAAGAATAGAACTGATAACATCAAGAAAGGTGATATATCTTTCTTCAGAAATAAATTATCTGCCAAGGCAAACTGGAATCTGTATTTGTCGTGCGACAACCAGTTGGATAAAAATGCAAATTTTCTGTGGGGACAAAGGGAGTATCATGCTAAGCGGTTTTTCTCAAACTTCTTTGAGGAAATTGATCCAGCAAAGGGATACTCAGCATATGAAATCCGCAAGCATCCAAATGGAACAAGGAAGCTCTCAATTGGTAACTTAGTTGTCCCACTTGATTTAGCTGAGTTTAGGCAGAAGATGAAAGGTGACTATAGGAAACAACCAGGAGTTAGCAGAAAGTGCACGAGTTCGAAAGATGGTAATTATGTGTATCCCTGTTGTTGCACAACACTTGATGATGGTTCAGCTATTGAATCAACATTCTATCCACCAACCAAAAAGCACCTTGTAATAGGCAATAGCGGTGACCAAAAATTTGTTGATTTACCAAAAGGGGATTCGGAGATGTTATACATTGCCAAGCAGGGTTATTGTTATATCAACGTGTTTCTTGCAATGCTTATTAACATTAGCGAGGAGGATGCAAAGGATTTCACAAAGAAAGTTCGCGACATGTGTGTGCCAAAGCTTGGAACCTGGCCAACTATGATGGATTT
GGCGACCACTTGTGCTCAAATGAGAATATTCTATCCTGACGTGCATGATGCAGAGCTGCCTAGAATATTGGTTGACCATGACACTCAAACGTGTCACGTGGTTGACTCATTTGGCTCGCAAACAACTGGATATCATATTCTAAAAGCATCCAGCGTGTCTCAACTTATCTTGTTTGCAAATGATGAATTAGAATCTGATATAAAACATTATAGAGTTGGTGGTGTTCCTAATGCATGCCCTGAACTTGGGTCCACAATATCACCTTTCAGAGAAGGAGGAGTTATAATGTCTGAGTCGGCAGCGCTGAAACTGCTTTTAAAGGGAATTTTTAGACCTAAGGTGATGAGACAGTTGCTGTTAGATGAGCCTTACCTGTTGATTTTATCAATATTATCTCCTGGCATACTGATGGCTATGTATAATAATGGGATTTTTGAACTTGCGGTAAGGTTGTGGATTAATGAGAAACAATCCATAGCTATGATAGCATCGCTACTATCAGCTTTAGCCCTACGAGTGTCAGCGGCAGAAACACTCGTCGCACAGAGGATTATCATTGATGCTGCAGCTACAGACCTCCTTGATGCTACGTGTGATGGGTTCAACCTACATCTAACGTACCCCACTGCATTGATGGTGTTGCAAGTTGTTAAGAATAGAAATGAATGTGATGATACCCTATTCAAGGCGGGTTTTTCAAGTTACAACACGAGCGTCGTACAGATTATGGAAAAAAATTATCTAAATCTCTTGAACGATGCTTGGAAAGATTTAACTTGGCGGGAAAAATTATCCGCAACATGGTACTCATACAGAGCAAAACGCTCTATCACTCGGTACATAAAACCCACAGGAAGGGCAGATTTGAAAGGGTTATACAACATATCACCACAAGCATTCTTGGGCCGAAGCGCCCAGGTGGTCAAAGGTACTGCCTCAGGATTGAGTGAGCGATTTAATAATTATTTCAATACTAAGTGTGTAAATATTTCATCCTTTTTCATTCGTAGAATCTTTAGGCGTTTGCCAACTTTCGTCACTTTTGTTAACTCATTATTAGTTATTAGTATGTTAACTAGCGTAGTGGCAGTGTGTCAGGCAATAATTTTAGATCAGAGGAAATATAGGAGAGAAATCGAGTTGATGCAGATAGAGAAGAATGAGATTGTCTGCATGGAGCTATATGCAAGTTTACAGCGCAAACTTGAACGCGATTTCACATGGGATGAGTACATTGAGTATTTGAAATCAGTAAACCCTCAGATAGTTCAGTTTGCTCAAGCGCAGATGGAAGAATATGATGTGCGACACCAGCGTTCCACACCAGGTGTTAAAAATTTGGAACAAGTGGTAGCATTTATGGCTTTAGTCATTATGGTGTTCGATGCTGAAAGGAGTGATTGCGTGTTCAAAACTCTCAATAAATTTAAGGGTGTCCTTTCCTCAATGGACTATGAAGTTAGACATCAGTCCTTAGACGATGTGATCAAGAATTTTGATGAGAGGAATGAGATTATTGATTTTGAATTGAGTGAGGACACAATTCGAACATCATCAGTGCTAGATACAAAGTTTAGTGATTGGTGGGACCGACAAATCCAGATGGGACATACACTTCCACATTACAGAACCGAGGGGCACTTCATGGAATTCACAAGAGCAACTGCTGTCCAAGTGGCTAATGACATTGCCCATAGCGAACACCTAGACTTTCTAGTAAGGGGAGCTGTTGGGTCTGGAAAGTCAACTGGGTTACCTGTTCATCTTAGTGTAGCCGGATCTGTGCTTTTAATTGAACCAACGCGACCACTAGCGGAGAACGTTTTCAAACAGCTATCTAGTGAACCATTCTTCAAGAAGCCAACACTGCGTATGCGTGGAAATAGTATATTTGGCTCTTCTCCAATCTCCGTCATGACTAGCGGATTTGCGCTACACTACTTCGCCAATAATCGCTCTCAATTAGCTCAGTTCAACTTTGTAATATTTGATGAGTGCC
ATGTTCTGGATCCTTCCGCAATGGCGTTCCGCAGTCTGCTGAGTGTTTATCATCAAGCATGCAAAGTATTAAAAGTGTCAGCTACTCCAGTGGGAAGAGAGGTTGAATTTACAACACAGCAACCAGTCAAGTTAATAGTGGAGGACACACTGTCTTTCCAATCATTTGTTGATGCACAAGGTTCTAAAACTAATGCTGATGTTGTTCAGTTTGGTTCAAACGTACTTGTGTACGTGTCGAGCTACAATGAAGTTGATACCTTGGCTAAGCTCCTAACAGACAAGAATATGATGGTCACAAAGGTTGATGGCAGAACAATGAAGCACGGTTGCCTAGAAATTGTCACAAAAGGAACCAGTGCGAGACCACATTTTGTTGTAGCAACCAACATAATTGAGAATGGAGTGACTTTGGACATAGACGTGGTTGTAGATTTTGGGTTAAAAGTCTCACCGTTCTTGGACATTGACAATAGGAGCATTGCTTACAATAAGGTGAGTGTTAGCTATGGTGAGAGAATTCAAAGGCTGGGTCGTGTTGGACGCTTCAAGAAAGGAGTAGCATTGCGCATTGGACACACTGAGAAGGGAATTATTGAAATTCCAAGCATGATCGCTACAGAGGCAGCTCTTGCTTGCTTTGCATATAACTTACCAGTGATGACAGGAGGCGTCTCAACTAGTCTGATTGGCAATTGTACTGTGCGCCAAGTTAAAACAATGCAGCAATTTGAATTGAGTCCCTTCTTTATCCAGAATTTCGTTGCCCATGATGGATCAATGCATCCTGTCATACATGACATTCTTAAAAAGTATAAACTTCGAGATTGTATGACACCTTTGTGCGATCAGTCTATACCATACAGGGCATCGAGCACTTGGTTATCGGTTAGTGAATATGAGCGACTTGGAGTGGCCTTAGAAATTCCAAAGCAAGCCAAAATTGCATTCCATATCAAAGAGATCCCTCCTAAGCTCCACGAAATGCTTTGGGAAACGGTTGTCAAGTACAAAGACGTTTGCTTATTTCCAAGCATTCGAGCATCGTCCATCAGCAAAATCGCATACACATTGCGTACAGACCTCTTCGCCATCCCAAGAACTCTAATATTGGTGGAGAGACTGCTTGAAGAGGAGCGAGTGAAGCAGAGCCAATTCAGAAGTCTCATCGATGAAGGATGCTCAAGCATGTTTTCAATTGTCAACCTGACAAACACTCTCAGAGCTAGATATGCAAAAGATTACACCGCAGAGAACATACAAAAACTTGAGAAAGTGAGAAGTCAATTGAAAGAATTCTCAAATTTGGATGGTTCTGCATGTGAGGAAAATTTAATAAAGAGGTATGAGTCTTTGCAGTTCGTTCATCACCAAGCTGCGACGTCACTTGCAAAGGATCTCAAGTTAAAGGGGACTTGGAAGAAGTCATTAGTGGCCAAAGACTTGATCATAGCAGGCGCTGTTGCAATTGGTGGAATAGGACTCATATATAGTTGGTTCACACAATCAGTTGAGACTGTGTCTCACCAAGGGAAAAATAAATCCAAAAGAATTCAAGCCTTGAAGTTTCGCCATGCTCGTGACAAAAGGGCTGGCTTTGAAATTGACAACAATGATGACACAATAGAGGAATTCTTTGGATCTGCATACAGGAAAAAGGGAAAAGGTAAAGGTACCACAGTTGGTATGGGCAAGTCAAGCAGGAGGTTCATCAACATGTATGGGTTTGATCCAACAGAGTACTCATTCATCCAATTCGTTGATCCACTCACTGGGGCGCAAATAGAAGAGAATGTCTATGCTGACATTAGAGATATTCAAGAGAGATTTAGTGAAGTGCGAAAGAAAATGGTTGAGAATGATGACATTGAAATGCAAGCCTTGGGTGGTAACACGACCATACATGCATACTTTAGGAAAGATTGGTCTGACAAAGCTTTGAAGATTGATTTAATGCCACATAATCCACTCAAAGTTTGTGACAAGACAAATGGCATTGCC
AAATTTCCTGAGAGAGAGCTCGAACTAAGGCAGACTGGGCCAGCTGTAGAAGTCGATGTGAAGGACATACCAGCACAGGAAGTGGAGCATGAAGCTAAATCGCTCATGAGAGGCTTGAGAGACTTCAACCCAATTGCCCAAACAGTTTGTAGGCTGAAAGTATCTGTTGAATATGGGACATCAGAGATGTACGGTTTTGGATTTGGAGCATACATAATAGCGAACCACCATTTGTTCAGGAGTTACAATGGTTCCATGGAGGTGCGATCCATGCACGGTACATTCAGGGTGAAGAATCTACACAGTTTGAGCGTTCTGCCAATTAAAGGTAGGGATATCATCCTCATCAAAATGCCGAAAGATTTCCCTGTCTTTCCACAGAAATTGCATTTCCGAGCTCCTACACAGAATGAAAGAGTTTGTTTAGTTGGAACCAACTTTCAGGAGAAGTATGCATCGTCGATCATCACAGAAACAAGCACCACTTACAATATACCAGGCAGCACATTCTGGAAGCATTGGATTGAAACAGATAATGGACATTGTGGATTACCAGTGGTGAGCACCACCGATGGATGTCTAGTCGGAATTCACAGTTTGGCAAACAACAAACACACCACGAACTACTACTCAGCCTTCGATGAAGATTTTGAAAGCAAGTATCTCCGAACCAATGAGCACAATGAATGGGTCAAGTCTTGGATTTATAATCCAGACACAGTGTTGTGGGGCCCGTTGAAACTTAAAGACAGCACTCCCAAAGGATTATTCAAAACAACAAAGCTTGTGCAAGATCTAATCGATCATGATGTAGTGGTGGAGCAAGCTAAGCACTCTGCGTGGATGTTTGAAGCCTTGACAGGAAATTTGCAAGCTGTCGCAACAATGAAGAGCCAATTAGTAACCAAGCATGTAGTTAAAGGAGAGTGTCGACACTTCAAAGAATTCCTGACTGTGGATGCAGAAGCAGAGGCATTCTTCAGGCCTTTGATGGATGCGTATGGGAAAAGCTTGCTGAATAGAGATGCATACATCAAGGACATAATGAAGTATTCAAAACCTATAGATGTTGGTATCGTGGACTGTGATGCATTTGAGGAAGCCATCAATAGGGTTATCATCTACCTGCAAGTGCACGGCTTCAAGAAGTGCGCATACGTCACTGACGAGCAAGAAATTTTCAAAGCGCTCAACATGAAAGCTGCAGTTGGAGCCATGTATGGTGGCAAAAAGAAAGACTATTTTGAGCATTTCACTGATGCAGATAAGGAAGAAATAGTCATGCAAAGCTGTCTGCGATTGTATAAAGGCTTGCTTGGCATTTGGAATGGATCATTGAAGGCAGAGCTCCGGTGTAAGGAAAAGATACTTGCAAATAAGACGAGGACATTCACTGCTGCACCTTTAGACACTTTGCTGGGTGGTAAAGTGTGTGTTGATGATTTCAATAATCAATTTTATTCAAAGAATATTGAATGCTGTTGGACGGTTGGGATGACTAAGTTTTATGGTGGTTGGGATAAACTGCTGCGGCGTTTACCTGAGAATTGGGTATACTGTGATGCCGATGGCTCACAGTTTGATAGTTCACTAACTCCATACTTAATCAATGCTGTTCTCACCATCAGAAGCACATACATGGAAGATTGGGATGTGGGGTTGCAAATGTTGCGCAATTTATACACTGAGATTGTTTACACACCTATTTCAACTCCAGATGGAACAATTGTTAAGAAGTTCAGAGGAAATAACAGTGGTCAGCCTTCTACTGTTGTGGACAACTCTCTTATGGTCGTCCTTGCCATGCACTATGCTCTCATCAAAGAATGCATTGAGTTTGAAGAGATTGACAGCACGTGCGTGTTCTTTGTCAATGGTGATGATTTGCTGATTGCTGTGAATCCGGATAAAGAGGGCATTCTTGACAGATTGTCACAACACTTCTCAGATCTTGGTTTGAATTATGATTTCTCGTCAAGAACAAGAAATAAGGAGGAATT
GTGGTTTATGTCTCATAGAGGCCTACTGATTGAGGGCATGTACGTGCCGAAACTTGAAGAAGAAAGGATTGTGTCCATTCTCCAATGGGACAGAGCAGACTTGGCTGAACACAGGCTTGAGGCGATTTGCGCAGCTATGATAGAGTCCTGGGGTTATTCTGAACTAACACACCAAATCAGGAGATTCTACTCATGGTTATTGCAACAGCAACCCTTTGCAACAATAGCGCAGGAAGGGAAGGCTCCTTATATAGCAAGCATGGCATTAAGGAAATTGTATATGGATAGGGCTGTGGATGAGGAAGAGCTGAGAGCCTTCACTGAAATGATGGTCGCATTAGACGATGAGTTTGAATTTGACTCTTATGAAGTACACCATCAAGCAAATGACACAATCGATGCAGGAGGAAGCAGCAAGAAAGATGCAAGACCGGAGCAAGGCAGCATCCAGTCAAACCCGAACAAAGGAAAAGATAAGGATGTGAATGCTGGTACATCTGGGACACATACTGTGCCGAGAATCAAGGCTATCACGTCCAAAATGAGAATGCCCAAAAGCAAGGGAGCAACCGTGCTAAACCTAGAACACTTGCTTGAGTATGCTCCACAACAAATTGATATTTCAAATACTCGGGCAACTCAATCACAGTTTGATACGTGGTATGAGGCAGTGCGGATGGCATACGACATAGGAGAAACTGAGATGCCAACTGTGATGAATGGGCTTATGGTTTGGTGCATTGAAAATGGAACCTCGCCAAATGTCAACGGAGTTTGGGTTATGATGGATGGGAATGAACAAGTCGAGTACCCGTTGAAACCAATCGTTGAGAATGCAAAACCAACCCTTAGGCAAATCATGGCACATTTCTCAGATGTTGCAGAAGCGTATATAGAAATGCGCAACAAAAAGGAACCATATATGCCACGATATGGTTTAATTCGAAATCTGCGGGATGTGGGTTTAGCGCGTTATGCCTTTGACTTTTATGAGGTCACATCACGAACACCAGTGAGGGCTAGGGAAGCGCACATTCAAATGAAGGCCGCAGCATTGAAATCAGCCCAACCTCGACTTTTCGGGTTGGACGGTGGCATCAGTACACAAGAGGAGAACACAGAGAGGCACACCACCGAGGATGTCTCTCCAAGTATGCATACTCTACTTGGAGTCAAGAACATGTGA >CF_YL21 
ATGGCAATCTACACGTCAACAATCTGTTTCGGTTCGTTTGAATGCAAGCTACCATACTCACCCGCCTCTTGCGGGCATATTGTGAAGGAACGAGAAGTGCTGGCTTCCATCGATCCTTTCGCAGATCTGGAAACACAACTTAGTGCACGATTGCTCAAGCAAGAATATGCTACTGTTCGTGTGCTCAAGAACGGTACTCTTACGTACCGATACAAGACTGATGCCCAGATAACGCGCATCCAGAAGAGACTGGAAAGGAAGGATAGGGAAGAATACCACTTCCAGATGGCAGCTCCTAGTATTGTGTCAAAGATCACTATTGCTGGTGGAGAGCCACCTTCAAAACTTGAATCACAAGTGCGGAGGGGTGTCATCCACACAACTCCAAGGATGCGCACAGCAAAAACATATCACACGCCAAAGCTGACAGAGGGACAAATGAACCACCTTATCAAGCAGGTGAAGCAAATTATGTCAACCAAAGGAGGGTCTGTTCAACTGATTAGCAAGAAAAGTACCCATGTTCACTATAAAGAAGTTTTGGGATCACATCGCGCAGTTGTTTGCACTGCACATATGAGAGGTTTACGAAAGAGAGTGGACTTTCGGTGTGATAAATGGACCGTTGTGCGTCTACAGCATCTCGCCAGGACGGATAAGTGGAATAACCAAGTTCGTGTTACTGATCTACGCAAGGGCGATAGTGGAGTTATATTGAGTAATACTAATCTCAAAGGAAACTTTGGGAGAAGCTCGGAGGGCATATTCATAGTGCGTGGGTCGCACGAAGGAAAAATCTATGATGCACGTTCCAAGGTTACTCAAGGGGTTATGGATTCAATGGTTCAGTTCTCAAGCGCTGAAAGCTTTTGGAAGGGATTGGACGGCAATTGGGCACAAATGAGATATCCTACAGATCATACATGTGTGGCAGGCTTACCAGTTGAAGACTGTGGCAGAGTTGCAGCGATAATGACACACAGTATTTTACCGTGCTATAAGATAACCTGCCCTACCTGTGCCCAACAATATGCCAACTTGCCAGCCAGTGACTTACTTAAGATATTACACAAGCACGCAAGTGATGGTTTAAATCGATTGGGGGCAGACAAAGATCGCTTTGTGCATGTCAAAAAGTTCTTGACAATCTTAGAGCACTTAACTGAACCGGTTGATCTGAGTCTAGAAATTTTCAATGAAGTATTCAAGTCTATAGGGGAGAAGCAACAATCACCTTTCAAAAACCTGAATATTCTGAATAATTTCTTTTTAAAAGGAAAGGAAAATACAGCTCGTGAATGGCAGGTGGCTCAATTAAGCTTACTTGAATTGGCAAGATTCCAAAAGAACAGAACGGATAATATCAAGAAAGTAGACATCTCGTTCTTTAGGAATAAACTATCTGCCAAAGCAAATTGGAACTTGTATCTGTCATGTGATAACCAGCTGGATAAGAATGCAAACTTCCTGTGGGGACAGAGGGAATATCATGCTAAGCGATTTTTCTCGAACTATTTCGAGGAAATTGATCCAGCGAAGGGCTATTCAGCATACGAAAATCGTTTGCATCCGAATGGGACAAGAAAACTTGCAATTGGAAACCTAATTGTACCACTTGATCTGGCTGAGTTTAGGCGGAAGATGAAAGGTGATTTTAAAAGACAGCCAGGGGTGAGTAAGAAGTGCACGAGCTCGAAGGATGGAAACTACGTGTATCCCTGTTGTTGCACTACACTTGATGATGGCTCAGCTGTTGAATCAACATTTTACCCGCCAACTAAGAAGCGCCTCGTAATAGGTAATAGTGGCGACCAAAAGTATGTTGACTTACCAAAAGGGAATTCTGAGATGTTATATATTGCCAGGCAAGGCTTCTGTTACATTAACATTTTCCTCGCGATGTTGATTAACATTAGTGAGGAAGATGCAAAGGATTTCACTAAGAAGGTTCGTGACATGTGTGTGCCAAAGCTCGGAACCTGGCCAACCATGATGGATCT
GGCTACAACTTGTGCTCAAATGAAAATATTCTACCCTGATGTTCATGATGCAGAACTGCCTAGAATACTAGTCGATCACGAAACGCAGACATGCCATGTGGTTGACTCGTTTGGCTCACAAACAACTGGGTATCATATTTTGAAAGCATCTAGCGTGTCCCAACTTATTTTGTTTGCTAATGATGAGTTGGAGTCTGACATTAAGCACTATAGAGTTGGTGGTATTCCTAATGCATGCCCTGAACTTGGGTCCACAATATCACCTTTCAGAGAAGGAGGAGTTATAATGTCTGAGTCGGCAGCGCTGAAACTGCTTTTGAAGGGAATTTTTAGACCTAAGGTGATGAGACAGTTGCTGTTAGATGAGCCTTACCTGTTGATTCTATCAATATTATCCCCTGGCATACTGATGGCTATGTATAATAATGGGATTTTTGAACTTGCGGTGAGGTTGTGGATTAATGAGAAGCAATCCATAGCGATGATAGCATCGCTACTATCAGCTTTAGCCCTACGAGTGTCAGCGGCAGAAACACTCGTCGCACAGAGGATTATAATTGATGCTGCAGCTACAGACCTCCTCGATGCTACGTGTGACGGGTTCAACCTGCATCTAACGTACCCCACTGCATTGATGGTGTTGCAAGTTGTTAAGAATAGAAATGAATGTGATGATACTCTATTCAAGGCGGGTTTTCCAAGTTACAACACGAGCGTCGTACAGATTATGGAAAAAAATTATCTAAATCTCTTGAACGATGCTTGGAAAGATTTAACTTGGCGGGAAAAATTTTCCGCAACATGGTACTCATACAGAGCAAAACGCTCTATCACTCGGTACATAAAACCCACAGGAAGGGCAGATTTGAAAGGGTTATACAACATATCACCACAAGCATTCTTGGGCCGAAGCGTCCAGGTGGTCAAAGGCACTGCCTCAGGATTGAGCGAGCGATTTAATAATTACTTCAATACTAAGTGTGTAAATATTTCATCCTTTTTCATTCGTAGAATCTTTAGGCGTTTGCCAACTTTCGTCACTTTTGTTAACTCATTATTAGTTATTAGTATGTTAACTAGTGTAGTGGCAGTGTGTCAGGCAATAATTTTAGATCAGAAGAAGTATAGGAGAGAGATCGAGTTGATGCAGATAGAGAAGAATGAGATTGTCTGCATGGAGCTATACGCAAGTTTACAGCGCAAACTTGAACGCGATTTCACATGGGATGAGTACATTGAGTATTTGAAATCAGTGAACCCTCAGATAGTTCAATTTGCTCAAGCGCAGATGGAAGAATATGATGTGCGACACCAGCGTTCCACACCAGGTGTTAAAAATTTGGAACAAGTGGTAGCATTTATGGCTTTAGTCATCATGGTGTTCGATGCTGAAAGGAGTGACTGCGTGTTCAAAACTCTCAATAAATTTAAGGGTGTCCTTTCCTCACTGGACCATGAAGTTAGACATCAGTCCTTGGATGATGTGATCAAGAATTTTGATGAGAGGAATGAGACTATTGATTTTGAATTGAGTGAGGACACAATTCGAACATCATCAGTGCTAGATACAAAGTTTAGTGATTGGTGGGACCGACAAATCCAAATGGGACATACACTTCCACATTACAGAACCGAGGGGCACTTCATGGAATTTACAAGGGCAACTGCTGTTCAAGTGGCTAATGACATTGCCCACAGCGAACACCTAGACTTTCTAGTAAGGGGAGCTGTTGGGTCTGGAAAGTCAACTGGGTTGCCTGTTCATCTTAGTGTAGCTGGATCTGTGCTTTTAATTGAGCCAACGCGGCCACTAACGGAGAACGTTTTCAAACAGCTATCTAGTGAACCATTCTTCAAGAAGCCAACACTGCGTATGCGTGGAAATAGTATATTTGGCTCTTCCCCAATCTCCGTTATGACTAGCGGGTTTGCGCTACACTACTTCGCCAATAATCGCTCTCAATTAGCTCAGTTCAATTTTGTAATATTTGATGAGTGCC
ATGTTCTGGATCCTTCCGCAATGGCGTTCCGCAGTCTGCTGAGTGTTTATCATCAAGCATGTAAAGTATTAAAAGTGTCAGCTACTCCAGTGGGAAGAGAGGTTGAATTCACAACACAGCAGCCAGTCAAGTTAATAGTGGAAGACACACTGTCTTTCCAATCATTTGTTGATGCACAAGGTTCTAAAACTAATGCTGATGTTGTCCAGTTTGGTTCAAACATACTTGTTTACGTGTCGAGCTACAATGAAGTTGACAACTTGGCTAAGCTCCTAACAGATAAGAATATGATGGTCACAAAGGTTGATGGCAGAACAATGAAGCACGGTTGCCTAGAAATTGTCACAAAAGGAACCAGTGCGAGACCACATTTTGTTGTAGCAACCAACATAATTGAGAATGGAGTGACTTTGGACATAGATGTAGTAGTGGATTTTGGGTTGAAAGTCTCACCGTTCTTGGACATTGACAATAGGAGCATTGCTTACAATAAGGTGAGTGTTAGTTATGGTGAGAGAATTCAAAGGCTGGGTCGTGTTGGACGCTTCAAGAAAGGAGTAGCATTGCGCATTGGACACACTGAGAAGGGAATTATCGAAATTCCAAGCATGATCGCTACTGAGGCGGCTCTTGCTTGCTTTGCATATAACTTACCAGTGATGACAGGAGGCGTCTCAACTAGTCTGATTGGCAATTGTACTGTGCGCCAGGTCAAAACAATGCAGCAATTTGAATTGAGCCCCTTCTTTATCCAGAATTTCGTTGCTCATGATGGATCAATGCATCCTGTCATACATGACATTCTCAAAAAGTATAAACTCCGAGATTGTATGACACCTTTGTGCGATCAGTCTATACCATACAGGGCATCGAGCACTTGGTTATCGGTTAGTGAATATGAGCGACTTGGAGTGGCTTTAGAAATTCCAAAGCAAGTCAAAATTGCATTCCATATCAAAGAGATCCCTCCTAAGCTCCACGAAATGCTTTGGGAAACGGTTGTCAAGTACAAAGACGTTTGCTTATTTCCAAGCATTCGAGCATCGTCCATCAGCAAAATCGCATACACATTGCGTACAGACCTCTTCGCCATCCCAAGAACTCTAATATTGGTGGAGAGATTGCTTGAAGAGGAGCGAGTGAAGCAGAGCCAATTCAGAAGTCTCATCGATGAAGGGTGCTCAAGCATGTTTTCAATTGTCAACTTGACCAACACTCTCAGGGCCAGATATGCAAAAGATTACACCGCAGAGAACATACAAAAACTTGAGAAAGTGAGAAGTCAATTAAAAGAATTCTCAAATTTGGATGGTTCTGCATGTGAGGAAAATTTAATAAAGAGGTATGAGTCTTTGCAGTTCGTTCACCACCAAGCTGCGACGTCACTTGCAAAGGATCTCAAGTTGAAGGGGACTTGGAAGAAGTCATTAGTGGCTAAAGACTTGATCATAGCAGGCGCTGTTGCAATTGGTGGAATAGGACTCATATATAGTTGGTTCACACAATCAGTTGAGACTGTGTCTCACCAAGGGAAAAATAAATCCAAAAGAATCCAAGCCTTGAAGTTTCGCCATGCTCGTGACAAAAGGGCTGGCTTTGAAATTGACAACAATGATGACACAATAGAGGAATTCTTTGGATCTGCATACAGGAAGAAGGGAAAAGGTAAAGGTACCACAGTTGGTATGGGCAAGTCAAGCAGGAGGTTCATTAACATGTACGGGTTTGACCCGACAGAGTACTCATTCATCCAATTCGTTGATCCACTCACTGGGGCACAAATAGAAGAGAATGTCTATGCTGACATTAGAGATGTTCAAGAAAGATTTAGTGAGGTGCGAAAGAAAATGGTTGAGAATGATGATATTGAAATGCAAGCCTTGGGTAGTAACACGACCATACATGCATACTTTAGGAAAGATTGGTCTGACAAAGCTTTGAAGATTGATTTAATGCCACATAACCCACTCAAAGTTTGTGACAAAACAAATGGCATTGCC
AAACTTCCTGAGAGAGAGTTCGAACTAAGGCAGACTGGGCCAGCTGTAGAAGTCGACGTGAAGGACATACCAGCGCAGGAAGTGGAGCATGAAGCCAAATCGCTCATGAGAGGCTTGAGAGACTTCAACCCAATTGCTCAAACAGTTTGTAGGCTGAAAGTATCTGTTGAATATGGGACATCAGAGATGTACGGTTTTGGATTTGGAGCATACATAATAGCGAACCACCATTTGTTTAGGAGTTACAATGGTTCCATGGAGGTGCGATCCATGCATGGTACATTCAGGGTGAAGAATCTACACAGTTTGAGCGTTCTGCCAATTAAAGGCAGGGACATCATCCTCATCAAAATGCCGAAGGATTTCCCTGTTTTTCCACAGAAATTGCATTTCCGAGCTCCTACACAGAATGAAAGAGTTTGTTTAGTTGGGACCAACTTTCAGGAGAAGTATGCATCGTCGATCATCACAGAAACAAGCACTACTTACAATATACCAGGCAGCACATTCTGGAAGCATTGGATTGAAACAGATAATGGACATTGTGGACTACCAGTGGTGAGCACAGCCGATGGATGTCTAGTCGGAATTCATAGTTTGGCAAACAATGCACACACCACGAACTACTACTCAGCCTTCGATGAAGATTTTGAAAGCAAGTACCTCCGAACCAATGAGCACAATGAATGGGTCAAATCTTGGGTTTATAATCCAGACACAGTGTTGTGGGGCCCGTTGAAGCTTAAAGATAGCACTCCCAAAGGGTTATTCAAAACAACAAAGCTTGTGCAAGATCTAATCGATCATGATGAAGTCGCGGAGCAAGCTAAGCACTCTGCGTGGATGTTTGAAGCCTTGACAGGAAATTTGCAAGCTGTCGCAACAATGAAGAGTCAATTAGTAACCAAGCATGTAGTTAAAGGAGAGTGTCGACACTTCAAAGAATTCCTGACTGTGGATGCAGAAGCAGAGGCATTCTTCAGGCCTTTGATGGATGCATATGGGAAAAGCTTGCTGAATAGAGATGCGTACATCAAAGACATAATGAAGTACTCAAAACCTATAGATGTTGGTATCGTGGACTGTGATGCATTTGAAGAAGCCATCAATAGGGTTATCATCTACTTGCAAGTGCACGGCTTCAAGAAGTGTGCATATGTCACTGACGAGCAAGAAATTTTCAAAGCGCTCAACATGAAAGCTGCAGTCGGAGCCATGTATGGTGGCAAAAAGAAAGACTATTTTGAGCATTTCACTGATGCAGATAAGGAAGAAATAGTCATGCAAAGCTGTCTGCGACTGTATAAAGGCTTGCTCGGCATTTGGAACGGATCGTTGAAGGCAGAGCTCCGGTGTAAGGAAAAGATACTTGCAAACAAGACGAGGACGTTCACTGCTGCACCTCTAGACACTTTGCTGGGTGGTAAAGTGTGTGTTGATGACTTCAATAATCAATTTTATTCAAAGAATATTGAATGCTGTTGGACAGTTGGGATGACTAAGTTTTATGGTGGTTGGGATAAACTGCTTCGGCGTTTACCTGAGAATTGGGTATACTGTGATGCCGATGGCTCACAGTTTGATAGTTCACTAACTCCATACTTAATCAATGCTGTTCTCACCATCAGAAGCACATACATGGAAGACTGGGATGTGGGGTTGCAAATGCTGCGTAATTTATACACTGAGATTATTTACACACCTATCTCAACTCCAGATGGAACAATTGTTAAGAAGTTCAGAGGAAATAACAGTGGTCAGCCTTCTACTGTTGTGGACAACTCTCTTATGGTTGTCCTTGCCATGCACTATGCTCTCATCAAAGAATGCATTGAGTTTGAAGAGATTGACAGCACGTGCGTGTTCTTCGTCAATGGTGATGATTTGCTGATTGCCGTGAATCCGGATAAAGAGGGCATTCTTGATAGATTGTCACAGCACTTCTCAGATCTTGGTTTAAATTATGATTTTTCGTCAAGAACAAGAAATAAGGAAGAGTT
ATGGTTTATGTCTCATAGAGGCCTACTGATTGAGGGCATGTACGTGCCGAAACTTGAAGAAGAAAGGATTGTGTCCATTCTCCAATGGGACAGAGCAGACTTGGCTGAACACAGGCTTGAGGCGATTTGCGCAGCTATGATAGAGTCCTGGGGTTATTCTGAACTAACACACCAAATTAGAAGATTCTACTCATGGTTATTGCAACAGCAACCTTTTGCAACAATAGCGCAGGAAGGGAAGGCTCCTTATATAGCAAGCATGGCATTAAGGAAACTGTATATGGATAGAGCTGTGGATGAGGAAGAGCTGAGAGCCTTCACTGAAATGATGGTCGCATTAGACGATGAGTTTGAGTGTGACTCTTATGAAGTACACCACCAGGGAAACGATACAATCGATGCAGGAGGAAGCAGCAAGAAAGATGCAAGACCAGAGCAAGGCAGCATCCAGTCAAACCCGAACAAAGGAAAAGATAAGGATGTGAATGCTGGTACATCTGGGACACATACTGTGCCGAGAATCAAGGCTATCACGTCCAAAATGAGAATGCCCAAAAGCAAAGGAGCAACCGTGCTAAACTTAGAACACTTGCTTGAGTATGCTCCACAACAAATTGATATTTCAAATACTCGGGCAACTCAATCACAGTTTGATACGTGGTATGAGGCAGTGCGGATGGCATACGACATAGGAGAAACTGAGATGCCAACTGTGATGAATGGGCTTATGGTTTGGTGCATTGAAAATGGAACCTCGCCAAATGTCAACGGAGTCTGGGTTATGATGGATGGGGATGAACAAGTCGAGTACCCGTTGAAACCAATCGTTGAGAATGCAAAACCAACCCTTAGGCAAATCATGGCACATTTCTCAGATGTTGCAGAAGCGTATATAGAAATGCGCAACAAAAAGGAACCATATATGCCACGATATGGTTTAATTCGAAATCTGCGGGATGTGGGTTTAGCGCGTTATGCCTTTGACTTTTATGAGGTCACATCACGAACACCAGTGAGGGCTAGGGAAGCGCACATTCAAATGAAGGCCGCAGCATTGAAATCAGCCCAACCTCGACTTTTCGGGTTGGACGGTGGCATCAGTACACAAGAGGAGAACACAGAGAGGCACACCGCCGAGGATGTCTCTCCAAGTATGCATACTCTACTTGGAGTCAAAAACATGTGA ================================================ FILE: inst/extdata/GVariation/C.Wilga5.fas ================================================ >Wilga5 
ATGGCAACTTACATGTCAACAATCTGTTTCGGTTCGTTTGAATGCAAGCTACCATACTCACCCGCCTCTTGCGGGCATATTGTGAAGGAACGAGAAGTGCTGGCTTCCGTTGATCCTTTCGCAGATCTGGAAACACAACTTAGTGCACGATTGCTCAAGCAAGAATATGCTACTGTTCGTGTGCTCAAGAACGGTACTCTTACGTACCGATACAAGACTGATGCCCAGATAACGCGCATCCAGAAGAAACTGGAAAGGAAGGATAGGGAAGAATATCACTTCCAGATGGCAGCTCCTAGTATTGTGTCAAAGATCACTATTGCTGGTGGAGAGCCACCTTCAAAACTTGAATCACAAGTGCGGAGGGGTGTCATCCACACAACTCCAAGGATGCGCACAGCAAAAACATATCACACGCCAAAGTTGACAGAGGGACAAATGAACCACCTTATCAAGCAGGTGAAGCAAATTATGTCAACCAAAGGAGGGTCTGTTCAACTGATTAGCAAGAAAAGTACCCATGTTCACTATAAAGAAGTTTTGGGATCACATCGCGCAGTTGTTTGCACTGCACATATGAGAGGTTTACGAAAGAGAGTGGACTTTCGGTGTGATAAATGGACCGTTGTGCGTCTACAGCATCTCGCCAGGACGGACAAGTGGAATAACCAAGTTCGTGCTACTGATCTACGCAAGGGCGATAGTGGAGTTATATTGAGTAATACTAATCTCAAAGGAAGCTTTGGGAGAAGCTCGGAGGGCATATTCATAGTGCGTGGGTCGCACGAAGGAAAAATCTATGATGCACGTTCCAAGGTTACTCAAGGGGTTATGGATTCAATGGTTCAGTTCTCAAGCGCTGAAAGCTTCTGGAAGGGATTGGACGGCAATTGGGCACAAATGAGATATCCTACAGATCATACATGTGTGGCAGGCTTACCAGTTGAAGACTGCGGCAGAGTTGCAGCGATAATGACACACAGTATTTTACCGTGCTATAAGATAACCTGCCCTACCTGTGCCCAACAATATGCCAATTTGCCAGCCAGTGACTTACTTAAGATATTACACAAGCACGCAAGTGATGGTTTAAATCGATTGGGGGCAGACAAAGATCGCTTTGTGCACGTCAAAAAGTTCTTGACAATCTTAGAGCACTTAACTGAACCGGTTGATCTGGGTCTAGAAATTTTCAATGAAGTATTCAAGTCTATAGGGGAGAAGCAACAATCACCTTTCAAAAACCTGAATATTCTGAATAATTTCTTTTTGAAAGGAAAAGAAAATACAGCTCGTGAATGGCAGGTGGCTCAATTAGGCTTACTTGAATGGGCAAGATTCCAAAAGAACAGAACGGATAATATCAAGAAAGGAGACATCTCGTTCTTTAGGAATAAACTATCTGCCAAAGCAAATTGGAACTTGTATCTGTCATGTGATAACCAGCTGGATAAGAATGCAAACTTCCTGTGGGGACAGAGGGAATATCATGCTAAGCGATTTTTCTCGAACTATTTCGAGGAAATTGATCCAGCGAAGGGCTATTCAGCATATGAAAATCGTTTGCATCCGAATGGGACAAGAAAACTTGCAATTGGAAACCTAATTGTACCACTTGATCTGGCTGAGTTTAGGCGGAAGATGAAAGGTGATTATAAAAGACAGCCAGGGGTGAGTAAGAAGTGCACGAGCTCGAAGGATGGAAACTACGTGTATCCCTGTTGTTGCACTACACTTGATGATGGCTCAGCTGTTGAATCAACATTTTACCCGCCAACTAAGAAGCACCTTGTAATAGGTAATAGTGGCGACCAAAAGTATGTTGACTTACCAAAAGGGAATTCTGAGATGTTATATATTGCCAGGCAAGGCTTTTGTTACATTAACATTTTCCTCGCGATGTTGATTAACATTAGTGAGGAAGATGCAAAGGATTTCACTAAGAAGGTTCGTGACATGTGTGTGCCAAAGCTTGGAACCTGGCCAACCATGATGGATCT
GGCTACAACTTGTGCTCAAATGAAAATATTCTACCCTGATGTTCATGATGCAGAACTGCCTAGAATACTAGTCGATCACGAAACGCAGACATGCCATGTGGTTGACTCGTTTGGCTCACAAACAACTGGGTATCATATTTTGAAAGCATCTAGCGTGTCCCAACTTATTTTGTTTGCTAATGATGAGTTGGAGTCTGACATTAAACATTATAGAGTTGGTGGTGTTCCTAATGCATGCCCTGAACTTGGGTCCACAATATCACCTTTCAGAGAAGGAGGAGTTATAATGTCTGAGTCGGCAGCTCTGAAACTGCTTTTGAAGGGAATTTTTAGACCTAAGGTGATGAGACAGTTGTTGTTAGATGAGCCTTATCTGTTGATTCTATCAATATTATCCCCTGGCATACTGATGGCTATGTACAATAATGGGATTTTTGAACTTGCGGTAAGGTTGTGGATTAATGAGAAACAATCCATAGCTATGATAGCATCGCTACTATCAGCTTTAGCCCTACGAGTGTCAGCGGCAGAAACACTCGTCGCACAGAGGATTATAATTGATACTGCAGCTACAGATCTCCTTGATGCTACGTGCGATGGGTTCAACCTACATCTAACGTACCCCACTGCGTTGATGGTGTTGCAAGTTGTTAAGAATAGAAATGAATGTGATGATACCCTATTCAAGGCGGGTTTTCCAAGTTACAACACGAGCGTCGTACAGATCATGGAAAAAAATTATCTAAATCTCTTAAACGATGCTTGGAAAGATTTAACTTGGCGGGAAAAATTATCCGCAACATGGTACTCATACAGAGCAAAACGCTCTATCACTCGGTACATAAAACCCACAGGAAGGGCAGATTTGAAAGGGTTATACAACATATCACCACAAGCATTCTTGGGCCGAGGCGCCAAGGTGGTCAAAGGCACTGCCTCAGGATTGTGCGAGCGATTTAATAATTATTTCTACACTAAGTGTGTAAATATTTCATCCTTTTTCATTCGTAGAATCTTTAGGCGTTTGCCAACTTTCGTCACTTTTGTTAACTCATTATTAGTTATTAGTATGTTAACCAGCGTAGTGGCAGTCTGTCAGGCAATAATTTTAGATCAGAGGAAGTATAGGAGAGAAATCGAGTTGATGCAGATAGAGAAGAATGAGATCGTCTGCATGGAGCTATATGCAAGTTTACAGCGCAAACTTGAACGCGATTTCACATGGGATGAGTACATTGAGTATTTGAAGTCAGTAAACCCTCAGATAGTTCAGTTTGCTCAAGCGCAGATGGAAGAATATGATGTGCGACACCAGCGTTCCACACCAGGTGTTAAAAATTTGGAACAAGTGGTAGCATTTATGGCTTTAGTCATCATGGTGTTCGATGCTGAAAGGAGTGATTGCGTGTTCAAAACTCTCAATAAATTTAAGGGTGTCCTTTCCTCACTGGACCATGAAGTTAGACATCAGTCCTTAGACGATGTGATCAAGAATTTTGATGAGAGGAATGAGGTTATTGATTTTGAATTGAGTGAGGACACAATTCGAACATCATCAGTGCTAGATACAAAGTTTAGTGATTGGTGGGACCGACAAATCCAGATGGGGCATACACTTCCACATTACAGAACCGAGGGGCACTTCATGGAATTCACAAGAGCAACTGCTGTCCAAGTGGCTAATGACATTGCCCATAGCGAACACCTAGACTTTCTAGTAAGGGGAGCTGTTGGGTCTGGAAAGTCAACTGGGTTGCCTGTTCATCTTAGTGTAGCCGGATCTGTGCTTTTAATTGAACCAACGCGACCACTAGCGGAGAACGTTTTCAAACAGCTATCTAGTGAACCATTCTTCAAGAAGCCAACACTGCGTATGCGTGGAAATAGTATATTTGGCTCTTCTCCAATCTCCGTCATGACTAGCGGATTTGCGCTACACTACTTCGCCAATAATCGCTCTCAATTAGCTCAGTTCAACTTTGTAATATTTGATGAGTGCC
ATGTTCTGGATCCTTCCGCAATGGCGTTCCGCAGTCTGCTGAGTGTTTATCATCAAGCATGCAAAGTATTAAAAGTGTCAGCTACTCCAGTGGGAAGAGAGGTTGAATTTACAACACAGCAACCAGTCAAGTTAATAGTGGAGGACACACTGTCTTTTCTATCATTTGTTGATGCACAAGGTTCTAAAACTAATGCTGATGTTGTTCAGTTTGGTTCAAACGTACTTGTGTACGTGTCGAGCTACAATGAAGTTGATACCTTGGCTAAGCTCCTAACAGACAAGAATATGATGGTCACAAAAGTTGATGGCAGAACAATGAAGCACGGTTGCCTAGAAATTGTCACAAAAGGAACCAGTGCGAGACCACATTTTGTTGTAGCAACCAACATAATTGAGAATGGAGTGACTTTGGACATAGACGTGGTTGTAGATTTTGGGTTGAAAGTCTCACCGTTCTTGGACATTGACAATAGGAGCATTGCTTACAATAAGGTGAGTGTTAGCTATGGTGAGAGAATTCAAAGGCTGGGTCGTGTTGGACGCTTCAAGAAAGGAGTAGCATTGCGCATTGGACACACTGAGAAGGGAATTATTGAAATTCCAAGCATGATCGCTACAGAGGCGGCTCTTGCTTGCTTTGCATATAACTTGCCAGTGATGACAGGAGGCGTCTCAACTAGTCTGATTGGCAATTGTACTGTGCGCCAAGTTAAAACAATGCAGCAATTTGAATTGAGTCCCTTCTTTATCCAGAATTTTGTTGCCCATGATGGATCAATGCATCCTGTCATACATGACATTCTTAAAAAGTATAAACTTCGAGATTGTATGACACCTTTGTGCGATCAGTCTATACCATACAGGGCATCGAGCACTTGGTTATCGGTTAGTGAATATGAGCGACTTGGAGTGGCCTTAGAAATTCCAAAGCAAGTCAAAATTGCATTCCATATCAAAGAGATCCCTCCTAAGCTCCATGAAATGCTTTGGGAAACGGTTGTCAAGTACAAAGACGTTTGCTTATTTCCAAGCATTCGAGCATCGTCCATCAGCAAAATCGCATACACATTGCGTACAGACCTTTTCGCCATCCCAAGAACTCTAATATTGGTGGAGAGACTGCTTGAAGAGGAGCGAGTGAAGCAGAGCCAATTCAGAAGTCTCATCGACGAAGGATGCTCAAGCATGTTTTCAATTGTCAACTTGACAAACACTCTCAGAGCTAGATATGCAAAAGATTACACCGCAGAGAACATACAAAAACTTGAGAAAGTGAGAAGTCAATTGAAAGAATTCTCAAATTTGGATGGTTCTGCATGTGAGGAAAATTTAATAAAGAGGTATGAGTCTTTGCAGTTCGTTCATCACCAAGCTGCGACGTCACTTGCAAAGGATCTCAAGTTGAAGGGGACTTGGAAGAAGTCATTAGTGGCCAAAGACTTGATCATAGCAGGCGCTGTTGCAATTGGTGGAATAGGACTCATATATAGTTGGTTCACACAATCAGTTGAGACTGTGTCTCACCAAGGGAAAAATAAATCCAAAAGAATCCAAGCCTTGAAGTTTCGCCATGCTCGTGACAAAAGGGCTGGCTTTGAAATTGACAACAATGATGACACAATAGAGGAATTCTTTGGATCTGCATACAGGAAAAAGGGAAAAGGTAAAGGTACCACAGTTGGTATGGGCAAGTCAAGCAGGAGGTTCATCAACATGTATGGGTTTGACCCAACAGAGTACTCATTCATCCAATTCGTTGATCCACTCACTGGGGCGCAAATAGAAGAGAATGTCTATGCTGACATTAGAGATATTCAAGAGAGATTTAGTGAAGTGCGAAAGAAAATGGTTGAGAATGATGACATTGAAATGCAAGCCTTGGGCAGTAACACGACCATACATGCATACTTCAGGAAAGATTGGTCTGACAAAGCTTTGAAGATTGATTTAATGCCACATAATCCACTCAAAGTTTGTGACAAGACAAATGGCATTGCC
AAATTTCCTGAGAGAGAACTCGAACTAAGGCAGACTGGGCCAGCTGTAGAAGTCGACGTGAAGGACATACCAGCACAGGAGGTGGAGCATGAAGCTAAATCGCTCATGAGAGGCTTGAGAGACTTCAACCCAATTGCCCAAACAGTTTGTAGGCTGAAAGTATCTGTTGAATATGGGACATCAGAGATGTACGGTTTTGGATTTGGAGCGTACATAATAGCGAACCACCATTTGTTCAGGAGTTACAATGGTTCCATGGAGGTGCGATCCATGCACGGTACATTCAGGGTGAAGAATCTACACAGTTTGAGTGTTCTGCCAATTAAAGGTAGGGATATCATCCTCATCAAAATGCCGAAAGATTTCCCTGTCTTTCCACAGAAATTGCATTTCCGAGCTCCTACACAGAATGAAAGAGTTTGTTTAGTTGGAACCAACTTTCAGGAGAAGTATGCATCGTCGATCATCACAGAAGGAGGCACCACTTACAATATACCAGGCAGCACATTCTGGAAGCATTGGATTGAAACAGATAATGGACATTGTGGACTACCAGTGGTGAGCACCACCGATGGATGTCTAGTCGGAATTCACAGTTTGGCAAACAACAAACACACCACGAATTACTACTCAGCCTTCGATGAAGATTTTGAAAGCAAGTATCTCCGAACCAATGAGCACAATGAATGGGTCAAGTCTTGGATTTATAATCCAGACACAGTATTGTGGGGCCCGTTGAAACTTAAAGACAGCACTCCCAAAGGATTATTCAAAACAACAAAGCTTGTGCAAGATCTAATTGATCATGATGAAGTGGTGGAGCAAGCTAAGCACTCTGCGTGGATGTTTGAAGCCTTGACAGGAAATTTGCAAGCTGTCGCAACAATGAAGAGCCAATTAGTAACCAAGCATGTAGTTAAAGGAGAGTGTCGACACTTCAAAGAATTCCTGACTGTGGATGCAGAAGCAGAGGCATTCTTCAGGCCTTTGATGGATGCGTATGGGAAAAGTTTGCTGAATAGAGATGCATACATCAAGGACATAATGAAGTATTCAAAACCTATAGATGTTGGTATCGTGGACTGTGATGCATTTGAGGAAGCCATCAATAGGGTTATCATCTACCTGCAAGTGCACGGCTTCAAGAAGTGCGCATACGTTACTGACGAGCAAGAAATTTTTAAAGCGCTCAACATGAAAGCTGCAGTCGGAGCCATGTATGGTGGCAAAAAGAAAGACTATTTTGAGCATTTCACTGATGCAGATAAGGAAGAAATAGTCATGCAAAGCTGTCTGCGATTGTATAAAGGCTTGCTTGGCATTTGGAATGGATCATTGAAGGCAGAGCTCCGGTGTAAGGAAAAGATACTTGCAAATAAGACGAGGACATTCACTGCTGCACCTTTAGACACTTTGCTGGGTGGTAAAGTGTGTGTTGATGACTTCAATAATCAATTTTATTCAAAGAATATTGAATGCTGTTGGACGGTTGGGATGACAAAGTTTTATGGTGGTTGGGATAAACTGCTGCGGCGTTTACCTGAGAATTGGGTTTACTGTGATGCCGATGGCTCACAGTTTGATAGTTCACTAACTCCATACTTAATCAATGCAGTTCTCACCATTAGAAGCACATACATGGAAGATTGGGATGTGGGGTTGCAAATGTTGCGCAATTTATACACTGAGATTGTTTACACACCTATTTCAACTCCAGATGGAACAATTGTTAAGAAGTTCAGAGGGAATAACAGTGGTCAGCCTTCTACTGTTGTGGACAACTCTCTTATGGTCGTCCTTGCCATGCACTATGCTCTCATCAAAGAATGCATTGAGTTTGAAGAGATTGACAGCACGTGCGTGTTCTTTGTCAATGGTGATGATTTGCTGATTGCTGTGAATCCGGATAAAGAGGGCATTCTTGACAGATTGTCACAACACTTCTCAGATCTTGGTTTGAATTATGATTTCTCGTCAAGAACAAGAAATAAGGAGGAGTT
GTGGTTTATGTCTCATAGAGGCCTACTGATTGAGGCAATGTACGTGCCGAAACTTGAAGAAGAAAGGATTGTGTCCATTCTCCAATGGGACAGAGCAGACTTGGCTGAACATAGGCTTGAGGCGATTTGCGCAGCTATGATAGAGTCCTGGGGTTATTCTGAACTAACACACCAAATCAGGAGATTCTACTCATGGTTATTGCAACAGCAACCCTTTGCAACAATAGCGCAGGAAGGGAAGGCTCCTTATATAGCAAGCATGGCATTAAGGAAATTGTATATGGATAGGGCTGTGGATGAGGAAGAGCTGAGAGCCTTCACTGAAATGATGGTCGCATTAGATGATGAGTTTGAATTTGACTCTTATGAAGTATACCATCAAGCAAATGACACAATCGATGCAGGAGAAAGCAGCAAGAAAGATGCAAGACCAGAGCAAGGCAGCATCCAGTCAAACCCGAACAAAGGAAAAGATAAGGATGTGAATGCTGGTACATCTGGGACACATACTGTGCCGAGAATCAAGGCTATCACGTCCAAAATGAGGATGCCCAAAAGCAAGGGAGCAACCGTGCTAAACTTAGAACACTTGCTTGAGTATGCTCCACAACAAATTGATATTTCAAATACTCGGGCAACTCAATCACAGTTTGATACGTGGTATGAGGCAGTGCGGATGGCATACGACATAGGAGAAACTGAGATGCCAACTGTGATGAATGGGCTTATGGTTTGGTGCATTGAAAATGGAACCTCGCCAAATGTCAACGGAGTTTGGGTTATGATGGATGGGAATGAACAAGTCGAGTACCCGTTGAAACCAATCGTTGAGAATGCAAAACCAACCCTTAGGCAAATCATGGCACATTTCTCAGATGTTGCAGAAGCGTATATAGAAATGCGCAACAAAAAGGAACCATATATGCCACGATATGGTTTAATTCGAAATCTGCGGGATGTGGGTTTAGCGCGTTATGCCTTTGACTTTTATGAGGTCACATCACGAACACCAGTGAGGGCTAGGGAAGCGCACATTCAAATGAAGGCCGCAGCATTGAAATCAGCCCAACCTCGACTTTTCGGGTTGGACGGTGGCATCAGTACACAAGAGGAGAACACAGAGAGGCACACCACCGAGGATGTCTCTCCAAGTATGCATACTCTACTTGGAGTCAAGAACATGTGA >CF_YL21 
ATGGCAATCTACACGTCAACAATCTGTTTCGGTTCGTTTGAATGCAAGCTACCATACTCACCCGCCTCTTGCGGGCATATTGTGAAGGAACGAGAAGTGCTGGCTTCCATCGATCCTTTCGCAGATCTGGAAACACAACTTAGTGCACGATTGCTCAAGCAAGAATATGCTACTGTTCGTGTGCTCAAGAACGGTACTCTTACGTACCGATACAAGACTGATGCCCAGATAACGCGCATCCAGAAGAGACTGGAAAGGAAGGATAGGGAAGAATACCACTTCCAGATGGCAGCTCCTAGTATTGTGTCAAAGATCACTATTGCTGGTGGAGAGCCACCTTCAAAACTTGAATCACAAGTGCGGAGGGGTGTCATCCACACAACTCCAAGGATGCGCACAGCAAAAACATATCACACGCCAAAGCTGACAGAGGGACAAATGAACCACCTTATCAAGCAGGTGAAGCAAATTATGTCAACCAAAGGAGGGTCTGTTCAACTGATTAGCAAGAAAAGTACCCATGTTCACTATAAAGAAGTTTTGGGATCACATCGCGCAGTTGTTTGCACTGCACATATGAGAGGTTTACGAAAGAGAGTGGACTTTCGGTGTGATAAATGGACCGTTGTGCGTCTACAGCATCTCGCCAGGACGGATAAGTGGAATAACCAAGTTCGTGTTACTGATCTACGCAAGGGCGATAGTGGAGTTATATTGAGTAATACTAATCTCAAAGGAAACTTTGGGAGAAGCTCGGAGGGCATATTCATAGTGCGTGGGTCGCACGAAGGAAAAATCTATGATGCACGTTCCAAGGTTACTCAAGGGGTTATGGATTCAATGGTTCAGTTCTCAAGCGCTGAAAGCTTTTGGAAGGGATTGGACGGCAATTGGGCACAAATGAGATATCCTACAGATCATACATGTGTGGCAGGCTTACCAGTTGAAGACTGTGGCAGAGTTGCAGCGATAATGACACACAGTATTTTACCGTGCTATAAGATAACCTGCCCTACCTGTGCCCAACAATATGCCAACTTGCCAGCCAGTGACTTACTTAAGATATTACACAAGCACGCAAGTGATGGTTTAAATCGATTGGGGGCAGACAAAGATCGCTTTGTGCATGTCAAAAAGTTCTTGACAATCTTAGAGCACTTAACTGAACCGGTTGATCTGAGTCTAGAAATTTTCAATGAAGTATTCAAGTCTATAGGGGAGAAGCAACAATCACCTTTCAAAAACCTGAATATTCTGAATAATTTCTTTTTAAAAGGAAAGGAAAATACAGCTCGTGAATGGCAGGTGGCTCAATTAAGCTTACTTGAATTGGCAAGATTCCAAAAGAACAGAACGGATAATATCAAGAAAGTAGACATCTCGTTCTTTAGGAATAAACTATCTGCCAAAGCAAATTGGAACTTGTATCTGTCATGTGATAACCAGCTGGATAAGAATGCAAACTTCCTGTGGGGACAGAGGGAATATCATGCTAAGCGATTTTTCTCGAACTATTTCGAGGAAATTGATCCAGCGAAGGGCTATTCAGCATACGAAAATCGTTTGCATCCGAATGGGACAAGAAAACTTGCAATTGGAAACCTAATTGTACCACTTGATCTGGCTGAGTTTAGGCGGAAGATGAAAGGTGATTTTAAAAGACAGCCAGGGGTGAGTAAGAAGTGCACGAGCTCGAAGGATGGAAACTACGTGTATCCCTGTTGTTGCACTACACTTGATGATGGCTCAGCTGTTGAATCAACATTTTACCCGCCAACTAAGAAGCGCCTCGTAATAGGTAATAGTGGCGACCAAAAGTATGTTGACTTACCAAAAGGGAATTCTGAGATGTTATATATTGCCAGGCAAGGCTTCTGTTACATTAACATTTTCCTCGCGATGTTGATTAACATTAGTGAGGAAGATGCAAAGGATTTCACTAAGAAGGTTCGTGACATGTGTGTGCCAAAGCTCGGAACCTGGCCAACCATGATGGATCT
GGCTACAACTTGTGCTCAAATGAAAATATTCTACCCTGATGTTCATGATGCAGAACTGCCTAGAATACTAGTCGATCACGAAACGCAGACATGCCATGTGGTTGACTCGTTTGGCTCACAAACAACTGGGTATCATATTTTGAAAGCATCTAGCGTGTCCCAACTTATTTTGTTTGCTAATGATGAGTTGGAGTCTGACATTAAGCACTATAGAGTTGGTGGTATTCCTAATGCATGCCCTGAACTTGGGTCCACAATATCACCTTTCAGAGAAGGAGGAGTTATAATGTCTGAGTCGGCAGCGCTGAAACTGCTTTTGAAGGGAATTTTTAGACCTAAGGTGATGAGACAGTTGCTGTTAGATGAGCCTTACCTGTTGATTCTATCAATATTATCCCCTGGCATACTGATGGCTATGTATAATAATGGGATTTTTGAACTTGCGGTGAGGTTGTGGATTAATGAGAAGCAATCCATAGCGATGATAGCATCGCTACTATCAGCTTTAGCCCTACGAGTGTCAGCGGCAGAAACACTCGTCGCACAGAGGATTATAATTGATGCTGCAGCTACAGACCTCCTCGATGCTACGTGTGACGGGTTCAACCTGCATCTAACGTACCCCACTGCATTGATGGTGTTGCAAGTTGTTAAGAATAGAAATGAATGTGATGATACTCTATTCAAGGCGGGTTTTCCAAGTTACAACACGAGCGTCGTACAGATTATGGAAAAAAATTATCTAAATCTCTTGAACGATGCTTGGAAAGATTTAACTTGGCGGGAAAAATTTTCCGCAACATGGTACTCATACAGAGCAAAACGCTCTATCACTCGGTACATAAAACCCACAGGAAGGGCAGATTTGAAAGGGTTATACAACATATCACCACAAGCATTCTTGGGCCGAAGCGTCCAGGTGGTCAAAGGCACTGCCTCAGGATTGAGCGAGCGATTTAATAATTACTTCAATACTAAGTGTGTAAATATTTCATCCTTTTTCATTCGTAGAATCTTTAGGCGTTTGCCAACTTTCGTCACTTTTGTTAACTCATTATTAGTTATTAGTATGTTAACTAGTGTAGTGGCAGTGTGTCAGGCAATAATTTTAGATCAGAAGAAGTATAGGAGAGAGATCGAGTTGATGCAGATAGAGAAGAATGAGATTGTCTGCATGGAGCTATACGCAAGTTTACAGCGCAAACTTGAACGCGATTTCACATGGGATGAGTACATTGAGTATTTGAAATCAGTGAACCCTCAGATAGTTCAATTTGCTCAAGCGCAGATGGAAGAATATGATGTGCGACACCAGCGTTCCACACCAGGTGTTAAAAATTTGGAACAAGTGGTAGCATTTATGGCTTTAGTCATCATGGTGTTCGATGCTGAAAGGAGTGACTGCGTGTTCAAAACTCTCAATAAATTTAAGGGTGTCCTTTCCTCACTGGACCATGAAGTTAGACATCAGTCCTTGGATGATGTGATCAAGAATTTTGATGAGAGGAATGAGACTATTGATTTTGAATTGAGTGAGGACACAATTCGAACATCATCAGTGCTAGATACAAAGTTTAGTGATTGGTGGGACCGACAAATCCAAATGGGACATACACTTCCACATTACAGAACCGAGGGGCACTTCATGGAATTTACAAGGGCAACTGCTGTTCAAGTGGCTAATGACATTGCCCACAGCGAACACCTAGACTTTCTAGTAAGGGGAGCTGTTGGGTCTGGAAAGTCAACTGGGTTGCCTGTTCATCTTAGTGTAGCTGGATCTGTGCTTTTAATTGAGCCAACGCGGCCACTAACGGAGAACGTTTTCAAACAGCTATCTAGTGAACCATTCTTCAAGAAGCCAACACTGCGTATGCGTGGAAATAGTATATTTGGCTCTTCCCCAATCTCCGTTATGACTAGCGGGTTTGCGCTACACTACTTCGCCAATAATCGCTCTCAATTAGCTCAGTTCAATTTTGTAATATTTGATGAGTGCC
ATGTTCTGGATCCTTCCGCAATGGCGTTCCGCAGTCTGCTGAGTGTTTATCATCAAGCATGTAAAGTATTAAAAGTGTCAGCTACTCCAGTGGGAAGAGAGGTTGAATTCACAACACAGCAGCCAGTCAAGTTAATAGTGGAAGACACACTGTCTTTCCAATCATTTGTTGATGCACAAGGTTCTAAAACTAATGCTGATGTTGTCCAGTTTGGTTCAAACATACTTGTTTACGTGTCGAGCTACAATGAAGTTGACAACTTGGCTAAGCTCCTAACAGATAAGAATATGATGGTCACAAAGGTTGATGGCAGAACAATGAAGCACGGTTGCCTAGAAATTGTCACAAAAGGAACCAGTGCGAGACCACATTTTGTTGTAGCAACCAACATAATTGAGAATGGAGTGACTTTGGACATAGATGTAGTAGTGGATTTTGGGTTGAAAGTCTCACCGTTCTTGGACATTGACAATAGGAGCATTGCTTACAATAAGGTGAGTGTTAGTTATGGTGAGAGAATTCAAAGGCTGGGTCGTGTTGGACGCTTCAAGAAAGGAGTAGCATTGCGCATTGGACACACTGAGAAGGGAATTATCGAAATTCCAAGCATGATCGCTACTGAGGCGGCTCTTGCTTGCTTTGCATATAACTTACCAGTGATGACAGGAGGCGTCTCAACTAGTCTGATTGGCAATTGTACTGTGCGCCAGGTCAAAACAATGCAGCAATTTGAATTGAGCCCCTTCTTTATCCAGAATTTCGTTGCTCATGATGGATCAATGCATCCTGTCATACATGACATTCTCAAAAAGTATAAACTCCGAGATTGTATGACACCTTTGTGCGATCAGTCTATACCATACAGGGCATCGAGCACTTGGTTATCGGTTAGTGAATATGAGCGACTTGGAGTGGCTTTAGAAATTCCAAAGCAAGTCAAAATTGCATTCCATATCAAAGAGATCCCTCCTAAGCTCCACGAAATGCTTTGGGAAACGGTTGTCAAGTACAAAGACGTTTGCTTATTTCCAAGCATTCGAGCATCGTCCATCAGCAAAATCGCATACACATTGCGTACAGACCTCTTCGCCATCCCAAGAACTCTAATATTGGTGGAGAGATTGCTTGAAGAGGAGCGAGTGAAGCAGAGCCAATTCAGAAGTCTCATCGATGAAGGGTGCTCAAGCATGTTTTCAATTGTCAACTTGACCAACACTCTCAGGGCCAGATATGCAAAAGATTACACCGCAGAGAACATACAAAAACTTGAGAAAGTGAGAAGTCAATTAAAAGAATTCTCAAATTTGGATGGTTCTGCATGTGAGGAAAATTTAATAAAGAGGTATGAGTCTTTGCAGTTCGTTCACCACCAAGCTGCGACGTCACTTGCAAAGGATCTCAAGTTGAAGGGGACTTGGAAGAAGTCATTAGTGGCTAAAGACTTGATCATAGCAGGCGCTGTTGCAATTGGTGGAATAGGACTCATATATAGTTGGTTCACACAATCAGTTGAGACTGTGTCTCACCAAGGGAAAAATAAATCCAAAAGAATCCAAGCCTTGAAGTTTCGCCATGCTCGTGACAAAAGGGCTGGCTTTGAAATTGACAACAATGATGACACAATAGAGGAATTCTTTGGATCTGCATACAGGAAGAAGGGAAAAGGTAAAGGTACCACAGTTGGTATGGGCAAGTCAAGCAGGAGGTTCATTAACATGTACGGGTTTGACCCGACAGAGTACTCATTCATCCAATTCGTTGATCCACTCACTGGGGCACAAATAGAAGAGAATGTCTATGCTGACATTAGAGATGTTCAAGAAAGATTTAGTGAGGTGCGAAAGAAAATGGTTGAGAATGATGATATTGAAATGCAAGCCTTGGGTAGTAACACGACCATACATGCATACTTTAGGAAAGATTGGTCTGACAAAGCTTTGAAGATTGATTTAATGCCACATAACCCACTCAAAGTTTGTGACAAAACAAATGGCATTGCC
AAACTTCCTGAGAGAGAGTTCGAACTAAGGCAGACTGGGCCAGCTGTAGAAGTCGACGTGAAGGACATACCAGCGCAGGAAGTGGAGCATGAAGCCAAATCGCTCATGAGAGGCTTGAGAGACTTCAACCCAATTGCTCAAACAGTTTGTAGGCTGAAAGTATCTGTTGAATATGGGACATCAGAGATGTACGGTTTTGGATTTGGAGCATACATAATAGCGAACCACCATTTGTTTAGGAGTTACAATGGTTCCATGGAGGTGCGATCCATGCATGGTACATTCAGGGTGAAGAATCTACACAGTTTGAGCGTTCTGCCAATTAAAGGCAGGGACATCATCCTCATCAAAATGCCGAAGGATTTCCCTGTTTTTCCACAGAAATTGCATTTCCGAGCTCCTACACAGAATGAAAGAGTTTGTTTAGTTGGGACCAACTTTCAGGAGAAGTATGCATCGTCGATCATCACAGAAACAAGCACTACTTACAATATACCAGGCAGCACATTCTGGAAGCATTGGATTGAAACAGATAATGGACATTGTGGACTACCAGTGGTGAGCACAGCCGATGGATGTCTAGTCGGAATTCATAGTTTGGCAAACAATGCACACACCACGAACTACTACTCAGCCTTCGATGAAGATTTTGAAAGCAAGTACCTCCGAACCAATGAGCACAATGAATGGGTCAAATCTTGGGTTTATAATCCAGACACAGTGTTGTGGGGCCCGTTGAAGCTTAAAGATAGCACTCCCAAAGGGTTATTCAAAACAACAAAGCTTGTGCAAGATCTAATCGATCATGATGAAGTCGCGGAGCAAGCTAAGCACTCTGCGTGGATGTTTGAAGCCTTGACAGGAAATTTGCAAGCTGTCGCAACAATGAAGAGTCAATTAGTAACCAAGCATGTAGTTAAAGGAGAGTGTCGACACTTCAAAGAATTCCTGACTGTGGATGCAGAAGCAGAGGCATTCTTCAGGCCTTTGATGGATGCATATGGGAAAAGCTTGCTGAATAGAGATGCGTACATCAAAGACATAATGAAGTACTCAAAACCTATAGATGTTGGTATCGTGGACTGTGATGCATTTGAAGAAGCCATCAATAGGGTTATCATCTACTTGCAAGTGCACGGCTTCAAGAAGTGTGCATATGTCACTGACGAGCAAGAAATTTTCAAAGCGCTCAACATGAAAGCTGCAGTCGGAGCCATGTATGGTGGCAAAAAGAAAGACTATTTTGAGCATTTCACTGATGCAGATAAGGAAGAAATAGTCATGCAAAGCTGTCTGCGACTGTATAAAGGCTTGCTCGGCATTTGGAACGGATCGTTGAAGGCAGAGCTCCGGTGTAAGGAAAAGATACTTGCAAACAAGACGAGGACGTTCACTGCTGCACCTCTAGACACTTTGCTGGGTGGTAAAGTGTGTGTTGATGACTTCAATAATCAATTTTATTCAAAGAATATTGAATGCTGTTGGACAGTTGGGATGACTAAGTTTTATGGTGGTTGGGATAAACTGCTTCGGCGTTTACCTGAGAATTGGGTATACTGTGATGCCGATGGCTCACAGTTTGATAGTTCACTAACTCCATACTTAATCAATGCTGTTCTCACCATCAGAAGCACATACATGGAAGACTGGGATGTGGGGTTGCAAATGCTGCGTAATTTATACACTGAGATTATTTACACACCTATCTCAACTCCAGATGGAACAATTGTTAAGAAGTTCAGAGGAAATAACAGTGGTCAGCCTTCTACTGTTGTGGACAACTCTCTTATGGTTGTCCTTGCCATGCACTATGCTCTCATCAAAGAATGCATTGAGTTTGAAGAGATTGACAGCACGTGCGTGTTCTTCGTCAATGGTGATGATTTGCTGATTGCCGTGAATCCGGATAAAGAGGGCATTCTTGATAGATTGTCACAGCACTTCTCAGATCTTGGTTTAAATTATGATTTTTCGTCAAGAACAAGAAATAAGGAAGAGTT
ATGGTTTATGTCTCATAGAGGCCTACTGATTGAGGGCATGTACGTGCCGAAACTTGAAGAAGAAAGGATTGTGTCCATTCTCCAATGGGACAGAGCAGACTTGGCTGAACACAGGCTTGAGGCGATTTGCGCAGCTATGATAGAGTCCTGGGGTTATTCTGAACTAACACACCAAATTAGAAGATTCTACTCATGGTTATTGCAACAGCAACCTTTTGCAACAATAGCGCAGGAAGGGAAGGCTCCTTATATAGCAAGCATGGCATTAAGGAAACTGTATATGGATAGAGCTGTGGATGAGGAAGAGCTGAGAGCCTTCACTGAAATGATGGTCGCATTAGACGATGAGTTTGAGTGTGACTCTTATGAAGTACACCACCAGGGAAACGATACAATCGATGCAGGAGGAAGCAGCAAGAAAGATGCAAGACCAGAGCAAGGCAGCATCCAGTCAAACCCGAACAAAGGAAAAGATAAGGATGTGAATGCTGGTACATCTGGGACACATACTGTGCCGAGAATCAAGGCTATCACGTCCAAAATGAGAATGCCCAAAAGCAAAGGAGCAACCGTGCTAAACTTAGAACACTTGCTTGAGTATGCTCCACAACAAATTGATATTTCAAATACTCGGGCAACTCAATCACAGTTTGATACGTGGTATGAGGCAGTGCGGATGGCATACGACATAGGAGAAACTGAGATGCCAACTGTGATGAATGGGCTTATGGTTTGGTGCATTGAAAATGGAACCTCGCCAAATGTCAACGGAGTCTGGGTTATGATGGATGGGGATGAACAAGTCGAGTACCCGTTGAAACCAATCGTTGAGAATGCAAAACCAACCCTTAGGCAAATCATGGCACATTTCTCAGATGTTGCAGAAGCGTATATAGAAATGCGCAACAAAAAGGAACCATATATGCCACGATATGGTTTAATTCGAAATCTGCGGGATGTGGGTTTAGCGCGTTATGCCTTTGACTTTTATGAGGTCACATCACGAACACCAGTGAGGGCTAGGGAAGCGCACATTCAAATGAAGGCCGCAGCATTGAAATCAGCCCAACCTCGACTTTTCGGGTTGGACGGTGGCATCAGTACACAAGAGGAGAACACAGAGAGGCACACCGCCGAGGATGTCTCTCCAAGTATGCATACTCTACTTGGAGTCAAAAACATGTGA ================================================ FILE: inst/extdata/GVariation/sample_alignment.fa ================================================ >Mont ATGGCAACTTACACATCAACAATCCAGTTTGGTTCCATTGAATGCAAACTTCCATACTCACCCGCTCCTTTTGGGCTAGT TGCGGGGAAACGAGAAGTTTTAACCACCACTGACCCCTTCGCAAGTTTGGAGATGCAGCTTAGTGCGCGATTACGAAGGC AAGAGTTTGCAACTATTCGAACATCCAAGAATGGTACTTGCATGTATCGATACAAGACTGATGTCCAGATTGCGCGCATT CAAAAGAAGCGCGAGGAAAGAGAAAGAGAGGAATATAATTTCCAAATGGCTGCGTCAAGTGTTGTGTCGAAGATCACTAT TGCTGGTGGAGAGCCACCTTCAAAACTTGAATCACAAGTGCGGAGGGGTGTCATCCACACAACTCCAAGGATGCGCACAG CAAAAACATATCACACGCCAAAGCTGACAGAGGGACAAATGAACCACCTTATCAAGCAGGTGAAGCAAATTATGTCAACC AAAGGAGGGTCTGTTCAACTGATTAGCAAGAAAAGTACTCATGTTCACTATAAAGAAGTTTTGGGATCACATCGCGCAGT TGTTTGCACTGCACATATGAGAGGTTTACGAAAGAGAGTGGACTTTCGGTGTGATAAATGGACCGTTGTGCGTCTACAGC 
ATCTCGCCAGGACGGACAAGTGGACTAACCAAGTTCGTGCTACTGATCTACGCAAGGGCGATAGTGGAGTTATATTGAGT AATACTAATCTCAAAGGAAACTTTGGGAGAAGCTCGGAGGGCCTATTCATAGTGCGTGGGTCGCACGAAGGAAAAATCTA TGATGCACGTTCCAAGGTTACTCAAGGGGTTATGGATTCAATGGTTCAGTTCTCAAGCGCTGAAAGCTTTTGGAAGGGAT TGGACGGCAATTGGGCACAAATGAGATATCCTACAGATCATACATGTGTGGCAGGCTTACCAGTTGAAGACTGTGGCAGA GTTGCAGCGATAATGACACACAGTATTTTACCGTGCTATAAGATAACCTGCCCTACCTGTGCCCAACAATATGCCAACTT GCCAGCCAGTGACTTACTTAAGATATTACACAAGCACGCAAGTGATGGTCTAAATCGATTGGGGGCAGACAAAGATCGCT TTGTGCATGTCAAAAAGTTCTTGACAATCTTAGAGCACTTAACTGAACCGGTTGATCTGAGTCTAGAAATTTTCAATGAA GTATTCAAGTCTATAGGGGAGAAGCAACAATCACCTTTCAAAAACCTGAATATTCTGAATAATTTCTTTTTGAAAGGAAA GGAAAATACAGCTCGTGAATGGCAGGTGGCTCAATTAAGCTTACTTGAATTGGCAAGATTCCAAAAGAACAGAACGGATA ATATCAAGAAAGGAGACATCTCGTTCTTTAGGAATAAACTATCTGCAAAAGCAAATTGGAACTTGTATCTGTCATGTGAT AACCAGCTGGATAAGAATGCAAACTTCCTGTGGGGACAGAGGGAATATCATGCTAAGCGATTTTTCTCGAACTATTTCGA GGAAATTGATCCAGCGAAGGGCTATTCAGCATACGAAAATCGTTTGCATCCGAATGGGACAAGAAAACTTGCAATTGGAA ACCTAATTGTACCACTTGATCTGGCTGAGTTTAGGCGGAAGATGAAAGGTGATTATAAAAGACAGCCAGGGGTGAGTAAG AAGTGCACGAGCTCGAAGGATGGAAACTACGTGTATCCCTGTTGTTGCACTACACTTGATGATGGCTCAGCTGTTGAGTC AACATTTTACCCGCCAACTAAGAAGCACCTCGTAATAGGTAATAGTGGCGACCAAAAGTATGTTGACTTACCAAAAGGGA ATTCTGAGATGTTATATATTGCCAGGCAAGGCTTCTGTTACATTAACATTTTTCTCGCGATGTTGATTAACATTAGTGAG GAAGATGCAAAGGATTTCACTAAGAAGGTTCGTGACATGTGTGTGCCAAAGCTTGGAACCTGGCCAACCATGATGGATTT GGCTACAACTTGTGCTCAAATGAAAATATTCTACCCTGATGTTCATGATGCAGAACTGCCTAGAATACTAGTCGATCACG AAACGCAGACATGCCATGTGGTTGACTCGTTTGGCTCACAAACAACTGGGTATCATATTTTGAAAGCATCTAGCGTGTCC CAACTTATTTTGTTTGCTAATGATGAGTTGGAGTCTGACATTAAGCACTATAGAGTTGGTGGTATTCCTGGAGCATGCCC TGAGCTTGGGTCCACAATATCACCTTTTAGAGAAGGAGGAATCATAATGTCTGAGTCAGCAGCGCTAAAACTGCTCCTAA AGGGAATTTTTAGGCCCAAAGTGATGAAGCAATTGCTACTGGATGAACCATATTTGCTCATTTTATCGATATTATCTCCT GGTATACTTATGGCCATGTACAACAATGGGATATTTGAGTTAGCGGTGAAGTTGTGGATCAATGAGAAACAATCTATAGC CATGATAGCATCGTTATTGTCCGCCTTGGCTTTACGAGTGTCAGCAGCAGAAACACTCGTTGCACAGAGGATTATAATTG 
ACACGGCAGCAACAGATCTTCTCGATGCTACGTGTGATGGATTCAACTTACATCTAACATATCCCACTGCACTCATGGTG TTGCAAGTTGTTAAGAACAGAAATGAATGTGATGATACGTTGTTTAAAGCAGGTTTTTCACATTACAACATGAGTGTCGT GCAGATTATGGAAAAAAATTATCTAAGCCTCTTGGGCGATGCTTGGAAAGATTTAACCTGGCGAGAAAAATTATCCGCAA CATGGCACTCATACAAAGCAAAGCGCTCTATCACTCAGTTCATAAAACCCATAGGCAAAGCAGATTTAAAAGGGTTGTAC AACATATCACCGCAAGCATTCTTGGGTCAGGGCGTACAGAGAGTCAAAGGCACCGCCTCAGGGTTGAATGAGCGACTCAA TAATTATATCAATACTAAGTGTGTAAATATTTCATCCTTTTTCATTCGTAGAATTTTCCGGCGCTTGCCAACTTTTGTAA CTTTCATTAATTCATTATTAGTTATTAGTATGCTAACTAGTGTAGTAGCAGTGTGTCAAGCAATAATTCTAGATCAAAGG AAGTATAGAAAAGAAATTGAGTTGATGCAGATTGAGAAGAATGAAATTGTTTGTATGGAGTTGTATGCGAGTCTGCAGCG CAAACTTGAGCGTGAATTCACATGGGATGAATATATGGAATATTTGAAATCTGTGAATCCCCAGATAGTTCAATTCGCGC AAGCTCAAATGGAAGAATATAATGTGCGACATCAGCGCTCCACACCAGGTGTTAAGAATTTAGAGCAGGTGGTAGCATTT ATAACTCTAATTATCATGATGTTTGATGCTGAAAGGAGCGACTGTGTATTTAAGACTCTCAACAAATTCAAAGGCATCGT TTCTTCAATGGATCATGAAGTTAGACACCAGTCCTTGGATGATGTAATCAAGAATTTCGATGAAAGGAACGAAGTTATTG ATTTTGAACTAAATGAGGATACAATTAAAACATCATCAGTGTTGGACACAAAGTTTAGCGACTGGTGGGATCGGCAAATC CAAATGGGACACACACTTCCCCATTATAGAACTGAGGGACACTTCATGGAATTCACAAGGGCAACTGCTGTACAAGTGGC CAACGACATCGCGCATAGTGAGCACCTAGACTTTCTAGTGAGGGGAGCTGTTGGGTCTGGAAAATCTACTGGACTGCCTG TCCATCTCAGTGCAGCTGGATCTGTGCTTTTGATAGAACCAACTCGACCACTTGCAGAAAACGTGTTCAAGCAATTATCC AGTGAACCGTTTTTCAAGAAGCCAACACTGCGCATGCGAGGAAATAGTGTGTTTGGTTCCTCTCCAATCTCCATTATGAC TAGCGGCTTTGCGTTGCACTACTATGCTAATAATCGCTCTCAGCTAACTCAGTTTAATTTCATAATTTTTGATGAATGTC ATGTTTTAGATCCTTCTGCAATGGCATTTCGTAGCTTGTTAAGTGTGTATCACCAAACATGCAAAGTGTTAAAGGTGTCA GCCACTCCAGTGGGAAGGGAGGTCGAGTTCACAACACAACAACCAGTTAAATTGGTGGTTGAGGATACACTTTCATTCCA ATCTTTTGTTGATGCGCAAGGCTCAAAAACCAATGCTGACGTAGTTCAGCATGGTTCGAACATACTCGTGTATGTGTCGA GTTACAATGAAGTGGATACATTAGCCAAGCTTCTAACAGATAGGAATATGATAGTCTCAAAAGTTGATGGCAGAACAATG AAGCACGGATGCTTAGAAATTGTAACGAAAGGGACTAGTGCAAAGCCACATTTTGTCGTAGCAACCAACATTATTGAAAA TGGAGTAACTTTAGATATAGATGTAGTTGTAGATTTTGGGCTTAAAGTCTCACCGTTTTTAGATATTGACAATAGGAGCA 
TAGCATACAATAAGATTAGTGTTAGCTATGGAGAAAGAATTCAGAGGTTGGGCCGTGTTGGGCGCTTTAAGAAGGGAGTG GCATTGCGTATTGGACACACCGAAAAGGGAATTATTGAGATTCCAAGTATGATTGCTAGTGAAGCTGCGCTTGCGTGCTT TGCATACAATTTGCCAGTAATGACAGGGGGTGTTTCAACTAGCCTCATTGGCAATTGTACTGTTCGTCAAGTTAAAACTA TGCAACAATTTGAGCTGAGTCCATTCTTTATACAAAATTTTGTTGCCCATGATGGATCAATGCATCCTGTCATACATGAC ATTCTTAAGAAGTATAAACTGCGAGATTGTATGACGCCCTTGTGTGATCAATCCATACCTTACAGAGCCTCAAGCACTTG GTTGTCTGTTAGTGAGTACGAACGACTCGGAGTGGTTTTGGACATTCCAAAACAGATCAAGATTGCATTCCACATCAAGG ATATCCCTCCTAAGTTGCATGAAATGCTTTGGGAAACAGTTATCAAATATAAGGATGTTTGTTTGTTTCCAAGTATTCGG GCTTCATCCATTAGCAAAATTGCATACACACTGCGCACTGATCTTTTTGCAATTCCCAGAACCCTAATTCTAGTTGAAAG ATTGCTCGAGGAGGAACGAGTGAAACAGAGTCAATTCAGAAGTCTCATTGATGAAGGATGCTCAAGCATGTTTTCAATTG TTAATTTAACAAACACTCTTAGAGCTAGATATGCAAAGGATTACACTGCAGAAAACATACAGAAGCTCGAGAAAGTGAGG AGTCAGTTAAAGGAGTTCTCAAATTTAAATGGCTCTGCATGCGAGGAGAACTTAATGAAGAGGTATGAATCTCTACAGTT TGTGCATCATCAAGCAACAACTGCACTCGCAAAGGATTTGAAGTTGAAAGGAGTTTGGAAGAAGTCATTAGTTGTGCAGG ACTTAATCATAGCGGGTGCCGTTGCTATTGGTGGAATAGGGCTCATCTATAGTTGGTTTACTCAATCAGTTGAAACTGTG TCTCACCAGGGCAAGAACAAATCCAAAAGAATTCAAGCATTGAAGTTTCGACACGCCCGCGATAAGAGGGCTGGTTTTGA AATTGATAACAATGATGATACAATAGAAGAATTCTTTGGATCTGCATACAGGAAGAAGGGAAAAGGTAAAGGCACCACTG TTGGTATGGGCAAGTCAAGCAGGAGGTTTGTTAATATGTATGGATTTGACCCAACAGAATATTCATTCATCCAGTTCGTT GATCCGCTCACTGGAGCTCAAATTGAAGAGAACGTCTATGCTGATATTAGAGACATCCAAGAGCGCTTTAGTGATGTCCG CAAGAAAATGGTAGAGGATGATGAAATCGAATTGCAAGCATTGGGCAGCAACACAATCATTCATGCTTACTTCAGGAAGG ATTGGTCTGACAAGGCTCTAAAAATTGATTTGATGCCACACAACCCACTCAAAATCTGTGATAAATCGAATGGCATTGCT AAGTTTCCTGAAAGAGAACTTGAGTTGAGGCAAACTGGGCCAGCAACAGAGGTTGATGTGAAAGACATTCCAAAACAGGA AGTGGAGCATGAAGCCAAATCACTCATGAGAGGTTTAAGGGATTTCAATCCAATTGCTCAAACAGTTTGCAGAGTAAAAG TGTCTGTTGAATATGGAACGTCTGAAATGTATGGGTTCGGTTTTGGTGCGTATATTATAGTAAACCACCATCTATTCAAG AGTTTCAATGGATCCATGGAAGTGCGATCAATGCATGGAACATTCAGAGTGAAGAATTTGCATAGCCTGAGCGTTTTACC GATCAAAGGCAGAGACATTATCATCATAAAGATGCCAAAGGATTTCCCTGTTTTCCCACAAAAACTGCACTTCCGAGCTC 
CAGTGCAGAATGAGAGGATTTGTTTGGTTGGAACTAATTTTCAAGAAAAACATGCATCATCAATCATCACAGAAACGAGT ACTACATACAATGTACCGGGCAGCACTTTTTGGAAGCATTGGATTGAAACAAATGATGGGCATTGTGGATTACCAGTAGT GAGTACAGCTGATGGATGTCTAGTTGGAATACACAGCTTGGCGAATAATGTGCAAACCACGAATTATTATTCAGCCTTTG ATGAGGATTTTGAAAGTAAGTATCTCCGAACTGATGAGCATAATGAGTGGACCAAATCGTGGGTATATAACCCAGATACT GTGTTGTGGGGTCCATTGAAGCTCAAAGAGAGTACCCCTAAAGGCCTGTTTAAGACAACAAAACTTGTACAGGATTTAAT TGATCATGATGTTGTTGTAGAGCAAGCTAAACATTCTGCGTGGATGTATGAGGCTCTAACAGGGAATTTGCAAGCTGTGG CGACAATGAAGAGTCAGCTAGTGACAAAGCACGTGGTCAAAGGGGAGTGTCGGCACTTCAAAGAGTTCTTAACTGTGGAT TCGGAAGCAGAAGCTTTCTTCAGGCCTTTGATGGATGCTTATGGGAAGAGCTTGTTAAATAGAGAAGCATATATAAAGGA CATAATGAAATACTCAAAGCCTATTGATGTTGGAATAGTAGACTGTGATGCTTTTGAAGAGGCTATCAATAGGGTTATCA TTTATCTGCAAGTACATGGCTTCCAGAAATGCAATTACATCACCGATGAGCAGGAAATTTTCAAAGCTCTCAATATGAAA GCTGCTGTCGGGGCTATGTATGGAGGCAAGAAGAAAGACTACTTCGAGCATTTTACTGAGGCGGATAAAGAGGAAATTGT TATGCAAAGTTGCTTACGATTGTACAAGGGCTCACTTGGCATATGGAATGGATCATTGAAAGCAGAACTTCGGTGCAAAG AGAAGATACTTGCAAATAAGACAAGGACATTCACTGCTGCACCTTTAGATACTCTACTGGGTGGGAAGGTGTGCGTTGAT GATTTTAATAATCAATTCTACTCAAAGAACATTGAATGCTGCTGGACTGTTGGAATGACTAAGTTTTATGGAGGTTGGGA CAAATTGCTTCGGCGTCTACCTGAAAATTGGGTGTACTGCGATGCCGATGGTTCACAATTCGATAGTTCACTCACCCCAT ACCTAATTAATGCTGTTCTCATCATCAGAAGCACATACATGGAAGATTGGGACTTGGGGTTGCAAATGTTGCGCAATTTG TACACAGAAATAATTTACACACCAATCTCAACTCCAGATGGAACAATTGTCAAGAAGTTTAGAGGTAATAATAGCGGTCA ACCTTCTACCGTTGTGGATAATTCTCTCATGGTTGTTCTTGCTATGCATTACGCTCTCATTAAGGAGTGCGTTGAGTTTG AAGAAATCGACAGCACGTGTGTATTCTTTGTTAATGGTGATGACTTATTGATTGCTGTGAATCCGGAGAAAGAGAGCATT CTCGATAGAATGTCACAACATTTCTCAGATCTTGGTTTGAACTATGATTTTTCGTCGAGAACAAGAAGGAAGGAGGAATT GTGGTTCATGTCCCATAGAGGCCTGCTAATTGAGGGTATGTACGTGCCAAAGCTTGAAGAAGAGAGAATTGTATCCATTC TGCAATGGGATAGGGCTGATCTGCCAGAGCACAGATTAGAAGCGATTTGTGCAGCAATGATAGAATCCTGGGGTTATTTT GAGTTAACGCACCAAATTAGGAGATTCTACTCATGGTTGTTACAACAGCAACCTTTTTCAACGATAGCACAGGAAGGAAA AGCTCCATACATAGCGAGCATGGCATTGAAGAAGCTGTACATGAATAGGACAGTAGATGAGGAGGAACTGAAGGCTTTCA 
CTGAAATGATGGTTGCCTTGGATGACGAATTTGAGTGCGATACTTATGAAGTGCACCATCAAGGAAATGACACAATCGAT GCAGGAGGAAGCACTAAGAAGGATGCAAAACAAGAGCAAGGTAGCATTCAACCAAATCTCAACAAGGAAAAGGAAAAGGA CGTGAATGTTGGAACATCTGGAACTCATACTGTGCCACGAATTAAAGCTATCACGTCCAAAATGAGAATGCCTAAGAGTA AAGGTGCAACTGTACTAAATTTGGAACACTTACTCGAGTATGCTCCACAGCAAATTGACATCTCAAATACTCGAGCAACT CAATCACAGTTTGATACGTGGTATGAAGCAGTACAACTTGCATACGACATAGGAGAAACTGAAATGCCAACTGTGATGAA TGGGCTTATGGTTTGGTGCATTGAAAATGGAACCTCGCCAAACATCAACGGAGTTTGGGTTATGATGGATGGAGATGAAC AAGTCGAATACCCACTGAAACCAATCGTTGAGAATGCAAAACCAACACTTAGGCAAATCATGGCACATTTCTCAGATGTT GCAGAAGCGTATATAGAAATGCGCAACAAAAAGGAACCATATATGCCACGATATGGTTTAGTTCGTAATCTGCGCGATGG AAGTCTGGCTCGCTATGCTTTTGACTTTTATGAAGTTACATCACGGACACCAGTGAGGGCTAGAGAGGCACACATTCAAA TGAAGGCCGCAGCTTTAAAATCAGCTCAATCTCGACTTTTCGGATTGGATGGTGGCATTAGTACACAAGAGGAAAACACA GAGAGGCACACCACCGAGGATGTTTCTCCAAGTATGCATACTCTACTTGGAGTGAAGAACATGTGA >CF_YL21 ATGGCAATCTACACGTCAACAATCTGTTTCGGTTCGTTTGAATGCAAGCTACCATACTCACCCGCCTCTTGCGGGCATAT TGTGAAGGAACGAGAAGTGCTGGCTTCCATCGATCCTTTCGCAGATCTGGAAACACAACTTAGTGCACGATTGCTCAAGC AAGAATATGCTACTGTTCGTGTGCTCAAGAACGGTACTCTTACGTACCGATACAAGACTGATGCCCAGATAACGCGCATC CAGAAGAGACTGGAAAGGAAGGATAGGGAAGAATACCACTTCCAGATGGCAGCTCCTAGTATTGTGTCAAAGATCACTAT TGCTGGTGGAGAGCCACCTTCAAAACTTGAATCACAAGTGCGGAGGGGTGTCATCCACACAACTCCAAGGATGCGCACAG CAAAAACATATCACACGCCAAAGCTGACAGAGGGACAAATGAACCACCTTATCAAGCAGGTGAAGCAAATTATGTCAACC AAAGGAGGGTCTGTTCAACTGATTAGCAAGAAAAGTACCCATGTTCACTATAAAGAAGTTTTGGGATCACATCGCGCAGT TGTTTGCACTGCACATATGAGAGGTTTACGAAAGAGAGTGGACTTTCGGTGTGATAAATGGACCGTTGTGCGTCTACAGC ATCTCGCCAGGACGGATAAGTGGAATAACCAAGTTCGTGTTACTGATCTACGCAAGGGCGATAGTGGAGTTATATTGAGT AATACTAATCTCAAAGGAAACTTTGGGAGAAGCTCGGAGGGCATATTCATAGTGCGTGGGTCGCACGAAGGAAAAATCTA TGATGCACGTTCCAAGGTTACTCAAGGGGTTATGGATTCAATGGTTCAGTTCTCAAGCGCTGAAAGCTTTTGGAAGGGAT TGGACGGCAATTGGGCACAAATGAGATATCCTACAGATCATACATGTGTGGCAGGCTTACCAGTTGAAGACTGTGGCAGA GTTGCAGCGATAATGACACACAGTATTTTACCGTGCTATAAGATAACCTGCCCTACCTGTGCCCAACAATATGCCAACTT 
GCCAGCCAGTGACTTACTTAAGATATTACACAAGCACGCAAGTGATGGTTTAAATCGATTGGGGGCAGACAAAGATCGCT TTGTGCATGTCAAAAAGTTCTTGACAATCTTAGAGCACTTAACTGAACCGGTTGATCTGAGTCTAGAAATTTTCAATGAA GTATTCAAGTCTATAGGGGAGAAGCAACAATCACCTTTCAAAAACCTGAATATTCTGAATAATTTCTTTTTAAAAGGAAA GGAAAATACAGCTCGTGAATGGCAGGTGGCTCAATTAAGCTTACTTGAATTGGCAAGATTCCAAAAGAACAGAACGGATA ATATCAAGAAAGTAGACATCTCGTTCTTTAGGAATAAACTATCTGCCAAAGCAAATTGGAACTTGTATCTGTCATGTGAT AACCAGCTGGATAAGAATGCAAACTTCCTGTGGGGACAGAGGGAATATCATGCTAAGCGATTTTTCTCGAACTATTTCGA GGAAATTGATCCAGCGAAGGGCTATTCAGCATACGAAAATCGTTTGCATCCGAATGGGACAAGAAAACTTGCAATTGGAA ACCTAATTGTACCACTTGATCTGGCTGAGTTTAGGCGGAAGATGAAAGGTGATTTTAAAAGACAGCCAGGGGTGAGTAAG AAGTGCACGAGCTCGAAGGATGGAAACTACGTGTATCCCTGTTGTTGCACTACACTTGATGATGGCTCAGCTGTTGAATC AACATTTTACCCGCCAACTAAGAAGCGCCTCGTAATAGGTAATAGTGGCGACCAAAAGTATGTTGACTTACCAAAAGGGA ATTCTGAGATGTTATATATTGCCAGGCAAGGCTTCTGTTACATTAACATTTTCCTCGCGATGTTGATTAACATTAGTGAG GAAGATGCAAAGGATTTCACTAAGAAGGTTCGTGACATGTGTGTGCCAAAGCTCGGAACCTGGCCAACCATGATGGATCT GGCTACAACTTGTGCTCAAATGAAAATATTCTACCCTGATGTTCATGATGCAGAACTGCCTAGAATACTAGTCGATCACG AAACGCAGACATGCCATGTGGTTGACTCGTTTGGCTCACAAACAACTGGGTATCATATTTTGAAAGCATCTAGCGTGTCC CAACTTATTTTGTTTGCTAATGATGAGTTGGAGTCTGACATTAAGCACTATAGAGTTGGTGGTATTCCTAATGCATGCCC TGAACTTGGGTCCACAATATCACCTTTCAGAGAAGGAGGAGTTATAATGTCTGAGTCGGCAGCGCTGAAACTGCTTTTGA AGGGAATTTTTAGACCTAAGGTGATGAGACAGTTGCTGTTAGATGAGCCTTACCTGTTGATTCTATCAATATTATCCCCT GGCATACTGATGGCTATGTATAATAATGGGATTTTTGAACTTGCGGTGAGGTTGTGGATTAATGAGAAGCAATCCATAGC GATGATAGCATCGCTACTATCAGCTTTAGCCCTACGAGTGTCAGCGGCAGAAACACTCGTCGCACAGAGGATTATAATTG ATGCTGCAGCTACAGACCTCCTCGATGCTACGTGTGACGGGTTCAACCTGCATCTAACGTACCCCACTGCATTGATGGTG TTGCAAGTTGTTAAGAATAGAAATGAATGTGATGATACTCTATTCAAGGCGGGTTTTCCAAGTTACAACACGAGCGTCGT ACAGATTATGGAAAAAAATTATCTAAATCTCTTGAACGATGCTTGGAAAGATTTAACTTGGCGGGAAAAATTTTCCGCAA CATGGTACTCATACAGAGCAAAACGCTCTATCACTCGGTACATAAAACCCACAGGAAGGGCAGATTTGAAAGGGTTATAC AACATATCACCACAAGCATTCTTGGGCCGAAGCGTCCAGGTGGTCAAAGGCACTGCCTCAGGATTGAGCGAGCGATTTAA 
TAATTACTTCAATACTAAGTGTGTAAATATTTCATCCTTTTTCATTCGTAGAATCTTTAGGCGTTTGCCAACTTTCGTCA CTTTTGTTAACTCATTATTAGTTATTAGTATGTTAACTAGTGTAGTGGCAGTGTGTCAGGCAATAATTTTAGATCAGAAG AAGTATAGGAGAGAGATCGAGTTGATGCAGATAGAGAAGAATGAGATTGTCTGCATGGAGCTATACGCAAGTTTACAGCG CAAACTTGAACGCGATTTCACATGGGATGAGTACATTGAGTATTTGAAATCAGTGAACCCTCAGATAGTTCAATTTGCTC AAGCGCAGATGGAAGAATATGATGTGCGACACCAGCGTTCCACACCAGGTGTTAAAAATTTGGAACAAGTGGTAGCATTT ATGGCTTTAGTCATCATGGTGTTCGATGCTGAAAGGAGTGACTGCGTGTTCAAAACTCTCAATAAATTTAAGGGTGTCCT TTCCTCACTGGACCATGAAGTTAGACATCAGTCCTTGGATGATGTGATCAAGAATTTTGATGAGAGGAATGAGACTATTG ATTTTGAATTGAGTGAGGACACAATTCGAACATCATCAGTGCTAGATACAAAGTTTAGTGATTGGTGGGACCGACAAATC CAAATGGGACATACACTTCCACATTACAGAACCGAGGGGCACTTCATGGAATTTACAAGGGCAACTGCTGTTCAAGTGGC TAATGACATTGCCCACAGCGAACACCTAGACTTTCTAGTAAGGGGAGCTGTTGGGTCTGGAAAGTCAACTGGGTTGCCTG TTCATCTTAGTGTAGCTGGATCTGTGCTTTTAATTGAGCCAACGCGGCCACTAACGGAGAACGTTTTCAAACAGCTATCT AGTGAACCATTCTTCAAGAAGCCAACACTGCGTATGCGTGGAAATAGTATATTTGGCTCTTCCCCAATCTCCGTTATGAC TAGCGGGTTTGCGCTACACTACTTCGCCAATAATCGCTCTCAATTAGCTCAGTTCAATTTTGTAATATTTGATGAGTGCC ATGTTCTGGATCCTTCCGCAATGGCGTTCCGCAGTCTGCTGAGTGTTTATCATCAAGCATGTAAAGTATTAAAAGTGTCA GCTACTCCAGTGGGAAGAGAGGTTGAATTCACAACACAGCAGCCAGTCAAGTTAATAGTGGAAGACACACTGTCTTTCCA ATCATTTGTTGATGCACAAGGTTCTAAAACTAATGCTGATGTTGTCCAGTTTGGTTCAAACATACTTGTTTACGTGTCGA GCTACAATGAAGTTGACAACTTGGCTAAGCTCCTAACAGATAAGAATATGATGGTCACAAAGGTTGATGGCAGAACAATG AAGCACGGTTGCCTAGAAATTGTCACAAAAGGAACCAGTGCGAGACCACATTTTGTTGTAGCAACCAACATAATTGAGAA TGGAGTGACTTTGGACATAGATGTAGTAGTGGATTTTGGGTTGAAAGTCTCACCGTTCTTGGACATTGACAATAGGAGCA TTGCTTACAATAAGGTGAGTGTTAGTTATGGTGAGAGAATTCAAAGGCTGGGTCGTGTTGGACGCTTCAAGAAAGGAGTA GCATTGCGCATTGGACACACTGAGAAGGGAATTATCGAAATTCCAAGCATGATCGCTACTGAGGCGGCTCTTGCTTGCTT TGCATATAACTTACCAGTGATGACAGGAGGCGTCTCAACTAGTCTGATTGGCAATTGTACTGTGCGCCAGGTCAAAACAA TGCAGCAATTTGAATTGAGCCCCTTCTTTATCCAGAATTTCGTTGCTCATGATGGATCAATGCATCCTGTCATACATGAC ATTCTCAAAAAGTATAAACTCCGAGATTGTATGACACCTTTGTGCGATCAGTCTATACCATACAGGGCATCGAGCACTTG 
GTTATCGGTTAGTGAATATGAGCGACTTGGAGTGGCTTTAGAAATTCCAAAGCAAGTCAAAATTGCATTCCATATCAAAG AGATCCCTCCTAAGCTCCACGAAATGCTTTGGGAAACGGTTGTCAAGTACAAAGACGTTTGCTTATTTCCAAGCATTCGA GCATCGTCCATCAGCAAAATCGCATACACATTGCGTACAGACCTCTTCGCCATCCCAAGAACTCTAATATTGGTGGAGAG ATTGCTTGAAGAGGAGCGAGTGAAGCAGAGCCAATTCAGAAGTCTCATCGATGAAGGGTGCTCAAGCATGTTTTCAATTG TCAACTTGACCAACACTCTCAGGGCCAGATATGCAAAAGATTACACCGCAGAGAACATACAAAAACTTGAGAAAGTGAGA AGTCAATTAAAAGAATTCTCAAATTTGGATGGTTCTGCATGTGAGGAAAATTTAATAAAGAGGTATGAGTCTTTGCAGTT CGTTCACCACCAAGCTGCGACGTCACTTGCAAAGGATCTCAAGTTGAAGGGGACTTGGAAGAAGTCATTAGTGGCTAAAG ACTTGATCATAGCAGGCGCTGTTGCAATTGGTGGAATAGGACTCATATATAGTTGGTTCACACAATCAGTTGAGACTGTG TCTCACCAAGGGAAAAATAAATCCAAAAGAATCCAAGCCTTGAAGTTTCGCCATGCTCGTGACAAAAGGGCTGGCTTTGA AATTGACAACAATGATGACACAATAGAGGAATTCTTTGGATCTGCATACAGGAAGAAGGGAAAAGGTAAAGGTACCACAG TTGGTATGGGCAAGTCAAGCAGGAGGTTCATTAACATGTACGGGTTTGACCCGACAGAGTACTCATTCATCCAATTCGTT GATCCACTCACTGGGGCACAAATAGAAGAGAATGTCTATGCTGACATTAGAGATGTTCAAGAAAGATTTAGTGAGGTGCG AAAGAAAATGGTTGAGAATGATGATATTGAAATGCAAGCCTTGGGTAGTAACACGACCATACATGCATACTTTAGGAAAG ATTGGTCTGACAAAGCTTTGAAGATTGATTTAATGCCACATAACCCACTCAAAGTTTGTGACAAAACAAATGGCATTGCC AAACTTCCTGAGAGAGAGTTCGAACTAAGGCAGACTGGGCCAGCTGTAGAAGTCGACGTGAAGGACATACCAGCGCAGGA AGTGGAGCATGAAGCCAAATCGCTCATGAGAGGCTTGAGAGACTTCAACCCAATTGCTCAAACAGTTTGTAGGCTGAAAG TATCTGTTGAATATGGGACATCAGAGATGTACGGTTTTGGATTTGGAGCATACATAATAGCGAACCACCATTTGTTTAGG AGTTACAATGGTTCCATGGAGGTGCGATCCATGCATGGTACATTCAGGGTGAAGAATCTACACAGTTTGAGCGTTCTGCC AATTAAAGGCAGGGACATCATCCTCATCAAAATGCCGAAGGATTTCCCTGTTTTTCCACAGAAATTGCATTTCCGAGCTC CTACACAGAATGAAAGAGTTTGTTTAGTTGGGACCAACTTTCAGGAGAAGTATGCATCGTCGATCATCACAGAAACAAGC ACTACTTACAATATACCAGGCAGCACATTCTGGAAGCATTGGATTGAAACAGATAATGGACATTGTGGACTACCAGTGGT GAGCACAGCCGATGGATGTCTAGTCGGAATTCATAGTTTGGCAAACAATGCACACACCACGAACTACTACTCAGCCTTCG ATGAAGATTTTGAAAGCAAGTACCTCCGAACCAATGAGCACAATGAATGGGTCAAATCTTGGGTTTATAATCCAGACACA GTGTTGTGGGGCCCGTTGAAGCTTAAAGATAGCACTCCCAAAGGGTTATTCAAAACAACAAAGCTTGTGCAAGATCTAAT 
CGATCATGATGAAGTCGCGGAGCAAGCTAAGCACTCTGCGTGGATGTTTGAAGCCTTGACAGGAAATTTGCAAGCTGTCG CAACAATGAAGAGTCAATTAGTAACCAAGCATGTAGTTAAAGGAGAGTGTCGACACTTCAAAGAATTCCTGACTGTGGAT GCAGAAGCAGAGGCATTCTTCAGGCCTTTGATGGATGCATATGGGAAAAGCTTGCTGAATAGAGATGCGTACATCAAAGA CATAATGAAGTACTCAAAACCTATAGATGTTGGTATCGTGGACTGTGATGCATTTGAAGAAGCCATCAATAGGGTTATCA TCTACTTGCAAGTGCACGGCTTCAAGAAGTGTGCATATGTCACTGACGAGCAAGAAATTTTCAAAGCGCTCAACATGAAA GCTGCAGTCGGAGCCATGTATGGTGGCAAAAAGAAAGACTATTTTGAGCATTTCACTGATGCAGATAAGGAAGAAATAGT CATGCAAAGCTGTCTGCGACTGTATAAAGGCTTGCTCGGCATTTGGAACGGATCGTTGAAGGCAGAGCTCCGGTGTAAGG AAAAGATACTTGCAAACAAGACGAGGACGTTCACTGCTGCACCTCTAGACACTTTGCTGGGTGGTAAAGTGTGTGTTGAT GACTTCAATAATCAATTTTATTCAAAGAATATTGAATGCTGTTGGACAGTTGGGATGACTAAGTTTTATGGTGGTTGGGA TAAACTGCTTCGGCGTTTACCTGAGAATTGGGTATACTGTGATGCCGATGGCTCACAGTTTGATAGTTCACTAACTCCAT ACTTAATCAATGCTGTTCTCACCATCAGAAGCACATACATGGAAGACTGGGATGTGGGGTTGCAAATGCTGCGTAATTTA TACACTGAGATTATTTACACACCTATCTCAACTCCAGATGGAACAATTGTTAAGAAGTTCAGAGGAAATAACAGTGGTCA GCCTTCTACTGTTGTGGACAACTCTCTTATGGTTGTCCTTGCCATGCACTATGCTCTCATCAAAGAATGCATTGAGTTTG AAGAGATTGACAGCACGTGCGTGTTCTTCGTCAATGGTGATGATTTGCTGATTGCCGTGAATCCGGATAAAGAGGGCATT CTTGATAGATTGTCACAGCACTTCTCAGATCTTGGTTTAAATTATGATTTTTCGTCAAGAACAAGAAATAAGGAAGAGTT ATGGTTTATGTCTCATAGAGGCCTACTGATTGAGGGCATGTACGTGCCGAAACTTGAAGAAGAAAGGATTGTGTCCATTC TCCAATGGGACAGAGCAGACTTGGCTGAACACAGGCTTGAGGCGATTTGCGCAGCTATGATAGAGTCCTGGGGTTATTCT GAACTAACACACCAAATTAGAAGATTCTACTCATGGTTATTGCAACAGCAACCTTTTGCAACAATAGCGCAGGAAGGGAA GGCTCCTTATATAGCAAGCATGGCATTAAGGAAACTGTATATGGATAGAGCTGTGGATGAGGAAGAGCTGAGAGCCTTCA CTGAAATGATGGTCGCATTAGACGATGAGTTTGAGTGTGACTCTTATGAAGTACACCACCAGGGAAACGATACAATCGAT GCAGGAGGAAGCAGCAAGAAAGATGCAAGACCAGAGCAAGGCAGCATCCAGTCAAACCCGAACAAAGGAAAAGATAAGGA TGTGAATGCTGGTACATCTGGGACACATACTGTGCCGAGAATCAAGGCTATCACGTCCAAAATGAGAATGCCCAAAAGCA AAGGAGCAACCGTGCTAAACTTAGAACACTTGCTTGAGTATGCTCCACAACAAATTGATATTTCAAATACTCGGGCAACT CAATCACAGTTTGATACGTGGTATGAGGCAGTGCGGATGGCATACGACATAGGAGAAACTGAGATGCCAACTGTGATGAA 
TGGGCTTATGGTTTGGTGCATTGAAAATGGAACCTCGCCAAATGTCAACGGAGTCTGGGTTATGATGGATGGGGATGAAC AAGTCGAGTACCCGTTGAAACCAATCGTTGAGAATGCAAAACCAACCCTTAGGCAAATCATGGCACATTTCTCAGATGTT GCAGAAGCGTATATAGAAATGCGCAACAAAAAGGAACCATATATGCCACGATATGGTTTAATTCGAAATCTGCGGGATGT GGGTTTAGCGCGTTATGCCTTTGACTTTTATGAGGTCACATCACGAACACCAGTGAGGGCTAGGGAAGCGCACATTCAAA TGAAGGCCGCAGCATTGAAATCAGCCCAACCTCGACTTTTCGGGTTGGACGGTGGCATCAGTACACAAGAGGAGAACACA GAGAGGCACACCGCCGAGGATGTCTCTCCAAGTATGCATACTCTACTTGGAGTCAAAAACATGTGA >Oz ATGGCAACTTACATGTCAACAATCTGTTTCGGTTCGTTTGAATGCAAGCTACCATACTCACCCGCTTCTTGCGGGCATAT TGTGAAGGAGCGAGAAGTGCTGGCTTCCGTTGATCCTTTCGCAGATCTGGAAACACAACTTAGTGCACGATTGCTCAAGC AAGAATATGCTACTGTTCGTGTGCTCAAGAACGGTACTCTTACTTACCGATACAAAACTGATGCCCAGATAACGCGCATT CAGAAGAAACTGGAGAGGAAGGATAGGGAAGAATATCACTTCCAGATGGCCGCTCCTAGTATTGTGTCAAAAATTACAAT AGCTGGTGGAGATCCTCCATCAAAGTCTGAGCCACAAGCACCAAGAGGGATCATTCATACAACTCCAAGGGTGCGTAAAG TCAAGACACGTCCCATAATAAAGTTGACAGAAGGCCAGATGAATCATCTCATTAAGCAGGTGAAGCAGATTATGTCGGAG AAGAGAGGGTCTGTCCACTTAATTAGTAAGAAGACCACTCATGTTCAATATAAGGAGATACTTGGAGCAACTCGCGCAGC GGTTCGAACTGCACATATGATGGGTTTGCGACGGAGAGTGGACTTCCGATGTGATATGTGGACAGTCGGACTTTTGCAAC GTCTCGCTCGGACGGACAAATGGTCCAATCAAGTCCGCACTATCAACATACGAAGGGGTGATAGTGGAGTCATTTTGAAC ACAAAAAGCCTCAAAGGCCACTTTGGTAGAAGTTCAGGAGACTTGTTCATAGTGCGTGGATCACACGAAGGGAAATTGTA CGATGCACGTTCTAGAGTTACTCAGAGTGTTTTGAACTCAATGATCCAGTTTTCGAATGCTGATAATTTTTGGAAGGGTC TAGACGGTAATTGGGCACAACTGAGATATCCTTCGGATCACACATGTGTAGCTGGTTTACCTGTCGAAGATTGTGGTAGA GTTGCTGCATTGATGGCACACAGTATCCTCCCGTGCTACAAGATAACCTGCCCCACCTGTGCTCAACAGTATGCCAGCTT GCCGGTTAGCGATCTGTTTAAGCTGTTGCATAAACATGCGAGAGATGGTTTGAACCGATTGGGAGCGGATAAAGACCGGT TTATACATGTTAATAAGTTCTTGATAGCGTTAGAGCATCTAACTGAACCGGTGGATTTGAATCTCGAGCTTTTCAATGAG ATATTTAAATCCATAGGGGAGAAGCAGCAAGCACCGTTCAAGAATTTAAATGTCTTAAATAATTTCTTCCTGAAAGGAAA AGAAAATACAGCTCATGAATGGCAAGTGGCTCAATTGAGTTTGCTCGAATTAGCAAGGTTCCAGAAGAATAGAACTGATA ACATCAAGAAAGGTGATATATCTTTCTTCAGAAATAAATTATCTGCCAAGGCAAACTGGAATCTGTATTTGTCGTGCGAC 
AACCAGTTGGATAAAAATGCAAATTTTCTGTGGGGACAAAGGGAGTATCATGCTAAGCGGTTTTTCTCAAACTTCTTTGA GGAAATTGATCCAGCAAAGGGATACTCAGCATATGAAATCCGCAAGCATCCAAATGGAACAAGGAAGCTCTCAATTGGTA ACTTAGTTGTCCCACTTGATTTAGCTGAGTTTAGGCAGAAGATGAAAGGTGACTATAGGAAACAACCAGGAGTTAGCAGA AAGTGCACGAGTTCGAAAGATGGTAATTATGTGTATCCCTGTTGTTGCACAACACTTGATGATGGTTCAGCTATTGAATC AACATTCTATCCACCAACCAAAAAGCACCTTGTAATAGGCAATAGCGGTGACCAAAAATTTGTTGATTTACCAAAAGGGG ATTCGGAGATGTTATACATTGCCAAGCAGGGTTATTGTTATATCAACGTGTTTCTTGCAATGCTTATTAACATTAGCGAG GAGGATGCAAAGGATTTCACAAAGAAAGTTCGCGACATGTGTGTGCCAAAGCTTGGAACCTGGCCAACTATGATGGATTT GGCGACCACTTGTGCTCAAATGAGAATATTCTATCCTGACGTGCATGATGCAGAGCTGCCTAGAATATTGGTTGACCATG ACACTCAAACGTGTCACGTGGTTGACTCATTTGGCTCGCAAACAACTGGATATCATATTCTAAAAGCATCCAGCGTGTCT CAACTTATCTTGTTTGCAAATGATGAATTAGAATCTGATATAAAACATTATAGAGTTGGTGGTGTTCCTAATGCATGCCC TGAACTTGGGTCCACAATATCACCTTTCAGAGAAGGAGGAGTTATAATGTCTGAGTCGGCAGCGCTGAAACTGCTTTTAA AGGGAATTTTTAGACCTAAGGTGATGAGACAGTTGCTGTTAGATGAGCCTTACCTGTTGATTTTATCAATATTATCTCCT GGCATACTGATGGCTATGTATAATAATGGGATTTTTGAACTTGCGGTAAGGTTGTGGATTAATGAGAAACAATCCATAGC TATGATAGCATCGCTACTATCAGCTTTAGCCCTACGAGTGTCAGCGGCAGAAACACTCGTCGCACAGAGGATTATCATTG ATGCTGCAGCTACAGACCTCCTTGATGCTACGTGTGATGGGTTCAACCTACATCTAACGTACCCCACTGCATTGATGGTG TTGCAAGTTGTTAAGAATAGAAATGAATGTGATGATACCCTATTCAAGGCGGGTTTTTCAAGTTACAACACGAGCGTCGT ACAGATTATGGAAAAAAATTATCTAAATCTCTTGAACGATGCTTGGAAAGATTTAACTTGGCGGGAAAAATTATCCGCAA CATGGTACTCATACAGAGCAAAACGCTCTATCACTCGGTACATAAAACCCACAGGAAGGGCAGATTTGAAAGGGTTATAC AACATATCACCACAAGCATTCTTGGGCCGAAGCGCCCAGGTGGTCAAAGGTACTGCCTCAGGATTGAGTGAGCGATTTAA TAATTATTTCAATACTAAGTGTGTAAATATTTCATCCTTTTTCATTCGTAGAATCTTTAGGCGTTTGCCAACTTTCGTCA CTTTTGTTAACTCATTATTAGTTATTAGTATGTTAACTAGCGTAGTGGCAGTGTGTCAGGCAATAATTTTAGATCAGAGG AAATATAGGAGAGAAATCGAGTTGATGCAGATAGAGAAGAATGAGATTGTCTGCATGGAGCTATATGCAAGTTTACAGCG CAAACTTGAACGCGATTTCACATGGGATGAGTACATTGAGTATTTGAAATCAGTAAACCCTCAGATAGTTCAGTTTGCTC AAGCGCAGATGGAAGAATATGATGTGCGACACCAGCGTTCCACACCAGGTGTTAAAAATTTGGAACAAGTGGTAGCATTT 
ATGGCTTTAGTCATTATGGTGTTCGATGCTGAAAGGAGTGATTGCGTGTTCAAAACTCTCAATAAATTTAAGGGTGTCCT TTCCTCAATGGACTATGAAGTTAGACATCAGTCCTTAGACGATGTGATCAAGAATTTTGATGAGAGGAATGAGATTATTG ATTTTGAATTGAGTGAGGACACAATTCGAACATCATCAGTGCTAGATACAAAGTTTAGTGATTGGTGGGACCGACAAATC CAGATGGGACATACACTTCCACATTACAGAACCGAGGGGCACTTCATGGAATTCACAAGAGCAACTGCTGTCCAAGTGGC TAATGACATTGCCCATAGCGAACACCTAGACTTTCTAGTAAGGGGAGCTGTTGGGTCTGGAAAGTCAACTGGGTTACCTG TTCATCTTAGTGTAGCCGGATCTGTGCTTTTAATTGAACCAACGCGACCACTAGCGGAGAACGTTTTCAAACAGCTATCT AGTGAACCATTCTTCAAGAAGCCAACACTGCGTATGCGTGGAAATAGTATATTTGGCTCTTCTCCAATCTCCGTCATGAC TAGCGGATTTGCGCTACACTACTTCGCCAATAATCGCTCTCAATTAGCTCAGTTCAACTTTGTAATATTTGATGAGTGCC ATGTTCTGGATCCTTCCGCAATGGCGTTCCGCAGTCTGCTGAGTGTTTATCATCAAGCATGCAAAGTATTAAAAGTGTCA GCTACTCCAGTGGGAAGAGAGGTTGAATTTACAACACAGCAACCAGTCAAGTTAATAGTGGAGGACACACTGTCTTTCCA ATCATTTGTTGATGCACAAGGTTCTAAAACTAATGCTGATGTTGTTCAGTTTGGTTCAAACGTACTTGTGTACGTGTCGA GCTACAATGAAGTTGATACCTTGGCTAAGCTCCTAACAGACAAGAATATGATGGTCACAAAGGTTGATGGCAGAACAATG AAGCACGGTTGCCTAGAAATTGTCACAAAAGGAACCAGTGCGAGACCACATTTTGTTGTAGCAACCAACATAATTGAGAA TGGAGTGACTTTGGACATAGACGTGGTTGTAGATTTTGGGTTAAAAGTCTCACCGTTCTTGGACATTGACAATAGGAGCA TTGCTTACAATAAGGTGAGTGTTAGCTATGGTGAGAGAATTCAAAGGCTGGGTCGTGTTGGACGCTTCAAGAAAGGAGTA GCATTGCGCATTGGACACACTGAGAAGGGAATTATTGAAATTCCAAGCATGATCGCTACAGAGGCAGCTCTTGCTTGCTT TGCATATAACTTACCAGTGATGACAGGAGGCGTCTCAACTAGTCTGATTGGCAATTGTACTGTGCGCCAAGTTAAAACAA TGCAGCAATTTGAATTGAGTCCCTTCTTTATCCAGAATTTCGTTGCCCATGATGGATCAATGCATCCTGTCATACATGAC ATTCTTAAAAAGTATAAACTTCGAGATTGTATGACACCTTTGTGCGATCAGTCTATACCATACAGGGCATCGAGCACTTG GTTATCGGTTAGTGAATATGAGCGACTTGGAGTGGCCTTAGAAATTCCAAAGCAAGCCAAAATTGCATTCCATATCAAAG AGATCCCTCCTAAGCTCCACGAAATGCTTTGGGAAACGGTTGTCAAGTACAAAGACGTTTGCTTATTTCCAAGCATTCGA GCATCGTCCATCAGCAAAATCGCATACACATTGCGTACAGACCTCTTCGCCATCCCAAGAACTCTAATATTGGTGGAGAG ACTGCTTGAAGAGGAGCGAGTGAAGCAGAGCCAATTCAGAAGTCTCATCGATGAAGGATGCTCAAGCATGTTTTCAATTG TCAACCTGACAAACACTCTCAGAGCTAGATATGCAAAAGATTACACCGCAGAGAACATACAAAAACTTGAGAAAGTGAGA 
AGTCAATTGAAAGAATTCTCAAATTTGGATGGTTCTGCATGTGAGGAAAATTTAATAAAGAGGTATGAGTCTTTGCAGTT CGTTCATCACCAAGCTGCGACGTCACTTGCAAAGGATCTCAAGTTAAAGGGGACTTGGAAGAAGTCATTAGTGGCCAAAG ACTTGATCATAGCAGGCGCTGTTGCAATTGGTGGAATAGGACTCATATATAGTTGGTTCACACAATCAGTTGAGACTGTG TCTCACCAAGGGAAAAATAAATCCAAAAGAATTCAAGCCTTGAAGTTTCGCCATGCTCGTGACAAAAGGGCTGGCTTTGA AATTGACAACAATGATGACACAATAGAGGAATTCTTTGGATCTGCATACAGGAAAAAGGGAAAAGGTAAAGGTACCACAG TTGGTATGGGCAAGTCAAGCAGGAGGTTCATCAACATGTATGGGTTTGATCCAACAGAGTACTCATTCATCCAATTCGTT GATCCACTCACTGGGGCGCAAATAGAAGAGAATGTCTATGCTGACATTAGAGATATTCAAGAGAGATTTAGTGAAGTGCG AAAGAAAATGGTTGAGAATGATGACATTGAAATGCAAGCCTTGGGTGGTAACACGACCATACATGCATACTTTAGGAAAG ATTGGTCTGACAAAGCTTTGAAGATTGATTTAATGCCACATAATCCACTCAAAGTTTGTGACAAGACAAATGGCATTGCC AAATTTCCTGAGAGAGAGCTCGAACTAAGGCAGACTGGGCCAGCTGTAGAAGTCGATGTGAAGGACATACCAGCACAGGA AGTGGAGCATGAAGCTAAATCGCTCATGAGAGGCTTGAGAGACTTCAACCCAATTGCCCAAACAGTTTGTAGGCTGAAAG TATCTGTTGAATATGGGACATCAGAGATGTACGGTTTTGGATTTGGAGCATACATAATAGCGAACCACCATTTGTTCAGG AGTTACAATGGTTCCATGGAGGTGCGATCCATGCACGGTACATTCAGGGTGAAGAATCTACACAGTTTGAGCGTTCTGCC AATTAAAGGTAGGGATATCATCCTCATCAAAATGCCGAAAGATTTCCCTGTCTTTCCACAGAAATTGCATTTCCGAGCTC CTACACAGAATGAAAGAGTTTGTTTAGTTGGAACCAACTTTCAGGAGAAGTATGCATCGTCGATCATCACAGAAACAAGC ACCACTTACAATATACCAGGCAGCACATTCTGGAAGCATTGGATTGAAACAGATAATGGACATTGTGGATTACCAGTGGT GAGCACCACCGATGGATGTCTAGTCGGAATTCACAGTTTGGCAAACAACAAACACACCACGAACTACTACTCAGCCTTCG ATGAAGATTTTGAAAGCAAGTATCTCCGAACCAATGAGCACAATGAATGGGTCAAGTCTTGGATTTATAATCCAGACACA GTGTTGTGGGGCCCGTTGAAACTTAAAGACAGCACTCCCAAAGGATTATTCAAAACAACAAAGCTTGTGCAAGATCTAAT CGATCATGATGTAGTGGTGGAGCAAGCTAAGCACTCTGCGTGGATGTTTGAAGCCTTGACAGGAAATTTGCAAGCTGTCG CAACAATGAAGAGCCAATTAGTAACCAAGCATGTAGTTAAAGGAGAGTGTCGACACTTCAAAGAATTCCTGACTGTGGAT GCAGAAGCAGAGGCATTCTTCAGGCCTTTGATGGATGCGTATGGGAAAAGCTTGCTGAATAGAGATGCATACATCAAGGA CATAATGAAGTATTCAAAACCTATAGATGTTGGTATCGTGGACTGTGATGCATTTGAGGAAGCCATCAATAGGGTTATCA TCTACCTGCAAGTGCACGGCTTCAAGAAGTGCGCATACGTCACTGACGAGCAAGAAATTTTCAAAGCGCTCAACATGAAA 
GCTGCAGTTGGAGCCATGTATGGTGGCAAAAAGAAAGACTATTTTGAGCATTTCACTGATGCAGATAAGGAAGAAATAGT CATGCAAAGCTGTCTGCGATTGTATAAAGGCTTGCTTGGCATTTGGAATGGATCATTGAAGGCAGAGCTCCGGTGTAAGG AAAAGATACTTGCAAATAAGACGAGGACATTCACTGCTGCACCTTTAGACACTTTGCTGGGTGGTAAAGTGTGTGTTGAT GATTTCAATAATCAATTTTATTCAAAGAATATTGAATGCTGTTGGACGGTTGGGATGACTAAGTTTTATGGTGGTTGGGA TAAACTGCTGCGGCGTTTACCTGAGAATTGGGTATACTGTGATGCCGATGGCTCACAGTTTGATAGTTCACTAACTCCAT ACTTAATCAATGCTGTTCTCACCATCAGAAGCACATACATGGAAGATTGGGATGTGGGGTTGCAAATGTTGCGCAATTTA TACACTGAGATTGTTTACACACCTATTTCAACTCCAGATGGAACAATTGTTAAGAAGTTCAGAGGAAATAACAGTGGTCA GCCTTCTACTGTTGTGGACAACTCTCTTATGGTCGTCCTTGCCATGCACTATGCTCTCATCAAAGAATGCATTGAGTTTG AAGAGATTGACAGCACGTGCGTGTTCTTTGTCAATGGTGATGATTTGCTGATTGCTGTGAATCCGGATAAAGAGGGCATT CTTGACAGATTGTCACAACACTTCTCAGATCTTGGTTTGAATTATGATTTCTCGTCAAGAACAAGAAATAAGGAGGAATT GTGGTTTATGTCTCATAGAGGCCTACTGATTGAGGGCATGTACGTGCCGAAACTTGAAGAAGAAAGGATTGTGTCCATTC TCCAATGGGACAGAGCAGACTTGGCTGAACACAGGCTTGAGGCGATTTGCGCAGCTATGATAGAGTCCTGGGGTTATTCT GAACTAACACACCAAATCAGGAGATTCTACTCATGGTTATTGCAACAGCAACCCTTTGCAACAATAGCGCAGGAAGGGAA GGCTCCTTATATAGCAAGCATGGCATTAAGGAAATTGTATATGGATAGGGCTGTGGATGAGGAAGAGCTGAGAGCCTTCA CTGAAATGATGGTCGCATTAGACGATGAGTTTGAATTTGACTCTTATGAAGTACACCATCAAGCAAATGACACAATCGAT GCAGGAGGAAGCAGCAAGAAAGATGCAAGACCGGAGCAAGGCAGCATCCAGTCAAACCCGAACAAAGGAAAAGATAAGGA TGTGAATGCTGGTACATCTGGGACACATACTGTGCCGAGAATCAAGGCTATCACGTCCAAAATGAGAATGCCCAAAAGCA AGGGAGCAACCGTGCTAAACCTAGAACACTTGCTTGAGTATGCTCCACAACAAATTGATATTTCAAATACTCGGGCAACT CAATCACAGTTTGATACGTGGTATGAGGCAGTGCGGATGGCATACGACATAGGAGAAACTGAGATGCCAACTGTGATGAA TGGGCTTATGGTTTGGTGCATTGAAAATGGAACCTCGCCAAATGTCAACGGAGTTTGGGTTATGATGGATGGGAATGAAC AAGTCGAGTACCCGTTGAAACCAATCGTTGAGAATGCAAAACCAACCCTTAGGCAAATCATGGCACATTTCTCAGATGTT GCAGAAGCGTATATAGAAATGCGCAACAAAAAGGAACCATATATGCCACGATATGGTTTAATTCGAAATCTGCGGGATGT GGGTTTAGCGCGTTATGCCTTTGACTTTTATGAGGTCACATCACGAACACCAGTGAGGGCTAGGGAAGCGCACATTCAAA TGAAGGCCGCAGCATTGAAATCAGCCCAACCTCGACTTTTCGGGTTGGACGGTGGCATCAGTACACAAGAGGAGAACACA 
GAGAGGCACACCACCGAGGATGTCTCTCCAAGTATGCATACTCTACTTGGAGTCAAGAACATGTGA >Wilga5 ATGGCAACTTACATGTCAACAATCTGTTTCGGTTCGTTTGAATGCAAGCTACCATACTCACCCGCCTCTTGCGGGCATAT TGTGAAGGAACGAGAAGTGCTGGCTTCCGTTGATCCTTTCGCAGATCTGGAAACACAACTTAGTGCACGATTGCTCAAGC AAGAATATGCTACTGTTCGTGTGCTCAAGAACGGTACTCTTACGTACCGATACAAGACTGATGCCCAGATAACGCGCATC CAGAAGAAACTGGAAAGGAAGGATAGGGAAGAATATCACTTCCAGATGGCAGCTCCTAGTATTGTGTCAAAGATCACTAT TGCTGGTGGAGAGCCACCTTCAAAACTTGAATCACAAGTGCGGAGGGGTGTCATCCACACAACTCCAAGGATGCGCACAG CAAAAACATATCACACGCCAAAGTTGACAGAGGGACAAATGAACCACCTTATCAAGCAGGTGAAGCAAATTATGTCAACC AAAGGAGGGTCTGTTCAACTGATTAGCAAGAAAAGTACCCATGTTCACTATAAAGAAGTTTTGGGATCACATCGCGCAGT TGTTTGCACTGCACATATGAGAGGTTTACGAAAGAGAGTGGACTTTCGGTGTGATAAATGGACCGTTGTGCGTCTACAGC ATCTCGCCAGGACGGACAAGTGGAATAACCAAGTTCGTGCTACTGATCTACGCAAGGGCGATAGTGGAGTTATATTGAGT AATACTAATCTCAAAGGAAGCTTTGGGAGAAGCTCGGAGGGCATATTCATAGTGCGTGGGTCGCACGAAGGAAAAATCTA TGATGCACGTTCCAAGGTTACTCAAGGGGTTATGGATTCAATGGTTCAGTTCTCAAGCGCTGAAAGCTTCTGGAAGGGAT TGGACGGCAATTGGGCACAAATGAGATATCCTACAGATCATACATGTGTGGCAGGCTTACCAGTTGAAGACTGCGGCAGA GTTGCAGCGATAATGACACACAGTATTTTACCGTGCTATAAGATAACCTGCCCTACCTGTGCCCAACAATATGCCAATTT GCCAGCCAGTGACTTACTTAAGATATTACACAAGCACGCAAGTGATGGTTTAAATCGATTGGGGGCAGACAAAGATCGCT TTGTGCACGTCAAAAAGTTCTTGACAATCTTAGAGCACTTAACTGAACCGGTTGATCTGGGTCTAGAAATTTTCAATGAA GTATTCAAGTCTATAGGGGAGAAGCAACAATCACCTTTCAAAAACCTGAATATTCTGAATAATTTCTTTTTGAAAGGAAA AGAAAATACAGCTCGTGAATGGCAGGTGGCTCAATTAGGCTTACTTGAATGGGCAAGATTCCAAAAGAACAGAACGGATA ATATCAAGAAAGGAGACATCTCGTTCTTTAGGAATAAACTATCTGCCAAAGCAAATTGGAACTTGTATCTGTCATGTGAT AACCAGCTGGATAAGAATGCAAACTTCCTGTGGGGACAGAGGGAATATCATGCTAAGCGATTTTTCTCGAACTATTTCGA GGAAATTGATCCAGCGAAGGGCTATTCAGCATATGAAAATCGTTTGCATCCGAATGGGACAAGAAAACTTGCAATTGGAA ACCTAATTGTACCACTTGATCTGGCTGAGTTTAGGCGGAAGATGAAAGGTGATTATAAAAGACAGCCAGGGGTGAGTAAG AAGTGCACGAGCTCGAAGGATGGAAACTACGTGTATCCCTGTTGTTGCACTACACTTGATGATGGCTCAGCTGTTGAATC AACATTTTACCCGCCAACTAAGAAGCACCTTGTAATAGGTAATAGTGGCGACCAAAAGTATGTTGACTTACCAAAAGGGA 
ATTCTGAGATGTTATATATTGCCAGGCAAGGCTTTTGTTACATTAACATTTTCCTCGCGATGTTGATTAACATTAGTGAG GAAGATGCAAAGGATTTCACTAAGAAGGTTCGTGACATGTGTGTGCCAAAGCTTGGAACCTGGCCAACCATGATGGATCT GGCTACAACTTGTGCTCAAATGAAAATATTCTACCCTGATGTTCATGATGCAGAACTGCCTAGAATACTAGTCGATCACG AAACGCAGACATGCCATGTGGTTGACTCGTTTGGCTCACAAACAACTGGGTATCATATTTTGAAAGCATCTAGCGTGTCC CAACTTATTTTGTTTGCTAATGATGAGTTGGAGTCTGACATTAAACATTATAGAGTTGGTGGTGTTCCTAATGCATGCCC TGAACTTGGGTCCACAATATCACCTTTCAGAGAAGGAGGAGTTATAATGTCTGAGTCGGCAGCTCTGAAACTGCTTTTGA AGGGAATTTTTAGACCTAAGGTGATGAGACAGTTGTTGTTAGATGAGCCTTATCTGTTGATTCTATCAATATTATCCCCT GGCATACTGATGGCTATGTACAATAATGGGATTTTTGAACTTGCGGTAAGGTTGTGGATTAATGAGAAACAATCCATAGC TATGATAGCATCGCTACTATCAGCTTTAGCCCTACGAGTGTCAGCGGCAGAAACACTCGTCGCACAGAGGATTATAATTG ATACTGCAGCTACAGATCTCCTTGATGCTACGTGCGATGGGTTCAACCTACATCTAACGTACCCCACTGCGTTGATGGTG TTGCAAGTTGTTAAGAATAGAAATGAATGTGATGATACCCTATTCAAGGCGGGTTTTCCAAGTTACAACACGAGCGTCGT ACAGATCATGGAAAAAAATTATCTAAATCTCTTAAACGATGCTTGGAAAGATTTAACTTGGCGGGAAAAATTATCCGCAA CATGGTACTCATACAGAGCAAAACGCTCTATCACTCGGTACATAAAACCCACAGGAAGGGCAGATTTGAAAGGGTTATAC AACATATCACCACAAGCATTCTTGGGCCGAGGCGCCAAGGTGGTCAAAGGCACTGCCTCAGGATTGTGCGAGCGATTTAA TAATTATTTCTACACTAAGTGTGTAAATATTTCATCCTTTTTCATTCGTAGAATCTTTAGGCGTTTGCCAACTTTCGTCA CTTTTGTTAACTCATTATTAGTTATTAGTATGTTAACCAGCGTAGTGGCAGTCTGTCAGGCAATAATTTTAGATCAGAGG AAGTATAGGAGAGAAATCGAGTTGATGCAGATAGAGAAGAATGAGATCGTCTGCATGGAGCTATATGCAAGTTTACAGCG CAAACTTGAACGCGATTTCACATGGGATGAGTACATTGAGTATTTGAAGTCAGTAAACCCTCAGATAGTTCAGTTTGCTC AAGCGCAGATGGAAGAATATGATGTGCGACACCAGCGTTCCACACCAGGTGTTAAAAATTTGGAACAAGTGGTAGCATTT ATGGCTTTAGTCATCATGGTGTTCGATGCTGAAAGGAGTGATTGCGTGTTCAAAACTCTCAATAAATTTAAGGGTGTCCT TTCCTCACTGGACCATGAAGTTAGACATCAGTCCTTAGACGATGTGATCAAGAATTTTGATGAGAGGAATGAGGTTATTG ATTTTGAATTGAGTGAGGACACAATTCGAACATCATCAGTGCTAGATACAAAGTTTAGTGATTGGTGGGACCGACAAATC CAGATGGGGCATACACTTCCACATTACAGAACCGAGGGGCACTTCATGGAATTCACAAGAGCAACTGCTGTCCAAGTGGC TAATGACATTGCCCATAGCGAACACCTAGACTTTCTAGTAAGGGGAGCTGTTGGGTCTGGAAAGTCAACTGGGTTGCCTG 
TTCATCTTAGTGTAGCCGGATCTGTGCTTTTAATTGAACCAACGCGACCACTAGCGGAGAACGTTTTCAAACAGCTATCT AGTGAACCATTCTTCAAGAAGCCAACACTGCGTATGCGTGGAAATAGTATATTTGGCTCTTCTCCAATCTCCGTCATGAC TAGCGGATTTGCGCTACACTACTTCGCCAATAATCGCTCTCAATTAGCTCAGTTCAACTTTGTAATATTTGATGAGTGCC ATGTTCTGGATCCTTCCGCAATGGCGTTCCGCAGTCTGCTGAGTGTTTATCATCAAGCATGCAAAGTATTAAAAGTGTCA GCTACTCCAGTGGGAAGAGAGGTTGAATTTACAACACAGCAACCAGTCAAGTTAATAGTGGAGGACACACTGTCTTTTCT ATCATTTGTTGATGCACAAGGTTCTAAAACTAATGCTGATGTTGTTCAGTTTGGTTCAAACGTACTTGTGTACGTGTCGA GCTACAATGAAGTTGATACCTTGGCTAAGCTCCTAACAGACAAGAATATGATGGTCACAAAAGTTGATGGCAGAACAATG AAGCACGGTTGCCTAGAAATTGTCACAAAAGGAACCAGTGCGAGACCACATTTTGTTGTAGCAACCAACATAATTGAGAA TGGAGTGACTTTGGACATAGACGTGGTTGTAGATTTTGGGTTGAAAGTCTCACCGTTCTTGGACATTGACAATAGGAGCA TTGCTTACAATAAGGTGAGTGTTAGCTATGGTGAGAGAATTCAAAGGCTGGGTCGTGTTGGACGCTTCAAGAAAGGAGTA GCATTGCGCATTGGACACACTGAGAAGGGAATTATTGAAATTCCAAGCATGATCGCTACAGAGGCGGCTCTTGCTTGCTT TGCATATAACTTGCCAGTGATGACAGGAGGCGTCTCAACTAGTCTGATTGGCAATTGTACTGTGCGCCAAGTTAAAACAA TGCAGCAATTTGAATTGAGTCCCTTCTTTATCCAGAATTTTGTTGCCCATGATGGATCAATGCATCCTGTCATACATGAC ATTCTTAAAAAGTATAAACTTCGAGATTGTATGACACCTTTGTGCGATCAGTCTATACCATACAGGGCATCGAGCACTTG GTTATCGGTTAGTGAATATGAGCGACTTGGAGTGGCCTTAGAAATTCCAAAGCAAGTCAAAATTGCATTCCATATCAAAG AGATCCCTCCTAAGCTCCATGAAATGCTTTGGGAAACGGTTGTCAAGTACAAAGACGTTTGCTTATTTCCAAGCATTCGA GCATCGTCCATCAGCAAAATCGCATACACATTGCGTACAGACCTTTTCGCCATCCCAAGAACTCTAATATTGGTGGAGAG ACTGCTTGAAGAGGAGCGAGTGAAGCAGAGCCAATTCAGAAGTCTCATCGACGAAGGATGCTCAAGCATGTTTTCAATTG TCAACTTGACAAACACTCTCAGAGCTAGATATGCAAAAGATTACACCGCAGAGAACATACAAAAACTTGAGAAAGTGAGA AGTCAATTGAAAGAATTCTCAAATTTGGATGGTTCTGCATGTGAGGAAAATTTAATAAAGAGGTATGAGTCTTTGCAGTT CGTTCATCACCAAGCTGCGACGTCACTTGCAAAGGATCTCAAGTTGAAGGGGACTTGGAAGAAGTCATTAGTGGCCAAAG ACTTGATCATAGCAGGCGCTGTTGCAATTGGTGGAATAGGACTCATATATAGTTGGTTCACACAATCAGTTGAGACTGTG TCTCACCAAGGGAAAAATAAATCCAAAAGAATCCAAGCCTTGAAGTTTCGCCATGCTCGTGACAAAAGGGCTGGCTTTGA AATTGACAACAATGATGACACAATAGAGGAATTCTTTGGATCTGCATACAGGAAAAAGGGAAAAGGTAAAGGTACCACAG 
TTGGTATGGGCAAGTCAAGCAGGAGGTTCATCAACATGTATGGGTTTGACCCAACAGAGTACTCATTCATCCAATTCGTT GATCCACTCACTGGGGCGCAAATAGAAGAGAATGTCTATGCTGACATTAGAGATATTCAAGAGAGATTTAGTGAAGTGCG AAAGAAAATGGTTGAGAATGATGACATTGAAATGCAAGCCTTGGGCAGTAACACGACCATACATGCATACTTCAGGAAAG ATTGGTCTGACAAAGCTTTGAAGATTGATTTAATGCCACATAATCCACTCAAAGTTTGTGACAAGACAAATGGCATTGCC AAATTTCCTGAGAGAGAACTCGAACTAAGGCAGACTGGGCCAGCTGTAGAAGTCGACGTGAAGGACATACCAGCACAGGA GGTGGAGCATGAAGCTAAATCGCTCATGAGAGGCTTGAGAGACTTCAACCCAATTGCCCAAACAGTTTGTAGGCTGAAAG TATCTGTTGAATATGGGACATCAGAGATGTACGGTTTTGGATTTGGAGCGTACATAATAGCGAACCACCATTTGTTCAGG AGTTACAATGGTTCCATGGAGGTGCGATCCATGCACGGTACATTCAGGGTGAAGAATCTACACAGTTTGAGTGTTCTGCC AATTAAAGGTAGGGATATCATCCTCATCAAAATGCCGAAAGATTTCCCTGTCTTTCCACAGAAATTGCATTTCCGAGCTC CTACACAGAATGAAAGAGTTTGTTTAGTTGGAACCAACTTTCAGGAGAAGTATGCATCGTCGATCATCACAGAAGGAGGC ACCACTTACAATATACCAGGCAGCACATTCTGGAAGCATTGGATTGAAACAGATAATGGACATTGTGGACTACCAGTGGT GAGCACCACCGATGGATGTCTAGTCGGAATTCACAGTTTGGCAAACAACAAACACACCACGAATTACTACTCAGCCTTCG ATGAAGATTTTGAAAGCAAGTATCTCCGAACCAATGAGCACAATGAATGGGTCAAGTCTTGGATTTATAATCCAGACACA GTATTGTGGGGCCCGTTGAAACTTAAAGACAGCACTCCCAAAGGATTATTCAAAACAACAAAGCTTGTGCAAGATCTAAT TGATCATGATGAAGTGGTGGAGCAAGCTAAGCACTCTGCGTGGATGTTTGAAGCCTTGACAGGAAATTTGCAAGCTGTCG CAACAATGAAGAGCCAATTAGTAACCAAGCATGTAGTTAAAGGAGAGTGTCGACACTTCAAAGAATTCCTGACTGTGGAT GCAGAAGCAGAGGCATTCTTCAGGCCTTTGATGGATGCGTATGGGAAAAGTTTGCTGAATAGAGATGCATACATCAAGGA CATAATGAAGTATTCAAAACCTATAGATGTTGGTATCGTGGACTGTGATGCATTTGAGGAAGCCATCAATAGGGTTATCA TCTACCTGCAAGTGCACGGCTTCAAGAAGTGCGCATACGTTACTGACGAGCAAGAAATTTTTAAAGCGCTCAACATGAAA GCTGCAGTCGGAGCCATGTATGGTGGCAAAAAGAAAGACTATTTTGAGCATTTCACTGATGCAGATAAGGAAGAAATAGT CATGCAAAGCTGTCTGCGATTGTATAAAGGCTTGCTTGGCATTTGGAATGGATCATTGAAGGCAGAGCTCCGGTGTAAGG AAAAGATACTTGCAAATAAGACGAGGACATTCACTGCTGCACCTTTAGACACTTTGCTGGGTGGTAAAGTGTGTGTTGAT GACTTCAATAATCAATTTTATTCAAAGAATATTGAATGCTGTTGGACGGTTGGGATGACAAAGTTTTATGGTGGTTGGGA TAAACTGCTGCGGCGTTTACCTGAGAATTGGGTTTACTGTGATGCCGATGGCTCACAGTTTGATAGTTCACTAACTCCAT 
ACTTAATCAATGCAGTTCTCACCATTAGAAGCACATACATGGAAGATTGGGATGTGGGGTTGCAAATGTTGCGCAATTTA TACACTGAGATTGTTTACACACCTATTTCAACTCCAGATGGAACAATTGTTAAGAAGTTCAGAGGGAATAACAGTGGTCA GCCTTCTACTGTTGTGGACAACTCTCTTATGGTCGTCCTTGCCATGCACTATGCTCTCATCAAAGAATGCATTGAGTTTG AAGAGATTGACAGCACGTGCGTGTTCTTTGTCAATGGTGATGATTTGCTGATTGCTGTGAATCCGGATAAAGAGGGCATT CTTGACAGATTGTCACAACACTTCTCAGATCTTGGTTTGAATTATGATTTCTCGTCAAGAACAAGAAATAAGGAGGAGTT GTGGTTTATGTCTCATAGAGGCCTACTGATTGAGGCAATGTACGTGCCGAAACTTGAAGAAGAAAGGATTGTGTCCATTC TCCAATGGGACAGAGCAGACTTGGCTGAACATAGGCTTGAGGCGATTTGCGCAGCTATGATAGAGTCCTGGGGTTATTCT GAACTAACACACCAAATCAGGAGATTCTACTCATGGTTATTGCAACAGCAACCCTTTGCAACAATAGCGCAGGAAGGGAA GGCTCCTTATATAGCAAGCATGGCATTAAGGAAATTGTATATGGATAGGGCTGTGGATGAGGAAGAGCTGAGAGCCTTCA CTGAAATGATGGTCGCATTAGATGATGAGTTTGAATTTGACTCTTATGAAGTATACCATCAAGCAAATGACACAATCGAT GCAGGAGAAAGCAGCAAGAAAGATGCAAGACCAGAGCAAGGCAGCATCCAGTCAAACCCGAACAAAGGAAAAGATAAGGA TGTGAATGCTGGTACATCTGGGACACATACTGTGCCGAGAATCAAGGCTATCACGTCCAAAATGAGGATGCCCAAAAGCA AGGGAGCAACCGTGCTAAACTTAGAACACTTGCTTGAGTATGCTCCACAACAAATTGATATTTCAAATACTCGGGCAACT CAATCACAGTTTGATACGTGGTATGAGGCAGTGCGGATGGCATACGACATAGGAGAAACTGAGATGCCAACTGTGATGAA TGGGCTTATGGTTTGGTGCATTGAAAATGGAACCTCGCCAAATGTCAACGGAGTTTGGGTTATGATGGATGGGAATGAAC AAGTCGAGTACCCGTTGAAACCAATCGTTGAGAATGCAAAACCAACCCTTAGGCAAATCATGGCACATTTCTCAGATGTT GCAGAAGCGTATATAGAAATGCGCAACAAAAAGGAACCATATATGCCACGATATGGTTTAATTCGAAATCTGCGGGATGT GGGTTTAGCGCGTTATGCCTTTGACTTTTATGAGGTCACATCACGAACACCAGTGAGGGCTAGGGAAGCGCACATTCAAA TGAAGGCCGCAGCATTGAAATCAGCCCAACCTCGACTTTTCGGGTTGGACGGTGGCATCAGTACACAAGAGGAGAACACA GAGAGGCACACCACCGAGGATGTCTCTCCAAGTATGCATACTCTACTTGGAGTCAAGAACATGTGA ================================================ FILE: inst/extdata/Gram-negative_AKL.fasta ================================================ >Random_Gram-negative_AKL_gjtez RWTHLASGRTYNYKFNPPKQYGKDDITGEDLIQRED >Random_Gram-negative_AKL_dibhu RWTHLNSGRTYHYKFNPPKVHGVDDVTGEPLVQRED >Random_Gram-negative_AKL_elirp RWTHLASGRTYNYKFNPPKQYGKDDITGEDLIQRED >Random_Gram-negative_AKL_dnjtf 
RWTHLASGRTYNYKFNPPKQYGKDDITGEDLIQRED >Random_Gram-negative_AKL_qzcvn RLIHQPSGRSYHEEFNPPKEPMKDDVTGEPLIRRSD >Random_Gram-negative_AKL_mqvro RRVHPGSGRVYHVVYNPPKVEGKDDETGEELIVRAD >Random_Gram-negative_AKL_qjvxv RRVHPASGRIYHLVHNPPEVDGVDDATGEMLIQRDD >Random_Gram-negative_AKL_mlmcf RRVHAPSGRVYHVKFNPPKVEGKDDVTGEELTTRKD >Random_Gram-negative_AKL_bfqnk RWVHEPSGRVYNTDFNAPKVPGKDDITGEPLTQRQD >Random_Gram-negative_AKL_kvcas RWVHEPSGRVYNTDFNVPKVPGKDDVTGEPLTQRQD >Random_Gram-negative_AKL_xrbtp RRAHLPSGRTYHVVYNPPKVEGKDDVTGEDLVIRED >Random_Gram-negative_AKL_yggsb RRAHLPSGRTYHVVYNPPKVEGKDDVTGEDLVIRED >Random_Gram-negative_AKL_wntes RRVHQPSGRSYHIVYNPPKTEGKDDVTGEDLIIRAD >Random_Gram-negative_AKL_gbdos RRVHQPSGRSYHIVYNPPKTEGKDDVTGEDLIIRAD >Random_Gram-negative_AKL_lhrmd RRVHQPSGRSYHIVYNPPKTEGKDDVTGEDLIIRAD >Random_Gram-negative_AKL_zhrxk RRVHQPSGRSYHIVYNPPKTEGKDDVTGEDLIIRAD >Random_Gram-negative_AKL_taozi RRVHQPSGRSYHIVYNPPKTEGKDDVTGEDLIIRAD >Random_Gram-negative_AKL_reram RRVHQPSGRSYHIVYNPPKTEGKDDVTGEDLIIRAD >Random_Gram-negative_AKL_rukmd RRVHQPSGRSYHIVYNPPKTEGKDDVTGEDLIIRAD >Random_Gram-negative_AKL_dfkbq RRVHQPSGRSYHIVYNPPKTEGKDDVTGEDLIIRAD >Random_Gram-negative_AKL_dvomf RRVHQPSGRSYHIIYNPPKTEGKDDVTGEDLIIRAD >Random_Gram-negative_AKL_vgzym RWIHPSSGRVYNLDFNPPQVQGIDDITGEPLVQQED >Random_Gram-negative_AKL_ptlzq RLFHPGSGRVYHKVTNPPKKPMTDDITGEPLIIRKD >Random_Gram-negative_AKL_lgmpt RLFHPGSGRTYHTKFNPPKVPMKDDQTGEDLIVRKD >Random_Gram-negative_AKL_stqhz RRAHLPSGRTYHVVYNPPKVEGKDDVTGEDLVIRED >Random_Gram-negative_AKL_leceq RRAHLPSGRTYHVVYNPPKVEGKDDVTGEDLVIRED >Random_Gram-negative_AKL_arqwq RWVHEPSGRVYNTDFNAPKVPGKDDVTGEPLTQRED >Random_Gram-negative_AKL_edhmf RRVHPGSGRSYHVKFNPPKVEGKDDVTGEPLVQRDD >Random_Gram-negative_AKL_jefev RRVHPGSGRVYHVVFNPPKVEGKDDVTGEDLAIRPD >Random_Gram-negative_AKL_mgvft RRTHPASGRTYHVKFNPPKVDGKDDVTGEPLIQRDD >Random_Gram-negative_AKL_pdjwi RWIHPSSGRSYHTKFAPPKVPGVDDVTGEPLIQRKD >Random_Gram-negative_AKL_hbdlm RWIHPSSGRSYHTKFAPPKTPGLDDVTGEPLIQRKD 
>Random_Gram-negative_AKL_qinsk RRAHLPSGRTYHVVYNPPKVEGKDDVTGEDLVIRED >Random_Gram-negative_AKL_xiszr RRVHAPSGRVYHVKFNPPKVEGKDDVTDEELTTRKD >Random_Gram-negative_AKL_tsjls RRAHLPSGRTYHVVYNPPKVEGKDDVTGEDLVIRED >Random_Gram-negative_AKL_ivaqd RRAHLPSGRTYHVVYNPPKVEGKDDVTGEDLVIRED >Random_Gram-negative_AKL_uceun RRVHESSGRIYHVKYDPPKVEDKDNETGEALIQRED >Random_Gram-negative_AKL_yfnqy RRVHPASGRVYHTEHNPPKVAGKDDVTGEELIQRED >Random_Gram-negative_AKL_fquul RRVHPASGRVYHTEHNPPKVAGKDDVTGEELIQRED >Random_Gram-negative_AKL_hrvsw RWVHVPSGRVYNLDYNPPKVPFKDDVTGEPLSKRED >Random_Gram-negative_AKL_wdkfx RRVHAPSGRVYHVKFNPPKVEGKDDVTGEELTTRKD >Random_Gram-negative_AKL_lpxmt RRVHAPSGRVYHVKFNPPKVEGKDDVTGEELTTRKD >Random_Gram-negative_AKL_bkmgo RRVHAPSGRVYHVKFNPPKVEGKDDVTGEELTTRKD >Random_Gram-negative_AKL_rqtgn RRVHAPSGRVYHVKFNPPKVAGKDDVTGEELTTRKD >Random_Gram-negative_AKL_fzfio RRVHAPSGRVYHVKFNPPKVEGKDDVTGEXLTTRKD >Random_Gram-negative_AKL_ptxxd RRVHAPSGRVYHVKFNPPKVEGKDDVTGEELTTRKD >Random_Gram-negative_AKL_fmdzi RRVHVASGRTYHVKYNPPKNEGKDDETGEPLIQRDD >Random_Gram-negative_AKL_ehnfi RRAHLPSGRTYHSVYNPPKEEGKDDITGEELVVRDD >Random_Gram-negative_AKL_gwaom RRVHPGSGRVYHIKHNPPKEEGKDDETGEELVIRPD >Random_Gram-negative_AKL_ngobh RRAHLPSGRTYHNVYNPPKEEGKDDITGEELVVRDD >Random_Gram-negative_AKL_jgpqr RRVHPESGRIYHTVYNPPKVEGKDDETGEDLVQRPD >Random_Gram-negative_AKL_jvlnt RRVHPGSGRIYHVEHNPPKVEGVDDETGEALVHRDD >Random_Gram-negative_AKL_dnrym RRVHEASGRVYHVMHNPPKESGIDDITGEPLIQRDD >Random_Gram-negative_AKL_omfoc RRVHPGSGRVYHRIHNPPTLDDRDDLTGEPLVQRDD >Random_Gram-negative_AKL_gnjvq RRVHPGSGRVYHVVYNPSKVEGKDDVTGEDLIIRDD >Random_Gram-negative_AKL_xapht RRMHPASGRNYHIIFNPPKVEGKDDATGEDLIQRED >Random_Gram-negative_AKL_vhtcj RWYHLKSGRIYHTLYNPPLTAGKDDDTGEPLEQ--- >Random_Gram-negative_AKL_jnrhr RWVHKSSGRTYHEVFRPPRTPGKDDVTGEDLHQRPD >Random_Gram-negative_AKL_kmyvp RWIHKPSGRTYHEVFRPPKTPGKDDITGEDLYQRPD >Random_Gram-negative_AKL_bbbbb RWYHPKSGRIYHTFYNPPLNAGKDDYTGEPLVQ--- >Random_Gram-negative_AKL_obouo 
RRIHPASGRTYHTKFNPPKVADKDDVTGEPLITRTD >Random_Gram-negative_AKL_ellkt RRTHPASGRTYHVKFNPPKVEGKDDVTGEPLVQRDD >Random_Gram-negative_AKL_sxldp RRVHQASGRSYHIVYNPPKVEGKDDVTGEDLIIRAD >Random_Gram-negative_AKL_sckku RRVHQASGRSYHIVYNPPKVEGKDDVTGEDLIIRAD >Random_Gram-negative_AKL_fbqzv RRVHQTSGRSYHIVYNPPKVEGKDDVTGEDLIIRAD >Random_Gram-negative_AKL_ypuig RRVHQTSGRSYHIVYNPPKVEGKDDVTGEDLIIRAD >Random_Gram-negative_AKL_ltbjc RRVHQASGRSYHIVYNPPKVEGKDDVTGEDLIIRAD >Random_Gram-negative_AKL_wmjap RRVHQASGRSYHIVYNPPKVEGKDDVTGEDLIIRAD >Random_Gram-negative_AKL_oxood RRVHQASGRSYHIVYNPPKVEGKDDVTGEDLIIRAD >Random_Gram-negative_AKL_ddrjb RWVHEPSGRVYNDTFNAPQVPGRDDVTGEPLVRRPD >Random_Gram-negative_AKL_nwqma RWIHPASGRSYHTKFAPPKVEGKDDFTGEPLIKRKD >Random_Gram-negative_AKL_dtzyf RRVHPASGRVYHTEHNPPKVAGKDDETGEELIQRED >Random_Gram-negative_AKL_whwzb RWIHPSSGRTYHTKFAPPKVSGVDDVTGEPLIQRKD >Random_Gram-negative_AKL_dmvij RWVHVPSGRVYNLDYNPPKVPFKDDITGEPLTKRSD >Random_Gram-negative_AKL_xtwaf RYVHLPSGRIYSLDYNPPKVPFKDDVTGEDLVKRED >Random_Gram-negative_AKL_iyejp RRTHPASGRTYHVKFNPPKVEGKDDVTGEPLVQRDD >Random_Gram-negative_AKL_cbxjs RLIHKPSGRIYHKIFNPPKTPFKDDITNEPLIQRED >Random_Gram-negative_AKL_oglie RRAHLPSGRTYHTVYNPPKEEGKDDVTGEELVVRDD >Random_Gram-negative_AKL_jqtgo RRTHPASGRTYHVKFNPPKVEGKDDVTGEPLVQRDD >Random_Gram-negative_AKL_prkvf RRQHPGSGRVYHLKYNPPKQEGLDDETGEPLIQRDD >Random_Gram-negative_AKL_aincb RRTHPASGRTYHVKFNPPKVEGKDDVTGEPLVQRDD >Random_Gram-negative_AKL_whyuk RRAHLPSGRTYHVVYNPPKVEGKDDVTGEDLVIRED >Random_Gram-negative_AKL_tbpgo RRAHLPSGRTYHVVYNPPKVEGKDDVTGEDLVIRED >Random_Gram-negative_AKL_lkebr RRAHLPSGRTYHVVYNPPKVEGKDDVTGEDLVVRDD >Random_Gram-negative_AKL_npiwv RRAHLPSGRTYHVVYNPPKVEGKDDVTGEDLVVRDD >Random_Gram-negative_AKL_zzajl RRVHAASGRVYHVKFNPPKVEDKDDVTGEELTIRKD >Random_Gram-negative_AKL_pwhal RRAHLASGRTYHVVYNPPKVEGKDDVTGEDLVVRDD >Random_Gram-negative_AKL_hcuqd RRTHPASGRTYHVKFNPPKQEGIDDITGEPLVQRDD >Random_Gram-negative_AKL_pswng RWVHAPSGRVYNTQFNAPKEPGKDDVTGEPLVQRAD 
>Random_Gram-negative_AKL_eueyh RWVHAPSGRVYNTTFHAPKVAGLDDITGEKLTKRPD >Random_Gram-negative_AKL_iplvh RRAHLPSGRTYHVVYNPPKVEGKDDVTGEDLVIRED >Random_Gram-negative_AKL_ykocu RRAHLPSGRTYHVVYNPPKVEGKDDVTGEDLVIRED >Random_Gram-negative_AKL_pzokl RYVHVPSGRVYNLQYNPPKVPGLDDITGEPLTKRLD >Random_Gram-negative_AKL_dpucn RLVHEPSGRVYHMTSKPPKVPMRDDITNEPLTQRKD >Random_Gram-negative_AKL_hcasp RRVHVASGRTYHVKYNPPKTEGVDDETGEPLIQRDD >Random_Gram-negative_AKL_ynuts RWVHAPSGRVYNTTFHAPKVPGLDDITGEKLTKRPD >Random_Gram-negative_AKL_kfbqi RRIHPASGRTYHTKFNPPKVADKDDVTGEPLITRTD >Random_Gram-negative_AKL_fbphm RRIHPASGRTYHTKFNPPKVADKDDVTGEPLITRTD >Random_Gram-negative_AKL_xrebl RRIHPASGRTYHTKFNPPKVADKDDVTGEPLITRTD >Random_Gram-negative_AKL_snboh RRVHQPSGRTYHVVYNPPKVEGKDDVTGEDLIIRQD ================================================ FILE: inst/extdata/Gram-positive_AKL.fasta ================================================ >Random_Gram-positive_AKL_pjxgp RRTCVGCGTAFNYVMEPPKKEGICDACGGKLVVRDD >Random_Gram-positive_AKL_essyp RRTCVGCGTAFNYVMEPPKKEGICDACGGKLVVRDD >Random_Gram-positive_AKL_lopeh RRIHEASGRVYHVVFNPPKKSGVDDETGDQLLQRED >Random_Gram-positive_AKL_mzuep RRICRSCGATYHIHFNPPAQAGICDKCGGELYQRAD >Random_Gram-positive_AKL_pjycw RLYCPNCGETYHVSWKPPRKPGVCDNCGSRLVRRRD >Random_Gram-positive_AKL_tmsgs RRICARCGAIYHVKYMPPKIPGICDKCGGPLVQRRD >Random_Gram-positive_AKL_byrtv RRICQSCGGIFNIYTLPTKEKGICDLCKGSLYQRKD >Random_Gram-positive_AKL_hynwj RRICKSCGGIFNIYTLPTKEKEICDLCKGILYQRKD >Random_Gram-positive_AKL_ycsho RRICKECGATYHLEFNPPAKADVCDKCGGELYQRSD >Random_Gram-positive_AKL_kgtzr RWIHPSSGRVYNLDFNPPQVQED------------- >Random_Gram-positive_AKL_tmtym RRLDPVTGKIYHLKYSPPENEEIAS----RLTQRFD >Random_Gram-positive_AKL_diswt RYICPKCGRVYNLLFNPPKNDLRCDDDGTPLIRRSD >Random_Gram-positive_AKL_mgvxi RRLCPNCQRTYHILFAPPKKDSLCDYCSVQLVQRAD >Random_Gram-positive_AKL_cfbwo RRICKTCGASYHLVFNPPAEEGKCDKDGGELYTRAD >Random_Gram-positive_AKL_rmuqg RYTCGNCGAGYHDDFKKPKVEGTCDDCGEQMKRRAD >Random_Gram-positive_AKL_ejowt 
RSTCGSCGEVYNDITKPIPQDGKCTKCGGEFKRRAD >Random_Gram-positive_AKL_qzdve RFTCGGCGEGYHDSFKQPQAMGTCDKCGGEFKRRAD >Random_Gram-positive_AKL_yiwft RYSCGSCGAVYHDDTKPTKVEGVCDVCGSDLRRRAD >Random_Gram-positive_AKL_ubfcu RSTCAACGEGYHDSFKQPARAGTCDKCGGEFKRRPD >Random_Gram-positive_AKL_yqxwz RSTCGNCGEVYHDVTKPQPADGKCEKCGADFKRRAD >Random_Gram-positive_AKL_vvrnb RSTCGNCGEVYHDVTKPQPADGKCEKCGADFKRRAD >Random_Gram-positive_AKL_kcxvy RSTCANCGEVYHDETKPIPADGKCSVCGGEFKRRAD >Random_Gram-positive_AKL_qimxn RYTCGGCGEGYHDSFKTPAVAGVCDKCSGDMQRRPD >Random_Gram-positive_AKL_gooyo RYTCGGCGEGYHDSFKVPSVEGTCDKCGGEMKRRAD >Random_Gram-positive_AKL_zgspg RSTCGNCGEVYNDMTKPWPADGKCAKCGSDVRRRAD >Random_Gram-positive_AKL_fbyaj RFSCANCGALYHDTANPPAKEGVCDVCHSEFKRRPD >Random_Gram-positive_AKL_spvav RSTCGGCGEVYHDETKPWPEDGKCTNCGSEVKRRTD >Random_Gram-positive_AKL_elbro RYTCGGCGEGYHDSFKQPAVAGTCDKCGSNMTRRAD >Random_Gram-positive_AKL_oxdgk RRLCSGCGLDYNLIHHRPQVIDQCDVCGAPLTQRAD >Random_Gram-positive_AKL_fxbao RFTCGDCGEGYHDTFKTPKVADTCDNCGANMTRRAD >Random_Gram-positive_AKL_siwfh RYTCAGCGEGYHDSFKQPAVEGKCDKCGGEMTRRAD >Random_Gram-positive_AKL_riiin RSTCGGCGEVYHDETKPWAADGKCTNCGSDVKRRAD >Random_Gram-positive_AKL_eposf RRMCGQCGRSWHVEFNPTRVEGICDTCAGSLHQRED >Random_Gram-positive_AKL_klpbd RRIHLSSGRSYHIEFNPPRVEGKDDLSGEDLIQRED >Random_Gram-positive_AKL_txwex RRTDPLTGTIYHLKYNPPPEDDT------------- >Random_Gram-positive_AKL_lyfma RRSCPDCGFVYNIKMDPPKVDGVCDKCGCPLITRKD >Random_Gram-positive_AKL_vlwew RRSCPDCGFVYNIKMDPPKLDGVCDKCGCPLITRKD >Random_Gram-positive_AKL_xzjec RRLDPVTGRIYNLKSDPPSPDVVDR----------- >Random_Gram-positive_AKL_zyvla RWVHKASGRSYHATFNPPKSLKAC------------ >Random_Gram-positive_AKL_xxxrz RYTCAKCGAGYHDKFQQPKVAGTCDSCGGEFTRRAD >Random_Gram-positive_AKL_hlisn RFTCAACGEGYHDHFKQAAVAGTCDKCGGDFRRRPD >Random_Gram-positive_AKL_xeeqt RYSCGNCGAVYHDETKPTKVEGVCDVCGSDLRRRAD >Random_Gram-positive_AKL_yuxgl RRLDPVTGRIYHLKYSPPENEEIAA----RLTQRFD >Random_Gram-positive_AKL_gzzla RRICRSCGASYHVLFNKPAIEGRCNACGGELYQRSD 
>Random_Gram-positive_AKL_lrjxz RRICESCGTTYHLVFNPPKVEGICDIDGGKLYQRED >Random_Gram-positive_AKL_subqs RRFCPNCKAGFHIDFMPSSKGNICDKCGTELITRKD >Random_Gram-positive_AKL_rqgxp RRACVDCGATYHLVYAPTKEEGICDKCGGGLILRDD >Random_Gram-positive_AKL_ffzzd RYTCANCGAGYHDTFKQPKIEGVCDECGSEFKRRPD >Random_Gram-positive_AKL_ytofs RRTSKVTGKIYHIKFNPPVDEKEED-----LVQRAD >Random_Gram-positive_AKL_oygvi RQTCKTCGSTYNIYYFPSKHPNVCDDCGGKLYQRSD >Random_Gram-positive_AKL_cyvfv RFICRNCGATYHKLYNAPKVEGTCDVCGHEFYQRDD >Random_Gram-positive_AKL_mwtjn RRACVGCGATYHVVYNPTKEEGTCDTCGGELIVRDD >Random_Gram-positive_AKL_vaoec RRACLKCGATYHIVYAAPKVENVCDTCGENLVLRDD >Random_Gram-positive_AKL_gxjsl RRACVGCGATYHLVYAPTKTEGICDVCGKELILRDD >Random_Gram-positive_AKL_nyalo RYTCGGCGEGYHDSFKMPNVAGICDKCGGEMKRRAD >Random_Gram-positive_AKL_unitb RSTCGGCGEVYNDITKPWPADGKCAKCGSDVKRRAD >Random_Gram-positive_AKL_cgcxi RRVHEGSGRIYHVKYDPPKTEGKDDETGDALIQRED >Random_Gram-positive_AKL_umndp RRVHAPSGRVYHTVYNPPKVAGKDNETGDELTIRVD >Random_Gram-positive_AKL_diikj RFTCGGCGEGYHDSFKPTDKPGICDACGGDMKRRAD >Random_Gram-positive_AKL_uwyra RSTCAGCGEVYNDITKPIPADGICPKCGGEFKRRAD >Random_Gram-positive_AKL_nlnlr RRQDPETGAIYHLKFNPPADEAVLA----RLVQRKD >Random_Gram-positive_AKL_easkj RRVCSHCGTPFHLESNPPKKPDVCDVCGGELIERDD >Random_Gram-positive_AKL_bzgzv RQNCRKCGEIYNKLFMPSKVEGVCDKCGGELFQRPD >Random_Gram-positive_AKL_rdkpc RRICESCGTTYHLVFNPPKVEGICDIDGGKLYQRED >Random_Gram-positive_AKL_etmff RRMCKECGATYHILFNPPTKADQCDKCGGQLYQRDD >Random_Gram-positive_AKL_ynwrz RRICESCGTTYHLVFNPPKVEGICDIDGGKLYQRED >Random_Gram-positive_AKL_fxmpj RRACVDCGATYHIVYAPTEKEDVCDKCGGSLILRDD >Random_Gram-positive_AKL_wkyro RFICRNCGTTYHKLYNAPKVEGTCDVCGHEFYQRDD >Random_Gram-positive_AKL_yemlt RQTCKTCGATYNIYYFPSKHPNICDDCGGKLYQRSD >Random_Gram-positive_AKL_ekvfi RRLDPVTGRIYHLKYSPPENEEIAA----RLTQRFD >Random_Gram-positive_AKL_quwnk RRACVGCGATYHIVYNPTKVEGKCDVCSSDLILRDD >Random_Gram-positive_AKL_qrxch RRVCEKCGATYHLLYKKPKAEGVCDICGGTLIQRKD >Random_Gram-positive_AKL_qrppk 
RRICESCGTTYHLVFNPPKVEGICDIDGGKLYQRED >Random_Gram-positive_AKL_fnlju RRICKECGATYHLEFNPPAKADVCDKCGGELYQRSD >Random_Gram-positive_AKL_kvakt RRICKECGATYHLEFNPPAKADVCDKCGGKLYQRSD >Random_Gram-positive_AKL_rcuhp RRICKECGATYHLEFNPPAKADVCDKCGGELYQRSD >Random_Gram-positive_AKL_tbwwv RRICKECGATYHLEFNPPAKADVCDKCGGELYQRSD >Random_Gram-positive_AKL_acvdx RRICKECGATYHLEFNPPAKADVCDKCGGELYQRSD >Random_Gram-positive_AKL_wojzq RRICKECGATYHLEFNPPAKADVCDKCGGELYQRSD >Random_Gram-positive_AKL_jipjw RRICKECGATYHLEFNPPAKADVCDKCSGELYQRSD >Random_Gram-positive_AKL_hrayu RRICKECGATYHLEFNPPAKADVCDKCGGELYQRSD >Random_Gram-positive_AKL_kgavl RRICKECGATYHLEFNPPAKADVCDKCGGELYQRSD >Random_Gram-positive_AKL_pusim RRICKECGATYHLEFNPPAKADVCDKCGGELYQRSD >Random_Gram-positive_AKL_vwdds RRICKECGATYHLEFNPPAKADVCDKCGGELYQRSD >Random_Gram-positive_AKL_tkoth RRICKECGATYHLEFNPPATADVCDKCGGELYQRSD >Random_Gram-positive_AKL_hcbyk RRICKECGATYHLEFNPPAKADVCDKCGGELYQRSD >Random_Gram-positive_AKL_ahjnk RRICKECGATYHLEFNPPAKADVCDKCGGELYQRSD >Random_Gram-positive_AKL_dryqn RRICKECGATYHLEFNAPAKADVCDKCGGELYQRSD >Random_Gram-positive_AKL_bkgrl RRICKECGATYHLEFNPPAKADVCDKCGGELYQRSD >Random_Gram-positive_AKL_piqkt RRICKECGATYHLEFNAPAKADVCDKCGGELYQRSD >Random_Gram-positive_AKL_xmjgd RRICKECGATYHLEFNAPAKADVCDKCGGKLYQRSD >Random_Gram-positive_AKL_fmrku RRICKECGATYHLEFNPPANADVCDKCGGDLYQRSD >Random_Gram-positive_AKL_kiqtd RRICKECGATYHLEFNPPANADVCDKCGGDLYQRSD >Random_Gram-positive_AKL_awqsq RRICKECGATYHLEFNPPANADVCDKCGGDLYQRSD >Random_Gram-positive_AKL_cjuqw RRICKECGATYHLEFNPPANADVCDKCGGDLYQRSD >Random_Gram-positive_AKL_etzsu RRICKECGATYHLEFNPPANADVCDKCGGDLYQRSD >Random_Gram-positive_AKL_ndrpd RRICKECGATYHLEFNPPANADVCDKCGGDLYQRSD >Random_Gram-positive_AKL_taebr RRICKECGATYHLEFNPPAKADVCDKCGGELYQRSD >Random_Gram-positive_AKL_qvzlz RRICKECGATYHLEFNPPAKADVCDKCGGELYQRSD >Random_Gram-positive_AKL_yhzrz RRICKECGATYHLEFNPPAKADVCDKCGGELYQRSD ================================================ FILE: inst/extdata/LeaderRepeat_All.fa 
================================================ >Ain_RyC-MR95 ATCCGTTGATCAAATTTGAGGTTTGAGAGATATGTAAATTCAAAGGATAATCAAAC >Asp_D21 ATCCGTTGATCAAATTTGAGGTTTGAGAGATATGTAAATTCAAAGGATAATCAAAC >Bbi_S17 AGGAATCCTTAAGGCTATCGGTTTCAGATGCCTGTCAGATCAATGACTTTGACCAC >Bca_FSLF6-1037 ACAAAATCGACGCATTTGAGGTTTTAGAGCTGTGTTAAATTGAATGGTATTAAAAC >Bfi_16/4 GTTGGAAATTTGGATTTGACGTTTTAGTACCCGGGAAAATTAAGTGATTGGAAAAC >Bpe_CAG:437 TTTGGATAACATGATTTGGTATTTTAGTACCTGAACAAATTACGTGACTGTAAAAC >Bsp_AC2005 TTTATCATACTATATTTGGTGTTTTAGTACCTAGAGAAATTAAGTGATTAGAAAAC >Bth_DSM20171 ACAAAATTTCATTGTTTGAGGTTTTAGAGCTGTGTTAAATTGAATGGTATTAAAAC >Cgl_PW2 GAGAAGATTTTGATCCAATGGTTTTGGAGCAGTGTCGTTCTGACTGGTAATCCAAC >Cla_DSM_14151 ATGGCTCTCTAAAATTTGAGGTTTTAGACCAGTGTAATTTTAGAGAGTAGTAAAAC >Cma_M35/04/3 TTTAAATATTACAATTTAAGGTTCTTGTACTTTCTAGATTTTCATATTAGTAAAAC >Cmi_DSM15897 ATTGGATTTTTGAATTTGAGGTTTTAGGGTTATGTTATTTTGAACTGAATTAAAAC >Csp_CAG:230 CGATTATATTTGAATTTGATATTTTAGTACCTGAAAGAATTGAGTTATCGTAAAAC >Csp_ZWU0011 TACGTTATAATGAAATTGACATTTTGGTACTCTCGCATCTTTTGGTATAAGGAAAC >Dlo_AGR2136 CGGCGAGAACCGGATTTGAGGTTTGAGAGTCTTGTTAATACGGAAGGATTTTAAAC >Edo_DSM3991 TATGTTAAAATATGTTTGAGGTTTTGTTACCATATGGATTTTTGCTAGATTAAGAC >Efa_1141733 GGAAAAATTTTTTCTGCGAGGTTTTAGAGCTATGCTGATTTGAATGCTTCCAAAAC >Efa_D32_1 GAAAAAAATAATTCTCCGAGGTTTTAGAGTCATGTTGTTTAGAATGGTACCAAAAC >Efa_OG1RF_1 GAAAAAAATAATTCTCCGAGGTTTTAGAGTCATGTTGTTTAGAATGGTACCAAAAC >Efa_TX0012 TTTCAAATTTTAAATTTGAGGTTTTTGTACTCTCAATAATTTCTTATCAGTAAAAC >Efa_TX0012_2 TTTCAAATTTTAAATTTGAGGTTTTTGTACTCTCAATAATTTCTTATCAGTAAAAC >Eit_DSM15952 AGATAAAAAATATCTGCGAGGTTTTAGAGCTATGTTGAATCGAATGCTTCCAAAAC >Emu_QU25_DNA GAAAAAATTTTTTCTACGAGGTTTTAGAGCTATGTTGAATTGAATGCTTCCAAAAC >Eph_ATCCBAA-412 AGAAAGAAAATGGCTGCGAGGTTTTAGAGCCATGTTGAATTGAATGCTTCCAAAAC ================================================ FILE: inst/extdata/Rfam/RF00458.fasta ================================================ >AF178440.1/5925-6123 
UUGACUAUGUGAUCUUGCUUUCG----UAAUAAAAUUCUGUACAUAAAAGUCGAAAGUAUUGCUAUAGUUAAGGUUGCGCUUGCCUAUUUAGGCAUACUUCUCAGGAUGGCGCG-UUGCAGUCCAA-CAAG-AUCCAGGGACUGUACAGAAUUUUCC-UAUACCUCGAGUCGGGUUU-GGAA--UCUAAGGUUGACUCGCUGUAAAUAAU >AB017037.1/6286-6484 GAAAAUGUGUGAUCUGAUUAGAAG--UAAGAAAAUUCCUAG-UUAUAAUAUUUUUAAUACUGCUACAUUUUU-AAGACCCUUAGUUAUUUAGCUUUACCGCCCAGGAUGGGGUG-CAGCGUUCCUG-CAA-UAUCCAGGGCAC--CUAGGUGCAGCCUUGUAGUUUUAGUGGACUUUAGGCU--AAAGAAUUUCACUAGCAAAUAAUAAU >AF014388.1/6078-6278 GUUAAGAUGUGAUCUUGCUUCCUU--AUACAAUUUUGAGAGGUUAAUAAGAAGGAAGUAGUGCUAUCUUAAU-AAUUAGGUUAACUAUUUAGUUUUACUGUUCAGGAUGCCUAU-UGGCAGCCCCA-UAA-UAUCCAGGACAC-CCUCUCUGCUUCUUAUAUGAUUAGGUUGUCAUUUAGAA--UAAGAAAAUAACCUGCUAACUUUCAA >AF218039.1/6028-6228 GCAAAAAUGUGAUCUUGCUUGUAA--AUACAAUUUUGAGAGGUUAAUAAAUUACAAGUAGUGCUAUUUUUGU-AUUUAGGUUAGCUAUUUAGCUUUACGUUCCAGGAUGCCUAG-UGGCAGCCCCA-CAA-UAUCCAGGAAGC-CCUCUCUGCGGUUUUUCAGAUUAGGUAGUCGAAAAACC--UAAGAAAUUUACCUGCUACAUUUCAA >AF183905.1/5647-5848 CCAACAAUGUGAUCUUGCUUGCGGA-GGCAAAAUUUGCACAGUAUAAAAUCUGCAAGUAGUGCUAUUGUUGG-AAUCACCGUACCUAUUUAGGUUUACGCUCCAAGAUCGGUGGAUAGCAGCCCUAUCAA-UAUCUAGGAGAA-CUGUGCU-AUGUUUAGAAGAUUAGGUAGUCUCUAAACA---GAACAAUUUACCUGCUGAACAAAUU >AB006531.1/6003-6204 CUGACUAUGUGAUCUUAUUAAAAUUAGGUUAAAUUUCGAGGUUAAAAAUAGUUUUAAUAUUGCUAUAGUCUU-AGAGGUCUUGUAUAUUUAUACUUACCACACAAGAUGGACCG-GAGCAGCCCUC-CAA-UAUCUAGUGUAC--CCUCGUGCUCGCUCAAACAUUAAGUGGUGUUGUGCGA--AAAGAAUCUCACUUCAAGAAAAAGAA >AF022937.1/6935-7121 AGUGUUGUGUGAUCUUGCGCGAU-------AAAUGCUGACG---UGAAAACGUUGCGUAUUGCUACAACACU-----UGGUUAGCUAUUUAGCUUUACUAAUCAAGACGCCGUC-GUGCAGCCCAC-AAAA-GUCUAGAUA----CGUCACAGGAGAGCAUACGCUAGGUCGCGUUGACUAUCCUUAUAUAU-GACCUGCAAAUAUAAAC ================================================ FILE: inst/extdata/Rfam/RF03120.fasta ================================================ >KU973692.1/1-298 
AUAUUAGGUUUUUACCUACCCAGGAA--AAGCCAACCAA-CCUCGAUCUCUUGUAGAUCUGUUCUCUAAACGA-ACUUUAAAA------UCUGUGUAGCUGUCGCUCGGCUGCAUGCCUAGUGCACCUACGCAGUAUAAACAAUAAUAAACUUUACUGUCGUUGACAAGAAACGAGUAACUCGUCCCUCUUCUGCAGACUGCUUACGGUUUCGUCCGUGUUGCAGUCGAUCAUCAGCAUACCUAGGUUUCGUCCGGGUGUGACCGAAAGGUAAGAUGGAGAGCCUUGUUCUUGGAAUCAACGAGAAAA >DQ071615.1/1-298 AUAUUAGGUUUUUACCUACCCAGGAA--AAGCCAACCAA-CCUCGAUCUCUUGUAGAUCUGUUCUCUAAACGA-ACUUUAAAA------UCUGUGUAGCUGUCGCUUGGCUGCAUGCCUAGUGCACCUACGCAGUAUAAACAAUAAUAAACUCUACUGUCGUUGACAAGAAACGAGUAACUCGUCCCUCUUCUGCAGACUGCUUACGGUUUCGUCCGUGUUGCAGUCGAUCAUCAGCAUACCUAGGUUUCGUCCGGGUGUGACCGAGAGGUAAGAUGGAGAGCCUUGUUCUUGGAAUCAACGAGAAAA >KF367457.1/1-298 AUAUUAGGUUUUUACCUACCCAGGAA--AAGCCAACCAA-CCCCGAUCUCUUGUAGAUCUGUUCUCUAAACGA-ACUUUAAAA------UCUGUGUAGCUGUCGCUCGGCUGCAUGCCUAGUGCACCUACGCAGUAUAAACAAUAAUAAAUUUUACUGUCGUUGACAAGAAACGAGUAACUCGUCCCUCUUCUGCAGACUGCUUACGGUUUCGUCCGUGUUGCAGUCGAUCAUCAGCAUACCUAGGUUUCGUCCGGGUGUGACCGAAAGGUAAGAUGGAGAGCCUUGUUCUUGGUGUCAACGAGAAAA >MK211377.1/1-296 --UUUAGGUUUUUACCUACCCAGGAA--AAGCCAACCAA-CCUCGAUCUCUUGUAGAUCUGUUCUCUAAACGA-ACUUUAAAA------UCUGUGUAGCUGUCGCUCGGCUGCAUGCCUAGUGCACCUACGCAGUAUAAACAAUAAUAAAUUUUACUGUCGUUGACAAGAAACGAGUAACUCGUCCCUCUUCUGCAGACUGCUUACGGUUUCGUCCGUGUUGCAGUCGAUCAUCAGCAUACCUAGGUUUCGUCCGGGUGUGACCGAAAGGUAAGAUGGAGAGCCUUGUUCUUGGUGUCAACGAGAAAA >MK062184.1/1-299 AUAUUAGGUUUUUACCUACCCAGGAA--AAGCCAACCAA-CCUCGAUCUCUUGUAGAUCUGUUCUCUAAUGGUCGCUUUAAAA------UCUGUGUAGCUGUCGCUCGGCUGCAUGCCUAGUGCACCUACGCAGUAUAAACAAUAAUAAAUUUUACUGUCGUUGACAAGAAACGAGUAACUCGUCCCUCUUCUGCAGACUGCUUACGGUUUCGUCCGUGUUGCAGUCGAUCAUCAGCAUACCUAGGUUUCGUCCGGGUGUGACCGAAAGGUAAGAUGGAGAGCCUUGUUCUUGGUGUCAACGAGAAAA >AY559082.1/1-297 AAAUUUUGUUU-CAUCUAUACAGGAA--AAGCCAACCAA-CCUCGAUCUCUUGUAGAUCUGUUCUCUAAACGA-ACUUUAAAA------UCUGUGUAGCUGUCGCUCGGCUGCAUGCCUAGUGCACCUACGCAGUAUAAACAAUAAUAAAUUUUACUGUCGUUGACAAGAAACGAGUAACUCGUCCCUCUUCUGCAGACUGCUUACGGUUUCGUCCGUGUUGCAGUCGAUCAUCAGCAUACCUAGGUUUCGUCCGGGUGUGACCGAAAGGUAAGAUGGAGAGCCUUGUUCUUGGUGUCAACGAGAAAA >DQ412043.1/1-294 
AUAUUAGGUUUUUACCUACCCAGGAA--AAGCCAACCAA-CCUCGAUCUCUUGUAGAUCUGUUCUUUAAACGA-ACUUUAAAA------UCUGUGUGGCUGUCGCUUGGCUGUAUGCCUAGUGCACCUACACAGUAUAAA---UAAU-AACUUUACUGUCGUUGACAAGAAACGGGUAACUCGUCCUUCUUCUGCAGACUGCUUACGGUUUCGUCCGUGUUGUAGUCGAUCAUCAGCAUACCUAGGUUUCGUCCGGGUGUGACCGAAAGGUAAGAUGGAGAGCCUUGUUCUUGGUGUCAACGAGAAAA >KP886809.1/1-297 AUAUUAGGUUUUUACCUACCCAGGAA--AAGCCAACCAA-CCUCGAUCUCUUGUAGAUCUGUUCUUUAAACGA-ACUUUAAAA------UCUGUGUGGCUGUCGCUCGGCUGUAUGCCUAGUGCACCUACACAGUAUAAAUAUUAAU-AACUUUACUGUCGUUGACAAGAAACGAGUAACUCGUCCUUCUUCUGCAGACUGCUUACGGUUUCGUCCGUGUUGCAGUCGAUCAUCAGCAUACCUAGGUUCCGUCCGGGUGUGACCGAAAGGUAAGAUGGAGAGCCUUGUUCUUGGUGUCAACGAGAAAA >DQ022305.2/1-295 --GUUAGGUUUUUACCUACCCAGGAA--AAGCCAACCAA-CCUUGAUCUCUUGUAGAUCUGUUCUCUAAACGA-ACUUUAAAA------UCUGUGUGGCUGUCGCUCGGCUGCAUGCCUAGCGCACCUACGCAGUAUAAAUAUUAAU-AACUUUACUGUCGUUGACAAGAAACGAGUAACUCGUCCCUCUUCUGCAGACUGCUUACGGUUUCGUCCGUGUUGCAGUCGAUCAUCAGCAUACCUAGGUUUCGUCCGGGUGUGACCGAAAGGUAAGAUGGAGAGCCUUGUUCUUGGUGUCAAUGAGAAAA >MK211374.1/1-294 ---UUAGGUUUUUACCUACCCAGGAA--AAGCCAACCAA-CUUCGAUCUCUUGUAGAUCUGUUCUCUAAACGA-ACUUUAAAA------UCUGUGUGGCUGUCGCUUGGCUGUAUGCCUAGUGCACCUACGCAGUACAAAUAUUAAU-AACUCUAUUGUCGUUGACAAGAAACGAGUAACUCGUCCUUCUUCUGCAGACUGCUUACGGUUUCGUCCGUGUCGCAGUCGAUCAUCAGCAUACCUAGGUUUCGUCCGGGUGUGACCGAAAGGUAAGAUGGAGAGCCUUGUUCUUGGUGUCAACGAGAAAA >DQ648857.1/1-297 AUAUUAGGUUUUUACCUACCCAGGAA--AAGCCAACCAA-UCUCGAUCUCUUGUAGAUCUGUUCUCUAAACGA-ACUUUAAAA------UCUGUGUAGCUGUCGCUCGGCUGCAUGCCUAGUGCACUUACGCAGUAUAAAUAUUAAU-AACUUUACUGUCGCUGACUGGAUACGAGUAACUCGUCCUUCUUCUGCAGACUGCUUACGGUUUCGUCCGUGUUGCAGUCGAUCAUCAGCAUACCUAGGUUUCGUCCGGGUGUGACCGAAAGGUAAGAUGGAGAGCCUUGCUCUUGGUGUCAGCGAGAAAA >KF569996.1/1-305 AUAUUAGGUUUUUACCUACCCAGGAA--AAGCCAACCAACCCUCGAUCUCUUGUAGAUCUGUUCUCUAAACGA-ACUUUAAAAGCAUUCUCUGUGUAGCUGUCGCUCGGCUGCAUGCCUAGUGCACCUACGCAGUAUAAACAAUAAUAAACUUUACUGUCGUUGACAAGAAACGAGUAACUCGUCCCCCUUCUGCAGACUGCUUGCGGUUUCGUCCGUGUUGCAGUCGAUCAUCAGCAUACCUAGGUUCCGUCCGGGUGUGACCGAAAGGUAAGAUGGAGAGCCUUGUUCUUGGAAUCAACGAGAAAA >MT163718.1/1-299 
UUGGUUGGUUUAUACCUUCSCAGGUAACAAACCAACCAACUUUCGAUCUCUUGUAGAUCUGUUCUCUAAACGA-ACUUUAAAA------UCUGUGUGGCUGUCACUCGGCUGCAUGCUUAGUGCACUCACGCAGUAUAAUUAAUAACUAA--UUACUGUCGUUGACAGGACACGAGUAACUCGUCUAUCUUCUGCAGGCUGCUUACGGUUUCGUCCGUGUUGCAGCCGAUCAUCAGCACAUCUAGGUUUCGUCCGGGUGUGACCGAAAGGUAAGAUGGAGAGCCUUGUCCCUGGUUUCAACGAGAAAA >MT344963.1/1-299 AUUAAAGGUUUAUACCUUCCCAGGUAACAAACCAACCAACUUUCGAUCUCUUGUAGAUCUGUUCUCUAAACGA-ACUUUAAAA------UCUGUGUGGCUGUCACUCGGCUGCAUGCUUAGUGCACUCACGCAGUAUAAUUAAUAACUAA--UUACUGUCGUUGACAGGACACGAGUAACUCGUCUAUCUUCUGCAGGCUGCUUACGGUUUCKUCCGUGUUGCAGCCGAUCAUCAGCACAUCUAGGUUUUGUCCGGGUGUGACCGAAAGGUAAGAUGGAGAGCCUUGUCCCUGGUUUCAACGAGAAAA >MT019530.1/1-299 AUUAAAGGUUUAUACCUUCCCAGGUAACAAACCAACCAACUUUCGAUCUCUUGUAGAUCUGUUCUCUAAACGA-ACUUUAAAA------UCUGUGUGGCUGUCACUCGGCAGCAUGCCGAGUGCAGCCACACAGUAUAAUUAAUAACUAA--UUACUGUCGUUGACAGGACACGAGUAACUCGUCUAUCUUCUGCAGGCUGCUUACGGUUUCGUCCGUGUUGCAGCCGAUCAUCAGCACAUCUAGGUUUCGUCCGGGUGUGACCGAAAGGUAAGAUGGAGAGCCUUGUCCCUGGUUUCAACGAGAAAA >MT263421.1/1-296 AU---AGGUUUAUACCUUCCCAGGUAACAAACCHUUHAACUUUCGAUCUCUUGUAGAUCUGUUCUCUAAACGA-ACUUUAAAA------UCUGUGUGGCUGUCACUCGGCUGCAUGCUUAGUGCACUCACGCAGUAUAAUUAAUAACUAA--UUACUGUCGUUGACAGGACACGAGUAACUCGUCUAUCUUCUGCAGGCUGCUUACGGUUUCGUCCGUGUUGCAGCCGAUCAUCAGCACAUCUAGGUUUCGUCCGGGUGUGACCGAAAGGUAAGAUGGAGAGCCUUGUCCCUGGUUUCAACGAGAAAA >MT345869.1/1-293 ------GGUUUAUACCUUCCCAGGUAACAAUCCAWUCAACUUUCGAUCUCUUGUAGAUCUGUUCUCUAAACGA-ACUUUAAAA------UCUGUGUGGCUGUCACUCGGCUGCAUGCUUAGUGCACUCACGCAGUAUAAUUAAUAACUAA--UUACUGUCGUUGACAGGACACGAGUAACUCGUCUAUCUUCUGCAGGCUGCUUACGGUUUCGUCCGUGUUGCAGCCGAUCAUCAGCACAUCUAGGUUUUGUCCGGGUGUGACCGAAAGGUAAGAUGGAGAGCCUUGUCCCUGGUUUCAACGAGAAAA >MT345841.1/1-293 ------CUYYUAUACCUUCCCAGGUAACAAACCHWYCAACUUUCGAUCUCUUGUAGAUCUGUUCUCUAAACGA-ACUUUAAAA------UCUGUGUGGCUGUCACUCGGCUGCAUGCUUAGUGCACUCACGCAGUAUAAUUAAUAACUAA--UUACUGUCGUUGACAGGACACGAGUAACUCGUCUAUCUUCUGCAGGCUGCUUACGGUUUCGUCCGUGUUGCAGCCGAUCAUCAGCACAUCUAGGUUUCGUCCGGGUGUGACCGAAAGGUAAGAUGGAGAGCCUUGUCCCUGGUUUCAACGAGAAAA >MG772934.1/1-298 
AUAUUAGGUUUUUACCUUCCCAGGUAACAAACCAACUAACUCUCGAUCUCUUGUAGAUCUGUUCUCUAAACGA-ACUUUAAAA------UCUGUGUGACUGUCACUUAGCUGCAUGCUUAGUGCACUCACGCAGUUUAAUUA-UAAUUAA--UUACUGUCGUUGACAGGACACGAGUAACUCGUCUAUCUUCUGCAGGUUGCUUACGGUUUCGUCCGUGUUGCAGCCGAUCAUCAGCAUACCUUGGUUUCGUCCGGGUGUGACCGAGAGGUAAGAUGGAGAGCCUUGUCCCUGGUUUCAACGAGAAAA ================================================ FILE: inst/extdata/Rfam/RF03120_SS.txt ================================================ >RF03120 ......<<<<<<<.<<<....>>>>>..>>>>>...........<<<<<.....>>>>>.<<<<.......>>.>>..............<<<<<<<<.<<.<<<<.<<<.....>>>.>>>>>>.>>>>>>>>........................((((((((((((.(((((...(((.(((.((((<<<..<<<<<<.<<<<<......>>>>>..>>>>>>......>>><<<<<<<.<<......>>>>>>>>><<<....>>>)))).)))))).))))))))))...)))))))..... ================================================ FILE: inst/extdata/sample.fasta ================================================ >PH4H_Rattus_norvegicus MAAVVLENGVLSRKLSDFGQETSYIEDNSNQNGAISLIFSLKEEVGALAKVLRLFEENDINLTHIESRPSRLNKDEYEFF TYLDKRTKPVLGSIIKSLRNDIGATVHELSRDKEKNTVPWFPRTIQELDRFANQILSYGAELDADHPGFKDPVYRARRKQ FADIAYNYRHGQPIPRVEYTEEEKQTWGTVFRTLKALYKTHACYEHNHIFPLLEKYCGFREDNIPQLEDVSQFLQTCTGF RLRPVAGLLSSRDFLGGLAFRVFHCTQYIRHGSKPMYTPEPDICHELLGHVPLFSDRSFAQFSQEIG-LASLGAPDEYIE KLATIYWFTVEFGLCKEG-DSIKAYGAGLLSSFGELQYCLSD-KPKLLPLELEKTACQEYSVTEFQPLYYVAESFSDAKE KVRTFAATIPRPFSVRYDPYTQRVEVLDNTQQLKILADSINSEVGILCNALQKIKS >PH4H_Mus_musculus MAAVVLENGVLSRKLSDFGQETSYIEDNSNQNGAVSLIFSLKEEVGALAKVLRLFEENEINLTHIESRPSRLNKDEYEFF TYLDKRSKPVLGSIIKSLRNDIGATVHELSRDKEKNTVPWFPRTIQELDRFANQILSYGAELDADHPGFKDPVYRARRKQ FADIAYNYRHGQPIPRVEYTEEERKTWGTVFRTLKALYKTHACYEHNHIFPLLEKYCGFREDNIPQLEDVSQFLQTCTGF RLRPVAGLLSSRDFLGGLAFRVFHCTQYIRHGSKPMYTPEPDICHELLGHVPLFSDRSFAQFSQEIG-LASLGAPDEYIE KLATIYWFTVEFGLCKEG-DSIKAYGAGLLSSFGELQYCLSD-KPKLLPLELEKTACQEYTVTEFQPLYYVAESFNDAKE KVRTFAATIPRPFSVRYDPYTQRVEVLDNTQQLKILADSINSEVGILCHALQKIKS >PH4H_Homo_sapiens MSTAVLENPGLGRKLSDFGQETSYIEDNCNQNGAISLIFSLKEEVGALAKVLRLFEENDVNLTHIESRPSRLKKDEYEFF 
THLDKRSLPALTNIIKILRHDIGATVHELSRDKKKDTVPWFPRTIQELDRFANQILSYGAELDADHPGFKDPVYRARRKQ FADIAYNYRHGQPIPRVEYMEEEKKTWGTVFKTLKSLYKTHACYEYNHIFPLLEKYCGFHEDNIPQLEDVSQFLQTCTGF RLRPVAGLLSSRDFLGGLAFRVFHCTQYIRHGSKPMYTPEPDICHELLGHVPLFSDRSFAQFSQEIG-LASLGAPDEYIE KLATIYWFTVEFGLCKQG-DSIKAYGAGLLSSFGELQYCLSE-KPKLLPLELEKTAIQNYTVTEFQPLYYVAESFNDAKE KVRNFAATIPRPFSVRYDPYTQRIEVLDNTQQLKILADSINSEIGILCSALQKIK- >PH4H_Bos_taurus MSALVLESRALGRKLSDFGQETSYIEGNSDQN-AVSLIFSLKEEVGALARVLRLFEENDINLTHIESRPSRLRKDEYEFF TNLDQRSVPALANIIKILRHDIGATVHELSRDKKKDTVPWFPRTIQELDNFANQVLSYGAELDADHPGFKDPVYRARRKQ FADIAYNYRHGQPIPRVEYTEEEKKTWGTVFRTLKSLYKTHACYEHNHIFPLLEKYCGFREDNIPQLEEVSQFLQSCTGF RLRPVAGLLSSRDFLGGLAFRVFHCTQYIRHGSKPMYTPEPDICHELLGHVPLFSDRSFAQFSQEIG-LASLGAPDEYIE KLATIYWFTVEFGLCKQG-DSIKAYGAGLLSSFGELQYCLSD-KPKLLPLELEKTAVQEYTITEFQPLYYVAESFNDAKE KVRNFAATIPRPFSVHYDPYTQRIEVLDNTQQLKILADSISSEVEILCSALQKLK- >PH4H_Chromobacterium_violaceum -------------------------------------------------------------------------------- ----------------------------------------------------MNDRADFVVPD-----ITTRKNVGLSHD AN------DFTLPQPLDRYSAEDHATWATLYQRQCKLLPGRACDEFMEGL----ERLEVDADRVPDFNKLNQKLMAATGW KIVAVPGLIPDDVFFEHLANRRFPVTWWLREPHQLDYLQEPDVFHDLFGHVPLLINPVFADYLEAYGKGGVKAKALGALP MLARLYWYTVEFGLINTP-AGMRIYGAGILSSKSESIYCLDSASPNRVGFDLMRIMNTRYRIDTFQKTYFVIDSFKQLFD ATA-PDFAPLYLQLADAQPWGAGDVAPDDLVLNAGDRQGWADTEDV---------- >PH4H_Ralstonia_solanacearum -------------------------------------------------------------------------------- -----------------------------------------------MAIATPTSAAPTPAPAGFTGTLTDKLREQFAEG LDGQTLRPDFTMEQPVHRYTAADHATWRTLYDRQEALLPGRACDEFLQGL----STLGMSREGVPSFDRLNETLMRATGW QIVAVPGLVPDEVFFEHLANRRFPASWWMRRPDQLDYLQEPDGFHDIFGHVPLLINPVFADYMQAYGQGGLKAARLGALD MLARLYWYTVEFGLIRTP-AGLRIYGAGIVSSKSESVYALDSASPNRIGFDVHRIMRTRYRIDTFQKTYFVIDSFEQLFD ATR-PDFTPLYEALGTLPTFGAGDVVDGDAVLNAGTREGWADTADI---------- >PH4H_Caulobacter_crescentus -------------------------------------------------------------------------------- 
----------------------------------------------------MSG---------------DGLSNGPPPG AR-----PDWTIDQGWETYTQAEHDVWITLYERQTDMLHGRACDEFMRGL----DALDLHRSGIPDFARINEELKRLTGW TVVAVPGLVPDDVFFDHLANRRFPAGQFIRKPHELDYLQEPDIFHDVFGHVPMLTDPVFADYMQAYGEGGRRALGLGRLA NLARLYWYTVEFGLMNTP-AGLRIYGAGIVSSRTESIFALDDPSPNRIGFDLERVMRTLYRIDDFQQVYFVIDSIQTLQE VTL-RDFGAIYERLASVSDIGVAEIVPGDAVLTRGT-QAYATAGGRLAGAAAG--- >PH4H_Pseudomonas_aeruginosa -------------------------------------------------------------------------------- ----------------------------------------------------------------------MKTTQYVARQ PD----------DNGFIHYPETEHQVWNTLITRQLKVIEGRACQEYLDGI----EQLGLPHERIPQLDEINRVLQATTGW RVARVPALIPFQTFFELLASQQFPVATFIRTPEELDYLQEPDIFHEIFGHCPLLTNPWFAEFTHTYGKLGLKASKE-ERV FLARLYWMTIEFGLVETD-QGKRIYGGGILSSPKETVYSLSD-EPLHQAFNPLEAMRTPYRIDILQPLYFVLPDLKRLFQ LAQ-EDIMALVHEAMRLG-LHAPLFPPKQAA------------------------- >PH4H_Rhizobium_loti -------------------------------------------------------------------------------- ----------------------------------------------------MSVAEYAR----------DCAAQGLRGD YS--VCRADFTVAQDYD-YSDEEQAVWRTLCDRQTKLTRKLAHHSYLDGV----EKLGL-LDRIPDFEDVSTKLRKLTGW EIIAVPGLIPAAPFFDHLANRRFPVTNWLRTRQELDYIVEPDMFHDFFGHVPVLSQPVFADFMQMYGKKAGDIIALGGDE MITRLYWYTAEYGLVQEAGQPLKAFGAGLMSSFTELQFAVEGKDAHHVPFDLETVMRTGYEIDKFQRAYFVLPSFDALRD AFQTADFEAIVARRKDQKALDPATV------------------------------- ================================================ FILE: inst/extdata/seedSample.fa ================================================ >hsa-let-7a-5p MIMAT0000062 Homo sapiens let-7a-5p UGAGGUAGUAGGUUGUAUAGUU >hsa-let-7b-5p MIMAT0000063 Homo sapiens let-7b-5p UGAGGUAGUAGGUUGUGUGGUU >hsa-let-7c-5p MIMAT0000064 Homo sapiens let-7c-5p UGAGGUAGUAGGUUGUAUGGUU >hsa-let-7d-5p MIMAT0000065 Homo sapiens let-7d-5p AGAGGUAGUAGGUUGCAUAGUU >hsa-let-7e-5p MIMAT0000066 Homo sapiens let-7e-5p UGAGGUAGGAGGUUGUAUAGUU >hsa-let-7f-5p MIMAT0000067 Homo sapiens let-7f-5p UGAGGUAGUAGAUUGUAUAGUU ================================================ FILE: 
inst/extdata/sequence-link-tree.fasta ================================================ >Phy000B0HV_NEUCR M-----GIGSATLG-----------------------------------SRIPTPVLVARAVVSSSDGK-----DC--VA NPNLCEKP-VGGSQLTVPIVLGLW----------------RNMKKLAAEEAHDPHKSLDFGLDENM-----------GKA KGRNMAG------EKDGNGSRFHAHQMSMDMNLSSPYLLPPDAH-GSQSSLNSLARTL-NPQDDPFRPVTQYTASDAASV KSMP-R-----GTD-----------R-------GPGG------PFRGPPPRQGSMP-RSPEPTHA---RPGNG----PRP PRI-SVQD------------P---SSNA-TS-D--NE-TS----------------------D----------------S ERTLT-GSPRELHAATHK-------------------DGVKPPA-SPSQPISPANP-AV--------------------- >Phy000FCLK_ASPCL -------------------------------------------------------------------------------- -------------------------------------------------------------------------------M -------------REAEKGNPMHAKGMSLDIV-PSPYLLPPGLH-GSRESLHSLSRSV-IGDDDKYRHATSFL-GDNASV RSQP-R---G-YHDDAMTYSR-SQ-S------K-VS------M--R-DDMNQGLLQ------------NAQRMSR--SSP PL-YNTPPDGGSVHSPVGQD------------------------------------------R----------------- GQDSG-LQLNLPRSLSPVHI-----------------PGFNGSR-GPSPV-----P-TS-PE------GNDDKLPS---- >Phy000FJDH_ASPFL MH---YHHRHQT--HQDIHMV-VRSPP-RRPDI-VPRHRLP------YLV-PEPPTFVKRDSDPS--------QTCSAGD TSSKCEKPTSTTTTTTLPVVLGAVVPILCAV-IVLIYLHRRNVRKLRSEDANDKHRSLDFGLDLEP-TG----GGNA--M R-------Q----TEKSNGSYNHNKGISLDIG-PSPYLLPPGLH-GSRDSLHSLSRSI--GGDDKYRHATSFL-GDNASV RSQS-R---G-AQDDAPSFTG-SA-R------K-AA------L---GDDMKQGLLG------------NAQRMSR--SSP PL-YISPGEDGA-HVQVDPI------------------------------------------A----------------- QPDHG-FQFELPRSPSPVLI-----------------PGAPSTK-ESITP-----TNNV-DK------------------ >Phy000FQ5O_ASPFU MH---HHHQHLHFPRHGIHLA-VRSPP-RRPDI-VPRDRVP------LLVGTEDPTLVKRVPSTSTTSTA--STRCPEGD TSSACEKYTNSSSTTTLPIVLGAVIPIVCAI-IVLFYLHRRNVKKLRQEDANDKHKSLDFGLDLEP-RA----GSKP--M -------------TQAEKGSNMHSKGMSLDIG-QSPYLLPPGLH-GSRESLHSLSRSI-IGDDDKYRHASSFL-GDNASV RSQP-R---G-FHDETSAFSR-SQ-S------K-AS------L--RGDDMNQGLLQ------------NAQRMSR--SSP 
PL-YNAPSDGGSSHSPRGQG------------------------------------------N----------------- GQDMG-LQLNLPRSLSPVHI-----------------PGVNGSR-GTSPA-----P-GGHAD------GSEDISSS---- >Phy000G05U_EMENI MH---RHQQHQH--RHGKYLG-ARFAP-VEPAL-MPRNRPP------YLLMPEAPTLVKREPMPTTDSGR--VETCSPGD NSARCEKNTSTASNTTLPVVLGAVIPIVCAI-IVLIFLHRRNVKKLRNEDANDKHKSLDFGMDLAP-SG----GRSG--M Q-------E---------KGSHHMKGISLDIG-PSPYMLPPSIR-GSKDSLNSLPRTI-LADDDKYRHAHTYFSTDAQSI RSQR-R-----VHDDAASVAG-ST-R------R-GA------F---GDEMNQGLLG------------NAQRISR--SSP PL-YNPPEPTAGRAQ----P------------------------------------------Q----------------- VQDAG-FELSLPRSPSPVHV-----------------SGLTSIN-ESTTE-----TGRE-AN------------------ >Phy000GDP6_ASPNG M--------------------------------------------------RETPTLARREPLPSTDSSS------ASSS TASSGTKPTSTLTTTTLPVVLGAVVPIVIAI-GILLYLHRRNVKKLRNEDANDKHKSLDFGLDLAP-TN----GAVP--M Q-------Q----AEKTDRNAAHNKGISLDIG-PSPYLLPPGLH-NSRESLASMSRSIGDGDDDKYRHVGSFL-GDNSSL RSHS-R---G-PHDDAASFTG-ST-R------R-AA------L---GDDMNQGLLR------------NAQRMSR--SSP PL-YKTSSGDRNVQSPASSD------------------------------------------H----------------- EHDHG-FQLDLPRSPSPVHV-----------------PGMAISE-PHT-------TSNE-VG------FAGDHAVTETSA >Phy000HD5X_BOTFU MADHQRLANIVRLARRV----P--LAE-AAAED-IGNIASIL------KMSLPDPVLMVRSATTSAAASS---STCAADD TSAACEKP-VGPSAYTLPVILGIVIPVGGAI-ILFTILQRRYMKKAREEDLNDPTKNMDFGMGRIS-R--T------AGG ESG-I--S---N-FDDEKGGAVRTRQMSLDLGGKSPYLLPPELH-NSRESLHSLSRTI-HSNEDPYRPVHEAV---GGSI RSKQ-------GRNGSSIMTESSA-A--------PSK-----MYDAGSPDGQGLLS------------NAAAMSR--TTP PSTGSSPP------------P---KSNS--I-----P-P----------------------------------------- -ANMP-AEPKQAESPQNVARKGL--------------PGNFRPQ-DRFPTAMPVPM-PYP-------------DRESYAG >Phy000IAZP_COCIM MA------------RHTYRDP--SRLV-SRALA-IPVERSI------ILTALEPPSLVKRNPADAASSSSVPTKTCGPDD TTGVCTRPVNSTTTLTLPIVLGAVIPLTCAF-IAFFFLHRRHVKKLRLEDANDKHKSLDFGLDFVP-SG----SNNNRRG NGGNG-P------SMAEKSTRRGGHGVSMDLTLNSPYLLPPGLH-GSHESIHSLSRSL-HGEDDKYRHASAFPTGDSGSI 
RSCS-PSFKRGGDD-ASSHNSPSS-K------Y-PY----------GDDMNQHLLK------------NAQRMSR--SPP AI-ELDPIESDLGHPPHHA--------------------T----------------------A-----VSASE------S GNTTF-HGRSELTVPTAVSS-----------------HGDRSSS-SSSER-----DDSV-LR---------KS------- >Phy000KG2Q_MAGGR MVGVTVHEGEYHLGSRM----P--VMA-RDAST-PAL-QIAA------DGPGFFKRLVARQSSDD----------CVNGE PSNLCEKP-VTSQTLALPIALGVTIPLVALV-VMLIWLHRKNVRRQRQEDANDPHKSLDFGLDMGP------------GK RKS-K--L---F-GGEKLGGGPHNRQISMDMNLSSPYLLPPNMQ-NSRESIHSLAKTL--HNEDPYRHITQYNASDAGSL RSYK-A---G-GMD-------------------RPIG-----PKITVPTSRKGSLQATSPTSTIGSVPPRYEASQ---DD YV-KPPPP------------A---ALK---S-P--TQ-DS----------------------TPYPDDKSGP-------L ATVMP-SVP-EIQEPKPASLSK----E-SS---QAPS---LAAV-PPSSPLTISAP-EI--------------------- >Phy000ODBJ_SCLSC MEDHQRLANIVRLARRV----P--LAE-AAAED-IDNIASIL------KMAVPDRVIMGRSSTTTSSTSS---STCAADD TSAACEKP-IGPSAYTLPVILGIVIPVAGAI-ILFTILQRRYTKRAREEDANDPTKNMDFGMGRIS-R--T------AGG ESS-I--S---N-FDDEKGDSGRPRQMSLDLGGKSPYLLPPELQ-DSRESLHSLSKTI-HQNEDPYRPVHEAV--GAASI RSKQ-------GRNGSSILSASTV-A--------PSG-----MNDTGSPDGQGLLS------------NAAAMSR--TTP PTAGFNPP------------P---RSNS--I-----P-P----------------------------------------- -AKMP-EEPRQSP-EQNVDKKGP--------------PGNFRPQ-NGFSSTRSIPM-PFL-------------DWESYAG >Phy000PFY6_UNCRE MA------------RHAFQPA--SGLV-PRALA-IPLDRSI------LLTSLDHPSHVKRSPAATASSSAAATTSCGPND TTGICTRPVSSTTTMTLPIVLGAAIPITCAI-IAFFFLHRRHVKKLRLEDANDKHKSLDFGLDFVP-SG----SNNNKRG NGGNG-G------LMGEKSTRQRAHGVSLDLTMGNPYLLPPVSM-GSHESIHSLSKSL-HGGDDKYRHAAAFPSSENRF- --------------------------------------------------RHSVLQ------------PTNPLA---SEP RS-PLSPPGRNELTKLKQQ--------------------L---------------------------------------- ------------------------------------------------DK-----EQSV-LR---------KS------- >Phy00201Y5_COCHE M------------------------------------------------------------------------------- --------------------CATTVPVVGIA-VVLAFLHRRNKQKLREEDQRDKYKSNDWGMEGVIPK---------TSK 
KGG-P-EM---S-ISEKEISGGHDRGLSIETG--SPYILPPGLH-GSRESFHSLSRST-HDPHDPYGPVAFLR--DDQSL RSH----GPY-KGETNSVYT--A-SS-------SGT---------KKEGLQAGLLQ------------NAQRMST--SAP VR-GESLS------------P---DSTR--SPD--SK-FAEAGIPLSPLNPRYEPEAPA-----AAPAPAPA-------P AHAAP-VASKPTDVP-TI----------------------SIPE-PQVTEKQV--------------------------- >Phy00208KX_MYCGR MY---IPRA------------EDS-----------R-VQRMV------DGAAAGLRIVARSL------A-------ERAE SNSKEDTPNDRMKVQNIGIALGVIIPIGGAI-IVLTYLHRRHVKRQRVEDMNDPHKSLDFGLEGLG-SMPPQAPKKSRRG KKGPE-MIV-TDFGGPTAHPSKRGHGMSLDLGVPSPYLLPAGLQ-GSKESIHSMSRN--YDEHDPYRSVAMM--RPSGET DRF---R----GDDKGSVYSMSTG-N------R-SA------L--PQD--RASLIA------------NARPMS---ITP SK-RSDPATSHPSTPADVSP------------------------------------------R----------------- DSHSPISRTRSPLAKLSVDE-----------------TAIAEKQLEPLPS-----P-PTVPE------VALMMPPP-RKS >Phy0020GNV_PYRTR MP---HSHHLHHMRHQL----R--HDN-QLGSP-ITGSKTMH------VFERATRVLVARAESS-----------C-TND SDPGCTKP---TQVPTMAIALAVIVPIVGVS-IVLCFLHRRNKRKLAEEDSKDQYKSNDWGMEGVA-K---------TNK KKR-P-EM---S-LSEKDAGGGHDRGLSIEAG--SPYILPVGLH-GSRESFHSLSRSQ-HDPHDPYGPVAFLK--DDQST RGSSVRGGPY-RNETGSVYT--T-SS-------SGT---------RKEGLQAGLLQ------------NAQRMST--SNP VR-GDSLS------------P---VSTS--SPD--TK-FPDPGIPLSPLNPRFENQSPI-----SPPAASPS-------P S-------IKPNSVP-TI----------------------SIPE-PGVTEKQV--------------------------- >Phy0022J75_CRYPA MDEMLARRNGHLMGPRI----P--IGR-RVAAV--AE-DTSV------EASTPPSHVVGRSSSSTSDASSST-ATCSSSS ASNTCEKP--TSTSIAGEISIGIAVPMAIIFICVLIYFHRRNLKRQAAEDRDPHHRSLDFGLGDTS-S----------GK SKR-K-SM-----LGLGGEKSKHPRGLSIDMNLSSPYLLPEHVQ-GSRESMNSLAKTL-HQADDPYRPITKYM-SETGSV GSLE-K---N-GRYTPSVMTASTK-RVSRQSYANPM------SPALQQPLRQNSYP-KSPLTPSAA-------------- ----SSVT------------A---VETDIST-P--TAAKE----------------------PTVPEDGPMPPPQC---D LPPLP-VVP-EIRQPAPVAQRGAA----REPVMQEHEEELDLPD-FSNNSKRESAD-EL--------------------- >Phy0022OIS_VERA1 MAATAFNGNGYRMGSRI----H--VRT-AEPTHEDAA-L----------LRSPGPVIAARKEC---------------DP 
DHPDCEAPAVKPQTLI--IALSVVIPIVAIM-SILYYLHRRGIKKQRMEEASDPTMSLDFGINDDK-M---------GRG GKRKS-VF---R-EKMLNLDPKHRAQVSMDMNLSSPYLLPPALQ-GSKQSLHSLARNL-HDDDDPYRPVNQYG-SEVGSI RSFRPEK--E-GRAGSSVYTGSTE-R-------GSSL------HSRTHPPRQNSLP-KPPPLT-A---DPFATPTGARTP QLETSPIS------------P---TGGS--------L-PH----------------------AIIPEIGTVSYAEDFDDS NRNLP-HVP-DVTQPAPVAQRDARRVSSGASQSSWNEPAAQFPD-PAAHQVHNAAP-TL--------------------- >Phy003AMS0_602072 M--------------------------------------------------AETPTLARREPLPSTESSS-------SSS SSSSETKPTSTLTTTTLPVVLGAVIPVVIAI-AILLYLHRRNVKKLRNEDANDKHKSLDFGLDLAP-T-----GAKP--M Q-------Q----AEKLDRNAAHNKGVSLDIG-PSPYLLPPGLH-NSRESLSSLSRSIGDGDDDKYRHVGSFL-GDNASL RSHS-R---G-PQDDASSFTG-ST-R------R-GA------L---GDDMNQGLLR------------NAQRMSR--SSP PL-YTIPSGDRNVQSPASSD------------------------------------------H----------------- ERDPG-FQLDLPRSPSPVHV-----------------PGMTISE-PTNSM-----TSNE-PE------FSGVHANTENSA >Phy003BKXA_GIBZE MGLTHYH--DQ----------R--ADIGQGASS-ISQ-KMAS------SSSHIFRRLARRENC----------------K DDNSCAQS-SVSNS--------LVLPIVVAI-I--------NMKKQMLEDAHDPHKSLDFGLGDEG-G---------AKK SAR-R-SI---FMGGGEKTLAHKPSQLSMDMNLSSPYLLPPGLQ-ESRESLNSLAKSLGNDNQDPYQYVAAITQSETGSL RSFNPK---D-SHSRNTKFNSPRN---------SGKP-----GSLKMPPSRMNSLP-ETPVSATESRVDPFGTPKM--PA PA-HPAKS------------P---FDS---E-KDAFH-PA----------------------PIVPEIGVVSD------- FDEKN-AVP-SVQQPPIARSKT---------------------------------------------------------- >Phy003BOHC_AJECA -----------------------------------------------MQIPPPPPTLARRHVVPK--------------- TPPEDARD----LLVMLPLPLYPYIPLTIAI-LVLVFLHRRHIRKLRSEDANDKHKSLDFGLDVVP-SG------NKKRG RGRKG-G-MEMTTADAEKSVRRNDRGLSMDITMTSPYLLPPALN-GSHDSLHSLSRSV-HADDDRYRTATAFSAGDNSSM RSFT-SNLKP-FPDDSVSFTGMSS-R------H-AP------P---GDEMHANLLR------------NAQRMSR--ASP PP-GTATHSIGSSQSHRSPPR---KLTT--PT------PN----------------------I-----VS---------- DRSGI-HSPD---------------------------------------R-----SLAP-KSISTPGSELRKS------- >Phy003DGO9_PENCH 
MP---HAHH-------AGLVMRNH----VRRDV-IPPHRLPFLVPSTSSIATELPSLVARAE-----------AS----- TTVTGEKPTSNLTTTVLPVVLGAGVPILCAI-VVLIVLHRRHVKKLLREDAMDKHKSLDFGMDTVG-PA----TRRK--G P------------GMPPMSEPTHTKGLSLDVG---PYLMPPGLK-NSPESLRSMSI-----DDDKYRPATA-------SI RSYP-R---------GSRFEG-------------------------ADDGNSGLLQ------------NAQRMSR--SSP PL-YSSPIESHGRSLDQHND------------------------------------------Y----------------- L-----GEVPGVTHPPAAQQ-----------------PGMAIGS-PNANRIPSPEP-LP--------------HLDSSLG >Phy003PHXT_PENMQ MS--HRHGMHHHVRRHI------PEDP-VQLES-VPLEPAP--------TISEAPSVIRRTSSATST--------C---- TGSSCETTSSSNLVNTLPVVLGVVIPVVLAI-AVLLFLHRRHVRKLRQEDANDKHKSLDFGMEVVR-AG----GGK---- -------------ANPEMGEKPHKHGMSLDI-ISSPYLLPPGLH-GSKESLRSLSKVI-SPDDDKYRLGLAAQ-SDTASL RSYR-SHPRM-GQDDASSFRG-ST-R------H-GP------L---PDDMNQGLLQ------------NASRMSR--SPP VD-ATSPLSVNHTIHEEQFD------------------------------------------H----------------- PRTVG-NQSPIRQAESPPMA--------------------KSPK-NHVSP-----DHSG-QG---------DE------- >Phy003PVXT_TALSN -M--PHRHGIHHVHRRN------AENL-IKLES-LPLKPAP--------TISEPPSVVRRASSETST--------C---- SGASCEKSSSSGLVNTLPVVLGVVIPVVAAI-IVLLILHRRHVRKLRQEDANDKHKSLDFGMEVVR-AG----GGN---- ---P---------KQPEMGEKPHKHGMSLDI-IGSPYLLPPGLH-NSKESLRSLTKVI-SVEDDKYRVAAQ---SDTASL RSHR-T---M-GNDDASSFGG-ST-R------H-GP------I---PDDMNQGLLQ------------NASRMSR--SPP VD-ASSPLSVSQTIHEEPFD------------------------------------------H----------------- SNAMR-NQSQNHQAVDS-HM-----------------PPEDLPK-NHSSP-----APSG-PG---------DE------- >Phy003PZPF_FUSOX MGIAHYE--GARLR-------P--RTNIEDVSS-ASQ-NGVA------LSSSIFRRLVTRENC----------------Q DTDSCAAA-SANTNLVVPIVVAIVVPIVLIA-IFLYYLHRKNMKRQMLEDANDPHKSLDFGLDGA--G---------GKK SAR-R-SL---FMGGGEKGLNHKPSQLSMDMNLSSPYLLPPGLQ-ESRESLNSLAKSLGNDNQDPYHP------------ ----------------------RN---------SGKP-----GSMKMPPSRMNSLP-ETPVSATDSKVDPFGTPKA--PA PT-HQPNS------------H---FDE-----KDGFQ-PT----------------------AIIPEIGVVSD------- FDEKR-DGA-SVQPPPAVRSKT---------------------------------------------------------- 
>Phy003QBJJ_PENDI MS---HAHH-------AGLVMRNH----VRRDV-IPANRVPIFVPS-LSVATQLPTLVARSE------------S----- EPTSGPKATSNLATTVFPIVFGAGIPIFCAL-IILVVLHRRQVKKLVREDAMDKHKSLDFGLDTVG-PA----TRRK--G A-------K----GMPPMSEHNHTKGLSLDVG---PYLLPPGLQ-HSTDSLRSMSI-----DDGKYRPATA-------SI RSNS-R---------NSKYGG-------------------------TDDGNSGLLQ------------NAQRIPR--SSP PL-CSPIEPRARSPLNQHDD------------------------------------------Y----------------- I-----GQVPEVTHPPAVHQ-----------------PGMAIGS-PNTNRIPSPEP-LP--------------HVDSSSG >Phy0043OCA_COLGM MASASFSANGYVMGSRI----P--IRD-VNPINMTPT-PASP-------IRIASRIIGARDE------------QC--TG SATLCEKP-VDPASLTLPITLGVTIPIVGAL-FLLYYFHRRNMRRQAQEDATDPNRGLDFGLGDAP-I--D------KGG KKRKS-LM---FREKGMGIETNKQRQLSMDMNLSSPYLLPPGLQ-SSRESLNSLARTL-HNEADPYRPVYASS--DAGSI YTKT-------TSR-----------R-------GSSMTGRTTMTQNTLPPRQTSLP-RPPPAT-A---DPLGASR--SGS PSL-PPTS------------P---AIR---S-P--LV-AE----------------------PVIPQIETVP-------S GSSLP-QIP-DVPEPEPVAQRGL--------------PGNSRPS-PGHPTILEARE-PE--------------------- >Phy0043W64_36779 M-AGVAEAGSYRMSGRI----P--IVR-RNASG-VEA-LDVP-------QPDQTRPLVARESID-----------C-TGE NANLCEKP-YGANSLGVPIALGVAIPIVALL-GVVFWLHRRNIKKQRSEEANDPHKSLDFGLGDGS-R--G------SKG GKRKS-AF---FGGGGAEKASHRNNQLSMDMNLSSPYLLPPSAQAGSRESLHSLARTL-HGNEDPYSPVYQ--QSDARSM RSTK-K---G-SRDD-------YN---------GPSG-----PGLSVPPSRKSSFP-TSPTSPVTSIPPRYEASK---DE VT-PPPPA------------HSPGQAN---F-P--LN-DT----------------------SPYPNDHQLDA------H GVSMP-AVP-ELQEPAQAKMPS-------------SP---RFPL-P---------------------------------- >Phy00443NV_MAGO7 MVGVTVHEGEYHLGSRM----P--VMA-RDAST-PAL-QIAA------DGPGFFKRLVARQSSDD----------CVNGE PSNLCEKP-VTSQTLALPIALGVTIPLVALV-VMLIWLHRKNVRRQRQEDANDPHKSLDFGLDMGP------------GK RKS-K--L---F-GGEKLGGGPHNRQISMDMNLSSPYLLPPNMQ-NSRESIHSLAKTL--HNEDPYRHITQYNASDAGSL RSYK-A---G-GMD-------------------RPIG-----PKITVPTSRKGSLQATSPTSTIGSVPPRYEASQ---DD YV-KPPPP------------A---ALK---S-P--TQ-DS----------------------TPYPDDKSGP-------L 
ATVMP-SVP-EIQEPKPASLSK----E-SS---QAPS---LAAV-PPSSPLTISAP-EI--------------------A >Phy0044G80_PHANO M-------------HHL----R--RDA-QMAAS-TSATHTL--------VDRASRVLVARTT-------------C-TND SDPGCTKP---TQVPTIAIALAAIVPVVGLL-IVLVFLHRRNQKKLAAEDAKDKYKSMDFGMGGAG-K---------KNK -GG-P-EM---SITEKDIRGGAHSRGISLEGG--NPYILPVGLH-GSRESFHSLSRSQ-NDPHDPYRPVTFLR-NDNQSI RSQS-RG--Y-GHDNGSLYTTRTMSS-------GGT---------QRNRMGDGLLN------------NAQRMST--SRP MR-SESLS------------P---DSTT--SPD--VK-FPEQNIALSPLNPRFEGEPLAMPATELPHSRTPP-------S A-------SSPPNVP-II----------------------AVPA-PAAAKPEI--------------------------- ================================================ FILE: inst/extdata/tp53.fa ================================================ >Homo_sapiens ----MDDLMLSP-------DDIEQWFTED-----------------PGPDEAPRMPEAAPPVAPAPA---------APTPAAPAPAPSWPLSSSVPSQKTYQGSYGFRLGFLHSGTAKSVTCTYSPALNKMFCQLAKTCPVQLWVDSTPPPGTRVRAMAIYKQSQHMTEVVRRCPHHERCSD-SDGLAPPQHLIRVEGNLRVEYLDDRNTFRHSVVVPYEPPEVGSDCTTIHYNYMCNSSCMGGMNRRPILTIITLEDSSGNLLGRNSFEVRVCACPGRDRRTEEENLRKKGEPHHELPPGS-TKRALPNNTSSS---PQPKKKP----LDGEYFTLQIRGRERFEMFRELNEALELKDAQAGKEPGGSRAHSSHLKSKKGQSTS---RHKKLMFKTEG-PDSD >Sus_scrofa MEESQSELGVEPPLSQETFSDLWKLLPENNLLSSELSL-AAVNDLLLSP-VTNWLDENPDDASRVPAPPA----ATAPAPAAPAPATSWPLSSFVPSQKTYPGSYDFRLGFLHSGTAKSVTCTYSPALNKLFCQLAKTCPVQLWVSSPPPPGTRVRAMAIYKKSEYMTEVVRRCPHHERSSDYSDGLAPPQHLIRVEGNLRAEYLDDRNTFRHSVVVPYEPPEVGSDCTTIHYNFMCNSSCMGGMNRRPILTIITLEDASGNLLGRNSFEVRVCACPGRDRRTEEENFLKKGQSCPEPPPGS-TKRALPTSTSSS---PVQKKKP----LDGEYFTLQIRGRERFEMFRELNDALELKDAQTARESGENRAHSSHLKSKKGQSPS---RHKKPMFKREG-PDSD >Rattus_norvegicus MEDSQSDMSIELPLSQETFSCLWKLLPPDDILPTTATGSPNSMEDLFLPQDVAELLEGPEEALQVSAPAAQEPGTEAPAPVAPASATPWPLSSSVPSQKTYQGNYGFHLGFLQSGTAKSVMCTYSISLNKLFCQLAKTCPVQLWVTSTPPPGTRVRAMAIYKKSQHMTEVVRRCPHHERCSD-GDGLAPPQHLIRVEGNPYAEYLDDRQTFRHSVVVPYEPPEVGSDYTTIHYKYMCNSSCMGGMNRRPILTIITLEDSSGNLLGRDSFEVRVCACPGRDRRTEEENFRKKEEHCPELPPGS-AKRALPTSTSSS---PQQKKKP----LDGEYFTLKIRGRERFEMFRELNEALELKDARAAEESGDSRAHSSYPKTKKGQSTS---RHKKPMIKKVG-PDSD >Equus_caballus 
MEETQTELGIEPPLSQETFSDLWKLLPENNVLSPDLS--PAVNNLLLSPDVVNWLDEGPDEAPRMPA---------APAPLAPAPATSWPLSSFVPSQKTYPGCYGFRLGFLNSGTAKSVTCTYSPTLNKLFCQLAKTCPVQLLVSSPPPPGTRVRAMAIYKKSEFMTEVVRRCPHHERCSDSSDGLAPPQHLIRVEGNLRAEYLEDRNTFRHSVVVPYEPPEVGSDCTTIHYNFMCNSSCMGGMNRRPILTIITLEDSSGNLLGRNSFEVRVCACPGRDRRTEEENFRKKEEPCPEPPPRS-TKRVLSSNTSSS---PPQKKKP----LDGEYFTLQIRGRERFEMFRELNEALELKDAQTGKEPGGSKAHSSHLKSKKGQSTS---SHKKLIFKREG-PDSD >Danio_rerio MAQNDSQE----------FAELWEKNLIS-----------------IQPPGGGSCWDIINDEEYLPGSFDPN--FFENVLEEQPQPSTLPPTSTVPETSDYPGDHGFRLRFPQSGTAKSVTCTYSPDLNKLFCQLAKTCPVQMVVDVAPPQGSVVRATAIYKKSEHVAEVVRRCPHHERTPD-GDNLAPAGHLIRVEGNQRANYREDNITLRHSVFVPYEAPQLGAEWTTVLLNYMCNSSCMGGMNRRPILTIITLETQEGQLLGRRSFEVRVCACPGRDRKTEESNFKKDQETKTMAKTTTGTKRSLVKESSSATLRPEGSKKAKGSSSDEEIFTLQVRGRERYEILKKLNDSLELSDVVPASDAEKYRQKFMTKNKKENRESSEPKQGKKLMVKDEGRSDSD ================================================ FILE: man/GVariation.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/data.R \docType{data} \name{GVariation} \alias{GVariation} \title{GVariation} \format{ a folder } \source{ \url{https://link.springer.com/article/10.1007/s11540-015-9307-3} } \description{ A folder containing 4 MSA files as a sample data set for identifying sequence recombination events.
} \details{ \itemize{ \item A.Mont.fas: MSA with sequences of 'Mont' and 'CF_YL21' \item B.Oz.fas: MSA with sequences of 'Oz' and 'CF_YL21' \item C.Wilga5.fas: MSA with sequences of 'Wilga5' and 'CF_YL21' \item sample_alignment.fa: MSA with sequences of 'Mont', 'CF_YL21', 'Oz', and 'Wilga5' } } \keyword{datasets} ================================================ FILE: man/Gram-negative_AKL.fasta.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/data.R \docType{data} \name{Gram-negative_AKL.fasta} \alias{Gram-negative_AKL.fasta} \title{Gram-negative_AKL} \format{ An MSA FASTA file with 100 sequences and 36 positions. } \source{ \url{http://biovis.net/year/2013/info/redesign-contest} } \description{ Amino acids in the adenylate kinase lid (AKL) domain from Gram-negative bacteria. } \keyword{datasets} ================================================ FILE: man/Gram-positive_AKL.fasta.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/data.R \docType{data} \name{Gram-positive_AKL.fasta} \alias{Gram-positive_AKL.fasta} \title{Gram-positive_AKL} \format{ An MSA FASTA file with 100 sequences and 36 positions. } \source{ \url{http://biovis.net/year/2013/info/redesign-contest} } \description{ Amino acids in the adenylate kinase lid (AKL) domain from Gram-positive bacteria. } \keyword{datasets} ================================================ FILE: man/LeaderRepeat_All.fa.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/data.R \docType{data} \name{LeaderRepeat_All.fa} \alias{LeaderRepeat_All.fa} \title{A sample DNA sequence alignment} \format{ An MSA FASTA file } \description{ A DNA alignment with 24 sequences and 56 positions.
} \keyword{datasets} ================================================ FILE: man/Rfam.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/data.R \docType{data} \name{Rfam} \alias{Rfam} \title{Rfam} \format{ a folder } \source{ \url{https://rfam.xfam.org/} } \description{ A folder containing seed alignment sequences and corresponding consensus RNA secondary structure. } \details{ \itemize{ \item RF00458.fasta seed alignment sequences of Cripavirus internal ribosome entry site (IRES) \item RF03120.fasta seed alignment sequences of Sarbecovirus 5'UTR \item RF03120_SS.txt consensus RNA secondary structure of Sarbecovirus 5'UTR } } \keyword{datasets} ================================================ FILE: man/TP53_genes.xlsx.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/data.R \docType{data} \name{TP53_genes.xlsx} \alias{TP53_genes.xlsx} \title{genome locus} \format{ xlsx } \description{ The local genome map shows the 30000 sites around the TP53 gene. 
} \keyword{datasets} ================================================ FILE: man/adjust_ally.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/ancestor_seq.R \name{adjust_ally} \alias{adjust_ally} \title{adjust_ally} \usage{ adjust_ally(tree, node, sub = FALSE, seq_colname = "mol_seq") } \arguments{ \item{tree}{ggtree object} \item{node}{internal node in tree} \item{sub}{logical value.} \item{seq_colname}{the colname of MSA on tree$data} } \value{ tree } \description{ adjust the tree branch position after assigning ancestor node } \author{ Lang Zhou } ================================================ FILE: man/assign_dms.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/dms.R \name{assign_dms} \alias{assign_dms} \title{assign_dms} \usage{ assign_dms(x, dms) } \arguments{ \item{x}{data frame from tidy_msa()} \item{dms}{dms data frame} } \value{ tree } \description{ assign dms value to alignments. 
} \author{ Lang Zhou } ================================================ FILE: man/available_colors.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/available.R \name{available_colors} \alias{available_colors} \title{List Color Schemes currently available} \usage{ available_colors() } \value{ A character vector of available color schemes } \description{ This function lists color schemes currently available that can be used by 'ggmsa' } \examples{ available_colors() } \author{ Lang Zhou } ================================================ FILE: man/available_fonts.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/available.R \name{available_fonts} \alias{available_fonts} \title{List Font Families currently available} \usage{ available_fonts() } \value{ A character vector of available font family names } \description{ This function lists font families currently available that can be used by 'ggmsa' } \examples{ available_fonts() } \author{ Lang Zhou } ================================================ FILE: man/available_msa.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/available.R \name{available_msa} \alias{available_msa} \title{List MSA objects currently available} \usage{ available_msa() } \value{ A character vector of available objects } \description{ This function lists MSA objects currently available that can be used by 'ggmsa' } \examples{ available_msa() } \author{ Lang Zhou } ================================================ FILE: man/extract_seq.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/ancestor_seq.R \name{extract_seq} \alias{extract_seq} \title{extract_seq} \usage{ extract_seq(tree_adjust, seq_colname = "mol_seq") } \arguments{ 
\item{tree_adjust}{ggtree object} \item{seq_colname}{the colname of MSA on tree$data} } \value{ character } \description{ extract ancestor sequence from tree data } \author{ Lang Zhou } ================================================ FILE: man/facet_msa.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/facet_msa.R \name{facet_msa} \alias{facet_msa} \title{segment MSA} \usage{ facet_msa(field) } \arguments{ \item{field}{a numeric vector of the field size.} } \value{ ggplot layers } \description{ The MSA is plotted in fields of the size that you set. } \examples{ library(ggplot2) f <- system.file("extdata/sample.fasta", package="ggmsa") # 2 fields ggmsa(f, end = 120, font = NULL, color="Chemistry_AA") + facet_msa(field = 60) # 3 fields ggmsa(f, end = 120, font = NULL, color="Chemistry_AA") + facet_msa(field = 40) } \author{ Lang Zhou } ================================================ FILE: man/geom_GC.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/geom_GC.R \name{geom_GC} \alias{geom_GC} \title{geom_GC} \usage{ geom_GC(show.legend = FALSE) } \arguments{ \item{show.legend}{logical. Should this layer be included in the legends?} } \value{ a ggplot layer } \description{ Multiple sequence alignment layer for ggplot2. It plots points of GC content. } \examples{ #plot GC content f <- system.file("extdata/LeaderRepeat_All.fa", package="ggmsa") ggmsa(f, font = NULL, color="Chemistry_NT") + geom_GC() } \author{ Lang Zhou } ================================================ FILE: man/geom_helix.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/arc.R \name{geom_helix} \alias{geom_helix} \title{geom_helix} \usage{ geom_helix(helix_data, color_by = "length", overlap = FALSE, ...) } \arguments{ \item{helix_data}{a data frame.
Nucleotide secondary structure data read from a file by readSSfile().} \item{color_by}{generate colors for helices by various rules, including integer counts and value ranges; one of "length" or "value"} \item{overlap}{Logical. If TRUE, two structure data sets named known and predicted must be given (e.g., helix_data = list(known = data1, predicted = data2)); the predicted helices that are known are plotted on top, the predicted helices that are not known on the bottom, and finally the unpredicted helices on top in black.} \item{...}{additional parameter} } \value{ ggplot2 layers } \description{ The layer of helix plot } \examples{ RF03120 <- system.file("extdata/Rfam/RF03120_SS.txt", package="ggmsa") RF03120_fas <- system.file("extdata/Rfam/RF03120.fasta", package="ggmsa") SS <- readSSfile(RF03120, type = "Vienna") ggmsa(RF03120_fas, font = NULL, border = NA, color = "Chemistry_NT", seq_name = FALSE) + geom_helix(SS) } \author{ Lang Zhou } ================================================ FILE: man/geom_msa.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/geom_msa.R \name{geom_msa} \alias{geom_msa} \title{geom_msa} \usage{ geom_msa( data, font = "helvetical", mapping = NULL, color = "Chemistry_AA", custom_color = NULL, char_width = 0.9, none_bg = FALSE, by_conservation = FALSE, position_highlight = NULL, seq_name = NULL, border = NULL, consensus_views = FALSE, use_dot = FALSE, disagreement = TRUE, ignore_gaps = FALSE, ref = NULL, position = "identity", show.legend = FALSE, dms = FALSE, position_color = FALSE, ... ) } \arguments{ \item{data}{sequence alignment as a data frame, generated by tidy_msa().} \item{font}{font families, possible values are 'helvetical', 'mono', 'DroidSansMono', and 'TimesNewRoman'. Default is 'helvetical'.} \item{mapping}{aes mapping. If font = NULL, only the background tiles are plotted.} \item{color}{A Color scheme.
One of 'Clustal', 'Chemistry_AA', 'Shapely_AA', 'Zappo_AA', 'Taylor_AA', 'LETTER', 'CN6', 'Chemistry_NT', 'Shapely_NT', 'Zappo_NT', 'Taylor_NT'. Default is 'Chemistry_AA'.} \item{custom_color}{A data frame with two columns named "names" and "color", used to customize the color scheme.} \item{char_width}{a numeric vector specifying the character width, in the range of 0 to 1. Default is 0.9.} \item{none_bg}{a logical value indicating whether the background should be hidden. Default is FALSE.} \item{by_conservation}{a logical value. The most conserved regions have the brightest colors.} \item{position_highlight}{A numeric vector of the positions that need to be highlighted.} \item{seq_name}{a logical value indicating whether sequence names should be displayed. Default is 'NULL', which means the sequence names are displayed when 'font = NULL' but not when 'font' is a character string. If 'seq_name = TRUE' the sequence names are always displayed. If 'seq_name = FALSE' the sequence names are never displayed.} \item{border}{a character string. The border color.} \item{consensus_views}{a logical value indicating whether to enable consensus views.} \item{use_dot}{a logical value. Displays characters as dots instead of fading their color in the consensus view.} \item{disagreement}{a logical value. Displays characters that disagree with the consensus (excluding ambiguous disagreements).} \item{ignore_gaps}{a logical value. When TRUE, gaps in a column are treated as if that row didn't exist.} \item{ref}{a character string specifying the reference sequence, which should be one of the input sequences, when 'consensus_views' is TRUE.} \item{position}{Position adjustment, either as a string, or the result of a call to a position adjustment function; default is 'identity', meaning 'position_identity()'.} \item{show.legend}{logical.
Should this layer be included in the legends?} \item{dms}{logical.} \item{position_color}{logical.} \item{...}{additional parameter} } \value{ A list } \description{ Multiple sequence alignment layer for ggplot2. It creates background tiles with/without sequence characters. } \examples{ library(ggplot2) aln <- system.file("extdata", "sample.fasta", package = "ggmsa") tidy_aln <- tidy_msa(aln, start = 150, end = 170) ggplot() + geom_msa(data = tidy_aln, font = NULL) + coord_fixed() } \author{ Guangchuang Yu, Lang Zhou } ================================================ FILE: man/geom_msaBar.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/geom_msaBar.R \name{geom_msaBar} \alias{geom_msaBar} \title{geom_msaBar} \usage{ geom_msaBar() } \value{ A list } \description{ Multiple sequence alignment layer for ggplot2. It plots a sequence conservation bar. } \examples{ #plot multiple sequence alignment and conservation bar.
f <- system.file("extdata/sample.fasta", package="ggmsa") ggmsa(f, 221, 280, font = NULL, seq_name = TRUE) + geom_msaBar() } \author{ Lang Zhou } ================================================ FILE: man/geom_seed.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/geom_seed.R \name{geom_seed} \alias{geom_seed} \title{geom_seed} \usage{ geom_seed(seed, star = FALSE) } \arguments{ \item{seed}{a character string.Specifying the miRNA seed sequence like 'GAGGUAG'.} \item{star}{a logical value indicating whether asterisks should be displayed.} } \value{ a ggplot layer } \description{ Highlighting the seed in miRNA sequences } \examples{ miRNA_sequences <- system.file("extdata/seedSample.fa", package="ggmsa") ggmsa(miRNA_sequences, font = 'DroidSansMono', color = "Chemistry_NT", none_bg = TRUE) + geom_seed(seed = "GAGGUAG", star = FALSE) ggmsa(miRNA_sequences, font = 'DroidSansMono', color = "Chemistry_NT") + geom_seed(seed = "GAGGUAG", star = TRUE) } \author{ Lang Zhou } ================================================ FILE: man/geom_seqlogo.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/seqlogo.R \name{geom_seqlogo} \alias{geom_seqlogo} \title{geom_seqlogo} \usage{ geom_seqlogo( font = "DroidSansMono", color = "Chemistry_AA", adaptive = TRUE, top = TRUE, custom_color = NULL, show.legend = FALSE, ... ) } \arguments{ \item{font}{font families, possible values are 'helvetical', 'mono', and 'DroidSansMono', 'TimesNewRoman'. Defaults is 'DroidSansMono'.} \item{color}{A Color scheme. One of 'Clustal', 'Chemistry_AA', 'Shapely_AA', 'Zappo_AA', 'Taylor_AA', 'LETTER', 'CN6', 'Chemistry_NT', 'Shapely_NT', 'Zappo_NT', 'Taylor_NT'. 
Defaults to 'Chemistry_AA'.} \item{adaptive}{A logical value indicating whether the overall height of seqlogo corresponds to the number of sequences. If FALSE, the overall height of seqlogo is fixed at 4.} \item{top}{A logical value. If TRUE, seqlogo is aligned to the top of MSA.} \item{custom_color}{A data frame with two columns called "names" and "color". Customize the color scheme.} \item{show.legend}{logical. Should this layer be included in the legends?} \item{...}{additional parameter} } \value{ A list } \description{ Multiple sequence alignment layer for ggplot2. It plots sequence motifs. } \examples{ #plot multiple sequence alignment and sequence motifs f <- system.file("extdata/LeaderRepeat_All.fa", package="ggmsa") ggmsa(f,font = NULL,color = "Chemistry_NT") + geom_seqlogo() } \author{ Lang Zhou } ================================================ FILE: man/ggSeqBundle.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/SeqBundles.R \name{ggSeqBundle} \alias{ggSeqBundle} \title{ggSeqBundle} \usage{ ggSeqBundle( msa, line_width = 0.3, line_thickness = 0.3, line_high = 0, spline_shape = 0.3, size = 0.5, alpha = 0.2, bundle_color = c("#2ba0f5", "#424242"), lev_molecule = c("-", "A", "V", "L", "I", "P", "F", "W", "M", "G", "S", "T", "C", "Y", "N", "Q", "D", "E", "K", "R", "H") ) } \arguments{ \item{msa}{Multiple sequence alignment file (FASTA) or object representing either nucleotide sequences or peptide sequences. Also receives multiple MSA files.
eg:msa = c("Gram-negative_AKL.fasta", "Gram-positive_AKL.fasta").} \item{line_width}{The width of bundles at each site, default is 0.3.} \item{line_thickness}{The thickness of bundles at each site, default is 0.3.} \item{line_high}{The height of bundles at each site, default is 0.} \item{spline_shape}{A numeric vector of values between -1 and 1, which control the shape of the spline relative to the control points.} \item{size}{A numeric vector of values between 0 and 1, which control the size of each line.} \item{alpha}{A numeric vector of values between 0 and 1, which control the alpha of each line.} \item{bundle_color}{The colors of each sequence bundle. eg: bundle_color = c("#2ba0f5","#424242").} \item{lev_molecule}{Reassigning the Y-axis and displaying letter-coded amino acids/nucleotides arranged by physicochemical properties or others. eg: amino acids hydrophobicity lev_molecule = c("-","A", "V", "L", "I", "P", "F", "W", "M", "G", "S","T", "C", "Y", "N", "Q", "D", "E", "K","R", "H").} } \value{ ggplot object } \description{ plot Sequence Bundles for MSA based on 'ggplot2' } \examples{ aln <- system.file("extdata", "Gram-negative_AKL.fasta", package = "ggmsa") ggSeqBundle(aln) } \author{ Lang Zhou } ================================================ FILE: man/gghelix.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/arc.R \name{gghelix} \alias{gghelix} \title{gghelix} \usage{ gghelix(helix_data, color_by = "length", overlap = FALSE) } \arguments{ \item{helix_data}{a data frame. The nucleotide secondary structure file, as read by readSSfile().} \item{color_by}{generate colors for helices by various rules, including integer counts and value ranges. One of "length" and "value".} \item{overlap}{Logical.
If TRUE, two structure data sets named known and predicted must be given (eg: helix_data = list(known = data1, predicted = data2)); plots the predicted helices that are known on top, predicted helices that are not known on the bottom, and finally plots unpredicted helices on top in black.} } \value{ ggplot object } \description{ Plots nucleotide secondary structure as helices in an arc diagram } \examples{ RF03120 <- system.file("extdata/Rfam/RF03120_SS.txt", package="ggmsa") helix_data <- readSSfile(RF03120, type = "Vienna") gghelix(helix_data) } \author{ Lang Zhou } ================================================ FILE: man/ggmaf.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/ggmaf.R \name{ggmaf} \alias{ggmaf} \title{ggmaf} \usage{ ggmaf( data, ref, block_start = NULL, block_end = NULL, facet_field = NULL, heights = c(0.4, 0.6), facet_heights = NULL ) } \arguments{ \item{data}{a tidy MAF data frame. You can get it by tidy_maf_df()} \item{ref}{character, the name of reference genome. eg:"hg38.chr1_KI270707v1_random"} \item{block_start}{a numeric vector(>0). The start block to plot.} \item{block_end}{a numeric vector(< max block). The end block to plot.} \item{facet_field}{a numeric vector. The field in a facet panel.} \item{heights}{a numeric vector of length two. The plot proportion between the "Genomic location" panel (top) and the "Alignment" panel (bottom).
Default: c(0.4,0.6)} \item{facet_heights}{Numeric vector. The facet proportion.} } \value{ ggplot object } \description{ plot MAF } \author{ Lang Zhou } ================================================ FILE: man/ggmsa.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/ggmsa.R \name{ggmsa} \alias{ggmsa} \title{ggmsa} \usage{ ggmsa( msa, start = NULL, end = NULL, font = "helvetical", color = "Chemistry_AA", custom_color = NULL, char_width = 0.9, none_bg = FALSE, by_conservation = FALSE, position_highlight = NULL, seq_name = NULL, border = NULL, consensus_views = FALSE, use_dot = FALSE, disagreement = TRUE, ignore_gaps = FALSE, ref = NULL, show.legend = FALSE ) } \arguments{ \item{msa}{Multiple aligned sequence files or objects representing either nucleotide sequences or AA sequences.} \item{start}{a numeric vector. Start position to plot.} \item{end}{a numeric vector. End position to plot.} \item{font}{font families, possible values are 'helvetical', 'mono', and 'DroidSansMono', 'TimesNewRoman'. Defaults to 'helvetical'. If font = NULL, only plot the background tile.} \item{color}{a Color scheme. One of 'Clustal', 'Chemistry_AA', 'Shapely_AA', 'Zappo_AA', 'Taylor_AA', 'LETTER', 'CN6', 'Chemistry_NT', 'Shapely_NT', 'Zappo_NT', 'Taylor_NT'. Defaults to 'Chemistry_AA'.} \item{custom_color}{A data frame with two columns called "names" and "color". Customize the color scheme.} \item{char_width}{a numeric vector. Specifying the character width in the range of 0 to 1. Defaults to 0.9.} \item{none_bg}{a logical value indicating whether background should be displayed. Defaults to FALSE.} \item{by_conservation}{a logical value. The most conserved regions have the brightest colors.} \item{position_highlight}{A numeric vector of the positions that need to be highlighted.} \item{seq_name}{a logical value indicating whether sequence names should be displayed.
Defaults to 'NULL', which indicates that the sequence name is displayed when 'font = NULL', but is not displayed when 'font = char'. If 'seq_name = TRUE' the sequence name will be displayed in any case. If 'seq_name = FALSE' the sequence name will not be displayed under any circumstances.} \item{border}{a character string. The border color.} \item{consensus_views}{a logical value that enables consensus views.} \item{use_dot}{a logical value. Displays characters as dots instead of fading their color in the consensus view.} \item{disagreement}{a logical value. Displays characters that disagree with the consensus (excludes ambiguous disagreements).} \item{ignore_gaps}{a logical value. When set to TRUE, gaps in a column are treated as if that row didn't exist.} \item{ref}{a character string. Specifying the reference sequence, which should be one of the input sequences when 'consensus_views' is TRUE.} \item{show.legend}{logical. Should this layer be included in the legends?} } \value{ ggplot object } \description{ Plot multiple sequence alignment using ggplot2 with multiple color schemes supported.
} \examples{ #plot multiple sequences by loading fasta format fasta <- system.file("extdata", "sample.fasta", package = "ggmsa") ggmsa(fasta, 164, 213, color="Chemistry_AA") \dontrun{ #XMultipleAlignment objects can be used as input in the 'ggmsa' AAMultipleAlignment <- Biostrings::readAAMultipleAlignment(fasta) ggmsa(AAMultipleAlignment, 164, 213, color="Chemistry_AA") #XStringSet objects can be used as input in the 'ggmsa' AAStringSet <- Biostrings::readAAStringSet(fasta) ggmsa(AAStringSet, 164, 213, color="Chemistry_AA") #Xbin objects from 'seqmagick' can be used as input in the 'ggmsa' AAbin <- seqmagick::fa_read(fasta) ggmsa(AAbin, 164, 213, color="Chemistry_AA") } } \author{ Guangchuang Yu } ================================================ FILE: man/merge_seq.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/pp_interactive.R \name{merge_seq} \alias{merge_seq} \title{merge_seq} \usage{ merge_seq(previous_seq, gap, subsequent_seq, adjust_name = TRUE) } \arguments{ \item{previous_seq}{previous MSA} \item{gap}{gap length} \item{subsequent_seq}{subsequent MSA} \item{adjust_name}{logical value. 
merge seq name or not} } \value{ tidy MSA data frame } \description{ merge two MSA } \author{ Lang Zhou } ================================================ FILE: man/plot-methods.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/method-plot.R \docType{methods} \name{plot} \alias{plot} \alias{plot,SeqDiff,ANY-method} \title{plot method for SeqDiff object} \usage{ \S4method{plot}{SeqDiff,ANY}( x, width = 50, title = "auto", xlab = "Nucleotide Position", by = "bar", fill = "firebrick", colors = c(A = "#ff6d6d", C = "#769dcc", G = "#f2be3c", T = "#74ce98"), xlim = NULL ) } \arguments{ \item{x}{SeqDiff object} \item{width}{bin width} \item{title}{plot title} \item{xlab}{xlab} \item{by}{one of 'bar' and 'area'} \item{fill}{fill color of upper part of the plot} \item{colors}{color of lower part of the plot} \item{xlim}{limits of x-axis} } \value{ plot } \description{ plot method for SeqDiff object } \examples{ fas <- list.files(system.file("extdata", "GVariation", package="ggmsa"), pattern="fas", full.names=TRUE) x1 <- seqdiff(fas[1], reference=1) plot(x1) } \author{ guangchuang yu } ================================================ FILE: man/readSSfile.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/arc.R \name{readSSfile} \alias{readSSfile} \title{readSSfile} \usage{ readSSfile(file, type = NULL) } \arguments{ \item{file}{A text file in connect format} \item{type}{file type. 
one of "Helix", "Connect", "Vienna" and "Bpseq"} } \value{ data frame } \description{ Read secondary structure file } \examples{ RF03120 <- system.file("extdata/Rfam/RF03120_SS.txt", package="ggmsa") helix_data <- readSSfile(RF03120, type = "Vienna") } \author{ Lang Zhou } ================================================ FILE: man/read_maf.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/read_maf.R \name{read_maf} \alias{read_maf} \title{read_maf} \usage{ read_maf(multiple_alignment_format) } \arguments{ \item{multiple_alignment_format}{a multiple alignment format (MAF) file} } \value{ data frame } \description{ read 'multiple alignment format' (MAF) file } \author{ Lang Zhou } ================================================ FILE: man/reset_pos.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/pp_interactive.R \name{reset_pos} \alias{reset_pos} \title{reset_pos} \usage{ reset_pos(seq_df) } \arguments{ \item{seq_df}{MSA data} } \value{ data frame } \description{ reset MSA position } \author{ Lang Zhou } ================================================ FILE: man/sample.fasta.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/data.R \docType{data} \name{sample.fasta} \alias{sample.fasta} \title{A sample data used in ggmsa} \format{ A MSA fasta with 9 sequences and 456 positions.
} \description{ A dataset containing the alignment sequences of the phenylalanine hydroxylase protein (PH4H) within nine species } \keyword{datasets} ================================================ FILE: man/seedSample.fa.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/data.R \docType{data} \name{seedSample.fa} \alias{seedSample.fa} \title{microRNA data used in ggmsa} \format{ A MSA fasta with 6 sequences and 22 positions. } \source{ \url{https://www.mirbase.org/ftp.shtml} } \description{ Fasta format sequences of mature miRNA sequences from miRBase } \keyword{datasets} ================================================ FILE: man/seqdiff.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/seqdiff.R \name{seqdiff} \alias{seqdiff} \title{seqdiff} \usage{ seqdiff(fasta, reference = 1) } \arguments{ \item{fasta}{fasta file} \item{reference}{which sequence serve as reference, 1 or 2} } \value{ SeqDiff object } \description{ calculate difference of two aligned sequences } \examples{ fas <- list.files(system.file("extdata", "GVariation", package="ggmsa"), pattern="fas", full.names=TRUE) seqdiff(fas[1], reference=1) } \author{ guangchuang yu } ================================================ FILE: man/seqlogo.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/seqlogo.R \name{seqlogo} \alias{seqlogo} \title{seqlogo} \usage{ seqlogo( msa, start = NULL, end = NULL, font = "DroidSansMono", color = "Chemistry_AA", adaptive = FALSE, top = FALSE, custom_color = NULL ) } \arguments{ \item{msa}{Multiple sequence alignment file or object for representing either nucleotide sequences or peptide sequences.} \item{start}{Start position to plot.} \item{end}{End position to plot.} \item{font}{font families, possible values are 'helvetical', 'mono', and 
'DroidSansMono', 'TimesNewRoman'. Defaults to 'DroidSansMono'. If font=NULL, only the background tiles are drawn.} \item{color}{A Color scheme. One of 'Clustal', 'Chemistry_AA', 'Shapely_AA', 'Zappo_AA', 'Taylor_AA', 'LETTER', 'CN6', 'Chemistry_NT', 'Shapely_NT', 'Zappo_NT', 'Taylor_NT'. Defaults to 'Chemistry_AA'.} \item{adaptive}{A logical value indicating whether the overall height of seqlogo corresponds to the number of sequences. If FALSE, the overall height of seqlogo is fixed at 4.} \item{top}{A logical value. If TRUE, seqlogo is aligned to the top of MSA.} \item{custom_color}{A data frame with two columns called "names" and "color". Customize the color scheme.} } \value{ ggplot object } \description{ plot sequence logo for MSA based on 'ggplot2' } \examples{ #plot sequence motif independently nt_sequence <- system.file("extdata", "LeaderRepeat_All.fa", package = "ggmsa") seqlogo(nt_sequence, color = "Chemistry_NT") } \author{ Lang Zhou } ================================================ FILE: man/sequence-link-tree.fasta.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/data.R \docType{data} \name{sequence-link-tree.fasta} \alias{sequence-link-tree.fasta} \title{sequence-link-tree} \format{ A MSA fasta with 28 sequences and 480 positions.
} \description{ Alignment sequences used to demonstrate circular MSA layout } \keyword{datasets} ================================================ FILE: man/show-methods.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/method-show.R \docType{methods} \name{show} \alias{show} \alias{SeqDiff-class} \alias{show,SeqDiff-method} \title{show method} \usage{ show(object) } \arguments{ \item{object}{SeqDiff object} } \value{ message } \description{ show method } \examples{ fas <- list.files(system.file("extdata", "GVariation", package="ggmsa"), pattern="fas", full.names=TRUE) x1 <- seqdiff(fas[1], reference=1) x1 } ================================================ FILE: man/simplify_hdata.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/pp_interactive.R \name{simplify_hdata} \alias{simplify_hdata} \title{simplify_hdata} \usage{ simplify_hdata(hdata, sim_msa) } \arguments{ \item{hdata}{data from tidy_hdata()} \item{sim_msa}{MSA data frame} } \value{ data frame } \description{ reset hdata data position } \author{ Lang Zhou } ================================================ FILE: man/simplot.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/simplot.R \name{simplot} \alias{simplot} \title{simplot} \usage{ simplot( file, query, window = 200, step = 20, group = FALSE, id, sep, sd = FALSE, smooth = FALSE, smooth_params = list(method = "loess", se = FALSE) ) } \arguments{ \item{file}{alignment fasta file} \item{query}{query sequence} \item{window}{sliding window size (bp)} \item{step}{step size to slide the window (bp)} \item{group}{whether to group sequences. (eg.
For "A-seq1,A-seq-2,B-seq1 and B-seq2", using sep = "-" and id = 1 to divide sequences into groups A and B)} \item{id}{position to extract id for grouping; only works if group = TRUE} \item{sep}{separator to split sequence name; only works if group = TRUE} \item{sd}{whether display standard deviation of similarity among each group; only works if group=TRUE} \item{smooth}{FALSE(default)or TRUE; whether display smoothed spline.} \item{smooth_params}{a list that add params for geom_smooth, (default: smooth_params = list(method = "loess", se = FALSE))} } \value{ ggplot object } \description{ Sequence similarity plot } \examples{ fas <- system.file("extdata/GVariation/sample_alignment.fa", package="ggmsa") simplot(fas, 'CF_YL21') } \author{ guangchuang yu } ================================================ FILE: man/theme_msa.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/theme_msa.R \name{theme_msa} \alias{theme_msa} \title{theme_msa} \usage{ theme_msa() } \description{ Theme for ggmsa. 
} \author{ Lang Zhou } ================================================ FILE: man/tidy_hdata.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/pp_interactive.R \name{tidy_hdata} \alias{tidy_hdata} \title{tidy_hdata} \usage{ tidy_hdata(gap, inter, previous_seq, subsequent_seq) } \arguments{ \item{gap}{gap length} \item{inter}{protein-protein interactive position data} \item{previous_seq}{previous MSA} \item{subsequent_seq}{subsequent MSA} } \value{ helix data } \description{ tidy protein-protein interactive position data } \author{ Lang Zhou } ================================================ FILE: man/tidy_maf_df.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/ggmaf.R \name{tidy_maf_df} \alias{tidy_maf_df} \title{tidy_maf_df} \usage{ tidy_maf_df(maf_df, ref) } \arguments{ \item{maf_df}{a MAF data frame. You can get it by read_maf()} \item{ref}{character, the name of reference genome. eg:"hg38.chr1_KI270707v1_random"} } \value{ data frame } \description{ tidy MAF data frame } \author{ Lang Zhou } ================================================ FILE: man/tidy_msa.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/msa_data.R \name{tidy_msa} \alias{tidy_msa} \title{tidy_msa} \usage{ tidy_msa(msa, start = NULL, end = NULL) } \arguments{ \item{msa}{multiple sequence alignment file or sequence object in DNAStringSet, RNAStringSet, AAStringSet, BStringSet, DNAMultipleAlignment, RNAMultipleAlignment, AAMultipleAlignment, DNAbin or AAbin} \item{start}{start position to extract subset of alignment} \item{end}{end position to extract subset of alignment} } \value{ tibble data frame } \description{ Convert msa file/object to tidy data frame.
} \examples{ fasta <- system.file("extdata", "sample.fasta", package = "ggmsa") aln <- tidy_msa(msa = fasta, start = 10, end = 100) } \author{ Guangchuang Yu } ================================================ FILE: man/tp53.fa.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/data.R \docType{data} \name{tp53.fa} \alias{tp53.fa} \title{TP53 MSA} \format{ A MSA fasta with 5 sequences and 404 positions. } \description{ Alignment sequences used to show graphical combination } \keyword{datasets} ================================================ FILE: man/treeMSA_plot.Rd ================================================ % Generated by roxygen2: do not edit by hand % Please edit documentation in R/ancestor_seq.R \name{treeMSA_plot} \alias{treeMSA_plot} \title{treeMSA_plot} \usage{ treeMSA_plot( p_tree, tidymsa_df, ancestral_node = "none", sub = FALSE, panel = "MSA", font = NULL, color = "Chemistry_AA", seq_colname = NULL, ... ) } \arguments{ \item{p_tree}{tree view} \item{tidymsa_df}{tidy MSA data} \item{ancestral_node}{vector, internal node in tree. Assign an internal node to display its "ancestral sequences". ancestral_node = "none" hides all ancestral sequences; ancestral_node = "all" shows all ancestral sequences.} \item{sub}{logical value. Displaying a subset of ancestral sequences or not.} \item{panel}{panel name for plot of MSA data} \item{font}{font families, possible values are 'helvetical', 'mono', and 'DroidSansMono', 'TimesNewRoman'. Defaults to NULL, in which case only the background tile is plotted.} \item{color}{a Color scheme. One of 'Clustal', 'Chemistry_AA', 'Shapely_AA', 'Zappo_AA', 'Taylor_AA', 'LETTER', 'CN6', 'Chemistry_NT', 'Shapely_NT', 'Zappo_NT', 'Taylor_NT'.
Defaults to 'Chemistry_AA'.} \item{seq_colname}{the colname of MSA on tree$data} \item{...}{additional parameters for 'geom_msa'} } \value{ ggplot object } \description{ plot Tree-MSA plot } \details{ 'treeMSA_plot()' automatically re-arranges the MSA data according to the tree structure. } \author{ Lang Zhou } ================================================ FILE: tests/testthat/test-main.R ================================================ library(ggmsa) library(ggplot2) test_that("check whether `ggmsa` creates a `ggplot` object", { p <- ggmsa(msa = system.file("extdata", "sample.fasta", package = "ggmsa"), start = 10, end = 20, font = NULL) expect_true(is.ggplot(p)) }) ================================================ FILE: tests/testthat/test-msa_data.R ================================================ library(ggmsa) msa <- system.file("extdata", "sample.fasta", package = "ggmsa") tidymsa <- tidy_msa(msa, 10, 20) test_that("check msaData integrity when using `font`", { msaData <- msa_data(tidymsa) msaFull_names <- c("label", "x", "yy", "order", "name", "position", "character", "color", "group", "y") expect_true(is.data.frame(msaData)) expect_named(msaData, msaFull_names) }) test_that("check msaData integrity when using `font = NULL`", { msaData <- msa_data(tidymsa, font = NULL) msaFull_names <- c("name", "position", "character", "color" ) expect_true(is.data.frame(msaData)) expect_named(msaData, msaFull_names) }) ================================================ FILE: tests/testthat/test-tidy_msa.R ================================================ library(ggmsa) library(Biostrings) msa <- system.file("extdata", "sample.fasta", package = "ggmsa") tidy_names <- c("name", "position", "character") test_that("tidy FASTA format by tidy_msa", { fasta_tidy <- tidy_msa(msa, 10, 20) expect_true(is.data.frame(fasta_tidy)) expect_named(fasta_tidy, tidy_names) }) test_that("tidy Biostrings objects by tidy_msa", { AAMultipleAlignment <- readAAMultipleAlignment(msa)
expect_s4_class(AAMultipleAlignment, "AAMultipleAlignment") AAStringSet <- readAAStringSet(msa) expect_s4_class( AAStringSet, "AAStringSet") AAMultipleAlignment_tidy <- tidy_msa(AAMultipleAlignment, 10, 20) AAStringSet_tidy <- tidy_msa(AAStringSet, 10, 20) expect_true(is.data.frame(AAMultipleAlignment_tidy)) expect_named(AAMultipleAlignment_tidy, tidy_names) expect_true(is.data.frame(AAStringSet_tidy)) expect_named(AAStringSet_tidy, tidy_names) }) test_that("tidy AAbin objects by tidy_msa", { AAbin <- ape::read.FASTA(msa, "AA") expect_s3_class(AAbin, "AAbin") AAbin_tidy <- tidy_msa(AAbin, 10, 20) expect_true(is.data.frame(AAbin_tidy)) expect_named(AAbin_tidy, tidy_names) }) ================================================ FILE: tests/testthat.R ================================================ library(testthat) library(ggmsa) test_check("ggmsa") ================================================ FILE: vignettes/.gitignore ================================================ Annotations.Rmd Color_schemes_And_Font_Families.Rmd MSA_theme.Rmd Other_Modules.Rmd View_modes.Rmd ================================================ FILE: vignettes/ggmsa.Rmd ================================================ --- title: "ggmsa-Getting Started" author: "GuangChuang Yu and Lang Zhou" output: prettydoc::html_pretty: toc: false theme: cayman highlight: github pdf_document: toc: true date: "`r Sys.Date()`" bibliography: ggmsa.bib vignette: > %\VignetteIndexEntry{ggmsa} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) # Packages ------------------------------------------------------------------- library(ggmsa) library(ggplot2) library(yulab.utils) ``` # Install package ```{r eval = FALSE} if (!require("BiocManager")) install.packages("BiocManager") BiocManager::install("ggmsa") ``` # Introduction ggmsa is a package designed to plot multiple sequence alignments. 
This package implements functions that make it simple to produce publication-quality visualizations of multiple sequence alignments (protein/DNA/RNA) in R. It uses a modular design to annotate sequence alignments and can combine alignments with other data sets in a single figure. In this tutorial, we’ll work through the basics of using ggmsa. ```{r results="hide", message=FALSE, warning=FALSE} library(ggmsa) ``` ```{r echo=FALSE, out.width='50%'} knitr::include_graphics("man/figures/workflow.png") ``` # Importing MSA data We’ll start by importing some example data to use throughout this tutorial. Besides FASTA files, some R objects can also be used as input. `available_msa()` lists the MSA objects currently supported. ```{r warning=FALSE} available_msa() protein_sequences <- system.file("extdata", "sample.fasta", package = "ggmsa") miRNA_sequences <- system.file("extdata", "seedSample.fa", package = "ggmsa") nt_sequences <- system.file("extdata", "LeaderRepeat_All.fa", package = "ggmsa") ``` # Basic use: MSA Visualization The simplest way to use ggmsa: ```{r fig.height = 2, fig.width = 10, warning=FALSE} ggmsa(protein_sequences, 300, 350, color = "Clustal", font = "DroidSansMono", char_width = 0.5, seq_name = TRUE ) ``` ## Color Schemes Several predefined color schemes for rendering MSAs are shipped with the package. Similarly, use `available_colors()` to list the color schemes currently available. Note that the schemes for amino acids (protein) and nucleotides (DNA/RNA) have different names. ```{r warning=FALSE} available_colors() ``` ```{r echo=FALSE, out.width = '50%'} knitr::include_graphics("man/figures/schemes.png") ``` ## Font Several predefined fonts are shipped with ggmsa. Users can use `available_fonts()` to list the fonts currently available. ```{r warning=FALSE} available_fonts() ``` # MSA Annotation ggmsa supports annotations for MSA.
Similar to ggplot2, it implements annotations as `geom` layers, and users can add them with `+`, like this: `ggmsa() + geom_*()`. Automatically generated annotations containing colored labels and symbols are overlaid on MSAs to indicate potentially conserved or divergent regions. For example, visualizing a multiple sequence alignment with a **sequence logo** and a **bar chart**: ```{r fig.height = 2.5, fig.width = 11, warning = FALSE, message = FALSE} ggmsa(protein_sequences, 221, 280, seq_name = TRUE, char_width = 0.5) + geom_seqlogo(color = "Chemistry_AA") + geom_msaBar() ``` The following table lists the annotation layers supported by ggmsa: ```{r echo=FALSE, results='asis', warning=FALSE, message=FALSE} library(kableExtra) x <- "geom_seqlogo()\tgeometric layer\tautomatically generated sequence logos for a MSA\n geom_GC()\tannotation module\tshows GC content with bubble chart\n geom_seed()\tannotation module\thighlights seed region on miRNA sequences\n geom_msaBar()\tannotation module\tshows sequences conservation by a bar chart\n geom_helix()\tannotation module\tdepicts RNA secondary structure as arc diagrams(need extra data)\n " xx <- strsplit(x, "\n\n")[[1]] y <- strsplit(xx, "\t") %>% do.call("rbind", .)
y <- as.data.frame(y, stringsAsFactors = FALSE) colnames(y) <- c("Annotation modules", "Type", "Description") knitr::kable(y, align = "l", booktabs = TRUE, escape = TRUE) %>% kable_styling(latex_options = c("striped", "hold_position", "scale_down")) ``` # Learn more Check out the guides for learning everything there is to know about all the different features: - [Getting Started](https://yulab-smu.top/ggmsa/articles/ggmsa.html) - [Annotations](https://yulab-smu.top/ggmsa/articles/Annotations.html) - [Color Schemes and Font Families](https://yulab-smu.top/ggmsa/articles/Color_schemes_And_Font_Families.html) - [Theme](https://yulab-smu.top/ggmsa/articles/guides/MSA_theme.html) - [Other Modules](https://yulab-smu.top/ggmsa/articles/Other_Modules.html) - [View Modes](https://yulab-smu.top/ggmsa/articles/View_modes.html) # Session Info ```{r echo = FALSE} sessionInfo() ``` ================================================ FILE: vignettes/ggmsa.bib ================================================ @article{Taylor1997Residual, title={Residual colours: a proposal for aminochromography.}, author={Taylor, W R}, journal={Protein Eng}, volume={10}, number={7}, pages={743-746}, year={1997}, } @article{Waterhouse2009Jalview, title={Jalview Version 2--a multiple sequence alignment editor and analysis workbench}, author={Waterhouse, A. M. and Procter, J. B. and Martin, D. M. and Clamp, M and Barton, G. 
J.}, journal={Bioinformatics}, volume={25}, number={9}, pages={1189}, year={2009}, } @article{yu2017ggtree, title={ggtree: an R package for visualization and annotation of phylogenetic trees with their covariates and other associated data}, author={Yu, Guangchuang and Smith, David K and Zhu, Huachen and Guan, Yi and Lam, Tommy Tsanyuk}, journal={Methods in Ecology and Evolution}, volume={8}, number={1}, pages={28--36}, year={2017} } @article{Wagih2017ggseqlogo, title={ggseqlogo: a versatile R package for drawing sequence logos}, author={Wagih, Omar}, journal={Bioinformatics}, volume={33}, number={22}, year={2017}, }