Repository: snap-stanford/UCE Branch: main Commit: 8ead6e07af0c Files: 17 Total size: 374.1 KB Directory structure: gitextract_vi_txbci/ ├── LICENSE ├── README.md ├── data_proc/ │ ├── Create New Species Files.ipynb │ ├── data_utils.py │ ├── download_proc_czi_cxg.py │ ├── gene_embeddings.py │ ├── generate_reduced_chrom_files.py │ └── preproc_many_dataset.py ├── eval_data.py ├── eval_single_anndata.py ├── evaluate.py ├── examples/ │ ├── Benchmark Embeddings with scIB.ipynb │ └── Label Transfer Using Logistic Classifier.ipynb ├── model.py ├── model_files/ │ └── new_species_protein_embeddings.csv ├── requirements.txt └── utils.py ================================================ FILE CONTENTS ================================================ ================================================ FILE: LICENSE ================================================ MIT License Copyright (c) 2023 Yanay Rosen, Yusuf Roohani, Jure Leskovec Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
================================================ FILE: README.md ================================================ # Universal Cell Embeddings This repo includes a PyTorch [HuggingFace Accelerator](https://huggingface.co/docs/accelerate/package_reference/accelerator) implementation of the UCE model, to be used to embed individual anndata datasets. ## Installation ``` pip install -r requirements.txt ``` ## Embedding a new dataset To generate an embedding for a new single-cell RNA sequencing dataset in the AnnData format, use the `eval_single_anndata.py` script. ``` python eval_single_anndata.py --adata_path {path_to_anndata} --dir {output_dir} --species {species} --model_loc {model_loc} --batch_size {batch_size} ``` where - `adata_path`: an h5ad file. The `.X` slot of the file should be scRNA-seq counts. The `.var_names` slot should correspond to gene names, *not Ensembl IDs*. - `dir`: the working directory where intermediate and final output files are saved, so repeated processing of the same dataset can be skipped. - `species`: the species of the dataset you are embedding. - `model_loc`: the location of the model weights `.torch` file. - `batch_size`: the per-GPU batch size. For the 33-layer model on an 80GB GPU, you should use 25. For the 4-layer model on the same GPU, you can use 100. For a sample output on the 10k pbmc dataset, run ``` python eval_single_anndata.py ``` All necessary model files will be downloaded automatically. **Note**: This script makes use of additional files, which are described in the code documentation. These are downloaded automatically unless already present in the working directory. The script defaults to the pretrained 4-layer model. To run the pretrained 33-layer model from the paper, please download it using this [link](https://figshare.com/articles/dataset/Universal_Cell_Embedding_Model_Files/24320806?file=43423236) and set `--nlayers 33`. ## Output Final evaluated AnnData: `dir/{dataset_name}.h5ad`. 
This AnnData will be identical to the processed input anndata, but with UCE embeddings added in the `.obsm["X_uce"]` slot. Please see the documentation for information on additional output files. All outputs from `eval_single_anndata.py` are stored in the `dir` directory. ## Data You can download processed datasets used in the paper [here](https://drive.google.com/drive/folders/1f63fh0ykgEhCrkd_EVvIootBw7LYDVI7?usp=drive_link). **Note:** These datasets were embedded using the 33-layer model. Embeddings from the 33-layer model are not compatible with embeddings from the 4-layer model. ## Citing If you find our paper and code useful, please consider citing the [preprint](https://www.biorxiv.org/content/10.1101/2023.11.28.568918v1): ``` @article{rosen2023universal, title={Universal Cell Embeddings: A Foundation Model for Cell Biology}, author={Rosen, Yanay and Roohani, Yusuf and Agrawal, Ayush and Samotorcan, Leon and Consortium, Tabula Sapiens and Quake, Stephen R and Leskovec, Jure}, journal={bioRxiv}, pages={2023--11}, year={2023}, publisher={Cold Spring Harbor Laboratory} } ``` ## Analyses Please see the [reproduce repo](https://github.com/yhr91/uce_reproduce/tree/master) for analyses, figures, and datasets from the paper. 
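The embeddings in `.obsm["X_uce"]` can be consumed like any other cell-embedding matrix (neighbors, clustering, UMAP). As a minimal sketch — using a small random matrix as a stand-in for a real UCE output, and a hypothetical file name (`dir/my_dataset.h5ad`), since the concrete output path depends on your dataset — cells can be compared by cosine similarity on the embedding rows:

```python
import numpy as np

# Stand-in for adata.obsm["X_uce"]: 5 cells x 8 embedding dimensions.
# With a real output you would instead load it, e.g.:
#   import anndata
#   adata = anndata.read_h5ad("dir/my_dataset.h5ad")  # hypothetical path
#   X_uce = adata.obsm["X_uce"]
rng = np.random.default_rng(0)
X_uce = rng.normal(size=(5, 8))

# L2-normalize rows so cosine similarity reduces to a dot product.
X_norm = X_uce / np.linalg.norm(X_uce, axis=1, keepdims=True)
sim = X_norm @ X_norm.T  # (5, 5) pairwise cosine similarity

# Nearest neighbor of cell 0 in embedding space (excluding itself).
nn = int(np.argsort(sim[0])[::-1][1])
print("nearest neighbor of cell 0:", nn)
```

The same matrix can be passed to `scanpy` via `sc.pp.neighbors(adata, use_rep="X_uce")` for clustering or UMAP; remember that 4-layer and 33-layer embeddings live in different spaces and must not be mixed.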
================================================ FILE: data_proc/Create New Species Files.ipynb ================================================ { "cells": [ { "cell_type": "markdown", "id": "0e4018ee", "metadata": {}, "source": [ "# Embedding Novel Species\n", "\n", "This notebook will create the files you need to embed a novel species that wasn't included in the training data.\n", "\n", "To start, you will need to download the ESM2 protein embeddings and the reference proteome for the species.\n", "\n", "You can find precalculated ESM2 protein embeddings for many species [here](https://drive.google.com/drive/folders/1_Dz7HS5N3GoOAG6MdhsXWY1nwLoN13DJ?usp=drive_link)\n", "\n", "For reference proteomes, you can download them from [here](https://useast.ensembl.org/info/about/species.html).\n", "\n", "If there is no protein embedding for the species you are interested in, you can request to have it made via Github or email, or you can create it yourself following instructions [here](https://github.com/snap-stanford/SATURN/tree/main/protein_embeddings)." 
] }, { "cell_type": "code", "execution_count": 1, "id": "ab368d92", "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import pickle as pkl\n", "import pandas as pd" ] }, { "cell_type": "code", "execution_count": 2, "id": "c9a306f3", "metadata": {}, "outputs": [], "source": [ "SPECIES_NAME = \"chicken\" # shorthand name for this species; used in arguments and file names\n", "\n", "# Path to the species proteome\n", "SPECIES_PROTEIN_FASTA_PATH = \"../../../SATURN/protein_embeddings/data/Gallus_gallus.bGalGal1.mat.broiler.GRCg7b.pep.all.fa\"\n", "\n", "# Path to the ESM2 Embeddings\n", "SPECIES_PROTEIN_EMBEDDINGS_PATH = \"../model_files/protein_embeddings/Gallus_gallus.bGalGal1.mat.broiler.GRCg7b.pep.all.gene_symbol_to_embedding_ESM2.pt\"\n", "\n", "# primary_assembly name; this needs to match the FASTA file\n", "ASSEMBLY_NAME = \"bGalGal1.mat.broiler.GRCg7b\"\n", "# NCBI Taxonomy ID, please set this so that if someone else also embeds the same species,\n", "# randomly generated chromosome tokens will be the same\n", "TAXONOMY_ID = 9031" ] }, { "cell_type": "markdown", "id": "e5d37e52", "metadata": {}, "source": [ "You can view the FASTA format below; please confirm the primary_assembly name is correct." 
] }, { "cell_type": "code", "execution_count": 3, "id": "2ecf1464", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ ">ENSGALP00010000002.1 pep primary_assembly:bGalGal1.mat.broiler.GRCg7b:MT:2824:3798:1 gene:ENSGALG00010000007.1 transcript:ENSGALT00010000007.1 gene_biotype:protein_coding transcript_biotype:protein_coding gene_symbol:ND1 description:NADH dehydrogenase subunit 1 [Source:NCBI gene (formerly Entrezgene);Acc:63549479]\r\n", "MTLPTLTNLLIMTLSYILPILIAVAFLTLVERKILSYMQARKGPNIVGPFGLLQPVADGV\r\n", "KLFIKEPIRPSTSSPFLFIITPILALLLALTIWVPLPLPFPLADLNLGLLFLLAMSSLTV\r\n", "YSLLWSGWASNSKYALIGALRAVAQTISYEVTLAIILLSTIMLSGNYTLSTLAITQEPIY\r\n", "LIFSAWPLAMMWYISTLAETNRAPFDLTEGESELVSGFNVEYAAGPFAMFFLAEYANIML\r\n", "MNTLTTVLFLNPSFLNLPPELFPIALATKTLLLSSSFLWIRASYPRFRYDQLMHLLWKNF\r\n", "LPLTLALCLWHTSMPISYAGLPPI\r\n", ">ENSGALP00010000003.1 pep primary_assembly:bGalGal1.mat.broiler.GRCg7b:MT:4015:5053:1 gene:ENSGALG00010000011.1 transcript:ENSGALT00010000011.1 gene_biotype:protein_coding transcript_biotype:protein_coding gene_symbol:ND2 description:NADH dehydrogenase subunit 2 [Source:NCBI gene (formerly Entrezgene);Acc:63549482]\r\n", "MNPHAKLICTVSLIMGTSITISSNHWILAWTGLEINTLAIIPLISKSHHPRAIEATIKYF\r\n", "LTQSTASALILFSSMTNAWSTGQWDITQLNHPTSCLMLTMAIAIKLGLVPFHFWFPEVLQ\r\n" ] } ], "source": [ "!head {SPECIES_PROTEIN_FASTA_PATH}" ] }, { "cell_type": "code", "execution_count": 4, "id": "90540d0b", "metadata": {}, "outputs": [], "source": [ "species_to_paths = {\n", " SPECIES_NAME: SPECIES_PROTEIN_FASTA_PATH,\n", "}\n", "\n", "species_to_ids = {\n", " SPECIES_NAME: ASSEMBLY_NAME,\n", "}" ] }, { "cell_type": "code", "execution_count": 5, "id": "623b99cf", "metadata": {}, "outputs": [], "source": [ "all_pos_def = []\n", "\n", "missing_genes = {}\n", "for species in species_to_ids.keys():\n", " missing_genes[species] = []\n", " proteome_path = species_to_paths[species]\n", " species_id = species_to_ids[species]\n", "\n", " with open(proteome_path) as f:\n", " 
proteome_lines = f.readlines()\n", "\n", " gene_symbol_to_location = {}\n", " gene_symbol_to_chrom = {}\n", "\n", " for line in proteome_lines:\n", " if line.startswith(\">\"):\n", " split_line = line.split()\n", " gene_symbol = [token for token in split_line if token.startswith(\"gene_symbol\")]\n", " if len(gene_symbol) > 0:\n", " gene_symbol = gene_symbol[0].split(\":\")\n", " \n", " if len(gene_symbol) == 2:\n", " gene_symbol = gene_symbol[1]\n", " elif len(gene_symbol) > 2:\n", " gene_symbol = \":\".join(gene_symbol[1:]) # fix for annoying zebrafish gene names with colons in them\n", " else:\n", " 1/0 # something weird happening, throw an error\n", " \n", " \n", " chrom = None\n", " \n", " chrom_arr = [token for token in split_line if token.startswith(\"chromosome:\")]\n", " if len(chrom_arr) > 0:\n", " chrom = chrom_arr[0].replace(\"chromosome:\", \"\")\n", " else:\n", " chrom_arr = [token for token in split_line if token.startswith(\"primary_assembly:\")]\n", " if len(chrom_arr) > 0:\n", " chrom = chrom_arr[0].replace(\"primary_assembly:\", \"\")\n", " else:\n", " chrom_arr = [token for token in split_line if token.startswith(\"scaffold:\")] \n", " if len(chrom_arr) > 0:\n", " chrom = chrom_arr[0].replace(\"scaffold:\", \"\")\n", " if chrom is not None:\n", " gene_symbol_to_location[gene_symbol] = chrom.split(\":\")[2]\n", " gene_symbol_to_chrom[gene_symbol] = chrom.split(\":\")[1]\n", " else:\n", " missing_genes[species].append(gene_symbol)\n", " \n", "\n", " positional_df = pd.DataFrame()\n", " positional_df[\"gene_symbol\"] = [gn.upper() for gn in list(gene_symbol_to_chrom.keys())]\n", " positional_df[\"chromosome\"] = list(gene_symbol_to_chrom.values())\n", " positional_df[\"start\"] = list(gene_symbol_to_location.values())\n", " positional_df = positional_df.sort_values([\"chromosome\", \"start\"])\n", " #positional_df = positional_df.set_index(\"gene_symbol\")\n", " positional_df[\"species\"] = species\n", " all_pos_def.append(positional_df)" ] }, { 
"cell_type": "code", "execution_count": 6, "id": "b72887b3", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " gene_symbol chromosome start species\n", "2327 GCC1 1 1006145 chicken\n", "2502 NCAM2 1 100828671 chicken\n", "3084 ENS-2 1 101147482 chicken\n", "2331 DENND6B 1 1012031 chicken\n", "3973 MRPL39 1 102578362 chicken\n", "... ... ... ... ...\n", "4722 CA9 Z 9779343 chicken\n", "4738 ARHGEF39 Z 9835547 chicken\n", "3885 MRPL17 Z 9850679 chicken\n", "4172 CCBE1 Z 9852827 chicken\n", "3293 PMAIP1 Z 9998272 chicken\n", "\n", "[13271 rows x 4 columns]" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "master_pos_def = pd.concat(all_pos_def)\n", "master_pos_def" ] }, { "cell_type": "code", "execution_count": 7, "id": "6d9dac28", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "chicken 13271\n", "Name: species, dtype: int64" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "master_pos_def[\"species\"].value_counts() # double check how many genes are mapped" ] }, { "cell_type": "code", "execution_count": 8, "id": "4a3d45c2", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "chicken: 0\n" ] } ], "source": [ "for k, v in missing_genes.items():\n", " print(f\"{k}: {len(v)}\") # are any genes missing?" 
] }, { "cell_type": "code", "execution_count": 9, "id": "c59774b1", "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "*********\n", "chicken\n" ] }, { "data": { "text/plain": [ "1 1785\n", "2 1169\n", "3 1067\n", "4 953\n", "5 817\n", "Z 629\n", "6 458\n", "8 450\n", "7 442\n", "9 382\n", "10 366\n", "14 359\n", "11 327\n", "15 326\n", "13 306\n", "20 298\n", "12 293\n", "19 278\n", "18 274\n", "17 260\n", "26 237\n", "28 237\n", "27 235\n", "21 226\n", "23 214\n", "25 176\n", "34 155\n", "24 149\n", "22 142\n", "16 54\n", "30 52\n", "38 49\n", "31 14\n", "MT 13\n", "39 10\n", "JAENSK010000484.1 7\n", "35 6\n", "JAENSK010000592.1 6\n", "W 5\n", "MU179278.1 5\n", "MU179279.1 4\n", "36 3\n", "JAENSK010000483.1 3\n", "JAENSK010000585.1 3\n", "JAENSK010000593.1 2\n", "MU179258.1 2\n", "MU179272.1 2\n", "MU179273.1 2\n", "JAENSK010000584.1 2\n", "JAENSK010000656.1 1\n", "Name: chromosome, dtype: int64" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "*********\n" ] } ], "source": [ "# Count genes per chromosome\n", "for species in species_to_ids.keys():\n", " print(\"*********\")\n", " print(species)\n", " display(master_pos_def[master_pos_def[\"species\"] == species][\"chromosome\"].value_counts().head(50))\n", " print(\"*********\")" ] }, { "cell_type": "code", "execution_count": 10, "id": "541baded", "metadata": {}, "outputs": [], "source": [ "master_pos_def.to_csv(f\"{SPECIES_NAME}_to_chrom_pos.csv\", index=False) # Save the DF" ] }, { "cell_type": "code", "execution_count": 11, "id": "eabd0e31", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "chicken_to_chrom_pos.csv\n" ] } ], "source": [ "# The chromosome file path will be:\n", "print(f\"{SPECIES_NAME}_to_chrom_pos.csv\")" ] }, { "cell_type": "code", "execution_count": 12, "id": "fe1345b1", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "66" ] }, 
"execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "N_UNIQ_CHROM = len(master_pos_def[master_pos_def[\"species\"] == species][\"chromosome\"].unique())\n", "N_UNIQ_CHROM" ] }, { "cell_type": "markdown", "id": "e37e277f", "metadata": {}, "source": [ "# Generate token file" ] }, { "cell_type": "code", "execution_count": 13, "id": "d6904975", "metadata": {}, "outputs": [], "source": [ "import torch\n", "import pickle\n", "token_dim = 5120" ] }, { "cell_type": "markdown", "id": "a2798848", "metadata": {}, "source": [ "This will create the token file. Please note the offset value." ] }, { "cell_type": "code", "execution_count": 14, "id": "4355dabd", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CHROM_TOKEN_OFFSET: 13275\n", "Saved PE, offsets file\n" ] } ], "source": [ "species_to_offsets = {}\n", "\n", "all_pe = torch.load(\"../model_files/all_tokens.torch\")[0:4] # read in existing token file to make sure \n", "# that special vocab tokens are the same for different seeds\n", "\n", "offset = len(all_pe) # special tokens at the top!\n", "\n", "PE = torch.load(SPECIES_PROTEIN_EMBEDDINGS_PATH)\n", "\n", "pe_stacked = torch.stack(list(PE.values()))\n", "all_pe = torch.vstack((all_pe, pe_stacked))\n", "species_to_offsets[species] = offset\n", "\n", "print(\"CHROM_TOKEN_OFFSET:\", all_pe.shape[0])\n", "torch.manual_seed(TAXONOMY_ID)\n", "CHROM_TENSORS = torch.normal(mean=0, std=1, size=(N_UNIQ_CHROM, 5120)) \n", "# N_UNIQ_CHROM is the total number of chromosome choices, it is hardcoded for now (for species in the training data)\n", "all_pe = torch.vstack(\n", " (all_pe, CHROM_TENSORS)) # Add the chrom tensors to the end\n", "all_pe.requires_grad = False\n", "\n", "\n", "torch.save(all_pe, f\"{SPECIES_NAME}_pe_tokens.torch\")\n", "\n", "with open(f\"{SPECIES_NAME}_offsets.pkl\", \"wb+\") as f:\n", " pickle.dump(species_to_offsets, f)\n", "print(\"Saved PE, offsets file\")" ] }, { "cell_type": "code", 
"execution_count": 15, "id": "c26fe491", "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "torch.Size([13341, 5120])" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "all_pe.shape" ] }, { "cell_type": "code", "execution_count": 16, "id": "21f937ea", "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "torch.Size([13341, 5120])" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "all_pe.shape" ] }, { "cell_type": "code", "execution_count": 17, "id": "5faadace", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "chicken_offsets.pkl\n" ] } ], "source": [ "print(f\"{SPECIES_NAME}_offsets.pkl\")" ] }, { "cell_type": "code", "execution_count": 18, "id": "6ceac20b", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'../model_files/protein_embeddings/Gallus_gallus.bGalGal1.mat.broiler.GRCg7b.pep.all.gene_symbol_to_embedding_ESM2.pt'" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "SPECIES_PROTEIN_EMBEDDINGS_PATH" ] }, { "cell_type": "markdown", "id": "e4697330", "metadata": {}, "source": [ "# Example evaluation of new species" ] }, { "cell_type": "markdown", "id": "2b72667d", "metadata": {}, "source": [ "**Note: when you evaluate a new species, you need to change some arguments and modify some files:**\n", "\n", "You will need to modify the csv in `model_files/new_species_protein_embeddings.csv` to include the new protein embeddings file you downloaded.\n", "\n", "In the file add a row for the new species with the format:\n", "`species name,full path to protein embedding file`\n", "\n", "Please also add this line to the dictionary created on line 247 in the file `data_proc/data_utils.py`.\n", "\n", "When you want to embed this new species, you will need to specify these newly created files as arguments.\n", "- `CHROM_TOKEN_OFFSET`: This tells UCE when 
the rows corresponding to chromosome tokens start.\n", "- `spec_chrom_csv_path`: This is a new csv, created by this script, which maps genes to chromosomes and genomic positions\n", "- `token_file`: This is a new token file that will work just for this species. The embeddings generated will still be universal though!\n", "- `offset_pkl_path`: This is another file that maps genes to tokens\n", "\n", "\n", "```\n", "\n", "accelerate launch eval_single_anndata.py --adata_path=chicken_heart.h5ad --species=chicken --CHROM_TOKEN_OFFSET=13275 --spec_chrom_csv_path=data_proc/chicken_to_chrom_pos.csv --token_file=data_proc/chicken_pe_tokens.torch --offset_pkl_path=data_proc/chicken_offsets.pkl --dir=... --multi_gpu=True\n", "\n", "```" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.6" } }, "nbformat": 4, "nbformat_minor": 5 } ================================================ FILE: data_proc/data_utils.py ================================================ import warnings warnings.filterwarnings("ignore") import scanpy as sc import torch from torch import nn, Tensor import torch.nn.functional as F import torch.utils.data as data import torch.optim as optim import numpy as np import pickle import os import subprocess import argparse import logging import time from tqdm.auto import tqdm import pandas as pd import math import anndata from pathlib import Path from torch.utils.data import DataLoader, TensorDataset, dataset from scipy.stats import binom from typing import Dict, List, Optional, Tuple from scanpy import AnnData from data_proc.gene_embeddings import load_gene_embeddings_adata def data_to_torch_X(X): if isinstance(X, sc.AnnData): X = X.X if not isinstance(X, np.ndarray): X 
= X.toarray() return torch.from_numpy(X).float() class SincleCellDataset(data.Dataset): def __init__(self, expression: torch.tensor, # Subset to hv genes, count data! cells x genes protein_embeddings: torch.tensor, # same order as expression, also subset genes x pe labels: None, # optional, tensor of labels covar_vals: None, # tensor of covar values or none ) -> None: super(SincleCellDataset, self).__init__() # Set expression self.expression = expression row_sums = self.expression.sum(1) # UMI Counts log_norm_count_adj = torch.log1p(self.expression / (self.expression.sum(1)).unsqueeze(1) * torch.tensor(1000)) # Set log norm and count adjusted expression max_vals, max_idx = torch.max(log_norm_count_adj, dim=0) self.expression_mod = log_norm_count_adj / max_vals # Calculate dropout likelihoods of each gene self.dropout_vec = (self.expression == 0).float().mean(0) # per gene dropout percentages # Set data info self.num_cells = self.expression.shape[0] self.num_genes = self.expression.shape[1] # Set optional label info, including categorical covariate index self.covar_vals = covar_vals self.labels = labels # Set protein embeddings self.protein_embeddings = protein_embeddings self.item_mode = "expression" if self.covar_vals is not None: self.item_mode = "expression+covar" def __getitem__(self, idx): if self.item_mode == "expression": if isinstance(idx, int): if idx < self.num_cells: return self.expression[idx, :] else: raise IndexError else: raise NotImplementedError elif self.item_mode == "expression+covar": if isinstance(idx, int): if idx < self.num_cells: return self.expression[idx, :], self.covar_vals[idx] else: raise IndexError else: raise NotImplementedError def __len__(self) -> int: return self.num_cells def get_dim(self) -> Dict[str, int]: return self.num_genes def data_to_torch_X(X): if isinstance(X, sc.AnnData): X = X.X if not isinstance(X, np.ndarray): X = X.toarray() return torch.from_numpy(X).float() def anndata_to_sc_dataset(adata:sc.AnnData, 
species:str="human", labels:list=[], covar_col:str=None, hv_genes=None, embedding_model="ESM2", ) -> (SincleCellDataset, AnnData): # Subset to just genes we have embeddings for adata, protein_embeddings = load_gene_embeddings_adata( adata=adata, species=[species], embedding_model=embedding_model ) if hv_genes is not None: sc.pp.highly_variable_genes(adata, flavor='seurat_v3', n_top_genes=hv_genes) # Expects Count Data hv_index = adata.var["highly_variable"] adata = adata[:, hv_index] # Subset to hv genes only protein_embeddings = protein_embeddings[species][hv_index] else: protein_embeddings = protein_embeddings[species] expression = data_to_torch_X(adata.X) covar_vals = None if len(labels) > 0: assert covar_col is None or covar_col in labels, "Covar needs to be in labels" # make sure you keep track of covar column! labels = adata.obs.loc[:, labels].values if covar_col is not None: # we have a categorical label to use as covariate covar_vals = torch.tensor(pd.Categorical(adata.obs[covar_col]).codes) return SincleCellDataset( expression=expression, protein_embeddings=protein_embeddings, labels=labels, covar_vals=covar_vals ), adata def adata_path_to_prot_chrom_starts(adata, dataset_species, spec_pe_genes, gene_to_chrom_pos, offset): """ Given an AnnData and its species, return the protein-embedding row indices, chromosome codes, and genomic start positions for each gene. """ pe_row_idxs = torch.tensor([spec_pe_genes.index(k.upper()) + offset for k in adata.var_names]).long() print(len(np.unique(pe_row_idxs))) spec_chrom = gene_to_chrom_pos[gene_to_chrom_pos["species"] == dataset_species].set_index("gene_symbol") gene_chrom = spec_chrom.loc[[k.upper() for k in adata.var_names]] dataset_chroms = gene_chrom["spec_chrom"].cat.codes # now this is correctly indexed by species and chromosome print("Max Code:", max(dataset_chroms)) dataset_pos = gene_chrom["start"].values return pe_row_idxs, dataset_chroms, dataset_pos def process_raw_anndata(row, h5_folder_path, npz_folder_path, scp, skip, additional_filter, root): path = row.path if not os.path.isfile(root + "/" + path): print( 
"**********************************") print(f"***********{root + '/' + path} File Missing****") print( "**********************************") print(path, root) return None name = path.replace(".h5ad", "") proc_path = path.replace(".h5ad", "_proc.h5ad") if skip: if os.path.isfile(h5_folder_path + proc_path): print(f"{name} already processed. Skipping") return None, None, None print(f"Processing {name}") species = row.species covar_col = row.covar_col ad = sc.read(root + "/" + path) labels = [] if "cell_type" in ad.obs.columns: labels.append("cell_type") if pd.isna(covar_col): # covar_col may be NaN (no covariate) or a column name covar_col = None else: labels.append(covar_col) if additional_filter: sc.pp.filter_genes(ad, min_cells=10) sc.pp.filter_cells(ad, min_genes=25) dataset, adata = anndata_to_sc_dataset(ad, species=species, labels=labels, covar_col=covar_col, hv_genes=None) adata = adata.copy() if additional_filter: sc.pp.filter_genes(ad, min_cells=10) sc.pp.filter_cells(ad, min_genes=25) num_cells = adata.X.shape[0] num_genes = adata.X.shape[1] adata_path = h5_folder_path + proc_path adata.write(adata_path) arr = data_to_torch_X(adata.X).numpy() print(arr.max()) # this is a nice check to make sure it's counts filename = npz_folder_path + f"{name}_counts.npz" shape = arr.shape print(name, shape) fp = np.memmap(filename, dtype='int64', mode='w+', shape=shape) fp[:] = arr[:] fp.flush() if scp != "": subprocess.call(["scp", filename, f"{scp}:(unknown)"]) subprocess.call(["scp", adata_path, f"{scp}:{adata_path}"]) return adata, num_cells, num_genes def get_species_to_pe(EMBEDDING_DIR): """ Given an embedding directory, return all embeddings as a dictionary keyed by species. Note: In the current form, this function is written such that the directory needs all of the following species embeddings. 
""" EMBEDDING_DIR = Path(EMBEDDING_DIR) embeddings_paths = { 'human': EMBEDDING_DIR / 'Homo_sapiens.GRCh38.gene_symbol_to_embedding_ESM2.pt', 'mouse': EMBEDDING_DIR / 'Mus_musculus.GRCm39.gene_symbol_to_embedding_ESM2.pt', 'frog': EMBEDDING_DIR / 'Xenopus_tropicalis.Xenopus_tropicalis_v9.1.gene_symbol_to_embedding_ESM2.pt', 'zebrafish': EMBEDDING_DIR / 'Danio_rerio.GRCz11.gene_symbol_to_embedding_ESM2.pt', "mouse_lemur": EMBEDDING_DIR / "Microcebus_murinus.Mmur_3.0.gene_symbol_to_embedding_ESM2.pt", "pig": EMBEDDING_DIR / 'Sus_scrofa.Sscrofa11.1.gene_symbol_to_embedding_ESM2.pt', "macaca_fascicularis": EMBEDDING_DIR / 'Macaca_fascicularis.Macaca_fascicularis_6.0.gene_symbol_to_embedding_ESM2.pt', "macaca_mulatta": EMBEDDING_DIR / 'Macaca_mulatta.Mmul_10.gene_symbol_to_embedding_ESM2.pt', } extra_species = pd.read_csv("./model_files/new_species_protein_embeddings.csv").set_index("species").to_dict()["path"] embeddings_paths.update(extra_species) # adds new species species_to_pe = { species:torch.load(pe_dir) for species, pe_dir in embeddings_paths.items() } species_to_pe = {species:{k.upper(): v for k,v in pe.items()} for species, pe in species_to_pe.items()} return species_to_pe def get_spec_chrom_csv(path="/dfs/project/cross-species/yanay/code/all_to_chrom_pos.csv"): """ Get the species to chrom csv file """ gene_to_chrom_pos = pd.read_csv(path) gene_to_chrom_pos["spec_chrom"] = pd.Categorical(gene_to_chrom_pos["species"] + "_" + gene_to_chrom_pos["chromosome"]) # add the spec_chrom list return gene_to_chrom_pos ================================================ FILE: data_proc/download_proc_czi_cxg.py ================================================ import os os.environ["OMP_NUM_THREADS"] = "20" # export OMP_NUM_THREADS=4 os.environ["OPENBLAS_NUM_THREADS"] = "20" # export OPENBLAS_NUM_THREADS=4 os.environ["MKL_NUM_THREADS"] = "20" # export MKL_NUM_THREADS=6 os.environ["VECLIB_MAXIMUM_THREADS"] = "20" # export VECLIB_MAXIMUM_THREADS=4 
os.environ["NUMEXPR_NUM_THREADS"] = "20" import warnings warnings.filterwarnings('ignore') import cellxgene_census from tqdm import tqdm import scanpy as sc from collections import defaultdict from typing import Dict, List, Optional, Tuple import torch import torch.utils.data as data import torch import numpy as np import scanpy as sc from numpy import array import os import pickle as pkl import glob def data_to_torch_X(X): if isinstance(X, sc.AnnData): X = X.X if not isinstance(X, np.ndarray): X = X.toarray() return torch.from_numpy(X).float() import sys sys.path.append('../') from gene_embeddings import load_gene_embeddings_adata import pandas as pd import numpy as np from scanpy import AnnData from multiprocessing import Pool, Process, Manager import multiprocessing.pool as mpp # https://stackoverflow.com/questions/57354700/starmap-combined-with-tqdm def istarmap(self, func, iterable, chunksize=1): """starmap-version of imap """ if self._state != mpp.RUN: raise ValueError("Pool not running") if chunksize < 1: raise ValueError( "Chunksize must be 1+, not {0:n}".format( chunksize)) task_batches = mpp.Pool._get_tasks(func, iterable, chunksize) result = mpp.IMapIterator(self._cache) self._taskqueue.put( ( self._guarded_task_generation(result._job, mpp.starmapstar, task_batches), result._set_length )) return (item for chunk in result for item in chunk) mpp.Pool.istarmap = istarmap VERSION = "2023-04-25" N_TOP_GENES = 12000 print(cellxgene_census.get_census_version_description(VERSION)) census = cellxgene_census.open_soma(census_version=VERSION) census_datasets = census["census_info"]["datasets"].read().concat().to_pandas() # for convenience, indexing on the soma_joinid which links this to other census data. 
census_datasets = census_datasets.set_index("soma_joinid")

species_to_readable = {"Homo sapiens": "human", "Mus musculus": "mouse"}


def process_row(row, num_genes, num_cells, paths, all_species, covar_cols,
                dataset_title,
                h5_root="/dfs/project/uce/cxg_data/anndatas/",
                npz_root="/dfs/project/uce/cxg_data/npzs/"):
    dataset_id = row[1].dataset_id
    #dataset_title = row[1].dataset_title.lower().replace(' ', '_').replace(",", "").replace("/", "")
    save_path = h5_root + f"{dataset_title}.h5ad"
    no_primary_path = save_path.replace(".h5ad", "_no_primary.h5ad")
    proc_path = save_path.replace(".h5ad", "_proc.h5ad")
    npz_path = npz_root + f"{dataset_title}_counts.npz"

    # Download the anndata
    if os.path.exists(no_primary_path):
        print("No Primary, skipping")
        return
    if not os.path.exists(save_path) and not os.path.exists(no_primary_path):
        cellxgene_census.download_source_h5ad(dataset_id, to_path=save_path)

    if os.path.exists(proc_path) and os.path.exists(npz_path):
        print("Already Proc")
        try:
            ad = sc.read(proc_path)
        except:
            print()
            print()
            print("Error reading on:", dataset_title)
            print()
            print()
            return
        # Get organism
        if "organism" in ad.obs.columns:
            unique_organisms = list(ad.obs.organism.unique().categories)
            unique_organism_str = ", ".join(unique_organisms)
        else:
            unique_organism_str = "human"
        species = species_to_readable.get(unique_organism_str, "human")

        # don't need to do hv if already proc
        if "sample" in ad.obs.columns:
            covar_cols[dataset_title] = "sample"
        elif "batch" in ad.obs.columns:
            covar_cols[dataset_title] = "batch"
        else:
            covar_cols[dataset_title] = ""

        num_genes[dataset_title] = ad.X.shape[1]
        num_cells[dataset_title] = ad.X.shape[0]
        paths[dataset_title] = f"{dataset_title}.h5ad"
        all_species[dataset_title] = species
        return  # Skip everything else

    # Read the raw AD
    ad = sc.read(save_path)

    # Change to counts
    if not sc._utils.check_nonnegative_integers(ad.X):
        # don't have counts yet, need raw
        if ad.raw is None:
            print("Skipped, no counts")
            return
        ad.X = ad.raw.X.toarray()
        if not sc._utils.check_nonnegative_integers(ad.X):
            print("Skipped, no counts")
            return

    # SUBSET TO primary data
    if len(np.unique(ad.obs["is_primary_data"])) >= 1:
        primary_data = ad.obs.is_primary_data.value_counts()
        ad = ad[ad.obs.is_primary_data]
        if ad.X.shape[0] == 0:
            # No primary data
            print("no primary data")
            print(primary_data)
            os.rename(save_path, no_primary_path)
            return
        print("has primary data")

    # Switch to gene symbols
    ad.var["feature_id_orig"] = list(ad.var.index)
    ad.var_names = list(ad.var.feature_name)

    # Get organism
    if "organism" in ad.obs.columns:
        unique_organisms = list(ad.obs.organism.unique().categories)
        unique_organism_str = ", ".join(unique_organisms)
    else:
        unique_organism_str = "human"
    species = species_to_readable.get(unique_organism_str, "human")

    # Filter to gene symbols with protein embeddings
    ad, _ = load_gene_embeddings_adata(adata=ad, species=[species],
                                       embedding_model="ESM2")
    ad = ad.copy()

    # Simple filtering by counts
    sc.pp.filter_cells(ad, min_genes=200)
    sc.pp.filter_genes(ad, min_cells=10)
    #print(ad)

    if "sample" in ad.obs.columns:
        try:
            sc.pp.highly_variable_genes(ad, flavor="seurat_v3",
                                        n_top_genes=N_TOP_GENES,
                                        subset=True, batch_key="sample")
        except:
            try:
                sc.pp.highly_variable_genes(ad, flavor="seurat_v3",
                                            n_top_genes=N_TOP_GENES,
                                            subset=True, batch_key="sample",
                                            span=1)
            except:
                print(f"can't hv gene subset {dataset_title}")
        covar_cols[dataset_title] = "sample"
    elif "batch" in ad.obs.columns:
        try:
            sc.pp.highly_variable_genes(ad, flavor="seurat_v3",
                                        n_top_genes=N_TOP_GENES,
                                        subset=True, batch_key="batch")
        except:
            try:
                sc.pp.highly_variable_genes(ad, flavor="seurat_v3",
                                            n_top_genes=N_TOP_GENES,
                                            subset=True, batch_key="batch",
                                            span=1)
            except:
                print(f"can't hv gene subset {dataset_title}")
        covar_cols[dataset_title] = "batch"
    else:
        try:
            sc.pp.highly_variable_genes(ad, flavor="seurat_v3",
                                        n_top_genes=N_TOP_GENES, subset=True)
        except:
            try:
                sc.pp.highly_variable_genes(ad, flavor="seurat_v3",
                                            n_top_genes=N_TOP_GENES,
                                            subset=True, span=1)
            except:
                print(f"can't hv gene subset {dataset_title}")
        covar_cols[dataset_title] = ""

    num_genes[dataset_title] = ad.X.shape[1]
    num_cells[dataset_title] = ad.X.shape[0]
    paths[dataset_title] = f"{dataset_title}.h5ad"
    all_species[dataset_title] = species

    print("writing proc")
    ad.write(proc_path)
    arr = data_to_torch_X(ad.X).numpy()
    shape = arr.shape
    fp = np.memmap(npz_path, dtype='int64', mode='w+', shape=shape)
    fp[:] = arr[:]
    fp.flush()
    return


if __name__ == '__main__':
    '''
    manager = Manager()
    num_genes = manager.dict()
    num_cells = manager.dict()
    paths = manager.dict()
    all_species = manager.dict()
    covar_cols = manager.dict()
    '''
    num_genes = {}
    num_cells = {}
    paths = {}
    all_species = {}
    covar_cols = {}
    df = pd.DataFrame()

    # Shuffle the dataset
    census_datasets = census_datasets  #.iloc[270:]
    iterrows = list(census_datasets.iterrows())

    #p = Pool(8)
    #for row in tqdm(iterrows, total=len(census_datasets)):
    #    p.apply_async(process_row, args=(row, num_genes, num_cells, paths, all_species, covar_cols))
    #p.close()
    #p.join()
    '''
    with Pool(1) as p:
        nrows = len(iterrows)
        inputs = zip(iterrows, [num_genes]*nrows, [num_cells]*nrows,
                     [paths]*nrows, [all_species]*nrows, [covar_cols]*nrows)
        for _ in tqdm(p.istarmap(process_row, inputs), total=nrows):
            pass
    '''
    if os.path.exists("dataset_rows_mouse_fixed.pkl"):
        dataset_rows = {}
        for path in glob.glob("dataset_rows_mouse_fixed*.pkl"):
            with open(path, "rb") as f:
                dataset_rows_path = pkl.load(f)
            dataset_rows.update(dataset_rows_path)
        print(f"{len(dataset_rows)} already counted")
    else:
        dataset_rows = {}

    pbar = tqdm(iterrows)
    all_errors = []
    total_number_of_cells = 0
    duplicate_titles = ['Dissection: Body of hippocampus (HiB) - Rostral DG-CA4',
                        'Retina', 'Colon', 'Myeloid cells', 'Ileum', 'Airway']
    duplicate_titles_2 = ['retina', 'airway', 'myeloid_cells', 'colon',
                          'ileum', 'immune_cells']
    for row in pbar:
        dataset_title = row[1].dataset_title
        if dataset_title in duplicate_titles:
            dataset_title = row[1].collection_name + row[1].dataset_title
        dataset_title = dataset_title.lower().replace(' ', '_').replace(",", "").replace("/", "")
        if dataset_title in duplicate_titles_2:
            dataset_title = (row[1].collection_name + "_" + dataset_title).lower().replace(' ', '_').replace(",", "").replace("/", "")

        print(f"{total_number_of_cells} cells done")
        if dataset_title in dataset_rows:
            paths[dataset_title] = dataset_rows[dataset_title][0]
            all_species[dataset_title] = dataset_rows[dataset_title][1]
            covar_cols[dataset_title] = dataset_rows[dataset_title][2]
            num_cells[dataset_title] = dataset_rows[dataset_title][3]
            num_genes[dataset_title] = dataset_rows[dataset_title][4]
            #print("skipped read of proc")
            total_number_of_cells += dataset_rows[dataset_title][3]
            continue  # Skip!
        else:
            pbar.set_description(f"{dataset_title} proc")
            try:
                process_row(row, num_genes, num_cells, paths, all_species,
                            covar_cols, dataset_title=dataset_title)
            except:
                print(f"****{dataset_title} ERROR****")
                all_errors.append(dataset_title)
            pbar.set_description(f"{dataset_title} done")
            if dataset_title in paths:
                dataset_rows[dataset_title] = [paths[dataset_title],
                                               all_species[dataset_title],
                                               covar_cols[dataset_title],
                                               num_cells[dataset_title],
                                               num_genes[dataset_title],
                                               dataset_title]
                total_number_of_cells += dataset_rows[dataset_title][3]
            with open("dataset_rows_mouse_fixed.pkl", "wb") as f:
                pkl.dump(dataset_rows, f)
                print("wrote pkl")

    # path,species,covar_col,num_cells,names
    df["path"] = list(paths.values())
    df["species"] = list(all_species.values())
    df["covar_col"] = list(covar_cols.values())
    df["num_cells"] = list(num_cells.values())
    df["num_genes"] = list(num_genes.values())
    df["names"] = list(paths.keys())
    print(df.head(20))
    print()
    print("Errors:")
    print(all_errors)
    df.to_csv("cxg_datasets.csv", index=False)


================================================
FILE: data_proc/gene_embeddings.py
================================================
"""Helper functions for loading pretrained gene embeddings."""

from pathlib import Path
from typing import Dict, Tuple

import torch
from scanpy import AnnData
import numpy as np
import pandas as pd

EMBEDDING_DIR = Path('model_files/protein_embeddings')
MODEL_TO_SPECIES_TO_GENE_EMBEDDING_PATH = {
    'ESM2': {
        'human': EMBEDDING_DIR / 'Homo_sapiens.GRCh38.gene_symbol_to_embedding_ESM2.pt',
        'mouse': EMBEDDING_DIR / 'Mus_musculus.GRCm39.gene_symbol_to_embedding_ESM2.pt',
        'frog': EMBEDDING_DIR / 'Xenopus_tropicalis.Xenopus_tropicalis_v9.1.gene_symbol_to_embedding_ESM2.pt',
        'zebrafish': EMBEDDING_DIR / 'Danio_rerio.GRCz11.gene_symbol_to_embedding_ESM2.pt',
        "mouse_lemur": EMBEDDING_DIR / "Microcebus_murinus.Mmur_3.0.gene_symbol_to_embedding_ESM2.pt",
        "pig": EMBEDDING_DIR / 'Sus_scrofa.Sscrofa11.1.gene_symbol_to_embedding_ESM2.pt',
        "macaca_fascicularis": EMBEDDING_DIR / 'Macaca_fascicularis.Macaca_fascicularis_6.0.gene_symbol_to_embedding_ESM2.pt',
        "macaca_mulatta": EMBEDDING_DIR / 'Macaca_mulatta.Mmul_10.gene_symbol_to_embedding_ESM2.pt',
    }
}

extra_species = pd.read_csv("./model_files/new_species_protein_embeddings.csv").set_index("species").to_dict()["path"]
MODEL_TO_SPECIES_TO_GENE_EMBEDDING_PATH["ESM2"].update(extra_species)  # adds new species


def load_gene_embeddings_adata(adata: AnnData, species: list,
                               embedding_model: str
                               ) -> Tuple[AnnData, Dict[str, torch.FloatTensor]]:
    """Loads gene embeddings for all the species/genes in the provided data.

    :param adata: An AnnData object containing gene expression data for cells.
    :param species: Species corresponding to this adata.
    :param embedding_model: The gene embedding model whose embeddings will be loaded.
    :return: A tuple containing:
        - A subset of the data only containing the gene expression for genes
          with embeddings in all species.
        - A dictionary mapping species name to the corresponding gene embedding
          matrix (num_genes, embedding_dim).
    """
    # Get species names
    species_names = species
    species_names_set = set(species_names)

    # Get embedding paths for the model
    species_to_gene_embedding_path = MODEL_TO_SPECIES_TO_GENE_EMBEDDING_PATH[embedding_model]
    available_species = set(species_to_gene_embedding_path)

    # Ensure embeddings are available for all species
    if not (species_names_set <= available_species):
        raise ValueError(f'The following species do not have gene embeddings: '
                         f'{species_names_set - available_species}')

    # Load gene embeddings for desired species (and convert gene symbols to lower case)
    species_to_gene_symbol_to_embedding = {
        species: {
            gene_symbol.lower(): gene_embedding
            for gene_symbol, gene_embedding in torch.load(species_to_gene_embedding_path[species]).items()
        }
        for species in species_names
    }

    # Determine which genes to include based on gene expression and embedding availability
    genes_with_embeddings = set.intersection(*[
        set(gene_symbol_to_embedding)
        for gene_symbol_to_embedding in species_to_gene_symbol_to_embedding.values()
    ])
    genes_to_use = {gene for gene in adata.var_names if gene.lower() in genes_with_embeddings}

    # Subset data to only use genes with embeddings
    adata = adata[:, adata.var_names.isin(genes_to_use)]

    # Set up dictionary mapping species to gene embedding matrix (num_genes, embedding_dim)
    species_to_gene_embeddings = {
        species_name: torch.stack([
            species_to_gene_symbol_to_embedding[species_name][gene_symbol.lower()]
            for gene_symbol in adata.var_names
        ])
        for species_name in species_names
    }

    return adata, species_to_gene_embeddings


================================================
FILE: data_proc/generate_reduced_chrom_files.py
================================================
import os
os.environ["OMP_NUM_THREADS"] = "4"  # export OMP_NUM_THREADS=4
os.environ["OPENBLAS_NUM_THREADS"] = "4"  # export OPENBLAS_NUM_THREADS=4
os.environ["MKL_NUM_THREADS"] = "4"  # export MKL_NUM_THREADS=6
os.environ["VECLIB_MAXIMUM_THREADS"] = "4"  # export VECLIB_MAXIMUM_THREADS=4
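The gene filtering in `load_gene_embeddings_adata` above reduces to a case-insensitive set intersection between a dataset's `var_names` and the genes that have embeddings in every requested species. A pure-Python sketch (the toy gene lists are illustrative, not taken from the real embedding files):

```python
def genes_with_embeddings(var_names, species_to_embeddings):
    # Genes present (lower-cased) in every species' embedding dictionary.
    shared = set.intersection(
        *[set(embs) for embs in species_to_embeddings.values()]
    )
    # Keep the dataset's original capitalization, in original order.
    return [g for g in var_names if g.lower() in shared]

species_to_embeddings = {
    "human": {"sox2": [0.1], "pou5f1": [0.2], "nanog": [0.3]},
    "mouse": {"sox2": [0.4], "pou5f1": [0.5]},
}
kept = genes_with_embeddings(["SOX2", "GAPDH", "POU5F1"], species_to_embeddings)
```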
os.environ["NUMEXPR_NUM_THREADS"] = "4"

import warnings
warnings.filterwarnings("ignore")

import scanpy as sc
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import numpy as np
import pickle
import argparse
import logging
import time
from tqdm.auto import tqdm
import matplotlib.pyplot as plt
import pandas as pd
#sc._settings.ScanpyConfig.n_jobs = 6
import math
from typing import Tuple
from torch import nn, Tensor
from torch.nn import TransformerEncoder, TransformerEncoderLayer
from torch.utils.data import dataset, DataLoader, TensorDataset
from accelerate import Accelerator
import anndata
from data_utils import adata_path_to_prot_chrom_starts, get_spec_chrom_csv
from scipy.stats import binom


def padding_tensor(sequences):
    """
    :param sequences: list of tensors
    :return:
    """
    num = len(sequences)
    max_len = max([s.size(0) for s in sequences])
    out_dims = (num, max_len, 1280)
    out_tensor = sequences[0].data.new(*out_dims).fill_(0)
    out_dims2 = (num, max_len)
    mask = sequences[0].data.new(*out_dims2).fill_(float('-inf'))
    for i, tensor in enumerate(sequences):
        length = tensor.size(0)
        out_tensor[i, :length] = tensor
        mask[i, :length] = 1
    return out_tensor.permute(1, 0, 2), mask


from pathlib import Path

# ESM1b
'''
EMBEDDING_DIR = Path('/dfs/project/cross-species/data/proteome/embeddings')
human_pe_dir = EMBEDDING_DIR / 'Homo_sapiens.GRCh38.gene_symbol_to_embedding_ESM1b.pt'
mouse_pe_dir = EMBEDDING_DIR / 'Mus_musculus.GRCm39.gene_symbol_to_embedding_ESM1b.pt'
lemur_pe_dir = Path("/dfs/project/cross-species/yanay/data/proteome/embeddings/") / 'Microcebus_murinus.Mmur_3.0.gene_symbol_to_embedding_ESM1b.pt'
'''
# Upgrade to ESM2
EMBEDDING_DIR = Path('/dfs/project/cross-species/data/proteome/embeddings')
EMBEDDING_DIR = Path('/dfs/project/cross-species/yanay/data/proteome/embeddings')
embeddings_paths = {
    'human': EMBEDDING_DIR / 'Homo_sapiens.GRCh38.gene_symbol_to_embedding_ESM2.pt',
    'mouse': EMBEDDING_DIR / 'Mus_musculus.GRCm39.gene_symbol_to_embedding_ESM2.pt',
    'frog': EMBEDDING_DIR / 'Xenopus_tropicalis.Xenopus_tropicalis_v9.1.gene_symbol_to_embedding_ESM2.pt',
    'zebrafish': EMBEDDING_DIR / 'Danio_rerio.GRCz11.gene_symbol_to_embedding_ESM2.pt',
    "mouse_lemur": EMBEDDING_DIR / "Microcebus_murinus.Mmur_3.0.gene_symbol_to_embedding_ESM2.pt",
    "pig": EMBEDDING_DIR / 'Sus_scrofa.Sscrofa11.1.gene_symbol_to_embedding_ESM2.pt',
    "macaca_fascicularis": EMBEDDING_DIR / 'Macaca_fascicularis.Macaca_fascicularis_6.0.gene_symbol_to_embedding_ESM2.pt',
    "macaca_mulatta": EMBEDDING_DIR / 'Macaca_mulatta.Mmul_10.gene_symbol_to_embedding_ESM2.pt',
}

species_to_pe = {
    species: torch.load(pe_dir) for species, pe_dir in embeddings_paths.items()
}
species_to_pe = {species: {k.upper(): v for k, v in pe.items()}
                 for species, pe in species_to_pe.items()}
#species_to_keys = {species:list(pe.keys()) for species, pe in species_to_pe.items()}
#species_to_keys = {species:dict(zip(keys, np.arange(len(keys)))) for species, keys in species_to_keys.items()}

#datasets_df = pd.read_csv("/dfs/project/cross-species/yanay/code/UCE/data_proc/full_train_datasets.csv")
datasets_df = pd.read_csv("tissue_datasets.csv")
datasets_df = pd.read_csv("perturb_datasets.csv")
datasets_df = pd.read_csv("../new_perturb_datasets.csv")
#pd.concat((#pd.read_csv("new_datasets.csv"),
#           pd.read_csv("pbmcs_nohvg.csv"),
#           pd.read_csv("lung_nohvg.csv"),
#           pd.read_csv("new_tabula_datasets.csv"),
#           pd.read_csv("updated_datasets.csv"),
#           #pd.read_csv("sanger_heart_atlas_datasets.csv"),
#           pd.read_csv("tissue_datasets.csv")
#           ))
#datasets_df = pd.read_csv("cell_cycle_datasets.csv")
#datasets_df = pd.read_csv("spatial_datasets.csv")
#datasets_df = pd.read_csv("perturb_datasets.csv")
#datasets_df = pd.read_csv("ccle_datasets.csv")
#datasets_df = pd.read_csv("pancreas_datasets.csv")

sorted_dataset_names = sorted(datasets_df["names"])

with open("dataset_shapes.pkl", "rb") as f:
    shapes_dict = pickle.load(f)
shapes_dict.update({
    "madissoon_novel_lung": (190728, 8000),
    'flores_cerebellum_human': (20232, 8000),
    'osuch_gut_human': (272310, 8000),
    'msk_ovarian_human': (929690, 8000),
    'htan_vmuc_dis_epi_human': (65084, 8000),
    'htan_vmuc_val_epi_human': (57564, 8000),
    'htan_vmuc_non_epi_human': (9099, 8000),
    'hao_pbmc_3p_human': (161764, 8000),
    'hao_pbmc_5p_human': (49147, 8000),
    'gao_tumors_human': (36111, 8000),
    'swabrick_breast_human': (92427, 8000),
    'wu_cryo_tumors_human': (105662, 8000),
    'cell_line_het_human': (53513, 8000),
    'bi_allen_metastasis_human': (27787, 8000),
    'zheng68k_human': (68579, 8000),
    'zheng68k_12k_human': (68579, 12000),
    'mouse_embryo_ct': (153597, 12000),
    "regev_gtex_heart": (36574, 8000),
    "tabula_sapiens_heart": (11505, 8000),
    "10k_pbmcs": (11990, 12000),
    "epo_ido": (35834, 12000),
    'tabula_sapiens_kidney': (9641, 8000),
    'tabula_microcebus_kidney': (14592, 8000),
    'tabula_muris_kidney': (2781, 8000),
    'tabula_muris_senis_kidney': (19610, 8000),
    'immune_human': (33506, 8000),
})

for row in datasets_df.iterrows():
    ngenes = row[1].num_genes
    ncells = row[1].num_cells
    name = row[1].names
    if not np.isnan(ngenes):
        shapes_dict[name] = (int(ncells), int(ngenes))

#with open("dataset_shapes.pkl", "wb") as f:
#    pickle.dump(shapes_dict, f)

token_dim = 5120
mmap_dict = {}
root_dir = "/lfs/local/0/yanay/uce_h5s/"
root_dir_census = "/lfs/local/0/yanay/cxg_h5s/"
dataset_to_paths = {r[1]["names"]: root_dir + r[1]["path"].replace(".h5ad", "_proc.h5ad")
                    for r in datasets_df.iterrows()}
for row in datasets_df.iterrows():
    name = row[1].names
    census = row[1].census
    if census == "yes":
        dataset_to_paths[name] = dataset_to_paths[name].replace(root_dir, root_dir_census)

datasets_to_species = {r[1]["names"]: r[1]["species"] for r in datasets_df.iterrows()}
#species_to_pe = {"mouse":mouse_pe, "human":human_pe, "mouse_lemur":lemur_pe}
#dataset_to_protein_embeddings_all = {k:species_to_pe[v] for k, v in datasets_to_species.items()}
dataset_to_protein_embeddings = {}
#dataset_to_protein_embeddings_all["madissoon_novel_lung"] = species_to_pe["human"]
datasets_to_species["madissoon_novel_lung"] = "human"
#dataset_to_paths["madissoon_novel_lung"] = "/lfs/local/0/yanay/uce_h5s/madissoon_novel_lung_proc.h5ad"

# New Chrom Based Code
gene_to_chrom_pos = get_spec_chrom_csv()
species_to_chrom_categories = {}
for species in np.unique(gene_to_chrom_pos["species"]):
    species_to_chrom_categories[species] = pd.Categorical(
        gene_to_chrom_pos["chromosome"]).categories

dataset_to_chroms = {}
dataset_to_starts = {}

sorted_species_names = sorted(species_to_pe.keys())
print(sorted_species_names)

if os.path.exists(f"/dfs/project/uce/all_species_pe_tokens.torch"):
    all_pe = torch.load(f"/dfs/project/uce/all_species_pe_tokens.torch")
    with open("/dfs/project/uce/all_species_offsets.pkl", "rb") as f:
        species_to_offsets = pickle.load(f)
    print("Loaded PE", all_pe.shape)
else:
    torch.manual_seed(8)
    MASK_TENSOR = torch.zeros((1, token_dim))  # this is the padding token
    CHROM_TENSOR_LEFT = torch.normal(mean=0, std=1, size=(1, token_dim))
    CHROM_TENSOR_RIGHT = torch.normal(mean=0, std=1, size=(1, token_dim))
    CLS_TENSOR = torch.normal(mean=0, std=1, size=(1, token_dim))
    species_to_offsets = {}
    all_pe = [MASK_TENSOR, CHROM_TENSOR_LEFT, CHROM_TENSOR_RIGHT, CLS_TENSOR]
    offset = len(all_pe)  # special tokens at the top!
    for species in sorted_species_names:
        pe_stacked = torch.stack(list(species_to_pe[species].values()))
        all_pe.append(pe_stacked)
        species_to_offsets[species] = offset
        offset += pe_stacked.shape[0]
    all_pe = torch.vstack(all_pe)
    print(all_pe.shape)
    torch.save(all_pe, f"/dfs/project/uce/all_species_pe_tokens.torch")
    with open("/dfs/project/uce/all_species_offsets.pkl", "wb+") as f:
        pickle.dump(species_to_offsets, f)
    print("Saved PE")

# Load in already saved!
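The token-building branch above prepends four special tokens (pad, chrom-left, chrom-right, CLS) and records, per species, the row offset at which that species' protein embeddings begin in the stacked matrix. A NumPy sketch of the same bookkeeping (toy dimensions; the real pipeline uses `token_dim = 5120`):

```python
import numpy as np

token_dim = 8  # toy size; the real pipeline uses 5120
rng = np.random.default_rng(8)
special = np.zeros((4, token_dim))  # pad, chrom-left, chrom-right, CLS rows
species_to_pe = {
    "human": rng.normal(size=(3, token_dim)),  # 3 toy genes
    "mouse": rng.normal(size=(2, token_dim)),  # 2 toy genes
}

blocks, offsets, offset = [special], {}, 4  # special tokens sit at the top
for species in sorted(species_to_pe):
    offsets[species] = offset  # first row of this species' embeddings
    blocks.append(species_to_pe[species])
    offset += species_to_pe[species].shape[0]
all_pe = np.vstack(blocks)
```

A dataset's gene indices are then shifted by its species' offset to index into `all_pe`.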
if os.path.exists(f"/lfs/local/0/yanay/reduced_datasets_to_pe_chrom_{token_dim}_new.torch"):
    dataset_to_protein_embeddings = torch.load(
        f"/lfs/local/0/yanay/reduced_datasets_to_pe_chrom_{token_dim}_new.torch")
    with open("/lfs/local/0/yanay/dataset_to_chroms_new.pkl", "rb") as f:
        dataset_to_chroms = pickle.load(f)
    with open("/lfs/local/0/yanay/dataset_to_starts_new.pkl", "rb") as f:
        dataset_to_starts = pickle.load(f)
else:
    dataset_to_protein_embeddings = {}
    dataset_to_chroms = {}
    dataset_to_starts = {}

# Add the new ones
print("creating reduced size protein embeddings file")
redo = True
for dataset, path in tqdm(list(dataset_to_paths.items())):
    if dataset in dataset_to_protein_embeddings.keys() and not redo:
        continue  # skip since already procced
    print(dataset)
    adata = sc.read(path)
    dataset_species = datasets_to_species[dataset]
    spec_pe_genes = list(species_to_pe[dataset_species].keys())
    offset = species_to_offsets[dataset_species]
    # Get proper idxs
    pe_row_idxs, dataset_chroms, dataset_pos = adata_path_to_prot_chrom_starts(
        adata, dataset_species, spec_pe_genes, gene_to_chrom_pos, offset)
    # Add to dicts
    dataset_to_chroms[dataset] = dataset_chroms
    dataset_to_starts[dataset] = dataset_pos
    dataset_to_protein_embeddings[dataset] = pe_row_idxs
    del adata

# save Dicts and idxs
torch.save(dataset_to_protein_embeddings,
           f"/lfs/local/0/yanay/reduced_datasets_to_pe_chrom_{token_dim}_new.torch")
with open("/lfs/local/0/yanay/dataset_to_chroms_new.pkl", "wb+") as f:
    pickle.dump(dataset_to_chroms, f)
with open("/lfs/local/0/yanay/dataset_to_starts_new.pkl", "wb+") as f:
    pickle.dump(dataset_to_starts, f)


================================================
FILE: data_proc/preproc_many_dataset.py
================================================
import os
os.environ["OMP_NUM_THREADS"] = "10"  # export OMP_NUM_THREADS=4
os.environ["OPENBLAS_NUM_THREADS"] = "10"  # export OPENBLAS_NUM_THREADS=4
os.environ["MKL_NUM_THREADS"] = "10"  # export MKL_NUM_THREADS=6
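Throughout the pipeline, counts are exchanged as raw `np.memmap` files; despite the `.npz` extension used for these paths, they are flat binary buffers, not zipped NumPy archives, so readers must know the dtype and shape (hence the shapes dictionaries). A minimal round-trip sketch using a temporary file:

```python
import numpy as np
import os
import tempfile

shape = (3, 4)  # toy (n_cells, n_genes); real runs read this from a shapes dict
path = os.path.join(tempfile.mkdtemp(), "toy_counts.npz")  # raw memmap despite the .npz name
arr = np.arange(12, dtype="int64").reshape(shape)

# Write: create the file, copy the data in, and flush to disk
fp = np.memmap(path, dtype="int64", mode="w+", shape=shape)
fp[:] = arr[:]
fp.flush()

# Read back one cell's counts without loading the whole matrix
cts = np.memmap(path, dtype="int64", mode="r", shape=shape)
row = np.asarray(cts[1])
```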
os.environ["VECLIB_MAXIMUM_THREADS"] = "10"  # export VECLIB_MAXIMUM_THREADS=4
os.environ["NUMEXPR_NUM_THREADS"] = "10"

import argparse  # needed by the CLI at the bottom of this file
from collections import defaultdict
from typing import Dict, List, Optional, Tuple
import torch
import torch.utils.data as data
import numpy as np
import scanpy as sc
from numpy import array
import subprocess
from tqdm import tqdm
import warnings
warnings.filterwarnings("ignore")
from gene_embeddings import load_gene_embeddings_adata
import pandas as pd
from scanpy import AnnData
from data_utils import process_raw_anndata


def data_to_torch_X(X):
    if isinstance(X, sc.AnnData):
        X = X.X
    if not isinstance(X, np.ndarray):
        X = X.toarray()
    return torch.from_numpy(X).float()


class SincleCellDataset(data.Dataset):  # (sic) name kept for compatibility
    def __init__(self,
                 expression: torch.tensor,  # subset to hv genes, count data! cells x genes
                 protein_embeddings: torch.tensor,  # same order as expression, also subset: genes x pe
                 labels: None,  # optional, tensor of labels
                 covar_vals: None,  # tensor of covar values or none
                 ) -> None:
        super(SincleCellDataset, self).__init__()
        # Set expression
        self.expression = expression
        row_sums = self.expression.sum(1)  # UMI Counts
        log_norm_count_adj = torch.log1p(
            self.expression / (self.expression.sum(1)).unsqueeze(1) * torch.tensor(1000))
        # Set log norm and count adjusted expression
        max_vals, max_idx = torch.max(log_norm_count_adj, dim=0)
        self.expression_mod = log_norm_count_adj / max_vals
        # Calculate dropout likelihoods of each gene
        self.dropout_vec = (self.expression == 0).float().mean(0)  # per gene dropout percentages
        # Set data info
        self.num_cells = self.expression.shape[0]
        self.num_genes = self.expression.shape[1]
        # Set optional label info, including categorical covariate index
        self.covar_vals = covar_vals
        self.labels = labels
        # Set protein embeddings
        self.protein_embeddings = protein_embeddings

        self.item_mode = "expression"
        if self.covar_vals is not None:
            self.item_mode = "expression+covar"

    def __getitem__(self, idx):
        if self.item_mode == "expression":
            if isinstance(idx, int):
                if idx < self.num_cells:
                    return self.expression[idx, :]
                else:
                    raise IndexError
            else:
                raise NotImplementedError
        elif self.item_mode == "expression+covar":
            if isinstance(idx, int):
                if idx < self.num_cells:
                    return self.expression[idx, :], self.covar_vals[idx]
                else:
                    raise IndexError
            else:
                raise NotImplementedError

    def __len__(self) -> int:
        return self.num_cells

    def get_dim(self) -> Dict[str, int]:
        return self.num_genes


def anndata_to_sc_dataset(adata: sc.AnnData,
                          species: str = "human",
                          labels: list = [],
                          covar_col: str = None,
                          hv_genes: int = 12000,
                          embedding_model="ESM1b",
                          DO_HVG: bool = False,  # was an undefined module global; made an explicit parameter
                          ) -> (SincleCellDataset, AnnData):
    # Subset to just genes we have embeddings for
    adata, protein_embeddings = load_gene_embeddings_adata(
        adata=adata,
        species=[species],
        embedding_model=embedding_model
    )

    if DO_HVG:
        sc.pp.highly_variable_genes(adata, flavor='seurat_v3', n_top_genes=hv_genes)  # expects count data
        hv_index = adata.var["highly_variable"]
        adata = adata[:, hv_index]  # Subset to hv genes only
        protein_embeddings = protein_embeddings[species][hv_index]
    else:
        protein_embeddings = protein_embeddings[species]
    expression = data_to_torch_X(adata.X)

    covar_vals = None
    if len(labels) > 0:
        assert covar_col is None or covar_col in labels, "Covar needs to be in labels"
        # make sure you keep track of covar column!
        labels = adata.obs.loc[:, labels].values
        if covar_col is not None:
            # we have a categorical label to use as covariate
            covar_vals = torch.tensor(pd.Categorical(adata.obs[covar_col]).codes)
    return SincleCellDataset(
        expression=expression,
        protein_embeddings=protein_embeddings,
        labels=labels,
        covar_vals=covar_vals
    ), adata


def proc(args):
    datasets_df = pd.read_csv(args.datasets_df)
    datasets_df["covar_col"] = np.nan
    skip = args.skip
    additional_filter = args.filter
    DO_HVG = args.DO_HVG

    num_genes = {}
    num_cells = {}
    ir = list(datasets_df.iterrows())
    for i, row in tqdm(ir, total=len(datasets_df)):
        path = row[1].path  # dataset file name, used as the bookkeeping key
        _, ncells, ngenes = process_raw_anndata(row,
                                                args.h5_folder_path,
                                                args.npz_folder_path,
                                                args.scp, skip,
                                                additional_filter,
                                                root=args.file_root_path)
        if (ncells is not None) and (ngenes is not None):
            num_genes[path] = ngenes
            num_cells[path] = ncells

    if "num_cells" not in datasets_df.columns:
        datasets_df["num_cells"] = 0
    if "num_genes" not in datasets_df.columns:
        datasets_df["num_genes"] = 0

    for k in num_genes.keys():
        ng = num_genes[k]
        nc = num_cells[k]
        datasets_df.loc[datasets_df["path"] == k, "num_cells"] = nc
        datasets_df.loc[datasets_df["path"] == k, "num_genes"] = ng
    # Write the cells and genes info back to the original path
    datasets_df.to_csv(args.datasets_df, index=False)


if __name__ == "__main__":
    # Parse command-line arguments
    parser = argparse.ArgumentParser(description='Preproc h5ad datasets.')
    # Define command-line arguments
    parser.add_argument('--scp', type=str, default="",
                        help='Name of a SNAP server to SCP the results to. It should have the same folders as the script is already saving to.')
    parser.add_argument('--h5_folder_path', type=str,
                        default="/lfs/local/0/yanay/uce_h5s/",
                        help='Folder to save H5s to.')
    parser.add_argument('--npz_folder_path', type=str,
                        default="/lfs/local/0/yanay/uce_proc/",
                        help='Folder to save NPZs to.')
    parser.add_argument('--datasets_df', type=str,
                        default="/dfs/project/uce/new_perturb_datasets.csv",
                        help='Path to datasets csv. Will be overwritten to have the correct num cells and num genes for each dataset.')
    parser.add_argument('--file_root_path', type=str, default=None,
                        help='Root path prepended to dataset paths (passed to process_raw_anndata as root).')
    # Note: argparse's type=bool does not parse "False" as False; any
    # non-empty string is truthy. These flags behave as expected only when
    # left at their defaults.
    parser.add_argument('--filter', type=bool, default=True,
                        help='Should you do an additional gene/cell filtering? This can be a good step since even if you have already done it, subsetting to protein embeddings can make some cells sparser.')
    parser.add_argument('--skip', type=bool, default=True,
                        help='Should you skip datasets that appear to have already been created in the h5 folder?')
    parser.add_argument('--DO_HVG', type=bool, default=False,
                        help='Should a HVG subset be done.')
    args = parser.parse_args()
    proc(args)


================================================
FILE: eval_data.py
================================================
"""
Dataloaders
"""
import warnings
warnings.filterwarnings("ignore")

import sys
sys.path.append('../')
from typing import Dict, List, Optional, Tuple, Any
import torch
import numpy as np
import pickle
import torch.utils.data as data


class MultiDatasetSentences(data.Dataset):
    def __init__(self, sorted_dataset_names, shapes_dict, args,
                 dataset_to_protein_embeddings_path="/lfs/local/0/yanay/reduced_datasets_to_pe_chrom_5120_new.torch",
                 datasets_to_chroms_path="/lfs/local/0/yanay/dataset_to_chroms_new.pkl",
                 datasets_to_starts_path="/lfs/local/0/yanay/dataset_to_starts_new.pkl",
                 npzs_dir="/lfs/local/0/yanay/uce_proc/") -> None:
        super(MultiDatasetSentences, self).__init__()
        # self.xs = {}
        self.num_cells = {}
        self.num_genes = {}
        self.shapes_dict = shapes_dict
        self.args = args

        self.total_num_cells = 0
        for name in sorted_dataset_names:
            num_cells, num_genes = self.shapes_dict[name]
            # self.xs[name] = X
            self.num_cells[name] = num_cells
            self.num_genes[name] = num_genes
            self.total_num_cells += num_cells

        self.datasets = sorted_dataset_names

        # TODO: preferably not hard-coded here
        self.dataset_to_protein_embeddings = torch.load(dataset_to_protein_embeddings_path)
        with open(datasets_to_chroms_path, "rb") as f:
            self.dataset_to_chroms = pickle.load(f)
        with open(datasets_to_starts_path, "rb") as f:
            self.dataset_to_starts = pickle.load(f)

        self.npzs_dir = npzs_dir

    def __getitem__(self, idx):
        if isinstance(idx, int):
            for dataset in sorted(self.datasets):
                if idx < self.num_cells[dataset]:
                    #cts = np.memmap(f"/lfs/local/0/yanay/cxg_npzs/" + f"{dataset}_counts.npz",
                    #                dtype='int64', mode='r', shape=self.shapes_dict[dataset])
                    cts = np.memmap(self.npzs_dir + f"{dataset}_counts.npz",
                                    dtype='int64', mode='r',
                                    shape=self.shapes_dict[dataset])
                    counts = cts[idx]
                    counts = torch.tensor(counts).unsqueeze(0)
                    weights = torch.log1p(counts)
                    weights = (weights / torch.sum(weights))
                    batch_sentences, mask, seq_len, cell_sentences = \
                        sample_cell_sentences(counts, weights, dataset, self.args,
                                              dataset_to_protein_embeddings=self.dataset_to_protein_embeddings,
                                              dataset_to_chroms=self.dataset_to_chroms,
                                              dataset_to_starts=self.dataset_to_starts)
                    return batch_sentences, mask, idx, seq_len, cell_sentences
                else:
                    idx -= self.num_cells[dataset]
            raise IndexError
        else:
            raise NotImplementedError

    def __len__(self) -> int:
        return self.total_num_cells

    def get_dim(self) -> Dict[str, int]:
        return self.num_genes


class MultiDatasetSentenceCollator(object):
    def __init__(self, args):
        self.pad_length = args.pad_length

    def __call__(self, batch):
        batch_size = len(batch)
        batch_sentences = torch.zeros((batch_size, self.pad_length))
        mask = torch.zeros((batch_size, self.pad_length))
        cell_sentences = torch.zeros((batch_size, self.pad_length))

        idxs = torch.zeros(batch_size)

        i = 0
        max_len = 0
        for bs, msk, idx, seq_len, cs in batch:
            batch_sentences[i, :] = bs
            cell_sentences[i, :] = cs
            max_len = max(max_len, seq_len)
            mask[i, :] = msk
            idxs[i] = idx
            i += 1

        return batch_sentences[:, :max_len], mask[:, :max_len], idxs, cell_sentences


def sample_cell_sentences(counts, batch_weights, dataset, args,
                          dataset_to_protein_embeddings,
                          dataset_to_chroms,
                          dataset_to_starts):
    dataset_idxs = dataset_to_protein_embeddings[dataset]  # get the dataset specific protein embedding idxs
    cell_sentences = torch.zeros((counts.shape[0], args.pad_length))  # init the cell representation as 0s
    mask = torch.zeros((counts.shape[0], args.pad_length))  # start off masking the whole sequence
    chroms = dataset_to_chroms[dataset]  # get the dataset specific chroms for each gene
    starts = dataset_to_starts[dataset]  # get the dataset specific genomic start locations for each gene

    longest_seq_len = 0  # we need to keep track of this so we can subset the batch at the end

    for c, cell in enumerate(counts):
        weights = batch_weights[c].numpy()
        weights = weights / sum(weights)  # RE NORM after mask

        # randomly choose the genes that will make up the sample, weighted by expression, with replacement
        choice_idx = np.random.choice(np.arange(len(weights)),
                                      size=args.sample_size, p=weights,
                                      replace=True)
        choosen_chrom = chroms[choice_idx]  # get the sampled genes' chromosomes
        # order the genes by chromosome
        chrom_sort = np.argsort(choosen_chrom)
        choice_idx = choice_idx[chrom_sort]

        # sort the genes by start
        new_chrom = chroms[choice_idx]
        choosen_starts = starts[choice_idx]

        ordered_choice_idx = np.full((args.pad_length), args.cls_token_idx)  # start with cls
        # i = 0: the first token is CLS
        i = 1  # continue on to the rest of the sequence, with the left bracket being assumed
        # Shuffle the chroms now; there's no natural order to chromosomes
        uq_chroms = np.unique(new_chrom)
        np.random.shuffle(uq_chroms)  # shuffle

        # This loop is actually just over one cell
        for chrom in uq_chroms:
            # Open Chrom token
            ordered_choice_idx[i] = int(chrom) + args.CHROM_TOKEN_OFFSET  # token of this chromosome
            # i = 1: next token is a chrom open
            i += 1
            # now sort the genes by start order within the chroms
            loc = np.where(new_chrom == chrom)[0]
            sort_by_start = np.argsort(choosen_starts[loc])  # start locations for this chromosome
            to_add = choice_idx[loc[sort_by_start]]
            ordered_choice_idx[i:(i + len(to_add))] = dataset_idxs[to_add]
            i += len(to_add)
            ordered_choice_idx[i] = args.chrom_token_right_idx  # add the chrom sep again
            i += 1  # add the closing token again

        longest_seq_len = max(longest_seq_len, i)
        remainder_len = (args.pad_length - i)

        cell_mask = torch.concat((torch.ones(i),
                                  # pay attention to all of these tokens, ignore the rest!
                                  torch.zeros(remainder_len)))
        mask[c, :] = cell_mask

        ordered_choice_idx[i:] = args.pad_token_idx  # the remainder of the sequence
        cell_sentences[c, :] = torch.from_numpy(ordered_choice_idx)

    cell_sentences_pe = cell_sentences.long()  # token indices

    return cell_sentences_pe, mask, longest_seq_len, cell_sentences


================================================
FILE: eval_single_anndata.py
================================================
"""
Script for Evaluating a Single AnnData

Parameters:
----------
- `adata_path` (str): Full path to the AnnData you want to embed.
- `dir` (str): Working folder where all files will be saved.
- `species` (str): Species of the AnnData.
- `filter` (bool): Additional gene/cell filtering on the AnnData.
- `skip` (bool): Skip datasets that appear to have already been created.
- `model_loc` (str): Location of pretrained UCE model's weights in a `.torch` file.
- `batch_size` (int): Batch size for processing.
- `CXG` (bool): Use CXG model.
- `nlayers` (int): Number of transformer layers.
- `output_dim` (int): Desired output dimension.
- `d_hid` (int): Hidden dimension for processing.
- `token_dim` (int): Token dimension.
- `spec_chrom_csv_path` (str): CSV file mapping genes from each species to their respective chromosomes and genomic start positions.
- `token_file` (str): `.torch` file containing token/protein embeddings for all tokens.
- `protein_embeddings_dir` (str): Directory containing protein embedding `.pt` files for all species.
- `offset_pkl_path` (str): `.pkl` file mapping between species and their genes' locations in the `token_file`.
- `pad_length` (int): Length to pad the cell sentence to.
- `pad_token_idx` (int): Index of the padding token in the `token_file`.
- `chrom_token_left_idx` (int): Left chromosome token index.
- `chrom_token_right_idx` (int): Right chromosome token index.
- `cls_token_idx` (int): CLS token index in the `token_file`.
- `CHROM_TOKEN_OFFSET` (int): Offset index; tokens after this mark are chromosome identifiers.
- `sample_size` (int): Number of genes sampled for the cell sentence.
- `multi_gpu` (bool): Run evaluation on multiple GPUs (using accelerator).

Returns:
-------
- `dir/{dataset_name}_proc.h5ad`: The processed AnnData. Processing involves subsetting it to genes which have protein embeddings and then refiltering the dataset by minimum counts.
- `dir/{dataset_name}_chroms.pkl`: File mapping the genes in the dataset to their corresponding chromosome indices.
- `dir/{dataset_name}_counts.npz`: File containing the counts of the AnnData in an easily accessible format.
- `dir/{dataset_name}_shapes_dict.pkl`: File containing the shape (ncell x ngene) of the AnnData, used to read the `.npz` file.
- `dir/{dataset_name}_pe_idx.torch`: File mapping between the genes in the dataset and their index in the tokens file.
- `dir/{dataset_name}_starts.pkl`: File mapping between the genes in the dataset and their genomic start locations.
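The intermediate bookkeeping files above are plain pickle/torch files and can be inspected directly. A minimal sketch of the shapes-dict format (the dataset name "demo" and the toy shape are made up for illustration; the real files are written into `dir` by this script):

```python
import pickle

# The shapes dict is a pickled mapping {dataset_name: (num_cells, num_genes)}.
# Write a toy one, then read it back the way run_evaluation() does.
name = "demo"  # hypothetical dataset name
shapes_dict_path = f"{name}_shapes_dict.pkl"

with open(shapes_dict_path, "wb") as f:
    pickle.dump({name: (1000, 18285)}, f)

with open(shapes_dict_path, "rb") as f:
    shapes_dict = pickle.load(f)

num_cells, num_genes = shapes_dict[name]
print(num_cells, num_genes)  # -> 1000 18285
```

The `_pe_idx.torch`, `_chroms.pkl`, and `_starts.pkl` files are likewise dictionaries keyed by dataset name (`torch.load` for the former, `pickle.load` for the latter two).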
"""

import argparse

from evaluate import AnndataProcessor
from accelerate import Accelerator


def main(args, accelerator):
    processor = AnndataProcessor(args, accelerator)
    processor.preprocess_anndata()
    processor.generate_idxs()
    processor.run_evaluation()


if __name__ == "__main__":
    parser = argparse.ArgumentParser(
        description='Embed a single anndata using UCE.')

    # Anndata Processing Arguments
    parser.add_argument('--adata_path', type=str,
                        default=None,
                        help='Full path to the anndata you want to embed.')
    parser.add_argument('--dir', type=str,
                        default="./",
                        help='Working folder where all files will be saved.')
    parser.add_argument('--species', type=str, default="human",
                        help='Species of the anndata.')
    parser.add_argument('--filter', type=bool, default=True,
                        help='Additional gene/cell filtering on the anndata.')
    parser.add_argument('--skip', type=bool, default=True,
                        help='Skip datasets that appear to have already been created.')

    # Model Arguments
    parser.add_argument('--model_loc', type=str,
                        default=None,
                        help='Location of the model.')
    parser.add_argument('--batch_size', type=int,
                        default=25,
                        help='Batch size.')
    parser.add_argument('--pad_length', type=int,
                        default=1536,
                        help='Length to pad the cell sentence to.')
    parser.add_argument("--pad_token_idx", type=int, default=0,
                        help="PAD token index")
    parser.add_argument("--chrom_token_left_idx", type=int, default=1,
                        help="Chrom token left index")
    parser.add_argument("--chrom_token_right_idx", type=int, default=2,
                        help="Chrom token right index")
    parser.add_argument("--cls_token_idx", type=int, default=3,
                        help="CLS token index")
    parser.add_argument("--CHROM_TOKEN_OFFSET", type=int, default=143574,
                        help="Offset index, tokens after this mark are chromosome identifiers")
    parser.add_argument('--sample_size', type=int, default=1024,
                        help='Number of genes sampled for cell sentence')
    parser.add_argument('--CXG', type=bool, default=True,
                        help='Use CXG model.')
    parser.add_argument('--nlayers', type=int, default=4,
                        help='Number of transformer layers.')
    parser.add_argument('--output_dim', type=int, default=1280,
                        help='Output dimension.')
    parser.add_argument('--d_hid', type=int, default=5120,
                        help='Hidden dimension.')
    parser.add_argument('--token_dim', type=int, default=5120,
                        help='Token dimension.')
    parser.add_argument('--multi_gpu', type=bool, default=False,
                        help='Use multiple GPUs')

    # Misc Arguments
    parser.add_argument("--spec_chrom_csv_path",
                        default="./model_files/species_chrom.csv", type=str,
                        help="CSV Path for species genes to chromosomes and start locations.")
    parser.add_argument("--token_file",
                        default="./model_files/all_tokens.torch", type=str,
                        help="Path for token embeddings.")
    parser.add_argument("--protein_embeddings_dir",
                        default="./model_files/protein_embeddings/", type=str,
                        help="Directory where protein embedding .pt files are stored.")
    parser.add_argument("--offset_pkl_path",
                        default="./model_files/species_offsets.pkl", type=str,
                        help="PKL file which contains offsets for each species.")

    args = parser.parse_args()
    accelerator = Accelerator(project_dir=args.dir)
    main(args, accelerator)


================================================
FILE: evaluate.py
================================================
import os

# os.environ["NCCL_DEBUG"] = "INFO"
# Cap the thread counts used by the numerical libraries
os.environ["OMP_NUM_THREADS"] = "12"
os.environ["OPENBLAS_NUM_THREADS"] = "12"
os.environ["MKL_NUM_THREADS"] = "12"
os.environ["VECLIB_MAXIMUM_THREADS"] = "12"
os.environ["NUMEXPR_NUM_THREADS"] = "12"

import warnings

warnings.filterwarnings("ignore")

import scanpy as sc
from tqdm.auto import tqdm
from torch import nn, Tensor

from model import TransformerModel
from eval_data import MultiDatasetSentences, MultiDatasetSentenceCollator
from utils import figshare_download
from torch.utils.data import DataLoader
from data_proc.data_utils import adata_path_to_prot_chrom_starts, \
    get_spec_chrom_csv, process_raw_anndata, get_species_to_pe
import pickle import pandas as pd import numpy as np import torch class AnndataProcessor: def __init__(self, args, accelerator): self.args = args self.accelerator = accelerator self.h5_folder_path = self.args.dir self.npz_folder_path = self.args.dir self.scp = "" # Check if paths exist, if not, create them self.check_paths() # Set up the anndata self.adata_name = self.args.adata_path.split("/")[-1] self.adata_root_path = self.args.adata_path.replace(self.adata_name, "") self.name = self.adata_name.replace(".h5ad", "") self.proc_h5_path = self.h5_folder_path + f"{self.name}_proc.h5ad" self.adata = None # Set up the row row = pd.Series() row.path = self.adata_name row.covar_col = np.nan row.species = self.args.species self.row = row # Set paths once to be used throughout the class self.pe_idx_path = self.args.dir + f"{self.name}_pe_idx.torch" self.chroms_path = self.args.dir + f"{self.name}_chroms.pkl" self.starts_path = self.args.dir + f"{self.name}_starts.pkl" self.shapes_dict_path = self.args.dir + f"{self.name}_shapes_dict.pkl" def check_paths(self): """ Check if the paths exist, if not, create them """ figshare_download("https://figshare.com/ndownloader/files/42706558", self.args.spec_chrom_csv_path) figshare_download("https://figshare.com/ndownloader/files/42706555", self.args.offset_pkl_path) if not os.path.exists(self.args.protein_embeddings_dir): figshare_download("https://figshare.com/ndownloader/files/42715213", 'model_files/protein_embeddings.tar.gz') figshare_download("https://figshare.com/ndownloader/files/42706585", self.args.token_file) if self.args.adata_path is None: print("Using sample AnnData: 10k pbmcs dataset") self.args.adata_path = "./data/10k_pbmcs_proc.h5ad" figshare_download( "https://figshare.com/ndownloader/files/42706966", self.args.adata_path) if self.args.model_loc is None: print("Using sample 4 layer model") self.args.model_loc = "./model_files/4layer_model.torch" figshare_download( "https://figshare.com/ndownloader/files/42706576", 
self.args.model_loc) def preprocess_anndata(self): if self.accelerator.is_main_process: self.adata, num_cells, num_genes = \ process_raw_anndata(self.row, self.h5_folder_path, self.npz_folder_path, self.scp, self.args.skip, self.args.filter, root=self.adata_root_path) if (num_cells is not None) and (num_genes is not None): self.save_shapes_dict(self.name, num_cells, num_genes, self.shapes_dict_path) if self.adata is None: self.adata = sc.read(self.proc_h5_path) def save_shapes_dict(self, name, num_cells, num_genes, shapes_dict_path): shapes_dict = {name: (num_cells, num_genes)} with open(shapes_dict_path, "wb+") as f: pickle.dump(shapes_dict, f) print("Wrote Shapes Dict") def generate_idxs(self): if self.accelerator.is_main_process: if os.path.exists(self.pe_idx_path) and \ os.path.exists(self.chroms_path) and \ os.path.exists(self.starts_path): print("PE Idx, Chrom and Starts files already created") else: species_to_pe = get_species_to_pe(self.args.protein_embeddings_dir) with open(self.args.offset_pkl_path, "rb") as f: species_to_offsets = pickle.load(f) gene_to_chrom_pos = get_spec_chrom_csv( self.args.spec_chrom_csv_path) dataset_species = self.args.species spec_pe_genes = list(species_to_pe[dataset_species].keys()) offset = species_to_offsets[dataset_species] pe_row_idxs, dataset_chroms, dataset_pos = adata_path_to_prot_chrom_starts( self.adata, dataset_species, spec_pe_genes, gene_to_chrom_pos, offset) # Save to the temp dict torch.save({self.name: pe_row_idxs}, self.pe_idx_path) with open(self.chroms_path, "wb+") as f: pickle.dump({self.name: dataset_chroms}, f) with open(self.starts_path, "wb+") as f: pickle.dump({self.name: dataset_pos}, f) def run_evaluation(self): self.accelerator.wait_for_everyone() with open(self.shapes_dict_path, "rb") as f: shapes_dict = pickle.load(f) run_eval(self.adata, self.name, self.pe_idx_path, self.chroms_path, self.starts_path, shapes_dict, self.accelerator, self.args) def get_ESM2_embeddings(args): # Load in ESM2 embeddings 
    # and special tokens
    all_pe = torch.load(args.token_file)
    if all_pe.shape[0] == 143574:
        torch.manual_seed(23)
        CHROM_TENSORS = torch.normal(mean=0, std=1,
                                     size=(1895, args.token_dim))
        # 1895 is the total number of chromosome choices, it is hardcoded for now
        all_pe = torch.vstack(
            (all_pe, CHROM_TENSORS))  # Add the chrom tensors to the end

    all_pe.requires_grad = False
    return all_pe


def padding_tensor(sequences):
    """
    :param sequences: list of tensors
    :return: padded tensor and attention mask
    """
    num = len(sequences)
    max_len = max([s.size(0) for s in sequences])

    out_dims = (num, max_len, 1280)

    out_tensor = sequences[0].data.new(*out_dims).fill_(0)
    out_dims2 = (num, max_len)

    mask = sequences[0].data.new(*out_dims2).fill_(float('-inf'))
    for i, tensor in enumerate(sequences):
        length = tensor.size(0)
        out_tensor[i, :length] = tensor
        mask[i, :length] = 1
    return out_tensor.permute(1, 0, 2), mask


def run_eval(adata, name, pe_idx_path, chroms_path, starts_path, shapes_dict,
             accelerator, args):
    #### Set up the model ####
    token_dim = args.token_dim
    emsize = 1280  # embedding dimension
    d_hid = args.d_hid  # dimension of the feedforward network model in nn.TransformerEncoder
    nlayers = args.nlayers  # number of nn.TransformerEncoderLayer in nn.TransformerEncoder
    nhead = 20  # number of heads in nn.MultiheadAttention
    dropout = 0.05  # dropout probability

    model = TransformerModel(token_dim=token_dim, d_model=emsize, nhead=nhead,
                             d_hid=d_hid,
                             nlayers=nlayers, dropout=dropout,
                             output_dim=args.output_dim)
    if args.model_loc is None:
        raise ValueError("Must provide a model location")
    # initialize as empty
    empty_pe = torch.zeros(145469, 5120)
    empty_pe.requires_grad = False
    model.pe_embedding = nn.Embedding.from_pretrained(empty_pe)
    model.load_state_dict(torch.load(args.model_loc, map_location="cpu"),
                          strict=True)

    # Load in the real token embeddings
    all_pe = get_ESM2_embeddings(args)
    # This will make sure that you don't overwrite the tokens in case you're embedding species from the training data
    # We avoid doing that just in case
    # the random seeds are different across different versions.
    if all_pe.shape[0] != 145469:
        all_pe.requires_grad = False
        model.pe_embedding = nn.Embedding.from_pretrained(all_pe)

    print(f"Loaded model:\n{args.model_loc}")
    model = model.eval()
    model = accelerator.prepare(model)
    batch_size = args.batch_size

    #### Run the model ####
    # Dataloaders
    dataset = MultiDatasetSentences(sorted_dataset_names=[name],
                                    shapes_dict=shapes_dict,
                                    args=args, npzs_dir=args.dir,
                                    dataset_to_protein_embeddings_path=pe_idx_path,
                                    datasets_to_chroms_path=chroms_path,
                                    datasets_to_starts_path=starts_path
                                    )
    multi_dataset_sentence_collator = MultiDatasetSentenceCollator(args)

    dataloader = DataLoader(dataset, batch_size=batch_size, shuffle=False,
                            collate_fn=multi_dataset_sentence_collator,
                            num_workers=0)
    dataloader = accelerator.prepare(dataloader)
    pbar = tqdm(dataloader, disable=not accelerator.is_local_main_process)
    dataset_embeds = []
    with torch.no_grad():
        for batch in pbar:
            batch_sentences, mask, idxs = batch[0], batch[1], batch[2]
            batch_sentences = batch_sentences.permute(1, 0)
            if args.multi_gpu:
                batch_sentences = model.module.pe_embedding(batch_sentences.long())
            else:
                batch_sentences = model.pe_embedding(batch_sentences.long())
            batch_sentences = nn.functional.normalize(batch_sentences,
                                                      dim=2)  # Normalize token outputs now
            _, embedding = model.forward(batch_sentences, mask=mask)
            # Fix for duplicates in last batch
            accelerator.wait_for_everyone()
            embeddings = accelerator.gather_for_metrics(embedding)
            if accelerator.is_main_process:
                dataset_embeds.append(embeddings.detach().cpu().numpy())

    accelerator.wait_for_everyone()
    if accelerator.is_main_process:
        dataset_embeds = np.vstack(dataset_embeds)
        adata.obsm["X_uce"] = dataset_embeds

        write_path = args.dir + f"{name}_uce_adata.h5ad"
        adata.write(write_path)

        print("*****Wrote Anndata to:*****")
        print(write_path)



================================================
FILE: examples/Benchmark Embeddings with scIB.ipynb
================================================ { "cells": [ { "cell_type": "markdown", "id": "6b258384-9a56-4ed0-be6f-db1c94711356", "metadata": {}, "source": [ "# Large Scale Embedding benchmarks\n", "\n", "This notebook includes an example showing how to run large scale embedding benchmarks using scIB [(single-cell integration benchmark)](https://www.nature.com/articles/s41592-021-01336-8)\n", "\n", "We use the GPU accelerated version implemented here: https://github.com/YosefLab/scib-metrics\n", "\n", "Please follow installation instructions in that repo. \n", "\n", "*Note: installing Faiss can be difficult and may take some time*\n", "\n", "*Running the full benchmarking suite on many cells can take many hours, even on GPUs with large amounts of memory, such as A100s, and with many threads*" ] }, { "cell_type": "markdown", "id": "ca4ba3a1-5c85-4c7b-8564-f8c5689e9345", "metadata": {}, "source": [ "## Load Imports and define Benchmark Function" ] }, { "cell_type": "code", "execution_count": 1, "id": "b9d9fd58-915b-492d-9880-48c37e3859a8", "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import scanpy as sc\n", "\n", "from scib_metrics.benchmark import Benchmarker\n", "\n", "import faiss\n", "\n", "from scib_metrics.nearest_neighbors import NeighborsResults\n", "\n", "# Faiss GPU accelerate nearest neighbors methods\n", "def faiss_hnsw_nn(X: np.ndarray, k: int):\n", " \"\"\"Gpu HNSW nearest neighbor search using faiss.\n", "\n", " See https://github.com/nmslib/hnswlib/blob/master/ALGO_PARAMS.md\n", " for index param details.\n", " \"\"\"\n", " X = np.ascontiguousarray(X, dtype=np.float32)\n", " res = faiss.StandardGpuResources()\n", " M = 32\n", " index = faiss.IndexHNSWFlat(X.shape[1], M, faiss.METRIC_L2)\n", " gpu_index = faiss.index_cpu_to_gpu(res, 0, index)\n", " gpu_index.add(X)\n", " distances, indices = gpu_index.search(X, k)\n", " del index\n", " del gpu_index\n", " # distances are squared\n", " return NeighborsResults(indices=indices, 
distances=np.sqrt(distances))\n", "\n", "\n", "def faiss_brute_force_nn(X: np.ndarray, k: int):\n", " \"\"\"GPU brute force nearest neighbor search using faiss.\"\"\"\n", " X = np.ascontiguousarray(X, dtype=np.float32)\n", " res = faiss.StandardGpuResources()\n", " index = faiss.IndexFlatL2(X.shape[1])\n", " gpu_index = faiss.index_cpu_to_gpu(res, 0, index)\n", " gpu_index.add(X)\n", " distances, indices = gpu_index.search(X, k)\n", " del index\n", " del gpu_index\n", " # distances are squared\n", " return NeighborsResults(indices=indices, distances=np.sqrt(distances))" ] }, { "cell_type": "code", "execution_count": 2, "id": "4c5fb90f-ffa5-4cb9-bf6a-6afce956fc86", "metadata": {}, "outputs": [], "source": [ "import warnings\n", "warnings.filterwarnings(\"ignore\")\n", "from scib_metrics.benchmark import Benchmarker, BioConservation, BatchCorrection\n", "import pandas as pd\n", "\n", "## Benchmarking Function, returns dataframe of scores\n", "def benchmark(ad, label_key=\"cell_type\", batch_key=\"sample_id\", obsm_keys=[\"X_uce\", \"X_scGPT\", \"X_geneformer\"]):\n", " print(f\"Running using CT key:\", label_key)\n", " biocons = BioConservation()\n", " batchcons = BatchCorrection(pcr_comparison=False)\n", " \n", " bm = Benchmarker(\n", " ad,\n", " batch_key=batch_key,\n", " label_key=label_key,\n", " embedding_obsm_keys=obsm_keys,\n", " bio_conservation_metrics=biocons,\n", " batch_correction_metrics=None,\n", " n_jobs=48,\n", " )\n", " bm.prepare(neighbor_computer=faiss_brute_force_nn)\n", " bm.benchmark()\n", " df = bm.get_results(min_max_scale=False)\n", " return df" ] }, { "cell_type": "markdown", "id": "2f3bb257-21d4-41d5-9726-50b5e7af04b2", "metadata": {}, "source": [ "### Load in anndata\n", "\n", "For this example, we will benchmark cells from developing mouse brain.\n", "\n", "You can download an anndata object with UCE, scGPT and Geneformer embeddings precalculated from [here](https://drive.google.com/drive/folders/1f63fh0ykgEhCrkd_EVvIootBw7LYDVI7)" ] }, {
"cell_type": "code", "execution_count": 3, "id": "35392e93-6ffd-4df6-9609-f85ea6aad4ae", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "AnnData object with n_obs × n_vars = 597668 × 18285\n", " obs: 'n_counts', 'n_genes', 'region', 'age', 'experiment', 'species', 'sex', 'n_genes_by_counts', 'total_counts', 'total_counts_mt', 'pct_counts_mt', 'leiden', 'cell_type', 'sex_old', 'abca_class', 'abca_subclass', 'abca_supertype', 'abca_cluster', 'abca_region', 'leiden_old', 'region_dissected', 'biosample_id', 'donor_id', 'species__ontology_label', 'disease', 'disease__ontology_label', 'organ', 'organ__ontology_label', 'library_preparation_protocol', 'library_preparation_protocol__ontology_label', 'cell_type_author', 'cell_type__ontology_label', 'supercluster'\n", " var: 'n_cells', 'mt', 'n_cells_by_counts', 'mean_counts', 'pct_dropout_by_counts', 'total_counts', 'highly_variable', 'highly_variable_rank', 'means', 'variances', 'variances_norm', 'highly_variable_nbatches', 'feature_name'\n", " uns: '10x_batch_colors', '_scvi_manager_uuid', '_scvi_uuid', 'age_colors', 'ages_ordered_colors', 'dendrogram_leiden', 'hvg', 'leiden', 'log1p', 'neighbors', 'pca', 'rank_genes_groups', 'region_colors', 'region_dissected_colors', 'regions_ordered_colors', 'replicate_colors', 'sex_colors', 'umap'\n", " obsm: 'X_geneformer', 'X_pca', 'X_scGPT', 'X_scVI', 'X_uce', 'X_umap', 'latent_gene_encoding'\n", " layers: 'counts'\n", " obsp: 'connectivities', 'distances'" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ad = sc.read(\"developing_mouse_brain.h5ad\", cache=True)\n", "ad" ] }, { "cell_type": "code", "execution_count": 4, "id": "a4cb1a5e-1672-4ba7-b488-036de0e3ff61", "metadata": {}, "outputs": [], "source": [ "cell_type_column = \"supercluster\"\n", "batch_column = \"donor_id\"" ] }, { "cell_type": "code", "execution_count": 5, "id": "134f4e09-8e68-43fb-9d12-d87a1b5318c1", "metadata": {}, "outputs": [ { "data": { "text/plain": [ 
"33" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(ad.obs[cell_type_column].unique()) # Number of unique cell types" ] }, { "cell_type": "code", "execution_count": 6, "id": "ac956e69-9a66-4225-adb8-a01a2d6e23bf", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "25" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(ad.obs[batch_column].unique()) # Number of unique batches" ] }, { "cell_type": "markdown", "id": "ee280476-4057-4051-b4f1-eb7ee0055e69", "metadata": {}, "source": [ "# Running the Benchmark\n", "\n", "Running the benchmark on the full dataset can take a very long time. Instead, we can run on medium sized samples of cells." ] }, { "cell_type": "code", "execution_count": 7, "id": "0cae96b8-5be1-4ea5-a919-d16d2205d645", "metadata": {}, "outputs": [], "source": [ "sample_size = 100_000 # number of cells" ] }, { "cell_type": "code", "execution_count": 8, "id": "189ad01d-83c0-40e6-ab13-d16ed7eb0c88", "metadata": { "scrolled": true }, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "0d430c0038f84d33915a3d9b211d9608", "version_major": 2, "version_minor": 0 }, "text/plain": [ " 0%| | 0/10 [00:00\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "" ], "text/plain": [ " Isolated labels KMeans NMI KMeans ARI Silhouette label cLISI \\\n", "Embedding \n", "X_geneformer 0.532857 0.321145 0.118405 0.479277 0.982301 \n", "X_scGPT 0.627445 0.615529 0.351586 0.536652 0.998366 \n", "X_uce 0.752708 0.727828 0.50454 0.594331 0.99963 \n", "\n", " Silhouette batch iLISI KBET Graph connectivity \\\n", "Embedding \n", "X_geneformer 0.868312 0.165735 0.497117 0.709678 \n", "X_scGPT 0.88578 0.137406 0.426221 0.872698 \n", "X_uce 0.860244 0.136796 0.401463 0.832073 \n", "\n", " PCR comparison Batch correction Bio conservation Total \n", "Embedding \n", "X_geneformer 0.0 0.448169 0.486797 0.471346 \n", "X_scGPT 0.0 0.464421 0.625916 0.561318 \n", "X_uce 0.0 0.446115 0.715807 0.607931 " ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "grouped_mean" ] }, { "cell_type": "code", "execution_count": 12, "id": "3a71d4f4-1555-4cea-8f8f-332075e00de6", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Embedding\n", "X_geneformer 0.486797\n", "X_scGPT 0.625916\n", "X_uce 0.715807\n", "Name: Bio conservation, dtype: object" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "grouped_mean[\"Bio conservation\"]" ] } ], "metadata": { "kernelspec": { "display_name": "Faiss", "language": "python", "name": "faiss_1.8" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.9" } }, "nbformat": 4, "nbformat_minor": 5 } ================================================ FILE: examples/Label Transfer Using Logistic Classifier.ipynb ================================================ { "cells": [ { "cell_type": "markdown", "id": "3f4f1b19-5369-4e4d-9366-b6f07f88b402", "metadata": {}, "source": [ "# Transferring Labels Using UCE\n", "\n", "This notebook walks through the example from 
Figure 4d,4e of transferring labels from mouse kidney norn cells to a human lung disease dataset.\n", "\n", "To transfer labels, we use a basic default implementation of sklearn's logistic classifier." ] }, { "cell_type": "code", "execution_count": 1, "id": "5ca49083-fd91-473f-b60a-621b07d52de2", "metadata": {}, "outputs": [], "source": [ "## Imports\n", "import scanpy as sc\n", "import numpy as np\n", "import random\n", "from sklearn.linear_model import LogisticRegression\n", "sc._settings.settings._vector_friendly=True\n", "import matplotlib\n", "import matplotlib.pyplot as plt\n", "\n", "## Seed\n", "np.random.seed(0)\n", "random.seed(0)" ] }, { "cell_type": "markdown", "id": "72536f9e-b010-44a7-b323-c32f05cf7d98", "metadata": {}, "source": [ "## Load in anndatas\n", "You can download the anndatas here: https://drive.google.com/drive/folders/1f63fh0ykgEhCrkd_EVvIootBw7LYDVI7" ] }, { "cell_type": "code", "execution_count": 2, "id": "8e5d7a7e-2a86-4fce-82fe-17afcf83dec5", "metadata": {}, "outputs": [], "source": [ "epo_uce = sc.read(\"mouse_kidney_norn.h5ad\")\n", "kam_20_uce = sc.read(\"human_lung_disease.h5ad\")" ] }, { "cell_type": "markdown", "id": "dcbf764a-ad99-4622-a780-ecd62e471132", "metadata": {}, "source": [ "### Train Classifier on Mouse Kidney Cells\n", "\n", "We train a classifier to predict coarsened cell types, from the UCE embeddings" ] }, { "cell_type": "code", "execution_count": 3, "id": "9e56e9aa-bf4a-42d9-b8a6-e76833161083", "metadata": {}, "outputs": [], "source": [ "epo_map = {\n", " \"Norn\":\"Norn\",\n", " \"Proximal tubule\":\"Proximal tubule\",\n", " \"Collecting duct principal\":\"Collecting duct\",\n", " \"Distal convoluted tubule\":\"Distal convoluted tubule\",\n", " \"Fibroblasts\":\"Fibroblast\",\n", " \"Endothelial\":\"Endothelial\",\n", " \"Collecting duct transient\":\"Collecting duct\",\n", " \"Other\":\"misc\",\n", " \"Pericyte Ren1+\":\"Pericyte\",\n", " \"Podocytes\":\"Podocyte\",\n", " \"Pericyte3\":\"Pericyte\",\n", " 
\"Pericyte1\":\"Pericyte\",\n", " \"Pericyte2\":\"Pericyte\",\n", " \"Collecting duct intercalated\":\"Collecting duct\",\n", " \"Loop of henle\":\"Loop of henle\",\n", " \"Proximal tubule2\":\"Proximal tubule\",\n", " \"Macrophages\":\"Macrophage\",\n", " \"Neutrophil\":\"Granulocyte\",\n", " \"T lymphocyte\":\"T cell\",\n", " \"Collecting duct\":\"Collecting duct\",\n", " \"Monocytes\":\"Monocyte\",\n", " \n", "} # coarse cell type map" ] }, { "cell_type": "code", "execution_count": 4, "id": "39b160cd-4462-437f-b89e-7363e20e8ebe", "metadata": {}, "outputs": [], "source": [ "epo_uce_no_misc = epo_uce[epo_uce.obs.group != \"Other\"] # remove misc cells\n", "X = epo_uce_no_misc.obsm[\"X_uce\"] # input is UCE embeddings\n", "y = [epo_map[ct] for ct in epo_uce_no_misc.obs[\"group\"].values] # output is mapped cell types\n", "clf = LogisticRegression(random_state=0).fit(X, y) # fit classifier" ] }, { "cell_type": "markdown", "id": "ae2fd8b6-e681-4f02-9f9c-71d71d95f925", "metadata": {}, "source": [ "### Predict norn-like cells using classifier" ] }, { "cell_type": "code", "execution_count": 5, "id": "722bcc74-aaca-43f9-b5dd-2427584d7683", "metadata": {}, "outputs": [], "source": [ "kam_20_uce.obs[\"pred\"] = clf.predict(kam_20_uce.obsm[\"X_uce\"]) # predict cell types for lung disease dataset" ] }, { "cell_type": "code", "execution_count": 6, "id": "ae31d2c0-5987-46c1-9be0-d874ae6577b5", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "pred\n", "Proximal tubule 119834\n", "T cell 93556\n", "Granulocyte 52485\n", "Collecting duct 15727\n", "Macrophage 11800\n", "Endothelial 7233\n", "Norn 6005\n", "Podocyte 4270\n", "Pericyte 1316\n", "Fibroblast 623\n", "Monocyte 56\n", "Loop of henle 23\n", "Name: count, dtype: int64" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "kam_20_uce.obs[\"pred\"].value_counts()" ] }, { "cell_type": "markdown", "id": "23a839f8-88ce-4446-b4a5-961a38a264b5", "metadata": {}, "source": [ "# 
Check Differential Expression" ] }, { "cell_type": "code", "execution_count": 7, "id": "159cc172-1be5-4ad2-a4b5-2ebd2e52302e", "metadata": {}, "outputs": [], "source": [ "# Preprocess Count Values\n", "sc.pp.highly_variable_genes(kam_20_uce, n_top_genes=8000, flavor=\"seurat_v3\", subset=True)\n", "sc.pp.normalize_per_cell(kam_20_uce)\n", "sc.pp.log1p(kam_20_uce)" ] }, { "cell_type": "code", "execution_count": 8, "id": "57b9fb8a-2699-43f4-b74f-2a646e8610c8", "metadata": {}, "outputs": [], "source": [ "# Subset to predicted Norn-like cells\n", "kam20_norn_ad = kam_20_uce[kam_20_uce.obs.pred == \"Norn\"].copy()" ] }, { "cell_type": "code", "execution_count": 9, "id": "591d1aa7-c4ad-4ad9-9bef-2ed91ed19b57", "metadata": {}, "outputs": [], "source": [ "all_de_dfs = {}\n", "ngenes = 4" ] }, { "cell_type": "code", "execution_count": 10, "id": "bb5bc067-c588-4a34-a8f3-138824531828", "metadata": {}, "outputs": [], "source": [ "sc.tl.rank_genes_groups(kam20_norn_ad, groupby=\"Disease_Identity\", use_raw=False, reference=\"Control\") # DE, diseases vs control" ] }, { "cell_type": "code", "execution_count": 11, "id": "db176232-0dee-4c1a-bda2-38a19a7afcdd", "metadata": {}, "outputs": [], "source": [ "de_df = sc.get.rank_genes_groups_df(kam20_norn_ad, group=\"COPD\") # get COPD vs control results\n", "all_de_dfs[\"copd_vs_control\"] = de_df[~de_df.index.isin(de_df.iloc[10:-10].index)] # top 10 and bottom 10 genes\n", "copd_control_genes = list(de_df.head(ngenes)[\"names\"].values)" ] }, { "cell_type": "code", "execution_count": 12, "id": "c4d25f7a-4056-4e03-bcd1-9ab9cb2705e6", "metadata": {}, "outputs": [], "source": [ "de_df = sc.get.rank_genes_groups_df(kam20_norn_ad, group=\"IPF\") # get IPF vs control results\n", "all_de_dfs[\"ipf_vs_control\"] = de_df[~de_df.index.isin(de_df.iloc[10:-10].index)] # top 10 and bottom 10 genes\n", "ipf_control_genes = list(de_df.head(ngenes)[\"names\"].values)" ] }, { "cell_type": "code", "execution_count": 13, "id": 
"4e72311d-44ee-4ca5-8abc-24e9079b574b", "metadata": {}, "outputs": [], "source": [ "sc.tl.rank_genes_groups(kam20_norn_ad, groupby=\"Disease_Identity\", use_raw=False, reference=\"IPF\") # DE, all vs IPF" ] }, { "cell_type": "code", "execution_count": 14, "id": "69be495e-3b0f-41f3-95d8-47a526f00bbd", "metadata": {}, "outputs": [], "source": [ "de_df = sc.get.rank_genes_groups_df(kam20_norn_ad, group=\"COPD\") # COPD vs IPF\n", "all_de_dfs[\"copd_vs_ipf\"] = de_df[~de_df.index.isin(de_df.iloc[10:-10].index)] # top 10 and bottom 10 genes\n", "copd_ipf_genes = list(de_df.head(ngenes)[\"names\"].values)" ] }, { "cell_type": "code", "execution_count": 15, "id": "db9e2358-7431-49df-a980-ff4869d824d4", "metadata": {}, "outputs": [], "source": [ "sc.tl.rank_genes_groups(kam20_norn_ad, groupby=\"Disease_Identity\", use_raw=False, reference=\"COPD\") # DE, all vs COPD" ] }, { "cell_type": "code", "execution_count": 16, "id": "753b9891-f940-46b7-9fa2-8972b6f67136", "metadata": {}, "outputs": [], "source": [ "de_df = sc.get.rank_genes_groups_df(kam20_norn_ad, group=\"IPF\") # IPF vs COPD\n", "all_de_dfs[\"ipf_vs_copd\"] = de_df[~de_df.index.isin(de_df.iloc[10:-10].index)] # top 10 and bottom 10 genes\n", "ipf_copd_genes = list(de_df.head(ngenes)[\"names\"].values)" ] }, { "cell_type": "code", "execution_count": 17, "id": "647407f4-c9ef-405e-9742-342e31664497", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['POSTN',\n", " 'COL1A1',\n", " 'COL3A1',\n", " 'SPARC',\n", " 'LUM',\n", " 'MFAP4',\n", " 'PTGDS',\n", " 'PTPRG',\n", " 'GPX3',\n", " 'NAMPT',\n", " 'RPL41',\n", " 'CRISPLD2',\n", " 'SERPINH1',\n", " 'COL1A2']" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "gene_list = ipf_control_genes + copd_control_genes + copd_ipf_genes + ipf_copd_genes\n", "\n", "reduced_gene_list = []\n", "for g in gene_list:\n", " if g in reduced_gene_list:\n", " continue\n", " else:\n", " reduced_gene_list.append(g)\n", "reduced_gene_list" ] }, {
"cell_type": "markdown", "id": "c2abdd88-2f7b-4ac4-961a-ea417f85bb04", "metadata": {}, "source": [ "## Plot Results" ] }, { "cell_type": "code", "execution_count": 18, "id": "f11581f2-2895-419e-85da-0f53c07756ec", "metadata": {}, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAdsAAAHICAYAAAAGOEABAAAAP3RFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjkuMS5wb3N0MSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8kixA/AAAACXBIWXMAAA9hAAAPYQGoP6dpAACfUUlEQVR4nOzdd1hT1xsH8G8GG0LYAQRxIe69cG9xrzqqKO7VWu22tY46sEtrW6tWrQtx1rpxVREUt+JEUZEhyCZAGJn39wc/o5GZkJugvJ/nuc8jufecvBfDfXPPPYPDMAwDQgghhLCGa+wACCGEkPcdJVtCCCGEZZRsCSGEEJZRsiWEEEJYRsmWEEIIYRklW0IIIYRllGwJIYQQllGyJYQQQlhGyZYQQghhGSVbQgghhGWUbAkhhBCWUbIlhBBCWEbJlhBCCGEZJVtCCCGEZZRsCSGEEJZRsiWEEEJYRsmWEEIIYRklW0IIIYRllGwJIYQQllGyJYQQQlhGyZYQQghhGSVbQgghhGWUbAkhhBCWUbIlhBBCWEbJlhBCCGEZJVtCCCGEZZRsCSGEEJZRsiWEEEJYRsmWEEIIYRklW0IIIYRllGwJIYQQllGyJYQQQlhGyZYQQghhGSVbQgghhGWUbAkhhBCWUbIlhBBCWEbJlhBCCGEZJVtCCCGEZZRsCSGEEJZRsiWEEEJYRsmWEEIIYRklW0IIIYRllGwJIYQQllGyJYQQQlhGyZYQQghhGSVbQgghhGWUbAkhhBCWUbIlhBBCWEbJlhBCCGEZJVtCCCGEZZRsCSGEEJZRsiWEEEJYRsmWEEIIYRklW0IIIYRllGwJIYQQllGyJYQQQlhGyZYQQghhGSVbQgghhGWUbAkhhBCWUbIlhBBCWEbJlhBCCGEZJVtCCCGEZZRsCSGEEJZRsiWEEEJYRsmWEEIIYRklW0IIIYRllGwJIYQQllGyJYQQQlhGyZYQQghhGSVbQgghhGWUbAkhhBCWUbIlhBBCWMY3dgDvIpVKhaSkJNjY2IDD4Rg7HPIOYhgGubm5cHNzA5dL33kJed9RstVBUlISPDw8jB0GeQ8kJCSgRo0axg6DEMIySrY6sLGxAVB0oRQIBEaOhryLcnJy4OHhof4sEULeb5RsdfCq6VggEFCyJZVCjyEIqR7oYREhhBDCMkq2hBBCCMso2RJCCCEso2RLCCGEsIySLSGEEMIySraEEEIIyyjZEkIIISyjZEsIIYSwjJItIYQQwjJKtoQQQgjLKNkSQgghLKNkSwghhLCMki0hhBDCMlr1h1Qrcc+f486F0+DLCsFwubBwrYmu/QaAx+MZOzRCyHuMki2pFlQqFf7Z9AdqyzMxwMMZgAkAIDf3KQ79tAhNBo2Fd6PGxg2SEPLeomZkUi0c2fYX+tgq0NzDWeN1GwtzDG3gjucn9yH5ZZKRoiOEvO+Mlmy9vLxgaWkJa2truLq64pNPPoFCoQAA/Pnnn/Dx8YGFhQW8vLzw/fffQ6lUqsveu3cPPXv2hJ2dHezs7ODr64vr169j5cqVsLa2hrW1NczMzGBiYqL+eebMmQgNDQWHw8GCBQs0YjE3N0dsbKwhT79SFAqF+ndFypeVlQVb8QtYW5iXekzPOi64evKoAaMihFQnRm1GPn36NDp16oQnT56gS5cu8PHxgVgsxrp16xAcHAxfX188ePAA48aNQ1JSEjZs2AAAGDx4MObPn49Tp05
BoVDg4sWLMDMzwzfffINvvvkGALBq1So8evQI27ZtU79faGgobGxssGHDBnz++edwcHAwxmlXSuTNm3jx9BE4HA5EXnXQqm07Y4dU5V0+dQy9a4nKPIbD4YCTkWigiAgh1U2VaEauV68eOnfujGvXrmHZsmX4888/0aVLF/D5fDRr1gxBQUHYtGkToqOjkZaWhtjYWEybNg18Ph/m5ubo1asXmjZtWqH3cnZ2xuDBg/HLL79UOD6pVIqcnByNzVheJsRiYO/uGNCrG1KTXhgtjrIoFAosW7rU2GG8JpeCyy3/o86TS8EwjAECIoRUN1Ui2T5+/Bjh4eFo1aoV5HI5BgwYoLG/efPm8PT0xPnz5+Ho6Ig6depg3LhxOHr0KDIyMrR+v4ULF2L9+vXIzMys0PGBgYGwtbVVbx4eHlq/p75Y2QrxKPopop/GwNzaxmhxlIXP5+Pb774zdhivcSvWgKPi8cHhcFgOhhBSHRk12fr5+UEoFMLPzw8BAQEQCARwdHQscRiGi4sL0tPTweFwcO7cOTg7O+Ojjz6Cs7MzBgwYgOTk5Aq/b7169TBw4MAK390uWLAA2dnZ6i0hIaHC76VvXbr3BGwdobASonuvPkaLozwVuZM0lMYdu+F2Qkq5x6mELgaIhhBSHRn1ihgSEgKxWIyYmBgEBgbCyckJ6enpGp2hXklJSYGjoyMAwNPTExs2bEBcXByioqKQkpKCefPmafXeCxcuxJ9//lmhu1szMzMIBAKNzZh8GjREQxqmUmGeXl54qrSASqUq9ZjbL9Lh07GHAaMihFQnVef2A0CHDh1gYmKC48ePa7weGRmJuLg4dOvWrVgZb29vBAQE4P79+1q9V/369dG/f3+sXr26MiGTd8TgaXOx53EacvILNF5nGAYRz5Mh92mH+vQFhhDCkio1qYVQKMQ333yD2bNnQygUwtfXFw8fPsT48eMxefJk1K9fH1lZWfjtt98wceJE1KxZE0lJSdizZw/atm2r9fstXLgQvr6+NIymGrC0tMSHXyxC2OkQ5MY8BE9WCHC5UAic0HrkNLjXMN5zeELI+69KJVugKAEKhUJMnToV8fHxcHFxwaRJk7Bw4UIAgKmpKZ49e4YuXbogMzMTAoEAfn5++Pnnn7V+rwYNGqBfv37Ys2ePvk+DVEE8Hg/d/QYCGGjsUAgh1QyHobEOWsvJyYGtrS2ys7ON/vyWvJvoM0RI9VKlntkSQggh7yNKtoQQQgjLKNkSQgghLKNkSwghhLCMki0hhBDCsio39IcQtty5fgMR27ZD8ugxFPl54JqYwsTBHh69emDQlCkwNy99CT5CCKkMGvqjAxq28W65+t85hK39DfxrN+BUKCu2X84weOHlAZd+fTFp+fcwMTFhPSb6DBFSvdCdLdGrlJRk3AoLBVQK8C2s0Kl3P1hYWBgtnlPBu3H/uyVwzcgq9RgTDge14l5AvmEzfoyOxvxdO2FpaWnAKAkh7zu6s9WBMe9KGIZBamoqGIaBs7NzlVldJyb6Ee6eOQonWRbaeTqDw+FAKlfgQlw6CoSu6D12EmxsDLsk4KUTIbgyey5cxNkVLqNiGCT264Uvg4NKXH1KX+jOlpDqhZKtDoxxoVQqlQjZGwRF8nPUMC36L0uUccATeaHfqPHg843XSPHwzi2knPsXXWo6lrhfpVLhQHQq/GZ/BaFQaJCYGIbByq49UPPuA63LFjIMHH9ZhaFTp7AQWRFKtoRUL1XjtoiUSaVSIfjXVehmloNB3q5o4eWGFl5uGOjtiu7mEgSvCSxxWUJDKCwsRPSJfaUmWqBobdsP6rvg9NZ1Bovr7MF/4aBDogUAcw4Hz44e03NEhJDqjPVku2nTJjRp0gRWVlbw9PTExIkTERsbCwA4cOAAmjdvDktLS7i5uWHevHnIz89Xlw0ICMDy5cuL1alQKDBy5Eh4eHiAw+Go63tbnz594OLiUmxVnwMHDqB9+/YwNzdHQECAvk6VNacP7sPQmgJ
YmJkW22duaoIRdexw6oBxFlMIPfYv+tZ2Kvc4DocDH1MpYp5EGyAq4MHBf2FdifLcq9dx98ZNvcVDCKneWE22y5cvx6JFi/DDDz8gIyMDUVFR6NixI86dO4fg4GBMmzYNS5cuhVgsRnh4OG7fvo0RI0agIi3bnTt3xr59+2BmZlbi/pcvX+L8+fOQyWQ4ffq0xj57e3t8/vnnmD17tl7Ok23SF09haV480b5ibmoCedIzA0b0muzFU5iaVKwJu5GrA+6Hn2U5oqImZPGtyErV4VQoQ+Rb6yoTQoiuWHvQJxaLsXLlSgQHB6N///7q16dPnw6VSgVPT08sXboUQ4YMAQDUqVMHe/fuRa1atXD27Fn07t279KD5fHzyySdlvv/u3bvRrl07NGvWDEFBQRox9OjRAwDw9OlTZGZmlnsuUqkUUqlU/XNOTk65ZfRFpVLBVJYHoOy7R0tFIaRSaalfPtjCkxUAqPgzR768kL1g/i8/Px+8vLxK16OQVL4OQggBWLyzvXz5MmQyGQYOLL52aHR0NBITE9WJ9hWRSIT27dvj3LlzlX7/oKAgjB49GmPGjMHhw4chkUh0riswMBC2trbqzcPDcAuNczgcMOCUe5ySYVjtPVsqTvmxvVWAlTDexOPxwGgdV3EcHnVpIIToB2tXk4yMDDg6OpbYSzY9PR1AUXJ9m4uLi3q/rqKionDnzh2MHDkSnTp1gr29PQ4ePKhzfQsWLEB2drZ6S0hIqFR82uBwOJBZ2JZ7nNTC1ig9khVmFX8yyjAMlOaVeZJaMebm5mAq2cOXYRjwbaiXMCFEP1hLtg4ODkhPTy/WOenVPgBITk4uti8lJQWOjqX3bK2InTt3okuXLnB1dQWHw8GoUaMQFBSkc31mZmYQCAQamyHZezdBsji31P1pOXmwrdvIgBG9JqjbGDn5FWsavhqfinZ+g1mOqIhDh7aVKp8ksEbXD8foKRpCSHXHWrLt0KEDTExMcLyETib169eHm5sbDh8+rPF6cnIyrly5gu7du+v8vgzDIDg4GNeuXYNIJIJIJMKWLVtw7tw5vHz5Uud6jalL3/64LrdBYmbxZ8XJ4lxcKjBHN79BRoisKLZT8eU/w5bJFXhp7gxnZxcDRAW08x+PzAp23CqJWZdO8KxVS48REUKqM9baHYVCIb799lvMnj0bZmZm6N69O5RKJfbsKRqismrVKsydOxc1a9aEn58fEhISMHnyZHTq1Emjc5RCoUBh4es7Jz6fDz6fD6lUqu61LJVKUVhYCHNzc4SHhyM1NRV3796FtfXrJsu+ffti9+7d+PTTT6FUKiGXy6FQKKBUKlFYWKiut6oaOmkGLl/4D7ce3IZJQdGMSHJzAZwatMCIHr2MFhePx0OXCTNxOGgDBtdzAqeEZ6X5UhkOvSjAmHkLDBZXm86dcbZNK9hHXNW6bDaPh6YfjGQhKkJIdcX6DFJ//fUXfv/9dzx79gwODg7o0aMHvv/+e9SsWRN79+7FypUrER0dDVtbW4waNQqBgYGwsrICUDTOdvv27Rr1TZkyBZs3b4aXlxfi4uI09jEMg+nTp0Mmk2Hbtm0a+zZs2IC//voLt27dwrZt2zBp0iSN/YsXL8aSJUsqdE40+09xYrEY4Yf3QZUciya2JrCxMENyTh6eK8xg5umN3sNGGXxqyeh79/Hvh/5wT0iscJlCANIpEzDnl5/ZCwz0GSKkuqHpGnVAF8rSKZVKPIp6CEm2GE6ubqhdu45R44mMiMDJj+ahRkxsucdKeFzIPhyNj9auYf2LAX2GCKleKNnqgC6U75a4p09xYs1aZJwLhUdSMnhvNXVnmvBR2KYVfEYMw+Apkw0SE32GCKleKNnqgC6U7yaJRILDf25AdlQUFHn54JmawMTBHi0/GInWnToZNBb6DBFSvVCy1QFdKEll0WeIkOqFpsghhBBCWEbJlhBCCGFZ1R1YSoieyWQyHN++HRl370Gelw+eiQlM7e3hO/5DeDcyzgx
chJDqgZItee+lJCXh2K+/I/XsWdSIiYPNW72RT2wPwomO7dF41Ej0+uADI0VJCHmfUbIlesMwDK6GnkNmzCNAqQT4Jqjbvgu8GxrvrvH+jRs4PnsuvJ48K1pMvoQZrlwLCoGzoYi5cBEbb9zE9FWBJc6ERQghuqLeyDqgnqTFhR09iKzbl9DWioGzjYX69UfpOXjGE6J2z0Fo3LpyiwNo61lUFP750B81Yyu+SlM+GMinT8H0HwJZjIw+Q4RUN3Rn+465d/s2XkTfBwC41W2AZq1aGzki4NjWjWia8QgdRZbF9vk4CuADFW6d2oXrkly06dbTIDExDIP9n36OWlokWgCwBAfZW3fgv3Zt0HP4cJaiI4RUN9Qb+R2RmpKCPb+uhPmDC+jjwKCPAwPrxxex99eVSH6ZZLS4Lp06jkbpj+AuKJ5o39TSyRp5oYeQmBBvkLiunDsHu+u3dSprK1fgwb4Deo6IEFKdsZ5sN23ahCZNmsDKygqenp6YOHEiYmNjAQAHDhxA8+bNYWlpCTc3N8ybNw/5+fnqsgEBAVi+fHmxOmNiYtCmTRvY2dnB3t4eQ4cOLXH5vPr166Nly5bFXl+/fj1atmwJExOTCi8+YExKpRL/7dyADxrXQG2Rvfp1L2d7jGxcA2HBmyGXy40SW9qtS/C0LTvRvtLJ1QY3ThxiN6D/u7VrN4RKpc7llRcvI+bxYz1GRAipzlhNtsuXL8eiRYvwww8/ICMjA1FRUejYsSPOnTuH4OBgTJs2DUuXLi1aMSY8HLdv38aIESNQ3mNkJycn7Nu3D5mZmUhOToaPjw/mzp2rccy1a9eQlJSE+/fvIyoqSmOfq6srlixZghEjRuj9nNlw4dQJ+HmLSt3vV98VF04WXzeYbVF378KHI6nw8RwOB7ykp1AoFCxGVbTkovjS5UrV4ZaXj/CgYD1FRAip7lh7ZisWi7Fy5UoEBwejf//+6tenT58OlUoFT09PLF26FEOGDAEA1KlTB3v37kWtWrVw9uxZjTVt32ZjYwMbGxv1z1wuF8+ePdM4JigoCEOGDEFWVhZ27tyJlStXqvcNHToUAHDixIkKnYtUKoVUKlX/nJNT/mLp+lSQnAAbT+tS91uam0Ea/8KAERWJe3gHve1tyj/wDZ5cGV6+fAkPDw+WogIyMzNhlpVV6Xrk4srXQQghAIt3tpcvX4ZMJsPAgQOL7YuOjkZiYqI60b4iEonQvn17nDt3rkLvIRQKYWFhgZ9//hmffvqp+nWFQoG9e/di9OjRGDNmDIKDg8u9Wy5LYGAgbG1t1RubiaIkHKb85lCOSmWASDQxOjTTmvG4KCgoYCGa16RSKbjyyt89q2TGaZonhLx/WEu2GRkZcHR0BJ9f/OY5PT0dQFFyfZuLi4t6f3nEYjGysrLwww8/wNvbW/366dOnIZPJ0LdvXwwdOhQpKSkIDw/X8UyABQsWIDs7W70lJGjXw7Wy5CYW5X5ZkJuYGyia10xtbFGoZVJLkarg7OzMUkRFhEIhZJYVe45cFr6VlR6iIYQQFpOtg4MD0tPTS3w+5+DgAABITk4uti8lJQWOjo4Vfh+BQIAJEyZgyJAhUP3/7i4oKAhDhw6FqakpbGxs0L9/fwQFBel4JoCZmRkEAoHGZkite/TDtZjEUvfffJ6EZp17GTCiIp369seFlPzyD3xDlkAEoVDITkD/Z2trC46Pd/kHliEfDEStWugpIkJIdcdasu3QoQNMTExw/Hjxjjv169eHm5sbDh8+rPF6cnIyrly5gu7du2v1XgqFAsnJyZBIJEVrlh4+jH/++QcikQgikQinT5/GgQMHNJ67vkvca9RAoVsDPE4qfsf/NCUDOc514VW7tsHjMjMzg8y1ToWb6DPzC2HflP2JLTgcDmr07Q1lJR4dpPjUQ98xY/QYFSGkOmMt2QqFQnz77beYPXs2Tp48CalUivz8fPz999/Ytm0bVq1ahcWLF+Pw4cOQyWR49uwZRo8ejU6dOml
0jlIoFCgsLFRvCoUCoaGhuHXrFpRKJbKysvDZZ5+hVatWEAgEOHjwIOzs7PD48WNERkYiMjISjx49Ap/PVyf+V3UqlUqNf1dlPQYNg9THF8eei3HyYTxOPozDsediSOq0Ra8hI40WV9cxE3A4qfy7W5lCiVOF1ujUx88AUQFDZs9CvKuLTmUZhoGody/weDw9R0UIqa5Yn67xr7/+wu+//45nz57BwcEBPXr0wPfff4+aNWti7969WLlyJaKjo2Fra4tRo0YhMDAQVv9/VhYQEIDt27dr1DdlyhQMGTIEX375JRISEmBpaYmuXbvi559/Rs2aNdGnTx/4+voWGz/79ddfIzo6GgcPHsSSJUuwdOlSjf1bt25FQEBAhc6JptrTlJyUiPMbfkFfBy5sLcyK7Y/NkuAKxx6j5n8DExMTg8W1+8efkP/jalgrtes8FtPQBzOOHIT9/x93sIE+Q4RULzQ3sg7oQlmcUqlE+MljyH54C/zsVHCUSqj4plA4uqOWbw80a9POKHFt/OIr8LbugE0FE25MnVoYuf1v1GvUkNW46DNESPVCyVYHdKEsH8MwVWblnD0//4LnW3fAM/El+KXElGbKR177thi3dg1qeHmxHhN9hgipXijZ6oAulO+e/Px8HN34F+JDTkIe/RRmBQVQ8E0gtxPCsUtHtJ/gj+bt2xssHvoMEVK9ULLVAV0o320FBQUQi8UwNzeHra0tuFzDr8dBnyFCqhdaYo9UOxYWFrCwsCj/QEII0RNaYo8QQghhGd3Zkmrl3vXreHbxIlT5+eCamsDE0RE9xoylO11CCKso2ZL3nkwmw5kdO5Bx+iTc7txFA9XrCUxkDIP9m/6CSecuaDMxAHUbsjvkhxBSPVEHKR0Yq3OLSqVCxH+nIUl9CYZhYO3sio69+hqlg09JUlNScOXYPzDJE4NRKMDwTMATeaDbkJEwNzf8QgkAkJyQgCNzZqPdg/swL+f39MjcHLzZczBw1mzW46IOUoRUL5RsdWDoC6VcLkfIjs1g4qLQUQDY/X+WJnG+FBdzGHBq+qCv/1SYmRWfvckQcnNzcXLTb3DLT0U7V6HG+FqpXIGw5FwoPBtiQMB0g469TUtOxpHx49ApLrbCZVK5XGTOnoMhn8xjLS6Aki0h1Q0lWx0Y8kJZUFCAA4HfYqQDYMovea5euVKJA2nAkC+XwsZGu8XcKys7W4wTq7/HSA8bcLmlJ9KcAilCCq0w9tNvDZJwGYbBX6M+QNfbt7R+vwRTM1j/+BPaDxjAUnSUbAmpbqpG+yMp1aE1KzDaiVNqogUAEx4Po525OPLrCgNGVuTEn7/gA8+yEy0ACCzM4GeRh5Cgvw0S15UzZ9AwMlKnxO4hk+LJnt0sREUIqa4o2VZh92/fRGtVBngVeCbL5XLgy8vB7SuXDRBZkah7d9GMJ6lwQhOYm0EZc7/ENY71LeafA3CG7o029jdv4OnDh3qMiBBSnRk92YaFhaF9+/awtbVVrwr0/PlzLFmyBCYmJrC2toZQKESPHj3w8I2LX0REBDgcDn777TeN+kJDQ8HlcmFtbQ0bGxs0adIER44c0Tjm33//Rdu2bWFlZQU3NzcMHz4cd+/eNcj5auNZ+BnUtrOu8PGetlaIv3KOxYg0RYefQX1HW63KdHW2QOixQ+wE9H/p6ekwreSXjrpyOW7u3KmniAgh1Z1Rk212drZ6ubysrCzExcXh448/Vq8jOnHiREgkErx8+RLu7u6YNGmSumxQUBDs7OwQFBRUrN7atWtDIpEgOzsbH330EcaOHQuxWAwA2LlzJwICAvDJJ58gNTUVsbGxGDNmDEJCQgxyztrgpSVqXyZd+zK64mWlaF3GyswUspdxLETz2pM7d+CZm1v5ipKTK18HIYTAyMk2OjoaZmZmGD58uPpudNiwYfD09NQ4zsLCAmPHjsWDBw8AFPXO3bdvH3777TfcunUL0dHRJdbP5XLh7++P/Px8REdHQ6VS4euvv8bSpUs
xbtw4WFlZwdTUFKNGjcJXX31VapxSqRQ5OTkamyFwlHLtyyi0L6MzHeKrVLkKyk5Lg5UehkMpC/L1EA0hhBg52Xp7e0Mmk2Hq1Kk4c+ZMqUksLy8PwcHBaN68OQAgJCQEPB4PY8eORdeuXUu8uwWK1ljdunUr+Hw+atasicePHyMpKQlDhw7VKs7AwEDY2tqqNw8PD63K64ynw5wjupTREYen40LwuparIGs7OxSotFswviRcM+OMDSaEvH+MmmxtbW0RFhYGqVQKf39/ODk5Yfz48cj9fxPgzp07IRQKUbt2bYjFYmzbtg1AURPyiBEjwOPxMGbMGOzatUuj3ufPn0MoFMLCwgLz58/Htm3b4OLigoyMDACASCTSKs4FCxYgOztbvSUkJFT+5CtAYaddnLqW0ZVC4KB1GalcAZ6DKwvRvFa3aVMkWFhWviInp8rXQQghqAIdpBo3boydO3ciOTkZERERiIiIwIoVRUNY/P39IRaLkZKSgmPHjqFu3brIycnB0aNHMXr0aADAiBEjkJCQgIiICHWdtWrVglgshlgsxujRoxEWFgYAcHAoSg7JWj6LMzMzg0Ag0NgMwb1dFyTmVLwpM0VSAJdWnViMSFPN9l0Rm6nds9ELyRJ0GzKCpYiKiFxdkd+uXaXqiOPx0HjMGD1FRAip7oyebN/UqlUrDB8+HPfv3y/1mAMHDqCwsBCjRo2CSCRCw4YNoVKpSmxKtrS0xB9//IEDBw7g9u3bqF+/Ptzc3HD48GE2T0NvWvl2xkWpBSoy7wjDMDifZ4p23XoYILIizdu0x/VC0wofXyhXQOZW1yAzXXkMHoKsSszXkty0ORq3bq3HiAgh1ZlRk+2jR4+wZs0aJCUlASjqMHX06FG0bdu21DJBQUGYN28e7ty5g8jISERGRmLr1q3Yt28f5PLiHW9sbW0xbdo0BAYGgsvlYtWqVVi8eDF2796NvLw8yOVyHDx4ED/++CNr56krDocDv7nfYP9LKVSq0hMHwzA4mCxD34+/Nuh0iADQbcrHOBqfXe5xUrkCB9MZDJg00wBRAZ0HD8adBrotKpDC48FjBLt334SQ6sWoydbGxgYRERFo1aoVrKys0KtXLwwYMABff/11ice/ePEC4eHhmDt3LkQikXobO3YszM3NSx2+8/HHH+Po0aOIjo6Gv78/tm7dijVr1sDZ2Rk1a9ZEUFAQ/Pz82DxVndnZ22Pg1ytxjCPC6aRcSOWvJ4SQKZQ4k5iDI4wz+n25HI5OzgaPT+TqBt9ZX+FAigKP0oonXYZhEPEiE8dlthj71VLw+YbpwMXlctFv9RpcF2n3fDgbQOKYsej+/8cUhBCiDzQ3sg6MNa9tQUEBwo/9C2VuNhhGBa61LboMGg5LSz10BtKDxw/u41HYaZhKsqBUyMExMYXS3hUdBo6Ak7PhvwgAwLP79xE6by58Y2PBLeeu/wWfj/Rx/hj1LfvzN9PcyIRUL5RsdUAXyndLdnY2zm5Yj5zz59Hg6RPYvZFIGYbBPWtryDr4ot7ID9C2Z0+DxESfIUKqF0q2OqAL5buJYRiEHT6MjDuRUOblg2tqAo7QDp39/eHk4mLQWOgzREj1QslWB3ShJJVFnyFCqpcqNfSHEEIIeR8Zbm4/UmkZGRk4sW8/pLkSQKWCmcAG/T4YCSea6ahCCgsLsXfjJqRFP4U0Lw98UxOY2wrRfewHaNqypbHDI4S8x6gZWQeGbgKMOB+Ks9uC8OzMeVi+TAUHRR18GDDId3FCrT7d0HPiOHTq0cPg42zfpFQqEX74X+TFPQcjkwHm5nBv1RYtOnU2WkwAEB0VhcN/bsSTU//B9Gkc+G/9jiQW5nDo0h6thg/BiEkT1atOsYmakQmpXijZ6sBQF0qZTIalk6cjaf8RWMjKXnC90IQH5xEDsGTrZpibG3YC/Yy0NIRt2QjVtUtol/USAt7rpxPxSuCRZz1YdOyGPpOmwsSE3UUI3nZs9x4c+fxbWCanl3usjGF
gObgPvg/ewfpwKkq2hFQvlGx1YIgLpVwuxxfDRqHg+FnwULG7VRUYmPXrhp8O/wNT04pPo1gZT+5E4v7SBeiRnVzmXbVUpcIJr8YYumYdbG21W3BeV4d37sKJuV/CIkdS4TIMwwC9OuPHowdZnVaSki0h1Qt1kKqiVs78SKtECwBccFB4MhTLphlmSsS46Gg8XvQFeuaklNt8bcblYmjcAxz+ZAakUinrsUVeu4YTXy3SKtECRVNkMmfDsWrmRyxFRgipjijZVkHRjx7h+YEjWiXaV3jgIP6f43h49y4LkWmK+GEpuuRlVPh4DoeDgfGPcfynQBajKnJ8w2ZYpJTfdFwSLoeD+CMhiI2J0XNUhJDqipJtFXR4/V+wzMnTubxVXgGObtyix4iKe3DjOnxiorQux+dywL1+CTKZjIWoimRkZCDm9LlK1WGdlYN//tigp4gIIdXdO5lsvby8cPHiRfXPoaGhqFu3bpnHBQQEgMPh4Pz58xrH9Ph/D97Y2FhWY64oqVSKqJAzla7n8cmzyM+v+Fq4Wtd/cC/q6Nhpt31OGs7vCdZvQG/Y9/ufsE5KrVQdHA4H0afOsvqlgBBSfbyTyVZX9erVQ3Dw64t8YmIinj9/brDORBVx+8YNqJ48r3Q9nJh4XAkP10NEpdT/+IHOZS15XEjv3NRjNJpSoh7pZQiUPOoJnkRH6yEiQkh1V62S7bBhw3D8+HF1B53g4GCMGTOm3AuzVCpFTk6OxsaWlBeJMNPhWe3bTAGkJ72sfEClyde9mRsAmEqWL4tUop+6zcBBcnyCXuoihFRv1SrZ2tjYoFOnTjhx4gQAYNeuXRg/fny55QIDA2Fra6vePDw8WIvR1MwUKj3UowJgYsbiHTu3ch8dDosTR3D1VLcKDMwsDDtmmRDyfqpWyRYAxo0bh127duHBg6Jm0EaNGpVbZsGCBcjOzlZvCQns3e3UqF0bhbzK/7cUcjhwr1VLDxGVjBEIK1VeZcPeWFszGxu91CM14cPdy0svdRFCqrf3Itny+XzI5fJir8vl8mIzFvn5+eHy5ctYt24dxo0bV6H6zczMIBAINDa2NG7SBII2zStdj2WrJmjdrl3lAyoFp3lr6DofykuFCm69++k5otfqd+sMuR7mahG2awUvSraEED14L5Kth4cHUlJSNCZLKCgoQGpqKjw9PTWONTU1xYABA/DXX3/hww8/NHSo5eJwOGgysB9U0D1ZMGDQZEBfcCvZ1FsW3wlTcIOrWxPrvRp10aY7e4u0Dw+YAGUj70rVoWQYNBnYz6hzTRNC3h/vbLKVyWQoLCxEYWEhXFxc0LRpU3z33XfIz89Hfn4+vv32W3To0AGurq7Fyi5duhQXLlyAu7u7ESIv39iPZqPQo3jcFVXg6ozRH8/RY0TFubi6Iq1leyhU2n0pSFExsOs7kNUkZmJigvp9e+p85w0ABV41MGbOLD1GRQipzt7ZZNuzZ09YWFiot48++ghPnjyBl5cXvLy8EBcXh927d5dY1tXVFR07djRwxBVna2uLnp9/DKmZ9pP2S01N0OXTOXBwcGAhMk3Dvl+FwzXqQ1nBpJapZHCr60D09A9gNzAA47/8DIoW5T+PL0mhCR8dZ02BlZWVnqMihFRXtBCBDgw1ifwfi5bg1o9/wFxasYkVpGYmaDx/FuYHLmctprcVFBRg/+dz0fLhTXiW0gmYYRjcAR/p/YZj2OdfG6xpNuruXawdOxHmjyo+7aKUz0O9udPw6U8/sBgZLURASHVDyVYHhrxQBv+5Hud+/wvcR0/BL2X8rQIMlN610HX2NEz45GNW4ynNrfBwPD/yDyzuXEPN/BxY8bjIVqgQ4+gKtPFF63EB8GCxd3RpYp48wZopMyGPuAHzMj7pDMNA4uyAdnNnYPo3X7MeFyVbQqoXSrY6MPSFUiaTYf+Wv3Hr32NIuXEbqhwJOAAYGyuIWjVHy2ED8cHUKawuCVdRubm5iHv2DHlZWbB1cUHtunWNPkM
XwzA4efAgLu05gJfnwmGdlaO+u5YC4DRrAB+/Phj98Wy4iEQGiYmSLSHVCyVbHRjzQpmTk4OsrCwwDAN7e3u6UGsp7vlzXA29gMJcCUzMTOFUwx3d+/UDj8VJNkpCyZaQ6oVv7ACIdtge5/u+q1mrFmoaoTmbEFK9vbO9kQkhhJB3BSVbQgghhGXUjPyOkMvlOL9/N6SR18DNlwAAVJbWMG3aGt1HfWj0TkjviqTEF4gMPQuOQgoOjw9rFzf49mR3ti1CCKEOUjowZOcWhmEQsvF3KCP+QydFNiz5mh15ChRKXOQJwLTrjoFz5hl1esHCwkKEHjsEZXY6oFKC4fJhW7MeOvboZfRkFnnlEuIizsFFkozWLrbq31NmfiEixCpwa3qj15gAWFhYGCQe6iBFSPVCyVYHhrpQMgyDPYu/Ro8nVyE0Kbu3bI5chTO1WmLM8p8MntgYhsGxoL9hkhaHLrWcYfbG4g8ZOXm4nJwDx6bt0bEXe4sPlOXs/mC4Pr0GH4fSZ4RSqlQ4+FKG3h9/DQdHJ9ZjomRLSPVCbWdV2OE1P6DXk2vlJloAEJhw0Tf2Jg79vNIAkb3GMAx2//ELOptJ0NvbXSPRAoCDwAoDvV3hFH8H/x3+x6CxAUD48cOoGVN2ogUAHpeLkW5mOPX7KhQWFhooOkJIdUHJtorKysqCzbXzEJhU/L/Ims+D3c0LSE9LYzEyTacP7kFfJz5sLMteAaiOsxCOKY/w4E6kYQJD0XNu8ZWzqG1XsTmOORwOhotM8N/+YJYjI4RUN0ZNtl5eXrCyskJeXp76tfz8fNjY2KjXEfXy8oKlpSWsra1hbW0N0Rsz/AQHB4PD4eDIkSMa9W7btg18Ph/W1tYQCARo164dLl68WOz9/fz8wOdXzT5i4UFb4WtSfI3e8nQwVeJS8Db9B1QChmFQEBcNoXXFnnM2dnPE02thLEf1WujRf9HNRbtnsCY8HhSxUZVaMYgQQt5m9Dtbd3d3HDp0SP3z4cOHiy2Ld/r0aUgkEkgkEiQnJ6tfDwoKgp2dHYKCgorV261bN0gkEmRmZqJHjx4YOXKkxgX00KFDyM3N1f8J6YnqwU1wdejsxOFwwNy7yUJExV0JO4/2ImutylhL0g32ey98dh+WptqvnNTKQoGbly+xEBEhpLoyerIdO3Ysdu3apf45KCgI48aNK7dcamoqzpw5g3Xr1uHo0aPIyckp8Tg+nw9/f3+kpKQgIyMDQFGv2YULF2LVqlUVilEqlSInJ0djYxsvJ0v3srli/QVSBnHSCzgJtEu23vZWeBb9mKWINPEK8so/qASuAkukxT/XczSEkOrM6Mm2R48euHfvHtLS0pCWloa7d++iV69e5Zbbs2cPmjZtirFjx8Ld3R0HDhwo8TiZTIbt27fD3d0djo6OAIBVq1ZhzJgxqFGjRoViDAwMhK2trXrz8PCo+AnqSqXSuShHpTBIMygH2r+HKZ8PmUzKQjTFMSql7oUr8fsnhJC3GT3Z8ng8jBw5Env37sXevXsxYsSIYpPC+/n5QSgUQigU4tNPPwVQdAc8evRoAMDo0aOLNSVfuHABQqEQ7u7uuHr1Kv79918AQGxsLPbt24fPP/+8wjEuWLAA2dnZ6i0hIaEyp1whKnPdx3uqzC0NM97W1BwKpXYJLVEsgWsNT5YCeoupbqsgyRRK8Cxp4XhCiP5Uid5B48aNw9y5c8EwDH777Tco37qAh4SEoFOnTuqfo6OjcePGDezbtw8AMGbMGKxatQovXrxQ36127doVZ8+eLfZe8+fPx7Jly2BuXnbv2TeZmZkZfPk6pnYD4OlVncqq6jTQczQl69R3AC5s/AE961eshQAAElRmaGmIlgEASicPMKokrb94XEiWoOukASxFRQipjox+ZwsArVu3RmZmJrKystCmTZtyj391F9u+fXuIRCL07t0bKpUKwcHlD9kIDQ3FnDlzIBKJ0KZNGyiVSohEIjx48KDS56FP3oNG4Gmh9s2
gsYUK1Oo3hIWIirO2tobExrHCTdaSgkJYenqzHNVrbQcOx7WXYq3LFbrUMthMUoSQ6qFKJFsAOHjwIA4ePFihY3ft2oWff/4ZkZGR6m3FihUl9kp+2+PHj9VlTpw4AR6Ph8jISNSvX7+yp6BXDVu0wl1RXa3L3XSqjWbtfVmIqGSdh47FscdJ5R6nUCpxJC4HPQYa5osAAIhc3ZAo9IBci6buyykSNOs7iMWoCCHVUZVJtg0bNkTDhg3LPS4iIgKpqamYNm0aRCKReps1axaePn2Ku3fvllne2dlZXcbJqWhaPpFIVCXH2/ZcsAzHuMIKHx/CEaD710vZC6gETs7OaD1qCg48eAFJQckzL73IzMaBmGx88NEXBl+kfcjsz7A/jYFMUX7CvZUmgWmnQahV13B334SQ6oHmRtaBIee1ffkiAee+/xo9cxJhZ1pyohLLFfjP2g1dvw2E+/8nAzE0hUKBsFMnkPs8Cvz8bPAByDkcKAVO8GjeDi3bdTBKXEDRTFKHN/wK+7RYdHazBZer+Qw3MTsPNwtMULvvMDRrZ5hWAZobmZDqhZKtDgx9oVSpVLh0/Agyws7A9vlDuKik4ICDFK4pxF4+sOvUC50HDzP6yjqvMAwDpVJZ5VoLsrKycPHQfnDTEgCFDBwuDwpLG7i16oRWvp0MumISJVtCqhdKtjow5oVSLBYjNSUZDMPA2UUEOzs7g74/0Q9KtoRUL1Xr1oOU69V4Y0IIIe+OqtHuSAghhLzHKNkSQgghLKNkSwghhLCMntmSaoVhGMTHxyMjMRHmNtbwrFUb1tbarVxECCHaomT7DklKiMfVoJ3gZGUWvWBnhzYfjod7TS+jxvUukEgkOL91MwovXUCN54/hAAaFKhUu2NijsFU71B0xyqAzbxFCqhca+qMDQw/beBETg0u//AD7yOtoJi9UjwdlGAb3+GbIaNYa7T/9Ep716rEey7vo/uUIRC1fiO7ZqeCXMpY2VsXBgy59MXbFDwYZr0xDfwipXijZ6sCQF8qn9+/j7hdz0TkzpczjIuyc0WDVang3b8FqPGWRy+W4EHIMhS9jwVUpoeSbQtSoJVp36GjQCSPeFHXzBhIWzEPbguxyj81XqvCfb2+M/2kN6/FSsiWkeqEOUlVYYWEhLi/4rNxECwC+Wam48e2XyM/PN0BkxT1/+gSHVy9FW2kc/NzM0beGFfqLTCB6cgm7fllulLgYhsHtVUsrlGgBwJLHRafLZ3FuT/mrRxFCiDaMlmy9vLxgaWkJa2truLq64pNPPoG1tbV643A4sLKyUv8cHx8PADh8+DA6dOgAKysruLi4oHPnzti9e7e63m7dusHc3Bw2NjYQCoXo1KkTtm3bpvHeYWFhaN++PWxtbeHg4IAePXrg+fPnhjz9CvlvxzZ0S46v8PFdUxPx39YtLEZUsry8PNw5uB3DGrrDylxz3V83ewFG17PH4b/WGjyuiBPH0C5Ju/9XOy4HWWdCWIqIEFJdGfXO9vTp05BIJAgLC8O+ffvw008/QSKRQCKRwMzMDA8ePFD/7OnpiZ07d2LixImYPXs2kpOT8fLlS/z000/FFonfvHkzcnNzERcXh3nz5uGbb77B119/DQDIzs7GkCFD8OWXXyIrKwtxcXH4+OOPDb4aTUVILpyDuRbPD025HOSHn2cxopJdOPYv+tUTlbqfy+WiqaUSjx8ads3glyFH4cDVvjm4ZvQ9RN2+xUJEhJDqqkr0Rq5Xrx46d+5c5gLuKpUKX331Fb7//nv4+/urX2/fvj3at29fYhlbW1uMHDkSpqamGDlyJD777DPExsbCzMwMw4cPB1C0APqwYcPKjE8qlUIqlap/zsnJ0eb0dJKfnw/LmCdal7ONeYqsrCyDzpmsTE2AiZewzGN8XB1w4tpF1G/YyDBBAeDGxehUrh5HhYjLl9CgRUs9R0QIqa6qxDPbx48fIzw8HM2bNy/zmJcvX2LIEO0XH+/fvz9UKhWuX78Ob29vyGQ
yTJ06FWfOnKlQ4gwMDIStra168/Dw0DoGbeXk5MBaLtO6nLVChtzcXBYiKh1XJa/QcTyVguVI3vLGFyRtqaQlr81LCCG6MGqy9fPzg1AohJ+fHwICAjB58uRSj83IyABQtND7K23btoVQKISFhQXi4uJKLcvn8+Ho6IisrCzY2toiLCwMUqkU/v7+cHJywvjx48tMUAsWLEB2drZ6S0hI0OFstSMQCCAxMdW6nIRvChsbGxYiKh3Dq1icKr7251MpZuY6F+WaW+gxEEJIdWfUZBsSEgKxWIyYmBgEBgaWOb7R3t4eAJCcnKx+7dq1axCLxWAYBmWNYFIoFEhPT1c3rTZu3Bg7d+5EcnIyIiIiEBERgRUrVpRa3szMDAKBQGNjm6WlJQpqaz9uNrt2PYMvu8dz9oBcoSzzmEcvM+DdpqOBIiqiqllLp3JPGC5qdzBsrISQ91uVaEauCB8fH4hEIhw5ckTrsiEhIeByuWjTpk2xfa1atcLw4cNx//59fYSpV1bdeqFQparw8TIVA8su3VmMqGRdBw5DyJPkUvcrlSrczefBu0FDA0YFuPoNRrpS+2HkcfWbwMeI45UJIe+fdybZcrlcBAYGYtGiRdi1axdyc3PVz2GVypLvqnJycnDw4EHMnDkTn3zyCZycnPDo0SOsWbMGSUlJAIDo6GgcPXoUbdu2NeTpVEivCRMR6lqzwseHOrujZ0DpTfFssbS0RIsRATj4MBGSAs1nnS8ys7H/WSaGTP/E4HH59h+Aa+61tSqTqQLsevmxFBEhpLp6Z5ItAAQEBGDLli34/fff4eLiAldXV8ybNw87duyAp6en+ripU6fC2toaHh4e+OWXX/D999/jxx9/BADY2NggIiICrVq1gpWVFXr16oUBAwaohwZVJWZmZvAN/AVh9i7lHnvJzhltVv4ES0tLA0RWnFeduhj22RLcsKyFkKRCnErMx4kUBdLrd8aHny40SlwcDgctFyzGNQvbCh2fr1QhwrcXeoz5kOXICCHVDU3XqANDT7WX+Pw5Lv6yCna3b6C5vEBjbuQ7fHNkNW+N9vO/oLmRS3H/ymU8XL4Q3bNSYFLKuNvnKg4edfPD6GVl9x3QF5qukZDqhZKtDox1oUxOTMSVXTuAzAwADCB0QLvx/nCtwf5QpHddXl4eQrf/jfzw83CPeQx7KCFVMUiwtYe0VQd4jxiNJm3bGSweSraEVC+UbHVAF8p3F8MwePHiBTJeJsHc2gYeNWvCysrK4HHQZ4iQ6qVKzCBFiKFwOBx4eHgYZGISQgh55Z3qIEUIIYS8iyjZEkIIISyjZuR3yLPHj3HjyFGoJBIwKgY8G2u0HDgA9RoadrKIsuTk5CDpRQLys7MhcHKGh4cHzMzMyi9oINF37yL68iWo8vPANTGFqYMjuo0oWqyCEELYQh2kdGDIzi0qlQpn9u3Hs38Pg3/5KjzyCzT2v7Awh6x9W9QeOgh9xowxylKBDMPg2rmzSDx1DNYPb8NdKoEFj4tcFYN4G0comrVF0w8+RJ0GDQweG1A0Xee53cEQnzsNz6i7qMd5PStXgUqFq3bOQNuOaD1hEmrWrWuQmKiDFCHVCyVbHRjqQpmTk4O/Jk1BzdCLsOaUvS5rHsPgeecOmL7tb9gKhazF9LaEZ89wccVCtE15DpFJ6U8lHiiApw1aY/iyH2FurvsCAdp6EROD/76Yh24JT2BdzheRO1wT5I2ZiEEfz2M9Lkq2hFQv9My2isrLy8P6D0ajYQUSLQBYcThoFH4ZG0aOMsh6uwDw7MF93F3wMQZnxpWZaAGgER8YGH0d++ZMRn5+vkHiS3j6FFc+no6BSTHlJloAaKaSo+6uzTi4aqUBoiOEVCeUbKuozTNmofGNSPVsURXB4XDQ5PY9bJk+k8XIimSkpeHO8m/QTSaucBkeh4Nh6c9x4Kt5Za7SpA+FhYUI/XIeumeWvkBCSZw5QN3De3E+aCdLkRFCqiNKtlXQ3evXITh3AVwtEu0
rHA4HDqHhuHnxIguRvRa+dSP65KdpXY7L4aBTzB1cPXuGhaheO7djG3okxuhU1h1KpP67j/UvBISQ6oOSbRV0bdsOiGRyncs7KZS4tXOXHiPSpFAowNy+otVd95tcTHhIOn1Uz1Fpyr1wDuaVmOO4xYsYXDp+TI8REUKqM6MmWy8vL1haWsLa2hqurq745JNPYG1trd44HA6srKzUP8fHx6Nbt24wNzeHtbU1nJ2d4e/vr35GWdK+3NxcjfcMDw9Hr169YGNjAwcHB7Ru3Rp//PFHlbmLyc3NhSQ0rNL1FFwIR1ZWlh4iKu7Cwf3okKP9Xe2bHB7dQXJSop4i0nQnIgLez6IqVYcDl4OXx7VfO5kQQkpi9Dvb06dPQyKRICwsDPv27cNPP/0EiUQCiUQCMzMzPHjwQP3zq2X0Nm/eDIlEgjt37iAyMhIrVqxQ1/dq371793D37l0EBgaq950/fx79+vXDwIEDERcXh4yMDGzduhXXrl2DTCYz+LmX5Gb4RdR4mVLpejzTMnDtv//0EFFxBY8ewMakckOMWnHluH32tJ4i0hR/9TI8OZX/8sSLf66HaAghpApNalGvXj107twZDx48qHAZV1dX+Pn5lVjGxcUFffv2RWRkpPq1BQsWYMaMGZg3b576tSZNmmDHjh1lvo9UKoVUKlX/zGZv3+yUFFjo2Dz7JjMOB3mZ7NzZMtKC8g8qB4fDgaqg8vWUqCBPL9WoDNRrmhDy/jP6ne0rjx8/Rnh4OJo3b17hMomJiQgJCSmxzKt9derUAVA0lObatWsYMmSI1rEFBgbC1tZWvbE5ib2JuTlU5R9WLhXDgG9iooeaiuPw9PMdjcNn6buenurlGGGCEELI+8noydbPzw9CoRB+fn4ICAjA5MmTyy0zY8YMCIVCdOjQAb6+vvjmm2809gkEAtSoUQNCoRBLly4FAGRlZYFhGIhEIvWxI0eOhFAohKWlJcLCSn9OumDBAmRnZ6u3hISESpxx2US1a0Fc+Rtb5ICBs1fNyldUAsbKutJ15CmUsHB00kM0JbAWQKWHZ/CMjY0egiGEkCqQbENCQiAWixETE4PAwEBwK9CDdOPGjRCLxYiPj8fGjRthaWmpsS8nJweXLl1CTEwM0tKKOvLY2dmBw+EgOfn1uMsDBw5ALBbD09MTKlXp95NmZmYQCAQaG1tatG2L9CaNKl3Py4Y+aNu1qx4iKq5ev4GIllcumV2yckTnAYP0FJEm31FjcMO08mvUcpu11EM0hBBSBZItW3x9fTFz5kx89dVXAAArKyu0adMGR45U7R6mHA4HLn16QlmJOzMVw8CpV48KfXHRhU+zFnhe00fn8gzDgNOyPfgsNSM7ODoir1W7StVxn2uCVuMD9BMQIaTae2+TLQDMmTMHZ8+eVXegWrlyJTZs2IDff/8dmZmZYBgGDx8+hFgsNm6gb+k/axaeOTvqXD7GwQ5+s2fpMaLiBJ17Ikuh29PlGyo+2oydqOeINNUaNhIJlXj4ndKkpcEWJSCEvP/e62Rrb2+PiRMnYtWqVQCAnj174sSJE/j333/h6ekJR0dHTJgwAYsXL4avr6+Ro31NKBTC86NZyDTV/s4vi8+D66wZcHRi6Xno//UcMw7nG3aArIzm95IkKgHZyElw//8wLra06tIVD3r0R55S+4wbIXRC2/lfsBAVIaS6olV/dGCoFVt2LV8B5s+/4FDB2aQyTPhgpk+G/9IlrMX0Jrlcjt1ff4rej69BwC+/5+4zFQ8v+o9G/5kfGSC6ouUJd376CbpcPgcBt2K9zi7bOsJr0Uo0ZvnLF636Q0j1QslWB4a8UB7btBnRm7ag1rPnMOeU3BBRyDCIrV0TdaZMwuCZM1iN520MwyDkrz9RcPE/tMlMhEMJk108lQPPPL3hPngk2vkNNHh8R1b/BOXpE2gnTi11Cscn4CG2QVO0++Ib1DLAuruUbAmpXijZ6sDQF0q5XI6TO4MQf+w4pJF3YZGfDzBAgaU
lTJs1gcfA/ug/cQJMWBpXWxEMw+Di8SNIv3AW/BwxGJkMHHNzKJzdUH/ISDRs2cposQFFE5Oc27kduaFnwXsRD3OZFDIuDwprG3BbtEGTsePh3bSpweKhZEtI9ULJVgfGvFAWFBSoxwzb2dlpDHsiFcMwDPLz82FmZsZaj+jyULIlpHp5rztIvY8sLCzg5uYGd3d3SrQ6erXAhbESLSHh4eFo1qyZQd+TYRj4+/tDKBRi2LBheqkzNjZW4++oW7duCAoK0kvd7xtKtkSvVCoVxGIxkpKSIJFIqsxqSoSU5c0VyKytrTVmmtNX/RffWGO6c+fOuHPnjl7fozzh4eG4dOkSkpOT8e+//xr0vUkVWoiAvNvu37qFi1u3IS00HPyMTPBkMsgtzMHxqAFRrx4YMHsWXPR8ASNEn06fPo1OnTqVul+hULzTrSHx8fGoXbs2zM3NjR1KtUR3tu+Ql4kvcPTP1Tjx8/c4/tP3OLpuNRLj44waU0JMDH4cNhJn+w+GYOce1ElIRM38AtRQKFErNw9eDx/DbO2f2OrbBb/Nmq2xehIhVVloaCjq1q2LxYsXw9HREYsXL8azZ8/QpUsXCIVCuLm5aczLDgB79+5F48aNYWNjgyZNmuDx48eYOnUq4uPj0adPH1hbW2PXrl3qul958OABOnfuDKFQiFatWuHSpUvqfV5eXvjll1/QoEEDCIVCfPRR6UPnsrKyMHbsWDg6OqJOnTrYuHEjAGDXrl2YOnUqQkNDYW1tjd9++61Y2by8PMyePRtubm6ws7ODv7+/et+BAwfQqFEj2NvbY/DgwUhNTS3393flyhW0aNECAoEA7u7uWLNmTbll3mfv7te0aiTq5nU8PrwbzgmP0c+KA87/l+BjGAY3b5zF9Rr1UW/waDRq096gcT28dRuHp89AzaexZR7H4XDgmSmGMng/fopPwCd7gmFDk/yTd0BsbCx4PB5evnwJhUKBpKQkLFu2DB07dsTz58/Rs2dPtG3bFkOHDsWlS5cwZ84cHD58GB06dEB0dDQEAgE2b96Ms2fPIigoSH3nHBoaqn4PmUyGQYMGYd68eTh37hwOHjyIQYMG4dmzZ7CzswMAHDp0COHh4SgsLETLli0xYsQIdO/evVi8rxJxfHw8nj59ip49e8LHxwfjxo2DXC5HUFAQzp49W+K5zps3Dy9fvsTdu3dha2uLq1evAgCuXbuGefPmISQkBD4+Pvj2228xe/ZsHDhwoMzf3bx58/D5559j3LhxyMrKQmxsrLa//vcKJdsq7urJY2AObsJAMxVgrdkQweFw0NqaB4if4tZfK3E5dQo6DNB+CUFdJMbH49DMOfAqJ9G+icfhwOviFfwRMBlf7N1tlCa5u7duIe7+TfCUcoDLBayE6NZ/MHU2I/Dz8wPv/8sqTp48GYMHD4aZmRm++eYb8Pl8mJiYoE6dOuplO+vVq4dx48bh4sWLGDp0KLZt24YZM2agY8eOAAAfn4rNH3716lWoVCrMnTsXADB69Gj8+uuvOHnyJMaOHQugKHE5OhZN4dqtWzfcuXOnWLJVKpXYv38/Hj9+DEtLSzRt2hRTp05FcHAwupazKIlKpcLOnTtx79499fu8+mLw999/Y/bs2WjSpAkA4LvvvoO9vT0UCkWZdZqYmODp06fIzMyEvb29+otDdUXNyFXY/asR4PzzF1qblT/lYEtzBqaHt+DOxdKXCtSnQytXwSv6qdbluBwORGdDcXjz3yxEVbrMzEzs/u1HWD67iv61hOhb1wl9azughwODsG1rEXryuEHjIVXPqxXIxGIxVq9eDQAQiUQaXwoTExMxbNgwiEQi2Nra4tdff0VGRgYA4MWLF6hVq5bW75uUlFRsjeyaNWsiKSlJ/bOLi4v635aWlpBIJMXqSU9Ph1wuh+cbU6G+XU9p0tLSIJVKS4w/Pj4eK1asgFAohFAohIeHB/h8vsYKaiXZvHkzHjx4gLp166JTp064fPlyuXG8zyjZVmF
PDwahpXnFe/M2M2Pw/NAuFiMqkp2djezzoTqXN+Nw8Py44ZJbfn4+Tu/YgA+ae6GWSHOBBxM+H72b1IVrThzCTp8wWEzk3fDqkc0rCxcuhJ2dHaKjo5GdnY158+ape9x7eHiU2lT6dj1vcnNzK7ZGdnx8PNzc3LSK1dHRESYmJoiPj9e6HicnJ5iZmZUYv7u7O5YtW6b+IiIWi1FQUIAaNWqUWWf9+vWxb98+pKamYsyYMeq79OqqSifbLVu2oHnz5rCysoKrqyv69OmDU6dOAdDsqu/q6opPPvkECoUCOTk5cHd3x8mTJ9X1PHr0CHZ2dnj6tOhObPr06XB1dYVAIECTJk1w9OhRo5xfWaIf3EOdtOdal/PJjMeDWzdYiOi1o3+uh2dyWqXqMLl6A5FXruoporKdO3oQw5rVLvOCV9fVCeLoO+U2jZHqLTc3FzY2NrC2tsb9+/c1xpROnDgRGzduxOXLl8EwDB4/foyXL18CAJydnUtNxO3aFS0H+ccff0ChUGD//v2IiopCv379tIqNx+Nh5MiRWLhwIfLz83H//n1s2bIFY8aMKbcsl8vFhAkT8OmnnyIjIwNyuVzdSWvSpEn4448/1EOVMjMzcfjw4XLr3LVrFzIyMsDn82FjY6Nuoq+uqmyyXbZsGb777jssX74caWlpSEhIwJdffqmRRE+fPg2JRIKwsDDs27cPmzZtgkAgwG+//YZZs2YhPz8fDMNgxowZ+PLLL9W9/z799FPExsYiJycHf//9N8aPH69uCqoqHp88ggZW2j/TrGvFx7Mzx1iI6LW0K1fBLSNxVYSTTI7bh9lfW5hhGCjSk8CvwEIJXeq5I+zMyXKPI9XXokWLcP78eQgEAsydOxcjRoxQ7+vYsSPWrl2LyZMnQyAQ4IMPPkBOTg4A4KuvvsLXX38NoVCI4OBgjTpNTU1x5MgR7N69Gw4ODggMDMSRI0d0esb5KmF7eHhg8ODBWLJkSYkdqUqyevVquLm5oVGjRnBxccFff/0FoGht8J9//hkTJkyAQCBAy5YtNXpLl+bEiROoX78+bGxs8Ntvv2HHjh1an8/7pEpO15iVlQU3Nzfs3bsXgwcPLvEYLy8vjd59o0aNgrOzM/744w8AwODBg1G/fn3Ur18fv/32G27evFni3ME3btxA586dcfXqVTQtZW5cqVSqMWQlJycHHh4erE61d3zpF+ib/kinsqfs6mHA96v1HNFrP3XrCffIe5WuJ3/cGExdV3wIgj7l5OTgzp718G1Qu0LHn36RB7+xAazGBNB0jYRUN1WyN/KVK1egUCgwYMCACh3/+PFjhIeHY9myZerX1q1bh6ZNm4LD4eDEiRPFEu3s2bOxdetWFBYWon///uqediUJDAzE0qVLdTsZXWm5TqwGphJlK1S9fupXKiu2dGBlKBQK8HkVvwvX17kRQsibqmQzckZGBhwdHTXa+EUiEYRCocbsJ35+fhAKhfDz80NAQAAmT56s3ufm5oYaNWrA0dERbdu2LfYef/75JyQSCc6ePYs+ffqU+TxvwYIFRZ2C/r+93ZmBDYy57kNRKlO2IvhWVnqpx8TKWi/1lMXW1hbpBcoKHatUqgBTC5YjIoRUR1Uy2drb2yM9PR1K5euLZHJyMh49eqTRnPuqq35MTAwCAwPBfWOt0rVr10IgEEAgEGDDhg0lvg+Px0PPnj1x9uxZnDhRek9UMzMzdV2vNrbZt/RFaqH2d36ZUgUETYt/udAnQaOGla4jj2Hg3q6NHqIpG4/Hg1LgWP6BAMIexaJT34q1phBCiDaqZLLt0KED+Hx+mQmwLHFxcVi+fDn++usvbNy4Ed99912ZY8IUCoW6p3JV0aFPP1wzrViSeFMEzw6dBpT8nFtfuk+bgpdWlbt7TmvcAL1HjtRTRGVr3b0vQqPKntZSkl8IiaUjPT8lhLCiSiZbOzs7fPXVV5g1axZOnDiBgoICKJVK9fRh5Zk9ezZmz56NRo0aoVWrVvD398e
8efMAFI0RDQ4OhkQiUXezP3/+PLp06cLiGWmPw+FA2NUPSdKKNYECQIpUAZtOfTTu8NlQp359cDvqPjWkimHg2rsn63G+UsPDEy5te+DU3adQlfBMNikjCyeeZ2Hw+EkGiYe8uxo1avTeTc7w8ccfY9cu7cfnb9myBZ9//jkLEb2fqmRv5Ff++usvrFu3Dk+ePIFQKETDhg3x2Wefwc/Pr1hv5Ff27t2LhQsX4t69e+rnuxKJBA0aNMCmTZvg6+uLIUOG4Pbt22AYBnXr1sW3336L4cOHVzguQ/YkPbT2R7R+cB4is7L7sqVKlbjs3REjPv+W1XheuREairDJ0yHKFGtdNtanHmadOAo7e3v9B1aGzMxMXDp1DIw4FXylHAyHA4WFDVx9mqFNh45lPrfXN+qNTKqCly9folOnToiOjgaPx8ONGzcwZswYFBYWYsuWLejbty+Aojmix44di4sXL6r70shkMtSrVw83btyAk5OTMU/jnVClk21VZegL5ZmdWyANP4ku3DxYmmiOFy1QKBGmtAS/Q2/0nTSD9VjeFLJjJ6K/XQyn3OJTx5UmvqYHhu3YivrNSh5mVV1QsiWlMeRSfj/++COSk5PV01P269cPn332Gdzc3DB69Gjcv38fADB8+HDMnz8fnTt31ig/a9YseHt7Y/78+QaJ911WJZuRiabe/lPQd10wrnUfhxOC2jht4oRTfEeECGrjcucx6P1HsMETLQD4TfBHk7U/45mbCMpyvrPlMyo8bdYII3fvrPaJlry73lwEPiAgAHPnzkXPnj1hY2ODPn36IDMzs8RyCoUCH330ERwcHODj44MffvhBPclObGws+Hw+NmzYAHd3dwQEBKCwsBBz5syBSCSCp6cnvv/+e/UjkCVLlmDq1Knqut9cru9VXevXr4eLiws8PT01Zrl628mTJzUSaFxcHLp27YpGjRohPz8fQNHkQZaWlsUSLQB06dJFY6IhUroqOc6WFGdiYoJeo8cBo8cZOxQNPYYPR5vevXFk/QbEnzwN28i7sFUVJV6GYZBiaQGmQzt4Dx2MSeM+rPZTtpH3y759+3D69Gl4e3tjwIABWLt2bYlj8tevX49Lly4hKioKSqUS/fv319ivVCoRGRmJZ8+egWEYLFu2DA8ePEBUVBRyc3PRq1cveHp6IiAgoNyYlEolrl27hri4ONy6dQv9+vVD27Zt4e3tXezYe/fuoV69euqfGzRogP/++w8eHh5wdHSEXC7HwoULcejQoRLfy8fHB3fv3i03JkLJluiBjY0Nxn35BZgvPsfl8+eR9OQJlFIp+JaWGNChA+o3amTsEAlhxQcffKCeeW7EiBE4ffp0icf9888/mD9/PpydnQEUrTsbGBiocczixYvV/Uz27NmDzZs3w87ODnZ2dvjss8+we/fuCiXbN+vy9fXF4MGDceDAgWIL3QOAWCyGtfXr8e4//vgjZsyYgYKCAqxbtw5r167FBx98gMTERPj7+8Pc3Bzr1q2Dl5cXgKK//ezs7ArFVN1RsiV6w+Fw4NujB9Cjh7FDIcQgKrL0HVA0T8Cbq+S8vWIOl8uFq6ur+uekpCSdlsp75c0l+zw8PNQLIrzN1tZWI+a6deviv//+U8e8b98+XLp0CR07dsT+/fuRkJCAL774Avv37wdQtDCDra1theOqzuiZLSGEsEwkEiExMVH984sXLzT2v90T3s3NrdSl8qysrFBQUKDel5KSUuz93pzlLiEhQSORv6lJkyZ48uRJifu+/PJLLF++HCYmJkhPT0fNmjXRpk0bdacpoGiq3LKmuiWvUbIlhBCWDR8+HL/++itSU1ORnJyMdevWlXn86NGjsWzZMmRlZSEhIQGrV69WL5XXrFkzhIaGIjk5GampqVi7dm2x8suWLUNhYSGuXLmCI0eOaKxO9KZ+/fqpO3y9KSIiAnl5eejTpw8AwMLCAg8fPsT58+fVTcgAEBYWph4eRMpGzcjviJSXL3Fp6yao7twGR5I
LMAwYaxtwm7aE76QpELm7GztExDx6hDv7gsFPT4FKJgPX3ByMZ210njRVp+XCCHlfzJo1C48ePYKPjw+cnJzg7++P3bt3l3r8d999h08//RQ+Pj4wMTHB1KlTMXHiRABA7969MXDgQPj4+MDd3R2TJ0/G+vXr1WV5PB5at24NT09PmJmZ4ffff0f9+vVLfB9/f3907twZP/74o7rzokqlwpdffqnRi/mnn35C7969YWFhgX379gEA5HI5jh8/juvXr1f691Md0DhbHRhyjGRubi6OfbcAjreuoIWsoFhzE8MwuM23QHrLtui/LNAoz0/uX47A4x2b4fb4LhpxNWdoUjEMrphYIbdpa3T99Cs4i0puzjKEuOcxiDx5BNzUOHDlMoDLg8LKFoKGrdCl/yCD9pSmcbbV28aNG/HPP/+U2qFKV7Gxsahbty4UCkWFy3z00Ufw9fXFhx9+qNV7bdmyBVFRUfj555+1DbNaomSrA0NdKDPT03Fs1hT4JTwtd7F2hmEQ4lYbfus3w/GNThtsu3z0CGR//oQW8rxyjz0jdEWrFavhVcq3bLbIZDL8+/uPqJubiObOxf+/cgulCM1UoNagD9Gsna9BYqJkW73k5ubi6tWr6N69O54/f44BAwZg7ty5mDNnjl7fR5dkSwyDntlWUQqFAofnzUH/CiRaoKiDhV9SDI7NmwO5nP11YgHgXsQlyCuYaAGgt/glri/8DJnp6SxH9ppCocDewO8wxFRcYqIFABtzMwxys4Ls1C7cDA81WGyk+njVNGtra4suXbpgwIABmD59urHDIgZEybaKCt2/B92j72o1Xy+Hw0HPZw9xbtdOFiN77dHfG9C8gon2lT6ZibiwsezOIfp04u8NGCFUwKQCTcQtHa2RHLIHubm5BoiMVCe2tra4desWJBIJkpKSsHr1apiYmOj9fby8vOiutooyeLL18vJCzZo1Ne6+Zs6ciSVLlqh/joiIAIfDwW+//aZRNjQ0FBwOR91R4JUdO3aAw+Go6wgNDQWXy4W1tTVsbGzQpEkTHD58GLt27YK1tTWsra1hbm4OHo+n/tnPz4+1c9ZF1plTsNHhGaIlj4vs82dZiEjTw5s3UOfZQ63LcTgcMNcuGuTuWyaTgZ8QBTOTivcD7O1mg7B/97IYFSGkOjLKnW1ubi62bt1a6v6goCDY2dmVOKenq6srzpw5ozHObNeuXRpTjgFA7dq1IZFIkJ2djVmzZmHMmDHw8/ODRCKBRCLBtm3b0LlzZ/XPISEh+jvBSnr68AHcH97RubzX43uIunVLjxEV9+jgftTl6fa4v0NOKs7vLb0npr5cOHIQXRzMtCrD43Ihj3kI6spAyLuDYRhERkZix44d2LBhAzZs2ICdO3ciMjKyyvwtGyXZzp8/HytXrizx7kYul2Pfvn347bffcOvWLURHR2vst7S0RPfu3XH06FEARbOcPHr0CN26dSvxvbhcrnpi75iYGJ3ilUqlyMnJ0djYFH39GrxR8XVs31aHw+D5bXaTLT+t5BlpKsKKx4MsIVZ/wZRCnpoIC1Ptm+ocZbms/x8TQipPoVBg69at+PLLL7Fz507ExMQgJSUFKSkpePbsGXbu3ImvvvoKW7duNXrzulGSbffu3eHp6Ylt27YV2xcSEgIej4exY8eia9euJd7djhs3Tr3Y8Z49ezBq1KhSFyJXKpXYunUrrKys1CtjaCswsGhIzavtzanQ2MAUFlZ6bVWVtFBP0ZRWv7RS5ZlKlq8QhW5N1dZ8lDrtHiHV1YsXL/DHH3/gyy+/xNy5c/HRRx/h008/xU8//YRbLLekleTBgwf45ptv8OzZM1hbW0MgEGhcNzkcDgQCAaysrBATE4Nvv/0WDx48MHicrxitg9TixYtLvLsNCgrCiBEjwOPxMGbMGHVSfVOfPn1w8+ZNZGZmIigoCOPGFV8J5/nz5xAKhXB2dsb27dvxzz//QCgU6hTrggULkJ2drd7enAqNDRxzi0o3fXA
", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "fig, ax = plt.subplots(1,1, figsize=(5, 5))\n", "sc.pl.dotplot(kam20_norn_ad, groupby=\"Disease_Identity\", var_names=reduced_gene_list, show=True, swap_axes=True, ax=ax)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.7" } }, "nbformat": 4, "nbformat_minor": 5 } ================================================ FILE: model.py ================================================ """ Model class """ import warnings warnings.filterwarnings("ignore") import math from torch import nn, Tensor from torch.nn import TransformerEncoder, TransformerEncoderLayer import sys sys.path.append('../') from typing import Any import torch def full_block(in_features, out_features, p_drop=0.1): return nn.Sequential( nn.Linear(in_features, out_features, bias=True), nn.LayerNorm(out_features), nn.GELU(), nn.Dropout(p=p_drop), ) class PositionalEncoding(nn.Module): def __init__(self, d_model: int, dropout: float = 0.1, max_len: int = 1536): super().__init__() self.dropout = nn.Dropout(p=dropout) position = torch.arange(max_len).unsqueeze(1) div_term = torch.exp \ (torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model)) pe = torch.zeros(max_len, 1, d_model) pe[:, 0, 0::2] = torch.sin(position * div_term) pe[:, 0, 1::2] = torch.cos(position * div_term) self.register_buffer('pe', pe) def forward(self, x: Tensor) -> Tensor: """ Args: x: Tensor, shape [seq_len, batch_size, embedding_dim] """ x = x + self.pe[:x.size(0)] return self.dropout(x) class TransformerModel(nn.Module): def __init__(self, token_dim: int, d_model: int, nhead: int, d_hid: int, nlayers: int, output_dim:int, dropout: float = 0.05): super().__init__() 
        self.model_type = 'Transformer'
        self.pos_encoder = PositionalEncoding(d_model, dropout)
        self.d_model = d_model
        self.dropout = dropout
        self.encoder = nn.Sequential(nn.Linear(token_dim, d_model),
                                     nn.GELU(),
                                     nn.LayerNorm(d_model))
        encoder_layers = TransformerEncoderLayer(d_model, nhead, d_hid, dropout)
        self.transformer_encoder = TransformerEncoder(encoder_layers, nlayers)
        self.decoder = nn.Sequential(full_block(d_model, 1024, self.dropout),
                                     full_block(1024, output_dim, self.dropout),
                                     full_block(output_dim, output_dim, self.dropout),
                                     nn.Linear(output_dim, output_dim))
        self.binary_decoder = nn.Sequential(
            full_block(output_dim + 1280, 2048, self.dropout),
            full_block(2048, 512, self.dropout),
            full_block(512, 128, self.dropout),
            nn.Linear(128, 1)
        )
        self.gene_embedding_layer = nn.Sequential(nn.Linear(token_dim, d_model),
                                                  nn.GELU(),
                                                  nn.LayerNorm(d_model))
        self.pe_embedding = None

    def forward(self, src: Tensor, mask: Tensor):
        """
        Args:
            src: Tensor, shape [seq_len, batch_size, token_dim]
            mask: Tensor, shape [batch_size, seq_len], where 1 marks a real
                token and 0 marks padding

        Returns:
            gene_output: Tensor, shape [seq_len, batch_size, output_dim]
            embedding: normalized CLS-token embedding, shape [batch_size, output_dim]
        """
        src = self.encoder(src) * math.sqrt(self.d_model)
        src = self.pos_encoder(src)
        output = self.transformer_encoder(src, src_key_padding_mask=(1 - mask))
        gene_output = self.decoder(output)  # seq_len x batch_size x output_dim
        # embedding = torch.mul(gene_output, mask.t().unsqueeze(2)).sum(0) # average over non zero genes
        # In the new format, the CLS token, which sits at index 0 of the sequence, is the output.
        embedding = gene_output[0, :, :]  # select only the CLS token
        embedding = nn.functional.normalize(embedding, dim=1)  # normalize to unit length
        return gene_output, embedding

    def predict(self, cell_embedding, gene_embeddings):
        gene_embeddings = self.gene_embedding_layer(gene_embeddings)
        dec = self.binary_decoder(torch.hstack((cell_embedding, gene_embeddings)))
        return dec

================================================ FILE: model_files/new_species_protein_embeddings.csv ================================================
species,path

================================================ FILE: requirements.txt ================================================
numpy==1.26.4
scipy==1.14.1
pandas==2.2.2
tqdm==4.66.5
torch==2.1.1
scanpy==1.10.2
accelerate==0.24.0
requests==2.25.1
urllib3==1.26.6

================================================ FILE: utils.py ================================================
""" Utils """
import warnings
warnings.filterwarnings("ignore")
import pandas as pd
import numpy as np
import os
import requests
from tqdm import tqdm
import tarfile


def get_shapes_dict(dataset_path):
    shapes_dict = {}
    datasets_df = pd.read_csv(dataset_path)
    sorted_dataset_names = sorted(datasets_df["names"])
    for name in sorted_dataset_names:
        shapes_dict[name] = (int(datasets_df.set_index("names").loc[name]["num_cells"]), 8000)

    shapes_dict["dev_immune_mouse"] = (443697, 4786)
    shapes_dict["dev_immune_human"] = (34009, 5566)
    shapes_dict["intestinal_tract_human"] = (69668, 5192)
    shapes_dict["gtex_human"] = (18511, 7109)
    shapes_dict["gut_endoderm_mouse"] = (113043, 6806)
    shapes_dict["luca"] = (249591, 7196)
    shapes_dict.update({
        "madissoon_novel_lung": (190728, 8000),
        'flores_cerebellum_human': (20232, 8000),
        'osuch_gut_human': (272310, 8000),
        'msk_ovarian_human': (929690, 8000),
        'htan_vmuc_dis_epi_human': (65084, 8000),
        'htan_vmuc_val_epi_human': (57564, 8000),
        'htan_vmuc_non_epi_human': (9099, 8000),
        'hao_pbmc_3p_human': (161764, 8000),
        'hao_pbmc_5p_human': (49147, 8000),
        'gao_tumors_human': (36111, 8000),
        'swabrick_breast_human': (92427, 8000),
        'wu_cryo_tumors_human': (105662, 8000),
        'cell_line_het_human': (53513, 8000),
        'bi_allen_metastasis_human': (27787, 8000),
        'zheng68k_human': (68579, 8000),
        'zheng68k_12k_human': (68579, 12000),
        'mouse_embryo_ct': (153597, 12000),
        "regev_gtex_heart": (36574, 8000),
        "tabula_sapiens_heart": (11505, 8000),
        "10k_pbmcs": (11990, 12000),
        "epo_ido": (35834, 12000),
        'tabula_sapiens_kidney': (9641, 8000),
        'tabula_microcebus_kidney': (14592, 8000),
        'tabula_muris_kidney': (2781, 8000),
        'tabula_muris_senis_kidney': (19610, 8000),
        'immune_human': (33506, 8000)
    })
    shapes_dict["zyl_sanes_glaucoma_pig"] = (5901, 6819)
    shapes_dict["parkinsons_macaF"] = (1062, 5103)

    for row in datasets_df.iterrows():
        ngenes = row[1].num_genes
        ncells = row[1].num_cells
        name = row[1].names
        if not np.isnan(ngenes):
            shapes_dict[name] = (int(ncells), int(ngenes))
    return shapes_dict


def figshare_download(url, save_path):
    """
    Figshare download helper with progress bar

    Args:
        url (str): the url of the dataset
        save_path (str): the path to save the dataset to
    """
    if os.path.exists(save_path):
        return
    else:
        # Create the target directory if it does not exist yet
        if not os.path.exists(os.path.dirname(save_path)):
            os.makedirs(os.path.dirname(save_path))
        print("Downloading " + save_path + " from " + url + " ..." + "\n")
        response = requests.get(url, stream=True)
        total_size_in_bytes = int(response.headers.get('content-length', 0))
        block_size = 1024
        progress_bar = tqdm(total=total_size_in_bytes, unit='iB', unit_scale=True)
        with open(save_path, 'wb') as file:
            for data in response.iter_content(block_size):
                progress_bar.update(len(data))
                file.write(data)
        progress_bar.close()

        # If the downloaded filename ends in .tar.gz, extract it
        if save_path.endswith(".tar.gz"):
            with tarfile.open(save_path) as tar:
                tar.extractall(path=os.path.dirname(save_path))
                print("Done!")
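The archive-handling tail of `utils.figshare_download` — extract a `.tar.gz` download into the directory that contains it — can be exercised offline with the standard library alone. This is an illustrative sketch, not repo code; the file names (`dataset.tar.gz`, `payload.txt`) are hypothetical:

```python
import os
import tarfile
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    # Fake "downloaded" payload: a tarball containing one member file.
    member_path = os.path.join(workdir, "payload.txt")
    with open(member_path, "w") as f:
        f.write("hello")
    archive_path = os.path.join(workdir, "dataset.tar.gz")
    with tarfile.open(archive_path, "w:gz") as tar:
        tar.add(member_path, arcname="extracted/payload.txt")

    # Mirrors the tail of figshare_download: extract next to the archive.
    if archive_path.endswith(".tar.gz"):
        with tarfile.open(archive_path) as tar:
            tar.extractall(path=os.path.dirname(archive_path))

    extracted = os.path.join(workdir, "extracted", "payload.txt")
    ok = os.path.exists(extracted)
    with open(extracted) as f:
        content = f.read()
```

Because `extractall` targets `os.path.dirname(save_path)`, members land alongside the archive, which is why the repo's auto-downloaded model files appear directly in the working directory.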
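The sinusoidal table that `PositionalEncoding` registers as the `pe` buffer in `model.py` can be reproduced with plain `math`, which makes the even/odd sin-cos layout easy to inspect. A stdlib-only sketch (the helper name `sinusoid_row` is ours, not part of the repo):

```python
import math

def sinusoid_row(pos: int, d_model: int) -> list:
    """Row `pos` of the positional-encoding table built in model.py:
    even indices hold sin(pos / 10000^(i/d_model)), odd indices the
    matching cos, exactly as computed via div_term in PositionalEncoding."""
    row = [0.0] * d_model
    for i in range(0, d_model, 2):
        angle = pos * math.exp(-math.log(10000.0) * i / d_model)
        row[i] = math.sin(angle)
        row[i + 1] = math.cos(angle)
    return row

# Position 0 is always [0, 1, 0, 1, ...] because sin(0) = 0 and cos(0) = 1.
row0 = sinusoid_row(0, 8)
```

In the model itself this table is added to the token embeddings in `PositionalEncoding.forward`, with `max_len=1536` bounding the sequence length (8000+ expressed genes are tokenized into chromosome-grouped chunks before reaching the transformer).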