[
  {
    "path": ".flake8",
    "content": "[flake8]\nignore = E501"
  },
  {
    "path": ".github/workflows/build_docs.yml",
    "content": "name: Build docs\n\non:\n  push:\n    branches: [ main ]\n  pull_request:\n    branches: [ main ]\n\n  workflow_dispatch:\n\njobs:\n  build:\n    runs-on: ubuntu-latest\n\n    steps:\n      - uses: actions/checkout@v2\n      \n      - name: Sphinx build\n        uses: ammaraskar/sphinx-action@0.4\n        with:\n          docs-folder: doc/sphinx/\n\n      - name: Commit documentation changes\n        run: |\n          git clone https://github.com/microsoft/msrflute --branch gh-pages --single-branch gh-pages\n          cp -r doc/sphinx/_build/html/* gh-pages/\n          cd gh-pages\n          git config --local user.email \"action@github.com\"\n          git config --local user.name \"GitHub Action\"\n          git add .\n          git commit -m \"Update documentation\" -a || true\n    \n      - name: Push changes\n        uses: ad-m/github-push-action@master\n        with:\n          branch: gh-pages\n          directory: gh-pages\n          github_token: ${{ secrets.GITHUB_TOKEN }}\n"
  },
  {
    "path": ".github/workflows/codeql.yml",
    "content": "# This is based on the standard CodeQL workflow provided by Github\nname: \"CodeQL\"\n\non:\n  push:\n    branches: [ \"main\" ]\n  pull_request:\n    # The branches below must be a subset of the branches above\n    branches: [ \"main\" ]\n  schedule:\n    - cron: '35 2 * * 3'\n\njobs:\n  analyze:\n    name: Analyze\n    runs-on: ubuntu-latest\n    permissions:\n      actions: read\n      contents: read\n      security-events: write\n\n    strategy:\n      fail-fast: false\n      matrix:\n        language: [ 'python' ]\n\n    steps:\n    - name: Checkout repository\n      uses: actions/checkout@v3\n\n    - name: Set-up MPI\n      uses: mpi4py/setup-mpi@v1\n\n    # Initializes the CodeQL tools for scanning.\n    - name: Initialize CodeQL\n      uses: github/codeql-action/init@v2\n      with:\n        languages: ${{ matrix.language }}\n        \n    # Autobuild attempts to build any compiled languages  (C/C++, C#, or Java).\n    # If this step fails, then you should remove it and run the build manually (see below)\n    - name: Autobuild\n      uses: github/codeql-action/autobuild@v2\n\n    - name: Perform CodeQL Analysis\n      uses: github/codeql-action/analyze@v2\n"
  },
  {
    "path": ".gitignore",
    "content": "__pycache__/\n.vscode/\ndoc/sphinx/_build\ntesting/logs.txt\ntesting/outputs\ntesting/mockup"
  },
  {
    "path": ".gitmodules",
    "content": "[submodule \"utils/dp-accountant\"]\n\tpath = utils/dp-accountant\n\turl = https://github.com/microsoft/prv_accountant\n"
  },
  {
    "path": "CHANGELOG.md",
    "content": "# Changelog\n\nAll notable changes to this project will be documented in this file.\n\n## [0.1.0] - 2021-11-22\n\nWe're super excited to announce FLUTE: Federated Learning Utilities for Testing and Experimentation, a platform for conducting high-performance federated learning simulations!\n\nThis first release fully focuses on implementing fast prototyping to validate different CL scenarios \nin an Federated environment.\n\n### Features\n\n- large scale simulation (millions of clients, sampling tens of thousands per round).\n- multi-GPU and multi-node orchestration backed up by MPI.\n- local or global differential privacy.\n- model quantization.\n- a variety of standard optimizers and aggregation methods.\n- most model types including CNNs, RNNs, and Huggingface Transformers.\n- extensibility, enabling new models, dataloaders, optimizers, and aggregators.\n- local or cloud-based job staging using AzureML.\n\n\n## [1.0.0] - 2022-08-29\n\nThis release contain major changes in the communication backbone , in order\nto run previous experiments you have already integrated in FLUTE, please make sure\nto use `torch.distributed` instead of `MPI `to launch the jobs. For more documentation\nabout the new command, please refer to the [README](README.md).\n\n\n### New features\n\n- 🏎 Better performance: Support for NCCL and Gloo as backend communication protocols. \n  - Improvements in GPU utilization and overall communication speed (on the order of minutes!) for projects with huge models and datasets.\n- 🌟 Remove file type dependency on client.py, now FLUTE can receive any kind of dataset and even download the data on-the-fly. The data intantiation is completely under control of each task dataset.\n  - In older versions FLUTE only allowed `json` and `hdf5` files, so the client could recognize it.\n- 🌟 Abstract classes for new models/dataloaders.\n- 🌟 Allows Federated Learning with Personalization. 
\n  - Personalization allows you to leverage each client local data to obtain models that are better adjusted to their own data distribution. You can run the `cv` task in order to try out this feature.\n\n\n## [1.0.1] - 2023-07-29\n\n🔋 This release removes the restriction of the minimum number of GPUs available in FLUTE, \nallowing users to run experiments using a single-GPU worker by instantiating both: Server\nand clients on the same device. For more documentation about how to run an experiments\nusing a single GPU, please refer to the [README](README.md).\n\n\n### New features\n\n- 🌟 Include FedProx aggregation method\n\n"
  },
  {
    "path": "CITATION.cff",
    "content": "cff-version: 1.2.0\nmessage: \"To cite Microsoft FLUTE in academic papers, please cite it as below.\"\nauthors:\n  - name: \"Microsoft Research\"\ntitle: \"FLUTE: Federated Learning Utilities for Testing and Experimentation\"\nversion: 1.0.0\ndate-released: \"2021-22-11\"\nurl: \"https://github.com/microsoft/msrflute\"\nlicense:\n - MIT\nkeywords:\n  - FLUTE\n  - federated learning\n"
  },
  {
    "path": "CODE_OF_CONDUCT.md",
    "content": "# Microsoft Open Source Code of Conduct\n\nThis project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/).\n\nResources:\n\n- [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/)\n- [Microsoft Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/)\n- Contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with questions or concerns\n"
  },
  {
    "path": "CONTRIBUTING.md",
    "content": "# Contributing\n\nThis project welcomes contributions and suggestions. Most contributions require you to\nagree to a Contributor License Agreement (CLA) declaring that you have the right to,\nand actually do, grant us the rights to use your contribution. For details, visit\nhttps://cla.microsoft.com.\n\nThis project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/).\nFor more information see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/)\nor contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with any additional questions or comments.\n\n### Pull Requests\n\nSubmit pull requests to **branch contribution**. PR's in any other branch will not be accepted.\n\nWhen you submit a pull request, a CLA-bot will automatically determine whether you need\nto provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the\ninstructions provided by the bot. You will only need to do this once across all repositories using our CLA.\n\n"
  },
  {
    "path": "LICENSE.TXT",
    "content": "Copyright (c) Microsoft Corporation.\n\nMIT License\n\nPermission is hereby granted, free of charge, to any person obtaining a copy\nof this software and associated documentation files (the \"Software\"), to deal\nin the Software without restriction, including without limitation the rights\nto use, copy, modify, merge, publish, distribute, sublicense, and/or sell\ncopies of the Software, and to permit persons to whom the Software is\nfurnished to do so, subject to the following conditions:\n\nThe above copyright notice and this permission notice shall be included in all\ncopies or substantial portions of the Software.\n\nTHE SOFTWARE IS PROVIDED *AS IS*, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\nIMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\nFITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE\nAUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\nLIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\nOUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE\nSOFTWARE."
  },
  {
    "path": "NOTICE.txt",
    "content": "THIRD-PARTY SOFTWARE NOTICES AND INFORMATION\nDo Not Translate or Localize\n\nThis software incorporates components from the projects listed below. The original copyright notices\nand the licenses under which Microsoft received such components are set forth below and are provided for \ninformational purposes only. Microsoft reserves all rights not expressly granted herein, whether by \nimplication, estoppel or otherwise.\n\nThis software includes parts of the Huggingface/Transformers Library (https://github.com/huggingface/transformers). \nState-of-the-art of  Natural Language Processing for Jax, PyTorch and TensorFlow. Huggingface/Transformers library is \nlicensed under Apache License 2.0, you can find a copy of this license at https://github.com/huggingface/transformers/blob/master/LICENSE\n\nThis software includes parts of the Tensorflow/Privacy Library (https://github.com/tensorflow/privacy). \nA library that includes implementations of TensorFlow optimizers for training machine learning models with\ndifferential privacy. The Tensorflow/Privacy library is licensed under Apache License 2.0, \nyou can find a copy of this license at https://github.com/tensorflow/privacy/blob/master/LICENSE\n\nThis software includes parts of LEAF Library (https://github.com/TalwalkarLab/leaf).\nA Benchmark for Federated Settings. LEAF library is licensed under BSD 2-Clause License, you can find a copy\nof this license at https://github.com/TalwalkarLab/leaf/blob/master/LICENSE.md\n\nThis software includes parts of ECG Classification from Kaggle Competition \n(https://www.kaggle.com/polomarco/ecg-classification-cnn-lstm-attention-mechanism). \nAn example for ECG Classification | CNN LSTM Attention Mechanism. This example is \nlicensed under Apache License 2.0, you can find a copy of this license at \nhttps://www.apache.org/licenses/LICENSE-2.0 \n\nThis software includes parts of Torchvision Library (https://github.com/pytorch/vision.git). 
A package of\npopular datasets, model architectures, and common image transformations for computer vision. This example\nis licenced under BSD 3-Clause License, you can find a copy of this licence at \nhttps://github.com/pytorch/vision/blob/main/LICENSE\n\nThis software includes parts of FedML Library (https://github.com/FedML-AI/FedML).The Community \nBuilding Open and Collaborative AI Anywhere at Any Scale. FedML library is licensed under Apache License 2.0, \nyou can find a copy of this license at https://github.com/FedML-AI/FedML/blob/master/LICENSE\n\nThis software includes parts of FedNewsRec-EMNLP-Findings-2020 repository (https://github.com/taoqi98/FedNewsRec).  \nCode from the paper \"Privacy-Preserving News Recommendation Model Learning\". This example is licenced \nunder MIT License, you can find a copy of this licence at https://github.com/taoqi98/FedNewsRec/blob/master/LICENSE\n\nThis software includes parts of Fast AutoAugment repository (https://github.com/kakaobrain/fast-autoaugment).  \nCode from the paper \"Fast AutoAugment\" (Accepted at NeurIPS 2019). This example is licenced \nunder MIT License, you can find a copy of this licence at https://github.com/kakaobrain/fast-autoaugment/blob/master/LICENSE\n\nThis software includes parts of NIID-Bench repository (https://github.com/Xtra-Computing/NIID-Bench).  \nCode from the paper \"Federated Learning on Non-IID Data Silos: An Experimental Study\". This example is \nlicenced under MIT License, you can find a copy of this licence at https://github.com/Xtra-Computing/NIID-Bench/blob/main/LICENSE\n"
  },
  {
    "path": "README.md",
    "content": "# FLUTE\n\nWelcome to FLUTE (Federated Learning Utilities for Testing and Experimentation), a platform for conducting high-performance federated learning simulations.\n\n## Features\n\nFLUTE is a pytorch-based orchestration environment enabling GPU or CPU-based FL simulations.  The primary goal of FLUTE is to enable researchers to rapidly prototype and validate their ideas.  Features include:\n\n- large scale simulation (millions of clients, sampling tens of thousands per round)\n- single/multi GPU and multi-node orchestration\n- local or global differential privacy\n- model quantization\n- a variety of standard optimizers and aggregation methods\n- most model types including CNNs, RNNs, and Huggingface Transformers.\n- extensibility, enabling new models, dataloaders, optimizers, and aggregators.\n- local or cloud-based job staging using AzureML\n\n## Benchmarking\n\nThe following common tasks were used to evaluate the performance in speed/memory utilization of FLUTE compared with the most representative simulation platforms based on their number of starts on GitHub: FedML 0.7.303 and Flower 1.0.0. \n\n|Task|Data Set|Model|Algorithm|# Clients|Clients per round|Batch Size|Client Optimizer|lr|Epochs|# Rounds|Test Freq|\n|:----|:----|:----|:----|:----|:----|:----|:----|:----|:----|:----|:----|\n|CV|MNIST|LR|FedAvg|1000|10|10|SGD|0.03|1|100|20|\n|CV|Federated EMNIST|CNN (2 Conv + 2 FC)|FedAvg|3400|10|20|SGD|0.1|1|1500|50|\n|CV|FED_CIFAR-100|ResNet-18+group normalization|FedAvg|500|10|20|SGD|0.1|1|4000|50|\n|NLP|Shakespeare|RNN (2 LSTM + 1 FC)|FedAvg|715|10|4|SGD|0.8|1|1200|50|\n\n### FedML Comparison\n\nThis comparison was carried out using Parrot (Simulator) on version 0.7.303 at commit ID [8f7f261f](https://github.com/FedML-AI/FedML/tree/8f7f261f44e58d0cb5a416b0d6fa270b42a91049). 
Showing that in some cases FLUTE can outperform 43x faster.\n\n```\n _____________________________________________________________________________\n|                    |   FedML (MPI) - Fastest   |   FLUTE (NCCL)  - Fastest  |\n| Task               | Acc | Time     | GPU Mem  | Acc | Time     | GPU Mem   |\n|--------------------|-----|----------|----------|-----|----------|-----------|\n| LR_MNIST           | ~81 | 00:03:09 | ~3060 MB | ~81 | 00:01:35 | ~1060 MB  |\n| CNN_FEMNIST        | ~83 | 05:49:52 | ~5180 MB | ~83 | 00:08:22 | ~1770 MB  |\n| RESNET_FEDCIFAR100 | ~34 | 15:55:36 | ~5530 MB | ~33 | 01:42:01 | ~1900 MB  |\n| RNN_FEDSHAKESPEARE | ~57 | 06:46:21 | ~3690 MB | ~57 | 00:21:50 | ~1270 MB  |\n -----------------------------------------------------------------------------\n```\n\nYou can find the examples above in [experiments](experiments).\n\n### Flower Comparison\n\nThis comparison was carried out using Flower (Simulator) on version 1.0.0 at commit ID [4e7fad9](https://github.com/adap/flower/tree/4e7fad99389a5ee511730841b61f279e3359cb16) with the [lr_mnist](experiments/cv_lr_mnist/) task. Showing that in some cases FLUTE can outperform 53x faster.\n\n```\n ________________________________________________\n|        |    Flower (Ray)   | FLUTE (NCCL/Gloo) |\n|        | Acc |    Time     | Acc |    Time     |\n|--------|-----|-------------|-----|-------------|\n| CPU    | ~80 |   00:30:14  | ~80 |   00:03:20  |\n| GPU 2x | ~80 |   01:21:44  | ~80 |   00:01:31  |\n| GPU 4x | ~79 |   00:56:45  | ~81 |   00:01:26  |\n ------------------------------------------------\n```\n\nYou can find the example above in the [cv_lr_mnist](experiments/cv_lr_mnist/) folder.\n\n## Quick Start\n\nInstall the requirements stated inside of `requirements.txt`. 
Ideally this should be done inside a virtual environment, for instance using Anaconda.\n\n```\nconda create -n FLUTE python==3.7\nconda activate FLUTE\npip install -r requirements.txt\n```\n\nFLUTE uses the torch.distributed API as its main communication backbone, supporting three built-in backends. For more information please refer to [Distributed Communication Package](https://pytorch.org/docs/stable/distributed.html). We highly suggest using the NCCL backend for distributed GPU training and Gloo for distributed CPU training. There is no `setup.py`, as FLUTE is not currently distributed as a package but instead meant to run from the root of the repository.\n\nAfter this initial setup, you can use the data created for the integration test for a first local run. Note that this data needs to be downloaded manually into the `testing` folder; for more instructions please look at [the README file inside `testing`](testing/README.md).\n\nFor single-GPU runs:\n\n```\npython -m torch.distributed.run --nproc_per_node=1 e2e_trainer.py -dataPath ./testing -outputPath scratch -config testing/hello_world_nlg_gru.yaml -task nlg_gru -backend nccl\n```\n\nFor multi-GPU runs (3 GPUs):\n\n```\npython -m torch.distributed.run --nproc_per_node=3 e2e_trainer.py -dataPath ./testing -outputPath scratch -config testing/hello_world_nlg_gru.yaml -task nlg_gru -backend nccl\n```\n\nThe config file `testing/hello_world_nlg_gru.yaml` has comments explaining the major sections and some important details; essentially, it consists of a very short experiment where a couple of iterations are run for just a few clients. A `scratch` folder will be created containing detailed logs.\n\n## Documentation\n\nOnline documentation is available at https://microsoft.github.io/msrflute/\n\nLocally, the documentation is inside the `doc/sphinx` folder. To build the docs on Linux:\n\n```\n$ pip install sphinx\n$ cd doc/sphinx\n$ make html\n```\n\nOn Windows, you can use the `make.bat` script. It may be necessary to `export PYTHONPATH=../../` for Sphinx to find the code.\n\n## Architecture\n\nThe core client/server training code is inside the `core` folder.\n\n- Server-side federation and global DP application take place in `server.py`, more specifically in the `OptimizationServer.train()` method.\n- Client-side training updates take place in the static method `Client.process_round()`, inside `client.py`.\n\nGeneral FL orchestration code is in `federated.py`, but for most hub-and-spoke federation scenarios you won't need to touch this (unless you want to invest in optimizing server-client calls, which would be great!). Note that FLUTE does not implement secure aggregation, since this is primarily a security feature for production scenarios; contributors are invited to add it for experimentation purposes.\n\nThe primary entry point for an experiment is the script `e2e_trainer.py`. Primary config scripts for experiments are in `configs`. For instance, a basic training scenario for a next-word prediction task is set up in `hello_world_nlg_gru_json.yaml`.\n\nPrivacy accounting is expensive, so the main parameters are logged and the actual accounting can be done offline. RDP privacy accounting is in `extensions/privacy/analysis.py`. A better accounting method is in the `dp-accountant` submodule.\n\n## Customization\n\nSee the `experiments` folder for illustrations of how dataloaders and models are customized. To include a new experiment, add the new scenario following the same folder structure as `nlg_gru` and `mlm_bert`, naming the folder after the task.\n\n## Experiments\n\nExperiments are defined by YAML files; examples are provided in the `configs` folder. These can be run either locally or on AzureML.\n\nFor running experiments on AzureML, the CLI can help. 
You should first [install the CLI](https://docs.microsoft.com/en-us/azure/machine-learning/reference-azure-machine-learning-cli) (make sure you have v2) and [create a resource group and workspace](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-manage-workspace-cli?tabs=createnewresources%2Cvnetpleconfigurationsv1cli). You can then create a compute cluster; type `az ml compute create -h` for more info. Afterwards, you should write a YAML file with instructions for the job; we provide a simple example below.\n\n```yaml\nexperiment_name: basic_example\ndescription: Basic example of AML config for submitting FLUTE jobs\ncode:\n  local_path: .\ncompute: azureml:Test\nenvironment:\n  image: pytorch/pytorch:1.9.0-cuda10.2-cudnn7-devel\ninputs:\n  data:\n    folder: azureml://datastores/data/paths/cifar\n    mode: rw_mount\ncommand: >\n  apt -y update &&\n  apt -y install openmpi-bin libopenmpi-dev openssh-client &&\n  python3 -m pip install --upgrade pip &&\n  python3 -m pip install -r requirements.txt &&\n  python -m torch.distributed.run --nproc_per_node=4 e2e_trainer.py\n  -outputPath=./outputs\n  -dataPath={inputs.data}\n  -task=classif_cnn\n  -config=./experiments/classif_cnn/config.yaml\n  -backend=nccl\n```\n\nYou should replace `compute` with the name of the one you created before, and adjust the path of the datastore containing the data -- in the example above, we created a datastore called `data` and added to it a folder called `cifar`, which contained the two HDF5 files. The command above will install dependencies and then launch a distributed job with 4 processes, for the experiment defined in `experiments/classif_cnn`. Details on how to run a job using the AzureML CLI are given [in its documentation](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-train-cli), but typically it suffices to set up the environment and type `az ml job create -f <name-of-the-yaml-file>`. On the same page of the documentation, you can also find more info about how to set up the YAML file above, in case other changes are needed.\n\nNote that the `local_path` above is relative to the location of the YAML file, so setting it to `.` assumes it is in the same folder as `e2e_trainer.py`. All files in this folder will be uploaded to Azure, including hidden folders such as `.git`, so make sure to temporarily get rid of large files and folders that are not needed.\n\nAfter launching the experiment, you can follow it on AzureML Studio, which prints logs, plots metrics, and makes the output easily available after the experiment is finished.\n\n## Privacy Accounting\n\nAccounting is expensive, so we log all the privacy parameters so that accounting can be run offline; it is best run on a Linux box with a GPU.\nIn particular, we use a DP accountant from another Microsoft repository, which is included in ours as a submodule. To use this accountant, just follow the instructions below:\n\n```\n$ git submodule update --init --recursive\n$ cd utils\n$ cd dp-accountant\n$ python setup.py install\n$ ./bin/compute-dp-epsilon --help\nusage: compute-dp-epsilon [-h] -p SAMPLING_PROBABILITY -s NOISE_MULTIPLIER -i ITERATIONS -d DELTA\n```\n\n## Third Party Notice\n\nThis software includes the files listed below from the Huggingface/Transformers library (https://github.com/huggingface/transformers) as part of task performance and preprocessing of pretrained models.\n\n    experiments/mlm_bert\n    └── utils\n        ├── trainer_pt_utils.py\n        └── trainer_utils.py\n\nThis software includes the file extensions/privacy/analysis.py from the Tensorflow/Privacy library (https://github.com/tensorflow/privacy) as part of the Renyi Differential Privacy implementation.\n\nThis software includes the script testing/build_vocab.py from the LEAF library (https://github.com/TalwalkarLab/leaf) to create the vocabulary needed to run a testing job.\n\nThis software includes the model implementation of the example ECG Classification | CNN LSTM Attention Mechanism from a Kaggle competition (https://www.kaggle.com/polomarco/ecg-classification-cnn-lstm-attention-mechanism) to reproduce the [ecg_cnn](experiments/ecg_cnn/model.py) experiment.\n\nThis software includes the model implementation of the FedNewsRec repository (https://github.com/taoqi98/FedNewsRec), code from the paper \"Privacy-Preserving News Recommendation Model Learning\" (https://arxiv.org/abs/2003.09592), ported to the PyTorch framework to reproduce the [fednewsrec](experiments/fednewsrec/model.py) experiment.\n\nThis software includes the data augmentation scripts of the Fast AutoAugment repository (https://github.com/kakaobrain/fast-autoaugment) to preprocess the data used in the [semisupervision](experiments/semisupervision/dataloaders/cifar_dataset.py) experiment.\n\nThis software includes the FedProx logic implementation of the NIID-Bench repository (https://github.com/Xtra-Computing/NIID-Bench/tree/main) as a federated aggregation method used in the [trainer](core/trainer.py) object.\n\nFor more information about third-party OSS licenses, please refer to [NOTICE.txt](NOTICE.txt).\n\n## Support\n\nYou are welcome to open issues on this repository related to bug reports and feature requests.\n\n## Contributing\n\nContributions are welcomed and encouraged. For details on how to contribute, please see [CONTRIBUTING.md](CONTRIBUTING.md).\n"
  },
  {
    "path": "SECURITY.md",
    "content": "<!-- BEGIN MICROSOFT SECURITY.MD V0.0.7 BLOCK -->\n\n## Security\n\nMicrosoft takes the security of our software products and services seriously, which includes all source code repositories managed through our GitHub organizations, which include [Microsoft](https://github.com/Microsoft), [Azure](https://github.com/Azure), [DotNet](https://github.com/dotnet), [AspNet](https://github.com/aspnet), [Xamarin](https://github.com/xamarin), and [our GitHub organizations](https://opensource.microsoft.com/).\n\nIf you believe you have found a security vulnerability in any Microsoft-owned repository that meets [Microsoft's definition of a security vulnerability](https://aka.ms/opensource/security/definition), please report it to us as described below.\n\n## Reporting Security Issues\n\n**Please do not report security vulnerabilities through public GitHub issues.**\n\nInstead, please report them to the Microsoft Security Response Center (MSRC) at [https://msrc.microsoft.com/create-report](https://aka.ms/opensource/security/create-report).\n\nIf you prefer to submit without logging in, send email to [secure@microsoft.com](mailto:secure@microsoft.com).  If possible, encrypt your message with our PGP key; please download it from the [Microsoft Security Response Center PGP Key page](https://aka.ms/opensource/security/pgpkey).\n\nYou should receive a response within 24 hours. If for some reason you do not, please follow up via email to ensure we received your original message. Additional information can be found at [microsoft.com/msrc](https://aka.ms/opensource/security/msrc). \n\nPlease include the requested information listed below (as much as you can provide) to help us better understand the nature and scope of the possible issue:\n\n  * Type of issue (e.g. 
buffer overflow, SQL injection, cross-site scripting, etc.)\n  * Full paths of source file(s) related to the manifestation of the issue\n  * The location of the affected source code (tag/branch/commit or direct URL)\n  * Any special configuration required to reproduce the issue\n  * Step-by-step instructions to reproduce the issue\n  * Proof-of-concept or exploit code (if possible)\n  * Impact of the issue, including how an attacker might exploit the issue\n\nThis information will help us triage your report more quickly.\n\nIf you are reporting for a bug bounty, more complete reports can contribute to a higher bounty award. Please visit our [Microsoft Bug Bounty Program](https://aka.ms/opensource/security/bounty) page for more details about our active programs.\n\n## Preferred Languages\n\nWe prefer all communications to be in English.\n\n## Policy\n\nMicrosoft follows the principle of [Coordinated Vulnerability Disclosure](https://aka.ms/opensource/security/cvd).\n\n<!-- END MICROSOFT SECURITY.MD BLOCK -->\n"
  },
  {
    "path": "azure-pipelines.yml",
    "content": "trigger:\n- main\n\npool:\n  vmImage: 'windows-latest'\n\nsteps:\n- task: CredScan@2\n  inputs:\n    toolMajorVersion: 'V2'\n\n- task: Semmle@1\n  env: \n    SYSTEM_ACCESSTOKEN: $(System.AccessToken)\n  inputs:\n    sourceCodeDirectory: '$(Build.SourcesDirectory)'\n    language: 'python'\n    querySuite: 'Recommended'\n    timeout: '1800'\n    ram: '16384'\n    addProjectDirToScanningExclusionList: true\n\n- task: ComponentGovernanceComponentDetection@0\n  inputs:\n    scanType: 'Register'\n    verbosity: 'Verbose'\n    alertWarningLevel: 'High'\n\n- task: PublishSecurityAnalysisLogs@2\n  inputs:\n    ArtifactName: 'CodeAnalysisLogs'\n    ArtifactType: 'Container'\n    AllTools: true\n    ToolLogsNotFoundAction: 'Standard'"
  },
  {
    "path": "configs/hello_world_mlm_bert_json.yaml",
    "content": "# Basic configuration file for running mlm_bert example using json files.\n# Parameters needed to initialize the model\nmodel_config:\n    model_type: BERT \n    model_folder: experiments/mlm_bert/model.py\n    BERT:\n        loader_type: text\n        model:\n            model_name: roberta-large\n            cache_dir: ./cache_dir\n            use_fast_tokenizer: False\n            mask_token: <mask>\n            task: mlm\n            past_index: -1\n            prediction_loss_only: false\n            process_line_by_line: false\n        training:\n            seed: 12345\n            label_smoothing_factor: 0  \n            batch_size: 64\n            max_seq_length: 256            \n\n# Configuration for differential privacy\ndp_config:\n    enable_local_dp: false  # If enabled, the rest of parameters is needed. \n    enable_global_dp: false # Local dp clips and adds noise on the client and centrally accumulates the privacy budget\n    eps: 100                # epsilon\n    global_sigma: 0.35      # Used when global dp es enabled, specifies the global Gaussian noise\n    weight_scaler: 0.0001   # indicates how the aggregation weights scaled before noise addition, and unscaled afterwards.\n    max_grad: 0.008         # max gradient\n    max_weight: 0.5         # The max_weight and min_weight should be already scaled by weight_scaler\n    min_weight: 0.0000001   # Because we scale down the weight using weight_scalar -> clip -> add noise -> scale back up.\n\n# Additional privacy metrics\nprivacy_metrics_config:\n    apply_metrics: false    # If enabled, the rest of parameters is needed. \n\n# Select the Federated optimizer to use (e.g. 
DGA, FedAvg or FedProx)\nstrategy: DGA\n\n# Determines all the server-side settings for training and evaluation rounds\nserver_config:\n    resume_from_checkpoint: true                    # Resumes from latest checkpoint iteration if available \n    do_profiling: false                             # Capture profiling information during server updates.\n    fast_aggregation: true                          \n    wantRL: false                                   # Enable/Disable Reinforcement learning\n    RL:                                             # Reinforcement Learning parameters\n        RL_path_global: false\n        marginal_update_RL: true\n        RL_path: ./RL_models\n        model_descriptor_RL: marginalUpdate\n        network_params: 300,128,128,128,64,100\n        initial_epsilon: 0.5\n        final_epsilon: 0.0001\n        epsilon_gamma: 0.90\n        max_replay_memory_size: 1000\n        minibatch_size: 16\n        gamma: 0.99\n        optimizer_config:\n            lr: 0.0003\n            type: adam\n            amsgrad: true\n        annealing_config:\n            type: step_lr\n            step_interval: epoch\n            step_size: 1\n            gamma: 0.95\n    optimizer_config:                               # Configuration for server-side optimizer\n        lr: 0.00001                                 \n        weight_decay: 0.01\n        type: adamW\n    annealing_config:                               # This section configures how the learning rate decays\n        type: step_lr\n        step_interval: epoch\n        gamma: 1.0\n        step_size: 1000\n    val_freq: 4                                     # Frequency for validation rounds\n    rec_freq: 16                                    # Frequency for testing rounds\n    initial_val : true                              # Enable initial validation round at itr=0\n    initial_rec: false                              # Enable initial testing round at itr=0\n    max_iteration: 10000                
            # Total number of rounds for FL\n    num_clients_per_iteration: 200                  # Number of clients sampled per round\n    data_config:                                    # Server-side data configuration\n        val:                                        # Validation data\n            val_data: <add path to data here>\n            task: mlm\n            mlm_probability: 0.25\n            tokenizer_type_fast: False\n            batch_size: 128\n            max_seq_length: 256\n            min_words_per_utt: 5\n            max_samples_per_user: 5000\n            mask_token: <mask>\n            num_workers: 0\n            prepend_datapath: false\n            cache_dir: ./cache_dir\n        # Note this is NOT the main training data configuration, which is configured in the \n        # client config.  This section is ignored unless you are running replay data.\n        # If you want to run replay data- set a path name for train_data_server.\n        # train:\n        #     loader_type: text\n        #     train_data: null\n        #     train_data_server: null\n        #     desired_max_samples: null\n        test:                                       # Test data configuration\n            test_data: <add path to data here>\n            task: mlm\n            mlm_probability: 0.25\n            tokenizer_type_fast: False\n            batch_size: 128\n            max_seq_length: 256\n            max_samples_per_user: 5000\n            mask_token: <mask>\n            num_workers: 0\n            prepend_datapath: false\n            cache_dir: ./cache_dir\n    type: model_optimization                        # Server type\n    aggregate_median: softmax                       # FL aggregation method\n    weight_train_loss: mag_mean_loss                # Determines how each client's weight is computed (e.g. 
grad_mean_loss, train_loss)\n    softmax_beta: 1.00                              \n    initial_lr_client: 0.00001\n    lr_decay_factor: 1.0\n    best_model_criterion: loss                      # Determines the best model based on minimal loss, for checkpointing\n    fall_back_to_best_model: false                  # If a model degrades, use the previous best model\n    # server_replay_config:                           # This only applies if the server-side training data is fully configured and loaded\n    #     server_iterations: 50\n    #     optimizer_config:\n    #         lr: 0.00002\n    #         amsgrad: true\n    #         type: adam\n\n# Dictates the learning parameters for client-side model updates. Train data is defined inside this config.\nclient_config:\n    meta_learning: basic\n    stats_on_smooth_grad: true\n    ignore_subtask: false\n    copying_train_data: false\n    do_profiling: false                             # Enables client-side training profiling\n    data_config:\n        train:                                      # This is the main training data configuration\n            list_of_train_data: <add path to data here>\n            task: mlm\n            mlm_probability: 0.25\n            tokenizer_type_fast: False\n            batch_size: 24\n            max_seq_length: 256\n            min_words_per_utt: 5\n            desired_max_samples: 5000\n            mask_token: <mask>\n            num_workers: 0\n            num_frames: 0\n            max_grad_norm: 15.0\n            prepend_datapath: false\n            cache_dir: ./cache_dir\n            pin_memory: true\n    type: optimization\n    meta_optimizer_config:\n        lr: 0.01\n        type: adam\n    optimizer_config:\n        type: adamW\n        weight_decay: 0.01\n        amsgrad: true\n    annealing_config:\n        type: step_lr\n        step_interval: epoch\n        step_size: 2\n        gamma: 1.0"
  },
  {
    "path": "configs/hello_world_nlg_gru_json.yaml",
    "content": "# Basic configuration file for running nlg_gru example using json files.\n# Parameters needed to initialize the model\nmodel_config: \n    model_type: GRU\n    model_folder: experiments/nlg_gru/model.py\n    pretrained_model_path: <add path to pretrained weights here>\n    embed_dim: 160\n    vocab_size: 10000\n    hidden_dim: 512\n    OOV_correct: false\n\n# Configuration for differential privacy\ndp_config:\n    enable_local_dp: false      # If enabled, the rest of parameters is needed. \n    # enable_local_dp: true     # Local dp clips and adds noise on the client and centrally accumulates the privacy budget\n    # eps: 100                  # epsilon\n    # max_grad: 0.008           # max gradient\n    # weight_scaler: 0.0001     # indicates how the aggregation weights scaled before noise addition, and unscaled afterwards.\n    # max_weight: 0.0001        # The max_weight and min_weight should be already scaled by weight_scaler\n    # min_weight: 0.00009       # Because we scale down the weight using weight_scalar -> clip -> add noise -> scale back up.\n\n# Additional privacy metrics\nprivacy_metrics_config:\n    apply_metrics: false             # If enabled, the rest of parameters is needed. \n    # apply_indices_extraction: true   # If we extract word indices we want to consider the rank of the words extracted.\n    # allowed_word_rank: 9000          # Any word that rank above this value is considered privacy risk\n    # apply_leakage_metric: true\n    # max_leakage: 30\n    # max_allowed_leakage: 3\n    # adaptive_leakage_threshold: 0.95 # Takes the 95th percentile of the leakage for the next round.\n    # is_leakage_weighted: true\n    # attacker_optimizer_config:\n    #     lr: 0.03\n    #     type: adamax\n    #     amsgrad: false\n\n# Select the Federated optimizer to use (e.g. 
DGA, FedAvg or FedProx)\nstrategy: FedProx\n\n# Determines all the server-side settings for training and evaluation rounds\nserver_config:   \n    wantRL: false                   # Enable/Disable Reinforcement learning\n    resume_from_checkpoint: true    # Resumes from latest checkpoint iteration if available \n    do_profiling: false             # Capture profiling information during server updates.\n    optimizer_config:               # Configuration for server-side optimizer\n        type: lamb\n        lr: 0.1\n        weight_decay: 0.005\n    annealing_config:               # This section configures how the learning rate decays\n        type: step_lr\n        step_interval: epoch\n        gamma: 1.0\n        step_size: 100\n    val_freq: 2                     # Frequency for validation rounds\n    rec_freq: 4                     # Frequency for testing rounds\n    initial_val : true              # Enable initial validation round at itr=0\n    initial_rec: false             # Enable initial testing round at itr=0\n    max_iteration: 11               # Total number of rounds for FL\n    num_clients_per_iteration: 10   # Number of clients sampled per round\n    data_config:                    # Server-side data configuration\n        val:                        # Validation data\n            batch_size: 2048\n            tokenizer_type: not_applicable\n            prepend_datapath: false\n            val_data: <add path to data here>       # Path for validation data\n            vocab_dict: <add path to vocab here>    # Path for vocabulary\n            pin_memory: true\n            num_workers: 0                          # Indicates how many workers are used for creating batches\n            num_frames: 2400                        \n            max_batch_size: 2048\n            max_num_words:  25\n            unsorted_batch: true\n        # Note this is NOT the main training data configuration, which is configured in the \n        # client config.  
This section is ignored unless you are running replay data.\n        # If you want to run replay data, set a path name for train_data_server.\n        # train:                                      \n        #     batch_size: 128\n        #     loader_type: text\n        #     tokenizer_type: not_applicable\n        #     prepend_datapath: false\n        #     train_data: null\n        #     train_data_server: null\n        #     vocab_dict: <add path to vocab here>\n        #     pin_memory: true\n        #     num_workers: 0\n        #     num_frames: 2400\n        #     desired_max_samples: 500\n        #     max_grad_norm: 10.0\n        #     max_batch_size: 128\n        #     max_num_words:  25\n        #     unsorted_batch: true\n        test:                                       # Test data configuration\n            batch_size: 2048\n            tokenizer_type: not_applicable\n            prepend_datapath: false\n            train_data: null\n            train_data_server: null\n            test_data: <add path to data here>      # Path for test data\n            vocab_dict: <add path to vocab here>    # Path for vocabulary\n            pin_memory: true\n            num_workers: 0                          # Indicates how many workers are used for creating batches\n            max_batch_size: 2048\n            max_num_words:  25\n            unsorted_batch: true\n    type: model_optimization\n    aggregate_median: softmax                       # FL aggregation method\n    weight_train_loss: train_loss                   # Determines how each client's weight is computed (e.g. 
grad_mean_loss, train_loss)\n    softmax_beta: 20.0\n    initial_lr_client: 1.0\n    lr_decay_factor: 1.0\n    best_model_criterion: loss                      # Determines the best model based on minimal loss, for checkpointing\n    fall_back_to_best_model: false                  # If a model degrades, use the previous best model\n    # server_replay_config:                           # This only applies if the server-side training data is fully configured and loaded\n    #     server_iterations: 50\n    #     optimizer_config:\n    #         type: adam\n    #         lr: 0.00002\n    #         amsgrad: true\n    \n# Dictates the learning parameters for client-side model updates. Train data is defined inside this config.\nclient_config:\n    mu: 0.001                                           # Used only for FedProx aggregation method\n    meta_learning: basic\n    stats_on_smooth_grad: true\n    ignore_subtask: false\n    num_skips_threshold: 10\n    copying_train_data: false\n    do_profiling: false                                 # Enables client-side training profiling\n    data_config:\n        train:                                          # This is the main training data configuration\n            batch_size: 64\n            tokenizer_type: not_applicable\n            prepend_datapath: false\n            list_of_train_data: <add path to data here> # Path to training data\n            vocab_dict: <add path to vocab here>        # Path to vocabulary\n            pin_memory: true\n            num_workers: 0\n            desired_max_samples: 50000\n            max_grad_norm: 20.0\n            max_batch_size: 128\n            max_num_words:  25\n            unsorted_batch: true\n    type: optimization\n    meta_optimizer_config:\n        lr: 1.0\n        type: sgd\n    optimizer_config:\n        type: sgd\n    annealing_config:\n        type: step_lr\n        step_interval: epoch\n        step_size: 1\n        gamma: 1.0"
  },
  {
    "path": "core/__init__.py",
    "content": ""
  },
  {
    "path": "core/client.py",
    "content": "# Copyright (c) Microsoft Corporation.\n# Licensed under the MIT license.\n'''\nThe Client object is short-lived, instantiated inside workers 1 to N for \nprocessing a given client's data. It's main method is the `process_round` \nfunction, used to update the model given a client's data.\n'''\n\nimport copy\nimport logging\nimport os\nimport time\n\nfrom easydict import EasyDict as edict\nfrom importlib.machinery import SourceFileLoader\nimport numpy as np\nimport torch\n\n# Internal imports\nimport core.federated as federated\nfrom .strategies import select_strategy\nfrom .trainer import (\n    Trainer,\n    run_validation_generic,\n    set_component_wise_lr,\n)\nfrom utils import (\n    ScheduledSamplingScheduler,\n    make_optimizer,\n    print_rank,\n    to_device,\n    convex_inference,\n    alpha_update,\n)\nfrom utils.dataloaders_utils import (\n    make_train_dataloader,\n    make_val_dataloader,\n    make_test_dataloader,\n    get_dataset,\n)\nimport extensions.privacy\nfrom extensions.privacy import metrics as privacy_metrics\nfrom experiments import make_model\n\nglobal train_dataset\nglobal trainset_unlab\nglobal trainset_unlab_rand\n\nclass Client:\n    # It's unclear why, but sphinx refuses to generate method docs\n    # if there is no docstring for this class.\n    \"\"\"Client class for specifying individual client training tasks\"\"\"\n\n    def __init__(self, client_id, config, send_gradients):\n        '''\n        Client side processing: computing gradients, update the model and send them back to the server\n\n        Args:\n            client_id (int): identifier for grabbing that client's data.\n            config (dict): dictionary with parameters loaded from config file.\n            send_gradients (bool): if True, model gradients are sent back;\n                otherwise, model weights are sent back.\n        '''\n        super().__init__()\n        \n        self.client_id = client_id\n        self.config = 
copy.deepcopy(config)\n        self.send_gradients = send_gradients\n\n    def get_client_data(self, dataset=None):\n        '''\"Getter\" method that returns all of the object's attributes at once.'''\n\n        client_data = self.get_data(self.client_id, dataset)\n        return self.client_id, client_data, self.config, self.send_gradients\n\n    @staticmethod\n    def get_train_dataset(data_path, client_train_config, task):\n        '''This function will obtain the dataset for all training\n        users.\n\n        Args:\n            data_path (str): path to file containing training data.\n            client_train_config (dict): training data config.\n            task (str): task name.\n        '''\n        global train_dataset\n        global trainset_unlab\n        global trainset_unlab_rand\n\n        train_dataset = get_dataset(data_path, client_train_config, task, mode=\"train\")\n\n        if task == 'semisupervision':\n            trainset_unlab = get_dataset(data_path, client_train_config, task, mode=\"train\", user_idx = -2)\n            trainset_unlab_rand = get_dataset(data_path, client_train_config, task, mode=\"train\", user_idx = -3)\n        else:\n            trainset_unlab = None\n            trainset_unlab_rand = None\n\n        return len(train_dataset.user_list)\n\n    @staticmethod\n    def get_data(clients, dataset):\n        ''' Create training dictionary'''\n\n        if dataset is None: # Training case\n            datasets = [train_dataset, trainset_unlab, trainset_unlab_rand] if trainset_unlab is not None else [train_dataset]\n        else: # Evaluation case\n            datasets = [dataset]\n\n        data_with_labels = hasattr(datasets[0],\"user_data_label\")\n        \n        strcts = [] # Returning list length will always be 1 except when the task is semisupervision\n        for dataset in datasets:\n            input_strct = {'users': [], 'num_samples': [],'user_data': dict(), 'user_data_label': dict()} if data_with_labels else {'users': [], 
'num_samples': [],'user_data': dict()}\n            for client in clients:\n                user = dataset.user_list[client]\n                input_strct['users'].append(user)\n                input_strct['num_samples'].append(dataset.num_samples[client])\n                input_strct['user_data'][user] = dataset.user_data[user]\n                if data_with_labels: \n                    input_strct['user_data_label'][user] = dataset.user_data_label[user]\n            strcts.append(edict(input_strct))\n        \n        return strcts \n\n    @staticmethod\n    def run_testvalidate(client_data, server_data, mode, model):\n        '''Called by a worker to run a test/validation pass on a client.\n\n        This function assumes set_model_for_round has already been called to\n        push the model to the client (see federated.py).\n\n        Args:\n            client_data (tuple): client data and config. It is a tuple with 4\n                components; importantly, the second component is a dict\n                containing the data, and the third component is a dict with the\n                config parsed from the YAML file.\n            server_data (tuple): server data (model parameters mostly). 
It is\n                a tuple with 3 components; importantly, the second component\n                consists of the current model parameters.\n            mode (str): whether to `test` or `val`.\n            model (torch.nn.Module): actual model without parameters.\n        '''\n\n        # Process inputs and initialize variables\n        _, data_strcts, config, _ = client_data\n        _, model_parameters, iteration = server_data\n        config = copy.deepcopy(config)\n        model_path = config[\"model_path\"]\n\n        begin = time.time()  \n\n        # Use the server's data config since we're distributing test/validate from the server\n        data_strct = data_strcts[0]\n        data_config = config['server_config']['data_config'][mode]\n        want_logits = data_config.get('wantLogits', False)\n        send_dicts = config['server_config'].get('send_dicts', False)\n\n        # Create dataloader \n        dataloader = None\n        print_rank('making dataloader with task {}'.format(config['server_config']['task']), loglevel=logging.DEBUG)\n        if mode == 'test':\n            dataloader = make_test_dataloader(data_config, data_path=None, task=config['server_config']['task'], data_strct=data_strct)\n        elif mode == 'val':\n            dataloader = make_val_dataloader(data_config, data_path=None, task=config['server_config']['task'], data_strct=data_strct)\n\n        # Set model parameters\n        n_layers, n_params = len([f for f in model.parameters()]), len(model_parameters)\n        print_rank(f'Copying model parameters... 
{n_layers}/{n_params}', loglevel=logging.DEBUG)\n        \n        model = to_device(model)\n        \n        if send_dicts: # Send model state dictionary\n            tmp = {}\n            for param_key, param_dict in zip (model.state_dict(), model_parameters):\n                tmp[param_key] = param_dict\n            model.load_state_dict(tmp)\n        else: # Send parameters\n            for p, data in zip(model.parameters(), model_parameters):\n                p.data = data.detach().clone().cuda() if torch.cuda.is_available() else data.detach().clone()\n\n        print_rank(f'Model setup complete. {time.time() - begin}s elapsed.', loglevel=logging.DEBUG)\n\n        # Compute output and metrics on the test or validation data\n        num_instances = sum(data_strct['num_samples'])\n        print_rank(f'Validating {num_instances}', loglevel=logging.DEBUG)\n        output, metrics = run_validation_generic(model, dataloader)\n        \n        # Load local model if necessary\n        if config['server_config']['type']=='personalization':\n\n            local_model = make_model(config['model_config'])\n            user = data_strct['users'][0]\n\n            local_model_name = os.path.join(model_path, user + '_model.tar')\n\n            if os.path.exists(local_model_name):\n                print_rank('Loading Local Model .. 
{}'.format(local_model_name))\n                checkpoint = torch.load(local_model_name)\n                local_model.load_state_dict(checkpoint[\"model_state_dict\"])\n\n                local_alpha_name = os.path.join(model_path, user + '_alpha')\n                if os.path.exists(local_alpha_name):\n                    alpha = torch.load(local_alpha_name)\n                    print_rank('Loading Alpha Weight from {}: Value={}'.format(local_alpha_name, alpha))\n\n                    # Run inference and get logits back\n                    if mode == 'test':\n                        dataloader = make_test_dataloader(data_config, data_path=None, task=config['server_config']['task'], data_strct=data_strct)\n                    elif mode == 'val':\n                        dataloader = make_val_dataloader(data_config, data_path=None, task=config['server_config']['task'], data_strct=data_strct)\n\n                    output_local, local_metrics = run_validation_generic(local_model, dataloader)\n                    loss_local = local_metrics['loss']['value']\n                    # Combine logits\n                    cer = convex_inference(output, output_local, alpha=alpha)\n                    metrics['loss']['value'] = (metrics['loss']['value'] + loss_local) / 2 \n                    metrics['acc']['value'] = cer\n        output = None if not want_logits else output\n\n        return output, metrics, num_instances\n\n\n\n    @staticmethod\n    def process_round(client_data, server_data, model, data_path, eps=1e-7):\n        '''Compute gradients given client's data and update model.\n\n        Args:\n            client_data (tuple): client data and config. 
It is a tuple\n                consisting of 4 components: an int indicating the client's id, a\n                dict containing that client's data, a dict with the config\n                parsed from the YAML file, and a bool indicating whether or not\n                gradients should be sent.\n            server_data (tuple): server data (model parameters mostly). It is\n                a tuple consisting of 3 components; importantly, the first is\n                a float giving the client's learning rate, and the second a list\n                of torch.Tensors with the current model parameters. \n            model (torch.nn.Module): actual model without parameters.\n            data_path (str): where to get data from.\n            eps (float): lower bound for aggregation weights.\n        '''\n\n        # Ensure the client is assigned to the correct GPU\n        if torch.cuda.is_available() and torch.cuda.device_count() == federated.size():\n            torch.cuda.set_device(federated.local_rank())\n\n        # Process inputs and initialize variables\n        client_id, data_strcts, config, send_gradients = client_data\n        initial_lr, model_parameters, iteration = server_data\n        config = copy.deepcopy(config)\n\n        model_config = config['model_config']\n        client_config = config['client_config']\n        data_config = client_config['data_config']['train']\n        semisupervision_config = client_config.get('semisupervision',None)\n        task = client_config.get('task', {})\n        trainer_config = client_config.get('trainer_config', {})\n        privacy_metrics_config = config.get('privacy_metrics_config', None)\n        model_path = config[\"model_path\"]\n\n        strategy_algo = config['strategy']\n        StrategyClass = select_strategy(strategy_algo)\n        strategy = StrategyClass('client', config)\n        print_rank(f'Client successfully instantiated strategy {strategy}', loglevel=logging.DEBUG)\n        send_dicts = 
config['server_config'].get('send_dicts', False)\n\n        begin = time.time()  \n        client_stats = {}  \n\n        data_strct = data_strcts[0]\n        user = data_strct['users'][0]\n        print_rank('Loading : {}-th client with name: {}, {} samples, {}s elapsed'.format(\n            client_id[0], user, data_strct['num_samples'][0], time.time() - begin), loglevel=logging.INFO)\n\n        # Get dataloaders\n        train_dataloader = make_train_dataloader(data_config, data_path, task=task, clientx=0, data_strct=data_strct)\n\n        # Instantiate the model object\n        if model is None:\n            model = make_model(\n                model_config,\n                dataloader_type=train_dataloader.__class__.__name__,\n                input_dim=data_config['input_dim'],\n                vocab_size=train_dataloader.vocab_size,\n            )\n        \n        # Set model parameters\n        n_layers, n_params = len([f for f in model.parameters()]), len(model_parameters)\n        print_rank(f'Copying model parameters... {n_layers}/{n_params}', loglevel=logging.DEBUG)\n        model = to_device(model)\n\n        if send_dicts: # Send model state dictionary\n            tmp = {}\n            for param_key, param_dict in zip (model.state_dict(), model_parameters):\n                tmp[param_key] = param_dict\n            model.load_state_dict(tmp)\n        else: # Send parameters\n            for p, data in zip(model.parameters(), model_parameters):\n                p.data = data.detach().clone().cuda() if torch.cuda.is_available() else data.detach().clone()\n        print_rank(f'Model setup complete. 
{time.time() - begin}s elapsed.', loglevel=logging.DEBUG)\n\n\n        # Fix parameters of layers\n        if 'updatable_names' in trainer_config:\n            set_component_wise_lr(model, client_config['optimizer_config'], trainer_config['updatable_names'])\n\n        # Create the optimizer on the workers\n        # NOTE: the server dictates the learning rate for the clients\n        client_config['optimizer_config']['lr'] = initial_lr\n        optimizer = make_optimizer(client_config['optimizer_config'], model)\n\n        # Make the scheduled sampling scheduler\n        ss_scheduler = None\n        if 'ss_config' in client_config and client_config['ss_config'] is not None:\n            ss_scheduler = ScheduledSamplingScheduler(model=model, **client_config['ss_config'])\n\n        # Make the trainer\n        trainer = Trainer(\n            model=model,\n            optimizer=optimizer,\n            ss_scheduler=ss_scheduler,\n            train_dataloader=train_dataloader,\n            server_replay_config =client_config,\n            max_grad_norm=client_config['data_config']['train'].get('max_grad_norm', None),\n            anneal_config=client_config['annealing_config'] if 'annealing_config' in client_config else None,\n            num_skips_threshold=client_config['num_skips_threshold'] if 'num_skips_threshold' in client_config else -1,\n            ignore_subtask=client_config['ignore_subtask']\n        )\n\n        if trainer.optimizer is not None:\n            initial_optimizer_state = copy.deepcopy(trainer.optimizer.state_dict())\n\n        annealing_config = client_config['annealing_config'] if 'annealing_config' in client_config else None\n\n        assert 'desired_max_samples' in client_config['data_config']['train'], 'Missing \\'desired_max_samples\\' entry in data config parameter'\n        desired_max_samples = client_config['data_config']['train']['desired_max_samples']\n\n        if trainer.optimizer is not None:  # reset the optimizer state\n       
     if initial_lr > 0:\n                trainer.optimizer.param_groups[0].update({'lr': initial_lr})\n            initial_optimizer_state = copy.deepcopy(trainer.optimizer.state_dict())\n            trainer.reset_optimizer(initial_optimizer_state, annealing_config)\n\n        # Mark the end of setup\n        end = time.time()\n        client_stats['setup'] = end - begin\n        print_rank(f'Client setup cost {client_stats[\"setup\"]}s', loglevel=logging.DEBUG)               \n        begin_training = end\n        \n        # Training begins here\n        trainer.model.train()\n        trainer.model.zero_grad()\n\n        # Save the client batches if we want to evaluate the privacy metrics\n        apply_privacy_metrics = (False if privacy_metrics_config is None else privacy_metrics_config['apply_metrics'])\n\n        # This is where training actually happens\n        algo_payload = None\n\n        if strategy_algo == 'FedLabels':\n            datasets =[get_dataset(data_path, config, task, mode=\"train\", test_only=False, data_strct=data_strcts[i], user_idx=0) for i in range(3)]\n            algo_payload = {'strategy':'FedLabels', 'data': datasets, 'iter': iteration, 'config': semisupervision_config}\n        elif strategy_algo == 'FedProx':\n            algo_payload = {'strategy':'FedProx', 'mu': client_config.get('mu',0.001)}\n        train_loss, num_samples, algo_computation = trainer.train_desired_samples(desired_max_samples=desired_max_samples, apply_privacy_metrics=apply_privacy_metrics, algo_payload = algo_payload)\n        print_rank('client={}: training loss={}'.format(client_id[0], train_loss), loglevel=logging.DEBUG)\n\n        # Estimate gradient magnitude mean/var\n        # Now computed when the sufficient stats are updated.\n        assert 'sum' in trainer.sufficient_stats\n        assert 'mean' in trainer.sufficient_stats\n        \n        trainer.train_loss = train_loss\n        trainer.num_samples = num_samples\n        trainer.algo_computation 
= algo_computation\n\n        # Compute pseudo-gradient\n        if not send_dicts:\n            for p, data in zip(trainer.model.parameters(), model_parameters):\n                data = to_device(data)\n                p.grad = data - p.data\n\n        payload = strategy.generate_client_payload(trainer) if send_gradients else None\n\n        if config['server_config']['type'] == 'personalization':\n            # Initialize convex weight alpha\n            alpha= config['client_config'].get('convex_model_interp', 0.75)\n            local_model = make_model(config['model_config'])\n            train_dataloader = make_train_dataloader(data_config, data_path, task=task, clientx=0, data_strct=data_strct)\n            local_optimizer = make_optimizer(client_config['optimizer_config'], local_model)\n\n            # Make the trainer\n            local_trainer = Trainer(\n                model=local_model,\n                optimizer=local_optimizer,\n                ss_scheduler=ss_scheduler,\n                train_dataloader=train_dataloader,\n                server_replay_config=client_config,\n                max_grad_norm=client_config['data_config']['train'].get('max_grad_norm', None),\n                anneal_config=client_config['annealing_config'] if 'annealing_config' in client_config else None,\n                num_skips_threshold=client_config[\n                    'num_skips_threshold'] if 'num_skips_threshold' in client_config else -1,\n                ignore_subtask=client_config['ignore_subtask']\n            )\n\n            local_model_name = os.path.join(model_path, user + '_model.tar')\n            local_alpha_name = os.path.join(model_path, user + '_alpha')\n\n            if os.path.exists(local_model_name):\n                print_rank('Loading Local Model .. 
{}'.format(local_model_name))\n                local_trainer.load(local_model_name, update_lr_scheduler=False, update_ss_scheduler=False)\n\n            if os.path.exists(local_alpha_name):\n                print_rank('Loading Alpha Weight .. {}'.format(local_alpha_name), loglevel=logging.INFO)\n                alpha = torch.load(local_alpha_name)\n\n            # Copy original model\n            original_local_model = local_trainer.get_model()\n\n            # Training begins here\n            local_trainer.model.train()\n            local_trainer.model.zero_grad()\n\n            # Run Local Processing\n            train_loss, num_samples = local_trainer.train_desired_samples(desired_max_samples=desired_max_samples,\n                                                                          apply_privacy_metrics=False)\n            print_rank('client={}, user:{}: LOCAL training loss={}'.format(client_id[0], user, train_loss), loglevel=logging.INFO)\n\n            local_trainer.save(\n                model_path=model_path,\n                config=config,\n                token=user)\n\n            # Estimate the pseudo-gradient for local model\n            for p, orig_param in zip(local_trainer.model.parameters(), original_local_model.parameters()):\n                orig_param = orig_param.cuda() if torch.cuda.is_available() else orig_param\n                p.grad = orig_param.data - p.data\n\n            alpha = alpha_update(local_trainer.model, trainer.model, alpha, initial_lr)\n            torch.save(alpha, local_alpha_name)\n            local_trainer.model.zero_grad()\n\n\n        # Mark that training (including post-processing) is finished\n        end = time.time()\n        client_stats['training'] = end - begin_training\n        client_stats['full cost'] = end - begin\n        print_rank(f'Client training cost {end - begin_training}s', loglevel=logging.DEBUG)      \n        print_rank(f'Client full cost {end - begin}s', loglevel=logging.DEBUG)\n\n        # 
Create dictionary that is sent back to server\n        client_output = {\n            'cs': client_stats,\n            'tl': train_loss,\n            'mg': trainer.sufficient_stats['mag'],\n            'vg': trainer.sufficient_stats['var'],\n            'ng': trainer.sufficient_stats['mean'],\n            'rg': trainer.sufficient_stats['norm'],\n            'ns': num_samples,\n            'pl': payload,\n        }\n\n        # Apply privacy metrics\n        if privacy_metrics_config and privacy_metrics_config['apply_metrics']:\n            print_rank('Applying privacy metrics', loglevel=logging.DEBUG)\n\n            privacy_stats = {'Dropped clients': 0}\n            batches = trainer.cached_batches\n            trainer.cached_batches = []\n            gradients = extensions.privacy.unroll_network(model.named_parameters(), select_grad=True)[0]\n\n            if privacy_metrics_config['apply_indices_extraction']:\n                allowed_word_rank = privacy_metrics_config.get('allowed_word_rank', 9000)\n                embed_dim, vocab_size = model_config['embed_dim'], model_config['vocab_size']\n                overlap, indices = privacy_metrics.extract_indices_from_embeddings(gradients, batches, embed_dim, vocab_size)\n\n                max_overlap = privacy_metrics_config.get('max_allowed_overlap', None)\n                if max_overlap is not None and overlap > max_overlap:\n                    print_rank('Removing this client because we extracted {}% words and the maximum allowed is {}%'.format(overlap * 100, max_overlap * 100))\n                    client_output['wt'] = 0.0\n                    privacy_stats['Dropped clients'] = 1\n\n                privacy_stats['Extracted indices percentage'] = overlap\n                privacy_stats['Words percentage above ' + str(allowed_word_rank) + ' word rank'] = (indices > allowed_word_rank).mean() if len(indices) > 0 else 0\n\n            if privacy_metrics_config['apply_leakage_metric']:\n                print_rank('Applying leakage metric', loglevel=logging.DEBUG)\n\n                orig_params = {n: p for (n, _), p in zip(trainer.model.named_parameters(), model_parameters)}\n                max_ratio = np.exp(privacy_metrics_config['max_leakage'])\n                optim_config = privacy_metrics_config['attacker_optimizer_config']\n                is_leakage_weighted = privacy_metrics_config['is_leakage_weighted']\n\n                leakage = privacy_metrics.practical_epsilon_leakage(orig_params,\n                    trainer.model, batches, is_leakage_weighted, max_ratio, optim_config)\n                print_rank('privacy leakage: {}'.format(leakage), loglevel=logging.DEBUG)\n\n                max_leakage = privacy_metrics_config.get('max_allowed_leakage', None)\n                if max_leakage is not None and leakage > max_leakage:\n                    print_rank('Removing this client because the information leakage/practical epsilon is {} and the maximum allowed is {}'.format(leakage, max_leakage))\n                    client_output['wt'] = 0.0\n                    privacy_stats['Dropped clients'] = 1\n\n                privacy_stats['Practical epsilon (Max leakage)'] = leakage\n\n            client_output['ps'] = privacy_stats\n\n        client_output['ts'] = time.time()\n        return client_output\n"
  },
  {
    "path": "core/config.py",
"content": "# Note this import requires python 3.7+\n# Do we want to commit to this?\nfrom __future__ import annotations\nfrom dataclasses import dataclass\nfrom collections.abc import MutableMapping\nfrom cerberus import Validator\nfrom importlib.machinery import SourceFileLoader\nfrom utils.utils import print_rank\nimport os\n\n\n# TODO everywhere: choose reasonable defaults.\n# TODO: decide where task should live as a setting, maybe its own TaskConfig\n# TODO: docstrings everywhere\n\n# TODO: Make ModelConfig a base class that different models inherit from\n# We could specify the modelconfig class in the config file,\n# like we do for model.py.  The current implementation mixes NLG and BERT\n\n# TODO: DatasetConfig needs to be teased apart.\n# The main issue is we have *_data, list_of_train_data, train_data_server.\n# They all essentially perform the same function in different contexts.\n# also some no-longer-used parameters are still present.\n\n# TODO: it's not clear what MutableMapping methods need overrides - we\n# could probably just use the default implementation.\n\n# TODO: not all pytorch optimizers can handle amsgrad - we should\n# have distinct subclasses for the different optimizers\n\ndef from_dict(cls, config):\n    \"\"\"\n    Helper function to construct a config class instance from a dict.\n    \"\"\"\n    return cls(**config)\n\n\nclass Config(MutableMapping):\n    \"\"\"Base class for configuration classes.\"\"\"\n    def get(self, k: str, default=None):\n        result = getattr(self, k, default)\n        if result is None:\n            return default\n        return result\n\n    def lookup(self, s: str, default=None):\n        \"\"\"Dotted-path lookup, eg lookup('server_config.data_config.train');\n        returns `default` when any component of the path is missing.\"\"\"\n        toks = s.split('.')\n        child = getattr(self, toks[0], default)\n        if len(toks) == 1:\n            return child if child is not None else default\n        elif isinstance(child, Config):\n            return child.lookup('.'.join(toks[1:]), default)\n        else:\n            
return default\n\n    def __getitem__(self, k):\n        return getattr(self, k)\n\n    def __setitem__(self, k, v):\n        setattr(self, k, v)\n\n    def __delitem__(self, k):\n        delattr(self, k)\n\n    def __iter__(self):\n        return iter(self.__dict__)\n\n    def __len__(self):\n        return len(self.__dict__)\n\n    def __contains__(self, k):\n        return getattr(self, k, None) is not None\n\n    def pop(self, k, default=None):\n        result = self.get(k, default)\n        if k in self:\n            delattr(self, k)\n        return result\n\n\n@dataclass\nclass ModelConfig(Config):\n    \"\"\"Base class for Model configurations\n\nThe model configuration specifies model architecture, parameters, and initialization settings.\n\nAttributes:\n    model_type (str): The class name of the model to instantiate. eg GRU.\n\n    model_folder (str): The relative path to the model.py file where model_type is defined. eg experiments/nlg_gru/model.py\n\n    pretrained_model_path (str): The path to the pretrained model.  
If None, the model will be randomly initialized using the method defined in weight_init.\n\n\"\"\"\n    model_type: str = None\n    model_folder: str = None\n    pretrained_model_path: str = None\n\n    @staticmethod\n    def from_dict(config) -> ModelConfig:\n        \"\"\"Searches the model folder for config.py and if it is found the model config \n        is initialized from the class [model_type]Config\"\"\"\n        cfg_path = os.path.dirname(\"./\" + str(config['model_folder'])) + '/config.py'\n        if os.path.exists(cfg_path):\n            loader = SourceFileLoader('config', cfg_path).load_module()\n            config_class = config['model_type'] + 'Config'\n            try:\n                config_type = getattr(loader, config_class)\n                return from_dict(config_type, config)\n            except AttributeError:\n                print_rank(f\"Config class {config_class} not found in {cfg_path}\")\n                raise\n        else:\n            print_rank(f\"Warning: couldn't find {cfg_path}, falling back to dictionary.\")\n            return config\n            \n\n@dataclass\nclass BERTModelConfig(Config):\n    \"\"\"BERT model configuration\n\nThe BERT configuration specifies huggingface-specific BERT model settings.\n\nAttributes:\n    model_name (str): The name of the BERT model.  eg bert-base-uncased.\n\n    cache_dir (str): Tokenizer cache directory, will be created if it doesn't exist.\n\n    use_fast_tokenizer (bool): Whether to use the fast tokenizer.\n\n    mask_token (str): special token to use for masking.\n\n    task (str): The task to use for BERT.  
eg mlm.\n\n    past_index (int): The index of the past state in the BERT model's state dict.\n\n    prediction_loss_only (bool): if False, also produce metrics for predictions and labels.\n\n    process_line_by_line (bool): if True, process the input line-by-line.\n\nToDo:\n    * check how cache_dir is used - there's a risk of multiple processes reading/writing at the same time.\n    * verify the meaning of past_index (thanks copilot)\n    * document the difference when process_line_by_line is True vs False\n\n    \"\"\"\n    model_name: str = None\n    cache_dir: str = None\n    use_fast_tokenizer: bool = False\n    mask_token: str = '<mask>'\n    task: str = 'mlm'\n    past_index: int | None = -2\n    prediction_loss_only: bool = False\n    process_line_by_line: bool = False\n\n    @staticmethod\n    def from_dict(config) -> BERTModelConfig:\n        return from_dict(BERTModelConfig, config)\n\n\n@dataclass\nclass BERTTrainingConfig(Config):\n    \"\"\"BERT training configuration\n\n    Configuration settings for BERT training.\n\n    Attributes:\n        seed (int): random seed for reproducibility.\n\n        label_smoothing_factor (float): label smoothing factor.  Label smoothing is applied when the factor is non-zero.\n\n        batch_size (int): batch size.\n\n        max_seq_length (int): maximum input sequence length.\n    \"\"\"\n    seed: int | None = None\n    label_smoothing_factor: float | None = None\n    batch_size: int | None = None\n    max_seq_length: int | None = None\n\n    @staticmethod\n    def from_dict(config) -> BERTTrainingConfig:\n        return from_dict(BERTTrainingConfig, config)\n\n\n@dataclass\nclass BERTConfig(Config):\n    \"\"\"BERT configuration\n    Specifies the model and training configuration for huggingface modeling scenarios.\n\n    Attributes:\n        loader_type (str): loader type hint. 
eg 'text'\n\n        model (BERTModelConfig): BERT model configuration.\n\n        training (BERTTrainingConfig): BERT training configuration.\n    \"\"\"\n    loader_type: str = None\n    model: BERTModelConfig = None\n    training: BERTTrainingConfig = None\n\n    @staticmethod\n    def from_dict(config) -> BERTConfig:\n        result = BERTConfig()\n        for k in config:\n            if k == 'model':\n                result.model = BERTModelConfig.from_dict(config[k])\n            elif k == 'training':\n                result.training = BERTTrainingConfig.from_dict(config[k])\n            else:\n                setattr(result, k, config[k])\n        return result\n\n\n@dataclass\nclass PrivacyConfig(Config):\n    \"\"\"Privacy configuration\n\n    The privacy configuration specifies differential privacy settings for the model.\n    The user can choose between local and global DP.  When local DP is enabled, a global\n    epsilon can be computed by applying the RDP accountant (see extensions/privacy).\n    The `eps` parameter is used to specify the privacy budget for local DP.  Conversely, when\n    global DP is enabled, `eps` is ignored and `global_sigma` directly specifies the global\n    Gaussian noise.  `max_grad` specifies the clipping parameter for local or global DP,\n    `max_weight` specifies the clipping parameter for the local gradient aggregation weight\n    (applies to softmax aggregation), and `weight_scaler` indicates how the aggregation weight\n    is scaled before noise addition, and unscaled afterward. This enables a single eps/sigma\n    parameter for both the gradient and its weight.\n\n    Example:\n       This example applies local DP with eps=100. The global epsilon will be computed using Renyi DP accounting.\n\n       .. 
code-block:: yaml\n\n            dp_config:\n                # Local dp clips and adds noise on the client and centrally accumulates the privacy budget.\n                enable_local_dp: true\n                eps: 100 # epsilon\n                max_grad: 0.008  # max gradient\n                # The max_weight and min_weight should be already scaled by weight_scaler\n                # Because we scale down the weight using weight_scaler -> clip -> add noise -> scale back up.\n                max_weight: 0.0001\n                weight_scaler: 0.0001\n                min_weight: 0.00009\n\n\n    Attributes:\n        enable_local_dp (bool): whether to enable local DP.\n\n        enable_global_dp (bool): whether to enable global DP.\n\n        eps (float): the privacy budget for local DP.\n\n        delta (float): the privacy delta parameter for local DP.\n\n        global_sigma (float): the global Gaussian noise for global DP.\n\n        max_grad (float): the gradient clipping parameter.\n\n        max_weight (float): the aggregation weight clipping parameter.\n\n        weight_scaler (float): the aggregation weight scaling parameter.\n\n        min_weight (float): the minimum per-gradient aggregation weight.\n\n    \"\"\"\n    enable_local_dp: bool = False\n    enable_global_dp: bool = False\n    eps: float | None = None\n    delta: float | None = None\n    global_sigma: float | None = None\n    max_grad: float | None = None\n    max_weight: float | None = None\n    weight_scaler: float | None = None\n    min_weight: float | None = None\n\n    @staticmethod\n    def from_dict(config) -> PrivacyConfig:\n        return from_dict(PrivacyConfig, config)\n\n\n@dataclass\nclass PrivacyMetricsConfig(Config):\n    \"\"\"Privacy metrics configuration\n\n    This optional feature computes local privacy metrics for computed gradients,\n    and optionally filters gradients based on estimated privacy loss.\n\n    Attributes:\n        apply_metrics (bool): whether to compute 
privacy metrics.\n\n        apply_indices_extraction (bool): whether to attempt local data reconstruction.\n\n        allowed_word_rank (int): threshold for successful reconstruction.\n\n        apply_leakage_metric (bool): whether to compute a privacy leakage metric based on the ratio of perplexities before and after local training.\n\n        max_leakage (float): bounds the leakage computation; exp(max_leakage) is passed to the leakage metric as the maximum ratio.\n\n        max_allowed_leakage (float): the maximum allowed leakage before the client's gradient is dropped (its weight is set to 0).\n\n        adaptive_leakage_threshold (float): if non-zero, compute an adaptive leakage threshold based on the previous round of training.  For example at 0.95, the max_leakage will be adjusted to reject 5% of gradients, based on the previous round of training.\n\n        is_leakage_weighted (bool): scales the leakage by the maximum likelihood of the pre- and post-training likelihood tensors. i.e. the worst-case leakage is weighted by the worst-case likelihood that we might encounter it.\n\n        attacker_optimizer_config (OptimizerConfig): the optimizer configuration for the reconstruction attack.\n    \"\"\"\n    apply_metrics: bool = False\n    apply_indices_extraction: bool = False\n    allowed_word_rank: int | None = None\n    apply_leakage_metric: bool = False\n    max_leakage: float | None = None\n    max_allowed_leakage: float | None = None\n    adaptive_leakage_threshold: float | None = None\n    is_leakage_weighted: bool = False\n    attacker_optimizer_config: OptimizerConfig = None\n\n    @staticmethod\n    def from_dict(config) -> PrivacyMetricsConfig:\n        result = PrivacyMetricsConfig()\n        for k in config:\n            if k == 'attacker_optimizer_config':\n                result.attacker_optimizer_config = \\\n                    OptimizerConfig.from_dict(config[k])\n            else:\n                setattr(result, k, config[k])\n        return result\n\n\n@dataclass\nclass OptimizerConfig(Config):\n    \"\"\"Optimizer configuration\n\n    Pass any pytorch-supported optimizer configuration. 
The object should include\n    a `type` field which indicates the pytorch optimizer type that should be invoked.\n    This will be stripped from the object before being passed to the Optimizer's init.\n    \"\"\"\n    type: str = None\n    # Leave this open for any keyword arguments, so we don't break torch constructors\n    # In the future we can limit keywords to torch-specific ones.\n    # lr: float = 0.0\n    # weight_decay: float = 0.0\n    # amsgrad: bool = False\n\n    @staticmethod\n    def from_dict(config) -> OptimizerConfig:\n        # needs its own from_dict so we can accommodate any fields\n        result = OptimizerConfig()\n        assert 'type' in config\n        for k in config:\n            setattr(result, k, config[k])\n        return result\n\n\n@dataclass\nclass AnnealingConfig(Config):\n    \"\"\"Learning rate annealing configuration\n\n\n    Attributes:\n        type (str): the type of annealing. Supported methods: :code:`step_lr`, :code:`multi_step_lr`, :code:`rampup-keep-expdecay-keep`, :code:`val_loss`.\n\n        step_interval (str): the interval at which to step the learning rate. 
Supported intervals: :code:`epoch`, :code:`batch`.\n\n        gamma (float): the learning rate decay factor.\n\n        step_size (int): the interval between annealing operations.\n    \"\"\"\n    type: str = None\n    step_interval: str = None\n    gamma: float | None = None\n    step_size: int | None = None\n\n    @staticmethod\n    def from_dict(config) -> AnnealingConfig:\n        return from_dict(AnnealingConfig, config)\n\n\n@dataclass\nclass DatasetConfig(Config):\n    # Common to all text (NLG, MLM) dataloaders\n    batch_size: int | None = None\n    loader_type: str = None\n    prepend_datapath: bool = False\n    num_workers: int | None = None\n    desired_max_samples: int | None = None\n\n    # Common to all client.train dataloaders\n    list_of_train_data: str = None\n    max_grad_norm: float | None = None  # propose moving max_grad_norm to client config\n\n    # Common to all server.train dataloaders. What is the difference?\n    train_data: str = None\n    train_data_server: str = None\n\n    # Common to server.test dataloaders\n    test_data: str = None\n\n    # Common to server.val dataloaders\n    val_data: str = None\n\n    # Specific to NLG dataloaders\n    tokenizer_type: str = None  # Note tokenizer_type appears in NLG configs, but always set to 'not applicable'\n    vocab_dict: str = None\n    pin_memory: bool = False\n    num_frames: int | None = None  # num_frames is missing from NLG server.test dataloader\n    max_batch_size: int | None = None\n    max_num_words: int | None = None\n    unsorted_batch: int | None = None\n    utterance_mvn: bool = False  # only present on NLG client.train dataloader\n\n    # Specific to MLM dataloaders\n    task: str = None\n    mlm_probability: float | None = None\n    tokenizer_type_fast: bool = False\n    max_seq_length: int | None = None\n    min_words_per_utt: int | None = None\n    max_samples_per_user: int | None = None\n    mask_token: str = None\n    cache_dir: str = None\n\n    @staticmethod\n    def 
from_dict(config) -> DatasetConfig:\n        return from_dict(DatasetConfig, config)\n\n\n@dataclass\nclass DataConfig(Config):\n    \"\"\"Data configurations\n\n    Client and server configs may each contain a data config, consisting of train, test, and validate datasets.\n    A typical configuration will define test and validate in the server data config, while the training data is defined in the client config.\n    Optionally, the server can have a training config which defines server-side training data.\n\n\n    Attributes:\n        train (DatasetConfig): the training dataset configuration.\n\n        val (DatasetConfig): the validation dataset configuration.\n\n        test (DatasetConfig): the test dataset configuration.\n    \"\"\"\n    train: DatasetConfig = None\n    val: DatasetConfig = None\n    test: DatasetConfig = None\n\n    @staticmethod\n    def from_dict(config) -> DataConfig:\n        train = DatasetConfig.from_dict(config['train']) if 'train' in config else None\n        val = DatasetConfig.from_dict(config['val']) if 'val' in config else None\n        test = DatasetConfig.from_dict(config['test']) if 'test' in config else None\n\n        return DataConfig(\n            train, val, test\n        )\n\n\n@dataclass\nclass ServerReplayConfig(Config):\n    \"\"\"Server replay configuration\n\n    When server-side training data is defined, this config defines how it is applied after each client training round.\n\n    Attributes:\n        server_iterations (int): the number of iterations to run over the server-side training data.\n\n        ignore_subtask (bool): determines which model loss is used (see ClientConfig.ignore_subtask).\n\n        optimizer_config (OptimizerConfig): the optimizer configuration to use for the server.\n    \"\"\"\n    server_iterations: int\n    ignore_subtask: bool\n    optimizer_config: OptimizerConfig\n\n    @staticmethod\n    def from_dict(config) -> ServerReplayConfig:\n        return ServerReplayConfig(\n            config['server_iterations'],\n            config['ignore_subtask'],\n            
OptimizerConfig.from_dict(config['optimizer_config'])\n        )\n\n\n@dataclass\nclass RLConfig(Config):\n    \"\"\"Reinforcement learning configuration\n\n    RL can be applied during dynamic gradient aggregation to speed up convergence. This configuration defines the settings for server-side RL to train the model for DGA.\n\n    Attributes:\n        marginal_update_RL (bool): whether to update the RL model when the loss is small.\n\n        RL_path (str): the path to the RL model to train.\n\n        RL_path_global (bool): whether the global training output path should be prepended to RL_path.\n\n        model_descriptor_RL (str): string to append to the model filename.\n\n        network_params (list): List of layer widths in the RL network. eg: 300,128,128,128,64,100\n\n        initial_epsilon (float): the initial epsilon value for the epsilon-greedy policy.\n\n        final_epsilon (float): the final epsilon value for the epsilon-greedy policy.\n\n        epsilon_gamma (float): the decay rate for the epsilon-greedy policy.\n\n        max_replay_memory_size (int): the maximum number of samples to store in the replay memory.\n\n        minibatch_size (int): the size of the minibatch to use for training.\n\n        gamma (float): the discount factor for the RL model.\n\n        optimizer_config (OptimizerConfig): the optimizer configuration to use for the RL model.\n\n        annealing_config (AnnealingConfig): the annealing configuration to use for the RL model.\n\n\n    \"\"\"\n    marginal_update_RL: bool = False\n    RL_path: str = None\n    RL_path_global: bool = False\n    model_descriptor_RL: str = None\n    network_params: list = None\n    initial_epsilon: float | None = None\n    final_epsilon: float | None = None\n    epsilon_gamma: float | None = None\n    max_replay_memory_size: int | None = None\n    minibatch_size: int | None = None\n    gamma: float | None = None\n    optimizer_config: OptimizerConfig = None\n    annealing_config: 
AnnealingConfig = None\n\n    @staticmethod\n    def from_dict(config) -> RLConfig:\n        result = RLConfig()\n        for k in config:\n            if k == 'optimizer_config':\n                result.optimizer_config = OptimizerConfig.from_dict(config[k])\n            elif k == 'annealing_config':\n                result.annealing_config = AnnealingConfig.from_dict(config[k])\n            else:\n                setattr(result, k, config[k])\n        return result\n\n\n@dataclass\nclass ServerConfig(Config):\n    \"\"\"Server configuration\n\n    The server configuration defines the server-side settings.\n\n    Attributes:\n        resume_from_checkpoint (bool): whether to resume training from a checkpoint.\n\n        max_iteration (int): the maximum number of iterations (federated training rounds) to run.\n\n        num_clients_per_iteration (int): the number of clients to use per training round.\n\n        optimizer_config (OptimizerConfig): the optimizer configuration to use server-side.\n\n        annealing_config (AnnealingConfig): the learning rate annealing configuration to use server-side.\n\n        val_freq (int): the number of iterations between validation evaluation runs.\n\n        rec_freq (int): the number of iterations between test evaluation runs.\n\n        initial_val (bool): whether to run validation before initiating training.\n\n        initial_rec (bool): whether to run test before initiating training.\n\n        wantRL (bool): whether to train the RL model.\n\n        RL (RLConfig): the RL configuration to use if wantRL is True.\n\n        data_config (DataConfig): the data configuration to use server-side.\n\n        type (str): the type of server. Currently this parameter is ignored and OptimizationServer is always used. 
However there is some validation code that checks for one of the following values:\n\n            - model_averaging\n            - optimization\n            - model_optimization\n            - cluster_finetuning\n            - cluster_parallel\n\n        aggregate_median (str): the aggregation method to use (DGA softmax, or mean). Note that this only applies when the global aggregation strategy is DGA.\n\n        weight_train_loss (str): when softmax DGA is enabled, what metric to use for weighting. One of\n\n            - train_loss\n            - mag_var_loss\n            - mag_mean_loss\n\n        softmax_beta (float): the beta value to use for the softmax DGA.\n\n        max_weight (float): the maximum allowed client weight.\n\n        initial_lr_client (float): the initial learning rate for each client.\n\n        lr_delay_factor (float): the client learning rate decay factor.\n\n        best_model_criterion (str): The metric to choose when resetting to the best model so far.\n\n        server_replay_config (ServerReplayConfig): the server replay configuration to use for any server-side training.\n\n    \"\"\"\n    resume_from_checkpoint: bool = False\n    max_iteration: int | None = None\n    num_clients_per_iteration: int | None = None\n    optimizer_config: OptimizerConfig = None\n    annealing_config: AnnealingConfig = None\n    val_freq: int | None = None\n    rec_freq: int | None = None\n    initial_val: bool = True\n    initial_rec: bool = True\n    wantRL: bool = False\n    RL: RLConfig = None\n    data_config: DataConfig = None\n    type: str = None\n    aggregate_median: str = None\n    weight_train_loss: str = None\n    softmax_beta: float | None = None\n    max_weight: float | None = None\n    initial_lr_client: float | None = None\n    lr_delay_factor: float | None = None\n    best_model_criterion: str = 'loss'\n    server_replay_config: ServerReplayConfig = None\n\n    @staticmethod\n    def from_dict(config) -> ServerConfig:\n        result = 
ServerConfig()\n\n        for k in config:\n            if k == 'optimizer_config':\n                result.optimizer_config = \\\n                    OptimizerConfig.from_dict(config[k])\n            elif k == 'annealing_config':\n                result.annealing_config = \\\n                    AnnealingConfig.from_dict(config[k])\n            elif k == 'data_config':\n                result.data_config = \\\n                    DataConfig.from_dict(config[k])\n            elif k == 'server_replay_config':\n                result.server_replay_config = \\\n                    ServerReplayConfig.from_dict(config[k])\n            elif k == 'RL':\n                result.RL = \\\n                    RLConfig.from_dict(config[k])\n            else:\n                setattr(result, k, config[k])\n        return result\n\n\n@dataclass\nclass ClientConfig(Config):\n    \"\"\"\n    Client configuration\n\n    The client configuration defines the client-side settings.\n\n    Attributes:\n        meta_learning (str): Set to 'basic'.  Currently ignored.\n\n        stats_on_smooth_grad (bool): When true, gradient statistics are reset each round. Currently, it appears these statistics aren't used.\n\n        ignore_subtask (bool): Used to determine which model loss to use. In most cases just set to False.\n\n        num_skips_threshold (int): previously used to skip users, deprecated.\n\n        copying_train_data (bool): has no effect.\n\n        do_profiling (bool): whether to enable client-side profiling.\n\n        data_config (DataConfig): the data configuration to use client-side.\n\n        type (str): the type of client. 
Currently this parameter appears to be ignored.\n\n        meta_optimizer_config (OptimizerConfig): the optimizer configuration to use for meta-learning.\n\n        optimizer_config (OptimizerConfig): the optimizer configuration to use for client-side training.\n\n        annealing_config (AnnealingConfig): the learning rate annealing configuration to use client-side.\n    \"\"\"\n    meta_learning: str = None\n    stats_on_smooth_grad: bool = False\n    ignore_subtask: bool = False\n    num_skips_threshold: int | None = None\n    copying_train_data: bool = False\n    do_profiling: bool = False\n    data_config: DataConfig = None\n    type: str = None\n    meta_optimizer_config: OptimizerConfig = None\n    optimizer_config: OptimizerConfig = None\n    annealing_config: AnnealingConfig = None\n\n    @staticmethod\n    def from_dict(config) -> ClientConfig:\n        result = ClientConfig()\n        for k in config:\n            if k == 'data_config':\n                result.data_config = DataConfig.from_dict(config[k])\n            elif k == 'meta_optimizer_config':\n                result.meta_optimizer_config = \\\n                    OptimizerConfig.from_dict(config[k])\n            elif k == 'optimizer_config':\n                result.optimizer_config = \\\n                    OptimizerConfig.from_dict(config[k])\n            elif k == 'annealing_config':\n                result.annealing_config = \\\n                    AnnealingConfig.from_dict(config[k])\n            else:\n                setattr(result, k, config[k])\n        return result\n\n\n@dataclass\nclass FLUTEConfig(Config):\n    \"\"\"\n    FLUTEConfig represents the global configuration for a training job.\n\n    Attributes:\n        model_config (ModelConfig): the model configuration to use.\n\n        dp_config (PrivacyConfig): differential privacy configuration.\n\n        privacy_metrics_config (PrivacyMetricsConfig): privacy metrics configuration.\n\n        strategy (str): Aggregation strategy, eg DGA or FedAvg.\n\n        server_config (ServerConfig): the server configuration to use.\n\n   
     client_config (ClientConfig): the client configuration to use.\n\n    \"\"\"\n    model_config: ModelConfig = None\n    dp_config: PrivacyConfig = None\n    privacy_metrics_config: PrivacyMetricsConfig = None\n    strategy: str = None\n    server_config: ServerConfig = None\n    client_config: ClientConfig = None\n\n    def validate(config):\n\n        # Join paths in config file\n        if config[\"server_config\"][\"wantRL\"]:\n            rl_path = config[\"server_config\"][\"RL\"][\"RL_path\"]\n            rl_path = os.path.join(config[\"output_path\"],rl_path) if config[\"server_config\"][\"RL\"].get(\"RL_path_global\", True) \\\n                                                            else os.path.join(config[\"output_path\"], config[\"experiment_name\"],rl_path)\n\n        if \"pretrained_model_path\" in config[\"model_config\"]:\n            config[\"model_config\"][\"pretrained_model_path\"] = os.path.join(config[\"data_path\"], config[\"model_config\"][\"pretrained_model_path\"])\n\n        for section in [\"server_config\", \"client_config\"]:\n            for mode in ['test','val','train']:\n                if mode in config[section][\"data_config\"] and \"vocab_dict\" in config[section][\"data_config\"][mode]:\n                    config[section][\"data_config\"][mode][\"vocab_dict\"] = os.path.join(config['data_path'], config[section][\"data_config\"][mode][\"vocab_dict\"])\n                \n                # TODO: Remove BERT specific parameters\n                if 'BERT' in config['model_config']:\n                    if mode!= 'train':\n                        config['server_config']['data_config'][mode]['model_name_or_path'] = config['model_config']['BERT']['model']['model_name']\n                        config['server_config']['data_config'][mode]['process_line_by_line'] = config['model_config']['BERT']['model']['process_line_by_line']\n                    else:\n                        
config['client_config']['data_config'][mode]['model_name_or_path'] = config['model_config']['BERT']['model']['model_name']\n                        config['client_config']['data_config'][mode]['process_line_by_line'] = config['model_config']['BERT']['model']['process_line_by_line']\n        return config\n\n    @staticmethod\n    def from_dict(config) -> FLUTEConfig:\n\n        # Validate schema in config file\n        schema = eval(open('./core/schema.py', 'r').read())\n        v = Validator(schema)\n        if not v.validate(config, schema):\n            raise ValueError('Missing {} argument in config file'.format(v.errors))\n        \n        # Normalize default values\n        original_config = config\n        config = v.normalized(config)\n\n        for section in ['server_config', 'client_config']:\n            for mode in config[section]['data_config'].keys():\n                diff = config[section]['data_config'][mode].keys() - original_config[section]['data_config'][mode].keys()\n                if len(diff) > 0:\n                    print_rank(\"Assigning default values for: {} in [{}][{}][data_config]\".format(diff, section, mode))\n        \n        dp_config = \\\n            PrivacyConfig.from_dict(config['dp_config']) \\\n            if 'dp_config' in config else None\n\n        priv_metrics_config = \\\n            PrivacyMetricsConfig.from_dict(config['privacy_metrics_config']) \\\n            if 'privacy_metrics_config' in config else None\n\n        strategy = config.get('strategy', 'DGA')\n\n        return FLUTEConfig(\n            ModelConfig.from_dict(config['model_config']),\n            dp_config, priv_metrics_config, strategy,\n            ServerConfig.from_dict(config['server_config']),\n            ClientConfig.from_dict(config['client_config'])\n        )\n"
  },
  {
    "path": "core/dataloader.py",
    "content": "# Copyright (c) Microsoft Corporation.\n# Licensed under the MIT license.\n\nfrom torch.utils.data import DataLoader as PyTorchDataLoader\nfrom abc import ABC\n\nclass BaseDataLoader(ABC, PyTorchDataLoader):\n    '''This is a wrapper class for PyTorch dataloaders.'''\n\n    def create_loader(self):\n        '''Returns the dataloader'''\n        return self\n"
  },
  {
    "path": "core/dataset.py",
    "content": "# Copyright (c) Microsoft Corporation.\n# Licensed under the MIT license.\n\nfrom torch.utils.data import Dataset as PyTorchDataset\nfrom abc import ABC, abstractmethod\n\nclass BaseDataset(ABC, PyTorchDataset):\n    '''This is a wrapper class for PyTorch datasets.'''\n\n    @abstractmethod\n    def __init__(self,**kwargs):\n        super(BaseDataset, self).__init__()\n        \n    @abstractmethod\n    def __getitem__(self, idx, **kwargs):\n        '''Fetches a data sample for a given key'''\n        pass\n    \n    @abstractmethod\n    def __len__(self):\n        '''Returns the size of the dataset'''\n        pass\n    \n    @abstractmethod\n    def load_data(self,**kwargs):\n        '''Wrapper method to read/instantiate the dataset'''\n        pass\n"
  },
  {
    "path": "core/evaluation.py",
    "content": "# Copyright (c) Microsoft Corporation.\n# Licensed under the MIT license.\n'''\nIn this file we define the functions for running\ntest and validation tasks inside the Server.\n'''\n\nimport logging\nimport torch\nimport numpy as np\n\n# Internal imports\nimport core.federated as federated\nfrom core.client import Client\nfrom utils import print_rank\n\n# AzureML-related libs\nfrom azureml.core import Run\nrun = Run.get_context()\n\nclass Evaluation():\n\n    def __init__(self, config, model_path, process_testvalidate, idx_val_clients, idx_test_clients, single_worker):\n\n        self.config = config\n        self.model_path = model_path\n        self.process_testvalidate = process_testvalidate\n        self.server_type = config['server_config']['type']\n        self.idx_val_clients = idx_val_clients\n        self.idx_test_clients = idx_test_clients\n        self.send_dicts = config['server_config'].get('send_dicts', False)\n        self.single_worker = single_worker\n        super().__init__()\n    \n    def run(self, eval_list, req, metric_logger=None):\n        '''Run test/validation tasks depending on the modes\n        received in the eval_list.\n        \n        Args:\n            eval_list (arr): Contains the tasks to run.\n            req (dict): information for test/val tasks.\n            metric_logger (callback, optional): callback used for logging.\n                Defaults to None, in which case AML logger is used.\n        '''\n        \n        self.worker_trainer = req['worker_trainer']\n        if self.send_dicts:\n            global_model_values = [self.worker_trainer.model.state_dict()[param_key].to(torch.device('cpu')) for param_key in self.worker_trainer.model.state_dict()]\n        else:\n            global_model_values = [p.data.to(torch.device('cpu')) for p in self.worker_trainer.model.parameters()]\n\n        if 'tmp_unsup' in req:\n            unsup_values = req['tmp_unsup'].values()\n            sup_values = 
req['tmp_sup'].values()\n            semisupervision_inference = True\n        else:\n            semisupervision_inference = False\n\n        save_model = False \n        \n        if metric_logger is None:\n            metric_logger = run.log\n\n        for mode in eval_list:\n\n            # Skipping validation round when RL is enabled\n            if 'wantRL' in self.config['server_config'] and self.config['server_config']['wantRL'] and mode == \"val\":\n                continue\n            \n            # Compute avg_loss and avg_acc\n            self.metrics = self.run_distributed_inference(mode, global_model_values)\n            req = self.initialize_req(req) if len(req) == 1 else req\n\n            # Only if for semisupervision\n            if semisupervision_inference:\n                unsup_metrics = self.run_distributed_inference(mode, unsup_values)\n                sup_metrics = self.run_distributed_inference(mode, sup_values)\n\n                for key, value in unsup_metrics.items():\n                    metric_logger(str(\"Unsup\" +mode + \" \" + key).capitalize(), value['value'])\n                    print_rank('LOG UNSUP: {}_{}={}'.format(mode, key, value['value']))\n                \n                for key, value in sup_metrics.items():\n                    metric_logger(str(\"Sup\" + mode + \" \" + key).capitalize(), value['value'])\n                    print_rank('LOG SUP: {}_{}={}'.format(mode, key, value['value']))\n\n            # Log metrics\n            for key, value in self.metrics.items():\n                metric_logger(str(mode + \" \" + key).capitalize(), value['value'])\n                print_rank('LOG: {}_{}={}: best_{}_{}={}'.format(mode, key, value['value'], mode, key, req[str(\"best_\"+ mode + \"_\" + key)]))\n\n            for key,value in self.metrics.items():\n                attr = str(\"best_\"+ mode + \"_\" + key)\n                if value['higher_is_better']:\n                    if self.metrics[key]['value'] > req[attr]: 
\n                        req[attr] = self.metrics[key]['value']\n                        save_model = True\n                else:\n                    if self.metrics[key]['value'] < req[attr]:\n                        req[attr] = self.metrics[key]['value']\n                        save_model = True\n                \n                if save_model and mode == 'val':\n                    self.worker_trainer.save(\n                        model_path=self.model_path,\n                        token=str('best_'+ mode +'_'+key),\n                        config=self.config['server_config']\n                    )\n                    save_model = False\n        \n        return req\n    \n    def initialize_req(self, req):\n        '''Update the keys, to have the same as metrics dictionary. This \n        function is only used during itr=0 for initializing the req \n        dictionary. \n\n        Args:\n            req (dict): Best results for all the metrics (e.g. best_val_acc).\n        '''\n        for mode in ['test','val']:\n            for key in self.metrics.keys():\n                attr = \"best_\"+ mode + \"_\" + key \n                req[attr] = -1.0 if self.metrics[key]['higher_is_better'] else float('inf')\n\n        return req\n\n    def run_distributed_inference(self, mode, model):\n        '''Call `run_distributed_evaluation` specifically for test or validation.\n        \n        This is just a helper function that fetches the clients depending on\n        the mode and calls `run_distributed_evaluation` using that list.\n\n        Args:\n            mode (str): `test` or `val`.\n        '''\n        if mode == 'val':\n            clients = self.idx_val_clients\n        elif mode == 'test':\n            clients = self.idx_test_clients\n        else:\n            raise NotImplementedError('Unsupported mode: {}'.format(mode))\n\n        return self.run_distributed_evaluation(mode, clients, model)\n\n    def run_distributed_evaluation(self, mode, clients, 
model):\n        '''Perform evaluation using available workers.\n\n        See also `process_test_validate` on federated.py.\n\n        Args:\n            mode (str): `test` or `val`.\n            clients (list): clients for test/val round.\n        '''\n\n        total = 0\n        self.logits = {'predictions': [], 'probabilities': [], 'labels': []}\n        server_data = (0.0, model, 0)\n        for result in self.process_testvalidate(clients, server_data, mode, self.single_worker):\n            output, metrics, count = result\n            val_metrics = {key: {'value':0, 'higher_is_better': False} for key in metrics.keys()} if total == 0 else val_metrics\n\n            for key in val_metrics:\n                val_metrics[key]['value'] += metrics[key]['value'] * count\n                val_metrics[key]['higher_is_better'] = metrics[key]['higher_is_better']\n            total += count\n            \n            if output is not None:\n                self.logits['predictions'].append(output['predictions'])\n                self.logits['probabilities'].append(output['probabilities'])\n                self.logits['labels'].append(output['labels'])\n\n        if self.logits['probabilities'] and self.logits['predictions'] and self.logits['labels']:\n            self.logits['predictions'] = np.concatenate(self.logits['predictions'])\n            self.logits['probabilities'] = np.concatenate(self.logits['probabilities'])\n            self.logits['labels'] = np.concatenate(self.logits['labels'])\n\n        for key in val_metrics:\n            val_metrics[key]['value'] = val_metrics[key]['value']/total\n            \n        self.losses = [val_metrics['loss']['value'], val_metrics['acc']['value']] # For compatibility with Server\n        return val_metrics\n\ndef make_eval_clients(dataset, config):\n    '''Generator that yields clients for evaluation, continuously.\n\n    Args:\n        dataset (torch.utils.data.Dataset): used to get client's data\n        
config (dict): used for the client's constructor\n    '''\n\n    total = sum(dataset.num_samples)\n    clients = federated.size() - 1 if federated.size()>1 else federated.size()\n    delta = total / clients + 1\n    threshold = delta\n    current_users_idxs = list()\n    current_total = 0\n\n    if config[\"server_config\"][\"type\"] == \"personalization\":  \n        for i in range(len(dataset.user_list)):\n            yield Client([i], config, False)\n    else:\n        for i in range(len(dataset.user_list)):\n            current_users_idxs.append(i)\n            count = dataset.num_samples[i]\n            current_total += count\n            if current_total > threshold:\n                print_rank(f'sending {len(current_users_idxs)} users', loglevel=logging.DEBUG)\n                yield Client(current_users_idxs, config, False)\n                current_users_idxs = list()\n                current_total = 0\n\n        if len(current_users_idxs) != 0:\n            print_rank(f'sending {len(current_users_idxs)} users -- residual', loglevel=logging.DEBUG)\n            yield Client(current_users_idxs, config, False)\n"
  },
  {
    "path": "core/federated.py",
    "content": "# Copyright (c) Microsoft Corporation.\n# Licensed under the MIT license.\n\nimport os\nimport cProfile\nimport logging\nimport threading \n\nimport torch\nimport torch.distributed as dist\nimport numpy as np\n\nfrom core.client import Client\nfrom utils import (\n    print_rank,\n    print_profiler,\n    to_device,\n)\n\nCOMMAND_UPDATE = 0\nCOMMAND_TRAIN = 1\nCOMMAND_TERMINATE = 10\nCOMMAND_TESTVAL = 11\nCOMMAND_SYNC_NODES = 9\nGLOBAL_MESSAGE = None\n\ndef encode_string(word, string_to_int = True):\n    \"\"\" Encodes/Decodes the dictionary keys into an array of integers to be sent \n    as tensors of the same shape during NCCL/Gloo P2P communication.\n    \n    Args:\n            word (string/array): key to be encoded/decoded.\n            string_to_int (bool): flag that indicates which action to perform.\n    \"\"\"\n\n    if string_to_int: # encode\n        word = word.ljust(8, ' ') if len(word) < 8 else word # padding -- 8 is max length, all tensors must have the same size during communication\n        word_encoded = [letter for letter in word.encode()]\n        return word_encoded\n    else: #decode\n        cleanup_array = [letter for letter in word if letter!= 32] # Remove padding\n        word_decoded = bytes(cleanup_array).decode()\n        return word_decoded\n\ndef rank():\n    \"\"\" Return rank of node. \"\"\"\n    return int(os.environ['RANK'])\n\ndef local_rank():\n    \"\"\" Return local rank of node. \"\"\"\n    return int(os.environ['LOCAL_RANK'])\n\ndef size():\n    \"\"\" Returns number of nodes in the distributed group, including server. \"\"\"\n    return int(os.environ['WORLD_SIZE'])\n\ndef _recv(x, src=0):\n    \"\"\" Receives tensors with a single element or a list of tensors \n    with the same shape during distributed communication. 
\"\"\"\n\n    x = torch.tensor(x) if not torch.is_tensor(x) else x\n    x = to_device(x)\n    dist.recv(tensor=x, src=src)\n    x = x.to('cpu')\n    \n    try:\n        return x.item() # single element\n    except Exception:\n        return x.tolist() # list of tensors\n\ndef _recv_gradients(src):\n    \"\"\" Receives a list of tensors with different shape during \n    distributed communication. \"\"\"\n\n    n, n_dimensions, grads = 0, 0, [] # tensors initialization -- required by torch.\n    n = _recv(n,src)\n    for i in range(n):\n        n_dimensions = _recv(n_dimensions,src)\n        dimensions = [0 for i in range(n_dimensions)]\n        dimensions = _recv(dimensions, src)\n        print_rank(f\"Received dimensions {dimensions}\", loglevel=logging.DEBUG)\n        param = to_device(torch.zeros(dimensions))\n        print_rank(f\"Shape assigned {param.shape}\", loglevel=logging.DEBUG)\n        dist.recv(param,src)\n        grads.append(param.detach().cpu())\n    torch.cuda.empty_cache() \n    return grads\n\ndef _send(x, dst=0):\n    \"\"\" Send tensors with a single element or a list of tensors \n    with the same shape during distributed communication. \"\"\"\n    x = torch.tensor(x)\n    x = to_device(x)\n    dist.send(x, dst)\n    del x \n    torch.cuda.empty_cache()\n\ndef _send_metrics(output):\n    \"\"\" Organize the keys and values from the resulting dictionary \n    from test/val rounds into arrays that are sent as independent \n    tensors during distributed communication. \"\"\"\n\n    keys = [encode_string(key) for key in output.keys()]\n    values = [float(output[key]['value']) for key in output.keys()]\n    higher_is_better = [int(output[key]['higher_is_better']) for key in output.keys()] # send the boolean as int\n\n    _send(len(keys),0) \n    _send(keys)\n    _send(values)\n    _send(higher_is_better)\n\ndef _send_gradients(gradients, dst):\n    \"\"\" Send a list of tensors with different shape during \n    distributed communication. 
\"\"\"\n\n    _send(len(gradients), dst)\n    for i in gradients:\n        dimensions = [int(d) for d in i.shape]\n        _send(len(dimensions),dst)\n        _send(dimensions,dst)\n        param = to_device(i)\n        dist.send(param,dst)\n        del param \n        torch.cuda.empty_cache()\n\ndef _send_train_output(output):\n    \"\"\" Organize the keys and values from the `client_output` \n    dictionary returned by `Client.process_round()` during training rounds,\n    into arrays that are sent as independent tensors during distributed \n    communication. \"\"\"\n\n    cs_values = [float(cs_v) for cs_v in output['cs'].values()] # cs dict -- values are flattened into a 1d array\n    pl_values = [float(output['pl']['weight'])] # pl dict\n    gradients = output['pl']['gradients'] # gradients are sent independently\n\n    if len(output.keys()) > 9: # DP metrics\n        ps_values = [float(ps_v) for ps_v in output['ps'].values()]\n        values = cs_values + [float(output[key]) for key in output.keys() if key not in ['cs','pl','ps']] + pl_values + ps_values # reorganizing values in the order expected by the Server\n    else:\n        values = cs_values + [float(output[key]) for key in output.keys() if key not in ['cs','pl']] + pl_values # reorganizing values in the order expected by the Server\n    \n    # Send data\n    _send(int(len(output.keys())),0) # Warn for number of keys\n    _send(values, 0)\n    _send_gradients(gradients, 0)\n\ndef build_grads_dict(node):\n    \"\"\" Reconstruct the dictionary `client_output` returned by \n    `Client.process_round()` on the Server side during \n    distributed communication. 
\"\"\"\n\n    # Initialize tensors\n    n_keys = 0\n    n_keys = _recv(n_keys,node)\n\n    if n_keys == 9:\n        keys = ['cs','tl','mg','vg','ng','rg','ns','ts','pl']\n        values = [0.0 for i in range(11)] # initializing tensor shape -- 11 is fixed number of keys expected\n    elif n_keys == 10:\n        keys = ['cs','tl','mg','vg','ng','rg','ns','ts','pl','ps']\n        values = [0.0 for i in range(15)] # When the privacy metrics are enabled\n    elif n_keys == 11:\n        keys = ['cs','tl','mg','vg','ng','rg','ns','wt','ts','pl','ps']\n        values = [0.0 for i in range(16)] # When the privacy metrics are enabled\n    \n    # Read data\n    values = _recv(values,node)\n    grads = _recv_gradients(node)\n    \n    cs_values = [{key: values.pop(0) for key in ['setup','training','full cost']}] # recreating cs dict\n    # Rebuilding original dictionary\n    if n_keys == 9:\n        pl_values = [{'weight':values.pop(), 'gradients': grads}] # recreating pl dict\n        values_list = cs_values + [values.pop(0) for i in range(7)] + pl_values # 7 is fixed length for remaining items\n    else:\n        ps_values = [{key: values.pop() for key in ['Practical epsilon (Max leakage)','Words percentage above 9000 word rank','Extracted indices percentage','Dropped clients']}]\n        pl_values = [{'weight':values.pop(), 'gradients': grads}] # recreating pl dict\n        values_list = cs_values + [values.pop(0) for i in range(len(values))] + pl_values + ps_values\n\n    result = dict(zip(keys,values_list))\n\n    # Cast values to original type\n    for key in ['mg','vg','ng','rg']:\n        result[key] = np.float32(result[key])\n    result['ns'] = int(result['ns'])\n                \n    return result\n\ndef build_metrics_dict(node):\n    \"\"\" Reconstruct the dictionary returned during test/val rounds\n    on the Server side during distributed communication. 
\"\"\"\n\n    # Initialize tensors\n    n = 0\n    n = _recv(n,node)\n    keys = [[0 for j in range(8)] for i in range(n)] # max_seq_len for metric name is 8\n    values = [0.0 for i in range(n)]\n    higher_is_better = [0 for i in range(n)]\n\n    # Read data\n    keys = _recv(keys,node)\n    values = _recv(values,node)\n    higher_is_better = _recv(higher_is_better,node)\n\n    # Reorganize output + decode dict keys\n    orig_keys = [encode_string(key, string_to_int=False) for key in keys]\n    values_dict = [{'value': float(v), 'higher_is_better': bool(higher_is_better[i])} for i, v in enumerate(values)]\n    metrics = dict(zip(orig_keys,values_dict))\n    num_instances = int(metrics.pop('num')['value'])\n\n    result = None, metrics, num_instances\n            \n    return result\n\ndef receive_workers_output(node_request_map, results_list, free_nodes, command, idle_nodes):\n    \"\"\" Receives the clients' output on the Server side in async/sync mode. \n    Asynchronous mode is only enabled when using the NCCL backend, given that Gloo \n    does not provide a native non-blocking way to check whether the operation \n    has completed during distributed training. \"\"\"\n\n    if dist.get_backend() == \"nccl\": # Async\n        for node, req in list(node_request_map): # iterate over a copy, since completed requests are removed below\n            if req.is_completed():\n                result = build_metrics_dict(node) if command == COMMAND_TESTVAL else build_grads_dict(node)\n                results_list.append(result)\n                free_nodes.append(node)\n                node_request_map.remove((node,req))\n                print_rank(f\"Finished releasing the nodes {free_nodes}\", loglevel=logging.DEBUG)\n    else: # Sync\n        print_rank(f\"Waiting for workers\", loglevel=logging.DEBUG)\n        gather_objects = [(None,None,None) for i in range(size())]\n        output = [None for _ in gather_objects]\n        dist.all_gather_object(output, gather_objects[rank()])\n        print_rank(f\" All workers have finished ... 
taking the remaining clients {len(output)}\", loglevel=logging.DEBUG)\n        output = [e for i,e in enumerate(output) if i not in idle_nodes] # Cleanup for idle workers\n        results_list = results_list + output[1:]\n        free_nodes = list(range(1, size()))\n    \n    return node_request_map, results_list, free_nodes\n\ndef append_async_requests(node_request_map, node):\n    \"\"\" Appends the asynchronous request sent to each worker during \n    asynchronous training. \"\"\"\n\n    ack = to_device(torch.tensor(1))\n    req = dist.irecv(tensor=ack, src=node)\n    node_request_map.append((node,req))\n    return node_request_map\n\ndef sync_idle_nodes(client_queue, free_nodes):\n    \"\"\" Request dummy outputs from the odd (idle) nodes during synchronous training\n    to prevent them from getting trapped in the state of the previous iterations. \"\"\"\n\n    idle_nodes = []\n    if len(client_queue) == 0:\n        print_rank(f\"Free idle nodes {len(free_nodes)}\", loglevel=logging.DEBUG)\n        while len(free_nodes) > 0:\n            node = free_nodes.pop()\n            idle_nodes.append(node)\n            _send(COMMAND_SYNC_NODES, node)\n    return idle_nodes\n\nclass Server:\n    \"\"\"Server object responsible for orchestration and aggregation.\n\n    The Server is one of the two objects that may exist inside of a thread, all\n    throughout its execution (the other being the Worker). 
At every round, the\n    Server samples client ids and sends their data for an available Worker to process.\n    The Workers then each produce a new model, and all models are sent to the Server\n    for aggregation.\n\n    The methods defined here are related to orchestration only, the aggregation\n    will be done by a different object which inherits from this one.\n\n    Notes:\n        This class has no :code:`__init__` method, and all its methods are static.\n        It thus only serves the purpose of grouping the methods, but nothing\n        is actually stored inside of the object.\n    \"\"\"\n    @staticmethod\n    def dispatch_clients(clients, server_data, command, mode=None, do_profiling=False, single_worker=None):\n        \"\"\"Perform the orchestration between Clients and Workers.\n\n        This function does the following:\n            1. It sends the server_data to all workers\n            2. For each available Worker:\n                2a. It sends the index of the client to instantiate\n                2b. It triggers the execution of the command on the\n                    Client.\n            3. Collect and return all client outputs.\n\n        Notes:\n            This function yields the gradients of different clients\n            as they are received. 
Therefore, the order of the results generally\n            does not correspond to the order of the clients.\n\n            All commands used during Server-Worker communication must be \n            float/integers given that torch.distributed only allows us to\n            send/recv tensors.\n\n        Args:\n            clients (list): list of clients to be processed.\n            server_data (dict): server data sent to the workers and passed to\n                clients, typically includes the global model at that step.\n            command (int): instruction for worker to execute on the Client.\n            mode (int): test/val, only provided during evaluation rounds.\n            do_profiling (bool): enables profiler during communication.\n        \n        Returns:\n            Generator of results.\n        \"\"\"\n        # Single GPU flag\n        single_gpu = True if size()==1 else False\n        print_rank(f\"Single GPU flag Server: {single_gpu}\", loglevel=logging.DEBUG)\n\n        # Some cleanup\n        torch.cuda.empty_cache()\n        torch.cuda.synchronize() if torch.cuda.is_available() else None\n\n        # Initialize communication profiler\n        profiler = None\n        if do_profiling:\n            profiler = cProfile.Profile()\n            profiler.enable()\n\n        # Update lr + model parameters each round for all workers\n        lr, model_params, nround = server_data\n        if not single_gpu:\n            for worker_rank in range(1, size()):\n                _send(COMMAND_UPDATE, worker_rank)\n                _send(lr,worker_rank)\n                _send_gradients(model_params, worker_rank)\n                _send(float(nround),worker_rank)\n                print_rank(f\"Finished sending lr {lr} and n_params {len(model_params)} to worker {worker_rank} - round {nround}\", loglevel=logging.DEBUG)\n                print_rank(f\"Finished sending server_data to workers\", loglevel=logging.DEBUG)\n        \n            client_queue = 
clients.copy()\n            print_rank(f\"Clients queue: {client_queue}\", loglevel=logging.DEBUG)\n            free_nodes = list(range(1, size()))\n            results_list, node_request_map = [], []\n\n            # Initiate computation for all clients\n            while client_queue:\n                print_rank(f\"Clients queue: {client_queue}\", loglevel=logging.DEBUG)\n                assert len(free_nodes) > 0\n                node = free_nodes.pop()\n                index = len(client_queue)-1\n                client_to_process = client_queue.pop(index) \n                print_rank(f\"Sending client {index} to worker {node}\", loglevel=logging.DEBUG)\n                _send(command, node) # The command should indicate the worker which function to run on the client\n\n                if command == COMMAND_TESTVAL:\n                    _send(mode,node) # Only for test/val has a value\n                    _send(index, node) # Worker receives the index of the client to pop\n                elif command == COMMAND_TRAIN:\n                    _send(client_to_process, node)\n                print_rank(f\"Finished assigning worker {node}, free nodes {free_nodes}\", loglevel=logging.DEBUG)\n\n                if dist.get_backend() == \"nccl\":\n                    append_async_requests(node_request_map, node)\n                    idle_nodes = None\n                else:\n                    idle_nodes = sync_idle_nodes(client_queue, free_nodes)\n    \n                # Waits until receive the output from all ranks\n                if not free_nodes:\n                    print_rank(f\"Waiting for a workers, free nodes {free_nodes}, reqs_lst {node_request_map}\", loglevel=logging.DEBUG)\n                    while len(free_nodes) == 0:\n                        node_request_map, results_list, free_nodes = receive_workers_output(node_request_map, results_list, free_nodes, command, idle_nodes)\n                        for output in results_list:\n                            
yield output\n                        results_list = []\n\n            # Wait for all workers to finish\n            while (len(node_request_map)) != 0:\n                node_request_map, results_list, free_nodes = receive_workers_output(node_request_map, results_list, free_nodes, command, idle_nodes)\n\n                for output in results_list:\n                    yield output\n                results_list = []\n        else:\n            # For a single-GPU execution, there is no P2P communication in the same GPU. Using threads to coordinate.\n            \n            global GLOBAL_MESSAGE\n            GLOBAL_MESSAGE = server_data\n\n            if command == COMMAND_TESTVAL:\n                t1 = threading.Thread(target=single_worker.trigger_evaluate)\n                t1.start()\n                t1.join()\n                yield GLOBAL_MESSAGE\n            elif command == COMMAND_TRAIN:\n                total_clients = clients.copy()\n                \n                for client_id in total_clients:\n                    GLOBAL_MESSAGE = lr, model_params, nround, client_id\n                    t1 = threading.Thread(target=single_worker.trigger_train)\n                    t1.start()\n                    t1.join()\n                    result = GLOBAL_MESSAGE\n                    yield result\n\n        if do_profiling:\n            profiler.disable()\n            print_profiler(profiler)\n\n        # Some cleanup\n        torch.cuda.empty_cache()\n        torch.cuda.synchronize() if torch.cuda.is_available() else None\n\n    @staticmethod\n    def process_clients(clients, server_data, single_worker):\n        \"\"\"Ask workers to perform training on Clients.\n\n        Args:\n            clients (list): list of client indexes sampled by `Server.py` \n                            object per iteration.\n            server_data (dict): dictionary containing model.\n\n        Returns:\n            Generator of results.\n        \"\"\"\n        return 
Server.dispatch_clients(clients, server_data, COMMAND_TRAIN, single_worker=single_worker)\n\n    @staticmethod\n    def process_testvalidate(clients, server_data, mode, single_worker):\n        \"\"\"Ask workers to perform test/val on Clients.\n\n        Args:\n            clients (list): list of client indexes for test/val rounds.\n            server_data (dict): dictionary containing model.\n            mode (str): test/val.\n\n        Returns:\n            Generator of results.\n        \"\"\"\n\n        mode = [-2] if mode == \"test\" else [2]\n        return Server.dispatch_clients(clients, server_data, COMMAND_TESTVAL, mode, single_worker=single_worker)\n\n    @staticmethod\n    def terminate_workers(terminate=True):\n        \"\"\"Terminate the execution of the workers.\"\"\"\n\n        if terminate:\n            print_rank(\"Terminating worker processes\")\n            for worker_rank in range(1, size()):\n                _send(COMMAND_TERMINATE, worker_rank)\n\nclass Worker:\n    \"\"\"Worker object responsible for instantiating Clients based on incoming data\n    from the Server and performing train/eval functions on them.\n\n    Each worker lives on a different NCCL/Gloo thread and is assigned to a different\n    GPU. 
Via the :code:`dispatch_clients` function, the Server passes to the\n    Worker specific instructions to process clients' data, typically in order\n    to generate a new model or to compute metrics.\n\n    Attributes:\n        model (torch.nn.Module): model being trained.\n        data_path (str): path where all clients' data is located.\n        do_profiling (bool): if True, analyzes execution in depth.\n        val_clients (list): clients list for validation rounds.\n        test_clients (list): clients list for testing rounds.\n        config (dict): clients configuration.\n        val_dataset (torch.utils.data.Dataset): validation dataset.\n        test_dataset (torch.utils.data.Dataset): testing dataset.\n    \"\"\"\n    def __init__(self, model=None, data_path=None, do_profiling=False, val_clients= None, \\\n                test_clients=None, config=None, val_dataset = None, test_dataset = None):\n\n        self.model = model\n        self.data_path = data_path\n        self.do_profiling = do_profiling\n        self.config = config\n        self.val_clients = val_clients\n        self.test_clients = test_clients\n        self.val_dataset = val_dataset\n        self.test_dataset = test_dataset\n\n    def run(self):\n        \"\"\"Main loop executed by worker nodes.\n        \n        This method handles the NCCL/Gloo communication between the worker and\n        the server. 
It keeps listening for commands from the Server,\n        and performs different actions on the assigned Client depending on\n        the command received.\n        \"\"\"\n        # Single GPU flag\n        single_gpu = size() == 1\n        print_rank(f\"Single GPU flag Client: {single_gpu}\", loglevel=logging.DEBUG)\n\n        if not single_gpu:\n            while True:  # keeps listening for incoming server calls\n\n                # Initialize tensors -- required by torch.distributed\n                command, client_idx, mode = 0, 0, 0  # int\n                lr, nround = torch.zeros(1), torch.zeros(1)  # float\n\n                # Read command\n                command = _recv(command)\n                print_rank(f\"Command received {command} on worker {rank()}\", loglevel=logging.DEBUG)\n\n                # Receive server data -- lr, model_params\n                if command == COMMAND_UPDATE:\n                    print_rank(f\"COMMAND_UPDATE received {rank()}\", loglevel=logging.DEBUG)\n                    lr = _recv(lr, 0)\n                    model_params = _recv_gradients(0)\n                    nround = _recv(nround, 0)\n                    server_data = (lr, model_params, int(nround))\n                    print_rank(f\"Received lr: {lr} and n_params: {len(model_params)} - round {nround}\", loglevel=logging.DEBUG)\n\n                elif command == COMMAND_TRAIN:\n                    print_rank(f\"COMMAND_TRAIN received {rank()}\", loglevel=logging.DEBUG)\n\n                    # Init profiler in training worker\n                    profiler = None\n                    if self.do_profiling:\n                        profiler = cProfile.Profile()\n                        profiler.enable()\n\n                    # Receive client id from Server\n                    client_idx = _recv(client_idx)\n                    print_rank(f\"Client idx 
received from Server: {client_idx}\", loglevel=logging.DEBUG)\n\n                    # Instantiate client\n                    client_to_process = Client(\n                            [client_idx],\n                            self.config,\n                            self.config['client_config']['type'] == 'optimization')\n\n                    # Execute Client.get_data()\n                    client_data = client_to_process.get_client_data()\n\n                    # Execute Client.process_round()\n                    output = client_to_process.process_round(client_data, server_data, self.model, self.data_path)\n\n                    # Send output back to Server\n                    if dist.get_backend() == \"nccl\":\n                        # ASYNC mode -- enabled only for nccl backend\n                        ack = to_device(torch.tensor(1))\n                        dist.isend(tensor=ack, dst=0)\n                        _send_train_output(output)\n                    else:\n                        # SYNC mode -- gloo backend does not have a non-blocking way to check if the operation is completed\n                        gather_objects = [output for _ in range(size())]\n                        output = [None for _ in gather_objects]\n                        dist.all_gather_object(output, gather_objects[rank()])\n\n                    # Some cleanup\n                    torch.cuda.empty_cache()\n                    if torch.cuda.is_available():\n                        torch.cuda.synchronize()\n\n                    if self.do_profiling:\n                        profiler.disable()\n                        print_profiler(profiler)\n\n                elif command == COMMAND_TESTVAL:\n                    print_rank(f\"COMMAND_TESTVAL received {rank()}\", loglevel=logging.DEBUG)\n\n                    # Init profiler in validation worker\n                    profiler = None\n                    if self.do_profiling:\n                        profiler = 
cProfile.Profile()\n                        profiler.enable()\n\n                    # Receive mode and client id from Server\n                    mode = _recv(mode)\n                    mode = \"test\" if mode == -2 else \"val\"\n                    client_idx = _recv(client_idx)\n                    print_rank(f\"Client idx received from Server: {client_idx}, {mode}\", loglevel=logging.DEBUG)\n\n                    # Get client and dataset\n                    clients = self.val_clients if mode == \"val\" else self.test_clients\n                    dataset = self.val_dataset if mode == \"val\" else self.test_dataset\n                    clients_queue = clients.copy()\n                    assert 0 <= client_idx < len(clients_queue)\n                    client_to_process = clients_queue.pop(client_idx)\n\n                    # Execute Client.get_data()\n                    client_data = client_to_process.get_client_data(dataset)\n\n                    # Execute Client.run_testvalidate()\n                    output = client_to_process.run_testvalidate(client_data, server_data, mode, self.model)\n\n                    # Send output back to Server\n                    if dist.get_backend() == \"nccl\":\n                        # ASYNC mode -- enabled only for nccl backend\n                        _, metrics, num_instances = output\n                        metrics['num'] = {'value': float(num_instances), 'higher_is_better': False}\n                        output = metrics\n                        print_rank(f\"Worker {rank()} output {output}\", loglevel=logging.DEBUG)\n                        ack = to_device(torch.tensor(1))\n                        dist.isend(tensor=ack, dst=0)\n                        _send_metrics(output)\n                    else:\n                        # SYNC mode -- gloo backend does not have a non-blocking way to check if the operation is completed\n                        gather_objects = [output for 
_ in range(size())]\n                        output = [None for _ in gather_objects]\n                        dist.all_gather_object(output, gather_objects[rank()])\n                        print_rank(f\"Worker {rank()} sent output back\", loglevel=logging.DEBUG)\n\n                    # Some cleanup\n                    torch.cuda.empty_cache()\n                    if torch.cuda.is_available():\n                        torch.cuda.synchronize()\n\n                    if self.do_profiling:\n                        profiler.disable()\n                        print_profiler(profiler)\n\n                elif command == COMMAND_TERMINATE:\n                    print_rank(f\"COMMAND_TERMINATE received {rank()}\", loglevel=logging.DEBUG)\n\n                    # Some cleanup\n                    torch.cuda.empty_cache()\n                    if torch.cuda.is_available():\n                        torch.cuda.synchronize()\n                    return\n\n                elif command == COMMAND_SYNC_NODES:  # Only for sync calls\n                    print_rank(f\"COMMAND_SYNC_NODES received {rank()}\", loglevel=logging.DEBUG)\n\n                    gather_objects = [None for _ in range(size())]\n                    output = [None for _ in gather_objects]\n                    dist.all_gather_object(output, gather_objects[rank()])\n                    print_rank(f\"Worker IDLE {rank()} sent dummy output back\", loglevel=logging.DEBUG)\n\n                    # Some cleanup\n                    torch.cuda.empty_cache()\n                    if torch.cuda.is_available():\n                        torch.cuda.synchronize()\n                else:\n                    assert False, \"unknown command\"\n\n    def trigger_evaluate(self):\n        global GLOBAL_MESSAGE\n\n        lr, model_params, nround = GLOBAL_MESSAGE\n        server_data = (lr, model_params, int(nround))\n        mode = \"val\"\n\n        # Get client and dataset\n        clients = self.val_clients if mode == \"val\" else self.test_clients\n        dataset = 
self.val_dataset if mode == \"val\" else self.test_dataset\n        clients_queue = clients.copy()\n        client_to_process = clients_queue.pop()\n\n        # Execute Client.get_data()\n        client_data = client_to_process.get_client_data(dataset)\n\n        # Execute Client.run_testvalidate()\n        output = client_to_process.run_testvalidate(client_data, server_data, mode, self.model)\n        _, metrics, num_instances = output\n        metrics['num'] = {'value': float(num_instances), 'higher_is_better': False}\n        GLOBAL_MESSAGE = (_, metrics, num_instances)\n\n        # Some cleanup\n        torch.cuda.empty_cache()\n        if torch.cuda.is_available():\n            torch.cuda.synchronize()\n\n    def trigger_train(self):\n        global GLOBAL_MESSAGE\n        lr, model_params, nround, client_idx = GLOBAL_MESSAGE\n        server_data = (lr, model_params, int(nround))\n\n        # Instantiate client\n        client_to_process = Client([client_idx], self.config, self.config['client_config']['type'] == 'optimization')\n\n        # Execute Client.get_data()\n        client_data = client_to_process.get_client_data()\n\n        # Execute Client.process_round()\n        GLOBAL_MESSAGE = client_to_process.process_round(client_data, server_data, self.model, self.data_path)\n\n        # Some cleanup\n        torch.cuda.empty_cache()\n        if torch.cuda.is_available():\n            torch.cuda.synchronize()"
  },
  {
    "path": "core/metrics.py",
    "content": "# Copyright (c) Microsoft Corporation.\n# Licensed under the MIT license.\n'''\nIn this file we define the wrapper class for \nimplementing metrics.\n'''\nimport logging\n\nimport numpy as np\nimport torch\n\nfrom utils import print_rank\n\nclass Metrics():\n\n    def __init__(self):\n        super().__init__()\n\n    def compute_metrics(self,dataloader, model):\n        '''This method is called by ´run_validation_generic´ function \n        inside trainer.py .\n        \n        This is just a helper function that computes the metrics returned \n        in the inference function inside ´model.py´.\n        '''\n        print_rank(\"Computing metrics\")\n        return self.call_inference(dataloader,model)\n\n    def call_inference(self, dataloader, model):\n        \n        metrics, sum_metrics = dict(), dict()\n        output_tot = {\"probabilities\": [], \"predictions\": [], \"labels\":[]}\n        counter = 0\n\n        with torch.no_grad():\n            for _, batch in enumerate(dataloader):\n                val_loss = model.loss(batch).item()\n                inf_results = model.inference(batch)\n                inf_results ['loss'] = {'value': val_loss,'higher_is_better': False}\n                output = inf_results.pop('output')\n                batch_size = inf_results.pop('batch_size')\n\n                for key in inf_results.keys():\n                    if not isinstance(inf_results[key], dict):\n                        inf_results[key] = {'value':inf_results[key],'higher_is_better': True}\n                    sum_metrics[key] = [] if not key in sum_metrics else sum_metrics[key]\n\n                if isinstance(output, dict):\n                    output_tot[\"probabilities\"].append(output[\"probabilities\"])\n                    output_tot[\"predictions\"].append(output[\"predictions\"])\n                    output_tot[\"labels\"].append(output[\"labels\"])\n\n                for q in inf_results.keys():\n                    
sum_metrics[q].append(inf_results[q]['value'] * batch_size)\n                counter += batch_size\n                torch.cuda.empty_cache()\n\n        output_tot[\"probabilities\"] = np.concatenate(output_tot[\"probabilities\"]) if output_tot[\"probabilities\"] else []\n        output_tot[\"predictions\"] = np.concatenate(output_tot[\"predictions\"]) if output_tot[\"predictions\"] else []\n        output_tot[\"labels\"] = np.concatenate(output_tot[\"labels\"]) if output_tot[\"labels\"] else []\n\n        # Post-processing of metrics\n        print_rank(f\"validation complete {counter}\", loglevel=logging.DEBUG)\n        model.set_train()\n\n        for k in inf_results.keys():\n            metrics[k] = inf_results[k]\n            metrics[k]['value'] = sum(sum_metrics[k]) / counter\n\n        print_rank(f\"validation examples {counter}\", loglevel=logging.DEBUG)\n        torch.cuda.empty_cache()\n\n        return output_tot, metrics\n"
  },
  {
    "path": "core/model.py",
    "content": "# Copyright (c) Microsoft Corporation.\n# Licensed under the MIT license.\n\nimport torch as T\nfrom abc import ABC, abstractmethod\n\nclass BaseModel(ABC, T.nn.Module):\n    '''This is a wrapper class for PyTorch models.'''\n\n    @abstractmethod\n    def __init__(self,**kwargs):\n        super(BaseModel, self).__init__()\n        \n    @abstractmethod\n    def loss(self, input):\n        '''Performs forward step and computes the loss\n\n        Returns:\n            torch: Computed loss.\n        '''\n        pass\n    \n    @abstractmethod\n    def inference(self, input):\n        '''Performs forward step and computes metrics\n             \n        Returns:\n            dict: The metrics to be computed. The following keys are\n            the minimum required by FLUTE during evaluations rounds: \n                - output\n                - acc\n                - batch_size\n\n            More metrics can be computed by adding the key with a\n            dictionary that includes the fields ´value´ and \n            ´higher_is_better´ as follows:\n\n            {'output':output, \n             'acc': accuracy, \n             'batch_size': n_samples, \n             'f1_score': {'value':f1,'higher_is_better': True}}\n        '''\n        pass\n\n    def set_eval(self):\n        '''Bring the model into evaluation mode'''\n        self.eval()\n\n    def set_train(self):\n        '''Bring the model into training mode'''\n        self.train()\n"
  },
  {
    "path": "core/schema.py",
    "content": "# Copyright (c) Microsoft Corporation.\n# Licensed under the MIT license.\n# '''\n# In this file we define the  schema for the configuration \n# files that will be pass it to an instance of the Validator \n# in e2e_trainer.py \n# '''\n\n{\n    'model_config':{\n            'required': True,\n            'type': 'dict',\n            'allow_unknown': True,\n            'schema': {\n                'model_type': {'required': True, 'type':'string'},\n                'model_folder': {'required': True, 'type':'string'},\n                'BERT':{\n                    'required':False,\n                    'type': 'dict',\n                    'allow_unknown': True,\n                    'schema':{\n                        'loader_type': {'required': False, 'type': 'string'},\n                        'model': {\n                            'required': True,\n                            'type': 'dict',\n                            'allow_unknown': True,\n                            'schema': {\n                                'model_name_or_path': {'required': False, 'type':'string'},\n                                'model_name': {'required': True, 'type':'string'},\n                                'process_line_by_line': {'required': True, 'type':'boolean'},\n                            }\n                        }\n                    }\n                },\n            }\n    },\n\n    'dp_config':{\n            'required': True,\n            'type': 'dict',\n            'allow_unknown': True,\n            'schema': {\n                'enable_local_dp': {'required': True, 'type':'boolean'},\n                'enable_global_dp': {'required': False, 'type':'boolean'},\n                'eps': {'required': False, 'type':'float'},\n                'delta': {'required': False, 'type':'float'},\n                'global_sigma': {'required': False, 'type':'float'},\n                'max_grad': {'required': False, 'type':'float'},\n                'max_weight': 
{'required': False, 'type':'float'},\n                'weight_scaler': {'required': False, 'type':'float'},\n                'min_weight': {'required': False, 'type':'float'},\n                }\n    },\n\n    'privacy_metrics_config':{\n            'required': True,\n            'type': 'dict',\n            'allow_unknown': True,\n            'schema': {\n                'apply_metrics': {'required': True, 'type':'boolean'},\n                'apply_indices_extraction': {'required': False, 'type':'boolean'},\n                'allowed_word_rank': {'required': False, 'type':'integer'},\n                'apply_leakage_metric': {'required': False, 'type':'boolean'},\n                'max_leakage': {'required': False, 'type':'float'},\n                'adaptive_leakage_threshold': {'required': False, 'type':'float'},\n                'is_leakage_weighted': {'required': False, 'type':'boolean'},\n                'attacker_optimizer_config': {'required': False, 'type':'dict', 'allow_unknown': True},\n                }\n    },\n\n    'strategy':{\n        'required': True,\n        'type': 'string'\n    },\n\n    'server_config':{\n            'required': True,\n            'type': 'dict',\n            'allow_unknown': True,\n            'schema': {\n                'wantRL': {'required': True, 'type':'boolean', 'allow_unknown': True},\n                'RL': {'required': False, 'type':'dict'},\n                'resume_from_checkpoint': {'required': True, 'type':'boolean'},\n                'do_profiling': {'required': True, 'type':'boolean'},\n                'optimizer_config': {\n                    'required': True, \n                    'type':'dict',\n                    'allow_unknown': True,\n                    'schema': {\n                        'type': {'required': True, 'type':'string', 'allowed':['sgd', 'adam','adamax', 'lars', 'LarsSGD', 'lamb', 'adamW']},\n                        'lr': {'required': True, 'type':'float'},\n                        
'weight_decay': {'required': False, 'type':'float'},\n                    }\n                },\n                'annealing_config': {\n                    'required': True, \n                    'type':'dict',\n                    'allow_unknown': True,\n                    'schema': {\n                        'type': {'required': True, 'type':'string'},\n                        'step_interval': {'required': True, 'type':'string'},\n                        'gamma': {'required': True, 'type':'float'},\n                        'step_size': {'required': True, 'type':'integer'},\n                    }\n                },\n                'val_freq': {'required': False, 'type':'integer', 'default': 1},\n                'rec_freq': {'required': False, 'type':'integer', 'default': 8},\n                'initial_val': {'required': False, 'type':'boolean', 'default': True},\n                'initial_rec': {'required': False, 'type':'boolean', 'default': False},\n                'max_iteration': {'required': False, 'type':'integer', 'default': 10000},\n                'num_clients_per_iteration': {'required': False, 'type':'integer', 'default': 1},\n                'data_config': {\n                    'required': True, \n                    'type':'dict',\n                    'allow_unknown': True,\n                    'keysrules':{'forbidden':['num_clients']},\n                    'schema': {\n                        'val': {\n                            'required': True, \n                            'type':'dict',\n                            'allow_unknown': True,\n                            'schema': {\n                                'batch_size': {'required': False, 'type':'integer', 'default': 40},\n                                'val_data': {'required': True, 'type':'string', 'nullable':True},\n                                'tokenizer_type': {'required': False, 'type':'string'},\n                                'prepend_datapath': {'required': False, 
'type':'boolean', 'default': False},\n                                'vocab_dict': {'required': False, 'type':'string'},\n                                'pin_memory': {'required': False, 'type':'boolean', 'default': True},\n                                'num_workers': {'required': False, 'type':'integer', 'default': 1},\n                                'num_frames': {'required': False, 'type':'integer', 'default': 0},\n                                'max_batch_size': {'required': False, 'type':'integer', 'default': 0},\n                                'max_num_words': {'required': False, 'type':'integer'},\n                                'max_grad_norm': {'required': False, 'type':'float', 'default': 5.0 },\n                                'unsorted_batch': {'required': False, 'type':'boolean', 'default': False},\n                                'cache_dir': {'required': False, 'type':'string'},\n                            },\n                        },\n                        'test': {\n                            'required': True, \n                            'type':'dict',\n                            'allow_unknown': True,\n                            'schema': {\n                                'batch_size': {'required': False, 'type':'integer', 'default': 40},\n                                'test_data': {'required': True, 'type':'string', 'nullable': True},\n                                'tokenizer_type': {'required': False, 'type':'string'},\n                                'prepend_datapath': {'required': False, 'type':'boolean', 'default': False},\n                                'vocab_dict': {'required': False, 'type':'string'},\n                                'pin_memory': {'required': False, 'type':'boolean', 'default': True},\n                                'num_workers': {'required': False, 'type':'integer', 'default': 1},\n                                'num_frames': {'required': False, 'type':'integer', 'default': 0},\n              
                  'max_batch_size': {'required': False, 'type':'integer', 'default': 0},\n                                'max_num_words': {'required': False, 'type':'integer'},\n                                'max_grad_norm': {'required': False, 'type':'float', 'default': 5.0 },\n                                'unsorted_batch': {'required': False, 'type':'boolean', 'default': False},\n                                'cache_dir': {'required': False, 'type':'string'},\n                            },\n                        },\n                        'train': {\n                            'required': False, \n                            'type':'dict',\n                            'allow_unknown': True,\n                            'schema': {\n                                'batch_size': {'required': False, 'type':'integer', 'default': 40},\n                                'train_data_server': {'required': False, 'type':'string'},\n                                'desired_max_samples': {'required': False, 'type':'integer'},\n                                'tokenizer_type': {'required': False, 'type':'string'},\n                                'prepend_datapath': {'required': False, 'type':'boolean', 'default': False},\n                                'vocab_dict': {'required': False, 'type':'string'},\n                                'pin_memory': {'required': False, 'type':'boolean', 'default': True},\n                                'num_workers': {'required': False, 'type':'integer', 'default': 1},\n                                'num_frames': {'required': False, 'type':'integer', 'default': 0},\n                                'max_batch_size': {'required': False, 'type':'integer', 'default': 0},\n                                'max_num_words': {'required': False, 'type':'integer'},\n                                'max_grad_norm': {'required': False, 'type':'float', 'default': 5.0 },\n                                'unsorted_batch': {'required': False, 
'type':'boolean', 'default': False},\n                                'cache_dir': {'required': False, 'type':'string'},\n                            }\n                        },\n                    }\n                },\n                'type': {\n                    'required': False, \n                    'type':'string',\n                    'allowed':['model_optimization', 'personalization'],\n                    'default': 'model_optimization'\n                },\n                'aggregate_median': {'required': False, 'type':'string'},\n                'initial_lr_client': {'required': True, 'type':'float'},\n                'lr_decay_factor': {'required': True, 'type':'float'},\n                'weight_train_loss': {'required': True, 'type':'string'},\n                'best_model_criterion': {'required': False, 'type':'string', 'default':'loss'},\n                'fall_back_to_best_model': {'required': False, 'type':'boolean', 'default': False},\n                'softmax_beta': {'required': True, 'type':'float'},\n                'server_replay_config': {\n                    'required': False, \n                    'type':'dict',\n                    'schema':{\n                        'server_iterations': {'required': True, 'type':'integer'},\n                        'optimizer_config': {\n                            'required': True, \n                            'type':'dict',\n                            'allow_unknown': True,\n                            'schema': {\n                                'type': {'required': True, 'type':'string', 'allowed':['sgd', 'adam','adamax', 'lars', 'LarsSGD', 'lamb', 'adamW']},\n                                'lr': {'required': True, 'type':'float'},\n                                'weight_decay': {'required': False, 'type':'float'},\n                                'amsgrad': {'required': False, 'type':'boolean'},\n                            }\n                        },\n                    }\n                
},\n                'nbest_task_scheduler': {\n                    'required': False, \n                    'type':'dict',\n                    'schema':{\n                        'num_tasks': {'required': True, 'type':'integer'}, \n                        'iteration_per_task': {'required': True, 'type':'integer'},\n                    }\n                },\n            }\n    },\n\n    'client_config':{\n        'required': True,\n        'type': 'dict',\n        'allow_unknown': True,\n        'schema': {\n            'meta_learning': {'required': False, 'type':'string'},\n            'stats_on_smooth_grad': {'required': False, 'type':'boolean'},\n            'ignore_subtask': {'required': True, 'type':'boolean'},\n            'num_skips_threshold': {'required': False, 'type':'integer'},\n            'copying_train_data': {'required': False, 'type':'boolean'},\n            'do_profiling': {'required': True, 'type':'boolean'},\n            'data_config': {\n                'required': True, \n                'type':'dict',\n                'allow_unknown': True,\n                'keysrules':{'forbidden':['num_clients']},\n                'schema': {\n                    'train': {\n                        'required': True, \n                        'type':'dict',\n                        'allow_unknown': True,\n                        'schema': {\n                            'batch_size': {'required': False, 'type':'integer', 'default': 40},\n                            'list_of_train_data': {'required': True, 'type':'string', 'nullable': True},\n                            'tokenizer_type': {'required': False, 'type':'string'},\n                            'prepend_datapath': {'required': False, 'type':'boolean', 'default': False},\n                            'vocab_dict': {'required': False, 'type':'string'},\n                            'pin_memory': {'required': False, 'type':'boolean', 'default': True},\n                            'num_workers': 
{'required': False, 'type':'integer', 'default': 1},\n                            'num_frames': {'required': False, 'type':'integer', 'default': 0},\n                            'max_batch_size': {'required': False, 'type':'integer', 'default': 0},\n                            'max_num_words': {'required': False, 'type':'integer'},\n                            'max_grad_norm': {'required': False, 'type':'float', 'default': 5.0 },\n                            'unsorted_batch': {'required': False, 'type':'boolean', 'default': False},\n                        }\n                    },\n                }\n            },\n            'type': {\n                'required': False, \n                'type':'string',\n                'allowed':['optimization', 'gradient_computation'],\n                'default': 'gradient_computation',\n            },\n            'meta_optimizer_config': {\n                'required': False, \n                'type':'dict',\n                'allow_unknown': True,\n                'schema': {\n                    'type': {'required': True, 'type':'string', 'allowed':['sgd', 'adam','adamax', 'lars', 'LarsSGD', 'lamb', 'adamW']},\n                    'lr': {'required': True, 'type':'float'},\n                }\n            },\n            'optimizer_config': {\n                'required': True, \n                'type':'dict',\n                'allow_unknown': True,\n                'schema': {\n                    'type': {'required': True, 'type':'string', 'allowed':['sgd', 'adam','adamax', 'lars', 'LarsSGD', 'lamb', 'adamW']},\n                    'lr': {'required': False, 'type':'float'},\n                    'weight_decay': {'required': False, 'type':'float'},\n                }\n            },\n            'annealing_config': {\n                'required': False, \n                'type':'dict',\n                'allow_unknown': True,\n                'schema': {\n                    'type': {'required': True, 'type':'string'},\n        
            'step_interval': {'required': True, 'type':'string'},\n                    'gamma': {'required': False, 'type':'float'},\n                    'step_size': {'required': False, 'type':'integer'},\n                }\n            },\n            'ss_config': {'required': False, 'type':'dict', 'allow_unknown': True},\n        }\n    },\n}"
  },
  {
    "path": "core/server.py",
    "content": "# Copyright (c) Microsoft Corporation.\n# Licensed under the MIT license.\n'''\nIn this file, we define the classes that live inside 'worker 0', the worker\nresponsible for orchestration and aggregation. The main class is the\nOptimizationServer, which sends clients to the other workers to process and\ncombines the resulting models.\n'''\n\nimport json\nimport logging\nimport os\nimport random\nimport shutil\nimport time\nfrom collections import defaultdict\n\nimport numpy as np\nimport torch\n\n# Internal imports\nimport core.federated as federated\nfrom core.evaluation import Evaluation\nfrom core.client import Client\nfrom .strategies import select_strategy\nfrom .trainer import (\n    ModelUpdater,\n    Trainer,\n    set_component_wise_lr,\n)\nfrom utils import (\n    get_lr,\n    print_rank,\n    update_json_log,\n    to_device,\n)\n\n# For profiling\nimport cProfile\nimport pstats\n\n# AzureML-related libs\nfrom azureml.core import Run\nrun = Run.get_context()\n\n\nclass OptimizationServer(federated.Server):\n    def __init__(self, num_clients, model, optimizer, ss_scheduler, data_path, model_path, server_train_dataloader,\n                 config, idx_val_clients, idx_test_clients, single_worker):\n        '''Implement Server's orchestration and aggregation.\n\n        This is the main Server class, that actually implements orchestration\n        and aggregation, inheriting from `federated.Server`, which deals with\n        communication only.\n\n        The `train` method is central in FLUTE, as it defines good part of what\n        happens during training.\n\n        Args:\n            num_clients (int): total available clients.\n            model (torch.nn.Module): neural network model.\n            optimizer (torch.optim.Optimizer): optimizer.\n            ss_scheduler: scheduled sampling scheduler.\n            data_path (str): points to where data is.\n            model_path (str): points to where pretrained model is.\n            
server_train_dataloader (torch.utils.data.DataLoader): dataloader for training\n            config (dict): JSON-style configuration parameters\n            idx_val_clients (list): validation client ids\n            idx_test_clients (list): testing client ids\n            single_worker: whether to run in single-worker mode\n        '''\n\n        super().__init__()\n\n        # Initialize all attributes from arguments\n        self.client_idx_list = list(range(num_clients))\n        self.config = config\n        server_config = config['server_config']\n        decoder_config = config.get('decoder_config', None)\n\n        self.max_iteration = server_config['max_iteration']\n        self.do_clustering = server_config.get('clustering', False)\n        self.send_dicts = server_config.get('send_dicts', False)\n\n        self.num_clients_per_iteration = [int(x) for x in server_config['num_clients_per_iteration'].split(',')] \\\n            if isinstance(server_config['num_clients_per_iteration'], str) \\\n            else [server_config['num_clients_per_iteration']]\n\n        self.val_freq = server_config['val_freq']\n        self.req_freq = server_config['rec_freq']\n\n        self.evaluation = Evaluation(config, model_path, self.process_testvalidate, idx_val_clients, idx_test_clients, single_worker)\n\n        # TODO: does this need to be adjusted for custom metrics?\n        self.metrics = dict()\n\n        self.model_backup_freq = server_config.get('model_backup_freq', 100)\n        self.worker_trainer_config = server_config.get('trainer_config', {})\n\n        self.aggregate_median = server_config['aggregate_median']\n        self.initial_lr_client = server_config.get('initial_lr_client', -1.0)\n        self.lr_decay_factor = server_config.get('lr_decay_factor', 1.0)\n\n        self.model_type = config['model_config']['model_type']\n        self.quant_thresh = config['client_config'].get('quant_thresh', None)\n        self.quant_bits = config['client_config'].get('quant_bits', 10)\n\n        self.list_of_train_data = 
config['client_config']['data_config']['train']['list_of_train_data']\n        self.data_path = data_path\n        self.single_worker = single_worker\n\n        # Get max grad norm from data config\n        if 'train' in server_config['data_config']:\n            max_grad_norm = server_config['data_config']['train'].get('max_grad_norm', None)\n        else:\n            max_grad_norm = None\n\n        # Creating an instance to update the model with stats aggregated from workers\n        self.worker_trainer = ModelUpdater(\n            model=model,\n            optimizer=optimizer,\n            ss_scheduler=ss_scheduler,\n            train_dataloader=server_train_dataloader,\n            val_dataloader=None,\n            max_grad_norm=max_grad_norm,\n            anneal_config=server_config['annealing_config'],\n            model_type=self.model_type,\n            decoder_config=decoder_config\n        )\n        self.metrics['worker_trainer'] = self.worker_trainer\n        # Creating an instance for the server-side trainer (runs mini-batch SGD)\n        self.server_replay_iterations = None\n        self.server_trainer = None\n        if server_train_dataloader is not None:\n            assert 'server_replay_config' in server_config, 'server_replay_config is not set'\n            assert 'optimizer_config' in server_config[\n                'server_replay_config'], 'server-side replay training optimizer is not set'\n            self.server_optimizer_config = server_config['server_replay_config']['optimizer_config']\n            self.server_trainer_config = server_config['server_replay_config'].get('trainer_config', {})\n            self.server_replay_iterations = server_config['server_replay_config']['server_iterations']\n            self.server_trainer = Trainer(\n                model=model,\n                optimizer=None,\n                ss_scheduler=ss_scheduler,\n                train_dataloader=server_train_dataloader,\n                
server_replay_config=server_config['server_replay_config'],\n                max_grad_norm=server_config['server_replay_config']\\\n                                            .get('max_grad_norm',server_config['data_config']['train']\\\n                                                .get('max_grad_norm',None)),\n                anneal_config=server_config['server_replay_config'].get('annealing_config', None),\n                ignore_subtask = server_config['server_replay_config'].get('ignore_subtask', False)\n            )\n\n        self.skip_model_update = False  # will not update the model if True\n\n        self.train_loss = 0.0\n        self.model_path = model_path\n        self.best_model_criterion = server_config['best_model_criterion']\n        self.fall_back_to_best_model = server_config['fall_back_to_best_model']\n        self.last_model_path = os.path.join(self.model_path, 'latest_model.tar')\n        self.best_model_path = os.path.join(self.model_path,\n            'best_val_{}_model.tar'.format(self.best_model_criterion))\n        self.log_path = os.path.join(self.model_path, 'status_log.json')\n        self.cur_iter_no = 0  # keep the iteration number for Tensor board plotting\n        self.lr_weight = 1.0\n\n        self.losses = []\n        self.no_label_updates = 0  # no. 
label updates\n\n        # Update the parameters above if the log file exists\n        if server_config.get('resume_from_checkpoint', False):\n            self.load_saved_status()\n\n        # Decoding config\n        self.decoder_config = decoder_config\n        self.spm_model = server_config['data_config']['test'].get('spm_model', None)\n\n        self.do_profiling = server_config.get('do_profiling', False)\n\n        StrategyClass = select_strategy(config['strategy'])\n        self.strategy = StrategyClass('server', self.config, self.model_path)\n        print_rank(f'Server successfully instantiated strategy {self.strategy}', loglevel=logging.DEBUG)\n\n    def load_saved_status(self):\n        '''Load checkpoint from disk'''\n\n        # Check if the model is on disk; if so, load it onto the trainer\n        if os.path.exists(self.last_model_path):\n            print_rank('Resuming from checkpoint model {}'.format(self.last_model_path))\n            self.worker_trainer.load(self.last_model_path, update_lr_scheduler=True, update_ss_scheduler=True)\n            if self.server_trainer is not None:\n                self.server_trainer.model = self.worker_trainer.model  # make sure that the models are in sync\n\n        # Check if the log is on disk; if so, load it into the current stats\n        if os.path.exists(self.log_path):\n            with open(self.log_path, 'r') as logfp:  # loading the iteration no., best loss and CER\n                elems = json.load(logfp)\n                self.cur_iter_no = elems.get('i', 0)\n                self.metrics['best_val_loss'] = elems.get('best_val_loss', float('inf'))\n                self.metrics['best_val_acc'] = elems.get('best_val_acc', 0)\n                self.metrics['best_test_loss'] = elems.get('best_test_loss', float('inf'))\n                self.metrics['best_test_acc'] = elems.get('best_test_acc', 0)\n                self.lr_weight = elems.get('weight', 1.0)\n                self.no_label_updates = elems.get('num_label_updates', 0)\n   
             print_rank(f'Resuming from status_log: cur_iter: {self.cur_iter_no}')\n\n    def run(self):\n        '''Trigger training.\n\n        This is a simple wrapper to the `train` method.\n        '''\n        print_rank('server started')\n        self.train()\n        print_rank('server terminated')\n\n    def train(self):\n        '''Main method for training.'''\n\n        self.run_stats = {\n            'secsPerClientRound': [],\n            'secsPerClient': [],\n            'secsPerClientTraining': [],\n            'secsPerClientSetup': [],\n            'secsPerClientFull': [],\n            'secsPerRoundHousekeeping': [],\n            'secsPerRoundTotal': [],\n            'communicationCosts': []\n        }\n\n        run.log('Max iterations', self.max_iteration)\n        try:\n            self.worker_trainer.model = to_device(self.worker_trainer.model)\n\n            # Do an initial validation round to understand the pretrained model's validation accuracy\n            # Skip if we resumed from a checkpoint (cur_iter_no > 0)\n            eval_list = []\n            if self.cur_iter_no == 0:\n\n                if self.config['server_config']['initial_rec']:\n                    eval_list.append('test')\n                if self.config['server_config']['initial_val']:\n                    eval_list.append('val')\n                    run.log('LR for agg. 
opt.', get_lr(self.worker_trainer.optimizer))\n\n                print_rank(\"Running {} at itr={}\".format(eval_list, self.cur_iter_no))\n                self.metrics = self.evaluation.run(eval_list, self.metrics, metric_logger=run.log)\n                eval_list = [] # some cleanup\n\n            # Dump all the information in aggregate_metric\n            print_rank('Saving Model Before Starting Training', loglevel=logging.INFO)\n            for token in ['best_val_loss', 'best_val_acc', 'best_test_acc', 'latest']:\n                self.worker_trainer.save(\n                    model_path=self.model_path,\n                    token=token,\n                    config=self.config['server_config']\n                )\n\n            # Training loop\n            self.worker_trainer.model.train()\n            for i in range(self.cur_iter_no, self.max_iteration):\n                begin = time.time()\n                metrics_payload = {}\n\n                def log_metric(k, v):\n                    metrics_payload[k] = v\n\n                print_rank('==== iteration {}'.format(i))\n                log_metric('Current iteration', i)\n\n                # Initial value for the learning rate of the worker\n                initial_lr = self.initial_lr_client * self.lr_weight\n                print_rank('Client learning rate {}'.format(initial_lr))\n\n                # Run training on clients\n                self.worker_trainer.model.zero_grad()\n                self.train_loss = []\n\n                if self.send_dicts: # Send state dictionaries\n                    glob_payload = [self.worker_trainer.model.state_dict()[param_key].to(torch.device('cpu')) for param_key in self.worker_trainer.model.state_dict()]\n                else: # Send parameters\n                    glob_payload = [p.data.to(torch.device('cpu')) for p in self.worker_trainer.model.parameters()]\n                \n                server_data = (initial_lr, glob_payload, i)\n\n                # Random 
number of clients per iteration\n                if len(self.num_clients_per_iteration) > 1:\n                    num_clients_curr_iter = random.randint(\n                        self.num_clients_per_iteration[0],\n                        self.num_clients_per_iteration[1]\n                    )\n                else:\n                    num_clients_curr_iter = self.num_clients_per_iteration[0]\n                log_metric('Clients for round', num_clients_curr_iter)\n\n                # Perform annealing in quantization threshold\n                if self.quant_thresh is not None:\n                    self.config['client_config']['quant_thresh'] *= self.config['client_config'].get('quant_anneal', 1.0)\n                    self.quant_thresh = self.config['client_config']['quant_thresh']\n                    log_metric('Quantization Thresh.', self.config['client_config']['quant_thresh'])\n\n                #  Create the pool of clients -- sample from this pool to assign to workers\n                sampled_idx_clients = random.sample(self.client_idx_list,\n                    num_clients_curr_iter) if num_clients_curr_iter > 0 else self.client_idx_list\n                \n                # Initialize stats\n                clients_begin = time.time()\n\n                client_losses = []\n                client_mag_grads = []\n                client_mean_grads = []\n                client_var_grads = []\n                client_norm_grads = []\n\n                self.run_stats['secsPerClient'].append([])\n                self.run_stats['secsPerClientFull'].append([])\n                self.run_stats['secsPerClientTraining'].append([])\n                self.run_stats['secsPerClientSetup'].append([])\n                self.run_stats['communicationCosts'].append([])\n\n                # Check if we want privacy metrics\n                apply_privacy_metrics = self.config.get('privacy_metrics_config', None) and \\\n                    
self.config['privacy_metrics_config']['apply_metrics']\n                adaptive_leakage = apply_privacy_metrics and \\\n                    self.config['privacy_metrics_config'].get('adaptive_leakage_threshold', None)\n                if apply_privacy_metrics:\n                    privacy_metrics_stats = defaultdict(list)\n\n                # Initialize profiler\n                profiler = None\n                if self.do_profiling:\n                    profiler = cProfile.Profile()\n                    profiler.enable()\n\n                # Reset gradient for the model before assigning the new gradients\n                self.worker_trainer.model.zero_grad()\n                \n                print_rank(f\"Clients sampled from server {sampled_idx_clients}\", loglevel=logging.DEBUG)\n                for client_output in self.process_clients(sampled_idx_clients, server_data, self.single_worker):\n                    # Process client output\n                    client_timestamp = client_output['ts']\n                    client_stats = client_output['cs']\n                    client_loss = client_output['tl']\n                    client_mag_grad = client_output['mg']\n                    client_mean_grad = client_output['ng']\n                    client_var_grad = client_output['vg']\n                    client_norm_grad = client_output['rg']\n                    client_payload = client_output['pl']\n\n                    if apply_privacy_metrics:\n                        privacy_stats = client_output['ps']\n                        for metric, value in privacy_stats.items():\n                            privacy_metrics_stats[metric].append(value)\n\n                    self.run_stats['communicationCosts'][-1].append(time.time() - client_timestamp)\n\n                    # Get actual pseudo-gradients for aggregation\n                    payload_processed = self.strategy.process_individual_payload(self.worker_trainer, client_payload)\n                    if not 
payload_processed:\n                        print_rank('Dropping client', loglevel=logging.DEBUG)\n                        num_clients_curr_iter -= 1\n                        continue\n\n                    # Aggregate stats\n                    self.train_loss.append(client_loss)\n                    client_losses.append(client_loss)\n                    client_mag_grads.append(client_mag_grad.item())\n                    client_mean_grads.append(client_mean_grad.item())\n                    client_var_grads.append(client_var_grad.item())\n                    client_norm_grads.append(client_norm_grad.item())\n\n                    # Mark the end of client processing\n                    client_end = time.time()\n\n                    self.run_stats['secsPerClientFull'][-1].append(client_stats['full cost'])\n                    self.run_stats['secsPerClientTraining'][-1].append(client_stats['training'])\n                    self.run_stats['secsPerClientSetup'][-1].append(client_stats['setup'])\n                    self.run_stats['secsPerClient'][-1].append(client_end - clients_begin)\n\n                # Tear down profiler\n                if self.do_profiling:\n                    profiler.disable()\n                    stats = pstats.Stats(profiler)\n                    stats.sort_stats('cumulative').print_stats()\n\n                # Prepare output\n                client_mag_grads = np.array(client_mag_grads)\n                client_mean_grads = np.array(client_mean_grads)\n                client_var_grads = np.array(client_var_grads)\n                client_norm_grads = np.array(client_norm_grads)\n\n                client_stats = (client_mag_grads, client_mean_grads, client_var_grads)\n\n                dump_norm_stats = self.config.get('dump_norm_stats', False)\n                if dump_norm_stats:\n                    with open(os.path.join(self.model_path, 'norm_stats.txt'), 'a', encoding='utf-8') as outF:\n                        
outF.write('{}\\n'.format(json.dumps(list(client_norm_grads))))\n\n                # Print the privacy metrics\n                if apply_privacy_metrics:\n                    for metric, values in privacy_metrics_stats.items():\n                        if metric == 'Dropped clients':\n                            log_metric(metric, sum(values))\n                        else:\n                            log_metric(metric, max(values))\n\n                if type(adaptive_leakage) is float:\n                    values = privacy_metrics_stats['Practical epsilon (Max leakage)']\n                    new_threshold = list(sorted(values))[int(adaptive_leakage*len(values))]\n                    print_rank('Updating leakage threshold to {}'.format(new_threshold))\n                    self.config['privacy_metrics_config']['max_allowed_leakage'] = new_threshold\n\n                # Mark that all clients have been processed\n                end = time.time()\n                self.run_stats['secsPerClientRound'].append(end - begin)\n                begin = end\n\n                # Log the training loss to tensorboard/AML\n                log_metric('Training loss', sum(self.train_loss))\n\n                # Combine payloads\n                self.losses = self.strategy.combine_payloads(\n                    worker_trainer=self.worker_trainer,\n                    curr_iter=i,\n                    num_clients_curr_iter=num_clients_curr_iter,\n                    total_clients = len(self.client_idx_list),\n                    client_stats=client_stats,\n                    logger=log_metric,\n                )\n                \n                # Run a couple of iterations of training data on the server\n                if self.server_trainer is not None:\n                    print_rank('Running replay iterations on server')\n\n                    if 'updatable_names' in self.server_trainer_config:\n                        set_component_wise_lr(\n                            
self.worker_trainer.model,\n                            self.server_optimizer_config,\n                            self.server_trainer_config['updatable_names']\n                        )\n                    self.server_trainer.prepare_iteration(self.worker_trainer.model)\n                    self.server_trainer.train_desired_samples(self.server_replay_iterations)\n                    self.worker_trainer.model.load_state_dict(self.server_trainer.model.state_dict())\n                    torch.cuda.empty_cache()\n\n                # Update a sampling scheduler\n                print_rank('Run ss scheduler')\n                self.worker_trainer.run_ss_scheduler()\n\n                # Run inference and score on val/test depending on the iter. number\n                if ((i+1) % self.val_freq) == 0:\n                    eval_list.append(\"val\")\n                if ((i+1) % self.req_freq) == 0:\n                    eval_list.append(\"test\")\n\n                # Remember whether validation runs this round, since eval_list is cleared after evaluation\n                ran_validation = 'val' in eval_list\n\n                if len(eval_list) > 0:\n                    print_rank('Running {} at itr={}'.format(eval_list, i+1))\n                    self.metrics['worker_trainer'] = self.worker_trainer\n                    if hasattr(self.strategy, 'tmp_unsup'):\n                        self.metrics['tmp_sup'] = self.strategy.tmp_sup\n                        self.metrics['tmp_unsup'] = self.strategy.tmp_unsup\n                    self.metrics = self.evaluation.run(eval_list, self.metrics, metric_logger=run.log)\n                    self.losses = self.evaluation.losses\n                    eval_list = []\n\n                # Create a schedule for the initial_lr (for the worker)\n                if ran_validation:\n                    run.log('LR for agg. 
opt.', get_lr(self.worker_trainer.optimizer))\n                    if not (self.losses[0] < self.metrics['best_val_loss']):\n                        self.lr_weight *= self.lr_decay_factor\n                        print_rank('LOG: Client weight of learning rate {}..'.format(self.lr_weight))\n\n                # Backup the current best models\n                self.backup_models(i)\n\n                # Fall back to the best model if the option is enabled\n                self.fall_back_to_prev_best_status()\n\n                # Logging the latest best values only after the 1st val/test round has been executed\n                if len(self.metrics) > 1:\n                    update_json_log(\n                        self.log_path,\n                        {\n                            'i': i + 1,\n                            'best_val_loss': float(self.metrics['best_val_loss']),\n                            'best_val_acc': float(self.metrics['best_val_acc']),\n                            'best_test_loss': float(self.metrics['best_test_loss']),\n                            'best_test_acc': float(self.metrics['best_test_acc']),\n                            'weight': float(self.lr_weight),\n                            'num_label_updates': int(self.no_label_updates)\n                        },\n                    )\n\n                end = time.time()\n\n                # Aggregate stats\n                self.run_stats['secsPerRoundHousekeeping'].append(end - begin)\n                self.run_stats['secsPerRoundTotal'].append(self.run_stats['secsPerClientRound'][-1] + \\\n                    self.run_stats['secsPerRoundHousekeeping'][-1])\n\n                log_metric('secsPerRoundTotal', self.run_stats['secsPerRoundTotal'][-1])\n                if self.do_profiling:\n                    log_metric('secsPerClientRound', self.run_stats['secsPerClientRound'][-1])\n                    log_metric('secsPerRoundHousekeeping', self.run_stats['secsPerRoundHousekeeping'][-1])\n\n    
metrics_for_stats = [\n                        'secsPerClient',\n                        'secsPerClientTraining',\n                        'secsPerClientFull',\n                        'secsPerClientSetup',\n                        'communicationCosts',\n                    ]\n\n                    for metric in metrics_for_stats:\n                        log_metric(f'{metric}Mean', np.mean(self.run_stats[metric][-1]))\n                        log_metric(f'{metric}Median', np.median(self.run_stats[metric][-1]))\n                        log_metric(f'{metric}Max', max(self.run_stats[metric][-1]))\n\n                    for k in self.run_stats:\n                        if k in metrics_for_stats:\n                            print_rank('{}: {}'.format(k, max(self.run_stats[k][-1])), loglevel=logging.DEBUG)\n                        else:\n                            print_rank('{}: {}'.format(k, self.run_stats[k][-1]), loglevel=logging.DEBUG)\n\n                # Log all the metrics\n                for k in metrics_payload:\n                    run.log(k, metrics_payload[k])\n\n        finally:  # perform cleanup even if error was raised above\n            self.terminate_workers(terminate=(not self.do_clustering))\n\n    def backup_models(self, i):\n        '''Save the current best models.\n\n        Always save the latest model; at a specified period, also back up the\n        current best models (best validation accuracy/loss and best test accuracy).\n\n        Args:\n            i (int): current iteration number.\n        '''\n\n        # Always save the latest model\n        self.worker_trainer.save(\n            model_path=self.model_path,\n            token='latest',\n            config=self.config['server_config'],\n        )\n\n        if (i % self.model_backup_freq) == 0:  # save the current best models\n            self.worker_trainer.save(\n                model_path=self.model_path,\n                token='epoch{}'.format(i),\n                config=self.config['server_config']\n            )\n\n            for bodyname in ['best_val_acc', 'best_val_loss', 'best_test_acc']:\n                src_model_path = os.path.join(self.model_path, '{}_model.tar'.format(bodyname))\n                if os.path.exists(src_model_path):\n                    dst_model_path = os.path.join(self.model_path, 'epoch{}_{}_model.tar'.format(i, bodyname))\n                    shutil.copyfile(src_model_path, dst_model_path)\n                    print_rank('Saved {}'.format(dst_model_path))\n\n    def fall_back_to_prev_best_status(self):\n        '''Go back to the past best status and switch to the recent best model.'''\n\n        if self.fall_back_to_best_model:\n            print_rank('falling back to model {}'.format(self.best_model_path))\n\n            # Save current learning rate\n            tmp_lr = get_lr(self.worker_trainer.optimizer)\n\n            # Load previous best model\n            self.worker_trainer.load(self.best_model_path, update_lr_scheduler=False, update_ss_scheduler=False)\n\n            # Update previous learning rate on optimizer\n            for g in self.worker_trainer.optimizer.param_groups:\n                g['lr'] = tmp_lr\n\n            if self.server_trainer is not None:\n                self.server_trainer.model = self.worker_trainer.model  # make sure that the models are in sync\n\n\ndef select_server(server_type):\n    '''Select a server type using different possible strings.\n\n    Right now this mostly returns `OptimizationServer`, but 
this\n    function also returns :code:`PersonalizationServer` when\n    :code:`server_type` is \"personalization\".\n\n    Args:\n        server_type (str): indicates server choice.\n    '''\n    if server_type == \"personalization\":\n        from experiments.cv.server import PersonalizationServer\n        return PersonalizationServer\n    else:\n        return OptimizationServer\n"
  },
  {
    "path": "core/strategies/__init__.py",
"content": "# Copyright (c) Microsoft Corporation.\n# Licensed under the MIT license.\n\nfrom .base import BaseStrategy\nfrom .fedavg import FedAvg\nfrom .dga import DGA\nfrom .fedlabels import FedLabels\n\ndef select_strategy(strategy):\n    ''' Selects the aggregation strategy class\n    \n    NOTE: FedProx uses FedAvg weights during aggregation, \n    which are proportional to the number of samples in \n    each client.\n    '''\n    if strategy.lower() == 'dga':\n        return DGA\n    elif strategy.lower() in ['fedavg', 'fedprox']:\n        return FedAvg\n    elif strategy.lower() == 'fedlabels':\n        return FedLabels\n    else:\n        raise ValueError(f'cannot use strategy {strategy}')"
  },
  {
    "path": "core/strategies/base.py",
"content": "# Copyright (c) Microsoft Corporation.\n# Licensed under the MIT license.\n\nfrom abc import ABC, abstractmethod\n\n\nclass BaseStrategy(ABC):\n    def __init__(self, mode, config, model_path=None):\n        '''Federated learning strategy\n\n        Args:\n            mode (str): which part the instantiated object should play,\n                typically either :code:`client` or :code:`server`.\n            config (dict): initial config dict.\n            model_path (str): where to find model, needed for debugging only.\n        '''\n        pass\n\n    @abstractmethod\n    def generate_client_payload(self, trainer):\n        '''Generate client payload\n\n        Args:\n            trainer (core.Trainer object): trainer on client.\n\n        Returns:\n            dict containing payloads in some specified format.\n        '''\n        pass\n\n    @abstractmethod\n    def process_individual_payload(self, worker_trainer, payload):\n        '''Process client payload\n        \n        Args:\n            worker_trainer (core.Trainer object): trainer on server\n                (aka model updater).\n            payload (dict): whatever is generated by\n                :code:`generate_client_payload`.\n\n        Returns:\n            True if processed successfully, False otherwise.\n        '''\n        pass\n\n    @abstractmethod\n    def combine_payloads(self, worker_trainer, curr_iter, num_clients_curr_iter, total_clients, client_stats, logger=None):\n        '''Combine payloads to update model\n        \n        Args:\n            worker_trainer (core.Trainer object): trainer on server\n                (aka model updater).\n            curr_iter (int): current iteration.\n            num_clients_curr_iter (int): number of clients on current iteration.\n            total_clients (int): size of total pool of clients (for privacy accounting).\n            client_stats (dict): stats being collected.\n            logger (callback): function called to log quantities.\n        '''\n        pass"
  },
  {
    "path": "core/strategies/dga.py",
    "content": "# Copyright (c) Microsoft Corporation.\n# Licensed under the MIT license.\n\nimport copy\nimport json\nimport logging\nimport math\nimport os\n\nimport numpy as np\nimport torch\n\nfrom extensions import privacy, RL, quant_model\nfrom utils import compute_grad_cosines, print_rank, to_device\nfrom core.strategies import BaseStrategy\nfrom core.strategies.utils import (\n    aggregate_gradients_inplace,\n    filter_weight,\n)\n\nfrom azureml.core import Run\nrun = Run.get_context()\n\nMIN_WEIGHT = 1e-7\n\n\nclass DGA(BaseStrategy):\n    '''Dynamic Gradient Aggregation'''\n\n    def __init__(self, mode, config, model_path=None):\n        ''' Dynamic Gradient Aggregation (DGA) strategy.\n\n        For more info see arXiv:2106.07578.\n\n        Args:\n            mode (str): which part the instantiated object should play,\n                typically either :code:`client` or :code:`server`.\n            config (dict): initial config dict.\n            model_path (str): where to find model, needed for debugging only.\n        '''\n\n        super().__init__(mode=mode, config=config, model_path=model_path)\n\n        if mode not in ['client', 'server']:\n            raise ValueError('mode in strategy must be either `client` or `server`')\n\n        self.config = config\n        self.model_path = model_path\n        self.mode = mode\n\n        # Parse config\n        self.model_config = config['model_config']\n        self.client_config = config['client_config']\n        self.server_config = config['server_config']\n\n        self.dp_config = config.get('dp_config', None)\n\n        if mode == 'client':\n            self.stats_on_smooth_grad = self.client_config.get('stats_on_smooth_grad', False)\n            self.quant_threshold = self.client_config.get('quant_thresh', None)\n            self.quant_bits = self.client_config.get('quant_bits', 10)\n        elif mode == 'server':\n            self.dump_norm_stats = self.config.get('dump_norm_stats', False)\n    
self.aggregate_fast = self.server_config.get('fast_aggregation', False)\n            self.want_rl = self.server_config.get('wantRL', False)\n            self.stale_prob = self.server_config.get('stale_prob', 0.0)\n\n            self.skip_model_update = False\n\n            # Do some checks and create objects based on configs\n            if self.aggregate_fast:\n                print_rank('It is NOT possible to enable RL with fast_aggregation; RL is set to False', loglevel=logging.INFO)\n                self.want_rl = False\n\n                print_rank('The current implementation cannot use stale gradients with fast_aggregation; stale_prob is set to 0.0', loglevel=logging.INFO)\n                self.stale_prob = 0.0\n\n            if self.want_rl:\n                self.rl = RL(config=self.server_config)\n\n            # Initialize accumulators\n            self.client_parameters_stack = []\n            self.client_parameters_stack_stale = []\n            self.client_weights = []\n\n            self.weight_sum_stale = 0.0\n\n    def generate_client_payload(self, trainer):\n        '''Generate client payload\n\n        Args:\n            trainer (core.Trainer object): trainer on client.\n\n        Returns:\n            dict containing payloads in some specified format.\n        '''\n\n        if self.mode != 'client':\n            raise RuntimeError('this method can only be invoked by the client')\n\n        # Get weights for aggregation, potentially using DGA\n        weight = 1.0\n        add_weight_noise = False\n\n        # Reset gradient stats and recalculate them on the smooth/pseudo gradient\n        if self.stats_on_smooth_grad:\n            trainer.reset_gradient_power()\n            trainer.estimate_sufficient_stats()\n\n        # If we are using softmax based on training loss, it needs DP noise\n        if self.config['server_config']['aggregate_median'] == 'softmax':\n            # This matters when DP is required\n            
add_weight_noise = True\n\n            if 'weight_train_loss' not in self.config['server_config'] or self.config['server_config']['weight_train_loss'] == 'train_loss':\n                training_weight = trainer.train_loss / trainer.num_samples\n            elif self.config['server_config']['weight_train_loss'] == 'mag_var_loss':\n                training_weight = trainer.sufficient_stats['var']\n            elif self.config['server_config']['weight_train_loss'] == 'mag_mean_loss':\n                training_weight = trainer.sufficient_stats['mean']\n            else:\n                training_weight = trainer.sufficient_stats['mag']\n\n            try:\n                weight = math.exp(-self.config['server_config']['softmax_beta'] * training_weight)\n            except (KeyError, OverflowError, ValueError):\n                print_rank('There is an issue with the weight -- Reverting to {}'.format(MIN_WEIGHT), loglevel=logging.DEBUG)\n                weight = MIN_WEIGHT\n            weight = filter_weight(weight)\n\n        # Add local DP noise here.\n        # When weight == 0, something went wrong. 
So we'll skip adding noise and return a zero gradient.\n        if weight > 0.0 and self.dp_config is not None and self.dp_config.get('enable_local_dp', False):\n            weight = privacy.apply_local_dp(trainer, weight, self.dp_config, add_weight_noise)\n\n        # In all other cases we can compute the weight after adding noise\n        if not add_weight_noise:\n            assert self.config['server_config']['aggregate_median'] == 'mean'\n            assert weight == 1.0\n\n        # Weight the gradient and remove gradients of the layers we want to freeze\n        for n, p in trainer.model.named_parameters():\n            p.grad = weight * p.grad\n            if self.model_config.get('freeze_layer', None) and n == self.model_config['freeze_layer']:\n                print_rank('Setting gradient to zero for layer: {}'.format(n), loglevel=logging.INFO)\n                p.grad.mul_(0)\n\n        # Gradient quantization step -- if quant_threshold is None, the code returns without doing anything\n        quant_model(trainer.model, quant_threshold=self.quant_threshold, quant_bits=self.quant_bits, global_stats=False)\n\n        payload = {}\n        payload['weight'] = weight\n        payload['gradients'] = [p.grad.to(torch.device('cpu')) for p in trainer.model.parameters()]\n\n        return payload\n\n    def process_individual_payload(self, worker_trainer, payload):\n        '''Process client payload\n\n        Args:\n            worker_trainer (core.Trainer object): trainer on server\n                (aka model updater).\n            payload (dict): whatever is generated by\n                :code:`generate_client_payload`.\n\n        Returns:\n            True if processed successfully, False otherwise.\n        '''\n\n        if self.mode != 'server':\n            raise RuntimeError('this method can only be invoked by the server')\n\n        if payload['weight'] == 0.0:\n            return False\n\n        self.client_weights.append(payload['weight'])\n        if 
self.aggregate_fast:\n            aggregate_gradients_inplace(worker_trainer.model, payload['gradients'])\n        else:\n            self.client_parameters_stack.append(payload['gradients'])\n        return True\n\n    def combine_payloads(self, worker_trainer, curr_iter, num_clients_curr_iter, total_clients, client_stats, logger=None):\n        '''Combine payloads to update model\n\n        Args:\n            worker_trainer (core.Trainer object): trainer on server\n                (aka model updater).\n            curr_iter (int): current iteration.\n            num_clients_curr_iter (int): number of clients on current iteration.\n            total_clients (int): size of total pool of clients (for privacy accounting)\n            client_stats (dict): stats being collected.\n            logger (callback): function called to log quantities.\n\n        Returns:\n            losses, computed for use with LR scheduler.\n        '''\n\n        if self.mode != 'server':\n            raise RuntimeError('this method can only be invoked by the server')\n\n        if self.want_rl:\n            rl_model = self._run_rl_inference(self.client_weights, *client_stats)\n\n        # Aggregation step\n        if self.dump_norm_stats:\n            cps_copy = [[g.clone().detach() for g in x] for x in self.client_parameters_stack]\n        weight_sum = self._aggregate_gradients(worker_trainer, num_clients_curr_iter, self.client_weights, metric_logger=logger)\n        print_rank('Sum of weights: {}'.format(weight_sum), loglevel=logging.DEBUG)\n\n        torch.cuda.empty_cache()\n\n        # Normalize with weight_sum\n        for p in worker_trainer.model.parameters():\n            p.grad /= weight_sum\n\n        if self.dump_norm_stats:\n            cosines = compute_grad_cosines(cps_copy, [p.grad.clone().detach() for p in worker_trainer.model.parameters()])\n            with open(os.path.join(self.model_path, 'cosines.txt'), 'a', encoding='utf-8') as outfile:\n                
outfile.write('{}\\n'.format(json.dumps(cosines)))\n\n        # DP-specific steps\n        privacy.apply_global_dp(self.config, worker_trainer.model, num_clients_curr_iter=num_clients_curr_iter, select_grad=True, metric_logger=logger)\n        eps = privacy.update_privacy_accountant(self.config, total_clients, curr_iter=curr_iter, num_clients_curr_iter=num_clients_curr_iter)\n        if eps:\n            print_rank(f'DP result: {eps}')\n\n        if self.skip_model_update is True:\n            print_rank('Skipping model update')\n            return\n\n        # Run optimization with gradient/model aggregated from clients\n        print_rank('Updating model')\n        worker_trainer.update_model()\n        print_rank('Updating learning rate scheduler')\n        losses = worker_trainer.run_lr_scheduler(force_run_val=False)\n\n        if self.want_rl:\n            self._run_rl_training(curr_iter, rl_model, self.client_weights, *client_stats, logger)\n\n        return losses\n\n    def _aggregate_gradients(self, worker_trainer, num_clients_curr_iter, client_weights, metric_logger=None):\n        '''Go through stored gradients, aggregate and put them inside model.\n\n        Args:\n            num_clients_curr_iter (int): how many clients were processed.\n            client_weights: weight for each client.\n            metric_logger (callback, optional): callback used for logging.\n                Defaults to None, in which case AML logger is used.\n\n        Returns:\n            float: sum of weights for all clients.\n        '''\n\n        weight_sum = 0\n        if metric_logger is None:\n            metric_logger = run.log\n\n        if not self.aggregate_fast:\n            metric_logger('Stale Gradients Ratio', len(self.client_parameters_stack_stale) / num_clients_curr_iter)\n            if len(self.client_parameters_stack_stale) > 0:\n                weight_sum = self.weight_sum_stale\n                for client_parameters in self.client_parameters_stack_stale:\n 
                   # Model parameters are already multiplied with weight on client, we only have to sum them up\n                    aggregate_gradients_inplace(worker_trainer.model, client_parameters)\n                self.client_parameters_stack_stale = []\n                self.weight_sum_stale = 0\n\n            for client_weight, client_parameters in zip(client_weights, self.client_parameters_stack):\n                if np.random.random() > self.stale_prob:\n                    # Model parameters are already multiplied with weight on client, we only have to sum them up\n                    aggregate_gradients_inplace(worker_trainer.model, client_parameters)\n                else:\n                    self.weight_sum_stale += client_weight\n                    self.client_parameters_stack_stale.append(client_parameters)\n\n        weight_sum += sum(client_weights) - self.weight_sum_stale\n\n        # Some cleaning\n        self.client_parameters_stack = []\n        self.client_weights = []\n\n        return weight_sum\n\n    def _run_rl_inference(self, client_weights, client_mag_grads, client_mean_grads, client_var_grads):\n        '''Uses RL to estimate weights, using DGA.\n\n        Args:\n            client_weights (numpy.ndarray): original weights for aggregation.\n            client_mag_grads (numpy.ndarray): gradient stats for RL (magnitudes).\n            client_mean_grads (numpy.ndarray): gradient stats for RL (means).\n            client_var_grads (numpy.ndarray): gradient stats for RL (vars).\n\n        Returns:\n            list of torch.Tensor: parameters of model used to perform RL.\n        '''\n\n        weight_sum = 0\n        original_model = copy.copy([p for p in self.worker_trainer.model.parameters()])\n\n        # Reinforcement learning for estimating weights\n        print_rank('RL estimation of the aggregation weights', loglevel=logging.INFO)\n        rl_weights = self.rl.forward(\n            np.concatenate((client_weights, 
client_mag_grads, client_mean_grads, client_var_grads), axis=0)).cpu().detach().numpy()\n        if rl_weights.ndim > 1:\n            rl_weights = rl_weights[-1, :]\n        rl_weights = np.exp(rl_weights)\n\n        print_rank('RL Weights BEFORE filtering: {}'.format(rl_weights), loglevel=logging.DEBUG)\n        index = np.argwhere(np.isnan(rl_weights))\n        rl_weights[index] = 0\n        index = np.argwhere(np.isinf(rl_weights))\n        rl_weights[index] = 0\n        print_rank('RL Weights AFTER filtering: {}'.format(rl_weights), loglevel=logging.DEBUG)\n\n        for client_parameters, orig_weight, rl_weight in zip(self.client_parameters_stack, client_weights, rl_weights):\n            # Model parameters are already multiplied with weight on client, we only have to sum them up\n            for p, client_grad in zip(self.worker_trainer.model.parameters(), client_parameters):\n                if p.grad is None:\n                    p.grad = to_device(client_grad) * rl_weight / orig_weight\n                else:\n                    p.grad += to_device(client_grad) * rl_weight / orig_weight\n            weight_sum += rl_weight\n\n        # Normalize with weight_sum\n        for p in self.worker_trainer.model.parameters():\n            p.grad /= weight_sum\n\n        # Run optimization with gradient/model aggregated from clients\n        self.worker_trainer.update_model()\n\n        # Get the validation result back\n        (rl_val_loss, rl_val_acc) = self.worker_trainer.run_lr_scheduler(force_run_val=True)\n\n        # Save model and revert to previous one\n        rl_model = copy.copy([p.data for p in self.worker_trainer.model.parameters()])\n        for p, p_ in zip(self.worker_trainer.model.parameters(), original_model):\n            p.data = p_.data.detach().clone()\n\n        # Set the current set of weights\n        self.rl.set_weights(rl_weights)\n        self.rl.set_losses((rl_val_loss, rl_val_acc))\n\n        # Return the resulting RL-based 
model\n        return rl_model\n\n    def _run_rl_training(self, iter, rl_model, client_weights, client_mag_grads, client_mean_grads, client_var_grads, metric_logger):\n        '''Trains RL for estimating weights, following DGA recipe.\n\n        Args:\n            iter (int): current iteration.\n            rl_model (list of torch.Tensor): parameters of model used to perform RL.\n            client_weights (numpy.ndarray): original weights for aggregation.\n            client_mag_grads (numpy.ndarray): gradient stats for RL (magnitudes).\n            client_mean_grads (numpy.ndarray): gradient stats for RL (means).\n            client_var_grads (numpy.ndarray): gradient stats for RL (vars).\n            metric_logger (callback, optional): callback used for logging.\n                Defaults to None, in which case AML logger is used.\n        '''\n\n        # Get the validation result back\n        if None in self.losses:\n            self.losses = self.run_distributed_inference(mode='val')\n\n        # Expected structure of batch\n        print_rank('Performing RL training on the aggregation weights')\n        if abs(self.losses[1] - self.rl.rl_losses[1]) < 0.001:\n            reward = 0.1\n            print_rank(\n                'Iter:{}  val_ACC={}  rl_val_ACC={}  reward={}'.format(iter, self.losses[1], self.rl.rl_losses[1], reward))\n            if 'marginal_update_RL' in self.config['server_config'] and \\\n                    self.config['server_config']['marginal_update_RL']:\n                self.losses = self.rl.rl_losses\n                for p, p_ in zip(self.worker_trainer.model.parameters(), rl_model):\n                    p.data = p_.data.detach().clone()\n\n        elif (self.losses[1] - self.rl.rl_losses[1]) > 0:\n            reward = 1.0\n            print_rank(\n                'Iter:{}  val_ACC={}  rl_val_ACC={}  reward={}'.format(iter, self.losses[1], self.rl.rl_losses[1], reward))\n            self.losses = self.rl.rl_losses\n            
for p, p_ in zip(self.worker_trainer.model.parameters(), rl_model):\n                p.data = p_.data.detach().clone()\n\n        else:\n            reward = -1.0\n            print_rank(\n                'Iter:{}  val_ACC={}  rl_val_ACC={}  reward={}'.format(iter, self.losses[1], self.rl.rl_losses[1], reward))\n\n        # Taking the policy from a game-based RL\n        batch = (\n            (np.concatenate((client_weights, client_mag_grads, client_mean_grads, client_var_grads), axis=0)),\n            (self.rl.rl_weights),\n            [reward]\n        )\n\n        print_rank('RL Model Update -- Training')\n        self.rl.train(batch)\n\n        print_rank('RL State Saving')\n        self.rl.save(iter)\n\n        print_rank('RL logging')\n        metric_logger('RL Running Loss', self.rl.runningLoss)\n        metric_logger('RL Rewards', reward)"
  },
  {
    "path": "core/strategies/fedavg.py",
    "content": "# Copyright (c) Microsoft Corporation.\n# Licensed under the MIT license.\n\nimport json\nimport logging\nimport os\n\nimport torch\n\nfrom utils import compute_grad_cosines, print_rank\nfrom core.strategies import BaseStrategy\nfrom core.strategies.utils import (\n    aggregate_gradients_inplace,\n)\n\nfrom azureml.core import Run\nrun = Run.get_context()\n\n\nclass FedAvg(BaseStrategy):\n    '''Federated Averaging'''\n\n    def __init__(self, mode, config, model_path=None):\n        '''Federated Averaging strategy.\n\n        Args:\n            mode (str): which part the instantiated object should play,\n                typically either :code:`client` or :code:`server`.\n            config (dict): initial config dict.\n            model_path (str): where to find model, needed for debugging only.\n        '''\n\n        super().__init__(mode=mode, config=config, model_path=model_path)\n\n        if mode not in ['client', 'server']:\n            raise ValueError('mode in strategy must be either `client` or `server`')\n\n        self.config = config\n        self.model_path = model_path\n        self.mode = mode\n\n        # Parse config\n        self.model_config = config['model_config']\n        self.client_config = config['client_config']\n        self.server_config = config['server_config']\n\n        self.dp_config = config.get('dp_config', None)\n\n        if mode == 'client':\n            self.stats_on_smooth_grad = self.client_config.get('stats_on_smooth_grad', False)\n        elif mode == 'server':\n            self.dump_norm_stats = self.config.get('dump_norm_stats', False)\n            self.aggregate_fast = self.server_config.get('fast_aggregation', False)\n\n            self.skip_model_update = False\n\n            # Initialize accumulators\n            self.client_parameters_stack = []\n            self.client_weights = []\n\n    def generate_client_payload(self, trainer):\n        '''Generate client payload\n\n        Args:\n            
trainer (core.Trainer object): trainer on client.\n\n        Returns:\n            dict containing payloads in some specified format.\n        '''\n\n        if self.mode != 'client':\n            raise RuntimeError('this method can only be invoked by the client')\n\n        # Reset gradient stats and recalculate them on the smooth/pseudo gradient\n        if self.stats_on_smooth_grad:\n            trainer.reset_gradient_power()\n            trainer.estimate_sufficient_stats()\n\n        # Weight the gradient and remove gradients of the layers we want to freeze\n        weight = trainer.num_samples\n        for n, p in trainer.model.named_parameters():\n            p.grad = weight * p.grad\n            if self.model_config.get('freeze_layer', None) and n == self.model_config['freeze_layer']:\n                print_rank('Setting gradient to zero for layer: {}'.format(n), loglevel=logging.INFO)\n                p.grad.mul_(0)\n\n        payload = {}\n        payload['weight'] = weight\n        payload['gradients'] = [p.grad.to(torch.device('cpu')) for p in trainer.model.parameters()]\n\n        return payload\n\n    def process_individual_payload(self, worker_trainer, payload):\n        '''Process client payload\n\n        Args:\n            worker_trainer (core.Trainer object): trainer on server\n                (aka model updater).\n            payload (dict): whatever is generated by\n                :code:`generate_client_payload`.\n\n        Returns:\n            True if processed successfully, False otherwise.\n        '''\n\n        if self.mode != 'server':\n            raise RuntimeError('this method can only be invoked by the server')\n\n        if payload['weight'] == 0.0:\n            return False\n\n        self.client_weights.append(payload['weight'])\n        if self.aggregate_fast:\n            aggregate_gradients_inplace(worker_trainer.model, payload['gradients'])\n        else:\n            self.client_parameters_stack.append(payload['gradients'])\n  
      return True\n\n    def combine_payloads(self, worker_trainer, curr_iter, num_clients_curr_iter, total_clients, client_stats, logger=None):\n        '''Combine payloads to update model\n\n        Args:\n            worker_trainer (core.Trainer object): trainer on server\n                (aka model updater).\n            curr_iter (int): current iteration.\n            num_clients_curr_iter (int): number of clients on current iteration.\n            total_clients (int): size of total pool of clients.\n            client_stats (dict): stats being collected.\n            logger (callback): function called to log quantities.\n\n        Returns:\n            losses, computed for use with LR scheduler.\n        '''\n\n        if self.mode != 'server':\n            raise RuntimeError('this method can only be invoked by the server')\n\n        # Aggregation step\n        if self.dump_norm_stats:\n            cps_copy = [[g.clone().detach() for g in x] for x in self.client_parameters_stack]\n        weight_sum = self._aggregate_gradients(worker_trainer, num_clients_curr_iter, self.client_weights, metric_logger=logger)\n        print_rank('Sum of weights: {}'.format(weight_sum), loglevel=logging.DEBUG)\n\n        torch.cuda.empty_cache()\n\n        # Normalize with weight_sum\n        for p in worker_trainer.model.parameters():\n            p.grad /= weight_sum\n\n        if self.dump_norm_stats:\n            cosines = compute_grad_cosines(cps_copy, [p.grad.clone().detach() for p in worker_trainer.model.parameters()])\n            with open(os.path.join(self.model_path, 'cosines.txt'), 'a', encoding='utf-8') as outfile:\n                outfile.write('{}\\n'.format(json.dumps(cosines)))\n\n        if self.skip_model_update is True:\n            print_rank('Skipping model update')\n            return\n\n        # Run optimization with gradient/model aggregated from clients\n        print_rank('Updating model')\n        worker_trainer.update_model()\n        print_rank('Updating learning rate scheduler')\n        losses = 
worker_trainer.run_lr_scheduler(force_run_val=False)\n\n        # TODO: Global DP. See dga.py\n\n        return losses\n\n    def _aggregate_gradients(self, worker_trainer, num_clients_curr_iter, client_weights, metric_logger=None):\n        '''Go through stored gradients, aggregate and put them inside model.\n\n        Args:\n            num_clients_curr_iter (int): how many clients were processed.\n            client_weights: weight for each client.\n            metric_logger (callback, optional): callback used for logging.\n                Defaults to None, in which case AML logger is used.\n\n        Returns:\n            float: sum of weights for all clients.\n        '''\n\n        if metric_logger is None:\n            metric_logger = run.log\n\n        if not self.aggregate_fast:\n            for client_parameters in self.client_parameters_stack:\n                # Model parameters are already multiplied with weight on client, we only have to sum them up\n                aggregate_gradients_inplace(worker_trainer.model, client_parameters)\n        weight_sum = sum(client_weights)\n\n        # Some cleaning\n        self.client_parameters_stack = []\n        self.client_weights = []\n\n        return weight_sum"
  },
  {
    "path": "core/strategies/fedlabels.py",
    "content": "# Copyright (c) Microsoft Corporation.\n# Licensed under the MIT license.\n\nimport json\nimport logging\nimport os\n\nimport torch\nimport numpy as np\nfrom azureml.core import Run\n\nfrom core.strategies import BaseStrategy\nfrom utils import (\n    compute_grad_cosines, \n    print_rank, \n    to_device)\n\nrun = Run.get_context()\n\nclass FedLabels(BaseStrategy):\n    '''FedLabels: Semi-supervision strategy.'''\n\n    def __init__(self, mode, config, model_path=None):\n        '''\n        Args:\n            mode (str): which part the instantiated object should play,\n                typically either :code:`client` or :code:`server`.\n            config (dict): initial config dict.\n            model_path (str): where to find model, needed for debugging only.\n        '''\n\n        super().__init__(mode=mode, config=config, model_path=model_path)\n\n        if mode not in ['client', 'server']:\n            raise ValueError('mode in strategy must be either `client` or `server`')\n\n        self.config = config\n        self.model_path = model_path\n        self.mode = mode\n        self.model_config = config['model_config']\n        self.client_config = config['client_config']\n        self.server_config = config['server_config']\n        self.dp_config = config.get('dp_config', None)\n\n        self.tmp_sup = None\n        self.tmp_unsup = None\n\n        if mode == 'client':\n            self.stats_on_smooth_grad = self.client_config.get('stats_on_smooth_grad', False)\n        elif mode == 'server':\n            self.dump_norm_stats = self.config.get('dump_norm_stats', False)\n            self.aggregate_fast = self.server_config.get('fast_aggregation', False)\n\n            self.skip_model_update = False\n\n            # Initialize accumulators\n            self.client_parameters_stack = []\n            self.client_weights = []\n\n    def generate_client_payload(self, trainer):\n        '''Generate client payload\n\n        Args:\n            
trainer (core.Trainer object): trainer on client. The\n                unsupervised model state dictionary is read from\n                :code:`trainer.algo_computation`.\n\n        Returns:\n            dict containing payloads in some specified format.\n        '''\n\n        unsup_dict = trainer.algo_computation\n\n        if self.mode != 'client':\n            raise RuntimeError('this method can only be invoked by the client')\n\n        # Reset gradient stats and recalculate them on the smooth/pseudo gradient\n        if self.stats_on_smooth_grad:\n            trainer.reset_gradient_power()\n            trainer.estimate_sufficient_stats()\n\n        # Weight the gradient and preprocess state dictionaries from supervised and unsupervised model\n        weight = 1 if trainer.num_samples == 0 else trainer.num_samples\n        unsup_grads = [unsup_dict[param_tensor].to(torch.device('cpu')) for param_tensor in unsup_dict.keys()]\n        sup_grads = [trainer.model.state_dict()[param_tensor].to(torch.device('cpu')) for param_tensor in trainer.model.state_dict().keys()]\n\n        payload = {}\n        payload['weight'] = weight\n        payload['gradients'] = sup_grads + unsup_grads\n\n        return payload\n\n    def process_individual_payload(self, worker_trainer, payload):\n        '''Process client payload\n\n        Args:\n            worker_trainer (core.Trainer object): trainer on server\n                (aka model updater).\n            payload (dict): whatever is generated by\n                :code:`generate_client_payload`.\n\n        Returns:\n            True if processed successfully, False otherwise.\n        '''\n\n        if self.mode != 'server':\n            raise RuntimeError('this method can only be invoked by the server')\n\n        if payload['weight'] == 0.0:\n            return False\n\n        self.client_weights.append(payload['weight'])\n        # FedLabels payloads contain model state dicts (supervised + unsupervised)\n        # rather than gradients, so they cannot be summed into the model in-place;\n        # always stack them for aggregation in _aggregate_gradients.\n        self.client_parameters_stack.append(payload['gradients'])\n        return True\n\n    def combine_payloads(self, worker_trainer, curr_iter, num_clients_curr_iter, total_clients, client_stats, logger=None):\n        '''Combine payloads to update model\n\n        Args:\n            worker_trainer (core.Trainer object): trainer on server\n                (aka model updater).\n            curr_iter (int): current iteration.\n            num_clients_curr_iter (int): number of clients on current iteration.\n            total_clients (int): size of total pool of clients.\n            client_stats (dict): stats being collected.\n            logger (callback): function called to log quantities.\n\n        Returns:\n            losses, computed for use with LR scheduler.\n        '''\n\n        if self.mode != 'server':\n            raise RuntimeError('this method can only be invoked by the server')\n\n        # Aggregation step\n        if self.dump_norm_stats:\n            cps_copy = [[g.clone().detach() for g in x] for x in self.client_parameters_stack]\n        weight_sum, self.tmp_sup, self.tmp_unsup = self._aggregate_gradients(worker_trainer, num_clients_curr_iter, self.client_weights, metric_logger=logger)\n        print_rank('Sum of weights: {}'.format(weight_sum), loglevel=logging.DEBUG)\n        torch.cuda.empty_cache()\n\n        # Disjoint aggregation: average the supervised and unsupervised models\n        tmp_both = {}\n        for param_key in self.tmp_unsup.keys():\n            tmp_both[param_key] = self.tmp_sup[param_key]/2 + self.tmp_unsup[param_key]/2\n        worker_trainer.model.load_state_dict(tmp_both)\n\n        if self.dump_norm_stats:\n            cosines = compute_grad_cosines(cps_copy, [p.grad.clone().detach() for p in worker_trainer.model.parameters()])\n            with open(os.path.join(self.model_path, 'cosines.txt'), 'a', encoding='utf-8') as outfile:\n                outfile.write('{}\\n'.format(json.dumps(cosines)))\n\n        if 
self.skip_model_update is True:\n            print_rank('Skipping model update')\n            return\n\n        # Run optimization with gradient/model aggregated from clients\n        print_rank('Updating model')\n        worker_trainer.update_model()\n        print_rank('Updating learning rate scheduler')\n        losses = worker_trainer.run_lr_scheduler(force_run_val=False)\n\n        # TODO: Global DP. See dga.py\n\n        return losses\n\n    def _aggregate_gradients(self, worker_trainer, num_clients_curr_iter, client_weights, metric_logger=None):\n        '''Go through stored gradients, aggregate and put them inside model.\n\n        Args:\n            num_clients_curr_iter (int): how many clients were processed.\n            client_weights: weight for each client.\n            metric_logger (callback, optional): callback used for logging.\n                Defaults to None, in which case AML logger is used.\n\n        Returns:\n            float: sum of weights for all clients.\n            dict: supervised model state dictionary.\n            dict: unsupervised model state dictionary.\n        '''\n\n        if metric_logger is None:\n            metric_logger = run.log\n\n        # Separate sup/unsup dictionaries from client payload\n        sup_slice = int(len(self.client_parameters_stack[0])/2)\n        keys = [key for key in worker_trainer.model.state_dict()]\n        model_dicts = [client_dict[:sup_slice] for client_dict in self.client_parameters_stack]\n        unsup_dicts = [client_dict[sup_slice:] for client_dict in self.client_parameters_stack]\n\n        first = True\n        tmp_sup, tmp_unsup = {}, {}\n\n        # Compute ratios for each model\n        weight_sum = sum(client_weights)\n        ratio_sup = 1/len(client_weights)\n        ratio_unsup = np.array(client_weights)/weight_sum\n\n        # Payloads are always stacked for FedLabels, so aggregation happens here\n        # regardless of the fast_aggregation setting.\n        # Perform aggregation for supervised model\n        for client_parameters in model_dicts:\n            first, tmp_sup = aggregate_gradients_inplace(keys, client_parameters, first, tmp_sup, ratio_sup)\n        first = True\n\n        # Perform aggregation for unsupervised model\n        for j, client_parameters in enumerate(unsup_dicts):\n            first, tmp_unsup = aggregate_gradients_inplace(keys, client_parameters, first, tmp_unsup, ratio_unsup[j])\n\n        # Some cleaning\n        self.client_parameters_stack = []\n        self.client_weights = []\n\n        return weight_sum, tmp_sup, tmp_unsup\n\ndef aggregate_gradients_inplace(keys, values, first, tmp, ratio):\n    '''Aggregate list of tensors into model dictionary.\n\n    Args:\n        keys (list): state dictionary keys of model to which dictionaries will be summed.\n        values (list): list of values to sum to model dictionary.\n        first (bool): flag that indicates the first value in the dictionary.\n        tmp (dict): model state dictionary that will be summed.\n        ratio (float): ratio to weight each client value.\n\n    Returns:\n        tuple: updated (first, tmp); first is always False after one call.\n    '''\n\n    for param_key, client_dict in zip(keys, values):\n        if first:\n            tmp[param_key] = to_device(client_dict) * ratio\n        else:\n            tmp[param_key] += to_device(client_dict) * ratio\n\n    return False, tmp"
  },
  {
    "path": "core/strategies/utils.py",
    "content": "# Copyright (c) Microsoft Corporation.\n# Licensed under the MIT license.\n\nimport logging\n\nimport numpy as np\n\nfrom utils import print_rank, to_device\n\n\ndef filter_weight(weight):\n    '''Handles aggregation weights if something messed them up'''\n    print_rank('Client Weight BEFORE filtering: {}'.format(weight), loglevel=logging.DEBUG)\n    if np.isnan(weight) or not np.isfinite(weight):\n        weight = 0.0\n    elif weight > 100:\n        weight = 100\n    print_rank('Client Weights AFTER filtering: {}'.format(weight), loglevel=logging.DEBUG)\n    return weight\n\ndef aggregate_gradients_inplace(model, gradients):\n    '''Aggregate list of tensors into model gradients.\n\n    Args:\n        model (torch.nn.Module): model to which gradients will be summed.\n        gradients (list): list of gradients to sum to model.\n    '''\n\n    for p, client_grad in zip(model.parameters(), gradients):\n        if p.grad is None:\n            p.grad = to_device(client_grad)\n        else:\n            p.grad += to_device(client_grad)"
  },
  {
    "path": "core/trainer.py",
    "content": "# Copyright (c) Microsoft Corporation.\n# Licensed under the MIT license.\n\nimport logging\nimport os\nimport re\nimport copy \n\nimport random\nimport numpy as np\nimport torch\nimport torch.nn as nn\nimport torch.nn.functional as F\nfrom torch.utils.data import DataLoader\n\nfrom core.metrics import Metrics\nfrom utils import \\\n    get_lr, \\\n    get_lr_all, \\\n    make_optimizer, \\\n    make_lr_scheduler, \\\n    print_rank, \\\n    torch_save, \\\n    try_except_save, \\\n    write_yaml\nfrom utils.utils import (\n    to_device, \n    get_label_VAT)\n\nclass TrainerBase:\n    \"\"\"Abstract class defining Trainer objects' common interface.\n\n    Args:\n        model (torch.nn.Module): model to be trained.\n        train_dataloader (torch.utils.data.DataLoader): dataloader that\n            provides the training data.\n        optimizer: (torch.optim.Optimizer): optimizer that will be used to\n            update the model.\n        max_grad_norm (float): if not None, avg gradients are clipped to this\n            norm; defaults to None.\n        ignore_subtask (bool): ignore subtasks, defaults to True.\n        model_type (str): what kind of model is used, defaults to\n            :code:`LanguageModel`.\n        decoder_config (dict or None): config for decoder, defaults to None.\n    \"\"\"\n\n    def __init__(\n        self,\n        model,\n        train_dataloader,\n        optimizer,\n        max_grad_norm=None,\n        ignore_subtask=True,\n        model_type=\"LanguageModel\",\n        decoder_config=None\n    ):\n\n        self.model = model\n        self.train_dataloader = train_dataloader\n        self.optimizer = optimizer\n        self.max_grad_norm = max_grad_norm\n        self.model_type = model_type\n        self.decoder_config = decoder_config\n\n        self.step = 0  # count how many batches are processed\n        self.ignore_subtask = ignore_subtask  # ignore subtasks even if there are multiple task branches\n\n    def 
epoch_boundary(self):\n        '''Check if we are at the end of any given epoch.'''\n        return self.step % len(self.train_dataloader.create_loader()) == 0 and self.step != 0\n\n    def train_desired_samples(self, desired_max_samples, apply_privacy_metrics):\n        pass\n\n    def save(self):\n        pass\n\n    def load(self):\n        pass\n\n\nclass ModelUpdater(TrainerBase):\n    \"\"\"Update the model, given the already computed gradient.\n\n    This is a special kind of trainer, that actually does not use any data.\n\n    Args:\n        model (torch.nn.Module): model to be updated.\n        optimizer (torch.optim.Optimizer): optimizer that will be used to\n            update the model.\n        ss_scheduler: scheduled sampler.\n        train_dataloader: train dataloader, this is not actually used.\n        val_dataloader: val dataloader, this is not actually used.\n        max_grad_norm (float): avg gradients are clipped to this norm.\n        anneal_config (dict): annealing configuration.\n        model_type (str): what kind of model is used, defaults to\n            :code:`LanguageModel`.\n        decoder_config (dict): config for decoder, defaults to None.\n    \"\"\"\n\n    def __init__(\n        self,\n        model,\n        optimizer,\n        ss_scheduler,\n        train_dataloader,\n        val_dataloader,\n        max_grad_norm,\n        anneal_config,\n        model_type=\"LanguageModel\",\n        decoder_config=None\n    ):\n        super().__init__(\n            model=model,\n            train_dataloader=train_dataloader,\n            optimizer=optimizer,\n            max_grad_norm=max_grad_norm,\n            model_type=model_type,\n            decoder_config=decoder_config\n        )\n\n        self.val_dataloader = val_dataloader\n        self.annealing_type = anneal_config[\"type\"] if anneal_config is not None else None\n        self.lr_scheduler = make_lr_scheduler(anneal_config, self.optimizer)\n        self.ss_scheduler = 
ss_scheduler\n\n    def update_model(self):\n        \"\"\"Update model parameters using pre-computed gradients.\"\"\"\n\n        # Apply gradient clipping\n        if self.max_grad_norm is not None:\n            grad_norm = nn.utils.clip_grad_norm_(self.model.parameters(), self.max_grad_norm)\n            print_rank(f\"clipped norm: {grad_norm} to {min(grad_norm,self.max_grad_norm)}\", logging.DEBUG)\n\n        # Do optimizer step\n        self.optimizer.step()\n        self.optimizer.zero_grad()\n\n    def run_lr_scheduler(self, force_run_val=False):\n        \"\"\"Update learning rate using scheduler.\"\"\"\n\n        val_loss = val_acc = None\n        if force_run_val is True or self.annealing_type == \"val_loss\":\n            _, val_loss, val_acc = run_validation_generic(self.model, self.val_dataloader)\n\n        # Do LR scheduling\n        print_rank(f\"LR all: {list(get_lr_all(self.optimizer))}\", loglevel=logging.DEBUG)\n        print_rank(\"LR BEFORE lr_scheduler step: {}\".format(get_lr(self.optimizer)))\n        if self.annealing_type == \"val_loss\":\n            self.lr_scheduler.step(val_loss)\n        else:\n            self.lr_scheduler.step()\n        print_rank(\"LR AFTER lr_scheduler step: {}\".format(get_lr(self.optimizer)), loglevel=logging.DEBUG)\n\n        return (val_loss, val_acc)\n\n    def run_ss_scheduler(self):\n        \"\"\"Do scheduled sampling.\"\"\"\n\n        if self.ss_scheduler is not None:\n            self.ss_scheduler.step()\n\n    def save(self, model_path, token=None, config=None):\n        \"\"\"Save model to disk.\"\"\"\n\n        save_model(\n            model_path=model_path,\n            config=config,\n            model=self.model,\n            optimizer=self.optimizer,\n            lr_scheduler=self.lr_scheduler,\n            ss_scheduler=self.ss_scheduler,\n            token=token\n        )\n\n    def load(self, save_path, update_lr_scheduler, update_ss_scheduler):\n        \"\"\"Load model from disk.\n\n        
If save_path is given, load from there. If not, then resume training\n        from current model dir. If save_path is not present on disk,\n        nothing is loaded.\n        \"\"\"\n\n        if os.path.isfile(save_path):\n            print_rank(\"Loading checkpoint: {}\".format(save_path))\n            checkpoint = torch.load(save_path)\n            self.model.load_state_dict(checkpoint[\"model_state_dict\"])\n            if self.optimizer is not None:\n                self.optimizer.load_state_dict(checkpoint[\"optimizer_state_dict\"])\n\n            anl_st_dict = checkpoint.get(\"lr_scheduler_state_dict\")\n            if anl_st_dict and self.lr_scheduler is not None and update_lr_scheduler is True:\n                self.lr_scheduler.load_state_dict(anl_st_dict)\n\n            sss_st_dict = checkpoint.get(\"ss_scheduler_state_dict\")\n            if sss_st_dict and self.ss_scheduler is not None and update_ss_scheduler is True:\n                self.ss_scheduler.load_state_dict(sss_st_dict)\n\n\nclass Trainer(TrainerBase):\n    \"\"\"Perform training step for any given client.\n\n    The main method to be called for triggering a training step is\n    :code:`train_desired_samples`, which in turn relies on\n    :code:`run_train_epoch`.\n\n    Args:\n        model (torch.nn.Module): model to be trained.\n        ss_scheduler: scheduled sampler.\n        train_dataloader (torch.utils.data.DataLoader): dataloader that\n            provides the training data.\n        server_replay_config (dict or None): config for replaying training;\n            defaults to None, in which case no replaying happens.\n        optimizer (torch.optim.Optimizer or None): optimizer that will be used\n            to update the model. 
If :code:`None`, skip optimization.\n        max_grad_norm (float or None): if not None, avg gradients are clipped\n            to this norm; defaults to None.\n        anneal_config (dict or None): annealing configuration.\n        num_skips_threshold (int): previously used to skip users, deprecated.\n        ignore_subtask (bool): ignore subtasks, defaults to True.\n    \"\"\"\n\n    def __init__(\n        self,\n        model,\n        ss_scheduler,\n        train_dataloader,\n        server_replay_config=None,\n        optimizer=None,\n        max_grad_norm=None,\n        anneal_config=None,\n        num_skips_threshold=-1,\n        ignore_subtask=True\n    ):\n        super().__init__(\n            model=model,\n            train_dataloader=train_dataloader,\n            optimizer=optimizer,\n            max_grad_norm=max_grad_norm,\n            ignore_subtask=ignore_subtask\n        )\n\n        self.server_replay_config = server_replay_config\n        self.anneal_config = anneal_config\n\n        self.lr_scheduler = None\n        if self.optimizer is None and self.server_replay_config is not None and \"optimizer_config\" in self.server_replay_config:\n            self.optimizer = make_optimizer(self.server_replay_config[\"optimizer_config\"], model)\n\n        if self.optimizer is not None and self.anneal_config is not None:\n            self.lr_scheduler = make_lr_scheduler(self.anneal_config, self.optimizer)\n\n        self.cached_batches = []\n        self.ss_scheduler = ss_scheduler\n\n    def reset_gradient_power(self):\n        \"\"\"Reset the sum of gradient power.\n\n        This is used to compute statistics about the gradients.\n        \"\"\"\n\n        self.sum_grad = self.sum_grad2 = 
self.counter = 0\n\n    def accumulate_gradient_power(self):\n        \"\"\"Compute sum of gradient power.\n\n        This is used to compute statistics about the gradients.\n        \"\"\"\n\n        for p in self.model.parameters():\n            if p.grad is None:\n                continue\n\n            grad = p.grad.detach().clone().cpu().numpy()\n            p1 = np.sum(grad)\n            p2 = np.sum(grad ** 2)\n            n = p.grad.numel()\n\n            self.sum_grad += p1\n            self.sum_grad2 += p2\n            self.counter += n\n\n        print_rank(\"Magn. Grad. Squared: {}\".format(self.sum_grad2), loglevel=logging.DEBUG)\n        print_rank(\"Magn. Grad.: {}\".format(self.sum_grad), loglevel=logging.DEBUG)\n        return self.sum_grad, self.sum_grad2, self.counter\n\n    def estimate_sufficient_stats(self):\n        \"\"\"Compute statistics about the gradients.\"\"\"\n\n        sum_mean_grad, sum_mean_grad2, n = self.accumulate_gradient_power()\n\n        mean_grad = sum_mean_grad / n\n        mag_grad = np.sqrt(sum_mean_grad2 / n)\n        # Variance is E[g^2] - (E[g])^2; subtracting mag_grad**2 here would always yield zero\n        var_grad = sum_mean_grad2 / n - mean_grad**2\n        norm_grad = np.sqrt(sum_mean_grad2)\n\n        self.sufficient_stats = {\n            \"n\": n,\n            \"sum\": sum_mean_grad,\n            \"sq_sum\": sum_mean_grad2,\n            \"var\": var_grad,\n            \"mean\": mean_grad,\n            \"mag\": mag_grad,\n            \"norm\": norm_grad\n        }\n\n    def train_desired_samples(self, desired_max_samples=None, apply_privacy_metrics=False, algo_payload=None):\n        \"\"\"Triggers training step.\n\n        Args:\n            desired_max_samples (int): number of samples that you would like to process.\n            apply_privacy_metrics (bool): whether to save the batches used for the round for privacy metrics evaluation.\n            algo_payload (dict or None): algorithm-specific data and configuration, e.g. for FedProx or FedLabels.\n\n        Returns:\n            3-tuple of (float, int, object): total training loss, number of processed samples, and an algorithm-specific result (None unless FedLabels is used).\n        \"\"\"\n\n        num_samples = 0\n        total_train_loss = 0\n        algo_computation = None\n\n        if algo_payload is None:\n            num_samples_per_epoch, train_loss_per_epoch = self.run_train_epoch(desired_max_samples, apply_privacy_metrics)\n        elif algo_payload['strategy'] == 'FedLabels':\n            num_samples_per_epoch, train_loss_per_epoch, algo_computation = self.run_train_epoch_sup(desired_max_samples, apply_privacy_metrics, algo_payload)\n        elif algo_payload['strategy'] == 'FedProx':\n            num_samples_per_epoch, train_loss_per_epoch = self.run_train_epoch_fedprox(desired_max_samples, apply_privacy_metrics, algo_payload)\n\n        num_samples += num_samples_per_epoch\n        total_train_loss += train_loss_per_epoch\n\n        return total_train_loss, num_samples, algo_computation\n\n    def run_train_epoch(self, desired_max_samples=None, apply_privacy_metrics=False):\n        \"\"\"Implementation example for training the model.\n\n        The training process should stop after the desired number of samples is processed.\n\n        Args:\n            desired_max_samples (int): number of samples that you would like to process.\n            apply_privacy_metrics (bool): whether to save the batches used for the round for privacy metrics evaluation.\n\n        Returns:\n            2-tuple of (int, float): number of processed samples and total training loss.\n        \"\"\"\n\n        sum_train_loss = 0.0\n        num_samples = 0\n        self.reset_gradient_power()\n\n        # Reset gradient just in case\n        self.model.zero_grad()\n\n        train_loader = self.train_dataloader.create_loader()\n        for batch in train_loader:\n            if desired_max_samples is not None and num_samples >= desired_max_samples:\n                break\n\n            # Compute loss\n            if self.optimizer is not None:\n                self.optimizer.zero_grad()\n\n            if self.ignore_subtask is True:\n                loss = self.model.single_task_loss(batch)\n            else:\n                if apply_privacy_metrics:\n                    if \"x\" in batch:\n                        indices = to_device(batch[\"x\"])\n                    elif \"input_ids\" in batch:\n                        indices = to_device(batch[\"input_ids\"])\n                    self.cached_batches.append(indices)\n                loss = self.model.loss(batch)\n            loss.backward()\n\n            # Apply gradient clipping\n            if self.max_grad_norm is not None:\n                grad_norm = nn.utils.clip_grad_norm_(self.model.parameters(), self.max_grad_norm)\n\n            # Sum up the gradient power\n            self.estimate_sufficient_stats()\n\n            # Now that the gradients have been scaled, we can apply them\n            if self.optimizer is not None:\n                self.optimizer.step()\n\n            print_rank(\"step: {}, loss: {}\".format(self.step, loss.item()), loglevel=logging.DEBUG)\n\n            # Post-processing in this loop\n            # Sum up the loss\n            sum_train_loss += loss.item()\n\n            # Increment the number of frames processed already\n            if \"attention_mask\" in batch:\n                num_samples += torch.sum(batch[\"attention_mask\"].detach().cpu() == 1).item()\n            elif \"total_frames\" in batch:\n                num_samples += batch[\"total_frames\"]\n            else:\n                num_samples += len(batch[\"x\"])\n\n            # Update the counters\n            self.step += 1\n\n        # Take a step in lr_scheduler\n        if self.lr_scheduler is not None:\n            self.lr_scheduler.step()\n\n        return num_samples, sum_train_loss\n\n    def run_train_epoch_fedprox(self, desired_max_samples=None, apply_privacy_metrics=False, algo_payload=None):\n        \"\"\"Implementation example for training the model with FedProx.\n\n        The training process should stop after the desired number of samples is processed.\n\n        Args:\n            
desired_max_samples (int): number of samples that you would like to process.\n            apply_privacy_metrics (bool): whether to save the batches used for the round for privacy metrics evaluation.\n            algo_payload (dict): hyperparameters needed by the FedProx algorithm, in particular the proximal coefficient :code:`mu`.\n\n        Returns:\n            2-tuple of (int, float): number of processed samples and total training loss.\n        \"\"\"\n\n        sum_train_loss = 0.0\n        num_samples = 0\n        self.reset_gradient_power()\n\n        # Reset gradient just in case\n        self.model.zero_grad()\n\n        # FedProx parameters\n        mu = algo_payload['mu']\n        global_model = to_device(copy.deepcopy(self.model))\n        global_weight_collector = list(global_model.parameters())\n\n        train_loader = self.train_dataloader.create_loader()\n        for batch in train_loader:\n            if desired_max_samples is not None and num_samples >= desired_max_samples:\n                break\n\n            # Compute loss\n            if self.optimizer is not None:\n                self.optimizer.zero_grad()\n\n            if self.ignore_subtask is True:\n                loss = self.model.single_task_loss(batch)\n            else:\n                if apply_privacy_metrics:\n                    if \"x\" in batch:\n                        indices = to_device(batch[\"x\"])\n                    elif \"input_ids\" in batch:\n                        indices = to_device(batch[\"input_ids\"])\n                    self.cached_batches.append(indices)\n                loss = self.model.loss(batch)\n\n            # FedProx proximal term: (mu/2) * ||w - w_global||^2, added to the loss once\n            fed_prox_reg = 0.0\n            for param_index, param in enumerate(self.model.parameters()):\n                fed_prox_reg += (mu / 2) * torch.norm(param - global_weight_collector[param_index])**2\n            loss += fed_prox_reg\n            loss.backward()\n\n            # Apply gradient clipping\n            if self.max_grad_norm is not None:\n                grad_norm = nn.utils.clip_grad_norm_(self.model.parameters(), self.max_grad_norm)\n\n            # Sum up the gradient power\n            self.estimate_sufficient_stats()\n\n            # Now that the gradients have been scaled, we can apply them\n            if self.optimizer is not None:\n                self.optimizer.step()\n\n            print_rank(\"step: {}, loss: {}\".format(self.step, loss.item()), loglevel=logging.DEBUG)\n\n            # Post-processing in this loop\n            # Sum up the loss\n            sum_train_loss += loss.item()\n\n            # Increment the number of frames processed already\n            if \"attention_mask\" in batch:\n                num_samples += torch.sum(batch[\"attention_mask\"].detach().cpu() == 1).item()\n            elif \"total_frames\" in batch:\n                num_samples += batch[\"total_frames\"]\n            else:\n                num_samples += len(batch[\"x\"])\n\n            # Update the counters\n            self.step += 1\n\n        # Take a step in lr_scheduler\n        if self.lr_scheduler is not None:\n            self.lr_scheduler.step()\n\n        return num_samples, sum_train_loss\n\n    def run_train_epoch_sup(self, desired_max_samples=None, apply_privacy_metrics=False, algo_payload=None):\n        \"\"\"Implementation example for training the model using semi-supervision.\n\n        Args:\n            desired_max_samples (int): number of samples that you would like to process.\n            apply_privacy_metrics (bool): whether to save the batches used for the round for privacy metrics evaluation.\n            algo_payload (dict): datasets and configuration used during training for the FedLabels algorithm.\n\n        Returns:\n            3-tuple of (int, float, dict): number of processed samples, total training loss and unsupervised model state dict.\n        \"\"\"\n\n        sum_train_loss = 0.0\n        num_samples = 0\n        round_ = 
algo_payload['iter']\n        semisupervision_config = algo_payload['config']\n        self.reset_gradient_power()\n\n        # Reset gradient just in case\n        self.model.zero_grad()\n\n        KL_pointLoss = torch.nn.KLDivLoss(reduction=\"none\", log_target=True)\n        MSELoss = torch.nn.MSELoss()\n        Softmax = torch.nn.LogSoftmax(dim=1)\n        nolog_Softmax = torch.nn.Softmax(dim=1)\n        initial_net = copy.deepcopy(self.model)\n        loss_func = torch.nn.CrossEntropyLoss()\n\n        # Create datasets\n        normal_dataset, unsupdataset, unsupdataset_rand  = algo_payload['data'][0], algo_payload['data'][1], algo_payload['data'][2]\n        self.optimizer = torch.optim.SGD(self.model.parameters(), lr=0.003, momentum=0)\n\n        for i in range(int(semisupervision_config['train_ep'])):\n            sup_train = DataLoader(normal_dataset, batch_size=64, shuffle=True)\n            data_sup = iter(sup_train)\n            (images, labels) = next(data_sup)\n            self.model.zero_grad()\n            labels = to_device(labels)\n            log_probs = self.model(to_device(images))\n            loss = loss_func(log_probs, labels)\n            num_samples+= len(labels)\n            sum_train_loss += loss.item()\n            loss.backward()\n            self.optimizer.step()\n\n        self.estimate_sufficient_stats()\n        self.step += 1 # Update the counters\n        print_rank(\"step: {}, loss: {}\".format(self.step, loss.item()), loglevel=logging.DEBUG)\n\n        net = copy.deepcopy(initial_net)\n        optimizer = torch.optim.SGD(net.parameters(), lr=semisupervision_config['eta'], momentum=0)\n        total_est_labels = 0\n        total_est_ratios = 0\n        correct = 0\n\n        if round_ >= semisupervision_config['burnout_round']:\n            for _ in range(int(semisupervision_config['unsuptrain_ep'])):\n                data_idx = random.sample(range(len(unsupdataset)), semisupervision_config['unl_bs']) \n                
partitioned = torch.utils.data.Subset(unsupdataset, indices=data_idx)\n                ldr_train = DataLoader(partitioned, batch_size=semisupervision_config['bs'], shuffle=False)\n\n                (images, true_labels) = next(iter(ldr_train))\n                images, true_labels = to_device(images), to_device(true_labels)\n\n                initial_net.eval()\n                self.model.eval()\n\n                with torch.no_grad():\n                    output_local = initial_net(images).detach()\n                    output_server = self.model(images).detach()\n\n                local_logits = nolog_Softmax(output_local/semisupervision_config['temp'])\n                server_logits = nolog_Softmax(output_server / semisupervision_config['temp'])\n                est_labels, est_idx, est_var, est_ratio = get_label_VAT(local_logits, server_logits, semisupervision_config['thre'], semisupervision_config['comp'])\n                total_est_labels += len(est_labels)\n                total_est_ratios += est_ratio/semisupervision_config['unsuptrain_ep']\n\n                if len(est_labels) != 0:\n                    partitioned_rand = torch.utils.data.Subset(unsupdataset_rand, indices=data_idx)\n                    ldr_rand_train = DataLoader(partitioned_rand, batch_size=semisupervision_config['bs'], shuffle=False)\n                    (rand_images, _) = next(iter(ldr_rand_train))\n                    rand_images = to_device(rand_images)\n\n                    correct += ((est_labels == true_labels[est_idx]).sum().item()) / (\n                                len(est_idx) * semisupervision_config['unsuptrain_ep'])\n\n                    lamb_consist = semisupervision_config['vat_consis']\n                    net.train()\n\n                    output = net(rand_images[est_idx]) if semisupervision_config['uda'] == 1 else net(images[est_idx])\n                    output_norand = net(images[est_idx])\n\n                    # Compute Losses, this should go inside model.py\n    
                unsup_loss = loss_func(output, est_labels)\n                    kl_point_loss = KL_pointLoss(Softmax(output_norand / semisupervision_config['temp']), Softmax(output_server[est_idx]/semisupervision_config['temp']))\n                    consist_loss = torch.tensor(0.0, requires_grad=True)\n                    consist_tmp = torch.tensor(0.0)\n\n                    for i in range(len(est_var)):\n                        if torch.argmax(local_logits[est_idx[i]]) == torch.argmax(server_logits[est_idx[i]]):\n                            dummy = kl_point_loss[i]*est_var[i]\n                            consist_tmp += 1\n                            consist_loss = consist_loss+ dummy.sum()\n\n                    if consist_tmp != torch.tensor(0.0):\n                        consist_loss = consist_loss/consist_tmp\n\n                    l2_lambda = semisupervision_config['l2_lambda']\n                    initial_net.eval()\n                    reg_loss = torch.tensor(0., requires_grad=True)\n                    for p, prev_param in zip(net.parameters(), initial_net.parameters()):\n                        reg_loss = reg_loss + MSELoss(p, prev_param)\n\n                    (semisupervision_config['unsup_lamb']*unsup_loss + lamb_consist*consist_loss+l2_lambda*reg_loss).backward(retain_graph=True)\n                    optimizer.step()\n\n        return total_est_labels, sum_train_loss/semisupervision_config['ensize'], net.state_dict()\n\n    def get_model(self):\n        return copy.deepcopy(self.model)\n\n    def prepare_iteration(self, model=None):\n        \"\"\"Steps to run before iteration begins.\"\"\"\n\n        if model is not None:\n            self.model.load_state_dict(model.state_dict())\n\n            self.lr_scheduler = None\n            if self.optimizer is None and self.server_replay_config is not None and \\\n                    \"optimizer_config\" in self.server_replay_config:\n                print_rank(\"Creating server-side replay training 
optimizer\", loglevel=logging.DEBUG)\n                self.optimizer = make_optimizer(self.server_replay_config[\"optimizer_config\"], self.model)\n\n            if self.optimizer is not None and self.anneal_config is not None:\n                print_rank(\"Creating server-side replay-training lr_scheduler\", loglevel=logging.DEBUG)\n                self.lr_scheduler = make_lr_scheduler(self.anneal_config, self.optimizer)\n\n    def reset_optimizer(self, optimizer_state_dict, annealing_config=None):\n        \"\"\"Re-load optimizer.\"\"\"\n\n        assert self.optimizer is not None, \"This trainer does not have an optimizer\"\n\n        # Load optimizer on state dict\n        self.optimizer.load_state_dict(optimizer_state_dict)\n\n        # Set learning rate scheduler\n        self.lr_scheduler = None\n        if annealing_config is not None:\n            self.lr_scheduler = make_lr_scheduler(annealing_config, self.optimizer)\n\n    def save(self, model_path, token=None, config=None):\n        \"\"\"Save model to disk.\"\"\"\n\n        save_model(\n            model_path=model_path,\n            config=config,\n            model=self.model,\n            optimizer=self.optimizer,\n            lr_scheduler=self.lr_scheduler,\n            ss_scheduler=self.ss_scheduler,\n            token=token\n        )\n\n    def load(self, save_path, update_lr_scheduler, update_ss_scheduler):\n        \"\"\"Load model from disk.\n\n        If save_path is given, load from there. If not, then resume training\n        from current model dir.  
If at any point the save_path is not present on\n        the disk, it won't be loaded.\n        \"\"\"\n\n        if os.path.isfile(save_path):\n            print_rank(\"Loading checkpoint: {}\".format(save_path))\n            checkpoint = torch.load(save_path)\n            self.model.load_state_dict(checkpoint[\"model_state_dict\"])\n            if self.optimizer is not None:\n                self.optimizer.load_state_dict(checkpoint[\"optimizer_state_dict\"])\n\n            anl_st_dict = checkpoint.get(\"lr_scheduler_state_dict\")\n            if anl_st_dict and self.lr_scheduler is not None and update_lr_scheduler is True:\n                self.lr_scheduler.load_state_dict(anl_st_dict)\n\n            sss_st_dict = checkpoint.get(\"ss_scheduler_state_dict\")\n            if sss_st_dict and self.ss_scheduler is not None and update_ss_scheduler is True:\n                self.ss_scheduler.load_state_dict(sss_st_dict)\n\n\ndef run_validation_generic(model, val_dataloader):\n    \"\"\"Perform a validation step.\n\n    Args:\n        model (torch.nn.Module): model to be validated.\n        val_dataloader (torch.utils.data.DataLoader): provides val data.\n\n    Returns:\n        Average validation loss.\n    \"\"\"\n\n    print_rank(\"run_validation_generic\", loglevel=logging.DEBUG)\n    model.set_eval()\n    print_rank(\"set_eval\", loglevel=logging.DEBUG)\n\n    # Initialize dataloader etc.\n    val_loader = val_dataloader.create_loader()\n    print_rank(\n        f\"created loader {val_loader.num_workers}, \" + \\\n        f\"users: {len(val_dataloader.dataset.user_list)} \" + \\\n        f\"examples: {sum(val_dataloader.dataset.num_samples)} \" + \\\n        f\"lendata: {len(val_loader)} \",\n        loglevel=logging.DEBUG\n    )\n\n    print_rank(\n        f\"drop_last: {val_loader.drop_last} \" + \\\n        f\"len_sampler: {len(val_loader._index_sampler)}\",\n        loglevel=logging.DEBUG\n    )\n\n    print_rank(\"Loading metrics ...\", logging.DEBUG)\n    
metrics_cl = Metrics()\n    return metrics_cl.compute_metrics(dataloader=val_loader, model=model)\n\ndef set_component_wise_lr(model, optimizer_config, updatable_names):\n    \"\"\"Set zero learning rate for layers in order to freeze the update.\n\n    Args:\n        model (torch.nn.Module):\n        optimizer_config (string):\n        updatable_names (list): [\"^dec_rnn\", \"^fc\"]\n    \"\"\"\n\n    def name_matched(name, updatable_names):\n        for updatable_name in updatable_names:\n            if re.match(updatable_name, name) is not None:\n                return True\n\n        return False\n\n    # Set learning rate to zero in layers which name does not follow regex\n    parameters = []\n    for name, params in model.named_parameters():\n        if name_matched(name, updatable_names) is True:\n            print_rank(\"updating {} with lr = {}\".format(name, optimizer_config[\"lr\"]))\n            parameters.append({\"params\": params, \"lr\":optimizer_config[\"lr\"]})\n        else:\n            print_rank(\"freezing {}\".format(name))\n            parameters.append({\"params\": params, \"lr\": 0.0})\n\n    return parameters\n\ndef save_model(model_path, config, model, optimizer, lr_scheduler, ss_scheduler, token=None):\n    \"\"\"Save a model as well as training information.\"\"\"\n\n    save_state = {\n        \"model_state_dict\": model.state_dict(),\n        \"optimizer_state_dict\": optimizer.state_dict() if optimizer is not None else None,\n        \"lr_scheduler_state_dict\": lr_scheduler.state_dict() if lr_scheduler is not None else None\n    }\n    if ss_scheduler is not None:\n        save_state[\"ss_scheduler_state_dict\"] = ss_scheduler.state_dict()\n\n    if token:  # just save as \"best\" and return\n        save_path = os.path.join(model_path, \"{}_model.tar\".format(token))\n    else:\n        save_path = os.path.join(model_path, \"model.tar\")\n\n    print_rank(\"Saving model to: {}\".format(save_path))\n    try_except_save(torch_save, 
state_or_model=save_state, save_path=save_path)\n\n    # Write out the config to model_dir\n    if config is not None:\n        try_except_save(write_yaml, config=config,\n                save_path=os.path.join(model_path, \"config.yaml\"))\n"
  },
  {
    "path": "doc/sphinx/Makefile",
    "content": "# Minimal makefile for Sphinx documentation\n#\n\n# You can set these variables from the command line, and also\n# from the environment for the first two.\nSPHINXOPTS    ?=\nSPHINXBUILD   ?= sphinx-build\nSOURCEDIR     = .\nBUILDDIR      = _build\n\n# Put it first so that \"make\" without argument is like \"make help\".\nhelp:\n\t@$(SPHINXBUILD) -M help \"$(SOURCEDIR)\" \"$(BUILDDIR)\" $(SPHINXOPTS) $(O)\n\n.PHONY: help Makefile\n\n# Catch-all target: route all unknown targets to Sphinx using the new\n# \"make mode\" option.  $(O) is meant as a shortcut for $(SPHINXOPTS).\n%: Makefile\n\t@$(SPHINXBUILD) -M $@ \"$(SOURCEDIR)\" \"$(BUILDDIR)\" $(SPHINXOPTS) $(O)\n"
  },
  {
    "path": "doc/sphinx/advanced.rst",
    "content": "Advanced Topics\n===============\n\nPrivacy\n-------\n\nAggregation Options\n-------------------\n\n\nOptimizer Options\n-----------------"
  },
  {
    "path": "doc/sphinx/class_reference.rst",
    "content": "\n\nClass Reference\n===============\n\nFLUTE Core\n~~~~~~~~~~\n\ncore/server\n-----------\n\n.. automodule:: core.server\n   :members:\n   :special-members: __init__\n\ncore/client\n-----------\n\n.. automodule:: core.client\n   :members:\n   :special-members: __init__\n\ncore/federated\n--------------\n\n.. automodule:: core.federated\n   :members:\n   :special-members: __init__\n\n\ncore/config\n-----------\n.. automodule:: core.config\n   :members:\n   :special-members: __init__\n"
  },
  {
    "path": "doc/sphinx/conf.py",
    "content": "# Configuration file for the Sphinx documentation builder.\n#\n# This file only contains a selection of the most common options. For a full\n# list see the documentation:\n# https://www.sphinx-doc.org/en/master/usage/configuration.html\n\n# -- Path setup --------------------------------------------------------------\n\n# If extensions (or modules to document with autodoc) are in another directory,\n# add these directories to sys.path here. If the directory is relative to the\n# documentation root, use os.path.abspath to make it absolute, like shown here.\n#\n# import os\n# import sys\n# sys.path.insert(0, os.path.abspath('.'))\n\n\n# -- Project information -----------------------------------------------------\n\nproject = 'FLUTE'\ncopyright = '2021, Microsoft Research'\nauthor = 'Microsoft Research'\n\n\n# -- General configuration ---------------------------------------------------\n\n# Add any Sphinx extension module names here, as strings. They can be\n# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom\n# ones.\nextensions = [\n    'sphinx.ext.autodoc'\n]\n\n# Add any paths that contain templates here, relative to this directory.\ntemplates_path = ['_templates']\n\n# List of patterns, relative to source directory, that match files and\n# directories to ignore when looking for source files.\n# This pattern also affects html_static_path and html_extra_path.\nexclude_patterns = ['_build', 'Thumbs.db', '.DS_Store']\n\n\n# -- Options for HTML output -------------------------------------------------\n\n# The theme to use for HTML and HTML Help pages.  See the documentation for\n# a list of builtin themes.\n#\n#html_theme = 'alabaster'\n\n# Add any paths that contain custom static files (such as style sheets) here,\n# relative to this directory. 
They are copied after the builtin static files,\n# so a file named \"default.css\" will overwrite the builtin \"default.css\".\nhtml_static_path = ['_static']\n\nimport sphinx_rtd_theme\n\nhtml_theme = 'sphinx_rtd_theme'\n\nhtml_theme_path = [sphinx_rtd_theme.get_html_theme_path()]"
  },
  {
    "path": "doc/sphinx/index.rst",
    "content": ".. FLUTE documentation master file, created by\n   sphinx-quickstart on Sat Jun 19 09:15:36 2021.\n   You can adapt this file completely to your liking, but it should at least\n   contain the root `toctree` directive.\n\nWelcome to FLUTE documentation!\n===============================\n\n.. toctree::\n   :maxdepth: 2\n   :caption: Contents:\n\n   overview\n   scenarios\n   launch\n   advanced\n   reference\n   class_reference\n\nIndices and tables\n==================\n\n* :ref:`genindex`\n* :ref:`modindex`\n* :ref:`search`\n"
  },
  {
    "path": "doc/sphinx/launch.rst",
    "content": "Launch FLUTE\n================\n\nLocal run\n------------\n\nInstall the requirements stated inside of requirements.txt. Ideally this sould be done inside of a virtual environment, for instance, using Anaconda.\n\n.. code:: bash\n    conda create -n FLUTE python==3.8\n    pip install -r requirements.txt\n\nFLUTE uses torch.distributed API as its main communication backbone, supporting three buil-in backends. For more information please refer to [Distributed Communication Package](https://pytorch.org/docs/stable/distributed.html). Therefore, we highly suggest to use NCCL backend for distributed GPU training and Gloo for distributed CPU training. There is no `setup.py` as FLUTE is not currently distributed as a package, but instead meant to run from the root of the repository.\n\nAfter this initial setup you can use your data for launching a local run. However the following instructions will be adapted to run ``nlg_gru`` task. For running this example, you need to first download and preprocess the data. Instructions can be found `here`_.  Once the data is available you can run FLUTE from root as follows:\n\n.. code:: bash\n\n    python -m torch.distributed.run --nproc_per_node=3 e2e_trainer.py -dataPath ./testing/mockup -outputPath scratch  -config testing/configs/hello_world_local.yaml -task nlg_gru -backend nccl\n\n.. _here: https://github.com/microsoft/msrflute/tree/main/testing\n\nIf the setup of the experiment has been done correctly, after the model initialization we would be able to see the clients being trained:\n\n.. figure:: img/run.png\n    :align: center\n    :width: 800\n\n    Local run for nlg_gru task.\n\nAML Run \n------------\n\nFLUTE has a native integration for job submissions with Azure ML, allowing users to use the built-in CLI or web interface for job/experiment tracking.\n\nFor running experiments on AzureML, the CLI can help. 
You should first install the CLI `install the CLI`_ (make sure you have v2) and `create a resource group and workspace`_. You can then create a compute cluster, type ``az ml compute create -h`` for more info. Afterwards, you should write a YAML file with instructions for the job; we provide a simple example below:\n\n.. _install the CLI: https://docs.microsoft.com/en-us/azure/machine-learning/reference-azure-machine-learning-cli\n.. _create a resource group and workspace: https://docs.microsoft.com/en-us/azure/machine-learning/how-to-manage-workspace-cli?tabs=vnetpleconfigurationsv1cli%2Ccreatenewresources%2Cworkspaceupdatev1%2Cworkspacesynckeysv1%2Cworkspacedeletev1\n\n.. code:: yaml\n\n    experiment_name: basic_example\n    description: Basic example of AML config for submitting FLUTE jobs\n    code:\n    local_path: .\n    compute: azureml:Test\n    environment:\n    image: pytorch/pytorch:1.9.0-cuda10.2-cudnn7-devel\n    inputs:\n    data:\n        folder: azureml://datastores/data/paths/cifar\n        mode: rw_mount\n    command: >\n    apt -y update &&\n    apt -y install openmpi-bin libopenmpi-dev openssh-client &&\n    python3 -m pip install --upgrade pip &&\n    python3 -m pip install -r requirements.txt &&\n    python -m torch.distributed.run --nproc_per_node=4 e2e_trainer.py\n    -outputPath=./outputs\n    -dataPath={inputs.data}\n    -task=classif_cnn\n    -config=./experiments/classif_cnn/config.yaml\n    -backend=nccl\n\n\nYou should replace ``compute`` with the name of the one you created before, and adjust the path of the datastore containing the data. In the example above, we created a datastore called ``data`` and added to it a folder called ``cifar``, which contained the two HDF5 files. The command passed above will install dependencies and then launch a NCCL job with 4 threads, for the experiment defined in ``experiments/classif_cnn``. 
Details on how to run a job using the AzureML CLI are given in its `documentation`_ , but typically it suffices to set up the environment and type ``az ml job create -f <name-of-the-yaml-file>``. In the same page of the documentation, you can also find more info about how to set up the YAML file above, in case other changes are needed.\n\n.. _documentation: https://docs.microsoft.com/en-us/azure/machine-learning/how-to-train-cli\n\n.. note:: The local_path above is relative to the location of the YAML file. Setting it to ``.`` assumes it is in the same folder as ``e2e_trainer.py``. \n    \n.. note:: All files on this folder will be uploaded to Azure, including hidden folders such as ``.git``, make sure to remove large files and folders that are not needed.\n\nAfter launching the experiment, you can follow it on AzureML Studio, which prints logs, plots metrics and makes the output easily available after the experiment is finished.\n\n"
  },
  {
    "path": "doc/sphinx/make.bat",
    "content": "@ECHO OFF\n\npushd %~dp0\n\nREM Command file for Sphinx documentation\n\nif \"%SPHINXBUILD%\" == \"\" (\n\tset SPHINXBUILD=sphinx-build\n)\nset SOURCEDIR=.\nset BUILDDIR=_build\n\nif \"%1\" == \"\" goto help\n\n%SPHINXBUILD% >NUL 2>NUL\nif errorlevel 9009 (\n\techo.\n\techo.The 'sphinx-build' command was not found. Make sure you have Sphinx\n\techo.installed, then set the SPHINXBUILD environment variable to point\n\techo.to the full path of the 'sphinx-build' executable. Alternatively you\n\techo.may add the Sphinx directory to PATH.\n\techo.\n\techo.If you don't have Sphinx installed, grab it from\n\techo.http://sphinx-doc.org/\n\texit /b 1\n)\n\n%SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%\ngoto end\n\n:help\n%SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%\n\n:end\npopd\n"
  },
  {
    "path": "doc/sphinx/overview.rst",
    "content": "FLUTE Overview\n============\n\nFLUTE: Federated Learning Utilities and Tools for Experimentation is a high-performance open source platform that enables researchers and developers to perform rapid prototyping and offline simulations of novel federated learning algorithms at scale. \n\nAn FLUTE job consists of one or more nodes (physical or virtual machines) executing a total of K workers that can become a Server or Client. \n\n.. figure:: img/client-server.png\n    :align: center\n    :width: 600\n    \n    FLUTE uses a distributed processing architecture backed by torch.distributed. \n\nWorker 0 acts as a central orchestrator, maintaining and distributing the central model to workers, and subsequently distributing client tasks to them. On each training round the orchestrator takes care of:\n    \n    * Dispatch the central model to the rest of the workers\n    * Queues up client tasks for workers to execute. \n    \nWorkers receive client tasks (client training data and training config) and:\n\n    * Execute SGD on the central model using their client's training data\n    * Send model delta (pseudo-gradient) back to the orchestrator. \n\nEach worker>0 processes client tasks sequentially, consisting of data encoding and one or more batch updates to the central model (note the central model is reset to its original state for each client task). As each client task completes, the model delta, aka the pseudo-gradient is sent back to the orchestrator for federation into a new central model.\n\nExecution runs for up to N training rounds.  In each round the orchestrator may sample a subset of clients, and may also randomly delay pseudo-gradient updates from some clients to future rounds. The orchestrator will also periodically distribute evaluation tasks to determine model quality on validation and test data.\n\n.. 
note:: AzureML generally expects there will be one worker per GPU on each node.\n\nArchitecture\n------------\n\nFLUTE design is based on a central server architecture.\n\n.. figure:: img/architecture.png\n    :align: center\n    :width: 500\n    \n    FLUTE logical workflow. \n\nThe logical workflow performed is:\n\n    1. Send and initial global model to clients.\n    2. Train instances of the global model with locally available data on each client.\n    3. Send training information to the Server (e.g. adapted models, logits, pseudo-gradients).\n    4. Combine the returned information on the server to produce a new model.\n    5. Optionally, update the logbal model with an additional server-side rehearsal step.\n    6. Send the updated global model back to the clients.\n    7. Repeat steps 2-6 after sampling a new subset of clients for the next training interation.\n\n\n"
  },
  {
    "path": "doc/sphinx/reference.rst",
    "content": "Option Reference\n================\n\nCommand Line Arguments\n----------------------\n\nYAML Configuration\n------------------\n\nFLUTE yaml files consist of three main sections, and a few optional sections. The `model_config` specifies model architecture and pretrained model setup path. The `server_config` section defines server settings such as total training rounds, aggregation method, optimizer settings, learning rate schedule, and any server-side training data.  The `client_config` section specifies client optimizer settings and the client-side training data.\n\n.. note:: Training data is loaded by the server and dispatched to the clients. The configuration settings for this data are specified in the `client_config`.\n\n\nmodel_config\n~~~~~~~~~~~~\n\nserver_config\n~~~~~~~~~~~~~\n\nclient_config\n~~~~~~~~~~~~~\n\nOptional Sections\n-----------------\nIn addition to the main sections, some optional sections may be specified to control privacy settings, specifically a `dp_config` section for differential privacy settings, and `privacy_metrics_config` for applying privacy metrics.\n\n\ndp_config\n~~~~~~~~~\n\nprivacy_metrics_config\n~~~~~~~~~~~~~~~~~~~~~~"
  },
  {
    "path": "doc/sphinx/requirements.txt",
    "content": "sphinx_rtd_theme\njinja2==3.0.3\n"
  },
  {
    "path": "doc/sphinx/scenarios.rst",
    "content": "Adding New Scenarios\n====================\n\nData Preparation\n------------\nFLUTE provides the abstract class `BaseDataset` inside ``core/dataset.py`` that can be used  to wrap\nany dataset and make it compatible with the platform. The dataset should be able to access all the data, \nand store it in the attributes `user_list`, `user_data`, `num_samples` and  `user_data_labels` (optional). \nThese attributes are required to have these exact names. The abstract method ``load_data ()`` should be \nused to instantiate/load the dataset and provide the training format required by FLUTE on-the-fly. \nHere is a sample data blob for language model training.\n\n.. code:: json\n\n    {\n        \"users\": [\"bert\",\"elmo\"],\n        \"user_data\": {\n            \"bert\": {\"x\": [\"my name is Bert.\", \"I live with Ernie.\"]},\n            \"elmo\": {\"x\": [\"Big Bird is my friend.\"]}\n        },\n        \"num_samples\": [2, 1]\n    }\n\nThe blob consists of three fields:\n\n    * ``users``: indicates a unique id for each user in the training data.  Users are sampled uniformly to create client tasks during training. There could be many more users than client tasks per round or even over all client tasks over all rounds. \n    * ``num_samples`` : indicates the number of samples for each user, in the same order as ``users`` list.  That is, for any index ``i`` in ``range(len(data['users']))``: \n    * ``user_data``: contains user-indexed training data. Each user's data is a dictionary of the form ``{\"x\": [list of examples]}``.  \n\nIf labels are needed by the task, ``user_data_label`` will be required by FLUTE with the user-indexed labels. The format should be similar to ``user_data`` where each user's label is a dictionary of the form ``{\"x\": [list of labels]}`` as follows:\n\n.. code:: json\n\n    \"user_data_label\": {\n        \"bert\": {\"x\": [ 0 , 1 ]},\n        \"elmo\": {\"x\": [ 0 ]}\n        }\n\n.. 
note::\n\n    Test and validation data is formatted similarly.\n\n.. note::\n\n    Test/validate data is dispatched to workers by partitioning on users. If your test data isn't user-partitioned, we recommend partitioning it uniformly using some dummy user ids.\n\nAdd the model to FLUTE\n--------------\n\nFLUTE requires the model declaration framed in PyTorch, which must inhereit from the `BaseModel` class defined in ``core/model.py``. The following methods should be overridden:\n\n    * __init__: model definition\n    * loss: computes the loss used for training rounds\n    * inference: computes the metrics used during evaluation rounds\n\nPlease see the example provided below:\n\n.. code:: python\n\n    from core.model import BaseModel\n\n    class CNN(BaseModel):\n    '''This is a PyTorch model with some extra methods'''\n\n    def __init__(self, model_config):\n        super().__init__()\n        self.net = Net()\n\n    def loss(self, input: torch.Tensor) -> torch.Tensor:\n        '''Performs forward step and computes the loss'''\n        device = 'cuda' if torch.cuda.is_available() else 'cpu'\n        features, labels = input['x'].to(device), input['y'].to(device)\n        output = self.net.forward(features)\n        return F.cross_entropy(output, labels.long())\n\n    def inference(self, input):\n        '''Performs forward step and computes metrics'''\n        device = 'cuda' if torch.cuda.is_available() else 'cpu'\n        features, labels = input['x'].to(device), input['y'].to(device)\n        output = self.net.forward(features)\n\n        n_samples = features.shape[0]\n        accuracy = torch.mean((torch.argmax(output, dim=1) == labels).float()).item()\n        f1 = f1_score(labels.cpu(), torch.argmax(output, dim=1).cpu(), average='micro')\n\n        # NOTE: Only the keys 'output','acc' and 'batch_size' does not require \n        # extra fields as 'value' and 'higher is better'. 
FLUTE requires this \n        # format only for customized metrics.\n\n        return {'output':output, 'acc': accuracy, 'batch_size': n_samples, \\\n                'f1_score': {'value':f1,'higher_is_better': True}} \n\nOnce the model is ready, all mandatory files must be in a single folder inside ´{/experiments´. Please adjust your files with the following naming structure so FLUTE can be able to find all the scripts needed.\n\n.. code-block:: bash\n\n    task_name\n        |---- dataloaders\n              |---- dataloader.py\n              |---- dataset.py\n        |---- utils\n              |---- utils.py (if needed)\n        |---- model.py\n        |---- config.yaml\n        |---- README.txt\n\n.. note:: In case you need to import a module that has not been considered in FLUTE, this can be added in requirements.txt \n\n.. note:: All files must contain only absolute imports, in order to avoid issues when running.\n\nImplement new metrics\n--------------\n\nThe metrics computed during the evaluation rounds are declared inside `inference()` in the model declaration. FLUTE requires this function to return a dictionary with at least `output`, `acc` and `batch_size` as follows:\n\n    .. code:: bash\n        \n        { \"output\": loss, \"acc\": accuracy, \"batch_size\": batch_size}\n\nIn order to add a new metric, we just need to add the key inside the same dictionary with the following format:\n\n    .. code:: bash\n        \n        { \"output\": loss, \n          \"acc\": accuracy, \n          \"batch_size\": batch_size, \n          \"custom_metric_1\": {\"value\": value1 ,'higher_is_better': True},\n          \"custom_metric_2\": {\"value\": value2 ,'higher_is_better': False}}\n\nOnce the keys have been included in the returning dictionary from `inference()`, FLUTE will automatically recognize them during the test/val rounds.\n\n.. note:: Only the keys `output`, `acc` and `batch_size` does not require a dictionary. 
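\n\nTo make the required format concrete, here is a small standalone sketch; the helper name ``package_metrics`` is hypothetical and not part of FLUTE, it only illustrates the dictionary shape that ``inference()`` must return:

```python
def package_metrics(loss, accuracy, batch_size, **custom):
    """Build the metrics dictionary expected from inference().

    'output', 'acc' and 'batch_size' are stored as plain values; every
    extra keyword is treated as a custom metric and must be given as a
    (value, higher_is_better) pair.
    """
    metrics = {"output": loss, "acc": accuracy, "batch_size": batch_size}
    for name, (value, higher_is_better) in custom.items():
        metrics[name] = {"value": value, "higher_is_better": higher_is_better}
    return metrics

# One custom metric (f1_score) on top of the three mandatory keys
metrics = package_metrics(0.35, 0.91, 32, f1_score=(0.88, True))
```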
\n\nCreate the configuration file\n-----------------------------\n\nThe configuration file allows you to specify the setup of your experiment, such as the optimizer, learning rate, number of clients and so on. FLUTE requires the following 6 sections:\n\n    * model_config: path and parameters (if needed) to initialize the model.\n    * dp_config: differential privacy setup.\n    * privacy_metrics_config: whether to cache data to compute additional privacy metrics.\n    * strategy: defines the federated optimizer.\n    * server_config: determines all the server-side settings.\n    * client_config: dictates the learning parameters for client-side model updates.\n\nThe blob below indicates the basic parameters required by FLUTE to run an experiment:\n\n.. code:: yaml \n\n    model_config:\n        model_type: CNN                                    # Class name in model.py\n        model_folder: experiments/classif_cnn/model.py     # Relative path to the model declaration\n\n    dp_config:\n        enable_local_dp: false                             # DP disabled\n\n    privacy_metrics_config:\n        apply_metrics: false                               # Privacy metrics disabled\n\n    strategy: DGA                                          # Federated optimizer (DGA or FedAvg)\n\n    server_config:   \n        wantRL: false                                      # Whether to use RL-based meta-optimizers\n        resume_from_checkpoint: false                      # Restart from checkpoint if file exists\n        do_profiling: false                                # Run profiler and compute runtime metrics\n        optimizer_config:                                  # Optimizer used to update the global model\n            type: sgd\n            lr: 1.0\n        annealing_config:                                  # Annealer for the learning rate\n            type: step_lr\n            step_interval: epoch\n            gamma: 1.0\n            step_size: 100\n        val_freq: 50                                       # Validation round frequency\n        rec_freq: 100                                      # Testing round frequency\n        initial_val: true                                  # Enable initial validation round\n        initial_rec: true                                  # Enable initial testing round\n        max_iteration: 2000                                # Total number of training rounds\n        num_clients_per_iteration: 10                      # Clients per iteration\n        data_config:                                       # Information for the test/val dataloaders\n            val:\n                batch_size: 10000\n                val_data: test_data.hdf5                   # Assign to null for data loaded on-the-fly\n            test:\n                batch_size: 10000\n                test_data: test_data.hdf5                  # Assign to null for data loaded on-the-fly\n        type: model_optimization                           # Server type (model_optimization is the only one available for now)\n        aggregate_median: softmax                          # How aggregation weights are computed\n        initial_lr_client: 0.001                           # Learning rate used by the client optimizer\n        lr_decay_factor: 1.0                               # Decay factor for the LR\n        weight_train_loss: train_loss                      # Determines how each client's weight is computed (e.g. grad_mean_loss, train_loss)\n        best_model_criterion: f1_score                     # Metric used to select the best model for checkpointing\n        fall_back_to_best_model: false                     # If a model degrades, use the previous best model\n        softmax_beta: 1.0                                  # Beta value to use for the softmax DGA\n\n    client_config:\n        do_profiling: false                                # Run profiling and compute runtime metrics\n        ignore_subtask: false                              # Determines which model loss to use. In most cases just set to false.\n        data_config:                                       # Information for the train dataloader\n            train:\n                batch_size: 4\n                list_of_train_data: train_data.hdf5        # Assign to null for data loaded on-the-fly\n                desired_max_samples: 50000\n        optimizer_config:                                  # Optimizer used by the client\n            type: sgd\n            lr: 0.001                                      # This is overridden by `initial_lr_client`\n            momentum: 0.9\n        type: optimization                                 # The type of client (always set to \"optimization\" for now)\n\n.. note:: Documented templates for all the options available in the configuration files are provided inside the configs folder.\n"
  },
  {
    "path": "e2e_trainer.py",
    "content": "# Copyright (c) Microsoft Corporation.\n# Licensed under the MIT license.\n\n'''\nThis is the main script to run on each NCCL/GLOO thread. It will spawn either a\nServer or Worker object -- the former is responsible for orchestrating and\naggregating models, where as the latter processes clients' data to generate\na new model. The Server lives on the very first thread, whereas remaining\nthreads contain each a diferent Worker.\n'''\n\nimport argparse\nimport os\nimport shutil\nimport yaml\nimport logging\nfrom psutil import virtual_memory\n\nimport torch\nimport torch.distributed as dist\nfrom azureml.core import Run\n\nfrom core import federated\nfrom core.config import FLUTEConfig\nfrom core.server import select_server\nfrom core.client import Client\nfrom experiments import make_model\nfrom utils import (\n    make_optimizer,\n    init_logging,\n    print_rank,\n    find_pretrained_model\n)\nfrom utils.dataloaders_utils import (\n    make_train_dataloader,\n    get_dataset,\n)\nfrom core.evaluation import make_eval_clients\n\ndef log_run_properties(config: FLUTEConfig):\n    \"\"\"Log parameters on AzureML.\n    \n    Args:\n        config (dict): config containing parameters to log.\n    \"\"\"\n\n    properties = {}\n\n    # Build properties dictionary\n    mem = virtual_memory()\n    properties[\"System memory (GB)\"] = float(mem.total) / (1024**3)\n\n    props = [\n        (\"server_config.num_clients_per_iteration\", 0),\n        (\"server_config.max_iteration\", 0),\n        (\"dp_config.eps\", 0),\n        (\"dp_config.max_weight\", 0),\n        (\"dp_config.min_weight\", 0),\n        (\"server_config.optimizer_config.type\", \"sgd\"),\n        (\"server_config.optimizer_config.lr\", 1.0),\n        (\"server_config.optimizer_config.amsgrad\", False),\n        (\"server_config.annealing_config.type\", \"step_lr\"),\n        (\"server_config.annealing_config.step_interval\", \"epoch\"),\n        (\"server_config.annealing_config.gamma\", 
1.0),\n        (\"server_config.annealing_config.step_size\", 100),\n    ]\n\n    for (key, default) in props:\n        properties[key] = config.lookup(key, default)\n\n    # Log the properties dictionary into AzureML\n    run = Run.get_context()\n    for k in properties:\n        run.log(k, properties[k])\n\n\ndef run_worker(model_path, config, task, data_path, local_rank, backend):\n    \"\"\"Spawn worker object that lives throughout NCCL/GLOO thread.\n    \n    Args:\n        model_path (str): path to the pretrained model.\n        config (dict): dictionary containing parameters.\n        task (str): what task to solve, must be a folder of :code:`experiments`.\n        data_path (str): path to data.\n        local_rank (int): the rank of the NCCL/GLOO thread.\n    \"\"\"\n    model_config = config[\"model_config\"]\n    server_config = config[\"server_config\"]\n    client_config = config[\"client_config\"]\n\n    # Backend initialization\n    WORLD_RANK = federated.rank()\n    LOCAL_RANK = federated.local_rank()\n    print_rank(f\"Backend: {backend}\")\n    dist.init_process_group(backend=backend, init_method=None, rank=WORLD_RANK, world_size=federated.size())\n\n    # Assign NCCL thread to a specific GPU\n    if torch.cuda.is_available():\n        print_rank(f\"Assigning worker to GPU {LOCAL_RANK}\")\n        device = torch.device(\"cuda:{}\".format(LOCAL_RANK))\n        torch.cuda.set_device(device)\n\n    # Make the Model to distribute to workers\n    model = make_model(model_config)\n\n    # Get evaluation datasets\n    val_dataset = get_dataset(data_path, config, task, mode=\"val\", test_only=True)\n    test_dataset = get_dataset(data_path, config, task, mode=\"test\", test_only=True)\n    \n    # Create list of clients for test/val -- Server need the indexes and Worker the clients list\n    val_clients = list(make_eval_clients(val_dataset, config))\n    test_clients = list(make_eval_clients(test_dataset, config))\n\n    # pre-cache the training data and 
capture the number of clients for sampling\n    num_clients = Client.get_train_dataset(data_path, config, task)\n    config[\"server_config\"][\"data_config\"][\"num_clients\"] = num_clients\n\n    # Instantiate the Server object on the first thread\n    if WORLD_RANK == 0:\n\n        single_worker = None\n        if federated.size() == 1:\n            # For a single-GPU/CPU execution using NCCL, Server and Worker are instantiated in the same GPU.\n            single_worker = federated.Worker(model=model,\n                                        data_path=data_path,\n                                        do_profiling=client_config.get(\"do_profiling\", False),\n                                        val_clients=val_clients,\n                                        test_clients=test_clients,\n                                        val_dataset = val_dataset,\n                                        test_dataset = test_dataset,\n                                        config= config)\n            single_worker.run()\n        \n        try:\n            print_rank('Server data preparation')\n\n            if 'train' in config['server_config']['data_config']:\n                server_train_dataloader = make_train_dataloader(config['server_config']['data_config']['train'], data_path, task=task, clientx=None)\n            else:\n                server_train_dataloader = None\n\n            idx_val_clients = list(range(len(val_clients))) # Generates indexes for val clients\n            idx_test_clients = list(range(len(test_clients))) # Generates indexes for test clients\n\n            print_rank(\"Prepared the dataloaders\")\n\n            # Create the optimizer on the server\n            optimizer = make_optimizer(server_config[\"optimizer_config\"], model)\n\n            # Load a model that's already trained\n            best_trained_model = find_pretrained_model(model_path, model_config)\n            if best_trained_model is not None:\n                
model_state_dict = torch.load(best_trained_model,\n                    map_location=None if torch.cuda.is_available() else torch.device(\"cpu\"))\n                model.load_state_dict(model_state_dict)\n\n            server_type = server_config[\"type\"]\n            server_setup = select_server(server_type)  # Return the server class\n            server = server_setup(\n                num_clients=config['server_config']['data_config'][\"num_clients\"],\n                model=model,\n                optimizer=optimizer,\n                ss_scheduler=None,\n                data_path=data_path,\n                model_path=model_path,\n                server_train_dataloader=server_train_dataloader,\n                config=config,\n                idx_val_clients=idx_val_clients,\n                idx_test_clients=idx_test_clients,\n                single_worker=single_worker,\n            )\n            log_run_properties(config)\n\n        except Exception as e:\n            # Be sure the other workers are shut down.\n            server.terminate_workers()\n            raise e\n\n        print_rank(\"Launching server\")\n        server.run()\n\n    else:\n        # Instantiate client-processing Worker on remaining threads\n        print_rank(\"Worker on node {}: process started\".format(WORLD_RANK))\n        worker = federated.Worker(\n            model=model,\n            data_path=data_path,\n            do_profiling=client_config.get(\"do_profiling\", False),\n            val_clients=val_clients,\n            test_clients=test_clients,\n            val_dataset = val_dataset,\n            test_dataset = test_dataset,\n            config= config,\n        )\n        worker.run()\n\n\nif __name__ == \"__main__\":\n    # Parse command-line arguments\n    parser = argparse.ArgumentParser()\n    parser.add_argument(\"-config\")\n    parser.add_argument(\"-outputPath\")\n    parser.add_argument(\"-dataPath\", default=None)\n    parser.add_argument(\"-task\", 
default=None, help=\"Define the task for the run\")\n    parser.add_argument(\"-backend\", default=None, help=\"Define the communication protocol\")\n    parser.add_argument(\"-num_skip_decoding\", default=-1, type=int, help=\"Skip decoding in unsupervised learning mode\")\n    parser.add_argument(\"--local_rank\", default=-1, type=int)\n\n    args = parser.parse_args()\n    data_path = args.dataPath\n    task = args.task\n    local_rank = args.local_rank\n    assert args.backend in ['nccl','gloo'], f\"Backend {args.backend} not recognized, please select nccl or gloo\"\n    backend = args.backend\n\n    # The mount point can also be retrieved from input_datasets of the run context\n    if data_path is None:\n        data_path = Run.get_context().input_datasets[\"input\"]\n    print(\"The data can be found here: \", data_path)\n\n    # Update the model path for the sake of AzureML\n    id = Run.get_context().id\n    experiment_name = \"-\".join(id.split(\"-\")[-4:-2])\n    experiment_root = os.path.join(args.outputPath, experiment_name)\n    os.makedirs(experiment_root, exist_ok=True)\n    model_path = os.path.join(experiment_root, \"models\")\n    log_path = os.path.join(experiment_root, \"log\")\n\n    os.makedirs(model_path, exist_ok=True)\n    os.makedirs(log_path, exist_ok=True)\n\n    # Make a copy of the config file into the output folder, for future reference\n    cfg_out = os.path.join(experiment_root, \"FLUTE_config.yaml\")\n    if local_rank <= 0:\n        shutil.copyfile(args.config, cfg_out)\n    \n    # Initialize logging\n    init_logging(log_path, loglevel=logging.INFO)\n\n    with open(args.config) as f:\n\n        cfg_dict = yaml.safe_load(f)\n        config = FLUTEConfig.from_dict(cfg_dict)\n        config[\"data_path\"] = data_path\n        config[\"output_path\"] = args.outputPath\n        config[\"model_path\"]= model_path\n        config[\"experiment_name\"] = experiment_name\n        config[\"client_config\"][\"task\"] = task\n        
config[\"server_config\"][\"task\"] = task\n        config.validate()\n\n        # Instantiate either Server or Worker on the thread\n        run_worker(model_path, config, task, data_path, local_rank, backend)\n"
  },
  {
    "path": "experiments/__init__.py",
    "content": "# Copyright (c) Microsoft Corporation.\n# Licensed under the MIT license.\n\nimport torch\nfrom utils import print_rank, print_cuda_stats, to_device\nfrom importlib.machinery import SourceFileLoader\n\ndef make_model(model_config, dataloader_type=None, input_dim=-1, output_dim=-1):\n    print('Preparing model .. Initializing')\n    \n    try:\n        dir = \"./\"+ str(model_config[\"model_folder\"])\n        model_class = model_config[\"model_type\"]\n        loader = SourceFileLoader(model_class,dir).load_module()\n        model_type = getattr(loader,model_class )\n    except:\n        raise ValueError(\"{} model not found, make sure to indicate the model path in the .yaml file\".format(model_config[\"type\"]))\n\n    model = model_type(model_config)\n    print(model)\n\n    if not \"weight_init\" in model_config or model_config[\"weight_init\"] == \"default\":\n        print_rank(\"initialize model with default settings\")\n        pass\n    elif model_config[\"weight_init\"] == \"xavier_normal\":\n        print_rank(\"initialize model with xavier_normal\")\n        for p in model.parameters():\n            if p.dim() > 1: # weight\n                torch.nn.init.xavier_normal_(p.data)\n            elif p.dim() == 1: # bias\n                p.data.zero_()\n        for m in model.modules():\n            if isinstance(m, (torch.nn.Embedding, torch.nn.LayerNorm, torch.nn.BatchNorm2d)):\n                m.reset_parameters()\n    else:\n        return ValueError(\"{} not supported\".format(model_config[\"weight_init\"]))\n\n    print_rank(\"trying to move the model to GPU\")\n    model = to_device(model)\n    print_rank(\"model: {}\".format(model))\n    print_cuda_stats()\n\n    return model\n"
  },
  {
    "path": "experiments/classif_cnn/.gitignore",
    "content": "utils/data\n*.hdf5\n*.json"
  },
  {
    "path": "experiments/classif_cnn/README.md",
    "content": "# Simple example of a CNN on CIFAR-10\n\nOur objective here is to bring a simple experiment from the Pytorch tutorials,\nmore specifically the one in https://github.com/pytorch/tutorials/blob/master/beginner_source/blitz/cifar10_tutorial.py,\nand convert it to FLUTE. Instructions on how to do this are given below.\n\nAn adapted version of the tutorial above is provided in the\n`utils/centralized_training.py` script.\n\n## Preparing the data\n\nIn this experiment we are making use of the CIFAR10 Dataset from torchvision, \ninitializated in `dataloaders/cifar_dataset.py`, which inhereits from the\nFLUTE base dataset class `core/dataset.py`\n\n## Specifying the model\n\nNext, we prepare the model. The `model.py` file contains two classes: one is the\n`Net` class already contained in the original script, and the other, a class\ncalled `CNN` which effectively wraps `Net`. Importantly, the `CNN` class defines\ntwo methods: `loss` and `inference`; both perform forward steps and then perform\nadditional computations, in particular, the former executes the loss' evaluation,\nand the latter the metrics' computation. The format of the inputs and outputs\nshould be the same as in this example.\n\n## Specifying dataset and dataloaders\n\nInside the `dataloaders` folder, there are two files: `dataset.py` and\n`dataloader.py`. Both inherit from the base classes declared in `core`\nfolder, that under the hood inhereit from Pytorch classes with same name.\n\nThe dataset should be able to access all the data, and store it in the\nattributes `user_list`, `user_data`, `user_data_labels` and `num_samples` (user\nnames, user features, user labels if the problem is supervised, and number of\nsamples for each user, respectively). These attributes are required to have\nthese exact names. 
Otherwise, it should also be able to access the examples of a\nspecific user, which id is passed during initialization via the `user_idx`\nargument.\n\nThe dataloader is simpler, and essentially just instantiates the dataset and\ncreates batches with a specific format.\n\n## Creating a config file\n\nAll the parameters of the experiment are passed in a YAML file. A documented\nexample is provided in `config.yaml`.\n\n## Running the experiment\n\nFinally, to launch the experiment, it suffices to launch the `e2e_trainer.py`\nscript using torch.distributed.\n\n```\npython -m torch.distributed.run --nproc_per_node=4 e2e_trainer.py -dataPath experiments/classif_cnn/utils/data -outputPath scratch -config experiments/classif_cnn/config.yaml -task classif_cnn -backend gloo\n```\n\nThe `dataPath`, `outputPath` and `config` arguments should just specify the\nrespective files or folders, as in the example above -- in this case, a folder\ncalled `scratch` will be created containing logs and checkpoints. The task\nshould be the name of the folder insider `experiments`.\n\nFollowing what is specified in the config file, the experiment will run for\n2000 rounds, and during each of them 10 clients will be selected at random,\neach of whom has 50 samples. It is more or less the same, then, as the 2\nepochs in the centralized training, except that clients are selected at\nrandom so we might not see all of them."
  },
  {
    "path": "experiments/classif_cnn/config.yaml",
    "content": "# Basic configuration file for running classif_cnn example using torchvision CIFAR10 dataset.\n# Parameters needed to initialize the model\nmodel_config:\n    model_type: CNN                                    # class w/ `loss` and `inference` methods\n    model_folder: experiments/classif_cnn/model.py     # file containing class\n\n# Configuration for differential privacy\ndp_config:\n    enable_local_dp: false                             # whether to enable user-level DP\n\n# Additional privacy metrics\nprivacy_metrics_config:\n    apply_metrics: false                               # cache data to compute additional metrics\n\n# Select the Federated optimizer to use (e.g. DGA, FedAvg or FedProx)\nstrategy: DGA\n\n# Determines all the server-side settings for training and evaluation rounds\nserver_config:   \n    wantRL: false                                      # whether to use RL-based meta-optimizers\n    resume_from_checkpoint: false                      # restart from checkpoint if file exists\n    do_profiling: false                                # run profiler and compute runtime metrics\n    optimizer_config:                                  # this is the optimizer used to update the model\n        type: sgd\n        lr: 1.0\n    annealing_config:                                  # annealer for the learning rate\n        type: step_lr\n        step_interval: epoch\n        gamma: 1.0\n        step_size: 100\n    val_freq: 50                                       # how many iterations between metric eval on val set\n    rec_freq: 100                                      # how many iterations between metric eval on test set\n    initial_val: true\n    initial_rec: true\n    max_iteration: 2000                                # how many iterations in total\n    num_clients_per_iteration: 10                      # how many clients per iteration\n    data_config:                                       # where to get val and test data from\n      
  val:\n            batch_size: 10000\n            val_data: null                             # Assigned to null because dataset is being instantiated\n        test:\n            batch_size: 10000\n            test_data: null                            # Assigned to null because dataset is being instantiated\n    type: model_optimization\n    aggregate_median: softmax                          # how aggregation weights are computed\n    initial_lr_client: 0.001                           # learning rate used on client optimizer\n    lr_decay_factor: 1.0\n    weight_train_loss: train_loss\n    best_model_criterion: f1_score\n    fall_back_to_best_model: false\n    softmax_beta: 1.0\n\n# Dictates the learning parameters for client-side model updates. Train data is defined inside this config.\nclient_config:\n    do_profiling: false                                # run profiling and compute runtime metrics\n    ignore_subtask: false\n    data_config:                                       # where to get training data from\n        train:\n            batch_size: 4\n            list_of_train_data: null                   # Assigned to null because dataset is being instantiated\n            desired_max_samples: 50000\n    optimizer_config:                                  # this is the optimizer used by the client\n        type: sgd\n        lr: 0.001                                      # this is overridden by `initial_lr_client`\n        momentum: 0.9\n    type: optimization"
  },
  {
    "path": "experiments/classif_cnn/dataloaders/cifar_dataset.py",
    "content": "# Copyright (c) Microsoft Corporation.\n# Licensed under the MIT license.\nimport time\nimport torchvision\nimport torchvision.transforms as transforms\n\nclass CIFAR10:\n    def __init__(self) :\n        # Get training and testing data from torchvision\n        transform = transforms.Compose([\n            transforms.ToTensor(),\n            transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),\n        ])\n\n        trainset = torchvision.datasets.CIFAR10(root='./data', train=True,\n                download=True, transform=transform)\n        testset = torchvision.datasets.CIFAR10(root='./data', train=False,\n                download=True, transform=transform)\n\n        print('Processing training set...')\n        self.trainset=_process(trainset, n_users=1000)\n\n        print('Processing test set...')\n        self.testset=_process(testset, n_users=200)\n\ndef _process(dataset, n_users):\n    '''Process a Torchvision dataset to expected format and save to disk'''\n\n    # Split training data equally among all users\n    total_samples = len(dataset)\n    samples_per_user = total_samples // n_users\n    assert total_samples % n_users == 0\n\n    # Function for getting a given user's data indices\n    user_idxs = lambda user_id: slice(user_id * samples_per_user, (user_id + 1) * samples_per_user)\n\n    # Convert training data to expected format\n    print('Converting data to expected format...')\n    start_time = time.time()\n\n    data_dict = {  # the data is expected to have this format\n        'users' : [f'{user_id:04d}' for user_id in range(n_users)],\n        'num_samples' : 10000 * [samples_per_user],\n        'user_data' : {f'{user_id:04d}': dataset.data[user_idxs(user_id)].tolist() for user_id in range(n_users)},\n        'user_data_label': {f'{user_id:04d}': dataset.targets[user_idxs(user_id)] for user_id in range(n_users)},\n    }\n\n    print(f'Finished converting data in {time.time() - start_time:.2f}s.')\n\n    return data_dict\n\n"
  },
  {
    "path": "experiments/classif_cnn/dataloaders/dataloader.py",
    "content": "# Copyright (c) Microsoft Corporation.\n# Licensed under the MIT license.\n\nimport torch\n\nfrom core.dataloader import BaseDataLoader\nfrom experiments.classif_cnn.dataloaders.dataset import Dataset\n\nclass DataLoader(BaseDataLoader):\n    def __init__(self, mode, num_workers=0, **kwargs):\n        args = kwargs['args']\n        self.batch_size = args['batch_size']\n\n        dataset = Dataset(\n            data=kwargs['data'],\n            test_only=(not mode=='train'),\n            user_idx=kwargs.get('user_idx', None),\n        )\n\n        super().__init__(\n            dataset,\n            batch_size=self.batch_size,\n            shuffle=(mode=='train'),\n            num_workers=num_workers,\n            collate_fn=self.collate_fn,\n        )\n\n    def collate_fn(self, batch):\n        x, y = list(zip(*batch))\n        return {'x': torch.tensor(x), 'y': torch.tensor(y)}"
  },
  {
    "path": "experiments/classif_cnn/dataloaders/dataset.py",
    "content": "# Copyright (c) Microsoft Corporation.\n# Licensed under the MIT license.\n\nimport numpy as np\nfrom core.dataset import BaseDataset\nfrom experiments.classif_cnn.dataloaders.cifar_dataset import CIFAR10\n\nclass Dataset(BaseDataset):\n    def __init__(self, data, test_only=False, user_idx=0, **kwargs):\n        self.test_only = test_only\n        self.user_idx = user_idx\n\n        # Get all data\n        self.user_list, self.user_data, self.user_data_label, self.num_samples = self.load_data(data, self.test_only)\n\n        if self.test_only:  # combine all data into single array\n            self.user = 'test_only'\n            self.features = np.vstack([user_data for user_data in self.user_data.values()])\n            self.labels = np.hstack([user_label for user_label in self.user_data_label.values()])\n        else:  # get a single user's data\n            if user_idx is None:\n                raise ValueError('in train mode, user_idx must be specified')\n\n            self.user = self.user_list[user_idx]\n            self.features = self.user_data[self.user]\n            self.labels = self.user_data_label[self.user]\n\n    def __getitem__(self, idx):\n        return np.array(self.features[idx]).astype(np.float32).T, self.labels[idx]\n\n    def __len__(self):\n        return len(self.features)\n\n    def load_data(self, data, test_only):\n        '''Wrapper method to read/instantiate the dataset'''\n\n        if data == None:\n            dataset = CIFAR10()\n            data = dataset.testset if test_only else dataset.trainset\n        \n        users = data['users']\n        features = data['user_data']\n        labels = data['user_data_label']\n        num_samples = data['num_samples']\n            \n        return users, features, labels, num_samples"
  },
  {
    "path": "experiments/classif_cnn/model.py",
    "content": "# Copyright (c) Microsoft Corporation.\n# Licensed under the MIT license.\n\nimport torch\nfrom torch import nn\nfrom torch.nn import functional as F\nfrom sklearn.metrics import f1_score\n\nfrom core.model import BaseModel\n\nclass Net(nn.Module):\n    '''The standard PyTorch model we want to federate'''\n\n    def __init__(self):\n        super().__init__()\n        self.conv1 = nn.Conv2d(3, 6, 5)\n        self.pool = nn.MaxPool2d(2, 2)\n        self.conv2 = nn.Conv2d(6, 16, 5)\n        self.fc1 = nn.Linear(16 * 5 * 5, 120)\n        self.fc2 = nn.Linear(120, 84)\n        self.fc3 = nn.Linear(84, 10)\n\n    def forward(self, x):\n        x = self.pool(F.relu(self.conv1(x)))\n        x = self.pool(F.relu(self.conv2(x)))\n        x = torch.flatten(x, 1)  # flatten all dimensions except batch\n        x = F.relu(self.fc1(x))\n        x = F.relu(self.fc2(x))\n        x = self.fc3(x)\n        return x\n\n\nclass CNN(BaseModel):\n    '''This is a PyTorch model with some extra methods'''\n\n    def __init__(self, model_config):\n        super().__init__()\n        self.net = Net()\n\n    def loss(self, input: torch.Tensor) -> torch.Tensor:\n        '''Performs forward step and computes the loss'''\n        device = 'cuda' if torch.cuda.is_available() else 'cpu'\n        features, labels = input['x'].to(device), input['y'].to(device)\n        output = self.net.forward(features)\n        return F.cross_entropy(output, labels.long())\n\n    def inference(self, input):\n        '''Performs forward step and computes metrics'''\n        device = 'cuda' if torch.cuda.is_available() else 'cpu'\n        features, labels = input['x'].to(device), input['y'].to(device)\n        output = self.net.forward(features)\n\n        n_samples = features.shape[0]\n        accuracy = torch.mean((torch.argmax(output, dim=1) == labels).float()).item()\n        f1 = f1_score(labels.cpu(), torch.argmax(output, dim=1).cpu(), average='micro')\n\n        # NOTE: Only the keys 
'output', 'acc' and 'batch_size' do not require the\n        # extra fields 'value' and 'higher_is_better'; FLUTE requires that\n        # format only for customized metrics.\n\n        return {'output': output, 'acc': accuracy, 'batch_size': n_samples,\n                'f1_score': {'value': f1, 'higher_is_better': True}}\n"
  },
  {
    "path": "experiments/classif_cnn/utils/centralized_training.py",
    "content": "# Copyright (c) Microsoft Corporation.\n# Licensed under the MIT license.\n\n'''Simple example of a CNN on CIFAR-10\n\nThis is adapted from the Pytorch tutorials. See\nhttps://github.com/pytorch/tutorials/blob/master/beginner_source/blitz/cifar10_tutorial.py\nfor more info.\n'''\n\nimport torch\nimport torchvision\nimport torchvision.transforms as transforms\nimport torch.nn as nn\nimport torch.nn.functional as F\nimport torch.optim as optim\n\n\n# Parameters\nBATCH_SIZE = 4\nN_EPOCHS = 2\n\ndevice = torch.device(\"cuda:0\" if torch.cuda.is_available() else \"cpu\")\n\n# Create dataloaders\ntransform = transforms.Compose([\n    transforms.ToTensor(),\n    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),\n])\n\ntrainset = torchvision.datasets.CIFAR10(root='./data', train=True,\n        download=True, transform=transform)\ntrainloader = torch.utils.data.DataLoader(trainset, batch_size=BATCH_SIZE,\n        shuffle=True, num_workers=2)\n\ntestset = torchvision.datasets.CIFAR10(root='./data', train=False,\n        download=True, transform=transform)\ntestloader = torch.utils.data.DataLoader(testset, batch_size=BATCH_SIZE,\n        shuffle=False, num_workers=2)\n\n\n# Define the model\nclass Net(nn.Module):\n    def __init__(self):\n        super().__init__()\n        self.conv1 = nn.Conv2d(3, 6, 5)\n        self.pool = nn.MaxPool2d(2, 2)\n        self.conv2 = nn.Conv2d(6, 16, 5)\n        self.fc1 = nn.Linear(16 * 5 * 5, 120)\n        self.fc2 = nn.Linear(120, 84)\n        self.fc3 = nn.Linear(84, 10)\n\n    def forward(self, x):\n        x = self.pool(F.relu(self.conv1(x)))\n        x = self.pool(F.relu(self.conv2(x)))\n        x = torch.flatten(x, 1) # flatten all dimensions except batch\n        x = F.relu(self.fc1(x))\n        x = F.relu(self.fc2(x))\n        x = self.fc3(x)\n        return x\n\n\n# Instantiate model, loss and optimizer\nnet = Net().to(device)\ncriterion = nn.CrossEntropyLoss()\noptimizer = optim.SGD(net.parameters(), lr=0.001, 
momentum=0.9)\n\n# Training loop\nfor epoch in range(N_EPOCHS):  # loop over the dataset multiple times\n    running_loss = 0.0\n    for i, data in enumerate(trainloader, 0):\n        # Get the inputs; data is a list of [inputs, labels]\n        inputs, labels = data[0].to(device), data[1].to(device)\n\n        # Zero the parameter gradients\n        optimizer.zero_grad()\n\n        # Forward + backward + optimize\n        outputs = net(inputs)\n        loss = criterion(outputs, labels)\n        loss.backward()\n        optimizer.step()\n\n        # Print statistics\n        running_loss += loss.item()\n        if i % 2000 == 1999:    # print every 2000 mini-batches\n            print('[%d, %5d] loss: %.3f' %\n                  (epoch + 1, i + 1, running_loss / 2000))\n            running_loss = 0.0\n\n# Compute accuracy\ncorrect = 0\ntotal = 0\nwith torch.no_grad():\n    for data in testloader:\n        images, labels = data[0].to(device), data[1].to(device)\n        outputs = net(images)\n        _, predicted = torch.max(outputs.data, 1)\n        total += labels.size(0)\n        correct += (predicted == labels).sum().item()\n\nprint('Accuracy of the network on the 10000 test images: %d %%' % (\n    100 * correct / total))"
  },
  {
    "path": "experiments/classif_cnn/utils/download_and_convert_data.py",
    "content": "import h5py\nimport json\nimport time\n\nimport torchvision\nimport torchvision.transforms as transforms\nimport tqdm\n\n\ndef _dump_dict_to_hdf5(data_dict: dict, hdf5_file: h5py.File):\n    '''Dump dict with expected structure to HDF5 file'''\n\n    hdf5_file.create_dataset('users', data=data_dict['users'])\n    hdf5_file.create_dataset('num_samples', data=data_dict['num_samples'])\n\n    # Store actual data in groups\n    user_data_group = hdf5_file.create_group('user_data')\n    for user, user_data in tqdm.tqdm(data_dict['user_data'].items()):\n        user_subgroup = user_data_group.create_group(user)\n        user_subgroup.create_dataset('x', data=user_data) \n\n    user_data_label_group = hdf5_file.create_group('user_data_label')\n    for user, user_data_label in tqdm.tqdm(data_dict['user_data_label'].items()):\n        user_data_label_group.create_dataset(user, data=user_data_label) \n\ndef _process_and_save_to_disk(dataset, n_users, file_format, output):\n    '''Process a Torchvision dataset to expected format and save to disk'''\n\n    # Split training data equally among all users\n    total_samples = len(dataset)\n    samples_per_user = total_samples // n_users\n    assert total_samples % n_users == 0\n\n    # Function for getting a given user's data indices\n    user_idxs = lambda user_id: slice(user_id * samples_per_user, (user_id + 1) * samples_per_user)\n\n    # Convert training data to expected format\n    print('Converting data to expected format...')\n    start_time = time.time()\n\n    data_dict = {  # the data is expected to have this format\n        'users' : [f'{user_id:04d}' for user_id in range(n_users)],\n        'num_samples' : 10000 * [samples_per_user],\n        'user_data' : {f'{user_id:04d}': dataset.data[user_idxs(user_id)].tolist() for user_id in range(n_users)},\n        'user_data_label': {f'{user_id:04d}': dataset.targets[user_idxs(user_id)] for user_id in range(n_users)},\n    }\n\n    print(f'Finished converting 
data in {time.time() - start_time:.2f}s.')\n\n    # Save training data to disk\n    print('Saving data to disk...')\n    start_time = time.time()\n\n    if file_format == 'json':\n        with open(output + '.json', 'w') as json_file:\n            json.dump(data_dict, json_file)\n    elif file_format == 'hdf5':\n        with h5py.File(output + '.hdf5', 'w') as hdf5_file:\n            _dump_dict_to_hdf5(data_dict=data_dict, hdf5_file=hdf5_file)\n    else:\n        raise ValueError('unknown format.')\n\n    print(f'Finished saving data in {time.time() - start_time:.2f}s.')\n\n\n# Get training and testing data from torchvision\ntransform = transforms.Compose([\n    transforms.ToTensor(),\n    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),\n])\n\ntrainset = torchvision.datasets.CIFAR10(root='./data', train=True,\n        download=True, transform=transform)\ntestset = torchvision.datasets.CIFAR10(root='./data', train=False,\n        download=True, transform=transform)\n\nprint('Processing training set...')\n_process_and_save_to_disk(trainset, n_users=1000, file_format='hdf5', output='./data/train_data')\n\nprint('Processing test set...')\n_process_and_save_to_disk(testset, n_users=200, file_format='hdf5', output='./data/test_data')"
  },
  {
    "path": "experiments/cv/README.md",
    "content": "# Simple example of ResNet model using personalization\n\nOur objective here is to bring a simple experiment of Computer Vision task,\nand convert it to FLUTE using the personalization feature. Instructions on \nhow to do this are given below.\n\n## Preparing the data\n\nIn this experiment we are making use of the CIFAR10 Dataset from torchvision, \ninitializated in `data.py`, which is wrapped by FLUTE Base Dataset.\n\n## Specifying the model\n\nNext, we prepare the model. The `model.py` file contains different classes than\ncan be used for this experiment. However, for this example we are using the\n`ResNet` class . Importantly, the `ResNet` class inheeits from `Base Model` \ndeclared in `core/model.py` and defines two methods: `loss` and `inference`; \nboth perform forward steps and then perform additional computations, in particular, \nthe former executes the loss' evaluation, and the latter the metrics' computation. \nThe format of the inputs and outputs should be the same as in this example.\n\n## Specifying dataset and dataloaders\n\nInside the `dataloaders` folder, there are two files: `dataset.py` and\n`dataloader.py`. Both inherit from the base classes declared in `core`\nfolder, that under the hood inhereit from Pytorch classes with same name.\n\nThe dataset should be able to access all the data, and store it in the\nattributes `user_list`, `user_data`, `user_data_labels` and `num_samples` (user\nnames, user features, user labels if the problem is supervised, and number of\nsamples for each user, respectively). These attributes are required to have\nthese exact names. Otherwise, it should also be able to access the examples of a\nspecific user, which id is passed during initialization via the `user_idx`\nargument.\n\nThe dataloader is simpler, and essentially just instantiates the dataset and\ncreates batches with a specific format.\n\n## Creating a config file\n\nAll the parameters of the experiment are passed in a YAML file. 
A documented\nexample is provided in `config.yaml`.\n\n## Running the experiment\n\nFinally, to launch the experiment, it suffices to launch the `e2e_trainer.py`\nscript using torch.distributed.\n\n```\npython -m torch.distributed.run --nproc_per_node=4 e2e_trainer.py -dataPath ./ -outputPath scratch -config experiments/classif_cnn/config.yaml -task cv -backend gloo\n```\n\nThe `dataPath`, `outputPath` and `config` arguments should just specify the\nrespective files or folders, as in the example above -- in this case, `dataPath` \ncan be any path given that data is being downloaded on-the.fly. A folder\ncalled `scratch` will be created containing logs and checkpoints. The task\nshould be the name of the folder insider `experiments`.\n"
  },
  {
    "path": "experiments/cv/config.yaml",
    "content": "model_config:\n    model_type: resnet50 #vgg11                                  # class w/ `loss` and `inference` methods\n    model_folder: experiments/cv/model.py              # file containing class\n    num_classes: 10\n\ndp_config:\n    enable_local_dp: false                             # whether to enable user-level DP\n\nprivacy_metrics_config:\n    apply_metrics: false                               # cache data to compute additional metrics\n\nstrategy: DGA                                          # Select the Federated optimizer to use (e.g. DGA, FedAvg or FedProx)\n\nserver_config:\n    wantRL: false                                      # whether to use RL-based meta-optimizers\n    resume_from_checkpoint: false                      # restart from checkpoint if file exists\n    do_profiling: false                                # run profiler and compute runtime metrics\n    save_to_disk: false                                # save the updated dataset in disk\n    optimizer_config:                                  # this is the optimizer used to update the model\n        type: adam\n        lr: 0.001\n    annealing_config:                                  # annealer for the learning rate\n        type: step_lr\n        step_interval: epoch\n        gamma: 1.00\n        step_size: 100\n    val_freq: 1000                                       # how many iterations between metric eval on val set\n    rec_freq: 5                                       # how many iterations between metric eval on test set\n    initial_val: False\n    initial_rec: True\n    max_iteration: 1000                                # how many iterations in total\n    num_clients_per_iteration: 10                      # how many clients per iteration\n    total_num_clients: 100\n    data_config:                                       # where to get val and test data from\n        val:\n            batch_size: 128\n            val_data: null\n        test:\n            
batch_size: 128\n            test_data: null\n    type: personalization                              # Options: personalization | model_optimization\n    aggregate_median: softmax                          # how aggregation weights are computed\n    softmax_beta: 20.0\n    initial_lr_client: 1.0                             # learning rate used on client optimizer\n    lr_decay_factor: 1.0\n    weight_train_loss: train_loss\n    best_model_criterion: loss\n    fall_back_to_best_model: false\n\nclient_config:\n    do_profiling: false                                # run profiling and compute runtime metrics\n    ignore_subtask: false\n    convex_model_interp: 0.75                          # This is specific to personalization server/client\n    data_config:                                       # where to get training data from\n        train:\n            batch_size: 128\n            list_of_train_data: null\n            desired_max_samples: 50000\n    optimizer_config:                                  # this is the optimizer used by the client\n        type: sgd\n        lr: 0.001                                      # this is overridden by `initial_lr_client`\n    type: optimization"
  },
  {
    "path": "experiments/cv/data.py",
    "content": "# Copyright (c) Microsoft Corporation.\n# Licensed under the MIT license.\n\nimport logging\nimport h5py\nimport json\nimport os\n\nimport torchvision\nfrom torchvision import transforms\nimport numpy as np\nfrom numpy.random import RandomState\n\nfrom utils import print_rank\n\nclass DataPartitioner(object):\n    \"\"\" Partitions a dataset into different chunks. \"\"\"\n\n    def __init__(self, data, sizes=None, rnd=0, alpha=0, num_c=10,\n                 dataset=None, lab_distr=None, ratio=1, img_size=32, wantTrans=True):\n        self.data = data\n        self.dataset  = dataset\n        self.total_num= len(sizes) if sizes is not None else len(lab_distr)\n        self.img_size= img_size\n        self.wantTrans= wantTrans\n\n        if lab_distr is not None:\n            self.partitions, self.dat_stat = self.__use_fixed_lab_distr__(data, lab_distr,\n                                                                           ratio, rnd, num_c)\n        else:\n            self.partitions, self.ratio, self.dat_stat, self.endat_size = self.__getDirichletData__(data, sizes,\n                                                                                                    alpha, num_c, rnd)\n\n\n    def get_lab_distr(self):\n        return self.dat_stat\n\n\n    def return_partition(self, partition, flag='data', is_train_set=True):\n\n        if flag != 'data':\n            return [self.data[idx][1] for idx in self.partitions[partition]]\n        mean = [x / 255 for x in [125.3, 123.0, 113.9]]\n        std = [x / 255 for x in [63.0, 62.1, 66.7]]\n\n        if self.wantTrans:\n            dc = {'resize': 0.5 if is_train_set else None,\n                  'pad': None,\n                  'crop': None,\n                  'flip': False,\n                  'rotate': (-180+2*int(partition*180/self.total_num), -180+2*int((partition+1)*180/self.total_num)) if is_train_set else \\\n                            (-180+2*int(partition*180/self.total_num)+2, 
-180+2*int(partition*180/self.total_num)+2),\n                  'normalize': [mean, std]}\n        else:\n            dc = {'resize': None,\n                  'pad': None,\n                  'crop': None,\n                  'flip': False,\n                  'rotate': None,\n                  'normalize': [mean, std]}\n\n        transform = get_transform(transform=dc,img_size=self.img_size)\n\n        return {'x': [transform(self.data[idx][0]).tolist() for idx in self.partitions[partition]]}\n\n\n    def __use_fixed_lab_distr__(self, data, lab_distr, ratio, rnd, num_c):\n        n_nets = []\n        idx_batch = []\n        labelList = np.array(data.targets)\n        rann = RandomState(rnd)\n\n        # Find where all labels are\n        label_dict={lab: np.where(labelList == lab)[0] for lab in range(num_c)}\n\n        # Process the prefixed label distributions one by one\n        for lab_indices in list(lab_distr.keys())[:-1]:\n            net_dataidx_map = {}\n\n            for lab, num in lab_distr[lab_indices].items():\n                len_k = len(label_dict[lab])\n                idx_k = label_dict[lab][:min(int(num*ratio), len_k)]\n                label_dict[lab] = label_dict[lab][min(int(num*ratio), len_k):]\n                if len(idx_k)>0:\n                    net_dataidx_map[lab] = list(idx_k)\n            n_nets.append(net_dataidx_map)\n\n        net_dataidx_map = {}\n        for lab, idx_k in label_dict.items():\n            if len(idx_k)>0:\n                net_dataidx_map[lab] = idx_k\n        n_nets.append(net_dataidx_map)\n\n        for i, lab_indices in enumerate(n_nets):\n            idx_batch.append([item for sublist in lab_indices.values() for item in sublist])\n\n        net_cls_counts = {}\n        for net_i, dataidx in enumerate(idx_batch):\n            unq, unq_cnt = np.unique(labelList[dataidx], return_counts=True)\n            tmp = {unq[i]: unq_cnt[i] for i in range(len(unq))}\n            net_cls_counts[net_i] = tmp\n\n        
print_rank('Data statistics: %s' % str(net_cls_counts), loglevel=logging.DEBUG)\n\n        return idx_batch, net_cls_counts\n\n\n    # Adapted from FedML -- 02-17-22\n    def __getDirichletData__(self, data, psizes, alpha, num_c, rnd):\n        n_nets = len(psizes)\n        K = num_c\n        labelList = np.array(data.targets)\n        min_size = 0\n        N = len(labelList)\n        rann = RandomState(rnd)\n\n        net_dataidx_map = {}\n        while min_size < K:\n            idx_batch = [[] for _ in range(n_nets)]\n            # for each class in the dataset\n            for k in range(K):\n                idx_k = np.where(labelList == k)[0]\n                rann.shuffle(idx_k)\n                proportions = rann.dirichlet(np.repeat(alpha, n_nets))\n                # balance: zero out clients that already hold their fair share of samples\n                proportions = np.array([p * (len(idx_j) < N / n_nets) for p, idx_j in zip(proportions, idx_batch)])\n                proportions = proportions / proportions.sum()\n                proportions = (np.cumsum(proportions) * len(idx_k)).astype(int)[:-1]\n                idx_batch = [idx_j + idx.tolist() for idx_j, idx in zip(idx_batch, np.split(idx_k, proportions))]\n                min_size = min([len(idx_j) for idx_j in idx_batch])\n\n        for j in range(n_nets):\n            rann.shuffle(idx_batch[j])\n            net_dataidx_map[j] = idx_batch[j]\n\n        net_cls_counts = {}\n        for net_i, dataidx in net_dataidx_map.items():\n            unq, unq_cnt = np.unique(labelList[dataidx], return_counts=True)\n            net_cls_counts[net_i] = {unq[i]: unq_cnt[i] for i in range(len(unq))}\n\n        local_sizes = np.array([len(net_dataidx_map[i]) for i in range(n_nets)])\n        weights = local_sizes / np.sum(local_sizes)\n\n        print_rank('Data statistics: %s' % str(net_cls_counts), loglevel=logging.DEBUG)\n        print_rank('Data ratio: %s' % str(weights), loglevel=logging.DEBUG)\n\n        return idx_batch, weights, net_cls_counts, np.sum(local_sizes)\n\n\ndef partition_dataset(rnd, img_size, image, total_num_clients, image_path, alpha, wantTransform):\n\n    partition_sizes = [1.0 / total_num_clients for _ in range(total_num_clients)]\n\n    if image == 'cifar':\n        trainset = torchvision.datasets.CIFAR10(\n                                            root=os.path.join(image_path, image),\n                                            train=True,\n                                            download=True,\n                                            transform=None)\n        train_partition = DataPartitioner(trainset, partition_sizes, rnd,\n                                            alpha=alpha,\n                                            num_c=10,\n                                            img_size=img_size,\n                                            wantTrans=wantTransform)\n\n        testset = torchvision.datasets.CIFAR10(\n                                            root=os.path.join(image_path, image),\n                                            train=False,\n                                            download=True,\n                                            transform=None)\n\n        # Alternatively, the test set can be split to follow the training label\n        # distribution via train_partition.get_lab_distr() with ratio=0.2\n        test_partition = DataPartitioner(testset, partition_sizes, rnd,\n                                            alpha=alpha,\n                                            num_c=10,\n                                            img_size=img_size,\n                                            wantTrans=wantTransform)\n\n    elif image == 'cifar100':\n        trainset = torchvision.datasets.CIFAR100(\n                                            root=os.path.join(image_path, image),\n                                            train=True,\n                                            download=True,\n                                            transform=transform_train) # NOTE: Is this working?\n        train_partition = DataPartitioner(trainset, partition_sizes, rnd,\n                                            alpha=alpha,\n                                            num_c=100)\n\n        testset = torchvision.datasets.CIFAR100(\n                                            root=os.path.join(image_path, image),\n                                            train=False,\n                                            download=True,\n                                            transform=transform_test)\n        test_partition = DataPartitioner(testset, partition_sizes, rnd,\n                                            alpha=alpha,\n                                            num_c=100)\n\n    else:\n        raise ValueError('unknown dataset: {}'.format(image))\n\n    return train_partition, test_partition\n\n\n# Setup all necessary image datasets for training\ndef prepare_dataset(rnd=2020, img_size=40, image='cifar', total_num_clients=100, image_path=\"./\", alpha=1.0, wantTransform=False, save_to_disk=False):\n\n    train_partition, test_partition = 
partition_dataset(rnd=rnd,\n                                                        img_size=img_size,\n                                                        image=image,\n                                                        total_num_clients=total_num_clients,\n                                                        image_path=image_path,\n                                                        alpha=alpha,\n                                                        wantTransform=wantTransform)\n\n    datasets = [\"train_dataset.hdf5\", \"test_dataset.hdf5\"]\n    print_rank('Processing {}... '.format(datasets), loglevel=logging.DEBUG)\n    # `name` carries the extension, so strip it before passing the output stem\n    output = [_process_and_save_to_disk(train_partition if name == \"train_dataset.hdf5\" else test_partition,\n                                            save_to_disk,\n                                            file_format=name.split('.')[-1],\n                                            output=name.split('.')[0],\n                                            is_train_set=(name == \"train_dataset.hdf5\")) for name in datasets]\n\n    return output[0], output[1]\n\n\ndef _dump_dict_to_hdf5(data_dict: dict, hdf5_file: h5py.File):\n    '''Dump dict with expected structure to HDF5 file'''\n\n    hdf5_file.create_dataset('users', data=data_dict['users'])\n    hdf5_file.create_dataset('num_samples', data=data_dict['num_samples'])\n\n    # Store actual data in groups; each user's partition holds its samples under 'x'\n    user_data_group = hdf5_file.create_group('user_data')\n    for user, user_data in data_dict['user_data'].items():\n        user_subgroup = user_data_group.create_group(user)\n        user_subgroup.create_dataset('x', data=user_data['x'])\n\n    user_data_label_group = hdf5_file.create_group('user_data_label')\n    for user, user_data_label in data_dict['user_data_label'].items():\n        user_data_label_group.create_dataset(user, data=user_data_label)\n\n\ndef _process_and_save_to_disk(dataset, save_to_disk, file_format, output, is_train_set=True):\n    '''Process a Torchvision dataset to expected format and save to disk'''\n\n    n_users = len(dataset.partitions)\n\n    # Convert training data to expected format\n    print_rank('Converting data to expected format...', loglevel=logging.DEBUG)\n\n    data_dict = {\n        'users': [f'{user_id:04d}' for user_id in range(n_users)],\n        'num_samples': [len(dataset.partitions[user_id]) for user_id in range(n_users)],\n        'user_data': {f'{user_id:04d}': dataset.return_partition(user_id, 'data', is_train_set) for user_id in range(n_users)},\n        'user_data_label': {f'{user_id:04d}': dataset.return_partition(user_id, 'labels', is_train_set) for user_id in range(n_users)},\n    }\n\n    # Save training data to disk\n    print_rank('Saving data to disk...', loglevel=logging.DEBUG)\n    if save_to_disk:\n        if file_format == 'json':\n            outfile = output + '.json'\n            with open(outfile, 'w') as json_file:\n                json.dump(data_dict, json_file)\n        elif file_format == 'hdf5':\n            outfile = output + '.hdf5'\n            with h5py.File(outfile, 'w') as hdf5_file:\n                _dump_dict_to_hdf5(data_dict=data_dict, hdf5_file=hdf5_file)\n        else:\n            raise ValueError('unknown format: {}'.format(file_format))\n        print_rank('Finished saving data...{}'.format(outfile), loglevel=logging.DEBUG)\n    else:\n        outfile = data_dict\n\n    return outfile\n\n\ndef get_transform(transform, img_size=32):\n    \"\"\"Unpack transformations and apply to train or test splits\"\"\"\n\n    transform_list = [transforms.ToTensor()]\n    # resize\n    if transform['resize'] is not None:\n        transform_list.append(transforms.RandomResizedCrop(img_size, scale=(transform['resize'], 2*transform['resize'])))\n        transform_list.append(torchvision.transforms.Pad(4))\n    else:\n        transform_list.append(transforms.RandomCrop(img_size, padding=4))\n\n    # padding\n    if transform['pad'] is not None:\n        transform_list.append(transforms.Pad(transform['pad']))\n\n    # crop\n    if transform['crop'] is not None:\n        transform_list.append(transforms.RandomResizedCrop(transform['crop']))\n\n    # rotation\n    if transform['rotate'] is not None:\n        transform_list.append(transforms.RandomRotation(transform['rotate']))\n\n    # flips\n    if transform['flip']:\n        transform_list.append(transforms.RandomHorizontalFlip())\n        transform_list.append(transforms.RandomVerticalFlip())\n\n    # normalization\n    if transform['normalize'] is not None:\n        transform_list.append(transforms.Normalize(mean=transform['normalize'][0], std=transform['normalize'][1]))\n\n    return transforms.Compose(transform_list)\n"
  },
  {
    "path": "experiments/cv/dataloaders/dataloader.py",
    "content": "# Copyright (c) Microsoft Corporation.\n# Licensed under the MIT license.\n\nimport torch\nimport numpy as np\n\nfrom core.dataloader import BaseDataLoader\nfrom experiments.cv.dataloaders.dataset import Dataset\n\nclass DataLoader(BaseDataLoader):\n    def __init__(self, mode, num_workers=0, **kwargs):\n        args = kwargs['args']\n        self.batch_size = args['batch_size']\n\n        dataset = Dataset(\n            data=kwargs['data'],\n            test_only=(not mode=='train'),\n            user_idx=kwargs.get('user_idx', 0),\n        )\n\n        super().__init__(\n            dataset,\n            batch_size=self.batch_size,\n            shuffle=(mode=='train'),\n            num_workers=num_workers,\n            collate_fn=self.collate_fn,\n        )\n\n    def collate_fn(self, batch):\n        x, y = list(zip(*batch))\n        return {'x': torch.tensor(np.array(x)), 'y': torch.tensor(np.array(y)).long()}"
  },
  {
    "path": "experiments/cv/dataloaders/dataset.py",
    "content": "# Copyright (c) Microsoft Corporation.\n# Licensed under the MIT license.\n\nimport numpy as np\n\nfrom core.dataset import BaseDataset\nfrom experiments.cv.data import prepare_dataset\n\nclass Dataset(BaseDataset):\n    def __init__(self, data, test_only=False, user_idx=0, **kwargs):\n        self.test_only = test_only\n        self.user_idx = user_idx\n\n        # Get all data\n        self.user_list, self.user_data, self.user_data_label, self.num_samples = self.load_data(data, self.test_only)\n\n        if self.test_only:  # combine all data into single array\n            self.user = 'test_only'\n            self.features = np.vstack([user_data['x'] for user_data in self.user_data.values()])\n            self.labels = np.hstack(list(self.user_data_label.values()))\n\n        else:  # get a single user's data\n            if user_idx is None:\n                raise ValueError('in train mode, user_idx must be specified')\n\n            self.user = self.user_list[user_idx]\n            self.features = np.vstack([user_data['x'] for user_data in self.user_data.values()])\n            self.labels = np.hstack(list(self.user_data_label.values()))\n\n    def __getitem__(self, idx):\n        return self.features[idx].astype(np.float32).T, self.labels[idx]\n\n    def __len__(self):\n        return len(self.features)\n\n    def load_data(self, data, test_only):\n        '''Download or load data from disk/memory.\n        \n        The `data` argument can be either the path to the JSON\n        or HDF5 file that contains the expected dictionary, or the\n        actual dictionary. 
In case data cannot be loaded, will be \n        downloaded through `prepare_dataset` method.'''\n\n        if data == None:\n            training_dataset, test_dataset = prepare_dataset(rnd=2020,\n                                                                img_size=40, \n                                                                image='cifar', \n                                                                total_num_clients=100, \n                                                                image_path=\"./\",\n                                                                save_to_disk= False,\n                                                                alpha= 1.0,\n                                                                wantTransform= False)\n            data = test_dataset if test_only else training_dataset\n        \n        users = data['users']\n        features = data['user_data']\n        labels = data['user_data_label']\n        num_samples = data['num_samples']\n\n        return users, features, labels, num_samples"
  },
  {
    "path": "experiments/cv/model.py",
    "content": "# Copyright (c) Microsoft Corporation.\n# Licensed under the MIT license.\n'''\nModified from https://github.com/pytorch/vision.git\n\nThe torchvision package consists of popular datasets, model architectures, \nand common image transformations for computer vision.\n'''\nimport torch as T\nimport torch.nn as nn\nimport numpy as np\nimport logging\n\nlogging.basicConfig(format='%(levelname)s - %(message)s', level=logging.DEBUG)\n\nfrom torch import Tensor\nfrom torch.utils.model_zoo import load_url as load_state_dict_from_url\nfrom typing import Type, Any, Callable, Union, List, Optional\n\nfrom core.model import BaseModel\n\n__all__ = ['ResNet', 'resnet18', 'resnet34', 'resnet50', 'resnet101',\n           'resnet152', 'resnext50_32x4d', 'resnext101_32x8d',\n           'wide_resnet50_2', 'wide_resnet101_2']\n\n\nmodel_urls = {\n    'resnet18': 'https://download.pytorch.org/models/resnet18-f37072fd.pth',\n    'resnet34': 'https://download.pytorch.org/models/resnet34-b627a593.pth',\n    'resnet50': 'https://download.pytorch.org/models/resnet50-0676ba61.pth',\n    'resnet101': 'https://download.pytorch.org/models/resnet101-63fe2227.pth',\n    'resnet152': 'https://download.pytorch.org/models/resnet152-394f9c45.pth',\n    'resnext50_32x4d': 'https://download.pytorch.org/models/resnext50_32x4d-7cdf4587.pth',\n    'resnext101_32x8d': 'https://download.pytorch.org/models/resnext101_32x8d-8ba56ff5.pth',\n    'wide_resnet50_2': 'https://download.pytorch.org/models/wide_resnet50_2-95faca4d.pth',\n    'wide_resnet101_2': 'https://download.pytorch.org/models/wide_resnet101_2-32ee1156.pth',\n}\n\n\ndef conv3x3(in_planes: int, out_planes: int, stride: int = 1, groups: int = 1, dilation: int = 1) -> nn.Conv2d:\n    \"\"\"3x3 convolution with padding\"\"\"\n    return nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride,\n                     padding=dilation, groups=groups, bias=False, dilation=dilation)\n\n\ndef conv1x1(in_planes: int, out_planes: int, 
stride: int = 1) -> nn.Conv2d:\n    \"\"\"1x1 convolution\"\"\"\n    return nn.Conv2d(in_planes, out_planes, kernel_size=1, stride=stride, bias=False)\n\n\nclass BasicBlock(nn.Module):\n    expansion: int = 1\n\n    def __init__(\n        self,\n        inplanes: int,\n        planes: int,\n        stride: int = 1,\n        downsample: Optional[nn.Module] = None,\n        groups: int = 1,\n        base_width: int = 64,\n        dilation: int = 1,\n        norm_layer: Optional[Callable[..., nn.Module]] = None\n    ) -> None:\n        super(BasicBlock, self).__init__()\n        if norm_layer is None:\n            norm_layer = nn.BatchNorm2d\n        if groups != 1 or base_width != 64:\n            raise ValueError('BasicBlock only supports groups=1 and base_width=64')\n        if dilation > 1:\n            raise NotImplementedError(\"Dilation > 1 not supported in BasicBlock\")\n        # Both self.conv1 and self.downsample layers downsample the input when stride != 1\n        self.conv1 = conv3x3(inplanes, planes, stride)\n        self.bn1 = norm_layer(planes)\n        self.relu = nn.ReLU(inplace=True)\n        self.conv2 = conv3x3(planes, planes)\n        self.bn2 = norm_layer(planes)\n        self.downsample = downsample\n        self.stride = stride\n\n\n    def forward(self, x: Tensor) -> Tensor:\n        identity = x\n\n        out = self.conv1(x)\n        out = self.bn1(out)\n        out = self.relu(out)\n\n        out = self.conv2(out)\n        out = self.bn2(out)\n\n        if self.downsample is not None:\n            identity = self.downsample(x)\n\n        out += identity\n        out = self.relu(out)\n\n        return out\n\n\nclass Bottleneck(nn.Module):\n    # Bottleneck in torchvision places the stride for downsampling at 3x3 convolution(self.conv2)\n    # while original implementation places the stride at the first 1x1 convolution(self.conv1)\n    # according to \"Deep residual learning for image recognition\"https://arxiv.org/abs/1512.03385.\n    # 
This variant is also known as ResNet V1.5 and improves accuracy according to\n    # https://ngc.nvidia.com/catalog/model-scripts/nvidia:resnet_50_v1_5_for_pytorch.\n\n    expansion: int = 4\n\n    def __init__(\n        self,\n        inplanes: int,\n        planes: int,\n        stride: int = 1,\n        downsample: Optional[nn.Module] = None,\n        groups: int = 1,\n        base_width: int = 64,\n        dilation: int = 1,\n        norm_layer: Optional[Callable[..., nn.Module]] = None\n    ) -> None:\n        super(Bottleneck, self).__init__()\n        if norm_layer is None:\n            norm_layer = nn.BatchNorm2d\n        width = int(planes * (base_width / 64.)) * groups\n\n        # Both self.conv2 and self.downsample layers downsample the input when stride != 1\n        self.conv1 = conv1x1(inplanes, width)\n        self.bn1 = norm_layer(width)\n        self.conv2 = conv3x3(width, width, stride, groups, dilation)\n        self.bn2 = norm_layer(width)\n        self.conv3 = conv1x1(width, planes * self.expansion)\n        self.bn3 = norm_layer(planes * self.expansion)\n        self.relu = nn.ReLU(inplace=True)\n        self.downsample = downsample\n        self.stride = stride\n\n    def forward(self, x: Tensor) -> Tensor:\n        identity = x\n\n        out = self.conv1(x)\n        out = self.bn1(out)\n        out = self.relu(out)\n\n        out = self.conv2(out)\n        out = self.bn2(out)\n        out = self.relu(out)\n\n        out = self.conv3(out)\n        out = self.bn3(out)\n\n        if self.downsample is not None:\n            identity = self.downsample(x)\n\n        out += identity\n        out = self.relu(out)\n\n        return out\n\n\nclass ResNet(BaseModel):\n    def __init__(\n        self,\n        block: Type[Union[BasicBlock, Bottleneck]],\n        layers: List[int],\n        num_class: int = 1000,\n        zero_init_residual: bool = False,\n        groups: int = 1,\n        width_per_group: int = 64,\n        
replace_stride_with_dilation: Optional[List[bool]] = None,\n        norm_layer: Optional[Callable[..., nn.Module]] = None\n    ) -> None:\n        super(ResNet, self).__init__()\n        if norm_layer is None:\n            norm_layer = nn.BatchNorm2d\n        self._norm_layer = norm_layer\n\n        self.inplanes = 64\n        self.dilation = 1\n        if replace_stride_with_dilation is None:\n            # each element in the tuple indicates if we should replace\n            # the 2x2 stride with a dilated convolution instead\n            replace_stride_with_dilation = [False, False, False]\n        if len(replace_stride_with_dilation) != 3:\n            raise ValueError(\"replace_stride_with_dilation should be None \"\n                             \"or a 3-element tuple, got {}\".format(replace_stride_with_dilation))\n        self.groups = groups\n        self.base_width = width_per_group\n        self.conv1 = nn.Conv2d(3, self.inplanes, kernel_size=7, stride=2, padding=3, bias=False)\n        self.bn1 = norm_layer(self.inplanes)\n        self.relu = nn.ReLU(inplace=True)\n        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)\n        self.layer1 = self._make_layer(block, 64, layers[0])\n        self.layer2 = self._make_layer(block, 128, layers[1], stride=2, dilate=replace_stride_with_dilation[0])\n        self.layer3 = self._make_layer(block, 256, layers[2], stride=2, dilate=replace_stride_with_dilation[1])\n        self.layer4 = self._make_layer(block, 512, layers[3], stride=2, dilate=replace_stride_with_dilation[2])\n        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))\n        self.fc = nn.Linear(512 * block.expansion, num_class)\n\n        for m in self.modules():\n            if isinstance(m, nn.Conv2d):\n                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')\n            elif isinstance(m, (nn.BatchNorm2d, nn.GroupNorm)):\n                nn.init.constant_(m.weight, 1)\n                
nn.init.constant_(m.bias, 0)\n\n        # Zero-initialize the last BN in each residual branch,\n        # so that the residual branch starts with zeros, and each residual block behaves like an identity.\n        # This improves the model by 0.2~0.3% according to https://arxiv.org/abs/1706.02677\n        if zero_init_residual:\n            for m in self.modules():\n                if isinstance(m, Bottleneck):\n                    nn.init.constant_(m.bn3.weight, 0)  # type: ignore[arg-type]\n                elif isinstance(m, BasicBlock):\n                    nn.init.constant_(m.bn2.weight, 0)  # type: ignore[arg-type]\n\n\n    def _make_layer(self, block: Type[Union[BasicBlock, Bottleneck]], planes: int, blocks: int,\n                    stride: int = 1, dilate: bool = False) -> nn.Sequential:\n        norm_layer = self._norm_layer\n        downsample = None\n        previous_dilation = self.dilation\n        if dilate:\n            self.dilation *= stride\n            stride = 1\n        if stride != 1 or self.inplanes != planes * block.expansion:\n            downsample = nn.Sequential(\n                conv1x1(self.inplanes, planes * block.expansion, stride),\n                norm_layer(planes * block.expansion),\n            )\n\n        layers = []\n        layers.append(block(self.inplanes, planes, stride, downsample, self.groups,\n                            self.base_width, previous_dilation, norm_layer))\n        self.inplanes = planes * block.expansion\n        for _ in range(1, blocks):\n            layers.append(block(self.inplanes, planes, groups=self.groups,\n                                base_width=self.base_width, dilation=self.dilation,\n                                norm_layer=norm_layer))\n\n        return nn.Sequential(*layers)\n\n    def forward(self, inputs):\n        inp = inputs['x'].cuda() if T.cuda.is_available() else inputs['x']\n        x = self.conv1(T.transpose(inp, 1, 3))\n        x = self.bn1(x)\n        x = self.relu(x)\n        
x = self.maxpool(x)\n\n        x = self.layer1(x)\n        x = self.layer2(x)\n        x = self.layer3(x)\n        x = self.layer4(x)\n\n        x = self.avgpool(x)\n        x = T.flatten(x, 1)\n        x = self.fc(x)\n\n        return x\n\n    def get_logit(self, x=None, evalis=True, logmax=False):\n        data, target = x\n        if logmax:\n            Softmax = T.nn.LogSoftmax(dim=1)\n        else:\n            Softmax = T.nn.Softmax(dim=1)\n\n        data = data.cuda() if T.cuda.is_available() else data\n\n        if evalis:\n            self.eval()\n            with T.no_grad():\n                # Run the forward pass; forward() expects a dict holding the batch under 'x'\n                output = self.forward({'x': data})\n                logits = Softmax(output)\n                logits.detach_()\n        else:\n            self.train()\n            output = self.forward({'x': data})\n            logits = Softmax(output)\n        loss = 1  # placeholder: the loss is not computed along this path\n\n        return logits.cpu(), target.cpu(), loss\n\n    def inference(self, inputs):\n        targets = inputs['y'].cuda() if T.cuda.is_available() else inputs['y']\n\n        # Run the forward pass\n        self.eval()\n        output = self(inputs)\n        output = T.nn.LogSoftmax(dim=1)(output)\n\n        # accuracy\n        accuracy = T.mean((T.argmax(output, dim=1) == targets).float()).item()\n\n        output = {'probabilities': output.cpu().detach().numpy(),\n                  'predictions': T.argmax(output, dim=1).cpu().numpy(),\n                  'labels': targets.cpu().numpy()}\n\n        return {'output': output, 'acc': accuracy, 'batch_size': targets.shape[0]}\n\n    def loss(self, inputs):\n        targets = inputs['y'].cuda() if T.cuda.is_available() else inputs['y']\n\n        # Run the forward pass\n        self.train()\n        output = self.forward(inputs)\n        loss = T.nn.functional.cross_entropy(output, targets)\n\n        return loss\n\n    def copy_state_dict(self, state_dict):\n        # load a cloned copy instead of overwriting the state_dict() method\n        self.load_state_dict({k: v.clone() for k, v in state_dict.items()})\n\n    def get_model(self):\n        return self\n\n\ndef _resnet(\n    arch: str,\n    block: Type[Union[BasicBlock, Bottleneck]],\n    layers: List[int],\n    pretrained: bool,\n    progress: bool,\n    **kwargs: Any\n) -> ResNet:\n    model = ResNet(block, layers, **kwargs)\n    if pretrained:\n        state_dict = load_state_dict_from_url(model_urls[arch],\n                                              progress=progress)\n        # edit last layer to match the requested number of classes\n        state_dict['fc.weight'] = state_dict['fc.weight'][:kwargs['num_class']]\n        state_dict['fc.bias'] = state_dict['fc.bias'][:kwargs['num_class']]\n        model.load_state_dict(state_dict)\n    return model\n\n\ndef resnet18(config, pretrained: bool = False, progress: bool = True, **kwargs: Any) -> ResNet:\n    r\"\"\"ResNet-18 model from\n    `\"Deep Residual Learning for Image Recognition\" <https://arxiv.org/pdf/1512.03385.pdf>`_.\n\n    Args:\n        pretrained (bool): If True, returns a model pre-trained on ImageNet\n        progress (bool): If True, displays a progress bar of the download to stderr\n    \"\"\"\n    kwargs['num_class'] = config['num_classes']\n    return _resnet('resnet18', BasicBlock, [2, 2, 2, 2], pretrained, progress,\n                   **kwargs)\n\n\ndef resnet34(config, pretrained: bool = False, progress: bool = True, **kwargs: Any) -> ResNet:\n    r\"\"\"ResNet-34 model from\n    `\"Deep Residual Learning for Image Recognition\" <https://arxiv.org/pdf/1512.03385.pdf>`_.\n\n    Args:\n        pretrained (bool): If True, returns a model pre-trained on ImageNet\n        progress (bool): If True, displays a progress bar of the download to stderr\n    \"\"\"\n    kwargs['num_class'] = config['num_classes']\n    return _resnet('resnet34', BasicBlock, [3, 4, 6, 3], pretrained, progress,\n                   **kwargs)\n\n\ndef resnet50(config, pretrained: bool = False, progress: bool = True, **kwargs: Any) -> ResNet:\n    r\"\"\"ResNet-50 model from\n    `\"Deep Residual Learning for 
Image Recognition\" <https://arxiv.org/pdf/1512.03385.pdf>`_.\n\n    Args:\n        pretrained (bool): If True, returns a model pre-trained on ImageNet\n        progress (bool): If True, displays a progress bar of the download to stderr\n    \"\"\"\n    kwargs['num_class']= config['num_classes']\n    return _resnet('resnet50', Bottleneck, [3, 4, 6, 3], pretrained, progress,\n                   **kwargs)\n\n\n\n\ndef resnet101(config, pretrained: bool = False, progress: bool = True, **kwargs: Any) -> ResNet:\n    r\"\"\"ResNet-101 model from\n    `\"Deep Residual Learning for Image Recognition\" <https://arxiv.org/pdf/1512.03385.pdf>`_.\n\n    Args:\n        pretrained (bool): If True, returns a model pre-trained on ImageNet\n        progress (bool): If True, displays a progress bar of the download to stderr\n    \"\"\"\n    return _resnet('resnet101', Bottleneck, [3, 4, 23, 3], pretrained, progress,\n                   **kwargs)\n\n\n\n\ndef resnet152(config, pretrained: bool = False, progress: bool = True, **kwargs: Any) -> ResNet:\n    r\"\"\"ResNet-152 model from\n    `\"Deep Residual Learning for Image Recognition\" <https://arxiv.org/pdf/1512.03385.pdf>`_.\n\n    Args:\n        pretrained (bool): If True, returns a model pre-trained on ImageNet\n        progress (bool): If True, displays a progress bar of the download to stderr\n    \"\"\"\n    kwargs['num_class']= config['num_classes']\n    return _resnet('resnet152', Bottleneck, [3, 8, 36, 3], pretrained, progress,\n                   **kwargs)\n\n\n\n\ndef resnext50_32x4d(config, pretrained: bool = False, progress: bool = True, **kwargs: Any) -> ResNet:\n    r\"\"\"ResNeXt-50 32x4d model from\n    `\"Aggregated Residual Transformation for Deep Neural Networks\" <https://arxiv.org/pdf/1611.05431.pdf>`_.\n\n    Args:\n        pretrained (bool): If True, returns a model pre-trained on ImageNet\n        progress (bool): If True, displays a progress bar of the download to stderr\n    \"\"\"\n    kwargs['groups'] 
= 32\n    kwargs['width_per_group'] = 4\n    kwargs['num_class']= config['num_classes']\n    return _resnet('resnext50_32x4d', Bottleneck, [3, 4, 6, 3],\n                   pretrained, progress, **kwargs)\n\n\n\n\ndef resnext101_32x8d(config, pretrained: bool = False, progress: bool = True, **kwargs: Any) -> ResNet:\n    r\"\"\"ResNeXt-101 32x8d model from\n    `\"Aggregated Residual Transformation for Deep Neural Networks\" <https://arxiv.org/pdf/1611.05431.pdf>`_.\n\n    Args:\n        pretrained (bool): If True, returns a model pre-trained on ImageNet\n        progress (bool): If True, displays a progress bar of the download to stderr\n    \"\"\"\n    kwargs['groups'] = 32\n    kwargs['width_per_group'] = 8\n    kwargs['num_class']= config['num_classes']\n    return _resnet('resnext101_32x8d', Bottleneck, [3, 4, 23, 3],\n                   pretrained, progress, **kwargs)\n\n\n\n\ndef wide_resnet50_2(config, pretrained: bool = False, progress: bool = True, **kwargs: Any) -> ResNet:\n    r\"\"\"Wide ResNet-50-2 model from\n    `\"Wide Residual Networks\" <https://arxiv.org/pdf/1605.07146.pdf>`_.\n\n    The model is the same as ResNet except for the bottleneck number of channels\n    which is twice larger in every block. The number of channels in outer 1x1\n    convolutions is the same, e.g. 
last block in ResNet-50 has 2048-512-2048\n    channels, and in Wide ResNet-50-2 has 2048-1024-2048.\n\n    Args:\n        pretrained (bool): If True, returns a model pre-trained on ImageNet\n        progress (bool): If True, displays a progress bar of the download to stderr\n    \"\"\"\n    kwargs['width_per_group'] = 64 * 2\n    kwargs['num_class']= config['num_classes']\n    return _resnet('wide_resnet50_2', Bottleneck, [3, 4, 6, 3],\n                   pretrained, progress, **kwargs)\n\n\n\n\ndef wide_resnet101_2(config, pretrained: bool = False, progress: bool = True, **kwargs: Any) -> ResNet:\n    r\"\"\"Wide ResNet-101-2 model from\n    `\"Wide Residual Networks\" <https://arxiv.org/pdf/1605.07146.pdf>`_.\n\n    The model is the same as ResNet except for the bottleneck number of channels\n    which is twice larger in every block. The number of channels in outer 1x1\n    convolutions is the same, e.g. last block in ResNet-50 has 2048-512-2048\n    channels, and in Wide ResNet-50-2 has 2048-1024-2048.\n\n    Args:\n        pretrained (bool): If True, returns a model pre-trained on ImageNet\n        progress (bool): If True, displays a progress bar of the download to stderr\n    \"\"\"\n    kwargs['width_per_group'] = 64 * 2\n    kwargs['num_class']= config['num_classes']\n    return _resnet('wide_resnet101_2', Bottleneck, [3, 4, 23, 3],\n                   pretrained, progress, **kwargs)\n"
  },
  {
    "path": "experiments/cv/model_vgg.py",
    "content": "# Copyright (c) Microsoft Corporation.\n# Licensed under the MIT license.\n\n'''\nModified from https://github.com/pytorch/vision.git\n\nThe torchvision package consists of popular datasets, model architectures, \nand common image transformations for computer vision.\n'''\nimport math\nimport torch as T\nimport torch.nn as nn\nimport numpy as np\nimport logging\n\nlogging.basicConfig(format='%(levelname)s - %(message)s', level=logging.DEBUG)\n\n__all__ = [\n    'VGG', 'vgg11', 'vgg11_bn', 'vgg13', 'vgg13_bn', 'vgg16', 'vgg16_bn',\n    'vgg19_bn', 'vgg19',\n]\n\nclass VGG(nn.Module):\n    '''\n    VGG model\n    '''\n    def __init__(self, vgg, num_class, topK_results=None):\n        super(VGG, self).__init__()\n\n        self.topK_results = num_class if topK_results is None else topK_results\n        self.vgg = vgg\n        self.classifier = nn.Sequential(\n            nn.Dropout(),\n            nn.Linear(512, 512),\n            nn.ReLU(True),\n            nn.Dropout(),\n            nn.Linear(512, 512),\n            nn.ReLU(True),\n            nn.Linear(512, num_class),\n        )\n        if 0:\n            # Initialize weights\n            for m in self.modules():\n                if isinstance(m, nn.Conv2d):\n                    n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels\n                    m.weight.data.normal_(0, math.sqrt(2. 
/ n))\n                    m.bias.data.zero_()\n\n    def forward(self, inputs):\n        inputs = inputs['x'].cuda() if T.cuda.is_available() else inputs['x']\n        x = self.vgg(inputs.view(-1, 3, 32, 32))\n        x = T.flatten(x, 1)\n        x = self.classifier(x)\n        return x\n\n    def loss(self, inputs):\n        targets = inputs['y'].cuda() if T.cuda.is_available() else inputs['y']\n        # Run the forward pass\n        output = self(inputs)\n        loss = T.nn.functional.cross_entropy(output, targets)\n\n        return loss\n\n    def inference(self, inputs):\n        targets = inputs['y'].cuda() if T.cuda.is_available() else inputs['y']\n\n        # Run the forward pass\n        output = self(inputs)\n\n        # accuracy\n        accuracy = T.mean((T.argmax(output, dim=1) == targets).float()).item()\n\n        output = {'probabilities': output.cpu().detach().numpy(),\n                  'predictions': T.argmax(output, dim=1).cpu().numpy(),\n                  'labels': targets.cpu().numpy()}\n\n        return {'output': output, 'val_acc': accuracy, 'batch_size': targets.shape[0]}\n\n    def get_logit(self, inputs=None, evalis=True, logmax=False):\n        data, targets = inputs\n\n        if logmax:\n            Softmax = T.nn.LogSoftmax(dim=1)\n        else:\n            Softmax = T.nn.Softmax(dim=1)\n\n        data = data.cuda() if T.cuda.is_available() else data\n        targets = targets.cuda() if T.cuda.is_available() else targets\n\n        if evalis:\n            self.eval()\n            with T.no_grad():\n                # Run the forward pass\n                output = self.forward(data)\n                logits = Softmax(output)\n        else:\n            self.train()\n            output = self.forward(data)\n            logits = Softmax(output)\n\n        loss = T.nn.functional.cross_entropy(output, targets)\n\n        return logits.cpu(), targets.cpu(), loss.cpu()\n\n    def copy_state_dict(self, state_dict):\n        # load the provided parameters into this model\n        self.load_state_dict(state_dict)\n\n    def set_eval(self):\n        \"\"\"\n      
  Bring the model into evaluation mode\n        \"\"\"\n        self.eval()\n\n    def set_train(self):\n        \"\"\"\n        Bring the model into train mode\n        \"\"\"\n        self.train()\n\n\ndef make_layers(cfg, n_channels=3, batch_norm=True):\n    layers = []\n    in_channels = n_channels\n    for v in cfg:\n        if v == 'M':\n            layers += [nn.MaxPool2d(kernel_size=2, stride=2)]\n        else:\n            conv2d = nn.Conv2d(in_channels, v, kernel_size=3, padding=1)\n            if batch_norm:\n                layers += [conv2d, nn.BatchNorm2d(v), nn.ReLU(inplace=True)]\n            else:\n                layers += [conv2d, nn.ReLU(inplace=True)]\n            in_channels = v\n    return nn.Sequential(*layers)\n\n\ncfg = {\n    'A': [64, 'M', 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'],\n    'B': [64, 64, 'M', 128, 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'],\n    'D': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M', 512, 512, 512, 'M', 512, 512, 512, 'M'],\n    'E': [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 256, 'M', 512, 512, 512, 512, 'M',\n          512, 512, 512, 512, 'M'],\n}\n\n\ndef vgg11(config):\n    \"\"\"VGG 11-layer model (configuration \"A\")\"\"\"\n    num_class = config['num_classes']\n    return VGG(make_layers(cfg['A'], batch_norm=False), num_class)\n\n\ndef vgg11_bn(config):\n    \"\"\"VGG 11-layer model (configuration \"A\") with batch normalization\"\"\"\n    num_class = config['num_classes']\n    return VGG(make_layers(cfg['A'], batch_norm=True), num_class)\n\n\ndef vgg13(config):\n    \"\"\"VGG 13-layer model (configuration \"B\")\"\"\"\n    num_class = config['num_classes']\n    return VGG(make_layers(cfg['B'], batch_norm=False), num_class)\n\n\ndef vgg13_bn(config):\n    \"\"\"VGG 13-layer model (configuration \"B\") with batch normalization\"\"\"\n    num_class = config['num_classes']\n    return VGG(make_layers(cfg['B'], batch_norm=True), num_class)\n\n\ndef vgg16(config):\n    \"\"\"VGG 16-layer model (configuration \"D\")\"\"\"\n    num_class = config['num_classes']\n    return VGG(make_layers(cfg['D'], batch_norm=False), num_class)\n\n\ndef vgg16_bn(config):\n    \"\"\"VGG 16-layer model (configuration \"D\") with batch normalization\"\"\"\n    num_class = config['num_classes']\n    return VGG(make_layers(cfg['D'], batch_norm=True), num_class)\n\n\ndef vgg19(config):\n    \"\"\"VGG 19-layer model (configuration \"E\")\"\"\"\n    num_class = config['num_classes']\n    return VGG(make_layers(cfg['E'], batch_norm=False), num_class)\n\n\ndef vgg19_bn(config):\n    \"\"\"VGG 19-layer model (configuration \"E\") with batch normalization\"\"\"\n    num_class = config['num_classes']\n    return VGG(make_layers(cfg['E'], batch_norm=True), num_class)\n"
  },
  {
    "path": "experiments/cv/server.py",
"content": "# Copyright (c) Microsoft Corporation.\n# Licensed under the MIT license.\n'''\nIn this file, we define the local server that lives inside the client.\n'''\n\nfrom core.server import OptimizationServer\n\nclass PersonalizationServer(OptimizationServer):\n    def __init__(self, num_clients, model, optimizer, ss_scheduler, data_path, model_path, server_train_dataloader,\n                 config, idx_val_clients, idx_test_clients):\n        \"\"\"\n        Personalization Server.\n\n        Customized routines for the server can be included here.\n        \"\"\"\n        super().__init__(num_clients, model, optimizer, ss_scheduler, data_path, model_path, server_train_dataloader,\n                         config, idx_val_clients, idx_test_clients)\n"
  },
  {
    "path": "experiments/cv_cnn_femnist/README.md",
"content": "## FedML Benchmark\n\n### Examples\n\nThe example in this folder was taken from the [FedML](https://github.com/FedML-AI/FedML/tree/master/python/examples/simulation/mpi_fedavg_datasets_and_models_example) repository at release 0.7.300, using the configuration suggested in their\n[benchmarking results](https://doc.fedml.ai/simulation/benchmark/BENCHMARK_MPI.html) for MPI-based Federated Learning (the fastest setup on that version).\n\n### Data\n\nFLUTE will automatically download the data used for this example; otherwise, you can use the scripts provided [here](https://github.com/FedML-AI/FedML/tree/master/python/fedml/data) for each individual dataset in the FedML GitHub repository.\n\n### Run\n\nIf you downloaded the data manually, make sure that the variable `data_cache_dir` has been updated inside `preprocess.py`. Then you can run the experiment as follows:\n\n```code\n    python -m torch.distributed.run --nproc_per_node=4 e2e_trainer.py -dataPath ~/data -outputPath ~/outputTest -config ./experiments/cv_cnn_femnist/config.yaml -task cv_cnn_femnist -backend nccl\n```\n\n### Results\n\nThis comparison was carried out using Parrot (Simulator) on version 0.7.303 at commit ID [8f7f261f](https://github.com/FedML-AI/FedML/tree/8f7f261f44e58d0cb5a416b0d6fa270b42a91049).
\n\n```\n _____________________________________________________________________________\n|                    |   FedML (MPI) - Fastest   |   FLUTE (NCCL)  - Fastest  |\n| Task               | Acc | Time     | GPU Mem  | Acc | Time     | GPU Mem   |\n|--------------------|-----|----------|----------|-----|----------|-----------|\n| LR_MNIST           | ~81 | 00:03:09 | ~3060 MB | ~81 | 00:01:35 | ~1060 MB  |\n| CNN_FEMNIST        | ~83 | 05:49:52 | ~5180 MB | ~83 | 00:08:22 | ~1770 MB  |\n| RESNET_FEDCIFAR100 | ~34 | 15:55:36 | ~5530 MB | ~33 | 01:42:01 | ~1900 MB  |\n| RNN_FEDSHAKESPEARE | ~57 | 06:46:21 | ~3690 MB | ~57 | 00:21:50 | ~1270 MB  |\n -----------------------------------------------------------------------------\n```\n\n### FedML Configuration file\n\nIn order to reproduce this experiment in FedML please use the setup below. \n\n```yaml\n\ncommon_args:\n  training_type: \"simulation\"\n  random_seed: 0\n\ndata_args:\n  dataset: \"femnist\"\n  data_cache_dir: \"~/fedml_data\"\n  partition_method: \"hetero\"\n  partition_alpha: 0.5\n\nmodel_args:\n  model: \"cnn\"\n\n\ntrain_args:\n  federated_optimizer: \"FedAvg\"\n  client_id_list: \"[]\"\n  client_num_in_total: 3400\n  client_num_per_round: 10\n  comm_round: 800\n  epochs: 1\n  batch_size: 20\n  client_optimizer: sgd\n  learning_rate: 0.1\n  weight_decay: 0.001\n\nvalidation_args:\n  frequency_of_the_test: 50\n\ndevice_args:\n  worker_num: 10\n  using_gpu: true\n  gpu_mapping_file: config/fedemnist_cnn/gpu_mapping.yaml\n  gpu_mapping_key: mapping_default # [3, 3, 3, 2]\n\ncomm_args:\n  backend: \"MPI\"\n  is_mobile: 0\n\n```"
  },
  {
    "path": "experiments/cv_cnn_femnist/config.yaml",
"content": "# Basic configuration file for running the cv_cnn_femnist example using the FEMNIST dataset.\n# Parameters needed to initialize the model\nmodel_config:\n    model_type: CNN                                   # class w/ `loss` and `inference` methods\n    model_folder: experiments/cv_cnn_femnist/model.py     # file containing class\n\n# Configuration for differential privacy\ndp_config:\n    enable_local_dp: false                             # whether to enable user-level DP\n\n# Additional privacy metrics\nprivacy_metrics_config:\n    apply_metrics: false                               # cache data to compute additional metrics\n\n# Select the Federated optimizer to use (e.g. DGA, FedAvg or FedProx)\nstrategy: FedAvg\n\n# Determines all the server-side settings for training and evaluation rounds\nserver_config:\n    wantRL: false                                      # whether to use RL-based meta-optimizers\n    resume_from_checkpoint: false                      # restart from checkpoint if file exists\n    do_profiling: false                                # run profiler and compute runtime metrics\n    optimizer_config:                                  # this is the optimizer used to update the model\n        type: sgd\n        lr: 1.0\n    annealing_config:                                  # annealer for the learning rate\n        type: step_lr\n        step_interval: epoch\n        gamma: 1.0\n        step_size: 100\n    val_freq: 50000                                  # not executing validation on this experiment, only testing\n    rec_freq: 50                                     # how many iterations between metric eval on test set\n    initial_val: false\n    initial_rec: false\n    max_iteration: 800                               # how many iterations in total\n    num_clients_per_iteration: 10                      # how many clients per iteration\n    data_config:                                       # where to get val and test data 
from\n        val:\n            batch_size: 20\n            val_data: null                             # Assigned to null because dataset is being instantiated\n        test:\n            batch_size: 20\n            test_data: null                            # Assigned to null because dataset is being instantiated\n    type: model_optimization\n    aggregate_median: softmax                          # how aggregations weights are computed\n    initial_lr_client: 0.1                           # learning rate used on client optimizer\n    lr_decay_factor: 1.0\n    weight_train_loss: train_loss\n    best_model_criterion: loss\n    fall_back_to_best_model: false\n    softmax_beta: 1.0\n\n# Dictates the learning parameters for client-side model updates. Train data is defined inside this config.\nclient_config:\n    do_profiling: false                                # run profiling and compute runtime metrics\n    ignore_subtask: false\n    data_config:                                       # where to get training data from\n        train:\n            batch_size: 20\n            list_of_train_data: null                   # Assigned to null because dataset is being instantiated\n            desired_max_samples: 5000\n    optimizer_config:                                  # this is the optimizer used by the client\n        type: sgd\n        lr: 0.1                                      # this is overridden by `initial_lr_client`\n    type: optimization"
  },
  {
    "path": "experiments/cv_cnn_femnist/dataloaders/dataloader.py",
"content": "# Copyright (c) Microsoft Corporation.\n# Licensed under the MIT license.\n\nimport torch\nimport numpy as np\n\nfrom core.dataloader import BaseDataLoader\nfrom experiments.cv_cnn_femnist.dataloaders.dataset import Dataset\n\nclass DataLoader(BaseDataLoader):\n    def __init__(self, mode, num_workers=0, **kwargs):\n        args = kwargs['args']\n        self.batch_size = args['batch_size']\n\n        dataset = Dataset(\n            data=kwargs['data'],\n            test_only=(mode != 'train'),\n            user_idx=kwargs.get('user_idx', None),\n        )\n\n        super().__init__(\n            dataset,\n            batch_size=self.batch_size,\n            shuffle=(mode == 'train'),\n            num_workers=num_workers,\n            collate_fn=self.collate_fn,\n        )\n\n    def collate_fn(self, batch):\n        x, y = list(zip(*batch))\n        x, y = np.array(x), np.array(y)\n        return {'x': torch.tensor(x), 'y': torch.tensor(y)}"
  },
  {
    "path": "experiments/cv_cnn_femnist/dataloaders/dataset.py",
"content": "# Copyright (c) Microsoft Corporation.\n# Licensed under the MIT license.\n\nimport numpy as np\nfrom core.dataset import BaseDataset\nfrom experiments.cv_cnn_femnist.dataloaders.preprocess import FEMNIST\n\nclass Dataset(BaseDataset):\n    def __init__(self, data, test_only=False, user_idx=0, **kwargs):\n        self.test_only = test_only\n        self.user_idx = user_idx\n\n        # Get all data\n        self.user_list, self.user_data, self.user_data_label, self.num_samples = self.load_data(data, self.test_only)\n\n        if user_idx == -1:\n            self.user = self.user_list\n            self.features = np.vstack([user_data for user_data in self.user_data.values()])\n            self.labels = np.hstack([user_label for user_label in self.user_data_label.values()])\n        else:\n            if self.test_only:  # combine all data into single array\n                self.user = 'test_only'\n                self.features = np.vstack([user_data for user_data in self.user_data.values()])\n                self.labels = np.hstack([user_label for user_label in self.user_data_label.values()])\n            else:  # get a single user's data\n                if user_idx is None:\n                    raise ValueError('in train mode, user_idx must be specified')\n\n                self.user = self.user_list[user_idx]\n                self.features = self.user_data[self.user]\n                self.labels = self.user_data_label[self.user]\n\n    def __getitem__(self, idx):\n        return np.array(self.features[idx]).astype(np.float32).T, self.labels[idx]\n\n    def __len__(self):\n        return len(self.features)\n\n    def load_data(self, data, test_only):\n        '''Wrapper method to read/instantiate the dataset'''\n\n        if data is None:\n            dataset = FEMNIST()\n            data = dataset.testset if test_only else dataset.trainset\n\n        users = data['users']\n        features = data['user_data']\n        labels = 
data['user_data_label']\n        num_samples = data['num_samples']\n            \n        return users, features, labels, num_samples"
  },
  {
    "path": "experiments/cv_cnn_femnist/dataloaders/preprocess.py",
"content": "import os\nimport h5py\nimport wget\nimport tarfile\n\ndata_cache_dir = \"./data\"\nDEFAULT_TRAIN_FILE = \"fed_emnist_train.h5\"\nDEFAULT_TEST_FILE = \"fed_emnist_test.h5\"\n\n'''\n    The FederatedEMNIST dataset is taken from the FedML repository. For more information regarding this dataset,\n    please refer to https://github.com/FedML-AI/FedML/tree/master/python/fedml/data/FederatedEMNIST.\n\n    To download the data manually, run the following commands:\n        - wget --no-check-certificate --no-proxy https://fedml.s3-us-west-1.amazonaws.com/fed_emnist.tar.bz2\n        - tar -xvf fed_emnist.tar.bz2\n'''\n\nclass FEMNIST:\n    def __init__(self):\n        download_files(data_cache_dir)\n\n        # Preprocess the dataset\n        train_h5 = h5py.File(os.path.join(data_cache_dir, 'femnist', DEFAULT_TRAIN_FILE), \"r\")\n        test_h5 = h5py.File(os.path.join(data_cache_dir, 'femnist', DEFAULT_TEST_FILE), \"r\")\n        test_dict = {'users': [], 'num_samples': [], 'user_data': dict(), 'user_data_label': dict()}\n        train_dict = {'users': [], 'num_samples': [], 'user_data': dict(), 'user_data_label': dict()}\n\n        for user in test_h5['examples'].keys():\n            test_dict['users'].append(user)\n            test_dict['num_samples'].append(len(test_h5['examples'][user]['pixels'][()]))\n            test_dict['user_data'][user] = test_h5['examples'][user]['pixels'][()]\n            test_dict['user_data_label'][user] = test_h5['examples'][user]['label'][()]\n\n        for user in train_h5['examples'].keys():\n            train_dict['users'].append(user)\n            train_dict['num_samples'].append(len(train_h5['examples'][user]['pixels'][()]))\n            train_dict['user_data'][user] = train_h5['examples'][user]['pixels'][()]\n            train_dict['user_data_label'][user] = train_h5['examples'][user]['label'][()]\n\n        print(\"Dictionaries ready ..
\")\n        self.trainset, self.testset = train_dict, test_dict\n\ndef download_files(data_cache_dir):\n\n    URL = \"https://fedml.s3-us-west-1.amazonaws.com/fed_emnist.tar.bz2\"\n\n    if not os.path.exists(data_cache_dir):\n        os.makedirs(data_cache_dir)\n\n    file_path = os.path.join(data_cache_dir, \"fed_emnist.tar.bz2\")\n\n    # Download and decompress the file (if we haven't already)\n    if not os.path.exists(file_path):\n        wget.download(URL, out=file_path)\n\n        with tarfile.open(file_path) as tar:\n            tar.extractall(os.path.join(data_cache_dir, 'femnist'))\n"
  },
  {
    "path": "experiments/cv_cnn_femnist/model.py",
    "content": "import torch\nfrom torch import nn\nfrom torch.nn import functional as F\nfrom core.model import BaseModel\n\n''' \n    The CNN_DropOut model is taken from FedML repository. For more information regarding this model, \n    please refer to https://github.com/FedML-AI/FedML/blob/master/python/fedml/model/cv/cnn.py.\n\n'''\n\nclass CNN_DropOut(torch.nn.Module):\n    \"\"\"\n    Recommended model by \"Adaptive Federated Optimization\" (https://arxiv.org/pdf/2003.00295.pdf)\n    Used for EMNIST experiments.\n    When `only_digits=True`, the summary of returned model is\n    ```\n    Model:\n    _________________________________________________________________\n    Layer (type)                 Output Shape              Param #\n    =================================================================\n    reshape (Reshape)            (None, 28, 28, 1)         0\n    _________________________________________________________________\n    conv2d (Conv2D)              (None, 26, 26, 32)        320\n    _________________________________________________________________\n    conv2d_1 (Conv2D)            (None, 24, 24, 64)        18496\n    _________________________________________________________________\n    max_pooling2d (MaxPooling2D) (None, 12, 12, 64)        0\n    _________________________________________________________________\n    dropout (Dropout)            (None, 12, 12, 64)        0\n    _________________________________________________________________\n    flatten (Flatten)            (None, 9216)              0\n    _________________________________________________________________\n    dense (Dense)                (None, 128)               1179776\n    _________________________________________________________________\n    dropout_1 (Dropout)          (None, 128)               0\n    _________________________________________________________________\n    dense_1 (Dense)              (None, 10)                1290\n    
=================================================================\n    Total params: 1,199,882\n    Trainable params: 1,199,882\n    Non-trainable params: 0\n    ```\n    Args:\n      only_digits: If True, uses a final layer with 10 outputs, for use with the\n        digits only MNIST dataset (http://yann.lecun.com/exdb/mnist/).\n        If False, uses 62 outputs for Federated Extended MNIST (FEMNIST)\n        EMNIST: Extending MNIST to handwritten letters: https://arxiv.org/abs/1702.05373.\n    Returns:\n      A `torch.nn.Module`.\n    \"\"\"\n\n    def __init__(self, only_digits=True):\n        super(CNN_DropOut, self).__init__()\n        self.conv2d_1 = torch.nn.Conv2d(1, 32, kernel_size=3)\n        self.max_pooling = nn.MaxPool2d(2, stride=2)\n        self.conv2d_2 = torch.nn.Conv2d(32, 64, kernel_size=3)\n        self.dropout_1 = nn.Dropout(0.25)\n        self.flatten = nn.Flatten()\n        self.linear_1 = nn.Linear(9216, 128)\n        self.dropout_2 = nn.Dropout(0.5)\n        self.linear_2 = nn.Linear(128, 10 if only_digits else 62)\n        self.relu = nn.ReLU()\n        # self.softmax = nn.Softmax(dim=1)\n\n    def forward(self, x):\n        x = torch.unsqueeze(x, 1)\n        x = self.conv2d_1(x)\n        x = self.relu(x)\n        x = self.conv2d_2(x)\n        x = self.relu(x)\n        x = self.max_pooling(x)\n        x = self.dropout_1(x)\n        x = self.flatten(x)\n        x = self.linear_1(x)\n        x = self.relu(x)\n        x = self.dropout_2(x)\n        x = self.linear_2(x)\n        # x = self.softmax(self.linear_2(x))\n        return x\n\nclass CNN(BaseModel):\n    '''This is a PyTorch model with some extra methods'''\n\n    def __init__(self, model_config):\n        super().__init__()\n        self.net = CNN_DropOut(False)\n\n    def loss(self, input: torch.Tensor) -> torch.Tensor:\n        '''Performs forward step and computes the loss'''\n        device = 'cuda' if torch.cuda.is_available() else 'cpu'\n        features, labels = 
input['x'].to(device), input['y'].to(device)\n        output = self.net(features)\n        criterion = nn.CrossEntropyLoss().to(device)\n        return criterion(output, labels.long())\n\n    def inference(self, input):\n        '''Performs forward step and computes metrics'''\n        device = 'cuda' if torch.cuda.is_available() else 'cpu'\n        features, labels = input['x'].to(device), input['y'].to(device)\n        output = self.net(features)\n        n_samples = features.shape[0]\n        accuracy = torch.mean((torch.argmax(output, dim=1) == labels).float()).item()\n\n        return {'output': output, 'acc': accuracy, 'batch_size': n_samples}\n"
  },
  {
    "path": "experiments/cv_lr_mnist/README.md",
"content": "## FedML Benchmark\n\n### Examples\n\nThe example in this folder was taken from the [FedML](https://github.com/FedML-AI/FedML/tree/master/python/examples/simulation/mpi_fedavg_datasets_and_models_example) repository at release 0.7.300, using the configuration suggested in their\n[benchmarking results](https://doc.fedml.ai/simulation/benchmark/BENCHMARK_MPI.html) for MPI-based Federated Learning (the fastest setup on that version).\n\n### Data\n\nFLUTE will automatically download the data used for this example; otherwise, you can use the scripts provided [here](https://github.com/FedML-AI/FedML/tree/master/python/fedml/data) for each individual dataset in the FedML GitHub repository.\n\n### Run\n\nIf you downloaded the data manually, make sure that the variable `data_cache_dir` has been updated inside `preprocessing.py`. Then you can run the experiment as follows:\n\n```code\n    python -m torch.distributed.run --nproc_per_node=4 e2e_trainer.py -dataPath ~/data -outputPath ~/outputTest -config ./experiments/cv_lr_mnist/config.yaml -task cv_lr_mnist -backend nccl\n```\n\n### FedML Results\n\nThis comparison was carried out using Parrot (Simulator) on version 0.7.303 at commit ID [8f7f261f](https://github.com/FedML-AI/FedML/tree/8f7f261f44e58d0cb5a416b0d6fa270b42a91049).
\n\n```\n _____________________________________________________________________________\n|                    |   FedML (MPI) - Fastest   |   FLUTE (NCCL)  - Fastest  |\n| Task               | Acc | Time     | GPU Mem  | Acc | Time     | GPU Mem   |\n|--------------------|-----|----------|----------|-----|----------|-----------|\n| LR_MNIST           | ~81 | 00:03:09 | ~3060 MB | ~81 | 00:01:35 | ~1060 MB  |\n| CNN_FEMNIST        | ~83 | 05:49:52 | ~5180 MB | ~83 | 00:08:22 | ~1770 MB  |\n| RESNET_FEDCIFAR100 | ~34 | 15:55:36 | ~5530 MB | ~33 | 01:42:01 | ~1900 MB  |\n| RNN_FEDSHAKESPEARE | ~57 | 06:46:21 | ~3690 MB | ~57 | 00:21:50 | ~1270 MB  |\n -----------------------------------------------------------------------------\n```\n\n### FedML Configuration file\n\nIn order to reproduce this experiment in FedML please use the setup below. \n\n```yaml\n\ncommon_args:\n  training_type: \"simulation\"\n  random_seed: 0\n\ndata_args:\n  dataset: \"mnist\"\n  data_cache_dir: ~/fedml_data\n  partition_method: \"hetero\"\n  partition_alpha: 0.5\n\nmodel_args:\n  model: \"lr\"\n\ntrain_args:\n  federated_optimizer: \"FedAvg\"\n  client_id_list: \"[]\"\n  client_num_in_total: 1000\n  client_num_per_round: 10\n  comm_round: 100\n  epochs: 1\n  batch_size: 10\n  client_optimizer: sgd\n  learning_rate: 0.03\n  weight_decay: 0.001\n\nvalidation_args:\n  frequency_of_the_test: 20\n\ndevice_args:\n  worker_num: 10\n  using_gpu: true\n  gpu_mapping_file: config/fedemnist_cnn/gpu_mapping.yaml\n  gpu_mapping_key: mapping_default # [3, 3, 3, 2]\n\ncomm_args:\n  backend: \"MPI\"\n  is_mobile: 0\n\n```\n\n### Flower Results\n\nThis comparison was carried out using Flower (Simulator) on version 1.0.0 at commit ID [4e7fad9](https://github.com/adap/flower/tree/4e7fad99389a5ee511730841b61f279e3359cb16). 
The results show that in some cases FLUTE can be up to 53x faster.\n\n```\n ________________________________________________\n|        |    Flower (Ray)   | FLUTE (NCCL/Gloo) |\n|        | Acc |    Time     | Acc |    Time     |\n|--------|-----|-------------|-----|-------------|\n| CPU    | ~80 |   00:30:14  | ~80 |   00:03:20  |\n| GPU 2x | ~80 |   01:21:44  | ~80 |   00:01:31  |\n| GPU 4x | ~79 |   00:56:45  | ~81 |   00:01:26  |\n ------------------------------------------------\n```\n\n### Flower Configuration file\n\nIn order to reproduce this experiment in Flower, please use the following patch [file](https://github.com/AnonymousQTHM31/FLUTE/blob/main/flower.patch) for the CPU setup. If you want to use multiple GPUs, follow the configuration suggested [here](https://github.com/adap/flower/issues/1415)."
  },
  {
    "path": "experiments/cv_lr_mnist/config.yaml",
"content": "# Basic configuration file for running the cv_lr_mnist example using the MNIST dataset.\n# Parameters needed to initialize the model\nmodel_config:\n    model_type: LR                                   # class w/ `loss` and `inference` methods\n    model_folder: experiments/cv_lr_mnist/model.py     # file containing class\n    input_dim: 784\n    output_dim: 10\n\n# Configuration for differential privacy\ndp_config:\n    enable_local_dp: false                             # whether to enable user-level DP\n\n# Additional privacy metrics\nprivacy_metrics_config:\n    apply_metrics: false                               # cache data to compute additional metrics\n\n# Select the Federated optimizer to use (e.g. DGA, FedAvg or FedProx)\nstrategy: FedAvg\n\n# Determines all the server-side settings for training and evaluation rounds\nserver_config:\n    wantRL: false                                      # whether to use RL-based meta-optimizers\n    resume_from_checkpoint: false                      # restart from checkpoint if file exists\n    do_profiling: false                                # run profiler and compute runtime metrics\n    optimizer_config:                                  # this is the optimizer used to update the model\n        type: sgd\n        lr: 1.0\n    annealing_config:                                  # annealer for the learning rate\n        type: step_lr\n        step_interval: epoch\n        gamma: 1.0\n        step_size: 100\n    val_freq: 1000                                       # how many iterations between metric eval on val set\n    rec_freq: 20                                      # how many iterations between metric eval on test set\n    initial_val: false\n    initial_rec: false\n    max_iteration: 100                               # how many iterations in total\n    num_clients_per_iteration: 10                      # how many clients per iteration\n    data_config:                                       # 
where to get val and test data from\n        val:\n            batch_size: 10\n            val_data: null                             # Assigned to null because dataset is being instantiated\n        test:\n            batch_size: 10\n            test_data: null                            # Assigned to null because dataset is being instantiated\n    type: model_optimization\n    aggregate_median: softmax                          # how aggregations weights are computed\n    initial_lr_client: 0.03                           # learning rate used on client optimizer\n    lr_decay_factor: 1.0\n    weight_train_loss: train_loss\n    best_model_criterion: loss\n    fall_back_to_best_model: false\n    softmax_beta: 1.0\n\n# Dictates the learning parameters for client-side model updates. Train data is defined inside this config.\nclient_config:\n    do_profiling: false                                # run profiling and compute runtime metrics\n    ignore_subtask: false\n    data_config:                                       # where to get training data from\n        train:\n            batch_size: 10\n            list_of_train_data: null                   # Assigned to null because dataset is being instantiated\n            desired_max_samples: 5000\n    optimizer_config:                                  # this is the optimizer used by the client\n        type: sgd\n        lr: 0.03                                      # this is overridden by `initial_lr_client`\n    type: optimization"
  },
  {
    "path": "experiments/cv_lr_mnist/dataloaders/dataloader.py",
"content": "# Copyright (c) Microsoft Corporation.\n# Licensed under the MIT license.\n\nimport torch\nimport numpy as np\n\nfrom core.dataloader import BaseDataLoader\nfrom experiments.cv_lr_mnist.dataloaders.dataset import Dataset\n\nclass DataLoader(BaseDataLoader):\n    def __init__(self, mode, num_workers=0, **kwargs):\n        args = kwargs['args']\n        self.batch_size = args['batch_size']\n\n        dataset = Dataset(\n            data=kwargs['data'],\n            test_only=(mode != 'train'),\n            user_idx=kwargs.get('user_idx', None),\n        )\n\n        super().__init__(\n            dataset,\n            batch_size=self.batch_size,\n            shuffle=(mode == 'train'),\n            num_workers=num_workers,\n            collate_fn=self.collate_fn,\n        )\n\n    def collate_fn(self, batch):\n        x, y = list(zip(*batch))\n        x, y = np.array(x), np.array(y)\n        return {'x': torch.tensor(x), 'y': torch.tensor(y)}"
  },
  {
    "path": "experiments/cv_lr_mnist/dataloaders/dataset.py",
"content": "# Copyright (c) Microsoft Corporation.\n# Licensed under the MIT license.\n\nimport numpy as np\nfrom core.dataset import BaseDataset\nfrom experiments.cv_lr_mnist.dataloaders.preprocessing import MNIST\n\nclass Dataset(BaseDataset):\n    def __init__(self, data, test_only=False, user_idx=0, **kwargs):\n        self.test_only = test_only\n        self.user_idx = user_idx\n\n        # Get all data\n        self.user_list, self.user_data, self.user_data_label, self.num_samples = self.load_data(data, self.test_only)\n\n        if user_idx == -1:\n            self.user = self.user_list\n            self.features = np.vstack([user_data for user_data in self.user_data.values()])\n            self.labels = np.hstack([user_label for user_label in self.user_data_label.values()])\n        else:\n            if self.test_only:  # combine all data into single array\n                self.user = 'test_only'\n                self.features = np.vstack([user_data for user_data in self.user_data.values()])\n                self.labels = np.hstack([user_label for user_label in self.user_data_label.values()])\n            else:  # get a single user's data\n                if user_idx is None:\n                    raise ValueError('in train mode, user_idx must be specified')\n\n                self.user = self.user_list[user_idx]\n                self.features = self.user_data[self.user]\n                self.labels = self.user_data_label[self.user]\n\n    def __getitem__(self, idx):\n        return np.array(self.features[idx]).astype(np.float32).T, self.labels[idx]\n\n    def __len__(self):\n        return len(self.features)\n\n    def load_data(self, data, test_only):\n        '''Wrapper method to read/instantiate the dataset'''\n\n        if data is None:\n            dataset = MNIST()\n            data = dataset.testset if test_only else dataset.trainset\n\n        users = data['users']\n        features = data['user_data']\n        labels = 
data['user_data_label']\n        num_samples = data['num_samples']\n            \n        return users, features, labels, num_samples"
  },
  {
    "path": "experiments/cv_lr_mnist/dataloaders/preprocessing.py",
    "content": "import os\nimport wget\nimport zipfile\nimport json\n\nFEDML_DATA_MNIST_URL = \"https://fedcv.s3.us-west-1.amazonaws.com/MNIST.zip\"\ndata_cache_dir = \"./data\"\n\n'''\n    The MNIST dataset is taken from the FedML repository. For more information regarding this dataset,\n    please refer to https://github.com/FedML-AI/FedML/tree/master/python/fedml/data/MNIST.\n\n    In order to download the data manually, run the following commands:\n        - wget --no-check-certificate --no-proxy https://fedcv.s3.us-west-1.amazonaws.com/MNIST.zip\n        - unzip MNIST.zip\n'''\n\nclass MNIST:\n    def __init__(self):\n        download_mnist(data_cache_dir)\n        self.trainset, self.testset = read_data(\n            train_data_dir=os.path.join(data_cache_dir, 'MNIST', 'train'),\n            test_data_dir=os.path.join(data_cache_dir, 'MNIST', 'test'),\n        )\n        print(\"Dictionaries ready ..\")\n\ndef download_mnist(data_cache_dir):\n    if not os.path.exists(data_cache_dir):\n        os.makedirs(data_cache_dir)\n\n    file_path = os.path.join(data_cache_dir, \"MNIST.zip\")\n\n    # Download the file (if we haven't already)\n    if not os.path.exists(file_path):\n        wget.download(FEDML_DATA_MNIST_URL, out=file_path)\n\n    with zipfile.ZipFile(file_path, \"r\") as zip_ref:\n        zip_ref.extractall(data_cache_dir)\n\ndef _read_split(data_dir):\n    '''Read every JSON file in data_dir and merge the per-user entries into a\n    single dictionary, so that splits sharded across several files are kept\n    in full rather than reduced to the last file read.'''\n    data = {'users': [], 'num_samples': [], 'user_data': dict()}\n    for f in sorted(os.listdir(data_dir)):\n        if not f.endswith(\".json\"):\n            continue\n        with open(os.path.join(data_dir, f), \"r\") as inf:\n            cdata = json.load(inf)\n        data['users'].extend(cdata['users'])\n        data['num_samples'].extend(cdata['num_samples'])\n        data['user_data'].update(cdata['user_data'])\n\n    # Split each user's record into features ('x') and labels ('y')\n    data['user_data_label'] = dict()\n    for user in data['user_data']:\n        data['user_data_label'][user] = data['user_data'][user]['y']\n        data['user_data'][user] = data['user_data'][user]['x']\n\n    return data\n\ndef read_data(train_data_dir, test_data_dir):\n    return _read_split(train_data_dir), _read_split(test_data_dir)"
  },
  {
    "path": "experiments/cv_lr_mnist/model.py",
    "content": "import torch\nfrom torch import nn\nfrom core.model import BaseModel\n\n'''\n    The LogisticRegression model is taken from the FedML repository. For more information regarding this model,\n    please refer to https://github.com/FedML-AI/FedML/blob/master/python/fedml/model/linear/lr.py.\n'''\n\n\nclass LogisticRegression(torch.nn.Module):\n    def __init__(self, input_dim, output_dim):\n        super(LogisticRegression, self).__init__()\n        self.linear = torch.nn.Linear(input_dim, output_dim)\n\n    def forward(self, x):\n        # Note: following the FedML reference implementation, sigmoid outputs\n        # (rather than raw logits) are returned and later fed to CrossEntropyLoss.\n        return torch.sigmoid(self.linear(x.view(-1, 28 * 28)))\n\nclass LR(BaseModel):\n    '''This is a PyTorch model with some extra methods'''\n\n    def __init__(self, model_config):\n        super().__init__()\n        self.net = LogisticRegression(model_config['input_dim'], model_config['output_dim'])\n\n    def loss(self, input: torch.Tensor) -> torch.Tensor:\n        '''Performs forward step and computes the loss'''\n        device = 'cuda' if torch.cuda.is_available() else 'cpu'\n        features, labels = input['x'].to(device), input['y'].to(device)\n        output = self.net.forward(features)\n        criterion = nn.CrossEntropyLoss().to(device)\n        return criterion(output, labels.long())\n\n    def inference(self, input):\n        '''Performs forward step and computes metrics'''\n        device = 'cuda' if torch.cuda.is_available() else 'cpu'\n        features, labels = input['x'].to(device), input['y'].to(device)\n        output = self.net.forward(features)\n\n        n_samples = features.shape[0]\n        accuracy = torch.mean((torch.argmax(output, dim=1) == labels).float()).item()\n\n        return {'output': output, 'acc': accuracy, 'batch_size': n_samples}"
  },
  {
    "path": "experiments/cv_resnet_fedcifar100/README.md",
    "content": "## FedML Benchmark\n\n### Examples\n\nThe example in this folder was taken from the [FedML](https://github.com/FedML-AI/FedML/tree/master/python/examples/simulation/mpi_fedavg_datasets_and_models_example) repository at release 0.7.300, using the configuration suggested in their\n[benchmarking results](https://doc.fedml.ai/simulation/benchmark/BENCHMARK_MPI.html) for MPI-based Federated Learning (the fastest configuration on that version).\n\n### Data\n\nFLUTE will automatically download the data used for this example; alternatively, you can use the scripts provided [here](https://github.com/FedML-AI/FedML/tree/master/python/fedml/data) for each individual dataset in the FedML GitHub repository.\n\n### Run\n\nIf you downloaded the data manually, make sure that the variable `data_cache_dir` has been updated inside `preprocessing.py`. Then you can run the experiment as follows:\n\n```bash\npython -m torch.distributed.run --nproc_per_node=4 e2e_trainer.py -dataPath ~/data -outputPath ~/outputTest -config ./experiments/cv_resnet_fedcifar100/config.yaml -task cv_resnet_fedcifar100 -backend nccl\n```\n\n### Results\n\nThis comparison was carried out using Parrot (Simulator) on version 0.7.303 at commit ID [8f7f261f](https://github.com/FedML-AI/FedML/tree/8f7f261f44e58d0cb5a416b0d6fa270b42a91049).
\nBoth frameworks use their fastest configuration: FedML with MPI and FLUTE with NCCL.\n\n| Task               | FedML (MPI) Acc | FedML (MPI) Time | FedML (MPI) GPU Mem | FLUTE (NCCL) Acc | FLUTE (NCCL) Time | FLUTE (NCCL) GPU Mem |\n|--------------------|-----|----------|----------|-----|----------|----------|\n| LR_MNIST           | ~81 | 00:03:09 | ~3060 MB | ~81 | 00:01:35 | ~1060 MB |\n| CNN_FEMNIST        | ~83 | 05:49:52 | ~5180 MB | ~83 | 00:08:22 | ~1770 MB |\n| RESNET_FEDCIFAR100 | ~34 | 15:55:36 | ~5530 MB | ~33 | 01:42:01 | ~1900 MB |\n| RNN_FEDSHAKESPEARE | ~57 | 06:46:21 | ~3690 MB | ~57 | 00:21:50 | ~1270 MB |\n\n### FedML Configuration file\n\nTo reproduce this experiment in FedML, please use the setup below.\n\n```yaml\ncommon_args:\n  training_type: \"simulation\"\n  random_seed: 0\n\ndata_args:\n  dataset: \"fed_cifar100\"\n  data_cache_dir: ~/fedml_data\n  partition_method: \"hetero\"\n  partition_alpha: 0.5\n\nmodel_args:\n  model: \"resnet18_gn\"\n\ntrain_args:\n  federated_optimizer: \"FedAvg\"\n  client_id_list: \"[]\"\n  client_num_in_total: 500\n  client_num_per_round: 10\n  comm_round: 4000\n  epochs: 1\n  batch_size: 20\n  client_optimizer: sgd\n  learning_rate: 0.1\n  weight_decay: 0.001\n\nvalidation_args:\n  frequency_of_the_test: 50\n\ndevice_args:\n  worker_num: 10\n  using_gpu: true\n  gpu_mapping_file: config/fedcifar100_resnet18/gpu_mapping.yaml\n  gpu_mapping_key: mapping_default # [3, 3, 3, 2]\n\ncomm_args:\n  backend: \"MPI\"\n  is_mobile: 0\n```"
  },
  {
    "path": "experiments/cv_resnet_fedcifar100/config.yaml",
    "content": "# Basic configuration file for running the cv_resnet_fedcifar100 example using the FedCIFAR100 dataset.\n# Parameters needed to initialize the model\nmodel_config:\n    model_type: RESNET                                # class w/ `loss` and `inference` methods\n    model_folder: experiments/cv_resnet_fedcifar100/model.py     # file containing class\n\n# Configuration for differential privacy\ndp_config:\n    enable_local_dp: false                             # whether to enable user-level DP\n\n# Additional privacy metrics\nprivacy_metrics_config:\n    apply_metrics: false                               # cache data to compute additional metrics\n\n# Select the Federated optimizer to use (e.g. DGA, FedAvg or FedProx)\nstrategy: FedAvg\n\n# Determines all the server-side settings for training and evaluation rounds\nserver_config:\n    wantRL: false                                      # whether to use RL-based meta-optimizers\n    resume_from_checkpoint: false                      # restart from checkpoint if file exists\n    do_profiling: false                                # run profiler and compute runtime metrics\n    optimizer_config:                                  # this is the optimizer used to update the model\n        type: sgd\n        lr: 1.0\n    annealing_config:                                  # annealer for the learning rate\n        type: step_lr\n        step_interval: epoch\n        gamma: 1.0\n        step_size: 100\n    val_freq: 50000                                    # how many iterations between metric eval on val set\n    rec_freq: 50                                       # how many iterations between metric eval on test set\n    initial_val: false\n    initial_rec: false\n    max_iteration: 4000                                # how many iterations in total\n    num_clients_per_iteration: 10                      # how many clients per iteration\n    data_config:                                       # where to get val and test
data from\n        val:\n            batch_size: 20\n            val_data: null                             # Assigned to null because dataset is being instantiated\n        test:\n            batch_size: 20\n            test_data: null                            # Assigned to null because dataset is being instantiated\n    type: model_optimization\n    aggregate_median: softmax                          # how aggregations weights are computed\n    initial_lr_client: 0.1                           # learning rate used on client optimizer\n    lr_decay_factor: 1.0\n    weight_train_loss: train_loss\n    best_model_criterion: loss\n    fall_back_to_best_model: false\n    softmax_beta: 1.0\n\n# Dictates the learning parameters for client-side model updates. Train data is defined inside this config.\nclient_config:\n    do_profiling: false                                # run profiling and compute runtime metrics\n    ignore_subtask: false\n    data_config:                                       # where to get training data from\n        train:\n            batch_size: 20\n            list_of_train_data: null                   # Assigned to null because dataset is being instantiated\n            desired_max_samples: 5000\n    optimizer_config:                                  # this is the optimizer used by the client\n        type: sgd\n        lr: 0.1                                      # this is overridden by `initial_lr_client`\n    type: optimization"
  },
  {
    "path": "experiments/cv_resnet_fedcifar100/dataloaders/dataloader.py",
    "content": "# Copyright (c) Microsoft Corporation.\n# Licensed under the MIT license.\n\nimport torch\nimport numpy as np\n\nfrom core.dataloader import BaseDataLoader\nfrom experiments.cv_resnet_fedcifar100.dataloaders.dataset import Dataset\n\nclass DataLoader(BaseDataLoader):\n    def __init__(self, mode, num_workers=0, **kwargs):\n        args = kwargs['args']\n        self.batch_size = args['batch_size']\n\n        dataset = Dataset(\n            data=kwargs['data'],\n            test_only=(not mode=='train'),\n            user_idx=kwargs.get('user_idx', None),\n        )\n\n        super().__init__(\n            dataset,\n            batch_size=self.batch_size,\n            shuffle=(mode=='train'),\n            num_workers=num_workers,\n            collate_fn=self.collate_fn,\n        )\n\n    def collate_fn(self, batch):\n        x, y = list(zip(*batch))\n        x, y = np.array(x), np.array(y)\n        return {'x': torch.tensor(x), 'y': torch.tensor(y)}"
  },
  {
    "path": "experiments/cv_resnet_fedcifar100/dataloaders/dataset.py",
    "content": "# Copyright (c) Microsoft Corporation.\n# Licensed under the MIT license.\n\nimport numpy as np\nfrom core.dataset import BaseDataset\nfrom experiments.cv_resnet_fedcifar100.dataloaders.preprocessing import FEDCIFAR100\n\nclass Dataset(BaseDataset):\n    def __init__(self, data, test_only=False, user_idx=0, **kwargs):\n        self.test_only = test_only\n        self.user_idx = user_idx\n\n        # Get all data\n        self.user_list, self.user_data, self.user_data_label, self.num_samples = self.load_data(data, self.test_only)\n\n        if user_idx == -1:\n            self.user = 'test_only'\n            self.features = np.vstack([user_data for user_data in self.user_data.values()])\n            self.labels = np.hstack([user_label for user_label in self.user_data_label.values()])\n        else:\n            if self.test_only:  # combine all data into single array\n                self.user = 'test_only'\n                self.features = np.vstack([user_data for user_data in self.user_data.values()])\n                self.labels = np.hstack([user_label for user_label in self.user_data_label.values()])\n            else:  # get a single user's data\n                if user_idx is None:\n                    raise ValueError('in train mode, user_idx must be specified')\n\n                self.user = self.user_list[user_idx]\n                self.features = self.user_data[self.user]\n                self.labels = self.user_data_label[self.user]\n\n    def __getitem__(self, idx):\n        return np.array(self.features[idx]).astype(np.float32).T, self.labels[idx]\n\n    def __len__(self):\n        return len(self.features)\n\n    def load_data(self, data, test_only):\n        '''Wrapper method to read/instantiate the dataset'''\n\n        if data is None:\n            dataset = FEDCIFAR100()\n            data = dataset.testset if test_only else dataset.trainset\n\n        users = data['users']\n        features = data['user_data']\n        labels = data['user_data_label']\n        num_samples = data['num_samples']\n\n        return users, features, labels, num_samples"
  },
  {
    "path": "experiments/cv_resnet_fedcifar100/dataloaders/preprocessing.py",
    "content": "import os\nimport wget\nimport zipfile\nimport tarfile\nimport h5py\n\ndata_cache_dir = \"./data\"\nDEFAULT_TRAIN_FILE = \"fed_cifar100_train.h5\"\nDEFAULT_TEST_FILE = \"fed_cifar100_test.h5\"\n\n''' \n    The FedCIFAR100 dataset is taken from FedML repository. For more information regarding this dataset, \n    please refer to https://github.com/FedML-AI/FedML/tree/master/python/fedml/data/fed_cifar100.\n\n    In order to download the data run the following commands:\n        - wget --no-check-certificate --no-proxy https://fedml.s3-us-west-1.amazonaws.com/fed_cifar100.tar.bz2\n        - tar -xvf fed_cifar100.tar.bz2\n'''\n\nclass FEDCIFAR100:\n    def __init__(self) :\n\n        download_files(data_cache_dir)\n\n        # Preprocess datasets\n        train_h5 = h5py.File(os.path.join(data_cache_dir,'fed_cifar100', DEFAULT_TRAIN_FILE), \"r\")\n        test_h5 = h5py.File(os.path.join(data_cache_dir, 'fed_cifar100',DEFAULT_TEST_FILE), \"r\")\n        test_dict = {'users': [], 'num_samples': [], 'user_data': dict(), 'user_data_label': dict()}\n        train_dict = {'users': [], 'num_samples': [], 'user_data': dict(), 'user_data_label': dict()}\n\n        for user in test_h5['examples'].keys():\n            test_dict['users'].append(user)\n            test_dict['num_samples'].append(len(test_h5['examples'][user]['image'][()]))\n            test_dict['user_data'][user] = test_h5['examples'][user]['image'][()]\n            test_dict['user_data_label'][user] = test_h5['examples'][user]['label'][()]\n            \n        for user in train_h5['examples'].keys():\n            train_dict['users'].append(user)\n            train_dict['num_samples'].append(len(train_h5['examples'][user]['image'][()]))\n            train_dict['user_data'][user] = train_h5['examples'][user]['image'][()]\n            train_dict['user_data_label'][user] = train_h5['examples'][user]['label'][()]\n\n        print(\" Dictionaries ready .. 
\")\n        self.trainset, self.testset = train_dict, test_dict\n\ndef download_files(data_cache_dir):\n\n    URL = \"https://fedml.s3-us-west-1.amazonaws.com/fed_cifar100.tar.bz2\"\n\n    if not os.path.exists(data_cache_dir):\n        os.makedirs(data_cache_dir)\n\n    file_path = os.path.join(data_cache_dir,\"fed_cifar100.tar.bz2\") \n\n    # Download and decompress the file (if we haven't already)\n    if not os.path.exists(file_path):\n        wget.download(URL, out=file_path)\n\n        file = tarfile.open(file_path)\n        file.extractall(os.path.join(data_cache_dir,'fed_cifar100'))\n        file.close()\n"
  },
  {
    "path": "experiments/cv_resnet_fedcifar100/group_normalization.py",
    "content": "import torch.nn.functional as F\nfrom torch.nn.modules.batchnorm import _BatchNorm\n\n\"\"\" This group normalization script was taken from FedML repository. For more information please refer to\n    https://github.com/FedML-AI/FedML/blob/master/python/fedml/model/cv/group_normalization.py.\n\n    Pytorch implementation of group normalization in https://arxiv.org/abs/1803.08494 (Following the PyTorch Style)\n\"\"\"\n\ndef group_norm(\n    input,\n    group,\n    running_mean,\n    running_var,\n    weight=None,\n    bias=None,\n    use_input_stats=True,\n    momentum=0.1,\n    eps=1e-5,\n):\n    \"\"\"Applies Group Normalization for channels in the same group in each data sample in a\n    batch.\n    See :class:`~torch.nn.GroupNorm1d`, :class:`~torch.nn.GroupNorm2d`,\n    :class:`~torch.nn.GroupNorm3d` for details.\n    \"\"\"\n    if not use_input_stats and (running_mean is None or running_var is None):\n        raise ValueError(\n            \"Expected running_mean and running_var to be not None when use_input_stats=False\"\n        )\n\n    b, c = input.size(0), input.size(1)\n    if weight is not None:\n        weight = weight.repeat(b)\n    if bias is not None:\n        bias = bias.repeat(b)\n\n    def _instance_norm(\n        input,\n        group,\n        running_mean=None,\n        running_var=None,\n        weight=None,\n        bias=None,\n        use_input_stats=None,\n        momentum=None,\n        eps=None,\n    ):\n        # Repeat stored stats and affine transform params if necessary\n        if running_mean is not None:\n            running_mean_orig = running_mean\n            running_mean = running_mean_orig.repeat(b)\n        if running_var is not None:\n            running_var_orig = running_var\n            running_var = running_var_orig.repeat(b)\n\n        # norm_shape = [1, b * c / group, group]\n        # print(norm_shape)\n        # Apply instance norm\n        input_reshaped = input.contiguous().view(\n            1, 
int(b * c / group), group, *input.size()[2:]\n        )\n\n        out = F.batch_norm(\n            input_reshaped,\n            running_mean,\n            running_var,\n            weight=weight,\n            bias=bias,\n            training=use_input_stats,\n            momentum=momentum,\n            eps=eps,\n        )\n\n        # Reshape back\n        if running_mean is not None:\n            running_mean_orig.copy_(\n                running_mean.view(b, int(c / group)).mean(0, keepdim=False)\n            )\n        if running_var is not None:\n            running_var_orig.copy_(\n                running_var.view(b, int(c / group)).mean(0, keepdim=False)\n            )\n\n        return out.view(b, c, *input.size()[2:])\n\n    return _instance_norm(\n        input,\n        group,\n        running_mean=running_mean,\n        running_var=running_var,\n        weight=weight,\n        bias=bias,\n        use_input_stats=use_input_stats,\n        momentum=momentum,\n        eps=eps,\n    )\n\n\nclass _GroupNorm(_BatchNorm):\n    def __init__(\n        self,\n        num_features,\n        num_groups=1,\n        eps=1e-5,\n        momentum=0.1,\n        affine=False,\n        track_running_stats=False,\n    ):\n        self.num_groups = num_groups\n        self.track_running_stats = track_running_stats\n        super(_GroupNorm, self).__init__(\n            int(num_features / num_groups), eps, momentum, affine, track_running_stats\n        )\n\n    def _check_input_dim(self, input):\n        # Subclasses must implement the dimensionality check\n        raise NotImplementedError\n\n    def forward(self, input):\n        self._check_input_dim(input)\n\n        return group_norm(\n            input,\n            self.num_groups,\n            self.running_mean,\n            self.running_var,\n            self.weight,\n            self.bias,\n            self.training or not self.track_running_stats,\n            self.momentum,\n            self.eps,\n        )\n\n\nclass GroupNorm2d(_GroupNorm):\n    r\"\"\"Applies Group
Normalization over a 4D input (a mini-batch of 2D inputs\n    with additional channel dimension) as described in the paper\n    https://arxiv.org/pdf/1803.08494.pdf\n    `Group Normalization`_ .\n    Args:\n        num_features: :math:`C` from an expected input of size\n            :math:`(N, C, H, W)`\n        num_groups:\n        eps: a value added to the denominator for numerical stability. Default: 1e-5\n        momentum: the value used for the running_mean and running_var computation. Default: 0.1\n        affine: a boolean value that when set to ``True``, this module has\n            learnable affine parameters. Default: ``True``\n        track_running_stats: a boolean value that when set to ``True``, this\n            module tracks the running mean and variance, and when set to ``False``,\n            this module does not track such statistics and always uses batch\n            statistics in both training and eval modes. Default: ``False``\n    Shape:\n        - Input: :math:`(N, C, H, W)`\n        - Output: :math:`(N, C, H, W)` (same shape as input)\n    Examples:\n        >>> # Without Learnable Parameters\n        >>> m = GroupNorm2d(100, 4)\n        >>> # With Learnable Parameters\n        >>> m = GroupNorm2d(100, 4, affine=True)\n        >>> input = torch.randn(20, 100, 35, 45)\n        >>> output = m(input)\n    \"\"\"\n\n    def _check_input_dim(self, input):\n        if input.dim() != 4:\n            raise ValueError(\"expected 4D input (got {}D input)\".format(input.dim()))\n\n\nclass GroupNorm3d(_GroupNorm):\n    \"\"\"\n    Assume the data format is (B, C, D, H, W)\n    \"\"\"\n\n    def _check_input_dim(self, input):\n        if input.dim() != 5:\n            raise ValueError(\"expected 5D input (got {}D input)\".format(input.dim()))\n"
  },
  {
    "path": "experiments/cv_resnet_fedcifar100/model.py",
    "content": "import math\nimport torch\nimport torch.nn as nn\nimport torch.utils.model_zoo as model_zoo\nfrom torch.nn import functional as F\n\nfrom experiments.cv_resnet_fedcifar100.group_normalization import GroupNorm2d\nfrom core.model import BaseModel\n\n''' \n    The ResNet models are taken from FedML repository. For more information regarding this model, \n    please refer to https://github.com/FedML-AI/FedML/blob/master/python/fedml/model/cv/resnet_gn.py.\n'''\n\n\n__all__ = [\"ResNet\", \"resnet18\", \"resnet34\", \"resnet50\", \"resnet101\", \"resnet152\"]\n\nmodel_urls = {\n    \"resnet18\": \"https://download.pytorch.org/models/resnet18-5c106cde.pth\",\n    \"resnet34\": \"https://download.pytorch.org/models/resnet34-333f7ec4.pth\",\n    \"resnet50\": \"https://download.pytorch.org/models/resnet50-19c8e357.pth\",\n    \"resnet101\": \"https://download.pytorch.org/models/resnet101-5d3b4d8f.pth\",\n    \"resnet152\": \"https://download.pytorch.org/models/resnet152-b121ed2d.pth\",\n}\n\ndef conv3x3(in_planes, out_planes, stride=1):\n    \"\"\"3x3 convolution with padding\"\"\"\n    return nn.Conv2d(\n        in_planes, out_planes, kernel_size=3, stride=stride, padding=1, bias=False\n    )\n\n\ndef norm2d(planes, num_channels_per_group=32):\n    print(\"num_channels_per_group:{}\".format(num_channels_per_group))\n    if num_channels_per_group > 0:\n        return GroupNorm2d(\n            planes, num_channels_per_group, affine=True, track_running_stats=False\n        )\n    else:\n        return nn.BatchNorm2d(planes)\n\n\nclass BasicBlock(nn.Module):\n    expansion = 1\n\n    def __init__(self, inplanes, planes, stride=1, downsample=None, group_norm=0):\n        super(BasicBlock, self).__init__()\n        self.conv1 = conv3x3(inplanes, planes, stride)\n        self.bn1 = norm2d(planes, group_norm)\n        self.relu = nn.ReLU(inplace=True)\n        self.conv2 = conv3x3(planes, planes)\n        self.bn2 = norm2d(planes, group_norm)\n        
self.downsample = downsample\n        self.stride = stride\n\n    def forward(self, x):\n        residual = x\n\n        out = self.conv1(x)\n        out = self.bn1(out)\n        out = self.relu(out)\n\n        out = self.conv2(out)\n        out = self.bn2(out)\n\n        if self.downsample is not None:\n            residual = self.downsample(x)\n\n        out += residual\n        out = self.relu(out)\n\n        return out\n\n\nclass Bottleneck(nn.Module):\n    expansion = 4\n\n    def __init__(self, inplanes, planes, stride=1, downsample=None, group_norm=0):\n        super(Bottleneck, self).__init__()\n        self.conv1 = nn.Conv2d(inplanes, planes, kernel_size=1, bias=False)\n        self.bn1 = norm2d(planes, group_norm)\n        self.conv2 = nn.Conv2d(\n            planes, planes, kernel_size=3, stride=stride, padding=1, bias=False\n        )\n        self.bn2 = norm2d(planes, group_norm)\n        self.conv3 = nn.Conv2d(planes, planes * 4, kernel_size=1, bias=False)\n        self.bn3 = norm2d(planes * 4, group_norm)\n        self.relu = nn.ReLU(inplace=True)\n        self.downsample = downsample\n        self.stride = stride\n\n    def forward(self, x):\n        residual = x\n\n        out = self.conv1(x)\n        out = self.bn1(out)\n        out = self.relu(out)\n\n        out = self.conv2(out)\n        out = self.bn2(out)\n        out = self.relu(out)\n\n        out = self.conv3(out)\n        out = self.bn3(out)\n\n        if self.downsample is not None:\n            residual = self.downsample(x)\n\n        out += residual\n        out = self.relu(out)\n\n        return out\n\n\nclass ResNet(nn.Module):\n    def __init__(self, block, layers, num_classes=1000, group_norm=0):\n        self.inplanes = 64\n        super(ResNet, self).__init__()\n        self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False)\n        self.bn1 = norm2d(64, group_norm)\n        self.relu = nn.ReLU(inplace=True)\n        self.maxpool = 
nn.MaxPool2d(kernel_size=3, stride=2, padding=1)\n        self.layer1 = self._make_layer(block, 64, layers[0], group_norm=group_norm)\n        self.layer2 = self._make_layer(\n            block, 128, layers[1], stride=2, group_norm=group_norm\n        )\n        self.layer3 = self._make_layer(\n            block, 256, layers[2], stride=2, group_norm=group_norm\n        )\n        self.layer4 = self._make_layer(\n            block, 512, layers[3], stride=2, group_norm=group_norm\n        )\n        # self.avgpool = nn.AvgPool2d(7, stride=1)\n        self.avgpool = nn.AvgPool2d(1)\n        self.fc = nn.Linear(512 * block.expansion, num_classes)\n\n        for m in self.modules():\n            if isinstance(m, nn.Conv2d):\n                n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels\n                m.weight.data.normal_(0, math.sqrt(2.0 / n))\n            elif isinstance(m, nn.BatchNorm2d):\n                m.weight.data.fill_(1)\n                m.bias.data.zero_()\n            elif isinstance(m, GroupNorm2d):\n                m.weight.data.fill_(1)\n                m.bias.data.zero_()\n\n        for m in self.modules():\n            if isinstance(m, Bottleneck):\n                m.bn3.weight.data.fill_(0)\n            if isinstance(m, BasicBlock):\n                m.bn2.weight.data.fill_(0)\n\n    def _make_layer(self, block, planes, blocks, stride=1, group_norm=0):\n        downsample = None\n        if stride != 1 or self.inplanes != planes * block.expansion:\n            downsample = nn.Sequential(\n                nn.Conv2d(\n                    self.inplanes,\n                    planes * block.expansion,\n                    kernel_size=1,\n                    stride=stride,\n                    bias=False,\n                ),\n                norm2d(planes * block.expansion, group_norm),\n            )\n\n        layers = []\n        layers.append(block(self.inplanes, planes, stride, downsample, group_norm))\n        self.inplanes = planes * 
block.expansion\n        for i in range(1, blocks):\n            layers.append(block(self.inplanes, planes, group_norm=group_norm))\n\n        return nn.Sequential(*layers)\n\n    def forward(self, x):\n        x = self.conv1(x)\n        x = self.bn1(x)\n        x = self.relu(x)\n        x = self.maxpool(x)\n\n        x = self.layer1(x)\n        x = self.layer2(x)\n        x = self.layer3(x)\n        x = self.layer4(x)\n\n        x = self.avgpool(x)\n        x = x.view(x.size(0), -1)\n        x = self.fc(x)\n\n        return x\n\n\ndef resnet18(pretrained=False, **kwargs):\n    \"\"\"Constructs a ResNet-18 model.\n    Args:\n        pretrained (bool): If True, returns a model pre-trained on ImageNet\n    \"\"\"\n    model = ResNet(BasicBlock, [2, 2, 2, 2], **kwargs)\n    if pretrained:\n        model.load_state_dict(model_zoo.load_url(model_urls[\"resnet18\"]))\n    return model\n\n\ndef resnet34(pretrained=False, **kwargs):\n    \"\"\"Constructs a ResNet-34 model.\n    Args:\n        pretrained (bool): If True, returns a model pre-trained on ImageNet\n    \"\"\"\n    model = ResNet(BasicBlock, [3, 4, 6, 3], **kwargs)\n    if pretrained:\n        model.load_state_dict(model_zoo.load_url(model_urls[\"resnet34\"]))\n    return model\n\n\ndef resnet50(pretrained=False, **kwargs):\n    \"\"\"Constructs a ResNet-50 model.\n    Args:\n        pretrained (bool): If True, returns a model pre-trained on ImageNet\n    \"\"\"\n    model = ResNet(Bottleneck, [3, 4, 6, 3], **kwargs)\n    if pretrained:\n        model.load_state_dict(model_zoo.load_url(model_urls[\"resnet50\"]))\n    return model\n\n\ndef resnet101(pretrained=False, **kwargs):\n    \"\"\"Constructs a ResNet-101 model.\n    Args:\n        pretrained (bool): If True, returns a model pre-trained on ImageNet\n    \"\"\"\n    model = ResNet(Bottleneck, [3, 4, 23, 3], **kwargs)\n    if pretrained:\n        model.load_state_dict(model_zoo.load_url(model_urls[\"resnet101\"]))\n    return model\n\n\ndef 
resnet152(pretrained=False, **kwargs):\n    \"\"\"Constructs a ResNet-152 model.\n    Args:\n        pretrained (bool): If True, returns a model pre-trained on ImageNet\n    \"\"\"\n    model = ResNet(Bottleneck, [3, 8, 36, 3], **kwargs)\n    if pretrained:\n        model.load_state_dict(model_zoo.load_url(model_urls[\"resnet152\"]))\n    return model\n\nclass RESNET(BaseModel):\n    '''This is a PyTorch model with some extra methods'''\n\n    def __init__(self, model_config):\n        super().__init__()\n        self.net = resnet18()\n\n    def loss(self, input: torch.Tensor) -> torch.Tensor:\n        '''Performs forward step and computes the loss'''\n        device = 'cuda' if torch.cuda.is_available() else 'cpu'\n        features, labels = input['x'].to(device), input['y'].to(device)\n        output = self.net.forward(features)\n\n        return F.cross_entropy(output, labels.long())\n\n    def inference(self, input):\n        '''Performs forward step and computes metrics'''\n        device = 'cuda' if torch.cuda.is_available() else 'cpu'\n        features, labels = input['x'].to(device), input['y'].to(device)\n        output = self.net.forward(features)\n\n        n_samples = features.shape[0]\n        accuracy = torch.mean((torch.argmax(output, dim=1) == labels).float()).item()\n\n        return {'output':output, 'acc': accuracy, 'batch_size': n_samples} "
  },
  {
    "path": "experiments/ecg_cnn/.gitignore",
    "content": "/data\n/raw_data\n*.hdf5\n*.json"
  },
  {
    "path": "experiments/ecg_cnn/centralized_model.ipynb",
    "content": "{\n  \"cells\": [\n    {\n      \"cell_type\": \"code\",\n      \"execution_count\": 1,\n      \"metadata\": {\n        \"gather\": {\n          \"logged\": 1644332992250\n        }\n      },\n      \"outputs\": [],\n      \"source\": [\n        \"# Example running CL taken from:\\n\",\n        \"# https://www.kaggle.com/polomarco/ecg-classification-cnn-lstm-attention-mechanism\"\n      ]\n    },\n    {\n      \"cell_type\": \"code\",\n      \"execution_count\": 23,\n      \"metadata\": {\n        \"gather\": {\n          \"logged\": 1644397182872\n        }\n      },\n      \"outputs\": [],\n      \"source\": [\n        \"import csv\\n\",\n        \"import time\\n\",\n        \"\\n\",\n        \"import numpy as np # linear algebra\\n\",\n        \"import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)\\n\",\n        \"\\n\",\n        \"import torch\\n\",\n        \"import torch.nn as nn\\n\",\n        \"import matplotlib.pyplot as plt\\n\",\n        \"import torch.nn.functional as F\\n\",\n        \"from torch.utils.data import Dataset, DataLoader\\n\",\n        \"from torch.optim import AdamW, SGD\\n\",\n        \"from torch.optim.lr_scheduler import StepLR\\n\",\n        \"\\n\",\n        \"from sklearn.model_selection import train_test_split\\n\",\n        \"from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score\"\n      ]\n    },\n    {\n      \"cell_type\": \"code\",\n      \"execution_count\": 3,\n      \"metadata\": {\n        \"gather\": {\n          \"logged\": 1644332993422\n        }\n      },\n      \"outputs\": [],\n      \"source\": [\n        \"class Config:\\n\",\n        \"    device = torch.device(\\\"cuda\\\" if torch.cuda.is_available() else \\\"cpu\\\")\\n\",\n        \"    train_csv_path = './raw_data/mitbih_train.csv'\\n\",\n        \"    test_csv_path = './raw_data/mitbih_test.csv'\\n\",\n        \"    seed = 123\\n\",\n        \"config = Config\"\n      ]\n    },\n    {\n      
\"cell_type\": \"code\",\n      \"execution_count\": null,\n      \"metadata\": {\n        \"jupyter\": {\n          \"outputs_hidden\": false,\n          \"source_hidden\": false\n        },\n        \"nteract\": {\n          \"transient\": {\n            \"deleting\": false\n          }\n        }\n      },\n      \"outputs\": [],\n      \"source\": []\n    },\n    {\n      \"cell_type\": \"code\",\n      \"execution_count\": 4,\n      \"metadata\": {\n        \"gather\": {\n          \"logged\": 1644332993546\n        }\n      },\n      \"outputs\": [],\n      \"source\": [\n        \"class ECGDataset(Dataset):\\n\",\n        \"\\n\",\n        \"    def __init__(self, df):\\n\",\n        \"        self.df = df\\n\",\n        \"        self.data_columns = self.df.columns[:-2].tolist()\\n\",\n        \"\\n\",\n        \"    def __getitem__(self, idx):\\n\",\n        \"        signal = self.df.loc[idx, self.data_columns].astype('float32')\\n\",\n        \"        signal = torch.FloatTensor(np.array([signal.values]))\\n\",\n        \"        target = torch.LongTensor(np.array(self.df.loc[idx, 'class']))\\n\",\n        \"        return signal, target\\n\",\n        \"\\n\",\n        \"    def __len__(self):\\n\",\n        \"        return len(self.df)\\n\",\n        \"\\n\",\n        \"id_to_label = {\\n\",\n        \"    0: \\\"Normal\\\",\\n\",\n        \"    1: \\\"Atrial Premature\\\",\\n\",\n        \"    2: \\\"Premature ventricular contraction\\\",\\n\",\n        \"    3: \\\"Fusion of ventricular and normal\\\",\\n\",\n        \"    4: \\\"Fusion of paced and normal\\\"\\n\",\n        \"}\\n\",\n        \"\\n\",\n        \"def get_dataloader(phase: str, batch_size: int = 96) -> DataLoader:\\n\",\n        \"    '''\\n\",\n        \"    Dataset and DataLoader.\\n\",\n        \"    Parameters:\\n\",\n        \"        phase: training or validation phase.\\n\",\n        \"        batch_size: data per iteration.\\n\",\n        \"    Returns:\\n\",\n        
\"        data generator\\n\",\n        \"    '''\\n\",\n        \"    df = pd.read_csv(config.train_csv_path, header=None)\\n\",\n        \"    df.rename(columns={187: 'class'}, inplace=True)\\n\",\n        \"    df['label'] = df.iloc[:, -1].map(id_to_label)\\n\",\n        \"    train_df, val_df = train_test_split(\\n\",\n        \"        df, test_size=0.15, random_state=config.seed, stratify=df['label']\\n\",\n        \"    )\\n\",\n        \"    train_df, val_df = train_df.reset_index(drop=True), val_df.reset_index(drop=True)\\n\",\n        \"    df = train_df if phase == 'train' else val_df\\n\",\n        \"    dataset = ECGDataset(df)\\n\",\n        \"    dataloader = DataLoader(dataset=dataset, batch_size=batch_size, num_workers=4)\\n\",\n        \"    return dataloader\"\n      ]\n    },\n    {\n      \"cell_type\": \"code\",\n      \"execution_count\": 5,\n      \"metadata\": {\n        \"gather\": {\n          \"logged\": 1644332993675\n        }\n      },\n      \"outputs\": [],\n      \"source\": [\n        \"class Swish(nn.Module):\\n\",\n        \"    def forward(self, x):\\n\",\n        \"        return x * torch.sigmoid(x)\"\n      ]\n    },\n    {\n      \"cell_type\": \"code\",\n      \"execution_count\": 6,\n      \"metadata\": {\n        \"gather\": {\n          \"logged\": 1644332993801\n        }\n      },\n      \"outputs\": [],\n      \"source\": [\n        \"class ConvNormPool(nn.Module):\\n\",\n        \"    \\\"\\\"\\\"Conv Skip-connection module\\\"\\\"\\\"\\n\",\n        \"    def __init__(\\n\",\n        \"        self,\\n\",\n        \"        input_size,\\n\",\n        \"        hidden_size,\\n\",\n        \"        kernel_size,\\n\",\n        \"        norm_type='batchnorm'\\n\",\n        \"    ):\\n\",\n        \"        super().__init__()\\n\",\n        \"        \\n\",\n        \"        self.kernel_size = kernel_size\\n\",\n        \"        self.conv_1 = nn.Conv1d(\\n\",\n        \"            in_channels=input_size,\\n\",\n     
   \"            out_channels=hidden_size,\\n\",\n        \"            kernel_size=kernel_size\\n\",\n        \"        )\\n\",\n        \"        self.conv_2 = nn.Conv1d(\\n\",\n        \"            in_channels=hidden_size,\\n\",\n        \"            out_channels=hidden_size,\\n\",\n        \"            kernel_size=kernel_size\\n\",\n        \"        )\\n\",\n        \"        self.conv_3 = nn.Conv1d(\\n\",\n        \"            in_channels=hidden_size,\\n\",\n        \"            out_channels=hidden_size,\\n\",\n        \"            kernel_size=kernel_size\\n\",\n        \"        )\\n\",\n        \"        self.swish_1 = Swish()\\n\",\n        \"        self.swish_2 = Swish()\\n\",\n        \"        self.swish_3 = Swish()\\n\",\n        \"        if norm_type == 'group':\\n\",\n        \"            self.normalization_1 = nn.GroupNorm(\\n\",\n        \"                num_groups=8,\\n\",\n        \"                num_channels=hidden_size\\n\",\n        \"            )\\n\",\n        \"            self.normalization_2 = nn.GroupNorm(\\n\",\n        \"                num_groups=8,\\n\",\n        \"                num_channels=hidden_size\\n\",\n        \"            )\\n\",\n        \"            self.normalization_3 = nn.GroupNorm(\\n\",\n        \"                num_groups=8,\\n\",\n        \"                num_channels=hidden_size\\n\",\n        \"            )\\n\",\n        \"        else:\\n\",\n        \"            self.normalization_1 = nn.BatchNorm1d(num_features=hidden_size)\\n\",\n        \"            self.normalization_2 = nn.BatchNorm1d(num_features=hidden_size)\\n\",\n        \"            self.normalization_3 = nn.BatchNorm1d(num_features=hidden_size)\\n\",\n        \"            \\n\",\n        \"        self.pool = nn.MaxPool1d(kernel_size=2)\\n\",\n        \"        \\n\",\n        \"    def forward(self, input):\\n\",\n        \"        conv1 = self.conv_1(input)\\n\",\n        \"        x = self.normalization_1(conv1)\\n\",\n     
   \"        x = self.swish_1(x)\\n\",\n        \"        x = F.pad(x, pad=(self.kernel_size - 1, 0))\\n\",\n        \"        \\n\",\n        \"        x = self.conv_2(x)\\n\",\n        \"        x = self.normalization_2(x)\\n\",\n        \"        x = self.swish_2(x)\\n\",\n        \"        x = F.pad(x, pad=(self.kernel_size - 1, 0))\\n\",\n        \"        \\n\",\n        \"        conv3 = self.conv_3(x)\\n\",\n        \"        x = self.normalization_3(conv1+conv3)\\n\",\n        \"        x = self.swish_3(x)\\n\",\n        \"        x = F.pad(x, pad=(self.kernel_size - 1, 0))   \\n\",\n        \"        \\n\",\n        \"        x = self.pool(x)\\n\",\n        \"        return x\"\n      ]\n    },\n    {\n      \"cell_type\": \"code\",\n      \"execution_count\": 7,\n      \"metadata\": {\n        \"gather\": {\n          \"logged\": 1644332993953\n        }\n      },\n      \"outputs\": [],\n      \"source\": [\n        \"class RNN(nn.Module):\\n\",\n        \"    \\\"\\\"\\\"RNN module(cell type lstm or gru)\\\"\\\"\\\"\\n\",\n        \"    def __init__(\\n\",\n        \"        self,\\n\",\n        \"        input_size,\\n\",\n        \"        hid_size,\\n\",\n        \"        num_rnn_layers=1,\\n\",\n        \"        dropout_p = 0.2,\\n\",\n        \"        bidirectional = False,\\n\",\n        \"        rnn_type = 'lstm',\\n\",\n        \"    ):\\n\",\n        \"        super().__init__()\\n\",\n        \"        \\n\",\n        \"        if rnn_type == 'lstm':\\n\",\n        \"            self.rnn_layer = nn.LSTM(\\n\",\n        \"                input_size=input_size,\\n\",\n        \"                hidden_size=hid_size,\\n\",\n        \"                num_layers=num_rnn_layers,\\n\",\n        \"                dropout=dropout_p if num_rnn_layers>1 else 0,\\n\",\n        \"                bidirectional=bidirectional,\\n\",\n        \"                batch_first=True,\\n\",\n        \"            )\\n\",\n        \"            \\n\",\n        \"  
      else:\\n\",\n        \"            self.rnn_layer = nn.GRU(\\n\",\n        \"                input_size=input_size,\\n\",\n        \"                hidden_size=hid_size,\\n\",\n        \"                num_layers=num_rnn_layers,\\n\",\n        \"                dropout=dropout_p if num_rnn_layers>1 else 0,\\n\",\n        \"                bidirectional=bidirectional,\\n\",\n        \"                batch_first=True,\\n\",\n        \"            )\\n\",\n        \"    def forward(self, input):\\n\",\n        \"        outputs, hidden_states = self.rnn_layer(input)\\n\",\n        \"        return outputs, hidden_states\"\n      ]\n    },\n    {\n      \"cell_type\": \"code\",\n      \"execution_count\": 8,\n      \"metadata\": {\n        \"gather\": {\n          \"logged\": 1644332994075\n        }\n      },\n      \"outputs\": [],\n      \"source\": [\n        \"class RNNAttentionModel(nn.Module):\\n\",\n        \"    def __init__(\\n\",\n        \"        self,\\n\",\n        \"        input_size,\\n\",\n        \"        hid_size,\\n\",\n        \"        rnn_type,\\n\",\n        \"        bidirectional,\\n\",\n        \"        n_classes=5,\\n\",\n        \"        kernel_size=5,\\n\",\n        \"    ):\\n\",\n        \"        super().__init__()\\n\",\n        \" \\n\",\n        \"        self.rnn_layer = RNN(\\n\",\n        \"            input_size=46,\\n\",\n        \"            hid_size=hid_size,\\n\",\n        \"            rnn_type=rnn_type,\\n\",\n        \"            bidirectional=bidirectional\\n\",\n        \"        )\\n\",\n        \"        self.conv1 = ConvNormPool(\\n\",\n        \"            input_size=input_size,\\n\",\n        \"            hidden_size=hid_size,\\n\",\n        \"            kernel_size=kernel_size,\\n\",\n        \"        )\\n\",\n        \"        self.conv2 = ConvNormPool(\\n\",\n        \"            input_size=hid_size,\\n\",\n        \"            hidden_size=hid_size,\\n\",\n        \"            
kernel_size=kernel_size,\\n\",\n        \"        )\\n\",\n        \"        self.avgpool = nn.AdaptiveMaxPool1d((1))\\n\",\n        \"        self.attn = nn.Linear(hid_size, hid_size, bias=False)\\n\",\n        \"        self.fc = nn.Linear(in_features=hid_size, out_features=n_classes)\\n\",\n        \"        \\n\",\n        \"    def forward(self, input):\\n\",\n        \"        x = self.conv1(input)\\n\",\n        \"        x = self.conv2(x)\\n\",\n        \"        x_out, hid_states = self.rnn_layer(x)\\n\",\n        \"        x = torch.cat([hid_states[0], hid_states[1]], dim=0).transpose(0, 1)\\n\",\n        \"        x_attn = torch.tanh(self.attn(x))\\n\",\n        \"        x = x_attn.bmm(x_out)\\n\",\n        \"        x = x.transpose(2, 1)\\n\",\n        \"        x = self.avgpool(x)\\n\",\n        \"        x = x.view(-1, x.size(1) * x.size(2))\\n\",\n        \"        x = F.softmax(self.fc(x), dim=-1)\\n\",\n        \"        return x\"\n      ]\n    },\n    {\n      \"cell_type\": \"code\",\n      \"execution_count\": 9,\n      \"metadata\": {\n        \"gather\": {\n          \"logged\": 1644332994213\n        }\n      },\n      \"outputs\": [],\n      \"source\": [\n        \"class Meter:\\n\",\n        \"    def __init__(self, n_classes=5):\\n\",\n        \"        self.metrics = {}\\n\",\n        \"        self.confusion = torch.zeros((n_classes, n_classes))\\n\",\n        \"    \\n\",\n        \"    def update(self, x, y, loss):\\n\",\n        \"        x = np.argmax(x.detach().cpu().numpy(), axis=1)\\n\",\n        \"        y = y.detach().cpu().numpy()\\n\",\n        \"        # print('here!', recall_score(x,y, average='macro', zero_division=1))\\n\",\n        \"        self.metrics['loss'] += loss\\n\",\n        \"        self.metrics['accuracy'] += accuracy_score(x,y)\\n\",\n        \"        self.metrics['f1'] += f1_score(x,y,average='macro')\\n\",\n        \"        self.metrics['precision'] += precision_score(x, y, average='macro', 
zero_division=1)\\n\",\n        \"        self.metrics['recall'] += recall_score(x,y, average='macro', zero_division=1)\\n\",\n        \"        \\n\",\n        \"        self._compute_cm(x, y)\\n\",\n        \"        \\n\",\n        \"    def _compute_cm(self, x, y):\\n\",\n        \"        for prob, target in zip(x, y):\\n\",\n        \"            if prob == target:\\n\",\n        \"                self.confusion[target][target] += 1\\n\",\n        \"            else:\\n\",\n        \"                self.confusion[target][prob] += 1\\n\",\n        \"    \\n\",\n        \"    def init_metrics(self):\\n\",\n        \"        self.metrics['loss'] = 0\\n\",\n        \"        self.metrics['accuracy'] = 0\\n\",\n        \"        self.metrics['f1'] = 0\\n\",\n        \"        self.metrics['precision'] = 0\\n\",\n        \"        self.metrics['recall'] = 0\\n\",\n        \"        \\n\",\n        \"    def get_metrics(self):\\n\",\n        \"        return self.metrics\\n\",\n        \"    \\n\",\n        \"    def get_confusion_matrix(self):\\n\",\n        \"        return self.confusion\"\n      ]\n    },\n    {\n      \"cell_type\": \"code\",\n      \"execution_count\": 24,\n      \"metadata\": {\n        \"gather\": {\n          \"logged\": 1644397187037\n        }\n      },\n      \"outputs\": [],\n      \"source\": [\n        \"class Trainer:\\n\",\n        \"    def __init__(self, net, lr, batch_size, num_epochs):\\n\",\n        \"        self.net = net.to(config.device)\\n\",\n        \"        self.num_epochs = num_epochs\\n\",\n        \"        self.criterion = nn.CrossEntropyLoss(weight=torch.tensor([1,3,3,4,12]).float().to(config.device))\\n\",\n        \"        # self.optimizer = AdamW(self.net.parameters(), lr=lr)\\n\",\n        \"        self.optimizer = SGD(self.net.parameters(), lr=lr)\\n\",\n        \"        # self.scheduler = CosineAnnealingLR(self.optimizer, T_max=num_epochs, eta_min=5e-6)\\n\",\n        \"        self.scheduler = 
StepLR(self.optimizer, step_size=100, gamma=1.0)\\n\",\n        \"        self.best_loss = float('inf')\\n\",\n        \"        self.phases = ['train', 'val']\\n\",\n        \"        self.dataloaders = {\\n\",\n        \"            phase: get_dataloader(phase, batch_size) for phase in self.phases\\n\",\n        \"        }\\n\",\n        \"        self.train_df_logs = pd.DataFrame()\\n\",\n        \"        self.val_df_logs = pd.DataFrame()\\n\",\n        \"    \\n\",\n        \"    def _train_epoch(self, phase):\\n\",\n        \"        print(f\\\"{phase} mode | time: {time.strftime('%H:%M:%S')}\\\")\\n\",\n        \"        \\n\",\n        \"        self.net.train() if phase == 'train' else self.net.eval()\\n\",\n        \"        meter = Meter()\\n\",\n        \"        meter.init_metrics()\\n\",\n        \"        \\n\",\n        \"        for i, (data, target) in enumerate(self.dataloaders[phase]):\\n\",\n        \"            data = data.to(config.device)\\n\",\n        \"            target = target.to(config.device)\\n\",\n        \"            \\n\",\n        \"            output = self.net(data).to(config.device)\\n\",\n        \"            loss = self.criterion(output.to(config.device), target.to(config.device))\\n\",\n        \"                        \\n\",\n        \"            if phase == 'train':\\n\",\n        \"                self.optimizer.zero_grad()\\n\",\n        \"                loss.backward()\\n\",\n        \"                self.optimizer.step()\\n\",\n        \"            \\n\",\n        \"            meter.update(output, target, loss.item())\\n\",\n        \"        \\n\",\n        \"        metrics = meter.get_metrics()\\n\",\n        \"        metrics = {k: v / (i + 1) for k, v in metrics.items()}\\n\",\n        \"        df_logs = pd.DataFrame([metrics])\\n\",\n        \"        confusion_matrix = meter.get_confusion_matrix()\\n\",\n        \"        \\n\",\n        \"        if phase == 'train':\\n\",\n        \"            
self.train_df_logs = pd.concat([self.train_df_logs, df_logs], axis=0)\\n\",\n        \"        else:\\n\",\n        \"            self.val_df_logs = pd.concat([self.val_df_logs, df_logs], axis=0)\\n\",\n        \"        \\n\",\n        \"        # show logs\\n\",\n        \"        print('{}: {}, {}: {}, {}: {}, {}: {}, {}: {}'\\n\",\n        \"              .format(*(x for kv in metrics.items() for x in kv))\\n\",\n        \"             )\\n\",\n        \"        fig, ax = plt.subplots(figsize=(5, 5))\\n\",\n        \"        cm_ = ax.imshow(confusion_matrix, cmap='hot')\\n\",\n        \"        ax.set_title('Confusion matrix', fontsize=15)\\n\",\n        \"        ax.set_xlabel('Predicted', fontsize=13)\\n\",\n        \"        ax.set_ylabel('Actual', fontsize=13)\\n\",\n        \"        plt.colorbar(cm_)\\n\",\n        \"        plt.show()\\n\",\n        \"        \\n\",\n        \"        return loss\\n\",\n        \"    \\n\",\n        \"    def run(self):\\n\",\n        \"        for epoch in range(self.num_epochs):\\n\",\n        \"            self._train_epoch(phase='train')\\n\",\n        \"            with torch.no_grad():\\n\",\n        \"                val_loss = self._train_epoch(phase='val')\\n\",\n        \"                self.scheduler.step()\\n\",\n        \"            \\n\",\n        \"            if val_loss < self.best_loss:\\n\",\n        \"                self.best_loss = val_loss\\n\",\n        \"                print('\\\\nNew checkpoint\\\\n')\\n\",\n        \"                torch.save(self.net.state_dict(), f\\\"best_model_epoch{epoch}.pth\\\")\\n\",\n        \"            #clear_output()\\n\",\n        \"        \"\n      ]\n    },\n    {\n      \"cell_type\": \"code\",\n      \"execution_count\": 25,\n      \"metadata\": {\n        \"gather\": {\n          \"logged\": 1644397187238\n        }\n      },\n      \"outputs\": [],\n      \"source\": [\n        \"attn_model = 
RNNAttentionModel(1, 64, 'lstm', False)\"\n      ]\n    },\n    {\n      \"cell_type\": \"code\",\n      \"execution_count\": null,\n      \"metadata\": {},\n      \"outputs\": [\n        {\n          \"name\": \"stderr\",\n          \"output_type\": \"stream\",\n          \"text\": [\n            \"/anaconda/envs/azureml_py36/lib/python3.6/site-packages/ipykernel_launcher.py:9: UserWarning: Creating a tensor from a list of numpy.ndarrays is extremely slow. Please consider converting the list to a single numpy.ndarray with numpy.array() before converting to a tensor. 
(Triggered internally at  ../torch/csrc/utils/tensor_new.cpp:201.)\\n\",\n            \"  if __name__ == '__main__':\\n\"\n          ]\n        },\n        {\n          \"name\": \"stdout\",\n          \"output_type\": \"stream\",\n          \"text\": [\n            \"train mode | time: 09:00:00\\n\",\n            \"loss: 1.5601879076803884, accuracy: 0.07522580645161298, f1: 0.06748332740742076, precision: 0.3165194883003588, recall: 0.2457613616641853\\n\",\n            \"val mode | time: 09:00:31\\n\"\n          ]\n        },\n        {\n          \"data\": {\n            \"image/png\": \"iVBORw0KGgoAAAANSUhEUgAAAU4AAAEnCAYAAADGqKr7AAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjEsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+j8jraAAAgAElEQVR4nO3de7hdVX3u8e9riEAFROSeoFABK1DFknJQ2nMolJJWKuiRGvsoaYvGUmixh7aCx1at91bQYpVHFAqoJVC8gFTAyEUPFgIJIhACGiXKhkAM10AlGHjPH3MsXdnZt7mYa629st7P88xnzzXWvPz23tm//MYYc84l20RExNQ9p98BREQMmiTOiIiakjgjImpK4oyIqCmJMyKips36HUBEbBrmzp3rNWvW1Npn6dKlV9qe26WQuiaJMyIasWbNGpYsWVJrH0nbdymcrkrijIiGGFjf7yB6IokzIhqUxBkRUUMqzoiImpI4IyJqSuKMiKhpeBJnLoDvMkmvl3S1pEckrZP0fUkf6NZlGJIOlnSzpCclNfboK0nvlVTvIr1pTNICSUfX2P5cSfWutRk6rcRZZxlMqTi7SNJpwDuAfwM+DjwG7AP8ObAv8LounPYzwGrgCGBdg8f9HPC1Bo/XbwuA24GvTnH79wNbdi+cTcXgJsM6UnF2iaQ/BP4P8Dbbb7X9Ndvfsn0m8BvAWV069a8BXy3nuqGpg9oesb20qeMNCklbAtj+oe3b+x3P9Gbg6ZrLxCRtIelGSd+TtEzS+0r7dpIWSfpB+fqCtn1OlbRC0l2SjmhrP0DSbeW9MySptG8u6cLSvljS7pPFlcTZPX8N3Gz7nNFv2H7a9uWt15K2l3SepAcl/bekayXNad9H0kpJH5P015JGJD0saaGkbcv7h5Su+QzgXyRZ0rnlPUs6cdTxNuh6S9pW0uck3Ve6+T+R9Nnxti9te0j6qqTHJK2V9DVJe47axpJOkvQhST+VtFrSpyRtPtEPr9U1lvQaSXeUn8t/lj+YPSVdI+mJss3LR+17sqSbJD0q6YHRcUm6FjgAmF/is6Q/afs5nybp7yWNUPUSNuqqS7pM0p2txNp23icl7TvR97bp6kpXfR1wqO1XAPsDcyUdBJwCXGV7L+Cq8hpJ+wDzqHp0c4FPS5pRjnUmVU9jr7K0bvU8DnjY9p5UPcOPThZUEmcXSJoJvBq4Yoq7fJWqa/03wBupfi/XjE5CwB8Bh1H98t8JHAl8qLx3M/Cqsn5aWX9/jbBPB36LKuEfAbyL6i9hTCXxXQW8DHgb8CfAHsC3JG03avOTgV2BNwP/DLwdOGkKMb0I+Efg3VTf86upKvWFZXkD1XDTwlb1UMwG/hU4qsQ2A/iOpOeX9/8CuBP4O
tXP6VXAf7bt/8fA/yrbvXGc2N4G7AB8GEDSy4APAO+xvWwK39smqPnE6crj5eXMspjqd3teaT8PaI1XHwUstL3O9t3ACuBASbsA29i+3tXHXpw/ap/WsS4GDhv172kjGePsjhcCmwM/mWxDSXOBg4FDbH+rtF0NrAT+lirJtPwcONr2+rJd63/Xv7D9GHBD+X2v7KCbfiDwKdsXtrV9YYLt/5Qqse1t+0clnsXAj0rMH27bdqXtPynrV0o6GHg98E+TxLQd8CrbPyzHfznVz2S+7fNLm6iS3q8BywFs/3XrAKXaWEQ17nsUcL7tOyQ9Afx0gp/TkbafHC8w26tKFf9FSV8r3+93gY9N8j3FhrbXhpNuZ9neYBir/A6XAntS/RtdLGkn26vgF7+LHcvms4D23+lIaft5WR/d3trnnnKs9ZIepfobHncyNImzu6Yyq30g1R/wt36xk/2EpMuoKsB217SSZnEHsKOk59p+6lnGegvwt5KeBr5p+/tTiPvmVtIscY9I+s4YcX9j1Os7gDlMbmUraRYryterx2ibRUmcpSv3fqqx5Pbqd+8pnBOqLuC4SbPF9gWSXk+VuJ8BXmF78oG7TVrtyaE1tif8t1B+pvuXYamvSNpvgs3HqhQ9QftE+4wrXfXueJBqbOZFU9h2F+CBMdofYMM/eoBHRr1+iuqX/ty6AY7hRKohg38A7lI16D5vgu2fbdxbTCGmsfYb3d5q2wJA0ouoErWoKt+Dgd+kqjinck4Y+/sazwVUvYtFtn9QY79NUHcvR7L9CHAt1djkA6X7Tfm6umw2AuzWttts4L7SPnuM9g32kbQZ8HzgoYliSeLsAts/B75DNVY4mVXAjmO078Qkv7wa1rFxct0gudl+xPZf2d4ZeAWwmKobus84x+xF3J2YC/wKcJTti23/F1U1PTqZT2RK179K2oZqMuG7wGvVNoM7nJpPnJJ2aJsA3RL4Xarx6UuB+WWz+cAlZf1SYJ6qmfI9qCaBbizd+rWSDirDO8eO2qd1rDcAV3uSj/9N4uyeTwBzJM0f/Yak55SxTagS1I6S/mfb+78CvAa4rqFYRqgmcX5xfuDQ8Ta2fSvVWOJzqMYOx7IYOKD842wddxbVBE5TcXdiS6puc/tf5R+x8bDUVKveiXyCauLpUODfgc+1TUANoa5UnLtQTZTeCtxEVdlfBnwEOFzSD4DDy2vKxNxFVMNBVwAntA2fHE91PfIK4IdA68qWs4EXSlpBdQnhKZMFlTHOLrH9NUmnA2eXyZBLgMepEtGfU03+XGH7yjIueKGkU6i6+X9DlQD+uaFwvgKcIOm7VJM3bwW2ad9A0nVlu9up/gLeBjwB3DjOMc+lmtm/XNI/UF2U916qAfXPNBR3J66mSmb/JulsqstS/oaNu/13AkeUKvFB4G7bD071JJKOpJog+33bj0j6S6qf3b9QXWEwhJq/5bL8J/7KMdofpLrCZKx9Pgh8cIz2JcBG46NlPPuYOnGl4uwi2ydTXc6yF1VFsojq0pyrqP73a3ldee8TwH9Qjc8dansFzXhfOe4HqBLeLcDo60uvp/qDv5jqf+ztqZLCCGOwvY5fdpvOprqc48dUVwf0ratu+zaqhPY/gMuoLi06Bnh01KYfoJpMuoiqkvnDqZ6jXG51FvBZ21eU8z5E9Z/NfFU3Pwyp4bjlUpN05SMipmTOnL29ZMkna+0jzV062az6dJSuekQ0ZHiejpTEGRENSeKMiKgpiTMiogNJnH01Q/LMfgdR034HHNDvEGq5c+ngPSVuEC8D2fuAwYp65cpnWLPGEz7kYmypOPtuJhveNzUIliwZrAeEv3riB8BMS8/rdwAdWLTk2V5n31tz5kx6m/44kjgjImpqPch405fEGRENScUZEdGB4UicgzVqHRExDaTijIiGpKseEVFTEmdERE1JnBERNSVxRkR0IIkzIqKGVJwRETUlcUZE1JTEGRFRUxJnREQHkjgjImpIxRkRUVMSZ0RET
cOTOHv2dCRJcyXdJWmFpFN6dd6I6KWnay6DqScVp6QZwKeAw4ER4CZJl9q+oxfnj4heGJ6Ks1dd9QOBFbZ/BCBpIXAUkMQZsckYnsTZq676LOCettcjpW0DkhZIWiJpyeAW8RGxqetVxTnWxyl6owb7LOAsgC2kjd6PiOlseCrOXiXOETb8tN/ZwH09OndE9EQSZ9NuAvaStAdwLzAP+OMenTsieiKJs1G210s6EbgSmAGcY3tZL84dEb2UxNko218Hvt6r80VErw1PxZmPB46IhrQSZ51lYpJ2k3SNpOWSlkk6qbS/V9K9km4pyx+07XNqudHmLklHtLUfIOm28t4ZklTaN5d0YWlfLGn3yeLKLZcR0ZCuVJzrgZNt3yxpa2CppEXlvY/b/lj7xpL2oZpD2RfYFfimpL1tPw2cCSwAbqDq/c4FLgeOAx62vaekecBHgTdOFFQqzohoSPMVp+1Vtm8u62uB5YxxDXibo4CFttfZvhtYARwoaRdgG9vX2zZwPnB02z7nlfWLgcNa1eh4kjgjokHNJs52pQv9SmBxaTpR0q2SzpH0gtI23s02s8r66PYN9rG9HngUeOFEsSRxRkRDOqo4t2/dLViWBWMdWdJWwJeAd9h+jKrb/RJgf2AVcFpr03ECm+gmnCndoNMuY5wR0ZCOxjjX2J4z0QaSZlIlzS/a/jKA7Qfa3v8scFl5Od7NNiNlfXR7+z4jkjYDng88NFFMqTgjoiFdmVUXcDaw3Pbpbe27tG32OuD2sn4pMK/MlO8B7AXcaHsVsFbSQeWYxwKXtO0zv6y/Abi6jIOOKxVnRDSkK7PqBwNvAW6TdEtpexfwJkn7l5OuBN4OYHuZpIuonry2HjihzKgDHA+cC2xJNZt+eWk/G/i8pBVUlea8yYJK4oyIBjX7XDPb1zH2GOS4N9PY/iDwwTHalwD7jdH+JHBMnbiSOCOiIcNz51ASZ0Q0ZHgSZyaHIiJqSsUZEQ0ZnooziTMiGpTEGRFRQyrOiIiakjj77ufAA5NuNb3MmviBKtPOI/0OYEjM0n/3O4RaftrxnkmcERH1eTg+2DuJMyKa80y/A+iNJM6IaIZp+o7LaSuJMyKakcQZEdGBdNUjImpIxRkR0YFUnBERNaTijIjoQBJnREQNJl31iIjaUnFGRNQwRGOceQJ8RERNqTgjojkZ44yIqGGIuupJnBHRnFScERE1pOKMiKhpiBJnT2bVJZ0jabWk23txvojok2dqLgOqV5cjnQvM7dG5IqIfWhVnnWVA9aSrbvvbknbvxbkioo8GOBnWkTHOiGhG7lXvD0kLgAUAg/VBuxEBpOLsB9tnAWcBzJDc53Aioo5UnBERHRiSirNXlyNdAFwPvFTSiKTjenHeiOihzKo3y/abenGeiOizIemq57FyEdGMLlScknaTdI2k5ZKWSTqptG8naZGkH5SvL2jb51RJKyTdJemItvYDJN1W3jtDkkr75pIuLO2Lp3LpZBJnRDSn+a76euBk2y8DDgJOkLQPcApwle29gKvKa8p784B9qW66+bSkGeVYZ1JdtbNXWVo35RwHPGx7T+DjwEcnCyqJMyKmLdurbN9c1tcCy4FZwFHAeWWz84Cjy/pRwELb62zfDawADpS0C7CN7ettGzh/1D6tY10MHNaqRseTxBkRzWhdjlTvXvXtJS1pWxaMd/jShX4lsBjYyfYqqJIrsGPZbBZwT9tuI6VtVlkf3b7BPrbXA48CL5zoW83lSBHRnPoz5Wtsz5lsI0lbAV8C3mH7sQkKwrHe8ATtE+0zrlScEdGMzirOSUmaSZU0v2j7y6X5gdL9pnxdXdpHgN3adp8N3FfaZ4/RvsE+kjYDng88NFFMSZwR0ZzmZ9UFnA0st31621uXAvPL+nzgkrb2eWWmfA+qSaAbS3d+raSDyjGPHbVP61hvAK4u46DjSlc9IprRnQcZHwy8BbhN0i2l7V3AR4CLys00PwGOAbC9TNJFwB1UM/In2G5Fd
TzVIy63BC4vC1SJ+fOSVlBVmvMmC0qTJNa+mSH5ef0Ooqat+x1ATY/0O4AhsW2/A6jpp8BTdu3n7Mx5ibzkQ/X20TyWTmWMc7pJxRkRzRiij85I4oyIZiRxRkR0YEjuVU/ijIhmpOKsSPqHqRzE9j82E05EDLRUnAD8dtu6gP8J3A/8GHgxsDPwre6EFhEDJRVnxfbhrXVJpwNXAx9uXRwq6VRg+65GGBGDI4lzI8cCO4+6ov6fqSrQkxuNKiIGTz5zaEw/A/YDbmlr+3XgyUYjajNo/3lt1+8AauraLy428OJ+B1DTo89m50H7o+1QncT5aeAKSZ8BVgK7Uz0U9JPNhxURAycV58Zsf1jSCNV9o8cA9wLvtH1+t4KLiJiOal3HafvzwOe7FEtEDLp01Tcm6Vepnhyyq+0TJe0NzLS9rCvRRcTgGKLLkab8PE5JhwPfo/rApGNL8w7Ax7oQV0QMoi48yHg6qlNxfgQ4xvYVkh4ubTcDv9F8WBExcIao4qyTOF9i+4qybgDbPyuPtY+IYTdEibPOR2fcI2m/9gZJr6C6NCkiYmi66nUS5xnAlyW9GZgh6X8DX6D6APeIGHatirPBzxyarupcx/nZ8iFH7wRmAO8DPlEuUYqIGOgqso6613GeBZzVpVgiYpBljHNjkpaP035bc+FExEBLV30js2u2R8Qwyb3qvyTpXa1t29Zb9gTuaTyqiBhMA1xF1jGVirP1MOOZbetQ/d9yP/BnTQcVEQNoiMY4J02ctn8HQNInbf9l90OKiIE1JF31WtdxStq5vUHSTpL2bDimiBhEQ3QdZ53E+e9s/PlCO5T2iIjcOTSGvW3fPqptGbD3ZDtK2k3SNZKWS1om6aRaUUZETCN1Lkd6RNL2tte0tW0PPDGFfdcDJ9u+WdLWwFJJi2zfUSfYiJjGhmhyqE7FuQg4U9JWAOXrJ4FvTLaj7VW2by7ra4HlwKz64UbEtDYkY5x1Ks5TgEuBByWtBnYElgKvrXNCSbsDrwQWj/HeAqoPgEN1DhoR/ZcL4Ddme42kg4HfpPrE05XAklGfsz6hUqV+CXiH7cfGOMcv7oWfIU35uBExTQxwFVlH3Yd8GLixLLWUBx5/Cfii7S/X3T8iprkhGuOcMHFKOsP2X5X1cZ+KZHvBJMcRcDaw3PbpnQQaEQNgSLrqk00OzRy1Pt4ymYOpPo/9UEm3lOUPOog3IqarLlwAL+kcSasl3d7W9l5J946VSySdKmmFpLskHdHWfoCk28p7Z5RiDkmbS7qwtC8uczCTmrDitH182/qfTuWA4xznOjLfE7Fp687k0LnAvwLnj2r/uO0NPmFX0j5UH1++L7Ar8E1Je9t+GjiTauL5BuDrwFzgcuA44GHbe0qaB3wUeONkQdW5HCkiYmINV5y2vw08NMWzHwUstL3O9t3ACuBASbsA29i+vszTnA8c3bbPeWX9YuCwVjU6kQkTp6RnJD092TLFbyoiNmWdddW3l7SkbZlwvqTNiZJuLV35F5S2WWz4mMuR0jarrI9u32Af2+uBR4EXTnbyyWbVf7ttfQ7w58BpwN3ArwLvAD4z2UkiYkjU76qvsT2n5j5nAu+nStXvp8pJf8bYw4GeoJ1J3hvXZGOc32mtS/pX4EjbPyxNV0m6mqq8PWOyE0XEJq5HlyPZfqC1LumzwGXl5QiwW9ums4H7SvvsMdrb9xmRtBnwfKYwNFBnjPMlbPy093upKs+IiJ7cclnGLFteB7Rm3C8F5pWZ8j2AvYAbba8C1ko6qIxfHgtc0rbP/LL+BuDqqdzUU+cC+KXAxyT9ne0nJW0BfAT4bo1jRMSmqguz6pIuAA6hGgsdAd4DHCJp/3LGlcDbAWwvk3QRcAfVg4VOKDPqAMdTzdBvSTWbfnlpPxv4vKQVVJXmvCnFNdU7JssDi78G7A607lX/MfBa29+f0kFqmCF5i6YP2mWDVnrfN/km0YCX9juAmm4FHrdrXz44Z2t5yf719tF1LO1gj
<base64 PNG data elided>\",\n            \"text/plain\": [\n              \"<Figure size 360x360 with 2 Axes>\"\n            ]\n          },\n          \"metadata\": {\n            \"needs_background\": \"light\"\n          },\n          \"output_type\": \"display_data\"\n        },\n        {\n          \"name\": \"stderr\",\n          \"output_type\": \"stream\",\n          \"text\": [\n            \"/anaconda/envs/azureml_py36/lib/python3.6/site-packages/ipykernel_launcher.py:9: UserWarning: Creating a tensor from a list of numpy.ndarrays is extremely slow. Please consider converting the list to a single numpy.ndarray with numpy.array() before converting to a tensor. 
(Triggered internally at  ../torch/csrc/utils/tensor_new.cpp:201.)\\n\",\n            \"  if __name__ == '__main__':\\n\"\n          ]\n        },\n        {\n          \"name\": \"stdout\",\n          \"output_type\": \"stream\",\n          \"text\": [\n            \"loss: 1.3856678911868263, accuracy: 0.11747619720965312, f1: 0.07200591838594485, precision: 0.27436380889422163, recall: 0.3566003931926984\\n\",\n            \"\\n\",\n            \"New checkpoint\\n\",\n            \"\\n\",\n            \"train mode | time: 09:00:38\\n\"\n          ]\n        },\n        {\n          \"data\": {\n            \"image/png\": \"<base64 PNG data elided>\",\n            \"text/plain\": [\n              \"<Figure size 360x360 with 2 Axes>\"\n            ]\n          },\n          \"metadata\": {\n            \"needs_background\": \"light\"\n          },\n          \"output_type\": \"display_data\"\n        },\n        {\n          \"name\": \"stderr\",\n          \"output_type\": \"stream\",\n          \"text\": [\n            \"/anaconda/envs/azureml_py36/lib/python3.6/site-packages/ipykernel_launcher.py:9: UserWarning: Creating a tensor from a list of numpy.ndarrays is extremely slow. Please consider converting the list to a single numpy.ndarray with numpy.array() before converting to a tensor. 
(Triggered internally at  ../torch/csrc/utils/tensor_new.cpp:201.)\\n\",\n            \"  if __name__ == '__main__':\\n\",\n            \"/anaconda/envs/azureml_py36/lib/python3.6/site-packages/ipykernel_launcher.py:9: UserWarning: Creating a tensor from a list of numpy.ndarrays is extremely slow. Please consider converting the list to a single numpy.ndarray with numpy.array() before converting to a tensor. (Triggered internally at  ../torch/csrc/utils/tensor_new.cpp:201.)\\n\",\n            \"  if __name__ == '__main__':\\n\",\n            \"/anaconda/envs/azureml_py36/lib/python3.6/site-packages/ipykernel_launcher.py:9: UserWarning: Creating a tensor from a list of numpy.ndarrays is extremely slow. Please consider converting the list to a single numpy.ndarray with numpy.array() before converting to a tensor. (Triggered internally at  ../torch/csrc/utils/tensor_new.cpp:201.)\\n\",\n            \"  if __name__ == '__main__':\\n\"\n          ]\n        },\n        {\n          \"name\": \"stdout\",\n          \"output_type\": \"stream\",\n          \"text\": [\n            \"loss: 1.3225790266836843, accuracy: 0.333766129032258, f1: 0.16503508884653428, precision: 0.34857432644627095, recall: 0.4273865645595446\\n\",\n            \"val mode | time: 09:01:09\\n\"\n          ]\n        },\n        {\n          \"data\": {\n            \"image/png\": 
\"<base64 PNG data elided>\",\n            \"text/plain\": [\n              \"<Figure size 360x360 with 2 Axes>\"\n            ]\n          },\n          \"metadata\": {\n            \"needs_background\": \"light\"\n          },\n          \"output_type\": \"display_data\"\n        }\n      ],\n      \"source\": [\n        \"trainer = Trainer(net=attn_model, lr=1e-3, batch_size=96, num_epochs=10)\\n\",\n        \"trainer.run()\"\n      ]\n    },\n    {\n      \"cell_type\": \"code\",\n      \"execution_count\": null,\n      \"metadata\": {\n        \"jupyter\": {\n          \"outputs_hidden\": false,\n          \"source_hidden\": false\n        },\n        \"nteract\": {\n          \"transient\": {\n            \"deleting\": false\n          }\n        }\n      },\n      \"outputs\": [],\n      \"source\": []\n    }\n  ],\n  \"metadata\": {\n    \"interpreter\": {\n      \"hash\": \"7a6183492d0e103ac878e198fb5e468f3d279e98271ee06042fca66727adf0ef\"\n    },\n    \"kernel_info\": {\n      \"name\": \"python3\"\n    },\n    \"kernelspec\": {\n      \"display_name\": \"Python 3\",\n      \"language\": \"python\",\n      \"name\": \"python3\"\n    },\n    \"language_info\": {\n      \"codemirror_mode\": {\n        \"name\": \"ipython\",\n        \"version\": 3\n      },\n      \"file_extension\": \".py\",\n      \"mimetype\": \"text/x-python\",\n      \"name\": \"python\",\n      \"nbconvert_exporter\": \"python\",\n      \"pygments_lexer\": \"ipython3\",\n      \"version\": \"3.6.9\"\n    },\n    \"microsoft\": {\n      \"host\": {\n        \"AzureML\": {\n          \"notebookHasBeenCompleted\": true\n        }\n      }\n    },\n    \"nteract\": {\n      \"version\": 
\"nteract-front-end@1.0.0\"\n    },\n    \"orig_nbformat\": 4\n  },\n  \"nbformat\": 4,\n  \"nbformat_minor\": 0\n}\n"
  },
  {
    "path": "experiments/ecg_cnn/config.yaml",
    "content": "# Basic configuration file for running ecg_cnn example using json files.\n# Parameters needed to initialize the model\nmodel_config:\n    model_type: SuperNet                               # class w/ `loss` and `inference` methods\n    model_folder: experiments/ecg_cnn/model.py         # file containing class\n\n# Configuration for differential privacy\ndp_config:\n    enable_local_dp: false                             # whether to enable user-level DP\n\n# Additional privacy metrics\nprivacy_metrics_config:\n    apply_metrics: false                               # cache data to compute additional metrics\n\n# Select the Federated optimizer to use (e.g. DGA, FedAvg or FedProx)\nstrategy: DGA\n\n# Determines all the server-side settings for training and evaluation rounds\nserver_config:   \n    wantRL: false                                      # whether to use RL-based meta-optimizers\n    resume_from_checkpoint: false                      # restart from checkpoint if file exists\n    do_profiling: false                                # run profiler and compute runtime metrics\n    optimizer_config:                                  # this is the optimizer used to update the model\n        type: sgd\n        lr: 1.0\n    annealing_config:                                  # annealer for the learning rate\n        type: step_lr\n        step_interval: epoch\n        gamma: 1.0\n        step_size: 100\n    val_freq: 50                                       # how many iterations between metric eval on val set\n    rec_freq: 500                                      # how many iterations between metric eval on test set\n    initial_val: true\n    initial_rec: true\n    max_iteration: 2000                                # how many iterations in total\n    num_clients_per_iteration: 25                      # how many clients per iteration\n    data_config:                                       # where to get val and test data from\n        val:\n            
batch_size: 10000\n            val_data: test_data.hdf5\n        test:\n            batch_size: 10000\n            test_data: test_data.hdf5\n    type: model_optimization\n    aggregate_median: softmax                          # how aggregations weights are computed\n    softmax_beta: 20.0\n    initial_lr_client: 0.001                           # learning rate used on client optimizer\n    lr_decay_factor: 1.0\n    weight_train_loss: train_loss\n    best_model_criterion: loss\n    fall_back_to_best_model: false\n\n# Dictates the learning parameters for client-side model updates. Train data is defined inside this config.\nclient_config:\n    do_profiling: false                                # run profiling and compute runtime metrics\n    ignore_subtask: false\n    data_config:                                       # where to get training data from\n        train:\n            batch_size: 96\n            list_of_train_data: train_data.hdf5\n            desired_max_samples: 87000\n    optimizer_config:                                  # this is the optimizer used by the client\n        type: sgd \n        lr: 0.001                                      # this is overridden by `initial_lr_client`\n        momentum: 0.90\n    type: optimization"
  },
  {
    "path": "experiments/ecg_cnn/dataloaders/dataloader.py",
    "content": "# Copyright (c) Microsoft Corporation.\n# Licensed under the MIT license.\n\nfrom experiments.ecg_cnn.dataloaders.dataset import Dataset\nfrom core.dataloader import BaseDataLoader\n\nimport torch\n\nclass DataLoader(BaseDataLoader):\n    def __init__(self, mode, num_workers=0, **kwargs):\n        args = kwargs['args']\n        self.batch_size = args['batch_size']\n\n        dataset = Dataset(\n            data=kwargs['data'],\n            test_only=(not mode=='train'),\n            user_idx=kwargs.get('user_idx', None),\n            file_type='hdf5',\n        )\n\n        super().__init__(\n            dataset,\n            batch_size=self.batch_size,\n            shuffle=(mode=='train'),\n            num_workers=num_workers,\n            collate_fn=self.collate_fn,\n        )\n\n    def collate_fn(self, batch):\n        x, y = list(zip(*batch))\n        return {'x': torch.tensor(x), 'y': torch.tensor(y)}"
  },
  {
    "path": "experiments/ecg_cnn/dataloaders/dataset.py",
    "content": "# Copyright (c) Microsoft Corporation.\n# Licensed under the MIT license.\n\nimport h5py\nimport numpy as np\n\nfrom core.dataset import BaseDataset\n\nclass Dataset(BaseDataset):\n    def __init__(self, data, test_only=False, user_idx=0, **kwargs):\n        self.test_only = test_only\n        self.user_idx = user_idx\n\n        # Get all data\n        self.user_list, self.user_data, self.user_data_label, self.num_samples = self.load_data(data)\n\n        if self.test_only:  # combine all data into single array\n            self.user = 'test_only'\n            self.features = np.vstack([user_data['x'] for user_data in self.user_data.values()])\n            self.labels = np.hstack([user_label['x'] for user_label in self.user_data_label.values()])\n        else:  # get a single user's data\n            if user_idx is None:\n                raise ValueError('in train mode, user_idx must be specified')\n\n            self.user = self.user_list[user_idx]\n            self.features = self.user_data[self.user]['x']\n            self.labels = self.user_data_label[self.user]['x']\n\n    def __getitem__(self, idx):\n        items = self.features[idx].astype(np.float32).T.reshape(1,187)\n        return items, self.labels[idx]\n\n    def __len__(self):\n        return len(self.features)\n\n    def load_data(self,data):\n        '''Load data from disk or memory'''\n\n        if isinstance(data, str):\n            try:\n                data = h5py.File(data, 'r')\n            except:\n                raise ValueError('Only HDF5 format is allowed for this experiment')\n\n            users = []\n            num_samples = data['num_samples']\n            features, labels = dict(), dict()\n            \n            # Decoding bytes from hdf5\n            decode_if_str = lambda x: x.decode() if isinstance(x, bytes) else x\n            for user in data['users']:\n                user = decode_if_str(user)\n                users.append(user)\n                
features[user] = {'x': data['user_data'][user]['x'][()]}\n                labels[user] = {'x': data['user_data_label'][user][()]}\n\n        else:\n            users = data['users']\n            features = data['user_data']\n            labels = data['user_data_label']\n            num_samples = data['num_samples']\n\n        return users, features, labels, num_samples"
  },
  {
    "path": "experiments/ecg_cnn/model.py",
    "content": "# Copyright (c) Microsoft Corporation.\n# Licensed under the MIT license.\n\n'''The model architecture used was first created by the user polomarco for a Kaggle competition:\nhttps://www.kaggle.com/polomarco/ecg-classification-cnn-lstm-attention-mechanism\nHowever, this example has been altered to fit the FLUTE architecture'''\n\nimport torch\nfrom torch import nn\nfrom torch.nn import functional as F\n\nfrom core.model import BaseModel\n\n# ReLu alternative \nclass Swish(nn.Module):\n    def forward(self, x):\n        return x * torch.sigmoid(x)\n\nclass ConvNormPool(nn.Module):\n    \"\"\"Conv Skip-connection module\"\"\"\n    def __init__(\n        self,\n        input_size,\n        hidden_size,\n        kernel_size,\n        norm_type='bachnorm'\n    ):\n        super().__init__()\n        \n        self.kernel_size = kernel_size\n        self.conv_1 = nn.Conv1d(\n            in_channels=input_size,\n            out_channels=hidden_size,\n            kernel_size=kernel_size\n        )\n        self.conv_2 = nn.Conv1d(\n            in_channels=hidden_size,\n            out_channels=hidden_size,\n            kernel_size=kernel_size\n        )\n        self.conv_3 = nn.Conv1d(\n            in_channels=hidden_size,\n            out_channels=hidden_size,\n            kernel_size=kernel_size\n        )\n        self.swish_1 = Swish()\n        self.swish_2 = Swish()\n        self.swish_3 = Swish()\n        if norm_type == 'group':\n            self.normalization_1 = nn.GroupNorm(\n                num_groups=8,\n                num_channels=hidden_size\n            )\n            self.normalization_2 = nn.GroupNorm(\n                num_groups=8,\n                num_channels=hidden_size\n            )\n            self.normalization_3 = nn.GroupNorm(\n                num_groups=8,\n                num_channels=hidden_size\n            )\n        else:\n            self.normalization_1 = nn.BatchNorm1d(num_features=hidden_size)\n            
self.normalization_2 = nn.BatchNorm1d(num_features=hidden_size)\n            self.normalization_3 = nn.BatchNorm1d(num_features=hidden_size)\n            \n        self.pool = nn.MaxPool1d(kernel_size=2)\n        \n    def forward(self, input):\n        conv1 = self.conv_1(input)\n        x = self.normalization_1(conv1)\n        x = self.swish_1(x)\n        x = F.pad(x, pad=(self.kernel_size - 1, 0))\n        \n        x = self.conv_2(x)\n        x = self.normalization_2(x)\n        x = self.swish_2(x)\n        x = F.pad(x, pad=(self.kernel_size - 1, 0))\n        \n        conv3 = self.conv_3(x)\n        x = self.normalization_3(conv1+conv3)\n        x = self.swish_3(x)\n        x = F.pad(x, pad=(self.kernel_size - 1, 0))   \n        \n        x = self.pool(x)\n        return x\n\nclass RNN(nn.Module):\n    \"\"\"RNN module(cell type lstm or gru)\"\"\"\n    def __init__(\n        self,\n        input_size,\n        hid_size,\n        num_rnn_layers=1,\n        dropout_p = 0.2,\n    ):\n        super().__init__()\n        \n        self.rnn_layer = nn.LSTM(\n            input_size=input_size,\n            hidden_size=hid_size,\n            num_layers=num_rnn_layers,\n            dropout=dropout_p if num_rnn_layers>1 else 0,\n            bidirectional=False,\n            batch_first=True,\n        )\n        \n    def forward(self, input):\n        outputs, hidden_states = self.rnn_layer(input)\n        return outputs, hidden_states\n\nclass Net(nn.Module): \n    def __init__(\n        self,\n        input_size=1,\n        hid_size=64,\n        n_classes=5,\n        kernel_size=5,\n    ):\n        super().__init__()\n \n        self.rnn_layer = RNN(\n            input_size=46,\n            hid_size=hid_size,\n        )\n        self.conv1 = ConvNormPool(\n            input_size=input_size,\n            hidden_size=hid_size,\n            kernel_size=kernel_size,\n        )\n        self.conv2 = ConvNormPool(\n            input_size=hid_size,\n            
hidden_size=hid_size,\n            kernel_size=kernel_size,\n        )\n        self.avgpool = nn.AdaptiveMaxPool1d((1))\n        self.attn = nn.Linear(hid_size, hid_size, bias=False)\n        self.fc = nn.Linear(in_features=hid_size, out_features=n_classes)\n        \n    def forward(self, input):\n        x = self.conv1(input)\n        x = self.conv2(x)\n        x_out, hid_states = self.rnn_layer(x)\n        x = torch.cat([hid_states[0], hid_states[1]], dim=0).transpose(0, 1)\n        x_attn = torch.tanh(self.attn(x))\n        x = x_attn.bmm(x_out)\n        x = x.transpose(2, 1)\n        x = self.avgpool(x)\n        x = x.view(-1, x.size(1) * x.size(2))\n        x = F.softmax(self.fc(x), dim=-1)\n        return x\n\nclass SuperNet(BaseModel):\n    '''This is the parent of the net with some extra methods'''\n    def __init__(self, model_config):\n        super().__init__()\n        self.net = Net()\n    \n    def loss(self, input: torch.Tensor):\n        device = 'cuda' if torch.cuda.is_available() else 'cpu'\n        features, labels = input['x'].to(device), input['y'].to(device)\n        output = self.net.forward(features)\n        return F.cross_entropy(output, labels.long())\n\n    def inference(self, input):\n        device = 'cuda' if torch.cuda.is_available() else 'cpu'\n        features, labels = input['x'].to(device), input['y'].to(device)\n        output = self.net.forward(features)\n        n_samples = features.shape[0]\n \n        accuracy = torch.mean((torch.argmax(output, dim=1) == labels).float()).item()\n\n        return {'output':output, 'acc': accuracy, 'batch_size': n_samples}\n        \n\n\n\n\n"
  },
  {
    "path": "experiments/ecg_cnn/readme.md",
    "content": "# Example of CNN-LSTM model on Arrhythmia dataset\n\nThe objective of this experiment is to show the capabilities of FLUTE in data setting relevant to the healthcare sector. \n\n### Federating the MIT-BIH Arrhythmia Dataset\n\nIn this experiment, a processed version of [MIT-BIH Arrhythmia Dataset](https://www.physionet.org/content/mitdb/1.0.0/) is used. In particular, we are using the dataset version found on [this Kaggle competition](https://www.kaggle.com/shayanfazeli/heartbeat). \n\nExcerpt from the original [MIT-BIH Arrhythmia Database](https://physionet.org/content/mitdb/1.0.0/): \n\n> The MIT-BIH Arrhythmia Database contains 48 half-hour excerpts of two-channel ambulatory ECG recordings, obtained from 47 subjects studied by the BIH Arrhythmia Laboratory between 1975 and 1979. Twenty-three recordings were chosen at random from a set of 4000 24-hour ambulatory ECG recordings collected from a mixed population of inpatients (about 60%) and outpatients (about 40%) at Boston's Beth Israel Hospital; the remaining 25 recordings were selected from the same set to include less common but clinically significant arrhythmias that would not be well-represented in a small random sample.\n\nWhat this means for us: the federation in this example is a exemplar one, as the 47 subjects and their 48 half-hour excerpts are split up into the 109446 labeled samples of length 187. The sampling frequency is 125Hz and the number of categories is five. The categories are: \n\n```['N': 0, 'S': 1, 'V': 2, 'F': 3, 'Q': 4]```\n\nOr: \n\n```-N : Non-ecotic beats (normal beat) -S : Supraventricular ectopic beats -V : Ventricular ectopic beats -F : Fusion Beats -Q : Unknown Beats```\n\nThe classes in the dataset are quite skewed; the *normal beats* class is present in 82.77% of samples. Using synthetic data could possibly increase the performance of the models by decreasing the class imbalance (e.g. 
by using [this GAN](https://github.com/mandrakedrink/ECG-Synthesis-and-Classification) for data synthesis) but this is not too relevant for our goal of transferring this experiment to FLUTE. \n\n#### Model architecture\n\nThe model architecture is largely taken from [this notebook on Kaggle](https://www.kaggle.com/polomarco/ecg-classification-cnn-lstm-attention-mechanism), altered to fit the FLUTE framework. The image below showcases the general model architecture. \n\n![network](./net.png)\n\nThe FLUTE-ready model can be found in `model.py`. Here, `SuperNet` is the parent class of the various model network classes. `SuperNet` contains the `loss` and `inference` methods which FLUTE expects, and is therefore also the `model_type` set in `config.yaml`. \n\nThe file `centralized_model.ipynb` can be used to test a centralized run of the model. Running this notebook expects the CSV test and train files to be added to a `.\\ecg_cnn\\data\\mitbih\\` folder. This model has higher performance than the federated model (roughly 94% as opposed to 87% accuracy). This is not entirely unexpected, since the federated model may have more issues dealing with the class imbalance. \n\n#### Preparing the data\n\nFirst, place the `mitbih_test.csv` and `mitbih_train.csv` files in the folder `.\\ecg_cnn\\data\\mitbih\\`. Next, run `preprocess.py` in the `utils` folder to generate the HDF5 files. \n\n## Specifying dataset and dataloaders\n\nInside the `dataloaders` folder, there are two files: `dataset.py` and\n`dataloader.py`. 
Both inherit from the base classes declared in the `core`\nfolder, which under the hood inherit from the PyTorch classes of the same name.\n\nThe dataset should be able to access all the data, and store it in the\nattributes `user_list`, `user_data`, `user_data_labels` and `num_samples` (user\nnames, user features, user labels if the problem is supervised, and number of\nsamples for each user, respectively). These attributes are required to have\nthese exact names. Additionally, it should be able to access the examples of a\nspecific user, whose id is passed during initialization via the `user_idx`\nargument.\n\nThe dataloader is simpler, and essentially just instantiates the dataset and\ncreates batches with a specific format.\n\n## Creating a config file\n\nAll the parameters of the experiment are passed in a YAML file. A documented\nexample is provided in `config.yaml`.\n\n## Running the experiment locally\n\nFinally, to launch the experiment, it suffices to launch the `e2e_trainer.py`\nscript using torch.distributed:\n\n`python -m torch.distributed.run --nproc_per_node=2 .\\e2e_trainer.py -dataPath experiments/ecg_cnn/data -outputPath scratch -config experiments/ecg_cnn/config.yaml -task ecg_cnn -backend nccl`\n\nThe `dataPath`, `outputPath` and `config` arguments should just specify the\nrespective files or folders, as in the example above -- in this case, a folder\ncalled `scratch` will be created containing logs and checkpoints. The task\nshould be the name of the folder inside `experiments`.\n\n## Running the experiments on Azure Machine Learning\n\nIn order to run the experiment on Azure Machine Learning, you first need to follow the steps described [here](#Experiments).\nMake sure the HDF5 dataset is uploaded, the compute has a GPU and is running, and your YAML file is properly set up. 
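Before uploading, you can sanity-check the HDF5 layout locally. The snippet below is a minimal sketch (the file name and toy shapes are made up); it writes a tiny file mirroring the layout produced by `preprocess.py` and reads it back the way `dataset.py` does:

```python
import h5py
import numpy as np

# Write a toy HDF5 file with the layout that preprocess.py produces:
# 'users', 'num_samples', 'user_data/<user>/x', 'user_data_label/<user>'.
path = 'toy_data.hdf5'  # hypothetical file name
with h5py.File(path, 'w') as f:
    f.create_dataset('users', data=['0000', '0001'])
    f.create_dataset('num_samples', data=[3, 3])
    data_group = f.create_group('user_data')
    label_group = f.create_group('user_data_label')
    for user in ('0000', '0001'):
        data_group.create_group(user).create_dataset('x', data=np.zeros((3, 187)))
        label_group.create_dataset(user, data=np.zeros(3))

# Read it back the way the experiment's Dataset class does.
with h5py.File(path, 'r') as f:
    users = [u.decode() if isinstance(u, bytes) else u for u in f['users'][()]]
    shapes = {u: f['user_data'][u]['x'].shape for u in users}

print(users)           # ['0000', '0001']
print(shapes['0000'])  # (3, 187)
```
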
An example file for running this experiment would be the following: \n\n```yaml\nexperiment_name: ecg_cnn_run \ndescription: FLUTE heartbeat dataset example \ncode:\n  local_path: .\ncompute: azureml:compute_with_gpu\nenvironment:\n  image: pytorch/pytorch:1.9.0-cuda10.2-cudnn7-devel\ninputs:\n  data:\n    folder: azureml://datastores/workspaceblobstore/paths/data\n    mode: rw_mount\ncommand: >\n  apt -y update &&\n  apt -y install openmpi-bin libopenmpi-dev openssh-client &&\n  python3 -m pip install --upgrade pip &&\n  python3 -m pip install -r requirements.txt &&\n  python -m torch.distributed.run --nproc_per_node=4 e2e_trainer.py\n  -outputPath=./outputs\n  -dataPath={inputs.data}\n  -task=ecg_cnn\n  -config=./experiments/ecg_cnn/config.yaml\n  -backend=nccl\n\n```\nTo run your job, you can then use the following command: \n`az ml job create -f ./run.yaml -w \"YourWorkspaceName\" -g \"YourResourceGroup\"`\n\nThe job should now be created and uploaded, after which it can be found in the AzureML Studio. "
  },
  {
    "path": "experiments/ecg_cnn/utils/preprocess.py",
    "content": "# Copyright (c) Microsoft Corporation.\n# Licensed under the MIT license.\n\nimport h5py\nimport time\nimport tqdm\nimport csv\nimport pandas as pd\nfrom sklearn.utils import resample\n\ndef _dump_dict_to_hdf5(data_dict: dict, hdf5_file: h5py.File):\n    '''Dump dict with expected structure to HDF5 file'''\n\n    hdf5_file.create_dataset('users', data=data_dict['users'])\n    hdf5_file.create_dataset('num_samples', data=data_dict['num_samples'])\n\n    # Store actual data in groups\n    user_data_group = hdf5_file.create_group('user_data')\n    for user, user_data in tqdm.tqdm(data_dict['user_data'].items()):\n        user_subgroup = user_data_group.create_group(user)\n        user_subgroup.create_dataset('x', data=user_data) \n\n    user_data_label_group = hdf5_file.create_group('user_data_label')\n    for user, user_data_label in tqdm.tqdm(data_dict['user_data_label'].items()):\n        user_data_label_group.create_dataset(user, data=user_data_label) \n\ndef _process_and_save_to_disk(dataset, n_users, output):\n    '''Process the dataset to expected format and save to disk'''\n\n    # Split training data equally among all users\n    total_samples = len(dataset)\n    samples_per_user = total_samples // n_users\n    assert total_samples % n_users == 0\n\n    # Function for getting a given user's data indices\n    user_idxs = lambda user_id: slice(user_id * samples_per_user, (user_id + 1) * samples_per_user)\n\n    # Convert training data to expected format\n    print('Converting data to expected format...')\n    start_time = time.time()\n\n    data_dict = {  # the data is expected to have this format\n        'users' : [f'{user_id:04d}' for user_id in range(n_users)],\n        'num_samples' : n_users * [samples_per_user],\n        'user_data' : {f'{user_id:04d}': dataset.data[user_idxs(user_id)] for user_id in range(n_users)},\n        'user_data_label': {f'{user_id:04d}': dataset.targets[user_idxs(user_id)] for user_id in range(n_users)},\n    }\n  
  print(f'Finished converting data in {time.time() - start_time:.2f}s.')\n\n    # Save training data to disk\n    print('Saving data to disk...')\n    start_time = time.time()\n\n    with h5py.File(output + '.hdf5', 'w') as hdf5_file:\n        _dump_dict_to_hdf5(data_dict=data_dict, hdf5_file=hdf5_file)\n    print(f'Finished saving data in {time.time() - start_time:.2f}s.')\n\nclass HeartDataSet:\n    def __init__(self, heartdata, cutoff):\n        self.data = [row[:187] for row in heartdata][:cutoff]\n        self.targets = [int(float(row[187])) for row in heartdata][:cutoff]\n\n    def __len__(self):\n        return len(self.data)\n\n\n# From https://www.kaggle.com/gregoiredc/arrhythmia-on-ecg-classification-using-cnn/notebook\n# Can be used to create a resampled training set with less class imbalance\ndef resampleSet(train_df): \n    train_df[187]=train_df[187].astype(float).astype(int)\n    df_1=train_df[train_df[187]==1]\n    df_2=train_df[train_df[187]==2]\n    df_3=train_df[train_df[187]==3]\n    df_4=train_df[train_df[187]==4]\n    df_0=(train_df[train_df[187]==0]).sample(n=40001,random_state=42)\n\n    df_1_upsample=resample(df_1,replace=True,n_samples=10000,random_state=123)\n    df_2_upsample=resample(df_2,replace=True,n_samples=20000,random_state=124)\n    df_3_upsample=resample(df_3,replace=True,n_samples=5000,random_state=125)\n    df_4_upsample=resample(df_4,replace=True,n_samples=20000,random_state=126)\n\n    train_df=pd.concat([df_0,df_1_upsample,df_2_upsample,df_3_upsample,df_4_upsample])\n    return train_df\n\n# Note: `resampleSet` expects a pandas DataFrame of the training data;\n# apply it before building the train HeartDataSet to reduce class imbalance.\nwith open('../data/mitbih/mitbih_test.csv') as f: \n    testset = list(csv.reader(f, delimiter=','))\nTestDataset = HeartDataSet(testset, 21000)\n_process_and_save_to_disk(TestDataset, 1000, '../data/test_data')\n\nwith open('../data/mitbih/mitbih_train.csv') as f: \n    trainset = csv.reader(f, delimiter=',')\n    trainsetlist = list(trainset) \nTrainDataset = 
HeartDataSet(trainsetlist, 87000)\n_process_and_save_to_disk(TrainDataset,1000,'../data/train_data')\n"
  },
  {
    "path": "experiments/fednewsrec/README.md",
    "content": "### Data\n\nIn order to run this experiment, you need to previously download the MIND dataset [here](https://msnews.github.io/index.html) and the glove.840B.300d embbeding vector [here](https://nlp.stanford.edu/projects/glove/). Once you have the data, make sure to replace the `root_data_path` and `embedding_path` parameters inside [dataset.py](dataloaders/dataset.py) and [configuration file](config.yaml). The preprocessing steps will be done automatically by FLUTE once the jobs is launched.\n\n### Run\n\nOnce the paths for the dataset and embedding have been updated, you can run the experiment as follows:\n\n```code\n\n    python -m torch.distributed.run  --nproc_per_node=4  e2e_trainer.py -dataPath ~/data -outputPath ~/outputTest  -config ./experiments/fednewsrec/config.yaml -task fednewsrec -backend nccl\n    \n```\n### Results\n\n- MIND_Large, 1500 rounds, 6 clients per round:\n\n|Platform|AUC|MRR|nDCG5|nDCG10|\n|:----|:----|:----|:----|:----|\n|FedNews|0.54|0.23|0.25|0.32|\n|FLUTE|0.58|0.24|0.26|0.33| \n\n"
  },
  {
    "path": "experiments/fednewsrec/config.yaml",
    "content": "# Parameters needed to initialize the model\nmodel_config:\n    model_type: FEDNEWS                                    # class w/ `loss` and `inference` methods\n    model_folder: experiments/fednewsrec/model.py     # file containing class\n    embbeding_path: /mnt/data/MIND_large\n# Configuration for differential privacy\ndp_config:\n    enable_local_dp: false                             # whether to enable user-level DP\n\n# Additional privacy metrics\nprivacy_metrics_config:\n    apply_metrics: false                               # cache data to compute additional metrics\n\n# Select the Federated optimizer to use (e.g. DGA, FedAvg or FedProx)\nstrategy: FedAvg\n\n# Determines all the server-side settings for training and evaluation rounds\nserver_config:   \n    wantRL: false                                      # whether to use RL-based meta-optimizers\n    resume_from_checkpoint: true                      # restart from checkpoint if file exists\n    do_profiling: false                                # run profiler and compute runtime metrics\n    optimizer_config:                                  # this is the optimizer used to update the model\n        type: sgd\n        lr: 1.0\n    annealing_config:                                  # annealer for the learning rate\n        type: step_lr\n        step_interval: epoch\n        gamma: 1.0\n        step_size: 100\n    val_freq: 50                                       # how many iterations between metric eval on val set\n    rec_freq: 2000                                      # how many iterations between metric eval on test set\n    initial_val: true\n    initial_rec: false\n    max_iteration: 1500                                # how many iterations in total\n    num_clients_per_iteration: 500                     # how many clients per iteration\n    data_config:                                       # where to get val and test data from\n        val:\n            batch_size: 1\n            
val_data: null                             # Assigned to null because dataset is being instantiated\n        test:\n            batch_size: 1\n            test_data: null                            # Assigned to null because dataset is being instantiated\n    type: model_optimization\n    aggregate_median: softmax                          # how aggregations weights are computed\n    initial_lr_client: 0.1                           # learning rate used on client optimizer\n    lr_decay_factor: 1.0\n    weight_train_loss: train_loss\n    best_model_criterion: auc\n    fall_back_to_best_model: false\n    softmax_beta: 1.0\n\n# Dictates the learning parameters for client-side model updates. Train data is defined inside this config.\nclient_config:\n    do_profiling: false                                # run profiling and compute runtime metrics\n    ignore_subtask: false\n    data_config:                                       # where to get training data from\n        train:\n            batch_size: 64\n            list_of_train_data: null                   # Assigned to null because dataset is being instantiated\n            desired_max_samples: 50000\n    optimizer_config:                                  # this is the optimizer used by the client\n        type: sgd\n        lr: 0.1                                      # this is overridden by `initial_lr_client`\n    type: optimization"
  },
  {
    "path": "experiments/fednewsrec/dataloaders/dataloader.py",
    "content": "# Copyright (c) Microsoft Corporation.\n# Licensed under the MIT license.\n\nimport torch\nimport numpy as np\nfrom core.dataloader import BaseDataLoader\nfrom experiments.fednewsrec.dataloaders.dataset import Dataset\n\nclass DataLoader(BaseDataLoader):\n    def __init__(self, mode, num_workers=0, **kwargs):\n        args = kwargs['args']\n        self.batch_size = args['batch_size']\n        self.mode = mode\n\n        dataset = Dataset(\n            data=kwargs['data'],\n            test_only=(not mode=='train'),\n            user_idx=kwargs.get('user_idx', None),\n        )\n\n        super().__init__(\n            dataset,\n            batch_size=self.batch_size,\n            shuffle=(mode=='train'),\n            num_workers=num_workers,\n            collate_fn=self.collate_fn,\n        )\n\n    def collate_fn(self, batch):\n        if self.mode == \"train\": # For training\n            click, sample, label = list(zip(*batch))\n            click = torch.tensor(click)\n            sample = torch.tensor(sample)\n            label = torch.tensor(label)\n            return {'x': (click, sample), 'y': label}\n\n        else: # For testing -- data format is different\n            nv_hist = torch.stack(batch[0][0]).squeeze(1) \n            nv_imp = torch.stack(batch[0][1]).squeeze(1)\n            label = batch[0][2]\n            return {'x': (nv_hist, nv_imp), 'y': label}\n\n\n\n        \n        "
  },
  {
    "path": "experiments/fednewsrec/dataloaders/dataset.py",
    "content": "# Copyright (c) Microsoft Corporation.\n# Licensed under the MIT license.\n\nimport numpy as np\nimport torch\nfrom core.dataset import BaseDataset\nfrom experiments.fednewsrec.dataloaders.preprocess_mind import MIND\n\nclass Dataset(BaseDataset):\n    def __init__(self, data, test_only=False, user_idx=0, **kwargs):\n        self.test_only = test_only\n        self.user_idx = user_idx\n\n        # Get all data\n        self.user_list, self.user_data, self.user_data_label, self.num_samples = self.load_data(data, self.test_only)\n\n        if user_idx != -1:\n            if self.test_only:  # combine all data into single array\n                self.user = 'test_only'\n                self.labels = [user_label for user_label in self.user_data_label.values()]\n                self.features_x = [user_data['x'] for user_data in self.user_data.values()]\n                self.features_y = [user_data['y'] for user_data in self.user_data.values()]\n            else:  # get a single user's data\n                if user_idx is None:\n                    raise ValueError('in train mode, user_idx must be specified')\n\n                self.user = self.user_list[user_idx]\n                self.features_x = self.user_data[self.user]['x']\n                self.features_y = self.user_data[self.user]['y']\n                self.labels = self.user_data_label[self.user]\n\n    def __getitem__(self, idx):\n        return self.features_x[idx], self.features_y[idx], self.labels[idx]\n\n    def __len__(self):\n        return len(self.features_x)\n\n    def load_data(self, data, test_only):\n        '''Wrapper method to read/instantiate the dataset'''\n\n        if data == None:\n            dataset = MIND(root_data_path=\"/mnt/data/MIND_large\", embedding_path=\"/mnt/data/MIND_large\")\n            data = dataset.testset if test_only else dataset.trainset\n        \n        users = data['users']\n        features = data['user_data']\n        labels = 
data['user_data_label']\n        num_samples = data['num_samples']\n            \n        return users, features, labels, num_samples"
  },
  {
    "path": "experiments/fednewsrec/dataloaders/preprocess_mind.py",
    "content": "from nltk.tokenize import word_tokenize\nimport random\nimport os\nimport numpy as np\nimport torch\n\nMAX_SENTENCE = 30\nMAX_ALL = 50\nnpratio = 4\n\n''' \n    Preprocessing steps for MIND dataset were taken from the FedNewsRec-EMNLP-Findings-2020\n    repository, for more information please refer to https://github.com/taoqi98/FedNewsRec/blob/master/code/preprecoess.py.\n'''\n\nclass MIND:\n    def __init__(self, root_data_path, embedding_path) :\n\n        # Preprocessing\n        news,news_index,category_dict,subcategory_dict,word_dict = read_news(root_data_path,['train','val'])\n        news_title,news_vert,news_subvert=get_doc_input(news,news_index,category_dict,subcategory_dict,word_dict)\n        title_word_embedding_matrix, have_word = load_matrix(embedding_path,word_dict)\n        train_session, train_uid_click, train_uid_table = read_clickhistory(root_data_path,'train')\n        test_session, test_uid_click,test_uid_table = read_clickhistory(root_data_path,'val')\n        train_user = parse_user(train_session,news_index)\n        test_user = parse_user(test_session,news_index)\n        train_sess, train_user_id, train_label, train_user_id_sample = get_train_input(train_session,train_uid_click,news_index)\n        test_impressions, test_userids = get_test_input(test_session,news_index)\n        get_user_data = GetUserDataFunc(news_title,train_user_id_sample,train_user,train_sess,train_label,train_user_id)\n        \n        # Return datasets\n        print(\"Preparing train datasets ...\")\n        train_dict = {'users': [], 'num_samples': [], 'user_data': dict(), 'user_data_label': dict()}\n\n        for uidx in range(50000):\n            uid = train_uid_table[uidx]\n            click, sample, label = get_user_data(uid)\n            user = str(uidx) # uid\n            train_dict['users'].append(user)\n            train_dict['num_samples'].append(len(click))\n            train_dict['user_data'][user] = {'x': click, 'y': sample}\n            
train_dict['user_data_label'][user] = label\n\n        print(\"Preparing test datasets ...\")\n        test_dict = {'users': [], 'num_samples': [], 'user_data': dict(), 'user_data_label': dict()}\n        doc_cache = []\n\n        for j in range(len(news_title)):\n            doc_cache.append(torch.from_numpy(np.array([news_title[j]])))\n\n        for i in range(len(test_impressions)):\n            docids = test_impressions[i]['docs']\n            labels = test_impressions[i]['labels']\n            nv_hist = [doc_cache[j] for j in test_user['click'][i]]\n            nv_imp = [doc_cache[j] for j in docids]\n            user = str(i)\n            test_dict['users'].append(user)\n            test_dict['num_samples'].append(len(nv_imp))\n            test_dict['user_data'][user] = {'x':nv_hist, 'y':nv_imp}\n            test_dict['user_data_label'][user] = labels\n        self.trainset = train_dict\n        self.testset = test_dict\n\ndef GetUserDataFunc(news_title,train_user_id_sample,train_user,train_sess,train_label,train_user_id):\n    def _get_user_data(uid):\n        click = []\n        sample = []\n        label = []\n        for sid in train_user_id_sample[uid]:\n            click.append(train_user['click'][train_user_id[sid]])\n            sample.append(train_sess[sid])\n            label.append(train_label[sid])\n        click = np.array(click)\n        sample = np.array(sample)\n        label = np.array(label)\n        click = news_title[click]\n        sample = news_title[sample]        \n        return click,sample,label\n    return _get_user_data\n\ndef newsample(nnn,ratio):\n    if ratio >len(nnn):\n        return random.sample(nnn*(ratio//len(nnn)+1),ratio)\n    else:\n        return random.sample(nnn,ratio)\n\ndef read_news(root_data_path,modes):\n    news={}\n    category=[]\n    subcategory=[]\n    news_index={}\n    index=1\n    word_dict={}\n    word_index=1\n    \n    for mode in modes:\n        with 
open(os.path.join(root_data_path,mode,'news.tsv'), encoding=\"utf8\") as f:\n            lines = f.readlines()\n        for line in lines:\n            splited = line.strip('\\n').split('\\t')\n            doc_id,vert,subvert,title= splited[0:4]\n            if doc_id in news_index:\n                continue\n            news_index[doc_id]=index\n            index+=1\n            category.append(vert)\n            subcategory.append(subvert)\n            title = title.lower()\n            title=word_tokenize(title)\n            news[doc_id]=[vert,subvert,title]\n            for word in title:\n                word = word.lower()\n                if not(word in word_dict):\n                    word_dict[word]=word_index\n                    word_index+=1\n    category=list(set(category))\n    subcategory=list(set(subcategory))\n    category_dict={}\n    index=1\n    for c in category:\n        category_dict[c]=index\n        index+=1\n    subcategory_dict={}\n    index=1\n    for c in subcategory:\n        subcategory_dict[c]=index\n        index+=1\n    return news,news_index,category_dict,subcategory_dict,word_dict\n\ndef get_doc_input(news,news_index,category,subcategory,word_dict):\n    news_num=len(news)+1\n    news_title=np.zeros((news_num,MAX_SENTENCE),dtype='int32')\n    news_vert=np.zeros((news_num,),dtype='int32')\n    news_subvert=np.zeros((news_num,),dtype='int32')\n    for key in news:    \n        vert,subvert,title=news[key]\n        doc_index=news_index[key]\n        news_vert[doc_index]=category[vert]\n        news_subvert[doc_index]=subcategory[subvert]\n        for word_id in range(min(MAX_SENTENCE,len(title))):\n            news_title[doc_index,word_id]=word_dict[title[word_id].lower()]\n        \n    return news_title,news_vert,news_subvert\n\n\ndef load_matrix(embedding_path,word_dict):\n    embedding_matrix = np.zeros((len(word_dict)+1,300))\n    have_word=[]\n    with open(os.path.join(embedding_path,'glove.840B.300d.txt'),'rb') as f:\n       
 while True:\n            l=f.readline()\n            if len(l)==0:\n                break\n            l=l.split()\n            word = l[0].decode()\n            if word in word_dict:\n                index = word_dict[word]\n                tp = [float(x) for x in l[1:]]\n                embedding_matrix[index]=np.array(tp)\n                have_word.append(word)\n    return embedding_matrix,have_word\n\n\ndef read_clickhistory(root_data_path,mode):\n    \n    lines = []\n    userids = {}\n    uid_table = {}\n    with open(os.path.join(root_data_path,mode,'behaviors.tsv')) as f:\n        lines = f.readlines()\n        \n    sessions = []\n    for i in range(len(lines)):\n        _,uid,_,click,imp = lines[i].strip().split('\\t')\n        true_click = click.split()\n        assert not '' in true_click\n        if not uid in userids:\n            uid_table[len(userids)] = uid\n            userids[uid] = []\n        userids[uid].append(i)\n        imp = imp.split()\n        pos = []\n        neg = []\n        for beh in imp:\n            nid, label = beh.split('-')\n            if label == '0':\n                neg.append(nid)\n            else:\n                pos.append(nid)\n        sessions.append([true_click,pos,neg])\n    return sessions,userids,uid_table\n\ndef parse_user(session,news_index):\n    user_num = len(session)\n    user={'click': np.zeros((user_num,MAX_ALL),dtype='int32'),}\n    for user_id in range(len(session)):\n        tclick = []\n        click, pos, neg =session[user_id]\n        for i in range(len(click)):\n            tclick.append(news_index[click[i]])\n        click = tclick\n\n        if len(click) >MAX_ALL:\n            click = click[-MAX_ALL:]\n        else:\n            click=[0]*(MAX_ALL-len(click)) + click\n            \n        user['click'][user_id] = np.array(click)\n    return user\n\ndef get_train_input(session,uid_click_talbe,news_index):\n    inv_table = {}\n    user_id_session = {}\n\n    for uid in uid_click_talbe:\n        
user_id_session[uid] = []\n        for v in uid_click_talbe[uid]:\n            inv_table[v] = uid\n    \n    sess_pos = []\n    sess_neg = []\n    user_id = []\n    for sess_id in range(len(session)):\n        sess = session[sess_id]\n        _, poss, negs=sess\n        for i in range(len(poss)):\n            pos = poss[i]\n            neg=newsample(negs,npratio)\n            sess_pos.append(pos)\n            sess_neg.append(neg)\n            user_id.append(sess_id)                \n            user_id_session[inv_table[sess_id]].append(len(sess_pos)-1)\n            \n    sess_all = np.zeros((len(sess_pos),1+npratio),dtype='int32')\n    label = np.zeros((len(sess_pos),1+npratio))\n    for sess_id in range(sess_all.shape[0]):\n        pos = sess_pos[sess_id]\n        negs = sess_neg[sess_id]\n        sess_all[sess_id,0] = news_index[pos]\n        index = 1\n        for neg in negs:\n            sess_all[sess_id,index] = news_index[neg]\n            index+=1\n        #index = np.random.randint(1+npratio)\n        label[sess_id,0]=1\n    user_id = np.array(user_id, dtype='int32')\n    \n    return sess_all, user_id, label, user_id_session\n\ndef get_test_input(session,news_index):\n    \n    Impressions = []\n    userid = []\n    for sess_id in range(len(session)):\n        _, poss, negs = session[sess_id]\n        imp = {'labels':[],\n                'docs':[]}\n        userid.append(sess_id)\n        for i in range(len(poss)):\n            docid = news_index[poss[i]]\n            imp['docs'].append(docid)\n            imp['labels'].append(1)\n        for i in range(len(negs)):\n            docid = news_index[negs[i]]\n            imp['docs'].append(docid)\n            imp['labels'].append(0)\n        Impressions.append(imp)\n        \n    userid = np.array(userid,dtype='int32')\n    \n    return Impressions, userid,\n\n\n\n"
  },
  {
    "path": "experiments/fednewsrec/fednewsrec_model.py",
    "content": "import torch\nimport torch.nn as nn\nimport numpy as np\n\nnpratio = 4\n\n''' \n    The FedNewsRec model is taken from FedNewsRec-EMNLP-Findings-2020 repository and ported to PyTorch\n    framework to be compatible with FLUTE (https://github.com/simra/FedNewsRec#fednewsrec-emnlp-findings-2020). \n    For more information regarding this model, please refer to https://github.com/taoqi98/FedNewsRec.\n'''\nclass AttentivePooling(nn.Module):\n    def __init__(self, dim1: int, dim2: int):\n        super(AttentivePooling, self).__init__()\n        self.dim1 = dim1\n        self.dim2 = dim2\n\n        self.dropout = nn.Dropout(0.2)\n        self.dense  = nn.Linear(dim2, 200)\n        self.tanh = nn.Tanh()\n        self.dense2 = nn.Linear(200, 1)\n        self.softmax = nn.Softmax(dim=1)\n       \n\n    def forward(self, x):\n        user_vecs = self.dropout(x)\n        user_att = self.tanh(self.dense(user_vecs))\n        user_att = self.dense2(user_att).squeeze(2)\n        user_att = self.softmax(user_att)\n        result = torch.einsum('ijk,ij->ik', user_vecs, user_att)        \n        return result\n\n    def fromTensorFlow(self, tfmodel):\n        keras_weights = tfmodel.layers[1].get_weights()\n        # print(keras_weights)\n        self.dense.weight.data = torch.tensor(keras_weights[0]).transpose(0,1).cuda()\n        self.dense.bias.data = torch.tensor(keras_weights[1]).cuda()\n\n        keras_weights = tfmodel.layers[2].get_weights()\n        # print(keras_weights)\n        self.dense2.weight.data = torch.tensor(keras_weights[0]).transpose(0,1).cuda()\n        self.dense2.bias.data = torch.tensor(keras_weights[1]).cuda()\n\nclass Attention(nn.Module):\n \n    def __init__(self, input_dim, nb_head, size_per_head, **kwargs):\n        super(Attention, self).__init__(**kwargs)\n        #self.input_shape = input_shape\n        self.input_dim = input_dim\n        self.nb_head = nb_head\n        self.size_per_head = size_per_head\n        self.output_dim = 
nb_head*size_per_head\n        #self.WQ = nn.Linear(self.input_shape[0][-1], self.output_dim, bias=False)\n        #self.WK = nn.Linear(self.input_shape[1][-1], self.output_dim, bias=False)\n        #self.WV = nn.Linear(self.input_shape[2][-1], self.output_dim, bias=False)\n        self.WQ = nn.Linear(self.input_dim, self.output_dim, bias=False)\n        self.WK = nn.Linear(self.input_dim, self.output_dim, bias=False)\n        self.WV = nn.Linear(self.input_dim, self.output_dim, bias=False)\n        torch.nn.init.xavier_uniform_(self.WQ.weight, gain=np.sqrt(2))\n        torch.nn.init.xavier_uniform_(self.WK.weight, gain=np.sqrt(2))\n        torch.nn.init.xavier_uniform_(self.WV.weight, gain=np.sqrt(2))\n        \n    def fromTensorFlow(self, tf, criteria = lambda l: l.name.startswith('attention')):\n        for l in tf.layers:\n            print(l.name, l.output_shape)\n            if criteria(l):\n                weights = l.get_weights()\n                self.WQ.weight.data = torch.tensor(weights[0].transpose()).cuda()\n                self.WK.weight.data = torch.tensor(weights[1].transpose()).cuda()\n                self.WV.weight.data = torch.tensor(weights[2].transpose()).cuda()\n                \n\n \n    def forward(self, x):\n        if len(x) == 3:\n            Q_seq,K_seq,V_seq = x\n            Q_len,V_len = None,None\n        \n        Q_seq = self.WQ(Q_seq)\n        Q_seq = torch.reshape(Q_seq, (-1, Q_seq.shape[1], self.nb_head, self.size_per_head))\n        #Q_seq = K.permute_dimensions(Q_seq, (0,2,1,3))\n        Q_seq = torch.transpose(Q_seq, 1, 2)\n        K_seq = self.WK(K_seq)\n        K_seq = torch.reshape(K_seq, (-1, K_seq.shape[1], self.nb_head, self.size_per_head))\n        K_seq = torch.transpose(K_seq, 1, 2)\n        V_seq = self.WV(V_seq)\n        V_seq = torch.reshape(V_seq, (-1, V_seq.shape[1], self.nb_head, self.size_per_head))\n        V_seq = torch.transpose(V_seq, 1, 2)\n        \n        #print('pt shapes')\n        
#print(Q_seq.shape, K_seq.shape)\n        #A = K.batch_dot(Q_seq, K_seq, axes=[3,3]) / self.size_per_head**0.5\n        A = torch.einsum('ijkl,ijml->ijkm', Q_seq, K_seq) / self.size_per_head**0.5\n        # A = K.permute_dimensions(A, (0,3,2,1))\n        # A = self.Mask(A, V_len, 'add')\n        # A = K.permute_dimensions(A, (0,3,2,1))\n        A = torch.softmax(A, dim=-1)\n        # output and mask\n        #O_seq = K.batch_dot(A, V_seq, axes=[3,2])\n        O_seq = torch.einsum('ijkl,ijlm->ijkm', A, V_seq)\n        #O_seq = K.permute_dimensions(O_seq, (0,2,1,3))\n        O_seq = torch.transpose(O_seq, 1,2)\n        #O_seq = K.reshape(O_seq, (-1, K.shape(O_seq)[1], self.output_dim))\n        O_seq = torch.reshape(O_seq, (-1, O_seq.shape[1], self.output_dim))\n        #O_seq = self.Mask(O_seq, Q_len, 'mul')\n        return O_seq\n \n\n\n\n\n\nclass Permute(nn.Module):\n    def __init__(self, *dims):\n        super(Permute, self).__init__()\n        self.dims = dims\n    \n    def forward(self, x):\n        return x.permute(*self.dims)\n\nclass SwapTrailingAxes(nn.Module):\n    def __init__(self):\n        super(SwapTrailingAxes, self).__init__()\n        \n    def forward(self, x):        \n        return x.transpose(-2, -1)\n\nclass DocEncoder(nn.Module):\n    def __init__(self):        \n        super(DocEncoder,self).__init__()\n        self.phase1 = nn.Sequential(\n            nn.Dropout(0.2),\n            # TODO: why we need the SwapTrailingAxes here?\n            SwapTrailingAxes(),            \n            nn.Conv1d(300, 400, 3),\n            nn.ReLU(),\n            nn.Dropout(0.2),\n            # TODO: seems here we swap the dimension back. 
why?\n            SwapTrailingAxes()\n        )\n\n        #self.attention = nn.MultiheadAttention(400, 20, batch_first=True)\n        self.attention = Attention(400, 20, 20)\n        # Pytorch MultiheadAttention has in_proj_weight of size (3*embed_dim, embed_dim)\n        # Thus, we need to scale the xavier by sqrt(2)\n        #torch.nn.init.xavier_uniform_(self.attention.in_proj_weight, gain=np.sqrt(2))\n        self.phase2 = nn.Sequential(\n            nn.ReLU(),\n            nn.Dropout(0.2),\n            AttentivePooling(30,400)\n        )\n\n    def fromTensorFlow(self, tfDoc):\n        print('td')\n        for l in self.phase1:\n            if 'conv' in l._get_name().lower():\n                print('conv shape:',l.weight.data.shape, l.bias.data.shape)\n            #print('\\t',[p[0] for p in l.named_parameters()])\n        \n                for lt in tfDoc.layers:\n                    print(lt.name, lt.output_shape)\n                    if 'conv' in lt.name.lower():\n                        weights = lt.get_weights()\n                        l.weight.data = torch.tensor(weights[0]).transpose(0,2).cuda()\n                        l.bias.data = torch.tensor(weights[1]).cuda()\n                        #print(len(l.get_weights()), [p.shape for p in l.get_weights()])\n                        break\n                break\n\n        #for lt in tfDoc.layers:\n        #    print('tf2')\n        #    print(lt.name, lt.output_shape)\n        #    if 'attention' in lt.name:\n        # TODO: we should just pass the specific layer\n        self.attention.fromTensorFlow(tfDoc)\n\n        print('phase2')\n        for l in self.phase2:\n            if 'attentive' in l._get_name().lower():\n                for lt in tfDoc.layers:\n                    print(lt.name)\n                    if 'model' in lt.name.lower():\n                        print('copying attentive pooling')\n                        l.fromTensorFlow(lt)\n\n        \n\n    \n    def forward(self, x):\n        # 
print(x.shape)\n        l_cnnt = self.phase1(x)\n        # print('doc_encoder:phase1',l_cnnt.shape)\n        l_cnnt = self.attention([l_cnnt]*3)\n        # print('doc_encoder:attention', l_cnnt.shape)\n        result = self.phase2(l_cnnt)\n        # print('doc_encoder:phase2', result.shape)\n        return result\n\n\nclass VecTail(nn.Module):\n    def __init__(self, n):\n        super(VecTail, self).__init__()\n        self.n = n\n\n    def forward(self, x):\n        return x[:,-self.n:,:]\n\nclass UserEncoder(nn.Module):\n    def __init__(self):        \n        super(UserEncoder,self).__init__()\n        # news_vecs_input = Input(shape=(50,400), dtype='float32')\n        #self.dropout1 = nn.Dropout(0.2)\n        #self.tail = VecTail(15)\n        #self.gru = nn.GRU(400, 400)\n        #self.attention = nn.MultiheadAttention(400, 20)\n        #self.pool = AttentivePooling(50, 400)\n        #self.attention2 = nn.MultiheadAttention(400, 20, batch_first=True)\n        self.attention2 = Attention(400, 20, 20)\n        #torch.nn.init.xavier_uniform_(self.attention2.in_proj_weight, gain=np.sqrt(2))\n        self.dropout2 = nn.Dropout(0.2)\n        self.pool2 = AttentivePooling(50, 400)\n        self.tail2 = VecTail(20)\n        #TODO: what is batch_first?\n        self.gru2 = nn.GRU(400,400, bidirectional=False, batch_first=True)\n        self.pool3 = AttentivePooling(2, 400)\n\n    def forward(self, news_vecs_input):    \n        #news_vecs =self.dropout1(news_vecs_input)\n        #gru_input = self.tail(news_vecs)\n        #vec1 = self.gru(gru_input)\n        #vecs2 = self.attention(*[news_vecs]*3)\n        #vec2 = self.pool(vecs2)\n        # print('news_vecs_input', news_vecs_input.shape)\n        user_vecs2 = self.attention2([news_vecs_input]*3)\n        user_vecs2 = self.dropout2(user_vecs2)\n        user_vec2 = self.pool2(user_vecs2)\n        # print('pool2_user_vec2', user_vec2.shape)\n        #user_vec2 = keras.layers.Reshape((1,400))(user_vec2)\n        
#user_vec2 = user_vec2.unsqueeze(1)\n\n        user_vecs1 = self.tail2(news_vecs_input)\n        # print('tail2_user_vecs1', user_vecs1.shape)\n        self.gru2.flatten_parameters()\n        user_vec1, _u_hidden = self.gru2(user_vecs1)\n        # print('gru2_user_vec1', user_vec1.shape)\n        # TODO: does this flatten the second dimension? print out the shape to check\n        user_vec1 = user_vec1[:, -1, :]\n        #user_vec1 = keras.layers.Reshape((1,400))(user_vec1)\n        #user_vec1 = user_vec1.unsqueeze(1)\n        \n        user_vecs = torch.stack([user_vec1, user_vec2], dim=1) #keras.layers.Concatenate(axis=-2)([user_vec1,user_vec2])\n        # print(user_vecs.shape)\n        vec = self.pool3(user_vecs)\n        # print(vec.shape)\n        return vec\n\n    def fromTensorFlow(self, tfU):\n        for l in tfU.layers:\n            print(l.name, l.output_shape)\n            if l.name == 'model_1':\n                self.pool2.fromTensorFlow(l)\n            elif l.name == 'model_2':\n                self.pool3.fromTensorFlow(l)\n            elif l.name=='gru_1':                              \n                print(len(l.get_weights()), [p.shape for p in l.get_weights()])\n                weights = l.get_weights()\n                for p in self.gru2.named_parameters():\n                    s1 = p[1].data.shape\n                    if p[0] == 'weight_ih_l0':                        \n                        p[1].data = torch.tensor(weights[0]).transpose(0,1).contiguous().cuda()\n                    elif p[0] == 'weight_hh_l0':\n                        p[1].data = torch.tensor(weights[1]).transpose(0,1).contiguous().cuda()\n                    elif p[0] == 'bias_ih_l0':\n                        p[1].data = torch.tensor(weights[2]).cuda()\n                    elif p[0] == 'bias_hh_l0':\n                        p[1].data = torch.zeros(p[1].data.shape).cuda()\n                    print(p[0], s1, p[1].shape)\n        self.attention2.fromTensorFlow(tfU)\n        # 
TODO: GRU\n        \n            \n\n\nclass TimeDistributed(nn.Module):    \n    def __init__(self, module): #, batch_first=False):\n        super(TimeDistributed, self).__init__()\n        self.module = module\n        # self.batch_first = batch_first\n\n    def forward(self, x):\n        # print('TimeDist_x',x.size())\n        if len(x.size()) <= 2:\n            return self.module(x)\n\n        output = torch.tensor([]).cuda(x.get_device())\n        for i in range(x.size(1)):\n          output_t = self.module(x[:, i, :, :])\n          output_t  = output_t.unsqueeze(1)\n          output = torch.cat((output, output_t ), 1)\n          # print('TimeDist_output', output.size())\n        return output\n        # # Squash samples and timesteps into a single axis\n        # x_reshape = x.contiguous().view(x.size(0), -1, x.size(-1))  # (samples * timesteps, input_size)\n        #print('TimeDist_x_reshape',x_reshape.shape)\n        # y = self.module(x_reshape)\n        # print('TimeDist_y', y.shape)\n        # # We have to reshape Y\n        # if self.batch_first:\n        #     y = y.contiguous().view(x.size(0), -1, y.size(-1))  # (samples, timesteps, output_size)\n        # else:\n        #    y = y.view(-1, x.size(1), y.size(-1))  # (timesteps, samples, output_size)\n        # print('TimeDist_y_reshape',y.size())\n        #return y\n\nclass FedNewsRec(nn.Module):\n    def __init__(self, title_word_embedding_matrix):\n        super(FedNewsRec, self).__init__()\n        self.doc_encoder = DocEncoder() \n        self.user_encoder = UserEncoder()\n        self.title_word_embedding_layer = nn.Embedding.from_pretrained(torch.tensor(title_word_embedding_matrix, dtype=torch.float), freeze=True)\n    \n        # click_title = Input(shape=(50,30),dtype='int32')\n        # can_title = Input(shape=(1+npratio,30),dtype='int32')\n    \n        self.softmax = nn.Softmax(dim=1)\n        self.click_td = TimeDistributed(self.doc_encoder) #, batch_first=True)\n        self.can_td = 
TimeDistributed(self.doc_encoder) #, batch_first=True)\n        \n    def forward(self, click_title, can_title):\n        click_word_vecs = self.title_word_embedding_layer(click_title)\n        # print('click', click_word_vecs.shape, click_word_vecs.type)\n        can_word_vecs = self.title_word_embedding_layer(can_title)\n        # print('can', can_word_vecs.shape, can_word_vecs.type)\n        click_vecs = self.click_td(click_word_vecs)\n        # print('click_vecs (None, 50, 400)', click_vecs.shape)\n        can_vecs = self.can_td(can_word_vecs)\n        # print('can_vecs (None, 5, 400)', can_vecs.shape)\n    \n        user_vec = self.user_encoder(click_vecs)        \n        # print('user_vec (None, 400)', user_vec.shape)\n        # TODO verify\n        scores = torch.einsum('ijk,ik->ij',  can_vecs, user_vec)\n        #if verbose:            \n        #    print('model scores:', scores.detach().cpu().numpy())\n        # print('scores  (None, 5)', scores.shape)\n        #logits = self.softmax(scores)     \n        # pytorch crossentropyloss function accepts unnormalized scores.\n        logits = scores\n        # print('logits  (None, 5)', logits.shape)\n        \n        #news_word_vecs = self.title_word_embedding_layer(news_input)\n        #news_vec = self.doc_encoder(news_word_vecs)\n        \n        # print('user_vec', user_vec.shape)\n        # print('news_vec', news_vec.shape)        \n        return logits, user_vec #, news_vec\n\n    def news_encoder(self, news_title):\n        news_word_vecs = self.title_word_embedding_layer(news_title)\n        news_vec = self.doc_encoder(news_word_vecs)\n        return news_vec\n"
  },
  {
    "path": "experiments/fednewsrec/model.py",
    "content": "import os\nimport torch\nfrom torch.nn import CrossEntropyLoss\nfrom torch.nn import functional as F\nimport numpy as np\nfrom sklearn.metrics import roc_auc_score\nfrom nltk.tokenize import word_tokenize\n\nfrom core.model import BaseModel\nfrom experiments.fednewsrec.utils import ndcg_score, mrr_score\nfrom experiments.fednewsrec.fednewsrec_model import FedNewsRec\n\n''' \n    The FedNewsRec model is taken from FedNewsRec-EMNLP-Findings-2020 repository and ported to PyTorch\n    framework to be compatible with FLUTE (https://github.com/simra/FedNewsRec#fednewsrec-emnlp-findings-2020). \n    For more information regarding this model, please refer to https://github.com/taoqi98/FedNewsRec.\n'''\n\nclass FEDNEWS(BaseModel):\n    '''This is a PyTorch model with some extra methods'''\n\n    def __init__(self, model_config):\n        super().__init__()\n\n        root_data_path = model_config['embbeding_path']\n        embedding_path = model_config['embbeding_path']\n\n        news,news_index,category_dict,subcategory_dict,word_dict = self.read_news(root_data_path,['train','val'])\n        title_word_embedding_matrix, _ = self.load_matrix(embedding_path,word_dict)\n        self.net = FedNewsRec(title_word_embedding_matrix)\n\n    def loss(self, input: torch.Tensor) -> torch.Tensor:\n        '''Performs forward step and computes the loss'''\n\n        if not self.net.training:\n            return torch.tensor(0) # Not using the loss during evaluation\n            \n        device = 'cuda' if torch.cuda.is_available() else 'cpu'\n        (click, sample), label = input['x'], input['y']\n        click = click.to(device)\n        sample = sample.to(device)\n        label = label.to(device)\n        criterion = CrossEntropyLoss()\n        output, _ = self.net.forward(click, sample)\n        return criterion(output, label)\n\n    def inference(self, input):\n        '''Performs forward step and computes metrics'''\n        device = 'cuda' if 
torch.cuda.is_available() else 'cpu'\n        (nv_hist, nv_imp), labels = input['x'], input['y']\n        nv_hist = nv_hist.to(device)\n        nv_imp = nv_imp.to(device)\n\n        nv = self.net.news_encoder(nv_imp).detach().cpu().numpy()  # news vector?\n        nv_hist = self.net.news_encoder(nv_hist)\n        uv = self.net.user_encoder(nv_hist.unsqueeze(0)).detach().cpu().numpy()[0] # user vector?\n\n        score = np.dot(nv,uv)\n        auc = roc_auc_score(labels,score)\n        mrr = mrr_score(labels,score)\n        acc = ndcg_score(labels,score,k=1)\n        ndcg5 = ndcg_score(labels,score,k=5)\n        ndcg10 = ndcg_score(labels,score,k=10)\n\n        return {'output':None, 'acc': acc, 'batch_size': 1, \\\n                'auc': {'value':auc,'higher_is_better': True},\n                'mrr': {'value':mrr,'higher_is_better': True},\n                'ndcg5': {'value':ndcg5,'higher_is_better': True},\n                'ndcg10': {'value':ndcg10,'higher_is_better': True}} \n\n    def read_news(self, root_data_path, modes):\n        news={}\n        category=[]\n        subcategory=[]\n        news_index={}\n        index=1\n        word_dict={}\n        word_index=1\n        \n        for mode in modes:\n            with open(os.path.join(root_data_path,mode,'news.tsv'), encoding=\"utf8\") as f:\n                lines = f.readlines()\n            for line in lines:\n                splited = line.strip('\\n').split('\\t')\n                doc_id,vert,subvert,title= splited[0:4]\n                if doc_id in news_index:\n                    continue\n                news_index[doc_id]=index\n                index+=1\n                category.append(vert)\n                subcategory.append(subvert)\n                title = title.lower()\n                title=word_tokenize(title)\n                news[doc_id]=[vert,subvert,title]\n                for word in title:\n                    word = word.lower()\n                    if not(word in word_dict):\n          
              word_dict[word]=word_index\n                        word_index+=1\n        category=list(set(category))\n        subcategory=list(set(subcategory))\n        category_dict={}\n        index=1\n        for c in category:\n            category_dict[c]=index\n            index+=1\n        subcategory_dict={}\n        index=1\n        for c in subcategory:\n            subcategory_dict[c]=index\n            index+=1\n        return news,news_index,category_dict,subcategory_dict,word_dict\n    \n    def load_matrix(self, embedding_path,word_dict):\n        embedding_matrix = np.zeros((len(word_dict)+1,300))\n        have_word=[]\n        with open(os.path.join(embedding_path,'glove.840B.300d.txt'),'rb') as f:\n            while True:\n                l=f.readline()\n                if len(l)==0:\n                    break\n                l=l.split()\n                word = l[0].decode()\n                if word in word_dict:\n                    index = word_dict[word]\n                    tp = [float(x) for x in l[1:]]\n                    embedding_matrix[index]=np.array(tp)\n                    have_word.append(word)\n        return embedding_matrix,have_word\n            "
  },
  {
    "path": "experiments/fednewsrec/utils.py",
    "content": "import numpy as np\n\ndef mrr_score(y_true, y_score):\n    order = np.argsort(y_score)[::-1]\n    y_true = np.take(y_true, order)\n    rr_score = y_true / (np.arange(len(y_true)) + 1)\n    return np.sum(rr_score) / np.sum(y_true)\n\ndef ndcg_score(y_true, y_score, k=10):\n    best = dcg_score(y_true, y_true, k)\n    actual = dcg_score(y_true, y_score, k)\n    return actual / best\n\ndef dcg_score(y_true, y_score, k=10):\n    order = np.argsort(y_score)[::-1]\n    y_true = np.take(y_true, order[:k])\n    gains = 2 ** y_true - 1\n    discounts = np.log2(np.arange(len(y_true)) + 2)\n    return np.sum(gains / discounts)"
  },
  {
    "path": "experiments/mlm_bert/README.md",
    "content": "# Simple example of a MLM task on Reddit Dataset\n\nInstructions on how to run the experiment are given below.\n\n## Preparing the data\n\nFor this experiment, we can create a dummy dataset by running the\nscript located in `testing/create_data.py` as follows:\n\n```code\n    python create_data.py --task mlm_bert\n```\n\nA couple of scripts are provided in `utils/preprocessing` for preprocessing .tsv files\nin case you want to use your own data.\n\n## Creating a config file\n\nAll the parameters of the experiment are passed in a YAML file. An example is\nprovided in `configs/hello_world_mlm_bert_json.yaml` with the suggested parameters\nfor a simple run of this experiment. Make sure to point your training files at\nthe fields `list_of_train_data`, `test_data` and `val_data` inside the config file.\n\n## Running the experiment locally\n\nFinally, to launch the experiment, run the `e2e_trainer.py`\nscript using torch.distributed:\n\n```code\n    python -m torch.distributed.run --nproc_per_node=2 .\\e2e_trainer.py -dataPath data_folder -outputPath scratch -config configs\\hello_world_mlm_bert_json.yaml -task mlm_bert -backend nccl\n```\n\nFor submitting jobs in Azure ML, we have included the instructions in the `Experiments`\nsection of the main `README.md`."
  },
  {
    "path": "experiments/mlm_bert/config.py",
    "content": "from __future__ import annotations\nfrom dataclasses import dataclass\nimport sys\nsys.path.append('../../')\nfrom core.config import ModelConfig, Config, from_dict\n\n\n@dataclass\nclass BERTModelConfig(Config):\n    \"\"\"BERT model configuration\n\nThe BERT configuration specifies huggingface-specific BERT model settings.\n\nAttributes:\n    model_name (str): The name of the BERT model, e.g. bert-base-uncased.\n\n    cache_dir (str): Tokenizer cache directory; it will be created if it doesn't exist.\n\n    use_fast_tokenizer (bool): Whether to use the fast tokenizer.\n\n    mask_token (str): Special token to use for masking.\n\n    task (str): The task to use for BERT, e.g. mlm.\n\n    past_index (int): The index of the past state in the BERT model's state dict.\n\n    prediction_loss_only (bool): If False, also produce metrics for predictions and labels.\n\n    process_line_by_line (bool): If True, process the input line by line.\n\nToDo:\n    * check how cache_dir is used- there's a risk of multiple processes reading/writing at the same time.\n    * verify the meaning of past_index (thanks copilot)\n    * document the difference when process_line_by_line is True vs False\n\n    \"\"\"\n    model_name: str = None\n    cache_dir: str = None\n    use_fast_tokenizer: bool = False\n    mask_token: str = '<mask>'\n    task: str = 'mlm'\n    past_index: int | None = -2\n    prediction_loss_only: bool = False\n    process_line_by_line: bool = False\n\n    @staticmethod\n    def from_dict(config) -> BERTModelConfig:\n        return from_dict(BERTModelConfig, config)\n\n\n@dataclass\nclass BERTTrainingConfig(Config):\n    \"\"\"BERT training configuration\n\n    Configuration settings for BERT training.\n\n    Attributes:\n        seed (int): Random seed for reproducibility.\n\n        label_smoothing_factor (float): Label smoothing factor; label smoothing is applied when the factor is non-zero.\n\n        batch_size (int): Batch size.\n\n        max_seq_length (int): Maximum input sequence length.\n    \"\"\"\n    seed: int | None = None\n    label_smoothing_factor: float | None = None\n    batch_size: int | None = None\n    max_seq_length: int | None = None\n\n    @staticmethod\n    def from_dict(config) -> BERTTrainingConfig:\n        return from_dict(BERTTrainingConfig, config)\n\n\n@dataclass\nclass BERTSpecificConfig(Config):\n    \"\"\"BERT configuration\n\n    Specifies the model and training configuration for huggingface modeling scenarios.\n\n    Attributes:\n        loader_type (str): Loader type hint, e.g. 'text'.\n\n        model (BERTModelConfig): BERT model configuration.\n\n        training (BERTTrainingConfig): BERT training configuration.\n    \"\"\"\n    loader_type: str = None\n    model: BERTModelConfig = None\n    training: BERTTrainingConfig = None\n\n    @staticmethod\n    def from_dict(config) -> BERTSpecificConfig:\n        result = BERTSpecificConfig()\n        for k in config:\n            if k == 'model':\n                result.model = BERTModelConfig.from_dict(config[k])\n            elif k == 'training':\n                result.training = BERTTrainingConfig.from_dict(config[k])\n            else:\n                setattr(result, k, config[k])\n        return result\n\n\n@dataclass\nclass BERTConfig(ModelConfig):\n    \"\"\"\n    Expected MLM config wraps the BERTSpecificConfig as a sub-field of the ModelConfig.\n    \"\"\"\n    BERT: BERTSpecificConfig = None\n\n    @staticmethod\n    def from_dict(config) -> ModelConfig:\n        result = BERTConfig()\n        for k in config:\n            if k == \"BERT\":\n                # Delegate to BERTSpecificConfig; recursing into BERTConfig here was a bug.\n                result.BERT = BERTSpecificConfig.from_dict(config[k])\n            else:\n                setattr(result, k, config[k])\n        return result\n"
  },
  {
    "path": "experiments/mlm_bert/dataloaders/dataloader.py",
    "content": "# Copyright (c) Microsoft Corporation.\n# Licensed under the MIT license.\n\nfrom transformers.data.data_collator import default_data_collator, DataCollatorWithPadding\nfrom torch.utils.data import RandomSampler, SequentialSampler\nfrom transformers import AutoTokenizer\nfrom transformers import DataCollatorForLanguageModeling\nfrom experiments.mlm_bert.dataloaders.dataset import Dataset\nfrom core.dataloader import BaseDataLoader\nfrom utils import print_rank\nimport logging\n\nclass DataLoader(BaseDataLoader):\n    \"\"\"\n    PyTorch dataloader for loading text data from\n    text_dataset.\n    \"\"\"\n    def __init__(self, mode, data, num_workers=0, **kwargs):\n\n        args = kwargs['args']\n        task = args['task']\n        user_idx = kwargs['user_idx']\n        mlm_probability = args['mlm_probability']\n        self.batch_size = args['batch_size']\n        self.mode = mode\n        self.num_workers = num_workers\n        self.utt_ids = None\n        max_samples_per_user = args.get('max_samples_per_user', -1)\n        min_words_per_utt = args.get('min_words_per_utt', 5)\n        tokenizer_kwargs = {\n                            \"cache_dir\": args['cache_dir'],\n                            \"use_fast\": args['tokenizer_type_fast'],\n                            \"use_auth_token\": None\n                        }\n\n        if 'tokenizer_name' in args:\n            tokenizer = AutoTokenizer.from_pretrained(args['tokenizer_name'], **tokenizer_kwargs)\n        elif 'model_name_or_path' in args:\n            tokenizer = AutoTokenizer.from_pretrained(args['model_name_or_path'], **tokenizer_kwargs)\n        else:\n            raise ValueError(\"You are instantiating a new tokenizer from scratch. This is not supported by this script.\")\n\n        print_rank(\"Tokenizer is: {}\".format(tokenizer), loglevel=logging.DEBUG)\n\n        dataset = Dataset(\n                                data,\n                                args=args,\n                                test_only=self.mode != 'train',  # use != for string comparison, not 'is not'\n                                tokenizer=tokenizer,\n                                user_idx=user_idx,\n                                max_samples_per_user=max_samples_per_user,\n                                min_words_per_utt=min_words_per_utt,\n                              )\n        self.utt_ids = dataset.user\n\n        try:\n            data_collator = DataCollatorForLanguageModeling(\n                                                    tokenizer=tokenizer,\n                                                    mlm=(task == 'mlm'),\n                                                    mlm_probability=mlm_probability,)\n        except Exception:\n            print('There is an issue with the DataCollator .. Falling back to default_data_collator')\n            data_collator = default_data_collator if tokenizer is None else DataCollatorWithPadding(tokenizer)\n\n        if self.mode == 'train':\n            train_sampler = RandomSampler(dataset)\n            super(DataLoader, self).__init__(\n                                            dataset,\n                                            batch_size=self.batch_size,\n                                            sampler=train_sampler,\n                                            collate_fn=data_collator,\n                                            drop_last=False,\n                                            num_workers=self.num_workers,\n                                            pin_memory=True,\n                                            )\n\n        elif self.mode == 'val' or self.mode == 'test':\n            eval_sampler = SequentialSampler(dataset)\n            super(DataLoader, self).__init__(\n                                            dataset,\n                                            sampler=eval_sampler,\n                                            batch_size=self.batch_size,\n                                            collate_fn=data_collator,\n                                            drop_last=False,\n                                            num_workers=self.num_workers,\n                                            pin_memory=True,\n                                            )\n\n        else:\n            raise ValueError(\"Invalid 'mode' parameter: expected 'train', 'val' or 'test', got '{}'\".format(self.mode))\n\n    def get_user(self):\n        return self.utt_ids\n"
  },
  {
    "path": "experiments/mlm_bert/dataloaders/dataset.py",
    "content": "# Copyright (c) Microsoft Corporation.\n# Licensed under the MIT license.\n\nfrom core.dataset import BaseDataset\nfrom transformers import AutoTokenizer\nfrom utils import print_rank\nimport logging\nimport json\nimport itertools\n\nclass Dataset(BaseDataset):\n    \"\"\"\n    Map a text source to the target text\n    \"\"\"\n\n    def __init__(self, data, args, tokenizer=None, test_only=False, user_idx=0, max_samples_per_user=-1, min_words_per_utt=5, **kwargs):\n\n        self.utt_list = list()\n        self.test_only = test_only\n        self.padding = args.get('padding', True)\n        self.max_seq_length = args['max_seq_length']\n        self.max_samples_per_user = max_samples_per_user\n        self.min_num_words = min_words_per_utt\n        self.process_line_by_line = args.get('process_line_by_line', False)\n        self.user = None\n\n        if tokenizer is not None:\n            self.tokenizer = tokenizer\n        else:\n            tokenizer_kwargs = {\n                    \"cache_dir\": args['cache_dir'],\n                    \"use_fast\": args['tokenizer_type_fast'],\n                    \"use_auth_token\": None\n                }\n\n            if 'tokenizer_name' in args:\n                self.tokenizer = AutoTokenizer.from_pretrained(args['tokenizer_name'], **tokenizer_kwargs)\n            elif 'model_name_or_path' in args:\n                self.tokenizer = AutoTokenizer.from_pretrained(args['model_name_or_path'], **tokenizer_kwargs)\n            else:\n                raise ValueError(\"You are instantiating a new tokenizer from scratch. This is not supported by this script.\")\n\n        if self.max_seq_length is None:\n            self.max_seq_length = self.tokenizer.model_max_length\n            if self.max_seq_length > 512:\n                print_rank(\n                    f\"The tokenizer picked seems to have a very large `model_max_length` ({self.tokenizer.model_max_length}). \"\n                    \"Picking 512 instead. You can change that default value by passing --max_seq_length xxx.\", loglevel=logging.DEBUG\n                )\n                self.max_seq_length = 512\n        else:\n            if self.max_seq_length > self.tokenizer.model_max_length:\n                print_rank(\n                    f\"The max_seq_length passed ({self.max_seq_length}) is larger than the maximum length for the \"\n                    f\"model ({self.tokenizer.model_max_length}). Using max_seq_length={self.tokenizer.model_max_length}.\", loglevel=logging.DEBUG\n                )\n            self.max_seq_length = min(self.max_seq_length, self.tokenizer.model_max_length)\n\n        self.load_data(data, user_idx)\n\n        if user_idx != -1: # Avoid loading unnecessary data in memory before training\n            if not self.process_line_by_line:\n                self.post_process_list()\n\n\n    def __len__(self):\n        return len(self.utt_list)\n\n    def __getitem__(self, idx):\n        # Find the index in the available data\n        if self.process_line_by_line:\n            tokenized_text = LineByLineTextDataset(\n                                tokenizer=self.tokenizer,\n                                input_lines=self.utt_list[idx]['src_text'],\n                                line_by_line=True,\n                                truncation=True,\n                                max_length=self.max_seq_length,\n                                padding=\"max_length\")\n\n            self.utt_list[idx]['duration'] = len(tokenized_text['input_ids'])\n            return tokenized_text\n        else:\n            return self.utt_list[idx]\n\n\n    def load_data(self, orig_strct, user_idx):\n        \"\"\"Reads the data for a specific user (unless it's for val/testing) and returns a\n        list of embeddings and targets.\"\"\"\n\n        if isinstance(orig_strct, str):\n            print('Loading json-file: ', orig_strct)\n            with open(orig_strct, 'r') as fid:\n                orig_strct = json.load(fid)\n\n        self.user_list  = orig_strct['users']\n        self.num_samples = orig_strct['num_samples']\n        self.user_data  = orig_strct['user_data']\n\n        if user_idx != -1: # Avoid loading unnecessary data in memory before training\n            if self.test_only:\n                self.user = 'test_only'\n                self.process_x(self.user_data)\n            else:\n                self.user = self.user_list[user_idx]\n                self.process_x(self.user_data[self.user])\n\n    def process_x(self, raw_x_batch):\n\n        if self.test_only:\n            for i, user in enumerate(self.user_list):\n                counter = self.process_user(user, raw_x_batch[user])\n                self.num_samples[i] = counter # Update userdata counter \"num_samples[user]\" after truncation\n        else:\n            counter = self.process_user(self.user, raw_x_batch)\n            self.num_samples[self.user_list.index(self.user)] = counter # Update userdata counter \"num_samples[user]\" after truncation\n\n        if len(self.utt_list) == 0:\n            self.utt_list = [{'src_text': 'N/A', 'duration': 0, 'loss_weight': 1.0}]\n\n        print_rank('Processing json-structure for User: {} Utterances Processed: {}'.format(self.user, len(self.utt_list)), loglevel=logging.DEBUG)\n\n    def process_user(self, user, user_data):\n        counter = 0\n        for line in user_data:\n            for e in line:\n                if len(e.split()) < self.min_num_words:\n                    continue\n                if self.max_samples_per_user > -1 and counter >= self.max_samples_per_user:\n                    print_rank('Max allowed size per user is reached for user: {},  N: {} utts,  Utt_list Len: {}' \\\n                               .format(user, counter, len(self.utt_list)), loglevel=logging.DEBUG)\n                    return counter\n                counter += 1\n\n                utt = {}\n                utt['src_text'] = e\n                utt['duration'] = len(e.split())\n                utt['loss_weight'] = 1.0\n                self.utt_list.append(utt)\n        return counter\n\n\n    def post_process_list(self):\n\n        # Use only the text part of the dataset\n        input_lines = [line['src_text'] for line in self.utt_list]\n\n        # Process all lines of text\n        print_rank('Tokenizing {} Utterances'.format(len(input_lines)), loglevel=logging.DEBUG)\n        self.utt_list = LineByLineTextDataset(self.tokenizer, input_lines) # this one has return_special_tokens_mask as True\n\n        def group_texts(examples):\n            \"\"\"Main data processing function that will concatenate all texts\n            from our dataset and generate chunks of max_seq_length.\"\"\"\n\n            print_rank('Concatenating Frames in Sequences of {} samples'.format(self.max_seq_length), loglevel=logging.DEBUG)\n\n            if self.padding: # Padding last frame\n\n                total_length = sum([len(k) for k in examples['input_ids']])\n                print_rank('Found {} samples Before Concatenation'.format(total_length), loglevel=logging.DEBUG)\n                padN = self.max_seq_length - (total_length % self.max_seq_length)\n                print_rank('Padding last frame with {} samples'.format(padN), loglevel=logging.DEBUG)\n                print_rank('keys {}'.format(examples.keys()), loglevel=logging.DEBUG)\n                examples['input_ids'].append([self.tokenizer.convert_tokens_to_ids(self.tokenizer.pad_token)]*padN)\n                examples['attention_mask'].append([0]*padN)\n\n                if 'special_tokens_mask' in examples.keys():\n                    examples['special_tokens_mask'].append([1]*padN)\n\n                if 'token_type_ids' in examples.keys():\n                    examples['token_type_ids'].append([0]*padN)\n\n            # Concatenate all input.\n            concatenated_examples = {k: list(itertools.chain.from_iterable(examples[k])) for k in examples.keys()}\n            total_length = len(concatenated_examples[list(examples.keys())[0]])\n            print_rank('Concatenated in {} Samples'.format(total_length), loglevel=logging.DEBUG)\n            total_length = (total_length // self.max_seq_length) * self.max_seq_length\n            print_rank('Concatenated in {} Frames'.format(total_length // self.max_seq_length), loglevel=logging.DEBUG)\n\n            # Split by chunks of max_len\n            self.utt_list = []\n            for i in range(0, total_length, self.max_seq_length):\n                utt = {}\n                for k, t in concatenated_examples.items():\n                    utt[k] = t[i : i + self.max_seq_length]\n                self.utt_list.append(utt)\n                print_rank('Utterance Len is: {}'.format(len(utt['input_ids'])), loglevel=logging.DEBUG)\n\n        # Process list of text\n        group_texts(self.utt_list)\n\n        total_length = len(self.utt_list)\n        print_rank('Finished Reshaping in Sequences of {} Frames'.format(total_length), loglevel=logging.INFO)\n\n        # Update userdata after truncation\n        if not self.test_only:\n            self.num_samples[self.user_list.index(self.user)] = total_length\n\n        # Not used anywhere but necessary when the dataset is initiated\n        if total_length == 0:\n            self.utt_list = [{\"input_ids\": [0, 2], \"special_tokens_mask\": [1, 1], \"attention_mask\": [0, 0]}]\n\ndef LineByLineTextDataset(tokenizer, input_lines, truncation=True, max_length=512, padding=False, line_by_line=False):\n\n    if input_lines == ['N/A']:\n        batch_encoding = {\"input_ids\": [[0, 2]], \"special_tokens_mask\": [[1, 1]], \"attention_mask\": [[0, 0]]}\n    else:\n        lines = [line for line in input_lines if (len(line) > 0 and not line.isspace())]\n        print_rank('padding is: ' + str(padding), loglevel=logging.DEBUG)\n        print_rank('max_length is: ' + str(max_length), loglevel=logging.DEBUG)\n        batch_encoding = tokenizer(lines, truncation=truncation, max_length=max_length, padding=padding, return_special_tokens_mask=True,)\n    if line_by_line:\n        batch_encoding[\"input_ids\"] = batch_encoding[\"input_ids\"][0]\n        batch_encoding[\"special_tokens_mask\"] = batch_encoding[\"special_tokens_mask\"][0]\n        batch_encoding[\"attention_mask\"] = batch_encoding[\"attention_mask\"][0]\n\n    return batch_encoding"
  },
  {
    "path": "experiments/mlm_bert/model.py",
    "content": "# Copyright (c) Microsoft Corporation.\n# Licensed under the MIT license.\n\nimport torch as T\nfrom utils import print_rank\nimport logging\nimport copy\nfrom typing import (Dict,\n                    List,\n                    Optional,\n                    Tuple,\n                    Union)\n\nfrom experiments.mlm_bert.utils.trainer_pt_utils import (\n    LabelSmoother,\n    DistributedTensorGatherer,\n    nested_concat,\n    nested_detach,\n    nested_numpify,\n)\n\nfrom experiments.mlm_bert.utils.trainer_utils import (\n    EvalPrediction,\n    ComputeMetrics)\n\nfrom transformers import (\n                    MODEL_FOR_MASKED_LM_MAPPING,\n                    AutoConfig,\n                    AutoModelForMaskedLM,\n                    AutoTokenizer,\n                    set_seed,\n)\nfrom utils.utils import to_device\nfrom core.model import BaseModel\n\nMODEL_CONFIG_CLASSES = list(MODEL_FOR_MASKED_LM_MAPPING.keys())\nMODEL_TYPES = tuple(conf.model_type for conf in MODEL_CONFIG_CLASSES)\n\nclass BERT(BaseModel):\n    def __init__(self, model_config, **kwargs):\n        super(BERT, self).__init__()\n        \"\"\"\n            from transformers import RobertaConfig\n            config = RobertaConfig(\n                        vocab_size=52_000,\n                        max_position_embeddings=514,\n                        num_attention_heads=12,\n                        num_hidden_layers=6,\n                        type_vocab_size=1,\n            )\n\n            from transformers import RobertaTokenizerFast\n            tokenizer = RobertaTokenizerFast.from_pretrained(\"./EsperBERTo\", max_len=512)\n\n            from transformers import RobertaForMaskedLM\n            model = RobertaForMaskedLM(config=config)\n        \"\"\"\n\n        # Extracting model_config['BERT']\n        args = model_config['BERT']\n        # Split data into smaller configuration parameters\n        model_args, training_args = args['model'], args['training']\n\n        # Set seed before initializing model.\n        set_seed(training_args['seed'])\n\n        self.gradient_accumulation_steps = model_args.get('gradient_accumulation_steps', 1)\n        self.past_index = model_args.get('past_index', -1)\n        self.prediction_loss_only = model_args.get('prediction_loss_only', True)\n        self.eval_accumulation_steps = model_args.get('eval_accumulation_steps', None)\n        self.label_names = model_args.get('label_names', None)\n        self.batch_size = training_args['batch_size']\n        self.model_name = model_args['model_name']\n\n        if 'model_name_or_path' not in model_args:\n            model_args['model_name_or_path'] = self.model_name\n\n        # Label smoothing\n        if training_args['label_smoothing_factor'] != 0:\n            self.label_smoother = LabelSmoother(epsilon=training_args['label_smoothing_factor'])\n        else:\n            self.label_smoother = None\n        self.label_names = [\"labels\"] if self.label_names is None else self.label_names\n\n        config_kwargs = {\n                        \"cache_dir\": model_args['cache_dir'],\n                        \"revision\": None,\n                        \"use_auth_token\": None,\n                    }\n\n        if 'config_name' in model_args:\n            config = AutoConfig.from_pretrained(model_args['config_name'], **config_kwargs)\n        elif 'model_name_or_path' in model_args:\n            config = AutoConfig.from_pretrained(model_args['model_name_or_path'], **config_kwargs)\n        else:\n            raise ValueError(\n                \"You are instantiating a new configuration from scratch. This is not supported by this script.\"\n            )\n\n        tokenizer_kwargs = {\n                            \"cache_dir\": model_args['cache_dir'],\n                            \"use_fast\": model_args['use_fast_tokenizer'],\n                            \"use_auth_token\": None,\n                        }\n        if 'tokenizer_name' in model_args:\n            tokenizer = AutoTokenizer.from_pretrained(model_args['tokenizer_name'], **tokenizer_kwargs)\n        elif 'model_name_or_path' in model_args:\n            print('Loading Tokenizer from Pretrained: {}'.format(model_args['model_name_or_path']))\n            tokenizer = AutoTokenizer.from_pretrained(model_args['model_name_or_path'], **tokenizer_kwargs)\n        else:\n            raise ValueError(\n                \"You are instantiating a new tokenizer from scratch. This is not supported by this script.\"\n            )\n        self.output_layer_size = len(tokenizer)\n\n        if 'model_name_or_path' in model_args:\n            print('Loading Model from Pretrained: {}'.format(model_args['model_name_or_path']))\n            self.model = AutoModelForMaskedLM.from_pretrained(\n                                                    model_args['model_name_or_path'],\n                                                    from_tf=False,\n                                                    config=config,\n                                                    cache_dir=model_args['cache_dir'],\n                                                    use_auth_token=None,\n                                                )\n            if model_args.get('adapter', False):\n                self.model.add_adapter(\"FLUTE\")\n\n                # Activate the adapter\n                self.model.train_adapter(\"FLUTE\")\n\n        else:\n            raise ValueError(\n                \"You are instantiating a new model from scratch. This is not supported by this script.\"\n            )\n        self.model.resize_token_embeddings(self.output_layer_size)\n        total_params = 0\n        trainable_params = 0\n\n        for p in self.model.parameters():\n            total_params += p.numel()\n            if p.requires_grad:\n                trainable_params += p.numel()\n\n        print_rank(f\"Total parameters count: {total_params}\", loglevel=logging.DEBUG) # ~109M\n        print_rank(f\"Trainable parameters count: {trainable_params}\", loglevel=logging.DEBUG) # ~1M with adapters\n        print_rank(f\"Original Bert parameters count: {total_params-trainable_params}\", loglevel=logging.DEBUG) # ~108M\n\n\n    def copy_state_dict(self, state_dict):\n        # Load a copy of the given weights; assigning to self.model.state_dict would shadow the method.\n        self.model.load_state_dict(copy.deepcopy(state_dict))\n\n    def get_model(self):\n        return self.model\n\n\n    def _prepare_inputs(self, inputs):\n        \"\"\"\n        Prepare :obj:`inputs` before feeding them to the model, converting them to tensors if they are not already and\n        handling potential state.\n        \"\"\"\n        for k, v in inputs.items():\n            if isinstance(v, T.Tensor):\n                inputs[k] = to_device(v)\n        if self.past_index >= 0 and self._past is not None:\n            inputs[\"mems\"] = self._past\n\n        return inputs\n\n\n    def forward(self, inputs):\n        inputs = self._prepare_inputs(inputs)\n        return self.model(**inputs)\n\n\n    def loss(self, inputs):\n        \"\"\"\n        Perform a training step on a batch of inputs.\n        Subclass and override to inject custom behavior.\n        Args:\n            inputs (:obj:`Dict[str, Union[T.Tensor, Any]]`):\n                The inputs and targets of the model.\n                The dictionary will be unpacked before being fed to the model. Most models expect the targets under the\n                argument :obj:`labels`. Check your model's documentation for all accepted arguments.\n        Return:\n            :obj:`T.Tensor`: The tensor with training loss on this batch.\n        \"\"\"\n        inputs = self._prepare_inputs(inputs)\n\n        loss = self.compute_loss(inputs)\n        loss = loss / self.gradient_accumulation_steps\n\n        return loss\n\n\n    def compute_loss(self, inputs_orig, return_outputs=False):\n        \"\"\"\n        How the loss is computed by Trainer. By default, all models return the loss in the first element.\n        Subclass and override for custom behavior.\n\n        inputs (:obj:`Dict[str, Union[T.Tensor, Any]]`):\n                The inputs and targets of the model.\n                The dictionary will be unpacked before being fed to the model. Most models expect the targets under the\n                argument :obj:`labels`. Check your model's documentation for all accepted arguments.\n        \"\"\"\n        # Make a local copy of the data\n        inputs = copy.deepcopy(inputs_orig)\n\n        if self.label_smoother is not None and \"labels\" in inputs:\n            labels = inputs[\"labels\"].detach().cpu()\n        else:\n            labels = None\n\n        # The following fields need to be removed for Roberta\n        if 'roberta' in self.model_name:\n            if 'attention_mask' in inputs:\n                inputs.pop('attention_mask')\n            if 'special_tokens_mask' in inputs:\n                inputs.pop('special_tokens_mask')\n\n        # Forward pass for the transformer\n        outputs = self.model(**inputs)\n\n        if self.past_index >= 0:\n            self._past = outputs[self.past_index]\n\n        if labels is not None:\n            loss = self.label_smoother(outputs, labels)\n        else:\n            # We don't use .loss here since the model may return tuples instead of ModelOutput.\n            loss = outputs[\"loss\"] if isinstance(outputs, dict) else outputs[0]\n\n        return (loss, outputs) if return_outputs else loss\n\n\n    def inference(\n            self, inputs, ignore_keys: Optional[List[str]] = [], metric_key_prefix: str = \"eval\"\n    ) -> List[float]:\n        \"\"\"\n        Run prediction and return predictions and potential metrics.\n        Depending on the dataset and your use case, your test dataset may contain labels. In that case, this method\n        will also return metrics, like in :obj:`evaluate()`.\n        Args:\n            inputs (:obj:`Dict[str, Union[T.Tensor, Any]]`):\n                The inputs and targets of the model.\n                The dictionary will be unpacked before being fed to the model. Most models expect the targets under the\n                argument :obj:`labels`. Check your model's documentation for all accepted arguments.\n            ignore_keys (:obj:`List[str]`, `optional`):\n                A list of keys in the output of your model (if it is a dictionary) that should be ignored when\n                gathering predictions.\n            metric_key_prefix (:obj:`str`, `optional`, defaults to :obj:`\"eval\"`):\n                An optional prefix to be used as the metrics key prefix. For example the metric \"bleu\" will be named\n                \"eval_bleu\" if the prefix is \"eval\" (default).\n        .. note::\n            If your predictions or labels have different sequence lengths (for instance because you're doing dynamic\n            padding in a token classification task) the predictions will be padded (on the right) to allow for\n            concatenation into one array. The padding index is -100.\n        Returns: `NamedTuple` A namedtuple with the following keys:\n            - predictions (:obj:`np.ndarray`): The predictions on :obj:`test_dataset`.\n            - label_ids (:obj:`np.ndarray`, `optional`): The labels (if the dataset contained some).\n            - metrics (:obj:`Dict[str, float]`, `optional`): The potential dictionary of metrics (if the dataset\n              contained labels).\n        \"\"\"\n\n        output, batch_size = self.prediction_loop(\n                                            inputs,\n                                            description=\"Evaluation\",\n                                            ignore_keys=ignore_keys,\n                                            metric_key_prefix=metric_key_prefix)\n        return {'output': output['eval_loss'], 'acc': output['eval_acc'], 'batch_size': batch_size[0]}\n\n\n    def prediction_loop(\n                    self,\n                    inputs,\n                    description: str,\n                    ignore_keys: Optional[List[str]] = None,\n                    metric_key_prefix: str = \"eval\",\n            ) -> Union[Dict, List[int]]:\n        \"\"\"\n        Prediction/evaluation loop, shared by :obj:`Trainer.evaluate()` and :obj:`Trainer.predict()`.\n        Works both with or without labels.\n        \"\"\"\n\n        out_label_ids = None\n        if 'labels' in inputs:\n            out_label_ids = inputs['labels'].detach().cpu()\n\n        if 'attention_mask' in inputs:\n            attention_mask = inputs['attention_mask'].detach().cpu()\n\n        losses_host = None\n        preds_host  = None\n        labels_host = None\n\n        world_size = 1\n        num_hosts  = 1\n        eval_losses_gatherer = DistributedTensorGatherer(world_size, num_hosts, make_multiple_of=self.batch_size)\n        if not self.prediction_loss_only:\n            preds_gatherer = DistributedTensorGatherer(world_size, num_hosts)\n            labels_gatherer = DistributedTensorGatherer(world_size, num_hosts)\n\n        self.model.eval()\n        if self.past_index >= 0:\n            self._past = None\n\n        loss, logits, _ = self.prediction_step(inputs, ignore_keys=ignore_keys, has_labels=True)\n        if loss is not None:\n            losses = loss.repeat(self.batch_size).cpu()\n            losses_host = losses if losses_host is None else T.cat((losses_host, losses), dim=0)\n        if logits is not None:\n            preds_host = logits.detach().cpu() if preds_host is None else nested_concat(preds_host, logits, padding_index=-100)\n        if out_label_ids is not None:\n            labels_host = out_label_ids if labels_host is None else nested_concat(labels_host, out_label_ids, padding_index=-100)\n\n        # Gather all tensors and put them back on the CPU if we have done enough accumulation steps.\n        if self.eval_accumulation_steps is not None:\n            eval_losses_gatherer.add_arrays(self._gather_and_numpify(losses_host, \"eval_losses\"))\n            if not self.prediction_loss_only:\n                preds_gatherer.add_arrays(self._gather_and_numpify(preds_host, \"eval_preds\"))\n                labels_gatherer.add_arrays(self._gather_and_numpify(labels_host, \"eval_label_ids\"))\n\n            # Set back to None to begin a new accumulation\n            losses_host, preds_host, labels_host = None, None, None\n\n        if self.past_index and hasattr(self, \"_past\"):\n            # Clean the state at the end of the evaluation loop\n            delattr(self, \"_past\")\n\n        # Gather all remaining tensors and put them back on the CPU\n        if num_hosts > 1:\n            eval_losses_gatherer.add_arrays(self._gather_and_numpify(losses_host, \"eval_losses\"), want_masked=True)\n            if not self.prediction_loss_only:\n                preds_gatherer.add_arrays(self._gather_and_numpify(preds_host, \"eval_preds\"))\n                labels_gatherer.add_arrays(self._gather_and_numpify(labels_host, \"eval_label_ids\"))\n\n            eval_loss = eval_losses_gatherer.finalize()\n            preds = preds_gatherer.finalize() if not self.prediction_loss_only else None\n            label_ids = labels_gatherer.finalize() if not self.prediction_loss_only else None\n        else:\n            eval_loss = losses_host\n            preds     = preds_host\n            label_ids = labels_host\n\n        if preds is not None and label_ids is not None:\n            metrics = ComputeMetrics.compute_metrics(EvalPrediction(predictions=preds, label_ids=label_ids), attention_mask)\n        else:\n            metrics = {}\n\n        if eval_loss is not None:\n            metrics[f\"{metric_key_prefix}_loss\"] = eval_loss.mean().item()\n\n        # Prefix all keys with metric_key_prefix + '_'\n        for key in list(metrics.keys()):\n            if not key.startswith(f\"{metric_key_prefix}_\"):\n                metrics[f\"{metric_key_prefix}_{key}\"] = metrics.pop(key).item()\n        return metrics, preds.size()\n\n\n    def _gather_and_numpify(self, tensors, name):\n        \"\"\"\n        Gather value of `tensors` (tensor or list/tuple of nested tensors) and convert them to numpy before\n        concatenating them to `gathered`\n        \"\"\"\n        if tensors is None:\n            return\n        return nested_numpify(tensors)\n\n\n    def prediction_step(\n            self,\n            inputs,\n            ignore_keys: Optional[List[str]] = None, has_labels: bool = None\n    ) -> Tuple[Optional[float], Optional[T.Tensor], Optional[T.Tensor]]:\n        \"\"\"\n        Perform an evaluation step on :obj:`model` using obj:`inputs`.\n        Subclass and override to inject custom behavior.\n        Args:\n            model (:obj:`nn.Module`):\n                The model to evaluate.\n            inputs (:obj:`Dict[str, Union[T.Tensor, Any]]`):\n                The inputs and targets of the model.\n                
The dictionary will be unpacked before being fed to the model. Most models expect the targets under the\n                argument :obj:`labels`. Check your model's documentation for all accepted arguments.\n            prediction_loss_only (:obj:`bool`):\n                Whether or not to return the loss only.\n            ignore_keys (:obj:`Lst[str]`, `optional`):\n                A list of keys in the output of your model (if it is a dictionary) that should be ignored when\n                gathering predictions.\n        Return:\n            Tuple[Optional[float], Optional[T.Tensor], Optional[T.Tensor]]: A tuple with the loss, logits and\n            labels (each being optional).\n        \"\"\"\n\n\n        inputs = self._prepare_inputs(inputs)\n\n        # labels may be popped when computing the loss (label smoothing for instance) so we grab them first.\n        if has_labels:\n            #labels = nested_detach(tuple(inputs.get(name) for name in self.label_names))\n            labels = inputs[\"labels\"].detach().cpu()\n            if len(labels) == 1:\n                labels = labels[0]\n        else:\n            labels = None\n\n        with T.no_grad():\n            if has_labels:\n                loss, outputs = self.compute_loss(inputs, return_outputs=True)\n                loss = loss.mean().detach()\n                if isinstance(outputs, dict):\n                    logits = outputs[\"logits\"]\n                else:\n                    logits = outputs[1:]\n            else:\n                loss = None\n                outputs = self.model(**inputs)\n                if isinstance(outputs, dict):\n                    logits = tuple(v for k, v in outputs.items() if k not in ignore_keys)\n                else:\n                    logits = outputs\n                if self.past_index >= 0:\n                    self._past = outputs[self.past_index - 1]\n\n        if self.prediction_loss_only:\n            return (loss, None, None)\n\n        logits = 
nested_detach(logits)\n        if len(logits) == 1:\n            logits = logits[0]\n\n        return (loss, logits, labels)\n\n\n    def floating_point_ops(self, inputs):\n        \"\"\"\n        For models that inherit from :class:`~transformers.PreTrainedModel`, uses that method to compute the number of\n        floating point operations for every backward + forward pass. If using another model, either implement such a\n        method in the model or subclass and override this method.\n        Args:\n            inputs (:obj:`Dict[str, Union[T.Tensor, Any]]`):\n                The inputs and targets of the model.\n        Returns:\n            :obj:`int`: The number of floating-point operations.\n        \"\"\"\n        if hasattr(self.model, \"floating_point_ops\"):\n            return self.model.floating_point_ops(inputs)\n        else:\n            return 0\n\n\n\n    def set_eval(self):\n        \"\"\"\n        Bring the model into evaluation mode\n        \"\"\"\n        self.model.eval()\n\n\n    def set_train(self):\n        \"\"\"\n        Bring the model into train mode\n        \"\"\"\n        self.model.train()\n"
  },
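In `prediction_loop` above, each batch's scalar loss is repeated `batch_size` times (`loss.repeat(self.batch_size)`) before being concatenated, so the final `.mean()` weights every sample equally rather than every batch. A minimal numpy sketch of that bookkeeping, using hypothetical batch losses and sizes (not the repository's Trainer):

```python
import numpy as np

# Hypothetical per-batch mean losses and batch sizes.
batch_losses = [0.5, 1.0]
batch_sizes = [4, 2]

# Repeat each scalar loss once per sample, as the prediction loop does
# with loss.repeat(batch_size), then concatenate and average.
losses = np.concatenate(
    [np.full(n, l) for l, n in zip(batch_losses, batch_sizes)]
)
eval_loss = losses.mean()
print(eval_loss)  # 0.666..., the sample-weighted mean, not (0.5 + 1.0) / 2
```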
  {
    "path": "experiments/mlm_bert/utils/trainer_pt_utils.py",
    "content": "# Copyright (c) Microsoft Corporation.\n# Licensed under the MIT license.\n\n# coding=utf-8\n# Copyright 2020-present the HuggingFace Inc. team.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");\n# you may not use this file except in compliance with the License.\n# You may obtain a copy of the License at\n#\n#     http://www.apache.org/licenses/LICENSE-2.0\n#\n# Unless required by applicable law or agreed to in writing, software\n# distributed under the License is distributed on an \"AS IS\" BASIS,\n# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n# See the License for the specific language governing permissions and\n# limitations under the License.\n\"\"\"\nTorch utilities for the Trainer class.\n\"\"\"\n\nimport json\nimport math\nimport os\nimport warnings\nfrom contextlib import contextmanager\nfrom dataclasses import dataclass\nfrom typing import Dict, Iterator, List, Optional, Union\n\nimport numpy as np\nimport torch\nfrom packaging import version\nfrom torch.utils.data.dataset import Dataset\nfrom torch.utils.data.distributed import DistributedSampler\nfrom torch.utils.data.sampler import RandomSampler, Sampler\n\n\n\n# this is used to supress an undesired warning emitted by pytorch versions 1.4.2-1.7.0\ntry:\n    from torch.optim.lr_scheduler import SAVE_STATE_WARNING\nexcept ImportError:\n    SAVE_STATE_WARNING = \"\"\n\n\n\ndef torch_pad_and_concatenate(tensor1, tensor2, padding_index=-100):\n    \"\"\"Concatenates `tensor1` and `tensor2` on first axis, applying padding on the second if necessary.\"\"\"\n    if len(tensor1.shape) == 1 or tensor1.shape[1] == tensor2.shape[1]:\n        return torch.cat((tensor1, tensor2), dim=0)\n\n    # Let's figure out the new shape\n    new_shape = (tensor1.shape[0] + tensor2.shape[0], max(tensor1.shape[1], tensor2.shape[1])) + tensor1.shape[2:]\n\n    # Now let's fill the result tensor\n    result = tensor1.new_full(new_shape, padding_index)\n    result[: 
tensor1.shape[0], : tensor1.shape[1]] = tensor1\n    result[tensor1.shape[0] :, : tensor2.shape[1]] = tensor2\n    return result\n\n\ndef numpy_pad_and_concatenate(array1, array2, padding_index=-100):\n    \"\"\"Concatenates `array1` and `array2` on first axis, applying padding on the second if necessary.\"\"\"\n    if len(array1.shape) == 1 or array1.shape[1] == array2.shape[1]:\n        return np.concatenate((array1, array2), axis=0)\n\n    # Let's figure out the new shape\n    new_shape = (array1.shape[0] + array2.shape[0], max(array1.shape[1], array2.shape[1])) + array1.shape[2:]\n\n    # Now let's fill the result tensor\n    result = np.full_like(array1, padding_index, shape=new_shape)\n    result[: array1.shape[0], : array1.shape[1]] = array1\n    result[array1.shape[0] :, : array2.shape[1]] = array2\n    return result\n\n\ndef nested_concat(tensors, new_tensors, padding_index=-100):\n    \"\"\"\n    Concat the `new_tensors` to `tensors` on the first dim and pad them on the second if needed. 
Works for tensors or\n    nested list/tuples of tensors.\n    \"\"\"\n    assert type(tensors) == type(\n        new_tensors\n    ), f\"Expected `tensors` and `new_tensors` to have the same type but found {type(tensors)} and {type(new_tensors)}.\"\n    if isinstance(tensors, (list, tuple)):\n        return type(tensors)(nested_concat(t, n, padding_index=padding_index) for t, n in zip(tensors, new_tensors))\n    elif isinstance(tensors, torch.Tensor):\n        return torch_pad_and_concatenate(tensors, new_tensors, padding_index=padding_index)\n    elif isinstance(tensors, np.ndarray):\n        return numpy_pad_and_concatenate(tensors, new_tensors, padding_index=padding_index)\n    else:\n        raise TypeError(f\"Unsupported type for concatenation: got {type(tensors)}\")\n\n\ndef nested_numpify(tensors):\n    \"Numpify `tensors` (even if it's a nested list/tuple of tensors).\"\n    if isinstance(tensors, (list, tuple)):\n        return type(tensors)(nested_numpify(t) for t in tensors)\n    return tensors.cpu().numpy()\n\n\ndef nested_detach(tensors):\n    \"Detach `tensors` (even if it's a nested list/tuple of tensors).\"\n    if isinstance(tensors, (list, tuple)):\n        return type(tensors)(nested_detach(t) for t in tensors)\n    return tensors.detach()\n\n\n\n\ndef reissue_pt_warnings(caught_warnings):\n    # Reissue warnings that are not the SAVE_STATE_WARNING\n    if len(caught_warnings) > 1:\n        for w in caught_warnings:\n            if w.category != UserWarning or w.message != SAVE_STATE_WARNING:\n                warnings.warn(w.message, w.category)\n\n\n\n\ndef nested_new_like(arrays, num_samples, padding_index=-100):\n    \"\"\" Create the same nested structure as `arrays` with a first dimension always at `num_samples`.\"\"\"\n    if isinstance(arrays, (list, tuple)):\n        return type(arrays)(nested_new_like(x, num_samples) for x in arrays)\n    return np.full_like(arrays, padding_index, shape=(num_samples, *arrays.shape[1:]))\n\n\ndef 
nested_expand_like(arrays, new_seq_length, padding_index=-100):\n    \"\"\" Expand the `arrays` so that the second dimension grows to `new_seq_length`. Uses `padding_index` for padding.\"\"\"\n    if isinstance(arrays, (list, tuple)):\n        return type(arrays)(nested_expand_like(x, new_seq_length, padding_index=padding_index) for x in arrays)\n\n    result = np.full_like(arrays, padding_index, shape=(arrays.shape[0], new_seq_length) + arrays.shape[2:])\n    result[:, : arrays.shape[1]] = arrays\n    return result\n\n\ndef nested_truncate(tensors, limit):\n    \"Truncate `tensors` at `limit` (even if it's a nested list/tuple of tensors).\"\n    if isinstance(tensors, (list, tuple)):\n        return type(tensors)(nested_truncate(t, limit) for t in tensors)\n    return tensors[:limit]\n\n\ndef _get_first_shape(arrays):\n    \"\"\"Return the shape of the first array found in the nested struct `arrays`.\"\"\"\n    if isinstance(arrays, (list, tuple)):\n        return _get_first_shape(arrays[0])\n    return arrays.shape\n\n\nclass DistributedTensorGatherer:\n    \"\"\"\n    A class responsible for properly gathering tensors (or nested list/tuple of tensors) on the CPU by chunks.\n    If our dataset has 16 samples with a batch size of 2 on 3 processes and we gather then transfer on CPU at every\n    step, our sampler will generate the following indices:\n        :obj:`[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 0, 1]`\n    to get something of size a multiple of 3 (so that each process gets the same dataset length). 
Then process 0, 1 and\n    2 will be responsible of making predictions for the following samples:\n        - P0: :obj:`[0, 1, 2, 3, 4, 5]`\n        - P1: :obj:`[6, 7, 8, 9, 10, 11]`\n        - P2: :obj:`[12, 13, 14, 15, 0, 1]`\n    The first batch treated on each process will be\n        - P0: :obj:`[0, 1]`\n        - P1: :obj:`[6, 7]`\n        - P2: :obj:`[12, 13]`\n    So if we gather at the end of the first batch, we will get a tensor (nested list/tuple of tensor) corresponding to\n    the following indices:\n        :obj:`[0, 1, 6, 7, 12, 13]`\n    If we directly concatenate our results without taking any precautions, the user will then get the predictions for\n    the indices in this order at the end of the prediction loop:\n        :obj:`[0, 1, 6, 7, 12, 13, 2, 3, 8, 9, 14, 15, 4, 5, 10, 11, 0, 1]`\n    For some reason, that's not going to roll their boat. This class is there to solve that problem.\n    Args:\n        world_size (:obj:`int`):\n            The number of processes used in the distributed training.\n        num_samples (:obj:`int`):\n            The number of samples in our dataset.\n        make_multiple_of (:obj:`int`, `optional`):\n            If passed, the class assumes the datasets passed to each process are made to be a multiple of this argument\n            (by adding samples).\n        padding_index (:obj:`int`, `optional`, defaults to -100):\n            The padding index to use if the arrays don't all have the same sequence length.\n    \"\"\"\n\n    def __init__(self, world_size, num_samples, make_multiple_of=None, padding_index=-100):\n        self.world_size = world_size\n        self.num_samples = num_samples\n        total_size = world_size if make_multiple_of is None else world_size * make_multiple_of\n        self.total_samples = int(np.ceil(num_samples / total_size)) * total_size\n        self.process_length = self.total_samples // world_size\n        self._storage = None\n        self._offsets = None\n        
self.padding_index = padding_index\n\n    def add_arrays(self, arrays):\n        \"\"\"\n        Add :obj:`arrays` to the internal storage, Will initialize the storage to the full size at the first arrays\n        passed so that if we're bound to get an OOM, it happens at the beginning.\n        \"\"\"\n        if arrays is None:\n            return\n        if self._storage is None:\n            self._storage = nested_new_like(arrays, self.total_samples, padding_index=self.padding_index)\n            self._offsets = list(range(0, self.total_samples, self.process_length))\n        else:\n            storage_shape = _get_first_shape(self._storage)\n            arrays_shape = _get_first_shape(arrays)\n            if len(storage_shape) > 1 and storage_shape[1] < arrays_shape[1]:\n                # If we get new arrays that are too big too fit, we expand the shape fo the storage\n                self._storage = nested_expand_like(self._storage, arrays_shape[1], padding_index=self.padding_index)\n        slice_len = self._nested_set_tensors(self._storage, arrays)\n        for i in range(self.world_size):\n            self._offsets[i] += slice_len\n\n    def _nested_set_tensors(self, storage, arrays):\n        if isinstance(arrays, (list, tuple)):\n            for x, y in zip(storage, arrays):\n                slice_len = self._nested_set_tensors(x, y)\n            return slice_len\n        assert (\n            arrays.shape[0] % self.world_size == 0\n        ), f\"Arrays passed should all have a first dimension multiple of {self.world_size}, found {arrays.shape[0]}.\"\n\n        slice_len = arrays.shape[0] // self.world_size\n        for i in range(self.world_size):\n            if len(arrays.shape) == 1:\n                storage[self._offsets[i] : self._offsets[i] + slice_len] = arrays[i * slice_len : (i + 1) * slice_len]\n            else:\n                storage[self._offsets[i] : self._offsets[i] + slice_len, : arrays.shape[1]] = arrays[\n                    i * 
slice_len : (i + 1) * slice_len\n                ]\n        return slice_len\n\n    def finalize(self):\n        \"\"\"\n        Return the properly gathered arrays and truncate to the number of samples (since the sampler added some extras\n        to get each process a dataset of the same length).\n        \"\"\"\n        if self._storage is None:\n            return\n        if self._offsets[0] != self.process_length:\n            logger.warn(\"Not all data has been set. Are you sure you passed all values?\")\n        return nested_truncate(self._storage, self.num_samples)\n\n\n@dataclass\nclass LabelSmoother:\n    \"\"\"\n    Adds label-smoothing on a pre-computed output from a Transformers model.\n    Args:\n        epsilon (:obj:`float`, `optional`, defaults to 0.1):\n            The label smoothing factor.\n        ignore_index (:obj:`int`, `optional`, defaults to -100):\n            The index in the labels to ignore when computing the loss.\n    \"\"\"\n\n    epsilon: float = 0.1\n    ignore_index: int = -100\n\n    def __call__(self, model_output, labels):\n        logits = model_output[\"logits\"] if isinstance(model_output, dict) else model_output[0]\n        log_probs = -torch.nn.functional.log_softmax(logits, dim=-1)\n        if labels.dim() == log_probs.dim() - 1:\n            labels = labels.unsqueeze(-1)\n\n        padding_mask = labels.eq(self.ignore_index)\n        # In case the ignore_index is -100, the gather will fail, so we replace labels by 0. The padding_mask\n        # will ignore them in any case.\n        labels.clamp_min_(0)\n        nll_loss = log_probs.gather(dim=-1, index=labels)\n        smoothed_loss = log_probs.sum(dim=-1, keepdim=True)\n\n        nll_loss.masked_fill_(padding_mask, 0.0)\n        smoothed_loss.masked_fill_(padding_mask, 0.0)\n\n        # Take the mean over the label dimensions, then divide by the number of active elements (i.e. 
not-padded):\n        num_active_elements = padding_mask.numel() - padding_mask.long().sum()\n        nll_loss = nll_loss.sum() / num_active_elements\n        smoothed_loss = smoothed_loss.sum() / (num_active_elements * log_probs.shape[-1])\n        return (1 - self.epsilon) * nll_loss + self.epsilon * smoothed_loss\n\n\ndef get_length_grouped_indices(lengths, batch_size, mega_batch_mult=None, generator=None):\n    \"\"\"\n    Return a list of indices so that each slice of :obj:`batch_size` consecutive indices correspond to elements of\n    similar lengths. To do this, the indices are:\n    - randomly permuted\n    - grouped in mega-batches of size :obj:`mega_batch_mult * batch_size`\n    - sorted by length in each mega-batch\n    The result is the concatenation of all mega-batches, with the batch of :obj:`batch_size` containing the element of\n    maximum length placed first, so that an OOM happens sooner rather than later.\n    \"\"\"\n    # Default for mega_batch_mult: 50 or the number to get 4 megabatches, whichever is smaller.\n    if mega_batch_mult is None:\n        mega_batch_mult = min(len(lengths) // (batch_size * 4), 50)\n        # Just in case, for tiny datasets\n        if mega_batch_mult == 0:\n            mega_batch_mult = 1\n\n    # We need to use torch for the random part as a distributed sampler will set the random seed for torch.\n    indices = torch.randperm(len(lengths), generator=generator)\n    megabatch_size = mega_batch_mult * batch_size\n    megabatches = [indices[i : i + megabatch_size].tolist() for i in range(0, len(lengths), megabatch_size)]\n    megabatches = [list(sorted(megabatch, key=lambda i: lengths[i], reverse=True)) for megabatch in megabatches]\n\n    # The rest is to get the biggest batch first.\n    # Since each megabatch is sorted by descending length, the longest element is the first\n    megabatch_maximums = [lengths[megabatch[0]] for megabatch in megabatches]\n    max_idx = 
torch.argmax(torch.tensor(megabatch_maximums)).item()\n    # Switch to put the longest element in first position\n    megabatches[0][0], megabatches[max_idx][0] = megabatches[max_idx][0], megabatches[0][0]\n\n    return sum(megabatches, [])\n\n\nclass LengthGroupedSampler(Sampler):\n    r\"\"\"\n    Sampler that samples indices in a way that groups together features of the dataset of roughly the same length while\n    keeping a bit of randomness.\n    \"\"\"\n\n    def __init__(self, dataset: Dataset, batch_size: int, lengths: Optional[List[int]] = None):\n        self.dataset = dataset\n        self.batch_size = batch_size\n        if lengths is None:\n            if not isinstance(dataset[0], dict) or \"input_ids\" not in dataset[0]:\n                raise ValueError(\n                    \"Can only automatically infer lengths for datasets whose items are dictionaries with an \"\n                    \"'input_ids' key.\"\n                )\n            lengths = [len(feature[\"input_ids\"]) for feature in dataset]\n        self.lengths = lengths\n\n    def __len__(self):\n        return len(self.lengths)\n\n    def __iter__(self):\n        indices = get_length_grouped_indices(self.lengths, self.batch_size)\n        return iter(indices)\n\n\nclass DistributedLengthGroupedSampler(DistributedSampler):\n    r\"\"\"\n    Distributed Sampler that samples indices in a way that groups together features of the dataset of roughly the same\n    length while keeping a bit of randomness.\n    \"\"\"\n    # Copied and adapted from PyTorch DistributedSampler.\n    def __init__(\n        self,\n        dataset: Dataset,\n        batch_size: int,\n        num_replicas: Optional[int] = None,\n        rank: Optional[int] = None,\n        seed: int = 0,\n        drop_last: bool = False,\n        lengths: Optional[List[int]] = None,\n    ):\n        if num_replicas is None:\n            if not dist.is_available():\n                raise RuntimeError(\"Requires distributed package to 
be available\")\n            num_replicas = dist.get_world_size()\n        if rank is None:\n            if not dist.is_available():\n                raise RuntimeError(\"Requires distributed package to be available\")\n            rank = dist.get_rank()\n        self.dataset = dataset\n        self.batch_size = batch_size\n        self.num_replicas = num_replicas\n        self.rank = rank\n        self.epoch = 0\n        self.drop_last = drop_last\n        # If the dataset length is evenly divisible by # of replicas, then there\n        # is no need to drop any data, since the dataset will be split equally.\n        if self.drop_last and len(self.dataset) % self.num_replicas != 0:\n            # Split to nearest available length that is evenly divisible.\n            # This is to ensure each rank receives the same amount of data when\n            # using this Sampler.\n            self.num_samples = math.ceil((len(self.dataset) - self.num_replicas) / self.num_replicas)\n        else:\n            self.num_samples = math.ceil(len(self.dataset) / self.num_replicas)\n        self.total_size = self.num_samples * self.num_replicas\n        self.seed = seed\n\n        if lengths is None:\n            if not isinstance(dataset[0], dict) or \"input_ids\" not in dataset[0]:\n                raise ValueError(\n                    \"Can only automatically infer lengths for datasets whose items are dictionaries with an \"\n                    \"'input_ids' key.\"\n                )\n            lengths = [len(feature[\"input_ids\"]) for feature in dataset]\n        self.lengths = lengths\n\n    def __iter__(self) -> Iterator:\n        # Deterministically shuffle based on epoch and seed\n        g = torch.Generator()\n        g.manual_seed(self.seed + self.epoch)\n        indices = get_length_grouped_indices(self.lengths, self.batch_size, generator=g)\n\n        if not self.drop_last:\n            # add extra samples to make it evenly divisible\n            indices += 
indices[: (self.total_size - len(indices))]\n        else:\n            # remove tail of data to make it evenly divisible.\n            indices = indices[: self.total_size]\n        assert len(indices) == self.total_size\n\n        # subsample\n        indices = indices[self.rank : self.total_size : self.num_replicas]\n        assert len(indices) == self.num_samples\n\n        return iter(indices)\n\n\n# In order to keep `trainer.py` compact and easy to understand, place any secondary PT Trainer\n# helper methods here\n\n\ndef _get_learning_rate(self):\n    if self.deepspeed:\n        # with deepspeed's fp16 and dynamic loss scale enabled the optimizer/scheduler steps may\n        # not run for the first few dozen steps while loss scale is too large, and thus during\n        # that time `get_last_lr` will fail if called during that warm up stage, so work around it:\n        try:\n            last_lr = self.lr_scheduler.get_last_lr()[0]\n        except AssertionError as e:\n            if \"need to call step\" in str(e):\n                logger.warn(\"tried to get lr value before scheduler/optimizer started stepping, returning lr=0\")\n                last_lr = 0\n            else:\n                raise\n    else:\n        last_lr = (\n            # backward compatibility for pytorch schedulers\n            self.lr_scheduler.get_last_lr()[0]\n            if version.parse(torch.__version__) >= version.parse(\"1.4\")\n            else self.lr_scheduler.get_lr()[0]\n        )\n    return last_lr\n\n\ndef metrics_format(self, metrics: Dict[str, float]) -> Dict[str, float]:\n    \"\"\"\n    Reformat Trainer metrics values to a human-readable format\n    Args:\n        metrics (:obj:`Dict[str, float]`):\n            The metrics returned from train/evaluate/predict\n    Returns:\n        metrics (:obj:`Dict[str, float]`): The reformatted metrics\n    \"\"\"\n\n    metrics_copy = metrics.copy()\n    for k, v in metrics_copy.items():\n        if \"_mem_\" in k:\n            
metrics_copy[k] = f\"{ v >> 20 }MB\"\n        elif k == \"total_flos\":\n            metrics_copy[k] = f\"{ int(v) >> 30 }GF\"\n        elif type(metrics_copy[k]) == float:\n            metrics_copy[k] = round(v, 4)\n\n    return metrics_copy\n\n\ndef log_metrics(self, split, metrics):\n    \"\"\"\n    Log metrics in a specially formatted way\n    Args:\n        split (:obj:`str`):\n            Mode/split name: one of ``train``, ``eval``, ``test``\n        metrics (:obj:`Dict[str, float]`):\n            The metrics returned from train/evaluate/predict\n    \"\"\"\n\n    logger.info(f\"***** {split} metrics *****\")\n    metrics_formatted = self.metrics_format(metrics)\n    k_width = max(len(str(x)) for x in metrics_formatted.keys())\n    v_width = max(len(str(x)) for x in metrics_formatted.values())\n    for key in sorted(metrics_formatted.keys()):\n        logger.info(f\"  {key: <{k_width}} = {metrics_formatted[key]:>{v_width}}\")\n\n\ndef save_metrics(self, split, metrics):\n    \"\"\"\n    Save metrics into a json file for that split, e.g. ``train_results.json``.\n    Args:\n        split (:obj:`str`):\n            Mode/split name: one of ``train``, ``eval``, ``test``, ``all``\n        metrics (:obj:`Dict[str, float]`):\n            The metrics returned from train/evaluate/predict\n    \"\"\"\n    path = os.path.join(self.args.output_dir, f\"{split}_results.json\")\n    with open(path, \"w\") as f:\n        json.dump(metrics, f, indent=4, sort_keys=True)"
  },
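The pad-and-concatenate helpers in `trainer_pt_utils.py` stack two batches along the first axis and, when their sequence lengths differ, grow the second axis and fill the gap with the padding index. A self-contained numpy sketch of that behavior (note that numpy's keyword is `axis=`, unlike torch's `dim=`):

```python
import numpy as np

def pad_and_concatenate(a1, a2, padding_index=-100):
    # Concatenate on the first axis; pad the second axis with
    # `padding_index` when the sequence lengths differ.
    if a1.ndim == 1 or a1.shape[1] == a2.shape[1]:
        return np.concatenate((a1, a2), axis=0)
    new_shape = (a1.shape[0] + a2.shape[0], max(a1.shape[1], a2.shape[1])) + a1.shape[2:]
    result = np.full(new_shape, padding_index, dtype=a1.dtype)
    result[: a1.shape[0], : a1.shape[1]] = a1
    result[a1.shape[0]:, : a2.shape[1]] = a2
    return result

a = np.ones((2, 3), dtype=np.int64)   # batch of 2, length 3
b = np.ones((1, 5), dtype=np.int64)   # batch of 1, length 5
out = pad_and_concatenate(a, b)
print(out.shape)  # (3, 5); rows from `a` are padded with -100 in columns 3-4
```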
  {
    "path": "experiments/mlm_bert/utils/trainer_utils.py",
    "content": "# Copyright (c) Microsoft Corporation.\n# Licensed under the MIT license.\n\n# coding=utf-8\n# Copyright 2020-present the HuggingFace Inc. team.\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");\n# you may not use this file except in compliance with the License.\n# You may obtain a copy of the License at\n#\n#     http://www.apache.org/licenses/LICENSE-2.0\n#\n# Unless required by applicable law or agreed to in writing, software\n# distributed under the License is distributed on an \"AS IS\" BASIS,\n# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n# See the License for the specific language governing permissions and\n# limitations under the License.\n\"\"\"\nUtilities for the Trainer and TFTrainer class. Should be independent from PyTorch and TensorFlow.\n\"\"\"\n\nimport random\nfrom typing import Any, Dict, NamedTuple, Optional, Tuple, Union\nimport numpy as np\nimport torch\nimport logging\n\nfrom utils import print_rank\n\n\ndef set_seed(seed: int):\n    \"\"\"\n    Helper function for reproducible behavior to set the seed in ``random``, ``numpy``, ``torch`` and/or ``tf`` (if\n    installed).\n    Args:\n        seed (:obj:`int`): The seed to set.\n    \"\"\"\n    random.seed(seed)\n    np.random.seed(seed)\n\n    torch.manual_seed(seed)\n    torch.cuda.manual_seed_all(seed)\n    # ^^ safe to call this function even if cuda is not available\n\n\nclass EvalPrediction(NamedTuple):\n    \"\"\"\n    Evaluation output (always contains labels), to be used to compute metrics.\n    Parameters:\n        predictions (:obj:`np.ndarray`): Predictions of the model.\n        label_ids (:obj:`np.ndarray`): Targets to be matched.\n    \"\"\"\n\n    predictions: Union[np.ndarray, Tuple[np.ndarray]]\n    label_ids: np.ndarray\n\n\nclass PredictionOutput(NamedTuple):\n    predictions: Union[np.ndarray, Tuple[np.ndarray]]\n    label_ids: Optional[np.ndarray]\n    metrics: Optional[Dict[str, float]]\n\n\nclass 
ComputeMetrics:\n    def __init__(self, p: EvalPrediction, mask=None):\n        self.eval_prediction = p\n        self.compute_metrics(self.eval_prediction, mask)\n\n    @staticmethod\n    def compute_metrics(p: EvalPrediction, mask=None):\n        print_rank('Prediction Block Size: {}'.format(p.predictions.size()), loglevel=logging.DEBUG)\n        if len(list(p.predictions.size())) < 3:\n            if len(list(p.predictions.size())) < 2:\n                print_rank('There is something REALLY wrong with the prediction tensor: {}'.format(p.predictions.size()), loglevel=logging.INFO)\n                return {'acc': torch.tensor(0.0)}\n            print_rank('There is something wrong with the prediction tensor: {}'.format(p.predictions.size()), loglevel=logging.INFO)\n            preds = torch.argmax(p.predictions, dim=1)\n        else:\n            preds = torch.argmax(p.predictions, dim=2)\n\n        if mask is None:\n            return {'acc': (preds == p.label_ids).float().mean()}\n        else:\n            # valid = preds > 1  # reject OOV predictions even if they're correct.\n            valid = mask == 1\n            return {'acc': (preds.eq(p.label_ids.cpu()) * valid.cpu()).float().mean()}\n"
  },
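`ComputeMetrics.compute_metrics` takes the argmax over the vocabulary dimension and, when an attention mask is supplied, counts a position as correct only if it is both right and unmasked, averaging over all positions. A small numpy sketch with made-up shapes (batch=2, seq_len=4, vocab=5; the shapes and the masking rule here are illustrative assumptions):

```python
import numpy as np

# Hypothetical logits: the model always predicts token id 3.
logits = np.zeros((2, 4, 5))
logits[..., 3] = 1.0
labels = np.full((2, 4), 3)
labels[0, 0] = 1        # one genuinely wrong position
mask = np.ones((2, 4))
mask[1, 3] = 0          # one padded position, excluded from "valid"

preds = np.argmax(logits, axis=2)
# Masked accuracy in the style of ComputeMetrics: correct-and-valid
# positions, averaged over all positions.
acc = ((preds == labels) & (mask == 1)).mean()
print(acc)  # 0.75: 6 of 8 positions are both correct and unmasked
```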
  {
    "path": "experiments/nlg_gru/README.md",
    "content": "# Simple example of a NLG task on Reddit Dataset\n\nInstructions on how to run the experiment, given below.\n\n## Preparing the data\n\nFor this experiment, we can create a dummy dataset by running the \nscript located in `testing/create_data.py` as follows:\n\n```code\n    python create_data.py --task nlg_gru\n```\n\nA couple of scripts are provided in `utils/preprocessing` for preprocessing .tsv files\nin case you want to use your own data.\n\n## Creating a config file\n\nAll the parameters of the experiment are passed in a YAML file. An basic example is \nprovided in `configs/hello_world_nlg_gru_json.yaml` with the suggested \nparameters for local runs. \n\nThe example provided above is for running json files. If you want to try with HDF5 files\nmake sure to use the script `utils/preprocessing/from_json_to_hdf5.py` to convert the mock\ndata to HDF5 format.\n\n## Running the experiment\n\nFinally, to launch the experiment locally , it suffices to launch the `e2e_trainer.py`\nscript using torch.distributed , you can use as example the following line:\n\n```code\n    python -m torch.distributed.run --nproc_per_node=3 e2e_trainer.py -dataPath .\\testing\\mockup\\ -outputPath scratch -config .\\testing\\configs\\hello_world_nlg_gru.yaml -task nlg_gru -backend nccl\n```\n\nFor submitting jobs in Azure ML, we have included the instructions in the `Experiments` \nsection of the main `README.md`."
  },
  {
    "path": "experiments/nlg_gru/config.py",
    "content": "from __future__ import annotations\nfrom dataclasses import dataclass\nimport sys\nsys.path.append('../../')\nfrom core.config import ModelConfig, from_dict\n\n\n@dataclass\nclass GRUConfig(ModelConfig):\n    \"\"\"nlg_gru configuration\n\nThe model configuration specifies model architecture, parameters, and initialization settings.\n\nAttributes:\n    embed_dim (int): specific to GRU models, embedding dimension.\n\n    vocab_size (int): specific to GRU models, the vocabulary size.\n\n    hidden_dim (int): specific to GRU models, the hidden size.\n\n    weight_init (str): ``default``, or ``xavier_normal``, indicating how to randomly initialize the model weights.\n\n    OOV_correct (bool): whether OOV predictions are evaluated as correct, or ignored.\n\"\"\"\n    embed_dim: int | None = None\n    vocab_size: int | None = None\n    hidden_dim: int | None = None\n    weight_init: str = None\n    OOV_correct: bool = False\n    \n    @staticmethod\n    def from_dict(config) -> GRUConfig:\n        return from_dict(GRUConfig, config)\n"
  },
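The `from_dict` helper imported from `core.config` is not shown in this dump; a minimal sketch of how such a helper might build the dataclass while ignoring extra config keys (names here are illustrative, not FLUTE's actual implementation):

```python
from dataclasses import dataclass, fields

def from_dict_sketch(cls, config: dict):
    # Keep only keys that match the dataclass fields, so extra YAML
    # entries do not raise a TypeError in the constructor.
    names = {f.name for f in fields(cls)}
    return cls(**{k: v for k, v in config.items() if k in names})

@dataclass
class TinyConfig:
    embed_dim: int = 0
    vocab_size: int = 0

cfg = from_dict_sketch(TinyConfig, {'embed_dim': 160, 'vocab_size': 10000, 'unused': 1})
```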
  {
    "path": "experiments/nlg_gru/dataloaders/dataloader.py",
    "content": "# Copyright (c) Microsoft Corporation.\n# Licensed under the MIT license.\n\nimport random\nimport torch\nimport numpy as np\nfrom core.dataloader import BaseDataLoader\nfrom torch.utils.data.distributed import DistributedSampler\nfrom experiments.nlg_gru.dataloaders.dataset import Dataset\nfrom utils.data_utils import BatchSampler, DynamicBatchSampler\n\nclass DataLoader(BaseDataLoader):\n    \"\"\"\n    PyTorch dataloader for loading text data from\n    text_dataset.\n    \"\"\"\n    def __init__(self, mode, num_workers=0, **kwargs):\n\n        args = kwargs['args']\n        self.batch_size = args['batch_size']\n        batch_sampler = None\n\n        dataset = Dataset(\n                        data   = kwargs['data'],\n                        test_only    = not mode==\"train\",\n                        vocab_dict   = args['vocab_dict'],\n                        user_idx     = kwargs['user_idx'], \n                        max_num_words= args['max_num_words'],\n                        preencoded   = args.get('preencoded', False))\n        \n        if mode == 'train':\n            \n            sampler = DistributedSampler(dataset,num_replicas=1,rank=0)\n            sampler.set_epoch(random.randint(0, 10**10))\n            batch_sampler = DynamicBatchSampler(sampler,\n                                            frames_threshold = args['max_num_words'],\n                                            max_batch_size   = self.batch_size,\n                                            unsorted_batch   = args['unsorted_batch'],\n                                            fps=1)\n\n        elif mode == 'val' or mode == 'test':\n            sampler = BatchSampler(dataset, batch_size=self.batch_size, randomize=False, drop_last=False)\n            super().__init__(dataset,\n                             batch_sampler=sampler,\n                             num_workers=num_workers,\n                             collate_fn=self.collate_fn,\n                         
    pin_memory=args[\"pin_memory\"])\n            return\n\n        if batch_sampler is None:\n            super().__init__(dataset,\n                             batch_size=self.batch_size,\n                             sampler=sampler,\n                             num_workers=num_workers,\n                             collate_fn=self.collate_fn,\n                             drop_last=True)\n        else:\n            super().__init__(dataset,\n                             batch_sampler=batch_sampler,\n                             num_workers=num_workers,\n                             collate_fn=self.collate_fn,\n                             pin_memory=args[\"pin_memory\"])\n\n    def collate_fn(self, batch):\n        def pad_and_concat_feats(labels):\n            batch_size = len(labels)\n            max_len = max(len(l[0]) for l in labels)\n            cat_labels = np.full((batch_size, max_len), -1)\n\n            for e, l in enumerate(labels):\n                cat_labels[e,:len(l[0])] = np.squeeze(l)\n            return cat_labels\n\n\n        src_seq, utt_ids = zip(*batch)\n        x_len =  [len(s[0]) for s in src_seq]\n\n        src_seq = pad_and_concat_feats(src_seq)\n        packed  = {\n                    'x': torch.from_numpy(src_seq).long(),\n                    'x_len': x_len,\n                    'utt_ids' : utt_ids,\n                    'total_frames' : sum(x_len),\n                    'total_frames_with_padding' : np.prod(src_seq.shape),\n                    'loss_weight' : None\n                }\n        return packed\n    "
  },
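The `collate_fn` above right-pads variable-length index sequences with -1 before stacking them into a batch. A self-contained sketch of that padding step (a standalone reimplementation for illustration, not the module itself):

```python
import numpy as np

def pad_batch(seqs, pad_value=-1):
    # Pad variable-length index sequences to the batch maximum,
    # mirroring the -1 padding used by the dataloader's collate_fn.
    max_len = max(len(s) for s in seqs)
    out = np.full((len(seqs), max_len), pad_value, dtype=np.int64)
    for i, s in enumerate(seqs):
        out[i, :len(s)] = s
    return out

batch = pad_batch([[5, 6, 7], [8, 9]])
```

Downstream, the model masks out the -1 entries (`input >= 0`) so padding never contributes to the loss.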
  {
    "path": "experiments/nlg_gru/dataloaders/dataset.py",
    "content": "# Copyright (c) Microsoft Corporation.\n# Licensed under the MIT license.\n\nimport numpy as np\nimport logging\nimport json\n\nfrom utils import print_rank\nfrom core.dataset import BaseDataset\nfrom experiments.nlg_gru.utils.utility import *\n\nclass Dataset(BaseDataset):\n    \"\"\"\n    Map a text source to the target text\n    \"\"\"\n    \n    def __init__(self, data, min_num_words=2, max_num_words=25, test_only=False, user_idx=0, vocab_dict=None, preencoded=False, **kwargs):\n\n        self.utt_list = list()\n        self.test_only = test_only\n        self.max_num_words = max_num_words\n        self.min_num_words = min_num_words\n        self.preencoded = preencoded\n\n        # Load the vocab\n        self.vocab = load_vocab(kwargs['args']['vocab_dict']) if 'args' in kwargs else load_vocab(vocab_dict)\n        self.vocab_size = len(self.vocab)\n\n        # reading the jsonl for a specific user_idx\n        self.load_data(data, user_idx)\n\n    def __len__(self):\n        \"\"\"Return the length of the elements in the list.\"\"\"\n        return len(self.utt_list)\n\n\n    def __getitem__(self, idx):\n        \"\"\"Find the index in the available data\"\"\"\n\n        if self.preencoded:\n            batch = np.array([self.utt_list[idx]['src_text']], dtype=np.int32)\n        else:\n            # case_backoff_batch tries to find the best capitalisation that will allow the word to be in vocabulary\n            batch = case_backoff_batch([self.utt_list[idx]['src_text']], self.vocab.term_to_idx)\n            batch = to_indices(self.vocab, batch)\n\n        return  batch, self.user\n\n    def load_data(self, orig_strct, user_idx):\n\n        if isinstance(orig_strct, str):\n            print('Loading json-file: ', orig_strct)\n            with open(orig_strct, 'r') as fid:\n                orig_strct = json.load(fid)\n\n\n        self.user_list  = orig_strct['users']\n        self.num_samples = orig_strct['num_samples']\n        self.user_data  = 
orig_strct['user_data']\n        self.user = 'test_only' if self.test_only else self.user_list[user_idx]\n\n        if user_idx != -1:\n            self.process_x(self.user_data)\n\n    def process_x(self, user_data):\n        print_rank('Processing data-structure: {} Utterances expected'.format(sum(self.num_samples)), loglevel=logging.DEBUG)\n        for user in self.user_list:\n            for e in user_data[user]['x']:\n                utt = {}\n                utt['src_text'] = e if type(e) is list else e.split()\n                utt['duration'] = len(utt['src_text'])  # word count, even when the input was a raw string\n                if utt['duration'] <= self.min_num_words:\n                    continue\n\n                if utt['duration'] > self.max_num_words:\n                    utt['src_text'] = utt['src_text'][:self.max_num_words]\n                    utt['duration'] = self.max_num_words\n                utt[\"loss_weight\"] = 1.0\n                self.utt_list.append(utt)\n"
  },
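`process_x` drops utterances at or below `min_num_words` and truncates the rest to `max_num_words`. A small standalone sketch of that filter-and-truncate step (the helper name is hypothetical):

```python
def prepare_utterance(words, min_num_words=2, max_num_words=25):
    # Skip utterances at or below the minimum word count and truncate
    # long ones, as the dataset's process_x does.
    if len(words) <= min_num_words:
        return None
    return words[:max_num_words]

short = prepare_utterance(['hi', 'there'])   # filtered out: 2 <= min_num_words
kept = prepare_utterance(['word'] * 30)      # truncated to max_num_words
```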
  {
    "path": "experiments/nlg_gru/model.py",
    "content": "# Copyright (c) Microsoft Corporation.\n# Licensed under the MIT license.\n\nimport torch as T\nfrom torch import Tensor\nfrom typing import List, Tuple\n\nfrom core.model import BaseModel\nfrom utils import softmax, to_device\n\nclass GRU2(T.nn.Module):\n    def __init__(self, input_size, hidden_size, input_bias, hidden_bias):\n        super(GRU2, self).__init__()\n        self.input_size = input_size\n        self.hidden_size = hidden_size\n        self.w_ih = T.nn.Linear(input_size, 3 * hidden_size, input_bias)\n        self.w_hh = T.nn.Linear(hidden_size, 3 * hidden_size, hidden_bias)\n        \n    def _forward_cell(self, input : Tensor, hidden : Tensor) -> Tensor:\n        g_i = self.w_ih(input)\n        g_h = self.w_hh(hidden)\n        i_r, i_i, i_n = g_i.chunk(3, 1)\n        h_r, h_i, h_n = g_h.chunk(3, 1)\n        reset_gate = T.sigmoid(i_r + h_r)\n        input_gate = T.sigmoid(i_i + h_i)\n        new_gate   = T.tanh(i_n + reset_gate * h_n)\n        hy         = new_gate + input_gate * (hidden - new_gate)\n        return hy\n    \n    def forward(self, input : Tensor) -> Tuple[Tensor, Tensor]:\n        hiddens : List[Tensor] = [to_device(T.zeros((input.shape[0], self.hidden_size)))]\n        for step in range(input.shape[1]):\n            hidden = self._forward_cell(input[:, step], hiddens[-1])\n            hiddens.append(hidden)\n            \n        return T.stack(hiddens, dim=1), hiddens[-1]\n    \n\nclass Embedding(T.nn.Module):\n    def __init__(self, vocab_size, embedding_size): \n        super(Embedding, self).__init__()\n        self.vocab_size = vocab_size\n        self.embedding_size = embedding_size\n        self.table = T.nn.Parameter(T.zeros((vocab_size, embedding_size)))\n        self.unembedding_bias = T.nn.Parameter(T.zeros(vocab_size))\n        delta = (3 / self.table.shape[1]) ** 0.5\n        T.nn.init.uniform_(self.table, -delta, delta)\n\n    def forward(self, input : Tensor, embed : bool) -> Tensor:\n        if 
embed:\n            output = T.nn.functional.embedding(input, self.table)\n        else:\n            output = input @ self.table.t() + self.unembedding_bias\n        return output\n    \n\nclass GRU(BaseModel): #DLM_2_0\n    def __init__(self, model_config, OOV_correct=False, dropout=0.0, topK_results=1, wantLogits=False, **kwargs):\n        super(GRU, self).__init__()\n        self.vocab_size = model_config['vocab_size']\n        self.embedding_size = model_config['embed_dim']\n        self.hidden_size = model_config['hidden_dim']\n        self.embedding = Embedding(self.vocab_size, self.embedding_size)\n        self.rnn = GRU2(self.embedding_size, self.hidden_size, True, True)\n        self.squeeze = T.nn.Linear(self.hidden_size, self.embedding_size, bias=False)\n        self.OOV_correct = OOV_correct\n        self.topK_results = topK_results\n        self.dropout=dropout\n        self.wantLogits=wantLogits\n        if self.dropout>0.0:\n            self.drop_layer = T.nn.Dropout(p=self.dropout)\n\n    def forward(self, input : T.Tensor) -> Tuple[Tensor, Tensor]:\n        input = input['x'] if isinstance(input, dict) else input\n        input = to_device(input)\n        embedding = self.embedding(input, True)\n        hiddens, state = self.rnn(embedding)\n        if self.dropout>0.0:\n            hiddens= self.drop_layer(hiddens)\n        output = self.embedding(self.squeeze(hiddens), False)\n        return output, state\n\n\n    def loss(self, input : T.Tensor) -> T.Tensor:\n        input = input['x'] if isinstance(input, dict) else input\n        input = to_device(input)\n        non_pad_mask = input >= 0\n        input = input * non_pad_mask.long()\n        non_pad_mask = non_pad_mask.view(-1)\n\n        # Run the forward pass\n        output, _ = self.forward(input[:, :-1])\n\n        # Estimate the targets\n        targets = input.view(-1)[non_pad_mask]\n        preds   = output.view(-1, self.vocab_size)[non_pad_mask]\n\n        # Estimate the loss\n        
return T.nn.functional.cross_entropy(preds, targets)\n\n    def inference(self, input):\n        input = input['x'] if isinstance(input, dict) else input\n        input = to_device(input)\n        non_pad_mask = input >= 0\n        input = input * non_pad_mask.long()\n        non_pad_mask = non_pad_mask.view(-1)\n        output, _ = self.forward(input[:, :-1])\n\n        # Apply mask to input/output\n        targets = input.view(-1)[non_pad_mask]\n        preds = output.view(-1, self.vocab_size)[non_pad_mask]\n\n        # accuracy\n        probs_topK, preds_topK = T.topk(preds, self.topK_results, sorted=True, dim=1)\n        probs, preds = probs_topK[:,0], preds_topK[:,0]\n        if self.OOV_correct:\n            acc = preds.eq(targets).float().mean()\n        else:\n            valid = preds != 0  # reject OOV predictions even if they're correct\n            acc = (preds.eq(targets) * valid).float().mean()\n\n        if self.wantLogits:\n            output = {'probabilities': softmax(probs_topK.cpu().detach().numpy(), axis=1),\n                      'predictions': preds_topK.cpu().detach().numpy(),\n                      'labels': targets.cpu().detach().numpy()}\n\n        return {'output': output, 'acc': acc.item(), 'batch_size': input.shape[0]}\n"
  },
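`GRU2._forward_cell` computes the standard GRU update: reset and update gates from the summed input/hidden projections, then a candidate state blended with the previous hidden state. The same arithmetic in NumPy (a sketch for illustration, not the model code):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_cell(g_i, g_h, hidden):
    # g_i / g_h are the input and hidden projections, each split into
    # reset, update and candidate chunks, as in GRU2._forward_cell.
    i_r, i_i, i_n = np.split(g_i, 3, axis=1)
    h_r, h_i, h_n = np.split(g_h, 3, axis=1)
    reset_gate = sigmoid(i_r + h_r)
    input_gate = sigmoid(i_i + h_i)
    new_gate = np.tanh(i_n + reset_gate * h_n)
    # Interpolate between the candidate and the previous hidden state.
    return new_gate + input_gate * (hidden - new_gate)

# With zero projections: both gates are 0.5, candidate is 0,
# so the new state is halfway between 0 and the old hidden state.
h = gru_cell(np.zeros((1, 6)), np.zeros((1, 6)), np.ones((1, 2)))
```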
  {
    "path": "experiments/nlg_gru/utils/utility.py",
    "content": "# Copyright (c) Microsoft Corporation.\n# Licensed under the MIT license.\n\nimport os\nimport json\nimport time\nfrom argparse import ArgumentParser\n\nimport numpy as np\nfrom collections import namedtuple\nfrom tqdm import tqdm\n\nTR_UPPER = {ord('i'): 'İ'}\nTR_LOWER = {ord('I'): 'ı'}\n\nVocab = namedtuple('Vocab', ['idx_to_term', 'term_to_idx'])\n\n\ndef load_vocab(url):\n    \"\"\"Load a vocabulary file.\n\n    url -- string -- url to the txt file\n\n    returns -- Vocab(idx_to_term=list, term_to_idx=dict)\n    \"\"\"\n    term_to_idx = {}\n    idx_to_term = []\n    with open(url, 'r', encoding='utf-8') as f:\n        for i, line in enumerate(f):\n            word = line.strip()\n            idx_to_term.append(word)\n            term_to_idx[word] = i\n    return Vocab(idx_to_term, term_to_idx)\n\n\ndef to_indices(vocab, batch, ndim=2, oov_idx=0, pad_idx=-1):\n        \"\"\"Convert a nested list of strings to a np.array of integers.\n        \n        vocab -- Vocab -- the vocabulary of the model\n        \n        batch -- [..[str]..] 
-- multidimensional batch\n\n        ndim -- int -- number of dimensions in batch\n\n        oov_idx -- int or None -- if specified, replace missing terms by\n                   the given index, otherwise raise an error\n\n        pad_idx -- int or None -- if specified, pad short last-dimension\n                   as specified, otherwise raise an error\n\n        raises -- ValueError -- if pad is required but pad_idx not specified\n               -- KeyError -- if oov is required but oov_idx not specified\n\n        returns -- np.array(int) -- term indices\n        \"\"\"\n        #print_rank(f'to_indices: batch len: {len(batch)} ndim: {ndim}')\n        if ndim == 1:\n            return np.array(\n                [(vocab.term_to_idx[term] if oov_idx is None else \n                        vocab.term_to_idx.get(term, oov_idx)) \n                            for term in batch],  dtype=np.int32)\n\n        if ndim == 2:\n            # note: in most circumstances there is only one example in the batch\n            # as a result, padding is never applied. 
We rely on collate_fn to properly\n            # apply padding.\n            length = max(len(row) for row in batch)\n            if pad_idx is None and min(len(row) for row in batch) != length:\n                raise ValueError('Padding required, but no pad_idx provided')\n            pad = length * [pad_idx]\n\n            result = np.array(\n                [[(vocab.term_to_idx[term] if oov_idx is None else\n                        vocab.term_to_idx.get(term, oov_idx))\n                            for term in row] + pad[len(row):]\n                                for row in batch], dtype=np.int32)\n            #print_rank(f'to_indices result: {result.shape}')\n            return result\n\n        # Flatten to a 2D batch, then recurse & reshape up (this ensures\n        # padding is handled correctly)\n        shape = [len(batch)]\n        for _ in range(2, ndim):\n            shape.append(len(batch[0]))\n            batch = [item for sub_batch in batch for item in sub_batch]\n        shape.append(-1)\n        return to_indices(vocab, batch, ndim=2, oov_idx=oov_idx, pad_idx=pad_idx).reshape(*shape)\n\ndef case_backoff_batch(batch, vocab):\n    \"\"\"Perform capitalization backoff on words both to lower & initial-upper case variants.\n\n    batch -- list(list(string)) -- batch of sentences of words, to back off\n\n    vocab -- set(string) -- vocabulary to consider\n\n    returns -- list(list(string)) -- backed-off batch\n    \"\"\"\n\n    def _variants(word):\n        yield word\n        yield word.translate(TR_LOWER).lower()\n        yield word.lower()\n        if len(word) > 1:\n            yield word[0].translate(TR_UPPER).capitalize() + word[1:]\n        yield word.capitalize()\n\n    return [[next((variant for variant in _variants(word) if variant in vocab),\n                  word)  # will become OOV\n             for word in sentence]\n            for sentence in batch]\n\n\ndef encode_data(data_dict, vocab):\n    '''Encode data that is in the format 
expected by FLUTE\n    \n    Parameters\n    ----------\n    data_dict: dict\n        Dictionary where keys consist of usernames and values give\n        the data for that user, specified by another dictionary with\n        keys :code:`x` (features) and, optionally, :code:`y` (labels).\n    vocab: Vocab\n        Vocabulary (as returned by :code:`load_vocab`) used to map\n        terms to indices.\n\n    Returns\n    -------\n    dict\n        Dictionary in the same format as the input one, but now the\n        data in the :code:`x` field is given by tokens (i.e., integers),\n        instead of strings.\n    '''\n    new_dict = {}\n    for key, value in tqdm(data_dict.items()):\n        user_data = [s.split() for s in value['x']]\n        processed_data = case_backoff_batch(user_data, vocab.term_to_idx)\n        encoded_data = [[vocab.term_to_idx.get(term, 0) for term in row] for row in processed_data]\n        new_dict[key] = {'x': encoded_data}\n\n    return new_dict\n\n\nif __name__ == '__main__':\n    parser = ArgumentParser(description='Encodes data')\n    parser.add_argument('data_path', type=str, help='Path to data')\n    parser.add_argument('vocab_path', type=str, help='Path to vocabulary')\n    args = parser.parse_args()\n\n    if not os.path.isfile(args.data_path):\n        raise ValueError('data file does not exist')\n    if not os.path.isfile(args.vocab_path):\n        raise ValueError('vocabulary file does not exist')\n    if args.data_path[-5:] != '.json':\n        raise ValueError('argument must be a valid json file')\n\n    # Load vocabulary\n    print('Loading vocabulary...')\n    vocab = load_vocab(args.vocab_path)\n\n    # Load and encode data\n    print('Loading data... 
', end='', flush=True)\n    start_time = time.time()\n    with open(args.data_path, 'r') as input_file:\n        all_data = json.load(input_file)\n    print(f'Finished in {time.time() - start_time:.2f}s')\n\n    print('Converting data...')\n    converted_user_data = encode_data(all_data['user_data'], vocab)\n    \n    # For debug purposes\n    for k, v in converted_user_data.items():\n        print(f'USER: {k}\\nDATA: {v}')\n        break\n\n    # Save encoded data to disk\n    print('Saving encoded data to disk...')\n    all_data['user_data'] = converted_user_data\n    with open(f'{args.data_path[:-5]}-encoded.json', 'w') as output_file:\n        json.dump(all_data, output_file)"
  },
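`case_backoff_batch` picks the first capitalisation variant of each word that appears in the vocabulary, falling back to the original word (which then maps to OOV). A simplified sketch of that backoff, omitting the Turkish-specific `TR_UPPER`/`TR_LOWER` mappings:

```python
def backoff(word, vocab):
    # Try the word itself, then lower-case, then capitalized, and fall
    # back to the original (which becomes OOV), as case_backoff_batch does.
    for variant in (word, word.lower(), word.capitalize()):
        if variant in vocab:
            return variant
    return word

vocab = {'hello', 'World'}
picked = [backoff(w, vocab) for w in ['HELLO', 'world', 'xyz']]
```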
  {
    "path": "experiments/nlp_rnn_fedshakespeare/README.md",
    "content": "## FedML Benchmark\n\n### Examples\n\nThe example in this folder was taken from [FedML](https://github.com/FedML-AI/FedML/tree/master/python/examples/simulation/mpi_fedavg_datasets_and_models_example) repository on its release 0.7.300, using the configuration suggested on their\n[benchmarking results](https://doc.fedml.ai/simulation/benchmark/BENCHMARK_MPI.html) for MPI-Based Federated Learning (fastest on this version).\n\n### Data\n\nFLUTE will automatically download the data used for this example, otherwise you can use the scripts provided [here](https://github.com/FedML-AI/FedML/tree/master/python/fedml/data) for each independent dataset in the FedML GitHub repository. \n\n### Run\n\nIf you downloaded the data manually, make sure that the variable `data_cache_dir` has been updated inside `preprocess.py`. Later, you can run the experiment as follows:\n\n```code\n\n    python -m torch.distributed.run  --nproc_per_node=4  e2e_trainer.py -dataPath ~/data -outputPath ~/outputTest  -config ./experiments/nlp_rnn_fedshakespeare/config.yaml -task nlp_rnn_fedshakespeare -backend nccl\n    \n```\n### Results\n\nThis comparison was carried out using Parrot (Simulator) on version 0.7.303 at commit ID [8f7f261f](https://github.com/FedML-AI/FedML/tree/8f7f261f44e58d0cb5a416b0d6fa270b42a91049). 
\n```\n _____________________________________________________________________________\n|                    |   FedML (MPI) - Fastest   |   FLUTE (NCCL)  - Fastest  |\n| Task               | Acc | Time     | GPU Mem  | Acc | Time     | GPU Mem   |\n|--------------------|-----|----------|----------|-----|----------|-----------|\n| LR_MNIST           | ~81 | 00:03:09 | ~3060 MB | ~81 | 00:01:35 | ~1060 MB  |\n| CNN_FEMNIST        | ~83 | 05:49:52 | ~5180 MB | ~83 | 00:08:22 | ~1770 MB  |\n| RESNET_FEDCIFAR100 | ~34 | 15:55:36 | ~5530 MB | ~33 | 01:42:01 | ~1900 MB  |\n| RNN_FEDSHAKESPEARE | ~57 | 06:46:21 | ~3690 MB | ~57 | 00:21:50 | ~1270 MB  |\n -----------------------------------------------------------------------------\n```\n\n### FedML Configuration file\n\nIn order to reproduce this experiment in FedML please use the setup below. \n\n```yaml\n\ncommon_args:\n  training_type: \"simulation\"\n  random_seed: 0\n\ndata_args:\n  dataset: \"fed_shakespeare\"\n  data_cache_dir: ~/fedml_data\n  partition_method: \"hetero\"\n  partition_alpha: 0.5\n\nmodel_args:\n  model: \"rnn\"\n\n\ntrain_args:\n  federated_optimizer: \"FedAvg\"\n  client_id_list: \"[]\"\n  client_num_in_total: 715\n  client_num_per_round: 10\n  comm_round: 1200\n  epochs: 1\n  batch_size: 4\n  client_optimizer: sgd\n  learning_rate: 0.8\n  weight_decay: 0.001\n\nvalidation_args:\n  frequency_of_the_test: 50\n\ndevice_args:\n  worker_num: 10\n  using_gpu: true\n  gpu_mapping_file: config/fedshakespeare_rnn/gpu_mapping.yaml\n  gpu_mapping_key: mapping_default # [3, 3, 3, 2]\n\ncomm_args:\n  backend: \"MPI\"\n  is_mobile: 0\n\n```"
  },
  {
    "path": "experiments/nlp_rnn_fedshakespeare/config.yaml",
    "content": "# Basic configuration file for running classif_cnn example using torchvision CIFAR10 dataset.\n# Parameters needed to initialize the model\nmodel_config:\n    model_type: RNN                                # class w/ `loss` and `inference` methods\n    model_folder: experiments/nlp_rnn_fedshakespeare/model.py     # file containing class\n\n# Configuration for differential privacy\ndp_config:\n    enable_local_dp: false                             # whether to enable user-level DP\n\n# Additional privacy metrics\nprivacy_metrics_config:\n    apply_metrics: false                               # cache data to compute additional metrics\n\n# Select the Federated optimizer to use (e.g. DGA, FedAvg or FedProx)\nstrategy: FedAvg\n\n# Determines all the server-side settings for training and evaluation rounds\nserver_config:   \n    wantRL: false                                      # whether to use RL-based meta-optimizers\n    resume_from_checkpoint: false                      # restart from checkpoint if file exists\n    do_profiling: false                                # run profiler and compute runtime metrics\n    optimizer_config:                                  # this is the optimizer used to update the model\n        type: sgd\n        lr: 1.0\n    annealing_config:                                  # annealer for the learning rate\n        type: step_lr\n        step_interval: epoch\n        gamma: 1.0\n        step_size: 100\n    val_freq: 50000                                       # how many iterations between metric eval on val set\n    rec_freq: 50                                     # how many iterations between metric eval on test set\n    initial_val: false\n    initial_rec: false\n    max_iteration: 1200                               # how many iterations in total\n    num_clients_per_iteration: 10                      # how many clients per iteration\n    data_config:                                       # where to get val and test data 
from\n        val:\n            batch_size: 4\n            val_data: null                             # Assigned to null because dataset is being instantiated\n        test:\n            batch_size: 4\n            test_data: null                            # Assigned to null because dataset is being instantiated\n    type: model_optimization\n    aggregate_median: softmax                          # how aggregations weights are computed\n    initial_lr_client: 0.8                           # learning rate used on client optimizer\n    lr_decay_factor: 1.0\n    weight_train_loss: train_loss\n    best_model_criterion: loss\n    fall_back_to_best_model: false\n    softmax_beta: 1.0\n\n# Dictates the learning parameters for client-side model updates. Train data is defined inside this config.\nclient_config:\n    do_profiling: false                                # run profiling and compute runtime metrics\n    ignore_subtask: false\n    data_config:                                       # where to get training data from\n        train:\n            batch_size: 4\n            list_of_train_data: null                   # Assigned to null because dataset is being instantiated\n            desired_max_samples: 5000\n    optimizer_config:                                  # this is the optimizer used by the client\n        type: sgd\n        lr: 0.8                                      # this is overridden by `initial_lr_client`\n    type: optimization"
  },
  {
    "path": "experiments/nlp_rnn_fedshakespeare/dataloaders/dataloader.py",
    "content": "# Copyright (c) Microsoft Corporation.\n# Licensed under the MIT license.\n\nimport torch\nimport numpy as np\n\nfrom core.dataloader import BaseDataLoader\nfrom experiments.nlp_rnn_fedshakespeare.dataloaders.dataset import Dataset\n\nclass DataLoader(BaseDataLoader):\n    def __init__(self, mode, num_workers=0, **kwargs):\n        args = kwargs['args']\n        self.batch_size = args['batch_size']\n\n        dataset = Dataset(\n            data=kwargs['data'],\n            test_only=(not mode=='train'),\n            user_idx=kwargs.get('user_idx', None),\n        )\n\n        super().__init__(\n            dataset,\n            batch_size=self.batch_size,\n            shuffle=(mode=='train'),\n            num_workers=num_workers,\n            collate_fn=self.collate_fn,\n        )\n\n    def collate_fn(self, batch):\n        x, y = list(zip(*batch))\n        x, y = np.array(x), np.array(y)\n        return {'x': torch.tensor(x), 'y': torch.tensor(y)}"
  },
  {
    "path": "experiments/nlp_rnn_fedshakespeare/dataloaders/dataset.py",
    "content": "# Copyright (c) Microsoft Corporation.\n# Licensed under the MIT license.\n\nimport numpy as np\nfrom core.dataset import BaseDataset\nfrom experiments.nlp_rnn_fedshakespeare.dataloaders.preprocessing import FEDSHAKESPEARE\n\nclass Dataset(BaseDataset):\n    def __init__(self, data, test_only=False, user_idx=0, **kwargs):\n        self.test_only = test_only\n        self.user_idx = user_idx\n\n        # Get all data\n        self.user_list, self.user_data, self.user_data_label, self.num_samples = self.load_data(data, self.test_only)\n\n        if user_idx == -1:\n            self.user = self.user_list\n            self.features = np.vstack([user_data for user_data in self.user_data.values()])\n            self.labels = np.vstack([user_label for user_label in self.user_data_label.values()])\n        else:\n            if self.test_only:  # combine all data into single array\n                self.user = 'test_only'\n                self.features = np.vstack([user_data for user_data in self.user_data.values()])\n                self.labels = np.vstack([user_label for user_label in self.user_data_label.values()])\n            else:  # get a single user's data\n                if user_idx is None:\n                    raise ValueError('in train mode, user_idx must be specified')\n\n                self.user = self.user_list[user_idx]\n                self.features = self.user_data[self.user]\n                self.labels = self.user_data_label[self.user]\n\n    def __getitem__(self, idx):\n        return np.array(self.features[idx]).astype(np.int32).T, self.labels[idx]\n\n    def __len__(self):\n        return len(self.features)\n\n    def load_data(self, data, test_only):\n        '''Wrapper method to read/instantiate the dataset'''\n\n        if data == None:\n            dataset = FEDSHAKESPEARE()\n            data = dataset.testset if test_only else dataset.trainset\n        \n        users = data['users']\n        features = data['user_data']\n       
 labels = data['user_data_label']\n        num_samples = data['num_samples']\n            \n        return users, features, labels, num_samples"
  },
  {
    "path": "experiments/nlp_rnn_fedshakespeare/dataloaders/preprocessing.py",
    "content": "import logging\nimport os\nimport wget\nimport tarfile\nimport h5py\nimport collections\nimport numpy as np\n\ndata_cache_dir = \"./data\"\nDEFAULT_TRAIN_FILE = \"shakespeare_train.h5\"\nDEFAULT_TEST_FILE = \"shakespeare_test.h5\"\n\nword_dict = None\nword_list = None\n_pad = \"<pad>\"\n_bos = \"<bos>\"\n_eos = \"<eos>\"\n\n''' \n    The FedeShakespeare dataset is taken from FedML repository. For more information regarding this dataset, \n    please refer to https://github.com/FedML-AI/FedML/tree/master/python/fedml/data/fed_shakespeare.\n\n    In order to download the data run the following commands:\n        - wget --no-check-certificate --no-proxy https://fedml.s3-us-west-1.amazonaws.com/shakespeare.tar.bz2\n        - tar -xvf shakespeare.tar.bz2\n    \n    This code follows the steps of preprocessing in tff shakespeare dataset: \n    https://github.com/google-research/federated/blob/master/utils/datasets/shakespeare_dataset.py\n\n'''\n\nSEQUENCE_LENGTH = 80  # from McMahan et al AISTATS 2017\n# Vocabulary re-used from the Federated Learning for Text Generation tutorial.\n# https://www.tensorflow.org/federated/tutorials/federated_learning_for_text_generation\n\nCHAR_VOCAB = list(\"dhlptx@DHLPTX $(,048cgkoswCGKOSW[_#'/37;?bfjnrvzBFJNRVZ\\\"&*.26:\\naeimquyAEIMQUY]!%)-159\\r\")\n\ndef preprocess(sentences, max_seq_len=SEQUENCE_LENGTH):\n\n    sequences = []\n\n    def to_ids(sentence, num_oov_buckets=1):\n        \"\"\"\n        map list of sentence to list of [idx..] 
and pad to max_seq_len + 1\n        Args:\n            num_oov_buckets : The number of out of vocabulary buckets.\n            max_seq_len: Integer determining shape of padded batches.\n        \"\"\"\n        tokens = [char_to_id(c) for c in sentence]\n        tokens = [char_to_id(_bos)] + tokens + [char_to_id(_eos)]\n        if len(tokens) % (max_seq_len + 1) != 0:\n            pad_length = (-len(tokens)) % (max_seq_len + 1)\n            tokens += [char_to_id(_pad)] * pad_length\n        return (\n            tokens[i : i + max_seq_len + 1]\n            for i in range(0, len(tokens), max_seq_len + 1)\n        )\n\n    for sen in sentences:\n        sequences.extend(to_ids(sen))\n    return sequences\n\ndef char_to_id(char):\n    word_dict = get_word_dict()\n    if char in word_dict:\n        return word_dict[char]\n    else:\n        return len(word_dict)\n\ndef get_word_dict():\n    global word_dict\n    if word_dict is None:\n        words = [_pad] + CHAR_VOCAB + [_bos] + [_eos]\n        word_dict = collections.OrderedDict()\n        for i, w in enumerate(words):\n            word_dict[w] = i\n    return word_dict\n\ndef split(dataset):\n    ds = np.asarray(dataset)\n    x = ds[:, :-1]\n    y = ds[:, 1:]\n    return x, y\n\ndef download_files(data_cache_dir):\n\n    URL = \"https://fedml.s3-us-west-1.amazonaws.com/shakespeare.tar.bz2\"\n\n    if not os.path.exists(data_cache_dir):\n        os.makedirs(data_cache_dir)\n\n    file_path = os.path.join(data_cache_dir,\"shakespeare.tar.bz2\")\n\n    # Download and decompress the file (if we haven't already)\n    if not os.path.exists(file_path):\n        wget.download(URL, out=file_path)\n\n        file = tarfile.open(file_path)\n        file.extractall(os.path.join(data_cache_dir,'fed_shakespeare'))\n        file.close()\n\nclass FEDSHAKESPEARE:\n    def __init__(self):\n\n        download_files(data_cache_dir)\n        train_h5 = h5py.File(os.path.join(data_cache_dir,'fed_shakespeare', DEFAULT_TRAIN_FILE), 
\"r\")\n        test_h5 = h5py.File(os.path.join(data_cache_dir, 'fed_shakespeare',DEFAULT_TEST_FILE), \"r\")\n        test_dict = {'users': [], 'num_samples': [], 'user_data': dict(), 'user_data_label': dict()}\n        train_dict = {'users': [], 'num_samples': [], 'user_data': dict(), 'user_data_label': dict()}\n\n        for user in train_h5['examples'].keys():\n            train_dict['users'].append(user)\n            raw_train = train_h5['examples'][user]['snippets'][()]\n            raw_train = [x.decode(\"utf8\") for x in raw_train]\n            user_data = preprocess(raw_train)\n            train_dict['num_samples'].append(len(user_data))\n\n            # split data\n            train_x, train_y = split(user_data)\n            train_dict['user_data'][user] = train_x\n            train_dict['user_data_label'][user] = train_y\n\n        for user in test_h5['examples'].keys():\n            test_dict['users'].append(user)\n            raw_test = test_h5['examples'][user]['snippets'][()]\n            raw_test = [x.decode(\"utf8\") for x in raw_test]\n            user_data = preprocess(raw_test)\n            test_dict['num_samples'].append(len(user_data))\n\n            # split data\n            test_x, test_y = split(user_data)\n            test_dict['user_data'][user] = test_x\n            test_dict['user_data_label'][user] = test_y\n            \n        print(\" Dictionaries ready .. \")\n        self.trainset, self.testset = train_dict, test_dict\n\n"
  },
  {
    "path": "experiments/nlp_rnn_fedshakespeare/model.py",
    "content": "import torch\nfrom torch import nn\nfrom torch.nn import functional as F\nfrom core.model import BaseModel\n\n''' \n    The RNN model is taken from the FedML repository. For more information regarding this model, \n    please refer to https://github.com/FedML-AI/FedML/blob/master/python/fedml/model/nlp/rnn.py.\n\n'''\n\nclass nlp_rnn_fedshakespeare(nn.Module):\n    def __init__(self, embedding_dim=8, vocab_size=90, hidden_size=256):\n        super(nlp_rnn_fedshakespeare, self).__init__()\n        self.embeddings = nn.Embedding(\n            num_embeddings=vocab_size, embedding_dim=embedding_dim, padding_idx=0\n        )\n        self.lstm = nn.LSTM(\n            input_size=embedding_dim,\n            hidden_size=hidden_size,\n            num_layers=2,\n            batch_first=True,\n        )\n        self.fc = nn.Linear(hidden_size, vocab_size)\n\n    def forward(self, input_seq):\n        embeds = self.embeddings(input_seq)\n        # Note that the order of mini-batches is random, so there is no hidden relationship among batches.\n        # Therefore we do not pass in the previous batch's hidden state,\n        # leaving the initial hidden state zero: `self.lstm(embeds)`.\n        lstm_out, _ = self.lstm(embeds)\n        # For fed_shakespeare: predict the next character at every position of the sequence\n        output = self.fc(lstm_out[:, :])\n        output = torch.transpose(output, 1, 2)\n        return output\n\nclass RNN(BaseModel):\n    '''This is a PyTorch model with some extra methods'''\n\n    def __init__(self, model_config):\n        super().__init__()\n        self.net = nlp_rnn_fedshakespeare()\n\n    def loss(self, input) -> torch.Tensor:\n        '''Performs forward step and computes the loss'''\n        device = 'cuda' if torch.cuda.is_available() else 'cpu'\n        x, target = input['x'].to(device), 
input['y'].to(device)\n        output = self.net.forward(x)\n        criterion = nn.CrossEntropyLoss(ignore_index=0).to(device)\n        return criterion(output, target.long())\n\n    def inference(self, input):\n        '''Performs forward step and computes metrics'''\n        device = 'cuda' if torch.cuda.is_available() else 'cpu'\n        x, target = input['x'].to(device), input['y'].to(device)\n        output = self.net.forward(x)\n        n_samples = x.shape[0]\n        \n        pred = torch.argmax(output, dim=1)\n        mask = (target != 0)  # ignore padding tokens when computing accuracy\n        accuracy = torch.sum((pred[mask] == target[mask]).float()).item()\n        accuracy = accuracy / mask.sum().item()  # return a plain float, not a 0-dim tensor\n\n        return {'output': output, 'acc': accuracy, 'batch_size': n_samples}\n"
  },
  {
    "path": "experiments/semisupervision/README.md",
    "content": "### Data\n\nBefore running this experiment, you first need to run the script [cifar_dataset.py](dataloaders/cifar_dataset.py), which downloads and preprocesses the CIFAR100 dataset needed for this task.\n\n```bash\n\n    python experiments/semisupervision/dataloaders/cifar_dataset.py\n    \n```\n### Run\n\nOnce the data has been downloaded, you can run the experiment as follows:\n\n```bash\n\n    python -m torch.distributed.run --nproc_per_node=2 e2e_trainer.py -dataPath ~/data -outputPath ~/outputTest -config ./experiments/semisupervision/config.yaml -task semisupervision -backend nccl\n    \n```\n"
  },
  {
    "path": "experiments/semisupervision/config.yaml",
    "content": "# Basic configuration file for running semisupervision with data loaded on-the-fly\n# Parameters needed to initialize the model\nmodel_config:\n    model_type: Res                               # class w/ `loss` and `inference` methods\n    model_folder: experiments/semisupervision/model.py         # file containing class\n    num_classes: 100\n\n# Configuration for differential privacy\ndp_config:\n    enable_local_dp: false                             # whether to enable user-level DP\n\n# Additional privacy metrics\nprivacy_metrics_config:\n    apply_metrics: false                               # cache data to compute additional metrics\n\n# Select the Federated optimizer to use (e.g. DGA, FedAvg or FedProx)\nstrategy: FedLabels\n\n# Determines all the server-side settings for training and evaluation rounds\nserver_config:\n    send_dicts: true                                   # if true, the server will update model dictionaries instead of grads\n    wantRL: false                                      # whether to use RL-based meta-optimizers\n    resume_from_checkpoint: true                      # restart from checkpoint if file exists\n    do_profiling: false                                # run profiler and compute runtime metrics\n    optimizer_config:                                  # this is the optimizer used to update the model\n        type: sgd\n        lr: 1.0\n    annealing_config:                                  # annealer for the learning rate\n        type: step_lr\n        step_interval: epoch\n        gamma: 1.0\n        step_size: 100\n    val_freq: 1                                       # how many iterations between metric eval on val set\n    rec_freq: 5000                                      # how many iterations between metric eval on test set\n    initial_val: true\n    initial_rec: false\n    max_iteration: 2000                                # how many iterations in total\n    num_clients_per_iteration: 10             
        # how many clients per iteration\n    data_config:                                       # where to get val and test data from\n        val:\n            batch_size: 64\n            val_data: null\n        test:\n            batch_size: 64\n            test_data: null\n    type: model_optimization\n    aggregate_median: softmax                          # how aggregations weights are computed\n    softmax_beta: 20.0\n    initial_lr_client: 0.003                           # learning rate used on client optimizer\n    lr_decay_factor: 1.0\n    weight_train_loss: train_loss\n    best_model_criterion: loss\n    fall_back_to_best_model: false\n\n# Dictates the learning parameters for client-side model updates. Train data is defined inside this config.\nclient_config:\n    do_profiling: false                                # run profiling and compute runtime metrics\n    ignore_subtask: false\n    data_config:                                       # where to get training data from\n        train:\n            batch_size: 64\n            list_of_train_data: null\n            desired_max_samples: 87000\n    optimizer_config:                                  # this is the optimizer used by the client\n        type: sgd \n        lr: 0.003                                      # this is overridden by `initial_lr_client`\n        momentum: 0\n    type: optimization\n    semisupervision:\n        uda: 1\n        num_classes: 100\n        isclust: 0\n        alpha: 0.1\n        train_ratio: 0.2\n        test_ratio: 0.0\n        val_ratio: 0.8\n        vat_ptb: 0\n        vat_consis: 0.05\n        lamb_consist: 0.05\n        unsup_lamb: 1\n        l2_lambda: 10\n        burnout_round: 50 \n        thre: 0.3\n        comp: var\n        eta: 0.003\n        bs: 64\n        unl_bs: 128\n        train_ep: 30\n        unsuptrain_ep: 10\n        ensize: 100\n        seed: 0\n        temp: 1\n        device: cuda\n        size: 10\n        shuffle: 1"
  },
  {
    "path": "experiments/semisupervision/dataloaders/RandAugment.py",
    "content": "'''\nCode in this file is adapted from rpmcruz/autoaugment\nhttps://github.com/rpmcruz/autoaugment/blob/master/transformations.py\n\nThis code is modified version of https://github.com/ildoonet/pytorch-randaugment/blob/master/RandAugment/augmentations.py\nfor randaugmentation.\n'''\nimport random\n\nimport PIL, PIL.ImageOps, PIL.ImageEnhance, PIL.ImageDraw\nimport numpy as np\nimport torch\nfrom PIL import Image\n\n\ndef ShearX(img, v):  # [-0.3, 0.3]\n    assert -0.3 <= v <= 0.3\n    if random.random() > 0.5:\n        v = -v\n    return img.transform(img.size, PIL.Image.AFFINE, (1, v, 0, 0, 1, 0))\n\n\ndef ShearY(img, v):  # [-0.3, 0.3]\n    assert -0.3 <= v <= 0.3\n    if random.random() > 0.5:\n        v = -v\n    return img.transform(img.size, PIL.Image.AFFINE, (1, 0, 0, v, 1, 0))\n\n\ndef TranslateX(img, v):  # [-150, 150] => percentage: [-0.45, 0.45]\n    assert -0.45 <= v <= 0.45\n    if random.random() > 0.5:\n        v = -v\n    v = v * img.size[0]\n    return img.transform(img.size, PIL.Image.AFFINE, (1, 0, v, 0, 1, 0))\n\n\ndef TranslateXabs(img, v):  # [-150, 150] => percentage: [-0.45, 0.45]\n    assert 0 <= v\n    if random.random() > 0.5:\n        v = -v\n    return img.transform(img.size, PIL.Image.AFFINE, (1, 0, v, 0, 1, 0))\n\n\ndef TranslateY(img, v):  # [-150, 150] => percentage: [-0.45, 0.45]\n    assert -0.45 <= v <= 0.45\n    if random.random() > 0.5:\n        v = -v\n    v = v * img.size[1]\n    return img.transform(img.size, PIL.Image.AFFINE, (1, 0, 0, 0, 1, v))\n\n\ndef TranslateYabs(img, v):  # [-150, 150] => percentage: [-0.45, 0.45]\n    assert 0 <= v\n    if random.random() > 0.5:\n        v = -v\n    return img.transform(img.size, PIL.Image.AFFINE, (1, 0, 0, 0, 1, v))\n\n\ndef Rotate(img, v):  # [-30, 30]\n    assert -30 <= v <= 30\n    if random.random() > 0.5:\n        v = -v\n    return img.rotate(v)\n\n\ndef AutoContrast(img, _):\n    return PIL.ImageOps.autocontrast(img)\n\n\ndef Invert(img, _):\n    return 
PIL.ImageOps.invert(img)\n\n\ndef Equalize(img, _):\n    return PIL.ImageOps.equalize(img)\n\n\ndef Flip(img, _):  # not from the paper\n    return PIL.ImageOps.mirror(img)\n\n\ndef Solarize(img, v):  # [0, 256]\n    assert 0 <= v <= 256\n    return PIL.ImageOps.solarize(img, v)\n\n\ndef SolarizeAdd(img, addition=0, threshold=128):\n    img_np = np.array(img).astype(int)  # use builtin int; the np.int alias was removed in NumPy 1.24\n    img_np = img_np + addition\n    img_np = np.clip(img_np, 0, 255)\n    img_np = img_np.astype(np.uint8)\n    img = Image.fromarray(img_np)\n    return PIL.ImageOps.solarize(img, threshold)\n\n\ndef Posterize(img, v):  # [4, 8]\n    v = int(v)\n    v = max(1, v)\n    return PIL.ImageOps.posterize(img, v)\n\n\ndef Contrast(img, v):  # [0.1,1.9]\n    assert 0.1 <= v <= 1.9\n    return PIL.ImageEnhance.Contrast(img).enhance(v)\n\n\ndef Color(img, v):  # [0.1,1.9]\n    assert 0.1 <= v <= 1.9\n    return PIL.ImageEnhance.Color(img).enhance(v)\n\n\ndef Brightness(img, v):  # [0.1,1.9]\n    assert 0.1 <= v <= 1.9\n    return PIL.ImageEnhance.Brightness(img).enhance(v)\n\n\ndef Sharpness(img, v):  # [0.1,1.9]\n    assert 0.1 <= v <= 1.9\n    return PIL.ImageEnhance.Sharpness(img).enhance(v)\n\n\ndef Cutout(img, v):  # [0, 60] => percentage: [0, 0.2]\n    assert 0.0 <= v <= 0.2\n    if v <= 0.:\n        return img\n\n    v = v * img.size[0]\n    return CutoutAbs(img, v)\n\n\ndef CutoutAbs(img, v):  # [0, 60] => percentage: [0, 0.2]\n    # assert 0 <= v <= 20\n    if v < 0:\n        return img\n    w, h = img.size\n    x0 = np.random.uniform(w)\n    y0 = np.random.uniform(h)\n\n    x0 = int(max(0, x0 - v / 2.))\n    y0 = int(max(0, y0 - v / 2.))\n    x1 = min(w, x0 + v)\n    y1 = min(h, y0 + v)\n\n    xy = (x0, y0, x1, y1)\n    color = (125, 123, 114)\n    # color = (0, 0, 0)\n    img = img.copy()\n    PIL.ImageDraw.Draw(img).rectangle(xy, color)\n    return img\n\n\ndef SamplePairing(imgs):  # [0, 0.4]\n    def f(img1, v):\n        i = np.random.choice(len(imgs))\n        img2 = 
PIL.Image.fromarray(imgs[i])\n        return PIL.Image.blend(img1, img2, v)\n\n    return f\n\n\ndef Identity(img, v):\n    return img\n\n\ndef augment_list(grey):  # 16 oeprations and their ranges\n    # https://github.com/google-research/uda/blob/master/image/randaugment/policies.py#L57\n    # l = [\n    #     (Identity, 0., 1.0),\n    #     (ShearX, 0., 0.3),  # 0\n    #     (ShearY, 0., 0.3),  # 1\n    #     (TranslateX, 0., 0.33),  # 2\n    #     (TranslateY, 0., 0.33),  # 3\n    #     (Rotate, 0, 30),  # 4\n    #     (AutoContrast, 0, 1),  # 5\n    #     (Invert, 0, 1),  # 6\n    #     (Equalize, 0, 1),  # 7\n    #     (Solarize, 0, 110),  # 8\n    #     (Posterize, 4, 8),  # 9\n    #     # (Contrast, 0.1, 1.9),  # 10\n    #     (Color, 0.1, 1.9),  # 11\n    #     (Brightness, 0.1, 1.9),  # 12\n    #     (Sharpness, 0.1, 1.9),  # 13\n    #     # (Cutout, 0, 0.2),  # 14\n    #     # (SamplePairing(imgs), 0, 0.4),  # 15\n    # ]\n\n    if grey:\n        # https://github.com/tensorflow/tpu/blob/8462d083dd89489a79e3200bcc8d4063bf362186/models/official/efficientnet/autoaugment.py#L505\n        l = [\n            (AutoContrast, 0, 1),\n            (Equalize, 0, 1),\n            (Invert, 0, 1),\n            (Rotate, 0, 30),\n            (Posterize, 0, 4),\n            (Solarize, 0, 256),\n            (SolarizeAdd, 0, 110),\n            (Color, 0.1, 1.9),\n            (Contrast, 0.1, 1.9),\n            (Brightness, 0.1, 1.9),\n            (Sharpness, 0.1, 1.9),\n            (ShearX, 0., 0.3),\n            (ShearY, 0., 0.3),\n            (TranslateXabs, 0., 100),\n            (TranslateYabs, 0., 100),\n        ]\n\n    else:\n        l = [\n            (AutoContrast, 0, 1),\n            (Equalize, 0, 1),\n            (Invert, 0, 1),\n            (Rotate, 0, 30),\n            (Posterize, 0, 4),\n            (Solarize, 0, 256),\n            (SolarizeAdd, 0, 110),\n            (Color, 0.1, 1.9),\n            (Contrast, 0.1, 1.9),\n            (Brightness, 0.1, 1.9),\n    
        (Sharpness, 0.1, 1.9),\n            (ShearX, 0., 0.3),\n            (ShearY, 0., 0.3),\n            (CutoutAbs, 0, 40),\n            (TranslateXabs, 0., 100),\n            (TranslateYabs, 0., 100),\n        ]\n    return l\n\n\nclass Lighting(object):\n    \"\"\"Lighting noise(AlexNet - style PCA - based noise)\"\"\"\n\n    def __init__(self, alphastd, eigval, eigvec):\n        self.alphastd = alphastd\n        self.eigval = torch.Tensor(eigval)\n        self.eigvec = torch.Tensor(eigvec)\n\n    def __call__(self, img):\n        if self.alphastd == 0:\n            return img\n\n        alpha = img.new().resize_(3).normal_(0, self.alphastd)\n        rgb = self.eigvec.type_as(img).clone() \\\n            .mul(alpha.view(1, 3).expand(3, 3)) \\\n            .mul(self.eigval.view(1, 3).expand(3, 3)) \\\n            .sum(1).squeeze()\n\n        return img.add(rgb.view(3, 1, 1).expand_as(img))\n\n\nclass CutoutDefault(object):\n    \"\"\"\n    Reference : https://github.com/quark0/darts/blob/master/cnn/utils.py\n    \"\"\"\n    def __init__(self, length):\n        self.length = length\n\n    def __call__(self, img):\n        h, w = img.size(1), img.size(2)\n        mask = np.ones((h, w), np.float32)\n        y = np.random.randint(h)\n        x = np.random.randint(w)\n\n        y1 = np.clip(y - self.length // 2, 0, h)\n        y2 = np.clip(y + self.length // 2, 0, h)\n        x1 = np.clip(x - self.length // 2, 0, w)\n        x2 = np.clip(x + self.length // 2, 0, w)\n\n        mask[y1: y2, x1: x2] = 0.\n        mask = torch.from_numpy(mask)\n        mask = mask.expand_as(img)\n        img *= mask\n        return img\n\n\nclass RandAugment:\n    def __init__(self, n, m, grey=False):\n        self.n = n\n        self.m = m      # [0, 30]\n        self.augment_list = augment_list(grey)\n\n    def __call__(self, img):\n        ops = random.choices(self.augment_list, k=self.n)\n        #print(ops)\n        for op, minval, maxval in ops:\n            val = (float(self.m) 
/ 30) * float(maxval - minval) + minval\n            img = op(img, val)\n\n        return img"
  },
  {
    "path": "experiments/semisupervision/dataloaders/cifar_dataset.py",
    "content": "# Copyright (c) Microsoft Corporation.\n# Licensed under the MIT license.\n\nimport os \nimport time\nimport json\n\nimport torch\nimport numpy as np\nimport pathlib\n\nfrom torchvision import datasets, transforms\nfrom torch.utils.data import TensorDataset, DataLoader\nfrom numpy.random import RandomState\n\nTRAINSET = \"trainset.json\"\nTRAINSET_UNLAB = \"trainset_unlab.json\"\nTRAINSET_UNLAB_RAND = \"trainset_unlab_rand.json\"\nTESTSET = \"testset.json\"\nROOT = './data'\n\n\nclass CIFAR100:\n    def __init__(self, user_idx=None, test_only=None, args=None, read_data=True) :\n        if read_data: # Reads the data previously saved on files\n            if user_idx == -1:\n                if test_only:\n                    print(\"Reading testing file\")\n                    file = os.path.join(ROOT,TESTSET)\n                else:\n                    print(\"Reading training labeled file\")\n                    file = os.path.join(ROOT,TRAINSET)\n            elif user_idx == -2:\n                print(\"Reading unlabeled training file\")\n                file = os.path.join(ROOT, TRAINSET_UNLAB)\n            elif user_idx == -3:\n                print(\"Reading unlabeled random training file\")\n                file = os.path.join(ROOT, TRAINSET_UNLAB_RAND)\n\n            with open(file, 'r') as f:\n                json_file = json.load(f)\n\n            self.data = json_file\n        else: # Create, preprocess and save the datasets\n            from RandAugment import RandAugment\n            trans = transforms.Compose(\n                [transforms.ToTensor(), transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))])\n            transform_train = transforms.Compose([\n                transforms.RandomCrop(32, padding=4),\n                transforms.RandomHorizontalFlip(),\n                transforms.ToTensor(),\n                transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))])\n\n            
transform_unlabeltrain = transforms.Compose([ \n                RandAugment(1, 10),\n                transforms.RandomCrop(32, padding=4),\n                transforms.RandomHorizontalFlip(),\n                transforms.ToTensor(),\n                transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))])\n\n            # Download and preprocess datasets\n            trainset = datasets.CIFAR100('./data', train=True, download=True, transform=transform_train)\n            unlabel_trainset = datasets.CIFAR100('./data', train=True, download=True, transform=transform_unlabeltrain)\n            self.pretestset = datasets.CIFAR100('./data', train=False, download=True, transform=trans)\n\n            train_loader = DataLoader(trainset, batch_size=len(trainset))\n            ultrain_loader = DataLoader(unlabel_trainset, batch_size=len(unlabel_trainset))\n\n            X_train = next(iter(train_loader))[0].numpy()\n            Y_train = next(iter(train_loader))[1].numpy()\n            X_unlabel_train = next(iter(ultrain_loader))[0].numpy()\n            Y_unlabel_train = next(iter(ultrain_loader))[1].numpy()\n\n            self.pretrainset, trainset_unlab_rand, trainset_unlab, \\\n            self.embed_dim = partition_imagedataset(X_train, Y_train, X_unlabel_train, Y_unlabel_train,args)\n            self.trainset = _process(self.pretrainset, train=True)\n            self.trainset_unlab = _process(trainset_unlab, train=True)\n            self.trainset_unlab_rand = _process(trainset_unlab_rand, train=True)\n            self.testset = _process(self.pretestset, train=False)\n\n            save_json(self.trainset, TRAINSET)\n            save_json(self.trainset_unlab, TRAINSET_UNLAB)\n            save_json(self.trainset_unlab_rand, TRAINSET_UNLAB_RAND)\n            save_json(self.testset, TESTSET)\n\ndef save_json(dict, filename):\n    f = open(os.path.join('./data',filename), \"w\")\n    json.dump(dict,f)\n    f.close()\n\ndef _process(dataset, train=True):\n    
'''Process a Torchvision/preprocessed dataset to expected FLUTE format'''\n\n    print('Converting data to expected format...')\n    start_time = time.time()\n\n    data_dict = {'users':[], 'num_samples': [], 'user_data':{}, 'user_data_label':{}}\n    \n    for i in range(len(dataset)):\n\n        if train:\n            x, y = dataset[i]['x'], dataset[i]['y']\n        else:\n            x, y = dataset[i]\n\n        data_dict['users'].append(f'{i:04d}')\n        data_dict['num_samples'].append(len(y) if train else 1)\n        data_dict['user_data'][f'{i:04d}'] = [xi.tolist() for xi in x] if train else [x.tolist()]\n        data_dict['user_data_label'][f'{i:04d}'] = [yi.tolist() for yi in y] if train else y\n\n    print(f'Finished converting data in {time.time() - start_time:.2f}s.')\n\n    return data_dict\n\ndef partition_imagedataset(X_train, Y_train, X_unlabel_train, Y_unlabel_train, args):\n\n    if args['isclust'] == 1:\n        partition = __getClusteredData__(Y_train, args['ensize'])\n\n    elif args['isclust'] == 2:\n        partition = __getClusteredMixedData__(Y_train, args['ensize'])\n    else:\n        partition = __getDirichletData__(Y_train, args)\n\n    dataset_train = []\n    dataset_val = []\n    dataset_val_norand = []\n    dataset_test = []\n\n    train_ratio = args['train_ratio']\n    val_ratio = args['val_ratio']\n    test_ratio = args['test_ratio']\n    x_for_embed = np.shape(X_train[0])\n    for (i, ind) in enumerate(partition):\n\n        x = X_train[ind]\n        y = Y_train[ind]\n\n        x_ul = X_unlabel_train[ind]\n        y_ul = Y_unlabel_train[ind]\n\n        n_i = len(ind)\n\n        train_size = int(train_ratio * n_i)\n        val_size = int(val_ratio * n_i) \n        test_size = int(test_ratio * n_i)\n\n        x_train = torch.Tensor(x[val_size:val_size + train_size])\n        y_train = torch.LongTensor(y[val_size:val_size + train_size])\n\n        dataset_train_torch = {'x': x_train, 'y':y_train}\n\n        if val_size == 0:\n      
      x_val = x_train\n            y_val = y_train\n            dataset_val_torch = dataset_train_torch\n            dataset_val_torch_norand = dataset_train_torch\n        else:\n            x_val = torch.Tensor(x[:val_size])\n            y_val = torch.LongTensor(y[:val_size])\n            x_ul_val = torch.Tensor(x_ul[:val_size])\n            y_ul_val = torch.LongTensor(y_ul[:val_size])\n            dataset_val_torch = {'x': x_ul_val, 'y': y_ul_val}\n            dataset_val_torch_norand = {'x':x_val, 'y':y_val}\n\n        dataset_train.append(dataset_train_torch)\n        dataset_val.append(dataset_val_torch)\n        dataset_val_norand.append(dataset_val_torch_norand)\n\n    return dataset_train, dataset_val, dataset_val_norand, x_for_embed\n\ndef __getDirichletData__(y, args):\n\n    n = args['ensize']\n    n_nets = args['ensize']\n    K = args['num_classes']\n    num_c = args['num_classes']\n    labelList_true = y\n\n    min_size = 0\n    N = len(labelList_true)\n    rnd = 0\n    rann = RandomState(rnd)\n    net_dataidx_map = {}\n    p_client = np.zeros((n, num_c))\n\n    for i in range(n):\n        p_client[i] = rann.dirichlet(np.repeat(args['alpha'], num_c))\n\n    idx_batch = [[] for _ in range(n_nets)]\n\n    for k in range(K):\n        idx_k = np.where(labelList_true == k)[0]\n        rann.shuffle(idx_k)\n        proportions = p_client[:, k]\n        proportions = proportions / proportions.sum()\n        proportions = (np.cumsum(proportions) * len(idx_k)).astype(int)[:-1]\n        idx_batch = [idx_j + idx.tolist() for idx_j, idx in zip(idx_batch, np.split(idx_k, proportions))]\n\n    for j in range(n_nets):\n        if args['shuffle'] == 1:\n            rann.shuffle(idx_batch[j])\n\n        net_dataidx_map[j] = idx_batch[j]\n\n    net_cls_counts_label = {}\n    net_cls_counts_unlabel = {}\n\n    for net_i in range(len(idx_batch)):\n        n_i = len(idx_batch[net_i])\n        train_size = int(args['train_ratio'] * n_i)\n        val_size = 
int(args['val_ratio'] * n_i)\n        unq, unq_cnt = np.unique(labelList_true[idx_batch[net_i][val_size:val_size + train_size]], return_counts=True)\n        tmp = {unq[i]: unq_cnt[i] for i in range(len(unq))}\n        net_cls_counts_label[net_i] = tmp\n\n        unq1, unq_cnt1 = np.unique(labelList_true[idx_batch[net_i][:val_size]], return_counts=True)\n        tmp1 = {unq1[i]: unq_cnt1[i] for i in range(len(unq1))}\n        net_cls_counts_unlabel[net_i] = tmp1\n\n    local_sizes = []\n    for i in range(n_nets):\n        local_sizes.append(len(net_dataidx_map[i]))\n    local_sizes = np.array(local_sizes)\n    weights = local_sizes / np.sum(local_sizes)\n\n    return idx_batch\n\nif __name__ == \"__main__\":\n\n    # Download and preprocess data\n    args= {'name': 'FedVATnew', 'isaml':0, 'uda':1 , 'dataset': 'cifar100',\n            'num_classes': 100, 'isclust': 0, 'alpha': 0.1, 'train_ratio': 0.2, 'val_ratio':0.8,\n            'shuffle':1, 'vat_ptb':0.0 , 'vat_consis':0.05, 'unsup_lamb':1, 'l2_lambda':10,\n            'bo': 50, 'thre': 0.3, 'comp': 'var', 'eta': 0.003, 'bs':64, 'unl_bs':128, 'train_ep':30,\n            'unsuptrain_ep':10, 'rounds':2000, 'ensize':100, 'size': 10, 'model': 'RES50', 'seed': 0,\n            'test_ratio': 0.0}\n\n    data = CIFAR100(read_data=False, args=args)"
  },
  {
    "path": "experiments/semisupervision/dataloaders/dataloader.py",
    "content": "# Copyright (c) Microsoft Corporation.\n# Licensed under the MIT license.\n\nimport torch\nimport numpy as np\n\nfrom core.dataloader import BaseDataLoader\nfrom experiments.semisupervision.dataloaders.dataset import Dataset\n\nclass DataLoader(BaseDataLoader):\n    def __init__(self, mode, num_workers=0, **kwargs):\n        args = kwargs['args']\n        self.batch_size = args['batch_size']\n\n        dataset = Dataset(\n            data=kwargs['data'],\n            test_only=(not mode=='train'),\n            user_idx=kwargs.get('user_idx', None),\n        )\n\n        super().__init__(\n            dataset,\n            batch_size=self.batch_size,\n            shuffle=(mode=='train'),\n            num_workers=num_workers,\n            collate_fn=self.collate_fn,\n        )\n\n    def collate_fn(self, batch):\n        x, y = list(zip(*batch))\n        x = np.array(x)\n        y = np.array(y)\n        return {'x': torch.tensor(x), 'y': torch.tensor(y)}"
  },
  {
    "path": "experiments/semisupervision/dataloaders/dataset.py",
    "content": "# Copyright (c) Microsoft Corporation.\n# Licensed under the MIT license.\n\nimport numpy as np\nfrom core.dataset import BaseDataset\nfrom experiments.semisupervision.dataloaders.cifar_dataset import CIFAR100\n\nclass Dataset(BaseDataset):\n    def __init__(self, data, test_only=False, user_idx=0, **kwargs):\n        self.test_only = test_only\n        self.user_idx = user_idx\n        args = kwargs.get('args',None)\n        \n        # Get all data\n        self.user_list, self.user_data, self.user_data_label, self.num_samples = self.load_data(data, self.test_only, args)\n\n        if user_idx != -1:\n            if self.test_only:  # combine all data into single array\n                self.user = 'test_only'\n                self.features = np.vstack([user_data for user_data in self.user_data.values()])\n                self.labels = np.hstack([user_label for user_label in self.user_data_label.values()])\n            else:  # get a single user's data\n                if user_idx is None:\n                    raise ValueError('in train mode, user_idx must be specified')\n\n                self.user = self.user_list[user_idx]\n                self.features = self.user_data[self.user]\n                self.labels = self.user_data_label[self.user]\n\n    def __getitem__(self, idx):\n        return np.array(self.features[idx]).astype(np.float32), self.labels[idx]\n\n    def __len__(self):\n        return len(self.features)\n\n    def load_data(self, data, test_only, sup_config):\n        '''Wrapper method to read/instantiate the dataset'''\n\n        if data == None:\n            dataset = CIFAR100(self.user_idx, test_only, sup_config)\n            data = dataset.data\n        \n        users = data['users']\n        features = data['user_data']\n        labels = data['user_data_label']\n        num_samples = data['num_samples']\n            \n        return users, features, labels, num_samples"
  },
  {
    "path": "experiments/semisupervision/model.py",
    "content": "import math\nimport torch\nimport torch.nn as nn\nimport torch.nn.functional as F\nimport numpy as np\nfrom core.model import BaseModel\n\n'''ResNet in PyTorch.\n\nReference:\n[1] Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun\n    Deep Residual Learning for Image Recognition. arXiv:1512.03385\n'''\n\nclass BasicBlock(nn.Module):\n    expansion = 1\n\n    def __init__(self, in_planes, planes, stride=1):\n        super(BasicBlock, self).__init__()\n        self.conv1 = nn.Conv2d(\n            in_planes, planes, kernel_size=3, stride=stride, padding=1, bias=False)\n        self.bn1 = nn.BatchNorm2d(planes)\n        self.conv2 = nn.Conv2d(planes, planes, kernel_size=3,\n                               stride=1, padding=1, bias=False)\n        self.bn2 = nn.BatchNorm2d(planes)\n\n        self.shortcut = nn.Sequential()\n        if stride != 1 or in_planes != self.expansion*planes:\n            self.shortcut = nn.Sequential(\n                nn.Conv2d(in_planes, self.expansion*planes,\n                          kernel_size=1, stride=stride, bias=False),\n                nn.BatchNorm2d(self.expansion*planes)\n            )\n\n    def forward(self, x):\n        out = F.relu(self.bn1(self.conv1(x)))\n        out = self.bn2(self.conv2(out))\n        out += self.shortcut(x)\n        out = F.relu(out)\n        return out\n\n\nclass Bottleneck(nn.Module):\n    expansion = 4\n\n    def __init__(self, in_planes, planes, stride=1):\n        super(Bottleneck, self).__init__()\n        self.conv1 = nn.Conv2d(in_planes, planes, kernel_size=1, bias=False)\n        self.bn1 = nn.BatchNorm2d(planes)\n        self.conv2 = nn.Conv2d(planes, planes, kernel_size=3,\n                               stride=stride, padding=1, bias=False)\n        self.bn2 = nn.BatchNorm2d(planes)\n        self.conv3 = nn.Conv2d(planes, self.expansion *\n                               planes, kernel_size=1, bias=False)\n        self.bn3 = nn.BatchNorm2d(self.expansion*planes)\n\n        
self.shortcut = nn.Sequential()\n        if stride != 1 or in_planes != self.expansion*planes:\n            self.shortcut = nn.Sequential(\n                nn.Conv2d(in_planes, self.expansion*planes,\n                          kernel_size=1, stride=stride, bias=False),\n                nn.BatchNorm2d(self.expansion*planes)\n            )\n\n    def forward(self, x):\n        out = F.relu(self.bn1(self.conv1(x)))\n        out = F.relu(self.bn2(self.conv2(out)))\n        out = self.bn3(self.conv3(out))\n        out += self.shortcut(x)\n        out = F.relu(out)\n        return out\n\n\nclass ResNet(nn.Module):\n    def __init__(self, block, num_blocks, num_classes=10, inchannels = 3):\n        super(ResNet, self).__init__()\n        self.in_planes = 64\n\n        self.conv1 = nn.Conv2d(inchannels, 64, kernel_size=3,\n                               stride=1, padding=1, bias=False)\n        self.bn1 = nn.BatchNorm2d(64)\n        self.layer1 = self._make_layer(block, 64, num_blocks[0], stride=1)\n        self.layer2 = self._make_layer(block, 128, num_blocks[1], stride=2)\n        self.layer3 = self._make_layer(block, 256, num_blocks[2], stride=2)\n        self.layer4 = self._make_layer(block, 512, num_blocks[3], stride=2)\n        self.linear = nn.Linear(512*block.expansion, num_classes)\n\n    def _make_layer(self, block, planes, num_blocks, stride):\n        strides = [stride] + [1]*(num_blocks-1)\n        layers = []\n        for stride in strides:\n            layers.append(block(self.in_planes, planes, stride))\n            self.in_planes = planes * block.expansion\n        return nn.Sequential(*layers)\n\n    def forward(self, x):\n        out = F.relu(self.bn1(self.conv1(x)))\n        out = self.layer1(out)\n        out = self.layer2(out)\n        out = self.layer3(out)\n        out = self.layer4(out)\n        out = F.avg_pool2d(out, 4)\n        out = out.view(out.size(0), -1)\n        out = self.linear(out)\n        return out\n\n\ndef 
ResNet18(num_classes=10):\n    return ResNet(BasicBlock, [2, 2, 2, 2], num_classes)\n\ndef ResNet18_emnist(num_classes=62, inchannel = 1):\n    return ResNet(BasicBlock, [2, 2, 2, 2], num_classes, inchannel)\n\ndef ResNet18_organ(num_classes=11, inchannel = 1):\n    return ResNet(BasicBlock, [2, 2, 2, 2], num_classes, inchannel)\n\ndef ResNet18_path(num_classes=9, inchannel = 3):\n    return ResNet(BasicBlock, [2, 2, 2, 2], num_classes, inchannel)\n\ndef ResNet18_blood(num_classes=8, inchannel = 3):\n    return ResNet(BasicBlock, [2, 2, 2, 2], num_classes, inchannel)\n\ndef ResNet34(num_classes=10):\n    return ResNet(BasicBlock, [3, 4, 6, 3], num_classes)\n\ndef ResNet50(num_classes=10):\n    return ResNet(Bottleneck, [3, 4, 6, 3], num_classes)\n\ndef ResNet101(num_classes=10):\n    return ResNet(Bottleneck, [3, 4, 23, 3], num_classes)\n\ndef ResNet152(num_classes=10):\n    return ResNet(Bottleneck, [3, 8, 36, 3], num_classes)\n\ndef test():\n    net = ResNet18()\n    y = net(torch.randn(1, 3, 32, 32))\n    print(y.size())\n\n\nclass Res(BaseModel):\n    '''This is a PyTorch model with some extra methods'''\n\n    def __init__(self, model_config):\n        super().__init__()\n        self.net = ResNet50(num_classes=model_config['num_classes'])\n    \n    def forward(self,x):\n        return self.net.forward(x)\n\n    def loss(self, input: torch.Tensor) -> torch.Tensor:\n        '''Performs forward step and computes the loss'''\n        device = 'cuda' if torch.cuda.is_available() else 'cpu'\n        features, labels = input['x'].to(device), input['y'].to(device)\n        log_probs = self.net.forward(features)\n\n        if not self.net.training: # For evaluation\n            loss = F.cross_entropy(log_probs, labels, reduction='sum')\n            loss /= labels.size(0)\n        else:   \n            loss_func = torch.nn.CrossEntropyLoss()\n            loss = loss_func(log_probs, labels)\n\n        return loss\n\n    def inference(self, input):\n        '''Performs 
forward step and computes metrics'''\n        device = 'cuda' if torch.cuda.is_available() else 'cpu'\n        features, labels = input['x'].to(device), input['y'].to(device)\n\n        log_softmax = torch.nn.LogSoftmax(dim=1)\n\n        if len(np.shape(labels)) == 0:\n            labels = torch.stack([labels])\n\n        output = self.net.forward(features)\n        log_probs = log_softmax(output)\n        _, predicted = log_probs.max(1)\n        accuracy = predicted.eq(labels).sum().item() * 100\n        n_samples = labels.size(0)\n\n        return {'output': output, 'acc': accuracy/n_samples, 'batch_size': n_samples}\n"
  },
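The stage construction in `ResNet._make_layer` hinges on one detail: only the first block of a stage carries the downsampling stride, and the remaining blocks use stride 1 (the 1x1 shortcut conv then fires only when the shape changes). A minimal pure-Python sketch of that stride list; the helper name `make_stride_list` is hypothetical, not part of the repo:

```python
def make_stride_list(stride, num_blocks):
    """First block in a stage applies the downsampling stride; the rest use 1."""
    return [stride] + [1] * (num_blocks - 1)

# layer2 of ResNet18 (num_blocks=2, stride=2) builds blocks with strides [2, 1]
print(make_stride_list(2, 2))  # [2, 1]
```

This is why `self.in_planes` is updated inside the loop: the second and later blocks consume the expanded channel count produced by the first.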
  {
    "path": "extensions/RL/RL.py",
    "content": "# Copyright (c) Microsoft Corporation.\n# Licensed under the MIT license.\n\nimport logging\nimport os\nimport json\nimport random\nimport torch\nimport torch.nn as nn\nimport numpy as np\nfrom collections import OrderedDict\nfrom utils import ( make_lr_scheduler,\n                    print_rank,\n                    torch_save,\n                    try_except_save,\n                    make_optimizer,\n                    to_device)\n\nclass SequenceWise(nn.Module):\n    def __init__(self, module):\n        \"\"\"\n        Collapses input of dim T*N*H to (T*N)*H, and applies to a module.\n        Allows handling of variable sequence lengths and minibatch sizes.\n        :param module: Module to apply input to.\n        \"\"\"\n        super(SequenceWise, self).__init__()\n        self.module = module\n\n    def forward(self, x):\n        t, n = x.size(0), x.size(1)\n        x = x.view(t * n, -1)\n        x = x.contiguous()\n        x = self.module(x)\n        x = x.view(t, n, -1)\n        return x\n\n    def __repr__(self):\n        tmpstr = self.__class__.__name__ + ' (\\n'\n        tmpstr += self.module.__repr__()\n        tmpstr += ')'\n        return tmpstr\n\n\nclass BatchRNN(nn.Module):\n    def __init__(self, input_size, hidden_size, rnn_type=nn.LSTM, bidirectional=False, batch_norm=True,dropout=0.0,multi=1):\n        super(BatchRNN, self).__init__()\n        self.input_size     = input_size\n        self.hidden_size    = hidden_size\n        self.batch_norm_activate = batch_norm\n        self.bidirectional  = bidirectional\n        self.multi          = multi\n        self.dropout        = dropout\n\n        if self.batch_norm_activate:\n            self.batch_norm = SequenceWise(nn.BatchNorm1d(input_size))\n        self.rnn = rnn_type(input_size   = input_size,\n                            hidden_size  = hidden_size,\n                            bidirectional= bidirectional,\n                            bias         = True,\n               
             batch_first  = True,\n                            dropout      = self.dropout)\n        self.num_directions = 2 if bidirectional else 1\n\n\n    def forward(self, x):\n        if x.dim()==2:\n            x=x.unsqueeze(1)\n\n        if self.batch_norm_activate:\n            x = x.contiguous()\n            x = self.batch_norm(x)\n        x, _ = self.rnn(x)\n\n        if self.bidirectional and self.multi<2:\n            x = x.view(x.size(0), x.size(1), 2, -1).sum(2).view(x.size(0), x.size(1), -1)\n        return x\n\n\nclass NeuralNetwork(nn.Module):\n    def __init__(self, params, wantLSTM=False, batch_norm=False):\n        super(NeuralNetwork, self).__init__()\n\n        \"\"\"\n        The following parameters need revisiting\n        self.number_of_actions = 2\n        self.gamma = 0.99\n        self.final_epsilon = 0.0001\n        self.initial_epsilon = 0.1\n        self.number_of_iterations = 2000000\n        self.replay_memory_size = 10000\n        self.minibatch_size = 32\n\n        optimizer = optim.Adam(model.parameters(), lr=1e-6)\n        criterion = nn.MSELoss()\n\n        \"\"\"\n        self.wantLSTM  = wantLSTM\n        self.batch_norm= batch_norm\n        params = [int(x) for x in params.split(',')]\n        layers = []\n\n        self.softmax = nn.Softmax(dim = 1)\n        if self.wantLSTM:\n            # Recurrent Component of the architecture\n            rnns = []\n            for i in range(1, len(params) - 2):\n                multi = 1 if i==1 else 1\n                rnn = BatchRNN(input_size    = params[i-1]*multi,\n                                hidden_size  = params[i],\n                                rnn_type     = nn.LSTM,\n                                bidirectional= True,\n                                batch_norm   = batch_norm,\n                                multi        = 1,\n                                dropout      = 0.0)\n                rnns.append(('%d' %(i-1), rnn))\n            self.rnn = 
nn.Sequential(OrderedDict(rnns))\n\n            layers.append(nn.Linear(params[-3], params[-2], bias=True))\n            layers.append(nn.ReLU(inplace=True))\n            layers.append(nn.Linear(params[-2], params[-1], bias=True))\n            mlp = nn.Sequential(*layers)\n            self.mlp = nn.Sequential(SequenceWise(mlp),)\n\n        else:\n            if self.batch_norm:\n                self.batch_norm = nn.BatchNorm1d(params[0])\n\n            for i in range(1, len(params)-1):\n                layers.append(nn.Linear(params[i-1], params[i], bias=True))\n                layers.append(nn.ReLU(inplace=True))\n            layers.append(nn.Linear(params[-2], params[-1], bias=True))\n            self.mlp = nn.Sequential(*layers)\n\n\n    def forward(self, x):\n        if self.wantLSTM:\n            # Batch norm (if any) is applied inside BatchRNN\n            x = self.rnn(x)\n        elif self.batch_norm:\n            x = self.batch_norm(x)\n        out = self.mlp(x)\n        out = out.squeeze()\n\n        return out\n\n\nclass RL:\n    def __init__(self, config=None):\n\n        # Finalized config-file\n        self.config = config\n\n        self.out_size = config[\"num_clients_per_iteration\"]\n        self.wantLSTM = config['RL']['wantLSTM'] if 'wantLSTM' in config['RL'] else False\n        self.replay_memory = []\n        self.state_memory = []\n        self.epsilon = config['RL']['initial_epsilon']\n        self.step = 0\n        self.runningLoss = 0\n\n        model_descriptor = config['RL']['model_descriptor_RL'] if 'model_descriptor_RL' in config['RL'] else 'Default'\n        self.model_name = os.path.join(config['RL']['RL_path'], 'rl_{}.{}.model'.format(self.out_size, model_descriptor))\n        self.stats_name = os.path.join(config['RL']['RL_path'], 'rl_{}.{}.stats'.format(self.out_size, model_descriptor))\n\n        # Initialize RL model\n        self.make_model()\n        self.load_saved_status()\n\n        # Set the RL weights\n        self.rl_weights = None\n        self.rl_losses = None\n\n        
self.criterion = nn.MSELoss()\n\n    def set_losses(self, losses):\n        self.rl_losses=losses\n\n    def set_weights(self, weights):\n        self.rl_weights = weights\n\n    def forward(self, state=None):\n        # epsilon greedy exploration\n\n        if self.wantLSTM:\n            N = len(state)\n            state.resize(1, N)\n            if len(self.state_memory)==0:\n                self.state_memory = np.zeros((self.config['RL']['minibatch_size'], N))\n            self.state_memory = np.concatenate((self.state_memory[1:], state), axis=0)\n            state = self.state_memory\n\n        if random.random() <= self.epsilon:\n            print_rank(\"Performed random action!\")\n            action= to_device(torch.rand(self.out_size))\n        else:\n            state = to_device(torch.from_numpy(state))\n            print_rank(f'RL_state: {state.shape}')\n            action= self.model(state.float())\n        return action\n\n\n\n    def train(self, batch=None):\n        # save transition to replay memory\n        self.replay_memory.append(batch)\n\n        # if replay memory is full, remove the oldest transition\n        if len(self.replay_memory) > self.config['RL']['max_replay_memory_size']:\n            self.replay_memory.pop(0)\n\n        # epsilon annealing\n        self.epsilon *= self.config['RL']['epsilon_gamma'] if self.epsilon*self.config['RL']['epsilon_gamma']>self.config['RL']['final_epsilon'] else 1.0\n\n        # sample random minibatch\n        if self.wantLSTM:\n            if len(self.replay_memory)>= self.config['RL']['minibatch_size']:\n                minibatch = self.replay_memory[-self.config['RL']['minibatch_size']:]\n            else:\n                minibatch = self.replay_memory \n        else:\n            minibatch = random.sample(self.replay_memory, min(len(self.replay_memory), self.config['RL']['minibatch_size']))\n\n        # unpack minibatch\n        state_batch  = torch.tensor(tuple(d[0] for d in minibatch)).float()\n    
    action_batch = torch.tensor(tuple(d[1] for d in minibatch)).float()\n        reward_batch = torch.tensor(tuple(d[2] for d in minibatch)).float()\n\n        state_batch = to_device(state_batch)\n        action_batch = to_device(action_batch)\n        reward_batch = to_device(reward_batch)\n\n\n        # set y_j to r_j for terminal state, otherwise to r_j + gamma*max(Q)\n        y_batch = reward_batch\n\n        # extract Q-value\n        print_rank(f'RL state_batch: {state_batch.shape}', loglevel=logging.DEBUG)\n        state_output = self.model(state_batch)\n        print_rank(f'RL train shapes: {state_batch.shape} {action_batch.shape} {state_output.shape}', loglevel=logging.DEBUG)\n        q_value = torch.sum(state_output * action_batch, dim=1)\n\n        # reset gradient\n        self.optimizer.zero_grad()\n\n        # returns a new Tensor, detached from the current graph, the result will never require gradient\n        y_batch = y_batch.detach()\n\n        # calculate loss\n        loss = self.criterion(q_value, y_batch)\n\n        # do backward pass\n        loss.backward()\n        self.optimizer.step()\n\n        # Tracking a running average of loss\n        if self.runningLoss==0:\n            self.runningLoss = loss.item()\n        else:\n            self.runningLoss = 0.95 * self.runningLoss + 0.05 * loss.item()\n        print_rank('Running Loss for RL training process: {}'.format(self.runningLoss))\n\n        # Decay learning rate\n        self.lr_scheduler.step()\n\n\n    def make_model(self):\n        # make model\n        self.model = NeuralNetwork(self.config['RL']['network_params'], \\\n                        self.config['RL']['wantLSTM'] if 'wantLSTM' in self.config['RL'] else False, \\\n                        self.config['RL']['batchNorm'] if 'batchNorm' in self.config['RL'] else False)\n        print(self.model)\n        self.model = to_device(self.model)\n\n        # make optimizer\n        self.optimizer = 
make_optimizer(self.config['RL'][\"optimizer_config\"], self.model)\n\n        # make lr_scheduler\n        self.lr_scheduler = make_lr_scheduler(\n                                            self.config['RL']['annealing_config'],\n                                            self.optimizer,\n                                            num_batches=1)\n\n\n    def load_saved_status(self):\n        if os.path.exists(self.model_name):\n            print_rank(\"Resuming from checkpoint model {}\".format(self.model_name))\n            self.load()\n\n        if os.path.exists(self.stats_name):\n            with open(self.stats_name, 'r') as logfp: # loading the iteration no., val_loss and lr_weight\n                elems = json.load(logfp)\n                self.cur_iter_no= elems[\"i\"]\n                self.val_loss   = elems[\"val_loss\"]\n                self.val_cer    = elems[\"val_cer\"]\n                self.runningLoss= elems[\"weight\"]\n\n\n\n    def load(self):\n        print_rank(\"Loading checkpoint: {}\".format(self.model_name))\n        checkpoint = torch.load(self.model_name)\n\n        self.model.load_state_dict(checkpoint['model_state_dict'])\n        if self.optimizer is not None:\n            self.optimizer.load_state_dict(checkpoint['optimizer_state_dict'])\n\n        anl_st_dict = checkpoint.get('lr_scheduler_state_dict')\n        if anl_st_dict and self.lr_scheduler is not None:\n            self.lr_scheduler.load_state_dict(anl_st_dict)\n\n\n    def save(self, i):\n        \"\"\"\n        Save a model as well as training information\n        \"\"\"\n\n        save_state = {\n                'model_state_dict' : self.model.state_dict(),\n                'optimizer_state_dict' : self.optimizer.state_dict() if self.optimizer is not None else None,\n                'lr_scheduler_state_dict' : self.lr_scheduler.state_dict() if self.lr_scheduler is not None else None\n            }\n\n        outputdir = os.path.dirname(self.model_name)\n        if 
not os.path.exists(outputdir):\n            os.makedirs(outputdir, exist_ok=True)\n\n        print_rank(\"Saving model to: {}\".format(self.model_name))\n        try_except_save(torch_save, state_or_model=save_state,\n                                        save_path=self.model_name)\n\n        # logging the latest best values\n        print_rank(f'Saving stats to {self.stats_name}')\n        with open(self.stats_name, 'w') as logfp:\n            json.dump({\"i\":i+1,\n                        \"val_loss\":float(self.rl_losses[0]),\n                        \"val_cer\":float(self.rl_losses[1]),\n                        \"weight\":float(self.runningLoss)},\n                        logfp)\n\n\n\n"
  },
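The guarded epsilon decay in `RL.train` multiplies by `epsilon_gamma` only while the decayed value would stay above `final_epsilon`; otherwise the multiplier is 1.0 and the value holds at its floor. A standalone sketch of that annealing rule (the function name `anneal_epsilon` is hypothetical):

```python
def anneal_epsilon(epsilon, gamma, final_epsilon):
    # Decay by gamma only while the decayed value stays above the floor,
    # mirroring the guarded multiply in RL.train().
    return epsilon * gamma if epsilon * gamma > final_epsilon else epsilon

eps = 0.1
for _ in range(4):
    eps = anneal_epsilon(eps, 0.5, 0.02)
print(eps)  # decays 0.1 -> 0.05 -> 0.025, then holds
```

Note the value never drops below the floor; it freezes at the last value whose next decay would undershoot `final_epsilon`.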
  {
    "path": "extensions/__init__.py",
    "content": "# Copyright (c) Microsoft Corporation.\n# Licensed under the MIT license.\n\nfrom extensions.RL.RL import *\nfrom extensions.quantization.quant import *\n"
  },
  {
    "path": "extensions/privacy/__init__.py",
    "content": "# Copyright (c) Microsoft Corporation.\n# Licensed under the MIT license.\n\nimport numpy as np\nimport torch as T\nimport logging\nimport math\nimport json\nfrom utils import print_rank\nfrom azureml.core import Run\nfrom scipy.special import betainc, betaln\n\nrun = Run.get_context()\n\ndef compute_LDP_noise_std(eps, max_sensitivity, delta):\n    return np.sqrt(2 * np.log(1.25 / delta)) * max_sensitivity / eps\n\n\ndef _beta2betainc_ratio(a, x):\n    return 1 / betainc(a, a, x)\n\n\ndef _log_m1(d, alpha, gamma):\n    return alpha * np.log(1 - gamma**2) - (d - 2) * np.log(2) - np.log(d - 1)\n\n\ndef _log_m2(p, tau, alpha):\n    return np.log(p / (_beta2betainc_ratio(alpha, tau) - 1) - (1 - p)) + np.log(_beta2betainc_ratio(alpha, tau)) - betaln(alpha, alpha)\n\n\ndef _efficient_m(d, gamma, p):\n    alpha = (d - 1) / 2\n    tau = (1 + gamma) / 2\n    return np.exp(_log_m1(d, alpha, gamma) + _log_m2(p, tau, alpha))\n\n\ndef privacy_parameters(eps0, eps, d):\n    exp_eps0 = np.exp(eps0)\n    exp_eps = np.exp(eps)\n    if exp_eps0 == np.inf:\n        p0 = 1\n    else:\n        p0 = exp_eps0 / (1 + exp_eps0)\n    if exp_eps == np.inf:\n        gamma = np.sqrt(np.pi / (2 * (d - 1)))\n    else:\n        gamma = ((exp_eps - 1) / (exp_eps + 1)) * np.sqrt(np.pi / (2 * (d - 1)))\n    return p0, gamma\n\n\ndef private_unit2(grad, gamma, prob):\n    np.testing.assert_almost_equal(grad.norm().cpu().item(), 1, decimal=5)\n    assert prob >= 0.5\n    assert (0 <= gamma <= 1)\n    p = T.rand(())\n    while True:\n        # create a uniform distribution over the d-sphere\n        V = T.normal(0, 1, grad.shape, device=grad.device)\n        V = V / V.norm()\n        dot_prod = T.dot(V, grad)\n        if (dot_prod >= gamma and p < prob) or (dot_prod < gamma and p >= prob):\n            break\n    d = grad.shape[0]\n    m = _efficient_m(d, gamma, prob)\n    return V / m\n\n\ndef add_gaussian_noise(grad, eps, max_grad, delta):\n    sigma = compute_LDP_noise_std(eps, 
max_grad, delta)\n    #sigma = np.sqrt(2 * np.log(1.25 / delta)) * max_grad / eps\n    noisy_grad = sigma * T.randn(grad.shape, device=grad.device) + grad\n    return noisy_grad, sigma\n\n\ndef add_private_unit2_noise(eps, grad):\n    eps0 = 0.01 * eps\n    eps1 = 0.99 * eps\n    samp_prob, gamma = privacy_parameters(eps0, eps1, grad.shape[0])\n    return private_unit2(grad, gamma, samp_prob)\n\n\ndef scalar_DP(r, eps, k, r_max):\n    r = np.minimum(r, r_max)\n    val = k * r / r_max\n    f_val = math.floor(val)\n    c_val = math.ceil(val)\n    J = f_val if T.rand(()) < (c_val - val) else c_val\n    exp_eps = np.exp(eps)\n    rand_prob = exp_eps / (exp_eps + k)\n    if T.rand(()) >= rand_prob:\n        while True:\n            J_ = T.randint(0, k + 1, ()).item()\n            if J != J_:\n                J = J_\n                break\n    a = ((exp_eps + k) / (exp_eps - 1)) * (r_max / k)\n    b = (k * (k + 1)) / (2 * (exp_eps + k))\n    return a * (J - b)\n\n\ndef laplace_noise(max_sens, eps, vocab_size):\n    return np.random.laplace(0.0, max_sens/eps, vocab_size)\n\n\ndef unroll_network(named_params, select_grad=False):\n    # Unroll the network as 1D vector and save original values indices\n    params_ids, flat_params  = {}, []\n    cur_idx = 0\n    for n, p in named_params:\n        dat = p.grad if select_grad else p.data\n        flat_params.append(dat.view(-1))\n        next_idx = cur_idx + flat_params[-1].shape[0]\n        params_ids[n] = (cur_idx, next_idx)\n        cur_idx = next_idx\n    return T.cat(flat_params), params_ids\n\n\ndef update_network(named_params, params_ids, flat_params, apply_to_grad=False):\n    # Roll back the network parameters to layers\n    for n, p in named_params:\n        s_id, e_id = params_ids[n]\n        if apply_to_grad:\n            p.grad.copy_(flat_params[s_id : e_id].view(*p.grad.shape))\n        else:\n            p.data.copy_(flat_params[s_id : e_id].view(*p.data.shape))\n\n\ndef apply_global_dp(config, model, 
num_clients_curr_iter, select_grad=True, metric_logger=None):\n    # Add global DP noise here\n    dp_config = config.get('dp_config', None)\n    if dp_config is not None and dp_config.get('enable_global_dp', False):\n        # enable_local_dp must be enabled - client-side gradient clipping must be enabled.\n        assert (dp_config['enable_local_dp'])\n        # Unroll the network grads as 1D vectors\n        flat_grad, params_ids = unroll_network(model.named_parameters(), select_grad=select_grad)\n\n        sigma = dp_config['global_sigma']\n        max_grad = dp_config['max_grad']\n        noise_scale = sigma * max_grad / num_clients_curr_iter\n        noise = T.normal(0, 1, flat_grad.shape, device=flat_grad.device) * noise_scale\n        flat_noisy_grad = flat_grad + noise\n        print_rank('Error from noise {} is {}. grad norm: {} noisy_grad norm: {}'.format(noise_scale, (\n                    flat_grad - flat_noisy_grad).norm(), flat_grad.norm(), flat_noisy_grad.norm()))\n\n        # Return back to the network gradients\n        update_network(model.named_parameters(), params_ids, flat_noisy_grad,\n                               apply_to_grad=select_grad)\n\n        if metric_logger is None:\n            metric_logger = Run.get_context().log\n        metric_logger('Gradient Norm', flat_grad.norm().cpu().item())\n\n\ndef apply_local_dp(trainer, weight, dp_config, add_weight_noise):\n    '''Apply client-side DP, possibly given a data-dependent aggregation weight.\n\n    Args:\n        trainer (core.Trainer object): trainer on client.\n        weight (float): aggregation weight for this client's update.\n        dp_config (dict): DP config on original config file.\n        add_weight_noise (bool): whether noise should be added to aggregation weight.\n    '''\n\n    # Unroll the network grads as 1D vectors\n    flat_grad, params_ids = unroll_network(trainer.model.named_parameters(), select_grad=True)\n    grad_norm = flat_grad.norm().cpu().item()\n\n    if dp_config['eps'] < 0:\n        # clip, but don't add noise\n        if 
grad_norm > dp_config['max_grad']:\n            flat_grad = flat_grad * (dp_config['max_grad'] / grad_norm)\n            update_network(trainer.model.named_parameters(), params_ids, flat_grad, apply_to_grad=True)\n\n    else:\n        # Get Gaussian LDP noise\n        dp_eps = dp_config['eps']\n        delta = dp_config.get('delta', 1e-7) # TODO pre-compute in config\n        weight_ = weight\n\n        # Scaling the weight down so we don't impact the noise too much\n        weight = dp_config.get('weight_scaler', 1) * weight\n        weight = min(dp_config['max_weight'], weight)\n        flat_noisy_grad = dp_config['max_grad'] * (flat_grad / flat_grad.norm())\n        max_sensitivity = np.sqrt(dp_config['max_grad']**2 + (dp_config['max_weight']**2 if add_weight_noise else 0.0))\n        flat_noisy_grad = T.cat([flat_noisy_grad, T.tensor([weight], device=flat_noisy_grad.device)], dim=0)\n        flat_noisy_grad, _ = add_gaussian_noise(flat_noisy_grad, dp_eps, max_sensitivity, delta)\n        weight = min(max(flat_noisy_grad[-1].item(), dp_config['min_weight']), dp_config['max_weight'])\n\n        # Scaling the weight back up after noise addition (This is a DP-protect transformation)\n        weight = weight / dp_config.get('weight_scaler', 1)\n        if not add_weight_noise:\n            weight = weight_\n        flat_noisy_grad = flat_noisy_grad[:-1]\n\n        print_rank('Cosine error from noise {}'.format(T.nn.functional.cosine_similarity(flat_grad, flat_noisy_grad, dim=0)), loglevel=logging.DEBUG)\n        print_rank('Error from noise is {}'.format((flat_grad-flat_noisy_grad).norm()), loglevel=logging.DEBUG)\n        print_rank('weight is {} and noisy weight is {}'.format(weight_, weight), loglevel=logging.DEBUG)\n\n        # Return back to the network\n        update_network(trainer.model.named_parameters(), params_ids, flat_noisy_grad, apply_to_grad=True)\n\n    return weight\n\n\ndef update_privacy_accountant(config, num_clients, curr_iter, 
num_clients_curr_iter):\n    # Privacy accounting starts here\n    # We will dump all the needed parameters to the log so as not to slow down training.\n    dp_config = config.get('dp_config', None)\n    if dp_config is not None and (dp_config.get('enable_global_dp', False) or dp_config.get('enable_local_dp', False)):\n        from math import sqrt, exp, log\n        import extensions.privacy.analysis as privacy_analysis\n\n        K = 1  # from DP perspective each user is contributing one gradient\n        B = num_clients_curr_iter  # batch size\n        n = num_clients\n        T = curr_iter + 1\n        _delta = dp_config.get('delta', min(1e-7, 1. / (n * log(n))))  # TODO should be precomputed in config\n        if dp_config.get('global_sigma', None) is None:\n            max_sensitivity = np.sqrt(dp_config['max_grad'] ** 2 + dp_config['max_weight'] ** 2)\n            noise_scale = compute_LDP_noise_std(dp_config['eps'], max_sensitivity, _delta)\n            global_sigma = noise_scale * np.sqrt(B) / max_sensitivity\n        else:\n            global_sigma = dp_config['global_sigma']\n            noise_scale = global_sigma * dp_config['max_grad'] / B\n\n        try:\n            # GDP CLT: mu = (B / n) * sqrt(T * (exp(1 / sigma^2) - 1))\n            mu = K * B / n * sqrt(T * (exp((1. / global_sigma) ** 2) - 1))\n        except OverflowError:\n            print_rank(f\"Error computing mu {global_sigma} {K} {B} {n} {T}\")\n            mu = -1\n\n        orders = ([1.25, 1.5, 1.75, 2., 2.25, 2.5, 3., 3.5, 4., 4.5] + list(range(5, 64)) + [128, 256, 512])\n        q = B / n\n        _sigma = global_sigma  # was: noise_scale but we should apply the noise multiplier.\n        rdp = privacy_analysis.compute_rdp(q, _sigma, T, orders)\n\n        rdp_epsilon, opt_order = privacy_analysis.get_privacy_spent(orders, rdp, _delta)\n\n        props = {\n            'dp_global_K': K,  # gradients per user\n            'dp_global_B': B,  # users per batch\n            'dp_global_n': n,  # total users\n            'dp_global_T': T,  # how many iterations\n            'dp_sigma': _sigma,  # noise_multiplier. Should be combined global+local sigma.\n            'dp_global_mu': mu,\n            # 'dp_epsilon_fdp': fdp_epsilon,\n            'dp_epsilon_rdp': rdp_epsilon,\n            # 'dp_epsilon_exact': exact_eps,\n            'dp_opt_order': opt_order,\n            'dp_delta': _delta,\n            'dp_noise_scale': noise_scale  # Note: not needed for accounting.\n        }\n\n        print_rank(f'DP accounting: {json.dumps(props)}')\n        for k in props:\n            run.log(k, props[k])\n\n        return rdp_epsilon\n    else:\n        return None\n"
  },
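`compute_LDP_noise_std` in `extensions/privacy/__init__.py` is the classic (eps, delta) Gaussian-mechanism calibration: the noise standard deviation scales linearly with the L2 sensitivity and inversely with eps. A self-contained sketch using only `math` (the name `ldp_noise_std` is hypothetical):

```python
import math

def ldp_noise_std(eps, max_sensitivity, delta):
    # sigma = sqrt(2 * ln(1.25 / delta)) * S / eps  (Gaussian mechanism bound)
    return math.sqrt(2 * math.log(1.25 / delta)) * max_sensitivity / eps

# Halving eps doubles the noise; doubling the sensitivity also doubles it.
sigma = ldp_noise_std(1.0, 1.0, 1e-7)
```

This is why `apply_local_dp` first rescales the flattened gradient to norm `max_grad`: bounding the norm bounds the sensitivity that enters this formula.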
  {
    "path": "extensions/privacy/analysis.py",
    "content": "#!/usr/bin/env python3\n# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved\n\n\"\"\"\n*Borrowed from Facebook Opacus, which in turn borrowed from Tensorflow Privacy.  \n*Facebook's original notice follows below.\n\n\n*Based on Google's TF Privacy:* https://github.com/tensorflow/privacy/blob/master/tensorflow_privacy/privacy/analysis/rdp_accountant.py.\n*Here, we update this code to Python 3, and optimize dependencies.*\n\nFunctionality for computing Renyi Differential Privacy (RDP) of an additive\nSampled Gaussian Mechanism (SGM).\n\nExample:\n    Suppose that we have run an SGM applied to a function with L2-sensitivity of 1.\n\n    Its parameters are given as a list of tuples\n    ``[(q_1, sigma_1, steps_1), ..., (q_k, sigma_k, steps_k)],``\n    and we wish to compute epsilon for a given target delta.\n\n    The example code would be:\n\n    >>> max_order = 32\n    >>> orders = range(2, max_order + 1)\n    >>> rdp = np.zeros_like(orders, dtype=float)\n    >>> for q, sigma, steps in parameters:\n    >>>     rdp += privacy_analysis.compute_rdp(q, sigma, steps, orders)\n    >>> epsilon, opt_order = privacy_analysis.get_privacy_spent(orders, rdp, delta)\n\n\"\"\"\n\nimport math\nimport numpy as np\nfrom scipy import special\nfrom typing import List, Tuple, Union\n\n########################\n# LOG-SPACE ARITHMETIC #\n########################\n\n\ndef _log_add(logx: float, logy: float) -> float:\n    r\"\"\"Adds two numbers in the log space.\n\n    Args:\n        logx: First term in log space.\n        logy: Second term in log space.\n\n    Returns:\n        Sum of numbers in log space.\n    \"\"\"\n    a, b = min(logx, logy), max(logx, logy)\n    if a == -np.inf:  # adding 0\n        return b\n    # Use exp(a) + exp(b) = (exp(a - b) + 1) * exp(b)\n    return math.log1p(math.exp(a - b)) + b  # log1p(x) = log(x + 1)\n\n\ndef _log_sub(logx: float, logy: float) -> float:\n    r\"\"\"Subtracts two numbers in the log space.\n\n    Args:\n  
      logx: First term in log space. Expected to be greater than the second term.\n        logy: Second term in log space. Expected to be less than the first term.\n\n    Returns:\n        Difference of numbers in log space.\n\n    Raises:\n        ValueError\n            If the result is negative.\n    \"\"\"\n    if logx < logy:\n        raise ValueError(\"The result of subtraction must be non-negative.\")\n    if logy == -np.inf:  # subtracting 0\n        return logx\n    if logx == logy:\n        return -np.inf  # 0 is represented as -np.inf in the log space.\n\n    try:\n        # Use exp(x) - exp(y) = (exp(x - y) - 1) * exp(y).\n        return math.log(math.expm1(logx - logy)) + logy  # expm1(x) = exp(x) - 1\n    except OverflowError:\n        return logx\n\n\ndef _compute_log_a_for_int_alpha(q: float, sigma: float, alpha: int) -> float:\n    r\"\"\"Computes :math:`log(A_\\alpha)` for integer ``alpha``.\n\n    Notes:\n        Note that\n        :math:`A_\\alpha` is real valued function of ``alpha`` and ``q``,\n        and that 0 < ``q`` < 1.\n\n        Refer to Section 3.3 of https://arxiv.org/pdf/1908.10530.pdf for details.\n\n    Args:\n        q: Sampling rate of SGM.\n        sigma: The standard deviation of the additive Gaussian noise.\n        alpha: The order at which RDP is computed.\n\n    Returns:\n        :math:`log(A_\\alpha)` as defined in Section 3.3 of\n        https://arxiv.org/pdf/1908.10530.pdf.\n    \"\"\"\n\n    # Initialize with 0 in the log space.\n    log_a = -np.inf\n\n    for i in range(alpha + 1):\n        log_coef_i = (\n            math.log(special.binom(alpha, i))\n            + i * math.log(q)\n            + (alpha - i) * math.log(1 - q)\n        )\n\n        s = log_coef_i + (i * i - i) / (2 * (sigma ** 2))\n        log_a = _log_add(log_a, s)\n\n    return float(log_a)\n\n\ndef _compute_log_a_for_frac_alpha(q: float, sigma: float, alpha: float) -> float:\n    r\"\"\"Computes :math:`log(A_\\alpha)` for fractional ``alpha``.\n\n    
Notes:\n        Note that\n        :math:`A_\\alpha` is real valued function of ``alpha`` and ``q``,\n        and that 0 < ``q`` < 1.\n\n        Refer to Section 3.3 of https://arxiv.org/pdf/1908.10530.pdf for details.\n\n    Args:\n        q: Sampling rate of SGM.\n        sigma: The standard deviation of the additive Gaussian noise.\n        alpha: The order at which RDP is computed.\n\n    Returns:\n        :math:`log(A_\\alpha)` as defined in Section 3.3 of\n        https://arxiv.org/pdf/1908.10530.pdf.\n    \"\"\"\n    # The two parts of A_alpha, integrals over (-inf,z0] and [z0, +inf), are\n    # initialized to 0 in the log space:\n    log_a0, log_a1 = -np.inf, -np.inf\n    i = 0\n\n    z0 = sigma ** 2 * math.log(1 / q - 1) + 0.5\n\n    while True:  # do ... until loop\n        coef = special.binom(alpha, i)\n        log_coef = math.log(abs(coef))\n        j = alpha - i\n\n        log_t0 = log_coef + i * math.log(q) + j * math.log(1 - q)\n        log_t1 = log_coef + j * math.log(q) + i * math.log(1 - q)\n\n        log_e0 = math.log(0.5) + _log_erfc((i - z0) / (math.sqrt(2) * sigma))\n        log_e1 = math.log(0.5) + _log_erfc((z0 - j) / (math.sqrt(2) * sigma))\n\n        log_s0 = log_t0 + (i * i - i) / (2 * (sigma ** 2)) + log_e0\n        log_s1 = log_t1 + (j * j - j) / (2 * (sigma ** 2)) + log_e1\n\n        if coef > 0:\n            log_a0 = _log_add(log_a0, log_s0)\n            log_a1 = _log_add(log_a1, log_s1)\n        else:\n            log_a0 = _log_sub(log_a0, log_s0)\n            log_a1 = _log_sub(log_a1, log_s1)\n\n        i += 1\n        if max(log_s0, log_s1) < -30:\n            break\n\n    return _log_add(log_a0, log_a1)\n\n\ndef _compute_log_a(q: float, sigma: float, alpha: float) -> float:\n    r\"\"\"Computes :math:`log(A_\\alpha)` for any positive finite ``alpha``.\n\n    Notes:\n        Note that\n        :math:`A_\\alpha` is real valued function of ``alpha`` and ``q``,\n        and that 0 < ``q`` < 1.\n\n        Refer to Section 3.3 of 
https://arxiv.org/pdf/1908.10530.pdf\n        for details.\n\n    Args:\n        q: Sampling rate of SGM.\n        sigma: The standard deviation of the additive Gaussian noise.\n        alpha: The order at which RDP is computed.\n\n    Returns:\n        :math:`log(A_\\alpha)` as defined in the paper mentioned above.\n    \"\"\"\n    if float(alpha).is_integer():\n        return _compute_log_a_for_int_alpha(q, sigma, int(alpha))\n    else:\n        return _compute_log_a_for_frac_alpha(q, sigma, alpha)\n\n\ndef _log_erfc(x: float) -> float:\n    r\"\"\"Computes :math:`log(erfc(x))` with high accuracy for large ``x``.\n\n    Helper function used in computation of :math:`log(A_\\alpha)`\n    for a fractional alpha.\n\n    Args:\n        x: The input to the function\n\n    Returns:\n        :math:`log(erfc(x))`\n    \"\"\"\n    return math.log(2) + special.log_ndtr(-x * 2 ** 0.5)\n\n\ndef _compute_rdp(q: float, sigma: float, alpha: float) -> float:\n    r\"\"\"Computes RDP of the Sampled Gaussian Mechanism at order ``alpha``.\n\n    Args:\n        q: Sampling rate of SGM.\n        sigma: The standard deviation of the additive Gaussian noise.\n        alpha: The order at which RDP is computed.\n\n    Returns:\n        RDP at order ``alpha``; can be np.inf.\n    \"\"\"\n    if q == 0:\n        return 0\n\n    # no privacy\n    if sigma == 0:\n        return np.inf\n\n    if q == 1.0:\n        return alpha / (2 * sigma ** 2)\n\n    if np.isinf(alpha):\n        return np.inf\n\n    return _compute_log_a(q, sigma, alpha) / (alpha - 1)\n\n\ndef compute_rdp(\n    q: float, noise_multiplier: float, steps: int, orders: Union[List[float], float]\n) -> Union[List[float], float]:\n    r\"\"\"Computes Renyi Differential Privacy (RDP) guarantees of the\n    Sampled Gaussian Mechanism (SGM) iterated ``steps`` times.\n\n    Args:\n        q: Sampling rate of SGM.\n        noise_multiplier: The ratio of the standard deviation of the\n            additive Gaussian noise to the 
L2-sensitivity of the function\n            to which it is added. Note that this is same as the standard\n            deviation of the additive Gaussian noise when the L2-sensitivity\n            of the function is 1.\n        steps: The number of iterations of the mechanism.\n        orders: An array (or a scalar) of RDP orders.\n\n    Returns:\n        The RDP guarantees at all orders; can be ``np.inf``.\n    \"\"\"\n    if isinstance(orders, float):\n        rdp = _compute_rdp(q, noise_multiplier, orders)\n    else:\n        rdp = np.array([_compute_rdp(q, noise_multiplier, order) for order in orders])\n\n    return rdp * steps\n\n\ndef get_privacy_spent(\n    orders: Union[List[float], float], rdp: Union[List[float], float], delta: float\n) -> Tuple[float, float]:\n    r\"\"\"Computes epsilon given a list of Renyi Differential Privacy (RDP) values at\n    multiple RDP orders and target ``delta``.\n\n    Args:\n        orders: An array (or a scalar) of orders (alphas).\n        rdp: A list (or a scalar) of RDP guarantees.\n        delta: The target delta.\n\n    Returns:\n        Pair of epsilon and optimal order alpha.\n\n    Raises:\n        ValueError\n            If the lengths of ``orders`` and ``rdp`` are not equal.\n    \"\"\"\n    orders_vec = np.atleast_1d(orders)\n    rdp_vec = np.atleast_1d(rdp)\n\n    if len(orders_vec) != len(rdp_vec):\n        raise ValueError(\n            f\"Input lists must have the same length.\\n\"\n            f\"\\torders_vec = {orders_vec}\\n\"\n            f\"\\trdp_vec = {rdp_vec}\\n\"\n        )\n\n    eps = rdp_vec - math.log(delta) / (orders_vec - 1)\n\n    # special case when there is no privacy\n    if np.isnan(eps).all():\n        return np.inf, np.nan\n\n    idx_opt = np.nanargmin(eps)  # Ignore NaNs\n    return eps[idx_opt], orders_vec[idx_opt]\n"
  },
  {
    "path": "extensions/privacy/dp_kmeans.py",
    "content": "# Copyright (c) Microsoft Corporation.\n# Licensed under the MIT license.\n\nimport sys\nimport numpy as np\nfrom scipy.special import gammainc\nfrom sklearn.cluster import KMeans\nfrom sklearn import cluster as skcluster\n\n\nkmeans_single = skcluster._kmeans.lloyd_iter_chunked_dense\n\n\ndef sample(ndim, r, num_samples=1):\n    x = np.random.normal(size=(num_samples, ndim))\n    ssq = np.sum(x**2,axis=1)\n    fr = r*gammainc(ndim/2,ssq/2)**(1/ndim)/np.sqrt(ssq)\n    if num_samples > 1:\n        fr = np.tile(fr.reshape(num_samples,1),(1,ndim))\n    return  np.multiply(x,fr)\n\n\ndef sphere_packing_initialization(n_clusters, n_dim, min_cluster_radius,\n                                  max_space_size, max_failed_cases, verbose=None):\n    a, max_r = min_cluster_radius, max_space_size\n    centers = np.empty((n_clusters, n_dim))\n    cluster_id = 0\n    fail_count = 0\n    r = max_r - a\n    while cluster_id < n_clusters:\n        v = sample(n_dim, r)\n        if cluster_id > 0 and np.min(np.linalg.norm(centers[:cluster_id, :] - v, axis=-1)) < 2 * a:\n            fail_count += 1\n            if fail_count >= max_failed_cases:\n                fail_count = 0\n                cluster_id = 0\n                a = a / 2 # TODO Use binary search to find maximum a that don't fail (vaguely discribed in the diff-p kmeas paper)\n                if verbose:\n                    print(f'Failing to pack, halving min_cluster_radius to {a}')\n                r = max_r - a\n            continue\n     \n        centers[cluster_id] = v\n        cluster_id += 1\n    if verbose:\n        print('Final min_cluster_radius', a)\n    return centers, a\n\n\ndef add_gaussian_noise(centers_new, weight_in_clusters, eps,\n                       max_cluster_l2, max_sample_weight,\n                       cluster_to_weight_ratio=-1, delta=1e-7, verbose=None):\n    scaler = 1\n    \n    if cluster_to_weight_ratio > 0:\n        # Compute the scaler to apply to the sample weights\n     
   scaler = max_cluster_l2 / (max_sample_weight * cluster_to_weight_ratio)\n    max_sample_weight *= scaler\n   \n    max_l2_sensitivity = np.sqrt(max_cluster_l2 ** 2 + max_sample_weight ** 2)\n    sigma = np.sqrt(2 * np.log(1.25 / delta)) * max_l2_sensitivity / eps\n    if verbose:\n        print('cluster_to_weight_ratio', cluster_to_weight_ratio,\n              'scaler', scaler,\n              'max_sample_weight', max_sample_weight,\n              'max_l2_sensitivity', max_l2_sensitivity,\n              'sigma', sigma)\n    centers_sum = (centers_new * weight_in_clusters.reshape(-1, 1)) + np.random.normal(scale=sigma, size=centers_new.shape)\n    # Scale the sample weights by scaling the cluster weights, since (s*w1 + s*w2, ...) == s*(w1 + w2 + ...), where s is the scaler\n    # Add noise, then rescale back. The clamping below ensures the noise never produces negative weights\n    weight_in_clusters[:] = np.maximum(1e-10, (weight_in_clusters * scaler) + np.random.normal(scale=sigma, size=weight_in_clusters.shape)) / scaler\n    centers_new[:] = centers_sum / weight_in_clusters.reshape(-1, 1)\n\n\ndef DPKMeans(n_dim, eps, max_cluster_l2, max_sample_weight=1.0,\n             max_iter=300, cluster_to_weight_ratio=-1, n_clusters=8,\n             tol=1e-4, verbose=0, delta=1e-7, max_failed_cases=300,\n             min_cluster_radius=None, **kwargs):\n    \"\"\"Differentially private KMeans\n\n    Initialise a differentially private sklearn.cluster.KMeans, overriding the Lloyd algorithm\n    to add Gaussian noise.\n\n    Parameters\n    ----------\n    \n    n_dim : int\n        The dimension size of the input space\n    \n    eps : float\n        The privacy loss (epsilon) per iteration. 
Currently only a fixed epsilon is implemented, so\n        the overall privacy loss <= eps * max_iter\n\n    max_cluster_l2 : float\n        The maximum l2 norm of any example vector that we want to cluster\n\n    max_sample_weight : float\n        The maximum weight of a sample, default=1.0\n\n    max_iter : int, default=300\n        Maximum number of iterations of the k-means algorithm for a\n        single run.\n\n    cluster_to_weight_ratio : float, default=-1\n        The ratio max_cluster_l2 / max_sample_weight used to scale the cluster counts before adding the noise.\n        If it is set to -1, do not scale the counts\n\n    n_clusters : int, default=8\n        The number of clusters to form as well as the number of\n        centroids to generate.\n\n    tol : float, default=1e-4\n        Relative tolerance with regards to Frobenius norm of the difference\n        in the cluster centers of two consecutive iterations to declare\n        convergence.\n\n    verbose : int, default=0\n        Verbosity mode.\n\n    delta : float, default=1e-7\n        Gaussian mechanism delta or probability of failure, should be set < 1/num of examples\n\n    max_failed_cases : int, default=300\n        The number of sampling trials in sphere packing before halving the minimum cluster radius\n\n    min_cluster_radius : float, default=None (= max_cluster_l2 / n_clusters)\n        Half the minimum distance between cluster centers\n    \"\"\"\n\n    if min_cluster_radius is None:\n        min_cluster_radius = max_cluster_l2 / n_clusters\n\n    # Initialise the cluster centers using sphere packing\n    init_centers, min_cluster_radius = sphere_packing_initialization(n_clusters, n_dim,\n                                                                     min_cluster_radius,\n                                                                     max_cluster_l2,\n                                                                     max_failed_cases,\n                                             
                        verbose)\n\n    final_eps = [0] # Accumulates the privacy loss over the iterations actually run until convergence\n    def modified_lloyd(X, sample_weight, x_squared_norms, centers, centers_new,\n                       weight_in_clusters, labels, center_shift, n_threads,\n                       update_centers=True):\n\n        # Clip the maximum client contribution to the cluster count\n        sample_weight = np.minimum(sample_weight, max_sample_weight)\n        \n        if not update_centers:\n            return kmeans_single(X, sample_weight, x_squared_norms, centers, centers_new,\n                                weight_in_clusters, labels, center_shift, n_threads, update_centers=False)\n        \n        \n        # Scale input vectors if necessary\n        if np.max(x_squared_norms) > max_cluster_l2 ** 2:\n            if verbose:\n                print(f'Scaling the input examples as their l2 norm is larger than {max_cluster_l2}')\n            scaler_squared = np.minimum(max_cluster_l2 ** 2 / x_squared_norms, 1.0)\n            x_squared_norms[:] = x_squared_norms * scaler_squared\n            X[:] = X * np.sqrt(scaler_squared).reshape(-1, 1)\n        \n        kmeans_single(X, sample_weight, x_squared_norms, centers, centers_new,\n                      weight_in_clusters, labels, center_shift, n_threads)\n\n        # Add noise to centers_new\n        add_gaussian_noise(centers_new, weight_in_clusters, eps,\n                           max_cluster_l2, max_sample_weight,\n                           cluster_to_weight_ratio, delta=delta,\n                           verbose=verbose)\n\n        # Other values need to be updated accordingly: center_shift, labels\n        center_shift[:] = np.linalg.norm(centers - centers_new, axis=-1)\n        # Run E-step of kmeans to get the new labels\n        kmeans_single(X, sample_weight, x_squared_norms, centers, centers_new,\n                    weight_in_clusters, labels, center_shift, n_threads, 
update_centers=False)\n\n        # Accumulate the privacy loss spent in this iteration\n        final_eps[0] += eps\n\n    sys.modules[KMeans.__module__].lloyd_iter_chunked_dense = modified_lloyd\n\n    kmeans = KMeans(n_clusters=n_clusters,\n                    algorithm='full',\n                    init=init_centers,\n                    verbose=verbose,\n                    max_iter=max_iter,\n                    tol=tol, **kwargs)\n    kmeans.eps = final_eps\n    return kmeans\n\n\ndef resetKMeans():\n    sys.modules[KMeans.__module__].lloyd_iter_chunked_dense = kmeans_single"
  },
  {
    "path": "extensions/privacy/metrics.py",
    "content": "# Copyright (c) Microsoft Corporation.\n# Licensed under the MIT license.\n\nimport logging\nimport numpy as np\nimport torch as T\nfrom copy import deepcopy\nfrom utils import make_optimizer, print_rank\n\ndef extract_indices_from_embeddings(gradients, batch, embed_size, vocab_size):\n    # Extract the Input gradient embeddings\n    batch = T.cat([b.view(-1) for b in batch]).cpu().detach().numpy()\n    embed_grad = gradients[:embed_size * vocab_size].reshape(vocab_size, embed_size)\n    valid_batch = batch[batch > 0]\n    tot_valid_tokens, tot_tokens = len(valid_batch), len(batch)\n    # The embedding gradients of the indices seen in the batch have higher l2 norm,\n    # because dl/dembed_i = dl/dembed_input_i * (if word_i is in batch) + dl/dembed_output_i\n    extracted_indices = T.argsort(embed_grad.norm(dim=-1), descending=True)[:tot_tokens].cpu().detach().numpy()\n    # Get the overlap ratio\n    extracted_ratio = np.isin(valid_batch, extracted_indices).mean()\n    # Find True positive extracted indices\n    return extracted_ratio, np.intersect1d(extracted_indices, valid_batch)\n\n\ndef compute_perplexity(encoded_batch, model):\n    outputs = model.inference(encoded_batch)    \n    (batch_size, seq_len, vocab_size) = outputs['output'].shape    \n    perplex = T.nn.functional.log_softmax(outputs['output'], dim=-1)\n    return perplex.reshape(-1, vocab_size)[np.arange(batch_size * seq_len),\n                    encoded_batch.reshape(-1)].reshape(batch_size, seq_len)\n\n\ndef practical_epsilon_leakage(original_params, model, encoded_batches, is_weighted_leakage=True,\n                              max_ratio=1e9, optimizer_config=None):\n    # Copy the gradients and save the model.\n    current_params = deepcopy(model.state_dict())\n    current_gradients = dict((n,p.grad.clone().detach()) for n,p in model.named_parameters())\n    model.load_state_dict(original_params)\n    pre_perplex, post_perplex = [], []\n    # This is just to initialise the 
gradients\n    model.loss(encoded_batches[0][:1]).backward()\n    model.zero_grad()\n    tolerance = 1 / max_ratio\n    max_leakage = 0\n    with T.no_grad():\n        # Original model before training on client\n        for encoded_batch in encoded_batches:\n            pre_perplex.append(compute_perplexity(encoded_batch, model))\n        # The attacker doesn't know the optimal gradient magnitude, but using Adamax with a high lr has proven to be effective\n        for n, p in model.named_parameters():\n            p.grad = current_gradients[n]\n            print_rank('grad l2: {}'.format(p.grad), loglevel=logging.DEBUG)\n        if optimizer_config is None:\n            optimizer_config = {'lr': 0.03, 'amsgrad': False, 'type': 'adamax'}\n        #T.optim.Adamax(model.parameters(), lr=optim_lr).step()\n        make_optimizer(optimizer_config, model).step()\n        #model.zero_grad()\n        # The model after training on the client data\n        for encoded_batch in encoded_batches:\n            post_perplex.append(compute_perplexity(encoded_batch, model))\n      \n        for pre, post in zip(pre_perplex, post_perplex):\n            # Compute the ratio of perplexities and weight it by the probability of correctly predicting the word\n            leakage = ((pre + tolerance) / (post + tolerance)).clamp_(0, max_ratio)\n            print_rank('perplexities leakage: {} '.format(leakage), loglevel=logging.DEBUG)\n            if is_weighted_leakage:\n                weight_leakage = T.max(pre.exp(), post.exp()) * leakage\n            else:\n                weight_leakage = leakage\n            max_leakage = max(max_leakage, weight_leakage.max().item())\n    print_rank('raw max leakage: {}'.format(max_leakage), loglevel=logging.DEBUG)\n    model.load_state_dict(current_params)\n    for n,p in model.named_parameters():\n        p.grad = current_gradients[n]\n    # We return the log to match epsilon\n    return max(np.log(max_leakage), 0)"
  },
  {
    "path": "extensions/quantization/quant.py",
    "content": "# Copyright (c) Microsoft Corporation.\n# Licensed under the MIT license.\n\nimport logging\nimport torch\nfrom utils import print_rank\nfrom typing import Optional, Tuple\n\ndef quant_model(\n        model: torch.nn.Module,\n        quant_bits: int = 8,\n        quant_threshold: Optional[int] = None,\n        global_stats: bool = False\n    ):\n    '''Quantize the gradients using the desired number of bits.\n\n    Nothing is returned as gradients inside :code:`model` are modified\n    in-place.\n\n    Args:\n        model: model which gradients we want to quantize.\n        quant_bits: how many bits will we use to quantize the gradients.\n        quant_threshold: fraction of components to be set to zero; defaults to\n            None, in which case no quantization happens.\n        global_stats: use a single histogram for all layers when binning,\n            defaults to False.\n    '''\n\n    # If no `quant_threshold`, does nothing\n    if quant_threshold is None:\n        return\n    print_rank('Performing Gradient Quantization with Prob. 
Threshold: {}'.format(\n        quant_threshold), loglevel=logging.INFO)\n\n    # If `global_stats` is true, min/max and thresh are computed across all layers\n    if global_stats:\n        flattened_grad = torch.cat([p.grad.data.flatten() for p in model.parameters()])\n        min_grad, max_grad, thresh = find_min_max_gradient(flattened_grad,\n            quant_threshold)\n\n    # Loop through all layers\n    for p in model.parameters():\n        if not global_stats:\n            min_grad, max_grad, thresh = find_min_max_gradient(p.grad.data,\n                quant_threshold)\n\n        # Perform binning and sparsification of components\n        binned_grad = quant_bins(p.grad.data, 2 ** quant_bits, min_grad, max_grad)\n        p.grad = torch.where(torch.abs(p.grad.data) > thresh, binned_grad,\n            torch.tensor(0.).to(p.grad))\n\n\ndef find_min_max_gradient(\n        gradient: torch.Tensor,\n        quant_threshold: Optional[float] = None\n    ) -> Tuple[float, float, float]:\n    '''Get min and max gradients, as well as threshold gradient.\n\n    Args:\n        gradient: tensor over which statistics will be computed.\n        quant_threshold: which quantile to look for to compute threshold, must\n            be between 0 and 1.\n    '''\n\n    # Computes min/max and quantile corresponding to `quant_threshold`\n    min_grad, max_grad = gradient.min(), gradient.max()\n    thresh = torch.quantile(torch.abs(gradient), quant_threshold)\n\n    print_rank('Min. and Max. Gradients: {}, {}'.format(min_grad, max_grad),\n        loglevel=logging.INFO)\n    print_rank('Grad. 
Threshold: {}'.format(thresh), loglevel=logging.INFO)\n\n    return min_grad, max_grad, thresh\n\n\ndef quant_bins(\n        gradients: torch.Tensor,\n        n_bins: int,\n        min_grad: float,\n        max_grad: float\n    ) -> torch.Tensor:\n    '''Perform quantization using binning.\n\n    Creates histogram with `n_bins` bins between `min_grad` and `max_grad`.\n    Returns a tensor similar to gradients but with components corresponding to\n    bin labels.\n\n    Args:\n        gradients: tensor we want to quantize.\n        n_bins: how many bins to use for binning.\n        min_grad: min. value for bins.\n        max_grad: max. value for bins.\n    '''\n\n    # We remove half bin width, as bucketize always takes the ceil instead of rounding\n    bin_labels = torch.linspace(min_grad, max_grad, n_bins).to(gradients)\n    bin_width = bin_labels[1] - bin_labels[0]\n    grad_bins = torch.bucketize(gradients - .5 * bin_width, bin_labels, right=False)\n\n    return bin_labels[grad_bins]\n"
  },
  {
    "path": "requirements.txt",
    "content": "torch==1.11.0\nmpi4py\neasydict\nscipy\npsutil\ntransformers\ntorchvision\npandas\nh5py\nsphinx_rtd_theme\nazureml-core\nazureml-defaults\npyyaml\nscikit-learn\ncerberus\nprotobuf\nsentencepiece\ngoogledrivedownloader\nwget\n"
  },
  {
    "path": "testing/README.md",
    "content": "## Information\n\nThe tests are designed to evaluate the operation of the tasks, not the performance. Therefore, we are using dummy data to run all tasks. In order to have ralistic results about the behaviour of each experiment, please follow the instructions provided in the README.md  file inside each experiment folder, for downloading the recommended datasets. \n\n## Setup Instructions for Pytest\n1. Run create_data.py in order to download and preprocess the dummy training and testing datasets that will be used. Make sure to indicate the task name. The example below shows how to create the data for the ```nlg_gru``` task.\n\n``` python\n    python create_data.py --task nlg_gru\n```\n2. The script ```test_e2e_trainer.py``` is designed to run the test over all tasks, therefore you need to run Step 1 for each experiment first).\n3. Run ```pytest -v -s``` to perfor the local test.\n"
  },
  {
    "path": "testing/build_vocab.py",
    "content": "\"\"\"Builds vocabulary file from data.\"\"\"\n\nimport argparse\nimport collections\nimport json\nimport os\n\ndef build_counter(train_data, initial_counter=None):\n    train_tokens = []\n    for u in train_data:\n        for c in train_data[u]['x']:\n            train_tokens.extend([s for s in c])\n\n    all_tokens = []\n    for i in train_tokens:\n        all_tokens.extend(i)    \n    train_tokens = []\n\n    if initial_counter is None:\n        counter = collections.Counter()\n    else:\n        counter = initial_counter\n\n    counter.update(all_tokens)\n    all_tokens = []\n\n    return counter\n\n\ndef build_vocab(counter, vocab_size=10000):\n    pad_symbol, unk_symbol = 0, 1\n    count_pairs = sorted(counter.items(), key=lambda x: (-x[1], x[0]))\n    count_pairs = count_pairs[:(vocab_size - 2)] # -2 to account for the unknown and pad symbols\n\n    words, _ = list(zip(*count_pairs))\n\n    vocab = {}\n    vocab['<PAD>'] = pad_symbol\n    vocab['<UNK>'] = unk_symbol\n\n    for i, w in enumerate(words):\n        if w != '<PAD>':\n            vocab[w] = i + 1\n\n    return {'vocab': vocab, 'size': vocab_size, 'unk_symbol': unk_symbol, 'pad_symbol': pad_symbol}\n\n\ndef load_leaf_data(file_path):\n    with open(file_path) as json_file:\n        data = json.load(json_file)\n        to_ret = data['user_data']\n        data = None\n    return to_ret\n\n\ndef save_vocab(vocab, target_dir):\n    os.makedirs(target_dir, exist_ok=True)\n    with open('./models/vocab_reddit.vocab', 'w') as outV:\n        outV.write('<OOV>\\n')\n        for t in vocab['vocab'].keys():\n            outV.write(t+'\\n')\n\n\ndef main():\n    args = parse_args()\n\n    json_files = [f for f in os.listdir(args.data_dir) if f.endswith('.json')]\n    json_files.sort()\n\n    counter = None\n    train_data = {}\n    for f in json_files:\n        print('loading {}'.format(f))\n        train_data = load_leaf_data(os.path.join(args.data_dir, f))\n        print('counting 
{}'.format(f))\n        counter = build_counter(train_data, initial_counter=counter)\n        print()\n        train_data = {}\n\n    if counter is not None:\n        vocab = build_vocab(counter, vocab_size=args.vocab_size)\n        save_vocab(vocab, args.target_dir)\n    else:\n        print('No files to process.')\n\n\ndef parse_args():\n    parser = argparse.ArgumentParser()\n\n    parser.add_argument('--data-dir', \n        help='dir with training file;',\n        type=str,\n        required=True)\n    parser.add_argument('--vocab-size', \n        help='size of the vocabulary;',\n        type=int,\n        default=10000,\n        required=False)\n    parser.add_argument('--target-dir', \n        help='dir where the vocab file will be saved;',\n        type=str,\n        default='./',\n        required=False)\n\n    return parser.parse_args()\n\nif __name__ == '__main__':\n    main()\n"
  },
  {
    "path": "testing/create_data.py",
    "content": "# Copyright (c) Microsoft Corporation.\n# Licensed under the MIT license.\n\nimport os\nimport csv\nimport json\nimport random\nimport argparse\nimport platform\nfrom collections import OrderedDict\nfrom itertools import islice\n\nimport tqdm\nimport h5py\nimport torchvision\nimport torchvision.transforms as transforms\nfrom google_drive_downloader import GoogleDriveDownloader as gdd\n\ndef get_arg_parser() -> argparse.ArgumentParser:\n    parser = argparse.ArgumentParser()\n    parser.add_argument(\"--task\")\n    return parser\n\ndef reduce_users(file):\n\n    with open(file, 'r') as f:\n            json_file = json.load(f)\n\n    num_samples = json_file['num_samples'][0:25]\n    user_data = dict(OrderedDict(islice(json_file['user_data'].items(), 0, 25)))\n    users_list = list(user_data.keys())\n\n    return users_list, num_samples, user_data\n\ndef _process_and_save_to_disk(dataset, n_users, exp, output):\n    '''Process a Torchvision dataset to expected format and save to disk'''\n\n    # Split training data equally among all users\n    total_samples = len(dataset)\n    samples_per_user = total_samples // n_users\n    assert total_samples % n_users == 0\n\n    # Function for getting a given user's data indices\n    user_idxs = lambda user_id: slice(user_id * samples_per_user, (user_id + 1) * samples_per_user)\n\n    data_dict = {  # the data is expected to have this format\n        'users' : [f'{user_id:04d}' for user_id in range(n_users)],\n        'num_samples' : n_users * [samples_per_user],\n        'user_data' : {f'{user_id:04d}': dataset.data[user_idxs(user_id)].tolist() if exp ==\"classif_cnn\" else dataset.data[user_idxs(user_id)] for user_id in range(n_users)},\n        'user_data_label': {f'{user_id:04d}': dataset.targets[user_idxs(user_id)] for user_id in range(n_users)},\n    }\n\n    with h5py.File(output + '.hdf5', 'w') as hdf5_file:\n        _dump_dict_to_hdf5(data_dict=data_dict, hdf5_file=hdf5_file)\n\n\ndef 
_dump_dict_to_hdf5(data_dict: dict, hdf5_file: h5py.File):\n    '''Dump dict with expected structure to HDF5 file'''\n\n    hdf5_file.create_dataset('users', data=data_dict['users'])\n    hdf5_file.create_dataset('num_samples', data=data_dict['num_samples'])\n\n    # Store actual data in groups\n    user_data_group = hdf5_file.create_group('user_data')\n    for user, user_data in tqdm.tqdm(data_dict['user_data'].items()):\n        user_subgroup = user_data_group.create_group(user)\n        user_subgroup.create_dataset('x', data=user_data) \n\n    user_data_label_group = hdf5_file.create_group('user_data_label')\n    for user, user_data_label in tqdm.tqdm(data_dict['user_data_label'].items()):\n        user_data_label_group.create_dataset(user, data=user_data_label)\n\nclass HeartDataSet: \n    def __init__(self, heartdata, cutoff):\n        self.data = [row[:187] for row in heartdata][:cutoff]\n        self.targets = [int(float(row[187])) for row in heartdata][:(round(len(heartdata), -3))][:cutoff]\n\n    def __len__(self):\n        return len(self.data)  \n\ndef main():\n\n    parser = get_arg_parser()\n    args = parser.parse_args()\n    args = vars(args)\n    exp = args[\"task\"]\n\n    # Create data folder\n    os.system(\"mkdir data\")\n\n    if exp == \"nlg_gru\" or exp == \"mlm_bert\":\n        \n        # Download preprocessed reddit dataset by LEAF: A Benchmark for Federated Settings\n        gdd.download_file_from_google_drive(file_id='1ISzp69JmaIJqBpQCX-JJ8-kVyUns8M7o', dest_path='./data/nlg_gru.zip', unzip=True)\n\n        files = [\"train_data\", \"val_data\", \"test_data\"]\n        for file in files:\n            orig_file = os.path.join(\"data\",\"new_small_data\",str(file+\".json\"))\n            users_list, num_samples, user_data = reduce_users(orig_file)\n            \n            # Preprocess data\n            if exp == \"nlg_gru\":\n                os.makedirs(\"data/nlg_gru\", exist_ok= True) if platform.system() == \"Windows\" else 
os.system(\"mkdir data/nlg_gru\") \n                for users in user_data:\n                    listToStr = ''\n                    for i, sentences in enumerate(user_data[users]['x']):\n                        for j, pieces in enumerate(sentences):\n                            listToStr = ' '.join([elem for elem in pieces])\n                            user_data[users]['x'][i][j] = listToStr\n                        full_sentence = ' '.join([elem for elem in sentences])\n                        full_sentence = full_sentence.replace('<PAD>', '').replace('<EOS>', '').replace('<BOS>', '').strip()\n                        user_data[users]['x'][i] = full_sentence\n                        user_data[users].pop('y',None)\n\n            elif exp == \"mlm_bert\":\n                os.makedirs(\"data/mlm_bert\", exist_ok= True) if platform.system() == \"Windows\" else os.system(\"mkdir data/mlm_bert\")\n                user_data_aux = dict()\n                for users in user_data:\n                    listToStr = ''\n                    for i, sentences in enumerate(user_data[users]['x']):\n                        for j, pieces in enumerate(sentences):\n                            listToStr = ' '.join([elem for elem in pieces])\n                            listToStr = listToStr.replace('<PAD>', '').replace('<EOS>', '').replace('<BOS>', '').strip()\n                            user_data[users]['x'][i][j] = listToStr\n                        user_data[users].pop('y',None)\n                    user_data_aux[users] = user_data[users]['x']\n                user_data = user_data_aux\n\n            # Create new dictionary\n            new_dict = {'users':users_list ,'num_samples':num_samples, 'user_data':user_data}\n\n            # Save preprocessed files\n            ext = \".json\" if exp==\"nlg_gru\" else \".txt\"\n            new_file = os.path.join(\"data\",exp,str(file+ ext))\n            f = open(new_file,'w')\n            json.dump(new_dict,f)\n            f.close()\n\n    
        # Build vocabulary\n            os.system(str(\"python build_vocab.py --data-dir ./data/\"+ exp +\" --target-dir ./models\"))\n            \n    elif exp == \"classif_cnn\":\n        os.makedirs(\"data/classif_cnn\", exist_ok= True) if platform.system() == \"Windows\" else os.system(\"mkdir data/classif_cnn\")\n        \n        # Get training and testing sets from torchvision\n        transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])\n        trainset = torchvision.datasets.CIFAR10(root='./data', train=True,download=True, transform=transform)\n        testset = torchvision.datasets.CIFAR10(root='./data', train=False,download=True, transform=transform)\n\n        # Saving datasets\n        _process_and_save_to_disk(trainset, n_users=50, exp=exp, output='./data/classif_cnn/train_data')\n        _process_and_save_to_disk(testset, n_users=50, exp=exp, output='./data/classif_cnn/test_data')\n    \n    elif exp == \"ecg_cnn\":\n        os.makedirs(\"data/ecg_cnn\", exist_ok= True) if platform.system() == \"Windows\" else os.system(\"mkdir data/ecg_cnn\")\n        \n        # Create dummy datasets\n        for set in ['train_data.csv', 'test_data.csv']:\n            data= [random.random() for i in range(188)]\n            with open(os.path.join('data',exp,set), 'w', newline='') as f:\n                write = csv.writer(f)\n                for row in range(87554):\n                    write.writerow(data)\n\n        # Preprocess datasets\n        for set in ['train_data', 'test_data']: \n            with open(os.path.join('data',exp,str(set+\".csv\"))) as f: \n                testset = list(csv.reader(f , delimiter=','))\n            TestDataset = HeartDataSet(testset, 21000)\n            _process_and_save_to_disk(TestDataset,1000,exp,os.path.join('data',exp,set))\n\nif __name__ == '__main__':\n    main()\n"
  },
  {
    "path": "testing/hello_world_classif_cnn.yaml",
    "content": "# Basic configuration file for running classif_cnn example using hdf5 files.\n# Parameters needed to initialize the model\nmodel_config:\n    model_type: CNN                                    # class w/ `loss` and `inference` methods\n    model_folder: experiments/classif_cnn/model.py     # file containing class\n\n# Configuration for differential privacy\ndp_config:\n    enable_local_dp: false                             # whether to enable user-level DP\n\n# Additional privacy metrics\nprivacy_metrics_config:\n    apply_metrics: false                               # cache data to compute additional metrics\n\n# Select the Federated optimizer to use (e.g. DGA or FedAvg)\nstrategy: DGA\n\n# Determines all the server-side settings for training and evaluation rounds\nserver_config:   \n    wantRL: false                                      # whether to use RL-based meta-optimizers\n    resume_from_checkpoint: false                      # restart from checkpoint if file exists\n    do_profiling: false                                # run profiler and compute runtime metrics\n    optimizer_config:                                  # this is the optimizer used to update the model\n        type: sgd\n        lr: 1.0\n    annealing_config:                                  # annealer for the learning rate\n        type: step_lr\n        step_interval: epoch\n        gamma: 1.0\n        step_size: 100\n    val_freq: 1                                       # how many iterations between metric eval on val set\n    rec_freq: 5                                     # how many iterations between metric eval on test set\n    initial_val: true\n    initial_rec: true\n    max_iteration: 3                                # how many iterations in total\n    num_clients_per_iteration: 3                      # how many clients per iteration\n    data_config:                                       # where to get val and test data from\n        val:\n            batch_size: 
10000\n            val_data: null\n        test:\n            batch_size: 10000\n            test_data: null\n    type: model_optimization\n    aggregate_median: softmax                          # how aggregation weights are computed\n    initial_lr_client: 0.001                           # learning rate used on client optimizer\n    lr_decay_factor: 1.0\n    weight_train_loss: train_loss\n    best_model_criterion: f1_score\n    fall_back_to_best_model: false\n    softmax_beta: 1.0\n\n# Dictates the learning parameters for client-side model updates. Train data is defined inside this config.\nclient_config:\n    do_profiling: false                                # run profiling and compute runtime metrics\n    ignore_subtask: false\n    data_config:                                       # where to get training data from\n        train:\n            batch_size: 4\n            list_of_train_data: null\n            desired_max_samples: 50000\n    optimizer_config:                                  # this is the optimizer used by the client\n        type: sgd\n        lr: 0.001                                      # this is overridden by `initial_lr_client`\n        momentum: 0.9\n    type: optimization"
  },
  {
    "path": "testing/hello_world_ecg_cnn.yaml",
    "content": "# Basic configuration file for running ecg_cnn example using hdf5 files.\n# Parameters needed to initialize the model\nmodel_config:\n    model_type: SuperNet                               # class w/ `loss` and `inference` methods\n    model_folder: experiments/ecg_cnn/model.py         # file containing class\n\n# Configuration for differential privacy\ndp_config:\n    enable_local_dp: false                             # whether to enable user-level DP\n\n# Additional privacy metrics\nprivacy_metrics_config:\n    apply_metrics: false                               # cache data to compute additional metrics\n\n# Select the Federated optimizer to use (e.g. DGA or FedAvg)\nstrategy: DGA\n\n# Determines all the server-side settings for training and evaluation rounds\nserver_config:\n    wantRL: false                                      # whether to use RL-based meta-optimizers\n    resume_from_checkpoint: false                      # restart from checkpoint if file exists\n    do_profiling: false                                # run profiler and compute runtime metrics\n    optimizer_config:                                  # this is the optimizer used to update the model\n        type: sgd\n        lr: 1.0\n    annealing_config:                                  # annealer for the learning rate\n        type: step_lr\n        step_interval: epoch\n        gamma: 1.0\n        step_size: 100\n    val_freq: 5                                      # how many iterations between metric eval on val set\n    rec_freq: 3                                      # how many iterations between metric eval on test set\n    initial_val: false\n    initial_rec: false\n    max_iteration: 3                               # how many iterations in total\n    num_clients_per_iteration: 3                      # how many clients per iteration\n    data_config:                                       # where to get val and test data from\n
        val:\n            batch_size: 10000\n            val_data: data/ecg_cnn/test_data.hdf5\n        test:\n            batch_size: 10000\n            test_data: data/ecg_cnn/test_data.hdf5\n    type: model_optimization\n    aggregate_median: softmax                          # how aggregation weights are computed\n    softmax_beta: 20.0\n    initial_lr_client: 0.001                           # learning rate used on client optimizer\n    lr_decay_factor: 1.0\n    weight_train_loss: train_loss\n    best_model_criterion: loss\n    fall_back_to_best_model: false\n\n# Dictates the learning parameters for client-side model updates. Train data is defined inside this config.\nclient_config:\n    do_profiling: false                                # run profiling and compute runtime metrics\n    ignore_subtask: false\n    data_config:                                       # where to get training data from\n        train:\n            batch_size: 96\n            list_of_train_data: data/ecg_cnn/train_data.hdf5\n            desired_max_samples: 87000\n    optimizer_config:                                  # this is the optimizer used by the client\n        type: sgd\n        lr: 0.001                                      # this is overridden by `initial_lr_client`\n        momentum: 0.90\n    type: optimization"
  },
  {
    "path": "testing/hello_world_mlm_bert.yaml",
    "content": "# Basic configuration file for running mlm_bert example using json files.\n# Parameters needed to initialize the model\nmodel_config:\n    model_type: BERT\n    model_folder: experiments/mlm_bert/model.py\n    BERT:\n        loader_type: text\n        model:\n            model_name: roberta-large\n            cache_dir: ./cache_dir\n            use_fast_tokenizer: False\n            mask_token: <mask>\n            task: mlm\n            past_index: -1\n            prediction_loss_only: false\n            process_line_by_line: false\n        training:\n            seed: 12345\n            label_smoothing_factor: 0\n            batch_size: 64\n            max_seq_length: 256\n\n# Configuration for differential privacy\ndp_config:\n    enable_local_dp: false  # If enabled, the remaining parameters are required.\n    enable_global_dp: false # Local DP clips and adds noise on the client, while the privacy budget is accumulated centrally\n    eps: 100                # epsilon\n    global_sigma: 0.35      # Used when global dp is enabled; specifies the global Gaussian noise\n    weight_scaler: 0.0001   # indicates how the aggregation weights are scaled before noise addition, and unscaled afterwards\n    max_grad: 0.008         # max gradient\n    max_weight: 0.5         # The max_weight and min_weight should be already scaled by weight_scaler\n    min_weight: 0.0000001   # Because we scale down the weight using weight_scaler -> clip -> add noise -> scale back up.\n\n# Additional privacy metrics\nprivacy_metrics_config:\n    apply_metrics: false    # If enabled, the remaining parameters are required.\n\n# Select the Federated optimizer to use (e.g. 
DGA or FedAvg)\nstrategy: DGA\n\n# Determines all the server-side settings for training and evaluation rounds\nserver_config:\n    resume_from_checkpoint: true                    # Resumes from latest checkpoint iteration if available \n    do_profiling: false                             # Capture profiling information during server updates.\n    wantRL: false                                   # Enable/Disable Reinforcement learning\n    optimizer_config:                               # Configuration for server-side optimizer\n        lr: 0.00001                                 \n        weight_decay: 0.01\n        type: adamW\n    annealing_config:                               # This section configures how the learning rate decays\n        type: step_lr\n        step_interval: epoch\n        gamma: 1.0\n        step_size: 1000\n    val_freq: 5                                     # Frequency for validation rounds\n    rec_freq: 5                                    # Frequency for testing rounds\n    initial_val : false                              # Enable initial validation round at itr=0\n    initial_rec: false                              # Enable initial testing round at itr=0\n    max_iteration: 2                            # Total number of rounds for FL\n    num_clients_per_iteration: 2                  # Number of clients sampled per round\n    data_config:                                    # Server-side data configuration\n        val:                                        # Validation data\n            val_data: data/mlm_bert/val_data.txt\n            task: mlm\n            mlm_probability: 0.25\n            tokenizer_type_fast: False\n            batch_size: 128\n            max_seq_length: 256\n            min_words_per_utt: 5\n            max_samples_per_user: 5000\n            mask_token: <mask>\n            num_workers: 0\n            prepend_datapath: false\n            cache_dir: ./cache_dir\n        # Note this is NOT the main training data 
configuration, which is configured in the \n        # client config.  This section is ignored unless you are running replay data.\n        # If you want to run replay data- set a path name for train_data_server.\n        # train:\n        #     loader_type: text\n        #     train_data: null\n        #     train_data_server: null\n        #     desired_max_samples: null\n        test:                                       # Test data configuration\n            test_data: data/mlm_bert/test_data.txt\n            task: mlm\n            mlm_probability: 0.25\n            tokenizer_type_fast: False\n            batch_size: 128\n            max_seq_length: 256\n            max_samples_per_user: 5000\n            mask_token: <mask>\n            num_workers: 0\n            prepend_datapath: false\n            cache_dir: ./cache_dir\n    type: model_optimization                        # Server type\n    aggregate_median: softmax                       # FL aggregation method\n    weight_train_loss: train_loss                # Determines how each client's weight is computed (e.g. grad_mean_loss, train_loss)\n    softmax_beta: 1.00                              \n    initial_lr_client: 0.00001\n    lr_decay_factor: 1.0\n    best_model_criterion: loss                      # Determine the best model based on minimal loss, for checkpointing\n    fall_back_to_best_model: false                  # If a model degrades, use the previous best model\n\n# Dictates the learning parameters for client-side model updates. 
Train data is defined inside this config.\nclient_config:\n    meta_learning: basic\n    stats_on_smooth_grad: true\n    ignore_subtask: false\n    copying_train_data: false\n    do_profiling: false                             # Enables client-side training profiling\n    data_config:\n        train:                                      # This is the main training data configuration\n            list_of_train_data: data/mlm_bert/train_data.txt\n            task: mlm\n            mlm_probability: 0.25\n            tokenizer_type_fast: False\n            batch_size: 24\n            max_seq_length: 256\n            min_words_per_utt: 5\n            desired_max_samples: 5000\n            mask_token: <mask>\n            num_workers: 0\n            num_frames: 0\n            max_grad_norm: 15.0\n            prepend_datapath: false\n            cache_dir: ./cache_dir\n            pin_memory: true\n    type: optimization\n    meta_optimizer_config:\n        lr: 0.01\n        type: adam\n    optimizer_config:\n        type: adamW\n        weight_decay: 0.01\n        amsgrad: true\n    annealing_config:\n        type: step_lr\n        step_interval: epoch\n        step_size: 2\n        gamma: 1.0"
  },
  {
    "path": "testing/hello_world_nlg_gru.yaml",
    "content": "# Basic configuration file for running nlg_gru example using json files.\n# Parameters needed to initialize the model\nmodel_config: \n    model_type: GRU\n    model_folder: experiments/nlg_gru/model.py\n    embed_dim: 160\n    vocab_size: 10000\n    hidden_dim: 512\n    OOV_correct: false\n\n# Configuration for differential privacy\ndp_config:\n    enable_local_dp: false      # If enabled, the remaining parameters are required.\n\n# Additional privacy metrics\nprivacy_metrics_config:\n    apply_metrics: false             # If enabled, the remaining parameters are required.\n\n# Select the Federated optimizer to use (e.g. DGA or FedAvg)\nstrategy: DGA\n\n# Determines all the server-side settings for training and evaluation rounds\nserver_config:\n    wantRL: false                   # Enable/Disable Reinforcement learning\n    resume_from_checkpoint: true    # Resumes from latest checkpoint iteration if available\n    do_profiling: false             # Capture profiling information during server updates.\n    optimizer_config:               # Configuration for server-side optimizer\n        type: adam\n        lr: 0.003\n        amsgrad: true\n    annealing_config:               # This section configures how the learning rate decays\n        type: step_lr\n        step_interval: epoch\n        gamma: 1.0\n        step_size: 100\n    val_freq: 1                     # Frequency for validation rounds\n    rec_freq: 5                     # Frequency for testing rounds\n    initial_val: true               # Enable initial validation round at itr=0\n    initial_rec: false              # Enable initial testing round at itr=0\n    max_iteration: 3                # Total number of rounds for FL\n    num_clients_per_iteration: 10   # Number of clients sampled per round\n    data_config:                    # Server-side data configuration\n        val:                        # Validation data\n            # batch_size: 2048\n            tokenizer_type: not_applicable\n
            prepend_datapath: false\n            val_data: data/nlg_gru/val_data.json\n            vocab_dict: models/vocab_reddit.vocab\n            pin_memory: true\n            num_workers: 0                          # Indicates how many workers are used for creating batches\n            num_frames: 2400\n            max_batch_size: 2048\n            max_num_words: 25\n            unsorted_batch: true\n        test:                                       # Test data configuration\n            batch_size: 2048\n            tokenizer_type: not_applicable\n            prepend_datapath: false\n            train_data: null\n            train_data_server: null\n            test_data: data/nlg_gru/test_data.json\n            vocab_dict: models/vocab_reddit.vocab\n            pin_memory: true\n            num_workers: 0                          # Indicates how many workers are used for creating batches\n            max_batch_size: 2048\n            max_num_words: 25\n            unsorted_batch: true\n    type: model_optimization\n    aggregate_median: softmax                       # FL aggregation method\n    weight_train_loss: train_loss                   # Determines how each client's weight is computed (e.g. grad_mean_loss, train_loss)\n    softmax_beta: 20.0\n    initial_lr_client: 1.0\n    lr_decay_factor: 1.0\n    best_model_criterion: loss                      # Determines the best model based on minimal loss, for checkpointing\n    fall_back_to_best_model: false                  # If a model degrades, use the previous best model\n\n# Dictates the learning parameters for client-side model updates. 
Train data is defined inside this config.\nclient_config:\n    meta_learning: basic\n    stats_on_smooth_grad: true\n    ignore_subtask: false\n    num_skips_threshold: 10\n    copying_train_data: false\n    do_profiling: false                                 # Enables client-side training profiling\n    data_config:\n        train:                                          # This is the main training data configuration\n            batch_size: 64\n            tokenizer_type: not_applicable\n            prepend_datapath: false\n            list_of_train_data: data/nlg_gru/train_data.json\n            vocab_dict: models/vocab_reddit.vocab\n            pin_memory: true\n            num_workers: 0\n            desired_max_samples: 50000\n            max_grad_norm: 20.0\n            max_batch_size: 128\n            max_num_words:  25\n            unsorted_batch: true\n    type: optimization\n    meta_optimizer_config:\n        lr: 1.0\n        type: sgd\n    optimizer_config:\n        type: sgd\n    annealing_config:\n        type: step_lr\n        step_interval: epoch\n        step_size: 1\n        gamma: 1.0"
  },
  {
    "path": "testing/test_e2e_trainer.py",
    "content": "# Copyright (c) Microsoft Corporation.\n# Licensed under the MIT license.\n\nimport subprocess\nimport os\nimport platform\nimport pytest\n\ndef get_info(task):\n\n    data_path = r'./testing/'\n    output_path = r'./testing/outputs/'\n\n    if task == 'nlg_gru':\n        config_path = r'./testing/hello_world_nlg_gru.yaml'\n    elif task == 'classif_cnn':\n        config_path = r'./testing/hello_world_classif_cnn.yaml'\n    elif task == 'ecg_cnn':\n        config_path = r'./testing/hello_world_ecg_cnn.yaml'\n    elif task == 'mlm_bert':\n        config_path = r'./testing/hello_world_mlm_bert.yaml'\n    else:\n        raise ValueError('Unknown task: {}'.format(task))\n\n    return data_path, output_path, config_path\n\ndef run_pipeline(data_path, output_path, config_path, task):\n\n    print(\"Testing {} task\".format(task))\n\n    # Adjust command to the task and OS\n    sym = \"&\" if platform.system() == \"Windows\" else \";\"\n    command = 'cd .. ' + sym + ' python -m torch.distributed.run --nproc_per_node=2 e2e_trainer.py' + \\\n            ' -dataPath ' + data_path + ' -outputPath ' + output_path + ' -config ' + config_path + \\\n            ' -task ' + task + ' -backend nccl'\n\n    # Execute e2e_trainer and store the exit code; stderr is merged into the log file\n    with open('logs.txt', 'w') as f:\n        process = subprocess.run(command, shell=True, stdout=f, stderr=subprocess.STDOUT, text=True, timeout=900)\n    return_code = process.returncode\n\n    # Print logs\n    os.system(\"ls\")\n    os.system(\"less logs.txt\")\n    print(\"Finished running {} task\".format(task))\n\n    return return_code\n\ndef test_nlg_gru():\n\n    task = 'nlg_gru'\n    data_path, output_path, config_path = get_info(task)\n    assert run_pipeline(data_path, output_path, config_path, task) == 0\n\ndef test_ecg_cnn():\n\n    task = 'ecg_cnn'\n    data_path, output_path, config_path = get_info(task)\n    assert run_pipeline(data_path, output_path, config_path, task) == 0\n\n@pytest.mark.xfail\ndef 
test_mlm_bert():\n\n    task = 'mlm_bert'\n    data_path, output_path, config_path = get_info(task)\n    assert run_pipeline(data_path, output_path, config_path, task) == 0\n    print(\"PASSED\")\n\n@pytest.mark.xfail\ndef test_classif_cnn():\n\n    task = 'classif_cnn'\n    data_path, output_path, config_path = get_info(task)\n    assert run_pipeline(data_path, output_path, config_path, task) == 0\n    print(\"PASSED\")\n"
  },
  {
    "path": "utils/__init__.py",
    "content": "# Copyright (c) Microsoft Corporation.\n# Licensed under the MIT license.\n\nfrom .utils import *\nfrom utils.optimizers.lars import *\nfrom utils.optimizers.lamb import *\n\n"
  },
  {
    "path": "utils/data_utils.py",
    "content": "# Copyright (c) Microsoft Corporation.\n# Licensed under the MIT license.\n\nimport random\nimport logging\nfrom torch.utils.data import sampler\nfrom utils import AverageMeter\n\nclass BatchSampler(sampler.Sampler):\n    \"\"\"\n    Simply determines the order in which the loader will read samples from the data set.\n    We want to sample batches randomly, but each batch should have samples that are\n    close to each other in the dataset (so that we don't have a lot of zero padding)\n    \"\"\"\n\n    def __init__(self, dataset, batch_size, randomize=True, drop_last=False):\n        self.dataset = dataset\n        self.batch_size = batch_size\n        self.randomize = randomize\n\n        batches = [range(begin_id, begin_id + batch_size) for begin_id in range(0, len(dataset), batch_size)]\n\n        # If the last batch would index past the end of the dataset, drop it or trim it.\n        if batches[-1][-1] >= len(dataset):\n            if drop_last:\n                del batches[-1]\n            else:\n                batches[-1] = range(batches[-1][0], len(dataset))\n        self.batches = batches\n\n    def __iter__(self):\n\n        if self.randomize:\n            random.shuffle(self.batches)\n\n        return iter(self.batches)\n\n    def __len__(self):\n        return len(self.batches) * self.batch_size\n\n\nclass DynamicBatchSampler(sampler.Sampler):\n    \"\"\"Extension of Sampler that will do the following:\n        1.  Change the batch size (essentially number of sequences)\n            in a batch to ensure that the total number of frames is less\n            than a certain threshold.\n        2.  
Make sure the padding efficiency in the batch is high.\n    \"\"\"\n\n    def __init__(self, sampler, frames_threshold, max_batch_size=0, unsorted_batch=False, fps= 1000 / 30):\n        \"\"\"\n        @sampler: will mostly be an instance of DistributedSampler.\n        Though it should work with any sampler.\n        @frames_threshold: maximum area of the batch\n        \"\"\"\n        self.sampler = sampler\n        self.frames_threshold = frames_threshold\n        self.max_batch_size = max_batch_size\n        self.unsorted_batch = unsorted_batch\n\n        indices, batches = list(), list()\n        # the dataset to which these indices are pointing to\n        dataset = self.sampler.dataset\n        # get all the indices and corresponding durations from\n        # the sampler\n        for idx in self.sampler:\n            indices.append((idx, dataset.utt_list[idx][\"duration\"]))\n\n        # sort the indices according to duration\n        if self.unsorted_batch is False:\n            indices.sort(key=lambda elem : elem[1])\n            max_dur = indices[-1][1]\n        else:\n            # make sure that you will be able to serve all the utterances\n            max_dur = max([indices[i][1] for i in range(len(indices))])\n\n        # start clubbing the utterances together\n        batch = list()\n        batch_frames, batch_area = 0, 0\n        max_frames_in_batch = 0\n        average_meter = AverageMeter('Padding Efficiency')\n        for idx, duration in indices:\n            if duration > 0:\n                frames = duration * fps\n                if frames > max_frames_in_batch:\n                    max_frames_in_batch = frames\n\n                if (self.unsorted_batch and len(batch) < max_batch_size)\\\n                    or (not self.unsorted_batch and batch_frames + frames <= self.frames_threshold and (max_batch_size == 0 or len(batch) < max_batch_size)):\n                    batch.append(idx)\n                    batch_frames += frames\n                
    batch_area = max_frames_in_batch * len(batch)\n                else:\n                    # log the stats and add previous batch to batches\n                    if batch_area > 0 and len(batch) > 0:\n                        average_meter.add(batch_frames, batch_area)\n                        batches.append(batch)\n                    # make a new one\n                    batch = list()\n                    batch_frames, batch_area = frames, frames\n                    max_frames_in_batch = batch_frames\n\n        # When all indices are processed\n        if batch_area > 0 and len(batch) > 0:\n            average_meter.add(batch_frames, batch_area)\n            batches.append(batch)\n\n        # don't need the 'indices' any more\n        del indices\n        self.batches = batches\n        average_meter.display_results(loglevel=logging.DEBUG)\n\n    def __iter__(self):\n        # shuffle on a batch level\n        random.shuffle(self.batches)\n        return iter(self.batches)\n\n    def __len__(self):\n        return len(self.batches)\n\n"
  },
  {
    "path": "utils/dataloaders_utils.py",
    "content": "# Copyright (c) Microsoft Corporation.\n# Licensed under the MIT license.\n\nimport os\nimport logging\nfrom importlib.machinery import SourceFileLoader\nfrom utils import print_rank\n\ndef get_exp_dataloader(task):\n    \"\"\" Detect the dataloader declared in the experiment folder\n\n    Args:\n        task (str): task parsed from the console\n    \"\"\"\n\n    try:\n        dir_ = os.path.join('experiments', task, 'dataloaders', 'dataloader.py')\n        loader = SourceFileLoader(\"DataLoader\", dir_).load_module()\n        loader = loader.DataLoader\n    except FileNotFoundError:\n        print_rank(\"Dataloader not found, please make sure it is located inside the experiment folder\")\n        raise\n\n    return loader\n\ndef make_train_dataloader(data_config, data_path, clientx, task=None, vec_size=300, data_strct=None, replay_server=False):\n    \"\"\" Create a dataloader for training on either server or client side \"\"\"\n\n    mode = 'train'\n    tokenizer_type = data_config.get('tokenizer_type', 'not_applicable')\n\n    # Training list for a server\n    if clientx is None:\n        if \"train_data_server\" not in data_config or data_config[\"train_data_server\"] is None:\n            print_rank(\"No server training set is defined\")\n            return None\n        my_data = os.path.join(data_path, data_config[\"train_data_server\"])\n        mode = 'val'  # Only for replay_server\n        clientx = 0   # Only for replay_server\n\n    # Training list on the client side\n    else:\n        if tokenizer_type != 'not_applicable':\n            assert 0 <= clientx < len(data_config[\"train_data\"]), \"Invalid client index {}\".format(clientx)\n            my_data = data_config[\"train_data\"][clientx]\n        else:\n            my_data = data_config[\"list_of_train_data\"]\n\n    DataLoader = get_exp_dataloader(task)\n    train_dataloader = DataLoader(data = data_strct if data_strct is not None else my_data,\n                                    user_idx = 
clientx,\n                                    mode = mode,\n                                    args=data_config\n                                    )\n\n    return train_dataloader\n\n\ndef make_val_dataloader(data_config, data_path, task=None, data_strct=None, train_mode=False):\n    \"\"\" Return a data loader for a validation set \"\"\"\n\n    DataLoader = get_exp_dataloader(task)\n    val_file = os.path.join(data_path, data_config[\"val_data\"]) if data_config[\"val_data\"] != None and data_path != None else None\n    val_dataloader = DataLoader(data = data_strct if data_strct is not None else val_file,\n                                user_idx = 0,\n                                mode = 'val',\n                                args=data_config\n                                )\n\n    return val_dataloader\n\n\ndef make_test_dataloader(data_config, data_path, task=None, data_strct=None):\n    \"\"\" Return a data loader for an evaluation set. \"\"\"\n\n    DataLoader = get_exp_dataloader(task)\n    test_file = os.path.join(data_path, data_config[\"test_data\"]) if data_config[\"test_data\"] != None and data_path != None else None\n    test_dataloader = DataLoader(data = data_strct if data_strct is not None else test_file,\n                                user_idx = 0,\n                                mode = 'test',\n                                args=data_config\n                                )\n\n    return test_dataloader\n\ndef get_dataset(data_path, config, task, mode, test_only=False, user_idx=-1, data_strct=None):\n    \"\"\" Return the task train/val/test dataset \"\"\"\n\n    # Load Dataset Class\n    data_config = get_data_config(config,mode)\n    dir_ = os.path.join('experiments',task,'dataloaders','dataset.py')\n    loader = SourceFileLoader(\"Dataset\",dir_).load_module()\n    dataset = loader.Dataset\n\n    data_file = \"val_data\" if mode == \"val\" else \"test_data\" if mode == \"test\" else \"list_of_train_data\"\n    data_file = 
data_config[data_file]\n    data_pointer = os.path.join(data_path, data_file) if data_file != None else data_file\n\n    return dataset(data_pointer if data_strct == None else data_strct, test_only=test_only, user_idx=user_idx, args=data_config)\n\ndef get_data_config(config, mode):\n    \"\"\" Return the configuration for the dataset\"\"\"\n\n    if mode == 'val':\n        data_config = config['server_config']['data_config'][\"val\"]\n    elif mode == 'test':\n        data_config = config['server_config']['data_config'][\"test\"]\n    else:\n        data_config = config[\"client_config\"][\"data_config\"][\"train\"]\n    \n    semisupervision_config = config[\"client_config\"].get('semisupervision',None)\n    if semisupervision_config == None:\n        return data_config\n    else:\n        return {** data_config, **semisupervision_config}\n\n\n"
  },
  {
    "path": "utils/optimizers/adamW.py",
    "content": "# Copyright (c) Microsoft Corporation.\n# Licensed under the MIT license.\n\nimport math\nimport torch\nfrom torch.optim import Optimizer\n\nclass AdamW(Optimizer):\n    \"\"\" Implements the Adam algorithm with the weight decay fix.\n    Parameters:\n        lr (float): learning rate. Default 1e-3.\n        betas (tuple of 2 floats): Adam's beta parameters (b1, b2). Default: (0.9, 0.999)\n        eps (float): Adam's epsilon. Default: 1e-6\n        weight_decay (float): Weight decay. Default: 0.0\n        correct_bias (bool): can be set to False to avoid correcting bias in Adam (e.g. like in the Bert TF repository). Default True.\n    \"\"\"\n    def __init__(self, params, lr=1e-3, betas=(0.9, 0.999), eps=1e-6, weight_decay=0.0, correct_bias=True):\n        if lr < 0.0:\n            raise ValueError(\"Invalid learning rate: {} - should be >= 0.0\".format(lr))\n        if not 0.0 <= betas[0] < 1.0:\n            raise ValueError(\"Invalid beta parameter: {} - should be in [0.0, 1.0)\".format(betas[0]))\n        if not 0.0 <= betas[1] < 1.0:\n            raise ValueError(\"Invalid beta parameter: {} - should be in [0.0, 1.0)\".format(betas[1]))\n        if not 0.0 <= eps:\n            raise ValueError(\"Invalid epsilon value: {} - should be >= 0.0\".format(eps))\n        defaults = dict(lr=lr, betas=betas, eps=eps, weight_decay=weight_decay,\n                        correct_bias=correct_bias)\n        super(AdamW, self).__init__(params, defaults)\n\n    def step(self, closure=None):\n        \"\"\"Performs a single optimization step.\n        Arguments:\n            closure (callable, optional): A closure that reevaluates the model\n                and returns the loss.\n        \"\"\"\n        loss = None\n        if closure is not None:\n            loss = closure()\n\n        for group in self.param_groups:\n            for p in group['params']:\n                if p.grad is None:\n                    continue\n                grad = p.grad.data\n
                if grad.is_sparse:\n                    raise RuntimeError('Adam does not support sparse gradients, please consider SparseAdam instead')\n\n                state = self.state[p]\n\n                # State initialization\n                if len(state) == 0:\n                    state['step'] = 0\n                    # Exponential moving average of gradient values\n                    state['exp_avg'] = torch.zeros_like(p.data)\n                    # Exponential moving average of squared gradient values\n                    state['exp_avg_sq'] = torch.zeros_like(p.data)\n\n                exp_avg, exp_avg_sq = state['exp_avg'], state['exp_avg_sq']\n                beta1, beta2 = group['betas']\n\n                state['step'] += 1\n\n                # Decay the first and second moment running average coefficient\n                # In-place operations to update the averages at the same time\n                exp_avg.mul_(beta1).add_(grad, alpha=1.0 - beta1)\n                exp_avg_sq.mul_(beta2).addcmul_(grad, grad, value=1.0 - beta2)\n                denom = exp_avg_sq.sqrt().add_(group['eps'])\n\n                step_size = group['lr']\n                if group['correct_bias']:  # No bias correction for Bert\n                    bias_correction1 = 1.0 - beta1 ** state['step']\n                    bias_correction2 = 1.0 - beta2 ** state['step']\n                    step_size = step_size * math.sqrt(bias_correction2) / bias_correction1\n\n                p.data.addcdiv_(exp_avg, denom, value=-step_size)\n\n                # Just adding the square of the weights to the loss function is *not*\n                # the correct way of using L2 regularization/weight decay with Adam,\n                # since that will interact with the m and v parameters in strange ways.\n                #\n                # Instead we want to decay the weights in a manner that doesn't interact\n                # with the m/v parameters. 
This is equivalent to adding the square\n                # of the weights to the loss with plain (non-momentum) SGD.\n                # Add weight decay at the end (fixed version)\n                if group['weight_decay'] > 0.0:\n                    p.data.add_(p.data, alpha= -group['lr'] * group['weight_decay'])\n\n        return loss"
  },
  {
    "path": "utils/optimizers/lamb.py",
    "content": "# Copyright (c) Microsoft Corporation.\n# Licensed under the MIT license.\n\n\"\"\"Lamb optimizer.\"\"\"\n\nimport collections\nimport math\n\nimport torch\nfrom torch.optim import Optimizer\n\ntry:\n    from tensorboardX import SummaryWriter\n\n    def log_lamb_rs(optimizer: Optimizer, event_writer: SummaryWriter, token_count: int):\n        \"\"\"Log a histogram of trust ratio scalars across layers.\"\"\"\n        results = collections.defaultdict(list)\n        for group in optimizer.param_groups:\n            for p in group['params']:\n                state = optimizer.state[p]\n                for i in ('weight_norm', 'adam_norm', 'trust_ratio'):\n                    if i in state:\n                        results[i].append(state[i])\n\n        for k, v in results.items():\n            event_writer.add_histogram(f'lamb/{k}', torch.tensor(v), token_count)\n\nexcept ImportError:\n    def log_lamb_rs(optimizer, event_writer, token_count):\n        print(\"tensorboardX is not installed\")\n\n\nclass LAMB(Optimizer):\n    r\"\"\"Implements Lamb algorithm.\n\n    It has been proposed in `Large Batch Optimization for Deep Learning: Training BERT in 76 minutes`_.\n\n    Arguments:\n        params (iterable): iterable of parameters to optimize or dicts defining\n            parameter groups\n        lr (float, optional): learning rate (default: 1e-3)\n        betas (Tuple[float, float], optional): coefficients used for computing\n            running averages of gradient and its square (default: (0.9, 0.999))\n        eps (float, optional): term added to the denominator to improve\n            numerical stability (default: 1e-6)\n        weight_decay (float, optional): weight decay (L2 penalty) (default: 0)\n        adam (bool, optional): always use trust ratio = 1, which turns this into\n            Adam. Useful for comparison purposes.\n\n    .. _Large Batch Optimization for Deep Learning: Training BERT in 76 minutes:\n        https://arxiv.org/abs/1904.00962\n    \"\"\"\n\n    def __init__(self, params, lr=1e-3, betas=(0.9, 0.999), eps=1e-6,\n                 weight_decay=0, adam=False):\n        if not 0.0 <= lr:\n            raise ValueError(\"Invalid learning rate: {}\".format(lr))\n        if not 0.0 <= eps:\n            raise ValueError(\"Invalid epsilon value: {}\".format(eps))\n        if not 0.0 <= betas[0] < 1.0:\n            raise ValueError(\"Invalid beta parameter at index 0: {}\".format(betas[0]))\n        if not 0.0 <= betas[1] < 1.0:\n            raise ValueError(\"Invalid beta parameter at index 1: {}\".format(betas[1]))\n        defaults = dict(lr=lr, betas=betas, eps=eps,\n                        weight_decay=weight_decay)\n        self.adam = adam\n        super(LAMB, self).__init__(params, defaults)\n\n    def step(self, closure=None):\n        \"\"\"Performs a single optimization step.\n\n        Arguments:\n            closure (callable, optional): A closure that reevaluates the model\n                and returns the loss.\n        \"\"\"\n        loss = None\n        if closure is not None:\n            loss = closure()\n\n        for group in self.param_groups:\n            for p in group['params']:\n                if p.grad is None:\n                    continue\n                grad = p.grad.data\n                if grad.is_sparse:\n                    raise RuntimeError('Lamb does not support sparse gradients, consider SparseAdam instead.')\n\n                state = self.state[p]\n\n                # State initialization\n                if len(state) == 0:\n                    state['step'] = 0\n                    # Exponential moving average of gradient values\n                    state['exp_avg'] = torch.zeros_like(p.data)\n                    # Exponential moving average of squared gradient values\n                    state['exp_avg_sq'] = torch.zeros_like(p.data)\n\n                exp_avg, exp_avg_sq = state['exp_avg'], state['exp_avg_sq']\n                beta1, beta2 = group['betas']\n\n                state['step'] += 1\n\n                # Decay the first and second moment running average coefficient\n                # m_t\n                exp_avg.mul_(beta1).add_(grad, alpha=1 - beta1)\n                # v_t\n                exp_avg_sq.mul_(beta2).addcmul_(grad, grad, value=1 - beta2)\n\n                # Paper v3 does not use debiasing.\n                # bias_correction1 = 1 - beta1 ** state['step']\n                # bias_correction2 = 1 - beta2 ** state['step']\n                # Apply bias to lr to avoid broadcast.\n                step_size = group['lr'] # * math.sqrt(bias_correction2) / bias_correction1\n\n                weight_norm = p.data.pow(2).sum().sqrt().clamp(0, 10)\n\n                adam_step = exp_avg / exp_avg_sq.sqrt().add(group['eps'])\n                if group['weight_decay'] != 0:\n                    adam_step.add_(p.data, alpha=group['weight_decay'])\n\n                adam_norm = adam_step.pow(2).sum().sqrt()\n                if weight_norm == 0 or adam_norm == 0:\n                    trust_ratio = 1\n                else:\n                    trust_ratio = weight_norm / adam_norm\n                state['weight_norm'] = weight_norm\n                state['adam_norm'] = adam_norm\n                state['trust_ratio'] = trust_ratio\n                if self.adam:\n                    trust_ratio = 1\n\n                p.data.add_(adam_step, alpha=-step_size * trust_ratio)\n\n        return loss\n"
  },
  {
    "path": "utils/optimizers/lars.py",
    "content": "# Copyright (c) Microsoft Corporation.\n# Licensed under the MIT license.\n\n\"\"\"distoptim.hit package\"\"\"\nimport logging\nimport torch\n\nLOG = logging.getLogger(__name__)\n\nclass LarsSGDV1(torch.optim.SGD):\n    \"\"\" LARS SGD V1, based on https://arxiv.org/abs/1708.03888\n        2018.\n        Refer to torch.optim.SGD for parameters.\n    \"\"\"\n\n    def __init__(self, params, lr, momentum=0, dampening=0,\n                 weight_decay=0, nesterov=False):\n        LOG.info(\"Init LarsSGDV1\")\n        super(LarsSGDV1, self).__init__(\n            params, lr, momentum, dampening, weight_decay, nesterov)\n\n    def step(self, closure=None):\n        \"\"\"Performs a single optimization step.\n\n        Arguments:\n            closure (callable, optional): A closure that reevaluates the model\n                and returns the loss.\n        \"\"\"\n        loss = None\n        if closure is not None:\n            loss = closure()\n\n        for group in self.param_groups:\n            weight_decay = group['weight_decay']\n            momentum = group['momentum']\n            # dampening = group['dampening']\n            nesterov = group['nesterov']\n\n            for p in group['params']:\n                if p.grad is None:\n                    continue\n\n                d_p = p.grad.data\n\n                p_n = p.data.norm()\n                d_p_n = d_p.norm()\n\n                if weight_decay != 0:\n                    d_p_n.add_(p_n, alpha=weight_decay)\n                    d_p.add_(p.data, alpha=weight_decay)\n\n                alpha = 0.001 * p_n / d_p_n  # This is the LARS eta from the paper\n                lr = float(alpha * group['lr'])\n                lr = min(lr, 5.0)\n\n                if momentum != 0:\n                    param_state = self.state[p]\n                    if 'momentum_buffer' not in param_state:\n                        buf = param_state['momentum_buffer'] = \\\n                            torch.clone(d_p).detach()\n                    else:\n                        buf = param_state['momentum_buffer']\n                        buf.mul_(momentum).add_(d_p, alpha=lr)\n                    if nesterov:\n                        d_p = d_p.add(buf, alpha=momentum)\n                    else:\n                        d_p = buf\n\n                p.data.add_(d_p, alpha=-1)\n\n        return loss\n\n\nclass LarsSGD(torch.optim.SGD):\n    \"\"\" LARS SGD, based on https://arxiv.org/abs/1904.00962 Algorithm 1\n        2019, a newer version.\n        Refer to torch.optim.SGD for parameters.\n    \"\"\"\n\n    def __init__(self, params, lr, momentum=0, dampening=0,\n                 weight_decay=0, nesterov=False):\n        LOG.info(\"Init LarsSGD\")\n        super(LarsSGD, self).__init__(\n            params, lr, momentum, dampening, weight_decay, nesterov)\n\n    def step(self, closure=None):\n        \"\"\"Performs a single optimization step.\n\n        Arguments:\n            closure (callable, optional): A closure that reevaluates the model\n                and returns the loss.\n        \"\"\"\n        loss = None\n        if closure is not None:\n            loss = closure()\n\n        for group in self.param_groups:\n            weight_decay = group['weight_decay']\n            momentum = group['momentum']\n            # dampening = group['dampening']\n            nesterov = group['nesterov']\n\n            for p in group['params']:\n                if p.grad is None:\n                    continue\n\n                d_p = p.grad.data\n                if weight_decay != 0:\n                    d_p = d_p.add(p.data, alpha=weight_decay)\n\n                if momentum != 0:\n                    param_state = self.state[p]\n                    if 'momentum_buffer' not in param_state:\n                        buf = param_state['momentum_buffer'] = \\\n                            torch.clone(d_p).detach()\n                    else:\n                        buf = param_state['momentum_buffer']\n                        buf.mul_(momentum).add_(d_p, alpha=1 - momentum)\n                    if nesterov:\n                        d_p = d_p.add(buf, alpha=momentum)\n                    else:\n                        d_p = buf\n\n                lr = group['lr'] * p.data.norm() / (d_p.norm() + 1e-8)\n                lr.clamp_(0, 10)\n                p.data.add_(d_p, alpha=-lr.item())\n\n        return loss\n"
  },
  {
    "path": "utils/preprocessing/create-hdf5.py",
    "content": "# Copyright (c) Microsoft Corporation.\n# Licensed under the MIT license.\n\nimport h5py\nimport time\nfrom tqdm import tqdm\nimport pandas as pd\n\n\npath = r'C:\\Users\\train.tsv'\n\ndef local_time():\n    return str(time.strftime(\"%H:%M:%S\",time.localtime()))\n\n\nprint(local_time() + \" Starting script \")\ncolumns = ['author','num1','content','str1','str2','num2','subreddit']\ndf = pd.read_csv(path, sep='\\t', names=columns, header=None)\nprint(local_time() + \" File has been read \")\n\ndf_authors = pd.DataFrame(df['author'])\ndf_content = pd.DataFrame(df['content'])\ndf_file = pd.concat([df_authors,df_content], axis=1)\nprint(local_time() + \" Data needed has been concatenated \")\n\n\nusers_group = df_file.groupby('author')\ngroup0 = df_file.groupby(['author','content'])\ngroup1 = pd.Series(users_group.size())\nusers = (group1.index).to_numpy()\nprint(local_time() + \" users have been formatted \")\nnum_samples = group1.values\nprint(local_time() + \" num_samples has been formatted \")\nuser_data_dict = {}\n\nuser_data_dict = {i: {'x': list()} for i in tqdm(users)}\n\nfor i in tqdm(range(len(df_file))):\n    if df_file['content'][i] not in user_data_dict[df_file['author'][i]]['x']:\n        user_data_dict[df_file['author'][i]]['x'].append(df_file['content'][i])\n\n\nprint(local_time() + \" user_data has been formatted \")\nf = h5py.File(r\"C:\\Users\\train.hdf5\", \"w\")\ndset_0 = f.create_dataset(\"num_samples\", data=num_samples)\ndset_1 = f.create_dataset(\"users\", data=users)\nprint(local_time() + \" starting to store dictionary \")\n\nuser_data = f.create_group(\"user_data\")\nfor user in tqdm(user_data_dict):\n    user_group = user_data.create_group(user)\n    user_data_dict[user]['x'] = [str(e).encode('utf8') for e in user_data_dict[user]['x']]\n    x_dset = user_group.create_dataset('x', data=user_data_dict[user]['x'])\n\nf.close()\nprint(local_time() + \" end of script \")"
  },
  {
    "path": "utils/preprocessing/create-json.py",
    "content": "# Copyright (c) Microsoft Corporation.\n# Licensed under the MIT license.\n\nimport json\nimport time\nfrom tqdm import tqdm\nimport pandas as pd\n\npath = r'C:\\Users\\train.tsv'\n\ndef local_time():\n    return str(time.strftime(\"%H:%M:%S\",time.localtime()))\n\n\nprint(local_time() + \" Starting script \")\ncolumns = ['author','num1','content','str1','str2','num2','subreddit']\ndf = pd.read_csv(path, sep='\\t', names=columns, header=None)\nprint(local_time() + \" File has been read \")\n\ndf_authors = pd.DataFrame(df['author'])\ndf_content = pd.DataFrame(df['content'])\ndf_file = pd.concat([df_authors,df_content], axis=1)\nprint(local_time() + \" Data needed has been concatenated \")\n\n\nusers_group = df_file.groupby('author')\ngroup0 = df_file.groupby(['author','content'])\ngroup1 = pd.Series(users_group.size())\nusers = (group1.index).to_numpy()\nprint(local_time() + \" users have been formatted \")\nnum_samples = group1.values\nprint(local_time() + \" num_samples has been formatted \")\nuser_data_dict = {}\n\nuser_data_dict = {i: {'x': list()} for i in tqdm(users)}\n\nfor i in tqdm(range(len(df_file))):\n    if df_file['content'][i] not in user_data_dict[df_file['author'][i]]['x']:\n        user_data_dict[df_file['author'][i]]['x'].append(df_file['content'][i])\n\n\nf = open(r'C:\\Users\\train.json', \"w\")\nnew_data = {'users': users.tolist(), 'num_samples': num_samples.tolist(), 'user_data': user_data_dict}\njson.dump(new_data, f)\nf.close()\nprint(local_time() + \" end of script \")"
  },
  {
    "path": "utils/preprocessing/from_json_to_hdf5.py",
    "content": "# Copyright (c) Microsoft Corporation.\n# Licensed under the MIT license.\n\nimport json\nimport h5py\nfrom tqdm import tqdm\nimport time\n\njson_path = r'C:\\Users\\train.json'\n\ndef local_time():\n    return str(time.strftime(\"%H:%M:%S\",time.localtime()))\n\nprint(local_time() + \" Starting script \")\nwith open(json_path, 'r') as f:\n    json_file = json.load(f)\nprint(local_time() + \" JSON file read \")\n\nhdf_file = h5py.File(r\"C:\\Users\\train.hdf5\", \"w\")\ndset_0 = hdf_file.create_dataset(\"users\", data=json_file['users'])\ndset_1 = hdf_file.create_dataset(\"num_samples\", data=json_file['num_samples'])\nprint(local_time() + \" users and num_samples stored \")\n\nuser_data = hdf_file.create_group(\"user_data\")\nfor user in tqdm(json_file['user_data']):\n    user_group = user_data.create_group(user)\n    dset_2 = user_group.create_dataset('x', data=json_file['user_data'][user]['x'])\n\nhdf_file.close()\nprint(local_time() + \" end of script \")"
  },
  {
    "path": "utils/utils.py",
    "content": "# Copyright (c) Microsoft Corporation.\n# Licensed under the MIT license.\n\nimport os\nimport sys\nimport numpy as np\nimport logging\nimport yaml\nimport time\nimport math\nimport json\nimport copy\nimport io\nimport pstats\nimport functools\nimport torch\nfrom collections import OrderedDict\nfrom utils.optimizers.lars import LarsSGD\nfrom utils.optimizers.lamb import LAMB\nfrom utils.optimizers.adamW import AdamW\nfrom easydict import EasyDict as edict\nfrom torch.optim.lr_scheduler import (\n    StepLR, \n    MultiStepLR, \n    ReduceLROnPlateau )\n\ndef make_optimizer(optimizer_config, model):\n    \"\"\"Initialization for optimizer.\"\"\"\n\n    tmp_config = copy.deepcopy(optimizer_config)\n    if optimizer_config[\"type\"] == \"sgd\":\n        tmp_config.pop(\"type\", None)\n        return torch.optim.SGD(model.parameters(), **tmp_config)\n\n    elif optimizer_config[\"type\"] == \"adam\":\n        tmp_config.pop(\"type\", None)\n        return torch.optim.Adam(model.parameters(), **tmp_config)\n\n    elif optimizer_config[\"type\"] == \"adamax\":\n        tmp_config.pop(\"type\", None)\n        tmp_config.pop(\"amsgrad\", None)\n        return torch.optim.Adamax(model.parameters(), **tmp_config)\n\n    elif optimizer_config[\"type\"] == \"lars\":\n        tmp_config.pop(\"type\", None)\n        from torchlars import LARS\n        base_optimizer = torch.optim.SGD(model.parameters(), **tmp_config)\n        return LARS(optimizer=base_optimizer, eps=1e-8, trust_coef=0.001)\n    \n    elif optimizer_config[\"type\"] == \"LarsSGD\":\n        tmp_config.pop(\"type\", None)\n        return LarsSGD(model.parameters(),**tmp_config)\n\n    elif optimizer_config[\"type\"] == \"lamb\":\n        tmp_config.pop(\"type\", None)\n        return LAMB(model.parameters(), **tmp_config)\n\n    elif optimizer_config[\"type\"] == \"adamW\":\n        tmp_config.pop(\"type\", None)\n        tmp_config.pop(\"amsgrad\", None)\n        return AdamW(model.parameters(), 
**tmp_config)\n        \n    else:\n        raise ValueError(\"{} optimizer not supported\".format(optimizer_config[\"type\"]))\n\n\ndef get_lr(optimizer):\n    \"\"\"Obtain LR.\"\"\"\n    for param_group in optimizer.param_groups:\n        return param_group['lr']\n\ndef get_lr_all(optimizer):\n    \"\"\"Yield the LR of every param group; double-checks get_lr.\"\"\"\n    for param_group in optimizer.param_groups:\n        yield param_group['lr']\n\n\ndef softmax(X, theta = 1.0, axis = None):\n    \"\"\"Compute the softmax of each element along an axis of X.\n\n    Args:\n        X (ndarray): x, probably should be floats.\n        theta (float): used as a multiplier prior to exponentiation. Default = 1.0\n        axis : axis to compute values along. Default is the first non-singleton axis.\n\n    Returns:\n        An array the same size as X. The result will sum to 1 along the specified axis.\n    \"\"\"\n    # make X at least 2d\n    y = np.atleast_2d(X)\n\n    # find axis\n    if axis is None:\n        axis = next(j[0] for j in enumerate(y.shape) if j[1] > 1)\n\n    # multiply y against the theta parameter,\n    y = y * float(theta)\n\n    # subtract the max for numerical stability\n    y = y - np.expand_dims(np.max(y, axis = axis), axis)\n\n    # exponentiate y\n    y = np.exp(y)\n\n    # take the sum along the specified axis\n    ax_sum = np.expand_dims(np.sum(y, axis = axis), axis)\n\n    # finally: divide elementwise\n    p = y / ax_sum\n\n    # flatten if X was 1D\n    if len(X.shape) == 1: p = p.flatten()\n\n    return p\n\n\nclass AverageMeter(object):\n    \"\"\" Will calculate running micro and macro averages for various\n    (error/efficiency) rates.\n    \"\"\"\n    def __init__(self, metric_name):\n        self.numerators, self.denominators = list(), list()\n        self.metric_name = metric_name\n\n    def add(self, top, bottom):\n        self.numerators.append(top)\n        self.denominators.append(bottom)\n\n    def get_macro_average(self):\n        scores = 
[float(self.numerators[i]) / self.denominators[i] \\\n                            for i in range(len(self.denominators))]\n        return self.get_average(scores)\n\n    def get_micro_average(self):\n        return float(sum(self.numerators)) / sum(self.denominators)\n\n    # accepts a list and returns average\n    def get_average(self, l):\n        return sum(l) / float(len(l))\n\n    def reset(self):\n        self.numerators, self.denominators = list(), list()\n\n    def display_results(self, loglevel=logging.INFO):\n        print_rank(\"{} Macro average: {}\".format(self.metric_name,\n                                                self.get_macro_average()), loglevel)\n        print_rank(\"{} Micro average: {}\".format(self.metric_name,\n                                                self.get_micro_average()), loglevel)\n\n\ndef make_lr_scheduler(annealing_config, optimizer, num_batches=1):\n    \"\"\"Set learning rate scheduler.\"\"\"\n\n    annealing_config = copy.deepcopy(annealing_config)\n    annealing_type = annealing_config.pop(\"type\")\n\n    # per epoch or per iter\n    step_interval='epoch'\n    if \"step_interval\" in annealing_config:\n        step_interval = annealing_config.pop(\"step_interval\")\n\n    if annealing_type == \"step_lr\":\n        # convert epoch steps to iter steps\n        # epochs can also be floats like 1.5\n        if step_interval == \"epoch\":\n            annealing_config[\"step_size\"] = int(num_batches * \\\n                                    annealing_config[\"step_size\"])\n        lr_scheduler =  StepLR(optimizer=optimizer,\n                                **annealing_config)\n    elif annealing_type == \"multi_step_lr\":\n        # convert epoch steps to iter steps\n        if step_interval == \"epoch\":\n            annealing_config[\"milestones\"] = [int(i * num_batches) for i in annealing_config[\"milestones\"]]\n        lr_scheduler =  MultiStepLR(optimizer=optimizer,\n                                **annealing_config)\n    elif annealing_type == \"rampup-keep-expdecay-keep\":\n        # emulate SpecAugment scheduling\n        lr_scheduler =  RampupKeepExpdecayKeepLRScheduler(optimizer=optimizer,\n                                        **annealing_config)\n    elif annealing_type == 'val_loss':\n        lr_scheduler =  ReduceLROnPlateau(optimizer,\n                                        **annealing_config)\n    else:\n        raise ValueError(\"{} LR scheduler not supported\".format(\n                                                annealing_type))\n    return lr_scheduler\n\n\nclass RampupKeepExpdecayKeepLRScheduler(torch.optim.lr_scheduler._LRScheduler):\n    \"\"\"Implements the LR schedule described in the specaugment paper.\"\"\"\n\n    def __init__(self, optimizer, peak_lr=0.001, floor_lr=0.00001, sr=1000, si=40000, sf=160000, last_epoch=-1):\n        assert(peak_lr>=floor_lr)\n        self.peak_lr = peak_lr\n        self.floor_lr = floor_lr\n        assert(sr<=si)\n        assert(si<=sf)\n        self.sr = sr\n        self.si = si\n        self.sf = sf\n        self.gamma = math.log(self.floor_lr/self.peak_lr)/(float(self.sf-self.si))\n        self.step_count = 0\n        super(RampupKeepExpdecayKeepLRScheduler, self).__init__(optimizer, last_epoch=last_epoch)\n\n    def step(self, epoch=None):\n        for p, lr in zip(self.optimizer.param_groups, self.get_lr()):\n            p['lr'] = lr\n        self.step_count += 1\n\n    def get_lr(self):\n        lr = self.floor_lr\n        if self.step_count < self.sr:\n            # linear ramp up\n            lr = self.peak_lr * float(self.step_count) / float(self.sr)\n        elif self.step_count < self.si:\n            # keep peak_lr\n            lr = self.peak_lr\n        elif self.step_count < self.sf:\n            # exponential decay from peak_lr to floor_lr\n            lr = self.peak_lr * math.exp(self.gamma * (float(self.step_count-self.si)))\n\n        return [lr for base_lr in self.base_lrs]\n\n\n\nclass ScheduledSamplingScheduler():\n    \"\"\" Implements the scheduled sampling rate schedule.\n\n    0 - ramp_start          = initial_rate\n    ramp_start - ramp_end   = {linearly increase to final_rate}\n    ramp_end - infinity     = final_rate\n    \"\"\"\n\n    def __init__(self, model, ramp_start, ramp_stop,\n                            initial_rate, final_rate):\n        self.model = model\n        self.ramp_start = ramp_start\n        self.ramp_stop = ramp_stop\n        self.initial_rate = initial_rate\n        self.final_rate = final_rate\n        self.iter = 0\n\n    def step(self):\n        if self.iter < self.ramp_start:\n            self.model.scheduled_sampling_rate = self.initial_rate\n        elif self.iter >= self.ramp_start and self.iter <= self.ramp_stop:\n            self.model.scheduled_sampling_rate = self.initial_rate + (self.final_rate - self.initial_rate) * ( (self.iter - self.ramp_start) / (self.ramp_stop - self.ramp_start))\n        else:\n            self.model.scheduled_sampling_rate = self.final_rate\n\n        self.model.scheduled_sampling = (self.model.scheduled_sampling_rate != 0)\n        self.iter += 1\n\n    def state_dict(self):\n        return {key: value for key, value in self.__dict__.items() if key != 'model'}\n\n    def load_state_dict(self, state_dict):\n        self.__dict__.update(state_dict)\n\n\nclass NBestTaskScheduler():\n    \"\"\" Implements the scheduler for multi-task training.\n\n    num_tasks[0]: 0                     <= i < iteration_per_task[0]\n    num_tasks[1]: iteration_per_task[0] <= i < iteration_per_task[1]\n    \"\"\"\n    def __init__(self, num_tasks, iteration_per_task):\n        assert len(num_tasks) == len(iteration_per_task), \"Mismatched length {}!={}\".format(len(num_tasks), len(iteration_per_task))\n        self.iter = 0\n        self.stagex = 0\n        self.num_tasks = num_tasks\n        self.iteration_per_task = 
iteration_per_task\n\n    def current_num_tasks(self):\n        return self.num_tasks[self.stagex]\n\n    def no_label_updates(self):\n        \"\"\"Return how many times transcription must be updated.\"\"\"\n        return (self.iter // self.iteration_per_task[-1]) + 1\n\n    def set_iteration_no(self, iter_no):\n        self.iter = iter_no\n\n    def step(self):\n        print_rank(\"Iter={}: #tasks {} at stage {}\".format(self.iter, self.current_num_tasks(), self.stagex))\n        local_iter = self.iter % self.iteration_per_task[-1]\n        if local_iter == 0:\n            self.stagex = 0\n        elif local_iter >= self.iteration_per_task[self.stagex]:\n            self.stagex += 1\n\n        self.iter += 1\n\n\n# Logging and write-to-disk utilities\n\ndef init_logging(log_dir, loglevel=logging.DEBUG):\n    \"\"\"Initialize logging\"\"\"\n    \n    os.makedirs(log_dir, exist_ok=True)    \n    log_file = os.path.join(log_dir, \"log.out\")\n    logging.basicConfig(filename=log_file,\n                        level=loglevel)\n    handler = logging.StreamHandler(stream=sys.stdout)\n    logging.getLogger().addHandler(handler)\n\n\ndef print_cuda_stats():\n    if torch.cuda.is_available():\n        print_rank(\"torch.cuda.memory_allocated(): {}\".format(torch.cuda.memory_allocated()))\n        print_rank(\"torch.cuda.memory_cached(): {}\".format(torch.cuda.memory_cached()))\n        print_rank(\"torch.cuda.synchronize(): {}\".format(torch.cuda.synchronize()))\n    else:\n        print_rank(\"No CUDA GPU available\")\n\n\ndef print_rank(str, loglevel=logging.INFO):\n\n    str = \"{} : {}\".format(time.ctime(), str)\n    logging.log(loglevel, str)\n\ndef print_profiler(profiler, loglevel=logging.INFO):\n    memfile = io.StringIO()\n    pstats.Stats(profiler, stream=memfile) \\\n        .strip_dirs() \\\n        .sort_stats(pstats.SortKey.CUMULATIVE) \\\n        .print_stats(20)                    \n    for l in memfile.getvalue().split('\\n'):\n        print_rank(l, 
loglevel=loglevel)\n    memfile.close()\n\n\ndef write_yaml(save_path, config):\n    with open(save_path, 'w', encoding='utf8') as yaml_file:\n        yaml.dump(config, yaml_file, default_flow_style=False)\n\ndef torch_save(save_path, state_or_model):\n    torch.save(state_or_model, save_path)\n\ndef write_tokens(save_path, token_list):\n    with open(save_path, 'w', encoding='utf8') as token_fid:\n        for w in token_list:\n            token_fid.write(w + '\\n')\n\n\ndef try_except_save(save_fn, **kwargs):\n    \"\"\" Try to write it out 3 times.\"\"\"\n\n    max_attempts = 3\n    for attempt in range(1, max_attempts+1):\n        try:\n            save_fn(**kwargs)\n        except IOError:\n            print_rank(\"Write operation failed on {} attempt\".format(attempt))\n        else:\n            print_rank(\"Write operation succeeded in {} attempts\".format(attempt))\n            return\n\n\ndef write_nbest_jsonl(uttid2jsonl, uttid2hypos, uttid2scores, outputpath, nbest, orgpath=\"\", newpath=\"\"):\n    \"\"\" Dump a json list file with n-best hypos.\"\"\"\n\n    newjsonl = []\n    for uttid, jsonl in uttid2jsonl.items():\n        if not uttid in uttid2hypos:\n            print(\"Missing utterance {} in results\".format(uttid))\n            continue\n        hypos  = uttid2hypos[uttid]\n        if nbest > 1:\n            # re-normalize the probability from N-best: ignoring the events out of the N-best hypos\n            weights = uttid2scores[uttid]\n            if len(weights) < nbest:\n                for n in range(len(weights), nbest):\n                    print_rank(\"Missing {}-th best result in {}. Appending {}\".format(n, uttid, weights[0]))\n                    weights = np.append(weights, np.array(weights[0]))\n\n            weights = softmax(weights[0:nbest]) if uttid in uttid2scores else np.ones(nbest) / nbest\n            # Filling the missing hypos with the 1st best candidate\n            for n in range(min(nbest, len(hypos))):\n                newjson = copy.deepcopy(jsonl)\n                newjson[\"id\"]   = \"{}-{}\".format(uttid, n)\n                newjson[\"text\"] = \" \".join(hypos[n])\n                newjson[\"loss_weight\"] = weights[n]\n                newjsonl.append(newjson)\n        else:\n            newjson = copy.deepcopy(jsonl)\n            newjson[\"id\"]   = uttid\n            newjson[\"text\"] = \" \".join(hypos[0])\n            newjsonl.append(newjson)\n\n    with open(outputpath, 'w') as ofp:\n        for jsonl in newjsonl:\n            jsonl[\"wav\"] = jsonl[\"wav\"].replace(orgpath, newpath)\n            ofp.write(\"{}\\n\".format(json.dumps(jsonl)))\n\n    return True\n\n\ndef write_multitask_jsonl(uttid2jsonl, uttid2hypos, uttid2scores, outputpath, nbest, orgpath=\"\", newpath=\"\"):\n    \"\"\" Dump a json list file with n-best hypos.\"\"\"\n\n    if nbest==1:\n        return write_nbest_jsonl(uttid2jsonl, uttid2hypos, uttid2scores, outputpath, nbest, orgpath, newpath)\n\n    newjsonl = []\n    for uttid, jsonl in uttid2jsonl.items():\n        if not uttid in uttid2hypos:\n            print_rank(\"Missing utterance {} in results\".format(uttid))\n            continue\n        hypos  = uttid2hypos[uttid]\n        # re-normalize the probability from N-best: ignoring the events out of the N-best hypos\n        weights = uttid2scores[uttid]\n        if len(weights) < nbest:\n            for n in range(len(weights), nbest):\n                print_rank(\"Missing {}-th best result in {}. 
Appending {}\".format(n, uttid, weights[0]))\n                weights = np.append(weights, np.array(weights[0]))\n\n        weights = softmax(weights[0:nbest]) if uttid in uttid2scores else np.ones(nbest) / nbest\n        newjson = jsonl\n        newjson[\"task_weights\"] = weights.tolist()\n        assert len(weights) == nbest, \"{}: Weight length does not match: {} != {}\".format(uttid, len(weights), nbest)\n        newjson[\"text\"] = \" \".join(hypos[0])\n        newjson[\"subtextl\"] = []\n        all_null_results = newjson[\"text\"] == \"\"\n        for n in range(1, nbest):\n            if n < len(hypos):\n                newjson[\"subtextl\"].append(\" \".join(hypos[n]))\n            else:\n                print_rank(\"Mising {}-th best result in {}\".format(n, uttid))\n                newjson[\"subtextl\"].append(\" \".join(hypos[0]))\n            if all_null_results is True:\n                all_null_results = newjson[\"subtextl\"][n-1] == \"\"\n\n        assert len(newjson[\"subtextl\"]) == nbest-1, \"#sub-rec results does not match: {} != {}\".format(len(newjson[\"subtextl\"]), nbest-1)\n        # take meaningful results only and ignore null string\n        if all_null_results is False:\n            newjsonl.append(newjson)\n        else:\n            print_rank(\"Skip {}: Invalid result '{}'\".format(uttid, newjson[\"text\"]))\n\n    with open(outputpath, 'w') as ofp:\n        for jsonl in newjsonl:\n            jsonl[\"wav\"] = jsonl[\"wav\"].replace(orgpath, newpath)\n            ofp.write(\"{}\\n\".format(json.dumps(jsonl)))\n\n    return True\n\n\ndef load_eval_result_jsonl(resultjsonl, uttid2hypos=OrderedDict(), uttid2scores=OrderedDict(), dumpfp=None, dump_msg=\"RESULT: \"):\n    \"\"\"Load the result JSON list file dumped by Evaluator().\n\n    Args:\n\n    resultjsonl (str): input JSON list file\n    uttid2hypos: (dict): maps the utterance ID to text, [uttid] = hypothesis text\n    uttid2scores (dict): maps the utterance ID to a confidence 
score, [uttid] = confidence score(s)\n    dumpfp (file): pointer where the WERs will be written out\n    dump_msg (str): message string before the WER result\n    \"\"\"\n    total_weighted_best_wer   = 0\n    total_weighted_oracle_wer = 0\n    total_length              = 0\n    with open(resultjsonl) as resultfp:\n        for line in resultfp:\n            elems = json.loads(line.strip())\n            if \"hypothesis\" in elems:\n                uttid = elems[\"utt_id\"]\n                params = list(elems[\"hypothesis\"].keys())\n                uttid2hypos[uttid] = elems[\"hypothesis\"][params[0]]\n                if \"nbest_model_scores\" in elems:\n                    uttid2scores[uttid] = np.array(elems[\"nbest_model_scores\"][params[0]])\n            else:\n                print_rank(\"Result: {}\".format(line.strip()))\n                if dumpfp is not None:\n                    dumpfp.write(\"{}{}\\n\".format(dump_msg, line.strip()))\n                params = list(elems[\"wer-\"].keys())\n                total_weighted_best_wer   += elems[\"wer-\"][params[0]][\"best_wer\"] * elems[\"wer-\"][params[0]][\"total_length\"]\n                total_weighted_oracle_wer += elems[\"wer-\"][params[0]][\"oracle_wer\"] * elems[\"wer-\"][params[0]][\"total_length\"]\n                total_length += elems[\"wer-\"][params[0]][\"total_length\"]\n\n    return uttid2hypos, uttid2scores, total_weighted_best_wer, total_weighted_oracle_wer, total_length\n\n\ndef find_pretrained_model(model_path, config):\n    \"\"\"\"Load a a pre-trained/seed model if provided in config file.\"\"\"\n    output_file=None\n\n    if config.get(\"pretrained_model_path\", None):\n        output_file=config[\"pretrained_model_path\"]\n\n    print_rank('Loading Model from: {}'.format(output_file), loglevel=logging.INFO)\n    return output_file\n\n\ndef flatten_grads_model(learner) -> np.ndarray:\n    \"\"\"Given a model flatten all params and return as np array.\"\"\"\n\n    return 
np.concatenate([w.grad.detach().clone().cpu().numpy().flatten() for w in learner.parameters()])\n\ndef flatten_grads_array(param_array)->np.array:\n    \"\"\"Given a model flatten all params and return as np array.\"\"\"\n\n    N=len(param_array)\n    tmp_array=[]\n    for i in range(N):\n        tmp_array.append(np.concatenate([w.detach().clone().cpu().numpy().flatten() for w in param_array[i]]))\n    return np.array(tmp_array)\n\ndef dist_weights_to_model(weights, parameters):\n    \"\"\"Updates the model parameters with the supplied weights.\"\"\"\n\n    offset = 0\n    for param in parameters:\n        new_size = functools.reduce(lambda x, y: x*y, param.shape)\n        current_data = weights[offset:offset + new_size]\n        param.data[:] = torch.from_numpy(current_data.reshape(param.shape)).to(param.data)\n        offset += new_size\n\ndef dist_params_to_model(grads, model):\n    \"\"\"Updates the model gradients (Corresponding to each param) with the supplied grads.\"\"\"\n\n    offset = 0\n    for p in model:\n        new_size = functools.reduce(lambda x, y: x*y, p.data.shape)\n        current_data = torch.from_numpy(grads[offset:offset + new_size].reshape(p.data.shape)).type(p.data.dtype).to(p)\n        p.grad = current_data if p.grad==None else p.grad+current_data\n        offset += new_size\n        \ndef reshape_params_to_model(grads, model):\n    \"\"\" Given Gradients and a model architecture this method updates the model gradients (Corresponding to each param)\n    with the supplied grads \"\"\"\n    offset = 0\n    reshaped_grads=[]\n    for p in model:\n        new_size = functools.reduce(lambda x, y: x*y, p.shape)\n        current_data = torch.from_numpy(grads[offset:offset + new_size].reshape(p.shape)).type(p.dtype).to(p)\n        reshaped_grads.append(current_data)\n        offset += new_size\n    return reshaped_grads\n\ndef to_device(x):\n    return x.cuda() if torch.cuda.is_available() else x\n\ndef update_json_log(log_path, status_info):\n   
 \"\"\"Update J-son elements\"\"\"\n    \n    elems = {}\n    if os.path.exists(log_path):\n        with open(log_path, 'r') as logfp: \n            elems = json.load(logfp)\n            print_rank(\"Loaded status info: {}\".format(elems))\n\n    for k, v in status_info.items():\n        elems[k] = v\n\n    with open(log_path, 'w') as logfp:\n        json.dump(elems, logfp)\n        print_rank(\"Updated status info: {}\".format(elems))\n\n\ndef scrub_empty_clients(data_strct):\n    \"\"\" Clean empty clients in the data structure\"\"\"\n\n    users_out = []\n    user_data_out = {}\n    num_samples_out = []\n    if 'user_data_label' in data_strct.keys():\n        user_data_label_out = {}\n    for ix, user in enumerate(data_strct['users']):\n        if data_strct['num_samples'][ix] > 0:\n            users_out.append(user)\n            user_data_out[user] = data_strct['user_data'][user]\n            num_samples_out.append(data_strct['num_samples'][ix])\n            if 'user_data_label' in data_strct.keys():\n                user_data_label_out[user] = data_strct['user_data_label'][user]\n\n    if ('user_data_label' in data_strct.keys()):\n        return edict({'users': users_out, 'user_data': user_data_out, 'num_samples': num_samples_out, 'user_data_label': user_data_label_out})\n    else:\n        return edict({'users': users_out, 'user_data': user_data_out, 'num_samples': num_samples_out})\n\n\ndef compute_grad_cosines(grads, model_grad):\n    def compute_cosine(g, m):\n        tot = 0\n        g2 = 0\n        m2 = 0\n        for p1, p2 in zip(g, m):\n            tot += torch.mul(p1, p2.to('cpu')).sum().item()\n            g2 += torch.mul(p1, p1).sum().item()\n            m2 += torch.mul(p2, p2).sum().item()\n        return tot / (np.sqrt(g2) * np.sqrt(m2)) if g2 > 0 and m2 > 0 else 0\n    return [compute_cosine(g, model_grad) for g in grads]\n\n# Personalization Routines\ndef convex_inference(model_global, model_personal, alpha):\n    \"\"\"\" Model interpolation 
\"\"\"\n    targets= torch.tensor(model_global['labels'])\n    probs = alpha*model_personal['probabilities']+(1-alpha)*model_global['probabilities']\n    probs= torch.argmax(torch.tensor(probs), dim=1)\n    return torch.mean((probs == targets).float()).detach().cpu().item()\n\ndef alpha_update(model_global, model_personal, alpha, eta):\n    \"\"\"\" Training convex model interpolation weight. \"\"\"\n    grad_alpha = 0.0\n    for l_params, p_params in zip(model_global.parameters(), model_personal.parameters()):\n        dif = p_params.data - l_params.data\n        grad = alpha * p_params.grad + (1 - alpha) * l_params.grad\n        grad_alpha += dif.view(-1).T.dot(grad.view(-1))\n\n    grad_alpha += 0.02 * alpha\n    alpha_n = alpha - eta * grad_alpha\n    alpha_n = np.clip(alpha_n.detach().cpu().item(), 0.0001, 0.9999)\n\n    return alpha_n if np.isfinite(alpha_n) else 0.75\n\n# Semi-supervision Routines\ndef get_label_VAT(local_logits, server_logits, thre, comp):\n    \"\"\"\" Returns the estimated labels to SemiSupervision Task \"\"\"\n    bs = np.shape(local_logits)[0]\n    logit_dim = np.shape(local_logits)[1]\n    labels = []\n    idx = []\n    var = []\n\n    if comp == 'var':\n        local_var = torch.var(local_logits, dim=1)\n        server_var = torch.var(server_logits, dim=1)\n\n        server = 0\n        local = 0\n        ratio = 0\n\n        for bs_i in range(bs):\n            if local_var[bs_i] >= server_var[bs_i] and torch.max(local_logits[bs_i]) > thre:\n                labels.append(torch.argmax(local_logits[bs_i]))\n                idx.append(bs_i)\n                var.append((server_var[bs_i]) / (local_var[bs_i]))\n                local += 1\n            if local_var[bs_i] < server_var[bs_i] and torch.max(server_logits[bs_i]) > thre:\n                labels.append(torch.argmax(server_logits[bs_i]))\n                idx.append(bs_i)\n                var.append((local_var[bs_i]) / (server_var[bs_i]))\n                server += 1\n\n        if 
len(labels) != 0:\n            labels = torch.stack(labels)\n            var = torch.stack(var)\n            ratio = server / (server + local)\n\n    elif comp == 'ent':\n        local_var = scipyst.entropy(local_logits.cpu(), axis=1)+0.00001\n        server_var = scipyst.entropy(server_logits.cpu(), axis=1)+0.00001\n\n        server = 0\n        local = 0\n        ratio = 0\n\n        for bs_i in range(bs):\n            if 1/local_var[bs_i]>= 1/server_var[bs_i] and torch.max(local_logits[bs_i])>thre:\n                labels.append(torch.argmax(local_logits[bs_i]))\n                idx.append(bs_i)\n                var.append((1/server_var[bs_i])/(1/local_var[bs_i]))\n                local += 1\n            if 1/local_var[bs_i]< 1/server_var[bs_i] and torch.max(server_logits[bs_i])>thre:\n                labels.append(torch.argmax(server_logits[bs_i]))\n                idx.append(bs_i)\n                var.append((1/local_var[bs_i])/(1/server_var[bs_i]))\n                server += 1\n\n        if len(labels) != 0:\n            labels = torch.stack(labels)\n            #var = torch.stack(var)\n            ratio = server/(server+local)\n\n    return labels, idx, var, ratio\n\n\n\n"
  }
]