[
  {
    "path": ".gitignore",
    "content": ".DS_Store\n\n# data files\nMNIST_DATA\n\nfast_mnist/\n\n# model files\nmodels\nsecret.zip\n\n# compiled python files\n*.pyc\n\n.idea/\n"
  },
  {
    "path": "LICENSE",
    "content": "Copyright (c) 2019, Maksym Andriushchenko, Francesco Croce, Nicolas Flammarion, Matthias Hein\nAll rights reserved.\n\nRedistribution and use in source and binary forms, with or without\nmodification, are permitted provided that the following conditions are met:\n    * Redistributions of source code must retain the above copyright\n      notice, this list of conditions and the following disclaimer.\n    * Redistributions in binary form must reproduce the above copyright\n      notice, this list of conditions and the following disclaimer in the\n      documentation and/or other materials provided with the distribution.\n    * Neither the name of the copyright holder nor the\n      names of its contributors may be used to endorse or promote products\n      derived from this software without specific prior written permission.\n\nTHIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS \"AS IS\" AND\nANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED\nWARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE\nDISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER BE LIABLE FOR ANY\nDIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES\n(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;\nLOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND\nON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT\n(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS\nSOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE."
  },
  {
    "path": "README.md",
    "content": "# Square Attack: a query-efficient black-box adversarial attack via random search\n**ECCV 2020**\n\n**Maksym Andriushchenko\\*, Francesco Croce\\*, Nicolas Flammarion, Matthias Hein**\n\n**EPFL, University of Tübingen**\n\n**Paper:** [https://arxiv.org/abs/1912.00049](https://arxiv.org/abs/1912.00049)\n\n\\* denotes equal contribution\n\n\n## News\n+ [Jul 2020] The paper is accepted at **ECCV 2020**! Please stop by our virtual poster for the latest insights in black-box adversarial attacks (also check out our recent preprint [Sparse-RS paper](https://arxiv.org/abs/2006.12834) where we use random search for sparse attacks).\n+ [Mar 2020] Our attack is now part of [AutoAttack](https://github.com/fra31/auto-attack), an ensemble of attacks used\nfor automatic (i.e., no hyperparameter tuning needed) robustness evaluation. Table 2 in the [AutoAttack paper](https://arxiv.org/abs/2003.01690) \nshows that at least on 6 models our **black-box** attack outperforms gradient-based methods. Always useful to have a black-box attack to prevent inaccurate robustness claims!\n+ [Mar 2020] We also achieve the best results on [TRADES MNIST benchmark](https://github.com/yaodongyu/TRADES)!\n+ [Jan 2020] The Square Attack achieves the best results on [MadryLab's MNIST challenge](https://github.com/MadryLab/mnist_challenge),\noutperforming all white-box attacks! In this case we used 50 random restarts of our attack, each with a query limit of 20000.\n+ [Nov 2019] The Square Attack breaks the recently proposed defense from \"Bandlimiting Neural Networks Against Adversarial Attacks\" \n([https://github.com/robust-ml/robust-ml.github.io/issues/15](https://github.com/robust-ml/robust-ml.github.io/issues/15)).\n\n\n## Abstract\nWe propose the *Square Attack*, a score-based black-box L2- and Linf-adversarial attack that does not \nrely on local gradient information and thus is not affected by gradient masking. Square Attack is based on a randomized \nsearch scheme which selects localized square-shaped updates at random positions so that at each iteration the perturbation \nis situated approximately at the boundary of the feasible set. Our method is significantly more query efficient and achieves a higher success rate compared to the state-of-the-art \nmethods, especially in the untargeted setting. 
In particular, on ImageNet we improve the average query efficiency in the untargeted setting for various deep networks \nby a factor of at least 1.8 and up to 3 compared to the recent state-of-the-art Linf-attack of Al-Dujaili & O’Reilly.\nMoreover, although our attack is *black-box*, it can also outperform gradient-based *white-box* attacks \non the standard benchmarks, achieving a new state-of-the-art in terms of the success rate.\n\n-----\n\nThe code of the Square Attack can be found in `square_attack_linf(...)` and `square_attack_l2(...)` in `attack.py`.\\\nBelow we show adversarial examples generated by our method for Linf and L2 perturbations:\n<p align=\"center\"><img src=\"images/adv_examples.png\" width=\"700\"></p>\n\n<!-- \nBelow we show the evolution of the images produced by the Linf (left) and L2 (right) versions of the Square Attack until misclassification is reached.\n<p align=\"center\"><img src=\"images/ezgif.com-gif-maker-50-conf-small.gif\" width=\"425\" /> <img src=\"images/ezgif.com-gif-maker-img-53-l2-2.gif\" width=\"425\" /> </p>  \n-->\n\n\n## About the paper\nThe general algorithm of the attack is extremely simple and relies on the random search algorithm: we try some update and\naccept it only if it helps to improve the loss:\n<p align=\"center\"><img src=\"images/algorithm_rs.png\" width=\"600\"></p>\n\nThe only thing we customize is the sampling distribution P (see the paper for details). The main ideas behind the choice\nof the sampling distribution are:\n- We start at the boundary of the feasible set with a good initialization that helps to improve the query efficiency (particularly for the Linf-attack).\n- At every iteration we stay at the boundary of the feasible set by changing square-shaped regions of the image.\n\nIn the paper we also provide a convergence analysis of a variant of our attack in the non-convex setting, and justify \nthe main algorithmic choices such as modifying squares and using the same sign of the update.\n\nThis simple algorithm is sufficient to significantly outperform much more complex approaches in terms of the success rate\nand query efficiency:\n<p align=\"center\"><img src=\"images/main_results_imagenet.png\" width=\"650\"></p>\n<p align=\"center\"><img src=\"images/main_results_imagenet_l2_commonly_successful.png\" width=\"450\"></p>\n\nHere are the complete success rate curves with respect to different numbers of queries. We note that the Square Attack \nalso outperforms the competing approaches in the low-query regime.\n<p align=\"center\"><img src=\"images/success_rate_curves_full.png\" width=\"650\"></p>\n\nThe Square Attack also performs very well on adversarially trained models on MNIST, achieving results competitive with or \nbetter than *white-box* attacks despite the fact that our attack is *black-box*:\n<p align=\"center\"><img src=\"images/table_madry_trades_mnist_linf.png\" width=\"300\"></p>\n\nInterestingly, the L2 perturbations for the Linf adversarially trained model are challenging for many attacks, including\nwhite-box PGD, and also other black-box attacks. However, the Square Attack is able to much more accurately assess the \nrobustness in this setting:\n<p align=\"center\"><img src=\"images/table_madry_mnist_l2.png\" width=\"550\"></p>\n\n<!--\nThe Square Attack is a useful tool to evaluate robustness of new defenses. It works even in cases when the \n
white-box PGD-attack fails to accurately estimate the robustness of the model or requires many random restarts:\n<p align=\"center\"><img src=\"images/table_clp_lsq.png\" width=\"550\"></p>\n<p align=\"center\"><img src=\"images/table_post_avg.png\" width=\"550\"></p>\n\nAn ablation study helps to understand what is particularly important for the query efficiency of our attack: \nsquare-shaped updates and using the same random sign inside each square. See the paper for theoretical justifications of\nthis choice of the updates.\n<p align=\"center\"><img src=\"images/ablation_study.png\" width=\"400\"></p>\n\nFinally, we note that the attack is stable under different choices of the hyperparameter `p`. This is an important property\nfor the black-box setting since doing even an approximate grid search over `p` for every new model would require many additional queries.\n<p align=\"center\"><img src=\"images/sensitivity_wrt_p.png\" width=\"1000\"></p>\n-->\n\n\n\n## Running the code\n`attack.py` is the main module that implements the Square Attack; see the command line arguments there.\nThe main functions which implement the attack are `square_attack_linf()` and `square_attack_l2()`.\n\nIn order to run the untargeted Linf Square Attack on ImageNet models from the PyTorch repository, you need to specify the correct path \nto the validation set (see `IMAGENET_PATH` in `data.py`) and then run:\n- ``` python attack.py --attack=square_linf --model=pt_vgg       --n_ex=1000  --eps=12.75 --p=0.05 --n_iter=10000 ```\n- ``` python attack.py --attack=square_linf --model=pt_resnet    --n_ex=1000  --eps=12.75 --p=0.05 --n_iter=10000 ```\n- ``` python attack.py --attack=square_linf --model=pt_inception --n_ex=1000  --eps=12.75 --p=0.05 --n_iter=10000 ```\n\nNote that eps=12.75 is then divided by 255, so in the end it is equal to 0.05.\n\nFor performing targeted attacks, one should additionally use the flag `--targeted`, use a lower `p`, and specify more \niterations, e.g. `--n_iter=100000`, since it usually takes more iterations to achieve a misclassification to some particular, \nrandomly chosen class.\n\nThe rest of the models have to be downloaded first (see the instructions below), and then can be evaluated in the following way:\n\nPost-averaging models:\n- ``` python attack.py --attack=square_linf --model=pt_post_avg_cifar10  --n_ex=1000 --eps=8.0 --p=0.3 --n_iter=20000 ```\n- ``` python attack.py --attack=square_linf --model=pt_post_avg_imagenet --n_ex=1000 --eps=8.0 --p=0.3 --n_iter=20000 ```\n\nClean logit pairing and logit squeezing models:\n- ``` python attack.py --attack=square_linf --model=clp_mnist   --n_ex=1000  --eps=0.3   --p=0.3 --n_iter=20000 ```\n- ``` python attack.py --attack=square_linf --model=lsq_mnist   --n_ex=1000  --eps=0.3   --p=0.3 --n_iter=20000 ```\n- ``` python attack.py --attack=square_linf --model=clp_cifar10 --n_ex=1000  --eps=16.0  --p=0.3 --n_iter=20000 ```\n- ``` python attack.py --attack=square_linf --model=lsq_cifar10 --n_ex=1000  --eps=16.0  --p=0.3 --n_iter=20000 ```\n\nAdversarially trained model (with only 1 restart; note that the results in the paper are based on 50 restarts):\n- ``` python attack.py --attack=square_linf --model=madry_mnist_robust --n_ex=10000 --eps=0.3 --p=0.8 --n_iter=20000 ```\n\nThe L2 Square Attack can be run similarly, but please check the recommended hyperparameters in the paper (Section B of the supplement)\nand make sure that you specify the right value of `eps`, taking into account whether the pixels are in [0, 1] or in [0, 255] \n
for a particular dataset and model.\nFor example, for the standard ImageNet models, the correct L2 eps to specify is 1275 since after division by 255 it will become 5.0.\n\n\n\n## Saved statistics\nIn the folder `metrics`, we provide saved statistics of the attack on 3 models: Inception-v3, ResNet-50, VGG-16-BN.\\\nHere are simple examples of how to load the metrics files.\n\n### Linf attack\nTo print the statistics from the last iteration:\n```\nmetrics = np.load('metrics/2019-11-10 15:57:14 model=pt_resnet dataset=imagenet n_ex=1000 eps=12.75 p=0.05 n_iter=10000.metrics.npy')\niteration = np.argmax(metrics[:, -1])  # max time is the last available iteration\nacc, acc_corr, mean_nq, mean_nq_ae, median_nq_ae, avg_loss, time_total = metrics[iteration]\nprint('[iter {}] acc={:.2%} acc_corr={:.2%} avg#q={:.2f} avg#q_ae={:.2f} med#q_ae={:.2f} ({:.2f}min)'.\n      format(iteration + 1, acc, acc_corr, mean_nq, mean_nq_ae, median_nq_ae, time_total / 60))\n```\n\nThen one can also create different plots based on the data contained in `metrics`. For example, one can use `1 - acc_corr`\nto plot the success rate of the Square Attack at different numbers of queries.\n\n### L2 attack\nIn this case we provide the number of queries necessary to achieve misclassification (`n_queries[i] = 0` means that the image `i` was initially misclassified, `n_queries[i] = 10001` indicates that the attack could not find an adversarial example for the image `i`).\nTo load the metrics and compute the success rate of the Square Attack after `k` queries, you can run:\n```\nn_queries = np.load('metrics/square_l2_resnet50_queries.npy')['n_queries']\nsuccess_rate = float(((n_queries > 0) * (n_queries <= k)).sum()) / (n_queries > 0).sum()\n```\n\n\n## Models\nNote that in order to evaluate other models, one has to first download them and move them to the folders specified in \n`model_path_dict` from `models.py`:\n- [Clean Logit Pairing on MNIST](https://oc.cs.uni-saarland.de/owncloud/index.php/s/w2yegcfx8mc8kNa)\n- [Logit Squeezing on MNIST](https://oc.cs.uni-saarland.de/owncloud/index.php/s/a5ZY72BDCPEtb2S)\n- [Clean Logit Pairing on CIFAR-10](https://oc.cs.uni-saarland.de/owncloud/index.php/s/odcd7FgFdbqq6zL)\n- [Logit Squeezing on CIFAR-10](https://oc.cs.uni-saarland.de/owncloud/index.php/s/EYnbHDeMbe4mq5M)\n- MNIST, Madry adversarial training: run `python madry_mnist/fetch_model.py secret`\n- MNIST, TRADES: download the [models](https://drive.google.com/file/d/1scTd9-YO3-5Ul3q5SJuRrTNX__LYLD_M) and see their [repository](https://github.com/yaodongyu/TRADES)\n- [Post-averaging defense](https://github.com/YupingLin171/PostAvgDefense/blob/master/trainedModel/resnet110.th): the model can be downloaded directly from the repository\n\nFor the first 4 models, one has to additionally update the paths in the `checkpoint` file in the following way: \n```\nmodel_checkpoint_path: \"model.ckpt\"\nall_model_checkpoint_paths: \"model.ckpt\"\n```\n\n\n\n## Requirements\n- PyTorch 1.0.0\n- Tensorflow 1.12.0\n\n\n\n## Contact\nDo you have a problem or question regarding the code?\nPlease don't hesitate to open an issue or contact [Maksym Andriushchenko](https://github.com/max-andr) or \n[Francesco Croce](https://github.com/fra31) directly.\n\n\n## Citation\n```\n@inproceedings{ACFH2020square,\n  title={Square Attack: a query-efficient black-box adversarial attack via random search},\n  author={Andriushchenko, Maksym and Croce, Francesco and Flammarion, Nicolas and Hein, Matthias},
\n  booktitle={ECCV},\n  year={2020}\n}\n```\n"
  },
  {
    "path": "attack.py",
    "content": "import argparse\r\nimport time\r\nimport numpy as np\r\nimport data\r\nimport models\r\nimport os\r\nimport utils\r\nfrom datetime import datetime\r\nnp.set_printoptions(precision=5, suppress=True)\r\n\r\n\r\ndef p_selection(p_init, it, n_iters):\r\n    \"\"\" Piece-wise constant schedule for p (the fraction of pixels changed on every iteration). \"\"\"\r\n    it = int(it / n_iters * 10000)\r\n\r\n    if 10 < it <= 50:\r\n        p = p_init / 2\r\n    elif 50 < it <= 200:\r\n        p = p_init / 4\r\n    elif 200 < it <= 500:\r\n        p = p_init / 8\r\n    elif 500 < it <= 1000:\r\n        p = p_init / 16\r\n    elif 1000 < it <= 2000:\r\n        p = p_init / 32\r\n    elif 2000 < it <= 4000:\r\n        p = p_init / 64\r\n    elif 4000 < it <= 6000:\r\n        p = p_init / 128\r\n    elif 6000 < it <= 8000:\r\n        p = p_init / 256\r\n    elif 8000 < it <= 10000:\r\n        p = p_init / 512\r\n    else:\r\n        p = p_init\r\n\r\n    return p\r\n\r\n\r\ndef pseudo_gaussian_pert_rectangles(x, y):\r\n    delta = np.zeros([x, y])\r\n    x_c, y_c = x // 2 + 1, y // 2 + 1\r\n\r\n    counter2 = [x_c - 1, y_c - 1]\r\n    for counter in range(0, max(x_c, y_c)):\r\n        delta[max(counter2[0], 0):min(counter2[0] + (2 * counter + 1), x),\r\n              max(0, counter2[1]):min(counter2[1] + (2 * counter + 1), y)] += 1.0 / (counter + 1) ** 2\r\n\r\n        counter2[0] -= 1\r\n        counter2[1] -= 1\r\n\r\n    delta /= np.sqrt(np.sum(delta ** 2, keepdims=True))\r\n\r\n    return delta\r\n\r\n\r\ndef meta_pseudo_gaussian_pert(s):\r\n    delta = np.zeros([s, s])\r\n    n_subsquares = 2\r\n    if n_subsquares == 2:\r\n        delta[:s // 2] = pseudo_gaussian_pert_rectangles(s // 2, s)\r\n        delta[s // 2:] = pseudo_gaussian_pert_rectangles(s - s // 2, s) * (-1)\r\n        delta /= np.sqrt(np.sum(delta ** 2, keepdims=True))\r\n        if np.random.rand(1) > 0.5: delta = np.transpose(delta)\r\n\r\n    elif n_subsquares == 4:\r\n        delta[:s // 2, :s // 2] = pseudo_gaussian_pert_rectangles(s // 2, s // 2) * np.random.choice([-1, 1])\r\n        delta[s // 2:, :s // 2] = pseudo_gaussian_pert_rectangles(s - s // 2, s // 2) * np.random.choice([-1, 1])\r\n        delta[:s // 2, s // 2:] = pseudo_gaussian_pert_rectangles(s // 2, s - s // 2) * np.random.choice([-1, 1])\r\n        delta[s // 2:, s // 2:] = pseudo_gaussian_pert_rectangles(s - s // 2, s - s // 2) * np.random.choice([-1, 1])\r\n        delta /= np.sqrt(np.sum(delta ** 2, keepdims=True))\r\n\r\n    return delta\r\n\r\n\r\ndef square_attack_l2(model, x, y, corr_classified, eps, n_iters, p_init, metrics_path, targeted, loss_type):\r\n    \"\"\" The L2 square attack \"\"\"\r\n    np.random.seed(0)\r\n\r\n    min_val, max_val = 0, 1\r\n    c, h, w = x.shape[1:]\r\n    n_features = c * h * w\r\n    n_ex_total = x.shape[0]\r\n    x, y = x[corr_classified], y[corr_classified]\r\n\r\n    ### initialization\r\n    delta_init = np.zeros(x.shape)\r\n    s = h // 5\r\n    log.print('Initial square side={} for bumps'.format(s))\r\n    sp_init = (h - s * 5) // 2\r\n    center_h = sp_init + 0\r\n    for counter in range(h // s):\r\n        center_w = sp_init + 0\r\n        for counter2 in range(w // s):\r\n            delta_init[:, :, center_h:center_h + s, center_w:center_w + s] += meta_pseudo_gaussian_pert(s).reshape(\r\n                [1, 1, s, s]) * np.random.choice([-1, 1], size=[x.shape[0], c, 1, 1])\r\n            center_w += s\r\n        center_h += s\r\n\r\n    x_best = np.clip(x + delta_init / np.sqrt(np.sum(delta_init 
** 2, axis=(1, 2, 3), keepdims=True)) * eps, 0, 1)\r\n\r\n    logits = model.predict(x_best)\r\n    loss_min = model.loss(y, logits, targeted, loss_type=loss_type)\r\n    margin_min = model.loss(y, logits, targeted, loss_type='margin_loss')\r\n    n_queries = np.ones(x.shape[0])  # ones because we have already used 1 query\r\n\r\n    time_start = time.time()\r\n    s_init = int(np.sqrt(p_init * n_features / c))\r\n    metrics = np.zeros([n_iters, 7])\r\n    for i_iter in range(n_iters):\r\n        idx_to_fool = (margin_min > 0.0)\r\n\r\n        x_curr, x_best_curr = x[idx_to_fool], x_best[idx_to_fool]\r\n        y_curr, margin_min_curr = y[idx_to_fool], margin_min[idx_to_fool]\r\n        loss_min_curr = loss_min[idx_to_fool]\r\n        delta_curr = x_best_curr - x_curr\r\n\r\n        p = p_selection(p_init, i_iter, n_iters)\r\n        s = max(int(round(np.sqrt(p * n_features / c))), 3)\r\n\r\n        if s % 2 == 0:\r\n            s += 1\r\n\r\n        s2 = s + 0\r\n        ### window_1\r\n        center_h = np.random.randint(0, h - s)\r\n        center_w = np.random.randint(0, w - s)\r\n        new_deltas_mask = np.zeros(x_curr.shape)\r\n        new_deltas_mask[:, :, center_h:center_h + s, center_w:center_w + s] = 1.0\r\n\r\n        ### window_2\r\n        center_h_2 = np.random.randint(0, h - s2)\r\n        center_w_2 = np.random.randint(0, w - s2)\r\n        new_deltas_mask_2 = np.zeros(x_curr.shape)\r\n        new_deltas_mask_2[:, :, center_h_2:center_h_2 + s2, center_w_2:center_w_2 + s2] = 1.0\r\n        norms_window_2 = np.sqrt(\r\n            np.sum(delta_curr[:, :, center_h_2:center_h_2 + s2, center_w_2:center_w_2 + s2] ** 2, axis=(-2, -1),\r\n                   keepdims=True))\r\n\r\n        ### compute total norm available\r\n        curr_norms_window = np.sqrt(\r\n            np.sum(((x_best_curr - x_curr) * new_deltas_mask) ** 2, axis=(2, 3), keepdims=True))\r\n        curr_norms_image = np.sqrt(np.sum((x_best_curr - x_curr) ** 2, axis=(1, 2, 3), keepdims=True))\r\n        mask_2 = np.maximum(new_deltas_mask, new_deltas_mask_2)\r\n        norms_windows = np.sqrt(np.sum((delta_curr * mask_2) ** 2, axis=(2, 3), keepdims=True))\r\n\r\n        ### create the updates\r\n        new_deltas = np.ones([x_curr.shape[0], c, s, s])\r\n        new_deltas = new_deltas * meta_pseudo_gaussian_pert(s).reshape([1, 1, s, s])\r\n        new_deltas *= np.random.choice([-1, 1], size=[x_curr.shape[0], c, 1, 1])\r\n        old_deltas = delta_curr[:, :, center_h:center_h + s, center_w:center_w + s] / (1e-10 + curr_norms_window)\r\n        new_deltas += old_deltas\r\n        new_deltas = new_deltas / np.sqrt(np.sum(new_deltas ** 2, axis=(2, 3), keepdims=True)) * (\r\n            np.maximum(eps ** 2 - curr_norms_image ** 2, 0) / c + norms_windows ** 2) ** 0.5\r\n        delta_curr[:, :, center_h_2:center_h_2 + s2, center_w_2:center_w_2 + s2] = 0.0  # set window_2 to 0\r\n        delta_curr[:, :, center_h:center_h + s, center_w:center_w + s] = new_deltas + 0  # update window_1\r\n\r\n        hps_str = 's={}->{}'.format(s_init, s)\r\n        x_new = x_curr + delta_curr / np.sqrt(np.sum(delta_curr ** 2, axis=(1, 2, 3), keepdims=True)) * eps\r\n        x_new = np.clip(x_new, min_val, max_val)\r\n        curr_norms_image = np.sqrt(np.sum((x_new - x_curr) ** 2, axis=(1, 2, 3), keepdims=True))\r\n\r\n        logits = model.predict(x_new)\r\n        loss = model.loss(y_curr, logits, targeted, loss_type=loss_type)\r\n        margin = model.loss(y_curr, logits, targeted, loss_type='margin_loss')\r\n\r\n        
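# random search acceptance step: keep the new points only where the loss decreased\r\n        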
idx_improved = loss < loss_min_curr\r\n        loss_min[idx_to_fool] = idx_improved * loss + ~idx_improved * loss_min_curr\r\n        margin_min[idx_to_fool] = idx_improved * margin + ~idx_improved * margin_min_curr\r\n\r\n        idx_improved = np.reshape(idx_improved, [-1, *[1] * len(x.shape[:-1])])\r\n        x_best[idx_to_fool] = idx_improved * x_new + ~idx_improved * x_best_curr\r\n        n_queries[idx_to_fool] += 1\r\n\r\n        acc = (margin_min > 0.0).sum() / n_ex_total\r\n        acc_corr = (margin_min > 0.0).mean()\r\n        mean_nq, mean_nq_ae, median_nq, median_nq_ae = np.mean(n_queries), np.mean(\r\n            n_queries[margin_min <= 0]), np.median(n_queries), np.median(n_queries[margin_min <= 0])\r\n\r\n        time_total = time.time() - time_start\r\n        log.print(\r\n            '{}: acc={:.2%} acc_corr={:.2%} avg#q_ae={:.1f} med#q_ae={:.1f} {}, n_ex={}, {:.0f}s, loss={:.3f}, max_pert={:.1f}, impr={:.0f}'.\r\n                format(i_iter + 1, acc, acc_corr, mean_nq_ae, median_nq_ae, hps_str, x.shape[0], time_total,\r\n                       np.mean(margin_min), np.amax(curr_norms_image), np.sum(idx_improved)))\r\n        metrics[i_iter] = [acc, acc_corr, mean_nq, mean_nq_ae, median_nq, margin_min.mean(), time_total]\r\n        if (i_iter <= 500 and i_iter % 500) or (i_iter > 100 and i_iter % 500) or i_iter + 1 == n_iters or acc == 0:\r\n            np.save(metrics_path, metrics)\r\n        if acc == 0:\r\n            curr_norms_image = np.sqrt(np.sum((x_best - x) ** 2, axis=(1, 2, 3), keepdims=True))\r\n            print('Maximal norm of the perturbations: {:.5f}'.format(np.amax(curr_norms_image)))\r\n            break\r\n\r\n    curr_norms_image = np.sqrt(np.sum((x_best - x) ** 2, axis=(1, 2, 3), keepdims=True))\r\n    print('Maximal norm of the perturbations: {:.5f}'.format(np.amax(curr_norms_image)))\r\n\r\n    return n_queries, x_best\r\n\r\n\r\ndef square_attack_linf(model, x, y, corr_classified, eps, n_iters, p_init, metrics_path, targeted, loss_type):\r\n    \"\"\" The Linf square attack \"\"\"\r\n    np.random.seed(0)  # important to leave it here as well\r\n    min_val, max_val = 0, 1 if x.max() <= 1 else 255\r\n    c, h, w = x.shape[1:]\r\n    n_features = c*h*w\r\n    n_ex_total = x.shape[0]\r\n    x, y = x[corr_classified], y[corr_classified]\r\n\r\n    # [c, 1, w], i.e. 
vertical stripes work best for untargeted attacks\r\n    init_delta = np.random.choice([-eps, eps], size=[x.shape[0], c, 1, w])\r\n    x_best = np.clip(x + init_delta, min_val, max_val)\r\n\r\n    logits = model.predict(x_best)\r\n    loss_min = model.loss(y, logits, targeted, loss_type=loss_type)\r\n    margin_min = model.loss(y, logits, targeted, loss_type='margin_loss')\r\n    n_queries = np.ones(x.shape[0])  # ones because we have already used 1 query\r\n\r\n    time_start = time.time()\r\n    metrics = np.zeros([n_iters, 7])\r\n    for i_iter in range(n_iters - 1):\r\n        idx_to_fool = margin_min > 0\r\n        x_curr, x_best_curr, y_curr = x[idx_to_fool], x_best[idx_to_fool], y[idx_to_fool]\r\n        loss_min_curr, margin_min_curr = loss_min[idx_to_fool], margin_min[idx_to_fool]\r\n        deltas = x_best_curr - x_curr\r\n\r\n        p = p_selection(p_init, i_iter, n_iters)\r\n        for i_img in range(x_best_curr.shape[0]):\r\n            s = int(round(np.sqrt(p * n_features / c)))\r\n            s = min(max(s, 1), h-1)  # at least c x 1 x 1 window is taken and at most c x h-1 x h-1\r\n            center_h = np.random.randint(0, h - s)\r\n            center_w = np.random.randint(0, w - s)\r\n\r\n            x_curr_window = x_curr[i_img, :, center_h:center_h+s, center_w:center_w+s]\r\n            x_best_curr_window = x_best_curr[i_img, :, center_h:center_h+s, center_w:center_w+s]\r\n            # prevent trying out a delta if it doesn't change x_curr (e.g. an overlapping patch)\r\n            while np.sum(np.abs(np.clip(x_curr_window + deltas[i_img, :, center_h:center_h+s, center_w:center_w+s], min_val, max_val) - x_best_curr_window) < 10**-7) == c*s*s:\r\n                deltas[i_img, :, center_h:center_h+s, center_w:center_w+s] = np.random.choice([-eps, eps], size=[c, 1, 1])\r\n\r\n        x_new = np.clip(x_curr + deltas, min_val, max_val)\r\n\r\n        logits = model.predict(x_new)\r\n        loss = model.loss(y_curr, logits, targeted, loss_type=loss_type)\r\n        margin = model.loss(y_curr, logits, targeted, loss_type='margin_loss')\r\n\r\n        idx_improved = loss < loss_min_curr\r\n        loss_min[idx_to_fool] = idx_improved * loss + ~idx_improved * loss_min_curr\r\n        margin_min[idx_to_fool] = idx_improved * margin + ~idx_improved * margin_min_curr\r\n        idx_improved = np.reshape(idx_improved, [-1, *[1]*len(x.shape[:-1])])\r\n        x_best[idx_to_fool] = idx_improved * x_new + ~idx_improved * x_best_curr\r\n        n_queries[idx_to_fool] += 1\r\n\r\n        acc = (margin_min > 0.0).sum() / n_ex_total\r\n        acc_corr = (margin_min > 0.0).mean()\r\n        mean_nq, mean_nq_ae, median_nq_ae = np.mean(n_queries), np.mean(n_queries[margin_min <= 0]), np.median(n_queries[margin_min <= 0])\r\n        avg_margin_min = np.mean(margin_min)\r\n        time_total = time.time() - time_start\r\n        log.print('{}: acc={:.2%} acc_corr={:.2%} avg#q_ae={:.2f} med#q={:.1f}, avg_margin={:.2f} (n_ex={}, eps={:.3f}, {:.2f}s)'.\r\n            format(i_iter+1, acc, acc_corr, mean_nq_ae, median_nq_ae, avg_margin_min, x.shape[0], eps, time_total))\r\n\r\n        metrics[i_iter] = [acc, acc_corr, mean_nq, mean_nq_ae, median_nq_ae, margin_min.mean(), time_total]\r\n        if (i_iter <= 500 and i_iter % 20 == 0) or (i_iter > 100 and i_iter % 50 == 0) or i_iter + 1 == n_iters or acc == 0:\r\n            np.save(metrics_path, metrics)\r\n        if acc == 0:\r\n            break\r\n\r\n    return n_queries, x_best\r\n\r\n\r\nif __name__ == '__main__':\r\n    parser = 
argparse.ArgumentParser(description='Define hyperparameters.')\r\n    parser.add_argument('--model', type=str, default='pt_resnet', choices=models.all_model_names, help='Model name.')\r\n    parser.add_argument('--attack', type=str, default='square_linf', choices=['square_linf', 'square_l2'], help='Attack.')\r\n    parser.add_argument('--exp_folder', type=str, default='exps', help='Experiment folder to store all output.')\r\n    parser.add_argument('--gpu', type=str, default='7', help='GPU number. Multiple GPUs are possible for PT models.')\r\n    parser.add_argument('--n_ex', type=int, default=10000, help='Number of test ex to test on.')\r\n    parser.add_argument('--p', type=float, default=0.05,\r\n                        help='Probability of changing a coordinate. Note: check the paper for the best values. '\r\n                             'Linf standard: 0.05, L2 standard: 0.1. But robust models require higher p.')\r\n    parser.add_argument('--eps', type=float, default=0.05, help='Radius of the Lp ball.')\r\n    parser.add_argument('--n_iter', type=int, default=10000)\r\n    parser.add_argument('--targeted', action='store_true', help='Targeted or untargeted attack.')\r\n    args = parser.parse_args()\r\n    args.loss = 'margin_loss' if not args.targeted else 'cross_entropy'\r\n\r\n    os.environ[\"CUDA_VISIBLE_DEVICES\"] = args.gpu\r\n    dataset = 'mnist' if 'mnist' in args.model else 'cifar10' if 'cifar10' in args.model else 'imagenet'\r\n    timestamp = str(datetime.now())[:-7]\r\n    hps_str = '{} model={} dataset={} attack={} n_ex={} eps={} p={} n_iter={}'.format(\r\n        timestamp, args.model, dataset, args.attack, args.n_ex, args.eps, args.p, args.n_iter)\r\n    args.eps = args.eps / 255.0 if dataset == 'imagenet' else args.eps  # for mnist and cifar10 we leave as it is\r\n    batch_size = data.bs_dict[dataset]\r\n    model_type = 'pt' if 'pt_' in args.model else 'tf'\r\n    n_cls = 1000 if dataset == 'imagenet' else 10\r\n    gpu_memory = 0.5 if dataset == 'mnist' and args.n_ex > 1000 else 0.15 if dataset == 'mnist' else 0.99\r\n\r\n    log_path = '{}/{}.log'.format(args.exp_folder, hps_str)\r\n    metrics_path = '{}/{}.metrics'.format(args.exp_folder, hps_str)\r\n\r\n    log = utils.Logger(log_path)\r\n    log.print('All hps: {}'.format(hps_str))\r\n\r\n    if args.model != 'pt_inception':\r\n        x_test, y_test = data.datasets_dict[dataset](args.n_ex)\r\n    else:  # exception for inception net on imagenet -- 299x299 images instead of 224x224\r\n        x_test, y_test = data.datasets_dict[dataset](args.n_ex, size=299)\r\n    x_test, y_test = x_test[:args.n_ex], y_test[:args.n_ex]\r\n\r\n    if args.model == 'pt_post_avg_cifar10':\r\n        x_test /= 255.0\r\n        args.eps = args.eps / 255.0\r\n\r\n    models_class_dict = {'tf': models.ModelTF, 'pt': models.ModelPT}\r\n    model = models_class_dict[model_type](args.model, batch_size, gpu_memory)\r\n\r\n    logits_clean = model.predict(x_test)\r\n    corr_classified = logits_clean.argmax(1) == y_test\r\n    # important to check that the model was restored correctly and the clean accuracy is high\r\n    log.print('Clean accuracy: {:.2%}'.format(np.mean(corr_classified)))\r\n\r\n    square_attack = square_attack_linf if args.attack == 'square_linf' else square_attack_l2\r\n    y_target = utils.random_classes_except_current(y_test, n_cls) if args.targeted else y_test\r\n    y_target_onehot = utils.dense_to_onehot(y_target, n_cls=n_cls)\r\n    # Note: we count the queries only across correctly classified images\r\n    
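# run the selected attack; it returns the per-image query counts and the adversarial examples\r\n    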
n_queries, x_adv = square_attack(model, x_test, y_target_onehot, corr_classified, args.eps, args.n_iter,\r\n                                     args.p, metrics_path, args.targeted, args.loss)\r\n\r\n"
  },
  {
    "path": "data.py",
    "content": "import torch\nimport numpy as np\nfrom torchvision import transforms\nfrom torchvision.datasets import ImageFolder\nfrom torch.utils.data import DataLoader\n\n\ndef load_mnist(n_ex):\n    from tensorflow.keras.datasets import mnist as mnist_keras\n\n    x_test, y_test = mnist_keras.load_data()[1]\n    x_test = x_test.astype(np.float64) / 255.0\n    x_test = x_test[:, None, :, :]\n\n    return x_test[:n_ex], y_test[:n_ex]\n\n\ndef load_cifar10(n_ex):\n    from madry_cifar10.cifar10_input import CIFAR10Data\n    cifar = CIFAR10Data('madry_cifar10/cifar10_data')\n    x_test, y_test = cifar.eval_data.xs.astype(np.float32), cifar.eval_data.ys\n    x_test = np.transpose(x_test, axes=[0, 3, 1, 2])\n    return x_test[:n_ex], y_test[:n_ex]\n\n\ndef load_imagenet(n_ex, size=224):\n    IMAGENET_SL = size\n    IMAGENET_PATH = \"/scratch/maksym/imagenet/val_orig\"\n    imagenet = ImageFolder(IMAGENET_PATH,\n                           transforms.Compose([\n                               transforms.Resize(IMAGENET_SL),\n                               transforms.CenterCrop(IMAGENET_SL),\n                               transforms.ToTensor()\n                           ]))\n    torch.manual_seed(0)\n\n    imagenet_loader = DataLoader(imagenet, batch_size=n_ex, shuffle=True, num_workers=1)\n    x_test, y_test = next(iter(imagenet_loader))\n\n    return np.array(x_test, dtype=np.float32), np.array(y_test)\n\n\ndatasets_dict = {'mnist': load_mnist,\n                 'cifar10': load_cifar10,\n                 'imagenet': load_imagenet,\n}\nbs_dict = {'mnist': 10000,\n           'cifar10': 4096,  # 4096 is the maximum that fits\n           'imagenet': 100,\n}\n"
  },
  {
    "path": "logit_pairing/models.py",
    "content": "import tensorflow as tf\nfrom collections import OrderedDict\n\n\n# -------------------------------------------------------------\n#    Models\n# -------------------------------------------------------------\n\nclass LeNet:\n    def __init__(self):\n        super().__init__()\n        self.nb_classes = 10\n        self.input_shape = [28, 28, 3]\n        self.weights_init = 'He'\n        self.filters = 32  # 32 is the default here for all our pre-trained models\n        self.is_training = False\n        self.bn = False\n        self.bn_scale = False\n        self.bn_bias = False\n        self.parameters = 0\n\n        # Create variables\n        with tf.variable_scope('conv1_vars'):\n            self.W_conv1 = create_conv2d_weights(kernel_size=3, filter_in=1, filter_out=self.filters,\n                                                 init=self.weights_init)\n            self.parameters += 3 * 3 * self.input_shape[-1] * self.filters\n\n            self.b_conv1 = create_biases(size=self.filters)\n            self.parameters += self.filters\n\n        with tf.variable_scope('conv2_vars'):\n            self.W_conv2 = create_conv2d_weights(kernel_size=3, filter_in=self.filters, filter_out=self.filters * 2,\n                                                 init=self.weights_init)\n            self.parameters += 3 * 3 * self.filters * (self.filters * 2)\n\n            self.b_conv2 = create_biases(size=self.filters * 2)\n            self.parameters += self.filters * 2\n\n        with tf.variable_scope('fc1_vars'):\n            self.W_fc1 = create_weights(units_in=7 * 7 * self.filters * 2, units_out=1024, init=self.weights_init)\n            self.parameters += (7 * 7 * self.filters * 2) * 1024\n\n            self.b_fc1 = create_biases(size=1024)\n            self.parameters += 1024\n\n        with tf.variable_scope('fc2_vars'):\n            self.W_fc2 = create_weights(units_in=1024, units_out=self.nb_classes, init=self.weights_init)\n            self.parameters += 1024 * self.nb_classes\n\n            self.b_fc2 = create_biases(size=self.nb_classes)\n            self.parameters += self.nb_classes\n\n        self.x_input = tf.placeholder(tf.float32, shape=[None, 784])\n        self.y_input = tf.placeholder(tf.int64, shape=[None])\n\n        x = tf.reshape(self.x_input, [-1, 28, 28, 1])\n\n        with tf.name_scope('conv-block-1'):\n            conv1 = conv_layer(x, self.is_training, self.W_conv1, stride=1, padding='SAME', bn=self.bn,\n                               bn_scale=self.bn_scale, bn_bias=self.bn_bias, name='conv1', bias=self.b_conv1)\n\n        with tf.name_scope('max-pool-1'):\n            conv1 = tf.nn.max_pool(conv1, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='VALID')\n\n        with tf.name_scope('conv-block-2'):\n            conv2 = conv_layer(conv1, self.is_training, self.W_conv2, stride=1, padding='SAME', bn=self.bn,\n                               bn_scale=self.bn_scale, bn_bias=self.bn_bias, name='conv2', bias=self.b_conv2)\n\n        with tf.name_scope('max-pool-2'):\n            conv2 = tf.nn.max_pool(conv2, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='VALID')\n\n        with tf.name_scope('fc-block'):\n            conv2 = tf.layers.flatten(conv2)\n            fc1 = fc_layer(conv2, self.is_training, self.W_fc1, bn=self.bn, bn_scale=self.bn_scale,\n                           bn_bias=self.bn_bias, name='fc1', non_linearity='relu', bias=self.b_fc1)\n\n            logits = fc_layer(fc1, self.is_training, self.W_fc2, bn=self.bn, bn_scale=self.bn_scale,\n    
                          bn_bias=self.bn_bias, name='fc2', non_linearity='linear', bias=self.b_fc2)\n\n        self.summaries = False\n        self.logits = logits\n\n\nclass ResNet20_v2:\n    def __init__(self):\n        super().__init__()\n        self.nb_classes = 10\n        self.input_shape = [32, 32, 3]\n        self.weights_init = 'He'\n        self.filters = 64  # 64 is the default here for all our pre-trained models\n        self.is_training = False\n        self.bn = True\n        self.bn_scale = True\n        self.bn_bias = True\n        self.parameters = 0\n\n        # Create variables\n        with tf.variable_scope('conv1_vars'):\n            self.W_conv1 = create_conv2d_weights(kernel_size=3, filter_in=self.input_shape[-1], filter_out=self.filters,\n                                                 init=self.weights_init)\n            self.parameters += 3 * 3 * self.input_shape[-1] * self.filters\n\n        with tf.variable_scope('conv2_vars'):\n            self.W_conv2 = create_conv2d_weights(kernel_size=3, filter_in=self.filters, filter_out=self.filters,\n                                                 init=self.weights_init)\n            self.parameters += 3 * 3 * self.filters * self.filters\n\n        with tf.variable_scope('conv3_vars'):\n            self.W_conv3 = create_conv2d_weights(kernel_size=3, filter_in=self.filters, filter_out=self.filters,\n                                                 init=self.weights_init)\n            self.parameters += 3 * 3 * self.filters * self.filters\n\n        with tf.variable_scope('conv4_vars'):\n            self.W_conv4 = create_conv2d_weights(kernel_size=3, filter_in=self.filters, filter_out=self.filters,\n                                                 init=self.weights_init)\n            self.parameters += 3 * 3 * self.filters * self.filters\n\n        with tf.variable_scope('conv5_vars'):\n            self.W_conv5 = create_conv2d_weights(kernel_size=3, filter_in=self.filters, filter_out=self.filters,\n                                                 init=self.weights_init)\n            self.parameters += 3 * 3 * self.filters * self.filters\n\n        with tf.variable_scope('conv6_vars'):\n            self.W_conv6 = create_conv2d_weights(kernel_size=3, filter_in=self.filters, filter_out=self.filters,\n                                                 init=self.weights_init)\n            self.parameters += 3 * 3 * self.filters * self.filters\n\n        with tf.variable_scope('conv7_vars'):\n            self.W_conv7 = create_conv2d_weights(kernel_size=3, filter_in=self.filters, filter_out=self.filters,\n                                                 init=self.weights_init)\n            self.parameters += 3 * 3 * self.filters * self.filters\n\n        with tf.variable_scope('conv8_vars'):\n            self.W_conv8 = create_conv2d_weights(kernel_size=3, filter_in=self.filters, filter_out=self.filters * 2,\n                                                 init=self.weights_init)\n            self.parameters += 3 * 3 * self.filters * (self.filters * 2)\n\n        with tf.variable_scope('conv9_vars'):\n            self.W_conv9 = create_conv2d_weights(kernel_size=3, filter_in=self.filters * 2, filter_out=self.filters * 2,\n                                                 init=self.weights_init)\n            self.parameters += 3 * 3 * (self.filters * 2) * (self.filters * 2)\n\n        with tf.variable_scope('conv10_vars'):\n            self.W_conv10 = create_conv2d_weights(kernel_size=3, filter_in=self.filters * 2,\n               
                                   filter_out=self.filters * 2, init=self.weights_init)\n            self.parameters += 3 * 3 * (self.filters * 2) * (self.filters * 2)\n\n        with tf.variable_scope('conv11_vars'):\n            self.W_conv11 = create_conv2d_weights(kernel_size=3, filter_in=self.filters * 2,\n                                                  filter_out=self.filters * 2, init=self.weights_init)\n            self.parameters += 3 * 3 * (self.filters * 2) * (self.filters * 2)\n\n        with tf.variable_scope('conv12_vars'):\n            self.W_conv12 = create_conv2d_weights(kernel_size=3, filter_in=self.filters * 2,\n                                                  filter_out=self.filters * 2, init=self.weights_init)\n            self.parameters += 3 * 3 * (self.filters * 2) * (self.filters * 2)\n\n        with tf.variable_scope('conv13_vars'):\n            self.W_conv13 = create_conv2d_weights(kernel_size=3, filter_in=self.filters * 2,\n                                                  filter_out=self.filters * 2, init=self.weights_init)\n            self.parameters += 3 * 3 * (self.filters * 2) * (self.filters * 2)\n\n        with tf.variable_scope('conv14_vars'):\n            self.W_conv14 = create_conv2d_weights(kernel_size=3, filter_in=self.filters * 2,\n                                                  filter_out=self.filters * 4, init=self.weights_init)\n            self.parameters += 3 * 3 * (self.filters * 2) * (self.filters * 4)\n\n        with tf.variable_scope('conv15_vars'):\n            self.W_conv15 = create_conv2d_weights(kernel_size=3, filter_in=self.filters * 4,\n                                                  filter_out=self.filters * 4, init=self.weights_init)\n            self.parameters += 3 * 3 * (self.filters * 4) * (self.filters * 4)\n\n        with tf.variable_scope('conv16_vars'):\n            self.W_conv16 = create_conv2d_weights(kernel_size=3, filter_in=self.filters * 4,\n                                                  filter_out=self.filters * 4, init=self.weights_init)\n            self.parameters += 3 * 3 * (self.filters * 4) * (self.filters * 4)\n\n        with tf.variable_scope('conv17_vars'):\n            self.W_conv17 = create_conv2d_weights(kernel_size=3, filter_in=self.filters * 4,\n                                                  filter_out=self.filters * 4, init=self.weights_init)\n            self.parameters += 3 * 3 * (self.filters * 4) * (self.filters * 4)\n\n        with tf.variable_scope('conv18_vars'):\n            self.W_conv18 = create_conv2d_weights(kernel_size=3, filter_in=self.filters * 4,\n                                                  filter_out=self.filters * 4, init=self.weights_init)\n            self.parameters += 3 * 3 * (self.filters * 4) * (self.filters * 4)\n\n        with tf.variable_scope('conv19_vars'):\n            self.W_conv19 = create_conv2d_weights(kernel_size=3, filter_in=self.filters * 4,\n                                                  filter_out=self.filters * 4, init=self.weights_init)\n            self.parameters += 3 * 3 * (self.filters * 4) * (self.filters * 4)\n\n        with tf.variable_scope('fc1_vars'):\n            self.W_fc1 = create_weights(units_in=self.filters * 4, units_out=self.nb_classes, init=self.weights_init)\n            self.parameters += (self.filters * 4) * self.nb_classes\n\n            self.b_fc1 = create_biases(size=self.nb_classes)\n            self.parameters += self.nb_classes\n\n        with tf.variable_scope('scip1_vars'):\n            self.W_scip1 = 
create_conv2d_weights(kernel_size=1, filter_in=self.filters, filter_out=self.filters,\n                                                 init=self.weights_init)\n            self.parameters += 1 * 1 * self.filters * self.filters\n\n        with tf.variable_scope('scip2_vars'):\n            self.W_scip2 = create_conv2d_weights(kernel_size=1, filter_in=self.filters, filter_out=self.filters * 2,\n                                                 init=self.weights_init)\n            self.parameters += 1 * 1 * self.filters * (self.filters * 2)\n\n        with tf.variable_scope('scip3_vars'):\n            self.W_scip3 = create_conv2d_weights(kernel_size=1, filter_in=self.filters * 2, filter_out=self.filters * 4,\n                                                 init=self.weights_init)\n            self.parameters += 1 * 1 * (self.filters * 2) * (self.filters * 4)\n\n        self.x_input = tf.placeholder(tf.float32, shape=[None, 32, 32, 3])\n        self.y_input = tf.placeholder(tf.int64, shape=None)\n        x = self.x_input / 255.0\n\n        # Specify forward pass\n        with tf.name_scope('input-block'):\n            conv1 = conv_layer(x, self.is_training, self.W_conv1, stride=1, padding='SAME',\n                               bn=False, bn_scale=self.bn_scale, bn_bias=self.bn_bias,\n                               name='conv1',\n                               non_linearity='linear')\n\n        with tf.name_scope('conv-block-1'):\n            conv2 = pre_act_conv_layer(conv1, self.is_training, self.W_conv2, stride=1, padding='SAME',\n                                       bn=self.bn, bn_scale=self.bn_scale, bn_bias=self.bn_bias, name='conv2')\n\n            conv3 = pre_act_conv_layer(conv2, self.is_training, self.W_conv3, stride=1, padding='SAME',\n                                       bn=self.bn, bn_scale=self.bn_scale, bn_bias=self.bn_bias, name='conv3')\n\n            # skip connection\n            conv3 += tf.nn.conv2d(conv1, self.W_scip1, strides=[1, 1, 1, 1], padding='SAME', name='conv-skip1')\n\n            conv4 = pre_act_conv_layer(conv3, self.is_training, self.W_conv4, stride=1, padding='SAME',\n                                       bn=self.bn, bn_scale=self.bn_scale, bn_bias=self.bn_bias, name='conv4')\n\n            conv5 = pre_act_conv_layer(conv4, self.is_training, self.W_conv5, stride=1, padding='SAME',\n                                       bn=self.bn, bn_scale=self.bn_scale, bn_bias=self.bn_bias, name='conv5')\n\n            # skip connection\n            conv5 += conv3\n\n            conv6 = pre_act_conv_layer(conv5, self.is_training, self.W_conv6, stride=1, padding='SAME',\n                                       bn=self.bn, bn_scale=self.bn_scale, bn_bias=self.bn_bias, name='conv6')\n\n            conv7 = pre_act_conv_layer(conv6, self.is_training, self.W_conv7, stride=1, padding='SAME',\n                                       bn=self.bn, bn_scale=self.bn_scale, bn_bias=self.bn_bias, name='conv7')\n\n            # skip connection\n            conv7 += conv5\n\n        with tf.name_scope('conv-block-2'):\n            conv8 = pre_act_conv_layer(conv7, self.is_training, self.W_conv8, stride=2, padding='SAME',\n                                       bn=self.bn, bn_scale=self.bn_scale, bn_bias=self.bn_bias, name='conv8')\n\n            conv9 = pre_act_conv_layer(conv8, self.is_training, self.W_conv9, stride=1, padding='SAME',\n                                       bn=self.bn, bn_scale=self.bn_scale, bn_bias=self.bn_bias, name='conv9')\n\n            # skip connection\n         
   conv9 += tf.nn.conv2d(conv7, self.W_scip2, strides=[1, 2, 2, 1], padding='SAME', name='conv-skip2')\n\n            conv10 = pre_act_conv_layer(conv9, self.is_training, self.W_conv10, stride=1, padding='SAME',\n                                        bn=self.bn, bn_scale=self.bn_scale, bn_bias=self.bn_bias, name='conv10')\n\n            conv11 = pre_act_conv_layer(conv10, self.is_training, self.W_conv11, stride=1, padding='SAME',\n                                        bn=self.bn, bn_scale=self.bn_scale, bn_bias=self.bn_bias, name='conv11')\n\n            # skip connection\n            conv11 += conv9\n\n            conv12 = pre_act_conv_layer(conv11, self.is_training, self.W_conv12, stride=1, padding='SAME',\n                                        bn=self.bn, bn_scale=self.bn_scale, bn_bias=self.bn_bias, name='conv12')\n\n            conv13 = pre_act_conv_layer(conv12, self.is_training, self.W_conv13, stride=1, padding='SAME',\n                                        bn=self.bn, bn_scale=self.bn_scale, bn_bias=self.bn_bias, name='conv13')\n\n            # skip connection\n            conv13 += conv11\n\n        with tf.name_scope('conv-block-3'):\n            conv14 = pre_act_conv_layer(conv13, self.is_training, self.W_conv14, stride=2, padding='SAME',\n                                        bn=self.bn, bn_scale=self.bn_scale, bn_bias=self.bn_bias, name='conv14')\n\n            conv15 = pre_act_conv_layer(conv14, self.is_training, self.W_conv15, stride=1, padding='SAME',\n                                        bn=self.bn, bn_scale=self.bn_scale, bn_bias=self.bn_bias, name='conv15')\n\n            # skip connection\n            conv15 += tf.nn.conv2d(conv13, self.W_scip3, strides=[1, 2, 2, 1], padding='SAME', name='conv-skip3')\n\n            conv16 = pre_act_conv_layer(conv15, self.is_training, self.W_conv16, stride=1, padding='SAME',\n                                        bn=self.bn, bn_scale=self.bn_scale, bn_bias=self.bn_bias, name='conv16')\n\n            conv17 = pre_act_conv_layer(conv16, self.is_training, self.W_conv17, stride=1, padding='SAME',\n                                        bn=self.bn, bn_scale=self.bn_scale, bn_bias=self.bn_bias, name='conv17')\n\n            # skip connection\n            conv17 += conv15\n\n            conv18 = pre_act_conv_layer(conv17, self.is_training, self.W_conv18, stride=1, padding='SAME',\n                                        bn=self.bn, bn_scale=self.bn_scale, bn_bias=self.bn_bias, name='conv18')\n\n            conv19 = pre_act_conv_layer(conv18, self.is_training, self.W_conv19, stride=1, padding='SAME',\n                                        bn=self.bn, bn_scale=self.bn_scale, bn_bias=self.bn_bias, name='conv19')\n\n            # skip connection\n            conv19 += conv17\n            conv19 = nonlinearity(conv19)\n\n        with tf.name_scope('output-block'):\n            with tf.name_scope('global-average-pooling'):\n                fc1 = tf.reduce_mean(conv19, axis=[1, 2])\n\n            logits = fc_layer(fc1, self.is_training, self.W_fc1, bn=False, bn_scale=self.bn_scale, bn_bias=self.bn_bias,\n                              name='fc1',\n                              non_linearity='linear', bias=self.b_fc1)\n\n        self.summaries = False\n        self.logits = logits\n\n\n# -------------------------------------------------------------\n#    Helpers\n# -------------------------------------------------------------\n\ndef create_weights(units_in, units_out, init='Xavier', seed=None):\n    if init == 'Xavier':\n        
initializer = tf.variance_scaling_initializer(scale=1.0,\n                                                      mode='fan_in',\n                                                      distribution='normal',\n                                                      seed=None,\n                                                      dtype=tf.float32)\n    elif init == 'He':\n        initializer = tf.variance_scaling_initializer(scale=2.0,\n                                                      mode='fan_in',\n                                                      distribution='normal',\n                                                      seed=None,\n                                                      dtype=tf.float32)\n    else:\n        initializer = tf.truncated_normal_initializer(mean=0.0, stddev=0.01, seed=seed, dtype=tf.float32)\n\n    weights = tf.get_variable(name='weights',\n                              shape=[units_in, units_out],\n                              dtype=tf.float32,\n                              initializer=initializer)\n    return weights\n\n\ndef create_conv2d_weights(kernel_size, filter_in, filter_out, init='Xavier', seed=None):\n    if init == 'Xavier':\n        initializer = tf.variance_scaling_initializer(scale=1.0,\n                                                      mode='fan_in',\n                                                      distribution='normal',\n                                                      seed=None,\n                                                      dtype=tf.float32)\n    elif init == 'He':\n        initializer = tf.variance_scaling_initializer(scale=2.0,\n                                                      mode='fan_in',\n                                                      distribution='normal',\n                                                      seed=None,\n                                                      dtype=tf.float32)\n    else:\n        initializer = tf.truncated_normal_initializer(mean=0.0, stddev=0.01, seed=seed, dtype=tf.float32)\n\n    weights = tf.get_variable(name='weights',\n                              shape=[kernel_size, kernel_size, filter_in, filter_out],\n                              dtype=tf.float32,\n                              initializer=initializer)\n    return weights\n\n\ndef create_biases(size):\n    return tf.get_variable(name='biases', shape=[size], dtype=tf.float32, initializer=tf.zeros_initializer())\n\n\ndef batch_norm(x, is_training, scale, bias, name, reuse):\n    return tf.contrib.layers.batch_norm(\n        x,\n        decay=0.999,\n        center=bias,\n        scale=scale,\n        epsilon=0.001,\n        param_initializers=None,\n        updates_collections=tf.GraphKeys.UPDATE_OPS,\n        is_training=is_training,\n        reuse=reuse,\n        variables_collections=['batch-norm'],\n        outputs_collections=None,\n        trainable=True,\n        batch_weights=None,\n        fused=False,\n        zero_debias_moving_mean=False,\n        scope=name,\n        renorm=False,\n        renorm_clipping=None,\n        renorm_decay=0.99\n    )\n\n\ndef nonlinearity(x, non_linearity='relu'):\n    if non_linearity == 'linear':\n        return tf.identity(x)\n    if non_linearity == 'sigmoid':\n        return tf.nn.sigmoid(x)\n    if non_linearity == 'tanh':\n        return tf.nn.tanh(x)\n    if non_linearity == 'relu':\n        return tf.nn.relu(x)\n    if non_linearity == 'elu':\n        return tf.nn.elu(x)\n    if non_linearity == 'selu':\n        return tf.nn.selu(x)\n\n\ndef 
conv_layer(inputs, is_training, weights, stride, padding, bn, bn_scale, bn_bias, name,\n               non_linearity='relu', bias=None):\n    if bias is not None:\n        inputs = tf.nn.conv2d(inputs, weights, strides=[1, stride, stride, 1], padding=padding) + bias\n    else:\n        inputs = tf.nn.conv2d(inputs, weights, strides=[1, stride, stride, 1], padding=padding)\n\n    if bn:\n        inputs = batch_norm(inputs, is_training=is_training, scale=bn_scale, bias=bn_bias,\n                            name='batch-norm-{:s}'.format(name),\n                            reuse=tf.AUTO_REUSE)\n\n    activations = nonlinearity(inputs, non_linearity=non_linearity)\n\n    return activations\n\n\ndef pre_act_conv_layer(inputs, is_training, weights, stride, padding, bn, bn_scale, bn_bias, name,\n                       non_linearity='relu'):\n    if bn:\n        inputs = batch_norm(inputs, is_training=is_training, scale=bn_scale, bias=bn_bias,\n                            name='batch-norm-{:s}'.format(name),\n                            reuse=tf.AUTO_REUSE)\n\n    activations = nonlinearity(inputs, non_linearity=non_linearity)\n\n    outputs = tf.nn.conv2d(activations, weights, strides=[1, stride, stride, 1], padding=padding)\n\n    return outputs\n\n\ndef fc_layer(inputs, is_training, weights, bn, bn_scale, bn_bias, name, non_linearity='relu', bias=None):\n    if bias is not None:\n        inputs = tf.matmul(inputs, weights) + bias\n    else:\n        inputs = tf.matmul(inputs, weights)\n\n    if bn:\n        inputs = batch_norm(inputs, is_training=is_training, scale=bn_scale, bias=bn_bias,\n                            name='batch-norm-{:s}'.format(name),\n                            reuse=tf.AUTO_REUSE)\n\n    activations = nonlinearity(inputs, non_linearity)\n\n    return activations\n"
  },
  {
    "path": "madry_cifar10/LICENSE",
    "content": "MIT License\n\nCopyright (c) 2017 Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu\n\nPermission is hereby granted, free of charge, to any person obtaining a copy\nof this software and associated documentation files (the \"Software\"), to deal\nin the Software without restriction, including without limitation the rights\nto use, copy, modify, merge, publish, distribute, sublicense, and/or sell\ncopies of the Software, and to permit persons to whom the Software is\nfurnished to do so, subject to the following conditions:\n\nThe above copyright notice and this permission notice shall be included in all\ncopies or substantial portions of the Software.\n\nTHE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\nIMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\nFITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE\nAUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\nLIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\nOUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE\nSOFTWARE."
  },
  {
    "path": "madry_cifar10/README.md",
    "content": "# CIFAR10 Adversarial Examples Challenge\r\n\r\nRecently, there has been much progress on adversarial *attacks* against neural networks, such as the [cleverhans](https://github.com/tensorflow/cleverhans) library and the code by [Carlini and Wagner](https://github.com/carlini/nn_robust_attacks).\r\nWe now complement these advances by proposing an *attack challenge* for the\r\n[CIFAR10 dataset](https://www.cs.toronto.edu/~kriz/cifar.html) which follows the\r\nformat of [our earlier MNIST challenge](https://github.com/MadryLab/mnist_challenge).\r\nWe have trained a robust network, and the objective is to find a set of adversarial examples on which this network achieves only a low accuracy.\r\nTo train an adversarially-robust network, we followed the approach from our recent paper:\r\n\r\n**Towards Deep Learning Models Resistant to Adversarial Attacks** <br>\r\n*Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, Adrian Vladu* <br>\r\nhttps://arxiv.org/abs/1706.06083.\r\n\r\nAs part of the challenge, we release both the training code and the network architecture, but keep the network weights secret.\r\nWe invite any researcher to submit attacks against our model (see the detailed instructions below).\r\nWe will maintain a leaderboard of the best attacks for the next two months and then publish our secret network weights.\r\n\r\nAnalogously to our MNIST challenge, the goal of this challenge is to clarify the state-of-the-art for adversarial robustness on CIFAR10. Moreover, we hope that future work on defense mechanisms will adopt a similar challenge format in order to improve reproducibility and empirical comparisons.\r\n\r\n**Update 2017-12-10**: We released our secret model. You can download it by running `python fetch_model.py secret`. As of Dec 10 we are no longer accepting black-box challenge submissions. We have set up a leaderboard for white-box attacks on the (now released) secret model. The submission format is the same as before. 
We plan to continue evaluating submissions and maintaining the leaderboard for the foreseeable future.\r\n\r\n## Black-Box Leaderboard (Original Challenge)\r\n\r\n| Attack                                 | Submitted by  | Accuracy | Submission Date |\r\n| -------------------------------------- | ------------- | -------- | ---- |\r\n| PGD on the cross-entropy loss for the<br> adversarially trained public network     | (initial entry)       | **63.39%**   | Jul 12, 2017    |\r\n| PGD on the [CW](https://github.com/carlini/nn_robust_attacks) loss for the<br> adversarially trained public network     | (initial entry)       | 64.38%   | Jul 12, 2017    |\r\n| FGSM on the [CW](https://github.com/carlini/nn_robust_attacks) loss for the<br> adversarially trained public network     | (initial entry)       | 67.25%   | Jul 12, 2017    |\r\n| FGSM on the [CW](https://github.com/carlini/nn_robust_attacks) loss for the<br> naturally trained public network     | (initial entry)       | 85.23%   | Jul 12, 2017    |\r\n\r\n## White-Box Leaderboard\r\n\r\n| Attack                                 | Submitted by  | Accuracy | Submission Date |\r\n| -------------------------------------- | ------------- | -------- | ---- |\r\n| [FAB: Fast Adaptive Boundary Attack](https://github.com/fra31/fab-attack) | Francesco Croce       | **44.51%**   | Jun 7, 2019    |\r\n| [Distributionally Adversarial Attack](https://github.com/tianzheng4/Distributionally-Adversarial-Attack) | Tianhang Zheng       | 44.71%   | Aug 21, 2018    |\r\n| 20-step PGD on the cross-entropy loss<br> with 10 random restarts | Tianhang Zheng       | 45.21%   | Aug 24, 2018    |\r\n| 20-step PGD on the cross-entropy loss | (initial entry)       | 47.04%   | Dec 10, 2017    |\r\n| 20-step PGD on the [CW](https://github.com/carlini/nn_robust_attacks) loss | (initial entry)       | 47.76%   | Dec 10, 2017    |\r\n| FGSM on the [CW](https://github.com/carlini/nn_robust_attacks) loss | (initial entry)       | 54.92%   | Dec 10, 2017    |\r\n| FGSM on the cross-entropy loss | (initial entry)       | 55.55%   | Dec 10, 2017    |\r\n\r\n\r\n\r\n\r\n\r\n## Format and Rules\r\n\r\nThe objective of the challenge is to find black-box (transfer) attacks that are effective against our CIFAR10 model.\r\nAttacks are allowed to perturb each pixel of the input image by at most `epsilon=8.0` on a `0-255` pixel scale.\r\nTo ensure that the attacks are indeed black-box, we release our training code and model architecture, but keep the actual network weights secret. \r\n\r\nWe invite any interested researchers to submit attacks against our model.\r\nThe most successful attacks will be listed in the leaderboard above.\r\nAs a reference point, we have seeded the leaderboard with the results of some standard attacks.\r\n\r\n### The CIFAR10 Model\r\n\r\nWe used the code published in this repository to produce an adversarially robust model for CIFAR10 classification. The model is a residual convolutional neural network consisting of five residual units and a fully connected layer. 
This architecture is derived from the \"w32-10 wide\" variant of the [Tensorflow model repository](https://github.com/tensorflow/models/blob/master/resnet/resnet_model.py).\r\nThe network was trained against an iterative adversary that is allowed to perturb each pixel by at most `epsilon=8.0`.\r\n\r\nThe random seed used for training and the trained network weights will be kept secret.\r\n\r\nThe `sha256()` digest of our model file is:\r\n```\r\n555be6e892372599380c9da5d5f9802f9cbd098be8a47d24d96937a002305fd4\r\n```\r\nWe will release the corresponding model file on September 15 2017, which is roughly two months after the start of this competition. **Edit: We are extending the deadline for submitting attacks to October 15th due to requests.**\r\n\r\n### The Attack Model\r\n\r\nWe are interested in adversarial inputs that are derived from the CIFAR10 test set.\r\nEach pixel can be perturbed by at most `epsilon=8.0` from its initial value on the `0-255` pixel scale.\r\nAll pixels can be perturbed independently, so this is an l_infinity attack.\r\n\r\n### Submitting an Attack\r\n\r\nEach attack should consist of a perturbed version of the CIFAR10 test set.\r\nEach perturbed image in this test set should follow the above attack model. \r\n\r\nThe adversarial test set should be formated as a numpy array with one row per example and each row containing a 32x32x3\r\narray of pixels.\r\nHence the overall dimensions are 10,000x32x32x3.\r\nEach pixel must be in the [0, 255] range.\r\nSee the script `pgd_attack.py` for an attack that generates an adversarial test set in this format.\r\n\r\nIn order to submit your attack, save the matrix containing your adversarial examples with `numpy.save` and email the resulting file to cifar10.challenge@gmail.com. \r\nWe will then run the `run_attack.py` script on your file to verify that the attack is valid and to evaluate the accuracy of our secret model on your examples.\r\nAfter that, we will reply with the predictions of our model on each of your examples and the overall accuracy of our model on your evaluation set.\r\n\r\nIf the attack is valid and outperforms all current attacks in the leaderboard, it will appear at the top of the leaderboard.\r\nNovel types of attacks might be included in the leaderboard even if they do not perform best.\r\n\r\nWe strongly encourage you to disclose your attack method.\r\nWe would be happy to add a link to your code in our leaderboard.\r\n\r\n## Overview of the Code\r\nThe code consists of seven Python scripts and the file `config.json` that contains various parameter settings.\r\n\r\n### Running the code\r\n- `python train.py`: trains the network, storing checkpoints along\r\n      the way.\r\n- `python eval.py`: an infinite evaluation loop, processing each new\r\n      checkpoint as it is created while logging summaries. It is intended\r\n      to be run in parallel with the `train.py` script.\r\n- `python pgd_attack.py`:  applies the attack to the CIFAR10 eval set and\r\n      stores the resulting adversarial eval set in a `.npy` file. This file is\r\n      in a valid attack format for our challenge.\r\n- `python run_attack.py`: evaluates the model on the examples in\r\n      the `.npy` file specified in config, while ensuring that the adversarial examples \r\n      are indeed a valid attack. 
The script also saves the network predictions in `pred.npy`.\r\n- `python fetch_model.py name`: downloads the pre-trained model with the\r\n      specified name (at the moment `adv_trained` or `natural`), prints the sha256\r\n      hash, and places it in the models directory.\r\n- `cifar10_input.py` provides utility functions and classes for loading the CIFAR10 dataset.\r\n\r\n### Parameters in `config.json`\r\n\r\nModel configuration:\r\n- `model_dir`: contains the path to the directory of the currently \r\n      trained/evaluated model.\r\n\r\nTraining configuration:\r\n- `tf_random_seed`: the seed for the RNG used to initialize the network\r\n      weights.\r\n- `numpy_random_seed`: the seed for the RNG used to pass over the dataset in random order\r\n- `max_num_training_steps`: the number of training steps.\r\n- `num_output_steps`: the number of training steps between printing\r\n      progress in standard output.\r\n- `num_summary_steps`: the number of training steps between storing\r\n      tensorboard summaries.\r\n- `num_checkpoint_steps`: the number of training steps between storing\r\n      model checkpoints.\r\n- `training_batch_size`: the size of the training batch.\r\n\r\nEvaluation configuration:\r\n- `num_eval_examples`: the number of CIFAR10 examples to evaluate the\r\n      model on.\r\n- `eval_batch_size`: the size of the evaluation batches.\r\n- `eval_on_cpu`: forces the `eval.py` script to run on the CPU so it does not compete with `train.py` for GPU resources.\r\n\r\nAdversarial examples configuration:\r\n- `epsilon`: the maximum allowed perturbation per pixel.\r\n- `k`: the number of PGD iterations used by the adversary.\r\n- `a`: the size of the PGD adversary steps.\r\n- `random_start`: specifies whether the adversary will start iterating\r\n      from the natural example or a random perturbation of it.\r\n- `loss_func`: the loss function used to run pgd on. `xent` corresponds to the\r\n      standard cross-entropy loss, `cw` corresponds to the loss function \r\n      of [Carlini and Wagner](https://arxiv.org/abs/1608.04644).\r\n- `store_adv_path`: the file in which adversarial examples are stored.\r\n      Relevant for the `pgd_attack.py` and `run_attack.py` scripts.\r\n\r\n## Example usage\r\nAfter cloning the repository you can either train a new network or evaluate/attack one of our pre-trained networks.\r\n#### Training a new network\r\n* Start training by running:\r\n```\r\npython train.py\r\n```\r\n* (Optional) Evaluation summaries can be logged by simultaneously\r\n  running:\r\n```\r\npython eval.py\r\n```\r\n#### Download a pre-trained network\r\n* For an adversarially trained network, run\r\n```\r\npython fetch_model.py adv_trained\r\n```\r\nand use the `config.json` file to set `\"model_dir\": \"models/adv_trained\"`.\r\n* For a naturally trained network, run\r\n```\r\npython fetch_model.py natural\r\n```\r\nand use the `config.json` file to set `\"model_dir\": \"models/naturally_trained\"`.\r\n#### Test the network\r\n* Create an attack file by running\r\n```\r\npython pgd_attack.py\r\n```\r\n* Evaluate the network with\r\n```\r\npython run_attack.py\r\n```\r\n"
  },
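  {
    "path": "madry_cifar10/submission_format_example.py",
    "content": "\"\"\"Illustrative sketch, not part of the original challenge code: builds a\ntrivial adversarial test set in the submission format described in the README\n(a 10,000 x 32 x 32 x 3 numpy array with pixels in [0, 255] and an l_infinity\nperturbation of at most epsilon=8.0) and applies the same validity checks as\nrun_attack.py. The file name and the uniform-noise 'attack' are assumptions for\nillustration only; a real submission would come from an actual attack such as\npgd_attack.py.\"\"\"\nimport numpy as np\n\nimport cifar10_input\n\nepsilon = 8.0\ndata_path = 'cifar10_data'\n\n# natural CIFAR10 test images, shape (10000, 32, 32, 3), values in [0, 255]\ncifar = cifar10_input.CIFAR10Data(data_path)\nx_nat = cifar.eval_data.xs.astype(np.float32)\n\n# placeholder perturbation: uniform noise in [-epsilon, epsilon], then clipped\n# to the valid pixel range (clipping can only shrink the perturbation)\nx_adv = x_nat + np.random.uniform(-epsilon, epsilon, x_nat.shape)\nx_adv = np.clip(x_adv, 0, 255)\n\n# the checks run_attack.py performs before evaluating a submission\nassert x_adv.shape == (10000, 32, 32, 3)\nassert np.amin(x_adv) >= -0.0001 and np.amax(x_adv) <= 255.0001\nassert np.amax(np.abs(x_nat - x_adv)) <= epsilon + 0.0001\n\n# store in the format expected by run_attack.py (see 'store_adv_path' in config.json)\nnp.save('attack.npy', x_adv)\nprint('Example submission stored in attack.npy')\n"
  },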
  {
    "path": "madry_cifar10/cifar10_input.py",
    "content": "\"\"\"\nUtilities for importing the CIFAR10 dataset.\n\nEach image in the dataset is a numpy array of shape (32, 32, 3), with the values\nbeing unsigned integers (i.e., in the range 0,1,...,255).\n\"\"\"\n\nfrom __future__ import absolute_import\nfrom __future__ import division\nfrom __future__ import print_function\n\nimport os\nimport pickle\nimport sys\nimport tensorflow as tf\nfrom tensorflow.examples.tutorials.mnist import input_data\nversion = sys.version_info\n\nimport numpy as np\n\nclass CIFAR10Data(object):\n    \"\"\"\n    Unpickles the CIFAR10 dataset from a specified folder containing a pickled\n    version following the format of Krizhevsky which can be found\n    [here](https://www.cs.toronto.edu/~kriz/cifar.html).\n\n    Inputs to constructor\n    =====================\n\n        - path: path to the pickled dataset. The training data must be pickled\n        into five files named data_batch_i for i = 1, ..., 5, containing 10,000\n        examples each, the test data\n        must be pickled into a single file called test_batch containing 10,000\n        examples, and the 10 class names must be\n        pickled into a file called batches.meta. The pickled examples should\n        be stored as a tuple of two objects: an array of 10,000 32x32x3-shaped\n        arrays, and an array of their 10,000 true labels.\n\n    \"\"\"\n    def __init__(self, path):\n        train_filenames = ['data_batch_{}'.format(ii + 1) for ii in range(5)]\n        eval_filename = 'test_batch'\n        metadata_filename = 'batches.meta'\n\n        train_images = np.zeros((50000, 32, 32, 3), dtype='uint8')\n        train_labels = np.zeros(50000, dtype='int32')\n        for ii, fname in enumerate(train_filenames):\n            cur_images, cur_labels = self._load_datafile(os.path.join(path, fname))\n            train_images[ii * 10000 : (ii+1) * 10000, ...] = cur_images\n            train_labels[ii * 10000 : (ii+1) * 10000, ...] 
= cur_labels\n        eval_images, eval_labels = self._load_datafile(\n            os.path.join(path, eval_filename))\n\n        with open(os.path.join(path, metadata_filename), 'rb') as fo:\n              if version.major == 3:\n                  data_dict = pickle.load(fo, encoding='bytes')\n              else:\n                  data_dict = pickle.load(fo)\n\n              self.label_names = data_dict[b'label_names']\n        for ii in range(len(self.label_names)):\n            self.label_names[ii] = self.label_names[ii].decode('utf-8')\n\n        self.train_data = DataSubset(train_images, train_labels)\n        self.eval_data = DataSubset(eval_images, eval_labels)\n\n    @staticmethod\n    def _load_datafile(filename):\n      with open(filename, 'rb') as fo:\n          if version.major == 3:\n              data_dict = pickle.load(fo, encoding='bytes')\n          else:\n              data_dict = pickle.load(fo)\n\n          assert data_dict[b'data'].dtype == np.uint8\n          image_data = data_dict[b'data']\n          image_data = image_data.reshape((10000, 3, 32, 32)).transpose(0, 2, 3, 1)\n          return image_data, np.array(data_dict[b'labels'])\n\nclass AugmentedCIFAR10Data(object):\n    \"\"\"\n    Data augmentation wrapper over a loaded dataset.\n\n    Inputs to constructor\n    =====================\n        - raw_cifar10data: the loaded CIFAR10 dataset, via the CIFAR10Data class\n        - sess: current tensorflow session\n        - model: current model (needed for input tensor)\n    \"\"\"\n    def __init__(self, raw_cifar10data, sess, model):\n        assert isinstance(raw_cifar10data, CIFAR10Data)\n        self.image_size = 32\n\n        # create augmentation computational graph\n        self.x_input_placeholder = tf.placeholder(tf.float32, shape=[None, 32, 32, 3])\n        padded = tf.map_fn(lambda img: tf.image.resize_image_with_crop_or_pad(\n            img, self.image_size + 4, self.image_size + 4),\n            self.x_input_placeholder)\n        cropped = tf.map_fn(lambda img: tf.random_crop(img, [self.image_size,\n                                                             self.image_size,\n                                                             3]), padded)\n        flipped = tf.map_fn(lambda img: tf.image.random_flip_left_right(img), cropped)\n        self.augmented = flipped\n\n        self.train_data = AugmentedDataSubset(raw_cifar10data.train_data, sess,\n                                             self.x_input_placeholder,\n                                              self.augmented)\n        self.eval_data = AugmentedDataSubset(raw_cifar10data.eval_data, sess,\n                                             self.x_input_placeholder,\n                                             self.augmented)\n        self.label_names = raw_cifar10data.label_names\n\n\nclass DataSubset(object):\n    def __init__(self, xs, ys):\n        self.xs = xs\n        self.n = xs.shape[0]\n        self.ys = ys\n        self.batch_start = 0\n        self.cur_order = np.random.permutation(self.n)\n\n    def get_next_batch(self, batch_size, multiple_passes=False, reshuffle_after_pass=True):\n        if self.n < batch_size:\n            raise ValueError('Batch size can be at most the dataset size')\n        if not multiple_passes:\n            actual_batch_size = min(batch_size, self.n - self.batch_start)\n            if actual_batch_size <= 0:\n                raise ValueError('Pass through the dataset is complete.')\n            batch_end = self.batch_start + actual_batch_size\n    
        batch_xs = self.xs[self.cur_order[self.batch_start : batch_end], ...]\n            batch_ys = self.ys[self.cur_order[self.batch_start : batch_end], ...]\n            self.batch_start += actual_batch_size\n            return batch_xs, batch_ys\n        actual_batch_size = min(batch_size, self.n - self.batch_start)\n        if actual_batch_size < batch_size:\n            if reshuffle_after_pass:\n                self.cur_order = np.random.permutation(self.n)\n            self.batch_start = 0\n        batch_end = self.batch_start + batch_size\n        batch_xs = self.xs[self.cur_order[self.batch_start : batch_end], ...]\n        batch_ys = self.ys[self.cur_order[self.batch_start : batch_end], ...]\n        self.batch_start += batch_size\n        return batch_xs, batch_ys\n\n\nclass AugmentedDataSubset(object):\n    def __init__(self, raw_datasubset, sess, x_input_placeholder,\n                 augmented):\n        self.sess = sess\n        self.raw_datasubset = raw_datasubset\n        self.x_input_placeholder = x_input_placeholder\n        self.augmented = augmented\n\n    def get_next_batch(self, batch_size, multiple_passes=False, reshuffle_after_pass=True):\n        raw_batch = self.raw_datasubset.get_next_batch(batch_size, multiple_passes,\n                                                       reshuffle_after_pass)\n        images = raw_batch[0].astype(np.float32)\n        return self.sess.run(self.augmented, feed_dict={self.x_input_placeholder:\n                                                    raw_batch[0]}), raw_batch[1]\n\n"
  },
  {
    "path": "madry_cifar10/config.json",
    "content": "{\n  \"_comment\": \"===== MODEL CONFIGURATION =====\",\n  \"model_dir\": \"models/secret\",\n\n  \"_comment\": \"===== DATASET CONFIGURATION =====\",\n  \"data_path\": \"cifar10_data\",\n\n  \"_comment\": \"===== TRAINING CONFIGURATION =====\",\n  \"tf_random_seed\": 451760341,\n  \"np_random_seed\": 216105420,\n  \"max_num_training_steps\": 80000,\n  \"num_output_steps\": 100,\n  \"num_summary_steps\": 100,\n  \"num_checkpoint_steps\": 1000,\n  \"training_batch_size\": 128,\n  \"step_size_schedule\": [[0, 0.1], [40000, 0.01], [60000, 0.001]],\n  \"weight_decay\": 0.0002,\n  \"momentum\": 0.9,\n\n  \"_comment\": \"===== EVAL CONFIGURATION =====\",\n  \"num_eval_examples\": 100,\n  \"eval_batch_size\": 100,\n  \"eval_on_cpu\": false,\n\n  \"_comment\": \"=====ADVERSARIAL EXAMPLES CONFIGURATION=====\",\n  \"epsilon\": 8.0,\n  \"num_steps\": 10,\n  \"step_size\": 2.0,\n  \"random_start\": true,\n  \"loss_func\": \"xent\",\n  \"store_adv_path\": \"attack.npy\"\n}\n"
  },
  {
    "path": "madry_cifar10/eval.py",
    "content": "\"\"\"\nInfinite evaluation loop going through the checkpoints in the model directory\nas they appear and evaluating them. Accuracy and average loss are printed and\nadded as tensorboard summaries.\n\"\"\"\nfrom __future__ import absolute_import\nfrom __future__ import division\nfrom __future__ import print_function\n\nfrom datetime import datetime\nimport json\nimport math\nimport os\nimport sys\nimport time\n\nimport tensorflow as tf\n\nimport cifar10_input\nfrom model import Model\nfrom pgd_attack import LinfPGDAttack\n\n# Global constants\nwith open('config.json') as config_file:\n  config = json.load(config_file)\nnum_eval_examples = config['num_eval_examples']\neval_batch_size = config['eval_batch_size']\neval_on_cpu = config['eval_on_cpu']\ndata_path = config['data_path']\n\nmodel_dir = config['model_dir']\n\n# Set upd the data, hyperparameters, and the model\ncifar = cifar10_input.CIFAR10Data(data_path)\n\nif eval_on_cpu:\n  with tf.device(\"/cpu:0\"):\n    model = Model(mode='eval')\n    attack = LinfPGDAttack(model,\n                           config['epsilon'],\n                           config['num_steps'],\n                           config['step_size'],\n                           config['random_start'],\n                           config['loss_func'])\nelse:\n  model = Model(mode='eval')\n  attack = LinfPGDAttack(model,\n                         config['epsilon'],\n                         config['num_steps'],\n                         config['step_size'],\n                         config['random_start'],\n                         config['loss_func'])\n\nglobal_step = tf.contrib.framework.get_or_create_global_step()\n\n# Setting up the Tensorboard and checkpoint outputs\nif not os.path.exists(model_dir):\n  os.makedirs(model_dir)\neval_dir = os.path.join(model_dir, 'eval')\nif not os.path.exists(eval_dir):\n  os.makedirs(eval_dir)\n\nlast_checkpoint_filename = ''\nalready_seen_state = False\n\nsaver = tf.train.Saver()\nsummary_writer = tf.summary.FileWriter(eval_dir)\n\n# A function for evaluating a single checkpoint\ndef evaluate_checkpoint(filename):\n  with tf.Session() as sess:\n    # Restore the checkpoint\n    saver.restore(sess, filename)\n\n    # Iterate over the samples batch-by-batch\n    num_batches = int(math.ceil(num_eval_examples / eval_batch_size))\n    total_xent_nat = 0.\n    total_xent_adv = 0.\n    total_corr_nat = 0\n    total_corr_adv = 0\n\n    for ibatch in range(num_batches):\n      bstart = ibatch * eval_batch_size\n      bend = min(bstart + eval_batch_size, num_eval_examples)\n\n      x_batch = cifar.eval_data.xs[bstart:bend, :]\n      y_batch = cifar.eval_data.ys[bstart:bend]\n\n      dict_nat = {model.x_input: x_batch,\n                  model.y_input: y_batch}\n\n      x_batch_adv = attack.perturb(x_batch, y_batch, sess)\n\n      dict_adv = {model.x_input: x_batch_adv,\n                  model.y_input: y_batch}\n\n      cur_corr_nat, cur_xent_nat = sess.run(\n                                      [model.num_correct,model.xent],\n                                      feed_dict = dict_nat)\n      cur_corr_adv, cur_xent_adv = sess.run(\n                                      [model.num_correct,model.xent],\n                                      feed_dict = dict_adv)\n\n      print(eval_batch_size)\n      print(\"Correctly classified natural examples: {}\".format(cur_corr_nat))\n      print(\"Correctly classified adversarial examples: {}\".format(cur_corr_adv))\n      total_xent_nat += cur_xent_nat\n      total_xent_adv += 
cur_xent_adv\n      total_corr_nat += cur_corr_nat\n      total_corr_adv += cur_corr_adv\n\n    avg_xent_nat = total_xent_nat / num_eval_examples\n    avg_xent_adv = total_xent_adv / num_eval_examples\n    acc_nat = total_corr_nat / num_eval_examples\n    acc_adv = total_corr_adv / num_eval_examples\n\n    summary = tf.Summary(value=[\n          tf.Summary.Value(tag='xent adv eval', simple_value= avg_xent_adv),\n          tf.Summary.Value(tag='xent adv', simple_value= avg_xent_adv),\n          tf.Summary.Value(tag='xent nat', simple_value= avg_xent_nat),\n          tf.Summary.Value(tag='accuracy adv eval', simple_value= acc_adv),\n          tf.Summary.Value(tag='accuracy adv', simple_value= acc_adv),\n          tf.Summary.Value(tag='accuracy nat', simple_value= acc_nat)])\n    summary_writer.add_summary(summary, global_step.eval(sess))\n\n    print('natural: {:.2f}%'.format(100 * acc_nat))\n    print('adversarial: {:.2f}%'.format(100 * acc_adv))\n    print('avg nat loss: {:.4f}'.format(avg_xent_nat))\n    print('avg adv loss: {:.4f}'.format(avg_xent_adv))\n\n# Infinite eval loop\nwhile True:\n  cur_checkpoint = tf.train.latest_checkpoint(model_dir)\n\n  # Case 1: No checkpoint yet\n  if cur_checkpoint is None:\n    if not already_seen_state:\n      print('No checkpoint yet, waiting ...', end='')\n      already_seen_state = True\n    else:\n      print('.', end='')\n    sys.stdout.flush()\n    time.sleep(10)\n  # Case 2: Previously unseen checkpoint\n  elif cur_checkpoint != last_checkpoint_filename:\n    print('\\nCheckpoint {}, evaluating ...   ({})'.format(cur_checkpoint,\n                                                          datetime.now()))\n    sys.stdout.flush()\n    last_checkpoint_filename = cur_checkpoint\n    already_seen_state = False\n    evaluate_checkpoint(cur_checkpoint)\n  # Case 3: Previously evaluated checkpoint\n  else:\n    if not already_seen_state:\n      print('Waiting for the next checkpoint ...   ({})   '.format(\n            datetime.now()),\n            end='')\n      already_seen_state = True\n    else:\n      print('.', end='')\n    sys.stdout.flush()\n    time.sleep(10)\n"
  },
  {
    "path": "madry_cifar10/fetch_model.py",
    "content": "\"\"\"Downloads a model, computes its SHA256 hash and unzips it\r\n   at the proper location.\"\"\"\r\nfrom __future__ import absolute_import\r\nfrom __future__ import division\r\nfrom __future__ import print_function\r\n\r\nimport sys\r\nimport zipfile\r\nimport hashlib\r\n\r\nif len(sys.argv) == 1 or sys.argv[1] not in ['natural',\r\n                                             'adv_trained',\r\n                                             'secret']:\r\n  print('Usage: python fetch_model.py [natural, adv_trained]')\r\n  sys.exit(1)\r\n\r\nif sys.argv[1] == 'natural':\r\n  url = 'https://www.dropbox.com/s/cgzd5odqoojvxzk/natural.zip?dl=1'\r\nelif sys.argv[1] == 'adv_trained':\r\n  url = 'https://www.dropbox.com/s/g4b6ntrp8zrudbz/adv_trained.zip?dl=1'\r\nelse: # fetch secret model\r\n  url = 'https://www.dropbox.com/s/ywc0hg8lr5ba8zd/secret.zip?dl=1'\r\n\r\nfname = url.split('/')[-1].split('?')[0]  # get the name of the file\r\n\r\n# model download\r\nprint('Downloading models')\r\nif sys.version_info >= (3,):\r\n  import urllib.request\r\n  urllib.request.urlretrieve(url, fname)\r\nelse:\r\n  import urllib\r\n  urllib.urlretrieve(url, fname)\r\n\r\n# computing model hash\r\nsha256 = hashlib.sha256()\r\nwith open(fname, 'rb') as f:\r\n  data = f.read()\r\n  sha256.update(data)\r\nprint('SHA256 hash: {}'.format(sha256.hexdigest()))\r\n\r\n# extracting model\r\nprint('Extracting model')\r\nwith zipfile.ZipFile(fname, 'r') as model_zip:\r\n  model_zip.extractall()\r\n  print('Extracted model in {}'.format(model_zip.namelist()[0]))\r\n"
  },
  {
    "path": "madry_cifar10/model.py",
    "content": "# based on https://github.com/tensorflow/models/tree/master/resnet\nfrom __future__ import absolute_import\nfrom __future__ import division\nfrom __future__ import print_function\n\nimport numpy as np\nimport tensorflow as tf\n\nclass Model(object):\n  \"\"\"ResNet model.\"\"\"\n\n  def __init__(self, mode='eval'):\n    \"\"\"ResNet constructor.\n\n    Args:\n      mode: One of 'train' and 'eval'.\n    \"\"\"\n    self.mode = mode\n    self._build_model()\n\n  def add_internal_summaries(self):\n    pass\n\n  def _stride_arr(self, stride):\n    \"\"\"Map a stride scalar to the stride array for tf.nn.conv2d.\"\"\"\n    return [1, stride, stride, 1]\n\n  def _build_model(self):\n    assert self.mode == 'train' or self.mode == 'eval'\n    \"\"\"Build the core model within the graph.\"\"\"\n    with tf.variable_scope('input'):\n\n      self.x_input = tf.placeholder(\n        tf.float32,\n        shape=[None, 32, 32, 3])\n\n      self.y_input = tf.placeholder(tf.int64, shape=None)\n\n\n      input_standardized = tf.map_fn(lambda img: tf.image.per_image_standardization(img),\n                               self.x_input)\n      x = self._conv('init_conv', input_standardized, 3, 3, 16, self._stride_arr(1))\n\n\n\n    strides = [1, 2, 2]\n    activate_before_residual = [True, False, False]\n    res_func = self._residual\n\n    # Uncomment the following codes to use w28-10 wide residual network.\n    # It is more memory efficient than very deep residual network and has\n    # comparably good performance.\n    # https://arxiv.org/pdf/1605.07146v1.pdf\n    filters = [16, 160, 320, 640]\n\n\n    # Update hps.num_residual_units to 9\n\n    with tf.variable_scope('unit_1_0'):\n      x = res_func(x, filters[0], filters[1], self._stride_arr(strides[0]),\n                   activate_before_residual[0])\n    for i in range(1, 5):\n      with tf.variable_scope('unit_1_%d' % i):\n        x = res_func(x, filters[1], filters[1], self._stride_arr(1), False)\n\n    with tf.variable_scope('unit_2_0'):\n      x = res_func(x, filters[1], filters[2], self._stride_arr(strides[1]),\n                   activate_before_residual[1])\n    for i in range(1, 5):\n      with tf.variable_scope('unit_2_%d' % i):\n        x = res_func(x, filters[2], filters[2], self._stride_arr(1), False)\n\n    with tf.variable_scope('unit_3_0'):\n      x = res_func(x, filters[2], filters[3], self._stride_arr(strides[2]),\n                   activate_before_residual[2])\n    for i in range(1, 5):\n      with tf.variable_scope('unit_3_%d' % i):\n        x = res_func(x, filters[3], filters[3], self._stride_arr(1), False)\n\n    with tf.variable_scope('unit_last'):\n      x = self._batch_norm('final_bn', x)\n      x = self._relu(x, 0.1)\n      x = self._global_avg_pool(x)\n\n    with tf.variable_scope('logit'):\n      self.pre_softmax = self._fully_connected(x, 10)\n\n    self.predictions = tf.argmax(self.pre_softmax, 1)\n    self.correct_prediction = tf.equal(self.predictions, self.y_input)\n    self.num_correct = tf.reduce_sum(\n        tf.cast(self.correct_prediction, tf.int64))\n    self.accuracy = tf.reduce_mean(\n        tf.cast(self.correct_prediction, tf.float32))\n\n    with tf.variable_scope('costs'):\n      self.y_xent = tf.nn.sparse_softmax_cross_entropy_with_logits(\n          logits=self.pre_softmax, labels=self.y_input)\n      self.xent_per_point = self.y_xent\n      self.xent = tf.reduce_sum(self.y_xent, name='y_xent')\n      self.mean_xent = tf.reduce_mean(self.y_xent)\n      self.weight_decay_loss = 
self._decay()\n\n  def _batch_norm(self, name, x):\n    \"\"\"Batch normalization.\"\"\"\n    with tf.name_scope(name):\n      return tf.contrib.layers.batch_norm(\n          inputs=x,\n          decay=.9,\n          center=True,\n          scale=True,\n          activation_fn=None,\n          updates_collections=None,\n          is_training=(self.mode == 'train'))\n\n  def _residual(self, x, in_filter, out_filter, stride,\n                activate_before_residual=False):\n    \"\"\"Residual unit with 2 sub layers.\"\"\"\n    if activate_before_residual:\n      with tf.variable_scope('shared_activation'):\n        x = self._batch_norm('init_bn', x)\n        x = self._relu(x, 0.1)\n        orig_x = x\n    else:\n      with tf.variable_scope('residual_only_activation'):\n        orig_x = x\n        x = self._batch_norm('init_bn', x)\n        x = self._relu(x, 0.1)\n\n    with tf.variable_scope('sub1'):\n      x = self._conv('conv1', x, 3, in_filter, out_filter, stride)\n\n    with tf.variable_scope('sub2'):\n      x = self._batch_norm('bn2', x)\n      x = self._relu(x, 0.1)\n      x = self._conv('conv2', x, 3, out_filter, out_filter, [1, 1, 1, 1])\n\n    with tf.variable_scope('sub_add'):\n      if in_filter != out_filter:\n        orig_x = tf.nn.avg_pool(orig_x, stride, stride, 'VALID')\n        orig_x = tf.pad(\n            orig_x, [[0, 0], [0, 0], [0, 0],\n                     [(out_filter-in_filter)//2, (out_filter-in_filter)//2]])\n      x += orig_x\n\n    tf.logging.debug('image after unit %s', x.get_shape())\n    return x\n\n  def _decay(self):\n    \"\"\"L2 weight decay loss.\"\"\"\n    costs = []\n    for var in tf.trainable_variables():\n      if var.op.name.find('DW') > 0:\n        costs.append(tf.nn.l2_loss(var))\n    return tf.add_n(costs)\n\n  def _conv(self, name, x, filter_size, in_filters, out_filters, strides):\n    \"\"\"Convolution.\"\"\"\n    with tf.variable_scope(name):\n      n = filter_size * filter_size * out_filters\n      kernel = tf.get_variable(\n          'DW', [filter_size, filter_size, in_filters, out_filters],\n          tf.float32, initializer=tf.random_normal_initializer(\n              stddev=np.sqrt(2.0/n)))\n      return tf.nn.conv2d(x, kernel, strides, padding='SAME')\n\n  def _relu(self, x, leakiness=0.0):\n    \"\"\"Relu, with optional leaky support.\"\"\"\n    return tf.where(tf.less(x, 0.0), leakiness * x, x, name='leaky_relu')\n\n  def _fully_connected(self, x, out_dim):\n    \"\"\"FullyConnected layer for final output.\"\"\"\n    num_non_batch_dimensions = len(x.shape)\n    prod_non_batch_dimensions = 1\n    for ii in range(num_non_batch_dimensions - 1):\n      prod_non_batch_dimensions *= int(x.shape[ii + 1])\n    x = tf.reshape(x, [tf.shape(x)[0], -1])\n    w = tf.get_variable(\n        'DW', [prod_non_batch_dimensions, out_dim],\n        initializer=tf.uniform_unit_scaling_initializer(factor=1.0))\n    b = tf.get_variable('biases', [out_dim],\n                        initializer=tf.constant_initializer())\n    return tf.nn.xw_plus_b(x, w, b)\n\n  def _global_avg_pool(self, x):\n    assert x.get_shape().ndims == 4\n    return tf.reduce_mean(x, [1, 2])\n\n\n\n"
  },
  {
    "path": "madry_cifar10/model_robustml.py",
    "content": "import robustml\r\nimport tensorflow as tf\r\n\r\nimport model\r\n\r\nclass Model(robustml.model.Model):\r\n  def __init__(self, sess):\r\n    self._model = model.Model('eval')\r\n\r\n    saver = tf.train.Saver()\r\n    checkpoint = tf.train.latest_checkpoint('models/secret')\r\n    saver.restore(sess, checkpoint)\r\n\r\n    self._sess = sess\r\n    self._input = self._model.x_input\r\n    self._logits = self._model.pre_softmax\r\n    self._predictions = self._model.predictions\r\n    self._dataset = robustml.dataset.CIFAR10()\r\n    self._threat_model = robustml.threat_model.Linf(epsilon=0.03)\r\n\r\n  @property\r\n  def dataset(self):\r\n      return self._dataset\r\n\r\n  @property\r\n  def threat_model(self):\r\n      return self._threat_model\r\n\r\n  def classify(self, x):\r\n      return self._sess.run(self._predictions,\r\n                            {self._input: x})[0]\r\n\r\n  # expose attack interface\r\n\r\n  @property\r\n  def input(self):\r\n      return self._input\r\n\r\n  @property\r\n  def logits(self):\r\n      return self._logits\r\n\r\n  @property\r\n  def predictions(self):\r\n      return self._predictions\r\n"
  },
  {
    "path": "madry_cifar10/pgd_attack.py",
    "content": "\"\"\"\r\nImplementation of attack methods. Running this file as a program will\r\napply the attack to the model specified by the config file and store\r\nthe examples in an .npy file.\r\n\"\"\"\r\nfrom __future__ import absolute_import\r\nfrom __future__ import division\r\nfrom __future__ import print_function\r\n\r\nimport tensorflow as tf\r\nimport numpy as np\r\n\r\nimport cifar10_input\r\n\r\n\r\nclass LinfPGDAttack:\r\n    def __init__(self, model, epsilon, num_steps, step_size, random_start, loss_func):\r\n        \"\"\"Attack parameter initialization. The attack performs k steps of\r\n       size a, while always staying within epsilon from the initial\r\n       point.\"\"\"\r\n        self.model = model\r\n        self.epsilon = epsilon\r\n        self.num_steps = num_steps\r\n        self.step_size = step_size\r\n        self.rand = random_start\r\n\r\n        if loss_func == 'xent':\r\n            loss = model.xent\r\n        elif loss_func == 'cw':\r\n            label_mask = tf.one_hot(model.y_input,\r\n                                    10,\r\n                                    on_value=1.0,\r\n                                    off_value=0.0,\r\n                                    dtype=tf.float32)\r\n            correct_logit = tf.reduce_sum(label_mask * model.pre_softmax, axis=1)\r\n            wrong_logit = tf.reduce_max((1 - label_mask) * model.pre_softmax - 1e4 * label_mask, axis=1)\r\n            loss = -tf.nn.relu(correct_logit - wrong_logit + 50)\r\n        else:\r\n            print('Unknown loss function. Defaulting to cross-entropy')\r\n            loss = model.xent\r\n\r\n        self.grad = tf.gradients(loss, model.x_input)[0]\r\n\r\n    def perturb(self, x_nat, y, sess):\r\n        \"\"\"Given a set of examples (x_nat, y), returns a set of adversarial\r\n       examples within epsilon of x_nat in l_infinity norm.\"\"\"\r\n        if self.rand:\r\n            x = x_nat + np.random.uniform(-self.epsilon, self.epsilon, x_nat.shape)\r\n            x = np.clip(x, 0, 255)  # ensure valid pixel range\r\n        else:\r\n            x = np.copy(x_nat)\r\n\r\n        for i in range(self.num_steps):\r\n            grad = sess.run(self.grad, feed_dict={self.model.x_input: x,\r\n                                                  self.model.y_input: y})\r\n\r\n            x = np.add(x, self.step_size * np.sign(grad), out=x, casting='unsafe')\r\n\r\n            x = np.clip(x, x_nat - self.epsilon, x_nat + self.epsilon)\r\n            x = np.clip(x, 0, 255)  # ensure valid pixel range\r\n\r\n        return x\r\n\r\n\r\nif __name__ == '__main__':\r\n    import json\r\n    import sys\r\n    import math\r\n\r\n    from model import Model\r\n\r\n    with open('config.json') as config_file:\r\n        config = json.load(config_file)\r\n\r\n    model_file = tf.train.latest_checkpoint(config['model_dir'])\r\n    if model_file is None:\r\n        print('No model found')\r\n        sys.exit()\r\n\r\n    model = Model(mode='eval')\r\n    attack = LinfPGDAttack(model,\r\n                           config['epsilon'],\r\n                           config['num_steps'],\r\n                           config['step_size'],\r\n                           config['random_start'],\r\n                           config['loss_func'])\r\n    saver = tf.train.Saver()\r\n\r\n    data_path = config['data_path']\r\n    cifar = cifar10_input.CIFAR10Data(data_path)\r\n\r\n    gpu_options = tf.GPUOptions(visible_device_list='7', per_process_gpu_memory_fraction=0.5)\r\n    tf_config = 
tf.ConfigProto(allow_soft_placement=True, gpu_options=gpu_options)\r\n    with tf.Session(config=tf_config) as sess:\r\n        # Restore the checkpoint\r\n        saver.restore(sess, model_file)\r\n\r\n        # Iterate over the samples batch-by-batch\r\n        num_eval_examples = config['num_eval_examples']\r\n        eval_batch_size = config['eval_batch_size']\r\n        num_batches = int(math.ceil(num_eval_examples / eval_batch_size))\r\n\r\n        x_adv = []  # adv accumulator\r\n\r\n        print('Iterating over {} batches'.format(num_batches))\r\n\r\n        for ibatch in range(num_batches):\r\n            bstart = ibatch * eval_batch_size\r\n            bend = min(bstart + eval_batch_size, num_eval_examples)\r\n            print('batch size: {}'.format(bend - bstart))\r\n\r\n            x_batch = cifar.eval_data.xs[bstart:bend, :]\r\n            y_batch = cifar.eval_data.ys[bstart:bend]\r\n\r\n            x_batch_adv = attack.perturb(x_batch, y_batch, sess)\r\n\r\n            x_adv.append(x_batch_adv)\r\n\r\n        print('Storing examples')\r\n        path = config['store_adv_path']\r\n        x_adv = np.concatenate(x_adv, axis=0)\r\n        np.save(path, x_adv)\r\n        print('Examples stored in {}'.format(path))\r\n"
  },
  {
    "path": "madry_cifar10/run_attack.py",
    "content": "\"\"\"Evaluates a model against examples from a .npy file as specified\r\n   in config.json\"\"\"\r\nfrom __future__ import absolute_import\r\nfrom __future__ import division\r\nfrom __future__ import print_function\r\n\r\nfrom datetime import datetime\r\nimport json\r\nimport math\r\nimport os\r\nimport sys\r\nimport time\r\n\r\nimport tensorflow as tf\r\nimport numpy as np\r\n\r\nfrom model import Model\r\nimport cifar10_input\r\n\r\nwith open('config.json') as config_file:\r\n    config = json.load(config_file)\r\n\r\ndata_path = config['data_path']\r\n\r\ndef run_attack(checkpoint, x_adv, epsilon):\r\n  cifar = cifar10_input.CIFAR10Data(data_path)\r\n\r\n  model = Model(mode='eval')\r\n\r\n  saver = tf.train.Saver()\r\n\r\n  num_eval_examples = 10000\r\n  eval_batch_size = 100\r\n\r\n  num_batches = int(math.ceil(num_eval_examples / eval_batch_size))\r\n  total_corr = 0\r\n\r\n  x_nat = cifar.eval_data.xs\r\n  l_inf = np.amax(np.abs(x_nat - x_adv))\r\n\r\n  if l_inf > epsilon + 0.0001:\r\n    print('maximum perturbation found: {}'.format(l_inf))\r\n    print('maximum perturbation allowed: {}'.format(epsilon))\r\n    return\r\n\r\n  y_pred = [] # label accumulator\r\n\r\n  with tf.Session() as sess:\r\n    # Restore the checkpoint\r\n    saver.restore(sess, checkpoint)\r\n\r\n    # Iterate over the samples batch-by-batch\r\n    for ibatch in range(num_batches):\r\n      bstart = ibatch * eval_batch_size\r\n      bend = min(bstart + eval_batch_size, num_eval_examples)\r\n\r\n      x_batch = x_adv[bstart:bend, :]\r\n      y_batch = cifar.eval_data.ys[bstart:bend]\r\n\r\n      dict_adv = {model.x_input: x_batch,\r\n                  model.y_input: y_batch}\r\n      cur_corr, y_pred_batch = sess.run([model.num_correct, model.predictions],\r\n                                        feed_dict=dict_adv)\r\n\r\n      total_corr += cur_corr\r\n      y_pred.append(y_pred_batch)\r\n\r\n  accuracy = total_corr / num_eval_examples\r\n\r\n  print('Accuracy: {:.2f}%'.format(100.0 * accuracy))\r\n  y_pred = np.concatenate(y_pred, axis=0)\r\n  np.save('pred.npy', y_pred)\r\n  print('Output saved at pred.npy')\r\n\r\nif __name__ == '__main__':\r\n  import json\r\n\r\n  with open('config.json') as config_file:\r\n    config = json.load(config_file)\r\n\r\n  model_dir = config['model_dir']\r\n\r\n  checkpoint = tf.train.latest_checkpoint(model_dir)\r\n  x_adv = np.load(config['store_adv_path'])\r\n\r\n  if checkpoint is None:\r\n    print('No checkpoint found')\r\n  elif x_adv.shape != (10000, 32, 32, 3):\r\n    print('Invalid shape: expected (10000, 32, 32, 3), found {}'.format(x_adv.shape))\r\n  elif np.amax(x_adv) > 255.0001 or np.amin(x_adv) < -0.0001:\r\n    print('Invalid pixel range. Expected [0, 255], found [{}, {}]'.format(\r\n                                                              np.amin(x_adv),\r\n                                                              np.amax(x_adv)))\r\n  else:\r\n    run_attack(checkpoint, x_adv, config['epsilon'])\r\n"
  },
  {
    "path": "madry_cifar10/train.py",
    "content": "\"\"\"Trains a model, saving checkpoints and tensorboard summaries along\n   the way.\"\"\"\nfrom __future__ import absolute_import\nfrom __future__ import division\nfrom __future__ import print_function\n\nfrom datetime import datetime\nimport json\nimport os\nimport shutil\nfrom timeit import default_timer as timer\n\nimport tensorflow as tf\nimport numpy as np\n\nfrom model import Model\nimport cifar10_input\nfrom pgd_attack import LinfPGDAttack\n\nwith open('config.json') as config_file:\n    config = json.load(config_file)\n\n# seeding randomness\ntf.set_random_seed(config['tf_random_seed'])\nnp.random.seed(config['np_random_seed'])\n\n# Setting up training parameters\nmax_num_training_steps = config['max_num_training_steps']\nnum_output_steps = config['num_output_steps']\nnum_summary_steps = config['num_summary_steps']\nnum_checkpoint_steps = config['num_checkpoint_steps']\nstep_size_schedule = config['step_size_schedule']\nweight_decay = config['weight_decay']\ndata_path = config['data_path']\nmomentum = config['momentum']\nbatch_size = config['training_batch_size']\n\n# Setting up the data and the model\nraw_cifar = cifar10_input.CIFAR10Data(data_path)\nglobal_step = tf.contrib.framework.get_or_create_global_step()\nmodel = Model(mode='train')\n\n# Setting up the optimizer\nboundaries = [int(sss[0]) for sss in step_size_schedule]\nboundaries = boundaries[1:]\nvalues = [sss[1] for sss in step_size_schedule]\nlearning_rate = tf.train.piecewise_constant(\n    tf.cast(global_step, tf.int32),\n    boundaries,\n    values)\ntotal_loss = model.mean_xent + weight_decay * model.weight_decay_loss\ntrain_step = tf.train.MomentumOptimizer(learning_rate, momentum).minimize(\n    total_loss,\n    global_step=global_step)\n\n# Set up adversary\nattack = LinfPGDAttack(model,\n                       config['epsilon'],\n                       config['num_steps'],\n                       config['step_size'],\n                       config['random_start'],\n                       config['loss_func'])\n\n# Setting up the Tensorboard and checkpoint outputs\nmodel_dir = config['model_dir']\nif not os.path.exists(model_dir):\n  os.makedirs(model_dir)\n\n# We add accuracy and xent twice so we can easily make three types of\n# comparisons in Tensorboard:\n# - train vs eval (for a single run)\n# - train of different runs\n# - eval of different runs\n\nsaver = tf.train.Saver(max_to_keep=3)\ntf.summary.scalar('accuracy adv train', model.accuracy)\ntf.summary.scalar('accuracy adv', model.accuracy)\ntf.summary.scalar('xent adv train', model.xent / batch_size)\ntf.summary.scalar('xent adv', model.xent / batch_size)\ntf.summary.image('images adv train', model.x_input)\nmerged_summaries = tf.summary.merge_all()\n\n# keep the configuration file with the model for reproducibility\nshutil.copy('config.json', model_dir)\n\nwith tf.Session() as sess:\n\n  # initialize data augmentation\n  cifar = cifar10_input.AugmentedCIFAR10Data(raw_cifar, sess, model)\n\n  # Initialize the summary writer, global variables, and our time counter.\n  summary_writer = tf.summary.FileWriter(model_dir, sess.graph)\n  sess.run(tf.global_variables_initializer())\n  training_time = 0.0\n\n  # Main training loop\n  for ii in range(max_num_training_steps):\n    x_batch, y_batch = cifar.train_data.get_next_batch(batch_size,\n                                                       multiple_passes=True)\n\n    # Compute Adversarial Perturbations\n    start = timer()\n    x_batch_adv = attack.perturb(x_batch, y_batch, sess)\n    end 
= timer()\n    training_time += end - start\n\n    nat_dict = {model.x_input: x_batch,\n                model.y_input: y_batch}\n\n    adv_dict = {model.x_input: x_batch_adv,\n                model.y_input: y_batch}\n\n    # Output to stdout\n    if ii % num_output_steps == 0:\n      nat_acc = sess.run(model.accuracy, feed_dict=nat_dict)\n      adv_acc = sess.run(model.accuracy, feed_dict=adv_dict)\n      print('Step {}:    ({})'.format(ii, datetime.now()))\n      print('    training nat accuracy {:.4}%'.format(nat_acc * 100))\n      print('    training adv accuracy {:.4}%'.format(adv_acc * 100))\n      if ii != 0:\n        print('    {} examples per second'.format(\n            num_output_steps * batch_size / training_time))\n        training_time = 0.0\n    # Tensorboard summaries\n    if ii % num_summary_steps == 0:\n      summary = sess.run(merged_summaries, feed_dict=adv_dict)\n      summary_writer.add_summary(summary, global_step.eval(sess))\n\n    # Write a checkpoint\n    if ii % num_checkpoint_steps == 0:\n      saver.save(sess,\n                 os.path.join(model_dir, 'checkpoint'),\n                 global_step=global_step)\n\n    # Actual training step\n    start = timer()\n    sess.run(train_step, feed_dict=adv_dict)\n    end = timer()\n    training_time += end - start\n"
  },
  {
    "path": "madry_mnist/LICENSE",
    "content": "MIT License\n\nCopyright (c) 2017 Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu\n\nPermission is hereby granted, free of charge, to any person obtaining a copy\nof this software and associated documentation files (the \"Software\"), to deal\nin the Software without restriction, including without limitation the rights\nto use, copy, modify, merge, publish, distribute, sublicense, and/or sell\ncopies of the Software, and to permit persons to whom the Software is\nfurnished to do so, subject to the following conditions:\n\nThe above copyright notice and this permission notice shall be included in all\ncopies or substantial portions of the Software.\n\nTHE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\nIMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\nFITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE\nAUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\nLIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\nOUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE\nSOFTWARE."
  },
  {
    "path": "madry_mnist/config.json",
    "content": "{\r\n  \"_comment\": \"===== MODEL CONFIGURATION =====\",\r\n  \"model_dir\": \"models/secret\",\r\n\r\n  \"_comment\": \"===== TRAINING CONFIGURATION =====\",\r\n  \"random_seed\": 4557077,\r\n  \"max_num_training_steps\": 100000,\r\n  \"num_output_steps\": 100,\r\n  \"num_summary_steps\": 100,\r\n  \"num_checkpoint_steps\": 300,\r\n  \"training_batch_size\": 50,\r\n\r\n  \"_comment\": \"===== EVAL CONFIGURATION =====\",\r\n  \"num_eval_examples\": 10000,\r\n  \"eval_on_cpu\": false,\r\n\r\n  \"_comment\": \"=====ADVERSARIAL EXAMPLES CONFIGURATION=====\",\r\n  \"epsilon\": 0.3,\r\n  \"k\": 100,\r\n  \"a\": 0.01,\r\n  \"random_start\": true,\r\n  \"loss_func\": \"xent\",\r\n  \"store_adv_path\": \"attack.npy\"\r\n}\r\n"
  },
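  {
    "path": "madry_mnist/pgd_sketch.py",
    "content": "\"\"\"Illustrative sketch, not part of the original repository: a minimal\nnumpy-only PGD loop showing how the adversarial-examples settings in\nmadry_mnist/config.json (epsilon=0.3, k steps of size a, random_start) are\nused on MNIST inputs in [0, 1]. The repository's actual attack lives in\nattack.py (imported by eval.py) and mirrors madry_cifar10/pgd_attack.py;\nthis file name and the gradient_fn argument are assumptions for illustration\nonly.\"\"\"\nimport numpy as np\n\n\ndef pgd_linf(x_nat, gradient_fn, epsilon=0.3, k=100, a=0.01, random_start=True):\n    \"\"\"Runs k signed-gradient ascent steps of size a, projecting back into the\n    l_infinity ball of radius epsilon around x_nat and into the valid range.\"\"\"\n    if random_start:\n        x = x_nat + np.random.uniform(-epsilon, epsilon, x_nat.shape)\n        x = np.clip(x, 0, 1)\n    else:\n        x = np.copy(x_nat)\n\n    for _ in range(k):\n        grad = gradient_fn(x)                              # d(loss)/dx, same shape as x\n        x = x + a * np.sign(grad)                          # ascent step on the loss\n        x = np.clip(x, x_nat - epsilon, x_nat + epsilon)   # project onto the epsilon-ball\n        x = np.clip(x, 0, 1)                               # keep pixels in [0, 1]\n    return x\n"
  },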
  {
    "path": "madry_mnist/eval.py",
    "content": "\"\"\"\r\nInfinite evaluation loop going through the checkpoints in the model directory\r\nas they appear and evaluating them. Accuracy and average loss are printed and\r\nadded as tensorboard summaries.\r\n\"\"\"\r\nfrom __future__ import absolute_import\r\nfrom __future__ import division\r\nfrom __future__ import print_function\r\n\r\nfrom datetime import datetime\r\nimport json\r\nimport math\r\nimport os\r\nimport sys\r\nimport time\r\n\r\nimport tensorflow as tf\r\nfrom tensorflow.examples.tutorials.mnist import input_data\r\n\r\nfrom model import Model\r\nfrom attack import LinfPGDAttack\r\n\r\n# Global constants\r\nwith open('config.json') as config_file:\r\n  config = json.load(config_file)\r\nnum_eval_examples = config['num_eval_examples']\r\neval_batch_size = config['eval_batch_size']\r\neval_on_cpu = config['eval_on_cpu']\r\n\r\nmodel_dir = config['model_dir']\r\n\r\n# Set upd the data, hyperparameters, and the model\r\nmnist = input_data.read_data_sets('MNIST_data', one_hot=False)\r\n\r\nif eval_on_cpu:\r\n  with tf.device(\"/cpu:0\"):\r\n    model = Model()\r\n    attack = LinfPGDAttack(model, \r\n                           config['epsilon'],\r\n                           config['k'],\r\n                           config['a'],\r\n                           config['random_start'],\r\n                           config['loss_func'])\r\nelse:\r\n  model = Model()\r\n  attack = LinfPGDAttack(model, \r\n                         config['epsilon'],\r\n                         config['k'],\r\n                         config['a'],\r\n                         config['random_start'],\r\n                         config['loss_func'])\r\n\r\nglobal_step = tf.contrib.framework.get_or_create_global_step()\r\n\r\n# Setting up the Tensorboard and checkpoint outputs\r\nif not os.path.exists(model_dir):\r\n  os.makedirs(model_dir)\r\neval_dir = os.path.join(model_dir, 'eval')\r\nif not os.path.exists(eval_dir):\r\n  os.makedirs(eval_dir)\r\n\r\nlast_checkpoint_filename = ''\r\nalready_seen_state = False\r\n\r\nsaver = tf.train.Saver()\r\nsummary_writer = tf.summary.FileWriter(eval_dir)\r\n\r\n# A function for evaluating a single checkpoint\r\ndef evaluate_checkpoint(filename):\r\n  with tf.Session() as sess:\r\n    # Restore the checkpoint\r\n    saver.restore(sess, filename)\r\n\r\n    # Iterate over the samples batch-by-batch\r\n    num_batches = int(math.ceil(num_eval_examples / eval_batch_size))\r\n    total_xent_nat = 0.\r\n    total_xent_adv = 0.\r\n    total_corr_nat = 0\r\n    total_corr_adv = 0\r\n\r\n    for ibatch in range(num_batches):\r\n      bstart = ibatch * eval_batch_size\r\n      bend = min(bstart + eval_batch_size, num_eval_examples)\r\n\r\n      x_batch = mnist.test.images[bstart:bend, :]\r\n      y_batch = mnist.test.labels[bstart:bend]\r\n\r\n      dict_nat = {model.x_input: x_batch,\r\n                  model.y_input: y_batch}\r\n\r\n      x_batch_adv = attack.perturb(x_batch, y_batch, sess)\r\n\r\n      dict_adv = {model.x_input: x_batch_adv,\r\n                  model.y_input: y_batch}\r\n\r\n      cur_corr_nat, cur_xent_nat = sess.run(\r\n                                      [model.num_correct,model.xent],\r\n                                      feed_dict = dict_nat)\r\n      cur_corr_adv, cur_xent_adv = sess.run(\r\n                                      [model.num_correct,model.xent],\r\n                                      feed_dict = dict_adv)\r\n\r\n      total_xent_nat += cur_xent_nat\r\n      total_xent_adv += cur_xent_adv\r\n      total_corr_nat 
+= cur_corr_nat\r\n      total_corr_adv += cur_corr_adv\r\n\r\n    avg_xent_nat = total_xent_nat / num_eval_examples\r\n    avg_xent_adv = total_xent_adv / num_eval_examples\r\n    acc_nat = total_corr_nat / num_eval_examples\r\n    acc_adv = total_corr_adv / num_eval_examples\r\n\r\n    summary = tf.Summary(value=[\r\n          tf.Summary.Value(tag='xent adv eval', simple_value= avg_xent_adv),\r\n          tf.Summary.Value(tag='xent adv', simple_value= avg_xent_adv),\r\n          tf.Summary.Value(tag='xent nat', simple_value= avg_xent_nat),\r\n          tf.Summary.Value(tag='accuracy adv eval', simple_value= acc_adv),\r\n          tf.Summary.Value(tag='accuracy adv', simple_value= acc_adv),\r\n          tf.Summary.Value(tag='accuracy nat', simple_value= acc_nat)])\r\n    summary_writer.add_summary(summary, global_step.eval(sess))\r\n\r\n    print('natural: {:.2f}%'.format(100 * acc_nat))\r\n    print('adversarial: {:.2f}%'.format(100 * acc_adv))\r\n    print('avg nat loss: {:.4f}'.format(avg_xent_nat))\r\n    print('avg adv loss: {:.4f}'.format(avg_xent_adv))\r\n\r\n# Infinite eval loop\r\nwhile True:\r\n  cur_checkpoint = tf.train.latest_checkpoint(model_dir)\r\n\r\n  # Case 1: No checkpoint yet\r\n  if cur_checkpoint is None:\r\n    if not already_seen_state:\r\n      print('No checkpoint yet, waiting ...', end='')\r\n      already_seen_state = True\r\n    else:\r\n      print('.', end='')\r\n    sys.stdout.flush()\r\n    time.sleep(10)\r\n  # Case 2: Previously unseen checkpoint\r\n  elif cur_checkpoint != last_checkpoint_filename:\r\n    print('\\nCheckpoint {}, evaluating ...   ({})'.format(cur_checkpoint,\r\n                                                          datetime.now()))\r\n    sys.stdout.flush()\r\n    last_checkpoint_filename = cur_checkpoint\r\n    already_seen_state = False\r\n    evaluate_checkpoint(cur_checkpoint)\r\n  # Case 3: Previously evaluated checkpoint\r\n  else:\r\n    if not already_seen_state:\r\n      print('Waiting for the next checkpoint ...   ({})   '.format(\r\n            datetime.now()),\r\n            end='')\r\n      already_seen_state = True\r\n    else:\r\n      print('.', end='')\r\n    sys.stdout.flush()\r\n    time.sleep(10)\r\n"
  },
  {
    "path": "madry_mnist/fetch_model.py",
    "content": "\"\"\"Downloads a model, computes its SHA256 hash and unzips it\r\n   at the proper location.\"\"\"\r\nfrom __future__ import absolute_import\r\nfrom __future__ import division\r\nfrom __future__ import print_function\r\n\r\nimport sys\r\nimport zipfile\r\nimport hashlib\r\n\r\nif len(sys.argv) != 2 or sys.argv[1] not in ['natural',\r\n                                             'adv_trained',\r\n                                             'secret']:\r\n  print('Usage: python fetch_model.py [natural, adv_trained, secret]')\r\n  sys.exit(1)\r\n\r\nif sys.argv[1] == 'natural':\r\n  url = 'https://github.com/MadryLab/mnist_challenge_models/raw/master/natural.zip'\r\nelif sys.argv[1] == 'secret':\r\n  url = 'https://github.com/MadryLab/mnist_challenge_models/raw/master/secret.zip'\r\nelse: # fetch adv_trained model\r\n  url = 'https://github.com/MadryLab/mnist_challenge_models/raw/master/adv_trained.zip'\r\n\r\nfname = url.split('/')[-1]  # get the name of the file\r\n\r\n# model download\r\nprint('Downloading models')\r\nif sys.version_info >= (3,):\r\n  import urllib.request\r\n  urllib.request.urlretrieve(url, fname)\r\nelse:\r\n  import urllib\r\n  urllib.urlretrieve(url, fname)\r\n\r\n# computing model hash\r\nsha256 = hashlib.sha256()\r\nwith open(fname, 'rb') as f:\r\n  data = f.read()\r\n  sha256.update(data)\r\nprint('SHA256 hash: {}'.format(sha256.hexdigest()))\r\n\r\n# extracting model\r\nprint('Extracting model')\r\nwith zipfile.ZipFile(fname, 'r') as model_zip:\r\n  model_zip.extractall()\r\n  print('Extracted model in {}'.format(model_zip.namelist()[0]))\r\n"
  },
  {
    "path": "madry_mnist/model.py",
    "content": "\"\"\"\r\nThe model is adapted from the tensorflow tutorial:\r\nhttps://www.tensorflow.org/get_started/mnist/pros\r\n\"\"\"\r\nfrom __future__ import absolute_import\r\nfrom __future__ import division\r\nfrom __future__ import print_function\r\n\r\nimport tensorflow as tf\r\n\r\nclass Model(object):\r\n  def __init__(self):\r\n    self.x_input = tf.placeholder(tf.float32, shape = [None, 784])\r\n    self.y_input = tf.placeholder(tf.int64, shape = [None])\r\n\r\n    self.x_image = tf.reshape(self.x_input, [-1, 28, 28, 1])\r\n\r\n    # first convolutional layer\r\n    W_conv1 = self._weight_variable([5,5,1,32])\r\n    b_conv1 = self._bias_variable([32])\r\n\r\n    h_conv1 = tf.nn.relu(self._conv2d(self.x_image, W_conv1) + b_conv1)\r\n    h_pool1 = self._max_pool_2x2(h_conv1)\r\n\r\n    # second convolutional layer\r\n    W_conv2 = self._weight_variable([5,5,32,64])\r\n    b_conv2 = self._bias_variable([64])\r\n\r\n    h_conv2 = tf.nn.relu(self._conv2d(h_pool1, W_conv2) + b_conv2)\r\n    h_pool2 = self._max_pool_2x2(h_conv2)\r\n\r\n    # first fully connected layer\r\n    W_fc1 = self._weight_variable([7 * 7 * 64, 1024])\r\n    b_fc1 = self._bias_variable([1024])\r\n\r\n    h_pool2_flat = tf.reshape(h_pool2, [-1, 7 * 7 * 64])\r\n    h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)\r\n\r\n    # output layer\r\n    W_fc2 = self._weight_variable([1024,10])\r\n    b_fc2 = self._bias_variable([10])\r\n\r\n    self.pre_softmax = tf.matmul(h_fc1, W_fc2) + b_fc2\r\n\r\n    y_xent = tf.nn.sparse_softmax_cross_entropy_with_logits(\r\n        labels=self.y_input, logits=self.pre_softmax)\r\n\r\n    self.xent_per_point = y_xent\r\n    self.xent = tf.reduce_sum(y_xent)\r\n\r\n    self.y_pred = tf.argmax(self.pre_softmax, 1)\r\n\r\n    correct_prediction = tf.equal(self.y_pred, self.y_input)\r\n\r\n    self.num_correct = tf.reduce_sum(tf.cast(correct_prediction, tf.int64))\r\n    self.accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))\r\n\r\n  @staticmethod\r\n  def _weight_variable(shape):\r\n      initial = tf.truncated_normal(shape, stddev=0.1)\r\n      return tf.Variable(initial)\r\n\r\n  @staticmethod\r\n  def _bias_variable(shape):\r\n      initial = tf.constant(0.1, shape = shape)\r\n      return tf.Variable(initial)\r\n\r\n  @staticmethod\r\n  def _conv2d(x, W):\r\n      return tf.nn.conv2d(x, W, strides=[1,1,1,1], padding='SAME')\r\n\r\n  @staticmethod\r\n  def _max_pool_2x2( x):\r\n      return tf.nn.max_pool(x,\r\n                            ksize = [1,2,2,1],\r\n                            strides=[1,2,2,1],\r\n                            padding='SAME')\r\n"
  },
  {
    "path": "madry_mnist/run_attack.py",
    "content": "\"\"\"Evaluates a model against examples from a .npy file as specified\r\n   in config.json\"\"\"\r\nfrom __future__ import absolute_import\r\nfrom __future__ import division\r\nfrom __future__ import print_function\r\n\r\nfrom datetime import datetime\r\nimport json\r\nimport math\r\nimport os\r\nimport sys\r\nimport time\r\n\r\nimport tensorflow as tf\r\nfrom tensorflow.examples.tutorials.mnist import input_data\r\n\r\nimport numpy as np\r\n\r\nfrom model import Model\r\n\r\n\r\ndef run_attack(checkpoint, x_adv, epsilon):\r\n    mnist = input_data.read_data_sets('MNIST_data', one_hot=False)\r\n\r\n    model = Model()\r\n\r\n    saver = tf.train.Saver()\r\n\r\n    num_eval_examples = 10000\r\n    eval_batch_size = 64\r\n\r\n    num_batches = int(math.ceil(num_eval_examples / eval_batch_size))\r\n    total_corr = 0\r\n\r\n    x_nat = mnist.test.images\r\n    l_inf = np.amax(np.abs(x_nat - x_adv))\r\n\r\n    if l_inf > epsilon + 0.0001:\r\n        print('maximum perturbation found: {}'.format(l_inf))\r\n        print('maximum perturbation allowed: {}'.format(epsilon))\r\n        return\r\n\r\n    y_pred = []  # label accumulator\r\n\r\n    with tf.Session() as sess:\r\n        # Restore the checkpoint\r\n        saver.restore(sess, checkpoint)\r\n\r\n        # Iterate over the samples batch-by-batch\r\n        for ibatch in range(num_batches):\r\n            bstart = ibatch * eval_batch_size\r\n            bend = min(bstart + eval_batch_size, num_eval_examples)\r\n\r\n            x_batch = x_adv[bstart:bend, :]\r\n            y_batch = mnist.test.labels[bstart:bend]\r\n\r\n            dict_adv = {model.x_input: x_batch,\r\n                        model.y_input: y_batch}\r\n            cur_corr, y_pred_batch = sess.run([model.num_correct, model.y_pred],\r\n                                              feed_dict=dict_adv)\r\n\r\n            total_corr += cur_corr\r\n            y_pred.append(y_pred_batch)\r\n\r\n    accuracy = total_corr / num_eval_examples\r\n\r\n    print('Accuracy: {:.2f}%'.format(100.0 * accuracy))\r\n    y_pred = np.concatenate(y_pred, axis=0)\r\n    np.save('pred.npy', y_pred)\r\n    print('Output saved at pred.npy')\r\n\r\n\r\nif __name__ == '__main__':\r\n    import json\r\n\r\n    with open('config.json') as config_file:\r\n        config = json.load(config_file)\r\n\r\n    model_dir = config['model_dir']\r\n\r\n    checkpoint = tf.train.latest_checkpoint(model_dir)\r\n    x_adv = np.load(config['store_adv_path'])\r\n\r\n    if checkpoint is None:\r\n        print('No checkpoint found')\r\n    elif x_adv.shape != (10000, 784):\r\n        print('Invalid shape: expected (10000,784), found {}'.format(x_adv.shape))\r\n    elif np.amax(x_adv) > 1.0001 or \\\r\n        np.amin(x_adv) < -0.0001 or \\\r\n        np.isnan(np.amax(x_adv)):\r\n        print('Invalid pixel range. Expected [0, 1], found [{}, {}]'.format(\r\n            np.amin(x_adv),\r\n            np.amax(x_adv)))\r\n    else:\r\n        run_attack(checkpoint, x_adv, config['epsilon'])\r\n"
  },
  {
    "path": "madry_mnist/train.py",
    "content": "\"\"\"Trains a model, saving checkpoints and tensorboard summaries along\r\n   the way.\"\"\"\r\nfrom __future__ import absolute_import\r\nfrom __future__ import division\r\nfrom __future__ import print_function\r\n\r\nfrom datetime import datetime\r\nimport json\r\nimport os\r\nimport shutil\r\nfrom timeit import default_timer as timer\r\n\r\nimport tensorflow as tf\r\nimport numpy as np\r\nfrom tensorflow.examples.tutorials.mnist import input_data\r\n\r\nfrom model import Model\r\nfrom attack import LinfPGDAttack\r\n\r\nwith open('config.json') as config_file:\r\n    config = json.load(config_file)\r\n\r\n# Setting up training parameters\r\ntf.set_random_seed(config['random_seed'])\r\n\r\nmax_num_training_steps = config['max_num_training_steps']\r\nnum_output_steps = config['num_output_steps']\r\nnum_summary_steps = config['num_summary_steps']\r\nnum_checkpoint_steps = config['num_checkpoint_steps']\r\n\r\nbatch_size = config['training_batch_size']\r\n\r\n# Setting up the data and the model\r\nmnist = input_data.read_data_sets('MNIST_data', one_hot=False)\r\nglobal_step = tf.contrib.framework.get_or_create_global_step()\r\nmodel = Model()\r\n\r\n# Setting up the optimizer\r\ntrain_step = tf.train.AdamOptimizer(1e-4).minimize(model.xent,\r\n                                                   global_step=global_step)\r\n\r\n# Set up adversary\r\nattack = LinfPGDAttack(model, \r\n                       config['epsilon'],\r\n                       config['k'],\r\n                       config['a'],\r\n                       config['random_start'],\r\n                       config['loss_func'])\r\n\r\n# Setting up the Tensorboard and checkpoint outputs\r\nmodel_dir = config['model_dir']\r\nif not os.path.exists(model_dir):\r\n  os.makedirs(model_dir)\r\n\r\n# We add accuracy and xent twice so we can easily make three types of\r\n# comparisons in Tensorboard:\r\n# - train vs eval (for a single run)\r\n# - train of different runs\r\n# - eval of different runs\r\n\r\nsaver = tf.train.Saver(max_to_keep=3)\r\ntf.summary.scalar('accuracy adv train', model.accuracy)\r\ntf.summary.scalar('accuracy adv', model.accuracy)\r\ntf.summary.scalar('xent adv train', model.xent / batch_size)\r\ntf.summary.scalar('xent adv', model.xent / batch_size)\r\ntf.summary.image('images adv train', model.x_image)\r\nmerged_summaries = tf.summary.merge_all()\r\n\r\nshutil.copy('config.json', model_dir)\r\n\r\nwith tf.Session() as sess:\r\n  # Initialize the summary writer, global variables, and our time counter.\r\n  summary_writer = tf.summary.FileWriter(model_dir, sess.graph)\r\n  sess.run(tf.global_variables_initializer())\r\n  training_time = 0.0\r\n\r\n  # Main training loop\r\n  for ii in range(max_num_training_steps):\r\n    x_batch, y_batch = mnist.train.next_batch(batch_size)\r\n\r\n    # Compute Adversarial Perturbations\r\n    start = timer()\r\n    x_batch_adv = attack.perturb(x_batch, y_batch, sess)\r\n    end = timer()\r\n    training_time += end - start\r\n\r\n    nat_dict = {model.x_input: x_batch,\r\n                model.y_input: y_batch}\r\n\r\n    adv_dict = {model.x_input: x_batch_adv,\r\n                model.y_input: y_batch}\r\n\r\n    # Output to stdout\r\n    if ii % num_output_steps == 0:\r\n      nat_acc = sess.run(model.accuracy, feed_dict=nat_dict)\r\n      adv_acc = sess.run(model.accuracy, feed_dict=adv_dict)\r\n      print('Step {}:    ({})'.format(ii, datetime.now()))\r\n      print('    training nat accuracy {:.4}%'.format(nat_acc * 100))\r\n      print('    training adv 
accuracy {:.4}%'.format(adv_acc * 100))\r\n      if ii != 0:\r\n        print('    {} examples per second'.format(\r\n            num_output_steps * batch_size / training_time))\r\n        training_time = 0.0\r\n    # Tensorboard summaries\r\n    if ii % num_summary_steps == 0:\r\n      summary = sess.run(merged_summaries, feed_dict=adv_dict)\r\n      summary_writer.add_summary(summary, global_step.eval(sess))\r\n\r\n    # Write a checkpoint\r\n    if ii % num_checkpoint_steps == 0:\r\n      saver.save(sess,\r\n                 os.path.join(model_dir, 'checkpoint'),\r\n                 global_step=global_step)\r\n\r\n    # Actual training step\r\n    start = timer()\r\n    sess.run(train_step, feed_dict=adv_dict)\r\n    end = timer()\r\n    training_time += end - start\r\n"
  },
  {
    "path": "models.py",
    "content": "import torch\nimport tensorflow as tf\nimport numpy as np\nimport math\nimport utils\nfrom torchvision import models as torch_models\nfrom torch.nn import DataParallel\nfrom madry_mnist.model import Model as madry_model_mnist\nfrom madry_cifar10.model import Model as madry_model_cifar10\nfrom logit_pairing.models import LeNet as lp_model_mnist, ResNet20_v2 as lp_model_cifar10\nfrom post_avg.postAveragedModels import pa_resnet110_config1 as post_avg_cifar10_resnet\nfrom post_avg.postAveragedModels import pa_resnet152_config1 as post_avg_imagenet_resnet\n\n\nclass Model:\n    def __init__(self, batch_size, gpu_memory):\n        self.batch_size = batch_size\n        self.gpu_memory = gpu_memory\n\n    def predict(self, x):\n        raise NotImplementedError('use ModelTF or ModelPT')\n\n    def loss(self, y, logits, targeted=False, loss_type='margin_loss'):\n        \"\"\" Implements the margin loss (difference between the correct and 2nd best class). \"\"\"\n        if loss_type == 'margin_loss':\n            preds_correct_class = (logits * y).sum(1, keepdims=True)\n            diff = preds_correct_class - logits  # difference between the correct class and all other classes\n            diff[y] = np.inf  # to exclude zeros coming from f_correct - f_correct\n            margin = diff.min(1, keepdims=True)\n            loss = margin * -1 if targeted else margin\n        elif loss_type == 'cross_entropy':\n            probs = utils.softmax(logits)\n            loss = -np.log(probs[y])\n            loss = loss * -1 if not targeted else loss\n        else:\n            raise ValueError('Wrong loss.')\n        return loss.flatten()\n\n\nclass ModelTF(Model):\n    \"\"\"\n    Wrapper class around TensorFlow models.\n\n    In order to incorporate a new model, one has to ensure that self.model has a TF variable `logits`,\n    and that the preprocessing of the inputs is done correctly (e.g. 
subtracting the mean and dividing over the\n    standard deviation).\n    \"\"\"\n    def __init__(self, model_name, batch_size, gpu_memory):\n        super().__init__(batch_size, gpu_memory)\n        model_folder = model_path_dict[model_name]\n        model_file = tf.train.latest_checkpoint(model_folder)\n        self.model = model_class_dict[model_name]()\n        self.batch_size = batch_size\n        self.model_name = model_name\n        self.model_file = model_file\n        if 'logits' not in self.model.__dict__:\n            self.model.logits = self.model.pre_softmax\n\n        gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=gpu_memory)\n        config = tf.ConfigProto(allow_soft_placement=True, gpu_options=gpu_options)\n        self.sess = tf.Session(config=config)\n        tf.train.Saver().restore(self.sess, model_file)\n\n    def predict(self, x):\n        if 'mnist' in self.model_name:\n            shape = self.model.x_input.shape[1:].as_list()\n            x = np.reshape(x, [-1, *shape])\n        elif 'cifar10' in self.model_name:\n            x = np.transpose(x, axes=[0, 2, 3, 1])\n\n        n_batches = math.ceil(x.shape[0] / self.batch_size)\n        logits_list = []\n        for i in range(n_batches):\n            x_batch = x[i*self.batch_size:(i+1)*self.batch_size]\n            logits = self.sess.run(self.model.logits, feed_dict={self.model.x_input: x_batch})\n            logits_list.append(logits)\n        logits = np.vstack(logits_list)\n        return logits\n\n\nclass ModelPT(Model):\n    \"\"\"\n    Wrapper class around PyTorch models.\n\n    In order to incorporate a new model, one has to ensure that self.model is a callable object that returns logits,\n    and that the preprocessing of the inputs is done correctly (e.g. 
subtracting the mean and dividing over the\n    standard deviation).\n    \"\"\"\n    def __init__(self, model_name, batch_size, gpu_memory):\n        super().__init__(batch_size, gpu_memory)\n        if model_name in ['pt_vgg', 'pt_resnet', 'pt_inception', 'pt_densenet']:\n            model = model_class_dict[model_name](pretrained=True)\n            self.mean = np.reshape([0.485, 0.456, 0.406], [1, 3, 1, 1])\n            self.std = np.reshape([0.229, 0.224, 0.225], [1, 3, 1, 1])\n            model = DataParallel(model.cuda())\n        else:\n            model = model_class_dict[model_name]()\n            if model_name in ['pt_post_avg_cifar10', 'pt_post_avg_imagenet']:\n                # checkpoint = torch.load(model_path_dict[model_name])\n                self.mean = np.reshape([0.485, 0.456, 0.406], [1, 3, 1, 1])\n                self.std = np.reshape([0.229, 0.224, 0.225], [1, 3, 1, 1])\n            else:\n                model = DataParallel(model).cuda()\n                checkpoint = torch.load(model_path_dict[model_name] + '.pth')\n                self.mean = np.reshape([0.485, 0.456, 0.406], [1, 3, 1, 1])\n                self.std = np.reshape([0.225, 0.225, 0.225], [1, 3, 1, 1])\n                model.load_state_dict(checkpoint)\n                model.float()\n        self.mean, self.std = self.mean.astype(np.float32), self.std.astype(np.float32)\n\n        model.eval()\n        self.model = model\n\n    def predict(self, x):\n        x = (x - self.mean) / self.std\n        x = x.astype(np.float32)\n\n        n_batches = math.ceil(x.shape[0] / self.batch_size)\n        logits_list = []\n        with torch.no_grad():  # otherwise consumes too much memory and leads to a slowdown\n            for i in range(n_batches):\n                x_batch = x[i*self.batch_size:(i+1)*self.batch_size]\n                x_batch_torch = torch.as_tensor(x_batch, device=torch.device('cuda'))\n                logits = self.model(x_batch_torch).cpu().numpy()\n                logits_list.append(logits)\n        logits = np.vstack(logits_list)\n        return logits\n\n\nmodel_path_dict = {'madry_mnist_robust': 'madry_mnist/models/robust',\n                   'madry_cifar10_robust': 'madry_cifar10/models/robust',\n                   'clp_mnist': 'logit_pairing/models/clp_mnist',\n                   'lsq_mnist': 'logit_pairing/models/lsq_mnist',\n                   'clp_cifar10': 'logit_pairing/models/clp_cifar10',\n                   'lsq_cifar10': 'logit_pairing/models/lsq_cifar10',\n                   'pt_post_avg_cifar10': 'post_avg/trainedModel/resnet110.th'\n                   }\nmodel_class_dict = {'pt_vgg': torch_models.vgg16_bn,\n                    'pt_resnet': torch_models.resnet50,\n                    'pt_inception': torch_models.inception_v3,\n                    'pt_densenet': torch_models.densenet121,\n                    'madry_mnist_robust': madry_model_mnist,\n                    'madry_cifar10_robust': madry_model_cifar10,\n                    'clp_mnist': lp_model_mnist,\n                    'lsq_mnist': lp_model_mnist,\n                    'clp_cifar10': lp_model_cifar10,\n                    'lsq_cifar10': lp_model_cifar10,\n                    'pt_post_avg_cifar10': post_avg_cifar10_resnet,\n                    'pt_post_avg_imagenet': post_avg_imagenet_resnet,\n                    }\nall_model_names = list(model_class_dict.keys())\n\n"
  },
  {
    "path": "post_avg/LICENSE.txt",
    "content": "MIT License\n\nCopyright (c) [2019] [Yuping Lin]\n\nPermission is hereby granted, free of charge, to any person obtaining a copy\nof this software and associated documentation files (the \"Software\"), to deal\nin the Software without restriction, including without limitation the rights\nto use, copy, modify, merge, publish, distribute, sublicense, and/or sell\ncopies of the Software, and to permit persons to whom the Software is\nfurnished to do so, subject to the following conditions:\n\nThe above copyright notice and this permission notice shall be included in all\ncopies or substantial portions of the Software.\n\nTHE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\nIMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\nFITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE\nAUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\nLIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\nOUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE\nSOFTWARE."
  },
  {
    "path": "post_avg/PADefense.py",
    "content": "# -*- coding: utf-8 -*-\n\nimport time\n\nimport torch\nimport torch.nn as nn\nimport torch.utils.data as data\nimport torch.cuda as cuda\nimport torchvision.transforms as transforms\nimport torchvision.utils as utl\nimport torch.backends.cudnn as cudnn\n\nimport torchvision.datasets as datasets\nimport torchvision.models as mdl\n\ndef checkEntropy(scores):\n    scores = scores.squeeze()\n    scr = scores.clone()\n    scr[scr <= 0] = 1.0\n    return - torch.sum(scores * torch.log(scr))\n    \n\ndef checkConfidence(scores, K=10):\n    scores = scores.squeeze()\n    hScores, _ = torch.sort(scores, dim=0, descending=True)\n    \n    return hScores[0] / torch.sum(hScores[:K])\n    \n\ndef integratedForward(model, sps, batchSize, nClasses, device='cpu', voteMethod='avg_softmax'):\n    N = sps.size(0)\n    feats = torch.empty(N, nClasses)\n    model = model.to(device)\n    \n    with torch.no_grad():\n        baseInx = 0\n        while baseInx < N:\n            cuda.empty_cache()\n            endInx = min(baseInx + batchSize, N)\n            y = model(sps[baseInx:endInx, :].to(device)).detach().to('cpu')\n            feats[baseInx:endInx, :] = y\n            baseInx = endInx\n    \n    if voteMethod == 'avg_feat':\n        feat = torch.mean(feats, dim=0, keepdim=True)\n    elif voteMethod == 'most_vote':\n        maxV, _ = torch.max(feats, dim=1, keepdim=True)\n        feat = torch.sum(feats == maxV, dim=0, keepdim=True)\n    elif voteMethod == 'weighted_feat':\n        feat = torch.mean(feats, dim=0, keepdim=True)\n        maxV, _ = torch.max(feats, dim=1, keepdim=True)\n        feat = feat * torch.sum(feats == maxV, dim=0, keepdim=True).float()\n    elif voteMethod == 'avg_softmax':\n        feats = nn.functional.softmax(feats, dim=1)\n        feat = torch.mean(feats, dim=0, keepdim=True)\n    else:\n        # default method: avg_softmax\n        feats = nn.functional.softmax(feats, dim=1)\n        feat = torch.mean(feats, dim=0, keepdim=True)\n    \n    return feat, feats\n\n# not updated, deprecated\ndef integratedForward_cls(model, sps, batchSize, nClasses, device='cpu', count_votes=False):\n    N = sps.size(0)\n    feats = torch.empty(N, nClasses)\n    model = model.to(device)\n    \n    with torch.no_grad():\n        baseInx = 0\n        while baseInx < N:\n            cuda.empty_cache()\n            endInx = min(baseInx + batchSize, N)\n            y = model.classifier(sps[baseInx:endInx, :].to(device)).detach().to('cpu')\n            feats[baseInx:endInx, :] = y\n            baseInx = endInx\n    \n    if count_votes:\n        maxV, _ = torch.max(feats, dim=1, keepdim=True)\n        feat = torch.sum(feats == maxV, dim=0, keepdim=True)\n    else:\n        feat = torch.mean(feats, dim=0, keepdim=True)\n    \n    return feat, feats\n\n\ndef findNeighbors_random(sp, K, r=[2], direction='both'):\n    # only accept single sample\n    if sp.size(0) != 1:\n        return None\n        \n    if isinstance(K, list):\n        K = sum(K)\n        \n    # randomly select directions\n    shifts = torch.randn(K, sp.size(1) * sp.size(2) * sp.size(3)).to('cuda')\n    shifts = nn.functional.normalize(shifts, p=2, dim=1)\n    shifts = shifts.view(K, sp.size(1), sp.size(2), sp.size(3)).contiguous()\n    \n    if direction == 'both':\n        shifts = torch.cat([shifts, -shifts], dim=0)\n    \n    nbs = []\n    for rInx in range(len(r)):\n        nbs.append(sp.to('cuda') + r[rInx] * shifts)\n\n    return torch.cat(nbs, dim=0)\n    \n\ndef findNeighbors_plain_vgg(model, sp, K, r=[2], 
direction='both', device='cpu'):\n    # only accept single sample\n    if sp.size(0) != 1:\n        return None\n    \n    # storages for K selected distances and linear mapping\n    selected_list = []\n    \n    # set model to evaluation mode\n    model = model.to(device)\n    model = model.eval()\n    \n    # place holder for input, and set to require gradient\n    x = sp.clone().to(device)\n    x.requires_grad = True\n    \n    # forward through the feature part\n    y = model.features(x)\n    y = model.avgpool(y)\n    y = y.view(y.size(0), -1)\n    \n    # forward through classifier layer by layer\n    for lyInx, module in model.classifier.named_children():\n        # forward\n        y = module(y)\n        \n        # at each layer activation\n        if isinstance(module, nn.Linear):\n            # for each neuron\n            for i in range(y.size(1)):\n                # clear previous gradients\n                x.grad = None\n                \n                # compute gradients\n                goal = torch.abs(y[0, i])\n                goal.backward(retain_graph=True)    # retain graph for future computation\n                \n                # compute distance\n                d = torch.abs(y[0, i]) / torch.norm(x.grad)\n                \n                # keep K shortest distances\n                selected_list.append((d.clone().detach().to('cpu'), x.grad.clone().detach().to('cpu')))\n                selected_list = sorted(selected_list, key=lambda x:x[0], reverse=False)\n                selected_list = selected_list[0:K]\n    \n    # generate neighboring samples\n    grad_list = [e[1] / torch.norm(e[1]) for e in selected_list]\n    unit_shifts = torch.cat(grad_list, dim=0)\n    nbs = []\n    for rInx in range(len(r)):\n        if direction == 'inc':\n            nbs.append(sp.to('cpu') + r[rInx] * unit_shifts)\n        elif direction == 'dec':\n            nbs.append(sp.to('cpu') - r[rInx] * unit_shifts)\n        else:\n            nbs.append(sp.to('cpu') + r[rInx] * unit_shifts)\n            nbs.append(sp.to('cpu') - r[rInx] * unit_shifts)\n    nbs = torch.cat(nbs, dim=0)\n    nbs = nbs.detach()\n    nbs.requires_grad = False\n    \n    return nbs\n    \n\ndef findNeighbors_lastLy_vgg(model, sp, K, r=[2], direction='both', device='cpu'):\n    # only accept single sample\n    if sp.size(0) != 1:\n        return None\n    \n    # storages for K selected distances and linear mapping\n    selected_list = []\n    \n    # set model to evaluation mode\n    model = model.to(device)\n    model = model.eval()\n    \n    # place holder for input, and set to require gradient\n    x = sp.clone().to(device)\n    x.requires_grad = True\n    \n    # forward through the feature part\n    y = model(x)\n    y = y.view(y.size(0), -1)\n\n    for i in range(y.size(1)):\n        # clear previous gradients\n        x.grad = None\n                \n        # compute gradients\n        goal = torch.abs(y[0, i])\n        if i < y.size(1) - 1:\n            goal.backward(retain_graph=True)    # retain graph for future computation\n        else:\n            goal.backward(retain_graph=False)\n                \n        # compute distance\n        d = torch.abs(y[0, i]) / torch.norm(x.grad)\n                \n        # keep K shortest distances\n        selected_list.append((d.clone().detach().to('cpu'), x.grad.clone().detach().to('cpu')))\n        selected_list = sorted(selected_list, key=lambda x:x[0], reverse=False)\n        selected_list = selected_list[0:K]\n    \n    # generate neighboring samples\n   
 grad_list = [e[1] / torch.norm(e[1]) for e in selected_list]\n    unit_shifts = torch.cat(grad_list, dim=0)\n    nbs = []\n    for rInx in range(len(r)):\n        if direction == 'inc':\n            nbs.append(sp.to('cpu') + r[rInx] * unit_shifts)\n        elif direction == 'dec':\n            nbs.append(sp.to('cpu') - r[rInx] * unit_shifts)\n        else:\n            nbs.append(sp.to('cpu') + r[rInx] * unit_shifts)\n            nbs.append(sp.to('cpu') - r[rInx] * unit_shifts)\n    nbs = torch.cat(nbs, dim=0)\n    nbs = nbs.detach()\n    nbs.requires_grad = False\n    \n    return nbs\n    \n    \ndef findNeighbors_approx_vgg(model, sp, K, r=[2], direction='both', device='cpu'):\n    # only accept single sample\n    if sp.size(0) != 1:\n        return None\n    \n    # storages for K selected distances and linear mapping\n    selected_list = []\n    \n    # set model to evaluation mode\n    model = model.to(device)\n    model = model.eval()\n    \n    # place holder for input, and set to require gradient\n    x = sp.clone().to(device)\n    x.requires_grad = True\n    \n    # forward through the feature part\n    y = model.features(x)\n    y = model.avgpool(y)\n    y = y.view(y.size(0), -1)\n    \n    # forward through classifier layer by layer\n    lnLy_inx = 0\n    for lyInx, module in model.classifier.named_children():\n        # forward\n        y = module(y)\n        \n        # at each layer activation\n        if isinstance(module, nn.Linear):\n            KInx = min(lnLy_inx, len(K)-1)\n            if K[KInx] > 0:\n                with torch.no_grad():\n                    # compute weight norm\n                    w_norm = torch.norm(module.weight, dim=1, keepdim=True)\n            \n                    # compute distance\n                    d = torch.abs(y) / w_norm.t()\n                    _, sortedInx = torch.sort(d, dim=1, descending=False)\n            \n                # for each selected neuron\n                for i in range(K[KInx]):\n                \n                    # clear previous gradients\n                    x.grad = None\n                \n                    # compute gradients\n                    goal = torch.abs(y[0, sortedInx[0, i]])\n                    goal.backward(retain_graph=True)    # retain graph for future computation\n                \n                    # record gradients\n                    selected_list.append(x.grad.clone().detach().to('cpu') / torch.norm(x.grad).detach().to('cpu'))\n            \n            # update number of linear layer sampled\n            lnLy_inx = lnLy_inx + 1\n    \n    # generate neighboring samples\n    unit_shifts = torch.cat(selected_list, dim=0)\n    nbs = []\n    for rInx in range(len(r)):\n        if direction == 'inc':\n            nbs.append(sp.to('cpu') + r[rInx] * unit_shifts)\n        elif direction == 'dec':\n            nbs.append(sp.to('cpu') - r[rInx] * unit_shifts)\n        else:\n            nbs.append(sp.to('cpu') + r[rInx] * unit_shifts)\n            nbs.append(sp.to('cpu') - r[rInx] * unit_shifts)\n    nbs = torch.cat(nbs, dim=0)\n    nbs = nbs.detach()\n    nbs.requires_grad = False\n    \n    return nbs\n    \n    \ndef findNeighbors_randPick_vgg(model, sp, K, r=[2], direction='both', device='cpu'):\n    # only accept single sample\n    if sp.size(0) != 1:\n        return None\n    \n    # storages for K selected distances and linear mapping\n    selected_list = []\n    \n    # set model to evaluation mode\n    model = model.to(device)\n    model = model.eval()\n    \n    # place holder 
for input, and set to require gradient\n    x = sp.clone().to(device)\n    x.requires_grad = True\n    \n    # forward through the feature part\n    y = model.features(x)\n    y = model.avgpool(y)\n    y = y.view(y.size(0), -1)\n    \n    # forward through classifier layer by layer\n    lnLy_inx = 0\n    for lyInx, module in model.classifier.named_children():\n        # forward\n        y = module(y)\n        \n        # at each layer activation\n        if isinstance(module, nn.Linear):\n            KInx = min(lnLy_inx, len(K)-1)\n            if K[KInx] > 0:\n                # randomly permute indices\n                pickInx = torch.randperm(y.size(1))\n            \n                # for each selected neuron\n                for i in range(K[KInx]):\n                \n                    # clear previous gradients\n                    x.grad = None\n                \n                    # compute gradients\n                    goal = torch.abs(y[0, pickInx[i]])\n                    goal.backward(retain_graph=True)    # retain graph for future computation\n                \n                    # record gradients\n                    selected_list.append(x.grad.clone().detach().to('cpu') / torch.norm(x.grad).detach().to('cpu'))\n            \n            # update number of linear layer sampled\n            lnLy_inx = lnLy_inx + 1\n    \n    # generate neighboring samples\n    unit_shifts = torch.cat(selected_list, dim=0)\n    nbs = []\n    for rInx in range(len(r)):\n        if direction == 'inc':\n            nbs.append(sp.to('cpu') + r[rInx] * unit_shifts)\n        elif direction == 'dec':\n            nbs.append(sp.to('cpu') - r[rInx] * unit_shifts)\n        else:\n            nbs.append(sp.to('cpu') + r[rInx] * unit_shifts)\n            nbs.append(sp.to('cpu') - r[rInx] * unit_shifts)\n    nbs = torch.cat(nbs, dim=0)\n    nbs = nbs.detach()\n    nbs.requires_grad = False\n    \n    return nbs\n\n# not updated, deprecated\ndef findNeighbors_feats_lastLy_vgg(model, sp, K, r=[2], direction='both', device='cpu', includeOriginal=True):\n    # only accept single sample\n    if sp.size(0) != 1:\n        return None\n    \n    # storages for K selected distances and linear mapping\n    selected_list = []\n    \n    # set model to evaluation mode\n    model = model.to(device)\n    model = model.eval()\n    \n    # forward through the feature part\n    with torch.no_grad():\n        feat = model.features(sp.to(device))\n        feat = feat.view(feat.size(0), -1).contiguous().detach()\n        \n    # place holder for feature, and set to require gradient\n    x = feat.clone().detach()\n    x.requires_grad = True\n    \n    # forward through the classifier part\n    y = model.classifier(x)\n    y = y.view(y.size(0), -1)\n\n    for i in range(y.size(1)):\n        # clear previous gradients\n        x.grad = None\n                \n        # compute gradients\n        goal = torch.abs(y[0, i])\n        if i < y.size(1) - 1:\n            goal.backward(retain_graph=True)    # retain graph for future computation\n        else:\n            goal.backward(retain_graph=False)\n                \n        # compute distance\n        d = torch.abs(y[0, i]) / torch.norm(x.grad)\n                \n        # keep K shortest distances\n        selected_list.append((d.clone().detach().to('cpu'), x.grad.clone().detach().to('cpu')))\n        selected_list = sorted(selected_list, key=lambda x:x[0], reverse=False)\n        selected_list = selected_list[0:K]\n    \n    # generate neighboring samples\n    grad_list = 
[e[1] / torch.norm(e[1]) for e in selected_list]\n    unit_shifts = torch.cat(grad_list, dim=0)\n    if includeOriginal:\n        nbs = [feat.to('cpu')]\n    else:\n        nbs = []\n    for rInx in range(len(r)):\n        if direction == 'inc':\n            nbs.append(feat.to('cpu') + r[rInx] * unit_shifts)\n        elif direction == 'dec':\n            nbs.append(feat.to('cpu') - r[rInx] * unit_shifts)\n        else:\n            nbs.append(feat.to('cpu') + r[rInx] * unit_shifts)\n            nbs.append(feat.to('cpu') - r[rInx] * unit_shifts)\n    nbs = torch.cat(nbs, dim=0)\n    nbs = nbs.detach()\n    nbs.requires_grad = False\n    \n    return nbs\n\n# not updated, deprecated\ndef findNeighbors_feats_approx_vgg(model, sp, K, r=[2], direction='both', device='cpu', includeOriginal=True):\n    # only accept single sample\n    if sp.size(0) != 1:\n        return None\n    \n    # storages for K selected distances and linear mapping\n    selected_list = []\n    \n    # set model to evaluation mode\n    model = model.to(device)\n    model = model.eval()\n    \n    # forward through the feature part\n    with torch.no_grad():\n        feat = model.features(sp.to(device))\n        feat = feat.view(feat.size(0), -1).contiguous().detach()\n        \n    # place holder for feature, and set to require gradient\n    x = feat.clone().detach()\n    x.requires_grad = True\n    y = x\n    \n    # forward through classifier layer by layer\n    lnLy_inx = 0\n    for lyInx, module in model.classifier.named_children():\n        # forward\n        y = module(y)\n        \n        # at each layer activation\n        if isinstance(module, nn.Linear):\n            KInx = min(lnLy_inx, len(K)-1)\n            if K[KInx] > 0:\n                with torch.no_grad():\n                    # compute weight norm\n                    w_norm = torch.norm(module.weight, dim=1, keepdim=True)\n            \n                    # compute distance\n                    d = torch.abs(y) / w_norm.t()\n                    _, sortedInx = torch.sort(d, dim=1, descending=False)\n            \n                # for each selected neuron\n                for i in range(K[KInx]):\n                \n                    # clear previous gradients\n                    x.grad = None\n                \n                    # compute gradients\n                    goal = torch.abs(y[0, sortedInx[0, i]])\n                    goal.backward(retain_graph=True)    # retain graph for future computation\n                \n                    # record gradients\n                    selected_list.append(x.grad.clone().detach().to('cpu') / torch.norm(x.grad).detach().to('cpu'))\n            \n            # update number of linear layer sampled\n            lnLy_inx = lnLy_inx + 1\n    \n    # generate neighboring samples\n    unit_shifts = torch.cat(selected_list, dim=0)\n    if includeOriginal:\n        nbs = [feat.to('cpu')]\n    else:\n        nbs = []\n    for rInx in range(len(r)):\n        if direction == 'inc':\n            nbs.append(feat.to('cpu') + r[rInx] * unit_shifts)\n        elif direction == 'dec':\n            nbs.append(feat.to('cpu') - r[rInx] * unit_shifts)\n        else:\n            nbs.append(feat.to('cpu') + r[rInx] * unit_shifts)\n            nbs.append(feat.to('cpu') - r[rInx] * unit_shifts)\n    nbs = torch.cat(nbs, dim=0)\n    nbs = nbs.detach()\n    nbs.requires_grad = False\n    \n    return nbs\n    \n    \ndef formSquad_vgg(method, model, sp, K, r=[2], direction='both', device='cpu', includeOriginal=True):\n    if method 
== 'random':\n        nbs = findNeighbors_random(sp, K, r, direction=direction)\n        if includeOriginal:\n            nbs = torch.cat([sp, nbs], dim=0)\n    elif method == 'plain':\n        nbs = findNeighbors_plain_vgg(model, sp, K, r, direction=direction, device=device)\n        if includeOriginal:\n            nbs = torch.cat([sp, nbs], dim=0)\n    elif method == 'lastLy':\n        nbs = findNeighbors_lastLy_vgg(model, sp, K, r, direction=direction, device=device)\n        if includeOriginal:\n            nbs = torch.cat([sp, nbs], dim=0)\n    elif method == 'approx':\n        nbs = findNeighbors_approx_vgg(model, sp, K, r, direction=direction, device=device)\n        if includeOriginal:\n            nbs = torch.cat([sp, nbs], dim=0)\n    elif method == 'randPick':\n        nbs = findNeighbors_randPick_vgg(model, sp, K, r, direction=direction, device=device)\n        if includeOriginal:\n            nbs = torch.cat([sp, nbs], dim=0)\n    elif method == 'feats_lastLy':\n        nbs = findNeighbors_feats_lastLy_vgg(model, sp, K, r, direction=direction, device=device, includeOriginal=includeOriginal)\n    elif method == 'feats_approx':\n        nbs = findNeighbors_feats_approx_vgg(model, sp, K, r, direction=direction, device=device, includeOriginal=includeOriginal)\n    else:\n        # if invalid method, use default setting. (actually should raise error here)\n        nbs = findNeighbors_random(sp, K, r, direction=direction)\n        if includeOriginal:\n            nbs = torch.cat([sp, nbs], dim=0)\n        \n    return nbs\n    \n\ndef findNeighbors_approx_resnet(model, sp, K, r=[2], direction='both', device='cpu'):\n    # only accept single sample\n    if sp.size(0) != 1:\n        return None\n    \n    # storages for K selected distances and linear mapping\n    selected_list = []\n    \n    # set model to evaluation mode\n    model = model.to(device)\n    model = model.eval()\n    \n    # place holder for input, and set to require gradient\n    x = sp.clone().to(device)\n    x.requires_grad = True\n    \n    # forward through the model\n    y = model(x)\n    y = y.view(y.size(0), -1)\n    \n    if K > 0:\n        with torch.no_grad():\n            # compute weight norm\n            w_norm = torch.norm(model.fc.weight, dim=1, keepdim=True)\n            \n            # compute distance\n            d = torch.abs(y) / w_norm.t()\n            _, sortedInx = torch.sort(d, dim=1, descending=False)\n            \n        # for each selected neuron\n        for i in range(K):\n                \n            # clear previous gradients\n            x.grad = None\n                \n            # compute gradients\n            goal = torch.abs(y[0, sortedInx[0, i]])\n            goal.backward(retain_graph=True)    # retain graph for future computation\n                \n            # record gradients\n            selected_list.append(x.grad.clone().detach().to('cpu') / torch.norm(x.grad).detach().to('cpu'))\n    \n    # generate neighboring samples\n    unit_shifts = torch.cat(selected_list, dim=0)\n    nbs = []\n    for rInx in range(len(r)):\n        if direction == 'inc':\n            nbs.append(sp.to('cpu') + r[rInx] * unit_shifts)\n        elif direction == 'dec':\n            nbs.append(sp.to('cpu') - r[rInx] * unit_shifts)\n        else:\n            nbs.append(sp.to('cpu') + r[rInx] * unit_shifts)\n            nbs.append(sp.to('cpu') - r[rInx] * unit_shifts)\n    nbs = torch.cat(nbs, dim=0)\n    nbs = nbs.detach()\n    nbs.requires_grad = False\n    \n    return nbs\n    \n\ndef 
findNeighbors_approx_resnet_small(model, sp, K, r=[2], direction='both', device='cpu'):\n    # only accept single sample\n    if sp.size(0) != 1:\n        return None\n    \n    # storages for K selected distances and linear mapping\n    selected_list = []\n    \n    # set model to evaluation mode\n    model = model.to(device)\n    model = model.eval()\n    \n    # place holder for input, and set to require gradient\n    x = sp.clone().to(device)\n    x.requires_grad = True\n    \n    # forward through the model\n    y = model(x)\n    y = y.view(y.size(0), -1)\n    \n    if K > 0:\n        with torch.no_grad():\n            # compute weight norm\n            w_norm = torch.norm(model.linear.weight, dim=1, keepdim=True)\n            \n            # compute distance\n            d = torch.abs(y) / w_norm.t()\n            _, sortedInx = torch.sort(d, dim=1, descending=False)\n            \n        # for each selected neuron\n        for i in range(K):\n                \n            # clear previous gradients\n            x.grad = None\n                \n            # compute gradients\n            goal = torch.abs(y[0, sortedInx[0, i]])\n            goal.backward(retain_graph=True)    # retain graph for future computation\n                \n            # record gradients\n            selected_list.append(x.grad.clone().detach().to('cpu') / torch.norm(x.grad).detach().to('cpu'))\n    \n    # generate neighboring samples\n    unit_shifts = torch.cat(selected_list, dim=0)\n    nbs = []\n    for rInx in range(len(r)):\n        if direction == 'inc':\n            nbs.append(sp.to('cpu') + r[rInx] * unit_shifts)\n        elif direction == 'dec':\n            nbs.append(sp.to('cpu') - r[rInx] * unit_shifts)\n        else:\n            nbs.append(sp.to('cpu') + r[rInx] * unit_shifts)\n            nbs.append(sp.to('cpu') - r[rInx] * unit_shifts)\n    nbs = torch.cat(nbs, dim=0)\n    nbs = nbs.detach()\n    nbs.requires_grad = False\n    \n    return nbs\n    \n    \ndef formSquad_resnet(method, model, sp, K, r=[2], direction='both', device='cpu', includeOriginal=True):\n    if method == 'random':\n        nbs = findNeighbors_random(sp, K, r, direction=direction)\n        if includeOriginal:\n            nbs = torch.cat([sp, nbs], dim=0)\n    elif method == 'approx':\n        nbs = findNeighbors_approx_resnet(model, sp, K, r, direction=direction, device=device)\n        if includeOriginal:\n            nbs = torch.cat([sp, nbs], dim=0)\n    elif method == 'approx_cifar10':\n        nbs = findNeighbors_approx_resnet_small(model, sp, K, r, direction=direction, device=device)\n        if includeOriginal:\n            nbs = torch.cat([sp, nbs], dim=0)\n    else:\n        # if invalid method, use default setting. (actually should raise error here)\n        nbs = findNeighbors_random(sp, K, r, direction=direction)\n        if includeOriginal:\n            nbs = torch.cat([sp, nbs], dim=0)\n        \n    return nbs\n"
  },
  {
    "path": "post_avg/README.md",
    "content": "# Post-Average Adversarial Defense\nImplementation of the Post-Average adversarial defense method as described in [Bandlimiting Neural Networks Against Adversarial Attacks](https://arxiv.org/abs/1905.12797).\n\nThis implementation is based on PyTorch and uses the [Foolbox](https://github.com/bethgelab/foolbox) toolbox to provide attacking methods.\n\n## [robustml](https://github.com/robust-ml/robustml) evaluation\nThis implementation supports the robustml API for evaluation.\n\nTo evaluate on CIFAR-10:\n```\npython robustml_test_cifar10.py <datasetPath>\n```\n\nTo evaluate on ImageNet:\n```\npython robustml_test_imagenet.py <datasetPath>\n```\n"
  },
  {
    "path": "post_avg/attacks.py",
    "content": "# -*- coding: utf-8 -*-\n\nimport robustml\nimport numpy as np\n\nimport foolbox.criteria as crt\nimport foolbox.attacks as attacks\nimport foolbox.distances as distances\nimport foolbox.adversarial as adversarial\n\nclass NullAttack(robustml.attack.Attack):\n    def run(self, x, y, target):\n        return x\n\nclass FoolboxAttackWrapper(robustml.attack.Attack):\n    def __init__(self, attack):\n        self._attacker = attack\n    \n    def run(self, x, y, target):\n        # model requires image in (C, H, W), but robustml provides (H, W, C)\n        # transpose x to accommodate pytorch's axis arrangement convention\n        x = np.transpose(x, (2, 0, 1))\n        if target is not None:\n            adv_criterion = crt.TargetClass(target)\n            adv_obj = adversarial.Adversarial(self._attacker._default_model, adv_criterion, x, y, distance=self._attacker._default_distance)\n            adv_x = self._attacker(adv_obj)\n        else:\n            adv_x = self._attacker(x, y)\n        \n        # transpose back to data provider's convention       \n        return np.transpose(adv_x, (1, 2, 0))\n\ndef fgsmAttack(victim_model):   # victim_model should be model wrapped with foolbox model\n    attacker = attacks.GradientSignAttack(victim_model, crt.Misclassification())\n    return FoolboxAttackWrapper(attacker)\n        \ndef pgdAttack(victim_model):    # victim_model should be model wrapped with foolbox model\n    attacker = attacks.RandomStartProjectedGradientDescentAttack(victim_model, crt.Misclassification(), distance=distances.Linfinity)\n    return FoolboxAttackWrapper(attacker)\n    \ndef dfAttack(victim_model):   # victim_model should be model wrapped with foolbox model\n    attacker = attacks.DeepFoolAttack(victim_model, crt.Misclassification())\n    return FoolboxAttackWrapper(attacker)\n    \ndef cwAttack(victim_model): # victim_model should be model wrapped with foolbox model\n    attacker = attacks.CarliniWagnerL2Attack(victim_model, crt.Misclassification())\n    return FoolboxAttackWrapper(attacker)\n    "
  },
  {
    "path": "post_avg/postAveragedModels.py",
    "content": "# -*- coding: utf-8 -*-\n\nimport robustml\nimport numpy as np\nfrom collections import OrderedDict\nimport post_avg.PADefense as padef\nimport post_avg.resnetSmall as rnsmall\n\nimport torch\nimport torchvision.models as mdl\nimport torchvision.transforms as transforms\n\nclass PostAveragedResNet152(robustml.model.Model):\n    def __init__(self, K, R, eps, device='cuda'):\n        self._model = mdl.resnet152(pretrained=True).to(device)\n        self._dataset = robustml.dataset.ImageNet((224, 224, 3))\n        self._threat_model = robustml.threat_model.Linf(epsilon=eps)\n        self._K = K\n        self._r = [R/3, 2*R/3, R]\n        self._sample_method = 'random'\n        self._vote_method = 'avg_softmax'\n        self._device = device\n    \n    @property\n    def model(self):\n        return self._model\n    \n    @property\n    def dataset(self):\n        return self._dataset\n        \n    @property\n    def threat_model(self):\n        return self._threat_model\n        \n    def classify(self, x):\n        x = x.unsqueeze(0)\n        \n        # gather neighbor samples\n        x_squad = padef.formSquad_resnet(self._sample_method, self._model, x, self._K, self._r, device=self._device)\n        \n        # forward with a batch of neighbors\n        logits, _ = padef.integratedForward(self._model, x_squad, batchSize=100, nClasses=1000, device=self._device, voteMethod=self._vote_method)\n\n        return torch.as_tensor(logits)\n\n    def __call__(self, x):\n        logits_list = []\n        for img in x:\n            logits = self.classify(img)\n            logits_list.append(logits)\n        return torch.cat(logits_list, dim=0)\n        \n    def _preprocess(self, image):\n        # normalization used by pre-trained model\n        normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])\n        \n        return normalize(image)\n        \n    def to(self, device):\n        self._model = self._model.to(device)\n        self._device = device\n        \n    def eval(self):\n        self._model = self._model.eval()\n        \n        \ndef pa_resnet152_config1():\n    return PostAveragedResNet152(K=15, R=30, eps=8/255)\n\n    \nclass PostAveragedResNet110(robustml.model.Model):\n    def __init__(self, K, R, eps, device='cuda'):\n        # load model state dict\n        checkpoint = torch.load('post_avg/trainedModel/resnet110.th')\n        paramDict = OrderedDict()\n        for k, v in checkpoint['state_dict'].items():\n            # remove 'module.' 
prefix introduced by DataParallel, if any\n            if k.startswith('module.'):\n                paramDict[k[7:]] = v\n        self._model = rnsmall.resnet110()\n        self._model.load_state_dict(paramDict)\n        self._model = self._model.to(device)\n        \n        self._dataset = robustml.dataset.CIFAR10()\n        self._threat_model = robustml.threat_model.Linf(epsilon=eps)\n        self._K = K\n        self._r = [R/3, 2*R/3, R]\n        self._sample_method = 'random'\n        self._vote_method = 'avg_softmax'\n        self._device = device\n    \n    @property\n    def model(self):\n        return self._model\n    \n    @property\n    def dataset(self):\n        return self._dataset\n        \n    @property\n    def threat_model(self):\n        return self._threat_model\n        \n    def classify(self, x):\n        x = x.unsqueeze(0)\n\n        # gather neighbor samples\n        x_squad = padef.formSquad_resnet(self._sample_method, self._model, x, self._K, self._r, device=self._device)\n        \n        # forward with a batch of neighbors\n        logits, _ = padef.integratedForward(self._model, x_squad, batchSize=1000, nClasses=10, device=self._device, voteMethod=self._vote_method)\n        \n        return torch.as_tensor(logits)\n\n    def __call__(self, x):\n        logits_list = []\n        for img in x:\n            logits = self.classify(img)\n            logits_list.append(logits)\n        return torch.cat(logits_list, dim=0)\n\n    def _preprocess(self, image):\n        # normalization used by pre-trained model\n        normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])\n        \n        return normalize(image)\n        \n    def to(self, device):\n        self._model = self._model.to(device)\n        self._device = device\n        \n    def eval(self):\n        self._model = self._model.eval()\n        \n        \ndef pa_resnet110_config1():\n    return PostAveragedResNet110(K=15, R=6, eps=8/255)\n"
  },
  {
    "path": "post_avg/resnetSmall.py",
    "content": "# -*- coding: utf-8 -*-\n\n'''\nProperly implemented ResNet-s for CIFAR10 as described in paper [1].\n\nThe implementation and structure of this file is hugely influenced by [2]\nwhich is implemented for ImageNet and doesn't have option A for identity.\nMoreover, most of the implementations on the web is copy-paste from\ntorchvision's resnet and has wrong number of params.\n\nProper ResNet-s for CIFAR10 (for fair comparision and etc.) has following\nnumber of layers and parameters:\n\nname      | layers | params\nResNet20  |    20  | 0.27M\nResNet32  |    32  | 0.46M\nResNet44  |    44  | 0.66M\nResNet56  |    56  | 0.85M\nResNet110 |   110  |  1.7M\nResNet1202|  1202  | 19.4m\n\nwhich this implementation indeed has.\n\nReference:\n[1] Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun\n    Deep Residual Learning for Image Recognition. arXiv:1512.03385\n[2] https://github.com/pytorch/vision/blob/master/torchvision/models/resnet.py\n\nIf you use this implementation in you work, please don't forget to mention the\nauthor, Yerlan Idelbayev.\n'''\nimport torch\nimport torch.nn as nn\nimport torch.nn.functional as F\nimport torch.nn.init as init\n\nfrom torch.autograd import Variable\n\n__all__ = ['ResNet', 'resnet20', 'resnet32', 'resnet44', 'resnet56', 'resnet110', 'resnet1202']\n\ndef _weights_init(m):\n    classname = m.__class__.__name__\n    # print(classname)\n    if isinstance(m, nn.Linear) or isinstance(m, nn.Conv2d):\n        init.kaiming_normal(m.weight)\n\nclass LambdaLayer(nn.Module):\n    def __init__(self, lambd):\n        super(LambdaLayer, self).__init__()\n        self.lambd = lambd\n\n    def forward(self, x):\n        return self.lambd(x)\n\n\nclass BasicBlock(nn.Module):\n    expansion = 1\n\n    def __init__(self, in_planes, planes, stride=1, option='A'):\n        super(BasicBlock, self).__init__()\n        self.conv1 = nn.Conv2d(in_planes, planes, kernel_size=3, stride=stride, padding=1, bias=False)\n        self.bn1 = nn.BatchNorm2d(planes)\n        self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=1, padding=1, bias=False)\n        self.bn2 = nn.BatchNorm2d(planes)\n\n        self.shortcut = nn.Sequential()\n        if stride != 1 or in_planes != planes:\n            if option == 'A':\n                \"\"\"\n                For CIFAR10 ResNet paper uses option A.\n                \"\"\"\n                self.shortcut = LambdaLayer(lambda x:\n                                            F.pad(x[:, :, ::2, ::2], (0, 0, 0, 0, planes//4, planes//4), \"constant\", 0))\n            elif option == 'B':\n                self.shortcut = nn.Sequential(\n                     nn.Conv2d(in_planes, self.expansion * planes, kernel_size=1, stride=stride, bias=False),\n                     nn.BatchNorm2d(self.expansion * planes)\n                )\n\n    def forward(self, x):\n        out = F.relu(self.bn1(self.conv1(x)))\n        out = self.bn2(self.conv2(out))\n        out += self.shortcut(x)\n        out = F.relu(out)\n        return out\n\n\nclass ResNet(nn.Module):\n    def __init__(self, block, num_blocks, num_classes=10):\n        super(ResNet, self).__init__()\n        self.in_planes = 16\n\n        self.conv1 = nn.Conv2d(3, 16, kernel_size=3, stride=1, padding=1, bias=False)\n        self.bn1 = nn.BatchNorm2d(16)\n        self.layer1 = self._make_layer(block, 16, num_blocks[0], stride=1)\n        self.layer2 = self._make_layer(block, 32, num_blocks[1], stride=2)\n        self.layer3 = self._make_layer(block, 64, num_blocks[2], stride=2)\n        
self.linear = nn.Linear(64, num_classes)\n\n        self.apply(_weights_init)\n\n    def _make_layer(self, block, planes, num_blocks, stride):\n        strides = [stride] + [1]*(num_blocks-1)\n        layers = []\n        for stride in strides:\n            layers.append(block(self.in_planes, planes, stride))\n            self.in_planes = planes * block.expansion\n\n        return nn.Sequential(*layers)\n\n    def forward(self, x):\n        out = F.relu(self.bn1(self.conv1(x)))\n        out = self.layer1(out)\n        out = self.layer2(out)\n        out = self.layer3(out)\n        out = F.avg_pool2d(out, out.size()[3])\n        out = out.view(out.size(0), -1)\n        out = self.linear(out)\n        return out\n\n\ndef resnet20():\n    return ResNet(BasicBlock, [3, 3, 3])\n\n\ndef resnet32():\n    return ResNet(BasicBlock, [5, 5, 5])\n\n\ndef resnet44():\n    return ResNet(BasicBlock, [7, 7, 7])\n\n\ndef resnet56():\n    return ResNet(BasicBlock, [9, 9, 9])\n\n\ndef resnet110():\n    return ResNet(BasicBlock, [18, 18, 18])\n\n\ndef resnet1202():\n    return ResNet(BasicBlock, [200, 200, 200])\n\n\ndef test(net):\n    import numpy as np\n    total_params = 0\n\n    for x in filter(lambda p: p.requires_grad, net.parameters()):\n        total_params += np.prod(x.data.numpy().shape)\n    print(\"Total number of params\", total_params)\n    print(\"Total layers\", len(list(filter(lambda p: p.requires_grad and len(p.data.size())>1, net.parameters()))))\n\n\nif __name__ == \"__main__\":\n    for net_name in __all__:\n        if net_name.startswith('resnet'):\n            print(net_name)\n            test(globals()[net_name]())\n            print()"
  },
  {
    "path": "post_avg/robustml_test_cifar10.py",
    "content": "# -*- coding: utf-8 -*-\n\nimport torch\nimport argparse\nimport robustml\nimport numpy as np\nfrom foolbox.models import PyTorchModel\nfrom robustml_portal import attacks as atk\nfrom robustml_portal import postAveragedModels as pamdl\n\n\n# argument parsing\nparser = argparse.ArgumentParser(description=\"robustml evaluation on CIFAR-10\")\nparser.add_argument(\"datasetPath\", help=\"path to the 'test_batch' file\")\nparser.add_argument(\"--start\", type=int, default=0, help=\"inclusive starting index for data. default: 0\")\nparser.add_argument(\"--end\", type=int, help=\"exclusive ending index for data. default: dataset size\")\nparser.add_argument(\"--attack\", choices=[\"pgd\", \"fgsm\", \"df\", \"cw\", \"none\"], default=\"pgd\", help=\"attack method to be used. default: pgd\")\nparser.add_argument(\"--device\", help=\"compuation device to be used. 'cpu' or 'cuda:<index>'\")\nargs = parser.parse_args()\n\nif args.device is None:\n    if torch.cuda.is_available():\n        device = torch.device(\"cuda\")\n    else:\n        device = torch.device(\"cpu\")\nelse:\n    device = torch.device(args.device)\n\n# setup test model\nmodel = pamdl.pa_resnet110_config1()\nmodel.to(device)\nmodel.eval()\n\n# setup attacker\nnClasses = 10\nvictim_model = PyTorchModel(model.model, (0,1), nClasses, device=device, preprocessing=(np.array([0.485, 0.456, 0.406]).reshape((3, 1, 1)), np.array([0.229, 0.224, 0.225]).reshape((3, 1, 1))))\nif args.attack == \"pgd\":\n    attack = atk.pgdAttack(victim_model)\nelif args.attack == \"fgsm\":\n    attack = atk.fgsmAttack(victim_model)\nelif args.attack == \"df\":\n    attack = atk.dfAttack(victim_model)\nelif args.attack == \"cw\":\n    attack = atk.cwAttack(victim_model)\nelse:\n    attack = atk.NullAttack()\n\n# setup data provider\nprov = robustml.provider.CIFAR10(args.datasetPath)\n\n# evaluate performance\nif args.end is None:\n    args.end = len(prov)\natk_success_rate = robustml.evaluate.evaluate(model, attack, prov, start=args.start, end=args.end)\nprint('Overall attack success rate: %.4f' % atk_success_rate)"
  },
  {
    "path": "post_avg/robustml_test_imagenet.py",
    "content": "# -*- coding: utf-8 -*-\n\nimport torch\nimport argparse\nimport robustml\nimport numpy as np\nfrom foolbox.models import PyTorchModel\nfrom robustml_portal import attacks as atk\nfrom robustml_portal import postAveragedModels as pamdl\n\n\n# argument parsing\nparser = argparse.ArgumentParser(description=\"robustml evaluation on ImageNet\")\nparser.add_argument(\"datasetPath\", help=\"directory containing 'val.txt' and 'val/' folder\")\nparser.add_argument(\"--start\", type=int, default=0, help=\"inclusive starting index for data. default: 0\")\nparser.add_argument(\"--end\", type=int, help=\"exclusive ending index for data. default: dataset size\")\nparser.add_argument(\"--attack\", choices=[\"pgd\", \"fgsm\", \"df\", \"cw\", \"none\"], default=\"pgd\", help=\"attack method to be used. default: pgd\")\nparser.add_argument(\"--device\", help=\"compuation device to be used. 'cpu' or 'cuda:<index>'\")\nargs = parser.parse_args()\n\nif args.device is None:\n    if torch.cuda.is_available():\n        device = torch.device(\"cuda\")\n    else:\n        device = torch.device(\"cpu\")\nelse:\n    device = torch.device(args.device)\n\n# setup test model\nmodel = pamdl.pa_resnet152_config1()\nmodel.to(device)\nmodel.eval()\n\n# setup attacker\nnClasses = 1000\nvictim_model = PyTorchModel(model.model, (0,1), nClasses, device=device, preprocessing=(np.array([0.485, 0.456, 0.406]).reshape((3, 1, 1)), np.array([0.229, 0.224, 0.225]).reshape((3, 1, 1))))\nif args.attack == \"pgd\":\n    attack = atk.pgdAttack(victim_model)\nelif args.attack == \"fgsm\":\n    attack = atk.fgsmAttack(victim_model)\nelif args.attack == \"df\":\n    attack = atk.dfAttack(victim_model)\nelif args.attack == \"cw\":\n    attack = atk.cwAttack(victim_model)\nelse:\n    attack = atk.NullAttack()\n\n# setup data provider\nprov = robustml.provider.ImageNet(args.datasetPath, (224, 224, 3))\n\n# evaluate performance\nif args.end is None:\n    args.end = len(prov)\natk_success_rate = robustml.evaluate.evaluate(model, attack, prov, start=args.start, end=args.end)\nprint('Overall attack success rate: %.4f' % atk_success_rate)"
  },
  {
    "path": "post_avg/visualHelper.py",
    "content": "# -*- coding: utf-8 -*-\n\nimport torch\nimport torch.nn as nn\nimport numpy as np\n\nimport matplotlib; matplotlib.use('agg')\nimport matplotlib.pyplot as plt\n\ndef plotPredStats(feats, lb, K=10, image=None, noiseImage=None, savePath=None):\n    \n    # score by averaging\n    scores = torch.mean(feats, dim=0)\n    \n    # sort and select the top K scores\n    hScores, hCates = torch.sort(scores, dim=0, descending=True)\n    hScores = hScores[:K].numpy()\n    hCates = hCates[:K].numpy()\n    \n    # get individual preditions\n    _, preds = torch.max(feats, dim=1)\n    \n    # count votes\n    preds_count = {lb: 0}\n    for i in range(feats.size(0)):\n        if preds[i].item() in preds_count:\n            preds_count[preds[i].item()] = preds_count[preds[i].item()] + 1\n        else:\n            preds_count[preds[i].item()] = 1\n            \n    candidates = sorted(preds_count.keys())\n    votes = [preds_count[x] for x in candidates]\n    \n    # generate figure\n    fig = plt.figure()\n    if image is None and noiseImage is None:\n        ax1, ax2, ax3 = fig.subplots(3, 1)\n    else:\n        axes = fig.subplots(2, 2)\n        ax1 = axes[0, 0]\n        ax2 = axes[1, 0]\n        ax3 = axes[0, 1]\n        ax4 = axes[1, 1]\n    \n    # chart 1, votes distribution\n    inx1 = list(range(len(candidates)))\n    clr1 = []\n    for i in inx1:\n        if candidates[i] == lb:\n            clr1.append('Red')\n        else:\n            clr1.append('SkyBlue')\n    rects1 = ax1.bar(inx1, votes, color=clr1)\n    for rect in rects1:\n        h = rect.get_height()\n        ax1.text(rect.get_x() + 0.5 * rect.get_width(), 1.01 * h, '{}'.format(h), ha='center', va='bottom')\n    ax1.set_ylim(top=1.1 * ax1.get_ylim()[1])\n    ax1.set_xticks(inx1)\n    ax1.set_xticklabels([str(x) for x in candidates], rotation=30)\n    ax1.set_ylabel('votes')\n    ax1.set_title('Votes Distribution')\n    \n    # chart 2, top prediction scores\n    inx2 = list(range(len(hCates)))\n    clr2 = []\n    for i in inx2:\n        if hCates[i] == lb:\n            clr2.append('Red')\n        else:\n            clr2.append('SkyBlue')\n    rects2 = ax2.bar(inx2, hScores, color=clr2)\n    for rect in rects2:\n        h = rect.get_height()\n        ax2.text(rect.get_x() + 0.5 * rect.get_width(), 1.01 * h, '{:.2f}'.format(h), ha='center', va='bottom')\n    ax2.set_ylim(top=1.1 * ax2.get_ylim()[1])\n    ax2.set_xticks(inx2)\n    ax2.set_xticklabels([str(x) for x in hCates], rotation=30)\n    ax2.set_ylabel('score')\n    ax2.set_xlabel('Top Predictions')\n    \n    # axis 3, the noise image\n    if noiseImage is not None:\n        ax3.imshow(noiseImage)\n        ax3.set_xlabel('Noise Image')\n        ax3.set_axis_off()\n    else:\n        # if noise image is not given, show prediction event plot\n        clr3 = []\n        for i in range(preds.size(0)):\n            if preds[i] == lb:\n                clr3.append('Red')\n            else:\n                clr3.append('Green')\n        ax3.eventplot(preds.unsqueeze(1).numpy(), orientation='vertical', colors=clr3)\n        ax3.set_yticks(candidates)\n        ax3.set_yticklabels([str(x) for x in candidates])\n        ax3.set_xlabel('sample index')\n        ax3.set_ylabel('class')\n\n    # axis 4, the input image\n    if image is not None:\n        ax4.imshow(image)\n        ax4.set_title('Input Image')\n        ax4.set_axis_off()\n    \n    # save figure and close\n    if savePath is not None:\n        fig.savefig(savePath)\n        \n    plt.close(fig)\n\n\ndef 
plotPerturbationDistribution(perturbations, savePath=None):\n\n    # generate figure\n    fig = plt.figure()\n    ax1, ax2, ax3 = fig.subplots(3, 1)\n    \n    # plot scatter chart\n    perts = np.asarray(perturbations)\n    ax1.scatter(perts[:, 0], perts[:, 1], c='SkyBlue')\n    ax1.autoscale(axis='x')\n    ax1.set_ylim((-1, 2))\n    ax1.set_yticks([0, 1])\n    ax1.set_yticklabels(['missed', 'defensed'])\n    ax1.set_xlabel('Perturbation distance')\n    ax1.set_title('Perturbations Distribution')\n    \n    # plot bin chart for defensed adversarial samples\n    x = [e[0] for e in perturbations if e[1] == 1]\n    ax2.hist(x, bins=20, color='SkyBlue')\n    ax2.set_xlabel('Perturbation distance')\n    ax2.set_ylabel('Denfensed')\n    \n    # plot bin chart for missed adversarial samples\n    x = [e[0] for e in perturbations if e[1] == 0]\n    ax3.hist(x, bins=20, color='Red')\n    ax3.set_xlabel('Perturbation distance')\n    ax3.set_ylabel('Missed')\n    \n    # save figure and close\n    if savePath is not None:\n        fig.savefig(savePath)\n        \n    plt.close(fig)\n\n"
  },
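  {
    "path": "post_avg/visualHelper_demo.py",
    "content": "# -*- coding: utf-8 -*-\n# Hypothetical usage sketch (editor-added, not part of the original code base):\n# it calls the two plotting helpers from visualHelper.py on synthetic data so the\n# expected input formats are explicit. The file name, the output paths, and the\n# synthetic inputs are illustrative assumptions only; it assumes it is run from\n# the post_avg/ directory so that visualHelper is importable.\n\nimport torch\n\nfrom visualHelper import plotPredStats, plotPerturbationDistribution\n\n\nif __name__ == '__main__':\n    # plotPredStats expects per-sample class scores of shape (n_samples, n_classes)\n    # together with the integer ground-truth label of the input image\n    feats = torch.softmax(torch.randn(50, 10), dim=1)\n    true_label = 3\n    plotPredStats(feats, true_label, K=5, savePath='pred_stats_demo.png')\n\n    # plotPerturbationDistribution expects (distance, flag) pairs; following the\n    # axis labels in visualHelper.py, flag 1 means 'defensed' and flag 0 means 'missed'\n    perturbations = [(0.5, 1), (0.8, 1), (1.1, 0), (1.6, 0), (2.3, 0)]\n    plotPerturbationDistribution(perturbations, savePath='perturbation_dist_demo.png')\n"
  },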
  {
    "path": "utils.py",
    "content": "import numpy as np\nimport os\n\n\nclass Logger:\n    def __init__(self, path):\n        self.path = path\n        if path != '':\n            folder = '/'.join(path.split('/')[:-1])\n            if not os.path.exists(folder):\n                os.makedirs(folder)\n\n    def print(self, message):\n        print(message)\n        if self.path != '':\n            with open(self.path, 'a') as f:\n                f.write(message + '\\n')\n                f.flush()\n\n\ndef dense_to_onehot(y_test, n_cls):\n    y_test_onehot = np.zeros([len(y_test), n_cls], dtype=bool)\n    y_test_onehot[np.arange(len(y_test)), y_test] = True\n    return y_test_onehot\n\n\ndef random_classes_except_current(y_test, n_cls):\n    y_test_new = np.zeros_like(y_test)\n    for i_img in range(y_test.shape[0]):\n        lst_classes = list(range(n_cls))\n        lst_classes.remove(y_test[i_img])\n        y_test_new[i_img] = np.random.choice(lst_classes)\n    return y_test_new\n\n\ndef softmax(x):\n    e_x = np.exp(x - np.max(x, axis=1, keepdims=True))\n    return e_x / e_x.sum(axis=1, keepdims=True)\n"
  }
]