[
  {
    "path": ".gitignore",
    "content": "*.DS_Store\n.vscode/\ndata/PyCon\npy/.ipynb_checkpoints/\n__*\npy/logs/"
  },
  {
    "path": "LICENSE",
    "content": "MIT License\n\nCopyright (c) 2018 Jens Leitloff & Felix Riese\n\nPermission is hereby granted, free of charge, to any person obtaining a copy\nof this software and associated documentation files (the \"Software\"), to deal\nin the Software without restriction, including without limitation the rights\nto use, copy, modify, merge, publish, distribute, sublicense, and/or sell\ncopies of the Software, and to permit persons to whom the Software is\nfurnished to do so, subject to the following conditions:\n\nThe above copyright notice and this permission notice shall be included in all\ncopies or substantial portions of the Software.\n\nTHE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\nIMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\nFITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE\nAUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\nLIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\nOUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE\nSOFTWARE.\n"
  },
  {
    "path": "README.md",
    "content": "[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.3268451.svg)](https://doi.org/10.5281/zenodo.3268451)\n[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)\n[![Codacy Badge](https://api.codacy.com/project/badge/Grade/a6b8568aab8c4c319a4f58d84cccf7c0)](https://www.codacy.com/manual/jensleitloff/CNN-Sentinel?utm_source=github.com&amp;utm_medium=referral&amp;utm_content=jensleitloff/CNN-Sentinel&amp;utm_campaign=Badge_Grade)\n\n# Analyzing Sentinel-2 satellite data in Python with TensorFlow.Keras\n\nOverview about state-of-the-art land-use classification from satellite data\nwith CNNs based on an open dataset\n\n## Outline\n\n* [Scripts you will find here](#scripts-you-will-find-here)\n* [Requirements (what we used):](#requirements--what-we-used--)\n* [Setup Environment](#setup-environment)\n* [Our talks about this topic](#our-talks-about-this-topic)\n* [Resources](#resources)\n* [How to get Sentinel-2 data](#how-to-get-sentinel-2-data)\n* [Citation](#citation)\n\n## Scripts you will find here\n\n* `01_split_data_to_train_and_validation.py`: split complete dataset into train\n  and validation\n* `02_train_rgb_finetuning.py`: train VGG16 or DenseNet201 using RGB data with\n  pre-trained weights on ImageNet\n* `03_train_rgb_from_scratch.py`: train VGG16 or DenseNet201 from scratch using\n  RGB data\n* `04_train_ms_finetuning.py`: train VGG16 or DenseNet201 using multispectral\n  data with pre-trained weights on ImageNet\n* `04_train_ms_finetuning_alternative.py`: an alternative way to train VGG16 or\n  DenseNet201 using multispectral data with pre-trained weights on ImageNet\n* `05_train_ms_from_scratch.py`: train VGG16 or DenseNet201 from scratch using\n  multispectral data\n* `06_classify_image.py`: a simple implementation to classify images with\n  trained models\n* `image_functions.py`: functions for image normalization and a simple\n  generator for training data augmentation\n* `statistics.py`: a simple implementation 
to calculate normalization\n  parameters (i.e. mean and std of the training data)\n\nAdditionally, you will find the following notebooks:\n\n* `Image_functions.ipynb`: notebook of `image_functions.py`\n* `Train_from_Scratch.ipynb`: notebook of `05_train_ms_from_scratch.py`\n* `Transfer_learning.ipynb`: notebook of `02_train_rgb_finetuning.py`\n\n## Requirements (what we used)\n\nWe have defined the requirements in [requirements.txt](requirements.txt).\nWe used:\n\n* python 3.6.x\n* tensorflow 2.2\n* scikit-image (0.14.1)\n* gdal (2.2.4) for `06_classify_image.py`\n\n## Frequently asked questions (FAQs)\n\n* **How can I interpret the classification results?** - Please have a look at our answers in\n  [#3](https://github.com/jensleitloff/CNN-Sentinel/issues/3),\n  [#4](https://github.com/jensleitloff/CNN-Sentinel/issues/4), and\n  [#6](https://github.com/jensleitloff/CNN-Sentinel/issues/6).\n* **Is there a paper I can cite for this repository?** - Please have a look at [Citation](#citation).\n\n## Setup environment\n\nAppend conda-forge to your Anaconda channels:\n\n```bash\nconda config --append channels conda-forge\n```\n\nCreate a new environment:\n\n```bash\nconda create -n pycon scikit-image gdal tqdm\nconda activate pycon\npip install tensorflow-gpu\npip install keras\n```\n\n(or use the Keras API bundled with TensorFlow, i.e. `from tensorflow import keras`)\n\nSee also:\n\n* [Keras](https://keras.io/)\n\n## Our talks about this topic\n\n### Podcast episode @ TechTiefen\n\n* **Title:** \"Fernerkundung mit multispektralen Satellitenbildern\"\n* **Episode:** [Episode 18](https://techtiefen.de/18-fernerkundung-mit-multispektralen-satellitenbildern/)\n* **Podcast:** [TechTiefen](https://techtiefen.de) by Nico Kreiling\n* **Language:** German (Deutsch)\n* **Date:** July 2019\n\n<details><summary>Abstract</summary>\n In episode 18, Jens Leitloff and Felix Riese report on their research at the “Institut für Photogrammetrie und Fernerkundung” of the Karlsruhe Institute of Technology. 
Striving to strengthen sustainability, the two investigate, for example, methods to assess water quality from satellite imagery or to map the use of agricultural land. A wide variety of techniques is employed for this, such as radar imagery or multispectral image data, which capture more than the three color channels perceivable by humans. The episode also covers drones, satellites, and numerous ML techniques such as transfer and active learning. Personal experiences of Jens and Felix in dealing with different amounts of data round off a thematically broad and illustrative episode.\n</details>\n\n### M3 Minds mastering machines 2019 @ Mannheim\n\n* **Title:** \"Satellite Computer Vision mit Keras und Tensorflow - Best practices und beispiele aus der Forschung\"\n* **Slides:** [Slides](slides/M3-2019_RieseLeitloff_SatelliteCV.pdf)\n* **Language:** German (Deutsch)\n* **Date:** 15 - 16 May 2019\n* **DOI:** [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.4056744.svg)](https://doi.org/10.5281/zenodo.4056744)\n* **URL:** [m3-konferenz.de](https://m3-konferenz.de/2019/)\n\n<details><summary>Abstract</summary>\n> In machine learning research, easily accessible frameworks such as Keras, TensorFlow, or PyTorch are increasingly used. This makes it possible to exchange and reuse existing (trained) neural networks.\n>\n> At the Institut für Photogrammetrie und Fernerkundung (IPF) of the Karlsruhe Institute of Technology (KIT), we work, among other things, on the analysis of optical satellite data. Satellite programs such as Copernicus' Sentinel-2 deliver weekly, worldwide, and freely accessible multispectral images, which enable a variety of novel applications. We take this as an opportunity to give an interactive introduction to the analysis of this satellite data, with lessons learned from our daily research. 
We talk about the following topics, among others:\n>\n> * Easy handling of georeferenced image data\n> * Introduction to learning from scratch and transfer learning with Keras\n> * Adapting ready-made networks to new input data (RGB → multispectral)\n> * Illustrative interpretation of classification results\n> * Best practices from our research that considerably simplify and speed up working with neural networks\n> * Code and data for the first steps with CNNs in Keras and Python, provided in a GitHub repository\n</details>\n\n### PyCon.DE 2018 @ Karlsruhe\n\n* **Title:** \"Satellite data is for everyone: insights into modern remote sensing research with open data and Python\"\n* **Slides:** [Slides](slides/PyCon2018_LeitloffRiese_SatelliteData.pdf)\n* **Video:** [youtube.com/watch?v=tKRoMcBeWjQ](https://www.youtube.com/watch?v=tKRoMcBeWjQ)\n* **Language:** English\n* **Date:** 24 - 28 October 2018\n* **DOI:** [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.4056516.svg)](https://doi.org/10.5281/zenodo.4056516)\n* **URL:** [de.pycon.org](https://de.pycon.org)\n\n<details><summary>Abstract</summary>\n> The largest earth observation programme, Copernicus (http://copernicus.eu), makes it possible to perform terrestrial observations providing data for all kinds of purposes. One important objective is to monitor land-use and land-cover changes with the Sentinel-2 satellite mission. These satellites measure the sun's reflectance on the Earth's surface with multispectral cameras (13 channels between 440 nm and 2190 nm). Machine learning techniques like convolutional neural networks (CNNs) are able to learn the link between the satellite image (spectrum) and the ground truth (land-use class). In this talk, we give an overview of state-of-the-art land-use classification with CNNs based on an open dataset.\n>\n> We use different out-of-the-box CNNs from the Keras deep learning library (https://keras.io/). 
All networks are either included in Keras itself or are available from GitHub repositories. We show the process of transfer learning for the RGB datasets. Furthermore, the minimal changes required to apply commonly used CNNs to multispectral data are demonstrated. Thus, the interested audience will be able to perform their own classification of remote sensing data within a very short time. Results of different network structures are visually compared. In particular, the differences between transfer learning and learning from scratch are demonstrated. This also includes the number of necessary training epochs, the progress of training and validation error, and a visual comparison of the results of the trained networks. Finally, we give a quick overview of current research topics, including recurrent neural networks for spatio-temporal land-use classification and further applications of multi- and hyperspectral data, e.g. for the estimation of water parameters and soil characteristics.\n</details>\n\n## Resources\n\n**This talk:**\n\n* EuroSAT Data (Sentinel-2, [Link](http://madm.dfki.de/downloads))\n\n**Platforms for datasets:**\n\n* HyperLabelMe: a Web Platform for Benchmarking Remote Sensing Image Classifiers ([Link](http://hyperlabelme.uv.es/))\n* GRSS Data and Algorithm Standard Evaluation (DASE) website ([Link](http://dase.ticinumaerospace.com/))\n\n**Datasets:**\n\n* ISPRS 2D labeling challenge ([Link](http://www2.isprs.org/commissions/comm3/wg4/semantic-labeling.html))\n* UC Merced Land Use Dataset ([Link](http://weegee.vision.ucmerced.edu/datasets/landuse.html))\n* AID: A Benchmark Dataset for Performance Evaluation of Aerial Scene Classification ([Link](https://captain-whu.github.io/AID/))\n* NWPU-RESISC45 (RGB, [Link](http://www.escience.cn/people/JunweiHan/NWPU-RESISC45.html))\n* Zurich Summer Dataset (RGB, [Link](https://sites.google.com/site/michelevolpiresearch/data/zurich-dataset))\n* **Note**: Many German state authorities offer free geodata (high-resolution 
images, land use/cover vector data, ...) via their geoportals. You can find an overview of all geoportals here ([geoportals](https://www.geoportal.nrw/geoportale_bundeslaender_nachbarstaaten))\n\n**Image Segmentation Resources:**\n\n* More than 100 combinations of image segmentation routines with Keras, including pretrained weights for the encoding phase ([Segmentation Models](https://github.com/qubvel/segmentation_models))\n* Another source for image segmentation with Keras including pretrained weights ([Keras-FCN](https://github.com/aurora95/Keras-FCN))\n* Great link collection of image segmentation networks and datasets ([Link](https://github.com/mrgloom/awesome-semantic-segmentation))\n* Free land use vector data of NRW ([BasisDLM](https://www.bezreg-koeln.nrw.de/brk_internet/geobasis/landschaftsmodelle/basis_dlm/index.html) or [openNRW](https://open.nrw/en/node/154))\n\n**Other:**\n\n* DeepHyperX - Deep learning for Hyperspectral imagery: [gitlab.inria.fr/naudeber/DeepHyperX/](https://gitlab.inria.fr/naudeber/DeepHyperX/)\n\n## How to get Sentinel-2 data\n\n1. Register at the Copernicus [Open Access Hub](https://scihub.copernicus.eu/dhus/#/home) or [EarthExplorer](https://earthexplorer.usgs.gov/)\n2. Find your region\n3. Choose tile(s) (→ area) and date\n    * Fewer tiles make things easier\n    * Fewer clouds in the image are better\n    * Consider multiple dates for classes like “annual crop”\n4. Download the L1C data\n5. Decide if you want to apply L2A atmospheric corrections\n    * Your CNN might be able to do this by itself\n    * If you want to correct, use [Sen2Cor](http://step.esa.int/main/third-party-plugins-2/sen2cor/)\n6. Have fun with the data\n\n## Citation\n\nJens Leitloff and Felix M. 
Riese, \"Examples for CNN training and classification on Sentinel-2 data\", Zenodo, [10.5281/zenodo.3268451](http://doi.org/10.5281/zenodo.3268451), 2018.\n\n```tex\n@misc{leitloff2018examples,\n    author = {Leitloff, Jens and Riese, Felix~M.},\n    title = {{Examples for CNN training and classification on Sentinel-2 data}},\n    year = {2018},\n    DOI = {10.5281/zenodo.3268451},\n    publisher = {Zenodo},\n    howpublished = {\\href{http://doi.org/10.5281/zenodo.3268451}{http://doi.org/10.5281/zenodo.3268451}}\n}\n```\n"
  },
  {
    "path": "bibliography.bib",
    "content": "@misc{leitloff2018examples,\n    author = {Leitloff, Jens and Riese, Felix~M.},\n    title = {{Examples for CNN training and classification on Sentinel-2 data}},\n    year = {2018},\n    DOI = {10.5281/zenodo.3268451},\n    publisher = {Zenodo},\n    howpublished = {\\href{http://doi.org/10.5281/zenodo.3268451}{http://doi.org/10.5281/zenodo.3268451}}\n}\n\n@inproceedings{leitloff2018satellite,\n    author = {Leitloff, Jens and Riese, Felix~M.},\n    title = {{Satellite data is for everyone: insights into modern remote sensing research with open data and Python}},\n    year = {2018},\n    booktitle = {PyCon.DE 2018},\n    address = {Karlsruhe, Germany},\n    DOI = {10.5281/zenodo.4056516},\n    publisher = {Zenodo},\n    howpublished = {\\href{http://doi.org/10.5281/zenodo.4056516}{http://doi.org/10.5281/zenodo.4056516}}\n}\n\n@inproceedings{leitloff2019satellite,\n    author = {Leitloff, Jens and Riese, Felix~M.},\n    title = {{Satellite Computer Vision mit Keras und Tensorflow - Best practices und beispiele aus der Forschung}},\n    year = {2019},\n    booktitle = {Minds Mastering Machines (M3)},\n    address = {Mannheim, Germany},\n    DOI = {10.5281/zenodo.4056744},\n    publisher = {Zenodo},\n    howpublished = {\\href{http://doi.org/10.5281/zenodo.4056744}{http://doi.org/10.5281/zenodo.4056744}}\n}\n"
  },
  {
    "path": "py/01_split_data_to_train_and_validation.py",
    "content": "# -*- coding: utf-8 -*-\n\"\"\"\nCode for the PyCon.DE 2018 talk by Jens Leitloff and Felix M. Riese.\n\nPyCon 2018 talk: Satellite data is for everyone: insights into modern remote\nsensing research with open data and Python.\n\nLicense: MIT\n\n\"\"\"\nimport os\nimport random\nimport shutil\n\nrandom.seed(42)\n# variables\n# root path to folders \"AnnualCrop, Forest ...\" in home (\"~\")\npath_to_all_images = \"~/Documents/Data/EuroSAT/AllBands\"\npath_to_all_images = r'C:\\Users\\Jens\\Downloads\\EuroSAT_RGB'\n# path to new created folders \"train\" and \"validation\" with subfolders\n# \"AnnualCrop, Forest ...\" in home (\"~\")\npath_to_split_datasets = \"~/Documents/Data/PyCon/AllBands\"\npath_to_split_datasets = r'C:\\Users\\Jens\\Downloads\\PyCon\\RGB'\n# percentage of validation data (between 0 an 1)\npercentage_validation = 0.3\n# !!! If \"True\", complete \"path_to_split_datasets\" tree will be deleted !!!\ndelete_old_path_to_split_datasets = True\n\n# contruct path\npath_to_home = os.path.expanduser(\"~\")\npath_to_all_images = path_to_all_images.replace(\"~\", path_to_home)\npath_to_split_datasets = path_to_split_datasets.replace(\"~\", path_to_home)\n# create directories if necessary\nif delete_old_path_to_split_datasets and os.path.isdir(path_to_split_datasets):\n    shutil.rmtree(path_to_split_datasets)\npath_to_train = os.path.join(path_to_split_datasets, \"train\")\npath_to_validation = os.path.join(path_to_split_datasets, \"validation\")\nif not os.path.isdir(path_to_train):\n    os.makedirs(path_to_train)\nif not os.path.isdir(path_to_validation):\n    os.makedirs(path_to_validation)\n\n# copy files\nsub_dirs = [sub_dir for sub_dir in os.listdir(path_to_all_images)\n            if os.path.isdir(os.path.join(path_to_all_images, sub_dir))]\nfor sub_dir in sub_dirs:\n    # list and shuffle images in class directories\n    current_dir = os.path.join(path_to_all_images, sub_dir)\n    files = os.listdir(current_dir)\n    
random.shuffle(files)\n    # split files into train and validation set\n    split_idx = int(len(files)*percentage_validation)\n    files_for_validation = files[:split_idx]\n    files_for_train = files[split_idx:]\n    # copy files to path_to_split_datasets\n    if not os.path.isdir(os.path.join(path_to_train, sub_dir)):\n        os.makedirs(os.path.join(path_to_train, sub_dir))\n    if not os.path.isdir(os.path.join(path_to_validation, sub_dir)):\n        os.makedirs(os.path.join(path_to_validation, sub_dir))\n    for file in files_for_train:\n        shutil.copy2(os.path.join(current_dir, file),\n                     os.path.join(path_to_train, sub_dir))\n    for file in files_for_validation:\n        shutil.copy2(os.path.join(current_dir, file),\n                     os.path.join(path_to_validation, sub_dir))\n"
  },
  {
    "path": "py/02_train_rgb_finetuning.py",
    "content": "#!/usr/bin/env python3\n# -*- coding: utf-8 -*-\n\"\"\"\nCode for the PyCon.DE 2018 talk by Jens Leitloff and Felix M. Riese.\n\nPyCon 2018 talk: Satellite data is for everyone: insights into modern remote\nsensing research with open data and Python.\n\nLicense: MIT\n\n\"\"\"\nimport os\n\nfrom tensorflow.keras.applications.densenet import DenseNet201 as DenseNet\nfrom tensorflow.keras.applications.vgg16 import VGG16 as VGG\nfrom tensorflow.keras.callbacks import (EarlyStopping, ModelCheckpoint,\n                                        TensorBoard)\nfrom tensorflow.keras.layers import Dense, GlobalAveragePooling2D\nfrom tensorflow.keras.models import Model\nfrom tensorflow.keras.optimizers import SGD\nfrom tensorflow.keras.preprocessing.image import ImageDataGenerator\n\nfrom image_functions import preprocessing_image_rgb\n\n\n# variables\npath_to_split_datasets = \"~/Documents/Data/PyCon/RGB\"\nuse_vgg = True\nbatch_size = 64\n\n# contruct path\npath_to_home = os.path.expanduser(\"~\")\npath_to_split_datasets = path_to_split_datasets.replace(\"~\", path_to_home)\npath_to_train = os.path.join(path_to_split_datasets, \"train\")\npath_to_validation = os.path.join(path_to_split_datasets, \"validation\")\n\n# get number of classes\nsub_dirs = [sub_dir for sub_dir in os.listdir(path_to_train)\n            if os.path.isdir(os.path.join(path_to_train, sub_dir))]\nnum_classes = len(sub_dirs)\n\n# parameters for CNN\nif use_vgg:\n    base_model = VGG(include_top=False,\n                     weights='imagenet',\n                     input_shape=(64, 64, 3))\nelse:\n    base_model = DenseNet(include_top=False,\n                          weights='imagenet',\n                          input_shape=(64, 64, 3))\n# add a global spatial average pooling layer\ntop_model = base_model.output\ntop_model = GlobalAveragePooling2D()(top_model)\n# or just flatten the layers\n#    top_model = Flatten()(top_model)\n# let's add a fully-connected layer\nif use_vgg:\n    # only 
for VGG a fully-connected network is added for classification\n    # DenseNet tends to overfit if additional dense layers are used\n    top_model = Dense(2048, activation='relu')(top_model)\n    top_model = Dense(2048, activation='relu')(top_model)\n# and a logistic layer\npredictions = Dense(num_classes, activation='softmax')(top_model)\n\n# this is the model we will train\nmodel = Model(inputs=base_model.input, outputs=predictions)\n\n# print network structure\nmodel.summary()\n\n# defining ImageDataGenerators\n# ... initialization for training\ntrain_datagen = ImageDataGenerator(\n    fill_mode=\"reflect\",\n    rotation_range=45,\n    horizontal_flip=True,\n    vertical_flip=True,\n    preprocessing_function=preprocessing_image_rgb)\n\n# ... initialization for validation\ntest_datagen = ImageDataGenerator(\n    preprocessing_function=preprocessing_image_rgb)\n\n# ... definition for training\ntrain_generator = train_datagen.flow_from_directory(path_to_train,\n                                                    target_size=(64, 64),\n                                                    batch_size=batch_size,\n                                                    class_mode='categorical')\n# just for information\nclass_indices = train_generator.class_indices\nprint(class_indices)\n\n# ... definition for validation\nvalidation_generator = test_datagen.flow_from_directory(\n    path_to_validation,\n    target_size=(64, 64),\n    batch_size=batch_size,\n    class_mode='categorical')\n\n# first: train only the top layers (which were randomly initialized)\n# i.e. 
freeze all convolutional layers\nfor layer in base_model.layers:\n    layer.trainable = False\n\n# compile the model (should be done *after* setting layers to non-trainable)\nmodel.compile(optimizer='adadelta', loss='categorical_crossentropy',\n              metrics=['categorical_accuracy'])\n\n# generate callback to save best model w.r.t val_categorical_accuracy\nif use_vgg:\n    file_name = \"vgg\"\nelse:\n    file_name = \"dense\"\n\ncheckpointer = ModelCheckpoint(\"../data/models/\" + file_name +\n                               \"_rgb_transfer_init.\" +\n                               \"{epoch:02d}-{val_categorical_accuracy:.3f}.\" +\n                               \"hdf5\",\n                               monitor='val_categorical_accuracy',\n                               verbose=1,\n                               save_best_only=True,\n                               mode='max')\n\nearlystopper = EarlyStopping(monitor='val_categorical_accuracy',\n                             patience=10,\n                             mode='max',\n                             restore_best_weights=True)\n\ntensorboard = TensorBoard(log_dir='./logs', write_graph=True,\n                          write_images=True, update_freq='epoch')\n\nhistory = model.fit(\n    train_generator,\n    steps_per_epoch=1000,\n    epochs=10000,\n    callbacks=[checkpointer, earlystopper,\n                tensorboard],\n    validation_data=validation_generator,\n    validation_steps=500)\ninitial_epoch = len(history.history['loss'])+1\n# at this point, the top layers are well trained and we can start fine-tuning\n# convolutional layers. 
We will freeze the bottom N layers\n# and train the remaining top layers.\n\n# let's visualize layer names and layer indices to see how many layers\n# we should freeze:\nnames = []\nfor i, layer in enumerate(model.layers):\n    names.append([i, layer.name, layer.trainable])\nprint(names)\n\nif use_vgg:\n    # we will freeze the first convolutional block and train all\n    # remaining blocks, including top layers.\n    for layer in model.layers[:4]:\n        layer.trainable = False\n    for layer in model.layers[4:]:\n        layer.trainable = True\nelse:\n    for layer in model.layers[:7]:\n        layer.trainable = False\n    for layer in model.layers[7:]:\n        layer.trainable = True\n\n# we need to recompile the model for these modifications to take effect\n# we use SGD with a low learning rate\nmodel.compile(optimizer=SGD(learning_rate=0.0001, momentum=0.9),\n              loss='categorical_crossentropy',\n              metrics=['categorical_accuracy'])\n\n# generate callback to save best model w.r.t val_categorical_accuracy\nif use_vgg:\n    file_name = \"vgg\"\nelse:\n    file_name = \"dense\"\ncheckpointer = ModelCheckpoint(\"../data/models/\" + file_name +\n                               \"_rgb_transfer_final.\" +\n                               \"{epoch:02d}-{val_categorical_accuracy:.3f}\" +\n                               \".hdf5\",\n                               monitor='val_categorical_accuracy',\n                               verbose=1,\n                               save_best_only=True,\n                               mode='max')\nearlystopper = EarlyStopping(monitor='val_categorical_accuracy',\n                             patience=50,\n                             mode='max')\nmodel.fit(\n    train_generator,\n    steps_per_epoch=1000,\n    epochs=10000,\n    callbacks=[checkpointer, earlystopper, tensorboard],\n    validation_data=validation_generator,\n    validation_steps=500,\n    initial_epoch=initial_epoch)\n"
  },
  {
    "path": "py/03_train_rgb_from_scratch.py",
    "content": "#!/usr/bin/env python3\n# -*- coding: utf-8 -*-\n\"\"\"\nCode for the PyCon.DE 2018 talk by Jens Leitloff and Felix M. Riese.\n\nPyCon 2018 talk: Satellite data is for everyone: insights into modern remote\nsensing research with open data and Python.\n\nLicense: MIT\n\n\"\"\"\nimport os\n\nfrom tensorflow.keras.applications.densenet import DenseNet201 as DenseNet\nfrom tensorflow.keras.applications.vgg16 import VGG16 as VGG\nfrom tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint\nfrom tensorflow.keras.layers import Dense, GlobalAveragePooling2D\nfrom tensorflow.keras.models import Model\nfrom tensorflow.keras.preprocessing.image import ImageDataGenerator\n\nfrom image_functions import preprocessing_image_rgb\n\n# variables\npath_to_split_datasets = \"~/Documents/Data/PyCon/RGB\"\nuse_vgg = False\nbatch_size = 64\n\n# contruct path\npath_to_home = os.path.expanduser(\"~\")\npath_to_split_datasets = path_to_split_datasets.replace(\"~\", path_to_home)\npath_to_train = os.path.join(path_to_split_datasets, \"train\")\npath_to_validation = os.path.join(path_to_split_datasets, \"validation\")\n\n# get number of classes\nsub_dirs = [sub_dir for sub_dir in os.listdir(path_to_train)\n            if os.path.isdir(os.path.join(path_to_train, sub_dir))]\nnum_classes = len(sub_dirs)\n\n# parameters for CNN\nif use_vgg:\n    base_model = VGG(include_top=False,\n                     weights=None,\n                     input_shape=(64, 64, 3))\nelse:\n    base_model = DenseNet(include_top=False,\n                          weights=None,\n                          input_shape=(64, 64, 3))\n# add a global spatial average pooling layer\ntop_model = base_model.output\ntop_model = GlobalAveragePooling2D()(top_model)\n# or just flatten the layers\n# top_model = Flatten()(top_model)\n# let's add a fully-connected layer\nif use_vgg:\n    # only in VGG19 a fully connected nn is added for classfication\n    # DenseNet tends to overfitting if using additionally 
dense layers\n    top_model = Dense(2048, activation='relu')(top_model)\n    top_model = Dense(2048, activation='relu')(top_model)\n# and a logistic layer\npredictions = Dense(num_classes, activation='softmax')(top_model)\n\n# this is the model we will train\nmodel = Model(inputs=base_model.input, outputs=predictions)\n# print network structure\nmodel.summary()\n\n# defining ImageDataGenerators\n# ... initialization for training\ntrain_datagen = ImageDataGenerator(\n    fill_mode=\"reflect\",\n    rotation_range=45,\n    horizontal_flip=True,\n    vertical_flip=True,\n    preprocessing_function=preprocessing_image_rgb)\n\n# ... initialization for validation\ntest_datagen = ImageDataGenerator(\n    preprocessing_function=preprocessing_image_rgb)\n\n# ... definition for training\ntrain_generator = train_datagen.flow_from_directory(path_to_train,\n                                                    target_size=(64, 64),\n                                                    batch_size=batch_size,\n                                                    class_mode='categorical')\nprint(train_generator.class_indices)\n\n# ... 
definition for validation\nvalidation_generator = test_datagen.flow_from_directory(\n    path_to_validation,\n    target_size=(64, 64),\n    batch_size=batch_size,\n    class_mode='categorical')\n\n# compile the model (should be done *after* setting layers to non-trainable)\nmodel.compile(optimizer='adadelta', loss='categorical_crossentropy',\n              metrics=['categorical_accuracy'])\n\n# generate callback to save best model w.r.t val_categorical_accuracy\nif use_vgg:\n    file_name = \"vgg\"\nelse:\n    file_name = \"dense\"\ncheckpointer = ModelCheckpoint(\"../data/models/\" + file_name +\n                               \"_rgb_from_scratch.\" +\n                               \"{epoch:02d}-{val_categorical_accuracy:.3f}\" +\n                               \".hdf5\",\n                               monitor='val_categorical_accuracy',\n                               verbose=1,\n                               save_best_only=True,\n                               mode='max')\nearlystopper = EarlyStopping(monitor='val_categorical_accuracy',\n                             patience=50,\n                             mode='max')\nmodel.fit(\n    train_generator,\n    steps_per_epoch=1000,\n    epochs=10000,\n    callbacks=[checkpointer, earlystopper],\n    validation_data=validation_generator,\n    validation_steps=500)\n"
  },
  {
    "path": "py/04_train_ms_finetuning.py",
    "content": "#!/usr/bin/env python3\n# -*- coding: utf-8 -*-\n\"\"\"\nCode for the PyCon.DE 2018 talk by Jens Leitloff and Felix M. Riese.\n\nPyCon 2018 talk: Satellite data is for everyone: insights into modern remote\nsensing research with open data and Python.\n\nLicense: MIT\n\n\"\"\"\nimport os\nfrom glob import glob\n\nfrom tensorflow.keras.applications.densenet import DenseNet201 as DenseNet\nfrom tensorflow.keras.applications.vgg16 import VGG16 as VGG\nfrom tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint\nfrom tensorflow.keras.layers import (Conv2D, Dense, GlobalAveragePooling2D,\n                                     Input)\nfrom tensorflow.keras.models import Model\nfrom tensorflow.keras.optimizers import SGD\n\nfrom image_functions import simple_image_generator\n\n# variables\npath_to_split_datasets = \"~/Documents/Data/PyCon/AllBands\"\nuse_vgg = False\nbatch_size = 64\n\nclass_indices = {'AnnualCrop': 0, 'Forest': 1, 'HerbaceousVegetation': 2,\n                 'Highway': 3, 'Industrial': 4, 'Pasture': 5,\n                 'PermanentCrop': 6, 'Residential': 7, 'River': 8,\n                 'SeaLake': 9}\nnum_classes = len(class_indices)\n\n# contruct path\npath_to_home = os.path.expanduser(\"~\")\npath_to_split_datasets = path_to_split_datasets.replace(\"~\", path_to_home)\npath_to_train = os.path.join(path_to_split_datasets, \"train\")\npath_to_validation = os.path.join(path_to_split_datasets, \"validation\")\n\n# parameters for CNN\ninput_tensor = Input(shape=(64, 64, 13))\n# introduce a additional layer to get from 13 to 3 input channels\ninput_tensor = Conv2D(3, (1, 1))(input_tensor)\nif use_vgg:\n    base_model_imagenet = VGG(include_top=False,\n                              weights='imagenet',\n                              input_shape=(64, 64, 3))\n    base_model = VGG(include_top=False,\n                     weights=None,\n                     input_tensor=input_tensor)\n    for i, layer in 
enumerate(base_model_imagenet.layers):\n        # we must skip the input layer, which has no weights\n        if i == 0:\n            continue\n        base_model.layers[i+1].set_weights(layer.get_weights())\nelse:\n    base_model_imagenet = DenseNet(include_top=False,\n                                   weights='imagenet',\n                                   input_shape=(64, 64, 3))\n    base_model = DenseNet(include_top=False,\n                          weights=None,\n                          input_tensor=input_tensor)\n    for i, layer in enumerate(base_model_imagenet.layers):\n        # we must skip the input layer, which has no weights\n        if i == 0:\n            continue\n        base_model.layers[i+1].set_weights(layer.get_weights())\n\n# add a global spatial average pooling layer\ntop_model = base_model.output\ntop_model = GlobalAveragePooling2D()(top_model)\n# or just flatten the layers\n# top_model = Flatten()(top_model)\n\n# let's add a fully-connected layer\nif use_vgg:\n    # only for VGG a fully-connected network is added for classification\n    # DenseNet tends to overfit if additional dense layers are used\n    top_model = Dense(2048, activation='relu')(top_model)\n    top_model = Dense(2048, activation='relu')(top_model)\n# and a logistic layer\npredictions = Dense(num_classes, activation='softmax')(top_model)\n\n# this is the model we will train\nmodel = Model(inputs=base_model.input, outputs=predictions)\n\n# print network structure\nmodel.summary()\n\n# defining ImageDataGenerators\n# ... initialization for training\ntraining_files = glob(path_to_train + \"/**/*.tif\")\ntrain_generator = simple_image_generator(training_files, class_indices,\n                                         batch_size=batch_size,\n                                         rotation_range=45,\n                                         horizontal_flip=True,\n                                         vertical_flip=True)\n\n# ... 
initialization for validation\nvalidation_files = glob(path_to_validation + \"/**/*.tif\")\nvalidation_generator = simple_image_generator(validation_files, class_indices,\n                                              batch_size=batch_size)\n\n# first: train only the top layers (which were randomly initialized)\n# i.e. freeze all convolutional layers\nfor layer in base_model.layers:\n    layer.trainable = False\n# set convolution block for reducing 13 to 3 layers trainable\nfor layer in model.layers[:2]:\n    layer.trainable = True\n\n# compile the model (should be done *after* setting layers to non-trainable)\nmodel.compile(optimizer='adam', loss='categorical_crossentropy',\n              metrics=['categorical_accuracy'])\n\n# generate callback to save best model w.r.t val_categorical_accuracy\nif use_vgg:\n    file_name = \"vgg\"\nelse:\n    file_name = \"dense\"\ncheckpointer = ModelCheckpoint(\"../data/models/\" + file_name +\n                               \"_ms_transfer_init.\" +\n                               \"{epoch:02d}-{val_categorical_accuracy:.3f}.\" +\n                               \"hdf5\",\n                               monitor='val_categorical_accuracy',\n                               verbose=1,\n                               save_best_only=True,\n                               mode='max')\nearlystopper = EarlyStopping(monitor='val_categorical_accuracy',\n                             patience=10,\n                             mode='max',\n                             restore_best_weights=True)\nhistory = model.fit(\n    train_generator,\n    steps_per_epoch=1000,\n    epochs=10000,\n    callbacks=[checkpointer, earlystopper],\n    validation_data=validation_generator,\n    validation_steps=500)\ninitial_epoch = len(history.history['loss'])+1\n\n# at this point, the top layers are well trained and we can start fine-tuning\n# convolutional layers. 
We will freeze the bottom N layers\n# and train the remaining top layers.\n\n# let's visualize layer names and layer indices to see how many layers\n# we should freeze:\nnames = []\nfor i, layer in enumerate(model.layers):\n    names.append([i, layer.name, layer.trainable])\nprint(names)\n\nif use_vgg:\n    # we will freaze the first convolutional block and train all\n    # remaining blocks, including top layers.\n    for layer in model.layers[:2]:\n        layer.trainable = True\n    for layer in model.layers[2:5]:\n        layer.trainable = False\n    for layer in model.layers[5:]:\n        layer.trainable = True\nelse:\n    for layer in model.layers[:2]:\n        layer.trainable = True\n    for layer in model.layers[2:8]:\n        layer.trainable = False\n    for layer in model.layers[8:]:\n        layer.trainable = True\n\n# we need to recompile the model for these modifications to take effect\n# we use SGD with a low learning rate\nmodel.compile(optimizer=SGD(lr=0.0001, momentum=0.9),\n              loss='categorical_crossentropy',\n              metrics=['categorical_accuracy'])\n\n# generate callback to save best model w.r.t val_categorical_accuracy\nif use_vgg:\n    file_name = \"vgg\"\nelse:\n    file_name = \"dense\"\ncheckpointer = ModelCheckpoint(\"../data/models/\" + file_name +\n                               \"_ms_transfer_final.\" +\n                               \"{epoch:02d}-{val_categorical_accuracy:.3f}\" +\n                               \".hdf5\",\n                               monitor='val_categorical_accuracy',\n                               verbose=1,\n                               save_best_only=True,\n                               mode='max')\nearlystopper = EarlyStopping(monitor='val_categorical_accuracy',\n                             patience=10,\n                             mode='max')\nmodel.fit(\n    train_generator,\n    steps_per_epoch=1000,\n    epochs=10000,\n    callbacks=[checkpointer, earlystopper],\n    
validation_data=validation_generator,\n    validation_steps=500,\n    initial_epoch=initial_epoch)\n"
  },
  {
    "path": "py/04_train_ms_finetuning_alternative.py",
    "content": "#!/usr/bin/env python3\n# -*- coding: utf-8 -*-\n\"\"\"\nCode for the PyCon.DE 2018 talk by Jens Leitloff and Felix M. Riese.\n\nPyCon 2018 talk: Satellite data is for everyone: insights into modern remote\nsensing research with open data and Python.\n\nLicense: MIT\n\n\"\"\"\nimport os\nfrom glob import glob\n\n# from tensorflow.keras.applications.vgg19 import VGG19 as VGG\n# from tensorflow.keras.applications.densenet import DenseNet121 as DenseNet\nfrom tensorflow.keras.applications.densenet import DenseNet201 as DenseNet\nfrom tensorflow.keras.applications.vgg16 import VGG16 as VGG\nfrom tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint\nfrom tensorflow.keras.layers import Dense, GlobalAveragePooling2D\nfrom tensorflow.keras.models import Model\nfrom tensorflow.keras.optimizers import SGD\n\nfrom image_functions import simple_image_generator\n\n# variables\npath_to_split_datasets = \"~/Dokumente/Data/PyCon/AllBands\"\nuse_vgg = False\nbatch_size = 64\n\nclass_indices = {'AnnualCrop': 0, 'Forest': 1, 'HerbaceousVegetation': 2,\n                 'Highway': 3, 'Industrial': 4, 'Pasture': 5,\n                 'PermanentCrop': 6, 'Residential': 7, 'River': 8,\n                 'SeaLake': 9}\nnum_classes = len(class_indices)\n\n# contruct path\npath_to_home = os.path.expanduser(\"~\")\npath_to_split_datasets = path_to_split_datasets.replace(\"~\", path_to_home)\npath_to_train = os.path.join(path_to_split_datasets, \"train\")\npath_to_validation = os.path.join(path_to_split_datasets, \"validation\")\n\n\n# parameters for CNN\nif use_vgg:\n    base_model_imagenet = VGG(include_top=False,\n                              weights='imagenet',\n                              input_shape=(64, 64, 3))\n    base_model = VGG(include_top=False,\n                     weights=None,\n                     input_shape=(64, 64, 13))\n    for i, layer in enumerate(base_model_imagenet.layers):\n        # we must skip input layer and first convolutional 
layer\n        if i < 2:\n            continue\n        base_model.layers[i].set_weights(layer.get_weights())\nelse:\n    base_model_imagenet = DenseNet(include_top=False,\n                                   weights='imagenet',\n                                   input_shape=(64, 64, 3))\n    base_model = DenseNet(include_top=False,\n                          weights=None,\n                          input_shape=(64, 64, 13))\n    for i, layer in enumerate(base_model_imagenet.layers):\n        # we must skip input layer, zeropadding and first convolutional layer\n        if i < 3:\n            continue\n        base_model.layers[i].set_weights(layer.get_weights())\n\n# add a global spatial average pooling layer\ntop_model = base_model.output\ntop_model = GlobalAveragePooling2D()(top_model)\n\n# or just flatten the layers\n#    top_model = Flatten()(top_model)\n# let's add a fully-connected layer\nif use_vgg:\n    # only in VGG19 a fully connected nn is added for classfication\n    # DenseNet tends to overfitting if using additionally dense layers\n    top_model = Dense(2048, activation='relu')(top_model)\n    top_model = Dense(2048, activation='relu')(top_model)\n# and a logistic layer\npredictions = Dense(num_classes, activation='softmax')(top_model)\n\n# this is the model we will train\nmodel = Model(inputs=base_model.input, outputs=predictions)\n# print network structure\nmodel.summary()\n\n# defining ImageDataGenerators\n# ... initialization for training\ntraining_files = glob(path_to_train + \"/**/*.tif\")\ntrain_generator = simple_image_generator(training_files, class_indices,\n                                         batch_size=batch_size,\n                                         rotation_range=45,\n                                         horizontal_flip=True,\n                                         vertical_flip=True)\n\n# ... 
initialization for validation\nvalidation_files = glob(path_to_validation + \"/**/*.tif\")\nvalidation_generator = simple_image_generator(validation_files, class_indices,\n                                              batch_size=batch_size)\n\n# first: train only the top layers (which were randomly initialized)\n# i.e. freeze all convolutional layers\nfor layer in base_model.layers:\n    layer.trainable = False\n# set first convolution block for reducing 13 to 3 layers trainable\nfor layer in model.layers[:3]:\n    layer.trainable = True\n\n# compile the model (should be done *after* setting layers to non-trainable)\nmodel.compile(optimizer='adam', loss='categorical_crossentropy',\n              metrics=['categorical_accuracy'])\n\n# generate callback to save best model w.r.t val_categorical_accuracy\nif use_vgg:\n    file_name = \"vgg\"\nelse:\n    file_name = \"dense\"\ncheckpointer = ModelCheckpoint(\"../data/models/\" + file_name +\n                               \"_ms_transfer_alternative_init.\" +\n                               \"{epoch:02d}-{val_categorical_accuracy:.3f}.\" +\n                               \"hdf5\",\n                               monitor='val_categorical_accuracy',\n                               verbose=1,\n                               save_best_only=True,\n                               mode='max')\nearlystopper = EarlyStopping(monitor='val_categorical_accuracy',\n                             patience=10,\n                             mode='max',\n                             restore_best_weights=True)\nhistory = model.fit(\n    train_generator,\n    steps_per_epoch=1000,\n    epochs=10000,\n    callbacks=[checkpointer, earlystopper],\n    validation_data=validation_generator,\n    validation_steps=500)\ninitial_epoch = len(history.history['loss'])+1\n\n# at this point, the top layers are well trained and we can start fine-tuning\n# convolutional layers. 
We will freeze the bottom N layers\n# and train the remaining top layers.\n\n# let's visualize layer names and layer indices to see how many layers\n# we should freeze:\nnames = []\nfor i, layer in enumerate(model.layers):\n    names.append([i, layer.name, layer.trainable])\nprint(names)\n\nif use_vgg:\n    # we will freaze the first convolutional block and train all\n    # remaining blocks, including top layers.\n    for layer in model.layers[:2]:\n        layer.trainable = True\n    for layer in model.layers[2:5]:\n        layer.trainable = False\n    for layer in model.layers[5:]:\n        layer.trainable = True\nelse:\n    for layer in model.layers[:3]:\n        layer.trainable = True\n    for layer in model.layers[3:7]:\n        layer.trainable = False\n    for layer in model.layers[7:]:\n        layer.trainable = True\n\n# we need to recompile the model for these modifications to take effect\n# we use SGD with a low learning rate\nmodel.compile(optimizer=SGD(lr=0.0001, momentum=0.9),\n              loss='categorical_crossentropy',\n              metrics=['categorical_accuracy'])\n\n# generate callback to save best model w.r.t val_categorical_accuracy\nif use_vgg:\n    file_name = \"vgg\"\nelse:\n    file_name = \"dense\"\ncheckpointer = ModelCheckpoint(\"../data/models/\" + file_name +\n                               \"_ms_transfer_alternative_final.\" +\n                               \"{epoch:02d}-{val_categorical_accuracy:.3f}\" +\n                               \".hdf5\",\n                               monitor='val_categorical_accuracy',\n                               verbose=1,\n                               save_best_only=True,\n                               mode='max')\nearlystopper = EarlyStopping(monitor='val_categorical_accuracy',\n                             patience=10,\n                             mode='max')\nmodel.fit(\n    train_generator,\n    steps_per_epoch=1000,\n    epochs=10000,\n    callbacks=[checkpointer, earlystopper],\n    
validation_data=validation_generator,\n    validation_steps=500,\n    initial_epoch=initial_epoch)\n"
  },
  {
    "path": "py/05_train_ms_from_scratch.py",
    "content": "#!/usr/bin/env python3\n# -*- coding: utf-8 -*-\n\"\"\"\nCode for the PyCon.DE 2018 talk by Jens Leitloff and Felix M. Riese.\n\nPyCon 2018 talk: Satellite data is for everyone: insights into modern remote\nsensing research with open data and Python.\n\nLicense: MIT\n\n\"\"\"\nimport os\nfrom glob import glob\n\nfrom tensorflow.keras.applications.densenet import DenseNet201 as DenseNet\nfrom tensorflow.keras.applications.vgg16 import VGG16 as VGG\nfrom tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint\nfrom tensorflow.keras.layers import Dense, GlobalAveragePooling2D\nfrom tensorflow.keras.models import Model\n\nfrom image_functions import simple_image_generator\n\n# variables\npath_to_split_datasets = \"~/Documents/Data/PyCon/AllBands\"\nuse_vgg = False\nbatch_size = 64\n\nclass_indices = {'AnnualCrop': 0, 'Forest': 1, 'HerbaceousVegetation': 2,\n                 'Highway': 3, 'Industrial': 4, 'Pasture': 5,\n                 'PermanentCrop': 6, 'Residential': 7, 'River': 8,\n                 'SeaLake': 9}\nnum_classes = len(class_indices)\n\n# contruct path\npath_to_home = os.path.expanduser(\"~\")\npath_to_split_datasets = path_to_split_datasets.replace(\"~\", path_to_home)\npath_to_train = os.path.join(path_to_split_datasets, \"train\")\npath_to_validation = os.path.join(path_to_split_datasets, \"validation\")\n\n# parameters for CNN\nif use_vgg:\n    base_model = VGG(include_top=False,\n                     weights=None,\n                     input_shape=(64, 64, 13))\nelse:\n    base_model = DenseNet(include_top=False,\n                          weights=None,\n                          input_shape=(64, 64, 13))\n\n# add a global spatial average pooling layer\ntop_model = base_model.output\ntop_model = GlobalAveragePooling2D()(top_model)\n# or just flatten the layers\n#    top_model = Flatten()(top_model)\n# let's add a fully-connected layer\nif use_vgg:\n    # only in VGG19 a fully connected nn is added for classfication\n    # 
DenseNet tends to overfitting if using additionally dense layers\n    top_model = Dense(2048, activation='relu')(top_model)\n    top_model = Dense(2048, activation='relu')(top_model)\n# and a logistic layer\npredictions = Dense(num_classes, activation='softmax')(top_model)\n\n# this is the model we will train\nmodel = Model(inputs=base_model.input, outputs=predictions)\n# print network structure\nmodel.summary()\n\n# defining ImageDataGenerators\n# ... initialization for training\ntraining_files = glob(path_to_train + \"/**/*.tif\")\ntrain_generator = simple_image_generator(training_files, class_indices,\n                                         batch_size=batch_size,\n                                         rotation_range=45,\n                                         horizontal_flip=True,\n                                         vertical_flip=True)\n\n# ... initialization for validation\nvalidation_files = glob(path_to_validation + \"/**/*.tif\")\nvalidation_generator = simple_image_generator(validation_files, class_indices,\n                                              batch_size=batch_size)\n\n# compile the model\nmodel.compile(optimizer='adam', loss='categorical_crossentropy',\n              metrics=['categorical_accuracy'])\n\n# generate callback to save best model w.r.t val_categorical_accuracy\nif use_vgg:\n    file_name = \"vgg\"\nelse:\n    file_name = \"dense\"\ncheckpointer = ModelCheckpoint(\"../data/models/\" + file_name +\n                               \"_ms_from_scratch.\" +\n                               \"{epoch:02d}-{val_categorical_accuracy:.3f}.\" +\n                               \"hdf5\",\n                               monitor='val_categorical_accuracy',\n                               verbose=1,\n                               save_best_only=True,\n                               mode='max')\nearlystopper = EarlyStopping(monitor='val_categorical_accuracy',\n                             patience=50,\n                             
mode='max',\n                             restore_best_weights=True)\nmodel.fit(\n        train_generator,\n        steps_per_epoch=1000,\n        epochs=10000,\n        callbacks=[checkpointer, earlystopper],\n        validation_data=validation_generator,\n        validation_steps=500)\n"
  },
  {
    "path": "py/06_classify_image.py",
    "content": "#!/usr/bin/env python3\n# -*- coding: utf-8 -*-\n\"\"\"\nCode for the PyCon.DE 2018 talk by Jens Leitloff and Felix M. Riese.\n\nPyCon 2018 talk: Satellite data is for everyone: insights into modern remote\nsensing research with open data and Python.\n\nLicense: MIT\n\n\"\"\"\nimport gdal\nimport numpy as np\nfrom skimage.io import imread\nfrom skimage.util import pad\nfrom tensorflow.keras.models import load_model\nfrom tqdm import tqdm\n\n\n# input files\npath_to_image = \"../data/karlsruhe/2018_zugeschnitten.tif\"\npath_to_model = \"../data/models/vgg/vgg_ms_transfer_alternative_final.27-0.985.hdf5\"\n# output files\npath_to_label_image = \"../data/karlsruhe/2018_zugeschnitten_10m_vgg_ms_label.tif\"\npath_to_prob_image = \"../data/karlsruhe/2018_zugeschnitten_10m_vgg_ms_prob.tif\"\n\n# read image and model\nimage = np.array(imread(path_to_image), dtype=float)\n_, num_cols_unpadded, _ = image.shape\nmodel = load_model(path_to_model)\n# get input shape of model\n_, input_rows, input_cols, input_channels = model.layers[0].input_shape\n_, output_classes = model.layers[-1].output_shape\nin_rows_half = int(input_rows/2)\nin_cols_half = int(input_cols/2)\n\n# import correct preprocessing\nif input_channels is 3:\n    from image_functions import preprocessing_image_rgb as preprocessing_image\nelse:\n    from image_functions import preprocessing_image_ms as preprocessing_image\n\n# pad image\nimage = pad(image, ((input_rows, input_rows),\n                    (input_cols, input_cols),\n                    (0, 0)), 'symmetric')\n\n# don't forget to preprocess\nimage = preprocessing_image(image)\nnum_rows, num_cols, _ = image.shape\n\n# sliding window over image\nimage_classified_prob = np.zeros((num_rows, num_cols, output_classes))\nrow_images = np.zeros((num_cols_unpadded, input_rows,\n                       input_cols, input_channels))\nfor row in tqdm(range(input_rows, num_rows-input_rows), desc=\"Processing...\"):\n    # get all images along one row\n   
 for idx, col in enumerate(range(input_cols, num_cols-input_cols)):\n        # cut smal image patch\n        row_images[idx, ...] = image[row-in_rows_half:row+in_rows_half,\n                                     col-in_cols_half:col+in_cols_half, :]\n    # classify images\n    row_classified = model.predict(row_images, batch_size=1024, verbose=0)\n    # put them to final image\n    image_classified_prob[row, input_cols:num_cols-input_cols, : ] = row_classified\n\n# crop padded final image\nimage_classified_prob = image_classified_prob[input_rows:num_rows-input_rows,\n                                              input_cols:num_cols-input_cols, :]\nimage_classified_label = np.argmax(image_classified_prob, axis=-1)\nimage_classified_prob = np.sort(image_classified_prob, axis=-1)[..., -1]\n\n# write image as Geotiff for correct georeferencing\n# read geotransformation\nimage = gdal.Open(path_to_image, gdal.GA_ReadOnly)\ngeotransform = image.GetGeoTransform()\n\n# create image driver\ndriver = gdal.GetDriverByName('GTiff')\n# create destination for label file\nfile = driver.Create(path_to_label_image,\n                     image_classified_label.shape[1],\n                     image_classified_label.shape[0],\n                     1,\n                     gdal.GDT_Byte,\n                     ['TFW=YES', 'NUM_THREADS=1'])\nfile.SetGeoTransform(geotransform)\nfile.SetProjection(image.GetProjection())\n# write label file\nfile.GetRasterBand(1).WriteArray(image_classified_label)\nfile = None\n# create destination for probability file\nfile = driver.Create(path_to_prob_image,\n                     image_classified_prob.shape[1],\n                     image_classified_prob.shape[0],\n                     1,\n                     gdal.GDT_Float32,\n                     ['TFW=YES', 'NUM_THREADS=1'])\nfile.SetGeoTransform(geotransform)\nfile.SetProjection(image.GetProjection())\n# write label file\nfile.GetRasterBand(1).WriteArray(image_classified_prob)\nfile = None\nimage = 
None\n"
  },
  {
    "path": "py/Image_functions.ipynb",
    "content": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"# Notebook for image_functions.py\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"import os\\n\",\n    \"from random import choice, sample\\n\",\n    \"\\n\",\n    \"import numpy as np\\n\",\n    \"from skimage.io import imread\\n\",\n    \"from skimage.transform import rotate\\n\",\n    \"from tensorflow.keras.utils import to_categorical\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Preprocessing\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"### RGB data\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"def preprocessing_image_rgb(x):\\n\",\n    \"    # define mean and std values\\n\",\n    \"    mean = [87.845, 96.965, 103.947]\\n\",\n    \"    std = [23.657, 16.474, 13.793]\\n\",\n    \"    # loop over image channels\\n\",\n    \"    for idx, mean_value in enumerate(mean):\\n\",\n    \"        x[..., idx] -= mean_value\\n\",\n    \"        x[..., idx] /= std[idx]\\n\",\n    \"    return x\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"### MS data\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"def preprocessing_image_ms(x):\\n\",\n    \"    # define mean and std values\\n\",\n    \"    mean = [1353.036, 1116.468, 1041.475, 945.344, 1198.498, 2004.878,\\n\",\n    \"            2376.699, 2303.738, 732.957, 12.092, 1818.820, 1116.271, 2602.579]\\n\",\n    \"    std = [65.479, 154.008, 187.997, 278.508, 228.122, 356.598, 456.035,\\n\",\n    \"           531.570, 98.947, 1.188, 378.993, 303.851, 
503.181]\\n\",\n    \"    # loop over image channels\\n\",\n    \"    for idx, mean_value in enumerate(mean):\\n\",\n    \"        x[..., idx] -= mean_value\\n\",\n    \"        x[..., idx] /= std[idx]\\n\",\n    \"    return x\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Image Generator for MS data\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"### Get label from file name\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"def categorical_label_from_full_file_name(files, class_indices):\\n\",\n    \"    # file basename without extension\\n\",\n    \"    base_name = [os.path.splitext(os.path.basename(i))[0] for i in files]\\n\",\n    \"    # class label from filename\\n\",\n    \"    base_name = [i.split(\\\"_\\\")[0] for i in base_name]\\n\",\n    \"    # label to indices\\n\",\n    \"    image_class = [class_indices[i] for i in base_name]\\n\",\n    \"    # class indices to one-hot-label\\n\",\n    \"    return to_categorical(image_class, num_classes=len(class_indices))\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"### Generate images\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"def simple_image_generator(files, class_indices, batch_size=32,\\n\",\n    \"                           rotation_range=0, horizontal_flip=False,\\n\",\n    \"                           vertical_flip=False):\\n\",\n    \"\\n\",\n    \"    while True:\\n\",\n    \"        # select batch_size number of samples without replacement\\n\",\n    \"        batch_files = sample(files, batch_size)\\n\",\n    \"        # get one_hot_label\\n\",\n    \"        batch_Y = categorical_label_from_full_file_name(batch_files,\\n\",\n    \"                
                                        class_indices)\\n\",\n    \"        # array for images\\n\",\n    \"        batch_X = []\\n\",\n    \"        # loop over images of the current batch\\n\",\n    \"        for idx, input_path in enumerate(batch_files):\\n\",\n    \"            image = np.array(imread(input_path), dtype=float)\\n\",\n    \"            image = preprocessing_image_ms(image)\\n\",\n    \"            # process image\\n\",\n    \"            if horizontal_flip:\\n\",\n    \"                # randomly flip image up/down\\n\",\n    \"                if choice([True, False]):\\n\",\n    \"                    image = np.flipud(image)\\n\",\n    \"            if vertical_flip:\\n\",\n    \"                # randomly flip image left/right\\n\",\n    \"                if choice([True, False]):\\n\",\n    \"                    image = np.fliplr(image)\\n\",\n    \"            # rotate image by random angle between\\n\",\n    \"            # -rotation_range <= angle < rotation_range\\n\",\n    \"            if rotation_range != 0:\\n\",\n    \"                angle = np.random.uniform(low=-abs(rotation_range),\\n\",\n    \"                                          high=abs(rotation_range))\\n\",\n    \"                image = rotate(image, angle, mode='reflect',\\n\",\n    \"                               order=1, preserve_range=True)\\n\",\n    \"            # put all together\\n\",\n    \"            batch_X += [image]\\n\",\n    \"        # convert lists to np.array\\n\",\n    \"        X = np.array(batch_X)\\n\",\n    \"        Y = np.array(batch_Y)\\n\",\n    \"\\n\",\n    \"        yield(X, Y)\"\n   ]\n  }\n ],\n \"metadata\": {\n  \"kernelspec\": {\n   \"display_name\": \"Python 3\",\n   \"language\": \"python\",\n   \"name\": \"python3\"\n  },\n  \"language_info\": {\n   \"codemirror_mode\": {\n    \"name\": \"ipython\",\n    \"version\": 3\n   },\n   \"file_extension\": \".py\",\n   \"mimetype\": \"text/x-python\",\n   \"name\": \"python\",\n   
\"nbconvert_exporter\": \"python\",\n   \"pygments_lexer\": \"ipython3\",\n   \"version\": \"3.7.7\"\n  },\n  \"varInspector\": {\n   \"cols\": {\n    \"lenName\": 16,\n    \"lenType\": 16,\n    \"lenVar\": 40\n   },\n   \"kernels_config\": {\n    \"python\": {\n     \"delete_cmd_postfix\": \"\",\n     \"delete_cmd_prefix\": \"del \",\n     \"library\": \"var_list.py\",\n     \"varRefreshCmd\": \"print(var_dic_list())\"\n    },\n    \"r\": {\n     \"delete_cmd_postfix\": \") \",\n     \"delete_cmd_prefix\": \"rm(\",\n     \"library\": \"var_list.r\",\n     \"varRefreshCmd\": \"cat(var_dic_list()) \"\n    }\n   },\n   \"types_to_exclude\": [\n    \"module\",\n    \"function\",\n    \"builtin_function_or_method\",\n    \"instance\",\n    \"_Feature\"\n   ],\n   \"window_display\": false\n  }\n },\n \"nbformat\": 4,\n \"nbformat_minor\": 2\n}\n"
  },
  {
    "path": "py/Train_from_Scratch.ipynb",
    "content": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"# Notebook for 05_train_ms_from_scratch.py\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"###  Import libaries\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"import os\\n\",\n    \"from glob import glob\\n\",\n    \"\\n\",\n    \"from tensorflow.keras.applications.vgg16 import VGG16 as VGG\\n\",\n    \"from tensorflow.keras.applications.densenet import DenseNet201 as DenseNet\\n\",\n    \"from tensorflow.keras.layers import GlobalAveragePooling2D, Dense\\n\",\n    \"from tensorflow.keras.models import Model\\n\",\n    \"from tensorflow.keras.callbacks import ModelCheckpoint, EarlyStopping, TensorBoard\\n\",\n    \"\\n\",\n    \"from image_functions import simple_image_generator\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"### define path to training and validation data\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"# variables\\n\",\n    \"path_to_split_datasets = \\\"~/Documents/Data/PyCon/AllBands\\\"\\n\",\n    \"use_vgg = False\\n\",\n    \"batch_size = 64\\n\",\n    \"\\n\",\n    \"# contruct path\\n\",\n    \"path_to_home = os.path.expanduser(\\\"~\\\")\\n\",\n    \"path_to_split_datasets = path_to_split_datasets.replace(\\\"~\\\", path_to_home)\\n\",\n    \"path_to_train = os.path.join(path_to_split_datasets, \\\"train\\\")\\n\",\n    \"path_to_validation = os.path.join(path_to_split_datasets, \\\"validation\\\")\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"![tree](images_for_notebook/tree_files.png \\\"file_tree\\\")\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": 
{},\n   \"source\": [\n    \"### define classes\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"class_indices = {'AnnualCrop': 0, 'Forest': 1, 'HerbaceousVegetation': 2,\\n\",\n    \"                 'Highway': 3, 'Industrial': 4, 'Pasture': 5,\\n\",\n    \"                 'PermanentCrop': 6, 'Residential': 7, 'River': 8,\\n\",\n    \"                 'SeaLake': 9}\\n\",\n    \"num_classes = len(class_indices)\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Training from scratch\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"![vgg16](images_for_notebook/vgg16.png \\\"Original VGG\\\")\\n\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"### 1. Initialize network model without top layers\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"![vgg16_no_top](images_for_notebook/vgg16_no_top.png \\\"VGG no top\\\")\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"# parameters for CNN\\n\",\n    \"if use_vgg:\\n\",\n    \"    base_model = VGG(include_top=False,\\n\",\n    \"                     weights=None,\\n\",\n    \"                     input_shape=(64, 64, 13))\\n\",\n    \"else:\\n\",\n    \"    base_model = DenseNet(include_top=False,\\n\",\n    \"                          weights=None,\\n\",\n    \"                          input_shape=(64, 64, 13))\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"### 2. 
define new top layers\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"![vgg16_sentinel_rgb](images_for_notebook/vgg16_sentinel_rgb.png \\\"VGG RGB Sentinel\\\")\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"# add a global spatial average pooling layer\\n\",\n    \"top_model = base_model.output\\n\",\n    \"top_model = GlobalAveragePooling2D()(top_model)\\n\",\n    \"# or just flatten the layers\\n\",\n    \"#    top_model = Flatten()(top_model)\\n\",\n    \"# let's add a fully-connected layer\\n\",\n    \"if use_vgg:\\n\",\n    \"    # only for VGG a fully connected nn is added for classification\\n\",\n    \"    # DenseNet tends to overfit if additional dense layers are used\\n\",\n    \"    top_model = Dense(2048, activation='relu')(top_model)\\n\",\n    \"    top_model = Dense(2048, activation='relu')(top_model)\\n\",\n    \"# and a logistic layer\\n\",\n    \"predictions = Dense(num_classes, activation='softmax')(top_model)\\n\",\n    \"# this is the model we will train\\n\",\n    \"model = Model(inputs=base_model.input, outputs=predictions)\\n\",\n    \"# print network structure\\n\",\n    \"model.summary()\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"### 3. define data augmentation\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"# defining ImageDataGenerators\\n\",\n    \"# ... 
initialization for training\\n\",\n    \"training_files = glob(path_to_train + \\\"/**/*.tif\\\")\\n\",\n    \"train_generator = simple_image_generator(training_files, class_indices,\\n\",\n    \"                                         batch_size=batch_size,\\n\",\n    \"                                         rotation_range=45,\\n\",\n    \"                                         horizontal_flip=True,\\n\",\n    \"                                         vertical_flip=True)\\n\",\n    \"\\n\",\n    \"# ... initialization for validation\\n\",\n    \"validation_files = glob(path_to_validation + \\\"/**/*.tif\\\")\\n\",\n    \"validation_generator = simple_image_generator(validation_files, class_indices,\\n\",\n    \"                                              batch_size=batch_size)\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"### 4. define callbacks\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"# generate callback to save best model w.r.t val_categorical_accuracy\\n\",\n    \"if use_vgg:\\n\",\n    \"    file_name = \\\"vgg\\\"\\n\",\n    \"else:\\n\",\n    \"    file_name = \\\"dense\\\"\\n\",\n    \"checkpointer = ModelCheckpoint(\\\"../data/models/\\\" + file_name +\\n\",\n    \"                               \\\"_ms_from_scratch.\\\" +\\n\",\n    \"                               \\\"{epoch:02d}-{val_categorical_accuracy:.3f}.\\\" +\\n\",\n    \"                               \\\"hdf5\\\",\\n\",\n    \"                               monitor='val_categorical_accuracy',\\n\",\n    \"                               verbose=1,\\n\",\n    \"                               save_best_only=True,\\n\",\n    \"                               mode='max')\\n\",\n    \"earlystopper = EarlyStopping(monitor='val_categorical_accuracy',\\n\",\n    \"                             patience=50,\\n\",\n    \"                       
      mode='max',\\n\",\n    \"                             restore_best_weights=True)\\n\",\n    \"\\n\",\n    \"tensorboard = TensorBoard(log_dir='./logs', write_graph=True,\\n\",\n    \"                          write_images=True, update_freq='epoch')\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"![tensorflow](images_for_notebook/tensorflow.png \\\"VGG RGB Sentinel\\\")\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"### 8. fit model\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"# compile the model\\n\",\n    \"model.compile(optimizer='adam', loss='categorical_crossentropy',\\n\",\n    \"              metrics=['categorical_accuracy'])\\n\",\n    \"\\n\",\n    \"model.fit(\\n\",\n    \"        train_generator,\\n\",\n    \"        steps_per_epoch=100,\\n\",\n    \"        epochs=5,\\n\",\n    \"        callbacks=[checkpointer, earlystopper, tensorboard],\\n\",\n    \"        validation_data=validation_generator,\\n\",\n    \"        validation_steps=500)\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": []\n  }\n ],\n \"metadata\": {\n  \"kernelspec\": {\n   \"display_name\": \"Python 3\",\n   \"language\": \"python\",\n   \"name\": \"python3\"\n  },\n  \"language_info\": {\n   \"codemirror_mode\": {\n    \"name\": \"ipython\",\n    \"version\": 3\n   },\n   \"file_extension\": \".py\",\n   \"mimetype\": \"text/x-python\",\n   \"name\": \"python\",\n   \"nbconvert_exporter\": \"python\",\n   \"pygments_lexer\": \"ipython3\",\n   \"version\": \"3.8.5\"\n  },\n  \"varInspector\": {\n   \"cols\": {\n    \"lenName\": 16,\n    \"lenType\": 16,\n    \"lenVar\": 40\n   },\n   \"kernels_config\": {\n    \"python\": {\n     \"delete_cmd_postfix\": \"\",\n     \"delete_cmd_prefix\": 
\"del \",\n     \"library\": \"var_list.py\",\n     \"varRefreshCmd\": \"print(var_dic_list())\"\n    },\n    \"r\": {\n     \"delete_cmd_postfix\": \") \",\n     \"delete_cmd_prefix\": \"rm(\",\n     \"library\": \"var_list.r\",\n     \"varRefreshCmd\": \"cat(var_dic_list()) \"\n    }\n   },\n   \"types_to_exclude\": [\n    \"module\",\n    \"function\",\n    \"builtin_function_or_method\",\n    \"instance\",\n    \"_Feature\"\n   ],\n   \"window_display\": false\n  }\n },\n \"nbformat\": 4,\n \"nbformat_minor\": 2\n}\n"
  },
  {
    "path": "py/Transfer_learning.ipynb",
    "content": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"# Note for 02_train_rgb_finetuning.py\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"### Import libraries\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 1,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"name\": \"stderr\",\n     \"output_type\": \"stream\",\n     \"text\": [\n      \"Using TensorFlow backend.\\n\"\n     ]\n    }\n   ],\n   \"source\": [\n    \"import os\\n\",\n    \"\\n\",\n    \"from tensorflow.keras.preprocessing.image import ImageDataGenerator\\n\",\n    \"from tensorflow.keras.applications.vgg16 import VGG16 as VGG\\n\",\n    \"from tensorflow.keras.applications.densenet import DenseNet201 as DenseNet\\n\",\n    \"from tensorflow.keras.optimizers import SGD\\n\",\n    \"from tensorflow.keras.layers import GlobalAveragePooling2D, Dense\\n\",\n    \"from tensorflow.keras.models import Model\\n\",\n    \"from tensorflow.keras.callbacks import ModelCheckpoint, EarlyStopping, TensorBoard\\n\",\n    \"\\n\",\n    \"from image_functions import preprocessing_image_rgb\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"### define path to training and validation data\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 2,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"# variables\\n\",\n    \"path_to_split_datasets = \\\"~/Documents/Data/PyCon/RGB\\\"\\n\",\n    \"use_vgg = True\\n\",\n    \"batch_size = 64\\n\",\n    \"\\n\",\n    \"# construct path\\n\",\n    \"path_to_home = os.path.expanduser(\\\"~\\\")\\n\",\n    \"path_to_split_datasets = path_to_split_datasets.replace(\\\"~\\\", path_to_home)\\n\",\n    \"path_to_train = os.path.join(path_to_split_datasets, \\\"train\\\")\\n\",\n    \"path_to_validation = os.path.join(path_to_split_datasets, 
\\\"validation\\\")\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"![tree](images_for_notebook/tree_files.png \\\"file_tree\\\")\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"### determine number of classes from data\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 3,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"# get number of classes\\n\",\n    \"sub_dirs = [sub_dir for sub_dir in os.listdir(path_to_train)\\n\",\n    \"            if os.path.isdir(os.path.join(path_to_train, sub_dir))]\\n\",\n    \"num_classes = len(sub_dirs)\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"## Transfer-learning \"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"![vgg16](images_for_notebook/vgg16.png \\\"Original VGG\\\")\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"### 1. Pretrained network model without top layers\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"![vgg16_no_top](images_for_notebook/vgg16_no_top.png \\\"VGG no top\\\")\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 4,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"# parameters for CNN\\n\",\n    \"if use_vgg:\\n\",\n    \"    base_model = VGG(include_top=False,\\n\",\n    \"                     weights='imagenet',\\n\",\n    \"                     input_shape=(64, 64, 3))\\n\",\n    \"else:\\n\",\n    \"    base_model = DenseNet(include_top=False,\\n\",\n    \"                          weights='imagenet',\\n\",\n    \"                          input_shape=(64, 64, 3))\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"### 2. 
define new top layers\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"![vgg16_sentinel_rgb](images_for_notebook/vgg16_sentinel_rgb.png \\\"VGG RGB Sentinel\\\")\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"# add a global spatial average pooling layer\\n\",\n    \"top_model = base_model.output\\n\",\n    \"top_model = GlobalAveragePooling2D()(top_model)\\n\",\n    \"# or just flatten the layers\\n\",\n    \"#    top_model = Flatten()(top_model)\\n\",\n    \"# let's add a fully-connected layer\\n\",\n    \"if use_vgg:\\n\",\n    \"    # only for VGG a fully connected classifier is added\\n\",\n    \"    # (DenseNet tends to overfit when additional dense layers are used)\\n\",\n    \"    top_model = Dense(2048, activation='relu')(top_model)\\n\",\n    \"    top_model = Dense(2048, activation='relu')(top_model)\\n\",\n    \"# and a logistic layer\\n\",\n    \"predictions = Dense(num_classes, activation='softmax')(top_model)\\n\",\n    \"\\n\",\n    \"# this is the model we will train\\n\",\n    \"model = Model(inputs=base_model.input, outputs=predictions)\\n\",\n    \"\\n\",\n    \"# print network structure\\n\",\n    \"model.summary()\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"### 3. 
define data augmentation\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 6,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"name\": \"stdout\",\n     \"output_type\": \"stream\",\n     \"text\": [\n      \"Found 18900 images belonging to 10 classes.\\n\",\n      \"{'AnnualCrop': 0, 'Forest': 1, 'HerbaceousVegetation': 2, 'Highway': 3, 'Industrial': 4, 'Pasture': 5, 'PermanentCrop': 6, 'Residential': 7, 'River': 8, 'SeaLake': 9}\\n\",\n      \"Found 8100 images belonging to 10 classes.\\n\"\n     ]\n    }\n   ],\n   \"source\": [\n    \"# defining ImageDataGenerators\\n\",\n    \"# ... initialization for training\\n\",\n    \"train_datagen = ImageDataGenerator(fill_mode=\\\"reflect\\\",\\n\",\n    \"                                   rotation_range=45,\\n\",\n    \"                                   horizontal_flip=True,\\n\",\n    \"                                   vertical_flip=True,\\n\",\n    \"                                   preprocessing_function=preprocessing_image_rgb)\\n\",\n    \"# ... initialization for validation\\n\",\n    \"test_datagen = ImageDataGenerator(preprocessing_function=preprocessing_image_rgb)\\n\",\n    \"# ... definition for training\\n\",\n    \"train_generator = train_datagen.flow_from_directory(path_to_train,\\n\",\n    \"                                                    target_size=(64, 64),\\n\",\n    \"                                                    batch_size=batch_size,\\n\",\n    \"                                                    class_mode='categorical')\\n\",\n    \"# just for information\\n\",\n    \"class_indices = train_generator.class_indices\\n\",\n    \"print(class_indices)\\n\",\n    \"\\n\",\n    \"# ... 
definition for validation\\n\",\n    \"validation_generator = test_datagen.flow_from_directory(path_to_validation,\\n\",\n    \"                                                        target_size=(64, 64),\\n\",\n    \"                                                        batch_size=batch_size,\\n\",\n    \"                                                        class_mode='categorical')\\n\",\n    \"\\n\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"### 4. define callbacks\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 7,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"# generate callback to save best model w.r.t val_categorical_accuracy\\n\",\n    \"if use_vgg:\\n\",\n    \"    file_name = \\\"vgg\\\"\\n\",\n    \"else:\\n\",\n    \"    file_name = \\\"dense\\\"\\n\",\n    \"\\n\",\n    \"checkpointer = ModelCheckpoint(\\\"../data/models/\\\" + file_name +\\n\",\n    \"                               \\\"_rgb_transfer_init.\\\" +\\n\",\n    \"                               \\\"{epoch:02d}-{val_categorical_accuracy:.3f}.\\\" +\\n\",\n    \"                               \\\"hdf5\\\",\\n\",\n    \"                               monitor='val_categorical_accuracy',\\n\",\n    \"                               verbose=1,\\n\",\n    \"                               save_best_only=True,\\n\",\n    \"                               mode='max')\\n\",\n    \"\\n\",\n    \"earlystopper = EarlyStopping(monitor='val_categorical_accuracy',\\n\",\n    \"                             patience=10,\\n\",\n    \"                             mode='max',\\n\",\n    \"                             restore_best_weights=True)\\n\",\n    \"\\n\",\n    \"tensorboard = TensorBoard(log_dir='./logs', write_graph=True, \\n\",\n    \"                          write_images=True, update_freq='epoch')\\n\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    
\"![tensorflow](images_for_notebook/tensorflow.png \\\"VGG RGB Sentinel\\\")\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"### 5. set base layers to non-trainable\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"![vgg16_rgb_init](images_for_notebook/vgg16_rgb_init.png \\\"VGG RGB Sentinel\\\")\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 8,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"# first: train only the top layers (which were randomly initialized)\\n\",\n    \"# i.e. freeze all convolutional layers\\n\",\n    \"for layer in base_model.layers:\\n\",\n    \"    layer.trainable = False\\n\",\n    \"\\n\",\n    \"# compile the model (should be done *after* setting layers to non-trainable)\\n\",\n    \"model.compile(optimizer='adadelta', loss='categorical_crossentropy',\\n\",\n    \"              metrics=['categorical_accuracy'])\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"### 6. 
fit model (train new top layers)\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 9,\n   \"metadata\": {},\n   \"outputs\": [\n    {\n     \"name\": \"stdout\",\n     \"output_type\": \"stream\",\n     \"text\": [\n      \"Epoch 1/5\\n\",\n      \"100/100 [==============================] - 18s 182ms/step - loss: 0.6681 - categorical_accuracy: 0.7841 - val_loss: 0.3120 - val_categorical_accuracy: 0.8931\\n\",\n      \"\\n\",\n      \"Epoch 00001: val_categorical_accuracy improved from -inf to 0.89309, saving model to ../data/models/vgg_rgb_transfer_init.01-0.893.hdf5\\n\",\n      \"Epoch 2/5\\n\",\n      \"100/100 [==============================] - 17s 174ms/step - loss: 0.3477 - categorical_accuracy: 0.8808 - val_loss: 0.2733 - val_categorical_accuracy: 0.9039\\n\",\n      \"\\n\",\n      \"Epoch 00002: val_categorical_accuracy improved from 0.89309 to 0.90388, saving model to ../data/models/vgg_rgb_transfer_init.02-0.904.hdf5\\n\",\n      \"Epoch 3/5\\n\",\n      \"100/100 [==============================] - 17s 172ms/step - loss: 0.3098 - categorical_accuracy: 0.8961 - val_loss: 0.2515 - val_categorical_accuracy: 0.9126\\n\",\n      \"\\n\",\n      \"Epoch 00003: val_categorical_accuracy improved from 0.90388 to 0.91263, saving model to ../data/models/vgg_rgb_transfer_init.03-0.913.hdf5\\n\",\n      \"Epoch 4/5\\n\",\n      \"100/100 [==============================] - 17s 174ms/step - loss: 0.2719 - categorical_accuracy: 0.9106 - val_loss: 0.2330 - val_categorical_accuracy: 0.9200\\n\",\n      \"\\n\",\n      \"Epoch 00004: val_categorical_accuracy improved from 0.91263 to 0.92003, saving model to ../data/models/vgg_rgb_transfer_init.04-0.920.hdf5\\n\",\n      \"Epoch 5/5\\n\",\n      \"100/100 [==============================] - 17s 172ms/step - loss: 0.2601 - categorical_accuracy: 0.9136 - val_loss: 0.3209 - val_categorical_accuracy: 0.8893\\n\",\n      \"\\n\",\n      \"Epoch 00005: val_categorical_accuracy did not improve from 
0.92003\\n\"\n     ]\n    }\n   ],\n   \"source\": [\n    \"history = model.fit(train_generator,\\n\",\n    \"                    steps_per_epoch=100,\\n\",\n    \"                    epochs=5,\\n\",\n    \"                    callbacks=[checkpointer, earlystopper,\\n\",\n    \"                               tensorboard],\\n\",\n    \"                    validation_data=validation_generator,\\n\",\n    \"                    validation_steps=500)\\n\",\n    \"initial_epoch = len(history.history['loss'])\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"### 7. set (some) base layers trainable\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"![vgg16_rgb_finetune](images_for_notebook/vgg16_rgb_finetune.png)\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": null,\n   \"metadata\": {},\n   \"outputs\": [],\n   \"source\": [\n    \"# at this point, the top layers are well trained and we can start fine-tuning\\n\",\n    \"# convolutional layers. 
We will freeze the bottom N layers\\n\",\n    \"# and train the remaining top layers.\\n\",\n    \"\\n\",\n    \"# let's visualize layer names and layer indices to see how many layers\\n\",\n    \"# we should freeze:\\n\",\n    \"for i, layer in enumerate(model.layers):\\n\",\n    \"    print([i, layer.name, layer.trainable])\\n\",\n    \"\\n\",\n    \"if use_vgg:\\n\",\n    \"    # we will freeze the first convolutional block and train all\\n\",\n    \"    # remaining blocks, including top layers.\\n\",\n    \"    for layer in model.layers[:4]:\\n\",\n    \"        layer.trainable = False\\n\",\n    \"    for layer in model.layers[4:]:\\n\",\n    \"        layer.trainable = True\\n\",\n    \"else:\\n\",\n    \"    for layer in model.layers[:7]:\\n\",\n    \"        layer.trainable = False\\n\",\n    \"    for layer in model.layers[7:]:\\n\",\n    \"        layer.trainable = True\\n\",\n    \"\\n\",\n    \"# we need to recompile the model for these modifications to take effect\\n\",\n    \"# we use SGD with a low learning rate\\n\",\n    \"model.compile(optimizer=SGD(learning_rate=0.0001, momentum=0.9),\\n\",\n    \"              loss='categorical_crossentropy',\\n\",\n    \"              metrics=['categorical_accuracy'])\"\n   ]\n  },\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"### 8. 
fit model (fine-tune base and top layers)\"\n   ]\n  },\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 11,\n   \"metadata\": {\n    \"scrolled\": true\n   },\n   \"outputs\": [\n    {\n     \"name\": \"stdout\",\n     \"output_type\": \"stream\",\n     \"text\": [\n      \"Epoch 6/10\\n\",\n      \"100/100 [==============================] - 18s 181ms/step - loss: 0.1926 - categorical_accuracy: 0.9299 - val_loss: 0.1883 - val_categorical_accuracy: 0.9356\\n\",\n      \"\\n\",\n      \"Epoch 00006: val_categorical_accuracy improved from -inf to 0.93564, saving model to ../data/models/vgg_rgb_transfer_final.06-0.936.hdf5\\n\",\n      \"Epoch 7/10\\n\",\n      \"100/100 [==============================] - 18s 178ms/step - loss: 0.1697 - categorical_accuracy: 0.9409 - val_loss: 0.1771 - val_categorical_accuracy: 0.9399\\n\",\n      \"\\n\",\n      \"Epoch 00007: val_categorical_accuracy improved from 0.93564 to 0.93985, saving model to ../data/models/vgg_rgb_transfer_final.07-0.940.hdf5\\n\",\n      \"Epoch 8/10\\n\",\n      \"100/100 [==============================] - 18s 176ms/step - loss: 0.1614 - categorical_accuracy: 0.9436 - val_loss: 0.1674 - val_categorical_accuracy: 0.9430\\n\",\n      \"\\n\",\n      \"Epoch 00008: val_categorical_accuracy improved from 0.93985 to 0.94299, saving model to ../data/models/vgg_rgb_transfer_final.08-0.943.hdf5\\n\",\n      \"Epoch 9/10\\n\",\n      \"100/100 [==============================] - 17s 174ms/step - loss: 0.1489 - categorical_accuracy: 0.9515 - val_loss: 0.1619 - val_categorical_accuracy: 0.9466\\n\",\n      \"\\n\",\n      \"Epoch 00009: val_categorical_accuracy improved from 0.94299 to 0.94663, saving model to ../data/models/vgg_rgb_transfer_final.09-0.947.hdf5\\n\",\n      \"Epoch 10/10\\n\",\n      \"100/100 [==============================] - 17s 174ms/step - loss: 0.1279 - categorical_accuracy: 0.9544 - val_loss: 0.1482 - val_categorical_accuracy: 0.9483\\n\",\n      \"\\n\",\n      \"Epoch 00010: 
val_categorical_accuracy improved from 0.94663 to 0.94829, saving model to ../data/models/vgg_rgb_transfer_final.10-0.948.hdf5\\n\"\n     ]\n    },\n    {\n     \"data\": {\n      \"text/plain\": [\n       \"<keras.callbacks.History at 0x7fdff804c978>\"\n      ]\n     },\n     \"execution_count\": 11,\n     \"metadata\": {},\n     \"output_type\": \"execute_result\"\n    }\n   ],\n   \"source\": [\n    \"# generate callback to save best model w.r.t val_categorical_accuracy\\n\",\n    \"if use_vgg:\\n\",\n    \"    file_name = \\\"vgg\\\"\\n\",\n    \"else:\\n\",\n    \"    file_name = \\\"dense\\\"\\n\",\n    \"checkpointer = ModelCheckpoint(\\\"../data/models/\\\" + file_name +\\n\",\n    \"                               \\\"_rgb_transfer_final.\\\" +\\n\",\n    \"                               \\\"{epoch:02d}-{val_categorical_accuracy:.3f}\\\" +\\n\",\n    \"                               \\\".hdf5\\\",\\n\",\n    \"                               monitor='val_categorical_accuracy',\\n\",\n    \"                               verbose=1,\\n\",\n    \"                               save_best_only=True,\\n\",\n    \"                               mode='max')\\n\",\n    \"earlystopper = EarlyStopping(monitor='val_categorical_accuracy',\\n\",\n    \"                             patience=50,\\n\",\n    \"                             mode='max')\\n\",\n    \"model.fit(train_generator,\\n\",\n    \"          steps_per_epoch=100,\\n\",\n    \"          epochs=initial_epoch+5,\\n\",\n    \"          callbacks=[checkpointer, earlystopper, tensorboard],\\n\",\n    \"          validation_data=validation_generator,\\n\",\n    \"          validation_steps=500,\\n\",\n    \"          initial_epoch=initial_epoch)\"\n   ]\n  }\n ],\n \"metadata\": {\n  \"kernelspec\": {\n   \"display_name\": \"Python 3\",\n   \"language\": \"python\",\n   \"name\": \"python3\"\n  },\n  \"language_info\": {\n   \"codemirror_mode\": {\n    \"name\": \"ipython\",\n    \"version\": 3\n   },\n   
\"file_extension\": \".py\",\n   \"mimetype\": \"text/x-python\",\n   \"name\": \"python\",\n   \"nbconvert_exporter\": \"python\",\n   \"pygments_lexer\": \"ipython3\",\n   \"version\": \"3.8.5\"\n  },\n  \"varInspector\": {\n   \"cols\": {\n    \"lenName\": 16,\n    \"lenType\": 16,\n    \"lenVar\": 40\n   },\n   \"kernels_config\": {\n    \"python\": {\n     \"delete_cmd_postfix\": \"\",\n     \"delete_cmd_prefix\": \"del \",\n     \"library\": \"var_list.py\",\n     \"varRefreshCmd\": \"print(var_dic_list())\"\n    },\n    \"r\": {\n     \"delete_cmd_postfix\": \") \",\n     \"delete_cmd_prefix\": \"rm(\",\n     \"library\": \"var_list.r\",\n     \"varRefreshCmd\": \"cat(var_dic_list()) \"\n    }\n   },\n   \"types_to_exclude\": [\n    \"module\",\n    \"function\",\n    \"builtin_function_or_method\",\n    \"instance\",\n    \"_Feature\"\n   ],\n   \"window_display\": false\n  }\n },\n \"nbformat\": 4,\n \"nbformat_minor\": 2\n}\n"
  },
  {
    "path": "py/image_functions.py",
    "content": "# -*- coding: utf-8 -*-\nimport os\nfrom random import choice, sample\n\nimport numpy as np\nfrom skimage.io import imread\nfrom skimage.transform import rotate\nfrom tensorflow.keras.utils import to_categorical\n\n\ndef preprocessing_image_rgb(x):\n    # define mean and std values\n    mean = [87.845, 96.965, 103.947]\n    std = [23.657, 16.474, 13.793]\n    # loop over image channels\n    for idx, mean_value in enumerate(mean):\n        x[..., idx] -= mean_value\n        x[..., idx] /= std[idx]\n    return x\n\n\ndef preprocessing_image_ms(x):\n    # define mean and std values\n    mean = [1353.036, 1116.468, 1041.475, 945.344, 1198.498, 2004.878,\n            2376.699, 2303.738, 732.957, 12.092, 1818.820, 1116.271, 2602.579]\n    std = [65.479, 154.008, 187.997, 278.508, 228.122, 356.598, 456.035,\n           531.570, 98.947, 1.188, 378.993, 303.851, 503.181]\n    # loop over image channels\n    for idx, mean_value in enumerate(mean):\n        x[..., idx] -= mean_value\n        x[..., idx] /= std[idx]\n    return x\n\n\ndef categorical_label_from_full_file_name(files, class_indices):\n    # file basename without extension\n    base_name = [os.path.splitext(os.path.basename(i))[0] for i in files]\n    # class label from filename\n    base_name = [i.split(\"_\")[0] for i in base_name]\n    # label to indices\n    image_class = [class_indices[i] for i in base_name]\n    # class indices to one-hot-label\n    return to_categorical(image_class, num_classes=len(class_indices))\n\n\ndef simple_image_generator(files, class_indices, batch_size=32,\n                           rotation_range=0, horizontal_flip=False,\n                           vertical_flip=False):\n\n    while True:\n        # select batch_size number of samples without replacement\n        batch_files = sample(files, batch_size)\n        # get one_hot_label\n        batch_Y = categorical_label_from_full_file_name(batch_files,\n                                                        
class_indices)\n        # array for images\n        batch_X = []\n        # loop over images of the current batch\n        for input_path in batch_files:\n            image = np.array(imread(input_path), dtype=float)\n            image = preprocessing_image_ms(image)\n            # process image\n            if horizontal_flip:\n                # randomly flip image left/right\n                if choice([True, False]):\n                    image = np.fliplr(image)\n            if vertical_flip:\n                # randomly flip image up/down\n                if choice([True, False]):\n                    image = np.flipud(image)\n            # rotate image by random angle between\n            # -rotation_range <= angle < rotation_range\n            if rotation_range != 0:\n                angle = np.random.uniform(low=-abs(rotation_range),\n                                          high=abs(rotation_range))\n                image = rotate(image, angle, mode='reflect',\n                               order=1, preserve_range=True)\n            # put all together\n            batch_X += [image]\n        # convert lists to np.array\n        X = np.array(batch_X)\n        Y = np.array(batch_Y)\n\n        yield (X, Y)\n"
  },
  {
    "path": "py/statistics.py",
    "content": "# -*- coding: utf-8 -*-\n\n\"\"\"Some statistics about the EuroSAT dataset.\"\"\"\n\nimport glob\nimport os\n\nimport numpy as np\nfrom osgeo import gdal\n\n\ndef getMeanStd(path, n_bands=3, n_max=-1):\n    \"\"\"Get mean and standard deviation from images.\n\n    Parameters\n    ----------\n    path : str\n        Path to training images\n    n_bands : int\n        Number of spectral bands (3 for RGB, 13 for Sentinel-2)\n    n_max : int\n        Maximum number of images to process (-1 = all)\n\n    Returns\n    -------\n    res_mean, res_std : list of float\n        Per-band mean and standard deviation.\n\n    \"\"\"\n    if not os.path.isdir(path):\n        print(\"Error: Directory does not exist.\")\n        return 0\n\n    mean_array = [[] for _ in range(n_bands)]\n    std_array = [[] for _ in range(n_bands)]\n\n    # iterate over the images\n    i = 0\n    for tif in glob.glob(path + \"*/*.*\"):\n        if (i < n_max) or (n_max == -1):\n            ds = gdal.Open(tif)\n            for band in range(n_bands):\n                # read each band once and derive both statistics from it\n                band_data = ds.GetRasterBand(band + 1).ReadAsArray()\n                mean_array[band].append(np.mean(band_data))\n                std_array[band].append(np.std(band_data))\n            i += 1\n        else:\n            break\n\n    # results\n    res_mean = [np.mean(mean_array[band]) for band in range(n_bands)]\n    res_std = [np.mean(std_array[band]) for band in range(n_bands)]\n\n    # print results table\n    print(\"Band |   Mean   |   Std\")\n    print(\"-\" * 28)\n    for band in range(n_bands):\n        print(\"{band:4d} | {mean:8.3f} | {std:8.3f}\".format(\n            band=band, mean=res_mean[band], std=res_std[band]))\n\n    return res_mean, res_std\n\n\nif __name__ == \"__main__\":\n    getMeanStd(path=\"data/PyCon/RGB/train/\", n_bands=3)\n"
  },
  {
    "path": "requirements.txt",
    "content": "tensorflow>=2.1.0\nscikit-image>=0.14.1\ngdal==2.2.4\ntqdm>=4.0.0\nnumpy>=1.17.4\n"
  }
]